Question

I am trying to write a simple tokenizer for a basic arithmetic calculator. Here's the code:

// Example program
#include <iostream>
#include <regex>
#include <string>
#include <vector>
#include <map>
#include <utility>

std::string operandRegex = "^(-?)(0|([1-9][0-9]*))(\\.[0-9]+)?";
enum Operator { ADD = 0, SUBTRACT = 1, MULTIPLY = 2, DIVIDE = 3 };
static const std::map<std::string, enum Operator> operatorRegexMap = {
    std::make_pair("^\\+", Operator::ADD),
    std::make_pair("^\\-", Operator::SUBTRACT),
    std::make_pair("^\\*", Operator::MULTIPLY),
    std::make_pair("^/", Operator::DIVIDE),
};

std::vector<std::string> tokenize(const std::string &expression) {
  std::vector<std::string> result;

  std::string::const_iterator searchStart(expression.cbegin());

  while (searchStart != expression.cend()) {
    std::regex re = std::regex{operandRegex};
    std::smatch match;
    // Operand
    if (regex_search(searchStart, expression.cend(), match, re)) {
      searchStart = match.suffix().first;
      result.push_back(match.str(0));
    } else {
      // Operator
      bool noMatch = true;
      for (const auto &re_op : operatorRegexMap) {
        std::regex re = std::regex{re_op.first};
        if (regex_search(searchStart, expression.cend(), match, re)) {
          searchStart = match.suffix().first;
          result.push_back(match.str(0));
          noMatch = false;
          break;
        }
      }
      if (noMatch) {
        break;
      }
    }
  }

  return result;
}

void print(const std::vector<std::string> &tokens) {
    for (const auto &token : tokens) {
        std::cout << token << std::endl;
    }
}

int main() {
    std::string expression;
    std::cout << "Input Expression: ";
    std::getline(std::cin, expression);
    std::vector<std::string> tokens = tokenize(expression);
    std::cout << "Parsed: " << std::endl;
    print(tokens);
    return 0;
}

I am running into an issue with the associativity of operators versus operands:

Input Expression: 2-3+5
Parsed: 
2
-3
+
5

Here the minus sign is being consumed as a unary operator, and I could fix that. But what if I wanted to experiment with more complex operators, say, a pow() function or integral(function, low, high)? How do I change the code into a better design, with emphasis on SOLID principles?

J_H · Accepted Answer · 2025-08-06 16:48:34Z

There's a reason many folks have found lex / yacc (flex / bison) to be helpful. But let's assume we wish to reinvent some aspects of such compiler technology.

specification

The most important part of any program's design, especially for a parser, is to write down exactly what it does. Here, it's necessary to describe a particular language (a grammar) that it accepts.

Many designers will choose to give a concise and precise description using EBNF. With that accomplished, you'd be in a good position to mechanically translate by hand to an RD parser, and could easily trace from functions in the C++ implementation back to particular EBNF production rules, and vice versa.
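
As a rough illustration of that traceability, here is a minimal sketch of an RD parser in which each member function corresponds to one production. The grammar in the comment is hypothetical (the OP has not published one), the class name Parser and its helpers are invented for this example, whitespace is not handled, and the parser evaluates directly instead of building a tree.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical grammar (EBNF), assumed for illustration only:
//   expression = term , { ( "+" | "-" ) , term } ;
//   term       = factor , { ( "*" | "/" ) , factor } ;
//   factor     = [ "-" ] , primary ;
//   primary    = number | "(" , expression , ")" ;
//   number     = digit , { digit } , [ "." , digit , { digit } ] ;
class Parser {
public:
    explicit Parser(std::string text) : text_(std::move(text)) {}

    // Parses the whole input and returns its value; throws on bad input.
    double parse() {
        const double value = expression();
        if (pos_ != text_.size()) throw std::runtime_error("trailing input");
        return value;
    }

private:
    // expression = term , { ( "+" | "-" ) , term } ;
    double expression() {
        double value = term();
        while (peek() == '+' || peek() == '-') {
            const char op = text_[pos_++];
            const double rhs = term();
            value = (op == '+') ? value + rhs : value - rhs;
        }
        return value;
    }

    // term = factor , { ( "*" | "/" ) , factor } ;
    double term() {
        double value = factor();
        while (peek() == '*' || peek() == '/') {
            const char op = text_[pos_++];
            const double rhs = factor();
            value = (op == '*') ? value * rhs : value / rhs;
        }
        return value;
    }

    // factor = [ "-" ] , primary ;
    double factor() {
        if (peek() == '-') {
            ++pos_;
            return -primary();
        }
        return primary();
    }

    // primary = number | "(" , expression , ")" ;
    double primary() {
        if (peek() != '(') return number();
        ++pos_;
        const double value = expression();
        if (peek() != ')') throw std::runtime_error("expected ')'");
        ++pos_;
        return value;
    }

    // number = digit , { digit } , [ "." , digit , { digit } ] ;
    double number() {
        const std::size_t start = pos_;
        while (isDigit(peek())) ++pos_;
        if (pos_ == start) throw std::runtime_error("expected number");
        if (peek() == '.') {
            ++pos_;
            const std::size_t fraction = pos_;
            while (isDigit(peek())) ++pos_;
            if (pos_ == fraction) throw std::runtime_error("expected digit after '.'");
        }
        return std::stod(text_.substr(start, pos_ - start));
    }

    static bool isDigit(char c) {
        return std::isdigit(static_cast<unsigned char>(c)) != 0;
    }

    // Returns the current character, or '\0' at end of input.
    char peek() const { return pos_ < text_.size() ? text_[pos_] : '\0'; }

    std::string text_;
    std::size_t pos_ = 0;
};
```

With this sketch, Parser("2-3+5").parse() evaluates to 4, because the minus is only ever consumed as a binary operator once an operand has been read. Note that this assumed grammar accepts leading zeros such as 07, which is exactly the kind of decision a written specification forces you to make.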

automated test suite

Writing up a formal specification certainly has instructional value to end users of your language, at an abstract level. But test cases, which are known to run Green, also have enormous instructional value, at a more concrete level. The OP would benefit from the addition of such a test suite.
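
A table-driven harness is one lightweight way to get started. The sketch below is an assumption about how such a suite might look, not something taken from the OP code: TokenizerCase, Tokenizer, countFailures and exampleCases are hypothetical names, and the expected token lists encode one possible reading of the intended language (minus always binary between operands).

```cpp
#include <functional>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical test-case shape: an input and the tokens we expect back.
struct TokenizerCase {
    std::string input;
    std::vector<std::string> expected;
};

// Any callable with the same signature as the OP's tokenize() fits here.
using Tokenizer = std::function<std::vector<std::string>(const std::string &)>;

// Runs every case through `tokenize` and returns the number of failures.
int countFailures(const Tokenizer &tokenize,
                  const std::vector<TokenizerCase> &cases) {
    int failures = 0;
    for (const auto &testCase : cases) {
        if (tokenize(testCase.input) != testCase.expected) {
            std::cerr << "FAIL: " << testCase.input << '\n';
            ++failures;
        }
    }
    return failures;
}

// Assumed expectations; adjust them to match the written specification.
std::vector<TokenizerCase> exampleCases() {
    return {
        {"2-3+5", {"2", "-", "3", "+", "5"}},
        {"2*3", {"2", "*", "3"}},
        {"7", {"7"}},
    };
}
```

Passing the OP's tokenize function to countFailures along with exampleCases() would report the 2-3+5 case as a failure, which turns the question's bug report into a repeatable check.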

unusual inputs

The following is a valid input to the OP code, though it's not clear that all developers who read the question's prose would agree it should be valid.

-0 (different from the quite usual -0.0)

Absent a Specification document, it's not clear these should be marked "invalid":

07
007
7.

We also prohibit +7, which is perhaps nice enough; certainly it makes x+y simpler to parse. Arguably the + is redundant; but then, so is the - in -0, given that we just get 0.

Summary: if there are input language aspects you wish to enforce, you might prefer to do that at a higher level than a regex or EBNF production.

regex vs. strcmp

    std::make_pair("^\\+", Operator::ADD),
    std::make_pair("^\\-", Operator::SUBTRACT),
    std::make_pair("^\\*", Operator::MULTIPLY),
    std::make_pair("^/", Operator::DIVIDE),

All of the OP operators are represented with a single character, which is easily compared. If you feel you'll need to support longer operators, you might as well introduce ** exponentiation now, to at least justify the character versus string decision. Even then the use of regex seems a bit heavy handed.
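
A minimal sketch of the plain character comparison, assuming C++17 for std::optional. The function name toOperator and the enumerator spellings are illustrative and differ from the OP's enum.

```cpp
#include <optional>

enum class Operator { Add, Subtract, Multiply, Divide };

// Maps a single character to an operator, or std::nullopt if it is not one.
std::optional<Operator> toOperator(char c) {
    switch (c) {
    case '+': return Operator::Add;
    case '-': return Operator::Subtract;
    case '*': return Operator::Multiply;
    case '/': return Operator::Divide;
    default:  return std::nullopt;
    }
}
```

A switch like this needs no regex compilation at all, and a multi-character operator such as ** can later be handled by peeking at the next character before falling back to Multiply.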

Presumably the integral(fn, low, high) use case motivates the use of regexes here. But if we listen to YAGNI, it would be better to get some unit tests showing Green right now with a simple technique, and refactor when the time comes to support such function call syntax. By then, you may find that RD or another approach is the better fit for your project's evolving needs.

Reinderien · Accepted Answer · 2025-08-06 16:54:47Z

Set operandRegex as constexpr. It does not benefit from being a std::string and can simply be a const char *. Refer to the constructor for regex.

operatorRegexMap may also be constexpr as of C++26.

re should not be constructed on the inside of the loop. It should be constructed outside of the loop, either in function scope or as a const global. This applies to both token pattern kinds.
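
The two points about the operand pattern can be sketched together as follows. The pattern text is copied from the question; the names operandPattern, operandRe and leadingOperand are hypothetical, and the helper only demonstrates reuse of one compiled regex, not a full tokenizer.

```cpp
#include <regex>
#include <string>

// The pattern itself needs nothing more than a string literal.
constexpr const char *operandPattern = "^(-?)(0|([1-9][0-9]*))(\\.[0-9]+)?";

// Compiled once during static initialization instead of on every iteration.
const std::regex operandRe{operandPattern};

// Returns the operand at the start of `text`, or an empty string if none.
std::string leadingOperand(const std::string &text) {
    std::smatch match;
    if (std::regex_search(text, match, operandRe)) {
        return match.str(0);
    }
    return {};
}
```

Compiling a std::regex is comparatively expensive, so hoisting it out of the loop matters more as the input grows; the same treatment applies to each operator pattern.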

For Operator use enum class instead of enum.

The logic of tokenize does not seem correct to me. It tries, in a stateless manner, to match either an operand or an operator on any token. Instead, you should alternate between matching an operand and matching an operator, assuming that all operators are binary. If you also need to support unary operators, you need exactly one binary operator between operands, and zero or more unary operators before each operand.
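
One way to sketch that alternation is a single flag that records whether an operand or an operator is expected next. This is an illustration under simplifying assumptions, not a drop-in replacement: it accepts unsigned integer operands only, treats - as the only unary operator, does no whitespace handling, and the Token and TokenKind types are invented for the example.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

enum class TokenKind { Number, UnaryOperator, BinaryOperator };

struct Token {
    TokenKind kind;
    std::string text;
};

std::vector<Token> tokenize(const std::string &expression) {
    std::vector<Token> tokens;
    bool expectOperand = true;  // state: what the grammar allows next
    std::size_t pos = 0;

    while (pos < expression.size()) {
        const char c = expression[pos];
        if (expectOperand) {
            // Zero or more unary operators may precede each operand.
            if (c == '-') {
                tokens.push_back({TokenKind::UnaryOperator, "-"});
                ++pos;
                continue;
            }
            const std::size_t start = pos;
            while (pos < expression.size() &&
                   std::isdigit(static_cast<unsigned char>(expression[pos]))) {
                ++pos;
            }
            if (pos == start) throw std::invalid_argument("expected operand");
            tokens.push_back(
                {TokenKind::Number, expression.substr(start, pos - start)});
            expectOperand = false;
        } else {
            // Exactly one binary operator must separate two operands.
            if (c != '+' && c != '-' && c != '*' && c != '/') {
                throw std::invalid_argument("expected operator");
            }
            tokens.push_back({TokenKind::BinaryOperator, std::string(1, c)});
            ++pos;
            expectOperand = true;
        }
    }
    if (expectOperand) {
        throw std::invalid_argument("input ends where an operand was expected");
    }
    return tokens;
}
```

With this state in place, 2-3+5 yields the five tokens 2, -, 3, +, 5, while 2--3 yields a binary minus followed by a unary minus.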

Since your module doesn't export anything, all global symbols other than main() should be in an anonymous namespace { }. It's a long story, but I find it less repetitive than declaring static on everything.
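
For reference, a small sketch of what that looks like. The names operatorChars and isOperatorChar are hypothetical stand-ins for the program's real globals.

```cpp
namespace {

// Everything declared in this block has internal linkage, so it cannot
// clash with same-named symbols in other translation units.
constexpr char operatorChars[] = {'+', '-', '*', '/'};

bool isOperatorChar(char c) {
    for (const char op : operatorChars) {
        if (op == c) return true;
    }
    return false;
}

}  // namespace
```

Code later in the same file, including main(), can still call isOperatorChar directly; only other translation units lose visibility.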
