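Automata are a powerful tool for text matching and text parsing. Following reference [1], this post implements a parser for INI configuration files. A state machine parses text like this: it reads the input one character at a time and handles each character according to the current state. Handling a character mostly means switching to another state and performing the action tied to that transition. This continues until every input character has been consumed.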
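The loop just described can be sketched generically. This C++ sketch is not taken from [1], and `RunMachine`, `LineState`, `CommentStep` and `EndsInsideComment` are hypothetical names. `RunMachine` feeds the input one character at a time to a step function that returns the next state. A two-state comment tracker shows it in use.

```cpp
#include <cassert>
#include <string>

// Generic driver: feed the input to `step` one character at a time.
// `step` is any callable (State, char) -> State; all per-character work,
// including choosing the next state, happens inside it.
template <typename State, typename Step>
State RunMachine(const std::string& input, State start, Step step) {
  State state = start;
  for (std::string::size_type i = 0; i < input.size(); ++i) {
    state = step(state, input[i]);
  }
  return state;  // the state the machine is in after the last character
}

// Hypothetical two-state example: track whether we are inside a ';' comment.
enum class LineState { Normal, Comment };

LineState CommentStep(LineState s, char c) {
  if (s == LineState::Normal && c == ';') return LineState::Comment;
  if (s == LineState::Comment && c == '\n') return LineState::Normal;
  return s;  // every other character leaves the state unchanged
}

// True if the text ends inside a ';' comment not yet closed by '\n'.
bool EndsInsideComment(const std::string& text) {
  return RunMachine(text, LineState::Normal, CommentStep) == LineState::Comment;
}
```

Keeping the loop separate from the step function means a different machine only needs a different step function.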
A typical INI file looks like this:
;this is comment
[section1]
aa = 1
bb = 2
[section2]
cc = 3
dd = 4
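Conceptually, the sample maps each section name to a set of key/value pairs. The sketch below spells that out, assuming a two-level `std::map`. `IniData`, `SampleIni` and `Lookup` are hypothetical helpers, not part of the parser in this post, which keeps only a single key-to-value map.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical data model: section name -> (key -> value).
using IniData = std::map<std::string, std::map<std::string, std::string>>;

// The sample file above, written out by hand in that shape.
// The ";this is comment" line carries no data, so it has no entry.
IniData SampleIni() {
  IniData data;
  data["section1"]["aa"] = "1";
  data["section1"]["bb"] = "2";
  data["section2"]["cc"] = "3";
  data["section2"]["dd"] = "4";
  return data;
}

// Looks up a value, returning `fallback` when the section or key is absent.
std::string Lookup(const IniData& data, const std::string& section,
                   const std::string& key, const std::string& fallback) {
  IniData::const_iterator s = data.find(section);
  if (s == data.end()) return fallback;
  std::map<std::string, std::string>::const_iterator k = s->second.find(key);
  return k == s->second.end() ? fallback : k->second;
}
```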
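Parsing an INI file involves the following states:
StartState: the initial state, between entries
SectionLabelState: inside a section label such as [section1]
KeyState: reading a key
ValueState: reading a value
CommentState: inside a comment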
The state transitions are:
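StartState:
  on '[', enter SectionLabelState
  on an alphanumeric character, enter KeyState
  on ';' (or '#'), enter CommentState
SectionLabelState:
  on ']', return to StartState
KeyState:
  on '=', extract the key and enter ValueState
ValueState:
  on '\n', extract the value and return to StartState
CommentState:
  on '\n', return to StartState
Any character not covered by these rules leaves the current state unchanged.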
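The transition rules above fit into one pure function. In this sketch `NextState` and `Trace` are hypothetical names. The enum reuses the state names from the program below. Extracting the key and value text is deliberately left out so that only the transitions remain.

```cpp
#include <cassert>
#include <string>

// The five states from the list above.
enum ParseState { StartState, SectionLabelState, KeyState, ValueState, CommentState };

// Pure transition function: current state + one character -> next state.
ParseState NextState(ParseState state, char c) {
  bool alnum = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
               (c >= '0' && c <= '9');
  switch (state) {
    case StartState:
      if (c == '[') return SectionLabelState;
      if (alnum) return KeyState;
      if (c == ';' || c == '#') return CommentState;
      return StartState;  // blanks, newlines, etc. are skipped
    case SectionLabelState:
      return c == ']' ? StartState : SectionLabelState;
    case KeyState:
      return c == '=' ? ValueState : KeyState;
    case ValueState:
    case CommentState:
      return c == '\n' ? StartState : state;
  }
  return state;  // not reached; silences "missing return" warnings
}

// Runs the input through NextState and records one letter per character:
// S=Start, [=SectionLabel, K=Key, V=Value, C=Comment.
std::string Trace(const std::string& input) {
  static const char kCode[] = {'S', '[', 'K', 'V', 'C'};
  std::string trace;
  ParseState state = StartState;
  for (char c : input) {
    state = NextState(state, c);
    trace += kCode[state];
  }
  return trace;
}
```

For example, `Trace("a=1\n")` yields `"KVVS"`: the key starts, `=` switches to the value, and the newline returns to the start state.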
Here is the complete program:
#include <stdio.h>
#include <map>
#include <string>

// Despite its name, IsAlphabet accepts letters AND digits: a key may start with either.
bool IsAlphabet(char c) {
  return (c >= 'a' && c <= 'z') ||
         (c >= 'A' && c <= 'Z') ||
         (c >= '0' && c <= '9');
}

// A comment starts with ';' or '#'.
bool IsCommentStart(char c) { return c == ';' || c == '#'; }

// A section label is enclosed in '[' ... ']'.
bool IsSectionLabelStart(char c) { return c == '['; }
bool IsSectionLabelEnd(char c) { return c == ']'; }

// '=' separates a key from its value.
bool IsKeyEnd(char c) { return c == '='; }

// Both a value and a comment run to the end of the line.
bool IsValueEnd(char c) { return c == '\n'; }
bool IsCommentEnd(char c) { return c == '\n'; }
// Removes leading and trailing blanks so that "aa " becomes "aa" and " 1 " becomes "1".
std::string Trim(const std::string& s) {
  const char* kBlank = " \t\r\n";
  std::string::size_type first = s.find_first_not_of(kBlank);
  if (first == std::string::npos) return "";
  std::string::size_type last = s.find_last_not_of(kBlank);
  return s.substr(first, last - first + 1);
}

// Parses the INI text in init_buffer and stores every key/value pair in
// *properties. Section labels are recognized but not recorded, so keys from
// all sections share one map (a key repeated in a later section overwrites).
bool ParseInit(const std::string& init_buffer,
               std::map<std::string, std::string>* properties) {
  enum ParseState {
    StartState,
    SectionLabelState,
    KeyState,
    ValueState,
    CommentState
  };
  std::string::size_type offset = 0;
  std::string::size_type start_offset = 0;  // where the current key or value begins
  std::string key;
  ParseState parse_state = StartState;
  while (offset < init_buffer.size()) {
    const char c = init_buffer[offset];
    switch (parse_state) {
      case StartState:
        if (IsSectionLabelStart(c)) {
          parse_state = SectionLabelState;
        } else if (IsAlphabet(c)) {
          parse_state = KeyState;
          start_offset = offset;
        } else if (IsCommentStart(c)) {
          parse_state = CommentState;
        }
        // Any other character (blanks, newlines) is skipped.
        break;
      case SectionLabelState:
        if (IsSectionLabelEnd(c)) parse_state = StartState;
        break;
      case KeyState:
        if (IsKeyEnd(c)) {
          key = Trim(init_buffer.substr(start_offset, offset - start_offset));
          start_offset = offset + 1;  // the value starts right after '='
          parse_state = ValueState;
        }
        break;
      case ValueState:
        if (IsValueEnd(c)) {
          (*properties)[key] =
              Trim(init_buffer.substr(start_offset, offset - start_offset));
          parse_state = StartState;
        }
        break;
      case CommentState:
        if (IsCommentEnd(c)) parse_state = StartState;
        break;
    }
    ++offset;
  }
  // The last value may end at the end of the input instead of at '\n'.
  if (parse_state == ValueState) {
    (*properties)[key] = Trim(init_buffer.substr(start_offset));
  }
  return true;
}
int main() {
  std::string init_buffer =
      " [section1] aa = 1 \n bb = 2 \n [section2] \n cc = 3 \n"
      " [section3] \n dd = 4 \n ff = 5\n";
  std::map<std::string, std::string> properties;
  ParseInit(init_buffer, &properties);
  for (std::map<std::string, std::string>::const_iterator it = properties.begin();
       it != properties.end(); ++it) {
    printf("key: %s, value: %s\n", it->first.c_str(), it->second.c_str());
  }
  return 0;
}
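For flexibility, each condition the state machine tests is wrapped in its own small function (IsCommentStart, IsKeyEnd, and so on). Changing the syntax, for example accepting a different comment character, then means editing one predicate rather than the parsing loop.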
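Taking that idea one step further, the predicates can be passed in rather than hard-coded. This sketch is an assumption, not the code from [1], and `Syntax`, `Parse`, `Trim`, `kIniSyntax` and `kColonSyntax` are hypothetical names. The loop mirrors `ParseInit`, but every character test goes through a table of function pointers. Switching from `key = value` to `key: value` then only means passing a different table.

```cpp
#include <cassert>
#include <map>
#include <string>

// One function pointer per character test the state machine performs.
struct Syntax {
  bool (*is_section_start)(char);
  bool (*is_section_end)(char);
  bool (*is_key_start)(char);
  bool (*is_key_end)(char);
  bool (*is_comment_start)(char);
  bool (*is_line_end)(char);  // ends both values and comments
};

bool IsOpenBracket(char c) { return c == '['; }
bool IsCloseBracket(char c) { return c == ']'; }
bool IsAlnum(char c) {
  return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
}
bool IsEquals(char c) { return c == '='; }
bool IsColon(char c) { return c == ':'; }
bool IsSemicolonOrHash(char c) { return c == ';' || c == '#'; }
bool IsNewline(char c) { return c == '\n'; }

// Strips leading/trailing blanks from an extracted key or value.
std::string Trim(const std::string& s) {
  const char* kBlank = " \t\r\n";
  std::string::size_type first = s.find_first_not_of(kBlank);
  if (first == std::string::npos) return "";
  return s.substr(first, s.find_last_not_of(kBlank) - first + 1);
}

// Same state machine as ParseInit, but every test goes through `syntax`.
std::map<std::string, std::string> Parse(const std::string& text,
                                         const Syntax& syntax) {
  enum State { Start, Section, Key, Value, Comment } state = Start;
  std::map<std::string, std::string> result;
  std::string key;
  std::string::size_type begin = 0;  // where the current key or value starts
  for (std::string::size_type i = 0; i < text.size(); ++i) {
    char c = text[i];
    switch (state) {
      case Start:
        if (syntax.is_section_start(c)) {
          state = Section;
        } else if (syntax.is_key_start(c)) {
          state = Key;
          begin = i;
        } else if (syntax.is_comment_start(c)) {
          state = Comment;
        }
        break;
      case Section:
        if (syntax.is_section_end(c)) state = Start;
        break;
      case Key:
        if (syntax.is_key_end(c)) {
          key = Trim(text.substr(begin, i - begin));
          begin = i + 1;
          state = Value;
        }
        break;
      case Value:
        if (syntax.is_line_end(c)) {
          result[key] = Trim(text.substr(begin, i - begin));
          state = Start;
        }
        break;
      case Comment:
        if (syntax.is_line_end(c)) state = Start;
        break;
    }
  }
  if (state == Value) result[key] = Trim(text.substr(begin));  // value at end of input
  return result;
}

// The post's syntax, and a variant with ':' between key and value.
const Syntax kIniSyntax = {IsOpenBracket, IsCloseBracket, IsAlnum,
                           IsEquals, IsSemicolonOrHash, IsNewline};
const Syntax kColonSyntax = {IsOpenBracket, IsCloseBracket, IsAlnum,
                             IsColon, IsSemicolonOrHash, IsNewline};
```

The loop itself never changes; only the table handed to `Parse` does. With `kIniSyntax`, a line such as `aa: 1` never reaches a key separator, so nothing is stored for it.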
References
[1] 系統程序員成長計劃 (A Growth Plan for System Programmers), p. 188