A Table-Driven Method for Recognizing String Patterns with a DFA

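I. Overview

1. Terminology

1) DFA

A DFA (Deterministic Finite Automaton) is a model for recognizing string patterns. The terminology here follows the textbook 《編譯原理》 (Compiler Principles).

The model consists of a set of states, an alphabet, and a transition function, together with a start state and a set of accept states.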

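For example, suppose we need to recognize the string aabb.

States: how much of the string has been recognized so far. In this example the prefixes a, aa, aab, and aabb correspond to different states, say states 1, 2, 3, and 4. State 4 is called the accept state: reaching it means a string matching the pattern has been read.

Alphabet: the set of characters that need to be recognized. In this example a and b are the characters to recognize, so the alphabet is {a, b}.

Transition function: gives, for each state and each letter of the alphabet, the next state, e.g. F[1, a] = 2 and F[1, b] = error.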

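For convenience, the concepts above were illustrated with one specific string. In general a DFA recognizes a set of strings, and its recognition power is equivalent to that of regular expressions: the sets of strings that DFAs can recognize are exactly the sets that regular expressions can describe.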

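2) Table-driven method

Every DFA (states, alphabet, and transition function) can be written as a transition table in which each row corresponds to a state and each column to a character of the alphabet. The table below is for the aabb example above: state 0 is the start state where recognition begins, err means an error was detected (the current string does not match the pattern), acc means recognition succeeded, and the accept state is marked with "()".

Table 1.1 Transition table of the DFA for aabb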

	a	b	eof
0	1	err	err
1	2	err	err
2	err	3	err
3	err	4	err
(4)	err	err	acc

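The table-driven method takes the table that corresponds to a DFA and uses it to drive the recognition of the string pattern: for each input character, the recognizer looks up the next state in the table.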
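As a rough illustration of Table 1.1, the sketch below encodes the aabb table as a C++ array and walks it over an input string. The names AabbState, kAabbTable, and acceptsAabb are hypothetical and chosen only for this example, and the explicit error row is an assumption added so that every lookup stays inside the table.

```cpp
#include <cassert>
#include <string>

// Hypothetical names for illustration; they do not come from the article.
enum AabbState { A0 = 0, A1, A2, A3, A4, A_ERR, A_COUNT };

// Rows: states 0..4 of Table 1.1 plus an added error row.
// Columns: 0 for 'a', 1 for 'b'.
const AabbState kAabbTable[A_COUNT][2] = {
    { A1,    A_ERR },  // state 0: nothing read yet
    { A2,    A_ERR },  // state 1: read "a"
    { A_ERR, A3    },  // state 2: read "aa"
    { A_ERR, A4    },  // state 3: read "aab"
    { A_ERR, A_ERR },  // state 4: read "aabb" (accept state)
    { A_ERR, A_ERR },  // error state: every input stays in error
};

// Returns true only if the whole input is exactly "aabb".
bool acceptsAabb(const std::string& input) {
    AabbState s = A0;
    for (char c : input) {
        if (c != 'a' && c != 'b') return false;  // char outside the alphabet
        assert(s >= A0 && s < A_COUNT);          // s always indexes a valid row
        s = kAabbTable[s][c == 'a' ? 0 : 1];
    }
    return s == A4;  // end of input: accept only in state 4
}
```

Calling acceptsAabb("aabb") returns true, while "aab" stops in state 3 and "aabba" falls into the error row, so both are rejected.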

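3) String pattern

A string pattern denotes a set of strings; if you are familiar with regular expressions, you can regard the two as the same thing. For example, the pattern "(a|b|c)\.txt" denotes the strings that consist of a single a, b, or c followed by ".txt", that is, a.txt, b.txt, and c.txt. (The pattern is written as a regular expression. "." has a special meaning in regular expressions, namely any single character, so "\" is used to escape it into an ordinary character.)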
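The meaning of the pattern can be checked with the standard library's std::regex. This is only a small sketch: the function name matchesTxtPattern is hypothetical, and std::regex_match is used because it requires the whole string to match, which corresponds to membership in the set of strings the pattern denotes.

```cpp
#include <cassert>  // only needed for the assert-style checks in the note below
#include <regex>
#include <string>

// Hypothetical helper: true if s belongs to the set denoted by (a|b|c)\.txt.
bool matchesTxtPattern(const std::string& s) {
    // A raw string literal keeps the backslash that escapes the dot.
    static const std::regex pattern(R"((a|b|c)\.txt)");
    return std::regex_match(s, pattern);  // whole-string match
}
```

With this helper, assert(matchesTxtPattern("a.txt")) holds, while "d.txt", "ab.txt", and "aXtxt" are rejected; the last one shows why the dot has to be escaped.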

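2. Problem statement

In a compiler principles course, the instructor assigned this task: use the table-driven method to simulate a DFA recognizing the strings (the language) of "(a|b)*abb".

(The pattern is written as a regular expression; "*" means the preceding item, here the group (a|b), occurs zero or more times.)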

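II. Algorithm approach

Given the introduction above, the work that actually needs to be done is:

1) Express the strings to be recognized as a DFA

2) Convert the DFA into the corresponding transition table

3) Implement the table-driven method from the transition table

Since the focus here is on implementing the table-driven method, the first two steps are done on paper. Following the algorithms described in 《編譯原理》 (construct an NFA (Non-deterministic Finite Automaton) from the pattern, convert the NFA to a DFA, then minimize the states of the DFA), we construct the DFA for "(a|b)*abb" and then write out the corresponding transition table, as shown below: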

Figure 2.1 Transition diagram for "(a|b)*abb"

	a	b	eof
0	1	0	err
1	1	2	err
2	1	3	err
(3)	1	0	acc

Figure 2.2 Transition table for "(a|b)*abb"
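To see how the table in Figure 2.2 is read, the following sketch records the sequence of states visited for a given input. The names kAbbTable and traceStates are hypothetical, and the sketch assumes the input contains only a and b, so the eof column and the err entries are left out.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Transition table of Figure 2.2; column 0 is 'a', column 1 is 'b'.
// Hypothetical names, chosen only for this sketch.
const int kAbbTable[4][2] = {
    {1, 0},  // state 0
    {1, 2},  // state 1
    {1, 3},  // state 2
    {1, 0},  // state 3 (accept state)
};

// Returns every state visited while reading input, beginning with state 0.
// Assumes input consists only of 'a' and 'b'.
std::vector<int> traceStates(const std::string& input) {
    std::vector<int> visited{0};
    for (char c : input) {
        assert(c == 'a' || c == 'b');  // precondition of this sketch
        int next = kAbbTable[visited.back()][c == 'a' ? 0 : 1];
        visited.push_back(next);
    }
    return visited;
}
```

For the input aabb the trace is 0, 1, 1, 2, 3, which ends in accept state 3. For abab the trace is 0, 1, 2, 1, 2, which ends in state 2, so that string is rejected.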

From the transition table, the table-driven method can be described in the following pseudocode, adapted from 《編譯原理》:

s = s0    // s0 is state 0, the beginning state
c = nextChar()
while (c != eof){
    s = F(s, c)    // F is the state-transition method
    c = nextChar()
}
if (s in acceptState())    // is the state reached at eof an accept state?
    print("yes")
else
    print("no")

III. Implementation

1. C++ implementation

#include <iostream>    // for cin, cout
#include <map>    // for map

using std::cin;
using std::cout;
using std::map;

// the enumeration of DFA states
enum state {S0 = 0, S1, S2, S3, ERR};

// the number of rows and columns of the transition table
const int MAXROW = 5;
const int MAXCOL = 3;
//===----------------------------------------===//
//  state-transition table definition
//        other    a    b
//     S0   ERR   S1   S0
//     S1   ERR   S1   S2
//     S2   ERR   S1   S3
//     S3   ERR   S1   S0
//    ERR   ERR  ERR  ERR
//===----------------------------------------===//
// column 0 is reserved for any char that is not in the alphabet
state transTable [MAXROW][MAXCOL] =
{
    {  ERR,  S1,  S0  },
    {  ERR,  S1,  S2  },
    {  ERR,  S1,  S3  },
    {  ERR,  S1,  S0  },
    {  ERR, ERR, ERR  }
};
// maps each char in the alphabet of (a|b)*abb to its column in transTable
map<char, int> alphabet =
{
    // 'a' maps to column 1 and 'b' maps to column 2 of transTable
    { 'a' , 1 },
    { 'b' , 2 }
};

void tableDrive();    // table-drive function
state F(state, char);    // state-transition function
char nextChar();    // get next input char

int main()
{
    cout << "================================\n";
    cout << " String-model: (a|b)*abb        \n";
    cout << " End-of-input: $                \n";
    cout << "================================\n";
    while (true){
        cout << ">";
        // skip whitespace; stop once the input stream is exhausted
        if (!(cin >> std::ws) || cin.eof())
            break;
        tableDrive();
    }
    return 0;
}

// table-drive function for recognizing the string-model: (a|b)*abb
void tableDrive()
{
    state s = S0;
    char c = nextChar();
    while (c != '$'){    // '$' indicates the end of input
        s = F(s, c);
        c = nextChar();
    }
    // S3 is the only accept state
    if (s == S3)
        cout << "yes\n";
    else
        cout << "no\n";
}
//===----------------------------------------===//
//  state-transition function
//  s is the current state, c is the current char
//  based on the transition table of the DFA above
//===----------------------------------------===//
state F(state s, char c)
{
    // find() does not insert missing keys (operator[] would);
    // a char outside the alphabet uses column 0, the invalid-char column
    auto it = alphabet.find(c);
    int col = (it != alphabet.end()) ? it->second : 0;

    return transTable[s][col];
}
// get the next input char
char nextChar()
{
    char ret;

    // operator>> skips whitespace; a failed read (e.g. end of stream)
    // is reported as '$' so that the caller's loop always terminates
    if (!(cin >> ret))
        return '$';
    return ret;
}
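One way to gain confidence in a hand-built transition table is to compare it against a regular-expression engine on every short input. The sketch below is an assumption-laden check rather than part of the original program: acceptsByTable, countMismatches, and kNext are hypothetical names, the table is restated without the invalid-char column, and std::regex_match stands in as the reference recognizer.

```cpp
#include <cassert>  // only needed for the assert-style checks in the note below
#include <regex>
#include <string>

// Same transitions as transTable, restricted to the columns for 'a' and 'b'.
const int kNext[4][2] = {{1, 0}, {1, 2}, {1, 3}, {1, 0}};

// Table-driven recognizer that reads a whole string instead of std::cin.
bool acceptsByTable(const std::string& input) {
    int s = 0;
    for (char c : input) {
        if (c != 'a' && c != 'b') return false;  // plays the role of ERR
        s = kNext[s][c == 'a' ? 0 : 1];
    }
    return s == 3;  // state 3 is the only accept state
}

// Counts the strings over {a, b} of length <= maxLen on which the table
// and std::regex disagree. Intended for small maxLen (below 32).
int countMismatches(int maxLen) {
    static const std::regex pattern("(a|b)*abb");
    int mismatches = 0;
    for (int len = 0; len <= maxLen; ++len) {
        // Each bit pattern selects which positions hold 'b'.
        for (unsigned bits = 0; bits < (1u << len); ++bits) {
            std::string s(len, 'a');
            for (int i = 0; i < len; ++i)
                if (bits & (1u << i)) s[i] = 'b';
            if (acceptsByTable(s) != std::regex_match(s, pattern))
                ++mismatches;
        }
    }
    return mismatches;
}
```

A check such as assert(countMismatches(8) == 0) exercises all 511 strings of length up to 8. Agreement on short strings is evidence that the table is right, not a proof.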

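2. Run results

Figure 3.1 Run results

IV. Summary

As can be seen, recognizing a string pattern through the transition table of a DFA is quite intuitive, and this kind of recognition can be used for lexical analysis in a compiler.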
