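Comparing a Lexer Written in Plain C with One Written in lex

(1): Preface

In the previous posts we started from a simple lex example, extended it step by step, and then combined it with yacc to parse simple English sentences. Those examples showed how convenient and powerful lex and yacc are. Our eventual goal is to use them to implement a simple shell interpreter, and with lex and yacc doing the heavy lifting, the command interpreter should be considerably easier to write.

(2): Extending the simple English grammar parser

Here we extend the English sentence parser from the previous section so that it can also parse simple compound sentences.

Here is the source code:

File: ch05.lex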


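    %{
        /*
         * A lexer meant to be used by a higher-level parser
         */

        #include <stdio.h>
        #include <string.h>
        #include "y.tab.h"

        #define LOOKUP 0 /* default state: not a defined word type */

        int state;

        /* defined in the user code section at the end of this file */
        int add_word(int type,char *word);
        int lookup_word(char *word);
    %}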

    %%

    \n { state = LOOKUP; }

    \.\n { state = LOOKUP;
        return 0; /* end of sentence */
        }

    ^verb { state = VERB; }
    ^adj { state = ADJECTIVE; }
    ^adv { state = ADVERB; }
    ^noun { state = NOUN; }
    ^prep { state = PREPOSITION; }
    ^pron { state = PRONOUN; }
    ^conj { state = CONJUNCTION; }

    [a-zA-Z]+ {
        if(state != LOOKUP){
            add_word(state,yytext);
        }else{
            switch(lookup_word(yytext)){
            case VERB:
                return(VERB);
            case ADJECTIVE:
                return(ADJECTIVE);
            case ADVERB:
                return(ADVERB);
            case NOUN:
                return(NOUN);
            case PREPOSITION:
                return(PREPOSITION);
            case PRONOUN:
                return(PRONOUN);
            case CONJUNCTION:
                return(CONJUNCTION);
            default:
                printf("%s: don't recognize\n",yytext);
                /* don't return a token, just ignore the word */
            }
        }
    }

    . ;

    %%

    /* a linked list of words and their types */
    struct word
    {
        char *word_name;
        int word_type;
        struct word *next;
    };

    struct word *word_list;  /* first element of the word list */
    #include <stdlib.h>      /* malloc */

    int add_word(int type,char *word)
    {
        struct word *wp;
        if(lookup_word(word) != LOOKUP){
            printf("!!warning:word %s already defined\n",word);
            return 0;
        }

        /* the word is not there yet: allocate a new entry and link it into the list */
        wp = (struct word *)malloc(sizeof(struct word));
        wp->next = word_list;

        /* the word itself must be copied as well */
        wp->word_name = (char *)malloc(strlen(word)+1);
        strcpy(wp->word_name,word);
        wp->word_type = type;
        word_list = wp;

        return 1;  /* added successfully */
    }

    int lookup_word(char *word)
    {
        struct word *wp = word_list;

        /* walk down the list looking for the word */
        for(;wp;wp = wp->next){
            if(strcmp(wp->word_name,word) == 0)
                return wp->word_type;
        }

        return LOOKUP;
    }

    int yywrap()
    {
        return 1;
    }

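This program is much the same as the one in the previous section. A line that starts with a part-of-speech keyword (verb, noun, and so on) declares the words that follow it, and each declared word is stored in a linked list together with its type so that it can be looked up later.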
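The word table can also be tried out on its own, outside lex. The following is a minimal sketch of the same linked list, assuming made-up token values (`VERB = 1`, `NOUN = 2`) in place of the ones yacc writes to y.tab.h; unlike the listing above it checks the result of `malloc`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical token values for this demo; yacc generates the real ones. */
enum { LOOKUP = 0, VERB = 1, NOUN = 2 };

struct word {
    char *word_name;
    int word_type;
    struct word *next;
};

struct word *word_list;  /* head of the list, NULL while empty */

/* Return the stored type of `word`, or LOOKUP if it has not been declared. */
int lookup_word(const char *word)
{
    assert(word != NULL);
    for (struct word *wp = word_list; wp != NULL; wp = wp->next)
        if (strcmp(wp->word_name, word) == 0)
            return wp->word_type;
    return LOOKUP;
}

/* Prepend `word` with the given type.
 * Returns 1 on success, 0 if the word exists already or memory runs out. */
int add_word(int type, const char *word)
{
    if (lookup_word(word) != LOOKUP)
        return 0;

    struct word *wp = malloc(sizeof *wp);
    if (wp == NULL)
        return 0;

    wp->word_name = malloc(strlen(word) + 1);
    if (wp->word_name == NULL) {
        free(wp);
        return 0;
    }
    strcpy(wp->word_name, word);
    wp->word_type = type;
    wp->next = word_list;
    word_list = wp;
    return 1;
}
```

Calling `add_word(VERB, "run")` and then `lookup_word("run")` gives back `VERB`, while a word that was never declared yields `LOOKUP`. Because new entries are prepended, insertion is constant time and lookup is a linear scan, which is fine for a vocabulary of this size.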

Next, the yacc definitions.

File: ch05.y


    %{
        #include <stdio.h>
        /* We found the following required for some yacc implementations */
        /* #define YYSTYPE int */

        /* declare these before the generated parser calls them */
        int yylex(void);
        int yyerror(char *s);
    %}

    %token NOUN PRONOUN VERB ADVERB ADJECTIVE PREPOSITION CONJUNCTION

    %%

    sentence: simple_sentence { printf("Parsed a simple sentence.\n"); }
        | compound_sentence { printf("Parsed a compound sentence.\n"); }
        ;

    simple_sentence: subject verb object
        | subject verb object pre_phrase
        ;

    compound_sentence: simple_sentence CONJUNCTION simple_sentence
        | compound_sentence CONJUNCTION simple_sentence
        ;

    subject: NOUN
        | PRONOUN
        | ADJECTIVE subject
        ;

    verb: VERB
        | ADVERB VERB
        | verb VERB
        ;

    object: NOUN
        | ADJECTIVE object
        ;

    pre_phrase: PREPOSITION NOUN
        ;

    %%

    extern FILE *yyin;


    int main()
    {
        yyparse();
        while(!feof(yyin)){
            yyparse();
        }

        return 0;
    }


    int yyerror(char *s)
    {
        fprintf(stderr,"%s\n",s);
        return 0;
    }

Here we have added grammar rules for compound sentences.


    sentence: simple_sentence { printf("Parsed a simple sentence.\n"); }
        | compound_sentence { printf("Parsed a compound sentence.\n"); }
        ;

    simple_sentence: subject verb object
        | subject verb object pre_phrase
        ;

    compound_sentence: simple_sentence CONJUNCTION simple_sentence
        | compound_sentence CONJUNCTION simple_sentence
        ;

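Let's go through these rules.

In yacc, the first rule in the rules section defines the start symbol by default. Rule order does not otherwise give a rule higher priority, except that yacc resolves a reduce/reduce conflict in favor of the earlier rule. Here the start symbol is sentence, which is either a simple_sentence or a compound_sentence. simple_sentence is in turn defined as subject verb object, optionally followed by pre_phrase. Together these two rules define a simple sentence.

Compound sentences are then built by joining simple sentences with a CONJUNCTION, as the third rule, compound_sentence, shows. Because that rule refers to itself on the left, any number of simple sentences can be chained.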
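To see how these rules compose, it can help to write the same grammar by hand. The following is a sketch of a recursive-descent recognizer over an array of token codes. It is not what yacc generates (yacc builds a table-driven LALR(1) parser), and the token values and the names `classify_sentence`, `SIMPLE`, `COMPOUND` and `INVALID` are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token codes; yacc assigns the real values in y.tab.h. */
enum token { NOUN = 1, PRONOUN, VERB, ADVERB, ADJECTIVE, PREPOSITION, CONJUNCTION };

enum result { INVALID = 0, SIMPLE = 1, COMPOUND = 2 };

/* subject: NOUN | PRONOUN | ADJECTIVE subject */
static int parse_subject(const int *tok, int n, int *pos)
{
    while (*pos < n && tok[*pos] == ADJECTIVE)
        (*pos)++;
    if (*pos < n && (tok[*pos] == NOUN || tok[*pos] == PRONOUN)) {
        (*pos)++;
        return 1;
    }
    return 0;
}

/* verb: VERB | ADVERB VERB | verb VERB */
static int parse_verb(const int *tok, int n, int *pos)
{
    if (*pos < n && tok[*pos] == ADVERB)
        (*pos)++;
    if (*pos >= n || tok[*pos] != VERB)
        return 0;
    while (*pos < n && tok[*pos] == VERB)
        (*pos)++;
    return 1;
}

/* object: NOUN | ADJECTIVE object */
static int parse_object(const int *tok, int n, int *pos)
{
    while (*pos < n && tok[*pos] == ADJECTIVE)
        (*pos)++;
    if (*pos < n && tok[*pos] == NOUN) {
        (*pos)++;
        return 1;
    }
    return 0;
}

/* simple_sentence: subject verb object | subject verb object pre_phrase */
static int parse_simple(const int *tok, int n, int *pos)
{
    if (!parse_subject(tok, n, pos) || !parse_verb(tok, n, pos) ||
        !parse_object(tok, n, pos))
        return 0;
    if (*pos < n && tok[*pos] == PREPOSITION) {  /* pre_phrase: PREPOSITION NOUN */
        (*pos)++;
        if (*pos >= n || tok[*pos] != NOUN)
            return 0;
        (*pos)++;
    }
    return 1;
}

/* sentence: simple_sentence | compound_sentence */
int classify_sentence(const int *tok, int n)
{
    assert(tok != NULL);
    int pos = 0;
    int result = SIMPLE;

    if (!parse_simple(tok, n, &pos))
        return INVALID;
    while (pos < n && tok[pos] == CONJUNCTION) {
        pos++;
        if (!parse_simple(tok, n, &pos))
            return INVALID;
        result = COMPOUND;
    }
    return pos == n ? result : INVALID;
}
```

The left-recursive rules (`verb VERB`, `compound_sentence CONJUNCTION simple_sentence`) become loops here, since a recursive-descent parser cannot follow left recursion directly. With yacc none of this has to be written by hand, which is the point of the comparison.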

Next, a Makefile to build everything:

Makefile


    all:
        lex ch05.lex
        yacc -d ch05.y
        gcc -c lex.yy.c y.tab.c
        gcc -o hello lex.yy.o y.tab.o -ll

    clean:
        rm lex.yy.o y.tab.o lex.yy.c y.tab.c y.tab.h hello

Then run make to build the program. Afterwards the current directory looks like this:

.
├── ch05.lex
├── ch05.y
├── hello
├── lex.yy.c
├── lex.yy.o
├── Makefile
├── y.tab.c
├── y.tab.h
└── y.tab.o

0 directories, 9 files

Now run the program we just built:

    ./hello

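(3): lex versus a hand-written lexer

Next we compare a lexer written with lex against one written by hand in C, to get a better feel for how convenient and complete lex and yacc are.

First, a simple lexer written in C. It recognizes commands, numbers, strings, comments (from # to the end of the line) and newlines, and skips spaces and tabs: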


    #include <stdio.h>
    #include <ctype.h>

    #define NUMBER 400
    #define COMMENT 401
    #define TEXT 402
    #define COMMAND 403

    int lexer(void);

    int main(int argc,char **argv)
    {
        int val;

        while((val = lexer()) != 0)
            printf("value is %d.\n",val);

        return 0;
    }

    int lexer(void)
    {
        int c;

        /* skip spaces and tabs */
        while((c = getchar()) == ' ' || c == '\t');

        if(c == EOF)
            return 0;
        if(c == '.' || isdigit(c)) /* number */
        {
            while((c = getchar()) != EOF && isdigit(c));
            if(c == '.') /* fractional part */
                while((c = getchar()) != EOF && isdigit(c));
            ungetc(c,stdin);

            return NUMBER;
        }

        if(c == '#') /* comment: runs to the end of the line */
        {
            while((c = getchar()) != EOF && c != '\n');
            ungetc(c,stdin);

            return COMMENT;
        }
        if(c == '"') /* string: ends at the closing quote or at a newline */
        {
            while((c = getchar()) != EOF && c != '"' && c != '\n');

            if(c == '\n')
                ungetc(c,stdin);

            return TEXT;
        }
        if(isalpha(c)) /* command */
        {
            while((c = getchar()) != EOF && isalnum(c));

            ungetc(c,stdin);

            return COMMAND;
        }
        return c;
    }

You can compile and run this to see what it does. We won't do that here, since the point is the comparison.

Now the same lexer written with lex:


    %{
        #define NUMBER 400
        #define COMMENT 401
        #define TEXT 402
        #define COMMAND 403
    %}

    %%
    [ \t]+  ;
    [0-9]+  |
    [0-9]+\.[0-9]+  |
    \.[0-9]+    { return NUMBER; }
    #.* { return COMMENT; }
    \"[^\"\n]*\" { return TEXT; }
    [a-zA-Z][a-zA-Z0-9]*    { return COMMAND; }
    \n  { return '\n'; }
    %%

    #include <stdio.h>

    int main(int argc,char **argv)
    {
        int val;

        while(val = yylex())
            printf("value is %d\n",val);

        return 0;
    }

    int yywrap()
    {
        return 1;
    }

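Clearly the lex version is about a third the length of the C lexer. In our experience the number of bugs in a program is roughly proportional to its length, so we would expect the C version to take about three times as long to write and debug.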
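To give a feel for the size difference, here is a sketch of what the three number patterns alone (`[0-9]+`, `[0-9]+\.[0-9]+` and `\.[0-9]+`) look like when coded by hand with lex's longest-match behavior. The function name `match_number` is made up for this illustration; it works on a string rather than on stdin so that it is easy to try out.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Return the length of the longest prefix of `s` that matches one of
 *   [0-9]+   [0-9]+\.[0-9]+   \.[0-9]+
 * or 0 if no prefix matches. */
size_t match_number(const char *s)
{
    assert(s != NULL);
    size_t i = 0;

    /* optional integer part */
    while (isdigit((unsigned char)s[i]))
        i++;
    size_t int_len = i;

    /* a '.' only counts if at least one digit follows it */
    if (s[i] == '.' && isdigit((unsigned char)s[i + 1])) {
        i++;
        while (isdigit((unsigned char)s[i]))
            i++;
        return i;
    }

    /* no fraction: just the integer part, possibly empty */
    return int_len;
}
```

For the input `7.` this matches only the `7`, which is what the lex patterns do as well, because none of them allows a trailing dot. Getting such edge cases right by hand takes some care, whereas in lex they follow directly from the patterns.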

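Hand-written lexers are also prone to subtle bugs. The C lexer above only handles # comments, but consider skipping a C-style comment: to find the end, the lexer looks for a star and then checks whether the next character is a slash. A very common mistake is to forget that the next character may itself be a star, with the slash following it. In that case a comment that ends in two stars is not recognized:


    /** comment **/

So a lexer written by hand in C can harbor bugs you would not anticipate.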
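The double-star case can be handled by remembering whether the previous character was a star, instead of consuming a star and its successor as a pair. The following is a sketch under that assumption; the function name `skip_c_comment` and the choice to scan a string rather than stdin are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* `s` points just past the opening slash-star of a comment.
 * Return the index one past the closing star-slash,
 * or -1 if the comment is never closed. */
int skip_c_comment(const char *s)
{
    assert(s != NULL);
    int seen_star = 0;  /* was the previous character a star? */

    for (int i = 0; s[i] != '\0'; i++) {
        if (seen_star && s[i] == '/')
            return i + 1;
        /* A star after a star keeps the flag set, so a run of
         * stars before the slash still terminates the comment. */
        seen_star = (s[i] == '*');
    }
    return -1;
}
```

A buggy version typically reads the character after a star, finds that it is not a slash, and throws both characters away, so the second star of the pair is never considered as the start of the terminator. In lex the same job is usually done with a pattern or a start state, so this bookkeeping does not have to be written by hand.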

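(4): Closing remarks

In the following sections we will look more closely at lex and yacc, and at how to combine them, to build more complex lexical and syntactic analyzers.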
