Writing a Lexical Analyzer by Hand in Python

@author: x1Nge.

Compiler Principles: Basic Lab

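Purpose

The goal of this lab is to design, write, and debug a concrete lexical analysis program, and so deepen my understanding of how lexical analysis works. I use Python as the implementation language.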

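Analysis

To keep the design simple and reduce the difficulty, the lexical analyzer designed here recognizes only the following:

Category	Examples	Sample output (category code, value)
Reserved word	if, int, while	(1,"int")
Identifier	a, b, c	(2,"a")
Integer constant	6, 16, 36	(3,"0b110")
Operator	+, -, *, /, >=	(4,"+")
Separator	; { }	(5,";")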
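To make the category codes in the table concrete, here is a minimal sketch that gives them names and builds the sample outputs. The names TokenKind and make_token are my own illustrative assumptions and do not appear in the lab code, which uses plain strings such as '1' to '5'.

```python
from enum import IntEnum


class TokenKind(IntEnum):
    """Category codes from the table (the member names are hypothetical)."""
    RESERVED = 1
    IDENTIFIER = 2
    INT_CONST = 3
    OPERATOR = 4
    SEPARATOR = 5


def make_token(kind, value):
    """Return a (category code, value) pair like the table's output column."""
    return (int(kind), value)


examples = [
    make_token(TokenKind.RESERVED, "int"),
    make_token(TokenKind.IDENTIFIER, "a"),
    make_token(TokenKind.INT_CONST, bin(6)),  # integer constants are shown in binary
    make_token(TokenKind.OPERATOR, "+"),
    make_token(TokenKind.SEPARATOR, ";"),
]
print(examples)
```

Named codes are only a readability aid; the numeric values stay the same as in the table.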

The basic flowchart is shown below:

[Figure: flowchart of the lexical analyzer]

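Part 1: Writing the Helper Functions

1. Read the next character

# Read the next character; returns '' once the input is exhausted
def get_char():
    global p
    # p advances even at the end of input so that retract_pointer() stays balanced
    temp_ch = res_lst[p] if p < len(res_lst) else ''
    p += 1
    return temp_ch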

Note: in real-world code, using global inside a function to modify a module-level variable is generally avoided.
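One common way to avoid global is to keep the character list and the position together in a small object. The sketch below is only an illustration under that assumption; the class name CharReader and its method names are hypothetical and are not used anywhere else in this lab.

```python
class CharReader:
    """Hypothetical holder for the character list and the position pointer."""

    def __init__(self, text):
        self.chars = list(text)
        self.pos = 0

    def get_char(self):
        # Return '' once the input is exhausted, but still advance so that
        # retract() remains the exact inverse of get_char().
        ch = self.chars[self.pos] if self.pos < len(self.chars) else ''
        self.pos += 1
        return ch

    def retract(self):
        # Step back one character position.
        self.pos -= 1


reader = CharReader("a+")
first = reader.get_char()
second = reader.get_char()
end = reader.get_char()   # past the end of the input
reader.retract()
print(first, second, repr(end), reader.pos)
```

With this shape every function receives the reader explicitly, so several inputs can be scanned in one program without sharing state.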

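2. Skip blanks until ch holds a non-blank character

# Skip blanks until a non-blank character has been read
def get_blank_ch(temp_ch_1):
    # loop rather than test once, so that consecutive blanks are all skipped
    while temp_ch_1 == ' ':
        temp_ch_1 = get_char()
    return temp_ch_1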

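3. Append the character in ch to str_get

# Append the character in ch to str_get
def ch_append():
    # not needed: Python's built-in list.append is called directly
    return

A side note: when I first planned this lexical analyzer I had not settled on an implementation language, so a few of the functions do not actually need to be implemented here.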

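4. Look up str_get in the reserved-word table; return 1 if it is found, otherwise 0

# Look up str_result in the reserved-word table; return 1 if it is found, otherwise 0
def is_reserved_word( str_result ):
    check_client = pymongo.MongoClient("mongodb://localhost:27017/")
    check_db = check_client["PrincipleOfCompiler"]
    check_col = check_db["ReservedWord"]
    check_query = {"content" : str_result}
    # find_one returns None when no document matches, so any hit means
    # the string is in the reserved-word table
    found = check_col.find_one(check_query) is not None
    check_client.close()
    return 1 if found else 0

A side note: I keep the reserved-word table in a database, MongoDB in this case, though MySQL or any other database would work just as well. To use it from Python you need to install and import the driver package, which for MongoDB is pymongo: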

pip install pymongo

import pymongo
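If no database is available, the same lookup can be approximated with an in-memory set. This is a swapped-in alternative to the MongoDB table, not what the lab code does, and the words listed are only examples, since the actual contents of the ReservedWord collection are not shown in this post. The name is_reserved_word_local is hypothetical.

```python
# Hypothetical in-memory reserved-word table; the entries are examples only.
RESERVED_WORDS = frozenset({"if", "else", "while", "int", "return"})


def is_reserved_word_local(word):
    """Return 1 if word is in the table, otherwise 0 (same contract as is_reserved_word)."""
    return 1 if word in RESERVED_WORDS else 0


print(is_reserved_word_local("int"), is_reserved_word_local("a"))
```

A set lookup also avoids opening a new database connection for every identifier that is scanned.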

5. Move the search pointer back by one character

# Move the search pointer back by one character
def retract_pointer():
    global p
    p -= 1

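6. If the token is an identifier, insert the identifier in str_result into the symbol table and return a pointer into the symbol table

# If the token is an identifier, insert the identifier in str_result into the symbol table and return a pointer into the symbol table
def insert_identifier( str_result ):
    # simplified: returns the identifier itself instead of a table pointer
    return str_result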

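7. If the token is a constant, insert the constant in str_result into the constant table and return a pointer into the constant table

# If the token is a constant, insert the constant in str_result into the constant table and return a pointer into the constant table
def insert_constant( str_result ):
    # simplified: returns the value as a binary string such as '0b110'
    return bin(int(str_result))

A side note: to keep things simple, both functions return the value itself rather than a table pointer (integer constants are returned in binary).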
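For comparison, here is a rough sketch of what inserting into a table and returning a pointer could look like, with a list index standing in for the pointer. The Table class and the two *_indexed functions are hypothetical names introduced only for this illustration; the lab code does not use them.

```python
class Table:
    """Hypothetical symbol or constant table: stores each entry once, returns its index."""

    def __init__(self):
        self.entries = []   # entries in insertion order
        self.index = {}     # entry -> position in self.entries

    def insert(self, item):
        # Reuse the existing slot when the item was already inserted.
        if item not in self.index:
            self.index[item] = len(self.entries)
            self.entries.append(item)
        return self.index[item]


symbols = Table()
constants = Table()


def insert_identifier_indexed(name):
    """Insert an identifier and return its index in the symbol table."""
    return symbols.insert(name)


def insert_constant_indexed(text):
    """Insert an integer constant and return its index in the constant table."""
    return constants.insert(int(text))


print(insert_identifier_indexed("a"), insert_identifier_indexed("b"),
      insert_identifier_indexed("a"), insert_constant_indexed("10"))
```

Returning an index keeps each token small and lets later compiler phases attach more information to the table entry.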

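Part 2: Writing the Main Function

Next comes the main function, which we analyze following the structure of the flowchart:

Note: I wrote all of the main logic in a single function. The snippets below are excerpts from inside that function, shown for analysis only; the complete source code appears further down.

Initialization

    # Initialization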
    result = []
    str_get = []
    ch = get_char()
    new_ch = get_blank_ch(ch)

Recognizing identifiers

# Recognize an identifier
    if  new_ch.isalpha() or new_ch == '_' or new_ch == '$':
        str_get.append(new_ch)
        new_ch = get_char()
        while new_ch.isalpha() or new_ch.isdigit() or new_ch == '_' or new_ch == '$':
            str_get.append(new_ch)
            new_ch = get_char()
        retract_pointer()
        str_result = ''.join(str_get)
        code = is_reserved_word(str_result)
        if code == 0 :
            value = insert_identifier(str_result)
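            result.append('2') # 2 is the category code for identifiers that are not reserved words
            result.append(value)
            return result
        else:
            result.append('1') # 1 is the category code for reserved words
            result.append(str_result) # in this lab's examples the value is the reserved word itself
            # Alternative, since a reserved word has no value of its own:
            # result.append('-')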
            return result

Recognizing integer constants

    # Recognize an integer constant
    elif new_ch.isdigit():
        str_get.append(new_ch)
        new_ch = get_char()
        while new_ch.isdigit():
            str_get.append(new_ch)
            new_ch = get_char()
        retract_pointer()
        str_result = ''.join(str_get)
        value = insert_constant(str_result)
        result.append('3') # 3 is the category code for integer constants
        result.append(value)
        return result

Recognizing operators

# Recognize an operator
    elif new_ch == '=' or new_ch == '+' or new_ch == '-' or new_ch == '*' or new_ch == '/' or new_ch == '>'\
        or new_ch == '<' or new_ch == '!' or new_ch == '%':
        if new_ch == '>' or new_ch == '<' or new_ch == '!':
            str_get.append(new_ch)
            value = ''.join(new_ch)
            new_ch = get_char()
            if new_ch == '=':
                str_get.append(new_ch)
                str_result = ''.join(str_get)
                result.append('4') # 4 is the category code for operators
                result.append(str_result)
                return result
            else:
                retract_pointer()
                result.append('4')
                result.append(value)
                return result
        elif new_ch == '*':
            str_get.append(new_ch)
            value = ''.join(new_ch)
            new_ch = get_char()
            if new_ch == '*':
                str_get.append(new_ch)
                str_result = ''.join(str_get)
                result.append('4')
                result.append(str_result)
                return result
            else:
                retract_pointer()
                result.append('4')
                result.append(value)
                return result
        else:
            value = ''.join(new_ch)
            result.append('4')
            result.append(value)
            return  result

Recognizing separators

# Recognize a separator
    elif new_ch == ',' or new_ch == ';' or new_ch == '{' or new_ch == '}' or new_ch == '(' or new_ch == ')':
        value = ''.join(new_ch)
        result.append('5') # 5 is the category code for separators
        result.append(value)
        return result

Everything else

    else:
        result.append("Error.")
        return result

Note: only a few of the main separators and operators are checked here. To check all of them, look each candidate up in a complete table.
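The table lookup mentioned in the note could look roughly like the sketch below. It is an assumption of mine rather than the lab's implementation: the two sets contain only the symbols that the branches above already handle, and match_symbol is a hypothetical helper that tries the two-character candidate first so that '>=' is preferred over '>'.

```python
# Symbols handled by the branches above; extend these sets to cover more.
OPERATORS = {"+", "-", "*", "/", "%", "=", "<", ">", "!",
             "<=", ">=", "!=", "**"}
SEPARATORS = {",", ";", "{", "}", "(", ")"}


def match_symbol(text, pos):
    """Return [category code, lexeme] for the symbol at text[pos], or None.

    Longer candidates are tried first (longest match), so '>=' wins over '>'.
    """
    for length in (2, 1):
        lexeme = text[pos:pos + length]
        if len(lexeme) != length:
            continue  # not enough characters left for this candidate
        if lexeme in OPERATORS:
            return ['4', lexeme]
        if lexeme in SEPARATORS:
            return ['5', lexeme]
    return None


print(match_symbol("a>=b", 1), match_symbol("a>b", 1), match_symbol(";", 0))
```

Adding a new operator then means adding one entry to a set instead of another elif branch.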

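Part 3: Remaining Code

Reading and preprocessing the file. Here I strip newlines and \t characters and replace each pair of spaces with a single space. If you want to strip all spaces, remember to keep the ones that separate tokens, as in int a.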

f = open('codeTest.txt','r')
res = f.read().replace('\n','').replace('\t','').replace('  ',' ')
res_lst = list(res)
print(res)
p = 0 # initialize the position pointer
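Two details of this preprocessing are worth seeing on a small input: deleting a newline or tab outright can glue two tokens together, and a single replace('  ',' ') pass leaves two spaces behind when there were three. The sketch below contrasts it with one possible alternative based on str.split and str.join. The helper name normalize_whitespace is hypothetical and is not part of the lab code.

```python
def normalize_whitespace(source):
    """Collapse every run of spaces, tabs and newlines into a single space."""
    # str.split() with no argument splits on any whitespace run and drops
    # leading and trailing whitespace.
    return ' '.join(source.split())


sample = "int\ta;\nint   b;"

# The chained replace calls used in this post:
chained = sample.replace('\n', '').replace('\t', '').replace('  ', ' ')
print(repr(chained))     # 'inta;int  b;'

# The split/join alternative:
normalized = normalize_whitespace(sample)
print(repr(normalized))  # 'int a; int b;'
```

For the test file used in this post the chained replaces are enough, because every line break there sits next to punctuation.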

Running

while p < len(res_lst):
    print(check_code())
f.close()

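Screenshots and Source Code

My abilities are limited and the program overlooks many cases; suggestions for improvement are welcome.
This post is also published on CSDN and serves only as a lab record.

Example: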

main()
{
	int  a,b;
	a = 10;
  	b = a + 20;
}

Run:

[Screenshot: program output]

Source code:

import pymongo

"""
code and value are not initialized for now
"""

f = open('codeTest.txt','r')
res = f.read().replace('\n','').replace('\t','').replace('  ',' ')
res_lst = list(res)
print(res)
p = 0 # 初始化位置指針

def check_code():
    # Initialization
    result = []
    str_get = []
    ch = get_char()
    new_ch = get_blank_ch(ch)
    # Recognize an identifier
    if  new_ch.isalpha() or new_ch == '_' or new_ch == '$':
        str_get.append(new_ch)
        new_ch = get_char()
        while new_ch.isalpha() or new_ch.isdigit() or new_ch == '_' or new_ch == '$':
            str_get.append(new_ch)
            new_ch = get_char()
        retract_pointer()
        str_result = ''.join(str_get)
        code = is_reserved_word(str_result)
        if code == 0 :
            value = insert_identifier(str_result)
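            result.append('2') # 2 is the category code for identifiers that are not reserved words
            result.append(value)
            return result
        else:
            result.append('1') # 1 is the category code for reserved words
            result.append(str_result) # in this lab's examples the value is the reserved word itself
            # Alternative, since a reserved word has no value of its own:
            # result.append('-')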
            return result
    # Recognize an integer constant
    elif new_ch.isdigit():
        str_get.append(new_ch)
        new_ch = get_char()
        while new_ch.isdigit():
            str_get.append(new_ch)
            new_ch = get_char()
        retract_pointer()
        str_result = ''.join(str_get)
        value = insert_constant(str_result)
        result.append('3') # 3 is the category code for integer constants
        result.append(value)
        return result
    # Recognize an operator
    elif new_ch == '=' or new_ch == '+' or new_ch == '-' or new_ch == '*' or new_ch == '/' or new_ch == '>'\
        or new_ch == '<' or new_ch == '!' or new_ch == '%':
        if new_ch == '>' or new_ch == '<' or new_ch == '!':
            str_get.append(new_ch)
            value = ''.join(new_ch)
            new_ch = get_char()
            if new_ch == '=':
                str_get.append(new_ch)
                str_result = ''.join(str_get)
                result.append('4') # 4 is the category code for operators
                result.append(str_result)
                return result
            else:
                retract_pointer()
                result.append('4')
                result.append(value)
                return result
        elif new_ch == '*':
            str_get.append(new_ch)
            value = ''.join(new_ch)
            new_ch = get_char()
            if new_ch == '*':
                str_get.append(new_ch)
                str_result = ''.join(str_get)
                result.append('4')
                result.append(str_result)
                return result
            else:
                retract_pointer()
                result.append('4')
                result.append(value)
                return result
        else:
            value = ''.join(new_ch)
            result.append('4')
            result.append(value)
            return  result
    # Recognize a separator
    elif new_ch == ',' or new_ch == ';' or new_ch == '{' or new_ch == '}' or new_ch == '(' or new_ch == ')':
        value = ''.join(new_ch)
        result.append('5') # 5 is the category code for separators
        result.append(value)
        return result
    else:
        result.append("Error.")
        return result

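# Read the next character; returns '' once the input is exhausted
def get_char():
    global p
    # p advances even at the end of input so that retract_pointer() stays balanced
    temp_ch = res_lst[p] if p < len(res_lst) else ''
    p += 1
    return temp_ch

# Skip blanks until a non-blank character has been read
def get_blank_ch(temp_ch_1):
    # loop rather than test once, so that consecutive blanks are all skipped
    while temp_ch_1 == ' ':
        temp_ch_1 = get_char()
    return temp_ch_1

# Append the character in ch to str_get
def ch_append():
    # not needed: Python's built-in list.append is called directly
    return

# Look up str_result in the reserved-word table; return 1 if it is found, otherwise 0
def is_reserved_word( str_result ):
    check_client = pymongo.MongoClient("mongodb://localhost:27017/")
    check_db = check_client["PrincipleOfCompiler"]
    check_col = check_db["ReservedWord"]
    check_query = {"content" : str_result}
    # find_one returns None when no document matches, so any hit means
    # the string is in the reserved-word table
    found = check_col.find_one(check_query) is not None
    check_client.close()
    return 1 if found else 0

# Move the search pointer back by one character
def retract_pointer():
    global p
    p -= 1

# If the token is an identifier, insert the identifier in str_result into the symbol table and return a pointer into the symbol table
def insert_identifier( str_result ):
    # simplified: returns the identifier itself instead of a table pointer
    return str_result

# If the token is a constant, insert the constant in str_result into the constant table and return a pointer into the constant table
def insert_constant( str_result ):
    # simplified: returns the value as a binary string such as '0b110'
    return bin(int(str_result))

while p < len(res_lst):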
    print(check_code())
f.close()
