Compilers: The Lexical Analyzer

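Definition

(From Wikipedia)
Lexical analysis is the process in computer science of converting a sequence of characters into a sequence of tokens. A program or function that performs lexical analysis is called a lexical analyzer (lexer for short), also known as a scanner. A lexer usually takes the form of a function that the parser calls.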

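Goal

Create a lexical analyzer for the simple programming language specified below. The program should read input from a file and/or stdin, and write output to a file and/or stdout. If the implementation language has a lexer module/library/class, it is worth providing two versions of the solution: one without the lexer module and one with it.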

Supported tokens
Operators

Name	Common name	Character sequence
Op_multiply	multiply	*
Op_divide	divide	/
Op_mod	mod	%
Op_add	plus	+
Op_subtract	minus	-
Op_negate	unary minus	-
Op_less	less than	<
Op_lessequal	less than or equal	<=
Op_greater	greater than	>
Op_greaterequal	greater than or equal	>=
Op_equal	equal	==
Op_notequal	not equal	!=
Op_not	unary not	!
Op_assign	assignment	=
Op_and	logical and	&&
Op_or	logical or	||

Symbols

Name	Common name	Character
LeftParen	left parenthesis	(
RightParen	right parenthesis	)
LeftBrace	left brace	{
RightBrace	right brace	}
Semicolon	semi-colon	;
Comma	comma	,

Keywords

Name	Character sequence
Keyword_if	if
Keyword_else	else
Keyword_while	while
Keyword_print	print
Keyword_putc	putc

Identifiers and literals
These differ from the previous tokens in that each one has a value associated with it.

Name	Common name	Format description	Format regex	Value
Identifier	identifier	one or more letter/number/underscore characters, but not starting with a number	`[_a-zA-Z][_a-zA-Z0-9]*`	as is
Integer	integer literal	one or more digits	`[0-9]+`	as is, interpreted as a number
Integer	char literal	exactly one character (anything except newline or single quote) or one of the allowed escape sequences, enclosed by single quotes	`'([^'\n]|\\n|\\\\)'`	the character's code point number, e.g. 65 for 'A' and 10 for '\n'
String	string literal	zero or more characters (anything except newline or double quote), enclosed by double quotes	`"[^"\n]*"`	the characters without the double quotes and with escape sequences converted

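For char and string literals, the \n escape sequence is supported to represent a newline character.
For char and string literals, use \\ to represent a backslash.
No other special sequences are supported. This means that:
- A character literal cannot represent the single quote character (value 39).
- A string literal cannot represent a string that contains a double quote character.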
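The rules for character literals can be sketched as a small decoding function. This is only an illustration under stated assumptions: the name `char_literal_value` and the wording of the error messages are made up here, and the function expects the full literal text including both single quotes.

```python
# Hypothetical helper; the name and error messages are illustrative only.
ESCAPES = {"n": "\n", "\\": "\\"}  # the only two escapes the language allows


def char_literal_value(text):
    """Return the integer value of a char literal given with its quotes."""
    if len(text) < 2 or text[0] != "'" or text[-1] != "'":
        raise ValueError("not a character literal")
    body = text[1:-1]
    if body == "":
        raise ValueError("empty character constant")
    if body[0] == "\\":
        # A backslash must be followed by exactly one allowed escape letter.
        if len(body) < 2 or body[1] not in ESCAPES:
            raise ValueError("unknown escape sequence")
        if len(body) > 2:
            raise ValueError("multi-character constant")
        return ord(ESCAPES[body[1]])
    if len(body) > 1:
        raise ValueError("multi-character constant")
    if body in ("\n", "'"):
        raise ValueError("character not allowed in a character literal")
    return ord(body)
```

With this sketch, the three character literals from the third test case (newline escape, backslash escape, and a space) decode to 10, 92, and 32.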

Zero-width tokens

Name	Location
End_of_input	when the end of the input stream is reached

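White space

Zero or more whitespace characters, or comments enclosed in /* ... */, are allowed between any two tokens, with the exceptions noted below.
"Longest token matching" is used to resolve conflicts (for example, <= is matched as a single token rather than as the two tokens < and =).
Whitespace is required between two tokens that consist of alphanumeric characters or underscores.
- This applies to keywords, identifiers, and integer literals.
- For example, ifprint is recognized as a single identifier, not as the keywords if and print.
- For example, 42fred is invalid; it is recognized neither as a number nor as an identifier.
Whitespace is not allowed inside tokens (except in chars and strings, where it is part of the value).
- For example, & & is invalid and is not interpreted as the && operator.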
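The longest-match rule and the rule for alphanumeric tokens can be sketched as two small helpers. The function names `match_operator` and `classify_word` are hypothetical, and the error message is illustrative. Note that `-` is always reported as Op_subtract here; telling it apart from Op_negate needs syntactic context, and the third test case further down likewise reports the unary minus as Op_subtract.

```python
import re

OPERATORS = {
    "<=": "Op_lessequal", ">=": "Op_greaterequal", "==": "Op_equal",
    "!=": "Op_notequal", "&&": "Op_and", "||": "Op_or",
    "<": "Op_less", ">": "Op_greater", "=": "Op_assign", "!": "Op_not",
    "*": "Op_multiply", "/": "Op_divide", "%": "Op_mod",
    "+": "Op_add", "-": "Op_subtract",
}
KEYWORDS = {"if", "else", "while", "print", "putc"}
WORD = re.compile(r"[_a-zA-Z0-9]+")  # maximal run of word characters


def match_operator(src, pos):
    """Return (token_name, length) for the longest operator at pos, or None."""
    for length in (2, 1):  # try the two-character operators first
        text = src[pos:pos + length]
        if len(text) == length and text in OPERATORS:
            return OPERATORS[text], length
    return None


def classify_word(src, pos):
    """Classify the maximal run of letters, digits and underscores at pos."""
    m = WORD.match(src, pos)
    if m is None:
        return None
    word = m.group()
    if word[0].isdigit():
        if not word.isdigit():
            raise ValueError(f"invalid number: {word}")
        return "Integer", word
    if word in KEYWORDS:
        return "Keyword_" + word, word
    return "Identifier", word
```

Because the whole run of word characters is taken before it is classified, `ifprint` comes out as one Identifier and `42fred` is rejected, while `& &` matches no operator at all because a lone `&` is not in the table.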

As an illustration of the whitespace rules, the following two program fragments are equivalent:

if ( p /* meaning n is prime */ ) {
    print ( n , " " ) ;
    count = count + 1 ; /* number of primes found so far */
}

if(p){print(n," ");count=count+1;}
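One way to check that the two fragments above are equivalent is to compare their token texts after spaces and comments are discarded. The sketch below uses Python's `re` module; the pattern name `TOKEN` and the function `token_texts` are hypothetical, character literals are left out for brevity, and the only diagnostics are a generic "unrecognized input" error and a check for an unterminated comment.

```python
import re

# Alternatives are tried left to right, so "skip" precedes the "/" operator
# and two-character operators precede their one-character prefixes.
TOKEN = re.compile(r"""
    (?P<skip>\s+|/\*.*?\*/)        # white space or a complete comment
  | (?P<string>"[^"\n]*")          # string literal, kept as written
  | (?P<word>[_a-zA-Z0-9]+)        # keyword, identifier or integer
  | (?P<op><=|>=|==|!=|&&|\|\||[-*/%+<>=!(){};,])
""", re.VERBOSE | re.DOTALL)


def token_texts(src):
    """Return the text of each token in src, ignoring spaces and comments."""
    out, pos = [], 0
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if m is None:
            raise ValueError(f"unrecognized input at offset {pos}")
        if m.lastgroup != "skip":
            if src.startswith("/*", pos):
                # "/*" that did not match as a comment has no closing "*/".
                raise ValueError("end-of-file in comment")
            out.append(m.group())
        pos = m.end()
    return out
```

Calling `token_texts` on both fragments gives the same list of token texts, beginning with `if`, `(`, `p`, `)`. Only the line and column positions, which this sketch does not track, would differ.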

Complete list of token names

End_of_input  Op_multiply   Op_divide     Op_mod       Op_add     Op_subtract
Op_negate     Op_not        Op_less       Op_lessequal Op_greater Op_greaterequal
Op_equal      Op_notequal   Op_assign     Op_and       Op_or      Keyword_if
Keyword_else  Keyword_while Keyword_print Keyword_putc LeftParen  RightParen
LeftBrace     RightBrace    Semicolon     Comma        Identifier Integer
String

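Output format
The program output should be a sequence of lines, each consisting of the following whitespace-separated fields:

- the line number where the token starts
- the column number where the token starts
- the token name
- the token value (only for Identifier, Integer, and String tokens)

The number of spaces between fields is up to you; neatly aligned output is preferred.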
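A possible way to produce such lines is a single formatting helper. The function name `format_token` and the column widths (5, 6, and 15) are assumptions chosen for this sketch; the specification leaves the spacing open.

```python
def format_token(line, col, name, value=None):
    """Render one output line: line number, column number, name, value."""
    text = f"{line:5d} {col:6d} {name:<15}"
    if value is not None:
        text += str(value)
    # Tokens without a value would otherwise end in padding spaces.
    return text.rstrip()


if __name__ == "__main__":
    print(format_token(4, 1, "Keyword_print"))
    print(format_token(4, 7, "String", '"Hello, World!\\n"'))
```

Right-aligning the two numbers and left-aligning the name keeps the columns lined up as long as names stay within 15 characters, which holds for every token name in the list above.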

Diagnostics
The following error conditions should be caught:

Error	Example
Empty character constant	`''`
Unknown escape sequence.	`\r`
Multi-character constant.	`'xx'`
End-of-file in comment. Closing comment characters not found.
End-of-file while scanning string literal. Closing string character not found.
End-of-line while scanning string literal. Closing string character not found before end-of-line.
Unrecognized character.	`|`
Invalid number. Starts like a number, but ends in non-numeric characters.	`123abc`
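The string-literal diagnostics from the table can be sketched as a scanning function that raises an error carrying the token's start position. `LexError` and `scan_string` are hypothetical names, and the message texts are paraphrased from the table rather than prescribed by it.

```python
class LexError(Exception):
    """Hypothetical error type carrying the position of the problem."""

    def __init__(self, line, col, message):
        super().__init__(f"({line}, {col}) {message}")
        self.line, self.col = line, col


def scan_string(src, pos, line, col):
    """Scan a string literal whose opening double quote is at src[pos].

    Returns (raw text including both quotes, index after the closing quote).
    """
    end = pos + 1
    while end < len(src):
        ch = src[end]
        if ch == '"':
            return src[pos:end + 1], end + 1
        if ch == "\n":
            raise LexError(line, col, "end-of-line while scanning string literal")
        if ch == "\\":
            # Only \n and \\ are allowed escape sequences.
            if src[end + 1:end + 2] not in ("n", "\\"):
                raise LexError(line, col, "unknown escape sequence")
            end += 1  # skip the escaped character
        end += 1
    raise LexError(line, col, "end-of-file while scanning string literal")
```

The returned text still contains the escape sequences as written; converting them to their values would be a separate step.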

Test cases

Test case 1

Input

/*
  Hello world
 */
print("Hello, World!\n");

Output

    4      1 Keyword_print
    4      6 LeftParen
    4      7 String         "Hello, World!\n"
    4     24 RightParen
    4     25 Semicolon
    5      1 End_of_input
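The positions in this output are 1-based, and the comment lines count toward the line number. A minimal sketch of how a position can be derived from a character offset follows; the function name `line_col` is an assumption, and a real lexer would more likely update the counters incrementally while scanning.

```python
def line_col(src, offset):
    """Return the 1-based (line, column) of the character at src[offset]."""
    line = src.count("\n", 0, offset) + 1
    last_newline = src.rfind("\n", 0, offset)  # -1 when offset is on line 1
    return line, offset - last_newline


if __name__ == "__main__":
    program = '/*\n  Hello world\n */\nprint("Hello, World!\\n");\n'
    print(line_col(program, program.index("print")))  # start of the keyword
    print(line_col(program, len(program)))            # end of input
```

For the input of this test case, the offset of `print` maps to line 4, column 1, and the offset just past the final newline maps to line 5, column 1, which is where End_of_input is reported.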

Test case 2

Input

/*
  Show Ident and Integers
 */
phoenix_number = 142857;
print(phoenix_number, "\n");

Output

    4      1 Identifier     phoenix_number
    4     16 Op_assign
    4     18 Integer         142857
    4     24 Semicolon
    5      1 Keyword_print
    5      6 LeftParen
    5      7 Identifier     phoenix_number
    5     21 Comma
    5     23 String         "\n"
    5     27 RightParen
    5     28 Semicolon
    6      1 End_of_input

Test case 3

Input

/*
  All lexical tokens - not syntactically correct, but that will
  have to wait until syntax analysis
 */
/* Print   */  print    /* Sub     */  -
/* Putc    */  putc     /* Lss     */  <
/* If      */  if       /* Gtr     */  >
/* Else    */  else     /* Leq     */  <=
/* While   */  while    /* Geq     */  >=
/* Lbrace  */  {        /* Eq      */  ==
/* Rbrace  */  }        /* Neq     */  !=
/* Lparen  */  (        /* And     */  &&
/* Rparen  */  )        /* Or      */  ||
/* Uminus  */  -        /* Semi    */  ;
/* Not     */  !        /* Comma   */  ,
/* Mul     */  *        /* Assign  */  =
/* Div     */  /        /* Integer */  42
/* Mod     */  %        /* String  */  "String literal"
/* Add     */  +        /* Ident   */  variable_name
/* character literal */  '\n'
/* character literal */  '\\'
/* character literal */  ' '

Output

    5     16   Keyword_print
    5     40   Op_subtract
    6     16   Keyword_putc
    6     40   Op_less
    7     16   Keyword_if
    7     40   Op_greater
    8     16   Keyword_else
    8     40   Op_lessequal
    9     16   Keyword_while
    9     40   Op_greaterequal
   10     16   LeftBrace
   10     40   Op_equal
   11     16   RightBrace
   11     40   Op_notequal
   12     16   LeftParen
   12     40   Op_and
   13     16   RightParen
   13     40   Op_or
   14     16   Op_subtract
   14     40   Semicolon
   15     16   Op_not
   15     40   Comma
   16     16   Op_multiply
   16     40   Op_assign
   17     16   Op_divide
   17     40   Integer             42
   18     16   Op_mod
   18     40   String          "String literal"
   19     16   Op_add
   19     40   Identifier      variable_name
   20     26   Integer             10
   21     26   Integer             92
   22     26   Integer             32
   23      1   End_of_input

Test case 4

Input

/*** test printing, embedded \n and comments with lots of '*' ***/
print(42);
print("\nHello World\nGood Bye\nok\n");
print("Print a slash n - \\n.\n");

Output

    2      1 Keyword_print
    2      6 LeftParen
    2      7 Integer            42
    2      9 RightParen
    2     10 Semicolon
    3      1 Keyword_print
    3      6 LeftParen
    3      7 String          "\nHello World\nGood Bye\nok\n"
    3     38 RightParen
    3     39 Semicolon
    4      1 Keyword_print
    4      6 LeftParen
    4      7 String          "Print a slash n - \\n.\n"
    4     33 RightParen
    4     34 Semicolon
    5      1 End_of_input

Implementations

C implementation
C# implementation
Go implementation
Java implementation
JavaScript implementation
Python implementation
