Xcalibyte Classroom | A Complete Primer on Compiler Technology

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Over the past decade, Moore's law has been losing force and chip performance has been approaching its ceiling. Compiler technology, the cornerstone of power-consumption control and optimization, is back in the spotlight. We invited Lai Jianxin (賴建新), head of R&D for Xcalscan (愛科識), the static code analysis tool from Xcalibyte (鑑釋), to introduce compiler technology in plain language and with examples.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"This talk is organized around six questions:","attrs":{}}]},{"type":"numberedlist","attrs":{"start":1,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":1,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"What is compiler technology?","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":1,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"What background do developers new to compiler technology need?","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":1,"number":3,"align":null,"origin":null},"content":[{"type":"text","text":"What are the key challenges facing modern compilers today?","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":1,"number":4,"align":null,"origin":null},"content":[{"type":"text","text":"Which part of a compiler is the most important?","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":1,"number":5,"align":null,"origin":null},"content":[{"type":"text","text":"Beyond generating code that runs in a process or a VM, where else is compiler technology used?","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":1,"number":6,"align":null,"origin":null},"content":[{"type":"text","text":"Why is data flow analysis more effective at finding program problems (including bugs and security vulnerabilities)?","attrs":{}}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"alig
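n":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"1. What is compiler technology?","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Compiler technology was born together with high-level programming languages. In the narrow sense, it means translating a high-level language into machine code. Without compilers, programmers would have to write programs directly in machine code, that is, numeric codes made of 0s and 1s, which is practically impossible at the scale of today's software.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"So do not underestimate the compiler: it runs through the whole of software development from the bottom up, and it is present everywhere from hardware to applications.","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/fb/fb0d55c72b3a1f17c1715f2a5c319e12.jpeg","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Besides the familiar compilers from high-level languages to machine code, compiler technology is also widely used in source-to-source compilers from domain-specific languages (DSLs) to general-purpose programming languages (for example smart-contract compilers in blockchain, and the many compilers from high-level languages to JavaScript in web development), in high-level language interpreters (for example Python) and virtual machines (for example the Java VM and WebAssembly VMs), and in binary translation (for example, on the Apple Silicon M1, Rosetta 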
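2).","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As modern software keeps growing and technology stacks become more complex, only by making full use of compiler technology can we strike a better balance among program correctness, stability, security and performance, code readability, maintainability and portability, and developer productivity.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"2. What background do developers new to compiler technology need?","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Familiarity with C and C++ and with common data structures, especially algorithms for building, searching, traversing and transforming trees and graphs, is the essential entry ticket.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"More advanced compiler development also requires a deep understanding of the source language, the target language or instruction set, and the microarchitecture.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"3. What are the key challenges facing modern compilers today?","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Since the invention of FORTRAN, the first widely used high-level language, compiler technology has evolved alongside high-level languages and computer technology. Early compilers focused on code generation from high-level languages to machine code, and the main problems they solved were instruction selection and register allocation.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As the abstraction level of high-level languages keeps rising and modern processors and memory systems grow more complex, the importance of compiler optimization has become evident. Compiler optimizations are generally divided into machine-independent and machine-dependent optimizations.","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The former helps eliminate the extra overhead introduced by high-level language abstractions while improving programmer productivity and making programs more readable, maintainable and portable.","attrs":{}}]}]},{"type":"listitem","attr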
s":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The latter helps programs make full use of the resources and characteristics of modern processors and memory systems to improve run-time performance.","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"zerowidth","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"With the arrival of open source software and cloud computing, software development has gradually shifted from closed, fully in-house development to open development that assembles or integrates open source components. It has also moved from single-machine environments using one programming language, to client/server environments using two languages, and on to complex cloud-edge-device environments using many languages. These changes bring new challenges for compilers, for example:","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"How can the compiler itself be optimized to better support building and optimizing very large software systems (for example Android and Chromium)?","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Beyond code generation, how can compiler technology be applied during development to improve programmer productivity and address code quality problems?","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In mixed-language development or third-party library integration, how can compiler technology guard against bugs that arise across language boundaries or third-party library boundaries?","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"These are all key challenges facing compilers today.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"4. Which part of a compiler is the most important?","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"First, let us look at the overall architecture of a compiler:","attrs":{}}]},{"type":"image","attr
s":{"src":"https://static001.geekbang.org/infoq/5c/5c5e38e209301c3da7625272a97133e8.jpeg","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The compiler front end is responsible for converting the source language into an intermediate representation. It comprises lexical analysis, syntax analysis, syntax checking and intermediate code generation.","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Lexical analysis and syntax analysis","attrs":{}},{"type":"text","text":" are both based on grammar and automata theory and had matured by the 1970s.","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Syntax checking","attrs":{}},{"type":"text","text":" traverses the abstract syntax tree (AST) produced by syntax analysis and checks whether the tree's nodes and their hierarchy conform to the source language specification.","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Intermediate code generation","attrs":{}},{"type":"text","text":" simplifies and normalizes the abstract syntax tree and then emits the compiler-defined intermediate form (Intermediate 
Representation, IR).","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"zerowidth","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The compiler back end is responsible for taking the intermediate representation through compiler analysis and optimization and then generating assembly code or object code.","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"The pre-link phase","attrs":{}},{"type":"text","text":" is a prerequisite for inter-procedural analysis. It applies the same linking rules and order as the final link step to resolve the targets of global function and variable references, and builds the call graph on that basis.","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Compiler analysis","attrs":{}},{"type":"text","text":" mainly includes intra-procedural control flow analysis and data flow analysis, context analysis of function calls, and alias analysis. These analyses supply the evidence and strategies that later optimization and code generation rely on.","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Machine-independent optimization","attrs":{}},{"type":"text","text":" includes optimizations across procedures, over loops and on scalar values (Inter-Procedural Optimization, Loop Optimization, Scalar 
Optimization), among others; its goals and criteria do not depend on any specific processor architecture.","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Machine-dependent optimization","attrs":{}},{"type":"text","text":" includes instruction selection and scheduling, register allocation, and optimizations targeting specific hardware structures.","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"The code generation phase","attrs":{}},{"type":"text","text":" takes the intermediate representation after register allocation and produces object code through a built-in or external assembler. The object code is finally linked by the linker into an executable or a loadable module.","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"zerowidth","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Among the parts of a compiler described above, compiler analysis is becoming increasingly important","attrs":{}},{"type":"text","text":". It is not only essential for optimization; for emerging applications of compiler technology such as static code scanning, it is the most critical step. Intra-procedural control flow analysis and data flow analysis are mature and widely used across compiler implementations, whereas alias analysis and context analysis still face major challenges.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"From this perspective, alias analysis and context analysis have become the most important parts of a compiler:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"——— Alias analysis 
———","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Alias analysis determines whether the memory or objects referenced by two pointers or references overlap completely, overlap partially, or are independent of each other. Partial overlap arises in languages such as C/C++ that allow union data types or pointer arithmetic.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The difficulties of alias analysis:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"On the one hand, tracking the definition-use chains of pointer variables in every calling context and along every possible execution path of a function is very hard, and the algorithmic complexity is exponential.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"On the other hand, for pointers stored in randomly accessible structures such as arrays, or in recursive data structures such as linked lists, trees and graphs, alias analysis can rarely produce precise results.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Alias analysis matters a great deal for both optimization and static scanning. The C example below assumes that a, b, c and d are integer variables and that p and q are pointers to integers:","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/ad/ad4b415e3a7b737c3546b761ff79b211.jpeg","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"——— Context analysis 
———","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Context analysis places a function in the context of its call site: it first determines the values of the arguments and global variables before the call, then analyzes the function's behavior to determine how the return value and global variables have changed once the call returns.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The difficulties of context analysis:","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"First, the cost of fully context-sensitive analysis is exponential in the depth of the call chain. For example, if function A calls function B at 3 different sites and function B calls function C at 3 different sites, then function C has 3 × 3 = 9 distinct contexts to analyze;","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Second, as software grows and modular design pushes toward single-purpose functions, the number of functions and call sites surges, and the number of contexts can balloon beyond what can be analyzed;","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Third, recursive function calls make it hard to obtain precise context analysis results.","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Context analysis is likewise important for optimization and static scanning:","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/6d/6de5c2c8e1b9f31536d28cac4cdb46d4.jpeg","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"5. Beyond generating code that runs in a process or a VM, where else is compiler technology used?","attrs":{}}]},{"type":"para
graph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Beyond the areas above, compiler technology is currently also applied in:","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Static scanning tools, which use compiler analysis to find bugs or code that violates coding rules;","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"AST-based formatters, refactoring tools and coding-standard checkers that improve program readability and maintainability;","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Program analysis tools integrated with IDEs, which give programmers intelligent suggestions and automatically fix typos in code;","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Source-to-source compilation, which is used to process various domain-specific languages (DSLs);","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Software composition analysis tools, which use compiler technology to determine which third-party components a large software system may depend on and whether those components have known security vulnerabilities, compatibility problems or license issues, so that such problems can be avoided or fixed early in development.","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"zerowidth","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"6. Why is data flow analysis more effective at finding program problems (including bugs and security vulnerabilities)?","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Take as a point of comparison another widely used approach, checking based on the abstract syntax tree (Abstract Syntax 
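Trees, AST). AST-based checkers are typically used for lexical and syntax checks on the input source language and for rule checks based on pattern matching.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For example, given a statement such as int x = y / 0; an AST checker can detect that it contains a division-by-zero error. But if the code is int x = y / z; and z is assigned 0 somewhere in the program by an assignment that reaches the division, execution produces the same division-by-zero error. An AST checker cannot detect this second case because it has difficulty tracking variable definitions and uses and the propagation of values.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Similarly, when a pointer is passed across a function boundary, there is usually a convention that either the caller or the callee checks for a null pointer. When software integrates modules developed by many different organizations, or open source modules, inconsistent conventions between modules lead either to redundant checks that cost performance or to missing checks that cause program errors and security problems.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Because the relevant call statement and the null-pointer check live in different abstract syntax trees, this kind of check is also hard to implement in an AST checker. Combining data flow analysis with context analysis detects such errors well.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Common bugs and security vulnerabilities can often be described by a source-sink model: the data that causes the bug or vulnerability comes from a problematic source, passes through a series of flows (for example being copied to another variable, or written to memory and read back) and finally triggers a program error at the sink.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Take one of the most common security vulnerabilities, accessing memory after it has been released (Use After 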
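Free), as an example. Starting from the statement that frees the memory a pointer refers to (the source), if that pointer or one of its aliases can, after a series of data flow operations, reach a statement that dereferences it (the sink), a use-after-free error is triggered. For this class of problem, one can attach a specific tag to particular statements or variables (sources), use data flow analysis to copy and propagate the tag, and then check for the tag at statements or variables that may cause program errors (sinks), which detects such problems efficiently and accurately.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Decades have passed since compiler technology was born, and the main analysis and optimization techniques date from the 1970s through the 1990s. Since the start of the twenty-first century, compiler technology has no longer been a research hotspot in academia or industry, yet many engineers continue to work deeply in this field, hoping to build excellent products with industry-wide impact.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For example, Xcalscan (愛科識), a code analysis tool developed in-house by Xcalibyte (鑑釋), uses compiler technology to check for code vulnerabilities and improve code quality at the compiler back-end stage, early in the software life cycle, helping companies raise development efficiency and reduce development costs. Xcalibyte has so far entered into partnerships with leading companies in areas such as AI chips, autonomous driving and smart home.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Now that you have read the article, do you have a deeper understanding of compiler technology? Xcalibyte will bring more technical content in the future, so stay tuned!","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"italic","attrs":{}},{"type":"size","attrs":{"size":10}}],"text":"About the author: Lai Jianxin (賴建新), head of R&D at Xcalibyte, graduated from the Department of Computer Science at Tsinghua University and has extensive experience in compiler optimization and advanced static program analysis.","attrs":{}}]},{"type":"horizontalrule","attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"link","attrs":{"href":"http://www.xcalibyte.com.cn/","title":"","type":null},"content":[{"type":"text","text":"Click to learn more about Xcalibyte products and news!","attrs":{}}],"marks":[{"type":"strong"}]}]}]}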
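The tag-and-propagate idea from question 6 (mark a source, propagate the tag along data flow, check for it at the sink) can be illustrated with a very small sketch. Everything here is an assumption made for illustration: the toy straight-line IR, its four statement kinds, and the function name `find_use_after_free` are hypothetical and are not how Xcalscan or any real compiler represents programs. The sketch handles a single path only, with no branches, loops or function calls, so it sidesteps the alias-analysis and context-analysis difficulties discussed in question 4.

```python
def find_use_after_free(stmts):
    """Return the indices of dereferences reached by a "freed" tag.

    stmts is a list of tuples in a hypothetical straight-line IR:
      ("alloc", p)        p points to a fresh object
      ("copy", dst, src)  dst becomes an alias of src
      ("free", p)         source: the object behind p is released
      ("deref", p)        sink: p is dereferenced
    """
    points_to = {}   # variable name -> abstract object id
    freed = set()    # object ids currently carrying the "freed" tag
    findings = []

    for index, stmt in enumerate(stmts):
        op = stmt[0]
        if op == "alloc":
            # Use the statement index as a unique id for the new object.
            points_to[stmt[1]] = index
        elif op == "copy":
            dst, src = stmt[1], stmt[2]
            if src in points_to:
                # Aliases share the object id, so they share its tag.
                points_to[dst] = points_to[src]
            else:
                points_to.pop(dst, None)
        elif op == "free":
            obj = points_to.get(stmt[1])
            if obj is not None:
                freed.add(obj)  # attach the tag at the source
        elif op == "deref":
            if points_to.get(stmt[1]) in freed:
                findings.append(index)  # tag reached a sink
    return findings


if __name__ == "__main__":
    program = [
        ("alloc", "p"),      # 0
        ("copy", "q", "p"),  # 1: q aliases p
        ("free", "p"),       # 2: source
        ("deref", "q"),      # 3: sink reached through the alias q
        ("alloc", "p"),      # 4: p now points to a new object
        ("deref", "p"),      # 5: safe
    ]
    print(find_use_after_free(program))
```

Keeping the tag on the abstract object rather than on the variable is what lets the alias `q` be caught even though only `p` was freed; a checker that only matched on variable names in the syntax tree would miss it.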