Linux: Why Performance Tools Need BPF

{"type":"doc","content":[{"type":"heading","attrs":{"align":null,"level":1}},
{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/e1/e162423fc924afc417fe0b088471e9fa.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To learn more about BPF internals, see the book ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"BPF Performance Tools (Chinese edition: 《BPF之巔:洞悉Linux系統和應用性能》)","attrs":{}},{"type":"text","text":".","attrs":{}}]}],"attrs":{}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"▼","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"BPF","attrs":{}},{"type":"text","text":" is one of the biggest innovations in Linux systems technology in recent years. As a key milestone in the development of the Linux kernel, it is no less important than virtualization, containers, or SDN.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"▼ The way BPF works is interesting:","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The end user defines a filter expression in the instruction set of the BPF virtual machine (also called BPF bytecode) and passes it to the kernel, where an interpreter executes it. Filtering therefore happens inside the kernel, without copying every packet to a user-level process, which improves packet-filtering performance. This is how tcpdump(8) works.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"BPF also provides safety, because a user-defined filter must pass safety verification before it is executed.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Because early packet filtering had to run in kernel space, safety was a hard requirement. The following figure shows how this works.","attrs":{}}]},
{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/e8/e8fe280ffee8897dad252232b27d088e.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"tcpdump and BPF","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Running tcpdump(8) with the -d option prints the BPF instructions generated for the filter expression. For example:","attrs":{}}]},
{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/5b/5b449282a0cb9c00c570e52abd7d9ca5.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"▊ Classic BPF and Extended BPF","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The original BPF, now called “classic BPF”, was a limited virtual machine. It had two registers, a scratch memory store of 16 memory slots, and a program counter, all operating at a 32-bit register size. Classic BPF arrived in Linux in 1997, in kernel version 2.1.75.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Alexei Starovoitov later created extended BPF (eBPF). It was the first major update to BPF in 20 years, and it ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"extended BPF into a general-purpose virtual machine.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Although BPF is often called a virtual machine, that term describes its specification. Its implementation in Linux (its runtime) includes both an interpreter and a JIT compiler that translates BPF instructions into native instructions.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The term “virtual machine” may suggest another machine layer running on top of the processor, but that is not how BPF executes. JIT-compiled code runs directly on the processor, like any other native kernel code. Note that after the Spectre vulnerability was disclosed, some distributions enabled the JIT by default on x86 and removed the interpreter from the kernel entirely (it is compiled out).","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Extended BPF added more registers, widened the word size from 32 bits to 64 bits, introduced flexible BPF map storage, and allowed calls to a restricted set of kernel functions. It was also designed to be JIT compiled with a one-to-one mapping to native instructions and registers, so that existing native instruction optimization techniques could be reused for BPF. The BPF verifier was updated to handle these extensions and to reject any unsafe code.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The differences between classic BPF and extended BPF are as follows.","attrs":{}}]},
{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/09/09d2e9454ec331ad0dfe16de66516458.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Extended BPF was abbreviated as eBPF in the earliest patches, but development discussions now simply call it BPF.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The architecture of the Linux BPF runtime and its components is shown in the following figure.","attrs":{}}]},
{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/aa/aaecea9bc61e2d516ac85919f677e175.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"BPF runtime internals","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The figure above shows how BPF instructions pass the BPF verifier and are then executed by the BPF virtual machine.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The BPF virtual machine implementation has both an interpreter and a JIT compiler; the JIT compiler generates native instructions that the processor executes directly. The verifier rejects unsafe operations, including unbounded loops: a BPF program must finish in bounded time.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"BPF programs can use helper functions to fetch kernel state and BPF maps for storage. They run when specific events occur, including kprobes, uprobes, and tracepoints.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Next, let's discuss why performance tools need BPF.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"▊ Why Performance Tools Need BPF","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Performance tools use extended BPF for its programmability. BPF programs can run custom latency calculations and statistical summaries. Those features alone would make BPF an interesting tool.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Many tracing tools already have these capabilities, however. What sets BPF apart is that it is also efficient and safe for production use, and it is built into the Linux kernel.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"With BPF, you can run these tools directly in production ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"without adding any new kernel components.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"▼","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Let's look at one tool's output and a diagram to see how performance tools use BPF.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This example output comes from an early BPF tool published by ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"performance expert Brendan Gregg","attrs":{}},{"type":"text","text":", called bitehist, which shows the size distribution of disk I/O as a histogram:","attrs":{}}]},
{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/5c/5ce2f8488fc6128ab575b3fa8af41be4.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The following figure shows how the histogram is generated before and after BPF.","attrs":{}}]},
{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/c0/c0729b94ac0fe9bc616ab2cc167a26d7.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Generating histograms before and after using BPF","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The key change is that the histogram can be generated in kernel context, which greatly reduces the amount of data copied to user space. The efficiency gain is large enough that the tool's overhead drops to a level where it can run directly in production.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Before BPF","attrs":{}},{"type":"text","text":", the best-case steps to produce this histogram summary were as follows.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1. In the kernel: enable instrumentation for disk I/O events.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2. In the kernel, for each event: write a record to the perf buffer. If tracepoints are used (the recommended approach), the record contains several metadata fields about the disk I/O.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"3. In user space: periodically copy the buffer of all events to user space.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"4. In user space: step over each event and parse its metadata for the bytes field. Other fields are ignored.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"5. In user space: generate a histogram summary of the bytes field.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Steps 2 to 4 carry high overhead on systems with heavy I/O. Imagine copying 10,000 disk I/O trace records to a user-space program and parsing them into a summary, once every second.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"With BPF","attrs":{}},{"type":"text","text":", the bitehist program performs the following steps.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1. In the kernel: enable instrumentation for disk I/O events and attach a custom BPF program defined by bitehist.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2. In the kernel, for each event: run the BPF program. It fetches only the bytes field and saves it into a custom BPF map histogram.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"3. In user space: read the BPF map histogram once and print it.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This avoids the cost of copying events to user space and processing them again, and it avoids copying unused metadata fields. As the program output shown earlier illustrates, the only data copied to user space is the “count” column, which is an array of numbers.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"▊ BPF Versus Kernel Modules","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Another way to understand the benefits of BPF for observability is to ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"compare it with kernel modules","attrs":{}},{"type":"text","text":".","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"kprobes and tracepoints have been available for many years and can be used directly from loadable kernel modules. Compared with kernel modules, tracing with BPF has the following advantages:","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"● BPF programs are checked by a verifier; kernel modules may introduce bugs (kernel panics) or security vulnerabilities.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"● BPF provides rich data structures through maps.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"● BPF programs can be compiled once and run anywhere, because the BPF instruction set, maps, helper functions, and related infrastructure are a stable ABI. (Some BPF programs do include unstable components, such as kprobes that instrument kernel data structures, which makes those programs less stable.)","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"● Compiling BPF programs does not depend on kernel build artifacts.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"● BPF programming is easier to learn than the engineering required to develop kernel modules, which makes it accessible to more people.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Note that BPF has additional benefits when used for networking, ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"including the ability to replace BPF programs atomically.","attrs":{}},{"type":"text","text":" A kernel module must first be unloaded from the kernel completely and then loaded again, which can interrupt service.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"One benefit of kernel modules is that they can use other kernel functions and facilities, without being restricted to the helper functions BPF provides.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"However, the ability to call arbitrary kernel functions brings the additional risk of introducing bugs if it is misused.","attrs":{}}]},
{"type":"horizontalrule","attrs":{}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To learn more about BPF internals, see the book ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"BPF Performance Tools (Chinese edition: 《BPF之巔:洞悉Linux系統和應用性能》)","attrs":{}},{"type":"text","text":".","attrs":{}}]},
{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/c7/c703899a20a5036bbc6890c17cf64afc.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"▊ BPF Performance Tools (《BPF之巔:洞悉Linux系統和應用性能》)","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"By Brendan Gregg (USA)","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Translated by 孫宇聰, 呂宏利, and 劉曉舟","attrs":{}}]},
{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A new work by Brendan Gregg, the follow-up to Systems Performance (《性能之巔》)","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A comprehensive reference for performance optimization, with in-depth coverage of more than 150 analysis and debugging tools","attrs":{}}]}],"attrs":{}}],"attrs":{}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This book is a comprehensive introduction to BPF, covering the technology from its origins to its future directions. It describes the BPF programming model in full, covers the two main BPF front ends, BCC and bpftrace, and provides a series of example implementations that show what BPF can do in practice and where it is heading.","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}}]}
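The before-and-after steps the article lists for building the disk I/O size histogram can be sketched as a small simulation. This is plain user-space Python, not BPF code: the function names, the event fields (`dev`, `sector`, `rwbs`, `bytes`), and the synthetic I/O sizes are all assumptions made for illustration, and the `Counter` only stands in for a BPF map. The sketch aims to show that both flows yield the same power-of-two histogram while the amount of data handed to user space differs.

```python
# Illustrative simulation only: plain Python, not BPF. All names are hypothetical.
from collections import Counter


def log2_bucket(nbytes: int) -> int:
    """Return the power-of-two bucket index for an I/O size in bytes."""
    return max(nbytes, 1).bit_length() - 1


def histogram_before_bpf(events):
    """Pre-BPF flow: every full event record is copied out, then summarized."""
    copied = list(events)                          # step 3: copy all records to user space
    sizes = [e["bytes"] for e in copied]           # step 4: parse the bytes field, ignore the rest
    hist = Counter(log2_bucket(s) for s in sizes)  # step 5: build the histogram summary
    return hist, len(copied)


def histogram_with_bpf(events):
    """BPF-style flow: summarize on each event, then read only the bucket counts."""
    kernel_map = Counter()                         # stands in for a BPF histogram map
    for e in events:                               # step 2: runs once per event
        kernel_map[log2_bucket(e["bytes"])] += 1
    return kernel_map, len(kernel_map)             # step 3: only the counters are read


def make_events(n=10_000):
    """Synthetic disk I/O records with a few metadata fields (made-up values)."""
    sizes = [512, 4096, 8192, 65536]
    return [
        {"dev": "sda", "sector": i, "rwbs": "R", "bytes": sizes[i % 4]}
        for i in range(n)
    ]


if __name__ == "__main__":
    events = make_events()
    before, copied_records = histogram_before_bpf(events)
    after, copied_counters = histogram_with_bpf(events)
    print("same histogram:", before == after)
    print("records copied before BPF:", copied_records)
    print("counters copied with BPF:", copied_counters)
```

With 10,000 synthetic events spread over four I/O sizes, the first flow hands 10,000 full records to user space while the second hands over four counters, which is the kind of reduction the article attributes to summarizing in kernel context.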