Cloud-Native Resource Isolation Technology: CPU Isolation

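{"type":"doc","content":[{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"Introduction","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"Co-location usually means mixing online and offline workloads (it is also called offline-online co-location): online services (typically latency-sensitive, high-priority tasks) and offline jobs (typically CPU-intensive, low-priority tasks) are deployed together on the same node in order to raise the node's resource utilization. The key difficulty lies in the underlying resource isolation, which depends heavily on the OS kernel. The isolation capabilities of the stock Linux kernel once again fall short of co-location requirements (or are at least imperfect), and deep hacking is still needed to reach production grade.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"(Cloud-native) resource isolation covers four areas: CPU, memory, IO, and network. This article focuses on CPU isolation and its background; later articles in the series will work through the other areas step by step.","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"Background","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"Whether in an IDC or in the cloud, ","attrs":{}},{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}},{"type":"strong","attrs":{}}],"text":"low resource utilization","attrs":{}},{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":" is a problem that most users and vendors share. On one hand, hardware is expensive (everyone has to buy it, most of the core hardware technology is in other hands, there is no pricing power, and bargaining power is usually weak) and its life cycle is short (it has to be replaced every few years). On the other hand, and rather embarrassingly, such expensive equipment cannot be fully used. Take CPU 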
utilization as an example: in most scenarios the average is very low. If I put it at no more than 20% (as a daily or weekly average), I doubt many readers would object. That means these very expensive machines are less than one-fifth used, which hurts if you care at all about spending wisely.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"Raising the resource utilization of a host (node) is therefore well worth exploring, and the payoff is obvious. The approach is also straightforward.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"The conventional answer is to run more workloads. That is easier said than done, and everyone has tried it. The core difficulty is that typical workloads show pronounced peaks and valleys.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"What you hope for probably looks like this:","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/54/54e91dc151ba72cee0763f1dd1f2ebb7.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"But reality mostly looks like this:","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/a9/a98980efff5bffc9337f290cb5f74fa9.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"When planning capacity for a workload, you have to plan for the worst case (assuming all workloads have the same priority). For CPU specifically, capacity has to be planned against the CPU 
peak (possibly the weekly peak, or even the monthly or yearly peak), usually with some headroom for bursts.","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/e4/e4d31d11fc49a2914a130a152bff914c.jpeg","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"In practice, peaks are usually high while the actual average is low. As a result, average CPU usage, and therefore real CPU utilization, is low in the vast majority of scenarios.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"The assumption above was that “all workloads have the same priority”, so the worst case of the workloads determines how the whole machine ends up (low utilization). If instead workloads are assigned different priorities, there is much more room to work with: the service quality of low-priority workloads can be sacrificed (which they can usually tolerate) to protect that of high-priority workloads. A suitable amount of high-priority work can then be deployed alongside more (low-priority) work, raising overall utilization.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"Co-location (mixed deployment) arose from exactly this. The “mixing” is in essence “differentiating priorities”. In the narrow sense it means “online + offline” co-location; in the broad sense it extends to deploying workloads of multiple priorities together.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"The core technology involved has two layers:","attrs":{}}]},{"type":"numberedlist","attrs":{"start":"1","normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"Low-level resource isolation. This is (usually) provided by the operating system (kernel) and is the main focus of this article (series).","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"Upper-level resource scheduling. This is (usually) provided by a resource orchestration/scheduling framework (typically K8s) and will be covered in a separate series; stay tuned.","attrs":{}}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"Co-location is also a very active topic and direction in the industry. The leading companies all keep investing in it; the value is obvious and the technical barrier is high. The techniques go back a long way: the well-known K8s (whose predecessor is Borg) actually grew out of Google's co-location scenario. Judged by history and results, Google is the industry benchmark and reportedly achieves an average CPU utilization of 60%. For details, see its classic papers:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"https://dl.acm.org/doi/pdf/10.1145/3342195.3387517","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"https://storage.googleapis.com/pub-tools-public-publication-data/pdf/43438.pdf","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"Tencent (Cloud) also started exploring co-location early and has been through several major iterations of technology and design, with solid deployment scale and results today. That is a separate topic and is not discussed here.","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"Technical Challenges","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"As noted above, low-level resource isolation is critical in co-location. The “resources” fall into four broad categories:","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"CPU","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"Memory","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"IO","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"Network","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"This article focuses on CPU isolation and analyzes the technical difficulties, the current state, and the available approaches at the CPU isolation level.","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"CPU Isolation","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"Of the four resource types above, CPU isolation is arguably the most basic. On one hand, CPU is a compressible (reusable) resource, so sharing it is comparatively easy and the upstream solutions are comparatively usable. On the other hand, CPU is tightly coupled to the other resources: using them (allocating/releasing) usually happens in process context and so depends indirectly on CPU. For example, once a task's CPU is isolated (throttled), its IO and network requests will in most cases be throttled too, because the task does not get scheduled.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"Therefore the quality of CPU isolation also indirectly affects the isolation of other resources, and CPU 
isolation is the most central isolation technology.","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"Kernel Scheduler","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"Concretely, inside the OS, CPU isolation depends entirely on the ","attrs":{}},{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}},{"type":"strong","attrs":{}}],"text":"kernel scheduler","attrs":{}},{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":". The kernel scheduler is the basic kernel component responsible for allocating CPU resources (the official wording). More narrowly, it corresponds to the default scheduler of the Linux kernel that we deal with most: the CFS scheduler (in essence a scheduling class, a set of scheduling policies).","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"The kernel scheduler decides when a task (process) runs on a CPU and which one is picked. It therefore determines the CPU time that online and offline tasks receive in a co-located setup, and with it the quality of CPU isolation.","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"Isolation in the Upstream Kernel","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"The Linux 
kernel scheduler provides five scheduling classes by default, of which only two are really usable by ordinary workloads:","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"CFS","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"Real-time schedulers (rt/deadline)","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"In co-location, CPU isolation comes down to two requirements:","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"When online tasks need to run, ","attrs":{}},{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}},{"type":"strong","attrs":{}}],"text":"suppress","attrs":{}},{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":" offline tasks as far as possible","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"When online tasks are not running, let offline tasks use the idle CPU","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"For “suppression” on the upstream kernel (that is, on 
CFS), there are several approaches:","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"Priority","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"Lower the priority of offline tasks, or raise the priority of online tasks. Without changing the scheduling class (staying on the default CFS), the priority (nice value) can be adjusted dynamically within [-20, 20), that is, from -20 to 19.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"Priority shows up as the share of time slices a task receives within one scheduling period. Specifically:","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"Normal priority 0 versus lowest priority 19: the weight ratio is 1024/15, about 68:1","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"Highest priority -20 versus normal priority 0: the weight ratio is 88761/1024, about 87:1","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"Highest priority -20 versus lowest priority 19: the weight ratio is 88761/15, about 5917:1","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"The suppression ratio looks fairly high. Suppose offline tasks are set to priority 19 and online tasks stay at the default 0 (the usual practice); the time-slice weight ratio between online and offline is then 68:1.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"Assume one scheduling period is 24ms (the default on most systems). As a rough estimate, offline tasks get about 24ms/69 = 348us per period, or about 1/69 = 1.4% of the CPU.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"The actual behavior differs somewhat. CFS, with throughput in mind, enforces a minimum granularity for a single run (the minimum time a process runs once scheduled): sched_min_granularity_ns, which is set to 10ms in most cases. Once an offline task has preempted, it can keep running for 10ms, so the scheduling latency (round-robin switch latency) of online tasks can reach 10ms.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"Wakeup has a minimum granularity too (a minimum run time guaranteed to the task being preempted at wakeup): sched_wakeup_granularity_ns, set to 4ms in most cases. So once an offline task is running, the wakeup latency of online tasks (another typical form of scheduling latency) can also reach 4ms.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"Moreover, adjusting priority does not improve the preemption logic. When preemption is carried out (at wakeup and on the periodic tick), priority is not consulted, and no different preemption policy is applied for different priorities (an offline task's preemption is not suppressed, nor its preemption opportunities reduced, just because its priority is low). Offline tasks may therefore preempt unnecessarily and cause interference.","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"Cgroup (CPU share)","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"The Linux kernel provides the CPU cgroup (which corresponds to a container or pod). A cgroup's share value controls the container's priority, so lowering the share of the offline cgroup achieves “suppression”. In cgroup v1 the default share is 1024; in cgroup v2 the default share (weight) is 100 (both adjustable). If the share/weight of the offline cgroup is set to 1 (the lowest value), the time-slice weight ratios in CFS are 1024:1 and 100:1, and the offline CPU share is about 0.1% and 1% respectively. (Cgroup v1 in fact clamps cpu.shares to a minimum of 2, which would make the v1 ratio 512:1, about 0.2%.)","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"The actual behavior is still bounded by sched_min_granularity_ns and 
sched_wakeup_granularity_ns, in the same way as in the priority case.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"As with the priority approach, the preemption logic is not tuned by share value, so extra interference is possible.","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"Special policy","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"CFS also offers a special scheduling policy, SCHED_IDLE, meant for tasks of extremely low priority; it looks tailor-made for “offline tasks”. A SCHED_IDLE task is in essence a CFS task with a weight of 3, so its time-slice weight ratio against a normal task is 1024:3, about 341:1, and the offline CPU share is about 0.3%. The time slices are distributed as follows:","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/63/6398659daebc140a98722b504f203ae2.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"The actual behavior is still bounded by sched_min_granularity_ns and sched_wakeup_granularity_ns, in the same way as in the priority case.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"CFS does apply a special preemption optimization to SCHED_IDLE tasks (it suppresses their preemption of other tasks and reduces their preemption opportunities). In that respect SCHED_IDLE is a small step toward “fitting” co-location, even though that was not upstream's intent.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"In addition, SCHED_IDLE is a per-task flag and there is no cgroup-level SCHED_IDLE marking, while CFS first picks a (task) group and then picks a task within the group. For cloud-native (container) co-location, using SCHED_IDLE alone 
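does not actually help.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"Overall, CFS does offer priorities (share and SCHED_IDLE work on the same principle and are priorities in essence) and can suppress low-priority tasks to some degree. But the core of the CFS design is “fairness”, so it fundamentally cannot “absolutely suppress” offline tasks. Even at the lowest “priority” (weight), an offline task still receives a fixed share of time slices, and those slices are not idle CPU time; they are taken from the online tasks. In other words, the “fair design” of CFS means that offline interference with online tasks cannot be fully avoided and perfect isolation is out of reach.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"Beyond that, pushing offline priority to the floor (which is what all of the approaches above do) also squeezes the priority space left to offline tasks. If you then want to distinguish priorities among offline tasks (they may have QoS tiers of their own, and the need is real), nothing is left to work with.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"Also, at the implementation level, online and offline tasks both use the CFS scheduling class, so at run time they share run queues (rq), have their load accounted together, and share the load balance mechanism. On one hand, offline tasks must synchronize (take locks) when operating on shared resources such as the run queue, and lock primitives do not distinguish priority, so offline interference cannot be ruled out. On the other hand, load balancing cannot tell offline tasks apart and treat them specially (for example, balancing aggressively to prevent starvation and raise CPU utilization), so the balancing behavior of offline tasks cannot be controlled.","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"Real-time Priority","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"At this point you may wonder: if absolute preemption (suppressing offline) is what is needed, why not use the real-time scheduling classes (RT/deadline)? Compared with CFS, a real-time class delivers exactly that “absolute suppression”.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"That is true. Under this approach, however, the only option is to make online workloads real-time and leave offline tasks on CFS. Online tasks can then preempt offline tasks absolutely, and if starving the offline tasks is a concern, there is the rt_throttle 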
mechanism to keep offline tasks from being starved.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"It looks “perfect”, but it is not. This approach squeezes the priority space and living room of online tasks (the opposite of what lowering offline priority did). Online workloads are confined to the real-time class (even though most of them do not have real-time characteristics) and can no longer use the native capabilities of CFS, such as fair scheduling and cgroups, which are exactly what online tasks need.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"Put simply, the real-time class does not meet the needs of online tasks themselves. Online workloads are not real-time tasks by nature, and forcing them into the real-time class has serious side effects, such as starvation of system tasks (tasks that come with the OS, for example kernel threads and system services).","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"To sum up the real-time priority approach:","attrs":{}}]},{"type":"numberedlist","attrs":{"start":"1","normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"The “absolute suppression” of CFS tasks by the real-time class is real, and it is exactly what we want.","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"But with the current upstream kernel, the only way to get it is to move online tasks out of CFS 
into a higher-priority real-time class, which is unacceptable in real deployments.","attrs":{}}]}]}]},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"Priority Inversion","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"At this point a big question may come to mind: after “absolute suppression”, won't there be priority inversion? What then?","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"The answer: yes, priority inversion does exist.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"Here is how priority inversion arises in this scenario. Suppose online and offline tasks share a resource (for example some common kernel data, such as the /proc file system). An offline task takes the lock (in the abstract sense; it need not be a literal lock) while accessing the shared resource and is then “absolutely suppressed”, so it never gets to run. When an online task needs the same resource and waits for that lock, priority inversion occurs, leading to deadlock (or at least a long stall). Priority inversion is a classic problem that any scheduling model has to consider.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"A rough summary of the conditions for priority inversion:","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"Online and offline tasks share a resource.","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"The shared resource is accessed concurrently and is protected by a sleeping lock.","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"After taking the lock, the offline task is completely suppressed and gets no chance to run. Put another way: every CPU is 100% occupied by online tasks, so the offline task cannot run at all. (In theory, as long as any CPU is idle, the offline task can reach it through load balancing.)","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"In cloud-native co-location, how to handle priority inversion depends on how you look at it. Consider the following angles:","attrs":{}}]},{"type":"numberedlist","attrs":{"start":"1","normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"How likely is priority inversion? That depends on the actual scenario. In theory, if online and offline workloads share no resources, priority inversion cannot occur. Cloud-native scenarios broadly fall into two cases:","attrs":{}}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"(1) Secure containers. Here the workload actually runs inside a “virtual machine” (loosely speaking), and the virtual machine itself isolates the vast majority of resources. Priority inversion can essentially be avoided in this case (and if it does occur, it can be handled as a special case).","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"(2) Ordinary containers. Here the workload runs in a container and some resources are shared, such as common kernel resources and shared file systems. As analyzed above, even with shared resources the conditions for priority inversion are fairly strict. The key one is that every CPU is 100% occupied by online tasks, which is very rare in practice and counts as an extreme case that can be handled separately.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"Therefore, in (the vast majority of) real cloud-native scenarios, we can assume that as long as the scheduler optimization/hack 
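is good enough, priority inversion can be avoided.","attrs":{}}]},{"type":"numberedlist","attrs":{"start":"2","normalizeStart":"2"},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"How should priority inversion be handled? Although it only appears in extreme cases, what if it must be handled (as upstream certainly would insist)?","attrs":{}}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"(1) The upstream view. In the stock Linux kernel's CFS, even the lowest priority (think of SCHED_IDLE) keeps some weight, so the lowest-priority task still gets some time slices and priority inversion is (largely) avoided. This has always been the community's stance: be general, and cover even the most extreme cases. The same design is exactly why “absolute suppression” is impossible. As a design there is nothing wrong with it, but for cloud-native co-location it is imperfect: it is unaware of how starved offline tasks are, so they may preempt online tasks even when they are not starving, causing unnecessary interference.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"(2) Another view. A design optimized for cloud-native scenarios: be aware of offline starvation and of the likelihood of priority inversion, and preempt only when offline tasks are starving and priority inversion may result (that is, only as a last resort). This avoids unnecessary preemption (interference) and also avoids priority inversion, giving a (relatively) perfect result. Admittedly, such a design is not very generic and not very graceful, so upstream 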
is very unlikely to accept it.","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"Hyperthreading Interference","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"One more key problem has been left out so far: hyperthreading interference. It is a chronic problem in co-location, and the industry has never had a targeted solution.","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/91/914492b22302ede3ad64501db1ca1831.jpeg","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"The problem is that hyperthreads on the same physical core share the core's hardware resources, such as caches and execution units. When an online task and an offline task run at the same time on a pair of hyperthreads, they contend for those resources and interfere with each other. CFS was designed without taking this into account at all.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"As a result, online workloads lose performance under co-location. In tests with a CPU-intensive benchmark, the interference caused by hyperthreading can exceed 40%.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"Note: by Intel's official figures, a physical core running two hyperthreads delivers only about 1.2 times its single-thread performance.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"Hyperthreading interference is a key problem in co-location, and the original CFS design (almost) entirely ignored it. That is not so much a design flaw as a reflection that CFS was built for more general, broader scenarios rather than for co-location.","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"Core 
scheduling","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"At this point a specialist (someone who works on kernel scheduling) may ask: haven't you heard of Core scheduling? Doesn't it solve hyperthreading interference?","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"That is a well-informed question. Core Scheduling is a new feature submitted in 2019 by Peter (Zijlstra), the maintainer of the kernel scheduler, building on the coscheduling concept proposed earlier in the community. Its main goal is to address (more precisely, to mitigate or work around) the L1TF vulnerability, in which data leaks because hyperthreads share a cache. The main use case is cloud hosts: preventing processes of different virtual machines from running on the same pair of hyperthreads and leaking data.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"The core idea: keep processes with different tags from running on the same pair of hyperthreads.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"Current status: after the Core scheduling patchset went through as many as ten versions (v10) and nearly two years of discussion and improve/rework, Peter finally posted, just recently (2021.4.22), a version that seems likely, though it is hard to say when, to make it into master 
(it is still not quite complete):","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"https://lkml.org/lkml/2021/4/22/501","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"This topic deserves an in-depth write-up of its own and is not expanded here. Stay tuned...","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"Here are my (personal) views, stated directly (go easy on me):","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"Core scheduling can indeed be used to solve hyperthreading interference.","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"Core scheduling was designed to address a security vulnerability (L1TF), not hyperthreading interference in co-location. Because security must be guaranteed, it needs absolute isolation, which requires complex (costly) synchronization primitives (such as a core-level rq lock) and heavyweight features such as core-wide pick task and an overly heavy force idle. There is also the accompanying concurrency isolation for interrupt context.","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"The design and implementation of Core scheduling are too heavy and too costly. Enabling it causes a serious performance regression, and it cannot tell online from offline tasks. It is a poor fit for (cloud-native) co-location.","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"At bottom, Core 
scheduling was likewise not designed for cloud-native co-location.","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"Conclusion","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"Pulling the analysis together, the strengths and problems of the existing approaches can be summarized as follows.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"Priority-based approaches in CFS (share and SCHED_IDLE are similar). Advantages:","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"General. Capable, and able to handle most application scenarios","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"(Largely) avoids priority inversion","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"Problems:","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"Isolation is imperfect (no absolute suppression)","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"Assorted minor flaws (imperfections)","attrs":{}}]}]}],"attrs":{}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"Approaches based on real-time task types. Advantages:","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"Absolute suppression, so isolation is perfect.","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"A mechanism exists to avoid priority inversion (rt_throttle).","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"Problems:","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"Not applicable in practice: in most cases online tasks cannot use a real-time task type.","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"The rt_throttle mechanism avoids priority inversion, but once it is enabled, isolation is no longer perfect.","attrs":{}}]}]}],"attrs":{}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"Using core scheduling to isolate hyperthread interference. Advantages:","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"Perfect isolation of hyperthread interference.","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"Problems:","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"The design is too heavyweight and the overhead is too high.","attrs":{}}]}]}],"attrs":{}},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"Closing remarks","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"The upstream Linux kernel prioritizes generality and elegant design, so it struggles to meet the extreme demands of a specific scenario such as cloud-native co-location. Reaching that level still requires deep hacking, and TencentOS Server keeps working on it. (Sound familiar? We have indeed said this before!)","attrs":{}}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"We will follow up with a series of articles on the implementation of the Linux kernel scheduler, with code analysis based on the 5.4 kernel (Tkernel4). The series will discuss the pain points of cloud-native scenarios alongside the relevant code, with the aim of demystifying the Linux kernel and exploring more room for hacking. Stay tuned.","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"Questions to consider","attrs":{}}]},{"type":"numberedlist","attrs":{"start":"1","normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"If online workloads should run under CFS (to benefit from its capabilities) and also have “absolute suppression”, what would the ideal approach be? (The answer feels almost within reach!)","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"If perfect isolation (absolute suppression) is not required, but priority inversion must be handled, isolation should still be “close to perfect”, and existing mechanisms should be reused as far as possible (avoiding large scheduler hacks to keep risk low), what then? (A careful look at the summary of existing approaches above gets close to the answer.)","attrs":{}}]}]}]},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Author: 蔣彪","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Original link: https://www.cnblogs.com/tencent-cloud-native/p/14754230.html","attrs":{}}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}}]}