Making Containers Run Faster: CPU Burst in Practice

{"type":"doc","content":[{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"CPU throttling is a nuisance for containers, and teams sometimes have to sacrifice deployment density just to avoid it. The CPU Burst technology we designed preserves container quality of service without lowering deployment density. The CPU Burst feature has been merged into Linux 5.14, and Anolis OS 8.2, Alibaba Cloud Linux2, and Alibaba Cloud Linux3 also support it."}]}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In K8S container scheduling, a container's CPU ceiling is specified by the CPU limits parameter. Setting a CPU ceiling keeps individual containers from consuming too much CPU time and ensures that other containers get enough CPU. "},{"type":"text","marks":[{"type":"strong"}],"text":"In the Linux kernel, CPU limits are implemented by the CPU Bandwidth Controller, which caps a cgroup's resource consumption through CPU throttling"},{"type":"text","text":". So when the processes in a container use more CPU than the CPU limits allow, they are throttled: the CPU time they can use is restricted, and key latency metrics of those processes get worse."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"What should we do about this? "},{"type":"text","marks":[{"type":"strong"}],"text":"Typically, we take the container's daily peak CPU utilization, multiply it by a reasonably safe factor, and use the result as the container's CPU limits. This avoids the service degradation caused by throttling while still keeping CPU utilization in mind"},{"type":"text","text":". As a simple example, take a container whose daily peak CPU usage is around 250%. We set its CPU limits to 400% to protect its quality of service, which puts the container's CPU utilization at 62.5% (250%\/400%)."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"But is life really that rosy? Clearly not! CPU throttling shows up far more often than expected. What then? It seems the only option is to keep raising the CPU limits. In many cases, "},{"type":"text","marks":[{"type":"strong"}],"text":"a container's quality of service is reasonably well protected only after its CPU limits have been raised by 5~10x, at which point the container's overall CPU utilization is only 10%~20%."},{"type":"text","text":" So, to absorb possible peaks in container CPU usage, container deployment density has to drop sharply."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Historically, some CPU throttling problems were caused by bugs in the CPU Bandwidth Controller, and those bugs have been fixed. We found that today's unexpected throttling is caused by CPU usage bursts at the 100ms scale, and we proposed the CPU Burst technology, which permits a bounded amount of bursty CPU usage and avoids throttling when average CPU utilization is below the limit. "},{"type":"text","marks":[{"type":"strong"}],"text":"In cloud computing scenarios, CPU Burst offers the following benefits:"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"numberedlist","attrs":{"start":1,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"Better CPU quality of service without raising the CPU configuration;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"Resource owners can lower their CPU configuration without sacrificing quality of service, which raises CPU utilization;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":3,"align":null,"origin":null},"content":[{"type":"text","text":"Lower resource cost (TCO, Total Cost of Ownership)."}]}]}]},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"The CPU Utilization You See Is Not the Whole Truth"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"CPU utilization measured at one-second granularity cannot reflect CPU usage at the 100ms scale at which the Bandwidth Controller operates, and this is what causes unexpected CPU throttling."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The Bandwidth Controller applies to CFS tasks and manages a cgroup's CPU time consumption with a period and a quota. If a cgroup's period is 100ms and its quota is 50ms, the cgroup's processes can use at most 50ms of CPU time in each 100ms period. When CPU usage within a 100ms period exceeds 50ms, the processes are throttled, and the cgroup's CPU usage is capped at 50%."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"CPU utilization"},{"type":"text","text":" is the average CPU usage over a span of time. Measured at a coarse granularity, CPU demand looks stable; as the granularity gets finer, the bursty nature of CPU usage becomes more visible. Observing the same container workload at both 1s and 100ms granularity, the per-second average CPU utilization is around 250%, whereas at the 100ms scale at which the Bandwidth Controller operates, peak CPU utilization already exceeds 400%."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/8a\/d4\/8a653f0c7fd8d6d30dc6b643bcdb54d4.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"If the container's quota and period are set to 400ms and 100ms based on the 250% CPU utilization observed at one-second granularity, the container's fine-grained bursts are throttled by the Bandwidth Controller, and the CPU usage of its processes suffers."}]},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"How to Improve It"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"We use CPU Burst to satisfy this fine-grained bursty CPU demand, introducing the concept of burst on top of the quota and period of the traditional CPU Bandwidth Controller. When a container's CPU usage is below its quota, the unused portion accumulates as burst resources; when its CPU usage exceeds its quota, it is allowed to draw on the accumulated burst resources. The net effect is that the container's average CPU consumption over longer time spans stays within the quota, while short-term CPU usage is allowed to exceed the quota."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/fa\/b2\/fa083e98a70e33d34842dc0dceb5d5b2.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Imagine using the Bandwidth Controller algorithm to manage vacation days: the accounting period is one year, and the annual vacation allowance is the quota. With CPU Burst, vacation days left unused this year can be carried over and taken later."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"With CPU Burst enabled in a container scenario, the quality of service of the test container improved markedly"},{"type":"text","text":". Mean RT dropped by 68% (from 30+ms to 9.6ms), and 99th-percentile RT dropped by 94.5% (from 500+ms to 27.37ms)."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/1c\/a0\/1c81c21459ff0a0029a920973c2dc5a0.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Guarantees of the CPU Bandwidth Controller"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The CPU Bandwidth Controller keeps any process from consuming too much CPU time and ensures that every process that needs CPU gets enough of it. This stability guarantee holds because, when the Bandwidth Controller settings satisfy the following condition,"}]},
{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/4d\/5e\/4d24b4f841314123b2feb8b2b2598d5e.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"the following "},{"type":"text","marks":[{"type":"strong"}],"text":"scheduling stability constraint holds:"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/fd\/0b\/fdfa855d986065a156e5459a0a6d190b.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"where"}]},
{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/9e\/a9\/9e862c72e6d29f008a9716cafef0yya9.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"denote, respectively, the quota of the i-th cgroup and that cgroup's CPU demand within one period. The Bandwidth Controller accounts for CPU time separately in each period, and the scheduling stability constraint guarantees that all tasks submitted within a period can be processed within that period. For each CPU cgroup, this means that tasks submitted at any time can finish executing within one period, which is the "},{"type":"text","marks":[{"type":"strong"}],"text":"task real-time constraint:"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/27\/75\/27eb3d88b5319dd1d381dbef199fbd75.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Regardless of task priority, the worst-case execution time (WCET, Worst-Case Execution Time) does not exceed one period."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"If the following persists,"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/5d\/76\/5d1bdf16fbc2fa958d2ee0c17dba0a76.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"scheduler stability is broken: work piles up in every period, and the execution time of newly submitted jobs keeps growing."}]},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Impact of Using CPU Burst"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Once we use CPU Burst to permit bursty CPU usage for the sake of better quality of service, what happens to scheduler stability? "},{"type":"text","marks":[{"type":"strong"}],"text":"The answer is that when multiple cgroups burst at the same time, the scheduler stability constraint and the task real-time guarantee may be broken"},{"type":"text","text":". What matters then is the probability that the two constraints hold. If that probability is high, so that task real-time behavior is guaranteed in most periods, CPU Burst can be used with confidence. If the probability that task real-time behavior is guaranteed is low, CPU Burst should not be applied directly to improve quality of service; first lower the deployment density and raise the CPU allocation."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The next question, then, is how to compute the probability that the two constraints are broken in a given scenario."}]},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Evaluating the Size of the Impact"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The quantitative calculation can be formulated as a classic queueing theory problem and solved by Monte Carlo simulation. "},{"type":"text","marks":[{"type":"strong"}],"text":"The quantitative results show"},{"type":"text","text":" that the main factors in deciding whether CPU Burst is suitable for a given scenario are average CPU utilization and the number of cgroups. The lower the CPU utilization, or the larger the number of cgroups, the less likely the two constraints are to be broken, and the more safely CPU Burst can be used. Conversely, if CPU utilization is high or the number of cgroups is small, then to eliminate the effect of CPU throttling on process execution you should lower the deployment density and raise the allocation before using CPU Burst."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The problem is defined as follows: there are "},{"type":"text","marks":[{"type":"strong"}],"text":"m"},{"type":"text","text":" cgroups in total, and each cgroup's quota is limited to 1\/m"},{"type":"text","marks":[{"type":"strong"}],"text":";"},{"type":"text","text":" the computing demand (CPU utilization) that each cgroup generates in each period "},{"type":"text","marks":[{"type":"strong"}],"text":"follows a specific distribution"},{"type":"text","text":", and these distributions are mutually independent. Assume tasks arrive at the start of each period. If the CPU demand in a period exceeds 100%, the WCET of tasks in that period exceeds 1 period, and the excess carries over to be processed in the next period together with the CPU demand newly generated then. The inputs are the number of cgroups m and the specific distribution that each CPU demand follows; the outputs are the probability that WCET > period at the end of each period and the expected WCET."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"We illustrate with the results for Pareto-distributed CPU demand and m=10\/20\/30. The Pareto distribution was chosen because it produces many long-tail CPU bursts and therefore tends to have a large impact. Each entry in the table has the format"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/c2\/90\/c25e6472cdb28511a67446c6134af790.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"where"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/39\/7f\/3963e14b8174c0697c601403yy18267f.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"is better the closer it is to 1, and"}]},
{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/64\/e7\/645e75b664b241c9147d6c6fa4784fe7.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"is better the lower the probability."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"embedcomp","attrs":
{"type":"table","data":{"content":"

| u_avg | m=10 | m=20 | m=30 |
|-------|------|------|------|
| 10% | 1.0000\/0.00% | 1.0000\/0.00% | 1.0000\/0.00% |
| 30% | 1.0000\/0.00% | 1.0000\/0.00% | 1.0000\/0.00% |
| 50% | 1.0003\/0.03% | 1.0000\/0.00% | 1.0000\/0.00% |
| 70% | 1.0077\/0.66% | 1.0013\/0.12% | 1.0004\/0.04% |
| 90% | 1.4061\/19.35% | 1.1626\/10.61% | 1.0867\/6.52% |
"}}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The results match intuition. On one hand, the higher the CPU demand (CPU utilization), the more easily CPU bursts break the stability constraint and lengthen the expected task WCET. On the other hand, the more cgroups there are with independently distributed CPU demand, the less likely they are to burst at the same time, the more easily the scheduler stability constraint is maintained, and the closer the expected WCET is to 1 period."}]},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Scenario and Parameter Settings"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"We assume the system contains m cgroups that share a total of 100% CPU equally, i.e."},{"type":"text","marks":[{"type":"strong"}],"text":" quota=1\/m"},{"type":"text","text":". Each cgroup generates computing demand according to the same rule (independent and identically distributed) and hands it to the CPU for execution."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/71\/03\/71df2d21c56fb5204be40940c2b09003.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Following the queueing theory model, we treat each cgroup as a customer and the CPU as the service desk, with each customer's service time limited by its quota. To simplify the model, we discretize time by making the inter-arrival time of all customers a constant, and within that interval"},{"type":"text","marks":[{"type":"strong"}],"text":" the CPU can serve at most 100% of computing demand; this interval is one period."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Next we need to define each customer's "},{"type":"text","marks":[{"type":"strong"}],"text":"service time"},{"type":"text","text":" within a period. We assume the computing demand generated by customers is independent and identically distributed, with a mean of u_avg times the customer's own quota. Computing demand that a customer cannot get served in a period keeps accumulating. The service time it submits to the service desk in each period depends on its own computing demand and on the maximum CPU time the system allows (that is, its quota plus the tokens accumulated in earlier periods)."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Finally, "},{"type":"text","marks":[{"type":"strong"}],"text":"CPU Burst has one tunable parameter, buffer"},{"type":"text","text":", which is the upper bound on the tokens that may accumulate. It determines each cgroup's instantaneous burst capacity, and we express "},{"type":"text","marks":[{"type":"strong"}],"text":"its size as b times the quota"},{"type":"text","text":"."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"We set the parameters defined above as follows:"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"embedcomp","attrs":{"type":"table","data":{"content":"

| Parameter | Description | Values |
|-----------|-------------|--------|
| distribution | Distribution of the generated computing demand | Negative exponential, Pareto |
| u_avg | Average generated computing demand | 10%-90% |
| m | Number of cgroups (containers) | 10, 20, 30 |
| b | Token-bucket buffer size (as a multiple of the quota) | 100%, 200%, ∞ |"}}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The negative exponential distribution is one of the most common and most widely used distributions in queueing models. Its density function is"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/ac\/ae\/ac294f675bf4e8ebfe17b5c3fd36f8ae.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"where"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/3a\/8d\/3a20ef84f59befdcba38833862ed558d.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The Pareto distribution is fairly common in computer scheduling systems, and it can model a heavy latency tail, which brings out the effect of CPU Burst. Its density function is:"}]},
{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/39\/39\/39d4f204f40ff64783e496847bcac739.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To keep the tail of the distribution from becoming too extreme, we set:"}]},
{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/7e\/cb\/7e6f104d2956fe20a9c5f52b54ef28cb.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"With this setting, when u_avg=30% the largest computing demand that can be generated is about 500%."}]},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Results"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The results of Monte Carlo simulation under the parameter settings above are shown below. We invert the y-axis of the first chart (expected WCET) so that it reads more intuitively. Likewise, the second chart (probability that WCET equals 1) "},{"type":"text","marks":[{"type":"strong"}],"text":"shows the probability that scheduling real-time behavior is guaranteed, expressed as a percentage"},{"type":"text","text":"."}]},
{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Negative Exponential Distribution"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/72\/bf\/7279cdf5d4d387397f232e9458de5ebf.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/31\/1a\/31b9aee077cf13c70546f60a6220c91a.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},
{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Pareto Distribution"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/66\/yy\/66e3102c46f8a3dbcd0156fc42351cyy.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/80\/54\/805d91798b8e9ca6958eaa4a3302ed54.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Conclusions"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"In general, the higher u_avg (the computing demand load) and the smaller m (the number of cgroups), the larger the WCET"},{"type":"text","text":". The former is obvious. The latter holds because, with independent and identically distributed demand, the more tasks there are, the more the aggregate demand averages out: tasks that exceed their quota and tasks that stay below it and free up CPU time are more likely to offset each other."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Raising the buffer lets CPU Burst work better,"},{"type":"text","text":" with a more noticeable gain for individual tasks; but it also increases WCET, which means more interference with neighboring tasks. This too matches intuition."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"When choosing the buffer size, we recommend deciding based on the computing demand of the specific workload (both its distribution and its mean), the number of containers, and your own requirements. "},{"type":"text","marks":[{"type":"strong"}],"text":"If you want to increase overall system throughput and improve container performance when average load is not high,"},{"type":"text","text":" increase the buffer; conversely, if you want to preserve scheduling stability and fairness and reduce the impact on containers when overall load is high, decrease the buffer accordingly."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In general, in scenarios with average CPU utilization below 70%, CPU Burst does not have a large impact on neighboring containers."}]},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Simulation Tool and Usage"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"With the dry data and conclusions out of the way, we turn to a question many readers may care about: will CPU Burst affect my real workload? To answer it, we lightly adapted the tool used for the Monte Carlo simulation so that you can test the concrete impact in your own environment."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The tool is available here:"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"italic"},{"type":"strong"}],"text":"https:\/\/codeup.openanolis.cn\/codeup\/yingyu\/cpuburst-simulator"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Detailed usage instructions are in the README. Let us walk through a concrete example."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Xiao A wants to deploy 10 containers running the same workload on his server. To obtain accurate measurements, he first starts one container running the workload normally, binds it to a cgroup named cg1, and leaves it unthrottled to capture the workload's real behavior."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"He then runs "},{"type":"text","marks":[{"type":"italic"},{"type":"strong"}],"text":"sample.py "},{"type":"text","text":"to collect data (only 1000 samples were collected for this demonstration; in practice, the more samples the better, where conditions allow):"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/f8\/46\/f8c17bbd09854344a0e0683e450bff46.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The data is stored in "},{"type":"text","marks":[{"type":"italic"},{"type":"strong"}],"text":".\/data\/cg1_data.npy"},{"type":"text","text":". The summary printed at the end shows that the workload uses about 6.5% of the CPU on average, so with 10 containers deployed the total average CPU utilization is about 65%. (PS: the variance is also printed for reference; a larger variance may mean more benefit from CPU Burst.)"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Next, he uses"},{"type":"text","marks":[{"type":"italic"},{"type":"strong"}],"text":" simu_from_data.py "},{"type":"text","text":"to compute the impact of setting the buffer to 200% when 10 cgroups with the same workload as cg1 are configured:"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/91\/35\/91bd55fda3c84677f159038cf5322435.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"According to the simulation results, enabling CPU Burst has almost no negative impact on containers in this workload scenario, so Xiao A can use it with confidence."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"To learn more about using the tool, or to change the distribution and explore the simulation results out of interest in the theory, visit the repository linked above."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"About the Authors"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Chang Huaixin (常懷鑫, alias 一齋) is an engineer on the Alibaba Cloud kernel team who specializes in CPU scheduling."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Ding Tianchen (丁天琛, alias 鷹羽) joined the Alibaba Cloud kernel team in 2021 and currently studies and works on scheduling and related areas."}]}]}
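
The quota, period, and burst accounting described in the article can be made concrete with a small sketch. This is not kernel code: it is a simplified model that assumes unused quota is saved at the end of each period up to a cap, and that throttled demand waits for the next period. The function name `simulate_throttling` and its parameters are hypothetical, introduced only for illustration.

```python
def simulate_throttling(demands_ms, quota_ms, burst_ms=0.0):
    """Replay per-period CPU demand against a quota with optional burst.

    demands_ms: new CPU time (ms) the cgroup wants in each period.
    quota_ms:   CPU time (ms) refilled at the start of every period.
    burst_ms:   cap on unused quota carried into later periods (0 = no burst).
    Returns (CPU time granted in each period, number of throttled periods).
    """
    saved = 0.0    # unused quota accumulated so far, never above burst_ms
    backlog = 0.0  # throttled demand waiting for a later period
    granted, throttled = [], 0
    for demand in demands_ms:
        want = backlog + demand
        allowance = quota_ms + saved       # quota plus accumulated burst
        run = min(want, allowance)
        backlog = want - run
        if backlog > 0:                    # demand exceeded the allowance
            throttled += 1
        saved = min(burst_ms, allowance - run)
        granted.append(run)
    return granted, throttled


if __name__ == "__main__":
    # Average demand is 250 ms per 100 ms period, with one 700 ms spike.
    demands = [100, 100, 700, 100]
    print(simulate_throttling(demands, quota_ms=400))
    print(simulate_throttling(demands, quota_ms=400, burst_ms=400))
```

With a 400 ms quota and no burst, the 700 ms spike is throttled and spills into the following period; with a 400 ms burst cap, the quota left unused in the first two periods covers the spike and no period is throttled, even though the average demand is the same in both runs.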

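
The queueing model from the "Scenario and Parameter Settings" section can also be sketched as a short Monte Carlo loop. This is a minimal reading of the model as described in the text, not the authors' cpuburst-simulator: the function name `simulate`, the choice of the negative exponential distribution via the standard library, the definition of per-period WCET as max(1, total load), and the small tolerance `eps` are all assumptions made for illustration.

```python
import random


def simulate(m, u_avg, b, periods=20000, seed=0):
    """Estimate (mean WCET in periods, P(WCET > 1 period)).

    m:     number of cgroups, each with quota 1/m of the machine.
    u_avg: mean demand per period as a multiple of the quota.
    b:     token buffer cap as a multiple of the quota (float("inf") allowed).
    """
    rng = random.Random(seed)
    quota = 1.0 / m
    mean_demand = u_avg * quota
    tokens = [0.0] * m    # unused quota saved by each cgroup
    pending = [0.0] * m   # demand each cgroup has not yet submitted
    machine_backlog = 0.0 # submitted work the CPU could not finish
    wcet_sum, violations, eps = 0.0, 0, 1e-9
    for _ in range(periods):
        submitted = 0.0
        for i in range(m):
            # Assumption: negative exponential demand with the given mean.
            pending[i] += rng.expovariate(1.0 / mean_demand)
            allowance = quota + tokens[i]
            run = min(pending[i], allowance)
            pending[i] -= run
            tokens[i] = min(b * quota, allowance - run)
            submitted += run
        load = machine_backlog + submitted
        wcet = max(1.0, load)              # the CPU serves at most 100% per period
        machine_backlog = max(0.0, load - 1.0)
        wcet_sum += wcet
        if wcet > 1.0 + eps:
            violations += 1
    return wcet_sum / periods, violations / periods


if __name__ == "__main__":
    for u in (0.5, 0.7, 0.9):
        print(u, simulate(m=10, u_avg=u, b=2.0))
```

With `b=0` every cgroup submits at most its quota, so the total load never exceeds 100% and the real-time constraint always holds; a larger `b` is what allows simultaneous bursts to push the total above 100%. Numbers from this sketch should not be expected to match the article's tables, which use the authors' tool and their Pareto parameters.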