Elastic Scaling of Kubernetes Clusters in Practice on the vivo AI Computing Platform

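{"type":"doc","content":[{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":"1. Background"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":"At the end of 2018, the vivo AI Research Institute began building an AI computing platform to address pain points such as providing a unified high-performance training environment, supporting large-scale distributed training, and scheduling compute resources efficiently. After two years of continuous iteration, the platform has made substantial progress in both construction and adoption and has become the core infrastructure platform for AI at vivo. Originally focused on deep learning training, it has evolved into three modules, VTraining, VServing, and VContainer, which provide model training, model inference, and containerization capabilities. VContainer is the foundation of the computing platform: a container platform built on Kubernetes that provides core capabilities such as resource scheduling, elastic scaling, and co-located (mixed) deployment. The VContainer cluster has several thousand nodes and more than 100 PFLOPS of GPU compute. It simultaneously runs over a thousand VTraining training jobs, over a hundred VServing inference services, and several hundred online service projects. This article shares how elastic scaling for service deployments was put into practice on the VContainer container platform."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":" 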
2. Overall Architecture"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":"Once services are containerized, containers can be packed densely onto servers, which raises overall cluster resource utilization. After the services had been running stably for some time, however, we found that overall utilization still had substantial room for improvement. Cluster resource usage currently has the following main problems:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":"(1) When requesting resources, online services reserve large buffers to absorb traffic bursts and protect stability, so the requested amount is generally far above actual usage."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":"(2) Most online services show pronounced tidal patterns with clear peaks and troughs, so keeping a large amount of resources permanently allocated is very wasteful."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":"(3) The resource specifications estimated and configured by developers and operators are often unreasonable and are not updated promptly as conditions change."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":"To further improve overall cluster resource utilization and reduce server costs, we evaluated Kubernetes 
HPA autoscaling and put it into practice."}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/81\/81d3ce2ed89a550ab106e55b9d121a28.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":"Kubernetes abstracts the group of containers that make up a service's runtime environment as the Pod resource object, provides a range of workloads (Deployment, StatefulSet, and so on) for deploying Pods, and also provides several resource objects that address elastic scaling and resource provisioning for Pods."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":"2.1 Kubernetes autoscaling"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":"Kubernetes autoscaling offers several mechanisms for scaling Pods automatically:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":"(1) Pod-level autoscaling: this includes the "},{"type":"link","attrs":{"href":"https:\/\/kubernetes.io\/docs\/tasks\/run-application\/horizontal-pod-autoscale\/","title":null,"type":null},"content":[{"type":"text","text":"Horizontal Pod 
Autoscaler"}],"marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":" (HPA) and the "},{"type":"link","attrs":{"href":"https:\/\/github.com\/kubernetes\/autoscaler\/tree\/master\/vertical-pod-autoscaler","title":null,"type":null},"content":[{"type":"text","text":"Vertical Pod Autoscaler"}],"marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":" (VPA). HPA calculates the required number of Pod replicas from the cluster's built-in resource metrics or from custom metrics and scales automatically; VPA calculates appropriate resource requests from a Pod's historical CPU\/Memory usage and evicts the Pod so that its resource configuration can be updated;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":"(2) Node-level autoscaling: the "},{"type":"link","attrs":{"href":"https:\/\/github.com\/kubernetes\/autoscaler\/tree\/master\/cluster-autoscaler","title":null,"type":null},"content":[{"type":"text","text":"Cluster Autoscaler"}],"marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":" (CA) considers information such as Pods stuck in the Pending state and cluster capacity to determine the node resources the cluster needs, and adjusts the number of Nodes accordingly."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":"This article focuses on Pod-level elastic scaling in Kubernetes clusters and how we put it into practice."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":"2.2 
KEDA"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":"Kubernetes HPA natively supports elastic scaling based on CPU\/Memory utilization, and in autoscaling\/v2beta2 it supports scaling on custom metrics through the "},{"type":"link","attrs":{"href":"https:\/\/github.com\/kubernetes\/community\/blob\/master\/contributors\/design-proposals\/instrumentation\/custom-metrics-api.md","title":null,"type":null},"content":[{"type":"text","text":"custom.metrics.k8s.io"}],"marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":" API. While rolling out HPA, we found that scaling on CPU\/Memory utilization alone could not meet business requirements for multi-metric scaling and scaling stability, so we looked closely at HPA scaling on custom metrics. The open-source community has two main projects in this area: one is "},{"type":"link","attrs":{"href":"https:\/\/github.com\/kubernetes-sigs\/prometheus-adapter","title":null,"type":null},"content":[{"type":"text","text":"prometheus-adapter"}],"marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":" and the other is "},{"type":"link","attrs":{"href":"https:\/\/github.com\/kedacore\/keda","title":null,"type":null},"content":[{"type":"text","text":"KEDA"}],"marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":". We ultimately chose KEDA as the foundation of our elastic scaling system, mainly for the following advantages:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":"(1) Feature-rich: it ships with multiple built-in scaling strategies such as CPU, Cron, and Prometheus, and natively supports scaling to zero;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"ind
ent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":"(2) Extensible: it decouples the scaled object (anything that supports the \/scale subresource) from the scaling metric, and provides a powerful plugin mechanism and abstract interfaces (scaler + metrics adapter), so adding a new scaling metric is straightforward;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":"(3) Maintainable: a simple design, unified functionality, and a single component make it easy to maintain;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":"(4) Strong community: it is a CNCF project with strong backing from Microsoft and Red Hat."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":"We built our Kubernetes elastic scaling system on KEDA. The overall workflow is shown below:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":"(1) Pods in the cluster are deployed through several kinds of workloads (Deployment, Argo Rollouts, StatefulSet, and so on). KEDA supports all of them as scale targets, because what it actually acts on is the \/scale subresource;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":"(2) All elastic scaling objects are managed uniformly through KEDA, including the CPU\/Memory utilization scaling built into HPA and custom-metric scaling. Using KEDA's plugin mechanism, we added the GPU utilization scaling and HTTP\/RPC 
QPS scaling that our services urgently needed;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":"(3) Services deployed to the test and staging clusters have scale-to-zero enabled by default, which automatically reclaims compute resources held by test services that have been idle for a long time;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":"(4) On top of KEDA, we did a series of stability work covering multi-strategy autoscaling, scaling timeliness, scaling duration, success rate, and monitoring and alerting."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/55\/5509b9334bd9d447e87200f5b59e2261.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":"3. Multi-Metric Elastic Scaling"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":"When deploying a service on the release platform, teams can configure a suitable elastic scaling policy directly through the web UI. The overall interface is shown below:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/8d\/8d8df24d646c811341d0aabec656807a.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text"
,"marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":"Elastic scaling policies fall into two broad categories:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":"(1) Regular elastic scaling: based on CPU utilization, memory utilization, GPU utilization, or average QPS. Every policy supports a configurable scaling range (minimum replicas <= target replicas <= maximum replicas)."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":"(2) Scheduled scaling: services whose resource usage is strongly periodic with clear tidal patterns can configure scheduled scaling, which keeps the number of application replicas at a specified value during the corresponding time window."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":"3.1 Default CPU\/Memory elastic scaling"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":"For CPU\/Memory elastic scaling we use only average-utilization metrics. The overall data flow is shown in the figure:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/6b\/6be65ee9563746decebcbee226028399.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":"3.2 
Cron scheduled scaling"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":"Online internet services usually show clear traffic peaks and troughs with pronounced tidal patterns. For services whose resource usage is periodic in this way, we generally use scheduled scaling."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":"For an external metric with a per-Pod target (ExternalPerPodMetric), the Kubernetes HPA computes the replica count as: current metric value \/ per-Pod target value, rounded up."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":"KEDA's cron scaler (CronHPA) hard-codes the \"per-Pod target value\" to 1 and reports the replica count configured for the scheduled scaling window as the current metric value. This neat design handles ordinary utilization-based scaling and scheduled scaling through the same mechanism, which is very convenient."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":"3.3 GPU elastic scaling"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":"Some services in the cluster use GPU resources for AI model inference. In this scenario teams care most about GPU utilization and want to scale dynamically on it. Using the KEDA 
scaler plugin mechanism, we developed gpu-scaler, which meets this need well."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":"The overall data flow for GPU elastic scaling is shown in the figure:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/b1\/b1ce74eb511342573434b255ae865c0b.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":"3.4 QPS elastic scaling"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":"Some teams care most about service QPS (HTTP\/RPC) and want to scale dynamically on it. Using the KEDA 
scaler plugin mechanism, we developed qps-scaler, which meets this need well."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":"The overall data flow for QPS elastic scaling is shown in the figure:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/fc\/fc8f0fd8f4437249fbb09764ef8ba946.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":"4. Scale to Zero"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":"Pipeline deployments on the CI\/CD release platform usually span several runtime environments, typically test → staging → production. The test environment is used mainly for development and integration testing; the staging environment is used mainly for low-traffic verification and pre-deployment checks."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":"Once a service has been deployed to the production cluster, its replicas in the test and staging environments usually sit idle, which wastes compute resources, especially for services that use scarce GPU resources. We therefore enabled scale-to-zero by default in the test and staging environments, with the following rules:"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":"4.1 
Java services"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":"A service is scaled to zero if and only if its containers meet both of the following conditions:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":"a. More than 48 hours have passed since the container was created, so a service deployed within the last 2 days is never scaled to zero;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":"b. Request-based scale-to-zero: the total number of requests over the past 64 hours, obtained from the PaaS call-chain (tracing) query API, is 0, meaning the tracing system has no request data for the service in that period;"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":"4.2 GPU services"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":"A service is scaled to zero if and only if its containers meet both of the following conditions:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":"a. More than 48 hours have passed since the container was created, so a service deployed within the last 2 days is never scaled to zero;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":"b. GPU-utilization-based scale-to-zero: according to the container platform's Prometheus, the 
peak GPU utilization of the service over the past 2 days is below 1%;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":"Some services use special custom application-layer protocols (for example, CPU services that do not use HTTP or Dubbo RPC). These services are not yet suitable for scale-to-zero, so the container platform provides an option to turn the feature off."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":"5. Time-Sharing of Resources"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":"Scaling in online services improves resource utilization at the service level, but for the cluster as a whole the freed resources were not being fully used, so cluster-level utilization did not improve correspondingly. To address this, we developed an elastic controller that lets elastic resources in the cluster be fully time-shared. The overall flow is as follows:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":"(1) The elastic controller follows the standard Kubernetes controller pattern. It continuously watches scale-out and scale-in events of online services and automatically moves machines between the online and offline resource pools;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":"(2) When an online service scales in, the elastic controller first makes sure the node is empty (for services that occupy whole machines, the node is empty as soon as they scale in) and then 
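automatically moves the online node to the offline job queue, where the offline job deployment controller automatically submits and runs training jobs;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":"(3) When an online service scales out, the elastic controller supplies the required compute resources as quickly as possible. It first brings in machines from the buffer resource pool; if that is not enough, it evicts offline training jobs (weighing factors such as the priority defined for each job, the maximum number of evictable replicas, and the cost of evicting a whole machine) and then automatically moves the offline nodes to the online service queue;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/65\/653cce7a0bac10e5008f66702a74f727.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":"6. Stability Work"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":"The main goal of elastic scaling is to raise compute resource utilization and lower server costs. At the same time, because elastic scaling dynamically adds and removes application replicas, system stability is critical: replicas must scale as expected, application load must stay within a reasonable range, and services must continue to meet their availability SLOs. We did a series of work around elastic scaling to this end."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":"6.1 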
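Observability of elastic scaling (history and event notifications)"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":"The container platform implements observability for elastic scaling on top of Prometheus metrics. It covers when each scaling action happened, how the replica count changed, service availability during scaling, and the total duration and outcome of each scaling operation. These metrics are aggregated on a single dashboard that clearly shows the details of elastic scaling alongside the closely related service availability. Their historical data is kept in a distributed storage system so that past records can be traced."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":"While application instances scale dynamically, abnormal situations can occur, for example scale-out stalling for a long time because resources are scarce, traffic loss during scale-out because health checks are missing, or application load rising and hurting availability because the scale-in policy is unreasonable. For these cases we improved the corresponding monitoring metrics and alerting, and abnormal events are pushed to the responsible team in real time."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/f1\/f11fc4a23527d532adee18be3916b9e2.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":"6.2 Replica limits + mixed multi-metric scaling"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":"Kubernetes HPA provides minimum and maximum replica options that keep elastic scaling within a fixed range. When developers and operators configure an elastic scaling policy on the release platform, 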
they are required to set both the minimum and the maximum replica count. The container platform provides historical monitoring charts of each service's runtime details, which make it easy to determine a reasonable replica range."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":"Some teams found that scaling on a single metric was not accurate enough. For these cases we recommend mixed multi-metric scaling, for example combining CPU utilization scaling with scheduled scaling, which handles both periodic traffic and resource usage and sudden traffic bursts. Kubernetes HPA computes a target replica count for each scaling policy and takes the maximum, which favors service stability."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":"6.3 Scale out fast, scale in slowly, avoid flapping"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":"The Kubernetes HPA implementation includes mechanisms that control the scaling rate to keep the process smooth and avoid violent fluctuations in the replica count:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":"a. Scaling decision: the HPA controller makes a scaling decision every 15 seconds (--horizontal-pod-autoscaler-sync-period). If the ratio of the current metric value to the target value is within 10% of 1.0 (--horizontal-pod-autoscaler-tolerance), the current replica count is kept, which prevents the replica count from oscillating on small metric changes;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":"b. 
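Scale-out: the principle is “scale out fast”, that is, as quickly as possible. The constant factors in the scale-out logic are hard-coded and cannot be customized, which does not suit some scenarios;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":"c. Scale-in: the principle is “scale in slowly”, that is, as slowly as possible. After a scale-in decision is made, the controller waits 5 minutes (--horizontal-pod-autoscaler-downscale-stabilization) before actually scaling in. This 5-minute window is a cluster-wide default that cannot be set per service;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":"Kubernetes 1.18 ("},{"type":"link","attrs":{"href":"https:\/\/v1-18.docs.kubernetes.io\/docs\/reference\/generated\/kubernetes-api\/v1.18\/#horizontalpodautoscalerbehavior-v2beta2-autoscaling","title":null,"type":null},"content":[{"type":"text","text":"official documentation"}],"marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":") supports fine-grained, per-service control of scaling behavior, so each scenario can get the scaling sensitivity and rate it needs. We can use this feature to adjust the scale-out and scale-in time windows and obtain different behaviors: fast out and slow in, fast out and fast in, slow out and slow in, or slow out and fast in."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":"6.4 Health checks to ensure no traffic loss"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":"Traffic must not be lost while service instances scale dynamically. Kubernetes provides health checks and graceful termination to improve service availability. During the rollout of elastic scaling, the container platform inspects each service's health check configuration and recommends reasonable settings, so that a newly added instance starts receiving traffic only after its health checks pass. The platform also inspects each service's graceful termination configuration; for services where it is missing, the platform automatically injects a default PreStop 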
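lifecycle hook so that in-flight requests are not affected while old instances are taken offline."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":" "}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":"7. Future Plans"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":"Elastic scaling on the container platform has now been adopted by many services: about 40% of services in production have elastic scaling enabled, and about 97% of services in staging have scale-to-zero enabled. Average cluster CPU utilization has risen from under 10% to around 20%, and average GPU utilization from around 10% to around 25%, which substantially reduces server costs. The elastic scaling platform will continue to evolve to cut costs and improve efficiency further."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":"7.1 Service profiling and intelligent recommendations"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":"Using elastic scaling currently places a real cognitive burden on teams. A scaling policy requires the minimum replicas, maximum replicas, target utilization, scheduled time windows, resource requests, and so on. These values are hard to set precisely; developers and operators have to judge them from past experience, and the settings quickly become stale."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":"To address this, the container platform will build a service profiling module. It will combine historical resource usage data, CPI time series, OOM events, and other information, and use an algorithm similar to Google Autopilot's (sliding time windows plus exponentially decaying histograms with a half-life) to predict and recommend the various elastic scaling settings automatically. This will help developers and operators configure scaling policies precisely, further raise resource utilization, and greatly reduce operational cost and cognitive load."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":"7.2 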
Online\/offline co-location"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":"The container platform currently supports only time-sharing of machines between online services and offline jobs, which has high operational cost and limited utilization gains. We will work with the kernel, resource scheduling, and other teams to explore online\/offline co-location."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":"7.3 Stability work"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":"The stability of elastic scaling is critical to service availability, and the container platform will keep investing in it."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":"(1) Service profiling will comprehensively detect and accurately predict 
anomalies such as abnormal load, interference and throttling, and service degradation, and make timely decisions such as eviction or scale-out (horizontal or vertical);"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":"(2) The elastic scaling system will work with the capacity quota system to manage service resource capacity intelligently and at fine granularity, resolving capacity limits encountered during scale-out;"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":"(3) The elastic scaling system will push surrounding systems to further reduce slow or stalled scale-out, for example through image pre-warming and accelerated distribution, in-place release, and faster application startup."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"About the author:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Wang Jie previously worked at Alibaba and is now a senior engineer on the computing platform team in vivo's artificial intelligence department. Areas of interest: Kubernetes, containers, Service Mesh, and other cloud-native technologies."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#000000","name":"black"}}],"text":" "}]}]}
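
Supplementary sketch: the replica arithmetic discussed in sections 3.2, 6.2, and 6.3 can be illustrated with a few lines of Python. This is a simplified illustration under stated assumptions, not the actual code of the Kubernetes HPA controller or of KEDA: all function names are hypothetical, the tolerance check is applied only to the utilization case, and stabilization windows are ignored.

```python
import math

# Assumed default of --horizontal-pod-autoscaler-tolerance (10%).
TOLERANCE = 0.1


def replicas_for_utilization(current_replicas, current_util, target_util,
                             tolerance=TOLERANCE):
    """Utilization metric: scale the current replica count by current/target."""
    ratio = current_util / target_util
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: keep the replica count to avoid flapping.
        return current_replicas
    return math.ceil(current_replicas * ratio)


def replicas_for_per_pod_target(metric_total, per_pod_target):
    """External metric with a per-Pod target: total value / per-Pod target, rounded up."""
    return math.ceil(metric_total / per_pod_target)


def replicas_for_cron(scheduled_replicas):
    """Cron trick from section 3.2: per-Pod target fixed at 1, metric = scheduled replicas."""
    return replicas_for_per_pod_target(scheduled_replicas, 1)


def desired_replicas(proposals, min_replicas, max_replicas):
    """Mixed scaling: take the largest proposal, then clamp to [min, max]."""
    return max(min_replicas, min(max_replicas, max(proposals)))


if __name__ == "__main__":
    cpu = replicas_for_utilization(current_replicas=4, current_util=90, target_util=60)
    cron = replicas_for_cron(scheduled_replicas=8)
    print(desired_replicas([cpu, cron], min_replicas=2, max_replicas=20))
```

In this example the CPU policy proposes 6 replicas and the scheduled policy proposes 8, so the mixed result is 8, which matches the "take the maximum" behavior described in section 6.2.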