Cloud-Native Automation Practice on the vivo AI Computing Platform

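{"type":"doc","content":[{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"1. Background"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"At the end of 2018, the vivo AI Research Institute began building an AI computing platform to address pain points such as a unified high-performance training environment, large-scale distributed training, and efficient scheduling of computing resources. After two years of continuous iteration, the platform has made great progress in construction and adoption and has become a core infrastructure platform for AI at vivo. It started out mainly serving deep learning training and has evolved into three modules, VTraining, VServing, and VContainer, which provide model training, model inference, and containerization capabilities. VContainer is the foundation of the computing platform: a container platform built on Kubernetes, with core capabilities such as resource scheduling, elastic scaling, and mixed (co-located) deployment. VContainer's container clusters have over a thousand nodes and more than 100 PFLOPS of GPU compute. The clusters concurrently run over a thousand VTraining training jobs, more than a hundred VServing inference services, and more than a hundred online service projects. This article shares the automation practice for VContainer's cloud-native base components, from semi-tooled manual maintenance to the adoption of white-screen (GUI-based) workflows."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"2. Early Risks and Pitfalls"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"We started building k8s clusters with rke at the end of 2018, which makes us early users of the rke project. Based on our experience, we divide k8s cluster construction and maintenance into three major steps: machine management, cluster management, and container network management. During implementation we faced some risks and fell into some pitfalls. In the early cluster-building stage, risk is hard to avoid and can appear at every step of a change. "},{"type":"text","marks":[{"type":"strong"}],"text":"But we should not fear risk, nor should we stop making changes because risk exists; we should stay level-headed, respect risk, and put stability first"},{"type":"text","text":"."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Risk 1, multi-cluster scenarios: machine data lacked unified control, and nodes of cluster A ended up being added to cluster B."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Risk 2, cluster nodes being initialized: cluster maintenance had a standard procedure, but different operations in it were carried out with different tools, and cluster nodes were sometimes left behind in the initialization list."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[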
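{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Risk 3, misconfigured changes: across the three steps of cluster construction and maintenance, configuration items were repetitive and complex and the change tools had no validation, so configuration errors occurred, caused failures in underlying components, and affected business systems."}]}]}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"2.1 Machine Management"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Machine management has two parts: data and information management, and machine changes. The risks and pitfalls arose in machine changes. Early on we chose the batch operation tool "},{"type":"link","attrs":{"href":"https:\/\/docs.ansible.com\/","title":"ansible","type":null},"content":[{"type":"text","text":"ansible"}]},{"type":"text","text":". Three kinds of change were very frequent: machine initialization, machine cleanup, and other ad hoc batch operations. Initialization means installing docker and GPU software, configuring the environment, and so on before a machine is added to a k8s cluster; likewise, cleanup means uninstalling docker and clearing the related environment. "},{"type":"link","attrs":{"href":"https:\/\/docs.ansible.com\/ansible\/latest\/cli\/ansible-playbook.html","title":"ansible-playbook","type":null},"content":[{"type":"text","text":"ansible-playbook"}]},{"type":"text","text":" lets us define tasks that group scripts for the same type of operation. We created tasks for the initialization and cleanup scripts, added each series of scripts under the corresponding task, and ran the same command each time to carry out a change. One point needs special attention: running tasks requires a configured machine list, as follows:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"192.168.219.10 ansible_ssh_pass=\"123456\"\n\n192.168.219.11 ansible_ssh_pass=\"123456\"\n\n192.168.219.12 ansible_ssh_pass=\"123456\"\n\n192.168.219.13 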
ansible_ssh_pass=\"123456\"\n\n[rke-prepare]\n\n192.168.219.10\n\n192.168.219.11\n\n192.168.219.12\n\n192.168.219.13"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This carries operational risk: when steps are numerous and repetitive, or several people operate at once, the machine list can contain duplicates, mistakes, or omissions. We hit the following pitfall:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Pitfall 1: an ansible initialization run mistakenly initialized worker or core nodes that were already in a cluster. Because the initialization machine list is configured by hand, concurrent operators or a forgotten edit left cluster nodes in the list, and they were initialized. The consequences are severe: initializing an ordinary worker node affects business containers, and initializing a core node has a wider impact on the availability of the whole cluster. I once initialized a worker node that was running online services, which left the node Not Ready and directly affected the availability of production services. We later added a step to the ansible scripts that checks for k8s cluster nodes: if a machine already has k8s components, initialization is skipped."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"2.2 Cluster Management"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Cluster management has four core operations: cluster creation, scaling, updates, and disaster recovery. As mentioned earlier, "},{"type":"link","attrs":{"href":"https:\/\/rancher.com\/","title":"rancher","type":null},"content":[{"type":"text","text":"rancher"}]},{"type":"text","text":"'s open-source k8s cluster management project "},{"type":"link","attrs":{"href":"https:\/\/github.com\/rancher\/rke","title":"rke","type":null},"content":[{"type":"text","text":"rke"}]},{"type":"text","text":" meets our basic needs. Its "},{"type":"link","attrs":{"href":"https:\/\/www.rancher.cn\/products\/rke\/","title":"official introduction","type":null},"content":[{"type":"text","text":"official introduction"}]},{"type":"text","text":" says: RKE is a CNCF-certified open-source Kubernetes distribution that can run inside Docker containers. It solves the most common installation complexity issues of Kubernetes by removing most host dependencies and providing a stable path for deployment, upgrades, and rollbacks. With RKE, Kubernetes is completely independent of the operating system and platform you are running, which makes automated Kubernetes operations easy. Like other cloud-native projects, rke is written in golang and is a command-line tool. It uses the configuration file "},{"type":"link","attrs":{"href":"https:\/\/rancher.com\/docs\/rke\/latest\/en\/example-yamls\/","title":"cluster.yaml","type":null},"content":[{"type":"text","text":"cluster.yml"}]},{"type":"text","text":" to manage the k8s cluster and maintains cluster state in cluster.rkestate. The rkestate file is a k8s state file that the rke command line manages by itself, so users need not pay much attention to it. cluster.yml is the file users edit to manage the k8s cluster; rke up changes the cluster according to it, for example adding or removing nodes or updating versions. Here is an example cluster.yml:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"nodes:\n  - address: 1.2.3.4\n    user: ubuntu\n    role:\n      - controlplane\n      - etcd\n      - worker\nservices:\n  etcd:\n    image: rancher\/coreos-etcd:v3.3.10-rancher1\n  kube-api:\n    image: rancher\/hyperkube:v1.14.3-rancher1\n    extra_args: {}\n... ...\nnetwork:\n  plugin: calico\n  options:\n    calico_cloud_provider: none\naddons: \"\"\naddons_include: []\n... ..."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The configuration shown is simplified; for a detailed description, see the "},{"type":"link","attrs":{"href":"https:\/\/docs.rancher.cn\/docs\/rke\/example-yamls\/_index","title":"example YAML files","type":null},"content":[{"type":"text","text":"example YAML files"}]},{"type":"text","text":". Although rke lets a single yml file manage a k8s cluster, that file is complex and repetitive, and because we started with an early version of rke we also ran into some pitfalls:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Pitfall 1: when adding a worker node with rke, the node role was mistakenly set to a core role (controlplane or etcd). In the etcd case this triggers a rolling restart of api-server, and services with in-flight requests to api-server are disconnected; services that retry are barely affected. Operations such as kubectl 
logs and exec are interrupted and simply need to be re-run. In the controlplane case, the worker nodes in the cluster restart kubelet and kube-proxy."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Pitfall 2: with early versions of rke, every change printed log messages saying that certificates had changed and a forced update was required. This is a log-output bug in early versions and no cause for alarm; newer versions have fixed it: "},{"type":"link","attrs":{"href":"https:\/\/github.com\/rancher\/rke\/issues\/1405","title":null,"type":null},"content":[{"type":"text","text":"https:\/\/github.com\/rancher\/rke\/issues\/1405"}]}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Pitfall 3: also with early versions, rke up with update-only still operated on every worker node. Occasionally a node stopped responding for a long time, which blocked the whole change and prevented it from finishing. We added an ignore-hosts field so that rke up can skip specified machines."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Pitfall 4: also with early versions, during an etcd disaster-recovery drill we found that rke etcd restore cleaned every node in the k8s cluster and then rebuilt it. Our actual goal was to rebuild the etcd cluster quickly after it goes down, without changing the system components on worker and controlplane nodes or components such as calico and ingress-controller, which would affect business workloads. We therefore modified it: when restoring an etcd cluster, rke etcd restore by default performs only a few basic operations: cleaning the etcd nodes, rebuilding etcd, and rke 
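up."}]}]}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"2.3 Container Network Management"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For container networking we use the calico plugin; flat-network requirements are fulfilled by interacting with the network team's configuration system. In day-to-day maintenance we hit these pitfalls:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Pitfall 1: ippool misconfiguration. When setting up a new cluster, in the ippool creation step I filled the container network field with the host subnet. As a result, strange routing rules were created on the physical hosts, and both the host network and the container network of the k8s cluster were affected to varying degrees. We later used ansible to delete the abnormal routes in bulk."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Pitfall 2: wrong cluster in the flat-network node configuration. Nodes of cluster A were configured on the RR nodes of cluster B. Our checks at the time showed that only the flat-network function of the misconfigured nodes was affected and other nodes were not. Even so, recovering from the misconfiguration was fairly troublesome."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As infrastructure, the k8s clusters run a large number of online services and training jobs. During a cluster change, even a small mistake can make business services directly unavailable, so we must do everything we can to avoid risk and fill in every pothole within our reach. Drawing on traditional operations experience, k8s cluster operations also need automation. "},{"type":"text","marks":[{"type":"strong"}],"text":"Many people's first impression of automation is code, but the essence of automation is standards. Turning complex, repetitive, scattered operations into standardized workflows is the key to automation."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"3. Automation Design Process"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"3.1 Design Approach"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The goals of the first phase of automation are very clear: reduce manual operations, establish standardized workflows, and improve operational efficiency; lower operational risk and improve cluster stability. From what we have explored and practiced so far, the goals of the second phase are also becoming clear: design toward high automation and semi-intelligence, detecting, analyzing, and locating cluster problems and providing ways to recover quickly."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"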
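align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/15\/69\/15873b066d4989430d037e81dae4f669.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"Automation goals"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Given our automation goals and how we use the base components, we focus on the following design points:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Multi-cluster"},{"type":"text","text":": manage information for multiple k8s clusters and all physical machines. In the tooling stage, information about the clusters was scattered, so the first task of automation is to bring the data together. This helps us work out standards for three aspects of automation (workflow, validation, and review) and design the functions each aspect needs."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Standard workflows"},{"type":"text","text":": normalize our repetitive and complex day-to-day cluster change operations, define standard change workflows, and turn them into software and products."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Automatic validation"},{"type":"text","text":": identify the cases in cluster changes that need manual checks, design automated validation steps for them, and make those steps mandatory preconditions in the standard change workflow."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Controllable workflows"},{"type":"text","text":": once a standard workflow is defined, the whole change could be fully automated and completed in one pass. But changes always carry risk and unknowns, so we design manual review and control points before and after each step."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Minimal-change principle"},{"type":"text","text":": because changes always carry risk, we want as few of them as possible: ideally only the changes that are needed, avoiding unrelated ones. For example, when early rke up scaled worker nodes, it still touched core nodes. rke aims to keep the whole cluster healthy and available, so rke up tries to validate and operate on every node in the cluster. From a stability standpoint, we want to change only the worker nodes and leave the core nodes, and any other node that does not need changing, untouched. Later, rke up gained the ability to restrict changes to a role (worker, controlplane, or etcd), and we also customized it so that node scaling operates only on the nodes that need to change and leaves all others alone."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Early automation focused on the everyday 80% of cluster operations work, building white-screen (GUI) support for machine management, cluster management, and container network management. Machine management covers syncing machine information from the CMDB, machine initialization, and machine environment cleanup. Cluster management covers node information visualization, adding and removing nodes, updating nodes, and managing rke configuration and state files. Container network management currently covers the four ippool operations (create, delete, query, update) and the calico flat-network workflow for k8s nodes."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"3.2 Architecture Design"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Following this approach, below is a simple architecture diagram of our automation design. The AutoRke automation platform is what we are building: at the bottom it carries out changes to cloud-native base components such as k8s, calico, and docker, and at the top it integrates with vivo's base platforms for data synchronization, workflow control, and similar functions."}]},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/41\/e0\/41036b1d3b57d2bdf817927f2c8201e0.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"Simple architecture of the automation practice"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"AutoRke"},{"type":"text","text":": a white-screen (GUI) platform that provides standard workflows and integrates rke, ansible 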
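and other command-line tools."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Cloud-native base"},{"type":"text","text":": the objects managed by the AutoRke automation platform: k8s, calico, and docker. It installs and configures the docker environment on physical machines, deploys and manages k8s components through the docker API, and manages calico flat-network configuration and the ippool container address pools."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"vivo base platforms: "},{"type":"text","text":"the key systems that AutoRke's automated workflows depend on. We sync machine information from the CMDB and use single sign-on to verify user permissions. VCalico completes the automated calico flat-network configuration through a work-order (ticket) process. HIC is vivo's machine hardware management system; it is being integrated into the k8s node fault-handling workflow to help us improve stability. The job platform is vivo's system for batch machine operations and is connected to CMDB data; we will use it for machine initialization and quick jobs."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"4. Automation in Practice"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The automation work ultimately produces tooling and systematic products; our goal is a white-screen (GUI) platform. It should manage multiple k8s clusters and all physical machines, consolidate the scattered day-to-day k8s cluster change operations, and provide standard, controllable, auditable workflows for routine k8s cluster changes, improving change efficiency, lowering operational risk, and improving cluster stability."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"4.1 Core Technology"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Building and maintaining k8s clusters is the core of the automation work, and as described earlier we use rke for it. rke deploys k8s clusters with docker, starting the k8s components in containers. Unlike our usual use of docker commands to manage container lifecycles, rke manages containers through the docker service's API. To manage the docker services of many hosts remotely and in bulk, rke builds a tcp connection object over ssh. When it creates the "},{"type":"link","attrs":{"href":"#NewClient","title":null,"type":null},"content":[{"type":"text","marks":[{"type":"underline"}],"text":"docker client"}]},{"type":"text","text":" that operates the docker service on a remote host, it uses that tcp connection object to create the docker client's http client and binds it to docker.sock. As shown in the figure below, rke builds the remote docker 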
client over an ssh connection and accesses the docker service through docker.sock. The bastion host step is rke's design for meeting security requirements: all physical machines can be reached by ssh only through the bastion host."}]},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/b2\/74\/b29c0588bddab50c41a69556c7382774.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"rke workflow"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Certificate management"},{"type":"text","text":" has two key points: certificate distribution and certificate rotation. Certificates are distributed during cluster initialization, master cluster changes, and etcd cluster changes, and distribution is done through containers. Rotation reissues certificates when they are about to expire or have been leaked; it can be configured in cluster.yml or done with rke cert rotate."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"etcd cluster"},{"type":"text","text":": rke implements three important etcd operations: cluster creation, scaling, and data backup and restore. For creation and scaling, configure the etcd nodes in cluster.yml and run rke 
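up to change the etcd cluster. Backup and restore covers routine backups of etcd data and fast recovery of the data and the k8s cluster after a failure. In the figure above, an etcd node starts the etcd and snapshots containers; the other kubelet-related containers are components required by the worker role, meaning an etcd node can also act as a worker and host other services, though we do not recommend it. snapshots is the container that periodically backs up etcd cluster data. The etcd containers are deployed as a static cluster, with the cluster size and node list configured at startup."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Core nodes"},{"type":"text","text":" run containers for three core services: apiserver, scheduler, and controller. Service parameters are configured in cluster.yml, read at startup, and applied to the container runtime configuration. Note that core nodes have no nginx-proxy component: it reverse-proxies connections to apiserver, and components on a core node connect to the local apiserver, so it is not needed. When rke changes the etcd cluster, the certificates and IP list used to reach etcd change, so the apiserver service on each core node must be restarted in sequence to reload the etcd access configuration."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Worker nodes"},{"type":"text","text":": when scaling worker nodes, rke manages the container lifecycle (create, start, restart, delete, query) of three k8s components on each node: kubelet, nginx-proxy, and kube-proxy. When rke changes core nodes, the configuration for reaching them changes, so the services on worker nodes likewise need to be restarted."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Addon deployment"},{"type":"text","text":": addons are optional, and users can deploy them through other k8s deployment methods instead. rke splits addon deployment into three parts: cni, k8sAddons, and userAddons. Deployment uses the k8s client rather than the direct docker 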
API approach. Addons are mostly deployed to the k8s cluster as daemonsets or deployments."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Cluster updates"},{"type":"text","text":": rke has supported k8s cluster updates since version 1.0.0, and cluster.yml supports parameters such as the maximum number of unavailable nodes per role and batch updates. The requirements for updating are fairly strict, though:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1. Only updates between adjacent minor versions or patch versions are supported"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2. Workloads running in the k8s cluster need to support health checks"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"3. k8s role nodes, addon components, and business systems need to be highly available"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"4.2 Implementation Process"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"We planned the work in two steps: customizing the rke tool, and building the AutoRke white-screen platform. Customization addresses the scenarios where rke did not behave as we expected early on; at the same time, studying rke's internals in depth provides the technical basis for the white-screen work. The white-screen stage turns changes to cloud-native components into a platform capability, defines standard workflows, and lowers the barrier and risk of making changes."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"RKE CLI customization"},{"type":"text","text":": on top of the native rke command we added two subcommands, calico and worker, responsible for calico container network management and k8s worker node scaling respectively. These two subcommands support most of our k8s cluster operations work."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1. 
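rke calico supports flat container network configuration: adding, deleting, and expanding. It also supports creating, deleting, querying, updating, enabling, and disabling ippools."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2. rke worker: when scaling worker nodes we want only the minimal change and no unnecessary operations, so the rke worker command changes only the nodes being added or removed and leaves all other nodes untouched."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"AutoRke white-screen platform"},{"type":"text","text":": the CLI approach has shortcomings. It is suited only to one-off operations, cannot serve interactive scenarios, and the change workflow was not fully standardized, so we decided to move the CLI's work onto a white-screen (GUI) platform."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The platform's current functions center on k8s cluster management: machine management, cluster node management, subnet management, and configuration management. Machine management syncs physical machine information by pulling all machine records from the CMDB, which makes routine lookups easy and provides the data basis for cluster changes and future stability work. Cluster node management handles routine changes to k8s worker nodes: scale-out, scale-in, updates, and container network configuration. Subnet management manages container subnets, that is, creating, deleting, querying, and updating calico ippools. Configuration management provides version control for the cluster configuration and state files used by rke."}]},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/4e\/3d\/4ef7a47ba27040b4560367b5bcaf1e3d.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"A quick look at the automation features: machine management"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The figure above shows the machine management page, which centrally manages each machine's hardware, software, k8s, and network information. The machine operations drop-down on the far right currently supports three functions: adding, removing, and updating k8s cluster nodes. Here is a brief walkthrough of adding nodes to a cluster; because there are many steps, we describe the process in text:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1. 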
On the machine management page, select the machines to add to the cluster and create the add-machine configuration, choosing either the default or a custom configuration."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2. Run the add-machine validation to confirm that the machines can be added to the k8s cluster"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"3. Initialize the target machines: install the required software and drivers and configure the docker runtime environment"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"4. Add the machines to the k8s cluster, sync machine labels and taints, and generate the calico network configuration"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"5. Create a calico flat-network work order by submitting a container network configuration ticket to the VCalico system"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"6. Create the ippool container subnet, calling the calico SDK to configure the required subnet information"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"7. Listen for the VCalico callback and update the node's container network labels"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"4.3 Adoption"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"It is in use on the vivo 
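AI computing platform, across 4 k8s clusters and over a thousand physical machines. Once newly built clusters are onboarded, the number of physical machines will reach several thousand. The work followed two iteration stages: rke customization and the AutoRke white-screen platform. Customization modified the native rke command-line tool to provide functions that fit our scenarios. The white-screen stage put the customized functions and change workflows behind a GUI; since going live last December, it has completed more than 100 k8s cluster change work orders."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"4.4 Improvements"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"We also made some optimizations for pain points that came up in use:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Retry on failure"},{"type":"text","text":": within a single node change workflow, some nodes may return failures. The workflow now supports retrying the failed nodes, which improves the user experience and makes handling abnormal cases more efficient."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Workflow splitting"},{"type":"text","text":": in calico flat-network configuration we interact with VCalico through work orders and callbacks. At first we intended the steps after ticket submission to run without human intervention. In practice, when VCalico sends its callback, a k8s administrator really needs to confirm the ippool information before it is created. The ticket for a container network request also needs manual verification, rather than being submitted immediately after the configuration is generated automatically."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"5. Future Plans"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The initial stage of automation delivered white-screen (GUI) functions for the routine operations of cloud-native base components, improving efficiency, lowering operational risk, and to some extent improving the stability of those components. Going forward, we hope to enrich the automation features and explore semi-intelligent directions, focusing on automation for the stability and availability of cloud-native base components."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Inspection"},{"type":"text","text":": automatically detect problems and risk points in k8s clusters"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","a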
ttrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Self-healing"},{"type":"text","text":": automatic analysis and localization of alerts and faults, with fast recovery"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Updates"},{"type":"text","text":": workflows for base component version updates, machine upgrades, and so on"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"About the author:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Liang Dazhao (梁大釗) previously worked at Baidu, Venustech (啓明星辰), and other companies and is now a senior engineer on the vivo AI computing platform team, contributing to scheduling, container networking, and automation on the platform."}]}]}
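The pre-initialization check from section 2.1 and the automatic validation idea from section 3.1 can be sketched roughly as follows. This is a minimal illustration in Python, not AutoRke's actual code: the names `precheck_init_hosts` and `PrecheckResult` are hypothetical, and the sketch assumes the list of current cluster nodes has already been fetched from somewhere (for example the CMDB or the cluster itself) and is passed in as plain strings.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class PrecheckResult:
    """Outcome of validating a requested initialization host list (hypothetical type)."""
    to_initialize: List[str]       # hosts that are safe to initialize
    duplicates: List[str]          # hosts that appeared more than once in the request
    already_in_cluster: List[str]  # hosts that belong to a cluster and must be skipped


def precheck_init_hosts(requested: Iterable[str],
                        cluster_nodes: Iterable[str]) -> PrecheckResult:
    """Split a requested host list into safe, duplicated, and in-cluster hosts.

    Assumption: `cluster_nodes` is the full set of addresses that currently
    belong to any managed k8s cluster.
    """
    in_cluster = set(cluster_nodes)
    seen = set()
    safe, duplicates, skipped = [], [], []
    for host in requested:
        if host in seen:
            # Report each repeated host once and never process it twice.
            if host not in duplicates:
                duplicates.append(host)
            continue
        seen.add(host)
        if host in in_cluster:
            skipped.append(host)   # never initialize a node that is already in a cluster
        else:
            safe.append(host)
    return PrecheckResult(safe, duplicates, skipped)


result = precheck_init_hosts(
    ["192.168.219.10", "192.168.219.11", "192.168.219.10", "192.168.219.12"],
    cluster_nodes=["192.168.219.12"],
)
print(result)
```

A change workflow built on this idea would run the check as a mandatory precondition and proceed only with `to_initialize`, surfacing the other two lists to the operator for review.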