How to Customize Features for Kubernetes

{"type":"doc","content":[
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Kubernetes is a very complex cluster orchestration system. Yet even with its rich set of features, it cannot meet the needs of every scenario, because scheduling and managing containers is inherently complex. Kubernetes solves the common problems in most scenarios, but to implement more flexible policies we need to use the extension mechanisms it provides for specific purposes."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Every project focuses on different features at different points in its life cycle. We can roughly divide a project's evolution into three stages:"}]},
{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Minimum viable"},{"type":"text","text":": In its early days a project tends to solve general, common problems and offers an out-of-the-box solution to attract users. The codebase is still relatively small and the feature set limited, covering 90% of the scenarios in its domain;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Feature complete"},{"type":"text","text":": As the project gains more users and supporters, the community keeps implementing the more important features, and community governance and automation tooling mature, covering 95% of the scenarios in its domain;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Extensibility"},{"type":"text","text":": Once the community has matured and the codebase has grown large, every change affects downstream developers and every new feature has to be discussed and approved by community members. At this point the community chooses to improve the project's extensibility so that users can customize it for their own scenarios, covering 99% of the scenarios in its domain;"}]}]}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/bc\/bcba9f3e4fcec265d260719f703115a9.png","alt":"evolving-of-open-source-project","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Figure 1 - The evolution of an open source project"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Each step from 90% to 95% to 99% costs community members a great deal of effort, yet even good extensibility cannot solve every problem in the domain. In some extreme scenarios you still have to maintain your own fork or start from scratch to meet business needs."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"However, both maintaining a fork and starting from scratch carry high development and maintenance costs, so the choice has to be weighed against actual needs. Making use of the configuration and extension capabilities the project already provides can significantly reduce the cost of customization, and Kubernetes extensibility is what this article walks through."}]},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"API Extensions"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The API server is a core component of Kubernetes and handles all reads and writes of resources in the cluster. The resources and interfaces provided by the community cover most day-to-day needs, but some scenarios still require extending the API server. This section briefly introduces several ways to do so."}]},
{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Custom Resources"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Custom resources, defined through CustomResourceDefinitions (CRDs), are probably the most common way to extend Kubernetes"},{"type":"sup","content":[{"type":"text","text":"1"}]},{"type":"text","text":", and one of the ways to extend the Kubernetes API. The Kubernetes API is essentially the YAML we submit to the cluster; the components in the system launch applications, create network routing rules, and run workloads according to the submitted YAML."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"codeblock","attrs":{"lang":null},"content":[{"type":"text","text":"\napiVersion: v1\nkind: Pod\nmetadata:\n  name: static-web\n  labels:\n    role: myrole\nspec:\n  containers:\n    - name: web\n      image: nginx\n      ports:\n        - name: web\n          containerPort: 80\n          protocol: TCP\n"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"Pod"}]},{"type":"text","text":", "},{"type":"codeinline","content":[{"type":"text","text":"Service"}]},{"type":"text","text":", and "},{"type":"codeinline","content":[{"type":"text","text":"Ingress"}]},{"type":"text","text":" are all interfaces that Kubernetes exposes. When we submit the YAML above to the cluster, the controllers in Kubernetes create containers that match the configuration."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"codeblock","attrs":{"lang":null},"content":[{"type":"text","text":"apiVersion: apiextensions.k8s.io\/v1\nkind: CustomResourceDefinition\nmetadata:\n  name: crontabs.stable.example.com\nspec:\n  group: stable.example.com\n  versions:\n    - name: v1\n      served: true\n      storage: true\n      schema:\n        openAPIV3Schema:\n          type: object\n          properties:\n            spec:\n              type: object\n              properties:\n                cronSpec:\n                  type: string\n                image:\n                  type: string\n                replicas:\n                  type: integer\n  scope: Namespaced\n  names:\n    plural: crontabs\n    singular: crontab\n    kind: CronTab\n    shortNames:\n      - ct"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Beyond the built-in APIs, implementing a customized interface requires a CRD such as the one above. But a CRD is only the tip of the iceberg when implementing a custom resource, because it defines only the fields of the resource. We also need to follow the Kubernetes controller pattern and implement an Operator that consumes the custom resource, combining the resources Kubernetes provides to build more complex, higher-level functionality."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/98\/98b1509a245a898003966d75f20287bf.png","alt":"modular-kubernetes-api","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Figure 2 - Modular design of the Kubernetes API"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As the figure above shows, built-in components such as the Kubernetes controllers consume resources like Deployment and StatefulSet, while user-defined custom resources are consumed by controllers the users implement themselves. This design greatly reduces coupling between the modules of the system and lets different modules work together seamlessly."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"When we want a Kubernetes cluster to provide more complex functionality, CRDs plus controllers are the first approach to consider. It is very loosely coupled to existing functionality and quite flexible, but when defining the interface we should follow the community's API conventions to design a clean API"},{"type":"sup","content":[{"type":"text","text":"2"}]},{"type":"text","text":"."}]},
{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Aggregation Layer"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The Kubernetes API aggregation layer was implemented in v1.7. Its goal is to split the monolithic API server into multiple aggregated services: any developer can implement an aggregated API server that exposes the interfaces they need, without recompiling any Kubernetes code"},{"type":"sup","content":[{"type":"text","text":"3"}]},{"type":"text","text":"."}]},
{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/f8\/f8a41c4d4dca3cdc41265fe4a142e493.png","alt":"kubernetes-api-aggregation","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Figure 3 - Kubernetes API aggregation"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To add a new aggregated API service to the cluster, we submit an APIService resource that describes the API group, the version, and the service that handles the API. Below is the APIService for metrics-server from the Kubernetes community:"}]},
{"type":"codeblock","attrs":{"lang":null},"content":[{"type":"text","text":"apiVersion: apiregistration.k8s.io\/v1\nkind: APIService\nmetadata:\n  name: v1beta1.metrics.k8s.io\nspec:\n  service:\n    name: metrics-server\n    namespace: kube-system\n  group: metrics.k8s.io\n  version: v1beta1\n  insecureSkipTLSVerify: true\n  groupPriorityMinimum: 100\n  versionPriority: 100"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Once this resource is submitted to the cluster, requests to the API server under the \/apis\/metrics.k8s.io\/v1beta1 path are forwarded to the metrics-server.kube-system.svc service in the cluster."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Compared with the widely used CRD, the API aggregation mechanism is relatively rare in practice. Its main purpose is still to extend the API server, and most clusters have no such need, so we will not cover it in more detail here."}]},
{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Admission Control"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The Kubernetes admission control mechanism can mutate and validate resources that are about to be persisted by the API server. Every write request the API server receives passes through the stages shown below before being persisted to etcd"},{"type":"sup","content":[{"type":"text","text":"4"}]},{"type":"text","text":":"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/5e\/5ef07e006d54d0e6e947f103fa6c73e2.png","alt":"kubernetes-admission-control","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Figure 4 - Kubernetes admission control"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The Kubernetes repository contains more than 20 admission control plugins"},{"type":"sup","content":[{"type":"text","text":"5"}]},{"type":"text","text":". We use the TaintNodesByCondition plugin"},{"type":"sup","content":[{"type":"text","text":"6"}]},{"type":"text","text":" as an example to briefly explain how they work:"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"func (p *Plugin) Admit(ctx context.Context, a admission.Attributes, o admission.ObjectInterfaces) error {\n\tif a.GetResource().GroupResource() != nodeResource || a.GetSubresource() != \"\" {\n\t\treturn nil\n\t}\n\n\tnode, ok := a.GetObject().(*api.Node)\n\tif !ok {\n\t\treturn admission.NewForbidden(a, fmt.Errorf(\"unexpected type %T\", a.GetObject()))\n\t}\n\n\taddNotReadyTaint(node)\n\treturn nil\n}"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Every admission plugin can implement the Admit method above to modify resources that are about to be written to storage, which is the mutating stage mentioned earlier. This code adds the NotReady taint to every incoming node so that no workloads are scheduled onto the node while it is being updated. Besides Admit, a plugin can also implement the Validate method to check the validity of the incoming resource."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Implementing a custom admission controller in Kubernetes is comparatively involved. We need to build an API service that implements the admission interface and register its address and endpoints with the cluster through MutatingWebhookConfiguration and ValidatingWebhookConfiguration resources. The Kubernetes API server then calls the services defined in these webhook configurations to mutate and validate resources when they are modified. Istio, a popular service mesh in the Kubernetes community, uses this feature to implement some of its functionality"},{"type":"sup","content":[{"type":"text","text":"7"}]},{"type":"text","text":"."}]},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Container Interfaces"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As a container orchestration system, the main job of Kubernetes is still to schedule and manage the containers running in the cluster. It does not need to implement a new container runtime from scratch, but because networking and storage are essential for running containers, it has to interact with those modules. Kubernetes chose to design networking, storage, and runtime interfaces that hide the implementation details, so that it can focus on container orchestration and let third-party communities implement these complex, highly specialized modules."}]},
{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Network Plugins"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The Container Network Interface (CNI) consists of a set of interfaces and a framework for developing plugins that configure network interfaces in Linux containers. CNI concerns itself only with the network connectivity of containers and with reclaiming all allocated network resources when a container is deleted"},{"type":"sup","content":[{"type":"text","text":"8"}]},{"type":"text","text":"."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/f7\/f74f872ca4ed5473c808171334135fc4.png","alt":"cni-banner","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Figure 5 - Container Network Interface"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Although CNI plugins are closely tied to Kubernetes, other container management systems such as Mesos and Cloud Foundry can also use CNI plugins to create and manage networks."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Every CNI plugin should be a binary executable that implements the ADD, DEL, and CHECK operations; the container management system runs the binary to set up the network"},{"type":"sup","content":[{"type":"text","text":"9"}]},{"type":"text","text":"."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In Kubernetes, whichever network plugin is used must follow its network model. In addition to requiring every Pod to have its own IP address, Kubernetes places the following requirement on the network model:"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A Pod on any node can reach every Pod on every node without NAT;"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"type CNI interface {\n\tAddNetworkList(ctx context.Context, net *NetworkConfigList, rt *RuntimeConf) (types.Result, error)\n\tCheckNetworkList(ctx context.Context, net *NetworkConfigList, rt *RuntimeConf) error\n\tDelNetworkList(ctx context.Context, net *NetworkConfigList, rt *RuntimeConf) error\n\tGetNetworkListCachedResult(net *NetworkConfigList, rt *RuntimeConf) (types.Result, error)\n\tGetNetworkListCachedConfig(net *NetworkConfigList, rt *RuntimeConf) ([]byte, *RuntimeConf, error)\n\n\tAddNetwork(ctx context.Context, net *NetworkConfig, rt *RuntimeConf) (types.Result, error)\n\tCheckNetwork(ctx context.Context, net *NetworkConfig, rt *RuntimeConf) error\n\tDelNetwork(ctx context.Context, net *NetworkConfig, rt *RuntimeConf) error\n\tGetNetworkCachedResult(net *NetworkConfig, rt *RuntimeConf) (types.Result, error)\n\tGetNetworkCachedConfig(net *NetworkConfig, rt *RuntimeConf) ([]byte, *RuntimeConf, error)\n\n\tValidateNetworkList(ctx context.Context, net *NetworkConfigList) ([]string, error)\n\tValidateNetwork(ctx context.Context, net *NetworkConfig) ([]string, error)\n}"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Developing a CNI plugin is far removed from the daily work of most engineers. Normally we only need to pick one of the common open source options according to our needs, such as Flannel, Calico, or Cilium. When a cluster grows very large, network engineers will naturally work with Kubernetes developers to build a suitable plugin."}]},
{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Storage Plugins"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The Container Storage Interface (CSI) was introduced in Kubernetes v1.9 and became stable in v1.13. Common container orchestration systems, namely Kubernetes, Cloud Foundry, Mesos, and Nomad, have all chosen this interface to extend the storage capabilities of containers in the cluster."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/1f\/1fb6be7551b730a67e18f48c754f6328.webp","alt":"csi-banner","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Figure 6 - Container Storage Interface"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"CSI is a standard for exposing block and file storage to containerized workloads on container orchestration systems. Third-party storage providers can offer new storage in a Kubernetes cluster by implementing a CSI plugin"},{"type":"sup","content":[{"type":"text","text":"10"}]},{"type":"text","text":"."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The Kubernetes team gives best practices for developing and deploying CSI plugins in the CSI documentation"},{"type":"sup","content":[{"type":"text","text":"11"}]},{"type":"text","text":". The main work is to build a containerized application that implements the Identity and Node services and, optionally, the Controller service, and to verify the plugin's conformance with the official sanity package. The interfaces to implement are all defined in the CSI specification"},{"type":"sup","content":[{"type":"text","text":"12"}]},{"type":"text","text":"."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"service Identity {\n  rpc GetPluginInfo(GetPluginInfoRequest)\n    returns (GetPluginInfoResponse) {}\n\n  rpc GetPluginCapabilities(GetPluginCapabilitiesRequest)\n    returns (GetPluginCapabilitiesResponse) {}\n\n  rpc Probe (ProbeRequest)\n    returns (ProbeResponse) {}\n}\n\nservice Controller {\n  ...\n}\n\nservice Node {\n  ...\n}"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The CSI specification is very detailed. Besides defining the request and response parameters of every RPC, it also defines the gRPC error code each RPC should return for each kind of error, so implementing a plugin that fully conforms to CSI is still a lot of work."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Earlier versions of Kubernetes integrated directly with the storage interfaces of several cloud vendors, including Google PD, AWS, Azure, and OpenStack. As CSI matures, the community plans to remove these vendor-specific implementations from upstream, which lowers the upstream maintenance cost and lets each vendor iterate on and support its own storage more quickly"},{"type":"sup","content":[{"type":"text","text":"13"}]},{"type":"text","text":"."}]},
{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Runtime Interface"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The Container Runtime Interface (CRI) is a set of gRPC interfaces for managing container runtimes and images. Introduced in Kubernetes v1.5, it lets the kubelet work with different container runtimes."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/0b\/0b4d00c8d947095fb5f1174e9eddca5b.png","alt":"cri-and-container-runtimes","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Figure 7 - CRI and container runtimes"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"CRI mainly defines a set of gRPC methods. In the specification we find two services, RuntimeService and ImageService"},{"type":"sup","content":[{"type":"text","text":"14"}]},{"type":"text","text":", whose names describe their roles well:"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"service RuntimeService {\n  rpc Version(VersionRequest) returns (VersionResponse) {}\n\n  rpc RunPodSandbox(RunPodSandboxRequest) returns (RunPodSandboxResponse) {}\n  rpc StopPodSandbox(StopPodSandboxRequest) returns (StopPodSandboxResponse) {}\n  rpc RemovePodSandbox(RemovePodSandboxRequest) returns (RemovePodSandboxResponse) {}\n  rpc PodSandboxStatus(PodSandboxStatusRequest) returns (PodSandboxStatusResponse) {}\n  rpc ListPodSandbox(ListPodSandboxRequest) returns (ListPodSandboxResponse) {}\n\n  rpc CreateContainer(CreateContainerRequest) returns (CreateContainerResponse) {}\n  rpc StartContainer(StartContainerRequest) returns (StartContainerResponse) {}\n  rpc StopContainer(StopContainerRequest) returns (StopContainerResponse) {}\n  rpc RemoveContainer(RemoveContainerRequest) returns (RemoveContainerResponse) {}\n  rpc ListContainers(ListContainersRequest) returns (ListContainersResponse) {}\n  rpc ContainerStatus(ContainerStatusRequest) returns (ContainerStatusResponse) {}\n  rpc UpdateContainerResources(UpdateContainerResourcesRequest) returns (UpdateContainerResourcesResponse) {}\n  rpc ReopenContainerLog(ReopenContainerLogRequest) returns (ReopenContainerLogResponse) {}\n\n  rpc ExecSync(ExecSyncRequest) returns (ExecSyncResponse) {}\n  rpc Exec(ExecRequest) returns (ExecResponse) {}\n  rpc Attach(AttachRequest) returns (AttachResponse) {}\n  rpc PortForward(PortForwardRequest) returns (PortForwardResponse) {}\n\n  ...\n}\n\nservice ImageService {\n  rpc ListImages(ListImagesRequest) returns (ListImagesResponse) {}\n  rpc ImageStatus(ImageStatusRequest) returns (ImageStatusResponse) {}\n  rpc PullImage(PullImageRequest) returns (PullImageResponse) {}\n  rpc RemoveImage(RemoveImageRequest) returns (RemoveImageResponse) {}\n  rpc ImageFsInfo(ImageFsInfoRequest) returns (ImageFsInfoResponse) {}\n}"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The container runtime interface is comparatively simple. The services above expose Pod sandbox management, container management, command execution, and port forwarding, together with several methods for managing images. A container runtime only has to implement these twenty to thirty methods to serve the kubelet."}]},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Device Plugins"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"CPU, memory, and disk are the common resources on a host. With the development of big data, machine learning, and hardware, however, some scenarios need heterogeneous computing resources such as GPUs and FPGAs. Heterogeneous resources need support from the node agent, the kubelet, as well as cooperation from the scheduler. To stay compatible with the different computing devices that keep appearing, the Kubernetes community introduced device plugins upstream to support scheduling and allocating many types of resources"},{"type":"sup","content":[{"type":"text","text":"15"}]},{"type":"text","text":"."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/fd\/fdb04a9975fab84deeb49f5e36a694c6.png","alt":"device-plugin-overview","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Figure 8 - Device plugin overview"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A device plugin is a service that runs separately from the kubelet. It registers itself through the Registration service exposed by the kubelet and implements the DevicePlugin service, which is used to watch and allocate custom devices"},{"type":"sup","content":[{"type":"text","text":"16"}]},{"type":"text","text":"."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"service Registration {\n  rpc Register(RegisterRequest) returns (Empty) {}\n}\n\nservice DevicePlugin {\n  rpc GetDevicePluginOptions(Empty) returns (DevicePluginOptions) {}\n  rpc ListAndWatch(Empty) returns (stream ListAndWatchResponse) {}\n  rpc Allocate(AllocateRequest) returns (AllocateResponse) {}\n  rpc GetPreferredAllocation(PreferredAllocationRequest) returns (PreferredAllocationResponse) {}\n  rpc PreStartContainer(PreStartContainerRequest) returns (PreStartContainerResponse) {}\n}"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"When a device plugin starts, it calls the kubelet's registration interface with its version, Unix socket, and resource name, for example nvidia.com\/gpu. The kubelet then talks to the device plugin over that Unix socket: it uses ListAndWatch to keep receiving the latest state of the devices and calls Allocate to assign devices when a Pod requests them. The implementation logic of a device plugin is relatively simple; interested readers can study how the NVIDIA GPU plugin works"},{"type":"sup","content":[{"type":"text","text":"17"}]},{"type":"text","text":"."}]},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Scheduling Framework"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The scheduler is one of the core components of Kubernetes. Its main job is to make the best scheduling decision for a workload among the nodes in the cluster. Scheduling requirements vary widely and are often complex, yet in the early days of the project the scheduler offered no easy-to-use extension mechanism; it supported only the scheduler extender, which is rather awkward to use."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The scheduling framework, introduced in Kubernetes v1.15, is the mainstream way to extend the scheduler today. It abstracts key extension points inside the Kubernetes scheduler and lets plugins registered at those points change the scheduling decisions the scheduler makes"},{"type":"sup","content":[{"type":"text","text":"18"}]},{"type":"text","text":"."}]},
{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/fd\/fdde4628608cb9dd0e5387739ca4991e.png","alt":"scheduling-framework-extensions","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Figure 9 - Scheduling framework extension points"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The scheduling framework currently supports 11 extension points, each corresponding to an interface defined in the Kubernetes scheduler. Here we show only the methods of two common interfaces, FilterPlugin and ScorePlugin"},{"type":"sup","content":[{"type":"text","text":"19"}]},{"type":"text","text":":"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"type FilterPlugin interface {\n\tPlugin\n\tFilter(ctx context.Context, state *CycleState, pod *v1.Pod, nodeInfo *NodeInfo) *Status\n}\n\ntype ScoreExtensions interface {\n\tNormalizeScore(ctx context.Context, state *CycleState, p *v1.Pod, scores NodeScoreList) *Status\n}\n\ntype ScorePlugin interface {\n\tPlugin\n\tScore(ctx context.Context, state *CycleState, p *v1.Pod, nodeName string) (int64, *Status)\n\tScoreExtensions() ScoreExtensions\n}"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The scheduling framework makes it easier to implement complex scheduling policies and algorithms. The community has used it to replace the older predicates and priorities, and has implemented more powerful plugins such as coscheduling and capacity-based scheduling"},{"type":"sup","content":[{"type":"text","text":"20"}]},{"type":"text","text":". Although the framework is now very flexible, a serial scheduler may not meet the needs of very large clusters, and running multiple schedulers is still hard in Kubernetes today. It remains to be seen whether more flexible interfaces will be provided in the future."}]},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Summary"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Nearly seven years have passed since Kubernetes was released in 2014. It has grown from a minimum viable orchestration system into today's behemoth, and every code contributor and community member has had a hand in that. This article shows that as the project has evolved, the community has paid more and more attention to extensibility: by designing interfaces and removing third-party code, it reduces the burden on community members and lets Kubernetes focus on orchestrating and scheduling containers."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1. Custom Resources · Kubernetes https:\/\/kubernetes.io\/docs\/concepts\/extend-kubernetes\/api-extension\/custom-resources\/"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2. API Conventions · Kubernetes Community https:\/\/github.com\/kubernetes\/community\/blob\/master\/contributors\/devel\/sig-architecture\/api-conventions.md"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"3. Aggregated API Servers · Kubernetes Community https:\/\/github.com\/kubernetes\/community\/blob\/master\/contributors\/design-proposals\/api-machinery\/aggregated-api-servers.md"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"4. Using Admission Controllers https:\/\/kubernetes.io\/docs\/reference\/access-authn-authz\/admission-controllers\/"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"5. kubernetes\/plugin\/pkg\/admission\/ · Kubernetes https:\/\/github.com\/kubernetes\/kubernetes\/tree\/master\/plugin\/pkg\/admission"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"6. kubernetes\/plugin\/pkg\/admission\/nodetaint\/admission.go · Kubernetes https:\/\/github.com\/kubernetes\/kubernetes\/blob\/master\/plugin\/pkg\/admission\/nodetaint\/admission.go"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"7. Dynamic Admission Webhooks Overview https:\/\/istio.io\/latest\/docs\/ops\/configuration\/mesh\/webhook\/"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"8. CNI - the Container Network Interface https:\/\/github.com\/containernetworking\/cni"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"9. Container Network Interface Specification https:\/\/github.com\/containernetworking\/cni\/blob\/master\/SPEC.md"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"10. Kubernetes Container Storage Interface (CSI) Documentation https:\/\/kubernetes-csi.github.io\/docs\/#kubernetes-container-storage-interface-csi-documentation"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"11. Recommended Mechanism (for Developing and Deploying a CSI driver for Kubernetes) https:\/\/kubernetes-csi.github.io\/docs\/#recommended-mechanism-for-developing-and-deploying-a-csi-driver-for-kubernetes"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"12. RPC Interface · Container Storage Interface (CSI) https:\/\/github.com\/container-storage-interface\/spec\/blob\/master\/spec.md#rpc-interface"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"13. In-tree Storage Plugin to CSI Migration Design Doc https:\/\/github.com\/kubernetes\/community\/blob\/master\/contributors\/design-proposals\/storage\/csi-migration.md"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"14. Container Runtime Interface (CRI) – a plugin interface which enables kubelet to use a wide variety of container runtimes. https:\/\/github.com\/kubernetes\/cri-api\/blob\/master\/pkg\/apis\/runtime\/v1\/api.proto"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"15. Device Plugins https:\/\/kubernetes.io\/docs\/concepts\/extend-kubernetes\/compute-storage-net\/device-plugins\/"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"16. API Specification · Device Manager Proposal https:\/\/github.com\/kubernetes\/community\/blob\/master\/contributors\/design-proposals\/resource-management\/device-plugin.md#api-specification"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"17. k8s-device-plugin https:\/\/github.com\/NVIDIA\/k8s-device-plugin"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"18. Scheduling Framework https:\/\/github.com\/kubernetes\/enhancements\/tree\/master\/keps\/sig-scheduling\/624-scheduling-framework"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"19. kubernetes\/pkg\/scheduler\/framework\/interface.go https:\/\/github.com\/kubernetes\/kubernetes\/blob\/master\/pkg\/scheduler\/framework\/interface.go"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"20. Repository for out-of-tree scheduler plugins based on scheduler framework. https:\/\/github.com\/kubernetes-sigs\/scheduler-plugins"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Reprinted from: "},{"type":"link","attrs":{"href":"https:\/\/draveness.me\/","title":"xxx","type":null},"content":[{"type":"text","text":"面向信仰編程"}]}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Original article: "},{"type":"link","attrs":{"href":"https:\/\/draveness.me\/cloud-native-kubernetes-extension\/","title":"xxx","type":null},"content":[{"type":"text","text":"How to Customize Features for Kubernetes"}]}]}
]}