How Do You Monitor a Kubernetes Cluster Professionally?
{"type":"doc","content":[{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Introduction"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Kubernetes is being adopted ever more widely in production, and deployments are growing more complex, so keeping them stable is becoming increasingly challenging."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Building a comprehensive, in-depth observability architecture is one of the key factors in improving system stability. ACK distills observability best practices and exposes them to users as Alibaba Cloud product features. Observability tools and services thus become part of the infrastructure, helping users make use of these features and improving both the stability and the user experience of their Kubernetes clusters."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This article describes how to build an observability system for Kubernetes, along with best practices for building one on Alibaba Cloud products."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Observability Architecture for Kubernetes"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Kubernetes poses the following observability challenges:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Architectural complexity of K8s."},{"type":"text","text":" The system consists of a control plane and a data plane, each made up of multiple components that communicate with one another; the control plane and the data plane are bridged and aggregated through kube-apiserver."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Dynamism."},{"type":"text","text":" Resources such as Pods and Services are created dynamically and assigned IPs, and a recreated Pod is assigned new resources and a new 
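IP, so monitoring targets have to be obtained through dynamic service discovery."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Microservice architecture."},{"type":"text","text":" Applications are decomposed into multiple components following a microservice architecture, and the replica count of each component can be controlled automatically through elastic scaling or manually."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Given these challenges, and especially when cluster size is growing rapidly, efficient and reliable observability for Kubernetes is the cornerstone of system stability."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"So how do you build and improve observability for Kubernetes in a production environment?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Observability approaches for Kubernetes include metrics, logs, distributed tracing, Kubernetes Events, and the NPD (node-problem-detector) framework. Each offers a view of the system's state and data from a different dimension. In production, you usually need to combine several of them, and sometimes correlate them with one another, to form a complete, multi-layered observability system that covers more scenarios and thereby improves the overall stability of the Kubernetes system. The following sections outline observability solutions for K8s in production."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Metrics"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Prometheus is the de facto industry standard for collecting metrics data. It is an open-source system monitoring and alerting framework inspired by Google's Borgmon monitoring system. It was created in 2012 by former Google employees working at SoundCloud and developed as an open-source community project. The project was officially released in 2015, and in 2016 Prometheus joined the Cloud Native Computing Foundation (CNCF)."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Prometheus 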
has the following features:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A multidimensional data model (time series identified by key-value pairs)"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"PromQL, a flexible query and aggregation language"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Local storage, plus support for remote (distributed) storage backends"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Time series collection through an HTTP-based pull model"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Push mode supported through Pushgateway (an optional Prometheus intermediary)"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Targets discovered through dynamic service discovery or static configuration"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Support for a variety of graphs and dashboards"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Prometheus periodically scrapes the metrics that components expose at the \/metrics path of an HTTP(S) endpoint and stores them in its TSDB, enabling PromQL-based 
querying and aggregation."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Metrics in a Kubernetes environment can be classified as follows:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Basic container resource metrics"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The source is cAdvisor, which is built into the kubelet and provides container metrics for memory, CPU, network, file system, and so on. Sample metrics include:"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Current container memory usage in bytes: container_memory_usage_bytes;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Bytes received over the container network: container_network_receive_bytes_total;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Bytes transmitted over the container network: container_network_transmit_bytes_total, and so on."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Kubernetes node resource metrics"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The source is node_exporter, which provides metrics about the node's operating system and hardware. Sample metrics include total node memory node_memory_MemTotal_bytes, node file system size node_filesystem_size_bytes, the node network interface ID 
node_network_iface_id, and so on. From these metrics you can derive node-level figures such as CPU\/memory\/disk utilization."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Kubernetes resource metrics"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The source is kube-state-metrics, which generates metrics from Kubernetes API objects and provides metrics on cluster resources such as Node, ConfigMap, Deployment, and DaemonSet. Node metrics, for example, include the node Ready status metric kube_node_status_condition and the node information metric kube_node_info."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Kubernetes component metrics"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Kubernetes system component metrics, for example from kube-controller-manager, kube-apiserver, kube-scheduler, kubelet, kube-proxy, and coredns."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Kubernetes operations component metrics. Observability components include blackbox_operator, which lets users define custom liveness-probing rules, and gpu_exporter, which exposes GPU resource metrics."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Kubernetes business application metrics. These are the metrics that individual application Pods expose at the \/metrics path so that they can be queried and aggregated externally."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In addition to the metrics above, K8s defines standard monitoring interfaces for exposing metrics through APIs, which fall into the Resource Metrics, Custom Metrics, and External Metrics 
categories."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"embedcomp","attrs":{"type":"table","data":{"content":"