用戶激增,負載飆升,Pinterest如何平穩擴展K8s?

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"本文最初發佈於Pinterest Engineering Blog,經原作者授權由InfoQ中文站翻譯並分享。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"前言"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"距離上一次分享我們"},{"type":"link","attrs":{"href":"https:\/\/medium.com\/pinterest-engineering\/building-a-kubernetes-platform-at-pinterest-fb3d9571c948?fileGuid=prJWDc8Hk9cjRkvJ","title":"","type":null},"content":[{"type":"text","text":"在Pinterest上搭建Kubernetes之旅"}]},{"type":"text","text":"已經過去一年多了。從那時開始,我們交付了許多功能,方便用戶進行採用,確保可靠性和可延展性,並積累了很多運維經驗和最佳實踐。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"總的來說,Kubernetes平臺的用戶反饋都很正面。根據我們的調查,在用戶心中排行前三的好處分別是,減輕管理計算資源的負擔、更好的資源和故障隔離,以及更靈活的容量管理。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在2020年底,我們在Kubernetes集羣中利用超過"},{"type":"text","marks":[{"type":"strong"}],"text":"2,500"},{"type":"text","text":"個節點,協調了超過"},{"type":"text","marks":[{"type":"strong"}],"text":"35,000"},{"type":"text","text":"個用於支持Pinterest各項業務的Pod,而這項數據的增長依舊如火箭般竄升。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"2020年概況"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"隨着用戶採用的不斷增加,負載的種類和數量也在不斷增長。這就意味着Kubernetes平臺需要有更強的可擴展性,才能跟得上日益增長的負載管理、Pod的調度和放置,以及分配和取消分配節點的工作量。隨着更多關鍵業務的負載被搬上Kubernetes平臺,用戶對平臺可靠性的期望自然也水漲船高。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"全平臺範圍的停機也的確發生過。2020年初,在我們一個集羣上,短時間內有大量的Pod被創建,數量超過了計劃容量的三倍,導致該集羣的自動協調器啓用了900個節點以滿足需求。"},{"type":"link","attrs":{"href":"https:\/\/kubernetes.io\/docs\/concepts\/overview\/components\/#kube-apiserver?fileGuid=prJWDc8Hk9cjRkvJ","title":"","type":null},"content":[{"type":"text","text":"kube-apiserver"}]},{"type":"text","text":"率先開始出現延遲峯值以及錯誤率的增長,隨後便因資源限制而被OOM殺進程(Out of Memory Kill,內存不足時殺進程)。來自Kubelets的非綁定重試請求導致kube-apiserver負載猛增7倍。寫入請求數量的爆發導致"},{"type":"link","attrs":{"href":"https:\/\/etcd.io\/?fileGuid=prJWDc8Hk9cjRkvJ","title":"","type":null},"content":[{"type":"text","text":"etcd"}]},{"type":"text","text":"提前到達其總數據量的限制,並開始拒絕所有的寫入請求,平臺無法繼續管理負載。爲了緩解這次事件,我們不得不通過執行etcd操作來恢復平臺運行,例如壓縮舊版本程序,碎片整理冗餘空間,以及禁用警報。此外,我們還得暫時擴容承載kube-apiserver和etcd的Kubernetes主節點,以減少對資源的限制。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/22\/22777d160939e336448eb9c1019a420c.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"Kubernetes API服務器延遲峯值"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在2020年下半年,我們的一個基礎設施組件在kube-apiserver的集成中出了一個bug,表現是短時間內生成大量的、對所有kube-apiserver的Pod和節點的昂貴查詢。這導致了Kubernetes的主節點資源使用量激增,kube-apiserver進入了OMM Kill的狀態。幸運的是,這個出bug的組件被發現得很早,並且很快就回滾了。但在這次的事件中,平臺的性能受到了降級的影響,其中包括負載的處理出現了延遲以及陳舊的服務狀態。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/83\/83ebfe4722014df267c00acccb3f8358.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"Kubernetes API服務器執行OOM Killed"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"爲規模化做好準備"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在我們的Kubernetes之旅中,我們不斷反思自己平臺的治理、彈性和可操作性,尤其是在當事故發生在我們最薄弱的地方時。對於一個工程資源有限的小團隊,我們必須要深入思考,找出問題的根本所在,確定短板的文職,並根據回報與成本的性價比確定解決方案的優先次序。我們面對複雜的Kubernetes生態系統時的策略是,儘量減少方案與社區中提供方法的區別的同時不斷回饋社區,但仍不放棄自己編寫內部組件代碼的可能性。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/41\/416d3f8e12b24599a8d687fe9c832558.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"Pinterest上的Kubernetes平臺架構(藍色代表我們自己編寫的內容,綠色則是開源內容)"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"治理"}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"資源配額的執行"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Kubernetes已有的"},{"type":"link","attrs":{"href":"https:\/\/kubernetes.io\/docs\/concepts\/policy\/resource-quotas\/?fileGuid=prJWDc8Hk9cjRkvJ","title":"","type":null},"content":[{"type":"text","text":"資源配額"}]},{"type":"text","text":"管理確保了任何的命名空間都無法在絕大多數的維度上"},{"type":"text","marks":[{"type":"strong"}],"text":"無限制"},{"type":"text","text":"地請求或佔用資源,無論是Pod、CPU還是內存。如我們在前文的故障中所提到的,單個命名空間中數量激增的Pod創建事件可能會讓kube-apiserver過載,導致級聯故障。爲確保其穩定性,單個命名空間的資源使用量都應有一定限制,這一點很重要。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"這項任務的難點之一在於,在每一個命名空間強制執行資源配額需要一個潛在條件:所有的Pod和容器都需規定"},{"type":"link","attrs":{"href":"https:\/\/kubernetes.io\/docs\/concepts\/configuration\/manage-resources-containers\/#requests-and-limits?fileGuid=prJWDc8Hk9cjRkvJ","title":"","type":null},"content":[{"type":"text","text":"資源的請求和限制"}]},{"type":"text","text":"。在Pinterest的Kubernetes平臺上,不同命名空間的負載屬於不同的團隊和不同的項目,而平臺用戶則是通過Pinterest的CRD配置他們的負載。我們對這一問題的解決方案是,爲GRD的轉換層中所有的Pod和容器添加默認資源請求和限制。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"除此之外,我們還在CRD的驗證層中拒絕了所有未規定資源請求和限制的Pod。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"另一難點則在於,如何簡化跨團隊和組織的配額管理。爲了資源配額的安全實現,我們參考了過往的資源使用情況,在其高峯值的基礎上額外增加20%的淨空,並將其設置爲所有項目資源配額的初始值。我們創建了一個定時任務來監控配額的使用情況,如果某個項目的用量接近一定限度,會在營業時間內向負責該項目的團隊發送警報。這項設置鼓勵了負責團隊對項目進行更好的容量規劃,並在資源配額髮生變動時提出申請。資源配額的變更在人工審覈並簽署之後,纔會進行自動部署。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"客戶端訪問的執行"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們強制要求了所有的KubeAPI客戶端都遵循Kubernetes已有的最佳實踐:"}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"控制器框架"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https:\/\/github.com\/operator-framework?fileGuid=prJWDc8Hk9cjRkvJ","title":"","type":null},"content":[{"type":"text","text":"控制器框架"}]},{"type":"text","text":"爲優化讀取操作提供了一個利用"},{"type":"link","attrs":{"href":"https:\/\/godoc.org\/k8s.io\/client-go\/informers?fileGuid=prJWDc8Hk9cjRkvJ","title":"","type":null},"content":[{"type":"text","text":"informer-reflector-cache"}]},{"type":"text","text":"的、可共享的緩存架構。"},{"type":"text","marks":[{"type":"strong"}],"text":"Informer"},{"type":"text","text":"通過kube-apiserver監控目標對象,"},{"type":"text","marks":[{"type":"strong"}],"text":"Reflector"},{"type":"text","text":"將目標對象的變更反應到底層緩存("},{"type":"text","marks":[{"type":"strong"}],"text":"Cache"},{"type":"text","text":")中,並將觀測到的事件傳播給事件處理程序。同一控制器中的多個組件可以爲OnCreate、OnUpdate,以及OnDelete事件註冊Informer事件處理程序,並從Cache中獲取對象(而非是直接從Kube-apiserver中獲取)。這樣一來,就減少了很多不必要或多餘的調用。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/73\/7359ec9bc4da49c51ecfc811c735d53b.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"Kubernetes的控制器架構"}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"速率限制"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Kubernetes的API客戶端通常會在不同的控制器中共享,而API是在不同的線程中調用的。Kubernetes的API客戶端是與它支持的可配置QPS和burst的"},{"type":"link","attrs":{"href":"https:\/\/en.wikipedia.org\/wiki\/Token_bucket?fileGuid=prJWDc8Hk9cjRkvJ","title":"","type":null},"content":[{"type":"text","text":"token桶速率限制器"}]},{"type":"text","text":"相綁定的,burst超過閾值就節制API的調用,這樣單個的控制器就不會影響到kube-apiserver的帶寬。"}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"共享緩存"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"除了控制器框架自帶kube-apiserver內置緩存之外,我們還在平臺的API中添加了另一個基於Informer的寫通緩存層。這種設置是爲了防止不必要的讀取調用對kube-apiserver的衝擊,重複利用服務器端的緩存也可以避免應用程序代碼中過於繁雜的客戶端。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"對於"},{"type":"text","marks":[{"type":"strong"}],"text":"從應用中"},{"type":"text","text":"訪問kube-apiserver的情況,我們強制要求所有的請求都需要通過平臺API,利用共享緩存爲訪問控制和流量控制分配安全身份。對於"},{"type":"text","marks":[{"type":"strong"}],"text":"從負載器"},{"type":"text","text":"訪問kube-apiserver的情況,我們強制要求所有控制器的實現都是要基於帶有速率限制的控制框架。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"恢復力"}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"硬化Kubelet"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Kubernetes的控制平臺會進入級聯故障的一個關鍵原因是,傳統的反射器(Reflector)的實現在處理錯誤時會有"},{"type":"text","marks":[{"type":"strong"}],"text":"無限制次數"},{"type":"text","text":"的重試。這種小瑕疵在有些時候會被無限放大,尤其時當API服務器被OMM Kill時,很容易造成集羣上所有的反射器一起進行同步。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"爲了解決這個問題,我們通過與社區的緊密合作,反饋"},{"type":"link","attrs":{"href":"https:\/\/github.com\/kubernetes\/kubernetes\/issues\/87794?fileGuid=prJWDc8Hk9cjRkvJ","title":"","type":null},"content":[{"type":"text","text":"問題"}]},{"type":"text","text":"、討論解決方案,並最終讓我們的pull request(注"},{"type":"link","attrs":{"href":"https:\/\/github.com\/kubernetes\/kubernetes\/pull\/87829?fileGuid=prJWDc8Hk9cjRkvJ","title":"","type":null},"content":[{"type":"text","text":"1"}]},{"type":"text","text":","},{"type":"link","attrs":{"href":"https:\/\/github.com\/kubernetes\/kubernetes\/pull\/87795?fileGuid=prJWDc8Hk9cjRkvJ","title":"","type":null},"content":[{"type":"text","text":"2"}]},{"type":"text","text":")通過審覈併成功merge。我們的想法是通過在反射器的ListWatch重試邏輯中添加指數回退,這樣,當kube-apiserver過載或請求失敗時,kubelet和其他的控制器就不會繼續嘗試kube-apiserver了。這種彈性的改進在大多數的情況下都是錦上添花,但我們也發現,隨着Kubernetes集羣中節點和Pod數量的增加,這種改進的必要性也體現出來了。"}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"調整併發請求"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"隨着我們管理的節點的數量的增加,負載的創建和銷燬速度越快,QPS服務器需要處理的API調用數量也在增加。我們首先根據預估的負載大小,上調了變異和非變異操作的最大併發API調用的次數。這兩項設置將限制需要處理的API調用次數不能超過配置的數量,從而使kube-apiserver的CPU和內存消耗保持在一定的閾值之內。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在Kubernetes的API請求處理鏈中,每個請求在最開始都需要通過一連串的過濾器。最大機上API調用的限制則是在過濾器鏈中實現的。對於超過配置閾值的API調用,客戶端會收到“請求過多(429)”的反饋,從而觸發對應的重試操作。在未來,我們計劃對"},{"type":"link","attrs":{"href":"https:\/\/kubernetes.io\/docs\/reference\/access-authn-authz\/admission-controllers\/#eventratelimit?fileGuid=prJWDc8Hk9cjRkvJ","title":"","type":null},"content":[{"type":"text","text":"EventRateLimit功能"}]},{"type":"text","text":"進行更深入的研究,用更精細的准入控制提供更高的服務質量。"}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"緩存更多的歷史紀錄"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Watch緩存時kube-apiserver內部的一種機制,它將各類資源中過去的事件緩存到一個環形緩存區中,以方便特定版本的watch調用。緩存區越大,服務器能保存的事件就更多,連接中斷時爲客戶端提供的事件流也就更流暢。鑑於這一事實,我們還改進了kube-apiserver的目標RAM大小,其內部最終會轉移到Watch緩存容量中,以提供更穩健的事件流。Kube-apiserver有提供更詳細的配置更詳細的Watch緩存大小的方法,可以用於更精細的緩存需求。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/95\/95c959d896fd8749969172790de6df0f.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"Kubernetes的Watch緩存"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"可操作性"}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"可視性"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"爲減少事故檢測和緩解的事件,我們不斷改善Kubernetes控制平面的可視性。這一點的挑戰在於要如何平衡故障檢測的覆蓋率和信號的靈敏度。對於現有的Kubernetes指標,我們通過分流並挑選重要區域進行監測和報警,如此一來,我們就可以更加主動地發現問題所在。除此之外,我們還對kube-apiserver進行監測,以覆蓋更細小的區域,從而更快地縮小問題根源所在區域。最後,我們調整了警報統計和閾值大小,以減少噪音和錯誤警報。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在高層次上,我們通過查看QPS和併發請求、錯誤率,以及請求延遲來監控kube-apiserver的負載。我們也可以將流量按照資源類型、請求動詞以及相關的服務賬號進行細分。而對於listing這類的昂貴流量,我們通過對象計數和字節大小來計算請求負載,即使只有很小的QPS,這類流量也很容易導致kube-apiserver過載。最後,我們還監測了etcd的watch事件處理QPS和延遲處理的計數,以作爲重要的服務器性能指標。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/1f\/1f377d6598592b2149f2e4cf089490db.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"Kubernetes API調用類型分類"}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"可調試性"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"爲了能更好地瞭解Kubernetes控制面板的性能和資源消耗,我們還用"},{"type":"link","attrs":{"href":"https:\/\/github.com\/etcd-io\/bbolt?fileGuid=prJWDc8Hk9cjRkvJ","title":"","type":null},"content":[{"type":"text","text":"boltdb"}]},{"type":"text","text":"庫和"},{"type":"link","attrs":{"href":"https:\/\/github.com\/brendangregg\/FlameGraph?fileGuid=prJWDc8Hk9cjRkvJ","title":"","type":null},"content":[{"type":"text","text":"flamegraph"}]},{"type":"text","text":"搭建了一個etcd數據存儲分析工具,以便對數據存儲的細化進行可視化展示。數據存儲分析的結果讓用戶可以洞察其內部,更便於進行優化。"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/67\/679130cf72970bb88cf81011a7496be0.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"單個密鑰空間的Etcd數據用量"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"除此之外,我們還啓用了Go語言的剖析工具"},{"type":"link","attrs":{"href":"https:\/\/blog.golang.org\/pprof?fileGuid=prJWDc8Hk9cjRkvJ","title":"","type":null},"content":[{"type":"text","text":"pprof"}]},{"type":"text","text":"以及可視化的堆內存足跡,以快速找出資源最密集的代碼路徑和請求模式,一個例子是,列表資源調用的轉換響應對象。在我們調查kube-apiserver的OOM時發現的另一個大問題是,kube-apiserver使用的"},{"type":"link","attrs":{"href":"https:\/\/www.kernel.org\/doc\/Documentation\/cgroup-v1\/memory.txt?fileGuid=prJWDc8Hk9cjRkvJ","title":"","type":null},"content":[{"type":"text","text":"頁面緩存"}]},{"type":"text","text":"是被計入cgroup的內存限制的,而通過使用匿名內存可以盜取同一cgroup的頁面緩存用量。於是,即使kube-apiserver只有20GB的堆內存用量,整個cgroup的200GB內存用量也有可能到達上限。雖然目前的內核默認設置是不主動回收分配的頁面以有效地重複利用,但我們目前仍然在研究基於memory.stat文件的設置的監控,並強制要求cgroup在內存使用接近上限時儘可能多地回收頁面。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/1b\/1ba22e2f178482fd246d348ed941825b.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"Kubernetes API服務器的內存詳情"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"總結"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"通過在治理、彈性和可操作性方面的努力,我們在很大程度上避免了計算資源、控制面板帶寬方面的用量激增情況,確保了整個平臺的性能和穩定性。優化後的kube-apiserver減少了90%的QPS(主要是讀取),使得kube-apiserver的用量更加穩定、高效、健壯。我們從中學習到的關於Kubernetes內部的深入認知和額外見解,讓我們的能夠更好地進行系統操作和集羣維護。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/0c\/0c5ab35b29f79c38fcdb8269d5519fb6.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"優化推出後Kube-apiserver的QPS減少情況"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"以下是我們在這段旅程中的一些重要收穫,希望能夠對你在處理Kubernetes的可擴展性和可靠性問題上有所幫助:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":1,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"診斷問題並找到其"},{"type":"text","marks":[{"type":"strong"}],"text":"根源"},{"type":"text","text":"所在。在考慮“怎麼辦”之前先搞清楚問題“是什麼”。解決問題的第一步是瞭解瓶頸在哪以及其生成原因。如果你找到了問題的根本,那麼你已經掌握了一半的解決方案。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"首先嚐試"},{"type":"text","marks":[{"type":"strong"}],"text":"小規模"},{"type":"text","text":"、"},{"type":"text","marks":[{"type":"strong"}],"text":"漸進式"},{"type":"text","text":"的優化,而不是直接上手激進的架構改革。在大多數情況下,這種方法不會讓你喫太大的虧,這點在資源不足的小團隊中也是很重要的。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":3,"align":null,"origin":null},"content":[{"type":"text","text":"在規劃或調整調查和修復的優先順序時,請將"},{"type":"text","marks":[{"type":"strong"}],"text":"數據"},{"type":"text","text":"作爲參考憑據。正確的遙測技術可以幫助你在決定優化項目順序時做出更好的決定。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":4,"align":null,"origin":null},"content":[{"type":"text","text":"關鍵的基礎設施組件在設計時應考慮到其復原力。分佈式系統總是會出現故障的,我們總要做好"},{"type":"text","marks":[{"type":"strong"}],"text":"最壞的打算"},{"type":"text","text":"。正確的防護可以有效防止級聯故障,並將事故的影響範圍最小化。"}]}]}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"展望未來"}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"聯邦"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"隨着我們規模的穩步增長,單一集羣架構已經不足以支持日益增加的負載。目前的單一集羣環境是高效且穩定的,我們的下一個里程碑即是將這個計算平臺向橫向擴展。通過利用聯邦框架,我們的目標是以最小的操作開銷將新的集羣插入到環境中去,同時保持終端用戶平臺界面的穩定。我們的聯邦集羣環境目前還在開發階段,期待它在產品化後能爲我們帶來更多的可能性。"}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"容量規劃"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們目前採用的資源配額的執行方法是簡化後的、並依賴反應的容量規劃方式。隨着我們不斷添加用戶的負載和系統組件,平臺的動態變化、項目的等級,或者是集羣範圍的容量限制可能無法跟上版本的變化。我們希望能夠探索出一種主動的容量規劃方案,根據歷史數據、增長軌跡,以及涵蓋資源配額和API配額的複雜的容量模型進行預測。這種更主動,也更準確的容量規劃可以有效防止平臺的過度承諾和交付不足。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"原文鏈接:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https:\/\/medium.com\/pinterest-engineering\/scaling-kubernetes-with-assurance-at-pinterest-a23f821168da?fileGuid=prJWDc8Hk9cjRkvJ","title":"","type":null},"content":[{"type":"text","text":"https:\/\/medium.com\/pinterest-engineering\/scaling-kubernetes-with-assurance-at-pinterest-a23f821168da"}]}]}]}
發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章