Production-Grade Cloud-Native Middleware at NetEase Shufan, Built on Kubernetes Operators

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"At the recent ArchSummit Global Architect Summit 2021 in Shanghai, Wang Yuan gave the keynote 'Building an Open Cloud-Native Operating System and System Software Architecture'. Wang is Vice President of NetEase, Executive Dean of the NetEase Hangzhou Research Institute, Chair of the Internet Technology Committee, and General Manager of NetEase Shufan. Following his keynote, Zhang Xiaolong, a member of the NetEase Technology Committee and Director of Infrastructure at NetEase Shufan, went further into NetEase Shufan's thinking, implementation, and experience in cloud-native middleware. This article is a transcript of the talk.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/33/3317c4f70e2dc01dae868b94f8ac3b87.png","alt":null,"title":"","style":[{"key":"width","value":"100%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Today I will share our practice of containerizing middleware for production environments. The talk has four parts:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Part 1 starts from the operational challenges of foundational middleware, describes the technical path NetEase took to address them, and explains why we set out to containerize middleware.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Part 2 covers the requirements for middleware containerization and the overall architecture of NetEase Shufan's platform.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 
","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Part 3 presents our thinking and best practices on problems that are common across middleware containerization.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Finally, I will summarize the work and outline our future plans.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Challenges of Foundational Middleware","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Long before container technology appeared, foundational middleware such as MySQL, Redis, and Kafka had been open-sourced and had become standard components of server-side architecture. A typical Internet application cannot do without the big three: a database, a cache, and a message queue.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For architects, building application platforms on this middleware is easy, but operations staff run into serious problems in five areas:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 
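","attrs":{}}]},{"type":"numberedlist","attrs":{"start":1,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"Middleware systems are themselves complex distributed systems. Operators must understand how each one works and write operations scripts suited to it, which is highly demanding;","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"Efficiency is low. Running fewer than 50 MySQL instances by hand may be fine. With 500 or 1,000 database instances, however, or the thousands of Redis instances at NetEase Cloud Music, hand-run scripts are bound to be inefficient;","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":3,"align":null,"origin":null},"content":[{"type":"text","text":"Stability suffers. Operators rely on manual scripts and copy commands into production by hand, so one mistyped command can bring the middleware down;","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":4,"align":null,"origin":null},"content":[{"type":"text","text":"Traditional middleware is deployed on physical machines, and physical machines cannot provide strong resource elasticity;","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":5,"align":null,"origin":null},"content":[{"type":"text","text":"Most experienced middleware operators work at large Internet companies. Because this work is so complex, ordinary enterprises find it hard to hire truly professional operators. We believe the best practice for meeting these challenges is to turn middleware operations into a cloud service.","attrs":{}}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Offering this middleware as cloud services has several advantages. First, operations become simple and easy to learn. Second, large fleets of instances can be operated efficiently through automation. Third, SLAs are much stronger, because far fewer commands have to be typed by hand. Fourth, capacity can be expanded quickly using the elastic resources of IaaS. Finally, because operations become simpler, a business no longer needs a large team of specialists to keep its middleware running well.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 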
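","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Public cloud vendors have seen this trend too: China's three major public clouds all offer open-source foundational middleware as cloud services. I think there are two main reasons. First, competition at the IaaS resource layer has become largely undifferentiated. Turning PaaS middleware into cloud services consumes more resources and locks users in more deeply. Second, as value-added services, cloud middleware carries much higher gross margins than cloud hosts and cloud disks. That is also why many public cloud users avoid RDS and instead buy cloud hosts and set up MySQL themselves.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To tackle the complexity of middleware operations, NetEase built a cloud middleware platform six or seven years ago. The platform had several technical characteristics. First, it relied on IaaS for resource elasticity: middleware ran on cloud hosts for compute and on cloud disks for storage, and its network typically lived inside the tenant's VPC.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Second, it adopted the IaaS tenant isolation model. When a tenant requested a middleware instance, the platform automatically built it from that tenant's own cloud hosts and cloud disks, which gave good isolation between tenants.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/0d/0d094aa492f7ad45d2f012af4f025602.png","alt":null,"title":"","style":[{"key":"width","value":"100%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"We built six foundational middleware cloud services at the time. A product team that needed middleware only had to integrate with these services rather than build its own. Our main work was the control and management part on the left of the figure, such as instance high availability, deployment and installation, and instance management. The effort paid off: it greatly improved the operations team's ability to run middleware.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 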
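","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Over time, the first-generation middleware platform revealed three major defects that were hard to fix. The first was insufficient peak performance. It used KVM virtual machines for compute, so it lost a great deal of performance compared with running on physical machines. As a result, it could not meet the stringent performance and stability requirements that businesses place on middleware under high load.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The second was high resource cost. Resource orchestration was provided by OpenStack, and the strong isolation of KVM virtualization prevented memory from being shared among middleware instances. Together, these two factors kept the deployment density of middleware instances on virtual machines very low. Even when a tenant's middleware was lightly loaded, its memory could not be released, because KVM isolation is strict.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The third was inflexible delivery. The platform was tied to NetEase's own IaaS, so it could not support our plan to commercialize it and deliver it to enterprises outside NetEase. Those enterprises might run their infrastructure on a public cloud or in their own data centers.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Thoughts on Middleware Containerization","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In recent years, container technologies such as Docker and Kubernetes have emerged and evolved rapidly, and containerizing stateless applications is now mature. Containers are a new infrastructure technology that is already widely deployed, and we see them as a direct answer to each shortcoming of the first-generation platform. Weaker isolation makes resource sharing possible. Lightweight virtualization eliminates the performance penalty and meets business needs under high load. Standardized packaging as images enables efficient delivery, and scheduling is powerful and flexible. Most importantly, containers are the foundation of the entire cloud-native technology stack.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 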
The key property of Kubernetes orchestration is that it is loosely coupled to the underlying infrastructure, so applications can be moved anywhere; it was designed with hybrid cloud in mind. It was also designed for large-scale production environments, drawing on Google's experience running them. So using container technology to turn middleware into services is a promising direction.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/86/86255c96f7190828749645f6957b4c46.png","alt":null,"title":"","style":[{"key":"width","value":"100%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Inside NetEase, we have built a cloud-native operating system on Kubernetes. Downward, it adapts to all kinds of infrastructure resources. Upward, it serves as a unified provider for all kinds of application workloads, which is also one of Kubernetes' goals. Middleware is exactly one class of workload this cloud-native operating system has to support, so from this angle, containerizing middleware is a natural step.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For containerization to solve middleware's operational problems, the following requirements in particular must be addressed.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"First, lifecycle management. The containerized middleware platform must help operators carry out all kinds of instance-level operations on middleware. NetEase Shufan implements this with the Kubernetes Operator framework.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 
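","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Second, highly available deployment. Middleware that targets higher availability often has to be deployed across multiple data centers, with the instances of a cluster distributed among those data centers in a specified ratio. The standard Kubernetes scheduler cannot do this, so we had to extend it to achieve that placement.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In addition, monitoring and alerting metrics must be complete; these map onto the cloud-native, Prometheus-based observability stack.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Performance was a pain point of the first-generation platform. Containerized middleware must roughly match the performance of physical-machine deployments before it can support core applications. That requires targeted performance tuning for each kind of middleware instance.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Productization is another requirement. We want containerized middleware not only for use inside NetEase but also as a commercial offering. Taking the RDS and Redis products on public clouds as a reference, it must offer equivalent product capabilities. It must also be deliverable flexibly and at low cost on any infrastructure, which requires a loosely coupled, highly reusable architecture.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"NetEase Shufan chose the Kubernetes Operator mechanism. At a deeper level, Kubernetes provides the 'primitives' needed to deploy and operate distributed systems. Its built-in objects, such as Pod, Node, Deployment, and StatefulSet, were designed around typical distributed applications. Working together, they make deploying and operating stateless applications in particular very efficient.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 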
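The multi-data-center requirement listed earlier (spreading a cluster's instances across data centers in a specified ratio) is the kind of rule a scheduler extension has to encode. The following Python is a minimal sketch under our own assumptions, not NetEase Shufan's scheduler code. The names split_replicas and pick_data_center, the weight format, and the largest-remainder rounding are all hypothetical choices made for illustration.

```python
# Hypothetical helper for a scheduler extension: split a middleware
# cluster's replicas across data centers by configured weights.
from math import floor


def split_replicas(replicas, weights):
    # weights maps a data-center name to its relative share, e.g. 2:1:1.
    total = sum(weights.values())
    if total <= 0:
        raise ValueError('weights must sum to a positive number')
    exact = {dc: replicas * w / total for dc, w in weights.items()}
    counts = {dc: floor(x) for dc, x in exact.items()}
    leftover = replicas - sum(counts.values())
    # Largest-remainder rounding: hand the remaining replicas to the data
    # centers with the largest fractional parts; ties go to the smaller name.
    order = sorted(exact, key=lambda dc: (-(exact[dc] - counts[dc]), dc))
    for dc in order[:leftover]:
        counts[dc] += 1
    return counts


def pick_data_center(placed, targets):
    # Filter step: return the data center furthest below its target
    # (ties go to the smaller name), or None when every target is met.
    deficits = {dc: targets[dc] - placed.get(dc, 0) for dc in targets}
    candidates = [dc for dc, d in deficits.items() if d > 0]
    if not candidates:
        return None
    return min(candidates, key=lambda dc: (-deficits[dc], dc))
```

For example, five replicas with weights 2:1:1 split into 3, 1, and 1. During filtering, a scheduler extension could then call pick_data_center and keep only the nodes in the data center that is furthest below its target.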
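","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"However, these built-in objects cannot directly solve the problem of deploying and operating middleware. First, middleware is stateful: its state lives in its storage and possibly in its network IP. Second, the replicas of a stateless application are independent of one another, but middleware instances and replicas are related and must communicate with each other. Together they form a complex topology. For example, during failure recovery, the master-replica relationship between two Redis replicas must be taken into account.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/40/40484e2b38e5da3f4dda0b8e0a87c297.png","alt":null,"title":"","style":[{"key":"width","value":"100%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"More than two years ago, the community also began to tackle middleware, or stateful applications in general, and proposed the Operator development framework. If we think of Kubernetes as an operating system, the Operator framework is the toolkit for developing native applications on that operating system. It enables a more efficient, more automated, and more extensible way of developing.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Operators have four characteristics. First, an Operator has to be developed as code that follows the declarative programming model: it consists of an object definition plus a deployed controller. An Operator is in fact a controller that runs a closed observe, analyze, act loop. When the user declares a resource, the Operator analyzes where the resource's current state differs from its desired state.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 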
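The observe, analyze, act loop described above can be sketched without any framework, using the one-Pod-at-0.0.1 versus two-Pods-at-0.0.2 example the talk walks through. This is an illustrative model under stated assumptions. Pod, Spec, reconcile, apply_actions, run_until_converged, and the action tuples are hypothetical names. A real Operator (for example, one scaffolded with operator-sdk) would read and write Kubernetes API objects instead of Python lists.

```python
# Framework-free sketch of an Operator-style reconcile loop.
# Pod, Spec and the action tuples are hypothetical stand-ins for the
# objects a real controller would read from the Kubernetes API.
from dataclasses import dataclass


@dataclass
class Pod:
    name: str
    version: str


@dataclass
class Spec:
    replicas: int
    version: str


def reconcile(spec, pods):
    # Analyze: diff observed Pods against the declared spec and return
    # the actions needed to converge; no side effects here.
    actions = []
    for i in range(len(pods), spec.replicas):
        actions.append(('create', f'mw-{i}', spec.version))
    for pod in pods[spec.replicas:]:
        actions.append(('delete', pod.name, pod.version))
    for pod in pods[:spec.replicas]:
        if pod.version != spec.version:
            actions.append(('upgrade', pod.name, spec.version))
    return actions


def apply_actions(actions, pods):
    # Act: return a new Pod list with the actions applied. A real Operator
    # would call the API server here and observe the result next round.
    by_name = {p.name: p for p in pods}
    for op, name, version in actions:
        if op == 'delete':
            by_name.pop(name, None)
        else:  # create and upgrade both set the Pod to this version
            by_name[name] = Pod(name, version)
    return sorted(by_name.values(), key=lambda p: p.name)


def run_until_converged(spec, pods, max_rounds=10):
    # Observe, analyze, act, repeated until nothing differs.
    for _ in range(max_rounds):
        actions = reconcile(spec, pods)
        if not actions:
            return pods
        pods = apply_actions(actions, pods)
    raise RuntimeError('did not converge')
```

Take Spec(replicas=2, version='0.0.2') and a single Pod('mw-0', '0.0.1'). The first pass returns a create for mw-1 and an upgrade for mw-0, and the second pass returns no actions. Because reconcile is a pure function of the desired and observed state, it is safe to re-run after a crash or a missed event.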
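","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In the figure, the current state is one Pod at version 0.0.1, while the declared state requires version 0.0.2 and one more Pod. When the Operator detects the mismatch, it takes actions: it scales out by one Pod and upgrades to 0.0.2. Implementing an Operator essentially means writing how these actions are carried out. In effect, the Operator encapsulates domain-specific operational knowledge and experience, which is what allows it to manage complex stateful applications.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The Operator Framework consists of three main parts. The first is operator-sdk, a development scaffold. The second is operator-lifecycle-manager, a lifecycle management component. The third is operatorhub.io. Anyone can develop an application that knows how to install, deploy, and operate itself, so they should be able to publish it in an application marketplace, and operatorhub.io is that marketplace.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"From an operations perspective, Operators developed by different organizations reach different capability maturity levels. The level operations teams want most is one where the application is deployed and operated fully automatically. The most basic, first level is basic install: reimplementing the original installation and deployment scripts in the Operator engineering model.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This is the middleware platform architecture NetEase Shufan built on Kubernetes Operators, consisting of a control plane and a data plane. The control plane on the left provides operations and management capabilities. It includes general-purpose components that every middleware needs but that are independent of any particular middleware, such as auditing, authentication and authorization, and the console.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 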
","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/12/12f1800faadba5fbeac2559248e5c4e0.png","alt":null,"title":"","style":[{"key":"width","value":"100%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In the middle are the middleware Operators: using the Operator mechanism, we developed Redis, Kafka, MySQL, and other middleware.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"These Operators implement middleware lifecycle management. The Operators themselves run on Kubernetes as stateless applications, deployed as Deployments. This works because their state is kept in Kubernetes API objects, which are persisted in etcd.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Below them is the Kubernetes control plane, that is, the components required on the Master nodes.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"At the bottom are the logging, monitoring, and alerting components. These include our self-developed log management platform, which dynamically updates its collection configuration and gathers the logs.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 
","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"On the right is the middleware data plane, drawn with three Nodes. We implement each middleware cluster as a StatefulSet, with each instance running in a Pod, and each Pod may declare the persistent volumes it uses. Pods and Nodes have topological relationships, and the instances must synchronize data and topology with one another to handle state changes and failure recovery. Every node runs two Kubernetes components, kubelet and kube-proxy, plus a collector for logs and monitoring.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"We also implemented disk mounting for Pods, for both local and remote disks, through StorageClasses, which is the standard Kubernetes mechanism.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Common Problems in Middleware Containerization and How to Solve Them","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Next, let's look at how to solve some problems common to middleware containerization. The defining characteristic of middleware is that it is stateful. Kubernetes only orchestrates compute, so middleware state can be stored in one of two places: remote storage or local storage.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"We consider remote storage the best practice. If your private cloud has a remote distributed storage system, such as open-source Ceph, use it without hesitation. If Ceph's performance is not enough, find a better distributed storage system and use that directly. If you are on a public cloud, use cloud disks for middleware storage without hesitation.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 
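","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In many cases, local storage is a choice made out of necessity, because no sufficiently reliable distributed storage is available. The distributed storage may perform far worse than local disks, or its backend may be unreliable and lose data.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"So we implemented support for local storage, with two requirements. First, when a Pod requests a PVC, the local volume must be provisioned dynamically, and creating or deleting the volume must trigger the corresponding operation on the local disk. Second, Pod scheduling must bind the Pod tightly to its local disk. If the Pod's local disk was created on a particular Node, the Pod must still run on that Node after failure recovery or rescheduling, so that no middleware data is lost.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Technically, we use LVM to manage the local disks on each node dynamically, and we also adopted Kubernetes Local PV. The drawback of Local PV is that operators must create PVs on the nodes in advance, which is not acceptable. So we built a scheduler extension and a node-level provisioner for local storage resource preparation. When a Pod is created and declares the size of local disk it needs, the volume is created dynamically and mounted into the Pod, with no manual preparation by operators.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/ea/ea4e62f200715a08c6ecc745ea1c828d.png","alt":null,"title":"","style":[{"key":"width","value":"100%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The figure shows how a Pod is scheduled.\n1. A user creates a Pod that declares a PVC.\n2. The local storage scheduler extension we added first runs a pre-scheduling step. It checks whether each node has enough local disk capacity, and if a node does, its information is written into the PVC.\n3. The local storage provisioner on that Node is then notified. On receiving the request, it calls LVM to create the storage and then creates the corresponding PV.\n4. The provisioner binds the PV to the PVC and notifies the scheduler that the Pod can be scheduled onto this node, because the declared local storage is ready.\n5. Finally, Kubernetes mounts the node's local disk into the Pod, completing the scheduling.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For container networking, we implemented two scenarios. In the first, our middleware runs on different infrastructures with different network configurations. On a physical network, we can use a solution such as Calico or Flannel directly through its CNI plugin. On a public cloud, we connect to the cloud's VPC network. Conveniently, every public cloud provides a standard CNI plugin for Kubernetes, so Kubernetes running on its cloud hosts can join its network.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In the second scenario, we need to optimize network performance. We introduced an SR-IOV solution for containers, whose advantage is low latency, even lower than on physical machines. Based on NIC passthrough, it can reduce latency by 50% and meet the needs of ultra-high-performance workloads with strict latency requirements, although it does not raise PPS. Passthrough removes the virtualization overhead from network transmission, but the drawback is obvious. Because the solution depends entirely on hardware NICs, it works only on physical networks and cannot be used to accelerate networking on public clouds.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"On physical networks, we also have to handle heterogeneous NICs, for example Intel NICs alongside Mellanox NICs. This requires fine-grained management of VFs (virtual functions, an SR-IOV concept). We treat VFs as an extended scheduling resource. A standard Kubernetes device plugin discovers and registers each node's VF resources, and combined with labels and taints, the native scheduler can then manage and allocate them.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Qingzhou middleware clusters are abstracted as StatefulSets, with each instance being one Pod of the StatefulSet. A StatefulSet keeps only the Pod name stable: through updates, and when a Pod fails and is recovered, the name stays the same, but the Pod IP does not. Traditional middleware operators, however, are used to physical-machine deployments where IPs never change, even after a reboot, so their habits favor IP addresses over domain names.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 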
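The talk goes on to describe keeping a StatefulSet Pod's IP across deletion and re-creation. One way to do that is to key address reservations on the Pod's stable name. The sketch below is a hypothetical in-memory model, not NetEase Shufan's address pool component. StickyIPPool, its methods, and the rule that only deleting the whole StatefulSet releases addresses are our assumptions. A real implementation would sit behind a CNI IPAM plugin and persist its state.

```python
# Hypothetical in-memory model of a global container address pool that
# keeps a StatefulSet Pod's IP across delete/recreate by keying on the
# Pod's stable name (namespace/statefulset-ordinal).
import ipaddress


class StickyIPPool:
    def __init__(self, cidr):
        self._free = [str(ip) for ip in ipaddress.ip_network(cidr).hosts()]
        self._reserved = {}   # pod key -> IP kept even while the Pod is gone
        self._in_use = set()  # pod keys whose Pod currently exists

    def allocate(self, namespace, pod_name):
        key = f'{namespace}/{pod_name}'
        if key in self._in_use:
            return self._reserved[key]
        if key not in self._reserved:
            if not self._free:
                raise RuntimeError('address pool exhausted')
            self._reserved[key] = self._free.pop(0)
        self._in_use.add(key)
        return self._reserved[key]

    def on_pod_deleted(self, namespace, pod_name):
        # The Pod is gone (update, eviction, failure) but its IP stays reserved.
        self._in_use.discard(f'{namespace}/{pod_name}')

    def on_statefulset_deleted(self, namespace, sts_name, replicas):
        # Only deleting the whole StatefulSet returns its IPs to the pool.
        for ordinal in range(replicas):
            key = f'{namespace}/{sts_name}-{ordinal}'
            self._in_use.discard(key)
            ip = self._reserved.pop(key, None)
            if ip is not None:
                self._free.append(ip)
```

A StatefulSet Pod keeps its name across re-creation (redis-0 stays redis-0), so the name is a natural key. The trade-off is that a reserved address cannot serve any other Pod while its owner is down.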
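","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To speed up the adoption of containerized middleware and stay compatible with existing applications, we made StatefulSet Pod IPs stable. We did this by introducing a global container address pool component that takes over Pod IP allocation. When a StatefulSet is created, the IPs assigned to it are recorded. Even if a Pod is deleted during an update, its IP is held rather than released, and when a Pod with the same name is recreated, the same IP is assigned to it again.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"On the engineering side, containerized middleware reuses Kubernetes' built-in concepts and its operations and control mechanisms. Compared with the first-generation, virtualization-based middleware, this greatly reduces the cost of developing the same foundational middleware, and it shows in a codebase much smaller than the first generation's. The reduction has a price, though. Developers must know the Kubernetes Operator framework well and deeply understand Kubernetes' declarative programming model before they can write this code.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/29/29565515fda4ce490da5d25391905f42.png","alt":null,"title":"","style":[{"key":"width","value":"100%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For quality assurance, we did two things. The first is chaos testing, that is, fault-injection testing: we use the open-source ChaosBlade to simulate failures of Kubernetes resources and observe their impact on middleware services. The second is using the Kubernetes e2e test framework to verify that the lifecycle operations operators perform on middleware instances work correctly.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"One more point: lifecycle management of middleware instances also requires monitoring and alerting. In many cases the UIs for these have much in common and follow the same usage patterns. We therefore designed a front-end rendering engine whose dynamic form mechanism lets us build consoles quickly. Console features can be developed through backend configuration alone, which further lowers development cost.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For performance, we adopted several strategies that bring containerized middleware close to its performance on physical machines. For CPU, we enabled the performance mode to reduce wake-up latency. For memory, we disabled swap and transparent huge pages and tuned the synchronous dirty-page writeback threshold. These are all parameter-level tuning.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/97/97577372b0a4946073f338c82880a43a.png","alt":null,"title":"","style":[{"key":"width","value":"100%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For I/O, we enabled the kernel's blk-mq multi-queue block layer and increased the read-ahead cache. Another important item is NIC interrupts. We isolated the interrupt handling of the physical NICs and of the containers' veth virtual NICs onto dedicated CPUs, so that system performance does not jitter.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"NUMA is another area we optimized, and the effect is noticeable under high load. We made container placement NUMA-aware, allocating each Pod to a single local NUMA node wherever possible. Letting a Pod span NUMA nodes would incur significant CPU cache overhead.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 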
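The NUMA-aware placement described above comes down to choosing a NUMA node that can hold the Pod's CPUs on its own. Below is a minimal Python sketch under stated assumptions. choose_numa_nodes, the free-CPU map, and the best-fit and spanning rules are hypothetical and are not NetEase Shufan's implementation. Stock Kubernetes offers related behavior through the kubelet CPU Manager static policy and the Topology Manager single-numa-node policy; the talk does not say whether those or custom logic were used.

```python
# Hypothetical NUMA-aware placement check: prefer fitting a Pod's
# exclusive CPUs on one NUMA node; span nodes only if allowed.
def choose_numa_nodes(free_cpus, request, allow_span=False):
    # free_cpus maps NUMA node id -> number of free exclusive CPUs.
    # Best fit: the single node with the fewest free CPUs that still fits,
    # which keeps larger nodes available for larger Pods.
    fitting = [n for n, free in free_cpus.items() if free >= request]
    if fitting:
        best = min(fitting, key=lambda n: (free_cpus[n], n))
        return [best]
    if not allow_span or sum(free_cpus.values()) < request:
        return None
    # Fallback: span as few nodes as possible, largest first; this is the
    # slower option because of cross-node memory access and cache misses.
    chosen, remaining = [], request
    for n in sorted(free_cpus, key=lambda n: (-free_cpus[n], n)):
        if remaining <= 0:
            break
        if free_cpus[n] > 0:
            chosen.append(n)
            remaining -= free_cpus[n]
    return sorted(chosen)
```

With 6 and 10 free CPUs on NUMA nodes 0 and 1, a 4-CPU Pod goes to node 0 and an 8-CPU Pod goes to node 1. A 12-CPU Pod is rejected unless spanning is explicitly allowed.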
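","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"One shortcoming of the first-generation middleware was that it could not be delivered externally. Last year we turned containerized middleware into a product called Qingzhou Middleware, which provides the standard capabilities of foundational middleware. We also added capabilities at the access layer: because it is built on Kubernetes, operators can even manage middleware with kubectl and YAML files. At the middleware service layer, we implemented seven foundational middleware services, which essentially provide the core operational capabilities described earlier.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/76/7614c6f2d339591b9423ac7f209db846.png","alt":null,"title":"","style":[{"key":"width","value":"100%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Overall, because the middleware is built on Operators, it can run on any Kubernetes cluster regardless of the underlying resources. Public cloud virtual machines can serve as Kubernetes Nodes, and cloud disks as Kubernetes storage. We also allow middleware that the community has developed with Operators to run on our platform.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Looking Ahead","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Technology serves the business. The biggest pain point of middleware is operations, which should be solved by offering middleware as managed cloud services. The advantages of containers make containerized middleware the best practice for building such services. Implementing it requires Operators, a more cloud-native way of developing containerized middleware, and that of course places high demands on developers.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 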
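","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Our future plans have two parts. First, our containerized middleware platform can run on any Kubernetes cluster, but we still need these middleware Operators to run on Kubernetes distributions such as OpenShift and Rancher, which requires some compatibility work. Second, our overall goal is to build a cloud-native operating system in which middleware is one of the workloads. So why not co-locate middleware workloads with stateless application workloads? That would raise the company's resource utilization and lower costs.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Thank you all!","attrs":{}}]}]}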