Baidu Aifanfan and Servicemesh: A Story That Had to Be Told

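{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Introduction: ","attrs":{}},{"type":"text","text":"Service mesh (Servicemesh) swept the industry in the summer of 2018 with the official release of Istio 1.0. Companies across China adopted it, and its ideas gradually won broad acceptance. Driven by its own pain points and the characteristics of the ToB industry, Aifanfan teamed up with Baidu's infrastructure team and formally launched its Servicemesh project at the end of August 2020. In just 3 months it switched all of its Java business applications over, becoming the first Baidu product to run a commercial production system entirely on native Kubernetes + Istio.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"italic","attrs":{}}],"text":"The original article runs 6,492 Chinese characters; estimated reading time is 12 minutes.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"1. Origins: Sinking Governance","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Aifanfan is a one-stop intelligent marketing and sales accelerator that helps enterprises grow their business. It keeps investing in communication, marketing, sales, and insight, and it faces fierce competition in the ToB SaaS market. Technically, that translates into very high demands on system stability and engineering efficiency.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Looking back, however, Aifanfan faced a number of challenges at the time:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"1. Multi-language governance is hard. ","attrs":{}},{"type":"text","text":"Services are written in Java, Golang, Nodejs, Python, and other languages, but service governance mainly serves Java. Governance for the other languages is either home-grown per language or essentially missing, which brings high governance cost and system risk.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"2. Coupling with business code. ","attrs":{}},{"type":"text","text":"The current governance framework follows the Smart Client model, so rolling out upgrades is difficult. A governance change takes more than three months on average to roll out, which makes operations and upgrades very costly.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"3. Missing capabilities. ","attrs":{}},{"type":"text","text":"The current framework lacks sufficient governance tools: rate limiting and circuit breaking, chaos engineering, canary release, service grouping, traffic recording and replay, dynamic configuration, and so on.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"4. Manual configuration. ","attrs":{}},{"type":"text","text":"The current framework pushes governance granularity all the way down to the method level. With more than 2k methods to configure by hand, the system is effectively unconfigurable, and Aifanfan's service governance platform has in practice gone unused. This has already led to several serious production incidents.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The state of service governance was therefore this: the marginal cost of governance could not come down and instead trended upward exponentially, and because the cost was so high, governance could only be driven by incidents. Many companies in the industry are in the same position. The result was serious challenges in efficiency, stability, and capability. At the same time, with governance costs stuck at this level, our business goal of selling “multi-cloud / private deployment” looked as if it would stay out of reach indefinitely.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The author calls this ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"sinking governance: it always looks like governing, but in fact it is always sinking.","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"2. The Choice: A Next-Generation Service Governance System","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To solve these problems and escape sinking governance, we had to make a difficult but unavoidable choice. Was there a way to solve the multi-language and business-coupling problems that Smart Client brings, while also gaining governance capabilities that are rich in features and appropriate in granularity? And, given limited resources, could we solve the problem pragmatically by adopting existing solutions rather than building our own?","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"After several rounds of screening and debate, the answer became clear: service mesh (Servicemesh). We chose Istio, the de facto standard service mesh in the cloud-native ecosystem.","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"2.1 What Is a Service Mesh","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The service mesh (Servicemesh, hereafter “Mesh”) concept was formally proposed in the spring of 2017 and swept the industry in the summer of 2018 with the release of Istio 1.0, built jointly by Google, IBM, and Lyft. It emerged mainly to solve one major problem of the Smart Client model: how to deal with the tight coupling between service governance and business code, and with inefficient governance across languages. Mesh's answer is to push governance capabilities down to a nearby layer and let a Sidecar take over all north-south and east-west traffic. The most direct benefit is decoupling: the application itself becomes a “black box”, service governance as a whole becomes more standardized, and operational efficiency improves.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"On top of this, governance capabilities can be enhanced quickly under the principle of “develop once, available everywhere”, which frees up engineering productivity, as shown in the figure below:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/ae/aee5e4d5494b9609bd8ff2b5b491332e.webp","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Logically, Istio is divided into a data plane and a control plane, as shown below:","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The data plane consists of a set of intelligent proxies (Envoy by default) that manage network communication between microservices and collect and report telemetry for all mesh traffic.","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The control plane manages and configures the proxies to route traffic.","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/53/5355dd262b7bd8be0aa78c8c0ef81b41.webp","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"2.2 The Winding Road of Service Mesh","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Service mesh is a new concept, but the architecture itself is not novel. Years before the term existed, Airbnb had already practiced it in its SmartStack governance framework, and Ctrip's OSP and the governance solutions found across various cloud platforms (mesos/marathon, k8s) had long used a similar Local Agent architecture. At that time, however, the industry had no unified standard, and the operational complexity deterred many teams. As k8s redefined infrastructure, service mesh emerged and redefined the Local Agent.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As service mesh gained prominence, problems followed. Many people voiced varying degrees of concern about the performance, stability, and resource overhead that the Mesh approach implies. These concerns contributed directly to the setback of Linkerd, then the best-known mesh, and to the twists in Istio's architecture. After a long period of doubt about control-plane performance, Istio finally broke with its past by removing Mixer and introducing a WASM mechanism for plugin-style extension on the data plane. This was a difficult and bold step, but one that brings new risks as well.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To this day, whether to use Mesh, when to use it, how to use it well, and where it is positioned and headed remain lively topics. That is part of its appeal. Overall, the Istio open-source community has shown a positive and open attitude, and we have reason to believe that Istio, having become the de facto standard for service mesh, will keep delivering more.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A survey of Mesh adoption in the industry today:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1. Tencent Cloud launched TCM on top of Istio, supporting both managed and self-built clusters, with traffic control across multiple regions;","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2. Ant's Sofa-Mosn took a different path, rewriting the Mesh in Golang and evolving it independently, and it has been highly visible in China;","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"3. Meituan-Dianping is pushing its OCTO2.0 service governance system, moving to a Mesh based on Envoy plus a self-developed control plane;","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"4. Baidu has two internal Mesh products, BMesh and Tianhe (天合) Mesh;","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"5. Toutiao and Kuaishou are building their own, NetEase Qingzhou has adopted Mesh, and Momo has built a Java-based Mesh;","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"6. Azure, AWS, and Google Cloud have all launched Mesh products;","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"7. 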
......","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The figure below gives a partial overview:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/0e/0ee6c3bdb2532fde232d415e19786029.webp","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Summarizing further, we can see that:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1. Envoy (which Istio uses by default) has become the de facto standard;","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2. The Istio project is still iterating quickly and is approaching production stability;","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"3. The major global cloud vendors and many Chinese companies have already deployed Mesh;","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"4. The mainstream approach today is (customized) Envoy plus a self-developed control plane;","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"5. The industry is trying to reap Mesh benefits by sinking middleware capabilities into the mesh.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Our choices:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1. On ROI: we did not want to build a Mesh from scratch. We wanted to concentrate resources on business iteration, so we went into the selection with the mindset of “cover 80% of the needs; the remaining 20% can be compromised on or enhanced later”.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2. On language stack: a Mesh is essentially a process “parasitic” on the application's machine, so resource control matters a great deal. Building a Sidecar in Java is therefore unwise at this stage, and this was also the main reason Linkerd 1.0 failed. So we did not plan to adopt a Java-based Mesh.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"3. On the open-source ecosystem: after several years of hardening, Istio still has many imperfections, but with its strong capabilities, the backing of major vendors, and an active ecosystem, it has become the de facto Mesh standard. So we wanted to build Aifanfan's Mesh on Istio.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"4. On collaboration with Baidu's infrastructure team: we held several rounds of discussion with the cloud-native engineers there about whether to reuse Baidu's in-house Mesh products directly. Given the “private / multi-cloud deployment” requirement, Aifanfan wanted a lightweight deployment that leaves the structure of the open-source components unchanged as far as possible, for example by avoiding coupling with Baidu-only infrastructure and by deploying in a fully native way.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The two teams therefore agreed on the final plan: for now, Aifanfan would not use the infrastructure team's Mesh directly. Instead, the infrastructure team would operate the k8s cluster and set up the calico network for us, with Baidu's Tianhe (天合) product managing the cluster. On that foundation, Aifanfan would deploy the native Istio 1.7 components.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"2.3 
How ToB and ToC Scenarios Differ in Their Core Requirements for Mesh","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In ToC scenarios, performance is usually a top priority. Mesh's current performance (RT & OPS) is not outstanding: the official solution adds anywhere from a few milliseconds to about ten milliseconds of latency, while the better in-house or customized solutions in the industry land at roughly 0.5 to 2 ms. In high-traffic ToC scenarios, this hinders Mesh adoption. Only after the performance problem is solved do questions such as portability come into consideration.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In ToB SaaS scenarios, the core requirement is ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"[portability]","attrs":{}},{"type":"text","text":": the product must support private and multi-cloud deployment well, so it needs to be easy to migrate and maintain. By comparison, absolute Mesh performance is not the top concern in the early and middle stages. Only in the later stages, as middleware capabilities sink into the mesh, do higher performance requirements gradually move onto the agenda.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In short, the two differ as follows:","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"ToC scenarios: [performance] is considered before [portability]","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"ToB scenarios: [portability] is considered before [performance]","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Aifanfan is a typical ToB scenario, and Mesh works well here as an out-of-the-box solution.","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"3. Practice: Smooth Migration and Empowering the Business","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"3.1 
Aifanfan's Current State","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Aifanfan currently has 3 IDCs (North China, South China, and East China), 300+ k8s nodes, 300+ applications, 3k+ service endpoints, and 8k+ pods, serving more than 1 billion PV per day. Most of the main business products run on the East China cluster, so this migration mainly targeted that cluster.","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"3.2 Smooth Migration","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"3.2.1 POC Validation","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"We chose Istio 1.7 and ran POC performance tests using Aifanfan's real usage scenarios as the baseline. Single-machine performance was sufficient for Aifanfan's current needs: at around 100 QPS per machine, the performance loss from introducing Istio was below 1%. We also validated Istio's core capabilities.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/7a/7a98bd945f8e21e82dddc9edf770dd64.webp","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"3.2.2 
Migration Plan","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The migration followed these main principles:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1. Monitoring first;","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2. Minimal impact noticed by business teams;","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"3. As lossless as possible.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Based on these principles, the overall architecture of the migration plan is as follows:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/fe/fea4effee616de68dcd4559aa69c964e.webp","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In brief: build a new Mesh container-network cluster with calico as the network layer, and perform the gray release at the ingress gateway. The two clusters communicate through Istio-Gateway, with fault tolerance at multiple points. The infrastructure is rebuilt with Istio as the core of service governance. Throughout, the gray migration and the new cluster's behavior are observed through dashboards. The migration hides as much detail as possible from business teams at two levels, CICD and the SDK.","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"3.2.3 
Migration Challenges","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The main challenges we ran into during implementation:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"No closed traffic loop can be assumed. ","attrs":{}},{"type":"text","text":"In a complex distributed topology, it is extremely hard to pick a fully closed sub-topology to migrate and validate first. If a service is moved to the container-network cluster without preparation while some link in its call chain remains on the host-network cluster, a production incident is very likely. To address this:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1. We used Skywalking to observe the call topology and, during early validation, kept traffic from spreading too widely;","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2. Using the old registry and a gray list, services in the container-network cluster call non-gray applications by connecting directly back to the hosts. This allows services to be migrated safely without worrying about whether traffic forms a closed loop.","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"The container-network environment was unstable early on. ","attrs":{}},{"type":"text","text":"Early in the migration, the new cluster occasionally suffered instability in infrastructure such as Nodes and the API Server. Without intervention and a fast response, this could cause serious business problems. We therefore added availability safeguards at several points:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1. At the infrastructure level, we quickly contained and tuned jitter in the api server, etcd, and similar components, and defined SOPs for stability;","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2. At the ingress gateway, gray release and rollback can be done for any product line at any gray ratio;","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"3. For idempotent requests, automatic fallback on failure;","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"4. For failed requests, automatic circuit breaking and recovery;","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"5. Scheduled jobs and asynchronous MQ consumer processes, which are easy to overlook, are tagged so that they are scaled in automatically on a one-click rollback;","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"6. For calls within the Mesh container cluster, the caller supports connect/read timeouts and retries.","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"A large-scale migration is hard to hide from business teams. ","attrs":{}},{"type":"text","text":"The migration touched essentially all 300+ business applications. With the business iterating at high speed, the question was how to keep the cost to business teams low while switching quickly. We took three measures:","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1. The SDK stays backward compatible by default wherever possible, sparing business teams large-scale changes;","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2. At the CICD level, deployment details of the old and new clusters are hidden, gray release proceeds in batches by product line, and a single set of templates manages the configuration of both clusters, making the switch fully transparent to business teams in CICD;","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"3. For urgent issues found during the migration, the launcher hot-loading mechanism provided by the Fengchao (鳳巢) commercial platform team automatically swaps in upgrade packages, allowing zero-intrusion replacement and fast validation of new functionality.","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Governance challenges introduced by Istio. ","attrs":{}},{"type":"text","text":"Introducing Istio fundamentally changed a governance framework that had been built on the Smart Client model, which brought adaptation and switching costs. We responded as follows:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1. Change of mindset: governance concepts and models align fully with Istio, and the approach of governing everything by ServiceID (method level) is gradually abandoned;","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2. Configuration tuning: Istio adds two hops to every call path. For these hops we re-examined how core settings such as connect/read timeouts, retries, and the tcp backlog size interact, to avoid unnecessary stability failures;","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"3. A single entry point: with Istio, most governance capabilities are driven through CRDs. We temporarily integrated the governance entry point into the CD system and prohibited core configuration changes from kiali or anywhere else, so that a single entry point prevents disorderly production management;","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"4. Compromise and enhancement: Istio is very capable, but some areas still need strengthening, such as rate limiting, circuit breaking, and chaos engineering. After weighing the tradeoffs, we dropped some features for now (for example, cluster-level rate limiting) and filled in others (for example, adding chaosmesh for chaos engineering). This let us benefit from Istio quickly.","attrs":{}}]}]}],"attrs":{}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"3.2.4 Migration Timeline","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The Mesh project formally started at the end of August 2020. POC validation finished in early September, and the MVP was delivered at the end of September with 17% of Aifanfan's applications switched over. From October on, we gradually expanded coverage, kept improving the new cluster's stability, and began rolling out Istio capabilities. By the end of November 2020, all business applications on the main East China cluster had been switched. With 5 engineers, the whole process from validation to cutover took only 3 months, ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"making Aifanfan the first Baidu product to run a commercial production system entirely on native Kubernetes + Istio.","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"3.3 Reaping the Benefits","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"After the main switch to istio, we did not stop. We moved straight on to empowering the business to get the most value out of the mesh. On this standardized foundation we delivered nearly 20 features, helping the business improve across efficiency, stability, functionality, and cost.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"3.3.1 Full-Link Gray Release","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Take one case as an example. Aifanfan's “full-link gray release” platform is built on istio with a uniform underlying design of “grouped multi-dimensional routing”. It addresses the drawbacks of the mainstream flagger/helm approaches and uses a single architecture to support multiple core capabilities, including ABTest, canary release, capacity evaluation, multiplexing, and Set-based (unitized) deployment (some still in development), while centrally managing the full lifecycle and traffic of grouped nodes. On the server side, an FGR Operator coordinates k8s and istio vs/dr (VirtualService/DestinationRule) resources and is integrated with monitoring, alerting, and CICD. On the client side, the platform is integrated with the packaging and fetching of front-end resources to tag and route traffic at the user level. With traditional solutions this would require a large development investment; relying on istio, our overall resource investment was greatly reduced.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/6d/6d5bc04be2f07949682d0ac80654a07a.webp","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"3.3.2 How Aifanfan Uses Istio Today","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Istio offers rich governance capabilities covering service connection, service discovery, service protection, and service observability. Aifanfan's current use of Istio includes, but is not limited to:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"Service Connection","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1. Communication: long-lived connections over the original Http1 protocol; service discovery based on K8s service;","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2. Load balancing: RR by default, with consistent hashing for applications with special needs (such as dataio, Aifanfan's database middleware);","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"3. Routing groups: canary release, test-environment multiplexing, ingress gateway traffic routing, abtest, direct connection from developer machines, gray links, and so on.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"Service Protection","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1. Authorization: access control for sensitive interfaces (such as retrieving a user's phone number);","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2. Rate limiting and circuit breaking: per-instance limiting based on connection count; circuit breaking based on slow calls and on error count or error rate;","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"3. Fault injection: fault simulation for east-west traffic, with the rest handled by chaosmesh/chaosblade.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"Service Operations","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1. Service management: instead of the open-source kiali console, node information is shown on Aifanfan's one-stop platform, which provides basic management functions such as rate limiting and circuit breaking, configuration control, and service migration;","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2. APM: for Istio's own APM, Logging is collected with an EFK stack and Metrics with Prometheus, all managed through Grafana. APM for business applications remains unchanged for now and still uses the non-Mesh stack of Skywalking + EFK + Prometheus + Grafana.","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"3.4 
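Benefits of Aifanfan's Switch to Servicemesh","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The switch to Mesh marks another core milestone in Aifanfan's cloud-native journey. Aifanfan took apart the foundations of its service governance and began rebuilding them, and sinking governance has started to change. The earlier problems of multi-language governance, business coupling, missing capabilities, and manual configuration have eased considerably. In functionality, more than 10 previously missing core governance capabilities were added quickly. In efficiency, the lifecycle of a governance change dropped from months to minutes and CI pipeline time fell by 20%, freeing up both business and architecture teams. Test-environment multiplexing can even overturn the existing development model by enabling parallel development and testing, while cutting the waiting time for integration testing by more than 30%. In stability, rate limiting, circuit breaking, and chaos engineering give the business solid means of self-protection. Canary release makes releases lossless for live traffic and frees engineers from late-night deployments, and the stability safeguards built on istio have greatly improved Aifanfan's overall stability. These are only the benefits available today; future benefits will go well beyond them.","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"4. Conclusion: A Sea of Stars","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For now, from a pragmatic standpoint, Aifanfan's service governance still faces significant challenges that must be overcome one by one to get the most out of Istio. At the same time, we are not content to define Servicemesh merely as control of north-south and east-west traffic. Faced with efficiency problems, Servicemesh can deliver much more and address a wider range of pain points: sinking governance exists not only in distributed service frameworks but will persist in all middleware. We have noticed related explorations in the industry, including within Istio itself, and we firmly believe this will become the mainstream trend for multi-language microservice architectures. Driven by its own pain points, Aifanfan has also led the incubation of an apm mesh, Apache Skywalking Satellite, which has been successfully released.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"We hope Aifanfan's Servicemesh system can truly become ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"“the core of next-generation middleware governance”","attrs":{}},{"type":"text","text":". We believe this will be achieved in the near future in collaboration with other departments in the company, leaving sinking governance behind and accelerating the delivery of value to customers.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Author | 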
Chengzi (橙子)","attrs":{}},{"type":"text","text":", chief architect of Baidu's Aifanfan business unit, Tencent Cloud Most Valuable Professional, QCon producer, ArchSummit star speaker, and Apache Committer; previously head of platform, infrastructure, and operations at several companies.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"We're Hiring","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Whether you work in back end, front end, big data, or algorithms, we have openings for you. To apply, follow the WeChat official account 百度Geek說 and send the message 內推. The Aifanfan business unit looks forward to having you!","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Read the original","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https://www.oschina.net/action/GoToLink?url=https%3A%2F%2Fmp.weixin.qq.com%2Fs%3F__biz%3DMzg5MjU0NTI5OQ%3D%3D%26mid%3D2247493554%26idx%3D1%26sn%3D9eaa6cb738547c38980c23798fd66e29%26chksm%3Dc03ed7cef7495ed8422338b880235d04c0ca2ccfd4abb96ca0bd9c9ef47a535729865f1f0cdb%26token%3D1064478816%26lang%3Dzh_CN%23rd","title":null,"type":null},"content":[{"type":"text","text":"Baidu Aifanfan and Servicemesh: A Story That Had to Be Told","attrs":{}}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"---------- END 
----------","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"百度Geek說","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Baidu's official technical WeChat account is now live!","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Technical deep dives · Industry news · Online salons · Industry conferences","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Job openings · Referrals · Technical books · Baidu merchandise","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Follow us","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}}]}
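Section 3.3.2 above mentions round-robin load balancing by default, consistent hashing for dataio, connection-based limiting, and error-based circuit breaking. As a rough illustration only, the sketch below builds an Istio `DestinationRule` manifest with those settings as a plain Python dict. The function name `build_destination_rule`, the host name, the `x-shard-key` header, and every threshold are assumptions made for this example; they are not taken from Aifanfan's actual configuration. The sketch covers only error-based ejection through `outlierDetection`, not the slow-call-based breaking the article also mentions.

```python
import json


def build_destination_rule(name, host, hash_header=None,
                           max_connections=100, consecutive_5xx=5):
    """Return an Istio DestinationRule manifest as a plain dict.

    All argument values are illustrative assumptions, not settings
    taken from the article.
    """
    if hash_header:
        # Consistent hashing keyed on an HTTP header (hypothetical header name).
        load_balancer = {"consistentHash": {"httpHeaderName": hash_header}}
    else:
        # Round robin, the default the article describes as "RR".
        load_balancer = {"simple": "ROUND_ROBIN"}

    return {
        "apiVersion": "networking.istio.io/v1beta1",
        "kind": "DestinationRule",
        "metadata": {"name": name},
        "spec": {
            "host": host,
            "trafficPolicy": {
                "loadBalancer": load_balancer,
                # Connection-count limit applied by each sidecar.
                "connectionPool": {"tcp": {"maxConnections": max_connections}},
                # Eject an endpoint after consecutive 5xx responses,
                # then let it back in after the ejection time.
                "outlierDetection": {
                    "consecutive5xxErrors": consecutive_5xx,
                    "interval": "10s",
                    "baseEjectionTime": "30s",
                    "maxEjectionPercent": 50,
                },
            },
        },
    }


if __name__ == "__main__":
    rule = build_destination_rule(
        "dataio", "dataio.default.svc.cluster.local", hash_header="x-shard-key"
    )
    # kubectl accepts JSON manifests as well as YAML.
    print(json.dumps(rule, indent=2))
```

Generating the manifest from code, rather than editing it by hand, fits the article's point about funnelling all governance changes through a single entry point in the CD system.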