How Does JD.com Build a Monitoring-Logging System on a Cloud-Native Architecture?

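{"type":"doc","content":[
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In an era when everyone is talking about “cloud native,” enterprises building internal systems often consider a cloud-native architecture first. So how does a system built on a cloud-native architecture differ from one built on a traditional architecture, and how should it be built? For this article we interviewed Han Chao, an architect at JD.com, who described how JD.com built its cloud-native monitoring-logging system. We hope it helps readers who want to build systems on a cloud-native architecture."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"What Is Special About Cloud-Native Monitoring Systems?"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"With the growth of cloud computing, many systems are now built on a cloud-native architecture, and monitoring systems are no exception. How exactly does a cloud-native monitoring system differ from a traditional one? Han Chao said: “The essential difference is that a cloud-native monitoring system has to be deployed and operated in the Cloud Native way.”"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Cloud Native is still an evolving ecosystem. Put simply, a cloud-native architecture has to fit into the Kubernetes ecosystem so that the main functions of Monitoring, Logging, and Tracing operate in the Cloud Native way. From the perspective of the system (Kubernetes or PaaS), cloud-native monitoring has to be a standard component rather than something that hacks the system “to pieces”; from the perspective of the application, cloud-native monitoring has to blend into the system, and onboarding must not be too complicated."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Monitoring systems on traditional and cloud-native architectures share roughly the same functional goals but differ in how the details are handled; the latter can be seen as an “advanced” form of the former. The two kinds of monitoring systems also face different challenges."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A monitoring system on a traditional architecture mainly faces challenges in four areas, namely scale, speed, quality, and cost:"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"bulletedlist","content":[
{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Scale: the monitoring system has to run in an environment of tens of thousands of hosts and millions of applications, while development and operations complexity stays at O(1), that is, it should not grow with the number of hosts or applications."}]}]},
{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Speed: monitoring has to be deployed as quickly as the hosts and applications themselves. Bringing a host online should take the same time with or without monitoring, and when a group of applications goes live, the accompanying monitoring should be available at the same time."}]}]},
{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Quality: monitoring has to be more timely and more stable than the applications it watches. Monitoring is the safeguard for applications, and the precision and recall of problem detection are its key metrics."}]}]},
{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Cost: the monitoring system has to be frugal with resources; its own CPU and memory overhead is a key metric."}]}]}
]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For monitoring systems built on a cloud-native architecture, the challenges mostly come from the Cloud Native model itself, mainly:"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"bulletedlist","content":[
{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The Cloud Native ecosystem itself provides rapid deployment, automatic scaling, and similar capabilities. Since applications have these properties, the monitoring system needs them as well;"}]}]},
{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In a cloud-native PaaS, both Kubernetes and the application layer are standardized. Because the monitoring system sits between the two, the harm caused by an “intrusive collection agent” is even more serious;"}]}]},
{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Cloud native holds standardization and automation as core principles, but large systems often need “extreme optimization.” Achieving “extreme optimization” while staying standardized and automated is an even greater challenge."}]}]}
]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"How Does JD.com Build Its Monitoring System?"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Initially, all of JD.com's applications were deployed on physical machines. This approach wasted a great deal of physical machine resources and made scheduling inflexible. When a physical machine failed, migrating its applications took hours, and automatic scaling was impossible."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To solve these problems, JD.com began experimenting with Docker in 2014 and built its first-generation container engine platform, JDOS 1.0, on an OpenStack + Novadocker architecture. From then on, all applications ran in containers rather than directly on physical machines. A single OpenStack distributed container cluster had at most 10,000 compute nodes and at least 4,000."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In 2016, after the number of containers on JDOS 1.0 had grown from 2,000 to 100,000, JD.com launched a new container engine platform, JDOS 2.0, and JD Mall's “application system” switched from OpenStack to Kubernetes."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/wechat\/images\/00\/0015551fe05a29dd0da3c11e311f17c6.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":false,"pastePass":false}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"JDOS 2.0 platform architecture"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"JD.com's Cloud-Native Monitoring-Logging System"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"JDOS 2.0, from JD Mall's infrastructure team, is already very close to the “cloud-native standard,” and JD.com's cloud-native monitoring-logging system has been developed and built in step with it. The system's core components include the collection agent, access proxy, storage module, compute module, service control center, and alerting."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/wechat\/images\/7d\/7d931a44fc5b3e71f6eb6a0519ca0c7d.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":false,"pastePass":false}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"bulletedlist","content":[
{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"At the very bottom, the core layer is the storage module developed in-house by JD Mall's infrastructure team;"}]}]},
{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The middle layer consists of the other core modules, which can be arranged and combined like building blocks;"}]}]},
{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"On top of the core monitoring-logging system sit API-based extension capabilities, such as business-specific extensions and AIOps extensions."}]}]}
]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The modules of JD.com's cloud-native monitoring-logging system also communicate with one another as services, so they look just like ordinary applications. The collection agent is a special case, because it has to run in the Node environment, one per machine. No monitoring-logging system can avoid this; JD.com made the agent a standard K8s component and thereby achieved “low intrusiveness.”"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"JD.com's monitoring-logging system essentially consists of standard K8s components and is closely tied to JDOS, JD.com's container platform."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"bulletedlist","content":[
{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"From the K8s perspective, the monitoring-logging system is a set of Pods managed by DaemonSets and RS (ReplicaSets); the system itself is not modified."}]}]},
{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"From the perspective of the JDOS container platform, the monitoring-logging system can be regarded as a “plug-in” of the platform rather than something tightly coupled to it."}]}]},
{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"From the application's perspective, the monitoring-logging system is a mechanism it “does not need to be aware of”; for example, going live has no dependency on it."}]}]}
]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Challenges in Building the Monitoring-Logging System"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Most of the challenges in building and upgrading a large system come from the rollout. A rollout is like “adding parts to a car while driving it,” and stability has to be preserved throughout. JD.com's monitoring-logging system was no exception. During the rollout, the old and new systems ran in parallel with “temporary redundancy” and dual writes, which solved the problem of switching over smoothly. A small-traffic mechanism, in turn, avoided the need for double the machine resources."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In addition, rolling out a monitoring-logging system often runs into a dilemma over rollout order caused by “strong intrusiveness.” JD.com's monitoring-logging system was therefore designed from the outset to avoid this problem."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Alerting is a module of the monitoring system that is both heavily customized and in heavy demand. Generally speaking, alerting involves two layers: the channels, such as SMS, email, and IM, and the engine for configuring alert rules."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"JD Mall's app monitoring system uses an extremely flexible alert-rule mechanism in which rules are set by the users rather than by the developers and operators of the monitoring system. This design gives each business development team the chance to “balance itself,” tuning the volume and severity of alerts to a reasonable level over time. Too many alerts tend to produce high recall but low precision; too few tend to produce high precision but low recall. Although these two goals always pull against each other, JD.com has made some small technical optimizations, such as automatic merging and adjusting the time axis."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In Han Chao's view, the biggest strengths of JD Mall's monitoring-logging system today are a flexible architecture and keeping pace with the times. Architectural flexibility is a spatial concept covering three aspects: reference architecture, topology, and deployment method. Keeping pace with the times is a temporal concept covering three aspects: maintenance cost, evolution model, and technological development."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Of course, much in the system can still be optimized. In Han Chao's view, the areas where JD.com's monitoring-logging system should improve are also inherent architectural problems of many large systems."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"bulletedlist","content":[
{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Multi-cluster, multi-region: the overall architecture has grown beyond the definition of a K8s cluster, and the monitoring-logging system needs to change accordingly;"}]}]},
{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The objects monitored by the whole system include ordinary apps as well as middleware such as databases, caches, and queues. These need to be integrated so that developers in every business can better experience the benefits of Serverless;"}]}]},
{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"baudtime and baudlog, developed in-house by JD Mall's infrastructure team, already save storage resources to a large extent. If read and write capabilities are weighed against each other, there is still room to reduce cost, and the query experience could be benchmarked against the stronger ELK."}]}]}
]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Interviewee:"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Han Chao is a cross-domain architect with a broad technology stack and deep technical grounding. He was a pioneer of embedded Linux in mainland China and has worked on mobile multimedia, Linux platforms, and internet architecture; his current focus is distributed infrastructure. He was previously a senior engineer at Motorola, an MTS at Intel-WindRiver, and a principal architect at Baidu, and is now an architect at JD.com."}]}
]}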