數千個數據庫、遍佈全國的物理機,京東物流全量上雲實錄 | 卓越技術團隊訪談錄

{"type":"doc","content":[{"type":"heading","attrs":{"align":null,"level":1}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"採訪嘉賓 | 章華、馬琪、張成遠、陳春輝"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2017 年 2 月,時任京東 CEO 的劉強東公佈了未來 12 年的戰略:技術轉型。當年的演講裏,劉強東首先提到的就是雲計算,其次是大數據、人工智能和基因技術。從電商到技術提供商,京東不乏勇氣和底氣。財報顯示,自進入技術戰略升級以來,京東體系已在技術上累計投入近 750 億元。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"雲計算作爲京東技術戰略裏的重要部分,如今已經獨當一面。4 年多的時間,京東雲已經爲 1500 多家大型企業、152 萬家中小微企業提供技術服務。那麼,如今京東的雲計算實力究竟如何呢?"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"京東雲演進歷程:大促和雲計算相輔相成"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"“像十年前或五年前,大家進入雲計算領域的時候都是從基礎設施開始各自發展,沿着相同的環境,只是選擇了不同的成長路徑而已。”"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"自 2006 年提出到現在,雲計算已成爲世界頂尖互聯網公司必爭之地。憑藉着出色的雲計算業務,亞馬遜成爲全球市值最大的電商公司。而在中國的土壤上,雲計算的發展同樣跟重度使用 IT 基礎設施的“電商業務”休慼相關。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2008 年到 2012 年,是中國本地雲計算髮展的起始階段,也是中國電商發展的黃金時期,京東商城的日交易量每年指數級增長:從日均 5000 單,到日均 10 萬單,再到日均 50 萬。2011 年,瞬間流量峯值已經突破每秒 10 
萬單。在購物節以及大型促銷活動中,電商平臺比拼的是後臺系統以及相應的 IT 資源是否能快速擴充以應對流量洪峯。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"此時的京東系統採用的是集中式的架構,短期內無法進行服務資源補充。2011 年,京東因爲一場圖書大促活動太過火爆而發生服務器宕機的現象。在這場活動的最後半小時,購物車和下單頁面要麼打開遲緩要麼根本打不開,導致許多用戶無法下單。爲此,業界傳聞負責的研發同事還被公司領導請“喝茶”,京東也不得不在微博上向大家道歉。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"業務壓力爲京東技術架構的雲化轉變帶來契機。2012 年左右,IT 系統採用“分佈式架構”進行了重塑,並將物理機轉向虛擬化,可以彈性調節 IT 資源,然後進一步地轉向了微服務架構。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/wechat\/images\/32\/322debcd8ed9cd8d3fac60a3a9216226.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"過去十年,京東成長迅速。十年前京東有兩千個員工,收入 40 億人民幣,發展到今年達到 37 萬員工,收入達 7000 多億,與業務發展相對應的是日益增加的基礎設施需求。2014 年之後,京東對技術架構和集羣建設進行整體評估、重新設計規劃。此時 Docker 技術興起,京東將應用從物理機遷移部署在 Docker 上,採用 OpenStack+nova-docker 技術架構,用管理虛擬機的方式管理容器,發展形成京東第一代容器引擎平臺 JDOS1.0。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"京東主要的一些核心應用比如秒殺、配送員訂單詳情、全球購等都部署在 JDOS1.0 中,2015 年 618 大促前,京東運行的較大規模的 Docker 容器和 KVM 
虛擬機集羣,經受住了當年流量的考驗。"}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"基於複雜場景下的容器化實踐,打造京東混合雲"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2015 年是整個京東技術發展的分水嶺,京東一位技術研發負責人表示,在這之前京東的技術一直是服務於業務的發展,而在這之後京東的技術開始驅動業務的發展。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"爲更好地應對複雜業務場景,這一年,京東對技術做了一次架構調整,將技術部門從業務部門剝離出來,成爲一個單獨的技術大體系。因此京東的技術部門得到了前所未有的獨立性,除了服務於商城業務的應用研發團隊,包括雲、大數據、AI 等技術研發團隊第一次開始了自主的技術研發,也爲後來京東技術的對外輸出和技術轉型奠定了基礎。京東雲的發展,就是在這之後進入了快車道。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"JDOS1.0 中原本採用的是 Docker 容器技術,其調度方式較爲單一,只能簡單根據物理機剩餘資源是否滿足要求來進行篩選調度,在提升應用的性能和平臺的使用率方面存在天花板,無法做更進一步提升。2016 年,當容器規模逐漸增長到十萬、十五萬時,京東圍繞 Kubernetes,整合了 JDOS1.0 的存儲、網絡,打通了從源碼到鏡像,再到上線部署的 CI\/CD 全流程(JDOS2.0)。從早期使用較多的 Oracle 和 SQL Server 產品,到全面去 Oracle、SQL Server,開始使用 MySQL 等開源及自研的數據庫產品,京東雲數據庫在 2016 年開始對外提供服務,目前共開放十餘款雲數據庫產品。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"容器技術是所有平臺服務的基石,京東是容器化最徹底的公司之一。這些對內部基礎設施進行容器化、資源池化,以及一些基於開源的中間件體系打造,形成了京東私有云基礎,結合 2015 年規劃好的公有云平臺一起構成京東混合雲,於次年正式對外開放。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2017 年京東又利用 Kubernetes 技術來重構相關技術棧,全面對技術進行升級,並將數據庫、大數據等業務通過 Kubernetes 部署,在容器化基礎上打造了“阿基米德”調度系統,這也是業界比較早的基於 Kubernetes 
的混合雲統一調度系統。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"集團核心業務也逐漸往雲上遷移,歷經了幾年 618、11.11 大促,混合雲 PaaS 平臺逐漸被錘鍊成熟。服務器的算力也能被最大程度化利用,2019 年利用原有基礎設施全年沒有再採購物理服務器,一舉節省 IT 成本數十億元。"}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"降低複雜度,打造雲艦平臺"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"據京東的老員工回憶,京東早期系統很小很簡單,只有交易網站、供應鏈管理系統和一套財務系統這三個系統,那時候做促銷活動非常簡單:“早晨開早會的時候談當天做個什麼活動,比如抽獎、轉盤,聊完了研發去做開發,開發到下午 4、5 點鐘,測試看看行不行,到晚上 7、8 點上線就好了。”"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"發展到現在,京東雲底層基礎設施相關係統已經變得龐大而複雜,僅以大促爲例,在全球,有 3 家雲廠商,4 大區域,近 50 個大型數據中心,近 60 朵城市雲,77 個離線數據中心,支持數十萬智能設備,服務近 5 億用戶,這意味着公有云環境、私有云環境,也有邊緣節點、機房服務器並存,甚至還有跑在路上的終端、配送車,這些大規模混合 IT 設施支撐着京東每次大促活動。所以如今做促銷活動,難度大了很多,而且像 618 這樣的活動資源需要迅速地擴充到平時的 135%。京東雲於 2020 年開始做了大量優化,自研了混合雲操作系統“雲艦”,通過雲艦對上提供一個統一的接口來調度 IT 基礎設施,屏蔽掉了底層複雜性,同時對外界用戶更爲友好。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"這個混合雲操作系統,可以同時調度超 1 千萬核的系統資源,在線管理 Pod 數超 200 萬,承載最複雜場景的雲原生實踐。京東科技京東雲事業羣技術總監,數據庫研發部門負責人張成遠解釋道:“對用戶來說,雲艦屏蔽掉了下面所有的 IaaS 基礎設施,幾十核就可以把整個雲艦操作系統安裝起來,數據庫、中間件等所有服務可以像軟件插件一樣使用,按需安裝。”"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"京東“護城河”業務物流的上雲路"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"京東物流是京東集團的核心資產。從 14 
年前劉強東力排衆議成立物流部門到今年獨立上市,京東物流已經成爲京東的一道重要“護城河”。數據顯示,2018 年到 2020 年,京東物流營收分別爲 379 億、498 億、734 億,特別是 2020 年,營收實現了 43.2% 的爆發式增長。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2018 年,京東開始規劃混合雲服務。2019 年左右,上雲成爲整個京東的重要技術戰略,這裏主要指從私有云轉向公有云。京東物流作爲重要的事業羣,同時擁有相對豐富的場景,上雲技術和經驗可以反哺整個集團。在公司政策和自身需要的雙重驅動下,物流上雲成爲京東上雲戰略裏的重要一環。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/wechat\/images\/ba\/bad4c896b90efeeab85f0292d97f7e6e.jpeg","alt":null,"title":null,"style":null,"href":null,"fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"物流上雲是一次多部門合作的過程。物流部門把握整個業務上雲的節奏,京東雲按需求提供雲基礎設施,兩個部門大約半個月做一次進度溝通。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在新設計的物流雲基礎架構中,之前高度耦合的 Docker、JimDB、ES(Elasticsearch)和 DB(數據庫)等通過 VPC 分別放到對公子網、業務子網和數據子網中。因此,上雲首先就要解決網絡問題。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"物流並非純粹的互聯網,其基礎設施拓撲結構的複雜性遠遠大於現在的頭部互聯網公司。京東物流系統管理着全國大約 1300 個倉庫,跟實物流轉密切相關,因此有很多系統運行在全國各區域的本地物理機上。不同的 VPC 子網要對應分佈在不同的物理機房(AZ)。京東雲需要制定具體的網絡規劃、完全隔離的 VPC 環境和細化不同業務的網絡配置等。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"上雲會使物流從重型資產向輕資產化轉變,因此團隊部署了 CMDB 來做混合雲資產管理,並同步計費信息。爲保證整個上雲過程可控,團隊進行了資源監控和性能監控等。此外,團隊還研發了很多自助運維工具和數據同步平臺“數據蜂巢”等適應雲架構,同時沿用一些傳統工具,如 
J-one 和 UDBA 等減少研發人員學習成本。"}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"雲上遷移:“卡”在了對人的依賴"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"“遷移之前心裏是沒底的,因爲每個業務系統完全不一樣,會遇到什麼困難也完全預測不到。”"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"上雲是個大工程,尤其物流系統業務極其複雜,模塊之間互相依賴,涉及百萬核級別的業務應用、數據庫及中間件等需要向雲上遷移。“運營過程中,涉及的每一筆訂單、每一次交易、每一筆支付、每一個包裹都不能出錯,這是非常大的技術挑戰,可以說整個物流上雲的過程無異於給一架高速飛行的飛機更換引擎。”張成遠說。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"做好準備後,京東物流沒有立即全部上雲,而是先對非核心業務做小規模遷移,來驗證了各種組件的可用性、繼續完善遷移工具。這個“實驗”階段持續了大半年的時間,之後物流系統才迎來了大規模上雲。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"上雲方式基本可以兩類:對原系統進行雲改造和直接重構。對於物流系統來說,雲改造的佔比更多。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"京東物流技術發展部工程效能組負責人馬琪提到,需要改造的系統,有些成本還是比較高的,“說到底,目前業界流行所謂的雲原生的概念,但物流有些業務當初打造的時候,並沒有考慮未來要跑在雲上。”"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"這在物流系統中體現尤爲明顯。相對於普通的、只有一兩個機房的互聯網企業,物流企業的基礎設施是本地化的,因爲京東在全國的上千個物流倉庫,有很多本地化數據庫:京東的系統從早期開始就部署在這些對當時來說功能極爲強大的物理機上。舉個極端情況下的例子,比如某些數據庫的規格對物理機的性能要求很高,但到雲時代,是可以拆分並分散到不同雲主機上的。“千萬不要把雲上的機器想象成它就是一個物理機”,馬琪強調道,這些系統要遷移到雲上,面臨的改變是巨大的。"}]},{"type":"parag
raph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"經過架構梳理後,京東將需要遷移的服務分爲了兩種:有狀態和無狀態的。比如 Docker 服務的部署,那麼可以當作無狀態服務處理,部署後可以做大量的驗證,類似平時的上線驗證,通過灰度測試等查看各項指標沒有異常即可,這是最簡單的。而對於有狀態服務,如數據庫、Redis、ES 等,得把整個狀態遷移上去,就變得複雜了,需要花費較多的精力和成本。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"對於物理機到 Docker 容器的遷移,每個團隊可以自己做壓測、計算前後承載 QPS 的差異,逐步替換對系統影響不大,可維持系統的高可用和穩定性。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"中間件層遷移是團隊面臨的比較大的技術挑戰。一方面是公有云產品會有非常標準的 Open API,而之前內部的一些產品基本只考慮業務需求就可以。另一方面是,各種中間件的版本不一樣,包括雲和本地之間。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"其中,遇到的一個比較難的技術點是 Redis 的遷移。有些團隊使用的是零售業務的緩存中間件,和公有云的緩存中間件本質差異很大,京東內部 JimDB 分佈式緩存產品與跟雲上分佈式 Redis 產品的協議有區別,前者更加私有化和定製化的性質,對公有云產品而言並不友好。因此,在遷移到公有云 Redis 集羣過程中,物流、公有云和 Redis 開發團隊都面臨着這個差異帶來的考驗。最終經過幾輪商討,團隊研發出了兼容 Jimdb 集羣和公有云 Redis 集羣的 SDK,在只需修改依賴、URL 等的情況下就可以實現無縫遷移。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"整個遷移過程中,“瓶頸”落在了數據庫團隊上,畢竟比起 Redis 緩存裏的數據量,動輒幾百 G 或上 T 的數據遷移會複雜很多。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"京東大約在 2014 年開始去 SQL Server 和 Oracle,上雲之前已經大部分業務在用 MySQL。而物流系統中的本地數據庫,雖然大部分是 MySQL,但跟雲上的還存在版本號不一樣,或一些特性沒開啓的情況。在公有云 MySQL 版本更高的情況下,跨版本遷移導致新場景下的 RDS 
集羣無法直接掛爲從庫。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"另外,在低版本直接升級到高版本時,可能會有一些問題發生,比如變量的數據類型發生變化從而導致 Time stamp 精度發生變化。這些都需要 DBA 來協助解決,同時 DBA 也負責了很多部署、監控、備份之類的工作。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"遷移上千個物流倉庫中的本地數據庫,起初,這項工作大量依賴 DBA 團隊。當時物流的 DBA 團隊只有七、八個人,在緊張的日常任務之外再承擔了幾千套數據庫的遷移工作,項目四處冒煙,幾乎需要 24 小時 oncall,團隊很快便力不從心,也嚴重影響了整個上雲進度。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"京東物流技術發展部首席架構師章華回憶說:“之前更多考慮的是系統架構以及團隊的技術能力能不能適應由上雲帶來的新的運維方式,但確實沒預料到 DBA 的人力會成爲瓶頸,而且臨時也招不到足夠多的人。”因此,DBA 團隊只能暫停工作去研發自動化工具,來代替人力去做遷移、驗證等重複性工作。最後,通過一些 DBA 等工具,將數據複製到 RDS 
集羣,然後找時間窗口做域名切換。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"比起基礎設施的遷移,應用實例的遷移相對容易很多,可同時將應用流量打到私有云和公有云的分組上,運行穩定後就可以去掉私有云分組了。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"京東也根據不同的物流業務將系統分成了三個等級:零級系統、一級系統和二級系統,這三個等級的系統對業務的影響力依次減弱。影響下單的屬於零級系統,而類似一天只跑一次的統計分析任務屬於級別低的系統。遷移時,一般會先從對業務影響最低的二級系統開始,之後纔是一級系統和零級系統,針對業務劃分系統邊界,並分步驟實施,還要考慮應用遷移如果出問題是否能夠回滾。零級系統還需要有對比測試,灰度切換,會有三天到一個月不等的雙活階段,在驗證新架構沒有問題後,舊架構才被下掉。馬琪建議,遷移到雲上後,研發人員要做足夠的測試。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"物流是雲資源使用方,而作爲供給方,京東科技技術交付部架構師陳春輝從四個方面總結了遷移時公有云相關部門需要對應做的工作:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"第一是提供高可用"},{"type":"text","text":",保證不同倉儲系統的物理機房(AZ)就近接入華東、華南等不同 region 的公有云的機房,從機房層面保證 Docker、數據庫、中間件等系統的高可用;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"第二是保證高性能"},{"type":"text","text":",保證機器的利用率,比如 CPU 不低於 40-50% 的閾值,讓機器、容器、數據庫等最大性能地發揮作用;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"第三是保障高安全"},{"type":"text","text":",結合 VPC 子網,做 ACL 安全策略、數據庫審計以及 WAF、DDOS 
防護,保證業務高安全;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"第四是提供高運維能力"},{"type":"text","text":",利用雲資源提升物流側運維能力,按部門進行資源使用量的核算以及提供精細化計費。"}]}]}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"物流上雲,不只於千萬級別的成本節約"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"經過團隊兩年左右的努力,物流成爲京東首個實現全面上雲的部門。目前京東物流訂單平臺等核心業務系統已穩定運行在京東雲上,雲上日處理訂單量達千萬級。“全部上雲”的標準是什麼呢?物流團隊也對這個問題思考了很久。最後,團隊得出的結論是:以應用爲標準,看其依賴的 Redis、數據庫和 ES 等資源是否全部上雲,如果這些資源全部上雲就是應用完全上雲。這項標準如今也成爲京東上雲的標準。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"上雲帶給物流部門最大的改變就是再也不用在基礎設施上花費過多精力。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"物流系統相對普通互聯網企業擁有更多的本地機房,這意味着物流系統的資源使用彈性很小,爲 618 
大促等購買的物理機在平時用不到造成了很大的浪費,而隨着業務發展,物流部門需要的物理機和計算資源只會越來越多,資源浪費也會越來越大。同時,物流部門還需要人力統計數據,花很多精力去維護衆多小機房的穩定。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"上雲之前,物流部門的大部分基礎設施是用零售部門的,自己的基礎設施相對來說成熟度不高,一些維護工作也是零售團隊在做。物流團隊要花費很多精力在保證資源充足上。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"上雲之後,物流部門可使用資源的彈性大幅增加,資源利用率也得到很大提升,這給部門帶來了千萬級別成本的節省。自動化計費方式也給了研發團隊更直觀的成本觀念。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在馬琪看來,上雲不是將物理機搬到雲上,而是將整個系統和應用打造成適合雲的狀態,這樣才能從上雲中獲得最大的效益。“如果企業有能力、有資源,上雲越快越好。”"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"物流上雲後的第一次大促"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"今年京東 6·18 大促是物流上雲後第一次接受流量洪峯的挑戰。“不抗一次大促,心裏沒底。”章華說道。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"京東 618 可以稱之爲全球最複雜的業務場景之一,涵蓋了從零售、物流、金融、健康多個業務形態。每年大促前,京東各個 BG\/BU 
裏的重要負責人會組成備戰委員會,重點工作就是保證流量激增情況下系統的穩定。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"近兩年,物流團隊引入了全鏈路壓測,對用戶下單到所有參與系統完成任務的整個過程進行流量測試。其中“訂單到供應鏈系統,再下傳到物流系統,物流系統又下傳到具體庫房”的過程,是整個鏈路的核心,也是改造的重點。壓測結果也是京東雲做容量規劃的重要依據。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"一年前,物流團隊開發了“搗亂演練”的工具,對各個系統進行梳理,找出薄弱點和高可用的地方,查漏補缺,進一步加強系統的健壯性。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"當前京東有百萬個微服務化應用,故障排查比較有挑戰性。京東根據多年積累的各種故障經驗模型化,研發了一套故障分析系統進行自動化篩查。原來多部門聯合 20~30 分鐘完成的故障定位,一兩分鐘內即可完成。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"異地多活的架構也保證了服務的穩定。一旦某個節點出現問題,流量就會被切到其他節點,整個服務不會受影響。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"京東物流系統在大促期間也有一定的性能指標。比如 CPU 如果低於 50% 會被判定非高性能,存在負載量不大等問題。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"穩定性的堅強後盾就是充足的資源。今年京東 618 相對平時資源擴充了 
135%。上雲後,大促“日常化”成爲可能,技術團隊不必爲此消耗過多的精力。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"之前京東雲評估資源量是以年爲單位,但現在是按季度來評測,甚至在供應鏈方面可以細化到月或周,分批次使用不會造成資源過多積壓。在滿足日常需求外,雲上的資源池一般也會留有剩餘,應對大促流量超出預算或其他緊急情況。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"京東的單 VPC 內地址規模超過 50 萬,有超過百 G 節點網絡的大規模網管集羣,承載 TB 級專線流量。管理超過 1 千萬核資源的雲艦支持了大促期間系統快速擴充的需求。另外,雲艦的 IT 基礎設施調度能力,讓物流、零售、健康等系統在統一的調度平臺運行,使整體系統擁有了很好的彈性。數據顯示,618 期間,京東整個系統資源的利用率提升了 3 倍,單位訂單成本下降了 30%。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/wechat\/images\/cf\/cf5c77f7ef7bfcecab07f336c9a487b2.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"京東雲備戰 11.11"}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"今年雙十一的特殊挑戰"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"堆資源、保穩定是大促常規保障活動,而今年的雙十一有些特殊。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"進入 10 月以來,全國多地發佈了“有序用電”的通知,各家電商的 IDC 
都面臨着被拉閘限電的風險。對於京東物流來說,分揀中心分佈在全國各地,每個分揀中心都有本地設備,如果某地停電,如何在短時間內恢復運營對其也是一個很大的考驗。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"爲預防萬一,京東雲做了很多保障工作,保證斷電後的 IDC 有備用電源。對於應用層,核心系統都做了雙機房部署,一個機房斷電後另外一個機房能扛住流量繼續運行。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"另外,今年京東雙十一將時間從零點提前到了晚上 20 點,這種脈衝式流量洪峯也給系統帶來嚴峻挑戰。基於混合雲操作系統雲艦及離在線混部技術,京東雲靈活跨平臺分配與調度資源,削峯填谷,實現資源錯峯與平衡,平穩應對晚 8 點形成的脈衝式流量洪峯。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"“並不存在所謂大招或者祕籍,只要把具體的小事做好,大促就能穩定。但如何讓整個過程更高效、時間更短纔是挑戰。”章華說道。爲此,除升級基礎設施外,流程化、標準化、工具化也是非常重要的。尤其是把大促備戰的事情日常化,會更加的重要。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"寫在最後"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"“上雲是一個趨勢。十年前問要不要上雲可能還值得討論,而今天不應該再討論這個問題了,雲是一定要上的。”章華說道。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"根據信通院統計數據,2020 年,我國雲計算整體市場規模達 2091 億元,增速 56.6%。其中,公有云市場規模達 1277 億元,相比 2019 年增長 
85.2%。隨着雲計算在企業數字化轉型過程中扮演越來越重要的角色,預計短期內企業將繼續加大基礎設施投入。這對要走向產業的京東雲來說無疑是很大的機會。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"就像京東員工說的,“京東內部這些年一直在喊技術、技術、技術,確實感覺到了不少變化。”京東比以前更加註重流程化、工具化、雲化,在這樣的形勢推動下,京東雲的打造只用了五年,但京東雲的故事遠遠沒有結束,未來要走的路也還很長。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"採訪嘉賓:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"章華"},{"type":"text","text":",京東物流技術發展部首席架構師,曾負責多個公司級重大項目,包括物流上雲、京東 618、11.11 備戰等。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"馬琪"},{"type":"text","text":", 京東物流技術發展部中臺技術部工程效能組的負責人,負責物流計算資源的管理與運維。2021 年,通過技術手段推動物流計算資源整體利用率的提高,爲物流帶來千萬級的成本節省。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"張成遠"},{"type":"text","text":",京東科技京東雲事業羣技術總監,數據庫研發部門負責人,帶領團隊實現京東雲數據庫產品線從 0 到 1、從 1 到 N 的建設,並承接集團上雲工作,負責組織協調各部門各團隊間上雲工作的推進落實。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"陳春輝"},{"type":"text","text":",京東科技技術交付部架構師,我們團隊負責集團物流和零售業務上雲工作,在上雲過程中提供技術支持和架構優化服務,在 618、11.11 
等重大活動前對集團上雲客戶配合梳理架構提高客戶業務穩定性,在大促期間提供重保服務。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"本文選自"},{"type":"link","attrs":{"href":"https:\/\/www.infoq.cn\/minibook\/bdNa3CRKlONxG13Y4Fq4","title":null,"type":null},"content":[{"type":"text","text":"《中國卓越技術團隊訪談錄》(2021 年第六季)"}]},{"type":"text","text":","},{"type":"link","attrs":{"href":"https:\/\/www.infoq.cn\/minibook\/bdNa3CRKlONxG13Y4Fq4","title":null,"type":null},"content":[{"type":"text","text":"點擊下載全部內容"}]},{"type":"text","text":",查看更多獨家專訪!本期精選了京東、微衆、網易數帆、優酷、恆生等技術團隊在技術落地、團隊建設方面的實踐經驗和心得體會。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"《中國卓越技術團隊訪談錄》是 InfoQ 打造的重磅內容產品,以各個國內優秀企業的 IT 技術團隊爲線索策劃系列採訪,希望向外界傳遞傑出技術團隊的做事方法 \/ 技術實踐,讓開發者瞭解他們的知識積累、技術演進、產品錘鍊與團隊文化等,並從中獲得有價值的見解。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}}]}