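Hand-Holding Tutorial: How Didi Built an Autonomous, Controllable Service System on Open-Source Engines

{"type":"doc","content":[{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"I. Background"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"At Didi I have led the service-system build-out for open-source big data engines including LogAgent, Kafka, Flink, Elasticsearch, and Clickhouse, taking many detours and hitting many pitfalls along the way, and accumulating hands-on experience. In recent years the pandemic has accelerated enterprise digital transformation. In in-depth exchanges with dozens of internet, finance, securities, and education companies, I found a strong demand for building autonomous, controllable, and secure service systems on open-source data engines. The common question is how to build such a service system with high ROI, based on open-source engines and fitted to a company's characteristics and stage of development."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"II. Practice"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Didi's big data infrastructure on open-source engines began with BI needs: data-driven business operations and commercial decision-making. Once real-time traffic reaches hundreds of MB\/s and storage reaches the PB level, operating open-source data engines as a service runs into all kinds of stability, usability, and operability challenges. We went through four stages: the engine trial period, the engine growth period, the engine breakthrough period, and the engine governance period. The pain points and service challenges differ at each stage:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Engine trial period: as the business grows quickly, the choice of engine, version, and machine type and the design of the deployment architecture are, in retrospect, the key levers for early stability work."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Engine growth period: as the engine's user base grows, answering questions, rolling out best practices, and diagnosing production issues in day-to-day operations consume 60%+ of the team's energy. A big data PaaS layer is urgently needed to lower the barrier for users to learn, apply, and operate the engine, and to improve their self-service capability."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Engine breakthrough period: as clusters grow and business scenarios diversify, the capability limits of the open-source engine will inevitably be hit. An internal iteration mechanism based on the open-source engine is urgently needed: one that works closely with the open-source community, upgrades versions smoothly, and benefits from the community's technical progress, while also building on the open-source engine with BUG 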
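FIX work and enterprise feature enhancements."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Engine governance period: with the PaaS platform in place and engine versions iterating quickly, three classes of problems emerge: mixed use that does not separate scenarios by SLA, misuse that exceeds the engine's capability limits, and overuse with no cost awareness. These lead to a poor reputation for the engine service and low business value per unit of resource (machines plus people), so a metadata-driven engine governance system is urgently needed."}]}]}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"1. Engine Trial Period"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Many open-source engines solve any given technical problem; for message queues, for example, there are Kafka, Pulsar, RocketMQ, and others. Technology selection is critical to the service's SLA and operational support, and it strongly affects the team's sense of well-being and of value."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"1) Engine selection"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Weigh several factors together: the community's Star and Contributor counts; whether there are PMC members or Committers in China; how widely the engine is used and whether many large companies vouch for it in production; how frequent offline Meetups are; how quickly questions get answered online; how rich the online best-practice material is; and whether the deployment architecture is lean."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"2) Machine type selection"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"By application scenario, workloads can be divided into IOPS scenarios and TPS scenarios; open-source engines can be divided into CPU-intensive, IO-intensive, and mixed. Take the distributed search engine Elasticsearch: enterprise search involves frequent random IO and demands high disk IOPS, so SSDs are a must, while the CPU requirement has to be assessed from query complexity and QPS. The recommended approach is to run a BenchMark that simulates the business scenario, establish how the engine performs in that scenario, and use the results as the basis for machine selection. The figure below shows the load-test validation Didi ran when selecting machine types for Elasticsearch:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/6f\/6f2ea42f8e6130fef3d5bc971bfc2e7b.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Choose from the machine types currently available, based on the engine's principles and industry best practices, combined with the application scenario and load-test results. Ideally CPU, disk capacity, network IO, and disk IO are used in balance, with CPU being the resource that becomes the bottleneck where possible. As the engine keeps improving and software and hardware infrastructure evolve, replacing out-of-warranty machines is routine, so choosing the best machine type is a continuously evolving process. In 2020 Didi's operations team carried out a new round of machine-type optimization and scenario tuning, which halved the cost of the Elasticsearch log clusters and brought average peak CPU utilization to 50%."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"3) Deployment selection"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Take Elasticsearch: the smallest highly available deployment is a 3-node cluster in which every node plays all three roles, Master Node, Client Node, and Data Node. At 3 to 5 nodes this has little impact. Once the cluster reaches dozens of nodes and write throughput reaches hundreds of MB\/s, contention for each node's network thread pools and memory becomes pronounced. Because metadata processing and index data processing do not have isolated resources, cluster metadata can suffer synchronization performance or consistency problems, which can in turn make the cluster unavailable. Beyond a certain scale, therefore, deploying by role should be considered."}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/a7\/a7f94c64c679bd2e8955bfb0c78d751b.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"Didi's role-based Elasticsearch deployment in practice"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"When users request an engine service, a sensible deployment architecture has to be chosen according to how important the business is. The two common models are a single-tenant dedicated cluster versus a large multi-tenant cluster. During the trial period, mastery of the engine is limited; online services demand high stability and are sensitive to latency and jitter, so a dedicated cluster is recommended. Offline scenarios care about overall resource utilization and are not sensitive to RT jitter, so a large multi-tenant cluster is recommended."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"2. Engine Growth Period"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As the number of business teams using the engine service keeps growing, the number of clusters (10+) and the cluster size (100+) grow quickly. On one hand, user inquiries and support requests surge, partly because the engine has a steep learning curve, but more because high-frequency user operations such as resource requests and Schema changes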
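have not been made available through a platform. On the other hand, the limits of operational support are reached: the metrics system is incomplete, diagnosis is inefficient, there is no service-level support for degradation, rate limiting, security, or cross-AZ high availability, and operations staff are run off their feet. Two things need to be done at this stage. First, raise the productivity of engine users and operations staff by implementing high-frequency changes and operations on a platform. Second, make up for the open-source engine's shortcomings in operability by building its metrics system and high-availability system and upgrading the service architecture."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"1) Building the engine PaaS platform"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Hand-holding, manual support for users' high-frequency operations such as resource creation and Schema changes is manageable early on, when there are few users. As the user base grows, several problems are fully exposed: the engine is hard to get started with, with at most a Quick Start guide for beginners, and the official user manual focuses on describing features without scenario-based best-practice guidance tied to business use cases. The ALL IN ME user service model becomes the bottleneck for the engine to empower the business."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Open-source engines generally have their own Metric system, which is sprawling, obscure, and not organized around a deep understanding of the engine's business processes. Engines also expose their own atomic APIs, and high-frequency operations tend to be implemented as scripts on top of them; the process is opaque, and without an enforced Double Check mechanism it carries security risks and the risk of data being wiped out."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In short, a PaaS cloud management system is urgently needed. For ordinary users it lowers the cost of onboarding and use, provides FAQs for common problems, and captures best practices for business scenarios. For operations staff it delivers a fully managed engine service: the engine's metrics system must be built systematically to make troubleshooting more efficient, and cluster installation, deployment, upgrade, and scaling must be automated on the platform to make cluster changes more efficient and safer."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Didi's Kafka PaaS cloud platform, Logi-KafkaManager, illustrates the platform design philosophy. For the detailed design, see: "},{"type":"link","attrs":{"href":"https:\/\/mp.weixin.qq.com\/s?__biz=MzU1ODEzNjI2NA==&mid=2247527050&idx=1&sn=bafb07f4145d3125d427e3481a158436&scene=21#wechat_redirect","title":null,"type":null},"content":[{"type":"text","text":"Didi Open-Sources Logi-KafkaManager: 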
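A One-Stop Kafka Monitoring and Management Platform"}]},{"type":"text","text":", which is open source (https:\/\/github.com\/didi\/Logi-KafkaManager)."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"2) Upgrading the engine service architecture"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Open-source projects generally expose their core capabilities and define the protocol between client and server. Ecosystem integrations and the maintenance of SDKs in different languages are mostly left to developers to contribute, and many enterprise features, such as tenant definition, tenant authentication, and tenant rate limiting, are left to users to implement as extensions."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In the early days of adopting an open-source service, it is usually introduced by developers to meet a business need, with only a basic understanding of how the engine works. To fill the engine's gaps in security, rate limiting, disaster recovery, and monitoring, a business or middleware architect typically wraps the open-source SDK in an aspect-style layer, written in a language they know well, keeps it as non-intrusive to the business as possible, maintains the SDK version in-house, and handles rollout and upgrades centrally."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As the business expands and adopts middle-platform and microservice architectures, and as people and organizations grow and entropy rises, enhancing the SDK and shipping BugFix changes becomes extremely hard. Getting business teams to upgrade the SDK carries very high communication and rollout costs: it requires load testing, minimizing intrusion into business code, demonstrating the business benefit, fitting in with business release schedules, and more. SDK versions take years to converge, and the SDK-based engine service model can no longer keep up with rapid business growth."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For these reasons, the industry has widely adopted the classic Proxy architecture, which bridges server and client. Many engines inside Didi have a corresponding implementation, such as Kafka-GateWay, ES-GateWay, DB-Proxy, and Redis-Proxy. The following uses Didi's Elasticsearch service to show how the Proxy architecture was built."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The Elastic 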
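Proxy layer hosts enterprise extensions such as security, rate limiting, and high availability, and removes the cluster-size scaling bottleneck of big data engines. Users can choose among service models with different SLA levels (multi-tenant shared clusters, exclusive clusters, and independent clusters) without having to care about the underlying physical clusters and resources, which yields a fully managed service."}]}]}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/7c\/7cc01633c48ec8d3bc926d01284c16a7.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Upgrades across major versions involve many incompatibilities in storage format, communication protocol, and data model. The Proxy architecture makes a smooth cross-major-version upgrade possible. Below is the architecture Didi used in 2019 to upgrade from ES 2.3 to 6.6.1. For details, see: "},{"type":"link","attrs":{"href":"https:\/\/mp.weixin.qq.com\/s?__biz=MzU1ODEzNjI2NA==&mid=2247487159&idx=1&sn=49c915854e9e32cfb02f20e6874fcc6e&scene=21#wechat_redirect","title":null,"type":null},"content":[{"type":"text","text":"Didi's ElasticSearch Platform: Cross-Version Upgrade and Platform Refactoring"}]}]}]}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/ad\/ad8b18464c4226330dfacf1b5b153730.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The Proxy-based platform architecture truly decouples the underlying engine service from the business architecture above it, laying a solid foundation for later technical innovation in the engine and fast iteration of the service architecture."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"3. Engine Breakthrough Period"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As the business grows quickly, real-time peak traffic reaches GB\/s and offline data grows by TBs per day; both data throughput and cluster size reach the capability limits of the open-source engine. On one hand, normally rare failures such as disk faults, machine crashes, and Linux kernel problems become frequent once a cluster has hundreds to nearly a thousand instances. On the other hand, engine BUGs in extreme scenarios are very likely to be triggered. Both call for deep mastery of the engine: the ability to do day-to-day Bug 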
Fix work and iterate on internal versions, ultimately making the engine autonomous and controllable."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"1) Deep mastery of engine internals"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As more business teams use the engine service, its commercial value becomes clear and the company gradually builds a dedicated engine R&D team. How should that team go about studying and improving the open-source engine in depth, and how should it then Follow the community's pace, benefiting from the community's iterations while landing its own enterprise enhancements? Drawing on experience at Didi and using Kafka as the example, here are some of my own views."}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To get familiar with the open-source code quickly: set up a local debugging environment; learn the packaging, compilation, and deployment steps; get the test cases passing; learn to debug features through the test cases; and read the official user and developer documentation closely: https:\/\/kafka.apache.org"}]}]}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/14\/1430f34ae16795bd734772200dbe94aa.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Get familiar with the engine's startup and routine runtime logs, and with the main flows and operating principles of its functional modules."}]}]}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/18\/18bb392133bbfc76f83a53aac521b3d9.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Hold regular sessions on engine internals and source code, attend community Meetups, and follow the community Issue list."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"
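text","text":"Production postmortems: reading the source code closely helps build a big-picture understanding of the engine, but production problems are the best classroom. Chasing each production failure down to its Root Cause, holding a timely postmortem, and exploring every detail of the engine is an efficient way to turn isolated facts into understanding. Engine engineers should treat a problem like a nugget of gold: eyes lighting up, digging to the bottom of it. That is how mastery of the engine grows."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/25\/25d88c137492fd16235367ddd5c65353.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Practice chaos engineering: enumerate the engine's failure scenarios, study the abnormal runtime logs and monitoring metrics for each, understand how the engine behaves in that scenario, establish the engine's capability limits, and prepare stability contingency plans."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/00\/003534d6a1775ddd1e1af836c65510bf.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"2) Iterating on an internal engine branch"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Production engine Bugs: there are generally two ways to fix them. In the first, the community has already resolved a corresponding Issue or Feature, and we merge the Patch into our locally maintained branch. In the second, we file an Issue with the community, propose a solution, go through testing and Review under the community's Patch submission process, and the fix is eventually merged into the current version. Each year Didi contributes patches to open-source projects in the Apache community, including Hadoop, Spark, Hive, Flink, HBase, 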
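Kafka, Elasticsearch, Submarine, and Kylin, for a total of 150+ Patches."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Enterprise feature development: when the engine serves internal users, it has to integrate with the company's LDAP, security system, and management platform. In addition, some enterprise enhancements are implemented in a company-specific way or target extremely niche scenarios, so the community will not accept the corresponding Patch, and we have to maintain our own feature list and extensions. When maintaining enterprise features, keep changes as cohesive as possible, and wherever an engine extension interface can be used, implement the feature as a plugin; this eases later version upgrades and maintenance of the local branch. Didi's Elasticsearch is shown as an example."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/d1\/d102e25fd95b36b9a36159912fa3263f.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As community versions advance and the internal branch evolves, the community version has to be followed regularly to benefit from its progress. The upgrade plan must be drawn up very rigorously, since it involves many steps and many items to evaluate. Didi's Elasticsearch 7.6 version is used as the illustration."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/53\/53246acb42315a4f02d6e0d4ee2dfdae.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This typically involves: reviewing the version's features, trying out new features, and assessing the upgrade's benefits and the version's compatibility; merging internal features back onto the open-source version and carrying out functional, performance, and compatibility testing; and drawing up detailed upgrade and rollback plans and a cluster upgrade schedule, and assessing the changes required of users and the scope of impact."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"4. Engine Governance Period"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The core goal of engine governance is to make business value visible upward and to extract technical dividends downward. The key is to use data to see problems clearly, find the entry point with the highest ROI, pair it with the levers needed to drive business-side changes, and make the long-term value of the engine service clear, so that investment in the technology and its value creation are sustained."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,
"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"1) Making business value visible"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As more business teams use the engine service, their application scenarios keep evolving. On one hand, resource utilization shows us the ROI of resources. On the other, we regularly Review the tiered support system according to users' sensitivity to RT, how much they care about service stability, and the importance of their application scenario."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For core business, sparing no effort to hold the stability baseline, through the operations effort invested, priority access to core hardware resources, a dedicated-cluster service model, and a high-availability architecture, is what creates the most business value."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For non-core business, we use a business value scoring model, together with organizational levers, to run red-list and black-list rankings and encourage a positive flywheel of “the more it is used the better it gets, and the better it gets the more it is used”. Customers with high credit scores get priority in service response, resource requests, and business support, which ultimately maximizes overall resource ROI."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"2) Extracting technical dividends"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Service operations usually follow the 2-8 principle: 20% of the business is especially critical, and with a well-run tiered support mechanism the pressure on the service operator is manageable. With the value of the core business secured, the remaining 80% of business teams need unified governance and optimization at the platform level across resources, performance, cost, and stability, to reduce service cost structurally, raise service quality, and build a positive, all-for-one and one-for-all service reputation. With a complete metrics system we can identify the software bottlenecks of the open-source system and optimize and innovate in stages according to ROI. For concrete cases, see some of our technical innovations on ES:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https:\/\/mp.weixin.qq.com\/s?__biz=MzU1NDA4NjU2MA==&mid=2247500894&idx=2&sn=75fc7035e8cf387ff74d06da787188e0&scene=21#wechat_redirect","title":null,"type":null},
"content":[{"type":"text","text":"FastIndex: Didi's Architecture for Fast Offline Index Building"}]}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https:\/\/mp.weixin.qq.com\/s?__biz=MzU1ODEzNjI2NA==&mid=2247496570&idx=1&sn=a11ff747ce40d3aa1ccc5667b5767f21&scene=21#wechat_redirect","title":null,"type":null},"content":[{"type":"text","text":"Technical Analysis: Doubling Write Performance for Didi ElasticSearch at Tens of Millions of TPS"}]}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Once the business reaches a certain scale, the commercial value of technical innovation becomes visible: the optimizations above save the company nearly ten million in costs every year, fully demonstrating the value of the technology."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"About the speaker:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Zhang Liang, "},{"type":"text","text":"head of Commercial Data at Didi Cloud."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"He joined Didi in 2014 and has led engine development for LogAgent, Kafka, ElasticSearch, and OLAP. He has extensive experience in architecture design and development for high-concurrency, high-throughput scenarios, and has led the platform design and development of data systems including task scheduling, monitoring, log service, real-time computing, and a synchronization center."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Reprinted from: dbaplus community (ID: dbaplus)"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Original link: "},{"type":"link","attrs":{"href":"https:\/\/mp.weixin.qq.com\/s\/-h1CXJJkQ8bqHUs3pHxSsA","title":"xxx","type":null},"content":[{"type":"text","text":"Hand-Holding Tutorial: How Didi Built an Autonomous, Controllable Service System on Open-Source Engines"}]}]}]}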
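The article assigns tenant authentication and tenant rate limiting to the Proxy layer that sits between clients and the engine. As a rough illustration only, that idea might be sketched in Python with a per-tenant token bucket. This is not Didi's ES-GateWay or Kafka-GateWay code, which the article does not show; the class names `TokenBucket` and `TenantGateway`, the tenant ids, and the quota values are all hypothetical.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Token bucket: refills at `rate` tokens per second, up to `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity      # start full
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class TenantGateway:
    """Hypothetical proxy front door: identify the tenant, then rate-limit it."""

    def __init__(self, quotas: Dict[str, float],
                 clock: Callable[[], float] = time.monotonic) -> None:
        # quotas maps tenant id -> allowed requests per second (illustrative).
        self.buckets = {
            tenant: TokenBucket(rate=qps, capacity=qps, clock=clock)
            for tenant, qps in quotas.items()
        }

    def handle(self, tenant: str) -> str:
        bucket = self.buckets.get(tenant)
        if bucket is None:
            return "rejected: unknown tenant"
        if not bucket.allow():
            return "rejected: rate limited"
        # A real proxy would forward the request to the backing cluster here.
        return "forwarded"
```

Because the limit lives in the proxy rather than in a client SDK, a quota can be changed in one place without asking business teams to upgrade anything, which is the decoupling argument the article makes for the Proxy architecture.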