美團配送實時特徵平臺建設實踐

{"type":"doc","content":[{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2019年5月,美團正式推出新品牌「美團配送」,升級配送開放平臺。那你知道支撐美團配送大腦的實時特徵平臺是如何建設的嗎?如何實現每分鐘生產千萬級的實時特徵?如何在70w+QPS的場景下實現4個9響應耗時在50毫秒的需求?本文將爲大家介紹配送實時特徵平臺的發展歷程,關鍵技術和實踐經驗。"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"配送業務介紹"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"1. 商業模型"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/a2\/a2b0daa09046bc57d336e99ace09d014.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"配送的業務最主要的就是一個履約的行爲,將用戶、商家和騎手關聯起來,促進配送效率提升、用戶體驗提高、配送成本降低,從而形成一個閉環的商業模型。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"即時配送平臺核心職責就是調整好用戶、商家、騎手三元的關係,用更低的成本爲用戶帶來更好的體驗,爲商家帶來更多的單量,爲騎手帶來更多的收入。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"2. 履約模型"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/93\/9350ed33dc1424329d41f95d0169c4d7.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"配送就是一個履約的過程,不像電商主要是線上完成,配送主要是線下完成履約。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"物理世界中,一個運單的完成過程是從用戶下單開始到用戶收餐騎手離客爲止,需經歷一系列室內室外的場景——用戶下單,系統派單,騎手室外騎行到目的地然後下車步行到商家,等待取餐駐留室內,商家出餐,騎手取餐離店後步行上車,室外騎行到用戶目的地,下車步行上樓送餐,駐留室內等待用戶收餐,最後用戶收餐騎手離客。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"可以看到物理世界是比較複雜的一個過程,那就需要智能決策系統來做一定的調度,派給哪個騎手?何時到店取餐?同時要做一個合理的定價,針對惡劣天氣和爬樓梯等特殊情況收的費用肯定是不同的,收多少配送費?付給騎手多少配送費?最後還要給出一個合理的時間預估,配送時長是多少?商家多久可以出餐?"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"算法要做以上這些智能決策過程中,就需要做履約過程整個鏈路的數字化,通過實時特徵平臺來做實時感知的數字化,通過AIoT平臺完成如騎手騎行、爬樓等行爲精準感知的數字化;本次主要介紹實時感知數字化的實時特徵平臺。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"實時特徵平臺建設"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"1. 背景"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/32\/327de4d5153f6bdba6c40424d485570e.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2017年開始做實時特徵平臺建設的背景是爲了解決兩大問題:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"千萬訂單履約過程智能化:最開始履約過程是由簡單的規則構建的,現在要實現從規則配置化向智能化過渡,包含智能調度、ETA時間預估、配送費定價和爆單等場景;算法實時決策需要分鐘級數據的時效性。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"現有開發模式無法及時響應:當時實時特徵開發散落在4個業務團隊,進行煙囪式開發,流程長、效率低,存在一些重複建設;實時特徵開發耦合在業務系統中,穩定性風險較高。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"2. 目標&規劃"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/2e\/2e8c8efa5100e3b821c0514788e17732.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"基於建設背景,設定了建設目標——建設分鐘級時效的實時特徵平臺,多粒度刻畫履約過程,提升研發效率,降低研發成本。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"基於建設目標,設定了三階段的演進計劃:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"第一階段——系統化:和業務系統劃清邊界、確定實時平臺架構;將系統搭建起來,驗證是否可以支撐業務場景;將新增特徵進行收口管理。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"第二階段——規模化:建設高可用的系統,支撐更多的實時特徵,將舊特徵“絞殺”進行統一收口管理。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"第三階段——平臺化:將實時計算整合,進行完善的服務治理。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"3. 系統化"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"① 設計思路"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/e0\/e0e6dab0a9d3d03984b4f9e9754081a0.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"② 整體架構"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/10\/10e4e1d593003ee88a33682eec6ec721.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"進行系統化整體架構設計時,先制定了左側的架構標準:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"流程標準化:從數據輸入,加工計算到數據輸出做了一個流程的標準化。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"數據分層,將共性沉澱下來。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"特徵兜底,降低風險。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"基於架構標準,最終制定了右側的架構圖,共分爲6層:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"數據源層:主要有包裹表、運單表、騎手錶以及運單擴展表。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"數據層:ODS層將數據進行清洗和轉換,在DWD進行維表建模和合流,最後形成索引數據和明細數據的寬表。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"計算層:通過標準化的SQL對寬表數據進行計算。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"存儲層:存儲計算層輸出的特徵數據。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"服務層:通過實時特徵服務將存儲層數據統一輸出應用層應用。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"應用層:主要有ETA時間預估策略服務、調度策略服務、保單策略服務以及定價策略服務。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"管理系統:主要就是一些元數據的管理,比如數據源、特徵口徑以及存儲的管理;還包含兜底策略管理,降級模塊,可以對單特徵降級,也支持批量降級的核按鈕。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"③ 數據層關鍵點"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/6e\/6ed504fce6a4fc8ef0eadeb5c29effb6.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"做實時數據系統大都會碰到兩個挑戰:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"流亂序問題"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"端到端的Exactly-Once語義保障"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"針對以上挑戰,結合配送的實際業務場景給出對應的解決思路:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"提前構建“拼圖”模板,實時填充:結合物理世界的配送履約過程,構建業務寬表,涵蓋下單時間、支付時間、發單時間、調度時間、接單時間、取餐時間、送達時間等,根據實時數據流填充模板寬表,避免各數據流填各自的字段,避免亂序問題。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"上游合流保證不丟,下游解決重複問題。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"④ 計算層關鍵點"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/34\/34a992f994f22f2438c62913cf702d1f.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2017年調研了一些行業解決方案:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Storm開發運維成本較高、SQL化難度高。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Flink沒有現在這麼火,穩定性無法保障;Spark Streaming不是公司運維的關鍵點,對於線上場景的穩定性也是無法保障。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"耦合在業務系統內部的基於Rpc計算在公司有成熟的監控運維體系和技術框架,但有一個問題是,該場景主要是基於關係型數據庫進行計算的,例如MySQL,是存在單點問題的,擴展較難。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"因爲對基於Rpc的計算是相對有把握的,所以首先升級計算框架,將原有基於關係數據庫計算升級爲基於內存計算,計算是無狀態、可擴展的;爲了防止數據傾斜,基於業務特點進行提前按區域分片,採用“能者多勞”模式,計算比較快的節點就多計算一些。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"接下來介紹下計算層的核心邏輯:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"首先,數據層形成的寬表,主要存儲在外存中,例如運單包裹合流信息,那它如何和區域dim起來?就是通過索引表,將區域和運單關聯起來。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"其次,全國有多少區域是確定的,每隔一分鐘會通過定時任務將全國的區域放到MQ中。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"最後,就是實時特徵計算服務FCS,每個計算節點有多個worker,每個worker中有task會做基於內存數據庫H2的計算,當收到MQ的區域信息後,會拉取對應區域的寬表數據,在H2中計算實時特徵,計算完成後會繼續計算下一批的實時特徵。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"⑤ 階段成果"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/3a\/3a27d197b53c77806fe50359dd3a1212.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"建設系統化的階段成果主要有:"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"刻畫粒度:維度上主要有商家和區域;粒度上主要有訂單、運單、包裹。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"效率:特徵上線由原來的多天提升到分鐘級,收口新特徵有60多個。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"接入量:接入9個算法模型、15個算法版本。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"穩定性:通過特徵兜底策略避免了Kafka集羣故障,系統沒有S級事故。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"4. 規模化"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"① 數據服務挑戰&思路"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/d7\/d74e169b26aa5b4463edceff1898e605.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"第二階段規模化建設的背景就是推動實時特徵收口。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在外賣的圖中會顯示每單的配送時長,看上去這是一個指標,但實際上這個指標從外賣到配送經過的鏈路是很長的,最少有五六個節點,雖然只是一個ETA的預估時間,但是涉及到的特徵可能有60個左右;另外,商家列表頁會幾百個商家,還要做排序,這樣對實時特徵的壓力非常大,對實時特徵的查詢性能有更高的要求,通常200毫秒的延遲用戶就會感知到。所以實時特徵計算就會面臨兩個問題:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"穩定性要求高:交易鏈路"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"性能要求高:50ms響應時間"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"解決以上問題的主要思路有:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"定製度:“135”制度,1分鐘響應問題,3分鐘定位問題,5分鐘恢復計算"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"保穩定:做全鏈路的監控和降級"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"提性能:滿足50ms的響應時間"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"② 穩定性建設框架"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/a4\/a4889f984af513e08cd812d395e8f303.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"結合實際問題,制定了穩定性建設的框架:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"四層監控體系:硬件監控(CPU\/網絡\/磁盤\/內存)、基礎組件監控(DB\/MQ\/ES\/緩存)、服務監控(性能\/異常\/超時率\/QPS)以及全鏈路數據質量監控。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"容災體系:事前會做隔離、雙緩存的架構設計,1.5倍容量規劃以及定期壓測;事中會做熔斷、限流,三層降級(計算、服務、算法各層都有各自的降級兜底策略)的容錯降級機制;事後主要是做CaseStudy的總結以及報警工具的完善,同時還要對實時索引和離線數據進行修復。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"制度:有完善的技術方案review機制、代碼reiview機制、上線制度、巡檢制度、值班制度以及報警治理等制度,最終形成一套可監控、可灰度、可回滾的技術體系。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"③ 穩定性建設:拆分、隔離"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/19\/19d8d984451d35a6ca15c3f7d6f0708c.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在規模化穩定性建設的拆分和隔離上,主要做法如下:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"服務鏈路:遵循按照業務場景垂直拆分、一套代碼部署隔離的服務\/存儲拆分原則,將實時特徵服務拆分爲ETA、調度、定價、爆單四個服務,在物理環境上進行隔離,但是使用的是同一套代碼。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"計算鏈路:通過雙機房(rz、gh)熱備、三種場景集羣(監控、運營、履約)對Storm集羣進行拆分;使用美團自研的多機房Mafka集羣做了MQ容災,替換了單機房Kafka集羣;對於數據收集的canal做了隔離,離線、實時、zk隔離,並做了多機房的容災。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"④ 數據質量"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/ae\/ae5c0bc93dcf204e5d1a76391d46ea77.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在數據質量上做了全鏈路過程質量的監控:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"流計算:監控時效性;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"FCS實時特徵計算:監控性能、延遲、完備性、準確性;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"FFS實時特徵服務:監控響應時間、可用性、容量;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"特徵結果:監控準確性、完備性。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"例如:數據清洗中會關注消息處理耗時;FCS服務會關注event流入總量、sql流入總量、內存表記錄數、空值特徵比例、單批次計算耗時;寬表會關注寬表記錄數、未填充字段佔比、不合理記錄佔比;特徵質量會關注騎手平均負載最大值發生的時間、區域特徵個數95分位數、商家平均特徵個數、騎手負載中位數、衆數等;FFS服務會關注每次請求耗時、請求密度、請求成功率和QPS等。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"⑤ 查詢服務性能優化"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/09\/096896a6d20177f8bccf0ae0aa499e89.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"做查詢服務性能優化主要有三個思路:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"IO:使用批量、分組方式降低IO頻次;使用PB格式替換原JSON格式、去除無用字段等減負瘦身方式來降低IO大小;使用高速的本地緩存。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"CPU:減少Stop The World時間,使用G1替換了原來的CMS。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"內存:減少對象創建,控制對象大小。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"最終成果是TP4個9的耗時穩定在40毫秒以內。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"⑥ 建設成果"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/b7\/b7d2a4955c0c2b3e83cdd4f15509e5dd.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"規模化的建設成果主要有:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"接入量:接入100+算法版本,200+實時特徵全部收口,調度、ETA、定價、爆單策略21個核心服務全部接入。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"性能:60w+QPS下,4個9的響應時間在50毫秒以內;每分鐘生產1000w+特徵,計算耗時小於40秒;計算、服務、存儲都支持水平擴展。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"穩定性:應對了GH機房斷電故障,且沒有S級事故。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"5. 平臺化"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"① 背景&策略"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/86\/8693a239eda1b903ced8081e8d8ee37b.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"平臺化的建設背景是需要滿足更多粒度的特徵需求,第一類是降雨、降雪等天氣等級等的區域維度以及騎手軌跡等的騎手維度;第二類是通過算法實時加工的特徵,如預計出餐時長和預計進單量。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"針對以上背景,有兩大策略來解決:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"開放策略:事件驅動,開放履約事件,讓業務平臺可以根據開放平臺自己計算一些特徵。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"集成策略:建設數據收集通道,彙總第三方特徵,業務通過收集通道將自己計算的特徵上報彙總起來;將類似於騎手軌跡這種動態維度,屏蔽了計算引擎,引入了Flink進行動態維度計算。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"② 架構升級"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/f0\/f0899f9b6af7fbbb57c7030ee72dec36.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"上圖是在平臺化建設過程中升級後的架構,藍色部分是新增的:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"數據源層新增了採集SDK:業務平臺可以將自己算的特徵通過SDK上報到MQ裏面,然後通過計算引擎落到第三方特徵存儲,通過服務層提供給需求方。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"計算層引入了Flink和Storm,建設了一層引擎路由,屏蔽掉了底層的計算引擎,讓用戶使用無感知。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/ba\/baff0e28717eaadfb859edc5f8af2481.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"數據粒度:新增了GeoHash、AOI的維度;新增了天氣、軌跡、預測類特徵的粒度。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"接入量:實現了200+第三方特徵自助化上報,履約事件校友對接了ADS、ETA等服務。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"穩定性:無S級事故。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"③ 建設成果"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/45\/45db13826c0d4894252569db6ddfc92b.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"實時特徵平臺的建設成果有:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"業務效果:計算400+實時特徵,覆蓋了ETA、爆單、調度、動態定價等配送線上策略;實時特徵成爲配送履約的一環,對算法策略的效果提升顯著。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"效率提升:開發週期從多天降低到分鐘級別。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"性能和穩定性:每分鐘處理上億條數據;在70W+的QPS場景下,實時特徵服務4個9響應時間在50毫秒;沒有S級故障。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"未來規劃"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/49\/497461db3f4406ea5a0cfe368c8a9024.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"配送數據方向的未來規劃有:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"數據治理:除了實時特徵外,還有活動類、運營類的實時數據,因此未來考慮實時特徵以及其他場景的數據與實時數倉進行融合;雖然目前做了一些端到端的監控,但大都是單節點的監控,未來會做從數據源到最終提供服務的全鏈路的數據質量建設。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"降低研發成本:可以看到目前架構中使用的計算引擎開發運維成本有點高,有FCS微批處理、Storm、Flink,未來會考慮將計算引擎整合,降低開發運維成本。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"今天的分享就到這裏,謝謝大家。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"嘉賓介紹:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"李金康,美團高級技術專家。2013年加入美團,現任美團配送數據組數據應用組的負責人,長期負責配送數據架構的系統開發與架構升級,主導配送實時數據建設、配送BI系統建設、實時特徵平臺建設,爲全國海量騎手及各級管理團隊和算法團隊提供信息化支持。擁有多年互聯網研發及技術管理經驗,在大數據、高併發、高可用架構設計等領域積累了豐富的經驗。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"本文轉載自:DataFunTalk(ID:datafuntalk)"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"原文鏈接:"},{"type":"link","attrs":{"href":"https:\/\/mp.weixin.qq.com\/s\/gafduk7jhD9QnB1Lmmb1qw","title":"xxx","type":null},"content":[{"type":"text","text":"美團配送實時特徵平臺建設實踐"}]}]}]}
發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章