餓了麼 EMonitor 演進史

{"type":"doc","content":[{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"序言"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"時間回到 2008年,還在上海交通大學上學的張旭豪、康嘉等人在上海創辦了餓了麼,從校園外賣場景出發,餓了麼一步一步發展壯大,成爲外賣行業的領頭羊。2017年8月餓了麼併購百度外賣,強強合併,繼續開疆擴土。2018年餓了麼加入阿里巴巴大家庭,與口碑融合成立阿里巴巴本地生活公司。“愛什麼,來什麼”,是餓了麼對用戶不變的承諾。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":"br"}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"餓了麼的技術也伴隨着業務的飛速增長而突飛猛進。據公開報道,2014年5月的日訂單量只有 10 萬,但短短幾個月之後就衝到了日訂單百萬,如今日訂單量已達上千萬單。在短短幾年的技術發展歷程中,餓了麼的技術體系、穩定性建設、技術文化建設等都有長足的發展。各位可查看往期文章一探其中發展歷程,在此不再贅述:"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"http:\/\/mp.weixin.qq.com\/s?__biz=MzU4NzU0MDIzOQ==&mid=2247490741&idx=3&sn=138f49c5cd0d856c686ca0fa013dc034&chksm=fdeb2ed5ca9ca7c308a526110a7549d71bdfb4d617ba69c15290fb02a6ee8b247998b8abcbd8&scene=21#wechat_redirect","title":null,"type":null},"content":[{"type":"text","text":"《餓了麼技術往事(上)》"}]}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"http:\/\/mp.weixin.qq.com\/s?__biz=MzU4NzU0MDIzOQ==&mid=2247490841&idx=1&sn=d5ef12b413afbae2b717a4d6a5bef2c4&chksm=fdeb2f79ca9ca66f8c6f5f2e0f0177dd4782f92427fc00777fc25a4e81c2661a9eaaf69b7c00&scene=21#wechat_redirect","title":null,"type":null},"content":[{"type":"text","text":"《餓了麼技術往事(中)》"}]}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"http:\/\/mp.weixin.qq.com\/s?__biz=MzU4NzU0MDIzOQ==&mid=2247491179&idx=2&sn=336b690b1004
ed1ce65e665de66c857c&chksm=fdeb2c0bca9ca51d9f6134ebeee9cd07b222a04124ad0ebdd61e348b14a959348c26146fc782&scene=21#wechat_redirect","title":null,"type":null},"content":[{"type":"text","text":"《餓了麼技術往事(下)》"}]}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"而可觀測性作爲技術體系的核心環節之一,也跟隨餓了麼技術的飛速發展,不斷自我革新,從“全鏈路可觀測性 ETrace”擴展到“多活下的可觀測性體系 ETrace”,發展成目前“一站式可觀測性平臺 EMonitor”。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/00\/00f3ccd03e8c085f816ac58507393d63.gif","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"EMonitor 經過 5 年的多次迭代,現在已經建成了集指標數據、鏈路追蹤、可視化面板、報警與分析等多個可觀測性領域的平臺化產品。EMonitor 每日處理約 1200T 的原始可觀測性數據,覆蓋餓了麼絕大多數中間件,可觀測超 5 萬臺機器實例,可觀測性數據時延在 10 秒左右。面向餓了麼上千研發人員,EMonitor 提供精準的報警服務和多樣化的觸達手段,同時運行約 2 萬的報警規則。本文就細數餓了麼可觀測性的建設歷程,回顧下“餓了麼可觀測性建設的那些年”。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/d5\/d51970c46a1a0fb97732523b339ba6bd.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"1.0:混沌初開,萬物興起"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"翻看代碼提交記錄,ETrace 項目的第一次提交在 2015年10月24日。而 2015年,正是餓了麼發展的第七個年頭,也是餓了麼業務、技術、人員開始蓬勃發展的年頭。彼時,餓了麼的可觀測性系統依賴 Zabbix、Statsd、Grafana 等傳統的“輕量級”系統。而“全鏈路可觀測性”正是當時的微服務化技術改造、後端服務 Java 
化等技術發展趨勢下的必行之勢。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們可觀測性團隊,在調研業界主流的全鏈路可觀測性產品——包括著名的開源全鏈路可觀測性產品“CAT”(https:\/\/github.com\/dianping\/cat)後,吸取衆家之所長,在兩個多月的爆肝開發後,推出了初代 ETrace。我們提供的 Java 版本 ETrace-Agent 隨着新版的餓了麼 SOA 框架“Pylon”在餓了麼研發團隊中的推廣和普及開來。ETrace-Agent 能自動收集應用的 SOA 調用信息、API 調用信息、慢請求、慢 SQL、異常信息、機器信息、依賴信息等。下圖爲 1.0 版本的 ETrace 頁面截圖。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/35\/357c50545eb09bcb02d7a29899fc3819.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在經歷了半年的爆肝開發和各中間件兄弟團隊的鼎力支持,我們又開發了 Python 版本的 Agent,更能適應餓了麼當時各語言百花齊放的技術體系。並且,通過和餓了麼 DAL 組件、緩存組件、消息組件的密切配合與埋點,用戶的應用增加了多層次的訪問信息,鏈路更加完整,故障排查過程更加清晰。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"整體架構體系"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"ETrace 整體架構如下圖。通過 SDK 集成在用戶應用中的 Agent 定期將 Trace 數據經 Thrift 協議發送到 Collector(Agent 本地不落日誌),Collector 經初步過濾後將數據打包壓縮發往 Kafka。Kafka 下游的 Consumer 消費這些 Trace數據,一方面將數據寫入 HBase+HDFS,一方面根據與各中間件約定好的埋點規則,將鏈路數據計算成指標存儲到時間序列數據庫-- LinDB 中。在用戶端,Console 服務提供 UI 及查詢指標與鏈路數據的 
API,供用戶使用。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/af\/afd4b6227678e123898ffb74d3c331c7.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"全鏈路可觀測性的實現"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"所謂全鏈路可觀測性,即每次業務請求都有一個唯一的 ID,能夠標記這次業務的完整調用鏈路,我們稱這個 ID 爲 RequestId。而每次鏈路上的調用關係類似於樹形結構,我們將每個樹節點用唯一的 RpcId 標記。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/96\/96101bf18d11b5d6538ba0b2f5c21326.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如圖,在入口應用 App1 上會新建一個隨機 RequestId(一個類似 UUID 的 32 位字符串,再加上生成時的時間戳)。因它爲根節點,故 RpcId 爲“1”。在後續的 RPC 調用中,RequestId 通過 SOA 框架的 Context 傳遞到下一節點中,且下一節點的 RpcId 層級加 1,變爲形如“1.1”、“1.2”。如此反覆,同一個 RequestId 的調用鏈就通過 RpcId 還原成一個調用樹。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"也可以看到,“全鏈路可觀測性的實現”不僅依賴於 ETrace 系統自身的實現,更依託於公司整體中間件層面的支持。如在請求入口的 Gateway 層,能對每個請求“自動”生成新的 RequestId(或根據請求中特定的 Header 信息,複用 RequestId 與 RpcId);RPC 框架、HTTP 框架、DAL 層、Queue 層等都要支持在 Context 中傳遞 RequestId 與 
RpcId。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"\n\/*\n記錄一個調用鏈路\n*\/\nTransaction transaction = Trace.newTransaction(String type, String name);\n\/\/ business codes\ntransaction.complete();\n\/*\n記錄調用中的一個事件\n*\/\nTrace.logEvent(String type, String name, Map tags, String status, String data);\n\/*\n記錄調用中的一個異常\n*\/\nTrace.logError(String msg, Exception e);"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Consumer 的設計細節"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Consumer 組件的核心任務就是將鏈路數據寫入存儲。"},{"type":"text","text":"主要思路是以 RequestId+RpcId 作爲主鍵,對應的 Data 數據作爲 Payload 寫入存儲。再考慮到可觀測性場景是寫多讀少,並且多爲文本類型的 Data 數據可批量壓縮打包存儲,"},{"type":"text","marks":[{"type":"strong"}],"text":"因此我們設計了基於 HDFS+HBase 的兩層索引機制。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/c3\/c36cdc9c14a8ec4e3ec299cf39ff8db9.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如圖,Consumer 將 Collector 已壓縮好的 Trace 數據先寫入 HDFS,並記錄寫入的文件 Path 與寫入的 Offset,第二步將這些“索引信息”再寫入 HBase。特別地,構建 HBase 的 Rowkey 時,基於 RequestId 的 Hashcode 和 HBase Table 的 Region 數量配置,來生成兩個 Byte 長度的 ShardId 字段作爲 Rowkey 前綴,避免了某些固定 RequestId 格式可能造成的寫入熱點問題。(因 RequestId 在各調用源頭生成,如應用自身、Nginx、餓了麼網關層等。可能某應用錯誤設置成以其 AppId 作爲 RequestId 的前綴,若沒有 ShardId 來打散,則它所有 RequestId 都將落到同一個 HBase Region Server 
上。)"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在查詢時,以 RequestId + RpcId 作爲查詢條件,依次去 HBase、HDFS 查詢原始數據,便能找到某次具體的調用鏈路數據。但有的需求場景是,只知道源頭的 RequestId,需要查看整條鏈路的信息,或希望只排查鏈路中狀態異常的或某些指定 RPC 調用的數據。因此,我們在 HBase 的 Column Value 上還額外寫了 RPCInfo 的信息,來記錄單次調用的簡要信息,如調用狀態、耗時、上下游應用名等。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"此外,餓了麼的場景下,研發團隊多以訂單號、運單號作爲排障的輸入,因此我們和業務相關團隊約定了特殊的埋點規則:在 Transaction 上記錄一個特殊的\"orderId={實際訂單號}\"的 Tag,便會在 HBase 中新寫一條“訂單表”的記錄。該表的設計也不複雜,Rowkey 由 ShardId 與訂單號組成,Column Value 部分由對應的 RequestId、RpcId 及訂單基本信息(類似上文的 RPCInfo)三部分組成。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如此,從業務鏈路到全鏈路信息到詳細單個鏈路,形成了一個完整的全鏈路排查體系。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/89\/892b0d052ef7c1d91fdbd5138ca2c1b5.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Consumer 組件的另一個任務則是將鏈路數據計算成指標。"},{"type":"text","text":"實現方式是在寫入鏈路數據的同時,在內存中將 Transaction、Event 等數據按照既定的計算邏輯,計算成 SOA、DAL、Queue 等中間件的指標,在內存中稍加聚合後再寫入時序數據庫 LinDB。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"指標存儲:LinDB 
1.0"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"應用指標的存儲是一個典型的時間序列數據庫的使用場景。根據我們以前的經驗,市面上主流的時間序列數據庫——OpenTSDB、InfluxDB、Graphite——在擴展能力、集羣化、讀寫效率等方面各有缺憾,所以我們選型使用 RocksDB 作爲底層存儲引擎,借鑑Kafka的集羣模式,開發了餓了麼的時間序列數據庫--LinDB。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"指標採用類似 Prometheus 的“指標名+鍵值對的 Tags”的數據模型,每個指標只有一個支持 Long 或 Double 的 Field。某個典型的指標如:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"COUNTER: eleme_makeorder{city=\"shanghai\",channel=\"app\",status=\"success\"} 45"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們主要做了一些設計實現:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"指標寫入時根據“指標名+Tags”進行 Hash 寫入到 LinDB 的 Leader上,由 Leader 負責同步給他的 Follower。 "}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"借鑑 OpenTSDB 的存儲設計,將“指標名”、TagKey、TagValue 都轉化爲 Integer,放入映射表中以節省存儲資源。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"RocksDB 的存儲設計爲:以\"指標名+TagKeyId + TagValueId+時間(小時粒度)“作爲 Key,以該小時時間線內的指標數值作爲 
Value。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"爲實現 Counter、Timer 類型數據聚合邏輯,開發了 C++ 版本 RocksDB 插件。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"這套存儲方案在初期很好的支持了 ETrace 的指標存儲需求,爲 ETrace 大規模接入與可觀測性數據的時效性提供了堅固的保障。有了 ETrace,餓了麼的技術人終於能從全鏈路的角度去排查問題、治理服務,爲之後的技術升級、架構演進,提供了可觀測性層面的支持。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"其中架構的幾點說明"}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"1. 是否保證所有可觀測性數據的可靠性?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"不,我們承諾的是“儘可能不丟”,不保證 100% 的可靠性。基於這個前提,爲我們設計架構時提供了諸多便利。如,Agent 與 Collector 若連接失敗,若干次重試後便丟棄數據,直到 Collector 恢復可用;Kafka 上下游的生產和消費也不必 ACK,避免影響處理效率。"}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"2. 
爲什麼 SDK 中的 Agent 將數據發給 Collector,而不是直接發送到 Kafka?"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"避免 Agent 與 Kafka 版本強綁定,並避免引入 Kafka Client 的依賴。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在 Collector 層可以做數據的分流、過濾等操作,增加了數據處理的靈活性。並且 Collector 會將數據壓縮後再發送到 Kafka,有效減少 Kafka 的帶寬壓力。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Collector 機器會有大量 TCP 連接,可針對性使用高性能機器。"}]}]}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"3. SDK 中的 Agent 如何控制對業務應用的影響?"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"純異步的 API,內部採用隊列處理,隊列滿了就丟棄。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Agent 不會寫本地日誌,避免佔用磁盤 IO、磁盤存儲而影響業務應用。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Agent 會定時從 Collector 拉取配置信息,以獲取後端 Collector 具體 IP,並可通過實時配置開關是否執行埋點。"}]}]}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"4. 
爲什麼選擇侵入性的 Agent?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"選擇寄生在業務應用中的 SDK 模式,在當時看來更利於 ETrace 的普及與升級。而從現在的眼光看來,非侵入式的 Agent 對用戶的集成更加便利,並且可以通過 Kubernetes 中 Sidecar 的方式對用戶透明部署與升級。"}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"5. 如何實現“儘量不丟數據”?"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Agent 根據獲得的 Collector IP 週期性發送數據,若失敗則重試 3 次。並定期(每 5 分鐘)獲取 Collector 集羣的 IP 列表,隨機選取可用的 IP 發送數據。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Collector 中實現了基於本地磁盤的 Queue,在後端的 Kafka 不可用時,會將可觀測性數據寫入到本地磁盤中。待 Kafka 恢復後,再將磁盤上的數據繼續寫入 Kafka。"}]}]}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"6. 
可觀測性數據如何實現多語言支持?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Agent 與 Collector 之間選擇 Thrift RPC 框架,並定製整個序列化方式。Java\/Python\/Go\/PHP 的 Agent 依數據規範開發即可。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"2.0:異地多活,大勢初成"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2016 年底,餓了麼爲了迎接業務快速增長帶來的調整,開始推進“異地多活”項目。新的多數據中心架構對既有的可觀測性架構也帶來了調整,ETrace 亦經過了一年的開發演進,升級到多數據中心的新架構、拆分出實時計算模塊、增加報警功能等,進入 ETrace2.0 時代。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/7d\/7d9677f9b15034c292113aa5b2d3c104.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"異地多活的挑戰"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"隨着餓了麼的異地多活的技術改造方案(https:\/\/zhuanlan.zhihu.com\/p\/32009822)確定,對可觀測性平臺提出了新的挑戰:"},{"type":"text","marks":[{"type":"strong"}],"text":"如何設計多活架構下的可觀測性系統?以及如何聚合多數據中心的可觀測性數據?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"經過一年多的推廣與接入,ETrace 已覆蓋了餓了麼絕大多數各語言的應用,每日處理數據量已達到了數十 T 
以上。在此數據規模下,決不可能將數據拉回到某個中心機房處理。"},{"type":"text","marks":[{"type":"strong"}],"text":"因此“異地多活”架構下的可觀測性設計的原則是:各機房處理各自的可觀測性數據。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/dd\/ddfd14b528688338d97189e0bc912cb4.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們開發一個 Gateway 模塊來代理與聚合各數據中心的返回結果,它會感知各機房間內 Console 服務。圖中它處於某個中央的雲上區域,實際上它可以部署在各機房中,通過域名的映射機制來做切換。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如此部署的架構下,各機房中的應用由與機房相綁定的環境變量控制將可觀測性數據發送到該機房內的 ETrace 集羣,收集、計算、存儲等過程都在同一機房內完成。用戶通過前端 Portal 來訪問各機房內的數據,使用體驗和之前類似。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"即使考慮極端情況下——某機房完全不可用(如斷網),“異地多活”架構可將業務流量切換到存活的機房中,讓業務繼續運轉。而可觀測性上,通過將 Portal 域名與 Gateway 域名切換到存活的機房中,ETrace 便能繼續工作(雖然會缺失故障機房的數據)。在機房網絡恢復後,故障機房內的可觀測性數據也能自動恢復(因爲該機房內的可觀測性數據處理流程在斷網時仍在正常運作)。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"可觀測性數據實時處理的挑戰"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在 1.0 版本中的 Consumer 組件,既負責將鏈路數據寫入到 HBase\/HDFS 中,又負責將鏈路數據計算成指標存儲到 LinDB 中。兩個流程可視爲同步的流程,但前者可接受數分鐘的延遲,後者要求達到實時的時效性。當時 HBase 
集羣受限於機器性能與規模,經常在數據熱點時會寫入抖動,進而造成指標計算抖動,影響可用性。因此,我們迫切需要拆分鏈路寫入模塊與指標計算模塊。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在選型實時計算引擎時,我們考慮到需求場景是:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":null,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"能靈活的配置鏈路數據的計算規則,最好能動態調整;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"能水平擴展,以適應業務的快速發展;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":3,"align":null,"origin":null},"content":[{"type":"text","text":"數據輸出與既有系統(如 LinDB 與 Kafka)無縫銜接;"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"很遺憾的是,彼時業界無現成的拿來即用的大數據流處理產品。我們就基於複雜事件處理(CEP)引擎 Esper 實現了一個類 SQL 的實時數據計算平臺——Shaka。Shaka 包括“Shaka Console”和“Shaka Container”兩個模塊。Shaka Console 由用戶在圖形化界面上使用,來配置數據處理流程(Pipeline)、集羣、數據源等信息。用戶完成 Pipeline 配置後,Shaka Console 會將變更推送到 Zookeeper上。無狀態的 Shaka Container 會監聽 Zookeeper 上的配置變更,根據自己所屬的集羣去更新內部運行的 Component 組件。而各 Component 實現了各種數據的處理邏輯:消費 Kafka 數據、處理 Trace\/Metric 數據、Metric 聚合、運行 Esper 
邏輯等。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/5a\/5a6dea4306587360f9a4aa7720e24d01.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Trace 數據和 Metric 格式轉換成固定的格式後,剩下來按需編寫 Esper 語句就能生成所需的指標了。如下文所示的 Esper 語句,就能將類型爲 Transaction 的 Trace 數據計算成以“{appId}.transaction”的指標(若 Consumer 中以編碼方式實現,約需要近百行代碼)。經過這次的架構升級,Trace 數據能快速的轉化爲實時的 Metric 數據,並且對於業務的可觀測性需求,只用改改 SQL 語句就能快速滿足,顯著降低了開發成本和提升了開發效率。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"ETrace API 示例"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在 Java 或 Python 中提供鏈路埋點的 API:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"\n@Name('transaction')\n@Metric(name = '{appId}.transaction', tags = {'type', 'name', 'status', 'ezone', 'hostName'}, fields = {'timerCount', 'timerSum', 'timerMin', 'timerMax'}, sampling = 'sampling')\nselect header.ezone as ezone,\n header.appId as appId,\n header.hostName as hostName,\n type as type,\n name as name,\n status as status,\n trunc_sec(timestamp, 10) as timestamp,\n f_sum(sum(duration)) as timerSum,\n f_sum(count(1)) as timerCount,\n f_max(max(duration)) as timerMax,\n f_min(min(duration)) as timerMin,\n sampling('Timer', duration, header.msg) as sampling\nfrom transaction\ngroup by header.appId, type, name, header.hostName, header.ezone, status, trunc_sec(timestamp, 10);"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"新的 
UI、更豐富的中間件數據"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1.0 版本的前端 UI,是集成在 Console 項目中基於 Angular V1 開發的。我們迫切希望能做到前後端分離,各司其職。於是基於 Angular V2 的若干個月開發,新的 Portal 組件登場。得益於 Angular 的數據綁定機制,新的 ETrace UI 各組件間聯動更自然,排查故障更方便。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"餓了麼自有中間件的研發進程也在不斷前行,在可觀測性的打通上也不斷深化。2.0 階段,我們進一步集成了——Redis、Queue、ElasticSearch 等等,覆蓋了餓了麼所有的中間件,讓可觀測性無死角。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"殺手級功能:指標查看與鏈路查看的無縫整合"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"傳統的可觀測性系統提供的排障方式大致是:接收報警(Alert)--查看指標(Metrics)--登陸機器--搜索日誌(Trace\/Log),而 ETrace 通過 Metric 與 Trace 的整合,能讓用戶直接在 UI 上通過點擊就能定位絕大部分問題,顯著拔高了用戶的使用體驗與排障速度。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/db\/dba0c49a5dd7b4daee26be87c9dac8d6.jpeg","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"某個排查場景如:用戶發現總量異常突然增加,可在界面上篩選機房、異常類型等找到實際突增的某個異常,再在曲線上直接點擊數據點,就會彈出對應時間段的異常鏈路信息。鏈路上有詳細的上下游信息,能幫助用戶定位故障。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/e3\/e3ceb3b6611e3e145e070fbe3f5bf700.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"numbe
r":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"它的實現原理如上圖所示。具體地,前文提到的實時計算模塊 Shaka 將 Trace 數據計算成 Metric 數據時,會額外以抽樣的方式將 Trace 上的 RequestId 與 RpcId 也寫到 Metric 上(即上文 Esper 語句中,生成的 Metric 中的 sampling 字段)。這種 Metric 數據會被 Consumer 模塊消費並寫入到 HBase 的一張 Sampling 表中。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"用戶在前端 Portal 的指標曲線上點擊某個點時,會構建一個 Sampling 的查詢請求。該請求會帶上:該曲線的指標名、數據點的起止時間、用戶選擇的過濾條件(即 Tags)。Consumer 基於這些信息構建一個使用 RegexStringComparator 的 HBase Scan 查詢。查詢結果中可能會包含多條記錄,對應着該時間點內數據點(Metric)上發生的多個調用鏈路(Trace),繼而拿着結果中的 RequestId+RpcId 再去查詢一次 HBase\/HDFS 存儲就能獲得鏈路原文。(注:實際構建 HBase Rowkey 時 Tag 部分存的是其 Hashcode 而不是原文 String。)"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"衆多轉崗、離職的餓了麼小夥伴,最念念不忘的就是這種“所見即所得”的可觀測性排障體驗。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"報警 Watchdog 1.0"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在應用可觀測性基本全覆蓋之後,報警的需求自然成了題中應有之義。技術選型上,根據我們在實時計算模塊 Shaka 上收穫的經驗,決定再造一個基於實時數據的報警系統 Watchdog。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/83\/45\/83da54fe32420e2fe0714efefbc54945.jpg","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"實時計算模塊 Shaka 中已經將 Trace 數據計算成指標 
Metrics,報警模塊只需消費這些數據,再結合用戶配置的報警規則產出報警事件即可。因此,我們選型使用 Storm 作爲流式計算平臺,在 Spout 層根據報警規則過濾和分流數據,在 Bolt 層中由 Esper 引擎運行用戶配置的報警規則轉化而成的 Esper 語句,處理分流後的 Metric 數據。若有符合 Esper 規則的數據,即生成一個報警事件 Alert。Watchdog Portal 模塊訂閱 Kafka 中的報警事件,再根據具體報警的觸達方式通知到用戶。默認 Esper 引擎中數據聚合時間窗口爲 1 分鐘,所以整個數據處理流程的時延約爲 1 分鐘。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Metrics API 與 LinDB 2.0"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在 ETrace 1.0 階段,我們只提供了 Trace 相關的 API,LinDB 僅供內部存儲使用。用戶逐步意識到如果能將“指標”與“鏈路”整合起來,就能發揮更大的功用。因此我們在 ETrace-Agent 中新增了 Metrics 相關的 API:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"\n\/\/ 計數器類型\nTrace.newCounter(String metricName).addTags(Map tags).count(int value);\n\/\/ 耗時與次數\nTrace.newTimer(String metricName).addTags(Map tags).value(int value);\n\/\/ 負載大小與次數\nTrace.newPayload(String metricName).addTags(Map tags).value(int value);\n\/\/ 單值類型\nTrace.newGauge(String metricName).addTags(Map tags).value(int value);"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"基於這些 API,用戶可以在代碼中針對自己的業務邏輯進行指標埋點,爲後來可觀測性大一統提供了實現條件。在其他組件同步開發時,我們也針對 LinDB 做了若干優化,提升了寫入性能與易用性:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":null,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"增加 Histogram、Gauge、Payload、Ratio 多種指標數據類型;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"從 1.0 
版本的每條指標數據都調用一次 RocksDB 的 API 進行寫入,改成先在內存中聚合一段時間,再通過 RocksDB 的 API 進行批量寫入文件。"}]}]}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"3.0:推陳出新,融會貫通"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"可觀測性系統大一統"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在 2017 年的餓了麼,除了 ETrace 外還有多套可觀測性系統:基於 Statsd\/Graphite 的業務可觀測性系統、基於 InfluxDB 的基礎設施可觀測性系統。後兩者都集成在 Grafana 上,用戶可以去查看他的業務或者機器的詳細指標。但實際排障場景中,用戶還是需要在多套系統間來回切換:根據 Grafana 上的業務指標感知業務故障,到 ETrace 上查看具體的 SOA\/DB 故障,再到 Grafana 上去查看具體機器的網絡或磁盤 IO 指標。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"雖然我們也開發了 Grafana 插件來集成 LinDB 數據源,但因各系統架構本質上差異巨大,用戶仍需“疲於奔命”式地來回切換系統,難以獲得統一的可觀測性體驗。因此 2018 年初,我們下定決心:"},{"type":"text","marks":[{"type":"strong"}],"text":"將多套可觀測性系統合而爲一,打通“業務可觀測性+應用可觀測性+基礎設施可觀測性”,讓 ETrace 真正成爲餓了麼的一站式可觀測性平臺。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/b1\/b1e563cecb159ba9ae4ed359368d5b92.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"1、LinDB 3.0"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"所謂“改造”未動,“存儲”先行。想要整合 InfluxDB 與 Statsd,先要研究它們與 LinDB 的異同。我們發現,InfluxDB 支持一個指標名(Measurement)上有多個 Field Key。如,InfluxDB 可能有以下指標:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"\nmeasurement=census, fields={butterflies=12, honeybees=23}, tags={location=SH, scientist=jack}, timestamp=2015-08-18T00:06:00Z"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"若是 LinDB 2.0 的模式,則需要將上述指標轉換成兩個指標:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"\nmeasurement=census, field={butterflies=12}, tags={location=SH, scientist=jack}, timestamp=2015-08-18T00:06:00Z\nmeasurement=census, field={honeybees=23}, tags={location=SH, scientist=jack}, timestamp=2015-08-18T00:06:00Z"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"可以想見,在數據存儲與計算效率上,單 Field 模式有着極大的浪費。但更改指標存儲的 Schema,意味着整個數據處理鏈路都需要做適配和調整,工作量和改動極大。然而不改就意味着“將就”,我們不能忍受對自己要求的降低。因此又經過了幾個月的爆肝研發,LinDB 3.0 開發完成。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"這次改動,除了升級到指標多 Fields 模式外,還有以下優化點:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"集羣方面引入 Kafka 的 ISR 設計,解決了之前機器故障時查詢數據缺失的問題。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"存儲層面支持更加通用的多 Field 模式,並且支持對多 Field 之間的表達式運算。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"引入了倒排索引,顯著提高了對於任意 Tag 
組合的過濾查詢的性能。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"支持了自動化的 Rollup 操作,對於任意時間範圍的查詢自動選取合適的粒度來聚合。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"經過這次大規模優化後,從最初的每日 5T 指標數據漲到如今的每日 200T 數據,LinDB 3.0 都經受住了考驗。指標查詢的響應時間的 99 分位線爲 200ms。詳細設計細節可參看分佈式時序數據庫--LinDB(https:\/\/zhuanlan.zhihu.com\/p\/35998778)。"}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"2、將 Statsd 指標轉成 LinDB 指標"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Statsd 是餓了麼廣泛使用的業務指標埋點方案,各機房有一個數十臺機器規模的 Graphite 集羣。考慮到業務的核心指標都在 Statsd 上,並且各個 AppId 以 ETrace Metrics API 替換 Statsd 是一個漫長的過程(也確實是,前前後後替換完成就花了將近一年時間)。爲了減少對用戶與 NOC 團隊的影響,我們決定:用戶更新代碼的同時,由 ETrace 同時“兼容”Statsd 的數據。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"得益於餓了麼強大的中間件體系,業務在用 Statsd API 埋點的同時會“自動”記一條特殊的 Trace 數據,攜帶上 Statsd 的 Metric 數據。那麼只要處理 Trace 數據中的 Statsd 埋點,我們就能將大多數 Statsd 指標轉化爲 LinDB 指標。如下圖:多個 Statsd 指標會轉爲同一個 LinDB 指標。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"\n\/\/ statsd:\nstats.app.myAppName.order.from_ios.success 32\nstats.app.myAppName.order.from_android.success 29\nstats.app.myAppName.order.from_pc.failure 10\nstats.app.myAppName.order.from_openapi.failure 5\n\/\/ lindb:\nMetricName: myAppName.order\nTags:\n \"tag1\"=[from_ios, from_android,from_pc, from_openapi]\n \"tag2\"=[success, 
failure]"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"之前我們的實時計算模塊 Shaka 就在這裏派上了大用場:只要再新增一路數據處理流程即可。如下圖,新增一條 Statsd 數據的處理 Pipeline,並輸出結果到 LinDB。在用戶的代碼全部從 Statsd API 遷移到 ETrace API 後,這一路處理邏輯即可移除。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/d5\/17\/d5278bc3bf09f70c5b62d1cec1ab1017.jpg","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"3、將 InfluxDB 指標轉成 LinDB 指標"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"InfluxDB 主要用於機器、網絡設備等基礎設施的可觀測性數據。餓了麼每臺機器上,都部署了一個 ESM-Agent。它負責採集機器的物理指標(CPU、網絡協議、磁盤、進程等),並在特定設備上進行網絡嗅探(Smoke Ping)等。這個數據採集 Agent 原由 Python 開發,在不斷需求堆疊之後,已龐大到難以維護;並且每次更新可觀測邏輯,都需要全量發佈每臺機器上的 Agent,導致每次 Agent 的發佈都令人心驚膽戰。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們從 0 開始,以 Golang 重新開發了一套 
ESM-Agent,做了以下改進:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"可觀測性邏輯以插件的形式,推送到各宿主機上。不同的設備、不同應用的機器,其上運行的插件可以定製化部署。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"制定插件的交互接口,讓中間件團隊可定製自己的數據採集實現,解放了生產力。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"移除了 etcd,使用 MySql 做配置數據存儲,減輕了系統的複雜度。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"開發了便利的發佈界面,可灰度、全量的推送與發佈 Agent,運維工作變得輕鬆。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"最重要的,收集到的數據以 LinDB 多 Fields 的格式發送到 Collector 組件,由其發送到後續的處理與存儲流程上。"}]}]}]},
{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/a9\/bb\/a92be88d13395bd4b80145e335d978bb.jpg","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"從 ETrace 到 EMonitor,不斷升級的可觀測性體驗"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2017 年底,我們團隊終於迎來了一名正式的前端開發工程師,可觀測性團隊正式從後端開發寫前端的狀態中脫離出來。在之前的 Angular 
的開發體驗中,我們深感“狀態轉換”的控制流程甚爲繁瑣,並且開發的組件難以複用(雖然其後版本的 Angular 有了很大的改善)。在調研當時流行的前端框架後,我們在 Vue 與 React 之中選擇了後者,輔以 Ant Design 框架,開發出了媲美 Grafana 的指標看板與便利的鏈路看板,並且在 PC 版本之外還開發了移動端的定製版本。"},{"type":"text","marks":[{"type":"strong"}],"text":"我們亦將整個可觀測性產品從“ETrace”更名爲“EMonitor”:不僅僅是鏈路可觀測性系統,更是餓了麼的一站式可觀測性平臺。"}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"1、可觀測性數據的整合:業務指標 + 應用鏈路 + 基礎設施指標 + 中間件指標"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在指標系統都遷移到 LinDB 後,我們在 EMonitor 上集成了“業務指標 + 應用鏈路 + 基礎設施指標 + 中間件指標”的多層次可觀測性數據,讓用戶能在一處觀測他的業務數據、排查業務故障、深挖底層基礎設施的數據。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/b8\/b89b9f3b7e8e33f2d65c2ace0273565c.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"2、可觀測性場景的整合:指標 + 鏈路 + 
報警"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在可觀測性場景上,“指標看板”用於日常業務盯屏與宏觀業務可觀測性,“鏈路”作爲應用排障與微觀業務邏輯透出,“報警”則實現可觀測性自動化,提高應急響應效率。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/ea\/ea8c44dcd4aa72e44bd4db44d758d57a.jpeg","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"3、靈活的看板配置與業務大盤"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在指標配置上,我們提供了多種圖表類型--線圖、面積圖、散點圖、柱狀圖、餅圖、表格、文本等,以及豐富的自定義圖表配置項,能滿足用戶不同數據展示需求。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/16\/1621d091a00e5018afc5a810d476ea18.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/d7\/d7492df385c5262cf4205d9bc2e023cc.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在完成單個指標配置後,用戶需要將若干個指標組合成所需的指標看板。用戶在配置頁面中,先選擇待用的指標,再通過拖拽的方式,配置指標的佈局便可實時預覽佈局效果。一個指標可被多個看板引用,指標的更新也會自動同步到所有看板上。爲避免指標配置錯誤而引起歧義,我們也開發了“配置歷史”的功能,指標、看板等配置都能回滾到任意歷史版本上。"}]},{"type":"paragraph","attrs":{"inden
t":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"看板配置是靜態圖表組合,而業務大盤提供了生動的業務邏輯視圖。用戶可以根據他的業務場景,將指標配置整合成一張宏觀的業務圖。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/45\/459bd7869762e68e8d4b3483c1cbf732.jpeg","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"4、第三方系統整合:變更系統 + SLS 日誌"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"因每條報警信息和指標配置信息都與 AppId 關聯,那麼在指標看板上可同步標記出報警的觸發時間。同理,我們拉取了餓了麼變更系統的應用變更數據,將其標註到對應 AppId 相關的指標上。在故障發生時,用戶查看指標數據時,能根據有無變更記錄、報警記錄來初步判斷故障原因。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"餓了麼的日誌中間件能自動在記錄日誌時加上對應的 ETrace 的 RequestId 等鏈路信息。如此,用戶查看 SLS 日誌服務時,能反查到整條鏈路的 RequestId;而 EMonitor 也在鏈路查看頁面,拼接好了該應用所屬的 SLS 鏈接信息,用戶點擊後能直達對應的 SLS 查看日誌上下文。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/eb\/ebf7e968371df7fb33e90e1dc84bc0c6.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"5、使用場景的整合:桌面版 + 移動版"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"除提供桌面版的 EMonitor 外,我們還開發了移動版的 EMonitor,它也提供了大部分可觀測性系統的核心功能——業務指標、應用指標、報警信息等。移動版 EMonitor 
能內嵌於釘釘之中,打通了用戶認證機制,幫助用戶隨時隨地掌握所有的可觀測性信息。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/2e\/2e32205799ddb66958481c2aaca38846.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"6、爲了極致的體驗,精益求精"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"爲了用戶的極致使用體驗,我們在 EMonitor 上各功能使用上細細打磨,這裏僅舉幾個小例子:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":null,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"我們爲極客開發者實現了若干鍵盤快捷鍵。例如,“V”鍵就能展開查看指標大圖。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"圖上有多條曲線時,點擊圖例默認是單選,目的是讓用戶只看他關心的曲線。此外,若是“Ctrl+鼠標點擊”,則是將其加入已選擇的曲線中。這個功能在一張圖有幾十條曲線、需要對比幾條關鍵曲線時尤爲有用。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":3,"align":null,"origin":null},"content":[{"type":"text","text":"爲了讓色弱開發者更容易區分成功或失敗的狀態,我們針對性地調整了對應顏色的對比度。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/90\/904aa93433454cb101e8a6a2580b8ba3.jpeg","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,
"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"成爲餓了麼一站式可觀測性平臺"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"EMonitor 開發完成後,憑藉優異的用戶體驗與產品集成度,很快在用戶中普及開來。但是,EMonitor 要成爲餓了麼的一站式可觀測性平臺,還剩下最後一戰——NOC 可觀測性大屏。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"NOC 可觀測性大屏替換"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"餓了麼有一套完善的應急處理與保障團隊,包括 7*24 值班的 NOC(Network Operation Center)團隊。在 NOC 的辦公區域,有一整面牆上都是可觀測性大屏,上面顯示着餓了麼的實時的各種業務曲線。下圖爲網上找的一張示例圖,實際餓了麼的 NOC 大屏比它更大、數據更多。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/62\/6206acfeaa6554d0f0543c38204e8812.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"當時這個可觀測性大屏是將 Grafana 的指標看板投影上去。我們希望將 NOC 大屏也替換成 EMonitor 的看板。如前文所說,我們逐步將用戶的 Statsd 指標數據轉換成了 LinDB 指標,在 NOC 團隊的協助下,一個一個將 Grafana 的可觀測性指標“搬”到 EMonitor 上。此外,在原來白色主題的 EMonitor 之上,我們開發了黑色主題以適配投屏上牆的效果(白色背景投屏太刺眼)。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"終於趕在 2018 年的雙十一之前,EMonitor 
正式入駐 NOC 可觀測性大屏。在雙十一當天,衆多研發擠在 NOC 室看着牆上的 EMonitor 看板上的業務曲線不斷飛漲,作爲可觀測性團隊的一員,這份自豪之情由衷而生。經此一役,EMonitor 真正成爲了餓了麼的“一站式可觀測性平臺”,Grafana、Statsd、InfluxDB 等都成了過去時。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"報警 Watchdog 2.0"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"同樣在 EMonitor 之前,亦有 Statsd 與 InfluxDB 對應的多套報警系統。用戶若想要配置業務報警、鏈路報警、機器報警,需要輾轉於多個報警系統之間。各系統的報警配置規則、觸達體驗亦是千差萬別。Watchdog 報警系統也面臨着統一融合的挑戰。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":null,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"在調研其他系統的報警規則實現後,Watchdog 中仍以 LinDB 的指標作爲元數據實現。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"針對其他報警系統中有顯著區別的訂閱模式,我們提出了\"報警規則+一個規則多個訂閱標籤+一個用戶訂閱多個標籤\"的方式,幾乎完美遷移了其他系統所有的報警規則與訂閱關係。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":3,"align":null,"origin":null},"content":[{"type":"text","text":"其他各系統在報警觸達與觸達內容上也略有不同。我們統一整合成“郵件+短信+釘釘+語音外呼”四種通知方式,並且提供可參數化的自定義 Markdown 
模板,讓用戶可自己定製報警信息。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"經過一番艱苦的報警配置與邏輯整合後,我們爲用戶“自動”遷移了上千個報警規則,並最終爲他們提供了一個統一的報警平臺。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/f8\/f8324027cb138935865de6bf88ee7d8b.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"報警,更精準的報警"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"外賣行業的業務特性是業務的午高峯與晚高峯,在業務曲線上便是兩個波峯的形狀。這樣的可觀測數據,自然難以簡單使用閾值或比率來做判斷。即使是根據歷史同環比、3-Sigma、移動平均等規則,也難以適應餓了麼的可觀測性場景。因爲,餓了麼的業務曲線並非一成不變,它受促銷、天氣因素、區域、壓測等因素影響。開發出一個自適應業務曲線變化的報警算法,勢在必行。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/7c\/7c93982a633e5c774bf81b407dad048e.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們調研了既有規則,並結合餓了麼的業務場景,推出了全新的“趨勢”報警。簡要算法如下:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":null
,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"計算歷史 10 天的指標數據中值作爲基線。其中這 10 天都取工作日或非工作日。不取 10 天的均值而取中值是爲了減少壓測或機房流量切換造成的影響。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"根據二階滑動平均算法,得到滑動平均值與當前實際值的差值。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":3,"align":null,"origin":null},"content":[{"type":"text","text":"將基線與差值相加作爲預測值。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":4,"align":null,"origin":null},"content":[{"type":"text","text":"根據預測值的數量級,計算出波動的幅度(如上界與下界的數值)。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":5,"align":null,"origin":null},"content":[{"type":"text","text":"若當前值不在預測值與波動幅度確定的上下界之中,則觸發報警。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如上圖所示,22點01分的實際值因不在上下界所限定的區域之中,會觸發報警。但從後續趨勢來看,該下降趨勢符合預期,因此實際中還會輔以“偏離持續 X 分鐘”來修正誤報。(如該例中,可增加“持續 3 分鐘才報警”的規則,該點的數據便不會報警)算法中部分參數爲經驗值,而其中波動的閾值參數用戶可按照自己業務調整。用戶針對具備業務特徵的曲線,再也不用費心地去調整參數,配置上默認的“趨勢”規則就可以覆蓋大多數的可觀測性場景,目前“趨勢”報警在餓了麼廣泛運用。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"智能可觀測性:根因分析,大顯神威"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"作爲 AIOPS 中重要的一環,根因分析能幫助用戶快速定位故障,縮短故障響應時間,減少故障造成的損失。2020 年初,我們結合餓了麼場景,攻堅克難,攻破“指標下鑽”、“根因分析”兩大難關,在 EMonitor 
上成功落地。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"根因分析最大的難點在於:"},{"type":"text","marks":[{"type":"strong"}],"text":"包含複雜維度的指標數據難以找到真正影響數據波動的具體維度;孤立的指標數據也難以分析出應用上下游依賴引起的故障根因。"},{"type":"text","text":"例如,某個應用的異常指標突增,當前我們只能知道突增的異常名、機房維度的異常分佈、機器維度的異常分佈等,只有用戶手工去點擊異常指標查看鏈路之後,才能大致判斷是哪個 SOA 方法\/DB 請求中的異常。繼而用戶根據異常鏈路的環節,去追溯上游或下游的應用,重複類似的排查過程,最後以人工經驗判斷出故障點。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"因此,在“指標下鑽”上,我們針對目標指標的曲線,細分成最精細的每個維度數據(指標 group by 待分析的 tag 維度),使用 KMeans 聚類找出故障數據的各維度的最大公共特徵,依次計算找到最優的公共特徵,如此便能找到曲線波動對應的維度信息。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/42\/42f204c7bf4c8b0e65a3963ecce544a4.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"其次,在鏈路數據計算時,我們就能將額外的上下游信息附加到對應的指標之中。如,可在異常指標中追加一個維度來記錄產生異常的 SOA 方法名。這樣在根據異常指標分析時,能直接定位到是這個應用的哪個 SOA 方法拋出的異常,接下來“自動”分析是 SOA 下游故障還是自身故障(DB、Cache、GC 
等)。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/1e\/1e6af825f0988ccaf4e57bc792b7313d.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/26\/2644ca1808730505b4007176c75d1f49.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"自 2020 年 3 月在餓了麼落地以來,在分析的上百例故障中,根因分析的準確率達到 90% 以上,顯著縮短了故障排查的時間,幫助各業務朝穩定性建設目標跨進了一大步。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/be\/be342feb69680663c0d652dcadd81602.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"4.0:繼往開來,乘勢而上"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"經過 4、5 年的發展,風雲變幻但團隊初心不改,爲了讓用戶用好可觀測性系統,EMonitor 沒有停下腳步,自我革新,希望讓“天下沒有難用的可觀測性系統”。我們向集團的可觀測性團隊請教學習,結合本地生活自己的技術體系建設,力爭百尺竿頭更進一步,規劃了以下的 EMonitor 4.0 
的設計目標。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"進行多租戶化改造,保障核心數據的時延和可靠性"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在本地生活的技術體系與阿里巴巴集團技術體系的不斷深入的融合之中,單元化的部署環境以及對可觀測性數據不同程度的可靠性要求,催生了“多租戶化”的設計理念。我們可以根據應用類型、數據類型、來源等,將可觀測性數據分流到不同的租戶中,再針對性配置數據處理流程及分配處理能力,實現差異化的可靠性保障能力。"}]},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/c6\/b5\/c65c897d35a71e08b40c364a35748fb5.jpg","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"初步我們可以劃分爲兩個集羣——"},{"type":"text","marks":[{"type":"strong"}],"text":"核心應用集羣與非核心應用集羣"},{"type":"text","text":",根據在應用上標記的“應用等級”將其數據自動發送到對應集羣中。兩套集羣在資源配置上優先側重核心集羣,並且完全物理隔離。此外通過配置開關可動態控制某個應用歸屬的租戶,實現業務的柔性降級,避免當下偶爾出現的個別應用不正確的埋點方式影響整體可觀測性系統可用性的問題。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"未來可根據業務發展進一步發展出業務相關的租戶,如到家業務集羣、到店業務集羣等。或者按照區域的劃分,如彈內集羣、彈外集羣等。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"打通集團彈內、彈外的可觀測性數據,成爲本地生活的一站式可觀測性平臺"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"目前本地生活很多業務領域已經遷入集團。在 Trace 鏈路可觀測方面,雖然在本地生活上雲的項目中,EMonitor 已經通過中間件改造實現鷹眼 TraceId 在鏈路上的傳遞,並記錄了 EMonitor RequestId 與鷹眼 TraceId 的映射關係,但 EMonitor 與鷹眼在協議上的天然隔閡仍使得用戶需要在兩個平臺間跳轉查看同一條 Trace 鏈路。因此,我們接下來的目標是與鷹眼團隊合作,將鷹眼的 Trace 數據集成到 EMonitor 
上,讓用戶能一站式地排查問題。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"其次,本地生活上雲後,衆多中間件已遷移到雲上中間件,如雲 Redis、雲 Kafka、雲 Zookeeper 等。對應的可觀測性數據也需要額外登錄到阿里雲控制檯去查看。雲上中間件的可觀測性數據大多已存儲到 Prometheus 之中,因此我們計劃在完成 Prometheus 協議兼容後,就與雲上中間件團隊合作,將本地生活的雲上可觀測性數據集成到 EMonitor 上。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/e4\/54\/e438816b667fac281b6cd9c07933a354.jpg","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"擁抱雲原生,兼容 Prometheus、OpenTelemetry 等開源協議"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"雲原生帶來的技術革新勢不可擋,本地生活的絕大多數應用已遷移到集團的容器化平臺——ASI 上,對應帶來的新的可觀測環節也亟需補全。如,ASI 上 Prometheus 協議的容器可觀測性數據、Envoy 等本地生活 PaaS 平臺透出的可觀測性數據與 Trace 數據等。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"因此,我們計劃在原先僅支持 LinDB 數據源的基礎上,增加對 Prometheus 數據源的支持;擴展 OpenTelemetry 的 otel-collector exporter 實現,將 OpenTelemetry 協議的 Trace 數據轉換成 EMonitor 的 Trace 格式。如此便可補全雲原生技術升級引起的可觀測性數據缺失,並提供高度的適配性,滿足本地生活的可觀測性建設。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"結語"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"縱觀各大互聯網公司的產品演進,技術產品的走向與命運都離不開公司業務的發展軌跡。我們餓了麼的技術人是幸運的,能趕上這一波技術變革的大潮,能夠發揮聰明才智,打磨出一些爲用戶津津樂道的技術產品。我們 EMonitor 可觀測性團隊也爲能參與到這次技術變革中深感自豪,EMonitor 能被大家認可,
離不開每位參與到餓了麼可觀測性體系建設的同伴,也感謝各位對可觀測性系統提供幫助、支持、建議的夥伴!"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"作者簡介:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"柯聖,花名“炸天”,餓了麼監控技術組負責人。自 2016 年加入餓了麼,長期深耕於可觀測性領域,全程參與了 ETrace 到 EMonitor 的餓了麼可觀測性系統的發展歷程。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"本文轉載自:阿里巴巴中間件(ID:Aliware_2018)"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"原文鏈接:"},{"type":"link","attrs":{"href":"https:\/\/mp.weixin.qq.com\/s\/1Vc5hUX7XUwQhaqnv8ls8g","title":"xxx","type":null},"content":[{"type":"text","text":"餓了麼 EMonitor 演進史"}]}]}]}