百度搜索穩定性問題分析的故事(上)

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"導讀:","attrs":{}},{"type":"text","text":"百度搜索系統是百度歷史最悠久、規模最大並且對其的使用已經植根在大家日常生活中的系統。坊間有一種有趣的做法:很多人通過打開百度搜索來驗證自己的網絡是不是通暢的。這種做法說明百度搜索系統在大家心目中是“穩定”的代表,且事實確是如此。百度搜索系統爲什麼具有如此高的可用性?背後使用了哪些技術?以往的技術文章鮮有介紹。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"本文立足於大家所熟悉的百度搜索系統本身,爲大家介紹其可用性治理中關於“穩定性問題分析”方面使用的精細技術,以歷史爲線索,介紹穩定性問題分析過程中的困厄之境、破局之道、創新之法。希望給讀者帶來一些啓發,更希望能引起志同道合者的共鳴和探討。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"italic","attrs":{}}],"text":"全文7741字,預計閱讀時間17分鐘。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"第1章 
困境","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在大規模微服務系統下,如果故障未發生,應該歸功於運氣好。但是永遠不要指望故障不發生,必須把發生故障當作常態。從故障發生到解除過程遵循的基本模式抽象如下。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/f0/f02493c7257f3a5cecd82242d43a7408.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"可用性治理主要從這3個角度着手提升:1. 加強系統韌性;2. 完善止損手段,提升止損有效性,加速止損效率;3. 加速原因定位和解除效率。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"以上3點,每個都是一項專題,限於篇幅,本文僅從【3】展開。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"百度搜索系統的故障原因定位和解除,是一件相當困難的事情,也可能是全公司最具有挑戰性的一件事情。困難體現在以下幾個方面。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"極其複雜的系統 VS. 
極端嚴格的可用性要求","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"百度搜索系統分爲在線和離線兩部分。離線系統每天從整個互聯網抓取資源,建立索引庫,形成倒排、正排和摘要三種重要的數據。然後,在線系統基於這些數據,接收用戶的query,並以極快的速度爲用戶找到他想要的內容。如下圖所示。","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/16/16350ae8f90be671f19823bd90ba878a.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"百度搜索系統是極其龐大的。讓我們通過幾個數字直觀感受一下它的規模:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/e1/e1ffc4cdce4c7b988b7636f770c4e4ba.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"百度搜索系統的資源佔用量摺合成數十萬臺機器,系統分佈在天南海北的N大地域,搜索微服務系統包含了數百種服務,包含的數據量達到數十PB級別,天級變更次數達到數十萬量級,日常的故障種類達到數百種,搜索系統有數百人參與研發,系統每天面臨數十億級的用戶搜索請求。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"雖然系統是超大規模,但是百度對可用性的要求是極其嚴格的。百度搜索系統的可用性在5個9以上。這是什麼概念呢?如果用可提供服務的時間來衡量,在5個9的可用性下,系統一年不可用時間只有5分鐘多,而在6個9的可用性下,一年不可用的時間只有半分鐘左右。所以,可以說百度搜索是不停服的。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"alig
n":null,"origin":null},"content":[{"type":"text","text":"一個query到達百度搜索系統,要經歷上萬個節點的處理。下圖展示了一個query經歷的全部節點的一小部分,大概佔其經歷節點全集的幾千分之一。在這種複雜的路徑下,所有節點都正常的概率是極其小的,異常是常態。","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/8e/8ef86e828e06e0d9baaf41941fbe3706.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"複雜的系統,意味着故障現場的數據收集和分析是一項浩大的工程。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"多樣的穩定性問題種類","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"百度搜索系統向來奉行“全”、“新”、“快”、“準”、“穩”五字訣。日常中的故障主要體現在“快”和“穩”方面,大體可歸爲三類:","attrs":{}}]},{"type":"numberedlist","attrs":{"start":null,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"PV損失故障:未按時、正確向用戶返回query結果,是最嚴重的故障。","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"搜索效果故障:預期網頁未在搜索結果中展現;或未排序在搜索結果的合理位置;搜索結果頁面響應速度變慢。","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":3,"align":null,"origin":null},"content":[{"type":"text","text":"容量故障:因外部或內部等各種原因,無法保證系統高可用需要的冗餘度,甚至容量水位超過臨界點造成崩潰宕機等情況,未及時預估、告警、修復。","attrs":{}}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"這些種類繁多、領域各異的問題背後,不變的是對數據採集加工的需求和人工分析經驗的自動化
抽象。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"第2章 引進來、本土化:破局","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在2014年以前,故障原因定位和解除都在和數據較勁,當時所能用到的數據,主要有兩種。一是搜索服務在線日誌(logging);二是一些分佈零散的監控(metrics)。這兩類數據,一方面不夠翔實,利用效率低,問題追查有死角;另一方面,對它們的使用強依賴於人工,自動化程度低。以一個例子說明。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"拒絕問題的分析首先通過中控機上部署的腳本定時掃描線上服務抓取單PV各模塊日誌,展現到一個拒絕分析平臺(這個平臺在當時已經算是比較強大的拒絕原因分析工具了)頁面,如下圖所示;然後人工閱讀抓取到的日誌原文進行分析。這個過程雖然具有一定的自動化能力,但是PV收集量較小,數據量不足,很多拒絕的原因無法準確定位;數據平鋪展示需要依賴有經驗的同學閱讀,分析效率極其低下。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/9c/9ce70264e3b087bf3390db7306dc6033.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在問題追查死角和問題追查效率上,前者顯得更爲迫切。無死角的問題追查呼籲着更多的可觀測數據被收集到。如果在非生產環境,獲取這些數據是輕而易舉的,雖然會有query速度上的損失,但是在非生產環境都能容忍,然而,這個速度損失的代價,在生產環境中是承受不起的。在理論基石《Dapper, a Large-Scale Distributed Systems Tracing Infrastructure》的指導下,我們建設了kepler1.0系統,它基於query抽樣,產出調用鏈和部分annotation(query處理過程中的非調用鏈的KV數據)。同時,基於業界開源的prometheus方案,我們完善自己的metrics系統。它們上線後立即產生了巨大的應用價值,打開了搜索系統可觀測性建設和應用的想象空間。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"2.1 
kepler1.0簡介","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"系統架構如下圖所示。","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/d5/d5bf6ed17d2530b05ecffe596c948d9e.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"階段性使命:kepler1.0在於完善搜索系統的可觀測性,基於開源成熟方案結合公司內組件實現從0到1的建設,快速完成可觀測性能力空白的補齊,具備根據queryID查詢query處理過程的調用鏈以及途徑服務實例日誌的能力。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"引進來:從kepler1.0的架構不難發現,它從數據通路、存儲架構等方面完整的參考zipkin","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"本土化:引進zipkin時數據採集sdk只支持c++,爲了滿足對非c++模塊的可觀測性需求,兼顧sdk的多語言維護成本以及trace的侵入性,採用了常駐進程通過日誌採集輸出格式和c++ sdk兼容的trace數據,即圖中的日誌採集模塊。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"2.2 
通用metrics採集方案初步探索","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"系統架構如下圖所示。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/eb/eb5a14b3a2bf8804977a4aacf04244e3.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"階段性使命:2015年前後搜索開始探索大規模在線服務集羣容器化混部技術,此時公司內的監控系統對多維度指標匯聚支持較弱,基於機器維度指標的傳統容量管理方式已經難以滿足容器化混部場景的需求。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"引進來:將開源界成熟的metrics方案引入搜索在線服務混部集羣,實現了符合prometheus協議的容器指標exporter,並依託prometheus的靈活多維度指標查詢接口以及grafana豐富的可視化能力,建設了搜索在線業務混部集羣容量管理依賴的底層數據系統。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"本土化:容器指標prometheus-exporter和搜索在線PaaS系統深度對接,將服務元信息輸出爲prometheus的label,實現了容器元信息的指標索引和匯聚能力,滿足容器化混部場景下容量管理的需求。指標和PaaS元信息關聯是雲原生metrics系統的初步探索主要成果。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"2.3 
應用效果初顯","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"場景1:拒絕、效果問題","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"階段性痛點:人工分析強依賴日誌,從海量調用鏈、日誌數據中精確檢索出某些特定query,通過ssh掃線上機器日誌效率很低,且對線上服務存在home盤io打滿導致穩定性風險。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"解決情況:對命中常態隨機抽樣拒絕問題、可復現的效果問題開啓強制抽樣採集,通過queryID直接從平臺查詢調用鏈及日誌用於人工分析原因,基本滿足了這個階段的trace需求。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"場景2:速度問題","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"階段性痛點:僅有日誌數據,缺乏調用鏈的精細時間戳;一個query激發的調用鏈長、扇出度大,日誌散落廣泛,難收集。通過日誌幾乎無法恢復完整的時序過程。這導致速度的優化呈現黑盒狀態。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"解決情況:補全了調用鏈的精細時間戳,使query的完整時序恢復成爲可能。通過調用鏈可以查找到程序層面耗時長尾階段或調度層面熱點實例等優化點,基於此,孵化並落地了tcp 
connect異步化、業務回調阻塞操作解除等改進項目。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"場景3:容量問題","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"階段性痛點:多維度指標信息不足(缺少容器指標、指標和PaaS系統脫節);缺少有效的匯聚、加工、組合、對比、挖掘以及可視化手段。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"解決情況:建設了搜索在線的容器層面多維度指標數據採集系統,爲容器化的容量管理應用提供了重要的基礎輸出來源,邁出了指標系統雲原生化探索的一步。下圖爲項目上線後通過容器指標進行消耗審計功能的截圖。","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/6c/6c220a33b0662ea80f866bbef858c819.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"第3章 創新:應用價值的釋放","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"雖然kepler1.0和prometheus打開了可觀測性建設的大門,但是受限於能力,已經難以低成本地獲取更多的使用價值了。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"3.1 
源動力","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"基於開源方案的實現在資源成本、採集延遲、數據覆蓋面等方面無法滿足搜索服務和流量規模,這影響了穩定性問題解決的徹底性,特別是在搜索效果問題層面表現尤爲嚴重,諸如無法穩定復現搜索結果異常問題、關鍵結果在索引庫層面未預期召回問題等。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"穩定性問題是否得到解決永遠是可觀測性建設的出發點和落腳點,毫不妥協的數據建設一直是重中之重。從2016年起,搜索開始引領可觀測性的創新並將它們做到了極致,使各類問題得以切實解決。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"3.2 全量採集","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"因爲搜索系統規模太龐大,所以kepler1.0只能支持最高10%的採樣率,在實際使用中,資源成本和問題解決徹底性之間存在矛盾。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"(1)搜索系統大部分故障都是query粒度的。很多case無法穩定復現,但又需要分析出歷史上某個特定query的搜索結果異常的原因。讓人無奈的是,當時只有備份下來的日誌才能滿足任一歷史query的數據回溯需求,但它面臨收集成本高的難題;另外,很多query沒有命中kepler1.0的抽樣,其詳細的tracing數據並未有被激發出來,分析無從下手。能看到任一歷史特定query的tracing和logging信息是幾乎所有同學的願望。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"(2)公司內部存儲服務性價比較低、可維護性不高,通過擴大采樣率對上述問題進行覆蓋需要的資源成本巨大,實際中無法滿足。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"對於這個矛盾,業界當時並沒有很好的解決方案。於是,我們通過技術創新實現了kepler2.0系統。系統從實現上將tracing和logging兩種數據解耦,通過單一職責設計實現了針對每種數據特點極致優化,以極低的資源開銷和極少的耗時增長爲成本,換取了全量query的tracing和logging能力,天級別數十PB的日誌和數十萬億量級的調用鏈可實現秒查。讓大多數故障追查面臨的問題迎刃而解。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"nu
mber":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/45/451e9ef8dfb8afe8e51fc339e3a9034c.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"3.2.1 全量日誌索引","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"首先,我們介紹全量日誌索引,對應於上圖中日誌索引模塊。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"搜索服務的日誌都會在線上機器備份相當長一段時間,以往的解決方案都着眼於將日誌原文輸出到旁路系統,然而,忽略了在線集羣天然就是一個日誌原文的現成的零成本存儲場所。於是,我們創新的提出了一套方案,核心設計理念概括成一句話:原地建索引。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"北斗中通過一個四元組定義一條日誌的索引,我們叫做location,它由4個字段組成:ip(日誌所在機器)+inode(日誌所在文件)+offset(日誌所在偏移量)+length(日誌長度)。這四個字段共計20字節,且只和日誌條數有關,和日誌長度無關,由此實現對海量日誌的低成本索引。location由log-indexer模塊(部署在搜索在線服務機器上)採集後對原始日誌建立索引,索引保存在日誌所在容器的磁盤。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"北斗本地存儲的日誌索引邏輯格式如下圖所示。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/35/35d3914ffe905347f134496b8806f8cc.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":nul
l,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"查詢時,將inode、offset、length發送給索引ip所在的機器(即原始日誌所在機器),通過機器上日誌讀取模塊,可根據inode、offset、length以O(1)的時間複雜度定點查詢返回日誌原文,避免了對文件的scan過程,減少了不必要的cpu和io消耗,減小了日誌查詢對生產環境服務穩定性的影響。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"同時,除了支持location索引以外,我們還支持了靈活索引,例如將檢索詞、用戶標識等有業務含義的字段爲二級索引,方便問題追查時拿不到queryID的場景,可支持根據其他靈活索引中的信息進行查詢;在索引的使用方式上,除了用於日誌查詢以外,我們還通過索引推送方式構建了流式處理架構,用於支持對日誌流式分析的應用需求。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"這裏還有一個問題:查詢某一query的日誌時,是不是仍然需要向所有實例廣播查詢請求?答案是:不會。我們對查詢過程做了優化,方法是:通過下文介紹的callgraph全量調用鏈輔助,來確定query的日誌位於哪些實例上,實現定點發送,避免廣播。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"3.2.2 
全量調用鏈","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在dapper論文提供的方案中,同時存在調用鏈和annotation兩種類型的數據。經過重新審視,我們發現,annotation的本質是logging,可以通過logging來表達;而調用鏈既可以滿足分析問題的需要,又因爲它具有整齊一致的數據格式而極易創建和壓縮,達到資源的高性價比利用。所以,callgraph系統(kepler2.0架構圖中紅色部分)就帶着數據最簡、最純潔的特點應運而生。全量調用鏈的核心使命在於將搜索全部query的調用鏈數據在合理的資源開銷下存儲下來並高效查詢。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在tracing的數據邏輯模型中,調用鏈的核心元素爲span,一個span由4部分組成:父節點span_id、本節點span_id、本節點訪問的子節點ip&port、開始&結束時間戳。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"全量調用鏈核心技術創新點在於兩點:(1)自研span_id推導式生成算法,(2)結合數據特徵定製壓縮算法。相比kepler1.0,在存儲開銷上實現了60%的優化。下面分別介紹這兩種技術。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":5},"content":[{"type":"text","text":"3.2.2.1 
span_id推導式生成算法","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"說明:下圖中共有0和1兩個span,每個span由client端和server端兩部分構成,每個方框爲向trace系統的存儲中真實寫入的數據。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"左圖:kepler1.0隨機數算法。爲了使得一個span的client和server能拼接起來並且還原出多個span之間的父子關係,所有span的server端必須保存parent_span_id。因此兩個span實際需要向存儲中寫入4條數據。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"右圖:kepler2.0推導式算法。span_id自根節點從0開始,每調用一次下游就累加該下游實例的ip作爲其span_id並將其傳給下游,下游實例遞歸地在此span_id上繼續累加,這樣可以保證一個query所有調用的span_id具有唯一性。實例只需要保存自己的span_id和下游的ip,即可根據算法還原出一個span的client端和server端。由此可見,只需要寫入2條數據且數據中不需要保存parent_span_id,因此存儲空間得到了節省,從而實現了全量調用鏈的採集能力。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"右圖中ip1:port1對ip2:port2的調用鏈模擬了對同一個實例ip2:port2訪問多次的場景,該場景在搜索業務中廣泛存在(例如:一個query在融合層服務會請求同一個排序服務實例兩次;調度層面上游請求下游異常重試到同一個實例等),推導式算法均可以保證生成的span_id在query內的唯一性,從而保證了調用鏈數據的完整性。","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/31/3186b351b6da937a1aad9f0bbaeb6a48.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":5},"content":[{"type":"text","text":"3.2.2.2 
數據壓縮","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"結合數據特徵綜合採用多種壓縮算法。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"(1) 業務層面:結合業務數據特徵進行了定製化壓縮,而非一律套用通用壓縮算法。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"(a) timestamp:使用相對於base的差值和pfordelta算法。對扇出型服務多子節點時間戳進行了壓縮,只需保存第一個開始時間戳以及相對該時間戳的偏移。以搜索在線服務常見高扇出、短時延場景爲例,存儲偏移比直接存儲兩個時間戳節省70%。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"(b) ip:搜索內網服務ip均位於10.0.0.0/8網段,故只保存ip的後3字節,省去第1字節的10,每個ip節省25%。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"(2) protobuf層面:業務層面的數據最終持久化存儲時採用了protobuf,靈活運用protobuf的序列化特性節省存儲。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"(a) varint:用變長編碼代替原來的定長64位,對所有的整數進行壓縮保存,對於ip、port、時間戳偏移這種不足64位的數據實現了無存儲浪費。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"(b) packed repeated:ip和timestamp均爲repeated類型,開啓packed後只需要保存一次field number。packed默認是不開啓的,導致repeated字段的每個元素都保存一次field number,造成了極大浪費。以平均扇出比爲40的扇出鏈路爲例,開啓packed可節省25%的存儲空間(40字節的field 
number)。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"最終,一個span的邏輯格式(上圖)和物理格式(下圖)如下:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/ed/edeb414e2f999b68ec4763a002500afc.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"3.2.3 應用場景的受益","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"3.2.3.1 時光穿越:歷史上任一特定query的關鍵結果在索引庫層面未預期召回問題","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"因爲召回層索引庫是搜索最大規模的服務集羣,kepler1.0在索引庫服務上只支持0.1%抽樣率,使得由於索引庫的某個庫種和分片故障導致的效果問題追查捉襟見肘。全量調用鏈採集較好地解決了這一困境。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"真實案例:PC搜索 query=杭州 未展現百度百科結果。首先通過工具查詢到該結果的url位於數據庫A的9號分片;進一步通過全量調用鏈發現,該query對數據庫A的所有請求中丟失了9號分片(該分片因重試後仍超時被調度策略丟棄);最終定位到該分片所有副本均無法提供服務。修復服務後,預期結果正常召回。","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/12/1203c0b4c417a149de2de08e563a1a0c.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"3.2.3.2 
鏈式分析:有狀態服務導致“誤中副車”型效果問題","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"有狀態服務效果問題分析的複雜性:以最常見的cache服務爲例。如果沒有cache,只需根據效果異常的queryID查詢調用鏈和日誌即可定位異常原因。但顯然搜索在線系統不可能沒有cache,且通常cache數據會輔以異步更新機制。此時,命中了髒cache的query只是“受害者”,它的調用鏈和日誌無法用於問題的最終定位,需要分析時序上前一個寫cache的query的調用鏈和日誌,我們稱這個query爲“搗亂者”。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/07/075d70e115cafaebbbd12a86192520e4.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"kepler1.0的侷限性:kepler1.0採樣算法是隨機比例抽樣,“搗亂者”和“受害者”兩個query是否命中抽樣是獨立事件。由於“搗亂者”在先,當“受害者”受到效果影響時,已無法倒流時間觸發前者的抽樣,導致兩個query在“時序”維度構成的trace鏈條中斷,追查也隨之陷入困境。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"kepler2.0的破解之法:在實現“縱向關聯”(某一query處理過程中全量調用鏈和日誌信息)基礎上,藉助全量調用鏈建設了“橫向關聯”能力,支持了對時序上多個關聯query的鏈式追蹤需求。寫cache時將當前query的queryID記錄到cache結果中,讀cache的query就可通過cache結果中的queryID找到“搗亂者”。藉助全量調用鏈功能即可對“搗亂者”寫髒cache的原因進行分析定位。另外,用戶界面也對時序追蹤的易用性進行了特殊設計,例如,對日誌中寫cache的queryID進行飄紅,點擊該字段可以直接跳轉到對應query的調用鏈和日誌查詢頁面。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"小結","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"te
xt","text":"以上,極致的數據建設解決了問題追查的死角,此時問題分析效率成爲主要矛盾,下篇我們爲大家帶來百度搜索如何通過對人工分析經驗進行抽象,實現自動化、智能化的故障問題,從而保障百度搜索穩定性。未完待續,敬請期待……","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"本期作者 | ZhenZhen;LiDuo;XuZhiMing","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"招聘信息","attrs":{}},{"type":"text","text":":","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"關注同名公衆號百度Geek說,輸入內推即可加入搜索架構部,我們期待你的加入!","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"推薦閱讀","attrs":{}},{"type":"text","text":":","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"|","attrs":{}},{"type":"link","attrs":{"href":"http://mp.weixin.qq.com/s?__biz=Mzg5MjU0NTI5OQ==&mid=2247495229&idx=1&sn=b3cfdbcf0a5ebcc44d673dfbc8f83196&chksm=c03ede41f74957571ed3ef7e2a1f2f5f128bb3e8b9466039e0321a3340b49e8d823ae4f21d10&scene=21#wechat_redirect","title":null,"type":null},"content":[{"type":"text","text":"百度關於微前端架構EMP的探索:落地生產可用的微前端架構","attrs":{}}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"|","attrs":{}},{"type":"link","attrs":{"href":"http://mp.weixin.qq.com/s?__biz=Mzg5MjU0NTI5OQ==&mid=2247495152&idx=1&sn=11b4052ed004b010394851423b6c98c5&chksm=c03edd8cf749
549a5810740daa40b1496be3167e4785f19dbd515c3dcda92c3f22233398c83f&scene=21#wechat_redirect","title":null,"type":null},"content":[{"type":"text","text":"社羣編碼識別黑灰產攻擊實踐","attrs":{}}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"|","attrs":{}},{"type":"link","attrs":{"href":"http://mp.weixin.qq.com/s?__biz=Mzg5MjU0NTI5OQ==&mid=2247494914&idx=1&sn=6aeb11be56935107a7eee7618f619cf2&chksm=c03edd7ef74954680cf3f79464bca4e7755598fca29c6d6d58a700508f646a8c795d54727685&scene=21#wechat_redirect","title":null,"type":null},"content":[{"type":"text","text":"PornNet:色情視頻內容識別網絡","attrs":{}}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"---------- END ----------","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"百度Geek說","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"百度官方技術公衆號上線啦!","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"技術乾貨 · 行業資訊 · 線上沙龍 · 行業大會","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"招聘信息 · 內推信息 · 技術書籍 · 百度周邊","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"歡迎各位同學關注","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}}]}