Building Uber's Large-Scale Real-Time Data Intelligence Platform

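{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"italic"}],"text":"This article was originally published on the official Uber blog and was translated and published by InfoQ China with permission."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"At Uber, real-time data (ride requests, available drivers, weather, games, and so on) lets operations teams make informed decisions, such as "},{"type":"link","attrs":{"href":"https:\/\/www.uber.com\/drive\/partner-app\/how-surge-works\/","title":"","type":null},"content":[{"type":"text","text":"surge pricing"}]},{"type":"text","text":", maximum dispatch ETA calculation, and forecasting of supply and demand for our services, all of which improve the user experience on the Uber platform. Batch data can provide powerful insights by identifying medium- and long-term trends, but Uber services can combine streaming data with real-time processing to create actionable insights on a minute-by-minute basis."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Gairos is Uber's real-time data processing, storage, and querying platform, built to support efficient data exploration at scale. With this data intelligence, teams can better understand the Uber Marketplace and improve its efficiency. Example use cases include surge pricing, maximum dispatch ETA calculation, and demand and supply forecasting."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To ensure that Gairos can keep optimizing its performance across a growing portfolio of use cases, we re-architected the platform for better scalability, stability, and sustainability. Of the optimization strategies we applied, the two with the greatest impact were data-driven sharding with query routing, and intelligent caching."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"With data-driven sharding and query routing, the platform can support four times as many concurrent queries as our previous solution. Some critical clusters have even stabilized from roughly one outage per month to zero. Since its launch in December 2018, the platform has grown to more than 10 times its original scale with the help of intelligent caching, with a cache hit rate above 80%."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Why Gairos?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In the Uber ecosystem, every team used to run its own data pipelines and query services for its own use cases, and had to keep looking after them (monitoring, alerting, maintaining the stream-processing framework behind the solution, and so on) instead of focusing on system optimization. Gairos 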
provides a unified platform for real-time data processing, storage, and querying, so teams can put more of their effort into system optimization. Users can concentrate on the business logic of their own systems rather than on the generic tasks of a real-time data system. Gairos does the following:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Lets users query data at a high level without worrying about the low-level details of the data layer, such as potentially heterogeneous data sources, query optimization, data processing logic, and indexing schemes."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Lets the Gairos team experiment with and evolve the data layer without affecting consumers (through a domain-specific data abstraction layer)."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Optimizes per use case, whether the calls are high-throughput and low-latency or are offline batch-processing\/modeling calls."}]}]}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Gairos overview"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As shown in Figure 1, Gairos ingests data from different Apache Kafka topics and writes it to different Elasticsearch clusters."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/7b\/7be1e4a41c9c48d0bba4110a120ae7c9.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"Figure 1: A simplified architecture of Gairos showing the main components of the platform"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The Gairos query service is the gateway for querying data in these Elasticsearch clusters. Gairos clients send queries to the Gairos query service to get data in real time. For longer-term analysis with Apache Hive and Presto, the data is also persisted to HDFS 
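storage."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Gairos consists of several systems: Apache Kafka, the Gairos ingestion pipelines, the Elasticsearch clusters, the Gairos query service, and others. A problem in any one of them affects customers. The number of Gairos data pipelines grows with the scale of the Uber Marketplace business, and more and more data sources have to be added to Gairos to support new business use cases."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Use cases at Uber"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A number of insight-gathering use cases at Uber are built on Gairos, including:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Surge pricing: the surge pricing service reads demand and supply data and computes, per hexagon, the surge multiplier for a specific location and time."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Generating driver surge-pricing and carbon-emission suggestions from real-time supply and demand data."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"We refer to the underlying data of each trip as a session, which begins when a user opens the Uber app. That action triggers a series of data events, from the driver actually accepting the ride through to the end of the trip. Given the complexity and scale of our systems, this data is spread across many different event streams."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For example, when a driver opens the Uber app, an event is fired on the driver event stream. The app displays the trips offered in that area (uberPOOL, uberX, UberBLACK, and so on) along with the price of each, as generated by our surge pricing system, and every trip price appears as a separate event on the impression event stream. Once the driver accepts a trip, the request goes to our dispatch system, which matches the rider with the driver and assigns the driver's vehicle to the trip. When the driver picks up the rider, the app sends a “pickup completed” event to the dispatch system, which effectively starts the trip. When the driver reaches the destination, the app sends a 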
“trip completed” event and indicates in the app that the rider has been dropped off."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A typical trip lifecycle can span six different event streams, with events generated by the rider app, the driver app, and Uber's back-end dispatch servers. These separate event streams are threaded together into a single Uber trip."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Processing and querying these disparate streams in real time, so that our services can act quickly on data insights, is a challenge."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Driver state transitions"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The figure below (Figure 2) shows aggregated driver state transitions in San Francisco within a user-defined time window. This is the result of a single query, returned within one second."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/03\/03c365c3572e04df22d244679023975d.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"Figure 2: The dispatch query service fetches driver state transition data for San Francisco from Gairos and displays it"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"State transitions for a single driver"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The figure below (Figure 3) shows all state transitions of a single driver app in San Francisco within a user-defined time window. The query is the same as the previous one, with one additional filter that matches the given driver app 
UUID."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/b9\/b90bf076b541be25fcc82ede192be872.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"Figure 3: The dispatch service fetches and displays driver app state data for a given driver"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Driver utilization by geographic location"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The figure below (Figure 4) shows driver utilization by geographic location."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/d8\/d808e854d80d04258df55b678fea7836.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"Figure 4: Driver utilization by geographic location"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Finally, let's look at how surge pricing works using data from Gairos."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Sometimes so many people request rides that there are not enough cars on the road to take them all. Bad weather, rush hour, and special events, for example, can cause an unusually large number of people to want, all at the same time, to ride with 
Uber. When demand is very high, fares may go up to help ensure that those who need a ride can get one. This is known as surge pricing."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To compute the surge multiplier for a hexagon defined by "},{"type":"link","attrs":{"href":"https:\/\/eng.uber.com\/h3\/","title":"","type":null},"content":[{"type":"text","text":"H3"}]},{"type":"text","text":" ("},{"type":"link","attrs":{"href":"https:\/\/eng.uber.com\/h3\/","title":"","type":null},"content":[{"type":"text","text":"Uber's hexagonal hierarchical spatial index"}]},{"type":"text","text":"), the latest number of ride requests (demand) and the number of available drivers (supply) are queried from Gairos. This data is fed into the pricing model, which produces the surge multiplier for that location. Figure 5 shows the surge multipliers of different hexagons around the Oakland stadium while a game was in progress."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/33\/3365b126d3de00ad28744182a7a98eeb.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"Figure 5: Surge multipliers of different hexagons around the Oakland stadium during a game"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Scalability \/ reliability challenges"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"With the initial implementation of Gairos, we ran into a number of technical challenges and unforeseen problems."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As the number of use cases on Gairos grew, so did the volume of real-time data flowing through it. To give a sense of scale, Gairos serves more than 1,500 TB of queryable data and runs more than 30 production pipelines. In total it holds 4.5 trillion records across more than 20 clusters. Gairos 
handles more than one million events per second. It is increasingly important to make the system more stable, scalable, and sustainable so that it can power a growing number of use cases."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Below we highlight some of the technical challenges that emerged once we started scaling Gairos:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Multiple use cases sharing one cluster make the cluster unstable. A significant change in one use case can affect every other use case in that cluster. For example, if the input data volume of one use case doubles, data availability for the other use cases suffers."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Ingestion pipeline lag is a common challenge for all real-time pipelines. SLAs (service level agreements) are usually tight, ranging from a few seconds to a few minutes. If any component in a pipeline slows down, the result is lag and missed SLAs."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Query performance degrades when some clients have traffic spikes. Because the system is multi-tenant, a sudden spike can affect other queries running in the same cluster."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Some data sources are no longer used. Once a use case has been onboarded to Gairos, there is no automated way to check whether it is still in use. When data is not being used, it is better to free up the resources for other use cases."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Some heavy queries slow down an entire Elasticsearch cluster."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Elasticsearch 
cluster master nodes go down. This can have many causes: network instability, metadata that has grown too large to manage, and so on."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Some nodes have very high CPU load. These nodes are hotspots: the shards or read\/write traffic they handle exceed what their resources (CPU \/ memory \/ network) can reasonably handle."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Some nodes crash, possibly because of disk failures or other hardware failures."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Some shards are lost. When multiple nodes go down at the same time, shards that exist only on those nodes become unavailable, and we may lose the data in those shards."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"On-call engineers are frequently paged to repair these pipelines and systems, which is costly."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The main problem with our first iteration of Gairos, however, was that information about how Gairos data was being used never flowed back into Gairos to guide optimization and continuous improvement of the system. Gairos did not actively check whether data was being used as intended, nor could it adapt to changes (in traffic patterns, query patterns, and so on). With the Gairos self-optimization project we closed the loop (Figure 6), letting user queries drive optimization and making Gairos more stable, scalable, and sustainable."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/c5\/c544193d1b1b1517e1340b3b6da38853.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"Figure 6: The new architecture closes the loop for Gairos with a new data flow, indicated by the red arrow"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To make the Gairos 
platform more stable, more scalable, and cheaper to maintain, the system has to become more efficient and more intelligent."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/5e\/5e39578f6141164f6afb8ab1467eb9d4.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"Figure 7: The high-level architecture shows the data flow in the platform. Red arrows represent the new data flow, and the light green components are the two new additions for optimization: the Gairos Query Analyzer and the Optimization Engine"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The revised high-level architecture is shown in Figure 7 above. The main components of the system are:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Clients"},{"type":"text","text":": A Gairos client can be a service, a dashboard, a data analyst, and so on."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Apache Kafka"},{"type":"text","text":": We use Apache Kafka as the message queue for events from services, RT-Gairos queries, and the metrics and events of the Gairos platform."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Gairos-Ingestion"},{"type":"text","text":": The Gairos-Ingestion component ingests data from different data sources and publishes events to Gairos."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Elasticsearch 
clusters"},{"type":"text","text":": These clusters store the output data of the Gairos-Ingestion pipelines."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"RT-Gairos (Real-time Gairos)"},{"type":"text","text":": RT-Gairos is the Gairos query service. It acts as the gateway to all Elasticsearch clusters."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Query Analyzer"},{"type":"text","text":": The Gairos Query Analyzer analyzes the queries collected from RT-Gairos and provides insights to our Optimization Engine."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Optimization Engine"},{"type":"text","text":": The Gairos Optimization Engine optimizes the Gairos ingestion pipelines, the Elasticsearch cluster \/ index settings, and RT-Gairos, based on query insights and system statistics. For example, how many containers does an ingestion pipeline need to meet its SLA 99% of the time? How many shards are needed to handle the write \/ read traffic?"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Below we go into more detail on what each of these components is responsible for in the overall Gairos ecosystem."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Clients"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Clients can be services or non-service users such as data analysts."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Service clients include every real-time service that relies on Gairos to serve users, among them our surge pricing and trip forecasting services. These services send events to Apache Kafka for downstream services and pipelines to process. While serving a request, they can query data from Gairos and then make a decision. For example, a forecasting service may need to run queries to improve its predictions of driver-partner demand and supply during a high-traffic event, or our surge pricing service may use Gairos 
to determine surge multipliers based on demand, supply, and some forecast inputs."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Apache Kafka"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Apache Kafka is a distributed streaming platform that lets clients publish \/ subscribe to event streams. All real-time services can send important events to it for downstream services \/ pipelines to consume. RT-Gairos also uses it to collect all queries that run in Gairos."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Gairos-Ingestion (processing layer)"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Gairos-Ingestion is an ingestion framework that processes data from different data sources and publishes it to Gairos. Some data sources use Apache Spark Streaming."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Elasticsearch (Gairos storage layer)"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Elasticsearch, the Gairos storage layer, indexes the data from the more than 30 different data sources that Gairos-Ingestion consumes and keeps it ready for queries from Gairos clients."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"RT-Gairos (query layer)"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"RT-Gairos is the gateway to Gairos. Every query passes through it before reaching the Gairos storage layer. RT-Gairos enforces access control, provides routing, and caches some query results. It also collects all queries sent to Gairos and pushes them to an Apache Kafka topic."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Query Analyzer"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The Query Analyzer analyzes the queries collected from RT-Gairos and produces insights that serve as input to the Gairos Optimization Engine. As a first step, it uses simple techniques to derive query patterns (filter metrics, aggregations, time range, number of shards, number of indices)."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Optimization Engine"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Based on system statistics and the query insights obtained from the Query Analyzer, the Gairos Optimization Engine uses its lifecycle knowledge base to recommend optimizations. These update the Gairos settings for the ingestion path, RT-Gairos, and 
Elasticsearch."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Some setting changes may need to be benchmarked to find out whether the KPIs would improve before the change is applied. For example, what is the optimal number of shards for a data source? This is where the index benchmarking service comes in."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Index benchmarking service"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To optimize Gairos settings, we need a benchmarking tool that compares different settings against defined KPIs (read \/ write throughput, latency, memory usage, and so on)."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Figure 8 gives an overview of the components of the Gairos benchmarking service."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/71\/718a35b99e498d539e5c197759fa44bc.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"Figure 8: The index benchmarking service runs tests and stores the results"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"These components are:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Elasticsearch Production Clusters"},{"type":"text","text":": The Elasticsearch 
production clusters hold the production data that is copied to staging for load testing. Production indices can serve as the baseline for benchmarks."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Elasticsearch Staging Clusters"},{"type":"text","text":": These clusters store test data, that is, randomly generated data or production data used for experiments."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Benchmarking Service"},{"type":"text","text":": The benchmarking service accepts different settings for an index and benchmarks the index under each of them. When the tests finish, the results are made available to other services."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Load Test Tool"},{"type":"text","text":": Given a large set of read \/ write requests, this tool simulates different levels of read \/ write QPS (queries per second) and records the KPIs. Reads are replayed from the queries that RT-Gairos collects in production. Writes are simulated from the relevant Apache Kafka topics used in production, or from topics published to directly."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The Gairos benchmarking service accepts requests from the Gairos Optimization Engine and runs the benchmarks. To run faster and use fewer resources, it copies a single index from production to staging rather than the entire history. If the performance of each individual index improves, the overall performance of the data source improves as well, because queries against different indices are executed independently. After evaluating the test results, the Optimization Engine can decide whether to change the index settings in production."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As Figure 7 shows, the overall system involves quite a few steps:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":1,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"Gairos clients send requests to RT-Gairos 
to fetch data."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"Gairos-Ingestion ingests data from Apache Kafka topics and publishes it to the Elasticsearch clusters."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":3,"align":null,"origin":null},"content":[{"type":"text","text":"Gairos indexes the data and makes it ready for queries."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":4,"align":null,"origin":null},"content":[{"type":"text","text":"RT-Gairos translates the requests into Elasticsearch queries and fetches the data from the Elasticsearch clusters."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":5,"align":null,"origin":null},"content":[{"type":"text","text":"RT-Gairos sends the data back to the clients."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":6,"align":null,"origin":null},"content":[{"type":"text","text":"RT-Gairos sends query information to an Apache Kafka topic."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":7,"align":null,"origin":null},"content":[{"type":"text","text":"Elasticsearch cluster data is sampled periodically and the information is sent to an Apache Kafka topic."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":8,"align":null,"origin":null},"content":[{"type":"text","text":"The Query Analyzer pulls query information from the query Apache Kafka topic for analysis."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":9,"align":null,"origin":null},"content":[{"type":"text","text":"The Optimization Engine pulls Gairos platform statistics from an Apache Kafka topic for analysis."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":10,"align":null,"origin":null},"content":[{"type":"text","text":"The Optimization Engine pulls, from the Query Analyzer, 
Gairos query insights to see whether any action needs to be taken."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":11,"align":null,"origin":null},"content":[{"type":"text","text":"The Optimization Engine pushes optimization plans to the different components of the Gairos platform."}]}]}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Optimization strategies"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"We applied several optimization strategies that other organizations can also use to optimize their own real-time intelligence platforms."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Sharding and query routing."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Caching based on query patterns and signatures."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Merging indices."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Handling heavy queries."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Index template optimization."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Shard optimization."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Bounding the index range."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"inden
t":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Purging unused data."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"We go through each of them in detail."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Sharding and query routing"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Sharding partitions data by some key so that documents with the same key land in the same shard. When writing to an Elasticsearch index, the key must be supplied so that the document is placed in the correct shard. When querying, if the key is specified in the query, the query can be sent to the specific shard instead of to all shards. Reducing the number of nodes a query has to touch lowers latency and improves resilience (if a single node is down but the query does not need it, the query is unaffected)."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Suppose we want to send a promotional offer to all drivers in San Francisco and need the list of drivers. In Figure 9 below, we query for all drivers in San Francisco. In the top half, the data is not sharded by city, so the query has to go to all four shards to check whether they hold any matching drivers. In the bottom half, the data is sharded by city, so the query only has to retrieve data from the shard that contains the San Francisco drivers. The number of shard queries drops from 4 to 1."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/5e\/5e5a97dca5ee5606719a01e4949ca7e5.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"Figure 9: Without sharding, a query for drivers in San Francisco has to query all shards; with sharding, it queries only one shard"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A common problem with sharding is hotspots (some shards have to handle far more write \/ query traffic than others). For example, if we use the city ID 
to distribute aggregated, anonymized driver-partner data, some cities (San Francisco among them) are far larger than small cities, which overloads particular shards or nodes. To help with allocation decisions and load distribution, it is important to keep shards roughly equal in size and utilization."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The factors to consider when sharding are:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Write QPS"},{"type":"text","text":": Shards should be able to handle peak write traffic."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Read QPS"},{"type":"text","text":": Shards should be able to handle peak query traffic."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Filters"},{"type":"text","text":": The top x most frequently used filters in queries. The top filters are candidates for the shard key. A candidate filter must have enough distinct values."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"SLA"},{"type":"text","text":": Whether the use case is analytical or real-time."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Shard Size"},{"type":"text","text":": We recommend keeping the shard size under 60 GB."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The number of shards is calculated from the write\/read QPS and the shard size. The process for finding a shard key is shown below (Figure 
10)。一旦確定了分片鍵,我們就使用歷史數據來檢查分片分佈是否在 Gairos 給定的閾值內。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/de\/dea2c86938787ad4ac4f3c8f9967a139.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"下圖 11 所示是一個簡化的分片示例。對於本例,假設每個節點可以處理 3000 個 Write QPS,並且最多可以存儲 60 GB 數據。僅考慮數據大小和峯值 Write QPS。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/d6\/d6199559c8caca653c656fde562a2c55.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"圖 11:基於給定的約束條件,將四個具有不同數據量(即我們平臺上的用戶量較大)和不同 QPS 的城市劃分爲四個分片。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"分片必須滿足以下約束:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"每個分片的峯值 Write QPS <= 3000 QPS"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"每個分片的數據大小 <= 60 
GB"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"這樣做的目的是將數據儘可能均勻地分佈在這些分片上。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"根據每個城市的數據大小,我們可以估算出分片的數量:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Shard # based on data size"},{"type":"text","text":" = (30 GB + 50 GB + 80 GB + 20 GB) \/ 60 GB = 180 GB \/ 60 GB = 3"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"根據峯值 QPS,我們可以得到另一個估計的分片數量:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Shard # based on peak QPS"},{"type":"text","text":" = (2k + 3k + 5k + 1k) \/ 3k = 11k \/ 3k ≈ 3.67,向上取整爲 4"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"取這兩個估計值中的最大值:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Shard #"},{"type":"text","text":" = max(3, 4) = 
4"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"這四座城市將被放在四個分片中。舊金山和南達科他州可以放在同一個分片裏。洛杉磯可以單獨放在一個分片裏。紐約可以分成兩個分片。這樣數據就會更均勻地分佈在不同的分片中,同時每個節點都可以容納數據並處理峯值 QPS。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如果要查詢舊金山的司機,只需查詢 1 號分片。而查詢紐約的司機,則需要同時查詢 3 號和 4 號分片。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"爲了緩解分片傾斜和熱點問題,我們爲 Gairos 開發了一種定製分片算法。下表 12 對比了默認分片(之前)和我們的分片算法(之後)下每個分片的最大和最小文檔數。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"embedcomp","attrs":{"type":"table","data":{"content":"
分片算法 | 每個分片的最大文檔數 | 每個分片的最小文檔數 | 最大 \/ 最小
之前 | 4700 萬 | 1700 萬 | 2.76
之後 | 3000 萬 | 2300 萬 | 1.3"}}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"表 12:我們的分片算法生成的分片在文檔數量上差異較小,文檔在分片上分佈更均勻"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"可以看出,文檔在這些分片中分佈得更均勻。在 Gairos 的默認分片算法中,每個分片的最大與最小文檔數之比是 2.76;而我們的定製分片算法是 1.3。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"爲了評估延遲以及能夠支持的併發用戶數,我們做了一些基準測試。需求數據源的結果如下。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"圖 13 顯示了不同數量客戶端下的延遲。可見,有分片的數據延遲比沒有分片的低。客戶端數量越多,差異就越大。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/63\/63ced927562c0d8ff9b662ff86c903c3.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"圖 13:對於需求數據來說,使用分片的延遲要低得多,並且隨着客戶端數量的增加,這種差異將會越來越大,這在本用例中已經有描述"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"圖 14 顯示了不同客戶端數量下它能夠支持的併發用戶數。可以看到,有分片時的最高 QPS 大約是沒有分片時的 4 
倍左右。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/f7\/f7282cc4da9dbdc5c45995fdebdeb6a0.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"圖 14:有分片的 QPS 是沒有分片的 QPS 的 4 倍"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"下面我們分享一下第二個數據源"},{"type":"codeinline","content":[{"type":"text","text":"supply_geodriver"}]},{"type":"text","text":"的一些優化結果。相對於需求數據源(存儲乘客請求),該數據源的文檔數量更多,數據量更大。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/cb\/cbae4eb8a2efed4272f55a8de9d7938d.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"圖 15:使用分片的延遲較高,並且差異隨着客戶端數量的增加而增加"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如圖 15 所示,使用分片後,平均延遲較差。但就其能夠支持的併發用戶數量而言,使用分片時約爲未使用分片時的 4 倍,見下圖 
16。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/0b\/0bd032e5262ad84d670346778766e27d.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"圖 16:有分片的 QPS 是沒有分片的 QPS 的 4 倍"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"第三個數據源是"},{"type":"codeinline","content":[{"type":"text","text":"supply_status"}]},{"type":"text","text":"。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/c7\/c70c301f13e0f26a572a5bb419ccd334.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"圖 17:當客戶端數量較少時,使用分片的延遲較高,當客戶端數量超過 200 時,延遲就會降低"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/bb\/bb3be254a6b677eade201c96a9129cf6.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"圖 18:有分片的 QPS 是沒有分片的 QPS 的 4 
倍"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"圖 17 表明,當客戶端數量較少時,使用分片的平均延遲較高。當客戶端數量超過 200 的情況下,平均延遲會降低。從圖 18 中 可以看出,使用分片時,它可以支持的併發用戶最多,大約是沒有分片時的 4 倍。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"綜上所述,延遲對於某些大型數據源來說可能更糟糕,而且它能夠支持的併發用戶數量總是 4 倍於不分片的數量。要想在某些大型數據源中獲得延遲和可擴展性,我們可以爲每一個分區調整分區大小。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/a0\/a0a4558dce9b3a1d8f530c490060a389.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"圖 19:動態定價集羣的 CPU 負載呈現出每日模式,並在一天中隨着時間的推移而增加。峯值 CPU 負載從 60 降低到 10,每個節點的負載在一天中略有變化"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"作爲分片策略的副產品,我們能夠穩定定價集羣,如圖 19 所示。由於所有索引都是日索引,所以我們的定價集羣中的節點的 CPU 負載都呈現每日模式。在一天的時間裏,我們可以看到 CPU 負載隨着時間的推移而增加。將分片策略應用到定價集羣中的所有數據源上,CPU 負載穩定。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"基於查詢模式和簽名的緩存"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"最簡單的緩存方法是緩存所有查詢結果。但由於我們的數據非常大,這些結果的總大小會比原始數據大。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"此外,有些查詢的執行頻率不高,其緩存命中率也比較低。爲了讓緩存更節省資源,我們又引入了兩個概念:查詢簽名和查詢模式。我們先通過一個 
Gairos 查詢的例子,來了解 Gairos 查詢是什麼樣子的。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/99\/991250487f7b860f510b8eb4f7d2f0bd.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Gairos 查詢是一個 JSON 對象,它可能包含下列字段:"},{"type":"codeinline","content":[{"type":"text","text":"data source"}]},{"type":"text","text":"、"},{"type":"codeinline","content":[{"type":"text","text":"granularity"}]},{"type":"text","text":"、"},{"type":"codeinline","content":[{"type":"text","text":"by"}]},{"type":"text","text":"、"},{"type":"codeinline","content":[{"type":"text","text":"filter"}]},{"type":"text","text":"、"},{"type":"codeinline","content":[{"type":"text","text":"aggregations"}]},{"type":"text","text":"、"},{"type":"codeinline","content":[{"type":"text","text":"bucketBy"}]},{"type":"text","text":"、"},{"type":"codeinline","content":[{"type":"text","text":"sort"}]},{"type":"text","text":"、"},{"type":"codeinline","content":[{"type":"text","text":"limit"}]},{"type":"text","text":"、"},{"type":"codeinline","content":[{"type":"text","text":"having"}]},{"type":"text","text":"等。在定義簽名時,只使用以下字段:"},{"type":"codeinline","content":[{"type":"text","text":"datasource"}]},{"type":"text","text":"、"},{"type":"codeinline","content":[{"type":"text","text":"granularity"}]},{"type":"text","text":"、"},{"type":"codeinline","content":[{"type":"text","text":"by"}]},{"type":"text","text":"、"},{"type":"codeinline","content":[{"type":"text","text":"filter"}]},{"type":"text","text":"、"},{"type":"codeinline","content":[{"type":"text","text":"aggregations"}]},{"type":"text","text":"、"},{"type":"codeinline","content":[{"type":"text","text
":"bucketBy"}]},{"type":"text","text":"、"},{"type":"codeinline","content":[{"type":"text","text":"sort"}]},{"type":"text","text":"、"},{"type":"codeinline","content":[{"type":"text","text":"limit"}]},{"type":"text","text":"。查詢簽名由這些字段生成,並對每個字段進行排序。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"查詢模式的定義與字段集相同。惟一的區別是查詢模式只考慮所使用的列,而忽略了過濾器中的操作符和值。對於 Gairos 查詢,可以使用查詢模式和簽名進行更有效的分析。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"根據查詢模式,我們可以定義 RT-Gairos 的緩存規則,以便 RT-Gairos 能夠對常用的查詢結果進行緩存。舉例來說,客戶端以固定的時間間隔(1 分鐘、5 分鐘、1 小時等)來拉取最近兩週的數據。若能按天緩存數據,則索引命中率將大大提高,緩存可用於改進搜索性能。可以對範圍重疊的重複查詢以及基於查詢模式的時間粒度應用相似的策略。爲提高緩存命中率,需要對查詢進行分片,在此過程中,如果查詢是可分片的,每一個查詢都會根據查詢的時間範圍分片成多個小查詢。有些聚合不能從單一子查詢結果中獲取聚合結果。這些查詢將存儲在 Elasticsearch 集羣而非緩存中。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"下面我們重點介紹一下緩存"},{"type":"codeinline","content":[{"type":"text","text":"rider_sessions"}]},{"type":"text","text":"(樣本數據集)的一些基準測試結果:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/1c\/1cea94f14622302e1f2f5cce3e76e657.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"圖 
20:使用緩存的延遲要低得多,而且隨着客戶端數量的增加,差異也會顯著增加。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/0d\/0d01edebcc2fc9ef9d8fdb01194b8d15.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"圖 21:有緩存的 QPS 是沒有緩存的 QPS 的 10 倍"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如圖 20 所示,當我們將緩存應用於這些查詢時,平均延遲會大大降低。從圖 21 可以看出,有緩存時能夠支持的併發用戶數量約爲沒有緩存時的 10 倍。因爲大部分針對"},{"type":"codeinline","content":[{"type":"text","text":"rider_sessions"}]},{"type":"text","text":"的查詢開銷都很大,所以我們將在其他數據源上進行更多的測試,以驗證我們得到的結果。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"supply_status"}]},{"type":"text","text":"的緩存統計如圖 22 所示。可以看出,"},{"type":"codeinline","content":[{"type":"text","text":"supply_status"}]},{"type":"text","text":"的命中率在 80% 以上。命中 QPS 在 50 左右,而寫入(set)QPS 在 10 左右。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/65\/65c7e0ef655ccd9beb648bcb1ede704f.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"圖 22:"},{"type":"codeinline","content":[{"type":"text","text":"supply_status"}]},{"type":"text","text":"的緩存命中率很高,命中 QPS 在 50 左右,而寫入(set)QPS 在 10 
左右"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"另一個數據源"},{"type":"codeinline","content":[{"type":"text","text":"demand_jobs"}]},{"type":"text","text":"如圖 23 所示。命中率爲 80%。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/38\/38f6a252c34243292630ba1646fa9f17.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"圖 23:"},{"type":"codeinline","content":[{"type":"text","text":"demand_jobs"}]},{"type":"text","text":"的緩存命中率在 80% 左右,命中有一些峯值"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/4f\/4f9df377cf71ecd69377302534b8694b.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"圖 24:"},{"type":"codeinline","content":[{"type":"text","text":"supply_geodriver"}]},{"type":"text","text":"的緩存命中率在 30% 左右"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/b2\/b2bb6cf76b4f4e0c709111b93adc2ccd.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"圖 
25:對於需求,根本沒有緩存命中"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"最後,從圖 25 可以看出,需求的緩存命中率爲 0。可見,對於不同的數據源,使用緩存的效果有很大差異。爲了提高緩存命中率,我們計劃做更多的調整。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"合併索引"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Elasticsearch 使用倒排索引來加快搜索速度。當刪除一個文檔時,該文檔只是被標記爲已刪除,它仍然存在於倒排索引中。已刪除的文檔會從搜索結果中排除。若已刪除的文檔數量較多,索引就會偏大。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"這些已刪除的文檔還會影響搜索性能。例如,在圖 26 中,司機 D1、D2、D3 都更新了多次,而每次更新都會將舊文檔標記爲已刪除。可以看到,索引中有 8 個文檔,而司機只有 3 個。合併索引之後,這些已刪除的文檔將被清除,索引的大小也隨之減小。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/47\/4732af678fec6d89bfcdd746ad10bff1.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"圖 
26:司機 D1、D2、D3 多次更新後,索引中共有 8 個文檔;合併索引後將清除已刪除的文檔"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"另一個影響索引性能的重要因素是索引中段(segment)的數量。我們會做一些基準測試,以確定應在何時合併索引。該基準測試使用的關鍵指標是索引大小(存儲索引需要多少存儲空間)和搜索延遲(查詢數據所需的時間)。從線上系統收集到的查詢將用於搜索性能基準測試。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在確定了每個數據源的合併索引標準之後,優化引擎可以執行一些索引優化任務。集羣將限制合併索引任務的數量,使其不會對集羣性能造成太大影響。爲了防止性能顯著下降,任何時候最多隻能運行一個合併索引任務。若發現有任何明顯的影響,所有強制合併的任務將被中止。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"處理繁重的查詢"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"某些重度查詢可能會影響整個集羣的性能。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"爲了使集羣更加穩定,可採取下列策略:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":1,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"分割查詢"},{"type":"text","text":":將涉及多個索引的查詢拆分成多個小查詢,可以限制任意時刻查詢的分片數量。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"速率限制"},{"type":"text","text":":識別重度查詢模式並限制重度查詢的速率,可以提高集羣的性能。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":3,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"tex
t":"緩存或創建滾動表"},{"type":"text","text":":對於一些命中率較高的查詢,可以考慮使用緩存或滾動表來提高性能。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":4,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"遷移到 Hive\/Presto"},{"type":"text","text":":對於批量使用的情況,有些可能會遷移到 Hive\/Presto。"}]}]}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"索引模板優化"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"從運行的查詢中,可以得到每個數據源中每個字段的以下信息,這樣我們就可以爲每個字段確定索引設置。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":1,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"是否使用?"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"是否用於過濾?"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":3,"align":null,"origin":null},"content":[{"type":"text","text":"是否用於聚合?"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":4,"align":null,"origin":null},"content":[{"type":"text","text":"是否需要模糊搜索?"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"每個數據源都必須回答以下問題:用戶是否需要拉取原始數據。基於這些輸入,可以爲每個數據源獲得最佳索引設置。可以將優化引擎更新爲數據源存儲的模板,以便我們能夠獲得更好的磁盤空間或搜索性能。某些設置(例如禁用源文件)是不向後兼容的,在執行之前需要經過一些批准。注意,禁用源文件會導致無法進行更新和重新編制索引。不應該在業務邏輯需要更新文檔時禁用源代碼。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":
0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"因爲數據是自動持久化的,並且很容易重放,所以發佈的 Apache Kafka 主題在源被禁用前會轉變爲熱管道主題,這樣就可以通過重放 Apache Kafka 持久主題中的事件來遷移數據。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"圖 27 顯示了確定每個字段設置的詳細工作流程。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/b6\/b69415fbe4dc93081e10c400091809ca.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"圖 27:根據使用情況,確認各字段的設置"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"分片優化"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"每一個數據源複製一個到暫存集羣的索引,然後使用索引工具將複製的索引重新索引到不同數量的分片。對於已複製和已重新索引的數據,基準測試將收集性能數據。所用查詢將來自用戶以前收集的查詢。在爲每個數據源確定了最佳分片後,查詢優化可以在新索引中設置新分片編號。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"界定索引範圍"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"據觀察,許多非常小的索引是在一些集羣中生成的。這類索引有大量的分片。這在集羣中引起了分片分配問題。一些節點可能有許多未使用的分片,而一些節點可能有許多繁忙分片,這會導致節點之間的負載不平衡和資源利用率低。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"由於時間戳出界,通常會出現這些小的索引。在寫入 Elasticsearch 集羣時,根據每個數據源的數據保留和數據預測對數據進行過濾,從而避免創建這些接近空的索引,減少分片的數量。這裏有一個羣集示例。下面的圖 28 顯示在我們的一個集羣中,經過清理這些小索引之後,分片數量從 4 萬左右下降到 2 
萬。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/02\/028fe503550e205d3c85cf260f292716.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"圖 28:清理完這些小索引後,分片數量從 4 萬左右下降到 2 萬"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"清除未使用的數據"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"所收集的查詢可以確定最近 X 天內是否使用了數據源。基於此信息,Gairos 優化引擎可以執行各種數據清理任務,比如觸發通知,刪除數據存儲中數據源的索引等。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"未來工作"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"這些優化策略已經被應用於一些主要的數據源中。我們正計劃將優化範圍擴展到所有數據源,尤其是那些用於分片和緩存的數據源。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"整個過程不是自動化的。一旦我們從對這些數據源的優化中積累了足夠的領域知識,並應用了這些優化,我們就會投入更多的精力來實現整個過程的自動化。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"一些機器學習\/深度學習方法也可用於查詢分析器,我們將在未來的迭代中對此進行探索。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"作者介紹:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"G
ang Zhao,在 Uber 領導 Gairos 優化,同時專注於存儲層(Elasticsearch)和查詢層的優化。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Wenrui Meng,Uber 高級軟件工程師,致力於實時數據指標優化。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Qing Xu,Uber 市場情報部工程經理。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Yanjun Huang,Uber 核心基礎設施團隊高級軟件工程師,也是 Elasticsearch 專家。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"原文鏈接:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https:\/\/eng.uber.com\/gairos-scalability\/","title":"xxx","type":null},"content":[{"type":"text","text":"https:\/\/eng.uber.com\/gairos-scalability\/"}]}]}]}
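前文「分片和查詢路由」一節中的分片數量估算,可以用下面的 Python 片段粗略演示。這只是一個基於若干假設的示意,並非 Gairos 的實際實現:城市與數據大小、峯值 Write QPS 的對應關係是根據文中公式的順序和分配結果推斷的,紐約平均拆分到兩個分片也是假設,函數名 `estimate_shard_count` 和 `satisfies_constraints` 均爲示例用的虛構名稱。

```python
import math

# 文中示例的約束:每個分片峯值 Write QPS <= 3000,數據大小 <= 60 GB
MAX_WRITE_QPS = 3000
MAX_SIZE_GB = 60

# 假設的對應關係:城市 -> (數據大小 GB, 峯值 Write QPS)
CITIES = {
    "San Francisco": (30, 2000),
    "Los Angeles": (50, 3000),
    "New York": (80, 5000),
    "South Dakota": (20, 1000),
}


def estimate_shard_count(cities, max_size_gb, max_qps):
    """返回 (按數據大小估算, 按峯值 QPS 估算, 兩者中的最大值)。"""
    total_size = sum(size for size, _ in cities.values())
    total_qps = sum(qps for _, qps in cities.values())
    by_size = math.ceil(total_size / max_size_gb)
    by_qps = math.ceil(total_qps / max_qps)
    return by_size, by_qps, max(by_size, by_qps)


def satisfies_constraints(shards, max_size_gb, max_qps):
    """檢查每個分片的 (數據大小, 峯值 QPS) 是否都在約束之內。"""
    return all(size <= max_size_gb and qps <= max_qps for size, qps in shards)


# 文中的分配方案:舊金山 + 南達科他州、洛杉磯、紐約拆成兩個分片(假設平均拆分)
ARTICLE_LAYOUT = [(30 + 20, 2000 + 1000), (50, 3000), (40, 2500), (40, 2500)]

if __name__ == "__main__":
    print(estimate_shard_count(CITIES, MAX_SIZE_GB, MAX_WRITE_QPS))
    print(satisfies_constraints(ARTICLE_LAYOUT, MAX_SIZE_GB, MAX_WRITE_QPS))
```

按數據大小估算得到 3 個分片,按峯值 QPS 估算得到 4 個分片,取最大值 4,與文中的結果一致;文中的四分片方案在上述假設下也滿足兩項約束。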