作業幫PB級低成本日誌檢索服務

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"日誌是服務觀察的主要方式,我們依賴日誌去感知服務的運行狀態、歷史狀況;當發生錯誤時,我們又依賴日誌去了解現場,定位問題。日誌對研發工程師來說異常關鍵,同時隨着微服務的流行,服務部署越來越分散化,所以我們需要一套日誌服務來採集、傳輸、檢索日誌。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"基於這個情況,誕生了以ELK爲代表的開源的日誌服務。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"需求場景"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在我們的場景下,高峯日誌寫入壓力大(每秒千萬級日誌條數);實時要求高:日誌處理從採集到可以被檢索的時間正常1s以內(高峯時期3s);成本壓力巨大,要求保存半年的日誌且可以回溯查詢(百PB規模)。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"ElasticSearch的不足"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"ELK"},{"type":"text","text":"方案裏最爲核心的就是"},{"type":"text","marks":[{"type":"strong"}],"text":"ElasticSearch"},{"type":"text","text":", 它負責存儲和索引日誌, 對外提供查詢能力。"},{"type":"text","marks":[{"type":"strong"}],"text":"Elasticsearch"},{"type":"text","text":" 是一個搜索引擎, 底層依賴了"},{"type":"text","marks":[{"type":"strong"}],"text":"Lucene的倒排索引技術"},{"type":"text","text":"來實現檢索, 並且通過"},{"type":"text","marks":[{"type":"strong"}],"text":"shard"},{"type":"text","text":"的設計拆分數據分片, 從而突破單機在存儲空間和處理性能上的限制。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"• 寫入性能"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"            "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"ElasticSearch寫入數據需要對日誌索引字段的倒排索引做更新,從而能夠檢索到最新的日誌。爲了提升寫入性能,可以做聚合提交、延遲索引、減少refersh等等,但是始終要建立索引, 在日誌流量巨大的情況下(每秒20GB數據、千萬級日誌條數), 瓶頸明顯。離理想差距過大,我們期望寫入近乎準實時。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"• 運行成本"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"            "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"ElasticSearch需要定期維護索引、數據分片以及檢索緩存, 這會佔用大量的 CPU 和內存,日誌數據是存儲在機器磁盤上,在需要存儲大量日誌且保存很長時間時, 機器磁盤使用量巨大,同時索引後會帶來數據膨脹,進一步帶來成本提升。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"•對非格式化的日誌支持不好"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"            "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"ELK需要解析日誌以便爲日誌項建立索引, 非格式化的日誌需要增加額外的處理邏輯來適配。存在很多業務日誌並不規範,且有收斂難度。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"總結:日誌檢索場景是一個"},{"type":"text","marks":[{"type":"strong"}],"text":"寫多讀少"},{"type":"text","text":"的場景, 在這樣的場景下去維護一個龐大且複雜的索引, 在我們看來其實是一個性價比很低的事情。如果採用ElasticSearch方案,經測算我們需要幾萬核規模集羣,仍然保證不了寫入數據和檢索效率,且資源浪費嚴重。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"日誌檢索設計"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"面對這種情況, 我們不妨從一個不同的角度去看待日誌檢索的場景, 用一個更適合的設計來解決日誌檢索的需求, 新的設計具體有以下三個點:"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"1. 日誌分塊"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"同樣的我們需要對日誌進行採集,但在處理日誌時我們不對日誌原文進行解析和索引,而是通過日誌時間、日誌所屬實例、日誌類型、日誌級別等日誌元數據對日誌進行分塊。這樣檢索系統可以"},{"type":"text","marks":[{"type":"strong"}],"text":"不對日誌格式做任何要求"},{"type":"text","text":",並且因爲沒有解析和建立索引(這塊開銷很大)的步驟, 寫入速度也能夠達到極致(只取決於磁盤的 IO 速度)。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/03\/21\/03df897547bb9a63ab8f10980a53f821.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"簡單來說, 我們可以將一個實例產生的同一類日誌按時間順序寫入到一個文件中, 並按時間維度對文件拆分. 不同的日誌塊會分散在多臺機器上(我們一般會按照實例和類型等維度對日誌塊的存儲機器進行分片), 這樣我們就可以在多臺機器上對這些日誌塊併發地進行處理, 這種方式是支持橫向擴展的. 如果一臺機器的處理性能不夠, 橫向再擴展就行。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"那如何對入日誌塊內的數據進行檢索呢?這個很簡單, 因爲保存的是日誌原文,可以直接使用 grep 相關的命令直接對日誌塊進行檢索處理。對開發人員來說, grep是最爲熟悉的命令, 並且使用上也很靈活, 可以滿足開發對日誌檢索的各種需求。因爲我們是直接對日誌塊做追加寫入,不需要等待索引建立生效,在日誌刷入到日誌塊上時就可以被立刻檢索到, 保證了檢索結果的"},{"type":"text","marks":[{"type":"strong"}],"text":"實時性"},{"type":"text","text":"。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"2. 元數據索引"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"接下來我們看看要如何對這麼一大批的日誌塊進行檢索。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"首先我們當日志塊建立時, 我們會基於日誌塊的元數據信息搭建索引, 像服務名稱、日誌時間, 日誌所屬實例, 日誌類型等信息, 並將日誌塊的存儲位置做爲value一起存儲。通過索引日誌塊的元數據,當我們需要對某個服務在某段時間內的某類日誌發起檢索時,就可以快速地找到需要檢索的日誌塊位置,併發處理。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/16\/06\/16fd67096b1b93e91ed5b4eb7914c706.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"索引的結構可以按需構建, 你可以將你關心的元數據信息放入到索引中, 從而方便快速圈定需要的日誌塊。因爲我們只對日誌塊的元數據做了索引, 相比於對全部日誌建立索引, 這個成本可以說降到了極低, 鎖定日誌塊的速度也足夠理想。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"3. 日誌生命週期與數據沉降"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"日誌數據以時間維度的方向可以理解爲一種時序數據, 離當前時間越近的日誌會越有價值, 被查詢的可能性也會越高, 呈現一種冷熱分離的情況。而且冷數據也並非是毫無價值,開發人員要求回溯幾個月前的日誌數據也是存在的場景, 即我們的日誌需要在其生命週期裏都能夠對外提供查詢能力。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"對於這種情況,如果將生命週期內的所有日誌塊都保存在本地磁盤上, 無疑是對我們的機器容量提了很大的需求。對於這種日誌存儲上的需求,我們可以採用壓縮和沉降的手段來解決。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/75\/16\/75f67436650eedd2191c05ae2311d216.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"簡單來說,我們將日誌塊存儲分爲本地存儲(磁盤)、遠程存儲(對象存儲)、歸檔存儲三個級別; 本地存儲負責提供實時和短期的日誌查詢(一天或幾個小時), 遠程存儲負責一定時期內的日誌查詢需求(一週或者幾周), 歸檔存儲負責日誌整個生命週期裏的查詢需求。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"現在我們看看日誌塊在其生命週期裏是如何在多級存儲間流轉的, 首先日誌塊會在本地磁盤創建並寫入對應的日誌數據, 完成後會在本地磁盤保留一定時間(保留的時間取決於磁盤存儲壓力), 在保存一定時間後, 它首先會被"},{"type":"text","marks":[{"type":"strong"}],"text":"壓縮"},{"type":"text","text":"然後被上傳至遠程存儲(一般是對象存儲中的標準存儲類型), 再經過一段時間後日志塊會被遷移到歸檔存儲中保存(一般是對象存儲中的歸檔存儲類型)。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"這樣的存儲設計有什麼好處呢? 如下面的多級存儲示意圖所示, 越往下存儲的數據量越大, 存儲介質的成本也越低, 每層大概爲上一層的 1\/3 左右, 並且數據是在壓縮後存儲的, 日誌的數據壓縮率一般可以達到"},{"type":"text","marks":[{"type":"strong"}],"text":"10:1"},{"type":"text","text":", 由此看歸檔存儲日誌的成本能在本地存儲的"},{"type":"text","marks":[{"type":"strong"}],"text":"1%"},{"type":"text","text":"的左右, 如果使用了 SSD 硬盤作爲本地存儲, 這個差距還會更大。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"價格參考:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"embedcomp","attrs":{"type":"table","data":{"content":"\n\n\n\n\n\n

\n

存儲介質

\n\n \n

\n

參考鏈接

\n\n \n\n\n

\n

本地盤

\n\n \n

\n

https:\/\/buy.cloud.tencent.com\/price\/cvm?regionId=8&zoneId=800002

\n\n \n\n

\n

對象存儲

\n\n \n

\n

https:\/\/buy.cloud.tencent.com\/price\/cos

\n\n \n\n

\n

歸檔存儲

\n\n \n

\n

https:\/\/buy.cloud.tencent.com\/price\/cos

\n\n \n\n"}}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/a5\/fa\/a5605a4fba90fc5a1712d77d65a5c9fa.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"那在多級存儲間又是如何檢索的呢? 這個很簡單, 對於本地存儲上的檢索, 直接在本地磁盤上進行即可。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如果檢索涉及到遠程存儲上的日誌塊, 檢索服務會將涉及到的日誌塊下載到本地存儲, 然後在本地完成解壓和檢索。因爲日誌分塊的設計,日誌塊的下載同檢索一樣,我們可以在多臺機器上並行操作; 下載回本地的數據複製支持在本地緩存後一定的時間後再刪除, 這樣有效期內對同一日誌塊的檢索需求就可以在本地完成而不需要再重複拉取一遍(日誌檢索場景裏多次檢索同樣的日誌數據還是很常見)。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"對於歸檔存儲, 在發起檢索請求前, 需要對歸檔存儲中的日誌塊發起取回操作, 取回操作一般耗時在幾分鐘左右, 完成取回操作後日志塊被取回到遠程存儲上,再之後的數據流轉就跟之前一致了。即開發人員如果想要檢索冷數據, 需要提前對日誌塊做歸檔取回的申請,等待取回完成後就可以按照熱數據速度來進行日誌檢索了。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"檢索服務架構"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在瞭解上面的設計思路後, 我們看看基於這套設計的日誌檢索服務是怎麼落地的。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"日誌檢索服務分爲以下幾個模塊:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"• "},{"type":"text","marks":[{"type":"strong"}],"text":"GD-Search"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"查詢調度器, 負責接受查詢請求, 對查詢命令做解析和優化, 並從"},{"type":"text","marks":[{"type":"strong"}],"text":"Chunk Index"},{"type":"text","text":"中獲取查詢範圍內日誌塊的地址, 最終生成分佈式的查詢計劃"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"            "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"GD-Search"},{"type":"text","text":"本身是無狀態的, 可以部署多個實例,通過負載均衡對外提供統一的接入地址。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"• "},{"type":"text","marks":[{"type":"strong"}],"text":"Local-Search"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"本地存儲查詢器, 負責處理"},{"type":"text","marks":[{"type":"strong"}],"text":"GD-Search"},{"type":"text","text":"分配過來的本地日誌塊的查詢請求。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"• "},{"type":"text","marks":[{"type":"strong"}],"text":"Remote-Search"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"遠程存儲查詢器, 負責處理"},{"type":"text","marks":[{"type":"strong"}],"text":"GD-Search"},{"type":"text","text":"分配過來的遠程日誌塊的查詢請求。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"            "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Remote-Search"},{"type":"text","text":"會將需要的日誌塊從遠程存儲拉取到本地並解壓, 之後同"},{"type":"text","marks":[{"type":"strong"}],"text":"Local-Search"},{"type":"text","text":"一樣在本地存儲上進行查詢。同時"},{"type":"text","marks":[{"type":"strong"}],"text":"Remote-Search"},{"type":"text","text":"會將日誌塊的本地存儲地址更新到"},{"type":"text","marks":[{"type":"strong"}],"text":"Chunk Index"},{"type":"text","text":"中,以便將後續同樣日誌塊的查詢請求路由到本地存儲上。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"• "},{"type":"text","marks":[{"type":"strong"}],"text":"Log-Manager"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"            "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"本地存儲管理器,負責維護本地存儲上日誌塊的生命週期。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"            "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Log-Manager"},{"type":"text","text":"會定期掃描本地存儲上的日誌塊, 如果日誌塊超過本地保存期限或者磁盤使用率到達瓶頸,則會按照策略將部分日誌塊淘汰(壓縮後上傳到遠程存儲, 壓縮算法採用了"},{"type":"text","marks":[{"type":"strong"}],"text":"ZSTD"},{"type":"text","text":"), 並更新日誌塊在"},{"type":"text","marks":[{"type":"strong"}],"text":"Chunk Index"},{"type":"text","text":"中的存儲信息。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"• "},{"type":"text","marks":[{"type":"strong"}],"text":"Log-Ingester"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"            "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"日誌攝取器模塊, 負責從日誌 kafka 訂閱日誌數據, 然後將日誌數據按時間維度和元數據維度拆分, 寫入到對應的日誌塊中。在生成新的日誌塊同時, "},{"type":"text","marks":[{"type":"strong"}],"text":"Log-Ingester"},{"type":"text","text":"會將日誌塊的元數據寫入"},{"type":"text","marks":[{"type":"strong"}],"text":"Chunk Index"},{"type":"text","text":"中, 從而保證最新的日誌塊能夠被實時檢索到。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"• "},{"type":"text","marks":[{"type":"strong"}],"text":"Chunk Index"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"            "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"日誌塊元數據存儲, 負責保存日誌塊的元數據和存儲信息。當前我們選擇了"},{"type":"text","marks":[{"type":"strong"}],"text":"Redis"},{"type":"text","text":"作爲存儲介質, 在元數據索引並不複雜的情況下, redis 已經能夠滿足我們索引日誌塊的需求, 並且基於內存的查詢速度也能夠滿足我們快速鎖定日誌塊的需求。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/e6\/cf\/e654a6bb062cbf193a470c655d9111cf.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"檢索策略"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在檢索策略設計上, 我們認爲檢索的返回速度是追求更快, 同時避免巨大的查詢請求進入系統。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們認爲日誌檢索一般有以下三種場景:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1. 查看最新的服務日誌"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2. 查看某個請求的日誌, 依據logid來查詢"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"3. 查看某類日誌, 像訪問mysql的錯誤日誌, 請求下游服務的日誌等等"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在大部分場景下, 用戶是不需要所有匹配到的日誌, 拿一部分日誌足以處理問題。所以在查詢時使用者可以設置 limit 數量, 整個檢索服務在查詢結果滿足 limit設置的日誌數量時, 終止當前的查詢請求並將結果返回給前端。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"另外 "},{"type":"text","marks":[{"type":"strong"}],"text":"GD-Search"},{"type":"text","text":"組件在發起日誌塊檢索時, 也會提前判斷檢索的日誌塊大小總和, 對於超限的大範圍檢索請求會做拒絕。(用戶可以調整檢索的時間範圍多試幾次或者調整檢索語句使其更有選擇性)"}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"性能一覽"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"使用 1KB每條的日誌進行測試, 總的日誌塊數量在10000左右, 本地存儲使用NVME SSD硬盤, 遠程存儲使用S3協議標準存儲。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"• 寫入"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"單核可支持 2W條\/S的寫入速度, 1W 條\/S的寫入速度約佔用 1~2G 左右的內存,可分佈式擴展,無上限。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"•  查詢(全文檢索)"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"            "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"基於本地存儲的1TB日誌數據查詢速度可在3S以內完成"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"基於遠程存儲的1TB日誌數據查詢耗時在10S間。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"成本優勢"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在每秒千萬級寫入,百PB存儲上,我們使用十幾臺物理服務器就可以保證日誌寫入和查詢。熱點數據在本地nvme磁盤上,次熱數據在對象存裏,大量日誌數據存儲在歸檔存儲服務上。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1. 計算對比"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"因爲不需要建立索引,我們只需要千核級別就可以保證寫入,同時日誌索引是個寫多讀少的服務,千核可以保證百級別QPS查詢。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"ES在這個量級上需要投入幾萬核規模。來應對寫入性能和查詢瓶頸,但是仍不能保證寫入和查詢效率。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2. 存儲對比"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"核心是在保證業務需求下,使用更便宜的存儲介質(歸檔存儲vs本地磁盤)和更少的存儲數據(壓縮率1\/10vs日誌數據索引膨脹)。能有兩個量級的差距。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"作者介紹:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"呂亞霖,作業幫基礎架構 - 架構研發團隊負責人。負責技術中臺和基礎架構工作。在作業幫期間主導了雲原生架構演進、推動實施容器化改造、服務治理、GO 微服務框架、DevOps 的落地實踐。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"專題推薦:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https:\/\/www.infoq.cn\/theme\/97","title":"xxx","type":null},"content":[{"type":"text","text":"《作業幫AI+大數據技術實踐》"}]}]}]}

發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章