記十億級Es數據遷移mongodb成本節省及性能優化實踐

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 某智能產品業務數據之前存儲在Elasticsearch(Es)中,磁盤佔用約30T(按照單副本計算),總數據量25億,按照不同業務分類分別存在於不同表中。遷移前,業務存在較嚴重的性能抖動及成本問題,當前業務已經遷移部分數據到mongodb中,遷移後效果明顯,成本可實現十倍級節省,業務抖動問題也得以解決。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 當前我司已有數百億Es數據遷移mongodb,同時也有數百億mongodb遷移Es,根本原因業務就是選型錯誤引起。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"    本文以該場景業務遷移作爲案例,主要分享以下方面的內容:","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"mongodb適用場景及不適用場景","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"mongodb和Es各自優勢","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"mongodb和Es同樣數據,真實磁盤消耗對比","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"對《從MongoDB遷移到ES後,我們減少了80%的服務器》一文的一些不同看法","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"     沒有萬能的數據庫,本文最後會總結mongodb和Es各自的適用場景,以客觀立場分析評價mongodb和Es,拒絕“捧一個,踩一個”。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"     Es絕對是一款優秀的搜索引擎,在模糊匹配、全文搜索、複雜檢索等方面相比mongodb擁有更大的優勢。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"     本文對應業務場景查詢比較簡單,查詢更新等條件都是固定字段,不涉及複雜檢索,所以在此場景mongodb更具優勢。另外,mongodb當前默認的wiredtiger存儲引擎,在高壓縮、高性能、鎖粒度等方面進一步提升了mongodb在該場景下的優勢。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"     我司已有多個業務數百億級數據從Es遷移mongodb,由於其他業務遷移mongodb的時間較早,之前沒有詳細記錄遷移前後的ES和mongodb詳細資源對比,只有大概資源消耗比值。爲了儘量客觀評價遷移前後的數據對比,因此選擇近期正在從Es遷移mongodb的一個集羣,同時記錄源集羣和目的集羣的詳細資源佔用情況,這樣的對比結果會更加客觀真實。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"關於作者","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 前滴滴出行技術專家,現任 OPPO 文檔數據庫 mongodb 負責人,負責 oppo 數萬億級數據量文檔數據庫 mongodb 內核研發及運維工作,一直專注於分佈式緩存、高性能服務端、數據庫、中間件等相關研發。Github 賬號地址:","attrs":{}},{"type":"link","attrs":{"href":"https://github.com/y123456yz","title":"","type":null},"content":[{"type":"text","marks":[{"type":"underline","attrs":{}}],"text":"https://github.com/y123456yz","attrs":{}}]}]},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"序言","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"    本文是《mongodb 源碼實現、調優、最佳實踐系列》專欄的第 19篇文章,其他文章可以參考如下鏈接:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https://xie.infoq.cn/article/304a748ad3dead035a449bd51","title":"","type":null},"content":[{"type":"text","marks":[{"type":"underline","attrs":{}}],"text":"Qcon-萬億級數據庫MongoDB集羣性能數十倍提升及機房多活容災實踐","attrs":{}}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https://xie.infoq.cn/article/0c51f3951f3f10671d7d7123e","title":"","type":null},"content":[{"type":"text","marks":[{"type":"underline","attrs":{}}],"text":"Qcon現代數據架構-《萬億級數據庫MongoDB集羣性能數十倍提升優化實踐》核心17問詳細解答","attrs":{}}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https://xie.infoq.cn/article/def1add963ccb58680eea472f","title":"","type":null},"content":[{"type":"text","marks":[{"type":"underline","attrs":{}}],"text":"百萬級高併發mongodb集羣性能數十倍提升優化實踐(上篇)","attrs":{}}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https://xie.infoq.cn/article/329cd8f770f445d87e26637bc","title":"","type":null},"content":[{"type":"text","marks":[{"type":"underline","attrs":{}}],"text":"百萬級高併發mongodb集羣性能數十倍提升優化實踐(下篇)","attrs":{}}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https://xie.infoq.cn/article/372320c6bb93ddc5b7ecd0b6b","title":"","type":null},"content":[{"type":"text","marks":[{"type":"underline","attrs":{}}],"text":"盤點2020 |我要爲分佈式數據庫mongodb在國內影響力提升及推廣做點事","attrs":{}}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https://xie.infoq.cn/article/7b2c1dc67de82972faac2812c","title":"","type":null},"content":[{"type":"text","marks":[{"type":"underline","attrs":{}}],"text":" 百萬級代碼量mongodb內核源碼閱讀經驗分享","attrs":{}}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https://xie.infoq.cn/article/180d98535bfa0c3e71aff1662","title":"","type":null},"content":[{"type":"text","marks":[{"type":"underline","attrs":{}}],"text":"話題討論| mongodb擁有十大核心優勢,爲何國內知名度遠不如mysql高?","attrs":{}}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https://xie.infoq.cn/article/d4460b807c9835c6d80707410","title":"","type":null},"content":[{"type":"text","marks":[{"type":"underline","attrs":{}}],"text":"Mongodb網絡模塊源碼實現及性能極致設計體驗","attrs":{}}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https://xie.infoq.cn/article/1cac5adcd1b4f3fe512a1457a","title":"","type":null},"content":[{"type":"text","marks":[{"type":"underline","attrs":{}}],"text":"網絡傳輸層模塊實現二","attrs":{}}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https://xie.infoq.cn/article/971baa2c4351702de4cff69d1","title":"","type":null},"content":[{"type":"text","marks":[{"type":"underline","attrs":{}}],"text":"網絡傳輸層模塊實現三","attrs":{}}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https://xie.infoq.cn/article/209eb17f8ea651f63bd187e36","title":"","type":null},"content":[{"type":"text","marks":[{"type":"underline","attrs":{}}],"text":"網絡傳輸層模塊實現四","attrs":{}}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https://xie.infoq.cn/article/ed28fa03590e58fd88e987c82","title":"","type":null},"content":[{"type":"text","marks":[{"type":"underline","attrs":{}}],"text":"command命令處理模塊源碼實現一","attrs":{}}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https://xie.infoq.cn/article/6de3aef4978a5078da44a8228","title":"","type":null},"content":[{"type":"text","marks":[{"type":"underline","attrs":{}}],"text":"command命令處理模塊源碼實現二","attrs":{}}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https://xie.infoq.cn/article/3184cdc42c26c86e2749c3e5c","title":"","type":null},"content":[{"type":"text","marks":[{"type":"underline","attrs":{}}],"text":"mongodb詳細表級操作及詳細時延統計實現原理(快速定位表級時延抖動)","attrs":{}}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https://xie.infoq.cn/article/81d2214c49049f5d70e3eb448","title":"","type":null},"content":[{"type":"text","marks":[{"type":"underline","attrs":{}}],"text":"[圖、文、碼配合分析]-Mongodb write寫(增、刪、改)模塊設計與實現","attrs":{}}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https://xie.infoq.cn/edit/5932858d57db13d43a8b8d62a","title":"","type":null},"content":[{"type":"text","marks":[{"type":"underline","attrs":{}}],"text":"300條數據變更引發的血案-記某十億級核心mongodb集羣部分請求不可用故障踩坑記","attrs":{}}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https://xie.infoq.cn/draft/65357","title":"","type":null},"content":[{"type":"text","text":"https://xie.infoq.cn/draft/65357","attrs":{}}]}]},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"1. 業務背景","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 該業務存儲智能產品相關數據,總數據量20多億,單個集羣ES磁盤消耗約30T。業務遷移背景如下(以下爲業務開發同事整理):","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"① Es集羣不太穩定,造成秒級耗時,對我們業務影響挺大,感知非常明顯。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"② 具體我們業務需求,主要是根據用戶id來進行精確查詢,沒有複雜全文檢索、模糊查詢等需求,所以其實用不到Es的優點。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"③ Es成本太高。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"2. 源Elasticsearch集羣資源及部署情況","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 源ES集羣業務在兩個機房各申請了一個集羣,由業務自己通過雙寫的方式來保障數據一致性,當一個集羣異常業務自己切流量到另外一個集羣。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 源ES集羣部署架構及資源規格如下:","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"2.1源Elasticsearch集羣部署架構","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/c0/c0c130a98ce04a96cfcfecb477070b40.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 如上圖所示,由於爲了實現兩機房雙活容災及單集羣抖動引起的業務故障,在A機房和B機房各搭建了一個ES集羣,業務通過雙寫來自己維護兩個集羣的數據一致性。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 當A機房集羣1異常,或者A機房掉電,則業務切流量到B機房備集羣2,對應架構圖如下:","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/3d/3d3c2e04e2ef0d8bca0337c6f9a290ad.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"2.2集羣資源規格","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"     A機房集羣1和B機房集羣2內部部署架構完全已有,單個集羣總共有26個節點,每個節點都部署在容器中,單個容器規格資源如下所示:","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"CPU:32","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"內存:64G","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"磁盤:2T","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"磁盤類型:SSD","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"     單個集羣總資源消耗如下:","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"CPU:32*26=832","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"內存:64G*32=2048G","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"磁盤:2T*32=64T","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"     兩個集羣總資源消耗如下:","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"CPU:32*26*2=1664","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"內存:64G*32*2=4096G","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"磁盤:2T*32*2=128T","attrs":{}}]}]}],"attrs":{}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"2.3源集羣架構業務痛點","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 從上面的分析可以看出,爲了保障業務多活和集羣高可用,業務通過雙寫實現,異常後業務自己判斷切換,這增加了業務痛點。該架構主要痛點如下:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"① 成本高,本身每個集羣都是多副本,第2個備集羣進一步增加了成本","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"② 增加了業務開發難度,業務需要雙寫邏輯","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"③ 數據一致性無法得到保障,例如集羣1異常或者故障,業務讀寫切到集羣2,當集羣1恢復正常,異常這斷時間內,集羣2的數據會比集羣1多。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"④ 不利於業務快速迭代開發","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"⑤ 業務只是按照固定字段做查詢,查詢條件單一,這種場景選擇mongodb本身性能會更好。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"3. 目的mongodb集羣架構","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 業務開始遷移mongodb的時候,通過和業務對接梳理,該集羣規模及業務需求總結如下:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"① 總數據量20多億","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"② Es數據單集羣磁盤消耗總和30.5T左右","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"③ 讀寫峯值流量流量很小,幾百上千","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"④ 同城兩機房多活容災","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"3.1 mongodb資源評估","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"     分片數及存儲節點套餐規格選定評估過程如下:","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"內存評估","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我司都是容器化部署,以以網經驗來看,mongodb對內存消耗不高,歷史百億級以上mongodb集羣單個容器最大內存基本上都是64Gb,因此內存規格確定爲64G。","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"分片評估","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"    業務讀寫流量很低,但是數據量較大,因此分片數確定爲2個分片。","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"磁盤評估","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 按照以往測試驗證及線上真實數據遷移對比,同樣的數據存入mongodb和Es中真實磁盤消耗佔比如下:","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"mongodb:Es ≈ 1:6","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 25億Es真實磁盤消耗30.5T,預計mongodb磁盤消耗5T左右,考慮到未來數據增長,我們按照50億數據計算,預計需要10T空間。2個分片,因此每個分片5T數據,最終確定單個mongod實例容器磁盤規格5T。","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"CPU規格評估","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 由於容器調度套餐化限制,因此CPU只能限定爲16CPU(實際上用不了這麼多CPU)。","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"mongos代理及config server規格評估","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 此外,由於分片集羣還有mongos代理和config server複製集,因此還需要評估mongos代理和config server節點規格。由於config server只主要存儲路由相關元數據,因此對磁盤、CUP、MEM消耗都很低; mongos代理只做路由轉發只消耗CPU,因此對內存和磁盤消耗都不高。最終,爲了最大化節省成本,我們決定讓一個代理和一個config server複用同一個容器,容器規格如下:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 8CPU/8G內存/50G磁盤,一個代理和一個config server節點複用同一個容器。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":" 分片及存儲節點規格總結:","attrs":{}},{"type":"text","text":"2分片/16CPU、64G內存、5T磁盤。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":" mongos及config server規格總結:","attrs":{}},{"type":"text","text":"8CPU/8G內存/50G磁盤","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/c5/c5ea6d7343c245c94c31fcb4d1040ed1.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"3.2集羣部署架構","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"     由於該業務所在城市只有兩個機房,因此我們採用2+2+1(2mongod+2mongod+1arbiter模式),在A機房部署2個mongod節點,B機房部署2個mongod節點,C機房部署一個最低規格的選舉節點,如下圖所示:","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/07/075e9fd9ba235f56e6ffa87d7730ae47.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"說明:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"① 每個機房代理部署2個mongos代理,保證業務訪問代理高可用,任一代理掛掉,對應機房業務不受影響。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"② 如果機房A掛掉,則機房B和機房C剩餘2mongod+1arbiter,則會在B機房mongod中從新選舉一個主節點。arbiter選舉節點不消耗資源","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"③ 客戶端配置nearest ,實現就近讀,確保請求通過代理轉發的時候,轉發到最近網絡時延節點,也就是同機房對應存儲節點讀取數據。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"弊端:如果是異地機房,B機房和C機房寫存在跨機房寫場景。如果A B C爲同城機房,則沒用該弊端,同城機房時延可以忽略。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"4. 性能優化過程","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 該集羣優化過程按照如下兩個步驟優化:數據遷移開始前的提前預優化、遷移過程中瓶頸分析及優化、遷移完成後性能優化。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"4.1數據遷移開始前的提前預操作","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 和業務溝通確定,業務每條數據都攜帶有唯一_id(用戶生成的,不是mongodb內部生成),同時業務查詢更新等都是根據_id維度查詢該設備下面的單條或者一批數據,因此片建選擇_id。","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"分片方式","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 爲了充分散列數據到2個分片,因此選擇hash分片方式,這樣數據可以最大化散列,同時可以滿足同一個_id數據落到同一個分片,保證查詢效率。","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"預分片","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" mongodb如果分片片建爲hashed分片,則可以提前做預分片,這樣就可以保證數據寫進來的時候比較均衡的寫入多個分片。預分片的好處可以規避非預分片情況下的chunk遷移問題,最大化提升寫入性能。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" sh.shardCollection(\"user_xxx.user_xxx\", {_id:\"hashed\"}, false, { numInitialChunks: 8192} )","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":" 注意事項:","attrs":{}},{"type":"text","text":"切記提前對ssoid創建hashed索引,否則對後續分片擴容有影響。","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"就近讀","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"客戶端增加nearest 配置,從離自己最近的節點讀,保證了讀的性能。","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"mongos代理配置","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"      A機房業務只配置A機房的代理,B機房業務只配置B機房代理,同時帶上nearest配置,最大化的實現本機房就近讀,同時避免客戶端跨機房訪問代理。","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"禁用enableMajorityReadConcern","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 禁用該功能後ReadConcern majority將會報錯,ReadConcern majority功能注意是避免髒讀,和業務溝通業務沒該需求,因此可以直接關閉。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" mongodb默認使能了enableMajorityReadConcern,該功能開啓對性能有一定影響,參考:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https://developer.aliyun.com/article/60553","title":null,"type":null},"content":[{"type":"text","marks":[{"type":"underline","attrs":{}}],"text":"MongoDB readConcern 原理解析","attrs":{}}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https://mongoing.com/archives/29934","title":null,"type":null},"content":[{"type":"text","marks":[{"type":"underline","attrs":{}}],"text":"OPPO百萬級高併發MongoDB集羣性能數十倍提升優化實踐","attrs":{}}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"   ","attrs":{}},{"type":"link","attrs":{"href":"https://github.com/y123456yz/reading-and-annotate-mongodb-3.6","title":null,"type":null},"content":[{"type":"text","marks":[{"type":"underline","attrs":{}}],"text":"mongodb源碼分析、更多實踐案例細節","attrs":{}}]}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"存儲引擎cacheSize規格選擇","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 單個容器規格:16CPU、64G內存、7T磁盤,考慮到全量遷移過程中對內存壓力,內存碎片等壓力會比較大,爲了避免OOM,設置cacheSize=42G。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"4.2數據遷移過程中優化過程","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 全量數據遷移過程中,遷移速度較塊,內存漲數據較多,當髒數據比例達到一定比例後用戶讀寫請求對應線程將會阻塞,用戶線程也會去淘汰內存中的髒數據page,最終寫性能下降明顯。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"wiredtiger存儲引擎cache淘汰策略相關的幾個配置如下:","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/23/23725f610394f7f13eed8cc4408bfb53.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"   由於業務全量遷移數據是持續性的大流量寫,而不是突發性的大流量寫,因此eviction_target、eviction_trigger、eviction_dirty_target、eviction_dirty_trigger幾個配置用處不大,這幾個參數閥值只是在短時間突發流量情況下調整纔有用。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"      但是,在持續性長時間大流量寫的情況下,我們可以通過提高wiredtiger存儲引擎後臺線程數來解決漲數據比例過高引起的用戶請求阻塞問題,淘汰漲數據的任務最終交由evict模塊後臺線程來完成。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"      全量大流量持續性寫存儲引擎優化如下:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"db.adminCommand( { setParameter : 1, \"wiredTigerEngineRuntimeConfig\" : \"eviction=(threads_min=4, threads_max=20)\"})","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"     更多存儲引擎及mongodb內核涉及實現參考:","attrs":{}},{"type":"link","attrs":{"href":"https://github.com/y123456yz/reading-and-annotate-mongodb-3.6","title":null,"type":null},"content":[{"type":"text","marks":[{"type":"underline","attrs":{}}],"text":"mongodb源碼分析、更多實踐案例細節","attrs":{}}]}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"4.3業務流量讀寫優化","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 前面章節我們提到,在容器資源評估的時候,我們最終確定選擇單個容器套餐規格爲如下:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 16CPU、64G內存、5T磁盤。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 全量遷移過程中爲了避免OOM,預留了約1/3內存給mongodb server層、操作系統開銷等,當數據遷移完後,業務寫流量相比全量遷移過程小了很多。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 也就是說,前量遷移完成後,cache中漲數據比例幾乎很少,基本上不會達到20%閥值,業務讀流量相比之前多了很多(數據遷移過程中讀流量走原Es集羣)。爲了提升讀性能,因此做了如下性能調整(提前建好索引):","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"節點cacheSize從之前的42G調整到55G,儘量多的緩存熱點數據到內存,供業務讀,最大化提升讀性能。","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"每天凌晨低峯期做一次cache內存加速釋放,避免OOM。","attrs":{}}]}]}],"attrs":{}},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"5. 遷移mongodb後性能對比","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 當前已有2個表從Es遷移到該mongodb集羣,同時該業務新增了15億其他業務數據到該集羣,當前目的mongodb集羣已有近20億數據。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"5.1 Es時延情況","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 由於該集羣ES沒有歷史時延統計曲線統計,因此ES的時延統計只有以下現象(來自業務方反饋):","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"查詢秒級耗時,對我們業務影響挺大,感知非常明顯。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"5.2 mongodb集羣時延曲線","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/63/63ca44a4e047eece9fd899c4812bbd29.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"   從上面的監控可以看出,由於除了遷移有源Es的數據,另外還有該業務的其他業務數據流量流向該集羣,因此mongodb集羣流量相比Es會更高,mongodb整體時延約1.5ms左右,遠遠好於之前Es的有時秒級時延抖動。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"6. 遷移成本收益對比","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"6.1 ElasticSearch集羣規格","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"  原Es單個集羣一共26個節點,每個節點副本容器規格:32CPU、64Gmem、2T磁盤,磁盤類型SSD,單個集羣規格總結如下:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"① 單集羣節點總數:26","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"② 每個節點規格:32CPU、64Gmem、2T磁盤","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"③ 總數據量:25億","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"④ 爲了實現機房多活容災和業務高可用,實際部署了兩個Es集羣,實際規格成本還的在上面的基礎上增加一倍。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"6.2 mongodb集羣規格","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/8f/8faa58cb59cc2377ef3a7eb325776962.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 當前該mongodb集羣已有約16億數據(其中部分爲Es集羣以外數據,該集羣除了存儲部分Es遷移過來的數據,還存儲該業務線其他業務數據),該mongodb集羣規格如下:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"① 分片數:2","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"② 單分片副本數:4","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"③ 每個節點規格:16CPU、64G mem、5T磁盤","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"④ 兩個分片預計存儲最大數據量:預計存儲Es集羣中總數據量的兩倍。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"6.3成本對比計算過程","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"CPU、MEM內存成本對比計算過程","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 源Es兩個集羣和目的mongodb集羣資源對比如下:","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/26/26677190d6e2c49ed64eb38146115d12.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":" 說明:","attrs":{}},{"type":"text","text":"由於集羣部署方式可能有很多冗餘,上面的CPU和內存成本比對比實際上不客觀,可能Es部署時候規格設置浪費。當然,mongodb實際上CPU資源也非常空閒,所以CPU和內存指標對比無太大參考作用。","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"磁盤成本對比計算過程(成本比約6:1)","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 由於目的mongodb集羣中當前除了有源ES遷移過來的部分表外,還有該業務的其他數據,爲了保障磁盤對比的客觀性,磁盤對比選材過程如下(說明:源Es有兩個集羣,這裏只計算單個集羣單分數據的磁盤消耗,","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"如果按照兩個集羣計算,磁盤成本比爲12:1,這種對比方法不客觀","attrs":{}},{"type":"text","text":"):","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"① 只對比從源Es中遷移到mongodb中的表","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"② 只對比源Es集羣和目的mongodb集羣表中數據量完全一樣的表","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"③ 排除源ES遷移到mongodb但是當前還沒有遷移完的表","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"④ 源Es集羣只計算一個集羣單副本的磁盤消耗","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 通過上面的選擇算法,基本上做到了同樣數據在Es和mongodb中的對比,磁盤真實消耗對比結果如下(說明:以下數據的Es磁盤佔用爲單個Es集羣單副本方式計算結果,由我司Es運維開發人員提供;mongodb磁盤佔用計算方法爲2個分片主節點真實磁盤佔用之和,包括數據磁盤消耗+索引磁盤消耗):","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/b0/b0bd9f18d68bbac9a4b5d854ecfdb570.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 從上面的數據對比可以看出,同樣的數據Es磁盤佔用約爲mongodb磁盤佔用的6倍,和之前其他Es遷移mongodb過程的數據佔用比值類似。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 表1文檔內容如下:","attrs":{}}]},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"1.{  \n2.     \"_id\" : null,  \n3.     \"channel\" : null,  \n4.     \"content\" : \"04A193398BE7xxx7E080E2C3CC7B3sxxxxxxxxxC99F9520B8CD0842638DB0F550E125xxxxxxxxxxxxxDB5D3F320642A42CECD3EB5C27714524D0C1BF2A0C6B607D0DFDB669D6633A0E48C65B2623EA15E6DBB0FBF643150E18DD3D0575BDE448C03735A8841E312F8AF0D2BF67D1D357D1AB6249BF3FA4E014C5Axxxxxxxxxxxxxx30C10487667\",  \n5.     \"create_time\" : ISODate(\"2021-02-16T09:56:03Z\"),  \n6.     \"duid\" : \"6F9E856EDDBB5xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxF41B0676FB2DE2171C18450E683DE1E9523B518F266856E01B0D6855E29911E5D10F0FA7E4A8EE5816333D89296E7554F05A58\",  \n7.     \"update_time\" : ISODate(\"2021-02-16T09:56:03Z\")  \n8.}","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 表2文檔內容如下:","attrs":{}}]},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"1.{  \n2.     \"_id\" : \"F9654874C1A37F74DA5E862C408EDC45\",  \n3.     \"channel\" : \"10001\",  \n4.     \"content\" : \"BA7110279AF5XXXXXXXXX6D7ACB8D0F9XXXXXXXXXXXXXX5CDB4693BE949F70E78A20E\",  \n5.     \"create_time\" : ISODate(\"2021-01-21T11:49:46Z\"),  \n6.     \"imei\" : \"F96548XXXXXXXXXXXXXXXXX2C408EDC45\",  \n7.     \"update_time\" : ISODate(\"2021-01-21T11:49:46Z\")  \n8.}","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"7. Elasticsearch和mongodb各自適用場景總結","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 由於業務開發對業務場景評估不到位,當前我司線上mongodb和Es適用過程存在如下現象:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1) 數百億Es數據遷移mongodb","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2) 也有數百億mongodb數據遷移到Es    ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 從線上真實業務使用情況爲例,以下場景不適合mongodb,實際上也不適合mysql等數據庫:","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"8字段以上的隨機組合查詢","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"有些業務場景,查詢條件是由用戶觸發,查詢條件不固定,可能存在多個字段的隨機組合查詢。mongodb和mysql等數據庫,都需要手動創建索引,由於8字段以上的隨機組合查詢情況種類太多,因此很難手動建索引覆蓋所有場景,所以選擇Es更優,只是成本會更高。","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"全文檢索","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"雖然mongodb也支持全文檢索,mongodb-4.2以下版本全文檢索能力性能和ES沒法比,建議全文檢索適用Es。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"mongodb-4.2全文搜索已經開始支持Lucene 引擎,可能性能會有很大提升,暫時沒做研究,也沒做性能對比,後續有空在研究。","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"其他複雜檢索,例如非前綴模糊匹配查詢","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 例如查詢db.member.find({\"name\":{ $regex:/XXX/ }}),查詢name字段包含XXX的查詢,這類查詢Es更優,因爲mongodb、mysql等底層都是KV存儲,查找KEY的時候都是從左到右比較key字符串,如果是非前綴匹配模糊查詢,就需要全表掃描。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 查詢以某字段爲開頭的文檔,db.member.find({\"name\":{$regex:/^XXX/}})這類就比較適合用mongodb查詢,前綴匹配。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"mongodb和Es不同場景性能對比(以下爲真實線上數據對比):","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/c3/c37b96d15e1ecf8eb0aa760ecd1b2536.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 最後,脫離業務場景評估一個數據庫優劣很不合適,主流數據庫都優其存在的意義,不能因爲數據庫在某種場景下不合適而全盤否定該數據庫。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"8. 對《從MongoDB遷移到ES後,我們減少了80%的服務器》一文的不同看法","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"    《從MongoDB遷移到ES後,我們減少了80%的服務器》一文中以下觀點個人認不太贊同,主要如下:","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"8.1 文章內容的不同看法","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"     mongodb近幾年持續排名全球前五,市值近兩年已翻數倍,當前市值近200億左右。DB-Engines Ranking排名得分持續提升,說明本身有自己得市場和應用場景,而不是浪得虛名,沒《從MongoDB遷移到ES後,我們減少了80%的服務器》一文中說的不堪一擊。","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"服務器80%節省?mongodb部署架構嚴重資源浪費","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 從該文章可以看出,單個文檔200多字節,mongodb存儲引擎wiredtiger默認高壓縮、高性能、細粒度鎖。單個複製集即可存儲數十億數據,你們用了十多個容器。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 同樣的數據,默認mongodb磁盤佔用是Es的六分之一,加上這是日誌集羣,mongodb可以採用1mongod+1mongod+1arbiter部署,規格8c/32gb/100gb 2個容器就可以滿足要求。這樣的部署纔是合理的,成本會比同樣數據Es減少數倍。如果用mongodb,本身2個8c/32gb/100gb+1個低規格選舉節點容器即可搞定的事,你用了15臺。","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"任意組合的,現有MongoDB是不支持的?","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 任意組合的查詢不是mongodb不支持,是建索引麻煩,包括mysql、tidb等數據庫都是需要手動一個一個建索引,字段太多很難覆蓋所有查詢。","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"性能提高十倍?一個走索引一個不走索引,這種數據對比不客觀","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 這本身就是個數據庫選型問題,隨機組合條件太多,索引不好建,你應該把索引查詢條件對應索引建好後對比。不過多字段的隨機組合查詢,確實不適合用mongodb,建索引麻煩。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 建議把mongodb對應查詢索引建好,重新測試下查詢性能數據。","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"MongoDB單集合數據量超過10億條,此情況下即使簡單條件查詢性能也不理想?","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 我司最大的mongodb集羣單表幾千億數據,查詢2ms以內。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 10億規模對我們線上業務就是毛毛雨,我們把mongodb集羣歸類爲如下幾檔:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"① 小規模集羣:數據量<100億","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"② 中型集羣:數據量100億到1000億之間","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"③ 大型集羣:數據量大於1000億","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 當前我司15%-20%左右集羣是百億級以上集羣,已成功實現單個集羣","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"萬億級離線數據","attrs":{}},{"type":"text","text":"讀寫存儲,當前正在挑戰單個","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"萬億級實時在線數據","attrs":{}},{"type":"text","text":"高併發在線讀寫。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 萬億級離線數據讀寫優化案例詳見Qcon、dbaplus、mongodb社區等分享:","attrs":{}},{"type":"link","attrs":{"href":"http://dbaplus.cn/news-162-3666-1.html","title":null,"type":null},"content":[{"type":"text","marks":[{"type":"underline","attrs":{}}],"text":"用最少人力玩轉萬億級數據,我用的就是MongoDB!","attrs":{}}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"     後續將分享《單集羣萬億級在線數據高併發讀寫優化實踐》","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"沒有人敢在覈心項目中使用MongoDB?","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"    我司25%-35%以上數據存儲文件、圖片等元數據,甚至包括少量交易集羣,非常核心,當前我司規模早已超過萬億級。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"8.2文章以下回復的不同看法","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"mongodb有的,Es都有?","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 沒有萬能的數據庫,正如上文所述,mongodb在高併發寫、固定字段索引查詢方面的性能表現。此外mongodb存儲引擎高壓縮高性能,在成本上面體現很明顯。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 業務場景很重要,不能“捧一個,踩一個”,主流數據庫都有其存在的理由,排名第五絕肯定也不是浪得虛名。","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"只有用事務的纔是核心數據?","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 不太贊成這樣定義核心數據,用事務的不一定是核心數據,不用事務的也未必不是核心數據。例如我司存儲的幾千億文件、圖片等元數據,沒有用到事務,但是是非常核心的數據,丟失會造成嚴重後果。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"8.3 業界對數據庫幾個錯誤認識(以OPPO接入業務過程真實案例分享)","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 結合公司內部使用mongodb爲例,有時候出現以下情況:一些不適合mongodb的業務場景,例如全文檢索、8字段以上的隨機組合查詢、非前綴匹配模糊查詢,這些場景本身不適合選用mongodb,但是業務選型錯誤,造成使用過程中的瓶頸。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 甚至有相關研發人員因爲選型錯誤在各種羣裏面說:“遠離mongo,珍愛生命”(該業務有全文檢索需求);此外還存在業務數組索引使用不當引起集羣抖動(該業務數組用法沒建索引引起),認爲mongodb設計有問題,這些都是極不負責的行爲。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 通過這些真實業務接觸過程中的案例,總結如下幾點:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"① 沒有萬能的數據庫,切記結合自身業務場景,選擇最優數據庫;","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"② 主流數據庫都有其存在的理由,不要因爲在某些場景不適合而全盤否定該數據庫其他層面的優勢。例如不能因爲mongodb在全文檢索等複雜檢索上面的弱勢而全面否定mongodb,也不能因爲Es在磁盤成本、高併發讀寫等方面的劣勢而否定Es在複雜檢索上面的優勢。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"③ 切記以偏概全,捧一個踩一個,這都是及其不客觀的行爲。","attrs":{}}]}]}
發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章