Master This Way of Adding Indexes and a Billion-Document, Latency-Sensitive Cluster Will Hardly Ever Jitter

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"An online MongoDB cluster stores core data that affects the company's revenue. This article explains why adding several indexes serially in the background caused the cluster to jitter and left some nodes with their connections exhausted. Using this case, it also describes the best way to add indexes for latency-sensitive workloads, so that the impact on the business is minimal or nonexistent."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Indexes are critical to query performance, yet most MongoDB developers and DBAs add indexes to latency-sensitive workloads the wrong way."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This article sets out to answer the following questions:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Why does adding indexes in the background cause a latency-sensitive cluster to jitter?"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Why did the first two indexes not trigger alarms, while the alarm fired only after the third index was added?"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Why did only the secondaries jitter while latency on the primary stayed normal?"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Why did the connection count surge?"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"With connections exhausted, the mongo shell cannot log in to inspect the node's internal state. How do you break the deadlock?"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"How can a latency-sensitive business add indexes without noticing any impact?"}]}]}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"1. Business Background"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The business stores the company's core data, and a cluster fault affects revenue. It is very sensitive to latency: even slight jitter easily causes client timeouts. The workload looks like this:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Small data volume, on the order of 1 billion documents"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Core business"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Latency-sensitive"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Sharded mode with a single shard"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Read\/write splitting"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Read-heavy, write-light"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Peak traffic of 80,000 to 100,000 requests\/s"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"p
aragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The cluster runs MongoDB kernel version 3.6.13. One day the business team used the MongoDB management platform to add several indexes serially (as background builds): once one index build finished and returned, they started adding the next."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"No alarms fired after the first and second indexes were added, but after the third index was added the business began receiving alarms that some queries exceeded the latency threshold."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"2. Cluster Architecture"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"2.1 Cluster Deployment Architecture"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The cluster is deployed as follows:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/79\/79f1c651c3490619632875f7eceaad60.png","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The traffic monitoring curves for this cluster are shown below:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/0b\/0b73d88d4f5d30ce7a105ba610bea08d.png","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As the figure above shows, the deployment has a single shard, which is a five-node replica set with one primary and four secondaries. Shard 1 uses five nodes for the following reasons:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":n
ull}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"It is a core service; with five replicas the deployment can tolerate the failure of two nodes."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"It is latency-sensitive; because the business prefers to read from secondaries, adding secondaries to the shard raises the QPS the business can sustain."}]}]}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"2.2 Why Use Sharded Mode for a Single Shard? Wouldn't a Replica Set Be Enough?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As the architecture diagram shows, the cluster has only one shard yet uses a sharded architecture. Why not use a plain replica set, which would also save the cost of the mongos proxies and config servers? Sharded mode was chosen mainly for the following reasons:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The data set is currently fairly small, on the order of 1 billion documents, but over time it may grow to the tens of billions. Sharded mode was chosen because more shards may need to be added later."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The cluster currently sees few writes and updates, but heavy write and update workloads may appear later, and those need multiple shards to sustain."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Our company has added many features to the mongos proxy, such as rate limiting, traffic control, fine-grained permission control, and richer monitoring, so sharded mode is our default."}]}]}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"3. Quickly Detecting and Resolving the Problem"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"3.1 Problem Detection"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"One day the alert center suddenly called, and we received the following alarm:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/9b\/9b0c8911197d429f83a87ac557bc933c.png","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"All four secondaries raised the same alarm almost one after another: some requests on each node took longer than 20 ms. Because this is a core cluster that directly affects revenue, the situation was tense. One thing was odd, though: latency on the primary was normal, and only the secondaries were jittering."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In addition, we kept receiving instance-unavailable alarms. The corresponding monitoring curves are shown below:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/a2\/a299ba8a49bf9432ae21906f61578f1d.png","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/9b\/9bd2916a8d8b19902fc8765f93b5c162.png","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Note: in the figure above, one curve shows the number of connections currently in use by clients and the other shows the number of connections still available."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"3.2 Troubleshooting Process"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"After the alarm, we found many slow-log entries for the business (because it is latency-sensitive, the slow-log threshold is 20 ms), and every slow query was already using the optimal index."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Logging in to the node with the mongo shell"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"We then tried to log in to the node with the mongo shell, but the login failed with the following output:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"MongoDB shell version v3.6.13\nconnecting to: mongodb:\/\/x.x.x.x:20001\/test?gssapiServiceName=mongodb\n2021-04-29T11:09:15.049+0800 E QUERY [thread1] Error: network error while attempting to run command 'isMaster' on host x.x.x.x:20001' :\nconnect@src\/mongo\/shell\/mongo.js:263:13\n@(connect):1:6\nexception: connect failed"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Because we could not log in to the node, we logged in to the storage host and checked the mongod log, which contained a large number of messages saying that connections were exhausted, as shown below:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/c6\/c653edf55c4e2996bbdee8ebc17608a5.png","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Analyzing the node's system monitoring"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The symptoms above show that connections were exhausted, so we examined the system monitoring of the server hosting the node and found that disk IO was very high, as shown below:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/78\/78624ff94b38e815a8c7ca3768d4afa6.png","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Analyzing the mongod instance log"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Because the secondaries could not be logged in to and disk IO was high, we suspected that a slow operation was running. Analyzing the instance log revealed the following:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"or
igin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/0c\/0cf2ad5cbf54081f645a4807d8eb7e57.png","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Confirming the problem"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The analysis above shows that the root cause was index builds driving disk IO too high on the secondaries, which in turn made query latency rise and jitter."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"After talking with the business team, we confirmed that during this period they had indeed added several indexes serially through our management platform. The high disk IO was caused by these index builds, and several of them were running on the secondaries at the same moment. An index build first reads the collection data and then builds the index from it, and both steps involve many IO operations. Disk IO is a shared resource: when IO on a server is high, every IO operation on that server slows down, which is what made reads on the secondaries jitter."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Resolving the problem"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"At this point we knew the index builds were the cause and that disk IO would recover only once they were stopped, so we needed to kill them as quickly as possible. However, because connections were exhausted we could not connect to the secondaries, so we could not run killOp."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Since we could not log in to run killOp, we killed the mongod process directly. After it was restarted, however, mongod was still building the indexes, as shown below:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"i
mage","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/39\/39cb8d1e02a18545714b28143afcfbd6.png","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"After the restart the indexes still had to be built: the mongod process died before the earlier builds finished, so the node rebuilds them to stay consistent with the primary."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"However, mongod provides a noIndexBuildRetry option for exactly this situation. It skips rebuilding indexes after an instance is restarted abnormally in the middle of an index build. The option is described as follows:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"don't retry any index builds that were interrupted by shutdown"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"After we restarted the secondary mongod instances with noIndexBuildRetry, so that the interrupted builds were abandoned, the business recovered quickly:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"mongod -f \/home\/service\/mongodb\/conf\/mongod_20001.conf --noIndexBuildRetry"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"4. Core createIndex Build Flow and How the Problem Surfaced"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"4.1 Core createIndex Build Flow"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The business connects to the proxy and adds a background index with the createIndex command. The flow is shown below:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/f3\/f3b117e3283889221fe9aa7f7c86fbac.png","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"After receiving the createIndex command, the primary mainly proceeds as follows:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The primary scans the collection data and builds the index."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Once the index has been built, OK is returned to the client. (Note: the client is told OK as soon as the primary finishes; at that point the secondaries have not yet started building the index.)"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"An oplog entry for the createIndex is written to the oplog collection."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The secondaries fetch the createIndex oplog entry and replay it to build the index."}]}]}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"4.2 How the Problem Surfaced"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"By comparing log timestamps with alarm timestamps and confirming with the business team, we found that the latency alarm threshold started to be hit right after the third index was added (in reality, only the primary had finished building it). The timeline is summarized below:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"At T1 the primary finishes building the first index, which is then replicated to the secondaries for them to build. At T1, therefore, the secondaries are building only one index, index1."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"At T2 the primary finishes building the second index and the secondaries fetch and start it. Because the secondaries carry heavy read traffic, they build indexes more slowly than the primary, so index1 and index2 end up running on the secondaries at the same time. At this point access latency has not yet reached the alarm threshold."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Likewise, at T3 the third index is added and the secondaries fetch it from the oplog and start it. Because index1 and index2 have not finished, the secondaries are now building index1, index2, and index3 concurrently. Three concurrent builds further increase disk IO load and system overhead, access latency rises further, and some queries end up taking longer than 20 ms."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The summary is shown in the figure below:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/13\/1361cc90ec78e339aa504448d268a9bb.png","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"5. Questions Answered"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Why does adding indexes in the background cause a latency-sensitive cluster to jitter?"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As analyzed above, although the business added the background indexes serially, starting each one only after the previous one had succeeded, index builds take different amounts of time on the primary and the secondaries. The secondaries fetch the corresponding oplog entries and replay them, so from some moment on all three indexes were being built on every secondary at once. This drove IO load very high and eventually triggered the latency alarms."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Why did the first two indexes not trigger alarms, while the alarm fired only after the third index was added?"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As above, IO load increased further once the secondaries fetched the third index from the oplog and started building it, which finally pushed latency past the 20 ms threshold."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Why did only the secondaries jitter while latency on the primary stayed normal?"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"On the primary, the business added the second index only after the first background build had completed, so the primary was building just one index at any moment and its IO load stayed low. In addition, write traffic on the primary is light and almost all read traffic goes to the secondaries, so index builds on the primary finish quickly and barely affect write latency."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Why did the connection count surge?"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The surge in connections was a consequence of the index builds slowing down business access: three indexes were being built on the secondaries at once, which drove their IO load very high and slowed business requests."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"
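text","text":"Once requests slow down, the connections in the client connection pool are no longer enough, so the client dynamically adds connections to the pool to reach the backend database. Eventually the number of connections on the mongod server reached its configured limit and new connections were refused."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"With connections exhausted, the mongo shell cannot log in to inspect the node's internal state. How do you break the deadlock?"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"When connections are exhausted, the mongo shell cannot connect to the node and its internal state cannot be inspected. This can be improved by whitelisting designated clients (127.0.0.1 by default) and exempting them from the max connections limit, so that we can still log in to mongod from the node itself and inspect its internal state."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For example, with such a connection-limit whitelist in place, we can log in to the node through 127.0.0.1 and use killOp to kill the index builds running on the secondary."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"6. How Can a Latency-Sensitive Business Add Indexes Without Noticing?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Method 1: wait until the index build has finished on the primary and all secondaries before adding the next index (relatively small impact). Add the index in the background, and start creating the next index only after the build has completed on the primary and every secondary. This avoids the situation described in this article, where several indexes are built on the secondaries at once and cause jitter."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Note: newer MongoDB versions have optimized background index builds. When a secondary replays the oplog entries for index builds, it runs the second build only after the first has finished, which avoids the jitter caused by several builds running at once."}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Method 2: start the node standalone, add the index, and then rejoin it to the replica set (no impact on the business at all). The steps are as follows:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"c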
ontent":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Remove one secondary from the replica set."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Start that node in standalone mode."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Add the index in blocking mode (without background), which builds the index faster."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"After the index has been added, start the node in replica set mode."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Add the node back to the replica set."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"With these steps, the index is added on one secondary without the business noticing. Repeat the same procedure for the other nodes."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"About the Author"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Yang Yazhou"},{"type":"text","text":", a former expert engineer at Didi Chuxing, now heads the MongoDB document database team at OPPO, where he is responsible for kernel development, performance optimization, and operations of a MongoDB document database holding trillions of records. He has long focused on distributed caching, high-performance servers, databases, and middleware."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This article is reprinted from the dbaplus community (ID: dbaplus)."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","t
ext":"Original link: "},{"type":"link","attrs":{"href":"https:\/\/mp.weixin.qq.com\/s\/rjyhIf2S1Z2NxNB4qVHu6Q","title":"xxx","type":null},"content":[{"type":"text","text":"Master This Way of Adding Indexes and a Billion-Document, Latency-Sensitive Cluster Will Hardly Ever Jitter"}]}]}]}