Troubleshooting a slow MongoDB find/getMore operation

This article is a finalist case-award entry from the 2021 MongoDB Technical Practice and Application Case Collection activity.

Author: 張家僑

Problem description

This article walks through the full process of diagnosing a slow query that a business team hit in production, and the eventual fix.

The business team hit stalls when fetching data with a find().limit() command; a single operation could stall for up to a minute. The query in question:

db.xxx_collection.find({timetag: { $gt: 1636513802167 }, $or: [ { nto: "zhang" }, { nfrom:"zhang" } ], nid: 50, status: 0 }).sort({ timetag: 1 }).limit(200)

Feedback from the business team: the MongoDB version involved is 4.2.12. "From the slow-query log this query takes forty to fifty seconds, but when we run it directly the first batch of results comes back quickly; it is the getmore that is slow. Please look into why it stalls on getmore."

From this description, getMore appears to be the direct cause of the stall, and the likely causes are:

A stall inside the server's getMore handling -- a server-side bug is unlikely

A query plan that does not fit the queried data -- much more likely

The investigation below starts from these two angles.

Reproducing the problem

We restored the data to a standby cluster and ran the business team's query:

db.xxx_collection.find({timetag: { $gt: 1636513802167 }, $or: [ { nto: "zhang" }, { nfrom: "zhang" } ], nid: 50, status: 0 }).sort({ timetag:1 }).limit(200)

The first query returned immediately, but fetching stalled at the 102nd document. The operation produced a slow-query log entry showing that a getMore had been triggered:

2021-12-15T10:21:55.007+0800 I COMMAND [conn857] command xxx_db.xxx_collection appName: "MongoDB Shell" command: getMore { getMore: 1244654399487918694, collection: "xxx_collection", ****** planSummary: IXSCAN { timetag: 1 } cursorid:1244654399487918694 keysExamined:11338296 docsExamined:11338296 cursorExhausted:1 numYields:88583 nreturned:99 reslen:100170 locks:{ ReplicationStateTransition: { acquireCount: { w: 88584 } }, Global: { acquireCount: { r: 88584 } }, Database: { acquireCount: { r: 88584 } }, Collection: { acquireCount: { r: 88584 } }, Mutex: { acquireCount: { r: 1 } } } storage:{ data: { bytesRead: 15442700982, timeReadingMicros: 40865619 }, timeWaitingMicros: { cache: 33773 } } protocol:op_msg 65270ms

Troubleshooting

Confirming whether the problem is in getMore

In MongoDB, the default batch size for query results is 101: a query finds 101 results and returns them in one go, and when more results are needed it finds the next batch and returns it through a getMore operation.

The number of results MongoDB returns at a time, including for getMore, can be set with batchSize.

If getMore itself were at fault, adjusting batchSize should in theory make no difference to whether the problem appears, so we made the following adjustments.

Setting batchSize to 150:

db.xxx_collection.find({timetag: { $gt: 1636513802167 }, $or: [ { nto: "zhang" }, { nfrom: "zhang" } ], nid: 50, status: 0 }).sort({ timetag:1 }).batchSize(150).limit(200)

The first batch again returned immediately, and once more than batchSize documents had been fetched, getMore stalled as before:

2021-12-15T10:25:54.781+0800 I COMMAND [conn859] command xxx_db.xxx_collection appName: "MongoDB Shell" command: getMore { getMore: 8826588127480667691, collection: "xxx_collection", batchSize: 150, ****** planSummary: IXSCAN { timetag: 1 } cursorid:8826588127480667691 keysExamined:11338234 docsExamined:11338234 cursorExhausted:1 numYields:88582 nreturned:50 reslen:50818 locks:{ ReplicationStateTransition: { acquireCount: { w: 88583 } }, Global: { acquireCount: { r: 88583 } }, Database: { acquireCount: { r: 88583 } }, Collection: { acquireCount: { r: 88583 } }, Mutex: { acquireCount: { r: 1 } } } storage:{ data: { bytesRead: 16610295032, timeReadingMicros: 30201139 }, timeWaitingMicros: { cache: 17084 } } protocol:op_msg 53826ms

Setting it to 199 behaved much the same, but with batchSize 200 the stall moved to the very first query:

db.xxx_collection.find({timetag: { $gt: 1636513802167 }, $or: [ { nto: "zhang" }, { nfrom: "zhang" } ], nid: 50, status: 0 }).sort({ timetag:1 }).batchSize(200).limit(200)

The corresponding slow-query log:

2021-12-15T10:27:23.729+0800 I COMMAND [conn859] command xxx_db.xxx_collection appName: "MongoDB Shell" command: find { find: "xxx_collection", filter: { timetag: { $gt: 1636513802167.0 }, $or: [ { nto: "zhang" }, { nfrom: "zhang" } ], nid: 50.0, status: 0.0 }, projection: { $sortKey: { $meta: "sortKey" } }, sort: { timetag: 1.0 }, limit: 200, batchSize: 200, ****** planSummary: IXSCAN { timetag: 1 } keysExamined:11338445 docsExamined:11338445 cursorExhausted:1 numYields:88582 nreturned:200 queryHash:ECA82717 planCacheKey:AC7EC9E3 reslen:202045 locks:{ ReplicationStateTransition: { acquireCount: { w: 88583 } }, Global: { acquireCount: { r: 88583 } }, Database: { acquireCount: { r: 88583 } }, Collection: { acquireCount: { r: 88583 } }, Mutex: { acquireCount: { r: 2 } } } storage:{ data: { bytesRead: 17688667726, timeReadingMicros: 14907251 }, timeWaitingMicros: { cache: 11784 } } protocol:op_msg 36654ms

This essentially rules out the getMore operation itself. The slow-query logs show that the plan fetches and filters through the timetag index, traversing more than ten million entries in total. The problem is therefore likely in the query plan and the data distribution: the stall occurs exactly while fetching results 199~200, and walking the timetag index there is too slow.

Our analysis therefore turns to the query plan and the queried data, to confirm whether the plan is a poor fit for the target data.

Analyzing the distribution of the queried data

First we need to understand the layout of the business data and the purpose of the query. The key fields are:

{ "nto" : , "nfrom" : , "nid" : , "timetag" : , "status" : }

Judging by the collection name and the document layout, the data records the delivery of some kind of message.

The target query is:

db.xxx_collection.find({timetag: { $gt: 1636513802167 }, $or: [ { nto: "zhang" }, { nfrom: "zhang" } ], nid: 50, status: 0 }).sort({ timetag:1 }).limit(200)

What the query does is find the messages a given user sent or received after a certain point in time, with nid = 50 & status = 0. The decisive conditions are therefore the timetag and the user name. A greater-than-a-timestamp condition can cover an extremely large amount of data, which we confirm next.

Distribution of the data over timetag

mongos>db.xxx_collection.count()
538058824

The target collection holds over 500 million documents, which is large. Next we specifically extract the timetag of the 199th and 200th results, where the stall occurs.

timetag of the 199th result:

{ "_id" : ObjectId("618b380a82010a586b4265e6"), "timetag" : NumberLong("1636513802759") }

timetag of the 200th result:

{ "_id" : ObjectId("618ce26682010a586be027f9"), "timetag" : NumberLong("1636622950801") }

Call the query condition's time bound 1636513802167 T0, the 199th result's time 1636513802759 T1, and the 200th result's time 1636622950801 T2. We now analyze the data in segments delimited by these three times.

Distribution of the queried data

Total data after T0:

mongos>db.xxx_collection.find({ timetag: { $gt: 1636513802167 }}).count()
191829505

There are over 190 million documents in the target time range, a sizeable amount.

Data between T0 and T1:

mongos>db.xxx_collection.find({ $and: [{timetag: {$gt: 1636513802167}}, {timetag: {$lte: 1636513802759}}]}).count()
287

Data between T1 and T2:

mongos>db.xxx_collection.find({ $and: [{timetag: {$gt: 1636513802759}}, {timetag: {$lt: 1636622950801}}]}).count()
11338157

Data after T2:

mongos>db.xxx_collection.find({timetag: {$gte: 1636622950801}}).count()
180491061

Distribution of the query results

Total query results:

mongos>db.xxx_collection.find({ timetag: { $gt: 1636513802167 }, $or: [ { nto: "zhang" }, { nfrom: "zhang" } ], nid: 50, status: 0 }).sort({ timetag:1 }).count()
428

T0~T1:

mongos>db.xxx_collection.find({ $and:[{timetag: { $gt: 1636513802167 }}, {timetag: {$lte: 1636513802759}}], $or: [ { nto: "zhang" }, { nfrom: "zhang" } ], nid: 50, status: 0 }).sort({ timetag:1 }).count()
199

T1~T2:

mongos>db.xxx_collection.find({ $and:[{timetag: { $gt: 1636513802759 }}, {timetag: {$lt: 1636622950801}}], $or: [ { nto: "zhang" }, { nfrom: "zhang" } ], nid: 50, status: 0 }).sort({ timetag:1 }).count()
0

After T2:

mongos>db.xxx_collection.find({ timetag: { $gte: 1636622950801 }, $or: [ { nto:"zhang" }, { nfrom: "zhang" } ], nid: 50, status: 0}).sort({ timetag: 1 }).count()
229

From the data and the corresponding result counts, the results are concentrated in T0~T1 and after T2. Between T1 and T2 there are no matching results at all, yet more than ten million non-matching documents sit there. In short, the results sit at both ends of the timetag range, with a huge hole in the middle.

Analyzing the executed query plan

The original query plan

db.xxx_collection.find({timetag: { $gt: 1636513802167 }, $or: [ { nto: "zhang" }, { nfrom: "zhang" } ], nid: 50, status: 0 }).sort({ timetag:1 }).limit(200).explain("executionStats")

The resulting query plan:

"parsedQuery" : {
"$and" : [
{
"$or" : [
{
"nfrom" : {
"$eq" : "zhang"
}
},
{
"nto" : {
"$eq" : "zhang"
}
}
]
},
{
"nid" : {
"$eq" : 50
}
},
{
"status" : {
"$eq" : 0
}
},
{
"timetag" : {
"$gt" : 1636513802167
}
}
]
},
"winningPlan" : {
"stage" : "LIMIT",
"limitAmount" : 200,
"inputStage" : {
"stage" : "FETCH",
"filter" : {
"$and" : [
{
"$or" : [
{
"nfrom" : {
"$eq" : "zhang"
}
},
{
"nto" : {
"$eq" : "zhang"
}
}
]
},
{
"nid" : {
"$eq" : 50
}
},
{
"status" : {
"$eq" : 0
}
}
]
},
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"timetag" : 1
},
"indexName" : "timetag_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"timetag" : [ ]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"timetag" : [
"(1636513802167.0, inf.0]"
]
}
}
}
}

The plan shows that the original query scans the timetag index and then filters each fetched record against the remaining conditions. From the distribution analysis we know 190 million documents match the timetag condition, and scanning them is very slow even with an index. Because the cursor used by getMore was created by the original query, the same plan is used for the entire query. Below we run the original plan over each segment to corroborate that scanning by timetag is slow.

Running the original plan over the T0~T1 segment

Querying the T0~T1 data with the plan above:

"parsedQuery" : {
"$and" : [
{
"$or" : [
{
"nfrom" : {
"$eq" : "zhang"
}
},
{
"nto" : {
"$eq" : "zhang"
}
}
]
},
{
"nid" : {
"$eq" : 50
}
},
{
"status" : {
"$eq" : 0
}
},
{
"timetag" : {
"$lte" : 1636513802759
}
},
{
"timetag" : {
"$gt" : 1636513802167
}
}
]
},
"winningPlan" : {
"stage" : "LIMIT",
"limitAmount" : 200,
"inputStage" : {
"stage" : "FETCH",
"filter" : {
"$and" : [
{
"$or" : [
{
"nfrom" : {
"$eq" : "zhang"
}
},
{
"nto" : {
"$eq" : "zhang"
}
}
]
},
{
"nid" : {
"$eq" : 50
}
},
{
"status" : {
"$eq" : 0
}
}
]
},
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"timetag" : 1
},
"indexName" : "timetag_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"timetag" : [ ]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"timetag" : [
"(1636513802167.0, 1636513802759.0]"
]
}
}
}
},

The result returns immediately, because only 287 documents need to be scanned in total.

Running the original plan over the T1~T2 segment

Querying the T1~T2 data with the plan above:

"parsedQuery" : {
"$and" : [
{
"$or" : [
{
"nfrom" : {
"$eq" : "zhang"
}
},
{
"nto" : {
"$eq" : "zhang"
}
}
]
},
{
"nid" : {
"$eq" : 50
}
},
{
"status" : {
"$eq" : 0
}
},
{
"timetag" : {
"$lt" : 1636622950801
}
},
{
"timetag" : {
"$gt" : 1636513802759
}
}
]
},
"winningPlan" : {
"stage" : "SORT",
"sortPattern" : {
"timetag" : 1
},
"limitAmount" : 200,
"inputStage" : {
"stage" : "SORT_KEY_GENERATOR",
"inputStage" : {
"stage" : "FETCH",
"filter" : {
"$and" : [
{
"nid" : {
"$eq" : 50
}
},
{
"status" : {
"$eq" : 0
}
}
]
},
"inputStage" : {
"stage" : "OR",
"inputStages" : [
{
"stage" : "IXSCAN",
"keyPattern" : {
"nto" : 1,
"validflag" : 1,
"timetag" : 1
},
"indexName" : "nto_1_validflag_1_timetag_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"nto" : [ ],
"validflag" : [ ],
"timetag" : [ ]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"nto" : [
"["zhang", "zhang"]"
],
"validflag" : [
"[MinKey, MaxKey]"
],
"timetag" : [
"(1636513802759.0, 1636622950801.0)"
]
}
},
{
"stage" : "IXSCAN",
"keyPattern" : {
"nfrom" : 1,
"timetag" : 1
},
"indexName" : "nfrom_1_timetag_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"nfrom" : [ ],
"timetag" : [ ]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"nfrom" : [
"["zhang", "zhang"]"
],
"timetag" : [
"(1636513802759.0, 1636622950801.0)"
]
}
}
]
}
}
}
},

The query plan changed, so we force the timetag_1 index with hint to reproduce the original plan:

mongos>db.xxx_collection.find({ $and:[{timetag: { $gt: 1636513802759 }}, {timetag:{$lt: 1636622950801}}], $or: [ { nto: "zhang" }, { nfrom:"zhang" } ], nid: 50, status: 0 }).sort({ timetag: 1}).limit(200).hint("timetag_1").explain("executionStats")

The query plan:

"parsedQuery" : {
"$and" : [
{
"$or" : [
{
"nfrom" : {
"$eq" : "zhang"
}
},
{
"nto" : {
"$eq" : "zhang"
}
}
]
},
{
"nid" : {
"$eq" : 50
}
},
{
"status" : {
"$eq" : 0
}
},
{
"timetag" : {
"$lt" : 1636622950801
}
},
{
"timetag" : {
"$gt" : 1636513802759
}
}
]
},
"winningPlan" : {
"stage" : "LIMIT",
"limitAmount" : 200,
"inputStage" : {
"stage" : "FETCH",
"filter" : {
"$and" : [
{
"$or" : [
{
"nfrom" : {
"$eq" : "zhang"
}
},
{
"nto" : {
"$eq" : "zhang"
}
}
]
},
{
"nid" : {
"$eq" : 50
}
},
{
"status" : {
"$eq" : 0
}
}
]
},
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"timetag" : 1
},
"indexName" : "timetag_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"timetag" : [ ]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"timetag" : [
"(1636513802759.0, 1636622950801.0)"
]
}
}
}
},

Query time:

2021-12-15T11:18:43.650+0800 I COMMAND [conn913] command xxx_db.xxx_collection appName: "MongoDB Shell" command: find { find: "xxx_collection", filter: { $and: [ { timetag: { $gt: 1636513802759.0 } }, { timetag: { $lt: 1636622950801.0 } } ], $or: [ { nto: "zhang" }, { nfrom: "zhang" } ], nid: 50.0, status: 0.0 }, projection: { $sortKey: { $meta: "sortKey" } }, sort: { timetag: 1.0 }, hint: { $hint: "timetag_1" }, limit: 200, runtimeConstants: { localNow: new Date(1639538294423), clusterTime: Timestamp(1639538291, 1) }, shardVersion: [ Timestamp(0, 0), ObjectId('000000000000000000000000') ], ****** planSummary: IXSCAN { timetag: 1 } keysExamined:11338157 docsExamined:11338157 cursorExhausted:1 numYields:88579 nreturned:0 reslen:386 locks:{ ReplicationStateTransition: { acquireCount: { w: 88580 } }, Global: { acquireCount: { r: 88580 } }, Database: { acquireCount: { r: 88580 } }, Collection: { acquireCount: { r: 88580 } }, Mutex: { acquireCount: { r: 2 } } } storage:{ data: { bytesRead: 16223299833, timeReadingMicros: 9431804 }, timeWaitingMicros: { cache: 14082 } } protocol:op_msg 29226ms

Querying the empty T1~T2 region is very slow, taking 29 seconds, because more than ten million documents must be scanned.

Running the original plan over the segment after T2

mongos> db.xxx_collection.find({timetag: { $gt: 1636622950801 }, $or: [ { nto: "zhang" }, { nfrom: "zhang" } ], nid: 50, status: 0 }).sort({ timetag:1 }).limit(200).explain("executionStats")

The query plan:

"parsedQuery" : {
"$and" : [
{
"$or" : [
{
"nfrom" : {
"$eq" : "zhang"
}
},
{
"nto" : {
"$eq" : "zhang"
}
}
]
},
{
"nid" : {
"$eq" : 50
}
},
{
"status" : {
"$eq" : 0
}
},
{
"timetag" : {
"$gt" : 1636622950801
}
}
]
},
"winningPlan" : {
"stage" : "SORT",
"sortPattern" : {
"timetag" : 1
},
"limitAmount" : 200,
"inputStage" : {
"stage" : "SORT_KEY_GENERATOR",
"inputStage" : {
"stage" : "FETCH",
"filter" : {
"$and" : [
{
"nid" : {
"$eq" : 50
}
},
{
"status" : {
"$eq" : 0
}
}
]
},
"inputStage" : {
"stage" : "OR",
"inputStages" : [
{
"stage" : "IXSCAN",
"keyPattern" : {
"nto" : 1,
"validflag" : 1,
"timetag" : 1
},
"indexName" : "nto_1_validflag_1_timetag_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"nto" : [ ],
"validflag" : [ ],
"timetag" : [ ]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"nto" : [
"["zhang", "zhang"]"
],
"validflag" : [
"[MinKey, MaxKey]"
],
"timetag" : [
"(1636622950801.0, inf.0]"
]
}
},
{
"stage" : "IXSCAN",
"keyPattern" : {
"nfrom" : 1,
"timetag" : 1
},
"indexName" : "nfrom_1_timetag_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"nfrom" : [ ],
"timetag" : [ ]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"nfrom" : [
"["zhang", "zhang"]"
],
"timetag" : [
"(1636622950801.0, inf.0]"
]
}
}
]
}
}
}
},

The query plan changed again, so we force the timetag_1 index to reproduce the original plan:

mongos>db.xxx_collection.find({ timetag: { $gt: 1636622950801 }, $or: [ { nto: "zhang" }, { nfrom: "zhang" } ], nid: 50, status: 0 }).sort({ timetag:1 }).limit(200).hint("timetag_1").explain("executionStats")

The query plan:

"parsedQuery" : {
"$and" : [
{
"$or" : [
{
"nfrom" : {
"$eq" : "zhang"
}
},
{
"nto" : {
"$eq" : "zhang"
}
}
]
},
{
"nid" : {
"$eq" : 50
}
},
{
"status" : {
"$eq" : 0
}
},
{
"timetag" : {
"$gt" : 1636622950801
}
}
]
},
"winningPlan" : {
"stage" : "LIMIT",
"limitAmount" : 200,
"inputStage" : {
"stage" : "FETCH",
"filter" : {
"$and" : [
{
"$or" : [
{
"nfrom" : {
"$eq" : "zhang"
}
},
{
"nto" : {
"$eq" : "zhang"
}
}
]
},
{
"nid" : {
"$eq" : 50
}
},
{
"status" : {
"$eq" : 0
}
}
]
},
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"timetag" : 1
},
"indexName" : "timetag_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"timetag" : [ ]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"timetag" : [
"(1636622950801.0, inf.0]"
]
}
}
}
},

Timing from the log:

2021-12-15T11:36:34.388+0800 I COMMAND [conn918] command xxx_db.xxx_collection appName: "MongoDB Shell" command: explain { explain: { find: "xxx_collection", filter: { timetag: { $gt: 1636622950801.0 }, $or: [ { nto: "zhang" }, { nfrom: "zhang" } ], nid: 50.0, status: 0.0 }, sort: { timetag: 1.0 }, hint: { $hint: "timetag_1" }, limit: 200, ****** $db: "xxx_db" } numYields:1109015 reslen:2691 locks:{ ReplicationStateTransition: { acquireCount: { w: 1109016 } }, Global: { acquireCount: { r: 1109016 } }, Database: { acquireCount: { r: 1109016 } }, Collection: { acquireCount: { r: 1109016 } }, Mutex: { acquireCount: { r: 2 } } } storage:{ data: { bytesRead: 195293544507, timeReadingMicros: 518472952 }, timeWaitingMicros: { cache: 272870 } } protocol:op_msg 801697ms

Querying the data after T2 takes 800 seconds, because 180 million documents must be scanned. The original query did not take this long here only because limit(200) caps the total number of results and just one of them falls in this segment, so the original query returns as soon as it finds that single result and never scans the entire segment.

Root cause summary

In short, the problem is that the query plan MongoDB chose does not fit the queried data.

The original plan scans the timetag index and then filters the fetched records against the remaining conditions.

The query results are split on timetag into two clusters at the two ends of the range, with a large stretch of irrelevant data in between: results 1~199 fall in T0~T1, and results from the 200th on fall after T2.

With the original plan, which scans by the timetag index alone, fetching any result past the 199th is very slow, because those results are buried among 190 million documents. This also explains why the first fetch of 101 results is fast: it only scans the 400-odd documents in T0~T1. The second fetch, for results 102 onward, is slow because more than ten million irrelevant documents must be scanned before anything can be returned. The original plan is simply too inefficient.
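The mechanics can be illustrated with a small, self-contained simulation (hypothetical data shaped like the distribution above: a block of matches, a large hole, then more matches -- not the production dataset). The first 101-result batch examines few index keys; the next batch must cross the hole:

```python
# Simulate a batched cursor driving an index scan over data whose matches
# cluster at both ends of the range, with a large non-matching "hole" in
# the middle (hypothetical layout mirroring the T0~T1 / T2 split).

def make_docs():
    docs = [{"match": True}] * 199            # T0~T1: results 1~199
    docs += [{"match": False}] * 1_000_000    # T1~T2: the hole
    docs += [{"match": True}] * 229           # after T2: remaining results
    return docs

class Cursor:
    """Minimal stand-in for a MongoDB cursor: each batch keeps scanning
    the index in order until batch_size matching docs are found."""
    def __init__(self, docs, batch_size=101):
        self.docs, self.batch_size = docs, batch_size
        self.pos = 0
        self.keys_examined = 0

    def next_batch(self):
        batch = []
        while self.pos < len(self.docs) and len(batch) < self.batch_size:
            doc = self.docs[self.pos]
            self.pos += 1
            self.keys_examined += 1
            if doc["match"]:
                batch.append(doc)
        return batch

cur = Cursor(make_docs())
first = cur.next_batch()          # initial find
k1 = cur.keys_examined
second = cur.next_batch()         # first getMore
k2 = cur.keys_examined - k1
print(len(first), k1)    # 101 results after examining only 101 keys
print(len(second), k2)   # 101 results after examining 1,000,101 keys
```

The second batch returns the same number of documents but examines four orders of magnitude more keys, which is exactly the huge keysExamined count seen in the slow-query log's getMore entries.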

Solution

Traversing the data with the timetag index alone is inefficient; compound indexes let MongoDB filter while walking the index, cutting down the number of entries examined. The plan we want therefore uses the nto+timetag and nfrom+timetag compound indexes for two parallel scans and then merges the two result streams. Since MongoDB's query optimizer cannot transform the original query into this plan on its own, we rewrite the query into a form the optimizer can recognize.

The original query:

db.xxx_collection.find({timetag: { $gt: 1636513802167 }, $or: [ { nto: "zhang" }, { nfrom: "zhang" } ], nid: 50, status: 0 }).sort({ timetag:1 }).limit(200)

is rewritten into the equivalent optimized query:

db.xxx_collection.find({$or: [ {$and: [{nto: "zhang"}, {timetag: {$gt: 1636513802167}}, { nid:50}, {status: 0}]}, {$and: [{nfrom: "zhang"}, {timetag: {$gt: 1636513802167}}, { nid:50}, {status: 0}]} ] }).sort({timetag:1}).limit(200)

and compound indexes are created on nto+timetag and on nfrom+timetag (note: the original database already had the nfrom+timetag compound index):

{
"v" : 2,
"key" : {
"nfrom" : 1,
"timetag" : 1
},
"name" : "nfrom_1_timetag_1",
"ns" : "xxx_db.xxx_collection"
},
{
"v" : 2,
"key" : {
"nto" : 1,
"timetag" : 1
},
"name" : "nto_1_timetag_1",
"ns" : "xxx_db.xxx_collection"
},

The resulting query plan:

"parsedQuery" : {
"$or" : [
{
"$and" : [
{
"nfrom" : {
"$eq" : "zhang"
}
},
{
"nid" : {
"$eq" : 50
}
},
{
"status" : {
"$eq" : 0
}
},
{
"timetag" : {
"$gt" : 1636513802167
}
}
]
},
{
"$and" : [
{
"nid" : {
"$eq" : 50
}
},
{
"nto" : {
"$eq" : "zhang"
}
},
{
"status" : {
"$eq" : 0
}
},
{
"timetag" : {
"$gt" : 1636513802167
}
}
]
}
]
},
"winningPlan" : {
"stage" : "SUBPLAN",
"inputStage" : {
"stage" : "LIMIT",
"limitAmount" : 200,
"inputStage" : {
"stage" : "PROJECTION_SIMPLE",
"transformBy" : {
"timetag" : 1
},
"inputStage" : {
"stage" : "SORT_MERGE",
"sortPattern" : {
"timetag" : 1
},
"inputStages" : [
{
"stage" : "FETCH",
"filter" : {
"$and" : [
{
"nid" : {
"$eq" : 50
}
},
{
"status" : {
"$eq" : 0
}
}
]
},
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"nfrom" : 1,
"timetag" : 1
},
"indexName" : "nfrom_1_timetag_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"nfrom" : [ ],
"timetag" : [ ]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"nfrom" : [
"["zhang", "zhang"]"
],
"timetag" : [
"(1636513802167.0, inf.0]"
]
}
}
},
{
"stage" : "FETCH",
"filter" : {
"$and" : [
{
"nid" : {
"$eq" : 50
}
},
{
"status" : {
"$eq" : 0
}
}
]
},
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"nto" : 1,
"timetag" : 1
},
"indexName" : "nto_1_timetag_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"nto" : [ ],
"timetag" : [ ]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"nto" : [
"["zhang", "zhang"]"
],
"timetag" : [
"(1636513802167.0, inf.0]"
]
}
}
}
]
}
}
}
},

This query plan matches our expectation.
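The SORT_MERGE stage in this plan merges two streams that the compound indexes already emit in timetag order, so no blocking sort is needed. A minimal Python sketch of the idea, with hypothetical in-memory "index scans" (an illustration of the technique, not the server's internals):

```python
import heapq

# Two hypothetical pre-sorted index scans: (timetag, doc) pairs as the
# nfrom_1_timetag_1 and nto_1_timetag_1 indexes would emit them.
nfrom_scan = [(5, {"nid": 50, "status": 0, "nfrom": "zhang"}),
              (9, {"nid": 50, "status": 1, "nfrom": "zhang"})]
nto_scan = [(3, {"nid": 50, "status": 0, "nto": "zhang"}),
            (7, {"nid": 50, "status": 0, "nto": "zhang"})]

def sort_merge(streams, residual_filter, limit):
    """Merge already-sorted streams (SORT_MERGE), apply the FETCH-stage
    filter (nid/status), and stop after `limit` results (LIMIT)."""
    out = []
    for timetag, doc in heapq.merge(*streams, key=lambda pair: pair[0]):
        if residual_filter(doc):
            out.append(timetag)
            if len(out) == limit:
                break
    return out

result = sort_merge([nfrom_scan, nto_scan],
                    lambda d: d["nid"] == 50 and d["status"] == 0,
                    limit=200)
print(result)  # timetags in ascending order: [3, 5, 7]
```

Because each input stream is already ordered by timetag and skips straight to the user's entries, the merge examines only keys that match nfrom/nto, instead of wading through the ten-million-document hole.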

Proof of query equivalence

The optimized query above can be derived from the original through Boolean equivalences.

The original query is:

db.xxx_collection.find({timetag: { $gt: 1636513802167 }, $or: [ { nto: "zhang" }, { nfrom: "zhang" } ], nid: 50, status: 0 }).sort({ timetag:1 }).limit(200)

Abstract the find conditions as follows:

a:timetag > 1636513802167

b:nto = "zhang"

c:nfrom = "zhang"

d:nid = 50

e:status = 0

The original query condition is thus the conjunctive form:

$$
a \land (b \lor c) \land d \land e
$$

By the distributive law, this converts into a disjunctive normal form:

$$
(a \land b \land d \land e) \lor (a \land c \land d \land e)
$$

which corresponds to the query:

db.xxx_collection.find({$or: [ {$and: [{nto: "zhang"}, {timetag: {$gt: 1636513802167}}, { nid:50}, {status: 0}]}, {$and: [{nfrom: "zhang"}, {timetag: {$gt: 1636513802167}}, { nid:50}, {status: 0}]} ] }).sort({timetag:1}).limit(200)

This establishes that the two queries are equivalent.
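Beyond the algebraic derivation, the equivalence can be checked mechanically by enumerating all truth assignments of the five abstract predicates:

```python
from itertools import product

# Brute-force check that a ∧ (b ∨ c) ∧ d ∧ e is equivalent to
# (a ∧ b ∧ d ∧ e) ∨ (a ∧ c ∧ d ∧ e) over all 2^5 assignments.
mismatches = [
    (a, b, c, d, e)
    for a, b, c, d, e in product([False, True], repeat=5)
    if (a and (b or c) and d and e)
       != ((a and b and d and e) or (a and c and d and e))
]
print("mismatching assignments:", mismatches)  # expected: []
```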

About the author: 張家僑, NetEase Games
