MongoDB 常用 Aggregation Pipeline Stages - group and lookup etc

最近有把一個小項目的底層數據庫由 MySQL 修改成 MongoDB,藉此更進一步熟悉了 Aggregation Pipeline Stages

本文僅包含翻譯小項目 SQL 版本所需的 MongoDB Aggregation Pipeline Stages 知識,成文時 MongoDB 最新版本爲 V4.2

db.collection.aggregate( [ { <stage> }, ... ] )

注:除了 outout、merge、$geoNear 所有 stage(階段)均可在 pipeline(管道)中出現多次,意味着本文所提這些均可出現多次

Examples

示例僅演示本文相關 stage 主要用法,詳細請訪問各個 stage 的官方文檔

db.message.insertMany([
    {"id": 1,"num": 2},
    {"id": 1,"num": 4},
    {"id": 2,"num": 6},
    {"id": 2,"num": 4},
    {"id": 2,"num": 2},
    {"id": 3,"num": 2}
]);

db.user.insertMany([
    {"id": 1,"name": "n1"},
    {"id": 2,"name": "n2"},
    {"id": 3,"name": "n3"}
]);
db.message.aggregate([{
    $match: {
        id: 2
    }
}, {
    $sort: {
        num: 1
    }
}, {
    $limit: 2
}, {
    $project: {
        _id: 0
    }
}]);

[
    {"id": 2,"num": 2},
    {"id": 2,"num": 4}
]
db.message.aggregate([{
    $group: {
        _id: "$id",
        max: {
            $max: "$num"
        },
        sum: {
            $sum: "$num"
        },
        count: {
            $sum: 1
        }
    }
}]);

[
    {"_id": 3,"max": 2,"sum": 2,"count": 1},
    {"_id": 2,"max": 6,"sum": 12,"count": 3},
    {"_id": 1,"max": 4,"sum": 6,"count": 2}
]
db.user.aggregate([{
    $match: {
        id: 3
    }
}, {
    $lookup: {
        from: "message",
        localField: "id",
        foreignField: "id",
        as: "messages"
    }
}, {
    $project: {
        _id: 0,
        "messages._id": 0
    }
}]);

{"id": 3,"name": "n3","messages": [
        {"id": 3,"num": 2}
    ]
}

$match

{ $match: { <query> } }

Place the $match as early in the aggregation pipeline as possible. Because $match limits the total number of documents in the aggregation pipeline, earlier $match operations minimize the amount of processing down the pipe

If you place a $match at the very beginning of a pipeline, the query can take advantage of indexes

$sort

{ $sort: { <field1>: <sort order>, <field2>: <sort order> ... } }

set the sort order to 1 or -1 to specify an ascending or descending sort respectively

When a $sort precedes a $limit and there are no intervening stages that modify the number of documents, the optimizer can coalesce the $limit into the $sort

The $sort stage has a limit of 100 megabytes of RAM. By default, if the stage exceeds this limit, $sort will produce an error. To allow for the handling of large datasets, set the allowDiskUse option to true to enable $sort operations to write to temporary files

$sort operator can take advantage of an index when placed at the beginning of the pipeline or placed before the $project, $unwind, and $group aggregation operators. If $project, $unwind, or $group occur prior to the $sort operation, $sort cannot use any indexes

$limit

{ $limit: <positive integer> }

$limit takes a positive integer that specifies the maximum number of documents to pass along

$group

{ $group: { _id: <expression>, <field1>: { <accumulator1> : <expression1> }, ... } }

The _id field is mandatory; however, you can specify an _id value of null, or any other constant value, to calculate accumulated values for all the input documents as a whole

The remaining computed fields are optional and computed using the <accumulator> operators

Accumulator Operator

$avg Returns an average of numerical values. Ignores non-numeric values

$max Returns the highest expression value for each group

$min Returns the lowest expression value for each group

$sum Returns a sum of numerical values. Ignores non-numeric values

$lookup

Equality Match

To perform an equality match between a field from the input documents with a field from the documents of the “joined” collection

{
   $lookup:
     {
       from: <collection to join>,
       localField: <field from the input documents>,
       foreignField: <field from the documents of the "from" collection>,
       as: <output array field>
     }
}

from Specifies the collection in the same database to perform the join with. The from collection cannot be sharded

localField Specifies the field from the documents input to the $lookup stage

foreignField Specifies the field from the documents in the from collection

as Specifies the name of the new array field to add to the input documents. The new array field contains the matching documents from the from collection. If the specified name already exists in the input document, the existing field is overwritten

if does not contain the localField or foreignField, the $lookup treats the value as null for matching purposes

Join Conditions and Uncorrelated Sub-queries

$skip

{ $skip: <positive integer> }

$skip takes a positive integer that specifies the maximum number of documents to skip

$sample

{ $sample: { size: <positive integer> } }

Randomly selects the specified number of documents from its input

$sample may output the same document more than once in its result set

$count

{ $count: <string> }

<string> is the name of the output field which has the count as its value. <string> must be a non-empty string, must not start with $ and must not contain the . character

The $count stage is equivalent to the following $group + $project sequence

db.collection.aggregate( [
   { $group: { _id: null, myCount: { $sum: 1 } } },
   { $project: { _id: 0 } }
] )
發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章