Docs Menu

Aggregation Pipeline Optimization

On this page

Aggregation pipeline operations have an optimization phase which attempts to reshape the pipeline for improved performance.

To see how the optimizer transforms a particular aggregation pipeline, include the explain option in the db.collection.aggregate() method.

Optimizations are subject to change between releases.

In addition to learning about the aggregation pipeline optimizations performed during the optimization phase, you will also see how to improve aggregation pipeline performance using indexes and document filters. See Improve Performance with Indexes and Document Filters.

The aggregation pipeline can determine if it requires only a subset of the fields in the documents to obtain the results. If so, the pipeline will only use those required fields, reducing the amount of data passing through the pipeline.

For an aggregation pipeline that contains a projection stage ($project or $unset or $addFields or $set) followed by a $match stage, MongoDB moves any filters in the $match stage that do not require values computed in the projection stage to a new $match stage before the projection.

If an aggregation pipeline contains multiple projection and/or $match stages, MongoDB performs this optimization for each $match stage, moving each $match filter before all projection stages that the filter does not depend on.

Consider a pipeline of the following stages:

{ $addFields: {
maxTime: { $max: "$times" },
minTime: { $min: "$times" }
} },
{ $project: {
_id: 1, name: 1, times: 1, maxTime: 1, minTime: 1,
avgTime: { $avg: ["$maxTime", "$minTime"] }
} },
{ $match: {
name: "Joe Schmoe",
maxTime: { $lt: 20 },
minTime: { $gt: 5 },
avgTime: { $gt: 7 }
} }

The optimizer breaks up the $match stage into four individual filters, one for each key in the $match query document. The optimizer then moves each filter before as many projection stages as possible, creating new $match stages as needed. Given this example, the optimizer produces the following optimized pipeline:

{ $match: { name: "Joe Schmoe" } },
{ $addFields: {
maxTime: { $max: "$times" },
minTime: { $min: "$times" }
} },
{ $match: { maxTime: { $lt: 20 }, minTime: { $gt: 5 } } },
{ $project: {
_id: 1, name: 1, times: 1, maxTime: 1, minTime: 1,
avgTime: { $avg: ["$maxTime", "$minTime"] }
} },
{ $match: { avgTime: { $gt: 7 } } }

The $match filter { avgTime: { $gt: 7 } } depends on the $project stage to compute the avgTime field. The $project stage is the last projection stage in this pipeline, so the $match filter on avgTime could not be moved.

The maxTime and minTime fields are computed in the $addFields stage but have no dependency on the $project stage. The optimizer created a new $match stage for the filters on these fields and placed it before the $project stage.

The $match filter { name: "Joe Schmoe" } does not use any values computed in either the $project or $addFields stages so it was moved to a new $match stage before both of the projection stages.

Note

After optimization, the filter { name: "Joe Schmoe" } is in a $match stage at the beginning of the pipeline. This has the added benefit of allowing the aggregation to use an index on the name field when initially querying the collection. See Improve Performance with Indexes and Document Filters for more information.

When you have a sequence with $sort followed by a $match, the $match moves before the $sort to minimize the number of objects to sort. For example, if the pipeline consists of the following stages:

{ $sort: { age : -1 } },
{ $match: { status: 'A' } }

During the optimization phase, the optimizer transforms the sequence to the following:

{ $match: { status: 'A' } },
{ $sort: { age : -1 } }

When possible, when the pipeline has the $redact stage immediately followed by the $match stage, the aggregation can sometimes add a portion of the $match stage before the $redact stage. If the added $match stage is at the start of a pipeline, the aggregation can use an index as well as query the collection to limit the number of documents that enter the pipeline. See Improve Performance with Indexes and Document Filters for more information.

For example, if the pipeline consists of the following stages:

{ $redact: { $cond: { if: { $eq: [ "$level", 5 ] }, then: "$$PRUNE", else: "$$DESCEND" } } },
{ $match: { year: 2014, category: { $ne: "Z" } } }

The optimizer can add the same $match stage before the $redact stage:

{ $match: { year: 2014 } },
{ $redact: { $cond: { if: { $eq: [ "$level", 5 ] }, then: "$$PRUNE", else: "$$DESCEND" } } },
{ $match: { year: 2014, category: { $ne: "Z" } } }

New in version 3.2.

When you have a sequence with $project or $unset followed by $skip, the $skip moves before $project. For example, if the pipeline consists of the following stages:

{ $sort: { age : -1 } },
{ $project: { status: 1, name: 1 } },
{ $skip: 5 }

During the optimization phase, the optimizer transforms the sequence to the following:

{ $sort: { age : -1 } },
{ $skip: 5 },
{ $project: { status: 1, name: 1 } }

When possible, the optimization phase coalesces a pipeline stage into its predecessor. Generally, coalescence occurs after any sequence reordering optimization.

Changed in version 4.0.

When a $sort precedes a $limit, the optimizer can coalesce the $limit into the $sort if no intervening stages modify the number of documents (e.g. $unwind, $group). MongoDB will not coalesce the $limit into the $sort if there are pipeline stages that change the number of documents between the $sort and $limit stages..

For example, if the pipeline consists of the following stages:

{ $sort : { age : -1 } },
{ $project : { age : 1, status : 1, name : 1 } },
{ $limit: 5 }

During the optimization phase, the optimizer coalesces the sequence to the following:

{
"$sort" : {
"sortKey" : {
"age" : -1
},
"limit" : NumberLong(5)
}
},
{ "$project" : {
"age" : 1,
"status" : 1,
"name" : 1
}
}

This allows the sort operation to only maintain the top n results as it progresses, where n is the specified limit, and MongoDB only needs to store n items in memory [1]. See $sort Operator and Memory for more information.

Note
Sequence Optimization with $skip

If there is a $skip stage between the $sort and $limit stages, MongoDB will coalesce the $limit into the $sort stage and increase the $limit value by the $skip amount. See $sort + $skip + $limit Sequence for an example.

[1] The optimization will still apply when allowDiskUse is true and the n items exceed the aggregation memory limit.

When a $limit immediately follows another $limit, the two stages can coalesce into a single $limit where the limit amount is the smaller of the two initial limit amounts. For example, a pipeline contains the following sequence:

{ $limit: 100 },
{ $limit: 10 }

Then the second $limit stage can coalesce into the first $limit stage and result in a single $limit stage where the limit amount 10 is the minimum of the two initial limits 100 and 10.

{ $limit: 10 }

When a $skip immediately follows another $skip, the two stages can coalesce into a single $skip where the skip amount is the sum of the two initial skip amounts. For example, a pipeline contains the following sequence:

{ $skip: 5 },
{ $skip: 2 }

Then the second $skip stage can coalesce into the first $skip stage and result in a single $skip stage where the skip amount 7 is the sum of the two initial limits 5 and 2.

{ $skip: 7 }

When a $match immediately follows another $match, the two stages can coalesce into a single $match combining the conditions with an $and. For example, a pipeline contains the following sequence:

{ $match: { year: 2014 } },
{ $match: { status: "A" } }

Then the second $match stage can coalesce into the first $match stage and result in a single $match stage

{ $match: { $and: [ { "year" : 2014 }, { "status" : "A" } ] } }

New in version 3.2.

When a $unwind immediately follows another $lookup, and the $unwind operates on the as field of the $lookup, the optimizer can coalesce the $unwind into the $lookup stage. This avoids creating large intermediate documents.

For example, a pipeline contains the following sequence:

{
$lookup: {
from: "otherCollection",
as: "resultingArray",
localField: "x",
foreignField: "y"
}
},
{ $unwind: "$resultingArray"}

The optimizer can coalesce the $unwind stage into the $lookup stage. If you run the aggregation with explain option, the explain output shows the coalesced stage:

{
$lookup: {
from: "otherCollection",
as: "resultingArray",
localField: "x",
foreignField: "y",
unwinding: { preserveNullAndEmptyArrays: false }
}
}

New in version 5.2.

Starting in MongoDB 5.2, MongoDB uses the slot-based execution query engine to execute $group stages when $group is either:

  • The first stage in the pipeline.
  • Part of a series of stages executed by the slot-based engine that occurs at the beginning of the pipeline. For example, if a pipeline begins with $match followed by $group, the $match and $group stages are executed by the slot-based engine.

In most cases, the slot-based engine provides improved performance and lower CPU and memory costs compared to the classic query engine.

To verify that the slot-based engine is used, run the aggregation with the .explain() option. This option outputs information on the aggregation's query plan.

When the slot-based query execution engine is used for $group, the explain results include:

  • explain.explainVersion: '2'
  • explain.queryPlanner.winningPlan.queryPlan.stage: "GROUP"

The following sections show how you can improve aggregation performance using indexes and document filters.

The query planner analyzes an aggregation pipeline to determine if indexes can be used to improve pipeline performance.

The following list shows some pipeline stages that can use indexes:

$match stage
$match can use an index to filter documents if $match is the first stage in a pipeline.
$sort stage
$sort can use an index if $sort is not preceded by a $project, $unwind, or $group stage.
$group stage

$group can potentially use an index to find the first document in each group if:

  • $group is preceded by $sort that sorts the field to group by, and
  • there is an index on the grouped field that matches the sort order, and
  • $first is the only accumulator in $group.

See $group Performance Optimizations for an example.

$geoNear stage
$geoNear can use a geospatial index. $geoNear must be the first stage in an aggregation pipeline.

Starting in MongoDB 4.2, in some cases, an aggregation pipeline can use a DISTINCT_SCAN index plan, which typically has higher performance than IXSCAN.

Indexes can cover queries in an aggregation pipeline. A covered query uses an index to return all of the documents and has high performance.

If your aggregation operation requires only a subset of the documents in a collection, filter the documents first:

  • Use the $match, $limit, and $skip stages to restrict the documents that enter the pipeline.
  • When possible, put $match at the beginning of the pipeline to use indexes that scan the matching documents in a collection.
  • $match followed by $sort at the start of the pipeline is equivalent to a single query with a sort, and can use an index.

A pipeline contains a sequence of $sort followed by a $skip followed by a $limit:

{ $sort: { age : -1 } },
{ $skip: 10 },
{ $limit: 5 }

The optimizer performs $sort + $limit Coalescence to transforms the sequence to the following:

{
"$sort" : {
"sortKey" : {
"age" : -1
},
"limit" : NumberLong(15)
}
},
{
"$skip" : NumberLong(10)
}

MongoDB increases the $limit amount with the reordering.

Tip
See also:
Give Feedback
MongoDB logo
© 2021 MongoDB, Inc.

About

  • Careers
  • Investor Relations
  • Legal Notices
  • Privacy Notices
  • Security Information
  • Trust Center
© 2021 MongoDB, Inc.