Lets consider a document structure with arrays with sub-documents as elements. For example, a collection with these two documents:
{
"_id" : 1,
"fld1" : "str-1",
"fld2" : [
{
"a" : 1,
"b" : [
{
"x" : 1,
"y" : 2
},
{
"x" : 11,
"y" : 22
}
]
},
{
"a" : 2,
"b" : [
{
"x" : 111,
"y" : 222
}
]
}
]
},
{
"_id" : 2,
"fld1" : "str-2",
"fld2" : [
{
"a" : 9,
"b" : [
{
"x" : 91,
"y" : 92
},
{
"x" : 911,
"y" : 922
}
]
}
]
}
The Query:
In an aggregation query, the $match+$project stage filtering and $match+$unwind+$match filtering work differently on a document. But, can produce similar output.
The following query is using $match+$unwind+$match filter. Note the output:
db.nests2.aggregate( [
{ $match: { fld1: "str-1" } },
{ $unwind: "$fld2" },
{ $match: { "fld2.a": 1 } },
] ).pretty()
{
"_id" : 1,
"fld1" : "str-1",
"fld2" : {
"a" : 1,
"b" : [
{
"x" : 1,
"y" : 2
},
{
"x" : 11,
"y" : 22
}
]
}
}
The following query is using $match (filter on multiple levels) + $project (using $addFields in the example).
db.nests2.aggregate( [
{ $match: { fld1: "str-1", "fld2.a": 1 } },
{ $addFields: { fld2: {
$filter: {
input: "$fld2",
cond: {
$eq: [ "$$this.a", 1 ]
}
}
} } }
] ).pretty()
{
"_id" : 1,
"fld1" : "str-1",
"fld2" : [
{
"a" : 1,
"b" : [
{
"x" : 1,
"y" : 2
},
{
"x" : 11,
"y" : 22
}
]
}
]
}
Indexes:
If a compound index is defined on the keys { fld1 : 1, "fld2.a" : 1 }, this index is applied on both the queries. But, on the first $match stage only. This can be verified by generating a query plan using explain() on both queries.
( [EDIT ADD]: Even if an index is created on the single field { "fld2.a": 1 } it will not be used by the query).
Same goes as we go down multiple levels down. For example in the following case with $match and the index:
{ $match: { fld1: "str-1", "fld2.a": 1, "fld2.b.y": 22 } }
and the index, { fld1 : 1, "fld2.a" : 1, "fld2.b.y": 1 }
NOTES:
In general, indexes on the document fields are best utilized when the aggregation stages that use the index come early in the pipeline (mostly as the first stage as in this case). The stages $match and $sort use the indexes for performance. Also see documentation: Aggregation Pipeline Optimization.
Note that creating indexes on arrays (a.k.a. multikey indexes ) on large array fields can affect performance, as the indexes can grow very large.