So, I have a collection with millions of documents. I’m trying to build an aggregation that, after some filtering, retrieves a field together with a count of related documents in the same collection.
Here’s my pipeline:
[{$match: {
field1: 'xxxxxxxx-yyyy-zzzz-wwww-qqqqqqqqqqqq'
}},
{$limit: 50000},
{$lookup: {
from: 'collection',
localField: 'field2',
foreignField: 'field2',
as: 'field2_field2'
}},
{$lookup: {
from: 'collection',
localField: 'field2',
foreignField: 'field3.field2',
as: 'field2_field3'
}},
{$lookup: {
from: 'collection',
localField: 'field3.field2',
foreignField: 'field2',
as: 'field3_field2'
}},
{$project: {
field2: 1,
ids: {
$concatArrays: [
{
$ifNull: [
'$field2_field2',
[]
]
},
{
$ifNull: [
'$field2_field3',
[]
]
},
{
$ifNull: [
'$field3_field2',
[]
]
}
]
}
}},
{$project: {
field2: 1,
ids: {
$filter: {
input: '$ids',
as: 'item',
cond: {
$and: [
{
$ne: [
'$$item._id',
'$_id'
]
},
{
$eq: [
'$$item.field4',
'23'
]
},
{
$eq: [
{
$type: '$$item.deleted'
},
'missing'
]
}
]
}
}
}
}},
{$project: {
_id: 0,
field2: 1,
counts: {
$size: '$ids'
}
}}]
I’m having trouble with the indexes, which I don’t think are being used properly.
First of all, when I run explain("executionStats") on the aggregate, the output doesn’t show which index each stage used, so I can’t tell how the $lookup stages were resolved.
Secondly, I’m limiting the query to 50K documents, because if I run it over everything matching field1 (2M+ docs) it never finishes.
I’m using an index on field1:
{“field1”:1, …}
(there are a lot of compound indexes with field1 as prefix)
and for the $lookup stages:
{“field2”:1, “field4”:1}
(used for another query)
{“field3.field2”:1, “field2”:1}
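For reference, this is roughly how those indexes were created (a mongosh sketch with the same anonymized field names; the collection name is a placeholder, and this is not meant to be run standalone):

```javascript
// Existing indexes, as created (names anonymized)
db.collection.createIndex({ field1: 1, field2: 1 });          // one of several compound indexes with field1 as prefix
db.collection.createIndex({ field2: 1, field4: 1 });          // originally built for another query
db.collection.createIndex({ "field3.field2": 1, field2: 1 }); // intended for the lookups on field3.field2
```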
And these are the execution stats:
{ executionSuccess: true,
nReturned: 50000,
executionTimeMillis: 18732,
totalKeysExamined: 50000,
totalDocsExamined: 50000,
executionStages:
{ stage: 'LIMIT',
nReturned: 50000,
executionTimeMillisEstimate: 84,
works: 50001,
advanced: 50000,
needTime: 0,
needYield: 0,
saveState: 56,
restoreState: 56,
isEOF: 1,
limitAmount: 50000,
inputStage:
{ stage: 'PROJECTION_DEFAULT',
nReturned: 50000,
executionTimeMillisEstimate: 84,
works: 50000,
advanced: 50000,
needTime: 0,
needYield: 0,
saveState: 56,
restoreState: 56,
isEOF: 0,
transformBy:
{ _id: 1,
field3_field2: 1,
'field3.field2': 1,
field2_field3: 1,
field2_field2: 1,
field2: 1 },
inputStage:
{ stage: 'FETCH',
nReturned: 50000,
executionTimeMillisEstimate: 73,
works: 50000,
advanced: 50000,
needTime: 0,
needYield: 0,
saveState: 56,
restoreState: 56,
isEOF: 0,
docsExamined: 50000,
alreadyHasObj: 0,
inputStage:
{ stage: 'IXSCAN',
nReturned: 50000,
executionTimeMillisEstimate: 21,
works: 50000,
advanced: 50000,
needTime: 0,
needYield: 0,
saveState: 56,
restoreState: 56,
isEOF: 0,
keyPattern: { field1: 1, field2: 1 },
indexName: 'i_field1_field2',
isMultiKey: false,
multiKeyPaths: { field1: [], field2: [] },
isUnique: false,
isSparse: false,
isPartial: false,
indexVersion: 2,
direction: 'forward',
indexBounds:
{ field1: [ '["xxxxxxxx-yyyy-zzzz-wwww-qqqqqqqqqqqq"]' ],
field2: [ '[MinKey, MaxKey]' ] },
keysExamined: 50000,
seeks: 1,
dupsTested: 0,
dupsDropped: 0 } } } } } },
nReturned: 50000,
executionTimeMillisEstimate: 215 },
{ '$lookup':
{ from: 'collection',
as: 'field2_field2',
localField: 'field2',
foreignField: 'field2' },
nReturned: 50000,
executionTimeMillisEstimate: 5456 },
{ '$lookup':
{ from: 'collection',
as: 'field2_field3',
localField: 'field2',
foreignField: 'field3.field2' },
nReturned: 50000,
executionTimeMillisEstimate: 14914 },
{ '$lookup':
{ from: 'collection',
as: 'field3_field2',
localField: 'field3.field2',
foreignField: 'field2' },
nReturned: 50000,
executionTimeMillisEstimate: 18680 },
{ '$project':
{ _id: true,
field2: true,
ids:
{ '$concatArrays':
[ { '$ifNull': [ '$field2_field2', { '$const': [] } ] },
{ '$ifNull': [ '$field2_field3', { '$const': [] } ] },
{ '$ifNull': [ '$field3_field2', { '$const': [] } ] } ] } },
nReturned: 50000,
executionTimeMillisEstimate: 18681 },
{ '$project':
{ _id: true,
field2: true,
ids:
{ '$filter':
{ input: '$ids',
as: 'item',
cond:
{ '$and':
[ { '$ne': [ '$$item._id', '$_id' ] },
{ '$eq': [ '$$item.field4', { '$const': '23' } ] },
{ '$eq': [ { '$type': [ '$$item.deleted' ] }, { '$const': 'missing' } ] } ] } } } },
nReturned: 50000,
executionTimeMillisEstimate: 18684 },
{ '$project':
{ field2: true,
counts: { '$size': [ '$ids' ] },
_id: false },
nReturned: 50000,
executionTimeMillisEstimate: 18685 } ],
Here’s an example of a final output document:
{"field2": "a field", "counts": 15}
- Did I miss any possible index?
- Can the pipeline be optimized?
- And why does the $match stage FETCH documents even though it is supported by the field1 index?