Array Index issue - Not working

We are trying to fetch data out of 100 million records. The filter attributes are in different arrays. We created separate indexes on both arrays. We used multiple stages to filter the records. But when wee checking the execution plan, both match stages are combined and the index is chosen. Its taking more than 20 seconds to get the final result set(only 23 records). Below the execution plan

                  'namespace': 'dap-ods.bookings',
                  'optimizedPipeline': True,
                  'parsedQuery': {'$and': [{'contacts.value': {'$eq': '11133 3336-8878'}},
                                           {'segments.departureAirportCode': {'$eq': 'SEA'}},
                                           {'segments.departureDateTimeStnLocal': {'$lte': datetime.datetime(2023, 10, 14, 0, 0)}},
                                           {'segments.departureDateTimeStnLocal': {'$gte': datetime.datetime(2023, 10, 13, 0, 0)}}]},
                  'planCacheKey': '6820AF83',
                  'plannerVersion': 1,
                  'queryHash': 'D0CBB6D1',
                  'rejectedPlans': [{'inputStage': {'filter': {'$and': [{'contacts.value': {'$eq': '11133 3336-8878'}},
                                                                        {'segments.departureAirportCode': {'$eq': 'SEA'}},
                                                                        {'segments.departureDateTimeStnLocal': {'$gte': datetime.datetime(2023, 10, 13, 0, 0)}}]},
                                                    'inputStage': {'direction': 'forward',
                                                                   'indexBounds': {'segments.departureDateTimeStnLocal': ['[new '
                                                                                                                          'Date(-9223372036854775808), '
                                                                                                                          'new '
                                                                                                                          'Date(1697241600000)]']},
                                                                   'indexName': 'segments.departureDateTimeStnLocal_1',
                                                                   'indexVersion': 2,
                                                                   'isMultiKey': True,
                                                                   'isPartial': False,
                                                                   'isSparse': False,
                                                                   'isUnique': False,
                                                                   'keyPattern': {'segments.departureDateTimeStnLocal': 1},
                                                                   'multiKeyPaths': {'segments.departureDateTimeStnLocal': ['segments']},
                                                                   'stage': 'IXSCAN'},
                                                    'stage': 'FETCH'},
                                     'stage': 'PROJECTION_SIMPLE',
                                     'transformBy': {'_id': True,
                                                     'businessKey': True,
                                                     'passengersInfo': True}},
                                    {'inputStage': {'filter': {'$and': [{'contacts.value': {'$eq': '1111333336 8878'}},
                                                                        {'segments.departureAirportCode': {'$eq': 'SEA'}},
                                                                        {'segments.departureDateTimeStnLocal': {'$lte': datetime.datetime(2023, 10, 14, 0, 0)}}]},
                                                    'inputStage': {'direction': 'forward',
                                                                   'indexBounds': {'segments.departureDateTimeStnLocal': ['[new '
                                                                                                                          'Date(1697155200000), '
                                                                                                                          'new '
                                                                                                                          'Date(9223372036854775807)]']},
                                                                   'indexName': 'segments.departureDateTimeStnLocal_1',
                                                                   'indexVersion': 2,
                                                                   'isMultiKey': True,
                                                                   'isPartial': False,
                                                                   'isSparse': False,
                                                                   'isUnique': False,
                                                                   'keyPattern': {'segments.departureDateTimeStnLocal': 1},
                                                                   'multiKeyPaths': {'segments.departureDateTimeStnLocal': ['segments']},
                                                                   'stage': 'IXSCAN'},
                                                    'stage': 'FETCH'},
                                     'stage': 'PROJECTION_SIMPLE',
                                     'transformBy': {'_id': True,
                                                     'businessKey': True,
                                                     'passengersInfo': True}},
                                    {'inputStage': {'filter': {'$and': [{'contacts.value': {'$eq': '11133 3336-8878'}},
                                                                        {'segments.departureAirportCode': {'$eq': 'SEA'}},
                                                                        {'segments.departureDateTimeStnLocal': {'$gte': datetime.datetime(2023, 10, 13, 0, 0)}}]},
                                                    'inputStage': {'direction': 'forward',
                                                                   'indexBounds': {'segments.departureAirportCode': ['[MinKey, '
                                                                                                                     'MaxKey]'],
                                                                                   'segments.departureDateTimeStnLocal': ['[new '
                                                                                                                          'Date(-9223372036854775808), '
                                                                                                                          'new '
                                                                                                                          'Date(1697241600000)]']},
                                                                   'indexName': 'seg_departureDate_depAirport_1',
                                                                   'indexVersion': 2,
                                                                   'isMultiKey': True,
                                                                   'isPartial': False,
                                                                   'isSparse': False,
                                                                   'isUnique': False,
                                                                   'keyPattern': {'segments.departureAirportCode': 1,
                                                                                  'segments.departureDateTimeStnLocal': 1},
                                                                   'multiKeyPaths': {'segments.departureAirportCode': ['segments'],
                                                                                     'segments.departureDateTimeStnLocal': ['segments']},
                                                                   'stage': 'IXSCAN'},
                                                    'stage': 'FETCH'},
                                     'stage': 'PROJECTION_SIMPLE',
                                     'transformBy': {'_id': True,
                                                     'businessKey': True,
                                                     'passengersInfo': True}},
                                    {'inputStage': {'filter': {'$and': [{'contacts.value': {'$eq': '11133 3336-8878'}},
                                                                        {'segments.departureAirportCode': {'$eq': 'SEA'}},
                                                                        {'segments.departureDateTimeStnLocal': {'$lte': datetime.datetime(2023, 10, 14, 0, 0)}}]},
                                                    'inputStage': {'direction': 'forward',
                                                                   'indexBounds': {'segments.departureAirportCode': ['[MinKey, '
                                                                                                                     'MaxKey]'],
                                                                                   'segments.departureDateTimeStnLocal': ['[new '
                                                                                                                          'Date(1697155200000), '
                                                                                                                          'new '
                                                                                                                          'Date(9223372036854775807)]']},
                                                                   'indexName': 'seg_departureDate_depAirport_1',
                                                                   'indexVersion': 2,
                                                                   'isMultiKey': True,
                                                                   'isPartial': False,
                                                                   'isSparse': False,
                                                                   'isUnique': False,
                                                                   'keyPattern': {'segments.departureAirportCode': 1,
                                                                                  'segments.departureDateTimeStnLocal': 1},
                                                                   'multiKeyPaths': {'segments.departureAirportCode': ['segments'],
                                                                                     'segments.departureDateTimeStnLocal': ['segments']},
                                                                   'stage': 'IXSCAN'},
                                                    'stage': 'FETCH'},
                                     'stage': 'PROJECTION_SIMPLE',
                                     'transformBy': {'_id': True,
                                                     'businessKey': True,
                                                     'passengersInfo': True}}],
                  'winningPlan': {'inputStage': {'filter': {'$and': [{'segments.departureAirportCode': {'$eq': 'SEA'}},
                                                                     {'segments.departureDateTimeStnLocal': {'$lte': datetime.datetime(2023, 10, 14, 0, 0)}},
                                                                     {'segments.departureDateTimeStnLocal': {'$gte': datetime.datetime(2023, 10, 13, 0, 0)}}]},
                                                 'inputStage': {'direction': 'forward',
                                                                'indexBounds': {'contacts.value': ['["11133 3336-8878", "11133 3336-8878"]']},
                                                                'indexName': 'contacts_1',
                                                                'indexVersion': 2,
                                                                'isMultiKey': True,
                                                                'isPartial': False,
                                                                'isSparse': False,
                                                                'isUnique': False,
                                                                'keyPattern': {'contacts.value': 1},
                                                                'multiKeyPaths': {'contacts.value': ['contacts']},
                                                                'stage': 'IXSCAN'},
                                                 'stage': 'FETCH'},
                                  'stage': 'PROJECTION_SIMPLE',
                                  'transformBy': {'_id': True,
                                                  'businessKey': True,
                                                  'passengersInfo': True}}},```
 

Here is the original query we tried
```agg_pipeline = [
  {'$match':{"contacts.value":"11133 3336-8878"}},
  {'$match':{"segments.departureDateTimeStnLocal":{'$gte':start_date, '$lte':end_date},'segments.departureAirportCode':'SEA'}},
  {'$project':{'businessKey':1, 'passengersInfo':1}}
]
explain_output = db.command('aggregate', 'test_collection', pipeline=agg_pipeline, explain=True)```

Hello @Habeeb_Raja ,

Firstly, I would recommend you to run your query with explain in executionStats mode (e.g. `db.collection.explain(‘executionStats’).aggregate(…)) and examine the output.

As per my understanding, You can consider following changes

  • Combine your two $match stages into one to make the query more efficient. This will reduce the number of documents that need to be processed and eliminate the need for combining separate $match conditions.

  • If your result set is large but you only need a limited number of records, consider adding a $limit stage to limit the number of documents returned. This can improve performance and reduce network traffic.

  • If you need to process a large number of documents, you might consider splitting your aggregation into smaller batches and using the $skip and $limit stages to process the data incrementally.

Here’s the improved aggregation pipeline with the suggested changes:

agg_pipeline = [
    {'$match': {
        "contacts.value": "11133 3336-8878",
        "segments.departureDateTimeStnLocal": { '$gte': start_date, '$lte': end_date },
        "segments.departureAirportCode": "SEA"
    }},
    {'$project': {
        'businessKey': 1,
        'passengersInfo': 1
    }}
]

Remember to fine-tune your indexes and adapt the pipeline based on the specifics of your dataset and query patterns for the best performance.

Regards,
Tarun

@Tarun_Gaur Thanks for the update.

We have tried the changes you suggested. Here is the output

"executionStats" : {
                "executionSuccess" : true,
                "nReturned" : 23,
                "executionTimeMillis" : 69019,
                "totalKeysExamined" : 6666,
                "totalDocsExamined" : 6666,
                "executionStages" : {
                        "stage" : "FETCH",
                        "filter" : {
                                "$and" : [
                                        {
                                                "segments.departureAirportCode" : {
                                                        "$eq" : "SEA"
                                                }
                                        },
                                        {
                                                "segments.departureDateTimeStnLocal" : {
                                                        "$lte" : ISODate("2023-10-14T00:00:00Z")
                                                }
                                        },
                                        {
                                                "segments.departureDateTimeStnLocal" : {
                                                        "$gte" : ISODate("2023-10-13T00:00:00Z")
                                                }
                                        }
                                ]
                        },
                        "nReturned" : 23,
                        "executionTimeMillisEstimate" : 32163,
                        "works" : 6668,
                        "advanced" : 23,
                        "needTime" : 6643,
                        "needYield" : 0,
                        "saveState" : 4543,
                        "restoreState" : 4543,
                        "isEOF" : 1,
                        "docsExamined" : 6666,
                        "alreadyHasObj" : 0,
                        "inputStage" : {
                                "stage" : "IXSCAN",
                                "nReturned" : 6666,
                                "executionTimeMillisEstimate" : 38,
                                "works" : 6667,
                                "advanced" : 6666,
                                "needTime" : 0,
                                "needYield" : 0,
                                "saveState" : 4543,
                                "restoreState" : 4543,
                                "isEOF" : 1,
                                "keyPattern" : {
                                        "contacts.value" : 1
                                },
                                "indexName" : "contacts_1",
                                "isMultiKey" : true,
                                "multiKeyPaths" : {
                                        "contacts.value" : [
                                                "contacts"
                                        ]
                                },
                                "isUnique" : false,
                                "isSparse" : false,
                                "isPartial" : false,
                                "indexVersion" : 2,
                                "direction" : "forward",
                                "indexBounds" : {
                                        "contacts.value" : [
                                                "[\"11133 3336-8878\", \"11133 3336-8878\"]"
                                        ]
                                },
                                "keysExamined" : 6666,
                                "seeks" : 1,
                                "dupsTested" : 6666,
                                "dupsDropped" : 0
                        }
                }
}

Final result will be 23 documents. But the Keys Examined is 6666. How to reduce that. Also we are not much clear about 3rd point “aggregation into smaller batches and using the $skip and $limit stages to process the data incrementally”. Can you share some reference to that

Queries might return many results. To make navigating results easier, one can use the aggregation pipeline stages to paginate the query results. As you mentioned that your results only includes 23 documents so this scenario does not require pagination. Kindly refer How to Paginate Query Results to learn more about this.

When the collection gets much larger, with respect to the indexing, the query still should return relatively quick (with caveat of data size vs. hardware capabilities, of course). I think the indexes you have created might be optimal for your use-case.

Can you share more details mentioned below for me to provide further optimization recommendations?

  • MongoDB Version
  • Sample documents
  • Existing indexes on his collection. Please provide output of db.collection.getIndexes().
  • Output of db.collection.stats()

Regards,
Tarun