MongoDB aggregation query that experiences a significant increase in execution time when adding $count or $group stages

Mehul_Sanghvi · May 13, 2024, 9:47am

I am attempting to execute a query to count documents based on the ledgerName field, which is used to search for the required ledger name. Additionally, I need to obtain distinct counts of each status for the searched ledger name and store all data that matches each status into a list.

Below is the demo data for three collections involved in my query:

1. Settings: (Collection Size: 30K documents)
{

	_id: // ObjectId of doc
	company_id: // ObjectId of company
	ledgerName_id: // ID of ledger in String

}
2. History: (Collection Size: 600K documents)
{
	_id: // ObjectId of doc
	company_id: // ObjectId of company
	ledgerName_id: // ID of ledger in String
	createdAt: // Date of creation of history
	status: // Status 1 or 0
}
3. Ledger_Name: (Collection Size 50M documents)
{
	_id: // ObjectId of doc
	company_id: // ObjectId of company
	name_id: // ID of ledger in String
	ledgerName: // Name of ledger in string
}

Here is the query that utilizes indexed scans across the Settings collection and also leverages indices in the other two collections during the lookup stages:

Query:
db.getCollection("Settings").aggregate(
[{
	$match:{
	"company_id": // Some Company ID
	}
},
{
	$lookup:{
		as: "SettingsData",
		from: "Settings",
		"let": {
			"company_id": "$company_id",
			"ledgerName_id": "$ledgerName_id"
                },
		pipeline:[
                    {
                        "$match": {
                            "$expr": {
                                "$and": [
                                    { "$eq": [ "$company_id", "$company_id" ] },
                                    { "$eq": [ "$ledgerName_id", "$ledgerName_id" ] }
                                ]
                            },
                            "createdAt": {
                                "$gte": ISODate("2024-04-01T00:00:00.000+0530"),
                                "$lt": ISODate("2025-04-01T00:00:00.000+0530")
                            },

                        }
                    },
                    { "$sort": { "createdAt": -1 } }, // Sort by createdAt in descending order
                    { "$limit": 1 }, //
                    {
                        "$project": {
                            "_id": 1,
                            "createdAt": 1,
                            "status": 1,
                        }
                    }
                ]
	}
},
{
	"$unwind": {
		"path": "$SettingsData",
		"preserveNullAndEmptyArrays": true
            }
},
{
            "$lookup": {
                "as": "ledgersData",
                "from": "Ledger_Name",
                "let": {
                    "company_id": "$company_id",
                    "ledgerName_id": "$ledgerName_id"
                },
                "pipeline": [
                    {
                        "$match": {
                            "$expr": {
                                "$and": [
                                    { "$eq": [ "$company_id", "$company_id" ] },
                                    { "$eq": [ "$name_id", "$ledgerName_id" ] } 
				] 
				}
			}
		    },
                    {
                        "$project": {
                            "_id": 0,
                            "ledgerName": 1 // Search Term
                        }
                    }
                ]
            }
        },	
},
{
	"$unwind": {
                "path": "$ledgersData",
                "preserveNullAndEmptyArrays": true
        }
},
{
	"$count": "count"
}

When I run this query without any aggregation operations like $count or $group, it completes within 500 milliseconds. However, adding these stages extends the execution time to 15-20 seconds. Could anyone help me in optimizing this query?

Peter_Hubbard · May 14, 2024, 8:59am

$group and $count are so-called “blocking stages”, which means you need to wait for them to count (or group) ALL the results from the previous stage. This is expected behaviour.

Are you sure the query is actually completing after 500ms, and not just a cursor batch??

Mehul_Sanghvi · May 20, 2024, 12:13pm

@Peter_Hubbard Hello

Yes I am sure that without $count stage query is taking below 500ms and after adding it, goes to 15 to 20 seconds.

steevej · May 21, 2024, 2:02pm

Please share the code you use without the $count stage.

Simply doing result = …aggregate( [ the pipeline without $count ] ) does not process all the documents of the result set so in fact you just get

Calling result.toArray() should get you all the documents.

Since you $unwind, just to count, you might want to simply $project the size of $ledgersData, which will not be blocking. And then $group to $sum of size of each array. This would at lease reduce the potential extra RAM usage of unwind.

And with a second look at the code, you could try a $count in the last $lookup pipeline. I am just not too sure of the empty array case.

steevej · May 28, 2024, 12:14pm

@Mehul_Sanghvi, please do not let your thread to die with proper closure. Please followup.

Mehul_Sanghvi · May 30, 2024, 7:58am

@steevej Appologies for late reply.

Answer to your 1st question: the query that I have provided, just remove $count stage from it. It is the same query.

Additional pipelines will include $group stage that will fetch summation of docmentations with different business logics. (I can’t share that thing with you that’s why i have created a simple sample query with count stage. If that query is executed in minimum amount of time then I could also get those thing too)

steevej · May 31, 2024, 11:36am

So you confirm, as Peter_Hubbard hinted, that your query is NOT completing after 500ms because just removing the $count stage means that you

To complete the query, rather than a simple cursor batch, you have to

Please share the explain plan.

Mehul_Sanghvi · June 6, 2024, 12:04pm

Explain output for 1st Lookup stage:

   "totalDocsExamined" : NumberLong(42526),
    "totalKeysExamined" : NumberLong(42526),
    "collectionScans" : NumberLong(0),
    "indexesUsed" : [
        "company_id_1_ledgerName_id_1_createdAt_1"
    ],
    "nReturned" : NumberLong(42544),
    "executionTimeMillisEstimate" : NumberLong(639)

Explain output for 2nd Lookup stage:

    "totalDocsExamined" : NumberLong(42534),
    "totalKeysExamined" : NumberLong(42534),
    "collectionScans" : NumberLong(0),
    "indexesUsed" : [
        "company_id_1_ledgerName_id_1_name_1"
    ],
    "nReturned" : NumberLong(42544),
    "executionTimeMillisEstimate" : NumberLong(9909)