Aggregation query is taking too long

Hi, :wave:  The below query takes more then a minute to aggregate the data. I have around 6 million records in a collection.
I tried multiple approaches to optimise it but nothing worked.

I am using the mongoDB version 4.4.8.

Through below query I am trying to get the total count and average of all the records of a customer in given period. The period can be day, week, month and year).

db.getCollection(collectionName).aggregate([
    {
        $match: {
            "customer.id": { $in: ["10001","10002"] }
	    "date.year": { $in: [ 2020, 2021, 2022 ] }
        }
    },
    {
        $group:
            {
                _id: { "customer_id": "$customer.id" },
                "total_records": { $sum: 1 },
                "total_rating": { "$avg": "$rating.my_rating.rating" }
            }
    },
    {
        $project:
            {
                _id: 0,
                "customer_id": "$_id.customer_id",
                "total_records": "$total_records",
                "total_rating": "$total_rating"
            }
    }
])

The document in collection is look like this :
{
    "_id" : ObjectId("5r21grnf457sdfbhdghdsh876c17a1"),
    "id" : "12345",  
    "date" : {
        "date_id" : NumberInt(20180906),
        "date_value" : "2018-09-06",
        "day" : NumberInt(6),
        "day_short_name" : "Thu",
        "day_long_name" : "Thursday",
        "week_id" : NumberInt(201836),
        "week" : NumberInt(36),
        "week_short_name" : "WK36",
        "week_long_name" : "WEEK36",
        "week_year" : "WK36'18",
        "month_id" : NumberInt(201809),
        "month" : NumberInt(9),
        "month_short_name" : "Sep",
        "month_long_name" : "September",
        "month_year" : "Sep'18",
        "qtr_id" : NumberInt(201803),
        "quarter" : NumberInt(3),
        "qtr_short_name" : "Q3",
        "qtr_long_name" : "QUARTER3",
        "qtr_year" : "Q3'18",
        "year" : NumberInt(2018)
    },
    "customer" : {
        "id" : "888",
        "name" : "ABCD",
        "status" : ""
    },
    "category" : {
        "id" : "1",
        "name" : "Consumers"
    },
    "review" : {
        "source_id" : "10",
        "source_name" : "Facebook"        
    },
    "reviewer" : {
        "commentedby" : "XYZ",
        "userurl" : "",
        "address" : "",
        "city" : "",
        "state" : "",
        "country" : ""
    },   
    "rating" : {
        "my_rating" : {"rating" : 5 }
    },
    "processdata" : {
        "processstatus" : "xxxx",
        "processdatetime" : ISODate("2017-11-11T00:43:58.000+0000"),
        "created_by" : "xxxx",
        "created_date" : ISODate("2017-11-11T00:43:58.000+0000"),
        "updated_by" : "xxxx",
        "updated_date" : ISODate("2018-11-12T19:08:04.000+0000"),
        "updated_count" : NumberInt(0)
    },
    "customer_id" : "999"
}

Below is the index which covers the filter query :
{ "customer.id" : 1.0, "date.year" : 1.0 }

Below is the query explain(executionStats) :
{
    "stages" : [
        {
            "$cursor" : {
                "queryPlanner" : {
                    "plannerVersion" : 1.0,
                    "namespace" : "xxxxx",
                    "indexFilterSet" : false,
                    "parsedQuery" : {
                        "$and" : [ 
                            {
                                "customer.id" : {
                                    "$in" : [
                                        "10001",
                                        "10002"
                                    ]
                                }
                            },
                            {
                                "date.year" : {
                                    "$in" : [
                                        2020.0,
                                        2021.0,
                                        2022.0
                                    ]
                                }
                            }
                        ]
                    },
                    "collation" : {
                        "locale" : "en",
                        "caseLevel" : false,
                        "caseFirst" : "off",
                        "strength" : 2.0,
                        "numericOrdering" : false,
                        "alternate" : "non-ignorable",
                        "maxVariable" : "punct",
                        "normalization" : false,
                        "backwards" : false,
                        "version" : "57.1"
                    },
                    "queryHash" : "xxxxxx",
                    "planCacheKey" : "xxxxxx`Preformatted text`",
                    "winningPlan" : {
                        "stage" : "PROJECTION_DEFAULT",
                        "transformBy" : {
                            "customer.id" : 1.0,
                            "rating.my_rating.rating" : 1.0,
                            "_id" : 0.0
                        },
                        "inputStage" : {
                            "stage" : "FETCH",
                            "inputStage" : {
                                "stage" : "IXSCAN",
                                "keyPattern" : {
                                    "customer.id" : 1.0,
                                    "date.year" : 1.0
                                },
                                "indexName" : "idx_customer_year",
                                "collation" : {
                                    "locale" : "en",
                                    "caseLevel" : false,
                                    "caseFirst" : "off",
                                    "strength" : 2.0,
                                    "numericOrdering" : false,
                                    "alternate" : "non-ignorable",
                                    "maxVariable" : "punct",
                                    "normalization" : false,
                                    "backwards" : false,
                                    "version" : "57.1"
                                },
                                "isMultiKey" : false,
                                "multiKeyPaths" : {
                                    "customer.id" : [

                                    ],
                                    "date.year" : [

                                    ]
                                },
                                "isUnique" : false,
                                "isSparse" : false,
                                "isPartial" : false,
                                "indexVersion" : 2.0,
                                "direction" : "forward",
                                "indexBounds" : {
                                    "customer.id" : [
                                        "[\"\\x14\\x12\\x12\\x16\\x01\b\", \"\\x14\\x12\\x12\\x16\\x01\b\"]",
                                        "[\"\\x1A\\x14\\x12\\x12\\x01\b\", \"\\x1A\\x14\\x12\\x12\\x01\b\"]"
                                    ],
                                    "date.year" : [
                                        "[2020, 2020]",
                                        "[2021, 2021]",
                                        "[2022, 2022]"
                                    ]
                                }
                            }
                        }
                    },
                    "rejectedPlans" : [...]
                },
                "executionStats" : {
                    "executionSuccess" : true,
                    "nReturned" : 472770.0,
                    "executionTimeMillis" : 108840.0,
                    "totalKeysExamined" : 472773.0,
                    "totalDocsExamined" : 472770.0,
                    "executionStages" : {
                        "stage" : "PROJECTION_DEFAULT",
                        "nReturned" : 472770.0,
                        "executionTimeMillisEstimate" : 98016.0,
                        "works" : 472773.0,
                        "advanced" : 472770.0,
                        "needTime" : 2.0,
                        "needYield" : 0.0,
                        "saveState" : 4602.0,
                        "restoreState" : 4602.0,
                        "isEOF" : 1.0,
                        "transformBy" : {
                            "customer.id" : 1.0,
                            "rating.my_rating.rating" : 1.0,
                            "_id" : 0.0
                        },
                        "inputStage" : {
                            "stage" : "FETCH",
                            "nReturned" : 472770.0,
                            "executionTimeMillisEstimate" : 97188.0,
                            "works" : 472773.0,
                            "advanced" : 472770.0,
                            "needTime" : 2.0,
                            "needYield" : 0.0,
                            "saveState" : 4602.0,
                            "restoreState" : 4602.0,
                            "isEOF" : 1.0,
                            "docsExamined" : 472770.0,
                            "alreadyHasObj" : 0.0,
                            "inputStage" : {
                                "stage" : "IXSCAN",
                                "nReturned" : 472770.0,
                                "executionTimeMillisEstimate" : 1720.0,
                                "works" : 472773.0,
                                "advanced" : 472770.0,
                                "needTime" : 2.0,
                                "needYield" : 0.0,
                                "saveState" : 4602.0,
                                "restoreState" : 4602.0,
                                "isEOF" : 1.0,
                                "keyPattern" : {
                                    "customer.id" : 1.0,
                                    "date.year" : 1.0
                                },
                                "indexName" : "idx_customer_year",
                                "collation" : {
                                    "locale" : "en",
                                    "caseLevel" : false,
                                    "caseFirst" : "off",
                                    "strength" : 2.0,
                                    "numericOrdering" : false,
                                    "alternate" : "non-ignorable",
                                    "maxVariable" : "punct",
                                    "normalization" : false,
                                    "backwards" : false,
                                    "version" : "57.1"
                                },
                                "isMultiKey" : false,
                                "multiKeyPaths" : {
                                    "customer.id" : [

                                    ],
                                    "date.year" : [

                                    ]
                                },
                                "isUnique" : false,
                                "isSparse" : false,
                                "isPartial" : false,
                                "indexVersion" : 2.0,
                                "direction" : "forward",
                                "indexBounds" : {
                                    "customer.id" : [
                                        "[\"\\x14\\x12\\x12\\x16\\x01\b\", \"\\x14\\x12\\x12\\x16\\x01\b\"]",
                                        "[\"\\x1A\\x14\\x12\\x12\\x01\b\", \"\\x1A\\x14\\x12\\x12\\x01\b\"]"
                                    ],
                                    "date.year" : [
                                        "[2020, 2020]",
                                        "[2021, 2021]",
                                        "[2022, 2022]"
                                    ]
                                },
                                "keysExamined" : 472773.0,
                                "seeks" : 3.0,
                                "dupsTested" : 0.0,
                                "dupsDropped" : 0.0
                            }
                        }
                    }
                }
            },
            "nReturned" : NumberLong(472770),
            "executionTimeMillisEstimate" : NumberLong(107046)
        },
        {
            "$group" : {
                "_id" : {
                    "customer_id" : "$customer.id"
                },
                "total_records" : {
                    "$sum" : {
                        "$const" : 1.0
                    }
                },
                "total_rating" : {
                    "$avg" : "$rating.my_rating.rating"
                }
            },
            "nReturned" : NumberLong(2),
            "executionTimeMillisEstimate" : NumberLong(108040)
        },
        {
            "$project" : {
                "customer_id": "$_id.customer_id",
                "total_records": "$total_records",
                "total_rating": "$total_rating"
                "_id" : false
            },
            "nReturned" : NumberLong(2),
            "executionTimeMillisEstimate" : NumberLong(108040)
        }
    ],
    "serverInfo" : {
        "host" : "xxxx",
        "port" : xxxx,
        "version" : "4.4.8",
        "gitVersion" : "xxxxxxxx"
    },
    "ok" : 1.0
}

It would be helfull if anybody can give some ideas to optimise this query.

Thanks in advance.

Sometimes performance issues are not logic specific.

Your query is doing IXSCAN. Also nReturned, totalKeysExamined and totalDocsExamined are almost all identical. For this specific query, adding rating.my_rating.rating to your index might help.

Sometimes performance issues are related to your data size vs system configuration size. If your documents need to read from disk because the working set does not fit in RAM it will be slow. In your case you FETCH ~470K documents. And your documents appear to be bloated (I mean big). The date field is certainly over-engineered as it contains many values that can be easily computed from a simple Date field.

Hi Steevej,

Thanks for chipping in. Could you kindly advise me on the appropriate RAM to use for the below-DB storage?

image

First, I mentioned

So your case might be different.

Nobody can really recommend an exact size. It depends on working set. Which might be 100% of your data and indexes size, or just 20%. I don’t know. Only you can determine that. If your disk is working 100% of the time you need more RAM, if your disk is working under 20% your problem lies elsewhere.

Thank you Steveej.

I’ll verify how much of the disk is being used when the query is run and will decide the RAM based on that.

1 Like