Query performance is very poor for 10millions records

Hi, We are working on a project which is storing the data into MongoDB collections. We have three collections and each of them has more than 10million documents. We created the index for one of the fields(date) and executed the simple aggregation query to fetch the data from one collection. But it takes more than a minute to return the aggregated data. If we are querying the fewer data with less date range, it’s coming within a seconds. If we increase the date range, then we face the latency issue.

We have increased the throughput of the server and created one primary and two replica sets in the MongoDB cluster, but the performance issue still persists.

Note : Replica set is only for reading operations

Example :

db.collection.aggregate(
[
    { $match : { date: {
            $gte: ISODate("2021-06-01T00:00:00.00Z"),
            $lte: ISODate("2021-08-30T23:59:59.00Z")}
    }},
    { $sort : { "source.value" : -1}},
    { $project : {
        title : 1,
        url : 1
    }},
    { $limit : 5}
])

Index size : 100MB

Avg Document size : 13KB

Server Details:

Hard Disk: 500GB

Ram : 16GB

If anyone provide the solution, that will be great for us

1 Like

Hello @Jony_Jackson_C, welcome to the MongoDB Community forum!

I see the aggregation query has two stages which can benefit from the indexes - the $match and the $sort. Indexes help improve the performance of a query. In aggregation queries having a $match and $sort stages at the beginning of the pipeline is a good practice (and you already have followed it). Both the stages are capable of using the indexes if properly configured - both the query as well as the indexes. Refer this Aggregation Pipeline Optimization.

You can run the explain method on the query to generate a Query Plan. The query plan shows how and if any indexes are used for a query operations - for both the match and sort operations.

I suspect that there is no index being used on the sort stage and maybe one of the reasons for the query being not performant. The query plan will show it as SORT stage, which means an in-memory sort - which can be memory and time intensive.

Please generate a query plan on the query you are having issues with and post it here. Make sure to use the “executionStats” verbosity mode.

Query performance is dependent upon many factors, including number of documents, their size, RAM memory, indexes, and also the way a query is built - to make use of the available indexes and memory resources. Refer the Indexing Strategies documentation for some ideas on how indexing can be used efficiently for sorting, using compound indexes and about query selectivity.

1 Like

Thanks, @Prasad_Saya for your valuable reply. I have executed the query with explain().

Here is the execution status result

"executionSuccess" : true,
                "nReturned" : 5,
                "executionTimeMillis" : 46698,
                "totalKeysExamined" : 261722,
                "totalDocsExamined" : 261722,
                "executionStages" : {
                        "stage" : "PROJECTION_DEFAULT",
                        "nReturned" : 5,
                        "executionTimeMillisEstimate" : 38692,
                        "works" : 261729,
                        "advanced" : 5,
                        "needTime" : 261723,
                        "needYield" : 0,
                        "saveState" : 2505,
                        "restoreState" : 2505,
                        "isEOF" : 1,
                        "transformBy" : {
                                "_id" : true,
                                "url" : true,
                                "title" : true
                        },
                        "inputStage" : {
                                "stage" : "SORT",
                                "nReturned" : 5,
                                "executionTimeMillisEstimate" : 38682,
                                "works" : 261729,
                                "advanced" : 5,
                                "needTime" : 261723,
                                "needYield" : 0,
                                "saveState" : 2505,
                                "restoreState" : 2505,
                                "isEOF" : 1,
                                "sortPattern" : {
                                        "source.value" : -1
                                },
                                "memLimit" : 104857600,
                                "limitAmount" : 5,
                                "type" : "simple",
                                "totalDataSizeSorted" : NumberLong("3592156118"),
                                "usedDisk" : false,
                                "inputStage" : {
                                        "stage" : "FETCH",
                                        "nReturned" : 261722,
                                        "executionTimeMillisEstimate" : 37991,
                                        "works" : 261723,
                                        "advanced" : 261722,
                                        "needTime" : 0,
                                        "needYield" : 0,
                                        "saveState" : 2505,
                                        "restoreState" : 2505,
                                        "isEOF" : 1,
                                        "docsExamined" : 261722,
                                        "alreadyHasObj" : 0,
                                        "inputStage" : {
                                                "stage" : "IXSCAN",
                                                "nReturned" : 261722,
                                                "executionTimeMillisEstimate" : 220,
                                                "works" : 261723,
                                                "advanced" : 261722,
                                                "needTime" : 0,
                                                "needYield" : 0,
                                                "saveState" : 2505,
                                                "restoreState" : 2505,
                                                "isEOF" : 1,
                                                "keyPattern" : {
                                                        "date" : 1
                                                },
                                                "indexName" : "date_1",
                                                "isMultiKey" : false,
                                                "multiKeyPaths" : {
                                                        "date" : [ ]
                                                },
                                                "isUnique" : false,
                                                "isSparse" : false,
                                                "isPartial" : false,
                                                "indexVersion" : 2,
                                                "direction" : "forward",
                                                "indexBounds" : {
                                                        "date" : [
                                                                "[new Date(1627776000000), new Date(1630367999000)]"
                                                        ]
                                                },
                                                "keysExamined" : 261722,
                                                "seeks" : 1,
                                                "dupsTested" : 0,
                                                "dupsDropped" : 0
                                        }
                                }
                        }
                }

From these stats, the index stage is working fine. But the FETCH and SORT stage is taking more time. So I am thinking that the working set and index are not fit into RAM. I have cross-checked my server configuration and have only 8GB RAM(I made a mistake in the above post).

My working set is 261,722docs X 13Kb = 3.4GB and my date field index is 110MB and overall collection is 750MB. And we have 2 more collections which are combined with 16GB(index).
MongoDB document says, 50% of (RAM - 1 GB). In this case, (0.5 * (8 -1GB)) = 3.5GB Ram we have. So I suspect this could be a memory problem, If so increasing the RAM will be solving the problem or any other suggestions. Please correct me If I am wrong.

Hello @Jony_Jackson_C, the presence of SORT stage (as I had mentioned already in my earlier post), indicates the sort operation is being performed in the RAM memory. Having an index on the field being sorted (and it applied on the sorted field) can eliminate this stage (and hence can improve performance). How to create an index on this field and make it work is an issue.

Here is some documentation on applying index on a field to be sorted. Please see: Sort and Non-prefix Subset of an Index ( I request you to browse the entire topic Use Indexes to Sort Query Results of the link for the sub-topic).

1 Like

Thanks and much-appreciated @Prasad_Saya After applying the index to the sort field, it returns the result in 40ms

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.