I want to find, for each groupId, the latest record for a given main/sub category pair, and then return only those records whose latest status is "failure".
Example data (real documents are larger, with many more fields):
{
"groupId": "1",
"mainCategory": "breakfast",
"subCategory": "eggs",
"status": "success",
"dateField": "2023-01-20T00:00:00Z"
},
{
"groupId": "1",
"mainCategory": "breakfast",
"subCategory": "bacon",
"status": "success",
"dateField": "2023-01-20T00:00:00Z"
},
{
"groupId": "1",
"mainCategory": "breakfast",
"subCategory": "eggs",
"status": "failure",
"dateField": "2023-01-01T00:00:00Z"
}
With the above data, if I search for { mainCategory: "breakfast", subCategory: "eggs", status: "failure" },
I'd expect to get no results back, because the latest result (after sorting on the date) is a success.
If I did the same but searched for a status of 'success', I'd expect to get one result back, as the latest breakfast + eggs result in that group is a success.
This is why, in the pipeline below, the status match has to come after the other stages: I don't want to filter anything out by status until I know whether it was the latest record or not.
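That expected behaviour can be sketched as a small in-memory simulation in plain JavaScript (this is only an illustration of the intended semantics, not how the server executes the pipeline; `latestWithStatus` is a hypothetical helper name):

```javascript
// In-memory sketch: keep the latest record per (groupId, subCategory)
// for the requested main/sub pair, then filter by status last.
const docs = [
  { groupId: "1", mainCategory: "breakfast", subCategory: "eggs",  status: "success", dateField: "2023-01-20T00:00:00Z" },
  { groupId: "1", mainCategory: "breakfast", subCategory: "bacon", status: "success", dateField: "2023-01-20T00:00:00Z" },
  { groupId: "1", mainCategory: "breakfast", subCategory: "eggs",  status: "failure", dateField: "2023-01-01T00:00:00Z" },
];

function latestWithStatus(docs, mainCategory, subCategory, status) {
  const latest = new Map();
  for (const d of docs) {
    if (d.mainCategory !== mainCategory || d.subCategory !== subCategory) continue;
    const key = `${d.groupId}|${d.subCategory}`;
    const cur = latest.get(key);
    // ISO-8601 date strings compare correctly as plain strings.
    if (!cur || d.dateField > cur.dateField) latest.set(key, d);
  }
  // Status is applied only after the latest record per group is known.
  return [...latest.values()].filter((d) => d.status === status);
}

console.log(latestWithStatus(docs, "breakfast", "eggs", "failure").length); // 0
console.log(latestWithStatus(docs, "breakfast", "eggs", "success").length); // 1
```

With the example data this returns nothing for "failure" and one record for "success", matching the expectations above.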
My index
{
"mainCategory" : 1,
"subCategory" : 1,
"dateField" : -1
}
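For reference, a createIndex call matching that spec would look something like this ("myIndex" is assumed from the name reported in the execution stats below):

```javascript
// Assumed creation command for the compound index described above.
db.getCollection("my-collection").createIndex(
  { mainCategory: 1, subCategory: 1, dateField: -1 },
  { name: "myIndex" }
);
```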
Example aggregate query
db.getCollection("my-collection").explain("executionStats").aggregate([
{
"$match": {
"mainCategory": "breakfast",
"subCategory": "eggs",
"dateField": { $gte: "2023-01-01T00:00:00Z" }
}
},
{
"$sort": {
"dateField": -1.0
}
},
{
"$group": {
"_id": {
"groupId": "$groupId",
"subCategory": "$subCategory"
},
"latestRecord": {
"$first": "$$ROOT"
}
}
},
{
"$match": {
"latestRecord.status": "failure"
}
},
{
"$limit": 100.0
},
{
"$replaceRoot": {
"newRoot": "$latestRecord"
}
}
])
Execution Stats
{
"executionStats" : {
"executionSuccess" : true,
"executionTimeMillis" : "663941.077",
"planningTimeMillis" : "0.530",
"executionStages" : {
"stage" : "SUBSCAN",
"nReturned" : "100",
"executionTimeMillisEstimate" : "663921.912",
"inputStage" : {
"stage" : "LIMIT_SKIP",
"nReturned" : "100",
"executionTimeMillisEstimate" : "663914.564",
"inputStage" : {
"stage" : "SUBSCAN",
"nReturned" : "100",
"executionTimeMillisEstimate" : "663914.535",
"inputStage" : {
"stage" : "SORT_AGGREGATE",
"nReturned" : "100",
"executionTimeMillisEstimate" : "663914.492",
"inputStage" : {
"stage" : "SORT",
"nReturned" : "8547",
"executionTimeMillisEstimate" : "663655.029",
"inputStage" : {
"stage" : "IXSCAN",
"nReturned" : "461546",
"executionTimeMillisEstimate" : "660287.707",
"indexName" : "myIndex", // It is correctly using my index here
"direction" : "forward"
}
}
}
}
}
}
},
"serverInfo" : {
"host" : "my-database",
"port" : 27031.0,
"version" : "4.0.0" // DocumentDB, if there is no way around this issue, could consider switching to Atlas
},
I can’t figure out whether my index is just bad, or whether there is a better way to write the query.
If I keep running the same query, it does get very fast. The issue is that the initial queries take so long that they time out through AWS API Gateway.