Grouping on multiple fields faster when using an array as key?

alexwchr · May 23, 2023, 8:30am

Hello,

I stumbled into this yesterday and couldn’t find anything about it online. An identical aggregation became much faster than switching the object key to array. Examples:

Slow:

[
  {
    $group: {
      _id: { Field1: "$Field1", Field2: "$Field2" },
      Field1: {
        $first: "$Field1",
      },
      Field2: {
        $first: "$Field2",
      },
      Field3: {
        $push: {
          Value: "$Value",
          Date: "$Date",
        },
      },
    },
  },
]

Fast:

[
  {
    $group: {
      _id: ["$Field1", "$Field2"],
      Field1: {
        $first: "$Field1",
      },
      Field2: {
        $first: "$Field2",
      },
      Field3: {
        $push: {
          Value: "$Value",
          Date: "$Date",
        },
      },
    },
  },
]

Any clues as to why this is? I assume it has something to do with how MongoDB treats the keys internally and I also noticed that the execution plan changed (the one with object using slot-based, the other not).

I haven’t found any examples using an array online, but it yields an identical resultset in our case, just much faster.

Thanks!

Aasawari · June 1, 2023, 4:13am

Hi @alexwchr and welcome to MongoDB community forums!!

Ideally, the group stage would treat both the fields in the similar way. However, to understand more, could you help me with some details to replicate the same in my local environment:

A sample document and the collection size for which the query has been tried.
The desired output from the aggregation query
The MongoDB version you are on.
Explain output for both the query tried.

Regards
Aasawari

alexwchr · June 16, 2023, 8:51am

Hi,

MongoDB version is 6.0.2. Collection contains 20M docs. Before the group stage we run a $match that filters out approx 1M of these docs.

I am not able to provide you with the other details you requested on a public forum since it’s too cumbersome to anonymize the schema details. Is there a way for me to e-mail these to you?

What I can say is that with the array I get

explainVersion "1"

and with the object

explainVersion "2"

Indicating slot vs non-slot based execution.

Aasawari · June 29, 2023, 4:59am

Hi @alexwchr

Thank you for sharing the details.

Unfortunately without some document example and explain plan output, it’s difficult to determine what’s really going on. This is because MongoDB query planner could tailor its approach depending on how the documents are structured, indexes on the collection, and other information.

Unlike SQL databases with rigid schema, MongoDB does not have the luxury of knowing beforehand the content and datatype of each collection.It will be helpful if you can provide an example document, the output of db.collection.explain('executionStats').aggregate(...) for both queries, the output of db.collection.getIndexes() , and db.collection.stats().

I don’t think any of those commands will print an actual document in their output, only some statistics about the collection and the aggregation’s execution. However we do need an example document to recreate what you’re seeing.

Regards
Aasawari