MongoDB slow $unwind and $group aggregation

I'm having some trouble with my MongoDB aggregation. I have documents like:

{
    _id: "123456",
    values: [
        {
            "value": "A",
            "begin": 0,
            "end": 1
        },
        {
            "value": "B",
            "begin": 1,
            "end": 2
        },
        {
            "value": "C",
            "begin": 3,
            "end": 7
        }
    ],
    "name": "test"
}

And I want to count how often each “value” occurs across the “values” arrays. With some help I got the following aggregation:

db.collection.aggregate([
  {$unwind: "$values"},
  {$group: {_id: "$values.value", count: {$sum: 1}}}
])
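
For the example document above, this gives exactly what I want (the order of the groups isn't guaranteed):

{ "_id" : "A", "count" : 1 }
{ "_id" : "B", "count" : 1 }
{ "_id" : "C", "count" : 1 }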

The problem is: it takes about 20 seconds to get the result for 6k documents. Is there anything I can do to optimize this?

Greetings

Hi @Nikolai_Klingeln and welcome to the MongoDB Community forums! :wave:

I have a couple questions for you:

  • Are the 6k documents counted before or after you run $unwind? If before, how many subdocuments does each values array have on average? (There's a quick way to check this below.)
  • What version of MongoDB are you running?
  • Is MongoDB doing a lot of other processing at the same time?
  • What amount of RAM and CPU does your machine have?
  • Does this machine only run MongoDB, or does it run other services as well?

All of these could have impacts on MongoDB and the processing of this query.
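
For the first question, if you're unsure, a quick aggregation like the following sketch reports the array sizes (I'm using db.collection as a placeholder for your collection name, matching your post):

db.collection.aggregate([
  {
    $group: {
      _id: null,
      avgSize: { $avg: { $size: "$values" } },   // average entries per "values" array
      totalSize: { $sum: { $size: "$values" } }  // total documents $unwind would produce
    }
  }
])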


Hey, thanks for the welcome :slight_smile:
The 6k documents are before the unwind; after the unwind I have a total of 32 million documents.
I use MongoDB from Java, and I have the latest version of MongoDB there.
I have 16 GB of RAM and a Ryzen 4500U (6 cores @ ~2.4 GHz).
I have some other programs running, but only lightweight applications. Before I start the Java program, CPU usage is at only 5% and only 7 GB of RAM are in use.

I forgot to mention it. I also tried this aggregation:

db.collection.aggregate([
   {
      $project: {
         values: {
            $map: {
               input: { $setUnion: "$values.value" },  // the distinct values in this document
               as: "value",
               in: {
                  value: "$$value",
                  count: {
                     $size: {
                        $filter: {
                           input: "$values",
                           cond: { $eq: ["$$this.value", "$$value"] }  // occurrences of this value
                        }
                     }
                  }
               }
            }
         }
      }
   },
   { $unwind: "$values" },
   { $group: { _id: "$values.value", count: { $sum: "$values.count" } } }
])

I used this in the hope that it would create fewer documents, but it got even worse :confused:

The $project and $unwind stages were fast, but the $group stage was very slow, and I had to wait 4 minutes for the aggregation (it was 20 seconds before).

Could indexes help me with my problem? I'm at the point where I'm trying everything :sweat_smile:

Sorry for the delayed response. I was sick over the weekend and just got back to this thread.

If you have 6k documents before the $unwind and 32M after, that means your values arrays have ~5,333 items on average. Sending 32M documents to a $group stage is going to take time.

How often does this query run? If it's not that often, I would just let it be. 20 seconds to process 32M documents doesn't seem like a lot of time to me. It's rare that I ever want to run an aggregation over my entire dataset, but then I don't know your data or the requirements of the query you're running, so this might be perfectly normal in this case.

You probably won't get much help from an index since you're unwinding an array, but you could try. MongoDB does find ways of surprising me with what it can do.
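
If the query does need to run regularly, two things that might be worth a try: trimming the documents down to just the field the pipeline needs before the $unwind, and letting the $group stage spill to disk. This is only a sketch of your original pipeline with those tweaks, not something I've benchmarked against your data:

db.collection.aggregate(
  [
    // keep only "values.value" so $unwind copies much smaller documents
    { $project: { _id: 0, "values.value": 1 } },
    { $unwind: "$values" },
    { $group: { _id: "$values.value", count: { $sum: 1 } } }
  ],
  // let memory-hungry stages write temporary files instead of
  // failing at the 100MB per-stage memory limit
  { allowDiskUse: true }
)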
