$addToSet slower/hanging on large collections since MongoDB 6

An aggregate query that previously ran quickly in Mongo 4.0 runs slowly (actually, it hangs, causing 100% CPU until killed) in Mongo 6.0.2.

The query in question is below and is attempting to get the number of documents in a collection grouped by “firstField”, counting the documents based on the uniqueness of “secondField”. secondField is a small value, around 60 bytes, and is indexed too. According to an explain of this query though, no index is being utilized (even if I hint it, there’s no performance improvement). This query runs fine on smaller collections but on this particular collection where there are hundreds of thousands of unique values for secondField for a given value of firstField, the query just hangs forever. Switching ‘allowDiskUse’ to true/false makes no difference. I’ve also tried setting internalQueryMaxAddToSetBytes to a 10x value but no difference either.

db['mycollection'].aggregate([
   {
     "$group": {
       "uniqueList": {
         "$addToSet": "$secondField"
       },
       "_id": {
         "GroupedField": "$firstField"
       }
     }
   }
 ]);

I’ve been banging my head against the wall for over two days on this issue so am hoping that someone can tell me why this is slow with 6.0 and wasn’t with 4.0, or if there’s a more modern way to aggregate to get the results I’m after.

Thank you!

I’ve been investigating this further, and have even spun up a separate MongoDB 4.0 cluster (same specs) to test alongside the 6.0.2 cluster. I’ve found two things:

  1. On both clusters, with the same data (1.3m documents), running the exact same query, same indexes, same specs, etc. the 4.0 database doesn’t resort to use disk for the $group at all (explicitly disabled with allowDiskUse: false). On the 6.0.2 cluster however, not only does it use disk but uses it even with 100,000 documents (tested by inserting a $limit before the $group). Why is resorting to disk so aggressive in 6.0? Is this configurable at all?

  2. When it does resort to disk, the performance difference is significant (not surprising). I see that an “internal-xxx-xxxx.wt” file is created in my mongodb data directory, growing to 10MB size (tiny), but does so slowly - takes about 3 minutes just to get to this size, growing about 3MB per minute, then stopping at that size before decreasing to 8MB. I/O on the impacted volume is incredibly low (oplog actually sits on a separate disk). Meanwhile CPU on one of the cores of the 8-core, 64GB RAM machine is at 100%.

This feels to me like a bug but I’d appreciate confirmation either way to ensure I’m not wasting my time here.

Just an idea I would try.

Start with a $sort on firstField. In principal, a $group stage does not output any document before all input documents are processed. That is logocal since it does not know if more documents will have the same group _id. But if they are sorted, then it may.

$addToSet might be O(n2), so I would try to sort on firstField:1,secondField:1, then $group with _id:{firstField,secondField} using $first of secondField. May be a group document with only $first accumulator gets output right away. The another $group with _id:firstField, that uses $push:secondField rather than $addToSet.

I think I’ve narrowed down the issue to the Slot-Based Engine. I’ve just forced the use of the classic query engine with:

db.adminCommand({'setParameter': 1, internalQueryForceClassicEngine: true });

No more issues. The performance of the full collection aggregation is just as fast as it was on 4.0. I also don’t see any spikes in CPU or memory (or at least, the query finishes too fast for me to be able to spot it in my graphs).

I’m assuming the performance of the SBE in this case is not working as intended, since I imagine the SBE should be much faster and not require disk use more than the classic engine?

2 Likes

Created a bug for this issue: https://jira.mongodb.org/browse/SERVER-70395

1 Like

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.