Hi everyone,
We’re using a MongoDB M60 cluster (version 8.0.11), and we have a performance concern with one of our aggregation queries.
We have an orders collection with ~50 million documents, each of which carries a tenantId. Our goal is to find, for a specific tenant, every uuid (a uuid groups related orders) where none of the orders in that group has a status we consider "completed".
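For context, the relevant shape of an order document is roughly the following (values are made up, and fields other than tenantId, uuid, and status are omitted):

{
  _id: ObjectId("..."),
  tenantId: "MY_TENANT_ID",
  uuid: "ORDER_GROUP_UUID",   // shared by all orders in the same group
  status: "APPROVED"          // e.g. WAITING_FOR_APPROVAL, APPROVED, SENT, ...
}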
Here is the aggregation pipeline we use:
db.orders.aggregate([
  {
    $match: {
      tenantId: "MY_TENANT_ID",
      status: { $exists: true }
    }
  },
  {
    $group: {
      _id: "$uuid",
      condition_count_completed: {
        $sum: {
          $cond: [
            { $in: ["$status", ["WAITING_FOR_APPROVAL", "APPROVED", "SENT"]] },
            1,
            0
          ]
        }
      }
    }
  },
  {
    $match: {
      condition_count_completed: { $eq: 0 }
    }
  }
])
We also have a compound index on:
{ uuid: 1, status: 1 }
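For reference, it was created with the plain createIndex form:

db.orders.createIndex({ uuid: 1, status: 1 })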
Unfortunately, the $group stage is taking the most time, sometimes up to 10–20 minutes, especially for tenants with a large number of orders.
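For reference, this is how the plan can be inspected in mongosh (a sketch; pipeline holds the three stages shown above):

// Same three stages as above, kept in a variable so we can explain them.
const pipeline = [ /* $match, $group, $match from above */ ];
db.orders.explain("executionStats").aggregate(pipeline);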
From what I understand, $group can't use an index directly (and the $sort + $group optimization that can produce a DISTINCT_SCAN won't help here either, since we're not using $first or $last).
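To be clear, the optimization I mean is the one where a $sort followed by a $group that only uses $first (or $last) can be answered by a DISTINCT_SCAN over the index; something like the sketch below, which doesn't fit our counting use case:

db.orders.aggregate([
  { $sort: { uuid: 1, status: 1 } },
  { $group: { _id: "$uuid", firstStatus: { $first: "$status" } } }
])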
Main Questions:
- Is there any way to optimize this query to reduce the time spent on $group?
- Is MongoDB suited for this kind of workload, i.e. aggregating across millions of documents for analytics-style queries?
- Should we consider a different data model, or maybe offload this to a data warehouse? (A rough sketch of one idea we've been weighing follows below.)
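To make the data-model question concrete, the idea we've been weighing is a per-uuid rollup maintained at write time (the "computed" pattern). The order_groups collection name and completedCount field are hypothetical, and order stands for the incoming order document:

// On every order write, bump the group's count of "completed" orders.
const COMPLETED = ["WAITING_FOR_APPROVAL", "APPROVED", "SENT"];
db.order_groups.updateOne(
  { _id: order.uuid, tenantId: order.tenantId },
  { $inc: { completedCount: COMPLETED.includes(order.status) ? 1 : 0 } },
  { upsert: true }
);

// With an index on { tenantId: 1, completedCount: 1 }, the expensive
// aggregation collapses to an indexed find:
db.order_groups.find({ tenantId: "MY_TENANT_ID", completedCount: 0 });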
Any tips, indexing strategies, or restructuring ideas would be greatly appreciated!