$group is performing terribly. I have tried many different approaches. Below are some of the approaches. Explain Plan at the end of this post. Any help would be helpful. I have tried this on local MDB, Atlas (M40). Any help would be great!
Take the following schema. Everything not required has been removed. _id is not used.
“bankAccount” can be null (an unallocated account). “accountNumber” is not unique. I want to find the number of accounts allocated to each “accountNumber”.
{
"_id" : 12345,
"bankAccount" : {
"accountNumber" : "1234567890123450"
}
}
}
I have several indexes and the Explain Plan (at the end of this post) indicates an index is being used!
The collection size is 9 million (soon another 9 million will be added), with 5 million having am “BankAccount” object.
Using LINQ:
var sw = Stopwatch.StartNew();
var result = await bankAccountRepo.Queryable()
.Where(n => n.BankAccount.AccountNumber != null)
.GroupBy(n => n.BankAccount.AccountNumber)
.Select(n => new { AccountNumber = n.Key, Count = n.Count() })
.ToListAsync();
sw.Stop();
Console.WriteLine($"{sw.Elapsed} Rows:{result.Count}");
Result
00:00:13.7990042 Rows:175
14 seconds - Kidding me! SQL-Server does this in <100ms It’s only returning 175 rows.
Next, I tried an aggregation.
[
{ "$match" : { "bankAccount.accountNumber" : { "$ne" : null } } },
{ "$group" : { "_id" : "$bankAccount.accountNumber", "count" : { "$sum" : 1 } } },
{ "$project" : { "AccountNumber" : "$_id", "Count" : "$count", "_id" : 0 } }
]
Result
00:00:13.2047294 Rows:175
Same!
The Explain says it’s using an index scan,
It does not need to examine any of the documents according to the Explain Plan
“totalKeysExamined” : NumberInt(5043932),
** “totalDocsExamined” : NumberInt(0),**
The group section of the plan estimates 14.8sec
“executionTimeMillisEstimate” : NumberLong(14854)
The projection stage estimates at 2.1sec
“executionTimeMillisEstimate” : NumberInt(2101),
What is going on? This makes the $group operator totally unusable!
Next Thought: I’ll retrieve all the "accountNumber"s and group them in memory using LINQ.
- Retrieve all account Numbers
var accountNumbers = await bankAccountRepo.Queryable()
.Where(n => n.BankAccount.AccountNumber != null)
.Select(n => n.BankAccount.AccountNumber)
.ToListAsync();
- Group them inMemory using LINQ
var grouped = accountNumbers.GroupBy(n => n)
.Select(n => new { AccountNumber = n.Key, Count = n.Count() })
.ToList();
3. Results
- Retrieve all AccountNumbers from Mongo: 00:00:20.3103406 Rows:5043932
- InMemory LINQ Grouping: 00:00:00.2270047 Rows:175
Conclusion AND Questions
- I want to know if I have done something wrong. Right now, this is unusable.
- Why can LINQ group 5 Million rows in 0.23 seconds while MongoDB is taking 15 seconds?
- Why does it take 20.3 seconds to return all the values in an index (which is in memory on the server).
- I’m migrating a large real-time banking application from SQL-Server to MongoDB. When I have found poor MongoDB performance in the past, I have been able to split up the query and use other tricks to at least meet the performance of SQL-Server. With this simple reporting query, Nothing I have tried has dropped performance below 13 seconds.
Explain Plan:
{
"explainVersion" : "1",
"stages" : [
{
"$cursor" : {
"queryPlanner" : {
"namespace" : "paymentsDb-Migration.bankAccount",
"indexFilterSet" : false,
"parsedQuery" : {
"bankAccount.accountNumber" : {
"$not" : {
"$eq" : null
}
}
},
"queryHash" : "2B6152F6",
"planCacheKey" : "10155D97",
"maxIndexedOrSolutionsReached" : false,
"maxIndexedAndSolutionsReached" : false,
"maxScansToExplodeReached" : false,
"winningPlan" : {
"stage" : "PROJECTION_DEFAULT",
"transformBy" : {
"bankAccount.accountNumber" : NumberInt(1),
"_id" : NumberInt(0)
},
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"bankAccount.accountNumber" : NumberInt(1),
"bankAccount.abaDetails.number" : NumberInt(1)
},
"indexName" : "accountNumber_abaDetails.number_V1",
"isMultiKey" : false,
"multiKeyPaths" : {
"bankAccount.accountNumber" : [
],
"bankAccount.abaDetails.number" : [
]
},
"isUnique" : true,
"isSparse" : true,
"isPartial" : false,
"indexVersion" : NumberInt(2),
"direction" : "forward",
"indexBounds" : {
"bankAccount.accountNumber" : [
"[MinKey, undefined)",
"(null, MaxKey]"
],
"bankAccount.abaDetails.number" : [
"[MinKey, MaxKey]"
]
}
}
},
"rejectedPlans" : [
{
"stage" : "PROJECTION_DEFAULT",
"transformBy" : {
"bankAccount.accountNumber" : NumberInt(1),
"_id" : NumberInt(0)
},
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"bankAccount.accountNumber" : NumberInt(1),
"bankAccount.stipStatus" : NumberInt(1),
"bankAccount.isActive" : NumberInt(1),
"bankAccount.createdDateTime" : NumberInt(1)
},
"indexName" : "accountNumber_stipStatus_isActive_createdDateTime_V1",
"isMultiKey" : false,
"multiKeyPaths" : {
"bankAccount.accountNumber" : [
],
"bankAccount.stipStatus" : [
],
"bankAccount.isActive" : [
],
"bankAccount.createdDateTime" : [
]
},
"isUnique" : false,
"isSparse" : true,
"isPartial" : false,
"indexVersion" : NumberInt(2),
"direction" : "forward",
"indexBounds" : {
"bankAccount.accountNumber" : [
"[MinKey, undefined)",
"(null, MaxKey]"
],
"bankAccount.stipStatus" : [
"[MinKey, MaxKey]"
],
"bankAccount.isActive" : [
"[MinKey, MaxKey]"
],
"bankAccount.createdDateTime" : [
"[MinKey, MaxKey]"
]
}
}
},
{
"stage" : "PROJECTION_DEFAULT",
"transformBy" : {
"bankAccount.accountNumber" : NumberInt(1),
"_id" : NumberInt(0)
},
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"bankAccount.accountNumber" : NumberInt(1),
"bankAccount.createdDateTime" : NumberInt(1)
},
"indexName" : "accountNumber_createdDateTime_V1",
"isMultiKey" : false,
"multiKeyPaths" : {
"bankAccount.accountNumber" : [
],
"bankAccount.createdDateTime" : [
]
},
"isUnique" : false,
"isSparse" : true,
"isPartial" : false,
"indexVersion" : NumberInt(2),
"direction" : "forward",
"indexBounds" : {
"bankAccount.accountNumber" : [
"[MinKey, undefined)",
"(null, MaxKey]"
],
"bankAccount.createdDateTime" : [
"[MinKey, MaxKey]"
]
}
}
},
{
"stage" : "PROJECTION_DEFAULT",
"transformBy" : {
"bankAccount.accountNumber" : NumberInt(1),
"_id" : NumberInt(0)
},
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"bankAccount.accountNumber" : NumberInt(1)
},
"indexName" : "accountNumber_V1",
"isMultiKey" : false,
"multiKeyPaths" : {
"bankAccount.accountNumber" : [
]
},
"isUnique" : false,
"isSparse" : true,
"isPartial" : false,
"indexVersion" : NumberInt(2),
"direction" : "forward",
"indexBounds" : {
"bankAccount.accountNumber" : [
"[MinKey, undefined)",
"(null, MaxKey]"
]
}
}
}
]
},
"executionStats" : {
"executionSuccess" : true,
"nReturned" : NumberInt(5043932),
"executionTimeMillis" : NumberInt(14858),
"totalKeysExamined" : NumberInt(5043932),
"totalDocsExamined" : NumberInt(0),
"executionStages" : {
"stage" : "PROJECTION_DEFAULT",
"nReturned" : NumberInt(5043932),
"executionTimeMillisEstimate" : NumberInt(2101),
"works" : NumberInt(5043933),
"advanced" : NumberInt(5043932),
"needTime" : NumberInt(0),
"needYield" : NumberInt(0),
"saveState" : NumberInt(5557),
"restoreState" : NumberInt(5557),
"isEOF" : NumberInt(1),
"transformBy" : {
"bankAccount.accountNumber" : NumberInt(1),
"_id" : NumberInt(0)
},
"inputStage" : {
"stage" : "IXSCAN",
"nReturned" : NumberInt(5043932),
"executionTimeMillisEstimate" : NumberInt(737),
"works" : NumberInt(5043933),
"advanced" : NumberInt(5043932),
"needTime" : NumberInt(0),
"needYield" : NumberInt(0),
"saveState" : NumberInt(5557),
"restoreState" : NumberInt(5557),
"isEOF" : NumberInt(1),
"keyPattern" : {
"bankAccount.accountNumber" : NumberInt(1),
"bankAccount.abaDetails.number" : NumberInt(1)
},
"indexName" : "accountNumber_abaDetails.number_V1",
"isMultiKey" : false,
"multiKeyPaths" : {
"bankAccount.accountNumber" : [
],
"bankAccount.abaDetails.number" : [
]
},
"isUnique" : true,
"isSparse" : true,
"isPartial" : false,
"indexVersion" : NumberInt(2),
"direction" : "forward",
"indexBounds" : {
"bankAccount.accountNumber" : [
"[MinKey, undefined)",
"(null, MaxKey]"
],
"bankAccount.abaDetails.number" : [
"[MinKey, MaxKey]"
]
},
"keysExamined" : NumberInt(5043932),
"seeks" : NumberInt(1),
"dupsTested" : NumberInt(0),
"dupsDropped" : NumberInt(0)
}
}
}
},
"nReturned" : NumberLong(5043932),
"executionTimeMillisEstimate" : NumberLong(13523)
},
{
"$group" : {
"_id" : "$bankAccount.accountNumber",
"count" : {
"$sum" : {
"$const" : 1.0
}
}
},
"maxAccumulatorMemoryUsageBytes" : {
"count" : NumberLong(12600)
},
"totalOutputDataSizeBytes" : NumberLong(45675),
"usedDisk" : false,
"nReturned" : NumberLong(175),
"executionTimeMillisEstimate" : NumberLong(14854)
},
{
"$project" : {
"AccountNumber" : "$_id",
"Count" : "$count",
"_id" : false
},
"nReturned" : NumberLong(175),
"executionTimeMillisEstimate" : NumberLong(14854)
}
],
"serverInfo" : {
"host" : "atlas-4vy2as-shard-00-02.krrw3.mongodb.net",
"port" : NumberInt(27017),
"version" : "5.0.14",
"gitVersion" : "1b3b0073a0b436a8a502b612f24fb2bd572772e5"
},
"serverParameters" : {
"internalQueryFacetBufferSizeBytes" : NumberInt(104857600),
"internalQueryFacetMaxOutputDocSizeBytes" : NumberInt(104857600),
"internalLookupStageIntermediateDocumentMaxSizeBytes" : NumberInt(104857600),
"internalDocumentSourceGroupMaxMemoryBytes" : NumberInt(104857600),
"internalQueryMaxBlockingSortMemoryUsageBytes" : NumberInt(104857600),
"internalQueryProhibitBlockingMergeOnMongoS" : NumberInt(0),
"internalQueryMaxAddToSetBytes" : NumberInt(104857600),
"internalDocumentSourceSetWindowFieldsMaxMemoryBytes" : NumberInt(104857600)
},
"command" : {
"aggregate" : "cuscalBankAccount",
"pipeline" : [
{
"$match" : {
"bankAccount.accountNumber" : {
"$ne" : null
}
}
},
{
"$group" : {
"_id" : "$bankAccount.accountNumber",
"count" : {
"$sum" : 1.0
}
}
},
{
"$project" : {
"AccountNumber" : "$_id",
"Count" : "$count",
"_id" : 0.0
}
}
],
"allowDiskUse" : false,
"maxTimeMS" : NumberLong(0),
"cursor" : {
},
"$db" : "monoovaPaymentsDb-Migration"
},
"ok" : 1.0,
"$clusterTime" : {
"clusterTime" : Timestamp(1676282832, 1),
"signature" : {
"hash" : BinData(0, "bx+wC+MVULo2KX1DYS8h0apGhcU="),
"keyId" : NumberLong(7133047024810721286)
}
},
"operationTime" : Timestamp(1676282832, 1)
}