I want a faster way to count the distinct values of a field in a MongoDB collection that contains more than 10 million documents; my schema looks like this:
[
  {
    //-- type and date
    "date": "2023-11-09",
    "type": "my_type",
    "resource": "2222"
  },
  {
    //-- type and date
    "date": "2023-11-09",
    "type": "my_type",
    "resource": "1111"
  },
  {
    //-- type and date
    "date": "2023-11-09",
    "type": "my_type",
    "resource": "1111"
  }
]
I want a very performant query that counts the distinct resources across my collection.
Just to build off what Chris wrote, you should be able to do this as a covered query.
For example:
db.foo.drop();
db.foo.insertMany([
  {
    //-- type and date
    "date": "2023-11-09",
    "type": "my_type",
    "resource": "2222"
  },
  {
    //-- type and date
    "date": "2023-11-09",
    "type": "my_type",
    "resource": "1111"
  },
  {
    //-- type and date
    "date": "2023-11-09",
    "type": "my_type",
    "resource": "1111"
  }
])
db.foo.createIndex({ resource: 1 });
db.foo.aggregate([
  { $sort: { resource: 1 } },
  { $group: {
      _id: "$resource",
      count: { $sum: 1 }
  }}
])
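Note that the pipeline above returns one document per distinct resource with its document count. If what you actually need is a single number (how many distinct resources exist), one variant is to drop the accumulator and append a `$count` stage; a sketch of that, using the same sample collection:

```js
// Count how many distinct resource values exist (a single number),
// rather than how many documents each resource has.
db.foo.aggregate([
  { $sort: { resource: 1 } },
  { $group: { _id: "$resource" } },  // one output document per distinct resource
  { $count: "distinctResources" }    // collapse to { distinctResources: <n> }
])
// With the three sample documents above ("2222", "1111", "1111"),
// this returns { distinctResources: 2 }
```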
This should use the index efficiently and avoid fetching any documents from disk to satisfy the distinct-count requirement you’ve outlined.
How long the operation takes will depend on a number of factors (cluster size, available memory, concurrent resource usage, etc.).
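You can verify that the plan is actually covered by inspecting the execution stats with `explain()`; a covered plan reports `totalDocsExamined: 0` (an index scan with no document fetches), and on versions where the optimization applies, a `$sort` + `$group` on the indexed field may even collapse into a `DISTINCT_SCAN` stage:

```js
// Inspect the winning plan; a covered plan shows totalDocsExamined: 0
// in executionStats, meaning no documents were fetched from disk.
db.foo.explain("executionStats").aggregate([
  { $sort: { resource: 1 } },
  { $group: {
      _id: "$resource",
      count: { $sum: 1 }
  }}
])
```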