Hey @Pavel_Duchovny, thank you for responding. I will give you some more context. A document in my collection looks like this:
{
"_id" : ObjectId("6055ec15bfcb7198dffa042c"),
"message_id" : "TeiaBrCXgGyaf6aTk",
"client_id" : "april",
// bunch of other fields
}
_id
is the primary key. And I want to create the unique index on (client_id, message_id) fields. I tried to find the dupes, using aggregate framework, it was very slow. The console wouldn’t respond for hours.
So, I created two indexes:
{
"v" : 2,
"key" : {
"client_id" : 1,
"message_id" : 1
},
"name" : "client_id_1_message_id_1",
"ns" : "my_db.msgs"
},
{
"v" : 2,
"key" : {
"_id" : 1,
"client_id" : 1,
"message_id" : 1
},
"name" : "_id_1_client_id_1_message_id_1",
"ns" : "my_db.msgs"
},
this is my aggregate query, do you find anything odd in it?
db.msgs.aggregate([
{ $group: {
_id: { "message_id": "$message_id", "client_id": "$client_id" },
count: { $sum: 1 },
id : { $push : "$$ROOT._id" }
} },
{ $sort : { count : -1} },
{ $match: {
count: { $gte: 2 }
} },
], {
allowDiskUse: true
});
Why this is not using any of the indexes? Here is what it says when I do explain on it:
{
"stages" : [
{
"$cursor" : {
"query" : {
},
"fields" : {
"client_id" : 1,
"message_id" : 1,
"_id" : 1
},
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "my_db.msgs",
"indexFilterSet" : false,
"parsedQuery" : {
},
"winningPlan" : {
"stage" : "COLLSCAN",
"direction" : "forward"
},
"rejectedPlans" : [ ]
}
}
},
{
"$group" : {
"_id" : {
"message_id" : "$message_id",
"client_id" : "$client_id"
},
"count" : {
"$sum" : {
"$const" : 1
}
},
"id" : {
"$push" : "$$ROOT._id"
}
}
},
{
"$match" : {
"count" : {
"$gte" : 2
}
}
},
{
"$sort" : {
"sortKey" : {
"count" : -1
}
}
}
],
"serverInfo" : {
"host" : "my_db",
"port" : 27017,
"version" : "4.0.23",
"gitVersion" : "07c6611b38d2aacbdb1846b688db70b3273170fb"
},
"ok" : 1
}