Hi @amazing,
Something like the following might be suitable for your needs:
db.foo.insertMany([
{ "ip" : "127.0.0.1", "user": 1 },
{ "ip" : "127.1.1.1", "user": 2 },
{ "ip" : "127.2.1.1", "user": 3 },
{ "ip" : "127.0.0.1", "user": 4 },
{ "ip" : "127.1.1.1", "user": 5 },
]);
db.foo.createIndex({ ip: 1, user: 1 });
db.foo.aggregate([
{ $sort: { ip: 1 } },
{ $group: {
_id: "$ip",
used: { $sum: 1 },
users: { $push: { user: "$user" } }
}},
{ $match: { used: { $gt: 1 } } },
])
Note that if you’re only using the ip and user fields, having an index on these (as shown above) should greatly improve the performance of this operation.
I’ve written up a longer form of this response at Efficiently Identifying Duplicates using MongoDB | ALEX BEVILACQUA as this question comes up pretty frequently ![]()