Hi all,
Here is a sample document
{
_id : ObjectId("5f4e18f4f90762fe549a1eee")
name: "Midhuna",
age: 83,
places: "New York"
hobbies: ["Singing", "Reading Books"]
child: [
{
_id : ObjectId("5fde18f4f90762fe549a1eee")
uniqueid : 1234
name: "Akash",
age: 25
},
{
_id : ObjectId("5fde18f4f90762fe549a1eee")
uniqueid : 1234
name: "Ratan",
age: 35
}
],
jobs: [
{
_id : ObjectId("5fde18f4f90762fe549a554e")
uniqueid : 3234
title: "Engineer",
Salary: 325
},
{
_id : ObjectId("5fde18f4f90762fe549a554e")
uniqueid : 1234
title: "Sales Director",
Salary: 354
},
{
_id : ObjectId("5fde18f4f90762fe549a554e")
uniqueid : 12345
title: "Customer Success",
Salary: 445
}
]
}
I have a collection with several arrays/sub documents. I want to find out if there are duplicates in ‘_id’ field. This has happened in all documents of my database - some nested documents have the same _id despite having different content/values. For example; in the below sample doc I shared the child array and the jobs array both have duplicates in _id field despite all other fields being completely different. I have over 100 different subdocs/arrays and I want to find out how many subdocs have duplicates at _id level and what are those
I used this to find duplicates from a single subdoc:
db.customer.aggregate([
{ $unwind: "$child" },
{
$group: {
_id: {
_id: "$_id",
child_uniqueid: "$child._id"
},
sum: { $sum: 1 }
}
},
{ $match: { sum: { $gt: 1 } } }
])
This works fine. However, I have over 100 subdocs/arrays and I don’t want to manually change the array name here and run it a 100 times. Is there a way, I can loop over the document and run this code for all arrays automatically?