How to remove duplicates when creating unique indexes

Joe · April 18, 2022, 6:41pm

I am trying to create a unique index, but it’s erroring because of duplicate documents I currently have. I have about 400,000 documents, and I need to remove all duplicates based on the id field. What’s the easiest/most efficient way to accomplish this? I see there’s no longer a dropDups option.

Thanks

Aasawari · April 19, 2022, 9:13am

Hi @Joe
Welcome to the community forum!!

Could you please confirm whether you are using a script to generate the id field, and whether that script is what creates the duplicate id values in the collection?
The _id field in MongoDB is always unique, so duplicate values are not allowed on _id.
Please refer to the ObjectId documentation.

Also, could you please provide a sample document or a sample schema so that we can assist you with a better solution?

Thanks
Aasawari

Joe · April 19, 2022, 10:21am

I think I figured it out. I run an aggregation, push the results to an array, and then remove those documents, unless there’s a better way.

Here’s an example document. If several documents share the same “id” value, I want to delete the duplicates.

{
    "_id" : ObjectId("62562a8867ea78ee7907baa0"),
    "id" : "24242424243453536",
    "commandUses" : 0,
    "daily" : 0,
    "reps" : 0
}
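
The aggregate, push, and remove approach described above could look roughly like the sketch below. It is not necessarily the exact code I ran: the collection name `users` and the helper names `dedupePipeline` and `idsToDelete` are made up for illustration, and it assumes it does not matter which copy of each duplicate is kept.

```javascript
// Hypothetical helpers for mongosh; not part of MongoDB itself.

// Group documents by the application-level "id" field, collect the _id of
// every document in each group, and keep only groups with more than one member.
const dedupePipeline = [
  { $group: { _id: "$id", docIds: { $push: "$_id" }, count: { $sum: 1 } } },
  { $match: { count: { $gt: 1 } } },
];

// For each group of duplicates, keep the first _id and return all the others.
function idsToDelete(groups) {
  const ids = [];
  for (const group of groups) {
    ids.push(...group.docIds.slice(1));
  }
  return ids;
}

// Usage in mongosh (the collection name "users" is an assumption):
// const groups = db.users.aggregate(dedupePipeline, { allowDiskUse: true }).toArray();
// db.users.deleteMany({ _id: { $in: idsToDelete(groups) } });
// db.users.createIndex({ id: 1 }, { unique: true });
```

Without a `$sort` stage before the `$group`, the order of `docIds` is not guaranteed, so the surviving document in each group is effectively arbitrary. If there are a very large number of duplicates, deleting in smaller batches may be safer than a single `$in` list.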
