Hi,
I have duplicate records in the invoices collection. I want to perform the following steps on the collection:
- Identify duplicate records based on a key. This part I have working:
db.invoices.aggregate([{ $group: { _id: "$request.doc_number", count: { $sum: 1 } } }, { $match: { count: { $gt: 50 } } }] )
- Insert the duplicate records into a new collection:
I'm facing issues with this step, as the query inserts only the ObjectIds and not the full documents.
// Define the name of the new collection
const newCollectionName = "DuplicateInvoices";

// Step 1: Find duplicate doc_number groups
// (collection name is case-sensitive, so this uses db.invoices as in the query above)
const duplicates = db.invoices.aggregate([
  {
    $group: {
      _id: "$request.doc_number",
      count: { $sum: 1 },
      docs: { $push: "$_id" } // Collect all document IDs for each doc_number
    }
  },
  {
    $match: {
      count: { $gt: 1 } // Include only groups with more than 1 occurrence
    }
  }
]).toArray();

// Step 2: Insert duplicates into the new collection
duplicates.forEach(group => {
  // Set aside one document ID from the list of duplicates to keep in the original collection
  const [idToKeep, ...idsToMove] = group.docs;
  // Insert the duplicates (excluding one record) into the new collection
  db[newCollectionName].insertMany(
    idsToMove.map(id => ({ _id: id })) // Inserts documents that contain only the _id field
  );
});
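On the ObjectId problem: the $group stage above pushes "$_id", so group.docs only ever holds IDs, and insertMany then writes documents that contain nothing but an _id. One possible fix is to push "$$ROOT" (the whole input document) instead. Below is a minimal sketch for mongosh, not verified against real data. The function name copyDuplicates and its parameters are made up for illustration, and it assumes each group of duplicates stays within MongoDB's 16 MB document size limit.

```javascript
// Sketch only: copyDuplicates is a hypothetical helper, not a MongoDB API.
// Assumes the duplicate key is request.doc_number, as in the post.
function copyDuplicates(db, sourceName, targetName) {
  const groups = db.getCollection(sourceName).aggregate([
    {
      $group: {
        _id: "$request.doc_number",
        count: { $sum: 1 },
        docs: { $push: "$$ROOT" } // Collect whole documents, not just "$_id"
      }
    },
    { $match: { count: { $gt: 1 } } } // Keep only keys that occur more than once
  ]).toArray();

  let copied = 0;
  groups.forEach(group => {
    // The first document stays in the source; the rest are copied as full documents
    const [docToKeep, ...docsToMove] = group.docs;
    if (docsToMove.length > 0) {
      db.getCollection(targetName).insertMany(docsToMove);
      copied += docsToMove.length;
    }
  });
  return copied; // Number of documents written to the target collection
}
```

In mongosh this would be called as copyDuplicates(db, "invoices", "DuplicateInvoices"). Which document of each group stays behind is effectively arbitrary here; adding a $sort stage before $group would be one way to make that choice predictable.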
- Delete the duplicate invoices from the original collection, keeping exactly one record per key:
I have not tried this yet.
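For the delete step, one option is to remove from the original collection exactly those documents whose _id already exists in DuplicateInvoices, rather than recomputing the groups (a second aggregation might pick a different document to keep). A rough sketch under that assumption follows. The name deleteCopiedDuplicates, its parameters, and the batch size of 1000 are hypothetical choices, and it would be safer to try this on a backup copy first.

```javascript
// Sketch only: deleteCopiedDuplicates is a hypothetical helper, not a MongoDB API.
// Assumes every document in the target collection has the same _id as the
// duplicate it was copied from in the source collection.
function deleteCopiedDuplicates(db, sourceName, targetName, batchSize = 1000) {
  // Read only the _id field of each already-copied duplicate
  const ids = db.getCollection(targetName)
    .find({}, { _id: 1 })
    .toArray()
    .map(doc => doc._id);

  let deleted = 0;
  // Delete in batches so a single $in filter does not grow too large
  for (let i = 0; i < ids.length; i += batchSize) {
    const batch = ids.slice(i, i + batchSize);
    const result = db.getCollection(sourceName).deleteMany({ _id: { $in: batch } });
    deleted += result.deletedCount;
  }
  return deleted; // Number of documents removed from the source collection
}
```

In mongosh the call would look like deleteCopiedDuplicates(db, "invoices", "DuplicateInvoices"). Because only IDs found in the new collection are deleted, the one record per key that was never copied stays in place.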
I would appreciate any tips.
Best Regards,