How to merge multiple documents of a collection to a single document with same field names?

Hi,

I have been trying to merge multiple documents to a single document based on a unique id present on all these docs of a collection. I have tried using group, $mergeObjects , $set however I m still not able to achieve the expected output, didnt find any similar question on the internet as well so can someone help to solve this?
In below sample documents of the collection, need to merge documents having same “UniqueID” and the value in same field names of the documents getting merged should be decided conditionally.

{"_id":1,
"UniqueID":"111",
"Search_Criteria":"NameOfOrg",
"Searched_value":"IBM",
"UserNavPage_YN":"Y",
"PDF_DL":"N",
"Excel_DL":"Y",
"Id_of_entity":"121212",
"ProcessedYN":"N",
},
{"_id":2,
"UniqueID":"111",
"Search_Criteria":"NameOfOrg",
"Searched_value":"IBM",
"UserNavPage_YN":"Y",
"PDF_DL":"Y",
"Excel_DL":"N",
"Id_of_entity":"121212",
"ProcessedYN":"N",
},
{"_id":3,
"UniqueID":"222",
"Search_Criteria":"NameOfOrg",
"Searched_value":"Tesla",
"UserNavPage_YN":"Y",
"PDF_DL":"N",
"Excel_DL":"N",
"Id_of_entity":"2121",
"ProcessedYN":"N",
},
{"_id":4,
"UniqueID":"222",
"Search_Criteria":"NameOfOrg",
"Searched_value":"Tesla",
"UserNavPage_YN":"Y",
"PDF_DL":"N",
"Excel_DL":"Y",
"Id_of_entity":"2121",
"ProcessedYN":"N",
}
Expected Output:
The two docs with UniqueID = 111, should be merged to single doc as below: 

{
"UniqueID":"111",
"Search_Criteria":"NameOfOrg",
"Searched_value":"IBM",
"UserNavPage_YN":"Y", // Since both docs have same value so it should remain Y
"PDF_DL":"Y", // if any doc with same uniqueId contains Y in this field then keep this as Y
"Excel_DL":"Y", // if any doc with same uniqueId contains Y in this field then keep this as Y
"Id_of_entity":"121212", 
"ProcessedYN":"N",
}

Same logic to be applied to the two docs with UniqueId = 222

As a very basic possible solution something like this:

I’ve obviously made up some logic for the other fields, but you could take this approach, group up and push the other fields into an array and then do some logic on those, either like I did or a reduce function over the array to get the desired output.

In this case, make the assumption that a field is either Y or N, if it’s more than one value then we know it’ll have Y in there, else check that the single value is Y and set the output accordingly.

I’ve not tested this in terms of performance, obviously you’ll want an index for the grouping (probably sort first as well) and with a lot of data you may want to run it in batches possibly.

db.getCollection("Test").aggregate([
{
    $group:{
        _id:'$UniqueID',
        "Search_Criteria" : {$addToSet:'$Search_Criteria'},
        "Searched_value" : {$addToSet:'$Searched_value'},
        "UserNavPage_YN" : {$addToSet:'$UserNavPage_YN'},
        "PDF_DL" : {$addToSet:'$PDF_DL'},
        "Excel_DL" : {$addToSet:'$Excel_DL'},
        "Id_of_entity" : {$addToSet:'$Id_of_entity'},
        "ProcessedYN" : {$addToSet:'$ProcessedYN'},
    }
},
{
    $addFields:{
        'UniqueID':'$_id',
        "Search_Criteria" : {$arrayElemAt:['$Search_Criteria', 0]},
        "Searched_value" : {$arrayElemAt:['$Searched_value', 0]},
        "UserNavPage_YN" : {
            $cond:{
                if:{
                    $or:[
                        {
                            $and:[
                                {$eq:[1, {$size:'$UserNavPage_YN'}]},
                                {$eq:['Y', {$arrayElemAt:['$UserNavPage_YN', 0]}]}
                            ]
                        },
                        {
                            $gt:[1, {$size:'$UserNavPage_YN'}]
                        }                        
                    ]
                },
                then:'Y',
                else:'N'
            }
        },
        "PDF_DL" : {
            $cond:{
                if:{
                    $or:[
                        {
                            $and:[
                                {$eq:[1, {$size:'$PDF_DL'}]},
                                {$eq:['Y', {$arrayElemAt:['$PDF_DL', 0]}]}
                            ]
                        },
                        {
                            $gt:[1, {$size:'$PDF_DL'}]
                        }                        
                    ]
                },
                then:'Y',
                else:'N'
            }
        },
        "Excel_DL" : {
            $cond:{
                if:{
                    $or:[
                        {
                            $and:[
                                {$eq:[1, {$size:'$Excel_DL'}]},
                                {$eq:['Y', {$arrayElemAt:['$Excel_DL', 0]}]}
                            ]
                        },
                        {
                            $gt:[1, {$size:'$Excel_DL'}]
                        }                        
                    ]
                },
                then:'N',
                else:'Y'
            }
        },
        "Id_of_entity" : {$arrayElemAt:['$Id_of_entity', 0]},
        "ProcessedYN" : {$arrayElemAt:['$ProcessedYN', 0]},
        
    }
},
{
    $project:{
        _id:0
    }
}
])

Thank you very much, your solution indeed solves the problem to great extent. one minor change I made to the query was updating $gt:[1, {$size:‘$Excel_DL’}] to $gt:[{$size:‘$Excel_DL’},1] at all applicable places since $gt returns true if first argument is greater than the 2nd one. Cheers :slight_smile:

D’oh…stupid error!

Glad you got it working!