Merge from multiple tables

Setting the scene:-
I am writing functions to update fields within a collection based on values in multiple separate collections. Most of the updates are of the form if fieldA is null lookup a value from table 2. set fieldA = looked_up_value_from_table2
To do this I am using an aggregation pipeline followed by a merge.
Problem.
While the merge is setting the right value it is also overwriting some data I expected it to be leaving. E.g If I merge value ‘content.labels.name’ in the json copied below, the whole of content branch is overwritten.

{
“_id” : ObjectId(“5b0e9ca36f72c14f486e4e4d”),
“team_id” : 28,
“service_name” : “aaa”,
“updated_at” : ISODate(“2019-05-30T00:00:00.000Z”),
“created_at” : ISODate(“2018-05-30T00:00:00.000Z”),
“document_type” : “questionnaireResponse”,
“content” : {
“questionnaire_uuid” : “9bf3daa775d3d79758f10186679aaf1baccb3eb219176d69ac84c15e77e9d3e3”,
“questionnaire_name” : “B1234”,
“status” : “expired”,
“labels” : {
“referral_uuid” : “99999”,
“context” : “rrh_as”
}
},
“basfiFound” : true
}
merge with values for _id and ‘content.labels.name’ results in
{
“_id” : ObjectId(“5b0e9ca36f72c14f486e4e4d”),
“team_id” : 28,
“service_name” : “aaa”,
“updated_at” : ISODate(“2019-05-30T00:00:00.000Z”),
“created_at” : ISODate(“2018-05-30T00:00:00.000Z”),
“document_type” : “questionnaireResponse”,
“content” : {
“labels” : {
“name” : “new value”
}
},
“basfiFound” : true
}
I lose the whole of the content branch
Is there a way to get merge to not overwrite the parts of the branch not being updated? … or is there a better way to do this not using merge?

What do you mean by ‘merge’?
$merge aggregation pipeline stage?
$mergeObjects aggregation pipeline operator?

When you update nested properties in an aggregation pipeline or using update operations, refer to your prop using dot notation: { ‘prop.nestedObj.propA’: value }.

Have a look at this aggregation example:

db.test1.aggregate([
  {
    $lookup: {
      from: 'test2',
      localField: 'content.labels.context',
      foreignField: 'context',
      as: 'joined',
    },
  },
  {
    // transform array to object
    // for the case, when we have 1:1 relations between test1 and test2
    $addFields: {
      joined: {
        $arrayElemAt: ['$joined', 0],
      },
    },
  },
  {
    $addFields: {
      'content.labels.name': {
        $cond: {
          if: {
            $eq: [{ $type: '$content.labels.name' }, 'missing'],
          },
          // set name if it is does not present in original doc
          then: '$joined.name',
          // return its current value
          else: '$content.labels.name',
        },
      },
    },
  },
  {
    // remove temp props
    $unset: ['joined'],
  },
  {
    // output result into other collection
    $merge: {
      into: 'test3',
    },
  },
]);

The above aggregation uses this initial data:

db.test1.insertMany([
  {
    status: 's1',
    content: {
      labels: {
        context: 'c1',
      },
    },
  },
  {
    status: 's2',
    content: {
      labels: {
        context: 'c2',
        name: 'n2',
      },
    },
  },
]);

db.test2.insertMany([
  {
    context: 'c1',
    name: 'n1',
    flag: 1,
  },
]);

As a result, ‘test3’ collection has the following data:

[
  {
    "_id" : ObjectId("5ef9dbe9a1bfc8b19c3b4cf1"),
    "content" : {
      "labels" : {
        "context" : "c1",
        "name" : "n1"
      }
    },
    "status" : "s1"
  }
  {
    "_id" : ObjectId("5ef9dbe9a1bfc8b19c3b4cf2"),
    "content" : {
      "labels" : {
        "context" : "c2",
        "name" : "n2"
      }
    },
    "status" : "s2"
  }
]

As you can see, field ‘name’ is updated only when it is necessary, and other props inside ‘labels’ object are not lost.

@slava , Thank you for that alternative. This is exactly the behaviour I expected from $merge ( based on the documentation.)
I noticed you used … $eq: [{ $type: ‘$content.labels.name’ }, ‘missing’]
if this the same as … ‘content.labels.name’: {$exists: true }
…or … ‘content.labels.name’: { $not: { $type: 10 }, $exists: true }
Is there any difference between these …or they all just check the field exists

Good question!

Well, { $exists: true } can be used only in queries for matching properties by their presense in a document.
While $type can be used to match values by their type or to extract type from value.

In the excerpt below, operator $type is used to extract value from ‘$content.labels.name’ prop, and then $eq operator compares it to the ‘missing’ string.

$eq: [{ $type: ‘$content.labels.name’ }, ‘missing’]

Better to explain them in example. Consider, we have this dataset:

db.test4.insertMany([
  {
    _id: 'A',
    name: 'Bilbo',
    age: null,
  },
  {
    _id: 'B',
    name: 'Frodo',
    surname: 'Baggins',
  },
]);

As I wrote above, $type can be used to get the type of a value. Like this:

db.test4.aggregate([
  {
    $project: {
      typeOfName: {
        $type: '$name',
      },
      typeOfSurname: {
        $type: '$surname',
      },
      typeOfAge: {
        $type: '$age',
      },
    },
  },
]).pretty();

The output of above aggregation will be this:

[
  {
    "_id" : "A",
    "typeOfName" : "string",
    "typeOfSurname" : "missing",
    "typeOfAge" : "null"
  }
  {
    "_id" : "B",
    "typeOfName" : "string",
    "typeOfSurname" : "string",
    "typeOfAge" : "missing"
  }
]

Now, let’s review how we can use $type, and $exists operators in queries:

To get documents, that does not have some prop

a) db.test4.find({ surname: { $exists: false } });
b) db.test4.find({ surname: null });
// all these queries will return document A only.

To get documents, with a prop, that has null for its value

c) db.test4.find({ age: { $type: 10 } }); 
d) db.test4.find({ age: { $type: 'null' } });
// notice, that 'null' is just textual alias for '10'.
// all these queries will return document A only.

To get documents, that have prop, that is either: missing or equal to null

e) db.test4.find({ age: null });  
/* this query will return both documents: A and B.
   yes, null with match missing fields, 
   and ones that have null for their value
*/

Note, that ‘null’ and ‘string’ are names of the types, while ‘missing’ is just a string, that tells you that property value does not have a type (because property does not exist, it is ‘missing’). That means that you can do queries like this:

db.test4.find({ surname: { $type: 'string' }}); // => returns only doc B
db.test4.find({ age: { $type: 'null' }});  // => returns only doc A

But you can not use a query like this (because there is no such type as ‘missing’):

db.test4.find({ age: { $type: 'missing' }}); // throws error
1 Like

I also found a solution using merge.
By default if using path notation to update a value, such as ‘branch.twig.leaf’, merge does replace the entire branch… not just the leaf.

The following will just replace, or add , the leaf. Leaving the branch intact.

{ $merge: {into:'existingcollection',
    whenMatched:[{$set:{"content.labels.name":"$$new.content.labels.name"}}]
}}
1 Like

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.