"joining" multiple collections on different fields

Hi,
I need help sone help here.
I have three collections, named employees, customfield.values, customfields.
I want to join customfield.values to employees and customfields to customfield.values.
I manage to do the first join with a lookup aggregation but the second one is the problem.
I cannot lookup customfieldvalues.customfield in the customfield collection.
Can someone help?




//employees
[
{
"_id": "61da0dc4f30580abadcb018c",
"employeenumber": "hitest1"
}]

//customfield.values
[
  {
    "_id": "61daf0c600f0af793e477e7b",
    "customfield": "61daf1ebb124e2e3ce754eda",
    "employee": "61da0dc4f30580abadcb018c",
    "value": "first cf value"
  },
  {
    "_id": "61daf0df9e0eaa2056dc7475",
    "customfield": "61dafdf905b91bce14bff88e",
    "employee": "61da0dc4f30580abadcb018c",
    "value": "second cf value"
  }]

//customfields
[
  {
    "_id": "61daf1ebb124e2e3ce754eda",
    "customfieldname": "first cf"
  },
  {
    "_id": "61dafdf905b91bce14bff88e",
    "customfieldname": "second cf"
  }]

I want to join all three collections together. Starting with employees, joining customfield.values, and then joining customfields to it. If I had to do two aggregations , it would look something like this:


aggregate(
[{"$lookup":
  {"from":"customfield.values",
   "localField":"_id",
   "foreignField":"employee",
   "as":"customfieldvalues"}},
 {"$unwind":"$customfieldvalues"}]),

aggregate(
[{"$lookup":
  {"from":"customfield",
   "localField":"customfieldvalues.customfield",
   "foreignField":"_id",
   "as":"customfieldvalues.customfield"}}])

and thats the result i desire

[
{
"_id": "61da0dc4f30580abadcb018c",
"employeenumber": "hitest1",
"customfieldvalues": [
  {
    "_id": "61daf0c600f0af793e477e7b",
    "customfield": {
      "_id": "61daf1ebb124e2e3ce754eda",
      "customfieldname": "first cf"
    },
    "employee": "61daf0c500f0af793e477e74",
    "value": "first cf value", 
    },
  {
    "_id": "61daf0df9e0eaa2056dc7475",
    "customfield": {
      "_id": "61dafdf905b91bce14bff88e",
      "customfieldname": "second cf"
    },
    "employee": "61daf0c500f0af793e477e74",
    "value": "second cf value",
  }]

Thank you.

Hi @empty ,

With a small nesting of pipeline syntax i $lookup you can achieve that:

[{$lookup: {
 from: 'customfield.values',
 'let': {
  emp_id: '$_id'
 },
 pipeline: [
  {
   $match: {
    $expr: {
     $eq: [
      '$employee',
      '$$emp_id'
     ]
    }
   }
  },
  {
   $lookup: {
    from: 'customfields',
    'let': {
     custom_field: '$customfield'
    },
    pipeline: [
     {
      $match: {
       $expr: {
        $eq: [
         '$_id',
         '$$custom_field'
        ]
       }
      }
     }
    ],
    as: 'customfield'
   }
  }
 ],
 as: 'customfieldvalues'
}}]

Example:

db.employees.aggregate([{$lookup: {
 from: 'customfield.values',
 'let': {
  emp_id: '$_id'
 },
 pipeline: [
  {
   $match: {
    $expr: {
     $eq: [
      '$employee',
      '$$emp_id'
     ]
    }
   }
  },
  {
   $lookup: {
    from: 'customfields',
    'let': {
     custom_field: '$customfield'
    },
    pipeline: [
     {
      $match: {
       $expr: {
        $eq: [
         '$_id',
         '$$custom_field'
        ]
       }
      }
     }
    ],
    as: 'customfield'
   }
  }
 ],
 as: 'customfieldvalues'
}}])
{ _id: '61da0dc4f30580abadcb018c',
  employeenumber: 'hitest1',
  customfieldvalues: 
   [ { _id: '61daf0c600f0af793e477e7b',
       customfield: [ { _id: '61daf1ebb124e2e3ce754eda', customfieldname: 'first cf' } ],
       employee: '61da0dc4f30580abadcb018c',
       value: 'first cf value' },
     { _id: '61daf0df9e0eaa2056dc7475',
       customfield: 
        [ { _id: '61dafdf905b91bce14bff88e',
            customfieldname: 'second cf' } ],
       employee: '61da0dc4f30580abadcb018c',
       value: 'second cf value' } ] }

However, I recommend you to revisit your schema as it looks like a normalised schema for SQL. You won’t benifit from the document model if data that is queried together is not stored that way together…

Look why you can’t store the data the way you desired to query it.

Please read :

Thanks!

Hello @Pavel_Duchovny ,

thank you for the solution. It works.

I actually had this Design as a normalized SQL Schema but I decided to switch to mongodb.

The feature I want to implement has to be very dynamic. Thats why I decided to keep this normalization.
I am not sure how to do it otherwise.

employee to customfield.values is an one-to-many relation.
If I use embedded data I can have the customfield.values but not the customfield.name and parameters like custom regex and other settings which will be saved in the customfield or customfield.config collection (depends if I will use embedded data here).

It is kinda hard to build a very dynamic application where the customer can set every field, grouping, settings etc. with embedded data.

Hi @empty ,

Don’t forget that its ok to duplicate data. So you can store the customers parameter as a customer setup and also store each value in its own collection for other access.

Keeping those consistency does not necessary for every use case…

@Pavel_Duchovny
I see.
So I should just embed the child collections fields into the parent (or at least the fields I need) and keep the child element as it is in its own collection.

I have something like this:
employee has customfield.values,
customfield.values has customfields,
customfields has customfield.options
and so on.

What I have to do now is to make a big employee Schema where I embed all the child collections and fields, their child collections and fields and their child collections and fields, right?

I cannot use the child collection schema since I do not need all the fields and because I do need the child collection of the child collection. So I really need do an overhead and embed everything into employee collection and also embed everything to the child collection since I need the child collection of the child collection of employee…

right?

Thank you.

What I mean is, that I need to do something like this:

employee: {
  customfieldvalues: [{
    key: String,
    value: String,
    customfield: {
      _id: ObjectId,
      name: String,
      customfieldoptions: [
        {_id: ObjectId, name: String, value: String... }
      ]
  }]
}

instead of

employee: {
  customfieldvalues: [customfieldValueSchema]

Because I do not need all the fields in customfield.values, customfield and customfield.options collections, right?

Wouldn’t it be better to just embed the customfieldValueSchema into employee, customfieldSchema into customfieldvalue, customfieldOptionSchema into customfield and so on?

But then again I have cases where I do reference back to one of the parent collections. It would end in a loop, guess :confused:

Any good option to shorten the overhead?
Thanks.

Hi @empty ,

MongoDB is a flexible schema , therefore if you duplicate data you should duplicate only relevant parts or sub parts for a specific document state.

I suspect that you use moongose or some ODM of this sort. Although, it might be convenient in some ways , I see gow it makes you use blocks of fields as one when they should not.

This is why I recommend using the node driver. Documents are already objects so it makes sense to pass them in different shapes to smaller functions that can generic save them

Read the extended reference pattern

Thanks
Pavel