Join documents and then join arrays in documents

Hi,

I have two documents which I can join using $lookup using field linkId. Once joined I needed to be able to join the netsed array from both documents on the id field of the items in the array. Is it possible to do this?

Document 1
{
    _id:
    {
        "$oid": "64678a20d22348271c05f102"
    },
    taskPredictions: [
        {
            id: 2900868,
            predictedCompletionStatus: "Early",
            priority: 1,
        },
        {
            id: 2900488,
            predictedCompletionStatus: "OnTimeInFull",
            priority: 4,
        },
    ],
    linkId:
    {
        "$oid": "64678a20d22348271c05f101"
    },
}

Document 2
{
    _id:
    {
        "$oid": "64678a27d22348271c05f10a"
    },
    tasks: [
        {
            id: 2900868,
            name: "Task 1",

        },
        {
            id: 2900488,
            name: "Task 2",

        },
    ],

    "linkId":
    {
        "$oid": "64678a20d22348271c05f101"
    }
}

the output I’m trying to achieve is like this

{
    tasks: [
        {
            id: 2900868,
            name: "Task 1",
            predictedCompletionStatus: "Early",
            priority: 1,            
        },
        {
            id: 2900488,
            name: "Task 2",
            predictedCompletionStatus: "OnTimeInFull",
            priority: 4,            
        },
    ]
}

If someone could point me in the right direction I’d appreciate it.

Hi @mc_m0ng0,

In my below example, I assume they belong to the same collection "collection" but you can alter and test it accordingly.

Pipeline:

[
  { '$unwind': '$tasks' },
  {
    '$lookup': {
      from: 'collection',
      as: 'test',
      pipeline: [ { '$unwind': '$taskPredictions' } ],
      foreignField: 'linkId',
      localField: 'linkId'
    }
  },
  {
    '$addFields': {
      combinedTaskDetails: {
        '$map': {
          input: '$test',
          as: 'test',
          in: {
            '$cond': {
              if: { '$eq': [ '$$test.taskPredictions.id', '$tasks.id' ] },
              then: {
                tasksv2: {
                  id: '$tasks.id',
                  name: '$tasks.name',
                  predictedCompletionStatus: '$$test.taskPredictions.predictedCompletionStatus',
                  priority: '$$test.taskPredictions.priority'
                }
              },
              else: null
            }
          }
        }
      }
    }
  },
  {
    '$group': {
      _id: '$linkId',
      tasks: { '$push': '$combinedTaskDetails.tasksv2' }
    }
  }
]

Output (based on the sample documents you provided):

[
  {
    _id: ObjectId("64678a20d22348271c05f101"),
    tasks: [
      [
        {
          id: 2900868,
          name: 'Task 1',
          predictedCompletionStatus: 'Early',
          priority: 1
        }
      ],
      [
        {
          id: 2900488,
          name: 'Task 2',
          predictedCompletionStatus: 'OnTimeInFull',
          priority: 4
        }
      ]
    ]
  }
]

It seems it is possible based off the sample documents you provided. However, although the above may give you your desired output, it is far from optimal in terms of performance. You could possibly try adding in $match stage at the start with index usage which could help reduce the amount of documents processed to help out to some degree.

In saying so, if you’re joining these using $lookup initially, is there a reason you don’t just insert these into a single collection in the desired format in the first place?

There could be another way to get the desired output but will let other community members chime in as well.

Regards,
Jason

Thanks. The documents are stored separately as one is calculated from the other and this data is not always needed. The data I provided is just a small sample similar to the structure of my problem data. If I try to store all the data in one document I will potentially hit the 16MB limit. I use match and filter to reduce the data sets before processing.

Your solution has given me some ideas but the $map only seems to work if both nested arrays are sorted on the same field. Any ideas how to work around this as I can’t seem to work out how to sort the arrays on the field I want to join them on. Really what I want in this case is an inner join on the nested arrays.