How to perform $lookup on items in arrays stored in multiple values of key-value pairs of a dictionary?

Let me explain the data I’m working on:

  • The collection properties contains attributes for items (e.g. “main subject”, “language”, “publication date”, “publisher”). The field id matches the regular expression P[0-9]+.
  • The collection items mainly contains literary works (e.g. academic books). The field id matches the regular expression Q[0-9]+. The field statements is a dictionary where the keys are the id of itmes in the collection properties.

Here’s a minimal working example:

db.properties.deleteMany({})
db.properties.insertMany([{'id': 'P1', 'label': 'main subject'}])
db.properties.insertMany([{'id': 'P2', 'label': 'language of work'}])

db.items.deleteMany({});
db.items.insertMany([
  {'id': 'Q1', 'label': 'algorithm'},
  {'id': 'Q2', 'label': 'C++'},
  {'id': 'Q3', 'label': 'English'},
  {
    'id': 'Q4',
    'label': 'Algorithms in C++',
    'statements': {
      'P1': [
        {'property': 'P1', 'value': 'Q1'},
        {'property': 'P1', 'value': 'Q2'}
      ],
      'P2': [
        {'property': 'P2', 'value': 'Q3'}
      ]
    }
  },
]);
print('properties')
printjson(db.properties.find({}))
print('items')
printjson(db.items.find({}))

properties
[
  {
    _id: ObjectId('66b4f9031669d5840e2202d8'),
    id: 'P1',
    label: 'main subject'
  },
  {
    _id: ObjectId('66b4f9031669d5840e2202d9'),
    id: 'P2',
    label: 'language of work'
  }
]
items
[
  {
    _id: ObjectId('66b4f9031669d5840e2202da'),
    id: 'Q1',
    label: 'algorithm'
  },
  {
    _id: ObjectId('66b4f9031669d5840e2202db'),
    id: 'Q2',
    label: 'C++'
  },
  {
    _id: ObjectId('66b4f9031669d5840e2202dc'),
    id: 'Q3',
    label: 'English'
  },
  {
    _id: ObjectId('66b4f9031669d5840e2202dd'),
    id: 'Q4',
    label: 'Algorithms in C++',
    statements: {
      P1: [
        {
          property: 'P1',
          value: 'Q1'
        },
        {
          property: 'P1',
          value: 'Q2'
        }
      ],
      P2: [
        {
          property: 'P2',
          value: 'Q3'
        }
      ]
    }
  }
]

I want to get the following result. That is, lookup the field property and value of each item in the array stored in the value of the key-value pairs in the field statements.

I don’t know how to instruct $lookup to iterate through all key-value pairs in the dictionary statements. Then, iterate through all items in the array stored in the value of the key-value pairs of statements and perform $lookup in the fields property and value.

[
  {
    _id: ObjectId('66b4f699c56a647e922202dd'),
    id: 'Q4',
    label: 'Algorithms in C++',
    statements: {
      P1: [
        {
          property: {'id': 'P1', 'label': 'main subject'}
          value: {'id': 'Q1', 'label': 'algorithm'}
        },
        {
          property: {'id': 'P1', 'label': 'main subject'}
          value: {'id': 'Q2', 'label': 'C++'}
        }
      ],
      P2: [
        {
          property: {'id': 'P2', 'label': 'language'}
          value: {'id': 'Q3', 'label': 'English'}
        }
      ]
    }
  }
]

I searched information on the Internet, but I only found examples that showed how to perform $lookup on explicit fields. I didn’t find examples on how to perform $lookup on an arbitrary number of fields.

I managed to do it on my own. Consider the following data:

db.properties.deleteMany({})
db.properties.insertMany([{'_id': 'P1', 'label': 'main subject'}])
db.properties.insertMany([{'_id': 'P2', 'label': 'language'}])
db.properties.insertMany([{'_id': 'P3', 'label': 'instance of'}])

db.items.deleteMany({});
db.items.insertMany([
  {'_id': 'Q1', 'label': 'algorithm'},
  {'_id': 'Q2', 'label': 'C++'},
  {'_id': 'Q3', 'label': 'English'},
  {'_id': 'Q4', 'label': 'written work'},
  {
    '_id': 'Q5',
    'label': 'Algorithms in C++',
    'statements': {
      'P1': [{'value': 'Q1'}, {'value': 'Q2'}],
      'P2': [{'value': 'Q3'}],
      'P3': [{'value': 'Q4'}]
    }
  },
]);

This is the query I came up with:

printjson(db.items.aggregate([
  // We use objectToArray so that afterwards we can unwind $statements.
  {
    $project: {
      statements: {
        $objectToArray: '$statements'
      }
    }
  },
  // We $unwind $statements so that we have an object for each
  // different property.
  {
    $unwind: "$statements"
  },
  // We $unwind $statements.v so that we have an object for each
  // different pair of property and value.
  {
    $unwind: "$statements.v"
  },
  // lookup properties
  {
    $lookup: {
      'from': 'properties',
      'localField': 'statements.k',
      'foreignField': '_id',
      'as': 'statements.k'
    }
  },
  // convert array to object in key "k"
  {
    $unwind: "$statements.k"
  },
  // lookup items
  {
    $lookup: {
      'from': 'items',
      'localField': 'statements.v.value',
      'foreignField': '_id',
      'as': 'statements.v'
    }
  },
  // convert array to object in key "v"
  {
    $unwind: "$statements.v"
  },
  // group by item id and property id
  {
    $group: {
      "_id": {
        "_id": "$_id",
        "property_id": "$statements.k._id"
      },
      "statements": {
        "$push": {
          "property": "$statements.k",
          "value": "$statements.v"
        }
      }
    }
  },
  // group by property id
  {
    $group: {
      "_id": "$_id._id",
      "statements": {
        "$push": {
          k: "$_id.property_id",
          v: "$statements"
        }
      }
    }
  },
  // make statements be an object
  {
    $project: {
      "statements": {
        $arrayToObject: "$statements"
      }
    }
  },
]))

output:

[
  {
    _id: 'Q5',
    statements: {
      P1: [
        {
          property: {
            _id: 'P1',
            label: 'main subject'
          },
          value: {
            _id: 'Q1',
            label: 'algorithm'
          }
        },
        {
          property: {
            _id: 'P1',
            label: 'main subject'
          },
          value: {
            _id: 'Q2',
            label: 'C++'
          }
        }
      ],
      P2: [
        {
          property: {
            _id: 'P2',
            label: 'language'
          },
          value: {
            _id: 'Q3',
            label: 'English'
          }
        }
      ],
      P3: [
        {
          property: {
            _id: 'P3',
            label: 'instance of'
          },
          value: {
            _id: 'Q4',
            label: 'written work'
          }
        }
      ]
    }
  }
]

This is the first time I write an aggregate query that involves more than 2 operations, so I wonder if there’s a shorter way to get the same output.