After some aggregation stages, I want the unwind result with other collections fields with their values as blank/empty

am usign mongo version as v5.0.19

I have 2 collections with the data in it as below:

Collection 1 - external_S_P_FLAT_main_api:

[
    {
        "_id": {
            "$oid": "654c6a594d0867aef588674d"
        },
        "data.pricing.material": "TG11",
        "data.pricing.controlling_area": "AJ00",
        "data.pricing.cost_center": "DE0000012"
    },
    {
        "_id": {
            "$oid": "654c6a594d0867aef588674e"
        },
        "data.pricing.material": "TG12",
        "data.pricing.controlling_area": "AJ00",
        "data.pricing.cost_center": "DE000002"
    },
    {
        "_id": {
            "$oid": "654c6a594d0867aef588674f"
        },
        "data.pricing.material": "TG14",
        "data.pricing.controlling_area": "AJ00",
        "data.pricing.cost_center": "DE000003"
    },
    {
        "_id": {
            "$oid": "654c6a594d0867aef5886750"
        },
        "data.pricing.material": "TG2341",
        "data.pricing.controlling_area": "AJ00",
        "data.pricing.cost_center": "DE000004"
    }
]

Collection 2 - external_S_C_FLAT_main_api:

[
    {
        "_id": {
            "$oid": "654c6a594d0867aef5886751"
        },
        "data.costcenter.controlling_area": "AJ00",
        "data.costcenter.cost_center": "DE000001",
        "data.costcenter.valid_from_date": "2023-09-12",
        "data.costcenter.long_description": "CC DE000001 - 3rd"
    },
    {
        "_id": {
            "$oid": "654c6a594d0867aef5886752"
        },
        "data.costcenter.controlling_area": "AJ00",
        "data.costcenter.cost_center": "DE000002",
        "data.costcenter.valid_from_date": "2022-03-02",
        "data.costcenter.long_description": "CC DE000002 - 1st"
    },
    {
        "_id": {
            "$oid": "654c6a594d0867aef5886753"
        },
        "data.costcenter.controlling_area": "AJ00",
        "data.costcenter.cost_center": "DE000003",
        "data.costcenter.valid_from_date": "2023-10-25",
        "data.costcenter.long_description": "CC DE000003 - 1st"
    },
    {
        "_id": {
            "$oid": "654c6a594d0867aef5886754"
        },
        "data.costcenter.controlling_area": "AJ00",
        "data.costcenter.cost_center": "DE000004",
        "data.costcenter.valid_from_date": "2023-10-25",
        "data.costcenter.long_description": "CC DE000004 - 2nd"
    }
]

I am executing the below query:

db.external_S_P_FLAT_main_api.aggregate([
    {
      "$addFields": {
        "external_S_P_FLAT_main_api_data.pricing.controlling_area": "$data.pricing.controlling_area"
      }
    },
    {
      "$addFields": {
        "external_S_P_FLAT_main_api_data.pricing.cost_center": "$data.pricing.cost_center"
      }
    },
    {
      "$addFields": {
        "external_S_P_FLAT_main_api_data.pricing.cost_center": "$data.pricing.cost_center"
      }
    },
    {
      "$lookup": {
        from: "external_S_C_FLAT_main_api",
        let: {
          let_data__pricing__cost_center: {
            $getField: "data.pricing.cost_center"
          }
        },
        pipeline: [
          {
            "$match": {
              "$expr": {
                "$and": [
                  {
                    "$eq": [
                      {
                        $getField: "data.costcenter.cost_center"
                      },
                      "$let_data__pricing__cost_center"
                    ]
                  }
                ]
              }
            }
          }
        ],
        as: "from_external_S_C_FLAT_main_api"
      }
    },
    {
      "$unwind": {
        path: "$from_external_S_C_FLAT_main_api",
        preserveNullAndEmptyArrays: true,
        // includeArrayIndex: ""
        
      }
    },
    {
      "$replaceRoot": {
        newRoot: {
          "$mergeObjects": [
            "$from_external_S_C_FLAT_main_api",
            "$ROOT"
          ]
        }
      }
    },
    {
      "$project": {
        _id: 0,
        "external_S_P_FLAT_main_api_data.pricing.controlling_area": 0,
        "external_S_P_FLAT_main_api_data.pricing.cost_center": 0,
        // from_external_S_C_FLAT_main_api: 0
      }
    }
  ])

Below is the result:

[
    {
      "data.pricing.controlling_area": "AJ00",
      "data.pricing.cost_center": "DE0000012",
      "data.pricing.material": "TG11",
      "external_S_P_FLAT_main_api_data": {
        "pricing": {}
      }
    },
    {
      "data.costcenter.controlling_area": "AJ00",
      "data.costcenter.cost_center": "DE000002",
      "data.costcenter.long_description": "CC DE000002 - 1st",
      "data.costcenter.valid_from_date": "2022-03-02",
      "data.pricing.controlling_area": "AJ00",
      "data.pricing.cost_center": "DE000002",
      "data.pricing.material": "TG12",
      "external_S_P_FLAT_main_api_data": {
        "pricing": {}
      },
      "from_external_S_C_FLAT_main_api": {
        "_id": ObjectId("654c6a594d0867aef5886752"),
        "data.costcenter.controlling_area": "AJ00",
        "data.costcenter.cost_center": "DE000002",
        "data.costcenter.long_description": "CC DE000002 - 1st",
        "data.costcenter.valid_from_date": "2022-03-02"
      }
    },
    {
      "data.costcenter.controlling_area": "AJ00",
      "data.costcenter.cost_center": "DE000003",
      "data.costcenter.long_description": "CC DE000003 - 1st",
      "data.costcenter.valid_from_date": "2023-10-25",
      "data.pricing.controlling_area": "AJ00",
      "data.pricing.cost_center": "DE000003",
      "data.pricing.material": "TG14",
      "external_S_P_FLAT_main_api_data": {
        "pricing": {}
      },
      "from_external_S_C_FLAT_main_api": {
        "_id": ObjectId("654c6a594d0867aef5886753"),
        "data.costcenter.controlling_area": "AJ00",
        "data.costcenter.cost_center": "DE000003",
        "data.costcenter.long_description": "CC DE000003 - 1st",
        "data.costcenter.valid_from_date": "2023-10-25"
      }
    },
    {
      "data.costcenter.controlling_area": "AJ00",
      "data.costcenter.cost_center": "DE000004",
      "data.costcenter.long_description": "CC DE000004 - 2nd",
      "data.costcenter.valid_from_date": "2023-10-25",
      "data.pricing.controlling_area": "AJ00",
      "data.pricing.cost_center": "DE000004",
      "data.pricing.material": "TG2341",
      "external_S_P_FLAT_main_api_data": {
        "pricing": {}
      },
      "from_external_S_C_FLAT_main_api": {
        "_id": ObjectId("654c6a594d0867aef5886754"),
        "data.costcenter.controlling_area": "AJ00",
        "data.costcenter.cost_center": "DE000004",
        "data.costcenter.long_description": "CC DE000004 - 2nd",
        "data.costcenter.valid_from_date": "2023-10-25"
      }
    }
  ]

The result array’s first object is not having any data for collection 2.

I want it to be filled with the keys having blank/empty values.

Expected result is as below:

[
    {
      "data.costcenter.controlling_area": "",
      "data.costcenter.cost_center": "",
      "data.costcenter.long_description": "",
      "data.costcenter.valid_from_date": "",
      "data.pricing.controlling_area": "AJ00",
      "data.pricing.cost_center": "DE0000012",
      "data.pricing.material": "TG11",
      "external_S_P_FLAT_main_api_data": {
        "pricing": {}
      },
      "from_external_S_C_FLAT_main_api": [
        {
            "_id": "",
            "data.costcenter.controlling_area": "",
            "data.costcenter.cost_center": "",
            "data.costcenter.long_description": "",
            "data.costcenter.valid_from_date": ""
        }
      ]
    },
    {
      "data.costcenter.controlling_area": "AJ00",
      "data.costcenter.cost_center": "DE000002",
      "data.costcenter.long_description": "CC DE000002 - 1st",
      "data.costcenter.valid_from_date": "2022-03-02",
      "data.pricing.controlling_area": "AJ00",
      "data.pricing.cost_center": "DE000002",
      "data.pricing.material": "TG12",
      "external_S_P_FLAT_main_api_data": {
        "pricing": {}
      },
      "from_external_S_C_FLAT_main_api": {
        "_id": ObjectId("654c6a594d0867aef5886752"),
        "data.costcenter.controlling_area": "AJ00",
        "data.costcenter.cost_center": "DE000002",
        "data.costcenter.long_description": "CC DE000002 - 1st",
        "data.costcenter.valid_from_date": "2022-03-02"
      }
    },
    {
      "data.costcenter.controlling_area": "AJ00",
      "data.costcenter.cost_center": "DE000003",
      "data.costcenter.long_description": "CC DE000003 - 1st",
      "data.costcenter.valid_from_date": "2023-10-25",
      "data.pricing.controlling_area": "AJ00",
      "data.pricing.cost_center": "DE000003",
      "data.pricing.material": "TG14",
      "external_S_P_FLAT_main_api_data": {
        "pricing": {}
      },
      "from_external_S_C_FLAT_main_api": {
        "_id": ObjectId("654c6a594d0867aef5886753"),
        "data.costcenter.controlling_area": "AJ00",
        "data.costcenter.cost_center": "DE000003",
        "data.costcenter.long_description": "CC DE000003 - 1st",
        "data.costcenter.valid_from_date": "2023-10-25"
      }
    },
    {
      "data.costcenter.controlling_area": "AJ00",
      "data.costcenter.cost_center": "DE000004",
      "data.costcenter.long_description": "CC DE000004 - 2nd",
      "data.costcenter.valid_from_date": "2023-10-25",
      "data.pricing.controlling_area": "AJ00",
      "data.pricing.cost_center": "DE000004",
      "data.pricing.material": "TG2341",
      "external_S_P_FLAT_main_api_data": {
        "pricing": {}
      },
      "from_external_S_C_FLAT_main_api": {
        "_id": ObjectId("654c6a594d0867aef5886754"),
        "data.costcenter.controlling_area": "AJ00",
        "data.costcenter.cost_center": "DE000004",
        "data.costcenter.long_description": "CC DE000004 - 2nd",
        "data.costcenter.valid_from_date": "2023-10-25"
      }
    }
  ]
 
  

I have tried the blow section by adding the includeArrayIndex

{
      "$unwind": {
        path: "$from_external_S_C_FLAT_main_api",
        preserveNullAndEmptyArrays: true,
        // includeArrayIndex: ""
        
      }
    },

Below is the playground for the same Mongo playground

I hope the reasons are good for you to store fields name in flat dot notation rather than object tree.

I see many reasons why it is wrong.

You documents in external_S_P_FLAT_main_api are 14 bytes bigger compare to:

{
  _id: {
            "$oid": "654c6a594d0867aef588674d"
  },
  data: {
    pricing: {
      material: 'TG11',
      controlling_area: 'AJ00',
      cost_center: 'DE0000012'
    }
  }
}

while the documents in external_S_C_FLAT_main_api are 50 bytes bigger compare to a formed object tree such as:

{
        "_id": {
            "$oid": "654c6a594d0867aef5886751"
        },
        "data" : {
            "costcenter" : {
                "controlling_area": "AJ00",
                "cost_center": "DE000001",
                "valid_from_date": "2023-09-12",
                "long_description": "CC DE000001 - 3rd"
            }
      }    
}

I often say to not create a bad schema just to save a few bytes.

But in this case saving a few bytes leads to a better schema and to cleaner code. You could in one simple $addFields get all the fields of data.costcenter rather than having to do it for each. Which leads to errors like:

Why is the above repeated? Note that you could merge all consecutive $addFields into a single one.

With a proper object tree structure you could use localField/foreignField rather than a $match with $let and $getField.

@steevej, thanks for correcting me.
I understand that storing the data in such a way requires a bit bigger size but the data in the collection is completely getting wiped after the execution of a daily cron.

  • you could merge all consecutive $addFields into a single one.

I am working on it to get in a single one.

But with the same query and the schema, how can I get the expected result (as posted in the main thread)?

I am not too sure as it would requires the use of $replaceWith and $setField which I am not familiar with.

But I update your playground to one playground that shows the logic but that generates the missing field to a proper object because I do not know how to do it otherwise. May be you can flatten it in subsequent stage. You see simpler and more efficient schema leads to simpler and probably more efficient code.

I would not consider the above a good reason to keep this schema around.