How to flatten a nested structure?

Hello everyone, I am trying to query a nested structure and want to group the lineText in an array. I have the following structure.

[
{
“_id”: ObjectId(“5ed4bcda3c6f0e386b9bee92”),
“parsedText”: [
{

    "paragraphs": [
      [
        {
          "lineText": "E x h i b i t  1 0 . 2",
          "startT": [
            10.00648158191587
          ],
          "endT": [
            10.00648158191587
          ]
        }
      ],
      [
        {
          "lineText": "VENDOR Afdfj AGREEMENT",
          "startT": [
            10.00648158191587
          ],
          "endT": [
            10.00648158191587
          ]
        }
      ]
  ]
  },
  {
    "paragraphs": [
      [
        {
          "lineText": "CONFIDENTIALITY",
          "startT": [
            10.00648158191587,
            NumberInt(0),
            NumberInt(0),
            10.00648158191587,
            34.00538853510875,
            711.4527903321338
          ],
          "endT": [
            10.00648158191587,
            NumberInt(0),
            NumberInt(0),
            10.00648158191587,
            70.0377305857611,
            711.4527903321338
          ]
        }
      ],
      [
        {
          "lineText": "Tesijtiosje",
          "startT": [
            10.00648158191587
          ],
          "endT": [
            10.00648158191587,
          ]
        },
        {
          "lineText": "wkdfskdjfksdlfkjlsdjf",
          "startT": [
            10.00648158191587,
          ],
          "endT": [
            10.00648158191587,
          ]
        },
        {
          "lineText": "asdjiasjdoasid",
          "startT": [
            10.00648158191587,
          ],
          "endT": [
            10.00648158191587,
          ]
        }
      ],
    ]
  }
]

}
]

is it possible to have everything unwind (lineText) in parsedText like:
"parsedText": ["", "", "", ""]

or

“parsedText”: [
{

    "paragraphs": ["", ""]
  },
  {
    "paragraphs": ["", ""]
  }

]

A help will be highly appreciated.

Hello @Sami_Ullah, welcome to the MongoDB Community forum!

This aggregation query can flatten the nested structure:

 db.collection.aggregate([
  { 
      $unwind: "$parsedText" 
  },
  { 
      $unwind: "$parsedText.paragraphs" 
  },
  { 
      $addFields: { 
          parsedText: { $arrayElemAt: [ "$parsedText.paragraphs", 0 ] } 
      } 
  },
  { 
      $group: { 
          _id: "$_id", 
          parsedText: { $push: "$parsedText.lineText" } 
      } 
  },
])

You get a result like this:

{
        "_id" : ObjectId("5ed4bcda3c6f0e386b9bee92"),
        "parsedText" : [
                "E x h i b i t  1 0 . 2",
                "VENDOR Afdfj AGREEMENT",
                "CONFIDENTIALITY", ...
        ]
}
2 Likes

Thank you so much Prasad for the help. However, some of the lineText is missing in the output.

[Mongo playground](Link : Mongo playground)

Bin Architects Office (hereinafter referred to as Party B) hereby agree to establish the following terms and conditions with regard to the

This is in the input but missing in aggregated output.

@Sami_Ullah, I had tried based upon the input document you had posted (initial post). I don’t know what is missing (I have no way to figure based upon the sample you had provided). You need to be specific what is missing, from a simple input document which I can verify.

1 Like

@Prasad_Saya thank you for quick response.

Here’s the link to the input I initially provided. https://mongoplayground.net/p/yPQbp9ahits

Based on this input, wkdfskdjfksdlfkjlsdjf line 66 and asdjiasjdoasid line 77 are missing from the output.

Hello

I tested this fast,seems to work,but not sure it does what you need.

db.collection.aggregate([
  {
    $unwind: "$parsedText"
  },
  {
    $unwind: "$parsedText.paragraphs"
  },
  {
    $replaceRoot: {
      newRoot: "$parsedText"
    }
  },
  {
    $unwind: "$paragraphs"
  },
  {
    $replaceRoot: {
      newRoot: "$paragraphs"
    }
  },
  {
    $project: {
      "lineText": 1
    }
  },
  {
    $group: {
      _id: "$lineText"
    }
  }
])

Or maybe this is better

db.collection.aggregate([
  {
    $unwind: "$parsedText"
  },
  {
    "$project": {
      "_id": 0,
      "paragraphs": "$parsedText.paragraphs"
    }
  },
  {
    $unwind: "$paragraphs"
  },
  {
    $unwind: "$paragraphs"
  },
  {
    "$project": {
      "lineText": "$paragraphs.lineText"
    }
  },
  {
    $group: {
      _id: "$lineText"
    }
  }
])

If you want them as array you can also add in both,at the end

{
    $group: {
      _id: null,
      parsedText: {
        "$push": "$_id"
      }
    }
  },
  {
    "$project": {
      "_id": 0,
      "parsedText": 1
    }
  },

Thank you so much. This works. :slight_smile:

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.