Is it possible to access an array nested inside a dictionary and project each of its elements into a new document?

{
  "_id": {
    "$oid": "6169547919aa9900011c3995"
  },
  "TranscriptionID": "YQPTKEU8JP9844YQVT4WY4HNQP",
  "Status": 1,
  "Created": {
    "$date": {
      "$numberLong": "1633671595666"
    }
  },
  "Modified": {
    "$date": {
      "$numberLong": "1633671595666"
    }
  },
  "Tracks": {
    "A3-A4": {
      "Language": "en_US",
      "Sentences": [
        {
          "SentenceID": "20e8e7f1efd24098be28b3cb5cdbf081",
          "StartTime": 0,
          "EndTime": 0,
          "Speaker": "spk_0",
          "Content": "we should be recording everybody. So yes, this meeting is being recorded, abc",
          "IsApproved": false,
          "ModifiedContent": "we should be recording everybody. So yes, this meeting is being recorded,",
          "Words": [
            {
              "StartTime": 8.84,
              "EndTime": 9.19,
              "Type": "pronunciation",
              "Content": "we",
              "VocabularyFilterMatch": false,
              "Confidence": 1
            }
          ]
        },
        {
          "SentenceID": "fc80abf17f0c42b1833857a1dfb07481",
          "StartTime": 0,
          "EndTime": 0,
          "Speaker": "spk_0",
          "Content": "now we actually have to be somewhat professional. Okay,",
          "IsApproved": true,
          "ModifiedContent": null,
          "Words": [
            {
              "StartTime": 16.74,
              "EndTime": 16.97,
              "Type": "pronunciation",
              "Content": "now",
              "VocabularyFilterMatch": false,
              "Confidence": 0.999
            }
          ]
        },
        {
          "SentenceID": "25f4c1dd322941d4993c02a3270db5bb",
          "StartTime": 8.84,
          "EndTime": 13.565,
          "Speaker": "spk_0",
          "Content": "hahahaha A3-A4 Server",
          "IsApproved": false,
          "ModifiedContent": "we should be recording everybody. So yes, this meeting is being recorded,",
          "Words": [
            {
              "StartTime": 0,
              "EndTime": 0,
              "Type": "pronunciation",
              "Content": "we",
              "VocabularyFilterMatch": false,
              "Confidence": 1
            }
          ]
        }
      ]
    },
    "A5-A6": {
      "Language": "en_US",
      "Sentences": [
        {
          "SentenceID": "ff62bda2900e4121bbd221c3d294af7c",
          "StartTime": 8.84,
          "EndTime": 13.565,
          "Speaker": "spk_0",
          "Content": "we should be recording everybody. So yes, this meeting is being recorded,",
          "IsApproved": false,
          "ModifiedContent": "we should be recording everybody. So yes, this meeting is being recorded,",
          "Words": [
            {
              "StartTime": 0,
              "EndTime": 0,
              "Type": "pronunciation",
              "Content": "we",
              "VocabularyFilterMatch": false,
              "Confidence": 1
            }
          ]
        },
        {
          "SentenceID": "de7328fb670e47b5aa91f8b808cf3e18",
          "StartTime": 16.74,
          "EndTime": 21.965,
          "Speaker": "spk_0",
          "Content": "now we actually have to be somewhat professional. Okay, whatevs",
          "IsApproved": true,
          "ModifiedContent": null,
          "Words": [
            {
              "StartTime": 0,
              "EndTime": 0,
              "Type": "pronunciation",
              "Content": "now",
              "VocabularyFilterMatch": false,
              "Confidence": 0.999
            }
          ]
        }
      ]
    }
  }
}

Hi everyone, I’ve got an example set of data above. Tracks is a dictionary whose keys are track IDs (such as “A3-A4”) and whose values contain a Language and a list of Sentences. I was wondering whether it’s possible to unwind the Sentences array inside Tracks and output each sentence as a new document that also contains its Track key.

Something like this:

{
  "_id": "01GF7QG56P13SBKJHTK51TFK40",
  "transcription:id": "01GCDK5YBHVBN4W2980Y619FTZC",
  "track:id": "A1-A2",
  "sentence:rev": "1",
  "sentence:id": "01GF7QG56P13SBKJHTK51TFK40",
  "startTime:decimal": 0.1275,
  "endTime:decimal": 1.1375,
  "speaker:text": "0",
  "content:text": "Then why do you want to kill",
  "isApproved:bool": false,
  "modifiedContent:text": null,
  "words": [
    {
      "startTime:decimal": 0.1275,
      "endTime:decimal": 0.2975,
      "type:text": "pronunciation",
      "content:text": "Then",
      "vocabularyFilterMatch:bool": false,
      "confidence:number": 0.5578
    },
    {
      "startTime:decimal": 0.2975,
      "endTime:decimal": 0.4475,
      "type:text": "pronunciation",
      "content:text": "why",
      "vocabularyFilterMatch:bool": false,
      "confidence:number": 1
    },
    {
      "startTime:decimal": 0.4475,
      "endTime:decimal": 0.5375,
      "type:text": "pronunciation",
      "content:text": "do",
      "vocabularyFilterMatch:bool": false,
      "confidence:number": 1
    },
    {
      "startTime:decimal": 0.5375,
      "endTime:decimal": 0.6175,
      "type:text": "pronunciation",
      "content:text": "you",
      "vocabularyFilterMatch:bool": false,
      "confidence:number": 1
    },
    {
      "startTime:decimal": 0.6175,
      "endTime:decimal": 0.7875,
      "type:text": "pronunciation",
      "content:text": "want",
      "vocabularyFilterMatch:bool": false,
      "confidence:number": 0.9724
    },
    {
      "startTime:decimal": 0.7875,
      "endTime:decimal": 0.8525,
      "type:text": "pronunciation",
      "content:text": "to",
      "vocabularyFilterMatch:bool": false,
      "confidence:number": 0.9724
    },
    {
      "startTime:decimal": 0.8525,
      "endTime:decimal": 1.1375,
      "type:text": "pronunciation",
      "content:text": "kill",
      "vocabularyFilterMatch:bool": false,
      "confidence:number": 1
    }
  ]
}

Do you really want to “flatten” the data that much?

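Here’s a pipeline that does part of the flattening: it produces one document per sentence, with the track key stored in trackId. You can continue from there if you need the exact shape above.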

db.collection.aggregate([
  {
    "$set": {
      "Tracks": {
        "$objectToArray": "$Tracks"
      }
    }
  },
  {
    "$unwind": "$Tracks"
  },
  {
    "$unwind": "$Tracks.v.Sentences"
  },
  {
    "$set": {
      "trackId": "$Tracks.k",
      "sentence": "$Tracks.v.Sentences"
    }
  },
  {
    "$unset": "Tracks"
  }
])
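A possible continuation, as a sketch: replace the final $set/$unset with a $replaceWith stage that builds the per-sentence document directly, and use $map to rename the fields inside Words. The output field names are copied from the desired document in the question; sentence:rev is left out because the sample has no source field for it, and Speaker is copied unchanged ("spk_0" rather than "0"). Using SentenceID as _id assumes sentence IDs are unique. $set and $replaceWith need MongoDB 4.2 or later.

```javascript
// Hypothetical continuation of the pipeline above (MongoDB 4.2+).
// Output field names follow the desired document in the question;
// the mapping from the original fields is an assumption.
const pipeline = [
  // Tracks dictionary -> array of { k: "<track key>", v: { Language, Sentences } }
  { $set: { Tracks: { $objectToArray: "$Tracks" } } },
  // One document per track
  { $unwind: "$Tracks" },
  // One document per sentence of that track
  { $unwind: "$Tracks.v.Sentences" },
  // Replace each document with a flat, per-sentence document
  {
    $replaceWith: {
      _id: "$Tracks.v.Sentences.SentenceID", // assumes SentenceIDs are unique
      "transcription:id": "$TranscriptionID",
      "track:id": "$Tracks.k",
      "sentence:id": "$Tracks.v.Sentences.SentenceID",
      "startTime:decimal": "$Tracks.v.Sentences.StartTime",
      "endTime:decimal": "$Tracks.v.Sentences.EndTime",
      "speaker:text": "$Tracks.v.Sentences.Speaker", // still "spk_0", not "0"
      "content:text": "$Tracks.v.Sentences.Content",
      "isApproved:bool": "$Tracks.v.Sentences.IsApproved",
      "modifiedContent:text": "$Tracks.v.Sentences.ModifiedContent",
      // Rename the fields of every element in the Words array
      words: {
        $map: {
          input: "$Tracks.v.Sentences.Words",
          as: "w",
          in: {
            "startTime:decimal": "$$w.StartTime",
            "endTime:decimal": "$$w.EndTime",
            "type:text": "$$w.Type",
            "content:text": "$$w.Content",
            "vocabularyFilterMatch:bool": "$$w.VocabularyFilterMatch",
            "confidence:number": "$$w.Confidence",
          },
        },
      },
    },
  },
  // Optionally persist the result, e.g. { $out: "sentences" } (hypothetical collection name)
];

// In mongosh: db.collection.aggregate(pipeline)
```

Unlike the version above, $replaceWith drops every field that is not listed (Status, Created, Modified), so add those explicitly if you still need them in each sentence document.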

Try it on mongoplayground.net.
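If it helps to see what the pipeline produces without running a server, here is a rough in-memory analogue in plain JavaScript (Node.js 14+). It only mirrors the data shape of $objectToArray followed by the two $unwind stages and the final $set/$unset; it is not how MongoDB executes the pipeline, and it assumes Sentences is always an array or missing. The sample is a trimmed copy of the document in the question (only the IDs are kept, and _id is shown as a plain string); flattenTranscription is a hypothetical helper name.

```javascript
// Rough in-memory analogue of the answer's pipeline, for illustration only:
//   $objectToArray on Tracks -> $unwind Tracks -> $unwind Tracks.v.Sentences
//   -> $set { trackId, sentence } -> $unset Tracks
function flattenTranscription(doc) {
  // Separate Tracks from the other top-level fields (the $unset "Tracks" step)
  const { Tracks, ...rest } = doc;
  // $objectToArray: { "A3-A4": {...} } -> [{ k: "A3-A4", v: {...} }, ...]
  const pairs = Object.entries(Tracks ?? {}).map(([k, v]) => ({ k, v }));
  // Two $unwind stages: one output document per (track, sentence) pair.
  // A missing Sentences array yields no output, as with $unwind's default.
  return pairs.flatMap(({ k, v }) =>
    (v.Sentences ?? []).map((sentence) => ({ ...rest, trackId: k, sentence }))
  );
}

// Trimmed copy of the sample document from the question (only IDs kept)
const sample = {
  _id: "6169547919aa9900011c3995",
  TranscriptionID: "YQPTKEU8JP9844YQVT4WY4HNQP",
  Status: 1,
  Tracks: {
    "A3-A4": {
      Language: "en_US",
      Sentences: [
        { SentenceID: "20e8e7f1efd24098be28b3cb5cdbf081" },
        { SentenceID: "fc80abf17f0c42b1833857a1dfb07481" },
        { SentenceID: "25f4c1dd322941d4993c02a3270db5bb" },
      ],
    },
    "A5-A6": {
      Language: "en_US",
      Sentences: [
        { SentenceID: "ff62bda2900e4121bbd221c3d294af7c" },
        { SentenceID: "de7328fb670e47b5aa91f8b808cf3e18" },
      ],
    },
  },
};

const flattened = flattenTranscription(sample);
console.log(flattened.length); // 5
```

The five results are the three sentences under A3-A4 followed by the two under A5-A6; each keeps the remaining top-level fields (here _id, TranscriptionID and Status) and gains trackId and sentence, matching the shape of the pipeline's output.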

