Transforming Deeply Nested Values In Aggregation Pipeline: SOS

Hello Everyone

I’m a newbie with mongoDB aggregation and I’m struggling a bit trying to get my data to look the way I want. I’m a student completing a bootcamp
And we are doing a project where we seed a database of our choice with from millions lines of CSV that was extracted from a SQL database, though Im not sure which one.
For context, the data is questions and answers from a mock retail application we built.

I was given three files. One with questions, one with answers, and one with photos that were uploaded to answers. I successfully used the $ lookup and $out operators
To join these files on the appropriate index and export to a new collection . So now I just have a collection of questions, and a collections of ansPhotos

The issue Is that the data needs to be structurally transformed for different cases.

Suppose I want all the questions and answers for a particular product . Below shows how the question data is structured, giving me all questions for a product_id of 1:

   db.questions.find({ product_id: 1 })[
  ({
    _id: ObjectId('61731a1cae4ca5aef1836b04'),
    question_id: 4,
    product_id: 1,
    question_body: 'How long does it last?',
    question_date: Long('1594341317010'),
    asker_name: 'funnygirl',
    asker_email: 'first.last@gmail.com',
    reported: 0,
    helpful: 6,
  },
  {
    _id: ObjectId('61731a1cae4ca5aef1836b05'),
    question_id: 3,
    product_id: 1,
    question_body: 'Does this product run big or small?',
    question_date: Long('1608535907083'),
    asker_name: 'jbilas',
    asker_email: 'first.last@gmail.com',
    reported: 0,
    helpful: 8,
  },
  {
    _id: ObjectId('61731a1cae4ca5aef1836b06'),
    question_id: 6,
    product_id: 1,
    question_body: 'Is it noise cancelling?',
    question_date: Long('1608855284662'),
    asker_name: 'coolkid',
    asker_email: 'first.last@gmail.com',
    reported: 1,
    helpful: 19,
  },
  {
    _id: ObjectId('61731a1cae4ca5aef1836b08'),
    question_id: 1,
    product_id: 1,
    question_body: 'What fabric is the top made of?',
    question_date: Long('1595884714409'),
    asker_name: 'yankeelover',
    asker_email: 'first.last@gmail.com',
    reported: 0,
    helpful: 1,
  },
  {
    _id: ObjectId('61731a1cae4ca5aef1836b0d'),
    question_id: 5,
    product_id: 1,
    question_body: 'Can I wash it?',
    question_date: Long('1608855284662'),
    asker_name: 'cleopatra',
    asker_email: 'first.last@gmail.com',
    reported: 0,
    helpful: 7,
  },
  {
    _id: ObjectId('61731a1cae4ca5aef1836b13'),
    question_id: 2,
    product_id: 1,
    question_body: 'HEY THIS IS A WEIRD QUESTION!!!!?',
    question_date: Long('1613888219613'),
    asker_name: 'jbilas',
    asker_email: 'first.last@gmail.com',
    reported: 1,
    helpful: 4,
  })
];

I now want to get all the answers for all these questions. For brevity and because I’ll be pasting a lot of context / examples, heres what a couple of answer documents from ansPhotos looks like:

    db.ansPhotos.find({question_id:4})


    [
      {
        _id: ObjectId("61731c9c39b2df95b4573b3c"),
        id: 65,
        question_id: 4,
        body: 'It runs small',
        date: Long("1605784307205"),
        answerer_name: 'dschulman',
        answerer_email: 'first.last@gmail.com',
        reported: 0,
        helpful: 1,
        photos: [
          {
            _id: ObjectId("61731edbbac3ef59b2a59b04"),
            id: 15,
            answer_id: 65,
            url: 'https://images.unsplash.com/photo-1536922645426-5d658ab49b81?ixlib=rb-1.2.1&ixid=eyJhcHBfaWQiOjEyMDd9&auto=format&fit=crop&w=1650&q=80'
          },
          {
            _id: ObjectId("61731edbbac3ef59b2a59b0a"),
            id: 14,
            answer_id: 65,
            url: 'https://images.unsplash.com/photo-1470116892389-0de5d9770b2c?ixlib=rb-1.2.1&ixid=eyJhcHBfaWQiOjEyMDd9&auto=format&fit=crop&w=1567&q=80'
          }
        ]
      },
      {
        _id: ObjectId("61731c9c39b2df95b4573b54"),
        id: 89,
        question_id: 4,
        body: 'Showing no wear after a few months!',
        date: Long("1599089609530"),
        answerer_name: 'sillyguy',
        answerer_email: 'first.last@gmail.com',
        reported: 0,
        helpful: 8,
        photos: []
      }
    ]

Now for the part I’m struggling with.

The data needs to look different for different API calls. I basically need to nest every answer with its photos in each question.

Here are the key challenges I’m facing and transformations I have to make. There are other transformations that I am not discussing because they are easy to do, such as not returning the Object_id for answers,
Transforming the date etc.

  1. Each question has an answers object that is stored in key-value pairs with its “id” as the key and the object as the value.

  2. Each answer must only have the photos url in an array, instead of an array of objects that have a URL property each., as you can see above for answers related to question_id 4.

  3. Some questions do not have any answers. Question with “ question_id:3” below is one such question. I am still expected to return an empty object at the “answers” key if there are no questions for it.

         [
             {
                 "question_id": 4,
                 "question_body": "How long does it last?",
                 "question_date": "2020-07-10T00:35:17.010Z",
                 "asker_name": "funnygirl",
                 "reported": false,
                 "question_helpfullness": 6,
                 "answers": {
                     "65": {
         		 _id: ObjectId("61731c9c39b2df95b4573b3c"),
                         "id": 65,
                         "question_id": 4,
                         "body": "It runs small",
                         "date": 1605784307205,
                         "answerer_name": "dschulman",
                         "answerer_email": "first.last@gmail.com",
                         "helpful": 1,
                         "photos": ["https://images.unsplash.com/photo-1536922645426-5d658ab49b81?ixlib=rb-1.2.1&ixid=eyJhcHBfaWQiOjEyMDd9&auto=format&fit=crop&w=1650&q=80”,
       "https://images.unsplash.com/photo-1470116892389-0de5d9770b2c?ixlib=rb-1.2.1&ixid=eyJhcHBfaWQiOjEyMDd9&auto=format&fit=crop&w=1567&q=80"]
                     },
                     "89": {
                         "_id": "61731c9c39b2df95b4573b54",
                         "id": 89,
                         "question_id": 4,
                         "body": "Showing no wear after a few months!",
                         "date": 1599089609530,
                         "answerer_name": "sillyguy",
                         "answerer_email": "first.last@gmail.com",
                         "reported": 0,
                         "helpful": 8,
                         "photos": []
                     }
                 }
             },
             {
                 "question_id": 5,
                 "question_body": "Can I wash it?",
                 "question_date": "2020-12-25T00:14:44.662Z",
                 "asker_name": "cleopatra",
                 "reported": false,
                 "question_helpfullness": 7,
                 "answers": {
                     "46": {
                         "_id": "61731c9c39b2df95b4573b27",
                         "id": 46,
                         "question_id": 5,
                         "body": "I've thrown it in the wash and it seems fine",
                         "date": 1606022843272,
                         "answerer_name": "marcanthony",
                         "answerer_email": "first.last@gmail.com",
                         "reported": 0,
                         "photos": []
                     },
                     "64": {
                         "_id": "61731c9c39b2df95b4573b3b",
                         "id": 64,
                         "question_id": 5,
                         "body": "It says not to",
                         "date": 1588644950162,
                         "answerer_name": "ceasar",
                         "answerer_email": "first.last@gmail.com",
                         "helpful": 0,
                         "photos": []
                     },
                  }
             },
             {
                 "question_id": 3,
                 "question_body": "Does this product run big or small?",
                 "question_date": "2020-12-21T07:31:47.083Z",
                 "asker_name": "jbilas",
                 "reported": false,
                 "question_helpfullness": 8,
                 "answers": {}
                 }
             },
    

//etc..

What I’ve tried in the pipeline:

Calling db.questions.aggregate([]) with the following stages.

Get all products that have a product id of 1 and are not reported:
Stage 1 :

 {
    '$match': {
      'product_id': 1, 
      'reported': 0
    }
}

Stage 2:

Join all questions documents with their respective answers in an array called “answers”

  {
        '$lookup': {
          'from': 'ansPhotos', 
          'localField': 'question_id', 
          'foreignField': 'question_id', 
          'as': 'answers'
        }
}

Sample output:

questions_answers> db.questions.aggregate([{$match:{product_id:1,reported:0}},{$lookup:{from:'ansPhotos',localField:'question_id',foreignField:'question_id',as:'answers'}}])
[
  {
    _id: ObjectId("61731a1cae4ca5aef1836b04"),
    question_id: 4,
    product_id: 1,
    question_body: 'How long does it last?',
    question_date: Long("1594341317010"),
    asker_name: 'funnygirl',
    asker_email: 'first.last@gmail.com',
    reported: 0,
    helpful: 6,
    answers: [
      {
        _id: ObjectId("61731c9c39b2df95b4573b3c"),
        id: 65,
        question_id: 4,
        body: 'It runs small',
        date: Long("1605784307205"),
        answerer_name: 'dschulman',
        answerer_email: 'first.last@gmail.com',
        reported: 0,
        helpful: 1,
        photos: [
          {
            _id: ObjectId("61731edbbac3ef59b2a59b04"),
            id: 15,
            answer_id: 65,
            url: 'https://images.unsplash.com/photo-1536922645426-5d658ab49b81?ixlib=rb-1.2.1&ixid=eyJhcHBfaWQiOjEyMDd9&auto=format&fit=crop&w=1650&q=80'
          },
          {
            _id: ObjectId("61731edbbac3ef59b2a59b0a"),
            id: 14,
            answer_id: 65,
            url: 'https://images.unsplash.com/photo-1470116892389-0de5d9770b2c?ixlib=rb-1.2.1&ixid=eyJhcHBfaWQiOjEyMDd9&auto=format&fit=crop&w=1567&q=80'
          }
        ]
      },
      {
        _id: ObjectId("61731c9c39b2df95b4573b54"),
        id: 89,
        question_id: 4,
        body: 'Showing no wear after a few months!',
        date: Long("1599089609530"),
        answerer_name: 'sillyguy',
        answerer_email: 'first.last@gmail.com',
        reported: 0,
        helpful: 8,
        photos: []
      }
    ]
  },
  {
    _id: ObjectId("61731a1cae4ca5aef1836b05"),
    question_id: 3,
    product_id: 1,
    question_body: 'Does this product run big or small?',
    question_date: Long("1608535907083"),
    asker_name: 'jbilas',
    asker_email: 'first.last@gmail.com',
    reported: 0,
    helpful: 8,
    answers: []
  },

//etc… 
] 

Stage 3:
unwind each answer array , preserving null arrays because I still need to return questions without answers.

{
    '$unwind': {
      'path': '$answers', 
      'preserveNullAndEmptyArrays': true
    }
  }

I then have a document for each answer and can manipulate the “answers.photos” object. Each answers field is now an object containing the answers.

STAGE 4

Things become muddy here.
For example, I’ve tried to use $addFields, $set and $project to just get the photos.url property for each answer and put it in an array. I’ve had some success doing this. but…

STAGE 5

Then I try to $group them back into arrays of objects And had some success with it… Note the $ifNull is my feeble attempt to give the next stage what it wants, but it is not working.

{
$group: {“answers.id”}, 
  question_id:{$first:"$question_id"}, question_body:{$first:'$question_body'},question_date:{$first:"$question_date"},asker_name:{$first:"$asker_name"},reported:{$first:"$reported"},question_helpfullness:{$first:"$helpful"},
answers:{$push:{$ifNull:["$answers",{_id:"$_id",id:'noanswers' ,question_id:"$question_id",}]}}

} 

BUT I also need to do this at some point:

STAGE 6 or later…

 {
    '$addFields': {
      'answers': {
        '$arrayToObject': {
          '$map': {
            'input': '$answers', 
            'in': {
              'k': {
                '$toString': '$$this.id'
              }, 
              'v': '$$this'
            }
          }
        }
      }
    }
  }

To give me the appropriate key-value pairs as seen in the desired output.

This is where things get muddy. I have tried a TON of configurations over the last 5 days.

In most cases if I directly manipulate the answers array after stage 4, I get this error when I then try to use $addFields:
PlanExecutor error during aggregation :: caused by :: $arrayToObject requires an object with keys ‘k’ and ‘v’, where the value of ‘k’ must be of type string. Found type: null

This is because the question with id 3 has no answers and I’ve inadvertently assigned it to an empty object using any of the methods mentioned in Stage 4.
I’ve tried some $ifNull operations as you can see to give the this question the key value pairs its expecting, but am only successful sometimes and usually there are other weird side effects.

To summarize, is there a way for me to get only the url property out of the “answers.photos” array, account for the edge case of having a question that has no answers, and still structure them in the key value pairs as illustrated?

Apologies if this is too long or difficult to read. If there’s some more formatting I can do to make it better please let me know. I’m just so stuck with this, any help is very very much appreciated.

Joe