Am I using the right approach to avoid counting duplicate values in a $group aggregation stage?

I am developing an online questionnaire system for students. Each section comprises 4-5 different questionnaires, and a student has the option of either retaking a given questionnaire or making a new attempt on the same questionnaire.
The document schema is as follows

Document 1

_id: ObjectId('ID'),
userId: U01,
questionnaireId: Q01,
sectionId: section1,
completed: 1

Document 2

_id: ObjectId('ID'),
userId: U01,
questionnaireId: Q01,
sectionId: section1,
completed: 1

Document 3

_id: ObjectId('ID'),
userId: U01,
questionnaireId: Q02,
sectionId: section1,
completed: 1

Document 4

_id: ObjectId('ID'),
userId: U01,
questionnaireId: Q03,
sectionId: section1,
completed: 1

Document 5

_id: ObjectId('ID'),
userId: U01,
questionnaireId: Q01,
sectionId: section2,
completed: 1

Here, completed indicates that the user has attempted the questionnaire. As you can see, the combination

questionnaireId: Q01,
sectionId: section1,

is repeated twice with different _id values. In this case it should be counted as 1 instead of 2, and the same rule applies to any combination of sectionId and questionnaireId.

The aggregation pipeline is as follows

{
  $group: {
    _id: {
      originalId: '$_id',
      userId: '$userId',
      questionnaireId: '$questionnaireId',
      sectionId: '$sectionId',
      completed: '$completed',
    },
    completedQuestionnaire: { $sum: 1 },
  },
},

The percentage is being calculated in a $project stage, but I am not convinced that this is the right approach.
Any help will be appreciated.

Hi @Daniyal_Khan,

If I understand correctly, you just need to $group on userId, questionnaireId and sectionId; you don’t need the rest. Also, _id is the primary key, which is unique and immutable, so including it in the group key puts every document in its own group and won’t help with deduplication.

Thanks,
Mahi
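
To illustrate the point about _id, here is a minimal plain-JavaScript sketch that mimics what $group with { $sum: 1 } does on the five sample documents. The numeric _id values and the groupCount helper are made up for illustration; this is not MongoDB code, only an emulation of the grouping logic.

```javascript
// Sample documents mirroring the five in the question (the _id values are invented).
const docs = [
  { _id: 1, userId: "U01", questionnaireId: "Q01", sectionId: "section1", completed: 1 },
  { _id: 2, userId: "U01", questionnaireId: "Q01", sectionId: "section1", completed: 1 },
  { _id: 3, userId: "U01", questionnaireId: "Q02", sectionId: "section1", completed: 1 },
  { _id: 4, userId: "U01", questionnaireId: "Q03", sectionId: "section1", completed: 1 },
  { _id: 5, userId: "U01", questionnaireId: "Q01", sectionId: "section2", completed: 1 },
];

// Hypothetical helper: counts documents per distinct combination of keyFields,
// roughly what $group with { $sum: 1 } does.
function groupCount(documents, keyFields) {
  const counts = new Map();
  for (const doc of documents) {
    const key = JSON.stringify(keyFields.map((field) => doc[field]));
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return counts;
}

// With _id in the key, every document is its own group, so nothing is deduplicated.
const withId = groupCount(docs, ["_id", "userId", "questionnaireId", "sectionId", "completed"]);
// Without _id, the two Q01/section1 attempts fall into the same group.
const withoutId = groupCount(docs, ["userId", "questionnaireId", "sectionId"]);

console.log(withId.size);    // 5
console.log(withoutId.size); // 4
```

The number of groups, not the per-group count, is what tells you how many distinct questionnaire/section combinations the user has completed.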

Hi @Mahi_Satyanarayana, I made a mistake in this post: it’s a quiz, not a questionnaire. I am building an online quiz system where one topic, for example Physics, has a set of sections (think Udemy-style sections), and each section comprises 1-M quizzes. I am calculating a user’s progress as a percentage for the entire Physics topic, summing on the completed property, which is set to 1 on each quiz submission by the user.
If the user submits the same quiz (same quizId and sectionId) more than once, I want to count the completed property only once, to remove the duplicates that arise when a student attempts the same quiz 1-M times.

Yes, so you could $match all documents where {completed: 1} and then $group by userId, quizId, and sectionId to remove duplicates when the quiz has been attempted multiple times. Also, curious, why allow more attempts after the first successful submission? Is this for best-of-multiple-submissions use case?

Also, curious, why allow more attempts after the first successful submission? Is this for best-of-multiple-submissions use case?

Yes, because a user might attempt the same quiz multiple times in order to improve their score and grade.

I think I might have to $group twice in my case, right? After the first $group, the results are as follows:

{
    "topicProgress": [
        {
            "_id": {
                "userId": "U01",
                "quizId": "Q01",
                "sectionId": "section1"
            },
            "completedQuiz": 5,
            "totalSections": 8,
            "userId": "U01",
            "quizId": "Q01",
            "progress": 62.5
        },
        {
            "_id": {
                "userId": "U01",
                "quizId": "Q02",
                "sectionId": "section1"
            },
            "completedQuiz": 2,
            "totalSections": 8,
            "userId": "U01",
            "quizId": "Q02",
            "progress": 25
        },
        {
            "_id": {
                "userId": "U01",
                "quizId": "Q01",
                "sectionId": "section2"
            },
            "completedQuiz": 1,
            "totalSections": 8,
            "userId": "U01",
            "quizId": "Q01",
            "progress": 12.5
        }
    ]
}

In this example, I am calculating the total number of sections per topic and dividing the completed count by that total (completedQuiz / totalSections in the output above). So far I have only tested the grouping, and now I think I will have to group again, like this:

First Group Results

[
    {
        "_id": {
            "userId": "U01",
            "quizId": "Q01",
            "sectionId": "section1"
        },
        "completedQuiz": 5
    },
    {
        "_id": {
            "userId": "U01",
            "quizId": "Q02",
            "sectionId": "section1"
        },
        "completedQuiz": 2
    },
    {
        "_id": {
            "userId": "U01",
            "quizId": "Q01",
            "sectionId": "section2"
        },
        "completedQuiz": 1
    }
]

Second Group Results
Pseudocode for second $group results

{
    "_id": {
        "totalCompletedQuiz": 3
    },
    "totalSections": 8,
    "userId": "U01",
    "topicId": "Physics",
    "progress": 37.5
}
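
The two-step grouping described above could be sketched roughly as follows. This is only a sketch under assumptions: the $match on completed follows the earlier suggestion in the thread, the second stage groups on userId alone rather than using the pseudocode's _id shape, and totalSections = 8 is copied from the sample output because the thread does not show how it is computed. The plain-JavaScript part below the pipeline only re-checks the arithmetic on the first-group results; it does not talk to MongoDB.

```javascript
// Stage 1 collapses repeated attempts into one document per
// (userId, quizId, sectionId); stage 2 counts those documents per user.
const pipeline = [
  { $match: { completed: 1 } },
  { $group: { _id: { userId: "$userId", quizId: "$quizId", sectionId: "$sectionId" } } },
  { $group: { _id: "$_id.userId", totalCompletedQuiz: { $sum: 1 } } },
];

// First-group results taken from the post above.
const firstGroupResults = [
  { _id: { userId: "U01", quizId: "Q01", sectionId: "section1" }, completedQuiz: 5 },
  { _id: { userId: "U01", quizId: "Q02", sectionId: "section1" }, completedQuiz: 2 },
  { _id: { userId: "U01", quizId: "Q01", sectionId: "section2" }, completedQuiz: 1 },
];

// Emulate the second $group: one count per first-stage document, keyed by user.
const totals = {};
for (const row of firstGroupResults) {
  totals[row._id.userId] = (totals[row._id.userId] || 0) + 1;
}

const totalSections = 8; // assumption: value copied from the sample output
const progress = (totals["U01"] / totalSections) * 100;

console.log(totals["U01"], progress); // 3 37.5
```

Note that the second stage counts first-stage documents (one per distinct quiz/section pair) instead of summing completedQuiz, which is what keeps repeated attempts from inflating the total.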

I have figured it out, thanks to the following SO answer

This is the solution I came up with

{
  $group: {
    _id: {
      userId: '$userId',
    },
    completedQuiz: {
      $addToSet: {
        quizId: '$quizId',
        sectionId: '$sectionId',
      },
    },
  },
},
{ $project: { completedSections: { $size: '$completedQuiz' } } },

It works like a charm. My earlier mistake was putting both quizId and sectionId in the _id of the $group stage; it seems I need to go through the $group docs again.
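
For readers who want the percentage in the same pipeline, here is a hedged extension of the $addToSet approach. The progress field, the $match stage, and the hard-coded totalSections = 8 are assumptions added for illustration (the thread does not show how the total is obtained). The plain-JavaScript part emulates $addToSet with a Set on invented sample attempts, just to confirm the counting logic.

```javascript
const totalSections = 8; // assumption: copied from the sample output earlier in the thread

const pipeline = [
  { $match: { completed: 1 } },
  {
    $group: {
      _id: { userId: "$userId" },
      // $addToSet keeps each distinct { quizId, sectionId } pair only once.
      completedQuiz: { $addToSet: { quizId: "$quizId", sectionId: "$sectionId" } },
    },
  },
  {
    $project: {
      completedSections: { $size: "$completedQuiz" },
      progress: { $multiply: [{ $divide: [{ $size: "$completedQuiz" }, totalSections] }, 100] },
    },
  },
];

// Invented sample attempts, including one repeated quiz submission.
const attempts = [
  { userId: "U01", quizId: "Q01", sectionId: "section1", completed: 1 },
  { userId: "U01", quizId: "Q01", sectionId: "section1", completed: 1 }, // repeat attempt
  { userId: "U01", quizId: "Q02", sectionId: "section1", completed: 1 },
  { userId: "U01", quizId: "Q01", sectionId: "section2", completed: 1 },
];

// Emulate $addToSet: one Set of distinct (quizId, sectionId) pairs per user.
const pairsByUser = new Map();
for (const attempt of attempts.filter((a) => a.completed === 1)) {
  if (!pairsByUser.has(attempt.userId)) pairsByUser.set(attempt.userId, new Set());
  pairsByUser.get(attempt.userId).add(JSON.stringify([attempt.quizId, attempt.sectionId]));
}

const completedSections = pairsByUser.get("U01").size;
const progress = (completedSections / totalSections) * 100;

console.log(completedSections, progress); // 3 37.5
```

One trade-off to keep in mind: $addToSet builds the whole set of pairs inside a single document per user, which is fine for a handful of quizzes per topic but grows with the number of distinct pairs.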

Awesome. Yes, that’s one way to do it too.

Here’s another way with just a $group stage

{
  $group: {
    _id: {
      userId: '$userId',
      quizId: '$quizId',
      sectionId: '$sectionId',
    },
    completedSections: { $sum: 1 },
  },
}

Unfortunately, that doesn’t work for me, because it returns one group per quizId/sectionId combination instead of a single count per user (see my second answer). Thanks anyway.
