Hello,
So to explain i have a collection of feedback submission which contains data in the following structure.
{
"_id": {
"$oid": "6524ea72a85a4164cf29f849"
},
"form_id": {
"$oid": "6523d74ecf7337be2640b59b"
},
"user_id": 94584,
"resource_id": 31258,
"resource_type": 1,
"responses": [
{
"field_id": {
"$oid": "6523d74ecf7337be2640b596"
},
"field_type": 2,
"question": "What is your feedback of the session?",
"options": [
{
"_id": {
"$oid": "6523d74ecf7337be2640b597"
},
"label": "Very informative"
},
{
"_id": {
"$oid": "6523d74ecf7337be2640b598"
},
"label": "Content needs to improve"
},
{
"_id": {
"$oid": "6523d74ecf7337be2640b599"
},
"label": "Lecture was good"
}
],
"answer": {
"$oid": "6523d74ecf7337be2640b597"
}
},
{
"field_id": {
"$oid": "6523d74ecf7337be2640b59a"
},
"field_type": 3,
"question": "How would you rate the session",
"numbers": 5,
"start_label": "Poor",
"end_label": "Great",
"answer": 5
},
{
"field_id": {
"$oid": "6524d92af40cb9f6de2966c8"
},
"field_type": 4,
"question": "Long answer Type",
"answer": "Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum".
}
],
"createdAt": {
"$date": "2023-10-10T06:08:50.333Z"
},
"updatedAt": {
"$date": "2023-10-10T06:08:50.333Z"
}
}
so this is a example of one submission and there can be possibly more than 3 million submissions for each
form_id .
For stats (like google forms ) lets say only for form fields like : Single choice or multiple choice or ratings and similar i need to bring the count for all the options .
and i have made this aggregate command
{
$match:
/**
* query: The query in MQL.
*/
{
form_id: ObjectId(
"6523d74ecf7337be2640b59b"
),
"responses.field_type": {
$in: [2, 3],
},
},
},
{
$project:
/**
* specifications: The fields to
* include or exclude.
*/
{
"responses.field_id": 1,
"responses.field_type": 1,
"responses.answer": 1,
},
},
{
$unwind:
/**
* path: Path to the array field.
* includeArrayIndex: Optional name for index.
* preserveNullAndEmptyArrays: Optional
* toggle to unwind null and empty values.
*/
{
path: "$responses",
},
},
{
$match:
/**
* query: The query in MQL.
*/
{
"responses.field_type": {
$in: [2, 3],
},
},
},
{
$group:
/**
* _id: The id of the group.
* fieldN: The first field name.
*/
{
_id: {
field_id: "$responses.field_id",
answer: "$responses.answer",
},
answer: {
$first: "$responses.answer",
},
count: {
$sum: 1,
},
},
},
{
$group:
/**
* _id: The id of the group.
* fieldN: The first field name.
*/
{
_id: "$_id.field_id",
submissions: {
$push: {
k: {
$toString: "$_id.answer",
},
v: "$count",
},
},
},
},
{
$project: {
_id: 1,
submissions: {
$arrayToObject: "$submissions",
},
},
},
while checking it is working for 60k records ( submissions) in 500 ms . I doubt this is a slow query and is there any better way to write this .
Indesxes are on
form_id , responses.field_id
The above is giving response like this which is fine
{
"_id": {
"$oid": "6523d74ecf7337be2640b596"
},
"submissions": {
"6523d74ecf7337be2640b597": 20311,
"6523d74ecf7337be2640b599": 19922,
"6523d74ecf7337be2640b598": 19769
}
},
{
"_id": {
"$oid": "6523d74ecf7337be2640b59a"
},
"submissions": {
"1": 12068,
"2": 12056,
"3": 11919,
"4": 12086,
"5": 11873
}
}
.
Also for fields like short answer and long answer lets say i need to collect only latest 20 response grouped by responses.field_id ( yet to make aggregate command) , ( if anyone can help )