ESR rule when sorting is done on a field inside of a embedded document

Big_Cat_Public_Safety_Act · March 8, 2023, 5:07pm

{
  tests: [
    { grade: 90  },
    { grade: 80  },
    { grade: 100 },
    ...
  ]
}

This is the schema of my collection.

I have an index on tests.grade.

In my aggregation pipeline, after $elemMatch inside of a $match, there will be

[
  { $sort: { "tests.grade", 1 } }
  { $limit: 10 }
]

My question is, will this work, given that tests.grade is a field inside of an array of embedded documents? Would it know to use the embedded doc that was matched in the elemMatch? What if there are multiple matches? Would this be optimized as in ESR rule?

Satyam · March 15, 2023, 4:08am

Hey @Big_Cat_Public_Safety_Act,

The $sort aggregation stage should work as expected, even if tests.grade is a field inside an array of embedded documents. When using $elemMatch inside $match, it returns the first element that matches the specified condition in the array, and only that element is used in the next pipeline stages.

Regarding multiple matches, $sort should work as expected, sorting all the matched embedded documents The order of the documents after the sort operation will depend on the sort order specified in the $sort stage. To confirm this, I tried to make a sample collection from the sample document you provided. This is what the documents looked like:

> db.test.find({}).limit(5)
{
  _id: ObjectId("6410499d19ad274fe51c6c18"),
  tests: [
    {
      grade: 90
    },
    {
      grade: 80
    },
    {
      grade: 100
    }
  ]
}
{
  _id: ObjectId("64104fd819ad274fe51c6c1b"),
  tests: [
    {
      grade: 40
    },
    {
      grade: 50
    },
    {
      grade: 10
    }
  ]
}
{
  _id: ObjectId("6410500819ad274fe51c6c1c"),
  tests: [
    {
      grade: 75
    },
    {
      grade: 80
    },
    {
      grade: 92
    }
  ]
}
{
  _id: ObjectId("6410499d19ad274fe51c6c19"),
  tests: [
    {
      grade: 70
    },
    {
      grade: 80
    },
    {
      grade: 90
    }
  ]
}
{
  _id: ObjectId("6410499d19ad274fe51c6c1a"),
  tests: [
    {
      grade: 60
    },
    {
      grade: 70
    },
    {
      grade: 80
    }
  ]
}

tests.grade has an index.

Based on the information you provided:

I created an aggregation pipeline:

[
  {
    $match:
      {
        tests: {
          $elemMatch: {
            grade: {
              $gte: 50,
              $lt: 95,
            },
          },
        },
      },
  },
  {
    $sort:
      {
        "tests.grade": 1,
      },
  },
  {
    $limit:10,
  }
]

The result was as we expected, the output was shown in increasing order :

{
  _id: ObjectId("64104fd819ad274fe51c6c1b"),
  tests: [
    {
      grade: 40
    },
    {
      grade: 50
    },
    {
      grade: 10
    }
  ]
}
{
  _id: ObjectId("6410499d19ad274fe51c6c1b"),
  tests: [
    {
      grade: 50
    },
    {
      grade: 60
    },
    {
      grade: 70
    }
  ]
}
{
  _id: ObjectId("6410499d19ad274fe51c6c1a"),
  tests: [
    {
      grade: 60
    },
    {
      grade: 70
    },
    {
      grade: 80
    }
  ]
}
{
  _id: ObjectId("6410499d19ad274fe51c6c1f"),
  tests: [
    {
      grade: 60
    },
    {
      grade: 65
    },
    {
      grade: 70
    }
  ]
}
{
  _id: ObjectId("6410499d19ad274fe51c6c19"),
  tests: [
    {
      grade: 70
    },
    {
      grade: 80
    },
    {
      grade: 90
    }
  ]
}
{
  _id: ObjectId("6410499d19ad274fe51c6c1e"),
  tests: [
    {
      grade: 70
    },
    {
      grade: 75
    },
    {
      grade: 80
    }
  ]
}
{
  _id: ObjectId("6410499d19ad274fe51c6c22"),
  tests: [
    {
      grade: 70
    },
    {
      grade: 80
    },
    {
      grade: 90
    }
  ]
}
{
  _id: ObjectId("6410500819ad274fe51c6c1c"),
  tests: [
    {
      grade: 75
    },
    {
      grade: 80
    },
    {
      grade: 92
    }
  ]
}
{
  _id: ObjectId("6410499d19ad274fe51c6c18"),
  tests: [
    {
      grade: 90
    },
    {
      grade: 80
    },
    {
      grade: 100
    }
  ]
}
{
  _id: ObjectId("6410499d19ad274fe51c6c1d"),
  tests: [
    {
      grade: 80
    },
    {
      grade: 85
    },
    {
      grade: 90
    }
  ]
}

Regarding the optimization of the query, if you have an index on “tests.grade”, then the query should perform well. MongoDB’s query optimizer should use the index to speed up the sort operation, which will improve the query’s performance. However, the specific optimization strategy may depend on the size of the collection, the number of matches, and the sort order. You can use explain output to check this all. The explain output for the above aggregation query without any index looked like this:

executionStats: {
    executionSuccess: true,
    nReturned: 10,
    executionTimeMillis: 0,
    totalKeysExamined: 0,
    totalDocsExamined: 13,
    executionStages: {
      stage: 'SORT',
      nReturned: 10,
      executionTimeMillisEstimate: 0,
      works: 26,
      advanced: 10,
      needTime: 15,
      needYield: 0,
      saveState: 0,
      restoreState: 0,
      isEOF: 1,
      sortPattern: {
        'tests.grade': 1
      },
      memLimit: 104857600,
      limitAmount: 10,
      type: 'simple',
      totalDataSizeSorted: 1476,
      usedDisk: false,
      spills: 0,
      inputStage: {
        stage: 'COLLSCAN',
        filter: {
          tests: {
            '$elemMatch': {
              '$and': [
                {
                  grade: {
                    '$lt': 95
                  }
                },
                {
                  grade: {
                    '$gte': 50
                  }
                }
              ]
            }
          }
        },
        nReturned: 13,
        executionTimeMillisEstimate: 0,
        works: 15,
        advanced: 13,
        needTime: 1,
        needYield: 0,
        saveState: 0,
        restoreState: 0,
        isEOF: 1,
        direction: 'forward',
        docsExamined: 13
      }
    }
  }

As we can see, it had to do a collection scan (COLLSCAN). With indexes, the explain output looked like this:

executionStats: {
      executionSuccess: true,
      nReturned: 10,
      executionTimeMillis: 0,
      totalKeysExamined: 21,
      totalDocsExamined: 10,
      executionStages: {
        stage: 'LIMIT',
        nReturned: 10,
        executionTimeMillisEstimate: 0,
        works: 22,
        advanced: 10,
        needTime: 11,
        needYield: 0,
        saveState: 0,
        restoreState: 0,
        isEOF: 1,
        limitAmount: 10,
        inputStage: {
          stage: 'FETCH',
          filter: {
            tests: {
              '$elemMatch': {
                '$and': [
                  {
                    grade: {
                      '$lt': 95
                    }
                  },
                  {
                    grade: {
                      '$gte': 50
                    }
                  }
                ]
              }
            }
          },
          nReturned: 10,
          executionTimeMillisEstimate: 0,
          works: 21,
          advanced: 10,
          needTime: 11,
          needYield: 0,
          saveState: 0,
          restoreState: 0,
          isEOF: 0,
          docsExamined: 10,
          alreadyHasObj: 0,
          inputStage: {
            stage: 'IXSCAN',
            nReturned: 10,
            executionTimeMillisEstimate: 0,
            works: 21,
            advanced: 10,
            needTime: 11,
            needYield: 0,
            saveState: 0,
            restoreState: 0,
            isEOF: 0,
            keyPattern: {
              'tests.grade': 1
            },
            indexName: 'tests.grade_1',
            isMultiKey: true,
            multiKeyPaths: {
              'tests.grade': [
                'tests'
              ]
            },
            isUnique: false,
            isSparse: false,
            isPartial: false,
            indexVersion: 2,
            direction: 'forward',
            indexBounds: {
              'tests.grade': [
                '[MinKey, MaxKey]'
              ]
            },
            keysExamined: 21,
            }
        }
      }
    }

As we can see, the inputStage has no COLLSCAN this time and there is an IXSCAN happening. Of course, there may be some changes to your explain output based on the full structure of your documents and the exact aggregation query that you’re using. I would suggest using explain output to understand which indexes your query is using. Also, using compass for writing aggregation queries can help a lot too since one can easily see the output after each stage and analyze accordingly.

Regards,
Satyam

system · March 20, 2023, 4:08am

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.