Best way to sort by date after index on a field?

Hi

I have a small schema like this:

{
  _id: ObjectId('65110ce0d7c21aaca1d1sf33'),
  bookingId: "1234",
  pgId: ObjectId('65110ce0d7c21aaca1d1db33'),
  scheduledDate: ISODate("2023-09-16T05:30:00.000Z")
}

I have an index on:
a. pgId

My collection could hold a huge number of documents.

  • I want to search by pgId and sort the matching subset of data by scheduledDate

What is the best way to do this?
I have tried two ways:
a. SORT stage executed in RAM

find( { pgId: ObjectId('65110ce0d7c21aaca1d1db33') } ).sort({ scheduledDate : -1 }).explain("executionStats")

Here is the output

db.bookingOrders.find( { pgId: ObjectId('65110ce0d7c21aaca1d1db33') } ).sort({ scheduledDate : -1 }).explain("executionStats")

{
  explainVersion: '1',
  queryPlanner: {
    namespace: 'foody.bookingOrders',
    indexFilterSet: false,
    parsedQuery: { pgId: { '$eq': ObjectId("65110ce0d7c21aaca1d1db33") } },
    queryHash: '0D151AAD',
    planCacheKey: 'B4247D13',
    maxIndexedOrSolutionsReached: false,
    maxIndexedAndSolutionsReached: false,
    maxScansToExplodeReached: false,
    winningPlan: {
      stage: 'SORT',
      sortPattern: { scheduledDate: -1 },
      memLimit: 104857600,
      type: 'simple',
      inputStage: {
        stage: 'FETCH',
        inputStage: {
          stage: 'IXSCAN',
          keyPattern: { pgId: 1 },
          indexName: 'pgId_1',
          isMultiKey: false,
          multiKeyPaths: { pgId: [] },
          isUnique: false,
          isSparse: false,
          isPartial: false,
          indexVersion: 2,
          direction: 'forward',
          indexBounds: {
            pgId: [
              "[ObjectId('65110ce0d7c21aaca1d1db33'), ObjectId('65110ce0d7c21aaca1d1db33')]"
            ]
          }
        }
      }
    },
    rejectedPlans: []
  },
  executionStats: {
    executionSuccess: true,
    nReturned: 8910,
    executionTimeMillis: 46,
    totalKeysExamined: 8910,
    totalDocsExamined: 8910,
    executionStages: {
      stage: 'SORT',
      nReturned: 8910,
      executionTimeMillisEstimate: 9,
      works: 17822,
      advanced: 8910,
      needTime: 8911,
      needYield: 0,
      saveState: 17,
      restoreState: 17,
      isEOF: 1,
      sortPattern: { scheduledDate: -1 },
      memLimit: 104857600,
      type: 'simple',
      totalDataSizeSorted: 4535264,
      usedDisk: false,
      spills: 0,
      inputStage: {
        stage: 'FETCH',
        nReturned: 8910,
        executionTimeMillisEstimate: 9,
        works: 8911,
        advanced: 8910,
        needTime: 0,
        needYield: 0,
        saveState: 17,
        restoreState: 17,
        isEOF: 1,
        docsExamined: 8910,
        alreadyHasObj: 0,
        inputStage: {
          stage: 'IXSCAN',
          nReturned: 8910,
          executionTimeMillisEstimate: 0,
          works: 8911,
          advanced: 8910,
          needTime: 0,
          needYield: 0,
          saveState: 17,
          restoreState: 17,
          isEOF: 1,
          keyPattern: { pgId: 1 },
          indexName: 'pgId_1',
          isMultiKey: false,
          multiKeyPaths: { pgId: [] },
          isUnique: false,
          isSparse: false,
          isPartial: false,
          indexVersion: 2,
          direction: 'forward',
          indexBounds: {
            pgId: [
              "[ObjectId('65110ce0d7c21aaca1d1db33'), ObjectId('65110ce0d7c21aaca1d1db33')]"
            ]
          },
          keysExamined: 8910,
          seeks: 1,
          dupsTested: 0,
          dupsDropped: 0
        }
      }
    }
  }
}

b. No in-memory SORT, by sorting on an indexed field, in this case scheduledDate (for this I created an index on scheduledDate)

find( { pgId: ObjectId('65110ce0d7c21aaca1d1db33') } ).sort({ scheduledDate : -1 }).explain("executionStats")

Here is the output

db.bookingOrders.find( { pgId: ObjectId('65110ce0d7c21aaca1d1db33') } ).sort({ scheduledDate : -1 }).explain("executionStats")

{
  explainVersion: '1',
  queryPlanner: {
    namespace: 'foody.bookingOrders',
    indexFilterSet: false,
    parsedQuery: { pgId: { '$eq': ObjectId("65110ce0d7c21aaca1d1db33") } },
    queryHash: '0D151AAD',
    planCacheKey: 'B4247D13',
    maxIndexedOrSolutionsReached: false,
    maxIndexedAndSolutionsReached: false,
    maxScansToExplodeReached: false,
    winningPlan: {
      stage: 'FETCH',
      filter: { pgId: { '$eq': ObjectId("65110ce0d7c21aaca1d1db33") } },
      inputStage: {
        stage: 'IXSCAN',
        keyPattern: { scheduledDate: -1 },
        indexName: 'scheduledDate_-1',
        isMultiKey: false,
        multiKeyPaths: { scheduledDate: [] },
        isUnique: false,
        isSparse: false,
        isPartial: false,
        indexVersion: 2,
        direction: 'forward',
        indexBounds: { scheduledDate: [ '[MaxKey, MinKey]' ] }
      }
    },
    rejectedPlans: [
      {
        stage: 'SORT',
        sortPattern: { scheduledDate: -1 },
        memLimit: 104857600,
        type: 'simple',
        inputStage: {
          stage: 'FETCH',
          inputStage: {
            stage: 'IXSCAN',
            keyPattern: { pgId: 1 },
            indexName: 'pgId_1',
            isMultiKey: false,
            multiKeyPaths: { pgId: [] },
            isUnique: false,
            isSparse: false,
            isPartial: false,
            indexVersion: 2,
            direction: 'forward',
            indexBounds: {
              pgId: [
                "[ObjectId('65110ce0d7c21aaca1d1db33'), ObjectId('65110ce0d7c21aaca1d1db33')]"
              ]
            }
          }
        }
      }
    ]
  },
  executionStats: {
    executionSuccess: true,
    nReturned: 8910,
    executionTimeMillis: 418,
    totalKeysExamined: 80000,
    totalDocsExamined: 80000,
    executionStages: {
      stage: 'FETCH',
      filter: { pgId: { '$eq': ObjectId("65110ce0d7c21aaca1d1db33") } },
      nReturned: 8910,
      executionTimeMillisEstimate: 135,
      works: 80001,
      advanced: 8910,
      needTime: 71090,
      needYield: 0,
      saveState: 80,
      restoreState: 80,
      isEOF: 1,
      docsExamined: 80000,
      alreadyHasObj: 0,
      inputStage: {
        stage: 'IXSCAN',
        nReturned: 80000,
        executionTimeMillisEstimate: 55,
        works: 80001,
        advanced: 80000,
        needTime: 0,
        needYield: 0,
        saveState: 80,
        restoreState: 80,
        isEOF: 1,
        keyPattern: { scheduledDate: -1 },
        indexName: 'scheduledDate_-1',
        isMultiKey: false,
        multiKeyPaths: { scheduledDate: [] },
        isUnique: false,
        isSparse: false,
        isPartial: false,
        indexVersion: 2,
        direction: 'forward',
        indexBounds: { scheduledDate: [ '[MaxKey, MinKey]' ] },
        keysExamined: 80000,
        seeks: 1,
        dupsTested: 0,
        dupsDropped: 0
      }
    }
  },
  command: {
    find: 'bookingOrders',
    filter: { pgId: ObjectId("65110ce0d7c21aaca1d1db33") },
    sort: { scheduledDate: -1 },
    '$db': 'foody'
  },
  serverInfo: {
    host: 'DESKTOP-O4GVMQ6',
    port: 27017,
    version: '6.0.8',
    gitVersion: '3d84c0dd4e5d99be0d69003652313e7eaf4cdd74'
  },
  serverParameters: {
    internalQueryFacetBufferSizeBytes: 104857600,
    internalQueryFacetMaxOutputDocSizeBytes: 104857600,
    internalLookupStageIntermediateDocumentMaxSizeBytes: 104857600,
    internalDocumentSourceGroupMaxMemoryBytes: 104857600,
    internalQueryMaxBlockingSortMemoryUsageBytes: 104857600,
    internalQueryProhibitBlockingMergeOnMongoS: 0,
    internalQueryMaxAddToSetBytes: 104857600,
    internalDocumentSourceSetWindowFieldsMaxMemoryBytes: 104857600
  },
  ok: 1
}

Even though approach "a" took less time, I am afraid that as the data grows, a SORT done in RAM is not a great idea.

Can someone explain why approach "b" takes more time? Can we not sort only the find result, i.e. sort only the subset of documents matching pgId?

Please suggest.

Hello @Manjunath_k_s,

You can use a compound index.

Also refer to the ESR (Equality, Sort, Range) rule for better performance.
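For example, following the ESR rule the equality field (pgId) goes before the sort field (scheduledDate). A sketch in mongosh, using the collection and field names from the schema above:

```javascript
// Equality field first, then the sort field (descending, matching the query's sort).
db.bookingOrders.createIndex({ pgId: 1, scheduledDate: -1 })

// The winning plan should now be IXSCAN -> FETCH with no blocking SORT stage:
db.bookingOrders.find({ pgId: ObjectId("65110ce0d7c21aaca1d1db33") })
  .sort({ scheduledDate: -1 })
  .explain("executionStats")
```

With this index, the index scan both seeks by pgId and returns keys already in scheduledDate order, so neither of the two problems above (in-memory SORT, or scanning all scheduledDate keys) applies.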

Hi Vishal,

Thanks for suggesting a compound index. I was able to use it.

db.bookingOrders.createIndex({ pgId : 1, scheduledDate : -1 })

However, if we double the data set size and run the same query, it takes more time.

It seems "totalKeysExamined" and "totalDocsExamined" are directly proportional to the time consumed.
Even though the query is indexed, why does it take so much time? Can we optimize this? Please help.

db.bookingOrders.find( { pgId: ObjectId('65110ce0d7c21aaca1d1db33') } ).explain("executionStats")

ID    Time in ms    dataSetSize
1     ~21 ms        80K
2     ~110 ms       190K

Here is executionStats

{
  "explainVersion": "1",
  "queryPlanner": {
    "namespace": "foody.bookingOrders",
    "indexFilterSet": false,
    "parsedQuery": {
      "pgId": {
        "$eq": "65110ce0efa22fd55988e542"
      }
    },
    "queryHash": "767206C5",
    "planCacheKey": "7E0A1DA7",
    "maxIndexedOrSolutionsReached": false,
    "maxIndexedAndSolutionsReached": false,
    "maxScansToExplodeReached": false,
    "winningPlan": {
      "stage": "FETCH",
      "inputStage": {
        "stage": "IXSCAN",
        "keyPattern": {
          "pgId": 1,
          "scheduledDate": -1
        },
        "indexName": "pgId_1_scheduledDate_-1",
        "isMultiKey": false,
        "multiKeyPaths": {
          "pgId": [],
          "scheduledDate": []
        },
        "isUnique": false,
        "isSparse": false,
        "isPartial": false,
        "indexVersion": 2,
        "direction": "forward",
        "indexBounds": {
          "pgId": [
            "[ObjectId('65110ce0efa22fd55988e542'), ObjectId('65110ce0efa22fd55988e542')]"
          ],
          "scheduledDate": ["[MaxKey, MinKey]"]
        }
      }
    },
    "rejectedPlans": []
  },
  "executionStats": {
    "executionSuccess": true,
    "nReturned": 21239,
    "executionTimeMillis": 115,
    "totalKeysExamined": 21239,
    "totalDocsExamined": 21239,
    "executionStages": {
      "stage": "FETCH",
      "nReturned": 21239,
      "executionTimeMillisEstimate": 34,
      "works": 21240,
      "advanced": 21239,
      "needTime": 0,
      "needYield": 0,
      "saveState": 21,
      "restoreState": 21,
      "isEOF": 1,
      "docsExamined": 21239,
      "alreadyHasObj": 0,
      "inputStage": {
        "stage": "IXSCAN",
        "nReturned": 21239,
        "executionTimeMillisEstimate": 7,
        "works": 21240,
        "advanced": 21239,
        "needTime": 0,
        "needYield": 0,
        "saveState": 21,
        "restoreState": 21,
        "isEOF": 1,
        "keyPattern": {
          "pgId": 1,
          "scheduledDate": -1
        },
        "indexName": "pgId_1_scheduledDate_-1",
        "isMultiKey": false,
        "multiKeyPaths": {
          "pgId": [],
          "scheduledDate": []
        },
        "isUnique": false,
        "isSparse": false,
        "isPartial": false,
        "indexVersion": 2,
        "direction": "forward",
        "indexBounds": {
          "pgId": [
            "[ObjectId('65110ce0efa22fd55988e542'), ObjectId('65110ce0efa22fd55988e542')]"
          ],
          "scheduledDate": ["[MaxKey, MinKey]"]
        },
        "keysExamined": 21239,
        "seeks": 1,
        "dupsTested": 0,
        "dupsDropped": 0
      }
    },
    "allPlansExecution": []
  }
}

Hello @Manjunath_k_s,

I don't see a major difference in performance. Execution speed does not depend on indexing alone; you also need to ensure that your hardware resources (CPU, RAM, storage) are sufficient for your dataset size.

Hi Vishal,

I see your point. Just out of curiosity, how can you say there is no major difference? 21 ms vs 110 ms?

Unless you have a covered query, it will still need to read the documents to return the data, as opposed to serving results from the in-memory index alone.
From the looks of things, you have fields in the document that are not in the index, hence the need to fetch documents rather than just reading the index.
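As a sketch of that idea (in mongosh, and assuming the caller only needs the indexed fields), projecting away _id and every non-indexed field turns this into a covered query, served entirely from the index:

```javascript
// With the compound index { pgId: 1, scheduledDate: -1 } in place, this
// projection returns only indexed fields, so the FETCH stage disappears
// and explain() reports totalDocsExamined: 0.
db.bookingOrders.find(
  { pgId: ObjectId("65110ce0d7c21aaca1d1db33") },
  { _id: 0, pgId: 1, scheduledDate: 1 }
).sort({ scheduledDate: -1 })
```

Note that _id must be excluded explicitly; if it is returned but not in the index, the query is no longer covered.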


Hi @Manjunath_k_s,

We can't measure execution time by document counts alone. Second, possibly your first query result came from the cache while the second did not; the time will fluctuate as other queries fit into or overwrite memory and cache, depending on your hardware configuration. Third, possibly the second batch of documents is larger in bytes than the first 80k documents.

You can't predict a difference of milliseconds from the number of documents, but if it starts taking seconds then you should worry about what is going on.
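One practical way to bound the work regardless of collection size, if the application only needs the most recent bookings (an assumption, sketched in mongosh): combine the compound index with limit(), so the scan stops after the first n index keys instead of examining every match:

```javascript
// With { pgId: 1, scheduledDate: -1 } indexed, the index already yields
// documents in the requested order, so limit(20) lets the scan stop after
// 20 keys; keysExamined/docsExamined stay at ~20 no matter how the data grows.
db.bookingOrders.find({ pgId: ObjectId("65110ce0d7c21aaca1d1db33") })
  .sort({ scheduledDate: -1 })
  .limit(20)
```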

Hi Vishal,

Thanks. Since I am on the same topic and schema, I have another question.

I have a compound index like this:

db.<col>.createIndex({ "pgId" : 1, "email" : 1, "bookingStatus" : 1, "bookingType" : 1, "scheduledDate" : -1})

My question is:

Query 1 - find({ email : "Frida@gmail.com" })

Query 2 - find({ pgId: ObjectId('65110ce00f00ba098b296d8e'), email : "Aaron@gmail.com" })

Query 2 (pgId and email together) is executed with an "IXSCAN" stage. However, the plan for Query 1 (email alone) shows a "COLLSCAN".
I expected an "IXSCAN" for Query 1 as well. Why does MongoDB fall back to a collection scan when "email" is part of the compound index?

Hello @Manjunath_k_s,

I think you missed reading the compound index documentation properly; refer to the example about prefixes provided in the documentation.

The order of the fields in a compound index matters: to use the index, your query must include the first field, pgId.
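In other words, only the left-most prefixes of { pgId, email, bookingStatus, bookingType, scheduledDate } are supported by that index; a query on email alone needs its own index. A sketch in mongosh:

```javascript
// Supported by the compound index (match the { pgId } and { pgId, email } prefixes):
db.bookingOrders.find({ pgId: ObjectId("65110ce00f00ba098b296d8e") })
db.bookingOrders.find({ pgId: ObjectId("65110ce00f00ba098b296d8e"), email: "Aaron@gmail.com" })

// Not supported -- email is not a left-most prefix, so this falls back to
// COLLSCAN unless a separate index exists:
db.bookingOrders.find({ email: "Frida@gmail.com" })

// If querying by email alone is common, create a dedicated index for it:
db.bookingOrders.createIndex({ email: 1 })
```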
