Very slow performance for comparison operator: gt

Hello all, seems like very basic requirement but still not able to figure it out.

So I have 10Million records in a collection who’s structure is somewhat like below:

{
"timestamp": {
"$date": "2024-06-09T23:58:18.846Z"
},
"old_value": {
"bidAmount": 0.18
},
"new_value": {
"bidAmount": 0.18
},
"event_master_name": "GAME_BID_UPDATE"

i have created index on bidAmount as below:
db.logs.createIndex({ “new_value.bidAmount”: 1 });
And trying to find docs using below queries which takes minutes to run, am i missing anything?
Both query run in minutes.

1) db.logs.find({ "new_value.bidAmount": { "$gt": 20 } }).sort({ "timestamp": -1 }).limit(100)

2) db.logs.find({ "new_value.bidAmount": { "$gt": 20 } }).hint({ "new_value.bidAmount": 1 }).sort({ "timestamp": -1 }).limit(100)

attaching both plans below for reference:
Plan 1:

{
  explainVersion: '1',
  stages: [
    {
      '$cursor': {
        queryPlanner: {
          namespace: 'temp.system.buckets.logs',
          indexFilterSet: false,
          parsedQuery: {
            'control.max.new_value.bidAmount': { '$_internalExprGt': 2 }
          },
          queryHash: 'CDA4A9BD',
          planCacheKey: '3716670F',
          maxIndexedOrSolutionsReached: false,
          maxIndexedAndSolutionsReached: false,
          maxScansToExplodeReached: false,
          winningPlan: {
            stage: 'COLLSCAN',
            filter: {
              'control.max.new_value.bidAmount': { '$_internalExprGt': 2 }
            },
            direction: 'backward'
          },
          rejectedPlans: []
        }
      }
    },
    {
      '$match': {
        '$expr': {
          '$lte': [
            {
              '$subtract': [ '$control.max.timestamp', '$control.min.timestamp' ]
            },
            { '$const': Long('3600000') }
          ]
        }
      }
    },
    {
      '$_internalUnpackBucket': {
        exclude: [],
        timeField: 'timestamp',
        bucketMaxSpanSeconds: 3600,
        assumeNoMixedSchemaData: true,
        includeMinTimeAsMetadata: true,
        eventFilter: { 'new_value.bidAmount': { '$gt': 2 } }
      }
    },
    {
      '$_internalBoundedSort': {
        sortKey: { timestamp: -1 },
        bound: { base: 'min', offsetSeconds: Long('3600') },
        limit: Long('100')
      }
    }
  ],
  serverInfo: {
    host: '9ddfbf90235d',
    port: 27017,
    version: '7.0.11',
    gitVersion: 'f451220f0df2b9dfe073f1521837f8ec5c208a8c'
  },
  serverParameters: {
    internalQueryFacetBufferSizeBytes: 104857600,
    internalQueryFacetMaxOutputDocSizeBytes: 104857600,
    internalLookupStageIntermediateDocumentMaxSizeBytes: 104857600,
    internalDocumentSourceGroupMaxMemoryBytes: 104857600,
    internalQueryMaxBlockingSortMemoryUsageBytes: 104857600,
    internalQueryProhibitBlockingMergeOnMongoS: 0,
    internalQueryMaxAddToSetBytes: 104857600,
    internalDocumentSourceSetWindowFieldsMaxMemoryBytes: 104857600,
    internalQueryFrameworkControl: 'trySbeRestricted'
  },
  command: {
    aggregate: 'system.buckets.logs',
    pipeline: [
      {
        '$_internalUnpackBucket': {
          timeField: 'timestamp',
          bucketMaxSpanSeconds: 3600,
          assumeNoMixedSchemaData: true,
          usesExtendedRange: false
        }
      },
      { '$match': { 'new_value.bidAmount': { '$gt': 2 } } },
      { '$sort': { timestamp: -1 } },
      { '$limit': Long('100') }
    ],
    cursor: {},
    collation: { locale: 'simple' }
  },
  ok: 1
}

Plan2:

{
  explainVersion: '1',
  stages: [
    {
      '$cursor': {
        queryPlanner: {
          namespace: 'temp.system.buckets.logs',
          indexFilterSet: false,
          parsedQuery: {
            'control.max.new_value.bidAmount': { '$_internalExprGt': 2 }
          },
          queryHash: 'CDA4A9BD',
          planCacheKey: '3716670F',
          maxIndexedOrSolutionsReached: false,
          maxIndexedAndSolutionsReached: false,
          maxScansToExplodeReached: false,
          winningPlan: {
            stage: 'FETCH',
            filter: {
              'control.max.new_value.bidAmount': { '$_internalExprGt': 2 }
            },
            inputStage: {
              stage: 'IXSCAN',
              keyPattern: {
                'control.min.new_value.bidAmount': 1,
                'control.max.new_value.bidAmount': 1
              },
              indexName: 'new_value.bidAmount_1',
              isMultiKey: false,
              multiKeyPaths: {
                'control.min.new_value.bidAmount': [],
                'control.max.new_value.bidAmount': []
              },
              isUnique: false,
              isSparse: false,
              isPartial: false,
              indexVersion: 2,
              direction: 'forward',
              indexBounds: {
                'control.min.new_value.bidAmount': [ '[MinKey, MaxKey]' ],
                'control.max.new_value.bidAmount': [ '[MinKey, MaxKey]' ]
              }
            }
          },
          rejectedPlans: []
        }
      }
    },
    {
      '$_internalUnpackBucket': {
        exclude: [],
        timeField: 'timestamp',
        bucketMaxSpanSeconds: 3600,
        assumeNoMixedSchemaData: true,
        eventFilter: { 'new_value.bidAmount': { '$gt': 2 } }
      }
    },
    { '$sort': { sortKey: { timestamp: -1 }, limit: Long('100') } }
  ],
  serverInfo: {
    host: '9ddfbf90235d',
    port: 27017,
    version: '7.0.11',
    gitVersion: 'f451220f0df2b9dfe073f1521837f8ec5c208a8c'
  },
  serverParameters: {
    internalQueryFacetBufferSizeBytes: 104857600,
    internalQueryFacetMaxOutputDocSizeBytes: 104857600,
    internalLookupStageIntermediateDocumentMaxSizeBytes: 104857600,
    internalDocumentSourceGroupMaxMemoryBytes: 104857600,
    internalQueryMaxBlockingSortMemoryUsageBytes: 104857600,
    internalQueryProhibitBlockingMergeOnMongoS: 0,
    internalQueryMaxAddToSetBytes: 104857600,
    internalDocumentSourceSetWindowFieldsMaxMemoryBytes: 104857600,
    internalQueryFrameworkControl: 'trySbeRestricted'
  },
  command: {
    aggregate: 'system.buckets.logs',
    pipeline: [
      {
        '$_internalUnpackBucket': {
          timeField: 'timestamp',
          bucketMaxSpanSeconds: 3600,
          assumeNoMixedSchemaData: true,
          usesExtendedRange: false
        }
      },
      { '$match': { 'new_value.bidAmount': { '$gt': 2 } } },
      { '$sort': { timestamp: -1 } },
      { '$limit': Long('100') }
    ],
    cursor: {},
    collation: { locale: 'simple' },
    hint: {
      'control.min.new_value.bidAmount': 1,
      'control.max.new_value.bidAmount': 1
    }
  },
  ok: 1
}

Hi Aditya

Try adding the timestamp to your index:

  • BidAmount: 1
  • Timestamp: -1

Hope that helps

Craig.

1 Like

Hello @aditya_rai1.

Great point by @Craig_Crevola, the compound index will improve the performance of the query.

Make sure you understand the Indexing strategy The ESR (Equality, Sort, Range) Rule!

The order of fields is important in a compound index.

Per the query, the filter operator $gt is considered a Range filter. So if you create a compound index in the below order it’s more satisfied.

createIndex({ "timestamp": -1, "new_value.bidAmount": 1 })
3 Likes

Thanks for the reply, Craig and Vishal. I actually did an analysis and found that there were strings in the bidAmount fields as well. After cleaning up, the response time improved significantly. However, the complex query below was still taking a whole lot of time, so following ESR (Entity-Storage-Relationship) principles, I made the index as:

  timestamp: -1,
  'new_value.bidAmount': 1,
  'old_value.bidAmount': 1

which really worked magic on the below complex query:

[
  { "$match": {
      "event_master_name": "GAME_BID_UPDATE", 
      "timestamp": { "$lt": new Date("2024-06-10T00:00:00Z"), "$gt": new Date("2024-04-01T00:00:00Z") }, 
      "$or": [ 
        { "$and": [ { "old_value.bidAmount": { "$gt": 0.1 } }, { "new_value.bidAmount": { "$lt": 2.1 } } ] }, 
        { "$and": [ { "old_value.bidAmount": { "$gt": 3.1 } }, { "new_value.bidAmount": { "$lt": 5 } } ] } 
      ] 
    } 
  },
  { "$sort": { "timestamp": -1 } },
  { "$limit": 50 },
  { "$addFields": { "newBidAmount": "$new_value.bidAmount", "oldBidAmount": "$old_value.bidAmount" } }
]

But one thing I’m not quite sure of is the order of stages in the pipeline, as MongoDB documentation says:

MongoDB cannot perform an index sort on the results of a range filter. 
Place the range filter after the sort predicate so MongoDB can use a non-blocking index sort.

My data is a timeseries one so it makes more sense for me to apply range filter on time key considering i will have years of data and users mostly will query on few last months.

So should i make changes to the pipeline and do range filtering on time and bidvalues after sort?

Hello @aditya_rai1,

It’s Equality, Sort, Range!


Might be about a single range filter, but you have equality and also a bounded range filter, which means a lower limit and an upper limit both so it’s not related to your case.

You can check performance in explain by changing the filters as you have confusion then try both queries and see what you get.

I think you can also add one more field event_master_name for equality match in your index for better performance.

Yeah my bad i asked gpt to format and it did something extra :slight_smile:

For now i’m dont see much difference.

And yeah i included that also it works really fast.

Thanks for all the help.

1 Like