[Suggestion] Optimizing Query Plan Selection for Queries with Large Skip Values in MongoDB

Problem Statement

While implementing pagination using skip in MongoDB, we encountered significant performance issues. When accessing the last page, which required skipping approximately 600,000 documents, we observed the following problems:

  • API failures due to Gateway Timeout (taking 10+ minutes)
  • Excessive time to return just 5 documents
  • Severe load on Atlas server:
    • CPU IOWait spike: over 200% of Max System CPU
    • Memory Virtual metrics: Sustained peak levels
    • Overall query performance degradation

Analysis

Upon examining the execution plan of the problematic query:

// Problematic query structure
db.collection.find({
“type”: “MAJOR”,
“date”: { “$gt”: ISODate(“2024-11-18”) }
})
.sort({ “distance”: 1 })
.skip(600000)
.limit(5)

Key performance metrics:

  • Total execution time: 1,110,238ms (approximately 18.5 minutes)
  • Actual data retrieval time: 3,351ms (approximately 3.3 seconds)
  • Number of rejected plans: 5
  • Data read: 23.8GB
  • Documents returned: 5

Root Cause Analysis

Investigation of MongoDB Query Planner behavior revealed:

  1. Query Planner executes multiple plans to select the optimal execution plan (Winning Plan)
  2. The skip operation is repeated during each plan execution, causing severe performance degradation
  3. Skipping large amounts of data becomes a major bottleneck in plan selection

Solution

Modified the query to explicitly provide a hint, forcing the Query Planner to use a specific index immediately:

db.collection.find({
“type”: “MAJOR”,
“date”: { “$gt”: ISODate(“2024-11-18”) }
})
.sort({ “distance”: 1 })
.skip(600000)
.limit(5)
.hint({ “type”: 1, “date”: 1, “distance”: 1 }) // Specify appropriate index

Discussion Points

Potential optimizations for MongoDB Query Planner:

  1. When skip value exceeds a certain threshold (e.g., 100,000)
  2. Replace with SKIP_MAX value (e.g., 1,000) during execution plan evaluation
  3. This could significantly improve the efficiency of plan selection process

We believe such modifications to the Query Planner’s behavior could substantially improve performance for queries involving large skip operations. We welcome insights and experiences from the MongoDB community regarding this approach.