Query Planner not using DISTINCT_SCAN on time-series collection

Sandro_Mosca · April 28, 2022, 6:54am

Hello everyone,

I’m running into a problem when calling .distinct() on a time-series collection: the query planner does not use the index that exists on the key.

When using a normal collection, the planner is able to find the DISTINCT_SCAN plan:

test> db.test_index_skipping.getIndexKeys()
[ { _id: 1 }, { attributes: 1 } ]
test> db.test_index_skipping.explain().distinct('attributes')
{
  explainVersion: '1',
  queryPlanner: {
    namespace: 'test.test_index_skipping',
    indexFilterSet: false,
    parsedQuery: {},
    queryHash: '2211C6E0',
    planCacheKey: 'E10B2080',
    maxIndexedOrSolutionsReached: false,
    maxIndexedAndSolutionsReached: false,
    maxScansToExplodeReached: false,
    winningPlan: {
      stage: 'PROJECTION_COVERED',
      transformBy: {},
      inputStage: {
        stage: 'DISTINCT_SCAN',
        keyPattern: { attributes: 1 },
        indexName: 'attributes_1',
...

But this doesn’t work on a time-series collection, where the planner seems to fall back to COLLSCAN:

test> db.createCollection('timeseries_index_skipping', { timeseries: { timeField: "timestamp", metaField: "metadata" } })
{ ok: 1 }
... insert some stuff ...
test> db.timeseries_index_skipping.createIndex({ 'metadata.attributes': 1 })
test> db.timeseries_index_skipping.explain().distinct('metadata.attributes')
{
  explainVersion: '1',
  stages: [
    {
      '$cursor': {
        queryPlanner: {
          namespace: 'test.system.buckets.timeseries_index_skipping',
          indexFilterSet: false,
          parsedQuery: {},
          queryHash: '8B3D4AB8',
          planCacheKey: 'D542626C',
          maxIndexedOrSolutionsReached: false,
          maxIndexedAndSolutionsReached: false,
          maxScansToExplodeReached: false,
          winningPlan: { stage: 'COLLSCAN', direction: 'forward' },
          rejectedPlans: []
        }
      }
    },
    {
      '$_internalUnpackBucket': {
        include: [ 'metadata' ],
        timeField: 'timestamp',
        metaField: 'metadata',
        bucketMaxSpanSeconds: 3600
      }
    },
...
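
The two explain outputs above have different shapes: the regular collection reports queryPlanner at the top level, while the time-series collection nests it under stages[].$cursor. To compare them without reading the output by eye, here is a minimal sketch of a helper that lists the winning-plan stages for either shape. The names winningPlanStages and usesDistinctScan are hypothetical (they are not part of mongosh), and the sketch assumes each plan node has at most a single inputStage, so it does not descend into plans that use an inputStages array.

```javascript
// Hypothetical helper, not a mongosh built-in: collect the stage names of the
// winning plan(s) from an explain() result, for either shape shown above.
function winningPlanStages(explainOutput) {
  const planners = [];

  // Shape 1: regular collection, queryPlanner at the top level.
  if (explainOutput.queryPlanner) {
    planners.push(explainOutput.queryPlanner);
  }

  // Shape 2: pipeline-style output, queryPlanner nested under stages[i].$cursor.
  for (const stage of explainOutput.stages || []) {
    if (stage.$cursor && stage.$cursor.queryPlanner) {
      planners.push(stage.$cursor.queryPlanner);
    }
  }

  const names = [];
  for (const planner of planners) {
    // Follow the inputStage chain from the plan root down to the leaf.
    // Assumption: single inputStage per node (inputStages arrays are ignored).
    let node = planner.winningPlan;
    while (node) {
      names.push(node.stage);
      node = node.inputStage;
    }
  }
  return names;
}

// True if any stage of the winning plan is a DISTINCT_SCAN.
function usesDistinctScan(explainOutput) {
  return winningPlanStages(explainOutput).includes('DISTINCT_SCAN');
}
```

Pasted into mongosh, it could be called as usesDistinctScan(db.test_index_skipping.explain().distinct('attributes')) and usesDistinctScan(db.timeseries_index_skipping.explain().distinct('metadata.attributes')). Going by the output shown above, I would expect the first call to report the DISTINCT_SCAN and the second to report only the COLLSCAN.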

I searched JIRA for a matching bug but couldn’t find anything. Am I doing something wrong?

Thanks!