Change streams: Scanned Objects / Returned has gone above 1000

Hi!

I see a similar question that was unanswered:
https://www.mongodb.com/community/forums/t/change-stream-causes-rise-in-the-scanned-objects-returned-metric/146414

I am receiving a “Scanned Objects / Returned has gone above 1000” alert that seems to be originating from change streams. I did not use to receive these alerts; they only started happening recently.

According to the change streams documentation on indexing and performance:

Change streams cannot use indexes. MongoDB does not support creating indexes on the oplog collection.

So I would like to ask what is causing this and what action I could take to avoid it.

Thanks!


MongoDB version: 6.0.23
Tier: M30 (with scheduled scaling down and up)

Query profiler log (obfuscated names):

{
  "type": "command",
  "ns": "<collection>",
  "appName": "<app>",
  "command": {
    "aggregate": "<collection>",
    "pipeline": [
      {
        "$changeStream": {
          "fullDocument": "updateLookup",
          "fullDocumentBeforeChange": "whenAvailable",
          "resumeAfter": {
            "_data": "826833B4E2000000022B022C0100296E5A1004B65DC44F3CD8445A91EEFD2662C1E11046645F69640064660215A6092125B5ED26A57F0004"
          }
        }
      },
      {
        "$match": {
          "operationType": {
            "$in": [
              "insert",
              "update",
              "replace",
              "delete"
            ]
          }
        }
      }
    ],
    "cursor": {},
    "lsid": {
      "id": {
        "$binary": {
          "base64": "jY0x3pRrR727pqn5GqciQw==",
          "subType": "04"
        }
      }
    },
    "$clusterTime": {
      "clusterTime": {
        "$timestamp": {
          "t": 1748243391,
          "i": 45
        }
      },
      "signature": {
        "hash": {
          "$binary": {
            "base64": "szNARDzm5gueuBpaX0wU5DEfbkQ=",
            "subType": "00"
          }
        },
        "keyId": 7466478117821874177
      }
    },
    "$db": "<db>"
  },
  "planSummary": "COLLSCAN",
  "planningTimeMicros": 1014,
  "cursorid": 6114367631589341584,
  "keysExamined": 0,
  "docsExamined": 834661,
  "numYields": 1170,
  "nreturned": 0,
  "queryHash": "40BB329B",
  "queryFramework": "classic",
  "reslen": 297,
  "locks": {
    "ParallelBatchWriterMode": {
      "acquireCount": {
        "r": 1
      }
    },
    "FeatureCompatibilityVersion": {
      "acquireCount": {
        "r": 1177
      }
    },
    "ReplicationStateTransition": {
      "acquireCount": {
        "w": 2
      }
    },
    "Global": {
      "acquireCount": {
        "r": 1177
      }
    },
    "Database": {
      "acquireCount": {
        "r": 2
      }
    },
    "Collection": {
      "acquireCount": {
        "r": 2
      }
    },
    "Mutex": {
      "acquireCount": {
        "r": 7
      }
    }
  },
  "readConcern": {
    "level": "majority"
  },
  "writeConcern": {
    "w": "majority",
    "wtimeout": 0,
    "provenance": "implicitDefault"
  },
  "storage": {
    "data": {
      "bytesRead": 210637668,
      "timeReadingMicros": 4666481
    }
  },
  "remote": "172.31.248.66:1247",
  "protocol": "op_msg",
  "durationMillis": 14378,
  "v": "6.0.23",
  "isTruncated": false
}

And here are the stats (metrics screenshot not included):

You got that alert because the operation scanned 834k documents and returned 0, which is way above the 1000 ratio threshold.

Looking at the profiler log, the change stream is using a resumeAfter token to resume from a specific point. Even so, MongoDB has to scan through many documents in the oplog from that resume point onward to find the changes that match your criteria.

Maybe using startAfter would be better. You obfuscated the names, so you’ll have to swap your real ones back in:

const changeStream = collection.watch([
  {
    $match: {
      operationType: { $in: ["insert", "update", "replace", "delete"] }
    }
  }
], {
  fullDocument: "updateLookup",
  fullDocumentBeforeChange: "whenAvailable",
  startAfter: resumeToken // the resume token you saved earlier
});

I don’t know your exact use case, though. You could also start from the current time instead of using an old resume token, or make the pipeline more specific to reduce scanning.

The main thing is that your resume token is too old, forcing MongoDB to scan through a large part of the oplog to catch up.

Hope that helps!

Hi!

Thanks for the context!
That seems to be spot on.

I basically follow the same pattern as the example in Resume a Change Stream: I listen for an error and then reconnect in order to resume the stream from the specific point in time at which the error occurred.
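
Roughly, simplified, it looks like this (handleChange stands in for my actual processing, and the error handling is trimmed down):

let lastToken;

function openStream(token) {
  const changeStream = collection.watch([
    {
      $match: {
        operationType: { $in: ["insert", "update", "replace", "delete"] }
      }
    }
  ], {
    fullDocument: "updateLookup",
    fullDocumentBeforeChange: "whenAvailable",
    ...(token ? { resumeAfter: token } : {})
  });

  changeStream.on("change", (change) => {
    lastToken = changeStream.resumeToken; // remember where we are
    handleChange(change);                 // my actual processing goes here
  });

  changeStream.on("error", () => {
    openStream(lastToken); // reconnect and resume from the error point
  });
}

openStream();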

The token is not old. I scale my MongoDB tier down and up on specific days and times, and this scanning-ratio alert coincides with the times the scaling occurs. So it seems that after the new MongoDB instance has spun up, it needs to scan all the documents again to get back to the resume point (reasonable).

You suggested: “Maybe using startAfter would be better.”

I assume that, compared to resumeAfter, it would not make much of a difference in terms of documents scanned. At least that’s what I gather from the limited documentation on it. Please let me know if otherwise, though.

You also said: “You could also start from the current time instead of using an old resume token, or make it more specific to reduce scanning.”

Could you expand a bit on this one? Are you suggesting something of this sort?

const changeStream = collection.watch([
  {
    $match: {
      'fullDocument.updatedAt': { $gt: lastStreamTimestamp },
      // Or, maybe: 'fullDocument._id': { $gt: lastStreamDocId },
      operationType: { $in: ["insert", "update", "replace", "delete"] }
    }
  }
], {
  fullDocument: "updateLookup",
  fullDocumentBeforeChange: "whenAvailable"
});

Thanks again!

Ah yes, I just meant that you don’t pass a startAfter (or resumeAfter) option at all:

const changeStream = collection.watch([
  {
    $match: {
      operationType: { $in: ["insert", "update", "replace", "delete"] }
    }
  }
], {
  fullDocument: "updateLookup",
  fullDocumentBeforeChange: "whenAvailable"
  // No resume token - starts from the current time
});

But yes, combining it with more $match fields would reduce the scan.

So the query you sent is on track with what I meant.

Given what you just said, startAfter should be better: startAfter excludes the token’s event and starts right after it. As for the scaling:

you’ll still get some scanning because the new MongoDB instance needs to rebuild its change stream cursor state.
the oplog might have been truncated during scaling.
MongoDB still needs to validate the resume token.

But startAfter should significantly reduce the scanning compared to resumeAfter, especially in your scaling scenario where you’re restarting frequently.
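
For the scaling window specifically, you could resume with startAfter and fall back to the current time if the saved token is no longer usable. Just a sketch (openStream and lastToken are placeholder names; I believe the “resume point no longer in the oplog” case surfaces as a ChangeStreamHistoryLost server error, but double-check the exact code/name for your driver version):

let lastToken; // most recent token seen on the stream

function openStream(token) {
  const changeStream = collection.watch([
    {
      $match: {
        operationType: { $in: ["insert", "update", "replace", "delete"] }
      }
    }
  ], {
    fullDocument: "updateLookup",
    fullDocumentBeforeChange: "whenAvailable",
    ...(token ? { startAfter: token } : {}) // no token -> start from "now"
  });

  changeStream.on("change", (change) => {
    lastToken = changeStream.resumeToken;
  });

  changeStream.on("error", (err) => {
    // If the resume point is gone (e.g. the oplog was truncated while the
    // cluster was scaling), drop the token and start from the current time
    // instead of scanning from an old point; otherwise resume where we left off.
    const historyLost = err.codeName === "ChangeStreamHistoryLost";
    openStream(historyLost ? undefined : lastToken);
  });
}

openStream(lastToken);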

I will try a couple of things and see how it goes!
Thank you!
