Change Stream Java driver COLLSCAN

I’m watching 3 collections in a DB using the Java driver. Each of those collections has one document only, each of which have embedded documents. My client code looks like this:

ChangeStreamIterable<Document> changes =
          client.getDatabase(<DBNAME>)
              .watch(Collections.singletonList(
                      Aggregates.match(Filters.in("ns.coll", Arrays.asList(WATCHED_COLLECTIONS)))))
          .fullDocument(FullDocument.UPDATE_LOOKUP);

where the variable “WATCHED_COLLECTIONS” is an array of the 3 collection names that I want to watch.

Since I’ve used the “match” stage, this filtering should be happening at the server side right?
Despite that, in the mongo logs, I can see that ‘docsExamined’ is very high!

Why would that be happening, since there’s only one document in each collection? Even if we count all the embedded documents it doesn’t come up to 11000 documents.

Excerpt from mongo log below:

COMMAND [conn20161] command DBNAME.$cmd command: getMore { getMore: 1760441711222280319, collection: “$cmd.aggregate”, $db: “DBNAME”, $clusterTime: { clusterTime: Timestamp(1590477125, 7396), signature: { hash: BinData(0, 17B8B1B3ADE3FEFC381F56E9201694DC9509BC38), keyId: 6829683829607759874 } }, lsid: { id: UUID(“f88e3593-bec6-47cc-a067-6042f36aa1a3”) } } originatingCommand: { aggregate: 1, pipeline: [ { $changeStream: { fullDocument: “updateLookup” } }, { $match: { ns.coll: { $in: [ “COLLECTION1”, “COLLECTION2”, “COLLECTION3” ] } } } ], cursor: {}, $db: “DBNAME”, $clusterTime: { clusterTime: Timestamp(1590160602, 2), signature: { hash: BinData(0, 39A22239ED8BA07ED1E8B710D4212AE8CDB52663), keyId: 6829683829607759874 } }, lsid: { id: UUID(“f88e3593-bec6-47cc-a067-6042f36aa1a3”) } } planSummary: COLLSCAN cursorid:1760441711222280319 keysExamined:0 docsExamined:11890 numYields:7138 nreturned:0 reslen:305 locks:{ ReplicationStateTransition: { acquireCount: { w: 7141 } }, Global: { acquireCount: { r: 7141 } }, Database: { acquireCount: { r: 7141 } }, Mutex: { acquireCount: { r: 3 } }, oplog: { acquireCount: { r: 7141 } } } storage:{ data: { bytesRead: 14 } } protocol:op_msg 351ms

Trying to bounce this topic up in case anybody has any insights…

@Murali_Rao sorry for the late reply, but to help clarify this behavior Change Streams are special cursors that are opened against the oplog.

As such, when the cursor is being advanced it is scanning the documents in the oplog for matches; not the collections themselves.

1 Like

HI ,

I am using .net drivers. I have few questions on the same topic.

  1. Is there a way i can open a change stream cursor on a collection which is advanced to tail of oploag in the first read. if yes how?

because in my case i am just watching static/master data collection to refresh cache. i dont want it to scan all the oplog when opened.

thanks.