Can cursor reads miss matching documents if only already-read entries are deleted?

Can a cursor miss matching documents that are still to be fetched if entries it has already read are deleted?

This is not a definitive answer, because I cannot point you to a specification that confirms or refutes my findings.

Using the shell, I have the following collection:

[
  { _id: ObjectId("6248480148cd0baca1de83ee") },
  { _id: ObjectId("6248480548cd0baca1de83ef") },
  { _id: ObjectId("6248480648cd0baca1de83f0") },
  { _id: ObjectId("6248480748cd0baca1de83f1") },
  { _id: ObjectId("6248480748cd0baca1de83f2") },
  { _id: ObjectId("6248480848cd0baca1de83f3") },
  { _id: ObjectId("6248480948cd0baca1de83f4") },
  { _id: ObjectId("6248480a48cd0baca1de83f5") },
  { _id: ObjectId("6248480b48cd0baca1de83f6") },
  { _id: ObjectId("6248480c48cd0baca1de83f7") },
  { _id: ObjectId("6248480c48cd0baca1de83f8") },
  { _id: ObjectId("6248480d48cd0baca1de83f9") },
  { _id: ObjectId("6248480e48cd0baca1de83fa") },
  { _id: ObjectId("6248480f48cd0baca1de83fb") },
  { _id: ObjectId("6248480f48cd0baca1de83fc") },
  { _id: ObjectId("6248481048cd0baca1de83fd") },
  { _id: ObjectId("6248481148cd0baca1de83fe") },
  { _id: ObjectId("6248481248cd0baca1de83ff") },
  { _id: ObjectId("6248481248cd0baca1de8400") },
  { _id: ObjectId("6248481348cd0baca1de8401") },
  { _id: ObjectId("6248481448cd0baca1de8402") },
  { _id: ObjectId("6248481548cd0baca1de8404") },
  { _id: ObjectId("6248481648cd0baca1de8405") }
]

I then create a cursor to iterate over all documents.

Atlas rent-shard-0 [primary] test> cursor = c.find({})
[
  { _id: ObjectId("6248480148cd0baca1de83ee") },
  { _id: ObjectId("6248480548cd0baca1de83ef") },
  { _id: ObjectId("6248480648cd0baca1de83f0") },
  { _id: ObjectId("6248480748cd0baca1de83f1") },
  { _id: ObjectId("6248480748cd0baca1de83f2") },
  { _id: ObjectId("6248480848cd0baca1de83f3") },
  { _id: ObjectId("6248480948cd0baca1de83f4") },
  { _id: ObjectId("6248480a48cd0baca1de83f5") },
  { _id: ObjectId("6248480b48cd0baca1de83f6") },
  { _id: ObjectId("6248480c48cd0baca1de83f7") },
  { _id: ObjectId("6248480c48cd0baca1de83f8") },
  { _id: ObjectId("6248480d48cd0baca1de83f9") },
  { _id: ObjectId("6248480e48cd0baca1de83fa") },
  { _id: ObjectId("6248480f48cd0baca1de83fb") },
  { _id: ObjectId("6248480f48cd0baca1de83fc") },
  { _id: ObjectId("6248481048cd0baca1de83fd") },
  { _id: ObjectId("6248481148cd0baca1de83fe") },
  { _id: ObjectId("6248481248cd0baca1de83ff") },
  { _id: ObjectId("6248481248cd0baca1de8400") },
  { _id: ObjectId("6248481348cd0baca1de8401") }
]
Type "it" for more

I then delete one of the documents that did not appear in the first 20 documents displayed.

Atlas rent-shard-0 [primary] test> c.deleteOne( { _id: ObjectId("6248481548cd0baca1de8404") } )
{ acknowledged: true, deletedCount: 1 }

I then complete the iteration and see that the deleted document is still returned by my cursor.

Atlas rent-shard-0 [primary] test> it
[
  { _id: ObjectId("6248481448cd0baca1de8402") },
  { _id: ObjectId("6248481548cd0baca1de8404") },
  { _id: ObjectId("6248481648cd0baca1de8405") }
]

Thanks @steevej, but what I actually wanted to know about is the case where:

  1. the cursor fetches a matching document

  2. the document from step 1 is deleted

  3. the cursor fetches more documents. Is there a chance that, because of data movement caused by the deletion, it fails to fetch matching documents (not the deleted one) that it would have returned had nothing been deleted?

For example, say a cursor would give me two results:

  { _id: ObjectId("6248480f48cd0baca1de83fc") },
  { _id: ObjectId("6248481048cd0baca1de83fd") },

So if I limit the size to 1 and delete the first one after the first fetch, is there a possibility of not getting the second result in the second fetch?

If you limit to 1, then your cursor will contain one and only one document, and it will be the one that existed when the cursor was created, in this case …83fc. The cursor is not reevaluated.

All the docs existed when the cursor was created.
The doc is not deleted before the cursor starts; it is deleted after it has been fetched from the cursor.
Here is the sequence of events:

  1. the cursor is created with limit 1, and the query matches 2 docs.

  2. fetch the first one, ObjectId("6248480f48cd0baca1de83fc"), and delete it.

  3. go fetch the second one, ObjectId("6248481048cd0baca1de83fd"). Would the cursor always return it, or is there a chance of missing this doc?

Read about limit. It looks like you have a deep misunderstanding of what limit does for a cursor.

If the cursor is created with limit 1, no matter how many documents match, the cursor will contain 1 and only 1 document, because it is limited to 1.

No matter what you do to the single document in the cursor, none of the other matching documents will be in it. So you will always miss the …83fd document, because the cursor, being limited to 1, contains only 1 document, which is …83fc in your example.

Sorry, that was a blunder on my part. I take back the limit part; I meant the batch size of the cursor.

According to the batchSize() documentation:

Do not use a batch size of 1.

Specifying 1 or a negative number is analogous to using the limit() method.

In most cases, modifying the batch size will not affect the user or the application, as the mongo shell and most drivers return results as if MongoDB returned a single batch.
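To make the difference between limit and batch size concrete, here is a toy JavaScript model. It is not the MongoDB driver or server: the `iterate` function and its option names are hypothetical, it assumes no concurrent writes, it uses the same size for every batch, and it deliberately ignores the documented special case where a batch size of 1 or a negative number behaves like limit().

```javascript
// Toy model, NOT the MongoDB driver or server. Assumes no concurrent writes
// and uses the same size for every batch (real servers size later batches differently).
function iterate(ids, { limit = 0, batchSize = 101 } = {}) {
  if (batchSize <= 0) throw new RangeError("toy model only accepts a positive batchSize");
  // limit caps the total number of documents the cursor can ever return (0 = no limit)
  const cap = limit > 0 ? Math.min(limit, ids.length) : ids.length;
  const results = [];
  let roundTrips = 0;
  while (results.length < cap) {
    roundTrips++; // one find or getMore per batch
    const end = Math.min(results.length + batchSize, cap);
    results.push(...ids.slice(results.length, end));
  }
  return { results, roundTrips };
}

const twoDocs = ["83fc", "83fd"];
const limited = iterate(twoDocs, { limit: 1 });
console.log(JSON.stringify(limited.results), limited.roundTrips); // ["83fc"] 1

const bigBatch = iterate(twoDocs, { batchSize: 1000 });
console.log(JSON.stringify(bigBatch.results), bigBatch.roundTrips); // ["83fc","83fd"] 1

const manyDocs = Array.from({ length: 23 }, (_, i) => `doc${i}`);
const smallBatches = iterate(manyDocs, { batchSize: 4 });
console.log(smallBatches.results.length, smallBatches.roundTrips); // 23 6
```

In this model, limit changes what the cursor returns, while batch size only changes how many round trips it takes to return the same documents, which matches the quoted documentation as long as nothing is written during iteration.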

I am not using a batch size of 1 but a batch size of 1000. The value 1 was only meant as an example to explain what I was asking.

If that document is deleted, or altered in a way that it no longer matches your query, before its batch is fetched, you will not get the document. If we go back to my first example, where I deleted …8404 before iterating over the rest of the cursor, we saw that it was still in my cursor; most likely it was part of the first and only batch in this case (the find command's default first batch holds up to 101 documents, so all 23 documents would fit in it).

If we change batch size we get a different result.

// starting collection
mongosh> cursor = c.find().batchSize( 4 )
{ _id: ObjectId("6248480148cd0baca1de83ee") }
{ _id: ObjectId("6248480548cd0baca1de83ef") }
{ _id: ObjectId("6248480648cd0baca1de83f0") }
{ _id: ObjectId("6248480748cd0baca1de83f1") }
{ _id: ObjectId("6248480748cd0baca1de83f2") }
{ _id: ObjectId("6248480848cd0baca1de83f3") }
{ _id: ObjectId("6248480948cd0baca1de83f4") }
{ _id: ObjectId("6248480a48cd0baca1de83f5") }
{ _id: ObjectId("6248480b48cd0baca1de83f6") }
{ _id: ObjectId("6248480c48cd0baca1de83f7") }
{ _id: ObjectId("6248480c48cd0baca1de83f8") }
{ _id: ObjectId("6248480d48cd0baca1de83f9") }
{ _id: ObjectId("6248480e48cd0baca1de83fa") }
{ _id: ObjectId("6248480f48cd0baca1de83fb") }
{ _id: ObjectId("6248480f48cd0baca1de83fc") }
{ _id: ObjectId("6248481048cd0baca1de83fd") }
{ _id: ObjectId("6248481148cd0baca1de83fe") }
{ _id: ObjectId("6248481248cd0baca1de83ff") }
{ _id: ObjectId("6248481248cd0baca1de8400") }
{ _id: ObjectId("6248481348cd0baca1de8401") }
Type "it" for more
it
{ _id: ObjectId("6248481448cd0baca1de8402") }
{ _id: ObjectId("624af4707fe88f091fe7b529") }
{ _id: ObjectId("624af5167fe88f091fe7b52c") }

// now the test: mongosh iterates over the first 20 documents anyway, so in
// principle we got the first 5 batches with the following
mongosh> cursor = c.find().batchSize( 4 )
{ _id: ObjectId("6248480148cd0baca1de83ee") }
// same documents as above but redacted out for simplicity
{ _id: ObjectId("6248481348cd0baca1de8401") }
Type "it" for more

// Now I delete a document that is in the next batch
mongosh> c.deleteOne( { "_id" : ObjectId('624af5167fe88f091fe7b52c')})
{ acknowledged: true, deletedCount: 1 }

// Then type "it" to get the next batch and see that the deleted document is not there,
// unlike the other test, where ...8404 was still in my set.
mongosh> it
{ _id: ObjectId("6248481448cd0baca1de8402") }
{ _id: ObjectId("624af4707fe88f091fe7b529") }

@steevej, thanks for the detailed example, but my scenario is different. In my case,
the document that I am deleting has already been fetched, so I do not expect it in a future fetch, and it is no problem if it does show up.
So my real question is: are future matching docs impacted by my deleting docs from the already-fetched result set, in a way that the cursor won't return some matched docs? Does deletion during cursor iteration have any side effects?
For instance, in your case you are deleting ObjectId('624af5167fe88f091fe7b52c').
Is there any chance of the cursor not returning

{ _id: ObjectId("6248481448cd0baca1de8402") }

from your last fetch result set?