Slight delay during the iteration of 101th document from Cursor

Hi,

I noticed that while iterating a MongoDB cursor using PyMongo, there is a slight delay in requesting the 101st document from the cursor specifically.

This delay occurs even when I increase the batch_size parameter of cursor options and also on all the different machines I have tried.

I am curious as to if there is a reason behind this or it’s just an illusion I am seeing.

Example:

Try to iterate the cursor and print an incrementing counter, I am seeing a delay during the 101st iteration.

col = db["test"]
cursor = col.find({})
count = 0


for it in cursor:
    count += 1
    print("\r", count, end="")

1 Like

Hi @Harshavardhan_Kumare - thanks for posting the question here, and sorry it’s taken so long to reply!

As we talked about on the stream, you’d expect a delay with the code you’ve provided, as the 101st document is in the second batch loaded from the server. Can you provide your code that sets the batch_size? I suspect that’s where the problem lies.

Hi @Harshavardhan_Kumare,

find and aggregate operations have an initial batch size of 101 documents by default. Subsequent getMore operations issued against the resulting cursor have no default batch size, so they are limited only by the BSON Document Max Size (16 MB).

Cursor Batches will limit each getMore after the initial batch of 101 documents to the specified number of documents if they won’t exceed the BSON Document Max Size.

For example:

function setup() {
    db.foo.drop();
    var data = [];
    for (var i = 0; i < 300; i++) {
        data.push({ a: i, d: new Date() });
    }
    db.foo.insertMany(data);
    db.foo.createIndex({ a: 1 })
    db.setProfilingLevel(0, { slowms: 0 })
}
setup();

db.foo.find({ a: { $lte: 105 } }).toArray();
2021-07-13T06:17:51.358-0400 I  COMMAND  [conn2] command test.foo appName: "MongoDB Shell" command: find { find: "foo", filter: { a: { $lte: 105.0 } }, lsid: { id: UUID("85402e56-9b99-4704-95bf-53eb6834fad5") }, $db: "test" } planSummary: IXSCAN { a: 1 } cursorid:646849439178639492 keysExamined:101 docsExamined:101 numYields:0 nreturned:101 queryHash:811451DD planCacheKey:B859B02A reslen:4936 locks:{ ReplicationStateTransition: { acquireCount: { w: 1 } }, Global: { acquireCount: { r: 1 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } }, Mutex: { acquireCount: { r: 1 } } } storage:{} protocol:op_msg 0ms
2021-07-13T06:17:51.364-0400 I  COMMAND  [conn2] command test.foo appName: "MongoDB Shell" command: getMore { getMore: 646849439178639492, collection: "foo", lsid: { id: UUID("85402e56-9b99-4704-95bf-53eb6834fad5") }, $db: "test" } originatingCommand: { find: "foo", filter: { a: { $lte: 105.0 } }, lsid: { id: UUID("85402e56-9b99-4704-95bf-53eb6834fad5") }, $db: "test" } planSummary: IXSCAN { a: 1 } cursorid:646849439178639492 keysExamined:5 docsExamined:5 cursorExhausted:1 numYields:0 nreturned:5 reslen:331 locks:{ ReplicationStateTransition: { acquireCount: { w: 1 } }, Global: { acquireCount: { r: 1 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } }, Mutex: { acquireCount: { r: 1 } } } storage:{} protocol:op_msg 0ms

When filtering for an expected 106 documents note the find command returns 101 results first (nreturned:101), then the getMore for the associated cursor (cursorid:646849439178639492) will return the remaining 5 results (cursorExhausted:1 nreturned:5).

db.foo.find({ a: { $lte: 250 } }).batchSize(50).toArray();

The same operation with a larger expected result set (251) in batches of 50 should call getMore multiple times until the cursor is exhausted and all results have been returned:

2021-07-13T06:19:32.680-0400 I  COMMAND  [conn2] command test.foo appName: "MongoDB Shell" command: find { find: "foo", filter: { a: { $lte: 250.0 } }, batchSize: 50.0, lsid: { id: UUID("85402e56-9b99-4704-95bf-53eb6834fad5") }, $db: "test" } planSummary: IXSCAN { a: 1 } cursorid:5536327249869240011 keysExamined:50 docsExamined:50 numYields:0 nreturned:50 queryHash:811451DD planCacheKey:B859B02A reslen:2487 locks:{ ReplicationStateTransition: { acquireCount: { w: 1 } }, Global: { acquireCount: { r: 1 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } }, Mutex: { acquireCount: { r: 1 } } } storage:{} protocol:op_msg 0ms
2021-07-13T06:19:32.683-0400 I  COMMAND  [conn2] command test.foo appName: "MongoDB Shell" command: getMore { getMore: 5536327249869240011, collection: "foo", batchSize: 50.0, lsid: { id: UUID("85402e56-9b99-4704-95bf-53eb6834fad5") }, $db: "test" } originatingCommand: { find: "foo", filter: { a: { $lte: 250.0 } }, batchSize: 50.0, lsid: { id: UUID("85402e56-9b99-4704-95bf-53eb6834fad5") }, $db: "test" } planSummary: IXSCAN { a: 1 } cursorid:5536327249869240011 keysExamined:50 docsExamined:50 numYields:0 nreturned:50 reslen:2486 locks:{ ReplicationStateTransition: { acquireCount: { w: 1 } }, Global: { acquireCount: { r: 1 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } }, Mutex: { acquireCount: { r: 1 } } } storage:{} protocol:op_msg 0ms
2021-07-13T06:19:32.686-0400 I  COMMAND  [conn2] command test.foo appName: "MongoDB Shell" command: getMore { getMore: 5536327249869240011, collection: "foo", batchSize: 50.0, lsid: { id: UUID("85402e56-9b99-4704-95bf-53eb6834fad5") }, $db: "test" } originatingCommand: { find: "foo", filter: { a: { $lte: 250.0 } }, batchSize: 50.0, lsid: { id: UUID("85402e56-9b99-4704-95bf-53eb6834fad5") }, $db: "test" } planSummary: IXSCAN { a: 1 } cursorid:5536327249869240011 keysExamined:50 docsExamined:50 numYields:0 nreturned:50 reslen:2486 locks:{ ReplicationStateTransition: { acquireCount: { w: 1 } }, Global: { acquireCount: { r: 1 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } }, Mutex: { acquireCount: { r: 1 } } } storage:{} protocol:op_msg 0ms
2021-07-13T06:19:32.687-0400 I  COMMAND  [conn2] command test.foo appName: "MongoDB Shell" command: getMore { getMore: 5536327249869240011, collection: "foo", batchSize: 50.0, lsid: { id: UUID("85402e56-9b99-4704-95bf-53eb6834fad5") }, $db: "test" } originatingCommand: { find: "foo", filter: { a: { $lte: 250.0 } }, batchSize: 50.0, lsid: { id: UUID("85402e56-9b99-4704-95bf-53eb6834fad5") }, $db: "test" } planSummary: IXSCAN { a: 1 } cursorid:5536327249869240011 keysExamined:50 docsExamined:50 numYields:0 nreturned:50 reslen:2486 locks:{ ReplicationStateTransition: { acquireCount: { w: 1 } }, Global: { acquireCount: { r: 1 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } }, Mutex: { acquireCount: { r: 1 } } } storage:{} protocol:op_msg 0ms
2021-07-13T06:19:32.689-0400 I  COMMAND  [conn2] command test.foo appName: "MongoDB Shell" command: getMore { getMore: 5536327249869240011, collection: "foo", batchSize: 50.0, lsid: { id: UUID("85402e56-9b99-4704-95bf-53eb6834fad5") }, $db: "test" } originatingCommand: { find: "foo", filter: { a: { $lte: 250.0 } }, batchSize: 50.0, lsid: { id: UUID("85402e56-9b99-4704-95bf-53eb6834fad5") }, $db: "test" } planSummary: IXSCAN { a: 1 } cursorid:5536327249869240011 keysExamined:50 docsExamined:50 numYields:0 nreturned:50 reslen:2486 locks:{ ReplicationStateTransition: { acquireCount: { w: 1 } }, Global: { acquireCount: { r: 1 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } }, Mutex: { acquireCount: { r: 1 } } } storage:{} protocol:op_msg 0ms
2021-07-13T06:19:32.691-0400 I  COMMAND  [conn2] command test.foo appName: "MongoDB Shell" command: getMore { getMore: 5536327249869240011, collection: "foo", batchSize: 50.0, lsid: { id: UUID("85402e56-9b99-4704-95bf-53eb6834fad5") }, $db: "test" } originatingCommand: { find: "foo", filter: { a: { $lte: 250.0 } }, batchSize: 50.0, lsid: { id: UUID("85402e56-9b99-4704-95bf-53eb6834fad5") }, $db: "test" } planSummary: IXSCAN { a: 1 } cursorid:5536327249869240011 keysExamined:1 docsExamined:1 cursorExhausted:1 numYields:0 nreturned:1 reslen:143 locks:{ ReplicationStateTransition: { acquireCount: { w: 1 } }, Global: { acquireCount: { r: 1 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } }, Mutex: { acquireCount: { r: 1 } } } storage:{} protocol:op_msg 0ms

Based on the example you shared, the same operation using the mongosh shell would be:

db.foo.find({ }).toArray();

If no batch is specified, based on our understanding of the find operation 101 results should be returned then a getMore would return the rest (assuming we don’t exceed a reslen of 16MB):

2021-07-13T06:25:37.247-0400 I  COMMAND  [conn2] command test.foo appName: "MongoDB Shell" command: find { find: "foo", filter: {}, lsid: { id: UUID("85402e56-9b99-4704-95bf-53eb6834fad5") }, $db: "test" } planSummary: COLLSCAN cursorid:4001987052313098472 keysExamined:0 docsExamined:101 numYields:0 nreturned:101 reslen:4936 locks:{ ReplicationStateTransition: { acquireCount: { w: 1 } }, Global: { acquireCount: { r: 1 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } }, Mutex: { acquireCount: { r: 1 } } } storage:{} protocol:op_msg 0ms
2021-07-13T06:25:37.250-0400 I  COMMAND  [conn2] command test.foo appName: "MongoDB Shell" command: getMore { getMore: 4001987052313098472, collection: "foo", lsid: { id: UUID("85402e56-9b99-4704-95bf-53eb6834fad5") }, $db: "test" } originatingCommand: { find: "foo", filter: {}, lsid: { id: UUID("85402e56-9b99-4704-95bf-53eb6834fad5") }, $db: "test" } planSummary: COLLSCAN cursorid:4001987052313098472 keysExamined:0 docsExamined:199 cursorExhausted:1 numYields:1 nreturned:199 reslen:9737 locks:{ ReplicationStateTransition: { acquireCount: { w: 2 } }, Global: { acquireCount: { r: 2 } }, Database: { acquireCount: { r: 2 } }, Collection: { acquireCount: { r: 2 } }, Mutex: { acquireCount: { r: 1 } } } storage:{} protocol:op_msg 0ms

The perceived “delay” between the 101st document and the 102nd would be the roundtrip time of a getMore command being issued to fetch the next batch of documents from the server for the cursor that is being iterated.

If the batch_size is provided to the find method in PyMongo the same should be happen as what was shown above:

from pymongo import MongoClient
client = MongoClient()
db = client['test']
coll = db['foo']

cursor = coll.find({})
count = 0

for it in cursor:
    count += 1

print("\r", count, end="")

# 2021-07-13T09:41:57.969-0400 I  COMMAND  [conn9] command test.foo command: find { find: "foo", filter: {}, lsid: { id: UUID("b15cdbe6-9a20-4b92-a904-d775aa03ff45") }, $db: "test", $readPreference: { mode: "primaryPreferred" } } planSummary: COLLSCAN cursorid:7580655042457658670 keysExamined:0 docsExamined:101 numYields:0 nreturned:101 reslen:4936 locks:{ ReplicationStateTransition: { acquireCount: { w: 1 } }, Global: { acquireCount: { r: 1 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } }, Mutex: { acquireCount: { r: 1 } } } storage:{} protocol:op_msg 0ms
# 2021-07-13T09:41:57.971-0400 I  COMMAND  [conn9] command test.foo command: getMore { getMore: 7580655042457658670, collection: "foo", lsid: { id: UUID("b15cdbe6-9a20-4b92-a904-d775aa03ff45") }, $db: "test" } originatingCommand: { find: "foo", filter: {}, lsid: { id: UUID("b15cdbe6-9a20-4b92-a904-d775aa03ff45") }, $db: "test", $readPreference: { mode: "primaryPreferred" } } planSummary: COLLSCAN cursorid:7580655042457658670 keysExamined:0 docsExamined:199 cursorExhausted:1 numYields:1 nreturned:199 reslen:9737 locks:{ ReplicationStateTransition: { acquireCount: { w: 2 } }, Global: { acquireCount: { r: 2 } }, Database: { acquireCount: { r: 2 } }, Collection: { acquireCount: { r: 2 } }, Mutex: { acquireCount: { r: 1 } } } storage:{} protocol:op_msg 0ms
cursor = coll.find({}, batch_size=50)
count = 0

for it in cursor:
    count += 1

print("\r", count, end="")

# 2021-07-13T09:44:36.152-0400 I  COMMAND  [conn14] command test.foo command: find { find: "foo", filter: {}, batchSize: 50, lsid: { id: UUID("196e2520-77b2-4e89-bf40-362459794300") }, $db: "test", $readPreference: { mode: "primaryPreferred" } } planSummary: COLLSCAN cursorid:1790856328923279599 keysExamined:0 docsExamined:50 numYields:0 nreturned:50 reslen:2487 locks:{ ReplicationStateTransition: { acquireCount: { w: 1 } }, Global: { acquireCount: { r: 1 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } }, Mutex: { acquireCount: { r: 1 } } } storage:{} protocol:op_msg 0ms
# 2021-07-13T09:44:36.154-0400 I  COMMAND  [conn14] command test.foo command: getMore { getMore: 1790856328923279599, collection: "foo", batchSize: 50, lsid: { id: UUID("196e2520-77b2-4e89-bf40-362459794300") }, $db: "test" } originatingCommand: { find: "foo", filter: {}, batchSize: 50, lsid: { id: UUID("196e2520-77b2-4e89-bf40-362459794300") }, $db: "test", $readPreference: { mode: "primaryPreferred" } } planSummary: COLLSCAN cursorid:1790856328923279599 keysExamined:0 docsExamined:50 numYields:0 nreturned:50 reslen:2486 locks:{ ReplicationStateTransition: { acquireCount: { w: 1 } }, Global: { acquireCount: { r: 1 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } }, Mutex: { acquireCount: { r: 1 } } } storage:{} protocol:op_msg 0ms
# 2021-07-13T09:44:36.156-0400 I  COMMAND  [conn14] command test.foo command: getMore { getMore: 1790856328923279599, collection: "foo", batchSize: 50, lsid: { id: UUID("196e2520-77b2-4e89-bf40-362459794300") }, $db: "test" } originatingCommand: { find: "foo", filter: {}, batchSize: 50, lsid: { id: UUID("196e2520-77b2-4e89-bf40-362459794300") }, $db: "test", $readPreference: { mode: "primaryPreferred" } } planSummary: COLLSCAN cursorid:1790856328923279599 keysExamined:0 docsExamined:50 numYields:0 nreturned:50 reslen:2486 locks:{ ReplicationStateTransition: { acquireCount: { w: 1 } }, Global: { acquireCount: { r: 1 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } }, Mutex: { acquireCount: { r: 1 } } } storage:{} protocol:op_msg 0ms
# 2021-07-13T09:44:36.157-0400 I  COMMAND  [conn14] command test.foo command: getMore { getMore: 1790856328923279599, collection: "foo", batchSize: 50, lsid: { id: UUID("196e2520-77b2-4e89-bf40-362459794300") }, $db: "test" } originatingCommand: { find: "foo", filter: {}, batchSize: 50, lsid: { id: UUID("196e2520-77b2-4e89-bf40-362459794300") }, $db: "test", $readPreference: { mode: "primaryPreferred" } } planSummary: COLLSCAN cursorid:1790856328923279599 keysExamined:0 docsExamined:50 numYields:0 nreturned:50 reslen:2486 locks:{ ReplicationStateTransition: { acquireCount: { w: 1 } }, Global: { acquireCount: { r: 1 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } }, Mutex: { acquireCount: { r: 1 } } } storage:{} protocol:op_msg 0ms
# 2021-07-13T09:44:36.159-0400 I  COMMAND  [conn14] command test.foo command: getMore { getMore: 1790856328923279599, collection: "foo", batchSize: 50, lsid: { id: UUID("196e2520-77b2-4e89-bf40-362459794300") }, $db: "test" } originatingCommand: { find: "foo", filter: {}, batchSize: 50, lsid: { id: UUID("196e2520-77b2-4e89-bf40-362459794300") }, $db: "test", $readPreference: { mode: "primaryPreferred" } } planSummary: COLLSCAN cursorid:1790856328923279599 keysExamined:0 docsExamined:50 numYields:0 nreturned:50 reslen:2486 locks:{ ReplicationStateTransition: { acquireCount: { w: 1 } }, Global: { acquireCount: { r: 1 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } }, Mutex: { acquireCount: { r: 1 } } } storage:{} protocol:op_msg 0ms
# 2021-07-13T09:44:36.160-0400 I  COMMAND  [conn14] command test.foo command: getMore { getMore: 1790856328923279599, collection: "foo", batchSize: 50, lsid: { id: UUID("196e2520-77b2-4e89-bf40-362459794300") }, $db: "test" } originatingCommand: { find: "foo", filter: {}, batchSize: 50, lsid: { id: UUID("196e2520-77b2-4e89-bf40-362459794300") }, $db: "test", $readPreference: { mode: "primaryPreferred" } } planSummary: COLLSCAN cursorid:1790856328923279599 keysExamined:0 docsExamined:50 numYields:0 nreturned:50 reslen:2486 locks:{ ReplicationStateTransition: { acquireCount: { w: 1 } }, Global: { acquireCount: { r: 1 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } }, Mutex: { acquireCount: { r: 1 } } } storage:{} protocol:op_msg 0ms
# 2021-07-13T09:44:36.161-0400 I  COMMAND  [conn14] command test.foo command: getMore { getMore: 1790856328923279599, collection: "foo", batchSize: 50, lsid: { id: UUID("196e2520-77b2-4e89-bf40-362459794300") }, $db: "test" } originatingCommand: { find: "foo", filter: {}, batchSize: 50, lsid: { id: UUID("196e2520-77b2-4e89-bf40-362459794300") }, $db: "test", $readPreference: { mode: "primaryPreferred" } } planSummary: COLLSCAN cursorid:1790856328923279599 keysExamined:0 docsExamined:0 cursorExhausted:1 numYields:0 nreturned:0 reslen:96 locks:{ ReplicationStateTransition: { acquireCount: { w: 1 } }, Global: { acquireCount: { r: 1 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } }, Mutex: { acquireCount: { r: 1 } } } storage:{} protocol:op_msg 0ms

@alexbevi Just to clarify, in the second case, you’d expect minor pauses after the 50th, 100th, and 150th documents (and obviously before the first document was received) while the client waits for a batch to be loaded?

Correct. Each subsequent getMore would be a roundtrip to the server so the impact would be non-zero (though in most cases negligible).

1 Like

Wow, thanks a lot @alexbevi and @Mark_Smith.

I didn’t know about the 16MB rule and I kept increasing the batch size to 5 digits where each document has 500+ keys. No wonder I am getting frequent delays within each batches value I provided.

Thanks a ton, @alexbevi! This question has been bugging me for a while now.

You people are awesome!

P.S. @Mark_Smith looking forward for your Rust presentation at MongoDB .live :slight_smile:

2 Likes

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.