Why does find() return documents in batches?

I am trying to find all the documents I have in a collection in Java. No query, no sorting, just a simple find(). I am using Vert.x’s Mongo Client, which, according to their documentation, uses the MongoDB Reactive Streams Java driver.

The problem I am having is that it does not return all documents at once. It returns them in batches. If I turn on DEBUG logs, this is what I see:

2022-01-20 17:12:25.536 [async-channel-group-0-handler-executor] DEBUG org.mongodb.driver.protocol.command - Sending command '{"getMore": 82848641461299, "collection": "someCollection", "$db": "someDB", "$readPreference": {"mode": "primaryPreferred"}}' with request id 10091 to database someDB on connection [connectionId{localValue:2}] to server instance.eu-west-1.docdb.amazonaws.com:27017
2022-01-20 17:12:25.617 [async-channel-group-0-handler-executor] DEBUG org.mongodb.driver.protocol.command - Execution of command with request id 10091 completed successfully in 80.81 ms on connection [connectionId{localValue:2}] to server instance.eu-west-1.docdb.amazonaws.com:27017
2022-01-20 17:12:25.630 [async-channel-group-0-handler-executor] DEBUG org.mongodb.driver.protocol.command - Sending command '{"getMore": 82848641461299, "collection": "someCollection", "$db": "someDB", "$readPreference": {"mode": "primaryPreferred"}}' with request id 10092 to database someDB on connection [connectionId{localValue:2}] to server instance.eu-west-1.docdb.amazonaws.com:27017
2022-01-20 17:12:25.705 [async-channel-group-0-handler-executor] DEBUG org.mongodb.driver.protocol.command - Execution of command with request id 10092 completed successfully in 75.14 ms on connection [connectionId{localValue:2}] to server instance.eu-west-1.docdb.amazonaws.com:27017

This keeps getting printed over and over again for multiple minutes. My guess is that it is behaving similarly to mongosh, in the sense that it returns documents in batches instead of everything at once. So it keeps sending getMore commands to Mongo to fetch the rest of the documents until it eventually receives all of them. Is there no way to just get all documents at once? If I try running the distinct() function instead, it always gets all documents at once, no matter how many there are. Only find() batches its results like this.

Hi @Tiago_Silva1, welcome to the community!

Just to clarify, I noticed this in the log you posted:

... to server instance.eu-west-1.docdb.amazonaws.com:27017

Is this a DocumentDB server you’re connecting to? If yes, then the behaviour you’re seeing could be coming from DocumentDB.

It is also worth noting that Vert.x’s client is based on MongoDB’s official Java driver, so they might have added some extra handling on top of the driver.

I noticed that you also have a similar question posted on StackOverflow. Since your question is about DocumentDB (an AWS product) and Vert.x, neither of which is supported by MongoDB, I would suggest monitoring the StackOverflow question instead, since this forum is MongoDB-specific.

Having said that, if you have any further questions about products supported by MongoDB, we’ll be happy to help!

Best regards
Kevin

By the way, the standard behaviour is for the server to return query results to the driver in batches, which lets applications (and the server) process results efficiently. A larger batch of results will have more client processing overhead (memory allocation and deserialisation).

The batch size can be adjusted via the underlying MongoDB driver. For the Java driver, see batchSize() on FindIterable (driver-sync 4.4.0 API).
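To illustrate the trade-off, here is a minimal sketch of how batch size relates to the number of round trips. It does not use the driver at all: BatchedCursorSketch, nextBatch() and drain() are hypothetical names for an in-memory stand-in, and a real server may apply additional limits (for example on reply size) that this toy model ignores.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for a server-side cursor; not a driver class.
class BatchedCursorSketch {
    private final List<String> documents; // pretend collection contents
    private final int batchSize;          // documents returned per round trip
    private int position = 0;             // index of the next unread document
    private int roundTrips = 0;           // counts every simulated reply

    BatchedCursorSketch(List<String> documents, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.documents = documents;
        this.batchSize = batchSize;
    }

    boolean exhausted() {
        return position >= documents.size();
    }

    // Returns the next batch, playing the role of one find/getMore reply.
    List<String> nextBatch() {
        roundTrips++;
        int end = Math.min(position + batchSize, documents.size());
        List<String> batch = new ArrayList<>(documents.subList(position, end));
        position = end;
        return batch;
    }

    int roundTrips() {
        return roundTrips;
    }

    // Drains the cursor into one list; every batch still costs a round trip.
    static List<String> drain(BatchedCursorSketch cursor) {
        List<String> all = new ArrayList<>();
        do {
            all.addAll(cursor.nextBatch());
        } while (!cursor.exhausted());
        return all;
    }

    public static void main(String[] args) {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            docs.add("doc-" + i);
        }
        BatchedCursorSketch small = new BatchedCursorSketch(docs, 100);
        BatchedCursorSketch large = new BatchedCursorSketch(docs, 500);
        // prints "1000 docs in 10 round trips"
        System.out.println(drain(small).size() + " docs in " + small.roundTrips() + " round trips");
        // prints "1000 docs in 2 round trips"
        System.out.println(drain(large).size() + " docs in " + large.roundTrips() + " round trips");
    }
}
```

In this model a larger batch size means fewer round trips, but each reply is bigger and has to be held in memory and deserialised at once.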

Best regards
Kevin

Is this a DocumentDB server you’re connecting to? If yes, then the behaviour you’re seeing could be coming from DocumentDB.

Yes, it is a DocumentDB database. However, I don’t believe the behavior comes from that. I’ve been doing some more research and going through some code. It does seem that the default behavior in MongoDB is to return batches, like @kevinadi said. If I want all the results at once, I should use a toArray() method on the result I get from find(). That forces MongoDB to load all the results into RAM and return them all at once.

The problem seems to be that the MongoDB Reactive Streams Java driver does not provide that method. As such, the Vert.x Mongo Client I use (which uses the Reactive Mongo Driver under the hood) does not provide it either. The non-reactive version of the driver (i.e. the ordinary one) does provide it, though. Sadly, I cannot use it right now, as our application is too dependent on the other one. So, as a workaround, I will probably have to use distinct() in place of find(), which always returns all results at once and not in batches. The DB will be forced to do a distinct operation when it is not needed, but it seems to be the best workaround I can find.
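For what it’s worth, a toArray()-style helper can be sketched on top of any reactive publisher by subscribing, requesting everything, and buffering the items until the stream completes. The sketch below is an assumption-heavy illustration, not the driver’s or Vert.x’s API: it uses the JDK’s java.util.concurrent.Flow interfaces as a stand-in for the org.reactivestreams interfaces the driver exposes, and CollectAllSketch, CollectingSubscriber and collectAll() are hypothetical names. Collecting this way only changes how the application receives the results; it would not by itself stop a driver from fetching them in batches underneath.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;
import java.util.concurrent.TimeUnit;

// Hypothetical helper; Flow.* stands in for the org.reactivestreams interfaces.
class CollectAllSketch {

    // Buffers every item and completes a future once the stream ends.
    static final class CollectingSubscriber<T> implements Flow.Subscriber<T> {
        private final List<T> items = new ArrayList<>();
        private final CompletableFuture<List<T>> result = new CompletableFuture<>();

        @Override
        public void onSubscribe(Flow.Subscription subscription) {
            // Ask for everything up front, i.e. no backpressure.
            subscription.request(Long.MAX_VALUE);
        }

        @Override
        public void onNext(T item) {
            items.add(item);
        }

        @Override
        public void onError(Throwable error) {
            result.completeExceptionally(error);
        }

        @Override
        public void onComplete() {
            result.complete(items);
        }

        CompletableFuture<List<T>> result() {
            return result;
        }
    }

    // Subscribes to the publisher and returns a future holding all items.
    static <T> CompletableFuture<List<T>> collectAll(Flow.Publisher<T> publisher) {
        CollectingSubscriber<T> subscriber = new CollectingSubscriber<>();
        publisher.subscribe(subscriber);
        return subscriber.result();
    }

    public static void main(String[] args) throws Exception {
        // SubmissionPublisher plays the role of the query result stream here.
        SubmissionPublisher<String> publisher = new SubmissionPublisher<>();
        CompletableFuture<List<String>> all = collectAll(publisher);
        for (int i = 0; i < 5; i++) {
            publisher.submit("doc-" + i);
        }
        publisher.close(); // signals onComplete to the subscriber
        System.out.println(all.get(5, TimeUnit.SECONDS));
    }
}
```

Requesting Long.MAX_VALUE gives up backpressure and holds the whole result set in memory, so this only makes sense when the collection is known to fit in RAM.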

I forgot to mention, the Vert.x Mongo Client does provide a batchSize() method to adjust the batch size of the results, but, for some reason, the library ignores the value and does not use it in find() operations…