(Mis)Understanding batchSize?

zacharykane · June 15, 2022, 5:48am

Hello all, I’m working on a small application to query my Atlas cluster. The collection is trivially small now, but I’m looking at performance techniques for the future. The Node.js driver documentation says .find() and .aggregate() return cursors, which are iterable objects from which I can pull out my docs for processing.

My understanding was that, either as an option to those methods or by calling myCursor.batchSize(X), I would be able to set the number of docs I pull out per iteration (as opposed to .toArray(), which I’m using now and which could become unmanageable).

Is that correct?

Right now, looping over my cursor with for await...of, or with while (await myCursor.hasNext()) { await myCursor.next() }, I only get one document at a time. I thought batchSize would at least let me process them 2, 8, etc. at a time.

alexbevi · June 15, 2022, 5:34pm

@zacharykane,

Setting batchSize on a cursor changes the number of documents included in each getMore command’s response. The “optimization” here is limiting the number of network round trips required to retrieve the full result set.

From our “Iterating a Cursor” tutorial’s section on Cursor Batches:

find() and aggregate() operations have an initial batch size of 101 documents by default. Subsequent getMore operations issued against the resulting cursor have no default batch size, so they are limited only by the 16 megabyte message size.

When you call myCursor.toArray(), the driver retrieves and deserializes the results from the server, issuing as many getMore commands as are needed to exhaust the cursor. If you don’t set a batchSize, then after the first 101 results the next getMore will fetch as many results as will fit into 16MB before returning. If you set a batchSize of 1 and you have 1000 results, the driver will call getMore 999 times.
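To make the arithmetic above concrete, here is a rough JavaScript model of how many getMore round trips a cursor needs. It is a sketch under simplifying assumptions, not how the driver or server actually sizes batches: every document is assumed to have the same size (`avgDocBytes`, a hypothetical parameter defaulting to 1 KB), the 16 MB limit is applied to document bytes only, and any trailing empty getMore is not counted. The name `estimateGetMoreCalls` is made up for illustration.

```javascript
// Back-of-the-envelope model of getMore round trips (illustration only).
// Assumptions, not driver internals: all documents have the same size,
// the 16 MB limit counts document bytes only (wire overhead ignored),
// a final empty getMore is not counted, and batchSize (if given) is a
// positive integer.
const MAX_MESSAGE_BYTES = 16 * 1024 * 1024;
const DEFAULT_INITIAL_BATCH = 101;

function estimateGetMoreCalls(totalDocs, { batchSize, avgDocBytes = 1024 } = {}) {
  // How many documents of this size fit into one 16 MB reply.
  const sizeCap = Math.max(1, Math.floor(MAX_MESSAGE_BYTES / avgDocBytes));

  // Initial find/aggregate reply: batchSize if set, otherwise 101 by default.
  const firstBatch = Math.min(totalDocs, batchSize ?? DEFAULT_INITIAL_BATCH, sizeCap);

  // Each later getMore: batchSize if set, otherwise whatever fits in 16 MB.
  const perGetMore = Math.min(batchSize ?? Infinity, sizeCap);

  const remaining = totalDocs - firstBatch;
  return Math.ceil(remaining / perGetMore);
}

console.log(estimateGetMoreCalls(1000, { batchSize: 1 }));   // 999
console.log(estimateGetMoreCalls(1000));                     // 1
console.log(estimateGetMoreCalls(1000, { batchSize: 100 })); // 9
```

Under these assumptions, 1000 one-KB documents need a single getMore after the default 101-document first batch, while batchSize(1) needs 999, matching the numbers above.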

> Right now, looping over my cursor with for…await or while myCursor.hasNext() myCursor.next() I only get one document at a time. I thought I could at least process them 2, 8, etc at a time with batchSize.

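The myCursor.next() call advances the cursor and returns the next single result, but it usually does not involve a network round trip: the driver returns documents from the batch it has already buffered and only issues a getMore once that batch is used up. The exception is batchSize(1), where each batch holds one document, so every next() call needs its own getMore.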
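A toy model may help separate the two ideas. The `ToyCursor` class below is hypothetical and synchronous (the real driver’s next() returns a Promise); it only mimics the buffering described above: next() always hands back one document, and the batch size only decides how often the local buffer has to be refilled with a round trip.

```javascript
// Toy model of client-side cursor buffering (NOT the MongoDB driver's code).
// fetchBatch() stands in for the initial find reply and later getMore
// commands; each call counts as one network round trip.
class ToyCursor {
  constructor(results, batchSize) {
    this.results = results;     // pretend server-side result set
    this.batchSize = batchSize; // documents sent per round trip
    this.position = 0;          // how many documents the "server" has sent
    this.buffer = [];           // documents already received by the client
    this.roundTrips = 0;
  }

  fetchBatch() {
    this.roundTrips += 1;
    const batch = this.results.slice(this.position, this.position + this.batchSize);
    this.position += batch.length;
    this.buffer.push(...batch);
  }

  // Like cursor.next(): always returns ONE document, or null when exhausted.
  // It only goes back to the "server" when the local buffer is empty.
  next() {
    if (this.buffer.length === 0 && this.position < this.results.length) {
      this.fetchBatch();
    }
    return this.buffer.length > 0 ? this.buffer.shift() : null;
  }
}

const docs = Array.from({ length: 10 }, (_, i) => ({ _id: i }));
const cursor = new ToyCursor(docs, 4);
const seen = [];
let doc;
while ((doc = cursor.next()) !== null) seen.push(doc._id);
console.log(seen.length, cursor.roundTrips); // 10 3
```

Running the same loop with a batch size of 1 reports 10 round trips for the same 10 documents, while the sequence of documents returned by next() is identical.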

zacharykane · June 15, 2022, 8:25pm

Hey @alexbevi!

Okay, so I’m thinking of this incorrectly then. This is more about optimizing the underlying requests to the DB server, and NOT about changing the return value/functioning of the cursors?

Using JS iteration techniques or the .next() API will always return just one doc from the query’s cursor, no matter what?
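If the goal is to work on documents 2 or 8 at a time, one option is to do the grouping in application code. The sketch below assumes only that the cursor is async iterable, which the Node.js driver’s cursors are (they support for await...of). `inChunks` and `fakeCursor` are hypothetical names; `fakeCursor` simply stands in for a real cursor so the example runs without a database.

```javascript
// Hypothetical helper (not part of the MongoDB driver): groups the items
// yielded by any async iterable into arrays of up to `size` items.
async function* inChunks(iterable, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  let chunk = [];
  for await (const item of iterable) {
    chunk.push(item);
    if (chunk.length === size) {
      yield chunk;
      chunk = [];
    }
  }
  if (chunk.length > 0) yield chunk; // final, possibly shorter, group
}

// Stand-in for a driver cursor so the sketch runs without a database.
async function* fakeCursor(n) {
  for (let i = 0; i < n; i++) yield { _id: i };
}

(async () => {
  for await (const group of inChunks(fakeCursor(5), 2)) {
    console.log(group.map((d) => d._id)); // [ 0, 1 ] then [ 2, 3 ] then [ 4 ]
  }
})();
```

Against a real collection this might look like inChunks(collection.find(filter).batchSize(8), 8), but the two numbers are independent: batchSize only tunes the network round trips, while the chunk size decides how many documents each loop iteration receives.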