batch.size & poll.max.batch.size

Luan · August 10, 2021, 2:32pm

Hi all!

I’m trying to understand the difference between the batch.size and poll.max.batch.size properties.

As the MongoDB Kafka Connector documentation states, poll.max.batch.size sets how many change stream documents go into a single batch when polling for new data. For batch.size, however, the documentation only says that it is the cursor’s batch size. How do these two properties differ, and what does batch.size do compared with poll.max.batch.size?

/Luan

Ross_Lawley · August 11, 2021, 9:43am

Hi @Luan,

Great question!

poll.max.batch.size is the maximum number of records the source connector will wait for before publishing the data to the topic.
poll.await.time.ms is the maximum amount of time the source connector will wait before publishing the data to the topic.

So data is published to the topic when either of those limits is reached.
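To make the "either limit" rule concrete, here is a minimal Python sketch of a poll loop that returns as soon as the size limit or the time limit is hit. It is an illustration only, not the connector's actual code: poll_once, next_event and the injectable clock are hypothetical names made up for this example.

```python
import time
from typing import Callable, List, Optional


def poll_once(
    next_event: Callable[[], Optional[dict]],
    max_batch_size: int,
    await_time_ms: int,
    clock: Callable[[], float] = time.monotonic,
) -> List[dict]:
    """Collect events until either limit is reached, then return the batch."""
    # Time limit: plays the role of poll.await.time.ms.
    deadline = clock() + await_time_ms / 1000.0
    batch: List[dict] = []
    # Size limit: plays the role of poll.max.batch.size.
    while len(batch) < max_batch_size and clock() < deadline:
        # Hypothetical source: returns None when no event is available yet.
        # (A real loop would block or sleep here instead of spinning.)
        event = next_event()
        if event is not None:
            batch.append(event)
    return batch
```

With a busy source the batch fills up and is returned early; with a quiet source the loop gives up at the deadline and returns whatever it has, possibly nothing.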

The source connector uses a change stream cursor underneath, which can also be configured.

batch.size configures the MongoDB cursor and specifies the maximum number of change events to return in each batch of the response from the MongoDB cluster. The default is 0, meaning it uses the server’s default.

So the poll.* settings control how often the connector passes data to the topic, while batch.size controls the maximum amount of data retrieved from MongoDB in each cursor batch.
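As a rough sketch of where each setting goes, the snippet below builds a source connector configuration as a Python dict and prints it as JSON. The connector name, connection URI, database and collection are placeholders, and the two poll.* values are picked for illustration only; batch.size is left at 0 so the server default applies.

```python
import json

# Placeholder values: replace the name, URI, database and collection with your own.
connector = {
    "name": "mongo-source-example",
    "config": {
        "connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
        "connection.uri": "mongodb://localhost:27017",
        "database": "exampleDb",
        "collection": "exampleCollection",
        # Connector side: hand records to Kafka once this many have accumulated...
        "poll.max.batch.size": "1000",
        # ...or once this much time (in ms) has passed, whichever comes first.
        "poll.await.time.ms": "5000",
        # MongoDB side: change stream cursor batch size (0 = server default).
        "batch.size": "0",
    },
}

payload = json.dumps(connector, indent=2)
print(payload)
```

The printed JSON has the name/config shape that the Kafka Connect REST API accepts when creating a connector.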

I hope that helps,

Ross
