Significant Latency with Mongo Kafka Connector

We are seeing significant latency with our source connector during high-load periods. These delays can exceed an hour. We are currently using v1.9.1, having recently upgraded from v1.6, and we saw the issue on both versions. We believe we’ve isolated the issue to the connector rather than the MongoDB instance, Kafka cluster, or network.

I can share the config if desired, but we’ve experimented with several of the attributes, so it may be more helpful to start with which attributes we could modify for this scenario. I can then share the current and previous values we’ve set.

What version of MongoDB are you using?

We are currently using Mongo v4.0.4.

You are using an old version of MongoDB. Many performance improvements have been made to Change Streams over the releases, so upgrading to the latest version would help if you can manage it. https://www.mongodb.com/docs/manual/administration/change-streams-production-recommendations/#change-stream-optimization

Unfortunately, that’s not an option in the short term, though it is on our radar. That said, we don’t believe the issue is with MongoDB. When we open a change stream locally, we see an order of magnitude more updates coming through than we see flowing through the Kafka Connector.

To add to the above, we are not seeing issues with CPU, memory, or network utilization on MongoDB. We also don’t see significant replication lag (usually <1sec).
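
For anyone wanting to repeat the local change stream comparison described above, here is a minimal sketch in Python using pymongo. It is not the code used in this thread: the function names, connection string, and database and collection names are all assumptions for illustration. It counts change events for a fixed window and reports events per second, which can then be compared with the rate of records arriving on the Kafka topic.

```python
import time
from typing import Callable, Iterable


def count_events(events: Iterable, seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> float:
    """Consume `events` for roughly `seconds` and return events per second.

    The elapsed time is only checked after each event, so on a blocking
    stream the window ends with the first event after `seconds` has passed.
    """
    start = clock()
    seen = 0
    for _ in events:
        seen += 1
        elapsed = clock() - start
        if elapsed >= seconds:
            return seen / elapsed
    # The iterable ended before the window elapsed.
    elapsed = clock() - start
    return seen / elapsed if elapsed > 0 else 0.0


def change_stream_rate(uri: str, database: str, collection: str,
                       seconds: float = 60.0) -> float:
    """Open a change stream on one collection and measure its event rate.

    Hypothetical helper: requires pymongo and a replica set deployment.
    """
    from pymongo import MongoClient  # imported lazily so count_events has no dependency

    client = MongoClient(uri)
    with client[database][collection].watch() as stream:
        return count_events(stream, seconds)


# Example call (placeholder values, not from this thread):
# rate = change_stream_rate("mongodb://localhost:27017", "exampledb", "examplecoll")
# print(f"{rate:.1f} change events per second")
```

Running this during a high-load period and comparing the result with the connector's output rate gives a rough sense of how far the connector is falling behind.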

We added JMX metrics support in 1.9: https://www.mongodb.com/docs/kafka-connector/current/monitoring/

The documentation goes into detail on the different metrics to monitor for performance.

Check the Kafka Connect logs for errors/warnings as well.

Are you in a sharded environment?

Upgrading to v1.9 was a thought we had, too. Unfortunately, it’s not working well for us; see the thread “MongoDB Kafka Connector Logs and Metrics not Showing After Upgrading to 1.9.1”.

Our MongoDB instance is not sharded. Our Kafka Connectors are deployed in Kubernetes and spread across 6 pods.

We were able to solve this. We’re not sure which change fixed it, but we modified the linger.ms and batch.size settings for the connector. We also found a very inefficient process that was putting considerable load on MongoDB every couple of minutes, and we improved that system.
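
As an illustration of the linger.ms and batch.size change, the sketch below builds a source connector config with per-connector producer overrides. The thread does not say which values were used or how they were applied, so the numbers, connection details, and helper name here are assumptions. The `producer.override.` prefix is standard Kafka Connect behaviour, and it only takes effect if the worker's `connector.client.config.override.policy` permits it (for example `All`).

```python
import json


def with_producer_overrides(config: dict, linger_ms: int, batch_size: int) -> dict:
    """Return a copy of `config` with producer batching overrides added."""
    updated = dict(config)
    # Kafka Connect passes "producer.override.*" keys to this connector's producer.
    updated["producer.override.linger.ms"] = str(linger_ms)
    updated["producer.override.batch.size"] = str(batch_size)
    return updated


# Placeholder connector settings, not the poster's actual config.
base = {
    "connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
    "connection.uri": "mongodb://localhost:27017",
    "database": "exampledb",
    "collection": "examplecoll",
}

# Hypothetical values: wait up to 50 ms to fill batches of up to 64 KiB.
tuned = with_producer_overrides(base, linger_ms=50, batch_size=65536)
print(json.dumps(tuned, indent=2))
```

The printed JSON can be used as the `config` body when creating or updating the connector through the Kafka Connect REST API. Larger batches and a longer linger generally trade a little per-record latency for higher throughput, so the right values depend on the workload.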

I also posted an update on the JMX metrics thread.
