We are seeing significant latency with our source connector during high-load periods. These delays can exceed an hour. We are currently using v1.9.1, but we recently upgraded from v1.6 and noticed the issue on both versions. We believe we’ve isolated the issue to the connector and not the MongoDB instance, Kafka cluster, or network.
I can share the config if desired, but we’ve played around with several of the attributes, so it may be more helpful to start with which attributes we could modify to help with this scenario. I can then share the current and previous values we’ve set.
Unfortunately that’s not an option for the short term, though it is on our radar. That being said, we don’t believe the issue is with MongoDB. When we open a change stream locally, we can see an order of magnitude more updates coming through than we see flowing through the Kafka Connector.
To add to the above, we are not seeing issues with CPU, memory, or network utilization on MongoDB. We also don’t see significant replication lag (usually under 1 second).
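For anyone wanting to repeat the local change stream comparison, here is a minimal sketch of how the event rate could be measured with PyMongo. The function names, URI, database, and collection are placeholders and not taken from our setup; the resulting number would be compared against the record rate the connector reports.

```python
import time
from typing import Any, Callable, Iterable


def measure_event_rate(
    events: Iterable[Any],
    window_s: float,
    clock: Callable[[], float] = time.monotonic,
) -> float:
    """Consume events for roughly window_s seconds and return events per second.

    The window is only checked when an event arrives, so on a quiet
    stream the call can run longer than window_s.
    """
    start = clock()
    count = 0
    for _ in events:
        count += 1
        elapsed = clock() - start
        if elapsed >= window_s:
            return count / elapsed
    # The iterable ended before the window closed.
    elapsed = clock() - start
    return count / elapsed if elapsed > 0 else 0.0


def watch_rate(uri: str, db: str, coll: str, window_s: float = 60.0) -> float:
    """Open a change stream on one collection and measure its event rate.

    uri, db, and coll are placeholders; change streams need a replica
    set or sharded cluster.
    """
    from pymongo import MongoClient  # imported lazily so the helper above has no dependency

    client = MongoClient(uri)
    with client[db][coll].watch() as stream:
        return measure_event_rate(stream, window_s)
```

Calling something like `watch_rate("mongodb://localhost:27017", "mydb", "mycoll")` during a busy period gives a rough events-per-second figure for the collection.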
We were able to solve this. We’re not sure which change did it, but we modified the linger.ms and batch.size settings for the connector. We also found a very inefficient process that was putting quite a bit of load on MongoDB every couple of minutes, and we improved that system.
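As a rough illustration of that kind of change, the sketch below merges producer-level overrides into a connector config before it is submitted to Kafka Connect. It assumes the settings are applied through the `producer.override.` prefix, which only works if the worker's `connector.client.config.override.policy` permits it. The values are made up, and `batch.size` here means the Kafka producer setting; the MongoDB source connector also has its own `batch.size` property for the change stream cursor, which is a different knob.

```python
from typing import Dict

# Hypothetical values for illustration only; not the ones used in this thread.
PRODUCER_OVERRIDES: Dict[str, str] = {
    "producer.override.linger.ms": "50",       # wait up to 50 ms to fill a batch
    "producer.override.batch.size": "262144",  # producer batch size in bytes
}


def apply_overrides(config: Dict[str, str], overrides: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of the connector config with the overrides applied."""
    merged = dict(config)
    merged.update(overrides)
    return merged
```

Changing one setting at a time and watching the connector's throughput after each change makes it easier to tell which one actually helped.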
I also posted an update on the JMX metrics thread.