Hi,
We are facing performance issues with change streams, and I’m reaching out here to see if anyone can answer the following questions:
- Is there anything we have missed configuration-wise to improve the performance?
- Are there any performance enhancements coming to the product that would solve the issue described below?
For us, change streams looked like the perfect fit: a way to act on db events in a reactive manner without having to introduce Kafka or a similar product.
The driver API supports filtering change streams by collection, and the MongoDB change stream documentation also mentions this approach for retrieving events from a single collection.
The documentation lists some performance considerations, but none regarding the issue we are facing.
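For context, this is roughly how we open the per-collection streams today. A minimal pymongo sketch; the URI and the database/collection names are illustrative:

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # illustrative URI
db = client["airline"]  # hypothetical database name

# One change stream per collection, as the documentation suggests.
with db["flights"].watch() as stream:
    for change in stream:
        print(change["operationType"], change["documentKey"])
```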
Imagine the following scenario:
- You have two collections, “planes” and “flights”.
- “planes” is updated once a week, triggering a change stream event.
- “flights” is updated 1000 times a day, each update triggering a change stream event.
- You have opened two change streams, one for each collection, in order to notify a consumer on updates.
As you don’t want to miss any events, you save the resume token after each processed event, making sure you know what has been processed in case of a failure.
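A minimal sketch of that resume logic, assuming pymongo and an in-memory stand-in for the token store (in reality we persist the tokens durably):

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # illustrative URI
db = client["airline"]                             # hypothetical database name
token_store = {}  # stand-in for a durable store; we persist tokens for real

def consume(coll_name):
    # Resume from the last processed event for this collection, if any.
    with db[coll_name].watch(resume_after=token_store.get(coll_name)) as stream:
        for change in stream:
            handle_event(change)                          # app-specific work
            token_store[coll_name] = stream.resume_token  # save after each event

def handle_event(change):
    print(change["ns"]["coll"], change["operationType"])  # placeholder handler
```

On restart we call consume("planes") and consume("flights") again with the saved tokens, and the week-old “planes” token is exactly what triggers the problem below.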
Say that the last “planes” event was 7 days ago, and the last “flights” event was now.
If the app restarts (for whatever reason) it should then resume from the last processed event.
However, since the last “planes” event was 7 days ago, resuming that stream forces the database to scan the oplog from 7 days ago up until now (including all 7000 “flights” events).
This causes the CPU on the database to spike until the scan is complete.
If you scale this scenario up to 50-100 collections, a few of which generate events only rarely, it can take minutes after a restart before every underlying cursor is caught up.
As I understand it, the root cause is that the oplog is shared across the entire deployment and has no index at the collection level.
I feel that the collection filter in the API, as it works today, isn’t very practical unless you have a single collection on a single cluster.
Sure, one could skip the filtering (and the saving of a resume token per collection) - but that would result in the db sending all those events back to the client instead.
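That alternative would look something like the sketch below (again pymongo with illustrative names): a single database-level stream with a single resume token, dispatching by collection on the client:

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # illustrative URI
db = client["airline"]                             # hypothetical database name

# One stream for the whole database: one resume token and one oplog scan on
# restart - but every event in the db is shipped to the client.
with db.watch() as stream:
    for change in stream:
        coll = change["ns"]["coll"]  # dispatch client-side by collection name
        print(coll, change["operationType"])
```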
Is there anything we can do so we can keep using change streams? Because we really like them!
Or should we just drop them, since the above can’t be solved, and take the easy route with Kafka?
Thanks in advance!
Best regards
Mike