Our use case is pretty simple - we have a sharded MongoDB cluster with multiple shards and replicas. Currently, we watch for changes by calling .watch() on a connection to the mongos, and stream those changes into other parts of our data pipeline.
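For reference, a minimal sketch of this kind of setup, assuming the Python driver (pymongo). The mongos address, the database and collection names, and the `forward_changes` helper are all hypothetical placeholders, not our real code:

```python
from typing import Any, Callable, Dict


def forward_changes(collection: Any, handle: Callable[[Dict[str, Any]], None]) -> None:
    """Open a change stream on `collection` and pass every event to `handle`.

    With a real pymongo collection this loop blocks and keeps waiting
    for new events until the stream is closed or an error is raised.
    """
    with collection.watch() as stream:
        for change in stream:
            handle(change)


def main() -> None:
    # pymongo is a third-party driver, imported lazily so the helper above
    # can be exercised without it.
    from pymongo import MongoClient

    # Hypothetical mongos address: connect to the router, not to a single shard.
    client = MongoClient("mongodb://mongos.example.internal:27017")
    # Hypothetical database and collection names; `print` stands in for
    # whatever pushes events into the rest of the pipeline.
    forward_changes(client["appdb"]["events"], print)
```

Calling `main()` would start the loop against a live cluster; the handler is where events get handed to the downstream pipeline.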
We are using MongoDB 4.2 Community.
When we added a new shard (because our data keeps growing), I saw an error “Error on remote shard mongoprodnew:27020 :: caused by :: Resume of change stream was not possible, as the resume point may no longer be in the oplog.le, as the resu…” (I guess the end of the message was truncated). Our replication script crashed, and the whole feature went down with it.
I tried both the resumeAfter and startAtOperationTime params to set the starting point. Both produced the same error, “Resume of change stream was not possible” - but hey, I don’t need to resume, just re-create it for me please?
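To make the "just re-create it" idea concrete, here is a rough sketch under some assumptions: it uses pymongo's keyword names `resume_after` and `start_at_operation_time` (the driver's spelling of the two params above), it assumes the error is raised when the stream is opened, and it detects the error by the message text quoted above. The `resume_kwargs` and `open_stream` helpers are hypothetical names. Note that the fallback simply starts from "now", so any events between the old resume point and the new stream are skipped:

```python
from typing import Any, Callable, Dict, Optional

# Message fragment taken from the error quoted above.
RESUME_ERROR = "Resume of change stream was not possible"


def resume_kwargs(resume_token: Optional[Dict[str, Any]] = None,
                  operation_time: Any = None) -> Dict[str, Any]:
    """Build keyword arguments for watch(); the two options are mutually exclusive."""
    if resume_token is not None and operation_time is not None:
        raise ValueError("pass either a resume token or an operation time, not both")
    if resume_token is not None:
        return {"resume_after": resume_token}
    if operation_time is not None:
        return {"start_at_operation_time": operation_time}
    return {}


def open_stream(watch: Callable[..., Any],
                resume_token: Optional[Dict[str, Any]] = None,
                operation_time: Any = None) -> Any:
    """Try to resume; if the resume point is gone, open a fresh stream instead."""
    kwargs = resume_kwargs(resume_token, operation_time)
    try:
        return watch(**kwargs)
    except Exception as exc:
        # With pymongo one would normally catch pymongo.errors.OperationFailure;
        # a broad except keeps this sketch free of the driver import.
        if not kwargs or RESUME_ERROR not in str(exc):
            raise
        # Fallback: no resume point at all, so events in the gap are lost.
        return watch()
```

With a real collection this would be called as `open_stream(collection.watch, resume_token=saved_token)`, where `saved_token` is whatever resume token was last persisted.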
So whenever we need to add or replace a shard now, we have to completely stop the whole logical replication process, add the shard, wait until it has fetched its data chunks, and then start the replication again. What’s even worse, we can’t really write anything into the DB in the meantime, because those changes would be lost - we can’t resume from a point in the past, from before the new shard was really up and running.
Is there any way to add or replace a shard without this unpleasant downtime?