I have noticed that whenever I use “com.mongodb.kafka.connect.sink.cdc.mongodb.ChangeStreamHandler” in a MongoDB sink connector config to move data from a topic to a MongoDB collection, everything works fine at first and I see the expected document count in the collection. After several days, however, the collection ends up empty, with 0 documents. Can anyone explain why the target collection suddenly becomes empty? Is it related to the message retention time of the topic? How can I prevent the collection from becoming empty?
Those collections are created automatically by the sink connector, with no index besides the default _id one.
I am not sure why the collections become empty after a certain time.
I have added kafka-connector to the thread tags in the hope that someone more savvy with Kafka will jump in.
I can only see a limited number of reasons why a collection becomes empty:
By mongod itself, using a TTL index
By human intervention
By an external process, which could be Kafka; that’s where the savvy kafkanian can help
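For the Kafka possibility, here is a rough sketch of one check, not a confirmed diagnosis. It assumes the topic messages are change stream events serialized as JSON, each with the standard operationType field, and that the handler applies "delete" events to the target collection as its name suggests. The function name and the sample messages are made up for illustration; in practice you would feed it the message values read from your topic.

```python
import json
from collections import Counter


def count_operation_types(messages):
    """Count the operationType of each change stream event.

    `messages` is an iterable of JSON strings, one per topic message
    (assumption: the topic carries change stream events as JSON).
    """
    counts = Counter()
    for raw in messages:
        event = json.loads(raw)
        # Events without the field are grouped under "unknown".
        counts[event.get("operationType", "unknown")] += 1
    return counts


# Hypothetical sample messages, for illustration only.
sample_messages = [
    '{"operationType": "insert", "documentKey": {"_id": 1}}',
    '{"operationType": "insert", "documentKey": {"_id": 2}}',
    '{"operationType": "delete", "documentKey": {"_id": 1}}',
]

operation_counts = count_operation_types(sample_messages)
print(dict(operation_counts))
```

If the count of "delete" events on the real topic roughly matches the number of documents that vanished, the deletes are probably coming from the source side and being replayed into the target collection.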
I could also see a situation where your destination cluster nodes are running in containers without persistent storage and, for some reason, they are restarted and come back empty because of the lack of persistent storage.
To help find out, you may change the credentials of all database users with write access to the 2 collections once you have some data. The culprit will barf out.
If nothing barfs out, it is a TTL index. If it is a human, you will be called or emailed. If it is another process, hopefully its logs will give some clues.
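The TTL case can also be checked directly instead of by elimination. A minimal sketch, assuming you use PyMongo: Collection.index_information() returns a dict keyed by index name, and a TTL index carries an expireAfterSeconds option. The helper name, the sample data, and the database and collection names in the comment are hypothetical.

```python
def find_ttl_indexes(index_info):
    """Return {index_name: expireAfterSeconds} for every TTL index.

    `index_info` has the shape returned by PyMongo's
    Collection.index_information().
    """
    return {
        name: spec["expireAfterSeconds"]
        for name, spec in index_info.items()
        if "expireAfterSeconds" in spec
    }


# Hypothetical sample in the shape index_information() returns.
sample_index_info = {
    "_id_": {"v": 2, "key": [("_id", 1)]},
    "createdAt_1": {"v": 2, "key": [("createdAt", 1)], "expireAfterSeconds": 3600},
}

ttl_indexes = find_ttl_indexes(sample_index_info)
print(ttl_indexes)

# Against a real deployment (names are placeholders):
#   from pymongo import MongoClient
#   coll = MongoClient("mongodb://localhost:27017")["mydb"]["mycoll"]
#   print(find_ttl_indexes(coll.index_information()))
```

An empty result on the target collections rules out the TTL explanation there, which fits with the collections having only the default index.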