Empty collection issue

Hello,

I have noticed that whenever I use "com.mongodb.kafka.connect.sink.cdc.mongodb.ChangeStreamHandler" in a MongoDB sink connector config to move data from a topic to a MongoDB collection, everything works fine at first: I see the expected document count in the collection. But after several days the collection ends up empty, with 0 documents. Can anyone explain why the target collection suddenly becomes empty? Is it related to the message retention time of the topic? How can I prevent the collection from becoming empty?

Thank you.

I am not very familiar with the kafka connector.

I would be surprised if the message retention time within Kafka is directly responsible for emptying the collection in MongoDB.

But it might be indirectly, if you also created a TTL index within MongoDB.

Please share the indexes you have on the collection.
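To check for a TTL index yourself, you can list the indexes on each destination collection: a TTL index is any index definition that carries an `expireAfterSeconds` field, and mongod's TTL monitor deletes documents once they pass that age. A minimal mongosh sketch (run against the destination cluster; the collection names are taken from the counts shown later in this thread):

```javascript
// List indexes and flag any TTL index (one with "expireAfterSeconds").
for (const coll of ["recipes", "meals"]) {
  db.getCollection(coll).getIndexes().forEach((idx) => {
    if ("expireAfterSeconds" in idx) {
      print(`${coll}: TTL index found -> ${JSON.stringify(idx)}`);
    }
  });
}
```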

Hi @steevej

Thanks a lot for your answer. This is the config I have (I had to redo the entire config).

In my
[SOURCE CLUSTER]
mongos> db.recipes.countDocuments({})
10720
mongos> db.meals.countDocuments({})
3983

and my

[DESTINATION CLUSTER]

db.recipes.countDocuments({})
10720

db.meals.countDocuments({})
3983

So everything looks pretty good now: the document counts match for those collections in both clusters. But if I wait a while, maybe until the next day, the recipes and meals collections in my [DESTINATION CLUSTER] end up with 0 documents.

This is the sink configuration I have:

"config": {
  "connector.class": "com.mongodb.kafka.connect.MongoSinkConnector",
  "confluent.topic.bootstrap.servers": "boot:9092",
  "tasks.max": "1",
  "database": "DEV",
  "topics.regex": "prefix\\.DEV.*",
  "topic.override.prefix.DEV.meals.collection": "meals",
  "topic.override.prefix.DEV.recipes.collection": "recipes",
  "change.data.capture.handler": "com.mongodb.kafka.connect.sink.cdc.mongodb.ChangeStreamHandler"
},

Those collections are automatically created by the sink connector, with no index besides the default _id one.
I am not sure why the collections become empty after a certain time.
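Since the emptying happens "after a certain time", it may help to pin down exactly when it occurs. A sketch of a hypothetical helper for mongosh on the destination cluster (the function name and schedule are my own suggestion, not something from the connector):

```javascript
// Record timestamped document counts so the moment the collections
// are emptied can be narrowed down.
function logCounts() {
  const now = new Date().toISOString();
  for (const coll of ["recipes", "meals"]) {
    print(`${now} ${coll}: ${db.getCollection(coll).countDocuments({})}`);
  }
}
```

Calling `logCounts()` every few hours (or scheduling it externally) would show whether the counts drop gradually (suggesting TTL-style deletes) or all at once (suggesting a drop or an external process).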

Thank you.

I have added kafka-connector to the thread tags in the hope that someone more savvy with Kafka will jump in.

That is because I can only see a limited number of reasons why a collection becomes empty:

  1. By mongod itself, using a TTL index
  2. By human intervention
  3. By an external process, which could be Kafka; that's where the savvy Kafkanian can help

I could also see a situation where your destination cluster nodes are running in containers without permanent storage, and for some reason they get restarted and come back empty because of the lack of permanent storage.

To help find out, you could change the credentials of all database users with write access to the 2 collections once you have some data. The culprit will barf out.

If nothing barfs out, it is a TTL index. If it is a human, you will be called or emailed. If it is another process, hopefully its logs will give some clues.
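Another way to gather clues without rotating credentials: MongoDB change streams report `delete` and `drop` events, so you can watch the destination database and see exactly when, and in what form, the data disappears (TTL deletions show up as ordinary `delete` events). A sketch for mongosh, assuming the destination cluster is a replica set or sharded cluster, which change streams require:

```javascript
// Watch the destination database for deletes and drops on the two
// collections; print each event with its cluster time.
const cursor = db.watch([
  { $match: {
      operationType: { $in: ["delete", "drop"] },
      "ns.coll": { $in: ["recipes", "meals"] }
  } }
]);
while (cursor.hasNext()) {
  const ev = cursor.next();
  print(`${ev.clusterTime} ${ev.operationType} on ${ev.ns.db}.${ev.ns.coll}`);
}
```

A burst of `delete` events points at TTL or an external process issuing deletes, while a single `drop` event points at something dropping the collection outright (note a `drop` also invalidates the change stream, so the watcher ends there).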