I use a MongoDB Kafka source connector to do CDC, and I would like to transform the fullDocument of the CDC messages before saving them to the target collection. The transformation is quite complex, so I chose to do it in Kafka Streams instead of in a MongoDB aggregation pipeline, push the transformed messages to a new topic, and save them to MongoDB with a Kafka Connect sink connector.
I use Spring Cloud Stream to stream the CDC messages, and everything works fine, but deserializing the BSON document into a POJO requires “a lot” of custom code with codec registries etc. (compared to Spring Data MongoDB queries, for example). Since MongoDB seems to recommend Kafka for CDC, is there a simple, automatic way to deserialize CDC messages from topics?
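For context, the “custom code with codec registries” can look roughly like the minimal sketch below. It assumes each message value is the change event serialized as a JSON string (MongoDB Extended JSON, which `BsonDocument.parse` accepts), and it uses a hypothetical `Order` POJO and a hypothetical `FullDocumentDecoder` helper. The registry is the usual driver idiom: the default codecs plus an automatic `PojoCodecProvider`.

```java
import com.mongodb.MongoClientSettings;
import org.bson.BsonDocument;
import org.bson.BsonDocumentReader;
import org.bson.BsonValue;
import org.bson.codecs.Codec;
import org.bson.codecs.DecoderContext;
import org.bson.codecs.configuration.CodecRegistries;
import org.bson.codecs.configuration.CodecRegistry;
import org.bson.codecs.pojo.PojoCodecProvider;

public class FullDocumentDecoder {

    // Hypothetical target POJO. The automatic PojoCodecProvider maps it through the
    // public no-arg constructor and public getters/setters; fields with no matching
    // property (such as _id here) are skipped while decoding.
    public static class Order {
        private String item;
        private Integer qty;

        public Order() {}

        public String getItem() { return item; }
        public void setItem(String item) { this.item = item; }
        public Integer getQty() { return qty; }
        public void setQty(Integer qty) { this.qty = qty; }
    }

    // Driver default codecs first, then a provider that builds POJO codecs on demand.
    private static final CodecRegistry REGISTRY = CodecRegistries.fromRegistries(
            MongoClientSettings.getDefaultCodecRegistry(),
            CodecRegistries.fromProviders(PojoCodecProvider.builder().automatic(true).build()));

    // Parses one change event (JSON string read from the topic) and decodes its
    // fullDocument into an Order. Returns null when there is no fullDocument,
    // e.g. for delete events.
    public static Order decodeFullDocument(String changeEventJson) {
        BsonDocument event = BsonDocument.parse(changeEventJson);
        BsonValue fullDocument = event.get("fullDocument");
        if (fullDocument == null || fullDocument.isNull()) {
            return null;
        }
        Codec<Order> codec = REGISTRY.get(Order.class);
        return codec.decode(new BsonDocumentReader(fullDocument.asDocument()),
                DecoderContext.builder().build());
    }
}
```

Each new target type needs its own POJO (and sometimes extra codecs), which is the boilerplate the question is about.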
I’m sorry it’s taken so long for this to be answered, but in short, no: you will need to use POJOs. If you really want to help in the future, you could take the POJO approach and contribute some of your own custom code templates to get this started.
I’ve been looking at integrating Kafka with Java, as well as at using Kafka with MongoDB Device Sync, but as you’ve noticed, it takes a lot of custom code and time.
I would be happy to help improve this part. My idea was to provide a Kafka Streams Serde for the class com.mongodb.client.model.changestream.ChangeStreamDocument, so that CDC messages could be serialized and deserialized automatically through configuration. But I’m not sure whether such code could belong in a MongoDB-maintained project.
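A minimal sketch of the Serde idea, assuming the source connector writes each change event as a JSON string and delegating the BSON mapping to the Java driver’s `ChangeStreamDocument.createCodec(Class, CodecRegistry)` factory. The class `ChangeStreamDocumentSerde` and its `forFullDocument` method are hypothetical names, not an existing MongoDB or Kafka API.

```java
import com.mongodb.MongoClientSettings;
import com.mongodb.client.model.changestream.ChangeStreamDocument;
import java.nio.charset.StandardCharsets;
import org.apache.kafka.common.serialization.Deserializer;
import org.apache.kafka.common.serialization.Serde;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.serialization.Serializer;
import org.bson.BsonDocument;
import org.bson.BsonDocumentReader;
import org.bson.BsonDocumentWriter;
import org.bson.codecs.Codec;
import org.bson.codecs.DecoderContext;
import org.bson.codecs.EncoderContext;
import org.bson.codecs.configuration.CodecRegistries;
import org.bson.codecs.configuration.CodecRegistry;
import org.bson.codecs.pojo.PojoCodecProvider;

// Hypothetical helper: builds a Kafka Serde for change events whose fullDocument
// is decoded into the given class.
public final class ChangeStreamDocumentSerde {

    private ChangeStreamDocumentSerde() {}

    public static <T> Serde<ChangeStreamDocument<T>> forFullDocument(Class<T> fullDocumentClass) {
        // Driver default codecs plus automatic POJO codecs for the fullDocument type.
        CodecRegistry registry = CodecRegistries.fromRegistries(
                MongoClientSettings.getDefaultCodecRegistry(),
                CodecRegistries.fromProviders(PojoCodecProvider.builder().automatic(true).build()));
        Codec<ChangeStreamDocument<T>> codec =
                ChangeStreamDocument.createCodec(fullDocumentClass, registry);

        // Encode the event to a BsonDocument, then emit it as Extended JSON bytes.
        Serializer<ChangeStreamDocument<T>> serializer = (topic, event) -> {
            if (event == null) {
                return null;
            }
            BsonDocument out = new BsonDocument();
            codec.encode(new BsonDocumentWriter(out), event, EncoderContext.builder().build());
            return out.toJson().getBytes(StandardCharsets.UTF_8);
        };

        // Parse the JSON bytes into a BsonDocument and decode it with the driver codec.
        Deserializer<ChangeStreamDocument<T>> deserializer = (topic, bytes) -> {
            if (bytes == null) {
                return null;
            }
            BsonDocument in = BsonDocument.parse(new String(bytes, StandardCharsets.UTF_8));
            return codec.decode(new BsonDocumentReader(in), DecoderContext.builder().build());
        };

        return Serdes.serdeFrom(serializer, deserializer);
    }
}
```

In a plain Kafka Streams topology it could be supplied as, for example, `Consumed.with(Serdes.String(), ChangeStreamDocumentSerde.forFullDocument(MyPojo.class))`, where `MyPojo` is a placeholder. The serializer re-renders the event through the driver codec, so its output is not guaranteed to be byte-identical to what the connector produced.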