Darshan Hiranandani: Using Confluent Kafka for PostgreSQL to MongoDB Migration – Advice Needed

Hello,

I’m Darshan Hiranandani, and I’m in the process of migrating from PostgreSQL to MongoDB; I want to leverage Confluent Kafka for this task. If anyone has set up a similar migration pipeline, could you share your approach for:

Capturing real-time changes from PostgreSQL.
Streaming the data into MongoDB.
Handling potential challenges during the migration.

Any recommendations or insights would be greatly appreciated!

Thanks!
Regards
Darshan Hiranandani


Having written custom CDC migration scripts myself, here are my key learnings:

  1. Checkpointing: have the stream reader persist a checkpoint of the last-known-good event it processed, so you can resume from that point after a failure (a minimal consumer sketch follows this list).
  2. Idempotency: make sure events can be replayed with no harm to the target state. This is crucial for integrity.
  3. Event ordering is crucial. Do not parallelize consumption and insertion of events for the same table; that said, you can probably load individual collections separately and in parallel.
  4. Validation is crucial. Don’t presume that Kafka wrote an event just because you handed it over; verify it. Measure the target data and confirm the migration was successful (see the validation sketch below). Even if PG and Mongo are transactional, that doesn’t mean your process is.
  5. Observability: extensive logging, plus an immediate exit on error for the writers, helped. It keeps the target state from getting too far ahead of a failed item, which preserves resumability. If you just log and continue, you may end up with a corrupt target state.
  6. Modeling: though table → collection may work for some use cases, spending time reshaping the data to be performant and efficient for MongoDB workloads is well worth the effort. A flat table → collection copy rarely works well, as it forces excessive joins ($lookup), which is an anti-pattern (see the reshaping sketch at the end).
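To make points 1, 2, and 5 concrete, here is a minimal consumer-side sketch in Python. The topic name `pg.public.orders`, the broker address, the `shop.orders` target collection, and the Debezium-style `"after"` envelope are all placeholders/assumptions, not something from a real setup. It uses the confluent-kafka client and pymongo, writes idempotently with an upsert, commits the offset only after a successful write, and exits immediately on error.

```python
# Minimal sketch: checkpointed, idempotent, fail-fast consumer (tips 1, 2, 5).
import json
import logging
import sys

from confluent_kafka import Consumer
from pymongo import MongoClient

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pg-to-mongo")

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",   # assumption: local Confluent broker
    "group.id": "pg-to-mongo-migration",
    "enable.auto.commit": False,             # checkpoint manually (tip 1)
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["pg.public.orders"])     # assumed CDC topic name

mongo = MongoClient("mongodb://localhost:27017")  # assumption: local MongoDB
orders = mongo["shop"]["orders"]                  # assumed target db/collection

try:
    while True:
        msg = consumer.poll(1.0)
        if msg is None:
            continue
        if msg.error():
            raise RuntimeError(msg.error())       # fail fast (tip 5)

        event = json.loads(msg.value())
        row = event["after"]                      # assumed Debezium-style envelope

        # Idempotent write: replaying the same event just rewrites the same
        # document, so the target state is unharmed (tip 2).
        orders.replace_one({"_id": row["id"]}, row, upsert=True)

        # Commit the offset only after the write succeeds, so a restart
        # resumes from the last-known-good event (tip 1).
        consumer.commit(message=msg, asynchronous=False)
except Exception:
    log.exception("stopping migration; fix the error and re-run to resume")
    sys.exit(1)
finally:
    consumer.close()
```

Committing per message keeps the checkpoint tight at the cost of throughput; batching writes and commits is a reasonable optimization once the pipeline is trusted.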
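For point 4, a rough validation pass might compare row counts and spot-check a sample of primary keys. Again, the `orders` table/collection and the connection strings are assumptions for the sake of the example.

```python
# Minimal validation sketch: count comparison plus a sampled key check (tip 4).
import psycopg2
from pymongo import MongoClient

pg = psycopg2.connect("dbname=shop user=postgres host=localhost")  # assumed DSN
mongo_orders = MongoClient("mongodb://localhost:27017")["shop"]["orders"]

with pg.cursor() as cur:
    cur.execute("SELECT count(*) FROM orders")
    pg_count = cur.fetchone()[0]

    # Spot-check: a random sample of source ids must exist in the target.
    cur.execute("SELECT id FROM orders ORDER BY random() LIMIT 100")
    sample_ids = [r[0] for r in cur.fetchall()]

mongo_count = mongo_orders.count_documents({})
missing = len(sample_ids) - mongo_orders.count_documents({"_id": {"$in": sample_ids}})

print(f"postgres rows: {pg_count}, mongo docs: {mongo_count}, missing sampled ids: {missing}")
assert pg_count == mongo_count and missing == 0, "migration validation failed"
```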
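For point 6, one way to reshape during the load is to embed child rows in the parent document so reads don’t need `$lookup`. The sketch below assumes hypothetical `orders` and `order_items` tables and builds one document per order.

```python
# Minimal reshaping sketch: embed line items inside each order document (tip 6).
import psycopg2
import psycopg2.extras
from pymongo import MongoClient

pg = psycopg2.connect("dbname=shop user=postgres host=localhost")  # assumed DSN
orders_coll = MongoClient("mongodb://localhost:27017")["shop"]["orders"]  # assumed target

with pg.cursor(cursor_factory=psycopg2.extras.RealDictCursor) as cur:
    cur.execute("SELECT id, customer_id FROM orders")
    for order in cur.fetchall():
        # Pull this order's line items (hypothetical child table).
        with pg.cursor(cursor_factory=psycopg2.extras.RealDictCursor) as items_cur:
            items_cur.execute(
                "SELECT sku, quantity, price::float8 AS price "
                "FROM order_items WHERE order_id = %s",
                (order["id"],),
            )
            items = [dict(i) for i in items_cur.fetchall()]

        # One document per order with its items embedded, instead of a flat
        # copy of both tables that would need $lookup at query time.
        doc = {"_id": order["id"], "customer_id": order["customer_id"], "items": items}
        orders_coll.replace_one({"_id": doc["_id"]}, doc, upsert=True)
```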