We have set up a Kafka cluster via AWS MSK.
We have added the MongoDB source connector - https://www.mongodb.com/docs/kafka-connector/current/source-connector/
We have found an interesting configuration option. https://www.mongodb.com/docs/kafka-connector/current/source-connector/configuration-properties/mongodb-connection/#std-label-source-configuration-mongodb-connection
database - "Name of the database to watch for changes. If not set, the connector watches all databases for changes."
collection - "Name of the collection in the database to watch for changes. If not set, the connector watches all collections for changes."
We are interested in streaming all of the data into our Kafka cluster so that we can consume it later.
How should we best architect this?
It does not make sense to have all the data in a single topic. But if we omit the database and collection configuration options, we still have to supply a topic, and we are not sure what value the topic should take in order to separate the data.
Does it make sense to split the data into separate topics named "$database_name.$collection_name"? For example, if we had 2 databases with 5 collections each, we'd have 10 topics in total.
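To make the proposed layout concrete, here is a small sketch of the naming scheme (the database and collection names are hypothetical placeholders):

```python
# Sketch of the proposed "$database_name.$collection_name" topic layout.
# Database and collection names below are hypothetical placeholders.
databases = {
    "inventory": ["products", "orders", "customers", "suppliers", "audit"],
    "analytics": ["events", "sessions", "pageviews", "clicks", "errors"],
}

# One Kafka topic per (database, collection) pair.
topics = [f"{db}.{coll}" for db, colls in databases.items() for coll in colls]

print(len(topics))  # 2 databases x 5 collections = 10 topics
print(topics[0])    # inventory.products
```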
Is there some way to automate the creation of this topic layout?
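For reference, this is roughly the connector configuration we have been experimenting with (property names are taken from the MongoDB source connector docs; the connection string and prefix are placeholders). If we read the docs correctly, omitting database and collection while setting topic.prefix should publish change events to topics named like "$prefix.$database_name.$collection_name", but we are not certain this is the intended way to get one topic per collection:

```json
{
  "name": "mongodb-source",
  "config": {
    "connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
    "connection.uri": "mongodb+srv://<user>:<password>@<cluster>/",
    "topic.prefix": "mongo",
    "topic.separator": "."
  }
}
```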