We have set up a Kafka cluster via AWS MSK.
We have added the MongoDB source connector - https://www.mongodb.com/docs/kafka-connector/current/source-connector/
We have found an interesting configuration option. https://www.mongodb.com/docs/kafka-connector/current/source-connector/configuration-properties/mongodb-connection/#std-label-source-configuration-mongodb-connection
database - "Name of the database to watch for changes. If not set, the connector watches all databases for changes."
collection - "Name of the collection in the database to watch for changes. If not set, the connector watches all collections for changes."
We are interested in streaming all of the data into our Kafka cluster so that we can consume it later.
How should we best architect this?
It does not make sense to have all the data in a single topic. But if we omit the database and collection configuration options, we still have to supply a topic, and we are not sure what value the topic should take in order to separate the data.
Does it make sense to split the data into separate topics named "$database_name.$collection_name"? For example, if we had 2 databases with 5 collections each, we'd have 10 topics in total.
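To make the proposed layout concrete, here is a small sketch of the naming scheme (the database and collection names are hypothetical placeholders):

```python
# Sketch of the proposed "$database_name.$collection_name" topic layout.
# Database and collection names below are hypothetical placeholders.
databases = {
    "inventory": ["products", "orders", "customers", "suppliers", "audit"],
    "analytics": ["events", "sessions", "pageviews", "clicks", "errors"],
}

# One Kafka topic per (database, collection) pair.
topics = [f"{db}.{coll}" for db, colls in databases.items() for coll in colls]

print(len(topics))  # 2 databases x 5 collections = 10 topics
print(topics[0])    # inventory.products
```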
Is there some way to automate the creation of this topic layout?
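For reference, this is roughly the connector configuration we have been experimenting with (property names are taken from the MongoDB source connector docs; the connection string and prefix are placeholders). If we read the docs correctly, omitting database and collection while setting topic.prefix should publish change events to topics named like "$prefix.$database_name.$collection_name", but we are not certain this is the intended way to get one topic per collection:

```json
{
  "name": "mongodb-source",
  "config": {
    "connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
    "connection.uri": "mongodb+srv://<user>:<password>@<cluster>/",
    "topic.prefix": "mongo",
    "topic.separator": "."
  }
}
```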