In this guide, you can learn how to apply schemas to incoming documents in a MongoDB Kafka source connector.
There are two types of schema in Kafka Connect: key schemas and value schemas. Kafka Connect sends messages to Apache Kafka that contain both a key and a value. A key schema enforces a structure for keys in messages sent to Apache Kafka. A value schema enforces a structure for values in messages sent to Apache Kafka.
Note on Terminology
The word "key" has a slightly different meaning in BSON and in Apache Kafka. In BSON, a "key" is a unique string identifier for a field in a document.
In Apache Kafka, a "key" is a byte array sent in a message that determines which partition of a topic the message is written to. Kafka keys can be duplicates of other keys or null.
Specifying schemas in the connector is optional, and you can specify any of the following combinations of schemas:
Only a value schema
Only a key schema
Both a value and key schema
Benefits of Schema
To see a discussion on the benefits of using schemas with Kafka Connect, see this article from Confluent.
If you want to send data through Apache Kafka with a specific data format, such as Apache Avro or JSON Schema, see the Converters guide.
To learn more about keys and values in Apache Kafka, see the official Apache Kafka introduction.
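The connector provides two default schemas: a key schema for the _id field of change event documents, and a value schema for change event documents.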
To learn more about change events, see our guide on change streams.
To view the default schemas, see the MongoDB Kafka Connector source code.
The connector provides a default key schema for the _id field of change event documents. You should use the default key schema unless you remove the _id field from your change event documents using either of the transformations described in the schemas for transformed documents section of this guide.
If you specify either of these transformations and want to use a key schema for your incoming documents, you must specify a key schema as described in the specify a schema section of this guide.
You can enable the default key schema with the following option:
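As a minimal sketch, assuming the output.format.key property from the connector's source configuration reference, the setting could look like the following. Leaving output.schema.key unset is what lets the connector fall back to its default key schema.

```properties
# Emit message keys with a schema. With no output.schema.key set,
# the connector applies its default key schema for the _id field.
output.format.key=schema
```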
The connector provides a default value schema for change event documents. You should use the default value schema unless you transform your change event documents as described in the schemas for transformed documents section of this guide.
If you specify either of these transformations and want to use a value schema for your incoming documents, you must use one of the mechanisms described in the schemas for transformed documents section of this guide.
You can enable the default value schema with the following option:
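As a minimal sketch, assuming the output.format.value property from the connector's source configuration reference, the setting could look like the following. Leaving output.schema.value unset is what lets the connector fall back to its default value schema.

```properties
# Emit message values with a schema. With no output.schema.value set,
# the connector applies its default value schema for change events.
output.format.value=schema
```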
There are two ways you can transform your change event documents in a source connector:
The publish.full.document.only option set to true
An aggregation pipeline that modifies the structure of change event documents
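The following fragment sketches what such transformations might look like in a source connector configuration. It assumes the publish.full.document.only and pipeline properties from the connector's change stream settings. The $addFields stage and the source field name are hypothetical and only illustrate a pipeline that changes the document structure.

```properties
# Publish only the fullDocument field of each change event
# instead of the complete change event document.
publish.full.document.only=true

# Reshape change events with an aggregation pipeline.
# The stage and the "source" field are hypothetical examples.
pipeline=[{"$addFields": {"fullDocument.source": "inventory-db"}}]
```

Either setting changes the shape of the documents the connector publishes, which is why the default schemas may no longer match them.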
If you transform your MongoDB change event documents, you must do one of the following to apply schemas: specify schemas, or have the connector infer a value schema.
To learn more about the preceding configuration options, see the Change Stream Properties page.
You can specify schemas for incoming documents using Avro schema syntax. You can specify a schema for document values, for document keys, or for both.
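A hedged sketch of both cases follows, assuming the output.schema.value and output.schema.key properties from the connector's source configuration reference. The record names and the name and visits fields are hypothetical, and the example assumes the published documents expose those fields at the top level, for instance because publish.full.document.only is set to true.

```properties
# Value schema: a hypothetical Avro record with two fields.
output.format.value=schema
output.schema.value={"type": "record", "name": "Customer", "fields": [{"name": "name", "type": "string"}, {"name": "visits", "type": "int"}]}

# Key schema: a hypothetical Avro record keyed on a single string field.
output.format.key=schema
output.schema.key={"type": "record", "name": "CustomerKey", "fields": [{"name": "name", "type": "string"}]}
```

Each schema is written on a single line here because a properties value ends at the line break unless the line is continued with a trailing backslash.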
To view an example that demonstrates how to specify a schema, see the Specify a Schema usage example.
To learn more about Avro Schema, see the Data Formats guide.
If you want to send your data through Apache Kafka with Avro binary encoding, you must use an Avro converter. For more information, see the guide on Converters.
You can have your source connector infer a schema for incoming documents. This option works well for development and for data sources that do not frequently change structure, but for most production deployments we recommend that you specify a schema.
You can have the connector infer a schema by specifying the following options:
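As a minimal sketch, assuming the output.schema.infer.value property from the connector's source configuration reference, value schema inference could be enabled like this:

```properties
# Emit message values with a schema and let the connector infer
# that schema from each incoming document.
output.format.value=schema
output.schema.infer.value=true
```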
The source connector can infer schemas for incoming documents that contain nested documents stored in arrays. Starting in version 1.9 of the connector, schema inference gathers the appropriate data type for fields instead of defaulting to a string type assignment if there are differences between nested documents described by the following cases:
A field is present in one document but missing in another.
A field is present in one document but null in another.
A field is an array with elements of any type in one document but has additional elements or elements of other data types in another.
A field is an array with elements of any type in one document but an empty array in another.
If field types conflict between nested documents, the connector pushes the conflict down to the schema for the field and defaults to a string type assignment.
Cannot Infer Key Schema
The connector does not support key schema inference. If you want to use a key schema and transform your MongoDB change event documents, you must specify a key schema as described in the specify schemas section of this guide.