
Specify a Schema

This usage example shows how to configure your MongoDB Kafka source connector to apply a custom schema to your data. A schema is a definition that specifies the structure and data types of the data in an Apache Kafka topic. Use a schema when you must ensure that the data your source connector publishes to the topic has a consistent structure.

To learn more about using schemas with the connector, see the Apply Schemas guide.

Example

Suppose your application keeps track of customer data in a MongoDB collection, and you want to publish this data to a Kafka topic. You want the subscribers of the customer data to receive consistently formatted data. You choose to apply a schema to your data.

Your requirements and your solutions are as follows:

Requirement	Solution
Receive customer data from a MongoDB collection	Configure a MongoDB source connector to receive updates to data from a specific database and collection. See Receive Data from a Collection.
Provide the customer data schema	Specify a schema that corresponds to the structure and data types of the customer data. See Create a Custom Schema.
Omit change event metadata from the customer data	Include only the data from the `fullDocument` field. See Omit Metadata from Published Records.

For the full configuration file that meets the requirements above, see Specify the Configuration.

Receive Data from a Collection

To configure your source connector to receive data from a MongoDB collection, specify the database and collection name. For this example, you can configure the connector to read from the purchases collection in the customers database as follows:

database=customers
collection=purchases

Create a Custom Schema

A sample customer data document from your collection contains the following information:

{
  "name": "Zola",
  "visits": [
    {
      "$date": "2021-07-25T17:30:00.000Z"
    },
    {
      "$date": "2021-10-03T14:06:00.000Z"
    }
  ],
  "goods_purchased": {
    "apples": 1,
    "bananas": 10
  }
}

From the sample document, you decide your schema should present the fields using the following data types:

Field name	Data type	Description
name	string	Name of the customer
visits	array of timestamps	Dates the customer visited
goods_purchased	map with string keys (Avro map keys are always strings) and integer values	Names of goods and quantity of each item the customer purchased
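
To make the mapping in the table concrete, the following Python sketch converts the sample document into values of those types. It assumes the `$date` strings are UTC timestamps in the exact format shown above. The helper names `date_to_epoch_millis` and `to_typed_customer` are hypothetical and are not part of the connector.

```python
import json
from datetime import datetime, timedelta, timezone

# The sample customer document from this page, in Extended JSON.
SAMPLE = """
{
  "name": "Zola",
  "visits": [
    {"$date": "2021-07-25T17:30:00.000Z"},
    {"$date": "2021-10-03T14:06:00.000Z"}
  ],
  "goods_purchased": {"apples": 1, "bananas": 10}
}
"""

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def date_to_epoch_millis(ext_date: dict) -> int:
    """Convert {"$date": "<UTC ISO-8601 string>"} to milliseconds since the epoch."""
    parsed = datetime.strptime(ext_date["$date"], "%Y-%m-%dT%H:%M:%S.%fZ")
    parsed = parsed.replace(tzinfo=timezone.utc)
    # Integer division by a timedelta avoids floating-point rounding.
    return (parsed - EPOCH) // timedelta(milliseconds=1)


def to_typed_customer(doc: dict) -> dict:
    """Shape a document like the sample into the types listed in the table."""
    return {
        "name": str(doc["name"]),                                   # string
        "visits": [date_to_epoch_millis(v) for v in doc["visits"]],  # array of timestamps
        "goods_purchased": {                                        # map of string -> int
            str(k): int(v) for k, v in doc["goods_purchased"].items()
        },
    }


customer = to_typed_customer(json.loads(SAMPLE))
print(customer)
```

Representing each visit as milliseconds since the epoch matches the `timestamp-millis` logical type, which annotates a `long`.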

You can describe your data using the Apache Avro schema format as shown in the example schema below:

{
  "type": "record",
  "name": "Customer",
  "fields": [{
      "name": "name",
      "type": "string"
    },{
      "name": "visits",
      "type": {
        "type": "array",
        "items": {
          "type": "long",
          "logicalType": "timestamp-millis"
        }
      }
    },{
      "name": "goods_purchased",
      "type": {
        "type": "map",
        "values": "int"
      }
    }
  ]
}
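
As a rough illustration of what this schema requires of a record, the Python sketch below checks a value against it. The `conforms` function is a hypothetical, hand-written checker that covers only the Avro types used on this page (`record`, `array`, `map`, `string`, `int`, `long`). It is not an Avro library, and it ignores integer range limits and logical-type semantics.

```python
import json

# The example schema from this page.
SCHEMA = json.loads("""
{
  "type": "record",
  "name": "Customer",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "visits",
     "type": {"type": "array",
              "items": {"type": "long", "logicalType": "timestamp-millis"}}},
    {"name": "goods_purchased",
     "type": {"type": "map", "values": "int"}}
  ]
}
""")


def conforms(value, schema) -> bool:
    """Return True if value matches the small Avro subset used in this example."""
    if isinstance(schema, str):          # primitive given by name, e.g. "int"
        schema = {"type": schema}
    kind = schema["type"]
    if isinstance(kind, dict):           # nested definition, as in the field entries
        return conforms(value, kind)
    if kind == "string":
        return isinstance(value, str)
    if kind in ("int", "long"):
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, int) and not isinstance(value, bool)
    if kind == "array":
        return isinstance(value, list) and all(
            conforms(item, schema["items"]) for item in value
        )
    if kind == "map":
        return isinstance(value, dict) and all(
            isinstance(k, str) and conforms(v, schema["values"])
            for k, v in value.items()
        )
    if kind == "record":
        return isinstance(value, dict) and all(
            f["name"] in value and conforms(value[f["name"]], f)
            for f in schema["fields"]
        )
    raise ValueError(f"type not covered by this sketch: {kind!r}")


record = {
    "name": "Zola",
    "visits": [1627234200000],           # milliseconds since the epoch
    "goods_purchased": {"apples": 1, "bananas": 10},
}
print(conforms(record, SCHEMA))
print(conforms({**record, "visits": ["2021-07-25"]}, SCHEMA))
```

The first call prints `True`. The second prints `False` because a date string is not a `long`.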

Important

Converters

If you want to send your data through Apache Kafka with Avro binary encoding, you must use an Avro converter. For more information, see the guide on Converters.

Omit Metadata from Published Records

By default, the connector publishes each customer data document to a Kafka topic together with metadata that describes the document. You can configure the connector to publish only the document data contained in the fullDocument field of the record by using the following setting:

publish.full.document.only=true

For more information on the fullDocument field, see the Change Streams guide.
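
The effect of this setting can be pictured with a short Python sketch. The event below is a hand-written approximation of an insert change event, with placeholder values for the resume token and document ID. The `published_value` function is a hypothetical stand-in for the connector's behavior, not its actual implementation.

```python
def published_value(change_event: dict, full_document_only: bool) -> dict:
    """Return what would be published for one change event under this sketch."""
    if full_document_only:
        # Keep only the document itself; drop the surrounding event metadata.
        return change_event["fullDocument"]
    return change_event


# Illustrative change event; the "_id" and "documentKey" values are placeholders.
event = {
    "_id": {"_data": "<resume token>"},
    "operationType": "insert",
    "ns": {"db": "customers", "coll": "purchases"},
    "documentKey": {"_id": "<document id>"},
    "fullDocument": {
        "name": "Zola",
        "goods_purchased": {"apples": 1, "bananas": 10},
    },
}

print(published_value(event, full_document_only=True))
```

With the flag set, subscribers see only the customer fields, which is what allows the custom schema to describe the whole message value.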

Specify the Configuration

Your custom schema connector configuration should resemble the following:

connector.class=com.mongodb.kafka.connect.MongoSourceConnector
connection.uri=<your MongoDB connection URI>
database=customers
collection=purchases
publish.full.document.only=true
output.format.value=schema
output.schema.value={\"type\": \"record\", \"name\": \"Customer\", \"fields\": [{\"name\": \"name\", \"type\": \"string\"}, {\"name\": \"visits\", \"type\": {\"type\": \"array\", \"items\": {\"type\": \"long\", \"logicalType\": \"timestamp-millis\"}}}, {\"name\": \"goods_purchased\", \"type\": {\"type\": \"map\", \"values\": \"int\"}}]}
value.converter.schemas.enable=true
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter=org.apache.kafka.connect.storage.StringConverter
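
Writing the single-line, escaped `output.schema.value` entry by hand is error-prone. The Python sketch below generates it from the schema, mirroring the quote escaping used in the configuration above. The function name `schema_property_line` is hypothetical.

```python
import json

# The example schema from this page, as a Python dictionary.
schema = {
    "type": "record",
    "name": "Customer",
    "fields": [
        {"name": "name", "type": "string"},
        {
            "name": "visits",
            "type": {
                "type": "array",
                "items": {"type": "long", "logicalType": "timestamp-millis"},
            },
        },
        {"name": "goods_purchased", "type": {"type": "map", "values": "int"}},
    ],
}


def schema_property_line(schema: dict, key: str = "output.schema.value") -> str:
    """Serialize the schema on one line and escape its double quotes."""
    one_line = json.dumps(schema)             # no newlines by default
    escaped = one_line.replace('"', '\\"')    # " becomes \"
    return f"{key}={escaped}"


line = schema_property_line(schema)
print(line)
```

Reversing the escaping and parsing the result with `json.loads` is a quick way to confirm that the generated value is still valid JSON.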

Note

Embedded Schema

In the preceding configuration, the Kafka Connect JSON converter (JsonConverter), with value.converter.schemas.enable set to true, embeds the custom schema in each of your messages. To learn more about the JSON converter, see the Converters guide.
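
When `value.converter.schemas.enable=true`, the Kafka Connect `JsonConverter` wraps each message value in an envelope with top-level `schema` and `payload` fields. The Python sketch below shows how a consumer might separate the two. The message is hand-written for illustration and its `schema` part is heavily abbreviated, so treat it as an assumption about shape only. The `unwrap` helper is hypothetical.

```python
import json


def unwrap(message_value: bytes):
    """Split a JsonConverter envelope into its schema and payload parts."""
    envelope = json.loads(message_value)
    return envelope["schema"], envelope["payload"]


# Illustrative message value; a real embedded schema lists every field.
message_value = json.dumps({
    "schema": {"type": "struct", "name": "Customer"},
    "payload": {
        "name": "Zola",
        "visits": [1627234200000],
        "goods_purchased": {"apples": 1, "bananas": 10},
    },
}).encode("utf-8")

embedded_schema, customer = unwrap(message_value)
print(customer["name"], customer["goods_purchased"])
```

Because the schema travels with every message, consumers need no separate schema lookup, at the cost of larger messages.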

For more information on specifying schemas, see the Apply Schemas guide.
