MongoDB Atlas Stream Processing now supports Apache Avro serialization when integrated with the Confluent Schema Registry, removing key barriers that have made migrating streaming workloads difficult. You no longer have to choose between the flexibility of MongoDB and the performance of binary serialization. Whether you’re building real-time fraud detection, monitoring IoT sensor grids, or synchronizing microservices, MongoDB Atlas Stream Processing provides the tools to do it with confidence and at scale.
Why Apache Avro and Schema Registry matter
In high-throughput streaming environments, JSON’s human readability comes at a cost. JSON repeats field names in every message, creating unnecessary overhead that impacts performance and increases bandwidth use.
Avro’s advantage
Apache Avro is a compact, fast, binary serialization format that relies on a separate schema to interpret binary data rather than embedding field names in each message. This results in messages that are typically 60% to 90% smaller than their JSON counterparts, leading to significantly lower bandwidth use and higher processing efficiency.
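To make the overhead concrete, here is a small illustrative sketch (not the Atlas API) comparing one JSON-encoded message against a rough back-of-the-envelope estimate of its Avro-encoded size; the record shape and byte estimates are assumptions for illustration only.

```javascript
// Illustrative only: JSON repeats every field name in every message.
const reading = { deviceId: "sensor-42", watts: 1380.5, ts: 1718000000000 };

const jsonBytes = Buffer.byteLength(JSON.stringify(reading));

// Avro encodes only the values, in schema order: a length-prefixed string
// (1 + 9 bytes), an 8-byte double, and a zigzag-varint long (~6 bytes here).
const avroBytesApprox = (1 + reading.deviceId.length) + 8 + 6;

console.log(`JSON: ${jsonBytes} bytes, Avro (approx.): ${avroBytesApprox} bytes`);
```

The field names (`deviceId`, `watts`, `ts`) live in the schema, not in each message, which is where the savings come from.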
Governance with Schema Registry
Small messages are great, but how do you know what they mean? This is where Schema Registry comes in. It acts as a centralized source of truth for your data’s structure, providing key capabilities that keep streaming pipelines reliable:
Decoupling: Producers and consumers share schemas through the registry, so they don’t need to coordinate changes manually.
Validation: It ensures that producers only write messages that conform to expected schema definitions, preventing “poison pills” from entering your Apache Kafka topics.
Evolution: It handles schema changes gracefully, ensuring that your pipeline doesn’t break when you add a new field to your data.
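A common backward-compatible evolution is adding a new field with a default. The sketch below uses a hypothetical `SolarReading` schema (the record and field names are assumptions, not from the source) to show a v2 that old messages can still satisfy.

```javascript
// Hypothetical Avro schemas illustrating a backward-compatible change:
// v2 adds a field WITH a default, so data written against v1 still deserializes.
const solarReadingV1 = {
  type: "record",
  name: "SolarReading",
  fields: [
    { name: "deviceId", type: "string" },
    { name: "watts", type: "double" },
  ],
};

const solarReadingV2 = {
  ...solarReadingV1,
  fields: [
    ...solarReadingV1.fields,
    // New nullable field with a default: readers fill it in for old messages.
    { name: "panelTemp", type: ["null", "double"], default: null },
  ],
};
```

A registry configured for backward compatibility would accept v2; adding the same field without a default would be rejected, which is exactly the kind of breakage the registry prevents.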
How it works
Integrating your Confluent Schema Registry with MongoDB Atlas Stream Processing starts with defining a Schema Registry Connection in your MongoDB Atlas workspace. This connection stores your registry’s URL and credentials, making it available to any stream processor in your project.
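Conceptually, the connection bundles a name, the registry URL, and credentials. The document below is a hypothetical sketch of that shape only; the actual connection is created through the Atlas UI or Administration API, and every field name and value here is illustrative.

```javascript
// Hypothetical connection document -- field names are illustrative only;
// create the real connection via the Atlas UI or Administration API.
const schemaRegistryConnection = {
  name: "myConfluentRegistry", // referenced by name from stream processors
  url: "https://psrc-xxxxx.us-east-1.aws.confluent.cloud", // your registry URL
  // Confluent Schema Registry API key/secret, stored by Atlas as credentials
  apiKey: "<SR_API_KEY>",
  apiSecret: "<SR_API_SECRET>",
};
```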
Consuming Avro data
When you read from a Kafka topic, MongoDB Atlas Stream Processing automatically detects the schema ID in the message. It fetches the corresponding schema from your registry, caches it for performance, and deserializes the binary payload into a rich BSON document ready for processing.
Producing Avro data
When writing back to Kafka, you provide your target Avro schema. MongoDB Atlas Stream Processing ensures your outbound data matches this schema, validates it against the schema stored in the registry, and emits a compact binary message. If autoregistration is enabled, it will register new schemas automatically.
How to use Avro in your pipelines
Let’s look at how you can implement these features in your pipelines.
Emitting validated Avro messages
When you need to send data to Kafka, you can define your schema directly in the $emit stage. This ensures your output is both performant and compliant with your organization’s data standards. Below is an Avro schema that will be used for the value of a Kafka message.
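The sketch below supplies an illustrative `SolarReading` value schema and a minimal pipeline around it. The `$emit` option names shown here (`schemaRegistry`, `schema`) and the connection names are assumptions for illustration; check the Atlas Stream Processing documentation for the current syntax.

```javascript
// Illustrative Avro schema for the value of each outbound Kafka message.
const solarReadingSchema = {
  type: "record",
  name: "SolarReading",
  fields: [
    { name: "deviceId", type: "string" },
    { name: "watts", type: "double" },
    { name: "timestamp", type: { type: "long", logicalType: "timestamp-millis" } },
  ],
};

// Hypothetical pipeline: read from a source, emit Avro-encoded messages.
// The exact $emit option names are assumptions -- consult the docs.
const pipeline = [
  { $source: { connectionName: "sample_stream_solar" } },
  {
    $emit: {
      connectionName: "kafkaProd", // your Kafka connection
      topic: "solar.readings.avro",
      schemaRegistry: { connectionName: "myConfluentRegistry" },
      schema: JSON.stringify(solarReadingSchema),
    },
  },
];

// In mongosh: sp.createStreamProcessor("emitSolarAvro", pipeline);
```

With the schema attached, outbound documents that don’t match it are rejected before they reach the topic, which is what keeps “poison pills” out of downstream consumers.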
Consuming Avro-encoded events
In this example, we consume solar energy readings from a Kafka topic. By adding the schemaRegistry field to the $source stage, the processor handles all the heavy lifting of deserialization automatically. The schema defines timestamp with "logicalType": "timestamp-millis", which tells the $source stage to convert the value automatically into an ISODate when consuming.
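A minimal sketch of such a consumer pipeline is below. The `schemaRegistry` field on `$source` is the integration point described above, but its exact sub-fields, the connection names, and the downstream stages are assumptions for illustration.

```javascript
// Hypothetical consumer pipeline: deserialize Avro from Kafka, filter, persist.
const pipeline = [
  {
    $source: {
      connectionName: "kafkaProd", // your Kafka connection
      topic: "solar.readings.avro",
      // Sub-fields here are illustrative; this enables Avro deserialization.
      schemaRegistry: { connectionName: "myConfluentRegistry" },
    },
  },
  // timestamp-millis values arrive as ISODate, so date/match operators
  // work on the deserialized BSON documents directly.
  { $match: { watts: { $gt: 1000 } } },
  {
    $merge: {
      into: { connectionName: "atlasCluster", db: "solar", coll: "readings" },
    },
  },
];

// In mongosh: sp.createStreamProcessor("consumeSolarAvro", pipeline);
```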
These examples show how MongoDB Atlas Stream Processing handles Avro serialization with Schema Registry integration natively, eliminating the need for custom deserialization code in your applications.
Ready to get started?
With native Apache Avro and Schema Registry support, you can build high-performance streaming pipelines without sacrificing MongoDB’s flexibility. The integration is available now for all MongoDB Atlas Stream Processing users.
Next Steps
For more information, visit the MongoDB Atlas Stream Processing documentation: