When reading a stream from a MongoDB database, the MongoDB Spark Connector supports both micro-batch processing and continuous processing. Micro-batch processing, the default processing engine, achieves end-to-end latencies as low as 100 milliseconds with exactly-once fault-tolerance guarantees. Continuous processing is an experimental feature introduced in Spark version 2.3 that achieves end-to-end latencies as low as 1 millisecond with at-least-once guarantees.
To learn more about continuous processing, see the Spark documentation.
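As a rough illustration of how the two processing modes are selected, the sketch below maps a mode name to the keyword argument accepted by PySpark's `DataStreamWriter.trigger()`. The helper names `trigger_kwargs` and `with_trigger` and the mode strings are hypothetical, chosen only for this example. In continuous mode the interval is a checkpoint interval, not a batch interval.

```python
def trigger_kwargs(mode, interval="1 second"):
    """Return keyword arguments for DataStreamWriter.trigger().

    mode is "micro-batch" or "continuous"; these labels are specific
    to this sketch and are not connector or Spark settings.
    """
    if mode == "micro-batch":
        # Micro-batch engine: start a new batch every `interval`.
        return {"processingTime": interval}
    if mode == "continuous":
        # Continuous engine: checkpoint progress every `interval`.
        return {"continuous": interval}
    raise ValueError(f"unknown processing mode: {mode!r}")


def with_trigger(writer, mode, interval="1 second"):
    """Apply the chosen trigger to a pyspark DataStreamWriter."""
    return writer.trigger(**trigger_kwargs(mode, interval))
```

If no trigger is set at all, Spark runs the query in micro-batch mode and starts each batch as soon as the previous one finishes.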
The connector reads from your MongoDB deployment's change stream. To generate change events on the change stream, perform update operations on your database.
To learn more about change streams, see Change Streams in the MongoDB manual.
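One way to produce change events while a stream is running is to update documents from a separate client. The sketch below assumes a PyMongo `Collection` object is passed in; the function name `increment_counter` and the field name `touchCount` are hypothetical.

```python
def increment_counter(collection, ids, field="touchCount"):
    """Issue one update per _id and return how many updates were sent.

    `collection` is expected to behave like pymongo.collection.Collection.
    $inc always changes the stored value, so each matched document yields
    an update event; an _id that matches no document yields no event.
    """
    update = {"$inc": {field: 1}}
    sent = 0
    for doc_id in ids:
        collection.update_one({"_id": doc_id}, update)
        sent += 1
    return sent
```

With PyMongo installed, the collection could come from `MongoClient(uri)[database][collection]`, where the three names are placeholders for your own deployment.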
The following example shows how to stream data from MongoDB to your console.
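A minimal PySpark sketch of such a console stream follows. It assumes an existing `SparkSession` with the connector on its classpath, the `mongodb` format name, and the `spark.mongodb.*` option keys used by version 10.x of the connector; the helper names `mongo_read_options` and `stream_to_console` are hypothetical. Check the option keys against the Read Configuration Options guide for your connector version.

```python
def mongo_read_options(uri, database, collection):
    """Build connector read options (key names assume connector 10.x)."""
    return {
        "spark.mongodb.connection.uri": uri,
        "spark.mongodb.database": database,
        "spark.mongodb.collection": collection,
    }


def stream_to_console(spark, options, schema=None):
    """Start a micro-batch query that prints change-stream rows.

    `spark` is an existing pyspark.sql.SparkSession; `schema` is an
    optional pyspark StructType describing the rows to read.
    """
    reader = spark.readStream.format("mongodb").options(**options)
    if schema is not None:
        # Supplying a schema skips inference from the collection.
        reader = reader.schema(schema)
    return (
        reader.load()
        .writeStream
        .format("console")       # print each batch to stdout
        .outputMode("append")    # emit only newly arrived rows
        .start()
    )
```

The returned `StreamingQuery` runs in the background; calling `awaitTermination()` on it blocks until the query stops or fails.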
Inferring the Schema of a Change Stream
When the Spark Connector infers the schema of a DataFrame read from a change stream, it uses the schema of the underlying collection by default, not the schema of the change stream. If you set the corresponding change stream read option to true, the connector uses the schema of the change stream instead.
For more information about this setting, and to see a full list of change stream configuration options, see the Read Configuration Options guide.
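As a sketch only, the snippet below assumes that the setting in question is the connector's `change.stream.publish.full.document.only` read option and that it can be passed with the `spark.mongodb.` prefix, as in connector 10.x. Both the key and the helper name `with_full_document_only` are assumptions to verify against the Read Configuration Options guide.

```python
# Assumed option key; confirm it for your connector version.
FULL_DOCUMENT_ONLY = "spark.mongodb.change.stream.publish.full.document.only"


def with_full_document_only(options, enabled=True):
    """Return a copy of `options` with the flag set as a string value."""
    updated = dict(options)  # leave the caller's dict untouched
    updated[FULL_DOCUMENT_ONLY] = "true" if enabled else "false"
    return updated
```

The resulting dictionary can be passed to `spark.readStream.format("mongodb").options(**updated)` before calling `load()`.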
To learn more about the types used in these examples, see the following Apache Spark API documentation: