Read from MongoDB in Streaming Mode

When reading a stream from a MongoDB database, the MongoDB Spark Connector supports both micro-batch processing and continuous processing. Micro-batch processing, the default processing engine, achieves end-to-end latencies as low as 100 milliseconds with exactly-once fault-tolerance guarantees. Continuous processing is an experimental feature introduced in Spark version 2.3 that achieves end-to-end latencies as low as 1 millisecond with at-least-once guarantees.

To learn more about continuous processing, see the Spark documentation.
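You choose between the two engines by the trigger you pass when you start the streaming query. The following sketch, written in PySpark, assumes `stream_df` is a streaming DataFrame read from MongoDB (created as in the full example later on this page); the one-second intervals are illustrative assumptions.

```python
# Choose one of the two triggers below when starting the query.

# Micro-batch processing (the default): used when you set a processing-time
# trigger, or when you set no trigger at all.
micro_batch_query = (
    stream_df.writeStream
    .format("console")
    .trigger(processingTime="1 second")  # run a micro-batch every second
    .start()
)

# Continuous processing (experimental, Spark 2.3+): pass a continuous trigger
# instead. The interval is a checkpoint interval, not a batch interval.
# continuous_query = (
#     stream_df.writeStream
#     .format("console")
#     .trigger(continuous="1 second")
#     .start()
# )
```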

Note

The connector reads from your MongoDB deployment's change stream. To generate change events on the change stream, perform update operations on your database.

To learn more about change streams, see Change Streams in the MongoDB manual.
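For example, while a streaming query is running you can generate change events by updating documents in the source collection from another process. The snippet below is a minimal sketch using PyMongo; the connection string, database name, collection name, and filter are placeholder assumptions.

```python
from pymongo import MongoClient

# Placeholder connection string, database, and collection.
client = MongoClient("mongodb://localhost:27017")
collection = client["sensors"]["readings"]

# Each update produces a change event that the connector can read
# from the collection's change stream.
collection.update_one(
    {"device_id": 42},
    {"$set": {"temperature": 21.5}},
    upsert=True,
)
```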

The following example shows how to stream data from MongoDB to your console.

Important

Inferring the Schema of a Change Stream

When the Spark Connector infers the schema of a DataFrame read from a change stream, by default, it uses the schema of the underlying collection rather than that of the change stream. If you set the change.stream.publish.full.document.only option to true, the connector uses the schema of the change stream instead.

For more information about this setting, and to see a full list of change stream configuration options, see the Read Configuration Options guide.
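A sketch of such a pipeline in PySpark follows. The connection URI, database and collection names, connector package coordinates, and checkpoint location are placeholder assumptions; check the configuration guides linked above for the option keys that apply to the connector version you use.

```python
from pyspark.sql import SparkSession

# The connector package coordinates below are an assumption; match them to
# your Spark and Scala versions.
spark = (
    SparkSession.builder
    .appName("read-stream-from-mongodb")
    .config("spark.jars.packages",
            "org.mongodb.spark:mongo-spark-connector_2.12:10.2.1")
    .getOrCreate()
)

# Read a stream of change events from a MongoDB collection.
stream_df = (
    spark.readStream
    .format("mongodb")
    .option("spark.mongodb.connection.uri", "mongodb://localhost:27017")
    .option("spark.mongodb.database", "sensors")
    .option("spark.mongodb.collection", "readings")
    # Publish only the full documents so the connector infers the schema
    # from the change stream rather than from the underlying collection.
    .option("spark.mongodb.change.stream.publish.full.document.only", "true")
    .load()
)

# Write each micro-batch of change events to the console.
query = (
    stream_df.writeStream
    .format("console")
    .outputMode("append")
    .option("checkpointLocation", "/tmp/mongodb-stream-checkpoint")
    .trigger(processingTime="1 second")
    .start()
)

query.awaitTermination()
```

While the query is running, write operations on the source collection, such as the PyMongo update shown earlier, appear as rows in the console output.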

To learn more about the types used in these examples, see the following Apache Spark API documentation:

  • DataStreamReader
  • DataStreamWriter
