Introducing MongoDB Spark Connector Version 10.1

Robert Walters

Today, MongoDB released version 10.1 of the MongoDB Spark Connector. In this post, we highlight key features of this new release.

Microbatch streaming support

Version 10 of the MongoDB Spark Connector introduced support for Apache Spark Structured Streaming. In that initial release, continuous mode was the only streaming mode supported. Version 10.1 adds microbatch mode, enabling you to stream writes to destinations that do not currently support continuous mode streams, such as Amazon S3 storage.
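As a rough illustration of microbatch mode, the sketch below reads a collection as a stream and writes Parquet files on a fixed interval. It is a minimal sketch, not a reference implementation: the function names, connection URI, paths, schema fields, and interval are all hypothetical, and the "sensors" database and "panels" collection are borrowed from the write example later in this post. The connector option names follow the connector's documented read configuration as best we recall, so check them against the documentation for your version.

```python
# Sketch only: URI, paths, schema fields, and interval are hypothetical values.

def microbatch_trigger(seconds):
    """Return keyword arguments for DataStreamWriter.trigger in microbatch mode."""
    return {"processingTime": f"{seconds} seconds"}


def stream_panels_to_s3(spark, mongo_uri, s3_path, checkpoint_path, seconds=10):
    """Stream the "panels" collection to Parquet files using microbatches.

    `spark` is an existing SparkSession with the MongoDB Spark Connector
    on its classpath.
    """
    readings = (
        spark.readStream.format("mongodb")
        .option("spark.mongodb.connection.uri", mongo_uri)
        .option("database", "sensors")
        .option("collection", "panels")
        # Assumption: publish full documents rather than raw change events,
        # so that the rows match the schema declared below.
        .option("change.stream.publish.full.document.only", "true")
        # Streaming reads need a schema; these field names are made up.
        .schema("panel_id STRING, watts DOUBLE, ts TIMESTAMP")
        .load()
    )
    return (
        readings.writeStream.format("parquet")
        .option("path", s3_path)
        .option("checkpointLocation", checkpoint_path)
        .outputMode("append")
        # processingTime selects microbatch mode; passing
        # continuous="1 second" instead would select continuous mode.
        .trigger(**microbatch_trigger(seconds))
        .start()
    )
```

Calling stream_panels_to_s3(spark, uri, "s3a://my-bucket/panels/", "s3a://my-bucket/checkpoints/panels/") would return a StreamingQuery; the bucket names here are placeholders.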

Increased control of write behavior

When the Spark Connector issues a write, the default behavior is an upsert. This can cause problems in scenarios where an upsert is not wanted, such as writes to time series collections. A new configuration parameter, upsertDocument, makes the connector issue only insert statements on write when it is set to false.

solar.write.format("mongodb").mode("append").option("database", "sensors").option("collection", "panels").option("upsertDocument", "false").save()

In the above code snippet, we write to the "panels" time series collection with upsertDocument set to false. Alternatively, you can set operationType to "insert". When operationType is set, any upsertDocument option is ignored.
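The two ways of requesting insert-only writes can be compared side by side. The following is a minimal sketch under stated assumptions: the helper names insert_only_options and append_without_upsert are hypothetical, and df stands for any Spark DataFrame, such as the solar DataFrame used above.

```python
def insert_only_options(database, collection, use_operation_type=False):
    """Build write options that make the connector insert instead of upsert.

    Hypothetical helper: it only assembles the option names described above.
    """
    options = {"database": database, "collection": collection}
    if use_operation_type:
        # operationType takes precedence over any upsertDocument setting.
        options["operationType"] = "insert"
    else:
        options["upsertDocument"] = "false"
    return options


def append_without_upsert(df, database, collection, use_operation_type=False):
    """Append the rows of `df` to a collection using insert statements only."""
    options = insert_only_options(database, collection, use_operation_type)
    df.write.format("mongodb").mode("append").options(**options).save()
```

For example, append_without_upsert(solar, "sensors", "panels") would behave like the one-line snippet above, while passing use_operation_type=True would take the operationType route instead.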

Support for BSON types

The data types supported in BSON are not exactly the same as those supported in a Spark DataFrame. For example, Spark has no dedicated ObjectId type. To handle cases where you need to work with such BSON types, you can now set the following new configuration values:

spark.mongodb.read.outputExtendedJson=<true/false> 
spark.mongodb.write.convertJson=<true/false>

These settings enable you to work with BSON data types within your Spark application.
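To make this more concrete, here is a hedged sketch of how the two settings might be applied when building a Spark session, together with a small helper for handling an ObjectId in MongoDB Extended JSON form. The names BSON_CONF, apply_conf, and object_id_hex are hypothetical. The sketch also assumes that, with outputExtendedJson enabled, an ObjectId field arrives in Spark as a string such as {"$oid": "..."}; verify this against the connector documentation for your version.

```python
import json

# The two configuration keys named above, both enabled.
BSON_CONF = {
    "spark.mongodb.read.outputExtendedJson": "true",
    "spark.mongodb.write.convertJson": "true",
}


def apply_conf(builder, conf=BSON_CONF):
    """Apply each setting to a SparkSession builder and return the builder."""
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder


def object_id_hex(extended_json):
    """Extract the hex string from an Extended JSON ObjectId.

    Assumes input shaped like {"$oid": "<24 hex characters>"}.
    """
    return json.loads(extended_json)["$oid"]
```

With PySpark installed, apply_conf(SparkSession.builder).getOrCreate() would create a session that carries both settings.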

Call to action

Version 10.1 of the MongoDB Spark Connector continues to enhance the connector's streaming capabilities with support for microbatch processing. This version also adds more granular control over writes to MongoDB, supporting use cases like time series collections. Users who wanted to upgrade from version 3.x but could not because of missing BSON data type support now have an option for using BSON data types in version 10.1. To learn more about the MongoDB Spark Connector, check out the online documentation. You can download the latest version of the MongoDB Spark Connector from the Maven repository.