MongoDB Spark Connector

Important

In version 10.0.0 and later of the Connector, use the format mongodb to read from and write to MongoDB:

df = spark.read.format("mongodb").load()
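
The same format string applies when writing. The following minimal sketch (shown in Java, matching the Configuration example below) assumes df is an existing Dataset<Row> and that spark.mongodb.write.connection.uri is configured:

// Append the contents of df to the configured MongoDB collection.
df.write().format("mongodb").mode("append").save();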

Dependency Management

Provide the Spark Core, Spark SQL, and MongoDB Spark Connector dependencies to your dependency management tool.

Beginning in version 3.2.0, Apache Spark supports both Scala 2.12 and 2.13. Spark 3.1.3 and previous versions support only Scala 2.12. To provide support for both Scala versions, version 10.2.1 of the Spark Connector produces two artifacts:

  • org.mongodb.spark:mongo-spark-connector_2.12:10.2.1 is compiled against Scala 2.12, and supports Spark 3.1.x and above.

  • org.mongodb.spark:mongo-spark-connector_2.13:10.2.1 is compiled against Scala 2.13, and supports Spark 3.2.x and above.

Important

Use the Spark Connector artifact that's compatible with your versions of Scala and Spark.

The following excerpt from a Maven pom.xml file shows how to include dependencies compatible with Scala 2.12:

<dependencies>
  <dependency>
    <groupId>org.mongodb.spark</groupId>
    <artifactId>mongo-spark-connector_2.12</artifactId>
    <version>10.2.1</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.12</artifactId>
    <version>3.3.1</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.12</artifactId>
    <version>3.3.1</version>
  </dependency>
</dependencies>
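
For Scala 2.13 with Spark 3.2.x or later, the equivalent excerpt uses the _2.13 artifacts. A minimal sketch; the Spark version shown (3.3.1) is illustrative:

<dependencies>
  <dependency>
    <groupId>org.mongodb.spark</groupId>
    <artifactId>mongo-spark-connector_2.13</artifactId>
    <version>10.2.1</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.13</artifactId>
    <version>3.3.1</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.13</artifactId>
    <version>3.3.1</version>
  </dependency>
</dependencies>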

Configuration

When specifying the Connector configuration via SparkSession, you must prefix each setting with spark.mongodb, as shown in the following example. For details and other available MongoDB Spark Connector options, see the Configuring Spark guide.

package com.mongodb.spark_examples;

import org.apache.spark.sql.SparkSession;

public final class GettingStarted {

  public static void main(final String[] args) throws InterruptedException {

    /* Create the SparkSession.
     * If config arguments are passed from the command line using --conf,
     * parse args for the values to set.
     */
    SparkSession spark = SparkSession.builder()
        .master("local")
        .appName("MongoSparkConnectorIntro")
        .config("spark.mongodb.read.connection.uri", "mongodb://127.0.0.1/test.myCollection")
        .config("spark.mongodb.write.connection.uri", "mongodb://127.0.0.1/test.myCollection")
        .getOrCreate();

    // Application logic
  }
}

  • The spark.mongodb.read.connection.uri setting specifies the MongoDB server address (127.0.0.1), the database to connect to (test), and the collection from which to read data (myCollection). You can also append connection options, such as a read preference, to the URI.

  • The spark.mongodb.write.connection.uri setting specifies the MongoDB server address (127.0.0.1), the database to connect to (test), and the collection to which to write data (myCollection).
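
As the comment in the example notes, you can pass these settings from the command line with --conf instead of hard-coding them. A minimal sketch of such a spark-submit invocation; the JAR name is illustrative:

./bin/spark-submit --class com.mongodb.spark_examples.GettingStarted \
  --packages org.mongodb.spark:mongo-spark-connector_2.12:10.2.1 \
  --conf "spark.mongodb.read.connection.uri=mongodb://127.0.0.1/test.myCollection" \
  --conf "spark.mongodb.write.connection.uri=mongodb://127.0.0.1/test.myCollection" \
  getting-started.jar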

You can use a SparkSession object to write data to MongoDB, read data from MongoDB, create Datasets, and perform SQL operations.
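
For example, in place of the // Application logic comment above, the following minimal sketch reads the configured collection, queries it with Spark SQL, and writes the results back to MongoDB. It assumes test.myCollection contains documents with name and age fields; hundredClub is an illustrative target collection:

// Requires: import org.apache.spark.sql.Dataset;
//           import org.apache.spark.sql.Row;

// Read from test.myCollection (spark.mongodb.read.connection.uri).
Dataset<Row> people = spark.read().format("mongodb").load();
people.printSchema();

// Register a temporary view and query it with Spark SQL.
people.createOrReplaceTempView("people");
Dataset<Row> centenarians =
    spark.sql("SELECT name, age FROM people WHERE age >= 100");

// Write the results to an illustrative hundredClub collection; the
// collection option overrides the collection named in the write URI.
centenarians.write()
    .format("mongodb")
    .mode("append")
    .option("collection", "hundredClub")
    .save();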
