Important

In version 10.0.0 and later of the Connector, use the format mongodb to read from and write to MongoDB:

val df = spark.read.format("mongodb").load()

Spark Shell

When starting the Spark shell, specify:

  • the --packages option to download the MongoDB Spark Connector package. The following package is available:

    • mongo-spark-connector

  • the --conf option to configure the MongoDB Spark Connector. These settings configure the SparkConf object.

    Note

    When specifying the Connector configuration via SparkConf, you must prefix the settings appropriately. For details and other available MongoDB Spark Connector options, see the Configuration Options.

For example:

./bin/spark-shell --conf "spark.mongodb.read.connection.uri=mongodb://127.0.0.1/test.myCollection?readPreference=primaryPreferred" \
--conf "spark.mongodb.write.connection.uri=mongodb://127.0.0.1/test.myCollection" \
--packages org.mongodb.spark:mongo-spark-connector_2.12:10.1.1

  • The spark.mongodb.read.connection.uri setting specifies the MongoDB server address (127.0.0.1), the database to connect to (test), the collection from which to read data (myCollection), and the read preference (primaryPreferred).

  • The spark.mongodb.write.connection.uri setting specifies the MongoDB server address (127.0.0.1), the database to connect to (test), and the collection to which to write data (myCollection). Because no port is given, the connection uses the default port, 27017.

  • The --packages option specifies the Spark Connector's Maven coordinates, in the format groupId:artifactId:version.
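
To make the pieces of the connection URI concrete, the following sketch splits the example URIs into the parts described above using java.net.URI from the Java standard library. It is an illustration only, not how the Connector parses its settings: the ConnectionUriParts object and its Parts case class are hypothetical names, and the approach assumes a single host with no credentials.

```scala
import java.net.URI

// Hypothetical helper, for illustration only. It assumes a single-host URI of
// the form mongodb://host[:port]/database.collection[?key=value&...].
object ConnectionUriParts {
  final case class Parts(
      host: String,
      port: Int,
      database: String,
      collection: String,
      options: Map[String, String])

  def parse(uri: String): Parts = {
    val u = new URI(uri)
    // The path looks like "/test.myCollection"; split on the first dot only.
    val Array(database, collection) = u.getPath.stripPrefix("/").split("\\.", 2)
    // java.net.URI reports -1 when no port is given; MongoDB's default is 27017.
    val port = if (u.getPort == -1) 27017 else u.getPort
    // getQuery returns null when the URI has no query string.
    val options = Option(u.getQuery).toSeq
      .flatMap(_.split("&"))
      .map { pair =>
        val Array(key, value) = pair.split("=", 2)
        key -> value
      }
      .toMap
    Parts(u.getHost, port, database, collection, options)
  }
}
```

Calling ConnectionUriParts.parse on the read URI from the example yields the host 127.0.0.1, the database test, the collection myCollection, and a readPreference option of primaryPreferred.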

Enable Connector-specific functions and implicits for your SparkSession and Datasets by importing the following package in the Spark shell:

import com.mongodb.spark._

Connection to MongoDB happens automatically when a Dataset action requires a read from MongoDB or a write to MongoDB.
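
As a minimal sketch of such actions, the following lines could be entered in a Spark shell started with the --conf and --packages options shown earlier. It assumes a MongoDB server is reachable at 127.0.0.1; the people DataFrame and its sample rows are made up for illustration.

```scala
// `spark` is the SparkSession that the Spark shell provides.
import spark.implicits._

// Hypothetical sample data, not part of the original tutorial.
val people = Seq(("Ada", 36), ("Grace", 45)).toDF("name", "age")

// save() is an action: it writes to the collection named in
// spark.mongodb.write.connection.uri (test.myCollection in the example).
people.write.format("mongodb").mode("append").save()

// Read from the collection named in spark.mongodb.read.connection.uri.
val df = spark.read.format("mongodb").load()
df.printSchema()
df.show()
```

No explicit connect or close call appears in the sketch; the read and write URIs supplied at startup are the only connection settings it relies on.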

Self-Contained Scala Application

Provide the Spark Core, Spark SQL, and MongoDB Spark Connector dependencies to your dependency management tool.

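The following excerpt demonstrates how to include these dependencies in an SBT build.sbt file:

// Use a full 2.12.x version; 2.12.15 is shown as an example.
scalaVersion := "2.12.15"
libraryDependencies ++= Seq(
  // %% appends the Scala binary version (_2.12) to the artifact name,
  // so the suffix must not be repeated here.
  "org.mongodb.spark" %% "mongo-spark-connector" % "10.1.1",
  "org.apache.spark" %% "spark-core" % "3.3.1",
  "org.apache.spark" %% "spark-sql" % "3.3.1"
)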

When specifying the Connector configuration via SparkSession, you must prefix the settings appropriately. For details and other available MongoDB Spark Connector options, see the Configuration Options.

package com.mongodb

object GettingStarted {
  def main(args: Array[String]): Unit = {
    /* Create the SparkSession.
     * If config arguments are passed from the command line using --conf,
     * parse args for the values to set.
     */
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .master("local")
      .appName("MongoSparkConnectorIntro")
      .config("spark.mongodb.read.connection.uri", "mongodb://127.0.0.1/test.myCollection")
      .config("spark.mongodb.write.connection.uri", "mongodb://127.0.0.1/test.myCollection")
      .getOrCreate()
  }
}
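
The prefixing rule used in the configuration keys above can be sketched as a small helper. The MongoSettings object is a hypothetical name introduced here for illustration and is not part of the Connector; the two prefixes are taken from the keys used in this tutorial, and the Configuration Options page remains the reference for which options exist.

```scala
// Hypothetical helper, for illustration only: builds the prefixed keys
// that SparkConf and SparkSession.builder().config(...) expect.
object MongoSettings {
  private val ReadPrefix = "spark.mongodb.read."
  private val WritePrefix = "spark.mongodb.write."

  // Key for a setting that applies when reading from MongoDB.
  def readKey(option: String): String = ReadPrefix + option

  // Key for a setting that applies when writing to MongoDB.
  def writeKey(option: String): String = WritePrefix + option
}
```

With this helper, the first config call in the application could be written as .config(MongoSettings.readKey("connection.uri"), "mongodb://127.0.0.1/test.myCollection"), which keeps the prefix in one place if several read or write options are set.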

Troubleshooting

If you get a java.net.BindException: Can't assign requested address error:

  • Make sure that another Spark shell is not already running.

  • Try setting the SPARK_LOCAL_IP environment variable, for example:

    export SPARK_LOCAL_IP=127.0.0.1

  • Try including the following option when starting the Spark shell:

    --driver-java-options "-Djava.net.preferIPv4Stack=true"

If you encounter errors when running the examples in this tutorial, you may need to clear your local Ivy cache (~/.ivy2/cache/org.mongodb.spark and ~/.ivy2/jars).

© 2023 MongoDB, Inc.
