Important
In version 10.0.0 and later of the Connector, use the format mongodb to read from and write to MongoDB:
df = spark.read.format("mongodb").load()
Dependency Management
Provide the Spark Core, Spark SQL, and MongoDB Spark Connector dependencies to your dependency management tool.
The following excerpt is from a Maven pom.xml file:
<dependencies>
  <dependency>
    <groupId>org.mongodb.spark</groupId>
    <artifactId>mongo-spark-connector</artifactId>
    <version>10.0.1</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.12</artifactId>
    <version>3.0.2</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.12</artifactId>
    <version>3.0.2</version>
  </dependency>
</dependencies>
Configuration
When you specify the Connector configuration via SparkSession, you must prefix each setting appropriately, for example with spark.mongodb.read. or spark.mongodb.write. as in the following code. For details and other available MongoDB Spark Connector options, see Configuration Options.
package com.mongodb.spark_examples;

import org.apache.spark.sql.SparkSession;

public final class GettingStarted {

  public static void main(final String[] args) throws InterruptedException {
    /* Create the SparkSession.
     * If config arguments are passed from the command line using --conf,
     * parse args for the values to set.
     */
    SparkSession spark = SparkSession.builder()
      .master("local")
      .appName("MongoSparkConnectorIntro")
      .config("spark.mongodb.read.connection.uri", "mongodb://127.0.0.1/test.myCollection")
      .config("spark.mongodb.write.connection.uri", "mongodb://127.0.0.1/test.myCollection")
      .getOrCreate();

    // Application logic
  }
}
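The prefixing convention used in the configuration above can be sketched with a small helper. This is only an illustration in plain Java: the class MongoConfigKeys and the method withPrefix are hypothetical names, not part of the Connector, and the two prefixes are copied from the keys in the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of the Connector): builds prefixed setting names.
final class MongoConfigKeys {

    // Prefixes as they appear in the example configuration keys.
    static final String READ_PREFIX = "spark.mongodb.read.";
    static final String WRITE_PREFIX = "spark.mongodb.write.";

    /** Returns a new map in which every key of options starts with prefix. */
    static Map<String, String> withPrefix(String prefix, Map<String, String> options) {
        Map<String, String> prefixed = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : options.entrySet()) {
            prefixed.put(prefix + entry.getKey(), entry.getValue());
        }
        return prefixed;
    }

    public static void main(String[] args) {
        Map<String, String> options = new LinkedHashMap<>();
        options.put("connection.uri", "mongodb://127.0.0.1/test.myCollection");

        // Print the read-side and write-side keys derived from one unprefixed option.
        withPrefix(READ_PREFIX, options).forEach((k, v) -> System.out.println(k + " = " + v));
        withPrefix(WRITE_PREFIX, options).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

Each resulting entry could then be passed to SparkSession.builder().config(key, value), which is the call the example uses for its two connection URIs.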
- The spark.mongodb.read.connection.uri setting specifies the MongoDB server address (127.0.0.1), the database to connect to (test), and the collection (myCollection) from which to read data. A read preference can also be supplied as a connection string option.
- The spark.mongodb.write.connection.uri setting specifies the MongoDB server address (127.0.0.1), the database to connect to (test), and the collection (myCollection) to which to write data.
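To make the parts of the connection URI concrete, the following sketch splits the example URI into host, database, and collection with java.net.URI. It is an illustration only, not how the Connector parses its settings: ConnectionUriParts and its methods are hypothetical names, and the sketch assumes a single-host URI whose path has the form /database.collection.

```java
import java.net.URI;

// Hypothetical helper (not part of the Connector): splits a single-host
// connection URI of the form mongodb://host/database.collection.
final class ConnectionUriParts {

    /** Server address; null if java.net.URI cannot parse a single host. */
    static String host(String uri) {
        return URI.create(uri).getHost();
    }

    /** The "database.collection" part: the URI path without its leading slash. */
    static String namespace(String uri) {
        String path = URI.create(uri).getPath();
        return path.startsWith("/") ? path.substring(1) : path;
    }

    /** Text before the first dot of the namespace. */
    static String database(String uri) {
        String ns = namespace(uri);
        int dot = ns.indexOf('.');
        return dot < 0 ? ns : ns.substring(0, dot);
    }

    /** Text after the first dot of the namespace; null if no collection is given. */
    static String collection(String uri) {
        String ns = namespace(uri);
        int dot = ns.indexOf('.');
        return dot < 0 ? null : ns.substring(dot + 1);
    }

    public static void main(String[] args) {
        String uri = "mongodb://127.0.0.1/test.myCollection";
        System.out.println(host(uri));       // 127.0.0.1
        System.out.println(database(uri));   // test
        System.out.println(collection(uri)); // myCollection
    }
}
```

The namespace is split at the first dot, so a collection name that itself contains dots stays intact. Query options after a ? are not part of the path and do not affect the result.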
You can use a SparkSession object to write data to MongoDB, read data from MongoDB, create Datasets, and perform SQL operations.