Write to MongoDB

On this page

To create a DataFrame, first create a SparkSession object, then use the object’s createDataFrame() function. The sparkR shell provides a default SparkSession object called spark.

To create a DataFrame, use the createDataFrame method to convert an R data.frame to a Spark DataFrame. To save the DataFrame to MongoDB, use the write.df() method:

charactersRdf <- data.frame(list(name=c("Bilbo Baggins", "Gandalf", "Thorin",
                      "Balin", "Kili", "Dwalin", "Oin", "Gloin", "Fili", "Bombur"),
                      age=c(50, 1000, 195, 178, 77, 169, 167, 158, 82, NA)))

charactersSparkdf <- createDataFrame(charactersRdf)
write.df(charactersSparkdf, "", source = "com.mongodb.spark.sql.DefaultSource",
         mode = "overwrite")


The empty argument (“”) refers to a file to use as a data source. In this case our data source is a MongoDB collection, so the data source argument is empty.

The above operation writes to the MongoDB database and collection specified in the spark.mongodb.output.uri option specified in the sparkR shell arguments or SparkSession configuration.

To read the first few rows of the DataFrame, use the head() method.


The operation prints the following output:

           name  age
1 Bilbo Baggins   50
2       Gandalf 1000
3        Thorin  195
4         Balin  178
5          Kili   77
6        Dwalin  169

The printSchema() method prints out the DataFrame’s schema:


In the sparkR shell, the operation prints the following output:

 |-- name: string (nullable = true)
 |-- age: double (nullable = true)

Writing with Options

You can add arguments to the write.df() method to specify a MongoDB database and collection.

The following operation writes the charactersSparkdf data to a MongoDB collection called ages in a database called characters.

write.df(charactersSparkdf, "", source = "com.mongodb.spark.sql.DefaultSource",
         mode = "overwrite", database = "characters", collection = "ages")