Read from MongoDB


You can create a Spark DataFrame to hold data from the MongoDB collection specified in the spark.mongodb.input.uri option that your SparkSession is using.
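As a minimal sketch of where that option can come from when you run a standalone R script rather than the sparkR shell, the session can be created with the input URI in its Spark configuration. The host, the database name test, the application name, and the connector package coordinates below are assumptions for illustration only; substitute the values for your deployment.

```r
library(SparkR)

# Assumptions (not from this page): a mongod on 127.0.0.1, a database named
# "test", and connector package 3.0.1 built for Scala 2.12.
sparkR.session(
  appName = "mongo-read-example",
  sparkConfig = list(
    # Collection that read.df() reads from when no other options are given
    spark.mongodb.input.uri = "mongodb://127.0.0.1/test.fruit"
  ),
  sparkPackages = "org.mongodb.spark:mongo-spark-connector_2.12:3.0.1"
)
```

If a session already exists, as it does inside the sparkR shell, sparkR.session() returns that session, so these settings are best supplied when the shell or script first starts.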

Consider a collection named fruit that contains the following documents:

{ "_id" : 1, "type" : "apple", "qty" : 5 }
{ "_id" : 2, "type" : "orange", "qty" : 10 }
{ "_id" : 3, "type" : "banana", "qty" : 15 }

Load the collection into a DataFrame with read.df() from within the sparkR shell.

df <- read.df("", source = "com.mongodb.spark.sql.DefaultSource")


The first argument to read.df() is the path of a file to use as a data source. Because the data source here is a MongoDB collection rather than a file, the path is an empty string ("").

Spark samples the records to infer the schema of the collection. The following operation prints the schema to the console:

printSchema(df)

The operation produces the following shell output:

root
 |-- _id: double (nullable = true)
 |-- qty: double (nullable = true)
 |-- type: string (nullable = true)
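Once the schema is inferred, the columns can be referenced by name. The following is a small sketch, assuming df is the DataFrame loaded from the fruit collection above; the variable names large and local_rows and the qty threshold are chosen only for illustration.

```r
# Assumes `df` was created by the read.df() call shown earlier.

# Bring the first few rows back to R as a local data.frame
head(df)

# Build a filtered DataFrame; Spark evaluates this lazily
large <- filter(df, df$qty > 5)

# Select two columns and collect the result into a local data.frame
local_rows <- collect(select(large, "type", "qty"))
print(local_rows)
```

collect() pulls every matching row into the R process, so it is best reserved for results that are known to be small.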

Reading with Options

You can add arguments to the read.df() method to specify a MongoDB database and collection. The following example reads from a collection called contacts in a database called people.

df <- read.df("", source = "com.mongodb.spark.sql.DefaultSource",
              database = "people", collection = "contacts")
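If you read from several collections, the call above can be wrapped in a small helper. This is only a sketch: read_mongo is a hypothetical function name, not part of SparkR or the connector, and it passes the same database and collection arguments shown above.

```r
# Hypothetical helper: read_mongo() is not provided by SparkR or the connector.
read_mongo <- function(database, collection) {
  read.df("", source = "com.mongodb.spark.sql.DefaultSource",
          database = database, collection = collection)
}

# Same read as the example above, followed by a schema check
contacts <- read_mongo("people", "contacts")
printSchema(contacts)
```

Anything not passed as an argument, such as the host, still comes from the spark.mongodb.input.uri option of the session.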