Use your local SparkSession's read method to create a DataFrame representing a collection.
Note
DataFrame does not exist as a class in the Java API. Use Dataset<Row> to reference a DataFrame.
The following example loads the collection specified in the SparkConf:
Dataset<Row> df = spark.read().format("mongodb").load(); // Uses the SparkConf for configuration
To specify a different collection, database, and other read configuration settings, use the option method:
Dataset<Row> df = spark.read()
    .format("mongodb")
    .option("database", "<example-database>")
    .option("collection", "<example-collection>")
    .load();
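The sketch below shows one way the two configuration sources could combine. It is plain Java with no Spark dependency.

It rests on two assumptions:

- Read settings in the SparkConf carry the spark.mongodb.read. prefix.
- A value passed through option() overrides the SparkConf value for that one read.

ReadConfigSketch and resolve are hypothetical names, not connector API. The connection URI and database name in main are illustrative values only.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, NOT part of the MongoDB Spark Connector API.
// Assumes read settings in SparkConf use the "spark.mongodb.read." prefix
// and that values passed through option() override them for a single read.
class ReadConfigSketch {
    static final String READ_PREFIX = "spark.mongodb.read.";

    static Map<String, String> resolve(Map<String, String> sparkConf,
                                       Map<String, String> readerOptions) {
        Map<String, String> resolved = new HashMap<>();
        // Start from the SparkConf read settings, with the prefix stripped off.
        for (Map.Entry<String, String> entry : sparkConf.entrySet()) {
            if (entry.getKey().startsWith(READ_PREFIX)) {
                resolved.put(entry.getKey().substring(READ_PREFIX.length()), entry.getValue());
            }
        }
        // Per-read options replace any SparkConf value with the same key.
        resolved.putAll(readerOptions);
        return resolved;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("spark.mongodb.read.connection.uri", "mongodb://127.0.0.1/"); // illustrative
        conf.put("spark.mongodb.read.database", "test");                       // illustrative
        conf.put("spark.mongodb.read.collection", "characters");
        conf.put("spark.app.name", "demo"); // not a read setting, so it is skipped

        Map<String, String> options = new HashMap<>();
        options.put("collection", "<example-collection>");

        Map<String, String> resolved = resolve(conf, options);
        System.out.println(resolved.get("database"));   // test
        System.out.println(resolved.get("collection")); // <example-collection>
    }
}
```

In this sketch the database still comes from the SparkConf, while the collection comes from the option() call.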
Schema Inference
When you load a Dataset or DataFrame without a schema, Spark samples the records to infer the schema of the collection.
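The inference step can be pictured with the minimal sketch below. It is a hypothetical illustration, not the connector's implementation, and it makes these simplifications:

- It treats the first sampleSize documents as the sample. The connector's own sampling strategy may differ.
- It maps each value to a simplified, Spark-style type name.
- When two documents disagree on a field's type, it falls back to string. This is an assumption made for the sketch.

SchemaInferenceSketch, typeOf, and infer are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of sampling-based schema inference, NOT the connector's code.
class SchemaInferenceSketch {

    // Maps a Java value to a simplified Spark-style type name.
    static String typeOf(Object value) {
        if (value instanceof Integer) return "integer";
        if (value instanceof Long) return "long";
        if (value instanceof Double) return "double";
        if (value instanceof Boolean) return "boolean";
        if (value instanceof Map) return "struct";
        return "string";
    }

    // Infers field name -> type from at most sampleSize documents.
    // A TreeMap keeps the fields sorted by name.
    static Map<String, String> infer(List<Map<String, Object>> docs, int sampleSize) {
        Map<String, String> schema = new TreeMap<>();
        int n = Math.min(sampleSize, docs.size());
        for (Map<String, Object> doc : docs.subList(0, n)) {
            for (Map.Entry<String, Object> field : doc.entrySet()) {
                if (field.getValue() == null) continue; // a null says nothing about the type
                // Keep the known type if it matches; otherwise fall back to "string" (assumption).
                schema.merge(field.getKey(), typeOf(field.getValue()),
                        (known, seen) -> known.equals(seen) ? known : "string");
            }
        }
        return schema;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> docs = new ArrayList<>();
        docs.add(Map.of("name", "Bilbo Baggins", "age", 50));
        docs.add(Map.of("name", "Gandalf", "age", 1000));
        docs.add(Map.of("name", "Bombur")); // no "age" field
        System.out.println(infer(docs, 1000)); // {age=integer, name=string}
    }
}
```

A field that appears in only some documents, such as age, still ends up in the inferred schema. The _id field is left out because the sketch does not model ObjectId.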
Consider a collection named characters:
{ "_id" : ObjectId("585024d558bef808ed84fc3e"), "name" : "Bilbo Baggins", "age" : 50 }
{ "_id" : ObjectId("585024d558bef808ed84fc3f"), "name" : "Gandalf", "age" : 1000 }
{ "_id" : ObjectId("585024d558bef808ed84fc40"), "name" : "Thorin", "age" : 195 }
{ "_id" : ObjectId("585024d558bef808ed84fc41"), "name" : "Balin", "age" : 178 }
{ "_id" : ObjectId("585024d558bef808ed84fc42"), "name" : "Kíli", "age" : 77 }
{ "_id" : ObjectId("585024d558bef808ed84fc43"), "name" : "Dwalin", "age" : 169 }
{ "_id" : ObjectId("585024d558bef808ed84fc44"), "name" : "Óin", "age" : 167 }
{ "_id" : ObjectId("585024d558bef808ed84fc45"), "name" : "Glóin", "age" : 158 }
{ "_id" : ObjectId("585024d558bef808ed84fc46"), "name" : "Fíli", "age" : 82 }
{ "_id" : ObjectId("585024d558bef808ed84fc47"), "name" : "Bombur" }
The following operation loads data from the MongoDB collection specified in SparkConf and infers the schema:
Dataset<Row> implicitDS = spark.read().format("mongodb").load();
implicitDS.printSchema();
implicitDS.show();
implicitDS.printSchema() outputs the following schema to the console:
root
 |-- _id: struct (nullable = true)
 |    |-- oid: string (nullable = true)
 |-- age: integer (nullable = true)
 |-- name: string (nullable = true)
implicitDS.show() outputs the following to the console:
+--------------------+----+-------------+
|                 _id| age|         name|
+--------------------+----+-------------+
|[585024d558bef808...|  50|Bilbo Baggins|
|[585024d558bef808...|1000|      Gandalf|
|[585024d558bef808...| 195|       Thorin|
|[585024d558bef808...| 178|        Balin|
|[585024d558bef808...|  77|         Kíli|
|[585024d558bef808...| 169|       Dwalin|
|[585024d558bef808...| 167|          Óin|
|[585024d558bef808...| 158|        Glóin|
|[585024d558bef808...|  82|         Fíli|
|[585024d558bef808...|null|       Bombur|
+--------------------+----+-------------+
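The null in Bombur's age column follows from projecting each document onto the inferred columns: a document that lacks a field gets null for that column. The sketch below shows that idea in plain Java.

RowProjectionSketch and toRow are hypothetical names. The connector's real conversion from BSON documents to Spark rows is more involved than this.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, NOT the connector's code: builds a row whose values
// follow a fixed column order; a field missing from the document becomes null.
class RowProjectionSketch {

    static List<Object> toRow(Map<String, Object> doc, List<String> columns) {
        List<Object> row = new ArrayList<>(); // ArrayList accepts null elements
        for (String column : columns) {
            row.add(doc.get(column)); // Map.get returns null for an absent key
        }
        return row;
    }

    public static void main(String[] args) {
        List<String> columns = List.of("age", "name");
        System.out.println(toRow(Map.of("name", "Balin", "age", 178), columns)); // [178, Balin]
        System.out.println(toRow(Map.of("name", "Bombur"), columns));            // [null, Bombur]
    }
}
```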