You can create a Spark DataFrame to hold data from the MongoDB collection specified in the connection URI option that your SparkSession is using.

Consider a collection named fruit that contains the following documents:

{ "_id" : 1, "type" : "apple", "qty" : 5 }
{ "_id" : 2, "type" : "orange", "qty" : 10 }
{ "_id" : 3, "type" : "banana", "qty" : 15 }

Assign the collection to a DataFrame with spark.read from within the pyspark shell.

df = spark.read.format("mongodb").load()

Spark samples the records to infer the schema of the collection.

df.printSchema()

The above operation produces the following shell output:

root
 |-- _id: double (nullable = true)
 |-- qty: double (nullable = true)
 |-- type: string (nullable = true)
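
To make the idea of sampling-based schema inference more concrete, the following is a minimal pure-Python sketch. It is a conceptual illustration only, not the connector's actual algorithm: the function name infer_schema, the fallback to "string" for conflicting types, and the use of Python type names are all assumptions made for this example. The numeric values are written as Python floats to mirror the double type reported in the shell output above.

```python
from collections import defaultdict


def infer_schema(sample):
    """Map each field name to the single type name observed in the sample.

    Hypothetical helper for illustration; fields with conflicting types
    fall back to "string" (an assumption, not documented connector behavior).
    """
    seen = defaultdict(set)
    for doc in sample:
        for field, value in doc.items():
            seen[field].add(type(value).__name__)
    return {
        field: (next(iter(types)) if len(types) == 1 else "string")
        for field, types in sorted(seen.items())
    }


# The fruit documents from above, with numbers written as floats.
fruit = [
    {"_id": 1.0, "type": "apple", "qty": 5.0},
    {"_id": 2.0, "type": "orange", "qty": 10.0},
    {"_id": 3.0, "type": "banana", "qty": 15.0},
]

schema = infer_schema(fruit)
for field, type_name in schema.items():
    print(f"|-- {field}: {type_name}")
```

Because the schema comes from a sample, a field that appears only in documents outside the sample would be missing from a result like this one.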

If you need to read from a different MongoDB collection, use the .option method when reading data into a DataFrame.

To read from a collection called contacts in a database called people, specify people.contacts in the input URI option.

df = spark.read.format("mongodb").option("uri", "mongodb://<host>/people.contacts").load()
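
If you read from several collections, it may help to build the URI in one place. The sketch below assumes the database.collection form described above; the helper names collection_uri and read_collection and the host localhost:27017 are hypothetical and not part of the connector. The read_collection function expects an existing SparkSession, such as the spark object in the pyspark shell, and is not called here.

```python
def collection_uri(host, database, collection):
    """Build a MongoDB URI that names a database and collection.

    Hypothetical helper; the host value is supplied by the caller.
    """
    return f"mongodb://{host}/{database}.{collection}"


def read_collection(spark, host, database, collection):
    """Load one collection into a DataFrame using the .option method."""
    uri = collection_uri(host, database, collection)
    return spark.read.format("mongodb").option("uri", uri).load()


# Assumed host for illustration only.
print(collection_uri("localhost:27017", "people", "contacts"))
```

In the pyspark shell, a call such as read_collection(spark, "localhost:27017", "people", "contacts") would then return the DataFrame for the contacts collection, assuming a MongoDB instance is reachable at that host.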
© 2022 MongoDB, Inc.

