
To create a DataFrame, first create a SparkSession object, then call the object's createDataFrame() method. In the following example, createDataFrame() takes a list of tuples containing names and ages and a list of column names:

people = spark.createDataFrame([
    ("Bilbo Baggins", 50), ("Gandalf", 1000), ("Thorin", 195),
    ("Balin", 178), ("Kili", 77), ("Dwalin", 169), ("Oin", 167),
    ("Gloin", 158), ("Fili", 82), ("Bombur", None)
], ["name", "age"])

Write the people DataFrame to MongoDB by using the write method:

people.write.format("mongodb").mode("append").save()

This operation writes to the MongoDB database and collection specified in the spark.mongodb.write.connection.uri option that you set when starting the pyspark shell.

To view the contents of the DataFrame, use the show() method:

people.show()

In the pyspark shell, the operation prints the following output:

+-------------+----+
|         name| age|
+-------------+----+
|Bilbo Baggins|  50|
|      Gandalf|1000|
|       Thorin| 195|
|        Balin| 178|
|         Kili|  77|
|       Dwalin| 169|
|          Oin| 167|
|        Gloin| 158|
|         Fili|  82|
|       Bombur|null|
+-------------+----+

The printSchema() method prints out the DataFrame's schema:

people.printSchema()

In the pyspark shell, the operation prints the following output:

root
 |-- _id: struct (nullable = true)
 |    |-- oid: string (nullable = true)
 |-- age: long (nullable = true)
 |-- name: string (nullable = true)

If you need to write to a different MongoDB collection, use the .option() method with .write().

To write to a collection called contacts in a database called people, specify the collection and database with .option():

people.write.format("mongodb").mode("append") \
    .option("database", "people") \
    .option("collection", "contacts") \
    .save()
© 2022 MongoDB, Inc.
