PySpark MongoDB Connector

I am trying to write a basic PySpark script to connect to MongoDB. I am using Spark 3.1.2 and MongoDB driver 3.2.2.

My code is:
from pyspark.sql import SparkSession

# Create a SparkSession
spark = SparkSession.builder.appName("SparkSQL").getOrCreate()

spark = SparkSession \
    .builder \
    .appName("SparkSQL") \
    .config("spark.mongodb.input.uri", "mongodb://") \
    .config("spark.mongodb.output.uri", "mongodb://") \
    .getOrCreate()

# Load the collection named in the input URI into a DataFrame
df = spark.read.format("mongo").load()

When I run this in the PySpark shell, launched with /usr/local/spark/bin/pyspark --packages org.mongodb.spark:mongo-spark-connector_2.12:3.0.1, I get:

java.lang.NoClassDefFoundError: org/bson/conversions/Bson

I am very new to Spark. Could someone please help me understand how to install the missing Bson reference? I couldn’t see this in the sample code or MongoDB PySpark documentation.

Thanks in advance,


Looks like you don’t have all the dependencies installed for the MongoDB Spark Connector.
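
For context, org.bson.conversions.Bson lives in the bson jar, which the connector pulls in through the MongoDB Java driver, so this error usually means that jar never reached the classpath. One way to check is to look at which jars --packages actually downloaded. Below is a minimal sketch; it assumes the default Ivy cache directory ~/.ivy2/jars (this changes if spark.jars.ivy is set), and find_cached_jars is a hypothetical helper name, not part of Spark or the connector.

```python
from pathlib import Path


def find_cached_jars(jar_dir, keywords=("mongo", "bson")):
    """Return the sorted names of .jar files in jar_dir whose name contains any keyword."""
    directory = Path(jar_dir).expanduser()
    if not directory.is_dir():
        # Nothing has been downloaded to this location (or the path is wrong).
        return []
    return sorted(
        path.name
        for path in directory.glob("*.jar")
        if any(keyword in path.name.lower() for keyword in keywords)
    )


if __name__ == "__main__":
    # Assumed default cache location used by --packages; adjust if yours differs.
    for name in find_cached_jars("~/.ivy2/jars"):
        print(name)
```

If no bson jar shows up in the output, the dependency resolution probably did not complete, and rerunning with --packages after clearing the cached jars may be worth a try.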

I do have a Docker environment that will spin up Spark, MongoDB and a Jupyter notebook. This will get you up and running quickly.

Hi Robert, thank you for your reply. My apologies for not getting back to you earlier, I had forgotten about this post.

Thanks for the link to your Docker image, I’ll take a look. Do you have any instructions on how to set up all the dependencies? I have been through the MongoDB Spark documentation and couldn’t find a workable solution.

Thanks in advance,


Were you able to resolve this issue?
I am facing the same issue and have not found a suitable solution yet.
Saswata Dutta

Hi Saswata,

I don’t remember exactly what the solution was, but I think it might have been an issue with my environment. I would try a clean installation if you can. If you are still having issues, get back to me and I’ll share some PySpark code with a MongoDB connection and the commands I use to submit to the cluster.

Kind regards,


Hi Ben
I am using an AWS EMR instance where I installed MongoDB 6.
I am using Spark 3 and up. I have used the mongodb-spark connectors as provided by MongoDB.
I tried all the different options that are available in the documentation, but no luck.
I am trying to connect from a notebook.
Can you please help?

Hi Saswata,

I’m not familiar with AWS EMR, so I’m probably not much help to you. The only thing I can think of is that when I submit a job to the cluster, I have to specify which packages to load. For example, this is the command I execute:
spark-submit --packages org.mongodb.spark:mongo-spark-connector_2.12:3.0.1 --driver-memory 6G --master spark:// ./

Is it possible that when you execute the notebook, it isn’t including the MongoDB packages? Are you able to validate your solution outside of AWS (i.e. with a locally installed cluster and MongoDB instance)?
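
A minimal sketch of how that could be checked from inside a notebook, in case it helps. It relies on Spark's spark.jars.packages setting, which is what --packages populates; connector_coordinate, packages_include_connector and build_session are hypothetical helper names, and the Scala and connector versions are simply the ones from my command above. I have not tried this on EMR.

```python
def connector_coordinate(scala_version="2.12", connector_version="3.0.1"):
    """Build the Maven coordinate of the MongoDB Spark connector."""
    return f"org.mongodb.spark:mongo-spark-connector_{scala_version}:{connector_version}"


def packages_include_connector(packages_value):
    """Return True if a comma-separated spark.jars.packages value lists the connector."""
    if not packages_value:
        return False
    return any(
        item.strip().startswith("org.mongodb.spark:mongo-spark-connector_")
        for item in packages_value.split(",")
    )


def build_session(app_name="SparkSQL"):
    """Create a session that requests the connector package itself.

    Assumption: this only takes effect if no Spark session or JVM is already
    running in the notebook; otherwise the existing session is reused.
    """
    from pyspark.sql import SparkSession  # imported lazily so the helpers work without Spark

    return (
        SparkSession.builder.appName(app_name)
        .config("spark.jars.packages", connector_coordinate())
        .getOrCreate()
    )
```

With a running session called spark, packages_include_connector(spark.sparkContext.getConf().get("spark.jars.packages")) should report whether the notebook was started with the connector. If it returns False, restarting the kernel and creating the session through build_session() before anything else is one thing to try.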



