I am trying to write a basic pyspark script to connect to MongoDB. I am using Spark 3.1.2 and MongoDb driver 3.2.2.
My code is:
from pyspark.sql import SparkSession
# Create a SparkSession
spark = SparkSession.builder.appName("SparkSQL").getOrCreate()
spark = SparkSession \
.builder \
.appName("SparkSQL") \
.config("spark.mongodb.input.uri", "mongodb://127.0.0.1/client.coll") \
.config("spark.mongodb.output.uri", "mongodb://127.0.0.1/test.coll") \
.getOrCreate()
df = spark.read.format("mongo").load()
When I execute in Pyspark with /usr/local/spark/bin/pyspark --packages org.mongodb.spark:mongo-spark-connector_2.12:3.0.1 I get:
java.lang.NoClassDefFoundError: org/bson/conversions/Bson
I am very new to Spark. Could someone please help me understand how to install the missing Bson reference? I couldn’t see this in the sample code or MongoDB PySpark documentation.
Thanks in advance,
Ben.