Write to MongoDB from PySpark

Hello,

I’m working on an Ubuntu machine where everything is installed and I normally use Hadoop and PySpark without problems. I’m trying to write a Spark DataFrame to MongoDB, but I keep getting an error. I have followed all the necessary steps, but still no luck. The error is about the mongoClientFactory config (DefaultMongoClientFactory); from the connector documentation I can see this value is optional, and I even tried setting the default value manually, but no luck. Please find the steps with commands/output below:


  • mongo: v3.2.10

  • connector: mongo-spark-connector_2.12-10.2.0.jar

  • spark: 3.1.3

  • dependencies:

    bson-3.2.0.jar

    mongodb-driver-3.2.0.jar

    mongodb-driver-core-3.2.0.jar
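
For completeness, this is roughly how I create the session; it is a sketch reconstructed from the config dump below, not my exact script:

```python
from pyspark.sql import SparkSession

# Sketch of the session setup; the values match the config dump below.
spark = (
    SparkSession.builder
    .appName("MongoDB")
    .master("local[*]")
    .config("spark.jars.packages",
            "org.mongodb.spark:mongo-spark-connector_2.12-10.2.0")
    .config("spark.mongodb.connection.uri", "mongodb://localhost:27017/")
    .config("spark.mongodb.database", "twitter_db")
    .config("spark.mongodb.collection", "tweets")
    .getOrCreate()
)
```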

print(spark.sparkContext.getConf().toDebugString())

spark.app.id=local-1697738922184
spark.app.name=MongoDB
spark.app.startTime=1697738921889
spark.driver.host=10.0.2.15
spark.driver.port=43157
spark.executor.id=driver
spark.jars.packages=org.mongodb.spark:mongo-spark-connector_2.12-10.2.0
spark.master=local[*]
spark.mongodb.collection=tweets
spark.mongodb.connection.uri=mongodb://localhost:27017/
spark.mongodb.database=twitter_db
spark.rdd.compress=True
spark.serializer.objectStreamReset=100
spark.sql.catalogImplementation=hive
spark.sql.warehouse.dir=file:/home/hduser/Desktop/CA/spark-warehouse
spark.submit.deployMode=client
spark.submit.pyFiles=
spark.ui.showConsoleProgress=true

data.write.format("mongodb").mode("overwrite").save()

Py4JJavaError: An error occurred while calling o208.save.

: com.mongodb.spark.sql.connector.exceptions.ConfigException: Invalid value com.mongodb.spark.sql.connector.connection.DefaultMongoClientFactory for configuration mongoClientFactory

at com.mongodb.spark.sql.connector.config.ClassHelper.createInstance(ClassHelper.java:79)
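
For reference, this is roughly the variant I tried when I set the value manually, with the options inlined on the writer (option names taken from the 10.x connector docs; a sketch, not my exact code):

```python
# Sketch of the write with per-call options instead of session config;
# the mongoClientFactory line is the "default value" I tried to set by hand.
(data.write.format("mongodb")
    .mode("overwrite")
    .option("connection.uri", "mongodb://localhost:27017/")
    .option("database", "twitter_db")
    .option("collection", "tweets")
    .option("mongoClientFactory",
            "com.mongodb.spark.sql.connector.connection.DefaultMongoClientFactory")
    .save())
```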

I have since updated those jars plus the Spark and Scala versions, and also added mongo-java-driver-3.9.1.jar, but still no luck. The error I get now is:

NoSuchMethodError: org.apache.spark.sql.catalyst.encoders.RowEncoder$.

I then replaced spark-catalyst 2.12 with 2.13, but after that I couldn’t even initialize the Spark session, so I reverted it.
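
In case the Scala build matters here, this is the quick check I can run from PySpark to confirm which Scala version my Spark installation was compiled against (it goes through the py4j gateway, so treat it as a sketch):

```python
# Print the Spark version and the Scala version Spark was compiled against,
# to confirm whether 2.12 or 2.13 connector artifacts are the right ones.
print(spark.version)
print(spark.sparkContext._jvm.scala.util.Properties.versionString())
```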