I’ve been using Apache Spark(pyspark) to read from MongoDB Atlas, I’ve a shared(free) cluster - which has a limit of 512 MB storage I’m trying to migrate to serverless, but somehow unable to connect to the serverless instance - error
pyspark.sql.utils.IllegalArgumentException: requirement failed: Invalid uri: 'mongodb+srv://vani:<password>@versa-serverless.w9yss.mongodb.net/versa?retryWrites=true&w=majority'
Pls note : I’m able to connect to the instance using pymongo, but not using pyspark.
Here is the pyspark code (Not Working):
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("MongoDB operations").getOrCreate()
print(" spark ", spark)
# cluster0 - is the free version, and i'm able to connect to this
# mongoConnUri = "mongodb+srv://vani:password@cluster0.w9yss.mongodb.net/?retryWrites=true&w=majority"
mongoConnUri = "mongodb+srv://vani:password@versa-serverless.w9yss.mongodb.net/?retryWrites=true&w=majority"
mongoDB = "versa"
collection = "name_map_unique_ip"
df = spark.read\
.format("mongo") \
.option("uri", mongoConnUri) \
.option("database", mongoDB) \
.option("collection", collection) \
.load()
Error :
22/07/26 12:25:36 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir.
22/07/26 12:25:36 INFO SharedState: Warehouse path is 'file:/Users/karanalang/PycharmProjects/Versa-composer-mongo/composer_dags/spark-warehouse'.
spark <pyspark.sql.session.SparkSession object at 0x7fa1d8b9d5e0>
Traceback (most recent call last):
File "/Users/karanalang/PycharmProjects/Kafka/python_mongo/StructuredStream_readFromMongoServerless.py", line 30, in <module>
df = spark.read\
File "/Users/karanalang/Documents/Technology/spark-3.2.0-bin-hadoop3.2/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 164, in load
File "/Users/karanalang/Documents/Technology/spark-3.2.0-bin-hadoop3.2/python/lib/py4j-0.10.9.2-src.zip/py4j/java_gateway.py", line 1309, in __call__
File "/Users/karanalang/Documents/Technology/spark-3.2.0-bin-hadoop3.2/python/lib/pyspark.zip/pyspark/sql/utils.py", line 117, in deco
pyspark.sql.utils.IllegalArgumentException: requirement failed: Invalid uri: 'mongodb+srv://vani:password@versa-serverless.w9yss.mongodb.net/?retryWrites=true&w=majority'
22/07/26 12:25:36 INFO SparkContext: Invoking stop() from shutdown hook
22/07/26 12:25:36 INFO SparkUI: Stopped Spark web UI at http://10.42.28.205:4040
pymongo code (am able to connect using the same uri):
from pymongo import MongoClient
client = MongoClient("mongodb+srv://vani:password@versa-serverless.w9yss.mongodb.net/vani?retryWrites=true&w=majority")
print(client)
all_dbs = client.list_database_names()
print(f"all_dbs : {all_dbs}")
Spark-submit command :
spark-submit --packages org.mongodb.spark:mongo-spark-connector:10.0.2 ~/PycharmProjects/Kafka/python_mongo/StructuredStream_readFromMongoServerless.py
any ideas how to debug/fix this ?
tia!