IllegalArgumentException: requirement failed: Invalid uri

Hi, I got my URI from MongoDB Atlas and only had to fill in my password and the names of the database and collection, but even then it keeps giving me this Invalid uri error. I tried changing my password, but I still cannot write into my MongoDB collection.

I'm on the serverless tier, if that's relevant.

This is the URI I'm using (the one given to me by Atlas, password not included):

mongodb+srv://jmcmt87:<password>@twittermongoinstance.db1xm.mongodb.net/twitter_data.aggregated_data?retryWrites=true&w=majority

And in case it’s relevant, this is my configuration:

packages = ','.join([
    'org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.1',
    'com.amazonaws:aws-java-sdk:1.11.563',
    'org.apache.hadoop:hadoop-aws:3.2.2',
    'org.apache.hadoop:hadoop-client-api:3.2.2',
    'org.apache.hadoop:hadoop-client-runtime:3.2.2',
    'org.apache.hadoop:hadoop-yarn-server-web-proxy:3.2.2',
    'com.johnsnowlabs.nlp:spark-nlp-spark32_2.12:3.4.2',
    'org.mongodb.spark:mongo-spark-connector_2.12:3.0.1'
])

spark = SparkSession.builder.appName('twitter_app_nlp')\
    .master("local[*]")\
    .config('spark.jars.packages', packages) \
    .config('spark.streaming.stopGracefullyOnShutdown', 'true')\
    .config('spark.hadoop.fs.s3a.aws.credentials.provider', 
            'org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider') \
    .config('spark.hadoop.fs.s3a.access.key', ACCESS_KEY) \
    .config('spark.hadoop.fs.s3a.secret.key', SECRET_ACCESS_KEY) \
    .config("spark.hadoop.fs.s3a.impl",
            "org.apache.hadoop.fs.s3a.S3AFileSystem") \
    .config('spark.sql.shuffle.partitions', 3) \
    .config("spark.driver.memory","8G")\
    .config("spark.driver.maxResultSize", "0") \
    .config("spark.kryoserializer.buffer.max", "2000M")\
    .config("spark.mongodb.input.uri", mongoDB) \
    .config("spark.mongodb.output.uri", mongoDB) \
    .getOrCreate()

Where the mongoDB variable is the URI string shown above.

What could be the problem?


Can you connect via the shell?
Is your db name twitter_data.aggregated_data correct?

My db name is twitter_data, but the collection where I want to write everything is aggregated_data, so I thought it would be twitter_data.aggregated_data.

Using the shell I can connect to the db, but only to the db, not the collection. If I put the collection in the path, as in twitter_data.aggregated_data, it says it doesn't exist, even though I have it created in the twitter_data db in Atlas.

I just tried writing to the database alone (no collection added), since I can connect to it through the shell, but I still get the same error message.

You connect to your db, not to a collection, through the URI.
Once you are connected to your db, you can create and query your collections.
What operation did you perform after connecting to your db with the shell, and what error did you get?
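
For reference, a minimal sketch of that separation, using the host, user, database, and collection names quoted in this thread (<password> stays a placeholder): the URI path carries only the database, and the collection gets its own connector option.

from pyspark.sql import SparkSession

# The URI path names only the database; the collection is
# supplied separately through the connector's config keys.
mongoDB = ('mongodb+srv://jmcmt87:<password>'
           '@twittermongoinstance.db1xm.mongodb.net/'
           'twitter_data?retryWrites=true&w=majority')

spark = SparkSession.builder.appName('twitter_app_nlp') \
    .config('spark.mongodb.output.uri', mongoDB) \
    .config('spark.mongodb.output.database', 'twitter_data') \
    .config('spark.mongodb.output.collection', 'aggregated_data') \
    .getOrCreate()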

In the shell I just tried connecting to it like this:

mongosh "mongodb+srv://twittermongoinstance.db1xm.mongodb.net/twitter_data" --apiVersion 1 --username 'jmcmt87'

But what I want to do is write my data from PySpark to the MongoDB database in Atlas; this is the command I use:

agg_df.write.format("mongo").mode("append").option("uri", mongoDB).save()

Where the mongoDB variable is the following string:

'mongodb+srv://jmcmt87:<password>@twittermongoinstance.db1xm.mongodb.net/twitter_data?retryWrites=true&w=majority'

I used these settings to create the Spark session:

spark = SparkSession.builder \
    .appName(appName) \
    .config("spark.mongodb.input.uri", "mongodb+srv://user:password@cluster.url.net/databasename?retryWrites=true&w=majority") \
    .config("spark.mongodb.output.uri", "mongodb+srv://user:password@cluster.url.net/databasename?retryWrites=true&w=majority") \
    .getOrCreate()

These for writing:

# Write a dataframe named df to MongoDB
df.write.format("mongo") \
    .option("spark.mongodb.output.collection", "collection_name") \
    .mode("append") \
    .save()

And these for reading:

# Read data from MongoDB
df = spark.read.format('mongo').option("spark.mongodb.input.collection", "collection_name").load()
df.printSchema()
df.show()
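
Applied to the names from the original post, the write would look something like this (a sketch; twitter_data and aggregated_data are the database and collection quoted above):

# With the database already set via the session's output URI
# (.../twitter_data?...), only the target collection is named here.
agg_df.write.format("mongo") \
    .option("spark.mongodb.output.collection", "aggregated_data") \
    .mode("append") \
    .save()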

Hope this helps!

I'm having similar issues; I've created a separate post.
Also, here's the Stack Overflow link.