I am trying to connect to MongoDB from PySpark. I have installed MongoDB 6 on an AWS EMR instance.
I have also installed the MongoDB Spark connector on the EMR cluster, but when I try to connect to MongoDB from Spark, I get a class not found exception.
Can someone please help me connect to MongoDB and read a collection from PySpark?
Are you able to connect to MongoDB using a regular MongoClient from the EMR instances? That would tell us whether this is a more general networking issue. Here's a thread that discusses such networking issues: Unable to read data from mongoDB using Pyspark or Python
Otherwise, here are some questions that can help us understand what's going on.
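If it helps, here is a minimal sketch of that check, assuming `pymongo` is installed on the EMR node. The function names `can_reach` and `ping_mongo` are made up for illustration, and the host, port, and URI are placeholders for your own values. The first function only tests whether the TCP port is reachable; the second sends a `ping` through MongoClient.

```python
import socket


def can_reach(host, port=27017, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


def ping_mongo(uri, timeout_ms=5000):
    """Return True if the server behind `uri` answers a ping command."""
    # Imported lazily so the TCP check above works without pymongo.
    from pymongo import MongoClient
    from pymongo.errors import PyMongoError

    client = MongoClient(uri, serverSelectionTimeoutMS=timeout_ms)
    try:
        client.admin.command("ping")
        return True
    except PyMongoError:
        return False
    finally:
        client.close()
```

For example, you could call `can_reach("localhost")` and then `ping_mongo("mongodb://localhost:27017")` on the node where the Spark driver runs. If the first call fails, the problem is probably at the network or security group level rather than in Spark.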
1) How is your MongoDB deployment set up? Is it self-hosted, or are you using MongoDB Atlas?
2) Can you share which version of the MongoDB Spark connector you are using?
3) Can you share the detailed error log?
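
The connector version matters because the setup differs between releases. As far as I know, the 10.x connector registers the `mongodb` format and reads its URI from `spark.mongodb.read.connection.uri`, while older releases use the `mongo` format and `spark.mongodb.input.uri`. A class not found exception often means the connector JAR is not on the Spark classpath at all. Below is a rough sketch under those assumptions. The helper names `mongo_spark_settings` and `read_collection` are hypothetical, and the Scala version has to match your Spark build.

```python
def mongo_spark_settings(connector_version, uri, scala_version="2.12"):
    """Build the package coordinate, format name, and URI config key."""
    major = int(connector_version.split(".")[0])
    package = (
        "org.mongodb.spark:mongo-spark-connector_"
        f"{scala_version}:{connector_version}"
    )
    if major >= 10:
        # Assumption: 10.x and later use the "mongodb" short name.
        fmt = "mongodb"
        uri_key = "spark.mongodb.read.connection.uri"
    else:
        # Assumption: earlier releases use the "mongo" short name.
        fmt = "mongo"
        uri_key = "spark.mongodb.input.uri"
    return {"package": package, "format": fmt, "conf": {uri_key: uri}}


def read_collection(settings, database, collection):
    """Start a session with the connector package and load a collection."""
    # Imported lazily so the settings helper can be used without pyspark.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName("mongo-read-check")
    # spark.jars.packages only takes effect if set before the session starts.
    builder = builder.config("spark.jars.packages", settings["package"])
    for key, value in settings["conf"].items():
        builder = builder.config(key, value)
    spark = builder.getOrCreate()
    return (
        spark.read.format(settings["format"])
        .option("database", database)
        .option("collection", collection)
        .load()
    )
```

With your own version, URI, database, and collection names substituted in, `read_collection(mongo_spark_settings(version, uri), db, coll)` should return a DataFrame. If a session is already running (for example in the `pyspark` shell), passing the same coordinate via `--packages` at launch is likely needed instead.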