Help using mongo-spark-2.2.x with Atlas

First of all, I'm really sorry if my question is too simple for this forum. I am new to the machine learning community and am having to figure out solutions as my PhD project progresses.

I am trying to integrate Spark with my MongoDB cluster in Atlas and, unfortunately, I have been unable to do it so far.

I've cloned the GitHub repository and ran sbt check, but I still haven't figured out where the jar files are or what I am supposed to do.

I already have the jar files in the .ivy2 directory from the Hadoop-MongoDB connector, but that is not working either, even when I try to configure it while launching the SparkSession.

If someone could help me, I would really appreciate it.

It might be easier to just use the compiled Spark Connector that is already available in Maven.

config("spark.jars.packages", "org.mongodb.spark:mongo-spark-connector_2.12:3.0.0")
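If you prefer, the same dependency can be resolved at launch time from the command line instead of in code, with the --packages flag (assuming a local Spark installation with pyspark/spark-submit on the PATH):

```shell
# Equivalent to setting spark.jars.packages in the session builder:
# Spark downloads the connector and its dependencies from Maven Central
# into the local ivy cache before starting the shell.
pyspark --packages org.mongodb.spark:mongo-spark-connector_2.12:3.0.0
```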

Something like this works in Python:

from pyspark.sql import SparkSession

spark = SparkSession.\
    builder.\
    appName("pyspark-notebook2").\
    master("spark://spark-master:7077").\
    config("spark.executor.memory", "1g").\
    config("spark.mongodb.input.uri", "mongodb://mongo1:27017,mongo2:27018,mongo3:27019/Stocks.Source?replicaSet=rs0").\
    config("spark.mongodb.output.uri", "mongodb://mongo1:27017,mongo2:27018,mongo3:27019/Stocks.Source?replicaSet=rs0").\
    config("spark.jars.packages", "org.mongodb.spark:mongo-spark-connector_2.12:3.0.0").\
    getOrCreate()
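Since the original question is about Atlas, note that Atlas clusters are normally addressed with a single mongodb+srv:// connection string rather than an explicit host list like the one above. A minimal sketch of building the URI (the user, password, and cluster host below are placeholders, not real values):

```python
# Hypothetical Atlas credentials and cluster address -- replace with your own.
user = "myUser"
password = "myPassword"
cluster = "cluster0.abcde.mongodb.net"  # placeholder SRV hostname from Atlas
database = "Stocks"
collection = "Source"

# Atlas uses one mongodb+srv:// URI; the connector reads the
# database.collection pair from the path component.
uri = f"mongodb+srv://{user}:{password}@{cluster}/{database}.{collection}"

# This URI would replace the host-list ones in the builder above, e.g.:
#   .config("spark.mongodb.input.uri", uri)
#   .config("spark.mongodb.output.uri", uri)
print(uri)
```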

Here is an example of using the Spark Connector with MongoDB.
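As a minimal sketch: once a session like the one above exists, loading the configured collection into a DataFrame is a one-liner (this assumes the 3.0.x connector, which registers "mongo" as its data source short name, and a reachable cluster):

```python
# Reads from the collection named by spark.mongodb.input.uri;
# the schema is inferred by sampling documents.
df = spark.read.format("mongo").load()
df.printSchema()
```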
