Docs Menu

Docs HomeMongoDB Spark Connector

FAQ

For any MongoDB deployment, the Mongo Spark Connector sets the preferred location for a DataFrame or Dataset to be where the data is:

  • For a non sharded system, it sets the preferred location to be the hostname(s) of the standalone or the replica set.

  • For a sharded system, it sets the preferred location to be the hostname(s) of the shards.

To promote data locality,

In MongoDB deployments with mixed versions of mongod, it is possible to get an Unrecognized pipeline stage name: '$sample' error. To mitigate this situation, explicitly configure the partitioner to use and define the Schema when using DataFrames.

To use mTLS, include the following options when you run spark-submit:

--driver-java-options -Djavax.net.ssl.trustStore=<path to your truststore.jks file> \
--driver-java-options -Djavax.net.ssl.trustStorePassword=<your truststore password> \
--driver-java-options -Djavax.net.ssl.keyStore=<path to your keystore.jks file> \
--driver-java-options -Djavax.net.ssl.keyStorePassword=<your keystore password> \
-conf spark.executor.extraJavaOptions=-Djavax.net.ssl.trustStore=<path to your truststore.jks file> \
-conf spark.executor.extraJavaOptions=-Djavax.net.ssl.trustStorePassword=<your truststore password> \
-conf spark.executor.extraJavaOptions=-Djavax.net.ssl.keyStore=<path to your keystore.jks file> \
-conf spark.executor.extraJavaOptions=-Djavax.net.ssl.keyStorePassword=<your keystore password> \
←  Structured Streaming with MongoDBRelease Notes →
Share Feedback
© 2023 MongoDB, Inc.

About

  • Careers
  • Investor Relations
  • Legal Notices
  • Privacy Notices
  • Security Information
  • Trust Center
© 2023 MongoDB, Inc.