FAQ

For any MongoDB deployment, the Mongo Spark Connector sets the preferred location for a DataFrame or Dataset to be where the data is:

  • For a non-sharded system, it sets the preferred location to be the hostname(s) of the standalone or the replica set.
  • For a sharded system, it sets the preferred location to be the hostname(s) of the shards.

To promote data locality,

In MongoDB deployments that run mixed versions of mongod, a read can fail with the error Unrecognized pipeline stage name: '$sample'. The error is raised when a mongod that does not support the $sample aggregation stage receives it. To mitigate this situation, explicitly configure the partitioner to use and explicitly define the schema when using DataFrames.
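As a minimal sketch of that mitigation, the following PySpark snippet sets a partitioner explicitly and supplies a schema so that nothing has to be inferred. It assumes the 10.x connector (the "mongodb" format name, the connection.uri, database, collection, and partitioner read options, and the PaginateBySizePartitioner class name); other connector versions use different option and class names, so check them against the configuration reference for your version. The helper names build_read_options and read_with_explicit_schema, the connection string, and the name and qty fields are hypothetical.

```python
from typing import Dict

# Assumption: fully qualified name of a 10.x partitioner, chosen here as an
# alternative to the sampling-based default. Verify it for your connector version.
PAGINATE_BY_SIZE = (
    "com.mongodb.spark.sql.connector.read.partitioner.PaginateBySizePartitioner"
)


def build_read_options(
    uri: str,
    database: str,
    collection: str,
    partitioner: str = PAGINATE_BY_SIZE,
) -> Dict[str, str]:
    """Return read options that name the partitioner explicitly."""
    return {
        "connection.uri": uri,
        "database": database,
        "collection": collection,
        "partitioner": partitioner,
    }


def read_with_explicit_schema(spark, options: Dict[str, str]):
    """Load a DataFrame with a declared schema instead of an inferred one.

    `spark` is an existing SparkSession that has the connector on its classpath.
    """
    # Imported here so the options helper above works without PySpark installed.
    from pyspark.sql.types import IntegerType, StringType, StructField, StructType

    # Hypothetical fields; replace them with the fields of your collection.
    schema = StructType(
        [
            StructField("name", StringType(), nullable=True),
            StructField("qty", IntegerType(), nullable=True),
        ]
    )
    return spark.read.format("mongodb").options(**options).schema(schema).load()
```

With a running SparkSession, a call might look like read_with_explicit_schema(spark, build_read_options("mongodb://localhost:27017", "test", "coll")).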

© 2022 MongoDB, Inc.