Batch Write Configuration Options

Overview

You can configure the following properties when writing data to MongoDB in batch mode.

Note

If you use SparkConf to set the connector's write configurations, prefix spark.mongodb.write. to each property.
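As a rough illustration of the prefix rule, the following Python sketch turns bare property names into the keys SparkConf expects. The helper name with_write_prefix is hypothetical and not part of the connector; the sample values are the ones used later on this page.

```python
# Hypothetical helper (not part of the connector): adds the prefix that
# SparkConf expects to each bare write property name.
WRITE_PREFIX = "spark.mongodb.write."


def with_write_prefix(properties):
    """Return a new dict whose keys carry the spark.mongodb.write. prefix."""
    return {WRITE_PREFIX + name: str(value) for name, value in properties.items()}


conf_entries = with_write_prefix({
    "connection.uri": "mongodb://127.0.0.1/",
    "database": "myDB",
    "collection": "myCollection",
})

# Print the entries in key=value form, one per line.
for key, value in sorted(conf_entries.items()):
    print(f"{key}={value}")
```

In PySpark, each resulting pair could then be applied with SparkConf().set(key, value).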

Property name

Description

connection.uri

Required.
The connection string configuration key.

Default: mongodb://localhost:27017/

database

Required.

The database name configuration.

collection

Required.

The collection name configuration.

comment

The comment to append to the write operation. Comments appear in the
output of the Database Profiler.

Default: None

mongoClientFactory

The MongoClientFactory configuration key.
You can specify a custom implementation, which must implement the
com.mongodb.spark.sql.connector.connection.MongoClientFactory
interface.

Default: com.mongodb.spark.sql.connector.connection.DefaultMongoClientFactory

convertJson

Specifies whether the connector parses string values and converts extended JSON
into BSON.

This setting accepts the following values:

any: The connector converts all JSON values to BSON.
- "{a: 1}" becomes {a: 1}.
- "[1, 2, 3]" becomes [1, 2, 3].
- "true" becomes true.
- "01234" becomes 1234.
- "{a:b:c}" doesn't change.
objectOrArrayOnly: The connector converts only JSON objects and arrays to BSON.
- "{a: 1}" becomes {a: 1}.
- "[1, 2, 3]" becomes [1, 2, 3].
- "true" doesn't change.
- "01234" doesn't change.
- "{a:b:c}" doesn't change.
false: The connector leaves all values as strings.

Default: false

idFieldList

Specifies a field or list of fields by which to split the collection data. To specify more than one field, separate the field names with commas, as shown in the following example:

"fieldName1,fieldName2"

Default: _id

ignoreNullValues

When true, the connector ignores any null values when writing,
including null values in arrays and nested documents.

Default: false

maxBatchSize

Specifies the maximum number of operations to batch in bulk
operations.

Default: 512

operationType

Specifies the type of write operation to perform. You can set this to one of the following values:

insert: Insert the data.
replace: Replace an existing document that matches the idFieldList value with the new data. If no match exists, the value of upsertDocument indicates whether the connector inserts a new document.
update: Update an existing document that matches the idFieldList value with the new data. If no match exists, the value of upsertDocument indicates whether the connector inserts a new document.

Default: replace

ordered

Specifies whether to perform ordered bulk operations.

Default: true

upsertDocument

When true, replace and update operations insert the data
if no match exists.

For time series collections, you must set upsertDocument to
false.

Default: true

writeConcern.w

Specifies w, a write-concern option requesting acknowledgment that
the write operation has propagated to a specified number of MongoDB
nodes.

For a list of allowed values for this option, see WriteConcern
w Option in the MongoDB Server
manual.

Default: Acknowledged

writeConcern.journal

Specifies j, a write-concern option requesting acknowledgment that
the data has been written to the on-disk journal for the criteria
specified in the w option. You can specify either true or
false.

For more information on j values, see WriteConcern j
Option in the MongoDB Server
manual.

writeConcern.wTimeoutMS

Specifies wTimeoutMS, a write-concern option to return an error
when a write operation exceeds the specified number of milliseconds. If you
use this optional setting, you must specify a nonnegative integer.

For more information on wTimeoutMS values, see
WriteConcern wtimeout in
the MongoDB Server manual.

truncateMode

Specifies how to truncate a collection when performing an overwrite. You can set this option to one of the following values:

TruncateMode.DROP: Default. Drops the collection.
TruncateMode.TRUNCATE: Deletes all entries in the collection but preserves indexes, collection options, and any sharding configuration. This is slower than a drop operation.

ignoreDuplicatesOnInsert

When set to true, the connector ignores duplicate key errors when performing
unordered insert operations. The data being inserted must include an _id
field value or whichever fields are specified in the idFieldList option.

Default: false
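The options in the table above can be collected in a plain dictionary before they are handed to a DataFrame writer. The following Python sketch is one possible arrangement, not the connector's own API: the function name build_write_options, the time_series flag, and the sample field names are assumptions made for illustration. It encodes two rules stated above: multiple idFieldList entries are comma-separated, and time series collections require upsertDocument to be false.

```python
# Hypothetical helper that assembles batch write options as strings.
# Only the option names and allowed values come from the table above.
VALID_OPERATION_TYPES = {"insert", "replace", "update"}


def build_write_options(database, collection, id_fields=("_id",),
                        operation_type="replace", upsert=True,
                        time_series=False):
    if operation_type not in VALID_OPERATION_TYPES:
        raise ValueError(f"unsupported operationType: {operation_type}")
    if time_series and upsert:
        # The table states that time series collections need upsertDocument=false.
        raise ValueError("set upsert=False when writing to a time series collection")
    return {
        "database": database,
        "collection": collection,
        "idFieldList": ",".join(id_fields),  # comma-separated field names
        "operationType": operation_type,
        "upsertDocument": str(upsert).lower(),
    }


options = build_write_options(
    "myDB", "myCollection",
    id_fields=("fieldName1", "fieldName2"),
    operation_type="update",
)
print(options)
```

In PySpark, a dictionary like this could be passed with df.write.format("mongodb").mode("append").options(**options).save(), assuming df is an existing DataFrame and the connection URI is configured elsewhere.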

Specifying Properties in connection.uri

If you use SparkConf to specify any of the previous settings, you can either include them in the connection.uri setting or list them individually.

The following code example shows how to specify the database, collection, and convertJson setting as part of the connection.uri setting:

spark.mongodb.write.connection.uri=mongodb://127.0.0.1/myDB.myCollection?convertJson=any

To keep the connection.uri shorter and make the settings easier to read, you can specify them individually instead:

spark.mongodb.write.connection.uri=mongodb://127.0.0.1/
spark.mongodb.write.database=myDB
spark.mongodb.write.collection=myCollection
spark.mongodb.write.convertJson=any

Important

If you specify a setting both in the connection.uri and on its own line, the connection.uri setting takes precedence. For example, in the following configuration, the connection database is foobar, not bar:

spark.mongodb.write.connection.uri=mongodb://127.0.0.1/foobar
spark.mongodb.write.database=bar
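The precedence rule can be pictured with a short Python sketch. This is not the connector's implementation; the function effective_database is hypothetical and handles only the simple URI shapes shown on this page.

```python
from urllib.parse import urlsplit


def effective_database(settings):
    """Illustrate the rule: a database named in connection.uri wins."""
    uri = settings.get("spark.mongodb.write.connection.uri", "")
    # The URI path looks like "/database" or "/database.collection".
    path = urlsplit(uri).path.lstrip("/")
    uri_database = path.split(".", 1)[0]
    if uri_database:
        return uri_database
    # No database in the URI: fall back to the separately listed setting.
    return settings.get("spark.mongodb.write.database")


settings = {
    "spark.mongodb.write.connection.uri": "mongodb://127.0.0.1/foobar",
    "spark.mongodb.write.database": "bar",
}
print(effective_database(settings))  # prints "foobar"
```

When the URI carries no database, as in mongodb://127.0.0.1/, the sketch falls back to the individually listed database value.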
