Streaming Write Configuration Options

You can configure the following properties when writing data to MongoDB in streaming mode.

Note

If you use SparkConf to set the connector's write configurations, prefix each property with spark.mongodb.write.
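For example, a minimal PySpark sketch that applies this prefix through SparkConf might look like the following. The connection string, database, and collection values are placeholders:

from pyspark import SparkConf
from pyspark.sql import SparkSession

# Each write property carries the spark.mongodb.write. prefix.
# The values below are placeholders.
conf = (
    SparkConf()
    .set("spark.mongodb.write.connection.uri", "mongodb://127.0.0.1/")
    .set("spark.mongodb.write.database", "myDB")
    .set("spark.mongodb.write.collection", "myCollection")
)

spark = SparkSession.builder.config(conf=conf).getOrCreate()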

Property name
Description

connection.uri

Required.
The connection string configuration key.

Default: mongodb://localhost:27017/

database

Required.
The database name configuration.

collection

Required.
The collection name configuration.

comment

The comment to append to the write operation. Comments appear in the output of the Database Profiler.

Default: None

mongoClientFactory

MongoClientFactory configuration key.
You can specify a custom implementation that must implement the com.mongodb.spark.sql.connector.connection.MongoClientFactory interface.

Default: com.mongodb.spark.sql.connector.connection.DefaultMongoClientFactory

convertJson

Specifies if the connector parses string values and converts extended JSON into BSON.

This setting accepts the following values:
  • any: The connector converts all JSON values to BSON.

    • "{a: 1}" becomes {a: 1}

    • "[1, 2, 3]" becomes [1, 2, 3]

    • "true" becomes true

    • "01234" becomes 1234

    • "{a:b:c}" doesn't change.

  • objectOrArrayOnly: The connector converts only JSON objects and arrays to BSON.

    • "{a: 1}" becomes {a: 1}

    • "[1, 2, 3]" becomes [1, 2, 3]

    • "true" doesn't change.

    • "01234" doesn't change.

    • "{a:b:c}" doesn't change.

  • false: The connector leaves all values as strings.

Default: false

idFieldList

Specifies a field or list of fields by which to split the collection data. To specify more than one field, separate them using a comma as shown in the following example:
"fieldName1,fieldName2"
Default: _id

ignoreNullValues

When true, the connector ignores any null values when writing, including null values in arrays and nested documents.

Default: false

maxBatchSize

Specifies the maximum number of operations to batch in bulk operations.

Default: 512

operationType

Specifies the type of write operation to perform. You can set this to one of the following values:
  • insert: Inserts the data.

  • replace: Replaces an existing document that matches the idFieldList value with the new data. If no match exists, the value of upsertDocument indicates whether the connector inserts a new document.

  • update: Updates an existing document that matches the idFieldList value with the new data. If no match exists, the value of upsertDocument indicates whether the connector inserts a new document.


Default: replace

ordered

Specifies whether to perform ordered bulk operations.

Default: true

upsertDocument

When true, replace and update operations insert the data if no match exists.

For time series collections, you must set upsertDocument to false.

Default: true

writeConcern.w

Specifies w, a write-concern option requesting acknowledgment that the write operation has propagated to a specified number of MongoDB nodes.

For a list of allowed values for this option, see WriteConcern w Option in the MongoDB Server manual.

Default: Acknowledged

writeConcern.journal

Specifies j, a write-concern option requesting acknowledgment that the data has been written to the on-disk journal for the criteria specified in the w option. You can specify either true or false.

For more information on j values, see WriteConcern j Option in the MongoDB Server manual.

writeConcern.wTimeoutMS

Specifies wTimeoutMS, a write-concern option to return an error when a write operation exceeds the specified number of milliseconds. If you use this optional setting, you must specify a nonnegative integer.

For more information on wTimeoutMS values, see WriteConcern wtimeout in the MongoDB Server manual.

checkpointLocation

The absolute file path of the directory where the connector writes checkpoint information.


Default: None

forceDeleteTempCheckpointLocation

A Boolean value that specifies whether to delete existing checkpoint data.

Default: false

If you use SparkConf to specify any of the previous settings, you can either include them in the connection.uri setting or list them individually.

The following code example shows how to specify the database, collection, and convertJson setting as part of the connection.uri setting:

spark.mongodb.write.connection.uri=mongodb://127.0.0.1/myDB.myCollection?convertJson=any

To shorten the connection.uri and make the settings easier to read, you can instead specify them individually:

spark.mongodb.write.connection.uri=mongodb://127.0.0.1/
spark.mongodb.write.database=myDB
spark.mongodb.write.collection=myCollection
spark.mongodb.write.convertJson=any

Important

If you specify a setting both in the connection.uri and on its own line, the connection.uri setting takes precedence. For example, in the following configuration, the connection database is foobar:

spark.mongodb.write.connection.uri=mongodb://127.0.0.1/foobar
spark.mongodb.write.database=bar
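As a rough end-to-end sketch, a streaming write that uses the checkpointLocation and forceDeleteTempCheckpointLocation options from the table above might look like the following PySpark code. It assumes that streamingDF is an existing streaming DataFrame, that the connection.uri, database, and collection settings were already supplied through SparkConf as shown earlier, and that the checkpoint path is a placeholder:

# Sketch only: streamingDF is assumed to be an existing streaming DataFrame,
# and connection.uri, database, and collection are assumed to be set in SparkConf.
query = (
    streamingDF.writeStream
    .format("mongodb")
    .option("checkpointLocation", "/tmp/pyspark/")        # directory for checkpoint information
    .option("forceDeleteTempCheckpointLocation", "true")  # delete any existing checkpoint data
    .outputMode("append")
    .start()
)

query.awaitTermination()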
