Spark MongoDB Connector - Updating documents like Delta Lake

Hello Team,

We are using the Spark MongoDB Connector to write data from our Databricks Delta Lake tables into MongoDB.

When we use the Spark write mode “append”, we see that if the _id from the DataFrame already exists in MongoDB, the whole document is replaced with the new document from the DataFrame.
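
For reference, this is roughly our current write (a minimal sketch; the URI, database, and collection names are placeholders, and I am assuming the 3.x connector where the write format is “mongo”):

```python
# delta_df is the DataFrame read from our Delta table.
(
    delta_df.write
    .format("mongo")                        # MongoDB Spark Connector 3.x format (assumption)
    .mode("append")
    .option("uri", "mongodb://host:27017")  # placeholder connection string
    .option("database", "mydb")             # placeholder database
    .option("collection", "mycoll")         # placeholder collection
    # A row whose _id already exists in the collection ends up replacing
    # the entire existing document.
    .save()
)
```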

Instead, we would like to merge the documents, i.e. append new elements to the array fields of existing MongoDB documents from Spark, much like a Databricks Delta Lake upsert (MERGE).

Is this possible with the Spark MongoDB Connector?

Regards,
Puviarasu S.

Hello Team,

I was able to find the option “replaceDocument” → “false”, which, when set, no longer replaces the whole document and only updates the fields present in the DataFrame.
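
For example, this is the write I tested (same sketch and placeholder names as in my first post):

```python
(
    delta_df.write
    .format("mongo")
    .mode("append")
    .option("uri", "mongodb://host:27017")
    .option("database", "mydb")
    .option("collection", "mycoll")
    .option("replaceDocument", "false")  # update only the fields in the DataFrame row
    .save()
)
```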

But is there an option to push elements into the existing arrays with the Spark MongoDB Connector?

Consider the scenario:
MongoDB: { _id: 123, field1: ['a', 'b'], field2: 'value' }
Spark DataFrame: { _id: 123, field1: ['c', 'd'] }
Current output: { _id: 123, field1: ['c', 'd'], field2: 'value' }
Expected output: { _id: 123, field1: ['a', 'b', 'c', 'd'], field2: 'value' }

Is the expected output achievable with the Spark MongoDB Connector?
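
For context, the per-document update we are after is a $push with $each. If the connector cannot express this, the workaround I am considering is to write the rows myself from foreachPartition with pymongo (again only a sketch with placeholder names):

```python
from pymongo import MongoClient

def push_partition(rows):
    # One client per partition; URI, database, and collection are placeholders.
    client = MongoClient("mongodb://host:27017")
    coll = client["mydb"]["mycoll"]
    for row in rows:
        coll.update_one(
            {"_id": row["_id"]},
            # $push with $each appends every new element to the existing array;
            # upsert=True creates the document if this _id does not exist yet.
            {"$push": {"field1": {"$each": row["field1"]}}},
            upsert=True,
        )
    client.close()

delta_df.foreachPartition(push_partition)
```

A connector-native option would of course be preferable.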

Thank you.

Regards,
Puviarasu S.