For example, the MongoDB collection already has 2 fields, and the PySpark DataFrame contains 3 fields, including the primary key.
So, if I am updating the same document on that key, I want to keep the old fields too while writing the DataFrame. I don't want to lose the old data, only update the fields that are present in the DataFrame.
Is this possible? If there is a PySpark write configuration for this, please suggest it.
Example as below:
Data present in collection:
New DataFrame:
I want the result in the MongoDB collection as below:
| A | B | C |
| --- | --- | --- |
| x | 1 | 2 |
| y | 0 | 2 |
| z | 2 | 2 |
| w | 3 | 2 |
Hi Aishwarya,
The write configurations are defined here: https://www.mongodb.com/docs/spark-connector/current/configuration/write/
The relevant config would be the following:
`operationType`: Specifies the type of write operation to perform. You can set this to one of the following values:

- `insert`: Insert the data.
- `replace`: Replace an existing document that matches the `idFieldList` value with the new data. If no match exists, the value of `upsertDocument` indicates whether or not the connector inserts a new document.
- `update`: Update an existing document that matches the `idFieldList` value with the new data. If no match exists, the value of `upsertDocument` indicates whether or not the connector inserts a new document.
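For a case like the example above, a minimal sketch using the 10.x connector might look like the following. The connector package version, connection URI, and the `test`/`example` database and collection names are placeholders you would replace with your own; the `operationType`, `idFieldList`, and `upsertDocument` options come from the write configuration page linked above.

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("mongo-partial-update")
    # Assumption: connector artifact and version; adjust to your Spark/Scala setup.
    .config("spark.jars.packages", "org.mongodb.spark:mongo-spark-connector_2.12:10.2.1")
    # Assumption: local MongoDB instance.
    .config("spark.mongodb.write.connection.uri", "mongodb://localhost:27017")
    .getOrCreate()
)

# DataFrame with the key column A plus the fields to update.
df = spark.createDataFrame(
    [("x", 1, 2), ("y", 0, 2), ("z", 2, 2), ("w", 3, 2)],
    ["A", "B", "C"],
)

(
    df.write.format("mongodb")
    .mode("append")
    .option("database", "test")          # placeholder database name
    .option("collection", "example")     # placeholder collection name
    .option("operationType", "update")   # update matched documents instead of replacing them
    .option("idFieldList", "A")          # match existing documents on field A
    .option("upsertDocument", "true")    # insert a new document when no match exists
    .save()
)
```

With `operationType` set to `update`, the connector should only set the fields present in the DataFrame and leave the other fields of the matched document in place, whereas `replace` would overwrite the whole document.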
Let us know if this answered your question.