For example, the MongoDB collection already has 2 fields, and the PySpark DataFrame contains 3 fields, including the primary key.
So, if I am updating the same document on that key, I want to keep the old fields too while writing the DataFrame. I don't want to lose the old data, only update the fields that are present in the DataFrame.
Is this possible? If there is a PySpark write configuration for this, please suggest it.
Example as below:
Data present in collection:
New DataFrame:
I want the result in the MongoDB collection as below:
| A | B | C |
| --- | --- | --- |
| x | 1 | 2 |
| y | 0 | 2 |
| z | 2 | 2 |
| w | 3 | 2 |
Hi Aishwarya,
The write configurations are defined here: https://www.mongodb.com/docs/spark-connector/current/configuration/write/
The relevant config would be the following:
`operationType`: Specifies the type of write operation to perform. You can set this to one of the following values:

- `insert`: Insert the data.
- `replace`: Replace an existing document that matches the `idFieldList` value with the new data. If no match exists, the value of `upsertDocument` indicates whether or not the connector inserts a new document.
- `update`: Update an existing document that matches the `idFieldList` value with the new data. If no match exists, the value of `upsertDocument` indicates whether or not the connector inserts a new document.
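For a case like the example above, a minimal sketch using the 10.x connector might look like the following. The connector package version, connection URI, and the `test`/`example` database and collection names are placeholders you would replace with your own; the `operationType`, `idFieldList`, and `upsertDocument` options come from the write configuration page linked above.

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("mongo-partial-update")
    # Assumption: connector artifact and version; adjust to your Spark/Scala setup.
    .config("spark.jars.packages", "org.mongodb.spark:mongo-spark-connector_2.12:10.2.1")
    # Assumption: local MongoDB instance.
    .config("spark.mongodb.write.connection.uri", "mongodb://localhost:27017")
    .getOrCreate()
)

# DataFrame with the key column A plus the fields to update.
df = spark.createDataFrame(
    [("x", 1, 2), ("y", 0, 2), ("z", 2, 2), ("w", 3, 2)],
    ["A", "B", "C"],
)

(
    df.write.format("mongodb")
    .mode("append")
    .option("database", "test")          # placeholder database name
    .option("collection", "example")     # placeholder collection name
    .option("operationType", "update")   # update matched documents instead of replacing them
    .option("idFieldList", "A")          # match existing documents on field A
    .option("upsertDocument", "true")    # insert a new document when no match exists
    .save()
)
```

With `operationType` set to `update`, the connector should only set the fields present in the DataFrame and leave the other fields of the matched document in place, whereas `replace` would overwrite the whole document.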
Let us know if this answered your question.