For example, my MongoDB collection already has 2 fields, and my PySpark DataFrame contains 3 fields including the primary key.
When I update a document matched on that key, I want to keep the old fields while writing the DataFrame. I don't want to lose the existing data; I only want to update the fields that are present in the DataFrame.
Is this possible? If there is a PySpark write configuration for this, a pointer would be helpful.
Example as below:
Data present in collection:
New Dataframe:
I want result in mongodb collection as below:
| A | B | C |
| --- | --- | --- |
| x | 1 | 2 |
| y | 0 | 2 |
| z | 2 | 2 |
| w | 3 | 2 |
Hi Aishwarya,
The write configurations are defined here: https://www.mongodb.com/docs/spark-connector/current/configuration/write/
The relevant config would be the following:
`operationType`: Specifies the type of write operation to perform. You can set this to one of the following values:

- `insert`: insert the data.
- `replace`: replace an existing document that matches the `idFieldList` value with the new data. If no match exists, the value of `upsertDocument` indicates whether or not the connector inserts a new document.
- `update`: update an existing document that matches the `idFieldList` value with the new data. If no match exists, the value of `upsertDocument` indicates whether or not the connector inserts a new document.
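As a sketch of how those options fit together, here is a minimal PySpark write using the Spark Connector's documented `operationType`, `idFieldList`, and `upsertDocument` settings. The database, collection, and connection string are placeholders; adjust them to your deployment.

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("mongo-partial-update")
    # Placeholder connection string; point this at your deployment.
    .config("spark.mongodb.write.connection.uri", "mongodb://localhost:27017")
    .getOrCreate()
)

# DataFrame holding the key column "A" plus the fields to update.
df = spark.createDataFrame(
    [("x", 1, 2), ("y", 0, 2), ("z", 2, 2), ("w", 3, 2)],
    ["A", "B", "C"],
)

(
    df.write.format("mongodb")
    .mode("append")
    .option("database", "mydb")          # placeholder database name
    .option("collection", "mycoll")      # placeholder collection name
    .option("operationType", "update")   # update only the DataFrame's fields
    .option("idFieldList", "A")          # match existing documents on "A"
    .option("upsertDocument", "true")    # insert when no match exists
    .save()
)
```

With `operationType` set to `update`, the connector only sets the fields present in the DataFrame, so fields that already exist in the matched document are preserved; `replace` would overwrite the whole document instead.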
Let us know if this answers your question.