Writing data to a collection

Is it possible to insert only the documents whose _id isn’t already in the collection, and reject the others? I have tried Spark 3.3.2, the Spark MongoDB connector 10.x and a MongoDB 5.x instance, with no success.

_id in MongoDB has to be unique, so it should work out of the box.

It doesn’t; it throws a duplicate key error.

If insertMany() throws a duplicate key error and does not insert all the documents that don’t exist yet, you may try this:

/* insert documents in a temporary non-existing collection */
db.temp_collection.insertMany( documents )
/* use $merge to keep existing document and insert new documents */
db.temp_collection.aggregate( [ { "$merge" : {
    "into" : "permanent_collection" ,
    "on" : "_id" ,
    "whenMatched" : "keepExisting" ,
    "whenNotMatched": "insert"
} } ] )

Alternatively, you may use the $documents stage (MongoDB 5.1 and later) to avoid the temporary collection.

db.aggregate( [
    { "$documents" : documents } ,
    { "$merge" : {
        "into" : "permanent_collection" ,
        "on" : "_id" ,
        "whenMatched" : "keepExisting" ,
        "whenNotMatched": "insert"
    } }
] )
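
To make the effect of those $merge options concrete, here is a minimal in-memory sketch of what "keepExisting" plus "insert" does to the target collection. It is plain JavaScript, not MongoDB driver code: mergeKeepExisting is a hypothetical helper name, the sample documents are made up, and the skipped list exists only in this model (as far as I know, $merge itself does not report which documents it skipped).

```javascript
// In-memory model of $merge with "on": "_id",
// "whenMatched": "keepExisting" and "whenNotMatched": "insert".
// Illustration only: mergeKeepExisting is a hypothetical helper,
// not part of MongoDB or any driver.
function mergeKeepExisting(existingDocs, incomingDocs) {
  // Index the target collection by _id, the field named in "on".
  const byId = new Map(existingDocs.map((doc) => [doc._id, doc]));
  const skipped = [];

  for (const doc of incomingDocs) {
    if (byId.has(doc._id)) {
      // Matched: keep the existing document, ignore the incoming one.
      skipped.push(doc._id);
    } else {
      // Not matched: insert the incoming document.
      byId.set(doc._id, doc);
    }
  }

  return { collection: [...byId.values()], skipped };
}

// Made-up sample data: _id 2 already exists, _id 3 is new.
const permanent = [{ _id: 1, name: "old" }, { _id: 2, name: "kept" }];
const incoming = [{ _id: 2, name: "ignored" }, { _id: 3, name: "new" }];

const result = mergeKeepExisting(permanent, incoming);
console.log(result.collection); // documents after the merge
console.log(result.skipped);    // _id values that were already present
```

The point of the model is that no duplicate key error is raised: an incoming document with an existing _id is silently dropped, and everything else is inserted.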
