Capture id of deleted document with Spark Structured Streaming


I’m trying to replicate a MongoDB collection to Delta Lake using the Spark Connector with Structured Streaming, but there is one problem.
When I set the option, I don’t get the deleted document at all, but that is expected.
However, if I omit the option, I only get a row with the _data field; all other fields are null.
I would expect at least the _id field to be populated so I can delete the corresponding entry in Delta Lake.

Can someone explain how to capture deleted documents with Structured Streaming?
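One approach (a sketch only, assuming connector v10; the option and field names should be checked against the connector docs, and the URI, database, and collection below are placeholders) is to leave the full-document-only option unset and supply an explicit schema, so that the change-event fields, including documentKey._id, are parsed instead of everything except _data coming back null:

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType

# Sketch: assumes the MongoDB Spark Connector v10 on the classpath and a
# MongoDB replica set (change streams require one). Connection details
# are placeholders.
spark = SparkSession.builder.appName("mongo-cdc-sketch").getOrCreate()

# Explicit schema for the raw change event, so Spark can parse it.
change_event_schema = StructType([
    StructField("_id", StringType()),            # resume token
    StructField("operationType", StringType()),  # insert/update/delete/...
    StructField("documentKey", StructType([
        StructField("_id", StringType())         # id of the affected document
    ])),
])

events = (
    spark.readStream
    .format("mongodb")
    .option("spark.mongodb.connection.uri", "mongodb://localhost:27017")
    .option("spark.mongodb.database", "mydb")        # placeholder
    .option("spark.mongodb.collection", "mycoll")    # placeholder
    .schema(change_event_schema)
    .load()
)

# Delete events carry the deleted document's id in documentKey._id,
# which can then drive a DELETE against the Delta table.
deletes = (
    events.where("operationType = 'delete'")
          .selectExpr("documentKey._id AS deleted_id")
)
```

The key idea is that for a delete event the document body is gone, but the change event’s documentKey still contains the _id, so that is the field to read.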


Can you use something like this?

It will be a SparkConf setting, so "": '[{"$match": {"operationType": "insert"}}]' for example

ref: MongoDB Connector for Spark V10 and Change Stream - #11 by khang_pham
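If the setting above is the connector’s aggregation-pipeline option (the option name is elided in the reply; check the connector v10 docs for the exact key), note that its value must be a valid JSON array. One way to avoid quoting mistakes is to build the pipeline as a Python structure and serialize it with json.dumps:

```python
import json

# Build the change-stream pipeline as Python objects, then serialize it,
# so the quoting is guaranteed to be valid JSON.
pipeline = [{"$match": {"operationType": "insert"}}]
pipeline_json = json.dumps(pipeline)

print(pipeline_json)  # [{"$match": {"operationType": "insert"}}]

# The serialized string would then be passed on the stream reader
# (sketch; the option key is a placeholder, not a confirmed name):
# df = (spark.readStream.format("mongodb")
#         .option(<pipeline option name>, pipeline_json)
#         .load())
```

To capture deletes rather than inserts, match on "operationType": "delete" instead.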

I tried this, but the pipeline didn’t trigger. Where exactly do you add the pipeline in Structured Streaming?