Capture id of deleted document with Spark Structured Streaming

Hi,

I’m trying to replicate a MongoDB collection to Delta Lake using the MongoDB Spark Connector with Structured Streaming, but I’ve run into one problem.
When I use the option change.stream.publish.full.document.only=true I don’t get the deleted documents, which is expected.
But if I omit the option, I only get rows where the _data field is populated and all other fields are null.
I would at least expect to get the _id field so I can delete the corresponding entry in Delta.
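
For reference, here is roughly the streaming read I’m running. This is only a minimal sketch: the connection string, database and collection names are placeholders, and the change-event schema fields are my own assumption based on the raw change-event shape.

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType

spark = SparkSession.builder.appName("mongo-to-delta").getOrCreate()

# Rough schema for the raw change event (full.document.only omitted);
# field names are assumptions, adjust to what your events actually contain.
change_event_schema = StructType([
    StructField("_id", StringType()),            # resume token (where I only see _data)
    StructField("operationType", StringType()),  # insert / update / delete
    StructField("documentKey", StringType()),    # should carry the deleted document's _id
    StructField("fullDocument", StringType()),
])

stream = (
    spark.readStream
    .format("mongodb")
    .option("spark.mongodb.connection.uri", "mongodb://<host>:27017/")  # placeholder
    .option("spark.mongodb.database", "<db>")                           # placeholder
    .option("spark.mongodb.collection", "<coll>")                       # placeholder
    # With this set to true I get clean documents but no deletes;
    # omitting it gives me rows where only _data is populated:
    # .option("spark.mongodb.change.stream.publish.full.document.only", "true")
    .schema(change_event_schema)
    .load()
)
```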

Can someone explain how to capture deleted documents with Structured Streaming?

Thanks,
Amer

Can you use something like this?

It would be a SparkConf setting, e.g. spark.mongodb.read.aggregation.pipeline set to [{"$match": {"operationType": "insert"}}] (see the sketch after the reference below).

ref: MongoDB Connector for Spark V10 and Change Stream - #11 by khang_pham
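
Roughly where such a setting could go, assuming the connector actually applies the aggregation pipeline to a change-stream read (that is exactly what’s being tested here, so no guarantees). The connection details are placeholders and the $match stage is the one from the example above.

```python
from pyspark.sql import SparkSession

# Keep only inserts; match on "delete" instead if you want only deletes.
pipeline = '[{"$match": {"operationType": "insert"}}]'

# 1) Session-wide, as a SparkConf setting:
spark = (
    SparkSession.builder
    .appName("mongo-cdc")
    .config("spark.mongodb.read.aggregation.pipeline", pipeline)
    .getOrCreate()
)

# 2) Per-read, directly on the streaming reader:
stream = (
    spark.readStream
    .format("mongodb")
    .option("spark.mongodb.connection.uri", "mongodb://<host>:27017/")  # placeholder
    .option("spark.mongodb.database", "<db>")                           # placeholder
    .option("spark.mongodb.collection", "<coll>")                       # placeholder
    .option("aggregation.pipeline", pipeline)
    .load()
)
```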

I tried this, but the pipeline didn’t trigger. Where exactly do you want to add the pipeline in Structured Streaming?