Thank you very much. I really appreciate you getting back to me.
In our case, I have built a dynamic PySpark engine that works on any incoming data, masks only the requested fields, and saves the result back to the DB.
So I don’t know the incoming schema, other than the set of fields that will be masked or manipulated.
We have a contract to retain all other data and data types as-is once masking is done on the requested fields, with no changes to the rest of the schema or data.
While the PySpark approach is proving very performance-friendly for large data sets, I will have to think through how to properly detect string-type columns and add these expressions in my dynamic engine.
I have to do all of this just to convert _id (String) to _id (ObjectId).
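A minimal sketch of the string-column detection step, assuming the schema is taken from PySpark's `DataFrame.dtypes`, which yields `(column name, type string)` pairs. The helper name `string_columns` and the sample data are hypothetical, and the sketch only covers top-level columns, not strings nested inside structs or arrays.

```python
from typing import Iterable, List, Tuple


def string_columns(dtypes: Iterable[Tuple[str, str]]) -> List[str]:
    """Return the names of top-level columns whose Spark type is 'string'.

    `dtypes` is expected to look like the output of DataFrame.dtypes,
    i.e. a list of (column_name, type_string) tuples.
    """
    return [name for name, dtype in dtypes if dtype == "string"]


# Hypothetical input shaped like DataFrame.dtypes for an unknown schema.
sample_dtypes = [
    ("_id", "string"),
    ("name", "string"),
    ("age", "int"),
    ("tags", "array<string>"),  # nested strings are deliberately not matched
]

candidates = string_columns(sample_dtypes)
print(candidates)
```

In a real job the call would be `string_columns(df.dtypes)`, and the resulting names could then drive whichever per-column expression the engine needs to apply.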
I only hope the authors of the MongoDB Spark Connector will improve this, maybe with a new option that handles only the ObjectId scenarios. But I can’t thank you enough for coming forward and helping me with this.
You can close this case.