How to write an ObjectId value using Spark connector 10.1 with PySpark?

JTBS · September 18, 2023, 9:28pm

Thank you very much. I really appreciate you getting back to me.
In our case, I have built a dynamic PySpark engine that works on any incoming data, masks only the requested fields, and saves the result back to the DB.

So I don’t know the incoming schema, only the set of fields that will be masked/manipulated.

We have a contract to retain all other data and data types as-is once masking is done on the requested fields, with no changes to the rest of the schema or data.

While the PySpark approach is proving very performant for large data sets, I have to think through how to detect string-type columns and add these expressions in my dynamic engine.
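For what it's worth, detecting the string-type columns without knowing the schema up front could look roughly like the sketch below. It works on the list of `(name, type)` pairs that a PySpark DataFrame exposes as `df.dtypes`; the helper name `string_columns` and its `exclude` parameter are made up for illustration, and it only looks at top-level columns, not fields nested inside structs or arrays.

```python
from typing import Iterable, List, Tuple


def string_columns(dtypes: Iterable[Tuple[str, str]],
                   exclude: Iterable[str] = ()) -> List[str]:
    """Return the names of top-level string columns, in schema order.

    `dtypes` is expected to look like PySpark's `df.dtypes`, i.e. a list of
    (column_name, type_name) pairs such as [("_id", "string"), ("age", "int")].
    `exclude` is a hypothetical hook for columns the engine should skip.
    """
    skipped = set(exclude)
    return [name for name, type_name in dtypes
            if type_name == "string" and name not in skipped]


# Example with a hand-written schema; in the engine this would be df.dtypes.
sample_dtypes = [("_id", "string"), ("age", "int"), ("name", "string")]
candidates = string_columns(sample_dtypes, exclude=["name"])
```

With the sample above, only `_id` remains as a candidate, so the conversion expression would be applied to that column alone.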

I have to do all of this just to get _id (String) converted to _id (ObjectId).
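As a rough illustration of the String to ObjectId step: in MongoDB Extended JSON an ObjectId is written as `{"$oid": "<24 hex characters>"}`. The sketch below only builds and validates that string form in plain Python. The function name `to_oid_extended_json` is hypothetical, and whether the connector turns such a string into a real ObjectId on write depends on its write configuration, which is assumed here and not confirmed.

```python
import json
import re

# An ObjectId is 12 bytes, i.e. exactly 24 hexadecimal characters.
_OBJECT_ID_HEX = re.compile(r"[0-9a-fA-F]{24}")


def to_oid_extended_json(hex_id: str) -> str:
    """Wrap a 24-character hex string as an Extended JSON ObjectId string.

    Raises ValueError for anything that is not a valid ObjectId hex string,
    so ordinary string values are never rewritten by accident.
    """
    if not _OBJECT_ID_HEX.fullmatch(hex_id):
        raise ValueError(f"not a 24-character hex ObjectId: {hex_id!r}")
    return json.dumps({"$oid": hex_id})


wrapped = to_oid_extended_json("650a1b2c3d4e5f6071829304")
```

The strict validation matters for a schema-agnostic engine: it keeps the change limited to values that really are ObjectId hex strings and leaves every other string untouched.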

I only hope the authors of the MongoDB Spark Connector will improve this, maybe with a new option that changes only the ObjectId scenarios. But I can’t thank you enough for coming forward and helping me with this.

You can close this case.