Desensitizing the data

Hello all,

My application accumulates a lot of data in production, and I am looking for a tool to copy some of that data from production to a verification environment. The purpose of the tool is to desensitize personal data such as the telephone number, address, name, and email.

There are many collections, and the data in them may be correlated. For example, the user collection may contain a document for a real user named “John Doe” with the telephone number “+33012345678” and the objectID “5cda7e9e476e7f000113dd44”. That objectID and phone number may also appear in another collection, e.g. in the “meeting” collection as a participant field. I need to keep these relations valid so that the logic in my application does not break.

In addition, I have created some validation rules on the data schema. For example, I check the format of the telephone number (i.e. it starts with +33 and is 10 to 11 digits long) and the format of the country code (uppercase, two characters). I want to make sure the data still passes validation after desensitization.

So in the end, I need a tool that rewrites “John Doe” to, say, “James Bond”. The result should be irreversible, meaning nobody can work out “John Doe” from “James Bond”. It should also be a 1-1 mapping: when the tool rewrites +33012345678 to +33432109876, every occurrence of “+33012345678” in all of the collections must be rewritten to “+33432109876”, and no other telephone number may be rewritten to “+33432109876”.
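To make the requirement concrete, here is a rough Python sketch of the kind of mapping I have in mind for phone numbers. It is only an illustration under my own assumptions, not an existing tool: the class name `Pseudonymizer`, the secret key, and the "+33 followed by 9 digits" shape (taken from the example number above) are all hypothetical. It derives the replacement from a keyed hash (HMAC-SHA256) and keeps an in-memory table so that the same input always gets the same output and no two inputs share an output within one run.

```python
import hashlib
import hmac


class Pseudonymizer:
    """Sketch: deterministic, collision-free phone replacement (hypothetical helper)."""

    def __init__(self, secret_key: bytes, prefix: str = "+33", digits: int = 9):
        self._key = secret_key        # assumption: kept secret and destroyed after the copy
        self._prefix = prefix         # assumption: validation requires this prefix
        self._digits = digits         # assumption: 9 digits after the prefix
        self._forward = {}            # original -> replacement
        self._used = set()            # replacements already handed out

    def phone(self, original: str) -> str:
        # Same input always returns the same replacement.
        if original in self._forward:
            return self._forward[original]
        counter = 0
        while True:
            msg = f"{original}|{counter}".encode("utf-8")
            digest = hmac.new(self._key, msg, hashlib.sha256).digest()
            number = int.from_bytes(digest[:8], "big") % 10 ** self._digits
            candidate = f"{self._prefix}{number:0{self._digits}d}"
            # Retry with a new counter if the value is taken or unchanged.
            if candidate not in self._used and candidate != original:
                break
            counter += 1
        self._forward[original] = candidate
        self._used.add(candidate)
        return candidate


# Plain dicts stand in for documents from the "user" and "meeting" collections.
mapper = Pseudonymizer(secret_key=b"replace-with-a-random-secret")
user = {"_id": "5cda7e9e476e7f000113dd44", "phone": "+33012345678"}
meeting = {"participant": {"user_id": "5cda7e9e476e7f000113dd44", "phone": "+33012345678"}}

user["phone"] = mapper.phone(user["phone"])
meeting["participant"]["phone"] = mapper.phone(meeting["participant"]["phone"])
```

The objectID is left untouched, so the relation between the two collections stays valid, and both documents end up with the same replacement number. Two caveats I am aware of: the result is only hard to reverse if the key and the in-memory table are thrown away afterwards, and the rare collision retry depends on processing order, so two separate runs are not guaranteed to agree on every value.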

Any suggestions are welcome,

Thanks,

James

Hello all,

I am not sure whether I made the question clear. I suppose this is a common requirement for many developers. If there is no ready-to-use solution, please give me some guidance so that I can try to build it myself.

Thanks,

James