How can we protect against data corruption in MongoDB? What are the different ways to protect our data from corruption? If a replica set is configured, will the corrupted data be replicated to the secondary nodes?

Ramya_Navaneeth · April 11, 2023, 8:54am

How can we protect against data corruption in MongoDB? What are the different ways to protect our data from corruption? If a replica set or sharded cluster is configured, will the corrupted data be replicated to the secondary nodes?

steevej · April 11, 2023, 12:39pm

What do you mean by corrupted data?

All updates are replicated. If you make an update that you did not really want, it will still be replicated.

But if you mean corruption as in a disk crash, then the server will probably crash or terminate when it tries to do I/O. The rest of the cluster will behave the same as if the secondary had been terminated cleanly.

Ramya_Navaneeth · April 19, 2023, 7:14am

Thanks Steeve.
By data corruption I mean any user error, or errors at the storage layer due to hardware failure.

To mitigate hardware failure, replication will help if the storage is not shared. What about user error? What is the mitigation other than backup?
In Oracle, if the data is corrupted, it won't replicate to the other node and the RMAN backup also fails. In Mongo, do we have any feature like this?

steevej · April 19, 2023, 8:20pm

Are you telling me that Oracle is smart enough to detect that a user has made a mistake, that the update/delete was a mistake, and that it will not replicate this error?

WOW I am impressed.

Sorry, we are not that lucky. If a user has the credentials to delete something, we do not know whether it is a mistake or not, so it will be replicated. You can have delayed hidden nodes. But an operation, intended or mistaken, is eventually replicated.
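
For the delayed hidden node idea, here is a rough sketch of what the member entry in the replica set configuration could look like. It only builds the config document as a plain dict and does not connect to anything. The helper name `delayed_hidden_member`, the host `mongo-delayed.example.net:27017`, the member `_id` and the one-hour delay are all made up for illustration. `secondaryDelaySecs` is the field name in MongoDB 5.0 and later; older versions call it `slaveDelay`.

```python
def delayed_hidden_member(member_id, host, delay_secs=3600):
    """Build a replica set member document for a delayed, hidden secondary.

    Hypothetical helper: it only returns a dict and never talks to a server.
    """
    if delay_secs <= 0:
        raise ValueError("delay_secs must be positive")
    return {
        "_id": member_id,
        "host": host,
        "priority": 0,   # a delayed member must never become primary
        "hidden": True,  # keep it invisible to client applications
        "secondaryDelaySecs": delay_secs,  # apply oplog entries this many seconds late
    }


# Assumed values, for illustration only.
member = delayed_hidden_member(3, "mongo-delayed.example.net:27017")
print(member)
```

In mongosh you would add a document like this to the `members` array of the config returned by `rs.conf()` and apply it with `rs.reconfig()`. The delay only buys a window in which to notice a mistake and recover from the delayed copy; once the window passes, the bad write is applied there too.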

Hardware failure will crash the server. An election will occur with the remaining nodes. What will happen next is rather complex and rather well explained in the documentation.
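
The part of the election rules that matters most here is the majority requirement: a primary can only be elected while a majority of the voting members can reach each other. A small sketch of that arithmetic, with function names invented for this example:

```python
def majority(voting_members):
    """Number of votes needed to elect a primary."""
    return voting_members // 2 + 1


def fault_tolerance(voting_members):
    """How many voting members can be lost while an election can still succeed."""
    return voting_members - majority(voting_members)


for n in (3, 4, 5):
    print(n, majority(n), fault_tolerance(n))
# prints:
# 3 2 1
# 4 3 1
# 5 3 2
```

So a 3-node replica set survives the loss of one node, and adding a fourth node does not raise that number.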

If you want a foolproof system, don’t let the fools use it.

tapiocaPENGUIN · April 19, 2023, 9:07pm

Storage and other resources shouldn’t be shared, i.e. a 3-node replica set should run on different hardware (VMs, servers, etc.). Otherwise the high availability of MongoDB isn’t much use.