I have not heard much about MongoDB backups, probably because it's a distributed system. In real-life scenarios, do we need to back up MongoDB that often? Also, what is the experience of backing up TBs of data spread across a MongoDB ecosystem like? Considering that single-node SQL backups and restores can be nightmares…
I'm currently backing up several dozen TB on several (primary) shards and it's feasible but slow. Currently using LVM snapshots and rsync (and testing rdiff).
Restore, of course, is also feasible but requires expertise.
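For reference, the LVM + rsync approach mentioned above looks roughly like this. This is only a sketch: the volume group name (vg0), logical volume (mongo), mount point, and backup target are all assumptions you would adapt to your own layout, and it assumes journaling is enabled with the journal on the same volume so the snapshot is crash-consistent.

```shell
set -euo pipefail

SNAP_NAME=mongo-snap   # hypothetical snapshot name
SNAP_SIZE=10G          # copy-on-write space reserved for the snapshot

# 1. Take a crash-consistent snapshot of the volume holding dbPath.
lvcreate --size "$SNAP_SIZE" --snapshot --name "$SNAP_NAME" /dev/vg0/mongo

# 2. Mount it read-only and rsync the files off-host. With -a --delete,
#    repeated runs only transfer deltas, which is what makes multi-TB
#    backups feasible (if slow).
mkdir -p /mnt/"$SNAP_NAME"
mount -o ro /dev/vg0/"$SNAP_NAME" /mnt/"$SNAP_NAME"
rsync -a --delete /mnt/"$SNAP_NAME"/ backuphost:/backup/mongo/

# 3. Release the snapshot so it stops accumulating copy-on-write blocks.
umount /mnt/"$SNAP_NAME"
lvremove -f /dev/vg0/"$SNAP_NAME"
```

In a sharded cluster you would repeat this on one member of each shard's replica set (plus the config servers), ideally with the balancer stopped so the pieces are consistent with each other.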
To make this easier, MongoDB will have to:
Reduce the collection file size by making the datafile size a configurable argument, resulting in several smaller files per collection.
It will also need to extend this to the config servers. It's just a matter of time.
This is a very good question.
Yes, making backups on distributed systems is more complicated.
The main difficulty is capturing data that is consistent across the many nodes.
On the other hand, replicated systems like MongoDB offer the ability to back up from different nodes. The replication protocol can also be used to perform continuous backups and offer point-in-time recovery (recover at the exact time you specify).
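A minimal sketch of the point-in-time idea using the standard tools, for a single replica set (hostnames and paths here are placeholders). `mongodump --oplog` captures oplog entries written during the dump, and `mongorestore --oplogReplay` can replay them up to a chosen timestamp:

```shell
# Dump the data plus the tail of the oplog captured while dumping,
# so the dump is consistent as of its end time:
mongodump --host rs0/db1.example.net:27017 --oplog --out /backup/dump

# Restore and replay oplog entries up to (but not including) the given
# BSON timestamp (<seconds-since-epoch>:<ordinal>), recovering the data
# as it was at that exact moment:
mongorestore --oplogReplay --oplogLimit 1698765432:1 /backup/dump
```

For a sharded cluster this gets harder, which is exactly why the managed backup services described below exist.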
In Atlas and OpsManager, the backup facility backs up data by syncing the metadata with the collections and using a frozen snapshot of the data when needed. In other words, Atlas and OpsManager provide consistent backups.
Atlas and OpsManager also simplify restoring from a backup image: it takes only a few clicks in the UI, thanks to the automation feature.