Backup (full, block-based incremental, and log-based incremental) of a MongoDB database using the WiredTiger utility

We have a sharded MongoDB cluster built from replica sets, with WiredTiger as the storage engine.
MongoDB tools like mongodump and mongorestore cannot be used to back up the database (on a sharded cluster), as we cannot stop writes on the database.

We understand we can use the WiredTiger utility wt to back up MongoDB.

As per the WiredTiger documentation, WiredTiger backups are “on-line” or “hot” backups, and applications may continue to read and write the databases while a snapshot is taken.

Is it possible to perform a MongoDB backup using the WiredTiger utility without stopping writes on the database?

What do you mean by this? I don’t think you need to stop writes to use mongodump. There’s even an option (--oplog) to capture oplog entries.

Please find the extracts from the MongoDB docs:
1) Sharded Clusters
mongodump and mongorestore cannot be part of a backup strategy for 4.2+ sharded clusters that have sharded transactions in progress, as backups created with mongodump do not maintain the atomicity guarantees of transactions across shards.

2) Lock the Cluster
The sharded cluster must remain locked during the backup process to protect the database from writes, which may cause inconsistencies in the backup.

So the two extracts above suggest we cannot use mongodump for production backups.
Is it possible to create an application using WiredTiger to perform hot online backups?

If you don’t use cross-shard transactions, it’s OK to use mongodump. I can’t find where the official MongoDB docs say writes have to be stopped to use the tool. If you have such a reference, please share a link.

OK, I got this link. It says the cluster has to be locked to avoid writes.

It is simply to avoid writes during the backup, so that the backups from all your nodes are consistent. It doesn’t mean you have to lock writes to use that tool. As I showed earlier, you can use the oplog option to capture the writes.
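To make the oplog option concrete, here is a minimal sketch in Python that only builds the command lines and does not run them. The host name, replica set name and backup path are made up for illustration. As far as I know, --oplog only works against a replica set member and not through mongos, so on a sharded cluster this would have to be run per shard replica set, and the result would still not be consistent across shards.

```python
import shlex


def mongodump_oplog_cmd(uri, out_dir):
    """Build a mongodump command that also captures oplog entries written during the dump."""
    # --oplog needs a node that maintains an oplog (a replica set member),
    # so the URI here is assumed to point at one shard's replica set, not at mongos.
    return ["mongodump", f"--uri={uri}", "--oplog", f"--out={out_dir}"]


def mongorestore_replay_cmd(uri, dump_dir):
    """Build the matching mongorestore command that replays the captured oplog."""
    return ["mongorestore", f"--uri={uri}", "--oplogReplay", dump_dir]


if __name__ == "__main__":
    # Hypothetical shard URI and backup directory, for illustration only.
    shard_uri = "mongodb://shard0-a.example.net:27017/?replicaSet=shard0"
    print(shlex.join(mongodump_oplog_cmd(shard_uri, "/backups/shard0")))
    print(shlex.join(mongorestore_replay_cmd(shard_uri, "/backups/shard0")))
```

The commands are returned as argument lists so they could be handed to subprocess.run without going through a shell.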

If you don’t use cross-shard transactions and you can somehow back up all the shards at the same point in time, then your data will be more or less consistent. :slight_smile:

To call out, the main purpose of backup is to reduce data loss, not to fully recover from an outage. So the best way to deal with an outage is always to avoid an outage.

What we currently do is, for each shard, lock a secondary node (the primary still serves writes), take an EBS snapshot, unlock, and then go to the next shard.
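Roughly, that loop looks like the sketch below. This is not our actual script: the lock, snapshot and unlock steps are passed in as plain callables, and all names are hypothetical. In practice the lock and unlock steps would be something like db.fsyncLock() and db.fsyncUnlock() on the chosen secondary, and the snapshot step would be an EBS snapshot call.

```python
from typing import Callable, Dict, Iterable


def snapshot_shards(
    shards: Iterable[str],
    lock: Callable[[str], None],
    snapshot: Callable[[str], str],
    unlock: Callable[[str], None],
) -> Dict[str, str]:
    """Lock one secondary per shard, snapshot it, unlock it, then move to the next shard."""
    results: Dict[str, str] = {}
    for shard in shards:
        lock(shard)  # e.g. db.fsyncLock() on that shard's secondary
        try:
            results[shard] = snapshot(shard)  # e.g. start an EBS snapshot of its data volume
        finally:
            unlock(shard)  # always release the lock, even if the snapshot step fails
    return results


if __name__ == "__main__":
    # Stand-in callables that only record what would happen.
    log = []
    ids = snapshot_shards(
        ["shard0", "shard1"],
        lock=lambda s: log.append(("lock", s)),
        snapshot=lambda s: f"snap-{s}",
        unlock=lambda s: log.append(("unlock", s)),
    )
    print(ids)
    print(log)
```

Because the shards are handled one after another, each snapshot is taken at a different point in time.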

So is our backup fully consistent? Of course not. And here’s a link to my comment on a related post: Can mongo sharded cluster be recreated from secondary DC nodes (both config and shard)? - #2 by Kobe_W

What I understand is that mongodump is not recommended for production databases. Also, it does not support incremental backups.

Will the WiredTiger utility help here?