Hi all,
I’m working on improving MTTR (Mean Time To Recovery) for our MongoDB replica set, specifically recovery from accidental delete or update operations.
Here’s my approach:
Take periodic, incremental backups of the local.oplog.rs collection (e.g., every 15 minutes) via cron (rough sketch after this list).
Store those backups in cloud storage such as Azure Blob Storage or AWS S3.
In the event of an unwanted operation (like data deletion or an incorrect update), I plan to:
Extract and analyze the oplog entries for the specific time window.
Replay only the necessary entries to restore the affected data (ideally into a staging DB first, then into production after validation).
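For context, here's roughly what I'm imagining for steps 1 and 2 (the incremental dump plus the cloud upload), written as a Python/pymongo sketch rather than anything we actually run. The connection string, checkpoint file, output directory, and S3 bucket below are all placeholders, and the checkpoint-file approach to making the dumps incremental is my own assumption:

```python
# backup_oplog.py -- a rough sketch of steps 1 and 2, not production code.
# The hosts, paths, and bucket name are placeholders, not our real setup.
import os
import time

import boto3                      # only needed for the S3 upload step
import bson
from bson.timestamp import Timestamp
from pymongo import MongoClient

CHECKPOINT = "/var/backups/oplog/last_ts"   # hypothetical checkpoint file
OUT_DIR = "/var/backups/oplog"              # hypothetical staging dir
BUCKET = "my-oplog-backups"                 # hypothetical S3 bucket


def load_checkpoint():
    """Return the last backed-up oplog timestamp, or None on first run."""
    try:
        with open(CHECKPOINT) as f:
            t, i = f.read().split(",")
            return Timestamp(int(t), int(i))
    except FileNotFoundError:
        return None


def main():
    # Read from a secondary so the dump doesn't load the primary
    # (this relates to my question 1 below).
    client = MongoClient(
        "mongodb://host1,host2,host3/?replicaSet=rs0",
        readPreference="secondaryPreferred",
    )
    oplog = client.local["oplog.rs"]

    last_ts = load_checkpoint()
    query = {"ts": {"$gt": last_ts}} if last_ts else {}

    out_path = os.path.join(OUT_DIR, f"oplog-{int(time.time())}.bson")
    newest = last_ts
    with open(out_path, "wb") as out:
        # $natural order matches insertion order in the oplog, so the
        # dump stays sorted by ts; write each entry out as raw BSON.
        for entry in oplog.find(query).sort("$natural", 1):
            out.write(bson.encode(entry))
            newest = entry["ts"]

    if newest is not None and newest != last_ts:
        # Persist the high-water mark so the next cron run is incremental.
        with open(CHECKPOINT, "w") as f:
            f.write(f"{newest.time},{newest.inc}")
        # Step 2: push the chunk to cloud storage.
        boto3.client("s3").upload_file(out_path, BUCKET,
                                       os.path.basename(out_path))


if __name__ == "__main__":
    main()
```

(The dump itself could equally be a `mongodump -d local -c oplog.rs --query ...` on the `ts` field; I'm undecided whether that would be more robust than doing it in pymongo.)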
I’m looking for guidance on the feasibility and best practices for this setup.
Specific questions:
Is it safe and supported to periodically dump or tail local.oplog.rs from a secondary node for backup purposes?
Can these raw oplog entries be replayed selectively to restore just the lost/modified data? (See the replay sketch after this list.)
What is the recommended method to apply oplog entries to a target MongoDB instance without using a full mongorestore?
Are there any official tools or community solutions that support this kind of point-in-time recovery using oplogs?
Is storing oplog backups in cloud blob storage a valid and reliable disaster recovery strategy?
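To make questions 2 and 3 concrete, this is the kind of selective replay I have in mind, again only a sketch: the dump file, target namespace, and window bounds below are made up, and I genuinely don't know whether applyOps will accept every entry shape a 6.x oplog produces (the v2 delta update format in particular), which is really the heart of my question:

```python
# replay_window.py -- a rough sketch of the selective-replay idea.
# The dump file, target namespace, and time window are placeholders.
import bson
from bson.timestamp import Timestamp
from pymongo import MongoClient


def load_entries(path):
    """Decode a raw .bson oplog dump back into a list of dicts."""
    with open(path, "rb") as f:
        return bson.decode_all(f.read())


def select_ops(entries, ns, start, end):
    """Keep only CRUD entries for one namespace inside [start, end]."""
    return [
        e for e in entries
        if e.get("ns") == ns
        and start <= e["ts"] <= end
        and e["op"] in ("i", "u", "d")  # inserts, updates, deletes
    ]


def main():
    # Replay into staging first, then production only after validation.
    staging = MongoClient("mongodb://staging-host:27017")

    entries = load_entries("/var/backups/oplog/oplog-1700000000.bson")
    ops = select_ops(
        entries,
        "shop.orders",               # placeholder namespace
        Timestamp(1700000000, 1),    # placeholder window start
        Timestamp(1700000900, 1),    # placeholder window end
    )

    # applyOps is an admin command and is quite low-level/permissive,
    # so this should only ever run against staging until validated.
    if ops:
        result = staging.admin.command({"applyOps": ops})
        print(result)


if __name__ == "__main__":
    main()
```

If applyOps is the wrong tool here (I've seen warnings that it's intended more for internal use), I'd be glad to hear what people use instead; my understanding is that mongorestore --oplogReplay is geared toward full restores, which is exactly what I'm trying to avoid.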
My current environment:
MongoDB 6.x
Replica set deployment
Regular mongodump/snapshot backups already in place for the MongoDB data disks
This strategy is intended to augment our existing backups with faster, incremental recovery options using oplog deltas.
Would appreciate any insights, caveats, or alternatives.
Thanks!