Hello,
I’m setting up a MongoDB database on a server running Rocky Linux 9. The database is already configured for storage and serves a Flask application that runs a database website. I want to set up backups of our data, and the team recently decided to use incremental backups for efficient disk-space utilisation. I’m using MongoDB Community Edition v6.0.15.
I’m aware that MongoDB does not directly support incremental backups with mongodump; that capability is offered through Ops Manager or Atlas. We’d prefer to continue using the free mongodump approach as we have been doing, so I’m exploring the `--oplog` option for incremental backups. My plan is to run a full backup every weekend at midnight, scheduled as a cron job running a shell script, and to create a timestamp file marking the checkpoint from which incremental backups can run every other night. This repeats weekly, and we free disk space by deleting backups older than 30 days. I could not find comprehensive examples to understand oplog usage better, so I have attempted writing shell scripts for backups, which I’m sharing here with my observations. Any guidance on how I can set up incremental backups will be greatly appreciated.
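For reference, the schedule I have in mind would look roughly like this in crontab (the script paths are placeholders, not the actual names on our server):

```
# m h dom mon dow  command
0 0 * * 0    /path/to/full_backup.sh         # full backup every Sunday at midnight
0 0 * * 1-6  /path/to/incremental_backup.sh  # incremental backup every other night
```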
The sequence of steps I have followed:
- I have a full backup shell script that backs up the entire database (a small database that fits into a ~200MB archive) into a .gz file and creates a reference timestamp. This script works as intended. Code below:
#!/bin/bash
# Configuration - filenames and directories for full backup
FULL_BACKUP_DIR="/data/mongo_backup/full"
LOG_FILE="${FULL_BACKUP_DIR}/mongo_full_backup.log"
DATE=$(date +"%Y%m%d")
TIMESTAMP_FILE="/data/mongo_backup/incremental/last_timestamp"

# Ensure backup directories exist, create if not
mkdir -p "$FULL_BACKUP_DIR"
mkdir -p "$(dirname "$TIMESTAMP_FILE")"

# Perform full backup
mongodump --archive="${FULL_BACKUP_DIR}/mongo_full_backup_${DATE}.gz" --gzip >> "$LOG_FILE" 2>&1

# Check if the backup was successful
if [[ $? -eq 0 ]]; then
    # Update the last timestamp file for incremental backups (UTC, to match the Z suffix)
    NEW_TIMESTAMP=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
    # Write the new timestamp to the file (creates the file if it doesn't exist)
    echo "$NEW_TIMESTAMP" > "$TIMESTAMP_FILE"
else
    # Exit after logging an error message if the backup was unsuccessful
    echo "Full backup unsuccessful, exiting..." >> "$LOG_FILE"
    exit 1
fi

# Clean up old full backup files (older than 30 days)
find "$FULL_BACKUP_DIR" -name "*.gz" -mtime +30 -exec rm -f {} \;
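As a sanity check on the cleanup step, the 30-day retention logic can be exercised on its own against a throwaway directory (the filenames here are made up for the test):

```shell
#!/bin/bash
# Verify the 30-day cleanup on a temporary directory with fake archives.
set -u
TEST_DIR=$(mktemp -d)

# One archive 40 days old (should be deleted) and one fresh (should be kept).
touch -d "40 days ago" "$TEST_DIR/mongo_full_backup_old.gz"
touch "$TEST_DIR/mongo_full_backup_new.gz"

# Same find command as in the backup script (note the escaped \; terminator).
find "$TEST_DIR" -name "*.gz" -mtime +30 -exec rm -f {} \;

ls "$TEST_DIR"   # only mongo_full_backup_new.gz should remain
```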
- I have written a second shell script, intended for incremental backup, that should use this reference timestamp. I have reconfigured my MongoDB instance to run as a replica set with a single member only. I have initiated the replica set in the mongo shell and checked its status successfully before testing my incremental backup script. Ideally this code should read the timestamp from the timestamp file and perform an incremental backup from that point. But I have the following problems:
2.1. --oplog does not take any timestamp argument. If I include the timestamp as ${TIMESTAMP} after --oplog in my mongodump command line, I get the error:
2024-07-19T14:12:01.479+0100 error parsing command line options: error parsing positional arguments: provide only one MongoDB connection string. Connection strings must begin with mongodb:// or mongodb+srv:// schemes
2024-07-19T14:12:01.479+0100 try 'mongodump --help' for more information
2.2. Removing the timestamp argument gets rid of the error, but mongodump with --oplog then only ever performs a full database backup; it does not produce an incremental backup.
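From what I can tell, `--oplog` only captures operations that occur while the dump itself is running, which would explain problem 2.2. One workaround I have seen suggested (which I have not verified end to end) is to dump the `local.oplog.rs` collection directly with a `--query` filter on its `ts` field, built from the saved timestamp. A rough sketch of building that query (the timestamp value is hard-coded here for illustration, and assumes GNU date):

```shell
#!/bin/bash
# Sketch: filter the oplog by timestamp instead of relying on --oplog alone.
LAST_TIMESTAMP="2024-07-19T00:00:00Z"            # normally read from the timestamp file
EPOCH_SECS=$(date -u -d "$LAST_TIMESTAMP" +%s)   # convert ISO 8601 timestamp to epoch seconds

# mongodump --query expects extended JSON; BSON Timestamps are {"$timestamp": {"t": ..., "i": ...}}
QUERY="{\"ts\": {\"\$gt\": {\"\$timestamp\": {\"t\": ${EPOCH_SECS}, \"i\": 1}}}}"
echo "$QUERY"

# The actual dump would then be something like:
# mongodump --uri="mongodb://localhost:27017/?replicaSet=rs0" -d local -c oplog.rs \
#     --query "$QUERY" --archive=/data/mongo_backup/incremental/oplog_${EPOCH_SECS}.gz --gzip
```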
I’m attaching my incremental backup shell script code below for reference:
#!/bin/bash
# Configuration
INCREMENTAL_BACKUP_DIR="/data/mongo_backup/incremental"
LOG_FILE="${INCREMENTAL_BACKUP_DIR}/mongo_incremental_backup.log"
DATE=$(date -u +"%Y%m%dT%H%M%SZ")
TIMESTAMP_FILE="${INCREMENTAL_BACKUP_DIR}/last_timestamp"

# Ensure backup directory exists
mkdir -p "$INCREMENTAL_BACKUP_DIR"

# Determine the last timestamp for incremental backup
if [[ -f $TIMESTAMP_FILE ]]; then
    # Read the timestamp from the file
    LAST_TIMESTAMP=$(cat "$TIMESTAMP_FILE")
else
    # Fall back to yesterday if the file doesn't exist
    LAST_TIMESTAMP=$(date -u --date="yesterday" +"%Y-%m-%dT%H:%M:%SZ")
fi

# Perform incremental backup using mongodump with oplog
mongodump --uri="mongodb://localhost:27017/?replicaSet=rs0" --oplog --archive="${INCREMENTAL_BACKUP_DIR}/mongo_oplog_backup_${DATE}.gz" --gzip >> "$LOG_FILE" 2>&1

# Check if the backup was successful
if [[ $? -eq 0 ]]; then
    # Update the last timestamp (UTC, to match the Z suffix)
    NEW_TIMESTAMP=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
    echo "$NEW_TIMESTAMP" > "$TIMESTAMP_FILE"
else
    # Exit after logging an error message if the backup was unsuccessful
    echo "Incremental backup unsuccessful, exiting..." >> "$LOG_FILE"
    exit 1
fi

# Clean up old incremental backup files (older than 30 days)
find "$INCREMENTAL_BACKUP_DIR" -name "*.gz" -mtime +30 -exec rm -f {} \;
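For completeness, my (untested) understanding of the restore path, should the oplog dumps work out, is to restore the last full archive and then replay each nightly oplog dump in chronological order; all paths and filenames below are illustrative:

```shell
# 1. Restore the most recent full backup
mongorestore --uri="mongodb://localhost:27017/?replicaSet=rs0" \
    --gzip --archive=/data/mongo_backup/full/mongo_full_backup_20240714.gz

# 2. For each oplog dump, in order: place the dumped oplog BSON in an
#    otherwise empty directory as oplog.bson, then replay it.
mkdir -p /tmp/oplog_replay
cp dump/local/oplog.rs.bson /tmp/oplog_replay/oplog.bson
mongorestore --uri="mongodb://localhost:27017/?replicaSet=rs0" \
    --oplogReplay /tmp/oplog_replay
```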
Note: I am maintaining separate folders for the full backups and the incremental backups; is this a problem? Many thanks in advance for any help from the community and experts.