MongoDB using --oplog for incremental backups

Hello,

I’m setting up a MongoDB database on a server running Rocky Linux 9. The database is fully configured for storage and interfaces with a Flask application that serves a database website. I want to set up backups of our data, and the team recently decided to use incremental backups for efficient disk-space utilisation. I’m using MongoDB Community Edition v6.0.15.

I’m aware that MongoDB does not directly support incremental backups via mongodump, only via Ops Manager or Atlas. We’d prefer to continue using the free mongodump option as we have been doing, but I’m exploring the use of `--oplog` for incremental backups. My plan is to run a full backup every weekend at midnight, scheduled as a cron job running a shell script, and to create a timestamp file marking the checkpoint from which incremental backups can run every night. This repeats every week, and we free disk space by deleting backups older than 30 days. I did not find comprehensive examples to understand oplog usage better, so I have attempted writing shell scripts for the backups, which I’m sharing here with my observations. Any guidance on how I can set up incremental backups will be greatly appreciated.
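The weekly-full / nightly-incremental schedule described above could be wired up roughly like this (the script names and paths are placeholders, not the actual scripts below):

```shell
# Crontab sketch (edit with `crontab -e`) -- script paths are assumptions
# Full backup: Sunday at midnight
0 0 * * 0 /data/mongo_backup/scripts/full_backup.sh
# Incremental backup: every other night at midnight (Mon-Sat)
0 0 * * 1-6 /data/mongo_backup/scripts/incremental_backup.sh
```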

The sequence of steps I have followed:

  1. I have a full backup shell script that backs up the entire database (a small database that can be wrapped into a ~200MB archive) into a .gz file and creates a reference timestamp successfully. This script works as intended. Code below:

#!/bin/bash

# Configuration - filenames and directories for full backup
FULL_BACKUP_DIR="/data/mongo_backup/full"
LOG_FILE="${FULL_BACKUP_DIR}/mongo_full_backup.log"
DATE=$(date +"%Y%m%d")
TIMESTAMP_FILE="/data/mongo_backup/incremental/last_timestamp"

# Ensure backup directories exist, create if not
mkdir -p "$FULL_BACKUP_DIR"
mkdir -p "$(dirname "$TIMESTAMP_FILE")"

# Perform full backup
mongodump --archive="${FULL_BACKUP_DIR}/mongo_full_backup_${DATE}.gz" --gzip >> "$LOG_FILE" 2>&1

# Check if the backup was successful
if [[ $? -eq 0 ]]; then
    # Update the last timestamp file for incremental backups;
    # the file is created if it doesn't exist yet
    NEW_TIMESTAMP=$(date +"%Y-%m-%dT%H:%M:%SZ")
    echo "$NEW_TIMESTAMP" > "$TIMESTAMP_FILE"
else
    # Log the error and exit if the backup failed
    echo "Full backup unsuccessful, exiting..." >> "$LOG_FILE"
    exit 1
fi

# Clean up old full backup files (older than 30 days)
find "$FULL_BACKUP_DIR" -name "*.gz" -mtime +30 -exec rm -f {} \;

  2. I have written a second shell script intended for incremental backups that should ideally use this reference timestamp. I have reconfigured my MongoDB instance to run as a replica set with a single member only. I have initiated the replica set in the mongo shell and checked its status successfully before testing my incremental backup script. Ideally this code should read the timestamp from the timestamp file and perform the incremental backup accordingly. But I have the following problems:
    2.1. --oplog does not take any timestamp arguments; if I include the timestamp as ${TIMESTAMP} after --oplog on my mongodump command line, I get the error:
    "2024-07-19T14:12:01.479+0100 error parsing command line options: error parsing positional arguments: provide only one MongoDB connection string. Connection strings must begin with mongodb:// or mongodb+srv:// schemes
    2024-07-19T14:12:01.479+0100 try 'mongodump --help' for more information"
    2.2. Removing the timestamp argument gets rid of the error, but the mongodump command with oplog only ever performs the full database backup and does not do incremental backups.
    I’m attaching my incremental backup shell script code below for reference:

#!/bin/bash

# Configuration
INCREMENTAL_BACKUP_DIR="/data/mongo_backup/incremental"
LOG_FILE="${INCREMENTAL_BACKUP_DIR}/mongo_incremental_backup.log"
DATE=$(date +"%Y%m%dT%H:%M:%SZ")
TIMESTAMP_FILE="${INCREMENTAL_BACKUP_DIR}/last_timestamp"

# Ensure backup directory exists
mkdir -p "$INCREMENTAL_BACKUP_DIR"

# Determine the last timestamp for incremental backup
if [[ -f "$TIMESTAMP_FILE" ]]; then
    # Read the timestamp from the file if it exists
    LAST_TIMESTAMP=$(cat "$TIMESTAMP_FILE")
else
    # Otherwise fall back to yesterday
    LAST_TIMESTAMP=$(date --date="yesterday" +"%Y-%m-%dT%H:%M:%SZ")
fi

# Perform incremental backup using mongodump with oplog
mongodump --uri="mongodb://localhost:27017/?replicaSet=rs0" --oplog --archive="${INCREMENTAL_BACKUP_DIR}/mongo_oplog_backup_${DATE}.gz" --gzip >> "$LOG_FILE" 2>&1

# Check if the backup was successful
if [[ $? -eq 0 ]]; then
    # Update the last timestamp; the file is created if it doesn't exist yet
    NEW_TIMESTAMP=$(date +"%Y-%m-%dT%H:%M:%SZ")
    echo "$NEW_TIMESTAMP" > "$TIMESTAMP_FILE"
else
    # Log the error and exit if the backup failed
    echo "Incremental backup unsuccessful, exiting..." >> "$LOG_FILE"
    exit 1
fi

# Clean up old incremental backup files (older than 30 days)
find "$INCREMENTAL_BACKUP_DIR" -name "*.gz" -mtime +30 -exec rm -f {} \;

Note: I am maintaining separate folders for full backups and incremental backups. Is this a problem? Many thanks in advance for any help from the community and experts.

UPDATE:

I was taking the wrong approach in using --oplog for the incremental backups: mongodump’s --oplog only captures operations that occur while the dump itself is running (to make the dump a consistent point-in-time snapshot), and it does not accept a starting timestamp, so it cannot produce an incremental backup from a checkpoint.

I instead replaced it with:
mongodump -h localhost -d local -c oplog.rs --queryFile ${QUERY_DIR}/query.js --port 27017 -o ${INCREMENTAL_BACKUP_DIR}/$DATE >> $LOG_FILE 2>&1

where,
-h is the host, which is simply localhost on my server,
-d is the database to dump from; 'local' is the default database where oplog.rs is maintained,
-c is the collection to dump, which is oplog.rs,
--queryFile - I replaced the timestamping method in my old code with a single query.js file (whose contents are {"ts":{"$gt":{"$timestamp":{"t":1722261163,"i":1}}}}). The timestamp is generated as plain epoch seconds with `date +%s`,
-o is the dump location.
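The query-file step can be automated so each nightly run picks up where the last one left off. Here is a minimal sketch, assuming a last_epoch file alongside the backups (the paths and file names are my own placeholders, not part of the original scripts):

```shell
#!/bin/bash
# Hypothetical helper: regenerate query.js from the epoch saved by the
# previous backup run. Paths are assumptions; adjust to your layout.
QUERY_DIR="${QUERY_DIR:-/tmp/mongo_backup}"
TIMESTAMP_FILE="${TIMESTAMP_FILE:-${QUERY_DIR}/last_epoch}"

mkdir -p "$QUERY_DIR"

# Read the last saved epoch; fall back to 24 hours ago if the file is missing
if [[ -f "$TIMESTAMP_FILE" ]]; then
    LAST_EPOCH=$(cat "$TIMESTAMP_FILE")
else
    LAST_EPOCH=$(( $(date +%s) - 86400 ))
fi

# The oplog 'ts' field is a BSON timestamp: {t: <epoch seconds>, i: <ordinal>}
cat > "${QUERY_DIR}/query.js" <<EOF
{"ts": {"\$gt": {"\$timestamp": {"t": ${LAST_EPOCH}, "i": 1}}}}
EOF

# Record the current epoch as the checkpoint for the next incremental run
date +%s > "$TIMESTAMP_FILE"
```

The generated query.js can then be passed straight to `mongodump --queryFile` as in the command above.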

To restore the full database:
I can still restore my archived base copy first using (from the terminal, not the mongo shell):
$ mongorestore --gzip --archive=<full_backup_name>.gz

Then I apply my incremental backups, in the same sequence in which they were created, as below:

$mongorestore -h localhost --port 27017 --oplogReplay  --dir incremental_backup_main_directory/incremental_backup_<date1> --oplogFile=incremental_backup_main_directory/incremental_backup_<date1>/oplog.rs.bson
$mongorestore -h localhost --port 27017 --oplogReplay  --dir incremental_backup_main_directory/incremental_backup_<date2> --oplogFile=incremental_backup_main_directory/incremental_backup_<date2>/oplog.rs.bson
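Since the dated directory names sort chronologically, the replay sequence can be sketched as a loop (a sketch only: the directory layout and oplog.rs.bson location are assumptions and may differ depending on how `mongodump -o` laid out your dump):

```shell
#!/bin/bash
# Sketch: replay incremental oplog dumps in creation order.
# Assumes each dump lives in a dated subdirectory of the base directory
# whose name sorts chronologically, with oplog.rs.bson directly inside it.
replay_incrementals() {
    local base="${1:-/data/mongo_backup/incremental}"
    local dir
    # Dated names sort chronologically, so a plain sort gives creation order
    for dir in $(ls -d "${base}"/*/ 2>/dev/null | sort); do
        echo "Replaying oplog from ${dir}"
        mongorestore -h localhost --port 27017 --oplogReplay \
            --dir "$dir" --oplogFile="${dir}oplog.rs.bson"
    done
}

# Uncomment to run against the real backup directory:
# replay_incrementals /data/mongo_backup/incremental
```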

Hope this is helpful to someone.
