I am trying to schedule backups on a time series collection and I want to do hourly backups of the last hour of data. On top of daily full backups, I would like to schedule a backup every hour of the past hour of data (ie at 3pm, create a dump for data from 2pm-3pm).
I am trying to set up my shell script as such:
EPOCH_DATE=$(date -d '1 hour ago' '+%s%3N')   # start of the one-hour window, in epoch millis
for COLLECTION in "${COLLECTIONS[@]}"
do
  mongodump \
    --db=database \
    --collection="$COLLECTION" \
    --query="{ \"createdOn\": { \"\$gte\": { \"\$date\": $EPOCH_DATE } } }" \
    --out=/dir/backup/
done
But I am getting the following error:
Failed: cannot process query [{createdOn [{$gte 1670480228792}]}] for timeseries collection database.collection mongodump only processes queries on metadata fields for timeseries collections.
Is there a better way of achieving the goal of doing scheduled backups? Or should I change the way I structure my script?
I spoke with @Tim_Fogarty who worked on mongodump.
He explained to me why this constraint exists and, to be fair, it's quite complex: it comes from the low-level implementation of time series collections in MongoDB, and explaining all those details won't help here. The conclusion is that there is currently no workaround using mongodump.
If you do not need a point-in-time snapshot using the oplog (which I think is the case here), you can use mongoexport in a script and achieve essentially the same thing. It won't be as fast as mongodump, but at least it will work properly.
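Here is a minimal sketch of what that could look like, reusing the COLLECTIONS loop from the question. Note that mongoexport's --query expects Extended JSON v2, so the date is written in canonical form; db, field, and path names are the ones from the question.

```shell
# Hourly export of the last hour of data (assumes GNU date for %3N and -d).
EPOCH_MS=$(date -d '1 hour ago' '+%s%3N')   # start of the window, epoch millis

for COLLECTION in "${COLLECTIONS[@]}"
do
  mongoexport \
    --db=database \
    --collection="$COLLECTION" \
    --query="{ \"createdOn\": { \"\$gte\": { \"\$date\": { \"\$numberLong\": \"$EPOCH_MS\" } } } }" \
    --out="/dir/backup/${COLLECTION}_$(date '+%Y%m%dT%H').json"
done
```

Unlike mongodump, mongoexport runs an ordinary query against the collection, so filtering on a measurement field like createdOn is allowed even for a time series collection.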
Otherwise, you can still write a script that uses find() with the appropriate filter to retrieve these docs, but it's a bit more work.
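For example, one way to sketch the find() route from the shell is to drive it through mongosh (assuming it is installed); the db and collection names are the placeholders from the question:

```shell
# Start of the one-hour window, in epoch millis (GNU date).
START_MS=$(date -d '1 hour ago' '+%s%3N')
FILTER="{ createdOn: { \$gte: new Date($START_MS) } }"

# Stream the matching documents out as Extended JSON, one per line.
if command -v mongosh >/dev/null; then
  mongosh --quiet database --eval "
    db.collection.find($FILTER).forEach(doc => print(EJSON.stringify(doc)))
  " > "/dir/backup/$(date '+%Y%m%dT%H').json"
fi
```

As with mongoexport, this runs a normal query, so the metadata-only restriction of mongodump does not apply.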
The first trivial idea that comes to mind would be to mongodump the entire collection. Depending on your production environment, though, a disk snapshot would probably be faster.
A random idea that could be worth exploring would be to add an extra field in the metadata (with a different value every hour) and use this field for your query. As it's in the metadata this time, it would work with --query.
I guess you have more than one client, so you would need an algorithm that generates the same unique ID every hour on each of them (the first that comes to mind: day 1 uses IDs 1 to 24, day 2 uses 25 to 48, etc.).
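A minimal sketch of that idea, using hours elapsed since the Unix epoch as the shared ID so it never wraps at year boundaries: every client computes the same integer for a given hour. The field name metadata.hourId is made up for illustration.

```shell
# Unique integer per hour, identical on every client (UTC-based).
HOUR_ID=$(( $(date -u '+%s') / 3600 ))

# Writers stamp each document's metadata with the current HOUR_ID. The backup
# job then dumps the hour that just ended, querying on the metadata field,
# which mongodump does allow for time series collections:
QUERY="{ \"metadata.hourId\": $(( HOUR_ID - 1 )) }"
# mongodump --db=database --collection=$COLLECTION --query="$QUERY" --out=/dir/backup/
echo "$QUERY"
```

The trade-off is that the ID has to be written by every client at insert time, since metadata fields on a time series collection cannot be backfilled the way a normal field could.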