Corrupt Atlas backup snapshots

Hi there,

I currently have an M10 cluster running v4.4.18 with a 20GB database residing on it. Atlas takes a daily backup snapshot and keeps it for 7 days. Recently I have been trying to download snapshots to my local machine to do some testing, only to find that all the snapshots are corrupt. When I extract the folder from the download and start mongod against it, I get errors and then the process terminates (I can’t upload the logs as I’m a new user of the community).

The common error I get from all these snapshots is:
{"t":{"$date":"2023-01-26T13:01:13.273+00:00"},"s":"E", "c":"STORAGE", "id":22435, "ctx":"initandlisten","msg":"WiredTiger error","attr":{"error":-31802,"message":"[1674738073:273058][9008:140719278152144], file:sizeStorer.wt, WT_SESSION.open_cursor: int __cdecl __win_file_read(struct __wt_file_handle *,struct __wt_session *,__int64,unsigned __int64,void *), 288: C:/databases/productionv2\sizeStorer.wt: handle-read: ReadFile: failed to read 4096 bytes at offset 24576: Reached the end of the file.\r\n: WT_ERROR: non-specific WiredTiger error"}}

If it’s of any relevance, the sizeStorer.wt file is exactly 4096 bytes.

Right now I have zero faith that any snapshots are of actual use if I ever need to restore to my cluster. With nearly 400,000 users and associated data in the database this is of real concern.

Can anybody please advise as to what might be going on and suggest possible solutions? This sort of undermines the exact reason why we’re currently paying for Atlas.

Thanks,
Paul

Hi @Paul_Kenyon welcome to the community!

Sorry to hear you’re having issues with Atlas. Since you’re using a dedicated M10 instance, could you contact the Atlas in-app chat support team about this issue? They’ll have the resources to escalate any issues for you and also have more visibility into what’s happening with your backups.

Best regards
Kevin

Hi Kevin. Thanks for taking the time to respond. I did contact the in-app chat and they referred me here.

To answer my problem for anyone else with the same issue…

After downloading the snapshot, I extract the TAR file and then extract the database from the TAR using 7-Zip on Windows. Comparing a working downloaded snapshot with the most recent ones, I noticed that the largest collection (over 8GB) was showing as 0 bytes, even though the overall snapshot was the right size. It looks like 7-Zip can’t handle files over 8GB inside a TAR (although I couldn’t find this documented anywhere) and doesn’t extract them, hence the corrupt database (which is also why a repair removes this collection). So, fingers crossed, all the snapshots are okay. (In my previous working snapshots this collection must have been just under 8GB.)
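
For anyone wanting to check their own extraction, here is a minimal sketch in Python of the comparison described above: it reads the file sizes recorded in the archive and compares them with what actually landed on disk, so a file extracted as 0 bytes shows up immediately. The function names and the paths in the usage note are hypothetical, not anything Atlas provides. As background, the classic tar header stores a file's size as 11 octal digits, which tops out at 8 GiB unless the archive uses GNU or pax extensions; that may be related to the limit seen here, but that is an assumption rather than something confirmed for 7-Zip. Python's `tarfile` module reads both extensions, so the second function is one possible alternative way to extract.

```python
import os
import tarfile


def find_bad_extractions(tar_path, extracted_root):
    """Return (name, size_in_archive, size_on_disk) for every regular file
    whose extracted copy is missing (size_on_disk is None) or differs in size."""
    problems = []
    # "r:*" lets tarfile auto-detect plain, gzip, bz2 or xz archives.
    with tarfile.open(tar_path, "r:*") as archive:
        for member in archive:
            if not member.isfile():
                continue  # skip directories, links, etc.
            target = os.path.join(extracted_root, *member.name.split("/"))
            actual = os.path.getsize(target) if os.path.isfile(target) else None
            if actual != member.size:
                problems.append((member.name, member.size, actual))
    return problems


def extract_with_tarfile(tar_path, destination):
    """Extract the archive with Python's tarfile module instead of 7-Zip."""
    with tarfile.open(tar_path, "r:*") as archive:
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions: use the safe "data" extraction filter.
            archive.extractall(destination, filter="data")
        else:
            archive.extractall(destination)
```

Usage would be something like `find_bad_extractions("snapshot.tar", "C:/databases/productionv2")` (both paths are placeholders); an empty list means every file matches the size recorded in the archive. Matching sizes do not prove the data is intact, only that nothing was truncated or skipped during extraction.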

I still have an issue to do with compatibility versions but I’ve raised that with the in-app support team.

Thanks,
Paul.
