MongoDB fills disks with WiredTigerLog.XXXXXXXXXX

Hi,

We are using MongoDB 4.2.0 running in Windows Server 2019 environment. We have a replica set with 2 data nodes and an arbiter. After restarting the service on one of the data nodes, it tried to recover, but complained about not finding a specific file (WiredTigerLog.0000000006):

2021-11-19T09:38:24.226+0100 I  STORAGE  [initandlisten] WiredTiger message [1637311104:225994][5108:140713466155616], txn-recover: Recovering log 6 through 273
2021-11-19T09:38:24.228+0100 E  STORAGE  [initandlisten] WiredTiger error (2) [1637311104:228003][5108:140713466155616], txn-recover: __win_open_file, 539: C:\ProgramData\MongoDB\db\journal\WiredTigerLog.0000000006: handle-open: CreateFileW: The system cannot find the file specified.
: No such file or directory Raw: [1637311104:228003][5108:140713466155616], txn-recover: __win_open_file, 539: C:\ProgramData\MongoDB\db\journal\WiredTigerLog.0000000006: handle-open: CreateFileW: The system cannot find the file specified.
: No such file or directory

From now on, the number of these journal files started to grow rapidly, filling the disk completely in less than 10 minutes.

2021-11-19T09:47:33.226+0100 E  STORAGE  [initandlisten] WiredTiger error (28) [1637311653:226007][5484:140713466155616], connection: __win_file_set_end, 394: C:\ProgramData\MongoDB\db\journal\WiredTigerTmplog.0000000001: handle-set-end: SetEndOfFile: There is not enough space on the disk.
: No space left on device Raw: [1637311653:226007][5484:140713466155616], connection: __win_file_set_end, 394: C:\ProgramData\MongoDB\db\journal\WiredTigerTmplog.0000000001: handle-set-end: SetEndOfFile: There is not enough space on the disk.
: No space left on device

What is strange, the last recovery log lines were encountered 10 days earlier:

2021-11-09T09:59:12.657+0100 I  STORAGE  [serviceShutdown] WiredTiger message [1636448352:656659][9476:140720848654944], txn-recover: Recovering log 271 through 272
2021-11-09T09:59:12.809+0100 I  STORAGE  [serviceShutdown] WiredTiger message [1636448352:809657][9476:140720848654944], txn-recover: Recovering log 272 through 272

No problem was detected in the meantime, unit 2021-11-19.

Maybe it is just by coincidence, but it looks strange that it tries to search for file with index 6 up to 273, which is one more as 272 10 days before.

Did someone have a similar problem or experience?

Thank you in advance
Best regards
Lubo

Hi @lubo17 welcome to the community!

Sorry to hear you’re having issues with MongoDB.

So if I understand correctly, the timeline of the events are like this:

  1. You restarted one of the nodes in a PSA replica set configuration
  2. That node in question refused to start up, saying it can’t find WiredTigerLog.0000000006
  3. That node then proceeded to fill up the remaining disk space with more journal files

Is this correct?

If yes, is this the only node that experienced this sequence of events? What about the other data bearing node in the replica set?

Also could you confirm that:

  • Your MongoDB version is 4.2.0 Windows version, official binary by MongoDB
  • How did you install MongoDB?

Best regards
Kevin

1 Like

Hi @kevinadi, thank you :slight_smile:

Basically, regarding your 1. point, highly probably there were all nodes restarted at about the same time, as our system was newly deployed. Unfortunately, we are missing the logs from the other two nodes from the restart time.
For 2. and 3. yes, it happened like that.

This means that I cannot fully answer your question if this was the only affected node.

Our MongoDB version used is actually 4.2.5 Windows version. For installation we use the ZIP file and package it into our MSI. Before that, MS Visual VC++ Redistributable package 2015-2019 is installed on nodes.
During the affected restart, the MongoDB was not re-installed. It was running before without any issues.
Due to time pressure we’ve removed the entire MongoDB installation afterwards on all nodes and started fresh.

Unfortunately, the ticket reached me with the delay of some months and therefore I do not have the first-hand info.

Best regards
Lubo

Hi @lubo17

Can I assume that the old deployment is totally gone since you wiped the whole thing and started fresh? That’s unfortunate. I’d very much like to know what’s going on since what you described is quite a unique situation.

Having said that, best of luck with the current deployment :+1:

Best regards
Kevin

1 Like

Hi @kevinadi I have just signed up to say that this happens to me as well. Luckily it has happened in my local MongoDB running over Docker. It has happened twice. I think MongoDB at some point becomes crazy and starts creating journal entries. I would like to send some logs but as I was in 0 bytes free space I had to delete journals manually from the file explorer, making my MongoDB impossible to start now. When you are in 0 bytes free space pcs start to misbehave that is why I had to do that quick deletion. This has happened twice in less than 3 months. Next time I will try to grab some logs so someone can take a look at it. Version used is 4.4

I forgot; Is there a way to tell MongoDB to keep the journal size smaller than X size? Thank you in advance and regards

Hi @Javier_Guzman

Is there a way to tell MongoDB to keep the journal size smaller than X size?

The journal files should be no more than 100MB in size, individually. If you mean the number of journal files created, then no I don’t believe you can restrict it. The journals are WT’s write-ahead log, so anything that’s not in the data files yet would be in the journal. If you artifically restrict the journals, then you’ll either lose data or have to force WT to stop receiving writes. Since WT performs a checkpoint every 60s and the journals are only there to recover to the latest data state (checkpoint + journal), it’s curious if you see journal files accumulate in large numbers.

Do you mind using the latest MongoDB version, either 5.0.8 or 4.4.14 and see if the situation repeat itself? If yes, please provide details on how you can induce this behaviour.

Best regards
Kevin

1 Like

Hello @kevinadi,

Thank you for your response. I have checked the Mongo version in my Docker image and it is
4.4.13, I will see if I can upgrade without too much hassle.

Just for future; is there a particular command that would help for diagnosis when this happens again (if it is happen)? Or only with the logs would be enough?

Kind regards,
Javier

Hi @Javier_Guzman

Any information you can provide will be helpful. Logs, the content (ls -lRa) of the dbpath, any config file or mongod parameters in use, and the Dockerfile of the image should be good as a starting point. It’s great if you can also provide a reproduction steps as well.

Best regards
Kevin

All right @kevinadi , I will come back with more information if it happens again. Thank you and kind regards, Javier