Server does not start: "Too many open files in system"

I upgraded from version 4.2 to 4.4 on Debian Buster. The server worked fine after the first start, but then I shut it down using db.shutdownServer(), which reported a successful shutdown. I had to restart the container. The whole system runs in an LXC container under Proxmox.

Now the instance no longer starts. It is a standalone deployment. MongoDB seems to think that I did not shut down the server cleanly, so it is attempting a recovery:

{"t":{"$date":"2023-04-11T08:49:48.577+02:00"},"s":"I", "c":"STORAGE", "id":22430, "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1681195788:577487][2009:0x7f84349a7cc0], txn rollback_to_stable: [WT_VERB_RECOVERY_PROGRESS] Rollback to stable has been running for 340 seconds and has inspected 426639 files. For more detailed logging, enable WT_VERB_RTS"}}

Unfortunately, all my attempts to recover have failed with the following error:

{"t":{"$date":"2023-04-11T08:50:23.765+02:00"},"s":"E", "c":"STORAGE", "id":22435, "ctx":"initandlisten","msg":"WiredTiger error","attr":{"error":23,"message":"[1681195823:765078][2009:0x7f84349a7cc0], file:index-32660--5356062899923388618.wt, txn rollback_to_stable: __posix_open_file, 808: /var/lib/mongodb/index-32660--5356062899923388618.wt: handle-open: open: Too many open files in system"}}
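MongoDB 4.4 writes structured JSON log lines, so the numeric code in `attr.error` can be extracted and decoded. Below is a minimal Python sketch, assuming a Linux host; the helper name `decode_wt_error` is hypothetical and the sample line is abbreviated. On Linux, errno 23 is `ENFILE`, which the open(2) man page describes as the system-wide limit on the total number of open files having been reached. The per-process limit shown by `ulimit -n` and `/proc/<pid>/limits` would instead produce `EMFILE` (errno 24, "Too many open files"), so this error may point at the kernel-wide limit rather than the per-process one.

```python
import errno
import json
import os

def decode_wt_error(log_line: str) -> str:
    """Return 'NAME: description' for the errno in a mongod 4.4 JSON log line."""
    entry = json.loads(log_line)
    code = entry["attr"]["error"]
    # errno.errorcode maps numbers to symbolic names for the current platform.
    # Assumes Linux/glibc; numbers and messages can differ on other systems.
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"

# Abbreviated copy of the error line from the post (message text shortened).
line = ('{"t":{"$date":"2023-04-11T08:50:23.765+02:00"},"s":"E","c":"STORAGE",'
        '"id":22435,"ctx":"initandlisten","msg":"WiredTiger error",'
        '"attr":{"error":23,"message":"(abbreviated)"}}')
print(decode_wt_error(line))  # On Linux/glibc: ENFILE: Too many open files in system
```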

This doesn’t make sense to me, since the open file limits should be more than sufficient:

ulimit -an
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 515430
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 999999
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 515430
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited

cat /proc/2474/limits
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        0                    unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             140000               140000               processes
Max open files            750000               750000               files
Max locked memory         unlimited            unlimited            bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       515430               515430               signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us
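Both outputs above describe per-process limits. A minimal Python sketch for putting the `mongod` limit next to the kernel-wide counters in `/proc/sys/fs/file-nr` (allocated handles, an unused field, and the maximum, i.e. `fs.file-max`); it assumes Linux procfs, and the function names are hypothetical:

```python
from pathlib import Path

def max_open_files(limits_text: str) -> tuple[str, str]:
    """Return (soft, hard) from the 'Max open files' row of /proc/<pid>/limits."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Row layout: Max open files <soft> <hard> files
            fields = line.split()
            return fields[3], fields[4]
    raise ValueError("no 'Max open files' row found")

def system_file_handles(file_nr_text: str) -> tuple[int, int]:
    """Parse /proc/sys/fs/file-nr into (allocated, maximum); the middle field is unused."""
    allocated, _unused, maximum = (int(x) for x in file_nr_text.split())
    return allocated, maximum

def report(pid: int) -> None:
    """Print the per-process and system-wide open file figures for one PID."""
    soft, hard = max_open_files(Path(f"/proc/{pid}/limits").read_text())
    allocated, maximum = system_file_handles(Path("/proc/sys/fs/file-nr").read_text())
    print(f"pid {pid}: per-process open files soft={soft} hard={hard}")
    print(f"system-wide: {allocated} file handles allocated, maximum {maximum}")
```

For example, call `report(2474)` with the PID of the running `mongod` (2474 in the output above; it changes on every start). If the allocated count climbs toward the maximum while the recovery runs, the system-wide limit is a more likely culprit than the per-process one.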

We have a fairly large installation with ~1.5 TB of data and ~260K files:

root@mongo lib/mongodb# ls -l | wc -l

The recovery always restarts once it reaches roughly 200K open files.
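A minimal sketch for counting the WiredTiger files the recovery has to go through and comparing that count with a limit. It assumes the default layout, where collection and index tables are `*.wt` files under the dbpath; `count_wt_files`, `fits_within` and the `reserve` margin are hypothetical illustrations. Note that `ls -l | wc -l` also counts the "total" header line and any non-`.wt` entries, so it slightly overstates the number of table files.

```python
from pathlib import Path

def count_wt_files(dbpath: str) -> int:
    """Count WiredTiger table files (*.wt) under dbpath, including subdirectories.

    Unlike `ls -l | wc -l`, this skips the "total" header line, directories,
    and non-.wt files such as mongod.lock.
    """
    return sum(1 for p in Path(dbpath).rglob("*.wt") if p.is_file())

def fits_within(wt_files: int, limit: int, reserve: int = 1000) -> bool:
    """True if every .wt file could be open at once with `reserve` handles to spare.

    `reserve` is an arbitrary, illustrative margin for sockets, logs,
    journal files and other handles mongod needs besides the tables.
    """
    return wt_files + reserve <= limit
```

For example, `fits_within(count_wt_files("/var/lib/mongodb"), 750000)` checks the count against the per-process limit from `/proc/2474/limits`; the same count should also be compared with the system-wide maximum.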

I have searched the forums, but apart from increasing the open files limit I did not find a solution.

I’d be very thankful for any hints!

Thank you

Hello @Fabio_Bacigalupo ,

Welcome to The Community Forums! :wave:

I noticed that this topic hasn’t received a response yet. Were you able to find a solution?
If not, I have a few questions:

Do you mean that after the upgrade you could use the server without any issues, and the error only started after you shut it down and restarted it?

  • Have you tried increasing the ulimit and restarting the server?
  • Are all the resources on this hardware dedicated to MongoDB, or are other installations sharing them?
  • Can you try starting the server without any connections, reads, or writes from the application?
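For the question about other installations sharing resources, here is a minimal Python sketch that counts open file descriptors per process via `/proc/<pid>/fd` to see what else is holding files. The helper names are hypothetical, descriptor counts only approximate kernel file handles, and processes that cannot be inspected are skipped. Inside an LXC container, `/proc` only lists the container's own processes, while, as far as I know, the system-wide file table is shared by the whole host kernel, so running it on the Proxmox host as well may give a more complete picture.

```python
import os

def open_fds_by_pid(proc: str = "/proc") -> dict[int, int]:
    """Map each visible PID to its number of open file descriptors."""
    counts: dict[int, int] = {}
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/sys
        try:
            counts[int(entry)] = len(os.listdir(os.path.join(proc, entry, "fd")))
        except OSError:
            # Process exited in the meantime, or we lack permission to inspect it.
            continue
    return counts

def top_consumers(counts: dict[int, int], n: int = 10) -> list[tuple[int, int]]:
    """Return the n (pid, fd_count) pairs with the most open descriptors."""
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:n]
```

For example, `top_consumers(open_fds_by_pid())` lists the ten processes holding the most descriptors.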

There is a similar issue that was resolved; could you take a look at it?