Dear MongoDB community,
we have been using MongoDB in an academic context on a small scale for about two years to run a workflow management framework (https://github.com/materialsproject/fireworks). We have now encountered the following issue for the first time:
The databases reside on an SMB share that is provided by our university’s computing center, and apparently the underlying file system imposes an undocumented 4 TB file size limit.
Our databases consist of multiple collections, two of them constituting a “GridFS” object storage system (https://docs.mongodb.com/manual/core/gridfs/). Recently, one database’s “chunks” collection apparently hit this 4 TB limit, causing mongod to crash with:
2020-10-27T14:10:04.293+0100 E STORAGE [WTCheckpointThread] WiredTiger error (28) [1603804204:293902][12:0x7ff9ae8c6700], file:collection-0-8347993523018026877.wt, WT_SESSION.checkpoint: __posix_sync, 99: /data/db/collection-0-8347993523018026877.wt: handle-sync: fdatasync: No space left on device Raw: [1603804204:293902][12:0x7ff9ae8c6700], file:collection-0-8347993523018026877.wt, WT_SESSION.checkpoint: __posix_sync, 99: /data/db/collection-0-8347993523018026877.wt: handle-sync: fdatasync: No space left on device
It took us a little while to realize that this was caused neither by the volume running out of space nor by inodes running out. While the system was running, I never bothered much about what the underlying WiredTiger storage engine was doing, and only now realized that every collection resides in a single file!
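In case it helps others check whether they are approaching a similar limit, here is a minimal Python sketch that lists the WiredTiger files in a data directory by size. It assumes the data path is /data/db (as in the log above), that the limit is 4 TiB (the real value on our share is undocumented, so this is a guess), and that collection files sit directly in that directory, i.e. no per-database subdirectories. The function name wt_file_usage is only illustrative.

```python
import os

# Assumption: the limit is 4 TiB. The actual limit on the share is undocumented.
ASSUMED_LIMIT_BYTES = 4 * 1024**4


def wt_file_usage(dbpath, limit_bytes=ASSUMED_LIMIT_BYTES):
    """Return (file name, size in bytes, fraction of limit) for each .wt file,
    sorted from largest to smallest. Only the top level of dbpath is scanned."""
    usage = []
    for entry in os.scandir(dbpath):
        if entry.is_file() and entry.name.endswith(".wt"):
            size = entry.stat().st_size
            usage.append((entry.name, size, size / limit_bytes))
    return sorted(usage, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    dbpath = "/data/db"  # assumption: taken from the log message above
    if os.path.isdir(dbpath):
        # Print the five largest files and how close each is to the assumed limit.
        for name, size, fraction in wt_file_usage(dbpath)[:5]:
            print(f"{name}: {size} bytes ({fraction:.1%} of assumed limit)")
```

This only reads file metadata, so it can be run while mongod is down.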
The difficulty now is that we are unable to launch that mongod again: it always crashes with the same error before it becomes accessible and maintainable. We also have no other storage system readily available that has both sufficient capacity and a more generous file size limit, so we cannot simply move the database elsewhere for recovery.
Ideally, I would like to dump a few of the latest files from the GridFS and have the whole database in a maintainable state again. If that is not possible, I could also live with removing the two GridFS collections from the database to at least have all other collections accessible again. However, how can I do either of these at a low level, without a running mongod instance?
Obviously, we will have to rethink our use of GridFS, which raises a related question: Is there an easy-to-apply way to work around the file system’s file size limit, e.g. having WiredTiger split its collections across multiple files, or inserting a file system layer on top of the underlying file system that would do that transparently and efficiently, without affecting the database’s performance too much? Or would you advise avoiding GridFS altogether? The only related post I have found is this one: https://jira.mongodb.org/browse/WT-3272. It mentions the same issue, but does not really discuss possible solutions.
Thanks for any discussion.