I'm looking for pointers & heuristics in determining the right GridFS chunk size

Michael_Jay1 · September 13, 2022, 3:16am

I’ve read the GridFS section of the MongoDB Manual, and I’m still unsure how to determine an adequate chunk size.

I understand some of the drawbacks of choosing a small chunk size, and I experienced the worst of them when I tried to upload a file with a 1 KB chunk size. I know that in addition to that (now) obvious issue, there are index and document count concerns.

I understand that if

you want to access information from portions of large files without having to load whole files into memory, you can use GridFS to recall sections of files without reading the entire file into memory.

What if I’m certain that I will always read the whole file? Is there any reason not to choose a chunk size near the maximum? I assume that the max chunk size follows from the max document size: 16 MB, less some room for the other fields in each chunk document.

Or, if my app has a file size restriction of 5 MB, is it more appropriate to set that as the chunk size? Is there much of a difference in setting the chunk size to 5 MB vs 15 MB if all of my files will be, at most, 5 MB?
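To make the 5 MB vs 15 MB comparison concrete, here is a rough sketch of the chunk arithmetic. It assumes what the manual describes: GridFS splits a file into chunks of the configured size, and the last chunk is only as large as necessary. The function names are made up for illustration and are not part of any driver.

```python
MIB = 1024 * 1024
DEFAULT_CHUNK_SIZE = 255 * 1024  # GridFS default chunk size (255 KiB)


def chunk_count(file_size: int, chunk_size: int) -> int:
    """Number of chunk documents needed for a file of file_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # Integer ceiling division; an empty file needs no chunks.
    return -(-file_size // chunk_size)


def last_chunk_bytes(file_size: int, chunk_size: int) -> int:
    """Payload size of the final chunk, which is not padded to chunk_size."""
    count = chunk_count(file_size, chunk_size)
    if count == 0:
        return 0
    return file_size - (count - 1) * chunk_size


# Hypothetical 5 MiB file under four chunk-size settings.
file_size = 5 * MIB
for size in (1024, DEFAULT_CHUNK_SIZE, 5 * MIB, 15 * MIB):
    print(size, chunk_count(file_size, size), last_chunk_bytes(file_size, size))
```

Under these assumptions, a file of at most 5 MiB fits in a single chunk whether the chunk size is 5 MiB or 15 MiB, and that chunk holds only the file's actual bytes, so the two settings should produce the same stored documents for such files.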

Thanks for any advice or general aspects of determining chunk size that I’m missing.

Stennie_X · September 20, 2022, 12:57pm

Hi @Michael_Jay1,

If you are always going to read the whole file and all of your files will be less than the 16MB document size limit, you could serialise each file as binary data in a single document instead of using GridFS.
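As a minimal sketch of the single-document approach, assuming PyMongo on Python 3 (where a `bytes` value is stored as BSON binary data): the helper names, the field names, and the headroom figure below are illustrative assumptions, not driver requirements.

```python
MAX_BSON_DOCUMENT_BYTES = 16 * 1024 * 1024  # 16 MB BSON document size limit
HEADROOM_BYTES = 16 * 1024  # assumption: space reserved for the other fields


def build_file_document(filename: str, data: bytes) -> dict:
    """Build one document that carries the whole file as binary data."""
    if len(data) > MAX_BSON_DOCUMENT_BYTES - HEADROOM_BYTES:
        raise ValueError("file too large for a single document; consider GridFS")
    # Field names are hypothetical; use whatever schema suits the app.
    return {"filename": filename, "length": len(data), "data": data}


def store_small_file(collection, filename: str, data: bytes):
    """Insert the file document and return its _id.

    `collection` is expected to behave like a pymongo Collection,
    i.e. insert_one() returns an object with an inserted_id attribute.
    """
    result = collection.insert_one(build_file_document(filename, data))
    return result.inserted_id


# Possible usage (requires a running MongoDB; names are placeholders):
#   from pymongo import MongoClient
#   files = MongoClient()["mydb"]["files"]
#   file_id = store_small_file(files, "report.pdf", open("report.pdf", "rb").read())
#   data = files.find_one({"_id": file_id})["data"]
```

With a 5 MB application limit, every file passes the size check comfortably, and reading a file back is a single `find_one` call.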

GridFS is more useful when you want to store files larger than 16MB or access portions of a large binary file without having to read the entire file into memory.
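For the partial-read case, a rough sketch: PyMongo's `GridFSBucket.open_download_stream()` returns a file-like object that supports `seek()` and `read()`, so a byte range can be read without pulling the whole file into memory. The helper name `read_range` is made up for illustration, and the demonstration uses an in-memory stream as a stand-in for a GridFS download stream.

```python
import io
from typing import BinaryIO


def read_range(stream: BinaryIO, offset: int, length: int) -> bytes:
    """Read up to `length` bytes starting at byte `offset` of a seekable stream."""
    stream.seek(offset)
    return stream.read(length)


# Stand-in for a GridFS download stream, so the sketch runs without a server.
demo = io.BytesIO(b"0123456789")
print(read_range(demo, 3, 4))

# Possible usage against GridFS (requires a running MongoDB; names are placeholders):
#   from pymongo import MongoClient
#   from gridfs import GridFSBucket
#   bucket = GridFSBucket(MongoClient()["mydb"])
#   with bucket.open_download_stream(file_id) as grid_out:
#       header = read_range(grid_out, 0, 1024)
```

This is where chunk size matters most: a smaller chunk size should mean less data fetched per range read, at the cost of more chunk documents per file.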

See "When to Use GridFS" in the MongoDB Manual for more background, including similar advice:

Furthermore, if your files are all smaller than the 16 MB BSON Document Size limit, consider storing each file in a single document instead of using GridFS. You may use the BinData data type to store the binary data. See your driver's documentation for details on using BinData.

Regards,
Stennie