Increase max document size to at least 64MB

Initially the limit was 4MB; it was raised to 16MB in 2009.
It’s now 2021: we have better hardware, faster networks, bigger documents, and competitors without a 16MB limit.
There are plenty of use cases where this limit is too small nowadays: sensor data, large documents carrying a history of data.
I’ve opened a new JIRA Issue on this here: https://jira.mongodb.org/browse/SERVER-60040


The ticket on Jira has been closed with this comment: “Thanks for your report. Please note that the SERVER project is for bugs for the MongoDB server. As this ticket appears to be an improvement request, I will now close it.”

But the main Jira page for MongoDB says:

Which JIRA project should I use to report bugs or feature requests?

  • To report potential bugs, suggest improvements, or request new features in the MongoDB database server, use Core Server (SERVER).

Hi @Ivan_Fioravanti, welcome to the community!

Apologies for the confusion. The description of the SERVER project is a bit outdated. We are currently using https://feedback.mongodb.com/ to collect ideas on how to improve the server, and have dedicated the SERVER JIRA project to bug reports. Specifically in your case, you would want to go to the Database section on that page.

Best regards
Kevin


Hi @Ivan_Fioravanti

Since only data that’s always used together should be stored together, I’m curious about the use case that requires documents bigger than 16MB. You mention tons of use cases, but can you be a bit more specific? “History of data”, for example, seems problematic: it will eventually outgrow any document size limit, and I can’t think of a use case where you need the entire history (no matter how old) when reading a document.

Asya


Hi @Asya_Kamsky
there are many examples in https://jira.mongodb.org/browse/SERVER-5923
Everyone starts using MongoDB thinking “16MB is a lot, I’ll never hit this limit”, but when you do reach it, it’s a mess.

Also, https://jira.mongodb.org/browse/SERVER-12305 would be extremely beneficial: complex aggregations with many pipeline stages can hit this limit more often than you’d think. Removing that limit should be easier, so please plan at least this one.
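
For example, something along these lines can blow past the limit (a hypothetical sketch; the collection and field names are made up):

```python
# Hypothetical sketch: a $group with $push accumulates every reading into one
# output document per sensor; a large group can exceed the 16MB per-document
# limit and make the aggregation fail. allowDiskUse lets stages spill to disk,
# but it does not lift the per-document limit.
from pymongo import MongoClient

db = MongoClient()["demo"]  # placeholder connection/database

pipeline = [
    {"$match": {"year": 2021}},
    {"$group": {
        "_id": "$sensor_id",
        "readings": {"$push": {"ts": "$ts", "value": "$value"}},
    }},
]
for doc in db.readings.aggregate(pipeline, allowDiskUse=True):
    print(doc["_id"], len(doc["readings"]))
```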

Thanks,
Ivan


That’s because hitting it is usually an indication of incorrect schema design.

Asya

I kindly disagree: I have financial data for currency symbols like USDJPY (US Dollar vs. Japanese Yen) and USDCHF. Each daily candle contains a timestamp and four prices: open, high, low, and close.

I’ve been implementing Mongo queries and complex analysis routines for many years and was, until now, happily using just one document per symbol. Just recently I discovered that, out of more than 3,000 financial instruments, USDJPY and USDCHF are the only ones whose data history (dating back to January 1970) is so large that it exceeds 16MB and thus cannot be stored in its entirety.

With this 16MB limit I would now have to go through dozens of complicated methods and implement additional “boilerplate” logic to read the history data in chunks, and deal with the much higher complexity of analysis routines that now need to see beyond the borders between chunks. No fun, seriously.
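
Concretely, the kind of boilerplate I mean looks roughly like this (a minimal PyMongo sketch; the `candles` collection, the per-year bucketing, and the field names are just assumptions for illustration):

```python
# Sketch: bucket daily candles per (symbol, year) so no single document can
# approach 16MB, then stitch the buckets back together for analysis.
from pymongo import MongoClient

db = MongoClient()["markets"]  # placeholder database name

def append_candle(symbol, candle):
    """Push the candle into its (symbol, year) bucket instead of one big doc."""
    db.candles.update_one(
        {"symbol": symbol, "year": candle["ts"].year},
        {"$push": {"candles": candle}},
        upsert=True,
    )

def full_history(symbol):
    """Re-assemble the complete history from all yearly buckets, oldest first."""
    history = []
    for bucket in db.candles.find({"symbol": symbol}).sort("year", 1):
        history.extend(bucket["candles"])
    return history
```

And every analysis routine then has to be taught to work across those bucket borders.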

I do like working with MongoDB and I don’t mind difficult tasks, but implementing additional logic just because there is no way to increase a tight memory “budget” seems utterly wrong, at least to me. Not to mention that all the additional logic reduces the readability of the code and lowers its performance.

If there’s a chance, could you at least provide a parameter in MongoDB with a default of 16MB, so that people who really need more have the freedom to raise it?

The only way to have a larger document size would be to change the limit in the source code, recompile/build it yourself, and then run your own modified copy. I wouldn’t recommend it though, because there are a lot of other places where things can go wrong; drivers also assume documents will not be larger than 16MB.
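
For instance, the limit isn’t only compiled into the server: drivers read it from the server’s handshake and refuse to send anything bigger. A quick sketch of where the number shows up (PyMongo, arbitrary connection):

```python
# The server advertises its document size limit in the "hello" handshake
# response; drivers use this value to reject oversized documents client-side.
from pymongo import MongoClient

client = MongoClient()  # placeholder connection string
hello = client.admin.command("hello")  # use "isMaster" on older servers
print(hello["maxBsonObjectSize"])      # 16777216 (16MB) on a stock build
```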

It’s hard to imagine that it’s actually required to have the full history of a symbol in a single document. Do you always analyze the full history when querying it? If not, then it’s quite inefficient to keep it all in a single document.

Asya

Similar use case here…

I’m using MongoDB to store collections of 10,000 frames of a moving system of 200,000 particles. Each frame stores 13 floating-point values per particle, so a single frame holds 2.6 million floating-point values (20.8 MB). It is just not practical to split a frame into more than one document.

I have 128 processors, 16 TB of SSD, and 1 TB of RAM on the server… could anybody explain the logic behind the 16MB document limit? Sounds a bit 2009.

How are you using this data? Do you always fetch all 10,000 frames of all the values? Sure, 16MB is arbitrary, but 10,000 seems arbitrary too; why not five thousand?

Asya

Hi Asya.

In my scenario, each frame is 2 picoseconds of motion. 10k frames are 20 nanoseconds, which is the minimum window I need to measure. Sampling less time would be statistically insufficient, and lowering the sampling rate would induce aliasing.

After saving, I need to build n-dimensional dataframes and do some physics with them. I’m still deciding whether I should use aggregation pipelines or do everything from Python.

Would you go for GridFS, or definitely compile with the max document size set to, say, 64MB?
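
For reference, I imagine the GridFS route would look roughly like this (just a sketch; the database name, filenames, and metadata fields are placeholders):

```python
# Sketch: store each ~20.8MB frame as a GridFS file (chunked automatically),
# keyed by frame number, and read it back into a NumPy array for analysis.
import gridfs
import numpy as np
from pymongo import MongoClient

db = MongoClient()["simulation"]            # placeholder database name
fs = gridfs.GridFS(db, collection="frames")

def save_frame(frame_no, values):
    """values: (200000, 13) float64 array for one frame."""
    data = np.asarray(values, dtype=np.float64).tobytes()
    return fs.put(data, filename=f"frame_{frame_no:05d}",
                  metadata={"frame": frame_no, "shape": list(values.shape)})

def load_frame(frame_no):
    out = fs.find_one({"filename": f"frame_{frame_no:05d}"})
    shape = tuple(out.metadata["shape"])
    return np.frombuffer(out.read(), dtype=np.float64).reshape(shape)
```

Though as I understand it, GridFS chunks are opaque bytes, so pulling a single particle would still mean reading whole frames back.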

Best!
Pedro

I don’t fully understand the implications of forcing a bigger document size. Hence the question.

As for the usage, I need to query each particle in all 10,000 frames and compute its motion. This can be done by getting all 13 attributes of the same particle ID from all documents (if each frame is one document).

So 13 × 8 bytes × 10,000 frames is about 1MB per particle. But then each 20MB frame would have to fit in a single document.

I’m thinking splitting the frame data across two collections would do… but it’s far from ideal.
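
Maybe the cleaner option is the per-particle layout that the 13 × 8 bytes × 10,000 ≈ 1MB arithmetic points at. A minimal sketch (collection and field names made up), where each document holds one particle’s full trajectory and stays far below 16MB:

```python
# Sketch: one document per particle holding its full 10,000-frame trajectory
# (13 float64 attributes per frame, roughly 1MB of numeric payload), instead
# of a 20.8MB document per frame.
from pymongo import MongoClient

db = MongoClient()["simulation"]  # placeholder database name

def save_trajectory(particle_id, frames):
    """frames: list of 10,000 entries, each a list of 13 floats."""
    db.trajectories.insert_one({
        "_id": particle_id,
        "n_frames": len(frames),
        "values": frames,  # ~13 * 8 * 10,000 bytes of numeric payload
    })

def motion_of(particle_id):
    """Fetch one particle's whole trajectory in a single ~1MB read."""
    return db.trajectories.find_one({"_id": particle_id})["values"]
```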