Increase max document size to at least 64MB

That’s because hitting the 16MB limit usually is an indication of incorrect schema design.

Asya

1 Like

I kindly disagree: I have financial data for currency symbols like USDJPY (US Dollar vs. Japanese Yen) and USDCHF. Each daily candle contains a timestamp and 4 prices: open, high, low, and close.

I’ve been implementing MongoDB queries and complex analysis routines for many years and was - until now - happily using just one document per symbol. Just recently I figured out that, of more than 3000 financial instruments, USDJPY and USDCHF are the only ones with such a huge data history (dating back to January 1970) that they exceed 16MB and thus cannot be stored entirely.

With this 16MB limit I would now have to go through dozens of complicated methods and implement additional “boilerplate” logic to read the history data in chunks, and deal with the much higher complexity of analysis routines that now need to look across chunk boundaries. No fun, seriously.

I do like working with MongoDB and I don’t mind difficult tasks, but implementing additional logic just because there is no way to increase a tight size “budget” seems utterly wrong. At least to me. Not to mention that all this extra logic reduces the readability of the code and lowers its performance.

If there’s a chance, could you at least provide a parameter in MongoDB that defaults to 16MB, so that people who really need more room have the freedom to raise it?

1 Like

The only way to have a larger document size would be to change the limit in the source code, recompile/build it yourself, and then run your own changed copy. I wouldn’t recommend it though, because there are a lot of other places where things can go wrong - drivers, for example, also assume documents will not be larger than 16MB.

It’s hard to imagine that it’s actually required to have the full history of a symbol in a single document. Do you always analyze the full history when querying it? If not, then it’s quite inefficient to keep it all in a single document.
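
As a rough sketch of one such split (the C++ driver is used only for illustration here; the collection, field, and sample values are all invented), a bucketed layout with one document per symbol per year keeps each document to at most a few hundred candles, and the analysis code loads only the years it needs:

// Sketch only: one document per symbol per year instead of one per symbol.
// Collection/field names ("candles", "symbol", "year") are illustrative.
#include <chrono>
#include <bsoncxx/builder/basic/array.hpp>
#include <bsoncxx/builder/basic/document.hpp>
#include <bsoncxx/builder/basic/kvp.hpp>
#include <bsoncxx/types.hpp>
#include <mongocxx/client.hpp>
#include <mongocxx/instance.hpp>
#include <mongocxx/uri.hpp>

using bsoncxx::builder::basic::kvp;
using bsoncxx::builder::basic::make_array;
using bsoncxx::builder::basic::make_document;

int main() {
    mongocxx::instance inst{};
    mongocxx::client client{mongocxx::uri{"mongodb://localhost:27017"}};
    auto candles = client["market"]["candles"];

    // One yearly bucket per symbol: ~260 daily candles, a few KB per document.
    candles.insert_one(make_document(
        kvp("symbol", "USDJPY"),
        kvp("year", 1970),
        kvp("candles", make_array(
            make_document(
                kvp("ts", bsoncxx::types::b_date{std::chrono::milliseconds{0}}),
                kvp("open", 357.73), kvp("high", 358.44),
                kvp("low", 357.10), kvp("close", 358.02))
            /* ...one sub-document per trading day... */))));

    // Analysis routines then load only the years they actually need.
    auto cursor = candles.find(make_document(
        kvp("symbol", "USDJPY"),
        kvp("year", make_document(kvp("$gte", 2015), kvp("$lte", 2020)))));
    for (auto&& yearly : cursor) {
        // ...feed yearly["candles"] into the existing analysis code...
    }
}

A compound index on symbol and year keeps that range lookup cheap.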

Asya

Similar use case here…

I’m using MongoDB to store collections of 10,000 frames of a moving system of 200,000 particles. Each frame stores 13 floating-point values per particle, so a single frame holds 2.6 million floating-point values (20.8 MB). It is just not practical to split a frame across more than one document.

I have 128 processors, 16 TB of SSD, and 1 TB of RAM on the server… could anybody explain the logic behind the 16MB document limit? Sounds a bit 2009.

How are you using this data? Do you always fetch all 10,000 frames of all the values? Sure, 16MB is arbitrary, but 10 thousand seems arbitrary too - why not five thousand?

Asya

1 Like

Hi Asya.

In my scenario, each frame is 2 picoseconds of motion. 10k frames are 20 nanoseconds, which is the minimum window I need to measure. Sampling a shorter window would be statistically insufficient, and reducing the sampling rate would induce aliasing.

After saving I need to build n-dimensional dataframes and do some physics with them. I’m still deciding whether I should use the aggregation pipeline or do everything from Python.

Would you go for GridFS, or definitely compile with the max document size set to… say, 64MB?

Best!
Pedro

I don’t fully understand the implications of forcing a bigger document size. Hence the question.

As for the usage, I need to query each particle in all 10,000 frames and compute its motion. This can be done by getting all 13 attributes of the same particle ID from all documents (if each frame is one document).

So 13 attributes × 8 bytes × 10,000 frames is about 1MB per particle. But then each 20MB frame would have to fit in a single document.

I’m thinking splitting the frame data across 2 collections would do… but it’s far from ideal.
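
Following the arithmetic above, one possible alternative is a per-particle layout - a sketch only, with invented collection and field names: one document per particle per 10,000-frame window comes to roughly 1-2MB of BSON (array keys add some overhead), which matches the “all 13 attributes of one particle ID across all frames” access pattern:

// Sketch only: one document per particle per 10,000-frame window.
// Names ("trajectories", "particle_id", "window", "frames") are invented.
#include <bsoncxx/builder/basic/array.hpp>
#include <bsoncxx/builder/basic/document.hpp>
#include <bsoncxx/builder/basic/kvp.hpp>
#include <bsoncxx/types.hpp>
#include <mongocxx/client.hpp>
#include <mongocxx/instance.hpp>
#include <mongocxx/uri.hpp>

using bsoncxx::builder::basic::kvp;
using bsoncxx::builder::basic::make_document;

int main() {
    mongocxx::instance inst{};
    mongocxx::client client{mongocxx::uri{"mongodb://localhost:27017"}};
    auto trajectories = client["simulation"]["trajectories"];

    // Build the 10,000-frame history for one particle: 13 doubles per frame.
    bsoncxx::builder::basic::array frames;
    for (int frame = 0; frame < 10000; ++frame) {
        bsoncxx::builder::basic::array attrs;
        for (int k = 0; k < 13; ++k) attrs.append(0.0);  // the 13 per-frame values
        frames.append(bsoncxx::types::b_array{attrs.view()});
    }

    trajectories.insert_one(make_document(
        kvp("particle_id", 42),
        kvp("window", 0),  // which 20 ns window this document covers
        kvp("frames", bsoncxx::types::b_array{frames.view()})));

    // The motion analysis then pulls exactly one particle's history per query.
    auto doc = trajectories.find_one(make_document(
        kvp("particle_id", 42), kvp("window", 0)));
    if (doc) {
        // ...hand doc->view()["frames"] to the motion analysis...
    }
}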

The MongoDB Manual says:

we can use embedded documents and arrays to capture relationships between data in a single document structure instead of normalizing across multiple documents and collections, this single-document atomicity obviates the need for multi-document transactions for many practical use cases.

So we tend to use embedded documents, but sometimes one document can get very large - in our project it may reach 30MB or more - so we must split it and keep reference relationships between the pieces (see the sketch below), which gets very complicated. And on a standalone server MongoDB doesn’t support multi-document transactions, which is strange.
Redis has the RedisJSON module, where the limit is 512MB; I wish MongoDB would increase its limit and support JSONPath.
I’d also like to know why MongoDB only supports multi-document transactions on replica sets and sharded clusters - sometimes we just want to test transactions but must deploy a replica set, which is troublesome.
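
For illustration, a minimal sketch of that split-with-references approach (all names below are invented) - whether it is worth this extra bookkeeping is exactly the complaint above:

// Sketch: a parent document keeps ordered ObjectId references to chunk documents.
#include <bsoncxx/builder/basic/array.hpp>
#include <bsoncxx/builder/basic/document.hpp>
#include <bsoncxx/builder/basic/kvp.hpp>
#include <bsoncxx/oid.hpp>
#include <bsoncxx/types.hpp>
#include <mongocxx/client.hpp>
#include <mongocxx/instance.hpp>
#include <mongocxx/uri.hpp>

using bsoncxx::builder::basic::kvp;
using bsoncxx::builder::basic::make_document;

int main() {
    mongocxx::instance inst{};
    mongocxx::client client{mongocxx::uri{"mongodb://localhost:27017"}};
    auto db = client["app"];

    // Insert each oversized piece as its own document and remember its _id.
    bsoncxx::builder::basic::array chunk_ids;
    for (int i = 0; i < 3; ++i) {
        bsoncxx::oid chunk_id;  // default construction generates a new ObjectId
        db["chunks"].insert_one(make_document(
            kvp("_id", bsoncxx::types::b_oid{chunk_id}),
            kvp("seq", i),
            kvp("payload", "...one slice of the large structure...")));
        chunk_ids.append(bsoncxx::types::b_oid{chunk_id});
    }

    // The parent document holds only metadata plus the ordered references.
    db["parents"].insert_one(make_document(
        kvp("name", "big-structure"),
        kvp("chunks", bsoncxx::types::b_array{chunk_ids.view()})));
}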

I’m not sure what you are talking about - MongoDB has supported transactions (including across shards) for years now…

Asya

1 Like

Hi @timer_izaya,

Multi-document transactions rely on the replication oplog which is not present on a standalone mongod deployment.

However, you can deploy a single node replica set for test purposes.
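
To make that concrete, here is a sketch with the C++ driver (URI, database, and collection names are invented) of a multi-document transaction run against such a single-node replica set, assuming mongod was started with --replSet rs0 and the set was initialised with rs.initiate():

// Sketch: multi-document transaction against a single-node replica set.
#include <bsoncxx/builder/basic/document.hpp>
#include <bsoncxx/builder/basic/kvp.hpp>
#include <mongocxx/client.hpp>
#include <mongocxx/client_session.hpp>
#include <mongocxx/instance.hpp>
#include <mongocxx/uri.hpp>

using bsoncxx::builder::basic::kvp;
using bsoncxx::builder::basic::make_document;

int main() {
    mongocxx::instance inst{};
    mongocxx::client client{
        mongocxx::uri{"mongodb://localhost:27017/?replicaSet=rs0"}};

    auto orders   = client["shop"]["orders"];
    auto payments = client["shop"]["payments"];

    auto session = client.start_session();
    // with_transaction retries the callback on transient errors and commits it.
    session.with_transaction([&](mongocxx::client_session* s) {
        orders.insert_one(*s, make_document(kvp("order_id", 1), kvp("total", 9.99)));
        payments.insert_one(*s, make_document(kvp("order_id", 1), kvp("status", "paid")));
    });
}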

Regards,
Stennie

Don’t you love it when a company tells you it’s your business that’s wrong, not their product.

We have also crashed into the 16MB document size limit in another financial use case. The application design is sound, the issue is not the schema or the way in which we are using the tool. The issue is the tool, so we have little choice but to switch to another tool unfortunately.

It seems that if multiple customers, in different industries, with very different use cases, are all struggling with the limitation, it would be prudent for that company to ask whether it’s really the customers who have gotten it wrong, instead of spikily insisting that they don’t know what they are doing or don’t understand their own domain.

5 Likes

I have no choice but to implement a generic method to split huge JSON documents.

Same here. I have a simple website builder where we keep pages for easy CSS and text manipulation.

We’re storing documents with images and lidar point clouds, using MongoDB as a geospatial database. Each pose of the vehicle is stored as a GeoJSON point and queried by geographic location and radius. The collection reaches 100GB with 10 minutes of vehicle travel (with images under 16MB). Because we need better resolution, we’re now hitting that 16MB limit with the image sizes alone. I understand the issue is a limit in BSON, and that there’s a separate mechanism, GridFS, which stores large BLOBs by splitting them into chunks kept in their own collections. It’s my opinion that this makes the database, software, and filesystem more complex to manage, unless I’m missing something.
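
For reference, a rough sketch of the GridFS route with the C++ driver (database, collection, and file names below are invented): the blob is streamed into a bucket, and a small pose document keeps the returned file id.

// Sketch: storing one in-memory image via GridFS with mongocxx.
// The chunks live in ordinary collections (fs.files / fs.chunks), not on the local filesystem.
#include <cstdint>
#include <vector>
#include <bsoncxx/builder/basic/array.hpp>
#include <bsoncxx/builder/basic/document.hpp>
#include <bsoncxx/builder/basic/kvp.hpp>
#include <mongocxx/client.hpp>
#include <mongocxx/gridfs/bucket.hpp>
#include <mongocxx/instance.hpp>
#include <mongocxx/uri.hpp>

using bsoncxx::builder::basic::kvp;
using bsoncxx::builder::basic::make_array;
using bsoncxx::builder::basic::make_document;

int main() {
    mongocxx::instance inst{};
    mongocxx::client client{mongocxx::uri{"mongodb://localhost:27017"}};
    auto db = client["mapping"];
    auto bucket = db.gridfs_bucket();

    std::vector<std::uint8_t> image(25 * 1024 * 1024, 0);  // e.g. one 25MB raw still

    auto uploader = bucket.open_upload_stream("cam3_frame_000123.bgr8");
    uploader.write(image.data(), image.size());
    auto uploaded = uploader.close();  // returns the GridFS file id

    // A small pose document references the image by its GridFS id.
    db["poses"].insert_one(make_document(
        kvp("pose", make_document(
            kvp("type", "Point"),
            kvp("coordinates", make_array(-122.4194, 37.7749)))),
        kvp("image_id", uploaded.id())));
}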

The reason we were sold on MongoDB was its performance, NoSQL model, and geospatial indexes. We’re starting to question the true performance after using MongoDB for 4 years. Each write to the collection, with just one spherical (2dsphere) index, takes 0.5s on an i9 with 64GB RAM and a Samsung 970 EVO SSD. Granted, this isn’t the fastest machine out there, but we’re limited to what we can fit and power on an electric vehicle. What we’re learning about other database systems, like Cassandra, is that they’re faster and have size limits (including for BLOBs) of 2GB. I’d really hate to rewrite my data abstraction layer.

Also, the bsoncxx and mongocxx API documentation is really lacking and needs updating. It’s mostly generated with Doxygen, with very few descriptive comments and few if any code examples. I had to read the source code to figure out how to write and read binary data; it took me way longer than necessary. Here’s an example of the only official documentation I could find on it.
I don’t know how anyone could get this from that documentation:

  bsoncxx::types::b_binary b_blob 
  { 
    bsoncxx::binary_sub_type::k_binary,
    sizeof your_array_or_object,
    reinterpret_cast<uint8_t*>(&your_array_or_object)
  };

@Matthew_Richards what are you storing in a single document? Normally images tend to be stored separately from various other “metadata”, in part because letting documents get really large means that all operations on the document will be slower - and if you’re doing any sort of updates or partial reads of the document, then you’d be better off having things stored separately.
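
As a sketch of the read side of that split (assuming a 2dsphere index on the pose field, and reusing the invented names from the upload sketch above), the radius query touches only the small pose documents, and the heavy image bytes are pulled from GridFS only when actually needed:

// Sketch: geo query on lightweight pose documents, then fetch the referenced blob.
#include <cstddef>
#include <cstdint>
#include <vector>
#include <bsoncxx/builder/basic/array.hpp>
#include <bsoncxx/builder/basic/document.hpp>
#include <bsoncxx/builder/basic/kvp.hpp>
#include <mongocxx/client.hpp>
#include <mongocxx/gridfs/bucket.hpp>
#include <mongocxx/instance.hpp>
#include <mongocxx/uri.hpp>

using bsoncxx::builder::basic::kvp;
using bsoncxx::builder::basic::make_array;
using bsoncxx::builder::basic::make_document;

int main() {
    mongocxx::instance inst{};
    mongocxx::client client{mongocxx::uri{"mongodb://localhost:27017"}};
    auto db = client["mapping"];
    auto bucket = db.gridfs_bucket();

    // All poses within 50 metres of a point; only small metadata documents come back.
    auto cursor = db["poses"].find(make_document(
        kvp("pose", make_document(
            kvp("$nearSphere", make_document(
                kvp("$geometry", make_document(
                    kvp("type", "Point"),
                    kvp("coordinates", make_array(-122.4194, 37.7749)))),
                kvp("$maxDistance", 50.0)))))));

    for (auto&& pose : cursor) {
        // Fetch the heavy blob only for the poses we actually care about.
        auto downloader = bucket.open_download_stream(pose["image_id"].get_value());
        std::vector<std::uint8_t> image(
            static_cast<std::size_t>(downloader.file_length()));
        downloader.read(image.data(), image.size());  // raw bgr8 bytes for the pipeline
    }
}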

Asya

1 Like

We use ROS (Robot Operating System) and it’s distributed between 6 different computers on the vehicle. We have a perception (cameras and lidar), localization/navigation, mapping, control, safety, and a database computer running MongoDB. It’s too heavy a load for a single computer; we had a team build an amazing (and fragile) Threadripper and it failed.

The images come in from six cameras as a 15fps stream of compressed raw stills over the network (not files). Each still is 5MB from the low-res cameras and 25MB from the high-res cameras. We thought about writing the raw images (bgr8 format - the binary isn’t a standard format like jpg or png) with a unique id to the filesystem, but that creates different problems. The software isn’t running on the same machine as MongoDB, and when documents are updated or deleted, the filesystem also needs to be updated. It can all be figured out; it just complicates things to use the filesystem.

It’s much cleaner and faster to write the C++ objects and arrays directly to the database, using lazy writes if the filesystem can’t keep up, and cast them back when read. No encoding/decoding or serialization/deserialization. For example:

To Write:

MyClass my_object;
// Wrap the object's raw bytes; b_binary does not copy them,
// so my_object must stay alive until the document is built.
bsoncxx::types::b_binary b_object
{
  bsoncxx::binary_sub_type::k_binary,
  uint32_t(sizeof(MyClass)),
  reinterpret_cast<const uint8_t*>(&my_object)  // address of the object, not the object itself
};

Then to read:

// get_binary().bytes is a const uint8_t*, so read through a pointer-to-const
const MyClass* my_object =
  reinterpret_cast<const MyClass*>(my_doc["my_object"].get_binary().bytes);

It’s simple = more stable, less code = faster, and less to maintain = more reliable.

Hi, I understand the design thinking behind MongoDB restricting the document size to 16MB. However, I think there should not be any limit on document size: developers should be allowed to structure their data the way they think is best for their product, and companies could simply pay more for their overall database size without any limit on the size of a single document. I am currently considering changes to the design of my database, and it’s quite some work to think through even though we have not even launched our product yet. I can only imagine what companies with far more data have to go through to redesign their database and migrate without issues. It would really be amazing if MongoDB removed the document size limit to keep its customers satisfied.

1 Like

OK, so I started making changes to my database design, and I think a MongoDB limit of 64MB would be much better and easier to work with. However, I do understand that it might be strange for older developers to adopt quickly. I only learnt SQL but never worked with it deeply; MongoDB is all I have worked with, so making the changes is a lot easier for me and probably other younger developers.

1 Like

There is currently a request for this open on the feedback site.
Anyone interested in seeing the limit increased should probably go and vote for that idea to show their support:

1 Like

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.