Large arrays within Time-Series collections cause issues with bucketing

I was loading some observation data that happened to contain two large arrays (7,200 doubles each) and noticed that after the load the buckets were 1-to-1 with the documents. I took out both arrays and time-series bucketing worked perfectly; put them back in and I was back to a 1-document-to-1-bucket situation. I know there is a 16 MB per-document limit, so I measured the size of one document containing both large arrays, and it came out to 0.59 MB. Is this by design, or is it a bug?
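A minimal sketch of how one might reproduce this check with pymongo (the database, collection, and field names here are illustrative assumptions, not the actual schema):

```python
from datetime import datetime, timedelta, timezone

import bson
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client["test"]

# Time-series collection keyed on "ts", with "sensor" as the metadata field.
db.drop_collection("obs")
db.create_collection(
    "obs",
    timeseries={"timeField": "ts", "metaField": "sensor", "granularity": "seconds"},
)

start = datetime.now(timezone.utc)
docs = [
    {
        "ts": start + timedelta(seconds=i),
        "sensor": {"id": "unit-1"},
        # Two large arrays of 7,200 doubles each, as in the report above.
        "waveform_a": [float(i)] * 7200,
        "waveform_b": [float(i)] * 7200,
    }
    for i in range(100)
]

# BSON size of one document -- well under the 16 MB document limit.
print("one document:", len(bson.encode(docs[0])), "bytes")

db["obs"].insert_many(docs)

# Compare document count with bucket count. The bucket namespace below is a
# server internal and may differ between MongoDB versions.
print("documents:", db["obs"].count_documents({}))
print("buckets:  ", db["system.buckets.obs"].count_documents({}))
```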

Hi @Allan_Chase,

Welcome to the MongoDB Community :seedling:

In MongoDB, we recommend not exceeding roughly 200 elements per array in order to maintain query efficiency. In your case, having 7,200 elements in a single array is not a good schema choice in terms of query performance.

The maximum document size is 16 MB, so you can certainly store more than 7,200 elements in a single document. However, storing data in that manner will degrade performance and result in high query latency.

What I would advocate, in the most general terms, is to ask: how will the schema simplify your workflow while, at the same time, allowing you to create indexes that make those workloads run faster?

In certain scenarios, arrays may be the preferred option, while in others, embedded documents might be more suitable. Depending on the specific use case, arrays can streamline certain operations but complicate others, and vice versa. It therefore falls to you to find the balance that satisfies all of your workloads while still performing well with indexes.
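As an illustration only (assuming the 7,200 values are individual time-stamped samples, which may not match your exact data), one alternative is to store each sample as its own time-series document and let the server handle the bucketing, rather than embedding the whole series as one array:

```python
from datetime import datetime, timedelta, timezone

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client["test"]

db.drop_collection("samples")
db.create_collection(
    "samples",
    timeseries={"timeField": "ts", "metaField": "meta", "granularity": "seconds"},
)

observation_start = datetime.now(timezone.utc)
waveform = [0.1 * i for i in range(7200)]  # stand-in for the real measurements

# One small document per sample; the server groups them into buckets itself.
db["samples"].insert_many(
    [
        {
            "ts": observation_start + timedelta(seconds=i),
            "meta": {"sensorId": "unit-1", "channel": "a"},
            "value": value,
        }
        for i, value in enumerate(waveform)
    ]
)
```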

Please refer to the following resource to learn more:

Best regards,
Kushagra

As usual, Kushagra, great explanation. This makes total sense. The queries against this collection are usually time-based with a couple of other criteria (but not using the arrays in the query criteria). Thanks for taking the time to explain this for me.
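For anyone finding this later, that query pattern looks roughly like the sketch below (field names are assumptions, not the actual schema):

```python
from datetime import datetime, timezone

from pymongo import MongoClient

coll = MongoClient("mongodb://localhost:27017")["test"]["obs"]

# Secondary index on metadata sub-fields plus the time field (secondary
# indexes on time-series collections are supported in recent server versions).
coll.create_index([("sensor.id", 1), ("sensor.site", 1), ("ts", 1)])

# Time-range query with a couple of other criteria, none touching the arrays.
day_start = datetime(2024, 1, 1, tzinfo=timezone.utc)
day_end = datetime(2024, 1, 2, tzinfo=timezone.utc)
for doc in coll.find(
    {
        "sensor.id": "unit-1",
        "sensor.site": "plant-7",
        "ts": {"$gte": day_start, "$lt": day_end},
    }
):
    print(doc["ts"])
```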
