Multiple collections vs one collection for the same data

Rodrigo_Vazquez · August 15, 2023, 7:53am

Hi, I am pretty new to MongoDB, and I am developing a web app where I query large time series and aggregate them. For that, I have an identifier variable which I am filtering by using the following match condition in the query (example):

{'$match': {'id': '123X'}}

Currently, I have all the time series (one for each identifier) in the same collection. Running the query this way gives too high a latency. I wonder if a design with one collection per identifier would improve performance (as I would avoid the match condition and could apply this condition in an earlier step in the API service).

Basically:

A single collection with all identifiers vs. several collections, one for every identifier.

What is better (if any)?

steevej · August 15, 2023, 1:36pm

It all depends on your use cases, so only you can really answer that after doing some performance tests.

With new aggregation operators like $unionWith it is less important to keep together documents that are involved in the same use-cases.
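As a rough sketch of the $unionWith point, assuming one collection per identifier with hypothetical names like series_123X (the naming scheme and the helper function are illustrative, not something from your setup), a pipeline that brings several identifiers back together could be built like this:

```python
from typing import Any, Dict, List, Optional


def build_union_pipeline(
    other_collections: List[str],
    sub_pipeline: Optional[List[Dict[str, Any]]] = None,
) -> List[Dict[str, Any]]:
    """Build stages that append documents from other collections.

    Each name in other_collections becomes one $unionWith stage. If
    sub_pipeline is given, it is run on that collection before the union.
    """
    stages: List[Dict[str, Any]] = []
    for name in other_collections:
        union: Dict[str, Any] = {"coll": name}
        if sub_pipeline:
            union["pipeline"] = list(sub_pipeline)
        stages.append({"$unionWith": union})
    return stages


# Hypothetical collection names, one per identifier.
pipeline = build_union_pipeline(["series_456Y", "series_789Z"])

# With pymongo, this would run against the first collection, e.g.:
#   db["series_123X"].aggregate(pipeline)
```

Every extra collection adds one more stage, so this probably fits queries that span a handful of identifiers better than queries that span all of them.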

In your case, having multiple collections might help, since the id field can be removed from all documents, which means more documents fit in RAM.

An alternative to splitting into many collections is to have partial indexes where your id:123X condition is the partialFilterExpression. This way, specific and smaller indexes will be used whenever your $match includes id:123X.
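A minimal sketch of such a partial index, assuming the documents carry a time field called timestamp (that field name, the index name and the helper are assumptions for illustration). The helper only prepares the arguments; the commented line shows how they would be handed to pymongo's create_index:

```python
from typing import Any, Dict, List, Tuple


def partial_index_spec(
    identifier: str, sort_field: str = "timestamp"
) -> Tuple[List[Tuple[str, int]], Dict[str, Any]]:
    """Return (keys, options) for an index covering a single identifier.

    The index is ascending on sort_field and only contains documents
    whose id equals the given identifier.
    """
    keys = [(sort_field, 1)]
    options = {
        "name": f"{sort_field}_for_{identifier}",
        "partialFilterExpression": {"id": identifier},
    }
    return keys, options


keys, options = partial_index_spec("123X")

# With pymongo, on the single shared collection:
#   collection.create_index(keys, **options)
```

The query has to include the same id condition as the partialFilterExpression, otherwise the planner cannot pick this index.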

That being said, maybe your latency issue is simply due to a lack of indexes, something that often happens to people who are new to MongoDB.
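One way to check this is to look at the explain output of the query and see whether it reports a collection scan (COLLSCAN) or an index scan (IXSCAN). The sketch below assumes the plan is a nested dict with "stage" keys, as in queryPlanner.winningPlan. The sample plan is hand-written for illustration, not real output, and the exact shape varies between server versions:

```python
from typing import Any, List


def plan_stages(node: Any) -> List[str]:
    """Collect every "stage" value found anywhere in an explain plan."""
    found: List[str] = []
    if isinstance(node, dict):
        if "stage" in node:
            found.append(node["stage"])
        for value in node.values():
            found.extend(plan_stages(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(plan_stages(item))
    return found


# Hand-written example shaped like a winning plan that uses an index on id.
sample_plan = {
    "stage": "FETCH",
    "inputStage": {"stage": "IXSCAN", "indexName": "id_1"},
}
stages = plan_stages(sample_plan)
uses_index = "IXSCAN" in stages and "COLLSCAN" not in stages

# With pymongo, the real plan and a plain index on id would come from:
#   plan = collection.find({"id": "123X"}).explain()["queryPlanner"]["winningPlan"]
#   collection.create_index([("id", 1)])
```

If the plan shows COLLSCAN, creating an index on id is likely the first thing to try before changing the collection design.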