Strategy for storing billions of documents that need to be queryable

I need to store up to billions of rather small documents containing information like this:

{ "deviceId": 1234, "timestamp": "2020-04-03T15:00Z", "value": 13.534233}

There may be additional fields, but basically that's about it.

Documents need to be queryable by timestamp, deviceId, etc.

Typically there are 10^3 to 10^6 documents for a specific deviceId.

The problem is: over time we will end up having billions of documents. At some point it may be possible to actually drop old data, but MongoDB must be able to handle billions of documents and serve queries in reasonable time. For now, MongoDB needs to run locally. Scaling, vertically or horizontally, could be an option.

Are there any suitable strategies for that? Can I somehow split collections?

Currently we use Elasticsearch with up to 1000 indices containing all the documents.
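
For concreteness, here is a rough sketch of the straightforward one-document-per-reading layout with a compound index on deviceId and timestamp; the connection string, database and collection names below are placeholders, not part of the actual setup:

```python
from datetime import datetime, timezone
from pymongo import MongoClient, ASCENDING, DESCENDING

# Rough sketch: one document per reading.
# Connection string, database and collection names are placeholders.
client = MongoClient("mongodb://localhost:27017")
col = client["iot"]["datapoints"]

# Compound index so lookups by device and time range are index scans.
col.create_index([("deviceId", ASCENDING), ("timestamp", DESCENDING)])

col.insert_one({
    "deviceId": 1234,
    "timestamp": datetime(2020, 4, 3, 15, 0, tzinfo=timezone.utc),
    "value": 13.534233,
})

# All readings of one device since the start of 2020, newest first.
for doc in col.find({
    "deviceId": 1234,
    "timestamp": {"$gte": datetime(2020, 1, 1, tzinfo=timezone.utc)},
}).sort("timestamp", DESCENDING):
    print(doc)
```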

Hi @Kay_Zander,
In my opinion, the bucket pattern seems to fit your case very well.
Thanks,
Rafael
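
To illustrate what that could look like for this data, a minimal sketch in Python/pymongo (the database and collection names and the bucket size of 200 readings are my assumptions, not from the posts above): rather than one document per reading, consecutive readings of a device are folded into a single bucket document, which reduces the total document count and index size by orders of magnitude.

```python
from datetime import datetime, timezone
from pymongo import MongoClient

# Minimal sketch of the bucket pattern, assuming a local MongoDB instance
# and a hypothetical "iot" database / "datapoints_buckets" collection.
client = MongoClient("mongodb://localhost:27017")
col = client["iot"]["datapoints_buckets"]

def insert_reading(device_id: int, ts: datetime, value: float, bucket_size: int = 200):
    """Append one reading to the current bucket for this device.

    Each bucket document holds up to `bucket_size` measurements for one
    device; once a bucket is full, the upsert starts a new one.
    """
    col.update_one(
        {"deviceId": device_id, "count": {"$lt": bucket_size}},
        {
            "$push": {"measurements": {"timestamp": ts, "value": value}},
            "$min": {"start": ts},   # earliest timestamp in this bucket
            "$max": {"end": ts},     # latest timestamp in this bucket
            "$inc": {"count": 1},
        },
        upsert=True,
    )

# Index so queries by device and time range can match buckets directly.
col.create_index([("deviceId", 1), ("start", 1), ("end", 1)])

insert_reading(1234, datetime(2020, 4, 3, 15, 0, tzinfo=timezone.utc), 13.534233)
```

Queries by time range then match buckets on start/end and unwind the measurements array to get at the individual readings.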


Just came across Time Series Collections, which are new in 5.x.
So I would create a single time series collection “datapoints” that stores all (up to 10^9) documents coming from different sources.
Can I still expect reasonable query performance when running complex queries on such a huge data set?
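
A hedged sketch of how that could look with pymongo against MongoDB 5.0+ (the database name, granularity and the optional TTL are assumptions on my side): you keep writing plain per-reading documents, and the time series collection groups them internally by the metaField, which keeps storage and index size much smaller than one indexed document per reading.

```python
from datetime import datetime, timezone
from pymongo import MongoClient
from pymongo.errors import CollectionInvalid

# Sketch of a time series collection (MongoDB 5.0+), assuming a local
# instance and a hypothetical "iot" database; field names mirror the
# sample document from the original question.
client = MongoClient("mongodb://localhost:27017")
db = client["iot"]

try:
    db.create_collection(
        "datapoints",
        timeseries={
            "timeField": "timestamp",   # when the reading was taken
            "metaField": "deviceId",    # readings are bucketed internally by this
            "granularity": "seconds",
        },
        expireAfterSeconds=60 * 60 * 24 * 365,  # optional: auto-drop data older than ~1 year
    )
except CollectionInvalid:
    pass  # collection already exists

datapoints = db["datapoints"]
datapoints.insert_one({
    "deviceId": 1234,
    "timestamp": datetime(2020, 4, 3, 15, 0, tzinfo=timezone.utc),
    "value": 13.534233,
})

# Typical query: all values for one device in a time window, newest first.
cursor = datapoints.find({
    "deviceId": 1234,
    "timestamp": {"$gte": datetime(2020, 1, 1, tzinfo=timezone.utc)},
}).sort("timestamp", -1)
```

In general, queries that filter on the metaField (deviceId) and the timeField benefit most from the internal bucketing; how well truly complex, ad-hoc queries perform will mainly depend on how selective their filters on deviceId and timestamp are.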
