We use MongoDB (v. 4.0.13) to store sensor data from IoT devices. We have two collections that store the same type of documents. Each document has a datetime attribute. We use PyMongo (v. 3.7.2) and Python (3.7.2) to retrieve the data from the MongoDB database. More complex aggregation queries are quite common. The situation is as follows:
- The first collection stores documents that are no older than one month; older documents are deleted every day. This keeps the collection at a reasonable size, since it is the one our web application retrieves data from.
- The second collection holds every document ever received, and we never delete anything from it. This gives us detailed historical data for every device; it is the collection we query when we need data older than one month.
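The daily cleanup of the first collection boils down to a datetime cutoff, roughly along these lines (a sketch; `purge_old` and the field name are placeholders for our actual code):

```python
from datetime import datetime, timedelta

def purge_old(collection, days=30, now=None):
    # Delete documents whose datetime is older than `days` days.
    # `collection` is a PyMongo Collection; we run this once a day.
    cutoff = (now or datetime.utcnow()) - timedelta(days=days)
    return collection.delete_many({"datetime": {"$lt": cutoff}})
```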
We expect these collections to receive a lot of data - tens of millions of documents. Consequently, the second collection is bound to reach an unmanageable size, at which point every query will take a significant, and eventually unacceptable, amount of time.
We already use indexes (which sped up our queries significantly) and are looking for additional ways to speed up our queries and manage our collections so that the time and CPU utilization of these queries stay reasonably low and stable.
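For reference, the kind of index we rely on is a compound one covering device and time (field names here are placeholders for our actual schema; 1 means ascending, -1 descending):

```python
def ensure_indexes(collection):
    # Compound index: equality match on device_id, then range filter
    # and sort on datetime. `collection` is a PyMongo Collection;
    # the field names are placeholders.
    return collection.create_index([("device_id", 1), ("datetime", -1)])
```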
- Is there a way to split this large collection into multiple smaller ones (of roughly similar size, for example), but query more than one of them, if necessary, as if they were a single collection? Say the data I want is split between two of the smaller collections - I would need to combine the two and then run find/aggregate on the combined data. Is there a way to achieve this?
- Is the approach presented in question #1 feasible and MongoDB-friendly? If not, are there other ways to approach this problem?
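To make question #1 concrete: in application code, I could run the same query against each smaller collection and merge the already-sorted cursors, roughly like this (a sketch; `merge_results` and the collection names are placeholders):

```python
from heapq import merge

def merge_results(cursors, key="datetime"):
    # Lazily combine several already-sorted result streams into one
    # stream ordered by `key`. Each element of `cursors` can be any
    # sorted iterable of documents - in practice, PyMongo cursors
    # created with sort=[("datetime", 1)].
    return merge(*cursors, key=lambda doc: doc[key])

# Hypothetical usage (collection names are placeholders):
# q = {"datetime": {"$gte": start, "$lt": end}}
# parts = [db.part_a.find(q, sort=[("datetime", 1)]),
#          db.part_b.find(q, sort=[("datetime", 1)])]
# for doc in merge_results(parts):
#     ...
```

This only covers find-style queries, though; aggregations would still have to run per collection and be combined in Python, which is part of why I am asking whether MongoDB itself supports something like this.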