Bucket pattern when data frequency is not stable


From the bucket pattern, I understand that we can keep parts of the data in separate documents based on some criteria, for example time: 1 document/day.

But what can we do if this flow is not stable?

For example, sensor1 sometimes sends 100000 readings/day and on other days sends 1000/day.
And sensor2 sends 10/day, but sometimes 1/day.

Here the sensors are not stable, and they also differ too much from each other to sit in the same collection. Any ideas on how to bucket this?

I thought of checking the document first, and if it is big, creating another document (bucket), i.e. bucketing based on document size. But this makes updates slower, since we query first and then update. Maybe there are better ways.

Thank you

There are no hard-set rules for bucket size or bucket homogeneity, except for the 16MB document size limit.

You can also combine patterns.

For your 100000/day sensor you may use the outlier pattern.

An alternative could be the computed pattern, where rather than keeping 100000 data points you keep a count, max, min, avg, and maybe the last 10 values. In a lot of situations (sensor values being one), that many data points become meaningless anyway.
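A minimal sketch of what such a computed summary could look like, as a plain in-memory fold (the field names and the keep-last-10 choice are just illustrative assumptions):

```python
def fold_reading(summary, value, keep_last=10):
    """Fold one sensor reading into a computed-pattern summary.

    Instead of storing every data point, the document keeps a count,
    min, max, running sum (for the average) and the last few values.
    """
    if summary is None:
        summary = {"count": 0, "min": value, "max": value,
                   "sum": 0, "last": []}
    summary["count"] += 1
    summary["sum"] += value
    summary["min"] = min(summary["min"], value)
    summary["max"] = max(summary["max"], value)
    summary["avg"] = summary["sum"] / summary["count"]
    # keep only the most recent readings, like a $push with $slice
    summary["last"] = (summary["last"] + [value])[-keep_last:]
    return summary

summary = None
for v in [12, 7, 30, 19]:
    summary = fold_reading(summary, v)

print(summary["count"], summary["min"], summary["max"], summary["avg"])
# 4 7 30 17.0
```

In MongoDB terms the same fold maps onto a single update with `$inc`, `$min`, `$max` and `$push` with `$slice`, so no read is needed before the write.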

If you look at the update API you will see that an update always comes with a query. You just have to make a more complex query so the normal case updates correctly, and handle the outliers when the normal update fails. Your update could look like:

query = { "sensor_id" : 369 , "date" : today , "count" : { "$lt" : 1000 } }
update = { "$push" : { "data" : value } , "$inc" : { "count" : 1 } }
result = c.updateOne( query , update )
if ( result.matchedCount == 0 )
  updateOutlier( ... )

(Note that the `$size` query operator only matches an exact array length, so the bucket keeps an explicit `count` field that is incremented alongside each `$push`.)

It is probably possible to do both the normal and the outlier update in a single round trip to the database using bulkWrite with 2 updateOne operations: one would fail for the outlier case and the other for the normal case. Simply use count : { "$lt" : x } in one and count : { "$gte" : x } in the other.
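A sketch of the two complementary operations, built as plain dicts so the logic can be checked without a server (with PyMongo you would wrap each pair in an UpdateOne and pass the list to bulk_write; the 1000 threshold and field names are assumptions):

```python
THRESHOLD = 1000  # assumed bucket size limit

def bucket_ops(sensor_id, date, value):
    """Build the two complementary updateOne operations for a bulkWrite.

    Exactly one of the two filters can match a given bucket document,
    so one update applies and the other is a no-op.
    """
    normal = {
        "filter": {"sensor_id": sensor_id, "date": date,
                   "count": {"$lt": THRESHOLD}},
        "update": {"$push": {"data": value}, "$inc": {"count": 1}},
    }
    outlier = {
        "filter": {"sensor_id": sensor_id, "date": date,
                   "count": {"$gte": THRESHOLD}},
        "update": {"$push": {"overflow": value}, "$inc": {"count": 1}},
    }
    return [normal, outlier]

def matches(filter_doc, doc):
    """Tiny evaluator for the $lt/$gte count conditions (illustration only)."""
    cond = filter_doc["count"]
    if "$lt" in cond:
        return doc["count"] < cond["$lt"]
    return doc["count"] >= cond["$gte"]

ops = bucket_ops(369, "2023-01-01", 42)
for count in (0, 999, 1000, 50000):
    hits = [op for op in ops if matches(op["filter"], {"count": count})]
    assert len(hits) == 1  # exactly one of the two operations applies
```

The point is that the two filters partition the space of bucket sizes, so the bulkWrite never double-writes and never misses.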

Alternatively, you may simply do your $push and then handle outliers asynchronously using the change stream API: put your array size limitation in the stream query, so your outlier code is automatically called when a document becomes an outlier.
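A sketch of the change-stream side, assuming the bucket maintains a count field as above; the collection and field names are illustrative, and the watch() call itself needs a replica set to actually run, so it is shown commented out:

```python
THRESHOLD = 1000  # assumed outlier threshold

# $match stage for the change stream: only fire when an update
# leaves a bucket at or past the threshold. fullDocument fields are
# available when watch() is opened with full_document="updateLookup".
pipeline = [
    {"$match": {
        "operationType": "update",
        "fullDocument.count": {"$gte": THRESHOLD},
    }}
]

# With a real connection this would look something like:
# for change in db.sensor_buckets.watch(
#         pipeline, full_document="updateLookup"):
#     handle_outlier(change["fullDocument"])   # hypothetical handler
```

This keeps the hot write path to a single unconditional $push while the outlier handling runs out of band.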

Hello steevej : ) thank you for trying to help.

Another example of the problem I have could be chatrooms.
Some chatrooms are busy, others are not, and sometimes a chatroom becomes busy for a period of time.

So bucketing by date doesn’t make sense; 1 chatroom can be inactive for months.
The question is how to save all those messages as part of the chatroom document (aside from the solution of making messages a separate collection)?

How would you model chatrooms where some are busy with thousands of members, and others have like 10 people that almost never type? (The number of people is not the criterion: some big chats can be inactive, while some small ones can be busy.)

The outlier pattern, as far as I understand, is when one schema fits the vast majority, and a few documents can have, for example, big arrays, so I keep that extra part elsewhere for those few documents (like Twitter followers, where the vast majority have < 1000).
Maybe this can help, if I save the data in multiple collections.
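To make my understanding concrete, the follower example could be sketched like this, with the main document spilling into overflow documents once it is "big" (pure-Python dicts; the limit and field names are made up):

```python
FOLLOWER_LIMIT = 1000  # assumed cutoff for the "vast majority"

def add_follower(user_doc, extra_docs, follower):
    """Outlier pattern: most users fit in one document; the few big
    ones set has_extras and spill into overflow documents."""
    if len(user_doc["followers"]) < FOLLOWER_LIMIT:
        user_doc["followers"].append(follower)
        return
    user_doc["has_extras"] = True
    # start a new overflow document when the last one is full
    if not extra_docs or len(extra_docs[-1]["followers"]) >= FOLLOWER_LIMIT:
        extra_docs.append({"user_id": user_doc["_id"], "followers": []})
    extra_docs[-1]["followers"].append(follower)

user = {"_id": "u1", "followers": [], "has_extras": False}
extras = []
for i in range(2500):
    add_follower(user, extras, f"f{i}")

print(len(user["followers"]), user["has_extras"], len(extras))
# 1000 True 2
```

Most reads only touch the main document; the overflow documents (possibly in another collection) are fetched only when has_extras is set.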

The computed pattern, as far as I understand, is when I want to pre-calculate some aggregations, for example, to avoid computing them on every read.
I don’t think the computed pattern can help; I don’t want to use aggregation functions.

Each situation is different. There is no one-size-fits-all solution. If you feel the bucket pattern does not apply, then do not use it. There is no problem with the following:

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.