Need some advice on how to design my database

Til_Weimann · December 28, 2020, 8:19pm

Hey,

I’m new to MongoDB need some advice on how to design my database (with mongoose).

I want to have multiple documents that each have a user, a date, a type and a number.
Every few minutes the db should be check if any date is smaller than Date.now and if yes, my program do some things with type and user. The second thing that should be checked is if for every type, the number is bigger than another external value I am comparing it to, and if yes, it should do the same thing.

Since the database could get very big and these checks should run every couple of minutes, I want to make sure this thing is as efficient as possible. To achieve this, I thought about storing documents with different users and numbers as subdocuments with a parent that has a unique combination of type and date. These top-level documents should be sorted by date on saving so that only the first document has to be checked for an expired date every time the checks begin and for every other document, only the first subdocument (sorted by the number) has to be checked.

My question is if this is a good idea or if there is any better way to do it. If no, can you explain how I can sort the top-level documents so that I only have to read the first one’s date?

Thank you,
Til

Pavel_Duchovny · December 29, 2020, 6:34am

Hi @Til_Weimann,

Welcome to MongoDB community!

Based on the very high level description provided here it sounds like your documents could have 2 potential structures:

Bucket data by dates , although not sure what should be the boundaries of each document (min/hour or day). It mostly depands on how many user embedded documents are expected to be located in the array. We do not recommend arrays growing beyond of 1000 elements per document.
Once you bucket the data you can index inner user and/or type subdocs to have filtering ability by a type or user.
Store each user inside a document with the users dates and types where the inner objects can be indexed. The ammount of data is dependent on expected growth of the arrays per user. You may consider to keep from 1-n samples per document.

Now to better address your questions I would need:

The relationship between user/dates and types including how many of those can be viewed per user or per min.
The most common queries and the needed data to be returned by them (eg. Users and types per data or users per types etc.)

I recommend looking into the following blogs:
https://www.mongodb.com/article/mongodb-schema-design-best-practices

https://www.mongodb.com/article/schema-design-anti-pattern-summary

This is very good material to pick a good starting point for any design.

Best
Pavel

Til_Weimann · December 29, 2020, 1:56pm

Hi,
Thanks a lot for your answer, I will go a bit more into detail on how the database should work.

What I am trying to build is a structure that will help me automatically manage events based on two decisive factors.
The most common queries will be:
a) Every couple of minutes, look up if any events have expired by finding the event with the lowest date value and comparing it to the current date. If the event is expired, a function that processes user, date, type and some other values will be executed and the event will be removed from the datebase.
This check should be repeated either right after the processing has been done (only if the previous event expired, of course), or a couple of minutes later if the previous event hasnt expired yet.
Events should be added to the database all around the clock (probably mostly from every couple of minutes to every ten seconds, depending on user activity).
b) Every couple of minutes, for every type, find the event with the event with the lowest value and check if this value is below an external value. If yes, process the data in a function just like with query a) and remove the event from the database.
There should be up to around 50k possible event types, but the vast majority of them will probably never be used, so the database will most likely contain only a couple of thousands of types but a much larger amount of users.
c) Another, less important query will be to look up what events a specific user has, but this will only happen based on user-input and not be too frequent.

Events are created by users with a date (year, month, day, hour) within the next few months. Every user may have up to 10 events at a time, but the average number will be lower. Since the amount of users should get bigger than the amount of combionations between dates popular types, users will most likely share those two values, but not other variables that probably will be more or less unique to them when looking at the type-date combination.

While the bucket structure you proposed could help group dates into groups (e. g. by day), this structure wouldn’t be very useful when performing query b).
The other structure you proposed would, if I understand this correctly, rather help speed up things when performing query c), but since this one will be a rather unimportant and not very frequent query compared to a) and b), I dont know if this is the best choice.

Thanks,
Til

Pavel_Duchovny · December 29, 2020, 5:17pm

Hi @Til_Weimann,

Now looking at the details why not to hold a document for each event in an event collection. This collection can be segregated into partitions (generic names for collections based on a monthly or yearly basis like events_202012) each document will be for an event holding user info, type and expireDate. expireDate will be indexed so you could fetch any expire documents and pass their data to the functions.

As you can index your type data you will be able to do queries based on type filtering.

An example of a high level document:

{ "event_id" : ...,
 "Event_Name" : ...,
 "expireDate" : ...,
 "UserData": { userId : ...}
 "EventType" : { typeId : ....}
...
}

Now here you have the flexibility to hold a document per user and duplicate it for users sharing the event or turn users to arrays and host user ids in them.

Additionally , you can think on using a ttl index to eventually expire to object if you just need to delete it or use a remove command at the end of processing for each event.

You can use the trick of 0 seconds expire settings so you set the expire field with the current date only after processing the event for deletion and the ttl thread will expire it.

One last question is if you have the need to show a user profile and its associated evenrts as this might be a bit more complex with this design. In such case there might be benift in holding a users collection with events array (up to 10) for each user, and each event will hold its expireDate and type. But you will probably still need to have an extended reference collection for extra event details to avoid overflowing the user profile doc.

Thanks
Pavel

system · January 3, 2021, 5:17pm

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.