Need help structuring my database to optimise scaling

Hey. I’ve been developing my app for a couple of months now and I’ve just recently been accepted to the App Store and google play store. Hooray!

However, for the last couple of days I’ve been worried about scaling and how my app will preform under load, with potentially (and hopefully) hundreds and thousands of users.

Rather than shrug it off and “worry about it later”, I’d like to address it now while there’s not as much data in my database.

Essentially I have one quick question in particular, but if anyone could give me some other tips on how to optimise my database for scaling in the future, that would be great too.

My question is this. Right now my app has a group messaging feature. Each group is characterised by its _id that mongo automatically generates when a group is created. Every group is stored in a single collection named “groups”, and every message is stored in a single collection called messages.

In node I have an endpoint that gets all the messages from a given group._id, the way it works is in the individual message’s body, I have a “groupId:” field that contains the group._id that the message was sent from.

However I’m wondering if rather than filtering potentially millions of messages and finding the ones with the given groupId, should I instead create a new collection every time a group is created? Would it be faster to have thousands of collections and instantly return those, without any filter delay. Or filter between millions based on a given groupId in a single collection housing all the messages ever sent in the app. Is there any downside to having too many collections ?
Hope I explained clearly.

Thanks guys !

Hi @Ethan_H ,

Great that you asked this question. One of the most important consideration when designing your schema is to avoid a known antipatterns.

One of the anti-patterns is “too-many-collections”. Having a collection per group will result as you said in thousnds of collections which means 1000 of files that the database software needs to manage.

If you index the group_id as a leading field (if the index is compound) whithin a single collection a search for a specific index should not be that bad compared to full scanning its own collection.

What you can consider though and I saw it in other types of messaging application is to partition the data based on time or regions and keep a monthly or a yearly collection for the messages. Another solution s maybe to delete with TTL or archive with Atlas online archiving the uneeded messages over time to optimize the size of the collections.

I suggest to read the following blogs:

Best regards,
Pavel

2 Likes

Thanks for the help!
I’ll be sure to read up on those blogs :slight_smile:

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.