Scalable group chat modelling and performance

Anshul_Negi · September 19, 2021, 12:23pm

I have created a group chat schema like this:

User : {
   group_id:String,
   name: String,
   user_id:String,
   last_seen:String
}

Message : {
    message:String,
    user_id:String,
    group_id:String,
    created_at:String
}

Group : {
    group_id:String,
    total_messages:String
}

I need to run a few queries when an individual user fetches his/her groups:

All groups the user belongs to, with unread counts, sorted by most recent first
A paginated list of chat messages

What I have done so far

The user sends his/her id, which is matched against the USER collection to get all groups of that user. The unread messages are then fetched from the separate MESSAGE collection using the last seen date. I don’t want to use any lookup or group operation in this query, as both incur overhead. Is there any way to redesign this schema?
I am maintaining a separate collection for the total messages in each group so that I don’t have to run a count operation in real time. To fetch the chat I have put an index on the date, so the query returns the latest messages according to limit and offset, and afterwards I apply JavaScript’s sort method. Is there a better way than this? Can I get the already formatted array from the db itself?

Pavel_Duchovny · September 20, 2021, 7:09am

Hi @Anshul_Negi ,

Welcome to the MongoDB community.

The schema you are showing is more likely to be relevant for relational databases, where you separate the data.

In MongoDB, data that is queried together should be stored together, and I will link some useful articles for you.

In your case I would use a different schema, embedding some data.

In my opinion, a user’s group ids should be embedded in the user object. You can decide whether these will be just ids or embedded objects.

users collection

{
Userid : ... ,
Name : ...,
Groups : [ groupId ],
GroupCount : ...,
Lastseen : ...
}

groups collection

{
GroupId : ... ,
Name :  ...,
TotalMessages : ... ,
LastModified : ... ,
Users : [{userId : ..., TotalUnread : ...}]
}

message collection

{
GroupId : ...,
Message : ...,
User : { userid : ..., Name : ... },
CreatedAt : ...
}

With this schema design, a user logs in and passes his userId, and his user data is loaded. Then the Groups array is passed as a $in to fetch all his groups from the groups collection, sorted by LastModified. You can also use the aggregation $filter operator to return only this user’s unread count.
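
A minimal sketch of that read path, assuming the field names from the schema above. The helper name buildGroupsPipeline is made up for illustration; it only builds the aggregation pipeline and does not talk to the database:

```javascript
// Hypothetical helper: builds the pipeline described above.
// Field names (GroupId, LastModified, Users, userId) follow the schema sketch.
function buildGroupsPipeline(userId, groupIds) {
  return [
    // Fetch only the groups listed in the user's Groups array.
    { $match: { GroupId: { $in: groupIds } } },
    // Most recently modified groups first.
    { $sort: { LastModified: -1 } },
    // Keep only this user's element of the embedded Users array.
    {
      $project: {
        GroupId: 1,
        Name: 1,
        LastModified: 1,
        Users: {
          $filter: {
            input: "$Users",
            as: "u",
            cond: { $eq: ["$$u.userId", userId] },
          },
        },
      },
    },
  ];
}
```

In the shell this would be used roughly as db.groups.aggregate(buildGroupsPipeline(user.Userid, user.Groups)), where user is the document loaded at login.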

Whenever a message is written into the message collection with the relevant user and group id, you should $inc the unread total in the array elements of the other users in that group, and also update the group’s LastModified.

When a user enters a group, you set the unread count for his specific id to 0.
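
The two writes could look roughly like this. It is a sketch under the schema above; buildNewMessageUpdate and buildMarkReadUpdate are hypothetical helper names that only build the arguments for an update, and the "other" identifier in the array filter is an arbitrary choice:

```javascript
// Hypothetical helper: update to run on the groups collection
// after a message has been inserted into the message collection.
function buildNewMessageUpdate(groupId, senderId, now) {
  return {
    filter: { GroupId: groupId },
    update: {
      // Bump the unread counter of every member matched by the array filter.
      $inc: { "Users.$[other].TotalUnread": 1 },
      $set: { LastModified: now },
    },
    // Match all array elements except the sender's own.
    options: { arrayFilters: [{ "other.userId": { $ne: senderId } }] },
  };
}

// Hypothetical helper: reset one user's counter when he opens the group.
function buildMarkReadUpdate(groupId, userId) {
  return {
    // The array field must appear in the filter for the positional $ to work.
    filter: { GroupId: groupId, "Users.userId": userId },
    update: { $set: { "Users.$.TotalUnread": 0 } },
  };
}
```

Each result would be passed to something like db.groups.updateOne(spec.filter, spec.update, spec.options). Both are single-document updates, so the counter and LastModified change together.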

Please let me know if that makes sense.

Thanks
Pavel

Anshul_Negi · September 20, 2021, 7:48am

According to my scenario

A user can join a practically unlimited number of groups, around 10k, so I have made a separate collection.
Providing the user info with all groups on login and then applying $in on such a large dataset: is that workable? Won’t it create a performance issue?

If limiting the group size solves the problem then it’s ok, but I don’t know the performance impact of a $in query on a large set, or to what extent the group size should be limited.

Pavel_Duchovny · September 20, 2021, 7:51am

OK, with those numbers the schema should be rethought.

How many users are expected to have thousands of groups?

You can possibly keep a group count for each user. If the group count is more than, let’s say, 500, you should index the groups collection on Users.userId and query the groups from there for that specific user.

See the outlier pattern.
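
As a rough sketch of that switch, assuming the user document carries GroupCount and the groups collection has an index on the embedded member id (the helper name buildGroupsFilter and the constant are made up for illustration):

```javascript
// The "let's say 500" cut-off from above; tune it to your data.
const OUTLIER_THRESHOLD = 500;

// Hypothetical helper: picks the query filter for the groups collection.
function buildGroupsFilter(user) {
  if (user.GroupCount > OUTLIER_THRESHOLD) {
    // Outlier user: query by membership instead of sending a huge $in list.
    // Assumes an index such as db.groups.createIndex({ "Users.userId": 1 }).
    return { "Users.userId": user.Userid };
  }
  // Typical user: the embedded Groups array is small enough for $in.
  return { GroupId: { $in: user.Groups } };
}
```

Either filter can then be combined with a sort on LastModified and a limit, so only one page of groups is read at a time.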

Thanks
Pavel
