Get chats list MongoDB

I have 3 collections (Chats, Members, Users).
Chats can have an unlimited number of members, which is why I store chat members in a separate collection.

Sample data:

{
    "chats": [
        {
            "id": 9,
            "title": "Best Friends",
            "type": "group",
            "updated_at": 13752365
        }
    ],

    "members": [
        {
            "user_id": 1,
            "chat_id": 9
        }
    ],

    "users": [
        {
            "id": 1,
            "username": "Rick"
        }
    ]
}

For example, when I send a message in a chat, the updated_at field of that chat should be updated.
I want to get the most recently updated chats for user #1, “Rick”. I don’t want to use $lookup because of this restriction in its documentation:
“Performs a left outer join to an unsharded collection in the same database to filter in documents from the “joined” collection for processing.”

Instead, I get the chat ids from the members collection and then query the chats. This is slow and inefficient when the number of chats is large.

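// Step 1: collect the ids of every chat that user #1 belongs to.
const members = await Members.find({ user_id: 1 }).project({ chat_id: 1 }).toArray();
const chat_ids = members.map(item => item.chat_id); // for example: 2000 ids

// Step 2: fetch the 20 most recently updated chats among them.
const chats = await Chats.find({
  id: {
    $in: chat_ids
  }
}).sort({ updated_at: -1 }).limit(20).toArray();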

Is there any way to handle this scenario? I need fresh data and speed.

Hi @Chris_Martel ,

Your schema is very relational. The idea in MongoDB is to try to store data that is used together in the same document.

I would avoid the connecting members collection and store the members of a chat in the group document as an array.

Now, I know you want this to be unlimited, but more than 100 members will probably be an outlier, so use the outlier pattern.

This way, in most cases you query only one collection.
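
A minimal sketch of how the member list could be split under the outlier pattern, so that a typical chat fits in one document and only oversized chats spill into extra documents. The helper name buildChatDocuments, the default limit of 100, and the main / has_extras flags are illustrative assumptions, not a fixed MongoDB API.

```javascript
// Hypothetical helper: turns one chat plus its member ids into a main document
// and, only when needed, overflow ("extra") documents.
function buildChatDocuments(chat, memberIds, maxPerDoc = 100) {
  const main = {
    chat_id: chat.id,
    title: chat.title,
    type: chat.type,
    updated_at: chat.updated_at,
    members: memberIds.slice(0, maxPerDoc),
    main: true,
    // Flag tells readers of the main document that overflow documents exist.
    has_extras: memberIds.length > maxPerDoc,
  };

  const extras = [];
  for (let i = maxPerDoc; i < memberIds.length; i += maxPerDoc) {
    extras.push({
      chat_id: chat.id,
      members: memberIds.slice(i, i + maxPerDoc),
      main: false,
    });
  }
  return [main, ...extras];
}
```

With a limit of 100, a chat with 40 members yields a single document, while a chat with 250 members yields one main document and two extra documents.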

Thanks
Pavel

Thank you, it’s a very good pattern.

I have some other questions:
Does the outlier pattern impact performance (speed, RAM usage, …)?
Can we index the array field (members)?

// looking for chats with user#103
Chats.find({
  members: 103
});

Hi @Chris_Martel ,

Of course, you can index arrays for search.

If you store documents in arrays, the fields inside them can be indexed as well. The resulting index is a multikey index.
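
As a rough sketch, indexing and querying the members array could look like the following. It assumes `chats` is a Collection handle from the official Node.js driver; the two wrapper function names are made up for illustration.

```javascript
// Assumption: `chats` is a Collection handle from the official Node.js driver.
// An index on an array field is built as a multikey index automatically.
function ensureMembersIndex(chats) {
  return chats.createIndex({ members: 1 });
}

// Matches every chat document whose members array contains userId.
function findChatsForMember(chats, userId) {
  return chats.find({ members: userId });
}
```

Both functions return whatever the driver returns (a promise for createIndex, a cursor for find), so the caller decides when to await or iterate.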

Thanks
Pavel

Hi again,
I implemented the outlier pattern but ran into another problem.
As I said in the first question, when a message is sent the updated_at field should be updated so that I can get a user’s most recently updated chats.
Now a group has many connected documents because of the pattern. Updating all of them at once is very slow, and I think doing so is a bad idea.

"chats": [
        {
            "id": 1,
            "chat_id": 9,
            "title": "Best Friends",
            "members": [1, 2, 3, 4],
            "main": true,
            "has_extras": true,
            "updated_at": 13741551
        },
        {
            "id": 2,
            "chat_id": 9,
            "members": [5, 6, 7, 8],
            "main": false,
            "updated_at": 13741551
        },
        {
            "id": 3,
            "chat_id": 9,
            "members": [9, 10, 11, 12],
            "main": false,
            "updated_at": 13741551
        },

        // many chats...
]

Hi @Chris_Martel ,

I am not sure why you chose such a small number of users per document. I would go for something like 500 per document.

If you see performance problems related to this specific search, you can decrease the number of array elements. Having 5-10 elements does not make sense…

So for most chats you should have 1, maybe 2, documents in total.

Thanks
Pavel

That’s just an example.

A group with 2 million members:
(500 members per document)
2,000,000 / 500 = 4,000 documents.

Not sure what kind of app you have where a group has 2 million users…

How many such groups do you have?

In that case update the date in the main document only and query it together with the specific group.
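
A small sketch of that idea: when a message is sent, only the main document of the chat is touched, so the cost no longer depends on how many extra documents the group has. It assumes `chats` is a Node.js driver Collection handle and that documents carry the chat_id and main fields; the function name touchMainDocument is hypothetical.

```javascript
// Assumption: `chats` is a Collection handle from the official Node.js driver.
// Updates exactly one document per message, regardless of group size.
function touchMainDocument(chats, chatId, timestamp = Date.now()) {
  return chats.updateOne(
    { chat_id: chatId, main: true },
    { $set: { updated_at: timestamp } }
  );
}
```

The extra documents then no longer need an updated_at field at all, since sorting is done on main documents only.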

There’s a group with ~200,000 members and growing.

Your solution is similar to the members collection.

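// Step 1: find every document (main or extra) that lists user #5 as a member.
const memberDocs = await Chats.find({ members: 5 }).project({ chat_id: 1, main: 1 }).toArray();
const chat_ids = memberDocs.map(item => item.chat_id);

// Step 2: load only the main documents of those chats, newest first.
const chats = await Chats.find({
  chat_id: {
    $in: chat_ids
  },
  main: true
}).sort({ updated_at: -1 }).limit(20).toArray();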

I wish I could do this with only one query.

@Chris_Martel ,

Perhaps those very large groups should get their own collection.

Why not store the _id of each group’s main document in each extra document? This way you only look up the single main document and do not scan through all chats…

Also, does the chats collection hold the message data as well, or is it just a group repository?

It’s better to have a groups collection and keep the messages in another collection with a group_id…

I hope there is no limit for the number of collections that we can create.


Why not store the _id of each group’s main document in each extra document? This way you only look up the single main document and do not scan through all chats…

Do you mean this?

"chats": [
  {
    "id": 3,
    "chat_id": 9, // the same as _id, i use id and chat_id for simplicity here
    "members": [9, 10, 11, 12],
    "main": false,
  },
]

Otherwise I’m confused. Could you please show me a pseudocode example with sample chat data?


Also, does the chats collection hold the message data as well, or is it just a group repository?

The messages are stored in another collection.

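It’s not the same as _id: for chat_id: 9 you have multiple documents, while the main document is just one document. So querying by _id will be faster.

The pseudocode is:

// x is the user id; each document is assumed to store its main document's _id in main_id.
const mainIds = (await chats.find({ members: x }).project({ main_id: 1 }).toArray()).map(doc => doc.main_id);

const groups = await chats.find({ _id: { $in: mainIds } }).sort({ updated_at: -1 }).limit(20).toArray();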

Now you should probably index:

  1. {members : 1, main_id : 1}
  2. {_id : 1, updated_at : -1}
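
To make the two steps concrete, here is an in-memory sketch with sample data that runs without a database. The second chat ("Work", chat_id 12), the function name latestChatsForUser, and the choice to let a main document store its own _id in main_id are all assumptions made for illustration.

```javascript
// Sample data: every document points at its group's main document via main_id.
// Assumption: the main document stores its own _id in main_id.
const sampleChats = [
  { _id: 1, chat_id: 9, title: "Best Friends", members: [1, 2, 3, 4], main: true, main_id: 1, updated_at: 13741551 },
  { _id: 2, chat_id: 9, members: [5, 6, 7, 8], main: false, main_id: 1 },
  { _id: 3, chat_id: 12, title: "Work", members: [5, 9], main: true, main_id: 3, updated_at: 13752365 },
];

function latestChatsForUser(docs, userId, limit = 20) {
  // Step 1: mirrors find({ members: userId }) projected to main_id.
  const mainIds = new Set(
    docs.filter(doc => doc.members.includes(userId)).map(doc => doc.main_id)
  );
  // Step 2: mirrors find({ _id: { $in: mainIds } }).sort({ updated_at: -1 }).limit(limit).
  return docs
    .filter(doc => mainIds.has(doc._id))
    .sort((a, b) => b.updated_at - a.updated_at)
    .slice(0, limit);
}

// User 5 is listed in an extra document of chat 9 and in the main document of chat 12.
console.log(latestChatsForUser(sampleChats, 5).map(doc => doc.title));
// prints [ 'Work', 'Best Friends' ]
```

Step 2 only ever touches main documents, so the sort runs over one document per group no matter how many extra documents a large group has.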

This is my idea

Thanks
Pavel

Thanks for your help.