Advice for Chat schema design

Hi, in my application I’m developing I need to allow for chats between any 2 users. I have the following schema in mind for a Messages collection:

{
  _id: ObjectId(),
  userID1: "userA_ID",
  userID2: "userB_ID",
  sender: "one of the users",
  message: "some text",
  timestamp: time
}

userID1 will always be the first user to start the chat, I’ll save this for every Chat room consisting of 2 people. (EDIT: Should I use lexicographical ordering instead, with userID1 coming before userID2?)

I’ll create an index on the timestamp field so I can sort it in reverse, and then I can do the following to get the data every time the chat room is loaded (with pagination as you scroll up):

db.find({ userID1, userID2 }).sort({ "timestamp" : -1 })
  .skip(offset).limit(limit)

And this I think should give me my intended behavior.

  1. Does this make sense? Is there anything I’m missing out or have overlooked?
  2. Is this the correct practice, storing ALL user messages in one big Messages collection? Of course I’ll be implementing all necessary security/privacy protection but the fact still remains that all these messages are stored in one large collection. Is this a concern from a security/privacy/logical standpoint?

Appreciate any advice, thanks.

That particulate query will be better serve with a compound index on userID1,userID2,timestamp. See Performance Best Practices: Indexing | MongoDB Blog for the specifics.

1 Like

In your schema you might want also want to add a field “sent_by” or something similar, this way you could moderate / find messages by users with a search feature.

1 Like

Thanks for pointing this out, it’s a necessity actually. Otherwise I have no way of knowing who actually sent the message, oversight on my part.

I wonder if I should use lexicographical sorting to determine userID1 and userID2, with userID1 coming before userID2, instead of who initiates the chat first.

Could you merge userID1, userID2 and sender into just 2 fields?

sender rather than userID1
receiver rather than userID2

This would reduce the size of each document.

3 Likes

Thanks for the suggestion, that makes a lot of sense, then the accompanying query would be the following right?

db.find({ $or: [{ sender: userA, receiver: userB }, { sender: userB, receiver: userA }]}
  .sort({ timestamp: -1 }).skip(offset).limit(limit)

So I would then have to create a multikey index on the sender, receiver and timestamp fields?

1 Like

Another implementation you could do that would involve another collection is, when a new chat is created in a “Conversation” collection you can add the information:


{ id: 123
  participants: ['user1', 'user2'],
}

Then in the message instead of UserID1 and UserID2 you would just have “conversation_id”

{ sender: 'user1', 
  message: 'Hello World', 
  timestamp: time,
  converstationId: 123
}

This way if the members of the group change you only have to change it once in the “Conversation” collection and all messages referencing the ID will see the changes.

This would be similar to a One-to-Many with reference

4 Likes

That’s a nice approach! Thank you both for the insights and advice.

I was wondering (which was my second question), is storing a huge amount of random chat data in a collection good practice? I say random because in the Messages collection the ordering will be jumbled up when different users communicate at different times. Although it has no effect on the end users, is that acceptable practice?

A more traditional data structure would be perhaps to keep all this chat in an array within a document (but of course it’s subject to the 16MB BSON size limit), but logically this means there’s no way any messages are interleaved with other messages.

1 Like

In general I don’t believe that collections with a lot of documents is an issue. As long as your queries are indexed it shouldn’t be a problem.

Blockquote
A more traditional data structure would be perhaps to keep all this chat in an array within a document (but of course it’s subject to the 16MB BSON size limit), but logically this means there’s no way any messages are interleaved with other messages.

MongoDB does have a bucket design pattern. In which you store related items in an array.

{
    conversation_id: 12345,
    time: time,
    members: ['user1', 'user2'],
    messages: [
      {
         sender: 'user1', 
         message: 'Hello World', 
         timestamp: time
      },
      {
         sender: 'user1', 
         message: 'Hello World', 
         timestamp: time
      }],
   total_messages: 2
}

You could have a field called “total_messages” that is the sum of all messages and once it hits a certain number it creates a second bucket so you stay within the 16MB limit and don’t have the massive arrays anti pattern.
Although this may be more complicated than is required.

6 Likes

Thank you for the links and explanation. That’s a nice approach as well, but yes it comes with a massive array regardless, and queries become a little more complex.

What we’re doing now without using the bucket design pattern is essentially merging all of these potential arrays into one collection. I think I shall be going ahead with what’s been discussed so far with the sender/receiver method.

If you are working on a chat application that is great, I think you should have the feature to add more users.

Hi thanks for this… but i need to know how to get all the results by sorted


db.chats.aggregate(
    {
        /**
         * query: The query in MQL.
         */
        $match: {
          "members": {
            "$all": [
                ObjectId('63ad8631e0d6cfe452b80677'),
                ObjectId('63adadf287e482e6f128ec0e'),  
            ]
          }
        },
    },
    {
        /**
         * Provide any number of field/order pairs.
         */
        $sort: {
          "created_at": -1
        }
    },
    {
        /**
         * specifications: The fields to
         *   include or exclude.
         */
        $project: {
            "_id": 0,
            // "members": 1,
            "result": {
                $sortArray: {input: "$message", sortBy: {created_at: -1}}
            }
        }
    },
)

I have used this… but thing is every things are in result block and I want it to in one array so I can fetch previous messages with $limit and $skip so I can send back to API of user message…
here is my sample data

[
  {
    "result": [
      {
        "uuid": "df359053-5b69-4fc3-ac42-57aaf70ec9d9",
        "sender": {
          "$oid": "63ad8631e0d6cfe452b80677"
        },
        "message": "00011",
        "created_at": {
          "$date": "2023-01-10T19:52:26Z"
        }
      },
      {
        "uuid": "0cfe04ab-ccf2-4eb9-8760-d3471f033ea6",
        "sender": {
          "$oid": "63ad8631e0d6cfe452b80677"
        },
        "message": "00010",
        "created_at": {
          "$date": "2023-01-10T19:52:15Z"
        }
      }
    ]
  },
  {
    "result": [
      {
        "uuid": "fe54bf97-5dad-4d3f-ada8-ea1f1ab31039",
        "sender": {
          "$oid": "63ad8631e0d6cfe452b80677"
        },
        "message": "00010",
        "created_at": {
          "$date": "2023-01-10T19:52:14Z"
        }
      },
      {
        "uuid": "9face81d-b989-4662-a090-2f8f498b1a7b",
        "sender": {
          "$oid": "63ad8631e0d6cfe452b80677"
        },
        "message": "00010",
        "created_at": {
          "$date": "2023-01-10T19:52:12Z"
        }
      },
      {
        "uuid": "8578ce16-f3ac-4ceb-b779-0c1f3e8fec2f",
        "sender": {
          "$oid": "63ad8631e0d6cfe452b80677"
        },
        "message": "00010",
        "created_at": {
          "$date": "2023-01-10T19:51:02Z"
        }
      }
    ]
  },
  {
    "result": [
      {
        "uuid": "ef6f12e0-b601-4377-8ff9-28921799985b",
        "sender": {
          "$oid": "63ad8631e0d6cfe452b80677"
        },
        "message": "00009",
        "created_at": {
          "$date": "2023-01-10T19:51:00Z"
        }
      },
      {
        "uuid": "6fcbcc3b-ff09-4e1c-9ce5-d73790911592",
        "sender": {
          "$oid": "63ad8631e0d6cfe452b80677"
        },
        "message": "00008",
        "created_at": {
          "$date": "2023-01-10T19:50:47Z"
        }
      },
      {
        "uuid": "fdde26cd-6688-466a-be61-37ac8e312a70",
        "sender": {
          "$oid": "63ad8631e0d6cfe452b80677"
        },
        "message": "00007",
        "created_at": {
          "$date": "2023-01-10T19:49:33Z"
        }
      }
    ]
  },
  {
    "result": [
      {
        "uuid": "fdde26cd-6688-466a-be61-37ac8e312a70",
        "sender": {
          "$oid": "63ad8631e0d6cfe452b80677"
        },
        "message": "00007",
        "created_at": {
          "$date": "2023-01-10T19:49:33Z"
        }
      },
      {
        "uuid": "fb72bef2-ceff-4e22-80cd-b54da6ba6f01",
        "sender": {
          "$oid": "63ad8631e0d6cfe452b80677"
        },
        "message": "00006",
        "created_at": {
          "$date": "2023-01-10T19:48:51Z"
        }
      },
      {
        "uuid": "3ffee1e6-8eb4-4a93-99a2-329576dc7754",
        "sender": {
          "$oid": "63ad8631e0d6cfe452b80677"
        },
        "message": "00005",
        "created_at": {
          "$date": "2023-01-10T19:48:34Z"
        }
      },
      {
        "uuid": "927726bd-acc9-461b-98db-9ce07cca5ff0",
        "sender": {
          "$oid": "63ad8631e0d6cfe452b80677"
        },
        "message": "00004",
        "created_at": {
          "$date": "2023-01-10T19:48:31Z"
        }
      }
    ]
  },
  {
    "result": [
      {
        "uuid": "927726bd-acc9-461b-98db-9ce07cca5ff0",
        "sender": {
          "$oid": "63ad8631e0d6cfe452b80677"
        },
        "message": "00004",
        "created_at": {
          "$date": "2023-01-10T19:48:31Z"
        }
      },
      {
        "uuid": "664cb8fb-0284-4ad2-af1f-d9458fb09c24",
        "sender": {
          "$oid": "63ad8631e0d6cfe452b80677"
        },
        "message": "00003",
        "created_at": {
          "$date": "2023-01-10T19:48:25Z"
        }
      },
      {
        "uuid": "460e8a5a-327f-47c9-a8c3-76772cca4cc7",
        "sender": {
          "$oid": "63ad8631e0d6cfe452b80677"
        },
        "message": "00002",
        "created_at": {
          "$date": "2023-01-10T19:22:03Z"
        }
      },
      {
        "uuid": "ed4e5060-45f9-4f4a-a6a0-e45eddbd64bc",
        "sender": {
          "$oid": "63ad8631e0d6cfe452b80677"
        },
        "message": "00002",
        "created_at": {
          "$date": "2023-01-10T19:22:03Z"
        }
      }
    ]
  },
  {
    "result": [
      {
        "uuid": "a45e6738-afb4-47aa-a8f4-fdf1d70eeb5f",
        "sender": {
          "$oid": "63ad8631e0d6cfe452b80677"
        },
        "message": "00002",
        "created_at": {
          "$date": "2023-01-10T19:22:03Z"
        }
      },
      {
        "uuid": "466b8569-12c4-4d06-8572-eb5999f0e4ae",
        "sender": {
          "$oid": "63ad8631e0d6cfe452b80677"
        },
        "message": "00002",
        "created_at": {
          "$date": "2023-01-10T19:22:03Z"
        }
      },
      {
        "uuid": "314b60eb-1eaf-4a72-a7a6-528ab474a0d9",
        "sender": {
          "$oid": "63ad8631e0d6cfe452b80677"
        },
        "message": "00002",
        "created_at": {
          "$date": "2023-01-10T19:22:03Z"
        }
      },
      {
        "uuid": "460e8a5a-327f-47c9-a8c3-76772cca4cc7",
        "sender": {
          "$oid": "63ad8631e0d6cfe452b80677"
        },
        "message": "00002",
        "created_at": {
          "$date": "2023-01-10T19:22:03Z"
        }
      }
    ]
  },
  {
    "result": [
      {
        "uuid": "a45e6738-afb4-47aa-a8f4-fdf1d70eeb5f",
        "sender": {
          "$oid": "63ad8631e0d6cfe452b80677"
        },
        "message": "00002",
        "created_at": {
          "$date": "2023-01-10T19:22:03Z"
        }
      },
      {
        "uuid": "4f56f58d-1b0c-48f9-b3b6-c63020068760",
        "sender": {
          "$oid": "63ad8631e0d6cfe452b80677"
        },
        "message": "00002",
        "created_at": {
          "$date": "2023-01-10T19:21:55Z"
        }
      },
      {
        "uuid": "4fb86042-93ef-4209-88ca-8c287f16a2f6",
        "sender": {
          "$oid": "63ad8631e0d6cfe452b80677"
        },
        "message": "00001",
        "created_at": {
          "$date": "2023-01-10T19:20:30Z"
        }
      },
      {
        "uuid": "3a6dc43d-bdb5-41b9-be93-15ab389380a9",
        "sender": {
          "$oid": "63ad8631e0d6cfe452b80677"
        },
        "message": "New Docu ",
        "created_at": {
          "$date": "2023-01-10T18:44:01Z"
        }
      }
    ]
  },
  {
    "result": [
      {
        "uuid": "69b1dd00-8ecf-4ba3-8df4-2cbba3887505",
        "sender": {
          "$oid": "63adadf287e482e6f128ec0e"
        },
        "message": "88888888888888888888",
        "created_at": {
          "$date": "2023-01-10T16:01:22Z"
        }
      },
      {
        "uuid": "3ea03321-3c99-48b6-a7fd-2cc5fa514fb3",
        "sender": {
          "$oid": "63ad8631e0d6cfe452b80677"
        },
        "message": "0000000000000",
        "created_at": {
          "$date": "2023-01-10T15:51:20Z"
        }
      },
      {
        "uuid": "c8e35012-1106-4e3f-b678-6040002d54b8",
        "sender": {
          "$oid": "63ad8631e0d6cfe452b80677"
        },
        "message": "pppppp",
        "created_at": {
          "$date": "2023-01-10T15:40:24Z"
        }
      }
    ]
  },
  {
    "result": [
      {
        "uuid": "b78f1288-5363-4b0f-9bbe-ddf73f60fac1",
        "sender": {
          "$oid": "63ad8631e0d6cfe452b80677"
        },
        "message": "Kio nn sass,,,",
        "created_at": {
          "$date": "2023-01-10T18:46:39Z"
        }
      },
      {
        "uuid": "94e3ae3a-eb2c-4ceb-b580-e97f4ec49aba",
        "sender": {
          "$oid": "63ad8631e0d6cfe452b80677"
        },
        "message": "Kio nn sass,,,",
        "created_at": {
          "$date": "2023-01-10T18:44:57Z"
        }
      },
      {
        "uuid": "80c86174-9836-477b-b0d3-e4cb968b015a",
        "sender": {
          "$oid": "63ad8631e0d6cfe452b80677"
        },
        "message": "Another messgee 2",
        "created_at": {
          "$date": "2023-01-10T18:44:45Z"
        }
      },
      {
        "uuid": "3a6dc43d-bdb5-41b9-be93-15ab389380a9",
        "sender": {
          "$oid": "63ad8631e0d6cfe452b80677"
        },
        "message": "New Docu ",
        "created_at": {
          "$date": "2023-01-10T18:44:01Z"
        }
      },
      {
        "uuid": "69b1dd00-8ecf-4ba3-8df4-2cbba3887505",
        "sender": {
          "$oid": "63adadf287e482e6f128ec0e"
        },
        "message": "i am fine thanks",
        "created_at": {
          "$date": "2023-01-10T16:01:22Z"
        }
      },
      {
        "uuid": "3ea03321-3c99-48b6-a7fd-2cc5fa514fb3",
        "sender": {
          "$oid": "63ad8631e0d6cfe452b80677"
        },
        "message": "Buddy hhh.. ",
        "created_at": {
          "$date": "2023-01-10T15:51:20Z"
        }
      },
      {
        "uuid": "c8e35012-1106-4e3f-b678-6040002d54b8",
        "sender": {
          "$oid": "63ad8631e0d6cfe452b80677"
        },
        "message": "LOL How are you ",
        "created_at": {
          "$date": "2023-01-10T15:36:24Z"
        }
      }
    ]
  }
]

I have made 3 messages collection document after 3 new document is created… i will make it 1000 in production…
and I have use another thing called $concatArray but its adding only fields which I explicitly give example

// MongoDB Playground
// Use Ctrl+Space inside a snippet or a string literal to trigger completions.

const database = 'test3';

// The current database to use.
use(database);

// db.chats.find({
//   "members": {
//     "$eq": [
//       ObjectId('63ad8631e0d6cfe452b80677'),
//       ObjectId('63adadf287e482e6f128ec0e')
//       ]
//     }
//   }
// )

// db.chats.aggregate(
// {
//   $match: {
//   "members": {
//     "$all": [
//       ObjectId('63ad8631e0d6cfe452b80677'),
//       // ObjectId('63adadf287e482e6f128ec0e'),
//       ObjectId('63b3c2b8ce51cf1fca0325ad')
//       ]
//     },
//   } 
// },
// {
//   $sort: {
//     "created_at": -1
//   }
// },
// // {
// //   $project: {
// //     "total": 1,
// //     "_id": 0
// //   }
// // },


// {
//   $limit: 1
// },
// )
// .sort({"created_at": -1})

// db.chats.find(
//   {
//     "members": {
//       "$all": [
//       ObjectId('63ad8631e0d6cfe452b80677'),
//       ObjectId('63adadf287e482e6f128ec0e'),
//       // ObjectId('63b3c2b8ce51cf1fca0325ad')
//       ]
//     }
//   },
// )

// db.chats.find({
//   "members": {
//     "$eq": [
//       ObjectId('63ad8631e0d6cfe452b80677'),
//       ObjectId('63adadf287e482e6f128ec0e')
//       ]
//     }
//   }
// )


db.chats.aggregate(
    {
        /**
         * query: The query in MQL.
         */
        $match: {
          "members": {
            "$all": [
                ObjectId('63ad8631e0d6cfe452b80677'),
                ObjectId('63adadf287e482e6f128ec0e'),  
            ]
          }
        },
    },
    {
        /**
         * Provide any number of field/order pairs.
         */
        $sort: {
          "created_at": -1
        }
    },
    {
        /**
         * specifications: The fields to
         *   include or exclude.
         */
        $project: {
            "_id": 0,
            // "members": 1,
            "result": {
                $sortArray: {input: "$message", sortBy: {created_at: -1}},
            }
        }
    },
    {
        /**
         * specifications: The fields to
         *   include or exclude.
         */
        $project: {
          "messages": {
            '$concatArrays': [
                "$result", "$result"
                ]
            }
        }
    }
)

it will add two result to messages but not to all of that result
here my demo MongoDB collections [{ "_id": { "$oid": "63bd85a54ffc8dc686c0df5b" }, "message": [ - Pastebin.com

i solved this… but is it performance friendly


db.chats.aggregate(
    {
        /**
         * query: The query in MQL.
         */
        $match: {
          "members": {
            "$all": [
                ObjectId('63ad8631e0d6cfe452b80677'),
                ObjectId('63adadf287e482e6f128ec0e'),  
            ]
          }
        },
    },
    {
        /**
         * Provide any number of field/order pairs.
         */
        $sort: {
          "created_at": -1
        }
    },
    // {
    //     /**
    //      * specifications: The fields to
    //      *   include or exclude.
    //      */
    //     $project: {
    //         "_id": 0,
    //         // "members": 1,
    //         "result": {
    //             $sortArray: {input: "$message", sortBy: {created_at: -1}},
    //             // "$initialValue": {},
    //         }
    //     }
    // },
    {
        /**
         * path: Path to the array field.
         * includeArrayIndex: Optional name for index.
         * preserveNullAndEmptyArrays: Optional
         *   toggle to unwind null and empty values.
         */
        $unwind: {
          path: "$messages",
        //   includeArrayIndex: 'string',
        //   preserveNullAndEmptyArrays: boolean
        }
    },
    {
        /**
         * specifications: The fields to
         *   include or exclude.
         */
        $project: {
            "_id": 0,
            "messages": 1
            // "members": 1,
            // "result": {
            //     $sortArray: {input: "$message", sortBy: {created_at: -1}}
            // }
        }
    },
    {
        /**
         * Provide any number of field/order pairs.
         */
        $sort: {
          "messages.created_at": -1
        }
    },
    {
        /**
         * outputFieldN: The first output field.
         * stageN: The first aggregation stage.
         */
        $facet: {
          data: [ 
                {$skip: 0},
                {$limit: 2} 
            ],
            pagination: [
                {$count: "count"}
            ]
        }
    }
    // {
    //     /**
    //      * Provide any number of field/order pairs.
    //      */
    //     $sort: {
    //       "created_at": 1
    //     }
    // }
    // {
    //     $unwind: '$message',
    // }
    

    // {
    //     /**
    //      * specifications: The fields to
    //      *   include or exclude.
    //      */
    //     $project: {
    //         "result": {
    //             $sortArray: {input: "$message", sortBy: {created_at: -1}}
    //         }

    //     }
    // }
)

if you allow me then I’ll be the first beta tester of your chat schema design, which I used on my website page.

Of course, you can use it. and if you can please improve it. thank you

I appreciate you bringing this up because it is absolutely necessary. Otherwise, I would be unable to determine who actually delivered the message; this is a mistake on my part.