Searching for documents by an array of _ids vs. by an indexed value inside those documents

Hi, I’m fairly new to MongoDB, and I’m building a small chat project. I’m trying to work out the best way to retrieve a user’s message history.

I will save each message with the _id of its sender and receiver.

In the first scenario, I will save the _id of the message in an array in both the sender and receiver documents. When I fetch the messages, I will use that array to look them up.

In the second scenario, I will not save the message _id in the sender and receiver documents. When it comes time to fetch the messages, I will search for them by the sender or receiver _id.

Which do you think will be faster on a large dataset? For a small one, like the one I’m building, I suppose it doesn’t really matter, but say the messages collection grows to a couple of million documents.

I should add that senderId and receiverId will both be indexed as well.
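For concreteness, here is a minimal sketch of the two access patterns written as plain Python dicts in MongoDB query syntax, the same shape you would hand to a driver’s `find()` or `update_one()`. The `users`/`messages` collection split and the `messageIds` array field are hypothetical names chosen for illustration; `senderId` and `receiverId` are the indexed fields from the question. No database connection is needed to run it.

```python
from typing import Any

Doc = dict[str, Any]


def scenario1_on_send(message_id: Any, sender_id: Any, receiver_id: Any) -> list[tuple[Doc, Doc]]:
    """Scenario 1 write: after inserting a message, push its _id into the
    hypothetical `messageIds` array of both the sender and receiver user docs.
    Returns (filter, update) pairs, one per users.update_one() call."""
    update = {"$push": {"messageIds": message_id}}
    return [({"_id": sender_id}, update), ({"_id": receiver_id}, update)]


def scenario1_history_filter(user_doc: Doc) -> Doc:
    """Scenario 1 read: the user document must be fetched first; its array
    then drives an _id lookup in the messages collection."""
    return {"_id": {"$in": user_doc.get("messageIds", [])}}


def scenario2_history_filter(user_id: Any) -> Doc:
    """Scenario 2 read: query messages directly on the indexed fields.
    With one index per field, each $or clause can be served by its own index."""
    return {"$or": [{"senderId": user_id}, {"receiverId": user_id}]}


# Index key specifications for scenario 2 (one single-field index per field).
SCENARIO2_INDEXES = [[("senderId", 1)], [("receiverId", 1)]]
```

With PyMongo these would be used roughly as `db.users.update_one(f, u)` for each pair, `db.users.find_one({"_id": uid})` followed by `db.messages.find(scenario1_history_filter(user))` for scenario 1, `db.messages.find(scenario2_history_filter(uid))` for scenario 2, and `db.messages.create_index(keys)` for each entry in `SCENARIO2_INDEXES`.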

I know there are better ways, but these are the only two I can think of at the moment.


With the first scenario, you may end up with the Massive Array anti-pattern for a very popular sender or receiver.
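To put a rough number on the Massive Array concern: BSON stores an array as an embedded document whose keys are the decimal strings "0", "1", "2", and so on, so each ObjectId element costs 1 type byte, the key plus a NUL terminator, and 12 bytes of ObjectId. The sketch below is a back-of-the-envelope estimate, not a measurement: it counts how many ObjectIds fit in a single array before the 16 MiB BSON document limit, and it ignores the rest of the user document (other fields, the array’s own field name), so the real ceiling is somewhat lower.

```python
MAX_BSON_DOC_BYTES = 16 * 1024 * 1024  # MongoDB's maximum BSON document size (16 MiB)
OBJECTID_BYTES = 12


def element_size(index: int) -> int:
    """Bytes for one array element: type byte (0x07) + key "index" as a C string + ObjectId."""
    return 1 + len(str(index)) + 1 + OBJECTID_BYTES


def objectid_array_bson_size(n: int) -> int:
    """Encoded size in bytes of a BSON array holding n ObjectIds."""
    size = 4 + 1  # int32 length prefix + trailing NUL of the embedded document
    for i in range(n):
        size += element_size(i)
    return size


def max_objectids_in_array(limit: int = MAX_BSON_DOC_BYTES) -> int:
    """Largest n whose array encoding alone still fits within `limit` bytes."""
    size, n = 5, 0
    while size + element_size(n) <= limit:
        size += element_size(n)
        n += 1
    return n
```

Under these assumptions `max_objectids_in_array()` returns 844,416, so a popular user’s array would hit the hard limit somewhere below that, and well before then every read or update of that user document drags the whole array along with it.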

In a large dataset, the first scenario, where the _id of the message is saved in an array in both the sender and receiver documents, is likely to be faster. This is because when fetching messages, the array can be used to quickly find the relevant messages without the need for an index lookup. In the second scenario, a search needs to be performed on the sender or receiver _id, which would require an index lookup and could be slower.
It’s worth noting that there may be other ways to structure the data for even faster retrieval, such as using a separate collection for messages or using a combination of indexing and query optimization. However, for the two scenarios proposed in the question, the first one is likely to be faster in a large dataset.

Hope this was helpful.

@Deepak_Kumar16, your answer seems straight out of what ChatGPT would produce.

Why do you quote the whole original post?

How does storing the _id of the message in one collection make fetching the message from another collection faster? You get the _id fast, but not the message. Contrary to what you wrote, you still need an index lookup in the messages collection to get the message.

“Such as using a separate collection for messages”? Really? If you only store the _id of the messages in the sender and receiver documents, where do you store the messages if not in a separate collection?

How? Please be specific.

Can you point to some documentation that substantiates what you wrote?

In both scenarios there is a lookup: you either look up the messages in the messages collection by the _ids stored in the receiver/sender array, or you look up the messages in the messages collection by the receiver/sender _id.
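A toy, in-memory model makes this concrete. Plain dicts stand in for the users and messages collections and for the secondary indexes on senderId and receiverId; this is only an illustration of the access paths, not how MongoDB is implemented, and `messageIds` is a hypothetical field name. Scenario 1 reads the user document and then does one _id lookup per stored id in messages; scenario 2 goes straight to messages through the two indexes.

```python
from collections import defaultdict

# Hypothetical data: message _id -> message document.
messages = {
    1: {"_id": 1, "senderId": "a", "receiverId": "b", "text": "hi"},
    2: {"_id": 2, "senderId": "b", "receiverId": "a", "text": "hey"},
    3: {"_id": 3, "senderId": "c", "receiverId": "b", "text": "yo"},
}

# Scenario 1: each user document carries an array of message _ids.
users = {
    "a": {"_id": "a", "messageIds": [1, 2]},
    "b": {"_id": "b", "messageIds": [1, 2, 3]},
    "c": {"_id": "c", "messageIds": [3]},
}

# Scenario 2: stand-ins for secondary indexes (field value -> message _ids).
by_sender, by_receiver = defaultdict(list), defaultdict(list)
for m in messages.values():
    by_sender[m["senderId"]].append(m["_id"])
    by_receiver[m["receiverId"]].append(m["_id"])


def history_scenario1(user_id):
    user = users[user_id]  # read 1: the user document
    # read 2: one _id lookup in messages per stored id
    return [messages[mid] for mid in user["messageIds"]]


def history_scenario2(user_id):
    # one query on messages, mimicking a $or whose clauses each use their own index
    ids = set(by_sender[user_id]) | set(by_receiver[user_id])
    return [messages[mid] for mid in sorted(ids)]
```

Both functions end with the same step, pulling message documents out of `messages`; scenario 1 only adds a read of the user document in front of it (plus the array maintenance on every send).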

Not really, because it is wrong and lacks details.