Looking for advice on designing schema for performing efficient queries

Hello, I am fairly new to MongoDB and I’m looking for advice on designing the schema before I commit to going down this route. I’m developing a collaborative documentation system, where the user creates a document and invites other users to collaborate, much like Google docs.

There are two collections. The first one stores documents and the second one stores lists of collaborators. When the user creates a new document, they assign a list of collaborators to this document. In the simplest form, the schema would look something like this

The Document schema contains some data but it also maintains a reference to a document in the Collaborators collection

   Document model
    {
       ....
      collaborators: ObjectId; // e.g. 0x507f1f77bcf86cd799439011
    } 

Collaborators collection contains documents that contain an array of roles for the collaborators.

Collaborators model 
{
     _id: 0x507f1f77bcf86cd799439011; // refererenced by Document model
    collaborators: [
        {userId: 1, role: "editor"},
        {userId: 2, role: "commenter}
   ]
}

I will have an API that fetches all those documents where the logged-in user’s userId is in the list of collaborators referenced by the document. Without much experience with writing efficient queries, I think a two-step lookup will work but it won’t be very efficient.

Step 1 → Find all the collaborators lists which contain userId, and obtain their _id field
Step 2 → Find all documents that have collaborators field containing one of the values found in Step 1

Is there a more efficient way to construct this query particularly if the users fetch this list frequently?

If I should redesign the schema in some way so that the lookup can be efficient, I’d like to know.

I realized using mongodb aggregation framework is what I needed. I was able to use $lookup and $match stage to achieve what I want. Still not sure how expensive this is given that $lookup will perform left join.

Here’s an example if anybody wants to look.

Hi @Vineet_Dixit1 ,

Is there a reason you don’t want to embed the collaborators into the main document model? It seems like a 1 to 1 relationship.

You can than query just one collection on the embedded list to find all data in one go without lookup.

In general both sides will have similar best practices where 2 main rules apply:

  1. Data that is accessed together should be stored together.
  2. Duplicating immutable data is fine even on the cost of storage for better query pattern coverage.

I would recommend reading:

  1. A Summary of Schema Design Anti-Patterns and How to Spot Them
  2. MongoDB Schema Design Best Practices
  3. Building with Patterns: A Summary | MongoDB Blog

Thanks
Pavel

Almost the same thing I am facing. MongoDB is getting complicated as much as I study it more and more.

  1. Embedding leading to unbound array threatening for MAX_SIZE. The only solution is an outlier pattern, and what if the outlier document starts MAX_SIZE.
  2. Referencing Documents leads to $lookup, and all schema guides create an impression that it is not a good practice.

It is such a sad thing!

I am looking my alternatives in Graph Database. Let’s see if they cover something for me.

Hi @Ali_Muqaddas ,

The beauty of outlier documents is that you can have many of them chained. So an outlier filled document will require to open an available new document if such does not yet exist.

Now reference can be done in many ways , it doesn’t mean you have to use $lookup. You can do 2 queries for example …

I suggest to read the embedding consideration antipatterns for better understanding.

Specifically focus your reading on massive arrays example and how to best reference it…

Mongodb is a great graph database and I personally know huge customers with massive graph data that were successful using MongoDB, the learning curve worth the outcome believe me…

Ty
Pavel