Data modeling question on an M:N relationship

Michael_Jay · May 19, 2020, 9:18am

I learned in the amazing Mongo University Data Modeling course that typically a many-to-many relationship is modeled as two collections, with an array of references to one collection in the other collection.

I’m facing a design problem, and I would like to get the opinion of some people with more experience than me. I need to make a good argument for my plan (laid out below) because someone else working on this project is suggesting a lookup table, essentially. He comes from a SQL background - so I feel in some ways that it’s one of those “if you’re a hammer, everything looks like a nail” scenarios. But I may also be wrong!

Consider two collections: one that stores Users and one that stores Clubs. A Club may have many Users (and must have at least one), and a User may belong to many Clubs.

Because a Club can assign permissions to its Users, and within the scope of the Club, those permissions will need to be easily queryable, I decided to store User references in Club documents.

However, it is also crucial that all of a User’s Clubs are easily queried. So after some consideration, I decided to take the hit on data redundancy and also store an array of Club references in User documents. Now, when I need all of a User’s Clubs, I query on the User ‘Clubs’ field, and when I need all of a Club’s Users, I query on the Club’s ‘Users’ field. From my (inexperienced) point of view, I’m trading the cost of each User’s array of Club _id’s (which I think are 8B/ObjectId) for improved querying.

I guess my partner wants to store { UserId, ClubId, Permissions } documents in a new join collection. It just feels icky and too “SQL-y” to me. But otherwise, I can’t think of a really good reason not to do it this way, other than the fact that now I’ve got to make two queries or a $lookup for either mentioned query - every time. I know that if I ever choose to shard either collection, that I won’t be able to use it as the joined collection in a lookup - so I’m weary to go that route.

Thanks for any help.

wan · June 18, 2020, 6:16am

Hi @Michael_Jay,

It’s been a while since you posted this question, have you found an answer yet ?

Generally data modelling is a broad topic to discuss, this is due to many factors that may affect the design decision. One major factor is the application requirements, knowing how the application is going to interact with the database. With MongoDB flexible schema characteristic, developers can focus on the application design and let the database design conform for the benefit of the application. See also Building With Patterns: Summary

Depending on the use case, if you’re storing user references as an array in a club document you may have a very large array. Also, updating users within those array may not be a straight forward process.

One alternative is to keep the reference of a club in users document. You can still utilise $lookup to query all users that belong to a club.

For example, if you have Users documents as below:

{"name":"Alice", clubs:["Foo"]}
{"name":"Bob", clubs:["Bar"]}
{"name":"Chuck", clubs:["Foo", "Bar"]}

Example of Clubs documents as below:

{"club_id":"Foo"}
{"club_id":"Bar"}

Example to find all users for club Foo :

db.clubs.aggregate([
   {"$match":{"club_id":"Foo"}}, 
   {"$lookup":{
       "from":"users", 
       "let": {"cid":"$club_id"}, 
       "pipeline":[
           {"$match":{"$expr":{"$in":["$$cid", "$clubs"]}}}], 
       "as":"members"}
    }
]);

You may also find Data Model Examples and Patterns a useful reference.

Regards,
Wan.