Many to many relationship and linked table/collection

Hi there,

I’m new to NoSQL databases and MongoDB and have some troubles understanding some core schema-design principles, despite the very well structured MongoDB’s docs and similar answers all around the i-net.

In SQL world, there are Linked tables, mostly used as many-to-many references between other tables, in MongoDB, I didn’t find any doc entry or tutorial equivalent to them. The only many-to-many solution I found is to link each connected tables to each other via DBRefs:
So, for example if I have Students and Courses, I will do(using mongoose):

const StudentSchema = new Schema({
  name: String,
  courses: [{ type: mongoose.Schema.Types.ObjectId, ref: 'Course' }]
});

const CoursesSchema = new Schema({
  name: String,
  studends: [{ type: mongoose.Schema.Types.ObjectId, ref: 'Student' }]
});

And for me it’s a very bad schema-design, because it’s not scalable for the reason the array of students and courses can grow very large(student-courses was just an example). So from scalability perspective, it’s bad.

Are there any alternatives to that?
What if I’ll use a linked tables (collection) to connect between the two, how should I implement it on the application level: should I first make a call to the linked collection, await it and only then perform another query? Can I do it in one go(make one request to the DB server), because awaiting for the first response on the application level can take time?

Hopefully, you could assist with the above concerns of mine.
Thanks

1 Like

Hello @John_D1, welcome to the MongoDB Community forum!

I have some information for you and hope it is useful. I will assume there are a few thousand students and few hundred courses, and I could model something like this:

student:

{
  _id: <number>,
 name: : <string>,
 otherDetails: { ... },
 courses: [
      { courseId:  <number>, courseName: <string> },
      { courseId:  <number>, courseName: <string> },
      ...
  ]
} 

course:

{
  _id: <number>,
  name: <string>,
  description: <string>,
  otherDetails: { ... }
}

Now, your application has some queries. To start with some queries I can think about is, get all students in a particular course and get all courses for a particular student. These are simple queries.

To get all courses for a specific student, the query would be:

db.students.find( { name: "John Doe" }, { courses: 1, name: 1 } )

To get all students enrolled for a specific course, your query can be like this:

db.students.find( { "courses.courseName": "Database Design" } )

Note that the data related to the course is duplicated - the course name and id are present in both collections. You can also note that these details will not change often (or may not change at all). This is feasible, though you have the same data in both collections. And, this also makes querying simpler.

Some important aspects that influence database design are the amount of data in each entity, the size of each document, the relationships between entities, the application functionality (note that all this is part of an app, and the app has various functions, like displaying certain data in browser window, for example), and the other queries you write for the application (the CRUD operations).

The “linked table” is not needed in the MongoDB data design - which allows denormalized data through embedding or referencing. Based upon the kind of queries you write you can have the course referenced within the student or vice-versa.

In the above student entity, the courses field is an array of { courseId: <number>, courseName: <string> } embedded documents. You can also include more course details in this embedded document, or just the courseId field. If you have just the courseId in the courses array field, then you will have to use an Aggregation Query with $lookup to get more details about the courses.

3 Likes

Thanks @Prasad_Saya for such a detailed and quick reply.

The thing is, Courses and Students was just an example and the point was to properly address the issue of many-to-many relationship between two(or more) collections in large, scalable systems.

Just imagine the amount of courses per student, but also students per courses can be very, very high. Holding such a huge data in each document will lead inevitably to serious performance issues.

In SQL world, this is pretty common scenario and you can make a linked table and query whatever you need in one go.

MongoDB made for scale, as far as I read, I honestly can’t believe the very common scenario isn’t address properly yet. Or perhaps I’m mistaken? Please prove me wrong.

1 Like

Hello @John_D1, you need to provide some realistic data and application to discuss what and how this can be designed. Real data and real applications have many attributes and relationships and requirements - that is where the data modeling as a process can help.

@Prasad_Saya, say you have Products and Customers. There are millions of Products and there’s millions of Customers.

@John_D1, it is a very broad and vague description of a problem. What is it you are doing (or want to do) with those products and customers? What is the nature of their attributes?

Designing, again, requires a lot of information; thorough investigation of various aspects. I think you can start with information I had already posted, and build on that. All kinds of “large amounts” have some solution - but they are not on their own - lot of “other factors” influence the design in such cases.

1 Like

A post was split to a new topic: Is there any method for modifications like “OnCascade” in SQL?

@John_D1 Have you found any solution for this issue?

The document increased dramatically. Any other way to create pivit table and access the data in a single query?

Let me give an example.
Let’s say there are social media posts and let there be users.
Now each user can like tons of posts and each post can be liked by thousands of users.
So how am I supposed to model this many to many relationship in mongodb ?

5 Likes

In my opinion, for instance, in a scenario with lots of students and courses, it’s more efficient to include the course ID in the student’s data rather than including all student IDs in the course data. This way, considering that a student might have fewer courses compared to the number of students in a course, it’s more practical to link them by including the course ID in the student’s information.

maybe this article will help you

Although Prasad_Saya’s solution works but it might have issues if the embedded data grows.
What I would suggest is to use the Subset pattern where you store only a subset of the documents as embeds and the rest in its own collection, let me explain:
Imagine we have a Posts collection and a Likes collection. There can be millions of posts and millions of likes - In our Posts collection, we’ll keep a subset of the likes — let’s say the most recent 100 likes — to keep things snappy. Meanwhile, all the likes, including the older ones, will reside comfortably in our Likes collection.

Here’s a glimpse of how our documents might look:

posts:

// Posts Collection
{
  _id: <post_id>,
  content: <string>,
  author: <user_id>,
  likes: [
    // Keep only the most recent likes here (Latest 100 likes)
    { user_id: <user_id>, timestamp: <date> },
    { user_id: <user_id>, timestamp: <date> },
      ...
  ]
}

likes:

// Likes Collection
{
  _id: <like_id>,
  post_id: <post_id>,
  user_id: <user_id>,
  timestamp: <date>,
  // Store all likes here, including the older ones
}

We keep the most frequently accessed likes directly within the post documents, reducing the working set and improving performance. Meanwhile, the older likes peacefully reside in their own cozy corner, ready to be fetched whenever needed.

Here is an article I have written on medium explaining this: How to model data for many to many relationship in MongoDB | by Ashique Desai | Apr, 2024 | Medium

Additionally a link to MongoDB blog explaining the subset pattern:

You can search for and refer to the subset pattern or read the below article: How to model data for many to many relationship in MongoDB | by Ashique Desai | Apr, 2024 | Medium