Suggestion for data modelling for big heavy read-write collection

Hi there,
I’m in the early stages of my journey so please feel free to fix my flows.

I have a sale tracker app which will have users, and each user can generate thousands of sale documents in a sales collection. I’ve read about data modelling options on the MongoDB blogs as much as I can understand and found a possible solution, but I’m not sure whether it’s applicable. I’m going to create a new sales collection for each user and store the name of the collection in each user’s document. I’ll generate the collection name as {userId}_sales and store it with the user. So whenever a user logs in, the application will look up that collection and query it. My app is like a cashier: users submit their sales and track their financial performance, so it’s heavy on both reads and writes. What do you think about this approach? Is there a better way you can suggest?

Generally, I don’t see any issue with this approach.

But since you haven’t explained your query patterns, we can’t tell whether this is the best option or not.

2 Likes

Since yesterday I’ve tried to implement this approach in my code, but it all ended in failure. Right now I’m not able to use the Node.js driver to create a collection whenever a user creates an account. The way I’ve learned it, I need the name of the collection up front to set up the MongoDB connection and route. It’s best if I learn to create a healthy connection before overthinking whether my database could slow down after thousands of updates :disappointed:
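For what it’s worth, the official `mongodb` Node.js driver doesn’t require a collection to exist before you reference it: `db.collection(name)` just returns a handle, and MongoDB creates the collection lazily on the first insert. Below is a minimal sketch under that assumption; the URI, database name, and `userId` are placeholders, not anything from your app:

```javascript
// Build the per-user collection name described in the original post.
function salesCollectionName(userId) {
  return `${userId}_sales`;
}

// With the official `mongodb` driver there is no separate "create collection"
// step needed — the collection appears on the first insert:
//
//   const { MongoClient } = require('mongodb');
//   const client = new MongoClient('mongodb://localhost:27017'); // placeholder URI
//   await client.connect();
//   const db = client.db('sale_tracker');                        // placeholder DB name
//   const sales = db.collection(salesCollectionName(userId));
//   await sales.insertOne({ amount: 42, createdAt: new Date() }); // collection now exists

console.log(salesCollectionName('64af00')); // e.g. "64af00_sales"
```

So the dynamic name itself shouldn’t block you; whether the per-user-collection design is a good idea is the separate question discussed below.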

Hey @Sezen_Cetin,

Welcome to the MongoDB Community Forums! :leaves:

Generally, when designing a schema in MongoDB, a good rule of thumb is that “things that are queried together should stay together”. Some other things to consider given your use case:

  • If there are a large number of users and you expect each of them to generate many sales documents, it might not be the best idea to create a separate collection per user, as it can result in a very large number of collections. This can lead to performance and management issues, such as increased resource consumption and difficulty in managing collections. You can read more on this here: Massive number of Collections
  • Since you mentioned it will be heavy on the read side too, if one has to frequently access sales data, it might be more efficient to store all sales data in a single collection and index on the user_id field to allow for fast lookups of a user’s sales data.
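To make the single-collection alternative concrete, here is a sketch: all sales live in one `sales` collection, distinguished by a `user_id` field, with an index supporting the per-user lookups. The field names, index shape, and date-range query are illustrative assumptions, not a prescribed schema:

```javascript
// Compound index on user_id + createdAt: supports "this user's sales,
// newest first" and date-range queries without an in-memory sort.
const salesIndexSpec = { user_id: 1, createdAt: -1 };

// Build a filter for one user's sales within a date range.
function userSalesFilter(userId, from, to) {
  return { user_id: userId, createdAt: { $gte: from, $lt: to } };
}

// Driver usage (needs a running MongoDB; shown for illustration only):
//   await db.collection('sales').createIndex(salesIndexSpec);
//   const docs = await db.collection('sales')
//     .find(userSalesFilter(userId, monthStart, monthEnd))
//     .sort({ createdAt: -1 })
//     .toArray();

console.log(userSalesFilter('u1', new Date('2023-01-01'), new Date('2023-02-01')));
```

With this shape, every user shares one collection and one set of indexes, which keeps the collection count constant as the user base grows.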

Hence, as you have noticed, it’s hard to say without more information whether the approach you described would be the best one for your use case. But one great thing about schema design in MongoDB is its flexible nature: you can evolve your schema over time with little or no downtime. It would be good to start by identifying your queries, and then based on those consider how you want to define your schema. I would suggest you experiment with different schema design ideas, simulate the expected workload, and see how each design behaves as the data grows. mgeneratejs is a great tool for creating random documents for testing purposes.
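As a starting point, an mgeneratejs template for fake sale documents might look something like the following (operator names are from my recollection of the mgeneratejs README, so please double-check them against the docs):

```json
{
  "user_id": "$objectid",
  "amount": { "$integer": { "min": 1, "max": 500 } },
  "createdAt": { "$date": { "min": "2022-01-01", "max": "2023-01-01" } }
}
```

You could then pipe the generated documents into your test deployment with something like `mgeneratejs sale_template.json -n 100000 | mongoimport -d sale_tracker -c sales`, and run your expected queries against the result.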

I am also attaching a MongoDB Blog that you can refer to MongoDB Schema Design Best Practices. You can also read about different patterns in MongoDB here: Building with Patterns. Also, if you’re new to data modelling in MongoDB, it might be worthwhile to check out our University Course: Introduction to Data Modelling

Please let us know if there are any more questions about this. Feel free to reach out for anything else as well.

Regards,
Satyam

2 Likes

Thanks a lot, the resources you referred to should address all the struggles I’m having.

1 Like

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.