Catalogue Design For Complex Pricing Model

In the current project at my workplace, we are designing a catalog module using MongoDB as database for e commerce site.

We have modelled all other aspects of catalogue except pricing part. It has complex requirement. Use case goes like this.

Say any product(P1) is sold by multiple vendors V1, V2, V3… across multiple districts D1,D2,D3 for given price PR1,PR2,PR3… respectively.

When a user logs in, we get district from user login token and based on user district, we pick the least price available in that district considering all given vendors in that district. D1-> ( (V1-PR1), (V2-PR2), (V3-PR3)… )

Here D1 = District1 , (V1 - PR1) is price PR1 for vendor V1 and so on. Assume V2 has the least price among all, so we pick PR2 for product P1 and show it to user.

Now the question is how do we model this.

Number of Districts is almost a constant. But number of vendors for given district is scalable i.e the number can grow huge. If we have at least 10000 Products, 1000 districts and 100000 vendors the number of documents we create to model each price goes insanely high 10000 * 1000 * 100000 = 1000000000000 i.e 1 peta byte documents. Querying this information or bulk update of prices during sales would be extremely challenging

I have gone through the link Antoine Girbal's Corner • Product Catalog Part 1 - Schema Design. However the pricing model described there is not that clear. Is there any other strategy to model this pricing information apart from doing naively. Thanks in advance


With MongoDB you need to see other people idea for schema design as a brainstorming opportunity but should always design the schema specifically for your needs as long as you avoid the known antipatterns.

Here we want to avoid , large arrays and too many collections.

Now according to your calculations every vendor is selling in every district and selling every product… Is that real life scenario? I would assume that this worst case will never happen in real life…

If for some reason this worse case scenario is possible , you can think about partitioning the data based on districts perhaps. So every group of districts might get its own collection (25 district per collection = 25000000000 docs which is large but might be maintainable). You can have a mapping collection of districts to collection name. This means you will have 50 collections. When you bulk update the product with Vandor/product/price indexed you can run it in parallel using bulk updates against each collection.

As I understand the vast majority of queries will be based on the district/product and sort by price.

Application login > Extract District > Get collection to work against > query District = X, Product = Y and Sort by Price. Pick the top and limit the result set…

Let me know if that makes sense to you.


1 Like

Hi Pavel
Sure, your thoughts added value to my design considerations

1 Like

This topic was automatically closed after 180 days. New replies are no longer allowed.