To denormalize the TPC-H benchmark dataset

I want to denormalize the TPC-H dataset, which has 8 relational tables and 22 relational queries, and migrate the relational database to MongoDB. How can I denormalize it using embedding and referencing in MongoDB?
The link to the official TPC-H document is given below:

The relational database tables are given below:
[Image: tpch-image]
Queries are given in the official document.

Hello @neha_bansal

I don’t think the question is clear enough yet for a specific recommendation. You can find useful background in the linked blog post on Transitioning from Relational Databases to MongoDB. Please also note the links at the bottom of that post and the referenced migration guide.

There is no single correct answer. Your schema depends heavily on how the data is accessed. One rule of thumb is that data that is accessed together should be stored together. Based on this, you will first need to evaluate your access patterns, define the most important queries, find the relationships, and decide which patterns to use. At this point we also need to take data durability and staleness into account, as well as the cost of maintaining duplicated data and indexes versus fast access.
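As a minimal sketch of that rule of thumb, assume the most important queries read an order together with its line items, while customer data is read separately. Under that assumption, LINEITEM rows could be embedded in their ORDERS document and the customer kept as a reference. The column names follow the TPC-H schema; the sample rows and the helper `build_order_documents` are hypothetical and only illustrate the shape of the result.

```python
from collections import defaultdict

# Hypothetical sample rows; column names follow the TPC-H schema.
orders = [
    {"o_orderkey": 1, "o_custkey": 10, "o_orderdate": "1996-01-02", "o_totalprice": 300.0},
    {"o_orderkey": 2, "o_custkey": 11, "o_orderdate": "1996-12-01", "o_totalprice": 50.0},
]
lineitems = [
    {"l_orderkey": 1, "l_linenumber": 2, "l_partkey": 101, "l_suppkey": 8,
     "l_quantity": 1, "l_extendedprice": 100.0},
    {"l_orderkey": 1, "l_linenumber": 1, "l_partkey": 100, "l_suppkey": 7,
     "l_quantity": 2, "l_extendedprice": 200.0},
    {"l_orderkey": 2, "l_linenumber": 1, "l_partkey": 102, "l_suppkey": 7,
     "l_quantity": 5, "l_extendedprice": 50.0},
]


def build_order_documents(orders, lineitems):
    """Embed line items in their order; keep the customer as a reference."""
    # Group line items by the order they belong to.
    items_by_order = defaultdict(list)
    for row in lineitems:
        # The parent document already carries the order key, so drop it here.
        item = {k: v for k, v in row.items() if k != "l_orderkey"}
        items_by_order[row["l_orderkey"]].append(item)

    documents = []
    for order in orders:
        key = order["o_orderkey"]
        documents.append({
            "_id": key,                      # order key reused as the document id
            "o_custkey": order["o_custkey"],  # reference to a separate customer document
            "o_orderdate": order["o_orderdate"],
            "o_totalprice": order["o_totalprice"],
            # Embedded one-to-many relationship, ordered by line number.
            "lineitems": sorted(items_by_order.get(key, []),
                                key=lambda item: item["l_linenumber"]),
        })
    return documents


order_docs = build_order_documents(orders, lineitems)
```

The resulting list of dictionaries could then be written with a driver call such as PyMongo's `insert_many`. Whether embedding is the right choice for every table depends on the access patterns of the queries you care about, so treat this as one possible shape rather than a recommendation.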

I highly recommend taking some of the great, free classes from MongoDB University, as well as reading some documents on this subject:

If you still feel unsure after going through the docs mentioned above, feel free to provide some sample data and describe what you want to achieve. I am pretty sure that we, as a community, will find an answer.

Regards,
Michael
