Designing a database structure

Here is my current mongoDB database idea:

alliance info collection

"name": "guild1",
"id": abcd,
"totalExp": 10101001010101,
"tag": "GUILD_ONE"
"name": "guild2",
"id": abce,
"totalExp": 10101001010101,
"tag": "GUILD_TWO"

member collection

{"member_id": 1111,
"current_alliance": "guild1",
"alliances": {"guild1": {"11-05-2021": 100000,
                         "12-05-2021": 100000,
                         "13-05-2021": 100000,
                         "14-05-2021": 100000,
                         "15-05-2021": 100000,}
              {"guild2": {"06-05-2021": 100000,
                          "07-05-2021": 100000,
                          "08-05-2021": 100000,
                          "09-05-2021": 100000,
                          "10-05-2021": 100000}
{"member_id": 2222,
"current_alliance": "guild1",
"alliances": {"guild1": {"11-05-2021": 100000,
                         "12-05-2021": 100000,
                         "13-05-2021": 100000,
                         "14-05-2021": 100000,
                         "15-05-2021": 100000,}

-I am trying to store an unlimited number of days experience for EACH member.
-I thought they could each get their own document otherwise the alliance info documents would become massive over time.
-Members can join and leave different alliances, but I am trying to keep all their experience history for all their alliances they have been in.
-Language is python.
-Each alliance can have a total of 100 current members.
-I want to store member’s experience for an alliance even if they leave it.
-For each document in the alliance info collection their will be up to 100 documents in the member collection.

Queries will be per alliance, so if I was querying “guild1”, I would need to select 100 documents from the member collection.
If i had an index for “current_alliance” I thought I could use this query to select all members from one:

cursor = db["member"].find({'current_alliance': "guild1"})

Each day I will add the member’s daily experience for the current alliance they are in, so an example for updating “guild1”:

result = await db["member"].bulk_write([ UpdateOne( {"member_id":1111},{'$set': {'alliances.guild1.15-05-2021': '100000'}}),
                                         UpdateOne({"member_id":2222},{'$set': {'alliances.guild1.15-05-2021': '100000'}})
                                         #...for uo to 100 members...

-Both the queries above seem to work however i dont know how optimal bulk_write() or find() are?
-Is doing a bulk_write() for 100 documents in one collection OK?
-And also when I need to retrieve data I have to select all 100 member’s documents from the member collection before i can do anything (for one alliance).
-Does this model look OK?
Thanks very much!

Hi Co0kei, welcome to the community.

This seems an interesting project and it shows you are already considering a lot of things like storage, performance, usability and ease of development by asking directed questions.

Since this appears to be going in direction of a big data project I feel you need to step back from the implementation and take few moments to clearly define your use cases, requirements and assumptions first.

A few questions I would ask myself:

  • Who is going to use the system and how?
    For example, the system will be used by guild members to see their daily experience history (for last 3 years, otherwise yearly summary), guild masters will see guild experience summary and 10 highest scorers. etc
  • What data will system need to produce?
    For example: weekly summary of…
  • What is the most likely operation to be and most data intense?
  • List all operations that are envisaged to be needed (high level API)
    For example: add user experience, delete user, user change guild, add guild to alliance, etc
  • What are my assumptions?
    For example: 100 users will change guild daily, system will serve 10K users, itemised records above 3 years are not required for the user if not in active guild
  • Do your approx db sizing
  • Think of other non-functional requirement
    Including cost of running if data exceeds threshold, support, maintenance cycle
  • Think of potential bottlenecks

The more you capture even the informal way the better and you will find it easier to take decisions and design the system. (Tip: use Freeplane to capture your ideas, use to sketch your initial designs). Yes, it takes time but if you jump right into coding you will make mistakes that will be very costly to rectify, jeopardise or even forfeit your project.

Best of luck and if you do define your requirements in clear way I would be even interested to discuss decisions and help design.

Hey MaxOfLondon, thank you very much for your reply! I was very happy to see a reply and such a detailed one as well!

I have tried using but haven’t figured out how to “draw” a database design. I ended up thinking about the questions you suggested.

I have also been thinking a lot about the data model of the database and wonder whether the model I suggested would lead to good performance or if it is an inefficient design.

I hope the questions below help to illustrate the use of data and any further replies would be greatly appreciated! Thank you for your time.

Who is going to use the system?
-The system is used by anyone to view alliances experience data.
-The important part of the design though is to have EVERY members EVERY days experience value recorded (not just one monthly total of all days). So that a member can look back and see their exact experience from a particular day in the past.

What data will system need to produce?
-The system will only need to return experience values, and for one particular alliance at a time. So each READ operation from the database will be collecting members from the same alliance.

Examples of what people could be requesting:
-A Daily leaderboard per alliance - Needs to retrieve today’s data from all 100 member’s documents.
-A weekly leaderboard per alliance - Needs to retrieve last 7 day’s data from all 100 member’s documents.
-A monthly leaderboard per alliance - needs to retrieve last 30 days data from all 100 member’s documents.
-Past month leaderboards per alliance - E.g. say for January select all values that have keys “XX-01-2021” from all 100 member’s documents.
-A yearly leaderboard per alliance - E.g. for 2021 select all values that have keys “XX-XX-2021” from all 100 member’s documents
-And past year leaderboards per alliance - So for 2021, 2022, 2023…
-An individual member’s daily experience history (as long back as it goes BUT only for the alliance that the member is currently in) that shows their daily experience AND position for EVERY day they have data. For example say today they got 100,000 xp and placed number 1 out of 100 in their alliance on one day. Then yesterday they got 10,000 xp and placed number 80 out of 100.

What is the most likely operation to be and most data intense?
Most popular queries I predict:
-The individual member experience history. I think this would be used the most, however in most cases people would just look at THIS month’s experience and maybe last months. But less commonly people will wish to look through past months and go back many months.
-Maybe database design STORES ALL MEMBERS (from one alliance) last 30 or 60 days (two months) daily XP together in one additional document or the alliance info document to reduce reads/writes?.
-I think the individual member history is the most intensive as it not only shows their daily experience but also their daily POSITION (in their alliance) ranked by experience of other members.
-Then I think the daily experience of each member (daily leaderboard per alliance), weekly,- monthly and finally yearly.

List all operations that are envisaged to be needed
-Inserting a new alliance info document in one collection and then the corresponding 100 members into the member collection, when a new alliance is registered (starts getting experience tracked).
-Updating all member’s documents in the member collection with their daily experience.
-When a person requests an alliance’s data from the database, e.g. a monthly leaderboard, the member’s experience must be within say 10 minutes of their actual experience (say there is a “last_updated” value stored in the database). If not less than 10 mins I request the alliance’s members up to date data from a public API (which only shows the past 7 days daily experience but for all members though!) and update all 100 member’s daily experience values for that alliance.
-When a member changes alliance, the “current_alliance” value will change to their new alliance’s name, and in the member’s “alliances” field (in their document) the new alliance name will be added which will then store all the experience the member gets whilst in this alliance.
-When a member LEAVES an alliance their experience from that alliance is still stored however is it NOT used in ANY queries whatsoever. It is only used if that member were to re-join that alliance again. Then it would be used in daily, monthly leaderboards etc for that alliance.

What are my assumptions?
-Idea: some check to stop updating an alliance’s history if all the members have 0 experience for say the last month (the data for this alliance no longer saved as it is inactive).

Do your approx db sizing
-I hope to be able to implement a good data model so that the database can continue to grow in size without having performance issues in the future that would require drastic data modelling redesign.
-For database sizing: I hope to be able to store data for thousands of alliances which will make hundreds of thousands of member documents in the member collection (database would start small, but hopefully I can quickly increase number of alliances being tracked!)
Think of potential bottlenecks
-Current bottlenecks I see are updating the 100 member’s separate documents for an alliance (as im not sure how intensive a bulk_write() is. IF no data is requested to be viewed for a certain alliance throughout a day, then all the members in this alliance will have their daily experience for today updated just once at the end of the day.
-But if there is an alliance that is being requested from lots, the alliance must update the daily experience of all its members if it has not been updated in the past 10 mins (to keep experience up to date as members obtain it, and not be hours incorrect). Example: someone requests a daily leaderboard for an alliance but it was “last_updated” over 10 mins ago, so first the database is updated (all the member’s todays exp values are retrieved (from an API which provides all the members from one alliance) and written to database, then the data is read from the database and used to provide the daily leaderboard. Then if another person requests a monthly leaderboard for the SAME ALLIANCE just 1 minute later the data is just read again from the database without having any WRITE operations to be executed.

Hi @Co0kei, I was unavailable and only now managed to read your response. Let me digest and think about it a bit then we can discuss what solution might be appropriate in terms of db design. One observation though, storing all data for always is not really practicable and inevitably will lead to performance issues and might not even be possible unless you are financed adequately :wink:

Other though: in line with GDPR user can request data to be removed

Hi @Co0kei, nice progress, the model 2 seems better suited for reads and data transfer.
I was also thinking about tracking history of alliance membership, guild name changes, etc and got this initial draft - work in progress.

As it is late today I will have another go tomorrow - this might change drastically - but welcome your thoughts.
Take care.

Hi MaxOfLondon, thank you very much for your interest and even designing a draft!
Sorry if I didn’t explain correctly but an alliance or guild is the same thing (just a group of 100 members).
It’s an interesting draft, but I think there will have to be writes to many documents (when updating each member’s daily experience)?

What would you think of using model 2 which I wrote above where each alliance/guild has a document per month that has all the members in it. I think this would be beneficial as when reading the data, less documents would be read, as only 1 for 1 month or 2 for 2 months (instead of 100 for all the members). I am less inclined to model 1 now as I think reading a years worth of data could be unnecessary for the majority of queries.

Thank you for showing interest in this project! Take care!

Hi Co0kei, Thank you for the clarification, I think I now understand what you are aiming at. Yes, the second model seems reasonable in that case. I think the most common query will be individual wanting to know ranking within their guild so sorting result when retrieving could achieve that but I’d suggest that for longer time intervals of time (last: 7 day, 30 day, 365 day) you calculate it once a day using crontab and store in separate collection rather than calculate it from all the data on each read.
I wouldn’t worry about updating multiple collections or several updates at same time too much, just make model logical enough to get information you want read with least cost of query and data transfer.