Modeling collection(s) to store price evolution

Hi everyone, :slightly_smiling_face:

→ First (long) post in the MongoDB community. :tada:

I will briefly introduce myself :wave: :

I’m a 31-year-old French data analyst and engineer working mainly with SQL (BigQuery/Teradata at the moment). I’ve been working for 6 years now, and I didn’t specifically study “Data” before that.
I’m more and more enthusiastic about data-related technologies but, as you know, there is a lot to learn. To force myself into it (and shine at work), I’m trying to incrementally build an app/website that I would enjoy developing and that I could use for a portfolio.


What is this app about ?

Context :

Another (last?) thing about me: one of my hobbies is the trading card game Magic: The Gathering (MTG). One important aspect of this game is the secondary market, where A LOT of cards are bought and sold by professionals or people just like me, every day, hour, minute… (you get the idea…)


Objectives of the app :

  • Inform about MTG card price spikes
  • Predict MTG card prices (not sure I will be able to do that, maybe at a later stage, because Data Science seems difficult)
  • Display information with graphs and tables

Who is this app for ?

  • Personal use, portfolio. Maybe friends or even …
    MTG investors ? :telescope:

What problem does it solve ?

  • The European card market does not have an easy tool for that, mainly because cardmarket (the main European website for card sales) is quite restrictive and doesn’t make an API available to individuals.

How is it going to work ?

  • Scrape cardmarket’s website regularly (how often?) to extract prices/quantities and store them in a database, then transform the data and compute meaningful metrics (mean price at different intervals, quantity sold since the previous day/week/month, comparison with the trend of the entire set…)

  • MVP: the user just selects the cards they want to follow, and the app then displays tables and plots to help make decisions (Should I buy or sell this card? When?). I could also build an alerting system and/or predict prices.
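
To make the "mean price at different intervals" and "price spikes" ideas concrete, here is a minimal Python sketch using only the standard library. The function names, the `(timestamp_ms, price)` input shape, and the 20% threshold are assumptions for illustration, not part of any existing tool.

```python
from collections import defaultdict
from datetime import datetime, timezone
from statistics import mean


def daily_mean_prices(snapshots):
    """Group (timestamp_ms, price) pairs by UTC day and average each day."""
    by_day = defaultdict(list)
    for ts_ms, price in snapshots:
        day = datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc).date()
        by_day[day].append(price)
    # Sort by day so callers can compare consecutive days.
    return {day: mean(prices) for day, prices in sorted(by_day.items())}


def find_spikes(daily_means, threshold=0.20):
    """Return (day, relative_change) for days that rose more than `threshold`
    compared with the previous observed day. The threshold is an assumption."""
    days = list(daily_means)
    spikes = []
    for prev, cur in zip(days, days[1:]):
        if daily_means[prev] == 0:
            continue  # avoid dividing by zero on a free listing
        change = (daily_means[cur] - daily_means[prev]) / daily_means[prev]
        if change > threshold:
            spikes.append((cur, round(change, 3)))
    return spikes
```

The same grouping could later be pushed into the database, but starting in plain Python keeps the first version easy to test.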

Here is a draft schema :


How can you help me ?

  • Give me some insights about the model. (Is MongoDB a good choice?)
    I followed the webinar with @Michael_Lynn yesterday (really interesting for SQL users like me, by the way; thank you and your team if you see this mention) and I learned:

“Data that is accessed together should be stored together”

(So I feel like I would only need one big indexed collection :thinking: )
Storage structure example:
[
    {
        "timeStamp":1451649600511,
        "card":"Teferi's Protection",
        "sets": [
            {
            "set": "Double Masters 2022",
            "prices": [
                {"idPrice": 1, "price": 12},
                {"idPrice": 2, "price": 13},
                {"idPrice": 3, "price": 14}
            ],
            "meanPrice":13
            },
            {
            "set": "Commander 2017",
            "prices": [
                {"idPrice": 1, "price": 11},
                {"idPrice": 2, "price": 12},
                {"idPrice": 3, "price": 13}
            ],
            "meanPrice": 12
            }
        ]
    },
    {
        "timeStamp":1451649600512,
        "card":"Teferi's Protection",
        "sets": [
            {
            "set": "Double Masters 2022",
            "prices": [
                {"idPrice": 1, "price": 9},
                {"idPrice": 2, "price": 10},
                {"idPrice": 3, "price": 11}
            ],
            "meanPrice":10
            },
            {
            "set": "Commander 2017",
            "prices": [
                {"idPrice": 1, "price": 10},
                {"idPrice": 2, "price": 11},
                {"idPrice": 3, "price": 12}
            ],
            "meanPrice": 11
            }
        ]
    },
    ...
]
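
One alternative worth testing against the nested layout above is one flat document per observed price. Whether it is actually better depends on the queries the app ends up running. The sketch below is plain Python and only reshapes the data; `flatten_snapshot` is a hypothetical helper name.

```python
def flatten_snapshot(doc):
    """Turn one nested card document into one flat document per (set, price)."""
    rows = []
    for card_set in doc["sets"]:
        for offer in card_set["prices"]:
            rows.append({
                "timeStamp": doc["timeStamp"],
                "card": doc["card"],
                "set": card_set["set"],
                "idPrice": offer["idPrice"],
                "price": offer["price"],
            })
    return rows


# First document of the example above, reduced to what the helper needs.
nested = {
    "timeStamp": 1451649600511,
    "card": "Teferi's Protection",
    "sets": [
        {"set": "Double Masters 2022",
         "prices": [{"idPrice": 1, "price": 12},
                    {"idPrice": 2, "price": 13},
                    {"idPrice": 3, "price": 14}]},
        {"set": "Commander 2017",
         "prices": [{"idPrice": 1, "price": 11},
                    {"idPrice": 2, "price": 12},
                    {"idPrice": 3, "price": 13}]},
    ],
}
rows = flatten_snapshot(nested)
```

Flat documents stay small and leave room for extra fields per listing (condition, seller location, and so on), at the cost of repeating the card and set names in every document.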

This example shows the relative complexity of the subject.
Mainly : One card can be printed in X sets and I need to follow the prices for each set.
Also, a card for sale can have many attributes/filters (condition, seller location, seller reputation):

Filters screenshot


  • Help me on the global solution.

As I said, the data landscape is so big that it’s really difficult to choose a solution design, but here are some …

Prerequisites
  1. Free / Open source : Can’t afford to put the app in the cloud

  2. Adopted by the data community (remember I want to use this project for a portfolio)

  3. Scalable :chart_with_upwards_trend: (I will start with one card (with multiple sets) and one user (myself :smiley: ), but I would like to have the option to grow while remaining free, or at least really cheap)

  4. Elegant : Concerning the visualization, I need it to be pretty :nail_care: I think Panel + Plotly is a good option. But I don’t know if it covers all my needs.

  5. Quite simple AND with Python : I know that if I make it too hard for myself, I won’t manage to deliver a first version of the app, because I can’t spend too much time on it. (work, family, sport, friends, … you certainly have the same constraints :slight_smile: ) Also, I want to use Python as much as possible.

And here are some …

Difficulties
  • Also, for now, I did some tests locally on my Windows computer but what if I want to push it online for free or really cheap ? (less than 5 € a month for 10 users max. for example)

  • The web scraping part, especially because I know I will have to use proxies if I don’t want my IP blocked when I need to fetch a lot of cards quite often (and how often is one of my questions, as I explained above)… But at least I found this link for the basics :+1: : MongoDB Data Scraping & Storage Tutorial


I will edit this post if you need clarifications, or to update it as the project progresses.
Thank you if you made it that far, and thank you in advance for your help.

And sorry for the potential mistakes (a long technical post in English is quite hard for a French dude … :sweat_smile: )

Hey @Gregory_Desprez,

Welcome to the MongoDB Community Forums! :leaves:

You are correct that in general one should design their schema according to how the data will be used instead of how the data will be stored. Thus, it may be beneficial to work from the required queries first, making it as simple as possible, and let the schema design follow the query pattern.
I would suggest experimenting with multiple schema design ideas. You can use mgeneratejs to quickly create any number of sample documents, so each design can be tested easily.
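
If you would rather stay in Python for this step, a rough stand-in for generating test documents could look like the sketch below. It is not mgeneratejs and does not reproduce its template syntax; the field names follow the example earlier in the thread, and the price range, time step, and function name are assumptions.

```python
import random


def make_sample_docs(n, cards, sets, start_ms=1451649600511,
                     step_ms=3_600_000, seed=42):
    """Build n flat price documents with random card, set, and price.

    A fixed seed keeps the output reproducible between test runs.
    """
    rng = random.Random(seed)
    docs = []
    for i in range(n):
        docs.append({
            "timeStamp": start_ms + i * step_ms,  # one document per hour
            "card": rng.choice(cards),
            "set": rng.choice(sets),
            "price": round(rng.uniform(5, 20), 2),  # arbitrary price range
        })
    return docs


sample = make_sample_docs(
    5,
    cards=["Teferi's Protection"],
    sets=["Double Masters 2022", "Commander 2017"],
)
```

Generating a few thousand documents this way is usually enough to compare how two candidate schemas behave under the same queries.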

Given the prerequisites you mentioned, I believe you can use MongoDB Atlas to start building your app. You can use Charts for visualization, and the PyMongo driver to work with Python easily. For the pricing part, you can refer to Atlas Pricing.
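
As a minimal sketch of what working with PyMongo could look like, the code below builds an aggregation pipeline that averages prices per set for one card, and shows one way to run it. It assumes flat documents with `card`, `set`, `timeStamp`, and `price` fields; the database name `mtg`, the collection name `price_snapshots`, and both function names are hypothetical.

```python
def mean_price_per_set_pipeline(card):
    """Aggregation stages: filter one card, average its price per set."""
    return [
        {"$match": {"card": card}},
        {"$group": {"_id": "$set", "meanPrice": {"$avg": "$price"}}},
        {"$sort": {"_id": 1}},
    ]


def store_and_summarize(uri, docs, card):
    """Insert documents, then return the mean price per set for `card`.

    Needs the pymongo package and a reachable deployment (for example an
    Atlas connection string); `docs` must be a non-empty list.
    """
    from pymongo import MongoClient  # imported here so the pipeline helper has no dependency

    client = MongoClient(uri)
    coll = client["mtg"]["price_snapshots"]  # hypothetical names
    # Compound index matching the filter fields used most often.
    coll.create_index([("card", 1), ("set", 1), ("timeStamp", 1)])
    coll.insert_many(docs)
    return list(coll.aggregate(mean_price_per_set_pipeline(card)))
```

Keeping the pipeline in its own function makes it easy to inspect or unit test without a running database.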

I would advise you to begin with your data modeling on a shared cluster (which is free, by the way). Once you have gained the necessary skills and feel the app is working as you expect, you can start to think about scalability, adoption, etc.

Additionally, since you mentioned you’re new to MongoDB, I am sharing some courses that you can do from our University Platform. They should really help you get started quickly.
Introduction to MongoDB
Data Modelling in MongoDB
MongoDB for SQL Pros

Please let us know if you have any additional questions. Feel free to reach out for anything else as well.

Regards,
Satyam

Hello @Satyam ,

Thanks for the answers/tips.
I will read and try all the webpages you linked.
For now, I’m struggling with the web scraping part: the page has a “Load more” button driven by JavaScript, which makes an API call to a different endpoint (not covered in the tutorial I linked above).
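
One common way to handle a “Load more” button is to call the underlying endpoint page by page until it returns nothing. The sketch below assumes the endpoint returns a list of items per page number, which may not match what cardmarket actually does; `fetch_page` is a hypothetical callable that would wrap the real HTTP request.

```python
def fetch_all_pages(fetch_page, max_pages=50):
    """Collect items from successive pages until an empty page comes back.

    `fetch_page(page_number)` must return a list of items; `max_pages`
    is a safety cap so a misbehaving endpoint cannot loop forever.
    """
    items = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(page)
        if not batch:
            break
        items.extend(batch)
    return items


# Stand-in for the real request, useful for testing the loop offline.
fake_pages = {1: ["offer-1", "offer-2"], 2: ["offer-3"]}
all_offers = fetch_all_pages(lambda page: fake_pages.get(page, []))
```

Passing the request function in as a parameter keeps the paging logic testable without hitting the site.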

Regards,
Grégory
