Location and performance of linked fields

Dan_Burt · September 3, 2021, 3:25pm

I have documents stored in 2 related collections:

Club - a single club that can have multiple players
Player - can be a member of multiple clubs

The Player collection could look like:

{
    "_id": {
        "$oid": "609d0993906429612483cea0"
    },
    "name": "Lionel Messi",
    "clubs": [{
        "clubId": {
            "$oid": "6076030465508936f00e086c"
        },
        "name": "Paris St Germain FC",
        "nickname": "PSG",
        "logoPath": "psg.png"
    }, {
        "clubId": {
            "$oid": "612e7c1a154c9900ce44c252"
        },
        "name": "Argentina National Team",
        "nickname": "Argentina",
        "logoPath": "arg.png"
    }]
}

A Club could look like:

{
    "_id": {
        "$oid": "6076030465508936f00e086c"
    },
    "name": "Paris St Germain",
    "nickname": "PSG",
    "logo": "psg.png",
    "players": [{
        "$oid": "609d0993906429612483ceb0"
    }, {
        "$oid": "609d0993906429612483cea0"
    }],
    "homeClub": {
        "$oid": "607603312327b51b98000106"
    }
}

Queries could occur both ways: I might need to look up all the players at a club, or all the clubs a player is a member of.

On which side should I store the linked field? Is there a method to it? Is there a means of measuring performance, particularly as datasets get very large?

Asya_Kamsky · September 3, 2021, 4:34pm

While it depends somewhat on the queries you’ll be running, my inclination would be to store clubs in player records, but that’s because I’m thinking you’ll have a lot more players in each club than clubs for each player?

As long as things are indexed correctly, it might not matter much either way.

Asya
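To make the suggestion above concrete, here is a minimal sketch in Python. It is an in-memory illustration only, not MongoDB itself: the sample documents, the short string IDs, and the helper names (`build_multikey_index`, `clubs_of_player`, `players_at_club`) are all hypothetical. It assumes clubs are embedded in player documents and shows why an index on `clubs.clubId` lets the same collection answer the question in both directions.

```python
from collections import defaultdict

# Hypothetical sample data: each player embeds the clubs they belong to.
players = [
    {"_id": "p1", "name": "Player One",
     "clubs": [{"clubId": "c1", "name": "Club A"},
               {"clubId": "c2", "name": "Club B"}]},
    {"_id": "p2", "name": "Player Two",
     "clubs": [{"clubId": "c1", "name": "Club A"}]},
]


def build_multikey_index(docs):
    """Map each clubId to the _ids of players whose clubs array contains it.

    This imitates what an index on "clubs.clubId" provides: one index
    entry per array element.
    """
    index = defaultdict(list)
    for doc in docs:
        for club in doc["clubs"]:
            index[club["clubId"]].append(doc["_id"])
    return index


def clubs_of_player(docs, player_id):
    """Player -> clubs: read the embedded array from a single document."""
    for doc in docs:
        if doc["_id"] == player_id:
            return [club["clubId"] for club in doc["clubs"]]
    return []


def players_at_club(index, club_id):
    """Club -> players: one index lookup instead of scanning every player."""
    return index.get(club_id, [])


club_index = build_multikey_index(players)

# Against a real deployment, the equivalent pieces would be (PyMongo-style,
# assuming a collection handle named `players_collection`):
index_keys = [("clubs.clubId", 1)]    # players_collection.create_index(index_keys)
club_filter = {"clubs.clubId": "c1"}  # players_collection.find(club_filter)
```

With this layout, "all clubs for a player" is a single-document read, and "all players at a club" relies on the index rather than on a second array kept in the club document.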

Dan_Burt · September 3, 2021, 6:45pm

Thanks @Asya_Kamsky

Your thinking is similar to mine. But the usage may end up more like WhatsApp, where you are added to dozens, maybe even a three-digit number, of chats (or clubs) per player. Some will manage and prune the list over time. Others will just leave it.

Do you know of any good published videos or tutorials on the performance aspects of document design that also cover indexes? Introductory level, as opposed to guru! Forewarned is forearmed, and it’s all part of the learning…

Imad_Bouteraa · September 3, 2021, 7:37pm

Hello @Dan_Burt
have you tried

especially:

M320: Data Modeling
M201: MongoDB Performance

there are also good articles with “schema-design” tag

For example, Building with Patterns series

Dan_Burt · September 4, 2021, 8:33am

@Imad_Bouteraa - thanks for the links.

I have actually been through the M320 course, and similar materials.

I wasn’t sure if there were other tutorials or articles specifically related to this question of collection structure and performance, and an introduction to measuring it. Is it “Plans” or something similar?

Imad_Bouteraa · September 4, 2021, 1:18pm

Hello @Dan_Burt,
There is no definitive best schema. It all depends on how the data is queried and updated.
This article may help

For performance, you can use explain():

to get the winning plan for a query (without executing the query)
to execute the winning plan for a query and get metrics about the execution
to investigate every candidate plan

The query optimizer chooses the winning plan empirically: for example, the plan that returns the first 100 documents fastest is the winner. The winning plan is then cached.
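The three uses of explain() listed above correspond to its three verbosity modes: `queryPlanner`, `executionStats`, and `allPlansExecution`. As a rough sketch, the snippet below builds the `explain` database command as a plain Python dictionary and pulls the headline counters out of an `executionStats` result. The helper names, the `players` collection name, and the `"c1"` club ID are assumptions for illustration; sending the command requires PyMongo and a running server, which are not part of the snippet.

```python
VERBOSITY_MODES = ("queryPlanner", "executionStats", "allPlansExecution")


def build_explain_command(collection, query_filter, verbosity="queryPlanner"):
    """Build an `explain` database command that wraps a `find`."""
    if verbosity not in VERBOSITY_MODES:
        raise ValueError(f"unknown verbosity: {verbosity!r}")
    return {
        "explain": {"find": collection, "filter": query_filter},
        "verbosity": verbosity,
    }


def summarize_execution(explain_result):
    """Pull the headline counters out of an executionStats result."""
    stats = explain_result["executionStats"]
    return {
        "returned": stats["nReturned"],
        "keys_examined": stats["totalKeysExamined"],
        "docs_examined": stats["totalDocsExamined"],
        "millis": stats["executionTimeMillis"],
    }


# Hypothetical query: all players who belong to one club.
command = build_explain_command(
    "players", {"clubs.clubId": "c1"}, verbosity="executionStats"
)

# With PyMongo and a running server (both assumed), this could be sent as:
#   result = db.command(command)
#   print(summarize_execution(result))
```

As a rule of thumb, if `docs_examined` is far larger than `returned`, the query is probably not using a suitable index.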