Collection of 2k docs ~2mb each very slow to query

I have a service that queries profile data for a given user. Right now there are only two users, myself and my brother, but the results are dreadfully slow. The profile data consists of two collections, performances and matches. I have indexes on both collections (listed below in the collection models) but I could be misusing them.

Performances are queried first, with an operation like this:

const player = { ... }
const performances = await mongo.collection('performances').find({ "player._id": player._id }).toArray()

This query takes about 2 seconds to return 330 results out of ~2,500 documents, each a few KB in size. After getting the array of performances I query matches with an operation like this:

const matches = await mongo.collection('matches').find({ matchId: { $in: performances.map(p => p.matchId) } }).toArray()

This query takes about 1 second per match returned, out of ~1,500 documents of roughly 2 MB each. As you can imagine, fetching a hundred or more matches takes several minutes.

Sorry if this is an overload, but the document models are listed below. I’d like to know if I can optimize the indexing or the data structure to improve performance for this scenario. I’ve tried piping the results to a stream rather than returning the array but did not see any performance increase.

// performances
// indexes: { player._id: 1 }
_id: ObjectId
matchId: string
player: {
  _id: ObjectId
  username: string
  clantag: string
}
stats: {
  kills: number
  deaths: number
  downs: number[] // max length 8
  gulagKills: number
  gulagDeaths: number
  revives: number
  contracts: number
  teamWipes: number
  lootCrates: number
  buyStations: number
  teamPlacement: number
  teamSurvivalTime: number
  xp: {
    score: number
    match: number
    bonus: number
    medal: number
    misc: number
    challenge: number
  }
}


// matches
// indexes: { matchId: 1 }
_id: ObjectId
matchId: string
mapId: string
modeId: string
utcSecStart: number
utcSecEnd: number
teams: []{ // average array length ~150
  name: string
  placement: number
  players: []{
    username: string
    clantag: string
    platform: string
    stats: {
      kills: number
      deaths: number
      score: number
      assists: number
      headshots: number
      executions: number
      damageDone: number
      damageTaken: number
      longestStreak: number
      timePlayed: number
      distanceTraveled: number
      percentTimeMoving: number
    }
  }
}
Thanks for looking!

Hello Dan Lindsey,

I have some suggestions.

1. Using Projection:

A projection used with a find query lets you restrict the result to only the fields you need. See the projection documentation for db.collection.find.

const cursor = performancesColl.find( { "player._id": player._id } ).project( { matchId: 1 } );

const matchIds = [];

while (await cursor.hasNext()) {
    const doc = await cursor.next();
    matchIds.push(doc.matchId);
}

const matches = await matchesColl.find( { matchId: { $in: matchIds } } )
                                 .project( { /* include only the fields you need in the application */ } )
                                 .toArray();

2. Aggregation Lookup:

Another way of building the query is to use an aggregation with a $lookup stage, which lets you query both collections together on a common field (a “join” operation). This gets the result in a single query.
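A minimal sketch of such a pipeline, assuming the collection and field names from your question; buildPlayerMatchPipeline is a hypothetical helper and the $project fields are just examples:

// Builds an aggregation pipeline that matches one player's performances,
// joins the corresponding match document on matchId, and keeps only the
// fields the application needs.
function buildPlayerMatchPipeline(playerId) {
  return [
    { $match: { "player._id": playerId } },
    {
      $lookup: {
        from: "matches",         // collection to join
        localField: "matchId",   // field in performances
        foreignField: "matchId", // field in matches
        as: "match",
      },
    },
    { $unwind: "$match" }, // one joined match per performance
    { $project: { matchId: 1, stats: 1, "match.mapId": 1, "match.modeId": 1 } },
  ];
}

// Usage (requires a connected MongoClient):
// const results = await mongo.collection("performances")
//   .aggregate(buildPlayerMatchPipeline(player._id))
//   .toArray();

With a $project (or $lookup sub-pipeline) that drops the large teams array, you avoid pulling the full ~2 MB match documents over the wire, which is likely where most of your time is going.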


3. Index Usage:

You can verify whether an index is being used by a query with the explain method. explain generates a query plan with details about index usage; with the "executionStats" verbosity you can also check that the winning plan uses an IXSCAN (rather than a COLLSCAN) and that totalDocsExamined is close to nReturned. For example, from the mongo shell, get a query plan for the find you are using on the performances collection:

db.performances.find( { "player._id": player._id } ).explain("executionStats")
