How to optimize a Charts aggregation?

Hello Charts experts,

a fairly simple chart takes quite long to render. When I explain the aggregation I see a collection scan that I cannot get rid of. None of my indexing approaches brought any change, and since I have to work with the pipeline Charts builds from the dragged fields, I feel very limited.
What do you suggest? There are fewer than 6000 documents and I can't get it below 11 seconds!
Regards,
Michael

{
  "stages": [
    {
      "$cursor": {
        "queryPlanner": {
          "plannerVersion": 1,
          "namespace": "list.list",
          "indexFilterSet": false,
          "parsedQuery": {},
          "queryHash": "732BF4BE",
          "planCacheKey": "732BF4BE",
          "winningPlan": {
            "stage": "PROJECTION_SIMPLE",
            "transformBy": {
              "__alias_0": 1,
              "createdAt": 1,
              "deviceType": 1,
              "_id": 0
            },
            "inputStage": {
              "stage": "COLLSCAN",
              "direction": "forward"
            }
          },
          "rejectedPlans": []
        },
        "executionStats": {
          "executionSuccess": true,
          "nReturned": 5775,
          "executionTimeMillis": 11341,
          "totalKeysExamined": 0,
          "totalDocsExamined": 5775,
          "executionStages": {
            "stage": "PROJECTION_SIMPLE",
            "nReturned": 5775,
            "executionTimeMillisEstimate": 10248,
            "works": 5777,
            "advanced": 5775,
            "needTime": 1,
            "needYield": 0,
            "saveState": 602,
            "restoreState": 602,
            "isEOF": 1,
            "transformBy": {
              "__alias_0": 1,
              "createdAt": 1,
              "deviceType": 1,
              "_id": 0
            },
            "inputStage": {
              "stage": "COLLSCAN",
              "nReturned": 5775,
              "executionTimeMillisEstimate": 10248,
              "works": 5777,
              "advanced": 5775,
              "needTime": 1,
              "needYield": 0,
              "saveState": 602,
              "restoreState": 602,
              "isEOF": 1,
              "direction": "forward",
              "docsExamined": 5775
            }
          }
        }
      },
      "nReturned": 5775,
      "executionTimeMillisEstimate": 11321
    },
    {
      "$addFields": {
        "createdAt": {
          "$cond": [
            {
              "$eq": [
                {
                  "$type": [
                    "$createdAt"
                  ]
                },
                {
                  "$const": "date"
                }
              ]
            },
            "$createdAt",
            {
              "$const": null
            }
          ]
        }
      },
      "nReturned": 5775,
      "executionTimeMillisEstimate": 11321
    },
    {
      "$addFields": {
        "__alias_0": {
          "year": {
            "$year": {
              "date": "$createdAt"
            }
          },
          "month": {
            "$subtract": [
              {
                "$month": {
                  "date": "$createdAt"
                }
              },
              {
                "$const": 1
              }
            ]
          }
        }
      },
      "nReturned": 5775,
      "executionTimeMillisEstimate": 11321
    },
    {
      "$group": {
        "_id": {
          "__alias_0": "$__alias_0",
          "__alias_1": "$deviceType"
        },
        "__alias_2": {
          "$sum": {
            "$const": 1
          }
        }
      },
      "nReturned": 45,
      "executionTimeMillisEstimate": 11329
    },
    {
      "$project": {
        "__alias_2": true,
        "__alias_0": "$_id.__alias_0",
        "__alias_1": "$_id.__alias_1",
        "_id": false
      },
      "nReturned": 45,
      "executionTimeMillisEstimate": 11329
    },
    {
      "$project": {
        "x": "$__alias_0",
        "y": "$__alias_2",
        "color": "$__alias_1",
        "_id": false
      },
      "nReturned": 45,
      "executionTimeMillisEstimate": 11329
    },
    {
      "$group": {
        "_id": {
          "x": "$x"
        },
        "__grouped_docs": {
          "$push": "$$ROOT"
        }
      },
      "nReturned": 42,
      "executionTimeMillisEstimate": 11329
    },
    {
      "$sort": {
        "sortKey": {
          "_id.x.year": 1,
          "_id.x.month": 1
        }
      },
      "nReturned": 42,
      "executionTimeMillisEstimate": 11339
    },
    {
      "$unwind": {
        "path": "$__grouped_docs"
      },
      "nReturned": 45,
      "executionTimeMillisEstimate": 11339
    },
    {
      "$replaceRoot": {
        "newRoot": "$__grouped_docs"
      },
      "nReturned": 45,
      "executionTimeMillisEstimate": 11339
    },
    {
      "$limit": 5000,
      "nReturned": 45,
      "executionTimeMillisEstimate": 11339
    }
  ],
   "ok": 1,
  "$clusterTime": {
    "clusterTime": {
      "$timestamp": {
        "t": 1653052072,
        "i": 1
      }
    },
    "signature": {
      "hash": {
        "$binary": {
          "base64": "wbTpff+wi+JIk2OTk+7x8AI=",
          "subType": "00"
        }
      },
      "keyId": 7078973919130026000
    }
  },
  "operationTime": {
    "$timestamp": {
      "t": 1653052072,
      "i": 1
    }
  }
}
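
For anyone who wants to reproduce the numbers: the Charts-generated pipeline can be replayed manually with executionStats, roughly like this (stages abbreviated from the plan above; namespace list.list taken from the explain):

// Replaying the Charts-generated pipeline (abbreviated) with executionStats
db.getSiblingDB("list").list.explain("executionStats").aggregate([
  { $addFields: { createdAt: { $cond: [
      { $eq: [ { $type: "$createdAt" }, "date" ] }, "$createdAt", null ] } } },
  { $addFields: { __alias_0: {
      year: { $year: "$createdAt" },
      month: { $subtract: [ { $month: "$createdAt" }, 1 ] } } } },
  { $group: { _id: { __alias_0: "$__alias_0", __alias_1: "$deviceType" },
              __alias_2: { $sum: 1 } } }
])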

Hi @michael_hoeller -

Collection scans are expected when rendering charts that do not have any filters. In an unfiltered chart, we need to consider every document, so the scan is inevitable. Indexes are helpful and recommended whenever you have filters or lookups.
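
For illustration (hypothetical, assuming the chart later gets a dashboard or query-bar filter on createdAt), an index like this is the kind of thing that pays off:

// Hypothetical index for a filtered chart; field names are taken from the
// explain output above (namespace list.list).
db.getSiblingDB("list").list.createIndex({ createdAt: 1, deviceType: 1 })

// A chart or dashboard filter on createdAt is translated into a leading stage
// roughly like:
//   { $match: { createdAt: { $gte: ISODate("2022-01-01") } } }
// which the planner can then satisfy with an IXSCAN instead of the COLLSCAN above.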

As to why it’s taking 11 seconds, I’m not sure; I’m not an expert on query stats or performance tuning. One thing to note is that Charts caches the query results for the period you specify, so people viewing the chart will get much better performance while the cached results are still fresh.

Tom

Hi @tomhollander
thanks for your answer. I was kind of expecting it, but I was hoping for some kind of secret sauce :slight_smile:

Concerning the fact that the collection scan takes 11 seconds for only 6000 docs, I’d like to dig deeper, since that does not sound right. Does anyone see something odd in the explain output?

Cheers,
Michael

Hi @michael_hoeller

11 secs for scanning 6000 docs seems excessive. Could you provide a bit more detail so we can try to reproduce this?

  • What’s the relevant output of db.collection.stats() regarding collection size and average document size? (see the snippet after this list)
  • Could you provide some example documents so the workload can be simulated, e.g. with mgeneratejs?
  • What’s your Atlas instance size? (e.g. M0, M10, etc.)
  • Could you provide more details about the chart you’re creating?
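
For the first two points, something along these lines should do (a sketch; the namespace list.list is taken from the explain output above, adjust to your own database and collection):

// Collection size, document count, and average document size in bytes
const stats = db.getSiblingDB("list").list.stats()
printjson({ size: stats.size, count: stats.count, avgObjSize: stats.avgObjSize })

// Example documents can be redacted copies of findOne() output, or generated
// from a template with mgeneratejs and loaded via mongoimport, for example:
//   mgeneratejs template.json -n 6000 | mongoimport --uri <uri> --collection list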

Best regards
Kevin


Thank you for following up @kevinadi
I will provide all the information right after MDBW22.

Hello @kevinadi
thanks again for your response. While gathering the data I realized that one (not needed) field in each document holds more than 5 MB of data. I created a view and that got rid of the problem.
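
A view along these lines does it (the field and view names below are placeholders, not the real ones):

// Illustrative only: exclude the oversized, unused field so Charts never
// has to read it. "largeUnusedField" and "list_for_charts" are placeholders.
db.getSiblingDB("list").createView(
  "list_for_charts",
  "list",
  [ { $project: { largeUnusedField: 0 } } ]
)

Then point the Charts data source at the view instead of the base collection.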
Thanks a lot!
Michael


This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.