Slow query fixing

Hello,

We have some slow queries being reported on our PSS production environment which is based on a Community version of MongoDB 4.4 . In the past I was able to send the metrics file to Mongo support but I have not been able to find where I can do that anymore.

Does anybody know about this or could somebody help us in solving the issue?

Thank you in advance!

Best regards,
Bogdan Pelin

Hi @Bogdan_Pelin and welcome in the MongoDB Community,

Where is your cluster running and which kind of support contract do you have?
If you are Entreprise Advanced I think it’s here: https://support.mongodb.com/.
If you are on Atlas, you can find the support in the bottom right corner (chat) or in the top right corner (Get Help) and the response time depends on the type of support plan you have in Atlas.

That being said, in the community forum we can also help a bit if we have enough information. :wink:

Can you share a sample document maybe and the faulty query along with the current indexes present on this collection? This may not solve the problem but at least it’s a start :muscle:.

If you could add some more information like an explain plan and the DB sizing & sizing of the machine, it could help determine a possible cause.

Cheers,
Maxime.

Hello,

Our cluster is running on our premises (our datacenter). I have reached to Atlas but they redirected me here as it was on premises deployment.

We have 3 servers with a PSS deployment. The servers have 125GB RAM .

The line in the log for the slow query:

"c":"COMMAND",  "id":51803,   "ctx":"conn16","msg":"Slow query","attr":{"type":"command","ns":"MtxEventDatabase.$cmd","command":{"update":"EventCo
llection20220401","ordered":false,"$db":"MtxEventDatabase","lsid":{"id":{"$uuid":"63ae02fc-c755-4dd0-8bfe-12c3559b602c"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1649934625,"i":6}},"signature":{
"hash":{"$binary":{"base64":"mj2uqCmNycKfMhwdvAtPgYeX+mA=","subType":"0"}},"keyId":7023367372685705220}},"writeConcern":{"wtimeout":1}},"numYields":0,"reslen":508997,"locks":{"ParallelBatchWriterMode":{"
acquireCount":{"r":9997}},"ReplicationStateTransition":{"acquireCount":{"w":9997}},"Global":{"acquireCount":{"w":9997}},"Database":{"acquireCount":{"w":9997}},"Collection":{"acquireCount":{"w":9997}},"Mu
tex":{"acquireCount":{"r":9997}}},"flowControl":{"acquireCount":9997,"timeAcquiringMicros":3120},"writeConcern":{"w":1,"wtimeout":1},"storage":{"data":{"bytesRead":6409105,"timeReadingMicros":7106}},"pro
tocol":"op_msg","durationMillis":1036}}

I have attached the metrics file for the primary member of the cluster.
Below you can find one document from the collection:

prod-cluster:PRIMARY> db.EventCollection20220401.findOne()
{
        "_id" : "H560:1:52:183071606",
        "WalletOwnerId" : NumberLong("73183493945073446"),
        "CMdc" : "<1,3,1,-24,5213,3>[]<1,1,0,1167,5213,3>[{<1,4,1,1296,5213,3>[500,(12:String),(0:)],<1,4,1,1296,5213,3>[520,(6:String),(6:String)]}]<1,11,0,626,5213,3>[{28,1}]<1,28,0,208,5213,3>[0:2:5:302886,(13:NumberLong),0:2:5:303239,(19:NumberLong),0,0:2:5:302887,0:2:5:302886,(13:NumberLong),,,,,,2022-04-01T03:00:01.041472+03:00,,(19:H560:1:52:183071606),,,,,4,,,0:2:5:302885,(36:String),1,1]",
        "DeleteCode" : 4,
        "EventTime" : ISODate("2022-04-01T00:00:01.041Z"),
        "EventType" : 28,
        "UserId" : NumberLong("73183493945073445")
}

Indexes for the collection:

prod-cluster:PRIMARY> db.EventCollection20220401.getIndexes()
[
        {
                "v" : 2,
                "key" : {
                        "_id" : 1
                },
                "name" : "_id_",
                "ns" : "MtxEventDatabase.EventCollection20220401"
        },
        {
                "v" : 2,
                "key" : {
                        "WalletOwnerId" : 1,
                        "EventTime" : 1
                },
                "name" : "MtxWalletOwnerId_1_EventTime_1",
                "ns" : "MtxEventDatabase.EventCollection20220401"
        },
        {
                "v" : 2,
                "key" : {
                        "EventTime" : 1
                },
                "name" : "MtxEventTime_1",
                "ns" : "MtxEventDatabase.EventCollection20220401"
        },
        {
                "v" : 2,
                "key" : {
                        "DeleteCode" : 1,
                        "EventTime" : 1
                },
                "name" : "DeleteCode_1_EventTime_1",
                "ns" : "MtxEventDatabase.EventCollection20220401"
        },
        {
                "v" : 2,
                "key" : {
                        "UserId" : 1
                },
                "name" : "UserId_1",
                "ns" : "MtxEventDatabase.EventCollection20220401"
        }
]

Please let me know if you need anything else.

Thank you in advance!
Best regards,
Bogdan Pelin

Unfortunately I am unable to upload the metrics file to this thread.

It would be really easier if you could format the code using the Markdown notation that we can use in this forum.

```json
{
  "msg": "This is code!"
}
```

Which renders like this:

{
  "msg": "This is code!"
}

Regarding your slow query, the log you shared is a command wrapping 9997 update events. What’s the actual query that is triggering these updates? What filter are you using which could benefit from an index?

Cheers,
Maxime.

Here is the error from the logs:

{
  "t": {
    "$date": "2022-04-18T14:34:46.913+03:00"
  },
  "s": "I",
  "c": "COMMAND",
  "id": 51803,
  "ctx": "conn1414",
  "msg": "Slow query",
  "attr": {
    "type": "command",
    "ns": "MtxEventDatabase.EventCollection20220401",
    "command": {
      "find": "EventCollection20220401",
      "filter": {
        "$and": [
          {
            "WalletOwnerId": 1125899907444249
          },
          {
            "EventTime": {
              "$gte": {
                "$date": "2022-01-18T11:34:46.000Z"
              }
            }
          },
          {
            "EventTime": {
              "$lt": {
                "$date": "2022-04-19T11:34:46.000Z"
              }
            }
          },
          {
            "EventType": {
              "$in": [
                1
              ]
            }
          }
        ]
      },
      "sort": {
        "EventTime": -1
      },
      "projection": {
        "CMdc": 1,
        "_id": 0
      },
      "$db": "MtxEventDatabase",
      "$clusterTime": {
        "clusterTime": {
          "$timestamp": {
            "t": 1650281686,
            "i": 3
          }
        },
        "signature": {
          "hash": {
            "$binary": {
              "base64": "bD2hCEQMSNusXQXYJqKsZUaZypE=",
              "subType": "0"
            }
          },
          "keyId": 7023367372685705000
        }
      },
      "lsid": {
        "id": {
          "$uuid": "05cc81fa-3e36-4e1c-b07e-a8931dc51c22"
        }
      }
    },
    "planSummary": "IXSCAN { WalletOwnerId: 1, EventTime: 1 }",
    "keysExamined": 152,
    "docsExamined": 152,
    "cursorExhausted": true,
    "numYields": 4,
    "nreturned": 32,
    "queryHash": "79AA4A27",
    "planCacheKey": "952C0A34",
    "reslen": 36215,
    "locks": {
      "ReplicationStateTransition": {
        "acquireCount": {
          "w": 5
        }
      },
      "Global": {
        "acquireCount": {
          "r": 5
        }
      },
      "Database": {
        "acquireCount": {
          "r": 5
        }
      },
      "Collection": {
        "acquireCount": {
          "r": 5
        }
      },
      "Mutex": {
        "acquireCount": {
          "r": 1
        }
      }
    },
    "storage": {
      "data": {
        "bytesRead": 9427201,
        "timeReadingMicros": 99133
      }
    },
    "protocol": "op_msg",
    "durationMillis": 101
  }
}

This is a sample document:

{
        "_id" : "H560:1:52:183071606",
        "WalletOwnerId" : NumberLong("73183493945073446"),
        "CMdc" : "<1,3,1,-24,5213,3>[]<1,1,0,1167,5213,3>[{<1,4,1,1296,5213,3>[500,(12:STRING),(0:)],<1,4,1,1296,5213,3>[520,(6:APP_TT),(6:STRING)]}]<1,11,0,626,5213,3>[{28,1}]<1,28,0,208,5213,3>[0:2:5:302886,(13:number),0:2:5:303239,(19:Number),0,0:2:5:302887,0:2:5:302886,(13:Number),,,,,,2022-04-01T03:00:01.041472+03:00,,(19:H560:1:52:183071606),,,,,4,,,0:2:5:302885,(36:e90e9f54-e1ee-46d8-b5bb-e7f758dc6b63),1,1]",
        "DeleteCode" : 4,
        "EventTime" : ISODate("2022-04-01T00:00:01.041Z"),
        "EventType" : 28,
        "UserId" : NumberLong("73183493945073445")
}

Best regards,
Bogdan Pelin

And the current indexes:

[
        {
                "v" : 2,
                "key" : {
                        "_id" : 1
                },
                "name" : "_id_",
                "ns" : "MtxEventDatabase.EventCollection20220401"
        },
        {
                "v" : 2,
                "key" : {
                        "WalletOwnerId" : 1,
                        "EventTime" : 1
                },
                "name" : "MtxWalletOwnerId_1_EventTime_1",
                "ns" : "MtxEventDatabase.EventCollection20220401"
        },
        {
                "v" : 2,
                "key" : {
                        "EventTime" : 1
                },
                "name" : "MtxEventTime_1",
                "ns" : "MtxEventDatabase.EventCollection20220401"
        },
        {
                "v" : 2,
                "key" : {
                        "DeleteCode" : 1,
                        "EventTime" : 1
                },
                "name" : "DeleteCode_1_EventTime_1",
                "ns" : "MtxEventDatabase.EventCollection20220401"
        },
        {
                "v" : 2,
                "key" : {
                        "UserId" : 1
                },
                "name" : "UserId_1",
                "ns" : "MtxEventDatabase.EventCollection20220401"
        }
]

Best regards,
Bogdan Pelin

Why is the filter on EventType a $in query with a single field? Do you sometimes select more than one EventType for this query and it just happens that you used a single one in this example?

I’m asking because it’s important for the index. A $in query is a “range” query when a filter based on an equality is… an equality. Which is important for the ESR rule that is explained in this blog post about indexes and performances.

If you transform the $in into a single equality, then the best index for this query would be: {"WalletOwnerId": 1, "EventType": 1, "EventTime": 1, "CMdc": 1}.

If you keep EventType as a $in query, then {"WalletOwnerId": 1, "EventTime": 1, "EventType": 1, "CMdc": 1} this one is better because it respect the ESR rule and will avoid the in-memory sort that would result from a multi-filter in the EventType field.

As you can see, I also added the CMdc field at the end of the index because you are using this as a projection… So this means that you can do a “covered query” because everything you need to answer this query is already in the index. This might result in a “too big” index issue. So it’s a trade off…

  • If you don’t put it in the index, then your index is smaller but you have to fetch the docs from disks each time.
  • If you put it in the index, you need more RAM to support the index but you don’t need to use the disk at all to answer this query and it can’t be any faster than that. It all depends if you want these extra milliseconds or not. Avoiding the in-memory sort is already a huge win for your performances and RAM…

I hope I make sense :smiley: !

Cheers,
Maxime.