Meaning and root cause of log message "serverStatus was very slow"

Hello there,

I’m running MongoDB Community Edition v5.0 on an Ubuntu 20.04.5 server. I’m currently struggling with slow performance of the DB, but I don’t know where it comes from. The same queries sometimes take seconds and sometimes only milliseconds.
I checked the MongoDB logs, and every now and then (about 1-5 times per day) I see the following log line:

{"t":{"$date":"2023-03-07T00:40:01.826+01:00"},"s":"I", "c":"COMMAND", "id":20499, "ctx":"ftdc","msg":"serverStatus was very slow","attr":{"timeStats":{"after basic":0,"after asserts":0,"after bucketCatalog":0,"after catalogStats":0,"after connections":0,"after electionMetrics":0,"after extra_info":0,"after flowControl":0,"after globalLock":0,"after indexBulkBuilder":0,"after indexStats":0,"after locks":0,"after logicalSessionRecordCache":0,"after mirroredReads":0,"after network":0,"after opLatencies":0,"after opcounters":0,"after opcountersRepl":0,"after oplog":0,"after oplogTruncation":0,"after readConcernCounters":0,"after repl":0,"after scramCache":0,"after security":0,"after storageEngine":0,"after tcmalloc":2825,"after tenantMigrations":2825,"after trafficRecording":2825,"after transactions":2825,"after transportSecurity":2825,"after twoPhaseCommitCoordinator":2825,"after wiredTiger":2825,"at end":2826}}}

I didn’t find any documentation on the individual parameters, but it’s suspicious that all the time is lost at tcmalloc. It’s the same every time this log message appears: tcmalloc is the only step where time is lost.
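If I read the timeStats values as cumulative milliseconds (that’s my assumption, I haven’t found it documented) and diff the consecutive entries, all of the ~2.8 seconds lands in the tcmalloc step. A quick Python sketch of that check:

```python
# Sketch: diff the cumulative "after ..." timings (assumed to be milliseconds)
# from the "serverStatus was very slow" log line to see which step ate the time.
time_stats = {
    "after basic": 0, "after asserts": 0, "after storageEngine": 0,
    "after tcmalloc": 2825, "after tenantMigrations": 2825,
    "after wiredTiger": 2825, "at end": 2826,
}  # abbreviated copy of attr.timeStats from my log line

prev = 0
for section, cumulative_ms in time_stats.items():
    delta = cumulative_ms - prev
    if delta > 0:
        print(f"{section}: +{delta} ms")  # prints: after tcmalloc: +2825 ms, at end: +1 ms
    prev = cumulative_ms
```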

Can someone explain to me what this means? And does it give me any hint where my performance issues come from?

Regards,
Jonas

Based on this, it’s a symptom rather than a cause.

So something on your server must be running slow or heavy. We need more info on it, for instance CPU usage, memory usage, and disk usage.

Hi Kobe,

thanks for answering!
The server doesn’t really look overloaded… Yesterday, the “serverStatus was very slow” message was logged about 15 times within 90 seconds. At that time, the maximum CPU usage was 0.8%. Yes, 0.8%, not 80%. Memory usage was 19 GB out of 32 GB, most of it consumed by MongoDB, but I think that’s normal as well. Maximum system.load was 4.5, which should be okay, as the server has 8 cores. Maximum disk in was about 2.5 MiB/s, which also shouldn’t be too much. Disk out was close to 0.

The database was certainly handling some requests during that time period, but looking at the server’s resources, I can’t tell that they overloaded it. Do those numbers from the log message tell you anything? Could this tcmalloc value give a hint about what the problem is?

Hi @wrzr123

Rather than overloaded, I think in your case it’s a bottleneck somewhere.

There are reports of stalls when tcmalloc was decommitting a large amount of RAM at once (see SERVER-31417). There was another report of this phenomenon that turned out to be a monitoring agent locking up the RAM that mongod was trying to work with.
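If you want to see whether tcmalloc is holding on to, or suddenly releasing, a lot of memory around the time the stalls happen, you could poll the tcmalloc section of serverStatus. A rough sketch (the exact field names depend on your MongoDB build and allocator, so treat this as an assumption to verify against your own serverStatus output):

```python
# Sketch: poll the tcmalloc section of serverStatus and watch for large swings
# in the page heap counters around the time the stalls occur.
# Assumes pymongo is installed and mongod listens on localhost:27017; the field
# names below exist on tcmalloc builds of MongoDB but may vary by version.
import time
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")

while True:
    status = client.admin.command("serverStatus")
    generic = status.get("tcmalloc", {}).get("generic", {})
    tc = status.get("tcmalloc", {}).get("tcmalloc", {})
    print(
        time.strftime("%H:%M:%S"),
        "allocated:", generic.get("current_allocated_bytes"),
        "pageheap_free:", tc.get("pageheap_free_bytes"),
        "pageheap_unmapped:", tc.get("pageheap_unmapped_bytes"),
    )
    time.sleep(5)
```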

Anecdotally, monitoring agents seem to interfere with the mongod process. I have also seen reports of unexplained crashes that turned out to be caused by security software tampering with mongod memory.

If you’re not running any monitoring agent, then perhaps you can try running this workload on a different set of hardware and see if the issue persists, to rule out any hardware-related causes.

If this still persists, then perhaps a snippet of mongostat might be useful to double check the server’s load.

Best regards
Kevin

Hi Kevin,

thanks a lot for your reply!
I do indeed have a monitoring agent running: I’m using Netdata Cloud. I also have antivirus software running: ClamAV.
I will turn them off and check if that resolves the issue. I’ll get back here as soon as I have any results.
Thanks!

Regards,
Jonas

Hi all,

Right before I wrote my last reply in this topic, my server was restarted. After that the problem was gone, and all MongoDB queries were as fast as they should be. But now, after about two weeks, it’s getting slow again.
And again I can see the log messages as shown in my first post.

Here is a snippet of mongostat output, in case it helps:
insert query update delete getmore command dirty used flushes vsize res qrw arw net_in net_out conn time
*0 *0 *0 *0 0 0|0 0.0% 41.5% 0 9.53G 1.67G 0|0 0|0 111b 56.6k 35 Mar 21 22:14:42.902
*0 *0 *0 *0 0 2|0 0.0% 41.5% 0 9.53G 1.67G 0|0 0|0 407b 57.2k 35 Mar 21 22:14:43.902
*0 *0 *0 *0 0 7|0 0.0% 41.5% 0 9.53G 1.67G 0|0 0|0 1.24k 58.0k 35 Mar 21 22:14:44.898
*0 *0 *0 *0 0 9|0 0.0% 41.5% 0 9.53G 1.67G 0|0 0|0 2.55k 57.6k 35 Mar 21 22:14:45.901
*0 *0 *0 *0 0 8|0 0.0% 41.5% 0 9.53G 1.67G 0|0 0|0 3.23k 61.4k 35 Mar 21 22:14:47.148
*0 *0 *0 *0 0 11|0 0.0% 41.5% 0 9.53G 1.67G 0|0 0|0 4.71k 78.6k 35 Mar 21 22:14:47.900
*0 *0 *0 *0 0 8|0 0.0% 41.5% 0 9.53G 1.67G 0|0 1|0 2.54k 56.8k 35 Mar 21 22:14:48.925
*0 *0 *0 *0 0 11|0 0.0% 41.5% 0 9.53G 1.67G 0|0 0|0 5.00k 60.0k 35 Mar 21 22:14:49.899
*0 *0 *0 *0 0 12|0 0.0% 41.5% 0 9.53G 1.67G 1|0 2|0 7.21k 57.6k 35 Mar 21 22:14:50.910
*0 *0 *0 *0 0 9|0 0.0% 41.5% 0 9.53G 1.67G 0|0 1|0 3.50k 57.3k 35 Mar 21 22:14:51.939

I turned off my antivirus software (ClamAV) and my monitoring software (Netdata), and the problem still persists. Does anyone have an idea what else could cause the problem? Maybe the fact that the problem disappeared after a server restart is a hint; restarting only MongoDB doesn’t help, by the way.
I have absolutely no idea anymore…

Thanks in advance!

Hi @wrzr123

I don’t see anything wrong with the mongostat output. In fact, it shows an idle server doing nothing: zero inserts, queries, and updates. It’s also only using 41.5% of the allocated WT cache.

When you said “slow”, could you give further details on what you’re seeing? Some pointers:

  • What’s the mongod logs showing during these slow times? Is it still saying serverStatus was very slow and nothing else?
  • Is this running on a container/virtualization, or is it directly on the hardware? Also, is this on a cloud provider like AWS?
  • Are you running a standalone, a replica set, or sharded cluster?

To be perfectly honest, I don’t see a specific MongoDB issue here yet. Perhaps it’s something about the hardware or the deployment?

Just an idea: if you’re running on AWS, you might be running into a burstable performance limit and being throttled. Note that bursting involves CPU and disk (separately, I believe), so either could be a cause.
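If you want a quick way to check whether the host is throttling the VM’s CPU, the “steal” time in /proc/stat is one indicator on Linux. A generic sketch, not something MongoDB-specific:

```python
# Sketch: compute CPU "steal" percentage from /proc/stat on a Linux VM.
# A consistently high steal value means the hypervisor is withholding CPU,
# which can cause stalls that don't show up as load inside the guest.
import time

def read_cpu_times():
    with open("/proc/stat") as f:
        fields = f.readline().split()            # first line: aggregate "cpu" counters
    values = list(map(int, fields[1:]))
    total = sum(values)
    steal = values[7] if len(values) > 7 else 0  # 8th counter is steal time
    return total, steal

t1, s1 = read_cpu_times()
time.sleep(5)
t2, s2 = read_cpu_times()
print(f"steal: {100.0 * (s2 - s1) / (t2 - t1):.1f}%")
```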

Best regards
Kevin

Honestly, I personally wouldn’t suggest spending too much time looking into this “serverStatus was very slow” message if that’s the only thing you’ve observed. If all the other service-level metrics look good (e.g. CPU, memory, disk, latency…), why bother?

Thanks a lot again to both of you for your replies!

@Kobe_W
I’m not bothered about the log line itself; my problem is that my application, which uses this MongoDB instance, is responding very slowly. That’s why I started investigating in the first place.

In general, there is very little traffic on the application, but when there is some, it responds very slowly most of the time. I turned on the system profiler (that was my first measure) to investigate long-running queries, and there are a lot of long-running queries that shouldn’t be that slow.
For example, there was an insert operation which took 21 (!!) seconds. There is nothing special about the operation; the same insert usually takes only a few milliseconds. When I look at the details, I can see it spends basically all of the 21 seconds waiting for the schema lock.
The same applies to queries and commands. There is no specific hint about locks for those, but queries which perform very fast on my development machine take forever on the server (with the same amount of data in my local MongoDB).
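For reference, the slow operations can be listed straight from the profiler collection, roughly like this sketch (“mydb” is a placeholder for my application database):

```python
# Sketch: list operations slower than one second from the system profiler.
# Assumes profiling is enabled (profiling level 1 or 2) on the database,
# here called "mydb" as a placeholder.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
profile = client["mydb"]["system.profile"]

for op in profile.find({"millis": {"$gt": 1000}}).sort("ts", -1).limit(20):
    print(op["ts"], op["op"], op.get("ns"), f'{op["millis"]} ms',
          op.get("locks", {}))  # lock wait details show up here for ops like the 21 s insert
```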

@kevinadi
Regarding your points:

  • Usually, the logs show the message “Slow query” a couple of times around the “serverStatus was very slow” entries. But these “slow queries” are all kinds of normal queries, even the serverStatus query itself.
  • My server is an Ubuntu VM hosted by the German web hosting provider Strato, so it’s not running on bare metal.
  • I’m running MongoDB Community Edition as a standalone.

I totally agree that it’s probably not a bug in MongoDB, but an issue with the deployment. The only other things running on the server are three .NET Core microservices that belong to the application (those services query the database) and an Apache2 web server.
I know that in an optimal deployment the database should run on a dedicated server with nothing else running on it, but I’ve seen this kind of deployment a few times for bigger testing environments, and we never faced issues like that.