Mongod crashes due to memory peaks

Hi, after upgrading our MongoDB replica set cluster to 32 GB RAM and 16 cores, it has become unstable and crashes due to high memory usage. Normally every query executes in under 300 ms, but sometimes all queries (heavy or light) take more than 20,000 ms, and after that mongod is killed by the kernel's OOM killer. I use Datadog for system monitoring and see a memory peak when the primary's mongod goes down.
MongoDB cluster info:

  • mongod version: 4.0.10
  • 1 primary, 1 secondary, 1 arbiter
  • replica set protocol version: pv1
  • SSL disabled
  • wiredTigerCacheSizeGB: 10.5 (lower than the default of 50% of (RAM - 1 GB))
  • secondary and primary hardware: 32 GB RAM, 16 cores, amzn2.x86_64 GNU/Linux

Datadog metrics: [chart screenshot]

Mongoops metrics: [chart screenshot]

Hi @barryd, welcome to the community!

after upgrading our MongoDB replica set cluster to 32 GB RAM and 16 cores, it has become unstable and crashes due to high memory usage

If I understand correctly, you upgraded the hardware but now you’re seeing worse performance? What was the hardware in the previous iteration?

As a basic check, you might want to check out what’s happening via mongotop and mongostat to see if anything is amiss, and also check the mongod logs for anything out of the ordinary. Please see MongoDB Performance Tuning Questions to see if any of the suggestions there can help.

For a sanity check, are you running anything else besides a single mongod in any of the nodes there? It’s highly recommended to not run anything else besides mongod in a node (e.g. other servers, apps, another mongod process, etc.) since they can compete with mongod for resources.

  • wiredTigerCacheSizeGB: 10.5 (lower than the default of 50% of (RAM - 1 GB))

For this setting in particular, is there a reason why it was changed from the default? Have you tried setting it back to its default value?
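
In case it helps, one quick way to confirm what cache size the running mongod has actually picked up (a sketch from the mongo shell; both fields come from the wiredTiger section of serverStatus) is:

    // maximum cache size the server is configured with, in bytes
    db.serverStatus().wiredTiger.cache["maximum bytes configured"]
    // how much of the cache is currently in use
    db.serverStatus().wiredTiger.cache["bytes currently in the cache"]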

Best regards
Kevin

Thanks for your reply. Yes, we hit the problem after upgrading the hardware, but I forgot to mention that we also merged and cleaned up multiple replica set clusters (15) into two clusters and upgraded their hardware. The total number of databases in one cluster is now around 500, but the data size (not disk usage) after cleanup is ~300 GB, roughly equal to one old cluster.
We use mongoops and fresh EC2 instances for deployment, so all nodes run MongoDB only.
The problem I cannot understand is that execution time spikes very high (roughly every 5 minutes) even for the simplest queries (a findOne on a collection with no documents takes 50 seconds to execute), and sometimes the primary is killed and a failover happens while CPU is not high (40-50%). When execution time spikes, all queries are affected, yet CPU usage stays stable at 40-50%. Every metric in the mongoops charts is normal, total connections are the same as on the old cluster (~1500), and CPU usage is always below 60%.

Hi @barryd

Ok so let me see if I understand your case correctly:

  • You had 15 replica sets previously, and merged them all into just 2 replica sets
  • These new 2 replica sets have better hardware than the previous 15
  • There are about 500 databases in one of the replica sets

Did I get that correctly?

Now the problems:

  • a findOne on a collection with no documents takes 50 seconds to execute
  • execution time is almost 5 minutes for most queries
  • sometimes the primary node is OOM-killed

Actually, all the problems you see point to typical symptoms of the hardware struggling to keep up with the work. If you have inefficient queries (e.g. queries that return the majority of a collection, or queries without indexes to back them up), they will add an extra burden as well. You can try to optimize the queries, but at some point there's no way around needing more hardware to service the workload.
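
For instance, a quick way to check whether a suspect query is backed by a usable index (a sketch using a hypothetical collection and filter; substitute your own) is:

    // compare totalDocsExamined with nReturned, and check whether the
    // winning plan is an IXSCAN or a COLLSCAN
    db.orders.find({ status: "pending" }).sort({ created: -1 }).explain("executionStats")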

If my understanding of the before & after state of your deployment is accurate, then one replica set is now serving the work of ~7 replica sets previously, so anecdotally you’ll need 7x the old hardware :slight_smile:

There are some tools that can help you determine how your deployment is coping with work:

  • mongostat would show you how much work a mongod is servicing
  • iostat would show you how busy the disks are

Please post the output of these two tools during the busiest period, as typically we can determine if the hardware is really struggling from their output.
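
For example, something like the following (adjust host, port, and credentials as needed; the trailing 5 is the sampling interval in seconds):

    mongostat --host <primary-host> --port 27017 5
    iostat -xm 5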

Also, I would recommend setting the WT cache back to its default setting. Making it smaller can amplify some issues, so we should check whether setting it back to the default makes things at least slightly better.

Best regards
Kevin


Your understanding of the case is correct.
The execution-time problem is not 5 minutes per query; I mean that every 5 minutes something affects the server and causes all queries to slow down, even a findOne on a collection with no documents takes 50 seconds to execute.
And the workload of the new cluster is not that of ~7 previous replica sets, because we also reduced the traffic; the total number of connections to the new cluster is the same as to the old one. The difference is that the total number of databases in one cluster is now larger than before (due to the multi-tenancy architecture).
Here is the output of these tools on one node of a cluster:

[tool output screenshots]
I don’t think hardware is the problem. One thing we noticed in the mongod logs is that when queries slow down, most of the execution time is spent on “schemaLock”.

Yes, I don’t see anything outlandish in the mongotop numbers, so it’s possible something else is going on.

One thing we noticed in the mongod logs is that when queries slow down, most of the execution time is spent on “schemaLock”.

Do you mean that this doesn’t happen all the time? E.g., every 5 minutes this will happen consistently? Could you post the log snippet from when this happens?

Just for due diligence and to ensure you’re not running into a solved issue, I noted that you’re running MongoDB 4.0.10, which is quite old by now. The latest in the 4.0 series is 4.0.28. Is it possible to at least try this version?

Having said that, please note that the whole 4.0 series is out of support since April 2022, so you might want to consider moving to a supported version.

Best regards
Kevin


Yes, it doesn’t happen all the time, but it does happen every 5-10 minutes consistently.
Some logs from when queries were slowed:
mongodb.log (12.3 KB)

I will prepare to upgrade the mongod version; do you have any migration or upgrade notes to share?

Unfortunately I cannot see the log you posted for some reason. Could you copy the relevant part instead?

Upgrading within a major version should be a straightforward binary replacement (e.g. from 4.0.10 to 4.0.28), but it’s slightly more involved to upgrade to a newer major release. To upgrade from 4.0 to 4.2, please have a look at https://www.mongodb.com/docs/manual/release-notes/4.2-upgrade-replica-set/

Please note that the supported upgrade path between major versions only covers subsequent versions, e.g. to upgrade to 5.0, you’ll need to do 4.0 → 4.2 → 4.4 → 5.0
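
For illustration, each hop in that path also involves checking and then bumping the featureCompatibilityVersion once you’re happy with the new binaries (please confirm the exact steps against the upgrade docs for your versions); from the mongo shell that looks roughly like:

    // check the current feature compatibility version
    db.adminCommand({ getParameter: 1, featureCompatibilityVersion: 1 })
    // after upgrading the binaries to 4.2 and verifying the deployment
    db.adminCommand({ setFeatureCompatibilityVersion: "4.2" })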

Best regards
Kevin


Here you are

2022-06-08T01:48:02.822+0000 I NETWORK [listener] connection accepted from 172.16.106.98:44388 #68446 (1636 connections now open)
2022-06-08T01:48:02.836+0000 I ACCESS [conn68446] Successfully authenticated as principal mms-monitoring-agent on admin from client 172.16.106.98:44388
2022-06-08T01:48:02.853+0000 I ACCESS [conn68446] Successfully authenticated as principal mms-monitoring-agent on admin from client 172.16.106.98:44388
2022-06-08T01:48:02.867+0000 I ACCESS [conn68446] Successfully authenticated as principal mms-monitoring-agent on admin from client 172.16.106.98:44388
2022-06-08T01:48:02.881+0000 I ACCESS [conn68446] Successfully authenticated as principal mms-monitoring-agent on admin from client 172.16.106.98:44388
2022-06-08T01:48:02.895+0000 I ACCESS [conn68446] Successfully authenticated as principal mms-monitoring-agent on admin from client 172.16.106.98:44388
2022-06-08T01:48:02.909+0000 I ACCESS [conn68446] Successfully authenticated as principal mms-monitoring-agent on admin from client 172.16.106.98:44388
2022-06-08T01:48:02.923+0000 I ACCESS [conn68446] Successfully authenticated as principal mms-monitoring-agent on admin from client 172.16.106.98:44388
2022-06-08T01:48:03.004+0000 I NETWORK [listener] connection accepted from 127.0.0.1:35332 #68447 (1637 connections now open)
2022-06-08T01:48:03.315+0000 I COMMAND [conn26646] command db1.products command: find { find: “products”, filter: { is_active: true, is_deleted: false }, sort: { created: -1 }, projection: {}, skip: 119900, limit: 100, returnKey: false, showRecordId: false, lsid: { id: UUID(“c0889d15-d2a2-4891-8f54-bf6d8e11d368”) }, $clusterTime: { clusterTime: Timestamp(1654652882, 10), signature: { hash: BinData(0, 5DB339F8B7F72F5E1A38A2F261F3525DD1C33118), keyId: 7057813452781256841 } }, $db: “db1” } planSummary: IXSCAN { is_active: 1, created: -1 } keysExamined:120000 docsExamined:120000 cursorExhausted:1 numYields:938 nreturned:100 reslen:139555 locks:{ Global: { acquireCount: { r: 939 } }, Database: { acquireCount: { r: 939 } }, Collection: { acquireCount: { r: 939 } } } storage:{ data: { bytesRead: 14742, timeReadingMicros: 14 } } protocol:op_msg 219ms
2022-06-08T01:48:03.487+0000 I COMMAND [conn68332] command local.oplog.rs command: collStats { collstats: “oplog.rs”, $readPreference: { mode: “secondaryPreferred” }, $db: “local” } numYields:0 reslen:7143 locks:{ Global: { acquireCount: { r: 1 } }, Database: { acquireCount: { r: 1 } }, oplog: { acquireCount: { r: 1 } } } storage:{ data: { bytesRead: 56654174 } } protocol:op_query 56654ms
2022-06-08T01:48:03.487+0000 I COMMAND [conn68364] command local.oplog.rs command: collStats { collstats: “oplog.rs”, $readPreference: { mode: “secondaryPreferred” }, $db: “local” } numYields:0 reslen:7143 locks:{ Global: { acquireCount: { r: 1 } }, Database: { acquireCount: { r: 1 } }, oplog: { acquireCount: { r: 1 } } } storage:{ data: { bytesRead: 63527346 } } protocol:op_query 63527ms
2022-06-08T01:48:03.487+0000 I COMMAND [conn68440] command local.oplog.rs command: collStats { collstats: “oplog.rs”, $readPreference: { mode: “secondaryPreferred” }, $db: “local” } numYields:0 reslen:7143 locks:{ Global: { acquireCount: { r: 1 } }, Database: { acquireCount: { r: 1 } }, oplog: { acquireCount: { r: 1 } } } storage:{ data: { bytesRead: 23466243 } } protocol:op_query 23466ms
2022-06-08T01:48:03.488+0000 I COMMAND [conn2348] command db3.balances command: find { find: “balances”, filter: { created: { $lt: new Date(1654534800000) } }, sort: { created: -1 }, projection: {}, limit: 1, lsid: { id: UUID(“6b230d18-ede8-4fba-aa73-cc97a901b51b”) }, $clusterTime: { clusterTime: Timestamp(1654652812, 125), signature: { hash: BinData(0, 74934498926F07BC1FA12E736332F8EAAD2D6A0C), keyId: 7057813452781256841 } }, $db: “db3” } planSummary: IXSCAN { created: 1 } keysExamined:0 docsExamined:0 cursorExhausted:1 numYields:0 nreturned:0 reslen:231 locks:{ Global: { acquireCount: { r: 1 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } } } storage:{ timeWaitingMicros: { handleLock: 28808, schemaLock: 70536054 } } protocol:op_msg 70564ms
2022-06-08T01:48:03.488+0000 I NETWORK [conn68332] end connection 172.16.106.98:44040 (1636 connections now open)
2022-06-08T01:48:03.491+0000 I WRITE [conn39654] update db2.orders command: { q: { _id: ObjectId(‘629ffb1ca8338b7e97dce04f’) }, u: { $set: { somefield: “value” } }, multi: false, upsert: false } planSummary: IDHACK keysExamined:1 docsExamined:1 nMatched:1 nModified:1 keysInserted:1 keysDeleted:1 numYields:1 locks:{ Global: { acquireCount: { r: 3, w: 3 } }, Database: { acquireCount: { w: 3 } }, Collection: { acquireCount: { w: 3 } } } storage:{ data: { bytesRead: 12714, timeReadingMicros: 11 }, timeWaitingMicros: { handleLock: 2551, schemaLock: 60845939 } } 60850ms
2022-06-08T01:48:03.491+0000 I COMMAND [conn39654] command db2.$cmd command: update { update: “orders”, updates: [ { q: { _id: ObjectId(‘629ffb1ca8338b7e97dce04f’) }, u: { $set: { somefield: “value” } }, upsert: false, multi: false } ], ordered: true, lsid: { id: UUID(“880af29e-6d0b-41bf-878c-11029ac879d7”) }, txnNumber: 42, $clusterTime: { clusterTime: Timestamp(1654652820, 20), signature: { hash: BinData(0, D892514641C0A21507EBE33BA230A731093091B0), keyId: 7057813452781256841 } }, $db: “db2” } numYields:0 reslen:245 locks:{ Global: { acquireCount: { r: 6, w: 4 } }, Database: { acquireCount: { w: 4 } }, Collection: { acquireCount: { w: 4 } }, Metadata: { acquireCount: { W: 1 } } } storage:{} protocol:op_msg 60850ms
2022-06-08T01:48:03.493+0000 I COMMAND [conn26295] command db4.costs command: insert { insert: “costs”, documents: [ { status: “pending”, currency: “USD”, _id: ObjectId(‘629fffce9d7f4312150e83c5’), order: ObjectId(‘629f5020a8338bba4adc4c5e’), created: new Date(1654652878704), __v: 0 } ], ordered: true, lsid: { id: UUID(“73feff84-ee1f-4334-a69c-663409705397”) }, txnNumber: 2, $clusterTime: { clusterTime: Timestamp(1654652878, 4), signature: { hash: BinData(0, C99B37FB735574CB01648E132D4B1CB1D56E15EA), keyId: 7057813452781256841 } }, $db: “db4” } ninserted:1 keysInserted:6 numYields:0 reslen:230 locks:{ Global: { acquireCount: { r: 2, w: 2 } }, Database: { acquireCount: { w: 2 } }, Collection: { acquireCount: { w: 2 } } } storage:{ data: { bytesRead: 104055, timeReadingMicros: 2071 }, timeWaitingMicros: { handleLock: 7271, schemaLock: 4776151 } } protocol:op_msg 4786ms
2022-06-08T01:48:03.496+0000 I COMMAND [conn68441] serverStatus was very slow: { after basic: 0, after advisoryHostFQDNs: 0, after asserts: 0, after backgroundFlushing: 0, after connections: 0, after dur: 0, after extra_info: 0, after freeMonitoring: 0, after globalLock: 0, after logicalSessionRecordCache: 0, after network: 0, after opLatencies: 0, after opReadConcernCounters: 0, after opcounters: 0, after opcountersRepl: 0, after oplog: 1694, after repl: 1694, after security: 1694, after storageEngine: 1694, after tcmalloc: 1694, after transactions: 1694, after transportSecurity: 1694, after wiredTiger: 1694, at end: 1695 }
2022-06-08T01:48:03.496+0000 I COMMAND [conn68441] command local.oplog.rs command: serverStatus { serverStatus: 1, advisoryHostFQDNs: 1, locks: 0, recordStats: 0, oplog: 1, $readPreference: { mode: “secondaryPreferred” }, $db: “admin” } numYields:0 reslen:33408 locks:{ Global: { acquireCount: { r: 3 } }, Database: { acquireCount: { r: 1 } }, oplog: { acquireCount: { r: 1 } } } storage:{} protocol:op_query 1695ms
2022-06-08T01:48:03.502+0000 I ACCESS [conn68440] Successfully authenticated as principal mms-monitoring-agent on admin from client 172.16.106.98:44314
2022-06-08T01:48:03.503+0000 I NETWORK [conn68364] end connection 172.16.106.98:44112 (1635 connections now open)
2022-06-08T01:48:03.515+0000 I ACCESS [conn68441] Successfully authenticated as principal mms-monitoring-agent on admin from client 172.16.106.98:44318
2022-06-08T01:48:03.516+0000 I ACCESS [conn68440] Successfully authenticated as principal mms-monitoring-agent on admin from client 172.16.106.98:44314
2022-06-08T01:48:03.530+0000 I ACCESS [conn68440] Successfully authenticated as principal mms-monitoring-agent on admin from client 172.16.106.98:44314
2022-06-08T01:48:03.533+0000 I ACCESS [conn68441] Successfully authenticated as principal mms-monitoring-agent on admin from client 172.16.106.98:44318
2022-06-08T01:48:03.546+0000 I ACCESS [conn68440] Successfully authenticated as principal mms-monitoring-agent on admin from client 172.16.106.98:44314
2022-06-08T01:48:03.547+0000 I WRITE [conn39369] update db2.orders command: { q: { _id: ObjectId(‘629ffb1ca8338b7e97dce04f’) }, u: { $set: { date: new Date(1654652823741) } }, multi: false, upsert: false } planSummary: IDHACK keysExamined:1 docsExamined:1 nMatched:1 nModified:1 keysInserted:1 keysDeleted:1 writeConflicts:780 numYields:780 locks:{ Global: { acquireCount: { r: 782, w: 782 } }, Database: { acquireCount: { w: 782 } }, Collection: { acquireCount: { w: 782 } } } storage:{} 59783ms
2022-06-08T01:48:03.549+0000 I COMMAND [conn39369] command db2.$cmd command: update { update: “orders”, updates: [ { q: { _id: ObjectId(‘629ffb1ca8338b7e97dce04f’) }, u: { $set: { date: new Date(1654652823741) } }, upsert: false, multi: false } ], ordered: true, lsid: { id: UUID(“ebed48c8-dc08-46df-9a04-b76f179353b0”) }, txnNumber: 39, $clusterTime: { clusterTime: Timestamp(1654652823, 6), signature: { hash: BinData(0, A69CE8B3E8DBDF008051A969595868A30CB2CBDF), keyId: 7057813452781256841 } }, $db: “db2” } numYields:0 reslen:245 locks:{ Global: { acquireCount: { r: 785, w: 783 } }, Database: { acquireCount: { w: 783 } }, Collection: { acquireCount: { w: 783 } }, Metadata: { acquireCount: { W: 1 } } } storage:{} protocol:op_msg 59785ms
2022-06-08T01:48:04.327+0000 I COMMAND [conn26646] command db1.products command: find { find: “products”, filter: { is_active: true, is_deleted: false }, sort: { created: -1 }, projection: {}, skip: 120000, limit: 100, returnKey: false, showRecordId: false, lsid: { id: UUID(“c0889d15-d2a2-4891-8f54-bf6d8e11d368”) }, $clusterTime: { clusterTime: Timestamp(1654652883, 8), signature: { hash: BinData(0, 13CB05DE002D356B635E7EB9CE4BBD2FD0D287A4), keyId: 7057813452781256841 } }, $db: “db1” } planSummary: IXSCAN { is_active: 1, created: -1 } keysExamined:120100 docsExamined:120100 cursorExhausted:1 numYields:939 nreturned:100 reslen:141777 locks:{ Global: { acquireCount: { r: 940 } }, Database: { acquireCount: { r: 940 } }, Collection: { acquireCount: { r: 940 } } } storage:{ data: { bytesRead: 58688, timeReadingMicros: 42 } } protocol:op_msg 204ms
2022-06-08T01:48:04.673+0000 I ACCESS [conn68434] Successfully authenticated as principal __system on local from client 172.16.106.98:44278
2022-06-08T01:48:04.680+0000 I ACCESS [conn68442] Successfully authenticated as principal __system on local from client 172.16.106.98:44326
2022-06-08T01:48:04.794+0000 I ACCESS [conn68435] Successfully authenticated as principal __system on local from client 172.16.134.149:35916
2022-06-08T01:48:04.884+0000 I ACCESS [conn68435] Successfully authenticated as principal __system on local from client 172.16.134.149:35916
2022-06-08T01:48:04.896+0000 I ACCESS [conn68435] Successfully authenticated as principal __system on local from client 172.16.134.149:35916
2022-06-08T01:48:04.908+0000 I ACCESS [conn68435] Successfully authenticated as principal __system on local from client 172.16.134.149:35916
2022-06-08T01:48:04.920+0000 I ACCESS [conn68435] Successfully authenticated as principal __system on local from client 172.16.134.149:35916
2022-06-08T01:48:04.933+0000 I ACCESS [conn68435] Successfully authenticated as principal __system on local from client 172.16.134.149:35916
2022-06-08T01:48:05.125+0000 I COMMAND [conn26646] command db1.products command: find { find: “products”, filter: { is_active: true, is_deleted: false }, sort: { created: -1 }, projection: {}, skip: 120100, limit: 100, returnKey: false, showRecordId: false, lsid: { id: UUID(“c0889d15-d2a2-4891-8f54-bf6d8e11d368”) }, $clusterTime: { clusterTime: Timestamp(1654652883, 8), signature: { hash: BinData(0, 13CB05DE002D356B635E7EB9CE4BBD2FD0D287A4), keyId: 7057813452781256841 } }, $db: “db1” } planSummary: IXSCAN { is_active: 1, created: -1 } keysExamined:120200 docsExamined:120200 cursorExhausted:1 numYields:939 nreturned:100 reslen:137342 locks:{ Global: { acquireCount: { r: 940 } }, Database: { acquireCount: { r: 940 } }, Collection: { acquireCount: { r: 940 } } } storage:{ data: { bytesRead: 83510, timeReadingMicros: 60 } } protocol:op_msg 204ms

Thanks for the logs. There are some things that caught my eye. Let me explain:

2022-06-08T01:48:03.315+0000 I COMMAND [conn26646] command db1.products command: find { find: “products”, filter: { is_active: true, is_deleted: false }, sort: { created: -1 }, projection: {}, skip: 119900, limit: 100, returnKey: false, showRecordId: false, lsid: { id: UUID(“c0889d15-d2a2-4891-8f54-bf6d8e11d368”) }, $clusterTime: { clusterTime: Timestamp(1654652882, 10), signature: { hash: BinData(0, 5DB339F8B7F72F5E1A38A2F261F3525DD1C33118), keyId: 7057813452781256841 } }, $db: “db1” } planSummary: IXSCAN { is_active: 1, created: -1 } keysExamined:120000 docsExamined:120000 cursorExhausted:1 numYields:938 nreturned:100 reslen:139555 locks:{ Global: { acquireCount: { r: 939 } }, Database: { acquireCount: { r: 939 } }, Collection: { acquireCount: { r: 939 } } } storage:{ data: { bytesRead: 14742, timeReadingMicros: 14 } } protocol:op_msg 219ms

What’s interesting about this line is that it’s a pagination query with skip/limit, with it trying to skip 119,900 documents and limit to 100. This is not an efficient query, since it needs to look at all 120,000 documents, throw away 119,900 of them, and only return 100.

I found three instances of this query, at an interval of about one per second. But I think this is only part of the problem.

The other part is that I suspect you have too many databases & collections sitting in one server, since you mentioned that this is a multi-tenant database. WT does a checkpoint every 60 seconds, where it writes a consistent view of the database to disk. In order to do this, it needs to walk through all open collections one by one: the more collections & databases there are, the longer this will take. Thus, other operations may queue up behind a long checkpoint. Do you mind telling us how many databases, collections, and indexes you have in the deployment?
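
If it helps, a rough way to tally these from the mongo shell (a sketch; it walks every database, so it may take a while on a deployment this size) is:

    var dbs = db.adminCommand({ listDatabases: 1 }).databases;
    var totalColls = 0, totalIndexes = 0;
    dbs.forEach(function (d) {
      var sib = db.getSiblingDB(d.name);
      sib.getCollectionNames().forEach(function (c) {
        totalColls += 1;
        totalIndexes += sib.getCollection(c).getIndexes().length;
      });
    });
    print("dbs:", dbs.length, "collections:", totalColls, "indexes:", totalIndexes);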

Hence I believe you’re seeing the effect of WT-5479. This is a known issue, and it has gradually been improved in the MongoDB 4.4 series and further in the MongoDB 5.0 series.

So to mitigate the issue you’re seeing, I would recommend you to:

  • Change the pagination query: instead of skip/limit, use a range query, possibly with a bucket pattern (see the sketch below)
  • Upgrade to MongoDB 5.0 so you’ll get better performance with many collections
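
As a rough sketch of the first point (collection and field names taken from your log; if created is not unique you’d also want _id as a tie-breaker):

    // first page
    db.products.find({ is_active: true, is_deleted: false })
               .sort({ created: -1 })
               .limit(100)
    // subsequent pages: lastCreated is the created value of the last
    // document returned by the previous page
    db.products.find({ is_active: true, is_deleted: false, created: { $lt: lastCreated } })
               .sort({ created: -1 })
               .limit(100)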

Best regards
Kevin


Thanks a lot for the information. We are preparing to upgrade to version 4.4 and will keep watching the cluster. Here is some information about the total databases, collections, and indexes in one replica set deployment:
Total indexes on the server: 491161
Total dbs on the server: 3669
Total cols on the server: 88576

Upgrading to 4.4 should improve the checkpoint locking time, but how can I determine the maximum number of databases, collections, and indexes before the same issue occurs again in the future?

how can I determine the maximum number of databases, collections, and indexes before the same issue occurs again in the future?

Unfortunately there’s no single number I can quote for this, as it will vary wildly with the hardware and the actual workload of the database. The checkpoint process involves writing modified data (inserted, updated, deleted, etc.) to disk, so if you have relatively few such operations, having a lot of collections may not matter. In another case, a relatively small number of busy collections can result in long checkpoints.

To make any single number even less relevant, there’s also a consistent effort from MongoDB & WT to improve this situation and enable the server to handle more work.

I would say, as a general rule, keep an eye on the mongod logs and regularly monitor them for slow operations. If you see an increase in the number of slow operations, it’s best to act, discover the root cause, and mitigate the issue before it gets out of hand.
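
For example, one quick way to spot operations that have already been running for a while (a sketch; the 5-second threshold is arbitrary) is:

    db.currentOp({ active: true, secs_running: { $gte: 5 } })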

Sorry I can’t be more precise than that. However, having ~90k collections and ~500,000 indexes is definitely a large number by any metric :slight_smile:

Best regards
Kevin


Got it, thank you for all your help.
