Mongo spends time waiting on cache on a 1.45 TB RAM server!

Hi,

We are running MongoDB (4.4.18) in a Docker container. The server specs are huge:

  • 1.45 TB of RAM → ~750GB of WT cache
  • 50 CPU Cores

A few notes about the setup:

  • We have many collections (> 5K)
  • Some of the collections are big (100+GB)
  • Most of the collections are well indexed
  • We have many background services that write and read data from some collections
  • We don’t have any special configuration, only --replSet db-cluster (it is a single server)

Occasionally, during working hours, we notice slowness and degraded application performance.

Reviewing the mongod logs reveals that our slow queries/aggregations spend 99.9% of their time waiting on cache (system.profile.storage.timeWaitingMicros.cache).
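
To make the figure above concrete, here is a minimal sketch of how the share of time spent waiting on cache could be computed from one profiler entry. It assumes the entry has the usual `millis` and `storage.timeWaitingMicros.cache` fields; the helper name `cache_wait_fraction` and the sample document are hypothetical, not taken from our logs.

```python
def cache_wait_fraction(profile_doc):
    """Return the fraction of an operation's duration spent waiting on cache.

    profile_doc is a dict shaped like a system.profile entry:
    'millis' is the operation duration in milliseconds and
    'storage.timeWaitingMicros.cache' is the cache wait in microseconds.
    """
    millis = profile_doc.get("millis", 0)
    wait_us = (
        profile_doc.get("storage", {})
        .get("timeWaitingMicros", {})
        .get("cache", 0)
    )
    if millis <= 0:
        return 0.0
    # Convert milliseconds to microseconds and cap the ratio at 1.0.
    return min(1.0, wait_us / (millis * 1000.0))


# Hypothetical profiler entry, for illustration only.
sample = {
    "op": "command",
    "ns": "appdb.orders",
    "millis": 12000,
    "storage": {"timeWaitingMicros": {"cache": 11988000}},
}
fraction = cache_wait_fraction(sample)
print(f"{fraction:.1%} of the operation was spent waiting on cache")
```

In practice the entries would come from the `system.profile` collection (or the slow-query log lines) rather than a hand-written dict.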

According to the documentation, this means MongoDB is waiting for free space in the cache, but that seems unlikely with 750 GB of cache.
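
One way to check that assumption is to compare the cache fill and dirty ratios from `db.serverStatus().wiredTiger.cache` against WiredTiger's eviction triggers. The sketch below assumes the documented defaults (95% total, 20% dirty) are in effect, which we have not verified for our deployment; the helper name `cache_pressure` and the sample numbers are hypothetical.

```python
def cache_pressure(cache_stats, fill_trigger=0.95, dirty_trigger=0.20):
    """Summarize WiredTiger cache pressure from serverStatus cache stats.

    cache_stats is a dict shaped like serverStatus()['wiredTiger']['cache'].
    The trigger values are assumed WiredTiger defaults, not measured settings.
    """
    max_bytes = cache_stats["maximum bytes configured"]
    used = cache_stats["bytes currently in the cache"]
    dirty = cache_stats["tracked dirty bytes in the cache"]
    fill_ratio = used / max_bytes
    dirty_ratio = dirty / max_bytes
    return {
        "fill_ratio": fill_ratio,
        "dirty_ratio": dirty_ratio,
        # Above either trigger, application threads may be asked to help evict.
        "over_fill_trigger": fill_ratio >= fill_trigger,
        "over_dirty_trigger": dirty_ratio >= dirty_trigger,
    }


GIB = 1024 ** 3
# Hypothetical numbers for a 750 GiB cache, for illustration only.
sample_stats = {
    "maximum bytes configured": 750 * GIB,
    "bytes currently in the cache": 600 * GIB,
    "tracked dirty bytes in the cache": 160 * GIB,
}
report = cache_pressure(sample_stats)
print(report)
```

In this made-up sample the cache is only 80% full, yet the dirty ratio is above the assumed 20% trigger, so a cache that is far from full could in principle still cause waits.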

Are there any other reasons for waiting on the cache? How would you suggest troubleshooting this kind of issue?

Appreciated,
Jafar
