MongoDB consuming all available memory

I’m having an issue with MongoDB continually consuming all available RAM and then getting killed by the OOM killer. I’ve read a few questions on the Stack Exchange network suggesting that setting storage.wiredTiger.engineConfig.cacheSizeGB can resolve the issue, but it is not helping.

Right at this moment, here is the situation:

Mongod Service Status

$ service mongod status
● mongod.service - MongoDB Database Server
     Loaded: loaded (/lib/systemd/system/mongod.service; enabled; vendor preset: enabled)
    Drop-In: /etc/systemd/system/mongod.service.d
             └─always_restart.conf
     Active: active (running) since Mon 2023-03-06 12:57:35 EST; 3min 26s ago
       Docs: https://docs.mongodb.org/manual
   Main PID: 13318 (mongod)
     Memory: 6.4G
     CGroup: /system.slice/mongod.service
             └─13318 /usr/bin/mongod --config /etc/mongod.conf

Mar 06 12:57:35 chat systemd[1]: Stopped MongoDB Database Server.
Mar 06 12:57:35 chat systemd[1]: Started MongoDB Database Server.

Free Memory

$ free -h
              total        used        free      shared  buff/cache   available
Mem:          7.8Gi       7.2Gi       129Mi       4.0Mi       435Mi       309Mi
Swap:         511Mi       511Mi          0B

Mongo Config

# Where and how to store data.
storage:
  dbPath: /var/lib/mongodb
  journal:
    enabled: true
  engine: wiredTiger
  wiredTiger:
    engineConfig:
        cacheSizeGB: 2
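
For reference, the configured cache ceiling can be read back from serverStatus in mongosh to confirm the setting is being honoured (the field name comes from the WiredTiger cache statistics):

// Configured WiredTiger cache maximum, in bytes; with cacheSizeGB: 2 this
// should report roughly 2147483648.
db.serverStatus().wiredTiger.cache["maximum bytes configured"]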

As you can see, I have 8 GB of RAM and even now Mongo is on the verge of consuming most of it. By the time I finished writing this post, the OOM killer had already intervened:

Mar  6 13:19:13 host kernel: [ 5638.313813] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/system.slice/mongod.service,task=mongod,pid=13997,uid=116
Mar  6 13:19:13 host kernel: [ 5638.314025] Out of memory: Killed process 13997 (mongod) total-vm:8773352kB, anon-rss:6775648kB, file-rss:0kB, shmem-rss:0kB, UID:116 pgtables:14084kB oom_score_adj:0

I realize cacheSizeGB may not be the only factor to consider here, but what else should I be looking at?

What’s mongod doing? Serving requests? Just idling?
What version? What OS release? etc.
More info, please.

Did you check this? The default value shouldn’t be that high. Do you have any index building in progress?

https://www.mongodb.com/docs/manual/reference/configuration-options/#mongodb-setting-storage.wiredTiger.engineConfig.cacheSizeGB

What’s mongod doing?

I am just learning how to debug mongod server issues. I know now I can use db.currentOp() to check this in the future.
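
For future readers, the output can be narrowed to long-running operations by passing a filter document (a minimal sketch; the 5-second threshold is just an example):

// List active operations that have been running for more than 5 seconds.
db.currentOp({ active: true, secs_running: { $gt: 5 } })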

What version? What OS release?

mongod v5.0.14
Ubuntu 20.04.5 LTS

As the issue is still happening, I examined the output of db.currentOp() and I think the following query is probably the culprit. Looking at the rid reveals that only a few hundred records match it, but for some reason the DBMS doesn’t use that index and instead relies only on the full-text search? I might be misreading the output here. Is there a simple change I can make to get this query to perform normally? I also don’t understand why the planSummary seems to include the same index over and over again.


    {
      type: 'op',
      host: 'chat:27017',
      desc: 'conn41',
      connectionId: 41,
      client: '127.0.0.1:53152',
      clientMetadata: {
        driver: { name: 'nodejs', version: '4.3.1' },
        os: {
          type: 'Linux',
          name: 'linux',
          architecture: 'x64',
          version: '5.4.0-144-generic'
        },
        platform: 'Node.js v14.19.3, LE (unified)|Node.js v14.19.3, LE (unified)'
      },
      active: true,
      currentOpTime: '2023-03-07T09:41:07.276-05:00',
      effectiveUsers: [ { user: 'rocketchat', db: 'admin' } ],
      threaded: true,
      opid: 13676,
      lsid: {
        id: new UUID("bcbe2f99-cdf9-4c09-bca3-c8f3d175a374"),
        uid: Binary(Buffer.from("56bb3afa50a12c12bd55cb8cc97243c1cc61c311a559ffab86114c472d35e7d4", "hex"), 0)
      },
      secs_running: Long("536"),
      microsecs_running: Long("536208026"),
      op: 'query',
      ns: 'rocketchat.rocketchat_message',
      command: {
        find: 'rocketchat_message',
        filter: {
          '$text': {
            '$search': 'https://example.com/path/index.php?type=test'
          },
          t: { '$ne': 'rm' },
          _hidden: { '$ne': true },
          rid: 'LkgTmX2dCncp5Rxtcx2Hj2YYiyyK49zj9i'
        },
        sort: { ts: -1 },
        projection: { score: { '$meta': 'textScore' } },
        skip: 0,
        limit: 10,
        lsid: { id: new UUID("bcbe2f99-cdf9-4c09-bca3-c8f3d175a374") },
        '$clusterTime': {
          clusterTime: Timestamp({ t: 1678199528, i: 1 }),
          signature: {
            hash: Binary(Buffer.from("222888b20decf0073dbf33332c7f1236a7473034", "hex"), 0),
            keyId: Long("7176963756902055940")
          }
        },
        '$db': 'rocketchat',
        '$readPreference': { mode: 'secondaryPreferred' }
      },
      planSummary: 'IXSCAN { _fts: "text", _ftsx: 1 }, IXSCAN { _fts: "text", _ftsx: 1 }, IXSCAN { _fts: "text", _ftsx: 1 }, IXSCAN { _fts: "text", _ftsx: 1 }, IXSCAN { _fts: "text", _ftsx: 1 }, IXSCAN { _fts: "text", _ftsx: 1 }, IXSCAN { _fts: "text", _ftsx: 1 }, IXSCAN { _fts: "text", _ftsx: 1 }, IXSCAN { _fts: "text", _ftsx: 1 }',
      numYields: 27991,
      locks: { FeatureCompatibilityVersion: 'r', Global: 'r' },
      waitingForLock: false,
      lockStats: {
        FeatureCompatibilityVersion: { acquireCount: { r: Long("27993") } },
        ReplicationStateTransition: { acquireCount: { w: Long("1") } },
        Global: { acquireCount: { r: Long("27993") } },
        Database: { acquireCount: { r: Long("1") } },
        Collection: { acquireCount: { r: Long("1") } },
        Mutex: { acquireCount: { r: Long("2") } }
      },
      waitingForFlowControl: false,
      flowControlStats: {}
    }

Did you check this?

Yes.

The default value shouldn’t be that high.

Specifically, which default value are you referring to? I already posted the cacheSizeGB setting I am using, which is 2 - I think this considerably restricts the default, which is “50% of (RAM - 1 GB)”, i.e. about 3.5 GB on this machine.

Do you have any index building in progress?

No.

I think you might try to simplify the problem query or factor it in some fashion and do some testing on a test partition to validate your assumption and perhaps find a way of making it more performant.

That’s pretty vague, Jack. I can see the plan is breaking the search phrase into individual words, but even enclosing the phrase in quotes does not help. I’ve also tried rebuilding the entire index, to no avail. I think Mongo should be smart enough to know that if there are only 150 messages with this rid, we don’t need to run FTS over 300,000 messages - and even so, why is it so slow?

Even when I remove all other criteria it is very slow.
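
An explain over the same filter is one way to see exactly which index the planner picks and how much it examines - a sketch assembled from the currentOp output above, so the collection, filter and rid value are the ones shown there (the quoted search string reflects the phrase-search test mentioned):

// executionStats reports which index was chosen, how many keys and documents
// were examined, and how long each stage took.
db.rocketchat_message.find(
  {
    $text: { $search: '"https://example.com/path/index.php?type=test"' },
    t: { $ne: 'rm' },
    _hidden: { $ne: true },
    rid: 'LkgTmX2dCncp5Rxtcx2Hj2YYiyyK49zj9i'
  },
  { score: { $meta: 'textScore' } }
).sort({ ts: -1 }).limit(10).explain('executionStats')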

The bigger picture issue is that I am not the author of Rocketchat so I essentially have no real control over how it builds queries. I can only say that Mongo doesn’t seem to be properly using the FTS index in this case - or I need to adjust my config to get it to perform.

I’m seeking real concrete advice. If you have experience in this area and want to DM me, we are willing to offer compensation for direct support.

I made some progress on this by dropping the text index and creating a new one with rid included:

db.rocketchat_messages.createIndex( { "msg" : "text", "rid" : 1 } )

Does this look correct? Queries seem to perform a bit better now when a rid is included, but they are still generally very slow, and without rid it’s unusable. I’m very interested in understanding what I can do to make this index perform. I have a MySQL database with a full-text search over something like 3 million rows and it’s very fast - often under a second for results. If I could get Mongo to behave anything like that, it would be a dream.


@billy_noah not sure if you are still stuck … but here are a few things I can suggest:

To address the issue of MongoDB consuming all available RAM, apart from cacheSizeGB, one can consider other factors such as the size of the database and the number of connections to the server. It is also worth checking if there are any poorly written queries that are not optimized and are causing the database to use a lot of memory. Another possible issue could be that the hardware resources are insufficient for the workload.
To further diagnose the issue, one can check the MongoDB logs for any warnings or errors related to memory usage. It is also recommended to monitor the memory usage of the server and the mongod process over time to understand how the consumption is changing. Additionally, it is worth considering provisioning a swap space to prevent the mongod process from being killed by the OOM killer.
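
If it helps, both figures can be sampled from the shell over time (a small sketch; the field names are from serverStatus):

// Resident and virtual size of the mongod process, in MiB.
db.serverStatus().mem
// Bytes currently held in the WiredTiger cache; this should stay near the
// configured cacheSizeGB ceiling if the setting is being honoured.
db.serverStatus().wiredTiger.cache["bytes currently in the cache"]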

It’s not possible to identify the specific issues without further information or error messages. However, some possible reasons for poor query performance are:

  • The text index is too large (as in the related question, where the total index size is 4GB), making queries slow.
  • The index is not being used for the query, as in the related answer where the query planner output shows that the TEXT stage is not being used.
  • The dataset is too large, and the query takes a long time to scan through it.

To optimize the index and improve query performance, some possible solutions are:

  • Consider removing or optimizing any indexes that are not being used or are redundant.
  • Use explain() to analyze the query plan, identify any possible issues with index usage, and adjust the index accordingly.
  • Use the aggregation pipeline instead of find() when dealing with text search queries, as suggested in the related answer (a sketch follows below).
  • Consider sharding the dataset to distribute the load across multiple nodes and improve query performance.

However, since there is not enough information provided, the best course of action is to analyze the dataset and query performance using metrics and profiling tools to identify the specific cause of the issue.
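
A minimal sketch of that aggregation form, using the collection, search phrase and rid from the currentOp output earlier in the thread (illustrative only, not a drop-in fix):

// $text must appear in the first $match stage; the text score can be used
// directly in $sort and $project via $meta.
db.rocketchat_message.aggregate([
  { $match: {
      $text: { $search: 'https://example.com/path/index.php?type=test' },
      rid: 'LkgTmX2dCncp5Rxtcx2Hj2YYiyyK49zj9i'
  } },
  { $sort: { score: { $meta: 'textScore' } } },
  { $limit: 10 },
  { $project: { msg: 1, ts: 1, score: { $meta: 'textScore' } } }
])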

@Deepak_Kumar16, another answer that looks like ChatGPT.

Why do you quote the whole message? It looks like you pasted the same text into ChatGPT to get the answer.


I’ve learned quite a lot about MongoDB over the last week or so and have the following to offer future readers:

Mongo generally uses only one index per query, and key order within an index is important. I was able to rebuild my text index with rid as the first key, so records are filtered by rid first and then searched by $text. I am still at a loss to explain the very poor performance of my text index: the total index size was around 2 GB, but Mongo was consuming over 8 GB of RAM on a long-running query involving this index.
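
For anyone who wants to reproduce this, the rebuilt index looks roughly like the following (check the existing index name with getIndexes() before dropping it; the collection name is the one from the currentOp output above):

// Recreate the text index with rid as an equality prefix ahead of the text key,
// so the planner narrows by rid before running the text search. Queries must
// then include an equality condition on rid to use this index.
db.rocketchat_message.createIndex({ rid: 1, msg: "text" })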