Debug Out of Memory Crash

billy_noah · March 17, 2022, 8:27pm

We recently moved to a manual install of Rocketchat which uses MongoDB for it’s database and I am completely new to MongoDB.

After running for a weeks without issue today it appears the kernel OOM killer went to work on MongoDB and killed the main server process. Normally when things like this happen, my assumption is that:

there may be a query design flaw, missing index or some other underlying issue causing certain processes to stall or consume more memory than normal.
there may also be some server or db configuration that needs tuning to limit memory consumption

It would also be nice if the Mongo server restarted itself after such events but that’s another story.

As an absolute beginner, can someone please help guide me in getting to the root of what happened here? I see lots of "msg":"Slow query" and "msg":"serverStatus was very slow" messages in the logs among other things. Please let me know what further info I can provide here that would be of use debugging this issue.

Fabio_Ramohitaj · March 17, 2022, 10:03pm

Hi @billy_noah,
I think you need to create index to permorf a faster query:

I suggest you to read the production note to have the better environment for mongo (and avoid oom killer) :

And finally i suggest you to install mongodb compass:

Best regards

billy_noah · March 17, 2022, 11:08pm

Thank you for the resources. I was just wondering if the “slow query” messages could simply be related to server memory issues stemming from some other database problem - or if we can confidently assume that they are the root cause of running out of memory? Coming over from MySQL, I’ve seen a single long running process do things like that and affect many other queries and overall performance.

At any rate, is this a place where I can post a few log snippets to get more specific advice?

kevinadi · March 25, 2022, 12:43am

Hi @billy_noah

Sorry you’re having issues with this. OOMkiller is a common issue that typically points to the lack of resources in the server itself for the workload it’s supposed to do. You are correct that it may be caused by query issues. However if you’re only running Rocketchat, there’s probably not much to be done on that front. That is, other than provisioning a larger server, or setting up swap (this may delay but not eliminate the main issue).

Are you running Rocketchat and mongod in the same server? If yes, it may benefit both apps if you separate them into their own servers. A single mongod process was designed to wholly take over the server to give you the best possible performance out of the hardware you have. Consequently in a busy server they frequently have resource contention issues with other apps, or with other mongod running on the same machine.

Yes this is typical message you see when a server is being overwhelmed by work. MongoDB calls serverStatus every second for telemetry purposes and the call should be very quick to complete (it was designed to place negligible burden on the server). If these calls start to get bogged down, it means the server is really busy.

Yes if you have some log snippets that you think may be helpful, please do post them here.

Best regards
Kevin

billy_noah · March 8, 2023, 3:02pm

A year late here @kevinadi but hopefully you are still around.

After spending a bit of time over the last few days learning more about which specific queries have been causing issues I can offer a bit more here and hopefully get some productive feedback.

The problem queries seem to be ones that use a text index and have certain words under 4 characters. For instance, this query performs fine:

db.rocketchat_message.find({ '$text': { '$search': 'type=test' }, t: { '$ne': 'rm' }, _hidden: { '$ne': true }, rid: 'LkgTmX2dCncp5Rxtcx2Hj2YYiyyK49zj9i' })

While this one is very slow:

db.rocketchat_message.find({ '$text': { '$search': 'type=www' }, t: { '$ne': 'rm' }, _hidden: { '$ne': true }, rid: 'LkgTmX2dCncp5Rxtcx2Hj2YYiyyK49zj9i' })

In fact, even searching “www” is very slow but “wwww” is not. Furthermore, despite having created a compound index and added rid to the text index I found the order of the columns makes a big difference. When I created this, it would search text fields first (on 6 million records) and then filter by rid which is silly because rid only matches 257 of those:

db.rocketchat_message.createIndex( {"msg" : "text",  "rid" : 1 } )

This led me to recreate the index as such:

db.rocketchat_message.createIndex( { "rid" : 1, "msg" : "text" } )

This is now much faster but obviously fails completely when rid is not included. That makes the index useless for querying the entire table and since Mongodb only allows one text index per table I’m a bit stuck there.

At any rate, I’m wondering I can get some specific input into why my initial queries are performing so badly on shorter word lengths. Here is an explain that might help shed some light:

db.rocketchat_message.find({ '$text': { '$search': 'https://www.example.com' }, t: { '$ne': 'rm' }, _hidden: { '$ne': true }, rid: 'LkgTmX2dCncp5Rxtcx2Hj2YYiyyK49zj9i' }).explain('executionStats')
{
  explainVersion: '1',
  queryPlanner: {
    namespace: 'rocketchat.rocketchat_message',
    indexFilterSet: false,
    parsedQuery: {
      '$and': [
        { rid: { '$eq': 'LkgTmX2dCncp5Rxtcx2Hj2YYiyyK49zj9i' } },
        { _hidden: { '$not': { '$eq': true } } },
        { t: { '$not': { '$eq': 'rm' } } },
        {
          '$text': {
            '$search': 'https://www.example.com',
            '$language': 'english',
            '$caseSensitive': false,
            '$diacriticSensitive': false
          }
        }
      ]
    },
    maxIndexedOrSolutionsReached: false,
    maxIndexedAndSolutionsReached: false,
    maxScansToExplodeReached: false,
    winningPlan: {
      stage: 'FETCH',
      filter: {
        '$and': [
          { _hidden: { '$not': { '$eq': true } } },
          { t: { '$not': { '$eq': 'rm' } } }
        ]
      },
      inputStage: {
        stage: 'TEXT_MATCH',
        indexPrefix: {},
        indexName: 'msg_text_rid_1',
        parsedTextQuery: {
          terms: [ 'com', 'exampl', 'https', 'www' ],
          negatedTerms: [],
          phrases: [],
          negatedPhrases: []
        },
        textIndexVersion: 3,
        inputStage: {
          stage: 'FETCH',
          inputStage: {
            stage: 'OR',
            filter: { rid: { '$eq': 'LkgTmX2dCncp5Rxtcx2Hj2YYiyyK49zj9i' } },
            inputStages: [
              {
                stage: 'IXSCAN',
                keyPattern: { _fts: 'text', _ftsx: 1, rid: 1 },
                indexName: 'msg_text_rid_1',
                isMultiKey: true,
                isUnique: false,
                isSparse: false,
                isPartial: false,
                indexVersion: 2,
                direction: 'backward',
                indexBounds: {}
              },
              {
                stage: 'IXSCAN',
                keyPattern: { _fts: 'text', _ftsx: 1, rid: 1 },
                indexName: 'msg_text_rid_1',
                isMultiKey: true,
                isUnique: false,
                isSparse: false,
                isPartial: false,
                indexVersion: 2,
                direction: 'backward',
                indexBounds: {}
              },
              {
                stage: 'IXSCAN',
                keyPattern: { _fts: 'text', _ftsx: 1, rid: 1 },
                indexName: 'msg_text_rid_1',
                isMultiKey: true,
                isUnique: false,
                isSparse: false,
                isPartial: false,
                indexVersion: 2,
                direction: 'backward',
                indexBounds: {}
              },
              {
                stage: 'IXSCAN',
                keyPattern: { _fts: 'text', _ftsx: 1, rid: 1 },
                indexName: 'msg_text_rid_1',
                isMultiKey: true,
                isUnique: false,
                isSparse: false,
                isPartial: false,
                indexVersion: 2,
                direction: 'backward',
                indexBounds: {}
              }
            ]
          }
        }
      }
    },
    rejectedPlans: []
  },
  executionStats: {
    executionSuccess: true,
    nReturned: 2,
    executionTimeMillis: 47743,
    totalKeysExamined: 13301404,
    totalDocsExamined: 4,
    executionStages: {
      stage: 'FETCH',
      filter: {
        '$and': [
          { _hidden: { '$not': { '$eq': true } } },
          { t: { '$not': { '$eq': 'rm' } } }
        ]
      },
      nReturned: 2,
      executionTimeMillisEstimate: 15933,
      works: 13301408,
      advanced: 2,
      needTime: 13301405,
      needYield: 0,
      saveState: 13308,
      restoreState: 13308,
      isEOF: 1,
      docsExamined: 2,
      alreadyHasObj: 2,
      inputStage: {
        stage: 'TEXT_MATCH',
        nReturned: 2,
        executionTimeMillisEstimate: 15717,
        works: 13301408,
        advanced: 2,
        needTime: 13301405,
        needYield: 0,
        saveState: 13308,
        restoreState: 13308,
        isEOF: 1,
        indexPrefix: {},
        indexName: 'msg_text_rid_1',
        parsedTextQuery: {
          terms: [ 'com', 'exampl', 'https', 'www' ],
          negatedTerms: [],
          phrases: [],
          negatedPhrases: []
        },
        textIndexVersion: 3,
        docsRejected: 0,
        inputStage: {
          stage: 'FETCH',
          nReturned: 2,
          executionTimeMillisEstimate: 15470,
          works: 13301408,
          advanced: 2,
          needTime: 13301405,
          needYield: 0,
          saveState: 13308,
          restoreState: 13308,
          isEOF: 1,
          docsExamined: 2,
          alreadyHasObj: 0,
          inputStage: {
            stage: 'OR',
            filter: { rid: { '$eq': 'LkgTmX2dCncp5Rxtcx2Hj2YYiyyK49zj9i' } },
            nReturned: 2,
            executionTimeMillisEstimate: 15309,
            works: 13301408,
            advanced: 2,
            needTime: 13301405,
            needYield: 0,
            saveState: 13308,
            restoreState: 13308,
            isEOF: 1,
            dupsTested: 13301404,
            dupsDropped: 6824735,
            inputStages: [
              {
                stage: 'IXSCAN',
                nReturned: 351631,
                executionTimeMillisEstimate: 214,
                works: 351632,
                advanced: 351631,
                needTime: 0,
                needYield: 0,
                saveState: 13308,
                restoreState: 13308,
                isEOF: 1,
                keyPattern: { _fts: 'text', _ftsx: 1, rid: 1 },
                indexName: 'msg_text_rid_1',
                isMultiKey: true,
                isUnique: false,
                isSparse: false,
                isPartial: false,
                indexVersion: 2,
                direction: 'backward',
                indexBounds: {},
                keysExamined: 351631,
                seeks: 1,
                dupsTested: 351631,
                dupsDropped: 0
              },
              {
                stage: 'IXSCAN',
                nReturned: 271,
                executionTimeMillisEstimate: 0,
                works: 272,
                advanced: 271,
                needTime: 0,
                needYield: 0,
                saveState: 13308,
                restoreState: 13308,
                isEOF: 1,
                keyPattern: { _fts: 'text', _ftsx: 1, rid: 1 },
                indexName: 'msg_text_rid_1',
                isMultiKey: true,
                isUnique: false,
                isSparse: false,
                isPartial: false,
                indexVersion: 2,
                direction: 'backward',
                indexBounds: {},
                keysExamined: 271,
                seeks: 1,
                dupsTested: 271,
                dupsDropped: 0
              },
              {
                stage: 'IXSCAN',
                nReturned: 6476186,
                executionTimeMillisEstimate: 5115,
                works: 6476187,
                advanced: 6476186,
                needTime: 0,
                needYield: 0,
                saveState: 13308,
                restoreState: 13308,
                isEOF: 1,
                keyPattern: { _fts: 'text', _ftsx: 1, rid: 1 },
                indexName: 'msg_text_rid_1',
                isMultiKey: true,
                isUnique: false,
                isSparse: false,
                isPartial: false,
                indexVersion: 2,
                direction: 'backward',
                indexBounds: {},
                keysExamined: 6476186,
                seeks: 1,
                dupsTested: 6476186,
                dupsDropped: 0
              },
              {
                stage: 'IXSCAN',
                nReturned: 6473316,
                executionTimeMillisEstimate: 4796,
                works: 6473317,
                advanced: 6473316,
                needTime: 0,
                needYield: 0,
                saveState: 13308,
                restoreState: 13308,
                isEOF: 1,
                keyPattern: { _fts: 'text', _ftsx: 1, rid: 1 },
                indexName: 'msg_text_rid_1',
                isMultiKey: true,
                isUnique: false,
                isSparse: false,
                isPartial: false,
                indexVersion: 2,
                direction: 'backward',
                indexBounds: {},
                keysExamined: 6473316,
                seeks: 1,
                dupsTested: 6473316,
                dupsDropped: 0
              }
            ]
          }
        }
      }
    }
  },
  command: {
    find: 'rocketchat_message',
    filter: {
      '$text': { '$search': 'https://www.example.com' },
      t: { '$ne': 'rm' },
      _hidden: { '$ne': true },
      rid: 'LkgTmX2dCncp5Rxtcx2Hj2YYiyyK49zj9i'
    },
    '$db': 'rocketchat'
  },
  serverInfo: {
    host: 'chat',
    port: 27017,
    version: '5.0.15',
    gitVersion: '935639beed3d0c19c2551c93854b831107c0b118'
  },
  serverParameters: {
    internalQueryFacetBufferSizeBytes: 104857600,
    internalQueryFacetMaxOutputDocSizeBytes: 104857600,
    internalLookupStageIntermediateDocumentMaxSizeBytes: 104857600,
    internalDocumentSourceGroupMaxMemoryBytes: 104857600,
    internalQueryMaxBlockingSortMemoryUsageBytes: 104857600,
    internalQueryProhibitBlockingMergeOnMongoS: 0,
    internalQueryMaxAddToSetBytes: 104857600,
    internalDocumentSourceSetWindowFieldsMaxMemoryBytes: 104857600
  },
  ok: 1,
  '$clusterTime': {
    clusterTime: Timestamp({ t: 1678285569, i: 1 }),
    signature: {
      hash: Binary(Buffer.from("e07e7a674340cdc236b59f232718bd4694bb0321", "hex"), 0),
      keyId: Long("7176963756902055940")
    }
  },
  operationTime: Timestamp({ t: 1678285569, i: 1 })
}