Mongod shutting down

Nicolas_Riesco · July 27, 2020, 9:16pm

Today my mongod instance just shut down, no idea why. I’ve been working with Mongodb for several years, but I’ve never seen this.

The instance is running in Digital Ocean and the applications that connect to this instance are in the same server.

I’m running version 4.2.8 on debian:

4.19.0-9-amd64 #1 SMP Debian 4.19.118-2+deb10u1 (2020-06-07) x86_64 GNU/Linux

Mongod:

db version v4.2.8
git version: 43d25964249164d76d5e04dd6cf38f6111e21f5f
OpenSSL version: OpenSSL 1.1.1d  10 Sep 2019
allocator: tcmalloc
modules: none
build environment:
    distmod: debian10
    distarch: x86_64
    target_arch: x86_64

The logs shows this but there seems to be no reason for the shutdown:

2020-07-27T19:01:32.628+0000 I  COMMAND  [conn3146] command smartcob_3_200.importados appName: "mongodump" command: getMore { getMore: 1975688055802701079, collection: "importados", lsid: { id: UUID("e386989c-4ca0-4007-9806-d611e20d3b04") }, $db: "smartcob_3_200" } originatingCommand: { find: "importados", filter: {}, lsid: { id: UUID("e386989c-4ca0-4007-9806-d611e20d3b04") }, $db: "smartcob_3_200" } planSummary: COLLSCAN cursorid:1975688055802701079 keysExamined:0 docsExamined:11307 numYields:89 nreturned:11307 reslen:16776566 locks:{ ReplicationStateTransition: { acquireCount: { w: 90 } }, Global: { acquireCount: { r: 90 } }, Database: { acquireCount: { r: 90 } }, Collection: { acquireCount: { r: 90 } }, Mutex: { acquireCount: { r: 1 } } } storage:{ data: { bytesRead: 16761457, timeReadingMicros: 152636 } } protocol:op_msg 172ms
2020-07-27T19:01:32.871+0000 I  COMMAND  [conn3144] command smartcob_3_200.importados appName: "mongodump" command: getMore { getMore: 1975688055802701079, collection: "importados", lsid: { id: UUID("e386989c-4ca0-4007-9806-d611e20d3b04") }, $db: "smartcob_3_200" } originatingCommand: { find: "importados", filter: {}, lsid: { id: UUID("e386989c-4ca0-4007-9806-d611e20d3b04") }, $db: "smartcob_3_200" } planSummary: COLLSCAN cursorid:1975688055802701079 keysExamined:0 docsExamined:11327 numYields:88 nreturned:11327 reslen:16775970 locks:{ ReplicationStateTransition: { acquireCount: { w: 89 } }, Global: { acquireCount: { r: 89 } }, Database: { acquireCount: { r: 89 } }, Collection: { acquireCount: { r: 89 } }, Mutex: { acquireCount: { r: 1 } } } storage:{ data: { bytesRead: 16797002, timeReadingMicros: 136238 } } protocol:op_msg 154ms
2020-07-27T19:01:33.043+0000 I  NETWORK  [conn3145] end connection 127.0.0.1:49374 (113 connections now open)
2020-07-27T19:01:33.043+0000 I  NETWORK  [conn3146] end connection 127.0.0.1:49376 (112 connections now open)
2020-07-27T19:01:33.043+0000 I  NETWORK  [conn3144] end connection 127.0.0.1:49372 (111 connections now open)
2020-07-27T19:01:33.044+0000 I  NETWORK  [conn3143] end connection 127.0.0.1:49370 (110 connections now open)
2020-07-27T19:01:33.044+0000 I  NETWORK  [conn3142] end connection 127.0.0.1:49368 (109 connections now open)
2020-07-27T19:01:47.971+0000 I  CONTROL  [signalProcessingThread] got signal 15 (Terminated), will terminate after current cmd ends
2020-07-27T19:01:48.084+0000 I  NETWORK  [signalProcessingThread] shutdown: going to close listening sockets...
2020-07-27T19:01:48.084+0000 I  NETWORK  [listener] removing socket file: /tmp/mongodb-27017.sock
2020-07-27T19:01:48.087+0000 I  -        [signalProcessingThread] Stopping further Flow Control ticket acquisitions.
2020-07-27T19:01:48.092+0000 I  CONTROL  [signalProcessingThread] Shutting down free monitoring
2020-07-27T19:01:48.094+0000 I  FTDC     [signalProcessingThread] Shutting down full-time diagnostic data capture
2020-07-27T19:01:48.113+0000 I  STORAGE  [signalProcessingThread] Deregistering all the collections
2020-07-27T19:01:49.777+0000 I  STORAGE  [signalProcessingThread] Timestamp monitor shutting down
2020-07-27T19:01:49.780+0000 I  STORAGE  [signalProcessingThread] WiredTigerKVEngine shutting down
2020-07-27T19:01:49.780+0000 I  STORAGE  [signalProcessingThread] Shutting down session sweeper thread
2020-07-27T19:01:49.780+0000 I  STORAGE  [signalProcessingThread] Finished shutting down session sweeper thread
2020-07-27T19:01:49.780+0000 I  STORAGE  [signalProcessingThread] Shutting down journal flusher thread
2020-07-27T19:01:49.860+0000 I  STORAGE  [signalProcessingThread] Finished shutting down journal flusher thread
2020-07-27T19:01:49.860+0000 I  STORAGE  [signalProcessingThread] Shutting down checkpoint thread
2020-07-27T19:01:49.861+0000 I  STORAGE  [signalProcessingThread] Finished shutting down checkpoint thread
2020-07-27T19:01:49.889+0000 I  STORAGE  [signalProcessingThread] Downgrading WiredTiger datafiles.
2020-07-27T19:01:51.455+0000 I  STORAGE  [signalProcessingThread] WiredTiger message [1595876511:455212][444:0x7f8ba7e62700], txn-recover: Recovering log 999 through 1000
2020-07-27T19:01:51.460+0000 I  STORAGE  [signalProcessingThread] WiredTiger message [1595876511:460761][444:0x7f8ba7e62700], txn-recover: Recovering log 1000 through 1000
2020-07-27T19:01:51.664+0000 I  STORAGE  [signalProcessingThread] WiredTiger message [1595876511:664796][444:0x7f8ba7e62700], txn-recover: Main recovery loop: starting at 999/99623296 to 1000/256
2020-07-27T19:01:51.740+0000 I  STORAGE  [signalProcessingThread] WiredTiger message [1595876511:740506][444:0x7f8ba7e62700], txn-recover: Recovering log 999 through 1000
2020-07-27T19:01:51.745+0000 I  STORAGE  [signalProcessingThread] WiredTiger message [1595876511:745630][444:0x7f8ba7e62700], txn-recover: Recovering log 1000 through 1000
2020-07-27T19:01:51.813+0000 I  STORAGE  [signalProcessingThread] WiredTiger message [1595876511:813085][444:0x7f8ba7e62700], txn-recover: Set global recovery timestamp: (0, 0)
2020-07-27T19:01:51.844+0000 I  STORAGE  [signalProcessingThread] shutdown: removing fs lock...
2020-07-27T19:01:52.945+0000 I  CONTROL  [signalProcessingThread] now exiting
2020-07-27T19:01:52.946+0000 I  CONTROL  [signalProcessingThread] shutting down with code:0

I’m totally lost. Any help would be appreciated.

Pavel_Duchovny · July 29, 2020, 5:43am

Hi @Nicolas_Riesco,

The signal 15 means that someone or something issued a graceful kill command

CONTROL  [signalProcessingThread] got signal 15 (Terminated), will terminate after current cmd ends

Check /var/log/messages or other system logs to understand who issued the command.

Can it be part of cloud maintenance?

Unfortunately the logs won’t reveal much, thus database audit is useful as part of the enterprise version.

Thanks
Pavel

Nicolas_Riesco · July 29, 2020, 4:31pm

Thanks for your response @Pavel_Duchovny

I’m starting to thing that the shutdown was not the problem. I did reboot the server to avoid service interruption, this is actually not a “Mongo shutting down” issue.

The lines before the shutdown show appName: "mongodump" and I do have a mongodump script running every x hours.

So my problem should be that somehow mongodump causes the db to be unresponsive, so I should move to another topic on how to make my backups so they don’t interrupt any services.

Thanks a lot!

As I dig into the logs the problem might be another one, it might be just before this, when doing a backup of the DB.

Pavel_Duchovny · July 29, 2020, 4:39pm

@Nicolas_Riesco,

Is that a replica set or a standalone host?

Using mongodump for backups might be tricky as it needs to read all data through the memory. Also you need to export the oplog to make it consistent.

If that is a replica set I would recommend using filesystem snapshots instead , and consider using our cloud manager backup for long term

Best regards
Pavel