Primary and arbiter are down at the same time

Hi,

We have a three-node MongoDB replica set: one primary, one secondary, and an arbiter. The arbiter and the primary went down at the same time. How do we deal with this situation? How do we promote the leftover secondary to primary?

Thanks,
Raja

You can force a reconfiguration of the replica set with rs.reconfig() and {force: true}, or restart the node without the replSet configuration/flags so it comes up as a standalone.
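
For reference, a minimal sketch of both options (the member index, port, and config path below are assumptions, so check rs.conf() and your own install before running anything):

# Option 1: force-reconfig the surviving secondary to be the only member.
mongo --port 27017 --eval '
  var cfg = rs.conf();
  cfg.members = [cfg.members[1]];   // keep only the survivor; verify this index first
  rs.reconfig(cfg, { force: true });
'

# Option 2: restart the node as a standalone by commenting out the
# "replication:" section (or removing the --replSet flag) in /etc/mongod.conf, then:
sudo systemctl restart mongod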

I see Doug typing; let’s see what he says.

:wave: Hi Raja and welcome to the community.

The arbiter is a lightweight server and should be able to come up quickly. I would look at the arbiter’s logs to figure out why it’s down and get it back up, as that will be your quickest option.
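
A quick sketch of what I would check first on the arbiter host (paths assume a default Linux package install; adjust to yours):

sudo systemctl status mongod                   # did the service fail to start?
sudo journalctl -u mongod -n 50                # recent service-level messages
sudo tail -n 100 /var/log/mongodb/mongod.log   # mongod's own reason for stopping
df -h                                          # a full disk is a common cause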

You could force your single secondary to come up as a lone member, but you will then be in more trouble should that server go down while you’re trying to get the replica set back up and running. If this is production, I would not recommend this path.


I would recommend not keeping the arbiter on the primary or secondary host. It’s a very lightweight service; you can run it on any application server if that’s possible for you.
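
For example, something like this on the application server (the hostnames, port, and paths here are placeholders, not from this thread):

mkdir -p /srv/mongo-arbiter
mongod --replSet rs0 --port 27018 --dbpath /srv/mongo-arbiter \
       --fork --logpath /srv/mongo-arbiter/arbiter.log
# then, from a mongo shell connected to the primary:
mongo --host primary.example.com --eval 'rs.addArb("appserver.example.com:27018")'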

Hi Doug,

I’m reviving this thread because I’m having the same problem and I couldn’t solve it with what I found here.
I’m trying to get the arbiter back up, but with no success.

I have 3 data nodes and 1 arbiter. The two active nodes show as SECONDARY, and the node that was primary is unreachable.
The arbiter is also down, and when I check its status I get:

Jan 12 16:21:36 AXISMEDBRLNX16 mongod[2536]: To see additional information in this output, start without the "--fork" option.
Jan 12 16:21:36 AXISMEDBRLNX16 systemd[1]: mongod.service: control process exited, code=exited status=1
Jan 12 16:21:36 AXISMEDBRLNX16 systemd[1]: Failed to start MongoDB Database Server.
Jan 12 16:21:36 AXISMEDBRLNX16 systemd[1]: Unit mongod.service entered failed state.
Jan 12 16:21:36 AXISMEDBRLNX16 systemd[1]: mongod.service failed.
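
Per the hint in that first line, one way to surface the real startup error is to run mongod in the foreground; a minimal sketch, assuming the stock config at /etc/mongod.conf with processManagement.fork temporarily set to false (the service user may be mongod or mongodb depending on the distro):

sudo -u mongod mongod --config /etc/mongod.conf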

If I check the log:

2021-12-25T08:41:26.560-0300 W NETWORK [LogicalSessionCacheRefresh] Unable to reach primary for set rs0
2021-12-25T08:41:27.066-0300 W NETWORK [LogicalSessionCacheRefresh] Unable to reach primary for set rs0
2021-12-25T08:41:27.572-0300 W NETWORK [LogicalSessionCacheRefresh] Unable to reach primary for set rs0
2021-12-25T08:41:28.023-0300 I CONTROL [signalProcessingThread] got signal 15 (Terminated), will terminate after current cmd ends
2021-12-25T08:41:28.114-0300 I NETWORK [signalProcessingThread] shutdown: going to close listening sockets...
2021-12-25T08:41:28.114-0300 I NETWORK [signalProcessingThread] removing socket file: /tmp/mongodb-27017.sock
2021-12-25T08:41:28.257-0300 I NETWORK [LogicalSessionCacheRefresh] Marking host "hostname" as failed :: caused by :: ShutdownInProgress: Can't use connection pool during shutdown
2021-12-25T08:41:28.378-0300 W NETWORK [LogicalSessionCacheRefresh] Unable to reach primary for set rs0
2021-12-25T08:41:28.378-0300 I NETWORK [LogicalSessionCacheRefresh] Cannot reach any nodes for set rs0. Please check network connectivity and the status of the set. This has happened for 1 checks in a row.
2021-12-25T08:41:28.379-0300 I CONTROL [LogicalSessionCacheRefresh] Sessions collection is not set up; waiting until next sessions refresh interval: Server is shutting down
2021-12-25T08:41:28.381-0300 I REPL [signalProcessingThread] shutting down replication subsystems
2021-12-25T08:41:28.488-0300 I ASIO [Replication] Killing all outstanding egress activity.
2021-12-25T08:41:28.489-0300 I ASIO [Replication] Dropping all pooled connections to "hostname" due to ShutdownInProgress: Shutting down the connection pool
2021-12-25T08:41:28.489-0300 I ASIO [Replication] Dropping all pooled connections to "hostname" due to ShutdownInProgress: Shutting down the connection pool
2021-12-25T08:41:28.489-0300 I ASIO [Replication] Dropping all pooled connections to "hostname" due to ShutdownInProgress: Shutting down the connection pool
2021-12-25T08:41:28.624-0300 I ASIO [ReplicaSetMonitor-TaskExecutor] Killing all outstanding egress activity.
2021-12-25T08:41:28.736-0300 I CONTROL [signalProcessingThread] Shutting down free monitoring
2021-12-25T08:41:28.859-0300 I FTDC [signalProcessingThread] Shutting down full-time diagnostic data capture
2021-12-25T08:41:29.110-0300 I STORAGE [signalProcessingThread] WiredTigerKVEngine shutting down
2021-12-25T08:41:29.391-0300 I STORAGE [signalProcessingThread] Downgrading WiredTiger datafiles.
2021-12-25T08:41:30.047-0300 I NETWORK [conn21] end connection 172.30.5.40:41537 (28 connections now open)
2021-12-25T08:41:30.160-0300 I NETWORK [conn22] end connection 172.30.5.40:40279 (27 connections now open)
2021-12-25T08:41:30.508-0300 I NETWORK [conn23] end connection 172.30.5.40:45159 (26 connections now open)
2021-12-25T08:41:30.962-0300 I NETWORK [conn24] end connection 172.30.5.40:43673 (25 connections now open)
2021-12-25T08:41:31.391-0300 I NETWORK [conn25] end connection 172.30.5.41:41276 (24 connections now open)
2021-12-25T08:41:31.610-0300 I NETWORK [conn27] end connection 172.30.5.40:38133 (23 connections now open)
2021-12-25T08:41:31.610-0300 I NETWORK [conn26] end connection 172.30.5.40:37830 (22 connections now open)
2021-12-25T08:41:31.646-0300 I STORAGE [signalProcessingThread] WiredTiger message [1640432491:646769][13015:0x7f2ae1f51700], txn-recover: Main recovery loop: starting at 87/50529920 to 88/256
2021-12-25T08:41:31.758-0300 I STORAGE [signalProcessingThread] WiredTiger message [1640432491:758370][13015:0x7f2ae1f51700], txn-recover: Recovering log 87 through 88
2021-12-25T08:41:31.799-0300 I STORAGE [signalProcessingThread] WiredTiger message [1640432491:799861][13015:0x7f2ae1f51700], txn-recover: Recovering log 88 through 88
2021-12-25T08:41:31.856-0300 I STORAGE [signalProcessingThread] WiredTiger message [1640432491:856937][13015:0x7f2ae1f51700], txn-recover: Set global recovery timestamp: 0
2021-12-25T08:41:31.971-0300 I STORAGE [signalProcessingThread] shutdown: removing fs lock...
2021-12-25T08:41:31.971-0300 I CONTROL [signalProcessingThread] now exiting
2021-12-25T08:41:31.971-0300 I CONTROL [signalProcessingThread] shutting down with code:0

Any idea how I can bring it up?