Mongo Cluster: Node unable to join the cluster after failure

Ronak_Shah · October 25, 2023, 7:36am

Created a cluster with 3 nodes (A - Primary, B, C). Everything worked fine. Now,

Node C goes down
Node C comes up and starts replicating from primary node A.
While replication is ongoing, node A (primary node) goes down.
We receive an error in PyMongo saying: “primary stepped down while waiting for replication”
Node A comes up.
But now, Node C is unable to join the cluster and the cluster status command shows only 2 nodes (A and B).

How can we recover the cluster and make node C functional?

Aasawari · November 6, 2023, 1:33pm

Hi @Ronak_Shah and welcome to the MongoDB community forums!

Based on the deployment configuration above, I ran the code below to insert sample data in a continuous loop while shutting down first a secondary and then the primary, as you described.

The sample code:

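import random

from pymongo import MongoClient

# Replica set name and member addresses of the local test deployment
replica_set_name = "rs02"
host1 = 'localhost:6000'
host2 = 'localhost:6001'
host3 = 'localhost:6002'
# Listing all members lets the driver discover the current primary
connection_string = f"mongodb://{host1},{host2},{host3}/?replicaSet={replica_set_name}"

try:
    client = MongoClient(connection_string)
    db = client['test']
    collection = db['sample']

    # Insert random numbers until an error occurs or the script is interrupted
    while True:
        random_number = random.randint(1, 100)
        collection.insert_one({'random_number': random_number})
        print(f"Inserted: {random_number}")

except Exception as e:
    # The try block wraps the whole loop, so the first error ends the script
    print(f"An error occurred: {e}")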
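Note that the script above stops at the first exception, because the try block wraps the entire loop. If you want the writer to keep going across a failover, one option is to retry each insert. The following is only a minimal sketch, not a definitive implementation: `insert_with_retry` and `main` are hypothetical helper names, the attempt count and delay are arbitrary, and I am assuming the step-down surfaces as a `pymongo.errors.AutoReconnect` (its subclass `NotPrimaryError` covers "not primary" responses in recent PyMongo versions). If your application sees a different exception class, add it to the tuple.

```python
import time


def insert_with_retry(insert_fn, doc, retryable, attempts=5, delay=1.0, sleep=time.sleep):
    """Call insert_fn(doc); retry when one of the `retryable` exceptions is raised.

    Hypothetical helper: re-raises the last error once `attempts` is exhausted.
    """
    for attempt in range(1, attempts + 1):
        try:
            return insert_fn(doc)
        except retryable as exc:
            if attempt == attempts:
                raise
            print(f"Attempt {attempt} failed ({exc}); retrying in {delay}s")
            sleep(delay)


def main():
    # Needs pymongo and a running replica set; hosts and names match the sample above.
    import random
    from pymongo import MongoClient
    from pymongo.errors import AutoReconnect  # assumption: failover raises this or a subclass

    client = MongoClient(
        "mongodb://localhost:6000,localhost:6001,localhost:6002/?replicaSet=rs02"
    )
    collection = client["test"]["sample"]
    while True:
        n = random.randint(1, 100)
        insert_with_retry(collection.insert_one, {"random_number": n}, (AutoReconnect,))
        print(f"Inserted: {n}")
```

Call `main()` to run it against the replica set. One caveat: `insert_one` adds an `_id` to the document on the first attempt, so if that attempt was actually applied before the error was reported, a retry can fail with a duplicate key error. Recent drivers also retry certain writes once on their own, so an application-level loop like this is only a fallback.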

While the insertion was being performed, I tried the following steps:

I shut down one of the secondaries; the insertion continued on the primary node.
When the secondary came back up, I shut down the primary node.
Ideally, when the primary goes down, an election takes place immediately and one of the secondaries is elected as the new primary.
The insertion still continued, and the replica set status showed one primary and one secondary.
When I restarted the primary, it resynced with the replica set and stayed as primary node.
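To check the membership after each of the steps above from Python rather than the shell, you could read the output of the `replSetGetStatus` admin command. This is a rough sketch under a few assumptions: `fetch_status`, `summarize_members` and `missing_members` are hypothetical helper names, and the code relies only on the `members`, `name`, `stateStr` and `health` fields of the command reply.

```python
def fetch_status(uri):
    """Run replSetGetStatus against the deployment (needs pymongo and a live replica set)."""
    from pymongo import MongoClient

    client = MongoClient(uri)
    return client.admin.command("replSetGetStatus")


def summarize_members(status):
    """Return a (name, stateStr, health) tuple for every member in the reply."""
    return [
        (m.get("name"), m.get("stateStr"), m.get("health"))
        for m in status.get("members", [])
    ]


def missing_members(status, expected_hosts):
    """Return the expected hosts that are absent from the reply or reported unhealthy."""
    healthy = {
        m.get("name") for m in status.get("members", []) if m.get("health") == 1
    }
    return [host for host in expected_hosts if host not in healthy]
```

For example, `missing_members(fetch_status(connection_string), [host1, host2, host3])` would list any node that has dropped out or is unreachable, which is the situation described for node C in the original post.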

Please note that the test was conducted on MongoDB version 6.0.5 and that all shutdowns were graceful.

To triage your issue further, could you share the following information:

Sample PyMongo code that reproduces the issue.
Are you facing the issue outside the Python application as well?
How does the node go down?
Do you see any error messages in the logs for the primary or secondary?
What is the MongoDB version you are using?

Regards
Aasawari