Hi all,
We have a sharded cluster with 5 mongos/config servers and 4 shard replica sets, all deployed across datacenters. Each shard is a replica set with 1 primary and 2 secondary members. The database in question is not sharded, so all of its data lives on a single shard. We recently attempted to restore about 40,000 documents into one of its collections using mongorestore, but this caused that shard's replica set to fail.
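For reference, the restore was invoked roughly like this (the host, database, and collection names below are placeholders, not the actual ones we used):

```sh
# Approximate mongorestore invocation through one of the mongos routers
# (placeholder host/db/collection names)
mongorestore \
  --host mongos01.example.com --port 27017 \
  --db mydb --collection mycollection \
  dump/mydb/mycollection.bson
```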
For some reason, running mongorestore caused replication on that replica set to stop working. We got the error “Host failed in replica set”: the primary went down and the replica set failed to elect a new primary, so the cluster was down and no users could access the database during that time. As soon as I stopped the restore job, the cluster returned to normal and was able to elect a primary.
I have looked through the logs but can’t find anything that indicates what exactly caused this; we have run similar restore jobs before without any issues.
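While troubleshooting I checked the replica set state with commands along these lines (the shard host name is a placeholder, and these are reconstructed from memory rather than copied from my terminal):

```sh
# Inspect member states and replication lag on the affected shard
mongosh "mongodb://shard01-a.example.com:27018" --eval "rs.status()"
mongosh "mongodb://shard01-a.example.com:27018" --eval "rs.printSecondaryReplicationInfo()"
```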
Any help with this would be greatly appreciated.