Mongo: 4.4.5
Mongoose: 6.6.0
Node: 14.20.1 (bullseye-slim)
Docker: 20.10.8
Problem
We have a Docker Swarm with multiple servers and a Mongo stack with 3 replica set members, each on a different server.
Our rs1 (primary) member goes down due to a Docker restart. When this happens an election occurs and rs3 is selected as the new primary. After rs1 is re-elected a few seconds later, some of our backend service replicas receive the following error when making queries:
MongooseServerSelectionError: Server selection timed out after 30000 ms
    at Function.Model.$wrapCallback (/usr/share/backend-server/node_modules/mongoose/lib/model.js:5192:32)
    at /usr/share/backend-server/node_modules/mongoose/lib/query.js:4901:21
    at /usr/share/backend-server/node_modules/mongoose/lib/helpers/promiseOrCallback.js:41:5
    at new Promise (<anonymous>)
    at promiseOrCallback (/usr/share/backend-server/node_modules/mongoose/lib/helpers/promiseOrCallback.js:40:10)
    at model.Query.exec (/usr/share/backend-server/node_modules/mongoose/lib/query.js:4900:10)
    at model.Query.Query.then (/usr/share/backend-server/node_modules/mongoose/lib/query.js:4983:15) {
  reason: TopologyDescription {
    type: 'ReplicaSetNoPrimary',
    servers: Map(3) {
      'rs1:27017' => [ServerDescription],
      'rs2:27017' => [ServerDescription],
      'rs3:27017' => [ServerDescription]
    },
    stale: false,
    compatible: true,
    heartbeatFrequencyMS: 10000,
    localThresholdMS: 15,
    setName: 'rs0',
    maxElectionId: new ObjectId("7fffffff0000000000000053"),
    maxSetVersion: 1,
    commonWireVersion: 0,
    logicalSessionTimeoutMinutes: 30
  },
  code: undefined
}
Upon checking the status of the replicas, rs1 is the primary and most of our backend clients fulfill queries correctly. We have seen this issue occur on two separate occasions and we are unsure why some Mongoose clients are unable to find the primary replica after the election. We have been unable to recreate the error intentionally.
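For reference, this is roughly the check we run from a mongo shell when we say the replica set looks healthy again (using the admin credentials from the compose file below):

// Connect to any member, e.g.:
//   mongo --host rs1 -u admin -p password --authenticationDatabase admin
rs.status().members.forEach(function (m) {
  print(m.name, m.stateStr, m.health);
});
// After the incident this reports rs1 as PRIMARY and rs2/rs3 as SECONDARY,
// even while some Mongoose clients keep throwing ReplicaSetNoPrimary.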
Our Setup
docker-compose.yml:
The image being used is based on MongoDB 4.4.5.
version: '3.3'

secrets:
  mongo_cluster_key:
    external: true

services:
  rs1:
    image: mongodb-custom:v1.0.0
    command: mongod --keyFile /run/secrets/mongo_cluster_key --replSet "rs0"
    networks:
      - mongo
    ports:
      - 27017:27017
    secrets:
      - source: mongo_cluster_key
        target: mongo_cluster_key
        uid: '999'
        gid: '999'
        mode: 0400
    environment:
      - MONGO_INITDB_ROOT_USERNAME=admin
      - MONGO_INITDB_ROOT_PASSWORD=password
      - MONGO_INITDB_DATABASE=admin
      - MAIN_MONGO_DB_NAME=testing
      - MAIN_MONGO_DB_USERNAME=test
      - MAIN_MONGO_DB_PASSWORD=password
      - MAIN_MONGO_DB_ROLE=readWrite
    deploy:
      replicas: 1
    volumes:
      - rs1:/data/db
      - rs1:/data/configdb

  rs2:
    image: mongodb-custom:v1.0.0
    command: mongod --keyFile /run/secrets/mongo_cluster_key --replSet "rs0"
    networks:
      - mongo
    secrets:
      - source: mongo_cluster_key
        target: mongo_cluster_key
        uid: '999'
        gid: '999'
        mode: 0400
    environment:
      - MONGO_INITDB_ROOT_USERNAME=admin
      - MONGO_INITDB_ROOT_PASSWORD=password
      - MONGO_INITDB_DATABASE=admin
      - MAIN_MONGO_DB_NAME=testing
      - MAIN_MONGO_DB_USERNAME=test
      - MAIN_MONGO_DB_PASSWORD=password
      - MAIN_MONGO_DB_ROLE=readWrite
    deploy:
      replicas: 1
    volumes:
      - rs2:/data/db
      - rs2:/data/configdb

  rs3:
    image: mongodb-custom:v1.0.0
    command: mongod --keyFile /run/secrets/mongo_cluster_key --replSet "rs0"
    networks:
      - mongo
    secrets:
      - source: mongo_cluster_key
        target: mongo_cluster_key
        uid: '999'
        gid: '999'
        mode: 0400
    environment:
      - MONGO_INITDB_ROOT_USERNAME=admin
      - MONGO_INITDB_ROOT_PASSWORD=password
      - MONGO_INITDB_DATABASE=admin
      - MAIN_MONGO_DB_NAME=testing
      - MAIN_MONGO_DB_USERNAME=test
      - MAIN_MONGO_DB_PASSWORD=password
      - MAIN_MONGO_DB_ROLE=readWrite
    deploy:
      replicas: 1
    volumes:
      - rs3:/data/db
      - rs3:/data/configdb

  rs:
    image: mongodb-custom:v1.0.0
    command: /usr/local/bin/replica-init.sh
    networks:
      - mongo
    secrets:
      - source: mongo_cluster_key
        target: mongo_cluster_key
        uid: '999'
        gid: '999'
        mode: 0400
    environment:
      - MONGO_INITDB_ROOT_USERNAME=admin
      - MONGO_INITDB_ROOT_PASSWORD=password
      - MONGO_INITDB_DATABASE=admin
      - MAIN_MONGO_DB_NAME=testing
      - MAIN_MONGO_DB_USERNAME=test
      - MAIN_MONGO_DB_PASSWORD=password
      - MAIN_MONGO_DB_ROLE=readWrite
    deploy:
      restart_policy:
        condition: on-failure
        delay: 5s
        max_attempts: 10

volumes:
  rs1:
    driver: local
  rs2:
    driver: local
  rs3:
    driver: local

networks:
  mongo:
    driver: overlay
    driver_opts:
      encrypted: "true"
    internal: true
    attachable: true
replica-init.sh:
#!/bin/bash

# Make sure all 3 replicas are available
for rs in rs1 rs2 rs3; do
  mongo --host $rs --eval 'db'
  if [ $? -ne 0 ]; then
    exit 1
  fi
done

MONGO_INITDB_ROOT_USERNAME="$(< $MONGO_INITDB_ROOT_USERNAME_FILE)"
MONGO_INITDB_ROOT_PASSWORD="$(< $MONGO_INITDB_ROOT_PASSWORD_FILE)"

# Connect to rs1 and configure the replica set if not already done
status=$(mongo --host rs1 --quiet --eval 'rs.status().members.length')
if [ $? -ne 0 ]; then
  # Replica set not yet configured
  mongo --username $MONGO_INITDB_ROOT_USERNAME -p $MONGO_INITDB_ROOT_PASSWORD --host rs1 --eval 'rs.initiate({ _id: "rs0", version: 1, members: [ { _id: 0, host: "rs1", priority: 100 }, { _id: 1, host: "rs2", priority: 2 }, { _id: 2, host: "rs3", priority: 2 } ] })'
fi
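To double-check that the priorities from the rs.initiate() call above took effect, we inspect the replica set configuration roughly like this (mongo shell, same admin credentials as above):

// mongo --host rs1 -u admin -p password --authenticationDatabase admin
rs.conf().members.forEach(function (m) {
  print(m.host, 'priority:', m.priority);
});
// Expected: rs1 with priority 100, rs2 and rs3 with priority 2.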
backend-server.js:
const mongoose = require('mongoose');

// MongoDB connection class
class MongoDB {
  constructor() {
    mongoose
      .connect('mongodb://test:password@rs1:27017,rs2:27017,rs3:27017/testing?replicaSet=rs0')
      .then(() => {
        console.log('Connected to MongoDB');
      })
      .catch((err) => {
        console.error('MongoDB Error: ', err.message);
      });

    // Add error handler while connected
    mongoose.connection.on('error', (err) => {
      console.log('MongoDB Error: ', err);
    });

    mongoose.pluralize(null);
  }
}

module.exports = new MongoDB();
We have 2,500 connections spread across 60 clients, so using the directConnection flag may be too slow.
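For context, a directConnection setup would look roughly like the hypothetical sketch below: each client would hold a separate connection per member and lose the driver's replica set monitoring, which is why we would rather avoid it.

const mongoose = require('mongoose');

// Hypothetical sketch of a directConnection setup (not what we run in production):
// each connection targets a single member and the driver does no topology discovery,
// so every client would have to manage one connection per member itself.
const rs1 = mongoose.createConnection(
  'mongodb://test:password@rs1:27017/testing?directConnection=true'
);
const rs2 = mongoose.createConnection(
  'mongodb://test:password@rs2:27017/testing?directConnection=true'
);
const rs3 = mongoose.createConnection(
  'mongodb://test:password@rs3:27017/testing?directConnection=true'
);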
What we’ve tried
- We have tested the replica election process: a secondary node is correctly elected as the new primary once the original primary goes down, and the original primary is re-elected once that node comes back up.
- We have verified that the priorities for the primary and secondary nodes are set correctly.
- Docker Swarm DNS resolves rs1's hostname, and rs1 is reachable from the backend server container whose Mongoose client receives the error above (checked roughly as in the sketch below).
- We have verified that authentication and the connection strings are all set up correctly.

When this error occurs, restarting the backend server Docker containers fixes the issue. We are not receiving any connection errors in our backend server logs; errors only appear when querying the database.
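The DNS and reachability check mentioned in the list above is done roughly like this from inside an affected backend container (a rough sketch; hostnames and port match the stack above):

// Rough sketch of the check run from inside a backend container.
const dns = require('dns');
const net = require('net');

dns.lookup('rs1', (err, address) => {
  if (err) return console.error('DNS lookup for rs1 failed:', err.message);
  console.log('rs1 resolves to', address);

  // Verify the mongod port is reachable over the overlay network
  const socket = net.createConnection(27017, 'rs1', () => {
    console.log('TCP connection to rs1:27017 succeeded');
    socket.end();
  });
  socket.on('error', (e) => console.error('TCP connection to rs1:27017 failed:', e.message));
});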