Hi All,
Sorry if this is the wrong category.
I have inherited this infrastructure from someone who left my company long ago, and I do not know the configuration or steps used to bring up the MongoDB containers.
We are running MongoDB (v3.6.18) as three containers configured as a replica set. When the Docker stack is deployed, two of the containers come up, but the third takes a long time to start. The two DBs that are up are about 1 GB each; the third is about 637 GB. Because the third DB is so large, it tries to start for about 3.5 hours, then exits and is recreated, and this repeats in a loop.
The logs from the two DBs that are up show that they try to reach the replica set members and fail. The logs from DB01 and DB02 are similar.
**docker logs <DB01_container_ID>**
<snip>
2022-04-28T00:55:23.546+0000 I CONTROL [LogicalSessionCacheRefresh] Sessions collection is not set up; waiting until next sessions refresh interval: Could not find host matching read preference { mode: "primary" } for set graylog
2022-04-28T00:59:58.027+0000 I NETWORK [LogicalSessionCacheRefresh] Starting new replica set monitor for graylog/mongodb_db01:27017,mongodb_db02:27018,mongodb_db03:27019
2022-04-28T01:00:03.030+0000 W NETWORK [LogicalSessionCacheRefresh] Failed to connect to 10.0.11.2:27017 after 5000ms milliseconds, giving up.
2022-04-28T01:00:03.033+0000 W NETWORK [ReplicaSetMonitor-TaskExecutor-0] Failed to connect to 10.0.11.5:27018 after 5000ms milliseconds, giving up.
2022-04-28T01:00:08.033+0000 W NETWORK [LogicalSessionCacheRefresh] Failed to connect to 10.0.11.8:27019 after 5000ms milliseconds, giving up.
2022-04-28T01:00:08.033+0000 W NETWORK [LogicalSessionCacheRefresh] Unable to reach primary for set graylog
2022-04-28T01:00:08.033+0000 I NETWORK [LogicalSessionCacheRefresh] Cannot reach any nodes for set graylog. Please check network connectivity and the status of the set. This has happened for 1 checks in a row.
2022-04-28T01:00:13.539+0000 W NETWORK [LogicalSessionCacheRefresh] Failed to connect to 10.0.11.8:27019 after 5000ms milliseconds, giving up.
2022-04-28T01:00:18.545+0000 W NETWORK [LogicalSessionCacheRefresh] Failed to connect to 10.0.11.2:27017 after 5000ms milliseconds, giving up.
2022-04-28T01:00:23.551+0000 W NETWORK [LogicalSessionCacheRefresh] Failed to connect to 10.0.11.5:27018 after 5000ms milliseconds, giving up.
2022-04-28T01:00:23.551+0000 W NETWORK [LogicalSessionCacheRefresh] Unable to reach primary for set graylog
2022-04-28T01:00:23.551+0000 I NETWORK [LogicalSessionCacheRefresh] Cannot reach any nodes for set graylog. Please check network connectivity and the status of the set. This has happened for 2 checks in a row.
2022-04-28T01:00:23.551+0000 I CONTROL [LogicalSessionCacheRefresh] Sessions collection is not set up; waiting until next sessions refresh interval: Could not find host matching read preference { mode: "primary" } for set graylog
<snip>
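In the excerpt above, all three addresses (10.0.11.2:27017, 10.0.11.5:27018 and 10.0.11.8:27019) time out. To tally this over the full log, something like the following could be used. It is only a rough sketch: `countFailedConnections` is a name I made up, and it assumes the log has been saved as plain text and the script is run with Node.js.

```javascript
// Count "Failed to connect" warnings per address in mongod log text.
function countFailedConnections(logText) {
  var counts = {};
  var pattern = /Failed to connect to (\S+) after/;
  logText.split("\n").forEach(function (line) {
    var match = pattern.exec(line);
    if (match) {
      counts[match[1]] = (counts[match[1]] || 0) + 1;
    }
  });
  return counts;
}

// Two sample lines copied from the DB01 log above.
var sample = [
  "2022-04-28T01:00:03.030+0000 W NETWORK [LogicalSessionCacheRefresh] Failed to connect to 10.0.11.2:27017 after 5000ms milliseconds, giving up.",
  "2022-04-28T01:00:18.545+0000 W NETWORK [LogicalSessionCacheRefresh] Failed to connect to 10.0.11.2:27017 after 5000ms milliseconds, giving up."
].join("\n");

console.log(countFailedConnections(sample));
```

If every configured address shows up with a similar count, that would suggest the problem is not limited to reaching DB03.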
For DB03, the following are the initial set of logs
2022-04-28T00:08:14.735+0000 I CONTROL [initandlisten] MongoDB starting : pid=1 port=27019 dbpath=/data/db 64-bit host=01d5b6a43712
2022-04-28T00:08:14.735+0000 I CONTROL [initandlisten] db version v3.6.18
2022-04-28T00:08:14.735+0000 I CONTROL [initandlisten] git version: 2005f25eed7ed88fa698d9b800fe536bb0410ba4
2022-04-28T00:08:14.735+0000 I CONTROL [initandlisten] OpenSSL version: OpenSSL 1.0.2g 1 Mar 2016
2022-04-28T00:08:14.735+0000 I CONTROL [initandlisten] allocator: tcmalloc
2022-04-28T00:08:14.735+0000 I CONTROL [initandlisten] modules: none
2022-04-28T00:08:14.735+0000 I CONTROL [initandlisten] build environment:
2022-04-28T00:08:14.735+0000 I CONTROL [initandlisten] distmod: ubuntu1604
2022-04-28T00:08:14.735+0000 I CONTROL [initandlisten] distarch: x86_64
2022-04-28T00:08:14.735+0000 I CONTROL [initandlisten] target_arch: x86_64
2022-04-28T00:08:14.735+0000 I CONTROL [initandlisten] options: { config: "/etc/mongod.conf", net: { bindIpAll: true, port: 27019, ssl: { CAFile: "/etc/certs/ca.pem", PEMKeyFile: "/etc/certs/certandkey.pem", allowConnectionsWithoutCertificates: true, allowInvalidHostnames: true, mode: "preferSSL" } }, replication: { oplogSizeMB: 400, replSetName: "graylog" } }
2022-04-28T00:08:14.737+0000 W - [initandlisten] Detected unclean shutdown - /data/db/mongod.lock is not empty.
2022-04-28T00:08:14.741+0000 I - [initandlisten] Detected data files in /data/db created by the 'wiredTiger' storage engine, so setting the active storage engine to 'wiredTiger'.
2022-04-28T00:08:14.743+0000 W STORAGE [initandlisten] Recovering data from the last clean checkpoint.
2022-04-28T00:08:14.743+0000 I STORAGE [initandlisten] wiredtiger_open config: create,cache_size=63873M,cache_overflow=(file_max=0M),session_max=20000,eviction=(threads_min=4,threads_max=4),config_base=false,statistics=(fast),compatibility=(release="3.0",require_max="3.0"),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),statistics_log=(wait=0),verbose=(recovery_progress),
When I log into either the DB01 or DB02 container and check the status, it shows the following:
[root@dcvsl126 sjillalla]# docker exec -it 24664c0d5a58 mongo
MongoDB shell version v3.6.18
connecting to: mongodb://127.0.0.1:27017/?gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("964473e9-b60b-46b9-b1d1-de99829f62a4") }
MongoDB server version: 3.6.18
Welcome to the MongoDB shell.
For interactive help, type "help".
For more comprehensive documentation, see
http://docs.mongodb.org/
Questions? Try the support group
http://groups.google.com/group/mongodb-user
Server has startup warnings:
2022-04-27T20:09:57.767+0000 I CONTROL [initandlisten]
2022-04-27T20:09:57.767+0000 I CONTROL [initandlisten] ** WARNING: Access control is not enabled for the database.
2022-04-27T20:09:57.767+0000 I CONTROL [initandlisten] ** Read and write access to data and configuration is unrestricted.
2022-04-27T20:09:57.767+0000 I CONTROL [initandlisten]
2022-04-27T20:09:57.769+0000 I CONTROL [initandlisten]
2022-04-27T20:09:57.769+0000 I CONTROL [initandlisten] ** WARNING: You are running on a NUMA machine.
2022-04-27T20:09:57.769+0000 I CONTROL [initandlisten] ** We suggest launching mongod like this to avoid performance problems:
2022-04-27T20:09:57.769+0000 I CONTROL [initandlisten] ** numactl --interleave=all mongod [other options]
2022-04-27T20:09:57.769+0000 I CONTROL [initandlisten]
2022-04-27T20:09:57.769+0000 I CONTROL [initandlisten] ** WARNING: /sys/kernel/mm/transparent_hugepage/enabled is 'always'.
2022-04-27T20:09:57.769+0000 I CONTROL [initandlisten] ** We suggest setting it to 'never'
2022-04-27T20:09:57.769+0000 I CONTROL [initandlisten]
2022-04-27T20:09:57.769+0000 I CONTROL [initandlisten] ** WARNING: /sys/kernel/mm/transparent_hugepage/defrag is 'always'.
2022-04-27T20:09:57.769+0000 I CONTROL [initandlisten] ** We suggest setting it to 'never'
2022-04-27T20:09:57.769+0000 I CONTROL [initandlisten]
graylog:OTHER> rs.status()
{
    "state" : 10,
    "stateStr" : "REMOVED",
    "uptime" : 16290,
    "optime" : {
        "ts" : Timestamp(1650247277, 4),
        "t" : NumberLong(111)
    },
    "optimeDate" : ISODate("2022-04-18T02:01:17Z"),
    "lastHeartbeatMessage" : "",
    "syncingTo" : "",
    "syncSourceHost" : "",
    "syncSourceId" : -1,
    "infoMessage" : "",
    "ok" : 0,
    "errmsg" : "Our replica set config is invalid or we are not a member of it",
    "code" : 93,
    "codeName" : "InvalidReplicaSetConfig",
    "operationTime" : Timestamp(1650247277, 4),
    "$clusterTime" : {
        "clusterTime" : Timestamp(1650247277, 4),
        "signature" : {
            "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
            "keyId" : NumberLong(0)
        }
    }
}
graylog:OTHER> exit
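As far as I understand, the REMOVED state with code 93 usually means the node cannot find its own host:port in the replica set config it has stored, but I have not confirmed that this is what is happening here. A minimal sketch of how the stored hosts could be compared with the ones I expect is below. `findMissingHosts` is a hypothetical helper and `exampleConfig` is made-up data for illustration, not the config from my servers.

```javascript
// Return the expected hosts that do not appear in a replica set config document.
function findMissingHosts(config, expectedHosts) {
  var members = config && config.members ? config.members : [];
  var configured = members.map(function (member) {
    return member.host;
  });
  return expectedHosts.filter(function (host) {
    return configured.indexOf(host) === -1;
  });
}

// Hypothetical stored config, for illustration only.
var exampleConfig = {
  _id: "graylog",
  members: [
    { _id: 1, host: "old_db01:27017" },
    { _id: 2, host: "mongodb_db02:27018" }
  ]
};

// Host names taken from the replica set monitor line in the DB01 log.
var expected = ["mongodb_db01:27017", "mongodb_db02:27018", "mongodb_db03:27019"];

console.log(findMissingHosts(exampleConfig, expected));
```

Inside the mongo shell, the first argument could be `db.getSiblingDB("local").system.replset.findOne()`, which is where I believe each node keeps its replica set config, and `printjson` would replace `console.log`.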
The docker compose file is
version: '3'
services:
  db01:
    image: docker-prod.tools.royalsunalliance.ca/mongo:3.6.18
    volumes:
      - /docker/services/mongodb/db01:/data/db
      - /docker/services/mongodb/db01-dump:/data/db/dump
      - /docker/services/mongodb/db01-config/mongod.conf:/etc/mongod.conf
      - /docker/services/elasticsearch-prod/certs/db01:/etc/certs
    ports:
      - "27017:27017"
    command: ["mongod", "--sslAllowConnectionsWithoutCertificates", "--sslMode", "preferSSL", "--sslPEMKeyFile", "/etc/certs/certandkey.pem", "--sslCAFile", "/etc/certs/ca.pem", "--config", "/etc/mongod.conf", "--sslAllowInvalidHostnames"]
  db02:
    image: docker-prod.tools.royalsunalliance.ca/mongo:3.6.18
    volumes:
      - /docker/services/mongodb/db02:/data/db
      - /docker/services/mongodb/db02-dump:/data/db/dump
      - /docker/services/mongodb/db02-config/mongod.conf:/etc/mongod.conf
      - /docker/services/elasticsearch-prod/certs/db02:/etc/certs
    ports:
      - "27018:27018"
    command: ["mongod", "--port", "27018", "--sslAllowConnectionsWithoutCertificates", "--sslMode", "preferSSL", "--sslPEMKeyFile", "/etc/certs/certandkey.pem", "--sslCAFile", "/etc/certs/ca.pem", "--config", "/etc/mongod.conf", "--sslAllowInvalidHostnames"]
    #command: ["mongod", "--config", "/etc/mongod.conf"]
  db03:
    image: docker-prod.tools.royalsunalliance.ca/mongo:3.6.18
    volumes:
      - /docker/services/mongodb/db03:/data/db
      - /docker/services/mongodb/db03-dump:/data/db/dump
      - /docker/services/mongodb/db03-config/mongod.conf:/etc/mongod.conf
      - /docker/services/elasticsearch-prod/certs/db03:/etc/certs
    ports:
      - "27019:27019"
    command: ["mongod", "--port", "27019", "--sslAllowConnectionsWithoutCertificates", "--sslMode", "preferSSL", "--sslPEMKeyFile", "/etc/certs/certandkey.pem", "--sslCAFile", "/etc/certs/ca.pem", "--config", "/etc/mongod.conf", "--sslAllowInvalidHostnames"]
There is also an rs.initiate file, which I am fairly sure is used to configure the replica set, but I am not sure how.
rs.initiate({
    "_id": "graylog",
    "version": 1,
    "members" : [
        {"_id": 1, "host": "mongodb_db01:27017"},
        {"_id": 2, "host": "mongodb_db02:27018"},
        {"_id": 3, "host": "mongodb_db03:27019"}
    ]
})
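My guess is that this file was fed to the mongo shell once on a fresh node (for example by passing the file name to `mongo`), but I have not confirmed that. A guarded version might look like the sketch below, which only initiates when the node reports that it has never been initialized. `initiateIfNeeded` and `fakeRs` are names I made up; `fakeRs` is a stand-in for the shell's `rs` helper so the sketch can run outside mongo, and its status mimics the error my nodes currently return.

```javascript
// Server error code reported as "NotYetInitialized" by replSetGetStatus.
var NOT_YET_INITIALIZED = 94;

// Call initiate() only when status() says the node has no replica set config yet.
function initiateIfNeeded(rsApi, config) {
  var status = rsApi.status();
  if (status.ok === 0 && status.code === NOT_YET_INITIALIZED) {
    return rsApi.initiate(config);
  }
  return { skipped: true, reason: status.codeName || "already initiated" };
}

// Same config as in the rs.initiate file above.
var graylogConfig = {
  _id: "graylog",
  version: 1,
  members: [
    { _id: 1, host: "mongodb_db01:27017" },
    { _id: 2, host: "mongodb_db02:27018" },
    { _id: 3, host: "mongodb_db03:27019" }
  ]
};

// Stand-in for the shell's `rs` helper (hypothetical, for illustration only).
var fakeRs = {
  status: function () {
    return { ok: 0, code: 93, codeName: "InvalidReplicaSetConfig" };
  },
  initiate: function (config) {
    return { ok: 1 };
  }
};

console.log(initiateIfNeeded(fakeRs, graylogConfig));
```

In the mongo shell, `rs` would be passed instead of `fakeRs`. With the status my nodes report now (code 93), this sketch would skip the initiate step, so I assume the file alone does not fix the current state.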
Please let me know how I can recover the MongoDB stack so that the containers are all up and running properly.
Let me know if you need any more information.