MongoDB Enterprise Kubernetes Operator on EKS

Hi All,

I followed the guide to set up MongoDB Enterprise for a trial.

Here are the steps I followed:

  1. Set up an EKS cluster.
  2. Created the namespace.
  3. Followed the instructions to deploy the Operator and Ops Manager.
  4. Deployed a MongoDB replica set.
  5. Validated that Ops Manager and the replica set came up.
  6. Exposed the replica set members as NodePort services.
  7. Updated the replica set to include replicaSetHorizons based on the EC2 instances and the exposed NodePorts:
       persistent: true
       security:
         tls:
           enabled: true
       connectivity:
         replicaSetHorizons:
           - "test-db": "ec2-node1-public-ip:32561"
           - "test-db": "ec2-node2-public-ip:31828"
           - "test-db": "ec2-node3-public-ip:32212"
  8. Approved the CSRs.
  9. Updated the security group (SG) to allow all traffic on the node ports.
  10. Tested connectivity from the CLI using the command: mongo --host test-replica-set/ec2-node1-public-ip:32561,ec2-node2-public-ip:31828,ec2-node3-public-ip:32212 --ssl --sslAllowInvalidCertificates

The connectivity test fails every time. Can someone tell me what could be causing the problem with external connectivity?

Thank you

Hi @Kish_V,

What error are you receiving upon connection?

Have you confirmed all nodes and ports are correct?

Also please try to use the CAFile with the mongo shell and see if you face a similar issue.

Best
Pavel

See attachment for more details.

Also, some of the documentation says to deploy the replica set with TLS enabled set to false, but the replica set fails when this option is false.

Hi @Kish_V,

Why do the ports specified in the mongo shell command differ from the ones in the horizon specification?

"test-db": "ec2-node1-public-ip:32561"
"test-db": "ec2-node2-public-ip:31828"
"test-db": "ec2-node3-public-ip:32212"

What are 32595 and 30432?
Our documentation uses only ports and nodes specified in the horizon clauses.

Best
Pavel
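
A quick way to check the point raised here is to compare the seed list passed to `mongo --host` with the horizon addresses. The following is a minimal sketch, not part of any MongoDB tooling: the function names are hypothetical, it assumes the `rsName/host:port,...` form of `--host`, and the mismatching third port in the demo is put there on purpose for illustration.

```python
def parse_seed_list(host_arg):
    """Split 'rsName/host1:port1,host2:port2' into (rs_name, set of 'host:port')."""
    rs_name, _, seeds = host_arg.partition("/")
    return rs_name, {seed.strip() for seed in seeds.split(",") if seed.strip()}


def compare_with_horizons(host_arg, horizon_addresses):
    """Return (seeds missing from the horizons, horizons missing from the seeds)."""
    _, seeds = parse_seed_list(host_arg)
    horizons = set(horizon_addresses)
    return sorted(seeds - horizons), sorted(horizons - seeds)


if __name__ == "__main__":
    # Placeholder host names from the first post.
    horizons = [
        "ec2-node1-public-ip:32561",
        "ec2-node2-public-ip:31828",
        "ec2-node3-public-ip:32212",
    ]
    # Illustrative seed list: the third port deliberately differs from the horizon.
    host_arg = (
        "test-replica-set/ec2-node1-public-ip:32561,"
        "ec2-node2-public-ip:31828,ec2-node3-public-ip:30432"
    )
    not_in_horizons, not_in_seeds = compare_with_horizons(host_arg, horizons)
    print("seeds not in horizons:", not_in_horizons)
    print("horizons not in seeds:", not_in_seeds)
```

Two empty lists mean the shell is being pointed at exactly the addresses the horizons advertise.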

Hi @Pavel_Duchovny,

I created a new cluster since I usually don’t keep them long due to the cost.

Here is the complete breakdown with detailed information.

Since this is my test environment, I am OK with sharing the IPs and detailed information, which might help you review it, but the environment will be torn down once we have a resolution.

connectivity:
  replicaSetHorizons:
    - "customer-prod-db": "ec2-18-216-32-24.us-east-2.compute.amazonaws.com:31671"
    - "customer-prod-db": "ec2-3-17-154-122.us-east-2.compute.amazonaws.com:32595"
    - "customer-prod-db": "ec2-18-218-2-179.us-east-2.compute.amazonaws.com:30432"

kubectl get svc | grep customer-rep
customer-replica-set-0     NodePort    10.100.66.6      <none>   27017:31671/TCP   76m
customer-replica-set-1     NodePort    10.100.185.237   <none>   27017:32595/TCP   76m
customer-replica-set-2     NodePort    10.100.37.222    <none>   27017:30432/TCP   76m
customer-replica-set-svc   ClusterIP   None             <none>   27017/TCP         79m

kubectl get pods -o wide
NAME                     READY   STATUS    RESTARTS   AGE   IP               NODE                                          NOMINATED NODE   READINESS GATES
customer-replica-set-0   1/1     Running   0          10h   192.168.22.100   ip-192-168-2-202.us-east-2.compute.internal   <none>           <none>
customer-replica-set-1   1/1     Running   0          10h   192.168.80.99    ip-192-168-85-40.us-east-2.compute.internal   <none>           <none>
customer-replica-set-2   1/1     Running   0          10h   192.168.33.103   ip-192-168-51-42.us-east-2.compute.internal   <none>           <none>

kubectl get nodes -o jsonpath='{ $.items[*].status.addresses[?(@.type=="ExternalDNS")].address }'
ec2-18-216-32-24.us-east-2.compute.amazonaws.com
ec2-18-218-2-179.us-east-2.compute.amazonaws.com
ec2-3-17-154-122.us-east-2.compute.amazonaws.com

kubectl get nodes -o wide
NAME                                          STATUS   ROLES    AGE   VERSION               INTERNAL-IP     EXTERNAL-IP    OS-IMAGE         KERNEL-VERSION                  CONTAINER-RUNTIME
ip-192-168-2-202.us-east-2.compute.internal   Ready    <none>   11h   v1.16.13-eks-2ba888   192.168.2.202   18.216.32.24   Amazon Linux 2   4.14.186-146.268.amzn2.x86_64   docker://19.3.6
ip-192-168-51-42.us-east-2.compute.internal   Ready    <none>   11h   v1.16.13-eks-2ba888   192.168.51.42   18.218.2.179   Amazon Linux 2   4.14.186-146.268.amzn2.x86_64   docker://19.3.6
ip-192-168-85-40.us-east-2.compute.internal   Ready    <none>   11h   v1.16.13-eks-2ba888   192.168.85.40   3.17.154.122   Amazon Linux 2   4.14.186-146.268.amzn2.x86_64   docker://19.3.6

Mapping each pod to its node and external address:
customer-replica-set-0 (31671) → ip-192-168-2-202.us-east-2.compute.internal → external IP → 18.216.32.24 → nslookup → ec2-18-216-32-24.us-east-2.compute.amazonaws.com

customer-replica-set-1 (32595) → ip-192-168-85-40.us-east-2.compute.internal → external IP → 3.17.154.122 → nslookup → ec2-3-17-154-122.us-east-2.compute.amazonaws.com

customer-replica-set-2 (30432) → ip-192-168-51-42.us-east-2.compute.internal → external IP → 18.218.2.179 → nslookup → ec2-18-218-2-179.us-east-2.compute.amazonaws.com
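
The same pod → node → external DNS → NodePort mapping can be derived mechanically, which makes it easier to re-check after a cluster is rebuilt. This is only a sketch: the function name and the three dictionaries are hypothetical helpers, and the values are copied by hand from the kubectl and nslookup output in this post rather than queried from the cluster.

```python
def build_horizons(horizon_name, pod_to_node, node_to_external_dns, pod_to_node_port):
    """Return one {horizon_name: "external-dns:nodePort"} entry per pod, in pod order."""
    horizons = []
    for pod in sorted(pod_to_node):
        external_dns = node_to_external_dns[pod_to_node[pod]]
        horizons.append({horizon_name: f"{external_dns}:{pod_to_node_port[pod]}"})
    return horizons


# Values copied from the output quoted in this post.
POD_TO_NODE = {
    "customer-replica-set-0": "ip-192-168-2-202.us-east-2.compute.internal",
    "customer-replica-set-1": "ip-192-168-85-40.us-east-2.compute.internal",
    "customer-replica-set-2": "ip-192-168-51-42.us-east-2.compute.internal",
}
NODE_TO_EXTERNAL_DNS = {
    "ip-192-168-2-202.us-east-2.compute.internal": "ec2-18-216-32-24.us-east-2.compute.amazonaws.com",
    "ip-192-168-85-40.us-east-2.compute.internal": "ec2-3-17-154-122.us-east-2.compute.amazonaws.com",
    "ip-192-168-51-42.us-east-2.compute.internal": "ec2-18-218-2-179.us-east-2.compute.amazonaws.com",
}
POD_TO_NODE_PORT = {
    "customer-replica-set-0": 31671,
    "customer-replica-set-1": 32595,
    "customer-replica-set-2": 30432,
}

if __name__ == "__main__":
    entries = build_horizons(
        "customer-prod-db", POD_TO_NODE, NODE_TO_EXTERNAL_DNS, POD_TO_NODE_PORT
    )
    for entry in entries:
        for name, address in entry.items():
            print(f'- "{name}": "{address}"')
```

With the values above, the result lines up with the replicaSetHorizons entries listed earlier in this post, so the horizon configuration itself looks consistent with the cluster layout.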

Please let me know when you have reviewed this so that I can delete or mask the IPs.

Here is the command with the new cluster and associated error:
mongo --host customer-replica-set/ec2-18-216-32-24.us-east-2.compute.amazonaws.com:31671,ec2-3-17-154-122.us-east-2.compute.amazonaws.com:32595,ec2-18-218-2-179.us-east-2.compute.amazonaws.com:30432 --tls --tlsAllowInvalidCertificates --verbose
MongoDB shell version v4.2.7
connecting to: mongodb://ec2-18-216-32-24.us-east-2.compute.amazonaws.com:31671,ec2-3-17-154-122.us-east-2.compute.amazonaws.com:32595,ec2-18-218-2-179.us-east-2.compute.amazonaws.com:30432/?compressors=disabled&gssapiServiceName=mongodb&replicaSet=customer-replica-set
2020-08-16T07:36:53.811-0500 D1 NETWORK [js] Starting up task executor for monitoring replica sets in response to request to monitor set: customer-replica-set/ec2-18-216-32-24.us-east-2.compute.amazonaws.com:31671,ec2-3-17-154-122.us-east-2.compute.amazonaws.com:32595,ec2-18-218-2-179.us-east-2.compute.amazonaws.com:30432
2020-08-16T07:36:53.811-0500 I NETWORK [js] Starting new replica set monitor for customer-replica-set/ec2-18-216-32-24.us-east-2.compute.amazonaws.com:31671,ec2-3-17-154-122.us-east-2.compute.amazonaws.com:32595,ec2-18-218-2-179.us-east-2.compute.amazonaws.com:30432
2020-08-16T07:36:53.811-0500 I CONNPOOL [ReplicaSetMonitor-TaskExecutor] Connecting to ec2-18-218-2-179.us-east-2.compute.amazonaws.com:30432
2020-08-16T07:36:53.811-0500 I CONNPOOL [ReplicaSetMonitor-TaskExecutor] Connecting to ec2-3-17-154-122.us-east-2.compute.amazonaws.com:32595
2020-08-16T07:36:53.811-0500 I CONNPOOL [ReplicaSetMonitor-TaskExecutor] Connecting to ec2-18-216-32-24.us-east-2.compute.amazonaws.com:31671
2020-08-16T07:36:53.841-0500 W NETWORK [ReplicaSetMonitor-TaskExecutor] Unable to reach primary for set customer-replica-set
2020-08-16T07:36:53.842-0500 I NETWORK [ReplicaSetMonitor-TaskExecutor] Cannot reach any nodes for set customer-replica-set. Please check network connectivity and the status of the set. This has happened for 1 checks in a row.
2020-08-16T07:36:53.842-0500 D1 NETWORK [ReplicaSetMonitor-TaskExecutor] Refreshing replica set customer-replica-set took 30ms
2020-08-16T07:36:54.356-0500 W NETWORK [ReplicaSetMonitor-TaskExecutor] Unable to reach primary for set customer-replica-set
2020-08-16T07:36:54.357-0500 I NETWORK [ReplicaSetMonitor-TaskExecutor] Cannot reach any nodes for set customer-replica-set. Please check network connectivity and the status of the set. This has happened for 2 checks in a row.

Hi @Kish_V,

Can you ssh to the primary node and provide the output of rs.status() and rs.conf()?
If rs.conf() lists internal IPs, the mongo shell will try to reach those.

Also, can you try connecting to a single host without the full replica set configuration, just by specifying a single --host and --port?

Also, please run telnet against the DNS name and port of each of the 3 hosts.

Best
Pavel
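
Where telnet is not installed, the same port check can be done with a few lines of Python. This is a minimal sketch using only the standard library; the function names and the 3-second timeout are arbitrary choices, and it only tests whether a TCP handshake completes, not whether TLS or MongoDB respond.

```python
import socket
import sys


def parse_target(target):
    """Split 'host:port' into (host, int(port))."""
    host, _, port = target.rpartition(":")
    return host, int(port)


def check_tcp(host, port, timeout=3.0):
    """Return (True, 'connected') if a TCP handshake completes, else (False, reason)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, "connected"
    except OSError as exc:  # covers refused connections, timeouts and DNS failures
        return False, f"{type(exc).__name__}: {exc}"


if __name__ == "__main__":
    # Usage: python check_ports.py host1:port1 host2:port2 ...
    for target in sys.argv[1:]:
        host, port = parse_target(target)
        ok, detail = check_tcp(host, port)
        print(f"{target}: {'open' if ok else 'unreachable'} ({detail})")
```

Run it with the three horizon addresses as arguments. A refused connection on a NodePort usually means nothing is answering behind that port, whereas a timeout points more towards a firewall or security group.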

Replica set primary node:
rs.conf()

customer-replica-set:PRIMARY> rs.conf()
{
    "_id" : "customer-replica-set",
    "version" : 2,
    "protocolVersion" : NumberLong(1),
    "writeConcernMajorityJournalDefault" : true,
    "members" : [
            {
                    "_id" : 0,
                    "host" : "customer-replica-set-0.customer-replica-set-svc.mongodb.svc.cluster.local:27017",
                    "arbiterOnly" : false,
                    "buildIndexes" : true,
                    "hidden" : false,
                    "priority" : 1,
                    "tags" : {

                    },
                    "horizons" : {
                            "customer-prod-db" : "ec2-18-216-32-24.us-east-2.compute.amazonaws.com:31671"
                    },
                    "slaveDelay" : NumberLong(0),
                    "votes" : 1
            },
            {
                    "_id" : 1,
                    "host" : "customer-replica-set-1.customer-replica-set-svc.mongodb.svc.cluster.local:27017",
                    "arbiterOnly" : false,
                    "buildIndexes" : true,
                    "hidden" : false,
                    "priority" : 1,
                    "tags" : {

                    },
                    "horizons" : {
                            "customer-prod-db" : "ec2-3-17-154-122.us-east-2.compute.amazonaws.com:32595"
                    },
                    "slaveDelay" : NumberLong(0),
                    "votes" : 1
            },
            {
                    "_id" : 2,
                    "host" : "customer-replica-set-2.customer-replica-set-svc.mongodb.svc.cluster.local:27017",
                    "arbiterOnly" : false,
                    "buildIndexes" : true,
                    "hidden" : false,
                    "priority" : 1,
                    "tags" : {

                    },
                    "horizons" : {
                            "customer-prod-db" : "ec2-18-218-2-179.us-east-2.compute.amazonaws.com:30432"
                    },
                    "slaveDelay" : NumberLong(0),
                    "votes" : 1
            }
    ],
    "settings" : {
            "chainingAllowed" : true,
            "heartbeatIntervalMillis" : 2000,
            "heartbeatTimeoutSecs" : 10,
            "electionTimeoutMillis" : 10000,
            "catchUpTimeoutMillis" : -1,
            "catchUpTakeoverDelayMillis" : 30000,
            "getLastErrorModes" : {

            },
            "getLastErrorDefaults" : {
                    "w" : 1,
                    "wtimeout" : 0
            },
            "replicaSetId" : ObjectId("5f388cc3900792e0998729e1")
    }

}
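
The rs.conf() output shows each member with an internal `host` and a `customer-prod-db` horizon. As far as I understand split horizons, a member decides which set of names to advertise by comparing the hostname the client sends in the TLS SNI extension with the horizon addresses, which would also explain why TLS has to be enabled for horizons to work. The sketch below is a simplified model of that selection under this assumption, not the server's actual code; the function names are hypothetical.

```python
DEFAULT_HORIZON = "__default"

# Member data copied from the rs.conf() output above.
MEMBERS = [
    {
        "host": "customer-replica-set-0.customer-replica-set-svc.mongodb.svc.cluster.local:27017",
        "horizons": {"customer-prod-db": "ec2-18-216-32-24.us-east-2.compute.amazonaws.com:31671"},
    },
    {
        "host": "customer-replica-set-1.customer-replica-set-svc.mongodb.svc.cluster.local:27017",
        "horizons": {"customer-prod-db": "ec2-3-17-154-122.us-east-2.compute.amazonaws.com:32595"},
    },
    {
        "host": "customer-replica-set-2.customer-replica-set-svc.mongodb.svc.cluster.local:27017",
        "horizons": {"customer-prod-db": "ec2-18-218-2-179.us-east-2.compute.amazonaws.com:30432"},
    },
]


def horizon_for_sni(members, sni_hostname):
    """Return the horizon whose address uses this hostname, else the default horizon."""
    if sni_hostname:
        for member in members:
            for name, address in member.get("horizons", {}).items():
                if address.rpartition(":")[0] == sni_hostname:
                    return name
    return DEFAULT_HORIZON


def advertised_hosts(members, sni_hostname):
    """Return the member addresses a client on that horizon would be told about."""
    horizon = horizon_for_sni(members, sni_hostname)
    if horizon == DEFAULT_HORIZON:
        return [member["host"] for member in members]
    return [member["horizons"][horizon] for member in members]


if __name__ == "__main__":
    for sni in ("ec2-18-216-32-24.us-east-2.compute.amazonaws.com", None):
        print(sni, "->", advertised_hosts(MEMBERS, sni))
```

In this model a client connecting by external DNS name over TLS is handed the external addresses, while a client without a matching SNI name (for example, one connecting by IP address) is handed the cluster-internal names. None of this comes into play until the TCP connection itself succeeds.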

rs.status()

rs.status()
{
    "set" : "customer-replica-set",
    "date" : ISODate("2020-08-16T13:51:01.340Z"),
    "myState" : 1,
    "term" : NumberLong(3),
    "syncingTo" : "",
    "syncSourceHost" : "",
    "syncSourceId" : -1,
    "heartbeatIntervalMillis" : NumberLong(2000),
    "majorityVoteCount" : 2,
    "writeMajorityCount" : 2,
    "optimes" : {
            "lastCommittedOpTime" : {
                    "ts" : Timestamp(1597585859, 1),
                    "t" : NumberLong(3)
            },
            "lastCommittedWallTime" : ISODate("2020-08-16T13:50:59.046Z"),
            "readConcernMajorityOpTime" : {
                    "ts" : Timestamp(1597585859, 1),
                    "t" : NumberLong(3)
            },
            "readConcernMajorityWallTime" : ISODate("2020-08-16T13:50:59.046Z"),
            "appliedOpTime" : {
                    "ts" : Timestamp(1597585859, 1),
                    "t" : NumberLong(3)
            },
            "durableOpTime" : {
                    "ts" : Timestamp(1597585859, 1),
                    "t" : NumberLong(3)
            },
            "lastAppliedWallTime" : ISODate("2020-08-16T13:50:59.046Z"),
            "lastDurableWallTime" : ISODate("2020-08-16T13:50:59.046Z")
    },
    "lastStableRecoveryTimestamp" : Timestamp(1597585840, 8),
    "lastStableCheckpointTimestamp" : Timestamp(1597585840, 8),
    "electionCandidateMetrics" : {
            "lastElectionReason" : "stepUpRequestSkipDryRun",
            "lastElectionDate" : ISODate("2020-08-16T01:45:37.715Z"),
            "termAtElection" : NumberLong(3),
            "lastCommittedOpTimeAtElection" : {
                    "ts" : Timestamp(1597542328, 1),
                    "t" : NumberLong(2)
            },
            "lastSeenOpTimeAtElection" : {
                    "ts" : Timestamp(1597542328, 1),
                    "t" : NumberLong(2)
            },
            "numVotesNeeded" : 2,
            "priorityAtElection" : 1,
            "electionTimeoutMillis" : NumberLong(10000),
            "priorPrimaryMemberId" : 1,
            "numCatchUpOps" : NumberLong(27017),
            "newTermStartDate" : ISODate("2020-08-16T01:45:37.761Z"),
            "wMajorityWriteAvailabilityDate" : ISODate("2020-08-16T01:45:38.278Z")
    },
    "members" : [
            {
                    "_id" : 0,
                    "name" : "customer-replica-set-0.customer-replica-set-svc.mongodb.svc.cluster.local:27017",
                    "ip" : "192.168.22.100",
                    "health" : 1,
                    "state" : 1,
                    "stateStr" : "PRIMARY",
                    "uptime" : 43532,
                    "optime" : {
                            "ts" : Timestamp(1597585859, 1),
                            "t" : NumberLong(3)
                    },
                    "optimeDate" : ISODate("2020-08-16T13:50:59Z"),
                    "syncingTo" : "",
                    "syncSourceHost" : "",
                    "syncSourceId" : -1,
                    "infoMessage" : "",
                    "electionTime" : Timestamp(1597542337, 1),
                    "electionDate" : ISODate("2020-08-16T01:45:37Z"),
                    "configVersion" : 2,
                    "self" : true,
                    "lastHeartbeatMessage" : ""
            },
            {
                    "_id" : 1,
                    "name" : "customer-replica-set-1.customer-replica-set-svc.mongodb.svc.cluster.local:27017",
                    "ip" : "192.168.80.99",
                    "health" : 1,
                    "state" : 2,
                    "stateStr" : "SECONDARY",
                    "uptime" : 43519,
                    "optime" : {
                            "ts" : Timestamp(1597585859, 1),
                            "t" : NumberLong(3)
                    },
                    "optimeDurable" : {
                            "ts" : Timestamp(1597585859, 1),
                            "t" : NumberLong(3)
                    },
                    "optimeDate" : ISODate("2020-08-16T13:50:59Z"),
                    "optimeDurableDate" : ISODate("2020-08-16T13:50:59Z"),
                    "lastHeartbeat" : ISODate("2020-08-16T13:51:00.605Z"),
                    "lastHeartbeatRecv" : ISODate("2020-08-16T13:51:00.876Z"),
                    "pingMs" : NumberLong(0),
                    "lastHeartbeatMessage" : "",
                    "syncingTo" : "customer-replica-set-2.customer-replica-set-svc.mongodb.svc.cluster.local:27017",
                    "syncSourceHost" : "customer-replica-set-2.customer-replica-set-svc.mongodb.svc.cluster.local:27017",
                    "syncSourceId" : 2,
                    "infoMessage" : "",
                    "configVersion" : 2
            },
            {
                    "_id" : 2,
                    "name" : "customer-replica-set-2.customer-replica-set-svc.mongodb.svc.cluster.local:27017",
                    "ip" : "192.168.33.103",
                    "health" : 1,
                    "state" : 2,
                    "stateStr" : "SECONDARY",
                    "uptime" : 43525,
                    "optime" : {
                            "ts" : Timestamp(1597585859, 1),
                            "t" : NumberLong(3)
                    },
                    "optimeDurable" : {
                            "ts" : Timestamp(1597585859, 1),
                            "t" : NumberLong(3)
                    },
                    "optimeDate" : ISODate("2020-08-16T13:50:59Z"),
                    "optimeDurableDate" : ISODate("2020-08-16T13:50:59Z"),
                    "lastHeartbeat" : ISODate("2020-08-16T13:51:00.659Z"),
                    "lastHeartbeatRecv" : ISODate("2020-08-16T13:50:59.778Z"),
                    "pingMs" : NumberLong(0),
                    "lastHeartbeatMessage" : "",
                    "syncingTo" : "customer-replica-set-0.customer-replica-set-svc.mongodb.svc.cluster.local:27017",
                    "syncSourceHost" : "customer-replica-set-0.customer-replica-set-svc.mongodb.svc.cluster.local:27017",
                    "syncSourceId" : 0,
                    "infoMessage" : "",
                    "configVersion" : 2
            }
    ],
    "ok" : 1,
    "$clusterTime" : {
            "clusterTime" : Timestamp(1597585859, 1),
            "signature" : {
                    "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
                    "keyId" : NumberLong(0)
            }
    },
    "operationTime" : Timestamp(1597585859, 1)
}

Connecting to a single host shows the same issue:

mongo --host customer-replica-set/ec2-18-216-32-24.us-east-2.compute.amazonaws.com:31671 --tls --tlsAllowInvalidCertificates --verbose
MongoDB shell version v4.2.7
connecting to: mongodb://ec2-18-216-32-24.us-east-2.compute.amazonaws.com:31671/?compressors=disabled&gssapiServiceName=mongodb&replicaSet=customer-replica-set
2020-08-16T09:13:33.132-0500 D1 NETWORK [js] Starting up task executor for monitoring replica sets in response to request to monitor set: customer-replica-set/ec2-18-216-32-24.us-east-2.compute.amazonaws.com:31671
2020-08-16T09:13:33.133-0500 I NETWORK [js] Starting new replica set monitor for customer-replica-set/ec2-18-216-32-24.us-east-2.compute.amazonaws.com:31671
2020-08-16T09:13:33.135-0500 I CONNPOOL [ReplicaSetMonitor-TaskExecutor] Connecting to ec2-18-216-32-24.us-east-2.compute.amazonaws.com:31671
2020-08-16T09:13:33.218-0500 W NETWORK [ReplicaSetMonitor-TaskExecutor] Unable to reach primary for set customer-replica-set
2020-08-16T09:13:33.219-0500 I NETWORK [ReplicaSetMonitor-TaskExecutor] Cannot reach any nodes for set customer-replica-set. Please check network connectivity and the status of the set. This has happened for 1 checks in a row.

telnet ec2-18-216-32-24.us-east-2.compute.amazonaws.com 31671
Trying 18.216.32.24

telnet: connect to address 18.216.32.24: Connection refused
telnet: Unable to connect to remote host

I deployed an nginx pod to confirm that this is not security group (SG) related:

kubectl get services -w
NAME                       TYPE           CLUSTER-IP       EXTERNAL-IP                                                              PORT(S)           AGE
customer-replica-set-0     NodePort       10.100.66.6      <none>                                                                   27017:31671/TCP   12h
customer-replica-set-1     NodePort       10.100.185.237   <none>                                                                   27017:32595/TCP   12h
customer-replica-set-2     NodePort       10.100.37.222    <none>                                                                   27017:30432/TCP   12h
customer-replica-set-svc   ClusterIP      None             <none>                                                                   27017/TCP         12h
mynginxsvc                 NodePort       10.100.121.119   <none>                                                                   80:30180/TCP      3m28s
operator-webhook           ClusterIP      10.100.40.145    <none>                                                                   443/TCP           13h
ops-manager-db-svc         ClusterIP      None             <none>                                                                   27017/TCP         13h
ops-manager-svc            ClusterIP      None             <none>                                                                   8080/TCP          13h
ops-manager-svc-ext        LoadBalancer   10.100.146.190   a326d1d4fefc844e49d9da6d8ce1f229-105300929.us-east-2.elb.amazonaws.com   8080:30187/TCP    13h

telnet ec2-3-17-154-122.us-east-2.compute.amazonaws.com 30180
Trying 3.17.154.122

Connected to ec2-3-17-154-122.us-east-2.compute.amazonaws.com.
Escape character is '^]'.

@Pavel_Duchovny It seems this is something specific to the MongoDB Operator and the deployment. I would suggest that you run through the same deployment on an EKS cluster and let me know what you find, since accessing the replica set through a NodePort should be pretty straightforward.

Hi @Kish_V,

If telnet cannot reach the instance, it is 100% an issue with your network infrastructure:

telnet ec2-18-216-32-24.us-east-2.compute.amazonaws.com 31671
Trying 18.216.32.24

telnet: connect to address 18.216.32.24: Connection refused
telnet: Unable to connect to remote host

I don’t see a need for a repro until you resolve this.
Best
Pavel

Thanks @Pavel_Duchovny.

As a follow-up, I installed an nginx pod in the same namespace (see my reply above) and verified that the k8s infrastructure, nodes, and NodePort are not the issue, including the NACL and SG. It seems to be an issue with the replicaSetHorizons configuration.

Please refer to the latter half of the verification that was performed above.

Hi @Kish_V,

Have you tried running the mongo shell command from the same place where you successfully ran rs.status()? Does that work?

What’s the difference between those two connection attempts?

Best
Pavel

Here you go @Pavel_Duchovny

Kubernetes Internal IP:

I have no name!@customer-replica-set-0:/$ /var/lib/mongodb-mms-automation/mongodb-linux-x86_64-4.2.1-ent/bin/mongo --host customer-replica-set-0.customer-replica-set-svc.mongodb.svc.cluster.local --port 27017 --ssl --sslAllowInvalidCertificates
2020-08-16T16:02:12.922+0000 W CONTROL [main] Option: ssl is deprecated. Please use tls instead.
2020-08-16T16:02:12.922+0000 W CONTROL [main] Option: sslAllowInvalidCertificates is deprecated. Please use tlsAllowInvalidCertificates instead.
MongoDB shell version v4.2.1
connecting to: mongodb://customer-replica-set-0.customer-replica-set-svc.mongodb.svc.cluster.local:27017/?compressors=disabled&gssapiServiceName=mongodb
2020-08-16T16:02:12.996+0000 W NETWORK [js] SSL peer certificate validation failed: self signed certificate in certificate chain
Implicit session: session { "id" : UUID("85d3bbe5-0f65-4f96-ad67-28d46baadcab") }
MongoDB server version: 4.2.1
Welcome to the MongoDB shell.
For interactive help, type "help".
For more comprehensive documentation, see
http://docs.mongodb.org/
Questions? Try the support group
http://groups.google.com/group/mongodb-user
2020-08-16T16:02:13.009+0000 I STORAGE [main] In File::open(), ::open for '//.mongorc.js' failed with Permission denied
Server has startup warnings:
2020-08-16T01:45:29.701+0000 I STORAGE [initandlisten]
2020-08-16T01:45:29.701+0000 I STORAGE [initandlisten] ** WARNING: Using the XFS filesystem is strongly recommended with the WiredTiger storage engine
2020-08-16T01:45:29.701+0000 I STORAGE [initandlisten] ** See http://dochub.mongodb.org/core/prodnotes-filesystem
2020-08-16T01:45:30.725+0000 I CONTROL [initandlisten]
2020-08-16T01:45:30.725+0000 I CONTROL [initandlisten] ** WARNING: Access control is not enabled for the database.
2020-08-16T01:45:30.725+0000 I CONTROL [initandlisten] ** Read and write access to data and configuration is unrestricted.
2020-08-16T01:45:30.725+0000 I CONTROL [initandlisten]
MongoDB Enterprise customer-replica-set:PRIMARY>

Node Internal IP:

/var/lib/mongodb-mms-automation/mongodb-linux-x86_64-4.2.1-ent/bin/mongo --host ip-192-168-2-202.us-east-2.compute.internal --port 27017 --ssl --sslAllowInvalidCertificates
I have no name!@customer-replica-set-0:/$ /var/lib/mongodb-mms-automation/mongodb-linux-x86_64-4.2.1-ent/bin/mongo --host ip-192-168-2-202.us-east-2.compute.internal --port 31671 --ssl --sslAllowInvalidCertificates
2020-08-16T16:05:01.481+0000 W CONTROL [main] Option: ssl is deprecated. Please use tls instead.
2020-08-16T16:05:01.481+0000 W CONTROL [main] Option: sslAllowInvalidCertificates is deprecated. Please use tlsAllowInvalidCertificates instead.
MongoDB shell version v4.2.1
connecting to: mongodb://ip-192-168-2-202.us-east-2.compute.internal:31671/?compressors=disabled&gssapiServiceName=mongodb
2020-08-16T16:05:02.552+0000 E QUERY [js] Error: couldn't connect to server ip-192-168-2-202.us-east-2.compute.internal:31671, connection attempt failed: SocketException: Error connecting to ip-192-168-2-202.us-east-2.compute.internal:31671 (192.168.2.202:31671) :: caused by :: Connection refused :
connect@src/mongo/shell/mongo.js:341:17
@(connect):2:6
2020-08-16T16:05:02.554+0000 F - [main] exception: connect failed
2020-08-16T16:05:02.554+0000 E - [main] exiting with code 1

Node External IP

I have no name!@customer-replica-set-0:/$ /var/lib/mongodb-mms-automation/mongodb-linux-x86_64-4.2.1-ent/bin/mongo --host ec2-18-216-32-24.us-east-2.compute.amazonaws.com --port 31671 --ssl --sslAllowInvalidCertificates
2020-08-16T16:06:26.944+0000 W CONTROL [main] Option: ssl is deprecated. Please use tls instead.
2020-08-16T16:06:26.944+0000 W CONTROL [main] Option: sslAllowInvalidCertificates is deprecated. Please use tlsAllowInvalidCertificates instead.
MongoDB shell version v4.2.1
connecting to: mongodb://ec2-18-216-32-24.us-east-2.compute.amazonaws.com:31671/?compressors=disabled&gssapiServiceName=mongodb
2020-08-16T16:06:28.024+0000 E QUERY [js] Error: couldn't connect to server ec2-18-216-32-24.us-east-2.compute.amazonaws.com:31671, connection attempt failed: SocketException: Error connecting to ec2-18-216-32-24.us-east-2.compute.amazonaws.com:31671 (192.168.2.202:31671) :: caused by :: Connection refused :
connect@src/mongo/shell/mongo.js:341:17
@(connect):2:6
2020-08-16T16:06:28.026+0000 F - [main] exception: connect failed
2020-08-16T16:06:28.026+0000 E - [main] exiting with code 1
I have no name!@customer-replica-set-0:/$

Pod - Localhost:

I have no name!@customer-replica-set-0:/$ /var/lib/mongodb-mms-automation/mongodb-linux-x86_64-4.2.1-ent/bin/mongo --host 127.0.0.1 --port 27017 --ssl --sslAllowInvalidCertificates
2020-08-16T16:07:23.098+0000 W CONTROL [main] Option: ssl is deprecated. Please use tls instead.
2020-08-16T16:07:23.098+0000 W CONTROL [main] Option: sslAllowInvalidCertificates is deprecated. Please use tlsAllowInvalidCertificates instead.
MongoDB shell version v4.2.1
connecting to: mongodb://127.0.0.1:27017/?compressors=disabled&gssapiServiceName=mongodb
2020-08-16T16:07:23.164+0000 W NETWORK [js] SSL peer certificate validation failed: self signed certificate in certificate chain
Implicit session: session { "id" : UUID("eab8dae7-2984-486f-a22b-2b9c89397c7b") }
MongoDB server version: 4.2.1
Welcome to the MongoDB shell.
For interactive help, type "help".
For more comprehensive documentation, see
http://docs.mongodb.org/
Questions? Try the support group
http://groups.google.com/group/mongodb-user
2020-08-16T16:07:23.169+0000 I STORAGE [main] In File::open(), ::open for '//.mongorc.js' failed with Permission denied
Server has startup warnings:
2020-08-16T01:45:29.701+0000 I STORAGE [initandlisten]
2020-08-16T01:45:29.701+0000 I STORAGE [initandlisten] ** WARNING: Using the XFS filesystem is strongly recommended with the WiredTiger storage engine
2020-08-16T01:45:29.701+0000 I STORAGE [initandlisten] ** See http://dochub.mongodb.org/core/prodnotes-filesystem
2020-08-16T01:45:30.725+0000 I CONTROL [initandlisten]
2020-08-16T01:45:30.725+0000 I CONTROL [initandlisten] ** WARNING: Access control is not enabled for the database.
2020-08-16T01:45:30.725+0000 I CONTROL [initandlisten] ** Read and write access to data and configuration is unrestricted.
2020-08-16T01:45:30.725+0000 I CONTROL [initandlisten]
MongoDB Enterprise customer-replica-set:PRIMARY>

I have seen cases where the mongod configuration needs to be updated for the bind IP. Do you think that plays a part here, since it could prevent the port from being exposed externally?

Have you tried running this on AWS EKS? It should take hardly 15 minutes to try on your own, assuming you have an EKS cluster to play with.

Hi @Kish_V,

I believe Ops Manager automation should set a bind IP of 0.0.0.0 for the hosts.

Do you see otherwise?

I don’t have EKS at hand at the moment; I might try later this week.

It seems more like a DNS problem. Does your VPC have DNS resolution and DNS hostnames enabled?

Can you upload the primary mongod log during a failed attempt?

Best
Pavel

Hi @Pavel_Duchovny, I confirmed that the config does include the bind address:

I have no name!@customer-replica-set-0:/$ more /data/automation-mongod.conf
net:
  bindIp: 0.0.0.0
  port: 27017
  tls:
    CAFile: /mongodb-automation/ca.pem
    allowConnectionsWithoutCertificates: true
    certificateKeyFile: /mongodb-automation/server.pem
    mode: allowTLS
processManagement:
  fork: "true"
replication:
  replSetName: customer-replica-set
storage:
  dbPath: /data
  engine: wiredTiger
systemLog:
  destination: file
  path: /var/log/mongodb-mms-automation/mongodb.log

Regarding VPC DNS and hostname resolution: yes, it is enabled, since I can reach Ops Manager and my test nginx pod through its NodePort.

Regarding the primary mongod log during a failed attempt: I provided the client-side log earlier. I don’t think there will be anything at the pod level, since the traffic is not even getting through. If you need anything specific, let me know where to find it and I will upload it here.

I will be tearing down the cluster in the interest of cost, but if you do get through the test, let me know what you find. I can reproduce this issue consistently and can spin up a cluster any time we need to try again, but I believe this is an issue with the Operator and DNS resolution within the cluster, or with how replicaSetHorizons is set.

Thanks @Pavel_Duchovny for staying with me.

@Pavel_Duchovny do we have any update on this issue ?

Thanks

Hi,

I believe I know the cause of this, so for the benefit of anyone else coming across it:

  1. Run the following and check the Selector of each replica set member's Service:

sudo kubectl describe svc -n <namespace>
replacing <namespace> with the configured namespace, e.g. mongodb

You may find a selector with a revision hash, e.g. controller-revision-hash=myreplicaset-68d57865cf. These need to be removed.

  2. Edit the Service for each replica set member and remove the controller-revision-hash=myreplicaset-xxx value:

kubectl edit svc/<replica set member> -n <namespace>
replacing the values in brackets, e.g. kubectl edit svc/myreplicaset-2 -n mongodb

  3. You should now be able to connect.
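
One plausible reading of why this works: a Service only routes to pods whose labels match every key in its selector. If the selector was copied from the pod's labels when the Service was created, it includes the `controller-revision-hash` value of that moment, and a later StatefulSet update (such as adding the horizons) gives the pods a new hash, leaving the Service with no endpoints. The sketch below models that and builds an equivalent non-interactive `kubectl patch` command. The helper names, the example hash after the update, and the `statefulset.kubernetes.io/pod-name` selector key are assumptions for illustration; check the actual selector with `kubectl describe svc` first.

```python
import json
import shlex

HASH_LABEL = "controller-revision-hash"


def selector_matches(selector, pod_labels):
    """A Service selects a pod only if every selector pair appears in the pod's labels."""
    return all(pod_labels.get(key) == value for key, value in selector.items())


def strip_revision_hash(selector):
    """Return a copy of the selector without the controller-revision-hash key."""
    return {key: value for key, value in selector.items() if key != HASH_LABEL}


def kubectl_patch_command(service, namespace):
    """Build a kubectl command that removes the hash key from the Service selector.

    The JSON Patch 'remove' operation fails if the key is not present.
    """
    patch = [{"op": "remove", "path": f"/spec/selector/{HASH_LABEL}"}]
    return ["kubectl", "patch", "svc", service, "-n", namespace,
            "--type=json", "-p", json.dumps(patch)]


if __name__ == "__main__":
    # Hypothetical labels: the pod-name key and the post-update hash are assumptions.
    selector = {
        "statefulset.kubernetes.io/pod-name": "myreplicaset-2",
        HASH_LABEL: "myreplicaset-68d57865cf",
    }
    pod_labels = {
        "statefulset.kubernetes.io/pod-name": "myreplicaset-2",
        HASH_LABEL: "myreplicaset-7c9f6b5d4",  # hypothetical hash after a StatefulSet update
    }
    print("matches before:", selector_matches(selector, pod_labels))
    print("matches after: ", selector_matches(strip_revision_hash(selector), pod_labels))
    print(shlex.join(kubectl_patch_command("myreplicaset-2", "mongodb")))
```

Printing the command rather than running it keeps the sketch safe to try; the printed line can be pasted into a shell once the Service name and namespace are confirmed.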