Primary shard's ConfigServerCatalogCacheLoader connecting to a SECONDARY config server?

Hi there,

Quick question about the ConfigServerCatalogCacheLoader on a newly elected PRIMARY shard member in a sharded deployment running MongoDB 4.0.25.

Question:

Consider a sharded deployment with these members (along with mongos and other shards):

          fed-db-mongo-ram-configsvr-0 
          fed-db-mongo-ram-configsvr-1
          fed-db-mongo-ram-configsvr-2

          fed-db-mongo-ram-shard2-0  
          fed-db-mongo-ram-shard2-1 
          fed-db-mongo-ram-shard2-2
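
For reference, one way to confirm the PRIMARY/SECONDARY states shown in the annotations below is rs.status() from a mongo shell connected to any member of the given replica set (cnfRepSet for the config servers, likewise for shard2); roughly:

          // Print each member's name and state (PRIMARY/SECONDARY) for the replica set
          // this shell is connected to; the same check works against the shard2 set.
          rs.status().members.forEach(function (m) { print(m.name + "  " + m.stateStr); });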

The logs below show that ConfigServerCatalogCacheLoader-0 on the newly elected PRIMARY shard2-2 is connected to the SECONDARY configsvr-1, even though configsvr-0 is PRIMARY. This seems odd. Is it really expected behavior?

This appears to have bad side effects (“Failed to refresh metadata…” errors) when that SECONDARY configsvr-1 is then rebooted.
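
For diagnosis, one way to see which config-server members the shard is holding outgoing pooled connections to (i.e., to confirm this node really is talking to configsvr-1) is connPoolStats on shard2-2; a rough sketch:

          // Run on the shard2-2 PRIMARY; the "hosts" sub-document of connPoolStats is
          // keyed by the remote host:port of each outgoing connection pool, so the
          // configsvr entries show which config members this node is connected to.
          var stats = db.adminCommand({ connPoolStats: 1 });
          Object.keys(stats.hosts).forEach(function (h) { print(h); });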

Logs:

All logs below are from the perspective of shard2-2:

//
// At this point:
//
//        fed-db-mongo-ram-configsvr-0  ==  PRIMARY
//        fed-db-mongo-ram-configsvr-1  ==  SECONDARY
//        fed-db-mongo-ram-configsvr-2  ==  SECONDARY
//
//        fed-db-mongo-ram-shard2-0     ==  SECONDARY
//        fed-db-mongo-ram-shard2-1     ==  PRIMARY
//        fed-db-mongo-ram-shard2-2     ==  SECONDARY
//
// Where shard2-2 correctly recognizes that configsvr-0 is PRIMARY and configsvr-1/configsvr-2 are SECONDARY:
// 
2022-04-03T02:34:13.200+0000 I NETWORK  [ReplicaSetMonitor-TaskExecutor-18] Updating host fed-db-mongo-ram-configsvr-0.fed-db-mongo-ram-configsvr-pods.fed-db-mongo-ram.svc.cluster.local:27017 state based on ismaster reply from ReplicaSet cnfRepSet. Self is primary: 1, primary is: fed-db-mongo-ram-configsvr-0.fed-db-mongo-ram-configsvr-pods.fed-db-mongo-ram.svc.cluster.local:27017, election ID: 7fffffff0000000000000008, ReplicaSet version: 3
2022-04-03T02:34:13.201+0000 I NETWORK  [ReplicaSetMonitor-TaskExecutor-18] Updating host fed-db-mongo-ram-configsvr-1.fed-db-mongo-ram-configsvr-pods.fed-db-mongo-ram.svc.cluster.local:27017 state based on ismaster reply from ReplicaSet cnfRepSet. Self is primary: 0, primary is: fed-db-mongo-ram-configsvr-0.fed-db-mongo-ram-configsvr-pods.fed-db-mongo-ram.svc.cluster.local:27017, election ID: 000000000000000000000000, ReplicaSet version: 3
2022-04-03T02:34:13.201+0000 I NETWORK  [ReplicaSetMonitor-TaskExecutor-18] Updating host fed-db-mongo-ram-configsvr-2.fed-db-mongo-ram-configsvr-pods.fed-db-mongo-ram.svc.cluster.local:27017 state based on ismaster reply from ReplicaSet cnfRepSet. Self is primary: 0, primary is: fed-db-mongo-ram-configsvr-0.fed-db-mongo-ram-configsvr-pods.fed-db-mongo-ram.svc.cluster.local:27017, election ID: 000000000000000000000000, ReplicaSet version: 3
//
// At this point:
//
//        fed-db-mongo-ram-configsvr-0  ==  PRIMARY
//        fed-db-mongo-ram-configsvr-1  ==  SECONDARY
//        fed-db-mongo-ram-configsvr-2  ==  SECONDARY
//
//        fed-db-mongo-ram-shard2-0     ==  SECONDARY
//        fed-db-mongo-ram-shard2-1     ==  PRIMARY   ---> reboot
//        fed-db-mongo-ram-shard2-2     ==  SECONDARY ---> stepUp ---> PRIMARY
//
// Where shard2-2 is quickly elected PRIMARY via replSetStepUp and is ready within ~1 second.
//
2022-04-03T02:34:17.515+0000 I COMMAND  [conn1] Received replSetStepUp request
2022-04-03T02:34:18.518+0000 I REPL     [rsSync-0] transition to primary complete; database writes are now permitted
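
For context, the "Received replSetStepUp request" line above corresponds to an explicit replSetStepUp issued against shard2-2, roughly like this (run from a shell connected to shard2-2):

          // Ask this SECONDARY to call an election immediately and take over as PRIMARY.
          db.adminCommand({ replSetStepUp: 1 });
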
//
// At this point, configsvr-1 is rebooted:
//
//        fed-db-mongo-ram-configsvr-0  ==  PRIMARY
//        fed-db-mongo-ram-configsvr-1  ==  SECONDARY ---> reboot
//        fed-db-mongo-ram-configsvr-2  ==  SECONDARY
//
//        fed-db-mongo-ram-shard2-0     ==  SECONDARY
//        fed-db-mongo-ram-shard2-1     ==  PRIMARY   ---> still rebooting
//        fed-db-mongo-ram-shard2-2     ==  PRIMARY
//
// Is it really expected that ConfigServerCatalogCacheLoader-0 was connected to the SECONDARY configsvr-1,
// given that configsvr-0 is PRIMARY?
//
2022-04-03T02:34:18.522+0000 I CONNPOOL [ShardRegistry] dropping unhealthy pooled connection to fed-db-mongo-ram-configsvr-1.fed-db-mongo-ram-configsvr-pods.fed-db-mongo-ram.svc.cluster.local:27017
2022-04-03T02:34:18.522+0000 I ASIO     [ShardRegistry] Connecting to fed-db-mongo-ram-configsvr-1.fed-db-mongo-ram-configsvr-pods.fed-db-mongo-ram.svc.cluster.local:27017
2022-04-03T02:34:38.522+0000 I ASIO     [ShardRegistry] Connecting to fed-db-mongo-ram-configsvr-1.fed-db-mongo-ram-configsvr-pods.fed-db-mongo-ram.svc.cluster.local:27017
2022-04-03T02:34:38.522+0000 I NETWORK  [ConfigServerCatalogCacheLoader-0] Marking host fed-db-mongo-ram-configsvr-1.fed-db-mongo-ram-configsvr-pods.fed-db-mongo-ram.svc.cluster.local:27017 as failed :: caused by :: NetworkInterfaceExceededTimeLimit: Couldn't get a connection within the time limit
2022-04-03T02:34:38.522+0000 I SHARDING [ConfigServerCatalogCacheLoader-0] Operation timed out  :: caused by :: NetworkInterfaceExceededTimeLimit: Couldn't get a connection within the time limit
2022-04-03T02:34:38.522+0000 I SH_REFR  [ConfigServerCatalogCacheLoader-0] Refresh for database upf_UPF-CL11 took 20000 ms and failed :: caused by :: NetworkInterfaceExceededTimeLimit: Couldn't get a connection within the time limit
2022-04-03T02:34:38.523+0000 I ASIO     [ShardRegistry] Failed to connect to fed-db-mongo-ram-configsvr-1.fed-db-mongo-ram-configsvr-pods.fed-db-mongo-ram.svc.cluster.local:27017 - HostUnreachable: Error connecting to fed-db-mongo-ram-configsvr-1.fed-db-mongo-ram-configsvr-pods.fed-db-mongo-ram.svc.cluster.local:27017 :: caused by :: Could not find address for fed-db-mongo-ram-configsvr-1.fed-db-mongo-ram-configsvr-pods.fed-db-mongo-ram.svc.cluster.local:27017: SocketException: Host not found (authoritative)
2022-04-03T02:34:38.523+0000 I CONNPOOL [ShardRegistry] Dropping all pooled connections to fed-db-mongo-ram-configsvr-1.fed-db-mongo-ram-configsvr-pods.fed-db-mongo-ram.svc.cluster.local:27017 due to HostUnreachable: Error connecting to fed-db-mongo-ram-configsvr-1.fed-db-mongo-ram-configsvr-pods.fed-db-mongo-ram.svc.cluster.local:27017 :: caused by :: Could not find address for fed-db-mongo-ram-configsvr-1.fed-db-mongo-ram-configsvr-pods.fed-db-mongo-ram.svc.cluster.local:27017: SocketException: Host not found (authoritative)
//
// At this point, configsvr-1 is still rebooting (above):
//
//        fed-db-mongo-ram-configsvr-0  ==  PRIMARY
//        fed-db-mongo-ram-configsvr-1  ==  SECONDARY ---> rebooting
//        fed-db-mongo-ram-configsvr-2  ==  SECONDARY
//
//        fed-db-mongo-ram-shard2-0     ==  SECONDARY
//        fed-db-mongo-ram-shard2-1     ==  PRIMARY   ---> still rebooting
//        fed-db-mongo-ram-shard2-2     ==  PRIMARY
//
// A series of these logs appears for about 0.2 seconds: attempts to reach the dead SECONDARY configsvr-1?
//
2022-04-03T02:34:38.535+0000 I SHARDING [conn543] Failed to refresh metadata for collection...blah... :: caused by :: NetworkInterfaceExceededTimeLimit: Couldn't get a connection within the time limit
2022-04-03T02:34:38.535+0000 I SHARDING [conn543] Failed to handle stale version exception :: caused by :: NetworkInterfaceExceededTimeLimit: Couldn't get a connection within the time limit

... repeat ...

2022-04-03T02:34:38.550+0000 I SH_REFR  [ConfigServerCatalogCacheLoader-0] Refresh for collection ...blah... to version 1|5||6248fb74821589a79f4590d9 took 2 ms
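
As an aside, rather than grepping the logs for these transient refresh failures, the catalog-cache counters under serverStatus().shardingStatistics (present on 4.0 shard members, as far as I can tell) seem handier for tracking how often refreshes fail; for example:

          // Run on the shard PRIMARY; dumps the catalog cache refresh/failure counters.
          db.serverStatus().shardingStatistics.catalogCache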