Issue with secondary replica set member syncing: NetworkInterfaceExceededTimeLimit: timed out

Hi,

I’m using mongo 4.0.1-rc0-2-g54f1582fc6. I have a cluster with 2 shards and 1 config server, each of which is a 3-node replica set (a standard cluster). The primary shard is the first shard, where most of the data resides.

When I try to start a secondary replica of the first shard, I get a timeout error.

This is on the secondary replica of the first shard:

1:STARTUP2> rs.printSlaveReplicationInfo()
source: 172.31.61.174:27001
syncedTo: Thu Jan 01 1970 00:00:00 GMT+0000 (UTC)
1672216775 secs (464504.66 hrs) behind the primary
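
For reference, this is the kind of check I run on the syncing member to see its state (standard rs.status() fields; just a sketch, not actual output from my cluster):

// On the syncing secondary: print each member's name, state, and last applied optime
rs.status().members.forEach(function (m) {
  print(m.name, m.stateStr, tojson(m.optimeDate));
});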

Attaching the sample log below:

2022-12-28T08:25:54.559+0000 I -        [repl writer worker 12]   k2db.cvis collection clone progress: 28273932/277825445 10% (documents copied)
2022-12-28T08:26:18.757+0000 I ASIO     [ShardRegistry] Connecting to 172.31.2.161:27001
2022-12-28T08:26:56.983+0000 I -        [repl writer worker 12]   k2db.cvis collection clone progress: 34701253/277825445 12% (documents copied)
2022-12-28T08:27:18.740+0000 I NETWORK  [listener] connection accepted from 127.0.0.1:36958 #67 (13 connections now open)
2022-12-28T08:27:18.741+0000 I NETWORK  [conn67] received client metadata from 127.0.0.1:36958 conn67: { application: { name: "MongoDB Shell" }, driver: { name: "MongoDB Internal Client", version: "4.0.1-rc0-2-g54f1582fc6" }, os: { type: "Linux", name: "CentOS Linux release 7.6.1810 (Core) ", architecture: "x86_64", version: "Kernel 3.10.0-1160.81.1.el7.x86_64" } }
2022-12-28T08:27:56.240+0000 I -        [repl writer worker 12]   k2db.cvis collection clone progress: 39948059/277825445 14% (documents copied)
2022-12-28T08:28:59.657+0000 I -        [repl writer worker 12]   k2db.cvis collection clone progress: 46496982/277825445 16% (documents copied)
2022-12-28T08:29:06.112+0000 I NETWORK  [listener] connection accepted from 127.0.0.1:36960 #68 (14 connections now open)
2022-12-28T08:29:06.113+0000 I NETWORK  [conn68] received client metadata from 127.0.0.1:36960 conn68: { application: { name: "MongoDB Shell" }, driver: { name: "MongoDB Internal Client", version: "4.0.1-rc0-2-g54f1582fc6" }, os: { type: "Linux", name: "CentOS Linux release 7.6.1810 (Core) ", architecture: "x86_64", version: "Kernel 3.10.0-1160.81.1.el7.x86_64" } }
2022-12-28T08:29:06.113+0000 I NETWORK  [conn68] end connection 127.0.0.1:36960 (13 connections now open)
2022-12-28T08:29:06.115+0000 I NETWORK  [conn55] end connection 127.0.0.1:36934 (12 connections now open)
2022-12-28T08:30:08.769+0000 I -        [repl writer worker 12]   k2db.cvis collection clone progress: 53216848/277825445 19% (documents copied)
2022-12-28T08:30:18.754+0000 I SHARDING [LogicalSessionCacheReap] Refreshing cached database entry for config; current cached database info is {}
2022-12-28T08:30:48.754+0000 I NETWORK  [ShardRegistry] Marking host 172.31.2.161:27001 as failed :: caused by :: NetworkInterfaceExceededTimeLimit: timed out
2022-12-28T08:30:48.754+0000 I SHARDING [ShardServerCatalogCacheLoader-1] Operation timed out with status NetworkInterfaceExceededTimeLimit: timed out
2022-12-28T08:30:48.754+0000 I ASIO     [ShardRegistry] Ending connection to host 172.31.2.161:27001 due to bad connection status; 9 connections to that host remain open
2022-12-28T08:30:48.754+0000 I SHARDING [ShardServerCatalogCacheLoader-1] Refresh for database config took 30000 ms and failed :: caused by :: NetworkInterfaceExceededTimeLimit: timed out
2022-12-28T08:30:48.754+0000 I CONTROL  [LogicalSessionCacheReap] Sessions collection is not set up; waiting until next sessions reap interval: timed out
2022-12-28T08:30:48.754+0000 I CONTROL  [LogicalSessionCacheRefresh] Sessions collection is not set up; waiting until next sessions refresh interval: timed out
2022-12-28T08:31:13.205+0000 I -        [repl writer worker 12]   k2db.cvis collection clone progress: 60086778/277825445 21% (documents copied)
2022-12-28T08:31:18.760+0000 I ASIO     [ShardRegistry] Connecting to 172.31.2.161:27001
2022-12-28T08:31:20.107+0000 I ASIO     [ShardRegistry] Connecting to 172.31.80.10:27005
2022-12-28T08:31:20.107+0000 I ASIO     [ShardRegistry] Connecting to 172.31.80.10:27005
2022-12-28T08:31:20.107+0000 I ASIO     [ShardRegistry] Connecting to 172.31.80.10:27005
2022-12-28T08:31:20.107+0000 I ASIO     [ShardRegistry] Connecting to 172.31.80.10:27005
2022-12-28T08:31:20.107+0000 I ASIO     [ShardRegistry] Connecting to 172.31.80.10:27005
2022-12-28T08:31:20.107+0000 I ASIO     [ShardRegistry] Connecting to 172.31.80.10:27005
2022-12-28T08:31:20.107+0000 I ASIO     [ShardRegistry] Connecting to 172.31.80.10:27005
2022-12-28T08:31:20.107+0000 I ASIO     [ShardRegistry] Connecting to 172.31.80.10:27005
2022-12-28T08:31:20.107+0000 I ASIO     [ShardRegistry] Connecting to 172.31.80.10:27005
2022-12-28T08:31:20.107+0000 I ASIO     [ShardRegistry] Connecting to 172.31.80.10:27005
2022-12-28T08:32:14.946+0000 I -        [repl writer worker 12]   k2db.cvis collection clone progress: 65520518/277825445 23% (documents copied)
2022-12-28T08:33:23.297+0000 I -        [repl writer worker 12]   k2db.cvis collection clone progress: 72149091/277825445 25% (documents copied)

I have gone through the link here, but it did not resolve my error.

Any help would be much appreciated 🙂

Hi @Vijay_Rajpurohit and welcome to the MongoDB community forum!!

MongoDB 4.0 is a very old version, and the “rc” suffix indicates a Release Candidate (not a final release). I would recommend upgrading to the latest 4.0 release, which is 4.0.28, or to a supported version (4.2 or newer) for major bug fixes and improvements.
Also, please note that minor upgrades within 4.0.x do not introduce any backward-compatibility changes.
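
Before upgrading, it is worth confirming the exact binary version and featureCompatibilityVersion on each member. A minimal sketch using standard shell commands (nothing here is specific to your deployment):

// Run against each member before upgrading
db.version()
db.adminCommand({ getParameter: 1, featureCompatibilityVersion: 1 })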

Config servers store essential metadata for a sharded cluster, including which shards own which ranges of data for sharded collections. If something happens to a single config server, the likely recovery path is restoring the entire sharded cluster from a backup. For a production environment specifically, the recommendation is to run the config server as a replica set with multiple members so it can tolerate failures.
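
As a quick sanity check, you can confirm which members the config server replica set is configured with; a sketch, assuming you can connect to a config server member directly:

// On a config server member: list the configured replica set members
rs.conf().members.forEach(function (m) {
  print(m._id, m.host);
});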

Since it is possible that you’re seeing the effect of an already-fixed issue in the version you’re using, upgrading the server to the latest supported version would be my first step. If the issue persists after the upgrade, could you share the output of rs.status() and sh.status() from the primary members of the replica sets and from a mongos, respectively?
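
For reference, these are the commands to run (the first on each shard’s primary, the second connected to a mongos):

// On the primary of each shard's replica set:
rs.status()

// On a mongos:
sh.status()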

Best Regards
Aasawari
