Restore of sharded cluster causing connections to hang with timeout

Hello,

We started seeing this issue after upgrading to 3.6, and it has continued into 4.0. When we restore a sharded cluster whose config servers are set up as a 3-node replica set, all activity from mongos times out.

The error message in the config server logs is

2021-04-08T16:24:47.251+0000 I COMMAND [conn470] Command on database config timed out waiting for read concern to be satisfied.
Command: { find: "databases", filter: { _id: "b2b" }, readConcern: { level: "majority", afterOpTime: { ts: Timestamp(1617856808, 1), t: 39 } }, maxTimeMS: 30000, $readPreference: { mode: "nearest" }, $replData: 1, $clusterTime: { clusterTime: Timestamp(1617899057, 5), signature: { hash: BinData(0, AD61160760CD2170230E457CFC08DF2D056E92E5), keyId: 6932855326379082068 } }, $configServerState: { opTime: { ts: Timestamp(1617856808, 1), t: 39 } }, $db: "config" }. Info: MaxTimeMSExpired: Error waiting for snapshot not less than { ts: Timestamp(1617856808, 1), t: 39 }, current relevant optime is { ts: Timestamp(1617899083, 2), t: 35 }. :: caused by :: operation exceeded time limit
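
For reference, the optime the config servers actually consider majority-committed (the "current relevant optime" above) can be read from the mongo shell on the config replica set primary; these are standard rs.status() fields:

    // On the CSRS primary: the majority-committed optime
    rs.status().optimes.readConcernMajorityOpTime
    // in our case this lines up with the log: { ts: Timestamp(1617899083, 2), t: 35 }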

This persists for hours after the restore; numerous restarts of the mongos processes, config servers, and mongods will eventually clear it out. The error is confusing since it is waiting for a snapshot not less than timestamp 1617856808 while the current optime's timestamp, 1617899083, is already greater. Note, though, that the requested optime carries term t: 39 while the current one carries t: 35, and the wait's behavior suggests the term is compared before the timestamp. The restore appears to put the config replica set back at a lower term, while mongos keeps gossiping the pre-restore optime via $configServerState, so the restored servers can never satisfy it; that would also explain why restarting everything (and thereby clearing the cached optime) eventually fixes it.
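
A minimal sketch of an optime ordering consistent with that behavior, purely as an illustration (opTimeLessThan is a hypothetical helper, not MongoDB source): compare terms first, and timestamps only when the terms are equal.

    // Hypothetical term-before-timestamp ordering
    function opTimeLessThan(a, b) {
      return a.t < b.t || (a.t === b.t && a.ts < b.ts);
    }
    // current optime vs. the optime mongos keeps requesting:
    opTimeLessThan({ ts: 1617899083, t: 35 }, { ts: 1617856808, t: 39 })  // true
    // so the current optime is always "less than" the target and the wait never completes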

On the config database I am able to create and write to collections, and replication shows all nodes in sync.
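
One way to confirm that from the shell, using the replication helper available in the 3.6/4.0-era shell:

    rs.printSlaveReplicationInfo()  // prints each secondary's replication lag relative to the primary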

We are facing the same issue as well. Did you find a solution?