Initial sync stuck at STARTUP2

Currently I am running some restore tests on a sharded cluster. I have 4 shards; each shard is configured as PSA (PRIMARY, SECONDARY, ARBITER). MongoDB version is 5.0.

I removed one SECONDARY, stopped it, deleted all data files, started it again and re-added it to the replica set. Initial sync starts normally:

rs.add({ host: "d-mipmdb-sh2-03:27018", priority: 0, votes: 0})
rs.add({ host: "d-mipmdb-arb-01:28003", arbiterOnly: true})

rs.printSecondaryReplicationInfo()
source: d-mipmdb-sh2-03:27018
	syncedTo: Thu Jan 01 1970 01:00:00 GMT+0100 (W. Europe Standard Time)
	1651134828 secs (458648.56 hrs) behind the primary 

rs.printReplicationInfo()
configured oplog size:   51200MB
log length start to end: 10872secs (3.02hrs)
oplog first event time:  Thu Apr 28 2022 07:32:38 GMT+0200 (W. Europe Daylight Time)
oplog last event time:   Thu Apr 28 2022 10:33:50 GMT+0200 (W. Europe Daylight Time)
now:                     Thu Apr 28 2022 10:33:50 GMT+0200 (W. Europe Daylight Time)


rs.status()

{
	"set" : "shard_03",
	"date" : ISODate("2022-04-28T08:34:33.200Z"),
	"myState" : 1,
	"term" : NumberLong(4),
	"syncSourceHost" : "",
	"syncSourceId" : -1,
	"heartbeatIntervalMillis" : NumberLong(2000),
	"majorityVoteCount" : 2,
	"writeMajorityCount" : 1,
	"votingMembersCount" : 2,
	"writableVotingMembersCount" : 1,
...
	"members" : [
		{
			"_id" : 0,
			"name" : "d-mipmdb-sh1-03:27018",
			"health" : 1,
			"state" : 1,
			"stateStr" : "PRIMARY",
			"uptime" : 124629,
			"optime" : {
				"ts" : Timestamp(1651134873, 3000),
				"t" : NumberLong(4)
			},
			"optimeDate" : ISODate("2022-04-28T08:34:33Z"),
			"syncSourceHost" : "",
			"syncSourceId" : -1,
			"infoMessage" : "",
			"electionTime" : Timestamp(1651010266, 6734),
			"electionDate" : ISODate("2022-04-26T21:57:46Z"),
			"configVersion" : 34,
			"configTerm" : 4,
			"self" : true,
			"lastHeartbeatMessage" : ""
		},
		{
			"_id" : 1,
			"name" : "d-mipmdb-sh2-03:27018",
			"health" : 1,
			"state" : 5,
			"stateStr" : "STARTUP2",
			"uptime" : 54,
			"optime" : {
				"ts" : Timestamp(0, 0),
				"t" : NumberLong(-1)
			},
			"optimeDurable" : {
				"ts" : Timestamp(0, 0),
				"t" : NumberLong(-1)
			},
			"optimeDate" : ISODate("1970-01-01T00:00:00Z"),
			"optimeDurableDate" : ISODate("1970-01-01T00:00:00Z"),
			"lastHeartbeat" : ISODate("2022-04-28T08:34:32.657Z"),
			"lastHeartbeatRecv" : ISODate("2022-04-28T08:34:32.816Z"),
			"pingMs" : NumberLong(0),
			"lastHeartbeatMessage" : "",
			"syncSourceHost" : "d-mipmdb-sh1-03:27018",
			"syncSourceId" : 0,
			"infoMessage" : "",
			"configVersion" : 34,
			"configTerm" : 4
		},
...

After around 2 hours the initial sync seems to be finished; however, the status remains at STARTUP2 even though, according to rs.status().initialSyncStatus, it's all done. All databases and collections are cloned. The total file size is around 300 GB on each shard. The mongod log files do not show any warnings or errors.
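To double-check the logs from the shell, the getLog command can be filtered for initial-sync messages (a sketch; the regex is only my guess at the relevant message text):

var log = db.adminCommand({ getLog: "global" }).log   // recent in-memory log lines
log.filter(l => /initial sync/i.test(l)).forEach(l => print(l))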

shard_03:STARTUP2> rs.status().initialSyncStatus
{
        "failedInitialSyncAttempts" : 0,
        "maxFailedInitialSyncAttempts" : 10,
        "initialSyncStart" : ISODate("2022-04-28T08:33:36.519Z"),
        "totalInitialSyncElapsedMillis" : 6651090,
        "initialSyncAttempts" : [ ],
        "approxTotalDataSize" : NumberLong("488957278026"),
        "approxTotalBytesCopied" : NumberLong("519638502008"),
        "remainingInitialSyncEstimatedMillis" : -392702,
        "appliedOps" : 1710001,
        "initialSyncOplogStart" : Timestamp(1651134817, 16004),
        "initialSyncOplogEnd" : Timestamp(1651141388, 1000),
        "totalTimeUnreachableMillis" : NumberLong(0),
        "databases" : {
                "databasesToClone" : 0,
                "databasesCloned" : 7,
...
                "data" : {
                        "collections" : 10,
                        "clonedCollections" : 10,
                        "start" : ISODate("2022-04-28T08:33:39.732Z"),
                        "end" : ISODate("2022-04-28T09:30:12.759Z"),
                        "elapsedMillis" : 3393027,
                        "data.sessions.20220423" : {
                                "documentsToCopy" : 13257930,
                                "documentsCopied" : 13257930,
                                "indexes" : 5,
                                "fetchedBatches" : 3340,
                                "bytesToCopy" : NumberLong("55166789435"),
                                "approxBytesCopied" : NumberLong("55166246730"),
                                "start" : ISODate("2022-04-28T08:33:39.733Z"),
                                "end" : ISODate("2022-04-28T08:40:48.082Z"),
                                "elapsedMillis" : 428349,
                                "receivedBatches" : 3340
                        },
...
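Interestingly, approxTotalBytesCopied already exceeds approxTotalDataSize (the data presumably kept growing during the clone), which would explain the negative remainingInitialSyncEstimatedMillis. The cloning itself looks complete; a quick check in the shell (a sketch):

var iss = rs.status().initialSyncStatus
// nothing left to clone and no failed attempts so far
print("databases still to clone: " + iss.databases.databasesToClone)
print("failed sync attempts: " + iss.failedInitialSyncAttempts + " of max " + iss.maxFailedInitialSyncAttempts)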

rs.status()
{
	"set" : "shard_03",
	"date" : ISODate("2022-04-28T10:27:41.180Z"),
	"myState" : 1,
	"term" : NumberLong(4),
	"syncSourceHost" : "",
	"syncSourceId" : -1,
	"heartbeatIntervalMillis" : NumberLong(2000),
	"majorityVoteCount" : 2,
	"writeMajorityCount" : 1,
	"votingMembersCount" : 2,
	"writableVotingMembersCount" : 1,
	"lastStableRecoveryTimestamp" : Timestamp(1651141648, 21620),
	"members" : [
		{
			"_id" : 0,
			"name" : "d-mipmdb-sh1-03:27018",
			"health" : 1,
			"state" : 1,
			"stateStr" : "PRIMARY",
			"uptime" : 131417,
			"optime" : {
				"ts" : Timestamp(1651141661, 6000),
				"t" : NumberLong(4)
			},
			"optimeDate" : ISODate("2022-04-28T10:27:41Z"),
			"syncSourceHost" : "",
			"syncSourceId" : -1,
			"infoMessage" : "",
			"electionTime" : Timestamp(1651010266, 6734),
			"electionDate" : ISODate("2022-04-26T21:57:46Z"),
			"configVersion" : 34,
			"configTerm" : 4,
			"self" : true,
			"lastHeartbeatMessage" : ""
		},
		{
			"_id" : 1,
			"name" : "d-mipmdb-sh2-03:27018",
			"health" : 1,
			"state" : 5,
			"stateStr" : "STARTUP2",
			"uptime" : 6842,
			"optime" : {
				"ts" : Timestamp(0, 0),
				"t" : NumberLong(-1)
			},
			"optimeDurable" : {
				"ts" : Timestamp(0, 0),
				"t" : NumberLong(-1)
			},
			"optimeDate" : ISODate("1970-01-01T00:00:00Z"),
			"optimeDurableDate" : ISODate("1970-01-01T00:00:00Z"),
			"lastHeartbeat" : ISODate("2022-04-28T10:27:40.845Z"),
			"lastHeartbeatRecv" : ISODate("2022-04-28T10:27:40.137Z"),
			"pingMs" : NumberLong(0),
			"lastHeartbeatMessage" : "",
			"syncSourceHost" : "d-mipmdb-sh1-03:27018",
			"syncSourceId" : 0,
			"infoMessage" : "",
			"configVersion" : 34,
			"configTerm" : 4
		},
		{
			"_id" : 2,
			"name" : "d-mipmdb-arb-01:28003",
			"health" : 1,
			"state" : 7,
			"stateStr" : "ARBITER",
			"uptime" : 6840,
			"lastHeartbeat" : ISODate("2022-04-28T10:27:39.547Z"),
			"lastHeartbeatRecv" : ISODate("2022-04-28T10:27:40.681Z"),
			"pingMs" : NumberLong(0),
			"lastHeartbeatMessage" : "",
			"syncSourceHost" : "",
			"syncSourceId" : -1,
			"infoMessage" : "",
			"configVersion" : 34,
			"configTerm" : 4
		}
	],
...

Do I have to wait even longer? According to my understanding, if the sync takes more than 3 hours (the current oplog window), the replica set member has fallen too far behind the primary's oplog and will get stuck.
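The 3 hours are the current oplog window on the PRIMARY; it can also be read programmatically (a sketch; db.getReplicationInfo() returns the same numbers that rs.printReplicationInfo() printed above):

var win = db.getReplicationInfo()   // run on the PRIMARY
// the initial sync (clone + oplog apply) has to finish before this window is exhausted
print("oplog window: " + win.timeDiffHours + " hrs (" + win.timeDiff + " secs), configured size: " + win.logSizeMB + " MB")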

Note that the application has a fairly high ongoing load: it inserts around 100,000 documents every second, and this shard carries around 30% of them. For the same reason I cannot use "Sync by Copying Data Files from Another Member"; I would need to stop the PRIMARY for 1-2 hours, which is not acceptable.

Update:

Actually, it looks like the oplog is being applied. However, I don't see any progress: the optimeDate is growing but never seems to reach the current time:

shard_03:STARTUP2> rs.status().members.filter(x => x.stateStr == 'STARTUP2').forEach(x => printjson({
        date: rs.status().date,
        name: x.name,
        stateStr: x.stateStr,
        optimeDate: x.optimeDate,
        oplog_BacklogSeconds: (rs.status().date - x.optimeDate) / 1000
    }))
{
        "date" : ISODate("2022-04-28T10:49:12.683Z"),
        "name" : "d-mipmdb-sh2-03:27018",
        "stateStr" : "STARTUP2",
        "optimeDate" : ISODate("2022-04-28T09:01:40Z"),
        "oplog_BacklogSeconds" : 6452.683
}

oplog_BacklogSeconds should decrease, but it stays at around 6452 seconds.
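To rule out that it only looks stalled because of when I sample it, a small loop can poll the backlog (a sketch, one sample per minute):

for (var i = 0; i < 10; i++) {
    var s = rs.status()
    s.members
        .filter(x => x.stateStr == 'STARTUP2')
        .forEach(x => print(s.date.toISOString() + "  " + x.name + "  backlog: " + (s.date - x.optimeDate) / 1000 + " secs"))
    sleep(60000)   // shell helper: pause 60 seconds between samples
}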

Kind Regards
Wernfried

Hard to believe, but I just needed to be patient. After 3 hours the replica set member properly went to SECONDARY state and everything seems to run fine.

Kind Regards
Wernfried