readPreference=secondary routes all requests to only one secondary

Currently I have set up a simple replica set with three nodes, and the readPreference value is “secondary”. Monitoring CPU usage on the two secondaries, I notice one is under 8% and the other is almost 100%. My question is: why does such a difference exist? Shouldn't it be roughly 50-50? If this is not normal, what is the possible reason for it, and what would be the solution to fix our problem? Thank you.

Hi,
can you share your replica set config (rs.config()) and the code where you specified the readPreference? You may have additional parameters or tags that affect query routing. Also check whether both secondary replica set members are in a healthy state with rs.status().
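If it helps, here is a quick mongosh sketch for that check; it only uses fields that rs.status() and rs.conf() already return (name, stateStr, health, host, priority, tags):

// summarise member health from rs.status()
rs.status().members.map(m => ({ name: m.name, stateStr: m.stateStr, health: m.health }))

// and the effective config, including any tags that could influence routing
rs.conf().members.map(m => ({ host: m.host, priority: m.priority, tags: m.tags }))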

1 Like

Most drivers and programs use connection pooling, and they settle on one node after the initial connection procedure.

That node can be different for different clients, but a single client will keep its pool to a single node, so the load won't be 50-50 unless you have many different programs connected to the server.
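One way to check this rather than assume it is the driver's command monitoring. A minimal Node.js sketch, assuming your connection string is in MONGO_URI and using an assumed test.example namespace; the event's address field should be the host:port that received the command, though the exact property name can differ between driver versions:

// log which replica set member each find command is sent to
import { MongoClient } from 'mongodb'

const client = new MongoClient(process.env.MONGO_URI, { monitorCommands: true })
client.on('commandStarted', (event) => {
  if (event.commandName === 'find') console.log(`find -> ${event.address}`)
})

await client.connect()
await client.db('test').collection('example').findOne({})  // any read will do
await client.close()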

1 Like

Hi, thank you for your response. Currently there's another team managing this microservice. This is the connection string I gave them:

mongo_uri=mongodb://user:pass@db1:27000,db2:27000,db3:27000/?replicaSet=replicaset-remote&authSource=admin&readPreference=secondary&maxStalenessSeconds=120

but in their code they are using these options:

import { MongoClient, ReadPreference } from 'mongodb'

const mongo = await new MongoClient(mongo_uri, {
   ignoreUndefined: true,
   readPreference: ReadPreference.SECONDARY_PREFERRED,   // overrides readPreference from the connection string
}).connect()

I noticed there's a conflict in readPreference, but I haven't had the chance to fix it yet due to role restrictions. Do you think this may be the cause of the problem? Also, this is the result of rs.status(), if that helps. Thank you

{
	"set" : "replicaset-remote",
	"date" : ISODate("2022-07-12T11:36:15.052Z"),
	"myState" : 1,
	"term" : NumberLong(9),
	"syncSourceHost" : "",
	"syncSourceId" : -1,
	"heartbeatIntervalMillis" : NumberLong(2000),
	"majorityVoteCount" : 2,
	"writeMajorityCount" : 2,
	"votingMembersCount" : 3,
	"writableVotingMembersCount" : 3,
	"optimes" : {
		"lastCommittedOpTime" : {
			"ts" : Timestamp(1657625775, 17),
			"t" : NumberLong(9)
		},
		"lastCommittedWallTime" : ISODate("2022-07-12T11:36:15.046Z"),
		"readConcernMajorityOpTime" : {
			"ts" : Timestamp(1657625775, 17),
			"t" : NumberLong(9)
		},
		"readConcernMajorityWallTime" : ISODate("2022-07-12T11:36:15.046Z"),
		"appliedOpTime" : {
			"ts" : Timestamp(1657625775, 17),
			"t" : NumberLong(9)
		},
		"durableOpTime" : {
			"ts" : Timestamp(1657625775, 17),
			"t" : NumberLong(9)
		},
		"lastAppliedWallTime" : ISODate("2022-07-12T11:36:15.046Z"),
		"lastDurableWallTime" : ISODate("2022-07-12T11:36:15.046Z")
	},
	"lastStableRecoveryTimestamp" : Timestamp(1657625753, 4),
	"electionCandidateMetrics" : {
		"lastElectionReason" : "stepUpRequestSkipDryRun",
		"lastElectionDate" : ISODate("2021-12-26T10:07:29.763Z"),
		"electionTerm" : NumberLong(9),
		"lastCommittedOpTimeAtElection" : {
			"ts" : Timestamp(1640513247, 1),
			"t" : NumberLong(8)
		},
		"lastSeenOpTimeAtElection" : {
			"ts" : Timestamp(1640513247, 1),
			"t" : NumberLong(8)
		},
		"numVotesNeeded" : 2,
		"priorityAtElection" : 1,
		"electionTimeoutMillis" : NumberLong(10000),
		"priorPrimaryMemberId" : 0,
		"numCatchUpOps" : NumberLong(0),
		"newTermStartDate" : ISODate("2021-12-26T10:07:29.778Z"),
		"wMajorityWriteAvailabilityDate" : ISODate("2021-12-26T10:07:32.894Z")
	},
	"members" : [
		{
			"_id" : 0,
			"name" : "",
			"health" : 1,
			"state" : 1,
			"stateStr" : "PRIMARY",
			"uptime" : 17113068,
			"optime" : {
				"ts" : Timestamp(1657625775, 17),
				"t" : NumberLong(9)
			},
			"optimeDate" : ISODate("2022-07-12T11:36:15Z"),
			"syncSourceHost" : "",
			"syncSourceId" : -1,
			"infoMessage" : "",
			"electionTime" : Timestamp(1640513249, 1),
			"electionDate" : ISODate("2021-12-26T10:07:29Z"),
			"configVersion" : 3,
			"configTerm" : 9,
			"self" : true,
			"lastHeartbeatMessage" : ""
		},
		{
			"_id" : 1,
			"name" : "",
			"health" : 1,
			"state" : 2,
			"stateStr" : "SECONDARY",
			"uptime" : 15477834,
			"optime" : {
				"ts" : Timestamp(1657625773, 151),
				"t" : NumberLong(9)
			},
			"optimeDurable" : {
				"ts" : Timestamp(1657625773, 151),
				"t" : NumberLong(9)
			},
			"optimeDate" : ISODate("2022-07-12T11:36:13Z"),
			"optimeDurableDate" : ISODate("2022-07-12T11:36:13Z"),
			"lastHeartbeat" : ISODate("2022-07-12T11:36:13.314Z"),
			"lastHeartbeatRecv" : ISODate("2022-07-12T11:36:14.588Z"),
			"pingMs" : NumberLong(0),
			"lastHeartbeatMessage" : "",
			"syncSourceHost" : "",
			"syncSourceId" : 0,
			"infoMessage" : "",
			"configVersion" : 3,
			"configTerm" : 9
		},
		{
			"_id" : 2,
			"name" : "",
			"health" : 1,
			"state" : 2,
			"stateStr" : "SECONDARY",
			"uptime" : 15477759,
			"optime" : {
				"ts" : Timestamp(1657625774, 7),
				"t" : NumberLong(9)
			},
			"optimeDurable" : {
				"ts" : Timestamp(1657625774, 7),
				"t" : NumberLong(9)
			},
			"optimeDate" : ISODate("2022-07-12T11:36:14Z"),
			"optimeDurableDate" : ISODate("2022-07-12T11:36:14Z"),
			"lastHeartbeat" : ISODate("2022-07-12T11:36:14.093Z"),
			"lastHeartbeatRecv" : ISODate("2022-07-12T11:36:14.830Z"),
			"pingMs" : NumberLong(0),
			"lastHeartbeatMessage" : "",
			"syncSourceHost" : "",
			"syncSourceId" : 1,
			"infoMessage" : "",
			"configVersion" : 3,
			"configTerm" : 9
		}
	],
	"ok" : 1,
	"$clusterTime" : {
		"clusterTime" : Timestamp(1657625775, 17),
		"signature" : {
			"hash" : BinData(0,"udjGeYIefRN886XFy4HW8chzJeY="),
			"keyId" : NumberLong("7086458178816180233")
		}
	},
	"operationTime" : Timestamp(1657625775, 17)
}

I see you are using maxStalenessSeconds=120

mongo_uri=mongodb://user:pass@db1:27000,db2:27000,db3:27000/?replicaSet=replicaset-remote&authSource=admin&readPreference=secondary&maxStalenessSeconds=120

The read preference maxStalenessSeconds option lets you specify a maximum replication lag, or “staleness”, for reads from secondaries. When a secondary’s estimated staleness exceeds maxStalenessSeconds, the client stops using it for read operations.

Can you also run:

rs.printSecondaryReplicationInfo()

Maybe one secondary is lagging behind the primary.
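For reference, the same lag numbers can be computed in mongosh from replSetGetStatus, using only fields that already appear in the rs.status() output above (a sketch, not a replacement for rs.printSecondaryReplicationInfo()):

// replication lag per secondary, in seconds, relative to the primary's optimeDate
const status = db.adminCommand({ replSetGetStatus: 1 })
const primary = status.members.find(m => m.stateStr === 'PRIMARY')
status.members
  .filter(m => m.stateStr === 'SECONDARY')
  .forEach(m => print(`${m.name}: ${(primary.optimeDate - m.optimeDate) / 1000}s behind`))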

2 Likes

Thank you. If that's the case, it's the first time I've seen the maxStalenessSeconds option in action, and it's very interesting.
When I ran your command, I saw that the 8% and 100% CPU-usage secondaries are 2 and 1 seconds behind the primary, respectively. So it's not because of that option, right?

Assumptions without actual experimentation are not good, so try these first.

Do you have access to your server and admin credentials?
You can try lowering the connection limit, restarting the server, and checking the usage again with 3-5 clients connected:

  • Add net.maxIncomingConnections: 250 to the config file.
  • 250 is not a special number; I haven't tried it, but it should be fine for pooling from 2-3 clients.

What is your server version? Use mongod --version.

  • Versions 4.4 and onwards have “Hedged Reads”: your queries are sent to two secondaries and the fastest answer is returned to the user.
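If you cannot run mongod --version on the database host itself, the server version is also visible from any connected shell, for example:

db.version()
// or
db.adminCommand({ buildInfo: 1 }).version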

ReadPreference.SECONDARY_PREFERRED is not a conflict; you can tell them to drop the _PREFERRED part. This preference will first try all secondary nodes for a connection, and if they are too busy to respond (connections at the limit), it will then ask the primary. With plain SECONDARY, it would just return an error instead. Which one do you prefer?
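For illustration, a minimal sketch of the strict variant, assuming the same mongo_uri as above. As far as I know, options passed to MongoClient take precedence over those in the connection string, so either drop the option entirely (letting readPreference=secondary from the URI apply) or set it explicitly:

import { MongoClient, ReadPreference } from 'mongodb'

const mongo = await new MongoClient(mongo_uri, {
  ignoreUndefined: true,
  readPreference: ReadPreference.SECONDARY,   // strict: errors if no eligible secondary is available
}).connect()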

1 Like

I asked the teams using our database, and this may be the reason. There is currently only one very active microservice, which is responsible for 95% of the queries to our database. Maybe that's why most of the queries go to only one of the secondaries?

I would suggest you create your own scripted read workload that runs for about a minute, and run at least 3-5 copies of it to benchmark your cluster. This way you should see reads distributed across both secondaries. A Node.js or Python script would be the fastest choice, I guess.
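A hedged sketch of such a script with the Node.js driver (the MONGO_URI environment variable and the test.bench namespace are assumptions; run several copies in parallel and watch CPU on both secondaries while it runs):

import { MongoClient, ReadPreference } from 'mongodb'

const client = new MongoClient(process.env.MONGO_URI, { readPreference: ReadPreference.SECONDARY })
await client.connect()
const coll = client.db('test').collection('bench')   // assumed benchmark namespace

const end = Date.now() + 60 * 1000                   // run for about a minute
let reads = 0
while (Date.now() < end) {
  await coll.findOne({})                             // any cheap read will do
  reads++
}
console.log(`issued ${reads} reads`)
await client.close()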

And as for the microservice: yes, such services usually create a pool to one node when they start and use that pool until they crash or shut down, and it seems you have only one of them running. I hadn't thought about this before, but it is clearly a bottleneck. I don't have an answer for how to optimize this.

2 Likes