Replica performance issues

logwriter · December 16, 2020, 10:47pm

What is the replication lag between mongo15 and the primary during working hours?
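One rough way to put a number on that lag is to compare each secondary's `optimeDate` with the primary's in the output of `replSetGetStatus`. The sketch below is not taken from your cluster: the status document is a hypothetical sample, it assumes mongo16 is the primary, and `replication_lag_seconds` is a helper name made up for illustration. With pymongo the real document would come from `client.admin.command("replSetGetStatus")`.

```python
from datetime import datetime


def replication_lag_seconds(status):
    """Return {member name: seconds behind the primary} for each secondary."""
    members = status["members"]
    primary = next(m for m in members if m["stateStr"] == "PRIMARY")
    return {
        m["name"]: (primary["optimeDate"] - m["optimeDate"]).total_seconds()
        for m in members
        if m["stateStr"] == "SECONDARY"
    }


# Hypothetical sample shaped like replSetGetStatus output (values are invented).
sample_status = {
    "members": [
        {"name": "mongo16", "stateStr": "PRIMARY",
         "optimeDate": datetime(2020, 12, 16, 15, 0, 30)},
        {"name": "mongo15", "stateStr": "SECONDARY",
         "optimeDate": datetime(2020, 12, 16, 15, 0, 18)},
        {"name": "mongo04", "stateStr": "SECONDARY",
         "optimeDate": datetime(2020, 12, 16, 14, 59, 45)},
    ]
}

lag = replication_lag_seconds(sample_status)
for name, seconds in lag.items():
    print(f"{name}: {seconds:.0f}s behind the primary")
```

In the shell, `rs.printSecondaryReplicationInfo()` (`rs.printSlaveReplicationInfo()` on older versions) reports the same kind of information.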

It looks like your workload is predominantly queries and updates:

opcounters	insert	query	update	delete	getmore	Total opcount	% insert	% query	% update	% delete	% getmore
mongo16	489,841,759	3,342,264,963	1,964,763,438	56,824,876	1,342,395,157	7,196,090,193	6.81%	46.45%	27.30%	0.79%	18.65%
mongo15	1,213,845	3,337,071,771	9,277,428	53,262	35,840,774	3,383,457,080	0.04%	98.63%	0.27%	0.00%	1.06%
mongo04	-	722,850	-	-	47,780	770,630	0.00%	93.80%	0.00%	0.00%	6.20%

At a glance, mongo15 (the sync source for mongo04) is busy serving queries (~98% of its total opcount) while also having to keep up with a heavy replicated update workload (~79% of its total opcountersRepl count).

opcountersRepl	insert	query	update	delete	getmore	Total opcount	% insert	% query	% update	% delete
mongo16	1,635,305	-	9,709,525	174,284	-	11,519,114	14.20%	0.00%	84%	2%
mongo15	1,057,934,813	-	4,474,701,469	138,481,085	-	5,671,117,367	18.65%	0.00%	79%	2%
mongo04	355,496,844	0	1,512,312,845	33,000,800	-	1,900,810,489	18.70%	0.00%	80%	2%
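For reference, the percentage columns in both tables are just each counter divided by the row total. A minimal sketch using the mongo15 figures from the tables above (`opcount_shares` is a made-up helper name, and only the counters shown in the tables are included):

```python
def opcount_shares(counters):
    """Return each counter as a percentage of the sum of all counters."""
    total = sum(counters.values())
    return {op: round(100 * n / total, 2) for op, n in counters.items()}


# mongo15 figures copied from the opcounters table.
mongo15_opcounters = {
    "insert": 1_213_845,
    "query": 3_337_071_771,
    "update": 9_277_428,
    "delete": 53_262,
    "getmore": 35_840_774,
}

# mongo15 figures copied from the replicated-operations table.
mongo15_repl = {
    "insert": 1_057_934_813,
    "update": 4_474_701_469,
    "delete": 138_481_085,
}

client_shares = opcount_shares(mongo15_opcounters)
repl_shares = opcount_shares(mongo15_repl)
print("client ops:", client_shares)
print("replicated ops:", repl_shares)
```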

Keep in mind that these counters are cumulative: they count operations since each instance was started. That is why I asked for a couple of db.serverStatus samples from each node, so that we could calculate the difference between two points in time. I only had one sample per server, so … cut me some slack if this ends up not making any sense.
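To show what I mean by the difference between two samples, here is a small sketch. The first sample reuses the mongo15 opcounters from the table; the second sample and the 600-second interval are invented purely for illustration, and `ops_per_second` is a hypothetical helper name.

```python
def ops_per_second(earlier, later, elapsed_seconds):
    """Return per-second rates from two cumulative opcounters samples."""
    rates = {}
    for op, later_count in later.items():
        delta = later_count - earlier[op]
        if delta < 0:
            # Counters reset on restart, so a negative delta means the two
            # samples do not belong to the same process lifetime.
            raise ValueError(f"counter '{op}' went backwards; was the node restarted?")
        rates[op] = delta / elapsed_seconds
    return rates


# First sample: mongo15 opcounters from the table above.
sample_1 = {"insert": 1_213_845, "query": 3_337_071_771, "update": 9_277_428,
            "delete": 53_262, "getmore": 35_840_774}

# Second sample: hypothetical values, assumed to be taken 600 seconds later.
sample_2 = {"insert": 1_213_845, "query": 3_338_271_771, "update": 9_278_028,
            "delete": 53_262, "getmore": 35_846_774}

rates = ops_per_second(sample_1, sample_2, elapsed_seconds=600)
for op, rate in rates.items():
    print(f"{op}: {rate:.1f} ops/s")
```

Rates over a known window (for example, working hours) say much more about the current load than totals accumulated since startup.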

First of all, a secondary's main role should be to provide data availability in the cluster. That said, it’s ok to let the application query a secondary as long as the query workload isn’t heavy and isn’t causing replication lag.

My guess: mongo15 is too busy serving queries, and that is causing replication lag. mongo04 then suffers the consequences as well, because mongo15 is its sync source.

Make sense?

– Rodrigo