Replica performance issues

Hey @Konstantin!

What is the replication lag between mongo15 and the primary during working hours?
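If you don't have that number handy, `rs.printSecondaryReplicationInfo()` in the shell reports it directly. The same figure can also be derived from the `optimeDate` of each member in the `rs.status()` output. Here is a minimal sketch of that arithmetic, assuming a status document shaped like the `replSetGetStatus` result; the host names, ports, timestamps, and the choice of mongo16 as primary are made up for illustration:

```python
from datetime import datetime


def replication_lag_seconds(status):
    """Return {member name: seconds behind the primary} for every secondary."""
    members = status["members"]
    # The primary's last applied operation time is the reference point.
    primary = next(m for m in members if m["stateStr"] == "PRIMARY")
    return {
        m["name"]: (primary["optimeDate"] - m["optimeDate"]).total_seconds()
        for m in members
        if m["stateStr"] == "SECONDARY"
    }


# Hypothetical sample; a real document would come from rs.status()
# or the replSetGetStatus admin command.
sample_status = {
    "members": [
        {"name": "mongo16:27017", "stateStr": "PRIMARY",
         "optimeDate": datetime(2024, 1, 1, 12, 0, 45)},
        {"name": "mongo15:27017", "stateStr": "SECONDARY",
         "optimeDate": datetime(2024, 1, 1, 12, 0, 33)},
        {"name": "mongo04:27017", "stateStr": "SECONDARY",
         "optimeDate": datetime(2024, 1, 1, 12, 0, 0)},
    ]
}

lag = replication_lag_seconds(sample_status)
for name, seconds in lag.items():
    print(f"{name}: {seconds:.0f}s behind the primary")
```

Sampling this a few times during working hours would show whether the lag is steady or only spikes under load.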

It looks like your workload is predominantly queries and updates:

| opcounters | insert | query | update | delete | getmore | Total opcount | % insert | % query | % update | % delete | % getmore |
|---|---|---|---|---|---|---|---|---|---|---|---|
| mongo16 | 489,841,759 | 3,342,264,963 | 1,964,763,438 | 56,824,876 | 1,342,395,157 | 7,196,090,193 | 6.81% | 46.45% | 27.30% | 0.79% | 18.65% |
| mongo15 | 1,213,845 | 3,337,071,771 | 9,277,428 | 53,262 | 35,840,774 | 3,383,457,080 | 0.04% | 98.63% | 0.27% | 0.00% | 1.06% |
| mongo04 | - | 722,850 | - | - | 47,780 | 770,630 | 0.00% | 93.80% | 0.00% | 0.00% | 6.20% |
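For reference, each percentage column is just that counter divided by the row total. A small sketch of the calculation using the mongo15 row above (the function name is my own, not anything from MongoDB):

```python
def opcount_shares(counters):
    """Return each operation's share of the total, as a percentage rounded to 2 places."""
    total = sum(counters.values())
    if total == 0:
        # Avoid dividing by zero on a node that has served nothing yet.
        return {op: 0.0 for op in counters}
    return {op: round(100 * count / total, 2) for op, count in counters.items()}


# opcounters for mongo15, copied from the table.
mongo15 = {
    "insert": 1_213_845,
    "query": 3_337_071_771,
    "update": 9_277_428,
    "delete": 53_262,
    "getmore": 35_840_774,
}

shares = opcount_shares(mongo15)
for op, pct in shares.items():
    print(f"{op}: {pct}%")
```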

At a glance, mongo15 (the sync source for mongo04) has been busy serving queries (~98% of its total opcount) while also having to keep up with a heavy replicated update workload (~79% of its opcountersRepl total).

| opcountersRepl | insert | query | update | delete | getmore | Total opcount | % insert | % query | % update | % delete |
|---|---|---|---|---|---|---|---|---|---|---|
| mongo16 | 1,635,305 | - | 9,709,525 | 174,284 | - | 11,519,114 | 14.20% | 0.00% | 84% | 2% |
| mongo15 | 1,057,934,813 | - | 4,474,701,469 | 138,481,085 | - | 5,671,117,367 | 18.65% | 0.00% | 79% | 2% |
| mongo04 | 355,496,844 | 0 | 1,512,312,845 | 33,000,800 | - | 1,900,810,489 | 18.70% | 0.00% | 80% | 2% |

Keep in mind that these counters are cumulative: they count the operations since each instance was started. That is why I asked for a couple of db.serverStatus() samples from each node, so that we could calculate the difference between two points in time. I only had one sample per server, so … cut me some slack if this ends up not making any sense. :smiley:
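With two samples per node, the calculation I had in mind would look roughly like the sketch below. It assumes each sample is a `db.serverStatus()` document reduced to its `uptime` (seconds) and `opcounters` fields; the function name and the sample numbers are invented for illustration:

```python
def opcounter_rates(first, second, field="opcounters"):
    """Return operations per second between two serverStatus samples of one node."""
    elapsed = second["uptime"] - first["uptime"]
    if elapsed <= 0:
        # uptime going backwards means the node restarted and its counters reset.
        raise ValueError("samples are out of order or the node restarted in between")
    return {
        op: (second[field][op] - first[field].get(op, 0)) / elapsed
        for op in second[field]
    }


# Hypothetical samples taken 60 seconds apart on the same node.
sample_a = {"uptime": 1000, "opcounters": {"query": 5000, "update": 200}}
sample_b = {"uptime": 1060, "opcounters": {"query": 11000, "update": 500}}

rates = opcounter_rates(sample_a, sample_b)
for op, per_second in rates.items():
    print(f"{op}: {per_second:.1f} ops/s")
```

Passing `field="opcountersRepl"` gives the same view of the replicated workload, which is the number that matters for a secondary that is falling behind.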

First of all, a secondary’s main role should be to provide data availability in the cluster. That said, it’s OK to let the application query a secondary if the query workload isn’t heavy and isn’t causing any replication lag.

My guess: mongo15 is too busy serving queries, and that is causing replication lag. mongo04 then suffers the consequences, because mongo15 is its sync source.

Make sense?

– Rodrigo
