Hey @Konstantin!
What is the replication lag between mongo15 and the primary during working hours?
It looks like your workload is predominantly queries and updates:
opcounters | insert | query | update | delete | getmore | Total opcount | % insert | % query | % update | % delete | % getmore |
---|---|---|---|---|---|---|---|---|---|---|---|
mongo16 | 489,841,759 | 3,342,264,963 | 1,964,763,438 | 56,824,876 | 1,342,395,157 | 7,196,090,193 | 6.81% | 46.45% | 27.30% | 0.79% | 18.65% |
mongo15 | 1,213,845 | 3,337,071,771 | 9,277,428 | 53,262 | 35,840,774 | 3,383,457,080 | 0.04% | 98.63% | 0.27% | 0.00% | 1.06% |
mongo04 | - | 722,850 | - | - | 47,780 | 770,630 | 0.00% | 93.80% | 0.00% | 0.00% | 6.20% |
At a glance, mongo15 (the sync source for mongo04) has been busy serving queries (~98% of its total opcount) while also having to keep up with a heavy replicated update workload (~79% of its opcountersRepl total).
opcountersRepl | insert | query | update | delete | getmore | Total opcount | % insert | % query | % update | % delete |
---|---|---|---|---|---|---|---|---|---|---|
mongo16 | 1,635,305 | - | 9,709,525 | 174,284 | - | 11,519,114 | 14.20% | 0.00% | 84.29% | 1.51% |
mongo15 | 1,057,934,813 | - | 4,474,701,469 | 138,481,085 | - | 5,671,117,367 | 18.65% | 0.00% | 78.90% | 2.44% |
mongo04 | 355,496,844 | 0 | 1,512,312,845 | 33,000,800 | - | 1,900,810,489 | 18.70% | 0.00% | 79.56% | 1.74% |
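For reference, the percentage columns are just each counter divided by the row total. A minimal sketch of that arithmetic, using the mongo15 opcounters row from the first table (the function name `op_shares` is mine, not anything from MongoDB):

```python
def op_shares(counters):
    """Return each operation type's share of the total, in percent."""
    total = sum(counters.values())
    if total == 0:
        # Avoid dividing by zero on an idle node.
        return {op: 0.0 for op in counters}
    return {op: round(100.0 * count / total, 2) for op, count in counters.items()}


# mongo15 opcounters, copied from the first table.
mongo15 = {
    "insert": 1_213_845,
    "query": 3_337_071_771,
    "update": 9_277_428,
    "delete": 53_262,
    "getmore": 35_840_774,
}

shares = op_shares(mongo15)
for op, pct in shares.items():
    print(f"{op}: {pct}%")
```

The same function works on the opcountersRepl rows if you pass in the replicated counters instead.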
Keep in mind that these counters are cumulative: they count operations since each instance was last started. That's why I asked for a couple of db.serverStatus() samples from each node, so we could calculate the difference between two points in time. I only had one sample per server, so … cut me some slack if this ends up not making any sense.
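With two samples per node, the calculation I had in mind looks roughly like the sketch below. It assumes each sample is the serverStatus document as a plain dict and relies on its `uptime` (seconds) and `opcounters` fields. The function name and the sample numbers are made up for illustration, not taken from your servers.

```python
def opcounter_rates(first, second):
    """Operations per second between two serverStatus samples of one node."""
    elapsed = second["uptime"] - first["uptime"]
    if elapsed <= 0:
        # Uptime went backwards or did not move: the node restarted
        # (counters were reset) or the samples are in the wrong order.
        raise ValueError("samples must come from the same run, oldest first")
    return {
        op: (second["opcounters"][op] - first["opcounters"][op]) / elapsed
        for op in first["opcounters"]
    }


# Hypothetical samples taken 60 seconds apart.
sample_a = {"uptime": 1000, "opcounters": {"query": 5000, "update": 2000}}
sample_b = {"uptime": 1060, "opcounters": {"query": 11000, "update": 2600}}

rates = opcounter_rates(sample_a, sample_b)
print(rates)
```

Running the same thing against the `opcountersRepl` section would give the replicated write rate for the same window, which is the number to compare with the query rate on mongo15.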
First of all, a secondary’s main role should be to provide data availability in the cluster. That said, it’s OK to let the application query a secondary if the query workload isn’t heavy and isn’t causing any replication lag.
My guess: mongo15 is too busy serving queries, and that is causing replication lag. mongo04 then suffers the consequences, because mongo15 is its sync source.
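To confirm or rule out that guess, the lag can be read off rs.status(): it is the gap between the primary's `optimeDate` and each secondary's. A rough sketch, assuming the status document is available as a dict with `members[].name`, `stateStr` and `optimeDate`. The timestamps below are invented, and I'm assuming mongo16 is the primary.

```python
from datetime import datetime


def replication_lag_seconds(status):
    """Seconds each secondary is behind the primary, from an rs.status()-shaped dict."""
    members = status["members"]
    primary = next(m for m in members if m["stateStr"] == "PRIMARY")
    return {
        m["name"]: (primary["optimeDate"] - m["optimeDate"]).total_seconds()
        for m in members
        if m["stateStr"] == "SECONDARY"
    }


# Hypothetical status document; only the fields used above are shown.
status = {
    "members": [
        {"name": "mongo16", "stateStr": "PRIMARY",
         "optimeDate": datetime(2024, 1, 1, 12, 0, 30)},
        {"name": "mongo15", "stateStr": "SECONDARY",
         "optimeDate": datetime(2024, 1, 1, 12, 0, 10)},
        {"name": "mongo04", "stateStr": "SECONDARY",
         "optimeDate": datetime(2024, 1, 1, 12, 0, 0)},
    ]
}

lag = replication_lag_seconds(status)
print(lag)
```

If mongo15 shows lag that grows during working hours and mongo04 trails it by a bit more, that would fit the picture above.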
Make sense?
– Rodrigo