Thanks Max for the response and sharing detailed info.
My queries are listed below :
-
if we use this command (db.printSlaveReplicationInfo() ) over mongoshell then does mongo server use any kind of locking on oplog collection to calculate the replication lag ?. why this command is so expensive than other admins commands ?
-
if secondary is pulling oplog from the synchronisation source then how do we check the status whether it’s pulling or stuck by responsible thread or producer thread ?
-
is there any way to trace replwriter thread activity in details whether does it apply oplog to itself or not ?
-
As per repl metrics , replops counter was not getting incremented so do we use any logic to apply the oplog based on any logic like scheduler frequency or buffer full , Could you plz share more insight on this ?
DB server status for repl metrics .
"repl" : {
"executor" : {
"counters" : {
"eventCreated" : 2,
"eventWait" : 2,
"cancels" : 55145773,
"waits" : 129239302,
"scheduledNetCmd" : 334840,
"scheduledDBWork" : 0,
"scheduledXclWork" : 0,
"scheduledWorkAt" : 55522254,
"scheduledWork" : 129239360,
"schedulingFailures" : 0
},
"queues" : {
"networkInProgress" : 0,
"dbWorkInProgress" : 0,
"exclusiveInProgress" : 0,
"sleepers" : 4,
"ready" : 0,
"free" : 17
},
"unsignaledEvents" : 4,
"eventWaiters" : 0,
"shuttingDown" : false,
"networkInterface" : "NetworkInterfaceASIO inShutdown: 0"
},
"apply" : {
"batches" : {
"num" : 11559352,
"totalMillis" : 6564
},
"ops" : NumberLong(243908927)
},
"buffer" : {
"count" : NumberLong(1383730),
"maxSizeBytes" : 268435456,
"sizeBytes" : NumberLong(266197984)
},
"network" : {
"bytes" : NumberLong("46889211548"),
"getmores" : {
"num" : 54978354,
"totalMillis" : 160607439
},
Reduce the workload or increase the server capacities ASAP. RAM + CPU + IOPS. – i don’t see capacity issue and as per metrics it was not overloaded. Do you see as per below , plz add
MongoDB server specs : RAM - 252 GB + vCPU’s - 32 + IOPS + 10TB pure storage disks in LVM so i don’t see capacity issues on Secondary node even we don’t use for read from Secondary.
Sar stats :
Mem:
12:00:01 AM kbmemfree kbmemused %memused kbbuffers kbcached kbcommit %commit
01:50:01 AM 574332 263830316 99.78 287900 91780836 168020360 56.39
02:00:01 AM 783424 263621224 99.70 287952 91450596 168778552 56.64
02:10:01 AM 563308 263841340 99.79 287880 91232540 168892636 56.68
Device:
01:40:01 AM DEV tps rd_sec/s wr_sec/s avgrq-sz avgqu-sz await svctm %util
01:50:01 AM dev8-0 7.91 6.83 123.06 16.42 0.00 0.17 0.09 0.07
01:50:01 AM dev8-16 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
01:50:01 AM dev8-32 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
01:50:01 AM dev8-48 43.53 21.01 5324.36 122.81 0.06 1.44 0.46 1.98
01:50:01 AM dev8-64 48.59 22.86 5542.14 114.53 0.06 1.23 0.40 1.93
01:50:01 AM dev8-80 47.32 21.88 5391.72 114.40 0.06 1.30 0.42 1.99
01:50:01 AM dev8-96 48.64 22.62 5452.74 112.56 0.06 1.26 0.41 1.99
01:50:01 AM dev253-0 188.08 88.37 21710.96 115.90 0.28 1.50 0.21 3.95
01:50:01 AM dev253-1 191.84 88.37 21710.96 113.63 0.29 1.49 0.20 3.92
02:00:01 AM dev8-0 8.77 0.84 148.13 16.98 0.00 0.17 0.08 0.07
02:00:01 AM dev8-16 0.01 0.05 0.00 8.00 0.00 2.75 2.75 0.00
02:00:01 AM dev8-32 0.00 0.03 0.00 8.00 0.00 0.00 0.00 0.00
02:00:01 AM dev8-48 9.98 0.11 251.18 25.18 0.01 1.13 0.09 0.09
02:00:01 AM dev8-64 10.70 0.10 271.60 25.38 0.01 1.04 0.08 0.09
02:00:01 AM dev8-80 10.53 0.12 271.78 25.82 0.01 1.06 0.09 0.09
02:00:01 AM dev8-96 10.66 0.13 274.87 25.79 0.01 1.05 0.08 0.09
02:00:01 AM dev253-0 41.88 0.46 1069.42 25.55 0.05 1.13 0.03 0.11
02:00:01 AM dev253-1 42.82 0.03 1069.42 24.98 0.05 1.11 0.02 0.10
02:10:01 AM dev8-0 8.20 0.00 123.04 15.00 0.00 0.17 0.09 0.07
02:10:01 AM dev8-16 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
02:10:01 AM dev8-32 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
02:10:01 AM dev8-48 106.64 143.96 7929.23 75.70 0.15 1.40 0.32 3.43
02:10:01 AM dev8-64 128.48 147.57 8862.38 70.13 0.14 1.09 0.26 3.37
02:10:01 AM dev8-80 124.11 142.89 8428.62 69.06 0.14 1.16 0.28 3.44
02:10:01 AM dev8-96 128.53 144.05 8659.11 68.49 0.14 1.11 0.27 3.43
02:10:01 AM dev253-0 487.75 578.47 33879.35 70.65 0.64 1.32 0.14 6.77
02:10:01 AM dev253-1 492.88 578.47 33879.35 69.91 0.65 1.31 0.13 6.64
load:
12:00:01 AM CPU %user %nice %system %iowait %steal %idle
12:10:01 AM all 6.42 0.00 0.22 0.12 0.00 93.24
12:20:01 AM all 3.37 0.00 0.09 0.00 0.00 96.54
12:30:01 AM all 6.61 0.00 0.23 0.09 0.00 93.06
12:40:01 AM all 7.42 0.00 0.25 0.07 0.00 92.26
12:50:01 AM all 3.42 0.00 0.14 0.00 0.00 96.44
01:00:01 AM all 6.90 0.00 0.26 0.03 0.00 92.81
01:10:01 AM all 3.32 0.00 0.09 0.00 0.00 96.59
01:20:01 AM all 5.76 0.00 0.23 0.03 0.00 93.99
01:30:01 AM all 4.77 0.00 0.14 0.01 0.00 95.08
01:40:01 AM all 4.08 0.00 0.11 0.01 0.00 95.80
01:50:01 AM all 8.22 0.00 0.27 0.05 0.00 91.46
02:00:01 AM all 3.55 0.00 0.15 0.00 0.00 96.30
02:10:01 AM all 9.80 0.00 0.40 0.13 0.00 89.68
02:20:02 AM all 3.37 0.00 0.11 0.00 0.00 96.51