Replication syncSource issue with bandwidth

Ben_Murrell · April 24, 2024, 7:49am

I have a replica set with many nodes; 2 of the secondaries are in 1 region of the world, giving regional resilience.
The problem I have is that when both lag, they both start syncing from the primary, taking all the bandwidth on the pipe to that region. The best solution would be for one to sync from the primary and the other to sync from its regional peer. I don’t care if both are over the 30-second lag that causes a secondary to switch to syncing from another, more up-to-date node.
Sync source selection seems to depend on ping times and lag (and a few other things that are unrelated in my case), but it does not take bandwidth into account.
How can I configure this?

Peter_Hubbard · April 26, 2024, 7:34am

Assuming you are using self-managed installs, check the documentation here: https://www.mongodb.com/docs/manual/tutorial/configure-replica-set-secondary-sync-target/
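
As a rough sketch of what that tutorial covers: rs.syncFrom() (a wrapper around the replSetSyncFrom admin command) temporarily overrides a secondary's sync source. The helper below is hypothetical and the host names are placeholders, not taken from this thread. It only builds the command documents for a group of same-region secondaries, so that the first keeps its default sync source and the others chain from it.

```javascript
// Hypothetical helper: given the secondaries in one region, leave the first
// on its default sync source and point every other one at that first node.
// Host names are placeholders, not taken from the thread.
function planRegionalSyncFrom(regionHosts) {
  if (regionHosts.length < 2) return []; // nothing to chain with a single node
  const [relay, ...others] = regionHosts;
  return others.map((host) => ({
    runOn: host,                         // member to connect to
    command: { replSetSyncFrom: relay }, // admin command behind rs.syncFrom()
  }));
}

const plan = planRegionalSyncFrom([
  "us-node-1.example.net:27017",
  "us-node-2.example.net:27017",
]);
console.log(JSON.stringify(plan));
```

Each command would be run with db.adminCommand(...) while connected to the member named in runOn. The override is temporary: the member returns to its normal sync source selection after a restart, if the connection to the source closes, or if the source falls more than 30 seconds behind another member.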

Ben_Murrell · May 8, 2024, 11:52am

Yes, we are using self-managed installs. How do we stop this happening?
Also, when 1 node in each of 2 remote regions is online (let’s call them A and B, with the primary at C), one of them (A) lags while the other doesn’t. When I shut down B, A doesn’t lag. The question is, why is A throttled by B when it’s syncing from C? We’ve proved it’s not network or disk performance, as it only happens while B is online.

chris · May 30, 2024, 9:59pm

Hi @Ben_Murrell

Can you explain the actual topology of the cluster along with votes and priorities configured?

You could try adjusting the maxSyncSourceLagSecs parameter (default 30 seconds) in the mongod configuration file:

setParameter:
  maxSyncSourceLagSecs: 120

ref: mongo/src/mongo/db/repl/README.md at r7.0.11 · mongodb/mongo · GitHub

Ben_Murrell · May 31, 2024, 6:58am

Hi Chris
The topology is currently 4 nodes (yes, I know it’s not odd at the moment).
2 nodes in UK, at different DCs, 1 with priority 100, 1 with priority 50.
1 node in US, with priority 0.
1 node in Asia, with priority 0.
I was trying to add extra nodes in US/Asia, but this causes problems when lagging, detailed above.
The US node lags daily, by many hours, but only if Asia is online. I can’t explain it.
Not sure how maxSyncSourceLagSecs will help.
We have writeConcern 1, and readConcern local.

chris · May 31, 2024, 10:33pm

It’s the even number of voting members that is the issue.

It looks like the main deployment is UK. I’d expand this to 3 voting members.
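
For reference, the voting arithmetic can be sketched as below. This is only an illustration: summarizeVotes is a hypothetical helper, the host names are placeholders, and votes: 1 on every member is an assumption, since the thread only states priorities.

```javascript
// Count voting members in a replica set config (the shape returned by rs.conf())
// and work out the election majority and how many voter failures are tolerated.
function summarizeVotes(conf) {
  const voters = conf.members.filter((m) => (m.votes ?? 1) > 0);
  const majority = Math.floor(voters.length / 2) + 1;
  return {
    votingMembers: voters.length,
    majority,
    faultTolerance: voters.length - majority,
    evenVoters: voters.length % 2 === 0,
  };
}

// Topology from the thread; host names are placeholders and votes: 1 is an
// assumption (the thread only states priorities).
const current = {
  members: [
    { host: "uk-dc1.example.net:27017", priority: 100, votes: 1 },
    { host: "uk-dc2.example.net:27017", priority: 50, votes: 1 },
    { host: "us-1.example.net:27017", priority: 0, votes: 1 },
    { host: "asia-1.example.net:27017", priority: 0, votes: 1 },
  ],
};

// Same set with a hypothetical third UK voting member added.
const expanded = {
  members: [
    ...current.members,
    { host: "uk-dc3.example.net:27017", priority: 10, votes: 1 },
  ],
};

console.log(summarizeVotes(current));
console.log(summarizeVotes(expanded));
```

With four voters the set needs three votes to elect a primary and tolerates one voter being down, the same as a three-voter set; a fifth voter raises that to two. In mongosh the same function could be called as summarizeVotes(rs.conf()).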

This really does read as a networking-type issue: bandwidth, traffic shaping, or one route being prioritized over another. I can’t think of anything specific to MongoDB that would influence this.
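
To rule the replication side in or out, it may help to look at which member each node is actually syncing from and how far behind it is. The sketch below is a hypothetical helper that summarises the output of the replSetGetStatus command; the sample document is made up (hosts and times are placeholders) and only mimics the fields used here.

```javascript
// Summarise replSetGetStatus output: state, sync source and lag behind the primary.
function summarizeReplication(status) {
  const primary = status.members.find((m) => m.stateStr === "PRIMARY");
  return status.members.map((m) => ({
    name: m.name,
    state: m.stateStr,
    // syncSourceHost is an empty string when the member has no sync source
    syncSource: m.syncSourceHost || null,
    // Subtracting two Dates gives milliseconds
    lagSecs: primary ? (primary.optimeDate - m.optimeDate) / 1000 : null,
  }));
}

// Made-up sample shaped like replSetGetStatus output (placeholders only).
const sampleStatus = {
  members: [
    {
      name: "uk-1.example.net:27017",
      stateStr: "PRIMARY",
      optimeDate: new Date("2024-05-31T07:00:00Z"),
      syncSourceHost: "",
    },
    {
      name: "us-1.example.net:27017",
      stateStr: "SECONDARY",
      optimeDate: new Date("2024-05-31T06:00:00Z"),
      syncSourceHost: "uk-1.example.net:27017",
    },
  ],
};

const summary = summarizeReplication(sampleStatus);
console.log(summary);
```

In mongosh this could be called as summarizeReplication(db.adminCommand({ replSetGetStatus: 1 })), once with the Asia node online and once with it offline, to compare the US node's sync source and lag in both cases.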

I posted the maxSyncSourceLagSecs parameter in response to the very first post, where the 2 secondaries in the remote region reset to syncing from the primary when the secondary sync source lags by more than 30s. Perhaps it could help in that situation.