Mongosync keeps "lagTimeSeconds" high

Hello!

I’m using mongosync 1.7.0 to sync data (about 500GB) between two datacenters running MongoDB 6.0.8. It syncs the collections without any issues and moves to the “change event application” stage. The problem is that “lagTimeSeconds” keeps growing (it’s now 16493) and only stops once it is 4-8 hours behind.
Then, if I want to commit, I have to wait out the lag. I checked this: after the “commit” command is sent, “lagTimeSeconds” goes down slowly, second by second. According to the docs, I should prevent writes to the source cluster during the “commit” state. That means I would have to plan for 4-8 hours of downtime.
I expected “lagTimeSeconds” to go down after the initial sync of the collections. Instead it goes up and then stops increasing at some point.

At the same time, if I moved the data by copying the MongoDB files from disk, it would take about 4 hours of downtime.
I suppose something is wrong here.

{"progress":{"state":"RUNNING","canCommit":true,"canWrite":false,"info":"change event application","lagTimeSeconds":17042,"collectionCopy":{"estimatedTotalBytes":1718274427703,"estimatedCopiedBytes":1718366009460},"directionMapping":{"Source":"cluster0: 127.0.0.1:27017","Destination":"cluster1: 127.0.0.1:27100"},"mongosyncID":"coordinator","coordinatorID":"coordinator"}}
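To make the numbers in this output easier to read, here is a minimal Python sketch that pulls out the fields relevant for planning a cutover and converts the lag to hours. The function name `summarize_progress` and the shape of the returned dictionary are my own choices for illustration; only the JSON field names come from the output above.

```python
import json


def summarize_progress(raw: str) -> dict:
    """Extract the cutover-relevant fields from a progress document."""
    progress = json.loads(raw)["progress"]
    lag = progress["lagTimeSeconds"]
    copy = progress.get("collectionCopy", {})
    total = copy.get("estimatedTotalBytes", 0)
    copied = copy.get("estimatedCopiedBytes", 0)
    return {
        "state": progress["state"],
        "phase": progress["info"],
        "can_commit": progress["canCommit"],
        "lag_seconds": lag,
        "lag_hours": round(lag / 3600, 2),
        # Both byte counts are estimates, so the ratio can slightly exceed 1.
        "copy_ratio": round(copied / total, 4) if total else None,
    }


# The progress document quoted above, verbatim.
sample = '{"progress":{"state":"RUNNING","canCommit":true,"canWrite":false,"info":"change event application","lagTimeSeconds":17042,"collectionCopy":{"estimatedTotalBytes":1718274427703,"estimatedCopiedBytes":1718366009460},"directionMapping":{"Source":"cluster0: 127.0.0.1:27017","Destination":"cluster1: 127.0.0.1:27100"},"mongosyncID":"coordinator","coordinatorID":"coordinator"}}'

summary = summarize_progress(sample)
print(summary["phase"], summary["lag_seconds"], summary["lag_hours"])
```

For this sample the lag of 17042 seconds works out to roughly 4.7 hours, which matches the 4-8 hour range described above.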

I restarted the sync many times, and every time it moves to the “change event application” state, “lagTimeSeconds” stays high enough that moving to another server without downtime is not possible.

Please let me know if you have any idea why this happens. Network speed and CPU do not seem to be the bottlenecks: I restarted the sync many times and the “collection copy” stage is fast enough each time. After that it still does some inserts on the new server, but it looks like it keeps the lag and doesn’t try to catch up.

This is the state I see right now, after it has run for a couple more hours:

{"progress":{"state":"RUNNING","canCommit":true,"canWrite":false,"info":"change event application","lagTimeSeconds":23292,"collectionCopy":{"estimatedTotalBytes":1718274427703,"estimatedCopiedBytes":1718366009460},"directionMapping":{"Source":"cluster0: 127.0.0.1:27017","Destination":"cluster1: 127.0.0.1:27100"},"mongosyncID":"coordinator","coordinatorID":"coordinator"}}
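Comparing the two samples (lag 17042, then 23292) gives a rough idea of how fast the lag grows. The sketch below is only an estimate: the elapsed time of 2 hours is my assumption for "a couple more hours", and I am assuming the lag roughly measures how far the last applied event is behind the source. Under those assumptions, a growth rate close to 1 second of lag per wall-clock second would mean almost nothing is being applied.

```python
def lag_growth_per_second(lag_before: int, lag_after: int, elapsed_seconds: float) -> float:
    """Seconds of additional lag accumulated per wall-clock second."""
    return (lag_after - lag_before) / elapsed_seconds


# Assumption: "a couple more hours" is taken as exactly 2 hours.
ASSUMED_ELAPSED_SECONDS = 2 * 3600

growth = lag_growth_per_second(17042, 23292, ASSUMED_ELAPSED_SECONDS)

# If lag grows by `growth` seconds per second, the destination is applying
# roughly (1 - growth) seconds of source activity per wall-clock second.
apply_fraction = 1 - growth

print(f"lag grew {growth:.2f} s per wall-clock second")
print(f"events applied at roughly {apply_fraction:.0%} of the rate they arrive")
```

With the assumed 2 hours, the destination keeps up with only a small fraction of the incoming changes, which would fit the impression that it does not try to catch up. A different elapsed time changes the numbers but not the direction.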

Btw, the destination server is not on the same machine. I just run an SSH tunnel to another server on port 27100.

Looks like I found why I have that lag. The “date” command shows a different time on the two servers because of the time zone, and the difference is 4 hours. I’ll try setting the same time zone now and will update this ticket if it works. FYI, I run the mongosync binary on the source cluster, which is on EDT, but the destination is on UTC.

OK, I changed the time zone, restarted the server, and restarted the sync. Now it looks promising because the lag doesn’t grow during the sync; it stays around 50 seconds.

{"progress":{"state":"RUNNING","canCommit":false,"canWrite":false,"info":"collection copy","lagTimeSeconds":75,"collectionCopy":{"estimatedTotalBytes":1721506636723,"estimatedCopiedBytes":395547521848},"directionMapping":{"Source":"cluster0: 127.0.0.1:27017","Destination":"cluster1: 127.0.0.1:27100"},"mongosyncID":"coordinator","coordinatorID":"coordinator"}}

75 seconds here. So it looks like mongosync uses local time instead of a Unix timestamp for dates when applying the oplog. Not sure why it’s like that.
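As a sanity check on the time zone theory, here is a small Python sketch showing that the same instant expressed in EDT and in UTC has an identical Unix timestamp. It only illustrates the general property of Unix timestamps; it says nothing about how mongosync computes “lagTimeSeconds” internally. The example date is arbitrary, and EDT is modeled as a fixed UTC-4 offset.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets standing in for the two servers' time zones.
EDT = timezone(timedelta(hours=-4), "EDT")
UTC = timezone.utc

# Arbitrary example instant, expressed in UTC and then converted to EDT.
instant_utc = datetime(2023, 9, 1, 16, 0, 0, tzinfo=UTC)
instant_edt = instant_utc.astimezone(EDT)

# The wall-clock readings differ by 4 hours...
print(instant_utc.isoformat())
print(instant_edt.isoformat())

# ...but the Unix timestamps are identical, because both describe one instant.
same_epoch = instant_utc.timestamp() == instant_edt.timestamp()
print(same_epoch)
```

So a 4-hour difference in what “date” prints would only matter if something compared wall-clock strings instead of timestamps. An actual clock skew between the servers (clocks that disagree about the current instant) would be a different issue.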

Looks like it didn’t help. The collection copy finished and it is now on “change event application”, and the lag is growing again.

{"progress":{"state":"RUNNING","canCommit":true,"canWrite":false,"info":"change event application","lagTimeSeconds":23079,"collectionCopy":{"estimatedTotalBytes":1721506636723,"estimatedCopiedBytes":1722156937585},"directionMapping":{"Source":"cluster0: 127.0.0.1:27017","Destination":"cluster1: 127.0.0.1:27100"},"mongosyncID":"coordinator","coordinatorID":"coordinator"}}

After three weeks of trying, I decided to use replication to move to the other datacenter instead of mongosync.
