Sudden problems with flexible sync

Hi,

I’m using flexible sync for syncing the game state in a game project and never had any problems before.
However, today I had ~50 users test my app, and after a few hours of testing the following errors suddenly popped up. A few users also complained that sync stopped working until they restarted the app.

This happened during a time frame of ~1-2 hours and then everything worked normally again. I hope this error doesn’t occur again, otherwise I cannot use flexible sync in my project until it’s stable…

What could be the reason? Was there any update from your side between May 23 20:06:11+02:00 and May 23 21:46:05+02:00? Or is it a bug?

I’m using Unity 2020.3.33f1 and Realm Unity v10.11.1.

These are the error logs I got during this time frame:

translator failed to complete processing batch: failed to update resume token document: connection pool for realmcluster-shard-00-01.kx2p2.mesh.mongodb.net:30444 was cleared because another operation failed with: connection() error occurred during connection handshake: read tcp 127.0.0.1:36740->127.0.0.1:30444: read: connection reset by peer
recoverable event subscription error encountered: failed to load unsynced documents cache: error querying for unsynced documents while initializing unsynced documents cache: connection() error occurred during connection handshake: read tcp 127.0.0.1:52668->127.0.0.1:30444: read: connection reset by peer

message handler failed with error: error handling "upload" message: error updating sync progress: connection(realmcluster-shard-00-01.kx2p2.mesh.mongodb.net:30444[-98435]) socket was unexpectedly closed: EOF
Logs:
[
  "Connection was active for: 1m56s"
]
translator failed to complete processing batch: failed to flush instructions to client history: allocating new client versions failed: error incrementing the version counter for (appID="62079d369f7c7cf6d91c56d9", fileIdent=2): connection pool for realmcluster-shard-00-01.kx2p2.mesh.mongodb.net:30444 was cleared because another operation failed with: connection() error occurred during connection handshake: read tcp 127.0.0.1:44172->127.0.0.1:30444: read: connection reset by peer
ending session with error: integrating changesets failed: error creating new integration attempt: failed to get latest server version while integrating changesets: connection(realmcluster-shard-00-01.kx2p2.mesh.mongodb.net:30444[-2701]) socket was unexpectedly closed: EOF (ProtocolErrorCode=201)
integrating changesets failed: error creating new integration attempt: failed to get latest server version while integrating changesets: connection(realmcluster-shard-00-01.kx2p2.mesh.mongodb.net:30444[-2701]) socket was unexpectedly closed: EOF (ProtocolErrorCode=201)
translator failed to complete processing batch: failed to update resume token document: connection() error occurred during connection handshake: read tcp 127.0.0.1:41830->127.0.0.1:30444: read: connection reset by peer
translator failed to complete processing batch: failed to update resume token document: connection() error occurred during connection handshake: EOF

Thanks in advance!

Hi. These are all transient errors that cause a quick restart / rejection, after which things should pick back up. All of them look like issues connecting to your Atlas cluster, which points to either (a) an event occurring on the cluster or (b) an underprovisioned cluster (if you are using an M0, performance issues are common). Did any of these cause issues, or is it just the error in the UI that is concerning you?
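
One way to check that the logged failures share a single cause is to group them by error signature and by cluster host. The Python sketch below is only illustrative: the category labels and the helper names (`classify`, `summarize`) are made up for this example and are not Atlas or Realm terminology, and the sample lines are copied from the logs quoted above.

```python
import re
from collections import Counter

# Substrings taken from the error logs quoted in this thread. The labels on
# the left are hypothetical names chosen for this sketch only.
SIGNATURES = {
    "handshake_reset": "connection reset by peer",
    "handshake_eof": "error occurred during connection handshake: EOF",
    "socket_eof": "socket was unexpectedly closed: EOF",
    "pool_cleared": "was cleared because another operation failed",
}

# Matches a host:port such as realmcluster-shard-00-01.kx2p2.mesh.mongodb.net:30444
HOST_RE = re.compile(r"([\w.-]+\.mongodb\.net:\d+)")

# Three lines copied verbatim from the logs above.
SAMPLE = [
    'translator failed to complete processing batch: failed to update resume token document: connection() error occurred during connection handshake: read tcp 127.0.0.1:41830->127.0.0.1:30444: read: connection reset by peer',
    'translator failed to complete processing batch: failed to update resume token document: connection() error occurred during connection handshake: EOF',
    'message handler failed with error: error handling "upload" message: error updating sync progress: connection(realmcluster-shard-00-01.kx2p2.mesh.mongodb.net:30444[-98435]) socket was unexpectedly closed: EOF',
]


def classify(line):
    """Return the set of signature labels found in one log line."""
    return {label for label, needle in SIGNATURES.items() if needle in line}


def summarize(lines):
    """Count how many lines carry each signature and mention each host."""
    kinds, hosts = Counter(), Counter()
    for line in lines:
        kinds.update(classify(line))
        # Use a set so a host repeated within one line is counted once.
        hosts.update(set(HOST_RE.findall(line)))
    return kinds, hosts


if __name__ == "__main__":
    kinds, hosts = summarize(SAMPLE)
    for label, count in kinds.most_common():
        print(label, count)
    for host, count in hosts.most_common():
        print(host, count)
```

If every line falls into one of the connection-level categories and the named hosts are all cluster nodes, that is consistent with a problem between the sync servers and the cluster rather than in the client code, although it does not prove it.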

Hi,

thanks for the quick reply! I’m currently using an M10 cluster and cannot find any performance issues or special events on the cluster during this time frame…
I only stumbled upon these error messages because a few testers complained that syncing suddenly stopped and didn’t pick up again unless they restarted the app.
Since I cannot find any problems in my code (it’s a very basic flexible sync implementation) or on the M10 cluster, and this never happened before, I assumed the problem might be connected to a (temporary?) bug in flexible sync.

The cluster itself seemed to work fine during this incident, as there were no errors or performance issues in any database-related backend functions…

Hmm, this looks like a connection error to your Atlas cluster. Can you open a support ticket or share the Realm App URL (the URL in the web browser) with us, so we can take a look on the backend for you?

Do you mean this URL: Link ?

Hey There,

We took a look and there appear to be I/O timeouts on your Atlas cluster during this time. This typically points to your cluster being overloaded and unable to respond to requests from the sync servers. Upgrading your Atlas cluster to a higher instance type should resolve this.

-Ian