Clients can't download all data after terminating sync

I have noticed a breaking bug that makes it impossible for us to upgrade from a shared Atlas cluster (M2) to a dedicated one (M10): terminating sync leads to clients downloading incomplete data.

Context
I hit a strange Realm bug on our current M2 server that forced me to terminate sync.
I have a collection called CurrenciesContent using the Realm sync partition system (with _partition field).
Each user has one and only one CurrenciesContent, which holds the user’s in-app virtual currencies (coins, gems, etc., as found in mobile games).

The iOS code is very simple: I read/write the first CurrenciesContent because every user has only one.

func currenciesContent() -> CurrenciesContent {
    if let existingContent = self.realm()?.objects(CurrenciesContent.self).first {
        return existingContent
    }
    let newContent = CurrenciesContent()
    self.saveRealm(newContent)
    return newContent
}

As for the Realm object, I use a “download everything before using Realm” approach with

Realm.asyncOpen(configuration: userSyncConfig) { [weak self] result in
    switch result {
    case .failure(let error): break // Handle error
    case .success(let syncRealm):
        self?.syncRealm = syncRealm // Realm ready to use
    }
}

The problem
After restarting sync, the line self.realm()?.objects(CurrenciesContent.self).first always returned nil. No CurrenciesContent was found at all, but the document still exists in the online DB; I double-checked using Mongo Compass. The Realm iOS SDK just cannot fetch it.
Instead it creates a new document, and the entire code logic (which relies on there being exactly one document) is now broken.

So now I’m trying to understand why the Realm SDK can’t fetch what’s in the online DB after terminating sync.
I would like to upgrade the server without losing our users’ virtual currencies, which they paid for with real money.

Hey @Jerome_Pasquier. Unfortunately, when upgrading from a shared to a dedicated cluster, the change stream, which Realm Sync needs in order to function, is not migrated over. This requires terminating and re-enabling Sync, which causes a Client Reset on clients. That is likely the error you are seeing: your clients are stuck and need to perform a client reset to get fresh state from the new dedicated cluster. This is why we do not recommend shared tier clusters for production usage. You can see documentation here -

Okay, so after many tests and a few exchanges with Realm support, I have found out two things:

  • My initial issue was on my side: I had updated the Realm schema with a new required field. By design, existing documents in Realm Cloud that lack the new required field are not transferred to the clients. It was written in the Realm documentation. So I just updated all the documents using Mongo Compass. Now everything syncs again.
  • However, I had a second issue following another Terminate Sync action: right after Terminate Sync + Enable Sync, a client couldn’t fetch any of its own documents even though they are valid documents. Upon a reboot of the app, everything worked again, but the client now has new documents of everything… Trying to figure out this specific issue.
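
To illustrate the first point, here is a minimal sketch of how one might spot documents that lack a newly required field before backfilling them. It assumes the documents have already been exported and decoded into dictionaries. The helper name documentsMissing and the field names coins and gems are hypothetical and are not taken from the real schema.

```swift
import Foundation

// Hypothetical helper: returns the _id of every document that lacks one of the
// required fields (or has it set to null), so those documents can be backfilled.
func documentsMissing(requiredFields: [String], in documents: [[String: Any]]) -> [String] {
    documents.compactMap { doc -> String? in
        let isIncomplete = requiredFields.contains { field in
            guard let value = doc[field] else { return true }
            return value is NSNull
        }
        guard isIncomplete else { return nil }
        return doc["_id"] as? String ?? "<unknown id>"
    }
}

// Made-up sample data: "gems" stands in for the newly required field.
let sampleDocuments: [[String: Any]] = [
    ["_id": "a", "_partition": "user=1", "coins": 10, "gems": 2],
    ["_id": "b", "_partition": "user=2", "coins": 5],
]

let incomplete = documentsMissing(requiredFields: ["coins", "gems"], in: sampleDocuments)
print(incomplete)
```

Any document reported here would need the missing field filled in before it can sync to clients again.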

Hi Jerome,

but the client now has new documents of everything

Could you please elaborate what you mean by this?
Do you mean the user on that client is syncing documents that they shouldn’t have access to? If that’s the case, then please check your sync permission configuration on the cloud app. The user will see documents for a particular partition being opened if the sync permissions allow it.

Regards

Sync permissions are set correctly. Everybody can read/write only their own data (partition: “user={user_id}”).

The problem is that upon Terminate Sync, I encounter errors:

  • BadClientFileIdent (client file not found. The server has forgotten about the client-side file presented by the client. This is likely due to using a synchronized realm after terminating and re-enabling sync. Please wipe the file on the client to resume synchronization.)
  • DivergingHistories (client sent incorrect file ident salt. This is likely due to using a synchronized realm after terminating and re-enabling sync. Please wipe the file on the client to resume synchronization.)

Apparently, it is normal to have these errors.

However, my issue is that the user, upon its first connection to a re-enabled sync (having an existing local Realm Sync on the iOS client), couldn’t fetch its own documents. And could only fetch them after a reboot of the iOS app (and therefore a reboot of Realm SDK).

Hi Jerome,

Apparently, it is normal to have these errors.

Correct, these errors are normal after a sync termination, and resolving them requires a client reset, as per Ian’s comment above.

However, my issue is that the user, upon its first connection to a re-enabled sync (having an existing local Realm Sync on the iOS client), couldn’t fetch its own documents. And could only fetch them after a reboot of the iOS app (and therefore a reboot of Realm SDK).

This behaviour is associated with the errors you’re observing.
In order for Sync to work again on clients after a sync termination, they will need to undergo a client reset to resume syncing of data. This is also why we do not recommend performing a sync termination in a production app unless otherwise advised by MongoDB staff. On this note we also recommend that before launching a production app, it should be on a minimum of M10 cluster tier so that a termination is not forced should you need to upgrade from a shared tier to a dedicated tier in the future.

As per the client reset article linked above, we recommend that you include client reset handling in your app to handle such situations so that the user will not have to manually reset it via uninstall/reinstall the app etc. Please see examples of client reset handling in your particular SDK.
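
As a rough illustration of the flow, and not the SDK's actual API, the sketch below models client reset handling with a hypothetical LocalSyncStore protocol. The two error names are the ones quoted earlier in this thread; everything else (the protocol, its methods, RecoveryOutcome, FakeStore) is an assumption made for the example. In a real app the two steps would be backed by the client reset facilities of your SDK.

```swift
import Foundation

// Error names quoted in this thread; both messages ask for the local file to be wiped.
let clientResetErrorNames: Set<String> = ["BadClientFileIdent", "DivergingHistories"]

// Hypothetical abstraction over the app's local Realm handling.
protocol LocalSyncStore {
    func closeAndDeleteLocalFiles() throws
    func reopenAndDownload() throws
}

enum RecoveryOutcome: Equatable {
    case notAClientReset
    case recovered
    case failed(String)
}

// Wipes the local file and reopens the Realm, but only for errors that call for it.
func recover(fromErrorNamed name: String, using store: LocalSyncStore) -> RecoveryOutcome {
    guard clientResetErrorNames.contains(name) else { return .notAClientReset }
    do {
        try store.closeAndDeleteLocalFiles()
        try store.reopenAndDownload()
        return .recovered
    } catch {
        return .failed(String(describing: error))
    }
}

// In-memory stand-in used only to exercise the flow.
final class FakeStore: LocalSyncStore {
    private(set) var calls: [String] = []
    func closeAndDeleteLocalFiles() throws { calls.append("wipe") }
    func reopenAndDownload() throws { calls.append("reopen") }
}

let store = FakeStore()
let outcome = recover(fromErrorNamed: "BadClientFileIdent", using: store)
print(outcome, store.calls)
```

The point of the design is that the wipe always happens before the reopen, and that unrelated errors never trigger a wipe.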

Regards

I have indeed prepared my iOS app to run a client reset properly (and logout + delete everything locally).

Yesterday, I ran a few tests with Terminate Sync and I have noticed a specific flow

  1. Terminate Sync
  2. Re-enable Sync
  3. Realm UI backend shows a blue toaster saying something like “Syncing 345/370 documents…”. It takes a few minutes. Afterwards, everything looks fine. However…
  4. I try to log in on my iOS app, run a client reset and do everything correctly (even uninstall/reinstall the iOS app), but Realm iOS still can’t fetch any of my documents.
  5. If I check the logs of the Realm backend, I see hundreds (thousands?) of entries after re-enabling sync which look like this:

  6. It seems like users shouldn’t connect to Realm Sync while the Realm backend is reconstructing itself, because Realm iOS won’t be able to fetch anything during this process.

Is my observation correct? If it is correct, how can I know when everything is finalized and I can allow users to be back on the app?

Hi Jerome,

The logs you’re seeing are related to the blue banner sync progress you mentioned.

When you terminate sync, it clears out the Sync metadata, which contains all the instructions for sync to happen to and fro between the cloud and the client. When re-enabling sync after a termination, this metadata needs to be rebuilt, and you’ll see the blue banner showing the progress of the MongoDB data being translated to “changeset” instructions that can be downloaded to clients when they open Realms.

You’ll notice those logs in your screenshot do not have a value for the “user id” column, which means they are not requests made by users. These logs are generated by a Realm process which handles the rebuilding of instructions when enabling sync. You’ll also see such logs every so often when writes occur either in clients or directly in MongoDB, because the corresponding instructions need to be created.

Regards

Thank you for the explanation.

One last question: how do you suggest handling the mobile clients while Realm is rebuilding the metadata?
Currently, we have created a “server down” flag on our own server, and when the iOS client sees this flag, it won’t connect to Realm at all.

But this solution requires another server beside Realm server. Do you have an in-house solution?
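
For reference, the client-side check we use looks roughly like the sketch below. The JSON shape and the names ServerStatus, serverDown and syncDecision are simplified, hypothetical stand-ins for our own endpoint's payload. If the status can't be decoded, this sketch keeps the client away from Realm rather than risk syncing during a rebuild.

```swift
import Foundation

// Hypothetical status payload served by our own backend (not by Realm).
struct ServerStatus: Decodable {
    let serverDown: Bool
}

enum SyncDecision: Equatable {
    case connect
    case waitForMaintenance
}

// Decides whether the app may open the synced Realm.
// An unreadable status is treated as "down" so the client stays offline.
func syncDecision(fromStatusJSON data: Data) -> SyncDecision {
    guard let status = try? JSONDecoder().decode(ServerStatus.self, from: data) else {
        return .waitForMaintenance
    }
    return status.serverDown ? .waitForMaintenance : .connect
}

let down = syncDecision(fromStatusJSON: Data(#"{"serverDown": true}"#.utf8))
let up = syncDecision(fromStatusJSON: Data(#"{"serverDown": false}"#.utf8))
let unreadable = syncDecision(fromStatusJSON: Data("not json".utf8))
print(down, up, unreadable)
```

The app calls Realm.asyncOpen only when the decision is .connect.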

I just did another test: I forced a metadata rebuild (by activating Dev Mode), and if I update any document that the metadata system hasn’t refreshed yet, the update is lost.
As I understand it, the metadata rebuild happens after a Terminate Sync + Dev Mode activation + an update to the Realm schema. So it seems to be something that can happen in prod, and we really don’t want users to lose their progress because Realm metadata is refreshing.