Realm client reset struggles

We are trying to implement client reset in a Xamarin app that uses Realm cloud sync, and we are running into a couple of problems. A small sample Xamarin app accompanies this question. First and foremost, here are the steps that we perform to trigger a client reset:

  1. We have a Realm app set up with anonymous authentication and sync enabled.
  2. There’s a class called MyInformation in the sample project.
  3. I’ve declared a property called Test in the cloud schema, and I am not referencing that property in the sample app.
  4. To trigger a client reset, I simply change the type of the property from string to double and then back.

Our real app, of course, is much more complicated and it asks for the user confirmation before doing the client reset and has a much more complex schema, but we’ve distilled the code to the minimum to be able to reproduce the issues that I will explain below.

Data duplication issue

After a client reset is triggered, a process starts in the cloud that copies the data from the previous schema to the new schema, and this is the phase where we face the first issue. We receive the ClientResetException, invoke InitiateClientReset, and get back true, which indicates that the client reset was successful. At this point, we restart the app’s main UI. While doing so, we make sure that all of the changes are synced by invoking the following bit of code:

await Task.WhenAll(
    _realm.SyncSession.WaitForDownloadAsync(),
    _realm.SyncSession.WaitForUploadAsync());

Then, we have a step where we check if there is already MyInformation data in the database or not, and if there is not, we generate the data. This is the simplified version of what we do in the real app, where we have to bootstrap the data of the user if it hasn’t been done before. And the problem is that even though we’ve asked the app to wait for the pending changes to sync and we do have MyInformation in the cloud-synced DB, _realm.All<MyInformation>() would return no results and our data generation logic would kick in, essentially duplicating already existing data. I guess the problem is that the data is not yet available in the new realm instance as it’s being copied.

I know, probably in this concrete case the better strategy would have been to perform the data bootstrapping in the cloud when the user’s account is created for the first time by putting some kind of trigger that would execute the data generation code for only the new users. However, we are really trying to finalize realm cloud sync and release the app and we don’t have the capacity for that yet. Also, IMHO, if the data is not yet ready in the cloud, the app should not be allowed to sync any data to it. Is there a way to somehow wait on the app and restrict cloud sync if the cloud data copy is being executed?

Client reset exception infinite cycle

This one took us a couple of weeks to find and reproduce. In our app we were experiencing an infinite client reset cycle: we would handle the client reset properly, and then after several seconds the app would receive another client reset exception. Initially, we thought that we were not disposing of all of the Realm instances. After a lot of “needle in a haystack” style searching, we found out that the cycle happens because we monitor upload and download progress (the way we do that is described in another post). When we commented out the progress monitoring logic, everything worked just fine. Again, we started thinking that we had done something wrong in our reactive chain and that something was being leaked. But after stripping the code down and extracting it into the small sample I’ve attached, we discovered that the bare, directly invoked progress monitoring logic alone results in this issue.

_bindings = new CompositeDisposable
{
    _realm.SyncSession
        .GetProgressObservable(ProgressDirection.Download, ProgressMode.ReportIndefinitely)
        .Subscribe(progress => DownloadProgress = (double)progress.TransferredBytes / progress.TransferableBytes),
    _realm.SyncSession
        .GetProgressObservable(ProgressDirection.Upload, ProgressMode.ReportIndefinitely)
        .Subscribe(progress => UploadProgress = (double)progress.TransferredBytes / progress.TransferableBytes)
};

We execute this logic during the launch of the app. In this small sample, we do that right after creating the realm instance. If we remove these lines from the sample, the client reset works just fine. If we add them, we end up in a client reset loop. The only way to exit the loop is to force close the app and restart it. I think there’s a bug in the Realm .NET SDK: it seems to cache some connection under the hood, and disposing of all of the realm instances doesn’t dispose of that connection, which causes this loop.

A workaround for this would be asking the user to force restart the app. However, I do hope that this bug can be addressed quickly, considering I have a 100% reproducible sample.


Hi @Gagik_Kyurkchyan, thanks a lot for taking the time to put together a reproducible example and a precise explanation!
We will investigate and keep you posted.


@papafe an update from our side. If I change progress monitoring logic to utilize ProgressMode.ForCurrentlyOutstandingWork instead of ProgressMode.ReportIndefinitely the issue no longer happens and it resolves the infinite loop. However, we do need to use ReportIndefinitely, because we need to show the progress of sync whenever new data arrives for sync. I’ve explained this in the other post. I think this proves my point of ProgressMode.ReportIndefinitely doing some session caching in the back and not disposing of it properly when the subscription is disposed of.

Perhaps, I could figure out something, for now, to work around this. But the problem is that our progress reporting is already very complicated due to the issues described in that post. I would not like to make it even more complicated.

@papafe is there any news? I am just replying so that this thread does not get closed.

@Gagik_Kyurkchyan sorry for the delay in answering.

Regarding your problem with data duplication, unfortunately it is to be expected. When client reset happens there are several steps that are happening in the cloud to make sync reinitialise and that takes a certain amount of time. During this period if you try to open a realm locally then you would not be able to see/query any of the data that was there before the client reset. Without going too much into details, the amount of time necessary depends also on the amount of data saved. I understand your frustration, but if the local client had to wait for sync to be completely reinitialised before opening a realm, then the local realm would be unusable for an undetermined (and possibly very long) amount of time. For your use case a possibility would be to query the MongoDB Atlas cluster directly, as shown here, for example.

Regarding your client reset cycle unfortunately we are still investigating why it is happening, but we’ll keep you posted when we discover something.


Thanks @papafe. It does make sense. However, is there a way to detect from the client side that the cloud DB is busy and, in effect, in maintenance mode? In our case, the app would be unusable anyway, as it won’t have any data in it after the client reset. Even before you wrote this, we had decided to query MongoDB directly through the GraphQL API and check the availability of the data, which partially solves our issue. However, we have put in a crutch to detect that the database is currently being copied: we check whether the local Realm data is empty. As long as the local data is empty, we block the user from opening the app and tell them that we are in maintenance. It’d be really great to have some explicit way of detecting this special state.

As for the client reset cycle, thanks for the update. I am interested in whether you were able to reproduce the issue I’ve explained?

@Gagik_Kyurkchyan Unfortunately it’s not something we can detect from the client side.

And yes, I’ve managed to reproduce your issue “fortunately” :slight_smile:

@papafe I am “glad” that you were able to repro the issue :slight_smile:

As for the possibility of detecting the copy on the client side, that’s clear. Since you explicitly said it is “not something we can detect from the client side”, do you imply that I can detect it server-side? If so, we could build an API that talks to the Realm cluster on the backend, or perhaps a function within the Realm cluster, that would detect this scenario unambiguously, and the client could talk to that API instead.

The way we’ve worked around this specific issue is by checking a table that we know for sure should have data and detecting that it has none, which implies that the server is in that intermediate copy state. However, this can miss cases where that specific table already has data but others do not, so it is a crutch and not a “mathematically accurate and reliable” solution.

@Gagik_Kyurkchyan sorry for the late reply, unfortunately I am having issues as I do not get notifications when a reply is posted, I’ll need to check how to solve this :slight_smile:

Regarding your issue, there is actually an endpoint in the MongoDB API that you can poll. In particular you can follow these docs: Atlas App Services API and just replace sync/data with sync/progress. So the full endpoint would be https://realm.mongodb.com/api/admin/v3.0/groups/{groupId}/apps/{appId}/sync/progress.
It’s an oversight that this endpoint has not been documented, but it should return something like this:

{
    "progress": {
        "testDB.Errand": {
            "started_at": "2021-09-21T20:32:06.267Z",
            "updated_at": "2021-09-21T20:32:18.511Z",
            "complete": true
        },
        "testDB.Task": {
            "started_at": "2021-09-21T20:32:06.253Z",
            "updated_at": "2021-09-21T20:32:18.52Z",
            "complete": true
        }
    }
}

This is used specifically for the sync initialisation phase, and initialisation is complete only once all the objects in the list have complete == true. This is probably not the straightforward API you’d expect, but I hope it helps.
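As a rough sketch of how a backend service might consume that response, the C# below checks whether every entry under "progress" has complete == true and polls until that holds. It assumes the response has the shape shown above. The names SyncProgressCheck, IsInitialSyncComplete, WaitUntilCompleteAsync and fetchJson are hypothetical, and the HTTP call itself (including Admin API authentication) is left to the caller-supplied fetchJson delegate. Treating an empty list as "not complete" is also an assumption, not documented behaviour.

```csharp
using System;
using System.Text.Json;
using System.Threading.Tasks;

public static class SyncProgressCheck
{
    // True only when "progress" is a non-empty object and every entry has "complete": true.
    public static bool IsInitialSyncComplete(string json)
    {
        using JsonDocument doc = JsonDocument.Parse(json);
        JsonElement root = doc.RootElement;

        if (root.ValueKind != JsonValueKind.Object
            || !root.TryGetProperty("progress", out JsonElement progress)
            || progress.ValueKind != JsonValueKind.Object)
        {
            return false; // unexpected shape: treat as "not ready"
        }

        bool sawEntry = false;
        foreach (JsonProperty entry in progress.EnumerateObject())
        {
            sawEntry = true;
            JsonElement value = entry.Value;
            if (value.ValueKind != JsonValueKind.Object
                || !value.TryGetProperty("complete", out JsonElement complete)
                || complete.ValueKind != JsonValueKind.True)
            {
                return false; // at least one collection is still being initialised
            }
        }

        // Assumption: an empty list means initialisation has not started yet.
        return sawEntry;
    }

    // Polls the caller-supplied fetchJson (hypothetical; it should return the body of the
    // sync/progress endpoint) until completion is reported or the attempts run out.
    public static async Task<bool> WaitUntilCompleteAsync(
        Func<Task<string>> fetchJson, int maxAttempts, TimeSpan interval)
    {
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            if (IsInitialSyncComplete(await fetchJson()))
            {
                return true;
            }
            if (attempt < maxAttempts)
            {
                await Task.Delay(interval);
            }
        }
        return false;
    }
}
```

The client app could then ask such a backend whether initialisation has finished, instead of inferring it from an empty local table.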


Hi @Gagik_Kyurkchyan sorry for not giving you updates about the client reset loop issue, but it’s a little complex to investigate it.

I have a question. It seems I am able to reproduce the client reset loop only on Xamarin.iOS, but not on Xamarin.Android. Can you confirm that?


As an update, we’ve created this issue here in realm core for the client reset loop.


@papafe thanks so much for the replies. It seems I have the same issue as you: I do not get notifications, or they are lost :slight_smile:
Thanks for sharing the API to check the status of the DB progress. I will play with this and see how we can make use of it within our application to replace the crutch we’ve put in.

As for the client reset loop, our app is based on Xamarin.Forms and targets iOS & UWP (but not Android). I haven’t checked for this issue on Android, but we did observe it on both iOS and UWP.

@Gagik_Kyurkchyan I have actually asked internally and it seems that there is a problem with email notifications at the moment but they are working on it.

Thanks for the update. I’ve actually tried it only on iOS and Android as I’ve played around with your example project.

Anyways, I’m sorry that you are experiencing the error with the loop but, unfortunately, it will not be an easy fix. So for now asking the user to restart the app would be the best option in my opinion. I understand your frustration but we need to be quite careful with the way we fix this.


Hey @papafe
Sorry, I just realized I haven’t checked this thread for a while, and I haven’t received a notification either :slight_smile:

Even though my exact problem wasn’t resolved, I am glad that you’ve acknowledged it and that I was able to pinpoint the issue :slight_smile: Thanks for the support so far. The restart workaround, while not ideal, does exist, and hopefully users won’t encounter it, as client reset is something that should be avoided at all costs.

@Gagik_Kyurkchyan No worries :wink:
I’m glad that you think that you have a working solution for now. Actually we’re still investigating what is happening, but it’s quite difficult to pinpoint the cause.

Regarding the notifications, they were fixed shortly after I sent my previous message, so hopefully you should get a notification for this and all the other messages you’ll receive in the future.


@Gagik_Kyurkchyan I have an update regarding this.
We discovered that the underlying cause is actually a timing issue. What you can do right now to solve the problem on your side is to add a delay before calling LoadAsync again after the client reset. Something like a couple of seconds seems to be fine on our side (await Task.Delay(2000)).
I obviously understand that this is a hacky solution, but it’s something you could work on if you really want to avoid letting the user restart the app in case of a client reset.
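A minimal sketch of that delay, kept independent of the Realm SDK: the helper below waits for a given interval and then runs whatever reopen step the app uses. ClientResetRecovery and ReopenAfterDelayAsync are hypothetical names, and the reopen delegate stands in for the sample app's LoadAsync. The two-second value is only the figure mentioned above, not a guaranteed threshold.

```csharp
using System;
using System.Threading.Tasks;

public static class ClientResetRecovery
{
    // Waits for the given delay, then runs the caller-supplied reopen step.
    // The delay is a workaround for the timing issue, not a proper fix.
    public static async Task ReopenAfterDelayAsync(Func<Task> reopen, TimeSpan delay)
    {
        if (reopen == null)
        {
            throw new ArgumentNullException(nameof(reopen));
        }

        await Task.Delay(delay);
        await reopen();
    }
}
```

In the sample app this might be called after a successful client reset as await ClientResetRecovery.ReopenAfterDelayAsync(() => LoadAsync(), TimeSpan.FromSeconds(2));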


Thanks a lot, @papafe for the reply. Let us try this and get back to you. If it works, it’s definitely waaaay better than a restart!
