Keep getting Bad changeset (DOWNLOAD), even after re-installing app

Some users of my production app consistently get “Bad changeset (DOWNLOAD)” errors when using Realm Sync, even after re-installing the app.

The code of the sync error handler is the following (Swift):

realmApp.syncManager.errorHandler = { error, session in
    guard let syncError = error as? SyncError else { return } // avoid crashing on an unexpected error type
        
    switch syncError.code {
        
    case .clientResetError:
        guard let (path, clientResetToken) = syncError.clientResetInfo(),
              let realmFileURL = getClientResetRealmFileURL() else { return }
            
        DispatchQueue.main.async {
            let data = SyncErrorData(clientResetToken: clientResetToken, autoBackupPath: path, realmFileURL: realmFileURL, syncManager: realmApp.syncManager)
            AppDelegate.shared.rootViewController.handleClientResetError(data: data)
        }
            
    case .clientUserError: // "expired refresh token" error, after 30 days
        DispatchQueue.main.async {
            AppDelegate.shared.rootViewController.handleSyncError(.expiredRefreshToken)
        }

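    // For any other error, do the same as for an expired refresh token: log out the user.
    // Dispatched to the main queue like the other cases, since it calls into the root view controller.
    default:
        DispatchQueue.main.async {
            AppDelegate.shared.rootViewController.handleSyncError(.unknown(syncError))
        }
    }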
}

When this error occurs, as with any other kind of sync error, the user is logged out. But with this error, when users try to log in again, they get the same error (even if they uninstall and re-install the app).
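The branching of the handler above can be sketched as a pure function, which makes it easier to see why this error ends in a log-out. This is a minimal, Realm-free sketch: `SyncFailure`, `RecoveryAction`, and `recoveryAction(for:)` are hypothetical names used for illustration only, not Realm SDK APIs.

```swift
// Hypothetical, Realm-free mirror of the handler's branching.
// None of these names come from the Realm SDK.
enum SyncFailure: Equatable {
    case clientReset                 // corresponds to .clientResetError
    case expiredRefreshToken         // corresponds to .clientUserError
    case other(message: String)      // everything else, e.g. a bad changeset
}

enum RecoveryAction: Equatable {
    case runClientReset
    case logOut
}

/// Maps a failure to the action the handler takes for it.
func recoveryAction(for failure: SyncFailure) -> RecoveryAction {
    switch failure {
    case .clientReset:
        return .runClientReset
    case .expiredRefreshToken, .other:
        // Same policy as the handler: unknown errors are treated
        // like an expired refresh token.
        return .logOut
    }
}

let action = recoveryAction(for: .other(message: "Bad changeset (DOWNLOAD)"))
```

Under this policy, a “Bad changeset (DOWNLOAD)” error falls into the catch-all branch and logs the user out, which matches the behavior described here: logging back in simply hits the same error again.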

The error on the server is:

Error: ending session with error: integrating 1 changesets failed after 1 attempts in 7.540692071s: could not complete upload integration as this connection no longer owns the file ident; no action is needed, as the client has already established a new connection to the sync server to complete its upload (ProtocolErrorCode=201)

Logs: [ "Session was active for: 10s" ]

Partition: PUBLIC

Session Metrics: { "uploads": 1, "downloads": 1 }

Remote IP Address: 81.102.24.215

SDK: Realm Cocoa v10.15.1

Platform Version: Version 15.1 (Build 19B74)

The important part (I guess) is “could not complete upload integration as this connection no longer owns the file ident”, but I don’t understand what it means. The part that troubles me is “no action is needed, as the client has already established a new connection to the sync server to complete its upload”, which seems to indicate that this shouldn’t be an error.

It’s also worth noting that before getting this specific error, users got another error for a few months, “Bad progress information (DOWNLOAD)”, which also prevented them from using the app.

I’m guessing the source of the problem is a schema inconsistency, but that should be resolved by now. My concern is that a client reset is supposed to fix errors like this, and it does not here.

How can I get sync working again for those users?

Hello @Jean-Baptiste_Beau

My name is Josman and I am happy to assist you with this issue. Usually, a “Bad changeset (DOWNLOAD)” error is related to the history of a given partition. Errors of this type can be caused by actions that are not permitted, such as changing schemas with Development Mode off, setting an incorrect partition value, or syncing to or from the wrong partition.

In your particular scenario, it appears that the client is unable to rebuild the history for a particular partition because of two (or more) operations that conflict with each other, for example an update after a delete on the same document id.

Currently, there are only two possible solutions to this error:

  1. In order to investigate this further, I suggest you open a case using the support portal. This will allow us to help you in the best way possible and see if we may be able to fix this.
  2. Terminate Sync, wait 10 minutes, and re-enable Sync. We only recommend this as a last resort. Terminating Sync creates a new history for each of the existing partitions, thus eliminating the possible inconsistency. However, as this is a very aggressive action, it may have undesirable side effects, especially if client reset handling has not been implemented.

Please let me know if you have any additional questions or concerns regarding the details above.

Kind Regards,
Josman

Hi @Josman_Perez_Exposit,

Thank you for your answer.

Terminating Sync is a very risky and troublesome process, as you pointed out, and the last few times I had to do it were absolute nightmares that left the production app down for a few days.

Isn’t it possible to restart sync (or something similar) only for some users? I find it very surprising that to solve the problem of one user (or a few), I have to take the whole thing down for everyone.

Thanks,
JB

Hello @Jean-Baptiste_Beau

> Isn’t it possible to restart sync (or something similar) only for some users? I find it very surprising that to solve the problem of one user (or a few), I have to put the whole thing down for everyone.

Unfortunately no. Terminating Sync is, as you said, the last resort for fixing this kind of issue, and it cannot be performed for only some users. The bad changeset problem you are experiencing on that partition prevents the client from restoring the partition’s history. If you could open a support case, we would be able to help you in the best way possible and investigate whether we can solve this without terminating Sync.

Please let me know if you have any additional questions or concerns regarding the details above.

Kind Regards,
Josman

I have a similar issue with Sync. Clients reported that the app is “down”. I’m not sure whether it’s user-related or time-related, though. The error message is the same: “message handler failed with error: error handling "upload" message: could not complete upload integration as this connection no longer owns the file ident; no action is needed, as the client has already established a new connection to the sync server to complete its upload”.

Might be related to “Error Invalid Session on Old Client 10.3 or 10.4”.

It might also be related to the fact that async open took 4 minutes to complete, and we allow a reconnect once it takes more than 2 minutes.

The whole uncompressed database is ~7 MB, and it is even less for a specific user because of partitioning, so this is quite strange, especially because some of the users were in the USA and our cluster there is an M10 on AWS N. Virginia (us-east-1).

It is indeed related to “Error Invalid Session on Old Client 10.3 or 10.4”, as the symptoms are the same, but it might not be related to this topic even though the error looks the same.

It is happening right now, but it was fine from morning through the day. It takes more than 2 minutes during Realm async open before any data starts downloading. Moreover, this isn’t the first launch: it takes 4-5 minutes to async open the Realm on app restart.

01.10 20:38:13.168 - The app tries to async open two Realms
01.10 20:38:14.630 - The smaller one (user) successfully opened
01.10 20:40:36.307 - The bigger one (public) download start
01.10 20:40:46.171 - The bigger one (public) download finish

So it takes about 140 seconds to start the download and just 10 seconds to finish it. That’s beyond any expectations, especially because all the data should already be available locally, yet it looks like everything is being re-downloaded after the app restarts.
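As a quick check of those figures, the intervals can be computed from the timestamps in the log above. This is a small sketch using only the Swift standard library; `secondsSinceMidnight` and `elapsed(from:to:)` are helper names made up for this example, and it assumes both timestamps fall on the same day.

```swift
/// Converts a "HH:mm:ss.SSS" clock time into seconds since midnight.
/// Returns nil if the string does not have three colon-separated numeric parts.
func secondsSinceMidnight(_ clock: String) -> Double? {
    let parts = clock.split(separator: ":")
    guard parts.count == 3,
          let hours = Double(parts[0]),
          let minutes = Double(parts[1]),
          let seconds = Double(parts[2]) else { return nil }
    return hours * 3600 + minutes * 60 + seconds
}

/// Elapsed seconds between two clock times on the same day.
func elapsed(from start: String, to end: String) -> Double? {
    guard let startSeconds = secondsSinceMidnight(start),
          let endSeconds = secondsSinceMidnight(end) else { return nil }
    return endSeconds - startSeconds
}

// Async open requested -> download of the public Realm starts.
let waitBeforeDownload = elapsed(from: "20:38:13.168", to: "20:40:36.307")
// Download of the public Realm starts -> finishes.
let downloadDuration = elapsed(from: "20:40:36.307", to: "20:40:46.171")
```

The two results come out to roughly 143.1 s before the download starts and roughly 9.9 s for the download itself, in line with the rounded 140 s and 10 s quoted above.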

UPDATE: It looks like MongoDB was just updated from 4.4.10 to 4.4.11, and it works fine after the restart. I’m not sure whether it was the restart or the version update that helped, though.

UPDATE 2: This is happening again right now, so it was not fixed :man_shrugging:. The strange thing is that it is happening for the 3rd day in a row at the same time. It took 8 minutes to start up the app today. The server load is practically zero, but it behaves as if it were under 1000% load.

UPDATE 3: It works fine on the AWS Ireland (eu-west-1) M0 machine, but it doesn’t work for any of our AWS N. Virginia (us-east-1) projects. I think we will try to migrate; see “Is it possible to migrate MongoDB together with Realm and Sync to different cloud provider or region?”

Why do we need to wait 10 minutes? I tried terminating Sync and re-enabling it without waiting, and it also seems to work fine.

Hello @NightNight, sorry for the late reply on this. It is good practice to wait a minimum of 10 minutes between terminating and re-enabling Sync, to allow the Sync process to prune all pending operations.

Note that the Realm database on the server is shared by all the Realm applications within the same project. When you terminate Sync for a Realm application, the server process must delete all metadata associated with that application. So although you can terminate and restart it without any problem, depending on the size of the stored data, it is recommended (as a good practice) to wait a minimum amount of time to ensure that there are no problems when restarting Atlas Device Sync.

Please let me know if you have any additional questions or concerns regarding the details above.

Kind Regards,
Josman

I had to terminate Sync, and the message “Sync is currently terminating... Please wait for sync to finish terminating before enabling again.” has persisted for 30 minutes already and does not allow me to re-enable Sync. I doubt there is work that requires so much time, since there is only 2 MB of data.

Hello @Anton_P ,

Could you please send me the URL of your App Services project by private message? I would like to see what the issue could be.

Looking forward to your response

Please let me know if you have any additional questions or concerns regarding the details above.

Kind Regards,
Josman

Hello @Anton_P

As we discussed privately, I can confirm that the problem with your application is now fixed.

Sorry for the inconvenience and thank you for your patience.

Please let me know if you have any additional questions or concerns regarding the details above.

Kind Regards,
Josman
