“non-recoverable error processing event: dropping namespace ns=‘.’ being synchronized with Device Sync is not supported. Sync cannot be resumed from this state and must be terminated and re-enabled to continue functioning.”
As described in the thread below, this has left devices unable to communicate with MongoDB. We have a client using the app right now, so this is a major issue.
The thing is, I had no intention of dropping the collection in question; I don't really even know what that means. In fact, I can pinpoint almost the exact time the issue occurred, because that is when the last data was uploaded, and I was on holiday at the time, so I don't think it's anything I could have done.
So my questions are:
How could the collection have been dropped? I'd like to know so I can avoid the issue recurring in the future.
Is there a way of "un-dropping" the collection without terminating and restarting sync?
If I do have to restart sync, what will happen to all the data that is currently (I'm assuming) stored on our users' devices? This issue occurred while I was away, so users have been uploading data for a week. If all of it gets lost, it would be a bit of a catastrophe.
It's been a while since we heard from you; how are you? Thank you for raising your concerns.
This error occurs when a synced collection is dropped from the database, which brings sync to a halt. We do not recommend dropping synced collections. The only way to recover from this state is to terminate and re-enable sync.
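To make the failure mode a little more concrete, here is a toy model in Python. It is not how Atlas Device Sync is actually implemented: a process consumes change events for the synced namespaces and stops permanently once one of them is dropped. The event shape (operationType, ns.db, ns.coll) follows MongoDB change stream events; the class names, the appdb.User namespace and the halting logic are assumptions made for illustration.

```python
from dataclasses import dataclass, field


class NonRecoverableSyncError(Exception):
    """Raised when an event leaves toy sync in a state it cannot resume from."""


# Toy model (hypothetical): NOT the Device Sync implementation.
@dataclass
class ToySyncTranslator:
    # Namespaces ("db.collection") that this toy sync process replicates.
    synced_namespaces: set[str]
    applied: list[dict] = field(default_factory=list)
    halted: bool = False

    def process(self, event: dict) -> None:
        if self.halted:
            # Once halted, nothing more is applied; the only way forward is
            # to start over (i.e. terminate and re-enable sync).
            raise NonRecoverableSyncError("sync is halted; terminate and re-enable it")
        ns = f"{event['ns']['db']}.{event['ns']['coll']}"
        if event["operationType"] == "drop" and ns in self.synced_namespaces:
            self.halted = True
            raise NonRecoverableSyncError(
                f"dropping namespace ns='{ns}' being synchronized "
                "with Device Sync is not supported"
            )
        # Every other event is simply recorded as applied.
        self.applied.append(event)


translator = ToySyncTranslator(synced_namespaces={"appdb.User"})
translator.process({"operationType": "insert", "ns": {"db": "appdb", "coll": "User"}})
try:
    translator.process({"operationType": "drop", "ns": {"db": "appdb", "coll": "User"}})
except NonRecoverableSyncError as err:
    print(err)  # dropping namespace ns='appdb.User' being synchronized with Device Sync is not supported
```

The point of the sketch is that a drop is not "just another write": once the namespace is gone there is nothing consistent to resume from, which is why the only exit is to start sync over.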
If your application is running with dev mode on, then anyone with access to your app can delete the collection from the database. Could you please verify that dev mode is off?
Could you please confirm whether your application implements client reset handling? Client reset logic can help recover unsynced changes from client devices. Please refer to the Client Reset section for more details.
Can you please help me understand how I could have dropped the synced collection from the database? I have no idea how this could have happened, but it was certainly not my intention, so it would be great to understand how to stop it from happening again, as it has caused a fairly major issue!
Does dropping the database mean deleting it? I very much don't want that to happen, as it holds lots of key data!
If your application is running with dev mode on, then anyone with access to your app can delete the collection from the database. Could you please verify that dev mode is off?
A correction to this section: with Development Mode on, anyone can change the schema from the client side. Separately, any user who has permission to access the database can delete a collection. You can check the project activity feed for recent actions.
This is accessible from the Data Services tab, top right corner.
Depending on your SDK version, if you have implemented client reset logic in your application, unsynced changes can be recovered. However, to resume sync between Atlas and the devices, you will still have to terminate and re-enable sync.
Hi, a quick clarification: no Device Sync user (regardless of whether Development Mode is on) can drop a collection. This is an action an admin takes, either through the Data Explorer / Compass or using the shell / drivers. A cluster update does not drop collections.
Depending on when it happened, we may be able to track down how this was dropped, but I am quite certain it must have been a manual action. If you would like us to keep looking into this, could you:
Check with your team and confirm that no one accidentally dropped this collection
Send us your group_id (see the URL when you are in the Atlas console)
As for where to go from here, terminating and re-enabling sync is currently the only option. In special cases we can let you skip past this event, but that leaves an unhealthy state in which Realm and Atlas are no longer in sync, and we do not recommend it. If client reset logic is enabled, you should not experience data loss when terminating and re-enabling sync.
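A rough sketch of why client reset logic protects unsynced data, assuming a much-simplified model in which the device keeps the last server-confirmed state plus a list of pending local writes. The "recover" and "discard" modes are loosely named after the recover and discard behaviours described in the client reset documentation; real recovery follows more detailed rules (for example around conflicting writes) that this sketch ignores. All names and values here are hypothetical.

```python
from dataclasses import dataclass, field


# Hypothetical, simplified model of a device's local Realm file.
@dataclass
class ToyLocalRealm:
    # Last state the server confirmed, keyed by object id.
    synced_state: dict[str, float]
    # Writes made on the device that the server has not acknowledged yet.
    unsynced_changes: list[tuple[str, float]] = field(default_factory=list)

    def view(self) -> dict[str, float]:
        """What the app sees: synced state with pending local writes on top."""
        state = dict(self.synced_state)
        for key, value in self.unsynced_changes:
            state[key] = value
        return state


def client_reset(local: ToyLocalRealm, fresh_server_state: dict[str, float],
                 mode: str) -> tuple[ToyLocalRealm, list[tuple[str, float]]]:
    """Return the reset local realm and the writes to upload to the server."""
    if mode == "discard":
        # The old local file is replaced; pending writes are thrown away.
        return ToyLocalRealm(dict(fresh_server_state)), []
    if mode == "recover":
        # Start from a fresh copy of the server state, replay the pending
        # writes on top, and queue the same writes for upload.
        pending = list(local.unsynced_changes)
        return ToyLocalRealm(dict(fresh_server_state), list(pending)), pending
    raise ValueError(f"unknown client reset mode: {mode!r}")


device = ToyLocalRealm(synced_state={"meal_1": 12.5},
                       unsynced_changes=[("meal_7", 0.9)])
server_after_reenable = {"meal_1": 12.5}

recovered, to_upload = client_reset(device, server_after_reenable, "recover")
print(recovered.view())  # {'meal_1': 12.5, 'meal_7': 0.9}
print(to_upload)         # [('meal_7', 0.9)]

discarded, _ = client_reset(device, server_after_reenable, "discard")
print(discarded.view())  # {'meal_1': 12.5}
```

In this model the difference between "no data loss" and "data loss" after terminating sync comes down entirely to whether the reset replays the pending writes or throws them away.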
Thanks very much for your help with this @Tyler_Kaye.
Just so I know (so that I can confirm with my team that this wasn't done accidentally): if I wanted to drop a collection in Compass, how would I do it? Would it be by hitting the delete button next to the collection (see image below)?
By the way, the collection it says was dropped (the User collection) is still there in Compass. If it had been dropped, wouldn't you expect it to no longer be there?
There's only one other admin on the team. If it turns out he didn't do this accidentally, does that mean we have likely been hacked?
For the group_id, please find the Atlas console URL below. If this is the wrong URL, let me know.
Hi, a collection can be dropped in Compass / Data Explorer with the trash button, or from the shell / drivers with the dropCollection() method.
Thank you for passing along the link to your cluster. I am forwarding it to someone on the Atlas team to verify that nothing during the cluster upgrade caused this.
As to why the collection still exists, a number of things could have re-created it. Do you still see all of the users you would expect in there? If so, there is a chance we can let you keep syncing, though that would be very surprising to me.
OK, thanks very much; let me know what the Atlas team say. The app does seem to have stopped syncing at almost exactly the time the cluster upgraded, and I would be very surprised if either of us accidentally deleted our User collection by hitting the trash button (presumably an "are you sure?" popup would have appeared before the collection was dropped?), especially given that neither of us remembers being online at that time. So if it's not the upgrade, I can only think we were hacked, although I can't really imagine who would want to do that either.
I can still see all of the users I would expect to see. The only issue is that their data stopped updating when synchronisation stopped, so the device data will be out of sync with the Atlas data.
You mention that you may be able to let us keep syncing. Do you mean without terminating and restarting sync? Would it be a problem that the device and Atlas data are currently out of sync with each other?
I have filed a ticket with the Atlas team to determine whether an issue occurred on their side and, if not, to produce some evidence that this was indeed user-driven.
In the meantime, I would still suggest terminating and re-enabling sync (the outcome of the investigation won't change that). I would also take this opportunity to mention that we strongly recommend against using shared-tier clusters for any production environment. They have built-in rate limiting, limited visibility and metrics, and are generally more susceptible to network outages.
Ideally, you would terminate sync, scale your cluster up to an M10, and then re-enable sync. This should be a safe operation; it will simply cause your clients to reset and re-upload any unsynced changes.
I will keep you informed about the investigation. I also find the timing suspicious, though I would be very surprised if routine maintenance led to this.
I will take your advice on scaling up our cluster (cost depending), as we would like to avoid issues like this happening again!
I just want to clarify before terminating and re-enabling sync: would you expect the data our users have saved over the past week and a half to be preserved? We are doing a 5-week trial with this client, so if user data is likely to be lost, it may be best to wait until the trial is over before terminating and re-enabling sync.
Hi, a quick question first. Does your workload involve making changes in Atlas and ensuring they make their way to Realm? Looking at the earlier metrics, it does seem like you used to have some writes flowing from Atlas to the devices.
If you are in a trial and do not need writes flowing to/from MongoDB at the moment, the safest option for your business is likely to wait, but terminating and re-enabling sync will eventually be necessary and should be a safe operation. (It is required when migrating off a shared-tier cluster, but once you are on a dedicated cluster you can scale up and down freely.)
We are exploring a few re-architectures of this replication component so that sync can resume safely without being terminated in cases like this, but unfortunately they are still in the early stages.
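The rule above (terminate and re-enable sync when leaving a shared-tier cluster, scale freely between dedicated tiers) can be written down as a small checklist function. This is a hypothetical sketch: the function name and step wording are made up, and treating M0, M2 and M5 as the shared tiers is an assumption based on Atlas tier names at the time of writing.

```python
# Assumption: these are the shared-tier cluster sizes; everything else is dedicated.
SHARED_TIERS = {"M0", "M2", "M5"}


def scaling_plan(current_tier: str, target_tier: str) -> list[str]:
    """Hypothetical checklist for changing cluster tier with Device Sync enabled."""
    if current_tier in SHARED_TIERS and target_tier not in SHARED_TIERS:
        # Leaving the shared tier: sync must be terminated first and
        # re-enabled afterwards; clients then reset and re-upload.
        return [
            "terminate Device Sync",
            f"scale cluster {current_tier} -> {target_tier}",
            "re-enable Device Sync",
            "clients reset and re-upload unsynced changes",
        ]
    if current_tier not in SHARED_TIERS and target_tier not in SHARED_TIERS:
        # Dedicated to dedicated: scale freely while sync keeps running.
        return [f"scale cluster {current_tier} -> {target_tier}"]
    raise ValueError("this sketch only covers moves onto or between dedicated tiers")


for step in scaling_plan("M0", "M10"):
    print(step)
```

Encoding it this way mainly makes the asymmetry visible: the disruptive terminate/re-enable cycle is a one-off cost of leaving the shared tier, not something that recurs on every later resize.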
Yes, as part of our normal workload we make changes in Atlas that need to make their way to Realm. Our app lets people track the carbon footprint of their food purchases, so we have a database of food items and their associated footprints, which we occasionally update in Atlas. We also occasionally update user data.
OK, I will make a decision on terminating and re-enabling sync. The downside of waiting is that new users currently cannot create an account: creating a new user requires communication between their device and Atlas, so the app crashes every time a new user tries to create an account.
Let me know when the Atlas team get back to you with the results of their investigation. If this is something we’ve accidentally done on our end it would be good to know so that we don’t do it again.
I was just wondering whether the Atlas team has got back to you yet about whether this issue was caused by a manual action or happened during the cluster update?
Hi @Laurence_Collingwood, I checked in with the team and they couldn't find any evidence that this was related to the ongoing maintenance work. I cross-checked that against occurrences of this error across our systems and found no spike in this error type, which I would have expected to see if this were an issue affecting all free-tier clusters. I also looked over what that maintenance was doing, and it seems unlikely to have caused this issue.
Unfortunately, because it is a free-tier cluster, we have very limited metrics and logging for it, so it is difficult to investigate this any further (https://www.mongodb.com/docs/atlas/mongodb-logs/).
I would be happy to continue chatting about the best path forward, but unfortunately, I am at a bit of a dead end in terms of investigating what exactly happened here.
I just terminated and re-enabled sync, and it looks as though all the data stored on devices since the issue occurred (one month ago) has been lost.
I just wanted to make you aware of this, because the advice from your team above was that this was unlikely to happen, since we have client recovery enabled.
I also wanted to double-check: is there really no way of recovering this data? And is there anything I should have done differently so that client data would not have been lost? All I did was terminate and re-enable sync as advised.
Hi, that is unexpected. What should happen is that when a device reconnects and goes through the client reset, it re-uploads all of the lost changes. Have those devices reconnected yet?
We are working on ways to remove this dependency on the device reconnecting. Unfortunately, for now, terminating sync wipes the history, so until a device reconnects, its unsynced changes exist only on that device.
I am attaching this to an Epic in the hope that we can prioritize changing this interaction.
As a check-in: is the system healthy now, and can you see changes flowing to/from Atlas?
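A toy model of the reconnect behaviour described above, assuming that terminating sync wipes only the server-side sync history (documents already in Atlas stay put) and that pending writes live only in the device's local file. In this model the unsynced writes reach the server only when the device reconnects after the reset; if the local file is deleted first, there is nothing left to upload. The classes, keys and values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToyServer:
    documents: dict[str, str] = field(default_factory=dict)
    # Server-side sync history; in this toy it is just a log of uploaded keys.
    history: list[str] = field(default_factory=list)

    def terminate_sync(self) -> None:
        # Terminating sync wipes the sync history; documents already in
        # Atlas are left alone.
        self.history.clear()


@dataclass
class ToyDevice:
    # Writes saved on the device that never reached the server.
    unsynced: dict[str, str] = field(default_factory=dict)

    def reinstall_app(self) -> None:
        # Deleting the app deletes the local file and every pending write in it.
        self.unsynced.clear()

    def reconnect(self, server: ToyServer) -> None:
        # On reconnect the client resets and uploads whatever it still holds.
        server.documents.update(self.unsynced)
        server.history.extend(self.unsynced)
        self.unsynced.clear()


def after_reenable(reinstall_first: bool) -> dict[str, str]:
    server = ToyServer(documents={"meal_1": "2.1 kg"}, history=["meal_1"])
    device = ToyDevice(unsynced={"meal_2": "0.8 kg"})
    server.terminate_sync()
    if reinstall_first:
        device.reinstall_app()
    device.reconnect(server)
    return server.documents


print(after_reenable(reinstall_first=False))  # {'meal_1': '2.1 kg', 'meal_2': '0.8 kg'}
print(after_reenable(reinstall_first=True))   # {'meal_1': '2.1 kg'}
```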
I can only speak for my own device, but after terminating and re-enabling sync, the app crashes when I try to log in, with the following error:
“The server has forgotten about this client-side file (Bad client file identifier (IDENT)). Please wipe the file on the client to resume synchronization”
The only fix I've found for this in the past is deleting and reinstalling the app. So I did that, which presumably wiped the locally saved data, so it is gone forever. If there is a way of resolving the bad client file error without deleting the app (and with it the locally saved data), I'd be grateful if you could let me know.
One thing to note, though, which is curious: when I was testing the app on an Xcode simulator (essentially another device) over the past month since the issue occurred, I'm pretty sure I could see data that I had saved on my actual device, suggesting that the data was not only saved to the device? In any case, when I log in on any device now, all the data I saved over the previous month is gone.
The system does seem to be healthy now, although I've not done a ton of testing. Immediately after re-enabling sync there were some strange events (values I expected to be updated were not), but this seems to have smoothed out for now.
By the way, I am currently looking at upgrading from the shared cluster, per your recommendation. Would the “serverless” option do the job? We have very low usage at the moment while we’re in the trial phase, so I think it would save us a lot of money compared to the “dedicated” option.
The error "The server has forgotten about this client-side file (Bad client file identifier (IDENT)). Please wipe the file on the client to resume synchronization" is expected: it is what happens when you terminate and re-enable sync and we force each client to reset. You should see those clients connect and re-upload their unsynced changes, as long as they were built with an SDK released within roughly the last year: https://www.mongodb.com/docs/atlas/app-services/sync/error-handling/client-resets/#client-reset-recovery-rules
As for this:
One thing to note, though, which is curious: when I was testing the app on an Xcode simulator (essentially another device) over the past month since the issue occurred, I'm pretty sure I could see data that I had saved on my actual device, suggesting that the data was not only saved to the device? In any case, when I log in on any device now, all the data I saved over the previous month is gone.
I'm a little unclear on what you mean. Would you mind elaborating?
As for Serverless clusters, Device Sync does not yet support them, for the same reason we do not support migrations from Shared to Dedicated clusters. See: Can I use Realm Sync with Serverless Atlas
This is an unfortunate limitation, but we are working with the Serverless team to remove the constraints that prevent us from reliably syncing data to and from serverless clusters.