Simon, Jerome,
All I can say is WOW. These posts are a gut punch to read for those of us here at MongoDB working hard to make these products better, and it’s disappointing that we have not responded sooner.
To be intellectually honest and show some vulnerability, I think each of us who read your note line by line probably tried to respond, felt ashamed and/or just didn’t quite know how to respond fully, and then said “I will come back to this later.” Now it’s been 12 days, a second community member has responded on top with a similar albeit different issue, and 6 days have passed since then, which likely caused folks who had planned or hoped to respond to think “now how can I really even begin to respond?” None of that is meant as an excuse; it is only to start engaging with you.
THANK YOU both for sharing the long and incredibly frustrating journeys you’ve been on with us. I believe that both of you have experienced different issues even if they look and sound similar on the surface.
Simon, I believe your M2 to M5 upgrade ran into an edge case in the brittle backend processes that move the data. On the backend we pipe mongodump to mongorestore, and we have occasionally seen classes of errors that require manual intervention to fix. We plan to move to a more modern backend utility to power these upgrades, but unfortunately that utility is still in development, and we prioritized upgrades from our serverless environment to dedicated clusters ahead of M2 to M5 upgrades, which may have been a mistake in hindsight. The fact that you felt unable to get support when you needed it is also unacceptable: even if this was a small database, your users were counting on it and we let you down. The process you went through to pin down the data issues afterwards sounds nightmarish. I am still not 100% clear on whether you think the data issues came from your app writing during the upgrade or restore, or whether you believe the backed-up data itself had the issue. If the latter, that is very concerning.
And then Jerome, I believe your issue may be completely different, and related to the fact that the oplog is not preserved upon upgrade. This can cause a Sync-enabled application to lose the ability to stay in sync and to need to re-initialize. We are trying to figure out how to handle this situation more elegantly at the architectural level; it is unfortunately a nuanced and technically complex topic to address properly. Your suggestion around better ergonomics for managing this state is a good one: ideally we would not need the state at all.
Taking a step back, I want to really celebrate both of you for taking a positive “help the community” tone instead of coming in hot and angry, as I probably would have done after going through such problematic experiences. Your patience and willingness to help us help the community is an incredible sign of maturity that all of us at MongoDB appreciate.
-Andrew (SVP Cloud Products)
(we will reach out separately via email)