UpdateMany changes are occasionally 'reverted'

Ilia_Shkolyar · June 15, 2022, 2:22pm

We are using the C# MongoDB.Driver library, version 2.13.1.

During our integration tests, the UpdateManyAsync method occasionally behaves strangely.
The flow is as follows:

1. We call await UpdateManyAsync with a FilterDefinition and an UpdateDefinition. Let’s say that we try to update a specific field on several entities from its current value X to Y.
2. We print the UpdateResult of the operation and see the following fields in the result:
   {"update_modified_count":3,"update_matched_count":3,"update_is_acknowledged":true}
3. We query the DB, retrieve the entities that should have been updated, and verify that they were indeed updated. All entities have the field set to Y.
4. After several seconds, a different flow queries the DB and retrieves all entities.
5. In some cases (again, only from time to time), the field value is X and our tests fail.

Several facts:

No flow updates these entities between steps 3 and 4.
This happens in our integration test flow, in which MongoDB is spun up via docker-compose.
There are no read/write replicas or anything of the sort.
Several tests suffer from this inconsistency, and all of them use UpdateManyAsync.
We have yet to see it happen in our production environment, but that doesn’t mean it hasn’t.
This happens only once in a while; most of the time everything works properly.
We have not seen this with any other MongoDB driver API, only with UpdateManyAsync.

Apart from some read/write replica implementation, which we don’t think exists in a simple docker-compose setup, our only guess is some caching done by the C# driver.
Is it possible that the driver sets some internal state (so the changes are reflected properly when queried within the same flow) but occasionally fails to persist it to the actual DB?

Any help would be greatly appreciated.

MaBeuLux88_xxx · June 16, 2022, 5:14pm

Hi @Ilia_Shkolyar, and welcome back!

I could be completely wrong, but here is my wild guess, as this has already happened to me, also in an integration tests / CI/CD setup.

Unit tests and integration tests are supposed to be completely independent of one another. So if you run them one by one on your computer, there is no problem.

The DB is reset to its default values (X). You run the test. You confirm it’s now Y in your assert statements. All good!

But CI/CD pipelines often run tests in parallel to reduce build time. So if integration tests 1 and 2 run in parallel and rely on the same MongoDB collection, you can get a conflict or race condition that randomly makes your tests succeed or fail.
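A minimal sketch of that race, written in Python rather than C# so it runs without a database (assumptions: a plain dict stands in for the shared collection, and `reset`, `update_many` and `read` are hypothetical helpers, not driver APIs). It replays one unlucky interleaving of two parallel tests step by step and reproduces the symptom from the original post: the update reports 3 modified documents and an immediate check sees Y, yet a later read sees X again.

```python
# Hypothetical stand-in for a MongoDB collection shared by all tests:
# document id -> value of the field under test. No driver is involved.
collection = {}

def reset(ids):
    """Test setup: reset the given documents to the default value X."""
    for doc_id in ids:
        collection[doc_id] = "X"

def update_many(ids, value):
    """Stand-in for UpdateManyAsync: set the field on every matched document."""
    for doc_id in ids:
        collection[doc_id] = value
    return len(ids)  # number of documents modified

def read(ids):
    """Stand-in for a find: return the current field values."""
    return [collection[doc_id] for doc_id in ids]

shared_ids = [1, 2, 3]  # both tests touch the same documents

# One possible interleaving when test 1 and test 2 run in parallel:
reset(shared_ids)                        # test 1: setup
modified = update_many(shared_ids, "Y")  # test 1: update reports 3 modified
check_right_after = read(shared_ids)     # test 1: immediate check sees Y
reset(shared_ids)                        # test 2: its own setup resets to X
check_later = read(shared_ids)           # test 1: later read sees X again

print(modified, check_right_after, check_later)
```

Run the two tests one after the other instead and the final read stays Y, which is why this kind of failure tends to appear only when the pipeline runs tests in parallel.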

Could this be what is happening here? The cache hypothesis doesn’t make sense, though: the collection could be altered by any other client, so a find command always HAS to read the data from the actual collection. Nothing can be cached here.

Cheers,
Maxime.

Ilia_Shkolyar · June 18, 2022, 6:46pm

Hello @MaBeuLux88_xxx!

First of all, thanks for the response.
We have a dedicated tenantId field in our Mongo collections to fully support a “multi-tenant” approach.
We use the same mechanism in our integration tests, so each new test creates a unique tenant id.
This means that DB operations (which can indeed run in parallel) never modify values belonging to other tenants/tests.
So unfortunately no, this is not the case here.
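The tenant-scoping argument can be sketched the same way, again in Python with an in-memory list standing in for the collection (assumptions: `update_many`, `reset` and `values` are hypothetical helpers, and the tenant ids are made up). Because every filter includes the test’s own tenantId, a concurrent reset by another test cannot touch this test’s documents.

```python
# Hypothetical stand-in for a collection shared by all tests; every
# document carries the tenantId of the test that created it.
collection = [
    {"_id": i, "tenantId": tenant, "field": "X"}
    for i, tenant in enumerate(["tenant-a", "tenant-a", "tenant-b", "tenant-b"])
]

def update_many(tenant_id, new_value):
    """Update only documents matching this test's tenantId.

    Like modifiedCount, only documents whose value actually changes are counted.
    """
    modified = 0
    for doc in collection:
        if doc["tenantId"] == tenant_id and doc["field"] != new_value:
            doc["field"] = new_value
            modified += 1
    return modified

def reset(tenant_id):
    """Per-test setup, also scoped to the test's own tenant."""
    for doc in collection:
        if doc["tenantId"] == tenant_id:
            doc["field"] = "X"

def values(tenant_id):
    """Read back the field values for one tenant."""
    return [doc["field"] for doc in collection if doc["tenantId"] == tenant_id]

modified = update_many("tenant-a", "Y")  # test A updates its own documents
reset("tenant-b")                        # test B's setup runs concurrently
print(modified, values("tenant-a"), values("tenant-b"))
```

With this isolation in place, the interleaving that breaks a shared collection leaves tenant-a’s documents at Y.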

Our first suspicion was that UpdateManyAsync changes the fields in the background, and that the “acknowledgment” only signals that the operation will happen at some point in the future.
That seemed plausible: if the DB is under heavy load, the fields could be updated only after some delay, which could occasionally make the tests fail.

But as I explained above, we added code that queries the DB and verifies the field values right after awaiting UpdateManyAsync, so that theory is invalid as well.

What else could cause UpdateManyAsync to “revert” its changes from time to time?
We are truly out of ideas here…

Thanks again for your help!

MaBeuLux88_xxx · June 18, 2022, 7:14pm

This could happen if you are in a multi-document ACID transaction that gets aborted, but apparently that’s not the case here.
Or if the entire replica set performs a rollback because the primary failed: a secondary that was lagging 1 s behind is elected (the operations that were not replicated are now “lost”), then the old primary comes back online and has to roll back that 1 s of write operations.
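A conceptual model of that rollback sequence in Python (assumption: each node’s oplog is reduced to a list of operation ids; real replication and rollback are far more involved). The lagging secondary becomes the new primary, and when the old primary rejoins, everything after the last operation the two nodes have in common is rolled back.

```python
# Conceptual model only: a node "has" an operation once it is in its oplog.
# The old primary already acknowledged op4 and op5 to the client (e.g. w: 1).
primary_oplog = ["op1", "op2", "op3", "op4", "op5"]
secondary_oplog = primary_oplog[:3]  # lagging: op4 and op5 not replicated yet

# 1. The primary fails; the lagging secondary is elected as the new primary.
new_primary_oplog = list(secondary_oplog)

# 2. The old primary comes back and finds the last point it shares with the new primary.
common = 0
while (common < len(primary_oplog) and common < len(new_primary_oplog)
       and primary_oplog[common] == new_primary_oplog[common]):
    common += 1

# 3. Everything after the common point is rolled back on the old primary.
rolled_back = primary_oplog[common:]
primary_oplog = primary_oplog[:common]

print(rolled_back)  # ['op4', 'op5']
```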

But I guess it’s not that either. Other than that, nothing really comes to mind.

Could it be a sync issue with unresolved promises, where the check is performed before the promise resolves?

Cheers,
Maxime.

Ilia_Shkolyar · June 18, 2022, 7:43pm

No promises, just standard async/await with the C# Mongo driver…

MaBeuLux88_xxx · June 19, 2022, 7:09pm

I don’t know if it’s called something else in C#, but the principle is the same.
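The principle can be sketched with Python’s asyncio (assumption: `update_many_async` is a hypothetical stand-in with no real I/O, not a driver method). In C#, the equivalent mistake would be calling UpdateManyAsync without awaiting the returned Task. In `missing_await`, the check runs before the scheduled write ever executes and still sees X; in `proper_await`, the check runs only after the write completes and sees Y.

```python
import asyncio

state = {"field": "X"}  # hypothetical stand-in for the stored document

async def update_many_async(value):
    """Hypothetical async write; no real I/O is performed."""
    await asyncio.sleep(0)  # yield to the event loop, like a network round trip
    state["field"] = value

async def missing_await():
    state["field"] = "X"
    task = asyncio.create_task(update_many_async("Y"))  # scheduled, NOT awaited
    seen = state["field"]  # read happens before the task gets to run
    await task             # let the task finish so it is not left pending
    return seen

async def proper_await():
    state["field"] = "X"
    await update_many_async("Y")  # the check runs only after the write completes
    return state["field"]

print(asyncio.run(missing_await()), asyncio.run(proper_await()))
```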