Replace operation: is its write stage atomic with regard to its filter stage?

We’re using the replaceOne operation on the collection, with a filter that checks the document about to be replaced has a specific value in a certain field.

All is good, but since the documents in the collection are updated by multiple parties working in parallel, I’d like to know whether it is guaranteed that, between the time the filter has found a document to replace (that is, the filter has matched) and the time that very document actually gets replaced, it is impossible for the document to be replaced by a concurrently running operation. I failed to find any statement on this in the MongoDB docs.

We’re using MongoDB 3.6 (please don’t ask why) so using transactions is out of the question.

I have already asked this question on SO but it collected no constructive responses so far, so I’m asking here.

Hi @kostix welcome to the community.

replaceOne and similar commands like updateOne, findAndModify, etc. will only perform the operation if the document satisfies the filter criteria. The filter match and the write are performed as a single atomic operation.

However, replaceOne() replaces the first matching document in the collection. That is, if you have multiple documents satisfying the filter criteria, it will replace the first one it sees, so if the filter criteria doesn’t identify a single unique document, the result could be unpredictable.

If you need to update/replace a single document, it’s typically recommended to use findAndModify, since you’ll have more control if the filter criteria can potentially match multiple documents (see Comparisons with the update Method).
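To make that atomicity guarantee concrete, here is a minimal in-memory sketch in plain Python (a toy stand-in, not the real pymongo driver or server): the filter match and the write happen inside one indivisible step, modelled here with a lock, so no other writer can slip in between them. All class and helper names are illustrative.

```python
import threading

class TinyCollection:
    """A toy stand-in for a MongoDB collection: a list of dicts plus a lock.

    The lock models the server-side atomicity of replaceOne: matching the
    filter and writing the replacement happen as one indivisible step.
    """
    def __init__(self):
        self._docs = []
        self._lock = threading.Lock()

    def replace_one(self, filt, replacement):
        with self._lock:  # match + write: one atomic step
            for i, doc in enumerate(self._docs):
                if all(doc.get(k) == v for k, v in filt.items()):
                    self._docs[i] = dict(replacement)  # replace first match
                    return True
            return False  # filter matched nothing: no write happens

coll = TinyCollection()
coll._docs.append({"_id": "ABC", "status": "draft"})

# Replaces the document only if the filter still matches at write time.
assert coll.replace_one({"_id": "ABC", "status": "draft"},
                        {"_id": "ABC", "status": "final"})
# A second attempt with the stale filter finds nothing to replace.
assert not coll.replace_one({"_id": "ABC", "status": "draft"},
                            {"_id": "ABC", "status": "again"})
```

Note that, as with the real replaceOne, only the first matching document is touched, which is why a filter that uniquely identifies one document matters.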

Best regards,


Hi, Kevin!

Thanks for the response!

I would like to solicit a bit more expanded definition of atomicity there — if possible. :wink:
Maybe an example could help.

Each document in my collection has a unique identifier (the _id field is naturally used to store it) and an integer field which may be thought of as a version (or revision) of a datum identified by a particular _id.
The piece of software making use of that collection periodically receives new revisions of particular documents (from the outside) and has to update them in the collection.
No matter exactly which MongoDB operation we intend to use for that, the logic for the replacement has to be this: “find a document with such and such _id and with the version less than what we are about to use as a replacement”.
That is, if the collection has a document with a version greater than or equal to what we have, do nothing; otherwise, perform the replacement.
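In MongoDB query syntax that guard is just a filter combining an equality test on _id with a $lt test on version. Here is a sketch of evaluating such a filter against a document, using a hand-rolled matcher for just those two operators (a toy model, not the server’s real query engine):

```python
def matches(doc, filt):
    """Evaluate a tiny subset of MongoDB filter syntax: equality and $lt."""
    for field, cond in filt.items():
        if isinstance(cond, dict) and "$lt" in cond:
            if not doc.get(field) < cond["$lt"]:
                return False
        elif doc.get(field) != cond:
            return False
    return True

# "Find the document with this _id whose version is lower than ours (2)."
guard = {"_id": "ABC", "version": {"$lt": 2}}

assert matches({"_id": "ABC", "version": 1}, guard)      # older: replace it
assert not matches({"_id": "ABC", "version": 2}, guard)  # same: do nothing
assert not matches({"_id": "ABC", "version": 3}, guard)  # newer: do nothing
```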

So far so good, but now let’s introduce more “updaters”: now more than a single client may receive an updated document and will attempt to replace its existing version in the collection.
What I’m afraid of, in this setting, is the following situation:

  • The collection has a document with _id=ABC and version=1
  • One of the updaters attempts to update this document with the data having version=2.
  • Another one attempts to update this document with the data having version=3.
  • Now the replaceOne operation issued by the first updater finds the document with the version field lower than what the updater is about to update it with, and so the replacement may proceed.
  • At the same time, the concurrently running second updater finds the same document, and the query also gives it the go-ahead so the replacement may proceed.

What I’m asking is whether it’s possible, in the described case, for the replacement performed by the second updater (wanting to replace with version=3) to happen in the window between the moment the first updater’s query (wanting to replace with version=2) allows it to proceed and the moment it actually stores its document.

If that were possible, the document could end up at a lower-than-intended version despite the check performed by the operation’s query.

That is what bothers me: the atomicity of the query and the update — as a sequence of operations.

(Sorry for the wall of text but I have tried hard to explain this problem.)

Hi @kostix

That is a very detailed scenario. I would note that even though MongoDB pre-4.0 doesn’t have multi-document transactions, MongoDB 3.2 and later uses WiredTiger as the default storage engine, which is a modern key-value store that notably supports transactions. Internally, MongoDB with WiredTiger has been using transactions to perform all data manipulation work since MongoDB 3.0.

In fact, the scenario you described could be a bit more complicated if both the _id and the version fields are indexed (which they should be, by the way :slight_smile: ). Without leveraging WiredTiger’s transaction capabilities, there could be a moment where the document was updated but the index was not, leading to inconsistencies in the database. To ensure that the database is consistent at all times, MongoDB uses WiredTiger’s transactions extensively. The end result is that even with multiple threads/clients, there is no moment at which the database would be inconsistent from the point of view of any client. Hopefully this answers your question regarding atomicity and consistency.

Having said that, in practice it is very difficult for two or more clients to try to update a single document at precisely the same time, unless the schema design forces the clients to bottleneck on a single document. This would also lower your throughput severely since you would have less concurrency. Are you expecting multiple clients to hit one document at exactly the same time once your app scales?

Best regards,


This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.