MongoDB states that updating a single document is atomic, and there is no question about that. However, it is rarely the case that updating the document alone is enough, even when working with only a single document. Here is what I mean:
Sample document structure:
A User 1 does the following:
1. Retrieves a document from the database
2. Sees that the retrieved document's applyTax field is set to True, which causes the program to add X amount of money to the retrieved total field
3. Saves the updated document back to the database
Simultaneously, or through a different endpoint, a User 2 does the following:
1. Retrieves the same document from the database
2. Updates the document's applyTax field itself by setting it to False, meaning that from this update onward no additional money should be added
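The interleaving in the two step lists above can be sketched as a plain-Python simulation (the "document" here is just a dict standing in for a MongoDB document; the field names and tax amount are from the example, everything else is a stand-in):

```python
# Hypothetical sketch: simulating the read-then-write race with an
# in-memory "document" instead of a real MongoDB collection.
import copy

db = {"_id": 1, "applyTax": True, "total": 100}
TAX = 20  # the "X amount of money" from the example

# User 1, step 1: read the document
user1_doc = copy.deepcopy(db)

# User 2 (milliseconds later): read, flip applyTax to False, save
user2_doc = copy.deepcopy(db)
user2_doc["applyTax"] = False
db.update(user2_doc)

# User 1, steps 2-3: still holds the stale applyTax=True, so it adds tax
if user1_doc["applyTax"]:
    user1_doc["total"] += TAX
db.update(user1_doc)  # last write wins, clobbering User 2's change

# Result: tax was applied AND applyTax is back to True - the data is skewed
print(db)  # {'_id': 1, 'applyTax': True, 'total': 120}
```

User 2's write is silently lost because User 1 saved a full stale copy on top of it; this is the classic lost-update problem.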
Suppose User 2's update happens milliseconds after User 1's first step. Would that skew the data? Is that a valid operational concern regarding atomicity? Or is such a scenario highly unlikely?
If this is indeed potentially problematic, how can it be avoided? Thank you.
Yes, but it's a concern for you. MongoDB won't care.
The problem is that read-then-update is not an atomic operation.
MongoDB's statement on single-document atomicity is that any update on a single document matching a specific filter is atomic. This is different from a read-then-write in the application layer. (Though a lock-equivalent mechanism is still there.)
You can use a transaction, or make sure this cannot happen from application code (e.g., use a lock/lease across your threads).
Thank you for your reply. Regarding your transaction suggestion, I was thinking of that too. The issue is that it would introduce quite a few transactions into the code. Because of that, I have two concerns:
I have watched quite a few official MongoDB YouTube videos on transactions, and most of them give off the vibe that if you have to reach for transactions in MongoDB, chances are you are doing something wrong. But in this use case, a transaction seems to be a valid choice, right?
Suppose there is a lot of logic happening between the READ and UPDATE operations. Is it a bad idea to put a lot of code inside the transaction’s callback?
If the User 2 hits the resource while it is still locked, will that return a TRANSIENT error?
I think @Kobe_W has answered most of the question here, but I’d like to add my 2 cents as well.
Yes, this is why transactions were added to MongoDB. There are some workflows that necessitate modifying multiple documents atomically, and perhaps there's no way around that fact. In those cases, using a transaction is definitely the right way forward.
It depends on how much code there is and what it's doing, I think.
There are different possible errors that can happen in a transaction. A transient transaction error generally means that it's safe to retry, but the driver does not retry it automatically. See transaction error handling for more details.
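The "safe to retry, but not retried automatically" pattern usually looks like a retry loop keyed on the TransientTransactionError label. A sketch below, with a stand-in exception class and a deliberately flaky callback (neither is a real driver type; in a real driver you would check the error label on the driver's own exception):

```python
# Hypothetical sketch: retry a whole transaction callback when the error
# carries the "TransientTransactionError" label.
class OperationFailure(Exception):
    """Stand-in for a driver error that carries error labels."""
    def __init__(self, msg, labels=()):
        super().__init__(msg)
        self._labels = set(labels)

    def has_error_label(self, label):
        return label in self._labels

def run_with_retry(txn_fn, max_attempts=3):
    for attempt in range(1, max_attempts + 1):
        try:
            return txn_fn()  # re-run the whole transaction from scratch
        except OperationFailure as exc:
            transient = exc.has_error_label("TransientTransactionError")
            if transient and attempt < max_attempts:
                continue
            raise  # non-transient, or out of attempts

# Simulate a transaction that hits a write conflict once, then succeeds.
attempts = {"n": 0}
def flaky_txn():
    attempts["n"] += 1
    if attempts["n"] == 1:
        raise OperationFailure("write conflict",
                               labels=["TransientTransactionError"])
    return "committed"

print(run_with_retry(flaky_txn))  # committed
```

Note that some driver helpers (the "convenient" callback-style transaction API) implement this loop for you; the sketch shows what that loop does under the hood.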
MongoDB gives you a lot of freedom to design your schema. But in many cases, SQL design methodology is the default mindset, since we have lived with SQL for so long. SQL practically depends on the existence of transactions, since an entity's data is usually spread across many different tables; thus, to modify that entity atomically, you need transactions.
In contrast, MongoDB allows you to store an entity’s data as-is inside a single document. For myself, it’s helpful to think 1 entity == 1 document.
However, different workflows have different requirements. Sometimes you need to modify multiple entities in a single command. This is where MongoDB's transactions can help.
Otherwise, there are certain design patterns that may be able to help you minimize transaction use and maximize concurrency, with various levels of tradeoff.
To complement @kevinadi's answer to your specific questions:
It's always best to avoid keeping a transaction open for too long. The reason is that a transaction consumes resources (especially in a sharded cluster) and can hold locks against write operations (I recall write locks are only released upon transaction completion). A long-lived transaction can give you trouble.
You can search for "transaction write conflict". When a transaction B tries to modify the same document that has already been modified by an in-progress transaction A (and is thus locked), it will raise this error and then abort. (I remember there's a post asking why transaction B can't be put into a "blocked" state instead.)
Depending on your detailed logic, a read-then-write sometimes doesn't need a transaction.
Let's say you read a = 1, then do something in an if (a == 1) ... section, then update b to 2. In this simple case, you can just use a = 1 as the filter for the write. This way, you only update b to 2 if a = 1 still holds (otherwise, some other request modified a's value while your logic ran). So no transaction is needed.
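This conditional-write idea can be sketched as a compare-and-set over an in-memory document (the update_if helper is hypothetical; with a real driver the equivalent is a single server-side-atomic update_one({"_id": ..., "a": 1}, {"$set": {"b": 2}}) call):

```python
# Hypothetical sketch: re-assert the value you read as part of the
# write's filter, so a stale write is rejected instead of applied.
doc = {"a": 1, "b": 0}

def update_if(doc, filter_fields, set_fields):
    # Mimics update_one: apply set_fields only if filter_fields still match.
    if all(doc.get(k) == v for k, v in filter_fields.items()):
        doc.update(set_fields)
        return 1  # modified count
    return 0

# Happy path: a is still 1, so b is updated.
assert update_if(doc, {"a": 1}, {"b": 2}) == 1

# A concurrent writer changes a before our next write lands...
doc["a"] = 5
# ...so our stale write is rejected instead of clobbering the document.
assert update_if(doc, {"a": 1}, {"b": 99}) == 0

print(doc)  # {'a': 5, 'b': 2}
```

A modified count of 0 tells the application its read went stale, and it can re-read and retry; this is the core of optimistic concurrency control.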
Thank you for your detailed replies. I do understand the meaning of the transaction error, and that an attempted update on a resource that has not released its write lock will result in an error. What I'd like to learn specifically is whether this particular case will trigger the TransientTransactionError. I have watched all the MongoDB transaction videos on YouTube and had previously read the MongoDB documentation articles you linked; they explain what a TransientTransactionError is and that it is up to the developer to identify and handle it, but I could not find a single mention of whether it is thrown when attempting to modify a document that still holds a write lock.
Sorry, I should have provided more context. It is a busy endpoint with a bunch of other round trips to MongoDB Atlas. An update operation normally takes around 200-300 ms to complete without transactions. Also, it just feels very untidy having to push all of that inside a transaction callback, with potentially inadvertent side effects (taking locks for queries that don't need them on their own). In general, all the articles and videos about transactions in MongoDB come with so many disclaimers that I always get the feeling that unless the project is a national bank, transactions should be avoided and a better solution is right around the corner; one just has to identify it for their project. Transactions also seem like a very impactful decision, because they require one to know exactly what documents are locked and when, in order to prevent deadlocks and/or reduced performance.
Thank you very much, @Kobe_W! I have learned very recently about Optimistic Concurrency Control; is that what you are suggesting?