Understanding locking/transactions in mongo

Hello, community,

I am writing a REST API in Golang and I am using MongoDB as the database. Some things about locking data/transactions are unclear to me. In my case, I am storing data in a two-dimensional array. When a user makes a request I am retrieving the corresponding document from the database. In Go I run some complex logic to check if the request is valid, change the two-dimensional array and update the document with the new array.

From my understanding, this could go wrong because if this user would make two rapid requests after each other or another user makes a request after the retrieval of the document the data will be old and invalid in which case the document will not have the new data and data may get corrupted.

What I would like to do is lock the document before I retrieve the document and only release the lock after I have run my logic and updated the document. I have not found out how to do this. I feel that the things I find about transactions are more about multi-document/collection updates and it doesn’t work with a standard database because you will need a cluster/replica-set?. Other answers advise using compounds like FindOneAndUpdate however this would not work in my case because I would need to run logic between “find” and “update”.

Kind regards

1 Like

I forgot to add that I also saw someone suggest adding a “islocked” property to the document and update this when you start and finish with the document. However, this does not seem right to me.

Hi @Egbert-Jan_Terpstra welcome to the community!

MongoDB using WiredTiger has document-level locking since at least MongoDB 3.0, so there will be no instant where the document is partly updated (unless your app runs multiple update commands that deliberately only update parts of a document).

From my understanding, this could go wrong because if this user would make two rapid requests after each other or another user makes a request after the retrieval of the document the data will be old and invalid in which case the document will not have the new data and data may get corrupted.

If I understand correctly, your workflow involves: 1) retrieving a document, 2) calculate new values for that document based on the document’s current state, then 3) push an updated document, and you’re worried that in-between steps 1 and 3, there are other threads that are trying to update the exact same document. Is this accurate?

If yes, using a transaction is a perfectly valid solution. Having a field called locked to act as a semaphore is also valid, although you’ll need to take care of cases where an update crashes between steps 1 & 3, rendering the document locked forever until it’s manually unlocked. From the application side, it is also possible to implement the semaphore construct outside of the database, to ensure that no two threads are working on the same document at one time.

In my opinion, the simplest & safest solution is to use transaction. Yes you’ll have to deploy a replica set (three data-bearing nodes as the minimum recommended for production), but the benefits are many. For example, you’ll gain high availability, redundancy and thus safer data storage, easier database maintenance, while also gaining transactions, change streams, and other replica-set specific features. In fact, I would not recommend you to deploy a standalone node for a production environment, ever. Using transactions, you would not need to do locked documents cleanup, nor have to implement special semaphore in your app.

Best regards
Kevin

3 Likes

Yes, you understood my workflow correct. Thanks for your advice I will use it! I think the locked field problem could be solved by having a timestamp which can max be x seconds valid. I do not suspect many users to edit the same document at the same time however will look at using replicasets and transactions.

Kind regards

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.