How to choose read/write concerns in your application?

I’m posting in order to get some feedback on a general approach to selecting read/write concerns, in particular when isolated script executions don’t know they’re part of a wider “session”.

I’ve been working with MongoDB for years, but have always had a hard time with read/write concerns: in particular, how to set the appropriate concerns when the different parts of my application don’t necessarily know the full context of what has recently happened. The concept of a consistent session is easy enough to model in a single script execution, but introduce (for example) stateless API calls in quick succession and the concept of a “session” starts to span operations that are actually related but have no knowledge of each other.

Perhaps the safe thing to do would be to read and write majority for all operations, but some of my operations insert many thousands of documents and I want them done quickly. What I’ve ended up with is my application switching to unacknowledged writes when speed is required, then switching reads to local in case the next operation in the same script needs to read it back. It doesn’t really matter if unacknowledged writes get rolled back, but what does matter is that the next script execution has no context of the previous and so can’t choose the best read concern. To get around that I end up reading from the primary with local in most cases, which means no benefit from secondaries!

I want to sort this mess out in a simple fashion, while ensuring that:

  1. Concurrent users working on the same data see the same data.
  2. Stateless HTTP requests in rapid succession get apparent consistency.

One thing my application does know is how recently its state was modified. This is due to tracking all updates with a timestamp and incrementing number (in separate storage). So based on this I’m considering that my application could have two operating modes (Safe, and Fast) and switch between them based on recent activity.

  • Safe: Write majority, Read majority from Secondary.
    Used when there have been no recent updates and fast writes aren’t necessary.

  • Fast: Write {w: 0}, Read local from Primary.
    Used when write speed is required, or when there have been recent updates (which may still be running).
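
The two modes above could be sketched as a small selection function. `RECENT_WINDOW_SECS`, the `bulk_write` flag, and the profile fields are my names, not anything MongoDB provides:

```python
import time
from dataclasses import dataclass

# Hypothetical threshold: tune to the replication lag you actually observe.
RECENT_WINDOW_SECS = 30

@dataclass(frozen=True)
class ConcernProfile:
    write_w: object      # "majority" or 0
    read_concern: str    # "majority" or "local"
    read_from: str       # "secondary" or "primary"

SAFE = ConcernProfile(write_w="majority", read_concern="majority", read_from="secondary")
FAST = ConcernProfile(write_w=0, read_concern="local", read_from="primary")

def choose_profile(last_update_ts: float, bulk_write: bool, now: float = None) -> ConcernProfile:
    """Pick Fast when a bulk write is requested or an update happened recently."""
    now = time.time() if now is None else now
    if bulk_write or (now - last_update_ts) < RECENT_WINDOW_SECS:
        return FAST
    return SAFE
```

For example, a request arriving 5 seconds after the last tracked update would get `FAST` (the write may not have reached secondaries yet), while one arriving minutes later would get `SAFE`.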

What do you think? How have other people solved this problem? Feedback much appreciated.

Hi @timw and welcome to the community!!

If I understand correctly, you have an application that basically needs to do two things that appear to be in contradiction: one mode of working is fast writes, read the most recent data, don’t care if it gets rolled back; the other mode is consistency. Typically an application is one or the other (but you want both), so I’m not sure I fully understand the requirements. Could you perhaps post some scenario that the application handles that would help illustrate the day-to-day operation of the application?

Also, to be more specific, could you please elaborate on a couple of points:

It doesn’t really matter if unacknowledged writes get rolled back, but what does matter is that the next script execution has no context of the previous and so can’t choose the best read concern.

This feels contradictory to me: since the writes are unacknowledged, there is no guarantee that the write even happened. Also what if the “next script execution” reads data that are rolled back? Is there a scenario that you can post so we can understand this better?

This is due to tracking all updates with a timestamp and incrementing number (in separate storage).

This follows from the previous point: if the writes are unacknowledged and can be rolled back, why do you need to track these writes (which can disappear) in separate storage?

To cater for different application needs, MongoDB provides various combinations of read and write concern settings depending on the level of consistency and availability required. However, this requirement is typically consistent application-wide (i.e. consistency, or availability), so I might be misunderstanding your use case, which appears to call for both requirements in a single application.

Best Regards

Thanks for the reply.

The idea that an application has one set of read/write concerns for all operations makes perfect sense. I guess the short answer to my very broad question is “if you care about state at all then go for consistency across the board”. The presence of any unacknowledged writes anywhere in an application potentially breaks this model.

I feel I’m trying to justify my requirements now, but I’ll try to clarify some points you highlighted…

Re this contradiction: Some functions of my application can write tens of thousands of documents while many others can write a maximum of one. Choosing speed in all cases would increase risk in my application for the many operations that don’t need it. Choosing consistency in all cases would make the largest operations much slower. If I could choose only one approach I would choose consistency, but selecting the right trade-off for the right situation seemed desirable (when I built the system 10 years ago!). It didn’t seem strange to me that I’d want to benefit from MongoDB’s write speed in select contexts, but maintain a stateful application in general.

When I say that failures “don’t matter”, I mean that they are very rare and any inconsistencies created by rollbacks are corrected soon afterwards. Hence accepting some risk in exchange for a lot of speed seemed reasonable, but only when necessary. I didn’t mean to say that I care about consistency sometimes but not at other times; all operations are equal in this regard, but very occasional failure is tolerable.

I track updates mainly for the purpose of cache invalidation. Clients requesting unmodified data will get 304 responses (and this saves me a lot of juice). I figured the same mechanism could be used to put the application into a secondary-read mode. The worst case here (in the event of failed writes) would be the client getting a 200 response with discarded data from the primary. This is actually the default mode of the current system, so it seemed a safe incremental improvement.
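
The version-counter-as-ETag idea can be sketched like this (the names are mine; the storage and web-framework details are elided):

```python
def handle_read(client_etag, current_version, fetch):
    """Return (status, headers, body). `current_version` is the incrementing
    update counter tracked in separate storage; `fetch` is whatever
    actually loads the data from MongoDB."""
    etag = '"%d"' % current_version
    if client_etag == etag:
        return 304, {"ETag": etag}, None  # client's cached copy is still valid
    return 200, {"ETag": etag}, fetch()   # data changed: re-read and re-send
```

A client that sends `If-None-Match: "7"` while the counter is still 7 gets a 304 and no database read at all; once the counter moves on, the next request falls through to `fetch()`.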

I don’t mean to answer my own question; rather, I’m curious whether others have been through the same thought process of choosing between consistency and speed at different times, and how they managed the potential consistency problems this creates.
