Introduction to Multi-Document ACID Transactions in Python
Multi-document transactions arrived in MongoDB 4.0 in June 2018. MongoDB has always been transactional around updates to a single document. Now, with multi-document ACID transactions, we can wrap a set of database operations inside a start and commit transaction call. This ensures that even with inserts and/or updates happening across multiple collections and/or databases, the external view of the data meets ACID constraints.
To demonstrate transactions in the wild we use a trivial example app that emulates a flight booking for an online airline application. In this simplified booking we need to undertake three operations:
- Allocate a seat in the seats_collection
- Pay for the seat in the payments_collection
- Update the count of allocated seats and sales in the audit_collection
For this application we will use three separate collections for these documents as detailed above. The code in transactions_main.py updates these collections in serial unless the --usetxns argument is used. When it is used, we wrap the complete set of operations inside an ACID transaction. The code in transactions_main.py is built directly using the MongoDB Python driver (PyMongo).
The goal of this code is to demonstrate to Python developers just how easy it is to convert existing code to transactions if required, or to port older SQL-based systems.
.gitignore: Standard GitHub .gitignore for Python.
LICENSE: Apache 2.0 (standard GitHub) license.
Makefile: Makefile with targets for default operations.
transaction_main.py: Run a set of writes with and without transactions. Run python transactions_main.py -h for help.
transactions_retry.py: The file containing the transactions retry functions.
watch_transactions.py: Use a MongoDB change stream to watch collections as they change when transactions_main.py is running.
kill_primary.py: Starts a MongoDB replica set (on port 7100) and kills the primary on a regular basis. This is used to emulate an election happening in the middle of a transaction.
featurecompatibility.py: Check and/or set feature compatibility for the database (it needs to be set to "4.0" for transactions).
You can clone this repo and work alongside us during this blog post (please file any problems on the Issues tab in GitHub).
The Makefile outlines the operations that are required to set up the test environment.
All the programs in this example use a port range starting at 27100 to ensure that this example does not clash with an existing MongoDB installation.
To set up the environment you can run through the following steps manually. People who have make can speed up installation by using the Makefile.
The mlaunch program gives us a simple command to start a MongoDB replica set, as transactions are only supported on a replica set.
Start a replica set whose name is txntest. See the init_server make target for details.
There is a Makefile with targets for all these operations. For those of you on platforms without access to Make, it should be easy enough to cut and paste the commands out of the targets and run them on the command line.
You will need to have MongoDB 4.0 on your path. There are other convenience targets for starting the demo programs:
make notxns: start the transactions client without using transactions.
make usetxns: start the transactions client with transactions enabled.
make watch_seats: watch the seats collection changing.
make watch_payments: watch the payment collection changing.
The transactions example consists of two Python programs.
You can choose to use --delay or --randdelay. If you use both, --delay takes precedence. The --randdelay parameter creates a random delay between a lower and an upper bound that will be added between each insertion event.
The transactions_main.py program knows to use the txntest replica set and the right default port range.
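The precedence rule between the two delay options can be sketched with argparse. This is a minimal illustration, not the code from transactions_main.py: the helper names build_parser and choose_delay, the assumption that --randdelay takes two numbers, and the 1 second fallback are all our own choices for the sketch.

```python
import argparse
import random


def build_parser():
    """Build a parser with the two delay flags (shapes are assumptions)."""
    parser = argparse.ArgumentParser(description="Delay options (sketch)")
    parser.add_argument("--delay", type=float, default=None,
                        help="fixed delay in seconds between insertions")
    parser.add_argument("--randdelay", type=float, nargs=2, default=None,
                        metavar=("LOWER", "UPPER"),
                        help="random delay between LOWER and UPPER seconds")
    return parser


def choose_delay(args, default=1.0):
    """Return the delay to use; --delay wins if both flags are given."""
    if args.delay is not None:
        return args.delay
    if args.randdelay is not None:
        lower, upper = args.randdelay
        return random.uniform(lower, upper)
    return default
```

Calling choose_delay(build_parser().parse_args()) once per booking would give a fresh random delay each time when only --randdelay is supplied.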
To run the program without transactions you can run it with no arguments.
The program runs a function called book_seat() which books a seat on a plane by adding documents to three collections. First it adds the seat allocation to the seats_collection, then it adds a payment to the payments_collection, and finally it updates an audit count in the audit_collection. (This is a much simplified booking process used purely for illustration.)
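To make the three steps concrete, here is a minimal sketch of a non-transactional booking function. It is not the book_seat() from the repo: the signature, the document fields (flight_no, seat, date, price) and the audit filter are assumptions made for illustration. The three collection arguments are expected to be PyMongo Collection objects.

```python
import time
from datetime import datetime, timezone


def book_seat(seats, payments, audit, flight_no, seat_no, price, delay=1.0):
    """Book one seat using three independent writes (no transaction).

    Field names and the audit document shape are illustrative assumptions.
    """
    now = datetime.now(timezone.utc)

    # 1. Allocate the seat.
    seats.insert_one({"flight_no": flight_no, "seat": seat_no, "date": now})

    # Emulate the gap between seat allocation and payment processing.
    time.sleep(delay)

    # 2. Record the payment for that seat.
    payments.insert_one({"flight_no": flight_no, "seat": seat_no,
                         "date": now, "price": price})

    # 3. Bump the audit counter, creating the audit document if needed.
    audit.update_one({"audit": "seats"}, {"$inc": {"count": 1}}, upsert=True)
```

Because each write is committed on its own, stopping the program during the delay leaves a seat allocated with no matching payment.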
The default is to run the program without using transactions. To use transactions we have to add the command line flag --usetxns. Run this to test that you are running MongoDB 4.0 and that the correct featureCompatibility is configured (it must be set to 4.0). If you install MongoDB 4.0 over an existing /data directory containing 3.6 databases then featureCompatibility will be set to 3.6 by default and transactions will not be available.
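Checking and raising the feature compatibility version can be done from Python with two admin commands. The sketch below is our own illustration, not the contents of featurecompatibility.py; the helper names get_fcv and ensure_fcv are hypothetical, and client is expected to be a PyMongo MongoClient.

```python
def get_fcv(client):
    """Return the server's featureCompatibilityVersion as a string."""
    reply = client.admin.command("getParameter", 1,
                                 featureCompatibilityVersion=1)
    fcv = reply["featureCompatibilityVersion"]
    # Recent servers return a sub-document with a "version" field;
    # older servers may return a plain string, so handle both.
    return fcv["version"] if isinstance(fcv, dict) else fcv


def ensure_fcv(client, wanted="4.0"):
    """Set the feature compatibility version if it differs from `wanted`.

    Returns the version that was in effect before the call.
    """
    current = get_fcv(client)
    if current != wanted:
        client.admin.command("setFeatureCompatibilityVersion", wanted)
    return current
```

With the replica set from this post, the client would be created against the txntest replica set on the 27100 port range before calling ensure_fcv(client).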
Note: If you get an error when running python transaction_main.py --usetxns, you may be picking up an older version of pymongo (older than 3.7.x), which has no multi-document transaction support.
To actually see the effect of transactions we need to watch what is happening inside the collections, which is the job of watch_transactions.py.
We need to watch each collection, so start the watcher in two separate terminal windows.
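A watcher of this kind can be built on a change stream in a few lines. The following is a minimal sketch rather than the code in watch_transactions.py: the function name watch_inserts and the handler parameter are our own, and collection is expected to be a PyMongo Collection on a replica set.

```python
def watch_inserts(collection, handler=print):
    """Pass each newly inserted document to `handler` as it becomes visible.

    Blocks while the change stream is open, so run it in its own terminal.
    """
    # Only report insert events; updates and deletes are filtered out.
    pipeline = [{"$match": {"operationType": "insert"}}]
    with collection.watch(pipeline) as stream:
        for change in stream:
            handler(change["fullDocument"])
```

Running one copy against the seats collection and another against the payments collection gives the two side-by-side views used in the rest of this post.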
Let's run the code without transactions first. If you examine the transaction_main.py code you will see a function called book_seat().
This program emulates a very simplified airline booking with a seat being allocated and then paid for. These are often separated by a reasonable time frame (e.g. seat allocation vs external credit card validation and anti-fraud check) and we emulate this by inserting a delay. The default is 1 second.
Now with the two watch_transactions.py scripts running for seats_collection and payments_collection we can run transactions_main.py.
The first run is with no transactions enabled.
The bottom window shows transactions_main.py running. On the top left we are watching the inserts to the seats collection. On the top right we are watching inserts to the payments collection.
We can see that the payments window lags the seats window as the watchers only update when the insert is complete. Thus seats sold cannot be easily reconciled with corresponding payments. If after the third seat has been booked we CTRL-C the program we can see that the program exits before writing the payment. This is reflected in the Change Stream for the payments collection which only shows payments for seat 1A and 2A versus seat allocations for 1A, 2A and 3A.
If we want payments and seats to be instantly reconcilable and consistent we must execute the inserts inside a transaction.
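Wrapping the writes in a transaction mostly means opening a session and passing it to each operation. The sketch below shows the general shape with PyMongo; it is not the repo's code, and the function name, signature and document fields are assumptions for illustration. client is expected to be a MongoClient and the other arguments Collection objects from that client.

```python
from datetime import datetime, timezone


def book_seat_txn(client, seats, payments, audit, flight_no, seat_no, price):
    """Book one seat with all three writes inside a single transaction.

    Field names and the audit document shape are illustrative assumptions.
    In MongoDB 4.0 the collections must already exist, because a
    transaction cannot create them.
    """
    now = datetime.now(timezone.utc)
    with client.start_session() as session:
        # The transaction commits when this block exits normally and
        # aborts if an exception escapes it.
        with session.start_transaction():
            seats.insert_one(
                {"flight_no": flight_no, "seat": seat_no, "date": now},
                session=session)
            payments.insert_one(
                {"flight_no": flight_no, "seat": seat_no,
                 "date": now, "price": price},
                session=session)
            audit.update_one(
                {"audit": "seats"}, {"$inc": {"count": 1}},
                upsert=True, session=session)
```

The only differences from a non-transactional version are the two with blocks and the session argument on every write, which is why converting existing code tends to be mechanical.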
Now let's run the same system with --usetxns enabled.
We run with the exact same setup but now set --usetxns.
Note now how the change streams are interlocked and are updated in parallel. This is because all the updates only become visible when the transaction is committed. Note how we aborted the third transaction by hitting CTRL-C. Now neither the seat nor the payment appear in the change streams unlike the first example where the seat went through.
This is where transactions shine: in a world where all or nothing is the watchword. We never want to keep seats allocated unless they are paid for.
In a MongoDB replica set all writes are directed to the primary node. If the primary node fails or becomes inaccessible (e.g. due to a network partition), writes in flight may fail. In a non-transactional scenario the driver will recover from a single failure and retry the write. In a multi-document transaction we must recover and retry in the event of these kinds of transient failures. This code is encapsulated in transaction_retry.py. We both retry the transaction and retry the commit to handle scenarios where the primary fails within the transaction and/or the commit operation.
In order to observe what happens during elections we can use the script kill_primary.py. This script will start a replica set and continuously kill the primary.
While kill_primary.py is running we can start up transactions_main.py again using the --usetxns argument.
As you can see, during elections the transaction will be aborted and must be retried. If you look at the transaction_retry.py code you will see how this happens. If a write operation encounters an error it will throw an exception. Transient failures are marked by an error label on the exception, which can be detected using the has_error_label(label) function, available in pymongo 3.7.x. Transient errors can be recovered from, and transactions_retry.py has code that retries for both writes and commits (see above).
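The retry pattern can be sketched as two small loops, one around the commit and one around the whole transaction. This is our own illustration and not the contents of the retry module: the function names and the max_attempts bound are assumptions, and the label strings "TransientTransactionError" and "UnknownTransactionCommitResult" are the ones documented for PyMongo. The sketch catches Exception and inspects the label so that it does not depend on a running server; with PyMongo you would likely narrow this to pymongo.errors.ConnectionFailure and pymongo.errors.OperationFailure.

```python
TRANSIENT = "TransientTransactionError"
UNKNOWN_COMMIT = "UnknownTransactionCommitResult"


def has_label(exc, label):
    """True if the exception carries the given error label."""
    check = getattr(exc, "has_error_label", None)
    return callable(check) and bool(check(label))


def commit_with_retry(session, max_attempts=5):
    """Retry the commit while its outcome is reported as unknown."""
    for attempt in range(1, max_attempts + 1):
        try:
            session.commit_transaction()
            return
        except Exception as exc:
            if attempt == max_attempts or not has_label(exc, UNKNOWN_COMMIT):
                raise


def run_transaction_with_retry(txn_func, session, max_attempts=5):
    """Run txn_func(session) in a transaction, retrying transient errors.

    txn_func performs the writes and must pass `session` to each of them.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            with session.start_transaction():
                txn_func(session)
                commit_with_retry(session)
            return
        except Exception as exc:
            if attempt == max_attempts or not has_label(exc, TRANSIENT):
                raise
```

Bounding the number of attempts is a design choice for the sketch; it keeps a permanently unavailable replica set from turning into an endless loop, at the cost of surfacing the last error to the caller.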
Multi-document transactions are the final piece of the jigsaw for SQL developers who have been shying away from trying MongoDB. ACID transactions make the programmer's job easier and give teams that are migrating from an existing SQL schema a much more consistent and convenient transition path.
As most migrations involve a move from highly normalised data structures to more natural and flexible nested JSON documents, one would expect the number of required multi-document transactions to be lower in a properly constructed MongoDB application. But where multi-document transactions are required, programmers can now include them using very similar syntax to SQL.
With ACID transactions in MongoDB 4.0 it can now be the first choice for an even broader range of application use cases.