GIANT Stories at MongoDB

The Top 5 Reasons to attend a MongoDB.local conference

I recently attended (and spoke at!) my first MongoDB.local (or .local for short) in Dallas, Texas. If you've been wondering if it's worth taking a day out of your busy schedule to attend one, the answer is, "Yes!" Here are my top 5 reasons why.

Five Minute MongoDB - Change Streams and MongoDB 4.x

Change Streams are a powerful tool in MongoDB for monitoring changes in a collection's documents. They became even more powerful in MongoDB 4.0, enabling you to act on changes to any document in any collection in any database in your MongoDB deployment. Read this Five Minute MongoDB to find out how.

MongoDB in 2018

Welcome to that time of the year when we look back at what we've delivered to our users and community in the past year. Here's MongoDB's 2018, month by month - it's been a great year so let's start with...

January

The year started with the announcement of a new Go driver in development. Go is a popular language, widely used both outside and inside MongoDB Inc., so we decided to start developing an official driver.

Exciting New Security Features in MongoDB 4.0

In this blog post, we would like to highlight two of the most important security enhancements delivered in MongoDB 4.0: SHA-256 support and TLS improvements.

The General Availability of 4.0 was announced on June 27, 2018 at MongoDB World 2018. Its multi-document ACID transactions, new native type conversions, Kubernetes integration, as well as new tooling and cloud services, make it our most significant release ever!

MongoDB 4.0: Non-Blocking Secondary Reads

Mat Keep

Technical, MongoDB 4.0

Many MongoDB users scale read performance by distributing their query load across secondary replicas. With the MongoDB 4.0 release, reads are no longer blocked when oplog entries are applied. Here's how.

Java and MongoDB 4.0 Support for Multi-Document ACID Transactions

Introduction

MongoDB 4.0 adds support for multi-document ACID transactions.

But wait... Does that mean MongoDB did not support transactions until now? No, actually MongoDB has always supported transactions in the form of single document transactions. MongoDB 4.0 extends these transactional guarantees across multiple documents, multiple statements, multiple collections, and multiple databases. What good would a database be without any form of transactional data integrity guarantee?

Before we dive into this blog post, you can find all the code and try multi-document ACID transactions here.

Quick start

Step 1: Start MongoDB

Start a single-node MongoDB replica set, running version 4.0.0 or newer, on localhost, port 27017.

If you use Docker:

  • You can use start-mongo.sh.
  • When you are done, you can use stop-mongo.sh.
  • If you want to connect to MongoDB with the Mongo Shell, you can use connect-mongo.sh.

If you prefer to start mongod manually:

  • mkdir /tmp/data && mongod --dbpath /tmp/data --replSet rs
  • mongo --eval 'rs.initiate()'

Step 2: Start Java

This demo contains two main programs: ChangeStreams.java and Transactions.java.

  • Change Streams allow you to be notified of any data changes within a MongoDB collection or database.
  • The Transaction process is the demo itself.

You need two shells to run them.

If you use Docker:

First shell:

./compile-docker.sh
./change-streams-docker.sh

Second shell:

./transactions-docker.sh

If you do not use Docker, you will need to install Maven 3.5.X and JDK 10 (or at least JDK 8, but then you will need to update the Java versions in the pom.xml):

First shell:

./compile.sh
./change-streams.sh

Second shell:

./transactions.sh

Let’s compare our existing single document transactions with MongoDB 4.0’s ACID compliant multi-document transactions and see how we can leverage this new feature with Java.

Prior to MongoDB 4.0

Even in MongoDB 3.6 and earlier, every write operation is represented as a transaction scoped to the level of an individual document in the storage layer. Because the document model brings together related data that would otherwise be modeled across separate parent-child tables in a tabular schema, MongoDB’s atomic single-document operations provide transaction semantics that meet the data integrity needs of the majority of applications.

Every typical write operation modifying multiple documents actually happens in several independent transactions: one for each document.

Let’s take an example with a very simple stock management application.

First of all, I need a MongoDB replica set, so please follow the instructions given above to start MongoDB.

Now let’s insert the following documents into a product collection:

MongoDB Enterprise rs:PRIMARY> db.product.insertMany([
    { "_id" : "beer", "price" : NumberDecimal("3.75"), "stock" : NumberInt(5) }, 
    { "_id" : "wine", "price" : NumberDecimal("7.5"), "stock" : NumberInt(3) }
])

Let’s imagine there is a sale on and we want to offer our customers a 20% discount on all our products.

But before applying this discount, we want to monitor when these operations are happening in MongoDB with Change Streams.

Execute the following in Mongo Shell:

cursor = db.product.watch([{$match: {operationType: "update"}}]);
while (!cursor.isExhausted()) {
  if (cursor.hasNext()) {
    print(tojson(cursor.next()));
  }
}

Keep this shell on the side, open another Mongo Shell and apply the discount:

PRIMARY> db.product.updateMany({}, {$mul: {price:0.8}})
{ "acknowledged" : true, "matchedCount" : 2, "modifiedCount" : 2 }
PRIMARY> db.product.find().pretty()
{
    "_id" : "beer",
    "price" : NumberDecimal("3.00000000000000000"),
    "stock" : 5
}
{
    "_id" : "wine",
    "price" : NumberDecimal("6.0000000000000000"),
    "stock" : 3
}

As you can see, both documents were updated with a single command line but not in a single transaction. Here is what we can see in the Change Stream shell:

{
    "_id" : {
        "_data" : "825B4637290000000129295A1004374DC58C611E4C8DA4E5EDE9CF309AC5463C5F6964003C62656572000004"
    },
    "operationType" : "update",
    "clusterTime" : Timestamp(1531328297, 1),
    "ns" : {
        "db" : "test",
        "coll" : "product"
    },
    "documentKey" : {
        "_id" : "beer"
    },
    "updateDescription" : {
        "updatedFields" : {
            "price" : NumberDecimal("3.00000000000000000")
        },
        "removedFields" : [ ]
    }
}
{
    "_id" : {
        "_data" : "825B4637290000000229295A1004374DC58C611E4C8DA4E5EDE9CF309AC5463C5F6964003C77696E65000004"
    },
    "operationType" : "update",
    "clusterTime" : Timestamp(1531328297, 2),
    "ns" : {
        "db" : "test",
        "coll" : "product"
    },
    "documentKey" : {
        "_id" : "wine"
    },
    "updateDescription" : {
        "updatedFields" : {
            "price" : NumberDecimal("6.0000000000000000")
        },
        "removedFields" : [ ]
    }
}

As you can see, the cluster times (see the clusterTime key) of the two operations are different: the operations occurred during the same second, but the counter of the timestamp has been incremented by one.

Thus, each document here is updated one at a time, and even though this happens really fast, someone else could read the documents while the update is running and see only one of the two products with the discount.

Most of the time, this is something you can tolerate in your MongoDB database because, as much as possible, we try to embed tightly linked or related data in the same document. As a result, two updates on the same document happen within a single transaction:

PRIMARY> db.product.update({_id: "wine"},{$inc: {stock:1}, $set: {description : "It’s the best wine on Earth"}})
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
PRIMARY> db.product.findOne({_id: "wine"})
{
    "_id" : "wine",
    "price" : NumberDecimal("6.0000000000000000"),
    "stock" : 4,
    "description" : "It’s the best wine on Earth"
}

However, sometimes, you cannot model all of your related data in a single document, and there are a lot of valid reasons for choosing not to embed documents.

MongoDB 4.0 with multi-document ACID transactions

Multi-document ACID transactions in MongoDB are very similar to what you probably already know from traditional relational databases.

MongoDB’s transactions are a conversational set of related operations that must atomically commit or fully rollback with all-or-nothing execution.

Transactions are used to make sure operations are atomic even across multiple collections or databases. Thus, with snapshot isolation reads, another user can only see all the operations or none of them.
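Before walking through the full demo, here is a minimal sketch of the transaction API pattern used in the rest of this post: start a session, start a transaction on it, pass that session to every operation, then commit or abort. The connection string and document contents below are illustrative assumptions; the complete, runnable version is in Transactions.java further down.

// A minimal sketch of the multi-document transaction API (MongoDB 4.0, Java driver 3.8+).
// The URI and the document contents are illustrative assumptions only.
import com.mongodb.MongoClient;
import com.mongodb.MongoClientURI;
import com.mongodb.MongoCommandException;
import com.mongodb.TransactionOptions;
import com.mongodb.WriteConcern;
import com.mongodb.client.ClientSession;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

import static com.mongodb.client.model.Filters.eq;
import static com.mongodb.client.model.Updates.inc;

public class TransactionSketch {
    public static void main(String[] args) {
        MongoClient client = new MongoClient(new MongoClientURI("mongodb://localhost:27017/test"));
        MongoCollection<Document> carts = client.getDatabase("test").getCollection("cart");
        MongoCollection<Document> products = client.getDatabase("test").getCollection("product");

        try (ClientSession session = client.startSession()) {
            session.startTransaction(TransactionOptions.builder().writeConcern(WriteConcern.MAJORITY).build());
            try {
                // Both writes are bound to the session: they become visible together at commit
                // time, or not at all if the transaction is aborted.
                carts.updateOne(session, eq("_id", "Alice"), inc("items.0.quantity", 2));
                products.updateOne(session, eq("_id", "beer"), inc("stock", -2));
                session.commitTransaction();
            } catch (MongoCommandException e) {
                session.abortTransaction();
            }
        }
        client.close();
    }
}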

Let’s now add a shopping cart to our example.

For this example, two collections are required because we are dealing with two different business entities: stock management and the shopping cart that each client can create while shopping. The lifecycle of each document in these collections is different.

A document in the product collection represents an item I'm selling. It contains the current price of the product and the current stock. I created a POJO to represent it: Product.java.

{ "_id" : "beer", "price" : NumberDecimal("3"), "stock" : NumberInt(5) }

A shopping cart is created when a client adds their first item to the cart and is removed when the client proceeds to checkout or leaves the website. I created a POJO to represent it: Cart.java.

{
    "_id" : "Alice",
    "items" : [
        {
            "price" : NumberDecimal("3"),
            "productId" : "beer",
            "quantity" : NumberInt(2)
        }
    ]
}
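Product.java and Cart.java themselves are not reproduced in this post, but a rough sketch of what such POJOs could look like is shown below. Field names follow the documents above; the actual classes in the linked repository may differ in details such as annotations or how decimal values are handled.

// Hypothetical sketch of the Product and Cart POJOs used with the PojoCodecProvider.
// In the repository these are two separate files; they are shown together here for brevity.
import java.math.BigDecimal;
import java.util.List;

public class Product {
    private String id;            // maps to the "_id" field under the default POJO conventions
    private int stock;
    private BigDecimal price;

    public Product() {
    }

    public Product(String id, int stock, BigDecimal price) {
        this.id = id;
        this.stock = stock;
        this.price = price;
    }

    // getters and setters omitted for brevity
}

class Cart {
    private String id;            // the client's name, e.g. "Alice"
    private List<Item> items;

    public static class Item {
        private String productId;
        private int quantity;
        private BigDecimal price;
        // constructor, getters and setters omitted for brevity
    }

    // constructors, getters and setters omitted for brevity
}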

The challenge here resides in the fact that I cannot sell more than I possess: if I have 5 beers to sell, I cannot have more than 5 beers distributed across the different client carts.

To ensure that, I have to make sure that the operation creating or updating the client cart is atomic with the stock update. That's where the multi-document transaction comes into play. The transaction must fail if someone tries to buy something I do not have in stock. I will add a constraint on the product stock:

db.createCollection("product", {
   validator: {
      $jsonSchema: {
         bsonType: "object",
         required: [ "_id", "price", "stock" ],
         properties: {
            _id: {
               bsonType: "string",
               description: "must be a string and is required"
            },
            price: {
               bsonType: "decimal",
               minimum: 0,
               description: "must be a positive decimal and is required"
            },
            stock: {
               bsonType: "int",
               minimum: 0,
               description: "must be a positive integer and is required"
            }
         }
      }
   }
})

Note that this is already included in the Java code.

To monitor our example, we are going to use MongoDB Change Streams that were introduced in MongoDB 3.6.

In each of the two threads of this process, called ChangeStreams.java, I am going to monitor one of the two collections and print each operation with its associated cluster time.

// package and imports

public class ChangeStreams {

    private static final Bson filterUpdate = Filters.eq("operationType", "update");
    private static final Bson filterInsertUpdate = Filters.in("operationType", "insert", "update");
    private static final String jsonSchema = "{ $jsonSchema: { bsonType: \"object\", required: [ \"_id\", \"price\", \"stock\" ], properties: { _id: { bsonType: \"string\", description: \"must be a string and is required\" }, price: { bsonType: \"decimal\", minimum: 0, description: \"must be a positive decimal and is required\" }, stock: { bsonType: \"int\", minimum: 0, description: \"must be a positive integer and is required\" } } } } ";

    public static void main(String[] args) {
        MongoDatabase db = initMongoDB(args[0]);
        MongoCollection<Cart> cartCollection = db.getCollection("cart", Cart.class);
        MongoCollection<Product> productCollection = db.getCollection("product", Product.class);
        ExecutorService executor = Executors.newFixedThreadPool(2);
        executor.submit(() -> watchChangeStream(productCollection, filterUpdate));
        executor.submit(() -> watchChangeStream(cartCollection, filterInsertUpdate));
        ScheduledExecutorService scheduled = Executors.newSingleThreadScheduledExecutor();
        scheduled.scheduleWithFixedDelay(System.out::println, 0, 1, TimeUnit.SECONDS);
    }

    private static void watchChangeStream(MongoCollection<?> collection, Bson filter) {
        System.out.println("Watching " + collection.getNamespace());
        List<Bson> pipeline = Collections.singletonList(Aggregates.match(filter));
        collection.watch(pipeline)
                  .fullDocument(FullDocument.UPDATE_LOOKUP)
                  .forEach((Consumer<ChangeStreamDocument<?>>) doc -> System.out.println(
                          doc.getClusterTime() + " => " + doc.getFullDocument()));
    }

    private static MongoDatabase initMongoDB(String mongodbURI) {
        getLogger("org.mongodb.driver").setLevel(Level.SEVERE);
        CodecRegistry providers = fromProviders(PojoCodecProvider.builder().register("com.mongodb.models").build());
        CodecRegistry codecRegistry = fromRegistries(MongoClient.getDefaultCodecRegistry(), providers);
        MongoClientOptions.Builder options = new MongoClientOptions.Builder().codecRegistry(codecRegistry);
        MongoClientURI uri = new MongoClientURI(mongodbURI, options);
        MongoClient client = new MongoClient(uri);
        MongoDatabase db = client.getDatabase("test");
        db.drop();
        db.createCollection("cart");
        db.createCollection("product", productJsonSchemaValidator());
        return db;
    }

    private static CreateCollectionOptions productJsonSchemaValidator() {
        return new CreateCollectionOptions().validationOptions(
                new ValidationOptions().validationAction(ValidationAction.ERROR).validator(BsonDocument.parse(jsonSchema)));
    }
}

In this example, we have 5 beers to sell. Alice wants to buy 2 beers, but we are not going to use the new MongoDB 4.0 multi-document transactions for this. We will observe two operations in the change streams: one creating the cart and one updating the stock, at two different cluster times.

Then Alice adds 2 more beers to her cart, and this time we are going to use a transaction. The result in the change stream will be two operations happening at the same cluster time.

Finally, she will try to order 2 extra beers, but the jsonSchema validator will fail the product update and result in a rollback. We will not see anything in the change stream. Here is the Transactions.java source code:

// package and import

public class Transactions {

    private static MongoClient client;
    private static MongoCollection<Cart> cartCollection;
    private static MongoCollection<Product> productCollection;

    private final BigDecimal BEER_PRICE = BigDecimal.valueOf(3);
    private final String BEER_ID = "beer";

    private final Bson stockUpdate = inc("stock", -2);
    private final Bson filterId = eq("_id", BEER_ID);
    private final Bson filterAlice = eq("_id", "Alice");
    private final Bson matchBeer = elemMatch("items", eq("productId", "beer"));
    private final Bson incrementBeers = inc("items.$.quantity", 2);

    public static void main(String[] args) {
        initMongoDB(args[0]);
        new Transactions().demo();
    }

    private static void initMongoDB(String mongodbURI) {
        getLogger("org.mongodb.driver").setLevel(Level.SEVERE);
        CodecRegistry codecRegistry = fromRegistries(MongoClient.getDefaultCodecRegistry(), fromProviders(
                PojoCodecProvider.builder().register("com.mongodb.models").build()));
        MongoClientOptions.Builder options = new MongoClientOptions.Builder().codecRegistry(codecRegistry);
        MongoClientURI uri = new MongoClientURI(mongodbURI, options);
        client = new MongoClient(uri);
        MongoDatabase db = client.getDatabase("test");
        cartCollection = db.getCollection("cart", Cart.class);
        productCollection = db.getCollection("product", Product.class);
    }

    private void demo() {
        clearCollections();
        insertProductBeer();
        printDatabaseState();
        System.out.println("#########  NO  TRANSACTION #########");
        System.out.println("Alice wants 2 beers.");
        System.out.println("We have to create a cart in the 'cart' collection and update the stock in the 'product' collection.");
        System.out.println("The 2 actions are correlated but can not be executed on the same cluster time.");
        System.out.println("Any error blocking one operation could result in stock error or beer sale we don't own.");
        System.out.println("---------------------------------------------------------------------------");
        aliceWantsTwoBeers();
        sleep();
        removingBeersFromStock();
        System.out.println("####################################\n");
        printDatabaseState();
        sleep();
        System.out.println("\n######### WITH TRANSACTION #########");
        System.out.println("Alice wants 2 extra beers.");
        System.out.println("Now we can update the 2 collections simultaneously.");
        System.out.println("The 2 operations only happen when the transaction is committed.");
        System.out.println("---------------------------------------------------------------------------");
        aliceWantsTwoExtraBeersInTransactionThenCommitOrRollback();
        sleep();
        System.out.println("\n######### WITH TRANSACTION #########");
        System.out.println("Alice wants 2 extra beers.");
        System.out.println("This time we do not have enough beers in stock so the transaction will rollback.");
        System.out.println("---------------------------------------------------------------------------");
        aliceWantsTwoExtraBeersInTransactionThenCommitOrRollback();
        client.close();
    }

    private void aliceWantsTwoExtraBeersInTransactionThenCommitOrRollback() {
        ClientSession session = client.startSession();
        try {
            session.startTransaction(TransactionOptions.builder().writeConcern(WriteConcern.MAJORITY).build());
            aliceWantsTwoExtraBeers(session);
            sleep();
            removingBeerFromStock(session);
            session.commitTransaction();
        } catch (MongoCommandException e) {
            session.abortTransaction();
            System.out.println("####### ROLLBACK TRANSACTION #######");
        } finally {
            session.close();
            System.out.println("####################################\n");
            printDatabaseState();
        }
    }

    private void removingBeersFromStock() {
        System.out.println("Trying to update beer stock : -2 beers.");
        try {
            productCollection.updateOne(filterId, stockUpdate);
        } catch (MongoCommandException e) {
            System.out.println("#####   MongoCommandException  #####");
            System.out.println("##### STOCK CANNOT BE NEGATIVE #####");
            throw e;
        }
    }

    private void removingBeerFromStock(ClientSession session) {
        System.out.println("Trying to update beer stock : -2 beers.");
        try {
            productCollection.updateOne(session, filterId, stockUpdate);
        } catch (MongoCommandException e) {
            System.out.println("#####   MongoCommandException  #####");
            System.out.println("##### STOCK CANNOT BE NEGATIVE #####");
            throw e;
        }
    }

    private void aliceWantsTwoBeers() {
        System.out.println("Alice adds 2 beers in her cart.");
        cartCollection.insertOne(new Cart("Alice", Collections.singletonList(new Cart.Item(BEER_ID, 2, BEER_PRICE))));
    }

    private void aliceWantsTwoExtraBeers(ClientSession session) {
        System.out.println("Updating Alice cart : adding 2 beers.");
        cartCollection.updateOne(session, and(filterAlice, matchBeer), incrementBeers);
    }

    private void insertProductBeer() {
        productCollection.insertOne(new Product(BEER_ID, 5, BEER_PRICE));
    }

    private void clearCollections() {
        productCollection.deleteMany(new BsonDocument());
        cartCollection.deleteMany(new BsonDocument());
    }

    private void printDatabaseState() {
        System.out.println("Database state:");
        printProducts(productCollection.find().into(new ArrayList<>()));
        printCarts(cartCollection.find().into(new ArrayList<>()));
        System.out.println();
    }

    private void printProducts(List<Product> products) {
        products.forEach(System.out::println);
    }

    private void printCarts(List<Cart> carts) {
        if (carts.isEmpty())
            System.out.println("No carts...");
        else
            carts.forEach(System.out::println);
    }

    private void sleep() {
        System.out.println("Sleeping 3 seconds...");
        try {
            Thread.sleep(3000);
        } catch (InterruptedException e) {
            System.err.println("Oups...");
            e.printStackTrace();
        }
    }
}

Here is the console output of the change stream process:

$ ./change-streams.sh 

Watching test.cart
Watching test.product

Timestamp{value=6570052721557110786, seconds=1529709604, inc=2} => Cart{id='Alice', items=[Item{productId=beer, quantity=2, price=3}]}



Timestamp{value=6570052734442012673, seconds=1529709607, inc=1} => Product{id='beer', stock=3, price=3}






Timestamp{value=6570052764506783745, seconds=1529709614, inc=1} => Product{id='beer', stock=1, price=3}
Timestamp{value=6570052764506783745, seconds=1529709614, inc=1} => Cart{id='Alice', items=[Item{productId=beer, quantity=4, price=3}]}

As you can see here, we only get four operations, because the last two operations were never committed to the database, and therefore the change stream has nothing to show.

You can also note that the first two cluster times are different because we did not use a transaction for the first two operations, while the last two operations share the same cluster time because we used the new MongoDB 4.0 multi-document transaction system, and thus they are atomic.

Here is the console output of the Transactions Java process, which sums up everything I said earlier.

$ ./transactions.sh 
Database state:
Product{id='beer', stock=5, price=3}
No carts...

#########  NO  TRANSACTION #########
Alice wants 2 beers.
We have to create a cart in the 'cart' collection and update the stock in the 'product' collection.
The 2 actions are correlated but can not be executed on the same cluster time.
Any error blocking one operation could result in stock error or a sale of beer that we can’t fulfill as we have no stock.
---------------------------------------------------------------------------
Alice adds 2 beers in her cart.
Sleeping 3 seconds...
Trying to update beer stock : -2 beers.
####################################

Database state:
Product{id='beer', stock=3, price=3}
Cart{id='Alice', items=[Item{productId=beer, quantity=2, price=3}]}

Sleeping 3 seconds...

######### WITH TRANSACTION #########
Alice wants 2 extra beers.
Now we can update the 2 collections simultaneously.
The 2 operations only happen when the transaction is committed.
---------------------------------------------------------------------------
Updating Alice cart : adding 2 beers.
Sleeping 3 seconds...
Trying to update beer stock : -2 beers.
####################################

Database state:
Product{id='beer', stock=1, price=3}
Cart{id='Alice', items=[Item{productId=beer, quantity=4, price=3}]}

Sleeping 3 seconds...

######### WITH TRANSACTION #########
Alice wants 2 extra beers.
This time we do not have enough beers in stock so the transaction will rollback.
---------------------------------------------------------------------------
Updating Alice cart : adding 2 beers.
Sleeping 3 seconds...
Trying to update beer stock : -2 beers.
#####   MongoCommandException  #####
##### STOCK CANNOT BE NEGATIVE #####
####### ROLLBACK TRANSACTION #######
####################################

Database state:
Product{id='beer', stock=1, price=3}
Cart{id='Alice', items=[Item{productId=beer, quantity=4, price=3}]}

Next Steps

Thanks for taking the time to read my post - I hope you found it useful and interesting. As a reminder, all the code is available in this GitHub repository for you to experiment with.

If you are looking for a very simple way to get started with MongoDB, you can do that in just 5 clicks on our MongoDB Atlas database service in the cloud.

Also, multi-document ACID transactions are not the only new feature in MongoDB 4.0, so feel free to take a look at our free course on MongoDB University, M040: New Features and Tools in MongoDB 4.0, and our guide to what's new in MongoDB 4.0, where you can learn more about native type conversions, new visualization and analytics tools, and Kubernetes integration.

Introduction to MongoDB Transactions in Python

Multi-document transactions arrived in MongoDB 4.0 in June 2018. MongoDB has always been transactional around updates to a single document. Now, with multi-document transactions we can wrap a set of database operations inside a start and commit transaction call. This ensures that even with inserts and/or updates happening across multiple collections and/or databases, the external view of the data meets ACID constraints.

To demonstrate transactions in the wild we use a trivial example app that emulates a flight booking for an online airline application. In this simplified booking we need to undertake three operations:

  1. Allocate a seat (seat_collection)
  2. Pay for the seat (payment_collection)
  3. Update the count of allocated seats and sales (audit_collection)

For this application we will use three separate collections for these documents, as detailed in parentheses above. The code in transaction_main.py updates these collections in serial unless the --usetxns argument is used; in that case, we wrap the complete set of operations inside an ACID transaction. The code in transaction_main.py is built directly using the MongoDB Python driver (PyMongo 3.7.1). See the section on client sessions for an overview of the new transactions API in 3.7.1.

The goal of this code is to demonstrate to the Python developer just how easy it is to convert existing code to transactions if required, or to port older SQL-based systems.

Setting up your environment

The following files can be found in the associated GitHub repo, pymongo-transactions.

  • .gitignore: Standard GitHub .gitignore for Python
  • LICENSE: Apache 2.0 license (standard GitHub)
  • Makefile: Makefile with targets for default operations
  • transaction_main.py: Run a set of writes with and without transactions. Run python transaction_main.py -h for help.
  • transaction_retry.py: The file containing the transaction retry functions.
  • watch_transactions.py: Use a MongoDB change stream to watch collections as they change when transaction_main.py is running
  • kill_primary.py: Start a MongoDB replica set (on port 27100) and kill the primary on a regular basis. This is used to emulate an election happening in the middle of a transaction.
  • featurecompatibility.py: Check and/or set feature compatibility for the database (it needs to be set to "4.0" for transactions)

You can clone this repo and work alongside us during this blog post (please file any problems on the Issues tab for the repo).

We assume for all that follows that you have Python 3.6 or greater correctly installed and on your path.

The Makefile outlines the operations that are required to setup the test environment.

All the programs in this example use a port range starting at 27100 to ensure that this example does not clash with an existing MongoDB installation.

Preparation

To set up the environment, you can run through the following steps manually. People who have make can speed up installation by using the make install command.

Set up a Python virtualenv

$ cd pymongo-transactions
$ virtualenv -p python3 venv
$ source venv/bin/activate

Install Python MongoDB Driver, pymongo

Install the latest version of the PyMongo MongoDB Driver (3.7.1 at the time of writing).

pip install --upgrade pymongo

Install mtools

mtools is a collection of helper scripts to parse, filter, and visualize MongoDB log files (mongod, mongos). mtools also includes mlaunch, a utility to quickly set up complex MongoDB test environments on a local machine. For this demo we are only going to use the mlaunch program.

pip install mtools

The mlaunch program also requires the psutil package.

pip install psutil

The mlaunch program gives us a simple command to start a MongoDB replica set, as transactions are only supported on a replica set.

Start a replica set whose name is txntest (see the make init_server target for details):

mlaunch init --port 27100 --replicaset --name "txntest"

Using the Makefile for configuration

There is a Makefile with targets for all these operations. For those of you on platforms without access to make, it should be easy enough to cut and paste the commands out of the targets and run them on the command line.

Running the Makefile

cd pymongo-transactions
make

You will need to have MongoDB 4.0 on your path. There are other convenience targets for starting the demo programs:

  • make notxns: start the transactions client without using transactions
  • make usetxns: start the transactions client with transactions enabled
  • make watch_seats: watch the seats collection changing
  • make watch_payments: watch the payments collection changing

Running the transactions example

The transactions example consists of two Python programs: transaction_main.py and watch_transactions.py.

Running transaction_main.py

$ python transaction_main.py -h
usage: transaction_main.py [-h] [--host HOST] [--usetxns] [--delay DELAY]
                           [--iterations ITERATIONS]
                           [--randdelay RANDDELAY RANDDELAY]

optional arguments:
  -h, --help            show this help message and exit
  --host HOST           MongoDB URI [default: mongodb://localhost:27100,localh
                        ost:27101,localhost:27102/?replicaSet=txntest&retryWri
                        tes=true]
  --usetxns             Use transactions [default: False]
  --delay DELAY         Delay between two insertion events [default: 1.0]
  --iterations ITERATIONS
                        Run N iterations. O means run forever
  --randdelay RANDDELAY RANDDELAY
                        Create a delay set randomly between the two bounds
                        [default: None]

You can choose to use --delay or --randdelay; if you use both, --delay takes precedence. The --randdelay parameter creates a random delay between a lower and an upper bound that will be added between each insertion event.

The transaction_main.py program knows to use the txntest replica set and the right default port range.

To run the program without transactions you can run it with no arguments:

$ python transaction_main.py
using collection: SEATSDB.seats
using collection: PAYMENTSDB.payments
using collection: AUDITDB.audit
Using a fixed delay of 1.0

1. Booking seat: '1A'
1. Sleeping: 1.000
1. Paying 330 for seat '1A'
2. Booking seat: '2A'
2. Sleeping: 1.000
2. Paying 450 for seat '2A'
3. Booking seat: '3A'
3. Sleeping: 1.000
3. Paying 490 for seat '3A'
4. Booking seat: '4A'
4. Sleeping: 1.000
^C

The program runs a function called book_seat() which books a seat on a plane by adding documents to three collections. First it adds the seat allocation to the seats_collection, then it adds a payment to the payments_collection, and finally it updates an audit count in the audit_collection. (This is a much simplified booking process used purely for illustration.)

The default is to run the program without using transactions. To use transactions we have to add the command line flag --usetxns. Run this to test that you are running MongoDB 4.0 and that the correct featureCompatibilityVersion is configured (it must be set to "4.0"). If you install MongoDB 4.0 over an existing /data directory containing 3.6 databases, then featureCompatibilityVersion will be set to "3.6" by default and transactions will not be available.

Note: If you get the following error running python transaction_main.py --usetxns, it means you are picking up an older version of PyMongo (older than 3.7.x), which has no multi-document transaction support.

Traceback (most recent call last):
  File "transaction_main.py", line 175, in 
    total_delay = total_delay + run_transaction_with_retry( booking_functor, session)
  File "/Users/jdrumgoole/GIT/pymongo-transactions/transaction_retry.py", line 52, in run_transaction_with_retry
    with session.start_transaction():
AttributeError: 'ClientSession' object has no attribute 'start_transaction'

Watching Transactions

To actually see the effect of transactions we need to watch what is happening inside the collections SEATSDB.seats and PAYMENTSDB.payments.

We can do this with watch_transactions.py. This script uses MongoDB Change Streams to see what's happening inside a collection in real-time. We need to run two of these in parallel so it's best to line them up side by side.

Here is the help output for the watch_transactions.py program:

$ python watch_transactions.py -h
usage: watch_transactions.py [-h] [--host HOST] [--collection COLLECTION]

optional arguments:
  -h, --help            show this help message and exit
  --host HOST           mongodb URI for connecting to server [default:
                        mongodb://localhost:27100/?replicaSet=txntest]
  --collection COLLECTION
                        Watch  [default:
                        PYTHON_TXNS_EXAMPLE.seats_collection]

We need to watch each collection so in two separate terminal windows start the watcher.

Window 1:

$ python watch_transactions.py --collection seats
Watching: seats
...

Window 2:

$ python watch_transactions.py --collection payments
Watching: payments
...

What Happens when you run without transactions?

Let's run the code without transactions first. If you examine the transaction_main.py code, you will see a function called book_seat().

def book_seat(seats, payments, audit, seat_no, delay_range, session=None):
    '''
    Run two inserts in sequence.
    If session is not None we are in a transaction

    :param seats: seats collection
    :param payments: payments collection
    :param seat_no: the number of the seat to be booked (defaults to row A)
    :param delay_range: A tuple indicating a random delay between two ranges or a single float fixed delay
    :param session: Session object required by a MongoDB transaction
    :return: the delay_period for this transaction
    '''
    price = random.randrange(200, 500, 10)
    if type(delay_range) == tuple:
        delay_period = random.uniform(delay_range[0], delay_range[1])
    else:
        delay_period = delay_range

    # Book Seat
    seat_str = "{}A".format(seat_no)
    print(count(seat_no, "Booking seat: '{}'".format(seat_str)))
    seats.insert_one({"flight_no" : "EI178",
                      "seat"      : seat_str,
                      "date"      : datetime.datetime.utcnow()},
                     session=session)
    print(count( seat_no, "Sleeping: {:02.3f}".format(delay_period)))
    #pay for seat
    time.sleep(delay_period)
    payments.insert_one({"flight_no" : "EI178",
                         "seat"      : seat_str,
                         "date"      : datetime.datetime.utcnow(),
                         "price"     : price},
                        session=session)
    audit.update_one({ "audit" : "seats"}, { "$inc" : { "count" : 1}}, upsert=True)
    print(count(seat_no, "Paying {} for seat '{}'".format(price, seat_str)))

    return delay_period

This program emulates a very simplified airline booking, with a seat being allocated and then paid for. These two operations are often separated by a reasonable time frame (e.g. seat allocation vs. external credit card validation and anti-fraud checks), and we emulate this by inserting a delay. The default is 1 second.

Now, with the two watch_transactions.py scripts running for seats_collection and payments_collection, we can run transaction_main.py as follows:

$ python transaction_main.py

The first run is with no transactions enabled.

The bottom window shows transaction_main.py running. On the top left we are watching the inserts to the seats collection. On the top right we are watching inserts to the payments collection.

(Screenshot: watching without transactions)

We can see that the payments window lags the seats window, as the watchers only update when the insert is complete. Thus seats sold cannot be easily reconciled with the corresponding payments. If, after the third seat has been booked, we CTRL-C the program, we can see that it exits before writing the payment. This is reflected in the change stream for the payments collection, which only shows payments for seats 1A and 2A, versus seat allocations for 1A, 2A and 3A.

If we want payments and seats to be instantly reconcilable and consistent we must execute the inserts inside a transaction.

What happens when you run with Transactions?

Now let's run the same system with --usetxns enabled.

$ python transaction_main.py --usetxns

We run with the exact same setup but now set --usetxns.

(Screenshot: watching with transactions)

Note now how the change streams are interlocked and are updated in parallel. This is because all the updates only become visible when the transaction is committed. Note how we aborted the third transaction by hitting CTRL-C. This time neither the seat nor the payment appears in the change streams, unlike the first example where the seat went through.

This is where transactions shine: in a world where all or nothing is the watchword. We never want to keep seats allocated unless they are paid for.

What happens during failure?

In a MongoDB replica set all writes are directed to the Primary node. If the primary node fails or becomes inaccessible (e.g. due to a network partition) writes in flight may fail. In a non-transactional scenario the driver will recover from a single failure and retry the write. In a multi-document transaction we must recover and retry in the event of these kinds of transient failures. This code is encapsulated in transaction_retry.py. We both retry the transaction and retry the commit to handle scenarios where the primary fails within the transaction and/or the commit operation.

def commit_with_retry(session):
    while True:
        try:
            # Commit uses write concern set at transaction start.
            session.commit_transaction()
            print("Transaction committed.")
            break
        except (pymongo.errors.ConnectionFailure, pymongo.errors.OperationFailure) as exc:
            # Can retry commit
            if exc.has_error_label("UnknownTransactionCommitResult"):
                print("UnknownTransactionCommitResult, retrying "
                      "commit operation ...")
                continue
            else:
                print("Error during commit ...")
                raise

def run_transaction_with_retry(functor, session):
    assert (isinstance(functor, Transaction_Functor))
    while True:
        try:
            with session.start_transaction():
                result=functor(session)  # performs transaction
                commit_with_retry(session)
            break
        except (pymongo.errors.ConnectionFailure, pymongo.errors.OperationFailure) as exc:
            # If transient error, retry the whole transaction
            if exc.has_error_label("TransientTransactionError"):
                print("TransientTransactionError, retrying "
                      "transaction ...")
                continue
            else:
                raise

    return result

In order to observe what happens during elections we can use the script kill_primary.py. This script will start a replica set and continuously kill the primary.

$ make kill_primary
. venv/bin/activate && python kill_primary.py
no nodes started.
Current electionTimeoutMillis: 500
1. (Re)starting replica-set
no nodes started.
1. Getting list of mongod processes
Process list written to mlaunch.procs
1. Getting replica set status
1. Killing primary node: 31029
1. Sleeping: 1.0
2. (Re)starting replica-set
launching: "/usr/local/mongodb/bin/mongod" on port 27101
2. Getting list of mongod processes
Process list written to mlaunch.procs
2. Getting replica set status
2. Killing primary node: 31045
2. Sleeping: 1.0
3. (Re)starting replica-set
launching: "/usr/local/mongodb/bin/mongod" on port 27102
3. Getting list of mongod processes
Process list written to mlaunch.procs
3. Getting replica set status
3. Killing primary node: 31137
3. Sleeping: 1.0

kill_primary.py resets electionTimeoutMillis to 500ms from its default of 10000ms (10 seconds). This allows elections to resolve more quickly for the purposes of this test, as we are running everything locally.

Once kill_primary.py is running we can start up transaction_main.py again using the --usetxns argument.


$ make usetxns
. venv/bin/activate && python transaction_main.py --usetxns
Forcing collection creation (you can't create collections inside a txn)
Collections created
using collection: PYTHON_TXNS_EXAMPLE.seats
using collection: PYTHON_TXNS_EXAMPLE.payments
using collection: PYTHON_TXNS_EXAMPLE.audit
Using a fixed delay of 1.0
Using transactions

1. Booking seat: '1A'
1. Sleeping: 1.000
1. Paying 440 for seat '1A'
Transaction committed.
2. Booking seat: '2A'
2. Sleeping: 1.000
2. Paying 330 for seat '2A'
Transaction committed.
3. Booking seat: '3A'
3. Sleeping: 1.000
TransientTransactionError, retrying transaction ...
3. Booking seat: '3A'
3. Sleeping: 1.000
3. Paying 240 for seat '3A'
Transaction committed.
4. Booking seat: '4A'
4. Sleeping: 1.000
4. Paying 410 for seat '4A'
Transaction committed.
5. Booking seat: '5A'
5. Sleeping: 1.000
5. Paying 260 for seat '5A'
Transaction committed.
6. Booking seat: '6A'
6. Sleeping: 1.000
TransientTransactionError, retrying transaction ...
6. Booking seat: '6A'
6. Sleeping: 1.000
6. Paying 380 for seat '6A'
Transaction committed.
...

As you can see, during elections the transaction will be aborted and must be retried. If you look at the transaction_retry.py code you will see how this happens. If a write operation encounters an error, it will throw one of the following exceptions: pymongo.errors.ConnectionFailure or pymongo.errors.OperationFailure.

These exceptions may carry a label called TransientTransactionError. This label can be detected using the has_error_label(label) method, which is available in PyMongo 3.7.x. Transient errors can be recovered from, and the retry code in transaction_retry.py retries both the transaction and the commit (see above).

Conclusions

Multi-document transactions are the final piece of the jigsaw for SQL developers who have been shying away from trying MongoDB. ACID transactions make the programmer's job easier and give teams that are migrating from an existing SQL schema a much more consistent and convenient transition path.

As most migrations involve a move from highly normalised data structures to more natural and flexible nested JSON documents, one would expect that the number of required multi-document transactions will be lower in a properly constructed MongoDB application. But where multi-document transactions are required, programmers can now include them using syntax very similar to SQL.

With ACID transactions in MongoDB 4.0, MongoDB can now be the first choice for an even broader range of application use cases.

Why not try transactions today by setting up your first cluster on MongoDB Atlas, our database-as-a-service offering?

To try it locally, download MongoDB 4.0.


Join us at MongoDB Europe 2018 for deep-dive technical sessions and hands-on tutorials.

Introducing the Best Database for Modern Applications

The announcements we made today at MongoDB World 2018 represent a significant milestone in the evolution of MongoDB, making it the database of choice for all modern applications. Broadly speaking, there are three reasons for this:

  1. The document data model – presenting you with the best way to work with data.
  2. It’s distributed by design – allowing you to intelligently put data where you want it.
  3. A unified experience that gives you the freedom to run anywhere – allowing you to future-proof your work and eliminate vendor lock-in.

There is a ton of new stuff, and so I wanted to give you a summary of what I covered during my keynote, with links to key resources so you can learn more.

Best Way to Work with Data

Today we released MongoDB Server 4.0 for General Availability. The highlight of the release is multi-document ACID transactions, which we previewed back in February with a beta program that attracted thousands of members of the community, putting transactions through their paces and providing invaluable feedback to the engineering team. We’ve implemented transactions so they feel just like the transactions you are familiar with from relational databases. They enforce snapshot isolation to provide a consistent view of data, and all-or-nothing execution to maintain data integrity. And while the document model means multi-document transactions aren’t necessary for most operations, with them it’s even easier for you to address a complete range of use cases with MongoDB.

It’s no secret how much I love MongoDB’s aggregation framework. Building queries stage-by-stage, checking your output as you go is by far a better way to write your most complicated queries than dealing with a monolithic snarl of SQL. To make that workflow even better, we’ve enhanced MongoDB Compass with the aggregation pipeline builder, which provides stage-by-stage, real-time feedback on the documents flowing through your pipelines. It’s easier than ever to deploy sophisticated processing pipelines that transform, aggregate, and analyze your data, all from the simple and intuitive MongoDB Compass GUI. You can then export the pipelines, and any other queries you create in Compass, to the native code of your preferred programming language. Server 4.0 also adds type conversions to the aggregation pipeline. With the new $convert operator you can transform mixed data types into standardized, cleansed formats natively within the database, preparing it for BI and machine learning, while eliminating costly, slow, and fragile ETL processes.
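As a rough illustration of $convert (a hypothetical snippet, not code from the keynote; the collection and field names are assumptions), the following aggregation stage casts a string price field to a decimal, falling back to null when a value cannot be converted:

// Hypothetical example of the MongoDB 4.0 $convert operator using the Java driver.
import com.mongodb.MongoClient;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

import java.util.Collections;

public class ConvertExample {
    public static void main(String[] args) {
        MongoClient client = new MongoClient("localhost", 27017);
        MongoCollection<Document> sales = client.getDatabase("test").getCollection("sales");

        // { $addFields: { price: { $convert: { input: "$price", to: "decimal", onError: null } } } }
        Document convertStage = new Document("$addFields",
                new Document("price",
                        new Document("$convert",
                                new Document("input", "$price")
                                        .append("to", "decimal")
                                        .append("onError", null))));

        for (Document doc : sales.aggregate(Collections.singletonList(convertStage))) {
            System.out.println(doc.toJson());
        }
        client.close();
    }
}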

Extending the tools you can use to work with data managed by the server, we announced the public beta of MongoDB Charts, which provides the fastest and easiest way to get insights into your operational data, in real time. With Charts, you can create and share visualisations of your MongoDB data, using a document-native interface, without needing to move it into other systems or leverage third-party tools.

Documents and MongoDB’s query language are the best way to work with data, and to bring that power out of the datacenter and into the hands of app developers, MongoDB Stitch, which is GA as of today, provides two of its four services: QueryAnywhere and Functions. Using the authentication and declarative access control rules of Stitch QueryAnywhere, we can end the horrid practice of implementing shadow query languages in REST on top of application servers that just turn those REST calls into real query languages. With a native SDK, developers can make use of the full power of MongoDB from mobile and JavaScript applications, while Stitch makes sure that the right permissions are observed. Stitch Functions, JavaScript functions that execute with full access to application context, let developers compose their business logic with access to Atlas and calls to external services. With these two services, it’s easy to build complete applications without standing up a single application server.

Intelligently Put Data Where you Want It

As a distributed system, MongoDB enables you to spread data out across a cluster of nodes for resilience, scalability, and workload isolation. Unlike other distributed databases that randomly spray data around a cluster, MongoDB allows you to define controls that place data on specific nodes, for example in a specific region for low latency reads and writes, and for compliance with new privacy regulations.

The new Global Clusters introduced to MongoDB Atlas allow you to deploy a geographically distributed, fully managed database that provides low latency writes and reads to users anywhere, with data placement controls for regulatory compliance. We also announced Atlas Enterprise, offering new security controls including LDAP integration, the encrypted storage engine with bring-your-own key management, and database-level auditing. Organizations can now also use databases managed by MongoDB Atlas to build HIPAA-compliant applications under an executed Business Associate Agreement (BAA) with MongoDB, Inc. With these new announcements, MongoDB Atlas is the most secure cloud database service available anywhere.

Coding in a distributed world also means that the traditional means of responding to events in a database are no longer viable. So Stitch Triggers, also GA today, make reacting to those events possible by building on the Change Streams introduced in MongoDB 3.6. When you create a trigger, Stitch manages a change stream on your behalf, providing real-time notifications to Stitch Functions, which can react in all the ways functions can, from updating analytics rollup collections, to sending email or text messages, or kicking off other external services like Kafka or Kinesis.

Freedom to Run Anywhere

Whether you want to consume your database as a service in MongoDB Atlas, or manage it yourself on your own infrastructure, the announcements today make that even easier. We deliver a data platform that runs the same everywhere, that leverages the benefits of multi-cloud strategy with no lock-in, and is available in 50+ regions across the major cloud providers.

If you want to run MongoDB yourself, then we have released our new free MongoDB monitoring cloud service. The service is available to all MongoDB users, without needing to install an agent, navigate a paywall, or complete a registration form. You will be able to see the metrics and topology of your environment from the moment free monitoring is enabled. You can enable free monitoring easily with the new db.enableFreeMonitoring() method in the MongoDB shell, from MongoDB Compass, or when starting the mongod process, and you can opt out at any time.

We’re seeing more DevOps teams leveraging the power of containerization and technologies like Kubernetes and Red Hat OpenShift to manage containerized clusters. Today we announced the beta of the new MongoDB Enterprise Operator for Kubernetes, enabling you to deploy and manage MongoDB clusters from within the Kubernetes API, without having to connect separately to Ops Manager. You can learn more by reading our Red Hat OpenShift and MongoDB blog, and checking out the repository on GitHub.

Announced today, MongoDB Mobile takes MongoDB to a new frontier. Available in beta, MongoDB Mobile extends your ability to put data where you need it, all the way out to the edge of the network on IoT assets and iOS and Android mobile devices. MongoDB Mobile provides a single database, query language, and the intuitive Stitch SDK that runs consistently for data held on mobile clients, through to the backend server.

MongoDB Mobile provides the power and flexibility of MongoDB in a compact form that is power- and performance-aware with a low disk and memory footprint. It supports 64 bit iOS and Android operating systems and is easily embedded into mobile and IoT devices for fast and reliable local storage of JSON documents. With secondary indexing, access to the full MongoDB query language and aggregations, users can query data any way they want. With local reads and writes, MongoDB Mobile lets you build the fastest, most reactive apps. And Stitch Mobile Sync, which is in private beta now, will automatically synchronize data changes between data held locally and your backend database, helping resolve any conflicts – even after the device has been offline. The beta program is open now, and you can sign up for access on the MongoDB Mobile product page.

What’s Next?

So as you can see, that’s a ton of stuff. The announcements today represent our biggest set of releases yet, and we’re incredibly excited to get it into your hands and see what amazing things you do with them. Head over to our MongoDB World 2018 announcements page for more resources on each of these new products and services.

MongoDB Multi-Document ACID Transactions are GA

With the release of 4.0, you now have multi-document ACID transactions in MongoDB.

But how have you been able to be so productive with MongoDB up until now? The database wasn’t built overnight and many applications have been running mission critical use cases demanding the highest levels of data integrity without multi-document transactions for years. How was that possible?

Well, let’s think first about why many people believe they need multi-document transactions. The first principle of relational data modeling is normalizing your data across tables. This means that many common database operations, like account creation, require atomic updates across many rows and columns.

In MongoDB, the data model is fundamentally different. The document model encourages users to store related data together in a single document. MongoDB has always supported ACID transactions in a single document and, when leveraging the document model appropriately, many applications don’t need ACID guarantees across multiple documents.

However, transactions are not just a check box. Transactions, like every MongoDB feature, aim to make developers’ lives easier. ACID guarantees across documents simplify the application logic needed to satisfy complex applications.

The value of MongoDB’s transactions

While MongoDB’s existing atomic single-document operations already provide transaction semantics that satisfy the majority of applications, the addition of multi-document ACID transactions makes it easier than ever for developers to address the full spectrum of use cases with MongoDB. Through snapshot isolation, transactions provide a consistent view of data and enforce all-or-nothing execution to maintain data integrity. For developers with a history of transactions in relational databases, MongoDB’s multi-document transactions are very familiar, making it straightforward to add them to any application that requires them.

In MongoDB 4.0, transactions work across a replica set, and MongoDB 4.2 will extend support to transactions across a sharded deployment* (see the Safe Harbor statement at the end of this blog). Our path to transactions represents a multi-year engineering effort, beginning back in early 2015 with the groundwork laid in almost every part of the server and the drivers. We are feature complete in bringing multi-document transactions to a replica set, and 90% done on implementing the remaining features needed to deliver transactions across a sharded cluster.

In this blog, we will explore why MongoDB has added multi-document ACID transactions, their design goals and implementation, characteristics of transactions, and best practices for developers. You can get started with MongoDB 4.0 now by spinning up your own fully managed, on-demand MongoDB Atlas cluster, or downloading it to run on your own infrastructure.

Why Multi-Document ACID Transactions

Since its first release in 2009, MongoDB has continuously innovated around a new approach to database design, freeing developers from the constraints of legacy relational databases. A design founded on rich, natural, and flexible documents accessed by idiomatic programming language APIs, enabling developers to build apps 3-5x faster. And a distributed systems architecture to handle more data, place it where users need it, and maintain always-on availability. This approach has enabled developers to create powerful and sophisticated applications in all industries, across a tremendously wide range of use cases.

Figure 1: Organizations innovating with MongoDB

With subdocuments and arrays, documents allow related data to be modeled in a single, rich and natural data structure, rather than spread across separate, related tables composed of flat rows and columns. As a result, MongoDB’s existing single document atomicity guarantees can meet the data integrity needs of most applications. In fact, when leveraging the richness and power of the document model, we estimate 80%-90% of applications don’t need multi-document transactions at all.

However, some developers and DBAs have been conditioned by 40 years of relational data modeling to assume multi-document transactions are a requirement for any database, irrespective of the data model they are built upon. Some are concerned that while multi-document transactions aren’t needed by their apps today, they might be in the future. And for some workloads, support for ACID transactions across multiple documents is required.

As a result, the addition of multi-document transactions makes it easier than ever for developers to address a complete range of use cases on MongoDB. For some, simply knowing that they are available will assure them that they can evolve their application as needed, and the database will support them.

Data Models and Transactions

Before looking at multi-document transactions in MongoDB, we want to explore why the data model used by a database impacts the scope of a transaction.

Relational Data Model

Relational databases model an entity’s data across multiple records and parent-child tables, and so a transaction needs to be scoped to span those records and tables. The example in Figure 2 shows a contact in our customer database, modeled in a relational schema. Data is normalized across multiple tables: customer, address, city, country, phone number, topics and associated interests.

Figure 2: Customer data modeled across separate tables in a relational database

In the event of the customer data changing in any way, for example if our contact moves to a new job, multiple tables will need to be updated in an “all-or-nothing” transaction, as illustrated in Figure 3.

Figure 3: Updating customer data in a relational database

Document Data Model

Document databases are different. Rather than break related data apart and spread it across multiple parent-child tables, documents can store related data together in a rich, typed, hierarchical structure, including subdocuments and arrays, as illustrated in Figure 4.

Figure 4: Customer data modeled in a single, rich document structure

MongoDB provides existing transactional properties scoped to the level of a document. As shown in Figure 5, one or more fields may be written in a single operation, with updates to multiple subdocuments and elements of any array, including nested arrays. The guarantees provided by MongoDB ensure complete isolation as a document is updated; any errors cause the operation to roll back so that clients receive a consistent view of the document. With this design, application owners get the same data integrity guarantees as those provided by older relational databases.

Figure 5: Updating customer data in a document database
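
To make that document-level guarantee concrete, here is a minimal PyMongo sketch; the customers collection, field names, and _id value are hypothetical, and it assumes a MongoDB deployment running on localhost:

    from pymongo import MongoClient

    client = MongoClient()                      # assumes mongod on localhost:27017
    customers = client.crm.customers            # hypothetical database and collection

    # A single update_one call can touch several fields, a subdocument, and an
    # array element; MongoDB applies the whole update atomically to the document.
    customers.update_one(
        {"_id": 123},
        {"$set": {
            "job.title": "VP Sales",            # field inside the 'job' subdocument
            "job.company": "Example Corp",
            "phones.0.number": "555-0100"       # first element of the 'phones' array
        }}
    )

If any part of the update fails, none of the fields are changed, which is exactly the single-document guarantee described above.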

Where are Multi-Document Transactions Useful

There are use cases where transactional ACID guarantees need to be applied to a set of operations that span multiple documents. Back office “System of Record” or “Line of Business” (LoB) applications are the typical class of workload where multi-document transactions are useful. Examples include:

  • Processing application events when users perform important actions -- for instance when updating the status of an account, say to delinquent, across all those users’ documents.
  • Logging custom application actions -- say when a user transfers ownership of an entity, the write should not be successful if the logging isn’t.
  • Many to many relationships where the data naturally fits into defined objects -- for example positions, calculated by an aggregate of hundreds of thousands of trades, need to be updated every time trades are added or modified.

MongoDB already serves these use cases today, but the introduction of multi-document transactions makes it easier as the database automatically handles multi-document transactions for you. Before their availability, the developer would need to programmatically implement transaction controls in their application. To ensure data integrity, they would need to ensure that all stages of the operation succeed before committing updates to the database, and if not, roll back any changes. This adds complexity that slows down the rate of application development. MongoDB customers in the financial services industry have reported they were able to cut 1,000+ lines of code from their apps by using multi-document transactions.

In addition, implementing client side transactions can impose performance overhead on the application. For example, after moving from its existing client-side transactional logic to multi-document transactions, a global enterprise data management and integration ISV experienced improved MongoDB performance in its Master Data Management solution: throughput increased by 90%, and latency was reduced by over 60% for transactions that performed six updates across two collections.

Multi-Document ACID Transactions in MongoDB

Transactions in MongoDB feel just like transactions developers are used to in relational databases. They are multi-statement, with similar syntax, making them familiar to anyone with prior transaction experience.

The following Python code snippet shows a sample of the transactions API.

    with client.start_session() as s:
        s.start_transaction()
        collection_one.insert_one(doc_one, session=s)
        collection_two.insert_one(doc_two, session=s)
        s.commit_transaction()

The following snippet shows the transactions API for Java.

    try (ClientSession clientSession = client.startSession()) {
        clientSession.startTransaction();
        collection.insertOne(clientSession, docOne);
        collection.insertOne(clientSession, docTwo);
        clientSession.commitTransaction();
    }

As these examples show, transactions use regular MongoDB query language syntax, and are implemented consistently whether the transaction is executed across documents and collections in a replica set or, with MongoDB 4.2, across a sharded cluster*.

We're excited to see MongoDB offer dedicated support for ACID transactions in their data platform and that our collaboration is manifest in the Lovelace release of Spring Data MongoDB. It ships with the well known Spring annotation-driven, synchronous transaction support using the MongoTransactionManager but also bits for reactive transactions built on top of MongoDB's ReactiveStreams driver and Project Reactor datatypes exposed via the ReactiveMongoTemplate.

Pieter Humphrey - Spring Product Marketing Lead, Pivotal

The transaction block code snippets below compare the MongoDB syntax with that used by MySQL. It shows how multi-document transactions feel familiar to anyone who has used traditional relational databases in the past.

MySQL

    db.start_transaction()
        cursor.execute(orderInsert, orderData)
        cursor.execute(stockUpdate, stockData)
    db.commit()

MongoDB

    s.start_transaction()
        orders.insert_one(order, session=s)
        stock.update_one(item, stockUpdate, session=s)
    s.commit_transaction()

Through snapshot isolation, transactions provide a consistent view of data, and enforce all-or-nothing execution to maintain data integrity. Transactions can apply to operations against multiple documents contained in one, or in many, collections and databases. The changes to MongoDB that enable multi-document transactions do not impact performance for workloads that don't require them.

During its execution, a transaction is able to read its own uncommitted writes, but none of its uncommitted writes will be seen by other operations outside of the transaction. Uncommitted writes are not replicated to secondary nodes until the transaction is committed to the database. Once the transaction has been committed, it is replicated and applied atomically to all secondary replicas.

An application specifies write concern in the transaction options to state how many nodes should commit the changes before the server acknowledges the success to the client. All uncommitted writes live on the primary exclusively.

Taking advantage of the transactions infrastructure introduced in MongoDB 4.0, the new snapshot read concern ensures queries and aggregations executed within a read-only transaction will operate against a single, isolated snapshot on the primary replica. As a result, a consistent view of the data is returned to the client, irrespective of whether that data is being simultaneously modified by concurrent operations. Snapshot reads are especially useful for operations that return data in batches with the getMore command.
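
A minimal sketch of how these options can be expressed with the Python driver follows; the database, collection, and document are hypothetical, and it assumes MongoDB 4.0 with a recent PyMongo (3.7+):

    from pymongo import MongoClient
    from pymongo.read_concern import ReadConcern
    from pymongo.write_concern import WriteConcern

    client = MongoClient()
    orders = client.shop.orders                 # hypothetical collection

    with client.start_session() as s:
        # 'snapshot' read concern gives the transaction one isolated snapshot;
        # 'majority' write concern acknowledges once a majority of nodes commit.
        s.start_transaction(read_concern=ReadConcern("snapshot"),
                            write_concern=WriteConcern("majority"))
        orders.insert_one({"item": "abc", "qty": 1}, session=s)
        s.commit_transaction()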

Even before MongoDB 4.0, typical MongoDB queries leveraged WiredTiger snapshots. The distinction is that snapshot reads in transactions use the same snapshot throughout the duration of the query, whereas typical MongoDB queries may switch to a more current snapshot during yield points.

Snapshot Isolation and Write Conflicts

When modifying a document, a transaction locks the document from additional changes until the transaction completes. If a transaction is unable to obtain a lock on a document it is attempting to modify, likely because another transaction is already holding the lock, it will immediately abort after 5ms with a write conflict.

However, if a typical MongoDB write outside of a multi-document transaction tries to modify a document that is currently locked by a multi-document transaction, that write will block until the transaction completes. The typical MongoDB write will be retried with backoff logic until $maxTimeMS is reached. Even prior to 4.0, all writes were represented as WiredTiger transactions at the storage layer, so write conflicts were already possible; the same retry logic was implemented to avoid users having to react to write conflicts client-side.

Reads do not require the same locks that document modifications do. Documents that have uncommitted writes by a transaction can still be read by other operations, and of course those operations will only see the committed values, and not the uncommitted state.

Only writes trigger write conflicts within MongoDB. Reads, in line with the expected behavior of snapshot isolation, do not prevent a document from being modified by other operations. Changes will not be surfaced as write conflicts unless a write is performed on the document. Additionally, no-op updates – like setting a field to the same value that it already had – can be optimized away before reaching the storage engine, thus not triggering a write conflict. To guarantee that a write conflict is detected, perform an operation like incrementing a counter.
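
For instance, a small sketch of the counter-increment pattern mentioned above; the accounts collection and version field are hypothetical:

    from pymongo import MongoClient

    client = MongoClient()
    accounts = client.bank.accounts             # hypothetical collection

    with client.start_session() as s:
        s.start_transaction()
        # Incrementing a counter always reaches the storage engine, so any
        # concurrent write to this document will surface as a write conflict.
        accounts.update_one({"_id": 42}, {"$inc": {"version": 1}}, session=s)
        s.commit_transaction()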

Retrying Transactions

MongoDB 4.0 introduces the concept of error labels. The transient transaction error label notifies listening applications that the error surfaced, whether a network error or a write conflict, may be temporary and that the transaction is safe to retry from the beginning. Permanent errors, like parsing errors, will not carry the transient transaction error label, as rerunning the transaction will never result in a successful commit.

One of the core values of MongoDB is its highly available architecture. These error labels make it easy for applications to be resilient in the face of network blips or node failures, enabling cross-document transactional guarantees without sacrificing use cases that must be always-on.
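
As a hedged sketch of the retry pattern in Python, the snippet below retries a whole transaction when the server attaches the TransientTransactionError label; the txn_func callback is hypothetical and is expected to start and commit the transaction itself:

    from pymongo.errors import PyMongoError

    def run_transaction_with_retry(txn_func, session):
        """Retry txn_func until it commits or a permanent error is raised."""
        while True:
            try:
                txn_func(session)               # starts and commits the transaction
                return
            except PyMongoError as exc:
                if exc.has_error_label("TransientTransactionError"):
                    continue                    # safe to retry from the beginning
                raise                           # permanent errors go to the caller

A similar loop around commit_transaction() can check for the UnknownTransactionCommitResult label, which indicates that the commit itself is safe to retry.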

Transactions Best Practices

As noted earlier, MongoDB’s single document atomicity guarantees will meet 80-90% of an application’s transactional needs. They remain the recommended way of enforcing your app’s data integrity requirements. For those operations that do require multi-document transactions, there are several best practices that developers should observe.

Creating long running transactions, or attempting to perform an excessive number of operations in a single ACID transaction can result in high pressure on WiredTiger’s cache. This is because the cache must maintain state for all subsequent writes since the oldest snapshot was created. As a transaction uses the same snapshot while it is running, new writes accumulate in the cache throughout the duration of the transaction. These writes cannot be flushed until transactions currently running on old snapshots commit or abort, at which time the transactions release their locks and WiredTiger can evict the snapshot. To maintain predictable levels of database performance, developers should therefore consider the following:

  1. By default, MongoDB will automatically abort any multi-document transaction that runs for more than 60 seconds. Note that if write volumes to the server are low, you have the flexibility to tune your transactions for a longer execution time. To address timeouts, the transaction should be broken into smaller parts that allow execution within the configured time limit. You should also ensure your query patterns are properly optimized with the appropriate index coverage to allow fast data access within the transaction.
  2. There are no hard limits to the number of documents that can be read within a transaction. As a best practice, no more than 1,000 documents should be modified within a transaction. For operations that need to modify more than 1,000 documents, developers should break the transaction into separate parts that process documents in batches (a sketch of this pattern follows this list).
  3. In MongoDB 4.0, a transaction is represented in a single oplog entry, therefore must be within the 16MB document size limit. While an update operation only stores the deltas of the update (i.e., what has changed), an insert will store the entire document. As a result, the combination of oplog descriptions for all statements in the transaction must be less than 16MB. If this limit is exceeded, the transaction will be aborted and fully rolled back. The transaction should therefore be decomposed into a smaller set of operations that can be represented in 16MB or less.
  4. When a transaction aborts, an exception is returned to the driver and the transaction is fully rolled back. Developers should add application logic that can catch and retry a transaction that aborts due to temporary exceptions, such as a transient network failure or a primary replica election. With retryable writes, the MongoDB drivers will automatically retry the commit statement of the transaction.
  5. DDL operations, like creating an index or dropping a database, block behind active transactions running on the namespace. Any transaction that newly tries to access the namespace while a DDL operation is pending will be unable to obtain the lock and will abort.
  6. You can review all best practices in the MongoDB documentation for multi-document transactions.
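
As referenced in point 2 above, a minimal batching sketch follows; the events collection, the processed flag, and the batch size are hypothetical:

    from pymongo import MongoClient

    client = MongoClient()
    events = client.app.events                  # hypothetical collection
    BATCH_SIZE = 500                            # comfortably under the ~1,000-document guideline

    # Modify matching documents in small transactional batches rather than
    # inside one large transaction.
    while True:
        ids = [d["_id"] for d in
               events.find({"processed": False}, {"_id": 1}).limit(BATCH_SIZE)]
        if not ids:
            break
        with client.start_session() as s:
            s.start_transaction()
            events.update_many({"_id": {"$in": ids}},
                               {"$set": {"processed": True}}, session=s)
            s.commit_transaction()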

Transactions and Their Impact on Data Modeling in MongoDB

Adding transactions does not make MongoDB a relational database – many developers have already experienced that the document model is superior to the relational one today.

All best practices relating to MongoDB data modeling continue to apply when using features such as multi-document transactions, or fully expressive JOINs (via the $lookup aggregation pipeline stage). Where practical, all data relating to an entity should be stored in a single, rich document structure. Just moving data structured for relational tables into MongoDB will not allow users to take advantage of MongoDB’s natural, fast, and flexible document model, or its distributed systems architecture.

The RDBMS to MongoDB Migration Guide describes the best practices for moving an application from a relational database to MongoDB.

The Path to Transactions

Our path to transactions represents a multi-year engineering effort, beginning over 3 years ago with the integration of the WiredTiger storage engine. We’ve laid the groundwork in practically every part of the platform – from the storage layer itself to the replication consensus protocol, to the sharding architecture. We’ve built out fine-grained consistency and durability guarantees, introduced a global logical clock, refactored cluster metadata management, and more. And we’ve exposed all of these enhancements through APIs that are fully consumable by our drivers. We are feature complete in bringing multi-document transactions to a replica set, and 90% done on implementing the remaining features needed to deliver transactions across a sharded cluster.

Figure 6 presents a timeline of the key engineering projects that have enabled multi-document transactions in MongoDB, with status shown as of June 2018. The key design goal underlying all of these projects is that their implementation does not sacrifice the key benefits of MongoDB – the power of the document model and the advantages of distributed systems – while imposing no performance impact on workloads that don’t require multi-document transactions.

Figure 6: The path to transactions – multi-year engineering investment, delivered across multiple releases

Conclusion

MongoDB has already established itself as the leading database for modern applications. The document data model is rich, natural, and flexible, with documents accessed by idiomatic drivers, enabling developers to build apps 3-5x faster. Its distributed systems architecture enables you to handle more data, place it where users need it, and maintain always-on availability. MongoDB’s existing atomic single-document operations provide transaction semantics that meet the data integrity needs of the majority of applications. The addition of multi-document ACID transactions in MongoDB 4.0 makes it easier than ever for developers to address a complete range of use cases, while for many, simply knowing that they are available will provide critical peace of mind.

Take a look at our multi-document transactions web page where you can hear directly from the MongoDB engineers who have built transactions, review code snippets, and access key resources to get started. You can get started with MongoDB 4.0 now by spinning up your own fully managed, on-demand MongoDB Atlas cluster, or downloading it to run on your own infrastructure.

If you want to learn more about everything that’s new in MongoDB 4.0, download our Guide.

Transactional guarantees have been a critical feature for relational databases for decades, but have typically been absent from non-relational alternatives, which has meant that users have been forced to choose between transactions and the flexibility and versatility that non-relational databases offer. With its support for multi-document ACID transactions, MongoDB is built for customers that want to have their cake and eat it too.

Stephen O’Grady, Principal Analyst, Redmonk

*Safe Harbor Statement

This blog contains “forward-looking statements” within the meaning of Section 27A of the Securities Act of 1933, as amended, and Section 21E of the Securities Exchange Act of 1934, as amended. Such forward-looking statements are subject to a number of risks, uncertainties, assumptions and other factors that could cause actual results and the timing of certain events to differ materially from future results expressed or implied by the forward-looking statements. Factors that could cause or contribute to such differences include, but are not limited to, those identified in our filings with the Securities and Exchange Commission. You should not rely upon forward-looking statements as predictions of future events. Furthermore, such forward-looking statements speak only as of the date of this presentation.

In particular, the development, release, and timing of any features or functionality described for MongoDB products remains at MongoDB’s sole discretion. This information is merely intended to outline our general product direction and it should not be relied on in making a purchasing decision nor is this a commitment, promise or legal obligation to deliver any material, code, or functionality. Except as required by law, we undertake no obligation to update any forward-looking statements to reflect events or circumstances after the date of such statements.

Scaling Your Replica Set: Non-Blocking Secondary Reads in MongoDB 4.0

MongoDB 4.0 adds the ability to read from secondaries while replication is simultaneously processing writes. To see why this is new and important let's look at secondary read behavior in versions prior to 4.0.

Background

From the outset MongoDB has been designed so that when you have sequences of writes on the primary, each of the secondary nodes must show the writes in the same order. If you change field "A" in a document and then change field "B", it is not possible to see that document with changed field "B" and not changed field "A". Eventually consistent systems allow you to see it, but MongoDB does not, and never has.

On secondary nodes, we apply writes in batches, because applying them sequentially would likely cause secondaries to fall behind the primary. When writes are applied in batches, we must block reads so that applications cannot see data applied in the "wrong" order. This is why, when reading from secondaries, readers periodically have to wait for replication batches to be applied. The heavier the write load, the more likely your secondary reads will hit these occasional "pauses", impacting your latency metrics. Given that applications frequently use secondary reads to reduce query latency (for example, with the "nearest" readPreference), having to wait for replication batches to be applied defeats the goal of getting the lowest possible latency on your reads.
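
For reference, here is a minimal sketch of how an application opts into secondary reads with the Python driver; the connection string, database, and collection names are hypothetical:

    from pymongo import MongoClient, ReadPreference

    # Route queries to the member with the lowest network latency, whether
    # that is the primary or a secondary.
    client = MongoClient("mongodb://host1,host2,host3/?replicaSet=rs",
                         readPreference="nearest")

    # The preference can also be set per collection.
    inventory = client.shop.get_collection("inventory",
                                           read_preference=ReadPreference.NEAREST)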

In addition to readers having to wait for replication batch writes to finish, the writing of batches needs a lock that requires all reads to complete before it can be taken. That means that in the presence of a high number of reads, the replication writes can start lagging – an issue that is compounded when chain replication is enabled.

What was our goal in MongoDB 4.0?

Our goal was to allow reads during oplog application to decrease read latency and secondary lag, and increase maximum throughput of the replica set. For replica sets with a high write load, not having to wait for readers between applying oplog batches allows for lower lag and quicker confirmation of majority writes, resulting in less cache pressure on the primary and better performance overall.

How did we do it?

Starting with MongoDB 4.0, we took advantage of the fact that we had implemented support for timestamps in the storage engine, which allows transactions to get a consistent view of data at a specific "cluster time". For more details about this see the video: WiredTiger timestamps.

Secondary reads can now also take advantage of these snapshots by reading from the latest consistent snapshot prior to the current replication batch that is being applied. Reading from that snapshot guarantees a consistent view of the data, and since applying the current replication batch doesn't change these earlier records, we can now relax the replication lock and allow all of these secondary reads at the same time the writes are happening.

How much difference does this make?

A lot! The throughput improvement ranges from none (if you were not impacted by the replication lock, that is, your write load is relatively low) to 2x.

Most importantly, this improves latency for secondary reads. For those who use readPreference "nearest" because they want to reduce latency from the application to the database, this feature means their latency in the database will also be as low as possible. We saw significant improvement in 95th and 99th percentile latency in these tests.

95th percentile read latency (ms):

| Thread levels | 8 | 16 | 32 | 64 |
| --- | --- | --- | --- | --- |
| Feature off | 1 | 2 | 3 | 5 |
| Feature on | 0 | 1 | 1 | 0 |

Best part of this new feature? You don't need to do anything to enable it or opt in to it. All secondary reads in 4.0 will read from a snapshot without waiting for replication writes.

This is just one of a number of great new features coming in MongoDB 4.0. Take a look at our blog on the 4.0 release candidate to learn more. And don’t forget, you’ve still got time to register for MongoDB World where you can meet with the engineers who are building all of these great new features.