GIANT Stories at MongoDB

Java and MongoDB 4.0 Support for Multi-Document ACID Transactions


MongoDB 4.0 adds support for multi-document ACID transactions.

But wait... Does that mean MongoDB did not support transactions until now? No, actually MongoDB has always supported transactions in the form of single document transactions. MongoDB 4.0 extends these transactional guarantees across multiple documents, multiple statements, multiple collections, and multiple databases. What good would a database be without any form of transactional data integrity guarantee?

Before we dive into this blog post, you can find all the code and try multi-document ACID transactions here.

Quick start

Step 1: Start MongoDB

Start a single node MongoDB ReplicaSet in version 4.0.0 minimum on localhost, port 27017.

If you use Docker:

  • You can use
  • When you are done, you can use
  • If you want to connect to MongoDB with the Mongo Shell, you can use

If you prefer to start mongod manually:

  • mkdir /tmp/data && mongod --dbpath /tmp/data --replSet rs
  • mongo --eval 'rs.initiate()'

Step 2: Start Java

This demo contains two main programs: and

  • Change Steams allow you to be notified of any data changes within a MongoDB collection or database.
  • The Transaction process is the demo itself.

You need two shells to run them.

If you use Docker:

First shell:


Second shell:


If you do not use Docker, you will need to install Maven 3.5.X and a JDK 10 (or JDK 8 minimum but you will need to update the Java versions in the pom.xml):

First shell:


Second shell:


Let’s compare our existing single document transactions with MongoDB 4.0’s ACID compliant multi-document transactions and see how we can leverage this new feature with Java.

Prior to MongoDB 4.0

Even in MongoDB 3.6 and earlier, every write operation is represented as a transaction scoped to the level of an individual document in the storage layer. Because the document model brings together related data that would otherwise be modeled across separate parent-child tables in a tabular schema, MongoDB’s atomic single-document operations provide transaction semantics that meet the data integrity needs of the majority of applications.

Every typical write operation modifying multiple documents actually happens in several independent transactions: one for each document.

Let’s take an example with a very simple stock management application.

First of all, I need a MongoDB Replica Set so please follow the instructions given above to start MongoDB.

Now let’s insert the following documents into a product collection:

MongoDB Enterprise rs:PRIMARY> db.product.insertMany([
    { "_id" : "beer", "price" : NumberDecimal("3.75"), "stock" : NumberInt(5) }, 
    { "_id" : "wine", "price" : NumberDecimal("7.5"), "stock" : NumberInt(3) }

Let’s imagine there is a sale on and we want to offer our customers a 20% discount on all our products.

But before applying this discount, we want to monitor when these operations are happening in MongoDB with Change Streams.

Execute the following in Mongo Shell:

cursor =[{$match: {operationType: "update"}}]);
while (!cursor.isExhausted()) {
  if (cursor.hasNext()) {

Keep this shell on the side, open another Mongo Shell and apply the discount:

PRIMARY> db.product.updateMany({}, {$mul: {price:0.8}})
{ "acknowledged" : true, "matchedCount" : 2, "modifiedCount" : 2 }
PRIMARY> db.product.find().pretty()
    "_id" : "beer",
    "price" : NumberDecimal("3.00000000000000000"),
    "stock" : 5
    "_id" : "wine",
    "price" : NumberDecimal("6.0000000000000000"),
    "stock" : 3

As you can see, both documents were updated with a single command line but not in a single transaction. Here is what we can see in the Change Stream shell:

    "_id" : {
        "_data" : "825B4637290000000129295A1004374DC58C611E4C8DA4E5EDE9CF309AC5463C5F6964003C62656572000004"
    "operationType" : "update",
    "clusterTime" : Timestamp(1531328297, 1),
    "ns" : {
        "db" : "test",
        "coll" : "product"
    "documentKey" : {
        "_id" : "beer"
    "updateDescription" : {
        "updatedFields" : {
            "price" : NumberDecimal("3.00000000000000000")
        "removedFields" : [ ]
    "_id" : {
        "_data" : "825B4637290000000229295A1004374DC58C611E4C8DA4E5EDE9CF309AC5463C5F6964003C77696E65000004"
    "operationType" : "update",
    "clusterTime" : Timestamp(1531328297, 2),
    "ns" : {
        "db" : "test",
        "coll" : "product"
    "documentKey" : {
        "_id" : "wine"
    "updateDescription" : {
        "updatedFields" : {
            "price" : NumberDecimal("6.0000000000000000")
        "removedFields" : [ ]

As you can see the cluster times (see the clusterTime key) of the two operations are different: the operations occurred during the same second but the counter of the timestamp has been incremented by one.

Thus here each document is updated one at a time and even if this happens really fast, someone else could read the documents while the update is running and see only one of the two products with the discount.

Most of the time, it is something you can tolerate in your MongoDB database because, as much as possible, we try to embed tightly linked, or related data in the same document. As a result, two updates on the same document happen within a single transaction :

PRIMARY> db.product.update({_id: "wine"},{$inc: {stock:1}, $set: {description : "It’s the best wine on Earth"}})
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
PRIMARY> db.product.findOne({_id: "wine"})
    "_id" : "wine",
    "price" : NumberDecimal("6.0000000000000000"),
    "stock" : 4,
    "description" : "It’s the best wine on Earth"

However, sometimes, you cannot model all of your related data in a single document, and there are a lot of valid reasons for choosing not to embed documents.

MongoDB 4.0 with multi-document ACID transactions

Multi-document ACID transactions in MongoDB are very similar to what you probably already know from traditional relational databases.

MongoDB’s transactions are a conversational set of related operations that must atomically commit or fully rollback with all-or-nothing execution.

Transactions are used to make sure operations are atomic even across multiple collections or databases. Thus, with snapshot isolation reads, another user can only see all the operations or none of them.

Let’s now add a shopping cart to our example.

For this example, 2 collections are required because we are dealing with 2 different business entities: the stock management and the shopping cart each client can create during shopping. The lifecycle of each document in these collections is different.

A document in the product collection represents an item I’m selling. This contains the current price of the product and the current stock. I created a POJO to represent it :

{ "_id" : "beer", "price" : NumberDecimal("3"), "stock" : NumberInt(5) }

A shopping cart is created when a client adds its first item in the cart and is removed when the client proceeds to checkout or leaves the website. I created a POJO to represent it :

    "_id" : "Alice",
    "items" : [
            "price" : NumberDecimal("3"),
            "productId" : "beer",
            "quantity" : NumberInt(2)

The challenge here resides in the fact that I cannot sell more than I possess: if I have 5 beers to sell, I cannot have more than 5 beers distributed across the different client carts.

To ensure that, I have to make sure that the operation creating or updating the client cart is atomic with the stock update. That’s where the multi-document transaction comes into play. The transaction must fail in the case someone tries to buy something I do not have in my stock. I will add a constraint on the product stock:

db.createCollection("product", {
   validator: {
      $jsonSchema: {
         bsonType: "object",
         required: [ "_id", "price", "stock" ],
         properties: {
            _id: {
               bsonType: "string",
               description: "must be a string and is required"
            price: {
               bsonType: "decimal",
               minimum: 0,
               description: "must be a positive decimal and is required"
            stock: {
               bsonType: "int",
               minimum: 0,
               description: "must be a positive integer and is required"

Node that this is already included in the Java code.

To monitor our example, we are going to use MongoDB Change Streams that were introduced in MongoDB 3.6.

In each of the threads of this process called, I am going to monitor one of the 2 collections and print each operation with its associated cluster time.

// package and imports

public class ChangeStreams {

    private static final Bson filterUpdate = Filters.eq("operationType", "update");
    private static final Bson filterInsertUpdate ="operationType", "insert", "update");
    private static final String jsonSchema = "{ $jsonSchema: { bsonType: \"object\", required: [ \"_id\", \"price\", \"stock\" ], properties: { _id: { bsonType: \"string\", description: \"must be a string and is required\" }, price: { bsonType: \"decimal\", minimum: 0, description: \"must be a positive decimal and is required\" }, stock: { bsonType: \"int\", minimum: 0, description: \"must be a positive integer and is required\" } } } } ";

    public static void main(String[] args) {
        MongoDatabase db = initMongoDB(args[0]);
        MongoCollection<Cart> cartCollection = db.getCollection("cart", Cart.class);
        MongoCollection<Product> productCollection = db.getCollection("product", Product.class);
        ExecutorService executor = Executors.newFixedThreadPool(2);
        executor.submit(() -> watchChangeStream(productCollection, filterUpdate));
        executor.submit(() -> watchChangeStream(cartCollection, filterInsertUpdate));
        ScheduledExecutorService scheduled = Executors.newSingleThreadScheduledExecutor();
        scheduled.scheduleWithFixedDelay(System.out::println, 0, 1, TimeUnit.SECONDS);

    private static void watchChangeStream(MongoCollection<?> collection, Bson filter) {
        System.out.println("Watching " + collection.getNamespace());
        List<Bson> pipeline = Collections.singletonList(Aggregates.match(filter));
                  .forEach((Consumer<ChangeStreamDocument<?>>) doc -> System.out.println(
                          doc.getClusterTime() + " => " + doc.getFullDocument()));

    private static MongoDatabase initMongoDB(String mongodbURI) {
        CodecRegistry providers = fromProviders(PojoCodecProvider.builder().register("com.mongodb.models").build());
        CodecRegistry codecRegistry = fromRegistries(MongoClient.getDefaultCodecRegistry(), providers);
        MongoClientOptions.Builder options = new MongoClientOptions.Builder().codecRegistry(codecRegistry);
        MongoClientURI uri = new MongoClientURI(mongodbURI, options);
        MongoClient client = new MongoClient(uri);
        MongoDatabase db = client.getDatabase("test");
        db.createCollection("product", productJsonSchemaValidator());
        return db;

    private static CreateCollectionOptions productJsonSchemaValidator() {
        return new CreateCollectionOptions().validationOptions(
                new ValidationOptions().validationAction(ValidationAction.ERROR).validator(BsonDocument.parse(jsonSchema)));

In this example we have 5 beers to sell. Alice wants to buy 2 beers but we are not going to use the new MongoDB 4.0 multi-document transactions for this. We will observe in the change streams two operations : one creating the cart and one updating the stock at 2 different cluster times.

Then Alice adds 2 more beers in her cart and we are going to use a transaction this time. The result in the change stream will be 2 operations happening at the same cluster time.

Finally, she will try to order 2 extra beers but the jsonSchema validator will fail the product update and result in a rollback. We will not see anything in the change stream. Here is the source code:

// package and import

public class Transactions {

    private static MongoClient client;
    private static MongoCollection<Cart> cartCollection;
    private static MongoCollection<Product> productCollection;

    private final BigDecimal BEER_PRICE = BigDecimal.valueOf(3);
    private final String BEER_ID = "beer";

    private final Bson stockUpdate = inc("stock", -2);
    private final Bson filterId = eq("_id", BEER_ID);
    private final Bson filterAlice = eq("_id", "Alice");
    private final Bson matchBeer = elemMatch("items", eq("productId", "beer"));
    private final Bson incrementBeers = inc("items.$.quantity", 2);

    public static void main(String[] args) {
        new Transactions().demo();

    private static void initMongoDB(String mongodbURI) {
        CodecRegistry codecRegistry = fromRegistries(MongoClient.getDefaultCodecRegistry(), fromProviders(
        MongoClientOptions.Builder options = new MongoClientOptions.Builder().codecRegistry(codecRegistry);
        MongoClientURI uri = new MongoClientURI(mongodbURI, options);
        client = new MongoClient(uri);
        MongoDatabase db = client.getDatabase("test");
        cartCollection = db.getCollection("cart", Cart.class);
        productCollection = db.getCollection("product", Product.class);

    private void demo() {
        System.out.println("#########  NO  TRANSACTION #########");
        System.out.println("Alice wants 2 beers.");
        System.out.println("We have to create a cart in the 'cart' collection and update the stock in the 'product' collection.");
        System.out.println("The 2 actions are correlated but can not be executed on the same cluster time.");
        System.out.println("Any error blocking one operation could result in stock error or beer sale we don't own.");
        System.out.println("\n######### WITH TRANSACTION #########");
        System.out.println("Alice wants 2 extra beers.");
        System.out.println("Now we can update the 2 collections simultaneously.");
        System.out.println("The 2 operations only happen when the transaction is committed.");
        System.out.println("\n######### WITH TRANSACTION #########");
        System.out.println("Alice wants 2 extra beers.");
        System.out.println("This time we do not have enough beers in stock so the transaction will rollback.");

    private void aliceWantsTwoExtraBeersInTransactionThenCommitOrRollback() {
        ClientSession session = client.startSession();
        try {
        } catch (MongoCommandException e) {
            System.out.println("####### ROLLBACK TRANSACTION #######");
        } finally {

    private void removingBeersFromStock() {
        System.out.println("Trying to update beer stock : -2 beers.");
        try {
            productCollection.updateOne(filterId, stockUpdate);
        } catch (MongoCommandException e) {
            System.out.println("#####   MongoCommandException  #####");
            System.out.println("##### STOCK CANNOT BE NEGATIVE #####");
            throw e;

    private void removingBeerFromStock(ClientSession session) {
        System.out.println("Trying to update beer stock : -2 beers.");
        try {
            productCollection.updateOne(session, filterId, stockUpdate);
        } catch (MongoCommandException e) {
            System.out.println("#####   MongoCommandException  #####");
            System.out.println("##### STOCK CANNOT BE NEGATIVE #####");
            throw e;

    private void aliceWantsTwoBeers() {
        System.out.println("Alice adds 2 beers in her cart.");
        cartCollection.insertOne(new Cart("Alice", Collections.singletonList(new Cart.Item(BEER_ID, 2, BEER_PRICE))));

    private void aliceWantsTwoExtraBeers(ClientSession session) {
        System.out.println("Updating Alice cart : adding 2 beers.");
        cartCollection.updateOne(session, and(filterAlice, matchBeer), incrementBeers);

    private void insertProductBeer() {
        productCollection.insertOne(new Product(BEER_ID, 5, BEER_PRICE));

    private void clearCollections() {
        productCollection.deleteMany(new BsonDocument());
        cartCollection.deleteMany(new BsonDocument());

    private void printDatabaseState() {
        System.out.println("Database state:");
        printProducts(productCollection.find().into(new ArrayList<>()));
        printCarts(cartCollection.find().into(new ArrayList<>()));

    private void printProducts(List<Product> products) {

    private void printCarts(List<Cart> carts) {
        if (carts.isEmpty())
            System.out.println("No carts...");

    private void sleep() {
        System.out.println("Sleeping 3 seconds...");
        try {
        } catch (InterruptedException e) {

Here is the console of the Change Stream :

$ ./ 

Watching test.cart
Watching test.product

Timestamp{value=6570052721557110786, seconds=1529709604, inc=2} => Cart{id='Alice', items=[Item{productId=beer, quantity=2, price=3}]}

Timestamp{value=6570052734442012673, seconds=1529709607, inc=1} => Product{id='beer', stock=3, price=3}

Timestamp{value=6570052764506783745, seconds=1529709614, inc=1} => Product{id='beer', stock=1, price=3}
Timestamp{value=6570052764506783745, seconds=1529709614, inc=1} => Cart{id='Alice', items=[Item{productId=beer, quantity=4, price=3}]}

As you can see here, we only get four operations because the two last operations were never committed to the database, and therefore the change stream has nothing to show.

You can also note that the two first cluster times are different because we did not use a transaction for the two first operations, and the two last operations share the same cluster time because we used the new MongoDB 4.0 multi-document transaction system, and thus they are atomic.

Here is the console of the Transaction java process that sum up everything I said earlier.

$ ./ 
Database state:
Product{id='beer', stock=5, price=3}
No carts...

#########  NO  TRANSACTION #########
Alice wants 2 beers.
We have to create a cart in the 'cart' collection and update the stock in the 'product' collection.
The 2 actions are correlated but can not be executed on the same cluster time.
Any error blocking one operation could result in stock error or a sale of beer that we can’t fulfill as we have no stock.
Alice adds 2 beers in her cart.
Sleeping 3 seconds...
Trying to update beer stock : -2 beers.

Database state:
Product{id='beer', stock=3, price=3}
Cart{id='Alice', items=[Item{productId=beer, quantity=2, price=3}]}

Sleeping 3 seconds...

######### WITH TRANSACTION #########
Alice wants 2 extra beers.
Now we can update the 2 collections simultaneously.
The 2 operations only happen when the transaction is committed.
Updating Alice cart : adding 2 beers.
Sleeping 3 seconds...
Trying to update beer stock : -2 beers.

Database state:
Product{id='beer', stock=1, price=3}
Cart{id='Alice', items=[Item{productId=beer, quantity=4, price=3}]}

Sleeping 3 seconds...

######### WITH TRANSACTION #########
Alice wants 2 extra beers.
This time we do not have enough beers in stock so the transaction will rollback.
Updating Alice cart : adding 2 beers.
Sleeping 3 seconds...
Trying to update beer stock : -2 beers.
#####   MongoCommandException  #####

Database state:
Product{id='beer', stock=1, price=3}
Cart{id='Alice', items=[Item{productId=beer, quantity=4, price=3}]}

Next Steps

Thanks for taking the time to read my post - I hope you found it useful and interesting. As a reminder, all the code is available on this Github repository for you to experiment.

If you are looking for a very simple way to get started with MongoDB, you can do that in just 5 clicks on our MongoDB Atlas database service in the cloud.

Also, multi-document ACID transactions is not the only new feature in MongoDB 4.0, so feel free to take a look at our free course on MongoDB University M040: New Features and Tools in MongoDB 4.0 and our guide to what’s new in MongoDB 4.0 where you can learn more about native type conversions, new visualization and analytics tools, and Kubernetes integration.

The MongoDB Summer ‘18 Intern Series: Building Global Influence

Andrea Dooley
August 14, 2018

Stef Wang is a MongoDB Summer ‘18 international intern. Originally from Taiwan, Stef had never visited the U.S. before deciding to attend the University of Pennsylvania, where she is currently a rising senior pursuing a major in computer and cognitive science and a minor in consumer psychology.

As an advocate for women in STEM, she is passionate about working to help change the underrepresentation of women in tech. During her internship, Stef took some time away to attend the Global Leadership Organization annual conference in Taiwan, which is dedicated to teaching leadership curriculum to empower youth to engage in social impact work.

Andrea Dooley: What made you want to come to the US, despite the fact you had never been there prior?
Stef Wang: I always knew I wanted to go abroad for university. The school I attended in Taiwan was tailored for students to obtain their degree either in the US or the UK, and I applied to over 20 schools in both regions. I was particularly interested in UPenn because of their Science Technology and Society program, which explores the impact of technology on society.

AD: Since that is a social science, how did you wind up majoring in computer science?
SW: I worked on CS in high school. When I went to college I found that women are underrepresented in STEM, and I wanted to challenge that. I found computer science to be very interesting and figured the best way to challenge the underrepresentation was for me to represent. It’s an issue I care about and I wanted to be able to encourage women to pursue CS as their major and stick to it.

AD: Was there a particular experience that made you this passionate about encouraging women into the CS field?
SW: I previously interned at a very big technology company, and the lack of female representation impacted me. It was in Taiwan, so the office was smaller in general, but out of 70+ engineers, three were female. I didn’t feel it was an optimal work environment because of subconscious judgements and assumptions. It’s very different than what my experience has been like at MongoDB. There are women in this [NYC] office in high positions such as engineering team leads, and a lot of women in engineering. It’s something I really appreciate.

AD: How does your passion to help bring change align to the conference you attended in Taiwan?
SW: I’ve been involved in the Global Leadership Organization since junior high school, and the conference is hosted regularly in both the US and Taiwan. GLO’s mission is to empower people with the leadership skills they need to bring about the changes they want to see in the world. I’m in charge of overseeing GLO operations in Asia, and help to teach the curriculum that Teddy Liaw developed, which helps to inspire people to become change agents. It’s empowered me to believe I can make a change. One day I’d like to create a nonprofit for women in STEM.

AD: How did you first learn about MongoDB?
SW: MongoDB often attends PennApps, which are hackathons hosted every semester, so I had the opportunity to speak to a few people and learn more about the company. A current intern referred me to the internship program while he was going through the process. What appealed to me most was the interview process and how different it was from any other process I had gone through before.

AD: How was the process different from what you’ve experienced before?
SW: My recruiter got to know me as a person, instead of just focusing on what was on my resume. She was also more technical than any other recruiter I had worked with, so I was able to thoroughly explain my background. I felt like it was less about attempting to confirm what I had on my resume, but instead understanding what I was passionate about and what I was looking to get out of the experience.

AD: What are you working on this summer?
SW: I’m working on the Cloud team, working on two projects: optimizing upgrades and downgrades for anyone who is using MongoDB Atlas on Google Cloud or Microsoft Azure, and also implementing Atlas support for encryption at rest with multiple cloud providers, which would allow customers to manage data encryption using Azure Key Value and Google Cloud KMS credentials. Atlas is a core MongoDB product, and really important to the overall company mission. I’ve really learned a lot over the course of the summer. In the beginning, there was a very steep learning curve, and a huge code base that we had to look through in order to complete a lot of our initial tickets, but after a few weeks our workflow became a lot more efficient.

AD: What’s it like to work on a product that directly impacts business success?
SW: The Atlas team is super fun. It’s really exciting to hear people be so enthusiastic about their work and have so much passion about what they do on a day to day basis, be willing to talk about what they’re working on, and be open to your opinions. It’s also great to see that the work we are doing is directly impacting our customers and the work they are doing. They don’t have to worry about maintaining a database and the underlying infrastructure needed to support it and instead can spend more time and energy focusing on work that matters to them. As an intern I’m able to help people do more with their data, and ultimately achieve their mission. Since we work in sprints, some of the features we've worked on have already been released and are currently being used by our customers. I appreciate that we get to be a part of how things happen. There’s a lot of collaboration but I’m also able to say, “I owned this.”

AD: What has your overall experience been like as a MongoDB intern this summer?
SW: My mentor has been an amazing manager, which was surprising to me but in a good way. His sole responsibility is to mentor the Atlas Intern team, which consists of me and two other interns. In the past, I found it frustrating if a mentor was working on something different, since I was more hesitant to ask questions in fear I would distract them from more important work. Here, my mentor is working on what I’m working on and always reiterates that I should be asking questions. It has helped me become a better coder and produce better work. I also appreciate how much emphasis MongoDB puts on feedback, both giving and receiving. It helps create a culture where everyone is always striving to be better.

To learn more about the MongoDB internship program, click here.

View the MongoDB World 2018 sessions

Danielle James
August 14, 2018

MongoDB World 2018 drew MongoDB community members from across the globe. At the event, attendees were able to meet with experts, connect with one another, and have a good time at the daily closing receptions.

If you weren’t able to attend and followed the #MDBW18 hashtag, chances are you found yourself dealing with a bit of FOMO. But just because you missed out on the in-person experience doesn’t mean you have to miss out on the learning.

From the keynotes to the panels, we filmed the majority of this year’s sessions. Now, you can view them at your convenience.


P.S. If you’d rather learn about these topics in person, check out our upcoming events. MongoDB.local might be coming to a city near you for a one-day, educational conference. First up? Chicago, Austin, and DC.

Share your story to become the next MongoDB Certified Professional of the Year

Amy Berman
August 10, 2018

Has becoming MongoDB Certified affected your life in any way, big or small?

Whether being MongoDB Certified has helped transform your career, connect with your community, or just see the world a little differently, we want to hear your story! MongoDB and the open source community want to learn from your success.

Since 2013, MongoDB has recognized a current MongoDB Certified Professional who demonstrates ingenuity, hard work, and expertise as the MongoDB Certified Professional of the Year. Tell us why you should be the next MongoDB Certified Professional of 2018 by answering a few questions here.

We'll choose a winner with the most interesting and compelling certification story. The winner will receive a free trip to the MongoDB Europe 2018 conference in London, including flight, conference pass, and hotel accommodation.

Submissions are open through September 1, 2018. Submit your entry today.

*See complete contest rules here.

The MongoDB Summer ‘18 Intern Series: From Hackathon to Haskell

Andrea Dooley
August 10, 2018

Mihai Andrei is going into his senior year at Rutgers University, the alma mater of MongoDB CEO Dev Ittycheria. While Dev received his BS in Electrical Engineering, Mihai is studying Computer Science and minoring in Mathematics. Mihai is also extremely involved in HackeRU, a 24 hour student run hackathon at Rutgers.

Andrea Dooley: Hackathons are very popular amongst CS students. What roles have you played for Rutgers HackeRU?
Mihai Andrei: If you are an organizer you’re not able to participate in the event, but this coming year I will be one of two Executive Directors, essentially overseeing the entire thing. In the past I have played the part of Director of Finance for the event, so I know this will be a particularly challenging role, but nonetheless an exciting one.

AD: You’ve been involved with HackRU for quite a while. Is that where you first learned about MongoDB?
MA: I actually learned about MongoDB during a student demo at a tech talk on campus. The first time I ever used MongoDB was at a previous internship for a data warehouse application we were developing. I was looking online for internship opportunities in the software industry and came across an opening for the MongoDB internship program.

AD: What made you interested in interning at MongoDB?
MA: My previous experience interning has mostly been with financial institutions, so this time around I wanted to take a different route to a company with more emphasis on tech and tech culture. I was able to get a good sense of the culture during the recruiting process, so I was really excited when I got the offer.

AD: Did you know our CEO was a Rutgers alum?
MA: I learned that Dev attended Rutgers a bit later on, but I think it’s really cool that someone from my university became the CEO of such an awesome company.

AD: What MongoDB Eng team are you on, and what projects were you responsible for this summer?
MA: I’m on the query team working on the MQL model, which is a model implementation of the query language built from scratch, serving as a reference. The reason for creating it from scratch is to identify flaws and iron out changes for future implementations, and the model can be a point of reference for how we create future versions of the query language. There are some flaws in the current version of the language that need sorting out for future iterations.

AD: What were some of the flaws present in the query language?
MA: An example of a flaw in the query language is the difference between find and aggregation projection. They are ambiguous and one will allow you do things the other doesn’t. For example, in aggregation you are able to use nested documents to specify how to project your output. That is not possible in find, but in find you have special operators to customize an output for arrays such as $elemMatch that you can’t use in aggregation projection. The ultimate goal is to unify the semantics.

AD: Did you have any previous experience working to improve a programming language, or did you find there was a learning curve?
MA: I took a programming languages class last year so I was able to learn about what goes into creating a programming language. I spent my first few weeks at MongoDB learning Haskell. I had to sit down with other team members to go through the code base and get ramped up. It’s been very rewarding from an educational and experience standpoint.

AD: What would you say is one key takeaway from your experience at MongoDB this summer?
MA: Beyond learning a new programming language and what goes into writing the MongoDB query language, what I wanted to get out of my summer internship was to learn how to develop software more collaboratively. MongoDB has a code review process, so you’re given a ticket but just completing the ticket is not enough. You have to run it by other members of the team to ensure it meets expectations. There’s been really great quality control feedback from the team.

AD: How has the level of feedback helped to benefit you as an engineer early in your career?
MA: Every week I sit down with my mentor for a thirty minute one on one to discuss how things are going. The continuous feedback has been very helpful because it helped me to improve the quality of the comments I left in my code. It was easy for me to understand what I did and how I did it, but I learned that you need to be very thorough in order for other people to understand as well.

AD: What would you say to someone considering an internship opportunity at MongoDB?
MA: I would absolutely recommend it. It’s a great environment to intern in, and I have really been able to grow my skills. The work is very challenging, but very rewarding, and I understand exactly how my project is going to impact the work my mentor and other members of the query team will continue doing after I leave.

To learn more about the MongoDB internship program, click here.

Adopting a Serverless Approach at Bazaarvoice with MongoDB Atlas and AWS Lambda

I recently had the pleasure of welcoming Ani Hammond, Senior Staff Software Engineer from Bazaarvoice, to the MongoDB World stage. To a completely packed room, Ani chronicled her team’s journey as they replatformed Bazaarvoice’s Curations service from a runaway monolith architecture to a completely serverless architecture backed by MongoDB Atlas.

Bazaarvoice logo

Even if you’ve never heard of Bazaarvoice, it’s almost impossible that you’ve never interacted with their services. To use Ani’s own description, “If you're shopping online and you’re reading a review, it's probably powered by us.”

Bazaarvoice strives to connect brands and retailers with consumers through the gathering, curation, and display of user-generated content—anything from pictures on Instagram to an online product review—during a potential customer’s buying journey.

To give you a sense of the scale of this task, Bazaarvoice clocked over a billion total page views between Thanksgiving Day and Cyber Monday in 2017, peaking at around 6,000 page views per second!

Even if you’ve never heard of Bazaarvoice, it’s almost impossible that you’ve never interacted with their services.

One of the technologies behind this herculean task is the Curations platform. To understand how this platform works, let’s look at an example:

An Instagram user posts a cute photo of their child wearing a particular brand’s rain boots. Using Curations, that brand is watching for specific content that mentions their products, so the social collection service picks up that post and shows it to the client team in the Curations application. The post can then be enriched in various manual and automatic ways. For example, a member of the client team can append metadata describing the product contained in the image or automatic rules can filter content for potentially offensive material. The Curations platform then automates the process of securing the original poster’s permission for the client to use their content. Now, this user-generated content is able to be displayed in real time on the brand’s homepage or product pages to potential customers considering similar products.

In a nutshell, this is what Curations does for hundreds of clients and hundreds of thousands of individual content pieces.

The technology behind Curations was previously a monolithic Python/Django-based stack on Amazon EC2 instances on top of a MySQL datastore deployed via RDS.

The technology behind Curations was previously a monolithic Python/Django-based stack on Amazon EC2 instances on top of a MySQL datastore deployed via RDS.

This platform was effective in allowing Bazaarvoice to scale to hundreds of new clients. However, this architecture did have an Achilles heel: each additional client onboarded to Bazaarvoice’s platform represented an additional Python/Django/MySQL cluster to manage. Not only was this configuration expensive (approximately $60,000/month), the operational overhead generated by each additional cluster made debugging, patching, releases, and general data management an ever-growing challenge. As Ani put it, “Most of our solutions were basically to throw more hardware/money at the problem and have a designated DevOps person to manage these clusters.”

One of the primary factors in selecting MongoDB for the new Curations platform was its support for a variety of different access patterns. For example, the part of the platform responsible for sourcing new social content had to support high write volume whereas the mechanism for displaying the content to consumers is read-intensive with strict availability requirements.

Diving into the specifics of why the Bazaarvoice team opted to move from a MySQL-based stack to one built on MongoDB is a blog post for another day. (Though, if you’d like to see what motivated other teams to do so, I recommend How DevOps, Microservices, and MongoDB are Making HSBC “Simpler, Better, and Faster” and Breuninger delivers omnichannel shopping experience for thousands of daily online users.)

That is to say, the focus of this particular post is the paradigm shift the Curations team made from a linearly-scaling monolith to a completely serverless approach, underpinned by MongoDB Atlas.

The new Curations platform is broken into three distinct services for content collection, enrichment, and display. The collections service is powered by a series of AWS Lambda functions triggered by an Amazon Kinesis stream written in Node.js whereas the enrichment and display services are built on autoscaling AWS Elastic Beanstalk instances. All three services making up the new Curations platform are backed by MongoDB Atlas.

Not only did this approach address the cluster-per-customer challenges of the old system, but the monthly costs were reduced by nearly 90% to approximately $6,500/month. The results are, again, best captured by Ani’s own words:

Massive cost savings, huge performance gains, strong consistency, and a handful of services rather than hundreds of clusters.

MongoDB Atlas was a natural fit in this new serverless paradigm as the team is fully able to focus on developing their product rather than on infrastructure management. In fact, the team had originally opted to manage the MongoDB instances on AWS themselves. After a couple of iterations of manual deployment and management, a desire to gain even more operational efficiency and increased insight into database performance prompted their move to Atlas. According to Ani, the cost of migrating to and leveraging a fully managed service was, "Way cheaper than having dedicated DevOps engineers.” Atlas’ support for direct VPC peering also made the transition to a hosted solution straightforward for the team.

Speaking of DevOps, one of the first operational benefits Ani and her team experienced was the ability to easily optimize their index usage in MongoDB. Previously, their approach to indexing was “build stuff that makes sense at the time and is easy to iterate on.” After getting up and running on Atlas, they were able to use the built-in Performance Advisor to make informed decisions on indexes to add and unused ones to remove. As Ani puts it:

An index killed is as valuable as an index added. This ensures all your indexes to fit into memory and a bad index doesn't push out the good ones.

Ani’s team also used the Atlas Performance Advisor to diagnose and correct inefficient queries. According to her, the built-in tools helped keep the team honest, "[People] say, ‘My database isn't scaling. It's not able to perform complex queries in real doesn't work.’ Fix your code. The hardware is great, the tools are great but they can only carry you so far. I think sometimes we tend to get sloppy with how we write our code because of how cheap and how easy hardware is but we have to write code responsibly too.”

In another incident, a different Atlas feature, the Real Time Performance Panel, was key to identifying an issue with high load times in the display service. Some client’s displays were taking more than 6 seconds to load. (For context, content delivery network provider, Akamai, found that a two-second delay in web page load time can cause bounce rates to double!) High-level metrics in Datadog reported 5+ seconds query response times, while Atlas reported less than 100 ms response times for the same query. The team used both data points to triangulate and soon realized the discrepancy was a result of the time it took for Lambda to connect to MongoDB for each new operation. Switching from standard Lambda functions to a dockerized service ensured each operation could leverage an open connection rather than initiating a “cold start.”

I know a lot of the cool things that Atlas does can be done by hand but unless this is your full-time job, you're just not going to do it and you’re not going to do it as well.

Ani’s team also used the Atlas Performance Advisor to diagnose and correct inefficient queries.

Before wrapping up her presentation, Ani shared an improvement over the old system that the team wasn’t expecting. Using Atlas, they were able to provide the customer support and services teams read-only views into the database. This afforded them deeper insight into the data and allowed them to perform ad-hoc queries directly. The result was a more proactive approach to issue management, leading to an 80% reduction in inbound support tickets.

By re-architecting their Curations platform, Bazaarvoice is well-positioned to bring on hundreds of new clients without a proportional increase in operations work for the team. But once again, Ani summarized it best:

As the old commercial goes… ‘Old platform: $60,000. New platform: $6,000. Getting to focus all of my time on development: priceless.'

Thank you very much to Ani Hammond and the rest of the Curations team at Bazaarvoice for putting together the presentation that inspired this post. Be sure to check out Ani’s full presentation in addition to dozens of other high-quality talks from MongoDB World on our YouTube channel.

If you haven’t tried out MongoDB Atlas for yourself, you can started with a free sandbox cluster.

Working with the Stitch CLI – Editing Apps Outside of Stitch

Andrew Morgan
August 08, 2018


This post introduces the MongoDB Stitch CLI, demonstrating how the Command Line Interface (CLI) can be used to export an existing app, continue to develop it locally, and then merge the changes back into Stitch. This workflow allows automation, enables collaboration, and increases productivity.

Introducing MongoDB Stitch

MongoDB Stitch is a serverless platform which accelerates application development with simple, secure access to data and services from the client – getting your apps to market faster while reducing operational costs and effort.

The Stitch serverless platform addresses the challenges faced by the developers of modern applications by providing four services:

  • Stitch QueryAnywhere: Exposes the full power of working with documents in MongoDB and the MongoDB query language, directly from your web and mobile application frontend code. A powerful rules engine lets developers declare fine-grained security policies.
  • Stitch Functions: Allows developers to run simple JavaScript functions in Stitch’s serverless environment, making it easy to create secure APIs or to build integrations with microservices and server-side logic. Enables integration with popular cloud services such as Slack and Twilio, enriching your apps with a single Stitch method call.
  • Stitch Triggers: Real-time notifications that launch functions in response to changes in the database. The functions can make further database changes, push data to other places, or interact with users – such as through push notifications, text messages, or emails.
  • Stitch Mobile Sync: Automatically synchronizes data between documents held locally in MongoDB Mobile and the backend database. MongoDB Mobile allows mobile developers to use the full power of MongoDB locally. Stitch Mobile Sync ensures that data is kept up to date across phones and all other clients in real time.

The Sitch CLI

Stitch has an intuitive UI to create and maintain your apps, but you also have the option of using the Stitch CLI. The stitch-cli is the key to automating workflows with Stitch (it also lets you develop code inside your favorite editor). The CLI provides functions for authenticating Stitch users, importing local application directories, and exporting applications from the MongoDB Stitch service.

With the import & export commands in the Stitch CLI, you can:

  • Develop Stitch applications using your favorite desktop editor and import them into Stitch
  • Export existing applications to a local directory. From there you can:
    • Add the files to source control (e.g., GitHub)
    • Make changes to the application code and configuration files, before merging those changes back into your Stitch app
    • Let other developers review and contribute code
  • Easily share custom services and functions between Stitch apps
  • Push code from development environments, to QA, and onto production.

stitch-cli currently includes 5 commands:

  • login: Log in using an Atlas API Key
  • logout: Deauthenticate as an administrator
  • whoami: Display Current User Info
  • export: Export a stitch application to a local directory
  • import: Import and deploy a stitch application from a local directory

While MongoDB Stitch lets you create complex applications, it also simplifies development by mapping your apps to standard JSON and JavaScript files and structures in a way that's easy to navigate.

An exported application is made up of a hierarchy of directories and files:

  • *.json: Files containing configuration data
  • *.js: Files containing JavaScript code

This is an example hierarchy for a Stitch IoT service that works with the backend MongoDB Atlas database and the REST APIs of 2 other cloud services – IFTTT and

├── auth_providers
│   ├── anon-user.json
│   └── api-key.json
├── functions
│   ├── Imp_Write
│   │   ├── config.json
│   │   └── source.js
│   ├── TempTimeline
│   │   ├── config.json
│   │   └── source.js
│   └── controlHumidity
│       ├── config.json
│       └── source.js
├── services
│   ├── IFTTT
│   │   ├── config.json
│   │   └── rules
│   │       └── IFTTT.json
│   ├── darksky
│   │   ├── config.json
│   │   └── rules
│   │       └── darkSkyGET.json
│   └── mongodb-atlas
│       ├── config.json
│       └── rules
│           └── Imp.TempData.json
├── stitch.json
└── values
    ├── DarkSkyKey.json
    ├── DeviceLocation.json
    ├── MakerIFTTKey.json
    ├── maxHumidity.json
    └── minTemp.json

Installing stitch-cli

The simplest way is to install stitch-cli is as an NPM module:

npm install -g mongodb-stitch-cli 

Logging into the Atlas API

Before doing anything else, you need to create an Atlas API key & whitelist your local machine's IP Address, as shown here:

Creating an Atlas API key

You can now log into the admin API using stitch-cli, and confirm that you're logged in:

stitch-cli login --username=andrew.morgan --api-key=XXXXXXXX-XXX-4010-97a2-ad7c39d8cfb5

stitch-cli whoami
andrew.morgan [API Key: ********-****-****-****-ad7c39d8cfb5]

Export app, make local edits, and merge changes back into the same Stitch app

I noticed that I'm missing some error handling code in my functions. Rather than editing them all through the Stitch UI, I export the app so that I can work on the code in my favorite browser:

Export the Stitch app to the local file system:

stitch-cli export --app-id=imptemp-sobpa

I then add some extra code to handle cases where the MongoDB insert fails:

exports = function(data){

  //Get the current time
  var now = new Date();

  var darksky ="darksky");
  var mongodb ="mongodb-atlas");
  var TempData = mongodb.db("Imp").collection("TempData");

  // Fetch the current weather from

  darksky.get({"url": "" + 
    context.values.get("DarkSkyKey") + '/' + 
    context.values.get("DeviceLocation") +
  }).then(response => {
    var darkskyJSON = EJSON.parse(response.body.text()).currently;

    var status =
        "Timestamp": now.getTime(),
        "Date": now,
        "Readings": data,
        "External": darkskyJSON,
    status.Readings.light = (100*(data.light/65536));
    context.functions.execute("controlHumidity", data.temp, data.humid);
      results => {
        console.log("Successfully wrote document to TempData");
      error => {
        console.log("Error writing to TempData colletion: " + error);

I then apply the updates to the Stitch app (the CLI conveniently shows me the delta before I commit the merge):

cd impTemp
stitch-cli import --app-id=imptemp-sobpa --strategy=merge

--- functions/Imp_Write/source.js
+++ functions/Imp_Write/source.js
@@ -28,6 +28,9 @@
       results => {
         console.log("Successfully wrote document to TempData");
+      },
+      error => {
+        console.log("Error writing to TempData collection: " + error);

Please confirm the changes shown above: [y/n]: y
Successfully imported 'imptemp-sobpa'

You can then validate the changes through the UI:

Updated Stitch function, viewed through the UI


The Stitch CLI make it easier than ever to work with Stitch code – regardless of whether you’re an advanced user or just beginning. If you've created your first stitch app, try the Stitch CLI to start extending it – all from the comfort of your favorite editor. If not, try importing this example dashboard application.

Sign up for the Stitch free tier and see what you can create!

The MongoDB Summer ‘18 Intern Series: Driving Connections from Work to Life

Andrea Dooley
August 07, 2018

Nathan Louie is a MongoDB summer intern on our Server Replication team. He’s a computer science major and sociology minor at the University of Michigan, finishing up his senior year this fall. During his time at school, he’s developed a passion for problem solving within distributed systems and working on lower level projects.

Andrea Dooley: Since your interest lies primarily on the lower levels of the stack, how did you first learn about MongoDB?
Nathan Louie: I first heard about MongoDB at a hackathon as a tool to build apps quickly. But the first time I actually used it was right before my interview. I made an application with MongoDB Stitch, and was impressed with how easy my backend was to set up and interact with.

AD: Since this is your fifth internship, how does it differ from your previous experiences, and what piqued your interest to apply?
NL: My previous internships have been higher up the tech stack. I’ve taken a lot of systems courses in school, and wanted industry experience on the kinds of projects I’ve encountered academically. I looked through the MongoDB code base, and realized it was the type of system I wanted experience in. Finally, when considering the company itself, MongoDB has a high-growth culture with a focused mission and is working on the problems that I’m looking to solve.

AD: How has the project you’re working on this summer provided you with the technical problem solving experience you were looking for?
NL: My primary project this summer is adding diagnostics support for multi-document transactions, which was a feature added in the 4.0 release. It’s a way for people to measure the performance of their transactions, such as total duration and the number of succeeded and failed transactions. Building out diagnostics has helped me understand how our replication and transactions system work. I attended some sessions at this past MongoDB World, and the audience asked about this feature. Sometimes when you’re working deep in the stack, you can feel disconnected from the end users, but the fact that people were waiting for it to become available helped me understand the outcome of my work.

AD: How have you found the overall dynamic of the server team?
NL: In my opinion, the defining thing about MongoDB is how engineering-centric the company is. For instance, I find it really interesting how involved management is with the actual code. My tech leads are very familiar with the architecture, and can hop right into the code base because they wrote a lot of it. It has been extremely valuable for me as an intern because I’m able to learn from people at all levels, and it has also exposed me to an alternative way of managing engineering teams. This experience has helped me visualize a career path where I can remain technical, and not have to sacrifice my passion for deep problem solving while leading a team.

AD: What’s the most interesting thing you’ve learned so far?
NL: The most interesting thing I’ve realized at my time here is the impact my work has on the world. MongoDB is not an immediately recognizable consumer brand, and the types of people who know it well are usually developers and engineers. However, consumers are using MongoDB all the time, likely without realizing it. Whether it’s applied towards compliance software, cryptocurrency, or popular games like Fortnite, the work that I do effects so many different industries, projects, and people across the world. I think it’s very powerful to be able to drive the connection from my work to my everyday life.

AD: What would you say has been the most impactful aspect of your internship experience at MongoDB?
NL: The open source aspect: being able to commit my code and have it be out there for everyone to see. I can log into any computer and show people the work I’m doing and having that level of transparency still blows my mind. Most times, company code is secret, and you can’t just take the code you write and go anywhere. Here, once the code you write gets approved, it immediately gets pushed out for anyone to see. I find it liberating to talk to friends about what I’m working on and have the ability to pull it up and walk through my decision making. It definitely gives me a sense of pride and builds trust in the product.

To learn more about the MongoDB internship program, click here.

Introduction to MongoDB Transactions in Python

Multi-document transactions arrived in MongoDB 4.0 in June 2018. MongoDB has always been transactional around updates to a single document. Now, with multi-document transactions we can wrap a set of database operations inside a start and commit transaction call. This ensures that even with inserts and/or updates happening across multiple collections and/or databases, the external view of the data meets ACID constraints.

To demonstrate transactions in the wild we use a trivial example app that emulates a flight booking for an online airline application. In this simplified booking we need to undertake three operations:

  1. Allocate a seat (seat_collection)
  2. Pay for the seat (payment_collection)
  3. Update the count of allocated seats and sales (audit_collection)

For this application we will use three separate collections for these documents as detailed in bold above. The code in updates these collections in serial unless the --usetxns argument is used. We then wrap the complete set of operations inside an ACID transaction. The code in is built directly using the MongoDB Python driver (Pymongo 3.7.1). See the section on client sessions for an overview of the new transactions API in 3.7.1.

The goal of this code is to demonstrate to the Python developer just how easy it is to covert existing code to transactions if required or to port older SQL based systems.

Setting up your environment

The following files can be found in the associated github repo, pymongo-transactions.

  • gitignore : Standard Github .gitignore for Python
  • LICENSE : Apache's 2.0 (standard Github) license
  • Makefile : Makefile with targets for default operations
  • : Run a set of writes with and without transactions. Run python -h for help.
  • : The file containing the transactions retry functions.
  • : Use a MongoDB change stream to watch collections as they change when is running
  • : Starts a MongoDB replica set (on port 7100) and kills the primary on a regular basis. This is used to emulate an election happening in the middle of a transaction.
  • : check and/or set feature compatibility for the database (it needs to be set to "4.0" for transactions)
You can clone this repo and work alongside us during this blog post (please file any problems on the Issues tab for the repo).

We assume for all that follows that you have Python 3.6 or greater correctly installed and on your path.

The Makefile outlines the operations that are required to setup the test environment.

All the programs in this example use a port range starting at 27100 to ensure that this example does not clash with an existing MongoDB installation.


To setup the environment you can run through the following steps manually. People that have make can speed up installation by using the make install command.

Set a python virtualenv

$ cd pymongo-transactions
$ virtualenv -p python3 venv
$ source venv/bin/activate

Install Python MongoDB Driver, pymongo

Install the latest version of the PyMongo MongoDB Driver (3.7.1 at the time of writing).

pip install --upgrade pymongo

Install Mtools

MTools is a collection of helper scripts to parse, filter, and visualize MongoDB log files (mongod, mongos). mtools also includes mlaunch, a utility to quickly set up complex MongoDB test environments on a local machine. For this demo we are only going to use the mlaunch program.

pip install mtools

the mlaunch program also requires the psutil package.

pip install psutil

The mlaunch program gives us a simple command to start a MongoDB replica set as transactions are only supported on a replica set

Start a replica set whose name is txntest. (see the make init_server make target) for details:

mlaunch init --port 27100 --replicaset --name "txntest"

Using the Makefile for configuration

There is a Makefile with targets for all these operations. For those of you on platforms without access to Make it should be easy enough to cut and paste the commands out of the targets and run them on the command line.

Running the Makefile

cd pymongo-transactions

You will need to have MongoDB 4.0 on your path. There are other convenience targets for starting the demo programs:

  • make notxns : start the transactions client without using transactions
  • make usetxns : start the transactions client with transactions enabled
  • make watch_seats : watch the seats collection changing
  • make watch_payments : watch the payment collection changing

Running the transactions example

The transactions example consists of two python programs. and


$ python -h
usage: [-h] [--host HOST] [--usetxns] [--delay DELAY]
                           [--iterations ITERATIONS]
                           [--randdelay RANDDELAY RANDDELAY]

optional arguments:
  -h, --help            show this help message and exit
  --host HOST           MongoDB URI [default: mongodb://localhost:27100,localh
  --usetxns             Use transactions [default: False]
  --delay DELAY         Delay between two insertion events [default: 1.0]
  --iterations ITERATIONS
                        Run N iterations. O means run forever
                        Create a delay set randomly between the two bounds
                        [default: None]

You can choose to use --delay or --randdelay. if you use both --delay takes precedence. The--randdelay parameter creates a random delay between a lower and an upper bound that will be added between each insertion event.

The program knows to use the txntest replica set and the right default port range.

To run the program without transactions you can run it with no arguments:

$ python
using collection: SEATSDB.seats
using collection: PAYMENTSDB.payments
using collection: AUDITDB.audit
Using a fixed delay of 1.0

1. Booking seat: '1A'
1. Sleeping: 1.000
1. Paying 330 for seat '1A'
2. Booking seat: '2A'
2. Sleeping: 1.000
2. Paying 450 for seat '2A'
3. Booking seat: '3A'
3. Sleeping: 1.000
3. Paying 490 for seat '3A'
4. Booking seat: '4A'
4. Sleeping: 1.000

The program runs a function called book_seat() which books a seat on a plane by adding documents to three collections. First it adds the seat allocation to the seats_collection, then it adds a payment to the payments_collection`, finally it updates an audit count in the audit_collection. (This is a much simplified booking process used purely for illustration).

The default is to run the program without using transactions. To use transactions we have to add the command line flag --usetxns. Run this to test that you are running MongoDB 4.0 and that the correct featureCompatibility is configured (it must be set to 4.0). If you install MongoDB 4.0 over an existing /data directory containing 3.6 databases then featureCompatibility will be set to 3.6 by default and transactions will not be available.

Note: If you get the following error running python --usetxns that means you are picking up an older version of pymongo (older than 3.7.x) for which there is no multi-document transactions support.

Traceback (most recent call last):
  File "", line 175, in 
    total_delay = total_delay + run_transaction_with_retry( booking_functor, session)
  File "/Users/jdrumgoole/GIT/pymongo-transactions/", line 52, in run_transaction_with_retry
    with session.start_transaction():
AttributeError: 'ClientSession' object has no attribute 'start_transaction'

Watching Transactions

To actually see the effect of transactions we need to watch what is happening inside the collections SEATSDB.seats and PAYMENTSDB.payments.

We can do this with This script uses MongoDB Change Streams to see what's happening inside a collection in real-time. We need to run two of these in parallel so it's best to line them up side by side.

Here is the program:

$ python -h
usage: [-h] [--host HOST] [--collection COLLECTION]

optional arguments:
  -h, --help            show this help message and exit
  --host HOST           mongodb URI for connecting to server [default:
  --collection COLLECTION
                        Watch  [default:

We need to watch each collection so in two separate terminal windows start the watcher.

Window 1:

$ python --watch seats
Watching: seats

Window 2:

$ python --watch payments
Watching: payments

What Happens when you run without transactions?

Lets run the code without transactions first. If you examine the code you will see a function book_seats.

def book_seat(seats, payments, audit, seat_no, delay_range, session=None):
    Run two inserts in sequence.
    If session is not None we are in a transaction

    :param seats: seats collection
    :param payments: payments collection
    :param seat_no: the number of the seat to be booked (defaults to row A)
    :param delay_range: A tuple indicating a random delay between two ranges or a single float fixed delay
    :param session: Session object required by a MongoDB transaction
    :return: the delay_period for this transaction
    price = random.randrange(200, 500, 10)
    if type(delay_range) == tuple:
        delay_period = random.uniform(delay_range[0], delay_range[1])
        delay_period = delay_range

    # Book Seat
    seat_str = "{}A".format(seat_no)
    print(count( i, "Booking seat: '{}'".format(seat_str)))
    seats.insert_one({"flight_no" : "EI178",
                      "seat"      : seat_str,
                      "date"      : datetime.datetime.utcnow()},
    print(count( seat_no, "Sleeping: {:02.3f}".format(delay_period)))
    #pay for seat
    payments.insert_one({"flight_no" : "EI178",
                         "seat"      : seat_str,
                         "date"      : datetime.datetime.utcnow(),
                         "price"     : price},
    audit.update_one({ "audit" : "seats"}, { "$inc" : { "count" : 1}}, upsert=True)
    print(count(seat_no, "Paying {} for seat '{}'".format(price, seat_str)))

    return delay_period

This program emulates a very simplified airline booking with a seat being allocated and then paid for. These are often separated by a reasonable time frame (e.f. seat allocation vs external credit card validation and anti-fraud check) and we emulate this by inserting a delay. The default is 1 second.

Now with the two scripts running for seats_collection and payments_collection we can run as follows:

$ python

The first run is with no transactions enabled.

The bottom window shows running. On the top left we are watching the inserts to the seats collection. On the top right we are watching inserts to the payments collection.

watching without transactions

We can see that the payments window lags the seats window as the watchers only update when the insert is complete. Thus seats sold cannot be easily reconciled with corresponding payments. If after the third seat has been booked we CTRL-C the program we can see that the program exits before writing the payment. This is reflected in the Change Stream for the payments collection which only shows payments for seat 1A and 2A versus seat allocations for 1A, 2A and 3A.

If we want payments and seats to be instantly reconcilable and consistent we must execute the inserts inside a transaction.

What happens when you run with Transactions?

Now lets run the same system with --usetxns enabled.

$ python --usetxns

We run with the exact same setup but now set --usetxns.

watching with transactions

Note now how the change streams are interlocked and are updated in parallel. This is because all the updates only become visible when the transaction is committed. Note how we aborted the third transaction by hitting CTRL-C. Now neither the seat nor the payment appear in the change streams unlike the first example where the seat went through.

This is where transactions shine in world where all or nothing is the watchword. We never want to keeps seats allocated unless they are paid for.

What happens during failure?

In a MongoDB replica set all writes are directed to the Primary node. If the primary node fails or becomes inaccessible (e.g. due to a network partition) writes in flight may fail. In a non-transactional scenario the driver will recover from a single failure and retry the write. In a multi-document transaction we must recover and retry in the event of these kinds of transient failures. This code is encapsulated in We both retry the transaction and retry the commit to handle scenarios where the primary fails within the transaction and/or the commit operation.

def commit_with_retry(session):
    while True:
            # Commit uses write concern set at transaction start.
            print("Transaction committed.")
        except (pymongo.errors.ConnectionFailure, pymongo.errors.OperationFailure) as exc:
            # Can retry commit
            if exc.has_error_label("UnknownTransactionCommitResult"):
                print("UnknownTransactionCommitResult, retrying "
                      "commit operation ...")
                print("Error during commit ...")

def run_transaction_with_retry(functor, session):
    assert (isinstance(functor, Transaction_Functor))
    while True:
            with session.start_transaction():
                result=functor(session)  # performs transaction
        except (pymongo.errors.ConnectionFailure, pymongo.errors.OperationFailure) as exc:
            # If transient error, retry the whole transaction
            if exc.has_error_label("TransientTransactionError"):
                print("TransientTransactionError, retrying "
                      "transaction ...")

    return result

In order to observe what happens during elections we can use the script This script will start a replica-set and continuously kill the primary.

$ make kill_primary
. venv/bin/activate && python
no nodes started.
Current electionTimeoutMillis: 500
1. (Re)starting replica-set
no nodes started.
1. Getting list of mongod processes
Process list written to mlaunch.procs
1. Getting replica set status
1. Killing primary node: 31029
1. Sleeping: 1.0
2. (Re)starting replica-set
launching: "/usr/local/mongodb/bin/mongod" on port 27101
2. Getting list of mongod processes
Process list written to mlaunch.procs
2. Getting replica set status
2. Killing primary node: 31045
2. Sleeping: 1.0
3. (Re)starting replica-set
launching: "/usr/local/mongodb/bin/mongod" on port 27102
3. Getting list of mongod processes
Process list written to mlaunch.procs
3. Getting replica set status
3. Killing primary node: 31137
3. Sleeping: 1.0 resets electionTimeOutMillis to 500ms from its default of 10000ms (10 seconds). This allows elections to resolve more quickly for the purposes of this test as we are running everything locally.

Once is running we can start up again using the --usetxns argument.

$ make usetxns
. venv/bin/activate && python --usetxns
Forcing collection creation (you can't create collections inside a txn)
Collections created
using collection: PYTHON_TXNS_EXAMPLE.seats
using collection: PYTHON_TXNS_EXAMPLE.payments
using collection: PYTHON_TXNS_EXAMPLE.audit
Using a fixed delay of 1.0
Using transactions

1. Booking seat: '1A'
1. Sleeping: 1.000
1. Paying 440 for seat '1A'
Transaction committed.
2. Booking seat: '2A'
2. Sleeping: 1.000
2. Paying 330 for seat '2A'
Transaction committed.
3. Booking seat: '3A'
3. Sleeping: 1.000
TransientTransactionError, retrying transaction ...
3. Booking seat: '3A'
3. Sleeping: 1.000
3. Paying 240 for seat '3A'
Transaction committed.
4. Booking seat: '4A'
4. Sleeping: 1.000
4. Paying 410 for seat '4A'
Transaction committed.
5. Booking seat: '5A'
5. Sleeping: 1.000
5. Paying 260 for seat '5A'
Transaction committed.
6. Booking seat: '6A'
6. Sleeping: 1.000
TransientTransactionError, retrying transaction ...
6. Booking seat: '6A'
6. Sleeping: 1.000
6. Paying 380 for seat '6A'
Transaction committed.

As you can see during elections the transaction will be aborted and must be retried. If you look at the code you will see how this happens. If a write operation encounters an error it will throw one of the following exceptions:

Within these exceptions there will be a label called TransientTransactionError. This label can be detected using the has_error_label(label) function which is available in pymongo 3.7.x. Transient errors can be recovered from and the retry code in has code that retries for both writes and commits (see above).


Multi-document transactions are the final piece of the jigsaw for SQL developers who have been shying away from trying MongoDB. ACID transactions make the programmer's job easier and give teams that are migrating from an existing SQL schema a much more consistent and convenient transition path.

As most migrations involving a move from highly normalised data structures to more natural and flexible nested JSON documents one would expect that the number of required multi-document transactions will be less in a properly constructed MongoDB application. But where multi-document transactions are required programmers can now include them using very similar syntax to SQL.

With ACID transactions in MongoDB 4.0 it can now be the first choice for an even broader range of application use cases.

Why not try our transactions today by setting up your first cluster on MongoDB Atlas our Database as a Service offering.

To try it locally download MongoDB 4.0.

Join us at MongoDB Europe 2018 for deep-dive technical sessions and hands-on tutorials.

Charting a Course to MongoDB Atlas: Part 1 - Preparing for the Journey

Michael Lynn
August 01, 2018

MongoDB Atlas is an automated cloud MongoDB service engineered and run by the same team that builds the database. It incorporates operational best practices we’ve learned from optimizing thousands of deployments across startups and the Fortune 100. You can build on MongoDB Atlas with confidence, knowing you no longer need to worry about database management, setup and configuration, software patching, monitoring, backups, or operating a reliable, distributed database cluster.

MongoDB introduced its Database as a Service offering, in July of 2016 and it’s been a phenomenal success since its launch. Since then, thousands of customers have deployed highly secure, highly scalable and performant MongoDB databases using this service. Among its most compelling features are the ability to deploy Replica Sets in any of the major cloud hosting providers (AWS, Azure, GCP) and the ability to deploy database clusters spanning multiple cloud regions. In this series, I’ll explain the steps you can follow to migrate data from your existing MongoDB database into MongoDB Atlas.

Preparing for the Journey

Before you embark on any journey regardless of the destination, it’s always a good idea to take some time to prepare. As part of this preparation, we’ll review some options for the journey — methods to get your data migrated into MongoDB Atlas — along with some best practices and potential wrong turns to watch out for along the way.

Let’s get a bit more specific about the assumptions I’ve made in this article.

  • You have data that you want to host in MongoDB Atlas.
    • There’s probably no point in continuing from here if you don’t want to end up with your data in MongoDB Atlas.
  • Your data is currently in a MongoDB database.
    • If you have data in some other format, all is not lost — we can help. However, we’re going to address a MongoDB to MongoDB migration in this series. If you have other requirements -- data in another database or another format, for example, let me know you’ll like an article covering migration from some other database to MongoDB and I’ll make that the subject of a future series.
  • Your current MongoDB database is running MongoDB Version 3.0 or greater. MongoDB Atlas supports version 3.4, and 3.6. Therefore, we’ll need to work to get your database upgraded either as part of the migration - or, you can handle that ahead of the migration. We have articles and documentation designed to help you upgrade your MongoDB instances should you need.
  • Your data is in a clustered deployment (Sharded or Replica Set). We’ll cover converting a standalone deployment to a replica set in part 3 of this series.

At a high level, there are 4 basic steps to migrating your data. Let’s take a closer look at the journey:

  1. Deploy a Destination Cluster in MongoDB Atlas
  2. Prepare for the Journey
  3. Migrate the databases
  4. Cutover and Modify Your Applications to use the new MongoDB Atlas-based Deployment

As we approach the journey, it's important to know the various routes from your starting point to your eventual destination. Each route has its considerations and benefits and the choice of which route you choose will ultimately be up to you. Review the following table which presents a list of the available data migration methods from which you may choose.

Method Descriptions Considerations Benefits Version Notes
Live Import Fully automated via the Atlas administrative console. Downtime: Minimal - Cutover Only. Fully automated. From:Version 2.6, 3.0, 3.2, 3.4To: 3.4, 3.6
mongomirror mongomirror is a utility for migrating data from an existing MongoDB replica set to a MongoDB Atlas replica set. mongomirror does not require you to shut down your existing replica set or applications Downtime: Minimal - Cutover Only. Version 2.6 or great 3.4, 3.6
mongorestore mongorestore is a command-line utility program that loads data from either a binary database dump created by mongodump or the standard input. Downtime required Version 2.6 or Greater 3.4, 3.6
mongoimport mongoimport tool imports content from an Extended JSON, CSV, or TSV export created by mongoexport, or potentially, another third-party export tool Downtime required

For a majority of deployments, Live Import is the best, most efficient route to get your data into MongoDB Atlas. It offers the ability to keep your existing cluster up and active (but not too active, see considerations.) There are considerations, however. If you’re not located in a region that is geographically close to the US-EAST AWS datacenter, for example, you may encounter unacceptable latency. There are a number of possible concerns you should consider prior to embarking on your migration journey. The following section offers some helpful route guidance to ensure that you’re going in the right direction and moving steadily toward your destination.

Route Guidance for the Migration Journey

If you've made it this far, you’re likely getting ready to embark on a journey that will bring your data into a robust, secure, and scalable environment within MongoDB Atlas. The potential to encounter challenges along the way is real and the likelihood of encountering difficulties depends primarily upon your starting point in that journey. In this section, I’ll discuss some potential issues you may encounter as you prepare for your migration journey. A summary of the potential detours and guidance for each is presented in the following table.

Follow the links in the table to read more about each potential detour and its relevant guidance:

Potential Detour Guidance Reference
Insufficient RAM on Destination Cluster Calculate the RAM required for your application and increase that to account for the migration process requirements How do I calculate how much RAM I need for my application?
Too Much Network Latency Between Source and Destination Reduce Latency, or leverage mongodump/mongorestore instead of Live Import
Insufficient Network Access due to missing IP Whitelist or Firewall Rules Ensure that MongoDB Live Import Application Servers are whitelisted and that corporate firewalls permit access between source, destination
Insufficient user access permissions to source database deployment Ensure that authentication is enabled and that the user credentials granted for source database have required entitlements
Insufficient Oplog Size on Destination Size the operations log appropriately based on the application workload Sizing the Operations Log

Potential Detour: Insufficient RAM on Destination Cluster

Every deployment of MongoDB requires some form of resource to run efficiently. These resource requirements will include things like RAM, CPU, Disk and Network. To ensure acceptable response times and performance of the database, we typically look to the application’s read/write profile to inform the decisions we make about the amounts and sizes of each of these resources we’ll need for our deployment.

The amount of RAM a deployment will require is largely informed by the applications’ demand for data in the database. To approximate RAM requirements, we typically look at the frequently accessed documents in each collection, adding up the total data size and then we increase that by the total size of required indexes. Referred to as the working set, this is typically the approximate amount of RAM we’ll want our deployment to have. A more complete discussion of sizing will be found in the documentation pages on sizing for MongoDB.

Sizing is a tricky task especially for the cost constrained. We obviously don’t want to waste money by over-provisioning servers larger than those we’ll need to support the profile of our users and applications. However, it is important to consider that during our migration, we’ll not only need to account for the application requirements -- we also need to account for the resources required by the migration process itself. Therefore, you will want to ensure that you surpass the requirements for your production implementation when sizing your destination cluster.

Route Guidance: Increase available RAM During Migration

The size of the destination cluster should provide adequate resource across all environmentals (storage, CPU, and Memory) with room to spare. The migration process will require additional CPU and Memory as the destination database is being built from the source. It’s quite common for incoming clusters to be undersized and as a result the migration process fails. If this happens during a migration, you must empty the destination cluster, and resize the cluster to a larger M-Value to increase the amount of available RAM. A great feature of Atlas is that resizing -- in both directions, is extremely easy to do. Whether you’re adding resource (increasing the amount of RAM, Disk, CPU, shards, etc.) or decreasing the same, the process is very simple. Therefore, increasing the resource available on your target environment is painless and easy -- and once the migration completes, you can simply scale back down to a cluster size with less RAM, and CPU.

Potential Detour: Network Latency

Latency is defined as the amount of time it takes for a packet of data to get from one designated point to another. Because the migration process is all about moving packets of data between servers it is by its very nature latency sensitive.

Migrating data into MongoDB Atlas leveraging the Live Import capability involves connecting your source MongoDB Instance to a set of application servers running in the AWS us-east-1 region. These servers act as the conductors running the actually migration process between your source and destination MongoDB Database Servers. A potential detour can crop up when your source MongoDB database deployment exists in a datacenter located far from the AWS us-east-1 region.

Route Guidance: Reduce latency if possible or use mongomirror instead of Live Import

Should your source MongoDB Database servers exist in regions far from these application servers, you may need to leverage mongomirror, mongodump/mongorestore rather than Live Import.

Potential Detour: Network Access

In order to accomplish a migration using Live Import, Atlas streams data through a set of MongoDB-Controller application servers. Atlas provides the IP Address ranges of the MongoDB Live Import servers during the Live Import process. You must be certain to add these IP Address ranges to the IP Whitelist for your Destination cluster.

The migration processes within Atlas run on a set of application servers --- these are the traffic directors. The following is a list of the IP Addresses on which these application servers depend. It is important to ensure that traffic between these servers, your source cluster and the destination cluster is able to freely flow. These addresses are in C.I.D.R. notation.


An additional area where a detour may be encountered is in the realm of corporate firewall policy.

To avoid these potential detours, ensure that you have the appropriate connectivity from the networks where your source deployment resides to the networks where MongoDB Atlas exists.

Route Guidance: Whitelist the IP Ranges of the MongoDB Live Import Process

These IP ranges will be provided at the start of the migration process. Ensure that you configure the whitelist to enable appropriate access during the migration.

Potential Detour: Insufficient User Rights on Source Deployment

Every deployment of MongoDB should enforce authentication. This will ensure that only appropriate individuals and applications may access your MongoDB data.

A potential detour may arise when you attempt to Live Migrate a database without creating or providing the appropriately privileged user credentials.

If the source cluster enforces authentication, create a user with the following privileges:

  • Read all databases and collections (i.e. readAnyDatabase on the admin database)
  • Read the oplog.

Route Guidance: Ensure Appropriate User Access Permissions on the Source Deployment

Create a SCRAM user and password on each server in the replica set and ensure that this user belongs to roles that have the following permissions:

Read and write to the config database Read all databases and collections. Read the oplog.

For example:

  • For 3.4+ source clusters, a user with both clusterMonitor and backup roles would have the appropriate privileges.
  • For 3.2 source cluster, a user with clusterMonitor, clusterManager, and backup roles would have appropriate privileges.

Specify the username and password to Atlas when prompted by the Live Migration procedure.

Also, once you’ve migrated your data, if the source cluster enforced authentication you must consider that Atlas does not migrate any user or role data to the destination cluster. Therefore, you must re-create the credentials used by your applications on the destination Atlas cluster. Atlas uses SCRAM for user authentication. See Add MongoDB Users for a tutorial on creating MongoDB users in Atlas.

Potential Detour: Insufficient Oplog Size on Destination

The oplog, or operations log is a capped collection that keeps a rolling record of all operations that modify the data stored in your databases. When you create an Atlas cluster to serve as the destination for your migration, by default Atlas creates the oplog size at 5% of the total amount of disk you allocated for the cluster. If the activity profile of your application requires a larger oplog size, you will need to submit a proactive support ticket to have the oplog size increased on your destination cluster.

Route Guidance: Size the Operations Log (Oplog) Appropriately - Submit a Proactive Support Ticket if Oplog Resize is Needed.

As stated previously, the decisions regarding the resources we apply to a given MongoDB Deployment are informed by the profile of the applications that depend on the database. As such, there are certain application read/write profiles or workloads that require a larger than default operations log. These are listed in detail in the documentation pages on the subject of Replica Set Oplog. Here is a summary of the workloads that typically require a larger than normal Oplog:

Updates to Multiple Documents at Once

The oplog must translate multi-updates into individual operations in order to maintain idempotency. This can use a great deal of oplog space without a corresponding increase in data size or disk use.

Deletions Equal the Same Amount of Data as Inserts

If you delete roughly the same amount of data as you insert, the database will not grow significantly in disk use, but the size of the operation log can be quite large.

Significant Number of In-Place Updates If a significant portion of the workload is updates that do not increase the size of the documents, the database records a large number of operations but does not change the quantity of data on disk.

In Conclusion

Regardless of your starting point, MongoDB provides a robust, secure and scalable destination for your data. MongoDB Atlas Live Import automates and simplifies the process of migrating your data to MongoDB Atlas. The command line version of this utility, called mongomirror, gives users additional control and flexibility around how the data gets migrated. Other options include exporting (mongoexport) and importing (mongoimport) your data manually or even writing your own application to accomplish migration. The decision to use one particular method over another depends upon the size of your database, its geographic location as well as your tolerance for application downtime.

If you choose to leverage MongoDB Atlas Live Import, be aware of the following potential challenges along the journey.

  • Increase available RAM During Migration sufficient for application plus migration requirements.
  • Reduce latency if possible or use mongomirror instead of Live Import.
  • Whitelist the IP Ranges of the MongoDB Live Import Process
  • Ensure Appropriate User Access Permissions on the Source Deployment
  • Size the Operations Log (Oplog) Appropriately - Submit a Proactive Support Ticket if Oplog Resize is Needed.

Now that you’re fully prepared, let’s embark on the journey and I’ll guide you through the process of deploying a cluster in MongoDB Atlas and walk you through migrating your data from an AWS Replica Set.