GIANT Stories at MongoDB

An Introduction to Change Streams

There is tremendous pressure for applications to immediately react to changes as they occur. As a new feature in MongoDB 3.6, change streams enable applications to stream real-time data changes by leveraging MongoDB’s underlying replication capabilities. Think powering trading applications that need to be updated in real time as stock prices change. Or creating an IoT data pipeline that generates alarms whenever a connected vehicle moves outside of a geo-fenced area. Or updating dashboards, analytics systems, and search engines as operational data changes. The list, and the possibilities, go on, as change streams give MongoDB users easy access to real-time data changes without the complexity or risk of tailing the oplog (operation log). Any application can readily subscribe to changes and immediately react by making decisions that help the business to respond to events in real time.

Change streams can notify your application of all writes to documents (including deletes) and provide access to all available information as changes occur, without polling that can introduce delays, incur higher overhead (due to the database being regularly checked even if nothing has changed), and lead to missed opportunities.

Characteristics of change streams

  1. Targeted changes
    Changes can be filtered to provide relevant and targeted changes to listening applications. As an example, filters can be on operation type or fields within the document.
  2. Resumability
    Resumability was top of mind when building change streams to ensure that applications can see every change in a collection. Each change stream response includes a resume token. In cases where the connection between the application and the database is temporarily lost, the application can send the last resume token it received and change streams will pick up right where the application left off. In cases of transient network errors or elections, the driver will automatically make an attempt to reestablish a connection using its cached copy of the most recent resume token. However, to resume after application failure, the application needs to persist the resume token, as drivers do not maintain state over application restarts.
  3. Total ordering
    MongoDB 3.6 has a global logical clock that enables the server to order all changes across a sharded cluster. Applications will always receive changes in the order they were applied to the database.
  4. Durability
    Change streams only include majority-committed changes. This means that every change seen by listening applications is durable in failure scenarios such as a new primary being elected.
  5. Security
    Change streams are secure – users are only able to create change streams on collections to which they have been granted read access.
  6. Ease of use
    Change streams are familiar – the API syntax takes advantage of the established MongoDB drivers and query language, and is independent of the underlying oplog format.
  7. Idempotence
    All changes are transformed into a format that’s safe to apply multiple times. Listening applications can use a resume token from any prior change stream event, not just the most recent one, because reapplying operations is safe and will reach the same consistent state.
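
The resumability point above can be sketched in a few lines of plain JavaScript. This is only an illustration under stated assumptions, not driver code: tokenStore is a hypothetical in-memory stand-in for durable storage, rememberToken and buildWatchOptions are made-up helper names, and the token value is invented. Only the event's _id field (the resume token) and the resumeAfter option come from change streams themselves.

```javascript
// Hypothetical in-memory stand-in for durable storage; a real application
// would persist the token somewhere that survives a restart.
const tokenStore = { lastToken: null };

// Remember the resume token carried by a change event (its _id field).
function rememberToken(changeEvent) {
  tokenStore.lastToken = changeEvent._id;
}

// Build the options object for watch(): resume if a token was saved,
// otherwise start a fresh stream.
function buildWatchOptions(savedToken) {
  return savedToken ? { resumeAfter: savedToken } : {};
}

// Example with an invented token value.
rememberToken({ _id: { _data: "example-token" }, operationType: "insert" });
const watchOptions = buildWatchOptions(tokenStore.lastToken);
// watchOptions is { resumeAfter: { _data: "example-token" } }
```

On restart, an application following this pattern would load the saved token and pass the resulting options as the second argument of watch().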

An example

Let’s imagine that we run a small grocery store. We want to build an application that notifies us every time we run out of stock for an item. We want to listen for changes on our stock collection and reorder once the quantity of an item gets too low.

{
    _id: "123UAWERXHZK4GYH",
    product: "pineapple",
    quantity: 3
}

Setting up the cluster

Replication is a core feature of MongoDB as a distributed database: it mirrors changes from the primary replica set member to secondary members, enabling applications to maintain availability in the event of failures or scheduled maintenance. Replication relies on the oplog (operation log). The oplog is a capped collection that records all of the most recent writes; secondary members use it to apply changes to their own local copy of the database. In MongoDB 3.6, change streams enable listening applications to easily leverage the same internal, efficient replication infrastructure for real-time processing.

To use change streams, we must first create a replica set. Download MongoDB 3.6 and after installing it, run the following commands to set up a simple, single-node replica set (for testing purposes).

mkdir -pv data/db 
mongod --dbpath ./data/db --replSet "rs"

Then in a separate shell tab, run: mongo

After the rs:PRIMARY> prompt appears, run: rs.initiate()

If you have any issues, check out our documentation on creating a replica set.

Seeing it in action

Now that our replica set is ready, let’s create a few products in a demo database using the following Mongo shell script:

Copy the code above into a createProducts.js text file and run it in a Terminal window with the following command: mongo createProducts.js.

Creating a change stream application

Now that we have documents being constantly added to our MongoDB database, we can create a change stream that monitors and handles changes occurring in our stock collection:

conn = new Mongo("mongodb://localhost:27017/demo?replicaSet=rs");
db = conn.getDB("demo");
collection = db.stock;

const changeStreamCursor = collection.watch();

pollStream(changeStreamCursor);

//this function polls a change stream and prints out each change as it comes in
function pollStream(cursor) {
  while (!cursor.isExhausted()) {
    if (cursor.hasNext()) {
      change = cursor.next();
      print(JSON.stringify(change));
    }
  }
  pollStream(cursor);
}

By using the parameterless watch() method (http://mongodb.github.io/node-mongodb-native/3.0/api/Collection.html#watch), this change stream will signal every write to the stock collection. In the simple example above, we’re logging the change stream's data to the console. In a real-life scenario, your listening application would do something more useful (such as replicating the data into a downstream system, sending an email notification, reordering stock...). Try inserting a document through the mongo shell and see the changes logged in the Mongo Shell.

Creating a targeted change stream

Remember that our original goal wasn’t to get notified of every single update in the stock collection, just when the inventory of each item in the stock collection falls below a certain threshold. To achieve this, we can create a more targeted change stream for updates that set the quantity of an item to a value no higher than 10. By default, update notifications in change streams only include the modified and deleted fields (i.e. the document “deltas”), but we can use the optional parameter fullDocument: "updateLookup" (https://docs.mongodb.com/manual/reference/method/db.collection.watch?jmp=blog#change-stream-with-full-document-update-lookup) to include the complete document within the change stream, not just the deltas.

const changeStream = collection.watch(
  [{
    $match: {
      $and: [
        { "updateDescription.updatedFields.quantity": { $lte: 10 } },
        { operationType: "update" }
      ]
    }
  }],
  {
    fullDocument: "updateLookup"
  }
);

Note that the fullDocument property above reflects the state of the document at the time the lookup was performed, not the state of the document at the exact time the update was applied. In other words, later changes may also be reflected in the fullDocument field. Since this use case only deals with updates, it was preferable to build match filters using updateDescription.updatedFields, instead of fullDocument.
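
To make the filter's behavior concrete, here is a plain-JavaScript predicate that mirrors the $match stage for a single change event. It is only a sketch for reasoning about which events pass: the server evaluates the real filter, isLowStockUpdate is a hypothetical helper name, and the sample events are hand-written with assumed values.

```javascript
// Mirrors: operationType == "update" AND
//          updateDescription.updatedFields.quantity <= 10
function isLowStockUpdate(event) {
  if (event.operationType !== "update") return false;
  const updated = (event.updateDescription || {}).updatedFields || {};
  return typeof updated.quantity === "number" && updated.quantity <= 10;
}

// Hand-written sample events (values are assumptions for illustration).
const lowStock = {
  operationType: "update",
  updateDescription: { updatedFields: { quantity: 7 }, removedFields: [] }
};
const priceChange = {
  operationType: "update",
  updateDescription: { updatedFields: { price: 2.5 }, removedFields: [] }
};
const newProduct = {
  operationType: "insert",
  fullDocument: { product: "kiwi", quantity: 3 }
};

const passing = [lowStock, priceChange, newProduct].filter(isLowStockUpdate);
// passing contains only lowStock
```

Note that an update which leaves quantity untouched does not pass, even if the stored quantity is already low, because the filter looks only at the fields modified by that update.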

The full Mongo shell script of our filtered change stream is available below:

conn = new Mongo("mongodb://localhost:27017/demo?replicaSet=rs");
db = conn.getDB("demo");
collection = db.stock;

let updateOps = {
  $match: {
    $and: [
      { "updateDescription.updatedFields.quantity": { $lte: 10 } },
      { operationType: "update" }
    ]
  }
};

const changeStreamCursor = collection.watch([updateOps]);

pollStream(changeStreamCursor);

//this function polls a change stream and prints out each change as it comes in
function pollStream(cursor) {
  while (!cursor.isExhausted()) {
    if (cursor.hasNext()) {
      change = cursor.next();
      print(JSON.stringify(change));
    }
  }
  pollStream(cursor);
}

In order to test our change stream above, let’s run the following script, which keeps decrementing the quantity of our current products until none of them is higher than 10:

conn = new Mongo("mongodb://localhost:27017/demo?replicaSet=rs");
db = conn.getDB("demo");
collection = db.stock;
let updatedQuantity = 1;

function sleepFor(sleepDuration) {
  var now = new Date().getTime();
  while (new Date().getTime() < now + sleepDuration) {
    /* do nothing */
  }
}

function update() {
  sleepFor(1000);
  res = collection.update({quantity:{$gt:10}}, {$inc: {quantity: -Math.floor(Math.random() * 10)}}, {multi: true});
  print(res)
  updatedQuantity = res.nMatched + res.nModified;
}

while (updatedQuantity > 0) {
  update();
}

You should now see the change stream window display the update shortly after the script above updates our products in the stock collection.

Resuming a change stream

In most cases, drivers have retry logic to handle loss of connections to the MongoDB cluster (such as timeouts, transient network errors, or elections). In cases where our application fails and wants to resume, we can use the optional parameter resumeAfter : <resumeToken>, as shown below:

conn = new Mongo("mongodb://localhost:27017/demo?replicaSet=rs");
db = conn.getDB("demo");
collection = db.stock;

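// busy-wait helper (same as in the update script above), used here to simulate an outage
function sleepFor(sleepDuration) {
  var now = new Date().getTime();
  while (new Date().getTime() < now + sleepDuration) {
    /* do nothing */
  }
}

const changeStreamCursor = collection.watch();
resumeStream(changeStreamCursor, true);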

function resumeStream(changeStreamCursor, forceResume = false) {
  let resumeToken;
  while (!changeStreamCursor.isExhausted()) {
    if (changeStreamCursor.hasNext()) {
      change = changeStreamCursor.next();
      print(JSON.stringify(change));
      resumeToken = change._id;
      if (forceResume === true) {
        print("\r\nSimulating app failure for 10 seconds...");
        sleepFor(10000);
        changeStreamCursor.close();
        const newChangeStreamCursor = collection.watch([], {
          resumeAfter: resumeToken
        });
        print("\r\nResuming change stream with token " + JSON.stringify(resumeToken) + "\r\n");
        resumeStream(newChangeStreamCursor);
      }
    }
  }
  resumeStream(changeStreamCursor, forceResume);
}

With this resumability feature, MongoDB change streams provide at-least-once semantics. It is therefore up to the listening application to make sure that it has not already processed the change stream events. This is especially important in cases where the application’s actions are not idempotent (for instance, if each event triggers a wire transfer).
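
One way for an application to cope with at-least-once delivery is to remember which events it has already handled. The sketch below rests on several assumptions: it keys events by the JSON form of their resume token (the _id field), it keeps the set of seen tokens in memory (a real application would need durable storage), and handleOnce and processedTokens are hypothetical names.

```javascript
// In-memory record of already-handled events (assumption: durable in real life).
const processedTokens = new Set();

// Runs handler(event) only the first time a given event is seen.
// Returns true if the handler ran, false if the event was a duplicate.
function handleOnce(event, handler) {
  const key = JSON.stringify(event._id);
  if (processedTokens.has(key)) return false;
  handler(event);
  processedTokens.add(key);
  return true;
}

// Simulate the same event being delivered twice after a resume.
let reorders = 0;
const duplicateEvent = { _id: { _data: "token-1" }, operationType: "update" };
const firstDelivery = handleOnce(duplicateEvent, () => { reorders += 1; });
const secondDelivery = handleOnce(duplicateEvent, () => { reorders += 1; });
// firstDelivery is true, secondDelivery is false, reorders is 1
```

Recording the token only after the handler returns means a crash in between still leads to a retry, so this pattern narrows the window for duplicates rather than eliminating it.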

All of the shell script examples above are available in the following GitHub repository. You can also find similar Node.js code samples here, where a more realistic technique is used to persist the last change stream token before it is processed.

Next steps

I hope that this introduction gets you excited about the power of change streams in MongoDB 3.6.

If you want to know more:

If you have any questions, feel free to file a ticket at https://jira.mongodb.org or connect with us through one of the social channels we use to interact with the developer community.

About the authors – Aly Cabral and Raphael Londner

Aly Cabral is a Product Manager at MongoDB. With a focus on Distributed Systems (i.e. Replication and Sharding), when she hears the word election she doesn’t think about politics. You can follow her or ask any questions on Twitter at @aly_cabral

Raphael Londner is a Principal Developer Advocate at MongoDB. Previously he was a developer advocate at Okta as well as a startup entrepreneur in the identity management space. You can follow him on Twitter at @rlondner

Integrating MongoDB Atlas with Heroku Private Spaces

Introduction

Heroku and MongoDB Atlas are the perfect fit for modern, cloud-based app development and deployment. Since its inception in 2007, Heroku has been a PaaS (Platform-as-a-Service) favorite of developer and operations teams thanks to its tight integration to CI tools and ease of app deployment. MongoDB is also a long-time favorite of developers who value increasing their productivity and decreasing application development cycles. MongoDB’s fully managed DBaaS (Database-as-a-Service), Atlas, is also popular among cloud DevOps teams, who are naturally demanding a strong integration between Heroku and MongoDB Atlas.

Today, we are happy to present a tutorial showcasing how to securely integrate Heroku with MongoDB Atlas.

Protecting your cloud data assets with MongoDB Atlas

MongoDB Atlas provides industry-grade, out-of-the-box security controls: encrypted data in-flight and at-rest, encrypted backups, authentication enabled by default, IP whitelisting and VPC Peering (with customer-owned AWS accounts) are strong safeguards MongoDB provides its users to ensure their data is safe in the cloud.

Companies hosting their MongoDB Atlas-backed applications on Heroku typically require that their data be only accessed by their applications. This has proved to be challenging in most Heroku deployments, which typically don’t offer guarantees that requests performed by their hosted applications originate from fixed IPs or a fixed range of IPs (defined as CIDR blocks).

With Heroku Private Spaces, however, companies can combine Heroku's powerful developer experience with enterprise-grade secure network topologies. More specifically, peering a Heroku Private Space with a MongoDB Atlas cluster running in AWS is a straightforward option to secure the communication between a Heroku-deployed application and a MongoDB Atlas database, by using MongoDB Atlas VPC Peering capabilities.

The tutorial below goes through the specific steps required to link a Heroku Private Space with a MongoDB Atlas project.

Initiating the VPC Peering request

The first step is to initiate the VPC Peering request on the Atlas side. To do so, it’s necessary to retrieve a few parameters from the Heroku Private Space, by using the Heroku CLI. After logging in with an account having access to a Private Space, use the spaces:peering:info command to retrieve the AWS information required by MongoDB Atlas:

heroku spaces:peering:info <your_private_space_name>

Heroku CLI console

In the screenshot above, I chose to use a Private Space hosted in the us-west-2 AWS region (aptly prefixed "oregon-*"), since my M10 MongoDB Atlas cluster is also deployed in that region.

Copy the AWS Account ID, AWS Region, AWS VPC ID and AWS VPC CIDR values from the Heroku console above.

Now, head over to the MongoDB Atlas website and navigate to the Security tab of your cluster (M10 or above and in the same region as your Heroku Private Space). Select the +New Peering Connection button and fill out the form with the values you previously copied:

MongoDB Atlas VPC Peering New Connection Form

Press the Initiate Peering button, and verify that the VPC Peering request appears in Atlas’ VPC Peering list (with a "Waiting for Approval" status):

MongoDB Atlas VPC Peering Waiting for Approval

Approving the VPC Peering request

Now that the VPC Peering request has been initiated on the MongoDB Atlas side, let’s approve it on the Heroku side. In the Heroku console, the following command should display the request we just created in MongoDB Atlas:

heroku spaces:peerings <your_private_space_name>

Heroku CLI VPC Peering pending requests

Take note of the PCX ID value of your VPC Peering request and pass it to the Heroku spaces:peerings:accept command:

heroku spaces:peerings:accept <your_PCX_ID> --space <your_private_space_name>

Heroku CLI Accept VPC Peering request

Verifying that VPC Peering works

The first step to verify that VPC Peering has been properly set up between your Heroku Private Space and MongoDB Atlas is to run the following Heroku command again:

heroku spaces:peerings <your_private_space_name>

Heroku CLI VPC Peering accepted requests

The peering connection should now appear as active.

In MongoDB Atlas, the peering connection should now also appear as available:

MongoDB Atlas Approved VPC Peerings

The next verification step would be to run a Heroku-deployed app connected to your MongoDB Atlas cluster and verify that you can read from or write to it.

For instance, you could clone this GitHub repository, customize its config.js file with your MongoDB Atlas connection string, and deploy its atlas-test branch to your Heroku Private Space using Heroku GitHub Deploys. Since Heroku automatically runs npm start for each Node-detected app, it will keep calling the produce.js script. As a result, documents should be created in the devices collection of a demo database in your Atlas cluster (if they aren’t, I recommend that you first verify that the CIDR block of your Heroku Private Space is present in the IP Whitelist of your MongoDB Atlas cluster).

Next steps

I hope that you found this Heroku-MongoDB Atlas integration tutorial useful. As next steps, I recommend the following:

Sign up for MongoDB Atlas if you don’t already use it.

About the Author - Raphael Londner

Raphael Londner is a Principal Developer Advocate at MongoDB, focused on cloud technologies such as Amazon Web Services, Microsoft Azure and Google Cloud Engine. Previously he was a developer advocate at Okta as well as a startup entrepreneur in the identity management space. You can follow him on Twitter at @rlondner

JSON Schema Validation and Expressive Query Syntax in MongoDB 3.6

One of MongoDB’s key strengths has always been developer empowerment: by relying on a flexible schema architecture, MongoDB makes it easier and faster for applications to move through the development stages from proof-of-concept to production and iterate over update cycles as requirements evolve.

However, as applications mature and scale, they tend to reach a stable stage where frequent schema changes are no longer critical or must be rolled out in a more controlled fashion, to prevent undesirable data from being inserted into the database. These controls are especially important when multiple applications write into the same database, or when analytics processes rely on predefined data structures to be accurate and useful.

MongoDB 3.2 was the first release to introduce Document Validation, one of the features that developers and DBAs who are accustomed to relational databases kept demanding. As MongoDB’s CTO, Eliot Horowitz, highlighted in Document Validation and What Dynamic Schemas means:

Along with the rest of the 3.2 "schema when you need it" features, document validation gives MongoDB a new, powerful way to keep data clean. These are definitely not the final set of tools we will provide, but is rather an important step in how MongoDB handles schema.

Announcing JSON Schema Validation support

Building upon MongoDB 3.2’s Document Validation functionality, MongoDB 3.6 introduces a more powerful way of enforcing schemas in the database, with its support of JSON Schema Validation, a specification which is part of IETF’s emerging JSON Schema standard.

JSON Schema Validation extends Document Validation in many different ways, including the ability to enforce schemas inside arrays and prevent unapproved attributes from being added. These are the new features we will focus on in this blog post, as well as the ability to build business validation rules.

Starting with MongoDB 3.6, JSON Schema is the recommended way of enforcing Schema Validation. The next section highlights the features and benefits of using JSON Schema Validation.

Switching from Document Validation to JSON Schema Validation

We will start by creating an orders collection (based on an example we published in the Document Validation tutorial blog post):

db.createCollection("orders", {
  validator: {
    item: { $type: "string" },
    price: { $type: "decimal" }
  }
});

With this document validation configuration, we not only make sure that both the item and price attributes are present in any order document, but also that item is a string and price a decimal (which is the recommended type for all currency and percentage values). Therefore, the following document cannot be inserted (because its price attribute holds the string "rogue" instead of a decimal):

db.orders.insert({
    "_id": 6666, 
    "item": "jkl", 
    "price": "rogue",
    "quantity": 1 });

However, the following document could be inserted (notice the misspelled "pryce" attribute):

db.orders.insert({
    "_id": 6667, 
    "item": "jkl", 
    "price": NumberDecimal("15.5"),
    "pryce": "rogue" });

Prior to MongoDB 3.6, you could not prevent the addition of misspelled or unauthorized attributes. Let’s see how JSON Schema Validation can prevent this behavior. To do so, we will use a new operator, $jsonSchema:

db.runCommand({
  collMod: "orders",
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: ["item", "price"],
      properties: {
        item: {
          bsonType: "string"
        },
        price: {
          bsonType: "decimal"
        }
      }
    }
  }
});

The JSON Schema above is the exact equivalent of the document validation rule we previously set on the orders collection. Let’s check that our schema has indeed been updated to use the new $jsonSchema operator by using the db.getCollectionInfos() method in the Mongo shell:

db.getCollectionInfos({name:"orders"})

This command prints out a wealth of information about the orders collection. For the sake of readability, below is the section that includes the JSON Schema:

...
"options" : {
    "validator" : {
        "$jsonSchema" : {
            "bsonType" : "object",
            "required" : [
                "item",
                "price"
            ],
            "properties" : {
                "item" : {
                    "bsonType" : "string"
                },
                "price" : {
                    "bsonType" : "decimal"
                }
            }
        }
    },
    "validationLevel" : "strict",
    "validationAction" : "error"
}
...

Now, let’s enrich our JSON schema a bit to make better use of its powerful features:

db.runCommand({
  collMod: "orders",
  validator: {
    $jsonSchema: {
      bsonType: "object",
      additionalProperties: false,
      required: ["item", "price"],
      properties: {
        _id: {},
        item: {
          bsonType: "string",
          description: "'item' must be a string and is required"
        },
        price: {
          bsonType: "decimal",
          description: "'price' must be a decimal and is required"
        },
        quantity: {
          bsonType: ["int", "long"],
          minimum: 1,
          maximum: 100,
          exclusiveMaximum: true,
          description:
            "'quantity' must be short or long integer between 1 and 99"
        }
      }
    }
  }
});

Let’s go through the additions we made to our schema:

  • First, note the use of the additionalProperties:false attribute: it prevents us from adding any attribute other than those mentioned in the properties section. For example, it will no longer be possible to insert data containing a misspelled pryce attribute. Using additionalProperties:false at the root level of the document also makes the declaration of the _id property mandatory: whether our insert code explicitly sets it or not, it is a field MongoDB requires and would automatically create, if not present. Thus, we must include it explicitly in the properties section of our schema.
  • Second, we have chosen to declare the quantity attribute as either a short or long integer between 1 and 99 (using the minimum, maximum and exclusiveMaximum attributes). Of course, because our schema only allows integers lower than 100, we could simply have set the bsonType property to int. But adding long as a valid type makes application code more flexible, especially if there might be plans to lift the maximum restriction.
  • Finally, note that the description attribute (present in the item, price, and quantity attribute declarations) is entirely optional and has no effect on the schema aside from documenting the schema for the reader.
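
The first two points can be illustrated with a small plain-JavaScript sketch. It is not a JSON Schema implementation and does not check BSON types; it only mirrors the additionalProperties:false rule and the quantity bounds, and the helper names unknownFields and quantityInRange are hypothetical.

```javascript
// Field names allowed by the properties section of the schema above.
const allowedFields = ["_id", "item", "price", "quantity"];

// Returns the fields that additionalProperties: false would reject.
function unknownFields(doc) {
  return Object.keys(doc).filter((name) => allowedFields.indexOf(name) === -1);
}

// Mirrors minimum: 1, maximum: 100, exclusiveMaximum: true for integer values.
function quantityInRange(quantity) {
  return Number.isInteger(quantity) && quantity >= 1 && quantity < 100;
}

const rejectedFields = unknownFields({ _id: 1, item: "jkl", pryce: 15.5 });
// rejectedFields is ["pryce"]
const rangeChecks = [1, 99, 100].map(quantityInRange);
// rangeChecks is [true, true, false]
```

Leaving "_id" out of allowedFields would make every document fail the first check, which is the reason the schema has to declare _id explicitly.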

With the schema above, the following documents can be inserted into our orders collection:

db.orders.insert({
    "item": "jkl",
    "price": NumberDecimal(15.50),
    "quantity": NumberInt(99)
});

db.orders.insert({
    "item": "jklm",
    "price": NumberDecimal(15.50),
    "quantity": NumberLong(99)
});

However, the following documents are no longer considered valid:

db.orders.insert({
    "item": "jkl",
    "price": NumberDecimal(15.50),
    "quantity": NumberInt(100)
});

db.orders.insert({
    "item": "jkl",
    "price": NumberDecimal(15.50),
    "quantity": "98"
});

db.orders.insert({
    "item": "jkl",
    "pryce": NumberDecimal(15.50),
    "quantity": NumberInt(99)
});

You probably noticed that our orders above are seemingly odd: they only contain one single item. More realistically, an order consists of multiple items and a possible JSON structure might be as follows:

{
    _id: 10000,
    total: NumberDecimal(141),
    VAT: 0.20,
    totalWithVAT: NumberDecimal(169),
    lineitems: [
        {
            sku: "MDBTS001",
            name: "MongoDB Stitch T-shirt",
            quantity: NumberInt(10),
            unit_price:NumberDecimal(9)
        },
        {
            sku: "MDBTS002",
            quantity: NumberInt(5),
            unit_price: NumberDecimal(10)
        }
    ]
}

With MongoDB 3.6, we can now control the structure of the lineitems array, for instance with the following JSON Schema:

db.runCommand({
    collMod: "orders",
    validator: {
      $jsonSchema: {
        bsonType: "object",       
        required: ["lineitems"],
        properties: {
        lineitems: {
              bsonType: ["array"],
              minItems: 1,
              maxItems:10,
              items: {
                  required: ["unit_price", "sku", "quantity"],
                  bsonType: "object",
                  additionalProperties: false,
                  properties: {
                      sku: {
                        bsonType: "string",
                        description: "'sku' must be a string and is required"
                      },
                      name: {
                        bsonType: "string",
                        description: "'name' must be a string"
                      },
                      unit_price: {
                        bsonType: "decimal",
                        description: "'unit_price' must be a decimal and is required"
                      },
                      quantity: {
                        bsonType: ["int", "long"],
                        minimum: 0,
                        maximum: 100,
                        exclusiveMaximum: true,
                        description:
                          "'quantity' must be a short or long integer in [0, 100)"
                      },
                  }                    
              }
          }
        }
      }
    }
  });

With the schema above, we enforce that any order inserted or updated in the orders collection contain a lineitems array of 1 to 10 documents that all have sku, unit_price and quantity attributes (with quantity required to be an integer).
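
As a rough illustration of what these array rules mean, the plain-JavaScript sketch below reports structural problems in a lineitems array. It is an approximation under stated assumptions: it checks only the array size, the required fields and unexpected fields, not BSON types or quantity bounds, and lineItemProblems is a hypothetical helper, not part of MongoDB.

```javascript
const requiredItemFields = ["unit_price", "sku", "quantity"];
const allowedItemFields = ["sku", "name", "unit_price", "quantity"];

// Returns a list of human-readable problems; an empty list means the array passes.
function lineItemProblems(lineitems) {
  const problems = [];
  if (!Array.isArray(lineitems) || lineitems.length < 1 || lineitems.length > 10) {
    problems.push("lineitems must be an array of 1 to 10 items");
    return problems;
  }
  lineitems.forEach((item, index) => {
    // Mirrors the "required" keyword.
    requiredItemFields.forEach((field) => {
      if (!(field in item)) problems.push("item " + index + ": missing " + field);
    });
    // Mirrors additionalProperties: false.
    Object.keys(item).forEach((field) => {
      if (allowedItemFields.indexOf(field) === -1) {
        problems.push("item " + index + ": unexpected " + field);
      }
    });
  });
  return problems;
}

// Two badly formed line items, written with plain numbers for brevity.
const badItems = [
  { sku: "MDBTS001", name: "MongoDB Stitch T-shirt", quantity: 10, price: 9 },
  { name: "MDBTS002", quantity: 5, unit_price: 10 }
];
const foundProblems = lineItemProblems(badItems);
// foundProblems reports a missing unit_price and an unexpected price for item 0,
// and a missing sku for item 1
```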

The schema would prevent inserting the following, badly formed document:

db.orders.insert({
        total: NumberDecimal(141),
        VAT: NumberDecimal(0.20),
        totalWithVAT: NumberDecimal(169),
        lineitems: [
            {
                sku: "MDBTS001",
                name: "MongoDB Stitch T-shirt",
                quantity: NumberInt(10),
                price: NumberDecimal(9) //this should be 'unit_price'
            },
            {
                name: "MDBTS002", //missing a 'sku' property
                quantity: NumberInt(5),
                unit_price: NumberDecimal(10)
            }
        ]
})

But it would allow inserting the following, schema-compliant document:

db.orders.insert({
        total: NumberDecimal(141),
        VAT: NumberDecimal(0.20),
        totalWithVAT: NumberDecimal(169),
        lineitems: [
            {
                sku: "MDBTS001",
                name: "MongoDB Stitch T-shirt",
                quantity: NumberInt(10),
                unit_price: NumberDecimal(9)
            },
            {
                sku: "MDBTS002",
                quantity: NumberInt(5),
                unit_price: NumberDecimal(10)
            }
        ]
})

However, if you pay close attention to the order above, you may notice that it contains a few errors:

  1. The totalWithVAT attribute value is incorrect (it should be equal to 141*1.20=169.2)

  2. The total attribute value is incorrect (it should be equal to the sum of each line item sub-total, i.e. 10*9+5*10=140)
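
Both figures can be double-checked with a few lines of plain JavaScript. This sketch uses plain integers rather than BSON types, and expresses money in cents and VAT as a whole percentage so that the arithmetic stays exact; those representations and the helper names are assumptions made only for this illustration.

```javascript
// Line items from the order above, with unit prices in cents (assumption of this sketch).
const orderLineItems = [
  { sku: "MDBTS001", quantity: 10, unitPriceCents: 900 },
  { sku: "MDBTS002", quantity: 5, unitPriceCents: 1000 }
];

// Sum of quantity * unit price over all line items.
function lineItemsTotalCents(lineitems) {
  return lineitems.reduce((sum, item) => sum + item.quantity * item.unitPriceCents, 0);
}

// total * (1 + VAT), with VAT given as a whole percentage.
function withVatCents(totalCents, vatPercent) {
  return (totalCents * (100 + vatPercent)) / 100;
}

const expectedTotalCents = lineItemsTotalCents(orderLineItems);
// expectedTotalCents is 14000 (140.00), not the 141 stored in the order
const expectedWithVatCents = withVatCents(14100, 20);
// expectedWithVatCents is 16920 (169.20), not the 169 stored in the order
```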

Is there any way to enforce that total and totalWithVAT values be correct using database validation rules, without relying solely on application logic?

Introducing MongoDB expressive query syntax

Adding more complex business validation rules is now possible thanks to the expressive query syntax, a new feature of MongoDB 3.6.

One of the objectives of the expressive query syntax is to bring the power of MongoDB’s aggregation expressions to MongoDB’s query language. An interesting use case is the ability to compose dynamic validation rules that compute and compare multiple attribute values at runtime. Using the new $expr operator, it is possible to validate the value of the totalWithVAT attribute with the following validation expression:

$expr: {
   $eq: [
     "$totalWithVAT",
     {$multiply: [
       "$total", 
       {$sum: [1, "$VAT"]}
     ]}
   ]
}

The above expression checks that the totalWithVAT attribute value is equal to total * (1+VAT). In its compact form, here is how we could use it as a validation rule, alongside our JSON Schema validation:

db.runCommand({
    collMod: "orders",
    validator: {
      $expr: {
        $eq: [
          "$totalWithVAT",
          {$multiply: ["$total", {$sum: [1, "$VAT"]}]}
        ]
      },
      $jsonSchema: {
        bsonType: "object",       
        required: ["lineitems"],
        properties: {
          lineitems: {
              bsonType: ["array"],
              minItems: 1,
              maxItems:10,
              items: {
                  required: ["unit_price", "sku", "quantity"],
                  bsonType: "object",
                  additionalProperties: false,
                  properties: {
                      sku: {
                        bsonType: "string",
                        description: "'sku' must be a string and is required"
                      },
                      name: {
                        bsonType: "string",
                        description: "'name' must be a string"
                      },
                      unit_price: {
                        bsonType: "decimal",
                        description: "'unit_price' must be a decimal and is required"
                      },
                      quantity: {
                        bsonType: ["int", "long"],
                        minimum: 0,
                        maximum: 100,
                        exclusiveMaximum: true,
                        description:
                          "'quantity' must be a short or long integer in [0, 100)"
                      },
                  }                    
              }
          }
        }
      }
    }
  });

With the validator above, the following insert operation is no longer possible:

db.orders.insert({
        total: NumberDecimal(141),
        VAT: NumberDecimal(0.20),
        totalWithVAT: NumberDecimal(169),
        lineitems: [
            {
                sku: "MDBTS001",
                name: "MongoDB Stitch T-shirt",
                quantity: NumberInt(10),
                unit_price: NumberDecimal(9)
            },
            {
                sku: "MDBTS002",
                quantity: NumberInt(5),
                unit_price: NumberDecimal(10)
            }
        ]
})

Instead, the totalWithVAT value must be adjusted according to our new VAT validation rule:

db.orders.insert({
    total: NumberDecimal(141),
    VAT: NumberDecimal(0.20),
    totalWithVAT: NumberDecimal(169.2),
    lineitems: [
            {
                sku: "MDBTS001",
                name: "MongoDB Stitch T-shirt",
                quantity: NumberInt(10),
                unit_price: NumberDecimal(9)
            },
            {
                sku: "MDBTS002",
                quantity: NumberInt(5),
                unit_price: NumberDecimal(10)
            }
        ]
})
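To see why the first insert is rejected while the second one is accepted, the VAT rule can be mirrored outside the database. The snippet below is only a sketch in plain JavaScript, not how MongoDB evaluates $expr: the vatRuleHolds helper and its field names are hypothetical, and amounts are expressed as integer cents with the VAT rate as an integer percentage (assumptions made here to sidestep floating-point rounding, which the decimal type avoids on the server).

```javascript
// Hypothetical helper mirroring the rule: totalWithVAT == total * (1 + VAT).
// Assumption: amounts are integer cents and the VAT rate is an integer percentage,
// so the comparison relies on exact integer arithmetic.
function vatRuleHolds(order) {
  // Both sides are scaled by 100 so the percentage never has to be divided.
  return order.totalWithVATCents * 100 === order.totalCents * (100 + order.vatPercent);
}

const rejected = { totalCents: 14100, vatPercent: 20, totalWithVATCents: 16900 };
const accepted = { totalCents: 14100, vatPercent: 20, totalWithVATCents: 16920 };

console.log(vatRuleHolds(rejected)); // false
console.log(vatRuleHolds(accepted)); // true
```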
If we also want to make sure that the total value is the sum of each order line item value (i.e. quantity*unit_price), the following expression should be used:

$expr: { 
    $eq: [
       "$total", 
       {$sum: {
          $map: {
             "input": "$lineitems",
             "as": "item",
             "in": { 
                "$multiply": [
                   "$$item.quantity", 
                   "$$item.unit_price"
                ]
             } 
          }
       }}
    ]
  }

The above expression uses the $map operator to compute each line item’s sub-total, then sums all these sub-totals, and finally compares it to the total value. To make sure that both the Total and VAT validation rules are checked, we must combine them using the $and operator. Finally, our collection validator can be updated with the following command:

db.runCommand({
    collMod: "orders",
    validator: {
      $expr:{ $and:[
          {$eq:[ 
            "$totalWithVAT",
                   {$multiply:["$total", {$sum:[1,"$VAT"]}]}
          ]}, 
          {$eq: [
                   "$total", 
                {$sum: {$map: {
                    "input": "$lineitems",
                    "as": "item",
                    "in":{"$multiply":["$$item.quantity","$$item.unit_price"]}
                   }}}
             ]}
        ]},
      $jsonSchema: {
        bsonType: "object",       
        required: ["lineitems", "total", "VAT", "totalWithVAT"],
        properties: {
          total: { bsonType: "decimal" },
          VAT: { bsonType: "decimal" },
          totalWithVAT: { bsonType: "decimal" },
          lineitems: {
              bsonType: ["array"],
              minItems: 1,
              maxItems:10,
              items: {
                  required: ["unit_price", "sku", "quantity"],
                  bsonType: "object",
                  additionalProperties: false,
                  properties: {
                      sku: {bsonType: "string"},
                      name: {bsonType: "string"},
                      unit_price: {bsonType: "decimal"},
                      quantity: {
                        bsonType: ["int", "long"],
                        minimum: 0,
                        maximum: 100,
                        exclusiveMaximum: true

                      },
                  }                    
              }
          }
        }
      }
    }
  });

Accordingly, we must update the total and totalWithVAT properties to comply with our updated schema and business validation rules (without changing the lineitems array):

db.orders.insert({
      total: NumberDecimal(140),
      VAT: NumberDecimal(0.20),
      totalWithVAT: NumberDecimal(168),
      lineitems: [
          {
              sku: "MDBTS001",
              name: "MongoDB Stitch T-shirt",
              quantity: NumberInt(10),
              unit_price: NumberDecimal(9)
          },
          {
              sku: "MDBTS002",
              quantity: NumberInt(5),
              unit_price: NumberDecimal(10)
          }
      ]
  })
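The combined rule can also be checked in application code before an insert is attempted. The following is a minimal sketch in plain JavaScript, assuming hypothetical helper and field names (lineItemsTotalCents, orderIsConsistent, unitPriceCents) and integer cents for all amounts; it mirrors the logic of the validator rather than reproducing how the server evaluates it.

```javascript
// Hypothetical helpers mirroring the combined $and validator in plain JavaScript.
// Assumption: amounts are integer cents and the VAT rate is an integer percentage.
function lineItemsTotalCents(lineitems) {
  // Same idea as $sum over $map: add up quantity * unit price for each item.
  return lineitems.reduce((sum, item) => sum + item.quantity * item.unitPriceCents, 0);
}

function orderIsConsistent(order) {
  const totalMatchesItems = order.totalCents === lineItemsTotalCents(order.lineitems);
  const vatMatches =
    order.totalWithVATCents * 100 === order.totalCents * (100 + order.vatPercent);
  return totalMatchesItems && vatMatches;
}

const lineitems = [
  { sku: "MDBTS001", quantity: 10, unitPriceCents: 900 },
  { sku: "MDBTS002", quantity: 5, unitPriceCents: 1000 }
];

const consistentOrder = { totalCents: 14000, vatPercent: 20, totalWithVATCents: 16800, lineitems };
const inconsistentOrder = { totalCents: 14100, vatPercent: 20, totalWithVATCents: 16920, lineitems };

console.log(lineItemsTotalCents(lineitems));       // 14000
console.log(orderIsConsistent(consistentOrder));   // true
console.log(orderIsConsistent(inconsistentOrder)); // false
```

The second order passes the VAT rule on its own but fails the combined check because its total does not match the line items.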

Next steps

With the introduction of JSON Schema Validation in MongoDB 3.6, database administrators are now better equipped to address data governance requirements coming from compliance officers or regulators, while still benefiting from MongoDB’s flexible schema architecture.

Additionally, developers will find the new expressive query syntax useful to keep their application code base simpler by moving business logic from the application layer to the database layer.

If you want to learn more about everything new in MongoDB 3.6, download our What’s New guide.

If you want to get deeper on the technical side, visit the Schema Validation and Expressive Query Syntax pages in our official documentation.

If you want to get more practical, hands-on experience, take a look at this JSON Schema Validation hands-on lab. You can try it right away on the MongoDB Atlas database service, which supports MongoDB 3.6 since its general availability date.

Last but not least, sign up for our free MongoDB 3.6 training from MongoDB University.

About the Author - Raphael Londner

Raphael Londner is a Principal Developer Advocate at MongoDB, focused on cloud technologies such as Amazon Web Services, Microsoft Azure and Google Cloud Engine. Previously he was a developer advocate at Okta as well as a startup entrepreneur in the identity management space. You can follow him on Twitter at @rlondner

Building a voice-activated movie search app powered by Amazon Lex, Lambda, and MongoDB Atlas - Part 3

It's that time of year again! This post is part of our Road to AWS re:Invent 2017 blog series. In the weeks leading up to AWS re:Invent in Las Vegas this November, we'll be posting about a number of topics related to running MongoDB in the public cloud. See all posts here.

Introduction

This is Part 3 of our Amazon Lex blog post series, part of our larger Road to re:Invent 2017 series. As a reminder, this tutorial is divided into 3 parts:

In this last blog post, we will deploy our Lambda function using the AWS Command Line Interface and verify that the bot fully works as expected. We’ll then review the code that makes up our Lambda function and explain how it works.

Let’s deploy our AWS Lambda function

Please follow the deployment steps available in this GitHub repository. I have chosen to use Amazon’s SAM Local tool to showcase how you can test your Lambda function locally using Docker, as well as package it and deploy it to an AWS account in just a few commands. However, if you’d like to deploy it manually to the AWS Console, you can always use this zip script to deploy it in pretty much the same way I did in this MongoDB Atlas with Lambda tutorial.

Let’s test our Lex bot (end-to-end)

Now that our Lambda fulfillment function has been deployed, let’s test our bot again in the Amazon Lex console and verify that we get the expected response. For instance, we might want to search for all the romance movies Jennifer Aniston starred in, a scenario we can test with the following bot conversation:

Amazon Lex Test Bot UI

As the screenshot above testifies, the Lex bot replies with the full list of Jennifer Aniston’s romance movies retrieved from our movies MongoDB database through our Lambda function. But how does our Lambda function process that request? We’ll dig deeper into our Lambda function code in the next section.

Let's dive into the Lambda function code

Our Lambda function always receives a JSON payload whose structure complies with Amazon Lex’ input event format (as this event.json file does):

{
  "messageVersion": "1.0",
  "invocationSource": "FulfillmentCodeHook",
  "userId": "user-1",
  "sessionAttributes": {},
  "bot": {
    "name": "SearchMoviesBot",
    "alias": "$LATEST",
    "version": "$LATEST"
  },
  "outputDialogMode": "Text",
  "currentIntent": {
    "name": "SearchMovies",
    "slots": {
      "castMember": "jennifer aniston",
      "year": "0",
      "genre": "Romance"
    }
  }
}

Note that the request contains the bot’s name (SearchMoviesBot) and the slot values representing the answers to the bot’s questions provided by the user.

The Lambda function starts with the exports.handler method which validates the bot’s name and performs some additional processing if the payload is received through Amazon API Gateway (this is only necessary if you want to test your Lambda function through Amazon API Gateway but is not relevant in an Amazon Lex context). It then calls the dispatch() method, which takes care of connecting to our MongoDB Atlas database and passing on the bot’s intent to the query() method, which we’ll explore in a second. Note that the dispatch() method uses the performance optimization technique I highlighted in Optimizing AWS Lambda performance with MongoDB Atlas and Node.js, namely not closing the database connection and using the callbackWaitsForEmptyEventLoop Lambda context property. This allows our bot to be more responsive after the first query fulfilled by the Lambda function.
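The flow described above can be summarized with a short sketch. It is not the repository's actual code: EXPECTED_BOT_NAME, createHandler and the injected dispatch function are hypothetical names used here for illustration, and the database work is replaced by a stub so the snippet runs on its own. The only AWS-specific element is the callbackWaitsForEmptyEventLoop context property mentioned in the text.

```javascript
// Sketch of the handler flow; names other than callbackWaitsForEmptyEventLoop are hypothetical.
const EXPECTED_BOT_NAME = "SearchMoviesBot";

function createHandler(dispatch) {
  return function handler(event, context, callback) {
    // Let Lambda return as soon as the callback fires, even if a cached
    // database connection keeps the event loop busy.
    context.callbackWaitsForEmptyEventLoop = false;

    if (!event.bot || event.bot.name !== EXPECTED_BOT_NAME) {
      return callback(new Error("Invalid bot name"));
    }
    // Hand the intent over to the code that queries the database.
    dispatch(event, callback);
  };
}

// Usage with a stubbed dispatch that does not touch any database.
const handler = createHandler((event, callback) =>
  callback(null, { intent: event.currentIntent.name })
);

let outcome;
handler(
  { bot: { name: "SearchMoviesBot" }, currentIntent: { name: "SearchMovies", slots: {} } },
  {},
  (err, res) => { outcome = err ? err.message : res.intent; }
);
console.log(outcome); // SearchMovies
```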

Let’s now take a closer look at the query() method, which is the soul and heart of our Lambda function. First, that method retrieves the cast member, movie genre, and movie release year. Because these values all come as strings and the movie release year is stored as an integer in MongoDB, the function must convert that value to an integer.
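As a minimal sketch of that first step (the parseSlots helper is a hypothetical name, not necessarily what the function uses), the slot values can be read from the event and the year converted with parseInt:

```javascript
// Hypothetical helper: extracts the slot values from a Lex event and
// converts the year, which arrives as a string, to an integer.
function parseSlots(event) {
  const slots = event.currentIntent.slots;
  return {
    castMember: slots.castMember,
    genre: slots.genre,
    year: parseInt(slots.year, 10) // "0" becomes 0, "2004" becomes 2004
  };
}

const parsed = parseSlots({
  currentIntent: { slots: { castMember: "jennifer aniston", year: "0", genre: "Romance" } }
});
console.log(parsed.year, typeof parsed.year); // 0 number
```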

We then build the query we will run against MongoDB:

var castArray = [castMember];

var matchQuery = {
    Cast: { $in: castArray },
    Genres: { $not: { $in: ["Documentary", "News", ""] } },
    Type: "movie"
  };

  if (genre != undefined && genre != allGenres) {
    matchQuery.Genres = { $in: [genre] };
    msgGenre = genre.toLowerCase();
  }

  if (year != undefined && !isNaN(year) && year > 1895) {
    matchQuery.Year = year;
    msgYear = year;
  }

We first restrict the query to items that are indeed movies (since the database also stores TV series) and we exclude some irrelevant movie genres such as the documentary and news genres. We also make sure we only query movies in which the cast member starred. Note that the $in operator expects an array, which is why we have to wrap our unique cast member into the castArray array. Since the cast member is the only mandatory query parameter, we add it first and then optionally add the Genres and Year parameters if the code determines that they were provided by the user (i.e. the user did not use the All and/or 0 escape values).

The query() method then goes on to define the default response message based on the user-provided parameters. This default response message is used if the query doesn’t return any matching element:

var resMessage = undefined;
  if (msgGenre == undefined && msgYear == undefined) {
    resMessage = `Sorry, I couldn't find any movie for ${castMember}.`;
  }
  if (msgGenre != undefined && msgYear == undefined) {
    resMessage = `Sorry, I couldn't find any ${msgGenre} movie for ${castMember}.`;
  }
  if (msgGenre == undefined && msgYear != undefined) {
    resMessage = `Sorry, I couldn't find any movie for ${castMember} in ${msgYear}.`;
  }
  if (msgGenre != undefined && msgYear != undefined) {
    resMessage = `Sorry, ${castMember} starred in no ${msgGenre} movie in ${msgYear}.`;
  }

The meat of the query() method happens next as the code performs the database query using 2 different methods: the classic db.collection.find() method and the db.collection.aggregate() method. The default method used in this Lambda function is the aggregate one, but you can easily test the find() method by setting the [aggregationFramework](https://github.com/rlondner/mongodb-awslex-searchmovies/blob/master/code/lambda.js#L112) variable to false.

In our specific use case (querying for one single cast member and returning a small number of documents), there likely won’t be any noticeable performance or programming logic impact. However, if we were to query for all the movies multiple cast members each starred in (i.e. the union of these movies, not the intersection), the aggregation framework query would be the clear winner. Indeed, let’s take a closer look at the find() query the code runs:

cursor = db.collection(moviesCollection)
      .find(matchQuery, { _id: 0, Title: 1, Year: 1 })
      .collation(collation)
      .sort({ Year: 1 });

It’s a fairly simple query that retrieves the movie’s title and year, sorted by year. Note that we also use the same { locale: "en", strength: 1 } collation we used to create the case-insensitive index on the Cast property in Part 2 of this blog post series. This is critical since the end user might not title case the cast member’s name (and Lex won’t do it for us either).
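To get a feel for what a strength-1 comparison means, here is a rough analogy in plain JavaScript. It is an illustration only, not MongoDB's implementation: Intl.Collator with its sensitivity option set to "base" compares base characters and ignores case (and accents), which is similar in spirit to the collation used above. The sameCastMember helper is a hypothetical name.

```javascript
// Analogy only: Intl.Collator with sensitivity "base" ignores case (and accents),
// similar in spirit to a collation with locale "en" and strength 1.
const baseCollator = new Intl.Collator("en", { sensitivity: "base" });

function sameCastMember(a, b) {
  return baseCollator.compare(a, b) === 0;
}

console.log(sameCastMember("jennifer aniston", "Jennifer Aniston")); // true
console.log(sameCastMember("jennifer aniston", "Jennifer Lopez"));   // false
```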

The simplicity of the query is in contrast to the relative complexity of the app logic we have to write to process the result set we get with the find() method:

for (var i = 0, len = results.length; i < len; i++) { 
    castMemberMovies += `${results[i].Title} (${results[i].Year}), `;
}

// remove the trailing comma and space
castMemberMovies = castMemberMovies.substring(0, castMemberMovies.length - 2);

moviesCount = results.length;
// results are sorted by year, so the first and last items give the year range
var minYear = results[0].Year;
var maxYear = results[results.length - 1].Year;
yearSpan = maxYear - minYear;

First, we have to iterate over all the results to concatenate their Title and Year properties into a legible string. This might be fine for 20 items, but if we had to process hundreds of thousands or millions of records, the performance impact would be very noticeable. We further have to remove the trailing comma and white space characters of the concatenated string since they’re in excess. We also have to manually retrieve the number of movies, as well as the low and high ends of the movie release years in order to compute the time span it took the cast member to shoot all these movies. This might not be particularly difficult code to write, but it’s clutter code that affects app clarity. And, as I wrote above, it definitely doesn’t scale when processing millions of items.

Contrast this app logic with the succinct code we have to write when using the aggregation framework method:

for (var i = 0, len = results.length; i < len; i++) { 
    castMemberMovies = results[i].allMovies;
    moviesCount = results[i].moviesCount;
    yearSpan = results[i].timeSpan;
}

The code is not only much cleaner and more concise now, it’s also more generic, as it can handle the situation where we want to process movies for each of multiple cast members. You can actually test this use case by uncommenting the following line earlier in the source code:

castArray = [castMember, "Angelina Jolie"]

and by testing it using this SAM script.

With the aggregation framework, we get the correct raw and final results without changing a single line of code:

MongoDB Aggregation Framework Query Response

However, the find() method’s post-processing requires some significant effort to fix this incorrect output (the union of comedy movies in which Angelina Jolie or Brad Pitt starred, all incorrectly attributed to Brad Pitt):

MongoDB Find Query Response

We were able to achieve this code conciseness and correctness by moving most of the post-processing logic to the database layer using a MongoDB aggregation pipeline:

cursor = db.collection(moviesCollection).aggregate(
      [
        { $match: matchQuery },
        { $sort: { Year: 1 } },
        unwindStage,
        castFilterStage,
        { $group: {
            _id: "$Cast",
            allMoviesArray: {$push: {$concat: ["$Title", " (", { $substr: ["$Year", 0, 4] }, ")"] } },
            moviesCount: { $sum: 1 },
            maxYear: { $last: "$Year" },
            minYear: { $first: "$Year" }
          }
        },
        {
          $project: {
            moviesCount: 1,
            timeSpan: { $subtract: ["$maxYear", "$minYear"] },
            allMovies: {
              $reduce: {
                input: "$allMoviesArray",
                initialValue: "",
                in: {
                  $concat: [
                    "$$value",
                    {
                      $cond: {
                        if: { $eq: ["$$value", ""] },
                        then: "",
                        else: ", "
                      }
                    },
                    "$$this"
                  ]
                }
              }
            }
          }
        }
      ],
      {collation: collation}

);

This aggregation pipeline is arguably more complex than the find() method discussed above, so let’s try to explain it one stage at a time (since an aggregation pipeline consists of stages that transform the documents as they pass through the pipeline):

  1. $match stage: performs a filter query to only return the documents we’re interested in (similarly to the find() query above).
  2. $sort stage: sorts the results by year ascending.
  3. $unwind stage: splits each movie document into multiple documents, one for each cast member in the original document. For each original document, this stage unwinds the Cast array of cast members and creates separate, unique documents with the same values as the original document, except for the Cast property, which is now a string value (equal to each cast member) in each unwound document. This stage is necessary to be able to group by only the cast members we’re interested in (especially if there is more than one). The output of this stage may contain documents with other cast members irrelevant to our query, so we must filter them out in the next stage.
  4. $match stage: filters the deconstructed documents from the $unwind stage by only the cast members we’re interested in. This stage essentially removes all the documents tagged with cast members irrelevant to our query.
  5. $group stage: groups movies by cast member (for instance, all movies with Brad Pitt and all movies with Angelina Jolie, separately). This stage also concatenates each movie title and release year into the Title (Year) format and adds it to an array called allMoviesArray (one such array for each cast member). This stage also computes a count of all movies for each cast member, as well as the earliest and latest year the cast member starred in a movie (of the requested movie genre, if any). This stage essentially performs most of the post-processing we previously had to do in our app code when using the find() method. Because that post-processing now runs at the database layer, it can take advantage of the database server’s computing power along with the distributed system nature of MongoDB (in case the collection is partitioned across multiple shards, each shard performs this stage independently of the other shards).
  6. $project stage: last but not least, this stage performs a $reduce operation (new in MongoDB 3.4) to concatenate our array of ‘Title (Year)’ strings into one single string we can use as is in the response message sent back to the bot.
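The pipeline above references unwindStage and castFilterStage by name only. The sketch below shows plausible definitions for them (an assumption, based on the stage descriptions above) and then emulates the whole pipeline in plain JavaScript on an in-memory array, so the effect of each stage can be followed without a database. The summarizeByCastMember helper and the sample documents are made up for illustration, and names are matched with exact case for simplicity, whereas the real query relies on the collation.

```javascript
// Assumption: plausible definitions for the two stages referenced by name in the pipeline.
const castArray = ["Brad Pitt", "Angelina Jolie"];
const unwindStage = { $unwind: "$Cast" };
const castFilterStage = { $match: { Cast: { $in: castArray } } };

// Plain JavaScript emulation of the sort, unwind, filter, group and project steps.
function summarizeByCastMember(movies, cast) {
  const groups = new Map();
  [...movies]
    .sort((a, b) => a.Year - b.Year)                                          // $sort
    .flatMap(movie => movie.Cast.map(member => ({ ...movie, Cast: member }))) // $unwind
    .filter(doc => cast.includes(doc.Cast))                                   // second $match
    .forEach(doc => {                                                         // $group
      const group = groups.get(doc.Cast) ||
        { titles: [], minYear: doc.Year, maxYear: doc.Year };
      group.titles.push(`${doc.Title} (${doc.Year})`);
      group.maxYear = doc.Year; // documents arrive sorted, so the last one is the latest
      groups.set(doc.Cast, group);
    });
  // $project: movie count, time span and a single comma-separated string
  return [...groups].map(([member, group]) => ({
    _id: member,
    moviesCount: group.titles.length,
    timeSpan: group.maxYear - group.minYear,
    allMovies: group.titles.join(", ")
  }));
}

// Made-up sample documents.
const sampleMovies = [
  { Title: "Movie B", Year: 2005, Cast: ["Brad Pitt", "Angelina Jolie"] },
  { Title: "Movie A", Year: 2001, Cast: ["Brad Pitt", "Someone Else"] }
];
const summary = summarizeByCastMember(sampleMovies, castArray);
console.log(summary);
```

Each cast member gets its own result document, which is why the application loop shown earlier needs no extra logic to handle several cast members.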

Once the matching movies have been retrieved from our MongoDB Atlas database, the code generates the proper response message and sends it back to the bot according to the expected Amazon Lex response format:

if (msgGenre != allGenres) {
    resMessage = `${toTitleCase(castMember)} starred in the following ` +
        `${moviesCount > 1 ? moviesCount + " " : ""}${msgGenre.toLowerCase()} movie(s)` +
        `${yearSpan > 0 ? " over " + yearSpan + " years" : ""}: ${castMemberMovies}`;
} else {
    resMessage = `${toTitleCase(castMember)} starred in the following ` +
        `${moviesCount > 1 ? moviesCount + " " : ""}movie(s)` +
        `${yearSpan > 0 ? " over " + yearSpan + " years" : ""}: ${castMemberMovies}`;
}
if (msgYear != undefined) {
    resMessage = `In ${msgYear}, ` + resMessage;
}

callback(
    close(sessionAttributes, "Fulfilled", {
        contentType: "PlainText",
        content: resMessage
    })
);
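The close() helper used in the callback is not shown in this post. A minimal sketch of what it could look like follows, assuming it simply wraps its arguments in a "Close" dialog action as defined by the Amazon Lex Lambda response format; the helper body and the sample message content are assumptions.

```javascript
// Sketch of a close() helper producing a Lex "Close" dialog action.
// The helper body is an assumption, since the post does not show it.
function close(sessionAttributes, fulfillmentState, message) {
  return {
    sessionAttributes,
    dialogAction: {
      type: "Close",
      fulfillmentState,
      message
    }
  };
}

const response = close({}, "Fulfilled", {
  contentType: "PlainText",
  content: "Sample response text"
});
console.log(response.dialogAction.type); // Close
```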

Our Jennifer Aniston fan can now be wowed by the completeness of our bot's response!

Amazon Lex MongoDB response

Wrap-up and next steps

This completes our Lex blog post series and I hope you enjoyed reading it as much as I did writing it.

In this final blog post, we tested and deployed a Lambda function to AWS using the SAM Local tool.

We also learned:

  • How a Lambda function processes a Lex request and responds to it using Amazon Lex’ input and output event formats.

  • How to use a case-insensitive index in a find() or aggregate() query.

  • How to make the most of MongoDB’s aggregation framework to move complexity from the app layer to the database layer.

As next steps, I suggest you now take a look at the AWS documentation to learn how to deploy your bot to Facebook Messenger, Slack or your own web site.

Happy Lex-ing!

About the Author - Raphael Londner

Raphael Londner is a Principal Developer Advocate at MongoDB, focused on cloud technologies such as Amazon Web Services, Microsoft Azure and Google Cloud Engine. Previously he was a developer advocate at Okta as well as a startup entrepreneur in the identity management space. You can follow him on Twitter at @rlondner

Building a voice-activated movie search app powered by Amazon Lex, Lambda, and MongoDB Atlas - Part 2

It's that time of year again! This post is part of our Road to AWS re:Invent 2017 blog series. In the weeks leading up to AWS re:Invent in Las Vegas this November, we'll be posting about a number of topics related to running MongoDB in the public cloud. See all posts here.

Introduction

This is Part 2 of our Amazon Lex blog post series, part of our larger Road to re:Invent 2017 series. If you haven’t read it yet, take a look at Part 1 for a brief overview of Amazon Lex and instructions to set up our movie database with MongoDB Atlas, our fully managed database service.

As a reminder, this tutorial is divided into 3 parts:

In this blog post, we will set up our Lex bot in the AWS Console and verify that its basic flow works as expected. We’ll implement the business logic (which leverages MongoDB) in Part 3 of this post series.

Amazon Lex bot setup instructions

In this section, we will go through the whole process of creating our SearchMovies bot while explaining the architectural decisions I made.

After signing in to the AWS Console, select the Lex service (in the Artificial Intelligence section) and press the Create button.

Select the Custom bot option and fill out the form parameters as follows:

  • Bot name: SearchMoviesBot

  • Output voice: None

  • Session timeout: 5

  • COPPA: No

Press the Create button at the bottom of the form.

A new page appears, where you can create an intent. Press the Create Intent button and in the Add intent pop-up page, click the Create new intent link and enter SearchMovies in the intent name field.

In the Slot types section, add a new slot type with the following properties:

  • Slot type name: MovieGenre

  • Description: Genre of the movie (Action, Comedy, Drama…)

  • Slot Resolution: Restrict to Slot values and Synonyms

  • Values: All, Action, Adventure, Biography, Comedy, Crime, Drama, Romance, Thriller

You can add synonyms to all these terms (which strictly match the possible values for movie genres in our sample database), but the most important one for which you will want to configure synonyms is the All value. We will use it as a keyword to avoid filtering on movie genre in scenarios when the user cannot qualify the genre of the movie they’re looking for or wants to retrieve all the movies for a specific cast member. Of course, you can explore the movie database on your own to identify and add other movie genres I haven’t listed above. Once you’re done, press the Save slot type button.

Next, in the Slots section, add the following 3 slots:

  1. genre

    1. Type: MovieGenre

    2. Prompt: I can help with that. What's the movie genre?

    3. Required: Yes

  2. castMember

    1. Type: AMAZON.Actor

    2. Prompt: Do you know the name of an actor or actress in that movie?

    3. Required: Yes

  3. year

    1. Type: AMAZON.FOUR_DIGIT_NUMBER

    2. Prompt: Do you know the year {castMember}'s movie was released? If not, just type 0

    3. Required: Yes

Press the Save Intent button and verify you have the same setup as shown in the screenshot below:

image alt text

The order of the slots is important here: once the user’s first utterance has been detected to match a Lex intent, the Lex bot will (by default) try to collect the slot values from the user in the priority order specified above, using the Prompt text of each slot. Note that you can use previously collected slot values in subsequent slot prompts, which I demonstrate in the ‘year’ slot. For instance, if the user answered Angelina Jolie to the castMember slot prompt, the year slot prompt will be: ‘Do you know the year Angelina Jolie’s movie was released? If not, just type 0’.

Note that it’s important that all the slots are marked Required. Otherwise, the only opportunity for the user to specify them is to mention them in the original utterance. As you will see below, we will provide such ability for Lex to identify slots right from the start, but what if the user chooses to kick off the process without mentioning any of them? If the slots aren’t required, they are by default overlooked by the Lex bot so we need to mark them Required to offer the user the option to define them.

But what if the user doesn’t know the answer to those prompts? We’ve handled this case as well by defining "default" values: All for the genre slot and 0 for the year slot. The only mandatory parameter the bot’s user must provide is the cast member’s name; the user can restrict the search further by providing the movie genre and release year.

Last, let’s add the following sample utterances that match what we expect the user will type (or say) to launch the bot:

  • I am looking for a movie

  • I am looking for a {genre} movie

  • I am looking for a movie released in {year}

  • I am looking for a {genre} movie released in {year}

  • In which movie did {castMember} play

  • In which movie did {castMember} play in {year}

  • In which {genre} movie did {castMember} play

  • In which {genre} movie did {castMember} play in {year}

  • I would like to find a movie

  • I would like to find a movie with {castMember}

Once the utterances are configured as per the screenshot below, press Save Intent at the bottom of the page and then Build at the top of the page. The process takes a few seconds, as AWS builds the deep learning model Lex will use to power our SearchMovies bot.

image alt text

It’s now time to test the bot we just built!

Testing the bot

Once the build process completes, the test window automatically shows up:

image alt text

Test the bot by typing (or saying) sentences that are close to the sample utterances we previously configured. For instance, you can type ‘Can you help me find a movie with Angelina Jolie?’ and see the bot recognize the sentence as a valid kick-off utterance, along with the {castMember} slot value (in this case, ‘Angelina Jolie’). This can be verified by looking at the Inspect Response panel:

image alt text

At this point, the movie genre hasn’t been specified yet, so Lex prompts for it (since it’s the first required slot). Once you answer that prompt, notice that Lex skips the second slot ({castMember}) since it already has that information.

Conversely, you can test that the ‘Can you help me find a comedy movie with angelina jolie?’ utterance will immediately prompt the user to fill out the {year} slot since both the {castMember} and {genre} values were provided in the original utterance:

image alt text
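The prompt flow observed in these tests can be pictured with a small sketch. It is an illustration of the behavior described in this post, not Lex's actual implementation: the slot order and prompt texts repeat the configuration above, while the nextSlotToElicit and renderPrompt helpers are hypothetical names.

```javascript
// Illustration only: mimics the slot-filling behavior described in this post.
const slotOrder = ["genre", "castMember", "year"];
const prompts = {
  genre: "I can help with that. What's the movie genre?",
  castMember: "Do you know the name of an actor or actress in that movie?",
  year: "Do you know the year {castMember}'s movie was released? If not, just type 0"
};

// Returns the first slot (in priority order) that has no value yet, or null when all are filled.
function nextSlotToElicit(slots) {
  return slotOrder.find(name => slots[name] == null) || null;
}

// Replaces {slotName} placeholders with the values collected so far.
function renderPrompt(slotName, slots) {
  return prompts[slotName].replace(/\{(\w+)\}/g, (match, name) =>
    slots[name] != null ? slots[name] : match
  );
}

const collected = { genre: "Comedy", castMember: "Angelina Jolie", year: null };
const next = nextSlotToElicit(collected);
console.log(next); // year
console.log(renderPrompt(next, collected));
// Do you know the year Angelina Jolie's movie was released? If not, just type 0
```

With only the castMember slot filled, the same helper returns genre first, which matches the behavior seen with the first test utterance.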

An important point to note here is that enumeration slot types (such as our MovieGenre type) are not case-sensitive: both “comedy” and “coMeDy” will resolve to “Comedy”. As a result, we will be able to use a regular index on the Genres property of our movies collection (as long as our enumeration values in Lex match the Genres case in our database).

However, the AMAZON.Actor type is case-sensitive: for instance, “angelina jolie” and “Angelina Jolie” are 2 distinct values for Lex. This means that we must define a case-insensitive index on the Cast property (don’t worry, there is already such an index, called ‘Cast_1’, in our sample movie database). Note that in order for queries to use that case-insensitive index, we’ll have to make sure our find() query specifies the same collation as the one used to create the index (locale “en” and strength 1). But don’t worry for now: I’ll make sure to point it out again in Part 3 when we review the code of our bot’s business logic (in the Lambda function we’ll deploy).

Summary

In this blog post, we created the SearchMovies Lex bot and tested its flow. More specifically, we:

  • Created a custom Lex slot type (MovieGenre)

  • Configured intent slots

  • Defined sample utterances (some of which use our predefined slots)

  • Tested our utterances and the specific prompt flows each of them starts

We also identified the case sensitivity of a built-in Lex slot that adds a new index requirement on our database.

In Part 3, we’ll get to the meat of this Lex blog post series and deploy the Lambda function that will allow us to complete our bot’s intended action (called ‘fulfillment’ in the Lex terminology).

Meanwhile, I suggest the following readings to further your knowledge of Lex and MongoDB:

About the Author - Raphael Londner

Raphael Londner is a Principal Developer Advocate at MongoDB, focused on cloud technologies such as Amazon Web Services, Microsoft Azure and Google Cloud Engine. Previously he was a developer advocate at Okta as well as a startup entrepreneur in the identity management space. You can follow him on Twitter at @rlondner.

Building a voice-activated movie search app powered by Amazon Lex, Lambda, and MongoDB Atlas - Part 1

It's that time of year again! This post is part of our Road to AWS re:Invent 2017 blog series. In the weeks leading up to AWS re:Invent in Las Vegas this November, we'll be posting about a number of topics related to running MongoDB in the public cloud. See all posts here.

Introduction

As we prepare to head out to Las Vegas for AWS re:Invent 2017, I thought it’d be a good opportunity to explore how to combine serverless and artificial intelligence services such as Lex and Lambda with MongoDB Atlas, our fully managed database service.

This tutorial is divided into 3 parts:

Since this is Part 1 of our blog series, let’s dig right into it now.

What is Amazon Lex?

Amazon Lex is a deep learning service provided by AWS to power conversational bots (more commonly known as “chatbots”), which can either be text- or voice-activated. It’s worth mentioning that Amazon Lex is the technology that powers Alexa, the popular voice service available with Amazon Echo products and mobile applications (hence the Lex name). Amazon Lex bots are built to perform actions (such as ordering a pizza), which in Amazon lingo are referred to as intents.

Note that each bot may perform multiple intents (such as “booking a flight” and “booking a hotel”), which can each be kicked off by distinct phrases (called utterances). This is where the Natural Language Understanding (NLU) power of Lex bots shines: you define a few sample utterances and let the Lex AI engine infer all the possible variations of these utterances (another interesting aspect of Lex’ AI engine is its Automatic Speech Recognition technology).

Let's illustrate this concept with a fictitious movie search scenario. If you create a SearchMovies intent, you may want to define a sample utterance as “I would like to search for a movie”, since you expect it to be what the user will say to express their movie search intention. But as you may well know, human beings have a tendency to express the same intention in many different ways, depending on their mood, cultural background, language proficiency, etc. So if the user types (or says) “I’d like to find a movie” or “I’d like to see a movie”, what happens? Well, you’ll find that Lex is smart enough to figure out that those phrases have the same meaning as “I would like to search for a movie” and consequently trigger the SearchMovies intent.

However, as our ancestors the Romans would say, dura lex sed lex: if the user’s utterance veers too far away from the sample utterances you have defined, Lex stops detecting the match. For instance, while “I’d like to search for a motion picture” and “I’d like to see a movie” are detected as matches of our sample utterance (“I would like to search for a movie”), “I’d like to see a motion picture” is not (at least in the tests I performed).

The interim conclusion I drew from that small experiment is that Lex’ AI engine is not yet ready to power Blade Runner’s replicants or Westworld’s hosts, but it definitely can be useful in a variety of situations (and I’m sure the AWS researchers are hard at work to refine it).

In order to fulfill the intent (such as providing the name of the movie the user is looking for), Amazon Lex would typically need some additional information, such as the name of a cast member, the movie genre and the movie release year. These additional parameters are called slots in the Lex terminology and they are collected one at a time, each after a specific Lex prompt.

For instance, after an utterance is detected to launch the SearchMovies intent, Lex may ask the following questions to fill all the required slots:

  • What's the movie genre? (to fill the genre slot)

  • Do you know the name of an actor or actress with a role in that movie? (to fill the castMember slot)

  • When was the movie released? (to fill the year slot)

Once all the required slots have been filled, Lex tries to fulfill the intent by passing all the slot values to some business logic code that performs the necessary action, e.g., searching for matching movies in a movie database or booking a flight. As expected, AWS promotes its own technologies, so Lex has built-in support for Lambda functions, but you can also “return parameters to the client”, which is the method you’ll want to use if you want to process the fulfillment in your application code (used in conjunction with the Amazon Lex Runtime Service API).
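To make the vocabulary concrete, here is a small C# sketch of the slot-filling loop described above. It is a toy model and not the Amazon Lex API: the SearchMoviesIntent class, its Prompts table and the NextPrompt method are hypothetical names chosen for illustration, while the slot names (genre, castMember, year) are the ones used in the questions listed earlier.

```csharp
using System.Collections.Generic;
using System.Linq;

// Toy model of an intent with three required slots (NOT the Amazon Lex API)
public class SearchMoviesIntent
{
    // Prompt asked for each required slot, in elicitation order
    private static readonly KeyValuePair<string, string>[] Prompts =
    {
        new KeyValuePair<string, string>("genre", "What's the movie genre?"),
        new KeyValuePair<string, string>("castMember", "Do you know the name of an actor or actress with a role in that movie?"),
        new KeyValuePair<string, string>("year", "When was the movie released?")
    };

    // Slot values collected so far, keyed by slot name
    public Dictionary<string, string> Slots { get; } = new Dictionary<string, string>();

    // Returns the prompt for the first unfilled slot, or null once every slot has a value
    public string NextPrompt()
    {
        return Prompts
            .Where(p => !Slots.ContainsKey(p.Key))
            .Select(p => p.Value)
            .FirstOrDefault();
    }

    // The intent can only be fulfilled when there is no prompt left to ask
    public bool ReadyForFulfillment => NextPrompt() == null;
}
```

In this model, fulfillment (the call to the business logic) would only be attempted once ReadyForFulfillment returns true, which mirrors the "one slot at a time" behavior described above.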

Demo bot scenario

Guess what? This will be a short section since the scenario we will implement in this blog post series is exactly the "fictitious example" I described above (what a coincidence!).

Indeed, we are going to build a bot allowing us to search for movies among those stored in a movie database. The data store we will use is a MongoDB database running in MongoDB Atlas, which is a good serverless fit for developers and DevOps folks who don’t want to set up and manage infrastructure.

Speaking of databases, it’s time for us to deploy our movie database to MongoDB Atlas before we start building our Lex bot.

Data setup and exploration

To set up the movie database, follow the instructions available in this GitHub repository.

Note that in order to keep the database dump file under GitHub's 100MB limit per file, the database I have included isn’t complete (for instance, it doesn’t include movies released prior to 1950 - sincere apologies to Charlie Chaplin fans).

Now, let’s take a look at a typical document in this database (Mr. & Mrs. Smith released in 2005):

{
    "_id" : ObjectId("573a13acf29313caabd287dd"),
    "ID" : 356910,
    "imdbID" : "tt0356910",
    "Title" : "Mr. & Mrs. Smith",
    "Year" : 2005,
    "Rating" : "PG-13",
    "Runtime" : "120 min",
    "Genre" : "Action, Comedy, Crime",
    "Released" : "2005-06-10",
    "Director" : "Doug Liman",
    "Writer" : "Simon Kinberg",
    "Cast" : [
        "Brad Pitt",
        "Angelina Jolie",
        "Vince Vaughn",
        "Adam Brody"
    ],
    "Metacritic" : 55,
    "imdbRating" : 6.5,
    "imdbVotes" : 311244,
    "Poster" : "http://ia.media-imdb.com/images/M/MV5BMTUxMzcxNzQzOF5BMl5BanBnXkFtZTcwMzQxNjUyMw@@._V1_SX300.jpg",
    "Plot" : "A bored married couple is surprised to learn that they are both assassins hired by competing agencies to kill each other.",
    "FullPlot" : "John and Jane Smith are a normal married couple, living a normal life in a normal suburb, working normal jobs...well, if you can call secretly being assassins \"normal\". But neither Jane nor John knows about their spouse's secret, until they are surprised to find each other as targets! But on their quest to kill each other, they learn a lot more about each other than they ever did in five (or six) years of marriage.",
    "Language" : "English, Spanish",
    "Country" : "USA",
    "Awards" : "9 wins & 17 nominations.",
    "lastUpdated" : "2015-09-04 00:02:26.443000000",
    "Type" : "movie",
    "Genres" : [
        "Action",
        "Comedy",
        "Crime"
    ]
}

The properties of interest to our use case are Cast, Genres and Year. Each movie record typically includes the principal cast members (Cast, stored in a string array), a list of genres the movie can be categorized in (Genres, stored in a string array) and a release year (Year, stored as a 4-digit integer).

These are the 3 properties we will leverage in our Lex bot (which we will create in Part 2) and consequently in our Lambda function (which we will build in Part 3) responsible for querying our movies database.

Storing the cast members and genres as string arrays is key to ensuring that our bot is responsive: arrays allow us to build small, multikey indexes that make our queries much faster than full collection scans (which regex queries would trigger).
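As an illustration, here is a minimal C# sketch (using the MongoDB.Bson package of the .NET driver) of index key specifications and of a query filter for these three fields. The class and method names are hypothetical, and the choice of indexes is an assumption made for this example rather than the ones used later in the series. Note that a compound index can include at most one array field per document, which is why Cast and Genres are not combined in a single index here.

```csharp
using MongoDB.Bson;

// Hypothetical helper: builds index keys and a filter for the Cast, Genres and Year fields
public static class MovieSearchSketch
{
    // Compound index on Cast (an array, hence a multikey index) and Year
    public static BsonDocument CastAndYearIndexKeys()
    {
        return new BsonDocument { { "Cast", 1 }, { "Year", 1 } };
    }

    // Separate multikey index on Genres, since two array fields cannot share a compound index
    public static BsonDocument GenresIndexKeys()
    {
        return new BsonDocument("Genres", 1);
    }

    // An equality condition on an array field matches documents whose array contains that value,
    // so no regular expression is needed to find a cast member or a genre
    public static BsonDocument BuildFilter(string castMember, string genre, int year)
    {
        return new BsonDocument
        {
            { "Cast", castMember },
            { "Genres", genre },
            { "Year", year }
        };
    }
}
```

For the sample document above, BuildFilter("Brad Pitt", "Action", 2005) produces a filter that matches Mr. & Mrs. Smith. With the .NET driver, such a BsonDocument can be passed as is to the Find method of an IMongoCollection<BsonDocument>.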

Summary

In this blog post, we introduced the core concepts of Amazon Lex and described the scenario of the Lex bot we’ll create in Part 2. We then deployed a sample movie database to MongoDB Atlas, explored the structure of a typical movie document and identified the fields we’ll use in the Lambda function we’ll build in Part 3. We then reviewed the benefits of using secondary indexes on these fields to speed up our queries.

I have only scratched the surface on all these topics, so here is some additional content for those of you who strive to learn more:

I hope this introduction to Lex has drawn enough interest for you to continue our journey with Part 2!

About the Author - Raphael Londner

Raphael Londner is a Principal Developer Advocate at MongoDB, focused on cloud technologies such as Amazon Web Services, Microsoft Azure and Google Cloud Engine. Previously he was a developer advocate at Okta as well as a startup entrepreneur in the identity management space. You can follow him on Twitter at @rlondner

Azure Tutorial: How to Integrate Azure Functions with MongoDB

As announced at MongoDB World ‘17, MongoDB Atlas, the database-as-a-service provided by the creators of MongoDB, is now available on the three major public cloud providers: Amazon Web Services, Google Cloud Platform and Microsoft Azure. In this blog post, I’ll cover the integration of Microsoft Azure Functions with MongoDB Atlas from a developer standpoint.

What are Azure Functions? In a nutshell, Azure Functions are the core building block of Microsoft’s serverless technologies, similar to AWS Lambda and Google Cloud Functions. You can write your Azure Functions code in a variety of languages and execute it at scale without worrying about the underlying virtual machine and operating system.

That’s not very different from other cloud vendor offerings, but what seems to be unique about Azure Functions is Microsoft’s promise to open source the Azure Functions Runtime, which means that we could theoretically run Azure functions anywhere: on Azure, on private data centers or in other cloud providers (at the time of writing, we have yet to see whether Microsoft will deliver on that promise). To their credit, Microsoft already provides tools to run and debug Azure functions locally, as we’ll see below.

In this post, I’ll introduce you to the process I recommend to create an Azure function with Visual Studio. I’ll also show how to leverage the .NET MongoDB driver to perform CRUD operations on a fully managed MongoDB Atlas cluster hosted on Azure.

Specifically, I will take you through the following steps:

  • Set up your development environment
  • Create an Azure function in Visual Studio
  • Write MongoDB CRUD queries
  • Connect the Azure function to MongoDB Atlas
  • Test the Azure function locally
  • Deploy the Azure function to Microsoft Azure
  • Configure and test the Azure function running on Microsoft Azure

Set up your development environment

First, you should make sure you have Visual Studio 2017 version 15.3 (or higher) installed on your Windows machine (the Community Edition is enough, but the Professional and Enterprise Editions also work with this tutorial). At the time of this writing, Visual Studio 2017 version 15.3 is in Preview and can be installed from https://visualstudio.com/vs/preview (VS 2017 v15.3 is required to run the Azure Functions Tools for Visual Studio 2017).

When installing Visual Studio 2017, make sure you select the Azure development workload (as well as any other workload you wish to install).

If you already installed Visual Studio 2017 but did not install the Azure development workload, you can do so by going to Settings → Apps & features, finding the Visual Studio 2017 app and selecting Modify.

At the time of writing, the Azure Functions Tools for Visual Studio 2017 must be installed as a Visual Studio extension. Please refer to Microsoft’s documentation for detailed installation instructions.

Create an Azure function in Visual Studio

Azure Functions offer a wide choice of programming languages, such as C#, F#, Node.js, Python, PHP and more. Given that C# is the language of choice of most Microsoft developers, this tutorial will focus on developing and deploying an Azure function using C#.

Open Visual Studio 2017 and select File → New → Project. Select the Azure Functions project type and give your project a name (for instance MongoDB.Tutorials.AzureFunctions).

Next, right-click on your project in the Solution Explorer and select Add → New item. Select the Azure Function item and give it a name such as CreateRestaurantFunction.cs (the file name doesn’t matter as much as the function name, as we’ll see below).

A new window appears and lets you choose the type of Azure Function you would like to create. Let’s keep it simple for now and choose the HttpTrigger function type, which will allow us to use our function as a REST API we’ll be able to call from cURL, Postman, or any custom application.

Select Anonymous in AccessRights (you will be able to change this later) and name the function CreateRestaurant.

Press the Create button. A CreateRestaurantFunction.cs file gets created with boilerplate code in the public static async Task Run(...) method. This method is invoked every time you call your function endpoint, which by default is http://localhost:7071/api/ followed by the function name on your local machine (http://localhost:7071/api/CreateRestaurant in our case).

Let’s take a closer look at that Run(...) method:

 [FunctionName("CreateRestaurant")]
 public static async Task<HttpResponseMessage> Run([HttpTrigger(AuthorizationLevel.Anonymous, "get", "post", Route = null)]HttpRequestMessage req, TraceWriter log)

  • First, note the FunctionName attribute that determines your function url (so use caution if you want to update it).
  • Second, the AuthorizationLevel is set to Anonymous as previously configured when creating the function. There are ways you can enforce authentication and authorization using OpenID Connect with Windows Azure Active Directory, Facebook, Google or Twitter (as acknowledged by Microsoft), but we’ll leave it to identity experts to fill in the gaps.
  • Third, the get and post parameters indicate that the function can be called with the GET or POST HTTP methods. Typically, the GET method is used to retrieve data, the POST method to create data, the PUT method to update data, and the DELETE method to, well, delete data, as you guessed.
  • Since we only want to use this function to create a document in MongoDB, let’s remove the get parameter and let’s keep the post parameter.
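After that change, the function could look like the sketch below. It assumes the function is named CreateRestaurant as chosen in the wizard; the CreateRestaurantFunction class name and the placeholder body are mine, since the actual body is written later in this tutorial.

```csharp
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Http;
using Microsoft.Azure.WebJobs.Host;

namespace MongoDB.Tutorials.AzureFunctions
{
    public static class CreateRestaurantFunction
    {
        [FunctionName("CreateRestaurant")]
        public static async Task<HttpResponseMessage> Run(
            // Only "post" is listed, so GET requests are no longer routed to this function
            [HttpTrigger(AuthorizationLevel.Anonymous, "post", Route = null)] HttpRequestMessage req,
            TraceWriter log)
        {
            log.Info("CreateRestaurant function processed a request.");
            // Placeholder body (an assumption for this sketch): read the payload and acknowledge it
            string body = await req.Content.ReadAsStringAsync().ConfigureAwait(false);
            return req.CreateResponse(HttpStatusCode.OK, $"Received {body.Length} characters");
        }
    }
}
```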

    We’ll create another function that we’ll use to retrieve, update and delete. However, we’ll use the HttpTriggerWithParameters function type as it will allow us to provide parameters (such as the restaurant id to retrieve, update or delete) as part of the API endpoint url. Following the same process as above, create another RestaurantFunction.cs Azure function file and name that function Restaurant.

    Write CRUD queries to MongoDB Atlas

    Now that our 2 functions are set up, let’s move to the meat of this blog post: interacting with MongoDB Atlas by writing CRUD queries. In order to write any C# application connected to a MongoDB database, you need the MongoDB .NET driver. An Azure function is no exception to the rule, so let’s go ahead and install it with NuGet. Right-click on your Visual Studio project and select Manage NuGet Packages. In the NuGet Package Manager, select the Browse tab and search for MongoDB. In the search results, select MongoDB.Driver, make sure you choose the latest version of the driver (v2.4.4 at the time of writing) and press Install.

    The MongoDB .NET driver requires dependent assemblies (MongoDB.Bson and MongoDB.Driver.Core) so accept the Apache 2.0 license for these 3 libraries:

    The MongoDB driver also depends on the System.Runtime.InteropServices.RuntimeInformation assembly (v4.0.0), which the current version of the Azure Function Tools doesn’t automatically import with the MongoDB.Driver package (as other Visual Studio project types do). We therefore need to explicitly import it with NuGet as well:

    Once that’s done, edit the CreateRestaurantFunction.cs file, and add the following using statements:

    using MongoDB.Driver;
    using MongoDB.Bson;
    using MongoDB.Bson.Serialization;

    Next, delete the content of the Run(...) method in the CreateRestaurantFunction.cs file and replace it with the following:

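    log.Info("CreateRestaurant function processed a request.");
    var itemId = ObjectId.Empty;
    var jsonContent = string.Empty;
    try
    {
        //retrieving the content from the request's body
        jsonContent = await req.Content.ReadAsStringAsync().ConfigureAwait(false);
        //assuming we have valid JSON content, convert to BSON
        var doc = BsonSerializer.Deserialize<BsonDocument>(jsonContent);
        var collection = RestaurantsCollection.Instance;
        //store new document in MongoDB collection
        await collection.InsertOneAsync(doc).ConfigureAwait(false);
        //retrieve the _id property of the created document
        itemId = (ObjectId)doc["_id"];
    }
    catch (System.FormatException fex)
    {
        //thrown if there's an error in the parsed JSON
        log.Error($"A format exception occurred, check the JSON document is valid: {jsonContent}", fex);
    }
    catch (System.TimeoutException tex)
    {
        log.Error("A timeout error occurred", tex);
    }
    catch (MongoException mdbex)
    {
        log.Error("A MongoDB error occurred", mdbex);
    }
    catch (System.Exception ex)
    {
        log.Error("An error occurred", ex);
    }
    return itemId == ObjectId.Empty
        ? req.CreateResponse(HttpStatusCode.BadRequest, "An error occurred, please check the function log")
        : req.CreateResponse(HttpStatusCode.OK, $"The created item's _id is  {itemId}");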

    Replace the entire content of the RestaurantFunction.cs file with the following code:

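    using System;
    using System.Net;
    using System.Net.Http;
    using System.Threading.Tasks;
    using Microsoft.Azure.WebJobs;
    using Microsoft.Azure.WebJobs.Extensions.Http;
    using Microsoft.Azure.WebJobs.Host;
    using MongoDB.Bson;
    using MongoDB.Bson.Serialization;
    using MongoDB.Driver;

    namespace MongoDB.Tutorials.AzureFunctions
    {
        public static class RestaurantFunction
        {
            [FunctionName("Restaurant")]
            public static Task<HttpResponseMessage> Run([HttpTrigger(AuthorizationLevel.Anonymous, "get", "patch", "delete", Route = "Restaurant/id/{restaurantId}")]HttpRequestMessage req, string restaurantId, TraceWriter log)
            {
                log.Info("Restaurant function processed a request.");
                try
                {
                    var collection = RestaurantsCollection.Instance;
                    switch (req.Method.Method)
                    {
                        case "GET":
                            return RunGet(req, restaurantId, log, collection);
                        case "PATCH":
                            return RunPatch(req, restaurantId, log, collection);
                        case "DELETE":
                            return RunDelete(req, restaurantId, log, collection);
                        default:
                            return Task.FromResult(req.CreateResponse(HttpStatusCode.MethodNotAllowed));
                    }
                }
                catch (System.Exception ex)
                {
                    log.Error("An error occurred", ex);
                    return Task.FromResult(req.CreateResponse(HttpStatusCode.InternalServerError));
                }
            }

            private static async Task<HttpResponseMessage> RunGet(HttpRequestMessage req, string restaurantId, TraceWriter log, IMongoCollection<BsonDocument> collection)
            {
                var filter = Builders<BsonDocument>.Filter.Eq("restaurant_id", restaurantId);
                var results = await collection.Find(filter).ToListAsync().ConfigureAwait(false);
                if (results.Count > 0)
                {
                    return req.CreateResponse(HttpStatusCode.OK, results[0].ToString());
                }

                return req.CreateResponse(HttpStatusCode.NotFound, $"A restaurant with id {restaurantId} could not be found");
            }

            private static async Task<HttpResponseMessage> RunDelete(HttpRequestMessage req, string restaurantId, TraceWriter log, IMongoCollection<BsonDocument> collection)
            {
                var filter = Builders<BsonDocument>.Filter.Eq("restaurant_id", restaurantId);
                var result = await collection.FindOneAndDeleteAsync(filter).ConfigureAwait(false);
                if (result != null)
                {
                    return req.CreateResponse(HttpStatusCode.OK);
                }

                return req.CreateResponse(HttpStatusCode.NotFound, $"A restaurant with id {restaurantId} could not be deleted");
            }

            private static async Task<HttpResponseMessage> RunPatch(HttpRequestMessage req, string restaurantId, TraceWriter log, IMongoCollection<BsonDocument> collection)
            {
                var filter = Builders<BsonDocument>.Filter.Eq("restaurant_id", restaurantId);
                string jsonContent = await req.Content.ReadAsStringAsync();
                BsonDocument changesDocument;
                try
                {
                    changesDocument = BsonSerializer.Deserialize<BsonDocument>(jsonContent);
                }
                catch (System.FormatException)
                {
                    var msg = $"The JSON content is invalid: {jsonContent}";
                    log.Info(msg);
                    return req.CreateResponse(HttpStatusCode.BadRequest, msg);
                }

                UpdateDefinition<BsonDocument> update = null;
                foreach (var change in changesDocument)
                {
                    if (update == null)
                    {
                        update = Builders<BsonDocument>.Update.Set(change.Name, change.Value);
                    }
                    else
                    {
                        update = update.Set(change.Name, change.Value);
                    }
                }

                //you can also use the simpler form below if you're OK with bypassing the UpdateDefinitionBuilder (and trust the JSON string to be fully correct)
                //update = new BsonDocument("$set", changesDocument);

                //The following lines could be uncommented out for debugging purposes
                //var registry = collection.Settings.SerializerRegistry;
                //var serializer = collection.DocumentSerializer;
                //var rendered = update.Render(serializer, registry).ToJson();

                var updateResult = await collection.UpdateOneAsync(filter, update).ConfigureAwait(false);

                if (updateResult.ModifiedCount == 1)
                {
                    return req.CreateResponse(HttpStatusCode.OK);
                }
                return req.CreateResponse(HttpStatusCode.NotFound, $"A restaurant with id {restaurantId} could not be updated");
            }
        }
    }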

    Note that I made several changes to that function. Namely, I changed the signature of the Run() method to make it asynchronous and I enabled it to handle GET, PATCH and DELETE http requests (as mentioned above).

    Note also the RunPatch() section of the code, where I make use of the BsonDocument and UpdateDefinition objects to update an existing document given an arbitrary (but valid) JSON string:

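    var changesDocument = BsonSerializer.Deserialize<BsonDocument>(jsonContent);
    UpdateDefinition<BsonDocument> update = null;
    foreach (var change in changesDocument)
    {
        if (update == null)
        {
            var builder = Builders<BsonDocument>.Update;
            update = builder.Set(change.Name, change.Value);
        }
        else
        {
            update = update.Set(change.Name, change.Value);
        }
    }

    var updateResult = await collection.UpdateOneAsync(filter, update);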

    This would allow me to update the cuisine and borough properties of an existing document by sending the following JSON to the /api/Restaurant/id/[restaurantId] endpoint of the function:

    { 
        "cuisine": "Italian",
        "borough": "Manhattan"
    }
    

    The same piece of code would also be able to update the zipcode and building properties of the address sub-document by setting JSON attributes with dot notation:

    
    { 
        "address.zipcode": "99999",
        "address.building": "999"
    }
    

    Note that if you prefer to use sub-document notation, such as

    
    { 
        "address":
        {
        "zipcode": "99999",
        "building": "999"
        }
    }
    

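    then you can use a simpler form, but keep in mind that $set then replaces the entire address sub-document (use the dot notation shown above if you need to preserve the sub-document attributes you don’t update):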

    update = new BsonDocument("$set", changesDocument);
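Since a $set on a whole sub-document replaces that sub-document, one way to accept the nested notation while still leaving the other address attributes (such as street and coord) untouched is to flatten the incoming JSON into dot notation before building the update. Here is a minimal sketch of that idea; the UpdateFlattener class and its Flatten method are hypothetical helpers, not part of the MongoDB .NET driver.

```csharp
using MongoDB.Bson;

// Hypothetical helper: converts nested sub-documents into dot-notation field names
public static class UpdateFlattener
{
    // Turns { "address": { "zipcode": "99999" } } into { "address.zipcode": "99999" }
    public static BsonDocument Flatten(BsonDocument source, string prefix = "")
    {
        var result = new BsonDocument();
        foreach (var element in source)
        {
            var name = prefix.Length == 0 ? element.Name : prefix + "." + element.Name;
            if (element.Value.IsBsonDocument && element.Value.AsBsonDocument.ElementCount > 0)
            {
                // Recurse into non-empty sub-documents and copy their flattened fields
                foreach (var nested in Flatten(element.Value.AsBsonDocument, name))
                {
                    result.Add(nested.Name, nested.Value);
                }
            }
            else
            {
                // Scalars, arrays and empty sub-documents are kept as leaf values
                // (an array is therefore still replaced as a whole)
                result.Add(name, element.Value);
            }
        }
        return result;
    }
}
```

The update could then be built with update = new BsonDocument("$set", UpdateFlattener.Flatten(changesDocument)); so that only address.zipcode and address.building are modified.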

    At this point, our function doesn’t compile because it is missing an additional, singleton RestaurantsCollection class, responsible for instantiating a MongoDB database connection and returning a reference to the restaurants collection. The purpose of this class is two-fold:

    1. Encapsulate similar code we’d otherwise have to write for each function
    2. Only instantiate a new database connection if none already exists.

    Indeed, an Azure function is able to reuse the underlying operating system that hosts its runtime across calls that are close enough in time, which allows us to reuse any database connection we’ve already established in a previous call.

    Add a RestaurantsCollection.cs class file and paste the following content into it:

    using MongoDB.Bson;
    using MongoDB.Driver;
    using System;

    namespace MongoDB.Tutorials.AzureFunctions
    {
        public sealed class RestaurantsCollection
        {
            private static volatile IMongoCollection<BsonDocument> instance;
            private static object syncRoot = new Object();

            private RestaurantsCollection() { }

            public static IMongoCollection<BsonDocument> Instance
            {
                get
                {
                    if (instance == null)
                    {
                        lock (syncRoot)
                        {
                            if (instance == null)
                            {
                                string strMongoDBAtlasUri = System.Environment.GetEnvironmentVariable("MongoDBAtlasURI");
                                var client = new MongoClient(strMongoDBAtlasUri);
                                var db = client.GetDatabase("travel");
                                instance = db.GetCollection<BsonDocument>("restaurants");
                            }
                        }
                    }
                    return instance;
                }
            }
        }
    }

    Connect the Azure function to MongoDB Atlas

    The last step is to configure our function so it can connect to MongoDB. To do so, edit the local.settings.json file and add a MongoDBAtlasURI attribute inside the Values nested document:
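When the function runs locally, the entries of the Values section in local.settings.json are exposed to it as environment variables, which is what the RestaurantsCollection class relies on. As a small illustration, the sketch below reads the MongoDBAtlasURI setting and fails early with a readable message if it is missing or malformed. The AtlasSettings class is a hypothetical helper that is not part of the tutorial's source code.

```csharp
using System;

// Hypothetical helper: validates the MongoDBAtlasURI setting before it is handed to the driver
public static class AtlasSettings
{
    public const string UriSettingName = "MongoDBAtlasURI";

    public static string GetConnectionString()
    {
        // Values from local.settings.json (or the Azure application settings) are read as environment variables
        string uri = Environment.GetEnvironmentVariable(UriSettingName);
        if (string.IsNullOrWhiteSpace(uri))
        {
            throw new InvalidOperationException($"The {UriSettingName} setting is missing or empty.");
        }

        // A MongoDB connection string starts with the mongodb:// (or mongodb+srv://) scheme, not http://
        if (!uri.StartsWith("mongodb://", StringComparison.Ordinal) &&
            !uri.StartsWith("mongodb+srv://", StringComparison.Ordinal))
        {
            throw new InvalidOperationException($"The {UriSettingName} setting must start with mongodb:// or mongodb+srv://.");
        }

        return uri;
    }
}
```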

    While you test your Azure function locally, you can use your local MongoDB instance and specify mongodb://localhost:27017. However, since you will publish your Azure function to Microsoft Azure, I recommend that you use MongoDB Atlas to host your MongoDB database cluster since MongoDB Atlas is by default secure, yet publicly available (to configurable IP addresses and specific database users). If you don’t have a MongoDB Atlas cluster yet, sign up now and set up a MongoDB Atlas database on Microsoft Azure.

    You can retrieve your cluster’s connection string from the MongoDB Atlas portal by pressing the Connect button on your cluster page:

    Next, you should press the Copy button to copy your MongoDB Atlas URI to your clipboard:

    Then paste it to the local.settings.json file and modify it to match your needs. If you chose Microsoft Azure to host your 3-node MongoDB Atlas replica set, the format of your connection string is the following:

    mongodb://<USERNAME>:<PASSWORD>@<CLUSTERNAME_LOWERCASE>-shard-00-00-<SUFFIX>.azure.mongodb.net:27017,<CLUSTERNAME_LOWERCASE>-shard-00-01-<SUFFIX>.azure.mongodb.net:27017,<CLUSTERNAME_LOWERCASE>-shard-00-02-<SUFFIX>.azure.mongodb.net:27017/<DATABASE>?ssl=true&replicaSet=<CLUSTERNAME>-shard-0&authSource=admin

    While you’re at it, press the Add current IP address button to allow your current machine (or virtual machine) to access your MongoDB Atlas database:

    Test the Azure function locally

    It’s now time to run and test our Azure function. Launch the Azure Functions debugger in Visual Studio; the following command line prompt should appear:

    Now run the following cURL commands (cURL is available with Cygwin, for instance) or use Postman to craft the equivalent commands.

    To create a restaurant document, run:

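    curl -iX POST http://localhost:7071/api/CreateRestaurant -H 'content-type: application/json' -d '{
        "address" : {
            "building" : "2780",
            "coord" : [
                -73.98241999999999,
                40.579505
            ],
            "street" : "Stillwell Avenue",
            "zipcode" : "11224"
        },
        "borough" : "Brooklyn",
        "cuisine" : "American",
        "name" : "Riviera Caterer",
        "restaurant_id" : "40356018"
    }'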

    If you used Postman, you should get a 200 OK result:

    Now, try to retrieve the restaurant by running the following curl command:

    curl -iX GET http://localhost:7071/api/Restaurant/id/40356018

    Next, you can try to update the restaurant by issuing a PATCH request:

    curl -iX PATCH http://localhost:7071/api/Restaurant/id/40356018 -H 'content-type: application/json' -d '{
        "address.zipcode" : "10036",
        "borough" : "Manhattan",
        "cuisine" : "Italian"
    }'

    Last, delete the restaurant with a DELETE Http request:

    curl -iX DELETE http://localhost:7071/api/Restaurant/id/40356018

    Deploy the Azure function to Microsoft Azure

    Now that we’ve verified that all the tests above are successful, let’s move forward and deploy our function to Azure. You can deploy your Azure function using Continuous Integration (CI) tools such as Visual Studio Team Services (VSTS) or the Azure CLI, but we’ll take a simpler approach in this post by using the Graphical User Interface available in Visual Studio 2017.

    Right-click on the MongoDB.Tutorials.AzureFunctions project, select Azure Function App and press Publish.

    The Create App Service wizard appears and lets you configure your Azure App Service name, as well as the subscription, resource group, app service plan and storage account you want to use for that function:

    When you’re done configuring all these parameters, press Create.

    Configure and test the Azure function running on Microsoft Azure

    The Visual Studio deployment process publishes pretty much all of your Azure Function artifacts, except the local.settings.json file where we configured the MongoDB connection string.

    In order to create it on Azure, head over to your Azure function in your Azure portal and select the Application Settings link:

    In the App Settings section, add the MongoDBAtlasURI key and set the value to a valid MongoDB Atlas connection string.

    We’re not done yet though. Unless you have allowed any IP address to have access to your database cluster, we must configure the IP Whitelist of your Atlas cluster to let Microsoft Azure connect to it. To do so, head over to the Platform features tab and select Properties.

    In the properties tab, copy the comma-delimited list of IP addresses and enter each one of them in your Atlas cluster’s IP whitelist.
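If you would rather not retype each address by hand, a few lines of C# can split and validate the copied list first. This is only a convenience sketch: the WhitelistHelper class is hypothetical, and the addresses in the usage note below come from a range reserved for documentation, not from a real Azure deployment.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;

// Hypothetical helper: turns the comma-delimited list copied from the Azure portal
// into individual, validated entries for the Atlas IP whitelist
public static class WhitelistHelper
{
    public static List<string> ToWhitelistEntries(string commaDelimitedIps)
    {
        return commaDelimitedIps
            .Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries)
            // IPAddress.Parse throws a FormatException if an entry is not a valid address
            .Select(ip => IPAddress.Parse(ip.Trim()).ToString())
            // The same address may be listed more than once, so keep each one only once
            .Distinct()
            .ToList();
    }
}
```

For instance, WhitelistHelper.ToWhitelistEntries("203.0.113.10, 203.0.113.11,203.0.113.10") returns the two distinct addresses 203.0.113.10 and 203.0.113.11.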

    Once you’re done, your cluster’s IP Whitelist should have 5 Azure IP addresses along with your local machine’s IP address:

    You can now replace the http://localhost:7071 url you used in your cURL scripts or Postman with the url of your published Azure function (such as https://restaurantappfunction.azurewebsites.net) to test your published Azure function. The screenshot below shows the successful result of an /api/CreateRestaurant call in Postman, which is evidence the published version of Azure Function was able to connect to MongoDB Atlas.

    Conclusion

    I hope you have found this tutorial helpful to get started with Azure Functions and MongoDB Atlas. Cherry on the cake, you can find the complete source code of this tutorial on GitHub.

    As a next step, I suggest that you download MongoDB Compass to visualize the documents you just created in your MongoDB Atlas database cluster with our CreateRestaurant Azure Function. Here’s a small tip: if you just copy an Atlas connection string to your clipboard and start MongoDB Compass, it will automatically detect your connection string and offer to pre-populate the login screen. Pretty neat, no?

    If you’re planning to use Azure Functions for a production deployment, you might also be interested in the available continuous integration deployment options offered by Microsoft. And if you don’t already have your MongoDB Atlas cluster, sign up now and create a MongoDB cluster on Microsoft Azure in minutes!

    About the Author - Raphael Londner

    Raphael Londner is a Principal Developer Advocate at MongoDB, focused on cloud technologies such as Amazon Web Services, Microsoft Azure and Google Cloud Engine. Previously he was a developer advocate at Okta as well as a startup entrepreneur in the identity management space. You can follow him on Twitter at @rlondner.

Sitecore Tutorial: Deploy Sitecore on Azure & MongoDB Atlas

This blog post is a tutorial written for Sitecore administrators who would like to deploy Sitecore on Microsoft Azure with MongoDB Atlas as the Database as a Service (DBaaS) provider for Sitecore’s MongoDB databases.

The Sitecore Azure Toolkit scripts allow you to easily deploy Sitecore as an App Service on Microsoft Azure, but the setup and configuration of the required analytics and tracking MongoDB databases is the responsibility of the operations team running the Sitecore cloud deployment.

Now that MongoDB Atlas is available on Microsoft Azure, you can use it to dramatically accelerate the time to market of your Sitecore Cloud deployment. Atlas makes maintenance easy by relying on MongoDB’s expertise to maintain the database for you instead of setting up and operating your own MongoDB infrastructure on Microsoft Azure Virtual Machines. Additionally, by hosting your Sitecore VMs in the same region as your MongoDB Atlas clusters, you benefit from fast, local Internet connections between Azure VMs and MongoDB Atlas. Here is what Sitecore has to say:

“With MongoDB Atlas on Azure, Sitecore customers now have the benefit of sourcing MongoDB directly from its creators,” said Ryan Donovan, Senior Vice President of Product Management at Sitecore. “This, coupled with MongoDB’s enterprise-class support and service levels, delivers a vehicle that seamlessly complements Sitecore’s strong commitment to the Microsoft Azure cloud.”

Sitecore deployment on Azure

To install Sitecore on Microsoft Azure, you should start by reading the related Sitecore documentation page.

Once you have chosen your Sitecore deployment type (XP0, XM, XP or XDB) and uploaded the corresponding WebDeploy package to your Microsoft Azure storage account, head over to MongoDB Atlas to prepare the cluster. You will use it to host your Sitecore MongoDB database. If you don’t have a MongoDB Atlas account yet, register here to create one.

It is possible to host your Sitecore MongoDB cluster in an existing Atlas group, but recall that security configurations are scoped at the group level, not the cluster level. I highly recommend using a new, independent Atlas group for security reasons (namely, to keep its IP Whitelisting and database users configuration independent). The following tutorial assumes that we will deploy a Sitecore 8.2.3 XP0 environment using a dedicated Atlas group we’ll name Sitecore-Azure.

MongoDB Atlas cluster setup

Once you have signed in to MongoDB Atlas, select your name in the top right corner of any MongoDB Atlas page and select My Groups.


Add a new group called Sitecore-Azure and make sure you choose MongoDB Atlas as the group type.


Once your Atlas group has been created, press the Build a New Cluster button. Give a name to your cluster (for instance, Sitecore). Choose the Microsoft Azure provider and the region of your choice (among those supported by MongoDB Atlas). Using the same deployment region as your Sitecore web and Azure SQL servers provides latency benefits and cost savings. In this tutorial, I chose to deploy Sitecore in the westus region.


Choose the M30 cluster instance size, knowing that you will always have the option to scale up to a larger instance size, without any upgrade downtime at all.


Since we’re setting up a brand new cluster, you’ll need an administrator account. Scroll down to configure your cluster admin user (I use atlasAdmin as the admin user name) and press the Continue to Payment button. After filling out your credit card information, MongoDB Atlas starts provisioning your Sitecore cluster. It’s that easy!

MongoDB Atlas cluster security configuration

Sitecore needs a MongoDB database user account for access to its databases. While your cluster is being provisioned, head over to the Security tab to create database users. We highly recommend that you follow least-privilege access best practices and create a specific user for each of the 4 MongoDB databases.

Press the Add New User button to create the database user we’ll use to access the Analytics database. Security binds a user to one or more databases and in this tutorial, I chose the username scAnalytics and the analytics database name sitecoreAnalytics. The scAnalytics user should only have readWrite permissions on this database as shown in the screenshot below. The readWrite built-in role provides Sitecore all necessary access to create collections and change data while still following the least-privilege access best practice.

Select the Show Advanced Options link in the User Privileges section to add the readWrite permission.


After creating 3 additional users for the 3 Sitecore tracking databases with similar permissions, the Security/MongoDB Users tab should display the following users:


Now that we have user accounts, let’s move back to provisioning Sitecore. Before provisioning our Sitecore environment, we need to retrieve our database cluster’s connection string. Select the Clusters tab, select the Sitecore cluster and press the Connect button.

In the pop-up window, press the Copy button next to the URI Connection String and paste the connection string into a safe location.


It’s now time to set up your Sitecore Cloud environment. There are 2 ways you can provision your Sitecore Cloud environment in Azure:

  1. Using the Sitecore Azure Toolkit
  2. Using the Sitecore Azure Marketplace wizard

I'll cover both options in the sections below.

Sitecore Cloud environment setup with the Sitecore Azure Toolkit

First, make sure your Windows (physical or virtual) machine matches the Sitecore Azure Toolkit requirements.

Next, from the Sitecore Azure Quickstarts GitHub repository, download the azuredeploy.parameters.json file from the proper folder. Since I want to install Sitecore 8.2.3 in a XP0 configuration, the corresponding folder is https://github.com/Sitecore/Sitecore-Azure-Quickstart-Templates/tree/master/Sitecore%208.2.3/xp0. Put this file at the root of the Sitecore Azure Toolkit folder on your Windows operating system, along with your Sitecore license file. Next, open the azuredeploy.parameters.json file in your favorite text editor.

Using Microsoft Azure Storage Explorer, right-click on each WDP file you previously uploaded to your Azure Storage account (as instructed in the Prepare WebDeploy packages section) and select the Get Shared Access Signature menu:


The Shared Access Signature window shows up. Note that the Start and Expiry times might be slightly off and that the generated link might not be valid. I therefore recommend you decrease the Start Time by one hour (or more):


Press the Create button, copy the URL field and paste it to its corresponding parameter in the azuredeploy.parameters.json file, as instructed in the Sitecore environment template configuration (in my case, I configured the singleMsDeployPackageUrl parameter).


For the four MongoDB-related parameters (analyticsMongoDbConnectionString, trackingLiveMongoDbConnectionString, trackingHistoryMongoDbConnectionString and trackingContactMongoDbConnectionString), use the MongoDB Atlas connection string you previously retrieved and replace atlasAdmin with <USERNAME>. Your connection string should then be similar to the following example:

mongodb://<USERNAME>:<PASSWORD>@sitecore-shard-00-00-x00xx.azure.mongodb.net:27017,sitecore-shard-00-01-x00xx.azure.mongodb.net:27017,sitecore-shard-00-02-x00xx.azure.mongodb.net:27017/<DATABASE>?ssl=true&replicaSet=Sitecore-shard-0&authSource=admin

Replace <USERNAME>, <PASSWORD> and <DATABASE> with the values you chose for each of the dedicated MongoDB users you set up, such as:

USERNAME           PASSWORD     DATABASE
scAnalytics        [PASSWORD1]  sitecoreAnalytics
scTrackingLive     [PASSWORD2]  sitecoreTrackingLive
scTrackingHistory  [PASSWORD3]  sitecoreTrackingHistory
scTrackingContact  [PASSWORD4]  sitecoreTrackingContact

Paste these connection strings to their corresponding parameters in the azuredeploy.parameters.json file. Don’t forget to also fill out other required parameters in that file, such as deploymentId, sqlServerLogin, sqlServerPassword and sitecoreAdminPassword.
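
If you would rather generate the four connection strings than edit them by hand, a few lines of Node.js can do the substitution. The sketch below is only an illustration: buildConnectionString, HOSTS and OPTIONS are names I made up, the host list and options are copied from the example connection string above, and the passwords are placeholders. It percent-encodes the user name and password, which matters if a password contains characters such as @ or /.

```javascript
// Hypothetical helper, not part of the Sitecore Azure Toolkit.
// HOSTS and OPTIONS are copied from the example connection string above;
// replace them with the values of your own Atlas cluster.
const HOSTS = [
  'sitecore-shard-00-00-x00xx.azure.mongodb.net:27017',
  'sitecore-shard-00-01-x00xx.azure.mongodb.net:27017',
  'sitecore-shard-00-02-x00xx.azure.mongodb.net:27017'
].join(',');
const OPTIONS = 'ssl=true&replicaSet=Sitecore-shard-0&authSource=admin';

function buildConnectionString(username, password, database) {
  // Percent-encode credentials so reserved characters stay valid in the URI.
  const user = encodeURIComponent(username);
  const pass = encodeURIComponent(password);
  return `mongodb://${user}:${pass}@${HOSTS}/${database}?${OPTIONS}`;
}

// Keys are the parameter names from azuredeploy.parameters.json;
// values are [user, placeholder password, database] from the table above.
const users = {
  analyticsMongoDbConnectionString: ['scAnalytics', 'PASSWORD1', 'sitecoreAnalytics'],
  trackingLiveMongoDbConnectionString: ['scTrackingLive', 'PASSWORD2', 'sitecoreTrackingLive'],
  trackingHistoryMongoDbConnectionString: ['scTrackingHistory', 'PASSWORD3', 'sitecoreTrackingHistory'],
  trackingContactMongoDbConnectionString: ['scTrackingContact', 'PASSWORD4', 'sitecoreTrackingContact']
};

const parameters = {};
for (const [name, [user, pass, db]] of Object.entries(users)) {
  parameters[name] = buildConnectionString(user, pass, db);
}
console.log(JSON.stringify(parameters, null, 2));
```

The printed values can then be pasted into the matching parameters of the azuredeploy.parameters.json file.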

Finally, open a PowerShell command prompt running as administrator, navigate to the root folder of the Sitecore Azure Toolkit on your machine, and run the following commands:

Import-Module AzureRM
Import-Module .\tools\Sitecore.Cloud.Cmdlets.psm1 -Verbose
Login-AzureRMAccount

Provided you get no error, the last line should prompt a browser window requiring you to sign in with your Microsoft Azure account.

After successfully signing in with Azure, invoke the Sitecore deployment command. In my case, I ran the following command:

Start-SitecoreAzureDeployment -Location "westus" -Name "sc" -ArmTemplateUrl "https://raw.githubusercontent.com/Sitecore/Sitecore-Azure-Quickstart-Templates/master/Sitecore%208.2.3/xp0/azuredeploy.json" -ArmParametersPath ".\azuredeploy.parameters.json" -LicenseXmlPath ".\MongoDBTempLic.xml"

The command line should display “Deployment Started…” but since the Azure provisioning process takes a few minutes, I advise you follow the provisioning process from the Resource groups page on your Azure portal:


Sitecore Cloud environment setup with the Sitecore Azure Marketplace wizard

If you prefer to use the more automated Sitecore wizard on Azure Marketplace, navigate to the Sitecore Experience Platform product page and start the creation process by pressing Get It Now. Once you reach the Credentials tab, enter your 4 MongoDB Atlas connection strings, as shown in the screenshot below.


After you complete the wizard, your Sitecore environment will be provisioned in Microsoft Azure similarly to the Sitecore Azure Toolkit process described above.

IP Whitelisting

Each Azure App Service exposes the outbound IP addresses it uses. While Microsoft doesn’t formally guarantee that these are fixed IPs, there seems to be evidence that these outbound IP addresses don’t change unless you make significant modifications to your app service (such as scaling it up or down). Another option would be to create an Azure App Service Environment, but this is outside the scope of this blog post.

To find out which outbound IP addresses your app service uses, head over to the Properties tab of your app service and copy the outbound IP addresses available in the namesake section:


Navigate to the Security/IP Whitelist tab of your MongoDB Atlas cluster, press the Add IP Address button and add each Azure outbound IP address.

Testing connectivity with MongoDB Atlas

Once the Sitecore PowerShell command completes, your Sitecore web site should be up and running at the URL available in your Azure App Service page (in my case, the “sc-single” App Service):


Copy/paste the URL available in your Azure App Service page into a browser (see screenshot above). The following page should appear:


You can also navigate to [your_azurewebsites_sitecore_url]/sitecore/admin where you can access the site administration page. Use admin as the username and the sitecoreAdminPassword value from the azuredeploy.parameters.json file as your password.

Verify that your MongoDB Atlas cluster has the proper collections in each of the 4 databases previously mentioned by using MongoDB Atlas’ Data Explorer tab (or MongoDB Compass if you prefer to use a client-side tool). For example, the Sitecore Analytics database shows the following collections when using Sitecore 8.2.3:


You can even drill down inside each collection to see the entries Sitecore might already have generated, for instance in the UserAgents collection:


Conclusion

I hope that you found this tutorial helpful. You should now have a running Sitecore Cloud environment with MongoDB Atlas on Microsoft Azure.

If you’re interested in MongoDB Atlas and don’t have an account yet, you can sign up for free and create a cluster in minutes.

If you’d like to know more about MongoDB deployment options for Sitecore, including our Sitecore consulting engagement package, visit the MongoDB for Sitecore page.

Please use the comment form below to provide your feedback or seek help with any issues.

About the Author - Raphael Londner

Raphael Londner is a Principal Developer Advocate at MongoDB, focused on cloud technologies such as Amazon Web Services, Microsoft Azure and Google Cloud Engine. Previously he was a developer advocate at Okta as well as a startup entrepreneur in the identity management space. You can follow him on Twitter at @rlondner.

Integrating MongoDB Atlas, Twilio, and AWS Simple Email Service with AWS Step Functions - Part 2

This is Part 2 of the AWS Step Functions overview post published a few weeks ago. If you want to get more context on the sample application business scenario, head back to read Part 1. In this post, you’ll get a deep dive into the application’s technical details. As a reference, the source code of this sample app is available on GitHub.

AWS Step Functions Visual Workflow

Setting up the Lambda functions

The screenshot above is the graphical representation of the state machine we will eventually be able to test and run. But before we get there, we need to set up and publish the 4 Lambda functions this Step Functions state machine relies on. To do so, clone the AWS Step Functions with MongoDB GitHub repository and follow the instructions in the Readme file to create and configure these Lambda functions.

If you have some time to dig into their respective codebases, you'll realize they're all made up of just a few lines, making it simple to embed Twilio, AWS and MongoDB APIs in your Lambda function code. In particular, I would like to point out the concise code the Get-Restaurants lambda function uses to query the MongoDB Atlas database:

db.collection('restaurants').aggregate(
  [
    { $match:
      {
        "address.zipcode": jsonContents.zipcode,
        "cuisine": jsonContents.cuisine,
        "name": new RegExp(jsonContents.startsWith)
      }
    },
    { $project:
      {
        "_id": 0,
        "name": 1,
        "address.building": 1,
        "address.street": 1,
        "borough": 1,
        "address.zipcode": 1,
        "healthScoreAverage":
          { $avg: "$grades.score" },
        "healthScoreWorst":
          { $max: "$grades.score" }
      }
    }
  ]
)

The code snippet above is a simple yet powerful example of aggregation framework queries using the $match and $project stages along with the $avg and $max accumulator operators. In a nutshell, this aggregation filters the restaurants dataset by 3 properties (zip code, cuisine, and name) in the $match stage, returns a subset of each restaurant’s properties (to minimize the bandwidth usage and query latency), and computes the maximum and average values of health scores obtained by each restaurant (over the course of 4 years) in the $project stage. This example shows how you can very easily replace SQL constructs (such as the WHERE clause and the MAX() and AVG() functions) using MongoDB’s expressive query language.
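
One detail worth knowing: new RegExp(jsonContents.startsWith) matches the given text anywhere in the restaurant name and treats the input as a regular expression. If you want a strict "starts with" filter, a variation such as the sketch below could be used. It is not the code from the sample repository; buildPipeline and escapeRegExp are names I introduced for illustration.

```javascript
// Hypothetical variation on the Get-Restaurants query, not the repository code.

// Escape regular expression metacharacters so user input is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build the same $match/$project pipeline, but anchor the name filter
// to the beginning of the string.
function buildPipeline(jsonContents) {
  return [
    { $match: {
        'address.zipcode': jsonContents.zipcode,
        cuisine: jsonContents.cuisine,
        name: new RegExp('^' + escapeRegExp(jsonContents.startsWith))
    } },
    { $project: {
        _id: 0,
        name: 1,
        'address.building': 1,
        'address.street': 1,
        borough: 1,
        'address.zipcode': 1,
        healthScoreAverage: { $avg: '$grades.score' },
        healthScoreWorst: { $max: '$grades.score' }
    } }
  ];
}
```

The result would be passed to the driver in the same way as before, for example db.collection('restaurants').aggregate(buildPipeline(jsonContents)).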

Creating the Step Functions state machine

Once you are done with setting up and configuring these Lambda functions, it's time to finally create our Step Functions state machine.

AWS created a JSON-based declarative language called the Amazon States Language, fully documented on the Amazon States Language specification page. A Step Functions state machine is essentially a JSON file whose structure conforms to this new Amazon States Language. While you don’t need to read its whole specification to understand how it works, I recommend reading the AWS Step Functions Developer Guide to understand its main concepts and artifacts.

For now, let's go ahead and create our WhatsThisRestaurantAgain state machine. Head over to the Create State Machine page in AWS Step Functions and give your new state machine a name (such as WhatsThisRestaurantAgain).

Next, copy and paste the following JSON document (also available on GitHub) into the Code text editor (at the bottom of the Create State Machine page):

{
    "Comment": "A state machine showcasing the use of MongoDB Atlas to notify a user by text message or email depending on the number of returned restaurants",
    "StartAt": "GetRestaurants",
    "States": {
        "GetRestaurants": {
            "Type": "Task",
            "Resource": "",
            "ResultPath": "$.restaurants",
            "Next": "CountItems"
        },
        "CountItems": {
            "Type": "Task",
            "Resource": "",
            "InputPath": "$.restaurants",
            "ResultPath": "$.count",
            "Next": "NotificationMethodChoice"
        },
        "NotificationMethodChoice": {
            "Type": "Choice",
            "Choices": [
                {
                    "Variable": "$.count",
                    "NumericGreaterThan": 1,
                    "Next": "SendByEmail"
                },
                {
                    "Variable": "$.count",
                    "NumericLessThanEquals": 1,
                    "Next": "SendBySMS"
                }
            ],
            "Default": "SendByEmail"
        },
        "SendByEmail": {
            "Type": "Task",
            "Resource": "",
            "End": true
        },
        "SendBySMS": {
            "Type": "Task",
            "Resource": "",
            "End": true
        }
    }
}

Once you’re done pasting this JSON document, press the Refresh button of the Preview section right above the Code editor and... voilà! The state machine now shows up in its full, visual glory:

AWS Step Functions Visual Workflow

We’re not quite done yet. But before we complete the last steps to get a fully functional Step Functions state machine, let me take a few minutes to walk you through some of the technical details of my state machine JSON file.

Note that 4 states are of type "Task" but that their Resource attributes are empty. These 4 "Task" states represent the calls to our 4 Lambda functions and should thus reference the ARNs (Amazon Resource Names) of our Lambda functions. You might think you have to get these ARNs one by one—which might prove to be tedious—but don't be discouraged; AWS provides a neat little trick to get these ARNs automatically populated!

Simply click inside the double quotes for each Resource attribute and the following drop-down list should appear (if it doesn't, make sure you are creating your state machine in the same region as your Lambda functions):

AWS Step Functions Code Editor - Lambda Functions ARN Dropdown List

Once you have filled out the 4 empty Resource attributes with their expected values, press the Create State Machine button at the bottom. Last, select the IAM role that will execute your state machine (AWS should have conveniently created one for you) and press OK:

AWS Step Functions IAM Role

On the page that appears, press the New execution button:

AWS Step Functions - Created

Enter the following JSON test document (with a valid emailTo field) and press Start Execution:

{
    "startsWith": "M",
    "cuisine": "Italian",
    "zipcode": "10036",
    "phoneTo": "+15555555555",
    "firstnameTo": "Raphael",
    "emailTo": "raphael@example.com",
    "subject": "List of restaurants for {{firstnameTo}}"
}

If everything was properly configured, you should get a successful result, similar to the following one:

AWS Step Functions - Execution Result

If you see any red boxes (in lieu of a green one), check CloudWatch where the Lambda functions log their errors. For instance, here is one you might get if you forgot to update the emailTo field I mentioned above:

AWS Step Functions - CloudWatch Error

And that's it (I guess you can truly say we’re "done done" now)! You have successfully built and deployed a fully functional cloud workflow that mashes up various API services thanks to serverless functions.

For those of you who are still curious, read on to learn how that sample state machine was designed and architected.

Design and architecture choices

Let's start with the state machine design:

  1. The GetRestaurants function queries a MongoDB Atlas database of restaurants using some search criteria provided by our calling application, such as the restaurant's cuisine type, its zip code and the first few letters of the restaurant's name. It retrieves a list of matching restaurants and passes that result to the next function (CountItems). As I pointed out above, it uses MongoDB's aggregation framework to retrieve the worst and average health score granted by New York's Health Department during its food safety inspections. That data provides the end user with information on the presumed cleanliness and reliability of the restaurant she intends to go to. Visit the aggregation framework documentation page to learn more about how you can leverage it for advanced insights into your data.

  2. The CountItems method counts the number of the restaurants; we'll use this number to determine how the requesting user is notified.

  3. If we get a single restaurant match, we'll send the name and address of the restaurant to the user's cell phone using the SendBySMS function.

  4. However, if there's more than one match, it's probably more convenient to display that list in a table format. As such, we'll send an email to the user using the SendByEmail method.
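
Step 2 is small enough to sketch in full. The handler below is my guess at what a CountItems function could look like, not the code from the repository: it assumes the state machine hands it an array and simply returns the array's length, using the callback-style Node.js Lambda signature.

```javascript
// Hypothetical CountItems handler: returns the number of elements it receives.
// It makes no assumption about what the elements are.
const countItems = (event, context, callback) => {
  if (!Array.isArray(event)) {
    return callback(new Error('Expected an array as input'));
  }
  callback(null, event.length);
};

exports.handler = countItems;
```

Because the function only looks at the length of its input, it could be reused for any list, not just restaurants.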

At this point, you might ask yourself: how is the data passed from one lambda function to another?

As it turns out, the Amazon States Language provides developers with a flexible and efficient way of treating inputs and outputs. By default, the output of a state machine function becomes the input of the next function. That doesn't exactly work well for us since the SendBySMS and SendByEmail methods must know the user's cell phone number or email address to properly work. An application that would like to use our state machine would have no choice but to pass all these parameters as a single input to our state machine, so how do we go about solving this issue?

Fortunately for us, the Amazon States Language has the answer: it allows us to easily append the result of a function to the input it received and forward the concatenated result to the next function. Here's how we achieved this with our GetRestaurants function:

"GetRestaurants": {
    "Type": "Task",
    "Resource": "arn:aws:lambda:REGION:ACCOUNT_ID:function:FUNCTION_NAME",
    "ResultPath": "$.restaurants",
    "Next": "CountItems"
}

Note the ResultPath attribute above where we instruct Step Functions to append the result of our GetRestaurants task (an array of matching restaurants) to the input it received, whose structure is the test JSON document I mentioned above (duplicated here for reading convenience):

{
    "startsWith": "M",
    "cuisine": "Italian",
    "zipcode": "10036",
    "phoneTo": "+15555555555",
    "firstnameTo": "Raphael",
    "emailTo": "raphael@example.com",
    "subject": "List of restaurants for {{firstnameTo}}"
}

This input contains all the information my state machine might need, from the search criteria (startsWith, cuisine, and zipcode), to the user's cell phone number (if the state machine ends up using the SMS notification method), first name, email address and email subject (if the state machine ends up using the email notification method).
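
To make the effect of ResultPath concrete, here is a minimal simulation in plain JavaScript. It is not how Step Functions is implemented; applyResultPath is a made-up name, and the sketch only handles simple top-level paths such as $.restaurants.

```javascript
// Simplified model of ResultPath: copy the state's input and store the task
// result under the field named by the path. Only "$.field" paths are supported.
function applyResultPath(input, resultPath, result) {
  const match = /^\$\.([A-Za-z_][A-Za-z0-9_]*)$/.exec(resultPath);
  if (!match) {
    throw new Error('This sketch only supports paths of the form $.field');
  }
  return Object.assign({}, input, { [match[1]]: result });
}

const input = { startsWith: 'M', cuisine: 'Italian', zipcode: '10036' };
const restaurants = [{ name: 'La Masseria' }, { name: 'Cara Mia' }];
const output = applyResultPath(input, '$.restaurants', restaurants);

// The original fields are kept and the task result is added next to them.
console.log(Object.keys(output)); // [ 'startsWith', 'cuisine', 'zipcode', 'restaurants' ]
```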

Thanks to the ResultPath attribute we set on the GetRestaurants task, its output has a structure similar to the following JSON document (the restaurants array is the additional data):

{
  "firstnameTo": "Raphael",
  "emailTo": "raphael@example.com",
  "subject": "List of restaurants for {{firstnameTo}}",
  "restaurants": [
    {
      "address": {
        "building": "235-237",
        "street": "West 48 Street"
      },
      "borough": "Manhattan",
      "name": "La Masseria"
    },
    {
      "address": {
        "building": "315",
        "street": "West 48 Street"
      },
      "borough": "Manhattan",
      "name": "Maria'S Mont Blanc Restaurant"
    },
    {
      "address": {
        "building": "654",
        "street": "9 Avenue"
      },
      "borough": "Manhattan",
      "name": "Cara Mia"
    }
  ]
}

As expected, the restaurants sub-document has been properly appended to our original JSON input. That output becomes by default the input for the CountItems method. But, we don't want that function to have any dependency on the input it receives. Since it's a helper function, we might want to use it in another scenario where the input structure is radically different. Once again, the Amazon States Language comes to the rescue with the optional InputPath parameter. Let's take a closer look at our CountItems task declaration in the state machine’s JSON document:

"CountItems": {
    "Type": "Task",
    "Resource": "arn:aws:lambda:REGION:ACCOUNT_ID:function:FUNCTION_NAME",
    "InputPath": "$.restaurants",
    "ResultPath": "$.count",
    "Next": "NotificationMethodChoice"
}

By default, the InputPath value is the whole output of the preceding task (GetRestaurants in our state machine). The Amazon States Language allows you to override this parameter by explicitly setting it to a specific value or sub-document. As you can see in the JSON fragment above, this is exactly what I have done to only pass an array of JSON elements to the CountItems Lambda function (in my case, the array of restaurants we received from our previous GetRestaurants function), thereby making it agnostic to any JSON schema. Conversely, the result of the CountItems task is stored in a new count attribute that serves as the input of the NotificationMethodChoice choice state that follows:

"NotificationMethodChoice": {
    "Type": "Choice",
    "Choices": [
        {
            "Variable": "$.count",
            "NumericGreaterThan": 1,
            "Next": "SendByEmail"
        },
        {
            "Variable": "$.count",
            "NumericLessThanEquals": 1,
            "Next": "SendBySMS"
        }
    ],
    "Default": "SendByEmail"
}

The logic here is fairly simple: if the restaurants count is greater than one, the state machine will send an email message with a nicely formatted table of the restaurants to the requesting user’s email address. If only one restaurant is returned, we’ll send a text message to the user’s phone number (using Twilio’s SMS API) since it’s probably faster and more convenient for single row results (especially since the user might be on the move while requesting this piece of information). Note that my JSON "code" actually uses the NumericLessThanEquals operator to trigger the SendBySMS task rather than the NumericEquals operator, as it really should. So technically speaking, even if no result is returned from the GetRestaurants task, the state machine would still send a text message to the user with no restaurant information whatsoever! I’ll leave it up to you to fix this intentional bug.
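
If you want to check your fix without running the state machine, the sketch below models the Choice state in plain JavaScript. Treat it as an illustration only: evaluateChoice is a made-up helper that supports just the three numeric comparisons used here, the NoResults state name is hypothetical, and the real service evaluates these rules itself.

```javascript
// Minimal model of a Choice state: return the Next of the first matching rule,
// or the Default when no rule matches. Only "$.field" variables are supported.
const comparators = {
  NumericEquals: (value, expected) => value === expected,
  NumericGreaterThan: (value, expected) => value > expected,
  NumericLessThanEquals: (value, expected) => value <= expected
};

function evaluateChoice(state, input) {
  for (const rule of state.Choices) {
    const value = input[rule.Variable.replace(/^\$\./, '')];
    for (const [name, compare] of Object.entries(comparators)) {
      if (name in rule && compare(value, rule[name])) {
        return rule.Next;
      }
    }
  }
  return state.Default;
}

// The Choice state as written in this post.
const original = {
  Choices: [
    { Variable: '$.count', NumericGreaterThan: 1, Next: 'SendByEmail' },
    { Variable: '$.count', NumericLessThanEquals: 1, Next: 'SendBySMS' }
  ],
  Default: 'SendByEmail'
};

// One possible fix: text only when exactly one restaurant matches.
const fixed = {
  Choices: [
    { Variable: '$.count', NumericGreaterThan: 1, Next: 'SendByEmail' },
    { Variable: '$.count', NumericEquals: 1, Next: 'SendBySMS' }
  ],
  Default: 'NoResults' // hypothetical state name
};

console.log(evaluateChoice(original, { count: 0 })); // SendBySMS (the bug)
console.log(evaluateChoice(fixed, { count: 0 }));    // NoResults
```

With this kind of fix, NoResults would have to be declared as a state of its own in the state machine, for instance a state of type Succeed.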

Next steps

In this post, I showed you how to create a state machine that orchestrates calls to various cloud services and APIs using a fictitious restaurant search and notification scenario. I hope you enjoyed this tutorial explaining how to deploy and test that state machine using the AWS console. Last, I went through various design and architecture considerations, with a focus on data flow abilities available in Step Functions.

If you haven’t done so already, sign up for MongoDB Atlas and create your free M0 MongoDB cluster in minutes.
Next, you can get more familiar with AWS Lambda development and deployment by following our 101 Lambda tutorial.
If you already have some experience with AWS Lambda, Developing a Facebook Chatbot with AWS Lambda and MongoDB Atlas will walk through a richer use case.
As a last step, you might be interested in Step Functions integration with API Gateway to learn how to call a state machine from an external application.

About the Author - Raphael Londner

Raphael Londner is a Principal Developer Advocate at MongoDB, focused on cloud technologies such as Amazon Web Services, Microsoft Azure and Google Cloud Engine. Previously he was a developer advocate at Okta as well as a startup entrepreneur in the identity management space. You can follow him on Twitter at @rlondner.

Optimizing AWS Lambda performance with MongoDB Atlas and Node.js

I attended an AWS user group meeting some time ago, and many of the questions from the audience concerned caching and performance. In this post, I review the performance implications of using Lambda functions with any database-as-a-service (DBaaS) platform (such as MongoDB Atlas). Based on internal investigations, I offer a specific workaround available for Node.js Lambda functions. Note that other supported languages (such as Python) may only require implementing some parts of the workaround, as the underlying AWS containers may differ in their resource disposal requirements. I will specifically call out below which parts are required for any language and which ones are Node.js-specific.

AWS Lambda is serverless, which means that it is essentially stateless. Well, almost. As stated in its developer documentation, AWS Lambda relies on a container technology to execute its functions. This has several implications:

  • The first time your application invokes a Lambda function it will incur a penalty hit in latency – time that is necessary to bootstrap a new container that will run your Lambda code. The definition of "first time" is fuzzy, but word on the street is that you should expect a new container (i.e. a “first-time” event) each time your Lambda function hasn’t been invoked for more than 5 minutes.

  • If your application makes subsequent calls to your Lambda function within 5 minutes, you can expect that the same container will be reused, thus saving some precious initialization time. Note that AWS makes no guarantee it will reuse the container (i.e. you might just get a new one), but experience shows that in many cases, it does manage to reuse existing containers.

  • As mentioned in the How It Works page, any Node.js variable that is declared outside the handler method remains initialized across calls, as long as the same container is reused.

Understanding Container Reuse in AWS Lambda, written in 2014, dives a bit deeper into the whole lifecycle of a Lambda function and is an interesting read, though it may not reflect more recent architectural changes to the service. Note that AWS makes no guarantee that containers are maintained alive (though in a "frozen" mode) for 5 minutes, so don’t rely on that specific duration in your code.

In our very first attempt to build Lambda functions that would run queries against MongoDB Atlas, our database as a service offering, we noticed the performance impact of repeatedly calling the same Lambda function without trying to reuse the MongoDB database connection. The wait time for the Lambda function to complete was around 4-5 seconds, even with the simplest query, which is unacceptable for any real-world operational application.

In our subsequent attempts to declare the database connection outside the handler code, we ran into another issue: we had to call db.close() to effectively release the database handle, lest the Lambda function time out without returning to the caller. The AWS Lambda documentation doesn’t explicitly mention this caveat, which seems to be language-dependent, since we couldn’t reproduce it with a Lambda function written in Python.

Fortunately, we found out that Lambda’s context object exposes a callbackWaitsForEmptyEventLoop property, that effectively allows a Lambda function to return its result to the caller without requiring that the MongoDB database connection be closed (you can find more information about callbackWaitsForEmptyEventLoop in the Lambda developer documentation). This allows the Lambda function to reuse a MongoDB Atlas connection across calls, and reduce the execution time to a few milliseconds (instead of a few seconds).

In summary, here are the specific steps you should take to optimize the performance of your Lambda function:

  • Declare the MongoDB database connection object outside the handler method, as shown below in Node.js syntax (this step is required for any language, not just Node.js):

'use strict';

const MongoClient = require('mongodb').MongoClient;

// Cached outside the handler so that it survives across invocations
let cachedDb = null;

  • In the handler method, set context.callbackWaitsForEmptyEventLoop to false before attempting to use the MongoDB database connection object (this step is only required for Node.js Lambda functions):

exports.handler = (event, context, callback) => {

    context.callbackWaitsForEmptyEventLoop = false;

    // ... rest of the handler
};

  • Reuse the cached database connection object if it is not null and db.serverConfig.isConnected() returns true; otherwise, create a new one with the MongoClient.connect(uri) method (this step is required for any language, not just Node.js):

function connectToDatabase(uri) {

    // Reuse the cached handle if it is still connected
    if (cachedDb && cachedDb.serverConfig.isConnected()) {
        console.log('=> using cached database instance');
        return Promise.resolve(cachedDb);
    }
    const dbName = 'YOUR_DATABASE_NAME';
    return MongoClient.connect(uri)
        .then(client => { cachedDb = client.db(dbName); return cachedDb; });
}

  • Do NOT close the database connection, so that it can be reused by subsequent calls.
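
Put together, the steps above lead to a handler along the following lines. This is a sketch under a few assumptions rather than a reference implementation: the connect function is passed in (in a real function it would wrap MongoClient.connect(uri) and client.db(...) as in connectToDatabase above) so that the example can run without the driver installed, and getDatabase, makeHandler and work are names invented for this illustration.

```javascript
'use strict';

// Step 1: the cache lives outside the handler so a reused container keeps it.
let cachedDb = null;

// Step 3: reuse the cached handle when it is still connected, otherwise connect.
function getDatabase(connect, uri) {
  if (cachedDb && cachedDb.serverConfig.isConnected()) {
    return Promise.resolve(cachedDb);
  }
  return connect(uri).then(db => {
    cachedDb = db;
    return db;
  });
}

// connect(uri) must resolve to a database handle;
// work(db, event) runs the actual query and returns a promise.
function makeHandler(connect, uri, work) {
  return (event, context, callback) => {
    // Step 2 (Node.js only): return as soon as callback is invoked, even though
    // the open connection keeps the event loop busy.
    context.callbackWaitsForEmptyEventLoop = false;

    getDatabase(connect, uri)
      .then(db => work(db, event))
      // Step 4: no db.close() here, so the next invocation can reuse the connection.
      .then(result => callback(null, result), err => callback(err));
  };
}
```

A deployed function would then export something like exports.handler = makeHandler(connect, uri, work), with uri read from the function's configuration.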

The Serverless development with Node.js, AWS Lambda and MongoDB Atlas tutorial post makes use of all these best practices so I recommend that you take the time to read it. The more experienced developers can also find optimized Lambda Node.js functions (with relevant comments) in:

I’d love to hear from you, so if you have any question or feedback, don’t hesitate to leave them below.

Additionally, if you’d like to learn more about building serverless applications with MongoDB Atlas, I highly recommend our webinar below where we have an interactive tutorial on serverless architectures with AWS Lambda.

Watch Serverless Architectures with AWS Lambda and MongoDB Atlas

About the Author - Raphael Londner

Raphael Londner is a Principal Developer Advocate at MongoDB, focused on cloud technologies such as Amazon Web Services, Microsoft Azure and Google Cloud Engine. Previously he was a developer advocate at Okta as well as a startup entrepreneur in the identity management space. You can follow him on Twitter at @rlondner.