GIANT Stories at MongoDB

Production Ready IoT with MongoDB Stitch and Electric Imp: Part 1

It's that time of year again! This post is part of our Road to AWS re:Invent 2017 blog series. See all posts here.

Introduction

We’ve been tinkering with hardware a lot at MongoDB lately, and with that has come plenty of trial and error and some first-hand lessons about how hard building real IoT systems can be.

While sending a message or getting a light to blink is easy, the jump between that and building something production ready can be daunting.

Really, what makes IoT hard is:

  • Building, testing, and modifying hardware (especially if you’re only used to software)
  • Establishing and maintaining a secure and scalable connection to the cloud in real-world environments
  • Standing up a full backend to process, persist, and analyze IoT data
  • Integrating all the other services you use without writing a lot of boring integration code
  • Making sure all aspects of the application are robust and secure

Even though MongoDB is the best database for IoT, there’s still a lot more to getting your project off the ground. We’ve found the following to be a killer combination for getting your end-to-end deployment running smoothly:

  • Electric Imp – Flexible development boards with pre-integrated software, a polished IDE, and managed, secure cloud connectivity.
  • MongoDB Stitch – A full backend as a service with integrations with your favorite services and built-in JavaScript functions.
  • MongoDB Atlas – The best database for persisting and analyzing your IoT data.

In the following tutorial we’re going to put all these together:

  • Use an impExplorer™ Developer Kit and the Electric Imp platform to read the temperature and send it to Stitch.
  • Stitch combines the temperature readings with more general weather data from Dark Sky, stores it in Atlas, and protects your data with field-level access rules.
  • Stitch also integrates with Twilio so you can retrieve temperature data from the database via text.

Getting Started with your Electric Imp Device

For this tutorial you will need to pick up Electric Imp’s impExplorer™ Developer Kit. This is what you will use to read the temperature and send it to our backend. Before you get set up, you’ll need to make an account for Electric Imp’s IDE and download the Electric Imp app (on an iOS or Android phone). Your account is where the device code will live, and you’ll use the app to link your device to your account.

Setting up your imp

To start, unpack your impExplorer. It should include a board, a card (a WiFi IoT module), and a USB power cord. Start by inserting the card into the board and powering it on by attaching the provided cable to your board and then plugging it into a USB port on your computer or a USB wall adaptor. If you are having trouble or want more information, see the impExplorer guide.

After you power on your board, use the Electric Imp app to ‘BlinkUp’ your board. This connects it to the internet and assigns it to your account so that you can deploy code to it. If you wait too long to configure your board – you’ll know this has happened when the card’s LED stops flashing – simply restart it and try again.

If you have issues blinking up your impExplorer you can check out Electric Imp’s troubleshooting guide.

Deploying the Code

Now that you’ve got your device blinked up, it’s time to build out your device code. Electric Imp breaks the code into two components:

  • Device code reads inputs from the sensors on the impExplorer
  • Agent code ensures that the device communicates reliably and securely with the internet.

We’ve provided code that covers both the device and agent code for the impExplorer. You can find them with our other GitHub samples.

Once you have downloaded the code, sign into the Electric Imp IDE. From there you’ll need to do the following:

  • Create a model and link your Device: After signing in, the impExplorer device should appear under devices as an unassigned device.
    • Create a model by going to Models and clicking the +.
    • Assign your device to the new model (and rename your device if you choose).

If you have any issues with the above process, see the tour of the Imp IDE.

Can’t wait for hardware and don’t mind cheating a little bit? We’ve got a Node script that takes the device’s place in the demo. But it’s so much more fun with the actual device!

Connecting your Device to Stitch

Now you have a device that can measure and send the temperature, but nowhere to send it to. Luckily, with MongoDB Stitch you can quickly set up an entire backend.

Getting Started with Atlas

To start, if you don’t already have a MongoDB Atlas account then you’ll want to register for an account and create an Atlas cluster.

Create your Stitch Application

Once you have Atlas up and running, you can create a Stitch app to cover your backend connections and logic. To create an app, click on “Stitch Apps” in the left-hand nav of the UI and then click “Create New Application”, or follow the more in-depth instructions on how to create a MongoDB Stitch app.

With Stitch, all of your data is protected by default, so you’ll have to do the following to grant your device the ability to write through Stitch:

  1. Create an API key so the device can authenticate to your Stitch application
  2. Put Rules in place such that devices can read/write data properly
  3. Create a function for the device to execute when it wants to write data

Setting up Authentication

To start, we’re going to enable API key authentication in the application. To do this:

  • First go to the Stitch UI
  • From there click on ‘Authentication’ in the left hand menu
  • Find ‘API Keys’ under ‘Providers’ and click ‘Edit’
  • Enable API Key authentication, and then create an API key (the name doesn’t matter).
    • Note: You can only view a key once, so if you lose this key you’ll have to generate a new one.

After you create and copy the key, go back to the Electric Imp IDE and add your App ID (found on the Stitch ‘Getting Started’ or ‘Clients’ page) and your API key to the code as shown below.

Note: In order to fully roll out this code you will need to ‘Build’ it by clicking the ‘Build’ button at the top of the IDE, and you may need to restart your device as well.

Now your device will be able to make authenticated requests to Stitch. However, you will still need to create rules and a Stitch Function to get everything working properly.

Creating Rules

One important thing to note with Stitch is that even when you’re authenticated, all access to your data and services is off by default. In order to access anything you must use Stitch’s rules to enable access.

Therefore, for our write function to work properly, you will need to set a Write rule for the namespace. In order to do this:

  • Click on your MongoDB Atlas cluster under ‘Atlas Clusters’ on the left-hand menu.
  • In the ‘Rules’ tab, click ‘New’ to add a new collection to your rules, and enter ‘Imp’ for the database and ‘TempData’ for the collection.
  • Click the created collection to edit the following details:
    • Click the ‘Filter’ tab and delete the existing ‘Filter’
    • Select ‘Top-Level Document’ to adjust the rules for the namespace as a whole
    • Set the write rule to {}
    • Delete the read rule entirely, leaving an empty box.

By doing this, you enable the device (and only the device) to write data, and prevent devices from reading any data at all.

Creating a Stitch Function for Writing

Now that your device has a user and permissions associated with it, it’s time to define the function that it will use to write to the database.

Check out the code in the ‘Imp_Write.js’ file. This contains the JavaScript function that the impExplorer calls in its agent code. With Functions, this code can be hosted and executed by Stitch. To add it to your application:

  • In the Stitch UI, click ‘Functions’ on the lefthand menu
  • Click ‘New Function’
  • In the ‘Settings’ tab, set your Function Name to “Imp_Write” and your ‘Can Evaluate’ rule to {}.
  • Go to the ‘Function Editor’ tab and paste the code from Imp_Write.js into the editor. Then click the save icon on the upper left-hand corner of the Editor.
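The full Imp_Write.js source lives in the GitHub sample; purely as an illustration (the field names here are assumptions, not the sample’s actual schema), the function boils down to stamping each reading with a timestamp before inserting it into the Imp.TempData collection:

```javascript
// Hypothetical sketch of what an Imp_Write-style function does with an
// incoming reading before inserting it (field names are assumed).
function buildReading(data) {
  return {
    deviceId: data.deviceId,
    temp: data.temp,        // temperature reading from the impExplorer sensor
    humidity: data.humidity,
    time: new Date()        // server-side timestamp added at write time
  };
}

const doc = buildReading({ deviceId: "imp-001", temp: 21.5, humidity: 40 });
console.log(doc.temp); // 21.5
```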

Now that you’ve got the basic write functionality set up, let’s look at how Stitch’s built-in services and functions can help broaden the scope of your app.

Building out your Backend

In this section we’ll show you how to integrate with Dark Sky, using its public weather API to incorporate additional data, and with Twilio, to search your data via text message.

Additional Weather data with Dark Sky

One of Dark Sky’s many services is providing a public API that serves real-time weather data. Here, you will use this service to pair real-time weather data with our device’s temperature and humidity readings as they are loaded.

If you don’t have a Dark Sky account you can register for one here. Once you register, you’ll need to copy down your API key as you will incorporate it into Stitch shortly.

To start, you will set up two Values in Stitch. You could hard-code values like these in your functions, but using Values is preferred as it makes them accessible for reuse.

  • In the Stitch UI, go to ‘Values’ found on the left-hand menu
  • Create two new values, clicking ‘Save’ after each is set up:
    • DeviceLocation: Enter the latitude/longitude of where your device is located in the form "40.757,-73.987" – this example is for Times Square in New York City.
    • DarkSkyKey: Enter the API key that you received from Dark Sky.
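These two Values feed the Dark Sky request. The endpoint format below comes from Dark Sky’s public API documentation; the helper name is hypothetical, not code from the sample:

```javascript
// Dark Sky's forecast endpoint takes /[key]/[latitude],[longitude] — exactly
// the form the DeviceLocation Value is stored in.
function darkSkyUrl(darkSkyKey, deviceLocation) {
  return "https://api.darksky.net/forecast/" + darkSkyKey + "/" + deviceLocation;
}

console.log(darkSkyUrl("YOUR_KEY", "40.757,-73.987"));
// https://api.darksky.net/forecast/YOUR_KEY/40.757,-73.987
```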

Next, create an HTTP service to make the actual Dark Sky request. To do this:

  1. In the Stitch UI, click ‘Services’, then click ‘HTTP’, and name your service ‘darksky’ before creating
  2. After creation, click into the ‘Rules’ tab, and then click ‘New’. The name of this rule does not matter.
  3. To create a rule that enables GET requests, click on the ‘GET’ action and set this rule to {}.

Since the code that calls the Dark Sky API is already in the Imp_Write Function, all you need to do now is uncomment the lines that call Dark Sky. They should be the only commented-out lines containing code. Now, when your impExplorer loads data, Stitch will automatically combine it with data from the Dark Sky API.

Searching your Data with Twilio

Now let’s take a look at serving your data. Since you are using MongoDB Atlas behind the scenes you can always connect to your database and query it directly. However, today we’ll also show you how you can query your database via Twilio + Stitch.

If you don’t have a Twilio account and an SMS-enabled phone number, you can learn more and create one here.

Once you have your Twilio account set up, keep your Account SID and Auth Token handy, as you will use them to configure your Twilio service. To do this:

  • Go to the Stitch UI and click on ‘Services’
  • Click ‘Twilio’, assign the service the name ‘twilio’, and then add your SID and Token.
  • Once your Twilio service is configured, set up a rule that enables Stitch to send text messages. To do this:
    • Click into the ‘Rules’ tab of your Twilio service
    • Click ‘New Rules’, assign it any name that you like, and then click ‘Add Rule’
    • Now click into the rule and click the ‘send’ to enable it.

You will also need to store your Twilio phone number as a Value named ‘TwilioPhone’ in Stitch. To do this:

  1. In the Stitch UI, Navigate to ‘Values’ using the left-hand menu.
  2. Click ‘New’, add your Twilio number in the form “+15558675309”, and click ‘Save’.

After that is taken care of you are going to set up an incoming webhook that will parse a text message from Twilio, use it to search the database, and then send a return text message with weather information for a specific time. This code is contained in the twilio_webhook.js file.

To add this code to Stitch:

  1. In the Stitch UI, navigate to your Twilio service
  2. Under the ‘Incoming Webhooks’ tab click ‘+ Add Incoming Webhook’
  3. Under the ‘settings’ tab assign your webhook a name (exact naming is not important) and enable ‘Respond with Result’ by clicking on it
  4. Then, in the Function Editor, paste the code from the twilio_webhook.js file and click the save icon in the upper left-hand corner.

Before moving on, copy down the ‘Webhook URL’ under the ‘Settings’ tab. You’ll need to add this to Twilio in our final step.

Now that the webhook is set up in Stitch, make sure that Twilio will call it correctly. In order to do this:

  1. Go to your Twilio Console and navigate to ‘Phone Numbers’ using the left-hand menu.
  2. Click the number that you are using for this demo, then go to the ‘Messaging’ section at the bottom of the page.
  3. Make sure that ‘Configure With’ is set to ‘Webhooks,...’ and that when ‘A Message Comes In’ is configured to execute your Twilio webhook by selecting ‘Webhook’ and pasting your Webhook URL.
  4. Save any changes before navigating away.

Twilio will only be able to access data via the webhook it is assigned, so this integration doesn’t weaken your app’s security.

Test everything out!

Now that you’ve got everything up and running, try querying your data via Twilio by sending a message to your Twilio number in one of the following formats:

  • Send “Temp” or “Temp Now” to get the current temperature.
  • Or search for a specific time by sending “Temp [Time]” where [Time] is in the format “YYYY-MM-DDThh:mm” (ex. “Temp 2017-11-27T11:20”)
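A hypothetical sketch of how the webhook might interpret those two formats (this mirrors the behavior described above, not the actual twilio_webhook.js code):

```javascript
// "Temp" / "Temp Now" → current reading; "Temp <ISO time>" → lookup time.
function parseTempCommand(body) {
  const parts = body.trim().split(/\s+/);
  if (parts[0].toLowerCase() !== "temp") return null; // not a temp query
  if (parts.length === 1 || parts[1].toLowerCase() === "now") {
    return { when: "now" };
  }
  return { when: new Date(parts[1]) }; // e.g. "2017-11-27T11:20"
}
```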

Summary

Both MongoDB Stitch and Electric Imp offer you a big leg up when getting your IoT project off the ground. Now that you’ve gone through the basics with Stitch and Electric Imp, you can move on to some of our other projects, like building a real-time dashboard, or building your next great idea.

As always, we’re interested to hear about what you’re building. Please continue the conversation by commenting on this post or send me a note at drew.dipalma@mongodb.com to share what you’re working on!

Building a voice-activated movie search app powered by Amazon Lex, Lambda, and MongoDB Atlas - Part 3

It's that time of year again! This post is part of our Road to AWS re:Invent 2017 blog series. In the weeks leading up to AWS re:Invent in Las Vegas this November, we'll be posting about a number of topics related to running MongoDB in the public cloud. See all posts here.

Introduction

This is Part 3 of our Amazon Lex blog post series, part of our larger Road to re:Invent 2017 series. As a reminder, this tutorial is divided into 3 parts:

In this last blog post, we will deploy our Lambda function using the AWS Command Line Interface and verify that the bot fully works as expected. We’ll then review the code that makes up our Lambda function and explain how it works.

Let’s deploy our AWS Lambda function

Please follow the deployment steps available in this GitHub repository. I have chosen to use Amazon’s SAM Local tool to showcase how you can test your Lambda function locally using Docker, as well as package it and deploy it to an AWS account in just a few commands. However, if you’d like to deploy it manually to the AWS Console, you can always use this zip script to deploy it in pretty much the same way I did in this MongoDB Atlas with Lambda tutorial.

Let’s test our Lex bot (end-to-end)

Now that our Lambda fulfillment function has been deployed, let’s test our bot again in the Amazon Lex console and verify that we get the expected response. For instance, we might want to search for all the romance movies Jennifer Aniston starred in, a scenario we can test with the following bot conversation:

Amazon Lex Test Bot UI

As the screenshot above shows, the Lex bot replies with the full list of Jennifer Aniston’s romance movies, retrieved from our movies MongoDB database through our Lambda function. But how does our Lambda function process that request? We’ll dig deeper into its code in the next section.

Let's dive into the Lambda function code

Our Lambda function always receives a JSON payload whose structure complies with Amazon Lex’ input event format (such as this event.json file):

{
  "messageVersion": "1.0",
  "invocationSource": "FulfillmentCodeHook",
  "userId": "user-1",
  "sessionAttributes": {},
  "bot": {
    "name": "SearchMoviesBot",
    "alias": "$LATEST",
    "version": "$LATEST"
  },
  "outputDialogMode": "Text",
  "currentIntent": {
    "name": "SearchMovies",
    "slots": {
      "castMember": "jennifer aniston",
      "year": "0",
      "genre": "Romance"
    }
  }
}

Note that the request contains the bot’s name (SearchMoviesBot) and the slot values representing the answers to the bot’s questions provided by the user.

The Lambda function starts with the exports.handler method which validates the bot’s name and performs some additional processing if the payload is received through Amazon API Gateway (this is only necessary if you want to test your Lambda function through Amazon API Gateway but is not relevant in an Amazon Lex context). It then calls the dispatch() method, which takes care of connecting to our MongoDB Atlas database and passing on the bot’s intent to the query() method, which we’ll explore in a second. Note that the dispatch() method uses the performance optimization technique I highlighted in Optimizing AWS Lambda performance with MongoDB Atlas and Node.js, namely not closing the database connection and using the callbackWaitsForEmptyEventLoop Lambda context property. This allows our bot to be more responsive after the first query fulfilled by the Lambda function.
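The connection-reuse pattern described above can be sketched as follows (connectToAtlas stands in for MongoClient.connect, and all names here are illustrative, not the post’s actual code):

```javascript
// Connection reuse across warm AWS Lambda invocations (illustrative sketch;
// a real handler would also set context.callbackWaitsForEmptyEventLoop = false
// so the open socket doesn't keep the callback from firing).
let cachedDb = null;

function dispatch(event, connectToAtlas, callback) {
  if (cachedDb) {
    // A previous invocation left the connection open: reuse it.
    return query(cachedDb, event.currentIntent, callback);
  }
  connectToAtlas(function (db) {
    cachedDb = db; // deliberately never closed
    query(cachedDb, event.currentIntent, callback);
  });
}

function query(db, intent, callback) {
  // Stand-in for the real query() method discussed below.
  callback(null, intent.name);
}
```

Because cachedDb survives between invocations of a warm container, only the first request pays the connection cost, which is what makes the bot more responsive afterwards.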

Let’s now take a closer look at the query() method, which is the heart and soul of our Lambda function. First, the method retrieves the cast member, movie genre, and movie release year from the incoming slots. Because these values all come in as strings and the movie release year is stored as an integer in MongoDB, the function must convert that value to an integer.
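Concretely (using the slot names from the event payload above), the conversion boils down to:

```javascript
// Slots arrive as strings; Year is stored as an int in MongoDB, so parse it.
const slots = { castMember: "jennifer aniston", year: "0", genre: "Romance" };

const castMember = slots.castMember;
const genre = slots.genre;
const year = parseInt(slots.year, 10); // "0" is the user's escape value

console.log(year === 0); // true
```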

We then build the query we will run against MongoDB:

var castArray = [castMember];

var matchQuery = {
  Cast: { $in: castArray },
  Genres: { $not: { $in: ["Documentary", "News", ""] } },
  Type: "movie"
};

if (genre != undefined && genre != allGenres) {
  matchQuery.Genres = { $in: [genre] };
  msgGenre = genre.toLowerCase();
}

if (year != undefined && !isNaN(year) && year > 1895) {
  matchQuery.Year = year;
  msgYear = year;
}

We first restrict the query to items that are indeed movies (since the database also stores TV series) and we exclude some irrelevant movie genres, such as documentary and news. We also make sure we only query movies in which the cast member starred. Note that the $in operator expects an array, which is why we have to wrap our single cast member into the castArray array. Since the cast member is the only mandatory query parameter, we add it first and then optionally add the Genres and Year parameters if the code determines that they were provided by the user (i.e. the user did not use the All and/or 0 escape values).

The query() method then goes on to define the default response message based on the user-provided parameters. This default response message is used if the query doesn’t return any matching element:

var resMessage = undefined;
  if (msgGenre == undefined && msgYear == undefined) {
    resMessage = `Sorry, I couldn't find any movie for ${castMember}.`;
  }
  if (msgGenre != undefined && msgYear == undefined) {
    resMessage = `Sorry, I couldn't find any ${msgGenre} movie for ${castMember}.`;
  }
  if (msgGenre == undefined && msgYear != undefined) {
    resMessage = `Sorry, I couldn't find any movie for ${castMember} in ${msgYear}.`;
  }
  if (msgGenre != undefined && msgYear != undefined) {
    resMessage = `Sorry, ${castMember} starred in no ${msgGenre} movie in ${msgYear}.`;
  }

The meat of the query() method happens next as the code performs the database query using 2 different methods: the classic db.collection.find() method and the db.collection.aggregate() method. The default method used in this Lambda function is the aggregation one, but you can easily test the find() method by setting the aggregationFramework variable to false.

In our specific use case scenario (querying for one single cast member and returning a small number of documents), there likely won’t be any noticeable performance or programming logic impact. However, if we were to query for all the movies multiple cast members each starred in (i.e. the union of these movies, not the intersection), the aggregation framework query is a clear winner. Indeed, let’s take a closer look at the find() query the code runs:

cursor = db.collection(moviesCollection)
      .find(matchQuery, { _id: 0, Title: 1, Year: 1 })
      .collation(collation)
      .sort({ Year: 1 });

It’s a fairly simple query that retrieves the movie’s title and year, sorted by year. Note that we also use the same { locale: "en", strength: 1 } collation we used to create the case-insensitive index on the Cast property in Part 2 of this blog post series. This is critical since the end user might not title case the cast member’s name (and Lex won’t do it for us either).
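A strength-1 collation compares on base letters only, ignoring case (and accent) differences. JavaScript’s localeCompare with sensitivity "base" behaves analogously, which illustrates why the lowercase user input still matches:

```javascript
// The { locale: "en", strength: 1 } MongoDB collation matches regardless of
// case; localeCompare with sensitivity "base" is the closest JS analog.
const same =
  "jennifer aniston".localeCompare("Jennifer Aniston", "en", { sensitivity: "base" }) === 0;

console.log(same); // true
```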

The simplicity of the query is in contrast to the relative complexity of the app logic we have to write to process the result set we get with the find() method:

var maxYear, minYear;
for (var i = 0, len = results.length; i < len; i++) {
    castMemberMovies += `${results[i].Title} (${results[i].Year}), `;
}

// removing the trailing comma and space
castMemberMovies = castMemberMovies.substring(0, castMemberMovies.length - 2);

moviesCount = results.length;
minYear = results[0].Year;
maxYear = results[results.length - 1].Year;
yearSpan = maxYear - minYear;

First, we have to iterate over all the results to concatenate their Title and Year properties into a legible string. This might be fine for 20 items, but if we had to process hundreds of thousands or millions of records, the performance impact would be very noticeable. We further have to remove the trailing comma and space from the concatenated string since they’re in excess. We also have to manually retrieve the number of movies, as well as the low and high ends of the movie release years, in order to compute the time span it took the cast member to shoot all these movies. This might not be particularly difficult code to write, but it’s clutter code that affects app clarity. And, as I wrote above, it definitely doesn’t scale when processing millions of items.

Contrast this app logic with the succinct code we have to write when using the aggregation framework method:

for (var i = 0, len = results.length; i < len; i++) { 
    castMemberMovies = results[i].allMovies;
    moviesCount = results[i].moviesCount;
    yearSpan = results[i].timeSpan;
}

The code is not only much cleaner and concise now, it’s also more generic, as it can handle the situation where we want to process movies for each of multiple cast members. You can actually test this use case by uncommenting the following line earlier in the source code:

castArray = [castMember, "Angelina Jolie"]

and by testing it using this SAM script.

With the aggregation framework, we get the correct raw and final results without changing a single line of code:

MongoDB Aggregation Framework Query Response

However, the find() method’s post-processing requires some significant effort to fix this incorrect output (the union of comedy movies in which Angelina Jolie or Brad Pitt starred in, all incorrectly attributed to Brad Pitt):

MongoDB Find Query Response

We were able to achieve this code conciseness and correctness by moving most of the post-processing logic to the database layer using a MongoDB aggregation pipeline:

cursor = db.collection(moviesCollection).aggregate(
      [
        { $match: matchQuery },
        { $sort: { Year: 1 } },
        unwindStage,
        castFilterStage,
        { $group: {
            _id: "$Cast",
            allMoviesArray: {$push: {$concat: ["$Title", " (", { $substr: ["$Year", 0, 4] }, ")"] } },
            moviesCount: { $sum: 1 },
            maxYear: { $last: "$Year" },
            minYear: { $first: "$Year" }
          }
        },
        {
          $project: {
            moviesCount: 1,
            timeSpan: { $subtract: ["$maxYear", "$minYear"] },
            allMovies: {
              $reduce: {
                input: "$allMoviesArray",
                initialValue: "",
                in: {
                  $concat: [
                    "$$value",
                    {
                      $cond: {
                        if: { $eq: ["$$value", ""] },
                        then: "",
                        else: ", "
                      }
                    },
                    "$$this"
                  ]
                }
              }
            }
          }
        }
      ],
      { collation: collation }
    );

This aggregation pipeline is arguably more complex than the find() method discussed above, so let’s try to explain it one stage at a time (since an aggregation pipeline consists of stages that transform the documents as they pass through the pipeline):

  1. $match stage: performs a filter query to only return the documents we’re interested in (similarly to the find() query above).
  2. $sort stage: sorts the results by year ascending.
  3. $unwind stage: splits each movie document into multiple documents, one per cast member in the original document. For each original document, this stage unwinds the Cast array and creates separate documents with the same values as the original, except for the Cast property, which is now a single string (one cast member) in each unwound document. This stage is necessary to be able to group by only the cast members we’re interested in (especially if there is more than one). Its output may contain documents for other cast members irrelevant to our query, so we must filter them out in the next stage.
  4. $match stage: filters the deconstructed documents from the $unwind stage by only the cast members we’re interested in. This stage essentially removes all the documents tagged with cast members irrelevant to our query.
  5. $group stage: groups movies by cast member (for instance, all movies with Brad Pitt and all movies with Angelina Jolie, separately). This stage also concatenates each movie title and release year into the Title (Year) format and adds it to an array called allMoviesArray (one such array for each cast member). This stage also computes a count of all movies for each cast member, as well as the earliest and latest year the cast member starred in a movie (of the requested movie genre, if any). This stage essentially performs most of the post-processing we previously had to do in our app code when using the find() method. Because that post-processing now runs at the database layer, it can take advantage of the database server’s computing power along with the distributed system nature of MongoDB (in case the collection is partitioned across multiple shards, each shard performs this stage independently of the other shards).
  6. $project stage: last but not least, this stage performs a $reduce operation (new in MongoDB 3.4) to concatenate our array of ‘Title (Year)’ strings into one single string we can use as is in the response message sent back to the bot.
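What that final $reduce stage computes can be mimicked with Array.prototype.reduce (the sample titles below are made up):

```javascript
// Joining the grouped "Title (Year)" strings exactly as the $reduce stage
// does: prepend ", " before every element except the first.
const allMoviesArray = ["Movie A (2001)", "Movie B (2004)", "Movie C (2008)"];

const allMovies = allMoviesArray.reduce(
  (value, current) => (value === "" ? current : value + ", " + current),
  ""
);

console.log(allMovies); // "Movie A (2001), Movie B (2004), Movie C (2008)"
```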

Once the matching movies have been retrieved from our MongoDB Atlas database, the code generates the proper response message and sends it back to the bot according to the expected Amazon Lex response format:

if (msgGenre != allGenres) {
    resMessage = `${toTitleCase(castMember)} starred in the following ${moviesCount > 1 ? moviesCount + " " : ""}${msgGenre.toLowerCase()} movie(s)${yearSpan > 0 ? " over " + yearSpan + " years" : ""}: ${castMemberMovies}`;
} else {
    resMessage = `${toTitleCase(castMember)} starred in the following ${moviesCount > 1 ? moviesCount + " " : ""}movie(s)${yearSpan > 0 ? " over " + yearSpan + " years" : ""}: ${castMemberMovies}`;
}
if (msgYear != undefined) {
    resMessage = `In ${msgYear}, ` + resMessage;
}

callback(
    close(sessionAttributes, "Fulfilled", {
        contentType: "PlainText",
        content: resMessage
    })
);

Our Jennifer Aniston fan can now be wowed by the completeness of our bot's response!

Amazon Lex MongoDB response

Wrap-up and next steps

This completes our Lex blog post series and I hope you enjoyed reading it as much as I did writing it.

In this final blog post, we tested and deployed a Lambda function to AWS using the SAM Local tool.

We also learned:

  • How a Lambda function processes a Lex request and responds to it using Amazon Lex’ input and output event formats

  • How to use a case-insensitive index in a find() or aggregate() query

  • How to make the most of MongoDB’s aggregation framework to move complexity from the app layer to the database layer

As next steps, I suggest you take a look at the AWS documentation to learn how to deploy your bot to Facebook Messenger, Slack, or your own web site.

Happy Lex-ing!

About the Author - Raphael Londner

Raphael Londner is a Principal Developer Advocate at MongoDB, focused on cloud technologies such as Amazon Web Services, Microsoft Azure and Google Cloud Platform. Previously he was a developer advocate at Okta as well as a startup entrepreneur in the identity management space. You can follow him on Twitter at @rlondner.

Building a voice-activated movie search app powered by Amazon Lex, Lambda, and MongoDB Atlas - Part 2

It's that time of year again! This post is part of our Road to AWS re:Invent 2017 blog series. In the weeks leading up to AWS re:Invent in Las Vegas this November, we'll be posting about a number of topics related to running MongoDB in the public cloud. See all posts here.

Introduction

This is Part 2 of our Road to re:Invent 2017 blog post series. If you haven’t read it yet, take a look at Part 1 for a brief overview of Amazon Lex and instructions to set up our movie database with MongoDB Atlas, our fully managed database service.

As a reminder, this tutorial is divided into 3 parts:

In this blog post, we will set up our Lex bot in the AWS Console and verify that its basic flow works as expected. We’ll implement the business logic (which leverages MongoDB) in Part 3 of this post series.

Amazon Lex bot setup instructions

In this section, we will go through the whole process of creating our SearchMovies bot while explaining the architectural decisions I made.

After signing in to the AWS Console, select the Lex service (in the Artificial Intelligence section) and press the Create button.

Select the Custom bot option and fill out the form parameters as follows:

  • Bot name: SearchMoviesBot

  • Output voice: None

  • Session timeout: 5

  • COPPA: No

Press the Create button at the bottom of the form.

A new page appears, where you can create an intent. Press the Create Intent button and in the Add intent pop-up page, click the Create new intent link and enter SearchMovies in the intent name field.

In the Slot types section, add a new slot type with the following properties:

  • Slot type name: MovieGenre

  • Description: Genre of the movie (Action, Comedy, Drama…)

  • Slot Resolution: Restrict to Slot values and Synonyms

  • Values: All, Action, Adventure, Biography, Comedy, Crime, Drama, Romance, Thriller


You can add synonyms to all these terms (which strictly match the possible values for movie genres in our sample database), but the most important value to configure synonyms for is All. We will use it as a keyword to avoid filtering on movie genre in scenarios where the user can’t identify the genre of the movie they’re looking for, or wants to retrieve all the movies for a specific cast member. Of course, you can explore the movie database on your own to identify and add other movie genres I haven’t listed above. Once you’re done, press the Save slot type button.

Next, in the Slots section, add the following 3 slots:

  1. genre

    1. Type: MovieGenre

    2. Prompt: I can help with that. What's the movie genre?

    3. Required: Yes

  2. castMember

    1. Type: AMAZON.Actor

    2. Prompt: Do you know the name of an actor or actress in that movie?

    3. Required: Yes

  3. year

    1. Type: AMAZON.FOUR_DIGIT_NUMBER

    2. Prompt: Do you know the year {castMember}'s movie was released? If not, just type 0

    3. Required: Yes

Press the Save Intent button and verify you have the same setup as shown in the screenshot below:

The order of the slots is important here: once the user’s first utterance has been detected to match a Lex intent, the Lex bot will (by default) try to collect the slot values from the user in the order specified above, using the Prompt text defined for each slot. Note that you can use previously collected slot values in subsequent slot prompts, as the year slot demonstrates. For instance, if the user answered Angelina Jolie to the castMember slot prompt, the year slot prompt becomes: ‘Do you know the year Angelina Jolie’s movie was released? If not, just type 0’

Note that it’s important that all the slots are marked Required. Otherwise, the only opportunity for the user to specify them is to mention them in the original utterance. As you will see below, our sample utterances let Lex identify slot values right from the start, but what if the user kicks off the process without mentioning any of them? Slots that aren’t required are simply skipped by the Lex bot, so marking them Required guarantees the user is prompted for each one.

But what if the user doesn’t know the answer to those prompts? We’ve handled this case as well by defining "default" values: All for the genre slot and 0 for the year slot. The only mandatory parameter the bot’s user must provide is the cast member’s name; the user can restrict the search further by providing the movie genre and release year.
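When we wire up the business logic in Part 3, these defaults translate into a simple rule: skip a filter whenever its slot holds the "don’t know" value. Here is a hedged sketch of that filter-building step (the field names Cast, Genres, and Year come from the sample movie documents; the function name is illustrative, not the final Part 3 code):

```javascript
// Sketch: turn the Lex slot values into a MongoDB find() filter.
// 'All' for genre and 0 for year mean "don't filter on this field".
function buildMovieFilter(slots) {
  const filter = { Cast: slots.castMember };   // the one mandatory criterion
  if (slots.genre && slots.genre !== 'All') {
    filter.Genres = slots.genre;               // skip the genre filter for 'All'
  }
  if (slots.year && Number(slots.year) !== 0) {
    filter.Year = Number(slots.year);          // skip the year filter for 0
  }
  return filter;
}
```

With the defaults answered, the bot therefore searches on the cast member alone; any extra slot value narrows the query.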

Last, let’s add the following sample utterances that match what we expect the user will type (or say) to launch the bot:

  • I am looking for a movie

  • I am looking for a {genre} movie

  • I am looking for a movie released in {year}

  • I am looking for a {genre} movie released in {year}

  • In which movie did {castMember} play

  • In which movie did {castMember} play in {year}

  • In which {genre} movie did {castMember} play

  • In which {genre} movie did {castMember} play in {year}

  • I would like to find a movie

  • I would like to find a movie with {castMember}

Once the utterances are configured as per the screenshot below, press Save Intent at the bottom of the page and then Build at the top of the page. The process takes a few seconds, as AWS builds the deep learning model Lex will use to power our SearchMovies bot.

It’s now time to test the bot we just built!

Testing the bot

Once the build process completes, the test window automatically shows up:

Test the bot by typing (or saying) sentences that are close to the sample utterances we previously configured. For instance, you can type ‘Can you help me find a movie with Angelina Jolie?’ and see the bot recognize the sentence as a valid kick-off utterance, along with the {castMember} slot value (in this case, ‘Angelina Jolie’). This can be verified by looking at the Inspect Response panel:

At this point, the movie genre hasn’t been specified yet, so Lex prompts for it (since it’s the first required slot). Once you answer that prompt, notice that Lex skips the second slot ({castMember}) since it already has that information.

Conversely, you can test that the ‘Can you help me find a comedy movie with angelina jolie?’ utterance will immediately prompt the user to fill out the {year} slot since both the {castMember} and {genre} values were provided in the original utterance:

An important point to note here is that enumeration slot types (such as our MovieGenre type) are not case-sensitive: both “comedy” and “coMeDy” resolve to “Comedy”. As a result, we will be able to use a regular index on the Genres property of our movies collection (as long as our enumeration values in Lex match the Genres case in our database).

However, the AMAZON.Actor type is case-sensitive: for instance, “angelina jolie” and “Angelina Jolie” are two distinct values for Lex. This means we must use a case-insensitive index on the Cast property (don’t worry, such an index, called ‘Cast_1’, already ships with our sample movie database). Note that for queries to use that case-insensitive index, our find() query must specify the same collation as the one used to create the index (locale=’en’ and strength=1). But don’t worry for now: I’ll point it out again in Part 3 when we review the code of our chatbot’s business logic (in the Lambda function we’ll deploy).
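To make the collation requirement concrete, here is a sketch: the collation document below must be passed identically at index-creation time and at query time. As an aside, JavaScript’s localeCompare with sensitivity 'base' applies the same comparison rule a strength-1 collation does (base characters only, ignoring case), which is handy for reasoning about what the index will treat as equal:

```javascript
// The collation the 'Cast_1' index uses; find() must pass the identical
// document, or MongoDB cannot use the index (shell syntax for illustration):
//   db.movies.createIndex({ Cast: 1 }, { collation: { locale: 'en', strength: 1 } })
//   db.movies.find({ Cast: 'angelina jolie' }).collation({ locale: 'en', strength: 1 })
const caseInsensitive = { locale: 'en', strength: 1 };

// strength: 1 compares base characters only; localeCompare with
// sensitivity 'base' mirrors that comparison rule.
const equalUnderCollation = (a, b) =>
  a.localeCompare(b, 'en', { sensitivity: 'base' }) === 0;
```

Under that rule, “angelina jolie” and “Angelina Jolie” compare as equal, which is exactly why the case-sensitive AMAZON.Actor values still hit the right documents.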

Summary

In this blog post, we created the SearchMovies Lex bot and tested its flow. More specifically, we:

  • Created a custom Lex slot type (MovieGenre)

  • Configured intent slots

  • Defined sample utterances (some of which use our predefined slots)

  • Tested our utterances and the specific prompt flows each of them starts

We also identified the case sensitivity of a built-in Lex slot that adds a new index requirement on our database.

In Part 3, we’ll get to the meat of this Lex blog post series and deploy the Lambda function that will allow us to complete our bot’s intended action (called ‘fulfillment’ in Lex terminology).

Meanwhile, I suggest the following readings to further your knowledge of Lex and MongoDB:

About the Author - Raphael Londner

Raphael Londner is a Principal Developer Advocate at MongoDB, focused on cloud technologies such as Amazon Web Services, Microsoft Azure and Google Cloud Engine. Previously he was a developer advocate at Okta as well as a startup entrepreneur in the identity management space. You can follow him on Twitter at @rlondner.

The User Guide to AWS re:Invent

This post is a mini-guide that walks through some of the things to do while you are at AWS re:Invent this year.

Building a voice-activated movie search app powered by Amazon Lex, Lambda, and MongoDB Atlas - Part 1

It's that time of year again! This post is part of our Road to AWS re:Invent 2017 blog series. In the weeks leading up to AWS re:Invent in Las Vegas this November, we'll be posting about a number of topics related to running MongoDB in the public cloud. See all posts here.

Introduction

As we prepare to head out to Las Vegas for AWS re:Invent 2017, I thought it’d be a good opportunity to explore how to combine serverless and artificial intelligence services such as Lex and Lambda with MongoDB Atlas, our fully managed database service.

This tutorial is divided into 3 parts:

Since this is Part 1 of our blog series, let’s dig right into it now.

What is Amazon Lex?

Amazon Lex is a deep learning service provided by AWS to power conversational bots (more commonly known as "chatbots"), which can either be text- or voice-activated. It’s worth mentioning that Amazon Lex is the technology that powers Alexa, the popular voice service available with Amazon Echo products and mobile applications (hence the Lex name). Amazon Lex bots are built to perform actions (such as ordering a pizza), which in Amazon lingo are referred to as intents.

Note that each bot may perform multiple intents (such as "booking a flight" and “booking a hotel”), which can each be kicked off by distinct phrases (called utterances). This is where the Natural Language Understanding (NLU) power of Lex bots shines — you define a few sample utterances and let the Lex AI engine infer all the possible variations of these utterances (another interesting aspect of Lex’s AI engine is its Automatic Speech Recognition technology, which allows bots to be voice-activated).

Let's illustrate this concept with a fictitious, movie search scenario. If you create a SearchMovies intent, you may want to define a sample utterance as “I would like to search for a movie”, since you expect it to be what the user will say to express their movie search intention. But as you may well know, human beings have a tendency to express the same intention in many different ways, depending on their mood, cultural background, language proficiency, etc... So if the user types (or says) “I’d like to find a movie” or “I’d like to see a movie”, what happens? Well, you’ll find that Lex is smart enough to figure out that those phrases have the same meaning as “I would like to search for a movie” and consequently trigger the “SearchMovies” intent.

However, as our ancestors the Romans would say, dura lex sed lex and if the user’s utterance veers too far away from the sample utterances you have defined, Lex would stop detecting the match. For instance, while "I’d like to search for a motion picture" and “I’d like to see a movie” are detected as matches of our sample utterance (I would like to search for a movie), “I’d like to see a motion picture” is not (at least in the tests I performed).

The interim conclusion I drew from that small experiment is that Lex’ AI engine is not yet ready to power Blade Runner’s replicants or Westworld’s hosts, but it definitely can be useful in a variety of situations (and I’m sure the AWS researchers are hard at work to refine it).

In order to fulfill the intent (such as providing the name of the movie the user is looking for), Amazon Lex typically needs some additional information, such as the name of a cast member, the movie genre, and the movie release year. These additional parameters are called slots in Lex terminology, and they are collected one at a time after a specific Lex prompt.

For instance, after an utterance is detected to launch the SearchMovies intent, Lex may ask the following questions to fill all the required slots:

  • What's the movie genre? (to fill the genre slot)

  • Do you know the name of an actor or actress with a role in that movie? (to fill the castMember slot)

  • When was the movie released? (to fill the year slot)

Once all the required slots have been filled, Lex tries to fulfill the intent by passing all the slot values to some business logic code that performs the necessary action (e.g., searching for matching movies in a movie database, or booking a flight). As expected, AWS promotes its own technologies, so Lex has built-in support for Lambda functions, but you can also "return parameters to the client", which is the method you’ll want to use if you prefer to process the fulfillment in your application code (in conjunction with the Amazon Lex Runtime Service API).

Demo bot scenario

Guess what? This will be a short section since the scenario we will implement in this blog post series is exactly the "fictitious example" I described above (what a coincidence!).

Indeed, we are going to build a bot allowing us to search for movies among those stored in a movie database. The data store we will use is a MongoDB database running in MongoDB Atlas, which is a good serverless fit for developers and DevOps folks who don’t want to set up and manage infrastructure.

Speaking of databases, it’s time for us to deploy our movie database to MongoDB Atlas before we start building our Lex bot.

Data setup and exploration

To set up the movie database, follow the instructions available in this GitHub repository.

Note that in order to keep the database dump file under GitHub's 100MB limit per file, the database I have included isn’t complete (for instance, it doesn’t include movies released prior to 1950 - sincere apologies to Charlie Chaplin fans).

Now, let’s take a look at a typical document in this database (Mr. & Mrs. Smith released in 2005):

{
    "_id" : ObjectId("573a13acf29313caabd287dd"),
    "ID" : 356910,
    "imdbID" : "tt0356910",
    "Title" : "Mr. & Mrs. Smith",
    "Year" : 2005,
    "Rating" : "PG-13",
    "Runtime" : "120 min",
    "Genre" : "Action, Comedy, Crime",
    "Released" : "2005-06-10",
    "Director" : "Doug Liman",
    "Writer" : "Simon Kinberg",
    "Cast" : [
        "Brad Pitt",
        "Angelina Jolie",
        "Vince Vaughn",
        "Adam Brody"
    ],
    "Metacritic" : 55,
    "imdbRating" : 6.5,
    "imdbVotes" : 311244,
    "Poster" : "http://ia.media-imdb.com/images/M/MV5BMTUxMzcxNzQzOF5BMl5BanBnXkFtZTcwMzQxNjUyMw@@._V1_SX300.jpg",
    "Plot" : "A bored married couple is surprised to learn that they are both assassins hired by competing agencies to kill each other.",
    "FullPlot" : "John and Jane Smith are a normal married couple, living a normal life in a normal suburb, working normal jobs...well, if you can call secretly being assassins \"normal\". But neither Jane nor John knows about their spouse's secret, until they are surprised to find each other as targets! But on their quest to kill each other, they learn a lot more about each other than they ever did in five (or six) years of marriage.",
    "Language" : "English, Spanish",
    "Country" : "USA",
    "Awards" : "9 wins & 17 nominations.",
    "lastUpdated" : "2015-09-04 00:02:26.443000000",
    "Type" : "movie",
    "Genres" : [
        "Action",
        "Comedy",
        "Crime"
    ]
}

I have highlighted the properties of interest to our use case. Each movie record typically includes the principal cast members (stored in a string array), a list of genres the movie can be categorized in (stored in a string array) and a release year (stored as a 4-digit integer).

These are the 3 properties we will leverage in our Lex bot (which we will create in Part 2) and consequently in our Lambda function (which we will build in Part 3) responsible for querying our movies database.

Storing these properties as string arrays is key to ensure that our bot is responsive: they allow us to build small, multikey indexes that will make our queries much faster compared to full collection scans (which regex queries would trigger).
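The reason is that an equality filter like { Genres: "Comedy" } matches a document whenever any element of the array equals the value, and a multikey index on the array field answers that lookup directly. A tiny plain-JavaScript illustration of the matching behavior, using the sample document above:

```javascript
// Equality against an array field matches any element; this is what lets a
// small multikey index on Genres or Cast serve our slot-driven filters.
const movie = {
  Title: 'Mr. & Mrs. Smith',
  Year: 2005,
  Genres: ['Action', 'Comedy', 'Crime'],
  Cast: ['Brad Pitt', 'Angelina Jolie', 'Vince Vaughn', 'Adam Brody']
};

// A filter like { Genres: 'Comedy' } behaves like this per-document test:
const matchesArrayField = (doc, field, value) => doc[field].includes(value);
```

Had we stored "Action, Comedy, Crime" as a single string instead, matching a single genre would require a regex, which cannot use the index efficiently.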

Summary

In this blog post, we introduced the core concepts of Amazon Lex and described the scenario of the Lex bot we’ll create in Part 2. We then deployed a sample movie database to MongoDB Atlas, explored the structure of a typical movie document and identified the fields we’ll use in the Lambda function we’ll build in Part 3. We then reviewed the benefits of using secondary indexes on these fields to speed up our queries.

I have only scratched the surface on all these topics, so here is some additional content for those of you who strive to learn more:

I hope this introduction to Lex has drawn enough interest for you to continue our journey with Part 2!

About the Author - Raphael Londner

Raphael Londner is a Principal Developer Advocate at MongoDB, focused on cloud technologies such as Amazon Web Services, Microsoft Azure and Google Cloud Engine. Previously he was a developer advocate at Okta as well as a startup entrepreneur in the identity management space. You can follow him on Twitter at @rlondner.

Building a NodeJS App with MongoDB Atlas and AWS Elastic Container Service - Part 2

In my last post, we started preparing an application built on Node.js and MongoDB Atlas for simple CRUD operations. We've completed the initial configuration of the code and are now ready to launch this into production.

Predictions for AWS re:Invent 2017 (tl;dr: AI & IoT)

This post is the second installment of our Road to AWS re:Invent 2017 blog series. In the weeks leading up to AWS re:Invent in Las Vegas this November, we'll be posting about a number of topics related to running MongoDB in the public cloud. See all posts here.

In just under two months, more than 46,000 technologists will descend on Las Vegas for this year’s AWS re:Invent. Ranging from seasoned members of the AWS community to the cloud-curious, re:Invent attendees should expect the conference’s sixth iteration to deliver the same parade of ecosystem partners, an extensive agenda focused on moving to (and being successful in) AWS cloud, and the inevitable announcement of a fresh batch of new AWS services.

In attempting to predict what this year’s re:Invent keynote will unveil, we’ll look at how the industry has changed since last November, as well as Amazon’s track record for debuting new products at past re:Invents.

Since last year’s conference, the two most significant shifts in the space are underpinned by the two largest trends of the moment: AI and IoT.

It is safe to assume that we will see an augmentation of AWS’s artificial intelligence and machine learning offerings next month. Last year’s conference brought us Lex, Polly, and Rekognition as Amazon made its entrée into advanced text, voice, and image processing. Widespread adoption of this flavor of artificial intelligence is still modest, so these releases may have been overshadowed by seemingly more relevant tools like Athena, which allows users to run SQL-based queries on data stored in S3. Nonetheless, the development of its AI portfolio is of strategic importance for AWS. Despite being the most popular public cloud, Amazon has faced increasing pressure from Azure and Google Cloud Platform. The latter has been able to differentiate itself among the early-adopter community primarily for its more mature AI offerings. To remain dominant over Google in the space, Amazon must prove able to keep up with the same pace of innovation in this sector.

The areas that appear most ripe for innovation from AWS this year are in voice, image, and video analysis. Already, we have seen success among e-commerce players when using text and image-based search to shorten their conversion cycles. In fact, Gartner reports that voice-based search is the fastest growing mobile search type. The opportunity to exploit users’ devices for image and voice-based search is evident in Amazon’s offerings (Alexa, Amazon iOS/Android app). Furthermore, the explosion of intelligent chat-based interfaces (Messenger, Drift, etc.) has increased the demand for a broader set of capabilities in natural language processing services like Lex. As a result, we should be prepared to see further enhancements to Lex, Polly, and Rekognition.

Video remains the one area of machine learning-based processing AWS has yet to touch. As their image analysis engines improve, the next logical step would be for the low-latency processing of video inputs. With the untold volume of video content being generated every day by ever-improving cameras, it stands to reason that organizations will want to turn that into insight and profit.

These first two predictions hint at another group of potential releases we could see from AWS next month. The development of extensible models for the analysis of text, voice, image, and video is predicated on the accessibility of high quality, low-cost microphones and cameras. While smartphones have supported these inputs for more than a decade now, the availability of WiFi and reliable cellular networks has increased the speed and frequency by which their outputs can be shared or uploaded for further analysis.

So, that brings us to our next theme: the Internet of Things.

Many analysts and skeptics have suggested IoT adoption is weak and its promises are over-hyped. Their skepticism is primarily centered on two ongoing challenges with IoT: 1) the lack of one or two emergent platforms on which IoT technologies can standardize and 2) the relatively limited ability for data from decentralized sensors to be analyzed at “the edge” rather than in a central cloud.

As with operating systems, media encodings, or network protocols, mass adoption of the technologies they support is typically preceded by one to three main players emerging as the default options. AWS entered the competition to build the winning IoT platform at re:Invent 2015 with its announcement of AWS IoT. All other major technology companies have made similar bids for dominance of this market. In addition, there are hundreds of venture-funded startups aiming to serve as a universal platform untethered from an existing “marketecture.” Nevertheless, the fact remains that no winner in this race has yet been crowned.

This remains a large opportunity and Amazon is well-poised with its existing portfolio of software and ecosystem of networking and hardware partners. AWS appeared to renew its commitment to capturing the IoT market at last year’s re:Invent with the debut of AWS Greengrass and Lambda@Edge. Greengrass allows for the running of Lambda functions on local, offline devices rather than in Amazon’s cloud. Lambda@Edge is one of AWS’s first forays into “edge computing,” allowing users to run low-latency and device-specific Node.js functions in their “edge locations”. Both releases mark a shift from centralized cloud computing to distributed edge computing—perhaps less comfortable for AWS, but necessary for sometimes-offline or time-sensitive IoT projects.

However, Greengrass was just the first step to enabling AWS users to better serve disparate, intermittently-connected devices. Notably, Greengrass still requires ML-powered data processing and analysis to take place in the cloud rather than locally (at the edge). Improvements in hardware technology may also prompt AWS to improve their on-device offerings and make services like S3 and DynamoDB available outside of their infrastructure to better store and process sensor data on the devices themselves. Similarly, we may also see devices become a more significant player in more seasoned services like Kinesis, enabling the local ingestion of data.

No matter what gets announced on the keynote stage this year, you can rest assured it will lead the conversation for the months that follow.

Building a NodeJS App with MongoDB Atlas and AWS Elastic Container Service - Part 1

It's that time of year again! This post is part of our Road to AWS re:Invent 2017 blog series. In the weeks leading up to AWS re:Invent in Las Vegas this November, we'll be posting about a number of topics related to running MongoDB in the public cloud.