GIANT Stories at MongoDB

Sitecore Tutorial: Deploy Sitecore on Azure & MongoDB Atlas

Sitecore training

This blog post is a tutorial written for Sitecore administrators who would like to deploy Sitecore on Microsoft Azure with MongoDB Atlas as the Database as a Service (DBaaS provider) for Sitecore’s MongoDB databases.

The Sitecore Azure Toolkit scripts allow you to easily deploy Sitecore as an App Service on Microsoft Azure, but the setup and configuration of the required analytics and tracking MongoDB databases is the responsibility of the operations team running the Sitecore cloud deployment.

Now that MongoDB Atlas is available on Microsoft Azure, you can use it to dramatically accelerate the time to market of your Sitecore Cloud deployment. Atlas makes maintenance easy by relying on MongoDB’s expertise to maintain the database for you instead of setting up and operating your own MongoDB infrastructure on Microsoft Azure Virtual Machines. Additionally, by hosting your Sitecore VMs in the same region as your MongoDB Atlas clusters, you benefit from fast, local Internet connections between Azure VMs and MongoDB Atlas. Here is what Sitecore has to say:

“With MongoDB Atlas on Azure, Sitecore customers now have the benefit of sourcing MongoDB directly from its creators,” said Ryan Donovan, Senior Vice President of Product Management at Sitecore. “This, coupled with MongoDB’s enterprise-class support and service levels, delivers a vehicle that seamlessly complements Sitecore’s strong commitment to the Microsoft Azure cloud.”

Sitecore deployment on Azure

To install Sitecore on Microsoft Azure, you should start by reading the related Sitecore documentation page.

Once you have chosen your Sitecore deployment type (XP0, XM, XP or XDB) and uploaded the corresponding WebDeploy package to your Microsoft Azure storage account, head over to MongoDB Atlas to prepare the cluster. You will use it to host your Sitecore MongoDB database. If you don’t have a MongoDB Atlas account yet, register here to create one.

It is possible to host your Sitecore MongoDB cluster in an existing Atlas group, but recall that security configurations are scoped at the group level, not the cluster level. I highly recommend using a new, independent Atlas group for security reasons (namely, to keep its IP Whitelisting and database users configuration independent). The following tutorial assumes that we will deploy a Sitecore 8.2.3 XP0 environment using a dedicated Atlas group we’ll name Sitecore-Azure.

MongoDB Atlas cluster setup

Once you have signed in to MongoDB Atlas, select your name in the top right corner of any MongoDB Atlas page and select My Groups.

Sitecore training

Add a new group called Sitecore-Azure and make sure you choose MongoDB Atlas as the group type.

Sitecore training

Once your Atlas group has been created, press the Build a New Cluster button. Give a name to your cluster (for instance, Sitecore). Choose the Microsoft Azure provider and the region of your choice (among those supported by MongoDB Atlas). Using the same deployment region as your Sitecore web and Azure SQL servers provides latency benefits and cost savings. In this tutorial, I chose to deploy Sitecore in the westus region.

Sitecore training

Choose the M30 cluster instance size, knowing that you will always have the option to scale up to a larger instance size, without any upgrade downtime at all.

Sitecore training

Since we’re setting up a brand new cluster, you’ll need an administrator account. Scroll down to configure your cluster admin user (I use atlasAdmin as the admin user name) and press the Continue to Payment button. After filling out your credit card information, MongoDB Atlas starts provisioning your Sitecore cluster. It’s that easy!

MongoDB Atlas cluster security configuration

Sitecore needs a MongoDB database user account for access to its databases. While your cluster is being provisioned, head over to the Security tab to create database users. We highly recommend that you follow least-privilege access best practices and create a specific user for each of the 4 MongoDB databases.

Press the Add New User button to create the database user we’ll use to access the Analytics database. Each database user is bound to one or more databases; in this tutorial, I chose the username scAnalytics and the analytics database name sitecoreAnalytics. The scAnalytics user should only have readWrite permissions on this database, as shown in the screenshot below. The readWrite built-in role gives Sitecore all the access it needs to create collections and modify data while still following the least-privilege access best practice.

Select the Show Advanced Options link in the User Privileges section to add the readWrite permission.

Sitecore training

After creating 3 additional users for the 3 Sitecore tracking databases with similar permissions, the Security/MongoDB Users tab should display the following users:

Sitecore training
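For reference, here is what the equivalent least-privilege grant looks like in mongo shell syntax for a self-managed deployment (Atlas manages users through its UI and API, so this is purely illustrative, and the password is a placeholder):

use admin
db.createUser({
  user: "scAnalytics",
  pwd: "PASSWORD1",                                           // placeholder – use your own strong password
  roles: [ { role: "readWrite", db: "sitecoreAnalytics" } ]   // readWrite on a single database only
})
// The scTrackingLive, scTrackingHistory and scTrackingContact users follow the same pattern,
// each scoped to its own sitecoreTracking* database.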

Now that we have user accounts, let’s move back to provisioning Sitecore. Before provisioning our Sitecore environment, we need to retrieve our database cluster’s connection string. Select the Clusters tab, select the Sitecore cluster and press the Connect button.

In the pop-up window, press the Copy button next to the URI Connection String and paste the connection string into a safe location.

Sitecore training

It’s now time to set up your Sitecore Cloud environment. There are 2 ways you can provision your Sitecore Cloud environment in Azure:

  1. Using the Sitecore Azure Toolkit
  2. Using the Sitecore Azure Marketplace wizard

I'll cover both options in the sections below.

Sitecore Cloud environment setup with the Sitecore Azure Toolkit

First, make sure your Windows (physical or virtual) machine matches the Sitecore Azure Toolkit requirements.

Next, from the Sitecore Azure Quickstarts GitHub repository, download the azuredeploy.parameters.json file from the proper folder. Since I want to install Sitecore 8.2.3 in a XP0 configuration, the corresponding folder is https://github.com/Sitecore/Sitecore-Azure-Quickstart-Templates/tree/master/Sitecore%208.2.3/xp0. Put this file at the root of the Sitecore Azure Toolkit folder on your Windows operating system, along with your Sitecore license file. Next, open the azuredeploy.parameters.json file in your favorite text editor.

Using Microsoft Azure Storage Explorer, right-click on each WDP file you previously uploaded to your Azure Storage account (as instructed in the Prepare WebDeploy packages section) and select the Get Shared Access Signature menu:

Sitecore training

The Shared Access Signature window appears. Note that the Start and Expiry times might be slightly off, so the generated link might not be valid right away. I therefore recommend decreasing the Start Time by one hour (or more):

Sitecore training

Press the Create button, copy the URL field, and paste it into its corresponding parameter in the azuredeploy.parameters.json file, as instructed in the Sitecore environment template configuration (in my case, I configured the singleMsDeployPackageUrl parameter).

Sitecore training

For the four MongoDB-related parameters (analyticsMongoDbConnectionString, trackingLiveMongoDbConnectionString, trackingHistoryMongoDbConnectionString and trackingContactMongoDbConnectionString), use the MongoDB Atlas connection string you previously retrieved and replace atlasAdmin with <USERNAME>. Your connection string should then be similar to the following example:

mongodb://<USERNAME>:<PASSWORD>@sitecore-shard-00-00-x00xx.azure.mongodb.net:27017,sitecore-shard-00-01-x00xx.azure.mongodb.net:27017,sitecore-shard-00-02-x00xx.azure.mongodb.net:27017/<DATABASE>?ssl=true&replicaSet=Sitecore-shard-0&authSource=admin

Replace <USERNAME>, <PASSWORD> and <DATABASE> with the values you chose for each of the dedicated MongoDB users you set up, such as:

| USERNAME | PASSWORD | DATABASE |
| --- | --- | --- |
| scAnalytics | [PASSWORD1] | sitecoreAnalytics |
| scTrackingLive | [PASSWORD2] | sitecoreTrackingLive |
| scTrackingHistory | [PASSWORD3] | sitecoreTrackingHistory |
| scTrackingContact | [PASSWORD4] | sitecoreTrackingContact |

Paste these connection strings into their corresponding parameters in the azuredeploy.parameters.json file. Don’t forget to also fill out the other required parameters in that file, such as deploymentId, sqlServerLogin, sqlServerPassword and sitecoreAdminPassword.
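For illustration, here is roughly what the MongoDB-related entries of azuredeploy.parameters.json could look like once filled in (the cluster hostnames and passwords below are placeholders, and the file also contains the template’s other parameters):

{
  "parameters": {
    "analyticsMongoDbConnectionString": {
      "value": "mongodb://scAnalytics:[PASSWORD1]@sitecore-shard-00-00-x00xx.azure.mongodb.net:27017,sitecore-shard-00-01-x00xx.azure.mongodb.net:27017,sitecore-shard-00-02-x00xx.azure.mongodb.net:27017/sitecoreAnalytics?ssl=true&replicaSet=Sitecore-shard-0&authSource=admin"
    },
    "trackingLiveMongoDbConnectionString": {
      "value": "mongodb://scTrackingLive:[PASSWORD2]@sitecore-shard-00-00-x00xx.azure.mongodb.net:27017,sitecore-shard-00-01-x00xx.azure.mongodb.net:27017,sitecore-shard-00-02-x00xx.azure.mongodb.net:27017/sitecoreTrackingLive?ssl=true&replicaSet=Sitecore-shard-0&authSource=admin"
    },
    "trackingHistoryMongoDbConnectionString": { "value": "..." },
    "trackingContactMongoDbConnectionString": { "value": "..." }
  }
}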

Finally, open a PowerShell command prompt running as administrator, navigate to the root folder of the Sitecore Azure Toolkit on your machine, and run the following commands:

Import-Module AzureRM                                        # load the Azure Resource Manager module
Import-Module .\tools\Sitecore.Cloud.Cmdlets.psm1 -Verbose   # load the Sitecore Azure Toolkit cmdlets
Login-AzureRMAccount                                         # sign in to your Azure subscription

Provided you get no error, the last command should open a browser window prompting you to sign in with your Microsoft Azure account.

After successfully signing in with Azure, invoke the Sitecore deployment command. In my case, I ran the following command:

Start-SitecoreAzureDeployment -Location "westus" -Name "sc" -ArmTemplateUrl "https://raw.githubusercontent.com/Sitecore/Sitecore-Azure-Quickstart-Templates/master/Sitecore%208.2.3/xp0/azuredeploy.json" -ArmParametersPath ".\azuredeploy.parameters.json" -LicenseXmlPath ".\MongoDBTempLic.xml"

The command line should display “Deployment Started…” but since the Azure provisioning process takes a few minutes, I advise you to follow the provisioning process from the Resource groups page on your Azure portal:

Sitecore training

Sitecore Cloud environment setup with the Sitecore Azure Marketplace wizard

If you prefer to use the more automated Sitecore wizard on the Azure Marketplace, navigate to the Sitecore Experience Platform product page and start the creation process by pressing Get It Now. Once you reach the Credentials tab, enter your 4 MongoDB Atlas connection strings, as shown in the screenshot below.

Sitecore training

After you complete the wizard, your Sitecore environment will be provisioned in Microsoft Azure similarly to the Sitecore Azure Toolkit process described above.

IP Whitelisting

Each Azure App Service exposes the outbound IP addresses it uses. While Microsoft doesn’t formally guarantee that these are fixed IPs, there seems to be evidence that these outbound IP addresses don’t change unless you make significant modifications to your app service (such as scaling it up or down). Another option would be to create an Azure App Service Environment, but this is outside the scope of this blog post.

To find out which outbound IP addresses your app service uses, head over to the Properties tab of your app service and copy the addresses listed in the Outbound IP Addresses section:

Sitecore training
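If you prefer the command line, the same outbound IP list can be retrieved with the Azure CLI (a sketch, assuming a resource group named sc-rg and the sc-single app service from this tutorial):

az webapp show --resource-group sc-rg --name sc-single --query outboundIpAddresses --output tsv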

Navigate to the Security/IP Whitelist tab of your MongoDB Atlas cluster, press the Add IP Address button and add each Azure outbound IP address.

Testing connectivity with MongoDB Atlas

Once the Sitecore PowerShell command completes, your Sitecore website should be up and running at the URL available in your Azure App Service page (in my case, the “sc-single” App Service):

Sitecore training

Copy/paste the URL available in your Azure App Service page into a browser (see screenshot above). The following page should appear:

Sitecore training

You can also navigate to *[your_azurewebsites_sitecore_url]/sitecore/admin* where you can access the site administration page. Use admin as the username and the sitecoreAdminPassword value from the azuredeploy.parameters.json file as your password.

Verify that your MongoDB Atlas cluster has the proper collections in each of the 4 databases previously mentioned by using MongoDB Atlas’ Data Explorer tab (or MongoDB Compass if you prefer to use a client-side tool). For example, the Sitecore Analytics database shows the following collections when using Sitecore 8.2.3:

Sitecore training
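If you prefer the mongo shell, a quick spot check of the analytics database looks like this (a sketch; substitute your own password and cluster hostname, and repeat for the three tracking databases):

mongo "mongodb://scAnalytics:<PASSWORD>@sitecore-shard-00-00-x00xx.azure.mongodb.net:27017/sitecoreAnalytics?ssl=true&replicaSet=Sitecore-shard-0&authSource=admin" --eval "db.getCollectionNames()"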

You can even drill down inside each collection to see the entries Sitecore might already have generated, for instance in the UserAgents collection:

Sitecore training

Conclusion

I hope that you found this tutorial helpful. You should now have a running Sitecore Cloud environment with MongoDB Atlas on Microsoft Azure.

If you’re interested in MongoDB Atlas and don’t have an account yet, you can sign up for free and create a cluster in minutes.

If you’d like to know more about MongoDB deployment options for Sitecore, including our Sitecore consulting engagement package, visit the MongoDB for Sitecore page.

Please use the comment form below to provide your feedback or seek help with any issues.

About the Author - Raphael Londner

Raphael Londner is a Principal Developer Advocate at MongoDB, focused on cloud technologies such as Amazon Web Services, Microsoft Azure and Google Cloud Engine. Previously he was a developer advocate at Okta as well as a startup entrepreneur in the identity management space. You can follow him on Twitter at @rlondner.

Announcing MongoDB Stitch: A Backend as a Service for MongoDB

Yesterday, we announced the release of MongoDB Stitch, a backend as a service for MongoDB designed for the modern application development paradigm. As with MongoDB Server, MongoDB Stitch was born to ameliorate the drudgery associated with building applications and replace it with the joy of creativity. What we did with the database 10 years ago, we now hope to do at the heart of application logic.


Back in 2007, MongoDB was born from 10gen, the platform as a service that Dwight Merriman and I created to solve a very similar problem, in a very different environment. Back then there were no services. The state of the art in application development was a Ruby on Rails app deployed to Heroku, and if you were really cutting-edge you'd be storing data in S3. Back then, we were trying to solve things like image thumbnailing and load balancing and rapid scaling… oh, and data storage... in such a way that developers could just write *their code* and not worry about the rest.

Speaking of which, how did people do, for example, image thumbnailing back then? Well, we all did the same thing: we installed ImageMagick on a bunch of servers and wrote some wrapper code. Of course, no-one would dream of doing that today. Today there are services for that sort of thing! In fact, we now have a virtual embarrassment of riches in the array of services available to handle everything from image OCR to credit card payments to geo-position mapping. But this glut of functionality has created a new pain point for developers.

A modern application developer needs to do three things above all:

1) support CRUD operations with data;

2) specify access control rules on their data; and

3) connect services together, and to their application, whether they are third-party services providing commodity functionality or proprietary microservices.

Unfortunately, until now, each of these three things has required developers to create kloc upon kloc of the same time-draining, boilerplate code. Even when everything goes right with that code it's a drain on productivity, and when things go wrong, it becomes a liability ranging from costly to disastrous. All just to realize the promise of cloud computing and a service-oriented architecture. This is exactly the kind of aggravation and inefficiency that MongoDB was founded to eliminate.

MongoDB Stitch does that by providing exactly the three things a modern application requires, as a simple layer adjacent to a MongoDB cluster. MongoDB Stitch provides a:

1) REST API to MongoDB, letting client code interact directly with the databases;

2) configuration based access control system, providing a flexible and powerful way to express precisely which users can perform what operations on what data; and

3) uniform, document-centric mechanism to connect services with custom application code.

And that's it. MongoDB Stitch can be used alongside existing code or to stand up brand new applications. Applications can do all the standard CRUD against MongoDB, but with complete assurance that clients can only access data to the exact extent to which they’re supposed to. Developers can compose services with MongoDB data operations into pipelines, meaning text messages routed from Twilio can become documents that flow to MongoDB, where they are stored, and continue on to S3 to be served via HTTP. And none of it requires glue code beyond the bare minimum required to name and connect the services.

MongoDB Stitch is the next phase of MongoDB's mission to erase the routine tasks facing developers that lead to tedious, error-prone, and undifferentiated work. We're thrilled to finally have a chance to put the next piece of the puzzle in place.

P.S. I’m so thrilled that when I finally got to announce Stitch on stage at MongoDB World, I couldn’t help but show it off myself. If you want to see the code I demoed, it’s up on GitHub here.



About the Author - Eliot Horowitz

Eliot Horowitz is CTO and Co-Founder of MongoDB. Eliot is one of the core MongoDB kernel committers. Previously, he was Co-Founder and CTO of ShopWiki. Eliot developed the crawling and data extraction algorithm that is the core of its innovative technology. He has quickly become one of Silicon Alley's up and coming entrepreneurs and was selected as one of BusinessWeek's Top 25 Entrepreneurs Under Age 25 nationwide in 2006. Earlier, Eliot was a software developer in the R&D group at DoubleClick (acquired by Google for $3.1 billion). Eliot received a BS in Computer Science from Brown University.

Listen to Eliot Horowitz on the Future of the Database

"The main motivation for people trying out MongoDB and adopting MongoDB really came around from developers wanting to be more productive."

Six years after MongoDB was open sourced, we’re still thinking about how to empower software engineers by making app development more efficient and productive. Our CTO and co-founder, Eliot Horowitz, recently sat down with Jeff Meyerson, host of Software Engineering Daily, to talk about how the evolution of MongoDB and its ecosystem has been propelled by the goal of developer productivity.

MongoDB is best known for its JSON-based documents and Eliot explains that this data model provides a "fundamentally easier data structure for developers to work with. It more naturally suits the way programming languages work and the way people think. No one thinks about breaking things up into rows and columns but they do think of things as structures."

{
    '_id' : 1,
    'name' : { 'first' : 'John', 'last' : 'Backus' },
    'contribs' : [ 'Fortran', 'ALGOL', 'Backus-Naur Form', 'FP' ],
    'awards' : [
        {
            'award' : 'W.W. McDowell Award',
            'year' : 1967,
            'by' : 'IEEE Computer Society'
        }, {
            'award' : 'Draper Prize',
            'year' : 1993,
            'by' : 'National Academy of Engineering'
        }
    ]
}
*An example JSON document*

By basing all interactions with data on the document model the creators of MongoDB made it easier for them to store and work with data, and therefore easier to get more value out of it.
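For instance, querying into the nested structure above is straightforward; a minimal mongo shell sketch, assuming such documents live in a people collection:

db.people.find({ "name.last": "Backus" })   // match on a field nested inside a sub-document
db.people.find({ contribs: "Fortran" })     // match a value inside an array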

Reducing friction for developers doesn't just reduce developer headaches; it also has a direct impact on the bottom line. Since the 1980s, hardware and infrastructure costs have fallen while the value of the individual developer has soared. Ensuring individual engineers are productive is critical to today’s businesses.

This is the story that Eliot and MongoDB have been telling for years, but it's particularly interesting to hear Eliot discuss how MongoDB has evolved alongside two other major trends in software engineering: cloud computing and service-oriented architectures (and, by extension, microservices).

Not coincidentally, both of these paradigms are also rooted in unburdening the individual developer. Cloud computing reduces things like lengthy infrastructure provisioning times whereas microservices decouple application logic to allow for faster iteration and feature development. As Eliot points out, it also fundamentally changes the way apps are built as developers are able to use third party services in place of coding necessary functionality from scratch.

Listen in to Eliot's conversation with Jeff as, in addition to talking about the evolution of MongoDB, Eliot talks about the future of the database as well as how we use our own products internally in a hybrid cloud configuration.

If you’re interested in listening to Jeff’s other conversations around the software landscape, Software Engineering Daily comprises hours of fascinating technical content and many of our own engineers are already avid listeners! I hope you'll listen in as this episode kicks off MongoDB’s first podcast partnership. We’re looking forward to engaging with you through this medium. As always, please give us suggestions for new ways to contribute to the ever-growing MongoDB community!

Listen to Eliot on the Software Engineering Daily podcast

Can't listen? You can view the transcript here.

Developing a Facebook Chatbot with AWS Lambda and MongoDB Atlas

This post is part of our Road to re:Invent series. In the weeks leading up to AWS re:Invent in Las Vegas this November, we'll be posting about a number of topics related to running MongoDB in the public cloud.

Road to re:Invent

Introduction

While microservices have been the hot trend over the past couple of years, serverless architectures have been gaining momentum by providing a new way to build scalable, responsive and cost effective applications. Serverless computing frees developers from the traditional cost and effort of building applications by automatically provisioning servers and storage, maintaining infrastructure, upgrading software, and only charging for consumed resources. More insight into serverless computing can be found in this whitepaper.

Amazon’s serverless computing platform, AWS Lambda, lets you run code without provisioning and running servers. MongoDB Atlas is Hosted MongoDB as a Service. MongoDB Atlas provides all the features of the database without the heavy operational lifting. Developers no longer need to worry about operational tasks such as provisioning, configuration, patching, upgrades, backups, and failure recovery. In addition, MongoDB Atlas offers elastic scalability, either by scaling up on a range of instance sizes or scaling out with automatic sharding, all with no application downtime. Together, AWS Lambda and MongoDB Atlas allow developers to spend more time developing code and less time managing the infrastructure.

Learn how to easily integrate an AWS Lambda Node.js function with a MongoDB database in this tutorial.

To demonstrate the power of serverless computing and managed database as a service, I’ll use this blog post to show you how to develop a Facebook chatbot that responds to weather requests and stores the message information in MongoDB Atlas.

Facebook chatbot

Setting Up MongoDB Atlas

MongoDB Atlas provides multiple size options for instances. Within an instance class, there is also the ability to customize storage capacity and storage speed, as well as to use encrypted storage volumes. The number of virtual CPUs (vCPUs) – where a vCPU is a shared physical core or one hyperthread – increases as the instance class grows larger.

The M10, M20, and M30 instances are excellent for development and testing purposes, but for production it is recommended to use instances higher than M30. The base options for instances are:

  • M0 – Variable RAM, 512 MB Storage
  • M10 – 2 GB RAM, 10 GB Storage, 1 vCPU
  • M20 – 4 GB RAM, 20 GB Storage, 2 vCPUs
  • M30 – 8 GB RAM, 40 GB Storage, 2 vCPUs
  • M40 – 16 GB RAM, 80 GB Storage, 4 vCPUs
  • M50 – 32 GB RAM, 160 GB Storage, 8 vCPUs
  • M60 – 64 GB RAM, 320 GB Storage, 16 vCPUs
  • M100 – 160 GB RAM, 1000 GB Storage, 40 vCPUs

Register with MongoDB Atlas and use the intuitive user interface to select the instance size, region, and features you need.

Connecting MongoDB Atlas to AWS Lambda

Important note: VPC Peering is not available with MongoDB Atlas free tier (M0). If you use an M0 cluster, allow any IP to connect to your M0 cluster and switch directly to the Set up AWS Lambda section.

MongoDB Atlas enables VPC (Virtual Private Cloud) peering, which allows you to easily create a private networking connection between your application servers and backend database. Traffic is routed between the VPCs using private IP addresses. Instances in either VPC can communicate with each other as if they are within the same network. Note, VPC peering requires that both VPCs be in the same region. Below is an architecture diagram of how to connect MongoDB Atlas to AWS Lambda and route traffic to the Internet.

AWS VPC Peering Architecture
Figure 1: AWS VPC Peering Architecture

For our example, a Network Address Translation (NAT) Gateway and an Internet Gateway (IGW) are needed because the Lambda function requires internet access to query the Yahoo weather API. The Yahoo weather API will be used to query real-time weather data for the chatbot. The Lambda function we will create resides in the private subnet of our VPC. Because the subnet is private, the IP addresses assigned to the Lambda function cannot be reached from the public internet. To solve this issue, a NAT Gateway is used to translate private IP addresses to public ones, and vice versa. An IGW is also needed to provide access to the internet.

The first step is to set up an Elastic IP address, which will be the static IP address of your Lambda functions to the outside world. Go to Services->VPC->Elastic IPs, and allocate a new Elastic IP address.

Elastic IP address set up

Next we will create a new VPC, which you will attach to your Lambda function.

Go to Services->VPC->Start VPC Wizard.

After clicking VPC wizard, select VPC with Public and Private Subnets.

Let’s configure our VPC. Give the VPC a name (e.g., “Chatbot App VPC”), select an IP CIDR block, choose an Availability Zone, and select the Elastic IP you created in the previous step. Note that the IP CIDR block you select for your VPC must not overlap with the Atlas CIDR block. Click Create VPC to set up your VPC. The AWS VPC wizard will automatically set up the NAT Gateway and the IGW.

You should see the VPC you created in the VPC dashboard.

Go to the Subnets tab to see if your private and public subnets have been set up correctly.

Click on the Private Subnet and go to the Route Table tab in the lower window. You should see a 0.0.0.0/0 route pointing to the NAT Gateway, which means that traffic sent to IP addresses outside of the private subnet will be routed through the NAT Gateway.

Next, let's check the public subnet to see if it’s configured correctly. Select Public subnet and the Route Table tab in the lower window. You should see 0.0.0.0/0 connected to your IGW. The IGW will enable outside internet traffic to be routed to your Lambda functions.

Now, the final step is initiating a VPC peering connection between MongoDB Atlas and your Lambda VPC. Log in to MongoDB Atlas, and go to Clusters->Security->Peering->New Peering Connection.

After successfully initiating the peering connection, you will see the Status of the peering connection as Waiting for Approval.

Go back to AWS and select Services->VPC->Peering Connections. Select the VPC peering connection. You should see the connection request pending. Go to Actions and select Accept Request.

Once the request is accepted, you should see the connection status as active.

We will now verify that the routing is set up correctly. Go to the Route Table of the Private Subnet in the VPC you just set up. In this example, it is rtb-58911e3e. You will need to modify the Main Route Table (see Figure 1) to add the VPC Peering connection. This will allow traffic to be routed to MongoDB Atlas.

Go to the Routes tab and select Edit->Add another route. In the Destination field, add your Atlas CIDR block, which you can find in the Clusters->Security tab of the MongoDB Atlas web console:

Click in the Target field. A dropdown list will appear, where you should see the peering connection you just created. Select it and click Save.
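The same route can also be added with the AWS CLI if you prefer (a sketch; replace the Atlas CIDR block and peering connection ID with your own values, and use your private subnet's route table ID):

aws ec2 create-route --route-table-id rtb-58911e3e --destination-cidr-block <ATLAS_CIDR_BLOCK> --vpc-peering-connection-id pcx-xxxxxxxx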

Now that the VPC peering connection is established between the MongoDB Atlas and AWS Lambda VPCs, let’s set up our AWS Lambda function.

Set Up AWS Lambda

Now that our MongoDB Atlas cluster is connected to AWS Lambda, let’s develop our Lambda function. Go to Services->Lambda->Create Lambda Function. Select your runtime environment (here it’s Node.js 4.3), and select the hello-world starter function.

Select API Gateway in the box next to the Lambda symbol and click Next.

Create your API name, select dev as the deployment stage, and Open as the security type. Then click Next.

In the next step, make these changes to the following fields:

  • Name: Provide a name for your function – for example, lambda-messenger-chatbot
  • Handler: Leave as is (index.handler)
  • Role: Create a basic execution role and use it (or use an existing role that has permissions to execute Lambda functions)
  • Timeout: Change to 10 seconds. This is not necessary but will give the Lambda function more time to spin up its container on initialization (if needed)
  • VPC: Select the VPC you created in the previous step
  • Subnet: Select the private subnet for the VPC (don’t worry about adding other subnets for now)
  • Security Groups: the default security group is fine for now

Press Next, review and create your new Lambda function.

In the code editor of your Lambda function, paste the following code snippet and press the Save button:

'use strict';

var VERIFY_TOKEN = "mongodb_atlas_token";

exports.handler = (event, context, callback) => {
  var method = event.context["http-method"];
  // process GET request
  if(method === "GET"){
    var queryParams = event.params.querystring;
    var rVerifyToken = queryParams['hub.verify_token']
    if (rVerifyToken === VERIFY_TOKEN) {
      var challenge = queryParams['hub.challenge']
      callback(null, parseInt(challenge))
    }else{
      callback(null, 'Error, wrong validation token');
    }   
  }
};
 

This is the piece of code we'll need later on to set up the Facebook webhook to our Lambda function.

Set Up AWS API Gateway

Next, we will need to set up the API gateway for our Lambda function. The API gateway will let you create, manage, and host a RESTful API to expose your Lambda functions to Facebook messenger. The API gateway acts as an abstraction layer to map application requests to the format your integration endpoint is expecting to receive. For our example, the endpoint will be our Lambda function.

Go to Services->API Gateway->[your Lambda function]->Resources->ANY.

Click on Integration Request. This will configure the API Gateway to properly integrate Facebook with your backend application (AWS Lambda). We will set the integration endpoint to lambda-messenger-chatbot, which is the name I chose for our Lambda function.

Uncheck Use Lambda Proxy Integration and navigate to the Body Mapping Templates section.

Select When there are no templates defined as the Request body passthrough option and add a new template called application/json. Don't select any value in the Generate template section, add the code below and press Save:

##  See http://docs.aws.amazon.com/apigateway/latest/developerguide/api-gateway-mapping-template-reference.html
##  This template will pass through all parameters including path, querystring, header, stage variables, and context through to the integration endpoint via the body/payload
#set($allParams = $input.params())
{
"body-json" : $input.json('$'),
"params" : {
#foreach($type in $allParams.keySet())
    #set($params = $allParams.get($type))
"$type" : {
    #foreach($paramName in $params.keySet())
    "$paramName" : "$util.escapeJavaScript($params.get($paramName))"
        #if($foreach.hasNext),#end
    #end
}
    #if($foreach.hasNext),#end
#end
},
"stage-variables" : {
#foreach($key in $stageVariables.keySet())
"$key" : "$util.escapeJavaScript($stageVariables.get($key))"
    #if($foreach.hasNext),#end
#end
},
"context" : {
    "account-id" : "$context.identity.accountId",
    "api-id" : "$context.apiId",
    "api-key" : "$context.identity.apiKey",
    "authorizer-principal-id" : "$context.authorizer.principalId",
    "caller" : "$context.identity.caller",
    "cognito-authentication-provider" : "$context.identity.cognitoAuthenticationProvider",
    "cognito-authentication-type" : "$context.identity.cognitoAuthenticationType",
    "cognito-identity-id" : "$context.identity.cognitoIdentityId",
    "cognito-identity-pool-id" : "$context.identity.cognitoIdentityPoolId",
    "http-method" : "$context.httpMethod",
    "stage" : "$context.stage",
    "source-ip" : "$context.identity.sourceIp",
    "user" : "$context.identity.user",
    "user-agent" : "$context.identity.userAgent",
    "user-arn" : "$context.identity.userArn",
    "request-id" : "$context.requestId",
    "resource-id" : "$context.resourceId",
    "resource-path" : "$context.resourcePath"
    }
}

The mapping template will structure the Facebook response in the desired format specified by the application/json template. The Lambda function will then extract information from the response and return the required output to the chatbot user. For more information on AWS mapping templates, see the AWS documentation.

Go back to Services->API Gateway->[your Lambda function]->Resources->ANY and select Method Request. In the Settings section, make sure NONE is selected in the Authorization dropdown list. If not, change it to NONE and press the small Update button.

Go back to the Actions button for your API gateway and select Deploy API to make your API gateway accessible by the internet. Your API gateway is ready to go.

Set Up Facebook Messenger

Facebook makes it possible to use Facebook Messenger as the user interface for your chatbot. For our chatbot example, we will use Messenger as the UI. To create a Facebook page and Facebook app, go to the Facebook App Getting Started Guide to set up your Facebook components.

To connect your Facebook App to AWS Lambda you will need to go back to your API gateway. Go to your Lambda function and find the API endpoint URL (obscured in the picture below).

Go back to your Facebook App page and in the Add Product page, click on the Get Started button next to the Messenger section. Scroll down and in the Webhooks section, press the Setup webhooks button. A New Page Subscription page window should pop up. Enter your API endpoint URL in the Callback URL text box and in the Verify Token text box, enter a token name that you will use in your Lambda verification code (e.g. mongodb_atlas_token). As the Facebook docs explain, your code should look for the Verify Token and respond with the challenge sent in the verification request. Last, select the messages and messaging_postbacks subscription fields.

Press the Verify and Save button to start the validation process. If everything went well, the Webhooks section should show up again and you should see a Complete confirmation in green:

In the Webhooks section, click on Select a Page to select a page you already created. If you don't have any page on Facebook yet, you will first need to create a Facebook page. Once you have selected an existing page, press the Subscribe button.

Scroll up and in the Token Generation section, select the same page you selected above to generate a page token.

The first time you want to complete that action, Facebook might pop up a consent page to request your approval to grant your Facebook application some necessary page-related permissions. Press the Continue as [your name] button and the OK button to approve these permissions. Facebook generates a page token which you should copy and paste into a separate document. We will need it when we complete the configuration of our Lambda function.

Connect Facebook Messenger UI to AWS Lambda Function

We will now connect the Facebook Messenger UI to AWS Lambda and begin sending weather queries through the chatbot. Below is the index.js code for our Lambda function. The index.js file will be packaged into a compressed archive file later on and loaded to our AWS Lambda function.

"use strict";

var assert = require("assert");
var https = require("https");
var request = require("request");
var MongoClient = require("mongodb").MongoClient;

var facebookPageToken = process.env["PAGE_TOKEN"];
var VERIFY_TOKEN = "mongodb_atlas_token";
var mongoDbUri = process.env["MONGODB_ATLAS_CLUSTER_URI"];

let cachedDb = null;

exports.handler = (event, context, callback) => {
  context.callbackWaitsForEmptyEventLoop = false;

  var httpMethod;

  if (event.context != undefined) {
    httpMethod = event.context["http-method"];
  } else {
    //used to test with lambda-local
    httpMethod = "PUT";
  }

  // process GET request (for Facebook validation)
  if (httpMethod === "GET") {
    console.log("In Get if loop");
    var queryParams = event.params.querystring;
    var rVerifyToken = queryParams["hub.verify_token"];
    if (rVerifyToken === VERIFY_TOKEN) {
      var challenge = queryParams["hub.challenge"];
      callback(null, parseInt(challenge));
    } else {
      callback(null, "Error, wrong validation token");
    }
  } else {
    // process POST request (Facebook chat messages)
    var messageEntries = event["body-json"].entry;
    console.log("message entries are " + JSON.stringify(messageEntries));
    for (var entryIndex in messageEntries) {
      var messageEntry = messageEntries[entryIndex].messaging;
      for (var messageIndex in messageEntry) {
        var messageEnvelope = messageEntry[messageIndex];
        var sender = messageEnvelope.sender.id;
        if (messageEnvelope.message && messageEnvelope.message.text) {
          var onlyStoreinAtlas = false;
          if (
            messageEnvelope.message.is_echo &&
            messageEnvelope.message.is_echo == true
          ) {
            console.log("only store in Atlas");
            onlyStoreinAtlas = true;
          }
          if (!onlyStoreinAtlas) {
            var location = messageEnvelope.message.text;
            var weatherEndpoint =
              "https://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20weather.forecast%20where%20woeid%20in%20(select%20woeid%20from%20geo.places(1)%20where%20text%3D%22" +
              location +
              "%22)&format=json&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys";
            request(
              {
                url: weatherEndpoint,
                json: true
              },
              function(error, response, body) {
                try {
                  var condition = body.query.results.channel.item.condition;
                  var response =
                    "Today's temperature in " +
                    location +
                    " is " +
                    condition.temp +
                    ". The weather is " +
                    condition.text +
                    ".";
                  console.log(
                    "The response to send to Facebook is: " + response
                  );
                  sendTextMessage(sender, response);
                  storeInMongoDB(messageEnvelope, callback);
                } catch (err) {
                  console.error(
                    "error while sending a text message or storing in MongoDB: ",
                    err
                  );
                  sendTextMessage(sender, "There was an error.");
                }
              }
            );
          } else {
            storeInMongoDB(messageEnvelope, callback);
          }
        } else {
          process.exit();
        }
      }
    }
  }
};

function sendTextMessage(senderFbId, text) {
  var json = {
    recipient: { id: senderFbId },
    message: { text: text }
  };
  var body = JSON.stringify(json);
  var path = "/v2.6/me/messages?access_token=" + facebookPageToken;
  var options = {
    host: "graph.facebook.com",
    path: path,
    method: "POST",
    headers: { "Content-Type": "application/json" }
  };
  var callback = function(response) {
    var str = "";
    response.on("data", function(chunk) {
      str += chunk;
    });
    response.on("end", function() {});
  };
  var req = https.request(options, callback);
  req.on("error", function(e) {
    console.log("problem with request: " + e);
  });

  req.write(body);
  req.end();
}

function storeInMongoDB(messageEnvelope, callback) {
  if (cachedDb && cachedDb.serverConfig.isConnected()) {
    sendToAtlas(cachedDb, messageEnvelope, callback);
  } else {
    console.log(`=> connecting to database ${mongoDbUri}`);
    MongoClient.connect(mongoDbUri, function(err, db) {
      assert.equal(null, err);
      cachedDb = db;
      sendToAtlas(db, messageEnvelope, callback);
    });
  }
}

function sendToAtlas(db, message, callback) {
  db.collection("records").insertOne({
    facebook: {
      messageEnvelope: message
    }
  }, function(err, result) {
    if (err != null) {
      console.error("an error occurred in sendToAtlas", err);
      callback(null, JSON.stringify(err));
    } else {
      var message = `Inserted a message into Atlas with id: ${result.insertedId}`;
      console.log(message);
      callback(null, message);
    }
  });
}

We are passing the MongoDB Atlas connection string (or URI) and Facebook page token as environment variables so we'll configure them in our Lambda function later on.
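You can set those two environment variables in the Lambda console, or with the AWS CLI; a sketch, assuming the lambda-messenger-chatbot function name used earlier and placeholder values:

aws lambda update-function-configuration --function-name lambda-messenger-chatbot --environment "Variables={PAGE_TOKEN=<YOUR_FACEBOOK_PAGE_TOKEN>,MONGODB_ATLAS_CLUSTER_URI=<YOUR_ATLAS_URI>}"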

For now, clone this GitHub repository and open the README file to find the instructions to deploy and complete the configuration of your Lambda function.

Save your Lambda function and navigate to your Facebook Page chat window to verify that your function works as expected. Bring up the Messenger window and enter the name of a city of your choice (such as New York, Paris or Mumbai).

Store Message History in MongoDB Atlas

AWS Lambda functions are stateless; thus, if you require data persistence with your application you will need to store that data in a database. For our chatbot, we will save message information (text, senderID, recipientID) to MongoDB Atlas (if you look at the code carefully, you will notice that the response with the weather information comes back to the Lambda function and is also stored in MongoDB Atlas).

Before writing data to the database, we will first need to connect to MongoDB Atlas. Note that this code is already included in the index.js file.

function storeInMongoDB(messageEnvelope, callback) {
  if (cachedDb && cachedDb.serverConfig.isConnected()) {
    sendToAtlas(cachedDb, messageEnvelope, callback);
  } else {
    console.log(`=> connecting to database ${mongoDbUri}`);
    MongoClient.connect(mongoDbUri, function(err, db) {
      assert.equal(null, err);
      cachedDb = db;
      sendToAtlas(db, messageEnvelope, callback);
    });
  }
}

sendToAtlas will write chatbot message information to your MongoDB Atlas cluster.

function sendToAtlas(db, message, callback) {
  db.collection("records").insertOne({
    facebook: {
      messageEnvelope: message
    }
  }, function(err, result) {
    if (err != null) {
      console.error("an error occurred in sendToAtlas", err);
      callback(null, JSON.stringify(err));
    } else {
      var message = `Inserted a message into Atlas with id: ${result.insertedId}`;
      console.log(message);
      callback(null, message);
    }
  });
}

Note that the storeInMongoDB and sendToAtlas methods implement MongoDB's recommended performance optimizations for AWS Lambda and MongoDB Atlas, including not closing the database connection so that it can be reused in subsequent calls to the Lambda function.

The Lambda input contains the message text, timestamp, senderID and recipientID, all of which will be written to your MongoDB Atlas cluster. Here is a sample document as stored in MongoDB:

{
  "_id": ObjectId("58124a83c976d50001f5faaa"),
  "facebook": {
    "message": {
      "sender": {
        "id": "1158763944211613"
      },
      "recipient": {
        "id": "129293977535005"
      },
      "timestamp": 1477593723519,
      "message": {
        "mid": "mid.1477593723519:81a0d4ea34",
        "seq": 420,
        "text": "San Francisco"
      }
    }
  }
}

If you'd like to see the documents as they are stored in your MongoDB Atlas database, download MongoDB Compass, connect to your Atlas cluster and visualize the documents in your records collection:

"MongoDB Compass"

Note that we're storing both the message as typed by the user, as well as the response sent back by our Lambda function (which comes back to the Lambda function as noted above).
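A quick way to spot-check the stored chats from the mongo shell (a sketch, using the records collection that the index.js code above writes to):

db.records.find().sort({ _id: -1 }).limit(5).pretty()   // the five most recent chat documents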

Using MongoDB Atlas with other AWS Services

In this blog, we demonstrated how to build a Facebook chatbot, using MongoDB Atlas and AWS Lambda. MongoDB Atlas can also be used as the persistent data store with many other AWS services, such as Elastic Beanstalk and Kinesis. To learn more about developing an application with AWS Elastic Beanstalk and MongoDB Atlas, read Develop & Deploy a Node.js App to AWS Elastic Beanstalk & MongoDB Atlas.

To learn how to orchestrate Lambda functions and build serverless workflows, read Integrating MongoDB Atlas, Twilio, and AWS Simple Email Service with AWS Step Functions.

For information on developing an application with AWS Kinesis and MongoDB Atlas, read Processing Data Streams with Amazon Kinesis and MongoDB Atlas.

To learn how to use your favorite language or framework with MongoDB Atlas, read Using MongoDB Atlas From Your Favorite Language or Framework.

About the Author - Raphael Londner

Raphael Londner is a Principal Developer Advocate at MongoDB, focused on cloud technologies such as Amazon Web Services, Microsoft Azure and Google Cloud Engine. Previously he was a developer advocate at Okta as well as a startup entrepreneur in the identity management space. You can follow him on Twitter at @rlondner.

Modernizing and Protecting Data Center Operations with MongoDB and Dell EMC

As part of our ongoing series highlighting our partner ecosystem, we recently sat down with Dell EMC Global Alliances Director, Tarik Dwiek and Director of Product Management, Philip Fote, to better understand how Dell EMC and MongoDB partner to help modernize and protect data center operations.

What do you want customers to know about the MongoDB and Dell EMC relationship?
Tarik: We have been partnering for over a year on Dell EMC Flash and software-defined platforms, and the traction has been amazing. To fully realize the potential of MongoDB, customers need to modernize their infrastructure and transform their data center operations. At Dell EMC, our strategy is to help customers achieve this modernization by taking advantage of 4 key pillars: flash, software-defined, scale-out, and cloud enabled solutions. In addition, we are working on a data protection strategy for enterprise-grade backup and restore of MongoDB.

Can you further explain how this strategy relates directly to MongoDB?
Tarik: First off, MongoDB unlocks the ability for unparalleled performance at the database layer. This is where Flash is essential, meeting these performance requirements with compelling economics. Second, scale-out architectures, like MongoDB, have become a requirement because customers are generating orders of magnitude more data. Third, many organizations are implementing a software-defined data center. This model automates the deployment and configuration of IT services, resulting in agility and flexibility for managing data services. Finally, we want to ensure that the on-prem data center can leverage public cloud economics non-disruptively.

Tell us more about Dell EMC Data Protection solutions.
Philip: At Dell EMC, we believe data needs to be protected wherever it lives and no matter what happens. With this in mind, we start with the reality that data protection cannot be one size fits all in terms of service levels. Protection and availability should be based on data value and service levels that align to business objectives. Dell EMC looks at data protection as a continuum that spans over many protection tiers, including availability, replication, backup, snapshots, and archive; we offer products and solutions that span this continuum. With this, customers can tailor their data protection solution to best serve their specific needs.

What is Data Domain?
Philip: Dell EMC Data Domain systems deliver industry leading protection storage. Data Domain can reduce the amount of disk storage needed to retain and protect data by ratios of 10-30x and greater. It can scale up to 150 PB of logical capacity managed by a single system and with throughput up to 68 TB/hour. Data Domain systems make it possible to complete more backups in less time and provide faster, more reliable restores. The Data Domain Operating System (DD OS) is the intelligence behind Data Domain systems that makes them the industry’s most reliable and cloud-enabled protection storage.

What is DD Boost?
Philip: DD Boost provides advanced integration between Data Domain systems and leading backup and enterprise applications. DD Boost distributes parts of the deduplication process to the backup server or application client to speed backups by up to 50 percent and reduce bandwidth requirements by up to 99 percent.

What is DD Boost file system plug-in?
Philip: The BoostFS file system plug-in makes DD Boost immediately available, through a standard file system interface, to workloads that previously could not use it. BoostFS can be deployed in minutes, reducing backup windows and storage capacity requirements.

Why did you choose to certify MongoDB with BoostFS?
Philip: Dell EMC is committed to providing customers a holistic data protection strategy that evolves with changes in the market. The adoption of NoSQL open source databases is one of those changes, and MongoDB is a market leader. This new partnership with the Data Domain ecosystem will better allow our customers to add MongoDB workloads to their existing infrastructure. BoostFS provides all the benefits and efficiencies of DD Boost, and does so in a simple, cost effective manner. With Dell EMC and MongoDB, customers are now given a valuable, synergistic solution built from two industry leaders.

What MongoDB configurations are supported with BoostFS?
  • Database: MongoDB v2.6, 3.0, 3.2, and 3.4 (future)
  • Storage Engines: MMAPv1 and WiredTiger
  • Backup Tools: Ops Manager 2.0.7, mongodump
  • Data Domain: All Platforms and DDVE
  • DD OS: v6.0
  • BoostFS: v1.0

For more information or to ask questions about BoostFS with MongoDB, please visit the Data Domain Community web site.

Where do you see this relationship going?
Philip: As the Product Manager for DD Boost and BoostFS, part of my responsibilities include running the partner ecosystem for DD Boost, so I have a lot of experience in dealing with partners. When working in that capacity, it’s easy to separate the good from the bad. Working with MongoDB has been great from the start – they have been responsive, flexible, and proactive in solving problems. Both firms are excited about the solution being offered today, and discussions have already started on extending this solution to cloud use cases.

What is the main use case for MongoDB with BoostFS?
Philip: One of the main use cases for BoostFS is to provide an enterprise backup and recovery solution with the option to replicate to a remote site. This secondary site can be used for disaster recovery or long-term retention. The BoostFS plug-in resides on the MongoDB Ops Manager server as a Linux file system mount point, and the DD Boost protocol transports the data that Ops Manager writes to the file system to and from the Data Domain system. Backups are then replicated to a remote Data Domain system using MTree replication.

MongoDB and Boost

What are the benefits you’ll get with BoostFS for MongoDB as opposed to Network File System (NFS)?
Philip: BoostFS offers advanced features while retaining the user experience you get with NFS, including load balancing and failover plus security. The chart below shows the benefits of BoostFS over NFS. Details on these features can be found on DellEMC.com or at the Data Domain User Community site.

BoostFS for MongoDB

What exciting things can we look forward to next from MongoDB and Dell EMC?
Tarik: We have invested heavily in hyper-converged infrastructure. More and more customers are seeing the benefits in shifting their focus from maintaining infrastructure to innovating their application. We see tremendous potential in validating and eventually embedding MongoDB into our converged offerings.

Thank you for speaking with us Tarik and Philip. If you’d like to learn more:

Dell EMC and MongoDB Solutions Brief



Announcing MongoDB 3.4

Today we are announcing MongoDB 3.4, another milestone in our march to being the default database for modern applications. 3.4 makes MongoDB more flexible than ever, allowing developers to consolidate even more use cases into their MongoDB deployment, even as we continue to mature the platform and its ecosystem.

MongoDB was created to make it easy for developers to work with their data, beginning with the introduction of the document model itself. Documents are the ideal fundamental unit for a data store because they let you represent any kind of data and structure it however best suits your use case. Whether that means deep or shallow nesting (or no nesting), documents can handle it. The key is being able to apply many types of queries and algorithms to the data.

MongoDB 3.4 adds a stage to the aggregation pipeline that enables faceted search, greatly simplifying the query load for applications that browse and explore that data. It also adds operators to power graph queries. As we continue to add query features, users can consolidate more use cases, instead of bloating their application footprint with a proliferation of specialized data stores.
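As a small illustration of both features (the collection and field names below are made up), here is what a faceted search and a graph traversal look like in the mongo shell:

// Faceted search: bucket products by price range and by category in a single pass
db.products.aggregate([
  { $facet: {
      byPrice:    [ { $bucket: { groupBy: "$price", boundaries: [ 0, 50, 100, 500 ], default: "other" } } ],
      byCategory: [ { $sortByCount: "$category" } ]
  } }
])

// Graph query: walk a reports-to hierarchy with $graphLookup
db.employees.aggregate([
  { $graphLookup: {
      from: "employees",
      startWith: "$reportsTo",
      connectFromField: "reportsTo",
      connectToField: "name",
      as: "managementChain"
  } }
])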

Just because it’s easy to work with data in MongoDB, it doesn’t mean we can’t make it easier. In 3.4, the aggregation pipeline continues to mature, with more operators and expressions, enhancing string handling, allowing more sophisticated use of array elements, testing fields for type, and support for branching. Financial calculations are made simple with the addition of a Decimal data type.
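For example, the new decimal type keeps money math exact instead of relying on binary floating point; a tiny mongo shell sketch:

db.invoices.insertOne({ amount: NumberDecimal("19.99"), tax: NumberDecimal("1.75") })
db.invoices.aggregate([ { $project: { total: { $add: [ "$amount", "$tax" ] } } } ])   // exact decimal result: 21.74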

I think it was John Donne who said: “No database is an island,” but whoever said it, they were very right. A database has to work as the heart of an ecosystem, and in 3.4, we continue to build that thriving ecosystem. Connecting MongoDB to the outside world is better than ever. MongoDB 3.4 introduces a ground-up rewrite of the BI connector, which improves performance, simplifies installation and configuration, and supports Windows. 3.4 also includes an update to our Apache Spark connector, with support for Spark 2.0.

We’ve also extended the platforms that MongoDB runs on, including ARM-64, and IBM’s POWER8 and zSeries platforms.

MongoDB Compass is growing up with 3.4. It has new ways to depict data, such as the map view for geographic data, and it has become a data manipulation and performance tuning tool as well. In 3.4, Compass offers visual plan explanations, real-time stats, CRUD operations and index creation, so now you can identify, diagnose, and fix performance and data problems all from within Compass.

Of course, MongoDB 3.4 is supported by our trifecta of enterprise-grade ops management platforms: Ops Manager, Cloud Manager, and MongoDB Atlas, each of which add new features with this release. Ops Manager, for example, has improved its monitoring with built in telemetry gathering tailored to each deployment platform, and now allows ops teams to create server pools to serve up database-as-a-service to internal teams. Atlas introduces Virtual Private Cloud (VPC) Peering, allowing teams to use convenient private IPs to talk to their MongoDB service from within their AWS VPC.

There’s a ton more than I can fit into a blog post. That’s what release notes are for. But I shouldn’t leave out a few highlights, like: tunable consistency control for replica sets, including linearizable reads; collations for queries and indexes; and read-only views, which enable us to bring field level security to apps handling regulated data.
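As one small illustration, a read-only view (collection and field names are hypothetical) can expose only non-sensitive fields to an application user, who is then granted access to the view rather than the underlying collection:

db.createView("patientsRedacted", "patients", [
  { $project: { ssn: 0, insuranceNumber: 0 } }   // hide regulated fields from readers of the view
])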

We’re incredibly excited to ship MongoDB 3.4 to you, so it can help your data serve you, not the other way around. Our approach is to build a database that can handle any kind of data, and the capabilities to query that data however you need to.

To learn more about MongoDB 3.4, register for our upcoming webinar:

Find out what's new



About the Author - Eliot Horowitz

Eliot Horowitz is CTO and Co-Founder of MongoDB. Eliot is one of the core MongoDB kernel committers. Previously, he was Co-Founder and CTO of ShopWiki. Eliot developed the crawling and data extraction algorithm that is the core of its innovative technology. He has quickly become one of Silicon Alley's up and coming entrepreneurs and was selected as one of BusinessWeek's Top 25 Entrepreneurs Under Age 25 nationwide in 2006. Earlier, Eliot was a software developer in the R&D group at DoubleClick (acquired by Google for $3.1 billion). Eliot received a BS in Computer Science from Brown University.

Microservices Webinar Recap

Recently, we held a webinar discussing microservices, and how two companies, Hudl and UPS i-parcel, leverage MongoDB as the database powering their microservices environment. There have been a number of theoretical and vendor-led discussions about microservices over the past couple of years. We thought it would be of value to share with you real world insights from companies who have actually adopted microservices, as well as answers to questions we received from the audience during the live webinar.

Jon Dukulil is the VP of Engineering from Hudl and Yursil Kidwai is the VP of Technology from UPS i-parcel.

How are Microservices different from Service Oriented Architectures (SOAs) utilizing SOAP/REST with an Enterprise Service Bus (ESB)?

Microservices and SOAs are related in that both approaches distribute applications into individual services. Where they differ, though, is in the scope of the problem they address today. SOAs aim for flexibility at the enterprise IT level. This can be a complex undertaking, as SOAs only work when the underlying services do not need to be modified. Microservices represent an architecture for an individual service, and aim at facilitating continuous delivery and parallel development of multiple services. The following graphic highlights some of the differences.

One significant difference between SOAs and microservices revolves around the messaging system, which coordinates and synchronizes communication between different services in the application. Enterprise service buses (ESB) emerged as a solution for SOAs because of the need for service integration and a central point of coordination. As ESBs grew in popularity, enterprise vendors packaged more and more software and smarts into the middleware, making it difficult to decouple the different services that relied on the ESB for coordination. Microservices keep the messaging middleware focused on sharing data and events, and enabling more of the intelligence at the endpoints. This makes it easier to decouple and separate individual services.

How big should a microservice be?

There are many differing opinions about how large a microservice should be; ultimately, it depends on your application needs. Here is how Hudl and UPS i-parcel approach that question.

Jon Dukulil (Hudl): We determine how big our microservice should be by the amount of work that can be completed by a squad. For us, a squad is a small, completely autonomous team. It consists of 4 separate functions: product manager, developer, UI designer, and QA. When we are growing headcount we are not thinking of growing larger teams; we are thinking of adding more squads.

![](https://webassets.mongodb.com/_com_assets/cms/Microservices_MongoDB_Blog2-a6l74owk23.png)

Yursil Kidwai (UPS i-parcel): For us, we have defined microservice as a single verb (e.g. Billing), and are constantly challenging ourselves on how that verb should be defined. We follow the “two pizza” rule, in which a team should never be larger than what you can feed with two large pizzas. Whatever our “two pizza” team can deliver in one week is what we consider to be the right size for a microservice.

Why should I decouple databases in a microservices environment? Can you elaborate on this?

One of the core principles behind microservices is strong cohesion (i.e. related code grouped together) and loose coupling (i.e. a change to one service should not require a change to another). With a shared database architecture, both of these principles are lost. Consumers are tied to a specific technology choice, as well as a particular database implementation. Application logic may also be spread among multiple consumers. If a shared piece of information needs to be edited, you might need to change the behavior in multiple places, as well as deploy all those changes. Additionally, in a shared database architecture, a catastrophic failure of the infrastructure has the potential to affect multiple microservices and result in a substantial outage. Thus, it is recommended to decouple any shared databases so that each microservice has its own database.

Due to the distributed nature of microservices, there are more failure points. Because of all these movable parts in microservices, how do you deal with failures to ensure you meet your SLAs?

Jon Dukulil (Hudl): For us it’s an important point. By keeping services truly separate where they share as little as possible, that definitely helps. You’ll hear people working with microservices talk about “minimizing the blast radius” and that’s what I mean by the separation of services. When one service does have a failure it doesn’t take everything else down with it. Another thing is that when you are building out your microservices architecture, be careful about the abstractions that you create. Things in a monolith that used to be a function call are now a network call, so there are many more things that can fail because of that: networks can time out, partitions can occur, and so on. Our developers are trained to think about what happens if we can’t complete the call. For us, it was also important to find a good circuit breaker framework, and we actually wrote our own .NET version of Hystrix, a framework that Netflix built. That has been pretty helpful to isolate points of access between services and stop failures from cascading.
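For readers unfamiliar with the pattern, here is a minimal, illustrative Python sketch of the circuit breaker idea (not Hudl's Hystrix-style .NET framework): after a configured number of consecutive failures, calls fail fast for a cool-down period instead of being allowed to cascade.

import time

class CircuitBreaker:
    """Minimal circuit breaker: open after N consecutive failures, retry after a cool-down."""

    def __init__(self, max_failures=3, reset_after=30):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if time.time() - self.opened_at < self.reset_after:
                raise RuntimeError('circuit open: failing fast')
            self.opened_at = None              # half-open: allow one trial call
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.time()   # trip the breaker
            raise
        self.failures = 0                      # success closes the circuit
        return result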

Yursil Kidwai (UPS i-parcel): One of the main approaches we took to deal with failures and dependencies was the choice to go with MongoDB. The advantage for us is MongoDB’s ability to deploy a single replica set across multiple regions. We make sure our deployment strategy always includes multiple regions to create that high availability infrastructure. Our goal is to always be up, and the ability of MongoDB’s replica sets to very quickly recover from failures is key to that. Another approach was around monitoring. We built our own monitoring framework that we are reporting on with Datadog. We have multiple 80 inch TVs displaying dashboards of the health of all our microservices. The dashboards are monitoring the throughput of the microservices on a continual basis, with alerts to our ops team configured if the throughput for a service falls below an acceptable threshold level. Finally, it’s important for the team to be accountable. Developers can’t just write code and forget about it; they own the code from beginning to end. Thus, it is important for developers to understand the interdependencies between DevOps, testing, and release in order to properly design a service.

Why did you choose MongoDB and how does it fit in with your architecture?

Jon Dukulil (Hudl): One, from a scaling perspective, we have been really happy with MongoDB’s scalability. We have many small databases and a couple of very large databases. Our smallest database today is serving up just 9MB of data. This is pretty trivial so we need these small databases to run on cost effective hardware. Our largest database is orders of magnitude larger and is spread over 8 shards. The hardware needs of those different databases are very different, but they are both running on MongoDB. Fast failovers are another big benefit for us. It’s fully automated and it’s really fast. Failovers are in the order of 1-5 seconds for us, and the more important thing is they are really reliable. We’ve never had an issue where a failover hasn’t gone well. Lastly, since MongoDB has a dynamic schema, for us that means that the code is the schema. If I’m working on a new feature and I have a property that last week was a string, but this week I want it to be an array of strings, I update my code and I’m ready to go. There isn’t much more to it than that.

Yursil Kidwai (UPS i-parcel): In many parts of the world, e-commerce rules governing cross-border transactions are still changing, and thus our business processes in those areas are constantly being refined. To handle the dynamic environment that our business operates in, the requirement to change the schema was paramount to us. For example, one country may require a tax identification number, while another country may suddenly decide it needs your passport, as well as some other classification number. As these changes are occurring, we really need something behind us that will adapt with us, and MongoDB’s dynamic schema gave us the ability to quickly experiment and respond to our ever changing environment. We also needed the ability to scale. We have 20M tracking events across 100 vendors processed daily, as well as tens of thousands of new parcels that enter into our system every day. MongoDB’s ability to scale-out on commodity hardware and its elastic scaling features really allowed us to handle any unexpected inflows.

Next Steps

To understand more about the business level drivers and architectural requirements of microservices, read Microservices: Evolution of Building Modern Apps Whitepaper.

For a technical deep dive into microservices and containers, read Microservices: Containers and Orchestration Whitepaper.

Getting Started with Python, PyMODM, and MongoDB Atlas

What is PyMODM

PyMODM is an object modeling package for Python that works like an Object Relational Mapping (ORM) and provides a validation and modeling layer on top of PyMongo (MongoDB’s Python driver). Developers can use PyMODM as a template to model their data, validate schemas, and easily delete referenced objects. PyMODM can be used with any Python framework, is compatible with Python 3, and is supported by MongoDB.

Benefits of PyMODM

PyMODM allows developers to focus more on developing application logic instead of creating validation logic to ensure data integrity.

Some key benefits of PyMODM are:

Field Validation. MongoDB has a dynamic schema, but there are very few production use cases where data is entirely unstructured. Most applications expect some level of data validation either through the application or database tier. MongoDB provides Document Validation within the database. Users can enforce checks on document structure, data types, data ranges, and the presence of mandatory fields. Document validation is useful for centralizing rules across projects and APIs, as well as minimizing redundant code for multiple applications. In certain cases, application side validation makes sense as well, especially when you would like to obviate the need for a round trip between the application and database. PyMODM provides users the ability to define models and validate their data before storing it in MongoDB, thus eliminating the amount of data validation logic developers need to write in the application tier.

Built In Reference Handling. PyMODM has built in reference handling to make development simpler. Developers don’t have to plan on normalizing data as much as they would with an RDBMS. PyMODM can automatically populate fields that reference documents in other collections, in a similar way to foreign keys in a RDBMS.

For example, you might have a model for a blog post that contains an author. Let’s say we want to keep track of these entities in separate collections. The way we store this in MongoDB is to have the _id from the author document be stored as an author field in the post document:

{
    "title": "Working with PyMODM",
    "author": ObjectId('57dad74a6e32ab4894ea6898')
}

If we were using the low-level driver, we would just get an ObjectId when we accessed post['author'], whereas PyMODM will lazily dereference this field for you:

>>> post.author
Author(name='Jason Ma')

In other words, PyMODM handles all the necessary queries to resolve referenced objects, instead of having to pull out the ids yourself and perform the extra queries manually.

PyMODM also provides several strategies for managing how objects get deleted when they are involved in a relationship with other objects. For example, if you have a Book and Publisher class, where each Book document references a Publisher document, you have the option of deleting all Book objects associated with that Publisher.
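A hedged sketch of that Book and Publisher scenario, using the CASCADE delete rule (the fields shown are illustrative):

from pymodm import MongoModel, fields

class Publisher(MongoModel):
    name = fields.CharField()

class Book(MongoModel):
    title = fields.CharField()
    # Deleting a Publisher also deletes every Book that references it
    publisher = fields.ReferenceField(Publisher,
                                      on_delete=fields.ReferenceField.CASCADE)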

Familiar PyMongo Syntax. PyMODM uses PyMongo-style syntax for queries and updates, which makes it easy to get started with for developers already comfortable with PyMongo.
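For instance, raw PyMongo-style filter documents and update operators work as you would expect; a hedged sketch against the User model defined later in this post:

# Query with a raw PyMongo-style filter document
for user in User.objects.raw({'first_name': 'Jason'}):
    print(user.email)

# Update with familiar update operators
User.objects.raw({'first_name': 'Jason'}).update({'$set': {'last_name': 'Ma'}})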

Installing PyMODM

Getting started with PyMODM is simple. You can install PyMODM with pip.

pip install pymodm

Connecting to MongoDB Atlas

For developers that are interested in minimizing operational database tasks, MongoDB Atlas is an ideal option.

MongoDB Atlas is a database as a service and provides all the features of the database without the heavy lifting of setting up operational tasks. Developers no longer need to worry about provisioning, configuration, patching, upgrades, backups, and failure recovery. Atlas offers elastic scalability, either by scaling up on a range of instance sizes or scaling out with automatic sharding, all with no application downtime.

Setting up Atlas is simple.

Select the instance size that fits your application needs and click “CONFIRM & DEPLOY”.

Connecting PyMODM to MongoDB Atlas is straightforward and easy. Just find the connection string and plug it into the ‘connect’ method. To ensure a secure system right out of the box, authentication and IP Address whitelisting are automatically enabled. IP address whitelisting is a key MongoDB Atlas security feature, adding an extra layer to prevent 3rd parties from accessing your data. Clients are prevented from accessing the database unless their IP address has been added to the IP whitelist for your MongoDB Atlas group. For AWS, VPC Peering for MongoDB Atlas is under development and will be available soon, offering a simple, robust solution. It will allow the whitelisting of an entire AWS Security Group within the VPC containing your application servers.

from pymodm import connect

#Establish a connection to the database and call the connection my-atlas-app
connect(
    'mongodb://jma:PASSWORD@mongo-shard-00-00-efory.mongodb.net:27017,mongo-shard-00-01-efory.mongodb.net:27017,mongo-shard-00-02-efory.mongodb.net:27017/admin?ssl=true&replicaSet=mongo-shard-0&authSource=admin',
    alias='my-atlas-app'
)

In this example, we have set alias=’my-atlas-app’. An alias in the connect method is optional, but comes in handy if we ever need to refer to the connection by name. Remember to replace “PASSWORD” with your own generated password.

Defining Models

One of the big benefits of PyMODM is the ability to define your own models and apply schema validation to those models. The below examples highlight how to use PyMODM to get started with a blog application.

Once a connection to MongoDB Atlas is established, we can define our model class. MongoModel is the base class for all top-level models, which represents data stored in MongoDB in a convenient object-oriented format. A MongoModel definition typically includes a number of field instances and possibly a Meta class that provides settings specific to the model:

from pymodm import MongoModel, fields
from pymongo.write_concern import WriteConcern

class User(MongoModel):
    email = fields.EmailField(primary_key=True)
    first_name = fields.CharField()
    last_name = fields.CharField()

    class Meta:
        connection_alias = 'my-atlas-app'
        write_concern = WriteConcern(j=True)

In this example, the User model inherits from MongoModel, which means that the User model will create a new collection in the database (myDatabase.User). Any class that inherits directly from MongoModel will always get its own collection.

The character fields (first_name, last_name) and email field (email) will always store their values as unicode strings. If a user stores some other type in first_name or last_name (e.g. Python ‘bytes’) then PyMODM will automatically convert the field to a unicode string, providing consistent and uniform access to that field. A validator is readily available on CharField, which will validate the maximum string length. For example, if we wanted to limit the length of a last name to 30 characters, we could do:

last_name = fields.CharField(max_length=30)

For the email field, we set primary_key=True. This means that this field will be used as the id for documents of this MongoModel class. Note, this field will actually be called _id in the database. PyMODM will validate that the email field contents contain a single ‘@’ character.

New validators can also easily be created. For example, the email validator below ensures that the email entry is a Gmail address:

from pymodm.errors import ValidationError

def is_gmail_address(string):
    if not string.endswith('@gmail.com'):
        raise ValidationError('Email address must be a valid gmail account.')

class User(MongoModel):
    email = fields.EmailField(validators=[is_gmail_address])

Here, PyMODM will validate that the email field contains a valid Gmail address or throw an error. PyMODM handles field validation automatically whenever a user retrieves or saves documents, or on demand. By rolling validation into the model definition, we reduce the likelihood of storing invalid data in MongoDB. PyMODM fields also provide a uniform way of viewing data in that field. If we use a FloatField, for example, we will always receive a float, regardless of whether the data stored in that field is a float, an integer, or a quoted number. This reduces the amount of validation logic that developers need to write in their applications.
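Validation can also be run on demand, which is handy in tests or before a batch import. A hedged sketch using the Gmail validator above (full_clean runs the field validators without saving):

user = User(email='not-a-gmail-address@example.com')
try:
    user.full_clean()      # run field validators without touching the database
except ValidationError as exc:
    print(exc)             # Email address must be a valid gmail account.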

Finally, the last part of our example is the Meta class, which contains two pieces of information. The connection_alias tells the model which connection to use. In a previous code example, we defined the connection alias as my-atlas-app. The write_concern attribute tells the model which write concern to use by default. You can define other Meta attributes such as read concern, read preference, etc. See the PyMODM API documentation for more information on defining the Meta class.

Reference Other Models

Another powerful feature of PyMODM is the ability to reference other models.

Let’s take a look at an example.

from pymodm import EmbeddedMongoModel, MongoModel, fields

class Comment(EmbeddedMongoModel):
    author = fields.ReferenceField(User)
    content = fields.CharField()

class Post(MongoModel):
    title = fields.CharField()
    author = fields.ReferenceField(User)
    revised_on = fields.DateTimeField()
    content = fields.CharField()
    comments = fields.EmbeddedDocumentListField(Comment)

In this example, we have defined two additional model types: Comment and Post. Both these models contain an author, which is an instance of the User model. The User that represents the author in each case is stored among all other Users in the myDatabase.User collection. In the Comment and Post models, we’re just storing the _id of the User in the author field. This is actually the same as the User’s email field, since we set primary_key=True for the field earlier.

The Post class gets a little bit more interesting. In order to support commenting on a Post, we’ve added a comments field, which is an EmbeddedDocumentListField. The EmbeddedDocumentListField embeds Comment objects directly into the Post object. The advantage of doing this is that you don’t need multiple queries to retrieve all comments associated with a given Post.
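Putting the models together, here is a hedged sketch of creating and saving a post with an embedded comment; the field values are illustrative:

import datetime

author = User(email='jason@gmail.com',
              first_name='Jason',
              last_name='Ma').save()

post = Post(title='Working with PyMODM',
            author=author,
            revised_on=datetime.datetime.now(),
            content='...',
            comments=[Comment(author=author, content='Great walkthrough!')]).save()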

Now that we have created models that reference each other, what happens if an author deletes his/her account? PyMODM provides a few options in this scenario:

  • Do nothing (default behaviour).
  • Change the fields that reference the deleted objects to None.
  • Recursively delete all objects that were referencing the object (i.e. delete any comments and posts associated with a User).
  • Don’t allow deleting objects that have references to them.
  • If the deleted object was just one among potentially many other references stored in a list, remove the references from the list. For example, if the application allows for Post to have multiple authors we could remove from the list just the author who deleted their account.

For our previous example, let’s delete any comments and posts associated with a User that has deleted his/her account:

author = fields.ReferenceField(User, on_delete=fields.ReferenceField.CASCADE)

This will delete all documents associated with the reference.

In this blog, we have highlighted just a few of the benefits that PyMODM provides. For more information on how to leverage the powerful features of PyMODM, check out this GitHub example of developing a blog with the Flask framework.

Summary

PyMODM is a powerful Python ORM for MongoDB that provides an object-oriented interface to MongoDB documents to make it simple to enforce data validation and referencing in your application. MongoDB Atlas helps developers free themselves from the operational tasks of scaling and managing their database. Together, PyMODM and MongoDB Atlas provide developers a compelling solution to enable fast, iterative development, while reducing costs and operational tasks.

Get Started with PyMODM

Serverless Architectures: The Evolution of Cloud Computing

Introduction

Since the advent of the computer, building software has been a complicated process. Over the past decade, new infrastructure approaches (IaaS and PaaS), software architectures (SOA and Microservices), and methodologies (Agile, Continuous Delivery and DevOps) have emerged to mitigate the complexity of application development. While microservices has been the hot trend over the past couple of years, serverless architectures have been gaining momentum by providing a new way to build scalable and cost effective applications. Serverless computing frees developers from the traditional cost of building applications by automatically provisioning servers and storage, maintaining infrastructure, upgrading software, and only charging for consumed resources.

This blog discusses what serverless computing is and key considerations when evaluating a database with your serverless environment.

What is Serverless Computing?

Serverless computing is the next layer of abstraction in cloud computing. It does not mean that there are no servers, but rather the underlying infrastructure (physical and virtual hosts, virtual machines, containers), as well as the operating system, is abstracted away from the developer. Applications are run in stateless compute containers that are event triggered (e.g. a user uploading a photo which triggers notifications to his/her followers). Developers create functions and depend on the infrastructure to allocate the proper resources to execute the function. If the load on the function grows, the infrastructure will create copies of the function and scale to meet demand.

Serverless computing supports multiple languages so developers can choose the tools they are most comfortable with. Users are only charged for runtime and resources (e.g. RAM) that the function consumes; thus there is no longer any concept of under or over provisioning. For example, if a function runs for 500ms and consumes 15 MB of RAM, the user will only be charged for 500 ms of runtime, and the cost to use 15 MB of RAM.

Serverless architectures are a natural extension of microservices. Similar to microservices, serverless architecture applications are broken down into specific core components. While microservices may group similar functionality into one service, serverless applications delineate functionality into finer grained components. Custom code is developed and executed as isolated, autonomous, granular functions that run in a stateless compute service.

To illustrate this point, let’s look at a simple example of how a microservice and serverless architecture differ.

In Figure 1, a client interacts with a “User” microservice. A container is pre-provisioned with all of the functionality of the “User” service residing in the container. The service consists of different functions (update_user, get_user, create_user, delete_user) and is scaled based on the overall load across the service. The service will consume hardware resources even while idle, and the user will still be charged for the underutilized resources.

![](https://webassets.mongodb.com/_com_assets/cms/ServerlessIMG1-9v49pgxm88.png)
Figure 1: Microservices Architecture

For a serverless architecture, the “User” service would be separated into more granular functions. In Figure 2, each API endpoint corresponds to a specific function and file. When a “create user” request is initiated by the client, the entire codebase of the “User” service does not have to run; instead only create_user.js will execute. There is no need to pre-provision containers, as standalone functions only consume resources when needed, and users are only charged on the actual runtime of their functions. This granularity also facilitates parallel development work, as functions can be tested and deployed independently.

Figure 2: Serverless Architecture
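To make the decomposition concrete, here is a hedged, Lambda-style sketch in Python (the figure shows JavaScript files; the event shape below is an assumption). Each endpoint lives in its own file and scales independently:

# create_user.py -- runs only when the "create user" endpoint is invoked
def create_user(event, context):
    user = {'email': event['email'], 'name': event['name']}
    # ... persist the user and return a response ...
    return {'status': 'created', 'email': user['email']}

# get_user.py -- a separate function in a separate file
def get_user(event, context):
    # ... look up and return the requested user ...
    return {'status': 'ok', 'email': event['email']}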

Benefits of Serverless Computing

Costs Scale With Usage: One of the biggest benefits of serverless computing is that you only pay for the runtime of your function. There is no concept of “idle” resources as you are not charged if the function is not executed. This is especially helpful for applications that are only used a few times an hour, which means any dedicated hardware, VMs, or containers would be sitting idle for the majority of the time, and a user would be charged for underutilized resources. With serverless computing, enterprises could build out an entire infrastructure and not pay for any compute resources until customers start using the application.

Elastic Scalability: Elastic scalability is also simple with a serverless architecture. If a function needs to scale, the infrastructure will make copies of the function to handle the load. An example of this could be a chatbot that responds to weather requests. In a serverless architecture, a chatbot function would handle the response by retrieving the user’s location and responding back with the temperature. For a few requests this is not a problem, but what happens if the chatbot service is flooded with thousands of requests a second? For this scenario, the chatbot function would automatically scale by instantiating thousands of copies of the function. Once the requests have subsided, the environment would terminate the idle instances and scale down, allowing costs to scale proportionally with user demand.

Rapid Development and Iteration: Serverless computing is ideal for companies that need to quickly develop, prototype, and iterate. Development is quicker since there aren’t any dependencies on IT Ops. Functions are single threaded, which makes debugging and deploying functions simpler. The build process is also broken down into smaller and more manageable chunks. This increases the number of changes that can be pushed through the Continuous Delivery pipeline, resulting in rapid deployment and more iterative feedback. Iteration is fast because the architecture is conducive to making large code changes quickly, resulting in more customer feedback and better product-market fit.

Less System Administration: Serverless doesn’t mean that you completely obviate the operational element of your infrastructure, but it does mean that there is less system administration. There are no servers to manage, provision, and scale, as well as no patching and upgrading. Servers are automatically deployed in multiple availability zones to provide high availability. Support is also streamlined; if there is an issue in the middle of the night it is the responsibility of the cloud provider to resolve the problem.

Developer Productivity: By using a serverless architecture, developers can focus more on writing code than having to worry about managing the operational tasks of the application. This allows them to develop innovative features and focus on the core business logic that matters most to the business.

MongoDB Atlas and Serverless Computing

With MongoDB Atlas, users can leverage the rich functionality of MongoDB — expressive query language, flexible schema, always-on availability, distributed scale-out — from a serverless environment. MongoDB Atlas is a database as a service and provides all the features of the database without the heavy lifting of setting up operational tasks. Developers no longer need to worry about provisioning, configuration, patching, upgrades, backups, and failure recovery. Atlas offers elastic scalability, either by scaling up on a range of instance sizes or scaling out with automatic sharding, all with no application downtime.

Setting up Atlas is simple.

Figure 3: Provisioning MongoDB Atlas cluster

Select the instance size that fits your application needs and click “CONFIRM & DEPLOY”. Depending on the instance size, a MongoDB cluster can be provisioned in seconds.

Figure 4: MongoDB Atlas Cluster Monitoring

MongoDB Atlas provides many benefits for those interested in building a serverless architecture:

Vendor Independence: Cloud providers typically only offer databases specific to that provider, which may not fit with what a developer needs. MongoDB Atlas provides independence from the underlying cloud provider and empowers developers to choose the appropriate tools for their needs. Developers can leverage the rich functionality of MongoDB’s query language and flexible data model, without worrying about the operational tasks of managing the database. If you decide to shift to another cloud provider, you won’t have to repopulate your data with a different database technology. MongoDB Atlas is currently available only on AWS, with support for Microsoft Azure and Google Cloud Platform (GCP) coming soon.

MEAN Stack: Serverless architectures accelerate the trend of shifting business logic from the back-end to the front-end, which makes the choice of front-end framework much more important. AngularJS is ideally suited for this requirement and is a popular front-end for serverless architectures. AngularJS is a structural JavaScript framework for dynamic web applications that provides interactive functionality and rich AJAX components (AJAX is a technique for creating fast, dynamic web pages). Combined with NodeJS, ExpressJS, and MongoDB, these tools form the MEAN stack (MongoDB, ExpressJS, AngularJS, NodeJS). There are huge advantages to using JavaScript and JSON throughout your serverless stack. Someone working on the front-end can easily understand the function (back-end) code and database queries. Additionally, using the same syntax and objects through the whole stack frees your team from having to learn multiple language best practices, reduces the barrier to entry for understanding the codebase, and results in higher software performance and developer productivity.

Rapid Deployment: With MongoDB Atlas, a MongoDB cluster can be provisioned and deployed in minutes and even seconds. Developers no longer need to worry about configuring or managing servers. Integrating MongoDB Atlas into your serverless platform requires you to pass the connection string into your serverless application.

Figure 5: MongoDB Atlas Connection
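A hedged Python sketch of that integration; the environment variable name, database, and collection are assumptions, and the client is created outside the handler so warm invocations reuse the connection:

import os
from pymongo import MongoClient

# Created once per container and reused across function invocations
client = MongoClient(os.environ['ATLAS_CONNECTION_STRING'])
db = client.myDatabase

def handler(event, context):
    db.events.insert_one({'type': event.get('type'), 'payload': event})
    return {'status': 'stored'}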

MongoDB Atlas features extensive capabilities to defend, detect, and control access to MongoDB, offering among the most complete security controls of any modern database:

  • User Rights Management: Control access to sensitive data using industry standard mechanisms for authentication and authorization at the database level
  • Encryption: Protect data in motion over the network and at rest in persistent storage

To ensure a secure system right out of the box, authentication and IP Address whitelisting are automatically enabled.

IP address whitelisting is a key MongoDB Atlas security feature, adding an extra layer to prevent 3rd parties from accessing your data. Clients are prevented from accessing the database unless their IP address has been added to the IP whitelist for your MongoDB Atlas group.

For AWS, VPC Peering for MongoDB Atlas is under development and will be available soon, offering a simple, robust solution. It will allow the whitelisting of an entire AWS Security Group within the VPC containing your application servers.

Scalability: You should expect your serverless functions to scale out, thus downstream setups need to be architected to keep up and scale out with your functions. Relational databases tend to break down with this model. MongoDB Atlas is designed with scalability as a core principle. When your cluster hits a certain threshold, MongoDB Atlas will alert you and with one click you can provision new servers.

Flexible Schema: Because serverless architectures are event driven, many use cases revolve around the Internet of Things (IoT) and mobile. MongoDB is ideal for these use cases and more, as its flexible document model enables you to store and process data of any type: events, geospatial, time series, text, binary, and anything else. Adding new fields to the document structure is simple, making it easy to handle changing data generated by event-driven applications. Developers spend less time modifying schemas and more time innovating.

For more information on Serverless Architecture best practices and benefits download the Serverless Architectures: Evolution of Cloud Computing whitepaper.

Summary

Serverless architectures are relatively new and build on the work done by microservices. MongoDB Atlas is a database as a service and is well suited for serverless architectures as it provides elastic scalability and all the features of the database without managing operational tasks. Though serverless architectures have a lot of benefits, there are many considerations to keep in mind when looking to implement a serverless architecture. Thus, diligent planning should be taken before embarking on that path.

Resources

Download the Serverless Architectures White Paper


Building Applications with MongoDB's Pluggable Storage Engines: Part 2

In the previous post, I discussed MongoDB’s pluggable storage engine architecture and characteristics of each storage engine. In this post, I will talk about how to select which storage engine to use, as well as mixing and matching storage engines in a replica set.

**How To Select Which Storage Engine To Use**

WiredTiger Workloads

WiredTiger will be the storage engine of choice for most workloads. WiredTiger’s concurrency and excellent read and write throughput are well suited to applications requiring high performance:

  • IoT applications: sensor data ingestion and analysis
  • Customer data management and social apps: updating all user interactions and engagement from multiple activity streams
  • Product catalogs, content management, real-time analytics

For most workloads, it is recommended to use WiredTiger. The rest of this post discusses situations where other storage engines may be applicable.

Encrypted Workloads

The Encrypted storage engine is ideally suited to be used in regulated industries such as finance, retail, healthcare, education, and government. Enterprises that need to build compliant applications with PCI DSS, HIPAA, NIST, FISMA, STIG, or other regulatory initiatives can use the Encrypted storage engine with native MongoDB security features such as authorization, access controls, authentication, and auditing to achieve compliance.

Before MongoDB 3.2, the primary methods to provide encryption-at-rest were to use 3rd party applications that encrypt files at the application, file system, or disk level. These methods work well with MongoDB but tend to add extra cost, complexity, and overhead.

The Encrypted Storage engine adds ~15% overhead compared to WiredTiger, as available CPU cycles are allocated to the encryption/decryption process – though the actual impact will be dependent on your data set and workload. This is still significantly less compared to 3rd party disk and file system encryption, where customers have noticed 25% overhead or more.

More information about the performance benchmark of the Encrypted storage engine can be found here.

The Encrypted storage engine, combined with MongoDB native security features such as authentication, authorization, and auditing provides an end to end security solution to safeguard data with minimal performance impact.

In-Memory Workloads

The advantages of in-memory computing are well understood. Data can be accessed in RAM nearly 100,000 times faster than retrieving it from disk, delivering orders-of-magnitude higher performance for the most demanding applications. With RAM prices continuing to tumble and new technologies such as 3D non-volatile memory on the horizon, the performance gains can now be realized with better and improving economics than ever before.

Not only is fast access important, but predictable access, or latency, is essential for certain modern day applications. For example, financial trading applications need to respond quickly to fluctuating market conditions as data flows through trading systems. Unpredictable latency outliers can mean the difference between making or losing millions of dollars.

While WiredTiger will be more than capable for most use cases, applications requiring predictable latency will benefit the most from the In-Memory storage engine.

Enterprises can harness the power of MongoDB core capabilities (expressive query language, primary and secondary indexes, scalability, high availability) with the benefits of predictable latency from the In-Memory storage engine.

Examples of when to use the In-Memory engine are:

Financial:

  • Algorithmic trading applications that are highly sensitive to predictable latency; such as when latency spikes from high traffic volumes can overwhelm a trading system and cause transactions to be lost or require re-transmission
  • Real-time monitoring systems that detect anomalies such as fraud
  • Applications that require predictable latency for processing of trade orders, credit card authorizations, and other high-volume transactions

Government:

  • Sensor data management and analytics applications interested in spatially and temporally correlated events that need to be contextualized with real time sources (weather, social networking, traffic, etc)
  • Security threat detection

ECommerce / Retail:

  • Session data of customer profiles during a purchase
  • Product search cache
  • Personalized recommendations in real time

Online Gaming:

  • First person shooter games
  • Caching of player data

Telco:

  • Real-time processing and caching of customer information and data plans
  • Tracking network usage for millions of users and performing real-time actions such as billing
  • Managing user sessions in real time to create personalized experiences on any mobile device

MMAPv1 Workloads

Though WiredTiger is better suited for most application workloads, there are certain situations where users may want to remain on MMAPv1:

Legacy Workloads: Enterprises that are upgrading to the latest MongoDB releases (3.0 and 3.2) and don’t want to re-qualify their applications with a new storage engine may prefer to remain with MMAPv1.

Version Downgrade: The upgrade process from MMAPv1 to WiredTiger is a simple, binary-compatible, “drop-in” upgrade, but once upgraded to MongoDB 3.0 or 3.2, users cannot downgrade to a version lower than 2.6.8. This should be kept in mind for users that want to stay on version 2.6. Many features have been added to MongoDB since version 2.6, so it is highly recommended to upgrade to version 3.2.

Mixed Storage Engine Use Cases

MongoDB’s flexible storage architecture provides a powerful option to optimize your database. Storage engines can be mixed and matched within a single MongoDB cluster to meet diverse application needs for data. Users can evaluate different storage engines without impacting deployments, and can also easily migrate and upgrade to a new storage engine following the rolling upgrade process. To simplify this process even further, users can use Ops Manager or Cloud Manager to upgrade their cluster’s version of MongoDB with the click of a button.

Though there are many possible mixed storage configurations, here are a few examples of mixed storage engine configurations with the In-Memory and WiredTiger engines.

**Figure 10**: eCommerce application with mixed storage engines

Since the In-Memory storage engine does not persist data as a standalone node, it can be used with another storage engine to persist data in a mixed storage engine configuration. The eCommerce application in Figure 10 uses two sharded clusters with three nodes (1 primary, 2 secondaries) in each cluster. The replica set with the In-Memory engine as the primary node provides low latency access and high throughput to transient user data such as session information, shopping cart items, and recommendations. The application’s product catalog is stored in the sharded cluster with WiredTiger as the primary node. Product searches can utilize the WiredTiger in-memory cache for low latency access. If the product catalog’s data storage requirements exceed server memory capacity, data can be stored and retrieved from disk. This tiered approach enables “hot” data to be accessed and modified quickly in real time, while persisting “cold” data to disk.

The configuration in Figure 11 demonstrates how to preserve low latency capabilities in a cluster after failover. Setting priority=1 in the secondary In-Memory node will result in automatic failover to that secondary, and eliminate the need to fully repopulate the failed primary when it comes back online. Additionally, if the transient data needs to be persisted then a secondary WiredTiger node can be configured to act as a replica, providing high availability and disk durability.

![](https://webassets.mongodb.com/_com_assets/cms/StorageEngineIMG12-pmr796cxcj.png)
**Figure 11:** Mixed storage engines, with hidden WiredTiger secondary
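A hedged PyMongo sketch of that member configuration (hostnames and member indexes are illustrative; it assumes member 1 runs the In-Memory engine and member 2 is the WiredTiger replica):

from pymongo import MongoClient

client = MongoClient('mongodb://host0:27017,host1:27017,host2:27017/?replicaSet=rs0')

config = client.admin.command('replSetGetConfig')['config']

# Prefer failover to the In-Memory secondary...
config['members'][1]['priority'] = 1

# ...and keep the WiredTiger node as a hidden, non-electable replica for durability
config['members'][2]['priority'] = 0
config['members'][2]['hidden'] = True

config['version'] += 1
client.admin.command('replSetReconfig', config)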

To provide even higher availability and durability a five node replica set with two In-Memory and three WiredTiger nodes can be used. In Figure 12, the In-Memory engine is the primary node, with four secondary nodes. If a failure to the primary occurs, the secondary In-Memory node will automatically failover as the primary and there will be no need to repopulate the cache. If the new primary In-Memory node also fails, then the replica set will elect a WiredTiger node as primary. This mitigates any disruption in operation as clients will still be able to write uninterrupted to the new WiredTiger primary.

![](https://webassets.mongodb.com/_com_assets/cms/StorageEngineIMG13-qs7y9kdkab.png)
**Figure 12:** Mixed storage engines with five node replica set

Additionally, a mixed storage engine approach is ideally suited for a microservices architecture. In a microservice architecture, a shared database between services can affect multiple services and slow down development. By decoupling the database and selecting the right storage engines for specific workloads, enterprises can improve performance and quickly develop features for individual services. Learn more about MongoDB and microservices.

Conclusion

MongoDB is the next generation database used by the world’s most sophisticated organizations, from cutting-edge startups to the largest companies, to create applications never before possible, at a fraction of the cost of legacy databases. With pluggable storage engine APIs, MongoDB continues to innovate and provide users the ability to choose the most optimal storage engines for their workloads. Now, enterprises have an even richer ecosystem of storage options to solve new classes of use cases with a single database framework.

Pluggable Storage Engine Architecture

If guidance is needed on upgrading to MongoDB 3.2, MongoDB offers consulting to help ensure a smooth transition without interruption.