Jason Ma


Developing a Facebook Chatbot with AWS Lambda and MongoDB Atlas

This post is part of our Road to re:Invent series. In the weeks leading up to AWS re:Invent in Las Vegas this November, we'll be posting about a number of topics related to running MongoDB in the public cloud.

Introduction

While microservices have been the hot trend over the past couple of years, serverless architectures have been gaining momentum by providing a new way to build scalable, responsive and cost effective applications. Serverless computing frees developers from the traditional cost and effort of building applications by automatically provisioning servers and storage, maintaining infrastructure, upgrading software, and only charging for consumed resources. More insight into serverless computing can be found in this whitepaper.

Amazon's serverless computing platform, AWS Lambda, lets you run code without provisioning and running servers. MongoDB Atlas is hosted MongoDB as a service. MongoDB Atlas provides all the features of the database without the heavy operational lifting. Developers no longer need to worry about operational tasks such as provisioning, configuration, patching, upgrades, backups, and failure recovery. In addition, MongoDB Atlas offers elastic scalability, either by scaling up on a range of instance sizes or scaling out with automatic sharding, all with no application downtime. Together, AWS Lambda and MongoDB Atlas allow developers to spend more time developing code and less time managing the infrastructure. Learn how to easily integrate an AWS Lambda Node.js function with a MongoDB database in this tutorial.

To demonstrate the power of serverless computing and managed database as a service, I'll use this blog post to show you how to develop a Facebook chatbot that responds to weather requests and stores the message information in MongoDB Atlas.

Setting Up MongoDB Atlas

MongoDB Atlas provides multiple size options for instances. Within an instance class, there is also the ability to customize storage capacity and storage speed, as well as to use encrypted storage volumes. The number of virtual CPUs (vCPUs) – where a vCPU is a shared physical core or one hyperthread – increases as the instance class grows larger. The M10, M20, and M30 instances are excellent for development and testing purposes, but for production it is recommended to use instances higher than M30. The base options for instances are:

- M0 – Variable RAM, 512 MB Storage
- M10 – 2 GB RAM, 10 GB Storage, 1 vCPU
- M20 – 4 GB RAM, 20 GB Storage, 2 vCPUs
- M30 – 8 GB RAM, 40 GB Storage, 2 vCPUs
- M40 – 16 GB RAM, 80 GB Storage, 4 vCPUs
- M50 – 32 GB RAM, 160 GB Storage, 8 vCPUs
- M60 – 64 GB RAM, 320 GB Storage, 16 vCPUs
- M100 – 160 GB RAM, 1000 GB Storage, 40 vCPUs

Register with MongoDB Atlas and use the intuitive user interface to select the instance size, region, and features you need.

Connecting MongoDB Atlas to AWS Lambda

Important note: VPC Peering is not available with the MongoDB Atlas free tier (M0). If you use an M0 cluster, allow any IP to connect to your M0 cluster and skip directly to the Set Up AWS Lambda section.

MongoDB Atlas enables VPC (Virtual Private Cloud) peering, which allows you to easily create a private networking connection between your application servers and backend database. Traffic is routed between the VPCs using private IP addresses. Instances in either VPC can communicate with each other as if they are within the same network. Note that VPC peering requires both VPCs to be in the same region.
Below is an architecture diagram of how to connect MongoDB Atlas to AWS Lambda and route traffic to the Internet.

Figure 1: AWS Peering Architecture

Architecture

For our example, a Network Address Translation (NAT) Gateway and an Internet Gateway (IGW) are needed, as the Lambda function will require internet access to query data from the Yahoo weather API. The Yahoo weather API will be used to query real-time weather data from the chatbot. The Lambda function we will create resides in the private subnet of our VPC. Because the subnet is private, the IP addresses assigned to the Lambda function are not publicly routable. To solve this issue, a NAT Gateway can be used to translate private IP addresses to public, and vice versa. An IGW is also needed to provide access to the internet.

The first step is to set up an Elastic IP address, which will be the static IP address of your Lambda functions to the outside world. Go to Services->VPC->Elastic IPs, and allocate a new Elastic IP address.

Next we will create a new VPC, which you will attach to your Lambda function. Go to Services->VPC->Start VPC Wizard. After clicking VPC wizard, select VPC with Public and Private Subnets. Let's configure our VPC. Give the VPC a name (e.g., "Chatbot App VPC"), select an IP CIDR block, choose an Availability Zone, and select the Elastic IP you created in the previous step. Note that the IP CIDR you select for your VPC must not overlap with the Atlas IP CIDR. Click Create VPC to set up your VPC. The AWS VPC wizard will automatically set up the NAT and IGW.

You should see the VPC you created in the VPC dashboard. Go to the Subnets tab to see if your private and public subnets have been set up correctly. Click on the Private Subnet and go to the Route Table tab in the lower window. You should see the NAT gateway set to 0.0.0.0/0, which means that messages sent to IPs outside of the private subnet will be routed to the NAT gateway. Next, let's check the public subnet to see if it's configured correctly. Select the Public subnet and the Route Table tab in the lower window. You should see 0.0.0.0/0 connected to your IGW. The IGW will enable outside internet traffic to be routed to your Lambda functions.

Now, the final step is initiating a VPC peering connection between MongoDB Atlas and your Lambda VPC. Log in to MongoDB Atlas, and go to Clusters->Security->Peering->New Peering Connection. After successfully initiating the peering connection, you will see the status of the peering connection as Waiting for Approval. Go back to AWS and select Services->VPC->Peering Connections. Select the VPC peering connection. You should see the connection request pending. Go to Actions and select Accept Request. Once the request is accepted, you should see the connection status as active.

We will now verify that the routing is set up correctly. Go to the Route Table of the Private Subnet in the VPC you just set up. In this example, it is rtb-58911e3e. You will need to modify the Main Route Table (see Figure 1) to add the VPC peering connection. This will allow traffic to be routed to MongoDB Atlas. Go to the Routes tab and select Edit->Add another route. In the Destination field, add your Atlas CIDR block, which you can find in the Clusters->Security tab of the MongoDB Atlas web console. Click in the Target field. A dropdown list will appear, where you should see the peering connection you just created. Select it and click Save.
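If you prefer to script this last routing step rather than use the console, the same route can be added with the AWS SDK for JavaScript. This is an illustrative sketch only: the route table ID reuses the example above, while the peering connection ID and Atlas CIDR block are placeholders you would replace with the values from your own VPC and Atlas group.

```javascript
// Illustrative sketch: add a route that sends Atlas-bound traffic through the
// VPC peering connection. Replace the IDs and CIDR with your own values.
var AWS = require("aws-sdk");
var ec2 = new AWS.EC2({ region: "us-east-1" });

var params = {
  RouteTableId: "rtb-58911e3e",             // private subnet route table (example from above)
  DestinationCidrBlock: "192.168.248.0/21", // your Atlas CIDR block (placeholder)
  VpcPeeringConnectionId: "pcx-1234abcd"    // the peering connection you accepted (placeholder)
};

ec2.createRoute(params, function(err, data) {
  if (err) {
    console.error("Could not create route:", err);
  } else {
    console.log("Route to Atlas added:", JSON.stringify(data));
  }
});
```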
Now that the VPC peering connection is established between the MongoDB Atlas and AWS Lambda VPCs, let’s set up our AWS Lambda function. Set Up AWS Lambda Now that our MongoDB Atlas cluster is connected to AWS Lambda, let’s develop our Lambda function. Go to Services->Lambda->Create Lambda Function . Select your runtime environment (here it’s Node.js 4.3), and select the hello-world starter function. Select API Gateway in the box next to the Lambda symbol and click Next . Create your API name, select dev as the deployment stage, and Open as the security. Then click Next . In the next step, make these changes to the following fields: Name : Provide a name for your function – for example, lambda-messenger-chatbot Handler : Leave as is (index.handler) Role : Create a basic execution role and use it (or use an existing role that has permissions to execute Lambda functions) Timeout : Change to 10 seconds. This is not necessary but will give the Lambda function more time to spin up its container on initialization (if needed) VPC : Select the VPC you created in the previous step Subnet : Select the private subnet for the VPC (don’t worry about adding other subnets for now) Security Groups : the default security group is fine for now Press Next , review and create your new Lambda function. In the code editor of your Lambda function, paste the following code snippet and press the Save button: 'use strict'; var VERIFY_TOKEN = "mongodb_atlas_token"; exports.handler = (event, context, callback) => { var method = event.context["http-method"]; // process GET request if(method === "GET"){ var queryParams = event.params.querystring; var rVerifyToken = queryParams['hub.verify_token'] if (rVerifyToken === VERIFY_TOKEN) { var challenge = queryParams['hub.challenge'] callback(null, parseInt(challenge)) }else{ callback(null, 'Error, wrong validation token'); } } }; This is the piece of code we'll need later on to set up the Facebook webhook to our Lambda function. Set Up AWS API Gateway Next, we will need to set up the API gateway for our Lambda function. The API gateway will let you create, manage, and host a RESTful API to expose your Lambda functions to Facebook messenger. The API gateway acts as an abstraction layer to map application requests to the format your integration endpoint is expecting to receive. For our example, the endpoint will be our Lambda function. Go to Services->API Gateway->[your Lambda function]->Resources->ANY . Click on Integration Request . This will configure the API Gateway to properly integrate Facebook with your backend application (AWS Lambda). We will set the integration endpoint to lambda-messenger-bot , which is the name I chose for our Lambda function. Uncheck Use Lambda Proxy Integration and navigate to the Body Mapping Templates section. Select When there are no templates defined as the Request body passthrough option and add a new template called application/json . 
Don't select any value in the Generate template section, add the code below and press Save : ## See http://docs.aws.amazon.com/apigateway/latest/developerguide/api-gateway-mapping-template-reference.html ## This template will pass through all parameters including path, querystring, header, stage variables, and context through to the integration endpoint via the body/payload #set($allParams = $input.params()) { "body-json" : $input.json('$'), "params" : { #foreach($type in $allParams.keySet()) #set($params = $allParams.get($type)) "$type" : { #foreach($paramName in $params.keySet()) "$paramName" : "$util.escapeJavaScript($params.get($paramName))" #if($foreach.hasNext),#end #end } #if($foreach.hasNext),#end #end }, "stage-variables" : { #foreach($key in $stageVariables.keySet()) "$key" : "$util.escapeJavaScript($stageVariables.get($key))" #if($foreach.hasNext),#end #end }, "context" : { "account-id" : "$context.identity.accountId", "api-id" : "$context.apiId", "api-key" : "$context.identity.apiKey", "authorizer-principal-id" : "$context.authorizer.principalId", "caller" : "$context.identity.caller", "cognito-authentication-provider" : "$context.identity.cognitoAuthenticationProvider", "cognito-authentication-type" : "$context.identity.cognitoAuthenticationType", "cognito-identity-id" : "$context.identity.cognitoIdentityId", "cognito-identity-pool-id" : "$context.identity.cognitoIdentityPoolId", "http-method" : "$context.httpMethod", "stage" : "$context.stage", "source-ip" : "$context.identity.sourceIp", "user" : "$context.identity.user", "user-agent" : "$context.identity.userAgent", "user-arn" : "$context.identity.userArn", "request-id" : "$context.requestId", "resource-id" : "$context.resourceId", "resource-path" : "$context.resourcePath" } } The mapping template will structure the Facebook response in the desired format specified by the application/json template. The Lambda function will then extract information from the response and return the required output to the chatbot user. For more information on AWS mapping templates, see the AWS documentation . Go back to Services->API Gateway->[your Lambda function]->Resources->ANY and select Method Request . In the Settings section, make sure NONE is selected in the Authorization dropdown list. If not, change it NONE and press the small Update button. Go back to the Actions button for your API gateway and select Deploy API to make your API gateway accessible by the internet. Your API gateway is ready to go. Set Up Facebook Messenger Facebook makes it possible to use Facebook Messenger as the user interface for your chatbot. For our chatbot example, we will use Messenger as the UI. To create a Facebook page and Facebook app, go to the Facebook App Getting Started Guide to set up your Facebook components. To connect your Facebook App to AWS Lambda you will need to go back to your API gateway. Go to your Lambda function and find the API endpoint URL (obscured in the picture below). Go back to your Facebook App page and in the Add Product page, click on the Get Started button next to the Messenger section. Scroll down and in the Webhooks section, press the Setup webhooks button. A New Page Subscription page window should pop up. Enter your API endpoint URL in the Callback URL text box and in the Verify Token text box, enter a token name that you will use in your Lambda verification code (e.g. mongodb_atlas_token ). 
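Before completing the webhook setup, you can sanity-check the verification logic locally by invoking the handler with a hand-built event that mimics what API Gateway forwards for Facebook's GET request. This is a rough sketch only: it assumes you have saved the verification snippet shown earlier to a local file named verify.js, and the challenge value is arbitrary.

```javascript
// Rough local test of the webhook verification handler. Assumes the earlier
// verification snippet was saved as ./verify.js; the event below mimics the
// structure produced by the API Gateway mapping template for a GET request.
var handler = require("./verify.js").handler;

var fakeVerificationEvent = {
  context: { "http-method": "GET" },
  params: {
    querystring: {
      "hub.verify_token": "mongodb_atlas_token", // must match VERIFY_TOKEN
      "hub.challenge": "1158201444"              // arbitrary test value
    }
  }
};

handler(fakeVerificationEvent, {}, function(err, result) {
  // Expect the parsed challenge back when the token matches
  console.log("err:", err, "result:", result);
});
```

If the token matches, the callback returns the challenge as an integer, which is exactly what Facebook expects to receive during verification.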
As the Facebook docs explain, your code should look for the Verify Token and respond with the challenge sent in the verification request. Last, select the messages and messaging_postbacks subscription fields. Press the Verify and Save button to start the validation process. If everything went well, the Webhooks section should show up again and you should see a Complete confirmation in green.

In the Webhooks section, click on Select a Page to select a page you already created. If you don't have any page on Facebook yet, you will first need to create a Facebook page. Once you have selected an existing page, press the Subscribe button.

Scroll up and in the Token Generation section, select the same page you selected above to generate a page token. The first time you complete that action, Facebook might pop up a consent page to request your approval to grant your Facebook application some necessary page-related permissions. Press the Continue as [your name] button and the OK button to approve these permissions. Facebook generates a page token, which you should copy and paste into a separate document. We will need it when we complete the configuration of our Lambda function.

Connect Facebook Messenger UI to AWS Lambda Function

We will now connect the Facebook Messenger UI to AWS Lambda and begin sending weather queries through the chatbot. Below is the index.js code for our Lambda function. The index.js file will be packaged into a compressed archive file later on and loaded to our AWS Lambda function.

```javascript
"use strict";

var assert = require("assert");
var https = require("https");
var request = require("request");
var MongoClient = require("mongodb").MongoClient;

var facebookPageToken = process.env["PAGE_TOKEN"];
var VERIFY_TOKEN = "mongodb_atlas_token";
var mongoDbUri = process.env["MONGODB_ATLAS_CLUSTER_URI"];

let cachedDb = null;

exports.handler = (event, context, callback) => {
  context.callbackWaitsForEmptyEventLoop = false;

  var httpMethod;
  if (event.context != undefined) {
    httpMethod = event.context["http-method"];
  } else {
    //used to test with lambda-local
    httpMethod = "PUT";
  }

  // process GET request (for Facebook validation)
  if (httpMethod === "GET") {
    console.log("In Get if loop");
    var queryParams = event.params.querystring;
    var rVerifyToken = queryParams["hub.verify_token"];
    if (rVerifyToken === VERIFY_TOKEN) {
      var challenge = queryParams["hub.challenge"];
      callback(null, parseInt(challenge));
    } else {
      callback(null, "Error, wrong validation token");
    }
  } else {
    // process POST request (Facebook chat messages)
    var messageEntries = event["body-json"].entry;
    console.log("message entries are " + JSON.stringify(messageEntries));
    for (var entryIndex in messageEntries) {
      var messageEntry = messageEntries[entryIndex].messaging;
      for (var messageIndex in messageEntry) {
        var messageEnvelope = messageEntry[messageIndex];
        var sender = messageEnvelope.sender.id;
        if (messageEnvelope.message && messageEnvelope.message.text) {
          var onlyStoreinAtlas = false;
          if (
            messageEnvelope.message.is_echo &&
            messageEnvelope.message.is_echo == true
          ) {
            // echo of our own reply: store it in Atlas but don't answer it
            console.log("only store in Atlas");
            onlyStoreinAtlas = true;
          }
          if (!onlyStoreinAtlas) {
            var location = messageEnvelope.message.text;
            var weatherEndpoint =
              "https://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20weather.forecast%20where%20woeid%20in%20(select%20woeid%20from%20geo.places(1)%20where%20text%3D%22" +
              location +
              "%22)&format=json&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys";
            request({ url: weatherEndpoint, json: true }, function(
              error,
              response,
              body
            ) {
              try {
                var condition = body.query.results.channel.item.condition;
                var response =
                  "Today's temperature in " +
                  location +
                  " is " +
                  condition.temp +
                  ". The weather is " +
                  condition.text +
                  ".";
                console.log(
                  "The response to send to Facebook is: " + response
                );
                sendTextMessage(sender, response);
                storeInMongoDB(messageEnvelope, callback);
              } catch (err) {
                console.error(
                  "error while sending a text message or storing in MongoDB: ",
                  err
                );
                sendTextMessage(sender, "There was an error.");
              }
            });
          } else {
            storeInMongoDB(messageEnvelope, callback);
          }
        } else {
          process.exit();
        }
      }
    }
  }
};

// Sends the chatbot's reply back to the user through the Facebook Graph API
function sendTextMessage(senderFbId, text) {
  var json = {
    recipient: { id: senderFbId },
    message: { text: text }
  };
  var body = JSON.stringify(json);
  var path = "/v2.6/me/messages?access_token=" + facebookPageToken;
  var options = {
    host: "graph.facebook.com",
    path: path,
    method: "POST",
    headers: { "Content-Type": "application/json" }
  };
  var callback = function(response) {
    var str = "";
    response.on("data", function(chunk) {
      str += chunk;
    });
    response.on("end", function() {});
  };
  var req = https.request(options, callback);
  req.on("error", function(e) {
    console.log("problem with request: " + e);
  });
  req.write(body);
  req.end();
}

// Reuses the cached database connection across Lambda invocations when possible
function storeInMongoDB(messageEnvelope, callback) {
  if (cachedDb && cachedDb.serverConfig.isConnected()) {
    sendToAtlas(cachedDb, messageEnvelope, callback);
  } else {
    console.log(`=> connecting to database ${mongoDbUri}`);
    MongoClient.connect(mongoDbUri, function(err, db) {
      assert.equal(null, err);
      cachedDb = db;
      sendToAtlas(db, messageEnvelope, callback);
    });
  }
}

// Writes the message envelope to the "records" collection in MongoDB Atlas
function sendToAtlas(db, message, callback) {
  db.collection("records").insertOne(
    { facebook: { messageEnvelope: message } },
    function(err, result) {
      if (err != null) {
        console.error("an error occurred in sendToAtlas", err);
        callback(null, JSON.stringify(err));
      } else {
        var message = `Inserted a message into Atlas with id: ${result.insertedId}`;
        console.log(message);
        callback(null, message);
      }
    }
  );
}
```

We are passing the MongoDB Atlas connection string (or URI) and Facebook page token as environment variables, so we'll configure them in our Lambda function later on. For now, clone this GitHub repository and open the README file to find the instructions to deploy and complete the configuration of your Lambda function. Save your Lambda function and navigate to your Facebook Page chat window to verify that your function works as expected. Bring up the Messenger window and enter the name of a city of your choice (such as New York, Paris or Mumbai).

Store Message History in MongoDB Atlas

AWS Lambda functions are stateless; thus, if you require data persistence with your application you will need to store that data in a database. For our chatbot, we will save message information (text, senderID, recipientID) to MongoDB Atlas (if you look at the code carefully, you will notice that the response with the weather information comes back to the Lambda function and is also stored in MongoDB Atlas). Before writing data to the database, we will first need to connect to MongoDB Atlas. Note that this code is already included in the index.js file.
function storeInMongoDB(messageEnvelope, callback) { if (cachedDb && cachedDb.serverConfig.isConnected()) { sendToAtlas(cachedDb, messageEnvelope, callback); } else { console.log(`=> connecting to database ${mongoDbUri}`); MongoClient.connect(mongoDbUri, function(err, db) { assert.equal(null, err); cachedDb = db; sendToAtlas(db, messageEnvelope, callback); }); } } sendToAtlas will write chatbot message information to your MongoDB Atlas cluster. function sendToAtlas(db, message, callback) { db.collection("records").insertOne({ facebook: { messageEnvelope: message } }, function(err, result) { if (err != null) { console.error("an error occurred in sendToAtlas", err); callback(null, JSON.stringify(err)); } else { var message = `Inserted a message into Atlas with id: ${result.insertedId}`; console.log(message); callback(null, message); } }); } Note that the storeInMongoDB and sendToAtlas methods implement MongoDB's recommended performance optimizations for AWS Lambda and MongoDB Atlas , including not closing the database connection so that it can be reused in subsequent calls to the Lambda function. The Lambda input contains the message text, timestamp, senderID and recipientID, all of which will be written to your MongoDB Atlas cluster. Here is a sample document as stored in MongoDB: { "_id": ObjectId("58124a83c976d50001f5faaa"), "facebook": { "message": { "sender": { "id": "1158763944211613" }, "recipient": { "id": "129293977535005" }, "timestamp": 1477593723519, "message": { "mid": "mid.1477593723519:81a0d4ea34", "seq": 420, "text": "San Francisco" } } } } If you'd like to see the documents as they are stored in your MongoDB Atlas database, download MongoDB Compass , connect to your Atlas cluster and visualize the documents in your fbchats collection: Note that we're storing both the message as typed by the user, as well as the response sent back by our Lambda function (which comes back to the Lambda function as noted above). Using MongoDB Atlas with other AWS Services In this blog, we demonstrated how to build a Facebook chatbot, using MongoDB Atlas and AWS Lambda. MongoDB Atlas can also be used as the persistent data store with many other AWS services, such as Elastic Beanstalk and Kinesis. To learn more about developing an application with AWS Elastic Beanstalk and MongoDB Atlas, read Develop & Deploy a Node.js App to AWS Elastic Beanstalk & MongoDB Atlas . To learn how to orchestrate Lambda functions and build serverless workflows, read Integrating MongoDB Atlas, Twilio, and AWS Simple Email Service with AWS Step Functions . For information on developing an application with AWS Kinesis and MongoDB Atlas , read Processing Data Streams with Amazon Kinesis and MongoDB Atlas . To learn how to use your favorite language or framework with MongoDB Atlas, read Using MongoDB Atlas From Your Favorite Language or Framework . About the Author - Raphael Londner Raphael Londner is a Principal Developer Advocate at MongoDB, focused on cloud technologies such as Amazon Web Services, Microsoft Azure and Google Cloud Engine. Previously he was a developer advocate at Okta as well as a startup entrepreneur in the identity management space. You can follow him on Twitter at @rlondner .

November 18, 2016

Microservices Webinar Recap

Recently, we held a webinar discussing microservices, and how two companies, Hudl and UPS i-parcel, leverage MongoDB as the database powering their microservices environment. There have been a number of theoretical and vendor-led discussions about microservices over the past couple of years. We thought it would be of value to share with you real world insights from companies who have actually adopted microservices, as well as answers to questions we received from the audience during the live webinar. Jon Dukulil is the VP of Engineering from Hudl and Yursil Kidwai is the VP of Technology from UPS i-parcel.

How are Microservices different from Service Oriented Architectures (SOAs) utilizing SOAP/REST with an Enterprise Service Bus (ESB)?

Microservices and SOAs are related in that both approaches distribute applications into individual services. Where they differ, though, is the scope of the problem they address today. SOAs aim for flexibility at the enterprise IT level. This can be a complex undertaking, as SOAs only work when the underlying services do not need to be modified. Microservices represent an architecture for an individual service, and aim at facilitating continuous delivery and parallel development of multiple services. The following graphic highlights some of the differences.

One significant difference between SOAs and microservices revolves around the messaging system, which coordinates and synchronizes communication between different services in the application. Enterprise service buses (ESBs) emerged as a solution for SOAs because of the need for service integration and a central point of coordination. As ESBs grew in popularity, enterprise vendors packaged more and more software and smarts into the middleware, making it difficult to decouple the different services that relied on the ESB for coordination. Microservices keep the messaging middleware focused on sharing data and events, enabling more of the intelligence at the endpoints. This makes it easier to decouple and separate individual services.

How big should a microservice be?

There are many differing opinions about how large a microservice should be, so it really depends on your application needs. Here is how Hudl and UPS i-parcel approach that question.

Jon Dukulil (Hudl): We determine how big our microservice should be by the amount of work that can be completed by a squad. For us, a squad is a small, completely autonomous team. It consists of 4 separate functions: product manager, developer, UI designer, and QA. When we are growing headcount we are not thinking of growing larger teams, we are thinking of adding more squads.

Yursil Kidwai (UPS i-parcel): For us, we have defined a microservice as a single verb (e.g. Billing), and are constantly challenging ourselves on how that verb should be defined. We follow the "two pizza" rule, in which a team should never be larger than what you can feed with two large pizzas. Whatever our "two pizza" team can deliver in one week is what we consider to be the right size for a microservice.

Why should I decouple databases in a microservices environment? Can you elaborate on this?

One of the core principles behind microservices is strong cohesion (i.e. related code grouped together) and loose coupling (i.e. a change to one service should not require a change to another). With a shared database architecture both these principles are lost.
Consumers are tied to a specific technology choice, as well as a particular database implementation. Application logic may also be spread among multiple consumers. If a shared piece of information needs to be edited, you might need to change the behavior in multiple places, as well as deploy all those changes. Additionally, in a shared database architecture a catastrophic failure with the infrastructure has the potential to affect multiple microservices and result in a substantial outage. Thus, it is recommended to decouple any shared databases so that each microservice has its own database.

Due to the distributed nature of microservices, there are more failure points. Because of all these moving parts in microservices, how do you deal with failures to ensure you meet your SLAs?

Jon Dukulil (Hudl): For us it's an important point. By keeping services truly separate, where they share as little as possible, that definitely helps. You'll hear people working with microservices talk about "minimizing the blast radius," and that's what I mean by the separation of services. When one service does have a failure it doesn't take everything else down with it. Another thing is that when you are building out your microservices architecture, take care with the abstractions that you create. Things in a monolith that used to be a function call are now a network call, so there are many more things that can fail because of that: networks can time out, network partitions can occur, etc. Our developers are trained to think about what happens if we can't complete the call. For us, it was also important to find a good circuit breaker framework, and we actually wrote our own .NET version of a framework that Netflix built called Hystrix. That has been pretty helpful to isolate points of access between services and stop failures from cascading.

Yursil Kidwai (UPS i-parcel): One of the main approaches we took to deal with failures and dependencies was the choice to go with MongoDB. The advantage for us is MongoDB's ability to deploy a single replica set across multiple regions. We make sure our deployment strategy always includes multiple regions to create that high availability infrastructure. Our goal is to always be up, and the ability of MongoDB's replica sets to very quickly recover from failures is key to that. Another approach was around monitoring. We built our own monitoring framework that we are reporting on with Datadog. We have multiple 80 inch TVs displaying dashboards of the health of all our microservices. The dashboards are monitoring the throughput of the microservices on a continual basis, with alerts to our ops team configured if the throughput for a service falls below an acceptable threshold level. Finally, it's important for the team to be accountable. Developers can't just write code and not worry about it; they own the code from beginning to end. Thus, it is important for developers to understand the interdependencies between DevOps, testing, and release in order to properly design a service.

Why did you choose MongoDB and how does it fit in with your architecture?

Jon Dukulil (Hudl): One, from a scaling perspective, we have been really happy with MongoDB's scalability. We have many small databases and a couple of very large databases. Our smallest database today is serving up just 9MB of data. This is pretty trivial, so we need these small databases to run on cost effective hardware. Our largest database is orders of magnitude larger and is spread over 8 shards.
The hardware needs of those different databases are very different, but they are both running on MongoDB. Fast failovers are another big benefit for us. Failover is fully automated and it's really fast. Failovers are on the order of 1-5 seconds for us, and the more important thing is they are really reliable. We've never had an issue where a failover hasn't gone well. Lastly, since MongoDB has a dynamic schema, for us that means that the code is the schema. If I'm working on a new feature and I have a property that last week was a string, but this week I want it to be an array of strings, I update my code and I'm ready to go. There isn't much more to it than that.

Yursil Kidwai (UPS i-parcel): In many parts of the world, e-commerce rules governing cross border transactions are still changing, and thus our business processes in those areas are constantly being refined. To handle the dynamic environment that our business operates in, the ability to change the schema was paramount to us. For example, one country may require a tax identification number, while another country may suddenly decide it needs your passport, as well as some other classification number. As these changes occur, we really need something behind us that will adapt with us, and MongoDB's dynamic schema gave us the ability to quickly experiment and respond to our ever changing environment. We also needed the ability to scale. We have 20M tracking events across 100 vendors processed daily, as well as tens of thousands of new parcels that enter into our system every day. MongoDB's ability to scale out on commodity hardware and its elastic scaling features really allowed us to handle any unexpected inflows.

Next Steps

To understand more about the business level drivers and architectural requirements of microservices, read the Microservices: Evolution of Building Modern Apps whitepaper. For a technical deep dive into microservices and containers, read the Microservices: Containers and Orchestration whitepaper.

October 31, 2016

Getting Started with Python, PyMODM, and MongoDB Atlas

What is PyMODM PyMODM is an object modeling package for Python that works like an Object Relational Mapping (ORM) and provides a validation and modeling layer on top of PyMongo (MongoDB’s Python driver). Developers can use PyMODM as a template to model their data, validate schemas, and easily delete referenced objects. PyMODM can be used with any Python framework, is compatible with Python 3, and is supported by MongoDB. Benefits of PyMODM PyMODM allows developers to focus more on developing application logic instead of creating validation logic to ensure data integrity. Some key benefits of PyMODM are: Field Validation. MongoDB has a dynamic schema, but there are very few production use cases where data is entirely unstructured. Most applications expect some level of data validation either through the application or database tier. MongoDB provides Document Validation within the database. Users can enforce checks on document structure, data types, data ranges, and the presence of mandatory fields. Document validation is useful for centralizing rules across projects and APIs, as well as minimizing redundant code for multiple applications. In certain cases, application side validation makes sense as well, especially when you would like to obviate the need for a round trip between the application and database. PyMODM provides users the ability to define models and validate their data before storing it in MongoDB, thus eliminating the amount of data validation logic developers need to write in the application tier. Built In Reference Handling. PyMODM has built in reference handling to make development simpler. Developers don’t have to plan on normalizing data as much as they would with an RDBMS. PyMODM can automatically populate fields that reference documents in other collections, in a similar way to foreign keys in a RDBMS. For example, you might have a model for a blog post that contains an author. Let’s say we want to keep track of these entities in separate collections. The way we store this in MongoDB is to have the _id from the author document be stored as an author field in the post document: { "title": "Working with PyMODM", "author": ObjectId('57dad74a6e32ab4894ea6898') } If we were using the low-level driver, we would just get an ObjectId when we accessed post['author'] , whereas PyMODM will lazily dereference this field for you: >>> post.author Author(name='Jason Ma') In other words, PyMODM handles all the necessary queries to resolve referenced objects, instead of having to pull out the ids yourself and perform the extra queries manually. PyMODM also provides several strategies for managing how objects get deleted when they are involved in a relationship with other objects. For example, if you have a Book and Publisher class, where each Book document references a Publisher document, you have the option of deleting all Book objects associated with that Publisher. Familiar PyMongo Syntax. PyMODM uses PyMongo-style syntax for queries and updates, which makes it familiar and easy to get started with for those already familiar with PyMongo. Installing PyMODM Getting started with PyMODM is simple. You can install PyMODM with pip . pip install pymodm Connecting to MongoDB Atlas For developers that are interested in minimizing operational database tasks, MongoDB Atlas is an ideal option. MongoDB Atlas is a database as a service and provides all the features of the database without the heavy lifting of setting up operational tasks. 
Developers no longer need to worry about provisioning, configuration, patching, upgrades, backups, and failure recovery. Atlas offers elastic scalability, either by scaling up on a range of instance sizes or scaling out with automatic sharding, all with no application downtime. Setting up Atlas is simple. Select the instance size that fits your application needs and click “CONFIRM & DEPLOY”. Connecting PyMODM to MongoDB Atlas is straightforward and easy. Just find the connection string and plug it into the ‘connect’ method. To ensure a secure system right out of the box, authentication and IP Address whitelisting are automatically enabled. IP address whitelisting is a key MongoDB Atlas security feature, adding an extra layer to prevent 3rd parties from accessing your data. Clients are prevented from accessing the database unless their IP address has been added to the IP whitelist for your MongoDB Atlas group. For AWS, VPC Peering for MongoDB Atlas is under development and will be available soon, offering a simple, robust solution. It will allow the whitelisting of an entire AWS Security Group within the VPC containing your application servers. from pymodm import connect #Establish a connection to the database and call the connection my-atlas-app connect( 'mongodb://jma:PASSWORD@mongo-shard-00-00-efory.mongodb.net:27017,mongo-shard-00-01-efory.mongodb.net:27017,mongo-shard-00-02-efory.mongodb.net:27017/admin?ssl=true&replicaSet=mongo-shard-0&authSource=admin', alias='my-atlas-app' ) In this example, we have set alias=’my-atlas-app’ . An alias in the connect method is optional, but comes in handy if we ever need to refer to the connection by name. Remember to replace “PASSWORD” with your own generated password. Defining Models One of the big benefits of PyMODM is the ability to define your own models and apply schema validation to those models. The below examples highlight how to use PyMODM to get started with a blog application. Once a connection to MongoDB Atlas is established, we can define our model class. MongoModel is the base class for all top-level models, which represents data stored in MongoDB in a convenient object-oriented format. A MongoModel definition typically includes a number of field instances and possibly a Meta class that provides settings specific to the model: from pymodm import MongoModel, fields from pymongo.write_concern import WriteConcern class User(MongoModel): email = fields.EmailField(primary_key=True) first_name = fields.CharField() last_name = fields.CharField() class Meta: connection_alias = 'my-atlas-app' write_concern = WriteConcern(j=True) In this example, the User model inherits from MongoModel , which means that the User model will create a new collection in the database ( myDatabase.User ). Any class that inherits directly from MongoModel will always get it’s own collection. The character fields ( first_name , last_name ) and email field ( email ) will always store their values as unicode strings. If a user stores some other type in first_name or last_name (e.g. Python ‘bytes’) then PyMODM will automatically convert the field to a unicode string, providing consistent and uniform access to that field. A validator is readily available on CharField, which will validate the maximum string length. For example, if we wanted to limit the length of a last name to 30 characters, we could do: last_name = fields.CharField(max_length=30) For the email field, we set primary_key=True . 
This means that this field will be used as the id for documents of this MongoModel class. Note, this field will actually be called _id in the database. PyMODM will validate that the email field contents contain a single '@' character. New validators can also easily be created. For example, the email validator below ensures that the email entry is a Gmail address:

```python
def is_gmail_address(string):
    if not string.endswith('@gmail.com'):
        raise ValidationError('Email address must be valid gmail account.')

class User(MongoModel):
    email = fields.EmailField(validators=[is_gmail_address])
```

Here, PyMODM will validate that the email field contains a valid Gmail address or throw an error. PyMODM handles field validation automatically whenever a user retrieves or saves documents, or on-demand. By rolling validation into the Model definition, we reduce the likelihood of storing invalid data in MongoDB. PyMODM fields also provide a uniform way of viewing data in that field. If we use a FloatField, for example, we will always receive a float, regardless of whether the data stored in that field is a float, an integer, or a quoted number. This mitigates the amount of logic that developers need to create in their applications. Finally, the last part of our example is the Meta class, which contains two pieces of information. The connection_alias tells the model which connection to use. In a previous code example, we defined the connection alias as my-atlas-app. The write_concern attribute tells the model which write concern to use by default. You can define other Meta attributes such as read concern, read preference, etc. See the PyMODM API documentation for more information on defining the Meta class.

Reference Other Models

Another powerful feature of PyMODM is the ability to reference other models. Let's take a look at an example.

```python
from pymodm import EmbeddedMongoModel, MongoModel, fields

class Comment(EmbeddedMongoModel):
    author = fields.ReferenceField(User)
    content = fields.CharField()

class Post(MongoModel):
    title = fields.CharField()
    author = fields.ReferenceField(User)
    revised_on = fields.DateTimeField()
    content = fields.CharField()
    comments = fields.EmbeddedDocumentListField(Comment)
```

In this example, we have defined two additional model types: Comment and Post. Both these models contain an author, which is an instance of the User model. The User that represents the author in each case is stored among all other Users in the myDatabase.User collection. In the Comment and Post models, we're just storing the _id of the User in the author field. This is actually the same as the User's email field, since we set primary_key=True for the field earlier. The Post class gets a little bit more interesting. In order to support commenting on a Post, we've added a comments field, which is an EmbeddedDocumentListField. The EmbeddedDocumentListField embeds Comment objects directly into the Post object. The advantage of doing this is that you don't need multiple queries to retrieve all comments associated with a given Post. Now that we have created models that reference each other, what happens if an author deletes his/her account? PyMODM provides a few options in this scenario: Do nothing (default behaviour). Change the fields that reference the deleted objects to None. Recursively delete all objects that were referencing the object (i.e. delete any comments and posts associated with a User). Don't allow deleting objects that have references to them.
If the deleted object was just one among potentially many other references stored in a list, remove the references from the list. For example, if the application allows for Post to have multiple authors we could remove from the list just the author who deleted their account. For our previous example, let’s delete any comments and posts associated with a User that has deleted his/her account: author = fields.ReferenceField(User, on_delete=ReferenceField.CASCADE) This will delete all documents associated with the reference. In this blog, we have highlighted just a few of the benefits that PyMODM provides. For more information on how to leverage the powerful features of PyMODM, check out this github example of developing a blog with the flask framework. Summary PyMODM is a powerful Python ORM for MongoDB that provides an object-oriented interface to MongoDB documents to make it simple to enforce data validation and referencing in your application. MongoDB Atlas helps developers free themselves from the operational tasks of scaling and managing their database. Together, PyMODM and MongoDB Atlas provide developers a compelling solution to enable fast, iterative development, while reducing costs and operational tasks. Get Started with PyMODM

October 5, 2016

Serverless Architectures: The Evolution of Cloud Computing

Introduction

Since the advent of the computer, building software has been a complicated process. Over the past decade, new infrastructure approaches (IaaS and PaaS), software architectures (SOA and microservices), and methodologies (Agile, Continuous Delivery and DevOps) have emerged to mitigate the complexity of application development. While microservices have been the hot trend over the past couple of years, serverless architectures have been gaining momentum by providing a new way to build scalable and cost effective applications. Serverless computing frees developers from the traditional cost of building applications by automatically provisioning servers and storage, maintaining infrastructure, upgrading software, and only charging for consumed resources. This blog discusses what serverless computing is and key considerations when evaluating a database with your serverless environment.

What is Serverless Computing?

Serverless computing is the next layer of abstraction in cloud computing. It does not mean that there are no servers, but rather that the underlying infrastructure (physical and virtual hosts, virtual machines, containers), as well as the operating system, is abstracted away from the developer. Applications are run in stateless compute containers that are event triggered (e.g. a user uploading a photo which triggers notifications to his/her followers). Developers create functions and depend on the infrastructure to allocate the proper resources to execute the function. If the load on the function grows, the infrastructure will create copies of the function and scale to meet demand. Serverless computing supports multiple languages so developers can choose the tools they are most comfortable with. Users are only charged for the runtime and resources (e.g. RAM) that the function consumes; thus there is no longer any concept of under or over provisioning. For example, if a function runs for 500ms and consumes 15 MB of RAM, the user will only be charged for 500 ms of runtime, and the cost to use 15 MB of RAM.

Serverless architectures are a natural extension of microservices. Similar to microservices, serverless architecture applications are broken down into specific core components. While microservices may group similar functionality into one service, serverless applications delineate functionality into finer grained components. Custom code is developed and executed as isolated, autonomous, granular functions that run in a stateless compute service. To illustrate this point, let's look at a simple example of how a microservice and serverless architecture differ.

In Figure 1, a client interacts with a "User" microservice. A container is pre-provisioned with all of the functionality of the "User" service residing in the container. The service consists of different functions (update_user, get_user, create_user, delete_user) and is scaled based on the overall load across the service. The service will consume hardware resources even while idle, and the user will still be charged for the underutilized resources.

Figure 1: Microservices Architecture

For a serverless architecture, the "User" service would be separated into more granular functions. In Figure 2, each API endpoint corresponds to a specific function and file. When a "create user" request is initiated by the client, the entire codebase of the "User" service does not have to run; instead only create_user.js will execute.
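To make that difference concrete, here is a minimal sketch of what such a fine-grained create_user function might look like as a Node.js Lambda handler. This is purely illustrative: the event shape assumes a body-json passthrough like the mapping used elsewhere in this series, and a real implementation would add validation and persistence (for example to MongoDB Atlas, as discussed below).

```javascript
// create_user.js - hypothetical, minimal sketch of a single-purpose function.
// Only this file runs when a "create user" request arrives; the other "User"
// endpoints (get_user, update_user, delete_user) live in their own files.
"use strict";

exports.handler = (event, context, callback) => {
  // Assume the API Gateway mapping passes the request body through as body-json
  var body = event["body-json"] || {};
  var user = {
    name: body.name,
    email: body.email,
    createdAt: new Date().toISOString()
  };

  if (!user.name || !user.email) {
    return callback(null, { statusCode: 400, body: "name and email are required" });
  }

  // Persistence is out of scope for this sketch; a real function would write
  // the user to a database such as MongoDB Atlas here.
  callback(null, { statusCode: 201, body: JSON.stringify(user) });
};
```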
There is no need to pre-provision containers, as standalone functions only consume resources when needed, and users are only charged on the actual runtime of their functions. This granularity also facilitates parallel development work, as functions can be tested and deployed independently.

Figure 2: Serverless Architecture

Benefits of Serverless Computing

Costs Scale With Usage: One of the biggest benefits of serverless computing is that you only pay for the runtime of your function. There is no concept of "idle" resources, as you are not charged if the function is not executed. This is especially helpful for applications that are only used a few times an hour, which means any dedicated hardware, VMs, or containers would be sitting idle for the majority of the time, and a user would be charged for underutilized resources. With serverless computing, enterprises could build out an entire infrastructure and not pay for any compute resources until customers start using the application.

Elastic Scalability: Elastic scalability is also simple with a serverless architecture. If a function needs to scale, the infrastructure will make copies of the function to handle the load. An example of this could be a chatbot that responds to weather requests. In a serverless architecture, a chatbot function would handle the response by retrieving the user's location and responding back with the temperature. For a few requests this is not a problem, but what happens if the chatbot service is flooded with thousands of requests per second? For this scenario, the chatbot function would automatically scale by instantiating thousands of copies of the function. Once the requests have subsided, the environment would terminate the idle instances and scale down, allowing costs to scale proportionally with user demand.

Rapid Development and Iteration: Serverless computing is ideal for companies that need to quickly develop, prototype, and iterate. Development is quicker since there aren't any dependencies on IT Ops. Functions are single threaded, which makes debugging and deploying functions simpler. The build process is also broken down into smaller and more manageable chunks. This increases the number of changes that can be pushed through the Continuous Delivery pipeline, resulting in rapid deployment and more iterative feedback. Iterations are fast, as the architecture is conducive to making large code changes quickly, resulting in more customer feedback and better product market fit.

Less System Administration: Serverless doesn't mean that you completely obviate the operational element of your infrastructure, but it does mean that there is less system administration. There are no servers to manage, provision, and scale, as well as no patching and upgrading. Servers are automatically deployed in multiple availability zones to provide high availability. Support is also streamlined; if there is an issue in the middle of the night it is the responsibility of the cloud provider to resolve the problem.

Developer Productivity: By using a serverless architecture, developers can focus more on writing code than having to worry about managing the operational tasks of the application. This allows them to develop innovative features and focus on the core business logic that matters most to the business.
MongoDB Atlas and Serverless Computing With MongoDB Atlas , users can leverage the rich functionality of MongoDB — expressive query language, flexible schema, always-on availability, distributed scale-out — from a serverless environment. MongoDB Atlas is a database as a service and provides all the features of the database without the heavy lifting of setting up operational tasks. Developers no longer need to worry about provisioning, configuration, patching, upgrades, backups, and failure recovery. Atlas offers elastic scalability, either by scaling up on a range of instance sizes or scaling out with automatic sharding, all with no application downtime. Setting up Atlas is simple. Figure 3: Provisioning MongoDB Atlas cluster Select the instance size that fits your application needs and click “CONFIRM & DEPLOY”. Depending on the instance size, a MongoDB cluster can be provisioned in seconds. Figure 4: MongoDB Atlas Cluster Monitoring MongoDB Atlas provides many benefits for those interested in building a serverless architecture: Vendor Independence: Cloud providers typically only offer databases specific to that provider, which may not fit with what a developer needs. MongoDB Atlas provides independence from the underlying cloud provider and empowers developers to choose the appropriate tools for their needs. Developers can leverage the rich functionality of MongoDB’s query language and flexible data model, without worrying about the operational tasks of managing the database. If you decide to shift to another cloud provider, you won’t have to repopulate your data with a different database technology. MongoDB Atlas is currently available only on AWS, with support for Microsoft Azure and Google Cloud Platform (GCP) coming soon. MEAN Stack: Serverless architectures accelerate the trend of shifting business logic from the back-end to the front-end. This makes the choice of front-end framework much more important. AngularJS , is ideally suited for this requirement and is a popular front-end for serverless architectures. AngularJS is a structural Javascript framework for dynamic web applications that provides interactive functions and AJAX — technique for creating fast and dynamic web pages — rich components. Combined with NodeJS , ExpressJS , and MongoDB, these tools form the MEAN stack (MongoDB, ExpressJS, AngularJS, NodeJS). There are huge advantages to using JavaScript and JSON throughout your serverless stack. Someone working on the front-end can easily understand the function (back-end) code and database queries. Additionally, using the same syntax and objects through the whole stack frees your team from understanding multiple language best practices, as well as reduces the barrier to entry to understand the codebase, resulting in higher software performance and developer productivity. Rapid Deployment: With MongoDB Atlas, a MongoDB cluster can be provisioned and deployed in minutes and even seconds. Developers no longer need to worry about configuring or managing servers. Integrating MongoDB Atlas into your serverless platform requires you to pass the connection string into your serverless application. 
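As a sketch of what that integration can look like in a Node.js function, the snippet below reads the connection string from an environment variable and caches the client across invocations, mirroring the pattern used in the chatbot example earlier in this series. The environment variable name and the database and collection names are illustrative assumptions.

```javascript
// Minimal sketch: connect to MongoDB Atlas from a Lambda-style function.
// MONGODB_ATLAS_CLUSTER_URI and the "events" collection are illustrative names.
"use strict";
var MongoClient = require("mongodb").MongoClient;

var uri = process.env["MONGODB_ATLAS_CLUSTER_URI"];
var cachedDb = null;

exports.handler = (event, context, callback) => {
  // Keep the connection alive between invocations of this function
  context.callbackWaitsForEmptyEventLoop = false;

  if (cachedDb) {
    return saveEvent(cachedDb, event, callback);
  }

  MongoClient.connect(uri, function(err, db) {
    if (err) return callback(err);
    cachedDb = db;
    saveEvent(db, event, callback);
  });
};

function saveEvent(db, event, callback) {
  db.collection("events").insertOne(
    { payload: event, receivedAt: new Date() },
    function(err, result) {
      if (err) return callback(err);
      callback(null, "stored event with _id " + result.insertedId);
    }
  );
}
```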
Figure 5: MongoDB Atlas Connection MongoDB Atlas features extensive capabilities to defend, detect, and control access to MongoDB, offering among the most complete security controls of any modern database: User Rights Management: Control access to sensitive data using industry standard mechanisms for authentication and authorization at the database level Encryption: Protect data in motion over the network and at rest in persistent storage To ensure a secure system right out of the box, authentication and IP Address whitelisting are automatically enabled. IP address whitelisting is a key MongoDB Atlas security feature, adding an extra layer to prevent 3rd parties from accessing your data. Clients are prevented from accessing the database unless their IP address has been added to the IP whitelist for your MongoDB Atlas group. For AWS, VPC Peering for MongoDB Atlas is under development and will be available soon, offering a simple, robust solution. It will allow the whitelisting of an entire AWS Security Group within the VPC containing your application servers. Scalability: You should expect your serverless functions to scale out, thus downstream setups need to be architected to keep up and scale out with your functions. Relational databases tend to break down with this model. MongoDB Atlas is designed with scalability as a core principle. When your cluster hits a certain threshold, MongoDB Atlas will alert you and with one click you can provision new servers. Flexible Schema: Because serverless architectures are event driven, many use cases revolve around around the Internet of Things (IoT) and mobile. MongoDB is ideal for these use cases and more as its flexible document model enables you to store and process data of any type: events, geospatial, time series, text, binary, and anything else. Adding new fields to the document structure is simple, making it easy to handle changing data generated by event driven applications. Developers spend less time modifying schemas and more time innovating. For more information on Serverless Architecture best practices and benefits download the Serverless Architectures: Evolution of Cloud Computing whitepaper. Summary Serverless architectures are relatively new and build on the work done by microservices. MongoDB Atlas is a database as a service and is well suited for serverless architectures as it provides elastic scalability and all the features of the database without managing operational tasks. Though serverless architectures have a lot of benefits, there are many considerations to keep in mind when looking to implement a serverless architecture. Thus, diligent planning should be taken before embarking on that path. Resources Run MongoDB Atlas today. Read the MongoDB Serverless Computing whitepaper . Download the Serverless Architectures White Paper

September 21, 2016

Building Applications with MongoDB's Pluggable Storage Engines: Part 2

In the previous post, I discussed MongoDB’s pluggable storage engine architecture and characteristics of each storage engine. In this post, I will talk about how to select which storage engine to use, as well as mixing and matching storage engines in a replica set. **How To Select Which Storage Engine To Use** WiredTiger Workloads WiredTiger will be the storage engine of choice for most workloads. WiredTiger’s concurrency model and excellent read and write throughput are well suited for applications requiring high performance: IoT applications: sensor data ingestion and analysis Customer data management and social apps: updating all user interactions and engagement from multiple activity streams Product catalogs, content management, real-time analytics For most workloads, it is recommended to use WiredTiger. The rest of this post will discuss situations where other storage engines may be applicable. Encrypted Workloads The Encrypted storage engine is ideally suited to regulated industries such as finance, retail, healthcare, education, and government. Enterprises that need to build applications compliant with PCI DSS, HIPAA, NIST, FISMA, STIG, or other regulatory initiatives can use the Encrypted storage engine with native MongoDB security features such as authorization, access controls, authentication, and auditing to achieve compliance. Before MongoDB 3.2, the primary methods to provide encryption-at-rest were to use 3rd party applications that encrypt files at the application, file system, or disk level. These methods work well with MongoDB but tend to add extra cost, complexity, and overhead. The Encrypted storage engine adds ~15% overhead compared to WiredTiger, as available CPU cycles are allocated to the encryption/decryption process – though the actual impact will depend on your data set and workload. This is still significantly less than 3rd party disk and file system encryption, where customers have noticed 25% overhead or more. More information about the performance benchmark of the Encrypted storage engine can be found here. The Encrypted storage engine, combined with MongoDB native security features such as authentication, authorization, and auditing, provides an end-to-end security solution to safeguard data with minimal performance impact. In-Memory Workloads The advantages of in-memory computing are well understood. Data can be accessed in RAM nearly 100,000 times faster than retrieving it from disk, delivering orders-of-magnitude higher performance for the most demanding applications. With RAM prices continuing to tumble and new technologies such as 3D non-volatile memory on the horizon, the performance gains can now be realized with better economics than ever before. Not only is fast access important, but predictable access, or latency, is essential for certain modern applications. For example, financial trading applications need to respond quickly to fluctuating market conditions as data flows through trading systems. Unpredictable latency outliers can mean the difference between making or losing millions of dollars. While WiredTiger will be more than capable for most use cases, applications requiring predictable latency will benefit the most from the In-Memory storage engine. Enterprises can harness the power of MongoDB core capabilities (expressive query language, primary and secondary indexes, scalability, high availability) with the benefits of predictable latency from the In-Memory storage engine.
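Before looking at specific use cases, here is a minimal, hypothetical sketch of how an In-Memory node might be started and verified from the mongo shell; the startup flags are standard mongod options for the Enterprise In-Memory engine, but the path, port, and cache size shown are assumptions for illustration.

```javascript
// The In-Memory engine is selected and sized when the mongod process starts, e.g.:
//   mongod --storageEngine inMemory --inMemorySizeGB 8 --dbpath /data/inmem --port 27017
// Nothing is persisted to disk, so all data must fit within the configured size.

// From the mongo shell, confirm which storage engine the node is running:
db.serverStatus().storageEngine;
// Example (abbreviated) output shape: { "name" : "inMemory", ... }
```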
Examples of when to use the In-Memory engine are: Financial: Algorithmic trading applications that are highly sensitive to predictable latency; such as when latency spikes from high traffic volumes can overwhelm a trading system and cause transactions to be lost or require re-transmission Real-time monitoring systems that detect anomalies such as fraud Applications that require predictable latency for processing of trade orders, credit card authorizations, and other high-volume transactions Government: Sensor data management and analytics applications interested in spatially and temporally correlated events that need to be contextualized with real time sources (weather, social networking, traffic, etc) Security threat detection ECommerce / Retail: Session data of customer profiles during a purchase Product search cache Personalized recommendations in real time Online Gaming: First person shooter games Caching of player data Telco: Real-time processing and caching of customer information and data plans Tracking network usage for millions of users and performing real-time actions such as billing Managing user sessions in real time to create personalized experiences on any mobile device MMAPv1 Workloads Though WiredTiger is better suited for most application workloads, there are certain situations where users may want to remain on MMAPv1: Legacy Workloads: Enterprises that are upgrading to the latest MongoDB releases (3.0 and 3.2) and don’t want to re-qualify their applications with a new storage engine may prefer to remain with MMAPv1. Version Downgrade: The upgrade process from MMAP/MMAPv1 to WiredTiger is a simple binary compatible “drop in” upgrade, but once upgraded to MongoDB 3.0 or 3.2 users cannot downgrade to a version lower than 2.6.8. This should be kept in mind for users that want to stay on version 2.6. There have been many added features included in MongoDB since version 2.6, thus it is highly recommended to upgrade to version 3.2. Mixed Storage Engine Use Cases MongoDB’s flexible storage architecture provides a powerful option to optimize your database. Storage engines can be mixed and matched within a single MongoDB cluster to meet diverse application needs for data. Users can evaluate different storage engines without impacting deployments and can also easily migrate and upgrade to a new storage engine following the rolling upgrade process . To simplify this process even further, users can utilize Ops or Cloud manager to upgrade their cluster’s version of MongoDB through a click of a button. Though there are many possible mixed storage configurations, here are a few examples of mixed storage engine configurations with the In-Memory and WiredTiger engines. **Figure 10**: eCommerce application with mixed storage engines Since the In-Memory storage engine does not persist data as a standalone node, it can be used with another storage engine to persist data in a mixed storage engine configuration. The eCommerce application in Figure 10 uses two sharded clusters with three nodes (1 primary, 2 secondaries) in each cluster. The replica set with the In-Memory engine as the primary node provides low latency access and high throughput to transient user data such as session information, shopping cart items, and recommendations. The application’s product catalog is stored in the sharded cluster with WiredTiger as the primary node. Product searches can utilize the WiredTiger in-memory cache for low latency access. 
If the product catalog’s data storage requirements exceed server memory capacity, data can be stored and retrieved from disk. This tiered approach enables “hot” data to be accessed and modified quickly in real time, while persisting “cold” data to disk. The configuration in Figure 11 demonstrates how to preserve low latency capabilities in a cluster after failover. Setting priority=1 on the secondary In-Memory node will result in automatic failover to that secondary, eliminating the need to fully repopulate the failed primary when it comes back online. Additionally, if the transient data needs to be persisted, then a secondary WiredTiger node can be configured to act as a replica, providing high availability and disk durability (a minimal configuration sketch for this kind of mixed deployment appears at the end of this post). ![](https://webassets.mongodb.com/_com_assets/cms/StorageEngineIMG12-pmr796cxcj.png) **Figure 11:** Mixed storage engines, with hidden WiredTiger secondary To provide even higher availability and durability, a five-node replica set with two In-Memory and three WiredTiger nodes can be used. In Figure 12, the In-Memory engine is the primary node, with four secondary nodes. If the primary fails, the secondary In-Memory node will automatically fail over to become the primary, and there will be no need to repopulate the cache. If the new primary In-Memory node also fails, then the replica set will elect a WiredTiger node as primary. This mitigates any disruption in operation, as clients can continue writing uninterrupted to the new WiredTiger primary. ![](https://webassets.mongodb.com/_com_assets/cms/StorageEngineIMG13-qs7y9kdkab.png) **Figure 12:** Mixed storage engines with five node replica set Additionally, a mixed storage engine approach is ideally suited for a microservices architecture. In a microservice architecture, a shared database between services can affect multiple services and slow down development. By decoupling the database and selecting the right storage engines for specific workloads, enterprises can improve performance and quickly develop features for individual services. Learn more about MongoDB and microservices . Conclusion MongoDB is the next generation database used by the world’s most sophisticated organizations, from cutting-edge startups to the largest companies, to create applications never before possible, at a fraction of the cost of legacy databases. With pluggable storage engine APIs, MongoDB continues to innovate and provide users the ability to choose the optimal storage engine for their workloads. Now, enterprises have an even richer ecosystem of storage options to solve new classes of use cases with a single database framework. Pluggable Storage Engine Architecture If guidance is needed on upgrading to MongoDB 3.2, MongoDB offers consulting to help ensure a smooth transition without interruption.
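As promised above, here is a minimal, hypothetical mongo shell sketch of a three-member mixed-engine replica set along the lines of Figure 11. Host names, ports, paths, and the replica set name are invented; each member's storage engine is chosen when its mongod process starts, not in the replica set configuration itself.

```javascript
// Each member's storage engine is selected at startup, e.g.:
//   mongod --replSet mixedRS --port 27017 --dbpath /data/im1 --storageEngine inMemory
//   mongod --replSet mixedRS --port 27018 --dbpath /data/im2 --storageEngine inMemory
//   mongod --replSet mixedRS --port 27019 --dbpath /data/wt1 --storageEngine wiredTiger

// From a mongo shell connected to the first member, initiate the set.
rs.initiate({
  _id: "mixedRS",
  members: [
    // In-Memory member preferred as primary for low-latency reads and writes.
    { _id: 0, host: "node1.example.net:27017", priority: 2 },
    // In-Memory secondary that takes over on failover, so the working data
    // does not need to be repopulated on the new primary.
    { _id: 1, host: "node2.example.net:27018", priority: 1 },
    // Hidden WiredTiger secondary that persists the data to disk but is
    // never elected primary.
    { _id: 2, host: "node3.example.net:27019", priority: 0, hidden: true }
  ]
});

// Verify member roles and replication state.
rs.status();
```

The five-node layout of Figure 12 follows the same pattern, with a second In-Memory member and additional WiredTiger members given lower, non-zero priorities so they can be elected if both In-Memory nodes fail.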

September 20, 2016

Building Applications with MongoDB's Pluggable Storage Engines: Part 1

This is the first in a two-post series about MongoDB’s pluggable storage engines. This post discusses characteristics of MongoDB’s storage engines. **Introduction** With users building increasingly complex data-driven apps, there is no longer a "one size fits all" database storage technology capable of powering every type of application built for the enterprise. Modern applications need to support a variety of workloads with different access patterns and price/performance profiles – from low latency, in-memory read and write applications, to real-time analytics, to highly compressed "active" archives. Through the use of pluggable storage engines, MongoDB can be extended with new capabilities, and configured for optimal use of specific hardware architectures. This approach significantly reduces developer and operational complexity compared to running multiple database technologies. Storage engines can be mixed in the same replica set or sharded cluster. Users can also leverage the same MongoDB query language, data model, scaling, security and operational tooling across different applications, each powered by different pluggable MongoDB storage engines. **Figure 1:** Mix and match storage engines within a single MongoDB replica set MongoDB 3.2 ships with four supported storage engines that can be optimized for specific workloads: The default WiredTiger storage engine. For most applications, WiredTiger's granular concurrency control and native compression will provide the best all-around performance and storage efficiency. The Encrypted storage engine, protecting highly sensitive data, without the performance or management overhead of separate file system encryption. The Encrypted storage engine is based upon WiredTiger, so throughout this series, statements regarding WiredTiger also apply to the Encrypted storage engine. This engine is part of MongoDB Enterprise Advanced . The In-Memory storage engine for applications that have extremely strict SLAs for consistent and predictable low latency, while not requiring disk durability for the data. This engine is part of MongoDB Enterprise Advanced . The MMAPv1 engine, an improved version of the storage engine used in pre-3.x MongoDB releases. MMAPv1 was the default storage engine in MongoDB 3.0. MongoDB allows users to mix and match multiple storage engines within a single MongoDB cluster. This flexibility provides a simple and reliable approach to support diverse workloads. Traditionally, multiple database technologies would need to be managed to meet these needs, with complex, custom integration code to move data between technologies, and to ensure consistent, secure access. With MongoDB’s flexible storage architecture, the database automatically manages the movement of data between storage engine technologies using native replication. This approach significantly reduces developer and operational complexity when compared to running multiple distinct database technologies. **Table 1:** Comparing the MongoDB WiredTiger, In-Memory, Encrypted, and MMAPv1 storage engines **WiredTiger Storage Engine** MongoDB acquired WiredTiger in 2014, and with it the experts behind the WiredTiger storage engine: co-founders Keith Bostic (founder of Sleepycat Software) and Dr. Michael Cahill, and their colleagues. Bostic and Cahill were the original architects of Berkeley DB, the most widely-used embedded data management software in the world, and have decades of experience writing high-performance storage engines.
WiredTiger leverages modern hardware architectures and innovative software algorithms to provide industry-leading performance for the most demanding applications. WiredTiger is ideal for a wide range of operational applications and is therefore MongoDB’s default storage engine. It should be the starting point for all new applications, with the exception of cases where you need the specific capabilities of the In-Memory or Encrypted storage engines. The key advantages of WiredTiger include: Maximize Available Cache: WiredTiger maximizes use of available memory as cache to reduce I/O bottlenecks. There are two caches that are used: the WiredTiger cache and the filesystem cache. The WiredTiger cache stores uncompressed data and provides in-memory-like performance. The operating system’s filesystem cache stores compressed data. When data is not found in the WiredTiger cache, WiredTiger will look for the data in the filesystem cache. **Figure 2:** WiredTiger Caches (WiredTiger Cache and FS Cache) Data found in the filesystem cache first goes through a decompression process before moving to the WiredTiger cache. The WiredTiger cache performs best when it holds as much of the working set as possible. However, it is also important to reserve memory for other processes that need it, such as the operating system, including the filesystem cache. This also includes MongoDB itself, which as a whole will consume more memory than what is in active use by WiredTiger. MongoDB defaults to a WiredTiger cache size of approximately 60% of RAM. At a minimum, 20% of available memory should be left for the filesystem cache; anything lower and the operating system may become constrained for resources. High Throughput: WiredTiger uses “copy on write” — when a document is updated, WiredTiger makes a new copy of the document and determines the latest version to return to the reader. This approach allows multiple clients to simultaneously modify different documents in a collection, resulting in higher concurrency and throughput. Optimum write performance is achieved when an application is utilizing a host with many cores (the more the better), and multiple threads are writing to different documents. Reducing Storage Footprint and Improving Disk IOPS: WiredTiger uses compression algorithms to reduce the amount of data stored on disk. Not only is storage reduced, but IOPS performance is increased as fewer bits are read from or written to disk. Some types of data compress better than others. Text is highly compressible, while binary data may not be as compressible since it may already be encoded and compressed. WiredTiger does incur additional CPU cycles when using compression, but users can configure compression schemes to optimize CPU overhead vs. compression ratio. Snappy, the default compression engine, provides a good balance between compression ratio and CPU overhead. Zlib achieves higher compression ratios but incurs additional CPU cycles. Compression (Indexes and Journals): Indexes can be compressed in memory as well as on disk. WiredTiger utilizes prefix compression to compress the indexes, conserving RAM usage as well as freeing up storage IOPS. Journals are compressed by default with Snappy compression. Multi-Core Scalability: As CPU manufacturers shrink to smaller lithographies and power consumption becomes more and more of an issue, processor trends have shifted to multi-core architectures in order to sustain the cadence of Moore’s law.
WiredTiger was designed with modern, multi-core architectures in mind, and provides scalability across multi-core systems. Programming techniques such as hazard pointers, lock-free algorithms, and fast latching minimize contention between threads. Threads can perform operations without blocking each other — resulting in less thread contention, better concurrency, and higher throughput. Read Concern: WiredTiger allows users to specify a level of isolation for their reads. Read operations can return a view of data that has been accepted or committed to disk by a majority of the replica set. This provides a guarantee that applications only read data that will persist in the event of failure and won’t get rolled back when a new replica set member is promoted to primary. For more information on migrating from MMAP/MMAPv1 to WiredTiger, see the documentation. **Encrypted Storage Engine** Data security is top of mind for many executives due to increased attacks as well as a series of data breaches in recent years that have negatively impacted several high-profile brands. For example, in 2015, a major health insurer was a victim of a massive data breach in which criminals gained access to the Social Security numbers of more than 80 million people — resulting in an estimated cost of $100M. In the end, one of the critical vulnerabilities was that the health insurer did not encrypt sensitive patient data stored at rest. Coupled with MongoDB’s extensive access control and auditing capabilities, encryption is a vital component in building applications that are compliant with standards such as HIPAA, FERPA, PCI, SOX, GLBA, ISO 27001, etc. The Encrypted storage engine is based on WiredTiger, and thus is designed for operational efficiency and performance: document-level concurrency control and compression; support for Intel’s AES-NI equipped CPUs to accelerate the encryption/decryption process; and, as documents are modified, only updated storage blocks need to be encrypted rather than the entire database. With the Encrypted storage engine, protection of data at rest is an integral feature of the database. The raw database “plaintext” content is encrypted using an algorithm that takes a random encryption key as input and generates ciphertext that can only be decrypted with the proper key. The Encrypted Storage Engine supports a variety of encryption algorithms from the OpenSSL library. AES-256 in CBC mode is the default, while other options include AES-256 in GCM mode, as well as FIPS mode for FIPS-140-2 compliance. Encryption is performed at the page level to provide optimal performance. Instead of having to encrypt/decrypt the entire file or database for each change, only the modified pages need to be encrypted or decrypted, resulting in less overhead and higher performance. Additionally, the Encrypted Storage Engine provides safe and secure management of the encryption keys. Each encrypted node contains an internal database key that is used to encrypt or decrypt the data files. The internal database key is wrapped with an external master key, which must be provided to the node for it to initialize. To ensure that keys are never written or paged to disk in unencrypted form, MongoDB uses operating system protection mechanisms, such as VirtualLock and mlock, to lock the process’ virtual memory space into memory.
There are two primary ways to manage the master key: through an integration with a third party key management appliance via the Key Management Interoperability Protocol (KMIP) or local key management via a keyfile. Most regulatory requirements mandate that the encryption keys be rotated and replaced with a new key at least once annually. MongoDB can achieve key rotation without incurring downtime by performing rolling restarts of the replica set. When using a KMIP appliance, the database files themselves do not need to be re-encrypted, thereby avoiding the significant performance overhead imposed by key rotation in other databases. Only the master key is rotated, and the internal database keystore is re-encrypted. It is recommended to use a KMIP appliance with the Encrypted storage engine. **In-Memory Storage Engine** In modern applications, different subsets of application data have different latency and durability requirements. The In-Memory storage engine option is created for applications that have extremely strict SLAs even at 99th percentiles. The In-Memory engine will keep all of the data in memory, and will not write anything to disk. Data always has to be populated on start-up, and nothing can be assumed to be present on restart, including application data and system data (i.e users, permissions, index definitions, oplog, etc). All data must fit into the specified in-memory cache size. The In-Memory storage engine combines the predictable latency benefits of an “in memory cache” with the rich query and analytical capabilities of MongoDB. It has the advantage of using the exact same APIs as any other MongoDB server so your applications do not need special code to interact with the cache, such as handling cache invalidation as data is updated. In addition, a mongod that's configured with the In-Memory storage engine can be part of a replica set, and thus can have another node in the same replica set backed by fast persistent storage. The In-Memory engine is currently supported on MongoDB 3.2.6+. For performance metrics on the In-Memory storage engine view the MongoDB Pluggable Storage Engine white paper . For applications requiring predictable latencies, the In-Memory engine is the recommended storage engine as it provides low latency while also minimizing tail latencies resulting in high performance and a consistent user experience. Some of the key benefits of the In-Memory engine: Predictable and consistent latency for applications that want to minimize latency spikes Applications can combine separate caching and database layers into a single layer— all accessed and managed with the same APIs, operational tools, and security controls Data redundancy with use of a WiredTiger secondary node in a replica set **MMAPv1 Storage Engine** The MMAPv1 engine is an improved version of the storage engine used in pre 3.x MongoDB releases. It utilizes collection level concurrency and memory mapped files to access the underlying data storage. Memory management is delegated to the operating system. This prevents compression of collection data, though journals are compressed with Snappy. In the second part of this blog series, we will discuss how to select which storage engine to use. Learn more about MongoDB’s pluggable storage engines. Read the whitepaper. Pluggable Storage Engine Architecture About the author - Jason Ma Jason Ma is a Principal Product Marketing Manager based in Palo Alto, and has extensive experience in technology hardware and software. 
He previously worked for SanDisk in Corporate Strategy doing M&A and investments, and as a Product Manager on the Infiniflash All-Flash JBOF. Before SanDisk, he worked as a HW engineer at Intel and Boeing. Jason has a BSEE from UC San Diego, MSEE from the University of Southern California, and an MBA from UC Berkeley.

September 19, 2016

Visualizing your data with MongoDB Compass

Introduction Since the inception of MongoDB, the primary interface to the database has been the mongo shell. While the shell is incredibly powerful, it can be intimidating for new users, and let's face it — there are lots of things about managing data and a database that are easier to do with a GUI. In MongoDB 3.2 we introduced MongoDB Compass — our GUI for MongoDB. Our goal with Compass is to provide all the power of the mongo shell through an intuitive, easy to use GUI. In this post, we'll take a closer look at MongoDB Compass and our latest release, which adds CRUD capabilities and a long list of other exciting features. MongoDB 3.2 Introduces Compass v1.0 At MongoDB our user community is extremely important to us. Through meetups, community events, surveys, conferences — just to name a few — we collect user feedback to ensure we continue to build products that address and solve the needs of our users. Over the past couple of years, we have received an overwhelming number of requests for a schema visualizer and query builder. With MongoDB 3.2, we answered our users with MongoDB Compass v1.0 — a GUI schema visualization tool and query builder. MongoDB Compass provides a native GUI to analyze collections and beautifully visualize your schema. Evolution of Compass During our beta program for Compass, we received a lot of great feedback. The three most requested features were: edit/insert/delete operations, visualization of indexes, and visualization of query performance. In Compass v1.2 , which is now in GA, we added the ability to visualize indexes and query performance. The figure below highlights the details of indexes in a collection. Users are not only able to determine which fields are indexed, but also their type, size, usage, and properties of their indexes. Introducing Compass v1.3 Today, we are very excited to introduce Compass v1.3, currently available in beta . Compass v1.3 contains many new features, including: Insert/edit/delete/clone documents through the GUI Build and visually interact with geo/coordinate data to construct queries in a few clicks of a button Visual explain plans to understand the performance of a query SSH tunnels to allow users to connect securely from outside of a datacenter firewall Let’s take a closer look at these features in a little more detail. Edit/Insert/Delete/Clone Documents The most requested feature in our beta program was the ability to edit/insert/delete documents. With Compass v1.3, users will be able to: Modify existing documents Insert new documents Delete existing documents Clone the schema and values of an existing document Editing a document is simple - documents are modified through a powerful editing visualizer. Clicking on the pencil icon puts the document in edit mode. Once in edit mode, users can freely edit any value, field, or data type. Deleted fields are marked in red, while new fields are highlighted in green. Any changes made will not be sent to the server until the user confirms the edits by clicking UPDATE . Documents can also be easily inserted into MongoDB by clicking the + INSERT button at the top of the document list. Clicking + INSERT will open up a new window where a user can modify the contents of the document to be inserted. Deleting documents is straightforward as well – users can click the trash icon at the top right corner of the document in the document list to delete a document. A document can be cloned by clicking on the clone icon within the document. 
Compass will clone all the elements of the document, except for _id, for which a new value will be generated. A new window will appear with the cloned document that contains the same values and fields as the original. You can edit any fields or types before inserting the document. Introducing Map and Geo Query Features Another powerful feature of Compass v1.3 is the ability to see and understand geographical data. Geo data can be displayed on an interactive map, and drawing on the map creates a geo query, with the resulting coordinate pairs displayed on the map. The map can be zoomed and panned with a click and hold of the mouse. Holding the shift key draws a circle on the map, which can be moved, expanded, or contracted. The example below represents a database of Yelp restaurant reviews in Silicon Valley. Here, restaurants with 4 stars (or greater) and within a 5 mile radius of San Jose Airport are selected. Expanding or contracting the radius of the circle automatically builds a new query in the Compass command bar. In this example, applying the query returns all restaurants with 4 stars (or greater) and within a 5 mile radius of San Jose Airport (an equivalent shell query is sketched at the end of this post). Visualize Query Execution Performance (Explain Plans) Compass also provides the ability to visualize explain plans. To access this feature, users navigate to the EXPLAIN PLAN tab. Key information about the query execution plan is highlighted. Explain plans are easier to understand in the new query plan visualization. Here a tree structure presents information on how the query performed — the number of documents returned, query execution time, index usage, and so on. Each stage of the pipeline is shown as a node in the tree. This makes viewing explain plans in sharded clusters extremely simple. Each stage of the query plan can be investigated in further detail by clicking the DETAILS button, which provides more information about that particular stage. Comprehensive Security MongoDB Compass provides comprehensive enterprise-grade security. Compass supports Kerberos, MongoDB-CR, SCRAM-SHA-1, and LDAP-based authentication, as well as x.509 certificates. SSL/TLS encrypts data transferred “over the wire” between MongoDB and the clients. Users are also now able to connect to deployments via SSH tunnel — this is useful for connecting to production clusters from outside the data center firewall. Connecting Compass with MongoDB Atlas At MongoDB World 2016, we announced MongoDB Atlas, which is a database as a service. MongoDB Atlas provides all the features of the database without the heavy operational lifting. Developers no longer need to worry about provisioning, configuration, patching, upgrades, backups, and failure recovery of the database. In addition, Atlas offers elastic scalability, either by scaling up on a range of instance sizes or scaling out with automatic sharding, all with no application downtime. Connecting Compass to Atlas is simple. You first need to find your MongoDB Atlas connection string. From the connection string, take the hostnames and your credentials, enter the information into the Compass login screen, and click CONNECT to log in. Benefits of Compass MongoDB Compass addresses an important aspect of application development today as new software development trends and practices emerge. Microservices and continuous delivery continue to become more pervasive, resulting in shifts in operational culture and organizational thinking.
Developer and operations teams can no longer operate in silos but must be more tightly coupled to successfully deploy new features. MongoDB Compass addresses these evolving organizational requirements and provides many benefits ideal for modern app development: Faster Time To Market: Modern software practices, such as DevOps, use a wide variety of different components that require teams to be well versed in many technologies. Shared insight between teams is critical to ensure multiple technologies integrate seamlessly and function as expected. Compass provides valuable insight into your data so that teams can efficiently integrate MongoDB into their IT stack, resulting in quicker prototyping and faster time to market. Easier Project Handoffs: MongoDB’s flexible schema and rich document structure make it easy to build applications with any data model. Though this is empowering for developers, it may be unfamiliar for new teams picking up the project. This problem can be compounded when teams are distributed. With Compass, teams can quickly visualize and understand the organization of their data and share that with other teams to minimize uncertainty during project handoffs. Increased Productivity: GUIs provide a natural visualization tool that resonates with many users. With Compass, new users can quickly visualize the data model, so they spend more time deriving value from their data and less time deciphering it. Availability MongoDB Compass is included with both MongoDB Professional and MongoDB Enterprise Advanced subscriptions. MongoDB Compass is free to use for evaluation and in development environments. Next Steps: Read MongoDB Compass v1.3 documentation here Download MongoDB Compass today Try it now
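As referenced in the geo query section above, here is a hypothetical mongo shell sketch of the kind of query Compass builds visually. The restaurants collection, its stars and location fields, the coordinates, and the 2dsphere index are all assumptions for illustration.

```javascript
// Geospatial queries of this kind are typically backed by a 2dsphere index.
db.restaurants.createIndex({ location: "2dsphere" });

// Restaurants rated 4 stars or better within a 5 mile radius of (roughly)
// San Jose Airport. $centerSphere takes its radius in radians, so divide
// miles by the Earth's radius in miles (~3963.2).
db.restaurants.find({
  stars: { $gte: 4 },
  location: {
    $geoWithin: { $centerSphere: [[-121.93, 37.36], 5 / 3963.2] }
  }
});

// The same query can be inspected with an explain plan, similar to what
// Compass shows in its EXPLAIN PLAN tab.
db.restaurants.find({
  stars: { $gte: 4 },
  location: {
    $geoWithin: { $centerSphere: [[-121.93, 37.36], 5 / 3963.2] }
  }
}).explain("executionStats");
```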

July 19, 2016

Building Modern Applications with Microservices: Part 2

In the previous post, I discussed the background behind microservices and their advantages. In this post, I will talk about how MongoDB enables microservices, as well as considerations to keep in mind before implementing a microservices project. How MongoDB Enables Microservices There are some fundamental technology principles that are required to ensure companies can reap the advantages of microservices, specifically around a flexible data model, redundancy, automation, and scalability. Flexible Data Model: MongoDB’s dynamic schema is ideal for handling the requirements of microservices and continuous delivery. When rolling out a new feature that changes the data model, there’s no requirement to update all of the existing records, something that can take weeks for a relational database. Developers can quickly iterate and model data against an ever-changing environment, resulting in faster time to market and greater agility. Redundancy: Due to the distributed nature of microservices, there are more potential failure points, such as more network links, and thus microservices need to be designed with redundancy in mind. MongoDB is well suited for this requirement, as it provides built-in redundancy through MongoDB replica sets. Replica sets not only provide greater resilience to failure, but also provide disaster recovery with multi-data center deployments and the ability to isolate operational workloads from analytical reporting in a single database cluster. Monitoring and Automation: With a small number of services, it is not difficult to manage tasks manually. As the number of services grows, productivity can stall if there is not an automated process in place to handle the growing complexity. Choosing technology that handles monitoring and automation is key to ensuring DevOps teams can remain productive, especially as the environment becomes more complex. MongoDB Ops Manager (also available as the hosted Cloud Manager service) features visualization, custom dashboards, and automated alerting to help manage a complex environment. Ops Manager tracks 100+ key database and systems health metrics including operations counters, CPU utilization, replication status, and node status. The metrics are securely reported to Ops Manager where they are processed and visualized. Figure 1: Ops Manager provides real time & historic visibility into the MongoDB deployment. Integration with existing monitoring tools is also straightforward via the Ops Manager RESTful API, and with packaged integrations to leading Application Performance Management (APM) platforms such as New Relic. This integration allows MongoDB status to be consolidated and monitored alongside the rest of your application infrastructure, all from a single pane of glass. Scalability: Scaling to meet extra demand is a requirement of any IT environment, and microservices are no exception. MongoDB provides a scalable solution that automatically partitions and distributes the database across nodes, which can easily serve IT infrastructures that require dynamic and high-performance capabilities. Additionally, MongoDB is ideally suited to scale out on commodity hardware with auto-sharding, which, if needed, allows the service to be easily distributed across different geographic regions. This is an improvement over the monolithic, scale-up design of a traditional RDBMS, because scaling in MongoDB is automatic and transparent.
Manage Multiple Database Instances: In a microservices architecture, it is best practice to dedicate a separate database to each service. This leads to multiple database instances, which can be difficult to manage. At MongoDB World 2016, we announced MongoDB Atlas, which is hosted MongoDB as a Service. Developers don’t need to worry about provisioning, configuration, patching, upgrades, backups, and failure recovery of the database. MongoDB Atlas offers elastic scalability, either by scaling up on a range of instance sizes or scaling out with automatic sharding, all with no application downtime. Additionally, you can view, monitor, and manage all your MongoDB clusters from a single GUI, streamlining the management of your database clusters. To capture more business benefit, many organizations are also shifting microservices to the cloud. The dynamic nature of the cloud allows enterprises to spin instances up and down, while providing continuous availability in case of any failures. Considerations Before Moving to Microservices Though microservices offer many advantages, they are not appropriate for all deployments. There are several considerations to keep in mind before implementing a microservices project: Monitoring Challenges: One of the biggest challenges for microservices is effectively monitoring the overall system. Monitoring one or two services is relatively straightforward, but effectively monitoring many services can be very challenging. Not only are there more servers to monitor, but there are also more log files to analyze, as well as additional opportunities for network partitions. Traditional monitoring stats, such as CPU, memory, and network latency, are still important, but enterprises also need new ways to view metrics about the system and how it behaves over long periods of time. Automating the monitoring process can help mitigate some of these challenges and reduce operational overhead. High Developer Skillset: Microservices are implemented as distributed systems, which are necessarily more complex. Network latency, hardware failures, unreliable networks, asynchronicity, and fault tolerance need to be dealt with gracefully and appropriately. In order to handle the added complexity, developers need to have a strong operations and production background. Developers can no longer create the application and hand it off to the operations team; they need to understand the interdependencies between DevOps, testing, and release in order to properly design a service. Before implementing a microservices architecture, it is important to determine if your team has the right capabilities to handle the associated complexities. More Operations Overhead: A monolithic application may require one application server cluster with a few processes, while a microservices application may comprise 50 services and 200 processes after adding in resiliency. Operating and monitoring all of these new processes can be a daunting task. Additionally, services need to be tested and quickly propagated through the continuous delivery pipeline, which requires proper tooling and skills. Incorrect Service Boundaries: It is imperative to establish the proper service boundaries during the design phase.
A common problem is to create services from internal components without considering the proper service boundaries. As more functionality gets added, there is a risk that the team ends up building a giant distributed monolith. Getting the service boundaries wrong may result in higher costs, overcoupled services, and more testing complexity. MongoDB Microservice Deployments MongoDB is deployed by thousands of organizations around the world, including over half of all Fortune 100 companies. Many enterprises use MongoDB in a microservices architecture to achieve their business and deployment goals. Comparethemarket.com is one of the UK’s leading providers of price comparison services and uses MongoDB as the operational database behind its large microservice environment. Service uptime is critical, and MongoDB’s distributed design is key to ensuring that SLAs are always met. Comparethemarket.com’s deployment consists of microservices deployed in AWS. Each microservice, or logical grouping of related microservices, is provisioned with its own MongoDB replica set running in Docker containers, and deployed across multiple AWS Availability Zones to provide resiliency and high availability. MongoDB Ops Manager is used to provide the operational automation that is essential to launch new features quickly: deploying replica sets, providing continuous backups, and performing zero downtime upgrades. fuboTV is a streaming service in North America that streams sports content from soccer leagues all over the world and uses MongoDB as the core database for its microservices architecture. The traffic profile of the fuboTV streaming service is extremely bursty, with the site typically handling 100x normal traffic volumes ten minutes before a match. To keep pace with business growth and a demanding software release schedule, fuboTV migrated its MongoDB database to Docker containers managed by the Kubernetes orchestration system on the Google Cloud Platform. Figure 2: fuboTV Microservices Architecture This brings high levels of flexibility, efficiency, and uptime to fuboTV. Using containers managed by Kubernetes, fuboTV can provision all of its environments – development, test, QA and production – on a single cluster of physical hosts. The Kubernetes scheduler is used to precisely control resource allocation across all of its apps, enabling fuboTV to maximize utilization and reduce costs. The Kubernetes replication controller automatically reschedules containers if an instance fails — enabling fault resiliency and continuous availability. Data redundancy is provided by MongoDB replication within the replica set. This enables fuboTV to have zero downtime as it deploys and upgrades its applications. OTTO is a top German retailer for fashion and lifestyle goods with two million daily site visitors. The challenge was that OTTO had parallel teams spanning multiple business domains (business, project management, IT), each with its own business problems, but all needing to deliver results quickly. Independently, all the teams chose MongoDB as the best tool to quickly and easily achieve results. With loosely coupled teams, architecture, and operations, OTTO removed the bottleneck to deploy and test. Teams could quickly and iteratively correct errors and support continuous delivery. MongoDB was the driving force that enabled OTTO’s business, IT, and project management teams to deliver fast results, drive development agility, and allow teams to innovate risk-free.
Summary A microservices architecture provides many advantages over a monolithic architecture, but that does not mean microservices come without challenges of their own. Proper planning and application decoupling are required to ensure that a microservices architecture will achieve your desired results. MongoDB is well suited for a microservices architecture with its ability to provide a flexible schema, redundancy, automation, and scalability. Together, MongoDB and microservices can help organizations align teams effectively, achieve faster innovation, and meet the challenges of a demanding new age in application development and delivery. Learn more about MongoDB and microservices. Read the white paper. Microservices: The Evolution of Building Modern Applications About the Author - Jason Ma Jason is a Principal Product Marketing Manager based in Palo Alto, and has extensive experience in technology hardware and software. He previously worked for SanDisk in Corporate Strategy doing M&A and investments, and as a Product Manager on the Infiniflash All-Flash JBOF. Before SanDisk, he worked as a HW engineer at Intel and Boeing. Jason has a BSEE from UC San Diego, MSEE from the University of Southern California, and an MBA from UC Berkeley.

June 7, 2016

Building Modern Applications with Microservices: Part 1

This is the first in a two-post series about microservices. This post discusses the background behind microservices, new technologies that have enabled them, and the benefits of microservices. Introduction As enterprises work to replicate the development agility of internet companies and innovate in highly competitive markets, application development has grown increasingly complex. The large, monolithic codebases that traditionally power enterprise applications make it difficult to quickly launch new services. Siloed and potentially distributed development and operations teams present organizational alignment problems. On top of this, users are more demanding than ever – enterprises need to scale effectively and monitor deployments to ensure customers are provided with high performance and a consistent experience. Of course, all this needs to be done while providing always-on service availability. Due to these trends, there is demand for a software architecture pattern that can handle the requirements of the modern age. Monolithic architectures have been the traditional approach, but limitations with scaling, difficulties in maintaining a large codebase, high-risk upgrades, and large upfront setup costs have compelled enterprises to explore different approaches. In the last few years, microservices have come to the forefront of the conversation. They have been rapidly adopted, due to their ability to provide modularity, scalability, and high availability, as well as to facilitate organizational alignment. The Monolith Before microservices, a common approach to application design was to use a monolithic architecture. In this mode of development, the application is developed, tested, packaged, and deployed as a single unit. Codebases are compiled together, and the application is deployed as one entity. Scaling required copying instances of the application binaries and the required libraries to different servers, and the application code typically ran as a single process. Continuous delivery — an approach that involves fast, iterative software development and safe updates to the deployed application — was challenging since the full monolithic application stack needed to be recompiled, relinked, and tested for even the smallest incremental release. What are Microservices? Microservices is a software architecture in which applications are broken down into small, autonomous services. Services are typically focused on a specific, discrete objective or function and decoupled along business boundaries. Separating services by business boundaries allows teams to focus on the right goals and also ensures autonomy between services. Each service is developed, tested, and deployed independently, and services are usually separated as independent processes that communicate over a network via agreed APIs, although in some cases that network may be local to the machine. Microservices grew from Service Oriented Architecture (SOA), which gained popularity in the early 2000s and emerged as a way to combat large monolithic applications. Key differences between SOA and microservices are: SOAs are stateful, while microservices are stateless. SOAs tend to use an enterprise service bus for communication, while microservices use a simpler, more lightweight messaging system. SOAs may have hundreds or thousands of lines of code, while microservices could have fewer than one hundred lines. SOAs put a greater emphasis on reusability (i.e.
runtime code, databases), whereas microservices focus on decoupling as much as possible. A systematic change in an SOA requires modifying the monolith, whereas a systematic change in a microservice is to create a new service. SOAs use traditional relational databases more often, while microservices gravitate more towards modern, non-relational databases. Further sections will cover the advantages of non-relational databases over relational databases in a microservices architecture. Many architects found that SOAs suffered problems with communication protocols and lacked sufficient guidelines on effectively separating services, which laid the foundation for microservices to emerge as a best-practice way to implement a truly service-oriented architecture. New Technologies Enable Microservices In the past, the downsides of deploying and provisioning hundreds or potentially thousands of services outweighed the benefits gained with a microservices architecture (faster development, scalability). The emergence of technologies such as containers (Docker, LXC) and orchestration frameworks (Kubernetes, Mesos) mitigates many of the problems that prevented the use of microservices architectures in the past. Containers are lightweight run-time environments that provide isolation and scalability with minimal impact to performance and capacity. Packaging is simplified as the same environment can simultaneously host development, support, test, and production versions of the application, so that going from dev to test to QA to production is easier. Containers work very well in a microservices environment as they isolate services to an individual container. Updating a service becomes a simple process to automate and manage, and changing one service will not impact other services, provided that APIs are maintained. Figure 1: Container in microservices When organizations start running containers at scale, many look to orchestration frameworks to help manage the increased complexity. Orchestration frameworks help deploy and manage containers: provision hosts, instantiate containers, handle failures, and provide automated scaling. Kubernetes and Mesos are popular orchestration frameworks that make it easier to deploy containers at massive scale in a microservice environment. To learn more about building microservices architectures with containers and MongoDB, download our guide: Enabling Microservices: Containers and Orchestration Explained Benefits of Microservices Many organizations can better meet the needs of modern application development by implementing microservices. The benefits include: Faster Time To Market: In a monolithic application, any small change in the application will require redeploying the entire application stack, which carries higher risk and complexity. This results in longer release cycles, as changes may be batched together and not released until reaching a minimum threshold. With microservices, a small change to a service can be committed, tested, and deployed immediately since changes are isolated from the rest of the system. Continuous integration — a software practice of integrating and testing developer changes to the main code branch multiple times a day — is much simpler and faster as there are fewer functions to test. This results in a more iterative release cadence as less code needs to be compiled and retested. Orchestration tools such as Kubernetes facilitate faster time to market by automating the online, rolling upgrade of containers, and providing the ability to roll back any changes should they be necessary.
Flexibility and Scalability: Monolithic applications require all components of the system to scale together. If one service requires extra performance, the only option is to scale all the services rather than the individual service that needs additional capacity. With microservices, only the services that require extra performance need to be scaled. Scaling is achieved by deploying more containers, enabling more effective capacity planning, lower software licensing costs, and lower TCO as the service and hardware can be matched more appropriately. Figure 2: Scaling containers Resiliency: A major issue with monolithic applications is that if a service fails, the whole application may be compromised. In microservices, service boundaries serve as natural isolation barriers to prevent cascading failures from bringing down the whole system. If using containers, orchestration frameworks can provide added resiliency: when one container fails, a new one is started, restoring full redundancy and capacity. Alignment With Organization: Microservices enable better alignment of the architecture to the organization, as team sizes can be optimally defined to match the required tasks. Teams can be broken down into smaller groups and focus on a single component of the application. This is especially useful for distributed teams. For example, if a team in Singapore handles three services and a team in San Francisco handles five, each team can release and deploy features and functionality independently. This helps break down silos between teams and fosters better collaboration as cross-discipline teams (Ops, Dev, QA) collectively own the services. This also ensures that the communication between teams matches the communication through the services' APIs. Essentially, the APIs between services define a contract between development teams on what each service should provide to others. Reduction in Cost: By using containers, applications and environments (design, test, production, support) can better share the same infrastructure, resulting in increased hardware utilization and reduced costs due to administrative simplification. In addition, microservices help reduce technical debt. With a monolithic application, there are costs (time, resources) associated with refactoring code for a large application. By breaking the application into API-accessible microservices, code refactoring can be done service by service, resulting in less time maintaining and updating code. In the second part of this blog post series, we will discuss how MongoDB enables microservices. Learn more about MongoDB and microservices. Read the white paper. Microservices: The Evolution of Building Modern Applications About the Author - Jason Ma Jason is a Principal Product Marketing Manager based in Palo Alto, and has extensive experience in technology hardware and software. He previously worked for SanDisk in Corporate Strategy doing M&A and investments, and as a Product Manager on the Infiniflash All-Flash JBOF. Before SanDisk, he worked as a HW engineer at Intel and Boeing. Jason has a BSEE from UC San Diego, MSEE from the University of Southern California, and an MBA from UC Berkeley.

May 6, 2016

At-Rest Encryption in MongoDB 3.2: Features and Performance

Introduction MongoDB 3.2 introduces a new option for at-rest data encryption. In this post we take a closer look at the forces driving the need for increased encryption, MongoDB features for encrypting your data, as well as the performance characteristics of the new Encrypted Storage Engine. Data security is top of mind for many executives due to increased attacks as well as a series of data breaches in recent years that have negatively impacted several high profile brands. For example, in 2015, a major health insurer was a victim of a massive data breach in which criminals gained access to the Social Security numbers of more than 80 million people ­— resulting in an estimated cost of $100M. In the end, one of the critical vulnerabilities was the health insurer did not encrypt sensitive patient data stored at-rest. Data encryption is a key part of a comprehensive strategy to protect sensitive data. However, encrypting and decrypting data is potentially very resource intensive. It is important to understand the performance characteristics of your encryption technology to accurately conduct capacity planning. MongoDB 3.2: Delivering Native Encryption At-Rest MongoDB 3.2 provides a comprehensive encryption solution that protects your data, both in-flight and at-rest. For encryption-in-flight, MongoDB uses SSL/TLS, which ensures secure communication between your database and client, as well as inter-cluster traffic between nodes. Learn more about MongoDB and SSL/TLS . With the latest version 3.2 , MongoDB also includes a fully integrated encryption-at-rest solution that reduces cost and performance overhead. Encryption-at-rest is part of MongoDB Enterprise Advanced only, but is freely available for development and evaluation. We will take a closer look at this new option later in the post. Before 3.2, the primary methods to provide encryption-at-rest were to use 3rd party applications that encrypt files at the application, file system, or disk level. These methods work well with MongoDB but can add extra cost, complexity, and overhead. Additionally, disk and file system encryption might not protect against all situations. While disk level encryption protects from someone taking the physical drive from the machine, it does not protect from someone that has physical access to the machine and can override the file system. Similarly, file system encryption will prevent someone from overriding the file system, but does not preclude someone from gaining unauthorized access through the application or database layer. Database encryption mitigates these problems by adding an extra layer of security. Even if an administrator has access to the file system, he/she will first need to be authenticated to the database before decrypting the data files. MongoDB’s Encrypted Storage Engine supports a variety of encryption algorithms from the OpenSSL library. AES-256 in CBC mode is the default, while other options include GCM mode, as well as FIPS mode for FIPS-140-2 compliance. Encryption is performed at the page level to provide optimal performance. Instead of having to encrypt/decrypt the entire file or database for each change, only the modified pages need to be encrypted or decrypted. Additionally, the Encrypted Storage Engine provides safe and secure management of the encryption keys. Each encrypted node contains an internal database key that is used to encrypt/decrypt the data files. The database key is wrapped with an external master key, which must be given to the node for it to initialize. 
MongoDB uses operating system protection mechanisms, such as VirtualLock and mlock, to lock the process's virtual memory space into memory, ensuring that keys are never written or paged to disk in unencrypted form.

Evaluating Performance

Encrypting and decrypting data requires additional resources, and administrators will want to understand the performance impact so they can adjust capacity planning accordingly. In our Encrypted Storage Engine benchmarking tests, we saw an average throughput overhead between 10% and 20%. Let's take a closer look at benchmark data for Insert-Only, Read-Only, and 50%-Read/50%-Insert workloads.

For our benchmark, we used Intel Xeon X5675 CPUs, which support the AES-NI instruction set, and ran the CPUs at high load (100%). We compared encrypted and unencrypted runs under two scenarios: "Working Set Fits In Memory" and "Working Set Exceeds Memory". The "working set" refers to the amount of data and indexes actively used by your system.

Let's first look at an Insert-Only workload. Under high CPU load, we see an encryption overhead of roughly 16%.

Next, the Read-Only workload, benchmarked across both scenarios. The decryption overhead for a Read-Only workload ranges between 5% and 20%.

Lastly, the 50%-Read/50%-Insert workload, where the encryption overhead ranges between 12% and 20%.

In addition to throughput, latency is also a critical component of encryption overhead. In our benchmark, average latency overheads ranged from 6% to 30%. Although average latency overhead was slightly higher than throughput overhead, latencies were still very low, all under 1 ms. In the table below, overhead is measured relative to the unencrypted baseline; the negative sign indicates that encryption increases latency.

Average Latency (us): Unencrypted / Encrypted / % Overhead
Insert Only: 32.4 / 40.9 / -26.5%
Read Only, Working Set Fits In Memory: 230.5 / 245.0 / -6.3%
Read Only, Working Set Exceeds Memory: 447.0 / 565.8 / -26.6%
50% Insert/50% Read, Working Set Fits In Memory: 276.1 / 317.4 / -15.0%
50% Insert/50% Read, Working Set Exceeds Memory: 722.3 / 936.5 / -29.7%

MongoDB Atlas Encryption At Rest

MongoDB Atlas is a database as a service that provides all the features of the database without the heavy lifting of operational tasks. Developers no longer need to worry about provisioning, configuration, patching, upgrades, backups, and failure recovery. Atlas offers elastic scalability, either by scaling up on a range of instance sizes or scaling out with automatic sharding, all with no application downtime. MongoDB Atlas encrypts data in-flight over the network, and data at rest on disk can be optionally protected using encrypted data volumes. Encrypted data volumes secure your data without the need for you to build, maintain, and secure your own key management infrastructure.

Summary

In this post, we looked at a few workloads to determine the impact of encryption with MongoDB's new Encrypted Storage Engine. The results demonstrate that the Encrypted Storage Engine provides a secure way to encrypt your data at rest while maintaining exceptional performance. With the Encrypted Storage Engine and diligent capacity planning, you shouldn't have to make a tradeoff between high performance and strong security when encrypting data at rest. For users interested in a database as a service, MongoDB Atlas provides encrypted data volumes to ensure your data at rest is secure.
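For readers who want to approximate these results, the Environment section below lists the test hardware and notes that the workload driver was YCSB. Purely as an illustrative sketch (the actual record counts, thread counts, and workload parameters for these runs were not published, so every value here is an assumption), a comparable 50%-Read/50%-Insert run might be launched as follows, once against an unencrypted node and once against an encrypted one:

    # Load phase: ~1 KB documents (10 fields x 100 bytes each), illustrative record count
    bin/ycsb load mongodb -s -P workloads/workloada \
        -p mongodb.url="mongodb://10.0.0.5:27017/ycsb" \
        -p recordcount=10000000 \
        -p fieldcount=10 -p fieldlength=100

    # Run phase: 50% reads / 50% inserts, illustrative operation count
    bin/ycsb run mongodb -s -P workloads/workloada \
        -p mongodb.url="mongodb://10.0.0.5:27017/ycsb" \
        -p operationcount=10000000 \
        -p readproportion=0.5 -p insertproportion=0.5 \
        -p updateproportion=0 -p scanproportion=0

Comparing the throughput and latency that YCSB reports for the two nodes yields overhead figures of the kind shown in the table above.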
Environment

These tests were conducted on bare metal servers, each with the following specification:

CPU: 3.06 GHz Intel Xeon Westmere (X5675, hex-core)
RAM: 6x 16 GB Kingston DDR3 2Rx4
OS: Ubuntu 14.04, 64-bit
Network Card: SuperMicro AOC-STGN-i2S
Motherboard: SuperMicro X8DTN+_R2
Document Size: 1 KB
Workload: YCSB
Version: MongoDB 3.2

Learn More

Learn more about encryption and all of the security features available for MongoDB by reading our MongoDB Security Architecture Guide.

Additional Resources

Try MongoDB's new Encrypted Storage Engine, free for unlimited development and evaluation, and read our documentation on installing MongoDB Enterprise 3.2.

About the Author - Jason Ma

Jason is a Principal Product Marketing Manager based in Palo Alto, with extensive experience in technology hardware and software. He previously worked for SanDisk in Corporate Strategy, doing M&A and investments, and as a Product Manager on the InfiniFlash all-flash JBOF. Before SanDisk, he worked as a hardware engineer at Intel and Boeing. Jason has a BSEE from UC San Diego, an MSEE from the University of Southern California, and an MBA from UC Berkeley.

March 4, 2016

Announcing the MongoDB World Startup Showcase

Startups are not for the faint of heart. A friend of mine once told me about the challenges he faced in starting his company. To come up with the capital to manufacture his first batch of units, he mortgaged everything he had, slept under his office desk for four months, and tirelessly worked 16-hour days. Luckily, he got his first batch of units to customers on time, and the market reception was overwhelmingly positive. Fast forward nine years, and my friend's company is now publicly traded, with over 200 employees; it recently shipped its 15 millionth device.

Hard work and success stories aside, the first step for entrepreneurs to lift their startup off the ground is getting visibility, generating buzz, and showcasing their product in front of the right audience. At the MongoDB World Startup Showcase, we're excited to give entrepreneurs a chance to demo their product in front of venture capitalists, the media, and potential employees. Participants will be invited to give their elevator pitch and will be able to showcase their products.

Benefits

Elevator pitch in the Giant Ideas Theatre - attract investors, employees, and media
Two complimentary tickets to MongoDB World 2016*
Inclusion in MongoDB World marketing and press campaigns
Promotion in MongoDB social media campaigns
One year complimentary subscription to Cloud Manager, the easiest way to run MongoDB**

*Hotel and airfare not included
**Does not include backup

Eligibility

Participating companies must be building a product or service, employ fewer than 100 people, and actively use MongoDB. The details are here.

Submission

Submit an online application. In your responses, make sure to address the following:

Company Description: Give us some information about your company, including company name, product, website URL, videos, etc.
Target Market: Describe the target market and audience for your product or service. Preferred markets are Data Analytics, Mobile, Machine Learning, and IoT.
MongoDB Usage: Describe how your company uses MongoDB.
Team Background: Include some background information about your team.

Applications must be submitted by 5:00 PM (PST) on March 25, 2016.

Submit your application

We are looking forward to hearing from you!

March 3, 2016