Raphael Londner

13 results

An Introduction to Change Streams

There is tremendous pressure for applications to immediately react to changes as they occur. As a new feature in MongoDB 3.6 , change streams enable applications to stream real-time data changes by leveraging MongoDB’s underlying replication capabilities. Think powering trading applications that need to be updated in real-time as stock prices change. Or creating an IoT data pipeline that generates alarms whenever a connected vehicle moves outside of a geo-fenced area. Or updating dashboards, analytics systems, and search engines as operational data changes. The list, and the possibilities, go on, as change streams give MongoDB users easy access to real-time data changes without the complexity or risk of tailing the oplog (operation log). Any application can readily subscribe to changes and immediately react by making decisions that help the business to respond to events in real-time. Change streams can notify your application of all writes to documents (including deletes) and provide access to all available information as changes occur, without polling that can introduce delays, incur higher overhead (due to the database being regularly checked even if nothing has changed), and lead to missed opportunities. Characteristics of change streams Targeted changes Changes can be filtered to provide relevant and targeted changes to listening applications. As an example, filters can be on operation type or fields within the document. Resumablility Resumability was top of mind when building change streams to ensure that applications can see every change in a collection. Each change stream response includes a resume token. In cases where the connection between the application and the database is temporarily lost, the application can send the last resume token it received and change streams will pick up right where the application left off. In cases of transient network errors or elections, the driver will automatically make an attempt to reestablish a connection using its cached copy of the most recent resume token. However, to resume after application failure, the applications needs to persist the resume token, as drivers do not maintain state over application restarts. Total ordering MongoDB 3.6 has a global logical clock that enables the server to order all changes across a sharded cluster. Applications will always receive changes in the order they were applied to the database. Durability Change streams only include majority-committed changes. This means that every change seen by listening applications is durable in failure scenarios such as a new primary being elected. Security Change streams are secure – users are only able to create change streams on collections to which they have been granted read access. Ease of use Change streams are familiar – the API syntax takes advantage of the established MongoDB drivers and query language, and are independent of the underlying oplog format. Idempotence All changes are transformed into a format that’s safe to apply multiple times. Listening applications can use a resume token from any prior change stream event, not just the most recent one, because reapplying operations is safe and will reach the same consistent state. An example Let’s imagine that we run a small grocery store. We want to build an application that notifies us every time we run out of stock for an item. We want to listen for changes on our stock collection and reorder once the quantity of an item gets too low. { _id: 123UAWERXHZK4GYH product: pineapple quantity: 3 } Setting up the cluster As a distributed database, replication is a core feature of MongoDB, mirroring changes from the primary replica set member to secondary members, enabling applications to maintain availability in the event of failures or scheduled maintenance. Replication relies on the oplog (operation log). The oplog is a capped collection that records all of the most recent writes, it is used by secondary members to apply changes to their own local copy of the database. In MongoDB 3.6, change streams enable listening applications to easily leverage the same internal, efficient replication infrastructure for real-time processing. To use change streams, we must first create a replica set. Download MongoDB 3.6 and after installing it, run the following commands to set up a simple, single-node replica set (for testing purposes). mkdir -pv data/db mongod --dbpath ./data/db --replSet "rs" Then in a separate shell tab, run: mongo After the rs:PRIMARY> prompt appears, run: rs.initiate() If you have any issues, check out our documentation on creating a replica set . Seeing it in action Now that our replica set is ready, let’s create a few products in a demo database using the following Mongo shell script: conn = new Mongo("mongodb://localhost:27017/demo?replicaSet=rs"); db = conn.getDB("demo"); collection = db.stock; var docToInsert = { name: "pineapple", quantity: 10 }; function sleepFor(sleepDuration) { var now = new Date().getTime(); while (new Date().getTime() < now + sleepDuration) { /* do nothing */ } } function create() { sleepFor(1000); print("inserting doc..."); docToInsert.quantity = 10 + Math.floor(Math.random() * 10); res = collection.insert(docToInsert); print(res) } while (true) { create(); } Copy the code above into a createProducts.js text file and run it in a Terminal window with the following command: mongo createProducts.js . Creating a change stream application Now that we have documents being constantly added to our MongoDB database, we can create a change stream that monitors and handles changes occurring in our stock collection: conn = new Mongo("mongodb://localhost:27017/demo?replicaSet=rs"); db = conn.getDB("demo"); collection = db.stock; const changeStreamCursor = collection.watch(); pollStream(changeStreamCursor); //this function polls a change stream and prints out each change as it comes in function pollStream(cursor) { while (!cursor.isExhausted()) { if (cursor.hasNext()) { change = cursor.next(); print(JSON.stringify(change)); } } pollStream(cursor); } By using the parameterless watch() method, this change stream will signal every write to the stock collection. In the simple example above, we’re logging the change stream's data to the console. In a real-life scenario, your listening application would do something more useful (such as replicating the data into a downstream system, sending an email notification, reordering stock...). Try inserting a document through the mongo shell and see the changes logged in the Mongo Shell. Creating a targeted change stream Remember that our original goal wasn’t to get notified of every single update in the stock collection, just when the inventory of each item in the stock collection falls below a certain threshold. To achieve this, we can create a more targeted change stream for updates that set the quantity of an item to a value no higher than 10. By default, update notifications in change streams only include the modified and deleted fields (i.e. the document “deltas”), but we can use the optional parameter fullDocument: "updateLookup" to include the complete document within the change stream, not just the deltas. const changeStream = collection.watch( [{ $match: { $and: [ { "updateDescription.updatedFields.quantity": { $lte: 10 } }, { operationType: "update" } ] } }], { fullDocument: "updateLookup" } ); Note that the fullDocument property above reflects the state of the document at the time lookup was performed, not the state of the document at the exact time the update was applied. Meaning, other changes may also be reflected in the fullDocument field. Since this use case only deals with updates, it was preferable to build match filters using updateDescription.updatedFields , instead of fullDocument . The full Mongo shell script of our filtered change stream is available below: conn = new Mongo("mongodb://localhost:27017/demo?replicaSet=rs"); db = conn.getDB("demo"); collection = db.stock; let updateOps = { $match: { $and: [ { "updateDescription.updatedFields.quantity": { $lte: 10 } }, { operationType: "update" } ] } }; const changeStreamCursor = collection.watch([updateOps]); pollStream(changeStreamCursor); //this function polls a change stream and prints out each change as it comes in function pollStream(cursor) { while (!cursor.isExhausted()) { if (cursor.hasNext()) { change = cursor.next(); print(JSON.stringify(change)); } } pollStream(cursor); } In order to test our change stream above, let’s run the following script to set the quantity of all our current products to values less than 10: conn = new Mongo("mongodb://localhost:27017/demo?replicaSet=rs"); db = conn.getDB("demo"); collection = db.stock; let updatedQuantity = 1; function sleepFor(sleepDuration) { var now = new Date().getTime(); while (new Date().getTime() < now + sleepDuration) { /* do nothing */ } } function update() { sleepFor(1000); res = collection.update({quantity:{$gt:10}}, {$inc: {quantity: -Math.floor(Math.random() * 10)}}, {multi: true}); print(res) updatedQuantity = res.nMatched + res.nModified; } while (updatedQuantity > 0) { update(); } You should now see the change stream window display the update shortly after the script above updates our products in the stock collection. Resuming a change stream In most cases, drivers have retry logic to handle loss of connections to the MongoDB cluster (such as , timeouts, or transient network errors, or elections). In cases where our application fails and wants to resume, we can use the optional parameter resumeAfter : < resumeToken >, as shown below: conn = new Mongo("mongodb://localhost:27017/demo?replicaSet=rs"); db = conn.getDB("demo"); collection = db.stock; const changeStreamCursor = collection.watch(); resumeStream(changeStreamCursor, true); function resumeStream(changeStreamCursor, forceResume = false) { let resumeToken; while (!changeStreamCursor.isExhausted()) { if (changeStreamCursor.hasNext()) { change = changeStreamCursor.next(); print(JSON.stringify(change)); resumeToken = change._id; if (forceResume === true) { print("\r\nSimulating app failure for 10 seconds..."); sleepFor(10000); changeStreamCursor.close(); const newChangeStreamCursor = collection.watch([], { resumeAfter: resumeToken }); print("\r\nResuming change stream with token " + JSON.stringify(resumeToken) + "\r\n"); resumeStream(newChangeStreamCursor); } } } resumeStream(changeStreamCursor, forceResume); } With this resumability feature, MongoDB change streams provide at-least-once semantics. It is therefore up to the listening application to make sure that it has not already processed the change stream events. This is especially important in cases where the application’s actions are not idempotent (for instance, if each event triggers a wire transfer). All the of shell scripts examples above are available in the following GitHub repository . You can also find similar Node.js code samples here , where a more realistic technique is used to persist the last change stream token before it is processed. Next steps I hope that this introduction gets you excited about the power of change streams in MongoDB 3.6. If you want to know more: Watch Aly’s session about Change Streams Read the Change Streams documentation Try out Change Streams examples in Python, Java, C, C# and Node.js Read the What’s new in MongoDB 3.6 white paper Take MongoDB University’s M036: New Features and Tools and Tools in MongoDB 3.6 course If you have any question, feel free to file a ticket at https://jira.mongodb.org or connect with us through one of the social channels we use to interact with the developer community. About the authors – Aly Cabral and Raphael Londner Aly Cabral is a Product Manager at MongoDB. With a focus on Distributed Systems (i.e. Replication and Sharding), when she hears the word election she doesn’t think about politics. You can follow her or ask any questions on Twitter at @aly_cabral Raphael Londner is a Principal Developer Advocate at MongoDB. Previously he was a developer advocate at Okta as well as a startup entrepreneur in the identity management space. You can follow him on Twitter at @rlondner Get Started with MongoDB Atlas Run MongoDB in the cloud for free with MongoDB Atlas. No credit card required. Try Free

February 6, 2018

Integrating MongoDB Atlas with Heroku Private Spaces

Introduction Heroku and MongoDB Atlas are the perfect fit for modern, cloud-based app development and deployment. Since its inception in 2007, Heroku has been a PaaS (Platform-as-a-Service) favorite of developer and operations teams thanks to its tight integration to CI tools and ease of app deployment. MongoDB is also a long-time favorite of developers who value increasing their productivity and decreasing application development cycles. MongoDB’s fully managed DBaaS (Database-as-a-Service), Atlas , is also popular among cloud DevOps teams, who are naturally demanding a strong integration between Heroku and MongoDB Atlas. Today, we are happy to present a tutorial showcasing how to securely integrate Heroku with MongoDB Atlas. Protecting your cloud data assets with MongoDB Atlas MongoDB Atlas provides industry-grade, out-of-the-box security controls: encrypted data in-flight and at-rest, encrypted backups, authentication enabled by default, IP whitelisting and VPC Peering (with customer-owned AWS accounts) are strong safeguards MongoDB provides its users to ensure their data is safe in the cloud. Companies hosting their MongoDB Atlas-backed applications on Heroku typically require that their data be only accessed by their applications. This has proved to be challenging in most Heroku deployments, which typically don’t offer guarantees that requests performed by their hosted applications originate from fixed IPs or a fixed range of IPs (defined as CIDR blocks). With Heroku Private Spaces however, companies can combine Heroku powerful developer experience with enterprise-grade secure network topologies. More specifically, peering a Heroku Private Space with a MongoDB Atlas cluster running in AWS is a straightforward option to secure the communication between a Heroku-deployed application and a MongoDB Atlas database, by using MongoDB Atlas VPC Peering capabilities . The tutorial below goes through the specific steps required to link a Heroku Private Space with a MongoDB Atlas project. Initiating the VPC Peering request The first step is to initiate the VPC Peering request on the Atlas side. To do so, it’s necessary to retrieve a few parameters from the Heroku Private Space, by using the Heroku CLI . After logging in with an account having access to a Private Space, use the spaces:peering:info command to retrieve the AWS information required by MongoDB Atlas: heroku spaces:peering:info <your_private_space_name> In the screenshot above, I chose to use a Private Space hosted in the us-west-2 AWS region (aptly prefixed "oregon-*"), since my M10 MongoDB Atlas cluster is also deployed in that region. Copy the AWS Account ID , AWS Region , AWS VPC ID and AWS VPC CIDR values from the Heroku console above. Now, head over the MongoDB Atlas website and navigate to the Security tab of your cluster (M10 or above and in the same region as your Heroku Private Space). Select the +New Peering Connection button and fill out the form with the values you previously copied: Press the Initiate Peering button, and verify that the VPC Peering request appears in Atlas’ VPC Peering list (with a "Waiting for Approval" status): Approving the VPC Peering request Now that the VPC Peering request has been initiated on the MongoDB Atlas side, let’s approve it on the Heroku side. In the Heroku console, the following command should display the request we just created in MongoDB Atlas: heroku spaces:peerings <your_private_space_name> Take note of the PCX ID value of your VPC Peering ID and pass it to Heroku space:peerings:accept command: heroku spaces:peerings:accept <your_PCX_ID> --space <your_private_space_name> Verifying that VPC Peering works The first step to verify that VPC Peering has been properly set up between your Heroku Private Space and MongoDB Atlas is by running the following Heroku command again: heroku spaces:peerings <your_private_space_name> The peering connection should now appear as active. In MongoDB Atlas, the peering connection should now also appear as available: The next verification step would be to run an Heroku-deployed app connected to your MongoDB Atlas cluster and verify that you can read from or write to it. For instance, you could clone this GitHub repository , customize its config.js file with your MongoDB Atlas connection string, and deploy its atlas-test branch to your Heroku Private Space using Heroku GitHub Deploys . Since Heroku automatically runs npm start for each Node-detected app, it will keep calling the produce.js script. As a result, documents should be created in the devices collection of a demo database in your Atlas cluster (if it doesn’t, I recommend that you first verify that the CIDR block of your Heroku Private Space is present in the IP Whitelist of your MongoDB Atlas cluster). Next steps I hope that you found this Heroku-MongoDB Atlas integration tutorial useful. As next steps, I recommend the following: Sign up for MongoDB Atlas if you don’t already use it. Watch a VPC Peering video tutorial with MongoDB Atlas. Get more familiar with MongoDB Atlas documentation Contact Heroku if you don’t already have access to a Private Space. Explore the Heroku Private Spaces documentation . About the Author - Raphael Londner Raphael Londner is a Principal Developer Advocate at MongoDB, focused on cloud technologies such as Amazon Web Services, Microsoft Azure and Google Cloud Engine. Previously he was a developer advocate at Okta as well as a startup entrepreneur in the identity management space. You can follow him on Twitter at @rlondner

January 16, 2018

JSON Schema Validation and Expressive Query Syntax in MongoDB 3.6

One of MongoDB’s key strengths has always been developer empowerment: by relying on a flexible schema architecture, MongoDB makes it easier and faster for applications to move through the development stages from proof-of-concept to production and iterate over update cycles as requirements evolve. However, as applications mature and scale, they tend to reach a stable stage where frequent schema changes are no longer critical or must be rolled out in a more controlled fashion, to prevent undesirable data from being inserted into the database. These controls are especially important when multiple applications write into the same database, or when analytics processes rely on predefined data structures to be accurate and useful. MongoDB 3.2 was the first release to introduce Document Validation , one of the features that developers and DBAs who are accustomed to relational databases kept demanding. As MongoDB’s CTO, Eliot Horowitz, highlighted in Document Validation and What Dynamic Schemas means : Along with the rest of the 3.2 "schema when you need it" features, document validation gives MongoDB a new, powerful way to keep data clean . These are definitely not the final set of tools we will provide, but is rather an important step in how MongoDB handles schema . Announcing JSON Schema Validation support Building upon MongoDB 3.2’s Document Validation functionality, MongoDB 3.6 introduces a more powerful way of enforcing schemas in the database, with its support of JSON Schema Validation, a specification which is part of IETF’s emerging JSON Schema standard. JSON Schema Validation extends Document Validation in many different ways, including the ability to enforce schemas inside arrays and prevent unapproved attributes from being added. These are the new features we will focus on in this blog post, as well as the ability to build business validation rules. Starting with MongoDB 3.6, JSON Schema is the recommended way of enforcing Schema Validation . The next section highlights the features and benefits of using JSON Schema Validation. Switching from Document Validation to JSON Schema Validation We will start by creating an orders collection (based on an example we published in the Document Validation tutorial blog post ): db.createCollection("orders", { validator: { item: { $type: "string" }, price: { $type: "decimal" } } }); With this document validation configuration, we not only make sure that both the item and price attributes are present in any order document, but also that item is a string and price a decimal (which is the recommended type for all currency and percentage values). Therefore, the following element cannot be inserted (because of the "rogue" *price *attribute): db.orders.insert({ "_id": 6666, "item": "jkl", "price": "rogue", "quantity": 1 }); However, the following document could be inserted (notice the misspelled "pryce" attribute): db.orders.insert({ "_id": 6667, "item": "jkl", "price": NumberDecimal("15.5"), "pryce": "rogue" }); Prior to MongoDB 3.6, you could not prevent the addition of misspelled or unauthorized attributes. Let’s see how JSON Schema Validation can prevent this behavior. To do so, we will use a new operator, $jsonSchema : db.runCommand({ collMod: "orders", validator: { $jsonSchema: { bsonType: "object", required: ["item", "price"], properties: { item: { bsonType: "string" }, price: { bsonType: "decimal" } } } } }); The JSON Schema above is the exact equivalent of the document validation rule we previously set above on the orders collection. Let’s check that our schema has indeed been updated to use the new $jsonSchema operator by using the db.getCollectionInfos() method in the Mongo shell: db.getCollectionInfos({name:"orders"}) This command prints out a wealth of information about the orders collection. For the sake of readability, below is the section that includes the JSON Schema: ... "options" : { "validator" : { "$jsonSchema" : { "bsonType" : "object", "required" : [ "item", "price" ], "properties" : { "item" : { "bsonType" : "string" }, "price" : { "bsonType" : "decimal" } } } }, "validationLevel" : "strict", "validationAction" : "error" } ... Now, let’s enrich our JSON schema a bit to make better use of its powerful features: db.runCommand({ collMod: "orders", validator: { $jsonSchema: { bsonType: "object", additionalProperties: false , required: ["item", "price"], properties: { _id: {} , item: { bsonType: "string", description: "'item' must be a string and is required" }, price: { bsonType: "decimal", description: "'price' must be a decimal and is required" }, quantity: { bsonType: ["int", "long"] , minimum: 1, maximum: 100, exclusiveMaximum: true, description: "'quantity' must be short or long integer between 1 and 99" } } } } }); Let’s go through the additions we made to our schema: First, note the use of the additionalProperties:false attribute: it prevents us from adding any attribute other than those mentioned in the properties section. For example, it will no longer be possible to insert data containing a misspelled pryce attribute. As a result, the use of additionalProperties:false at the root level of the document also makes the declaration of the _id property mandatory: whether our insert code explicitly sets it or not, it is a field MongoDB requires and would automatically create, if not present. Thus, we must include it explicitly in the properties section of our schema. Second, we have chosen to declare the quantity attribute as either a short or long integer between 1 and 99 (using the minimum , maximum and exclusiveMaximum attributes). Of course, because our schema only allows integers lower than 100, we could simply have set the bsonType property to int . But adding long as a valid type makes application code more flexible, especially if there might be plans to lift the maximum restriction. Finally, note that the description attribute (present in the item , price , and quantity attribute declarations) is entirely optional and has no effect on the schema aside from documenting the schema for the reader. With the schema above, the following documents can be inserted into our orders collection: db.orders.insert({ "item": "jkl", "price": NumberDecimal(15.50), "quantity": NumberInt(99) }); db.orders.insert({ "item": "jklm", "price": NumberDecimal(15.50), "quantity": NumberLong(99) }); However, the following documents are no longer considered valid: db.orders.insert({ "item": "jkl", "price": NumberDecimal(15.50), "quantity": NumberInt(100) }); db.orders.insert({ "item": "jkl", "price": NumberDecimal(15.50), "quantity": "98" }); db.orders.insert({ "item": "jkl", "pryce": NumberDecimal(15.50), "quantity": NumberInt(99) }); You probably noticed that our orders above are seemingly odd: they only contain one single item. More realistically, an order consists of multiple items and a possible JSON structure might be as follows: { _id: 10000, total: NumberDecimal(141), VAT: 0.20, totalWithVAT: NumberDecimal(169), lineitems: [ { sku: "MDBTS001", name: "MongoDB Stitch T-shirt", quantity: NumberInt(10), unit_price:NumberDecimal(9) }, { sku: "MDBTS002", quantity: NumberInt(5), unit_price: NumberDecimal(10) } ] } With MongoDB 3.6, we can now control the structure of the lineitems array, for instance with the following JSON Schema: db.runCommand({ collMod: "orders", validator: { $jsonSchema: { bsonType: "object", required: ["lineitems"], properties: { lineitems: { bsonType: ["array"], minItems: 1, maxItems:10, items: { required: ["unit_price", "sku", "quantity"], bsonType: "object", additionalProperties: false, properties: { sku: { bsonType: "string", description: "'sku' must be a string and is required" }, name: { bsonType: "string", description: "'name' must be a string" }, unit_price: { bsonType: "decimal", description: "'unit_price' must be a decimal and is required" }, quantity: { bsonType: ["int", "long"], minimum: 0, maximum: 100, exclusiveMaximum: true, description: "'quantity' must be a short or long integer in [0, 100)" }, } } } } } } }); With the schema above, we enforce that any order inserted or updated in the orders collection contain a lineitems array of 1 to 10 documents that all have sku , unit_price and quantity attributes (with quantity required to be an integer). The schema would prevent inserting the following, badly formed document: db.orders.insert({ total: NumberDecimal(141), VAT: NumberDecimal(0.20), totalWithVAT: NumberDecimal(169), lineitems: [ { sku: "MDBTS001", name: "MongoDB Stitch T-shirt", quantity: NumberInt(10), price: NumberDecimal(9) //this should be 'unit_price' }, { name: "MDBTS002", //missing a 'sku' property quantity: NumberInt(5), unit_price: NumberDecimal(10) } ] }) But it would allow inserting the following, schema-compliant document: db.orders.insert({ total: NumberDecimal(141), VAT: NumberDecimal(0.20), totalWithVAT: NumberDecimal(169), lineitems: [ { sku: "MDBTS001", name: "MongoDB Stitch T-shirt", quantity: NumberInt(10), unit_price: NumberDecimal(9) }, { sku: "MDBTS002", quantity: NumberInt(5), unit_price: NumberDecimal(10) } ] }) However, if you pay close attention to the order above, you may notice that it contains a few errors: The totalWithVAT attribute value is incorrect (it should be equal to 141*1.20=169.2) The total attribute value is incorrect (it should be equal to the sum of each line item sub-total, (i.e. 10*9+10*5=140) Is there any way to enforce that total and totalWithVAT values be correct using database validation rules, without relying solely on application logic? Introducing MongoDB expressive query syntax Adding more complex business validation rules is now possible thanks to the expressive query syntax, a new feature of MongoDB 3.6. One of the objectives of the expressive query syntax is to bring the power of MongoDB’s aggregation expressions to MongoDB’s query language . An interesting use case is the ability to compose dynamic validation rules that compute and compare multiple attribute values at runtime. Using the new $expr operator, it is possible to validate the value of the totalWithVAT attribute with the following validation expression: $expr: { $eq: [ "$totalWithVAT", {$multiply: [ "$total", {$sum: [1, "$VAT"]} ]} ] } The above expression checks that the totalWithVAT attribute value is equal to total * (1+VAT) . In its compact form, here is how we could use it as a validation rule, alongside our JSON Schema validation: db.runCommand({ collMod: "orders", validator: { $expr:{$eq:[ "$totalWithVAT", {$multiply:["$total", {$sum:[1,"$VAT"]}]} ]} , $jsonSchema: { bsonType: "object", required: ["lineitems"], properties: { lineitems: { bsonType: ["array"], minItems: 1, maxItems:10, items: { required: ["unit_price", "sku", "quantity"], bsonType: "object", additionalProperties: false, properties: { sku: { bsonType: "string", description: "'sku' must be a string and is required" }, name: { bsonType: "string", description: "'name' must be a string" }, unit_price: { bsonType: "decimal", description: "'unit_price' must be a decimal and is required" }, quantity: { bsonType: ["int", "long"], minimum: 0, maximum: 100, exclusiveMaximum: true, description: "'quantity' must be a short or long integer in [0, 100)" }, } } } } } } }); With the validator above, the following insert operation is no longer possible: db.orders.insert({ total: NumberDecimal(141), VAT: NumberDecimal(0.20), totalWithVAT: NumberDecimal(169), lineitems: [ { sku: "MDBTS001", name: "MongoDB Stitch T-shirt", quantity: NumberInt(10), Unit_price: NumberDecimal(9) }, { sku: "MDBTS002", quantity: NumberInt(5), unit_price: NumberDecimal(10) } ] }) Instead, the totalWithVAT value must be adjusted according to our new VAT validation rule: db.orders.insert({ total: NumberDecimal(141), VAT: NumberDecimal(0.20), totalWithVAT: NumberDecimal(169.2) , lineitems: [ { sku: "MDBTS001", name: "MongoDB Stitch T-shirt", quantity: NumberInt(10), unit_price: NumberDecimal(9) }, { sku: "MDBTS002", quantity: NumberInt(5), unit_price: NumberDecimal(10) } ] }) If we also want to make sure that the total value is the sum of each order line item value (i.e. quantity*unit_price ), the following expression should be used: $expr: { $eq: [ "$total", {$sum: { $map: { "input": "$lineitems", "as": "item", "in": { "$multiply": [ "$$item.quantity", "$$item.unit_price" ] } } }} ] } The above expression uses the $map operator to compute each line item’s sub-total, then sums all these sub-totals, and finally compares it to the total value. To make sure that both the Total and VAT validation rules are checked, we must combine them using the $and operator. Finally, our collection validator can be updated with the following command: db.runCommand({ collMod: "orders", validator: { $expr:{ $and:[ {$eq:[ "$totalWithVAT", {$multiply:["$total", {$sum:[1,"$VAT"]}]} ]}, {$eq: [ "$total", {$sum: {$map: { "input": "$lineitems", "as": "item", "in":{"$multiply":["$$item.quantity","$$item.unit_price"]} }}} ]} ]}, $jsonSchema: { bsonType: "object", required: ["lineitems", "total", "VAT", "totalWithVAT"], properties: { total: { bsonType: "decimal" }, VAT: { bsonType: "decimal" }, totalWithVAT: { bsonType: "decimal" }, lineitems: { bsonType: ["array"], minItems: 1, maxItems:10, items: { required: ["unit_price", "sku", "quantity"], bsonType: "object", additionalProperties: false, properties: { sku: {bsonType: "string"}, name: {bsonType: "string"}, unit_price: {bsonType: "decimal"}, quantity: { bsonType: ["int", "long"], minimum: 0, maximum: 100, exclusiveMaximum: true }, } } } } } } }); Accordingly, we must update the total and totalWithVAT properties to comply with our updated schema and business validation rules (without changing the lineitems array): db.orders.insert({ total: NumberDecimal(140), VAT: NumberDecimal(0.20), totalWithVAT: NumberDecimal(168), lineitems: [ { sku: "MDBTS001", name: "MongoDB Stitch T-shirt", quantity: NumberInt(10), unit_price: NumberDecimal(9) }, { sku: "MDBTS002", quantity: NumberInt(5), unit_price: NumberDecimal(10) } ] }) Next steps With the introduction of JSON Schema Validation in MongoDB 3.6, database administrators are now better equipped to address data governance requirements coming from compliance officers or regulators, while still benefiting from MongoDB’s flexible schema architecture. Additionally, developers will find the new expressive query syntax useful to keep their application code base simpler by moving business logic from the application layer to the database layer. If you want to learn more about everything new in MongoDB 3.6, download our What’s New guide . If you want to get deeper on the technical side, visit the Schema Validation and Expressive Query Syntax pages in our official documentation. If you want to get more practical, hands-on experience, take a look at this JSON Schema Validation hands-on lab . You can try it right away on MongoDB Atlas , which supports MongoDB 3.6 since its general availability date. Last but not least, sign up for our free MongoDB 3.6 training from MongoDB University. About the Author - Raphael Londner Raphael Londner is a Principal Developer Advocate at MongoDB, focused on cloud technologies such as Amazon Web Services, Microsoft Azure and Google Cloud Engine. Previously he was a developer advocate at Okta as well as a startup entrepreneur in the identity management space. You can follow him on Twitter at @rlondner

December 13, 2017

Using Amazon Lex, Lambda, & MongoDB Atlas to Build a Voice-Activated Movie Search App - Part 3

It's that time of year again! This post is part of our Road to AWS re:Invent 2017 blog series. In the weeks leading up to AWS re:Invent in Las Vegas this November, we'll be posting about a number of topics related to running MongoDB in the public cloud. See all posts here. Introduction This is Part 3 of our Amazon Lex blog post series, part of our larger Road to re:Invent 2017 series . As a reminder, this tutorial is divided into 3 parts: Part 1: Lex overview, demo scenario and data layer setup Part 2: Set up and test an Amazon Lex bot Part 3: Deploy a Lambda function as our Lex bot fulfillment (this blog post) In this last blog post, we will deploy our Lambda function using the AWS Command Line Interface and verify that the bot fully works as expected. We’ll then review the code that makes up our Lambda function and explain how it works. Let’s deploy our AWS Lambda function Please follow the deployment steps available in this GitHub repository . I have chosen to use Amazon’s SAM Local tool to showcase how you can test your Lambda function locally using Docker , as well as package it and deploy it to an AWS account in just a few commands. However, if you’d like to deploy it manually to the AWS Console, you can always use this zip script to deploy it in pretty much the same way I did in this MongoDB Atlas with Lambda tutorial . Let’s test our Lex bot (end-to-end) Now that our Lambda fulfillment function has been deployed, let’s test our bot again in the Amazon Lex console and verify that we get the expected response. For instance, we might want to search for all the romance movies Jennifer Aniston starred in, a scenario we can test with the following bot conversation: As the screenshot above testifies, the Lex bot replies with the full list of Jennifer Aniston’s romance movies retrieved from our movies MongoDB database through our Lambda function. But how does our Lambda function process that request? We’ll dig deeper into our Lambda function code in the next section. Let's dive into the Lambda function code Our Lambda function always receives a JSON payload with a structure compliant with Amazon Lex’ input event format (as this event.json file is): { "messageVersion": "1.0", "invocationSource": "FulfillmentCodeHook", "userId": "user-1", "sessionAttributes": {}, "bot": { "name": "SearchMoviesBot", "alias": "$LATEST", "version": "$LATEST" }, "outputDialogMode": "Text", "currentIntent": { "name": "SearchMovies", "slots": { "castMember": "jennifer aniston", "year": "0", "genre": "Romance" } } } Note that the request contains the bot’s name ( SearchMoviesBot ) and the slot values representing the answers to the bot’s questions provided by the user. The Lambda function starts with the exports.handler method which validates the bot’s name and performs some additional processing if the payload is received through Amazon API Gateway (this is only necessary if you want to test your Lambda function through Amazon API Gateway but is not relevant in an Amazon Lex context). It then calls the dispatch() method, which takes care of connecting to our MongoDB Atlas database and passing on the bot’s intent to the query() method, which we’ll explore in a second. Note that the dispatch() method uses the performance optimization technique I highlighted in Optimizing AWS Lambda performance with MongoDB Atlas and Node.js , namely not closing the database connection and using the callbackWaitsForEmptyEventLoop Lambda context property. This allows our bot to be more responsive after the first query fulfilled by the Lambda function. Let’s now take a closer look at the query() method, which is the soul and heart of our Lambda function. First, that method retrieves the cast member, movie genre, and movie release year. Because these values all come as strings and the movie release year is stored as an integer in MongoDB, the function must convert that value to an integer . We then build the query we will run against MongoDB: var castArray = [castMember]; var matchQuery = { Cast: { $in: castArray }, Genres: { $not: { $in: ["Documentary", "News", ""] } }, Type: "movie" }; if (genre != undefined && genre != allGenres) { matchQuery.Genres = { $in: [genre] }; msgGenre = genre.toLowerCase(); } if ((year != undefined && isNaN(year)) || year > 1895) { matchQuery.Year = year; msgYear = year; } We first restrict the query to items that are indeed movies (since the database also stores TV series) and we exclude some irrelevant movie genres such as the documentary and news genres. We also make sure we only query movies in which the cast member starred. Note that the $in operator expects an array, which is why we have to wrap our unique cast member into the castArray array. Since the cast member is the only mandatory query parameter, we add it first and then optionally add the Genres and Year parameters if the code determines that they were provided by the user (i.e. the user did not use the All and/or 0 escape values). The query() method then goes on to define the default response message based on the user-provided parameters. This default response message is used if the query doesn’t return any matching element: var resMessage = undefined; if (msgGenre == undefined && msgYear == undefined) { resMessage = `Sorry, I couldn't find any movie for ${castMember}.`; } if (msgGenre != undefined && msgYear == undefined) { resMessage = `Sorry, I couldn't find any ${msgGenre} movie for ${castMember}.`; } if (msgGenre == undefined && msgYear != undefined) { resMessage = `Sorry, I couldn't find any movie for ${castMember} in ${msgYear}.`; } if (msgGenre != undefined && msgYear != undefined) { resMessage = `Sorry, ${castMember} starred in no ${msgGenre} movie in ${msgYear}.`; } The meat of the query() method happens next as the code performs the database query using 2 different methods: the classic db.collection.find() method and the db.collection.aggregate() method. The default method used in this Lambda function is the aggregate one, but you can easily test the find() method by setting the * aggregationFramewor*k variable to false . In our specific use case scenario (querying for one single cast member and returning a small amount of documents), there likely won’t be any noticeable performance or programming logic impact. However, if we were to query for all the movies multiple cast members each starred in (i.e. the union of these movies, not the intersection), the aggregation framework query is a clear winner. Indeed, let’s take a closer look at the find() query the code runs: cursor = db.collection(moviesCollection) .find(matchQuery, { _id: 0, Title: 1, Year: 1 }) .collation(collation) .sort({ Year: 1 }); It’s a fairly simple query that retrieves the movie’s title and year, sorted by year. Note that we also use the same { locale: "en", strength: 1 } collation we used to create the case-insensitive index on the Cast property in Part 2 of this blog post series . This is critical since the end user might not title case the cast member’s name (and Lex won’t do it for us either). The simplicity of the query is in contrast to the relative complexity of the app logic we have to write to process the result set we get with the find() method: var maxYear, minYear; for (var i = 0, len = results.length; i < len; i++) { castMemberMovies += `${results[i].Title} (${results[i].Year}), `; } //removing the last comma and space castMemberMovies = castMemberMovies.substring(0, castMemberMovies.length - 2); moviesCount = results.length; var minYear, maxYear; minYear = results[0].Year; maxYear = results[results.length-1].Year; yearSpan = maxYear - minYear; First, we have to iterate over all the results to concatenate its Title and Year properties into a legible string. This might be fine for 20 items, but if we had to process hundreds of thousands or millions of records, the performance impact would be very noticeable. We further have to remove the last period and white space characters of the concatenated string since they’re in excess. We also have to manually retrieve the number of movies, as well as the low and high ends of the movie release years in order to compute the time span it took the cast member to shoot all these movies. This might not be particularly difficult code to write, but it’s clutter code that affects app clarity. And, as I wrote above, it definitely doesn’t scale when processing millions of items. Contrast this app logic with the succinct code we have to write when using the aggregation framework method: for (var i = 0, len = results.length; i < len; i++) { castMemberMovies = results[i].allMovies; moviesCount = results[i].moviesCount; yearSpan = results[i].timeSpan; } The code is not only much cleaner and concise now, it’s also more generic, as it can handle the situation where we want to process movies for each of multiple cast members. You can actually test this use case by uncommenting the following line earlier in the source code : castArray = [castMember, "Angelina Jolie"] and by testing it using this SAM script . With the aggregation framework, we get the correct raw and final results without changing a single line of code: However, the find() method’s post-processing requires some significant effort to fix this incorrect output (the union of comedy movies in which Angelina Jolie or Brad Pitt starred in, all incorrectly attributed to Brad Pitt): We were able to achieve this code conciseness and correctness by moving most of the post-processing logic to the database layer using a MongoDB aggregation pipeline : cursor = db.collection(moviesCollection).aggregate( [ { $match: matchQuery }, { $sort: { Year: 1 } }, unwindStage, castFilterStage, { $group: { _id: "$Cast", allMoviesArray: {$push: {$concat: ["$Title", " (", { $substr: ["$Year", 0, 4] }, ")"] } }, moviesCount: { $sum: 1 }, maxYear: { $last: "$Year" }, minYear: { $first: "$Year" } } }, { $project: { moviesCount: 1, timeSpan: { $subtract: ["$maxYear", "$minYear"] }, allMovies: { $reduce: { input: "$allMoviesArray", initialValue: "", in: { $concat: [ "$$value", { $cond: { if: { $eq: ["$$value", ""] }, then: "", else: ", " } }, "$$this" ] } } } } } ], {collation: collation} ); This aggregation pipeline is arguably more complex than the find() method discussed above, so let’s try to explain it one stage at a time (since an aggregation pipeline consists of stages that transform the documents as they pass through the pipeline): $match stage : performs a filter query to only return the documents we’re interested in (similarly to the find() query above). $sort stage : sorts the results by year ascending. $unwind stage : splits each movie document into multiple documents, one for each cast member in the original document. For each original document, this stage unwinds the Cast array of cast members and creates separate, unique documents with the same values as the original document, except for the Cast property which is now a string value (equal to each cast member) in each unwinded document. This stage is necessary to be able to group by only the cast members we’re interested in (especially if there are more than one). The output of this stage may contain documents with other cast members irrelevant to our query, so we must filter them out in the next stage. $match stage : filters the deconstructed documents from the $unwind stage by only the cast members we’re interested in. This stage essentially removes all the documents tagged with cast members irrelevant to our query. $group stage : groups movies by cast member (for instance, all movies with Brad Pitt and all movies with Angelina Jolie, separately). This stage also concatenates each movie title and release year into the Title (Year) format and adds it to an array called allMoviesArray (one such array for each cast member). This stage also computes a count of all movies for each cast member, as well as the earliest and latest year the cast member starred in a movie (of the requested movie genre, if any). This stage essentially performs most of the post-processing we previously had to do in our app code when using the find() method. Because that post-processing now runs at the database layer, it can take advantage of the database server’s computing power along with the distributed system nature of MongoDB (in case the collection is partitioned across multiple shards, each shard performs this stage independently of the other shards). $project stage : last but not least, this stage performs a $reduce operation (new in MongoDB 3.4) to concatenate our array of ‘ Title (Year) ’ strings into one single string we can use as is in the response message sent back to the bot. Once the matching movies have been retrieved from our MongoDB Atlas database, the code generates the proper response message and sends it back to the bot according to the expected Amazon Lex response format : if (msgGenre != allGenres) { resMessage = `${toTitleCase(castMember)} starred in the following ${moviesCount>1?moviesCount+" ":""} ${msgGenre.toLowerCase()} movie(s)${yearSpan>0?" over " + yearSpan +" years":""}: ${castMemberMovies}`; } else { resMessage = `${toTitleCase(castMember)} starred in the following ${moviesCount>1?moviesCount+" ":""}movie(s)${yearSpan>0?" over " + yearSpan +" years":""}: ${castMemberMovies}`; } if (msgYear != undefined) { resMessage = `In ${msgYear}, ` + resMessage; callback( close(sessionAttributes, "Fulfilled", { contentType: "PlainText", content: resMessage }) ); Our Jennifer Aniston fan can now be wowed by the completeness of our bot's response! Wrap-up and next steps This completes our Lex blog post series and I hope you enjoyed reading it as much as I did writing it. In this final blog post, we tested and deployed a Lambda function to AWS using the SAM Local tool . We also learned: How a Lambda function processes a Lex request and responds to it using Amazon Lex’ input and out event format . How to use a case-insensitive index in a find() or aggregate() query How to make the most of MongoDB’s aggregation framework to move complexity from the app layer to the database layer As next steps, I suggest you now take a look at the AWS documentation to learn how to deploy your bot to Facebook Messenger , Slack or to your own web site . Happy Lex-ing! About the Author - Raphael Londner Raphael Londner is a Principal Developer Advocate at MongoDB, focused on cloud technologies such as Amazon Web Services, Microsoft Azure and Google Cloud Engine. Previously he was a developer advocate at Okta as well as a startup entrepreneur in the identity management space. You can follow him on Twitter at @rlondner

November 26, 2017

Using Amazon Lex, Lambda, & MongoDB Atlas to Build a Voice-Activated Movie Search App - Part 2

November 13, 2017

Using Amazon Lex, Lambda, & MongoDB Atlas to Build a Voice-Activated Movie Search App - Part 1

November 12, 2017

Azure Tutorial: How to Integrate Azure Functions with MongoDB

As announced at MongoDB World ‘17, MongoDB Atlas, the database-as-a-service provided by the creators of MongoDB, is now available on the three major public cloud providers: Amazon Web Services, Google Cloud Platform and Microsoft Azure. In this blog post, I’ll cover the integration of Microsoft Azure Functions with MongoDB Atlas from a developer standpoint. What are Azure Functions? In a nutshell, Azure Functions are the core building block of Microsoft’s serverless technologies, similar to AWS Lambda and Google Cloud Functions. You can write your Azure Functions code in a variety of languages and execute it at scale without worrying about the underlying virtual machine and operating system. That’s not very different from other cloud vendor offerings, but what seems to be unique about Azure Functions is Microsoft’s promise to open source the Azure Functions Runtime , which means that we could theoretically run Azure functions anywhere - on Azure, on private data centers or in other cloud providers. At the time of writing, we have yet to see whether Microsoft will deliver on that promise). To their credit, Microsoft already provides tools to run and debug Azure functions locally, as we’ll see below. In this post, I’ll introduce you to the process I recommend to create an Azure function with Visual Studio. I’ll also show how to leverage the .NET MongoDB driver to perform CRUD operations on a fully-managed MongoDB Atlas hosted on Azure. Specifically, I will take you through the following steps: Set up your development environment Create an Azure function in Visual Studio Write MongoDB CRUD queries Connect the Azure function to MongoDB Atlas Test the Azure function locally Deploy the Azure function to Microsoft Azure Configure and test the Azure function running on Microsoft Azure Set up your development environment First, you should make sure you have Visual Studio 2017 version 15.3 (or higher) installed on your Windows machine (the Community Edition is enough, but the Professional and Enterprise Edition also work with this tutorial). At the time of this writing, Visual Studio 2017 version 15.3 is in Preview and can be installed from https://visualstudio.com/vs/preview (VS 2017 v15.3 is required to run the Azure Functions Tools for Visual Studio 2017 ) When installing Visual Studio 2017, make sure you select the Azure development workload (as well as any other workload you wish to install). If you already installed Visual Studio 2017, but did not install the Azure development workload, you can do so by going to Settings → Apps & features, find the Visual Studio 2017 app and select Modify. At the time of writing, the Azure Functions Tools for Visual Studio 2017 must be installed as a Visual Studio extension. Please refer to Microsoft’s documentation for detailed installation instructions. Create an Azure function in Visual Studio Azure Functions offer a wide choice of programming languages, such as C#, F#, Node.js, Python, PHP and more. Given that C# is the language of choice of most Microsoft developers, this tutorial will focus on developing and deploying an Azure function using C#. Open Visual Studio 2017 and select File → New → Project. Select the Azure Functions project type and give your project a name (for instance MongoDB.Tutorials.AzureFunctions ). Next, right-click on your project in the Solution Explorer and select Add → New item. Select the Azure Function item and give it a name such as CreateRestaurantFunction.cs (the file name doesn’t matter as much as the function name, as we’ll see below). A new window appears and lets you choose the type of Azure Function you would like to create. Let’s keep it simple for now and choose the HttpTrigger function type, which will allow us to use our function as a REST API we’ll be able to call from cURL, Postman, or any custom application. Select Anonymous in AccessRights (you will be able to change this later) and name the function CreateRestaurant. Press the Create button. A CreateRestaurant.cs file gets created with boilerplate code in the public static async Task Run(...) method. This method is invoked every time you call your function endpoint, which by default is http://localhost:7071/api/ on your local machine ( http://localhost:7071/api/CreateRestaurant in our case). Let’s take a closer look at that Run(...) method: [FunctionName("Restaurants")] public static async Task Run([HttpTrigger(AuthorizationLevel.Anonymous, "get", "post", Route = null)]HttpRequestMessage req, TraceWriter log) First, note the FunctionName attribute that determines your function url (so use caution if you want to update it). Second, the AuthorizationLevel is set to Anonymous as previously configured when creating the function. There are ways you can enforce authentication and authorization using OpenID Connect with Windows Azure Active Directory, Facebook, Google or Twitter (as acknowledged by Microsoft ), but we’ll leave it to identity experts to fill in the gaps. Third, the get and post parameters indicate that the function can be called with the GET or a POST Http methods. Typically, the GET method is used to retrieve data, the POST method to create data, the PUT method to update data, and the DELETE method to, well, delete data, as you guessed. Since we only want to use this function to create a document in MongoDB, let’s remove the get parameter and let’s keep the post parameter. We’ll create another function that we’ll use to retrieve, update and delete. However, we’ll use the HttpTriggerWithParameters function type as it will allow us to provide parameters (such as the restaurant id to retrieve, update or delete) as part of the API endpoint url. Following the same process as above, create another RestaurantFunction.cs Azure function file and name that function Restaurant. Write CRUD queries to MongoDB Atlas Now that our 2 functions are set up, let’s move to the meat of this blog post: interact with MongoDB Atlas by writing CRUD queries. In order to write any C# application connected to a MongoDB database, you need the MongoDB .NET driver. An Azure function is no exception to the rule so let’s go ahead and install it with NuGet. Right-click on your Visual Studio project and select Manage NuGet Packages. In the NuGet Package Manager, select the Browse tab and search for MongoDB. In the search results, select MongoDB.Driver, make sure you choose the latest version of the driver and press Install (v2.4.4 at the time of writing). The MongoDB .NET driver requires dependent assemblies ( MongoDB.Bson and MongoDB.Driver.Core ) so accept the Apache 2.0 license for these 3 libraries: The MongoDB driver also depends on the System.Runtime.InteropServices.RuntimeInformation assembly (v4.0.0), which the current version of the Azure Function Tools don’t automatically import with the MongoDB.Driver package (as other Visual Studio project types do). We therefore need to explicitly import it with NuGet as well: Once that’s done, edit the CreateRestaurantFunction.cs file, and add the following using statements: curl -iX PATCH http://localhost:7071/api/Restaurant/id/40356018 -H 'content-type: application/json' -d '{ "address.zipcode" : "10036", "borough" : "Manhattan", "cuisine" : "Italian" }' Next, delete the content of the Run(...) method in the CreateRestaurantFunction.cs file and replace it with the following: curl -iX GET http://localhost:7071/api/Restaurant/id/40356018 Replace the entire content of the RestaurantFunction.cs file with the following code: curl -iX POST http://localhost:7071/api/CreateRestaurant -H 'content-type: application/json' -d '{ "address" : { "building" : "2780", "coord" : [ -73.98241999999999, 40.579505 ], "street" : "Stillwell Avenue", "zipcode" : "11224" }, "borough" : "Brooklyn", "cuisine" : "American", "name" : "Riviera Caterer", "restaurant_id" : "40356018" }' Note that I made several changes to that function. Namely, I changed the signature of the Run() method to make it asynchronous and I enabled it to handle GET, PATCH and DELETE http requests (as mentioned above). Note also the RunPatch() of the code where I make use of the BsonDocument and UpdateDefinition objects to update an existing document given an arbitrary (but valid) JSON string: curl -iX DELETE http://localhost:7071/api/Restaurant/id/40356018 This would allow me to update the cuisine and borough properties of an existing document by sending the following JSON to the /api/Restaurant/[restaurantId] endpoint of the function: { "cuisine": "Italian", "borough": "Manhattan" } The same piece of code would also be able to update the zipcode and building property of the address sub-document by setting JSON attributes with dot notation: { "address.zipcode": "99999", "address.building": "999" } Note that if you prefer to use sub-document notation, such as { "address": { "zipcode": "99999", "building": "999" } } then you should use a simpler form (in order to preserve the sub-document attributes you don’t update): update = new BsonDocument("$set", changesDocument); At this point, our function doesn’t compile because it misses an additional, singleton RestaurantsCollection class, responsible for instantiating a MongoDB database connection and returning a reference to the restaurants collection. The purpose of this class is two-fold: Encapsulate similar code we’d otherwise have to write for each function Only instantiate a new database connection if none already exists. Indeed, an Azure function has the ability to reuse the same underlying operating system it uses as its core runtime across close enough calls, thereby allowing us to reuse any database connection we’ve already established in a previous call. Add a RestaurantsCollection.cs class file add paste the following content into it: log.Info("CreateRestaurant function processed a request."); var itemId = ObjectId.Empty; var jsonContent = string.Empty; try { //retrieving the content from the request's body jsonContent = await req.Content.ReadAsStringAsync().ConfigureAwait(false); //assuming we have valid JSON content, convert to BSON var doc = BsonSerializer.Deserialize (jsonContent); var collection = RestaurantsCollection.Instance; //store new document in MongoDB collection await collection.InsertOneAsync(doc).ConfigureAwait(false); //retrieve the _id property created document itemId = (ObjectId)doc["_id"]; } catch (System.FormatException fex) { //thrown if there's an error in the parsed JSON log.Error($"A format exception occurred, check the JSON document is valid: {jsonContent}", fex); } catch (System.TimeoutException tex) { log.Error("A timeout error occurred", tex); } catch (MongoException mdbex) { log.Error("A MongoDB error occurred", mdbex); } catch (System.Exception ex) { log.Error("An error occurred", ex); } return itemId == ObjectId.Empty ? req.CreateResponse(HttpStatusCode.BadRequest, "An error occurred, please check the function log") : req.CreateResponse(HttpStatusCode.OK, $"The created item's _id is {itemId}"); Connect the Azure function to MongoDB Atlas The last step is to configure our function so it can connect to MongoDB. To do so, edit the local.settings.json file and add a MongoDBAtlasURI attribute inside the Values nested document: While you test your Azure function locally, you can use your local MongoDB instance and specify http://localhost:27017 . However, since you will publish your Azure function to Microsoft Azure, I recommend that you use MongoDB Atlas to host your MongoDB database cluster since MongoDB Atlas is by default secure, yet publicly available (to configurable IP addresses and specific database users). If you don’t have a MongoDB Atlas cluster yet, sign up now and set up a MongoDB Atlas database on Microsoft Azure. You can retrieve your cluster’s connection string from the MongoDB Atlas portal by pressing the Connect button on your cluster page: Next, you should press the Copy button to copy your MongoDB Atlas URI to your clipboard: Then paste it to the local.settings.json file and modify it to match your needs. If you chose Microsoft Azure to host your 3-node MongoDB Atlas replica set, the format of your connection string is the following: mongodb://<USERNAME>:<PASSWORD>@<CLUSTERNAME_LOWERCASE>-shard-00-00-<SUFFIX>.azure.mongodb.net:27017,<CLUSTERNAME_LOWERCASE>-shard-00-01-<SUFFIX>.azure.mongodb.net:27017,<CLUSTERNAME_LOWERCASE>-shard-00-02-<SUFFIX>.azure.mongodb.net:27017/<DATABASE>?ssl=true&replicaSet=<CLUSTERNAME>-shard-0&authSource=admin While you’re at it, press the Add current IP address button to allow your current machine (or virtual machine) to access your MongoDB Atlas database: Test the Azure function locally It’s now time to run and test our Azure function. Launch the Azure Functions debugger in Visual Studio; the following command line prompt should appear: Now run the following cURL commands (cURL is available with Cygwin , for instance) or use Postman to craft the equivalent commands. To create a restaurant document, run: using MongoDB.Driver; using MongoDB.Bson; using MongoDB.Bson.Serialization; If you used Postman, you should get a 200 OK result: Now, try to retrieve the restaurant by running the following curl command: using MongoDB.Bson; using MongoDB.Driver; using System; namespace MongoDB.Tutorials.AzureFunctions { public sealed class RestaurantsCollection { private static volatile IMongoCollection instance; private static object syncRoot = new Object(); private RestaurantsCollection() { } public static IMongoCollection<BsonDocument> Instance { get { if (instance == null) { lock (syncRoot) { if (instance == null) { string strMongoDBAtlasUri = System.Environment.GetEnvironmentVariable("MongoDBAtlasURI"); var client = new MongoClient(strMongoDBAtlasUri); var db = client.GetDatabase("travel"); instance = db.GetCollection<BsonDocument>("restaurants"); } } } return instance; } } } } Next, you can try to update the restaurant by issuing a PATCH request: using System; using System.Net; using System.Net.Http; using System.Threading.Tasks; using Microsoft.Azure.WebJobs; using Microsoft.Azure.WebJobs.Extensions.Http; using Microsoft.Azure.WebJobs.Host; using MongoDB.Bson; using MongoDB.Bson.Serialization; using MongoDB.Driver; namespace MongoDB.Tutorials.AzureFunctions { public static class RestaurantFunction { [FunctionName("Restaurant")] public static Task Run([HttpTrigger(AuthorizationLevel.Anonymous, "get", "patch", "delete", Route = "Restaurant/id/{restaurantId}")]HttpRequestMessage req, string restaurantId, TraceWriter log) { log.Info("Restaurant function processed a request."); try { var collection = RestaurantsCollection.Instance; switch (req.Method.Method) { case "GET": return RunGet(req, restaurantId, log, collection); case "PATCH": return RunPatch(req, restaurantId, log, collection); case "DELETE": return RunDelete(req, restaurantId, log, collection); default: return Task.FromResult(req.CreateResponse(HttpStatusCode.MethodNotAllowed)); } } catch (System.Exception ex) { log.Error("An error occurred", ex); return Task.FromResult(req.CreateResponse(HttpStatusCode.InternalServerError)); } } private static async Task<HttpResponseMessage> RunGet(HttpRequestMessage req, string restaurantId, TraceWriter log, IMongoCollection<BsonDocument> collection) { var filter = Builders<BsonDocument>.Filter.Eq("restaurant_id", restaurantId); var results = await collection.Find(filter).ToListAsync().ConfigureAwait(false); if (results.Count > 0) { return req.CreateResponse(HttpStatusCode.OK, results[0].ToString()); } return req.CreateResponse(HttpStatusCode.NotFound, $"A restaurant with id {restaurantId} could not be found"); } private static async Task<HttpResponseMessage> RunDelete(HttpRequestMessage req, string restaurantId, TraceWriter log, IMongoCollection<BsonDocument> collection) { var filter = Builders<BsonDocument>.Filter.Eq("restaurant_id", restaurantId); var result = await collection.FindOneAndDeleteAsync(filter).ConfigureAwait(false); if (result != null) { return req.CreateResponse(HttpStatusCode.OK); } return req.CreateResponse(HttpStatusCode.NotFound, $"A restaurant with id {restaurantId} could not be deleted"); } private static async Task<HttpResponseMessage> RunPatch(HttpRequestMessage req, string restaurantId, TraceWriter log, IMongoCollection<BsonDocument> collection) { var filter = Builders<BsonDocument>.Filter.Eq("restaurant_id", restaurantId); string jsonContent = await req.Content.ReadAsStringAsync(); BsonDocument changesDocument; try { changesDocument = BsonSerializer.Deserialize<BsonDocument>(jsonContent); } catch (System.FormatException) { var msg = $"The JSON content is invalid: {jsonContent}"; log.Info(msg); return req.CreateResponse(HttpStatusCode.BadRequest, msg); } UpdateDefinition<BsonDocument> update = null; foreach (var change in changesDocument) { if (update == null) { update = Builders<BsonDocument>.Update.Set(change.Name, change.Value); } else { update = update.Set(change.Name, change.Value); } } //you can also use the simpler form below if you're OK with bypassing the UpdateDefinitionBuilder (and trust the JSON string to be fully correct) //update = new BsonDocument("$set", changesDocument); //The following lines could be uncommented out for debugging purposes //var registry = collection.Settings.SerializerRegistry; //var serializer = collection.DocumentSerializer; //var rendered = update.Render(serializer, registry).ToJson(); var updateResult = await collection.UpdateOneAsync(filter, update).ConfigureAwait(false); if (updateResult.ModifiedCount == 1) { return req.CreateResponse(HttpStatusCode.OK); } return req.CreateResponse(HttpStatusCode.NotFound, $"A restaurant with id {restaurantId} could not be updated"); } } } Last, delete the restaurant with a DELETE Http request: var changesDocument = BsonSerializer.Deserialize (jsonContent); UpdateDefinition update = null; foreach (var change in changesDocument) { if (update == null) { var builder = Builders .Update; update = builder.Set(change.Name, change.Value); } else { update = update.Set(change.Name, change.Value); } } var updateResult = await collection.UpdateOneAsync(filter, update); Deploy the Azure function to Microsoft Azure Now that we’ve verified that all the tests above are successful, let’s move forward and deploy our function to Azure. You can deploy your Azure function using Continuous Integration (CI) tools such as Visual Studio Team Services (VSTS) or the Azure CLI, but we’ll take a simpler approach in this post by using the Graphical User Interface available in Visual Studio 2017. Right-click on the MongoDB.Tutorials.AzureFunctions project, select Azure Function App and press Publish. The Create App Service wizard appears and lets you configure your Azure App Service name, as well as the subscription, resource group, app service plan and storage account you want to use for that function: When you’re done configuring all these parameters, press Create. Configure and test the Azure function running on Microsoft Azure The Visual Studio deployment process publishes pretty much all of your Azure Function artifacts, except the local.settings.json file where we configured the MongoDB connection string. In order to create it on Azure, head over to your Azure function in your Azure portal and select the Application Settings link: In the App Settings section, add the MongoDBAtlasURI key and set the value to a valid MongoDB Atlas connection string. We’re not done yet though. Unless you have allowed any IP address to have access to your database cluster, we must configure the IP Whitelist of your Atlas cluster to let Microsoft Azure connect to it. To do so, head over the Platform features tab and select Properties. In the properties tab, copy the comma-delimited list of IP addresses and enter each one of them in your Atlas cluster’s IP whitelist. Once you’re done, your cluster’s IP Whitelist should have 5 Azure IP addresses along with your local machine’s IP address: You can now replace the http://localhost:7071 url you used in your cURL scripts or Postman with the url of your published Azure function (such as https://restaurantappfunction.azurewebsites.net ) to test your published Azure function. The screenshot below shows the successful result of an /api/CreateRestaurant call in Postman, which is evidence the published version of Azure Function was able to connect to MongoDB Atlas. Conclusion I hope you have found this tutorial helpful to get started with Azure Functions and MongoDB Atlas. Cherry on the cake, you can find the complete source code of this tutorial on GitHub. As a next step, I suggest that you download MongoDB Compass to visualize the documents you just created in your MongoDB Atlas database cluster with our CreateRestaurant Azure Function. Here’s a small tip: if you just copy an Atlas connection string to your clipboard and start MongoDB Compass, it will automatically detect your connection string and offer you to pre-populate the login screen. Pretty neat, no? If you’re planning to use Azure Functions for a production deployment, you might also be interested in the available continuous integration deployment options offered by Microsoft. And if you don’t already have your MongoDB Atlas cluster, sign up now and create a MongoDB cluster on Microsoft Azure in minutes! About the Author - Raphael Londner Raphael Londner is a Principal Developer Advocate at MongoDB, focused on cloud technologies such as Amazon Web Services, Microsoft Azure and Google Cloud Engine. Previously he was a developer advocate at Okta as well as a startup entrepreneur in the identity management space. You can follow him on Twitter at @rlondner .

August 3, 2017

How to Deploy Sitecore on Azure & MongoDB Atlas

This blog post is a tutorial written for Sitecore administrators who would like to deploy Sitecore on Microsoft Azure with MongoDB Atlas as the Database as a Service (DBaaS provider) for Sitecore’s MongoDB databases. The Sitecore Azure Toolkit scripts allow you to easily deploy Sitecore as an App Service on Microsoft Azure, but the setup and configuration of the required analytics and tracking MongoDB databases is the responsibility of the operations team running the Sitecore cloud deployment. Now that MongoDB Atlas is available on Microsoft Azure, you can use it to dramatically accelerate the time to market of your Sitecore Cloud deployment. Atlas makes maintenance easy by relying on MongoDB’s expertise to maintain the database for you instead of setting up and operating your own MongoDB infrastructure on Microsoft Azure Virtual Machines. Additionally, by hosting your Sitecore VMs in the same region as your MongoDB Atlas clusters, you benefit from fast, local Internet connections between Azure VMs and MongoDB Atlas. Here is what Sitecore has to say: “With MongoDB Atlas on Azure, Sitecore customers now have the benefit of sourcing MongoDB directly from its creators,” said Ryan Donovan, Senior Vice President of Product Management at Sitecore. “This, coupled with MongoDB’s enterprise-class support and service levels, delivers a vehicle that seamlessly complements Sitecore’s strong commitment to the Microsoft Azure cloud. Sitecore deployment on Azure To install Sitecore on Microsoft Azure, you should start by reading the related Sitecore documentation page. Once you have chosen your Sitecore deployment type (XP0, XM, XP or XDB) and uploaded the corresponding WebDeploy package to your Microsoft Azure storage account, head over to MongoDB Atlas to prepare the cluster. You will use it to host your Sitecore MongoDB database. If you don’t have a MongoDB Atlas account yet, register here to create one. It is possible to host your Sitecore MongoDB cluster is an existing Atlas group, but recall that security configurations are scoped at the group, not cluster level. I highly recommend using a new, independent Atlas group for security reasons (namely, to keep its IP Whitelisting and database users configuration independent). The following tutorial assumes that we will deploy a Sitecore 8.2.3 XP0 environment using a dedicated Atlas group we’ll name Sitecore-Azure. MongoDB Atlas cluster setup Once you have signed in to MongoDB Atlas, select your name in the top right corner of any MongoDB Atlas page and select My Groups. Add a new group called Sitecore-Azure and make sure you choose MongoDB Atlas as the group type. Once your Atlas group has been created, press the Build a New Cluster button. Give a name to your cluster (for instance, Sitecore ). Choose the Microsoft Azure provider and the region of your choice (among those supported by MongoDB Atlas). Using the same deployment region as your Sitecore web and Azure SQL servers provides latency benefits and cost savings. In this tutorial, I chose to deploy Sitecore in the westus region. Choose the M30 cluster instance size, knowing that you will always have the option to scale up to a larger instance size, without any upgrade downtime at all. Since we’re setting up a brand new cluster, you’ll need an administrator account. Scroll down to configure your cluster admin user (I use atlasAdmin as the admin user name) and press the Continue to Payment button. After filling out your credit card information, MongoDB Atlas starts provisioning your Sitecore cluster. It’s that easy! MongoDB Atlas cluster security configuration Sitecore needs a MongoDB database user account for access to its databases. While your cluster is being provisioned, head over to the Security tab to create database users. We highly recommend that you follow least-privilege access best practices and create a specific user for each of the 4 MongoDB databases. Press the Add New User button to create the database user we’ll use to access the Analytics database. Security binds a user to one or more databases and in this tutorial, I chose the username scAnalytics and the analytics database name sitecoreAnalytics . The scAnalytics user should only have readWrite permissions on this database as shown in the screenshot below. The readWrite built-in role provides Sitecore all necessary access to create collections and change data while still following the least-privilege access best practice. Select the Show Advanced Options link in the User Privileges section to add the readWrite permission. After creating 3 additional users for the 3 Sitecore tracking databases with similar permissions, the Security/MongoDB Users tab should display the following users: Now that we have user accounts, let’s move back to provisioning Sitecore. Before provisioning our Sitecore environment, we need to retrieve our database cluster’s connection string. Select the Clusters tab, select the Sitecore cluster and press the Connect button. In the pop-up window, press the Copy button next to the URI Connection String and paste the connection string into a safe location. It’s now time to set up your Sitecore Cloud environment. There are 2 ways you can provision your Sitecore Cloud environment in Azure: Using the Sitecore Azure Toolkit Using the Sitecore Azure Marketplace wizard I'll cover both options in the sections below. Sitecore Cloud environment setup with the Sitecore Azure Toolkit First, make sure your Windows (physical or virtual) machine matches the Sitecore Azure Toolkit requirements . Next, from the Sitecore Azure Quickstarts GitHub repository , download the azuredeploy.parameters.json files from the proper folder. Since I want to install Sitecore 8.2.3 in a XP0 configuration, the corresponding folder is https://github.com/Sitecore/Sitecore-Azure-Quickstart-Templates/tree/master/Sitecore%208.2.3/xp0 . Put this file at the root of the Sitecore Azure Toolkit folder on your Windows operating system, along with your Sitecore license file. Next, open the azuredeploy.parameters.json file in your favorite text editor. Using Microsoft Azure Storage Explorer, right-click on each WDP file you previously uploaded to your Azure Storage account (as instructed in the Prepare WebDeploy packages section) and select the Get Shared Access Signature menu: The Shared Access Signature window shows up. Note that the Start and Expiry times might be slightly off and that the generated link might not be valid. I therefore recommend you decrease the Start Time by one hour (or more): Press the Create button, Copy the URL field and paste it to its corresponding parameter in the azuredeploy.parameters.json file, as instructed in the Sitecore environment template configuration configuration (in my case, I configured the singleMsDeployPackageUrl parameter). For the four MongoDB-related parameters ( analyticsMongoDbConnectionString, trackingLiveMongoDbConnectionString, trackingHistoryMongoDbConnectionString and trackingContactMongoDbConnectionString ), use the MongoDB Atlas connection string you previously retrieved and replace atlasAdmin with . Your connection string should then be similar to the following example: mongodb://<USERNAME>:<PASSWORD>@sitecore-shard-00-00-x00xx.azure.mongodb.net:27017,sitecore-shard-00-01-x00xx.azure.mongodb.net:27017,sitecore-shard-00-02-x00xx.azure.mongodb.net:27017/<DATABASE>?ssl=true&replicaSet=Sitecore-shard-0&authSource=admin Replace , and with the values you chose for each of the dedicated MongoDB users you set up, such as: USERNAME PASSWORD DATABASE scAnalytics [PASSWORD1] sitecoreAnalytics <td>scTrackingLive</td> <td>[PASSWORD2]</td> sitecoreTrackingLive <td>scTrackingHistory</td> <td>[PASSWORD3]</td> <td>sitecoreTrackingHistory</td> </tr> <td>scTrackingContact</td> <td>[PASSWORD4]</td> <td>sitecoreTrackingContact</td> </tr> Paste these connection strings to their corresponding parameters in the azuredeploy.parameters.json file. Don’t forget to also fill out other required parameters in that file, such as deploymentId, sqlServerLogin, sqlServerPassword and sitecoreAdminPassword . Finally, open a Powershell command prompt running as administrator, navigate to the root folder of the Sitecore Azure Toolkit on your machine, and run the following commands: Import-Module AzureRM Import-Module .\tools\Sitecore.Cloud.Cmdlets.psm1 -Verbose Login-AzureRMAccount Provided you get no error, the last line should prompt a browser window requiring you to sign in with your Microsoft Azure account. After successfully signing in with Azure, invoke the Sitecore deployment command . In my case, I ran the following command: Start-SitecoreAzureDeployment -Location "westus" -Name "sc" -ArmTemplateUrl "https://raw.githubusercontent.com/Sitecore/Sitecore-Azure-Quickstart-Templates/master/Sitecore%208.2.3/xp0/azuredeploy.json" -ArmParametersPath ".\azuredeploy.parameters.json" -LicenseXmlPath ".\MongoDBTempLic.xml" The command line should display “Deployment Started…” but since the Azure provisioning process takes a few minutes, I advise you follow the provisioning process from the Resource groups page on your Azure portal: Sitecore Cloud environment setup with the Sitecore Azure Marketplace wizard If you prefer to use the more automated Sitecore wizard on Azure Marketplace, navigate to Sitecore Experience Platform product page and start the creation process by pressing Get It Now . Once you reach the Credentials tab, enter your 4 MongoDB Atlas connection strings, as shown in the screenshot below. After you complete the wizard, your Sitecore environment will be provisioned in Microsoft Azure similarly to the Sitecore Azure Toolkit process described above. IP Whitelisting Each Azure App Service exposes the outbound IP addresses it uses. While Microsoft doesn’t formally guarantee that these are fixed IPs , there seems to be evidence that these outbound IP addresses don’t change unless you make significant modifications to your app service (such as scaling it up or down). Another option would be to create an Azure App Service Environment , but this is outside the scope of this blog post. To find out which outbound IP addresses your app service uses, head over to the Properties tab of your app service and copy the outbound IP addresses available in the namesake section: Navigate to the Security/IP Whitelist tab of your MongoDB Atlas cluster, press the Add IP Address button and add each Azure outbound IP address. Testing connectivity with MongoDB Atlas Once the Sitecore Powershell command completes, your Sitecore web site should be up and running at the url available in your Azure App Service page (in my case, the “sc-single” App Service): Copy/paste the URL available in your Azure App Service page into a browser (see screenshot above). The following page should appear: You can also navigate to [your_azurewebsites_sitecore_url]/sitecore/admin where you can access the site administration page. Use admin as the username and the sitecoreAdminPassword value from the azuredeploy.parameters.json file as your password. Verify that your MongoDB Atlas cluster has the proper collections in each of the 4 databases previously mentioned by using MongoDB Atlas’ Data Explorer tab (or MongoDB Compass if you prefer to use a client-side tool). For example, the Sitecore Analytics database shows the following collections when using Sitecore 8.2.3: You can even drill down inside each collection to see the entries Sitecore might already have generated, for instance in the UserAgents collection: Conclusion I hope that you found this tutorial helpful. You should now have a running Sitecore Cloud environment with MongoDB Atlas on Microsoft Azure. If you’re interested in MongoDB Atlas and don’t have an account yet, you can sign up for free and create a cluster in minutes. If you’d like to know more about MongoDB deployment options for Sitecore, including our Sitecore consulting engagement package, visit the MongoDB for Sitecore page. Please use the comment form below to provide your feedback or seek help with any issues. About the Author - Raphael Londner Raphael Londner is a Principal Developer Advocate at MongoDB, focused on cloud technologies such as Amazon Web Services, Microsoft Azure and Google Cloud Engine. Previously he was a developer advocate at Okta as well as a startup entrepreneur in the identity management space. You can follow him on Twitter at @rlondner .

July 18, 2017

AWS Step Functions: Integrating MongoDB Atlas, Twilio,& AWS Simple Email Service - Part 2

This is Part 2 of the AWS Step Functions overview post published a few weeks ago. If you want to get more context on the sample application business scenario, head back to read Part 1 . In this post, you’ll get a deep dive into the application’s technical details. As a reference, the source code of this sample app is available on GitHub . Setting up the Lambda functions The screenshot above is the graphical representation of the state machine we will eventually be able to test and run. But before we get there, we need to set up and publish the 4 Lambda functions this Step Functions state machine relies on. To do so, clone the AWS Step Functions with MongoDB GitHub repository and follow the instructions in the Readme file to create and configure these Lambda functions. If you have some time to dig into their respective codebases, you'll realize they're all made up of just a few lines, making it simple to embed Twilio, AWS and MongoDB APIs in your Lambda function code. In particular, I would like to point out the concise code the Get-Restaurants lambda function uses to query the MongoDB Atlas database: db.collection('restaurants').aggregate( [ { $match: { "address.zipcode": jsonContents.zipcode, "cuisine": jsonContents.cuisine, "name": new RegExp(jsonContents.startsWith) } }, { $project: { "_id": 0, "name": 1, "address.building": 1, "address.street": 1, "borough": 1, "address.zipcode": 1, "healthScoreAverage": { $avg: "$grades.score" }, "healthScoreWorst": { $max: "$grades.score" } } } ] ) The code snippet above is a simple yet powerful example of aggregation framework queries using the $match and $project stages along with the $avg and $max accumulator operators . In a nutshell, this aggregation filters the restaurants dataset by 3 properties (zip code, cuisine, and name) in the $match stage , returns a subset of each restaurant’s properties (to minimize the bandwidth usage and query latency), and computes the maximum and average values of health scores obtained by each restaurant (over the course of 4 years) in the $project stage . This example shows how you can very easily replace SQL clauses (such as WHERE(), MAX() and AVG()) using MongoDB’s expressive query language. Creating the Step Functions state machine Once you are done with setting up and configuring these Lambda functions, it's time to finally create our Step Functions state machine. AWS created a JSON-based declarative language called the Amazon States Language , fully documented on the Amazon States Language specification page . A Step Functions state machine is essentially a JSON file whose structure conforms to this new Amazon States Language. While you don’t need to read its whole specification to understand how it works, I recommend reading the AWS Step Functions Developer Guide to understand its main concepts and artifacts. For now, let's go ahead and create our WhatsThisRestaurantAgain state machine. Head over to the Create State Machine page in AWS Step Functions and give your new state machine a name (such as WhatsThisRestaurantAgain ). Next, copy and paste the following JSON document ( also available on GitHub ) into the Code text editor (at the bottom of the Create State Machine page): { "Comment": "A state machine showcasing the use of MongoDB Atlas to notify a user by text message or email depending on the number of returned restaurants", "StartAt": "GetRestaurants", "States": { "GetRestaurants": { "Type": "Task", "Resource": "", "ResultPath": "$.restaurants", "Next": "CountItems" }, "CountItems": { "Type": "Task", "Resource": "", "InputPath": "$.restaurants", "ResultPath": "$.count", "Next": "NotificationMethodChoice" }, "NotificationMethodChoice": { "Type": "Choice", "Choices": [ { "Variable": "$.count", "NumericGreaterThan": 1, "Next": "SendByEmail" }, { "Variable": "$.count", "NumericLessThanEquals": 1, "Next": "SendBySMS" } ], "Default": "SendByEmail" }, "SendByEmail": { "Type": "Task", "Resource": "", "End": true }, "SendBySMS": { "Type": "Task", "Resource": "", "End": true } } } Once you’re done pasting this JSON document, press the Refresh button of the Preview section right above the Code editor and... voilà! The state machine now shows up in its full, visual glory: We’re not quite done yet. But before we complete the last steps to get a fully functional Step Functions state machine, let me take a few minutes to walk you through some of the technical details of my state machine JSON file. Note that 4 states are of type "Task" but that their Resource attributes are empty. These 4 "Task" states represent the calls to our 4 Lambda functions and should thus reference the ARNs (Amazon Resource Names) of our Lambda functions. You might think you have to get these ARNs one by one—which might prove to be tedious—but don't be discouraged; AWS provides a neat little trick to get these ARNs automatically populated! Simply click inside the double quotes for each Resource attribute and the following drop-down list should appear (if it doesn't, make sure you are creating your state machine in the same region as your Lambda functions): Once you have filled out the 4 empty Resource attributes with their expected values, press the Create State Machine button at the bottom. Last, select the IAM role that will execute your state machine (AWS should have conveniently created one for you) and press OK : On the page that appears, press the New execution button: Enter the following JSON test document (with a valid emailTo field) and press Start Execution : { "startsWith": "M", "cuisine": "Italian", "zipcode": "10036", "phoneTo": "+15555555555", "firstnameTo": "Raphael", "emailTo": "raphael@example.com", "subject": "List of restaurants for {{firstnameTo}}", } If everything was properly configured, you should get a successful result, similar to the following one: If you see any red boxes (in lieu of a green one), check CloudWatch where the Lambda functions log their errors. For instance, here is one you might get if you forgot to update the emailTo field I mentioned above: And that's it (I guess you can truly say we’re " done done " now)! You have successfully built and deployed a fully functional cloud workflow that mashes up various API services thanks to serverless functions. For those of you who are still curious, read on to learn how that sample state machine was designed and architected. Design and architecture choices Let's start with the state machine design: The GetRestaurants function queries a MongoDB Atlas database of restaurants using some search criteria provided by our calling application, such as the restaurant's cuisine type, its zip code and the first few letters of the restaurant's name. It retrieves a list of matching restaurants and passes that result to the next function ( CountItems ). As I pointed out above, it uses MongoDB's aggregation framework to retrieve the worst and average health score granted by New York's Health Department during its food safety inspections. That data provides the end user with information on the presumed cleanliness and reliability of the restaurant she intends to go to. Visit the aggregation framework documentation page to learn more about how you can leverage it for advanced insights into your data. The CountItems method counts the number of the restaurants; we'll use this number to determine how the requesting user is notified. If we get a single restaurant match, we'll send the name and address of the restaurant to the user's cell phone using the SendBySMS function. However, if there's more than one match, it's probably more convenient to display that list in a table format. As such, we'll send an email to the user using the SendByEmail method. At this point, you might ask yourself: how is the data passed from one lambda function to another? As it turns out, the Amazon States Language provides developers with a flexible and efficient way of treating inputs and outputs. By default, the output of a state machine function becomes the input of the next function. That doesn't exactly work well for us since the SendBySMS and SendByEmail methods must know the user's cell phone number or email address to properly work. An application that would like to use our state machine would have no choice but to pass all these parameters as a single input to our state machine, so how do we go about solving this issue? Fortunately for us, the Amazon States Language has the answer: it allows us to easily append the result of a function to the input it received and forward the concatenated result to the next function. Here's how we achieved this with our GetRestaurants function: "GetRestaurants": { "Type": "Task", "Resource": "arn:aws:lambda:REGION:ACCOUNT_ID:function:FUNCTION_NAME", "ResultPath": "$.restaurants", "Next": "CountItems" } Note the ResultPath attribute above where we instruct Step Functions to append the result of our GetRestaurants task (an array of matching restaurants) to the input it received, whose structure is the test JSON document I mentioned above (duplicated here for reading convenience): { "startsWith": "M", "cuisine": "Italian", "zipcode": "10036", "phoneTo": "+15555555555", "firstnameTo": "Raphael", "emailTo": "raphael@example.com", "subject": "List of restaurants for {{firstnameTo}}" } This input contains all the information my state machine might need, from the search criteria (startsWith, cuisine, and zipcode), to the user's cell phone number (if the state machine ends up using the SMS notification method), first name, email address and email subject (if the state machine ends up using the email notification method). Thanks to the ResultPath attribute we set on the GetRestaurants task, its output has a structure similar to the following JSON document (additional data in bold): { "firstnameTo": "Raphael", "emailTo": "raphael@example.com", "subject": "List of restaurants for {{firstnameTo}}", "restaurants": [ { "address": { "building": "235-237", "street": "West 48 Street" }, "borough": "Manhattan", "name": "La Masseria" }, { "address": { "building": "315", "street": "West 48 Street" }, "borough": "Manhattan", "name": "Maria'S Mont Blanc Restaurant" }, { "address": { "building": "654", "street": "9 Avenue" }, "borough": "Manhattan", "name": "Cara Mia" } ] } As expected, the restaurants sub-document has been properly appended to our original JSON input. That output becomes by default the input for the CountItems method. But, we don't want that function to have any dependency on the input it receives. Since it's a helper function, we might want to use it in another scenario where the input structure is radically different. Once again, the Amazon States Language comes to the rescue with the optional InputPath parameter. Let's take a closer look at our CountItems task declaration in the state machine’s JSON document: "CountItems": { "Type": "Task", "Resource": "arn:aws:lambda:REGION:ACCOUNT_ID:function:FUNCTION_NAME", "InputPath": "$.restaurants", "ResultPath": "$.count", "Next": "NotificationMethodChoice" } By default, the InputPath value is the whole output of the preceding task ( GetRestaurants in our state machine). The Amazon States Language allows you to override this parameter by explicitly setting it to a specific value or sub-document. As you can see in the JSON fragment above, this is exactly what I have done to only pass an array of JSON elements to the CountItems Lambda function (in my case, the array of restaurants we received from our previous GetRestaurants function), thereby making it agnostic to any JSON schema. Conversely, the result of the CountItems task is stored in a new count attribute that serves as the input of the NotificationMethodChoice choice state that follows: "NotificationMethodChoice": { "Type": "Choice", "Choices": [ { "Variable": "$.count", "NumericGreaterThan": 1, "Next": "SendByEmail" }, { "Variable": "$.count", "NumericLessThanEquals": 1, "Next": "SendBySMS" } ], "Default": "SendByEmail" } The logic here is fairly simple: if the restaurants count is greater than one, the state machine will send an email message with a nicely formatted table of the restaurants to the requesting user’s email address. If only one restaurant is returned, we’ll send a text message to the user’s phone number (using Twilio’s SMS API ) since it’s probably faster and more convenient for single row results (especially since the user might be on the move while requesting this piece of information). Note that my JSON "code" actually uses the NumericLessThanEquals operator to trigger the SendBySMS task and not the Equals operator as it really should. So technically speaking, even if no result is returned from the GetRestaurants task, the state machine would still send a text message to the user with no restaurant information whatsoever! I’ll leave it up to you to fix this intentional bug. Next steps In this post, I showed you how to create a state machine that orchestrates calls to various cloud services and APIs using a fictitious restaurant search and notification scenario. I hope you enjoyed this tutorial explaining how to deploy and test that state machine using the AWS console. Last, I went through various design and architecture considerations, with a focus on data flow abilities available in Step Functions. If you haven’t done so already, sign up for MongoDB Atlas and create your free M0 MongoDB cluster in minutes. Next, you can get more familiar with AWS Lambda development and deployment by following our 101 Lambda tutorial . If you already have some experience with AWS Lambda, Developing a Facebook Chatbot with AWS Lambda and MongoDB Atlas will walk through a richer use case. As a last step, you might be interested in Step Functions integration with API Gateway to learn how to call a state machine from an external application. About the Author - Raphael Londner Raphael Londner is a Principal Developer Advocate at MongoDB, focused on cloud technologies such as Amazon Web Services, Microsoft Azure and Google Cloud Engine. Previously he was a developer advocate at Okta as well as a startup entrepreneur in the identity management space. You can follow him on Twitter at @rlondner .

May 17, 2017

Optimizing AWS Lambda With MongoDB Atlas & NodeJS

I attended an AWS user group meeting some time ago, and many of the questions from the audience concerned caching and performance. In this post, I review the performance implications of using Lambda functions with any database-as-a-service (DBaaS) platform (such as MongoDB Atlas ). Based on internal investigations, I offer a specific workaround available for Node.js Lambda functions. Note that other supported languages (such as Python) may only require implementing some parts of the workaround, as the underlying AWS containers may differ in their resource disposal requirements. I will specifically call out below which parts are required for any language and which ones are Node.js-specific. AWS Lambda is serverless, which means that it is essentially stateless. Well, almost. As stated in its developer documentation , AWS Lambda relies on a container technology to execute its functions. This has several implications: The first time your application invokes a Lambda function it will incur a penalty hit in latency – time that is necessary to bootstrap a new container that will run your Lambda code. The definition of "first time" is fuzzy, but word on the street is that you should expect a new container (i.e. a “first-time” event) each time your Lambda function hasn’t been invoked for more than 5 minutes . If your application makes subsequent calls to your Lambda function within 5 minutes , you can expect that the same container will be reused, thus saving some precious initialization time. Note that AWS makes no guarantee it will reuse the container (i.e. you might just get a new one), but experience shows that in many cases, it does manage to reuse existing containers. As mentioned in the How It Works page, any Node.js variable that is declared outside the handler method remains initialized across calls, as long as the same container is reused. Understanding Container Reuse in AWS Lambda , written in 2014, dives a bit deeper into the whole lifecycle of a Lambda function and is an interesting read, though may not reflect more recent architectural changes to the service. Note that AWS makes no guarantee that containers are maintained alive (though in a "frozen" mode) for 5 minutes, so don’t rely on that specific duration in your code. In our very first attempt to build Lambda functions that would run queries against MongoDB Atlas, our database as a service offering, we noticed the performance impact of repeatedly calling the same Lambda function without trying to reuse the MongoDB database connection. The wait time for the Lambda function to complete was around 4-5 seconds, even with the simplest query, which is unacceptable for any real-world operational application. In our subsequent attempts to declare the database connection outside the handler code, we ran into another issue: we had to call db.close() to effectively release the database handle, lest the Lambda function time out without returning to the caller. The AWS Lambda documentation doesn’t explicitly mention this caveat which seems to be language dependent since we couldn’t reproduce it with a Lambda function written in Python. Fortunately, we found out that Lambda’s context object exposes a callbackWaitsForEmptyEventLoop property, that effectively allows a Lambda function to return its result to the caller without requiring that the MongoDB database connection be closed (you can find more information about callbackWaitsForEmptyEventLoop in the Lambda developer documentation ). This allows the Lambda function to reuse a MongoDB Atlas connection across calls, and reduce the execution time to a few milliseconds (instead of a few seconds). In summary, here are the specific steps you should take to optimize the performance of your Lambda function: Declare the MongoDB database connection object outside the handler method, as shown below in Node.js syntax (this step is required for any language, not just Node.js): 'use strict' var MongoClient = require('mongodb').MongoClient; let cachedDb = null; In the handler method, set context.callbackWaitsForEmptyEventLoop to false before attempting to use the MongoDB database connection object (this step is only required for Node.js Lambda functions): exports.handler = (event, context, callback) => { context.callbackWaitsForEmptyEventLoop = false; Try to re-use the database connection object using the MongoDB.connect(Uri) method only if it is not null and db.serverConfig.isConnected() returns true (this step is required for any language, not just Node.js): function connectToDatabase(uri) { if (cachedDb && cachedDb.serverConfig.isConnected()) { console.log('=> using cached database instance'); return Promise.resolve(cachedDb); } const dbName = 'YOUR_DATABASE_NAME'; return MongoClient.connect(uri) .then(client => { cachedDb = client.db(dbName); return cachedDb; }); } Do NOT close the database connection! (so that it can be reused by subsequent calls). The Serverless development with Node.js, AWS Lambda and MongoDB Atlas tutorial post makes use of all these best practices so I recommend that you take the time to read it. The more experienced developers can also find optimized Lambda Node.js functions (with relevant comments) in: a Node.js Lambda function using callbacks a Node.js Lambda function using promises I’d love to hear from you, so if you have any question or feedback, don’t hesitate to leave them below. Additionally, if you’d like to learn more about building serverless applications with MongoDB Atlas, I highly recommend our webinar below where we have an interactive tutorial on serverless architectures with AWS Lambda. Watch Serverless Architectures with AWS Lambda and MongoDB Atlas About the Author - Raphael Londner Raphael Londner is a Principal Developer Advocate at MongoDB, focused on cloud technologies such as Amazon Web Services, Microsoft Azure and Google Cloud Engine. Previously he was a developer advocate at Okta as well as a startup entrepreneur in the identity management space. You can follow him on Twitter at @rlondner . Learn more about using MongoDB with AWS , either self-managed or with our fully-managed database as a service, MongoDB Atlas. You can also check out information about the estimated cost of running MongoDB on AWS with MongoDB Atlas.

April 10, 2017

AWS Step Functions: Integrating MongoDB Atlas, Twillio,& AWS Simple Email Service - Part 1

A few weeks ago, I took a close look at AWS Lambda functions and I hope you've enjoyed that tutorial. I highly suggest that you first read it if you have no previous experience with AWS Lambda, as I will build on that knowledge for the second leg of our road trip into developer-centric services available in Amazon Web Services. Indeed, I'd like to continue our discovery journey by investigating AWS Step Functions . If that name doesn't ring a bell, don't worry: Step Functions were recently introduced at AWS re:Invent 2016 and are still relatively unknown, despite their incredible potential, which I'll try to make apparent in this post. Lambda functions are great, but... If you're building a real cloud native app, you probably already know that your app won't be able to rely on one single Lambda function. As a matter of fact, it will rely on a multitude of functions that interact with multiple systems, such as databases, message queues and even physical or virtual servers (yes, even in a serverless world, there are still servers!). And as the screenshot below tries to depict, your app will need to call them at certain times, in a specific order and under specific conditions. @Copyright Amazon Web Services (from the AWS:reInvent 2016 Step Functions SVR201 session ) Orchestrating all these calls is a tedious endeavor that falls into the developer's lap (you!). Wouldn't it be nice if there was some sort of cloud technology that would help us reliably deal with all that coordination work? Introducing AWS Step Functions In a nutshell, Step Functions are a cloud workflow engine: they aim at solving the issue of orchestrating multiple serverless functions (such as AWS Lambdas) without having to rely on the application itself to perform that orchestration. Essentially, Step Functions allow us to design visual workflows (or "state machines" as AWS refers to) that coordinate the order and conditions in which serverless functions should be called. Surely enough, the concepts of orchestration and workflows have been around for quite some time so there's nothing groundbreaking about them. As a matter of fact, AWS even released its own Simple Workflow Service back in 2012, before serverless had become the cool kid on the block. What's interesting about Step Functions though is that they provide an integrated environment primarily designed to ease the orchestration of AWS Lambda functions. And as Lambda functions become more and more popular, AWS Step Functions turn out to be exactly what we need! So what's a good use case to employ Step Functions? For those of you who have no idea what the screenshot above means, don't worry! In my next post, I'll dive into the technical details of the sample app it's taken from. For now, let's just say that it's the visual representation of the state machine I built. But you may ask yourself: what does this state machine do exactly and why is it relevant to the topic today? Here's the fictitious (but probably not too hypothetical) use case I tried to solve with it: you went to a great italian restaurant in New York but you don't quite remember its exact name. But the food was so delicious you'd really like to go back there! (you might think only Dory - or me - does not to remember an amazing restaurant but I'm sure that happens even to the best of us). Wouldn't it be useful if you could get notified instantly about the possible restaurant matches with their exact names and addresses in your area? Ideally, if there happens to be one match only, you'd like to get the notification via text message (or "SMS" in non-US parlance). But if there are a lot of matches, a text message might be difficult to read, so you'd rather get an email instead with the list of restaurants matching your search. Now, I'm quite sure that service already exists (Yelp, anyone?) but I thought it was a good use case to demonstrate how Step Functions can help you solve a business process requirement, as well as Step Functions’ ability to easily mash up different APIs and services together into one single workflow. How did I go about building such a step function? As I was envisioning this sample app built with AWS Step Functions, I thought about the required components I'd have to build, and then boiled them down to 3 AWS Lambda functions: A GetRestaurants function that queries a collection of restaurants stored in a MongoDB Atlas database. A SendBySMS function that sends a text message using SMS by Twilio if the result of the GetRestaurants query only returns one restaurant. A SendByEmail function that sends an email using AWS Simple Email Service if the GetRestaurants function returns more than one restaurant. If you look closely at the screenshot above, you will probably notice I seemingly forgot a step: there's indeed a fourth Lambda helper function named CountItems whose purpose is simply to count the items returned by the GetRestaurants function and pass that count value on to the NotificationMethodChoice branching logic. Granted, I could have easily merged that helper function into the GetRestaurants function but I chose to leave it because I figured it was a good way to experiment with Step Functions' inputs and outputs flexibility and showcase their power to you (more about this topic in my next post). It's a Step Functions technique I've used extensively to pass my initial input fields down to the latest SendBy* Lambda functions. I hope you liked this short introduction to AWS Step Functions and the use case of the sample app I built to demonstrate its usefulness. You can now read Part 2 here ! Enjoyed this post? Replay our webinar where we have an interactive tutorial on serverless architectures with AWS Lambda. Watch Serverless Architectures with AWS Lambda and MongoDB Atlas About the Author - Raphael Londner Raphael Londner is a Principal Developer Advocate at MongoDB, focused on cloud technologies such as Amazon Web Services, Microsoft Azure and Google Cloud Engine. Previously he was a developer advocate at Okta as well as a startup entrepreneur in the identity management space. You can follow him on Twitter at @rlondner

March 30, 2017

Serverless development with Node.js, AWS Lambda and MongoDB Atlas

The developer landscape has dramatically changed in recent years. It used to be fairly common for us developers to run all of our tools (databases, web servers, development IDEs…) on our own machines, but cloud services such as GitHub , MongoDB Atlas and AWS Lambda are drastically changing the game. They make it increasingly easier for developers to write and run code anywhere and on any device with no (or very few) dependencies. A few years ago, if you crashed your machine, lost it or simply ran out of power, it would have probably taken you a few days before you got a new machine back up and running with everything you need properly set up and configured the way it previously was. With developer tools in the cloud, you can now switch from one laptop to another with minimal disruption. However, it doesn’t mean everything is rosy. Writing and debugging code in the cloud is still challenging; as developers, we know that having a local development environment, although more lightweight, is still very valuable. And that’s exactly what I’ll try to show you in this blog post: how to easily integrate an AWS Lambda Node.js function with a MongoDB database hosted in MongoDB Atlas , the DBaaS (database as a service) for MongoDB. More specifically, we’ll write a simple Lambda function that creates a single document in a collection stored in a MongoDB Atlas database. I’ll guide you through this tutorial step-by-step, and you should be done, and running mongodb on AWS in less than an hour. Let’s start with the necessary requirements to get you up and running: An Amazon Web Services account available with a user having administrative access to the IAM and Lambda services. If you don’t have one yet, sign up for a free AWS account . A local machine with Node.js (I told you we wouldn’t get rid of local dev environments so easily…). We will use Mac OS X in the tutorial below but it should be relatively easy to perform the same tasks on Windows or Linux. A MongoDB Atlas cluster alive and kicking. If you don’t have one yet, sign up for a free MongoDB Atlas account and create a cluster in just a few clicks. You can even try our M0, free cluster tier , perfect for small-scale development projects!). Now that you know about the requirements, let’s talk about the specific steps we’ll take to write, test and deploy our Lambda function: MongoDB Atlas is by default secure, but as application developers, there are steps we should take to ensure that our app complies with least privilege access best practices . Namely, we’ll fine-tune permissions by creating a MongoDB Atlas database user with only read/write access to our app database. We will set up a Node.js project on our local machine, and we’ll make sure we test our lambda code locally end-to-end before deploying it to Amazon Web Services. We will then create our AWS Lambda function and upload our Node.js project to initialize it. Last but not least, we will make some modifications to our Lambda function to encrypt some sensitive data (such as the MongoDB Atlas connection string) and decrypt it from the function code. A short note about VPC Peering I’m not delving into the details of setting up VPC Peering between our MongoDB Atlas cluster and AWS Lambda for 2 reasons: 1) we already have a detailed VPC Peering documentation page and a VPC Peering in Atlas post that I highly recommend and 2) M0 clusters (which I used to build that demo) don’t support VPC Peering . Here’s what happens if you don’t set up VPC Peering though: You will have to add the infamous 0.0.0.0/0 CIDR block to your MongoDB Atlas cluster IP Whitelist because you won’t know which IP address AWS Lambda is using to make calls to your Atlas database. You will be charged for the bandwidth usage between your Lambda function and your Atlas cluster. If you’re only trying to get this demo code to write, these 2 caveats are probably fine, but if you’re planning to deploy a production-ready Lambda-Atlas integration, setting up VPC Peering is a security best practice we highly recommend. M0 is our current free offering; check out our MongoDB Atlas pricing page for the full range of available instance sizes. As a reminder, for development environments and low traffic websites, M0, M10 and M20 instance sizes should be fine. However, for production environments that support high traffic applications or large datasets, M30 or larger instances sizes are recommended. Setting up security in your MongoDB Atlas cluster Making sure that your application complies with least privilege access policies is crucial to protect your data from nefarious threats. This is why we will set up a specific database user that will only have read/write access to our travel database. Let’s see how to achieve this in MongoDB Atlas: On the Clusters page, select the Security tab, and press the Add New User button In the pop-up window that opens, add a user name of your choice (such as lambdauser ): In the User Privileges section, select the Show Advanced Options link. This allows us to assign read/write on a specific database, not any database. You will then have the option to assign more fine-grained access control privileges: In the Select Role dropdown list, select readWrite and fill out the Database field with the name of the database you’ll use to store documents. I have chosen to name it travel . In the Password section, use the Autogenerate Secure Password button (and make a note of the generated password) or set a password of your liking. Then press the Add User button to confirm this user creation. Let’s grab the cluster connection string while we’re at it since we’ll need it to connect to our MongoDB Atlas database in our Lambda code: Assuming you already created a MongoDB Atlas cluster , press the Connect button next to your cluster: Copy the URI Connection String value and store it safely in a text document. We’ll need it later in our code, along with the password you just set. Additionally, if you aren’t using VPC Peering, navigate to the IP Whitelist tab and add the 0.0.0.0/0 CIDR block or press the Allow access from anywhere button. As a reminder, this setting is strongly NOT recommended for production use and potentially leaves your MongoDB Atlas cluster vulnerable to malicious attacks. Create a local Node.js project Though Lambda functions are supported in multiple languages, I have chosen to use Node.js thanks to the growing popularity of JavaScript as a versatile programming language and the tremendous success of the MEAN and MERN stacks (acronyms for M ongoDB, E xpress.js, A ngular/ R eact, N ode.js - check out Andrew Morgan’s excellent developer-focused blog series on this topic). Plus, to be honest, I love the fact it’s an interpreted, lightweight language which doesn’t require heavy development tools and compilers. Time to write some code now, so let’s go ahead and use Node.js as our language of choice for our Lambda function. Start by creating a folder such as lambda-atlas-create-doc mkdir lambda-atlas-create-doc && cd lambda-atlas-create-doc Next, run the following command from a Terminal console to initialize our project with a package.json file npm init You’ll be prompted to configure a few fields. I’ll leave them to your creativity but note that I chose to set the entry point to app.js (instead of the default index.js ) so you might want to do so as well. We’ll need to use the MongoDB Node.js driver so that we can connect to our MongoDB database (on Atlas) from our Lambda function, so let’s go ahead and install it by running the following command from our project root: npm install mongodb --save We’ll also want to write and test our Lambda function locally to speed up development and ease debugging, since instantiating a lambda function every single time in Amazon Web Services isn’t particularly fast (and debugging is virtually non-existent, unless you’re a fan of the console.log() function). I’ve chosen to use the lambda-local package because it provides support for environment variables (which we’ll use later): (sudo) npm install lambda-local -g Create an app.js file. This will be the file that contains our lambda function: touch app.js Now that you have imported all of the required dependencies and created the Lambda code file, open the app.js file in your code editor of choice (Atom, Sublime Text, Visual Studio Code…) and initialize it with the following piece of code: 'use strict' var MongoClient = require('mongodb').MongoClient; let atlas_connection_uri; let cachedDb = null; exports.handler = (event, context, callback) => { var uri = process.env['MONGODB_ATLAS_CLUSTER_URI']; if (atlas_connection_uri != null) { processEvent(event, context, callback); } else { atlas_connection_uri = uri; console.log('the Atlas connection string is ' + atlas_connection_uri); processEvent(event, context, callback); } }; function processEvent(event, context, callback) { console.log('Calling MongoDB Atlas from AWS Lambda with event: ' + JSON.stringify(event)); } Let’s pause a bit and comment the code above, since you might have noticed a few peculiar constructs: The file is written exactly as the Lambda code Amazon Web Services expects (e.g. with an “exports.handler” function). This is because we’re using lambda-local to test our lambda function locally, which conveniently lets us write our code exactly the way AWS Lambda expects it. More about this in a minute. We are declaring the MongoDB Node.js driver that will help us connect to and query our MongoDB database. Note also that we are declaring a cachedDb object OUTSIDE of the handler function. As the name suggests, it's an object that we plan to cache for the duration of the underlying container AWS Lambda instantiates for our function. This allows us to save some precious milliseconds (and even seconds) to create a database connection between Lambda and MongoDB Atlas. For more information, please read my follow-up blog post on how to optimize Lambda performance with MongoDB Atlas . We are using an environment variable called MONGODB_ATLAS_CLUSTER_URI to pass the uri connection string of our Atlas database, mainly for security purposes: we obviously don’t want to hardcode this uri in our function code, along with very sensitive information such as the username and password we use. Since AWS Lambda supports environment variables since November 2016 (as the lambda-local NPM package does), we would be remiss not to use them. The function code looks a bit convoluted with the seemingly useless if-else statement and the processEvent function but it will all become clear when we add decryption routines using AWS Key Management Service (KMS). Indeed, not only do we want to store our MongoDB Atlas connection string in an environment variable, but we also want to encrypt it (using AWS KMS) since it contains highly sensitive data (note that you might incur charges when you use AWS KMS even if you have a free AWS account). Now that we’re done with the code comments, let’s create an event.json file (in the root project directory) and fill it with the following data: { "address" : { "street" : "2 Avenue", "zipcode" : "10075", "building" : "1480", "coord" : [ -73.9557413, 40.7720266 ] }, "borough" : "Manhattan", "cuisine" : "Italian", "grades" : [ { "date" : "2014-10-01T00:00:00Z", "grade" : "A", "score" : 11 }, { "date" : "2014-01-16T00:00:00Z", "grade" : "B", "score" : 17 } ], "name" : "Vella", "restaurant_id" : "41704620" } (in case you’re wondering, that JSON file is what we’ll send to MongoDB Atlas to create our BSON document) Next, make sure that you’re set up properly by running the following command in a Terminal console: lambda-local -l app.js -e event.json -E {\"MONGODB_ATLAS_CLUSTER_URI\":\"mongodb://lambdauser:$PASSWORD@lambdademo-shard-00-00-7xh42.mongodb.net:27017\,lambdademo-shard-00-01-7xh42.mongodb.net:27017\,lambdademo-shard-00-02-7xh42.mongodb.net:27017/$DATABASE?ssl=true\&replicaSet=lambdademo-shard-0\&authSource=admin\"} If you want to test it with your own cluster URI Connection String (as I’m sure you do), don’t forget to escape the double quotes, commas and ampersand characters in the E parameter, otherwise lambda-local will throw an error (you should also replace the $PASSWORD and $DATABASE keywords with your own values). After you run it locally, you should get the following console output: If you get an error, check your connection string and the double quotes/commas/ampersand escaping (as noted above). Now, let’s get down to the meat of our function code by customizing the processEvent() function and adding a createDoc() function: function processEvent(event, context, callback) { console.log('Calling MongoDB Atlas from AWS Lambda with event: ' + JSON.stringify(event)); var jsonContents = JSON.parse(JSON.stringify(event)); //date conversion for grades array if(jsonContents.grades != null) { for(var i = 0, len=jsonContents.grades.length; i < len; i++) { //use the following line if you want to preserve the original dates //jsonContents.grades[i].date = new Date(jsonContents.grades[i].date); //the following line assigns the current date so we can more easily differentiate between similar records jsonContents.grades[i].date = new Date(); } } //the following line is critical for performance reasons to allow re-use of database connections across calls to this Lambda function and avoid closing the database connection. The first call to this lambda function takes about 5 seconds to complete, while subsequent, close calls will only take a few hundred milliseconds. context.callbackWaitsForEmptyEventLoop = false; try { if (cachedDb == null) { console.log('=> connecting to database'); MongoClient.connect(atlas_connection_uri, function (err, client) { cachedDb = client.db('travel'); return createDoc(cachedDb, jsonContents, callback); }); } else { createDoc(cachedDb, jsonContents, callback); } } catch (err) { console.error('an error occurred', err); } } function createDoc (db, json, callback) { db.collection('restaurants').insertOne( json, function(err, result) { if(err!=null) { console.error("an error occurred in createDoc", err); callback(null, JSON.stringify(err)); } else { console.log("Kudos! You just created an entry into the restaurants collection with id: " + result.insertedId); callback(null, "SUCCESS"); } //we don't need to close the connection thanks to context.callbackWaitsForEmptyEventLoop = false (above) //this will let our function re-use the connection on the next called (if it can re-use the same Lambda container) //db.close(); }); }; Note how easy it is to connect to a MongoDB Atlas database and insert a document, as well as the small piece of code I added to translate JSON dates (formatted as ISO-compliant strings) into real JavaScript dates that MongoDB can store as BSON dates . You might also have noticed my performance optimization comments and the call to context.callbackWaitsForEmptyEventLoop = false . If you're interested in understanding what they mean (and I think you should!), please refer to my follow-up blog post on how to optimize Lambda performance with MongoDB Atlas . You’re now ready to fully test your Lambda function locally. Use the same lambda-local command as before and hopefully you’ll get a nice “Kudos” success message: If all went well on your local machine, let’s publish our local Node.js project as a new Lambda function! Create the Lambda function The first step we’ll want to take is to zip our Node.js project, since we won’t write the Lambda code function in the Lambda code editor. Instead, we’ll choose the zip upload method to get our code pushed to AWS Lambda. I’ve used the zip command line tool in a Terminal console, but any method works (as long as you zip the files inside the top folder, not the top folder itself!) : zip -r archive.zip node_modules/ app.js package.json Next, sign in to the AWS Console and navigate to the IAM Roles page and create a role (such as LambdaBasicExecRole ) with the AWSLambdaBasicExecutionRole permission policy: Let’s navigate to the AWS Lambda page now. Click on Get Started Now (if you’ve never created a Lambda function) or on the Create a Lambda function button. We’re not going to use any blueprint and won’t configure any trigger either, so select Configure function directly in the left navigation bar: In the Configure function page, enter a Name for your function (such as MongoDB_Atlas_CreateDoc ). The runtime is automatically set to Node.js 4.3 , which is perfect for us, since that’s the language we’ll use. In the Code entry type list, select Upload a .ZIP file , as shown in the screenshot below: Click on the Upload button and select the zipped Node.js project file you previously created. In the Lambda function handler and role section, modify the Handler field value to app.handler (why? here’s a hint: I’ve used an app.js file, not an index.js file for my Lambda function code...) and choose the existing LambdaBasicExecRole role we just created: In the Advanced Settings section, you might want to increase the Timeout value to 5 or 10 seconds, but that’s always something you can adjust later on. Leave the VPC and KMS key fields to their default value (unless you want to use a VPC and/or a KMS key) and press Next . Last, review your Lambda function and press Create function at the bottom. Congratulations, your Lambda function is live and you should see a page similar to the following screenshot: But do you remember our use of environment variables? Now is the time to configure them and use the AWS Key Management Service to secure them! Configure and secure your Lambda environment variables Scroll down in the Code tab of your Lambda function and create an environment variable named MONGODB_ATLAS_CLUSTER_URI and set it to your Atlas cluster URI value. At this point, you could press the Save and test button at the top of the page, but for additional (and recommended) security, we’ll encrypt that connection string. Check the Enable encryption helpers check box and if you already created an encryption key, select it (otherwise, you might have to create one - it’s fairly easy): Next, select the Encrypt button for the MONGODB_ATLAS_CLUSTER_URI variable: Back in the inline code editor, add the following line at the top: const AWS = require('aws-sdk'); and replace the contents of the “else” statement in the “exports.handler” method with the following code: const kms = new AWS.KMS(); kms.decrypt({ CiphertextBlob: new Buffer(uri, 'base64') }, (err, data) => { if (err) { console.log('Decrypt error:', err); return callback(err); } atlas_connection_uri = data.Plaintext.toString('ascii'); processEvent(event, context, callback); }); (hopefully the convoluted code we originally wrote makes sense now!) If you want to check the whole function code I’ve used, check out the following Gist . And for the Git fans, the full Node.js project source code is also available on GitHub . Now press the Save and test button and in the Input test event text editor, paste the content of our event.json file: Scroll and press the Save and test button. If you configured everything properly, you should receive the following success message in the Lambda Log output: Kudos! You can savor your success a few minutes before reading on. What’s next? I hope this AWS Lambda-MongoDB Atlas integration tutorial provides you with the right steps for getting started in your first Lambda project. You should now be able to write and test a Lambda function locally and store sensitive data (such as your MongoDB Atlas connection string) securely in AWS KMS. So what can you do next? If you don’t have a MongoDB Atlas account yet, it’s not too late to create one ! If you’re not familiar with the MongoDB Node.js driver , check out our Node.js driver documentation to understand how to make the most of the MongoDB API. Additionally, we also offer an online Node.js course for the Node.js developers who are getting started with MongoDB. Learn how to visualize the data you created with your Lambda function, download MongoDB Compass and read Visualizing your data with MongoDB Compass to learn how to connect it to MongoDB Atlas. Planning to build a lot of Lambda functions? Learn how to orchestrate them with AWS Step Functions by reading our Integrating MongoDB Atlas, Twilio and AWS Simple Email Service with AWS Step Functions post . Learn how to integrate MongoDB and AWS Lambda in a more complex scenario, check out our more advanced blog post: Developing a Facebook Chatbot with AWS Lambda and MongoDB Atlas . Learn more about running MongoDB on AWS And of course, don’t hesitate to ask us any questions or leave your feedback in a comment below. Happy coding! Enjoyed this post? Replay our webinar where we have an interactive tutorial on serverless architectures with AWS Lambda. Watch Serverless Architectures with AWS Lambda and MongoDB Atlas About the Author - Raphael Londner Raphael Londner is a Principal Developer Advocate at MongoDB, focused on cloud technologies such as Amazon Web Services, Microsoft Azure and Google Cloud Engine. Previously he was a developer advocate at Okta as well as a startup entrepreneur in the identity management space. You can follow him on Twitter at @rlondner

March 8, 2017