GIANT Stories at MongoDB

MongoDB Hackathon Guide

Michael Lynn

hackathon

This guide was created to help you through the process of leveraging MongoDB as part of your hackathon project.

Connecting MongoDB Stitch to Google Places

One of the services that makes a wealth of data available via API is Google Places.

Imagine we want to provide users of our application with information about a business with which we partner. Insurance companies do this with providers of accommodations, transportation, and healthcare. We don’t want to maintain or own this information; rather, we’d prefer to leverage a service that provides it. Google Places is just such a service.

For this application, we’ll use the following Stitch components to integrate MongoDB with Google Places.

Stitch Functions

Stitch Functions are written in JavaScript (ES6) and can be called from our SDKs, Triggers, or Webhooks. They are great for coordinating data access or doing light processing alongside a query or insert. Communicating with a data provider service such as Google Places is as simple as leveraging an HTTP service within a serverless Stitch Function:

    
const http = context.services.get("GooglePlaces");
// GooglePlacesSearchURL is the Places API request URL built earlier in the function.
return http
  .get({ url: GooglePlacesSearchURL })
  .then(resp => {
    // The response body is encoded as raw BSON.Binary. Parse it to JSON.
    const search_result = EJSON.parse(resp.body.text());
    return search_result;
  });

Stitch’s Functions also let you reference context – such as services, values, or user information – making it easier to leverage services and information across your entire application. Stitch also provides several third-party services, including AWS, Twilio, and GitHub.

Stitch Services

The HTTP service that we create here will also have an incoming webhook, meaning that it can make outgoing HTTP requests from within Stitch Functions and also handle incoming HTTP requests.
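As a rough sketch (the response fields here are illustrative, not taken from the original project), an incoming webhook function receives a payload object describing the HTTP request and returns whatever should be sent back to the caller:

exports = function(payload) {
  // The incoming request body arrives as raw binary; parse it into a JSON object.
  const body = EJSON.parse(payload.body.text());

  // Hypothetical handling: acknowledge the business name we were asked about.
  return { received: true, business: body.name };
};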

Stitch Trigger

Stitch Triggers enable reactivity to inserts, updates, deletes, and replaces that occur in the database. In our case, an insert will trigger execution of a function.

Figure 1. Trigger Configuration

Building Your Application

Let’s take a look at how all the pieces of this application fit together –

Figure 2. Stitch Architectural Diagram
  1. In step 1, an application accepts input either from a user or as a result of some action the user performed (using geofencing, for example). The input, in our case, will be the name of a business. The application inserts a document containing the business name into MongoDB.
  2. In step 2, the trigger fires automatically because we configured it to watch for inserts or updates to our collection.
  3. In step 3, the trigger executes a custom function called getGooglePlaceInfo, capturing and forwarding the entire inserted document.
  4. In step 4, the function invokes the HTTP webhook we created. The webhook conducts the conversation between Google Places and Stitch.
  5. In step 5, Google Places responds with a JSON document containing the requested information.

In step 6, the function catches this JSON response and updates the MongoDB document. The function can also manipulate the data before writing it, allowing it to meet your project requirements (format, types, calculations). For example, the function may create a new GeoJSON object from the Google coordinates.
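To make the flow concrete, here is a minimal sketch of what a function like getGooglePlaceInfo might look like. The database and collection names, the helper function fetchPlaceDetails, and the assumed shape of the Places response are illustrative assumptions, not the exact code from the repository linked below:

exports = function(changeEvent) {
  const doc = changeEvent.fullDocument;
  const mongodb = context.services.get("mongodb-atlas");
  const businesses = mongodb.db("placesdb").collection("businesses");   // assumed names

  // Delegate the Google Places HTTP conversation to another function (assumed helper).
  return context.functions.execute("fetchPlaceDetails", doc.name).then(place => {
    // Build a GeoJSON point from the Google coordinates (assumed response shape).
    const location = {
      type: "Point",
      coordinates: [place.geometry.location.lng, place.geometry.location.lat]
    };
    // Enrich the originally inserted document with the Places data.
    return businesses.updateOne(
      { _id: doc._id },
      { $set: { details: place, location: location } }
    );
  });
};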

In Conclusion

We’ve taken a very brief look at how leveraging MongoDB Atlas, Stitch, and Triggers in conjunction with a data API service such as Google Places transforms applications into intelligent apps users will truly love. Because we’re adding data without having to bother the user, the application becomes much more usable and much more valuable. MongoDB Stitch and Triggers give your application the ability to react to changes in the database and then leverage integrations with external services to fetch in-context data that further enriches your application’s data.

Without MongoDB Stitch, a developer would have had to build an application server and contend with management, availability, scalability, and backup and restoration of the data.

Oh, and did we mention that Stitch provides other benefits as well? It leverages Atlas security and adds third-party authentication and granular, field-level access controls to MongoDB data. This lets users retrieve data from anywhere, without developers having to create, secure, and maintain REST APIs from scratch.

The content described in this blog article is publicly available and can be found here: https://github.com/julienmongodb/mongodb-googleplaces

Introduction to Serverless Functions in MongoDB Stitch

Michael Lynn

Serverless and Functions as a Service are relatively new development paradigms that are gaining in popularity. Let's take a quick look at what MongoDB Stitch offers in this space.

Functions in Stitch are written in JavaScript (ES6) and can be called from other scripts in Stitch, or from external JavaScript leveraging the client SDK. Here's an example of a function in Stitch:

exports = function(message) {
  const mongodb = context.services.get("mongodb-atlas");
  const coll = mongodb.db("db").collection("users");
  const twilio = context.services.get("my-twilio-service");
  const yourTwilioNumber = context.values.get("twilioNumber");
  // Send the message to every user's phone number via the Twilio service.
  return coll.find().toArray().then(users =>
    users.forEach(user => twilio.send({
      to: user.phone,
      from: yourTwilioNumber,
      body: message
    }))
  );
};

A few things might stand out about the structure and content of this script. Let's take a closer look at some of these components.

The first thing you'll notice is exports. All Stitch functions run the JavaScript function assigned to the global variable exports. This is similar in theory and practice to the Node.js module.exports object.

Next, you're likely to see references to context. Access to resources from within Stitch scripts is facilitated through the use of a context object. This object has several elements:

context.services

context.services gives you access to pre-configured services. There is one built-in service called "mongodb-atlas", which provides access to the underlying database within Atlas. Other services can be instantiated for third-party providers including GitHub, AWS, Twilio, and more. These additional services are accessed using the name you provide during configuration.

const db = context.services.get("mongodb-atlas").db("ecommerce");

Notice how easily I'm able to begin accessing my database and collection. Similarly, we can configure access to third-party services such as Twilio, as shown in the example above. Values (covered below under context.values) are named constants, like global variables, available to all functions in a Stitch application.

context.functions

context.functions enables you to reference other serverless functions written and hosted in Stitch.

exports = function(a, b) {
 return context.functions.execute("sum", a, -1 * b);
};

context.users

context.users provides a view of the currently authenticated user.

{
  "id": "5a01f135b6fc810a19421c12",
  "type": "server",
  "data": {
    "name": "api-key"
  },
  "identities": [{
    "id": "5a09f135b6fc810f19421c13",
    "provider_type": "api-key"
  }]
}

context.values

Values are named constants that you can use in MongoDB Stitch functions and rules. To access a value in functions, use the context.values variable.

exports = function() {
    return context.values.get("test");
}

Utilities

Lastly, one resource available to developers of serverless functions in Stitch bears mentioning even though it isn't shown in the example function: the set of utilities exposed as methods in Stitch Functions. Several pre-imported JavaScript libraries are available to your serverless functions:

  • console can be used to output to the debug console and console logs, e.g. console.log("Hello!")
  • JSON can be used to convert between string and object representations of standard JSON.
  • EJSON can be used to convert between string and object representations of MongoDB Extended JSON.
  • BSON can be used to construct and manipulate BSON types.
  • utils.crypto provides methods for working with cryptographic algorithms.
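For instance, here is a small illustrative snippet (not from the original article) contrasting JSON and EJSON inside a Stitch Function:

exports = function() {
  // Standard JSON: numeric values come back as plain JavaScript numbers.
  const plain = JSON.parse('{"qty": 5}');

  // MongoDB Extended JSON: type annotations such as $numberInt are understood.
  const extended = EJSON.parse('{"qty": {"$numberInt": "5"}}');

  return { plainQty: plain.qty, extendedQty: extended.qty };
};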

Utilities and context objects, leveraged as part of your application, make it possible to create rich, powerfully integrated applications without a lot of boilerplate setup code.

Developing applications with serverless is a bit different, but once you get used to the tools available, it is extremely powerful and helps you create better applications faster. That's a huge benefit of using MongoDB Stitch in your development cycle.

To read more, and get started, check out the documentation, sign up for a free Atlas account and begin writing a serverless application today.

Welcome to Hacktoberfest 2018!

Hacktoberfest is a month-long celebration of open source software, started originally by our friends at DigitalOcean, and held in partnership with GitHub and Twilio.

Creating a Data Enabled API in 10 Minutes with MongoDB Stitch

Michael Lynn

api

Creating an API that exposes data doesn’t have to be complicated. With MongoDB Stitch, you can create a data enabled endpoint in about 10 minutes or less.

At the heart of the entire process are MongoDB Stitch’s Services. There are several to choose from; to create a data-enabled endpoint, you’ll choose the HTTP Service with a webhook.

Adding a Stitch Service

When you create an HTTP Service, you’re enabling access to this service from Stitch’s serverless functions in the form of an object called context.services. More on that later when we create a serverless function attached to this service.

Name and add the service, and you’ll then get to create an “Incoming Webhook”. This is the process that will be contacted when your clients request data from your API.

Call the webhook whatever you like, and set the parameters as you see below:

We’ll create this API to respond with results to GET requests. Next up, you’ll get to create the logic in a function that will be executed whenever your API is contacted with a GET request.

Defining the Function

Before we modify this script to return data, let’s take a look at the Settings tab — this is where you’ll find the URL where your clients will reach your API.

That’s it — you’ve configured your API. It won’t do anything interesting yet; in fact, the default responds to requests with “Hello World”. Let’s add some data.

Assuming we have a database called mydatabase and a collection of contact data called mycollection, let’s write a function for our service:

Creating a function to return data from a collection in MongoDB Stitch

And here’s the source:

exports = function(payload) {
  const mongodb = context.services.get("mongodb-atlas");
  const mycollection = mongodb.db("mydatabase").collection("mycollection");
  return mycollection.find({}).toArray();
};

This returns all documents in the collection whenever a client calls the webhook URL associated with our HTTP Service. That’s it.
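As a possible next step, here is a hedged variation (not part of the original walkthrough) that filters the results using a query-string parameter; the name parameter is purely illustrative:

exports = function(payload) {
  const mongodb = context.services.get("mongodb-atlas");
  const mycollection = mongodb.db("mydatabase").collection("mycollection");

  // payload.query holds the request's query-string parameters,
  // e.g. GET <webhook-url>?name=Jane returns only matching contacts.
  const filter = payload.query && payload.query.name
    ? { name: payload.query.name }
    : {};

  return mycollection.find(filter).toArray();
};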

Let’s use Postman to show how this works. Grab your API Endpoint URL from the service settings screen. Mine is as follows — yours will differ.


https://webhooks.mongodb-stitch.com/api/client/v2.0/app/devrel-mrmrq/service/api/incoming_webhook/webhook0

Paste that into the GET URL field and hit Send; you should see something similar to the following:

Check out the GitHub repository to review the code and try it yourself, and watch the screencast where I create a data-enabled API in 10 minutes with MongoDB Stitch.

Want to try this for yourself? Sign up for a free MongoDB Atlas account. Looking to leverage an API for integration with MongoDB? Read Andrew Morgan’s article on Building a REST API with MongoDB Stitch.


Hacking for Resilience with MongoDB Stitch at PennApps XVIII

Michael Lynn

Hosted and run by students at the University of Pennsylvania, PennApps is billed as “The original hackathon.” The eighteenth iteration of the nation's first college hackathon kicked off on Friday, September 7th at 7:30 pm, with participants hacking away until Sunday, September 9th at 8:00 am.

MongoDB was a technology choice for many of the hackathon teams, and as the weekend progressed, participants leveraging MongoDB stopped by to share details of their projects.

One application that stood out immediately was pitched by its team as a “100% offline communication app” called Babble. The trio from Carnegie Mellon University spoke enthusiastically about the app they were developing.

“Babble will be the world’s first chat platform that is able to be installed, setup, and used 100% offline,” said Manny Eppinger, a Junior studying CS at CMU.

The Babble development team
From left to right: Manny Eppinger, Michael Lynn (MongoDB), Conlon Novak, and Aneek Mukerjee

In keeping with the PennApps XVIII theme of “HACK-FOR-RESILIENCE”, a critical design goal of Babble is to be able to support 100% offline utilization including application installation via near-field communication (NFC).

Imagine you’re in the midst of a disaster scenario and the internet infrastructure is damaged or severely degraded. Communication into and out of these areas is absolutely critical. Babble asks the questions:

  • What if you didn’t have to rely on that infrastructure to communicate?
  • What if you could rely on what you do have -- people, cell phones, and physical proximity?

Working in a peer-to-peer model, each Babble user’s device keeps a localized ledger of all messages that it has sent and received, as well as all of the ledgers of each device that this instance of Babble has been connected directly to via Android Nearby Connections.

The team leveraged MongoDB Stitch and MongoDB Mobile, now in beta, to ensure that the app captures and stores chats and communication from its users and, when a connection becomes available, automatically syncs with the online version of the database.

Babble Stitch Diagram

As hackathon mentors and judges for the event, my team and I were so impressed with the team's vision and innovation that we chose them as recipients of the Best Use of MongoDB Stitch award, which includes a prize package valued at $500.

Whether you’re a student hacker, or an engineer simply looking to get your brilliant app idea off the ground, I’d strongly encourage you to take a look at MongoDB Atlas, MongoDB Stitch, and MongoDB Mobile to help you accelerate your innovation cycle and reduce the amount of time you need to spend building and managing servers and replicating boilerplate code.

Check out project Babble on Devpost.
Are you a developer, advocate or similar with a combination of excellent coding and communication skills and a passion for helping other developers be awesome? We’re hiring at MongoDB and we’d love to talk with you.

Integrating MongoDB and Amazon Kinesis for Intelligent, Durable Streams

You can build your online, operational workloads atop MongoDB and still respond to events in real time by kicking off Amazon Kinesis stream processing actions, using MongoDB Stitch Triggers.

Let’s look at an example scenario in which a stream of data is being generated as a result of actions users take on a website. We’ll durably store the data and simultaneously feed a Kinesis process to do streaming analytics on something like cart abandonment, product recommendations, or even credit card fraud detection.

We’ll do this by setting up a Stitch Trigger. When relevant data updates are made in MongoDB, the trigger will use a Stitch Function to call out to AWS Kinesis, as you can see in this architecture diagram:

Figure 1. Architecture Diagram

What you’ll need to follow along:

  1. An Atlas instance
    If you don’t already have an application running on Atlas, you can follow our getting started with Atlas guide here. In this example, we’ll be using a database called streamdata, with a collection called clickdata where we’re writing data from our web-based e-commerce application.
  2. An AWS account and a Kinesis stream
    In this example, we’ll use a Kinesis stream to send data downstream to additional applications such as Kinesis Analytics. This is the stream we want to feed our updates into.
  3. A Stitch application
    If you don’t already have a Stitch application, log into Atlas, and click Stitch Apps from the navigation on the left, then click Create New Application.

Create a Collection

The first step is to create a database and collection from the Stitch application console. Click Rules from the left navigation menu and click the Add Collection button. Type streamdata for the database and clickdata for the collection name. Select the template labeled Users can only read and write their own data and provide a field name where we’ll specify the user id.

Figure 2. Create a collection

Configuring Stitch to talk to AWS

Stitch lets you configure Services to interact with external services such as AWS Kinesis. Choose Services from the navigation on the left, click the Add a Service button, select the AWS service, and set your AWS Access Key ID and Secret Access Key.

Figure 3. Service Configuration in Stitch

Services use Rules to specify what aspect of a service Stitch can use, and how. Add a rule that will enable the service to communicate with Kinesis by clicking the button labeled NEW RULE. Name the rule “kinesis”, as we’ll be using this specific rule to enable communication with AWS Kinesis. In the section marked Action, select the API labeled Kinesis and select All Actions.

Figure 4. Add a rule to enable integration with Kinesis

Write a function that uses the service to stream documents into Kinesis

Now that we have a working AWS service, we can use it to put records into a Kinesis stream. The way we do that in Stitch is with Functions. Let’s set up a putKinesisRecord function.

Select Functions from the left-hand menu, and click Create New Function. Provide a name for the function and paste the following in the body of the function.

exports = function(event) {
  const awsService = context.services.get('aws');
  try {
    // Put the full document from the trigger event onto the Kinesis stream.
    return awsService.kinesis().PutRecord({
      Data: JSON.stringify(event.fullDocument),
      StreamName: "stitchStream",
      PartitionKey: "1"
    }).then(function(response) {
      return response;
    });
  } catch (error) {
    console.log(JSON.stringify(error));
  }
};
Figure 5. Example Function - putKinesisRecord

Test out the function

Let’s make sure everything is working by calling the function manually. From the Function Editor, click Console to view Stitch’s interactive JavaScript console.

Functions called from Triggers require an event. To test execution of our function, we’ll need to pass a dummy event to it. Creating variables from the Stitch console is straightforward: simply set the value of a variable to a JSON document. For our example, use the following:

event = {
   "operationType": "replace",
   "fullDocument": {
       "color": "black",
       "inventory": {
           "$numberInt": "1"
       },
       "overview": "test document",
       "price": {
           "$numberDecimal": "123"
       },
       "type": "backpack"
   },
   "ns": {
       "db": "streamdata",
       "coll": "clickdata"
   }
}
exports(event);

Paste the above into the console and click the button labeled Run Function As. Select a user and the function will execute.

Ta-da!

Putting it together with Stitch Triggers

We’ve got our MongoDB collection living in Atlas, receiving events from our web app. We’ve got our Kinesis stream ready for data. We’ve got a Stitch Function that can put data into a Kinesis stream.

Configuring Stitch Triggers is so simple it’s almost anticlimactic. Click Triggers from the left navigation, name your trigger, provide the database and collection context, and select the database events Stitch will react to by executing a function.

For the database and collection, use the names from step one. Now we’ll set the operations we want to watch with our trigger. (Some triggers might care about all of them – inserts, updates, deletes, and replacements – while others can be more efficient because they logically only matter for some of those.) In our case, we’re going to watch for insert, update, and replace operations.

Now we specify our putKinesisRecord function as the linked function, and we’re done.

Figure 6. Trigger Configuration in Stitch

As part of trigger execution, Stitch forwards details associated with the trigger event, including the full document involved in the event (i.e., the newly inserted, updated, or deleted document from the collection). This is where we can evaluate some condition or attribute of the incoming document and decide whether or not to put the record onto the stream.
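For example, here is a minimal, hypothetical sketch of a trigger function that only streams certain documents; the type check and the delegation to the putKinesisRecord function are assumptions made for illustration:

exports = function(event) {
  const doc = event.fullDocument;

  // Only forward documents we care about (illustrative condition).
  if (doc && doc.type === "backpack") {
    return context.functions.execute("putKinesisRecord", event);
  }

  // Otherwise, skip streaming this event.
  return null;
};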

Test the trigger!

Amazon provides a dashboard which will enable you to view details associated with the data coming into your stream.

Figure 7. Kinesis Stream Monitoring

As you execute the function from within Stitch, you’ll begin to see the data entering the Kinesis stream.

Building some more functionality

So far our trigger is pretty basic – it watches a collection and when any updates or inserts happen, it feeds the entire document to our Kinesis stream. From here we can build out some more intelligent functionality. To wrap up this post, let’s look at what we can do with the data once it’s been durably stored in MongoDB and placed into a stream.

Once the record is in the Kinesis Stream you can configure additional services downstream to act on the data. A common use case incorporates Amazon Kinesis Data Analytics to perform analytics on the streaming data. Amazon Kinesis Data Analytics offers pre-configured templates to accomplish things like anomaly detection, simple alerts, aggregations, and more.

For example, our stream of data will contain orders resulting from purchases. These orders may originate from point-of-sale systems, as well as from our web-based e-commerce application. Kinesis Analytics can be leveraged to create applications that process the incoming stream of data. For our example, we could build a machine learning algorithm to detect anomalies in the data or create a product performance leaderboard from a sliding, or tumbling window of data from our stream.

Figure 8. Amazon Data Analytics - Anomaly Detection Example

Wrapping up

Now you can connect MongoDB to Kinesis. From here, you’re able to leverage any one of the many services offered from Amazon Web Services to build on your application. In our next article in the series, we’ll focus on getting the data back from Kinesis into MongoDB. In the meantime, let us know what you’re building with Atlas, Stitch, and Kinesis!

Resources

MongoDB Atlas

MongoDB Stitch

Amazon Kinesis

Charting a Course to MongoDB Atlas: Part 1 - Preparing for the Journey

Michael Lynn

Cloud

MongoDB Atlas is an automated cloud MongoDB service engineered and run by the same team that builds the database. It incorporates operational best practices we’ve learned from optimizing thousands of deployments across startups and the Fortune 100. You can build on MongoDB Atlas with confidence, knowing you no longer need to worry about database management, setup and configuration, software patching, monitoring, backups, or operating a reliable, distributed database cluster.

MongoDB introduced its Database as a Service offering in July of 2016, and it’s been a phenomenal success since its launch. Since then, thousands of customers have deployed highly secure, scalable, and performant MongoDB databases using the service. Among its most compelling features are the ability to deploy replica sets in any of the major cloud hosting providers (AWS, Azure, GCP) and the ability to deploy database clusters spanning multiple cloud regions. In this series, I’ll explain the steps you can follow to migrate data from your existing MongoDB database into MongoDB Atlas.

Preparing for the Journey

Before you embark on any journey regardless of the destination, it’s always a good idea to take some time to prepare. As part of this preparation, we’ll review some options for the journey — methods to get your data migrated into MongoDB Atlas — along with some best practices and potential wrong turns to watch out for along the way.

Let’s get a bit more specific about the assumptions I’ve made in this article.

  • You have data that you want to host in MongoDB Atlas.
    • There’s probably no point in continuing from here if you don’t want to end up with your data in MongoDB Atlas.
  • Your data is currently in a MongoDB database.
    • If you have data in some other format, all is not lost — we can help. However, we’re going to address a MongoDB to MongoDB migration in this series. If you have other requirements -- data in another database or another format, for example -- let me know you’d like an article covering migration from some other database to MongoDB and I’ll make that the subject of a future series.
  • Your current MongoDB database is running MongoDB version 3.0 or greater. MongoDB Atlas supports versions 3.4 and 3.6, so if your deployment runs an older version, we’ll need to get your database upgraded either as part of the migration or ahead of it. We have articles and documentation designed to help you upgrade your MongoDB instances should you need them.
  • Your data is in a clustered deployment (Sharded or Replica Set). We’ll cover converting a standalone deployment to a replica set in part 3 of this series.

At a high level, there are 4 basic steps to migrating your data. Let’s take a closer look at the journey:

  1. Deploy a Destination Cluster in MongoDB Atlas
  2. Prepare for the Journey
  3. Migrate the databases
  4. Cutover and Modify Your Applications to use the new MongoDB Atlas-based Deployment

As we approach the journey, it's important to know the various routes from your starting point to your eventual destination. Each route has its own considerations and benefits, and the choice of route is ultimately up to you. Review the following table, which presents the available data migration methods.

Live Import
  Description: Fully automated via the Atlas administrative console.
  Considerations: Downtime is minimal (cutover only).
  Benefits: Fully automated.
  Version notes: From versions 2.6, 3.0, 3.2, or 3.4 to 3.4 or 3.6.

mongomirror
  Description: mongomirror is a utility for migrating data from an existing MongoDB replica set to a MongoDB Atlas replica set. mongomirror does not require you to shut down your existing replica set or applications.
  Considerations: Downtime is minimal (cutover only).
  Version notes: From version 2.6 or greater to 3.4 or 3.6.

mongorestore
  Description: mongorestore is a command-line utility program that loads data from either a binary database dump created by mongodump or the standard input.
  Considerations: Downtime required.
  Version notes: From version 2.6 or greater to 3.4 or 3.6.

mongoimport
  Description: the mongoimport tool imports content from an Extended JSON, CSV, or TSV export created by mongoexport, or potentially another third-party export tool.
  Considerations: Downtime required.

For the majority of deployments, Live Import is the best, most efficient route to get your data into MongoDB Atlas. It lets you keep your existing cluster up and active (but not too active; see the considerations). There are caveats, however: if you’re not located in a region that is geographically close to the AWS us-east-1 datacenter, for example, you may encounter unacceptable latency. There are a number of concerns you should weigh prior to embarking on your migration journey. The following section offers some route guidance to ensure that you’re going in the right direction and moving steadily toward your destination.

Route Guidance for the Migration Journey

If you've made it this far, you’re likely getting ready to embark on a journey that will bring your data into a robust, secure, and scalable environment within MongoDB Atlas. The potential to encounter challenges along the way is real and the likelihood of encountering difficulties depends primarily upon your starting point in that journey. In this section, I’ll discuss some potential issues you may encounter as you prepare for your migration journey. A summary of the potential detours and guidance for each is presented in the following table.

Follow the links in the table to read more about each potential detour and its relevant guidance:

Insufficient RAM on Destination Cluster
  Guidance: Calculate the RAM required for your application and increase that to account for the migration process requirements.
  Reference: How do I calculate how much RAM I need for my application?

Too Much Network Latency Between Source and Destination
  Guidance: Reduce latency, or leverage mongodump/mongorestore instead of Live Import.

Insufficient Network Access due to Missing IP Whitelist or Firewall Rules
  Guidance: Ensure that the MongoDB Live Import application servers are whitelisted and that corporate firewalls permit access between source and destination.

Insufficient User Access Permissions to Source Database Deployment
  Guidance: Ensure that authentication is enabled and that the user credentials granted for the source database have the required entitlements.

Insufficient Oplog Size on Destination
  Guidance: Size the operations log appropriately based on the application workload.
  Reference: Sizing the Operations Log

Potential Detour: Insufficient RAM on Destination Cluster

Every deployment of MongoDB requires some form of resource to run efficiently. These resource requirements will include things like RAM, CPU, Disk and Network. To ensure acceptable response times and performance of the database, we typically look to the application’s read/write profile to inform the decisions we make about the amounts and sizes of each of these resources we’ll need for our deployment.

The amount of RAM a deployment requires is largely informed by the application's demand for data in the database. To approximate RAM requirements, we typically look at the frequently accessed documents in each collection, add up their total data size, and then increase that by the total size of the required indexes. Referred to as the working set, this is approximately the amount of RAM we’ll want our deployment to have. A more complete discussion of sizing can be found in the documentation pages on sizing for MongoDB.
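As a rough, illustrative sketch (assuming most documents in a hypothetical orders collection are frequently accessed), you could estimate a collection's contribution to the working set from the mongo shell using its data and index sizes:

// Run in the mongo shell against the source deployment.
const stats = db.orders.stats();

// Data size of the documents plus the total size of all indexes on the collection.
const workingSetBytes = stats.size + stats.totalIndexSize;

print("Approximate working set: " +
      (workingSetBytes / (1024 * 1024 * 1024)).toFixed(2) + " GB");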

Sizing is a tricky task, especially for the cost-constrained. We obviously don’t want to waste money by over-provisioning servers larger than those we need to support the profile of our users and applications. However, during the migration we not only need to account for the application requirements -- we also need to account for the resources required by the migration process itself. Therefore, you will want to ensure that you surpass the requirements of your production implementation when sizing your destination cluster.

Route Guidance: Increase available RAM During Migration

The size of the destination cluster should provide adequate resources across storage, CPU, and memory, with room to spare. The migration process requires additional CPU and memory while the destination database is being built from the source. It’s quite common for incoming clusters to be undersized, causing the migration process to fail. If this happens during a migration, you must empty the destination cluster and resize it to a larger M-value to increase the amount of available RAM. A great feature of Atlas is that resizing, in both directions, is extremely easy to do. Whether you’re adding resources (increasing the amount of RAM, disk, CPU, shards, etc.) or decreasing them, the process is very simple. Therefore, increasing the resources available in your target environment is painless, and once the migration completes, you can simply scale back down to a cluster size with less RAM and CPU.


Potential Detour: Network Latency

Latency is defined as the amount of time it takes for a packet of data to get from one designated point to another. Because the migration process is all about moving packets of data between servers, it is by its very nature latency-sensitive.

Migrating data into MongoDB Atlas leveraging the Live Import capability involves connecting your source MongoDB instance to a set of application servers running in the AWS us-east-1 region. These servers act as the conductors running the actual migration process between your source and destination MongoDB database servers. A potential detour can crop up when your source MongoDB deployment exists in a datacenter located far from the AWS us-east-1 region.

Route Guidance: Reduce latency if possible or use mongomirror instead of Live Import

Should your source MongoDB database servers exist in regions far from these application servers, you may need to leverage mongomirror or mongodump/mongorestore rather than Live Import.


Potential Detour: Network Access

In order to accomplish a migration using Live Import, Atlas streams data through a set of MongoDB-controlled application servers. Atlas provides the IP address ranges of the MongoDB Live Import servers during the Live Import process. You must be certain to add these IP address ranges to the IP whitelist for your destination cluster.

The migration processes within Atlas run on a set of application servers; these are the traffic directors. The following is a list of the IP address ranges used by these application servers. It is important to ensure that traffic between these servers, your source cluster, and the destination cluster is able to flow freely. The addresses are in CIDR notation.

  • 4.71.186.128/25
  • 4.35.16.128/25
  • 52.72.201.163/32
  • 34.196.196.255/32

An additional area where a detour may be encountered is in the realm of corporate firewall policy.

To avoid these potential detours, ensure that you have the appropriate connectivity from the networks where your source deployment resides to the networks where MongoDB Atlas exists.

Route Guidance: Whitelist the IP Ranges of the MongoDB Live Import Process

These IP ranges will be provided at the start of the migration process. Ensure that you configure the whitelist to enable appropriate access during the migration.


Potential Detour: Insufficient User Rights on Source Deployment

Every deployment of MongoDB should enforce authentication. This will ensure that only appropriate individuals and applications may access your MongoDB data.

A potential detour may arise when you attempt to Live Migrate a database without creating or providing the appropriately privileged user credentials.

If the source cluster enforces authentication, create a user with the following privileges:

  • Read all databases and collections (i.e. readAnyDatabase on the admin database)
  • Read the oplog.

Route Guidance: Ensure Appropriate User Access Permissions on the Source Deployment

Create a SCRAM user and password on each server in the replica set and ensure that this user belongs to roles that have the following permissions:

  • Read and write to the config database.
  • Read all databases and collections.
  • Read the oplog.

For example:

  • For 3.4+ source clusters, a user with both clusterMonitor and backup roles would have the appropriate privileges.
  • For 3.2 source clusters, a user with the clusterMonitor, clusterManager, and backup roles would have appropriate privileges.
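For instance, on a 3.4+ source cluster you might create such a user from the mongo shell like this (the username and password below are placeholders, not values from the article):

// Run against the source replica set.
db.getSiblingDB("admin").createUser({
  user: "migrationUser",      // placeholder username
  pwd: "changeThisPassword",  // placeholder password
  roles: ["clusterMonitor", "backup"]
});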

Specify the username and password to Atlas when prompted by the Live Migration procedure.

Also, once you’ve migrated your data, if the source cluster enforced authentication you must consider that Atlas does not migrate any user or role data to the destination cluster. Therefore, you must re-create the credentials used by your applications on the destination Atlas cluster. Atlas uses SCRAM for user authentication. See Add MongoDB Users for a tutorial on creating MongoDB users in Atlas.


Potential Detour: Insufficient Oplog Size on Destination

The oplog, or operations log, is a capped collection that keeps a rolling record of all operations that modify the data stored in your databases. When you create an Atlas cluster to serve as the destination for your migration, Atlas sizes the oplog at 5% of the total amount of disk you allocated for the cluster by default; for example, a cluster provisioned with 100 GB of storage gets a 5 GB oplog. If the activity profile of your application requires a larger oplog, you will need to submit a proactive support ticket to have the oplog size increased on your destination cluster.

Route Guidance: Size the Operations Log (Oplog) Appropriately - Submit a Proactive Support Ticket if Oplog Resize is Needed.

As stated previously, the decisions regarding the resources we apply to a given MongoDB deployment are informed by the profile of the applications that depend on the database. Certain application read/write profiles or workloads require a larger-than-default operations log. These are listed in detail in the documentation pages on the Replica Set Oplog. Here is a summary of the workloads that typically require a larger-than-normal oplog:

Updates to Multiple Documents at Once

The oplog must translate multi-updates into individual operations in order to maintain idempotency. This can use a great deal of oplog space without a corresponding increase in data size or disk use.

Deletions Equal the Same Amount of Data as Inserts

If you delete roughly the same amount of data as you insert, the database will not grow significantly in disk use, but the size of the operation log can be quite large.

Significant Number of In-Place Updates

If a significant portion of the workload is updates that do not increase the size of the documents, the database records a large number of operations but does not change the quantity of data on disk.

In Conclusion

Regardless of your starting point, MongoDB provides a robust, secure and scalable destination for your data. MongoDB Atlas Live Import automates and simplifies the process of migrating your data to MongoDB Atlas. The command line version of this utility, called mongomirror, gives users additional control and flexibility around how the data gets migrated. Other options include exporting (mongoexport) and importing (mongoimport) your data manually or even writing your own application to accomplish migration. The decision to use one particular method over another depends upon the size of your database, its geographic location as well as your tolerance for application downtime.

If you choose to leverage MongoDB Atlas Live Import, be aware of the following potential challenges along the journey.

  • Increase available RAM During Migration sufficient for application plus migration requirements.
  • Reduce latency if possible or use mongomirror instead of Live Import.
  • Whitelist the IP Ranges of the MongoDB Live Import Process
  • Ensure Appropriate User Access Permissions on the Source Deployment
  • Size the Operations Log (Oplog) Appropriately - Submit a Proactive Support Ticket if Oplog Resize is Needed.

Now that you’re fully prepared, let’s embark on the journey and I’ll guide you through the process of deploying a cluster in MongoDB Atlas and walk you through migrating your data from an AWS Replica Set.

Introducing the MongoDB Masters Program for 2018

My name is Michael Lynn and I’m the Worldwide Director of Developer Advocacy at MongoDB. I’m incredibly proud to be a part of the Developer Relations and Marketing team here at MongoDB.

A majority of what we do in Developer Advocacy is related to increasing awareness of MongoDB within the community of developers and data scientists. We do this through involvement in a variety of user groups, industry conferences, and events as well as through management of the MongoDB Masters Program.

This program was created to recognize leaders within their community, experts in MongoDB, and professionals who freely share their knowledge. This year’s class includes returning Masters, as well as new members who have distinguished themselves in the past year.

MongoDB Masters in years past have provided valuable product feedback and driven thought leadership in their fields. We look forward to deepening this relationship over the coming year. This year’s class of Masters will also be encouraged to participate in beta testing programs, share their experiences with MongoDB, and continue to expand and broaden their own voices as leaders in the technical community.

The Masters program has been an incredibly rewarding and valuable program for MongoDB and we greatly appreciate the efforts of our most vocal, and most active supporters. This is why we’ve put so much time and effort into creating a program to recognize these individuals and thank them for their contributions.

Master Honorees enjoy benefits ranging from access to the MongoDB Engineering and Product Management teams to discounted MongoDB Atlas Credits.

Preparations are underway for the MongoDB Masters Summit, which will be held on Tuesday, June 26th as part of MongoDB World 2018. We’ll have several speakers and a special Q&A session with Eliot Horowitz, our co-founder and CTO. We encourage all members of our community to register for MongoDB World 2018, meet the Masters in person, and join our Advocacy Hub to start their own path to becoming a MongoDB Master.

So, all this talk of Masters – how, you might be thinking, do I become a Master?

Before I dive into an explanation of the requirements, please take a moment to review the bios of some of the existing Masters. You’ll easily spot some things in common across all of these incredibly talented and accomplished individuals.

Passion

Masters are passionate about technology and about solutions to technical problems. This passion drives these individuals to do things that few technologists will do. While this attribute is common among the existing and past Masters, it’s not easy to measure. You know it when you see it and it’s woven into the careers of many of the people I’ve encountered surrounding this program.

Impact

If passion is fuel, then impact is fire. Impact is the result of the passionate pursuit of worthy causes. Again, this is an attribute easily found in common across our Masters membership. Measuring impact is also difficult because in many cases, especially when dealing with the Masters, the impact of their actions, projects, and even their careers is widespread. Masters are individuals that positively impact their families, teams, companies, and their communities.

Execution

Execution is the spark that ignites fire. Elegant, efficient and effective solutions to technical challenges rarely, if ever, happen by accident. Rather, truly successful solutions require intelligent, deliberate execution – and in most cases, hard work. I strongly encourage you to spend time with any of the Masters and it will become clear that these individuals know how to execute. They know how to get things accomplished.

These are the attributes of a MongoDB Master and to achieve membership, an individual should be passionate about great technology and about solving technical problems. These individuals should have demonstrated, through successful execution, a massively beneficial impact on their company, team and/or community.

Are you interested in becoming a MongoDB Master, or do you think you may already meet the requirements? I would like to invite you to join us at MongoDB World in New York to learn more; consider completing the nomination form below to have yourself or a colleague considered for a MongoDB Masters membership.

MongoDB Masters membership nomination →

MongoDB Presents an Evening With Eliot Horowitz and Stitch

On April 19th, 2018, the MongoDB User Group (MUG) met at the MongoDB HQ in New York City for an evening of conversation, trivia, and a live coding session from MongoDB co-founder and CTO Eliot Horowitz.