GIANT Stories at MongoDB

MongoDB’s BI Connector: the Smart Connector for Business Intelligence

Ken W. Alger

Cloud, Atlas, BI

In today's world, data is being produced and stored all around us. Businesses leverage this data to gain insight into what their users and devices are doing. MongoDB is a great way to store that data: its flexible data model and dynamic schema allow data to be stored in rich, multi-dimensional documents. But most business intelligence tools, such as Tableau, Qlik, and Microsoft Excel, need data in a tabular format. This is where MongoDB's Connector for BI (BI Connector) shines.

MongoDB BI Connector

The BI Connector allows MongoDB to be used as a data source for SQL-based business intelligence and analytics platforms. These tools let you build dashboards and data visualization reports on your data, and leveraging them helps you extract hidden insights, such as how your customers are using your products.

The MongoDB Connector for BI is a tool for your data toolbox that acts as a translation layer between the database and the reporting tool. The BI Connector itself stores no data; it serves as a bridge between your MongoDB data and business intelligence tools.
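
Because the BI Connector presents a MySQL-compatible SQL interface, any MySQL client can talk to it. Here's a minimal sketch, purely to illustrate the translation layer, using the Node.js "mysql" package; the host, port, credentials, and database name are placeholders, and the exact authentication options depend on how your BI Connector is configured:

// Minimal sketch: query MongoDB through the BI Connector's MySQL-compatible
// endpoint with the Node.js "mysql" package. All connection values below are
// placeholders -- copy the real ones from your BI Connector setup.
const mysql = require('mysql');

const connection = mysql.createConnection({
  host: '<bi-connector-host>',  // placeholder hostname
  port: 27015,                  // assumption: default Atlas BI Connector port
  user: '<atlas-user>',
  password: '<password>',
  database: 'sample_supplies'   // placeholder database name
});

connection.connect();

// Collections appear as tables; nested fields are flattened into columns.
connection.query('SELECT COUNT(*) AS total FROM sales', (err, rows) => {
  if (err) throw err;
  console.log('documents visible as rows:', rows[0].total);
  connection.end();
});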

MongoDB BI Connector Flow

The BI Connector bridges the tooling gap for local, on-premises, or hosted instances of MongoDB. If you are using MongoDB Atlas on an M10 or larger cluster, there's a built-in option.

Why Use The BI Connector

Without the BI Connector, you often need to perform an Extract, Transform, and Load (ETL) process on your data, moving it from the "source of truth" in your database to a data lake. With MongoDB and the BI Connector, this costly step can be avoided, and you can perform analysis on your most current data in real time.

A business intelligence system has four components: the database itself, the BI Connector, an Open Database Connectivity (ODBC) data source name (DSN), and, finally, the business intelligence tool itself. Let's take a look at how to connect all these pieces.

I'll be doing this example in Mac OS X, but other systems should be similar. Before I dive in, there are some system requirements you'll need:

  • A MongoDB Atlas account
  • Administrative access to your system
  • ODBC Manager
  • The MongoDB ODBC Driver for DSN
  • Instructions for loading the dataset used in the video in your Atlas cluster can be found here.

    Feel free to leave a comment below if you have questions.

    Get started with MongoDB Atlas today to start using the MongoDB Connector for BI to examine and visualize your data.

MongoDB wins 2018 Customer Impact award at SpringOne Platform!

Sarah Branfman

Cloud, Events, Company

MongoDB has won the Independent Software Vendor 2018 Pivotal Partner Award for Customer Impact at the Pivotal SpringOne Platform summit. This award recognizes partners that have delivered technology contributing to notable customer success.

We are humbled and honored!

In a world with endless options, the most sophisticated and demanding organizations choose to run their business on MongoDB. This is not a coincidence. MongoDB constantly pushes the pace of innovation to address the most challenging problems for our customers and strategically partners with leaders like Pivotal to deliver robust solutions that accelerate development and success.

We launched MongoDB for Pivotal Cloud FoundryⓇ (PCF) earlier this year and already have amazing and very public success. For example, Merrill Corporation launched a new category for M&A professionals with MongoDB, Pivotal, and Microsoft. Our joint solution yielded 20x faster deployment, client-identified bugs fixed in hours, a 25% increase in sales, and a true transformation into a product-led technology organization!

Staying close to customer requirements and focused on customer success - that’s why it’s now easier than ever to rapidly deploy MongoDB-powered applications on Pivotal Cloud Foundry, abstracting away the complexities of ensuring a consistent, predictable, and secure underlying infrastructure that can scale. MongoDB will continue with frequent updates and releases as we evolve our product in line with customer needs. Customers can download a BOSH-deployed tile for Pivotal Application ServiceⓇ (PAS), and we are thrilled to announce the beta of MongoDB for the Pivotal Container ServiceⓇ (PKS) as well!

Learn more about the tile in these great posts by my colleagues: On Demand MongoDB Enterprise Server on Pivotal Cloud Foundry and MongoDB Enterprise Server for Pivotal Cloud Foundry goes GA

For those with us in Washington D.C. for the SpringOne Platform conference, we have two amazing sessions for you:

Join MongoDB’s Jeff Yemin (Lead Engineer, Database Engineering) and Pivotal’s Christoph Strobl (Software Engineer) for ‘Next Generation MongoDB: Sessions, Streams, Transactions’, and Diana Esteves (MongoDB Senior Engineer) for ‘MongoDB + CredHub = Secure By Default Data Services on PCF’. And stop by our meeting room to say hello!

We’re honored to have received this prestigious award from Pivotal and look forward to continued success for our joint customers as MongoDB and Pivotal help tackle their biggest challenges!

Building a REST API with MongoDB Stitch

MongoDB Stitch QueryAnywhere often removes the need to create REST APIs, but for the other times, Stitch webhooks let you create them in minutes.

New to MongoDB Atlas — Data Explorer Now Available for All Cluster Sizes

Ken W. Alger

Releases, Cloud, Atlas

At the recent MongoDB .local Chicago event, MongoDB CTO and Co-Founder Eliot Horowitz made an exciting announcement about the Data Explorer feature of MongoDB Atlas: it is now available for all Atlas cluster sizes, including the free tier.

The easiest way to explore your data

What is the Data Explorer? This powerful feature allows you to query, explore, and take action on your data residing inside MongoDB Atlas (with full CRUD functionality) right from your web browser. Of course, we've thought about security; Data Explorer access and whether or not a user can modify documents is tied to her role within the Atlas Project. Actions performed via the Data Explorer are also logged in the Atlas alerting window.

Bringing this feature to the "shared" Atlas cluster sizes — the free M0s, M2s, and M5s — allows for even faster development. You can now perform actions on your data while developing your application, which is where these shared cluster sizes really shine.

Check out this short video to see the Data Explorer in action.

Atlas is the easiest and fastest way to get started with MongoDB. Deploy a free cluster in minutes.

Recording sensor data with MongoDB Stitch & Electric Imp

Electric Imp devices and cloud services are a great way to get started with IoT. Electric Imp's MongoDB Stitch client library makes it a breeze to integrate with Stitch. This post describes how.

Controlling humidity with a MongoDB Stitch HTTP service and IFTTT

IFTTT is a great cloud service for pairing up cloud and IoT services. This post shows how to invoke an IFTTT webhook from a MongoDB Stitch function, where that webhook controls a dehumidifier via a Smart power plug.

Integrating MongoDB and Amazon Kinesis for Intelligent, Durable Streams

You can build your online, operational workloads atop MongoDB and still respond to events in real time by kicking off Amazon Kinesis stream processing actions, using MongoDB Stitch Triggers.

Let’s look at an example scenario in which a stream of data is being generated as a result of actions users take on a website. We’ll durably store the data and simultaneously feed a Kinesis process to do streaming analytics on something like cart abandonment, product recommendations, or even credit card fraud detection.

We’ll do this by setting up a Stitch Trigger. When relevant data updates are made in MongoDB, the trigger will use a Stitch Function to call out to AWS Kinesis, as you can see in this architecture diagram:

Figure 1. Architecture Diagram

What you’ll need to follow along:

  1. An Atlas instance
    If you don’t already have an application running on Atlas, you can follow our getting started with Atlas guide here. In this example, we’ll be using a database called streamdata, with a collection called clickdata where we’re writing data from our web-based e-commerce application.
  2. An AWS account and a Kinesis stream
    In this example, we’ll use a Kinesis stream to send data downstream to additional applications such as Kinesis Analytics. This is the stream we want to feed our updates into.
  3. A Stitch application
    If you don’t already have a Stitch application, log into Atlas, and click Stitch Apps from the navigation on the left, then click Create New Application.

Create a Collection

The first step is to create a database and collection from the Stitch application console. Click Rules from the left navigation menu and click the Add Collection button. Type streamdata for the database and clickdata for the collection name. Select the template labeled Users can only read and write their own data and provide a field name where we’ll specify the user id.

Figure 2. Create a collection

Configuring Stitch to talk to AWS

Stitch lets you configure Services to interact with external systems such as AWS Kinesis. Choose Services from the navigation on the left, click the Add a Service button, select the AWS service, and set the AWS Access Key ID and Secret Access Key.

Figure 3. Service Configuration in Stitch

Services use Rules to specify what aspect of a service Stitch can use, and how. Add a rule which will enable that service to communicate with Kinesis by clicking the button labeled NEW RULE. Name the rule “kinesis” as we’ll be using this specific rule to enable communication with AWS Kinesis. In the section marked Action, select the API labeled Kinesis and select All Actions.

Figure 4. Add a rule to enable integration with Kinesis

Write a function that uses the service to stream documents into Kinesis

Now that we have a working AWS service, we can use it to put records into a Kinesis stream. The way we do that in Stitch is with Functions. Let’s set up a putKinesisRecord function.

Select Functions from the left-hand menu, and click Create New Function. Provide a name for the function and paste the following in the body of the function.

exports = function(event) {
  // Grab the AWS service we configured above.
  const awsService = context.services.get('aws');

  // Put the full document from the change event onto the Kinesis stream.
  return awsService.kinesis().PutRecord({
    Data: JSON.stringify(event.fullDocument),
    StreamName: "stitchStream",
    PartitionKey: "1"
  })
  .then(function(response) {
    return response;
  })
  .catch(function(error) {
    console.log(JSON.stringify(error));
  });
};
Figure 5. Example Function - putKinesisRecord

Test out the function

Let’s make sure everything is working by calling that function manually. From the Function Editor, click Console to view the interactive JavaScript console for Stitch.

Functions called from Triggers require an event. To test execution of our function, we’ll need to pass a dummy event to it. Creating variables from the console in Stitch is simple: just set the value of the variable to a JSON document. For our simple example, use the following:

event = {
   "operationType": "replace",
   "fullDocument": {
       "color": "black",
       "inventory": {
           "$numberInt": "1"
       },
       "overview": "test document",
       "price": {
           "$numberDecimal": "123"
       },
       "type": "backpack"
   },
   "ns": {
       "db": "streamdata",
       "coll": "clickdata"
   }
}
exports(event);

Paste the above into the console and click the button labeled Run Function As. Select a user and the function will execute.

Ta-da!

Putting it together with Stitch Triggers

We’ve got our MongoDB collection living in Atlas, receiving events from our web app. We’ve got our Kinesis stream ready for data. We’ve got a Stitch Function that can put data into a Kinesis stream.

Configuring Stitch Triggers is so simple it’s almost anticlimactic. Click Triggers from the left navigation, name your trigger, provide the database and collection context, and select the database events Stitch will react to with execution of a function.

For the database and collection, use the names from step one. Now we’ll set the operations we want to watch with our trigger. (Some triggers might care about all of them – inserts, updates, deletes, and replacements – while others can be more efficient because they only need to react to a subset.) In our case, we’re going to watch for insert, update, and replace operations.

Now we specify our putKinesisRecord function as the linked function, and we’re done.

Figure 6. Trigger Configuration in Stitch

As part of trigger execution, Stitch will forward details associated with the trigger event, including the full document involved in the event (i.e. the newly inserted, updated, or deleted document from the collection.) This is where we can evaluate some condition or attribute of the incoming document and decide whether or not to put the record onto a stream.
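
For instance, here is a minimal sketch of that idea, reusing the putKinesisRecord shape from above but only forwarding documents that pass a simple check; the price and type fields are assumptions borrowed from the test document earlier:

// Sketch only: forward a record to Kinesis when the incoming document passes a check.
// The "price" and "type" fields are assumptions based on the earlier test document.
exports = function(event) {
  const doc = event.fullDocument;

  // Skip events without a full document (e.g. deletes) and low-value items.
  if (!doc || parseFloat(doc.price) < 100) {
    return;
  }

  const awsService = context.services.get('aws');
  return awsService.kinesis().PutRecord({
    Data: JSON.stringify(doc),
    StreamName: "stitchStream",
    PartitionKey: doc.type || "1"
  });
};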

Test the trigger!

Amazon provides a dashboard which will enable you to view details associated with the data coming into your stream.

Figure 7. Kinesis Stream Monitoring

As you execute the function from within Stitch, you’ll begin to see the data entering the Kinesis stream.
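
You can also read the records back programmatically. Here's a minimal Node.js sketch using the aws-sdk package and the stream name from this example; the region and the single-shard assumption are illustrative only:

// Sketch: read records back off the Kinesis stream with the aws-sdk (v2) for Node.js.
// Assumes a single-shard stream named "stitchStream" in us-east-1.
const AWS = require('aws-sdk');
const kinesis = new AWS.Kinesis({ region: 'us-east-1' });

async function readRecords() {
  const { StreamDescription } = await kinesis
    .describeStream({ StreamName: 'stitchStream' }).promise();

  const { ShardIterator } = await kinesis.getShardIterator({
    StreamName: 'stitchStream',
    ShardId: StreamDescription.Shards[0].ShardId,
    ShardIteratorType: 'TRIM_HORIZON'   // read from the oldest available record
  }).promise();

  const { Records } = await kinesis.getRecords({ ShardIterator }).promise();
  Records.forEach(r => console.log(r.Data.toString()));   // Data is a Buffer
}

readRecords().catch(console.error);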

Building some more functionality

So far our trigger is pretty basic – it watches a collection and when any updates or inserts happen, it feeds the entire document to our Kinesis stream. From here we can build out some more intelligent functionality. To wrap up this post, let’s look at what we can do with the data once it’s been durably stored in MongoDB and placed into a stream.

Once the record is in the Kinesis Stream you can configure additional services downstream to act on the data. A common use case incorporates Amazon Kinesis Data Analytics to perform analytics on the streaming data. Amazon Kinesis Data Analytics offers pre-configured templates to accomplish things like anomaly detection, simple alerts, aggregations, and more.

For example, our stream of data will contain orders resulting from purchases. These orders may originate from point-of-sale systems, as well as from our web-based e-commerce application. Kinesis Analytics can be leveraged to create applications that process the incoming stream of data. For our example, we could build a machine learning algorithm to detect anomalies in the data or create a product performance leaderboard from a sliding, or tumbling window of data from our stream.

Figure 8. Amazon Data Analytics - Anomaly Detection Example

Wrapping up

Now you can connect MongoDB to Kinesis. From here, you’re able to leverage any one of the many services offered from Amazon Web Services to build on your application. In our next article in the series, we’ll focus on getting the data back from Kinesis into MongoDB. In the meantime, let us know what you’re building with Atlas, Stitch, and Kinesis!

Resources

MongoDB Atlas

MongoDB Stitch

Amazon Kinesis

Adopting a Serverless Approach at Bazaarvoice with MongoDB Atlas and AWS Lambda

I recently had the pleasure of welcoming Ani Hammond, Senior Staff Software Engineer from Bazaarvoice, to the MongoDB World stage. To a completely packed room, Ani chronicled her team’s journey as they replatformed Bazaarvoice’s Curations service from a runaway monolith architecture to a completely serverless architecture backed by MongoDB Atlas.

Bazaarvoice logo

Even if you’ve never heard of Bazaarvoice, it’s almost impossible that you’ve never interacted with their services. To use Ani’s own description, “If you're shopping online and you’re reading a review, it's probably powered by us.”

Bazaarvoice strives to connect brands and retailers with consumers through the gathering, curation, and display of user-generated content—anything from pictures on Instagram to an online product review—during a potential customer’s buying journey.

To give you a sense of the scale of this task, Bazaarvoice clocked over a billion total page views between Thanksgiving Day and Cyber Monday in 2017, peaking at around 6,000 page views per second!


One of the technologies behind this herculean task is the Curations platform. To understand how this platform works, let’s look at an example:

An Instagram user posts a cute photo of their child wearing a particular brand’s rain boots. Using Curations, that brand is watching for specific content that mentions their products, so the social collection service picks up that post and shows it to the client team in the Curations application. The post can then be enriched in various manual and automatic ways. For example, a member of the client team can append metadata describing the product contained in the image or automatic rules can filter content for potentially offensive material. The Curations platform then automates the process of securing the original poster’s permission for the client to use their content. Now, this user-generated content is able to be displayed in real time on the brand’s homepage or product pages to potential customers considering similar products.

In a nutshell, this is what Curations does for hundreds of clients and hundreds of thousands of individual content pieces.

The technology behind Curations was previously a monolithic Python/Django-based stack on Amazon EC2 instances on top of a MySQL datastore deployed via RDS.


This platform was effective in allowing Bazaarvoice to scale to hundreds of new clients. However, this architecture did have an Achilles heel: each additional client onboarded to Bazaarvoice’s platform represented an additional Python/Django/MySQL cluster to manage. Not only was this configuration expensive (approximately $60,000/month), the operational overhead generated by each additional cluster made debugging, patching, releases, and general data management an ever-growing challenge. As Ani put it, “Most of our solutions were basically to throw more hardware/money at the problem and have a designated DevOps person to manage these clusters.”

One of the primary factors in selecting MongoDB for the new Curations platform was its support for a variety of different access patterns. For example, the part of the platform responsible for sourcing new social content had to support high write volume whereas the mechanism for displaying the content to consumers is read-intensive with strict availability requirements.

Diving into the specifics of why the Bazaarvoice team opted to move from a MySQL-based stack to one built on MongoDB is a blog post for another day. (Though, if you’d like to see what motivated other teams to do so, I recommend How DevOps, Microservices, and MongoDB are Making HSBC “Simpler, Better, and Faster” and Breuninger delivers omnichannel shopping experience for thousands of daily online users.)

That is to say, the focus of this particular post is the paradigm shift the Curations team made from a linearly-scaling monolith to a completely serverless approach, underpinned by MongoDB Atlas.

The new Curations platform is broken into three distinct services for content collection, enrichment, and display. The collections service is powered by a series of AWS Lambda functions, written in Node.js, triggered by an Amazon Kinesis stream, whereas the enrichment and display services are built on autoscaling AWS Elastic Beanstalk instances. All three services making up the new Curations platform are backed by MongoDB Atlas.
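
To make that collection path concrete, here is a minimal sketch (not Bazaarvoice's actual code) of a Node.js Lambda handler triggered by a Kinesis stream; the payload contents are hypothetical:

// Sketch of a Kinesis-triggered AWS Lambda handler in Node.js.
// Only the base64 decoding of record.kinesis.data is standard; the payload is hypothetical.
exports.handler = async (event) => {
  for (const record of event.Records) {
    const payload = JSON.parse(Buffer.from(record.kinesis.data, 'base64').toString('utf8'));
    console.log('collected content item:', payload);
    // ...enrich the item and persist it to MongoDB Atlas here
  }
};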

Not only did this approach address the cluster-per-customer challenges of the old system, but the monthly costs were reduced by nearly 90% to approximately $6,500/month. The results are, again, best captured by Ani’s own words:

Massive cost savings, huge performance gains, strong consistency, and a handful of services rather than hundreds of clusters.

MongoDB Atlas was a natural fit in this new serverless paradigm as the team is fully able to focus on developing their product rather than on infrastructure management. In fact, the team had originally opted to manage the MongoDB instances on AWS themselves. After a couple of iterations of manual deployment and management, a desire to gain even more operational efficiency and increased insight into database performance prompted their move to Atlas. According to Ani, the cost of migrating to and leveraging a fully managed service was, "Way cheaper than having dedicated DevOps engineers.” Atlas’ support for direct VPC peering also made the transition to a hosted solution straightforward for the team.

Speaking of DevOps, one of the first operational benefits Ani and her team experienced was the ability to easily optimize their index usage in MongoDB. Previously, their approach to indexing was “build stuff that makes sense at the time and is easy to iterate on.” After getting up and running on Atlas, they were able to use the built-in Performance Advisor to make informed decisions on indexes to add and unused ones to remove. As Ani puts it:

An index killed is as valuable as an index added. This ensures all your indexes fit into memory and a bad index doesn't push out the good ones.

Ani’s team also used the Atlas Performance Advisor to diagnose and correct inefficient queries. According to her, the built-in tools helped keep the team honest, "[People] say, ‘My database isn't scaling. It's not able to perform complex queries in real time...it doesn't work.’ Fix your code. The hardware is great, the tools are great but they can only carry you so far. I think sometimes we tend to get sloppy with how we write our code because of how cheap and how easy hardware is but we have to write code responsibly too.”

In another incident, a different Atlas feature, the Real Time Performance Panel, was key to identifying an issue with high load times in the display service. Some clients' displays were taking more than 6 seconds to load. (For context, content delivery network provider Akamai found that a two-second delay in web page load time can cause bounce rates to double!) High-level metrics in Datadog reported 5+ seconds query response times, while Atlas reported less than 100 ms response times for the same query. The team used both data points to triangulate and soon realized the discrepancy was a result of the time it took for Lambda to connect to MongoDB for each new operation. Switching from standard Lambda functions to a dockerized service ensured each operation could leverage an open connection rather than initiating a “cold start.”
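
The same idea of holding a connection open can also be applied inside Lambda itself by caching the client outside the handler. Bazaarvoice ultimately chose a dockerized service, but as a minimal sketch of the caching pattern (the URI, database, and collection names are placeholders), it looks like this:

// Sketch: cache the MongoDB client outside the handler so warm Lambda invocations
// reuse the open connection instead of paying the connection cost on every call.
// ATLAS_URI, database, and collection names are placeholders.
const { MongoClient } = require('mongodb');

let cachedClient = null;

async function getClient() {
  if (!cachedClient) {
    cachedClient = await MongoClient.connect(process.env.ATLAS_URI);
  }
  return cachedClient;
}

exports.handler = async (event, context) => {
  // Let the container keep the connection alive between invocations.
  context.callbackWaitsForEmptyEventLoop = false;
  const client = await getClient();
  return client.db('curations').collection('content').findOne({});
};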

I know a lot of the cool things that Atlas does can be done by hand but unless this is your full-time job, you're just not going to do it and you’re not going to do it as well.


Before wrapping up her presentation, Ani shared an improvement over the old system that the team wasn’t expecting. Using Atlas, they were able to provide the customer support and services teams read-only views into the database. This afforded them deeper insight into the data and allowed them to perform ad-hoc queries directly. The result was a more proactive approach to issue management, leading to an 80% reduction in inbound support tickets.

By re-architecting their Curations platform, Bazaarvoice is well-positioned to bring on hundreds of new clients without a proportional increase in operations work for the team. But once again, Ani summarized it best:

As the old commercial goes… ‘Old platform: $60,000. New platform: $6,000. Getting to focus all of my time on development: priceless.'

Thank you very much to Ani Hammond and the rest of the Curations team at Bazaarvoice for putting together the presentation that inspired this post. Be sure to check out Ani’s full presentation in addition to dozens of other high-quality talks from MongoDB World on our YouTube channel.

If you haven’t tried out MongoDB Atlas for yourself, you can get started with a free sandbox cluster.

Automating Adobe AEM on MongoDB Atlas and Microsoft Azure

Jason Mimick

Cloud, Atlas
Update (August 31, 2018)
Please note that this post uses a version of MongoDB (3.6.6) which is not currently officially supported by Adobe AEM. While future versions of AEM will support this version of MongoDB, we recommend following all of the official Adobe guidance on this topic. Such information can be found directly in the AEM documentation: aem-with-mongodb and technical-requirements. The specific versions used here are for testing and development purposes only. It is also recommended to follow the best practices as documented by Adobe.

Introduction

Given the overwhelming trend towards automation in the technology industry, we thought it would be a great idea to showcase how you can use MongoDB to do some awesome automation with two of MongoDB's premier partners, Adobe and Microsoft.

Adobe Experience Manager (AEM) is a leading digital experience management solution. Enterprises like Manulife, Nissan, and Philips rely on AEM to deliver rapidly adaptable experiences across their customer journey. With MongoDB Atlas behind AEM, customers get a comprehensive solution integrated with a highly available, globally scalable, and enterprise-level database. Running all this on Microsoft's Azure cloud infrastructure ensures everything will run smoothly, meet your SLAs, and not break your budget. All this together makes delivering great digital experiences that much easier and allows you and your team to focus on your core missions.

This blog post will run through a straightforward, end-to-end, automated deployment of an Adobe AEM Author node using MongoDB Atlas, all running on Microsoft Azure, for a development or testing environment. Production-level deployments require additional steps, such as nodes for additional AEM components (publish, etc.) and high availability for the AEM nodes. We hope you can leverage this example in your own automated workflows.

AEM released version 6.4 in March 2018. One of the exciting new features is the ability to support cloud-based deployments with MongoDB. For customers looking to move to the cloud, MongoDB is easy to deploy, and even easier when leveraging MongoDB Atlas for a full DBaaS, thus minimizing operational overhead, costs, and the additional risk of maintaining your own databases. Atlas is our cloud-hosted MongoDB service engineered and run by the same team that builds the database.

This blog post presents a step-by-step example which automates the deployment of your own standalone AEM Author node along with a MongoDB Atlas replica set in Microsoft Azure. We'll use "development-level" cloud services to keep the costs low, but the steps for production-grade deployments are the same. Simply select beefier servers for your database and AEM instance. And, guess what? We'll actually be able to migrate to more substantial resources without requiring any downtime! How is this possible? MongoDB Atlas supports live cluster migration and AEM supports the ability to host multiple Author nodes.

Getting Ready - Cloud Services Access

Step 0 - 5 minutes

To start let's ensure we have accounts on all the services we need to use. There are 3:

Adobe

Use the Adobe Licensing Website to access the AEM software.

Microsoft Azure

Microsoft Azure will host our AEM server in the cloud.

MongoDB Atlas

For deploying the backing MongoDB replica set. Open MongoDB Atlas and click the green "Get started free" button. You'll actually need to configure your Atlas account payment settings as we'll be deploying a MongoDB cluster that is not free.

If you haven’t already, please use the links above and follow the directions to create your accounts. If you already have accounts for these services, then you’re all set.

Provisioning - Spinning up Cloud Resources

Step 1 - MongoDB Atlas API Access - 5 minutes

We'll start by provisioning our database for AEM to store its content. The goal is to configure a 3 node MongoDB replica set using the M20 Cluster Tier.

Sign into MongoDB Atlas to start managing multiple cloud-based MongoDB deployments. Atlas uses the concepts of Organizations and Projects to help manage these deployments. Each organization can have multiple projects. You can also create teams with various levels of access to each project. The automated deployment presented below requires a set of credentials for API access. Please follow the instructions for configuring API access in the MongoDB Atlas documentation. This is the only manual Atlas step required.

If this is your first visit to Atlas, you'll need to create an organization and a project within that organization. You'll just need to give each of them a name; I used AEM-ON-AZURE and aem-on-azure-db. Note your project name: you'll need to update the deployment script with that name in order to associate the AEM MongoDB database with that group.

Step 2 - Get automation tools and scripts

For this demonstration, we'll be showing some of the great automation capabilities of Azure (MongoDB Atlas also has a rich API along with some open source automation tooling). To get started, download and install the Azure CLI 2.0: https://docs.microsoft.com/en-us/cli/azure/install-azure-cli?view=azure-cli-latest. Mac users with brew can just run:

brew update && brew install azure-cli

A script which automates launching an AEM Author node can be found in the aem-on-atlas-and-azure Github repository.

git clone https://github.com/jasonmimick/aem-on-atlas-and-azure
cd aem-on-atlas-and-azure

The spin-up-demo-aem-azure.sh script will:

  1. Provision a MongoDB Atlas replica-set.
  2. Create an Azure resource group.
  3. Launch an Ubuntu VM.
  4. Create a public IP and firewall rule for the AEM Author HTTP service.
  5. Upload your AEM jar file and license.properties.
  6. Start the AEM Author node.

We've leveraged an excellent blog post our partner Adobe published on running a minimal AEM 6 setup on MongoDB for the commands to launch AEM. We're also doing a non-interactive launch, which allows us to set the admin password without user intervention (https://helpx.adobe.com/in/experience-manager/6-3/sites/administering/using/security-configure-admin-password.html).

Using your favorite editor, update the variables in the spin-up script.

It's a best practice to use the same region for both the Atlas and Azure components. The script contains two variables which define the region (the fixed values for "US East" are formatted slightly differently between Atlas and Azure).

Step 3 - Run spin-up script

You'll just need to log in to your Azure account and run the script.

az login
To sign in, use a web browser to open the page https://microsoft.com/devicelogin and enter the code AW84CG7T5 to 
authenticate.
./spin-up-demo-aem-azure.sh aem-demo      # DEMO_NAME="aem-demo"

The very last step in the spin-up script starts the AEM author node. Once this is running we can ssh into the new Azure VM and do some inspection. A quick grep in the AEM logs does indeed validate that AEM is connecting to Atlas…

$ # Note - put your DEMO_NAME for the 'resourceGroup'
$ ssh aem@$(az network public-ip list --query="[?resourceGroup == 'my-aem-demo-61h'] | [].ipAddress" --output tsv)
aem@my-aem-demo-61h-aem-vm:~$ grep mongodb ./crx-quickstart/logs/error.log
26.06.2018 16:11:40.133 *INFO* [cluster-ClusterId{value='5b3265bb19c9cf6b565256a1', description='MongoConnection for Oak DocumentMK'}-srv2-shard-00-00-n1oxl.mongodb.net:27017] org.mongodb.driver.connection Opened connection [connectionId{localValue:3, serverValue:46593}] to srv2-shard-00-00-n1oxl.mongodb.net:27017
26.06.2018 16:11:40.137 *INFO* [cluster-ClusterId{value='5b3265bb19c9cf6b565256a1', description='MongoConnection for Oak DocumentMK'}-srv2-shard-00-00-n1oxl.mongodb.net:27017] org.mongodb.driver.cluster Monitor thread successfully connected to server with description ServerDescription{address=srv2-shard-00-00-n1oxl.mongodb.net:27017, type=REPLICA_SET_SECONDARY, state=CONNECTED, ok=true, version=ServerVersion{versionList=[3, 6, 5]}, minWireVersion=0, maxWireVersion=6, maxDocumentSize=16777216, roundTripTimeNanos=3391937, setName='srv2-shard-0', canonicalAddress=srv2-shard-00-00-n1oxl.mongodb.net:27017, hosts=[srv2-shard-00-00-n1oxl.mongodb.net:27017, srv2-shard-00-01-n1oxl.mongodb.net:27017, srv2-shard-00-02-n1oxl.mongodb.net:27017], passives=[], arbiters=[], primary='srv2-shard-00-01-n1oxl.mongodb.net:27017', tagSet=TagSet{[]}, electionId=null, setVersion=3, lastWriteDate=Tue Jun 26 16:11:37 UTC 2018, lastUpdateTimeNanos=5447761739832831}
26.06.2018 16:11:40.169 *INFO* [FelixStartLevel] org.mongodb.driver.connection Opened connection [connectionId{localValue:4, serverValue:96728}] to srv2-shard-00-01-n1oxl.mongodb.net:27017

Checking the Atlas database with the mongo shell shows that our new AEM Author node is indeed able to connect and write data. Note the use of the mongodb+srv format connection string. This feature allows MongoDB replica set nodes to be discovered through DNS SRV records.

$mongo "mongodb+srv://srv2-n1oxl.mongodb.net/test" --username aem
MongoDB shell version v3.6.3
Enter password:
connecting to: mongodb+srv://srv2-n1oxl.mongodb.net/test
2018-06-28T14:58:40.036-0400 I NETWORK  [thread1] Starting new replica set monitor for srv2-shard-0/srv2-shard-00-02-n1oxl.mongodb.net.:27017,srv2-shard-00-00-n1oxl.mongodb.net.:27017,srv2-shard-00-01-n1oxl.mongodb.net.:27017
2018-06-28T14:58:40.327-0400 I NETWORK  [ReplicaSetMonitor-TaskExecutor-0] Successfully connected to srv2-shard-00-02-n1oxl.mongodb.net.:27017 (1 connections now open to srv2-shard-00-02-n1oxl.mongodb.net.:27017 with a 5 second timeout)
2018-06-28T14:58:40.333-0400 I NETWORK  [thread1] Successfully connected to srv2-shard-00-00-n1oxl.mongodb.net.:27017 (1 connections now open to srv2-shard-00-00-n1oxl.mongodb.net.:27017 with a 5 second timeout)
2018-06-28T14:58:40.690-0400 I NETWORK  [thread1] Successfully connected to srv2-shard-00-01-n1oxl.mongodb.net.:27017 (1 connections now open to srv2-shard-00-01-n1oxl.mongodb.net.:27017 with a 5 second timeout)
2018-06-28T14:58:40.690-0400 I NETWORK  [ReplicaSetMonitor-TaskExecutor-0] Successfully connected to srv2-shard-00-01-n1oxl.mongodb.net:27017 (1 connections now open to srv2-shard-00-01-n1oxl.mongodb.net:27017 with a 5 second timeout)
2018-06-28T14:58:40.765-0400 I NETWORK  [ReplicaSetMonitor-TaskExecutor-0] changing hosts to srv2-shard-0/srv2-shard-00-00-n1oxl.mongodb.net:27017,srv2-shard-00-01-n1oxl.mongodb.net:27017,srv2-shard-00-02-n1oxl.mongodb.net:27017 from srv2-shard-0/srv2-shard-00-00-n1oxl.mongodb.net.:27017,srv2-shard-00-01-n1oxl.mongodb.net.:27017,srv2-shard-00-02-n1oxl.mongodb.net.:27017
2018-06-28T14:58:42.015-0400 I NETWORK  [ReplicaSetMonitor-TaskExecutor-0] Successfully connected to srv2-shard-00-00-n1oxl.mongodb.net:27017 (1 connections now open to srv2-shard-00-00-n1oxl.mongodb.net:27017 with a 5 second timeout)
2018-06-28T14:58:42.015-0400 I NETWORK  [thread1] Successfully connected to srv2-shard-00-02-n1oxl.mongodb.net:27017 (1 connections now open to srv2-shard-00-02-n1oxl.mongodb.net:27017 with a 5 second timeout)
MongoDB server version: 3.6.5
MongoDB Enterprise srv2-shard-0:PRIMARY> show dbs
admin       0.000GB
aem-author  0.320GB
config      0.000GB
local       0.936GB
MongoDB Enterprise srv2-shard-0:PRIMARY> use aem-author
switched to db aem-author
MongoDB Enterprise srv2-shard-0:PRIMARY> show tables
blobs
clusterNodes
journal
nodes
settings
MongoDB Enterprise srv2-shard-0:PRIMARY>
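
As an aside, you can see exactly what the mongodb+srv scheme resolves to with a plain DNS SRV lookup. Here's a minimal Node.js sketch using the cluster hostname from this example; substitute your own cluster name:

// Sketch: resolve the SRV record the mongodb+srv connection string relies on.
// The hostname below comes from the example above -- replace it with your cluster's.
const dns = require('dns');

dns.resolveSrv('_mongodb._tcp.srv2-n1oxl.mongodb.net', (err, records) => {
  if (err) throw err;
  // Each record names one replica set member and its port.
  records.forEach(r => console.log(`${r.name}:${r.port}`));
});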

Wrap up

The aem-on-atlas-and-azure repository also contains a script to clean things up. The tear-down-aem-on-atlas-and-azure.sh bash script will attempt to read the latest configuration from the log file created by the spin-up script and then delete the corresponding Atlas and Azure resources.

This post reviewed an automated procedure for provisioning a complete AEM Author node hosted in the cloud. We hope you've found it informative and are able to apply these techniques in your own AEM deployments. An additional benefit of these kinds of deployments is that they are automatically integrated with the Atlas and Azure built-in monitoring and alerting systems. Are you currently provisioning, or planning to provision, AEM in the cloud within your organization? If so, head over to our MongoDB community Slack, join the #aem channel, and let us know!

Charting a Course to MongoDB Atlas: Part 1 - Preparing for the Journey

MongoDB Atlas is an automated cloud MongoDB service engineered and run by the same team that builds the database. It incorporates operational best practices we’ve learned from optimizing thousands of deployments across startups and the Fortune 100. You can build on MongoDB Atlas with confidence, knowing you no longer need to worry about database management, setup and configuration, software patching, monitoring, backups, or operating a reliable, distributed database cluster.

MongoDB introduced its Database as a Service offering in July of 2016, and it’s been a phenomenal success since launch. Since then, thousands of customers have deployed highly secure, highly scalable, and performant MongoDB databases using this service. Among its most compelling features are the ability to deploy replica sets in any of the major cloud hosting providers (AWS, Azure, GCP) and the ability to deploy database clusters spanning multiple cloud regions. In this series, I’ll explain the steps you can follow to migrate data from your existing MongoDB database into MongoDB Atlas.

Preparing for the Journey

Before you embark on any journey regardless of the destination, it’s always a good idea to take some time to prepare. As part of this preparation, we’ll review some options for the journey — methods to get your data migrated into MongoDB Atlas — along with some best practices and potential wrong turns to watch out for along the way.

Let’s get a bit more specific about the assumptions I’ve made in this article.

  • You have data that you want to host in MongoDB Atlas.
    • There’s probably no point in continuing from here if you don’t want to end up with your data in MongoDB Atlas.
  • Your data is currently in a MongoDB database.
    • If you have data in some other format, all is not lost — we can help. However, we’re going to address a MongoDB to MongoDB migration in this series. If you have other requirements -- data in another database or another format, for example -- let me know you’d like an article covering migration from some other database to MongoDB and I’ll make that the subject of a future series.
  • Your current MongoDB database is running MongoDB version 3.0 or greater. MongoDB Atlas supports versions 3.4 and 3.6. Therefore, we’ll need to get your database upgraded either as part of the migration, or you can handle that ahead of the migration. We have articles and documentation designed to help you upgrade your MongoDB instances should you need to.
  • Your data is in a clustered deployment (Sharded or Replica Set). We’ll cover converting a standalone deployment to a replica set in part 3 of this series.

At a high level, there are 4 basic steps to migrating your data. Let’s take a closer look at the journey:

  1. Deploy a Destination Cluster in MongoDB Atlas
  2. Prepare for the Journey
  3. Migrate the databases
  4. Cutover and Modify Your Applications to use the new MongoDB Atlas-based Deployment

As we approach the journey, it's important to know the various routes from your starting point to your eventual destination. Each route has its considerations and benefits, and the choice of route will ultimately be up to you. Review the following summary of the available data migration methods from which you may choose.

  • Live Import: Fully automated via the Atlas administrative console. Downtime is minimal (cutover only). Migrates from versions 2.6, 3.0, 3.2, or 3.4 to 3.4 or 3.6.
  • mongomirror: A utility for migrating data from an existing MongoDB replica set to a MongoDB Atlas replica set. mongomirror does not require you to shut down your existing replica set or applications. Downtime is minimal (cutover only). Migrates from version 2.6 or greater to 3.4 or 3.6.
  • mongorestore: A command-line utility that loads data from either a binary database dump created by mongodump or from standard input. Downtime is required. Migrates from version 2.6 or greater to 3.4 or 3.6.
  • mongoimport: Imports content from an Extended JSON, CSV, or TSV export created by mongoexport or, potentially, another third-party export tool. Downtime is required.

For a majority of deployments, Live Import is the best, most efficient route to get your data into MongoDB Atlas. It offers the ability to keep your existing cluster up and active (but not too active; see the considerations below). If you’re not located in a region that is geographically close to the US-EAST AWS datacenter, for example, you may encounter unacceptable latency. There are a number of possible concerns to weigh before embarking on your migration journey, and the following section offers some helpful route guidance to ensure that you’re going in the right direction and moving steadily toward your destination.

Route Guidance for the Migration Journey

If you've made it this far, you’re likely getting ready to embark on a journey that will bring your data into a robust, secure, and scalable environment within MongoDB Atlas. The potential to encounter challenges along the way is real, and the likelihood of encountering difficulties depends primarily upon your starting point. In this section, I’ll discuss some potential issues you may encounter as you prepare for your migration journey. A summary of the potential detours and the guidance for each follows.

Follow the links below to read more about each potential detour and its relevant guidance:

  • Insufficient RAM on the destination cluster: Calculate the RAM required for your application and increase it to account for the migration process requirements. (Reference: How do I calculate how much RAM I need for my application?)
  • Too much network latency between source and destination: Reduce latency, or leverage mongomirror or mongodump/mongorestore instead of Live Import.
  • Insufficient network access due to missing IP whitelist or firewall rules: Ensure that the MongoDB Live Import application servers are whitelisted and that corporate firewalls permit access between source and destination.
  • Insufficient user access permissions on the source database deployment: Ensure that authentication is enabled and that the user credentials granted for the source database have the required entitlements.
  • Insufficient oplog size on the destination: Size the operations log appropriately based on the application workload. (Reference: Sizing the Operations Log)

Potential Detour: Insufficient RAM on Destination Cluster

Every deployment of MongoDB requires some form of resource to run efficiently. These resource requirements will include things like RAM, CPU, Disk and Network. To ensure acceptable response times and performance of the database, we typically look to the application’s read/write profile to inform the decisions we make about the amounts and sizes of each of these resources we’ll need for our deployment.

The amount of RAM a deployment will require is largely informed by the application's demand for data in the database. To approximate RAM requirements, we typically look at the frequently accessed documents in each collection, adding up their total data size and then increasing that by the total size of the required indexes. Referred to as the working set, this is typically the approximate amount of RAM we’ll want our deployment to have. A more complete discussion of sizing can be found in the documentation pages on sizing for MongoDB.
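
As a rough illustration, the following mongo shell sketch sums data and index sizes for a couple of collections to approximate the working set; the collection names are placeholders, and it assumes every document in them is frequently accessed:

// Back-of-the-envelope working set estimate: data size plus index size
// for the collections your application touches most (placeholders here).
var workingSetBytes = 0;
["orders", "products"].forEach(function(name) {
  var stats = db.getCollection(name).stats();
  workingSetBytes += stats.size + stats.totalIndexSize;
});
print("Approximate working set: " + (workingSetBytes / 1024 / 1024).toFixed(0) + " MB");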

Sizing is a tricky task, especially for the cost-constrained. We obviously don’t want to waste money by over-provisioning servers larger than those we’ll need to support the profile of our users and applications. However, it is important to consider that during our migration, we’ll not only need to account for the application requirements -- we also need to account for the resources required by the migration process itself. Therefore, you will want to ensure that you surpass the requirements for your production implementation when sizing your destination cluster.

Route Guidance: Increase available RAM During Migration

The destination cluster should provide adequate resources across all dimensions (storage, CPU, and memory) with room to spare. The migration process will require additional CPU and memory while the destination database is being built from the source. It’s quite common for incoming clusters to be undersized, and as a result the migration process fails. If this happens during a migration, you must empty the destination cluster and resize it to a larger M-value to increase the amount of available RAM. A great feature of Atlas is that resizing, in both directions, is extremely easy to do. Whether you’re adding resources (increasing the amount of RAM, disk, CPU, shards, etc.) or decreasing them, the process is very simple. Therefore, increasing the resources available in your target environment is painless, and once the migration completes, you can simply scale back down to a cluster size with less RAM and CPU.


Potential Detour: Network Latency

Latency is defined as the amount of time it takes for a packet of data to get from one designated point to another. Because the migration process is all about moving packets of data between servers it is by its very nature latency sensitive.

Migrating data into MongoDB Atlas leveraging the Live Import capability involves connecting your source MongoDB instance to a set of application servers running in the AWS us-east-1 region. These servers act as the conductors running the actual migration process between your source and destination MongoDB database servers. A potential detour can crop up when your source MongoDB database deployment exists in a datacenter located far from the AWS us-east-1 region.

Route Guidance: Reduce latency if possible or use mongomirror instead of Live Import

Should your source MongoDB database servers exist in regions far from these application servers, you may need to leverage mongomirror or mongodump/mongorestore rather than Live Import.


Potential Detour: Network Access

In order to accomplish a migration using Live Import, Atlas streams data through a set of MongoDB-Controller application servers. Atlas provides the IP Address ranges of the MongoDB Live Import servers during the Live Import process. You must be certain to add these IP Address ranges to the IP Whitelist for your Destination cluster.

The migration processes within Atlas run on a set of application servers -- these are the traffic directors. The following is a list of the IP address ranges these application servers use. It is important to ensure that traffic between these servers, your source cluster, and the destination cluster is able to flow freely. These addresses are in CIDR notation.

  • 4.71.186.128/25
  • 4.35.16.128/25
  • 52.72.201.163/32
  • 34.196.196.255/32

An additional area where a detour may be encountered is in the realm of corporate firewall policy.

To avoid these potential detours, ensure that you have the appropriate connectivity from the networks where your source deployment resides to the networks where MongoDB Atlas exists.

Route Guidance: Whitelist the IP Ranges of the MongoDB Live Import Process

These IP ranges will be provided at the start of the migration process. Ensure that you configure the whitelist to enable appropriate access during the migration.


Potential Detour: Insufficient User Rights on Source Deployment

Every deployment of MongoDB should enforce authentication. This will ensure that only appropriate individuals and applications may access your MongoDB data.

A potential detour may arise when you attempt to Live Migrate a database without creating or providing the appropriately privileged user credentials.

If the source cluster enforces authentication, create a user with the following privileges:

  • Read all databases and collections (i.e. readAnyDatabase on the admin database)
  • Read the oplog.

Route Guidance: Ensure Appropriate User Access Permissions on the Source Deployment

Create a SCRAM user and password on each server in the replica set and ensure that this user belongs to roles that have the following permissions:

  • Read and write to the config database
  • Read all databases and collections
  • Read the oplog

For example:

  • For 3.4+ source clusters, a user with both clusterMonitor and backup roles would have the appropriate privileges.
  • For a 3.2 source cluster, a user with the clusterMonitor, clusterManager, and backup roles would have appropriate privileges.
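
As a concrete example for a 3.4+ source, a minimal sketch of creating such a user in the mongo shell (the username and password are placeholders) might look like:

use admin
db.createUser({
  user: "migrationUser",        // placeholder username
  pwd: "changeThisPassword",    // placeholder password
  roles: [
    { role: "clusterMonitor", db: "admin" },
    { role: "backup", db: "admin" }
  ]
})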

Specify the username and password to Atlas when prompted by the Live Migration procedure.

Also, once you’ve migrated your data, if the source cluster enforced authentication you must consider that Atlas does not migrate any user or role data to the destination cluster. Therefore, you must re-create the credentials used by your applications on the destination Atlas cluster. Atlas uses SCRAM for user authentication. See Add MongoDB Users for a tutorial on creating MongoDB users in Atlas.


Potential Detour: Insufficient Oplog Size on Destination

The oplog, or operations log, is a capped collection that keeps a rolling record of all operations that modify the data stored in your databases. When you create an Atlas cluster to serve as the destination for your migration, Atlas by default sizes the oplog at 5% of the total amount of disk you allocated for the cluster (for example, roughly 4 GB of oplog for an 80 GB cluster). If the activity profile of your application requires a larger oplog, you will need to submit a proactive support ticket to have the oplog size increased on your destination cluster.

Route Guidance: Size the Operations Log (Oplog) Appropriately - Submit a Proactive Support Ticket if Oplog Resize is Needed.

As stated previously, the decisions regarding the resources we apply to a given MongoDB Deployment are informed by the profile of the applications that depend on the database. As such, there are certain application read/write profiles or workloads that require a larger than default operations log. These are listed in detail in the documentation pages on the subject of Replica Set Oplog. Here is a summary of the workloads that typically require a larger than normal Oplog:

Updates to Multiple Documents at Once

The oplog must translate multi-updates into individual operations in order to maintain idempotency. This can use a great deal of oplog space without a corresponding increase in data size or disk use.
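
For example, a single multi-document update like the one below is recorded in the oplog as one entry per matched document, so matching 10,000 documents produces 10,000 oplog entries even though the data size barely changes (collection and field names are illustrative):

// One statement from the client, but one oplog entry per matched document.
db.products.updateMany({ category: "backpack" }, { $inc: { views: 1 } })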

Deletions Equal the Same Amount of Data as Inserts

If you delete roughly the same amount of data as you insert, the database will not grow significantly in disk use, but the size of the operation log can be quite large.

Significant Number of In-Place Updates

If a significant portion of the workload is updates that do not increase the size of the documents, the database records a large number of operations but does not change the quantity of data on disk.

In Conclusion

Regardless of your starting point, MongoDB provides a robust, secure, and scalable destination for your data. MongoDB Atlas Live Import automates and simplifies the process of migrating your data to MongoDB Atlas. The command line version of this utility, called mongomirror, gives users additional control and flexibility around how the data gets migrated. Other options include exporting (mongoexport) and importing (mongoimport) your data manually, or even writing your own application to accomplish the migration. The decision to use one particular method over another depends upon the size of your database and its geographic location, as well as your tolerance for application downtime.

If you choose to leverage MongoDB Atlas Live Import, be aware of the following potential challenges along the journey.

  • Increase the available RAM during migration so that it is sufficient for the application plus the migration requirements.
  • Reduce latency if possible or use mongomirror instead of Live Import.
  • Whitelist the IP Ranges of the MongoDB Live Import Process
  • Ensure Appropriate User Access Permissions on the Source Deployment
  • Size the Operations Log (Oplog) Appropriately - Submit a Proactive Support Ticket if Oplog Resize is Needed.

Now that you’re fully prepared, let’s embark on the journey and I’ll guide you through the process of deploying a cluster in MongoDB Atlas and walk you through migrating your data from an AWS Replica Set.