
Introducing VPC Peering for MongoDB Atlas

MongoDB Atlas now allows you to directly peer virtual private clouds (VPCs) in your AWS accounts with the MongoDB Atlas VPC created for your MongoDB clusters. Easily create an extended, private network connecting your application servers and backend databases.

VPC Peering in MongoDB Atlas is a significant ease of use and security improvement:

  • Your application servers (and development environments) can directly connect to MongoDB Atlas while remaining isolated from public networks.
  • Automatically scale your application tier without having to manage your database firewall rules.
  • Peer multiple VPCs in the same region from your AWS account(s) to each MongoDB Atlas group.

Security groups from your peered VPC can even be referenced in MongoDB Atlas clusters.

Tutorial

Let’s walk through what using this functionality feels like.

Prerequisites:

  • Create an AWS account
    • Create a VPC
    • Enable “DNS hostnames” on the VPC (optional). This makes it possible to immediately resolve the hostnames in the peered MongoDB Atlas cluster’s VPC to their private IP addresses (otherwise propagation can take up to one hour).
    • Launch instances that you can SSH into
    • Download MongoDB shell software onto those instances to confirm connectivity
  • Create a MongoDB Atlas account
    • Deploy a cluster in the same region as your AWS VPC
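
Once peering is in place, the effect of the “DNS hostnames” setting can be checked from an instance inside the VPC: the cluster hostnames should resolve to private addresses. The following is a minimal sketch using only the Python standard library. The helper names are hypothetical, and the hostname you pass in is whatever your own cluster's connection string shows.

```python
import ipaddress
import socket


def is_private_address(address: str) -> bool:
    """Return True if the IP address falls in a private range."""
    return ipaddress.ip_address(address).is_private


def resolves_privately(hostname: str, port: int = 27017) -> bool:
    """Resolve hostname and report whether every returned address is private."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    addresses = {info[4][0] for info in infos}
    return all(is_private_address(address) for address in addresses)
```

Calling `resolves_privately()` with one of your cluster hostnames from an EC2 instance in the peered VPC should return `True` once DNS has propagated.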

Step by Step Guide

  1. Register for a MongoDB Atlas account.
  2. Deploy cluster (US-East region is shown here)
  3. While the database cluster is deploying, navigate to the “Security” tab’s “Peering” section
  4. Add a New Peering Connection and include the information about your existing VPC (helpful “Show me how” instructions can be found throughout this process)
  5. Note that the default VPC used for EC2 instances uses a CIDR block that overlaps with that used by MongoDB Atlas and so cannot be peered – a new one must be created. I created a VPC with a CIDR block “10.0.0.0/16” for testing, like so:
  6. Enable “DNS hostnames” on the VPC and record the VPC ID for use in the peering form:
  7. Before using the VPC for any EC2 instances, it is necessary to create a new subnet for the VPC; in this case I used the full CIDR of the VPC:
  8. Create an EC2 instance using the new VPC and subnet:
  9. Fill in the peering request form as shown below (AWS account detail omitted) and include the entire VPC CIDR (10.0.0.0/16); you could optionally include a subset here. Notes:
    • In this example, I am leaving the default option, “Add this CIDR block to my IP whitelist”, selected so that I will be able to immediately connect (but as we’ll see later, I could instead use a security group).
    • Also, because I have already created a MongoDB Atlas cluster, the MongoDB Atlas region and CIDR block cannot be adjusted (if I were in a new MongoDB Atlas group that did not have a cluster yet, I could specify those).
  10. At this point, assuming you have correctly filled in the peering request details, you should see “Waiting for Approval”.
    • The UI shown below contains a helpful “How do I approve the connection?” section with two steps:

      i. Accept the peering request in my AWS account and

      ii. Add the route table entry for the Atlas CIDR Block shown in the top right so that my VPC routes to the MongoDB Atlas VPC
  11. In the AWS Console, under the VPC Dashboard, in the “Peering Connections” section, choose “Accept Request”.
  12. In the AWS Console under the “Route Table” for your VPC, choose “Add another rule”, paste in the MongoDB Atlas CIDR block, and associate it with the VPC peering connection.
    • Note that if you don’t see a 0.0.0.0/0 route associated with an internet gateway then you should add one if you want to SSH directly into your VPC’s instances from your laptop – this may necessitate creating a new internet gateway.
  13. After accepting the Peering Connection in our VPC, MongoDB Atlas will display the Peering Connection as “Available” (this may take up to 10 minutes to show)
  14. Now let’s demonstrate connectivity in this tutorial by navigating to our cluster in MongoDB Atlas and clicking “Connect” to follow instructions.
    a. We can confirm that the CIDR block associated with our peered VPC has already been added to our IP address whitelist.
    b. We’ll download and extract the MongoDB shell for the operating system of the instance in our VPC, and use the ‘mongo’ shell instructions shown in the “Connect” dialog.
    c. Success! We’ve connected without having any public IP addresses open to our MongoDB Atlas cluster!
  15. Now let’s remove the CIDR block (IP addresses) from our IP Address Whitelist, and demonstrate that we can instead reference a Security Group from our peered VPC

    a. We’ll navigate to the “Security” tab’s “IP Whitelist” section.
    b. We’ll click “Delete” on the peer VPC’s CIDR block (10.0.0.0/16 in this case).
    c. Let’s add an inbound rule to our EC2 instance’s security group such that connectivity on ports 27000-28000 can be made within the security group itself.
    d. Now we’ll click “Add IP Address” but specifically enter the security group ID associated with the instance in our VPC.
    e. Now we can confirm connectivity again (with no explicit IP addresses in our whitelist). Awesome!
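
Step 5 hinges on whether two CIDR blocks overlap, which is easy to check before you create a VPC. Here is a minimal sketch with Python's standard `ipaddress` module. The Atlas CIDR used below is a hypothetical value for illustration only; use the block that Atlas displays for your own group.

```python
import ipaddress


def cidrs_overlap(cidr_a: str, cidr_b: str) -> bool:
    """Return True if the two CIDR blocks share any addresses."""
    return ipaddress.ip_network(cidr_a).overlaps(ipaddress.ip_network(cidr_b))


# Hypothetical Atlas VPC CIDR, for illustration only; use the value Atlas shows you.
ATLAS_CIDR = "172.31.248.0/21"

# A default-style VPC range that contains the Atlas block, so it cannot be peered.
print(cidrs_overlap("172.31.0.0/16", ATLAS_CIDR))
# The test VPC from step 5, which sits in a different range.
print(cidrs_overlap("10.0.0.0/16", ATLAS_CIDR))
```

Any VPC whose check returns `False` against the Atlas block is a candidate for peering.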

#### Next steps

[Register for MongoDB Atlas](https://www.mongodb.com/cloud/atlas?jmp=blog) and deploy your first cluster today!



Atlas Update – Faster Scaling

Jay Gordon

Cloud

MongoDB Atlas has only been around for a few months and already we’ve improved the speed at which you can scale your clusters.

“Scaling should be faster than a speeding bullet” - https://twitter.com/Jennifer_Seelin
https://webassets.mongodb.com/_com_assets/blog/tblr/tumblr_ofrpuh2iBB1sdaytmo5_1280.jpg
Jennifer - Corporate Communications Super Hero @MongoDB.

Some heroes indeed do wear capes, like the Cloud Engineers at MongoDB. OK, they really do not wear capes but they do get coffee and plenty of snacks. Recently it was time to look at the time spent on working with MongoDB Atlas, and specifically scaling to your requirements.

Your time, your data and all things you do are precious. No one needs to spend extra time waiting on larger sized hardware or cloud servers. When we first launched MongoDB Atlas, modifying your cluster was a lot of work on the back end. Our engineering department had to take a hard look at the process in which scaling worked and find a method to best utilize our Cloud vendor’s rich APIs and standard UNIX tools.

One of the more common tasks our users have is to scale from one size to another. For our initial implementation we chose a process that was simple and worked for all configuration changes, but was not particularly fast. When a user requested a new configuration we would terminate one server, build a new server in the new configuration, and then wait for MongoDB replication to copy the data and rebuild the indexes on the new server. This was repeated for each server, until all servers satisfied the new configuration.

Let’s look at the “old way” from a high level:

https://webassets.mongodb.com/_com_assets/blog/tblr/tumblr_ofrpuh2iBB1sdaytmo4_1280.png

While this was an extremely reliable process, it could be time consuming. Each step required waiting on the AWS API to respond with the proper status of the instance while we continued to serve data. As each part of the process continued, your primary and at least one secondary remained online. The downside was that index builds sometimes took far too long. It was time to make a fundamental change to this process.

To expedite this upgrade during scaling, we have implemented a new and faster process for our M30 through M100 instance types.

Let’s upgrade from an M30 instance to an M40. We’ve gone ahead and accessed our Atlas UI and started our changes:

https://webassets.mongodb.com/_com_assets/blog/tblr/tumblr_ofrpuh2iBB1sdaytmo2_1280.png

Atlas receives details on your new plan and then begins the process of putting it into service. A basic overview of how Atlas’s functions flow can be seen here:

https://webassets.mongodb.com/_com_assets/blog/tblr/tumblr_ofrpuh2iBB1sdaytmo1_1280.png

Now we’ve reached the “Plan Execution” point of our scaling, and how we improved speed has a great deal to do with the replication of your data.

As mentioned earlier, our previous method (still in use for upgrades from M10/M20 instances to M30 and above due to resource restrictions by Amazon Web Services) would require MongoDB replication and index creation for these new instances during an initial sync.

“Avoid the Initial-Sync”

Rather than destroy your EC2 instance, we stop each one at a time in a rolling fashion and then notify the AWS API of your new MongoDB Atlas class. When you modify your disk attributes, we turn to standard UNIX utilities to avoid the time-consuming initial sync.

We take the time to review the data currently on your deployment’s disks and validate that our checksums match to ensure a complete replica of the data. We also validate once again by reviewing existing members in your set and ensuring normal replication continues.
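
To make the checksum idea concrete, here is a minimal sketch that compares two copies of a data directory file by file. The post does not say which utilities or hash function Atlas uses, so SHA-256 and the helper names below are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def directory_checksums(root: Path) -> dict:
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    checksums = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # Reads the whole file at once; a real tool would stream large files.
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            checksums[str(path.relative_to(root))] = digest
    return checksums


def copies_match(source: Path, copy: Path) -> bool:
    """Return True if both directories hold the same files with the same contents."""
    return directory_checksums(source) == directory_checksums(copy)
```

A copy that is missing a file, or that holds a file with different bytes, produces a different mapping and fails the comparison.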

https://webassets.mongodb.com/_com_assets/blog/tblr/tumblr_ofrpuh2iBB1sdaytmo3_1280.png

As we reach the point of resize, a job is kicked off that creates a duplicate of your data directory along with all previously existing indexes and options. There’s no need for MongoDB to spend time rebuilding any of this. Here’s where the actual speed of the process comes into play: we no longer need to worry about recreating indexes, which for larger deployments can actually take longer than copying the data.

Once this data copy completes, MongoDB resumes reading the oplog to apply any writes that occurred during the upgrade process; this is how our standard replication works.

As this upgrade process occurs one by one, you’ll notice that you will have no downtime, no outages and a continuation of normal service. Your connection string will never be modified based on this, so there’s no need for you to make code changes on the application side to reflect you’ve scaled in any way.
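
The reason a one-by-one upgrade avoids downtime can be shown with a toy model. The sketch below is not Atlas code: it is a simplified simulation, with hypothetical member names, that resizes a three-member replica set one member at a time and checks that a majority stays available throughout.

```python
def rolling_resize(members, new_size):
    """Resize members one at a time; yield how many stay available at each step.

    `members` is a list of dicts such as {"name": "node-0", "size": "M30"}.
    """
    for member in members:
        # Only this one member is offline while it is resized.
        available = len(members) - 1
        member["size"] = new_size
        yield member["name"], available


replica_set = [{"name": f"node-{i}", "size": "M30"} for i in range(3)]
majority = len(replica_set) // 2 + 1

for name, available in rolling_resize(replica_set, "M40"):
    # A three-member set keeps two members up, which is still a majority.
    assert available >= majority
    print(f"resized {name}; {available} of {len(replica_set)} members available")
```

Because a majority of members is always up, the set can keep a primary and continue serving reads and writes while each member is resized.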

“Eradicate Downtime”

Need to size up because your application is taking off? Finished with a project and need a smaller instance size? This new method of scaling will ensure you can keep up with your always-changing environment and remain stable. The ability to change your cluster with no downtime is one of the most powerful features of our offering and why developers and companies continue to migrate their workloads to Atlas.

5 Blogs to Read Before You Head to AWS re:Invent Next Month

This post is part of our Road to re:Invent series. In the weeks leading up to AWS re:Invent in Las Vegas this November, we'll be posting about a number of topics related to running MongoDB in the public cloud.

![Road to AWS re:Invent](https://webassets.mongodb.com/_com_assets/cms/AWS_ReInvent-683wqzsi2z.jpg)

Before you head to AWS re:Invent next month, we’ve pulled together our most popular blog posts about running MongoDB alongside different AWS solutions.

1. Virtualizing MongoDB on Amazon EC2 and GCE

As part of a migration to a cloud hosting environment, David Mytton, Founder and CTO of Server Density, did an investigation into the best ways to deploy MongoDB into two popular platforms, Amazon EC2, and Google Compute Engine.

In this two part series, we will review David’s general pros and cons of virtualization along with the challenges and methods of virtualizing MongoDB on EC2 and GCE.

Read the post >

2. Maximizing MongoDB Performance on AWS

You have many choices to make when running MongoDB on AWS: from instance type and security, to how you configure MongoDB processes and more. In addition, you now have options for tooling and management. In this post we’ll take a look at several recommendations that can help you get the best performance out of AWS.

Read the post >

3. Develop & Deploy a Node.js App to AWS Elastic Beanstalk & MongoDB Atlas

AWS Elastic Beanstalk is a service offered by Amazon to make it simple for developers to deploy and manage their cloud-based applications. In this post, Andrew Morgan will walk you through how to build and deploy a Node.js app to AWS Elastic Beanstalk using MongoDB Atlas.

Read the tutorial >

4. Oxford Nanopore Technologies Powers Real-Time Genetic Analysis Using Docker, MongoDB, and AWS

In this post, we take a look at how containerization, the public cloud, and MongoDB are helping a UK-based biotechnology firm track the spread of Ebola.

Get the full story >

5. Selecting AWS Storage for MongoDB Deployments: Ephemeral vs. EBS

Last but not least, take a look at what we were writing about this time last year as Bryan Reinero explores how to select the right AWS solution for your deployment.

Keep reading >

Want more?

We’ll be blogging about MongoDB and the cloud leading up to re:Invent again this year in our Road to re:Invent series. You can see the posts we’ve already published here.

Going to re:Invent?

The MongoDB team will be in Las Vegas at re:Invent 11/29 to 12/2. If you’re attending re:Invent, be sure to visit us at booth 2620!


MongoDB Atlas, the cloud database service for MongoDB, is the easiest way to deploy and run MongoDB, allowing you to get started in minutes. Click here to learn more.

Get the guide for MongoDB on AWS

Crossing the Chasm: Looking Back on a Seminal Year of Cloud Technology

This post is part of our Road to re:Invent series. In the weeks leading up to AWS re:Invent in Las Vegas this November, we'll be posting about a number of topics related to running MongoDB in the public cloud.

![Road to AWS re:Invent](https://webassets.mongodb.com/_com_assets/cms/AWS_ReInvent-683wqzsi2z.jpg)

On the main stage of Amazon’s AWS re:Invent conference in Las Vegas last year, Capital One’s CIO, Rob Alexander made his way into headlines of tech publications when he explained that, under his leadership, the bank would be reducing the number of data centers from 8 in 2015 to just 3 in 2018. Capital One began using cloud-hosted infrastructure organically, with developers turning to the public cloud for a quick and easy way to provision development environments. The increase in productivity prompted IT leadership to adopt a cloud-first strategy not just for development and test environments, but for some of the bank’s most vital production workloads.

Capital One’s story generated headlines just a short year ago; it has now become just one of many examples of large enterprises shifting mission critical deployments to the cloud.

In a recent report released by McKinsey & Company, the authors declared “the cloud debate is over—businesses are now moving a material portion of IT workloads to cloud environments.” The report goes on to validate what many industry-watchers (including MongoDB, in our own Cloud Brief this May) have noted: cloud adoption in the enterprise is gaining momentum and is driven primarily by benefits in time to market.

According to McKinsey’s survey almost half (48 percent) of large enterprises have migrated an on-premises workload to the public cloud. Based on the conventional model of innovation adoption, this marks the divide between the “early majority” of cloud adopters and “late majority.” This not only means that the cloud computing “chasm” has been crossed, but that we have entered the period where the near term adoption of cloud-centric strategies will play a strong role in an organization’s ability to execute, and as a result, its longevity in the market.
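
The arithmetic behind that claim is short. In the standard technology adoption lifecycle, innovators (2.5 percent), early adopters (13.5 percent), and the early majority (34 percent) together account for 50 percent of adopters, so 48 percent sits just below the boundary with the late majority. A small sketch, with a hypothetical helper name:

```python
# Segment shares from the standard technology adoption lifecycle (percent).
SEGMENTS = [
    ("innovators", 2.5),
    ("early adopters", 13.5),
    ("early majority", 34.0),
    ("late majority", 34.0),
    ("laggards", 16.0),
]


def segment_for(adoption_percent):
    """Return the segment that a cumulative adoption percentage falls into."""
    cumulative = 0.0
    for name, share in SEGMENTS:
        cumulative += share
        if adoption_percent <= cumulative:
            return name
    raise ValueError("adoption percentage must be between 0 and 100")


print(segment_for(48))
```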

![](https://webassets.mongodb.com/_com_assets/cms/AWS_ReInvent_Adoption_Lifecycle-awjdat7emu.png)
Image source: [Technology Adoption Lifecycle](https://upload.wikimedia.org/wikipedia/commons/d/d3/Technology-Adoption-Lifecycle.png)

An additional indication that the “chasm” has been bridged comes as more heavily-regulated industries put down oft-cited security concerns and pair public cloud usage with other broad-scale digitization initiatives. As Amazon, Google, and Microsoft (the three “hyperscale” public cloud vendors as McKinsey defines them) continue to invest significantly in securing their services, the most memorable soundbite from Alexander’s keynote continues to ring true: that Capital One can “operate more securely in the public cloud than we can in our own data centers."

As the concern over security in the public cloud continues to wane, other barriers to cloud adoption are becoming more apparent. Respondents to McKinsey’s survey and our own Cloud Adoption Survey earlier this year reported concerns of vendor lock-in and of limited access to talent with the skills needed for cloud deployment. With just 4 vendors holding over half of the public cloud market, CIOs are careful to select technologies that have cross-platform compatibility as Amazon, Microsoft, IBM, and Google continue to release application and data services exclusive to their own clouds.

This reluctance to outsource certain tasks to the hyperscale vendors is mitigated by a limited talent pool. Developers, DBAs, and architects with experience building and managing internationally-distributed, highly-available, cloud-based deployments are in high demand. In addition, it is becoming more complex for international business to comply with the changing landscape of local data protection laws as legislators try to keep pace with cloud technology. As a result, McKinsey predicts enterprises will increasingly turn to managed cloud offerings to offset these costs.

It is unclear whether the keynote at Amazon’s re:Invent conference next month will once again predict the changing enterprise technology landscape for the coming year. However, we can be certain that the world’s leading companies will be well-represented as the public cloud continues to entrench itself even deeper into enterprise technology.



The MongoDB team will be at AWS re:Invent this November in Las Vegas and our CTO Eliot Horowitz will be speaking Thursday (12/1) afternoon. If you’re attending re:Invent, be sure to attend the session & visit us at booth #2620!

Learn more about AWS re:Invent

Getting Started with Python, PyMODM, and MongoDB Atlas

Jason Ma

Technical, Cloud

What is PyMODM

PyMODM is an object modeling package for Python that works like an Object Relational Mapping (ORM) and provides a validation and modeling layer on top of PyMongo (MongoDB’s Python driver). Developers can use PyMODM as a template to model their data, validate schemas, and easily delete referenced objects. PyMODM can be used with any Python framework, is compatible with Python 3, and is supported by MongoDB.

Benefits of PyMODM

PyMODM allows developers to focus more on developing application logic instead of creating validation logic to ensure data integrity.

Some key benefits of PyMODM are:

Field Validation. MongoDB has a dynamic schema, but there are very few production use cases where data is entirely unstructured. Most applications expect some level of data validation either through the application or database tier. MongoDB provides Document Validation within the database. Users can enforce checks on document structure, data types, data ranges, and the presence of mandatory fields. Document validation is useful for centralizing rules across projects and APIs, as well as minimizing redundant code for multiple applications. In certain cases, application-side validation makes sense as well, especially when you would like to obviate the need for a round trip between the application and database. PyMODM provides users the ability to define models and validate their data before storing it in MongoDB, thus reducing the amount of data validation logic developers need to write in the application tier.

Built In Reference Handling. PyMODM has built in reference handling to make development simpler. Developers don’t have to plan on normalizing data as much as they would with an RDBMS. PyMODM can automatically populate fields that reference documents in other collections, in a similar way to foreign keys in an RDBMS.

For example, you might have a model for a blog post that contains an author. Let’s say we want to keep track of these entities in separate collections. The way we store this in MongoDB is to have the _id from the author document be stored as an author field in the post document:

```
{
    "title": "Working with PyMODM",
    "author": ObjectId('57dad74a6e32ab4894ea6898')
}
```

If we were using the low-level driver, we would just get an **ObjectId** when we accessed **post['author']**, whereas PyMODM will lazily dereference this field for you:

>>> post.author
Author(name='Jason Ma')

In other words, PyMODM handles all the necessary queries to resolve referenced objects, instead of having to pull out the ids yourself and perform the extra queries manually. 
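
The idea of lazy dereferencing can be illustrated without a database. The sketch below is not PyMODM's implementation: it uses in-memory dictionaries as stand-ins for two collections and a hypothetical `LazyPost` class that looks up the author only when the attribute is first accessed. The caching is a choice made for this sketch.

```python
# In-memory stand-ins for two collections (hypothetical data, no database needed).
AUTHORS = {"a1": {"_id": "a1", "name": "Jason Ma"}}
POSTS = {"p1": {"_id": "p1", "title": "Working with PyMODM", "author": "a1"}}


class LazyPost:
    """Wraps a post document and resolves its author only on first access."""

    def __init__(self, document):
        self._document = document
        self._author = None  # not fetched yet
        self.lookups = 0     # counts simulated "queries" against AUTHORS

    @property
    def author(self):
        if self._author is None:
            self.lookups += 1
            self._author = AUTHORS[self._document["author"]]
        return self._author


post = LazyPost(POSTS["p1"])
print(post.lookups)         # no lookup has happened yet
print(post.author["name"])  # first access triggers the lookup
print(post.author["name"])  # second access reuses the cached author
print(post.lookups)
```

With the low-level driver, the equivalent of the lookup inside `author` is a second query that you would write yourself.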

PyMODM also provides several strategies for managing how objects get deleted when they are involved in a relationship with other objects. For example, if you have a Book and Publisher class, where each Book document references a Publisher document, you have the option of deleting all Book objects associated with that Publisher.

Familiar PyMongo Syntax. PyMODM uses PyMongo-style syntax for queries and updates, which makes it familiar and easy to get started with for those already familiar with PyMongo.

Installing PyMODM

Getting started with PyMODM is simple. You can install PyMODM with pip.

pip install pymodm

Connecting to MongoDB Atlas

For developers that are interested in minimizing operational database tasks, MongoDB Atlas is an ideal option.

MongoDB Atlas is a database as a service and provides all the features of the database without the heavy lifting of setting up operational tasks. Developers no longer need to worry about provisioning, configuration, patching, upgrades, backups, and failure recovery. Atlas offers elastic scalability, either by scaling up on a range of instance sizes or scaling out with automatic sharding, all with no application downtime.

Setting up Atlas is simple.

Select the instance size that fits your application needs and click “CONFIRM & DEPLOY”.

Connecting PyMODM to MongoDB Atlas is straightforward and easy. Just find the connection string and plug it into the ‘connect’ method. To ensure a secure system right out of the box, authentication and IP Address whitelisting are automatically enabled. IP address whitelisting is a key MongoDB Atlas security feature, adding an extra layer to prevent 3rd parties from accessing your data. Clients are prevented from accessing the database unless their IP address has been added to the IP whitelist for your MongoDB Atlas group. For AWS, VPC Peering for MongoDB Atlas is under development and will be available soon, offering a simple, robust solution. It will allow the whitelisting of an entire AWS Security Group within the VPC containing your application servers.

from pymodm import connect

# Establish a connection to the database and name the connection 'my-atlas-app'.
connect(
    'mongodb://jma:PASSWORD@mongo-shard-00-00-efory.mongodb.net:27017,mongo-shard-00-01-efory.mongodb.net:27017,mongo-shard-00-02-efory.mongodb.net:27017/admin?ssl=true&replicaSet=mongo-shard-0&authSource=admin',
    alias='my-atlas-app'
)

In this example, we have set alias=’my-atlas-app’. An alias in the connect method is optional, but comes in handy if we ever need to refer to the connection by name. Remember to replace “PASSWORD” with your own generated password.
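
One practical detail when substituting your own password: characters such as “@” or “/” must be percent-encoded before they go into the connection string. The sketch below shows one way to do that with the standard library. The `build_uri` helper and the sample password are hypothetical; the hostnames are the ones from the example above.

```python
from urllib.parse import quote_plus


def build_uri(user: str, password: str, hosts: list, replica_set: str) -> str:
    """Assemble a MongoDB connection string with percent-encoded credentials."""
    credentials = f"{quote_plus(user)}:{quote_plus(password)}"
    options = f"ssl=true&replicaSet={replica_set}&authSource=admin"
    return f"mongodb://{credentials}@{','.join(hosts)}/admin?{options}"


hosts = [
    "mongo-shard-00-00-efory.mongodb.net:27017",
    "mongo-shard-00-01-efory.mongodb.net:27017",
    "mongo-shard-00-02-efory.mongodb.net:27017",
]
# A made-up password containing characters that need escaping.
uri = build_uri("jma", "p@ss/word", hosts, "mongo-shard-0")
print(uri)
```

The resulting string can be passed to `connect()` in place of the literal URI, together with the same `alias` argument.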

Defining Models

One of the big benefits of PyMODM is the ability to define your own models and apply schema validation to those models. The below examples highlight how to use PyMODM to get started with a blog application.

Once a connection to MongoDB Atlas is established, we can define our model class. MongoModel is the base class for all top-level models, which represents data stored in MongoDB in a convenient object-oriented format. A MongoModel definition typically includes a number of field instances and possibly a Meta class that provides settings specific to the model:

from pymodm import MongoModel, fields
from pymongo.write_concern import WriteConcern

class User(MongoModel):
    email = fields.EmailField(primary_key=True)
    first_name = fields.CharField()
    last_name = fields.CharField()

    class Meta:
        connection_alias = 'my-atlas-app'
        write_concern = WriteConcern(j=True)

In this example, the User model inherits from MongoModel, which means that the User model will create a new collection in the database (myDatabase.User). Any class that inherits directly from MongoModel will always get its own collection.

The character fields (first_name, last_name) and email field (email) will always store their values as unicode strings. If a user stores some other type in first_name or last_name (e.g. Python ‘bytes’) then PyMODM will automatically convert the field to a unicode string, providing consistent and uniform access to that field. A validator is readily available on CharField, which will validate the maximum string length. For example, if we wanted to limit the length of a last name to 30 characters, we could do:

last_name = fields.CharField(max_length=30)

For the email field, we set primary_key=True. This means that this field will be used as the id for documents of this MongoModel class. Note, this field will actually be called _id in the database. PyMODM will validate that the email field contents contain a single ‘@’ character.

New validators can also easily be created. For example, the email validator below ensures that the email entry is a Gmail address:

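from pymodm import MongoModel, fields
from pymodm.errors import ValidationError

def is_gmail_address(string):
    # Reject any address that is not on the gmail.com domain.
    if not string.endswith('@gmail.com'):
        raise ValidationError('Email address must be valid gmail account.')

class User(MongoModel):
    email = fields.EmailField(validators=[is_gmail_address])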

Here, PyMODM will validate that the email field contains a valid Gmail address or throw an error. PyMODM handles field validation automatically whenever a user retrieves or saves documents, or on-demand. By rolling validation into the Model definition, we reduce the likelihood of storing invalid data in MongoDB. PyMODM fields also provide a uniform way of viewing data in that field. If we use a FloatField, for example, we will always receive a float, regardless of whether the data stored in that field is a float, an integer, or a quoted number. This reduces the amount of logic that developers need to create in their applications.

Finally, the last part of our example is the Meta class, which contains two pieces of information. The connection_alias tells the model which connection to use. In a previous code example, we defined the connection alias as my-atlas-app. The write_concern attribute tells the model which write concern to use by default. You can define other Meta attributes such as read concern, read preference, etc. See the PyMODM API documentation for more information on defining the Meta class.

Reference Other Models

Another powerful feature of PyMODM is the ability to reference other models.

Let’s take a look at an example.

from pymodm import EmbeddedMongoModel, MongoModel, fields

class Comment(EmbeddedMongoModel):
    author = fields.ReferenceField(User)
    content = fields.CharField()

class Post(MongoModel):
    title = fields.CharField()
    author = fields.ReferenceField(User)
    revised_on = fields.DateTimeField()
    content = fields.CharField()
    comments = fields.EmbeddedDocumentListField(Comment)

In this example, we have defined two additional model types: Comment and Post. Both these models contain an author, which is an instance of the User model. The User that represents the author in each case is stored among all other Users in the myDatabase.User collection. In the Comment and Post models, we’re just storing the _id of the User in the author field. This is actually the same as the User’s email field, since we set primary_key=True for the field earlier.

The Post class gets a little bit more interesting. In order to support commenting on a Post, we’ve added a comments field, which is an EmbeddedDocumentListField. The EmbeddedDocumentListField embeds Comment objects directly into the Post object. The advantage of doing this is that you don’t need multiple queries to retrieve all comments associated with a given Post.

Now that we have created models that reference each other, what happens if an author deletes his/her account? PyMODM provides a few options in this scenario:

  • Do nothing (default behaviour).
  • Change the fields that reference the deleted objects to None.
  • Recursively delete all objects that were referencing the object (i.e. delete any comments and posts associated with a User).
  • Don’t allow deleting objects that have references to them.
  • If the deleted object was just one among potentially many other references stored in a list, remove the references from the list. For example, if the application allows for Post to have multiple authors we could remove from the list just the author who deleted their account.

For our previous example, let’s delete any comments and posts associated with a User that has deleted his/her account:

author = fields.ReferenceField(User, on_delete=fields.ReferenceField.CASCADE)

This will delete all documents associated with the reference.
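
To see how the delete options differ, here is a small in-memory sketch. It is not PyMODM code: plain dictionaries and lists stand in for the User and Post collections, the `delete_user` helper is hypothetical, and the rule strings are labels chosen for this sketch. The list-based “pull” option is left out for brevity.

```python
# In-memory stand-ins for the User and Post collections (hypothetical data).
def delete_user(users, posts, email, rule):
    """Delete a user and apply `rule` to the posts that reference them."""
    referencing = [p for p in posts if p["author"] == email]
    if rule == "DENY" and referencing:
        raise ValueError(f"{email} is still referenced by {len(referencing)} post(s)")
    if rule == "NULLIFY":
        for post in referencing:
            post["author"] = None
    elif rule == "CASCADE":
        posts[:] = [p for p in posts if p["author"] != email]
    # "DO_NOTHING" leaves the posts untouched, dangling reference included.
    users.pop(email)


users = {"jason@example.com": {"first_name": "Jason"}}
posts = [{"title": "Working with PyMODM", "author": "jason@example.com"}]

delete_user(users, posts, "jason@example.com", rule="CASCADE")
print(users, posts)
```

With the cascade rule both the user and the referencing post are gone; with the nullify rule the post would survive with its author set to `None`, and with the deny rule the deletion would be refused.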

In this blog, we have highlighted just a few of the benefits that PyMODM provides. For more information on how to leverage the powerful features of PyMODM, check out this GitHub example of developing a blog with the Flask framework.

Summary

PyMODM is a powerful Python ORM for MongoDB that provides an object-oriented interface to MongoDB documents to make it simple to enforce data validation and referencing in your application. MongoDB Atlas helps developers free themselves from the operational tasks of scaling and managing their database. Together, PyMODM and MongoDB Atlas provide developers a compelling solution to enable fast, iterative development, while reducing costs and operational tasks.

Get Started with PyMODM

Check your query and index performance with the Query Targeting Chart

MongoDB

Cloud, Company

We’re excited to announce a new feature for Monitoring in both Cloud Manager and Atlas: The Query Targeting Chart. This chart tracks two variables: the first is “scanned/returned” and the second is “scanned objects/returned”.

https://webassets.mongodb.com/_com_assets/blog/tblr/66.media.tumblr.com--d1cca82c135ff5d50e4ff6af52a9d15d--tumblr_odt0epEg5y1sdaytmo1_400.png

“Scanned/returned” refers to the ratio between the number of index items scanned and the number of documents returned by queries. If this value is 1.0, then your query scanned exactly as many index items as documents it returned – it’s an efficient query. This is available for MongoDB 2.4 and newer.

“Scanned objects/returned” is similar, except it’s about the number of documents scanned versus the number returned. A large number is a sign that you may need an index on the fields you are querying on. This metric is available for MongoDB 2.6 and newer.

For a little more understanding of this graph, let’s talk about a collection with 1000 documents in it. We then issue a query without an index (so it is a collection scan). Scanned objects/returned for this query could be as bad as 1000, but the average value would be 500. Now, let’s put an index on that same query and return one document, having scanned only one document. This means that scanned/returned is 1, and scanned objects/returned is also 1. Finally, let’s say you do a covered query. In this case scanned/returned is 1, but scanned objects/returned is 0, because the index has all the data you requested, so you didn’t need to read any documents!
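
The three cases can be worked through as simple ratios. The sketch below assumes the inputs correspond to the index keys examined, documents examined, and documents returned for a single query; the function name is hypothetical, and how the chart itself treats queries that return nothing is not covered here.

```python
def targeting_ratios(keys_examined, docs_examined, n_returned):
    """Return (scanned/returned, scanned objects/returned) for one query."""
    if n_returned == 0:
        raise ValueError("ratios are undefined when no documents are returned")
    return keys_examined / n_returned, docs_examined / n_returned


# Collection scan over 1000 documents that returns one document (worst case).
print(targeting_ratios(keys_examined=0, docs_examined=1000, n_returned=1))
# Indexed query: one index entry and one document examined for one result.
print(targeting_ratios(keys_examined=1, docs_examined=1, n_returned=1))
# Covered query: answered from the index alone, so no documents are examined.
print(targeting_ratios(keys_examined=1, docs_examined=0, n_returned=1))
```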

This feature is available for all Cloud Manager and Atlas deployments. We believe this new chart will help you refine your queries and indexes to get the best performance out of your MongoDB deployment. However, if you need more help, the Visual Profiler as part of Cloud Manager Premium can help you identify slow queries and suggest indexes as well. Contact your Account Executive for more information about MongoDB subscriptions with access to Cloud Manager Premium.

Peter C. Gravelle is a Technical Account Manager at MongoDB, Inc. He can be found via Atlas’ chat option as well as in tickets. He can also be found in New York City.

Using MongoDB Atlas From Your Favorite Language or Framework

Andrew Morgan

Technical, Cloud

Developers love working with MongoDB. One reason is the flexible data model. Another is that there's an idiomatic driver for just about every programming language, and someone's probably already built a framework on top of MongoDB that takes care of a lot of the grunt work. With high availability and scaling built in, developers can also be confident that MongoDB will continue to meet their needs as their business grows.

MongoDB Atlas provides all of the features of MongoDB, without the operational heavy lifting required for any new application. MongoDB Atlas is available on demand through a pay-as-you-go model and billed on an hourly basis, letting you focus on what you do best.

It’s easy to get started – use a simple GUI to select the instance size, region, and features you need (Figure 1).

*Figure 1: Create MongoDB Atlas Cluster*

MongoDB Atlas provides:

  • Security features to protect access to your data
  • Built in replication for always-on availability, tolerating complete data center failure
  • Backups and point in time recovery to protect against data corruption
  • Fine-grained monitoring to let you know when to scale. Additional instances can be provisioned with the push of a button
  • Automated patching and one-click upgrades for new major versions of the database, enabling you to take advantage of the latest and greatest MongoDB features
  • A choice of cloud providers, regions, and billing options

This post provides instructions on how to use MongoDB Atlas directly from your application or how to configure your favorite framework to use it. It goes on to provide links to some worked examples for specific frameworks.

Worked Examples for Specific Frameworks

Detailed walkthroughs are available for specific programming languages and frameworks:

This list will be extended as new blog posts are produced. If your preferred language or framework isn't listed above then read on as the following, generic instructions cover most other cases.

Preparing MongoDB Atlas For Your Application

Launch your MongoDB cluster using MongoDB Atlas and then (optionally) create a user with read and write privileges for just the database that will be used for your application, as shown in Figure 2.

*Figure 2: Creating an Application user in MongoDB Atlas*

You must also add the IP address of your application server to the IP Whitelist in the MongoDB Atlas security tab (Figure 3). Note that if multiple application servers will be accessing MongoDB Atlas then an IP address range can be specified in CIDR format (IP Address/number of significant bits).

*Figure 3: Add App Server IP Address(es) to MongoDB Atlas*
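To see what a CIDR range covers before adding it to the whitelist, a quick local check with Python's standard ipaddress module can help. This is a sketch only: the function name is hypothetical, the addresses are documentation-range placeholders rather than real servers, and the check runs entirely on your machine without contacting MongoDB Atlas.

```python
import ipaddress


def is_whitelisted(client_ip, whitelist):
    """Return True if client_ip falls inside any CIDR entry in whitelist."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in ipaddress.ip_network(entry) for entry in whitelist)


# Placeholder entries: a /24 covering 256 addresses and a single host (/32).
whitelist = ["203.0.113.0/24", "198.51.100.7/32"]

print(is_whitelisted("203.0.113.42", whitelist))  # inside the /24
print(is_whitelisted("198.51.100.8", whitelist))  # outside both entries
```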

Connecting Your Application (Framework) to MongoDB Atlas

The exact way that you specify how to connect to MongoDB Atlas will vary depending on your programming language and (optionally) the framework you're using. However, it's pretty universal that you'll need to provide a connection string/URI. The core of this URI can be retrieved by clicking on the CONNECT button for your cluster in the MongoDB Atlas GUI, selecting the MongoDB Drivers tab and then copying the string (Figure 4).

*Figure 4: Copy MongoDB Atlas Connection String/URI*

Note that this URI contains the administrator username for your MongoDB Atlas group and will connect to the admin database – you'll probably want to change that.

Your final URI should look something like this:

mongodb://appuser:my_password@cluster0-shard-00-00-qfovx.mongodb.net:27017,cluster0-shard-00-01-qfovx.mongodb.net:27017,cluster0-shard-00-02-qfovx.mongodb.net:27017/appdatabase?ssl=true&authSource=admin

The URI contains these components:

  • appuser is the name of the user you created in the MongoDB Atlas UI.
  • my_password is the password you chose when creating the user in MongoDB Atlas.
  • cluster0-shard-00-00-qfovx.mongodb.net, cluster0-shard-00-01-qfovx.mongodb.net, & cluster0-shard-00-02-qfovx.mongodb.net are the hostnames of the instances in your MongoDB Atlas replica set (click on the "CONNECT" button in the MongoDB Atlas UI if you don't have these).
  • 27017 is the standard MongoDB port number.
  • appdatabase is the name of the database (schema) that your application or framework will use. Note that for some frameworks, this should be omitted and the database name configured separately – check the default configuration file or documentation for your framework to see if it's possible to provide the database name outside of the URI.
  • To enforce security, MongoDB Atlas mandates that the ssl option is used.
  • admin is the database that's being used to store the credentials for appuser.
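As an illustration of how the components above fit together, the following Python sketch assembles the example URI from its parts. The helper name build_atlas_uri is hypothetical, and percent-encoding the credentials with quote_plus is an assumption based on common driver guidance for passwords that contain characters such as "@" or ":".

```python
from urllib.parse import quote_plus


def build_atlas_uri(user, password, hosts, database,
                    port=27017, auth_source="admin"):
    """Assemble a MongoDB connection URI from its components."""
    # Percent-encode the credentials so reserved characters cannot
    # be mistaken for URI delimiters.
    credentials = f"{quote_plus(user)}:{quote_plus(password)}"
    host_list = ",".join(f"{host}:{port}" for host in hosts)
    return (f"mongodb://{credentials}@{host_list}/{database}"
            f"?ssl=true&authSource={auth_source}")


hosts = [
    "cluster0-shard-00-00-qfovx.mongodb.net",
    "cluster0-shard-00-01-qfovx.mongodb.net",
    "cluster0-shard-00-02-qfovx.mongodb.net",
]
uri = build_atlas_uri("appuser", "my_password", hosts, "appdatabase")
print(uri)
```

The resulting string is what you would hand to your driver or framework, for example as the argument to a driver's client constructor.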

Check Your Application Data

At this point, you should add some test data through your application and then confirm that it's being correctly stored in MongoDB Atlas.

MongoDB Compass is the GUI for MongoDB, allowing you to visually explore and interact with your data with full CRUD functionality. The same credentials can be used to connect Compass to your MongoDB database (Figure 5).

*Figure 5: Connect MongoDB Compass to MongoDB Atlas*

Once connected, explore the data added to your collections (Figure 6).

*Figure 6: Explore MongoDB Atlas Data Using MongoDB Compass*

It is also possible to add, delete, and modify documents (Figure 7).

*Figure 7: Modify a Document in MongoDB Compass*

You can verify that the document has really been updated from the MongoDB shell:

Cluster0-shard-0:PRIMARY> use appdatabase
Cluster0-shard-0:PRIMARY> db.simples.find({
    first_name: "Stephanie", 
    last_name: "Green"}).pretty()
{
    "_id" : ObjectId("57a206be0e8ecb0d5b5549f9"),
    "first_name" : "Stephanie",
    "last_name" : "Green",
    "email" : "sgreen1b@tiny.cc",
    "gender" : "Female",
    "ip_address" : "129.173.45.61",
    "children" : [
        {
            "first_name" : "Eugene",
            "birthday" : "8/25/1985"
        },
        {
            "first_name" : "Nicole",
            "birthday" : "12/29/1963",
            "favoriteColor" : "Yellow"
        }
    ]
}

Migrating Your Data to MongoDB Atlas

This post has assumed that you're building a new application but what if you already have one, with data stored in a MongoDB cluster that you're managing yourself? Fortunately, the process to migrate your data to MongoDB Atlas (and back out again if desired) is straightforward and is described in Migrating Data to MongoDB Atlas.

We offer a MongoDB Atlas Migration service to help you properly configure MongoDB Atlas and develop a migration plan. This is especially helpful if you need to minimize downtime for your application, if you have a complex sharded deployment, or if you want to revise your deployment architecture as part of the migration. Contact us to learn more about the MongoDB Atlas Migration service.

Next Steps

While MongoDB Atlas radically simplifies the operation of MongoDB there are still some decisions to take to ensure the best performance and reliability for your application. The MongoDB Atlas Best Practices white paper provides guidance on best practices for deploying, managing, and optimizing the performance of your database with MongoDB Atlas.

The guide outlines considerations for achieving performance at scale with MongoDB Atlas across a number of key dimensions, including instance size selection, application patterns, schema design and indexing, and disk I/O. While this guide is broad in scope, it is not exhaustive. Following the recommendations in the guide will provide a solid foundation for ensuring optimal application performance.

Download MongoDB Atlas


Andrew is part of the MongoDB product team, responsible for building the vision, positioning and content for MongoDB’s products and services, including the analysis of market trends and customer requirements.

Before joining MongoDB, Andrew was director of product management for MySQL at Oracle – with a particular focus on distributed, highly available databases. Prior to Oracle, Andrew worked in software development for telecoms with a focus on HA, in-memory, real-time databases.

Building a New Parse Server & MongoDB Atlas-Based Application

Andrew Davidson

Technical, Cloud

We will learn in this blog post:

  • How to deploy a MongoDB Atlas cluster
  • How to deploy the Parse Server (in our case we will show how to do so using AWS Elastic Beanstalk quick start, but updated to use the newest version of Parse Server)
  • How to configure Parse Server to connect to MongoDB Atlas
  • How to confirm connectivity

Using PencilBlue with MongoDB Atlas

PencilBlue is a Node.js based, open source blogging and Content Management System, targeted at enterprise grade websites.

This post explains why MongoDB Atlas is an ideal choice for PencilBlue and then goes on to show how to configure PencilBlue to use it.

MongoDB Atlas Best Practices: Part 4

Operational Management and Securing your Deployment

MongoDB Atlas radically simplifies the operation of MongoDB. As with any hosted database as a service there are still decisions you need to take to ensure the best performance and availability for your application. This blog series provides a series of recommendations that will serve as a solid foundation for getting the most out of the MongoDB Atlas service.

We’ll cover four main areas over this series of blog posts:

  • In part 1, we got started by preparing for our deployment, focusing specifically on schema design and application access patterns.
  • In part 2, we discussed additional considerations as you prepare for your deployment, including indexing, data migration and instance selection.
  • In part 3, we provided a deep dive into how you scale your MongoDB Atlas deployment, and achieve your required availability SLAs.
  • In this final instalment, we’ll wrap up with best practices for operational management and ensuring data security.

If you want to get a head start and learn about all of these topics now, just go ahead and download the MongoDB Atlas Best Practices guide.

Managing MongoDB Atlas: Provisioning, Monitoring and Disaster Recovery

Created by the engineers who develop the database, MongoDB Atlas is the simplest way to run MongoDB, making it easy to deploy, monitor, backup, and scale MongoDB.

MongoDB Atlas incorporates best practices to help keep managed databases healthy and optimized. They ensure operational continuity by converting complex manual tasks into reliable, automated procedures with the click of a button:

  • Deploy. Using your choice of instance size, number of replica set members, and number of shards
  • Scale. Add capacity, without taking the application offline
  • Point-in-time, Scheduled Backups. Restore complete running clusters to any point in time with just a few clicks, because disasters aren't predictable
  • Performance Alerts. Monitor system metrics and get custom alerts

Deployments and Upgrades

All the user needs to do in order for MongoDB Atlas to automatically deploy the cluster is to select a handful of options:

  • Instance size
  • Storage size (optional)
  • Storage speed (optional)
  • Data volume encryption (optional)
  • Number of replicas in the replica set
  • Number of shards (optional)
  • Automated backups (optional)

The database nodes will automatically be kept up to date with the latest stable MongoDB and underlying operating system software versions; rolling upgrades ensure that your applications are not impacted during upgrades.

Monitoring & Capacity Planning

System performance and capacity planning are two important topics that should be addressed as part of any MongoDB deployment. Part of your planning should involve establishing baselines on data volume, system load, performance, and system capacity utilization. These baselines should reflect the workloads you expect the system to perform in production, and they should be revisited periodically as the number of users, application features, performance SLA, or other factors change.

Featuring charts and automated alerting, MongoDB Atlas tracks key database and system health metrics including disk free space, operations counters, memory and CPU utilization, replication status, open connections, queues, and node status.

Historic performance can be reviewed in order to create operational baselines and to support capacity planning. Integration with existing monitoring tools is also straightforward via the MongoDB Atlas RESTful API, making the deep insights from MongoDB Atlas part of a consolidated view across your operations.

*Figure 1: Database monitoring with MongoDB Atlas GUI*

MongoDB Atlas allows administrators to set custom alerts when key metrics are out of range. Alerts can be configured for a range of parameters affecting individual hosts and replica sets. Alerts can be sent via email, webhooks, Flowdock, HipChat, and Slack or integrated into existing incident management systems such as PagerDuty.

When it's time to scale, just hit the CONFIGURATION button in the MongoDB Atlas GUI and choose the required instance size and number of shards – the automated, on-line scaling will then be performed.

Things to Monitor

MongoDB Atlas monitors database-specific metrics, including page faults, ops counters, queues, connections and replica set status. Alerts can be configured against each monitored metric to proactively warn administrators of potential issues before users experience a problem. The MongoDB Atlas team are also monitoring the underlying infrastructure, ensuring that it is always in a healthy state.

Application Logs And Database Logs

Application and database logs should be monitored for errors and other system information. It is important to correlate your application and database logs in order to determine whether activity in the application is ultimately responsible for other issues in the system. For example, a spike in user writes may increase the volume of writes to MongoDB, which in turn may overwhelm the underlying storage system. Without the correlation of application and database logs, it might take more time than necessary to establish that the application is responsible for the increase in writes rather than some process running in MongoDB.

Page Faults

When a working set ceases to fit in memory, or other operations have moved working set data out of memory, the volume of page faults may spike in your MongoDB system.

Disk

Beyond memory, disk I/O is also a key performance consideration for a MongoDB system because writes are journaled and regularly flushed to disk. Under heavy write load the underlying disk subsystem may become overwhelmed, or other processes could be contending with MongoDB, or the storage speed chosen may be inadequate for the volume of writes.

CPU

A variety of issues could trigger high CPU utilization. This may be normal under most circumstances, but if high CPU utilization is observed without other issues such as disk saturation or page faults, there may be an unusual issue in the system. For example, a MapReduce job with an infinite loop, or a query that sorts and filters a large number of documents from the working set without good index coverage, might cause a spike in CPU without triggering issues in the disk system or page faults.

Connections

MongoDB drivers implement connection pooling to facilitate efficient use of resources. Each connection consumes 1MB of RAM, so be careful to monitor the total number of connections so they do not overwhelm RAM and reduce the available memory for the working set. This typically happens when client applications do not properly close their connections or, with Java in particular, when they rely on garbage collection to close them.
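A rough upper bound on the memory consumed by connections can be estimated from the 1MB-per-connection figure above. The sketch below is a back-of-the-envelope calculation, not a measurement: the function name is hypothetical, and the 20 application servers and pool size of 100 are assumed values that you should replace with your own.

```python
def connection_memory_mb(app_servers, max_pool_size, mb_per_connection=1):
    """Worst-case RAM (in MB) used on one database node by client connections."""
    # Worst case: every application server fills its pool completely.
    return app_servers * max_pool_size * mb_per_connection


# Assumed deployment: 20 application servers, each with a pool of 100.
print(connection_memory_mb(app_servers=20, max_pool_size=100))  # 2000
```

If the estimate is a large fraction of the instance's RAM, reducing the pool size in the driver configuration is one option to consider.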

Op Counters

The utilization baselines for your application will help you determine a normal count of operations. If these counts start to substantially deviate from your baselines it may be an indicator that something has changed in the application, or that a malicious attack is underway.

Queues

If MongoDB is unable to complete all requests in a timely fashion, requests will begin to queue up. A healthy deployment will exhibit very short queues. If metrics start to deviate from baseline performance, requests from applications will start to queue. The queue is therefore a good first place to look to determine if there are issues that will affect user experience.

Shard Balancing

One of the goals of sharding is to uniformly distribute data across multiple servers. If the utilization of server resources is not approximately equal across servers there may be an underlying issue that is problematic for the deployment. For example, a poorly selected shard key can result in uneven data distribution. In this case, most if not all of the queries will be directed to the single mongod that is managing the data. Furthermore, MongoDB may be attempting to redistribute the documents to achieve a more ideal balance across the servers. While redistribution will eventually result in a more desirable distribution of documents, there is substantial work associated with rebalancing the data and this activity itself may interfere with achieving the desired performance SLA.

If in the course of a deployment it is determined that a new shard key should be used, it will be necessary to reload the data with a new shard key because designation and values of the shard keys are immutable. To support the use of a new shard key, it is possible to write a script that reads each document, updates the shard key, and writes it back to the database.

Replication Lag

Replication lag is the amount of time it takes a write operation on the primary replica set member to replicate to a secondary member. A small amount of delay is normal, but as replication lag grows, significant issues may arise.

If growing replication lag is observed, replication throughput can be increased by moving to larger MongoDB Atlas instances or adding shards.
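To make the definition concrete, the sketch below computes the lag of each secondary from a dictionary shaped like the output of the replSetGetStatus command, using its members, name, stateStr, and optimeDate fields. The function name and the sample data are hypothetical, and whether your MongoDB Atlas user is permitted to run that command is not covered here; the MongoDB Atlas charts report the same metric without any scripting.

```python
from datetime import datetime


def replication_lag_seconds(status):
    """Map each secondary's name to its lag behind the primary, in seconds."""
    members = status["members"]
    primary = next(m for m in members if m["stateStr"] == "PRIMARY")
    return {
        m["name"]: (primary["optimeDate"] - m["optimeDate"]).total_seconds()
        for m in members
        if m["stateStr"] == "SECONDARY"
    }


# Hypothetical status document for a three-member replica set.
sample_status = {
    "members": [
        {"name": "host-00:27017", "stateStr": "PRIMARY",
         "optimeDate": datetime(2016, 9, 1, 12, 0, 45)},
        {"name": "host-01:27017", "stateStr": "SECONDARY",
         "optimeDate": datetime(2016, 9, 1, 12, 0, 43)},
        {"name": "host-02:27017", "stateStr": "SECONDARY",
         "optimeDate": datetime(2016, 9, 1, 12, 0, 0)},
    ]
}

print(replication_lag_seconds(sample_status))
```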

Disaster Recovery: Backup & Restore

A backup and recovery strategy is necessary to protect your mission-critical data against catastrophic failure, such as a software bug or a user accidentally dropping collections. With a backup and recovery strategy in place, administrators can restore business operations without data loss, and the organization can meet regulatory and compliance requirements. Taking regular backups offers other advantages, as well. The backups can be used to seed new environments for development, staging, or QA without impacting production systems.

MongoDB Atlas backups are maintained continuously, just a few seconds behind the operational system. If the MongoDB cluster experiences a failure, the most recent backup is only moments behind, minimizing exposure to data loss.

mongodump

In the vast majority of cases, MongoDB Atlas backups deliver the simplest, safest, and most efficient backup solution. mongodump is useful when data needs to be exported to another system, when a local backup is needed, or when just a subset of the data needs to be backed up.

mongodump is a tool bundled with MongoDB that performs a live backup of the data in MongoDB. mongodump may be used to dump an entire database, collection, or result of a query. mongodump can produce a dump of the data that reflects a single moment in time by dumping the oplog entries created during the dump and then replaying it during mongorestore, a tool that imports content from BSON database dumps produced by mongodump.
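The point-in-time behaviour described above corresponds to the --oplog option of mongodump and the --oplogReplay option of mongorestore. The following Python sketch only builds the two command lines as argument lists, which could then be passed to subprocess.run. The helper names, host string, user name, and dump directory are placeholders, and it is an assumption that the user has the privileges needed to read the oplog; --oplog also applies only to a dump of the full deployment, not to a single database or collection.

```python
def mongodump_args(host, username, out_dir):
    """Command line for a full dump that also captures oplog entries."""
    return ["mongodump", "--host", host, "--ssl",
            "--username", username, "--authenticationDatabase", "admin",
            "--oplog", "--out", out_dir]


def mongorestore_args(host, username, dump_dir):
    """Command line that restores a dump and replays the captured oplog."""
    return ["mongorestore", "--host", host, "--ssl",
            "--username", username, "--authenticationDatabase", "admin",
            "--oplogReplay", dump_dir]


# Placeholder replica set name and hosts; no password is given, so the
# tools prompt for it instead of exposing it on the command line.
host = "Cluster0-shard-0/host-00:27017,host-01:27017,host-02:27017"
print(" ".join(mongodump_args(host, "appuser", "dump")))
print(" ".join(mongorestore_args(host, "appuser", "dump")))
```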

Integrating MongoDB with External Monitoring Solutions

The MongoDB Atlas API provides integration with external management frameworks through programmatic access to automation features and monitoring data.

APM Integration

Many operations teams use Application Performance Monitoring (APM) platforms to gain global oversight of their complete IT infrastructure from a single management UI. Issues that risk affecting customer experience can be quickly identified and isolated to specific components – whether attributable to devices, hardware infrastructure, networks, APIs, application code, databases, and more.

The MongoDB drivers include an API that exposes query performance metrics to APM tools. Administrators can monitor time spent on each operation, and identify slow running queries that require further analysis and optimization.

In addition, MongoDB Atlas provides packaged integration with the New Relic platform. Key metrics from MongoDB Atlas are accessible to the APM for visualization, enabling MongoDB health to be monitored and correlated with the rest of the application estate.

*Figure 2: MongoDB integrated into a single view of application performance*

As shown in Figure 2, summary metrics are presented within the APM’s UI. Administrators can also run New Relic Insights for analytics against monitoring data to generate dashboards that provide real-time tracking of Key Performance Indicators (KPIs).

Security

As with all software, MongoDB administrators must consider security and risk exposure for a MongoDB deployment. There are no magic solutions for risk mitigation, and maintaining a secure MongoDB deployment is an ongoing process.

Defense in Depth

A Defense in Depth approach is recommended for securing MongoDB deployments, and it addresses a number of different methods for managing risk and reducing risk exposure.

MongoDB Atlas features extensive capabilities to defend, detect, and control access to MongoDB, offering among the most complete security controls of any modern database:

  • User Rights Management. Control access to sensitive data using industry standard mechanisms for authentication and authorization at the database level
  • Encryption. Protect data in motion over the network and at rest in persistent storage

To ensure a secure system right out of the box, authentication and IP Address whitelisting are automatically enabled.

Review the security section of the MongoDB Atlas documentation to learn more about each of the security features discussed below.

IP Whitelisting

Clients are prevented from accessing the database unless their IP address (or a CIDR covering their IP address) has been added to the IP whitelist for your MongoDB Atlas group.

Authorization

MongoDB Atlas allows administrators to define permissions for a user or application, and what data it can access when querying MongoDB. MongoDB Atlas provides the ability to provision users with roles specific to a database, making it possible to realize a separation of duties between different entities accessing and managing the data.

Additionally, MongoDB's Aggregation Framework Pipeline includes a stage to implement Field-Level Redaction, providing a method to restrict the content of a returned document on a per-field level, based on user permissions. The application must pass the redaction logic to the database on each request. It therefore relies on trusted middleware running in the application to ensure the redaction pipeline stage is appended to any query that requires the redaction logic.
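The trusted-middleware pattern described above might look like the following Python sketch, which appends a $redact stage to whatever aggregation pipeline the application has built. The function name, the tags field, and the access labels are assumptions for illustration; the stage follows the commonly documented pattern of keeping a document level only when its tags overlap the caller's access labels, and it assumes every document and embedded document carries a tags array.

```python
def with_redaction(pipeline, user_access):
    """Return a copy of pipeline with a field-level redaction stage appended."""
    redact_stage = {
        "$redact": {
            "$cond": {
                # Keep this level only if it shares a tag with the user.
                "if": {"$gt": [
                    {"$size": {"$setIntersection": ["$tags", user_access]}},
                    0,
                ]},
                "then": "$$DESCEND",  # keep, then check embedded documents
                "else": "$$PRUNE",    # drop this level and everything below
            }
        }
    }
    return list(pipeline) + [redact_stage]


# Hypothetical application query and access labels for the current user.
app_pipeline = [{"$match": {"last_name": "Green"}}]
secured = with_redaction(app_pipeline, ["HR", "Finance"])
print(secured)
```

The resulting list is what the middleware would pass to the driver's aggregate call in place of the original pipeline.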

Encryption

MongoDB Atlas provides encryption of data in flight over the network and at rest on disk.

Support for SSL/TLS allows clients to connect to MongoDB over an encrypted channel. Clients are defined as any entity capable of connecting to MongoDB Atlas, including:

  • Users and administrators
  • Applications
  • MongoDB tools (e.g., mongodump, mongorestore)
  • Nodes that make up a MongoDB Atlas cluster, such as replica set members and query routers.

Data at rest can optionally be protected using encrypted data volumes.

Wrapping Up

This brings us to the end of our 4-part blog series. As you’ve seen, MongoDB Atlas automates the operational tasks that usually burden the user, freeing you up to focus on what you do best – delivering great applications. There remain some tasks that will keep your application running smoothly and quickly; this blog series has described those best practices. Collectively, they will help you get the most out of the database service.

Remember, if you want to get a head start and learn about all of our recommendations now, just go ahead and download:

Download the MongoDB Atlas Best Practices Guide