Adopting a Serverless Approach at Bazaarvoice with MongoDB Atlas and AWS Lambda
I recently had the pleasure of welcoming Ani Hammond, Senior Staff Software Engineer from Bazaarvoice, to the MongoDB World stage. To a completely packed room, Ani chronicled her team’s journey as they replatformed Bazaarvoice’s Curations service from a runaway monolith architecture to a completely serverless architecture backed by MongoDB Atlas.
Even if you’ve never heard of Bazaarvoice, it’s almost impossible that you’ve never interacted with their services. To use Ani’s own description, “If you're shopping online and you’re reading a review, it's probably powered by us.”
Bazaarvoice strives to connect brands and retailers with consumers through the gathering, curation, and display of user-generated content—anything from pictures on Instagram to an online product review—during a potential customer’s buying journey.
To give you a sense of the scale of this task, Bazaarvoice clocked over a billion total page views between Thanksgiving Day and Cyber Monday in 2017, peaking at around 6,000 page views per second!
One of the technologies behind this herculean task is the Curations platform. To understand how this platform works, let’s look at an example:
An Instagram user posts a cute photo of their child wearing a particular brand’s rain boots. Using Curations, that brand is watching for specific content that mentions their products, so the social collection service picks up that post and shows it to the client team in the Curations application. The post can then be enriched in various manual and automatic ways. For example, a member of the client team can append metadata describing the product contained in the image or automatic rules can filter content for potentially offensive material. The Curations platform then automates the process of securing the original poster’s permission for the client to use their content. Now, this user-generated content is able to be displayed in real time on the brand’s homepage or product pages to potential customers considering similar products.
In a nutshell, this is what Curations does for hundreds of clients and hundreds of thousands of individual content pieces.
The technology behind Curations was previously a monolithic Python/Django-based stack on Amazon EC2 instances on top of a MySQL datastore deployed via RDS.
This platform was effective in allowing Bazaarvoice to scale to hundreds of new clients. However, this architecture did have an Achilles heel: each additional client onboarded to Bazaarvoice’s platform represented an additional Python/Django/MySQL cluster to manage. Not only was this configuration expensive (approximately $60,000/month), the operational overhead generated by each additional cluster made debugging, patching, releases, and general data management an ever-growing challenge. As Ani put it, “Most of our solutions were basically to throw more hardware/money at the problem and have a designated DevOps person to manage these clusters.”
One of the primary factors in selecting MongoDB for the new Curations platform was its support for a variety of different access patterns. For example, the part of the platform responsible for sourcing new social content had to support high write volume whereas the mechanism for displaying the content to consumers is read-intensive with strict availability requirements.
Diving into the specifics of why the Bazaarvoice team opted to move from a MySQL-based stack to one built on MongoDB is a blog post for another day. (Though, if you’d like to see what motivated other teams to do so, I recommend How DevOps, Microservices, and MongoDB are Making HSBC “Simpler, Better, and Faster” and Breuninger delivers omnichannel shopping experience for thousands of daily online users.)
That is to say, the focus of this particular post is the paradigm shift the Curations team made from a linearly-scaling monolith to a completely serverless approach, underpinned by MongoDB Atlas.
The new Curations platform is broken into three distinct services for content collection, enrichment, and display. The collection service is powered by a series of AWS Lambda functions, written in Node.js and triggered by an Amazon Kinesis stream, whereas the enrichment and display services are built on autoscaling AWS Elastic Beanstalk instances. All three services making up the new Curations platform are backed by MongoDB Atlas.
Not only did this approach address the cluster-per-customer challenges of the old system, but the monthly costs were reduced by nearly 90% to approximately $6,500/month. The results are, again, best captured by Ani’s own words:
Massive cost savings, huge performance gains, strong consistency, and a handful of services rather than hundreds of clusters.
MongoDB Atlas was a natural fit in this new serverless paradigm as the team is fully able to focus on developing their product rather than on infrastructure management. In fact, the team had originally opted to manage the MongoDB instances on AWS themselves. After a couple of iterations of manual deployment and management, a desire to gain even more operational efficiency and increased insight into database performance prompted their move to Atlas. According to Ani, the cost of migrating to and leveraging a fully managed service was, "Way cheaper than having dedicated DevOps engineers.” Atlas’ support for direct VPC peering also made the transition to a hosted solution straightforward for the team.
Speaking of DevOps, one of the first operational benefits Ani and her team experienced was the ability to easily optimize their index usage in MongoDB. Previously, their approach to indexing was “build stuff that makes sense at the time and is easy to iterate on.” After getting up and running on Atlas, they were able to use the built-in Performance Advisor to make informed decisions on indexes to add and unused ones to remove. As Ani puts it:
An index killed is as valuable as an index added. This ensures all your indexes to fit into memory and a bad index doesn't push out the good ones.
Ani’s team also used the Atlas Performance Advisor to diagnose and correct inefficient queries. According to her, the built-in tools helped keep the team honest, "[People] say, ‘My database isn't scaling. It's not able to perform complex queries in real time...it doesn't work.’ Fix your code. The hardware is great, the tools are great but they can only carry you so far. I think sometimes we tend to get sloppy with how we write our code because of how cheap and how easy hardware is but we have to write code responsibly too.”
In another incident, a different Atlas feature, the Real Time Performance Panel, was key to identifying an issue with high load times in the display service. Some clients’ displays were taking more than 6 seconds to load. (For context, content delivery network provider Akamai found that a two-second delay in web page load time can cause bounce rates to double!) High-level metrics in Datadog reported query response times of 5+ seconds, while Atlas reported response times of less than 100 ms for the same query. The team used both data points to triangulate and soon realized the discrepancy was a result of the time it took for Lambda to connect to MongoDB for each new operation. Switching from standard Lambda functions to a dockerized service ensured each operation could leverage an open connection rather than initiating a “cold start.”
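The effect of reusing an open connection can be illustrated with a minimal Python sketch. This is not the Curations team's code, which the talk summary does not show; `ConnectionCache`, `fake_connect`, and `handle_request` are hypothetical names, and `fake_connect` is a stand-in for a real driver handshake. The point is only that a long-lived process can connect once and reuse the client, whereas connecting on every operation pays the handshake each time.

```python
from typing import Callable, Optional


class ConnectionCache:
    """Opens a connection on first use and reuses it for later operations."""

    def __init__(self, connect: Callable[[], object]) -> None:
        self._connect = connect
        self._client: Optional[object] = None
        self.connect_calls = 0  # how many times the expensive handshake ran

    def get(self) -> object:
        if self._client is None:
            self._client = self._connect()
            self.connect_calls += 1
        return self._client


def fake_connect() -> object:
    # Stand-in for an expensive driver handshake (assumption: a real service
    # would create a MongoDB client against the Atlas cluster here).
    return object()


def handle_request(cache: ConnectionCache) -> object:
    # Every request asks the cache for a client; only the first one connects.
    return cache.get()


cache = ConnectionCache(fake_connect)
clients = [handle_request(cache) for _ in range(100)]
print(cache.connect_calls)  # 1
```

In this sketch, one hundred simulated requests trigger a single connection, which is the behavior a long-running service gets from holding its client open.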
I know a lot of the cool things that Atlas does can be done by hand but unless this is your full-time job, you're just not going to do it and you’re not going to do it as well.
Before wrapping up her presentation, Ani shared an improvement over the old system that the team wasn’t expecting. Using Atlas, they were able to provide the customer support and services teams read-only views into the database. This afforded them deeper insight into the data and allowed them to perform ad-hoc queries directly. The result was a more proactive approach to issue management, leading to an 80% reduction in inbound support tickets.
By re-architecting their Curations platform, Bazaarvoice is well-positioned to bring on hundreds of new clients without a proportional increase in operations work for the team. But once again, Ani summarized it best:
As the old commercial goes… ‘Old platform: $60,000. New platform: $6,000. Getting to focus all of my time on development: priceless.'
Thank you very much to Ani Hammond and the rest of the Curations team at Bazaarvoice for putting together the presentation that inspired this post. Be sure to check out Ani’s full presentation in addition to dozens of other high-quality talks from MongoDB World on our YouTube channel.
Charting a Course to MongoDB Atlas: Part 1 - Preparing for the Journey
MongoDB Atlas is an automated cloud MongoDB service engineered and run by the same team that builds the database. It incorporates operational best practices we’ve learned from optimizing thousands of deployments across startups and the Fortune 100. You can build on MongoDB Atlas with confidence, knowing you no longer need to worry about database management, setup and configuration, software patching, monitoring, backups, or operating a reliable, distributed database cluster.
MongoDB introduced its Database as a Service offering in July of 2016, and it’s been a phenomenal success since its launch. Since then, thousands of customers have deployed highly secure, highly scalable, and performant MongoDB databases using this service. Among its most compelling features are the ability to deploy Replica Sets in any of the major cloud hosting providers (AWS, Azure, GCP) and the ability to deploy database clusters spanning multiple cloud regions. In this series, I’ll explain the steps you can follow to migrate data from your existing MongoDB database into MongoDB Atlas.
Preparing for the Journey
Before you embark on any journey regardless of the destination, it’s always a good idea to take some time to prepare. As part of this preparation, we’ll review some options for the journey — methods to get your data migrated into MongoDB Atlas — along with some best practices and potential wrong turns to watch out for along the way.
Let’s get a bit more specific about the assumptions I’ve made in this article.
- You have data that you want to host in MongoDB Atlas.
- There’s probably no point in continuing from here if you don’t want to end up with your data in MongoDB Atlas.
- Your data is currently in a MongoDB database.
- If you have data in some other format, all is not lost; we can help. However, this series addresses a MongoDB-to-MongoDB migration. If you have other requirements, such as data in another database or another format, let me know if you’d like an article covering migration from some other database to MongoDB and I’ll make that the subject of a future series.
- Your current MongoDB database is running MongoDB version 3.0 or greater. MongoDB Atlas supports versions 3.4 and 3.6, so you’ll need to upgrade your database either as part of the migration or ahead of it. We have articles and documentation designed to help you upgrade your MongoDB instances should you need them.
- Your data is in a clustered deployment (Sharded or Replica Set). We’ll cover converting a standalone deployment to a replica set in part 3 of this series.
At a high level, there are 4 basic steps to migrating your data. Let’s take a closer look at the journey:
- Deploy a Destination Cluster in MongoDB Atlas
- Prepare for the Journey
- Migrate the databases
- Cutover and Modify Your Applications to use the new MongoDB Atlas-based Deployment
As we approach the journey, it's important to know the various routes from your starting point to your eventual destination. Each route has its own considerations and benefits, and the choice of route is ultimately up to you. The following table lists the available data migration methods.
|Method|Description|Downtime|Source version|Destination version|
|---|---|---|---|---|
|Live Import|Fully automated via the Atlas administrative console.|Minimal (cutover only)|2.6, 3.0, 3.2, 3.4|3.4, 3.6|
|mongomirror|mongomirror is a utility for migrating data from an existing MongoDB replica set to a MongoDB Atlas replica set. mongomirror does not require you to shut down your existing replica set or applications.|Minimal (cutover only)|2.6 or greater|3.4, 3.6|
|mongorestore|mongorestore is a command-line utility program that loads data from either a binary database dump created by mongodump or the standard input.|Required|2.6 or greater|3.4, 3.6|
|mongoimport|mongoimport imports content from an Extended JSON, CSV, or TSV export created by mongoexport or, potentially, another third-party export tool.|Required|||
For a majority of deployments, Live Import is the best, most efficient route to get your data into MongoDB Atlas. It offers the ability to keep your existing cluster up and active (but not too active, see considerations.) There are considerations, however. If you’re not located in a region that is geographically close to the US-EAST AWS datacenter, for example, you may encounter unacceptable latency. There are a number of possible concerns you should consider prior to embarking on your migration journey. The following section offers some helpful route guidance to ensure that you’re going in the right direction and moving steadily toward your destination.
Route Guidance for the Migration Journey
If you've made it this far, you’re likely getting ready to embark on a journey that will bring your data into a robust, secure, and scalable environment within MongoDB Atlas. The potential to encounter challenges along the way is real and the likelihood of encountering difficulties depends primarily upon your starting point in that journey. In this section, I’ll discuss some potential issues you may encounter as you prepare for your migration journey. A summary of the potential detours and guidance for each is presented in the following table.
Follow the links in the table to read more about each potential detour and its relevant guidance:
The amount of RAM a deployment will require is largely informed by the applications’ demand for data in the database. To approximate RAM requirements, we typically look at the frequently accessed documents in each collection, add up their total data size, and then add the total size of the required indexes. Referred to as the working set, this is typically the approximate amount of RAM we’ll want our deployment to have. A more complete discussion of sizing can be found in the documentation pages on sizing for MongoDB.
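As a rough illustration of the working set arithmetic, here is a minimal Python sketch. The collection names and byte counts are invented for the example, and `CollectionProfile` and `working_set_bytes` are hypothetical helpers rather than part of any MongoDB tooling. In practice you would take the sizes from your own deployment's statistics.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class CollectionProfile:
    name: str
    hot_data_bytes: int  # size of the frequently accessed documents
    index_bytes: int     # total size of the indexes the workload needs


def working_set_bytes(collections: Iterable[CollectionProfile]) -> int:
    """Sum hot data and required index sizes across all collections."""
    return sum(c.hot_data_bytes + c.index_bytes for c in collections)


GIB = 1024 ** 3

# Illustrative numbers only (assumption): two collections with made-up sizes.
profiles = [
    CollectionProfile("orders", hot_data_bytes=6 * GIB, index_bytes=2 * GIB),
    CollectionProfile("customers", hot_data_bytes=1 * GIB, index_bytes=GIB // 2),
]

estimate = working_set_bytes(profiles)
print(f"{estimate / GIB:.1f} GiB")  # 9.5 GiB
```

The result is only a starting point for choosing a cluster size, since the migration itself needs headroom on top of this figure.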
Sizing is a tricky task, especially for the cost constrained. We obviously don’t want to waste money by provisioning servers larger than those we’ll need to support the profile of our users and applications. However, during our migration we need to account not only for the application requirements but also for the resources required by the migration process itself. Therefore, you will want to ensure that you surpass the requirements for your production implementation when sizing your destination cluster.
The destination cluster should provide adequate resources (storage, CPU, and memory) with room to spare. The migration process requires additional CPU and memory while the destination database is being built from the source. It’s quite common for destination clusters to be undersized, and as a result the migration process fails. If this happens during a migration, you must empty the destination cluster and resize it to a larger M-value to increase the amount of available RAM. A great feature of Atlas is that resizing, in either direction, is extremely easy to do. Whether you’re adding resources (increasing the amount of RAM, disk, CPU, shards, etc.) or removing them, the process is very simple. Increasing the resources available on your target environment is therefore painless, and once the migration completes you can simply scale back down to a cluster size with less RAM and CPU.
Latency is defined as the amount of time it takes for a packet of data to get from one designated point to another. Because the migration process is all about moving packets of data between servers it is by its very nature latency sensitive.
Migrating data into MongoDB Atlas using the Live Import capability involves connecting your source MongoDB instance to a set of application servers running in the AWS us-east-1 region. These servers act as the conductors running the actual migration process between your source and destination MongoDB database servers. A potential detour can crop up when your source MongoDB database deployment exists in a datacenter located far from the AWS us-east-1 region.
In order to accomplish a migration using Live Import, Atlas streams data through a set of MongoDB-Controller application servers. Atlas provides the IP Address ranges of the MongoDB Live Import servers during the Live Import process. You must be certain to add these IP Address ranges to the IP Whitelist for your Destination cluster.
The migration processes within Atlas run on a set of application servers, which act as the traffic directors. The following is a list of the IP addresses on which these application servers depend. It is important to ensure that traffic can flow freely between these servers, your source cluster, and the destination cluster. These addresses are in CIDR notation.
An additional area where a detour may be encountered is in the realm of corporate firewall policy.
To avoid these potential detours, ensure that you have the appropriate connectivity from the networks where your source deployment resides to the networks where MongoDB Atlas exists.
These IP ranges will be provided at the start of the migration process. Ensure that you configure the whitelist to enable appropriate access during the migration.
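Before starting, it can help to check that every range you were given is actually covered by your whitelist. The following Python sketch does this with the standard `ipaddress` module. The function name `missing_from_whitelist` is hypothetical, and the CIDR ranges are reserved documentation addresses used as placeholders, not the real Live Import ranges, which Atlas provides during the process.

```python
import ipaddress
from typing import Iterable, List


def missing_from_whitelist(required: Iterable[str], whitelist: Iterable[str]) -> List[str]:
    """Return the required CIDR ranges not fully covered by any whitelist entry."""
    allowed = [ipaddress.ip_network(entry) for entry in whitelist]
    missing = []
    for cidr in required:
        network = ipaddress.ip_network(cidr)
        covered = any(
            # subnet_of only compares networks of the same IP version.
            network.version == entry.version and network.subnet_of(entry)
            for entry in allowed
        )
        if not covered:
            missing.append(cidr)
    return missing


# Placeholder ranges (assumption): substitute the ranges Atlas shows you.
required_ranges = ["192.0.2.0/28", "198.51.100.16/29"]
current_whitelist = ["192.0.2.0/24"]

print(missing_from_whitelist(required_ranges, current_whitelist))
# ['198.51.100.16/29']
```

An empty result means every required range is contained in at least one whitelist entry.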
Every deployment of MongoDB should enforce authentication. This will ensure that only appropriate individuals and applications may access your MongoDB data.
A potential detour may arise when you attempt to Live Migrate a database without creating or providing the appropriately privileged user credentials.
If the source cluster enforces authentication, create a user with the following privileges:
- Read all databases and collections (i.e. readAnyDatabase on the admin database)
- Read the oplog.
Create a SCRAM user and password on each server in the replica set and ensure that this user belongs to roles that have the following permissions:
- Read and write to the config database.
- Read all databases and collections.
- Read the oplog.
- For 3.4+ source clusters, a user with both clusterMonitor and backup roles would have the appropriate privileges.
- For 3.2 source clusters, a user with clusterMonitor, clusterManager, and backup roles would have appropriate privileges.
Specify the username and password to Atlas when prompted by the Live Migration procedure.
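The role guidance above can be sketched as a small Python helper that builds a `createUser` command document for the source cluster. The function name `migration_user_command` and the sample credentials are hypothetical, and the sketch only covers the two version cases the guidance names. It assumes the roles are granted on the `admin` database, where MongoDB's built-in cluster roles live.

```python
from typing import Dict, List


def migration_user_command(username: str, password: str, source_version: str) -> Dict[str, object]:
    """Build a createUser command document for a Live Import source user."""
    major, minor = (int(part) for part in source_version.split(".")[:2])
    if (major, minor) >= (3, 4):
        roles: List[str] = ["clusterMonitor", "backup"]
    elif (major, minor) == (3, 2):
        roles = ["clusterMonitor", "clusterManager", "backup"]
    else:
        # The guidance above only covers 3.2 and 3.4+ source clusters.
        raise ValueError(f"no role guidance for source version {source_version}")
    return {
        "createUser": username,  # the command name must be the first key
        "pwd": password,
        "roles": [{"role": role, "db": "admin"} for role in roles],
    }


# Sample values only (assumption): choose your own user name and password.
command = migration_user_command("liveImportUser", "change-me", "3.2")
print([r["role"] for r in command["roles"]])
# ['clusterMonitor', 'clusterManager', 'backup']
```

A driver can then send this document to the `admin` database of the source cluster; with PyMongo, for example, a command document like this one would typically be passed to the database's `command` method.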
Also, once you’ve migrated your data, if the source cluster enforced authentication you must consider that Atlas does not migrate any user or role data to the destination cluster. Therefore, you must re-create the credentials used by your applications on the destination Atlas cluster. Atlas uses SCRAM for user authentication. See Add MongoDB Users for a tutorial on creating MongoDB users in Atlas.
The oplog, or operations log, is a capped collection that keeps a rolling record of all operations that modify the data stored in your databases. When you create an Atlas cluster to serve as the destination for your migration, Atlas by default sizes the oplog at 5% of the total disk you allocated for the cluster. If the activity profile of your application requires a larger oplog, you will need to submit a proactive support ticket to have the oplog size increased on your destination cluster.
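To get a feel for whether the default is enough, a back-of-the-envelope calculation can help. The Python sketch below applies the 5% default described above and then estimates how many hours of operations the oplog would hold at a steady write rate. The function names are hypothetical, the disk size and write rate are invented, and the steady-rate model is a simplifying assumption; real workloads are burstier.

```python
def default_oplog_gb(disk_gb: float, fraction: float = 0.05) -> float:
    """Oplog size when it is created at a fixed fraction of allocated disk."""
    return disk_gb * fraction


def oplog_window_hours(oplog_gb: float, oplog_gb_per_hour: float) -> float:
    """Approximate hours of history the oplog holds at a steady write rate."""
    if oplog_gb_per_hour <= 0:
        raise ValueError("oplog_gb_per_hour must be positive")
    return oplog_gb / oplog_gb_per_hour


# Illustrative numbers only (assumption): 200 GB of disk and a workload
# that generates about 2.5 GB of oplog entries per hour.
size_gb = default_oplog_gb(disk_gb=200)
window = oplog_window_hours(size_gb, oplog_gb_per_hour=2.5)
print(f"oplog: {size_gb:.1f} GB, window: {window:.1f} hours")
```

If the estimated window is shorter than the time the initial sync of your migration is likely to take, that is a signal to request a larger oplog before you begin.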
Route Guidance: Size the Operations Log (Oplog) Appropriately - Submit a Proactive Support Ticket if Oplog Resize is Needed.
As stated previously, the decisions regarding the resources we apply to a given MongoDB Deployment are informed by the profile of the applications that depend on the database. As such, there are certain application read/write profiles or workloads that require a larger than default operations log. These are listed in detail in the documentation pages on the subject of Replica Set Oplog. Here is a summary of the workloads that typically require a larger than normal Oplog:
Updates to Multiple Documents at Once
The oplog must translate multi-updates into individual operations in order to maintain idempotency. This can use a great deal of oplog space without a corresponding increase in data size or disk use.
Deletions Equal the Same Amount of Data as Inserts
If you delete roughly the same amount of data as you insert, the database will not grow significantly in disk use, but the size of the operation log can be quite large.
Significant Number of In-Place Updates
If a significant portion of the workload is updates that do not increase the size of the documents, the database records a large number of operations but does not change the quantity of data on disk.
Regardless of your starting point, MongoDB provides a robust, secure and scalable destination for your data. MongoDB Atlas Live Import automates and simplifies the process of migrating your data to MongoDB Atlas. The command line version of this utility, called mongomirror, gives users additional control and flexibility around how the data gets migrated. Other options include exporting (mongoexport) and importing (mongoimport) your data manually or even writing your own application to accomplish migration. The decision to use one particular method over another depends upon the size of your database, its geographic location as well as your tolerance for application downtime.
If you choose to leverage MongoDB Atlas Live Import, be aware of the following potential challenges along the journey.
- Provision enough RAM during the migration to cover both application and migration requirements.
- Reduce latency if possible or use mongomirror instead of Live Import.
- Whitelist the IP Ranges of the MongoDB Live Import Process
- Ensure Appropriate User Access Permissions on the Source Deployment
- Size the Operations Log (Oplog) Appropriately - Submit a Proactive Support Ticket if Oplog Resize is Needed.
Now that you’re fully prepared, let’s embark on the journey and I’ll guide you through the process of deploying a cluster in MongoDB Atlas and walk you through migrating your data from an AWS Replica Set.
High-end retailer in Germany delivers omni-channel shopping experience on MongoDB Atlas for thousands of daily online users
The importance of delivering an optimized customer experience cannot be overstated, especially if your business is high-end retail. For Breuninger, the customer-first approach has been in their DNA for more than 130 years.
When the top German retailer set out to build a new e-commerce platform, they wanted the online experience to match that of walking into one of Breuninger’s premium department stores. Accomplishing this goal required a feature-rich, high-performance, and reliable database capable of supporting complex data sets across multiple categories.
“Today, our development teams have a lot of independence. We only have a handful of rules about how they design and build applications within their respective business units,” says Benedikt Stemmildt, Lead Software Architect of E. Breuninger GmbH & Co. “It’s not quite a rule that you have to use MongoDB, but you do have to explain yourself if you don’t.”
However, it wasn’t always this way. Breuninger’s previous platform was built on one of the industry-standard product content management (PCM) platforms, which Stemmildt felt was “monolithic and difficult to code for.” Code freezes were common and the underlying architecture was a frequent cause of frustration for an organization striving to adopt more agile processes.
A new development and feature roll-out approach was needed to execute the company’s aggressive omni-channel integration plans, and time to market for new online features became a top priority. Breuninger decided to build a technology group in response, going from 10 to 30 in-house developers in just a year.
“We broke down our monolithic architecture and split our application into separate microservices that reflect how our customers shop in the physical stores,” Stemmildt says. “It’s the customer journey — they search, discover, evaluate, and buy not just individual products, but complete outfits.”
“To reflect this architectural change, we split our development teams by different steps of the customer journey and kept dependencies to an absolute minimum,” Stemmildt continues. “One key to making this work is a high-performance database capable of working easily with data in lots of different ways. The document model of MongoDB means we can deliver data with the quality and detail that reflects our products and shopping experience.”
The result? Much faster time to market. Breuninger was able to build their omni-channel platform in months rather than years by enabling teams to decide on important architectural components for their own sections, without having to ask the permission of other teams.
As a seven-year veteran of MongoDB, Stemmildt was confident in recommending the database to his organization. “There are a lot of good databases,” he says. “However, many of them require developers to have a deep knowledge about how they work before getting any benefit. MongoDB is not like that. It’s very quick to learn and start getting results. Our teams are able to deliver features straight away. Once users do expand their use of the database, it’s so feature-rich that you never get a sense of having to push it beyond what it was designed for.”
And agile wouldn’t be agile without automation. “Everything we deploy is automated, and with MongoDB Atlas on AWS, the deployment and management of our databases fit neatly into our processes. After a period of operating MongoDB ourselves on EC2, it’s great not having to worry about the details and not having to spend time setting up, configuring, and managing database[s]. You free up a lot of opportunities to add value to your service by not running things yourself.”
AWS offers a healthy mix of other tools for the teams at Breuninger to leverage, such as a managed Kubernetes service and serverless Lambda functions. MongoDB Atlas and AWS also help Breuninger stay on the right side of the regulators. “We need to comply with GDPR so we keep everything running within our borders. MongoDB Atlas’s built-in security features have helped us satisfy these requirements.”
The finished platform might look different to someone who is used to traditional architectures, but to Stemmildt, not being restrained by legacy approaches makes a lot of sense. “Each of our teams owns one or more sections of the customer journey. The search team updates its own database, pulling data in from the product data producer via a feed and re-populating its own database as needed. We don’t have to ripple refreshes out across the system as they happen. That means each team is free to add new features without changing some core database component and affecting other teams. Self-contained systems are an important design rule.”
And although there are some 25 different and largely independent systems, the customers see just one website. A front-end proxy uses server-side includes to marshal data as required from a mix of micro-frontends before delivering the final composite to the shopper. Product data, product availability, outfit data, price information, navigation metadata — these are all woven together from separate MongoDB databases as the customer goes through the shopping experience online.
Comparing a microservices architecture to a monolithic one revealed to Breuninger that some metrics don’t matter as much as they once did, while others matter more. “With multiple teams developing things so rapidly, I don’t know exactly how much total data is in play. But we are a very metrics-driven company, not just in the technical infrastructure but across the business. We know when a component is and is not working well from both a technical and business perspective, if it needs optimizing for performance, or whether it is delivering value to the business or we need to revisit that aspect of the system architecture.”
While Stemmildt couldn’t comment too much on future plans, he’s enthusiastic about MongoDB’s part in whatever they may be. “We wanted high performance, but most importantly we wanted to be able to add more features. We’re not using MongoDB’s graph database feature yet, but we may be by the end of the year. There are a lot of things we could do with text search, too.”
Other new features — such as multi-document transaction support in MongoDB 4.0 — may also be useful, but in unorthodox ways. “I don’t actually think transactions are needed anymore for our platform,” he laughs, “But there are some teams, like the customer data team, who don’t agree with me yet and won’t use MongoDB because of that. So the release of MongoDB 4.0 will help me to help them make the transition.”
While customers won’t see the nuts and bolts of Breuninger’s transformation to a data-driven enterprise, they will benefit from the company’s newly integrated omni-channel platform, which delivers an improved customer experience and more ways to get inspired.
And to anyone thinking about using MongoDB on their next project, Stemmildt has just one piece of advice: “Use it. Get a MongoDB Atlas account, create a cluster, and play with it. The way we see it, after the majority of our teams have naturally adopted MongoDB, if you can’t say why you should use another database, then you should just use MongoDB.”
New to MongoDB Atlas — Global Clusters Enable Low-Latency Reads and Writes from Anywhere
The ability to replicate data across any number of cloud regions was introduced to MongoDB Atlas, the fully managed service for the database, last fall. This granted Atlas customers two key benefits. For those with geographically distributed applications, this functionality allowed them to leverage local replicas of their data to reduce read latency and provide a fast, responsive customer experience on a global scale. It also meant that an Atlas cluster could be easily configured to failover to another region during cloud infrastructure outages, providing customers with the ability to provision multi-region fault tolerance in just a few clicks.
But what about improving write latency and addressing increasingly demanding regulations, many of which have data residency requirements? In the past, users could address these challenges in a couple of ways. If they wanted to continue using a fully managed MongoDB service, they could deploy separate databases in each region. Unfortunately, this often resulted in added operational and application complexity. They could also build and manage a geographically distributed database deployment themselves and satisfy these requirements using MongoDB’s zone sharding capabilities.
Today we’re excited to introduce Global Clusters to MongoDB Atlas. This new feature makes it possible for anyone to effortlessly deploy and manage a single database that addresses all the aforementioned requirements. Global Clusters allow organizations with distributed applications to geographically partition a fully managed deployment in a few clicks, and control the distribution and placement of their data with sophisticated policies that can be easily generated and changed.
Improving app performance by reducing read and write latency
With Global Clusters, geographically distributed applications can write to (and of course, read from) local partitions of an Atlas deployment called zones. This new Global Writes capability allows you to associate and direct data to a specific zone, keeping it in close proximity to nearby application instances and end users. In its simplest configuration, an Atlas zone contains a 3-node replica set distributed across the availability zones of its preferred cloud region. This configuration can be adjusted depending on your requirements. For example, you can turn the 3-node replica set into multiple shards to address increases in local write throughput. You can also distribute the secondaries within a zone into other cloud regions to enable fast, responsive read access to that data from anywhere.
The illustration above represents a simple Global Cluster in Atlas with two zones. For simplicity’s sake, we’ve labeled them blue and red. The blue zone uses a cloud region in Virginia as the preferred region, while the red zone uses one in London. Local application instances will write to and read from the MongoDB primaries located in the respective cloud regions, ensuring low latency read and write access. Each zone also features a read-only replica of its data located in the cloud region of the other one. This ensures that users in North America will have fast, responsive read access to data generated in Europe, and vice versa.
Satisfying data residency for regulatory requirements
By allowing developers to easily direct the movement of data at the document level, Global Clusters provide a foundational building block that helps organizations achieve compliance with regulations containing data residency requirements. Data is associated with a zone and pinned to that zone unless otherwise configured.
The illustration below represents an Atlas Global Cluster with 3 zones: blue, red, and orange. The configuration of the blue and red zones is very similar to what we already covered. Local application instances read and write to nearby primaries located in the preferred regions (Virginia and London), and each zone includes a read-only replica in the preferred cloud region of every other zone for serving fast, global reads. What’s different is the orange zone, which serves Germany. Unlike data generated in North America and the UK, data generated in and around Germany is not replicated globally; instead, it remains pinned to the preferred cloud region located in Frankfurt.
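To make the pinning idea concrete, here is a minimal sketch that models the three zones as plain Python objects and checks which regions hold a copy of each zone's data. The Zone class, the is_pinned helper, and the region labels are illustrative assumptions and not part of any Atlas API.

```python
from dataclasses import dataclass, field


@dataclass
class Zone:
    """Illustrative model of an Atlas zone; not an Atlas API object."""
    name: str
    preferred_region: str
    read_only_regions: list = field(default_factory=list)

    def regions_holding_data(self):
        # Data lives in the preferred region plus any read-only replica regions.
        return {self.preferred_region, *self.read_only_regions}


def is_pinned(zone, allowed_regions):
    """Return True if every copy of the zone's data stays inside allowed_regions."""
    return zone.regions_holding_data() <= set(allowed_regions)


# The three zones from the example above; region labels are descriptive only.
zones = [
    Zone("blue", "Virginia", ["London", "Frankfurt"]),
    Zone("red", "London", ["Virginia", "Frankfurt"]),
    Zone("orange", "Frankfurt"),  # no read-only replicas outside Frankfurt
]

for zone in zones:
    print(zone.name, sorted(zone.regions_holding_data()), is_pinned(zone, ["Frankfurt"]))
```

Only the orange zone passes the check against Frankfurt, which mirrors the behavior described for data generated in and around Germany.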
Deploying your first Global Cluster
Now let’s walk through how easy it is to set up a Global Cluster with MongoDB Atlas.
In the Atlas UI, when you go to create a cluster, you’ll notice a new accordion labelled Global Cluster Configuration. If you click into this and enable “Global Writes”, you’ll find two easy-to-use and customizable templates. Global Performance provides reasonable read and write latency to the majority of the global population and Excellent Global Performance provides excellent read and write latency to the majority of the global population. Both options are available across AWS, Google Cloud Platform, and Microsoft Azure.
You can also configure your own zones. Let’s walk through the setup of a Global Cluster using the Global Performance template on AWS. After selecting the Global Performance template, you’ll see that the Americas are mapped to the North Virginia region, EMEA is mapped to Frankfurt, and APAC is mapped to Singapore.
As your business requirements change over time, you are able to switch to the Excellent Global Performance template or fully customize your existing template.
Customizing your Global Cluster
Say you wanted to move your EMEA zone from Frankfurt to London. You can do so in just a few clicks. If you scroll down in the Create Cluster Dialog, you’ll see the Zone configuration component (pictured below). Select the zone you want to edit and simply update the preferred cloud region.
Once you’re happy with the configuration, you can verify your changes in the latency map and then proceed to deploy the cluster.
After your Global Cluster has been deployed, you’ll find that it looks just like any other Atlas cluster. Click into the connect experience and you’ll see a simple, concise connection string that you can use in all of your geographically distributed application instances.
Configuring data for a Global Cluster
Now that your Global Cluster is deployed, let's have a look at the Atlas Data Explorer, where you can create a new database and collection. Atlas will walk you through this process, including the creation of an appropriate compound shard key — the mechanism used to determine how documents are mapped to different zones.
This shard key must contain the location field. The second field should be a well-distributed identifier, such as userId. Full details on key selection can be found in the MongoDB Atlas docs.
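As a rough illustration of that requirement, the sketch below writes the compound key in MongoDB's key-pattern notation and checks that a document carries both fields before it is inserted. The helper name and the sample documents are hypothetical, and Atlas creates the actual shard key for you in the Data Explorer.

```python
# Compound shard key described above: the location field first, then a
# well-distributed identifier. The dict mirrors MongoDB's key-pattern notation.
SHARD_KEY = {"location": 1, "userId": 1}


def missing_shard_key_fields(document, shard_key=SHARD_KEY):
    """Return the shard key fields that are absent from the document."""
    return [name for name in shard_key if name not in document]


# Hypothetical sample documents.
doc_ok = {"location": "US", "userId": 1001, "name": "Example User"}
doc_bad = {"userId": 1002, "name": "No Location"}

print(missing_shard_key_fields(doc_ok))
print(missing_shard_key_fields(doc_bad))
```

A client-side check like this is optional, but it catches documents that cannot be associated with a zone before they reach the cluster.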
To help show what documents might look like in your database, we’ve added a few sample documents to a collection in the Data Explorer. As you can see above, we’ve included a field called location containing an ISO 3166-1 alpha-2 country code ("US", "DE", "IN") or a supported ISO 3166-2 subdivision code ("US-DC", "DE-BE", "IN-DL"), as well as a field called userId, which acts as our well-distributed identifier. This ensures that location affinity is baked into each document.
In the background, MongoDB Atlas will have automatically placed each of these documents in their respective zones. The document corresponding to Anna Bell will live in North Virginia and the document corresponding to John Doe will live in Singapore. Assuming we have application instances deployed in Singapore and North Virginia, both will use the same MongoDB connection string to connect to the cluster. When Anna Bell connects to our application from the US, she will automatically be working with data kept in close proximity to her. Similarly, when John Doe connects from Australia, he will be writing to the Singapore region.
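The placement described above can be pictured as a lookup from a document's location value to a zone. The sketch below is a simplified stand-in: the COUNTRY_TO_ZONE table is a hypothetical, abbreviated mapping loosely based on the Global Performance template, and subdivision codes are assumed to follow their country. Atlas maintains the real mapping and performs the routing itself.

```python
# Hypothetical, abbreviated country-to-zone table loosely following the
# Global Performance template on AWS (Americas, EMEA, APAC). The real
# mapping is maintained by Atlas and covers far more codes.
COUNTRY_TO_ZONE = {
    "US": "N. Virginia",
    "DE": "Frankfurt",
    "IN": "Singapore",
    "AU": "Singapore",
}


def zone_for_location(location, country_to_zone=COUNTRY_TO_ZONE):
    """Resolve an ISO 3166-1 alpha-2 or ISO 3166-2 code to a zone name."""
    # An ISO 3166-2 subdivision code such as "US-DC" starts with its country code.
    country = location.split("-", 1)[0].upper()
    try:
        return country_to_zone[country]
    except KeyError:
        raise ValueError(f"no zone configured for location {location!r}") from None


print(zone_for_location("US"))     # Anna Bell's document
print(zone_for_location("AU"))     # John Doe's document
print(zone_for_location("DE-BE"))  # a subdivision code
```

Because the lookup depends only on the document's location field, every application instance can share the same connection string and still have its writes land in the nearby zone.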
Adding a zone to your Global Cluster
Now let’s say that you start to see massive adoption of your application in India and you want to improve the performance for local users. At any time, you can return to your cluster configuration, click “Add a Zone”, and select Mumbai as the preferred cloud region for the new zone.
The global latency map will update, showing us the new zone and an updated view of the countries that map to it. When we deploy the changes, the documents that are tagged with relevant ISO country codes will gracefully be transferred across to the new zone, without downtime.
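To see which documents such a change would affect, here is a small sketch that compares a mapping before and after a Mumbai zone is added. Both mappings and the sample documents are made up for illustration, and Atlas performs the actual migration in the background.

```python
def country_of(location):
    """Country part of an ISO 3166-1 alpha-2 or ISO 3166-2 code."""
    return location.split("-", 1)[0]


def documents_to_move(documents, old_mapping, new_mapping):
    """Return (userId, old_zone, new_zone) for documents whose zone changes."""
    moves = []
    for doc in documents:
        country = country_of(doc["location"])
        old_zone, new_zone = old_mapping[country], new_mapping[country]
        if old_zone != new_zone:
            moves.append((doc["userId"], old_zone, new_zone))
    return moves


# Hypothetical mappings before and after adding a Mumbai zone.
before = {"US": "N. Virginia", "AU": "Singapore", "IN": "Singapore"}
after = {**before, "IN": "Mumbai"}

docs = [
    {"userId": 1, "location": "US"},
    {"userId": 2, "location": "IN-DL"},
    {"userId": 3, "location": "AU"},
]

print(documents_to_move(docs, before, after))  # [(2, 'Singapore', 'Mumbai')]
```

Only the document tagged with an Indian location code changes zones; documents for other countries stay where they are.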
Scaling write throughput in a single zone
As we mentioned earlier in this post, it’s possible to scale out a single zone to address increases in local write throughput. Simply scroll to the “Zone Configuration”, click on “Additional Options” and increase the number of shards. By adding a second shard to a zone, you can roughly double its write throughput.
Low-latency reads of data originating from other zones
We also referenced the ability to distribute read-only replicas of data from a zone into the preferred cloud regions of other zones, providing users with low-latency read access to data originating from other regions. This is easy to configure in MongoDB Atlas. In “Zone Configuration”, select “Add secondary and read-only regions”. Under “Deploy read-only replicas”, select “Add a node” and choose the region where you’d like your read-only replica to live.
For global clusters, Atlas provides a shortcut to creating read-only replicas of each zone in every other zone. Under “Zone configuration summary”, simply select the “Configure local reads in every zone” button.
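Conceptually, that shortcut gives every zone a read-only replica in each other zone's preferred region. A minimal sketch of the resulting layout, using a hypothetical configure_local_reads helper and the zone names from the Global Performance template:

```python
def configure_local_reads(preferred_regions):
    """Map each zone to the other zones' preferred regions as read-only targets.

    preferred_regions: dict of zone name -> preferred cloud region.
    """
    return {
        zone: [region for other, region in preferred_regions.items() if other != zone]
        for zone in preferred_regions
    }


zones = {"Americas": "N. Virginia", "EMEA": "Frankfurt", "APAC": "Singapore"}

for zone, replicas in configure_local_reads(zones).items():
    print(zone, "->", replicas)
```

With three zones, each zone ends up with two read-only regions, so the number of read-only replicas grows with the square of the zone count. That is worth keeping in mind when sizing a cluster with many zones.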
MongoDB Atlas Global Clusters are very powerful, making it possible for practically any developer or organization to easily deploy, manage, and scale a distributed database layer optimized for low-latency reads and writes anywhere in the world. We're very excited to see what you build with this new functionality.
Global clusters are available today on Amazon Web Services, Google Cloud Platform, and Microsoft Azure for clusters M30 and larger.
Introducing Free Cloud Monitoring for MongoDB
With the release of MongoDB 4.0, we’re excited to announce the availability of free cloud monitoring, the easiest way to monitor and visualize the status of your MongoDB deployments.
Let’s walk through how it works.
After you’ve installed MongoDB 4.0, connect to your instance(s) using the mongo shell. The startup output includes a notice like this:
MongoDB shell version v4.0.0
connecting to: mongodb://127.0.0.1:27017
MongoDB server version: 4.0.0
Enable MongoDB's free cloud-based monitoring service to collect and display
metrics about your deployment (disk utilization, CPU, operation statistics,
etc).
The monitoring data will be available on a MongoDB website with a unique
URL created for you. Anyone you share the URL with will also be able to
view this page. MongoDB may use this information to make product
improvements and to suggest MongoDB products and deployment options to you.
To enable free monitoring, run the following command:
db.enableFreeMonitoring()
When you run the command, you should see something similar to what’s shown below.
{
"state" : "enabled",
"message" : "To see your monitoring data, navigate to the unique URL below. Anyone you share the URL with will also be able to view this page. You can disable monitoring at any time by running db.disableFreeMonitoring().",
"url" : "https://cloud.mongodb.com/freemonitoring/cluster/22E5ZH35UZ77JY3UHS3VYYTI7BKBIHWF",
"userReminder" : "",
"ok" : 1
}
Simply copy and paste your unique URL into a browser to access your monitoring dashboard. Free cloud monitoring tracks key performance indicators such as operation execution times, disk utilization, memory, network input/output, and more in interactive charts.
Mousing over chart lines reveals precise metrics.
You can also zoom in to 1-minute granularity.
Free cloud monitoring supports standalone instances and replica sets of MongoDB 4.0+. Of course, only monitoring metadata is accessed, never the contents of your databases. You can disable monitoring and your unique URL at any time by running the db.disableFreeMonitoring() command.
For more information, visit our documentation.
New to MongoDB Atlas — Free Fully Managed Databases on Google Cloud Platform
Today we’re excited to announce that the MongoDB Atlas free tier — which provides access to a fully managed M0 cluster with 512 MB of storage at no cost — is now available on Google Cloud Platform (GCP).
We launched the MongoDB Atlas database as a service on GCP one year ago at MongoDB World 2017. Since then, we’ve made significant product enhancements, culminating in the most powerful MongoDB service for developers building their applications on Google’s expanding ecosystem of cloud services. For example, we launched Atlas into 13 Google cloud regions earlier this year, and added the ability for customers to replicate their data to any number of regions for fast, responsive read access and multi-region fault tolerance.
Companies like Longbow Advantage, a supply chain partner to Del Monte Foods and Subaru of America, are using MongoDB Atlas on GCP to accelerate innovation and stay agile and efficient in their development life cycles.
“We wanted to ensure that our team could remain focused on the application and not have to worry about the underlying infrastructure. Atlas allowed us to do just that.”
Alex Wakefield, Chief Commercial Officer, Longbow Advantage
The availability of the Atlas free tier on GCP will make it easier than ever for developers using the cloud platform to experiment in an optimized environment for MongoDB, with no barrier to entry. The M0 cluster is ideal for learning MongoDB, prototyping, or early development, and has built-in security, availability, and fully managed upgrades.
The Atlas free tier on GCP is available in 3 regions:
- Iowa (us-central1)
- Belgium (europe-west1)
- Singapore (asia-southeast1)
MongoDB Atlas is available in 13 GCP regions
Getting started is simple. When building a new cluster in MongoDB Atlas, select GCP as the cloud provider and then select the region closest to your application server(s) with the "Free tier available" label.
Then, in the Cluster Tier tab, select the M0 cluster size.
Finally, name your cluster.
That's it. We're excited to see what you build with MongoDB Atlas and GCP!
New to MongoDB Atlas — Fully Managed Connector for Business Intelligence
Driven by emerging requirements for self-service analytics, faster discovery, predictions based on real-time operational data, and the need to integrate rich and streaming data sets, business intelligence (BI) and analytics platforms are among the fastest-growing software markets.
Today, it’s easier than ever for MongoDB Atlas customers to make use of the MongoDB Connector for BI. The new BI Connector for Atlas is a fully managed, turnkey service that allows you to use your automated cloud databases as data sources for popular SQL-based BI platforms, giving you faster time to insight on rich, multi-structured data.
The BI Connector for Atlas removes the need for additional BI middleware and custom ETL jobs, and relies on the underlying Atlas platform to automate potentially time-consuming administration tasks such as setup, authentication, maintaining availability, and ongoing management.
Customers can use the BI Connector for Atlas along with the recently released MongoDB ODBC Driver to provide a SQL interface to fully managed MongoDB databases. This allows data scientists and business analysts responsible for analytics and business reporting on MongoDB data to easily connect to and use popular visualization and dashboarding tools such as Excel, Tableau, MicroStrategy, Microsoft Power BI, and Qlik.
When deploying the BI Connector, Atlas designates a secondary in your managed cluster as the data source for analysis, minimizing the likelihood an analytical workload could impact performance on your operational data store. The BI Connector for Atlas also utilizes MongoDB’s aggregation pipeline to push more work to the database and reduce the amount of data that needs to be moved and computed in the BI layer, helping deliver insights faster.
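To give a feel for what pushing work to the database means, the sketch below pairs a SQL query a BI tool might issue with an aggregation pipeline that computes the same result using the standard $group stage. The table and column names are hypothetical, the translation the BI Connector actually performs is internal and not shown here, and the group_count function is only an in-memory stand-in so the example runs without a database.

```python
from collections import Counter

# SQL a BI tool might issue (table and column names are hypothetical):
#   SELECT location, COUNT(*) AS total FROM users GROUP BY location
# An equivalent MongoDB aggregation pipeline using the standard $group stage:
pipeline = [
    {"$group": {"_id": "$location", "total": {"$sum": 1}}},
]


def group_count(documents, field_name):
    """In-memory stand-in for the $group stage above, for illustration only."""
    counts = Counter(doc[field_name] for doc in documents)
    return [{"_id": key, "total": total} for key, total in sorted(counts.items())]


users = [{"location": "US"}, {"location": "DE"}, {"location": "US"}]
print(group_count(users, "location"))
# [{'_id': 'DE', 'total': 1}, {'_id': 'US', 'total': 2}]
```

When the grouping runs inside the database, only the small summary rows travel to the BI layer instead of every underlying document.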
The BI Connector for Atlas is currently available for M10 Atlas clusters and higher.
New to MongoDB Atlas — Full CRUD Support in Data Explorer
As a fully managed database service, MongoDB Atlas makes life simpler for anyone interacting with MongoDB, whether you’re deploying a cluster on demand, restoring a snapshot, evaluating real-time performance metrics, or inspecting data.
Today, we’re taking it one step further by allowing developers to manipulate their data right from within the Atlas UI. The embedded Data Explorer, which has historically allowed you to run queries, view metadata regarding your deployments, and retrieve information such as index usage statistics, now supports full CRUD functionality.
To support these capabilities, new Project-level roles with different permission levels have been added.
You can assign users these new roles in the Users and Teams settings.
In addition, all Data Explorer operations are tracked and presented in the Atlas Activity Feed (found in the Alerts menu for each Project), allowing you to see who did what, and when.
When you click into the Data Explorer in Atlas, you should see new controls for interacting with your documents, collections, databases, and indexes. For example, modify existing documents using the intuitive visual editor, or insert new documents and clone or delete existing ones in just a few clicks. A comprehensive list of available Data Explorer operations can be found in the Atlas documentation.
The Data Explorer is currently available for M10 Atlas clusters and higher.
New to MongoDB Atlas on AWS — AWS Cloud Provider Snapshots, Free Tier Now Available in Singapore & Mumbai
AWS Cloud Provider Snapshots
MongoDB Atlas is an automated cloud database service designed for agile teams who’d rather spend their time building apps than managing databases, backups, and restores. Today, we’re happy to announce that Cloud Provider Snapshots are now available for MongoDB Atlas replica sets on AWS. As the name suggests, Cloud Provider Snapshots provide fully managed backup storage and recovery using the native snapshot capabilities of the underlying cloud service provider.
Choosing a backup method for a database cluster in MongoDB Atlas
When this feature is enabled, MongoDB Atlas will perform snapshots against the primary in the replica set; snapshots are stored in the same cloud region as the primary, granting you control over where all your data lives. Please visit our documentation for more information on snapshot behavior.
Cloud Provider Snapshots on AWS have built-in incremental backup functionality, meaning that a new snapshot only saves the data that has changed since the previous one. This minimizes the time it takes to create a snapshot and lowers costs by reducing the amount of duplicate data. For example, a cluster with 10 GB of data on disk and 3 snapshots may require less than 30 GB of total snapshot storage, depending on how much of the data changed between snapshots.
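The arithmetic behind that example can be sketched as follows. This is a deliberately simplified model that assumes one full snapshot followed by incremental ones and ignores compression, block granularity, and snapshot deletion; the 1 GB and 2 GB change sizes are made-up inputs.

```python
def snapshot_storage_gb(initial_gb, changed_gb_per_snapshot):
    """Estimate total storage for one full snapshot plus incremental ones.

    changed_gb_per_snapshot: data changed before each later snapshot, in GB.
    """
    return initial_gb + sum(changed_gb_per_snapshot)


# 10 GB on disk and 3 snapshots: one full, then two incremental.
total = snapshot_storage_gb(10, [1, 2])
full_copies = 10 * 3

print(total, "GB with incremental snapshots")   # 13 GB
print(full_copies, "GB if each were a full copy")  # 30 GB
```

The more data that changes between snapshots, the closer the total gets to the cost of storing full copies.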
Cloud Provider Snapshots are available for M10 clusters or higher in all of the 15 AWS regions where you can deploy MongoDB Atlas clusters.
Free, $9, and $25 MongoDB Atlas clusters now available in Singapore & Mumbai
We’re committed to lowering the barrier to entry to MongoDB Atlas and allowing developers to build without worrying about database deployment or management. Last week, we released a 14% price reduction on all MongoDB Atlas clusters deployed in AWS Mumbai. And today, we’re excited to announce the availability of free and affordable database cluster sizes in South and Southeast Asia on AWS.
Free M0 Atlas clusters, which provide 512 MB of storage for experimentation and early development, can now be deployed in AWS Singapore and AWS Mumbai. If more space is required, M2 and M5 Atlas clusters, which provide 2 GB and 5 GB of storage, respectively, are now also available in these regions for just $9 and $25 per month.
MongoDB Atlas Price Reduction - AWS Mumbai
Developers use MongoDB Atlas, the fully automated cloud service for MongoDB, to quickly and securely create database clusters that scale effortlessly to meet the needs of a new generation of applications.
We recognize that the developer community in India is an incredibly vibrant one, one that is growing rapidly thanks to startups like Darwinbox. The team there built a full suite of HR services online, going from a standing start to a top-four sector brand in the Indian market in just two years.
As part of our ongoing commitment to support the local developer community and lower the barrier to entry to using a MongoDB service that removes the need for time-consuming administration tasks, we are excited to announce a price reduction for MongoDB Atlas. Prices are being reduced by up to 14% on all MongoDB Atlas clusters deployed in AWS Mumbai. With this, we aim to give more developers access to the best way to work with data, automated with built-in best practices.
MongoDB Atlas is available in India on AWS Mumbai and GCP Mumbai. It is also available on Microsoft Azure in Pune, Mumbai and Chennai. Never tried MongoDB Atlas? Click here to learn more.