Avery Rosen


Virtualizing MongoDB on Amazon EC2 and GCE: Part 2

In part one of this series, we introduced David Mytton, CTO of Server Density, and discussed the pros and cons of virtualizing infrastructure in public clouds. In part two, we finish with specific practices for virtualizing MongoDB in two popular public clouds, Amazon EC2 and Google Compute Engine.

Virtualizing Databases

To perform well, databases have system requirements that were initially a challenge for cloud providers. Databases generally don't need much CPU; what they want is as much RAM as you can give them and low IO latency for many small reads and writes. While RAM in the cloud is performant and readily available, low-latency IO is a challenge, due to limitations of the underlying hardware typically used for VMs. Many teams trying out database systems such as MySQL or MongoDB ran afoul of these issues at first. Thankfully, these challenges have since been overcome as providers improved their offerings, and today Server Density has no problem running its massive MongoDB installation entirely in the cloud.

General Best Practices

Every MongoDB operator should be aware of the guidance found in the MongoDB Production Notes. These practices are not particular to virtualized instances of MongoDB; however, configuration particulars such as disabling atime and setting good readahead values are more important to get right from the start, as any sub-optimal setup will be exacerbated by a virtual environment. Additionally, there are considerations that are particular to virtualized environments. For example, when using Linux with virtual block devices, such as EC2's Elastic Block Store and GCE's Persistent Disk, the noop IO scheduler should be used so that the underlying hypervisor handles the scheduling. For more, see the Production Notes section on Virtual Environments and Performance Best Practices for MongoDB.

EC2 Specific Optimizations

For many enterprises, using MongoDB Management Service (MMS) is a great way to get started with MongoDB on EC2. With MMS you can launch instances at the push of a button that are already set up with all the best practices, so you can be assured you're not overlooking any of them. Teams that want a configuration more tailored to their needs can build their own, taking heed of these tips:

- Only use the newer instance types. Choose types optimized for memory (r3) or storage (i2), as MongoDB will generally not be CPU-bound. Only use a CPU-optimized instance if your prototyping shows your app is one of those cases where CPU bottlenecking is a concern. (Note that when MongoDB 3.0 is released, deployments using the new WiredTiger storage engine will also see greater CPU utilization.)
- Use EBS-optimized instances, and use provisioned IOPS. This is the single most important detail to get right on EC2. Without provisioned IOPS, you will experience unpredictable IO latency, a deal-breaker for almost any database. In some rare cases, you may be able to tolerate occasional latency spikes on a small database; there, the General Purpose SSD option is cheaper than the Provisioned IOPS option, but this will require validation.
- Split the log file, journal file, and data directories onto separate volumes. Each of these volumes can have its size and IOPS provisioned individually to suit its needs.
- To achieve the highest possible throughput, set up volumes as RAID 10. AWS recommends this as a redundancy measure as well, but David points out that the time required to rebuild a RAID, and the performance impact on production during that rebuild, make it more advantageous to simply spin up a new instance and add it as a new replica set member.
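Acting on that last point is straightforward from any driver. As a minimal sketch (not Server Density's actual tooling), here is how a freshly provisioned instance could be added to an existing replica set using pymongo; the host names are placeholders, and the same thing can be done interactively with rs.add() in the mongo shell:

```python
# Minimal sketch: add a newly provisioned instance to an existing replica set
# instead of rebuilding a degraded RAID on the old member.
# Host names below are placeholders; run this against the current primary.
from pymongo import MongoClient

client = MongoClient("mongodb-primary.example.com", 27017)

# Read the current replica set configuration from the local database.
config = client.local.system.replset.find_one()

# Append the new member with the next free _id and bump the config version.
new_id = max(member["_id"] for member in config["members"]) + 1
config["members"].append({"_id": new_id, "host": "mongodb-new.example.com:27017"})
config["version"] += 1

# Apply the new configuration; the new member will then perform an initial sync.
client.admin.command({"replSetReconfig": config})
```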
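The provisioned IOPS and separate-volume guidance above can be scripted as well. A minimal sketch using boto3, where the sizes, IOPS figures, and availability zone are purely illustrative, not recommendations:

```python
# Minimal sketch: create separate Provisioned IOPS (io1) EBS volumes for
# MongoDB data, journal, and log files. Sizes, IOPS, and the availability
# zone are illustrative placeholders, not recommendations.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

volumes = {
    "data":    {"Size": 500, "Iops": 4000},  # bulk of the random reads/writes
    "journal": {"Size": 25,  "Iops": 250},   # mostly sequential writes
    "log":     {"Size": 10,  "Iops": 100},   # low, steady write rate
}

for name, spec in volumes.items():
    vol = ec2.create_volume(
        AvailabilityZone="us-east-1a",
        VolumeType="io1",          # Provisioned IOPS SSD
        Size=spec["Size"],
        Iops=spec["Iops"],
    )
    # Tag each volume so it is easy to attach to the right mount point later.
    ec2.create_tags(Resources=[vol["VolumeId"]],
                    Tags=[{"Key": "Name", "Value": f"mongodb-{name}"}])
```

When launching the instance itself, EBS optimization is just a flag on the API call (for example, EbsOptimized=True with boto3's run_instances).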
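Once volumes like these are attached, the readahead and IO scheduler settings called out under General Best Practices still need to be applied on the host. A minimal sketch, assuming the data volume shows up as /dev/xvdf (device names vary) and the script runs as root; disabling atime is handled separately, via the noatime mount option in /etc/fstab:

```python
# Minimal sketch: verify and set the IO scheduler and readahead for an EBS
# volume attached as /dev/xvdf (an assumed device name; yours will vary).
# Requires root. A production setup would persist these settings with udev
# rules or an init script rather than a one-off script.
import subprocess

DEVICE = "xvdf"

# Show the available schedulers; the active one appears in [brackets].
with open(f"/sys/block/{DEVICE}/queue/scheduler") as f:
    print("schedulers:", f.read().strip())

# Select the noop scheduler so the hypervisor handles IO ordering.
with open(f"/sys/block/{DEVICE}/queue/scheduler", "w") as f:
    f.write("noop")

# Set readahead to 32 sectors (16 KB), a common starting point for MongoDB's
# many small random reads; tune this for your own workload.
subprocess.run(["blockdev", "--setra", "32", f"/dev/{DEVICE}"], check=True)
print("readahead:",
      subprocess.run(["blockdev", "--getra", f"/dev/{DEVICE}"],
                     capture_output=True, text=True).stdout.strip())
```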
The MongoDB documentation guide to Deploying MongoDB to EC2 contains more detail, as well as a walkthrough of setting up a MongoDB instance on EC2.

GCE Specific Optimizations

Google Compute Engine differs significantly from EC2, in this case most importantly with respect to its IO subsystem. With GCE, both RAIDing and separating the log, journal, and data files onto their own volumes are unnecessary, and in fact negatively impact performance. This is because GCE Persistent Disks are already implemented as stripes across many physical disks, making RAIDing redundant. This is also why volume size translates directly into available IOPS. To get the most out of MongoDB on GCE, follow these guidelines:

- Use the high-memory machine types, and base your choice of which one on the memory footprint you need.
- Volume size is directly correlated with the IOPS available, making it easy to understand the performance you can expect. Always keep in mind that the size you allocate must be the greater of the size needed to store your data and the size needed to guarantee your required IOPS. Refer to the documentation for Compute Engine Disks for an extensive discussion of the performance characteristics of Persistent Disks, as well as several useful examples.
- Virtual machines have IOPS limits themselves, detailed in the GCE Disks documentation.
- As explained above, don't use RAID!

The MongoDB documentation guide to Deploying MongoDB to GCE contains more detail, as well as a walkthrough of setting up a MongoDB instance on GCE.

Also, download our operations whitepaper for best practices on deploying and managing a MongoDB cluster:

Read the ops best practices guide

About the Author - Avery

Avery is an infrastructure engineer, designer, and strategist with 20 years of experience in every facet of internet technology and software development. As principal of Bringing Fire Consulting, he offers clients his expertise at the intersection of technology, business strategy, and product formulation. He earned a B.A. in Computer Science from Brown University, where he specialized in systems and network programming while also studying anthropology, fiction, cog sci, and semiotics. Avery got his start in internet technology in 1993, configuring Apache and automating systems at Panix, the third-oldest ISP in the world. He has an obsession with getting to the heart of a problem, a flair for communication, and a devotion to providing delight to end users.

<< Read Part 1

January 7, 2015

Virtualizing MongoDB on Amazon EC2 and GCE: Part 1

Building an infrastructure in the cloud is an increasingly popular choice these days. Low startup costs, flexibility, and ease of deploying to multiple regions are all compelling features for new ventures and established enterprises alike. As part of this trend, virtualizing MongoDB is increasingly common. Databases present specific challenges to virtualization, which in many cases has led to poor performance, especially before the emergence of clear best practices. As part of a migration to a cloud hosting environment, David Mytton, Founder and CTO of Server Density, investigated the best ways to deploy MongoDB on two popular platforms, Amazon EC2 and Google Compute Engine. In part one of this two-part series, we review David's general pros and cons of virtualization; in part two, we cover the challenges and methods of virtualizing MongoDB on EC2 and GCE.

Introducing David Mytton and Server Density

David Mytton is the CTO of Server Density, which boldly proclaims it offers "server monitoring that doesn't suck." Their service provides remote or on-premises monitoring of infrastructure. Besides the standard metrics from servers, they can collect any custom metrics you want via custom plugins, and they interoperate with Nagios plugins as well. Server Density uses MongoDB to store all of its monitoring data. Every metric from every server for every client adds up to quite a bit of it! Each month Server Density ingests 250TB of monitoring data, inserting roughly a billion documents into MongoDB every day. In his talk at MongoDB World 2014, David went into detail about why one would want to virtualize MongoDB, what considerations to keep in mind while doing so, and the specifics of deploying MongoDB on both EC2 and Google Compute Engine.

Cloud Infrastructure vs. Bare Metal

David segments the overall trade-offs between bare metal and cloud VM providers into two categories: operational and financial. Operationally, the cloud offers ease of management and agility, while bare metal offers performance and the ability to purchase machines tailored exactly to your workload. Financially, cloud infrastructure costs more over time but has very small startup costs, while co-location of bare metal requires capital expenditure and eventual liquidation of inventory, but costs less in the long run. That's just the high-level overview, though... let's get into the weeds.

Virtualization: Advantages

Virtual infrastructures are easy to manage and agile, because provisioning an instance is fast and simple. With public cloud providers, one can take advantage of machine templates (AMIs in the parlance of EC2, or Images on the GCE side). The public images (such as the official MongoDB AMIs) are well vetted, and you can roll your own if you want to deploy the same custom image to lots of hosts. Containment is easy with a cloud architecture: just deploy everything into its own VM.

Snapshotting is very easy with cloud providers. This provides two benefits:

- Fast backup.
- If an instance requires vertical scaling, it is easy to resize or migrate. Just take a snapshot, provision a new volume, and restore to the new volume from the snapshot.
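To make the snapshot-based resize concrete, here is a minimal sketch of that flow using boto3; the volume ID, instance ID, size, and availability zone are placeholders rather than real values:

```python
# Minimal sketch: snapshot an existing EBS volume and restore it onto a new,
# larger volume -- the basic "resize by migration" flow described above.
# Volume ID, instance ID, size, and availability zone are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# 1. Snapshot the current data volume (this also doubles as a backup).
snap = ec2.create_snapshot(VolumeId="vol-xxxxxxxx",
                           Description="pre-resize backup")
ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snap["SnapshotId"]])

# 2. Provision a new, larger volume from that snapshot.
new_vol = ec2.create_volume(
    AvailabilityZone="us-east-1a",
    SnapshotId=snap["SnapshotId"],
    Size=500,                      # larger than the original
    VolumeType="gp2",
)

# 3. Attach it to the new or resized instance, then remount on the host.
ec2.get_waiter("volume_available").wait(VolumeIds=[new_vol["VolumeId"]])
ec2.attach_volume(VolumeId=new_vol["VolumeId"],
                  InstanceId="i-xxxxxxxx",
                  Device="/dev/xvdf")
```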
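Launching from a machine template, mentioned above, is similarly a single API call. Another minimal boto3 sketch, where the AMI ID, key pair, and instance type are placeholders:

```python
# Minimal sketch: launch an instance from a machine template (AMI).
# The AMI ID, key pair, and instance type below are placeholders.
import boto3

ec2 = boto3.resource("ec2", region_name="us-east-1")

instances = ec2.create_instances(
    ImageId="ami-xxxxxxxx",        # your vetted or custom-built image
    InstanceType="r3.large",       # memory-optimized, a common MongoDB choice
    KeyName="my-keypair",
    MinCount=1,
    MaxCount=1,
)
print("launched:", instances[0].id)
```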
With a large cloud provider, you have effectively unlimited resources to scale rapidly. If you need to add MongoDB replica set nodes, for example, or entire new shards, you can spin up instances and have them in the cluster within minutes. It's cheap to get started, even if you want to handle an unknown amount of load: you can spin up a lot of nodes without paying for physical hardware, and spin down what you don't need once your load level is established. The same flexibility means you can scale to handle seasonal traffic without being over-provisioned year-round. With cloud providers, you can also take advantage of other products they offer, such as DNS, email, storage, search, and load balancing.

Virtualization: Disadvantages

The hypervisor, which orchestrates virtualization, has overhead, and that affects performance. VMs on the same host can experience contention for resources, especially in public clouds. Databases such as MongoDB are particularly sensitive to IO latency, so this contention can lead to very poor performance if not accounted for.

Bare Metal: Advantages

With bare metal you get dedicated resources for all your apps, without the overhead of the hypervisor or contention between VMs. You can completely customize your boxes, as opposed to having to use whatever configurations your provider offers. Especially once you reach about 50 servers, owning your own hardware is much cheaper, even including the salaries or contracted cost of infrastructure expertise.

Bare Metal: Disadvantages

Unlike a virtual server, which can be provisioned in minutes, provisioning bare metal takes at least four hours, and that's assuming a good arrangement with a bare metal hosting service. It's days to weeks if you're ordering and racking in your own colo. With bare metal, you must always be over-provisioned to handle growth. Snapshotting is hard, or at least harder: LVM offers relatively easy snapshots, but not as easy as a button-click, and managing the storage is up to you. Resizing is hard, too. In fact, no one would have called it "resizing" before virtualization; it was just called "getting a bigger box, migrating the app, and finding some hand-me-down use for the now unused old box." Finally, there's the investment: bare metal requires either capital expenditure, with inventory depreciation and eventual liquidation, or leasing, both of which have higher upfront costs than provisioning VMs.

A Typical Trajectory

Because of these trade-offs, a typical trajectory for a new enterprise is to start its infrastructure purely in the cloud, and eventually migrate to data centers of its own once revenue and/or investment is established and the benefits of scale emerge. That's not the only path, however. Sometimes operational concerns dominate, and businesses opt to stay with a cloud provider even after they reach a strict break-even point. And in some cases, businesses migrate from their own hardware to the cloud. This was the case with Server Density, and you can hear David discuss their rationale in detail in a video at the bottom of his post on the Server Density blog.

Stay tuned for the next installment, where we discuss the challenges of virtualizing databases in public clouds, as well as specific best practices for EC2 and GCE. In the meantime, download our operations whitepaper for best practices on deploying and managing a MongoDB cluster:

Read the ops best practices guide

About the Author - Avery

Avery is an infrastructure engineer, designer, and strategist with 20 years of experience in every facet of internet technology and software development. As principal of Bringing Fire Consulting, he offers clients his expertise at the intersection of technology, business strategy, and product formulation.
He earned a B.A. in Computer Science from Brown University, where he specialized in systems and network programming while also studying anthropology, fiction, cog sci, and semiotics. Avery got his start in internet technology in 1993, configuring Apache and automating systems at Panix, the third-oldest ISP in the world. He has an obsession with getting to the heart of a problem, a flair for communication, and a devotion to providing delight to end users.

Read Part 2 >>

January 7, 2015