
Provisioned IOPS On AWS Marketplace Significantly Boosts MongoDB Performance, Ease Of Use

One of the largest factors affecting the performance of MongoDB is the choice of storage configuration. As data sets exceed the size of memory, the random IOPS rate of your storage will begin to dominate database performance. How you split your logs, journal and data files across drives will affect both the performance and the maintainability of your database. Even the choice of filesystem and read-ahead settings can have a major impact. A large number of the performance issues we encounter in the field are related to misconfigured or under-provisioned storage. Storage configuration is often more important than instance size in determining the expected performance of a MongoDB server.

MongoDB With Provisioned IOPS: Better Performance, Less Guesswork

That’s why we’re excited to announce the availability of MongoDB with bundled storage configurations on the Amazon Web Services (AWS) Marketplace. Working closely with the Marketplace and EBS teams, we’ve made available a new set of MongoDB AMIs that not only include the world’s leading document database software, installed and configured according to our best practices, but also include high-performance storage configurations that use Amazon’s Provisioned IOPS (pIOPS) storage volumes, including Amazon’s new 4000 IOPS pIOPS volumes.

These options take a lot of the guesswork out of running MongoDB on EC2 and help ensure a great out-of-the-box experience without any additional setup on your part. These configurations offer radically improved performance for MongoDB, even on datasets much larger than RAM. If you want to take MongoDB for a spin, or set up your first production cluster, we recommend starting with these images. We plan to keep extending this set of configurations to give you more choices for different workloads and use cases.
The MongoDB with Bundled Storage AMI is available today in 3 configurations:

MongoDB 2.4 with 1000 IOPS
MongoDB 2.4 with 2000 IOPS
MongoDB 2.4 with 4000 IOPS

The choice of configuration will depend on how much storage capacity you want to put behind your MongoDB instance. For comparison, we have found that ephemeral storage and regular (non-pIOPS) EBS volumes can reliably deliver about 100 IOPS on a sustained basis. At 1000 to 4000 IOPS, these configurations can therefore deliver roughly 10x to 40x higher out-of-memory throughput than non-pIOPS setups.

There’s no charge from 10gen for using these AMIs. You pay only the EC2 usage charges for the instances and disk volumes used by your setup. Take them for a test drive and please let us know what you think.

Implications Of Using MongoDB With pIOPS

Here’s what you get when you use these instances:

Separate Volumes For Data, Journal And Logs

When you launch the AMI on an EC2 instance, there will be three EBS volumes attached: one each for data, journal and logs. By placing these on separate volumes, we help decrease contention for disk access during high-load scenarios and avoid the head-of-line blocking that can occur.

The data volume is provisioned at 200GB or 400GB, with IOPS rates of 1000, 2000 and 4000. For write-heavy workloads, this helps ensure that the background flush can be synced quickly to disk. For read-heavy workloads, the IOPS rate of the drive determines the rate at which a random document or b-tree bucket can be loaded from disk into memory.

The journal gets its own 25GB drive provisioned at 250 IOPS. While 25GB is large for the journal, we wanted to make sure we had enough IOPS to handle the journal load and to provide sufficient capacity for reading the journal during a recovery. To stay within the 10:1 ratio of IOPS to size (in GB) imposed by EBS, we made the volume a little bigger than the journal alone would need.
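The sizing arithmetic above can be checked with a short Python sketch. It is only an illustration: the 10 IOPS-per-GB ceiling, the roughly 100 IOPS baseline and the volume figures are taken from this post, while the function and variable names are hypothetical and not part of any AWS or MongoDB tooling.

```python
# Back-of-the-envelope check of the volume layout described in the post.
# Assumptions: the 10 IOPS-per-GB ceiling and the ~100 IOPS baseline are the
# figures quoted in the post; all names here are illustrative only.

MAX_IOPS_PER_GB = 10   # EBS ceiling on provisioned IOPS relative to size
BASELINE_IOPS = 100    # sustained rate reported for non-pIOPS storage


def min_size_gb(iops: int) -> float:
    """Smallest volume size (GB) allowed to carry the requested IOPS rate."""
    return iops / MAX_IOPS_PER_GB


def speedup(iops: int) -> float:
    """Provisioned IOPS relative to the ~100 IOPS baseline."""
    return iops / BASELINE_IOPS


journal_iops = 250
log_volumes = [(10, 100), (15, 150), (20, 200)]   # (size in GB, IOPS)
data_iops_options = [1000, 2000, 4000]

# The journal needs at least this many GB to be provisioned at 250 IOPS.
journal_min = min_size_gb(journal_iops)

# Out-of-memory throughput gain for each data-volume option.
speedups = [speedup(iops) for iops in data_iops_options]

# True when every log volume sits exactly at the 10 IOPS-per-GB ceiling.
logs_at_limit = all(size == min_size_gb(iops) for size, iops in log_volumes)

if __name__ == "__main__":
    print("journal minimum size (GB):", journal_min)
    print("speedups over baseline:", speedups)
    print("log volumes at the ratio limit:", logs_at_limit)
```

Under these assumptions, a 250 IOPS journal cannot be provisioned on less than 25GB, which is why the journal volume is larger than the journal itself requires.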
Placing the journal on its own volume ensures that a journal flush is never queued behind the large I/Os that happen when data files are synced.

The log volumes are provisioned at 10GB, 15GB and 20GB, with 100, 150 and 200 IOPS respectively. This gives you plenty of room for storing logs, as well as predictable storage performance for collecting log data.

Pre-tuned Filesystem And OS Configuration

We’ve pre-configured an ext4 filesystem, sensible mount options, read-ahead and ulimit settings in the AMI. pIOPS EBS volumes are rated for 16KB I/Os, so a read-ahead larger than this size actually leads to decreased throughput. We’ve set this up out of the box.

Amazon Linux With Pre-installed Software And Repositories

We started with Amazon’s latest Linux AMI and then added 10gen’s RPM repo, so you no longer need to add a repo yourself to get access to the latest software version. We’ve also pre-installed MongoDB, the 10gen MMS Agent and various useful software utilities such as sysstat (which contains the iostat utility) and munin-node (which MMS can use to access host statistics). The MMS agent is deactivated by default, but can be activated simply by adding your MMS account ID and then starting the agent.

A New Wave Of MongoDB Adoption In The Cloud

A significant percentage of MongoDB applications are currently deployed in the cloud. We expect this percentage to continue to grow as enterprises discover the cost and agility benefits of running their applications on clouds like AWS. As such, it’s critically important that MongoDB run exceptionally well on Amazon, and with the addition of pIOPS to the MongoDB AMIs on Marketplace, MongoDB performance in the cloud just got a big boost. We look forward to continuing to work closely with Amazon to facilitate MongoDB performance improvements on AWS.
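To make the earlier read-ahead point concrete, here is a minimal Python sketch. It assumes a deliberately simple model in which every random read pulls in one full read-ahead window and the volume counts that window in 16KB units; real behavior depends on the kernel and the workload. The helper names and the 128KB comparison value are hypothetical. The one external fact used is that `blockdev --setra` takes its value in 512-byte sectors.

```python
import math

SECTOR_BYTES = 512           # unit used by `blockdev --setra`
RATED_IO_BYTES = 16 * 1024   # I/O size the post says pIOPS volumes are rated for


def readahead_sectors(readahead_bytes: int) -> int:
    """Convert a read-ahead size in bytes to 512-byte sectors."""
    return readahead_bytes // SECTOR_BYTES


def random_reads_per_second(provisioned_iops: int, readahead_bytes: int) -> float:
    """Estimate random reads/s under the simplified model described above.

    Each read is assumed to consume ceil(readahead / 16KB) provisioned I/Os.
    """
    ios_per_read = max(1, math.ceil(readahead_bytes / RATED_IO_BYTES))
    return provisioned_iops / ios_per_read


if __name__ == "__main__":
    matched = 16 * 1024     # read-ahead matched to the rated I/O size
    oversized = 128 * 1024  # hypothetical larger setting, for comparison
    print("sectors for 16KB read-ahead:", readahead_sectors(matched))
    print("reads/s at 4000 IOPS, 16KB:", random_reads_per_second(4000, matched))
    print("reads/s at 4000 IOPS, 128KB:", random_reads_per_second(4000, oversized))
```

In this model a 16KB read-ahead corresponds to 32 sectors, and an oversized window spends several provisioned I/Os per document fetched, which is one way to read the claim that larger read-aheads reduce throughput.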

May 7, 2013

MongoDB powers Mappy Health's tweet-based disease tracking

Twitter has come a long way from being the place to read what your friends ate for dinner last night (though it still has that). Now it’s also a place where researchers can track the ebb and flow of diseases and take appropriate action.

In early 2012, the U.S. Department of Health and Human Services challenged developers to design applications that use the free Twitter API to track health trends in real time. With $21,000 in prize money at stake, Charles Boicey, Chief Innovation Officer of Social Health Insights, and his team got started on the Trending Now Challenge, and ultimately won with their MongoDB-powered solution, Mappy Health. Not bad, especially since the small team had only three weeks to put together a solution.

Choosing a Database

MongoDB was critical to getting the application done well, and on time. As Boicey tells it, “MongoDB is just a wonderful environment in which to work. What used to take weeks with relational database technology is a matter of days or hours with MongoDB.”

Fortunately, Boicey had a running start. Having used MongoDB previously in a healthcare environment, and having seen how well it ingested health information exchange data in XML format, Boicey felt sure MongoDB could manage incoming Twitter data. In addition, Mappy Health needed MongoDB’s geospatial capabilities to track diseases by location. Finally, while the team evaluated other NoSQL options, “MongoDB was the easiest to stand up” and is “extremely fast.” To make the development process even more efficient, Mappy Health runs the service on Amazon EC2.
Processing the Data

While UCI has a Hadoop ecosystem Mappy Health could have used, the team found that its real-time algorithms and MapReduce jobs run much faster on MongoDB, and so it runs MapReduce within MongoDB. As Boicey notes, “Writing MapReduce jobs in JavaScript has been fairly simple and allows us to cache collections/hashes of data frequently displayed on the site easily using a Memcached middleman between the MongoDB server and the Heroku-served front-end web app.”

This jibes well with Mappy Health’s overall rationale for choosing MongoDB: it doesn’t require a lot of work upfront (e.g., schema design; “doing the same thing in a relational database would require a lot of advance planning and then ongoing maintenance work like updating tables”), and it works really well and scales beautifully.

Since winning the Trending Now Challenge, Mappy Health has been working with a number of other organizations. We look forward to even bigger and better things from this team. Imagine what they could do if given a whole four weeks to build an application!

Tagged with: Mappy Health, case study, disease tracking, US Department of Health and Human Services, flexibility, ease of use, Amazon, EC2, dynamic schema

March 18, 2013