
Provisioned IOPS On AWS Marketplace Significantly Boosts MongoDB Performance, Ease Of Use

One of the largest factors affecting the performance of MongoDB is the choice of storage configuration. As data sets exceed the size of memory, the random IOPS rate of your storage will begin to dominate database performance. How you split your logs, journal, and data files across drives affects both performance and the maintainability of your database. Even the choice of filesystem and read-ahead settings can have a major impact. A large number of the performance issues we encounter in the field are related to misconfigured or under-provisioned storage. Storage configuration is often more important than instance size in determining the expected performance of a MongoDB server.

MongoDB With Provisioned IOPS: Better Performance, Less Guesswork

That's why we're excited to announce the availability of MongoDB with bundled storage configurations on the Amazon Web Services (AWS) Marketplace. Working closely with the Marketplace and EBS teams, we've made available a new set of MongoDB AMIs that not only include the world's leading document database, installed and configured according to our best practices, but also include high-performance storage configurations that leverage Amazon's Provisioned IOPS (pIOPS) EBS volumes, including Amazon's new 4000 IOPS drives. These options take a lot of the guesswork out of running MongoDB on EC2 and help ensure a great out-of-the-box experience without any additional setup on your part. These configurations offer radically improved performance for MongoDB, even on datasets much larger than RAM. If you want to take MongoDB for a spin, or set up your first production cluster, we recommend starting with these images. We plan to keep extending this set of configurations to give you more choices for different workloads and use cases.
The MongoDB with Bundled Storage AMI is available today in three configurations:

- MongoDB 2.4 with 1000 IOPS
- MongoDB 2.4 with 2000 IOPS
- MongoDB 2.4 with 4000 IOPS

The choice of configuration depends on how much storage capacity you want to put behind your MongoDB instance. For comparison, we have found that ephemeral storage and regular (non-pIOPS) EBS volumes can reliably deliver about 100 IOPS on a sustained basis. That means these configurations can deliver up to 10x-40x higher out-of-memory throughput than non-pIOPS setups. There's no charge from 10gen for using these AMIs; you pay only the EC2 usage charges for the instances and disk volumes used by your setup. Take them for a test-drive and please let us know what you think.

Implications Of Using MongoDB With pIOPS

Here's what you get when you use these instances:

Separate Volumes For Data, Journal And Logs

When you launch the AMI on an EC2 instance, three EBS volumes are attached: one each for data, journal, and logs. Separating these onto their own volumes decreases contention for disk access under high load and avoids the head-of-line blocking that can otherwise occur. The data volume is provisioned at 200GB or 400GB, with IOPS rates of 1000, 2000, or 4000. For write-heavy workloads, this helps ensure that the background flush gets synced to disk quickly. For read-heavy workloads, the IOPS rate of the drive determines the rate at which a random document or b-tree bucket can be loaded from disk into memory. The journal gets its own 25GB drive provisioned at 250 IOPS. While 25GB is large for the journal, we wanted enough IOPS to handle the journal load and sufficient capacity for reading the journal during a recovery; to stay within the maximum 10:1 ratio of IOPS to volume size imposed by EBS, we made the volume a little bigger than strictly needed.
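The three-volume split described above can be sketched roughly as follows. This is a minimal illustration of the layout, not the AMI's actual provisioning scripts; the device names, mount points, and symlink approach are assumptions.

```shell
# Sketch of a data/journal/logs volume split for mongod. Run as root on an
# EC2 instance with three EBS volumes attached. Device names (/dev/xvdf,
# /dev/xvdg, /dev/xvdh) and mount points are assumptions for illustration.

mkfs -t ext4 /dev/xvdf   # data volume (200GB/400GB, 1000-4000 pIOPS)
mkfs -t ext4 /dev/xvdg   # journal volume (25GB, 250 pIOPS)
mkfs -t ext4 /dev/xvdh   # log volume (10-20GB, 100-200 pIOPS)

mkdir -p /data /journal /log
mount -o noatime /dev/xvdf /data
mount -o noatime /dev/xvdg /journal
mount -o noatime /dev/xvdh /log

# mongod writes journal files under <dbpath>/journal, so a symlink from the
# dbpath into the dedicated journal volume sends journal I/O to its own drive.
ln -s /journal /data/journal
```

The symlink trick is a common way to give the journal its own spindle without any mongod configuration changes.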
Separating the journal onto its own volume ensures that a journal flush is never queued behind the big I/Os that happen when data files are synced. The log volumes are provisioned at 10GB, 15GB, and 20GB with 100, 150, and 200 IOPS respectively. This gives you plenty of room for log storage as well as predictable performance for collecting log data.

Pre-tuned Filesystem And OS Configuration

We've pre-configured the EXT4 filesystem, sensible mount options, read-ahead, and ulimit settings in the AMI. pIOPS EBS volumes are rated for 16KB I/Os, so using a read-ahead larger than this actually leads to decreased throughput. We've set this up out of the box.

Amazon Linux With Pre-installed Software And Repositories

We started with Amazon's latest Linux AMI and added 10gen's RPM repo, so there's no need to add a repo to get access to the latest software version. We've also pre-installed MongoDB, the 10gen MMS agent, and useful utilities such as sysstat (which contains the handy iostat utility) and munin-node (which MMS can use to access host statistics). The MMS agent is deactivated by default, but can be activated simply by adding your MMS account ID and starting the agent.

A New Wave Of MongoDB Adoption In The Cloud

A significant percentage of MongoDB applications are currently deployed in the cloud. We expect this percentage to keep growing as enterprises discover the cost and agility benefits of running their applications on clouds like AWS. It's therefore critically important that MongoDB run exceptionally well on Amazon, and with the addition of pIOPS to the MongoDB AMIs on Marketplace, MongoDB performance in the cloud just got a big boost. We look forward to continuing to work closely with Amazon on MongoDB performance improvements on AWS.
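The 16KB read-ahead cap mentioned above can be sketched as follows. `blockdev --setra` counts in 512-byte sectors, so 16KB corresponds to 32 sectors; the device name is an assumption, not taken from the AMI.

```shell
# pIOPS EBS volumes are rated for 16KB I/Os, so cap read-ahead at 16KB.
# blockdev measures read-ahead in 512-byte sectors.
SECTORS=$((16 * 1024 / 512))
echo "$SECTORS"   # prints 32

# Apply to the data volume (device name /dev/xvdf is an assumption):
# sudo blockdev --setra "$SECTORS" /dev/xvdf
```

A larger read-ahead would pull in data the 16KB-rated volume has to serve as extra I/Os, which is why oversizing it reduces rather than improves throughput.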

May 7, 2013

MongoDB powers Mappy Health's tweet-based disease tracking

Twitter has come a long way from being the place to read what your friends ate for dinner last night (though it still has that). Now it's also a place where researchers can track the ebb and flow of diseases, and take appropriate action. In early 2012, the U.S. Department of Health and Human Services challenged developers to design applications that use the free Twitter API to track health trends in real time. With $21,000 in prize money at stake, Charles Boicey, Chief Innovation Officer of Social Health Insights, and his team got started on the Trending Now Challenge, and ultimately won with their MongoDB-powered solution, Mappy Health. Not bad, especially since the small team had only three weeks to put together a solution.

Choosing a Database

MongoDB was critical to getting the application done well, and on time. As Boicey tells it, “MongoDB is just a wonderful environment in which to work. What used to take weeks with relational database technology is a matter of days or hours with MongoDB.” Fortunately, Boicey had a running start: having used MongoDB previously in a healthcare environment, and having seen how well it ingested health information exchange data in XML format, he felt sure MongoDB could manage incoming Twitter data. Mappy Health also needed MongoDB's geospatial capabilities to track diseases by location. Finally, while the team evaluated other NoSQL options, “MongoDB was the easiest to stand up” and is “extremely fast.” To make the development process even more efficient, Mappy Health runs the service on Amazon EC2.
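A location-based lookup of the kind Mappy Health's disease tracking implies could look roughly like this. The collection, field names, and coordinates are hypothetical, not Mappy Health's actual schema; the script targets a running mongod.

```shell
# Write a hypothetical geospatial query script for the mongo shell.
# Collection/field names and coordinates are illustrative assumptions.
cat > geo_query.js <<'EOF'
// Geospatial index on each tweet's location field
db.tweets.ensureIndex({ location: "2dsphere" });

// Tweets mentioning a disease within 50km of a point
db.tweets.find({
  disease: "flu",
  location: {
    $near: {
      $geometry: { type: "Point", coordinates: [-117.84, 33.68] },
      $maxDistance: 50000
    }
  }
});
EOF
# mongo twitter geo_query.js   # requires a running mongod
```

With an index like this, "diseases near me" becomes a single indexed query rather than a scan-and-filter job.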
Processing the Data

While UCI has a Hadoop ecosystem Mappy Health could have used, the team found that its real-time algorithms and MapReduce jobs run much faster on MongoDB, and so it runs MapReduce within MongoDB. As Boicey notes, “Writing MapReduce jobs in JavaScript has been fairly simple and allows us to cache collections/hashes of data frequently displayed on the site easily, using a Memcached middleman between the MongoDB server and the Heroku-served front-end web app.” This jibes well with Mappy Health's overall rationale for choosing MongoDB: MongoDB doesn't require a lot of work upfront (e.g., schema design; “doing the same thing in a relational database would require a lot of advance planning and then ongoing maintenance work like updating tables”), and MongoDB works really well and scales beautifully. Since winning the Trending Now Challenge, Mappy Health has been working with a number of other organizations. We look forward to even bigger and better things from this team. Imagine what they could do if given a whole four weeks to build an application!

Tagged with: Mappy Health, case study, disease tracking, US Department of Health and Human Services, flexibility, ease of use, Amazon, EC2, dynamic schema
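The kind of in-MongoDB MapReduce job Boicey describes can be sketched as below. The collection, field, and output names are assumptions for illustration, not Mappy Health's actual code; the generated script needs a running mongod.

```shell
# Write a hypothetical MapReduce job for the mongo shell. Collection and
# field names (tweets, disease, disease_counts) are illustrative assumptions.
cat > count_by_disease.js <<'EOF'
db.tweets.mapReduce(
  // map: emit one count per tweet, keyed by the disease it mentions
  function () { emit(this.disease, 1); },
  // reduce: sum the emitted counts for each disease
  function (key, values) { return Array.sum(values); },
  // out: materialize results into a collection the web tier can cache
  { out: "disease_counts" }
);
EOF
# mongo twitter count_by_disease.js   # requires a running mongod
```

Materializing the result into a collection is what makes the Memcached caching layer Boicey mentions straightforward: the front end reads precomputed counts rather than re-running the job per request.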

March 18, 2013

Living in the post-transactional database future

Given that we've spent decades building applications around relational databases, it's not surprising that the first response to the introduction of NoSQL databases like MongoDB is sometimes “Why?” Developers aren't usually the ones asking this question, because they love the approachability and flexibility MongoDB gives them. But DBAs who have built their careers on managing heavy RDBMS infrastructure? They're harder to please. 10gen president Max Schireson estimates that 60 percent of the world's databases are operational in nature, which is MongoDB's market. Of those use cases, most are ripe for a non-relational approach. The database market is rapidly changing, and very much up for grabs. Or as Redmonk analyst James Governor puts it, “The idea that everything is relational? Those days are gone.” As useful as relational databases are (and they're very useful for a certain class of application), they are losing relevance in a world where complex transactions are more the exception, less the rule. In fact, I'd argue that over time, the majority of application software that developers write will be for use cases that are better fits for MongoDB and other NoSQL technology, not legacy RDBMS. That's the future. What about now? Arguably, many of the applications being built today are already post-transactional, ripe for MongoDB and poor fits for RDBMS. Consider:

- Amazon: its systems that process order transactions (RDBMS) are largely “done” and “stable.” Amazon's current development focuses largely on providing better search and recommendations, or adapting prices on the fly (NoSQL).
- Netflix: the vast majority of its engineering is focused on recommending better movies to you (NoSQL), not processing your monthly bill (RDBMS).
- Square: the easy part is processing the credit card (RDBMS). The hard part is making it location-aware, so it knows where you are and what you're buying (NoSQL).
It's easy, but erroneous, to pigeonhole these examples as representative of an anomalous minority of enterprises. Yes, these companies represent the cutting edge of both business and technology. But no, they are not alone in building these sorts of applications. For every early-adopter Netflix there's a sizable, growing population of mainstream companies in media (e.g., The Guardian), finance (e.g., Intuit), and other verticals that are looking to turn technology into a revenue-driving asset, not simply something that keeps the lights on and payroll running. When what we built were websites, RDBMS worked great. But today we're building applications that are mobile and social, involve high-volume data feeds, incorporate predictive analytics, and more. These modern applications don't fit RDBMS. Andy Oliver lists 10 things never to do with a relational database, but the list is much longer, and growing. MongoDB is empowering the next generation of applications: post-transactional applications that rely on bigger data sets that move much faster than an RDBMS can handle. Yes, there will remain a relatively small sphere of applications unsuitable for MongoDB (including applications with a heavy emphasis on complex transactions), but the big needs going forward, like search, log analysis, media repositories, recommendation engines, and high-frequency trading? Those functions that really help a company innovate and grow revenue? They're best done with MongoDB. Of course, given RDBMS' multi-decade legacy, it's natural for developers to try to force RDBMS to work for a given business problem. Take log analysis, for example. Oliver writes:

Log analysis: “…[T]urn on the log analysis features of Hadoop or RHQ/JBossON for a small cluster of servers. Set the log level and log capture to anything other than ERROR. Do something more complex and life will be very bad.”
See, this kind of somewhat unstructured data analysis is exactly what MapReduce à la Hadoop and languages like Pig are for. It's unfortunate that the major monitoring tools are RDBMS-specific; they really don't need transactions, and low latency is job No. 1. Forward-looking organizations already realize that MongoDB is an excellent fit for log management, which is why we see more and more enterprises turning to MongoDB for this purpose. I expect this to continue. As MongoDB continues to enrich its functionality, the universe of applications for which it is not merely applicable, but also better, will continue to expand, even as the universe of applications for which RDBMS is optimal declines. Indeed, we're already living in a post-transactional world. Some people just don't know it yet. (Or, as William Gibson would say, “The future is already here – it's just not very evenly distributed.”) Posted by Matt Asay, vice president of Corporate Strategy, with significant help from my inestimable colleague, Jared Rosoff. Tagged with: NoSQL, MongoDB, RDBMS, relational, James Governor, Redmonk, log analysis, Andy Oliver, transactions, Netflix, Amazon, Square, operational database, DBA

November 20, 2012