MongoDB powers Mappy Health's tweet-based disease tracking

Twitter has come a long way from being the place to read what your friends ate for dinner last night (though it still has that). Now it’s also a place where researchers can track the ebb and flow of diseases, and take appropriate action.


In early 2012, the U.S. Department of Health and Human Services challenged developers to design applications that use the free Twitter API to track health trends in real time. With $21,000 in prize money at stake, Charles Boicey, Chief Innovation Officer of Social Health Insights, and his team took on the Trending Now Challenge and ultimately won with their MongoDB-powered solution, Mappy Health.

Not bad, especially since the small team had only three weeks to put together a solution.

Choosing a Database

As Boicey tells it, MongoDB was critical to getting the application done well and on time:

MongoDB is just a wonderful environment in which to work. What used to take weeks with relational database technology is a matter of days or hours with MongoDB.

Fortunately, Boicey had a running start. He had used MongoDB previously in a healthcare environment and had seen how well it ingested health information exchange data in XML format, so he felt sure MongoDB could manage incoming Twitter data. Mappy Health also needed MongoDB’s geospatial capabilities to track diseases by location.
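To make "track diseases by location" concrete, here is a minimal sketch of what geotagged tweet documents and a radius query could look like in MongoDB. It only builds the documents and filters as plain Python dicts, so it runs without a server. The field names (`text`, `location`) and helper functions are hypothetical, not Mappy Health's actual schema; the GeoJSON Point format, the `2dsphere` index type, and the `$geoWithin`/`$centerSphere` operators are standard MongoDB features.

```python
# Hypothetical sketch: GeoJSON tweet documents plus a radius filter for MongoDB.
# Field names and helpers are illustrative assumptions, not Mappy Health's code.

EARTH_RADIUS_MILES = 3963.2  # used to convert a radius in miles to radians


def tweet_document(text: str, lng: float, lat: float) -> dict:
    """Build a tweet document with a GeoJSON Point (longitude comes first)."""
    return {
        "text": text,
        "location": {"type": "Point", "coordinates": [lng, lat]},
    }


def within_radius_query(lng: float, lat: float, radius_miles: float) -> dict:
    """Filter for documents within radius_miles of (lng, lat).

    $centerSphere expects its radius in radians, hence the division.
    """
    return {
        "location": {
            "$geoWithin": {
                "$centerSphere": [[lng, lat], radius_miles / EARTH_RADIUS_MILES]
            }
        }
    }


# Index spec for a 2dsphere index on "location"; with pymongo this could be
# passed as collection.create_index(LOCATION_INDEX).
LOCATION_INDEX = [("location", "2dsphere")]

if __name__ == "__main__":
    print(tweet_document("home sick with the flu", -117.84, 33.64))
    print(within_radius_query(-117.84, 33.64, 25))
```

With a driver such as pymongo, the filter would be passed to `find()` on the tweets collection; the index is optional for `$geoWithin` but helps it scale as tweets accumulate.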

Finally, although the team evaluated other NoSQL options, it found that “MongoDB was the easiest to stand up” and “extremely fast.” To make the development process even more efficient, Mappy Health runs the service on Amazon EC2.

Processing the Data

Although UCI has a Hadoop ecosystem Mappy Health could have used, the team found that its real-time algorithms and MapReduce jobs ran much faster on MongoDB. Mappy Health therefore runs MapReduce within MongoDB, yielding insights like this:

[Image: insights produced by Mappy Health’s MapReduce jobs]
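The source does not show Mappy Health's actual MapReduce jobs, so the following is only a pure-Python simulation of the map/reduce pattern, assuming a hypothetical job that counts disease mentions per state. The `state` and `conditions` fields are invented for illustration. MongoDB's own jobs would be written in JavaScript, as the quote below notes; Python is used here to keep the example self-contained and runnable.

```python
# Hypothetical simulation of a map/reduce job counting disease mentions
# per (state, condition). Field names are illustrative assumptions.
from collections import defaultdict


def map_tweet(tweet: dict):
    """Map step: emit ((state, condition), 1) for each condition mentioned."""
    for condition in tweet.get("conditions", []):
        yield (tweet["state"], condition), 1


def reduce_counts(key, values):
    """Reduce step: sum the emitted counts.

    Summing partial sums gives the same total, so this is safe to re-apply
    to its own output, a property MongoDB requires of reduce functions.
    """
    return sum(values)


def map_reduce(tweets):
    """Group emitted values by key, then reduce each group."""
    grouped = defaultdict(list)
    for tweet in tweets:
        for key, value in map_tweet(tweet):
            grouped[key].append(value)
    return {key: reduce_counts(key, values) for key, values in grouped.items()}


# Equivalent aggregation pipeline (same hypothetical fields).
PIPELINE = [
    {"$unwind": "$conditions"},
    {"$group": {
        "_id": {"state": "$state", "condition": "$conditions"},
        "count": {"$sum": 1},
    }},
]

if __name__ == "__main__":
    sample = [
        {"state": "CA", "conditions": ["flu"]},
        {"state": "CA", "conditions": ["flu", "cough"]},
        {"state": "NY", "conditions": ["flu"]},
    ]
    print(map_reduce(sample))
```

Note that MongoDB deprecated its `mapReduce` command in version 5.0 in favor of the aggregation pipeline, so a new project would more likely express this count with a pipeline like `PIPELINE` above.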

As Boicey notes:

Writing MapReduce jobs in JavaScript has been fairly simple and allows us to cache collections/hashes of data frequently displayed on the site easily using a Memcached middleman between the MongoDB server and the Heroku-served front-end web app.
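A "Memcached middleman" like the one Boicey describes is usually used in a cache-aside pattern: the web app checks the cache first and queries MongoDB only on a miss. The sketch below illustrates that pattern under stated assumptions. `TTLCache` is a hypothetical in-memory stand-in for a real Memcached client, and `load` stands in for whatever MongoDB query returns precomputed MapReduce results.

```python
# Hypothetical cache-aside sketch; TTLCache stands in for a Memcached client.
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """In-memory get/set store whose entries expire after ttl seconds."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._store[key]  # expired: evict and report a miss
            return None
        return value

    def set(self, key: str, value: Any, ttl: float) -> None:
        self._store[key] = (self._clock() + ttl, value)


def cached(cache: TTLCache, key: str, ttl: float, load: Callable[[], Any]) -> Any:
    """Return the cached value, or call load() (e.g. a MongoDB query for
    MapReduce output) on a miss and cache the result for ttl seconds.
    A load() result of None is not cached and will be reloaded next time."""
    value = cache.get(key)
    if value is None:
        value = load()
        if value is not None:
            cache.set(key, value, ttl)
    return value
```

The time-to-live bounds how stale the front end's data can get: a short TTL keeps trend maps fresh, while a longer one shields MongoDB from repeated identical reads.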

This jibes well with Mappy Health’s overall rationale for choosing MongoDB:

  1. MongoDB doesn’t require a lot of upfront work such as schema design (“doing the same thing in a relational database would require a lot of advance planning and then ongoing maintenance work like updating tables”), and
  2. MongoDB works really well and scales beautifully.
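The "no upfront schema design" point is easiest to see with tweets, where some carry coordinates or hashtags and others do not. This minimal sketch uses hypothetical field names and a hypothetical helper. It builds documents that include only the fields a given tweet actually has, so documents of different shapes can sit in the same collection without a table migration.

```python
# Hypothetical sketch: tweet documents with optional fields (no fixed schema).
from typing import Any, Dict, List, Optional


def tweet_to_document(tweet_id: str, text: str,
                      coordinates: Optional[List[float]] = None,
                      hashtags: Optional[List[str]] = None) -> Dict[str, Any]:
    """Include optional fields only when the tweet actually has them."""
    doc: Dict[str, Any] = {"_id": tweet_id, "text": text}
    if coordinates is not None:
        doc["location"] = {"type": "Point", "coordinates": coordinates}
    if hashtags:
        doc["hashtags"] = hashtags
    return doc


# A MongoDB filter that selects only documents that have a location field.
HAS_LOCATION = {"location": {"$exists": True}}
```

Adding a new field later (say, a detected symptom) would just mean writing it on new documents; older documents simply lack it, and queries such as `HAS_LOCATION` handle the difference.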

Since winning the Trending Now Challenge, Mappy Health has been working with a number of other organizations. We look forward to even bigger and better things from this team. Imagine what they could do if given a whole four weeks to build an application!
