MongoDB and the Connectivity Map: Making Connections Between Genetics and Disease
Rajiv Narayan
Corey Flynn
June 24, 2014
The Broad Institute has developed a novel high-throughput gene-expression profiling technology and has used it to build an open-source catalog of over a million profiles that captures the functional states of cells treated with drugs and other types of perturbations. Referred to as the Connectivity Map (or CMap), these data, when paired with pattern-matching algorithms, facilitate the discovery of connections between drugs, genes, and diseases. We wished to expose this resource to scientists around the world via an API that is easily accessible to programmers and biologists alike. We required a database solution that could handle a variety of data types and accommodate frequent changes to the schema. We realized that a relational database did not fit our needs, and gravitated towards MongoDB for its ease of use, its support for dynamic schemas and complex data structures, and its expressive query syntax. In this talk, we’ll walk through how we built the CMap library. We’ll discuss why we chose MongoDB, the various schema design iterations and tradeoffs we’ve made, how people are using the API, and what we’re planning for the next generation of biomedical data.