MongoDB and the Connectivity Map: Making Connections Between Genetics and Disease
June 24, 2014
The Broad Institute has developed a novel high-throughput gene-expression profiling technology and has used it to build an open-source catalog of over a million profiles that captures the functional states of cells when treated with drugs and other types of perturbations. Referred to as the Connectivity Map (or CMap), these data when paired with pattern matching algorithms, facilitate the discovery of connections between drugs, genes and diseases. We wished to expose this resource to scientists around the world via an API that is easily accessible to programmers and biologists alike. We required a database solution that could handle a variety of data types and handle frequent changes to the schema. We realized that a relational database did not fit our needs, and gravitated towards MongoDB for its ease of use, support for dynamic schema, complex data structures and expressive query syntax. In this talk, we’ll walk through how we built the CMap library. We’ll discuss why we chose MongoDB, the various schema design iterations and tradeoffs we’ve made, how people are using the API, and what we’re planning for the next generation of biomedical data.