Last year, pharmaceutical giant AstraZeneca embarked on an ambitious program to use next-generation genome sequencing to develop drugs to fight all kinds of disease, including cancer.
The technology creates a synthetic version of messenger RNA, which helps create protein in cells. If successful, the proteins could fight cancer, among other diseases.
Unfortunately, such genomic sequencing requires a great deal of computing power. As Jason Tetrault, architect of R&D information at AstraZeneca, explained recently, analyzing 88 whole human genomes took 15,000 hours and involved 171 terabytes (TB) of data. Analyzing a single human genome can take four days.
Fortunately, breakthroughs in genetics are coinciding with quantum leaps in computing power. In particular, MongoDB’s cross-platform, document-oriented database has arrived to crunch such numbers on a grand scale. “I would include MongoDB in the camp of very disruptive technology,” Tetrault says. “Anything that helps us along the way to quicken our pace to help find things faster is great.”
For Tetrault, genomic sequencing is a classic example of the challenge of unstructured data. AstraZeneca chose MongoDB for its document storage capability. Then the company offered lunch-and-learns to educate staffers. “Of course lunch-and-learns,” he says. “Everyone’s going to show up for a free lunch.”
Examples also helped bring employees on board. Tetrault pointed out that Craigslist and MTV, among others, use MongoDB to track and arrange their huge troves of data. Tetrault says dropping those names “really opens the discussion...it’s an ‘Oh wow’ moment.”
Perhaps the most persuasive tactic for getting everyone on board was to show how MongoDB could achieve a useful result relatively quickly. “Find something hard and make it easy,” he says.
AstraZeneca’s experiment involved taking 10% of all its compounds and pulling in information from its disparate database systems. Using MongoDB, the company was able to execute Tanimoto comparisons on about 500,000,000 compounds. “All of this, underneath my desk,” Tetrault says.
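For readers unfamiliar with the term, a Tanimoto comparison scores how similar two compounds are by dividing the number of structural features they share by the number of features present in either one. The sketch below illustrates only that general formula; it is not AstraZeneca's code, and the compound names, the fingerprint bit sets, and the helper `rank_by_similarity` are hypothetical, invented for illustration.

```python
from typing import Dict, FrozenSet, List, Tuple


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto coefficient of two fingerprints, each given as a set of 'on' bits."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def rank_by_similarity(
    query: FrozenSet[int], library: Dict[str, FrozenSet[int]]
) -> List[Tuple[str, float]]:
    """Score every compound in the library against the query, most similar first."""
    scores = [(name, tanimoto(query, bits)) for name, bits in library.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical fingerprints: each integer marks a structural feature that is present.
compounds = {
    "cmpd_a": frozenset({1, 4, 7, 9}),
    "cmpd_b": frozenset({1, 4, 8, 9}),
    "cmpd_c": frozenset({2, 3, 5}),
}

if __name__ == "__main__":
    for name, score in rank_by_similarity(compounds["cmpd_a"], compounds):
        print(f"{name}: {score:.2f}")
```

Here `cmpd_a` and `cmpd_b` share three of five distinct features, so they score 0.6, while `cmpd_c` shares none and scores 0. A real screening run would repeat this comparison across hundreds of millions of compounds, which is where the storage and retrieval layer starts to matter.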
Though early in the process, Tetrault says he’s excited that AstraZeneca can use MongoDB to help fight cancer. “I’m enabling the cancer researchers. Our researchers are trying to figure out which drug can be most effective against specific tumor types.” With a greater command of the data, Tetrault says AstraZeneca can pursue links and patterns that it never noticed before. “Maybe this worked for 10% in liver cancer but wow this lung cancer actually has the same biomarker. That’s the kind of question that I would like to ask and that’s why I’m interested in big data technology.”