Oxford Nanopore Technologies & MongoDB: Powering Real-Time Genetic Analysis with Docker, MongoDB, & AWS

Mat Keep


Genetic analysis is entering the mobile age. Earlier this year, the scientific journal Nature published a paper showing how Ebola researchers in Guinea were able to analyse genetic material in hours, rather than the weeks it had previously taken. This increased speed meant doctors could better understand the spread of the disease and then quickly develop strategies to stop it.

The hardware that enabled the genetic analysis is the MinION, from UK-based Oxford Nanopore Technologies. The stapler-sized MinION handles the data-capture side of the analysis, but for the purposes of this article we're interested in data processing and analysis. In particular, we look at how Oxford Nanopore has been able to build a fast, agile and powerful cloud-based platform that has the potential to deliver biological analyses to any scientist, at any time, anywhere in the world.

The applications for this genetic analysis go far beyond the medical field and disease control. Oxford Nanopore is using technologies like MongoDB, Amazon Web Services, and Docker containers in pursuit of its stated goal: "to enable the real-time analysis of any living thing, by any user, in any environment."

A Billionth of a Meter

The MinION does its genetic magic through the use of nanopores. Each nanopore is just a billionth of a meter wide. The MinION threads the genetic material through the nanopores, where tiny differences in the sample register as disruptions in an electrical current. For a more detailed explanation of nanopores, see Oxford Nanopore Technologies' website.

DNA sequencing is often associated only with predictive questions about humans, for example "what is the probability that this person will develop a specific disease?" But human genome research is just one part of the equation, and the portable nature of the MinION means it may be suited to a far more diverse range of questions: Is the soup I'm about to eat safe? What type of disease am I looking at? Where did this pathogen originate? How can we grow more resilient plants? Is this hospital ward clean?

Crucially, these questions need to be answered quickly, and in a range of environments – from the science lab to the middle of the jungle.

Three Billion Bases in the Cloud

The cleverest sequencing tool in the world would be worthless if we were unable to process and understand the data it created. To handle the volume and velocity of data generated by sequencing billions of DNA bases, Oxford Nanopore Technologies built the analysis services offered through Metrichor on powerful software that can scale seamlessly in the cloud.

Richard Carter, Associate Director of Data Integration at Oxford Nanopore, gave a presentation at MongoDB Days, where he noted:

“When we began building Metrichor services, it was clear our data would not fit in the neat rows and columns of a relational database. We needed a database that could look at our complex information in more flexible and dynamic ways. It was a straightforward decision to go with MongoDB. It’s robust, best of breed, and has the data modelling and analytics flexibility we required. We also observed the technology has an incredible community behind it, coupled with extensive documentation and training. All of which enable us to get productive with the technology much faster.”
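Carter's point that the data "would not fit in the neat rows and columns of a relational database" can be illustrated with a minimal sketch. The records below are hypothetical: the field names (`read_id`, `basecall`, `alignment`) and the reference label `EBOV` are assumptions for illustration, not Metrichor's actual schema. They show documents in one collection carrying different optional sub-documents, which a fixed relational table would have to model with nullable columns or extra joined tables. Plain Python dictionaries stand in for MongoDB documents so the example runs without a database server.

```python
# Hypothetical read-level records; field names are illustrative only,
# not Metrichor's actual schema.
reads = [
    # A read with basecalling results but no alignment yet.
    {"read_id": "r1", "channel": 12, "length": 8400,
     "basecall": {"mean_qscore": 9.1}},
    # A read that has also been aligned to a (hypothetical) reference.
    {"read_id": "r2", "channel": 40, "length": 15200,
     "basecall": {"mean_qscore": 11.3},
     "alignment": {"reference": "EBOV", "identity": 0.87}},
    # A raw read that has not been basecalled at all.
    {"read_id": "r3", "channel": 7, "length": 950},
]


def aligned_to(records, reference):
    """Return the IDs of reads whose optional alignment sub-document
    names the given reference; reads without that sub-document are skipped."""
    return [
        r["read_id"]
        for r in records
        if r.get("alignment", {}).get("reference") == reference
    ]


def missing(records, field):
    """Return the IDs of reads that lack a given top-level field."""
    return [r["read_id"] for r in records if field not in r]


print(aligned_to(reads, "EBOV"))
print(missing(reads, "basecall"))
```

In MongoDB itself, the equivalent lookup would be a query filter using dot notation on the embedded document, such as `{"alignment.reference": "EBOV"}`, which matches only documents where that nested field exists and equals the given value.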

DNA data is captured locally by the MinION and then sent to the Amazon Web Services (AWS) cloud, where it is analysed. The results are sent back to the user's laptop or displayed in web reports. This workflow is driven by, and its data stored in, the non-relational database MongoDB. Docker containers are used to package, deploy and run the software across the cloud deployment.

Carter also noted: "The biology and hardware is the real trick, of course, but we needed power and scalability to run cloud based services as we wished."

The team faced other challenges while developing its software. They had a clear technical goal and several possible routes to it, and they needed to keep the focus on the biology throughout. It was essential that they had the freedom to experiment and make significant changes as they went along.

"Happily, MongoDB supports an evolutionary approach to development," explained Carter. "We were spinning up instances and working on the science almost instantly. The database got out of the way."

Carter’s team does not have a database administrator. They have found that MongoDB Cloud Manager is able to provide all the monitoring data needed to keep a deployment healthy. Features like simple, automated deployment across any cloud region, continuous backups, and telemetry visualisations also mean administration doesn’t monopolise the developers’ time.

Giant Ideas

Guinea is just one of many places where researchers are using Oxford Nanopore's data architecture for analysis. NASA, for example, will soon be using the MinION to test biological molecules on the International Space Station.

Regardless of the location, the combination of rigorous science and the power of cloud computing is ushering in a new way of understanding the world.

Read more about MongoDB and its implementation on the AWS cloud platform in MongoDB on AWS: Guidelines and Best Practices.

About the Author - Mat Keep

Mat is director of product and market analysis at MongoDB. He is responsible for building the vision, positioning and content for MongoDB’s products and services, including the analysis of market trends and customer requirements. Prior to MongoDB, Mat was director of product management at Oracle Corp. with responsibility for the MySQL database in web, telecoms, cloud and big data workloads. This followed a series of sales, business development and analyst / programmer positions with both technology vendors and end-user companies.