A Translational Medicine Platform at Sanofi Transcript

Sorry that this session is so empty compared to the great stuff we're going to hear. I've been spending the last day and a half listening to stories about how companies are transforming their business with MongoDB, and I've had the pleasure of working with these two gentlemen from Sanofi over the past few months on their presentation. And I have to say, it's really impactful to understand how new data types and cancer research are coming together and why it's facilitated by MongoDB.

So Mr. David Peyruc and Mr. Erwan David have been working in the R&D department at Sanofi. They've focused on new data types that actually bring together some possibilities in cancer research. So if that's not inspiration about innovation, I don't really know what is. I'm really pleased to actually listen to their final presentation. We'll take questions at the end. And if you would, please, at the end of the session fill out your surveys. And if you have mobile phones, please put them on silent now. Gentlemen, thank you so much for joining us today.

Thank you, Rebecca. OK. So, thank you, Rebecca. So we are extremely proud to be here in New York at the MongoDB World Congress and to have the great opportunity today to present our recent work done in Sanofi research and development.

So life science, including pharma companies, is entering a new era. It's translational medicine. Is it possible today in a big company such as Sanofi to innovate and bring in internally cutting-edge technologies like MongoDB? So we will try to answer this question with the following agenda. So first I will briefly present some figures about the Sanofi company. I will come back to the translational medicine concept and give a global architecture view of what has been put in place in Sanofi research and development. And then after, Erwan will present why we've chosen MongoDB, and the various business cases we are currently addressing with the solution. And finally, benefits for the company.

So Sanofi is a global health care company that is, in fact, engaged in the research, the development, the manufacturing and the marketing of health care products. Sanofi is the third largest pharma company in the world and the second largest in Europe. The 110,000 people who work at Sanofi are distributed across 100 countries. And Sanofi is focused on three main activities. So it's pharmaceuticals, it's vaccines with Sanofi [INAUDIBLE], and [INAUDIBLE] with [? Marion ?].

Sanofi research and development is distributed across 20 research sites and invests 4.7 billion euros per year. The 16,500 employees are currently developing more than 50 molecules and vaccines in the portfolio.

So I will now briefly introduce the team who has worked on this project. So here is Erwan, here today. Erwan is the project manager of our project and our initiative. He has strong technical knowledge in various technical areas. And he has led a lot of scientific projects that have in common the need to manage huge data volumes.

The second person is Florian Chuit, unfortunately not here today. So Florian is our agile methodology guru. He has worked for many different pharma companies, and is currently the Scrum Master of the team.

Finally, me. I'm the global solution leader of the translational medicine platform in Sanofi Aventis research and development. And I'm also the Sanofi representative in a European project called Etrix, which is dealing with the same theme.

So let's go now to the heart of today's business topic. So pharma companies are currently facing many different factors that deeply impact the classical business model of pharma. The first one is strong competition from generic drugs. It's also called the patent cliff. At the same time, we have a lack of innovation in the classical way we are trying to find drugs. And finally, a mutation of the business model that has clearly [INAUDIBLE] of the Blockbuster age.

So in the face of this mutation, at the beginning of the 2000s, the 4P concept was initiated. So the 4P concept is a medicine that is more personalized, more predictive, more preventive, and finally, more participatory. This 4P concept has been translated into translational medicine in pharma companies.

So the foundation of translational medicine is to bridge clinical data and research data. The clinical data and information is mainly trial information and health record information, and now a lot of electronic health record information. Bridging the two, the trial information and the health record information, reveals new evidence.

For the research part, we have to deal with OMICS data. By OMICS, we mean genomics data, proteomics data, or [INAUDIBLE] data, and also fundamental knowledge. And bridging the two generates new hypotheses. Bridging the two, the clinical information and the research information, allows scientists and physicians to practice personalized medicine, and, for instance, to provide better medication, dedicated to a single person or a single sub-population. At the same time, bridging the two also makes it possible to dramatically increase the efficiency of clinical trials.

The OMICS technology and the knowledge we can bring together also allow scientists to not only have the capability to compute on a single biological pathway, for instance, but on the whole metabolome. And those new capabilities, of course, provide better insight for their research. And at the same time, NGS, next-generation sequencing, so the NGS technology, revolutionizes the way to think about preventive medical action.

But, on [INAUDIBLE], we have to make this a reality for patients. So the first challenge translational medicine is facing is the high diversity of objects to connect together. In a single disease program, internally, the scientific teams generally have to manage thousands of files, including files of gigabytes. So it's a very large number of files, of data, to manage, across a high diversity of domains and fields.

At the same time, we have to maintain the full consistency of this data together in the platform. But also to maintain the full traceability of the life cycle of the data and of the usage of the data and files. And finally, we have to put in place a user-friendly curation process in order to annotate these files and data with metadata consistently. And this in order to create and extract knowledge, gain understanding, and be able to cross-analyze each single piece of data.

So this slide represents the translational medicine platform we have put in place in Sanofi research and development. So you can see in the different blocks of this representation the different areas, the different domains of translational medicine. In each of these blocks, we are dealing with the best-in-class technology, or best-in-class software for specific analysis, like molecular profiling or interpretation. All these applications are linked together in order to be able to support the full process of early-stage drug discovery.

The need to gather all this data and all these files together in the platform pushed us to investigate big data solutions in order to put in place this central repository for all this data. So now Erwan will present to you why we've chosen MongoDB to become the technology of this cornerstone piece of the platform.

Thank you, David. So indeed, I'm going to go to the solution. And first of all, why MongoDB. As you understand, we have to store lots of binaries and their related metadata coming from the curation process that David mentioned. And MongoDB, with the GridFS on one hand and the document oriented database on the other hand, perfectly fits this need. So we are happy with that.
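[Editor's sketch] To make the pairing of binaries and metadata concrete, here is a minimal Python sketch of what the document describing one curated file might look like. The field names (`file_id`, `program`, `annotations`) and the helper `make_file_record` are hypothetical and are not Sanofi's actual schema. In a real deployment the bytes would be stored in GridFS and `file_id` would be the identifier GridFS returns; here it is simply passed in so the example runs without a database.

```python
import hashlib
from datetime import datetime, timezone


def make_file_record(file_id, filename, content, program, annotations):
    """Build a metadata document describing one stored binary.

    Hypothetical schema: `file_id` stands in for the id GridFS would
    return after storing `content`.
    """
    return {
        "file_id": file_id,
        "filename": filename,
        "length": len(content),
        "sha256": hashlib.sha256(content).hexdigest(),
        "program": program,
        # Free-form annotations: new keys need no schema migration.
        "annotations": dict(annotations),
        "curated_at": datetime.now(timezone.utc),
    }


record = make_file_record(
    file_id="f-001",
    filename="expression_matrix.csv",
    content=b"gene,sample1\nTP53,4.2\n",
    program="oncology-demo",
    annotations={"genes": ["TP53"], "disease": "coronary syndrome"},
)
print(record["filename"], record["length"], sorted(record["annotations"]))
```

Keeping the annotations in an ordinary document next to a reference to the binary is one plausible reading of "GridFS on one hand and the document-oriented database on the other"; the talk does not show the real layout.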

We need something scalable. Because when you get into large volumes, you can't afford to have a data migration or to renew your hardware each time you grow. So we need something scalable, and MongoDB is scalable. I think everybody knows that now. And we started from day one with a sharded platform. And last but not least, MongoDB is easy. That's also something that has been said a hundred times. It's not only easy to install or to use, but it's also easy to adopt. What we mean by that is that our colleagues are mainly coming from the relational database world. And when it comes to MongoDB, it's not so different from the specification. So it's easy for us to convince them and to run projects on MongoDB thanks to that.

And so by the way, how did we move to MongoDB? It has been a one-year drama that I'm going to tell. And when it comes to new technology, we usually start by contacting academics. And we work with universities. So the young people you see here were trainees at this time. And we started pragmatically by writing a white paper about big data for translational medicine. And on paper, it was clear that MongoDB was the solution.

So we said, OK, let's move forward. And we asked these young guys to install MongoDB on their classroom computers. So they were a bit sceptical, but it worked perfectly. And we could, in the classroom, set up a MongoDB cluster that was quite performant. So we moved on. And the next step was done on the Sanofi infrastructure. And we created what we call a proof of concept. So it was basically a few JavaScript web pages that showed the key functionality of the future software. This was to gain the users' agreement. And we gained it.

I can clearly remember this period. It was the very end of the summer of 2013. Our students graduated, and based on their work, we could launch a project at Sanofi. And so we involved some more people that you see here, so professional developers. And in a few additional months, we could roll out into production a version of our software. A quite basic version, but it was already solving the big pain they have in using, for example, SharePoint, or this kind of stuff, to store a large amount of data, where it's not working. So in one year, from almost nothing to something in production. Again, thanks to MongoDB. And this is great. And pretty unusual in a big company.

So let's go a little bit deeper into the solution. As you understand, we are based on the MongoDB cluster, which is running a number of databases. Some of them are dedicated to GridFS, so binaries. And some others are dedicated to metadata or to related data. On top of this, we have the Solr search engine. And I can answer questions about that, why we brought in Solr. But we need it for the website, mainly.

And our main development, in fact, has been to build what we call a service layer, which is in fact a Java REST API that federates all those complex objects and brings them together in order to easily build software on top of that. And everything on this picture can scale. We can add computers, servers, and it will scale. So of course we need some GUI, because this is for the users. So we have built a website. And I will demo one of the websites we made. We are also happy to integrate with the desktop of our users. So we can mount this as a network drive on Windows or on Linux, which is very convenient. And finally, we can also solve problems that other software has. As you understand from what David presented, it's a large platform with lots of software. And most of it is web-based. And most of it has to deal with files. And web software is not good at dealing with files, because a web server is usually [INAUDIBLE] VM.

So by using our API, they can get the persistency of the storage, and they can benefit from all the other tools we have on the platform. So we are also solving some other problems on the platform. And because you might not trust me, I will show a quick video. I hope that I'm able to load it. Yes. I'm looking for the mouse. So it's a video. I don't take too much risk.

So this is our software. You see we have a tree view. It can be dynamically rendered. We are looking on the program here. And you see at the moment, we have almost no metadata. So nothing [INAUDIBLE]. So let's fix that. So we go to a sub folder, where we have quite a large amount of file that are stored here. And we'll tag them, we'll annotate them with a right click.

And here, we have the annotation page that I will show. And basically, we'll start by adding metadata. So these are coming from controlled vocabularies, as you would expect. So a small dictionary for the first one, and a bigger dictionary-- so this is about genes. So we have 50,000 genes. So we need an auto-completion mechanism. So we'll add some genes to this file. And the [INAUDIBLE] is now, is because we can extend basically. And we can have almost all the functionality of Excel in this spreadsheet, in JavaScript.

And when we save the metadata-- of course, they are saved at the file level-- but they are also aggregated at the program level. So this is really good. And how is MongoDB helping us here? Well basically, first of all, for the metadata. Thanks to the flexible schema of MongoDB, we can add new metadata just by configuration. So disease, population, whatever the user wants. And the second point is, of course users don't want to go to each and every file to see the metadata. So we need to aggregate them at the higher level. And we are dynamically generating aggregation framework queries in order to aggregate the data at the upper level. And we can configure which data aggregates to which level. So MongoDB helps us a lot in this situation.
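[Editor's sketch] The idea of generating aggregation framework queries from configuration can be illustrated with a small Python function that builds a pipeline as plain dictionaries. This is only an illustration under assumptions: the collection layout (`program`, `annotations.<field>`), the `ROLLUP_CONFIG` mapping and the function name are hypothetical, and the talk does not show the real generator. The stages used (`$match`, `$unwind`, `$group`, `$sort`) are standard aggregation stages.

```python
def build_rollup_pipeline(program, field):
    """Return a pipeline that counts, for one program, how many files carry
    each value of one annotation field."""
    path = "annotations." + field
    return [
        # Keep only the files that belong to the program being summarised.
        {"$match": {"program": program}},
        # One output document per annotation value (current MongoDB versions
        # also accept a scalar here and treat it as a one-element array).
        {"$unwind": "$" + path},
        # Count the files per distinct value.
        {"$group": {"_id": "$" + path, "files": {"$sum": 1}}},
        # Most frequent values first, ties broken by value.
        {"$sort": {"files": -1, "_id": 1}},
    ]


# Hypothetical configuration: which annotation fields roll up to which level.
ROLLUP_CONFIG = {"program": ["genes", "disease"]}

pipelines = {
    field: build_rollup_pipeline("oncology-demo", field)
    for field in ROLLUP_CONFIG["program"]
}
print(pipelines["genes"])
```

With PyMongo, each generated list could be passed to `collection.aggregate(pipeline)`. Adding a new metadata field to the roll-up then means adding one entry to the configuration rather than writing a new query.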

Second example use case of our platform is what we call the 360 Data Explorer. When it comes to have lots of data, you want to find new insight into this data. And what we can do for you, is to re-organize the data on purpose. So for example, if you are an early discovery researcher, you are really interested in disease, of course, that you want to cure. And usually, you want to target some cell receptor for your drugs.

So basically, what we can do, we can reorganize the data that way. And in fact, we can organize data in a way that the disease will be the top level of folders. So you will see all your diseases as a first level. And when you click on a specific disease-- so I zoom in for the example-- if you click on the coronary syndrome, cardiovascular problems, you will see all the receptors that are involved in this disease that we have in our documents. And if I click on a specific receptor, I can then see that we have seven documents about the coronary syndrome and the [? peroxism ?] receptor. So this is really reorganizing the data on the fly and for the purpose of the people. So a second example, just to fix ideas.

If you are interested in health activity around the world, you can sort the data by geographic zone. So for example, we take the United States, of course, and the [? Child Health Services ?], and we know that we have 16 documents in the database about that.

So what is this? In fact, it is exactly the same data. So this is public data, this is PubMed that we have automatically indexed that way. And we can expose the data differently, depending on what the user wants to do. And this is what we call faceted navigation. Just like the faceted search from the search engine. And we have very, very good feedback about this, because people just rediscover their data with this feature. And this is really good.
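[Editor's sketch] Faceted navigation as described here, the same documents exposed under different folder hierarchies, can be sketched in a few lines of Python. This in-memory version is an assumption-laden illustration, not the platform's implementation: the sample documents, the field names and the receptor labels `R1` and `R2` are made up.

```python
from collections import defaultdict


def facet_tree(documents, facet_order):
    """Group documents into nested folders, one level per facet.

    A document with several values for a facet appears under each value.
    """
    if not facet_order:
        return [doc["title"] for doc in documents]
    facet, rest = facet_order[0], facet_order[1:]
    buckets = defaultdict(list)
    for doc in documents:
        for value in doc.get(facet, []):
            buckets[value].append(doc)
    return {value: facet_tree(docs, rest) for value, docs in sorted(buckets.items())}


# Made-up sample data standing in for indexed publications.
docs = [
    {"title": "paper-1", "disease": ["coronary syndrome"], "receptor": ["R1"],
     "region": ["United States"]},
    {"title": "paper-2", "disease": ["coronary syndrome"], "receptor": ["R2"],
     "region": ["France"]},
    {"title": "paper-3", "disease": ["diabetes"], "receptor": ["R1"],
     "region": ["United States"]},
]

# Same data, two different hierarchies.
by_disease = facet_tree(docs, ["disease", "receptor"])
by_region = facet_tree(docs, ["region"])
print(by_disease)
print(by_region)
```

The only thing that changes between the two views is the order of facets passed in, which is the sense in which the data is reorganized on the fly.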

So of course such a project came with some challenges. We understand that with those new technologies, usually it's not one technology but at least two. Maybe one day Hadoop and stuff like that. And we need to keep them in sync. And at the moment, we are using Mongo Connector, which can be found on GitHub. It works. But it does not work so fast. And so when we have high throughput on the platform, the search engine is quite far behind. So we are working with MongoDB of course, but also with other customers, since MongoDB connects people here, to come up with a more scalable and more powerful solution to manage that.

The second point is that our IT guys, our infrastructure guys that manage the data center-- MongoDB is not a standard for them. So it means that we have to support MongoDB from the project team. And well, it's not a big problem, but we lose time. So basically, we discuss with them. And now, thanks to MongoDB-- because again, it's easy to adopt and to explain to a DBA, but also because it's not anymore an exotic product. I think we cannot say that. They accept to have MongoDB as a standard now, and we will now transition the support of MongoDB to the infrastructure team. So we are pretty happy with that.

We'll go quickly to the next step. Because we want to integrate more and more clinical data that are really messy-- you can't imagine-- we are integrating tools like OpenRefine, that you might know if you work with messy data, which is great to transform your data on the fly, and [? will ?] directly transform data on the platform, which will be very nice.

Next, of course, we want to scale up. At the moment, we have one cluster in the Boston area for our [INAUDIBLE] department. We plan to set up next year a second cluster in Europe, and more and more departments. And for that, we want to bring more collaborative features, in order to create more awareness on the platform. So people need to know if there is a new document somewhere. So we'll bring this kind of nice feature and also improve the permissions, which are not so good at the moment. And finally, we will transition the support to the infrastructure team, of course.

So as a conclusion, what are the benefits? Why did we do that? First of all, to gain time. This is always useful, to have some time. And indeed, the scientists gain time through this curation process, this easiness of the JavaScript, what they can do, configuration and so on. This is really good. And of course, we gain consistency around the data. So it's really important. But the second point is even more important. It's really this awareness of existing data. We create this kind of serendipity, you know? You really create new discoveries thanks to looking at the data differently. And this is just what researchers want to do. It's also easier to integrate external data, because if you type your data consistently, you can compare them. So obviously.

And finally, for the IT guys that we are, MongoDB's also great. So here, also, I will say something that has been said 100 times. It speeds up the development. And we clearly see the difference. We used to speak about database and code, and we created views in between, et cetera. You know that. Now, it's not the case anymore. Our developers are trained on MongoDB, and they are able to discuss the entire design with us. And we share the same kind of object from the database up to the [INAUDIBLE].

Performance, flexibility. We saw this morning, with the locks going to the document level, the performance will be great. This is why we have multiple databases. Maybe we can revisit that. And finally, we are really comfortable using MongoDB for the whole project. With the documentation, the support, the training, we really feel comfortable and happy with Mongo. And being at such a meeting, we are even more happy. So thanks, Mongo. And thanks for your attention. [APPLAUSE]

Thank you so much for that. Marrying together clinical data and research data, amazing. You actually shared, in your presentation, that you had rolled this out in Boston, your first trial area. Can you tell us about how the users of the systems are feeling? What's their experience?

So in fact, it's really good. We were in Boston last week to take the temperature, to see that everything was good. And yeah, we had a very good time. We were very well welcomed. Of course, we always have things to improve. Because when you create something new, you have lots of new demands. And I hope that this is a good sign. So we'll move on. But so far, the feedback is good, and it's refreshing. And also, you know those people are contacted by lots of big companies that bring products. And they are happy to see that internally we can also be a challenger to these big companies.

Excellent. So, I open it up for questions. All right. Go ahead. Please make sure you state your name when you're--

Thank you. I'm [? Yedi. ?] I have a question to Erwan. In your architecture, I see that you have Solr and you have MongoDB. So the REST API that you have, is it talking to Solr? Is it talking to MongoDB? For which data is it connecting to--?

It's talking to both. In fact, everything that is related to navigation on the website, so browsing, et cetera, is done using the MongoDB database as a back end. But in fact, where you really need Solr is when it comes to full-text search. Because we index the content of the files, and we put that into Solr. And we don't use the MongoDB full-text search for that, because we want to bring the faceted search, so sorting by category. And if you want to do that with Mongo, you have to run lots of queries, whereas it's one Solr query. So just for this reason, we keep Solr.
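[Editor's sketch] The "one Solr query" point can be illustrated by building the query string for a single faceted search. The parameters `q`, `rows`, `facet`, `facet.field` and `facet.mincount` are standard Solr request parameters; the field names `disease` and `receptor` and the helper name are hypothetical, and the talk does not show the queries the platform actually sends.

```python
from urllib.parse import urlencode


def solr_facet_query(text, facet_fields, rows=10):
    """Build the query string for one full-text search that also returns
    per-category counts for every requested facet field."""
    params = [
        ("q", text),
        ("rows", rows),
        ("facet", "true"),
        # Hide categories that match no document.
        ("facet.mincount", 1),
    ]
    # facet.field may be repeated, once per category to count.
    params += [("facet.field", field) for field in facet_fields]
    return urlencode(params)


query = solr_facet_query("coronary syndrome", ["disease", "receptor"])
print(query)
# q=coronary+syndrome&rows=10&facet=true&facet.mincount=1&facet.field=disease&facet.field=receptor
```

The resulting string would be appended to a Solr core's select handler URL. One request returns both the matching documents and the counts per category, whereas a plain database approach would need a separate count or aggregation per facet.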

Thank you.

My name is Wes [? Solestin ?]. So at the moment, are you using MongoDB and the architecture mostly as enterprise content management? Or do you plan on doing analytics at some point?

At the moment, it's more content management. The only thing we do as analytics is some NLP. When you saw the open data that we sort for different purposes, it's indexed by NLP code. But we are not really using MongoDB for that. So it's more a store and less analytics so far. But really, analytics is the next stage.

Because of the translational medicine platform, we have different analytics applications in the different pieces of the platform. So here, MongoDB's just for the storage. But in the next step, it will be linked more and more with different applications, like translational medicine analytics applications for instance, or other applications. The goal after that is clearly to have this link, and also to create this consistency. To be sure that the consistency between our applications is maintained, and maintained by the semantics we will put in place in Mongo.

More questions?

Hello. My name is George [? Haria ?], Bank of Georgia. So was this a new project? Did you start completely from scratch with MongoDB, or did you kind of port it from a relational database [INAUDIBLE] data?

No we-- sorry.

How did you go? I mean, if it was porting?

We had the chance to start from a blank page. So it was really nice. And now, in fact, we are migrating data from SharePoint. So it's not really a relational database. And we are also migrating the metadata. But there were not a lot of places where we could store metadata, in the end. So we don't have a lot to migrate. So the users are really currently doing this job, what we call the curation, of putting the metadata in place within our software.

Any more questions? Oh yeah.

What's the user population for this?

What's the user population for this? Yeah. [INAUDIBLE] and where's it going to ramp to? Where are you at now, and where will it go to?

So currently, as mentioned, we are putting in place more of a platform than one application. So currently we have one application. A second application will be the storage for the [INAUDIBLE]. So for the first application, for oncology, the population currently is around 20%. And this should be the entire oncology team, and people involved in the research team. So it will be 100%, 200% for oncology.

At the same time, we are in discussion with other units in Sanofi. So the Genzyme unit. And they should come with other business cases. And we will implement the same kind of application in order to gather all that information. So this means for Genzyme perhaps 500 more people. We are also in contact with the diabetes unit in Frankfurt. So it will be 200 more people, and so on.

So in fact, we are more and more in contact with the different units. Because of course, due to the subject of translational medicine, the goal is to gather the maximum of information internally into the same platform and the same database.

Maybe to give a different answer, the kind of people using it-- it's not so much clinicians, but it's more early researchers. So people that are at the early stage of discovery. OK? But we bring clinical data, we extract them from clinical systems, and bring them to the [INAUDIBLE] so that they can be used as much as possible. Of course, it comes with lots of policy restrictions, as you would imagine.

Thank you. OK. Thank you very much for that very informational and exciting presentation, I appreciate it. I'll ask you to please fill out the surveys at the bottom of the MongoDB app. In 10 more minutes, we'll hear more innovation and data. We'll learn specifically about how to fall in love with eHarmony. So come back in 10 minutes, I look forward to seeing you. Thank you very much.