Around $4 trillion is invested globally every year in medical and scientific research. Elsevier publishes 17% of the content and discoveries generated from that research, providing visibility to much more through products such as Scopus. MongoDB is at the core of the Elsevier cloud-based platform, enabling the company to apply software and analytics that turn content into actionable knowledge and new insights for its customers.
We met with Kim Baddeley, Application Architect in Business Technology Solutions at Elsevier, to learn more.
Can you start by telling us a little bit about your company?
Elsevier is a global information analytics business that helps institutions and professionals advance healthcare and open science, and improve performance for the benefit of humanity. Elsevier provides digital solutions and tools for strategic research management, R&D performance, clinical decision support, and professional education, including ScienceDirect, Scopus, SciVal, ClinicalKey, and Sherpath. Elsevier publishes over 2,500 digitized journals, including The Lancet and Cell, more than 35,000 e-book titles, and many iconic reference works, including Gray's Anatomy. We are part of RELX Group, a global provider of information and analytics for professionals and business customers across multiple industries.
Can you describe how you are using MongoDB?
MongoDB is at the heart of managing our content and digital assets, powering two critical parts of the infrastructure at Elsevier:
Virtual Total Warehouse (VTW) is the central hub for our content, using MongoDB to manage the JSON-serialized metadata for each piece of research, including the title, author, date, abstract, version numbers, distribution rights, and more. Our revenue-generating publishing apps use VTW to access the appropriate research.
Unified Cloud Service (UCS) sits alongside VTW, storing the physical binary content assets (i.e. PDFs, Word documents, HTML, notebooks) in Amazon Web Services (AWS) S3 buckets, with MongoDB managing metadata for the asset, including its title, its indexed location in S3, and file size.
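To make the VTW/UCS split concrete, here is a minimal sketch of what the two kinds of metadata documents might look like. The field names and values are illustrative assumptions, not Elsevier's actual schema; they simply mirror the fields named above (title, author, date, version, rights, and the asset's S3 location and size).

```python
# Hypothetical shape of a VTW content-metadata document.
# Field names are assumptions based on the description above.
vtw_document = {
    "title": "Example Research Article",
    "authors": ["A. Author", "B. Author"],
    "date": "2017-06-01",
    "abstract": "A short abstract of the article.",
    "version": 3,
    "distribution_rights": "subscription",
}

# Hypothetical shape of a UCS asset-metadata document, pointing at the
# binary asset held in an S3 bucket.
ucs_document = {
    "title": "Example Research Article (PDF)",
    "content_type": "application/pdf",
    "s3_bucket": "example-assets-bucket",   # assumed bucket name
    "s3_key": "assets/2017/06/example-article.pdf",
    "file_size_bytes": 482133,
}
```

In a real deployment each dict would be a BSON document in its own MongoDB collection; they are shown here as plain Python dicts so the shapes are easy to compare.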
High-level Elsevier architecture: MongoDB deployed across two AWS regions
Our platforms store 1.2 billion physical assets, represented as 200 million MongoDB documents, before replication. We serve an average of 50 million API calls per day, reaching 100 million calls during our peak publishing cycles.
Did you use MongoDB from the start, or something else?
We initially built out on a Key-Value NoSQL database, using it to store indexes into our assets persisted in S3. The content’s metadata was also stored in S3 along with the binary asset itself. We found this approach had a number of limitations. For one, it was expensive. It was also next to impossible to run more complex business queries for analytics against the content.
We decided to explore alternatives, and ran a Proof of Concept (PoC) on MongoDB, which proved itself in all of our tests, so we made the switch.
What encouraged you to consider MongoDB?
Our internal data model is a JSON-LD schema, so MongoDB seemed an ideal fit: it offers native JSON document storage, a rich query language, and a distributed, scale-out design.
We use agile methodologies and DevOps to build and run our applications. With MongoDB, our developers can move much faster, creating new services without first having to pre-define a database schema.
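The schema flexibility described here can be sketched with an in-memory stand-in for a collection: documents with different fields coexist, so a new service can add a field without a migration. The `find` helper is a hypothetical, minimal equality-match query, not MongoDB's API.

```python
# An in-memory list standing in for a MongoDB collection. The second
# document carries a new field ("open_access") that no schema change
# was needed to introduce -- the point made in the paragraph above.
collection = [
    {"_id": 1, "title": "Article A", "version": 1},
    {"_id": 2, "title": "Article B", "version": 2, "open_access": True},
]

def find(coll, **filters):
    """Minimal stand-in for a MongoDB equality-match query filter."""
    return [doc for doc in coll
            if all(doc.get(k) == v for k, v in filters.items())]

# Querying on the newly added field works immediately; documents
# lacking the field simply don't match.
open_access_docs = find(collection, open_access=True)
```

With a real driver such as PyMongo the equivalent call would be a filter document passed to `collection.find()`; the dict-based sketch above just keeps the example runnable without a server.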
Can you talk us through the migration process?
Migration was completed in a multi-stage approach:
First, we copied the data from the Key-Value datastore to MongoDB.
The application then wrote to both databases while continuing to read from the existing Key-Value store, preserving existing application functionality.
After one month, we performed a reconciliation between the databases to ensure they were in sync, then moved all reads to MongoDB while still writing to the Key-Value database. This approach enabled us to switch back if required.
After another month, we redirected all traffic to MongoDB, turned off writes to the existing store, and dropped it.
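The dual-write pattern behind these steps can be sketched as follows. The class below is a hypothetical illustration using in-memory dicts as stand-ins for the two stores; the `read_from_mongo` flag models the cut-over of reads after reconciliation.

```python
class DualWriteStore:
    """Sketch of the dual-write migration pattern: every write goes to
    both stores, while reads come from whichever store is primary."""

    def __init__(self):
        self.kv_store = {}       # stand-in for the existing Key-Value store
        self.mongo_store = {}    # stand-in for MongoDB
        self.read_from_mongo = False  # flipped after reconciliation succeeds

    def write(self, key, doc):
        # Stage 2: writes go to both databases.
        self.kv_store[key] = doc
        self.mongo_store[key] = doc

    def read(self, key):
        # Stage 2 reads from the Key-Value store; stage 3 reads from MongoDB.
        source = self.mongo_store if self.read_from_mongo else self.kv_store
        return source.get(key)

    def reconcile(self):
        # Stage 3 gate: confirm both stores agree before switching reads.
        return self.kv_store == self.mongo_store

# Usage: write during the dual-write window, reconcile, then cut reads over.
store = DualWriteStore()
store.write("doc-1", {"title": "Example Research Article", "version": 3})
assert store.reconcile()
store.read_from_mongo = True
```

Because reads can be pointed back at the Key-Value store by flipping one flag, the pattern preserves the rollback path the answer describes until the old store is finally dropped.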