The National Archives

The National Archives is one of the world’s largest record repositories, holding records spanning 1,000 years, from the Domesday Book and the Magna Carta to official reports on the Titanic and modern government papers. As the official archive of the United Kingdom government, it provides a searchable catalogue of all public records. It is also an authority on information management, offering detailed guidance to government departments and the public sector on the care of historical archives.

The National Archives’ catalogue is more than 10 years old, and comprises a collection of datasets stored in relational databases, web applications for user access, and back office systems for data entry and maintenance. It provides a free online search facility for the collection, but it has limitations. Searching the Catalogue successfully requires a certain level of knowledge, whether of the subject matter or of the record series itself. Additionally, the volume of data in the Catalogue has doubled since it was first launched in 1998; data volumes handled by the system are expected to reach hundreds of terabytes by 2014 and petabyte scale by 2020. Many data storage technologies cannot deliver the required scalability, performance, and maintainability while keeping support costs low.

In 2010 The National Archives decided to move to a Service Oriented Architecture and spent several months designing a Business Information Architecture framework. This framework views data as the language of integration and insists on central management of data via a federation of information services. The result is a growing set of business objects that can be managed as corporate data assets. Implementing a system based on this framework required a new approach to the data store. The National Archives chose MongoDB because it offers the general advantages the project needed: fast and easy scalability, efficient filing, and complex metadata storage.
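To make "complex metadata storage" concrete, here is a minimal sketch in Python of how catalogue entries with differing, nested metadata might be held as documents. It is an illustration only, not The National Archives' actual schema: every field name, reference code, and value is hypothetical, and plain dictionaries stand in for stored documents so the example runs without a database. The dotted-path lookup mirrors the dot notation MongoDB uses to query fields inside embedded documents.

```python
# Hypothetical catalogue records. All field names and values are invented
# for illustration and do not reflect the real Discovery data model.
records = [
    {
        "reference": "EX 1",
        "title": "Example series",
        "level": "series",
        "parent": None,
        "metadata": {"covering_dates": {"from": 1900, "to": 1910}},
    },
    {
        "reference": "EX 1/1",
        "title": "Example piece",
        "level": "piece",
        "parent": "EX 1",
        # This record carries an extra nested block that the series lacks;
        # documents in the same collection need not share one fixed shape.
        "metadata": {
            "covering_dates": {"from": 1900, "to": 1901},
            "physical": {"material": "parchment", "folios": 10},
        },
    },
]


def get_path(doc, dotted_path, default=None):
    """Follow a dotted path such as 'metadata.physical.material'."""
    current = doc
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


def find(docs, dotted_path, value):
    """Return the documents whose value at dotted_path equals value."""
    return [d for d in docs if get_path(d, dotted_path) == value]


def children_of(docs, parent_reference):
    """Return the references of records directly below the given parent."""
    return [d["reference"] for d in docs if d["parent"] == parent_reference]


if __name__ == "__main__":
    print(find(records, "metadata.physical.material", "parchment"))
    print(children_of(records, "EX 1"))
```

In a relational design, the optional "physical" block would typically need extra nullable columns or a separate table; in a document model it simply appears on the records that have it.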

A beta version of the new catalogue, Discovery (nationalarchives.gov.uk/discovery), based on MongoDB, was launched in April 2011 and has been in full production since November 2012.

Deployment

  • Stack: Microsoft .NET, ASP.NET MVC, WCF
  • OS: Windows
  • Programming Language: C#
  • MongoDB Drivers: C#, plus Perl drivers for projects using Perl
  • NetApp storage and Autonomy search
  • Monitoring by MongoDB Management Service (MMS)
  • Database size: 117 GB and growing; eventually petabytes

More Information

For more, see Aleks Drozdov's presentation at MongoUK 2011: From SQL Server to MongoDB