High school students only have to worry about one transcript: their own. Pearson, a multi-billion-dollar learning company that operates in over 70 countries and employs some 36,000 people, faces a much bigger transcript management problem. Pearson Education manages the transcripts of over 14 million students from more than 25,000 institutions, and it allows member institutions of its National Transcript Center (NTC) to securely send records and transcripts to any of more than 137,000 academic institutions, not to mention employers, licensure agencies, and scholarship organizations.
To manage this big data problem, Pearson turned to MongoDB as the underlying database for its National Transcript Center.
Pearson’s National Transcript Center isn’t merely a data store for student transcripts. Pearson stores student data and also transforms it from one standard format to another, including PESC High School Transcript XML, PESC College Transcript XML, SPEEDE EDI, SIF Student Record Exchange, and others. Pearson also generates PDF copies of a student’s records, and provides print copies when electronic delivery is not available.
The impetus to use MongoDB was a request to archive student data at the end of each year rather than delete it. If a student had graduated, why keep her records around? As it turned out, there were plenty of reasons, including the potential need to transfer records between higher education institutions or on to employers.
But how best to store and manage this student data?
Pearson had been using an open-source relational database (RDBMS) to store the student records, but it ran into performance problems that would only compound each year. Moving each year's worth of student records into a separate table, and then sharding again and again as the years passed, would have made performance even worse.
So Pearson turned to a key-value NoSQL database. Unfortunately, this, too, posed problems. Pearson had no idea what a student record would look like in the future, so it needed a dynamic schema; the company did not want to keep creating new tables as fields changed.
Another problem was that the key-value store's filtering mechanism was hard to work with. Pearson relies on very complicated queries that search several fields at the same time, and marshaling all of that query data through a key-value database proved too difficult.
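For contrast, a document database lets a single filter constrain several fields at once. The sketch below builds such a filter as a plain Python dict in MongoDB's query language. The field names (`state`, `district_id`, `graduation_year`, `status`) are hypothetical and are not Pearson's actual schema; only the operators `$gte`, `$lte`, and `$in` are real MongoDB query operators.

```python
from typing import Any, Optional


def build_transcript_filter(
    state: str,
    district_id: Optional[str] = None,
    min_grad_year: Optional[int] = None,
    max_grad_year: Optional[int] = None,
    statuses: Optional[list[str]] = None,
) -> dict[str, Any]:
    """Build a MongoDB filter document that matches several fields at once.

    All field names are hypothetical; they only illustrate the shape of a
    multi-field query.
    """
    query: dict[str, Any] = {"state": state}  # simple equality match
    if district_id is not None:
        query["district_id"] = district_id
    # Both bounds of a range on one field live in the same sub-document.
    year_range: dict[str, int] = {}
    if min_grad_year is not None:
        year_range["$gte"] = min_grad_year
    if max_grad_year is not None:
        year_range["$lte"] = max_grad_year
    if year_range:
        query["graduation_year"] = year_range
    # Set membership: the record matches if status is any listed value.
    if statuses:
        query["status"] = {"$in": list(statuses)}
    return query
```

Top-level fields in one MongoDB filter document are combined with an implicit AND, so the resulting dict could be passed as-is to a driver call such as PyMongo's `collection.find(...)` without any hand-written marshaling.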
At this point, Pearson decided to give MongoDB a try.
Pearson’s development team immediately appreciated the ease of working with MongoDB’s flexible, dynamic data model. But it was perhaps MongoDB’s query mechanism that sold the team on the NoSQL database. Pearson’s existing queries were written as Hibernate criteria calls, which had let the team avoid building SQL queries by hand, and that work mapped directly to MongoDB queries, saving Pearson time and trouble.
Other benefits became apparent over time. With its original RDBMS approach, Pearson would have been forced to search gigantic tables when querying student records. With MongoDB, if a namespace (a database and collection) starts to hold too much data, Pearson can easily shard it, for example by district, so that a search covers a single district rather than an entire state.
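Sharding a namespace by district can be sketched as follows, assuming a hypothetical namespace `ntc.transcripts` and a hypothetical compound shard key on `state` and `district_id`. The first function builds the document for MongoDB's real `shardCollection` admin command. The second is a simplified model of the routing rule: `mongos` can send a query to a subset of shards only when the filter includes the shard key or a prefix of a compound shard key; otherwise it broadcasts to every shard.

```python
from typing import Any


def shard_collection_command(namespace: str, key_fields: list[str]) -> dict[str, Any]:
    """Build the document for MongoDB's shardCollection admin command.

    The command name must be the first key (Python dicts keep insertion
    order). A value of 1 requests a ranged, ascending shard key.
    """
    return {"shardCollection": namespace, "key": {field: 1 for field in key_fields}}


def shard_key_prefix_length(query: dict[str, Any], key_fields: list[str]) -> int:
    """Count how many leading shard-key fields the filter constrains.

    0 means mongos must broadcast to all shards; a larger value means the
    query can be routed only to chunks covering that key prefix. This is a
    simplification: it ignores operator details inside each field.
    """
    count = 0
    for field in key_fields:
        if field not in query:
            break
        count += 1
    return count
```

With PyMongo, the command document could be issued through `client.admin.command(...)`. Under this key, a filter on `state` and `district_id` constrains the full key and is narrowed to one district's chunks, whereas a filter on `district_id` alone constrains no prefix and fans out to every shard.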
In addition, instead of storing student files as blobs, as it did with the RDBMS, Pearson uses MongoDB’s GridFS, which keeps files and their metadata automatically synced and deployed across a number of systems and facilities.
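GridFS stores each file as one document in a files collection plus a series of fixed-size chunk documents in a chunks collection. Because both are ordinary MongoDB collections, they are replicated and sharded like any other data, which is what keeps files and metadata in sync across systems. The pure-Python sketch below only models that layout: it uses `uuid` hex strings as a stand-in for the ObjectIds a real driver generates, and the `student_id` metadata key in the usage note is hypothetical. In practice, PyMongo's `gridfs` package (for example `GridFS.put`) does this work for you.

```python
import uuid
from datetime import datetime, timezone
from typing import Any, Optional

DEFAULT_CHUNK_SIZE = 255 * 1024  # GridFS drivers default to 255 KiB chunks


def to_gridfs_documents(
    data: bytes,
    filename: str,
    metadata: Optional[dict[str, Any]] = None,
    chunk_size: int = DEFAULT_CHUNK_SIZE,
) -> tuple[dict[str, Any], list[dict[str, Any]]]:
    """Split a file into one files-collection doc and N chunks-collection docs."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    file_id = uuid.uuid4().hex  # stand-in for a driver-generated ObjectId
    files_doc = {
        "_id": file_id,
        "length": len(data),
        "chunkSize": chunk_size,
        "uploadDate": datetime.now(timezone.utc),
        "filename": filename,
        "metadata": metadata or {},  # arbitrary user metadata travels with the file
    }
    # Each chunk records its parent file and its 0-based sequence number n.
    chunks = [
        {"_id": uuid.uuid4().hex, "files_id": file_id, "n": i,
         "data": data[offset:offset + chunk_size]}
        for i, offset in enumerate(range(0, len(data), chunk_size))
    ]
    return files_doc, chunks


def reassemble(files_doc: dict[str, Any], chunks: list[dict[str, Any]]) -> bytes:
    """Rebuild the file by ordering its chunks on n and concatenating them."""
    own = [c for c in chunks if c["files_id"] == files_doc["_id"]]
    data = b"".join(c["data"] for c in sorted(own, key=lambda c: c["n"]))
    if len(data) != files_doc["length"]:
        raise ValueError("missing or truncated chunks")
    return data
```

For example, `to_gridfs_documents(pdf_bytes, "transcript.pdf", {"student_id": "S1"})` would yield a single files document carrying the metadata and as many chunk documents as the file size requires; a zero-length file yields no chunks at all.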
For students looking to get into a good college or employer, their transcript is their passport. By using MongoDB, Pearson has been able to boost performance for its end-users, all while improving ease of use and productivity for its developers.