A Mobile-First, Cloud-First Stack at Pearson Transcript

[INAUDIBLE] going to feel all alone if we're all spread out. Feel free to come together. Thank you for everyone who's hung with me through the business track. We have three more great presentations. I promise it's worth the time and effort to hear how companies, organizations, businesses are transforming the way they do business, think different, act differently, using MongoDB.

So we heard some great stories today. We heard about how to fall in love using MongoDB by signing up for eHarmony. We heard about cancer research, how its data between clinical and research data is coming together. We've heard some really great stories, and we're going to continue with Pearson Education. Probably all of you at some point have touched a book that's been published by Pearson. (WHISPERING) Is that right?

There are books all over the world by Pearson. And once you know the name, you actually see it all over the place, particularly if you have anyone, kids, yourself, doing continuing education. And so as they expand around the world with their education books, they're thinking about mobile. How do we move out of books into the mobile space?

We're very lucky today and very appreciative to hear another set of stories from the CTO, Mr. Aref Martin. He is an industry expert, been working in IT many years. He's been the CTO of TomTom. So someone said MapQuest earlier today-- now TomTom, the modern version of MapQuest.

He's been the CTO of RealNetworks. So he's covered different industries. And he's now currently the CTO of Pearson Education. He's here to share a story. Please remember to turn your mobile phones on silent. Remember, there's a survey for each session that you've been in. If you could fill it out, that would be really great. What do you win if you win the prize for doing the survey? Someone tell me.

An Xbox.

An Xbox, all right, somebody's been listening, great. OK, thank you for that. We'll have questions at the end. Mr. Aref Martin, thank you for sharing your story.

Thank you. Do I need to turn this on? No, I guess you can all hear me, right?

We can hear you.

Great, perfect. Good afternoon. So she gave up a little bit of my presentation. Because I was going to ask how many people really know Pearson. For the educated people amongst us, you've probably all been customers at some point in time, either for yourselves or for family members that are attending K through 20 programs in primarily North America and Western Europe. And other things, other little known facts about Pearson-- everybody knows Financial Times. Financial Times is a Pearson company. Everybody knows The Economist. The Economist is a Pearson entity.

There are many other sets of activities. At one point in time, we were a very diversified company. But we've now consolidated pretty much on publishing, going digital and online education over the years, and have a focus that is driven by lifelong learning, so K-12, higher education, and professional education as people go through their graduations and go into their professional life, and also driven by other activities.

But our motto, as you see on the screen, is "Always Learning." And it's a pretty interesting motto. Because it never stops. Not necessarily our mission-- our mission is to educate the world. We have a vision that we want to have a billion students that are kind of on a regular basis touching our content and are getting educated across the globe-- kind of the official vision.

But my vision, a little bit personally, a little bit of a tangent to reaching out to a billion people to educate them, is really to increase the pace of learning. So if it's taking you a year to learn a subject, if it's taking you a semester to learn a subject, how can you reduce it down to three days, right? Because if you can actually do that, then you're touching the holy grail of education, right? Because that's what a lot of people are after. Education is an expensive proposition. Everybody wants it.

It's an expensive proposition. But if you can kind of reduce the length of time that you spend on educating yourself on a lot of different subjects, you can learn a lot more in a shorter amount of time. You don't have to spend as much on it. And be able to really educate the world at a much more rapid pace is kind of my personal view of what really needs to happen in this industry.

Education as a whole industry is kind of a slow industry to begin with, right? And there are many reasons behind this. The universities in North America have kind of set the pace. They have defined the definition of what higher education means. The school systems in North America, K-12 school systems, have pretty much defined it for the rest of the world as to how you go about doing this. There are slight differences from country to country. But really increasing the pace of learning will kind of be the disruptive force that will motivate a lot of things.

So "Always Learning" is basically the mission, the motto that we have. Mobile first, cloud first technology stack at Pearson, and the introduction of NoSQL and a MongoDB initiative is something that I would like to explain to you a little bit today. A lot had to happen before we kind of get to this stage. And I think that our story is kind of interesting for those who are kind of thinking about this or are half way in the middle of implementation. And there's some lessons to be learned. At least for myself, there were plenty of lessons to be learned.

So introduction to Pearson-- I talked about who we are, $9.5 billion in revenue, half of it coming from education and growing fast. Primarily, our market has been limited to North America and Western European countries. But going forward, our revenue picture is changing rapidly. We will still grow in our current markets. But what needs to happen for us to have sustainable revenue and continue to have double digit growth figures associated with our business is to ensure that we go to the emerging markets.

And that kind of introduces an interesting dynamic for us, right? The emerging markets are very different. They require different attention here. To go to the emerging markets, you have to worry about a few things. Many of the emerging markets that you want to go to, one of the primary modes of consumption of content is the mobile phone. The other one is that you can't afford to put a data center in every country around the world. So you have to be cloud-based.

You have to be running on a mobile. Many of the countries that you want to go to are very bandwidth hungry countries. Therefore, your content has to play offline and needs to be accessible when the consumer does not have access to bandwidth, and introduces a set of really interesting technology issues for the technologists to try to introduce new solutions for this and how to deal with this.

I joined Pearson about two years ago. Kind of my mission here was to really try to take a fresh look at what we are doing technology-wise, consolidate from a multitude of platforms, double digits, to a single platform. And the definition of the platform had to be defined.

So to me, a platform is something that's open. It's really usable by all who want to access the functionality and services that the platform provides. But it's really not a platform until you've actually opened up the APIs to the external world and not just to your own developers so that external world can have the capability to actually introduce new applications on top of this platform. And then, you have kind of got yourself to a point where you do have a platform.

How to get started-- wanted to go away from a lot of proprietariness, many Microsoft .NET stacks that existed and still exist to some extent. When you want to go through a transition like this, a lot of people get kind of caught up in the middle of all the day to day firefighting issues. I think you kind of need to rise above it a little bit and just basically say what is it, where is it that you want to go, and kind of work your way back from that. In a lot of cases, kind of ignore a lot of the obstacles that exist. The problems that you have with your current stack should not become inhibitors to you going forward.

And we had to kind of struggle a lot with this. There was a lot of technology issues that had to be overcome. There were a lot of people issues that you had to overcome. There were a lot of organizational structure issues that we had to overcome, which I'll kind of walk through a little bit.

And then, I want to discuss a few use cases that we've had. These are things that are immediately affecting our external facing applications. External customers are affected by this. So our identity and access management capability is an interesting one. This is in our database about 120 million or so user accounts. On our peak days around 40, 50 million students basically show up for registration and log into our different applications and tools that we have online.

And then, the adaptive learning and analytics-- very, very important advance for education that we'll talk about here a little bit. And then, the capability that we internally call-- this one is an internal capability. But it affects external facing applications, which is our activity framework, the Pearson activity framework, which tracks the usage kind of like a Google DoubleClick, right? It tracks the usage of the applications, so system to system, or users to systems, and interactions that take place in between to try to figure out who goes where and does what with the education material so that we can actually collect that data and do the necessary analysis for that and feed our analytics engine and the learning analytics and adaptive capabilities that we have so that we can eventually increase the pace of learning here through the adaptive capabilities that we have.

And then, some challenges that we've had along the way with MongoDB and learning experiences that we've had kind of along the way-- and then going forward what we see as kind of our challenges that we have to overcome here. So our vision is driven by a few things. We want to be mobile first. We talked about this, right? Many of the countries that you go to-- bandwidth hungry, need offline capability. Mobile is the only device that they have access to, pretty much, and therefore not build your applications for web first consumption.

This was a very radical way of thinking about how you run a business when we were talking about our technology stack, right? So when you want to build the new application, you're just going to build the mobile version of it and see what happens from there. If people require a web version of the application, and if it's necessary, you kind of-- if you use modern browsers, many of the modern browsers already have the capability to accommodate the web consumption of it. And if they don't, you basically modify from the web version going back to the modern browsers, so very radical way of thinking about this.

100% of our applications were based in our own data centers. And some of our data centers, they've been hosting stacks of Microsoft, stacks of Oracle, stacks of you name it, legacy capability out there-- a few not modern, a few basically really kind of ugly looking implementations of data center technologies. And so we just basically said that anything new that we build, we know that it needs to go global. At some point in time, it may have to be actually hosted in somebody else's cloud. So why not just kind of build it based on cloud first?

And so build for mobile first, and then put it into Amazon-- not necessarily use all the Amazon proprietary feature functionality. But really provision a server capability there and just kind of use the server technologies and use off the shelf technology piece parts that I'll talk about here next.

And then so mobile first, cloud first, and then our application really has to be built such that it measures how it is that our consumers are interacting with this application, right? The efficacy of our content really matters here, right? For those who have actually followed Pearson, you may know that we have kind of announced as a public company to the market that we will measure in year 2018-- and the engineers amongst us argue, is it the beginning of 2018 or the end of 2018?

But we have announced that in year 2018, we will be announcing to the public markets as a part of our revenue, annual and quarterly revenue reports, as to what percentage of our revenue is actually associated with the efficacy of our content. And that's a big, bold move from where we've been, kind of measuring data, receiving data, gathering data, aggregating data in different silos like double digit number of independent platforms, to all of a sudden now consolidating everything on a platform with a data channel that exists there in this platform, and measuring things, measuring the efficacy, doing the analytics, improving the efficacy through the adaptive engine that we build, and the algorithms, and then kind of feeding the data channel to ensure that our content is associated with our efficacy capability, and then report it to the stock market as a part of our quarterly report, right?

So that's a big, bold move, right? So neither one of these things are easy challenges-- going from a web to a mobile, going from a data center to a cloud, going from literally not being able to measure any sort of efficacy outside of a single property to a global data channel for efficacy measurement, and then reporting it to the stock market. So, big challenges.

Many of the team members that actually have presented here have demonstrated these components, right? So basically, we had to really step back from the problem-- double digit number of platforms. Let's just try to figure out where we want to go instead of worrying about where we are. And we spent a good couple months with a very strong architecture team analyzing all the components that are there and ended up with this picture here, right? Some items that are green are items that we've actually built and we have open sourced ourselves, or have used from the open source community.

And items that are kind of bluish are items that are closed source. These are things that we are building ourselves. And we just use it for our own internal usage here. And notice that up on top, there's a web stack and a mobile stack, right? You can't just abandon web, because you're coming from that history and the background and the huge customer base that you need to preserve. Therefore, you really need to do something about it.

No reference to any sort of legacy relational, no reference to anything that's .NET-ish-- and basically use technologies that are off the shelf. Many of you use the same components and start building a new generation of applications on top of this technology, right? So, many things happen when you kind of actually make a big, bold move like this.

Some of you may say, well, what's the big deal about this, right? We do this every day. Well, there are different classes of companies, companies that are referred to as unicorns. Unicorns basically are start-ups. They've got nothing to fear. There's literally no revenue. And they go off. They jump into these things. If they fail, they pull themselves out. They try something else.

So unicorns have the opportunity to actually go around. Well, when you're supporting $4.5 billion of revenue, you're not a unicorn. You're a horse, right? So that's where we are. We are not a huge bank like many of the ones that you've seen here-- Citi, et cetera. We are kind of a mid-size. Our revenue really matters a lot.

But if we lose $1 billion, it's a quarter of our revenue that disappears. If Citi loses $1 billion, it's probably nothing to them. So for us, we're the horses, right? So the horses have to really worry about making sure that they keep pace with what it is that they're doing. But also, at the same time, they need to make sure that they kind of set the stage for proper transition, too.

So we decided here, the how is important here. We decided that we want to think like a start-up, right? We want to use open source wherever it's available. We don't really want to reinvent the wheel, right? We want to use vendor solutions wherever it's possible.

And we want to build proprietary solutions whenever it's necessary and strategic for us. So this kind of led to a lot of discussions from the technology side, as you can imagine. But also, organizational issues-- big company, $9.5 billion company. As you can imagine, it's not a unicorn. It's not a start-up, right? It's got organizations. It's got boundaries. People in this part of the organization can only use that. People in that part of the organization can only use this.

If you have teams that are not reporting to you, they start saying, well, we're not interested in going in this direction. If you have teams in the other parts, they're rooting for you, because you're saying all the right things here. So you have to kind of deal with those challenges. And they're not easy, right? So you kind of have to navigate your way through this. And at some point in time, it requires a little bit of courage to say that, OK, well I'm pulling the plug on this other stuff, and see who's going to scream, right?

You can't just kind of be dictatorial about this. But really, you can't build a new generation of software in a democracy, right? So what we really needed to do was to kind of really navigate carefully through this-- so organizational issues and, as you can imagine, people issues, right? I literally was pulled into a room with all of my DBAs that were in the room who kind of told me, please explain yourself. What do you mean by these things that you're saying here, right?

There was a room full of people, around 50, 60 or so people. Now, our organization is large, 1,500 engineers, right? So 50, 60 is really nothing. But 50, 60 people kind of pulled me into a room-- and I had to actually buy lunch for them to kind of make sure that they were smiling.

And we had to go through this entire thing one more time with them-- this is what we mean, this is what we are doing, right? We need to transition. We're not saying that we're going to abandon ship, because we need the revenue. But we're saying that we can't stay where we are either, right? And this process, by the way, we really need your help, right?

So we are based in Denver. A big part of our technology is based in Denver plus other locations around the globe. So in Denver, there's this common saying that if you do this, people are going to run for the hills, right? And they told me that, right? If you kind of basically go ahead and go through this experience of changing everything on everyone, we're going to lose steam.

And there you go, right? So sleeping stops. You need to start thinking about this. What are you going to do if so and so disappears? What are you going to do if so and so disappears? And you kind of go into this endless discussion regarding individuals and people and critical team members that you need to kind of not deal with, and make sure that-- do it such that they don't run for the hills.

And it was an interesting experience. So I just kind of basically went back to my own self engineer and decided that, well, we really need to do this now. Because as an engineer, I would've never liked to be stuck in one form or another type of technology forever and never learn anything beyond what I've been learning for ages here, right?

So I kind of put myself back into my engineering days and went back and discussed with a couple of our leads and discussed it with our managers, discussed it with our critical people, and finally decided that we've got to pull the plug. And we're not going to disconnect overnight. Because this is a journey, not a one shot deal. It's going to take us three years to kind of go through this transition.

But anything new that we build from here on forward, we're going to build based on this technology stack. So how many people do you think ran for the hills after we did this? So I lost a contractor who wanted to be specialized in Microsoft technologies. And I lost a full time employee who had actually dedicated an entire career to MS SQL. And that was it.

So nobody else-- I mean, we had attrition. But nobody else ran for the hills due to this reason. As a matter of fact, part of this activity was, well, we're going to spend a lot of time and energy doing a skills upgrade program, right? For anyone that wants to be trained, great. Anybody that we lose through our attrition program, we're going to rehire people back in.

And this entire thing really actually helped our recruiting campaign in our office locations. Because a lot of engineers wanted to kind of deal with this kind of stack and nothing proprietary anymore, right? They liked the openness. They liked the approach that we have here.

So really no one ran for the hills, right? So that was a critical experience there. It's an important thing to note, important lesson learned, that I think you've got to think like an engineer when you're kind of dealing with issues like this, and kind of go back to your own roots and think about the logic of, what would happen if so and so did this, and then go back through the entire analysis and figure out your way out. So big challenges, technology challenges-- a lot of this was new for the team, people challenges, individual challenges, and then organizational challenges that had to be overcome here, right?

And selling it above myself was not a very difficult thing. Because that was the charter that I was given. Selling it below myself to my teams and peers was the difficult part of this, right? Because you had a lot of people that were emotionally attached to the pieces of work that they have produced over the years. And just like any other scenario in life, babies grow up, and they need to leave, right? And that's kind of basically what had to happen here. So I thought I'd share this with you. Because I think that this is an interesting experience here.

As a part of this, why MongoDB, right? We wanted to ensure that we quickly proved to ourselves that we can actually build things based on this new technology stack, right? So MongoDB was one of the components. But there were many other components, right?

We had to actually prove to ourselves that, yeah, this technology stack has got legs at Pearson. Other companies have actually done this, have proven this, unicorns or horses. But nobody at Pearson had done this, right? So we had to actually prove this to ourselves that-- let's do this quickly and show and demonstrate. So we want to pick something that's open source. We don't want to get stuck for the long run. We want the agility.

The application that we wanted to build was an e-book. So it was really a document-oriented type application which really fit the model nicely. The scalability issues were there, although we weren't quite sure what that really meant. Because scalability for us in the early days was basically up to 100,000 users based on this new application that we were introducing, even though we have double digit millions, high millions, of subscribers.

Not all of them are going to be using this application. But we want to take it global. Because if it catches fire, we want to really do an experiment with this. We are building a lot of new applications based on this technology stack. Our culture is really a DevOps culture. We want to fail fast to see which one of these things actually catches fire in the market.

If it does, then should we bring it back to our own cloud? If it doesn't, then kill it, right? So this fail fast culture associated with our DevOps culture that we have established here really helps us to try to determine what applications you really need to pay a lot of attention and focus on. And scalability is an issue when you're thinking about, OK, if something catches fire and needs to go global, what are we going to do?

Or all of a sudden, instead of 100,000 initial users, you have 40 million users that want to come in and start using this. So scalability-- big issue here. We had lots of fast reads. Because that's kind of the nature of many of the educational components. There aren't really that many heavy write-oriented applications, aside from some of our labs applications, which have more write.

And what was important for me but not noted here-- really NoSQL is important, right? If you think about the nature of what's happening in a classroom, there's a teacher, there's an instructor. They're talking to the students. Students are asking questions. There's some content that's structured.

But the majority of the interaction is not. We have capabilities that you can do peer to peer. From here, I can send out a question to you. And based on your answers, I'll group these three people, those four people, these five people, into study groups, peer to peer study groups. And then, I switch gears. And based on my adaptive capabilities and algorithms, I assign different types of questions to different groups. And different groups start interacting with each other through our social capability that we have associated with this classroom.
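As an illustration only, the grouping step described here (send a question, form study groups from the answers, then give different groups different follow-up questions) could be sketched roughly as below. This is not Pearson's code: the function names, the sample responses, and the "advanced"/"review" labels are all hypothetical, and a real adaptive engine would use far richer signals than a single answer.

```python
from collections import defaultdict

def group_by_answer(responses):
    """Group student names by the answer each one gave (hypothetical helper)."""
    groups = defaultdict(list)
    for student, answer in responses.items():
        groups[answer].append(student)
    # Sort members so the result does not depend on arrival order.
    return {answer: sorted(students) for answer, students in groups.items()}

def assign_followups(groups, correct_answer):
    """Pick a follow-up type per group: harder material for the group that
    answered correctly, review material for everyone else (assumed policy)."""
    return {
        answer: "advanced" if answer == correct_answer else "review"
        for answer in groups
    }

# Made-up classroom responses to one multiple-choice question.
responses = {"ana": "B", "ben": "A", "chen": "B", "dee": "C", "eli": "A"}
groups = group_by_answer(responses)
followups = assign_followups(groups, correct_answer="B")
print(groups)
print(followups)
```

The point of the sketch is that the groups and their follow-ups are shaped by whatever the students answered, so the resulting records have no fixed schema known in advance.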

And at that point in time, there's nothing structured anymore, so basically semi-structured or NoSQL data that you really need to deal with. So, important selection criteria for us was essentially the Mongo usage at this point in time. And yeah, like everybody else, we were worried about reducing maintenance, not paying the heavy licenses. Because in our fail fast model, we didn't really want to get stuck with continued activity.

Some examples and use cases that we have here up on my left, your right-- so the identity access management. This is an application that-- essentially all of our users will be touched by this capability eventually. We refer to it as Pearson Identity. So any ID that you have, that's coming in from any sort of association that you had with Pearson basically gets resolved through this Pearson Identity and federated. And we recognize who you are. And we provide you access management to our content that you're allowed to, that you're subscribed to, with your devices.

So all of our users will eventually be touched by this capability. So today, around 50 million or so people-- 120 million user IDs, but 50 million or so people showing up on our peak week during back to school season, which is coming up on me here now. So keep your fingers crossed.

The other issue that we are in the middle of investigation but we haven't really resolved yet, and really a challenge that we see in the Mongo technology, is that what we want to do is that we want to take our identity access management global, right? So that has its own implication we'll talk about. Some elements of our learning management system are using MongoDB capability. Some assignment authoring functions that we have are using some components of MongoDB capability.

Our social capability is important, right? So for every classroom, there's a social agenda. And if you're taking more than one classroom, you can participate in the social network that's associated with that school and start communicating with people that are taking the classes, and have an offline chat with instructors, with other students, a lot of threaded discussions that go on in a typical classroom scenario.

Our adaptivity analytics-- very, very important, right? Some students learn different subjects at a different pace. And you really need to be able to detect that if you want to make sure that your content is efficacious. That's a Pearson word. For the efficacy of our content, you need to make sure that if a student is having problems with this subject, try to determine that early on as to, should they really receive question number two first, and then question number three, or the other way around?

And this really needs to happen real fast, near real time, right? So our goal is to kind of reach a millisecond range to be able to make sure that you have content that's really adaptive like that. The analytics part-- very important, right? Because that will tell the instructors, the schools, the programs how the classroom is coming up, how the individuals are coming up, how they compare to the previous classrooms, how much time are they spending on different subjects. Do they really go back and re-look at the material that they reviewed? All kinds of analytics come about as a result of analytics in the classroom and in the school.

And then, another capability that we have that uses MongoDB is a capability that we call the activity framework, right? From the time you log in, what else are you doing here in this classroom setting, right? What tools are you using, what capabilities? What questions are you looking up? What are you answering, et cetera, et cetera. And then, we use all that to kind of feed our analytics engine and our adaptive engine that we built.

So in the identity access management-- four shards, three replicas. Nobody wants to spend time waiting on an identity authorization authentication, right? Just kind of get it done really fast and get me through to the capability that I want. And it needs to be highly reliable. Any time somebody comes in, that identity really needs to work. And therefore, the average down time per year really needs to be reduced. So a lot of this really fits Mongo-- am I running out of time? OK, I'm going to go faster.
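To make the "four shards, three replicas" layout concrete, here is a minimal sketch of how a user ID might be routed to one shard whose data is held by three replica members. It assumes that "three replicas" means three members per shard, and it uses a plain hash-modulo rule purely for illustration; this is not MongoDB's actual chunk-based routing, and the member names are hypothetical.

```python
import hashlib

NUM_SHARDS = 4          # from the talk: four shards
REPLICAS_PER_SHARD = 3  # assumption: three replica members per shard

def shard_for(user_id: str) -> int:
    """Map a user ID to a shard number with a stable hash (illustrative rule)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

def members_for(user_id: str) -> list:
    """List the hypothetical replica members that would hold this user's record."""
    shard = shard_for(user_id)
    return [f"shard{shard}-replica{r}" for r in range(REPLICAS_PER_SHARD)]

print(members_for("student-12345"))
```

Spreading accounts across shards is what lets login traffic scale out on peak days, while the replicas are what keep a shard available if one member goes down.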

This is a problem that we haven't solved, but we are in the middle of discussion here. Identity is important. But as you're going global, it becomes even more important, right? Why? Because privacy is a big issue as you are kind of bringing in data back. And you really need to make sure whose data can actually come back to the US. You really need to make sure what portion of the data you really need to keep outside of the country because of privacy and regulations that you deal with.

It requires highly reliable multi-zone, multi-region seasonal issues, right? When back to school season starts here in the US, you can't really be doing maintenance in Singapore and having impact here, or vice versa. Because the seasons for schools are different, so a big issue which we have not resolved quite yet but are in discussions with the Mongo team.

Also interesting, to my right, to your left, we do a lot of background data gathering, pull all the data, anonymize it, AWS S3, use Elastic MapReduce, feed MongoDB, and use the APIs to feed our analytics engine, which is feeding our learning applications, right? So as people are-- and the reverse direction also happens.

As people basically go through the learning applications, we get our analytics put into MongoDB and have interfaces between MongoDB and our storage capabilities in Amazon for Hadoop farms to make sure that we do background analytics of this, and then use the MongoDB APIs for the real-time feeding the learning applications that we have.

And again, like I said, the activity framework, from the point of an author of a piece of material that is getting prepared for content management systems, the activity framework kind of tracks it from that angle, also from the point of view of a learner that's coming into our system, also from the point of view of, what is the learning platform providing to this learner, and also from the learning services and the applications as to what are they doing when they're going through this experiment. We gather all that data through our activity hub, store that partially in RDBMS, but a lot of it in the Mongo installation, and do key value matches with different aspects, the multiple attributes that we have gathered regarding this specific individual who is using the system, build the user profiles, and then leverage a lot of inherent Mongo capabilities to make sure that we get proper data, anonymize it, feed our application engines, and the adaptive engines that we have.
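A rough sketch of the flow described here (gather activity events, replace the user identifier, build a per-user profile to feed downstream engines) might look like the following. Everything in it is an assumption made for illustration: the event fields, the salt, and the function names are hypothetical, and a salted hash is strictly pseudonymization, a much weaker guarantee than whatever anonymization a production system would need.

```python
import hashlib
from collections import Counter, defaultdict

SALT = "example-salt"  # hypothetical value; a real system would keep this secret

def anonymize(user_id: str) -> str:
    """Replace a user ID with a salted hash so profiles carry no raw identifier."""
    return hashlib.sha256((SALT + user_id).encode("utf-8")).hexdigest()[:16]

def build_profiles(events):
    """Count actions per anonymized user from a stream of activity events."""
    profiles = defaultdict(Counter)
    for event in events:
        profiles[anonymize(event["user_id"])][event["action"]] += 1
    return {user: dict(counts) for user, counts in profiles.items()}

# Made-up activity events, shaped like schemaless documents.
events = [
    {"user_id": "student-1", "action": "login"},
    {"user_id": "student-1", "action": "answer_question", "question": 2},
    {"user_id": "student-2", "action": "login"},
    {"user_id": "student-1", "action": "answer_question", "question": 3},
]
profiles = build_profiles(events)
print(profiles)
```

Note that the events do not all carry the same fields, which is the kind of semi-structured data the talk says motivated a document store.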

So my last slide here, actually-- as we go multi-region, as we extend globally, this is a big challenge. I think you heard some of the earlier presentations where some of the users have extended this. I'm really hoping that I can work closely with those members of the community, as well as other members of the community, to try to figure out, what is the right way of reaching a conclusion on this? Because it's not as easy as it looks.

I think we had lots of challenges in the early days with configuration management. And I saw that we are improving through the automation capabilities. I was quite excited to see that and hear about it today. And enterprise licensing model-- I think MongoDB as a company is kind of reaching this stage where they're realizing that this isn't just about start-ups anymore, right? Because there are many larger mid-size organizations that actually need some form of enterprise support to ensure that they can conduct their business properly. They can rely on MongoDB as an organization to support them properly through this.

And then, continuous training in this front-- this is moving fast. And you can't assume that all developers are coming from a unicorn culture, right? There are a lot of developers that are coming from a horse culture. And it takes a lot of time for them to adjust and adapt to the NoSQL paradigms and learn the best practices around that, right? So continuous training I think is a very good thing to go by. So that's all I have. Thank you. [APPLAUSE]

Sorry for taking so long.

No, not at all. Thank you so much for sharing so much of your journey. I like the analogy, you're not a unicorn. You want to be a unicorn. You're a horse.

I'd love to be a unicorn. I've been a unicorn before.

How do you take a new generation of software and put it in without creating a revolution-- very nicely shared. Thank you so much. I'm going to ask that we do questions in the back of the room. Because we're actually at the end of time. Thank you for sharing your story with us. Everyone here knows Pearson anyway. Maybe your children are going to be in that back to school sign-up. You never know.

Please remember to fill out your surveys for this. In 10 minutes, we'll be back in this room with another exciting story, not about mobile first this time, but more about health care. Medtronic is coming to share with us their story. Please come back and join us. And then at 5 o'clock, our very last graveyard shift, we've got a great story about Hike, who's created 15 million users in a social messaging app that's great to hear about. Thanks, everyone.