Jeff: Eight years ago, MongoDB was an internal project at 10gen, a company that was trying to build a platform as a service out of open source components. The team at 10gen realized that the platform-as-a-service play would be too complex and too difficult to build. Since MongoDB was the most valuable component of that project, they narrowed their focus to this new document-oriented database and changed the name of the company to MongoDB. In today’s episode, MongoDB CTO Eliot Horowitz describes the history of MongoDB, both the open source project and the company, which recently released a managed cloud service, MongoDB Atlas.
Eliot explains how the company has architected the MongoDB Atlas service on top of AWS and why developers often want a managed service as their database rather than managing database servers themselves. This is a great episode if you yourself are building a managed cloud service or if you are thinking about where to host an instance of a database. Should you host it yourself on a cloud provider, or should you use a managed cloud service? There are pros and cons to each of these approaches. Full disclosure: MongoDB is a sponsor of Software Engineering Daily. I hope you enjoy this episode.
Eliot Horowitz is the CTO of MongoDB. Eliot, welcome to Software Engineering Daily.
Eliot: Thanks. Great to be here.
Jeff: So I would like our conversation today to focus on how a database company works and I want to touch on product development, the engineering, the business model. We’re talking about MongoDB specifically because you are the CTO there but I think there will be lessons in this conversation that will be applicable to any kind of infrastructure business, certainly to database businesses. So in order to build our conversation in that direction, I want to start with the customer who in this case is a developer. So when MongoDB started to gain popularity six or seven years ago, why did people start using it? What were those customer desires?
Eliot: Sure. So really the main motivation for people trying out MongoDB and adopting MongoDB came from developers who wanted to be more productive. When developers look at MongoDB, they tend to look at two things. One is the data model: MongoDB uses documents instead of the relational data model. Two is the distributed systems components of MongoDB. So the data model for MongoDB is documents, and documents, we believe, are a fundamentally easier data structure for developers to work with. They more naturally suit the way programming languages work, because documents map pretty well to objects, and they naturally map to the way people think. No one thinks about breaking up objects into tables of rows and columns, but they do think about things in structures. Documents also lead to a lot of efficiency on the computing side.
The other big piece is the distributed systems side, starting with simple things such as high availability and then moving on to horizontal scaling and geographic diversity. So really those two concepts are what brought people to MongoDB, making it much easier overall to store data, to work with data, to query data, and to get value out of the data.
Jeff: Those are absolutely true. I also think that MongoDB gained popularity for similar reasons that Ruby on Rails gained popularity, because that document centricity, where you can think about documents more easily, means that you can build prototypes with it more quickly, you can evolve those prototypes more quickly, and ease of use is a lot more straightforward. I think it also coincided with, maybe not the beginning, but a growth in new developers starting to program around that time; the time MongoDB was gaining popularity was also when Ruby on Rails was getting a lot of popularity. Do you think that’s an accurate depiction?
Eliot: Yeah, I think that you saw the growth of the internet and web applications, and the growth of mobile applications, which were bringing in a whole new breed of software developers. Cloud computing starting to be a real thing was a big deal because the cost of infrastructure was so low. You know, in the ’80s, when people were deploying or building applications, the numbers we see are that 75% of the cost went to hardware and software and 25% went to developers. Now that’s flipped: 75% of the cost is developers and 25% is hardware and software.
So making developers cost-effective and very efficient is incredibly important, and the time to market of features is becoming paramount. I think that is where things like MongoDB and all the new programming languages, whether it’s Ruby or Python, are also making a pretty big difference.
Jeff: The other thing was that around this time, 2009, a lot of the new developers who were joining were not fluent in SQL, so they were totally open to, you know, “oh, this is the way we do databases, with documents.” It could feel totally organic to them.
There are lots of podcasts out there that have done comparisons between MongoDB and SQL, so I don’t want to spend too much time on that area, but maybe we should spend a little time on it for people who need a refresher or who are new to this. Explain some of the differences in the programming experience between a document-oriented database like Mongo and a SQL database like MySQL.
Eliot: So fundamentally the main difference is how you store the data. So in Mongo you’re storing data in documents. So documents in Mongo look very similar to JSON documents. So they are structured, you can have arrays, you can have nested elements, you can have arrays of nested elements.
So for example, if you’re storing information about a person and you want to store a list of all their current and previous addresses, you would store documents within an array called addresses, so there’d be a list of documents. Each document would have city, state, ZIP; it could have a field like current; it could have other fields like type, whether it’s a home or a school; and you can put in whatever you want there. So when you look at a person in the database, it’s just a single document. When you’re mapping that to code, it’s a single dictionary or object or whatever the idiom is in your language.
In a relational database it’s a little bit different, because you’re going to have a table for your core user information and then a separate table with a row for every address. Then what you have to do is join those two tables together. For two tables that’s not so bad, but for a typical user profile in a big application it’s not two tables, it’s dozens of tables, and that’s where things start to get pretty complicated.
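To make the contrast concrete, here is a minimal sketch in plain Python, with a dict standing in for the person document and two lists of rows standing in for relational tables. The field names and values are hypothetical, no database or driver is involved, and the “join” is a hand-written loop, not how a real database executes one.

```python
# Hypothetical person stored as one document: addresses are an embedded array.
person_doc = {
    "_id": 1,
    "name": "Ada",
    "addresses": [
        {"city": "New York", "state": "NY", "zip": "10001", "current": True, "type": "home"},
        {"city": "Boston", "state": "MA", "zip": "02115", "current": False, "type": "school"},
    ],
}

# The same data normalized into two relational-style tables (lists of rows).
users_table = [{"user_id": 1, "name": "Ada"}]
addresses_table = [
    {"user_id": 1, "city": "New York", "state": "NY", "zip": "10001", "current": True, "type": "home"},
    {"user_id": 1, "city": "Boston", "state": "MA", "zip": "02115", "current": False, "type": "school"},
]


def load_person_relational(user_id: int) -> dict:
    """Rebuild the nested shape by joining users_table to addresses_table on user_id."""
    user = next(u for u in users_table if u["user_id"] == user_id)
    addresses = [
        # Drop the foreign key; in the document model, nesting expresses ownership.
        {k: v for k, v in row.items() if k != "user_id"}
        for row in addresses_table
        if row["user_id"] == user_id
    ]
    return {"_id": user["user_id"], "name": user["name"], "addresses": addresses}
```

Here `load_person_relational(1) == person_doc` is `True`. With one child table the join is trivial, but each additional table (phone numbers, preferences, and so on) adds another lookup of the same kind, while the document remains a single read.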
Jeff: Okay. What about the query language between Mongo and SQL? How do those differ?
Eliot: So the query language for Mongo… well, the first key thing is that Mongo does have a query language. One of the things we tried very hard to do is keep the paradigms from relational databases as similar as possible when people move to MongoDB. So there is a query language, there’s a shell where you can go and type things in, and indexing in MongoDB works exactly the same way it does in a relational database.
The difference is that it’s a query language designed for a document model. SQL is very much designed for relations, and a lot of people have tried to put JSON or hierarchies into SQL, and none of it seems to fit very well. We also wanted to modernize it in a number of ways, to make it more programmatic and easier for developers to work with, as opposed to typing in SQL directly. So we basically made a new query language for MongoDB, called the MongoDB query language. It’s really the same kinds of things you can do in SQL, just in a very document-centric way.
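The idea of a document-centric query language can be illustrated with a toy matcher. MongoDB filters are themselves documents, and a dotted path such as `"addresses.state"` reaches into embedded documents and arrays without a join. The Python below is a hypothetical sketch, not MongoDB’s query engine: it supports only equality and a `$gt` operator, and the sample data is made up.

```python
from typing import Any


def values_at(doc: Any, path: list[str]) -> list[Any]:
    """Collect every value reachable at a dotted path, fanning out across arrays."""
    if not path:
        return [doc]
    if isinstance(doc, list):
        # Arrays along the path are searched element by element.
        found = []
        for item in doc:
            found.extend(values_at(item, path))
        return found
    if isinstance(doc, dict) and path[0] in doc:
        return values_at(doc[path[0]], path[1:])
    return []


def matches(doc: dict, query: dict) -> bool:
    """True if every condition in `query` holds for `doc` (equality and $gt only)."""
    for field, cond in query.items():
        values = values_at(doc, field.split("."))
        if isinstance(cond, dict) and "$gt" in cond:
            ok = any(isinstance(v, (int, float)) and v > cond["$gt"] for v in values)
        else:
            ok = cond in values
        if not ok:
            return False
    return True


person = {
    "name": "Ada",
    "age": 36,
    "addresses": [{"state": "NY", "current": True}, {"state": "MA", "current": False}],
}
```

With this data, `matches(person, {"addresses.state": "MA"})` is `True` and `matches(person, {"age": {"$gt": 40}})` is `False`. The relational version of the first filter needs a join, along the lines of `SELECT u.* FROM users u JOIN addresses a ON a.user_id = u.user_id WHERE a.state = 'MA'`.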
Jeff: Now today there are many architectures that might have a combination of NoSQL and SQL databases. I think you also talked about the trend where SQL databases try to build a document-like interface, and document databases sometimes build a SQL-like interface on top. Why are these different types of polyglot database architectures useful? Or are some of them not useful? Maybe you could shed some light on this polyglot notion.
Eliot: So certainly some are useful today and some may not be useful in the future, but frankly there’ll be polyglot systems forever, if for no other reason than that there are very large, complex relational systems being used today where the cost of migrating from a relational database to a document database is way too high. So polyglot is going to be here, and it’s going to be around forever.
For a new application, I think the biggest challenge with MongoDB and all the new databases is that they are still pretty new, so there are features that MongoDB simply doesn’t have yet, and if you want one of those features, or if you are using a tool that only speaks to relational databases, you’re going to have to use a relational database. Over time we expect things like MongoDB to be able to handle more of the use cases in applications, but it’s certainly never going to be a hundred percent, and it’s definitely not anywhere near that today.
Jeff: So we’ll get back to where databases are today, but let’s talk a little bit about the business; how the business has evolved. MongoDB came out of 10gen which was a company that wanted to have a platform as a service that was based on open source tools. This sounds like a very reasonable idea for a 2007 business. Cloud was growing, we’ve seen it grow exponentially since then. So why did you end up pivoting to a database company?
Eliot: So when we first started 10gen, we were building a database, which became MongoDB, and the full platform as a service. The closest analogy to what we were building was App Engine, and we started building this before App Engine was out. It was the kind of thing where you give your code to us and we run it: a fully contained platform.
So when we were looking at it, we were a startup with a few people, and basically we said there is no way we could build everything you would possibly need in order to completely use this platform as a service. And if you can’t use it for everything, the utility starts to drop dramatically. If you look at App Engine, I think it started out not very popular, and even now, eight or nine years later, it’s getting more and more features, but it still doesn’t have everything and it still suffers from the same kinds of problems.
So we looked at the full platform and we thought the database was the most interesting part of it. So it was really less of a pivot and more of a “we’re going to slice the top half of what we’re doing off and just focus on the database,” which was what we were most interested in at that point.
Jeff: If you had gotten the open source flywheel spinning, maybe you could have actually built out that whole platform. Was there some friction to getting the open source community heavily involved and saying, “hey, this is where we are going roadmap-wise,” and then laying out some kind of skeleton for all the different pieces you would need? That’s often how smaller companies with an open core model try to leverage the community.
Eliot: We hadn’t really thought about it. I would say it is hard to make that work with open source. I don’t think there are very many examples of it working.
Jeff: There’s huge execution risk there.
Eliot: If you look at the consortium model of open source, it’s pretty messy, I would say. There aren’t a huge number of successes in building out new technologies that way, and I don’t think it’s been a really innovative model. For something like that, I don’t think we really thought it was likely to be successful.
Jeff: Were there other document-oriented databases at the time?
Eliot: So when we first published MongoDB as a public open source database, which was February of 2009, the other document database out there was CouchDB. I think it was really just the two of us.
Jeff: Okay. And so before 10gen you were working at DoubleClick which was eventually acquired by Google and that was a big business. Were there any lessons that you took away from DoubleClick that were valuable in the early days of Mongo?
Eliot: Yeah, frankly a lot of the things we wanted to solve with MongoDB were problems we faced at DoubleClick. DoubleClick was serving billions of ads per day, around the globe, and the latency SLAs for serving ads were incredibly low. The system had to be up 24/7. The business requirements for what ad serving could do were changing constantly, as ad serving was evolving very rapidly.
After DoubleClick I started a company called ShopWiki, and after ShopWiki I started 10gen. Frankly, when we started thinking about MongoDB, it really was, you know, let’s build the database that we would want to use for every application we ever build from now on. That was the motivation, and it was very much a selfish pursuit of “I want a database that I want to use for everything; let’s make that happen.”
Jeff: Has that become the case? Is Mongo the predominant database that you use internally at Mongo?
Eliot: Definitely. It’s the predominant database internally. Personally, I can’t quite imagine using anything besides it at this point.
Jeff: Really? So are there any edge cases where you have to use other databases internally?
Eliot: The only databases we use besides Mongo are for when we use software that has to work with a relational database. Otherwise we’re all Mongo.
Jeff: That’s awesome. It’s some serious dogfooding. So the first product that the company offered was commercial support for MongoDB. What kinds of support did people need in the early days? Why did they need support for their database?
Eliot: I think the answer is the same now as it was then. Databases are hard; they’re complicated. One of the things we want to do with MongoDB is make it easier and easier, but applications are getting more and more complicated, and the model people are using is stateless application servers that are sort of disposable and simple: you spin them up, tear them down, and all the state lives in the database, and that is hard. Database performance depends on how fast your CPU is, how many disks you have, the latency of the disks, what kind of disks they are, what kinds of transactions you’re doing, and the ratio of reads to writes.
So running a database with high availability, and making sure the performance is exactly what you need, is complicated. One thing we do on the product side is make that as easy as possible, and we have a lot of tools for that. But it is complicated, and when there is downtime on a database it’s very costly. So the motivation to make sure your database is running perfectly all the time is pretty high, and if there is a problem you want it solved as fast as humanly possible, and that’s how we can help.
Jeff: In order to get the kind of introspection that you need into a customer’s deployment, do you have to give them a special version of Mongo or do you have to have access to their systems somehow?
Eliot: We almost never do custom versions of MongoDB. Everything is instrumented in the main version. Having access to their systems definitely often makes things easier, but for most of our clients we don’t have access to their systems, and we go through log data or monitoring data or other things of that nature. We very rarely have actual access to their systems.
Jeff: So I’ve done shows recently about systems for distributed orchestration of Docker containers or other kinds of containers, like Mesos or Kubernetes. And there are companies building a model where they sell a commercial version, or they basically sell support: the software is open source, but they offer support for it, and that’s what you pay for. What I always wonder about these companies is how you scale a commercial support team of engineers when the platform you’re supporting is so new. Kubernetes is very new. How on earth do you scale up a commercial support team? So I would ask the same question about MongoDB in the early days, when there weren’t very many people who knew how MongoDB worked in terms of performance and what the gotchas are. How did you scale the commercial support team?
Eliot: So in the beginning, and for quite a while, the main people answering anything remotely complex on the support side were the engineers building it. When we had our first commercial support client, we had nine people in the company. The first person who ever answered a commercial support ticket was me. So at the very beginning it was that, and after that it was really just training. We spent a lot of time training people, running boot camps, running war games, all the different things we can do. We’ve got a pretty complex shadowing program where people can learn from their peers. So it really just comes down to training, and then making sure the engineers are pretty heavily involved too, so that everyone is really working together and it’s very collaborative.
Jeff: Can you talk more about the tech stack that you use to run and manage MongoDB clusters?
Eliot: Do you mean what clients use, or…? I’m not sure I understand the question.
Jeff: Well, I guess I kind of jumped the gun, because you’re building this cloud service for running MongoDB that we’ll get into… and I assumed that the way you run and manage your MongoDB stuff internally is somewhat similar to how you manage the cloud service. So I want to eventually get into the cloud service, but since we’re talking about MongoDB internals right now, and I’m sure you have lots of data internally, I’m just wondering: do you host your own MongoDB clusters or do you use a cloud service provider, and where is the instrumentation around your internal deployments?
Eliot: So we have a combination. We have a lot of MongoDB internally on our own hardware in physical data centers, and we’ve got our own MongoDB in the cloud on a number of different cloud providers. All of that is managed using our MongoDB management tools; there are three different variants of those, which we can get into, and we actually use all of them on purpose. We try to dogfood every management tool we offer by using it to manage something, so it’s actually a little bit inconsistent internally, just because we want to dogfood our own stuff so much.
Then for the infrastructure side we mostly use the Mongo tools and then a bunch of different other tools. Again we try to test a lot of different configurations because it’s more important for us to dogfood and to make sure we know what’s going to happen with our clients than to have a very consistent internal infrastructure.
Jeff: Okay. So what have you built internally for redundancy and availability, and do you have multi-cloud stuff going on? Can you talk more about that?
Eliot: So for our main and most mission-critical infrastructure, it’s half in the cloud and half on-prem. We use a couple of different physical data centers and a couple of cloud data centers, with the physical data centers paired with the cloud, and that works quite well for us, with very high availability.
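Spreading a MongoDB replica set across physical and cloud data centers only helps if losing any one site still leaves a majority of voting members, since a replica set needs a majority to elect a primary. The episode doesn’t describe MongoDB’s own layout, so the Python below is a hypothetical check of that arithmetic, assuming one vote per member; the site names are made up.

```python
from collections import Counter


def survives_any_single_site_loss(member_sites: list[str]) -> bool:
    """True if losing any one site still leaves a strict majority of votes.

    Each list entry is the site of one voting replica-set member, and each
    member is assumed to carry exactly one vote.
    """
    total = len(member_sites)
    majority = total // 2 + 1
    return all(total - lost >= majority for lost in Counter(member_sites).values())
```

For example, `["dc1", "dc1", "cloud"]` fails the check (losing `dc1` leaves one of three votes), while `["dc1", "dc1", "dc2", "dc2", "cloud"]` passes. With only two sites, some single-site loss always removes the majority, which is one reason a third location, physical or cloud, tends to show up in such layouts.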
Jeff: So as we move towards talking about the cloud service, what have been the changes in developer preferences since the early days of MongoDB?
Eliot: Do you mean our internal developers?
Jeff: I’m sorry, I should have specified. Developers more generally. Early on, as we talked about, developers wanted to model their systems as documents rather than relational tables; that’s one transition, and I feel like in the nine years since then there have been plenty of other changes in the broader population of developers, in terms of what they prefer and how they like to build things. So when you’re thinking about product development, what are those changes in developer preferences that you think about?
Eliot: So, obviously one of the big ones is the cloud. When we first started building MongoDB, the cloud, Amazon for example, was just starting. Now it is predominant, and that was obviously a pretty big shift. It means that people are used to getting infrastructure very easily and very quickly. They’re used to things like Amazon RDS for spinning up clusters, and they want tools like that to make it very easy. So that’s one big area. The other really big area is services, third-party services, and you can combine that with microservices. Today applications tend to be built using a couple of internal microservices, maybe more, maybe less, and a bunch of different third-party services. If you want to send text messages, you’re going to use a service for that; if you want to do image resizing, you’re going to use a service for that.
When I did this a decade ago and we wanted to do video conversion, we actually had to download software, and when people uploaded videos we had to convert them ourselves with our own software and send them out. Now you would never do that. Now you’re just going to use a service to do video conversion or image resizing or anything of that nature. I think that is fundamentally changing the way applications are built; it’s a big paradigm shift. Otherwise, developers really just want the most effective tool for the job. At the end of the day, developers want to be focusing on what value they are adding and what their end user is experiencing, and not so much on everything else, and I think we just need to make sure we’re always doing the right thing for that.
Jeff: What do you think is at the core of that preference for services? For example, you mentioned text messaging. Why do people prefer Twilio to standing up an AWS server somewhere with some sort of open source API for handling text messages? Why do people prefer services?
Eliot: It really comes down to cost and simplicity. Doing it yourself requires you to actually get a server, learn how to run the software, make it highly available, make it redundant, and handle issues when they come up. It’s a lot of work; it’s complicated. If you’re trying to build a new app and you want to add a feature, the last thing you want to do is take a week off your main focus to go spin this thing up. With a service like Twilio, for example, you can sign up and 10 minutes later you’re sending text messages. So it’s just so much simpler. And again, it’s all about focusing on actually building your application and not doing everything else.
So it really just comes down to how fast you can get things done and how productive you are. At some point maybe the services aren’t the most cost-effective solution, but you’ve got to get to a pretty high scale before that happens. For most people, these services are by far the most cost-effective and time-efficient way to get things done.
Jeff: So that brings us to MongoDB Atlas, which is a cloud service that you released for running MongoDB. Why did you release the cloud service?
Eliot: Yes, so it was pretty simple. Almost every user we talk to who is running in the cloud would love to hand off management of their cluster to us. They don’t really want to manage it themselves; they just feel like they have to. If it’s one less thing they have to worry about, one less thing they have to learn about, one less thing they have to set up, and rather than learning how to manage MongoDB they can just let us do it for them, it’s a no-brainer. And if you look at the way people are using other databases in the cloud, I think this is becoming more and more true for all databases, not just MongoDB. I think any database that’s going to be incredibly popular now or in the future has to have a service like this, or developers aren’t going to want to use it. It’s just so much easier for developers and for productivity, and it makes a lot of sense.
Jeff: What’s the difference, from the developer’s point of view, between using a cloud service for a database versus running and managing that database on a server from a cloud service provider?
Eliot: It depends on what mode the developer is in. When they’re writing code, none. It’s the exact same API, same rules, same everything, and that’s one of the nice things about it. It is exactly the same database, used in exactly the same way. But from an operations standpoint, the setup time (learning, downloading, figuring out how you’re going to set it up) goes from maybe hours or days to minutes, and the maintenance almost goes away. When you want to resize, when you’ve got to go from a small cluster to a larger cluster, you can now click a few buttons in a UI or make an API call, and we do all the heavy lifting on the backend.
Not only are we doing it for you, but it’s also fully automated by our systems which are running not just your cluster but thousands of other clusters. So you’re actually getting the benefit of having thousands of clusters run by the same software. So the reliability just goes way up also. So it’s both a lot easier for you and a lot more reliable because we’re doing it at a much larger scale and are looking at all the edge cases and all the issues that could possibly arise and sort of know how to handle them.
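Resizing a cluster while keeping it available generally means a rolling procedure: change one replica-set member at a time, so a majority of members stays up throughout. The episode doesn’t describe how Atlas automates this, so the Python below is only a hypothetical sketch that orders the steps for a three-member replica set; the host names and the `new_size` label are made up, and nothing here calls a real API.

```python
def rolling_resize_plan(members: list[str], primary: str, new_size: str) -> list[str]:
    """Order the steps for resizing a replica set one member at a time.

    Secondaries are resized first; the primary steps down only after every
    secondary is back, so writes move to a member that is already resized.
    """
    if primary not in members:
        raise ValueError("primary must be one of the members")
    steps = [
        f"resize {host} to {new_size}; wait until it rejoins and catches up"
        for host in members
        if host != primary
    ]
    steps.append(f"step down {primary}; wait for a new primary to be elected")
    steps.append(f"resize {primary} to {new_size}; wait until it rejoins and catches up")
    return steps


for step in rolling_resize_plan(["node-a", "node-b", "node-c"], primary="node-b", new_size="larger-tier"):
    print(step)
```

In MongoDB itself, the step-down corresponds to the `replSetStepDown` command (`rs.stepDown()` in the shell). Resizing the primary last means only one election happens during the whole operation.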
Jeff: What kind of economies of scale do you at MongoDB… what do you get out of running this database as a service?
Eliot: So the benefit for us is really two main things. One is that people like being able to offload this work, and they will pay for it. We believe Atlas is the most cost-effective way for anyone to run MongoDB, so that’s a benefit both for us and for our clients. Another big benefit for us is that we get the most experience running MongoDB, which makes MongoDB better not just for us but for everyone else. Lastly, the other really great thing about Atlas is that we think users will be more successful with MongoDB if we can handle the management for them, and again, that’s good for everyone.
At the end of the day we want everyone to be as successful with MongoDB as they possibly can no matter where they are running it. But we believe that using MongoDB Atlas will make people as successful as possible which is good for everyone.
Jeff: For the developer, what are the benefits around scalability, availability, and redundancy? What advantages do you get when you’re using a service versus hosting your own?
Eliot: So fundamentally, at the technology level, it’s the same technology. You have the same availability, the same sharding, the same distribution, so the core pieces are the same. The difference is that when you’re using it as a service, you don’t have to configure it and you don’t have to set it up. And you do get an added benefit: let’s say you’ve got a replica set with three nodes, and one of the nodes physically goes away. The cluster is still up no matter how you deploy it, but if you’re using the service, then we’re responsible for spinning up a new server and making sure the data gets synchronized. That’s something we handle, as opposed to you having to handle it.
So on the availability side, we make sure that works. On the scaling side, if you want to go from smaller nodes to bigger nodes, we do that automatically for you while maintaining high availability. If you want to scale your sharded cluster from three shards to 10 shards, we manage the process for you. So a lot of it comes down to ease and reliability. You’re still going to have high availability no matter how you deploy MongoDB, and you’re still going to have scalability no matter how you do it. It’s just a question of how easy it is for you to get access to those things and how reliably they are going to work all the time.
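Going from three shards to ten is at its core a data-redistribution problem: a sharded collection’s data is split into ranges (“chunks” in MongoDB’s terminology) that get migrated onto the new shards. MongoDB’s balancer does this automatically, but its real policy and thresholds differ from what follows and have changed across versions. This is a hypothetical greedy sketch in Python that tracks chunk counts only, ignoring data size, migration cost, and concurrency.

```python
def balance_chunks(chunk_counts: dict[str, int], threshold: int = 1) -> list[tuple[str, str]]:
    """Greedily move one chunk at a time from the fullest shard to the emptiest.

    Mutates `chunk_counts` in place and returns the (source, destination) moves.
    Stops once the spread between fullest and emptiest shard is <= threshold.
    """
    if threshold < 1:
        raise ValueError("threshold must be at least 1, or balancing may never stop")
    moves = []
    while True:
        src = max(chunk_counts, key=chunk_counts.get)
        dst = min(chunk_counts, key=chunk_counts.get)
        if chunk_counts[src] - chunk_counts[dst] <= threshold:
            return moves
        chunk_counts[src] -= 1
        chunk_counts[dst] += 1
        moves.append((src, dst))


# Three existing shards with 30 chunks each, plus seven newly added empty shards.
counts = {f"shard{i}": 30 if i < 3 else 0 for i in range(10)}
moves = balance_chunks(counts)
print(len(moves), sorted(set(counts.values())))  # 63 [9]
```

Each original shard gives up 21 chunks, so every shard ends with 9 of the 90. A real migration also has to copy the documents in each chunk while the cluster keeps serving reads and writes, which is the part a managed service takes off your hands.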
Jeff: The MongoDB Atlas service gives you an experience like an endless pool of compute that provides a database service. Is that the same experience that developers within MongoDB, the company, have? Did you replicate your own internal model for how people spin up databases?
Eliot: No. We built the model that we thought people wanted. We never really had an internal service for this; we sort of did what our clients did. Now all the new internal apps that use MongoDB pretty much use Atlas, except that we still use a few of our other management products, just to make sure that we test all of them internally. But most of our new internal things are going onto Atlas.
Jeff: Is Atlas built mostly on AWS? I think I read somewhere that it’s mostly on AWS, right?
Eliot: So right now you can only spin up Atlas clusters on AWS. Next year we’ll be adding support for Azure and Google Cloud. We really want this to be cross-cloud: wherever you want to deploy your application, we want to have Atlas running there. Next year we’ll also let you do cross-cloud clusters, so you can have some nodes in Amazon, some nodes in Google, and some nodes in Microsoft, and be redundant across cloud providers as well.
Jeff: Are there Amazon managed services that you use, or do you avoid using them to avoid lock-in and keep your costs low?
Eliot: So we don’t use too many Amazon services, not for those reasons but mostly because we wanted to build something that would work across any cloud. So we wanted the same solution we have on Amazon today, we want to have on Google and Microsoft and so we really want to make sure that we use sort of the lowest level features so that we can replicate on whatever Clouds our clients want. And we don’t want to be the only ones that offer service on Amazon because we have a lot of clients who are interested in both Amazon, Azure and Google.
Jeff: So there are services on AWS that have similar analogues on Google or Azure, like an elastic load balancer. For example, I’m pretty sure that Google and Azure have some similar service, I don’t know for sure, but when you see something like that, does that lower the barrier to your desire of using a managed service?
Eliot: I think it would. I think one of the things that is different about Atlas is that Atlas is actually a pretty low-level function. The main things we use on EC2 are the hardware, the VPCs, and security groups. At the end of the day Atlas is really just about running MongoDB, and none of the tools that any of the clouds offer are going to let you manage a Mongo cluster through rolling upgrades or scaling up a shard or a cluster. So because it’s not a complex web service or anything like that, the tools that we need from the cloud service providers frankly aren’t all that complicated, and so we’re able to use something that’s pretty low level that’s going to work everywhere.
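To make the "lowest-level features" idea concrete, here is a minimal sketch built on a hypothetical `CloudProvider` interface (the class and method names are illustrative, not the Atlas codebase or any cloud SDK). The cluster logic is written only against three generic primitives: a private network, an ingress rule, and a plain VM. Each cloud would then need only a thin adapter. A `FakeProvider` stands in for a real cloud so the logic can be exercised locally.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Instance:
    instance_id: str
    zone: str
    network_id: str


class CloudProvider(ABC):
    """Hypothetical minimal interface: only primitives every major cloud offers."""

    @abstractmethod
    def create_private_network(self, cidr: str) -> str:
        """Create an isolated network (a VPC on AWS) and return its ID."""

    @abstractmethod
    def allow_ingress(self, network_id: str, port: int, source_cidr: str) -> None:
        """Open one port to one address range (a security-group rule on AWS)."""

    @abstractmethod
    def launch_instance(self, network_id: str, zone: str, instance_type: str) -> Instance:
        """Start a plain VM in the given zone."""


class FakeProvider(CloudProvider):
    """In-memory stand-in so the control-plane logic can be exercised locally."""

    def __init__(self) -> None:
        self.networks: dict = {}
        self.instances: list = []

    def create_private_network(self, cidr: str) -> str:
        network_id = f"net-{len(self.networks)}"
        self.networks[network_id] = {"cidr": cidr, "rules": []}
        return network_id

    def allow_ingress(self, network_id: str, port: int, source_cidr: str) -> None:
        self.networks[network_id]["rules"].append((port, source_cidr))

    def launch_instance(self, network_id: str, zone: str, instance_type: str) -> Instance:
        instance = Instance(f"i-{len(self.instances)}", zone, network_id)
        self.instances.append(instance)
        return instance


def provision_replica_set(cloud: CloudProvider, zones: list, app_cidr: str) -> list:
    """Cluster logic written only against the generic primitives above."""
    network_id = cloud.create_private_network("10.0.0.0/16")
    cloud.allow_ingress(network_id, 27017, app_cidr)  # 27017 is MongoDB's default port
    # One data-bearing node per zone; "standard" is a placeholder instance type.
    return [cloud.launch_instance(network_id, zone, "standard") for zone in zones]
```

The narrower the interface, the cheaper each new cloud adapter becomes. The trade-off is giving up higher-level managed services that have no equivalent on every provider.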
Jeff: Why did you start with AWS?
Eliot: Simply because we basically go where our users tell us they want us to go and that is definitely where most of our users would like us to be first.
Jeff: Why is that? Because it seems like if they are just accessing the MongoDB Atlas service, it doesn’t matter what the cloud service provider is, right?
Eliot: So there are two reasons why you care. One is latency: if your application servers are sitting in Amazon, it’s nice if your databases are in Amazon also, just because they’re a little bit closer. And two is the security side. For example, every Atlas group gets its own Amazon VPC, and now we can actually peer your Atlas VPC with your application VPCs so that you can keep things very secure. You do not have to send your database traffic over the public network at all.
So for those two reasons alone, you want to keep things inside one cloud as you’re architecting your system. It just makes it easier: when you think of using services like Amazon Lambda, or even if you’re just spinning up EC2 instances, being able to go through the internal Amazon network rather than over the public network, with the lowest possible latency, is pretty advantageous.
Jeff: Do the failure cases get more complex? We didn’t really discuss this, but MongoDB has a lot of redundancy and failure resilience built into how it works. Does that get more complicated for you as the service provider, with things like node failures on AWS? Does worrying about that become complicated? Or maybe you can just walk me through what happens during a node failure on AWS.
Eliot: So it’s kind of interesting: it’s more complicated because we have to handle it at a larger scale, but because we have to handle it at a large scale, it’s all fully automated. If a node goes down in your cluster, we will automatically detect that and replace it. So we have to build that software: we have to determine whether a node is down transiently or down permanently, and then we have to make a decision on what to do.
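The transient-versus-permanent decision can be sketched as a simple grace-period rule. This is an assumed policy for illustration, not Atlas’s actual logic: an outage shorter than `replace_after` seconds (a reboot, a network blip) is waited out, and a node that stays unreachable longer is marked for replacement. Timestamps are passed in explicitly so the logic is deterministic and easy to test.

```python
from typing import Dict, Optional


class FailureDetector:
    """Classify each node as healthy, transiently down, or due for replacement.

    Assumed policy (illustrative): an outage shorter than `replace_after`
    seconds is waited out; a longer one means the node is replaced.
    """

    def __init__(self, replace_after: float = 300.0) -> None:
        self.replace_after = replace_after
        # node name -> time its current outage began (None while healthy)
        self.outage_start: Dict[str, Optional[float]] = {}

    def record(self, node: str, reachable: bool, now: float) -> str:
        if reachable:
            self.outage_start[node] = None  # any outage is over
            return "healthy"
        start = self.outage_start.get(node)
        if start is None:
            start = now
            self.outage_start[node] = start  # first failed check of this outage
        if now - start >= self.replace_after:
            return "replace"
        return "transient"
```

A real system would feed this from periodic health checks and, on `"replace"`, provision a new node and let it resync from the surviving replica set members.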
But because we have to do it in an automatic fashion, we have already built all that logic and all those systems, so on a day-to-day basis it’s actually very simple. The only complicated things that might arise, and we’ve actually built things to handle this, are cases like this: we support many different Amazon regions, but what if you’re in a region and an entire availability zone goes down? How do we handle that?
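One property that makes an availability-zone outage survivable is placement. A replica set can only elect a primary while a strict majority of its voting members is reachable, so no single zone should hold a majority of them. The check below is a minimal sketch of that rule, assuming every listed member is a voting member. The zone names are just examples.

```python
from collections import Counter


def survives_any_single_zone_loss(member_zones: list) -> bool:
    """True if losing any one availability zone still leaves a voting majority.

    Assumes every entry is one voting replica set member, identified by its zone.
    """
    total = len(member_zones)
    majority = total // 2 + 1  # strict majority needed to elect a primary
    per_zone = Counter(member_zones)
    return all(total - count >= majority for count in per_zone.values())
```

With three members in three zones, losing any one zone leaves two of three voters, which is still a majority. Putting two of the three members in the same zone breaks that guarantee.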
We’ve got our own playbooks for that and we’ve got systems that detect it, but again, we’ve never experienced that. I don’t think it has happened in a couple of years, but it’s those kinds of things, the ones we just haven’t seen yet, that I think will be more complicated.
Jeff: So, one of my favorite infrastructure episodes that I did was about Dropbox moving off of the cloud. They moved off of AWS, and it took them about two years to re-architect their infrastructure, build essentially their own version of Amazon S3, and write the software for their data centers. It was really epic, but the lesson that I took away from that is if you’re an infrastructure company, it can be erased. I mean, Dropbox is not even an infrastructure company; well, it’s kind of an infrastructure company, but it’s more of a user-level service that anybody could use, with this veneer of a really nice user interface, and yet they still had to do this migration in order to keep their margins high. Do you think it’s possible that in the future you would want to move to your own hardware in order to get margins higher than you would get on a cloud service provider?
Eliot: Yeah, I think we would never move. I think that people are going to want their infrastructure where they want it, but I don’t think it’s implausible to imagine us having our own data centers where you might be able to get Atlas at a different price point. I don’t think it will be a wholesale move, because if you’re in Amazon, some people are just going to want their database in Amazon no matter what. It would come down to maybe being able to get your MongoDB clusters elsewhere at a lower price.
Jeff: WiredTiger was a company that Mongo acquired not too long ago. They make a storage engine, and as I understand it, this acquisition was important to the creation of Atlas. So maybe we can talk a little about that. First of all, what is a database storage engine?
Eliot: Yes, a database storage engine is simply how the database puts data onto disk: how it stores it on disk and how it handles the durability and finality of that. So when you write to Mongo, Mongo is going to do a bunch of stuff: it’s going to look at indexes, determine what’s valid, and compute aggregation results; that’s sort of the core of Mongo. And then at the end of the day it’s going to insert something onto disk or get something off of disk, and that’s what the storage engine is all about. And it needs to do that, obviously, very reliably, very quickly and very safely.
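To make that division of labor concrete, here is a toy storage engine: an append-only log on disk plus an in-memory index. It is an illustration under heavy simplifying assumptions, not how WiredTiger works; real engines add page caches, checkpoints, compression and concurrency control. It shows the two duties described above. `put` does not return until the record has been `fsync`ed (durability), and reopening the store replays the log to recover state.

```python
import json
import os


class TinyLogStore:
    """Toy storage engine: an append-only log file plus an in-memory index."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.index = {}
        if os.path.exists(path):
            with open(path, "r", encoding="utf-8") as f:
                for line in f:  # replay the log to rebuild state after a restart
                    try:
                        record = json.loads(line)
                    except json.JSONDecodeError:
                        break  # torn write at the tail from a crash: ignore it
                    self.index[record["k"]] = record["v"]

    def put(self, key: str, value) -> None:
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps({"k": key, "v": value}) + "\n")
            f.flush()             # push Python's buffer to the OS
            os.fsync(f.fileno())  # ask the OS to put it on stable storage
        self.index[key] = value   # only now is the write acknowledged

    def get(self, key: str, default=None):
        return self.index.get(key, default)
```

Everything above the storage engine (query planning, index selection, aggregation) would call only `put` and `get`. That narrow boundary is what let MongoDB swap engines.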
Jeff: Why are there different ways of doing that? It seems like that would be a very straightforward, universally well-studied sort of thing. What are the subtleties in building a storage engine?
Eliot: So storage engines are simply very complicated, and a number of things have changed a lot over the last couple of decades. If you look at the best storage engines of today versus 20 years ago, a lot of it looks the same, but think about how the hardware has changed: how fast computers are, SSDs versus spinning disks, and memory. 20 years ago the amount of memory in your hardware systems was very low compared to now. So the nature of how these things have to work is very different, and the requirements, including performance, are much more complicated.
So there are a number of storage engines, but there actually aren’t that many popular storage engines out there. Just like databases, there aren’t hundreds; there’s maybe a handful of really good storage engines. WiredTiger, the company, was created by the same people who built Berkeley DB, which has sort of been the most popular storage engine of the last almost 30 years. So these are people who have done this before and are really building the best modern storage engine for the new requirements and hardware that people use.
Jeff: What did WiredTiger do that you didn’t have with your own storage engine?
Eliot: At the end of the day, the main thing WiredTiger brought was a much higher-performance storage engine, which became the default in 3.2. The concurrency in WiredTiger and the overall performance and latency of WiredTiger were a lot better than our previous storage engine. In parallel, we were working on our own storage engine and had plans for how to improve it, and we were also looking at other alternatives for how we could go much faster. At the end of the day, with WiredTiger, we really liked both their product and the team, and we thought it would not just let us solve what we needed to solve that day, but really accelerate our road map by a number of years. So it really made sense to us to acquire the company, the team and their product, and switch it in as our main storage engine.
Jeff: Were there some ways that that acquisition empowered you to build this cloud service, MongoDB Atlas?
Eliot: I’m not sure it had a direct impact on Atlas. Overall, it means that we’re able to do more things and support more use cases for MongoDB. So I think it impacted the overall business quite dramatically, and so the secondary impact on Atlas is pretty large. Not really a direct correlation, more of a secondary correlation.
Jeff: I understand. So I'd like to zoom out a little bit and talk about databases more broadly. There are these databases that call themselves NewSQL, things like Crate, MemSQL or VoltDB, and I've done a number of shows about these. What is driving the creation of these new databases?
Eliot: If you're specifically referring to what's driving the NewSQL databases, I think mostly that is around performance and scalability: people trying to solve those problems in the relational world. I think that's the main motivation there, though I'm not at one of them.
Jeff: Can you talk more about that. What are the performance and scalability requirements that these different databases are attacking?
Eliot: So one of the challenges with relational databases as compared to Mongo is this. Let's go back to our user profile example. Say you've got 50 tables that represent your user profile, and for a given user you may have hundreds of rows, say 300, that represent that user. If all 300 rows are sitting on one physical server, doing that join is costly but not insane. You'll probably cache it somewhere, in memcached or something, but you can do it and it is okay. But now let’s say those 300 rows are spread across six different machines. Now that join is not just about getting 300 rows off disk; it’s gathering 300 rows from six different servers and putting them together. And now let’s say you want to modify a user atomically: you've got to do a transaction that spans those six servers.
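The fan-out described here can be counted with a small simulation. Assumptions: six servers, hash partitioning, and, as a worst case, each of a user's 300 rows partitioned by its own key. Real sharded relational systems usually co-partition a user's rows by user ID to reduce exactly this fan-out. The document case stores the whole profile under one key, so it always lands on a single shard. The key formats are hypothetical.

```python
import hashlib

NUM_SERVERS = 6  # the six machines in Eliot's example


def server_for(key: str) -> int:
    """Map a key to a server with a stable hash (Python's hash() is salted per process)."""
    return int(hashlib.sha1(key.encode("utf-8")).hexdigest(), 16) % NUM_SERVERS


def servers_touched_relational(user_id: int, tables: int = 50, rows: int = 300) -> int:
    """Worst case: each of the user's rows is partitioned by its own row key."""
    keys = [f"table{r % tables}:user{user_id}:row{r}" for r in range(rows)]
    return len({server_for(k) for k in keys})


def servers_touched_document(user_id: int) -> int:
    """The whole profile is one document, so one key and one server."""
    return len({server_for(f"user{user_id}")})
```

In the relational case, a read has to gather rows from every server the function reports, and an atomic update has to coordinate all of them. In the document case, both are single-server operations.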
So with MongoDB that's all in one document, so it's a lot simpler, but in the relational world horizontal scaling has always been really hard, and the solutions have always been very expensive. I think those NewSQL companies are trying to solve that problem in a cheaper, more cost-effective way. It's not that it hasn't been done before; it's just that they'd like to do it more cheaply. And that's for all the legacy applications that are on relational databases, where moving to MongoDB might be more work than they can handle right now, but they do want a better, more scalable database.
Jeff: Why is it that horizontal scaling is easier to handle in Mongo?
Eliot: So it really comes down to the data model once again. If your user profile is sitting in one document, the number of cases where you need to do joins or transactions across data types or documents doesn't go to zero, but it goes dramatically lower. And if you don’t have to do joins or transactions across documents, horizontal scaling becomes much easier, because you don't have to do things like two-phase commits and you don't have to move as much data across the network. It's just a simpler problem to solve, so you can more easily build a scalable solution.
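For readers who have not seen it, here is a bare-bones two-phase commit coordinator, the protocol that single-document updates let you avoid. It is a teaching sketch, not MongoDB code. There is no write-ahead logging, no locking, no timeouts and no coordinator recovery, all of which a real implementation needs. It counts coordinator-to-participant messages to show the cost: two rounds per involved shard, versus a single write when the whole record lives in one document on one shard.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class Participant:
    name: str
    will_vote_yes: bool = True
    committed: dict = field(default_factory=dict)
    staged: dict = field(default_factory=dict)

    def prepare(self, writes: dict) -> bool:
        if not self.will_vote_yes:
            return False
        self.staged = dict(writes)  # a real system would also log and lock here
        return True

    def commit(self) -> None:
        self.committed.update(self.staged)
        self.staged = {}

    def abort(self) -> None:
        self.staged = {}


def two_phase_commit(writes_by_participant: Dict[str, dict],
                     participants: Dict[str, Participant]) -> Tuple[bool, int]:
    """Return (committed?, number of coordinator-to-participant messages)."""
    messages = 0
    involved = [participants[name] for name in writes_by_participant]
    votes = []
    for p in involved:  # phase 1: ask every participant to prepare
        messages += 1
        votes.append(p.prepare(writes_by_participant[p.name]))
    ok = all(votes)
    for p in involved:  # phase 2: commit everywhere, or abort everywhere
        messages += 1
        if ok:
            p.commit()
        else:
            p.abort()
    return ok, messages
```

A transaction touching three shards costs six messages here. If any shard votes no, every shard discards its staged writes, so no partial update becomes visible.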
Jeff: So one trend that I've been doing some shows on is building these bigger in-memory platforms. I guess this is a combination of needing faster access to large data sets and a decrease in the cost of memory. I think those are both true trends; maybe I'm wrong. But how do you see that impacting the database landscape, the rise of systems that keep more stuff in memory?
Eliot: So I think it really depends on the database. One of the things that we try very hard to do at MongoDB is to let you configure MongoDB so that you can handle different kinds of workloads and do different kinds of things, and our distributed architecture lets us do that very nicely. For example, we have our WiredTiger storage engine, which is pretty traditional: it stores data on disk and does what you'd expect a database to do. And we also have the in-memory storage engine for MongoDB.
You can run a single MongoDB cluster with some nodes in memory and some nodes persistent, as a regular database. Then if you want to keep your data in memory to get the highest possible throughput or lowest possible latency, you can do that, but you can also have the data persisted to disk so it's very safe. For a lot of applications, it's not that the overall throughput is too high for a non-in-memory database; it’s really about handling spikes.
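A mixed deployment like this can be sketched as a replica set configuration plus per-node startup options. This sketch rests on several assumptions. The host names are placeholders. `--storageEngine inMemory` requires MongoDB Enterprise. The layout shown (in-memory members serving traffic, plus a hidden, priority-0 on-disk member holding a persistent copy) is one common pattern, not necessarily the exact one Eliot has in mind. Ports, `--dbpath` and network binding options are omitted for brevity. The resulting `config` dict is the document you would pass to `rs.initiate()` in the shell, or to the `replSetInitiate` command from a driver.

```python
from typing import Dict, List, Tuple


def mixed_replica_set(name: str, in_memory_hosts: List[str],
                      on_disk_host: str) -> Tuple[dict, Dict[str, List[str]]]:
    """Return (replica set config, mongod arguments per host) for a mixed deployment."""
    members = [
        {"_id": i, "host": host, "priority": 1}  # in-memory members serve reads/writes
        for i, host in enumerate(in_memory_hosts)
    ]
    # Hidden members must have priority 0: they are never elected and are not
    # visible to client applications, but they still replicate every write, here to disk.
    members.append({"_id": len(members), "host": on_disk_host, "priority": 0, "hidden": True})
    config = {"_id": name, "members": members}

    args = {h: ["mongod", "--replSet", name, "--storageEngine", "inMemory"]
            for h in in_memory_hosts}
    args[on_disk_host] = ["mongod", "--replSet", name, "--storageEngine", "wiredTiger"]
    return config, args
```

The storage engine is a per-process startup option, not a field in the replica set document. That is why the sketch returns the two separately.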
So let's say you're in the finance industry. From 9:00 AM to 4:00 PM your traffic is orders of magnitude higher than it is at other points in the day. If you're a gaming company, at game launch time you get these really crazy spikes; you need to be able to handle those spikes, but your throughput for the day isn't so crazy. Those kinds of use cases are where in-memory is really good, or if you've got a ton of data and you need really fast aggregations.
So for us at MongoDB, it's all about letting you do different kinds of things inside the same database cluster. From a developer standpoint, you don't have to think about it so much, but when you need to optimize, you've got the right knobs, so you can do things pretty quickly. So for us, both the in-memory trend and the large-data trend work pretty nicely, but I think databases that can't do those kinds of things in the same cluster very easily may have trouble.
Jeff: Listeners have let me know that they like episodes about management. You are the CTO of MongoDB, and I see this as a unique opportunity to ask you about management, because most of the shows I do tend to be about web application companies. I guess MongoDB is kind of becoming one with Atlas, but you started out as a database company, and I'm wondering: what are the unique aspects of management within a database company? How do you think about features and support and new products, and how do you organize your day? Or is it no different from any other software company you've worked at?
Eliot: No. I think the biggest differences between MongoDB and a lot of the companies you probably talk to come down to two things. One is that most people who use MongoDB download it; it's shipped software. It's not a service, it's not a web application. If we introduce a bug, we have to deal with it for years. We release a version of MongoDB, and we have people running versions of MongoDB from years ago, and we have to support that; we have to make that work.
So that’s one big difference, and it changes a lot of what you do on the product side. The other big thing is that we can't break the database; we can't add features that add instability or add risk. The database has to work all the time; it has to be the most reliable piece of your stack. But at the same time, we are a new database company, so we do need to evolve, we need to add a lot of features, and we need to continue to innovate.
So one of the big challenges that we have is how you both innovate and add features quickly, in an incredibly safe and reliable way. I think that is one of the pretty unique challenges that database companies have, and it's even more complicated for us because we predominantly ship software. A lot of it comes down to an incredibly strong process around user-facing design, which on the database side isn't about UIs; it's about API design and query language design. We spend a lot of time focusing on those things, because if we do something, we have to live with it for years.
There's a huge amount of effort on testing and automated testing. We built our own continuous integration system, because the testing infrastructure required for MongoDB is pretty vast, and we needed something that could do that very effectively and very efficiently across thousands of machines. So there's a lot of work on automated testing, a lot of work on a really good process around finding issues and bugs and around design, and overall a pretty rigorous engineering process that we've been developing over the last seven or eight years.
Jeff: Well now I have to ask you about that continuous integration system. How do you build a continuous integration system for a database?
Eliot: So if you think about it, there are two big challenges that we have. The first challenge is testing across a lot of different systems. We test automatically on, I don’t know the exact number, but a bunch of different Linux versions. We test on different versions of Mac OS. We test on different versions of Windows. We support Solaris, we support the IBM Z platform, we support the IBM PowerPC platform, and we support ARM. So we support a number of different hardware platforms and OS platforms, and a lot of different versions across those platforms.
So I think it's somewhere around 30 to 40 different hardware/OS variants. That’s challenge number one. Challenge number two is that, ignoring some tests that run for multiple days, just the basic unit and functional tests for MongoDB, run on a single box on a single OS, would take, I believe the current number is, around 40 hours. So 40 hours times 30 to 40 variants is a lot of time. What we've found over a number of years is that the longer it takes to run those tests, the higher the odds of bugs getting into master or conflicts happening, and so it's incredibly important to drive that down as low as possible.
So we're working towards an SLA of about an hour from a commit going in to having a fully tested MongoDB binary. That means we need to parallelize each operating system variant around 40 ways. I just did the math on it: it’s 1,600 servers.
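The arithmetic behind that figure, plus one standard way to split a suite across workers, is sketched below. Assumptions: 40 variants, 40 serial hours per variant, a one-hour target, and tests that divide evenly. A single test longer than the target would make the SLA unreachable however many machines you add. `shard_tests` uses the greedy longest-processing-time-first heuristic, a textbook scheduling technique offered for illustration; nothing in the conversation says MongoDB's CI uses it. The test names and durations are made up.

```python
import heapq
import math
from typing import Dict, List


def machines_needed(serial_hours: float, sla_hours: float, variants: int) -> int:
    """Per-variant parallelism needed to fit serial test time into the SLA, times variants."""
    per_variant = math.ceil(serial_hours / sla_hours)
    return per_variant * variants


def shard_tests(durations: Dict[str, float], workers: int) -> List[float]:
    """Greedy longest-first assignment; returns each worker's total load in hours."""
    heap = [(0.0, w) for w in range(workers)]  # (current load, worker id) min-heap
    heapq.heapify(heap)
    totals = [0.0] * workers
    for _name, hours in sorted(durations.items(), key=lambda kv: kv[1], reverse=True):
        load, w = heapq.heappop(heap)  # least-loaded worker gets the next-longest test
        totals[w] = load + hours
        heapq.heappush(heap, (totals[w], w))
    return totals
```

`machines_needed(40, 1, 40)` gives the 1,600 servers Eliot mentions. The slowest worker's total from `shard_tests` is the real wall-clock time, so uneven test lengths push the fleet size above that lower bound.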
We also have systems for doing performance testing on every commit, so that adds more resources. And we also want our developers, on whatever operating system they develop on, to be able to push a change set to the CI system in a matter of minutes and test it not just on one operating system but on a number of systems, with whatever tests they want to run.
At the end of the day, there was nothing we could find that did that efficiently and that would still do it in a cost-effective way. So we have real hardware under this, we use the cloud for this, and we bid on EC2 Spot Instances. It's a pretty complex system to make all this work reliably, and on the performance side you want it to be predictable. So it's a pretty complex system.
Jeff: It’s interesting what you say about really not being able to make breaking changes because you're a database. I don't mean to pick on Docker, but I've heard people talking about Docker recently and saying that their faith in the product has kind of been shaken because there are breaking changes on a regular basis, and people just want a boring way to run containers. So I guess that's maybe an important lesson for any infrastructure company.
Eliot: Yeah, we spend an inordinate amount of time working around things so that users don't get impacted, even on minor things. When we accidentally make an upgrade harder than it should be, it's a huge deal. So we spend a huge amount of time making sure upgrades require as little work as humanly possible from our users. It's sort of a huge focus of ours. Otherwise, people just don't upgrade; they use old versions forever, they don't get to use new features, and they don't get the benefits of better systems. So we spend a huge amount of time on that.
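One common technique for keeping upgrades low-effort is to separate installing a new binary from turning on features that change persisted data. New features stay off until every node in the cluster runs a binary that understands them, so a half-finished rolling upgrade never writes data that an older node cannot read. The sketch below is a hypothetical illustration of that idea (the class names and version tuples are made up). MongoDB releases from 3.4 onward expose a similar control as the `featureCompatibilityVersion` setting.

```python
from dataclasses import dataclass
from typing import List, Tuple

Version = Tuple[int, int]


@dataclass
class Node:
    binary_version: Version


class Cluster:
    """Hypothetical cluster that gates features on the oldest binary in the fleet."""

    def __init__(self, nodes: List[Node], feature_version: Version) -> None:
        self.nodes = nodes
        self.feature_version = feature_version  # features newer than this stay off

    def set_feature_version(self, version: Version) -> None:
        oldest = min(n.binary_version for n in self.nodes)
        if version > oldest:
            raise ValueError(f"a node still runs {oldest}; finish the rolling upgrade first")
        self.feature_version = version

    def can_use(self, feature_introduced_in: Version) -> bool:
        return self.feature_version >= feature_introduced_in
```

Keeping the feature version behind the binary version also preserves a downgrade path. Until the feature version is raised, nodes can still be rolled back to the older binary.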
Jeff: Are there any business opportunities or product opportunities in the Mongo space or maybe even just in the database space that you wish somebody would build a business around?
Eliot: So I think, if you ignore the ones that we are in or are potentially going into, one of the interesting things that's happening, as I mentioned before, is around services. One of the interesting challenges that someone's going to solve at some point is how you do analytics across all these services. At MongoDB we use dozens of different services to keep track of data, to do client things, whatever it is, website analytics. Doing analysis across them is hard.
So we do what I think a lot of people do now: we take a lot of the data, dump it into MongoDB, and then analyze it. That works, but it doesn't feel right, to me at least. I do think there’s going to be some solution developed over the next couple of years that lets you analyze data across all your services without having to put it all in one place, in a way that's actually cost-effective.
We can do that pretty easily now because we're a database company: we know a lot about how to move data around, and we help clients with this all the time. But it's not a fundamentally easy or pleasant thing to do. I think over time there should be better systems for doing that, especially as more and more applications are built using lots of different services; I think it’s going to become a bigger and bigger challenge. That's definitely a problem I'm personally pretty interested in seeing someone else solve.
Jeff: Absolutely. Yeah, these services are certainly making life a lot easier as a developer. I remember writing programs maybe six, seven or eight years ago, and the amount of upfront work you had to do relative to today was copious. But anyway, Eliot, I want to thank you for coming on the show. It's been great talking to you, and I'm really happy to have MongoDB as a sponsor of Software Engineering Daily. So thanks again for coming on the show.
Eliot: Great, thank you Jeff.