Scaling the Gaming Industry with Gaspard Petit of Square Enix
Square Enix is one of the most popular gaming brands in the world. They're known for such franchise games as Tomb Raider, Final Fantasy, Dragon Quest, and more. In this article, we provide a transcript of the MongoDB Podcast episode in which Michael and Nic sit down with Gaspard Petit, software architect at Square Enix, to talk about how they're leveraging MongoDB, and his own personal experience with MongoDB as a data platform.
Gaspard Petit (00:00): Hi everybody, this is Gaspard Petit. I'm from Square Enix. Welcome to this MongoDB Podcast.
Gaspard Petit (00:09): MongoDB was perfect for our processes: there weren't any columns predefined, any schema, we could just add fields. And why this is important is that the designers don't know ahead of time what the final game will look like. This is something that evolves, you do a prototype of it, you like it, you don't like it, you undo something, you redo something, you go back to something you did previously, and it keeps changing as the game evolves. It's very rare that I've seen a game production go straight from point A to Z without twirling a little bit and going back and forth. So that back and forth process is cumbersome for the back-end. You're asked to implement something before the requirements are set in stone, you have to deliver it so the game team can experience it, and then they'll iterate on it. And if you're set in stone on your database, and each time you change something you have to migrate your data, you're wasting an awful lot of time.
Michael Lynn (00:50): Welcome to the show. On today's episode, we're talking with Gaspard Petit of Square Enix, maker of some of the best-known, best-loved games in the gaming industry. Today, we're talking about how they're leveraging MongoDB and a little bit about Gaspard's journey as a software architect. Hope you enjoy this episode.
Automated (01:07): You're listening to the MongoDB podcast, exploring the world of software development, data, and all things MongoDB. And now your hosts, Michael Lynn and Nic Raboy.
Michael Lynn (01:26): Hey, Nic. How you doing today?
Nic Raboy (01:27): I'm doing great, Mike. I'm really looking forward to this episode. I've been looking forward to it for what is it? More than a month now because it's really one of the things that hits home to me, and that's gaming. It's one of the reasons why I got into software development. So this is going to be awesome stuff. What do you think, Mike?
Michael Lynn (01:43): Fantastic. I'm looking forward to it as well. And we have a special guest, Gaspard Petit, from Square Enix. Welcome to the podcast, it's great to have you on the show.
Gaspard Petit (01:51): Hi, it's good to be here.
Michael Lynn (01:52): Fantastic. Maybe if you could introduce yourself to the folks and let folks know what you do at Square Enix.
Gaspard Petit (01:58): Sure. So I'm an online software architect at Square Enix. I've been into gaming pretty much my whole life. When I was a kid, I was drawing game levels on pieces of paper with my friends. I went to university as a software engineer and worked in a few companies; some were gaming, some were around gaming, for example Autodesk or Softimage. And then I got into gaming. My first game was a multiplayer game, and it led me slowly into multiplayer games. The first company was Behaviour, and then I went to Eidos, working on the reboot of Tomb Raider on the multiplayer side. I took a short break and went to a company called Datamine, where I learned how the back-end works. It wasn't on the Azure Cloud at the time. And I learned a lot about how to do these processes on the cloud, which turned out to be fascinating: how you can converge a lot of requests, a lot of users, into a distributed environment, and process this data efficiently.
Gaspard Petit (03:03): And then I came back to Square Enix as a lead, at the time, for the team we call internally the online suite, which is a team in charge of many of Square Enix's game back-ends. I've been there for a couple of years now, six years, I think, and I've now become online architect. So my role is making sure we're developing in the right direction using the right services, that our solutions will scale, that they're appropriate for the needs of the game team. That we're giving them good online services basically, and that they're also reliable for the users.
Nic Raboy (03:44): So the Tomb Raider reboot, was that your first big moment in the professional game industry, or did you have prior big moments before that?
Gaspard Petit (03:54): I have to say it was probably one of the ones I'm most proud of. To be honest, I worked on a previous game, it was called Naughty Bear. It wasn't a great success from the public's point of view, the Metacritic scores weren't great. But the team I worked on was an amazing team, and everyone on that team was dedicated. It was a small team, the challenges were huge. So from my point of view, that game was a huge success. It didn't make it, the public didn't see it that way. But the challenges, it was a multiplayer game. We had the requirements fairly last-minute to make this a multiplayer game. So we had to turn a single-player game into multiplayer, do the replication. A lot of complicated things in a short amount of time. But with the right team, with the right people motivated. To me, that was my first gaming achievement.
Michael Lynn (04:49): You said the game is called Naughty Bear?
Gaspard Petit (04:51): Naughty Bear, yes.
Michael Lynn (04:52): What type of game is that? Because I'm not familiar with that.
Gaspard Petit (04:55): No, not many people are. It's a game where you play a teddy bear waking up on an island. And you realize that there's a party and you're not invited to that party. So you just go postal and kill all the bears on the island pretty much. But there's AI involved, there's different ways of killing, there's different ways of interacting with those teddy bears. And of course, there's no blood, right? So it's not violence. It's just plain fun, right? So it's playing a little bit on that side, on the-
Michael Lynn (05:23): Absolutely.
Gaspard Petit (05:26): But it's on a small island, so it's very limited. But the fun is about the AI and playing with friends. So you can play as the bears that are trying to hide or as the bear that's trying to carnage the island.
Gaspard Petit (05:41): This is pretty much what introduced me to leaderboards, multiplayer replication. We didn't have any saved game. It was over 10 years ago, so the cloud was just building up. But you'd still have to add matchmaking features, these kinds of features that brought me into the online environment.
Nic Raboy (05:59): Awesome. In regards to your Naughty Bear game, before we get into the scoring and stuff, what did you use to develop it?
Gaspard Petit (06:05): It was all C++, a little bit of Lua back then. Like I said, on the back-end side, there wasn't much to do. We used the first-party APIs, which were C++, connected to their server. The rest was a black box. To me at the time, I didn't know how matchmaking worked or how all these leaderboards worked. I just remember that it felt a bit frustrating: I remember posting scores, for example, to leaderboards, and sometimes it would take a couple of seconds for the rank to be updated. And I remember feeling frustration about that. Why isn't this updated right away? I've just posted my score and it can take a minute or two before my rank is updated. And now that I'm working back-end, I totally get it. I understand the volume of scores getting posted, the ranking, the sorting out, all the challenges on the back-end. But to me back then it was still a black box.
Michael Lynn (06:57): So was that game leveraging MongoDB as part of the back-end?
Gaspard Petit (07:01): No, no, no. Like I said, it wasn't really on the cloud. It was just first party API. I couldn't tell you what Microsoft, Sony is using. But from our point of view, we were not using any in-house database. So that was a different company, it was at Behaviour.
Michael Lynn (07:19): And I'm curious as an early developer in your career, what things did you learn about game development that you still take with you today?
Gaspard Petit (07:28): I think a lot of people are interested in game development for the same reasons I am. It is very left and right brain, you have a lot of creativity, you have to find ways to make things work. Sometimes you're early on in a project and you get a chance to do things right. So you architect things, you do the proper design, you even sometimes draw UML and organize your objects so that it's all clean, and you feel like you're doing theoretical and academic almost work, and then the project evolves. And as you get closer to the release date, this is not something that will live forever, it's not a product that you will recycle, and needs to be maintained for the next 10 years. This is something you're going to ship and it has to work on ideally on the day you ship it.
Gaspard Petit (08:13): So you start shifting your focus saying, "This has to work no matter what. I have to find a solution. There's something here that doesn't work." And I don't have time to find a proper design to refactor this, I just have to make it work. And you shift your way of working completely into ship it, make it work, find a solution. And you get into a different kind of creativity as a programmer. Which I love, which is also scary sometimes because you put this duct tape in your code and it works. And you're wondering, "Should I feel right about shipping this?" And actually, nobody's going to notice and it's going to hold and the game will be fun. And it doesn't matter that you have this duct tape somewhere. I think this is part of the fun of shaping the game, making it work at the end no matter what. And it doesn't have to be perfectly clean, it has to be fun at the end.
Gaspard Petit (09:08): This is definitely one aspect of it. The other aspect is the real-time, you want to hit 30fps or 60fps or more. I'm sure PC people are now demanding more. But you want this frame rate, and at the same time you want the AI, and you want the audio, and you want the physics and you want everything in that FPS. And you somehow have to make it all work. And you have to find whatever trick you can. If you can pre-process things on their hard drive assets, you do it. Whatever needs you can optimize, you get a chance to optimize it.
Gaspard Petit (09:37): And there's very few places in the industry where you still get that chance to optimize things and say, "If I can remove this one millisecond somewhere, it will have actually an impact on something." Back-end has that in a way. MongoDB, I'm sure if you can remove one second in one place, you get that feeling of I can now perform this amount of more queries per second. But the game also has this aspect of, I'll be able to process a little bit more, I'll be able to load more assets, more triangles, render more things or hit more bounding boxes. So the performance is definitely an interesting aspect of the game.
Nic Raboy (10:12): You spent a lot of time doing the actual game development being the creative side, being the performance engineer, things like that. How was the transition to becoming an online architect? I assume, at least you're no longer actually making what people see, but what people experience in the back-end, right? What's that like?
Gaspard Petit (10:34): That's right. It wasn't an easy transition. And I was the lead on the team for a couple of years. So I got that from a few candidates joining the team, you could tell they wish they were doing gameplay or graphics, and they got into the back-end team. And it feels like you're, "Okay, I'll do that for a couple of years and then I'll see." But it ended up that I really loved it. You get a global view of what the players are doing, not just on a single console. You also get to experience the game as it is live, which I didn't get to experience when I was programming the game: you program the game, it goes to a disk or a digital format, it's shipped, and this is where, generally, you take your vacation, after a game has shipped.
Gaspard Petit (11:20): The exhilaration of living the moment where the game is out, monitoring it, seeing whether players are disconnecting or having some problems, monitoring the metrics, seeing that the game is performing as expected or not. And then you get into other interesting things you can do on the back-end, which I couldn't do on the game, like fixing the game after it has shipped. So for example, you discover that the balancing is off. Something on the game doesn't work as expected. But you have a way of somehow figuring out from the back-end how you can fix it.
Gaspard Petit (11:54): Of course, ideally, you would fix in the game. But nowadays, it's not always easy to repackage the game on each platform and deliver it on time. It can take a couple of weeks to fix the game from the code. So whatever we can fix from the back-end, we do. So we need to have the proper tools for monitoring this humongous amount of data coming our way. And then we have this creativity kicking in saying, "Okay, I've got this data, how can I act on it to make the game better?" So I still get those feelings from the back-end.
Michael Lynn (12:25): And I feel like the line between back-end and front-end is really blurring lately. Anytime I get online to play a game, I'm forced to go through the update process for many of the games that I play. To what degree do you have flexibility? I'll ask the question this way. How frequently are you making changes to games that have already shipped?
Gaspard Petit (12:46): It's not that frequent. It's not rare, either. It's somewhere in between. Ideally, we would not have to make any changes after the game is out. But in practice, the games are becoming so complex, they no longer fit on a small 32 megabyte cartridge. So there's a lot of things going on in the game. They're huge. It's almost impossible to get them perfectly right, and deliver them within a couple of years.
Gaspard Petit (13:16): And there's also a limitation to what you can test internally. Even with a huge team of QA, you will discover things only when players are experiencing the game. Like I said the flow of fixing the game is long. You hear about the report on Reddit or on Twitter, and then you try to reproduce it internally right there. It might take a couple of days to get the same bug the player has reported. And then after that, you have to figure out in the code how you can fix it, make sure you don't break anything else. So it can take literally weeks before you fix something very trivial.
Gaspard Petit (13:55): On the back-end, we can try it out: we can segment a specific fix for a single player and make sure it works for that player, do some blue-green introduction of that test, or do it only on staging first, make sure it works, and then do it on production. And sometimes, I would say, a fix has come out within a couple of hours: in some cases we noticed it on production, went to staging and then to production within the same day with something that would fix the game.
Gaspard Petit (14:25): So ideally, you would put as much as you can on the back-end because you have so much agility from the back-end. I know players are sometimes cold about this idea of using back-ends for games because they see it as a threat. I don't think they realize how much they can benefit from fixes we do on the back-end.
Nic Raboy (14:45): So in regards to the back-end that you're heavily a part of, what typically goes in to the back-end? I assume that you're using quite a few tools, frameworks, programming languages, maybe you could shed some light onto that.
Gaspard Petit (14:57): Oh yes, sure. So typically, in almost every project, there is some telemetry that is useful for us to monitor that the game is working, like I said, as expected. We want to know if the game is crashing, we want to know if players are stuck on a level and they can't get past it, if there's an achievement that doesn't unlock, or something that should be happening and doesn't happen. So we want to make sure that we're monitoring these things.
Gaspard Petit (15:23): Depending on the project, we have community features. For example, comparing what you did in the Life is Strange series to what the community did, and sometimes it will be engagements or creating challenges that will change on a weekly basis. In some cases, recently for Outriders for example, we have the whole save game saved online, which means two things, right? We can get an idea of the state of each player, but we can also fix things. So it really depends on the project. It goes from simple telemetry, just so we know that things are going okay or so we can act on it, to adding some game logic that gets executed on the back-end.
Michael Lynn (16:09): And what are the frameworks and development tools that you leverage?
Gaspard Petit (16:12): Yes, sorry. So the back-ends we write are written in Java. We have different tools we use outside of the back-end. We deploy on Kubernetes. Almost everything is Dockerized at this point. We use MongoDB as the main storage, Redis as ephemeral storage. We also use Kafka for the telemetry pipeline to make sure we don't lose them and can process them asynchronously. Jenkins for building. So this is pretty much our environment.
Gaspard Petit (16:45): We also work on the game integration, which is in C++ and C#. So our team actually does some C++ development, where we try to make an HTTP client, a C++ client, that is cross-platform and as efficient as possible, so that it impacts the frame rate as little as possible, even if sometimes it means downloading things a little bit slower or not ticking as many ticks. We customize our HTTP client to make sure that the online impact is minimal on the gameplay. So our team is in charge of both this client integration into the game and the back-end development.
Michael Lynn (17:24): So those HTTP clients, are those custom SDKs that you're providing your own internal developers for using?
Gaspard Petit (17:31): Exactly, so it's our own library that we maintain. It makes sure that what we provide can authenticate correctly with the back-end and has the right way to communicate with it, the right retries, the right queuing. So we don't have to enforce through policies to each game team how to connect to the back-end. We can bundle these policies within the SDK that we provide to them.
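To illustrate what bundling a retry policy into a client library can look like, here is a minimal sketch in Java. It is not Square Enix's SDK (which, per the transcript, is written in C++); the class name `RetryingCaller`, the exponential backoff schedule, and the attempt limit are all assumptions made for the example.

```java
import java.util.function.Supplier;

// Hypothetical helper: the caller never decides how to retry, the library does.
class RetryingCaller {
    private final int maxAttempts;
    private final long baseDelayMillis;

    RetryingCaller(int maxAttempts, long baseDelayMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        this.maxAttempts = maxAttempts;
        this.baseDelayMillis = baseDelayMillis;
    }

    // Delay before retry number `retry` (1-based): base, 2*base, 4*base, ...
    long delayForRetry(int retry) {
        return baseDelayMillis << (retry - 1);
    }

    // Runs the request, retrying on runtime exceptions until maxAttempts is reached.
    <T> T call(Supplier<T> request) {
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return request.get();
            } catch (RuntimeException e) {
                last = e;
                if (attempt < maxAttempts) {
                    sleep(delayForRetry(attempt));
                }
            }
        }
        throw last; // every attempt failed, surface the last error
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting to retry", ie);
        }
    }

    public static void main(String[] args) {
        RetryingCaller caller = new RetryingCaller(3, 10);
        int[] calls = {0};
        // Simulated request that fails twice, then succeeds.
        String result = caller.call(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("simulated network failure");
            }
            return "ok";
        });
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Because the policy lives inside the library, a game team only writes `caller.call(...)`; changing the backoff or the attempt limit does not require touching game code.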
Michael Lynn (17:57): So what advice would you have for someone that's just getting into developing games? Maybe some advice for where to focus on their journey as a game developer?
Gaspard Petit (18:08): That's a great question. The advice I would give is, it starts of course, being passionate about it. You have to because there's a lot of work in the gaming, it's true that we do a lot of hours. If we did not enjoy the work that we did, we would probably go somewhere else. But it is fun. If you're passionate about it, you won't mind as much because the success and the feeling you get on each release compensates the effort that you put into those projects. So first, you need to be passionate about it, you need to be wanting to get those projects and be proud of them.
Gaspard Petit (18:46): And then I would say not to focus too much on one aspect of gaming because at first, I did several things, right? My studies were on the image processing, I wanted to do 3D rendering. At first, that was my initial goal as a teenager. And this is definitely not what I ended up doing. I did almost everything. I did a little bit of rendering, but almost none. I ended up in the back-end. And I learned that almost every aspect of the game development has something interesting and challenging.
Gaspard Petit (19:18): So I would say not to focus too much on doing the physics or the rendering; sometimes you might end up doing the audio and that is still something fascinating. How you can place your audio within the scene and make it sound like it comes from one place, and hit the walls. And then in each aspect, you can dig and do something interesting. And the games now, at least within Square Enix, they're too big for one person to do it all. So generally, you will be part of a team anyway. And within that team, there will be something challenging to do.
Gaspard Petit (19:49): And even the back-end, I know not so many people consider back-end as their first choice. But I think that's something that's actually a mistake. There is a lot of interesting things to do with the back-end, especially now that there is some gameplay happening on back-ends, and increasingly more logic happening on the back-end. I don't want to say that one is better than the other, of course, but I would personally not go back, and I never expected to love it so much. So be open-minded and be passionate. I think that's my general advice.
Michael Lynn (20:26): So speaking of back-end, can we talk a little bit about how Square Enix is leveraging MongoDB today?
Gaspard Petit (20:32): So we've been using MongoDB for quite some time. When I joined the team, it was already being used. We were on, I think, version 2.4. MongoDB had just implemented authentication on collections, I think. So quite a while ago, and I saw it evolve over time. If I can share this, I remember my first day on the team hitting MongoDB. And I was coming from a SQL-like world, and I was thinking, "What is this? What is this query language and JSON?" And of course, I couldn't query anything at first because the syntax all seemed completely strange to me. And I didn't understand anything about sharding, anything about chunking, anything about how the database works. So it actually took me a couple of months, I would say, before I started appreciating what Mongo did, and why it had been picked.
Gaspard Petit (21:27): So it had been recommended, if I remember, I don't want to say incorrect things, but I think it had been recommended before my time. It was a consulting team that had recommended MongoDB for the gaming. I wouldn't be able to tell you exactly why. So over time, what I realized is that MongoDB was perfect for our processes because there weren't any columns predefined, any schema, we could just add fields. If the fields were missing, it wasn't a big deal: in code in the back-end, we could just set them to default values.
Gaspard Petit (22:03): And why this is important is because the game team generally doesn't know. I don't want to say the game team actually, the designers or the producer, they don't know ahead of time, what the final game will look like, this is something that evolves. You play, you do a prototype of it, you like it, you don't like it, you undo something, you redo something, you go back to something you did previously, and it keeps changing as the game evolves. It's very rare that I've seen a game production go straight from point A to Z without twirling a little bit and going back and forth.
Gaspard Petit (22:30): So that back and forth process is cumbersome for the back-end. You're asked to implement something before the requirements are set in stone, you have to deliver it so the game team can experience it, and then they'll iterate on it. And if you're set in stone on your database, and each time that you change something, you have to migrate your data, you're wasting an awful lot of time. And, like I said, after a couple of months it became obvious that MongoDB was a perfect fit for that because the game team would ask us, "Hey, I need now to store this thing, or can you change this type for that type?" And it was seamless: we would change a string for an integer, we would add a field to a document, and that was it. No migration. If we needed, the back-end would catch the cases where a field was missing and use a default value. But that was it.
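The pattern described here, where back-end code tolerates a missing field or a changed type instead of migrating stored data, can be sketched as follows. This is an illustrative Java example only: a plain `Map` stands in for a MongoDB document, and the class name `PlayerProfileReader`, the `level` field, and the default value are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class PlayerProfileReader {
    // Hypothetical default applied when an older document lacks the field.
    static final int DEFAULT_LEVEL = 1;

    // Reads "level" from a document, tolerating a missing field
    // or an older string-typed value.
    static int readLevel(Map<String, Object> doc) {
        Object raw = doc.get("level");
        if (raw instanceof Number) {
            return ((Number) raw).intValue();          // current representation
        }
        if (raw instanceof String) {
            try {
                return Integer.parseInt((String) raw); // older documents stored a string
            } catch (NumberFormatException e) {
                return DEFAULT_LEVEL;
            }
        }
        return DEFAULT_LEVEL;                          // field absent or unexpected type
    }

    public static void main(String[] args) {
        Map<String, Object> oldDoc = new HashMap<>();  // written before "level" existed
        oldDoc.put("playerId", "p1");

        Map<String, Object> stringDoc = new HashMap<>();
        stringDoc.put("level", "7");

        Map<String, Object> intDoc = new HashMap<>();
        intDoc.put("level", 12);

        System.out.println(readLevel(oldDoc));
        System.out.println(readLevel(stringDoc));
        System.out.println(readLevel(intDoc));
    }
}
```

All three documents can live side by side in the same collection; only the reading code needs to know about the older shapes.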
Gaspard Petit (23:19): And we were able to progress with the game team as they evolved their design, we were able to follow them quite rapidly with our non-schema database. So now I wouldn't switch back. I've got used to the JSON query language; I think human beings get used to anything. And once you're familiar with something, you don't want to learn something else. And I ended up learning the Mongo syntax, and now I'm actually very comfortable with it. I do aggregation on the command line, these kinds of things. So it's just something you have to be patient with if you haven't used MongoDB before. At first, it looks a little bit weird, but it quickly becomes quite obvious why it is designed that way. It's actually very intuitive to use.
Nic Raboy (24:07): In regards to game development in general, who is determining what the data should look like? Is that the people actually creating the local installable copy of the game? Or is that the back-end team deciding what the model looks like in general?
Gaspard Petit (24:23): It's a mix of both. Our team acts as an expert team, so we don't dictate where the back-end should be. But since we've been on multiple projects, we have some experience on the good and bad patterns. And in MongoDB it's not always easy, right? We've been hit pretty hard with anti-patterns in the past. So we would now jump right away if the game team asks us to store something in a way that we knew would not perform well when scaling up. So we're cautious about it, but in general, the requirements come from the game team, and we translate that into a database schema. That said, in a few cases, the game team knows exactly what they want. And in those cases, we generally just store their data as a raw string on MongoDB. And then we can process it back, whether it's JSON or whatever other format they want. We give them a field saying, "This belongs to you, and use whatever schema you want inside of it."
Gaspard Petit (25:28): But of course, then they won't be able to run any query on that data. It's more of a storage than anything else. If they need to perform operations, then we're definitely involved because we want to make sure that they will be hitting the right indexes, that the sharding will be done properly. So it's a combination of both sides.
Michael Lynn (25:47): Okay, so we've got MongoDB in the stack. And I'm imagining that as a developer, I'm going to get a development environment. And tell me about the way that as a developer, I'm interacting with MongoDB. And then how does that transition into the production environment?
Gaspard Petit (26:04): Sure. So every developer has a local MongoDB, we use that for development. So we have our own. Right now it's a docker-compose image. And it has a full virtual environment. It has all the other components I mentioned earlier, it has Kafka, it even has LDAP, it has a bunch of things running virtually including MongoDB. And it is even configured as a sharded cluster. So we have a local sharded cluster on each of our machines to make sure that our queries will work fine on the actual sharded cluster. So it's actually very close to production, even though it's on our local PC. And we start with that, we develop in Java and write our unit tests to make sure we cover what we write and don't have regressions. And those unit tests will run against a local MongoDB instance.
Gaspard Petit (26:54): At some point, when we are about to release something on production, especially when there's a lot of changes, we want to make sure we do load testing. For our load testing, we have something else, and I am not sure that it's a very well-known feature from MongoDB, but it's extremely useful for us. It's the MongoDB Operator, which is an operator within Kubernetes. And it allows spinning up clusters based on a simple YAML. So you can say, "I want a sharded cluster with three deep, five shards," and it will spin it up for you. It will take a couple of seconds or a couple of minutes depending on what you have in your YAML. And then you have it. You have your cluster configured in your Kubernetes cluster. And then we run our tests on this. It's a new cluster, fresh. Run the full test, simulate millions of requests of users, destroy it. And then if we're wondering, "You know what? Does our back-end scale with the number of shards?" then we just spin up a new sharded cluster with twice the number of shards, expect twice the performance, and run the same test again. Generally, we won't get exactly twice the performance, right? But we will get an idea of, this operation would scale with the number of shards, and this one wouldn't.
Gaspard Petit (28:13): So that Operator is very useful for us because it allows us to simulate these scenarios very easily. There's very little work involved in spinning up these Kubernetes clusters.
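The comparison described above, running the same load test against clusters with different shard counts and checking how close the result comes to "twice the performance", boils down to a small piece of arithmetic. The Java sketch below shows one way to express it; the class name `ShardScaling` and all throughput numbers are invented for illustration and are not measurements from Square Enix.

```java
class ShardScaling {
    // Ratio of observed speed-up to ideal speed-up when going from
    // baseShards to scaledShards. 1.0 means perfectly linear scaling;
    // values near (baseShards / scaledShards) mean the operation did not scale.
    static double scalingEfficiency(int baseShards, double baseOpsPerSec,
                                    int scaledShards, double scaledOpsPerSec) {
        if (baseShards <= 0 || scaledShards <= 0 || baseOpsPerSec <= 0) {
            throw new IllegalArgumentException("shard counts and base throughput must be positive");
        }
        double observedSpeedup = scaledOpsPerSec / baseOpsPerSec;
        double idealSpeedup = (double) scaledShards / baseShards;
        return observedSpeedup / idealSpeedup;
    }

    public static void main(String[] args) {
        // Made-up load-test results for two operations, 5 shards versus 10 shards.
        double saveGame = scalingEfficiency(5, 10_000, 10, 18_000);
        double leaderboard = scalingEfficiency(5, 10_000, 10, 11_000);
        System.out.printf("save game efficiency:   %.2f%n", saveGame);
        System.out.printf("leaderboard efficiency: %.2f%n", leaderboard);
    }
}
```

With these invented numbers, the first operation keeps most of the ideal doubling while the second barely improves, which is the kind of signal that separates operations that scale with the number of shards from those that do not.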
Gaspard Petit (28:23): And then when we're satisfied with that, we go to Atlas, which provides us the deployment of the cloud-ready clusters. So this is not me personally who does it, we have an ops team who handle this, but they will prepare for us, through Atlas, the final database that we want to use. We work together to find the number of shards, the type of instance we want to deploy. And then Atlas takes care of it. We benefit from disk auto-scaling on Atlas. We generally start with a lower instance type to set up the database, and when the big day approaches for the game release, we scale up the instance type, again through Atlas.
Gaspard Petit (29:10): In some cases, we've realized that the number of shards was insufficient after testing, and Atlas allows us to make these changes quite close to the launch date. So what that means is that we can have a good estimate a couple of weeks before the launch of our requirements in terms of infrastructure, but if we're wrong, it doesn't take that long to adjust and say, "Okay, you know what? We don't need five shards, we need 10 shards." And especially if you're before the launch, you don't have that much data. It just takes a couple of minutes, a couple of hours for Atlas to redeploy these things and get the database ready for us. So it goes in those three stages of going local for unit testing with our own image of Mongo. We have a Kubernetes cluster for load testing which uses the Mongo Operator, and then we use Atlas in the end for the actual cloud deployment.
Gaspard Petit (30:08): We actually go one step further when the game is getting old and load is predictable on it. And it's not as high as it used to be, we move this database in-house. So we have our own data centers. And we will actually share Mongo instances for multiple games. So we co-host multiple games on a single cluster, not a single database, of course, but a single Mongo cluster. And that becomes very, very cost effective. We get to see, for example, if there's a sale on one game, while the other games are less active, it takes a bit more load. But next week, something else is on sale, and they kind of average out on that cluster. So older games, I'm talking like four- or five-year-old games, tend to be moved back to on-premises for cost effectiveness.
Nic Raboy (31:00): So it's great to know that you can have that choice to bring games back in when they become old, and you need to scale them down. Maybe you can talk about some of the other benefits that come with that.
Gaspard Petit (31:12): Yeas. And while it also ties in to the other aspects I mentioned of. We don't feel locked with MongoDB, we have options. So we have the Atlas option, which is extremely useful when we launch a game. And it's high risk, right? If an incident happened on the first week of a game launch, you want all hands on deck and as much support as you can. After a couple of years, we know the kind of errors we can get, we know what can go wrong with the back-end. And generally the volume is not as high, so we don't necessarily need that kind of support anymore. And there's also a lot of overhead on running things on the cloud, if you're on the small volume. There's not just the Mongo itself, there's the pods themselves that need to run on a compute environment, there's the traffic that is counting.
Gaspard Petit (32:05): So we have that data center. We actually have multiple data centers, we're lucky to be big enough to have those. But it gives us this extra option of saying, "We're not locked to the cloud, it's an option to be on the cloud with MongoDB." We can run it locally in Docker, we can run it on the cloud; we can control where we go. And this has been a key element in the architecture of our back-ends from the start actually, making sure that every component we use can be virtualized, brought back on-premises so that we can control it locally. For example, we can run tests and have everything controlled, not depending on the cloud. But we also get the opportunity of getting an external team looking at the project with us at the critical moments. So I think we're quite happy to have those options of running it wherever we want.
Michael Lynn (32:56): Yeah, that's clearly a benefit. Talk to me a little bit about the scale. I know you probably can't mention numbers and transactions per second and things like that. But this is clearly one of the challenges in the gaming space, you're going to face massive scale. Do you want to talk a little bit about some of the challenges that you're facing, with the level of scale that you're achieving today?
Gaspard Petit (33:17): Yes, sure. That's actually one of the challenging aspects of the back-end, making sure that you won't hit a ceiling at some point or an unexpected ceiling. And there's always one, you just don't always know which one it is. When we prepare for a game launch, regardless of its success, we have to prepare for the worst, the best success. I don't know how to phrase that. But the best success might be the worst case for us. But we want to make sure that we will support whatever number of players comes our way. And we have to be prepared for that.
Gaspard Petit (33:48): And depending on the scenarios, it can be extremely costly to be prepared for the worst/best. Because it might be that you have to over scale right away, and make sure that your ceiling is very high. Ideally, you want to hit something somewhere in the middle where you're comfortable that if you were to go beyond that, you would be able to adjust quickly. So you sort of compromise between the cost of your launch and the risk, and get to a point where you feel comfortable saying, "If I were to hit that and it took 30 minutes to recover, that would be fine." Nobody would mind because it's such a success that everyone would understand at that point. That ceiling has to be pretty high in the gaming industry. We're talking millions of concurrent users that are connecting within the same minute, are making queries at the same time on their data. It's a huge number. It's difficult, I think, even for the human mind to comprehend these numbers when we're talking millions.
Gaspard Petit (34:50): It is a lot of requests per second. So it has to be distributed in a way that will scale, and that was also one of the things that I realized Mongo did very well with the mongos and the mongod split in a sharded cluster, where you pretty much have as many databases as you want, you can split the workload across as many databases as you want, with the mongos routing it to the right place. So if you're hitting your ceiling with two shards, and you add two more shards, in theory, you can get twice the volume of queries. For that to work, you have to be careful, you have to shard appropriately. So this is where you want to have some experience and you want to make sure that your shard key is well picked. This is something we've tuned over the years, as we've had different experiences with different shard keys.
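Editor's note: the "two more shards, twice the volume" point can be illustrated with a toy router. The sketch below is not MongoDB's chunk-based routing and does not talk to a database. It assumes a key that spreads evenly (here a SHA-256 hash of a made-up user ID, taken modulo the shard count) and simply counts how many requests each shard would receive. The function names `shard_for` and `per_shard_load` and the workload are hypothetical.

```python
import hashlib
from collections import Counter


def shard_for(key: str, num_shards: int) -> int:
    """Toy router: map a key to a shard index with a stable hash.

    Illustration only; a real mongos routes by chunk ranges, not modulo.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def per_shard_load(keys, num_shards):
    """Count how many requests each shard would receive."""
    counts = Counter(shard_for(k, num_shards) for k in keys)
    return [counts.get(i, 0) for i in range(num_shards)]


# Hypothetical workload: one request for each of 100,000 made-up user IDs.
requests = [f"user-{i}" for i in range(100_000)]

for shards in (2, 4):
    load = per_shard_load(requests, shards)
    print(shards, "shards -> busiest shard handles", max(load), "requests")
```

With an evenly spread key, the busiest shard in the four-shard run should carry roughly half of what it carries in the two-shard run. A poorly chosen key would break that property, which is why the shard key needs care.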
Gaspard Petit (35:41): For us, I don't know if everyone in gaming is doing it this way, but what seems to be the most intuitive and most convenient shard key is the user ID, and we hash it. This way it goes to... Every user profile goes to a random shard, and we can scale Mongo with pretty much the number of users we have, which is generally what tends to go up and down in our case.
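Editor's note: a hashed shard key on the user ID could be requested with MongoDB's `enableSharding` and `shardCollection` admin commands. The sketch below only builds the command documents and does not connect to anything. The database name `game`, the collection name `profiles`, the field name `userId`, and the helper `sharding_commands` are assumptions for illustration, not details from the episode.

```python
def sharding_commands(database: str, collection: str, key_field: str):
    """Build admin command documents for hash-sharding one collection.

    The command name must be the first key of each document.
    """
    namespace = f"{database}.{collection}"
    return [
        # Some MongoDB versions may not require this first step.
        {"enableSharding": database},
        # "hashed" asks MongoDB to shard on a hash of the field's value.
        {"shardCollection": namespace, "key": {key_field: "hashed"}},
    ]


# Hypothetical names, chosen only for this example.
for command in sharding_commands("game", "profiles", "userId"):
    print(command)
```

In practice, documents like these would be sent through a mongos to the `admin` database, for example with a driver's generic command call. Check the documentation for your server and driver versions before relying on the exact shape.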
Gaspard Petit (36:05): So we've had a couple of projects, we've had smaller clusters on one, two. We pretty much never have one shard, but two shards, three shards. And we've been up to 30 plus shards in some cases, and it's never really been an issue. The size, Mongo wise, I would say. There's been issues, but it wasn't really with the architecture itself, it was more of the query pattern, or in some cases, we would pull too much data in the cache. And the cache wasn't used efficiently. But there was always a workaround. And it was never really a limitation on the database. So the sharding model works very well for us.
Michael Lynn (36:45): So I'm curious how you test in that type of scale. I imagine you can duplicate the load patterns, but the number of transactions per second must be difficult to approximate in a development environment. Are you leveraging Atlas for your production load testing?
Gaspard Petit (37:04): No. Well, yes and no. The initial tests are done on Kubernetes using the Mongo Operator. So this is where we will simulate. For one operation, we will test: will it scale with instance type? So adding more CPU, more RAM. Will it scale with the number of shards? So we do this grid on each operation that the players might be using ahead of time. At some point, we're comfortable that everything looks right. But testing each operation individually doesn't mean that they will all work fine, that they will all play fine when they're mixed together. So the final mix goes through either the production database, if it's not being used yet, or a copy, something that would look like the production database in Atlas.
Gaspard Petit (37:52): So we spin up an Atlas database, similar to the one we expect to use in production. And we run the final load test on that one, just to get clear numbers, with the real components, of what it will look like. So it's not necessarily the final cluster we will use, sometimes it's a copy of it. It depends on whether it's available: sometimes there's already certification ongoing, or QA is already testing on production. So we can't hit the production database for that, so we just spin up a different instance of it.
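Editor's note: the per-operation grid described above (instance type against number of shards) can be enumerated as a simple Cartesian product. This is a minimal sketch of how such a test matrix might be generated, not Square Enix's actual tooling. The operation names, instance sizes, and shard counts are made up, and `build_test_grid` is a hypothetical helper.

```python
from itertools import product


def build_test_grid(operations, instance_types, shard_counts):
    """Return one load-test case per (operation, instance type, shard count)."""
    return [
        {"operation": op, "instance_type": inst, "shards": n}
        for op, inst, n in product(operations, instance_types, shard_counts)
    ]


# All names and sizes below are invented for illustration.
grid = build_test_grid(
    operations=["load_profile", "save_profile"],
    instance_types=["small", "large"],
    shard_counts=[2, 4, 8],
)

# 2 operations x 2 instance types x 3 shard counts = 12 test cases.
print(len(grid))
for case in grid[:3]:
    print(case)
```

Each case would then be run in isolation on the test cluster. As noted in the answer, passing every cell of the grid individually still does not guarantee the mixed workload behaves, so a final combined run remains necessary.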
Nic Raboy (38:22): So this episode has been fantastic so far. I wanted to leave it open for you to give us, or the listeners I should say, any kind of last-minute words of wisdom or anything that we might have missed that you think would be valuable for them to walk away with.
Gaspard Petit (38:38): Sure. So maybe I can share something about why I think we're efficient at what we do and why we're still enjoying the work we're doing. And it has to do a little bit with how we're organized within Square Enix with the different teams. I mentioned earlier that our interaction with the game team was not so much to dictate how the back-end should be for them, but rather to act as experts. And this is something I think we're lucky to have within Square Enix, where our operation team and our development team are not necessarily acting purely as service providers. And this touches Mongo as well, the way we integrate Mongo in our ecosystem is not so much at... It is in part, "Please give us databases, please make sure they're healthy and working and give us support when we need it." But it's also about tapping into different teams as experts.
Gaspard Petit (39:31): So Mongo for us is a source of experts: if we need recommendations about shards, query patterns, even how to use the Java driver, we get a chance to ask MongoDB experts and get accurate feedback on how we should be doing things. And this translates to every level of our processes. We have the ops team that will of course be monitoring and making sure things are healthy, but they're also acting as experts to tell us how the development should be going or what the best practices are.
Gaspard Petit (40:03): The back-end dev team does the same thing with the game dev team, where we will bring them our recommendations of how the game should use, consume the services of the back-end, even how they should design some features so that it will scale efficiently or tell them, "This won't work because the back-end won't scale." But act as experts, and I think what's been key for our success is making sure that each team is not just a service provider, but is also bringing expertise to the table so that each other team can be guided in the right direction.
Gaspard Petit (40:37): So that's definitely one of the things that I've appreciated over my years. And it's been pushed down from management to every developer, where we have this mentality of acting as experts to others. So we have that embedded engineers model, where we have some of our folks within our team dedicated to the game teams. And same thing with the ops team, they have embedded engineers from their team dedicated to our team, making sure that we're not in silos. So that's definitely a recommendation I would give to anyone in this industry, making sure that the silos are broken and that each team is teaching other teams about their best practices.
Michael Lynn (41:21): Fantastic. And we love that customers are willing to partner in that way and leverage the teams that have those best practices. So Gaspard, I want to thank you for spending so much time with us. It's been wonderful to chat with you and to learn more about how Square Enix is using MongoDB and everything in the game space.
Gaspard Petit (41:40): Well, thank you very much. It was a pleasure.
Automated (41:44): Thanks for listening. If you enjoyed this episode, please like and subscribe. Have a question or a suggestion for the show? Visit us in the MongoDB community forums at community.mongodb.com.