Santa Claus and His Distributed Team

Norberto Leite


So here it comes again, that happy season when Santa visits and brings the presents that we've earned for being so "well behaved" throughout the year.

Well… it happens that there are a lot of requests (over 6 billion of them!), requiring the Elves to build a very scalable system to support so many gift deliveries. The system must handle not only the total number of requests, but also the concentration of those requests around the holiday season.

And of course, let's not forget the different types of websites we need to build according to the regional variations in requests (wool pajamas are likely to be requested near the North Pole, but not so much in the tropics!). To make sure that all requests are well served, we need different applications covering this immense variability.

Architecture design

In order to deliver all presents in time for Christmas, Santa has asked Elf Chief Architect (ECA) to build a distributed, globally scalable platform so that regional Elves could build their apps to meet the needs of their local users.

Apparently the ECA has been attending several MongoDB Days conferences and will certainly be attending MongoDB World, but one of the things that intrigued him was a talk about distributed container platforms backed by a MongoDB sharded cluster. The ECA has a great deal of experience scaling databases with MongoDB, but containers have been gaining a lot of traction, so he decided to give them a go.

The first thing the ECA did was deploy a container fleet across different data centers around the world.

Schema design

Using tag-aware sharding, the Elf team in Brazil can now build their app for South America with complete independence from the Japanese team.
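Setting that up from the mongo shell might look roughly like this (a sketch: the santa database, the region-prefixed shard key, and the shard names are all assumptions for illustration):

// enable sharding on the (hypothetical) santa database
sh.enableSharding("santa")

// shard the presents collection on an assumed region-prefixed key
sh.shardCollection("santa.presents", { region: 1, _id: 1 })

// tag the shards living in each regional data center (names are made up)
sh.addShardTag("shard-brazil", "SA")
sh.addShardTag("shard-japan", "JP")

// pin each region's chunks to the matching shards
sh.addTagRange("santa.presents", { region: "SA", _id: MinKey }, { region: "SA", _id: MaxKey }, "SA")
sh.addTagRange("santa.presents", { region: "JP", _id: MinKey }, { region: "JP", _id: MaxKey }, "JP")

The balancer then migrates each region's chunks to the tagged shards, so South American requests live on hardware close to their users.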

// brazil team
// focus on 4 languages
// dutch - yes, Suriname and the Dutch Antilles are located in SA!
// schema design for present requests
{
  _id: "surf_board_12334244",
  color: "azul e amarelo",
  size: "M",
  present_for: "Ze Pequeno",
  address: {
    street: "Rua Noe 10",
    city: "Cidade de Deus",
    zip: "22773-410 - RJ",
    geo: {
        "type": "Point",
        "coordinates": [ -43.36263209581375, -22.949136390313374 ]
    }
  }
}

// japan team
// focus on 2 languages
// schema design for present requests
{
  _id: { inc: 5535345, name: "Maneki Neko" },
  shape: "Round",
  receiver: "Shinobu Tsukasa",
  street: "〒東京都中央区 日本橋3-1-15, 1F",
  city: "Tokio",
  zip: "103-0027 - TK",
  position: {
      "type": "Point",
      "coordinates": [ ... ]
  }
}

As one can see from the mismatched schema designs above, the two teams have considerably different views on how to model some important base information.

We start with the simple but quite important format of _id.

On the Brazil team, a simple composite string is enough to uniquely identify the intended gift, while in Japan they adopted a different strategy, setting _id as a sub-document with two fields: an incremental value (inc) and the name of the object (name).

While both are valid MongoDB schemas, and can coexist in the same collection (sharded or not), this situation can cause some "interesting" side effects:

  • more complex queries to find all intended elements
  • index inefficiencies due to the multiple data types that we are indexing
  • sorting issues
  • ordering of keys in sub-documents will matter
  • specific match criterion
> db.test.find()
{ "_id" : { "a" : 1234, "b" : 5678 } }
> db.test.find( { _id : { b : 5678, a : 1234 } } ) <- No Match! Wuh?
> db.test.find( { _id : { a : 1234, b : 5678 } } ) <- But this does? What's the diff?

This can become a very hairy situation!

Although flexibility is one of the most loved and appreciated attributes of MongoDB, one needs to be aware that "with great power comes great responsibility".

One of the basic rules of good schema design is to have a common base structure for documents stored in the same collection. This common structure should be a set of fields that are generally the same for different applications, with agreed upon data types and formats.

This uniform data structure makes everyone's life much simpler and of course, when Santa wants to get a list of all presents he needs to deliver (yes, Santa does his own queries with the MongoDB shell!) he does not need to build a large $or statement with all the variations that schema might contain.
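Without that common structure, even Santa's simple delivery query would have to enumerate every field variation, something like this (a sketch using the two receiver field names from the schemas above):

db.presents.find( { $or: [
    { present_for: "Ze Pequeno" },
    { receiver: "Ze Pequeno" }
] } )

Every new schema variant would add another branch to that $or, and each branch typically needs its own index to stay efficient.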

Document Validation

Now, we all know that even with the best intentions, with a distributed team like Santa's regional super expert Elf developers, sometimes we change the schema involuntarily, either by implementing a new schema that slightly changes a data type, or by changing the format of a given field.

To avoid issues like these, we introduced Document Validation in MongoDB 3.2. Document validation enforces guarantees on the structural definition of your documents.

Validation of incoming write operations is defined at the collection level on the server. In MongoDB this setup is very flexible, and we can adjust the validation rules as our application evolves. All incoming write operations, whether new writes or updates, are matched against the predefined validator rules so the operation can be acknowledged or rejected.

The validator's behavior can also be adjusted per operation, giving the system a way to bypass the validation rules for certain write operations, or to change the validation action so that, instead of rejecting the write, the server just logs a warning in the mongod log.
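For example, a validator capturing the agreed base structure could be set when creating the collection (a sketch; the rule shown only checks for the present_for field, real rules would cover more of the shared schema):

db.createCollection( "presents", {
    validator: { present_for: { $exists: true } },
    validationLevel: "strict",  // apply the rules to all inserts and updates
    validationAction: "error"   // reject non-conforming writes ("warn" just logs them)
} )

The rules can later be relaxed or tightened with the collMod command, for instance switching validationAction to "warn" while the teams migrate older documents.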

db.presents.insert( { 
  _id: "skate_board_434", 
  color: "blue",
  for: "Mariott"
} )  -> results in an error since it's missing the `present_for` field

This is particularly interesting for distributed, multi-app, multi-versioned environments that have multiple teams working over the same dataset, with all sorts of different roles (developers, sys admins, DBAs …).

The minute Elf Chief Architect read about this feature he jumped in his warm, comfortable, well cushioned sofa and started playing around with the existing release candidate! "What a great feature" some Elves reported hearing.

Lookup operator

Now, one of Santa's main responsibilities is to make sure that only well behaved children actually receive presents.

In the past, the Elves would handle this by putting together a list of all the poorly behaved children (far fewer than the well behaved ones), marking the presents that matched this list with _deserves: false, and then filtering those presents out of the results.
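That marking pass might have looked something like this (a sketch; the field names follow the schemas in this post, and the naughty list is assumed to be available as a plain array):

// names of the poorly behaved children (built by the Elves elsewhere)
var naughtyList = [ "Ze Pequeno", "Shinobu Tsukasa" ];

// flag every present destined for a child on that list
db.presents.update(
    { present_for: { $in: naughtyList } },
    { $set: { _deserves: false } },
    { multi: true }
);

// and filter the flagged presents out of the delivery list
db.presents.find( { _deserves: { $ne: false } } );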

While this was efficient, since it was an in-place update over a given list, it still added an extra write operation batched over 6 billion children (we are all children inside!). 6 billion * 16 (16 being the average number of presents each child gets at Christmas in the UK) tends to become a massive operation, but nothing that MongoDB can't handle. To avoid changing the data at all, another option is to filter with an aggregation operation. Since we just need a report at the end of the present submission period, what the Elves refer to as CFP (call for presents), the ECA decided to test the new $lookup operator.

The Elves decided on the following architecture. First, a collection of all children and how well behaved they've been this year:

// collection children 
> db.children.find()
{
  name: "Norberto Leite",
  behaved: true,
  note: "Deserves all the presents possible!"
}
{
  name: "Ze Pequeno",
  behaved: false,
  note: "very nasty gangster!"
}
{
  name: "Shinobu Tsukasa",
  behaved: false,
  note: "japanese mafia member"
}

And another collection with all the presents that our parents and friends submitted on our behalf:

// presents collection
> db.presents.find()
{
  _id: "5535345_Maneki Neko",
  shape: "Round",
  receiver: "Shinobu Tsukasa",
  street: "〒東京都中央区 日本橋3-1-15, 1F",
  city: "Tokio",
  zip: "103-0027 - TK",
  geo: {
      "type": "Point",
      "coordinates": [ ... ]
  }
}
{
  _id: "surf_board_12334244",
  color: "azul e amarelo",
  size: "M",
  receiver: "Ze Pequeno",
  address: {
    street: "Rua Noe 10",
    city: "Cidade de Deus",
    zip: "22773-410 - RJ",
    geo: {
        "type": "Point",
        "coordinates": [ -43.36263209581375, -22.949136390313374 ]
    }
  }
}
...

Given these two collections, we can perform a left outer join between them using the aggregation framework:

db.children.aggregate([
  { "$match": { "behaved": true } },
  { "$lookup": {
      "from": "presents",
      "localField": "name",
      "foreignField": "present_for",
      "as": "presents"
  } }
])

With this instruction, we collect all presents for each well behaved child and set those values in the presents field:

{
  name: "Norberto Leite",
  behaved: true,
  presents: [ { _id: "play_that_box_34432", ... } ]
}


The Elf Chief Architect was really pleased with the present MongoDB delivered this year. Not only can he make much more precise decisions around how data is handled by the different teams across the globe, he can also accommodate some known regional challenges:

  • distribution - sharding
  • schema variation - document validation
  • enhancement of technical expertise - lots of different drivers
  • complex queries across different collections - $lookup
  • good integration with container architecture

There are many new tricks available with 3.2 that make the Elf Chief Architect happy:

  • partial indexes
  • connector for BI
  • new election protocol
  • new aggregation framework operators
  • ...

...and a full bag of other features that enable the Elves to produce great applications so the Christmas operations run smoothly. You can learn more about all of these by downloading our What’s New in MongoDB 3.2 white paper.

With MongoDB 3.2, not only does your application get the edge required to enable large distributed teams to work on the same dataset with extra guarantees at the server level, but it keeps the flexibility and scalability that developers love.

Happy Holidays, everyone!

Learn more about MongoDB 3.2.

Read the What's New in 3.2 white paper

About the Author - Norberto

Norberto Leite is a Technical Evangelist at MongoDB. Norberto has spent the last 5 years working on large-scale, distributed application environments, both as an advisor and as an engineer. Prior to MongoDB, Norberto served as a Big Data Engineer at Telefonica.