MongoDB, build parties, and deploying your web application at 11am on a Wednesday

This is a guest post by Sean Reilly. Release your applications with MongoDB more often and get closer to the ultimate goal of deploying anytime; why not at 11am on a Wednesday morning?

What you will learn…

This article explores how to make use of MongoDB’s characteristics to avoid the downtime traditionally required by migration scripts in the SQL world, getting closer to the goal of deploying applications with no downtime.

What you should know

The basics of MongoDB

Many software developers reading this article will be familiar with the concept of the “build party”. For those who aren’t familiar with the term, this story should explain:

In another life, I worked for a medium-sized startup in Canada, as part of a web application development team. The product had a typical software stack for the mid-2000s: a monolithic application written in managed code (C# in this case) deployed to a cluster of application servers, backed by a fairly large relational database. As we were actively developing the product, we released new versions of it quite often.

The details of the product aren’t particularly relevant, except that the target market was businesses, so we originally started with a fairly wide “maintenance window”. The policy was that our “maintenance window” (the times when we could deploy new releases of the application) was “outside business hours”, which was defined as approximately 8pm-8am. So, a few times each month we would have some developers stay late, order some pizza, put the application into maintenance mode, and deploy the new version of the web application. This is the build party.

Originally, our maintenance window gave us plenty of time to do whatever we needed to deploy the app, without inconveniencing our users. And for a few years, this pattern served us fairly well. Then our application became more popular. Popularity in terms of sheer numbers of users was never a problem; what hurt us was when we gained a significant user base across different time zones.

Our development team was in the middle of North America (time zone: UTC-6). As the application became more popular throughout North America, we had to account for users on the West Coast (UTC-8) and the East Coast (UTC-5). This shrank our “maintenance window” to approximately 9 hours. Then the application started winning users in the United Kingdom (UTC-0), which moved the end of our maintenance window pretty drastically. Then we started getting traction in Hawaii (UTC-10), and the beginning of our maintenance window moved drastically too.

Then Australian users (UTC+10) started to complain when we put the application into maintenance mode in the middle of their day.

Various discussions and strategies were considered. In the end we settled on a maintenance window from 2:30am-5am local time; performing deployments between the end of business in Hawaii, and the beginning of business on the eastern edge of Australia.

At the same time, our organisation was becoming more Agile, which means that the natural cadence for us to release a new version of the app was every sprint — in our case, weekly.

So we were faced with the spectre of having a small team of developers come into the office after midnight, prepping for a time-sensitive operation that started downtime at 2:30am and absolutely needed to be done by 4:30am (if it wasn’t, we had to roll back the deployment in time for the end of our maintenance window at 5am). Every week.

This is when I realised that the best time to release a web application to production is Wednesday morning at 11am.

That may sound crazy, but there are a few reasons why:

  • Wednesday is the day of the week least likely to be a holiday. When your team is located within a single time zone, by 11am everyone on the team should be in the office, well caffeinated, with morning email checks and standup meetings out of the way. Should something go wrong with a release, Wednesday at 11am gives you great odds that the entire team will be on hand to help out.
  • Developers are just as happy to eat pizza at 11am as they are between 2:30am and 5am, and despite what many of them will claim, they do better work when they are not sleep deprived.
  • If you can release a new version of your software Wednesday morning at 11am without inconveniencing your customers, you can release it whenever you want.

Our original “maintenance window” was based on the idea that a release usually required the application to be taken offline. If an upgrade could be performed without users noticing, then we could do it whenever we liked (within reason).

As the years went by, our deployment process became quite sophisticated. Rolling deploys to application servers behind a load balancer eliminated the possibility of broken pages when an individual server was upgraded, and automating the process kept it as quick as possible. The one hurdle that we never managed to overcome was the downtime imposed by database migration scripts.

All SQL databases divide their language into two kinds of statements. Data Manipulation Language (or DML) statements are the traditional INSERT, SELECT, UPDATE, and DELETE. Data Definition Language (or DDL) statements, like ALTER TABLE, are used to modify the database schema. While DML operations can be written to affect a single row at a time, DDL operations by their very nature affect (and usually exclusively lock) an entire table at once. It is impossible for a SELECT statement to successfully return data while an ALTER TABLE statement is executing against the same table. If your migration script is unlucky enough to have to run an ALTER TABLE statement on a table that contains several million rows, that entire table will be unavailable for quite some time.

Fast forward several years, into the brave new NoSQL world where products like MongoDB are available, and we now have other options. Despite some of the hype around NoSQL products being “schemaless”, in most cases where you are writing software that uses MongoDB you will want to enforce constraints on how your data is stored. There are, however, some significant differences:

  1. MongoDB itself is usually unaware of the schema requirements for a collection; schema is now usually enforced by the application.
  2. It is possible for different documents in a single MongoDB collection to each conform to a different schema (illustrated in the sketch below).

We can take advantage of these differences to avoid the downtime required by a SQL migration script, by having our application migrate data itself, without downtime.
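
To make the second point concrete, here is a minimal sketch in Python with the pymongo driver (the database name, collection name, and document shapes are invented for illustration) of two documents with different schemas sitting side by side in one collection:

from pymongo import MongoClient

# Hypothetical database and collection names, for illustration only.
db = MongoClient()["example_app"]
people = db["people"]

# An old-shape document and a new-shape document coexist in the same
# collection; MongoDB itself enforces no common schema across them.
people.insert_one({"_id": 1, "name": "Sean", "lastName": "Reilly"})
people.insert_one({"_id": 2,
                   "name": {"first": "Ada", "last": "Lovelace"},
                   "schemaVersion": 1})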

Data Migration as a responsibility of the Application

With this pattern, when a new version of the application is released, it is responsible for migrating individual documents one at a time from the previous schema to the new schema. This should happen naturally, as each document is loaded by the application in the course of normal user activity.

  • Design the application so that each collection is only accessed from a single class (or highly cohesive set of classes). The repository pattern is a good example of how to achieve this (http://martinfowler.com/eaaCatalog/repository.html).
  • When the class retrieves data from the MongoDB collection, documents that conform to the new schema are loaded normally. Documents that conform to the previous schema are transparently migrated to the new format before being converted into objects, as sketched in the code after this list. This behaviour can (and should) be unit tested to ensure it is working as expected.
  • When the class saves data back to the MongoDB collection, documents are always written in the new format.
  • Cross-table migration issues such as foreign keys and joins are not a problem. Since MongoDB doesn’t have these concepts, they will not trip you up.
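
As a sketch of what this can look like in practice (Python with pymongo; the PersonRepository class, collection name, and field names are illustrative assumptions, not a prescribed API; the migration shown matches the name/lastName example later in this article):

from pymongo import MongoClient

class PersonRepository:
    # The single point of access for the people collection (illustrative).

    CURRENT_SCHEMA_VERSION = 1

    def __init__(self, db):
        self.collection = db["people"]

    def find_by_id(self, person_id):
        document = self.collection.find_one({"_id": person_id})
        if document is None:
            return None
        # Old-schema documents are transparently migrated on the way in.
        return self._migrate_if_needed(document)

    def save(self, document):
        # Documents are always written back in the new format.
        document["schemaVersion"] = self.CURRENT_SCHEMA_VERSION
        self.collection.replace_one({"_id": document["_id"]}, document,
                                    upsert=True)

    def _migrate_if_needed(self, document):
        # A missing schemaVersion field is treated as an implied version 0.
        if document.get("schemaVersion", 0) == 0:
            document["name"] = {"first": document.pop("name"),
                                "last": document.pop("lastName")}
            document["schemaVersion"] = self.CURRENT_SCHEMA_VERSION
        return document

# Usage:
repository = PersonRepository(MongoClient()["example_app"])
person = repository.find_by_id(12345)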

With this pattern, data migration will start to happen automatically as soon as the application accesses data. The change will be invisible to the users of the application, and the most frequently used (or updated) data will be migrated first. As the new version of the application is used, more and more data will be upgraded, although it is likely that not all data will be migrated right away.

Once the application deployment is complete, you can wait for all of the documents to be migrated naturally, or you can migrate leftover data in a background thread. I suggest at least considering this option, as in the long term it will become onerous if you have to support many different schema versions for documents in the same collection.
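
A background sweep over the leftovers can be a few lines, reusing the hypothetical PersonRepository from the earlier sketch (the batch size and throttle interval are arbitrary assumptions):

import time

def migrate_remaining_documents(repository, batch_size=100):
    # Documents without a schemaVersion field still conform to the old schema.
    query = {"schemaVersion": {"$exists": False}}
    while True:
        batch = list(repository.collection.find(query).limit(batch_size))
        if not batch:
            break  # everything has been migrated
        for document in batch:
            repository.save(repository._migrate_if_needed(document))
        time.sleep(1)  # throttle so the sweep doesn't hog the database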

Special cases that require a little more attention:

When the migration edits the primary key of the document: In a normal migration operation, data is loaded, the format is changed, and data is saved back to the same place. When the primary key is changed as part of a migration, the document is also *moved*. This means that the new document needs to be saved, and the old version must be deleted. It’s also necessary to check in two places in order to be sure that a document does not exist. Your code will need to issue two find operations (or perhaps one, with an $or clause) when retrieving documents to be sure to catch documents that have (and have not) been migrated already.
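
As a sketch, suppose a migration replaced an old primary key with a new one, and the caller can compute both candidate keys (the function and parameter names are invented; this ignores concurrent writers, so treat it as an outline rather than a robust implementation):

def find_with_moved_key(collection, old_id, new_id):
    # One find with $or catches the document whether or not it has
    # already been moved to its new primary key.
    document = collection.find_one({"$or": [{"_id": new_id}, {"_id": old_id}]})
    if document is not None and document["_id"] == old_id:
        # Moving a document means saving the new version and deleting the old.
        document["_id"] = new_id
        collection.insert_one(document)
        collection.delete_one({"_id": old_id})
    return document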

Expensive migrations: Some migration operations (hopefully most) are so cheap that they can be performed over and over at very little cost. Renaming a field is the classic example of this sort of operation. In these cases, migrating whenever a document is loaded is fine, even if the document isn’t saved; the cost of performing the migration multiple times isn’t an issue. However, some migrations might be more expensive. A migration might need to call an external service or perform some other costly or complicated operation. In these cases, it’s probably best to save the migrated document back to MongoDB immediately.
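
In code, the difference is just that the repository persists the migrated document as soon as it loads it. A variant of the _migrate_if_needed method from the earlier repository sketch (enrich_from_external_service is a hypothetical expensive call, invented for this example):

    def _migrate_if_needed(self, document):
        if document.get("schemaVersion", 0) == 0:
            # Hypothetical expensive step, e.g. calling an external service.
            document["geo"] = enrich_from_external_service(document["address"])
            document["schemaVersion"] = self.CURRENT_SCHEMA_VERSION
            # The migration was costly, so write it back immediately rather
            # than waiting for the next ordinary save.
            self.save(document)
        return document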

Incompatible unique indexes: Unique indexes are a valuable tool for maintaining data integrity in MongoDB. However, unique indexes are problematic when a migration renames document fields, as a document missing the fields has an implied field value of null, and only one document is allowed the value null under a unique index. Fortunately, MongoDB introduced the concept of sparse indexes in version 1.8, and unique indexes can also be sparse. With a sparse unique index, documents that don’t contain the indexed field at all are exempt from the unique requirement. Sparse unique indexes are a valuable tool for document-at-a-time migration solutions.
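
In pymongo, a sparse unique index looks like this (the users collection and username field are assumptions for the example):

from pymongo import ASCENDING, MongoClient

db = MongoClient()["example_app"]
# Documents that lack the username field entirely are exempt from the
# uniqueness constraint, so un-migrated documents won't collide on an
# implied null value.
db["users"].create_index([("username", ASCENDING)], unique=True, sparse=True)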

Map/Reduce or Aggregation Framework operations: Since these operations occur within the database itself, they cannot take advantage of a layer of the application transparently migrating documents so that the entire collection has the same schema. In a future release, MongoDB might include computed views that will make this situation easier to deal with, but until then the best solution is probably to perform the operation twice: once with the data that is still in the legacy schema, and once with the data that has been converted into the new schema. Then perform a final aggregation of the two outputs into a single result in your application code.
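
For example, counting people per country (an invented query, assuming the old schema stored country at the top level and the new schema nests it under an address sub-document) might run once per shape:

from collections import Counter

def count_by_country(collection):
    totals = Counter()
    # Pass 1: documents still in the legacy schema.
    legacy = collection.aggregate([
        {"$match": {"schemaVersion": {"$exists": False}}},
        {"$group": {"_id": "$country", "n": {"$sum": 1}}},
    ])
    # Pass 2: documents already converted to the new schema.
    migrated = collection.aggregate([
        {"$match": {"schemaVersion": 1}},
        {"$group": {"_id": "$address.country", "n": {"$sum": 1}}},
    ])
    # Final aggregation of the two outputs happens in application code.
    for result in list(legacy) + list(migrated):
        totals[result["_id"]] += result["n"]
    return totals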

One final tip that can make this pattern easier is to use a schemaVersion field. When a collection can contain documents that conform to many different schemas, each document should track which schema it conforms to.

So a document that looks like this before it is migrated:

{
    "_id": 12345,
    "name": "Sean",
    "lastName": "Reilly"
}

might look like this after migration:

{
    "_id": 12345,
    "name": {
        "first": "Sean",
        "last": "Reilly"
    },
    "schemaVersion": 1
}

If the current version of your application doesn’t have a schemaVersion field in every document, then treat the absence of the field as an implied version 0. This makes it very simple to find non-migrated documents, and might be especially valuable with Map/Reduce or Aggregation Framework queries.
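
Finding documents that still need migration then becomes a one-line query (sketched here with the pymongo names assumed in the earlier examples):

# Documents without the field are implied version 0, i.e. not yet migrated.
unmigrated = db["people"].find({"schemaVersion": {"$exists": False}})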

This pattern seems like more work, and at first, it is. But it can free you from a number of headaches over time, and the benefits are clear:

  • Avoid the IO and CPU load of a massive migration that loads and migrates an entire collection at once.
  • Avoid downtime due to “stop the world” migration scripts.
  • Migration responsibilities are part of the application itself, and can even be unit tested!

I hope that this is a pattern that you will find useful, and even more than that, I hope that it’s a pattern that will save your customers from maintenance windows, save your development team from some sleepless nights, and allow you to release your applications more often… and get closer to the ultimate goal of deploying applications at 11am on Wednesday mornings.

Sean Reilly is a software developer and consultant for Equal Experts, one of the fastest growing technology companies in the UK. He specialises in lightweight, Agile, enterprise web application development, and has been using MongoDB in anger since version 1.4.