Cache Reheating - Not to be Ignored



An important aspect to keep in mind with databases is the cost of cache reheating after a server restart. Consider the following diagram which shows several cache servers (e.g., memcached) in front of a database server.

This sort of setup is common and can work quite well when appropriate; it removes read load from the database and allows more RAM to be utilized for scaling (when the database doesn’t scale horizontally). But what happens if all the cache servers restart at the same time, say, on a power glitch in a data center?

We then have a cache reheating scenario. After the bounce the full load of all read requests will hit the database server (for a while) given the caches are empty. The server won’t be able to handle it (that’s why the cache servers were there in the first place). Now if the reheat time is short, this is not a big problem : we did go down after all. But if reheat takes a long time, it’s a big problem. Imagine 20 cache servers with 64GB RAM each. 1.2 terabytes of data must be queried from the database to be fully reheated!

Even without a cache server, the same issue exists for almost all databases. Imagine a server restart on a system with 64GB RAM. If loading 100MB/sec, reheat will take 10 minutes. However, if the queries are coming in randomly, it could take much much longer – only with sequential I/O can we get anywhere near that speed.

One could just sequentially read data in and fill the cache. However for databases much larger than RAM, loading in the right portion is difficult. “Hot” data could be at very different locations in the terabyte(s) of on disk data.

A very nice attribute of the MongoDB storage engine is its use of memory-mapped files. In this model the cache is the operating system’s file system cache. Restart the mongod process, and there is no reheat issue at all. Some databases use their own page cache which causes a reheat scenario even on just a process restart. Of course on a full server reboot MongoDB must reheat too.

A few points to keep in mind:

  • Think about reheating and how you will operationally handle it if a scenario involving it occurs. For schedules OS maintenance restart the server during off peak hours to minimize the load during reheat.
  • Restarting the mongod process is safe with respect to reheating.
  • Remounting a volume likely loses all file system cache info for the volume.
  • On a server restart, copy datafiles to /dev/null to force reheating to be sequential and thus much faster. This can be done even if the mongod process is already running. If the database is larger than RAM, copy only the newest datafiles (ones with the highest numbers); while this isn’t perfect, the latest files likely contain the largest percentage of frequently used data.