Deployment Best Practices: Monitor your resources



When you’re preparing a MongoDB deployment, you should try to understand how your application is going to hold up in production. It’s a good idea to develop a consistent, repeatable approach to managing your deployment environment so that you can minimize any surprises once you’re in production.

The best approach incorporates prototyping your setup, conducting load testing, monitoring key metrics, and using that information to scale your setup. The key part of the approach is to proactively monitor your entire system. This will help you understand how your production system will hold up before you deploy, and determine where you’ll need to add capacity. Having insight into potential spikes in your memory usage, for example, could help put out a write-lock fire before it starts.

You can use several different tools to monitor your deployment. 10gen provides a free, hosted monitoring service, MongoDB Management Service (MMS), which offers a dashboard and a view of the metrics from your entire cluster. Alternatively, you can build your own tools with Nagios, Munin or SNMP. MongoDB also ships with several tools that give you insight into the performance of your deployment, and standard system utilities such as iostat round out the picture.

  • mongostat: this utility checks the status of your running mongod and mongos instances and returns counters of database operations. These include inserts, queries, updates, deletes, and cursors. mongostat also shows your page faults and your lock percentage. A high page fault rate or lock percentage typically means you’re running low on memory, are hitting write capacity or have a similar performance issue.

  • mongotop: this tracks and reports the read and write activity of your MongoDB instance on a per-collection basis. mongotop returns information each second by default, but you can make it report less frequently by specifying an interval in seconds: mongotop 20 will return values every 20 seconds. You should check that this read and write activity matches what your application intends, and that you’re not firing too many writes at the database at a time, reading too frequently from disk, or exceeding your working set size.
  • iostat: on Linux, use iostat to monitor your storage system performance. This will help identify any bottlenecks in your disk I/O and subsequently in your database. Metrics like %util will tell you the percentage of time your drive is in use, and avgrq-sz will indicate the average request size. There are several others that may also be important to monitor for your deployment.
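
To make the lock percentage and page fault numbers above more concrete, here is a minimal Python sketch of how they could be derived from two status snapshots taken a few seconds apart. The field names (globalLock.lockTime, globalLock.totalTime, extra_info.page_faults) are assumptions modeled on the serverStatus output of 2.x-era MongoDB, so check them against your server version. The function name and the sample values are made up for illustration.

```python
def lock_and_fault_rates(prev, curr, interval_s):
    """Derive lock percentage and page faults per second from two snapshots.

    prev and curr are dicts shaped like serverStatus output (assumed layout).
    The counters are cumulative, so only the difference between the two
    snapshots says anything about the sampling interval.
    """
    lock_delta = curr["globalLock"]["lockTime"] - prev["globalLock"]["lockTime"]
    total_delta = curr["globalLock"]["totalTime"] - prev["globalLock"]["totalTime"]
    # Share of the interval during which the global lock was held.
    lock_pct = 100.0 * lock_delta / total_delta if total_delta > 0 else 0.0

    fault_delta = curr["extra_info"]["page_faults"] - prev["extra_info"]["page_faults"]
    return {"lock_pct": lock_pct, "page_faults_per_s": fault_delta / interval_s}


# Made-up sample snapshots taken 10 seconds apart (times in microseconds).
prev = {
    "globalLock": {"lockTime": 1_000_000, "totalTime": 50_000_000},
    "extra_info": {"page_faults": 200},
}
curr = {
    "globalLock": {"lockTime": 3_000_000, "totalTime": 60_000_000},
    "extra_info": {"page_faults": 450},
}

rates = lock_and_fault_rates(prev, curr, interval_s=10)
print(rates)
```

In practice the snapshots would come from your driver or monitoring agent. A sustained rise in either number during load testing is a signal to look at memory and write capacity.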

If you’re using MMS or another monitoring service, you should also closely monitor the following:

  • Op Counters: These include inserts, updates, deletes, reads, and cursor usage.

  • Resident Memory: You should always keep an eye on your memory allocation. Resident memory should stay comfortably below physical memory. If you run out of memory, you’ll experience page faults and index misses, and your queries will return much more slowly.
  • Working set size: Keep a close eye on your working set, which is the total body of data used by your application. For optimal performance, your active working set should fit into RAM. In MongoDB 2.4, there is a new working set analyzer which will help reveal when documents are being “paged out,” or removed from physical memory by the operating system. You can decrease your working set size by optimizing your queries and indexing patterns to prevent large scans, or plan to add more RAM when you expect your working set to increase.
  • Queues: MongoDB’s concurrency model uses a readers-writer lock to allow simultaneous reads but give exclusive access to a single write operation. Given that approach, queues can often form behind a single writer, with those queues containing readers, writers or both. During lengthy write operations MongoDB will periodically yield to allow other writers to get through in order to increase write throughput, but that can cause reader starvation. Monitoring this metric along with “Lock Percentage” will give you an idea of the concurrency your deployment is seeing. If the “Lock Percentage” and the queues are trending upwards (e.g. spiking), then you may be dealing with contention within the database. Data model changes or “batch” operations can have a significant positive impact on concurrency.
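
As a rough illustration of watching these metrics yourself, the Python sketch below turns cumulative op counters into per-second rates and flags high resident memory and long queues. The dict layout (opcounters, mem.resident in megabytes, globalLock.currentQueue) is an assumption modeled on serverStatus output. The thresholds (90% of physical memory, 10 queued operations) and the function names are arbitrary placeholders, not recommended values.

```python
OPS = ("insert", "query", "update", "delete", "getmore")


def op_rates(prev, curr, interval_s):
    """Convert cumulative op counters into operations per second."""
    return {
        op: (curr["opcounters"][op] - prev["opcounters"][op]) / interval_s
        for op in OPS
    }


def health_warnings(status, physical_mb, resident_ratio=0.9, max_queue=10):
    """Return human-readable warnings for one status snapshot.

    resident_ratio and max_queue are hypothetical thresholds; tune them
    against what you observe during load testing.
    """
    warnings = []
    resident_mb = status["mem"]["resident"]
    if resident_mb >= resident_ratio * physical_mb:
        warnings.append(f"resident memory {resident_mb} MB is close to physical {physical_mb} MB")
    queue = status["globalLock"]["currentQueue"]
    if queue["total"] > max_queue:
        warnings.append(f"{queue['readers']} readers and {queue['writers']} writers are queued")
    return warnings


# Made-up snapshots taken 60 seconds apart.
prev = {"opcounters": {"insert": 1000, "query": 5000, "update": 300, "delete": 20, "getmore": 40}}
curr = {
    "opcounters": {"insert": 1600, "query": 8000, "update": 420, "delete": 20, "getmore": 100},
    "mem": {"resident": 7500},
    "globalLock": {"currentQueue": {"total": 14, "readers": 9, "writers": 5}},
}

rates = op_rates(prev, curr, interval_s=60)
warnings = health_warnings(curr, physical_mb=8000)
print(rates)
print(warnings)
```

Tracking the rates over time, rather than the raw counters, makes it easier to line up spikes in queues or lock percentage with the operations that caused them.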

Your testing period is critical to preparing your application for success. By monitoring these metrics closely during pre-launch, you’ll be better prepared for when your application hits heavy usage in the future. If you’re already in production, monitoring your current application usage with a tool like MMS will give you insight into production patterns. Reviewing your indexing patterns and CRUD behavior will help you better understand your application’s flow when there is a hiccup.