The Ops Guide to a Peaceful Thanksgiving
November 25, 2015 | Updated: January 5, 2016
Imagine a Thanksgiving holiday spent with your closest family and friends, enjoying turkey with gravy, mashed potatoes, green bean casserole, and old-fashioned eggnog. For those of us on the operations team, a peaceful few days at home often isn’t a possibility. Ops is always on call, fixing the database at all hours and over every holiday.
If you’re traveling on Wednesday, celebrating on Thursday, shopping on Friday, recovering on Saturday and Sunday, and online all day Monday, there’s no time left to monitor your database. Before you leave for the holiday, you need to have confidence in the health of your deployment. Prepare yourself with a production checklist:
- Proper Monitoring and Alerts
- Make Sure You’re Deployed Across Data Centers
- Avoid Deploying A Majority of Nodes in a Single Data Center
- Have a Backup / Recovery System, and Know How to Use It
1. Put Proper Monitoring and Alerts in Place
Having proper monitoring and alerts will give you peace of mind at the Thanksgiving table. Monitor key metrics and set appropriate alerting thresholds so that you know when there is a problem on the server. A key aspect of monitoring and alerting is knowing what the appropriate thresholds for your system are. Establish them by load testing before deployment and baselining your system after launch. Knowing your limits allows you to establish the right alerting thresholds without generating false alarms. Incessant false alarms are a particularly treacherous danger because they cause engineers to start ignoring alerts. If your Ops team has become numb to a noisy alerting system, they are likely to miss the real alert condition when it comes. With well-tuned alerts, you’ll be notified only when something bad actually happens, so enjoy that second piece of pie!
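As a rough illustration of turning a baseline into a threshold, here is a minimal Python sketch. The rule it uses (mean plus three standard deviations, capped at the limit found in load testing) is our own assumption for the example, not a Cloud Manager feature, and the sample numbers are made up.

```python
import statistics


def alert_threshold(baseline_samples, load_test_limit, sigmas=3.0):
    """Suggest an alert threshold from baseline samples and a load-tested limit.

    Assumption for this sketch: "abnormal" means more than `sigmas`
    standard deviations above the baseline mean.
    """
    mean = statistics.mean(baseline_samples)
    stdev = statistics.pstdev(baseline_samples)
    candidate = mean + sigmas * stdev
    # Never set the alert above the limit discovered during load testing.
    return min(candidate, load_test_limit)


# Hypothetical baseline: open connections sampled after launch.
baseline = [100, 110, 90, 105, 95]
threshold = alert_threshold(baseline, load_test_limit=500)
print(f"Alert when connections exceed {threshold:.0f}")
```

A threshold derived this way sits above normal variation, which is what keeps it from firing on ordinary traffic, yet still below the point where load testing showed the system struggling.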
The best way to monitor the health of your deployment is with MongoDB Cloud Manager and Ops Manager. Cloud Manager is our cloud management application, and Ops Manager is the on-premises version. Both offer fine-grained monitoring, giving you insight into your most important database metrics. You can also configure custom dashboards to track your cluster and set up alerts for immediate notification when metrics fall between your warning and saturation points. See our recommendation guide post for the top five monitoring alerts that keep your MongoDB deployment on track and you at the dinner table.
2. Make Sure You’re Deployed Across Data Centers
How available are you? Consider what kinds of failure scenarios you’re likely to go through. While MongoDB provides high availability through replication, you need to be sure that the nodes are deployed across a set of different data centers.
Any individual data center is vulnerable to power outages, fires, and network interruptions, any of which can limit your database availability. Deploying your replica set across a geographically dispersed set of data centers helps preserve availability when these events occur. The remaining nodes will automatically detect the loss of a primary and elect a new primary to restore availability.
3. Avoid Deploying A Majority of Nodes in a Single Data Center
When an election for primary occurs, a member node can only be a candidate for primary if that node is in healthy communication with a majority of nodes in the replica set. The candidate counts itself toward that majority. For instance, in a replica set of three, a member node can only become primary if it can successfully exchange “heartbeat” messages with at least one other replica set member. That is, the candidate node is in a set of two healthy nodes, and two is a majority of three. If the replica set has five nodes, a member node can only become primary if it can successfully exchange heartbeat messages with at least two other replica set members, because three (the candidate plus two others) is a majority of five. MongoDB’s election protocol has this requirement to avoid split-brain scenarios.
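The majority rule is simple arithmetic, sketched below in Python. The function names are ours, chosen for illustration, and the sketch models only the majority check. Real elections also take other factors into account that are not shown here.

```python
def majority(voting_members: int) -> int:
    """Smallest number of members that is more than half of the set."""
    return voting_members // 2 + 1


def can_become_primary(reachable_others: int, voting_members: int) -> bool:
    """True if a candidate that can reach `reachable_others` peers holds a majority.

    The candidate counts itself, hence the +1.
    """
    return reachable_others + 1 >= majority(voting_members)


for size in (3, 5, 7):
    needed_others = majority(size) - 1
    print(f"{size} members: majority is {majority(size)}, "
          f"so a candidate must reach {needed_others} other(s)")
```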
Given this election protocol, if you deploy a majority of nodes to a single data center (e.g. two nodes in center “A” and a third node in center “B”) you will lose availability in the event that data center “A” fails. This is because only one node is left standing from the original replica set, and that node can’t be a primary because it can’t establish a majority of nodes in healthy communication.
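To check a planned layout before you deploy it, you can run the same reasoning for every data center in turn. The Python sketch below is a simplified model: it assumes every member has one vote and that a data center failure takes out all of the members it hosts. The layouts and helper names are hypothetical.

```python
def survives_loss_of(layout: dict, failed_dc: str) -> bool:
    """True if the members outside `failed_dc` still form a majority.

    `layout` maps a data center name to the number of voting members in it.
    """
    total = sum(layout.values())
    remaining = total - layout[failed_dc]
    return remaining >= total // 2 + 1


def fragile_data_centers(layout: dict) -> list:
    """Data centers whose failure would leave no electable primary."""
    return [dc for dc in layout if not survives_loss_of(layout, dc)]


two_and_one = {"A": 2, "B": 1}      # majority of nodes in center "A"
spread = {"A": 1, "B": 1, "C": 1}   # one node per data center

print(fragile_data_centers(two_and_one))
print(fragile_data_centers(spread))
```

An empty list means no single data center failure can cost the replica set its majority.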
Now say your organization runs its own infrastructure, and you only have two data centers in which you can deploy a replica set. How can you avoid putting a majority of nodes in either of the two data centers? If you don’t have three data centers, try a cloud-based service like AWS to host an arbiter node as the third location. An arbiter is a special member of a replica set which holds no data and serves no requests. All it does is exchange heartbeat messages with other nodes in the replica set and vote in elections. Since it holds no data, it doesn’t require robust disk, nor does it incur heavy network traffic. Putting an arbiter in a cloud service is a cheap and easy way to maintain primary electability when you are limited in the number of data centers you can run in.
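As a sketch of what such a deployment looks like, the Python snippet below builds a replica set configuration document with one data-bearing member in each of two data centers and an arbiter in a third location. The `arbiterOnly` member field is part of MongoDB’s replica set configuration. The replica set name, host names, and helper function are placeholders invented for this example.

```python
def build_replica_set_config(name, data_hosts, arbiter_host):
    """Build a replica set config document with one arbiter-only member."""
    members = [{"_id": i, "host": host} for i, host in enumerate(data_hosts)]
    members.append({
        "_id": len(data_hosts),
        "host": arbiter_host,
        "arbiterOnly": True,  # votes in elections, holds no data
    })
    return {"_id": name, "members": members}


# Hypothetical hosts: one data-bearing node per owned data center,
# plus an arbiter hosted with a cloud provider.
config = build_replica_set_config(
    "rs0",
    ["db1.dc-a.example.net:27017", "db1.dc-b.example.net:27017"],
    "arbiter.cloud.example.net:27017",
)
print(config)

# With a driver such as PyMongo and the mongod processes already running,
# a document of this shape could be passed to the replSetInitiate admin
# command. That step is left out here so the sketch runs without a server.
```

With this layout, losing either data center still leaves one data-bearing node and the arbiter, which is two of three votes.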
4. Have a Backup / Recovery System, and Know How to Use It
Sometimes engineers find themselves sprinting to deploy their application by the launch date, cutting corners that can impact the success of their deployment. Don’t go to production without having a well-tested backup and recovery system in place. Perform fire drills frequently so you’ll know what to do in case of disaster, and document those procedures so that everyone on your team has a set of emergency procedures to refer to when disaster strikes. You’ll thank yourself, especially when your head is still foggy from all the tryptophan in the turkey. Besides, there is little time to waste when pie is being served. Follow your own best-practice procedures and be prepared to save your system.
For more peace of mind, Cloud Manager provides a fully managed backup service to protect your data and your business. With Cloud Manager, you get dedicated MongoDB engineers who monitor your backup 24 hours a day, 365 days a year. Let us worry about protecting your data so you can focus on your dessert.
The most important part of Thanksgiving is spending time with your family and friends. To give them your undivided attention, you need peace of mind about your database. Forget about work over the holiday by setting up a proper infrastructure, testing the load capacity of your databases, creating proper monitoring and alerts, and having an emergency procedure in place so you’re ready in case disaster strikes.
From our team to yours, have a very happy Thanksgiving!
Learn more about MongoDB Cloud Manager, the easiest way to manage MongoDB in the cloud. We offer a 30-day free trial, so try it today.
About the Author - Bryan Reinero
Bryan Reinero is US Developer Advocate at MongoDB, fostering understanding and engagement in the community. Previously, Bryan was a Senior Consulting Engineer at MongoDB, helping users optimize MongoDB for scale and performance, and a contributor to the Java Driver for MongoDB.
Earlier, Bryan was Software Engineering Manager at ValueClick, building and managing large-scale marketing applications for advertising, retargeting, real-time bidding, and campaign optimization. Earlier still, Bryan specialized in software for embedded systems at Ricoh Corporation and developed data analysis and signal processing software at the Experimental Physics Branch of Ames Research Center.