How PublishThis Tames Its Big Data Workload with MongoDB Management Service (MMS)
September 10, 2013 | Updated: May 22, 2015
Database efficiency matters to PublishThis. Providing the most up-to-date content for thousands of always-on feeds to customers like Overstock.com, Sporting News, SAP, and Cox Media is critical, and doing so requires careful monitoring and tuning of their MongoDB cluster’s performance. I recently spoke with Dr. Chris Lane, the Director of Data Services at PublishThis, about how he optimizes his MongoDB deployment.
PublishThis’ platform crawls the internet for the latest content and scores it with its patent-pending relevancy engine so that customers can scale their real-time publishing efforts. The crawler indexes content from 300,000 of the web’s best sources daily, pulling data from automated feeds, social media, and other sources on topics ranging from sports and politics to entertainment and technology. It collects huge amounts of semi-structured data from the web, which is then stored in MongoDB.
Finding Performance Bottlenecks... Before Customers Do
To optimize the performance of their database cluster, Chris uses MongoDB Management Service (MMS). MMS Monitoring provides valuable insight into their infrastructure as he manages a multi-terabyte deployment.
The PublishThis crawler database is a sharded system, with data spread across multiple servers. Earlier in the platform’s development, Chris ran the balancer from 00:00 to 06:00 every day. Through MMS Monitoring, the team noticed that lock percentage and queue lengths rose whenever the balancer kicked in. With that insight, Chris checked the logs and found that the moveChunk command was taking a very long time -- sometimes as long as 9 hours!
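The article does not show how the balancer window was configured. As a rough sketch, MongoDB keeps the window in the `balancer` document of the `config.settings` collection, under an `activeWindow` field with `start` and `stop` times in HH:MM form. The Python helper below only builds the filter and update documents for the 00:00 to 06:00 window mentioned above. The function name and the commented-out `client` variable are assumptions made for illustration, not PublishThis code.

```python
import re

# 24-hour HH:MM, the format MongoDB expects for balancer window times.
_HHMM = re.compile(r"^([01]\d|2[0-3]):[0-5]\d$")


def balancer_window_update(start, stop):
    """Build (filter, update) documents that set the balancer's active window.

    Hypothetical helper: it only constructs plain dicts and does not
    talk to a database.
    """
    for value in (start, stop):
        if not _HHMM.match(value):
            raise ValueError(f"expected HH:MM, got {value!r}")
    selector = {"_id": "balancer"}
    update = {"$set": {"activeWindow": {"start": start, "stop": stop}}}
    return selector, update


selector, update = balancer_window_update("00:00", "06:00")

# Applying it with PyMongo, assuming `client` is a MongoClient connected
# to a mongos router (not shown here):
# client["config"]["settings"].update_one(selector, update, upsert=True)
```

Keeping the document construction separate from the database call makes the window easy to validate before anything touches the cluster.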
Chris realized that he needed a faster storage system like a SAN capable of handling large numbers of IOPS. As a quick fix while waiting for the delivery of the SAN, Chris reduced their chunk size. This allowed the balancer to make progress while it ran and minimized overall impact on the system by breaking the work into smaller, more digestible chunks. This improved performance significantly. Once the SAN arrived, he continued to run with the reduced chunk size based on the excellent performance he observed in MMS.
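The article does not say what chunk size Chris chose or how he applied it. As a hedged sketch, the cluster-wide chunk size is stored in megabytes in the `chunksize` document of `config.settings`, and the default at the time was 64 MB. The helper below builds the filter and update documents for a smaller value. The 32 MB figure, the function name, and the commented-out `client` variable are purely illustrative assumptions.

```python
def chunk_size_update(size_mb):
    """Build (filter, update) documents that set the sharded cluster chunk size.

    Hypothetical helper: MongoDB documents an allowed range of 1 to 1024 MB,
    so values outside that range are rejected here before any write.
    """
    if isinstance(size_mb, bool) or not isinstance(size_mb, int):
        raise ValueError("chunk size must be an integer number of megabytes")
    if not 1 <= size_mb <= 1024:
        raise ValueError("chunk size must be between 1 and 1024 MB")
    return {"_id": "chunksize"}, {"$set": {"value": size_mb}}


# 32 MB is an illustrative value, not the one PublishThis used.
chunk_selector, chunk_update = chunk_size_update(32)

# Applying it with PyMongo, assuming `client` is a MongoClient connected
# to a mongos router (not shown here):
# client["config"]["settings"].update_one(chunk_selector, chunk_update, upsert=True)
```

Note that lowering the setting does not rewrite existing data immediately: chunks are split to the new size only as inserts and updates touch them, so the effect on migrations appears gradually.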
“MMS helps PublishThis deliver the web’s best content at real-time scale by making it easy to manage processing massive, always-on volume with our industry-leading relevancy engine.” – Dr. Chris Lane, Director of Data Services - PublishThis
Monitoring Tips From Chris
New to monitoring with MMS? Chris offers the following advice to those getting started with MMS:
- Set up simple dashboards with just a few key metrics. MMS provides a lot of data, so create dashboards with the actionable metrics you understand.
- The 'Events' tab is very helpful. You can see everything that's happened with each replica set in one place, such as new hosts or elections.
- Run at least two agents on VMs on physically different hosts. If something happens with one VM (or, worst case, the host), you won't lose monitoring.
Haven’t started monitoring with MMS yet? It’s free. Create your account at mms.mongodb.com -- you’ll be able to visualize performance of your MongoDB system within minutes.