Hudl’s video analysis platform helps coaches win by delivering secure access to video analysis tools from any computer or mobile device. For those who follow Division 1 college sports in the United States, you’ll be interested in knowing that the Hudl platform stores video for 99% of DI schools’ top recruits.
As Hudl has grown, it has outgrown some of its infrastructure.
For example, when Hudl hit a limit on EC2 (where SQL wouldn’t scale on a single instance), the growing company needed to recruit a new database. After evaluating different options, Hudl chose MongoDB.
According to Hudl CTO Brian Kaiser: “MongoDB changed devops from a necessary evil to something that is transforming the company, helping us move quickly. It makes innovation easy and is universally recognized at our company because it’s been so impactful on our growth.”
The numbers speak for themselves. Today, MongoDB stores 650 million plays (atomic unit such as point for volleyball, play for football) and associated metadata, which gets 1 billion video views per month. The MongoDB-based platform has streamed 18 petabytes of data in 2014. During peak football season, the platform ingests 25 hours of raw video per minute; that’s one quarter of what YouTube ingests.
Big numbers for a small company!
With MongoDB, Hudl has achieved steady, consistent growth and enabled Hudl to double in size. “MongoDB really facilitates rapid iterations, so the dev team can try things out and make mistakes – it’s magical for that,” said Kaiser. “MongoDB has led ops to promote squad growth and really empowered our company, and that’s something we’re proud of.”
To see all MongoDB World presentations, visit the [MongoDB World Presentations](https://www.mongodb.com/mongodb-world/presentations) page.
MongoDB Performance Optimization with MMS
This is a guest post by Michael De Lorenzo, CTO at CMP.LY. CMP.LY is a venture-funded startup offering social media monitoring, measurement, insight and compliance solutions. Our clients include Fortune 100 financial services, automotive, consumer packaged goods companies, as well as leading brands, advertising, social and PR agencies. Our patented monitoring, measurement and insight (MMI) tool, CommandPost provides real-time, actionable insights into social performance. Its structured methodology, unified cross-platform reporting and direct channel monitoring capabilities ensure marketers can capture and optimize the real value of their engagement, communities and brand advocates. All of CommandPost’s products have built-in compliance solutions including plain language disclosure URLs (such as rul.es, ter.ms, disclosur.es, paid-po.st, sponsored-po.st and many others). MongoDB at CMP.LY At CMP.LY, MongoDB provides the data backbone for all of CommandPost’s social media monitoring services. Our monitoring services collect details about social media content across multiple platforms, all engagements with that content and builds profiles around each user that engages. This amounts to thousands of writes hitting our MongoDB replica set every second across multiple collections. While our monitoring services are writing new and updated data to the database in the background, our clients are consuming the same data in real-time via our dashboards from those same collections. More Insights Mean More Writes With the launch of CommandPost, we expanded the number of data points our monitoring services collected and enhanced analysis of those we were already collecting. These changes saw our MongoDB deployment come under a heavier load than we had previously seen - especially in terms of the number of writes performed. Increasing the number of data points collected also meant we had more interesting data for our clients to access. From a database perspective, this meant more reads for our system to handle. However, it appeared we had a problem - our API was slower than ever in returning the data clients requested. We had been diligent about adding indexes and making sure the most frequent client-facing queries were covered, but reads were still terribly slow. We turned to our MongoDB Management Service dashboard for clues as to why. MongoDB Management Service By turning to MMS, we knew we would have a reliable source to provide insight into what our database was doing both before and after our updates to CommandPost. Most (if not all) of the stats and charts we typically pay attention to in MMS looked normal for our application and MongoDB deployment. As we worked our way through each metric, we finally came across one that had changed significantly- Lock Percentage. Since releasing the latest updates to CommandPost, our deployment’s primary client-serving database saw its lock percentage jump from about 10% to a constant rate of 150-175%. This was a huge jump with a very negative impact on our application - API requests timed out, queries took minutes to complete and our client-facing applications became nearly unusable. Why is Lock Percentage important? A quick look at how MongoDB handles concurrency tells us exactly why Lock Percentage became so important for us. MongoDB uses a readers-writer lock that allows concurrent reads access to a database but gives exclusive access to a single write operation. When a read lock exists, many read operations may use this lock. However, when a write lock exists, a single write operation holds the lock exclusively, and no other read or write operations may share the lock. Locks are “writer greedy,” which means writes have preference over reads. When both a read and write are waiting for a lock, MongoDB grants the lock to the write. As of version 2.2, MongoDB implements locks at a per-database granularity for most read and write operations. The “greediness” of our writes was not only keeping our clients from being able to access data (in any of our collections), but causing additional writes to be delayed. Strategies to Reduce Lock Contention Once we identified the collections most affected by the locking, we identified three possible remedies to the issue and worked to apply all of them. Schema Changes The collection that saw our greatest load (in terms of writes and reads) originally contained a few embedded documents and arrays that tended to make updating documents hard. We took steps to denormalize our schema and, in some cases, customized the _id attribute. Denormalization allowed us to model our data for atomic updates. Customizing the _id attribute, allowed us to simplify our writes without additional queries or indexes by leverage the existing index on the document’s _id attribute. Enabling atomic updates allowed us to simplify our application code and reduce the time spent in application write lock. Use of Message Queues To manage the flow of data, we refactored some writes to be managed using a Publish-Subscribe pattern. We chose to use Amazon’s SQS service to do this, but you could just as easily use Redis, Beanstalkd, IronMQ or any other message queue. By implementing message queuing to control the flow of writes, we were able to spread the frequency of writes over a longer period of time. This became crucially important during times where our monitoring services came under higher-than-normal load. Multiple Databases We also chose to take advantage of MongoDB’s per database locking by creating and moving write-heavy collections into separate databases. This allowed us to move non-client-facing collections into databases that didn’t need to be accessed by our API and client queries. Splitting into multiple databases meant that only the database taking on an update needed to be locked, leaving all other databases to remain available to serve client requests. How did things change? The aforementioned changes yielded immediate results. The results were so drastic that many of our users commented to us that the application seemed faster and performed better. It wasn’t their imaginations - as you can see from the “after” Lock Percentage chart below, we reduced the value to about 50% on our primary client-serving database. What’s Next? In working with MongoDB Technical Services, we also identified one more strategy we intend to implement to further reduce our Lock Percentage - Sharding. Sharing will allow us to horizontally scale our write workload across multiple servers and easily add additional capacity to meet our performance targets. We’re excited about the possibility of not just improving the performance of our MongoDB deployment, but offering our users faster access to their data and a better overall experience using CommandPost. If you want to learn more about how to use MongoDB Management Service to identify potential issues with your MongoDB deployment, keep it healthy and keep your application running smoothly, attend my talk “Performance Tuning on the Fly at CMP.LY” at MongoDB World in New York City, on Tuesday, June 24th at 2:20pm in the New York East Ballroom.
What Do I Use This For? 270,000 Apps
“Would you rather run 10 different databases, or run one database in 10 different ways?” - Charity Majors, Parse/Facebook. When introduced to a new technology, would-be users often want to know, ‘So, what should I use this for?’ At MongoDB, we like to say, ‘MongoDB is a general purpose database good for a wide variety of applications.’ While this may be true, it of course doesn’t help the would-be user with her original question: ‘So, what do I use this for?’ Fair enough. In the evening keynote at MongoDB World, Charity Majors offered a response better than any MongoDB employee could have conjured up. Charity is the Production Engineer at Parse (now part of Facebook), a mobile backend-as-a-service that “allows you to build fully featured mobile apps without having to sully your pure mind with things like data models and indexes.” And it runs on MongoDB. Parse runs and scales more apps than even the most sprawling enterprises. 270,000 apps. The number of developers using Parse is growing at 250% per year; the number of API requests is growing at 500% annually. They serve every kind of workload under the sun. “We don’t know what any app’s workload is going to look like, and neither do the developers,” says Charity. And as Parse goes, so goes MongoDB. The diversity of workloads that Parse runs on MongoDB is testament to the canonical ‘general purpose’ argument. With 270,000 different apps running on MongoDB, it should be clear that you can use it for, at the very least, a lot of different use cases. But Charity’s rousing speech offered an implicit response to the use case question cited above. ‘What do I use this for?’ begs a different but arguably more important question that Charity suggests users ask themselves: ‘What can I not use this for?’ That is, when choosing a technology -- especially a database -- users should be looking for the most reusable solution. “Would you rather run 10 different databases, or run one database in 10 different ways?” Charity asked the audience. “There is no other database on the planet that can run this number of workloads.” While most companies don’t need a database that works for 270,000 applications at once, every developer, sysadmin, DBA, startup, and Fortune 500 enterprise faces the same questions as Parse but on its own scale. Given a limited amount of time and money, how many databases do you want to learn how to use? How many databases do you want in production? How many vendor relationships do you want to manage? How many integration points do you want to build? Assuming one database works for most if not all use cases, what is that worth to you? These are all flavors of what we’ll now call ‘The Parse’ question. To those in search of a solution, we suggest you take a look at MongoDB. And if you’re still unsure, we can try to put you in touch with Charity. But no promises -- she’s a bit of a celebrity these days. ======== Some other delectable quotes from Charity’s keynote because...we just couldn’t resist: “Holy #%& there are so many people here.” “I’ve been coming to MongoDB conferences for almost 2 years now, and they just keep getting better.” “Speaking as a highly seasoned operations professional...I hate software for a living, and I’m pretty good at it.” “Reliability, Flexibility, Automation.” (The 3 things she loves about MongoDB.) “When I talk about reliability, I’m talking about reliability through resiliency...you should never have to care about the health of any individual nodes, you should only have to care about the health of the service.” “Scalability is about more than just handling lots of requests really fast. It’s about building systems that don’t scale linearly in terms of the incremental cost to maintain them.” “The story of operations is the story of dealing with failures. And this is why MongoDB is great, because it protects you from failures. And when your database lets you sleep through the night, how bad can it be?” “May your nights be boring; may your pagers never ring.” To see all MongoDB World presentations, visit the [MongoDB World Presentations](https://www.mongodb.com/mongodb-world/presentations) page.