Surviving Success at Matchbook: Using MMS To Track Down Performance Issues

MongoDB

#Releases

This is a guest post from Jared Wyatt, CTO of Matchbook, an app for remembering the places you love and want to try.

I joined Matchbook as CTO in January with the goal of breathing new life into an iOS app that had a small, but very devoted following. For various reasons, we decided to start fresh and rebuild everything from the ground up—this included completely revamping the app itself and totally redesigning our API and backend infrastructure. The old system was using MySQL as a datastore, but MongoDB seemed like a better fit for our needs because of its excellent geospatial support and the flexibility offered by its document-oriented data model.

We submitted Matchbook 2.0 to the App Store at the end of June and within a few days received an email from Apple requesting design assets because they wanted to feature our app. So, of course we were all, like, “OMG OMG OMG.”

An Influx of Users

We had originally planned for a quiet roll-out of version 2.0 because it was a completely new codebase and had not really been tested under load. However, our cautious reasoning was replaced by grandiose visions of fame and glory when Apple offered to feature us.

Matchbook 2.0 launched in the App Store on July 3rd.  It was listed on the App Store home page under “New & Noteworthy” with top billing in the “Food & Drink” category. Within a week, we had onboarded tens of thousands of new users. Sweeet! It was high-fives all around until it suddenly wasn’t.

As our user base exploded, our application performance monitoring tool (New Relic) indicated massive amounts of time spent in the database during spikes of heavy user activity. Many, many milliseconds were being squandered somewhere in the ether while our API server was chatting with our MongoDB server. Support tickets and tweets started coming in about how much we sucked. We started freaking out (just a little) and began to rue the day we let Apple promote our app.

Monitoring to the Rescue

Prior to the launch, in addition to setting up New Relic to monitor our application, we set up MMS to monitor MongoDB. New Relic showed us that the performance issue was related to the database, but didn’t provide us with the detail necessary to determine what was causing the slowdown. So, I went to MMS. The first thing that caught my eye was the cursors chart. There were some freakish spikes in concurrently open cursors for the amount of activity on the database. So I says to myself, I says, “Jared, that seems sketchy, but why is it happening?”

I poked around in MMS a bit and noticed the profile data log—it was empty. At the risk of sounding like a n00b, I didn’t know what MongoDB profiling was, but it seemed like something I should look into. The MongoDB profile documentation indicates that level 1 profiles slow operations. Wait—did someone say slow operations? That’s me! I have slow operations! So, I hopped over to our database and said { profile: 1, slowms: 200 }.

Suddenly, query profiles started showing up in MMS and the universe began to make sense. We discovered that our ODM was running a lot of searches on indexed fields (which is good) using regular expressions instead of strings (which is bad for speediness). Upon further investigation, we found that this was happening because we had used the ODM to assign certain case-insensitive validations to some of the data models in our code. We made the appropriate changes and saw our performance issues immediately disappear. Our users were happy again.

image

Post Mortem

Although it caused big problems, this turned out to be a simple error with a simple fix. If not for MMS, the discovery could have been very time-intensive and stressful. It simply did not occur to us that our case-insensitive validations would cause the ODM to build queries with regular expressions and thus result in mad-crazy performance issues. Thanks to MMS, we got a clear picture of what went wrong, and it led us to implement a more efficient solution that gives us the case-insensitive validations we need without running regex searches in MongoDB.

It’s widely accepted that enterprise level systems need good monitoring tools because of their size and complexity, but the same need is often overlooked in tech startups. In today’s ecosystem where everyone is standing on the shoulders of dozens of 3rd-party libraries/frameworks/whatever to build a simple app, it’s often difficult to deduce where things might be going wrong. More than ever, the small, lean tech startups need tools that give us good insight so we can optimize performance and solve problems without expending too many of the precious few resources that we have.

Takeaways

  • Set up monitoring. Visibility into your operations and interpreting the data correctly is your lifeline. Set up some custom dashboards in MMS for at-a-glance views of key metrics.
  • Load test. Then load test some more and watch the data. You will see strange and wonderful things that you never thought possible when you watch how your application and database operations perform under load. Try to discover and fix some of these things before you launch. Load testing can also inform you about what specific metrics you should pay close attention to for your particular application.
  • Set up performance alerts. Once you have a pretty good idea of which metrics you need to pay attention to, create alerts for when these data points approach unacceptable levels.
  • Set up basic alerts for your server configuration, e.g. a replication lag alert for your replica set.
  • Strike a ninja-like offensive pose when you launch. You never know what will happen and must be ready with cat-like reflexes.

Learn more about Matchbook at matchbook.co. We’re currently hiring designers and developers, so feel free to drop us a line at jobs@matchbook.co for more info.