Remaining Agile with Billions of Documents: Appboy’s Creative MongoDB Schemas

Jon Hyman


Appboy is pioneering a new vertical in the marketing automation industry with a powerful technology designed for companies looking to build better relationships with customers through mobile and other emerging channels. Appboy powers some of the most successful brands in the new mobile economy such as iHeartMedia, PicsArt, Etsy, Samsung and Urban Outfitters.

### Table of Contents

- [Part 1: Statistical Analysis](#part1)
- [Part 2: Multivariate Testing and Rate Limiting](#part2)
- [Part 3: Flexible Schemas: Extensible User Profiles](#part3)
- [Part 4: Data Intensive Algorithms](#part4)


To power its marketing automation platform, Appboy uses MongoDB as the main data storage layer for its analytics and targeting engine. Appboy is currently processing billions of data points per day on behalf of thousands of customers. In this blog post, you will learn some of the ways in which Appboy has evolved to use MongoDB, remaining agile as the company has grown to massive scale. This post will cover topics such as random sampling of documents, multivariate testing and multi-arm bandit optimization of such tests, field tokenization, and how Appboy stores multi-dimensional data on an individual user basis to be able to quickly optimize for the best time to deliver messages to end users.

Part 1: Statistical Analysis

Appboy works with customer bases of all different sizes. Some of our customers are just starting out and have tens of thousands of users; many have user bases in the single-digit to low tens of millions; and multiple Appboy customers have hundreds of millions of users that they collect data on and keep engaged thanks to Appboy’s marketing automation technology.

At the core of the Appboy platform is customer segmentation. Segmentation allows you to target users based upon their behavioral data, purchase history, technical characteristics, social profiles and demographics. Creative and intelligent use of segmentation and messaging automation enables our clients to seamlessly and easily move users from install to active customer, helping them meet their key performance indicators (KPIs). Segments can range from the very specific, such as “users with iOS 8.1 who have purchased shoes and have not favorited an item in the last week,” to broad audiences such as “Spanish-speaking users who have not yet beaten level 2.”

When clients use the Appboy dashboard to define a segment, Appboy performs a real-time calculation of the population size, and other characteristics such as how many users in the segment have push notifications enabled or how much money the average user has spent in the app. These calculations need to be real-time and interactive, which is challenging to do with massive user bases without a Google-sized amount of infrastructure. The challenge here is to figure out how to handle scale effectively, and do what works for any size user base. Random statistical sampling is a good way to do this.

About Statistical Sampling

Random, statistical sampling is around us in everyday life. Public opinion polls on what percentage of Americans approve of the President aren’t conducted by asking every single American his or her opinion. National TV ratings aren’t determined by a ratings agency being tapped into every person’s TV set. Instead, those figures are estimated based on statistical sampling, which makes it possible to estimate the characteristics of a larger group by randomly observing a subset of its members, called a sample. Through statistics, one doesn’t even need a massive sample in order to make accurate estimations for a massive population. Many political polls only use a few thousand adults to estimate the political leanings of hundreds of millions of citizens. But each estimate comes with a confidence interval, also called the margin of error: that’s the plus-or-minus number often reported alongside statistics from polling agencies.

Using Statistical Sampling

We can apply those same principles to our user base. Sampling users has a distinct advantage over traditional analytics databases because we can sample the entirety of actions taken by people instead of sampling from a raw event flow. One thing to note is that Appboy only uses statistical sampling to give immediate, real-time interactive feedback about a segment on the web dashboard. When running a marketing campaign or exporting a segment as a Facebook Custom Audience, exact membership is calculated and these principles do not apply.

To start, we can add a random number in a known range to each document and call this number a “bucket.” Let’s pick something small enough that, for any reasonably sized user base, we can expect to find users in each bucket, but large enough such that we can pick a decent distribution of users through ranges. For example, choosing a random number between 1 and 10 doesn’t allow us to easily grab a small percentage of a user base with hundreds of millions of users, and similarly, picking a number between 1 and 1,000,000 doesn’t give a good, uniform distribution for apps with smaller audiences. At Appboy, we have 10,000 buckets, ranging from 0 to 9,999.

Let’s say that you had 10 million documents which represented people. Let’s add a random number to that document and index it:

```
{
  random: 4583,
  favorite_color: "blue",
  age: 29,
  gender: "M",
  favorite_food: "pizza",
  city: "NYC",
  shoe_size: 11
}

db.users.ensureIndex({random: 1})
```

The first step would be to get a random sample. With 10 million documents, out of the 10,000 random buckets, we should expect each bucket to “hold” about 1,000 users:

```
db.users.find({random: 123}).count()   // ~1000
db.users.find({random: 9043}).count()  // ~1000
db.users.find({random: 4982}).count()  // ~1000
```

If we wanted to sample 1% of the user base, that’s 100,000 users. To do that, choose a random range that “holds” those users. For example, these all work:

```
db.users.find({random: {$gt: 0, $lt: 101}})
db.users.find({random: {$gt: 503, $lt: 604}})
db.users.find({random: {$gt: 8938, $lt: 9039}})
db.users.find({$or: [{random: {$gt: 9955}}, {random: {$lt: 56}}]})
```
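Note that the last query wraps around the end of the bucket range. A small helper that picks a random contiguous run of buckets (with wrap-around) covering a given fraction of the user base might look like this; the function name and structure are illustrative, not Appboy’s actual code:

```python
import math
import random

NUM_BUCKETS = 10_000  # buckets range from 0 to 9,999, as described above

def random_bucket_range(fraction):
    """Return the set of bucket values in a random contiguous run
    (with wrap-around) covering roughly `fraction` of the user base."""
    width = math.ceil(fraction * NUM_BUCKETS)
    start = random.randrange(NUM_BUCKETS)
    return {(start + i) % NUM_BUCKETS for i in range(width)}

buckets = random_bucket_range(0.01)
print(len(buckets))  # 100 buckets, i.e. ~1% of users
```

Any such range is equally good because the bucket values were assigned uniformly at random in the first place.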

Now that we have our random sample, we can learn about it. To measure its true size, we should first run a count, since due to randomness the sample may not hold exactly 100,000 users.

In parallel, we can add an arbitrary query on top of that sample. Perhaps we want to find out what percentage of users are male with a favorite color of blue.

```
sample_size = db.users.find({random: {$gt: 503, $lt: 604}}).count()
observed = db.users.find({random: {$gt: 503, $lt: 604}, gender: "M", favorite_color: "blue"}).count()
```

Hypothetically, assume that sample_size is 100,000 and that the observed count was 11,302. From this, we can extrapolate out that 11.3% of users in our 10 million user population match the criteria. To be good statisticians, we should also provide a confidence interval for this estimate to understand how far off we may be. The math behind a confidence interval is a bit complicated, but there are countless sample size calculators that you can refer to if you want to try this out yourself. In our case, it comes out to +/- 0.2%.
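For a proportion estimated from a simple random sample, the 95% margin of error can be computed with the standard normal approximation. A quick check of the numbers above (the function is a generic statistics formula, not Appboy-specific code):

```python
import math

def margin_of_error(p_hat, n, z=1.96):
    """Normal-approximation 95% confidence interval half-width
    for an observed proportion p_hat from a sample of size n."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

sample_size = 100_000
observed = 11_302
p_hat = observed / sample_size           # 0.113, i.e. 11.3%
moe = margin_of_error(p_hat, sample_size)
print(round(moe * 100, 2))               # ~0.2 percentage points
```

So the estimate is 11.3% ± 0.2%, matching the figure quoted above.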


In practice, Appboy does a variety of optimizations on top of this high-level concept when we perform statistical sampling. For starters, we make use of the MongoDB aggregation framework, and heavily utilize caching. A fantastic benefit to using MongoDB for this kind of sampling is that because we are using the memory-mapped storage engine, once we load the random sample into memory, we can run arbitrary queries on it very quickly. This provides our clients with a superior experience on our web dashboard, as they can interactively explore their user base by adding and removing selection criteria and seeing the statistics update immediately.

Part 2: Multivariate Testing and Rate Limiting

A Quick Primer on Multivariate Testing

In today’s competitive market, user segmentation is an absolute must-have. As experiences with brands continue their rapid shift to mobile and emerging channels, message personalization and relevance is more important than ever before for marketers. That’s why user segmentation is a solid prerequisite for engaging with customers.

But once you have defined a user segment, one of your next goals is to optimize your messaging to maximize conversions. Multivariate testing is a way to achieve this. A multivariate test is an experiment that compares users’ responses to multiple versions of the same marketing campaign. These versions share similar marketing goals, but differ in wording and style. The objective is to identify the version of the campaign that best accomplishes your desired outcome.

For example, suppose you have three different push notification messages:

Message 1: This deal expires tomorrow!
Message 2: This deal expires in 24 hours!
Message 3: Fourth of July is almost over! All deals end tomorrow!

Additionally, with those messages you want to test a variety of images to accompany the text.

Using a multivariate test, you can see which wording results in a higher conversion rate. The next time you send a push notification about a deal, you’ll know which tones and wordings are more effective. Even better, you can limit the size of the test to a small percentage of your audience, figure out which message works better, and then send that to everyone else!

When measuring a multivariate test, you have the test subjects who will receive the message, and a control group, a set of users who are in the segment but will not receive the message. This way, you can know how much uplift the messages generated in terms of conversions with respect to people who received nothing.

Technical Application

From a technical point of view, who receives the message should be random. That is, if you have 1 million users and you want to send a test to 50,000 of them, those 50,000 should be randomly distributed in your user base (and you also want another random 50,000 for your control group). Similarly, if you ran 10 tests each to 50,000 users, randomness helps ensure that different users are in each test group.

Thought about generically, this is the same problem as rate limiting a message. Many of our customers want to send a message to only a small group of users: an e-commerce company, for example, might want to give out 50,000 promo codes randomly across its user base. Conceptually, it is the same problem.

To achieve this, we can randomly scan across users based on the random value on each document.

At Appboy, we use parallel processing to manage users across different random ranges, and keep track of global state so we know when we have hit the rate limit. For multivariate testing, a variation of the message is then chosen based on a send probability (more on that later), or randomly chosen to be in the control.
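A minimal, single-process sketch of that scan is below. It walks the buckets in random order and stops once the rate limit is hit; the function name and in-memory data shape are hypothetical, and the real system parallelizes the scan across random ranges with shared global state, as described above:

```python
import random

NUM_BUCKETS = 10_000

def select_recipients(users_by_bucket, limit):
    """Walk buckets in random order, collecting users until the
    rate limit is reached. `users_by_bucket` maps bucket -> user ids."""
    order = list(range(NUM_BUCKETS))
    random.shuffle(order)
    selected = []
    for bucket in order:
        for user in users_by_bucket.get(bucket, []):
            if len(selected) == limit:
                return selected
            selected.append(user)
    return selected

# toy data: 3 buckets, 2 users each
demo = {0: ["a", "b"], 1: ["c", "d"], 2: ["e", "f"]}
print(len(select_recipients(demo, 3)))  # 3
```

Because bucket values were assigned uniformly at random, stopping early still yields a random subset of the user base.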

There be dragons!

For those math minded individuals out there, you may have noticed that by using the same random value, we’re overloading what it means to be random. That is, if we use statistical analysis on the random field and also randomly choose individuals to receive messages based on that same field, then in some cases, we have biased ourselves. To illustrate the claim, imagine that we picked all users with a random bucket value of 10 and sent them a message. That means that we no longer have a random distribution in that bucket of users who did and did not receive the message. As a simple workaround, Appboy uses multiple random values on users, careful not to use the same value for more than one purpose.

Part 3: Flexible Schemas: Extensible User Profiles

Appboy creates a rich user profile on every user who opens one of our customers’ apps. The basic fields for a user may look like this:

```
{
  first_name: "Jane",
  email: "",
  dob: "1994-10-24",
  gender: "F",
  country: "DE",
  ...
}
```

Appboy clients can also store what we call “custom attributes” on each of their users. As an example, a sports app might want to store a user’s “Favorite Player,” while an e-commerce app might store recent brands purchased, whether or not a customer has a credit card with the vendor, and the customer’s shoe size.

```
{
  first_name: "Jane",
  email: "",
  dob: "1994-10-24",
  gender: "F",
  custom: {
    brands_purchased: "Puma and Asics",
    credit_card_holder: true,
    shoe_size: 37,
    ...
  },
  ...
}
```

A huge benefit to this is that updates to these custom attributes can be inserted directly alongside other updates. Because MongoDB offers flexible schemas, it is very easy to add any number of custom fields and not have to worry about the type (is it a boolean, a string, an integer, float, etc.). MongoDB handles it all, and queries on custom attributes are easy to understand. There are no complicated joins against a value column, as there may be in a relational database where types have to be defined ahead of time.

```
db.users.update(…, {$set: {"custom.loyalty_program": true}})
db.users.find({"custom.shoe_size": {$gt: 35}})
```

The downside to this, if you’re not careful, is that in earlier versions of MongoDB it can end up taking up a lot of space if your clients use large custom attribute names (“this_is_my_really_long_custom_attribute_name_it_represents_shoe_size”), or the name of the field may be invalid as a MongoDB field name. Also, because type is not enforced, you can end up with a mismatch of value types across documents. One document might have listed that someone has { visited_website: true }, but if you’re not careful, another may have { visited_website: “yes” }.


To solve the first problem, we tokenize the custom attribute field names using a map. Effectively, it’s a document in MongoDB that maps values such as “shoe_size” to a unique, predictable, very short string. We can generate this map using only MongoDB’s atomic operators.

We use arrays to store items in a map, and the index into the array is its “token.” Each customer has at least one document with an array field called list. When we add a new custom attribute for the first time, we can atomically push it onto the end of the list, grab its index (its “token”) and then cache that value for fast retrieval later:

```
db.custom_attribute_map.update(
  {_id: X, list: {$ne: "Favorite Color"}},
  {$push: {list: "Favorite Color"}}
)
```

After this operation, the map document may have a list that looks like this:

```
list: ["Loyalty Program", "Shoe Size", "Favorite Color"]
// tokens:      0               1               2
```

MongoDB best practices caution against constantly growing documents, and as we’ve defined it so far, this document can grow unbounded. In practice, we are aware of this potential issue and have extended our implementation slightly to use multiple documents per customer by capping the size of the array. When adding new items to a list, the update operation can be restricted to only $push if the array length is less than a certain size. If the update operation does not result in a new $push, an atomic $findAndModify can be used to create a new document and add the element there. Tokenization definitely adds some indirection and complexity, but it lets us map custom attributes to numbers, which can be passed around throughout the code base.
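The end-to-end behavior of this scheme, including the capped arrays that roll over into a new document, can be modeled in a few lines. This is an in-memory sketch for illustration only; the class name, the tiny cap, and the `(document, index)` token shape are assumptions, and the real implementation does the append with the atomic $push guarded by $ne shown above:

```python
class AttributeTokenizer:
    """In-memory model of the tokenization scheme: each new attribute
    name is appended to a capped list, and its position becomes its
    token, expressed here as a (document, index) pair."""
    MAX_LIST_SIZE = 3  # kept tiny for the example; a real cap is larger

    def __init__(self):
        self.lists = [[]]  # one array per map document

    def token_for(self, name):
        # Return the existing token if the attribute is already mapped.
        for doc_idx, lst in enumerate(self.lists):
            if name in lst:
                return (doc_idx, lst.index(name))
        # Otherwise append, rolling over to a new document at the cap.
        if len(self.lists[-1]) >= self.MAX_LIST_SIZE:
            self.lists.append([])
        self.lists[-1].append(name)
        return (len(self.lists) - 1, len(self.lists[-1]) - 1)

t = AttributeTokenizer()
print(t.token_for("Loyalty Program"))  # (0, 0)
print(t.token_for("Shoe Size"))        # (0, 1)
print(t.token_for("Shoe Size"))        # (0, 1) -- stable on re-use
```

The key property is that a token, once assigned, never changes, so it can be cached and passed around the code base in place of the long attribute name.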

We can apply this solution to the other problem as well, where we may have mismatched data types across documents. We also use a map for keeping track of data types. For example, recording that “visited_website” is a boolean, and only accepts values true and false.
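One simple policy such a type map enables is sketched below: the first value recorded for an attribute fixes its type, and later writes with a mismatched type are rejected. The reject-on-mismatch behavior is a hypothetical illustration; a production system might instead coerce the value or warn:

```python
# Maps attribute name -> the type first recorded for it.
type_map = {}

def check_type(attribute, value):
    """Register the attribute's type on first sight; afterwards,
    accept only values of that same type."""
    expected = type_map.setdefault(attribute, type(value))
    return isinstance(value, expected)

print(check_type("visited_website", True))   # True: bool registered
print(check_type("visited_website", "yes"))  # False: str rejected
```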

Part 4: Data Intensive Algorithms

Intelligent Selection and Multi-Arm Bandit Multivariate Testing

The goal of a multivariate test is to find, in the shortest period of time, the variation that we are statistically confident has the highest conversion rate. In most platforms that provide multivariate tests, customers run tests and check in periodically on the results to determine the winner.

Appboy has a feature called Intelligent Selection, which analyzes the performance of a multivariate test and automatically adjusts the percentage of users that receive each message variant based on a statistical algorithm that makes sure we are adjusting for real performance differences and not just random chance. This algorithm is called the multi-arm bandit.

The math behind the multi-arm bandit algorithm is intense, and much too complicated for this blog post. I will, however, mention a great quote by Peter Whittle, a Professor of Mathematical Statistics at the University of Cambridge, who said in 1979:

“[The bandit problem] was formulated during the [second world] war, and efforts to solve it so sapped the energies and minds of Allied analysts that the suggestion was made that the problem be dropped over Germany, as the ultimate instrument of intellectual sabotage.”

But the reason to bring up this algorithm is to point out that, to run effectively, the multi-arm bandit algorithm takes in a lot of data as inputs. For each message variation, the algorithm considers the unique recipients who received it and the conversion rate, as a timeseries. Here is where MongoDB shines, because we can use pre-aggregated analytics to automatically roll up those stats in real-time:

```
{
  company_id: BSON::ObjectId,
  campaign_id: BSON::ObjectId,
  date: 2015-05-31,
  message_variation_1: {
    unique_recipient_count: 100000,
    total_conversion_count: 5000,
    total_open_rate: 8000,
    hourly_breakdown: {
      0: {
        unique_recipient_count: 1000,
        total_conversion_count: 40,
        total_open_rate: 125,
        ...
      },
      ...
    },
    ...
  },
  message_variation_2: {
    ...
  }
}
```

With a schema such as this, we can quickly look at the daily and hourly breakdown of conversions, opens and sends. Appboy’s schema is slightly more complicated, as there are other factors to consider (such as tracking conversions both by when they happen, and also with respect to when the user received the message), but this is the gist.
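To make the allocation step concrete, here is a minimal sketch of one common bandit strategy, Thompson sampling, driven by the rolled-up counts from documents like the one above. Appboy’s exact algorithm isn’t specified in this post, so treat the strategy choice, function name, and flat `stats` shape as assumptions:

```python
import random

def choose_variation(stats):
    """One bandit allocation step via Thompson sampling.
    `stats` maps variation name -> (unique recipients, conversions)."""
    best, best_draw = None, -1.0
    for variation, (recipients, conversions) in stats.items():
        # Sample from a Beta posterior over the conversion rate
        # (uniform prior); the arm with the largest draw wins.
        draw = random.betavariate(1 + conversions,
                                  1 + recipients - conversions)
        if draw > best_draw:
            best, best_draw = variation, draw
    return best

stats = {"variation_1": (100_000, 5_000),   # 5% conversion rate
         "variation_2": (100_000, 2_000)}   # 2% conversion rate
picks = [choose_variation(stats) for _ in range(1_000)]
print(picks.count("variation_1") > picks.count("variation_2"))  # True
```

With this much data the posteriors barely overlap, so nearly all traffic flows to the better variation; with sparse data the draws are noisy and the algorithm keeps exploring, which is exactly the balance the bandit formulation provides.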

Pre-aggregated documents allow us to very quickly pull back the entirety of an experiment. Since we shard this collection per company, we can simultaneously optimize an entire company’s campaigns at once in a scalable fashion.

Intelligent Delivery

Another proprietary algorithm that Appboy offers our customers is called Intelligent Delivery. When scheduling a message campaign for delivery, Appboy analyzes the optimal time to send the message to each individual user and delivers it at exactly that moment. If Alice is more likely to engage with push notifications from an application at night, but Bob is more likely to do so in the morning before he goes to work, they’ll each get notifications in their respective best windows. This feature works wonders. As Jim Davis, the Director of CRM and Interactive Marketing at Urban Outfitters, put it:

“Comparing overall open rates before and after using it, we've seen over 100% improvement in performance. Our one week retention campaigns targeted at male Urban On members improved 138%. Additionally, engaging a particularly difficult segment, users who have been inactive for three months, has improved 94%.”

This algorithm is certainly data intensive. In order to intelligently predict the best time to send a message to each individual, we need to know a lot of characteristics about that user’s behavior and usage patterns. On top of this, Appboy sends tens of millions of Intelligent Delivery messages every single day, so this predictive technology needs to be blazing fast.

The approach here is similar to Intelligent Selection: we can pre-aggregate dimensions on a per-user basis in real-time. With MongoDB, we have a series of documents per user that look like this:

```
{
  _id: BSON::ObjectId of user,
  dimension_1: [DateTime, DateTime, …],
  dimension_2: [Float, Float, …],
  dimension_3: […],
  ...
}
```

When dimensional data for a user comes in that we care about, we denormalize it and record a copy on one of these documents. Each document is sharded on {_id: “hashed”} for optimal distribution across shards for both read and write throughput. When we need to send a message with Intelligent Delivery, we can query back a handful of documents very quickly and feed that into our machine-learning algorithm. Not only does MongoDB make this solution possible, but it has been incredibly scalable for us as we’ve added dozens of dimensions as inputs. We’re constantly updating this algorithm with new dimensions as we send more and more Intelligent Delivery messages, and it’s a breeze for our engineers because of MongoDB’s flexible schemas.
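As a deliberately simple stand-in for that machine-learning step, one could pick each user’s modal engagement hour from one of the denormalized timestamp dimensions. The function below is a hypothetical illustration of consuming a per-user dimension array, not Appboy’s actual model:

```python
from collections import Counter
from datetime import datetime

def best_send_hour(engagement_times, default_hour=12):
    """Return the hour of day in which this user most often engaged,
    falling back to a default when we have no data for the user."""
    if not engagement_times:
        return default_hour
    hours = Counter(t.hour for t in engagement_times)
    return hours.most_common(1)[0][0]

# Alice engages in the evenings
alice = [datetime(2015, 5, d, 21, 30) for d in range(1, 6)]
print(best_send_hour(alice))  # 21
```

The real algorithm weighs dozens of dimensions, but the access pattern is the same: fetch a handful of per-user documents, then compute on the arrays they contain.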

Wrap Up

We’ve covered a lot in this blog post! Statistical sampling, random rate limiting, flexible schemas and field tokenization, multivariate testing, multi-arm bandit algorithms and Intelligent Delivery. The main takeaways are:

  • MongoDB’s flexible schema makes it easy to add custom dimensions to any document
  • Adding random sampling on top of MongoDB documents enables fast analysis of a large document collection
  • Consolidating data in MongoDB for fast retrieval is a huge win for data-intensive algorithms

If you have a mobile application, we would love to hear from you at Appboy!

About the Author - Jon

Jon Hyman is the Co-Founder and CIO of Appboy, the world’s leading mobile marketing automation platform. He is in charge of building Appboy’s technical systems and infrastructure as well as managing the company’s technical operations.

Appboy is pioneering a new vertical in the marketing automation industry with a powerful technology designed for companies looking to build better relationships with customers through mobile and other emerging channels. With its industry-leading 360-degree customer profiles and audience segmentation engine at its core – coupled with an advanced multi-channel campaign creation and delivery system that automates personalized, life cycle marketing catered to each individual customer’s journey – Appboy empowers marketers to make intelligent, data-driven decisions around how to best engage, retain and monetize customers. Appboy powers some of the most successful brands in the new mobile economy – such as EPIX, iHeartMedia, PicsArt, Samsung and Urban Outfitters – with its thought leadership, relentless innovation and focus on delivering tangible ROI.