Building a push notification system on a sophisticated data analytics pipeline powered by Apache Kafka, Storm and MongoDB
2015 was an important year for the music industry. It was the first time digital became the primary revenue source for recorded music, overtaking sales of physical formats. Key to this milestone was the revenue generated by streaming services – growing over 45% in a single year.
As with many consumer services, the music streaming market is fragmented across the globe. In India – the second most populous country on the planet and the second largest smartphone market – Saavn has grown to become the sub-continent’s largest music service. It has 80 million subscribers and has seen a 9x increase in Daily Active Users (DAU) in just 24 months, with 90% of its streams served to mobile users. There are many factors that collectively have driven Saavn’s growth – but at the heart of it is data. And for this, they rely on MongoDB.
Saavn started out using MongoDB as a persistent cache, replacing an existing memcached layer. The team soon realized the database was versatile enough to serve as the system of record for its data on subscribers, devices, and user activity. It was MongoDB’s flexibility and scalability that proved instrumental in keeping pace with Saavn’s breakneck growth.
Through its extensive collection of music, the company quickly attracted new users to its streaming service, but found engagement often dropped away. It identified that push notifications sent directly to client devices were key to reconnecting with users and keeping them engaged by serving personalized playlists. At this year’s MongoDB World conference, CTO Sriranjan Manjunath presented how Saavn has used MongoDB as part of a sophisticated analytics pipeline to drive a 3x increase in user engagement.
As Sriranjan and his team observed, it wasn’t enough to simply broadcast generic notifications to its users. Instead, Saavn needed to craft notifications that provided playlists personalized to each user. Saavn built a sophisticated data processing pipeline that uses a scheduler to extract device, activity, and user data stored in MongoDB. From there, it computes relevant playlists by analyzing a user’s listening preferences, activity, device, location, and more. It then sends the computed recommendations to a dispatcher process that delivers the playlist to each user’s device and inbox. To refine personalization, all user activity is ingested back into a Kafka queue, where it is processed by Apache Storm and written back to MongoDB. Saavn is also expanding its use of artificial intelligence to better predict users’ interests, and is using MongoDB to store the resultant machine learning models and serve them in real time to the recommender application.
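The presentation does not detail the recommendation logic itself, but the "compute relevant playlists from listening activity" step can be illustrated with a minimal, purely hypothetical sketch. The function, field names, and scoring rule below are assumptions for illustration only – the real pipeline pulls activity from MongoDB and hands results to a dispatcher:

```python
from collections import Counter

def recommend_tracks(activity_events, catalog, top_n=5):
    """Rank candidate tracks for one user by how often that user has
    played each artist. Illustrative only: a stand-in for the playlist
    computation stage of a pipeline like Saavn's."""
    # Count plays per artist from the user's activity stream.
    plays = Counter(event["artist"] for event in activity_events
                    if event["action"] == "play")
    # Skip tracks the user has already played.
    heard = {event["track"] for event in activity_events}
    # Score each unheard catalog track by the user's affinity for its artist.
    scored = [(plays[track["artist"]], track["title"])
              for track in catalog if track["title"] not in heard]
    scored.sort(reverse=True)
    return [title for score, title in scored[:top_n] if score > 0]

# Example: a user who favors artist "A" gets A's unheard track first.
events = [
    {"action": "play", "artist": "A", "track": "a1"},
    {"action": "play", "artist": "A", "track": "a2"},
    {"action": "play", "artist": "B", "track": "b1"},
]
catalog = [
    {"artist": "A", "title": "a3"},
    {"artist": "B", "title": "b2"},
    {"artist": "C", "title": "c1"},
]
print(recommend_tracks(events, catalog))  # ['a3', 'b2']
```

In a production system this scoring would of course be replaced by the machine learning models mentioned above; the point is only that the compute stage is a pure function from stored activity to a ranked playlist, which makes it easy to fan out across workers.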
The system currently sends 30m notifications per day, but has been sized to support up to 1m per minute, providing plenty of headroom to support Saavn’s continued growth.
In his presentation, Sriranjan discusses how Saavn migrated from MongoDB 2.6 to MongoDB 3.0, taking advantage of the WiredTiger storage engine’s document-level concurrency control to deliver improved performance. He shares key lessons learned in modifying schema design to reflect the differences in how updates are handled by the underlying storage engine, and in using TTL indexes to automatically expire data from MongoDB. Sriranjan also discusses shard key selection to achieve uniform data distribution across the cluster, and the benefits of using MongoDB Cloud Manager for system monitoring and continuous backups, including integration with Slack for automated alerting to the ops team.
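TTL indexes and hashed shard keys are standard MongoDB features. As a minimal mongo-shell sketch – the collection names, field names, and expiry period here are illustrative assumptions, not Saavn’s actual configuration:

```javascript
// Automatically delete activity documents 30 days after their
// createdAt timestamp (expireAfterSeconds value is illustrative).
db.activity.createIndex({ createdAt: 1 }, { expireAfterSeconds: 2592000 })

// Shard the activity collection on a hashed user ID so that writes
// distribute uniformly across the cluster rather than hot-spotting.
sh.shardCollection("saavn.activity", { userId: "hashed" })
```

A hashed shard key trades range-query locality for even write distribution, which suits an append-heavy activity stream like the one described above.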
Click through to view Saavn’s presentation from MongoDB World
To learn more about managing real-time streaming data, download:
About the author - Mat Keep
Mat is a director within the MongoDB product marketing team, responsible for building the vision, positioning and content for MongoDB’s products and services, including the analysis of market trends and customer requirements. Prior to MongoDB, Mat was director of product management at Oracle Corp. with responsibility for the MySQL database in web, telecoms, cloud and big data workloads. This followed a series of sales, business development and analyst / programmer positions with both technology vendors and end-user companies.