Leaf in the Wild: India’s Largest Publisher Unlocks Behavioral Insight with a MongoDB-Powered Real-Time Web Analytics Engine
February 11, 2016 | Updated: December 8, 2016
Leaf in the Wild posts highlight real-world MongoDB deployments. Read other stories about how companies are using MongoDB for their mission-critical projects.
Times Internet Limited, the largest news publisher in India, relies on MongoDB to power its editorial analytics engine, serving more than 150 million readers with customized content and experiences. I had the chance to meet with Gagan Bajpai and Gyan Mittal, Senior Managers, Central Technical Team at Times Internet Limited (TIL) to learn more about how they use MongoDB.
“We don’t ask, ‘Why MongoDB?’ anymore. Now we ask, ‘Why would we use anything else?’” Gyan Mittal, Senior Manager, Central Technical Team, Times Internet Limited.
Please start by telling us a little bit about Times Internet.
The Times of India Group is India's largest media and entertainment company, and all of its digital platforms are run by TIL. Our websites are among the fastest-growing web and mobile properties worldwide. Since its inception in 1999, Times Internet has led the Internet revolution in India, emerging as India's foremost web entity, and it now runs diverse portals and specialized websites.
TIL properties reach over 150M visitors and serve 2 billion page views every month across web and mobile channels. We have brands across news, entertainment, sports, e-commerce, classifieds, startup investments, local partnerships, and more. Today, we have a diversified set of 22+ digital consumer-facing businesses.
Tell us how you use MongoDB.
We use MongoDB for a range of applications. This includes the content management system journalists use to upload stories, e-commerce gateways, newsletter and alerts apps, and the social platform that covers all of our web properties. We are also using MongoDB Cloud Manager for on-demand scaling and automated backups for disaster recovery.
Our web analytics engine is the most critical and highest-profile application running on MongoDB. It was the first MongoDB project at TIL, and we launched it to the business in late 2010. Our editorial staff and product managers rely on the analytics engine to deepen engagement with our 150M+ audience. The engine tracks and analyses user engagement with every published story, providing feedback on how content is consumed through heat maps and analytics dashboards. Site editors gain insight into the time spent per page, how content is shared across social networks, and where readers focus their attention. The analytics generated by MongoDB enable editorial staff to make data-driven decisions and improve future content to better match reader preferences, including tweaking headlines, moving copy, A/B testing alternative images, and altering page layouts. TIL’s engine also provides personalized content recommendations based on readers’ browsing habits. Collectively, these capabilities help ensure that the sites’ articles reach and engage the broadest possible audience.
Did you consider other databases for your app? What made you select MongoDB?
Everyone on my team at TIL comes from a relational database background, and we have massive respect for that technology. But our web analytics application presented us with a classic “big data” problem:
- We had to deal with large volumes of data generated by tracking user activity and how content is consumed on our sites.
- This data was coming in at high velocity from millions of concurrent users.
- We capture many different attributes of user behavior, so our database and analytics engine need to handle a wide variety of data structures.
In addition, development agility was essential. To put this into context, Internet growth in India is faster than almost anywhere else in the world. We have a huge population that is now getting access to the Internet via low-cost mobile platforms, so competition is intense and time to market is critical. We also knew that our application would need to evolve continually to keep pace with the features the business would ask us to add, so a flexible, dynamic schema was key to giving us the agility we needed. Working with a data model that eliminates the traditional object-relational impedance mismatch would allow our developers to build the app much faster.
Because of all of these factors, we felt a non-relational database would be a better fit for the web analytics app.
That said, what we love about relational databases is the ability to run deep and complex queries against the data, and this is also where MongoDB excels. Unlike other NoSQL databases that require you to integrate a search engine or replicate data to dedicated analytics nodes or Hadoop, MongoDB lets us run rich queries against data in place, in real time. MongoDB’s aggregation pipeline powers our heat maps and dashboards; it is much more performant and easier to use than MapReduce.
The MongoDB query language and secondary indexes give us a much more powerful framework to access and analyze multi-structured data than anything simple key-value stores can provide.
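As a rough illustration of the kind of query that sits behind a dashboard, the Java sketch below reproduces in memory what a three-stage aggregation pipeline ($match, $group, $sort) computes: total reading time and number of reads per article, most-read first. The event fields (articleId, eventType, seconds) and the sample data are hypothetical, not TIL's schema. The code assumes Java 16 or later for records. In a real deployment the pipeline would run inside MongoDB, next to the data, and the application would receive only the grouped results.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: field names and data are assumptions, not TIL's schema.
// Sent to MongoDB, the same logic would be expressed as pipeline stages, roughly:
//   [ { $match: { eventType: "read" } },
//     { $group: { _id: "$articleId", totalSeconds: { $sum: "$seconds" }, reads: { $sum: 1 } } },
//     { $sort: { totalSeconds: -1 } } ]
public class EngagementPipelineSketch {

    // One engagement event, as a tracking beacon might record it (hypothetical fields).
    record Event(String articleId, String eventType, int seconds) {}

    // Output of the $group stage: one row per article.
    record ArticleStats(String articleId, long totalSeconds, long reads) {}

    static List<ArticleStats> run(List<Event> events) {
        // Accumulators per articleId: [0] = sum of seconds, [1] = count of reads.
        Map<String, long[]> groups = new LinkedHashMap<>();
        for (Event e : events) {
            // $match: keep only "read" events
            if (!"read".equals(e.eventType())) {
                continue;
            }
            // $group: $sum of seconds and a count, keyed by articleId
            long[] acc = groups.computeIfAbsent(e.articleId(), k -> new long[2]);
            acc[0] += e.seconds();
            acc[1] += 1;
        }
        List<ArticleStats> out = new ArrayList<>();
        groups.forEach((id, acc) -> out.add(new ArticleStats(id, acc[0], acc[1])));
        // $sort: most total reading time first
        out.sort(Comparator.comparingLong(ArticleStats::totalSeconds).reversed());
        return out;
    }

    public static void main(String[] args) {
        List<Event> events = List.of(
                new Event("a1", "read", 40),
                new Event("a2", "read", 90),
                new Event("a1", "share", 0),
                new Event("a1", "read", 30));
        for (ArticleStats s : run(events)) {
            System.out.println(s);
        }
    }
}
```

With the four sample events in main, the share event is dropped at the $match step. The article a2 comes first with 90 seconds from one read, and a1 follows with 70 seconds from two reads.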
Developer velocity: that’s what I am focused on. How quickly can we get a robust application live? Our team built the analytics engine in a fraction of the time it would have taken on any other database, and it then scaled beautifully to help us understand and engage with millions of readers.
Please describe your MongoDB deployment.
Our total MongoDB estate is around 50 nodes, powering multiple apps. Most apps are powered by a single replica set configured with two data nodes and an arbiter. This provides the ideal balance between high availability and operational efficiency.
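Why use two data nodes plus an arbiter? In a MongoDB replica set, a primary can be elected only with votes from a strict majority of the set's voting members. An arbiter votes in elections but holds no data. The Java sketch below models only that vote arithmetic, not a real cluster, and its class and method names are hypothetical. It shows that two data nodes on their own cannot tolerate any failure, whereas adding an arbiter lets the set keep electing a primary after the loss of any one member.

```java
// Illustrative arithmetic only: models elections as "a strict majority of voting
// members must be available". Member counts follow the text (two data-bearing nodes
// plus one arbiter); nothing here connects to a real MongoDB deployment.
public class ReplicaSetQuorum {

    // Smallest number of votes that forms a strict majority of `voters`.
    static int majority(int voters) {
        return voters / 2 + 1;
    }

    // How many voting members can be lost while the rest can still elect a primary.
    static int toleratedFailures(int voters) {
        return voters - majority(voters);
    }

    public static void main(String[] args) {
        int dataNodes = 2;
        int arbiters = 1;
        int withoutArbiter = dataNodes;
        int withArbiter = dataNodes + arbiters;
        System.out.printf("2 data nodes: majority=%d, tolerated failures=%d%n",
                majority(withoutArbiter), toleratedFailures(withoutArbiter));
        System.out.printf("2 data nodes + arbiter: majority=%d, tolerated failures=%d%n",
                majority(withArbiter), toleratedFailures(withArbiter));
    }
}
```

The trade-off is that the arbiter adds a vote but not a copy of the data. After one data node fails, the set keeps running on a single copy until the failed member recovers.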
Our web analytics platform is deployed on a sharded cluster. This gives us the scalability we need. We have around 1.5TB of active data in the cluster. The application itself is written in Java.
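The interview does not say how TIL shards its analytics data. As a rough illustration of what a sharded cluster does, MongoDB's range-based sharding splits a collection into chunks. Each chunk covers a contiguous range of shard key values, and the mongos router sends a query on the shard key to the shard that owns the matching chunk. The Java sketch below models only that lookup, using a sorted map. The shard key (an article ID), the chunk boundaries, and the shard names are all hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

// Simplified model of range-based shard routing. The shard key, chunk boundaries and
// shard names are hypothetical; in MongoDB the chunk map lives on the config servers
// and routing is done by mongos, not by application code.
public class ChunkRouterSketch {

    // Inclusive lower bound of each chunk -> shard that owns it.
    // A chunk covers [its lower bound, the next lower bound).
    private final TreeMap<String, String> chunkLowerBounds = new TreeMap<>();

    ChunkRouterSketch(Map<String, String> chunks) {
        chunkLowerBounds.putAll(chunks);
    }

    // Returns the shard whose chunk range contains the given shard key value.
    String shardFor(String shardKey) {
        Map.Entry<String, String> owner = chunkLowerBounds.floorEntry(shardKey);
        if (owner == null) {
            throw new IllegalArgumentException("no chunk covers " + shardKey);
        }
        return owner.getValue();
    }

    public static void main(String[] args) {
        // "" is the smallest string, so it acts as the minimum key and every value is covered.
        ChunkRouterSketch router = new ChunkRouterSketch(Map.of(
                "", "shardA",
                "h", "shardB",
                "p", "shardC"));
        for (String key : new String[] {"article-100", "home-news", "politics-42"}) {
            System.out.println(key + " -> " + router.shardFor(key));
        }
    }
}
```

In a real deployment, chunks are split and migrated between shards by MongoDB itself rather than hard-coded, and the application simply connects through mongos.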
We run MongoDB on Linux-based servers hosted by Rackspace in a co-location facility.
Do you use any commercial services to support your MongoDB deployment?
We use MongoDB Professional to back the web analytics platform. Break/fix support is important, but as our deployment and our team grow, it’s good to be able to get regular check-ins with MongoDB engineers and to review things like schema design and best practices for operational processes.
As our deployment has grown, we have also started evaluating MongoDB Cloud Manager. Its automated configuration and deployment can simplify on-demand scaling and upgrades, and its backup service has enhanced our disaster recovery capability.
What has been the business outcome of using MongoDB for your web analytics engine?
We have demanding managers and editors looking to understand quickly how our readers are engaging with the news.
MongoDB is the solution that helps us turn large volumes of raw data into actionable insights that fundamentally change the way we deliver content.
Do you have plans to use MongoDB for other applications?
We don’t ask, ‘Why MongoDB?’ anymore. Now we ask, ‘Why would we use anything else?’
What advice would you give someone who is considering using MongoDB for their next project?
Don’t just follow the crowd. Don’t just choose the same technology you have always chosen. There is so much innovation happening today, and the databases of the last decade are not always the right choice.
Once you have a short list of potential technologies, test them with your app, your queries, and your data. It is the only way to be sure you are choosing the right technology going forward.
Gyan and Gagan, thank you both for your time and for sharing your experiences with the MongoDB community.
Are you building big data applications? Read Big Data Examples and Guidelines to get started.
About the Author - Mat Keep
Mat is a director within the MongoDB product marketing team, responsible for building the vision, positioning and content for MongoDB’s products and services, including the analysis of market trends and customer requirements. Prior to MongoDB, Mat was director of product management at Oracle Corp. with responsibility for the MySQL database in web, telecoms, cloud and big data workloads. This followed a series of sales, business development and analyst/programmer positions with both technology vendors and end-user companies.