MongoDB at Baidu: Powering 100+ Apps Across 600 Nodes at PB Scale

Mat Keep

#scale #World 2016 #Business

Multi-Tenant Cluster Storing 200+ Billion Documents across 160 Shards

In one of the busiest sessions at this year’s MongoDB World user conference, Xiao Beibei, DevOps engineer at Baidu, provided insight into what has grown to become the largest MongoDB deployment in Greater China.

As China’s largest internet services company, and the 4th most trafficked website on the planet based on Alexa rankings, everything Baidu does is “at scale.”

Starting Out

Baidu started out with MongoDB in 2012, initially migrating its user address book service from MySQL after hitting performance and scalability limits. MongoDB’s distributed design provided the scale required to serve a growing user base; the native JSON data model gave the needed performance improvements along with a dynamic schema for fast application evolution. The address book service soon grew to over 300 million users, and served as a catalyst to widespread MongoDB adoption across the company.

MongoDB Today

MongoDB now powers over 100 different Baidu products and services, including:

  • Messaging
  • Photo sharing and facial recognition
  • Metadata storage for the Baidu Personal Cloud, with each of its 300M+ users allocated 2 TB of free storage
  • Baidu Maps
  • Social forums
  • User activity logs

Many more projects are currently under development.

Running at Scale

Baidu has deployed a single, multi-tenant MongoDB cluster to power its applications, which access MongoDB through a shared REST API layer. The cluster has grown to 160 shards deployed on 600 SSD-equipped nodes, distributed across multiple locations for disaster recovery. It stores more than 200 billion documents in replica sets of up to 5 members each, and currently manages over 1 PB of data.
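To put those figures in perspective, here is a rough back-of-the-envelope calculation. It is only a sketch: it assumes data is spread evenly across shards, treats 1 PB as 10^15 bytes, and uses the round numbers quoted above. Since the article says "over 1 PB" and "200+ billion", the results are approximate floors rather than measured values, and the 600 nodes may also include machines that are not shard members.

```python
# Round figures quoted in the article.
SHARDS = 160
NODES = 600
DOCUMENTS = 200_000_000_000
DATA_BYTES = 10**15  # assumption: 1 PB taken as 10^15 bytes

# Averages under the (unrealistic) assumption of perfectly even distribution.
docs_per_shard = DOCUMENTS / SHARDS
bytes_per_shard = DATA_BYTES / SHARDS
avg_doc_bytes = DATA_BYTES / DOCUMENTS
nodes_per_shard = NODES / SHARDS

print(f"documents per shard: {docs_per_shard:,.0f}")
print(f"data per shard (TB): {bytes_per_shard / 10**12:.2f}")
print(f"average document size (bytes): {avg_doc_bytes:,.0f}")
print(f"nodes per shard: {nodes_per_shard:.2f}")
```

Under these assumptions each shard holds about 1.25 billion documents and 6.25 TB, the average document is about 5 KB, and there are roughly 3.75 nodes per shard, which is consistent with replica sets of up to 5 members.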

“We need a strong & scalable database architecture. MongoDB is fantastic!”

Xiao Beibei, Senior Developer, Baidu

Most instances are now running MongoDB 3.0 with the WiredTiger storage engine, while new projects are rolling out on the latest MongoDB 3.2 release, taking advantage of new features such as document validation for data governance controls.
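As an illustration of the document validation feature mentioned above, the sketch below builds the kind of `create` command that MongoDB 3.2 accepts, with a validator written using ordinary query operators. The collection name `contacts` and the fields `user_id` and `phone` are hypothetical and are not taken from Baidu's schema; the helper `build_create_command` is also invented for this example.

```python
def build_create_command(collection_name):
    """Return a `create` command document with a query-operator validator.

    The field names below are hypothetical, chosen only for illustration.
    """
    validator = {
        "$and": [
            {"user_id": {"$type": "string"}},  # user_id must be a string
            {"phone": {"$exists": True}},      # phone must be present
        ]
    }
    return {
        "create": collection_name,       # the command name must be the first key
        "validator": validator,
        "validationLevel": "strict",     # check all inserts and updates
        "validationAction": "error",     # reject documents that fail validation
    }


command = build_create_command("contacts")
print(command)
```

With PyMongo and a reachable server, a document like this could be sent using `db.command(command)`. Setting `validationAction` to `"warn"` instead would log violations without rejecting writes, which can be a gentler way to introduce rules on an existing collection.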

Next Steps

In his presentation, Beibei shared his experiences with MongoDB, the challenges he has encountered along the way, and the solutions his engineers have implemented. He also looks forward to the enhancements to MongoDB’s sharding architecture planned for the 3.4 release. The planned chunk balancing improvements are expected to make data redistribution across the cluster faster and more responsive as data sets continue to grow, further supporting the extreme scale Baidu demands.

View the Presentation from MongoDB World

Download the MongoDB Architecture Guide for a deep dive into MongoDB’s design for modern, scalable applications