Multi-Tenant Cluster Storing 200+ Billion Documents across 160 Shards
In one of the busiest sessions at this year’s MongoDB World user conference, Xiao Beibei, DevOps engineer at Baidu, provided insight into what has grown to become the largest MongoDB deployment in Greater China.
As China’s largest internet services company, operating the 4th most trafficked website on the planet according to Alexa rankings, Baidu does everything “at scale.”
Baidu first adopted MongoDB in 2012, migrating its user address book service from MySQL after hitting performance and scalability limits. MongoDB’s distributed design provided the scale needed to serve a growing user base, while its JSON-based document data model delivered the required performance improvements along with a dynamic schema for fast application evolution. The address book service soon grew to over 300 million users and served as a catalyst for widespread MongoDB adoption across the company.
MongoDB now powers over 100 different Baidu products and services, including:
- Photo sharing and facial recognition
- Metadata storage for the Baidu Personal Cloud, with each of its 300M+ users allocated 2 TB of free storage
- Baidu Maps
- Social forums
- User activity logs
Many more projects are currently under development.
Running at Scale
Baidu has deployed a single, multi-tenant MongoDB cluster to power its applications, which access MongoDB through a shared REST API layer. The cluster has grown to 160 shards deployed on 600 SSD-equipped nodes, distributed across multiple locations for disaster recovery. It stores 200 billion documents, with each shard running as a replica set of up to 5 members, and currently manages over 1 PB of data.
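To put those figures in perspective, here is a back-of-envelope sketch of per-shard averages. It is only illustrative and rests on assumptions not stated in the talk: documents and data are spread perfectly evenly (real clusters never are), 1 PB means 10^15 bytes counted once rather than per replica, and all 600 nodes host shard replica set members (some may instead run config servers or routers). The constant and function names are hypothetical.

```python
# Back-of-envelope per-shard averages for the cluster figures quoted above.
# Assumptions (not from the source): perfectly even distribution,
# decimal units (1 PB = 10**15 bytes), data counted once (not per replica),
# and every node hosting a shard replica set member.

SHARDS = 160
NODES = 600
DOCUMENTS = 200 * 10**9   # "200 billion documents"
DATA_BYTES = 10**15       # "over 1 PB", so this is a lower bound


def per_shard_averages(shards, nodes, documents, data_bytes):
    """Return average documents, terabytes, and nodes per shard."""
    return {
        "docs_per_shard": documents / shards,
        "tb_per_shard": data_bytes / shards / 10**12,
        "nodes_per_shard": nodes / shards,
    }


if __name__ == "__main__":
    averages = per_shard_averages(SHARDS, NODES, DOCUMENTS, DATA_BYTES)
    for name, value in averages.items():
        print(f"{name}: {value:,.2f}")
```

Under these assumptions each shard holds roughly 1.25 billion documents and at least 6.25 TB, and the average of 3.75 nodes per shard would be in line with replica sets of up to five members.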
We need a strong & scalable database architecture. MongoDB is fantastic!
Xiao Beibei, Senior Developer, Baidu
Most instances are now running MongoDB 3.0 with the WiredTiger storage engine, while new projects are rolling out on the latest MongoDB 3.2 release, taking advantage of new features such as document validation for data governance controls.
In his presentation, Beibei shared his experiences with MongoDB, the challenges he encountered along the way, and the solutions his engineers implemented. He also looked ahead to the enhancements to MongoDB’s sharding architecture planned for the 3.4 release: chunk balancing improvements are intended to make data redistribution across the cluster faster and more responsive as data sets continue to grow, further supporting the extreme scale Baidu demands.
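The idea behind chunk balancing can be pictured with a deliberately simplified model. This is a toy sketch, not MongoDB’s balancer: it assumes a chunk is the unit of migration and that the balancer repeatedly moves one chunk from the shard holding the most chunks to the shard holding the fewest until the spread falls within a threshold. MongoDB’s real thresholds, scheduling, and the 3.4 changes are not modeled, and the `balance` function and shard names are hypothetical.

```python
# Toy model of chunk-count balancing; NOT MongoDB's actual balancer code.
# Assumption: migrate one chunk at a time from the fullest shard to the
# emptiest shard until (max - min) chunk count is within `threshold`.


def balance(chunk_counts, threshold=2):
    """Return (final_counts, migrations) after greedy one-chunk moves."""
    if threshold < 1:
        # A threshold of 0 could loop forever when counts differ by one.
        raise ValueError("threshold must be at least 1")
    counts = dict(chunk_counts)
    migrations = []  # list of (donor, recipient) pairs, in order
    while True:
        donor = max(counts, key=counts.get)
        recipient = min(counts, key=counts.get)
        if counts[donor] - counts[recipient] <= threshold:
            break  # spread is acceptable; stop migrating
        counts[donor] -= 1
        counts[recipient] += 1
        migrations.append((donor, recipient))
    return counts, migrations


if __name__ == "__main__":
    final, moves = balance({"shard-a": 10, "shard-b": 2, "shard-c": 0}, threshold=1)
    print(final, len(moves))
```

Running the example moves six chunks out of `shard-a`, leaving every shard with four. Each move narrows the gap between the fullest and emptiest shard, which is why a threshold of at least one guarantees the loop ends.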