Leaf in the Wild: Qihoo Scales with MongoDB

Mat Keep


Leaf in the Wild posts highlight real world MongoDB deployments. Read other stories about how companies are using MongoDB for their mission-critical projects.

100+ apps, 1,500+ Instances, 20B Queries per Day

Qihoo is China’s number 1 Android mobile distribution platform. Qihoo is also China’s top malware protection company, providing products for both web and mobile platforms. A MongoDB user since 2011, Qihoo has built over 100 different applications on MongoDB – including new services and migrations from MySQL and Redis – running on 1,500+ instances and supporting 20 billion queries per day.

I had the chance to sit down with Yang Yan Jie, the Senior DBA at Qihoo to learn more about how and why they use MongoDB, his scaling best practices, and recommendations for those getting started with the database.

Can you start by telling us about Qihoo?
Qihoo 360 Technology Co. Ltd. is a leading Chinese Internet company. At the end of June 2014, we had around 500 million monthly active PC Internet users and over 640 million mobile users.

Recognizing malware protection as a fundamental need of all Internet and mobile users, we built our large user base by offering comprehensive, effective and user-friendly Internet and mobile security products and services to protect users' computers and mobile devices against malware and malicious websites. Our products and services are supported by our cloud-based security technology, which we believe is one of the most advanced and robust technologies in the malware protection industry. We monetize our user base primarily through online advertising and Internet value-added services.

In terms of our market position, we are:

  • A top three Internet Company as measured by user base in China
  • No. 1 Android-based mobile distribution platform in China
  • No. 1 provider of Internet and mobile malware protection products and services in China
  • No. 2 PC search engine in China

When did Qihoo start using MongoDB?
We were a very early adopter of MongoDB, building our first applications on the database back in 2011. I think we were using version 1.8 then!

How is Qihoo using MongoDB today?
MongoDB has become our standard modern database platform. We now have over 100 applications powered by MongoDB – both external customer-facing services and internal business applications.

In total we have more than 1,500 MongoDB instances running on our in-house built “HULK” cloud platform, collectively serving 20 billion queries per day.
Three particularly critical applications for our business are:
  1. Location-based mobile search application. We use MongoDB with its geospatial indexes and queries to deliver geo-aware search results to mobile users. The user can be searching for anything, from a local restaurant, to a shop, to a car dealership. The app will detect their location and serve search results based on proximity. MongoDB handles 1.2 billion queries per day from this application.
  2. Caching layer for user authentication data. Qihoo is a central portal for many Chinese Internet users. We have many partners that our users can connect to directly after logging into our site. We provide Single Sign On (SSO) to multiple services so users don’t need to keep providing their security credentials as they navigate around the web. The user’s SSO session is cached in MongoDB for ultra-fast access. MongoDB supports millions of concurrent users, handling 30,000 operations per second and 1.8 billion queries daily.
  3. Log analytics platform. We need to know our infrastructure is running well. Our internal business users also want to measure user engagement with new promotions and campaigns. To accomplish this, we collect log data from all of our Linux, Apache web server and Tomcat servers, and stream it directly into MongoDB. From there, our internal business users can generate real time analytics and reports using our PHP-based Business Intelligence (BI) platform. MongoDB stores 2.5 billion documents at any one time across 18 shards configured with 3-node replica sets for always-on availability. MongoDB serves nearly 3 billion queries per day, including 1 billion writes.
What other databases do you use?
MongoDB is one of the three database technologies used in our company. It isn’t necessarily suitable for all applications, so we also use MySQL for relational data problems and Redis for certain caching use-cases. Over time, we have migrated more than a dozen projects from MySQL and Redis to MongoDB.

What factors drove this migration?
Our goal is to use the best technology where it best fits. In the case of MySQL, migration was driven by scalability and developer productivity. As a relational database, MySQL does not scale out, so as our user base grew above 100 million active users, we hit the limits of how far we could push MySQL. MongoDB auto-sharding allows us to scale on-demand using commodity hardware.

The MongoDB data model is also far more flexible. Our developers can get more done and iterate faster with MongoDB than they can with the relational model.

In the case of Redis, the migrations were driven by cost and flexibility. We found that MongoDB meets our low latency caching requirements for many applications, while it’s on-disk persistence reduces the need to provision costly systems configured with high-memory footprints. In addition, there is much more you can do with MongoDB’s document data model than you can with Redis’ Key-Value model. This translates directly to richer application functionality. For applications where data volumes are expected to grow rapidly, we choose MongoDB over Redis.

Tell us about the platforms you are running MongoDB on.
Most of our applications are PHP based. We run CentOS on x86 hardware. We have standardized on local SSD storage as this gives us the best performance. We are running MongoDB 2.4 and the latest 2.6 releases. We are also looking forward to MongoDB 3.0!

How is MongoDB configured?
We run both single replica sets and sharded clusters, depending on the application. We have data centres across the country, with the main ones located in Beijing. We deploy MongoDB on our private cloud across multiple data centers, both for disaster recovery and for low latency local reads and writes.

We don’t control our own fiber, so network quality is out of our control. For the most critical apps, we spin up identical MongoDB clusters in multiple data centers and use our own message queue to replicate between them – this gives us assurance of maintaining availability in the face of network partitions.

How do you manage your MongoDB deployment?
We have developed a centralized orchestration web platform, which we call the HULK cloud platform. It is used by nearly all of our technical engineers to control our mission critical infrastructure and services. It is a complex piece of engineering which we are very proud of. When we originally started the cloud platform project, we hoped it would allow our engineers to stand on the shoulders of giants, relying on the platform to speed up the time to market for their applications. Hence we named it “HULK”.

HULK currently provides elastic services such as Web, relational database, NoSQL and distributed storage, etc. At same time, the open platform concept attracted various internal teams to move their applications onto the platform. The re-platforming of these applications provided immediate access to other LoBs internally, and in the process of doing that we helped the business groups to attain higher efficiency and greater technology expertise.

MongoDB is one of the most critical services on HULK and it is fully integrated into the platform with a high degree of automation, allowing us to operate more than 1,500 MongoDB instances with just one and a half DBAs. The DBAs can perform “one click deployment” and “one click upgrade” tasks via the HULK management interface. All backup and monitoring is fully automated. For instance, if you add a new MongoDB node or cluster, HULK automatically configures the monitoring and backup strategy, as well as deploy the necessary agents. For developers, they can monitor a multitude of MongoDB metrics and status. In addition, they can open a ticket right on the management portal itself, instead of using email or IM, all with a few mouse clicks.

How do you backup MongoDB?
We use a combination of approaches, governed by the application’s RPO and RTO objectives:

  • Filesystem backups. This is the default approach. We shut down a secondary replica set member and snapshot the filesystem image
  • Incremental replication. For continuous backup, we have built a tool that tails the MongoDB oplog. We use this approach for more critical apps where we need faster restoration of service
  • Delayed replicas. We use this approach for additional assurances, again governed by how quickly we need to bring the data back

Can you share any best practices on scaling your MongoDB infrastructure?
There are three tips I would like to share:

  1. From a DBA perspective, invest time to understand application usage. The developers will give their guidance, but we generally take any number they give us and add 50%!
  2. If you encounter performance issues, start with your hardware. We found upgrading from hard disks to SSDs gave us an instant performance boost without any other optimizations.
  3. For highly dynamic, write-intensive workloads, make sure you monitor storage fragmentation and compact regularly if needed.

Are you measuring the impact of MongoDB on your business?
Yes – in terms of time to market. An example of the impact this makes is our reaction to the 2014 earthquake in Yunnan province. Everyone in China wanted to have access to the latest updates and to be able to check in on friends and family in the region. The business felt the best way to do this was to build an app that verified and then consolidated newsfeeds from multiple sources.

We designed the app in the morning after the earthquake, coded it in the afternoon and launched it in the evening. One business day from concept to production. Only MongoDB could support that velocity of development.

Are you looking forward to MongoDB 3.0?
We started testing MongoDB 3.0 and filing bugs as soon as we could get our hands on the first Release Candidate.

We are especially excited about document level concurrency control. This will further improve write scaling and fully saturate the latest generation of dense multi-core systems we are using now. Compression is also a huge benefit for us. We have standardized on SSDs, so compression means we can pack more onto each drive, which will bring costs down. It will also give us another performance boost as fewer bits are read from disk, making better use of disk I/O cycles.

What advice would you give to those considering using MongoDB for their next project?
MongoDB’s document data model and dynamic schema bring great flexibility and power. But they also bring great responsibility! I’d recommend not storing multitudes of different document types and formats within a single collection as it makes ongoing application maintenance complex. Split out documents of different types and structures into their own collections. We have implemented tools that scan and sample documents from each collection. If variances in structure exceed our best practices, we alert the devs so they can go and address the issue. So that is where I’d start.

Mr. Yang – I’d like to thank you for taking the time to share your insights with the MongoDB community.

Struggling to scale your relational database? Download our Migration White Paper:
Migration White Paper

About the Author - Mat Keep

Mat is part of the MongoDB product marketing team, responsible for building the vision, positioning and content for MongoDB’s products and services, including the analysis of market trends and customer requirements. Prior to MongoDB, Mat was director of product management at Oracle Corp. with responsibility for the MySQL database in web, telecoms, cloud and big data workloads. This followed a series of sales, business development and analyst / programmer positions with both technology vendors and end-user companies.