Sogou is one of China’s top search engines and one of the world’s 100 most heavily trafficked websites. MongoDB helps keep Sogou fast by handling its log and report data.
Macro Huang, a Sogou engineer with years of experience deploying MongoDB both at Sogou and at his former employer, filled us in on the details.
Sogou uses MongoDB primarily for storing log and report data. The first MongoDB application stores advertising customers’ report data, including page views, cost, clicks, click-through rates, etc. Sogou runs 3 separate clusters to store 3 different types of reports, with each cluster containing a 3-node replica set, together holding over 1 billion documents.
For query purposes, Sogou built a customer index, querying up to 6 of the 17 fields included in each document using the Java driver via a custom ODM. Sogou separates its data into several MongoDB instances based on time range, with one year per database. Because Sogou’s data always has “hot” parts (the most recent data being the hottest), the company moves the oldest databases to older machines as necessary.
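The one-year-per-database layout can be sketched as a small routing helper. This is only an illustration in Python, not Sogou's code (which uses Java and a custom ODM); the `reports_` prefix and both function names are hypothetical.

```python
from datetime import date
from typing import List


def database_for(day: date, prefix: str = "reports") -> str:
    """Return the name of the per-year database that would hold `day`.

    The naming scheme "<prefix>_<year>" is an assumption for illustration.
    """
    return f"{prefix}_{day.year}"


def databases_for_range(start: date, end: date, prefix: str = "reports") -> List[str]:
    """List every per-year database a query over [start, end] must touch."""
    if start > end:
        raise ValueError("start must not be after end")
    return [f"{prefix}_{year}" for year in range(start.year, end.year + 1)]
```

Under this scheme a query confined to the current year touches a single database, and a whole year's database can be relocated to older hardware without rewriting any documents.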
Sogou’s second application is a logging system, which the search company uses for storing key operation logs. This application involves a 2-node replica set, with roughly 2 billion documents total. Sogou keeps a maximum of 120 days’ worth of data in its cluster, dumping older data in BSON format and holding it in a Hadoop cluster. This cluster has only one customer index, and up to 10 query fields. Each document has 15 fields, excluding the _id. Queries on the application can span a maximum of 92 days’ worth of data, or about 1.5 billion documents, with an average response time of under 1 second, ranging from 500 milliseconds to 1 second depending on data size.
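The 120-day retention limit and 92-day query window can be expressed as a few date calculations. The sketch below is hypothetical: the function names are invented, and it assumes log volume is spread evenly across the retention period, which the article does not state.

```python
from datetime import datetime, timedelta

RETENTION_DAYS = 120  # days of logs kept in the cluster (from the article)
MAX_QUERY_DAYS = 92   # widest span a single query may cover (from the article)


def archive_cutoff(now: datetime) -> datetime:
    """Documents older than this timestamp are candidates for archiving."""
    return now - timedelta(days=RETENTION_DAYS)


def validate_query_window(start: datetime, end: datetime) -> None:
    """Reject query ranges that are inverted or wider than the allowed window."""
    if end < start:
        raise ValueError("end must not be before start")
    if end - start > timedelta(days=MAX_QUERY_DAYS):
        raise ValueError(f"query window exceeds {MAX_QUERY_DAYS} days")


def estimated_docs_in_window(total_docs: int,
                             window_days: int = MAX_QUERY_DAYS,
                             retention_days: int = RETENTION_DAYS) -> int:
    """Estimate documents in a window, assuming evenly distributed volume."""
    return total_docs * window_days // retention_days
```

With about 2 billion documents retained, `estimated_docs_in_window(2_000_000_000)` gives a little over 1.5 billion, which lines up with the 1.5 billion figure quoted for a 92-day query.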
Sogou was looking for something much faster than a relational database that could also be scaled quickly. Sogou looked at a range of NoSQL alternatives, but determined that MongoDB’s data model and index capacity made it a good fit. It also helped that MongoDB was so easy to learn. While Huang had previous experience with MongoDB, most of his team did not, and came from an RDBMS background. As Huang tells his colleagues, new MongoDB users can be up and running in 10 minutes, and doing real work in 30 minutes. That’s simply not possible with traditional RDBMS options.
The experience so far? Very positive. Performance has been fast, which was Sogou’s primary consideration, and the learning curve for new users is exceptionally short. The search company plans to expand the size of these existing deployments and broaden its use of MongoDB to other applications.