MongoDB and Apache Spark at China Eastern Airlines: Delivering 100x Performance Improvements
New MongoDB Connector for Apache Spark Enables New Fare Calculation Engine, Supporting 180m Fares and 1.6 Billion Queries per Day, Migrated off Oracle

As one of the world’s largest airlines, China Eastern constantly explores emerging technologies to identify new ways of improving customer experience and reducing cost. Earlier this year, I interviewed its engineering leaders to learn more about the migration of China Eastern’s flight search platform from Oracle to MongoDB. As a result of the migration, China Eastern achieved orders-of-magnitude performance improvements and was able to deliver application features that had never before been possible.

The next stage of China Eastern’s initiative to transform user experience has been to address airline fare calculations, and to serve them reliably, at scale, to the migrated search application. At this year’s MongoDB World user conference, Chong Huang, Lead Architect at China Eastern Airlines, presented details on their new fare calculation engine, built with Apache Spark and MongoDB to support more than 1.6 billion queries per day.

The Challenge
Based on averages used across the airline industry, 12,000 searches are needed to generate a single ticket sale. China Eastern currently sells 260,000 airline seats every day, and is targeting 50% of those to come from its online channels. Selling 130,000 seats online equates to roughly 1.6 billion searches and fare requests per day, or just under 20,000 searches a second. However, the current fare calculation system, built on the Oracle database, supports only 200 searches per second. China Eastern’s engineers realized they needed to radically re-architect their fare engine to meet the required 100x growth in search traffic.
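The traffic figures in The Challenge can be reproduced with a few lines of arithmetic. The sketch below uses only the numbers quoted in the post; the variable names are my own, and the results show that the quoted "1.6 billion" and "100x" are rounded values.

```python
# Back-of-the-envelope check of the search volumes quoted in the post.
SEARCHES_PER_TICKET = 12_000     # industry average cited in the post
SEATS_SOLD_PER_DAY = 260_000     # China Eastern's daily seat sales
ONLINE_SHARE = 0.5               # target share of sales from online channels
CURRENT_SEARCHES_PER_SEC = 200   # capacity of the Oracle-based fare system
SECONDS_PER_DAY = 24 * 60 * 60

online_seats = int(SEATS_SOLD_PER_DAY * ONLINE_SHARE)         # 130,000
searches_per_day = online_seats * SEARCHES_PER_TICKET         # 1,560,000,000
searches_per_sec = searches_per_day / SECONDS_PER_DAY         # about 18,056
growth_factor = searches_per_sec / CURRENT_SEARCHES_PER_SEC   # about 90

print(f"online seats per day : {online_seats:,}")
print(f"searches per day     : {searches_per_day:,}")
print(f"searches per second  : {searches_per_sec:,.0f}")
print(f"required growth      : {growth_factor:.0f}x")
```

Rounding the per-second figure up to 20,000 gives the 100x target over the existing 200 searches per second.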
The Solution
Rather than calculate fares for each search in real time, China Eastern decided to take a different approach. In the new system, they pre-compute fares every night, and then load those results into the database for access by the search application.

The flight inventory data is loaded into an Apache Spark cluster, which calculates fares by applying business rules stored in MongoDB. These rules compute fares across multiple permutations, including cabin class, route, direct or connecting flight, date, passenger status, and more. In total, 180 million prices are calculated every night! The results are then loaded into MongoDB, where they are accessed by the search application.

Why Apache Spark?
The fare engine demands complex computations against large volumes of flight inventory data. Apache Spark provides an in-memory data processing engine that can be distributed and parallelized across multiple nodes to accelerate processing. Testing confirmed linear scalability as CPU cores were added to the cluster. Because China Eastern is a Java shop, Apache Spark’s Java API also keeps development simple for the engineering team.

Why MongoDB?
China Eastern had already implemented a successful project with MongoDB. They knew the database would scale to meet the throughput and latency needs of the fare engine, and they could rely on expert support from MongoDB engineers. The database’s flexible data model would allow multiple fares to be stored for each route in a single document, accessed in one round trip to the database, rather than having to JOIN many tables to serve fares from their Oracle database.

Figure 1: Fare Engine Architecture

Why the MongoDB Connector for Apache Spark?
Just as the project was starting, MongoDB announced the new Databricks-certified MongoDB Connector for Apache Spark.
Using the Connector, Spark can directly read data from the MongoDB collection and turn it into a Spark Resilient Distributed Dataset (RDD), against which transformations and actions can be performed. Internally, the Connector uses the splitVector command to create chunk splits, and each chunk can then be assigned to one Spark worker for processing. Data locality awareness in the Connector ensures RDDs are co-located with the associated MongoDB shard, thereby minimizing data movement across the network and reducing latency. Once the transformations and actions have been completed on the RDD, the results can be written back to MongoDB.

The Implementation
Based on performance benchmarking, a cluster of fewer than 20 Red Hat Linux servers (comprising app servers and the Spark and MongoDB clusters) is required to meet the demands of 180 million fares and 1.6 billion daily searches. China Eastern is using Apache Spark 1.6 against the latest MongoDB 3.2 release, provisioned with Ops Manager for operational automation. Testing has shown each node in the cluster scaling linearly, and delivering 15x higher performance at 10x lower latency than the previous Oracle-based system.

Next Steps
In his MongoDB World presentation, Mr. Huang provides more insight into the platform built with Apache Spark and MongoDB. He shares detailed steps and code samples that show how to download and set up the Spark cluster, how to configure the MongoDB Connector for Apache Spark, the process for submitting a job, and lessons learned along the way to optimize performance. View the slides from MongoDB World.

Download our new whitepaper for examples and guidance on turning analytics into real-time action with Apache Spark and MongoDB.

About the Author - Mat Keep
Mat is a director within the MongoDB product marketing team, responsible for building the vision, positioning and content for MongoDB’s products and services, including the analysis of market trends and customer requirements.
Prior to MongoDB, Mat was director of product management at Oracle Corp. with responsibility for the MySQL database in web, telecoms, cloud and big data workloads. This followed a series of sales, business development and analyst / programmer positions with both technology vendors and end-user companies.
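As an illustration of the approach described in the China Eastern post, the sketch below pre-computes fares across a few permutations and groups them into one document per route, so that a search needs only a single lookup. It is a minimal plain-Python sketch under stated assumptions, not China Eastern's implementation: the real engine runs on Apache Spark and stores its rules and results in MongoDB, and every route price, multiplier, field name, and function name here is hypothetical.

```python
from itertools import product

# Hypothetical inputs: base price per route and simple multiplier "rules".
BASE_PRICES = {("PVG", "PEK"): 1000.0, ("PVG", "CAN"): 1200.0}
CABIN_RULES = {"economy": 1.0, "business": 2.5}
PASSENGER_RULES = {"adult": 1.0, "child": 0.5}


def precompute_fares(base_prices, cabin_rules, passenger_rules):
    """Apply every rule combination to every route.

    Returns one dict per route, holding all of that route's fares in a
    single embedded list (the "one document per route" model).
    """
    docs = {}
    for (route, base), (cabin, cabin_mult), (ptype, ptype_mult) in product(
        base_prices.items(), cabin_rules.items(), passenger_rules.items()
    ):
        if route not in docs:
            origin, destination = route
            docs[route] = {"_id": f"{origin}-{destination}", "fares": []}
        docs[route]["fares"].append(
            {
                "cabin": cabin,
                "passenger": ptype,
                "price": round(base * cabin_mult * ptype_mult, 2),
            }
        )
    return list(docs.values())


def find_fare(route_doc, cabin, passenger):
    """Look up one pre-computed fare inside an already fetched route document."""
    for fare in route_doc["fares"]:
        if fare["cabin"] == cabin and fare["passenger"] == passenger:
            return fare["price"]
    return None


if __name__ == "__main__":
    route_docs = precompute_fares(BASE_PRICES, CABIN_RULES, PASSENGER_RULES)
    shanghai_beijing = next(d for d in route_docs if d["_id"] == "PVG-PEK")
    print(find_fare(shanghai_beijing, "business", "child"))
```

Because all fares for a route live in one document, serving a search is a single fetch followed by an in-memory filter, which is the property the post contrasts with joining many relational tables.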
Leaf in the Wild: Qihoo Scales with MongoDB
Leaf in the Wild posts highlight real world MongoDB deployments. Read other stories about how companies are using MongoDB for their mission-critical projects.

100+ apps, 1,500+ Instances, 20B Queries per Day

Qihoo is China’s number 1 Android mobile distribution platform. Qihoo is also China’s top malware protection company, providing products for both web and mobile platforms. A MongoDB user since 2011, Qihoo has built over 100 different applications on MongoDB, including new services and migrations from MySQL and Redis, running on 1,500+ instances and supporting 20 billion queries per day.

I had the chance to sit down with Yang Yan Jie, the Senior DBA at Qihoo, to learn more about how and why they use MongoDB, his scaling best practices, and recommendations for those getting started with the database.

Can you start by telling us about Qihoo?
Qihoo 360 Technology Co. Ltd. is a leading Chinese Internet company. At the end of June 2014, we had around 500 million monthly active PC Internet users and over 640 million mobile users.

Recognizing malware protection as a fundamental need of all Internet and mobile users, we built our large user base by offering comprehensive, effective and user-friendly Internet and mobile security products and services to protect users' computers and mobile devices against malware and malicious websites. Our products and services are supported by our cloud-based security technology, which we believe is one of the most advanced and robust technologies in the malware protection industry. We monetize our user base primarily through online advertising and Internet value-added services.

In terms of our market position, we are:
- A top three Internet company as measured by user base in China
- No. 1 Android-based mobile distribution platform in China
- No. 1 provider of Internet and mobile malware protection products and services in China
- No. 2 PC search engine in China

When did Qihoo start using MongoDB?
We were a very early adopter of MongoDB, building our first applications on the database back in 2011. I think we were using version 1.8 then!

How is Qihoo using MongoDB today?
MongoDB has become our standard modern database platform. We now have over 100 applications powered by MongoDB, both external customer-facing services and internal business applications.

In total we have more than 1,500 MongoDB instances running on our in-house built “HULK” cloud platform, collectively serving 20 billion queries per day.

Three particularly critical applications for our business are:
- Location-based mobile search application. We use MongoDB with its geospatial indexes and queries to deliver geo-aware search results to mobile users. The user can be searching for anything, from a local restaurant, to a shop, to a car dealership. The app will detect their location and serve search results based on proximity. MongoDB handles 1.2 billion queries per day from this application.
- Caching layer for user authentication data. Qihoo is a central portal for many Chinese Internet users. We have many partners that our users can connect to directly after logging into our site. We provide Single Sign On (SSO) to multiple services so users don’t need to keep providing their security credentials as they navigate around the web. The user’s SSO session is cached in MongoDB for ultra-fast access. MongoDB supports millions of concurrent users, handling 30,000 operations per second and 1.8 billion queries daily.
- Log analytics platform. We need to know our infrastructure is running well. Our internal business users also want to measure user engagement with new promotions and campaigns. To accomplish this, we collect log data from all of our Linux, Apache web server and Tomcat servers, and stream it directly into MongoDB. From there, our internal business users can generate real time analytics and reports using our PHP-based Business Intelligence (BI) platform.
MongoDB stores 2.5 billion documents at any one time across 18 shards configured with 3-node replica sets for always-on availability. MongoDB serves nearly 3 billion queries per day, including 1 billion writes.

What other databases do you use?
MongoDB is one of the three database technologies used in our company. It isn’t necessarily suitable for all applications, so we also use MySQL for relational data problems and Redis for certain caching use-cases. Over time, we have migrated more than a dozen projects from MySQL and Redis to MongoDB.

What factors drove this migration?
Our goal is to use the best technology where it best fits. In the case of MySQL, migration was driven by scalability and developer productivity. As a relational database, MySQL does not scale out, so as our user base grew above 100 million active users, we hit the limits of how far we could push MySQL. MongoDB auto-sharding allows us to scale on-demand using commodity hardware.

The MongoDB data model is also far more flexible. Our developers can get more done and iterate faster with MongoDB than they can with the relational model.

In the case of Redis, the migrations were driven by cost and flexibility. We found that MongoDB meets our low latency caching requirements for many applications, while its on-disk persistence reduces the need to provision costly systems configured with high-memory footprints. In addition, there is much more you can do with MongoDB’s document data model than you can with Redis’ key-value model. This translates directly to richer application functionality. For applications where data volumes are expected to grow rapidly, we choose MongoDB over Redis.

Tell us about the platforms you are running MongoDB on.
Most of our applications are PHP based. We run CentOS on x86 hardware. We have standardized on local SSD storage as this gives us the best performance. We are running MongoDB 2.4 and the latest 2.6 releases. We are also looking forward to MongoDB 3.0!
How is MongoDB configured?
We run both single replica sets and sharded clusters, depending on the application. We have data centers across the country, with the main ones located in Beijing. We deploy MongoDB on our private cloud across multiple data centers, both for disaster recovery and for low latency local reads and writes.

We don’t control our own fiber, so network quality is out of our control. For the most critical apps, we spin up identical MongoDB clusters in multiple data centers and use our own message queue to replicate between them. This gives us assurance of maintaining availability in the face of network partitions.

How do you manage your MongoDB deployment?
We have developed a centralized orchestration web platform, which we call the HULK cloud platform. It is used by nearly all of our technical engineers to control our mission critical infrastructure and services. It is a complex piece of engineering which we are very proud of. When we originally started the cloud platform project, we hoped it would allow our engineers to stand on the shoulders of giants, relying on the platform to speed up the time to market for their applications. Hence we named it “HULK”.

HULK currently provides elastic services such as web, relational database, NoSQL and distributed storage. At the same time, the open platform concept attracted various internal teams to move their applications onto the platform. The re-platforming of these applications provided immediate access to other lines of business (LoBs) internally, and in the process we helped the business groups to attain higher efficiency and greater technology expertise.

MongoDB is one of the most critical services on HULK and it is fully integrated into the platform with a high degree of automation, allowing us to operate more than 1,500 MongoDB instances with just one and a half DBAs. The DBAs can perform “one click deployment” and “one click upgrade” tasks via the HULK management interface.
All backup and monitoring is fully automated. For instance, if you add a new MongoDB node or cluster, HULK automatically configures the monitoring and backup strategy, and deploys the necessary agents. Developers can monitor a multitude of MongoDB metrics and statuses. In addition, they can open a ticket right on the management portal itself with a few mouse clicks, instead of using email or IM.

How do you back up MongoDB?
We use a combination of approaches, governed by the application’s RPO and RTO objectives:
- Filesystem backups. This is the default approach. We shut down a secondary replica set member and snapshot the filesystem image.
- Incremental replication. For continuous backup, we have built a tool that tails the MongoDB oplog. We use this approach for more critical apps where we need faster restoration of service.
- Delayed replicas. We use this approach for additional assurance, again governed by how quickly we need to bring the data back.

Can you share any best practices on scaling your MongoDB infrastructure?
There are three tips I would like to share:
- From a DBA perspective, invest time to understand application usage. The developers will give their guidance, but we generally take any number they give us and add 50%!
- If you encounter performance issues, start with your hardware. We found upgrading from hard disks to SSDs gave us an instant performance boost without any other optimizations.
- For highly dynamic, write-intensive workloads, make sure you monitor storage fragmentation and compact regularly if needed.

Are you measuring the impact of MongoDB on your business?
Yes, in terms of time to market. An example of the impact this makes is our reaction to the 2014 earthquake in Yunnan province. Everyone in China wanted to have access to the latest updates and to be able to check in on friends and family in the region.
The business felt the best way to do this was to build an app that verified and then consolidated newsfeeds from multiple sources.

We designed the app in the morning after the earthquake, coded it in the afternoon and launched it in the evening. One business day from concept to production. Only MongoDB could support that velocity of development.

Are you looking forward to MongoDB 3.0?
We started testing MongoDB 3.0 and filing bugs as soon as we could get our hands on the first Release Candidate.

We are especially excited about document level concurrency control. This will further improve write scaling and fully saturate the latest generation of dense multi-core systems we are using now. Compression is also a huge benefit for us. We have standardized on SSDs, so compression means we can pack more onto each drive, which will bring costs down. It will also give us another performance boost as fewer bits are read from disk, making better use of disk I/O cycles.

What advice would you give to those considering using MongoDB for their next project?
MongoDB’s document data model and dynamic schema bring great flexibility and power. But they also bring great responsibility! I’d recommend not storing multitudes of different document types and formats within a single collection, as it makes ongoing application maintenance complex. Split out documents of different types and structures into their own collections. We have implemented tools that scan and sample documents from each collection. If variances in structure exceed our best practices, we alert the devs so they can go and address the issue. So that is where I’d start.

Mr. Yang, I’d like to thank you for taking the time to share your insights with the MongoDB community.

Struggling to scale your relational database?
Download our Migration White Paper.

About the Author - Mat Keep
Mat is part of the MongoDB product marketing team, responsible for building the vision, positioning and content for MongoDB’s products and services, including the analysis of market trends and customer requirements. Prior to MongoDB, Mat was director of product management at Oracle Corp. with responsibility for the MySQL database in web, telecoms, cloud and big data workloads. This followed a series of sales, business development and analyst / programmer positions with both technology vendors and end-user companies.
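Mr. Yang's closing advice in the Qihoo interview, sampling documents from each collection and alerting when their structures vary too much, can be sketched as follows. This is a minimal illustration under my own assumptions, not Qihoo's tool: it works on plain Python dicts rather than a live MongoDB collection, compares only top-level field names and value types, and the function names, `sample_size`, and `max_shapes` threshold are all hypothetical.

```python
import random
from collections import Counter


def shape_of(doc):
    """Structural signature of a document: its top-level fields and value types."""
    return tuple(sorted((key, type(value).__name__) for key, value in doc.items()))


def check_collection(docs, sample_size=100, max_shapes=2, seed=0):
    """Sample documents and report whether too many distinct shapes appear.

    `max_shapes` stands in for a team's own best-practice threshold.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    sample = docs if len(docs) <= sample_size else rng.sample(docs, sample_size)
    shapes = Counter(shape_of(doc) for doc in sample)
    return {
        "sampled": len(sample),
        "distinct_shapes": len(shapes),
        "alert": len(shapes) > max_shapes,
    }


if __name__ == "__main__":
    consistent = [{"user": f"u{i}", "score": i} for i in range(50)]
    mixed = consistent + [
        {"user": "x", "score": "high"},          # same fields, different type
        {"order_id": 1, "items": ["a", "b"]},    # a different document type
        {"path": "/var/log", "size": 10.5},      # yet another document type
    ]
    print(check_collection(consistent))
    print(check_collection(mixed))
```

A real tool would pull its sample from the database and would probably inspect nested fields as well; the point here is only the shape-counting idea.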
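Leaf in the Wild: MongoDB Helps Qihoo Scale
100+ Apps, 1,500+ Instances, 20 Billion Queries per Day

Qihoo is China’s leading Android mobile distribution platform. Qihoo is also China’s top malware protection company, providing products for both web and mobile platforms. A MongoDB user since 2011, Qihoo has built more than 100 different applications on MongoDB (including new services and services migrated from MySQL and Redis), running on more than 1,500 instances and supporting 20 billion queries per day.

I had the chance to talk with Yang Yan Jie, Senior DBA at Qihoo, to learn in detail how and why they use MongoDB, their scaling best practices, and their advice for those just getting started with MongoDB.

Can you start by telling us about Qihoo?
Qihoo 360 Technology Co. Ltd. is a leading Chinese Internet company. At the end of June 2014, we had around 500 million monthly active PC Internet users and over 640 million mobile users.

Recognizing malware protection as a fundamental need of all Internet and mobile users, we built a large user base by offering comprehensive, effective and user-friendly Internet and mobile security products and services that protect users’ computers and mobile devices against malware and malicious websites. Our products and services are supported by cloud-based security technology, which we believe is one of the most advanced and reliable technologies in the malware protection industry. We monetize our user base through online advertising and Internet value-added services.

In terms of market position, we are:
- A top three Internet company in China (by user base)
- The largest Android-based mobile distribution platform in China
- The largest provider of Internet and mobile malware protection products and services in China
- The No. 2 PC search engine in China

When did Qihoo start using MongoDB?
We started using MongoDB very early, building our first application on MongoDB back in 2011. I think we were using version 1.8 at the time.

How is Qihoo using MongoDB today?
MongoDB has become our standard modern database platform. We now have more than 100 applications powered by MongoDB, including external customer-facing services and internal business applications.

In total, more than 1,500 MongoDB instances run on our in-house built “HULK” cloud platform, together serving 20 billion queries per day.

Three particularly critical applications for our business are:
- Location-based mobile search application. We use MongoDB’s geospatial indexes and queries to deliver geo-aware search results to mobile users. Users may search for anything, from a local restaurant or shop to a car dealership. The app detects the user’s location and serves search results based on proximity. MongoDB handles 1.2 billion queries per day for this application.
- Caching layer for user authentication data. Qihoo is a central portal for many Chinese Internet users. After logging into our site, users can connect directly to many partner sites. We provide Single Sign On (SSO) across multiple services, so users don’t need to keep providing their security credentials as they browse. For faster access, the user’s SSO session is cached in MongoDB. MongoDB supports millions of concurrent users, handling 30,000 operations per second and 1.8 billion queries per day.
- Log analytics platform. We need to know that our infrastructure is running well. Our internal business users also want to measure user engagement with new promotions and campaigns. To accomplish this, we collect log data from all of our Linux, Apache web server and Tomcat servers, and stream it directly into MongoDB. From that data, our internal business users can generate real-time analytics and reports using our PHP-based Business Intelligence (BI) platform. At any one time, MongoDB stores 2.5 billion documents across 18 shards, configured with three-node replica sets for always-on availability. MongoDB serves nearly 3 billion queries per day, including 1 billion writes.

What other databases do you use?
MongoDB is one of three database technologies used in our company. It isn’t necessarily suitable for all applications, so we also use MySQL for relational data problems and Redis for certain caching use cases. Over time, we have migrated more than a dozen projects from MySQL and Redis to MongoDB.

What factors drove these migrations?
Our goal is to use the best technology where it fits best. In the case of MySQL, the migrations were driven by scalability and developer productivity. As a relational database, MySQL does not scale out, so when our user base grew to 100 million active users, we hit the limit of what MySQL could carry. MongoDB auto-sharding lets us scale on demand using commodity hardware.

The MongoDB data model is also more flexible. Compared with the relational model, MongoDB lets our developers get more done and iterate faster.

In the case of Redis, the migrations were driven by cost and flexibility. We found that MongoDB meets the low latency caching requirements of many applications, while its on-disk persistence removes the need to deploy costly systems with high-memory footprints. In addition, we can do much more with MongoDB’s document data model than with Redis’ key-value model. This translates directly into richer application functionality. For applications where data volumes will grow rapidly, we choose MongoDB over Redis.

Tell us about the platforms you are running MongoDB on.
Most of our applications are PHP based. We run CentOS on x86 hardware. We have standardized on local SSD storage because it gives us the best performance. We are running MongoDB 2.4 and the latest 2.6 releases. We are also very much looking forward to MongoDB 3.0.

How is MongoDB configured?
We run both single replica sets and sharded clusters, depending on the application. Our data centers are spread across the country, with the main ones in Beijing. We deploy MongoDB on our private cloud across multiple data centers, both for disaster recovery and for low latency local reads and writes.

We don’t control our own fiber, so network quality is out of our control. For the most critical apps, we deploy identical clusters in multiple data centers and use our own message queue to replicate between them, which ensures we maintain availability in the face of network partitions.

How do you manage your MongoDB deployment?
We have developed a centralized orchestration web platform called the HULK cloud platform. Nearly all of our technical engineers use it to control mission-critical infrastructure and services. It is a complex piece of engineering, and we are very proud of it. When we first started the cloud platform project, we hoped it would let our engineers stand on the shoulders of giants, using the platform to bring applications to market faster. Hence we named it “HULK” (after the American comic-book superhero).

HULK currently provides elastic services such as web, relational database, NoSQL and distributed storage. At the same time, the open platform concept attracted internal teams to move their applications onto the platform. Re-platforming these applications gave teams direct internal access to other lines of business (LoBs), and in the process we helped the business groups gain considerably in efficiency and technical expertise.

MongoDB is one of the most important services on HULK, and it is fully integrated into the highly automated platform, allowing us to operate more than 1,500 MongoDB instances with just 1.5 DBAs. The DBAs can perform “one click deployment” and “one click upgrade” tasks through the HULK management interface. All backup and monitoring is fully automated. For example, when you add a new MongoDB node or cluster, HULK automatically configures the monitoring and backup strategy and deploys the required agents. Developers can monitor a large number of MongoDB metrics and statuses. In addition, they no longer need to use email or IM; they can open a ticket on the management portal with a few mouse clicks.

How do you back up MongoDB?
We use a combination of approaches, governed by the application’s RPO and RTO objectives:
- Filesystem backups. This is the default approach. We shut down a secondary replica set member and snapshot the filesystem image.
- Incremental replication. For continuous backup, we built a tool that tails the MongoDB oplog. We use this approach for more critical apps that need faster restoration of service.
- Delayed replicas. We use this approach for additional assurance, again governed by how quickly we need to recover the data.

Can you share any best practices on scaling your MongoDB infrastructure?
There are three tips I would like to share:
- From a DBA perspective, invest time in understanding application usage. The developers will give guidance, but we generally take any number they give us and add 50%.
- If you encounter performance issues, start with the hardware. We found that upgrading from hard disks to SSDs boosted performance quickly, without any other optimization.
- For highly dynamic, write-intensive workloads, make sure you monitor storage fragmentation regularly and compact as needed.

Are you measuring the impact of MongoDB on your business?
Yes, in terms of time to market. One example of the impact MongoDB has had is the speed of our response to the 2014 earthquake in Yunnan. Everyone in the country wanted the latest news, and many wanted to reach friends and family in the area as quickly as possible. The company felt the best way to meet this need was to build an app that consolidated feeds from multiple news sources.

So we designed the app on the morning of the earthquake, wrote the code in the afternoon, and launched it in the evening. It took one business day to go from concept to production. Only MongoDB could support that speed of development.

Are you looking forward to MongoDB 3.0?
We started testing MongoDB 3.0 and reporting bugs as soon as we got the first Release Candidate.

We are especially looking forward to document-level concurrency control. This will further improve write scaling and fully saturate the latest generation of dense multi-core systems we are using now. Compression will also bring us a huge benefit. We have standardized on SSDs, so compression means we can store more data on each drive, which lowers costs. It will also improve performance, because fewer bits are read from disk, making better use of disk I/O cycles.

What advice would you give to those considering MongoDB for their next project?
MongoDB’s document data model and dynamic schema bring great flexibility and power. But they also bring greater responsibility. I recommend not storing many different document types and formats in a single collection, because this complicates ongoing application maintenance. Split documents of different types out into their own collections. We have implemented tools that scan and sample the documents in each collection. If the variance in structure goes beyond our best practices, we alert the developers so they can take action to resolve the issue. That is where I would start.

Mr. Yang, thank you for taking the time to share your insights with the MongoDB community.

Struggling to scale your relational database? Download our Migration White Paper.

About the Author - Mat Keep
Mat is part of the MongoDB product marketing team, responsible for building the vision, positioning and content for MongoDB’s products and services, including the analysis of market trends and customer requirements. Prior to MongoDB, Mat was director of product management at Oracle Corp., with responsibility for the MySQL database in web, telecoms, cloud and big data workloads. His earlier roles include a series of sales, business development and analyst/programmer positions with both technology vendors and end-user companies.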
How Hudl Uses MongoDB To Scale Its Video Analysis Platform
Hudl’s video analysis platform helps coaches win by delivering secure access to video analysis tools from any computer or mobile device. If you follow Division 1 college sports in the United States, you’ll be interested to know that the Hudl platform stores video for 99% of DI schools’ top recruits.

As Hudl has grown, it has outgrown some of its infrastructure. For example, when Hudl hit a limit on EC2, where its SQL database would no longer scale on a single instance, the growing company needed to recruit a new database. After evaluating different options, Hudl chose MongoDB.

According to Hudl CTO Brian Kaiser: “MongoDB changed devops from a necessary evil to something that is transforming the company, helping us move quickly. It makes innovation easy and is universally recognized at our company because it’s been so impactful on our growth.”

The numbers speak for themselves. Today, MongoDB stores 650 million plays (the atomic unit of a game, such as a point in volleyball or a play in football) and their associated metadata, and those plays receive 1 billion video views per month. The MongoDB-based platform streamed 18 petabytes of data in 2014. During peak football season, the platform ingests 25 hours of raw video per minute; that’s one quarter of what YouTube ingests. Big numbers for a small company!

With MongoDB, Hudl has achieved steady, consistent growth and has been able to double in size. “MongoDB really facilitates rapid iterations, so the dev team can try things out and make mistakes – it’s magical for that,” said Kaiser. “MongoDB has led ops to promote squad growth and really empowered our company, and that’s something we’re proud of.”

To see all MongoDB World presentations, visit the [MongoDB World Presentations](https://www.mongodb.com/mongodb-world/presentations) page.
You Know What's Cool? 1 Trillion Is Cool
A million used to be cool. Then Facebook upped the ante to one billion. But in our world of Big Data, even a billion is no longer the upper end of scale, or cool. As I learned last night, at least one MongoDB customer now stores over 1 trillion documents in MongoDB.

1 trillion. That's cool.

It's also far bigger than any other database deployment I've seen from any NoSQL or relational database, even the simple key-value or columnar data stores that are designed to handle only simple workloads, but to scale them well. That's what makes MongoDB über cool: not only does it offer dramatic, superior scale, but it does so while also giving organizations the ability to build complex applications. MongoDB delivers the optimal balance between functionality and performance.

Many systems are focused on nothing more than storing your data and letting you access it quickly, but in one and only one way. This simply isn’t enough. A truly modern database must support rich queries, indexing, analysis, aggregation, geospatial access and search across multi-structured, rapidly changing data sets in real time. The database must not trap your data and hinder its use. It must unleash your data. All 1 trillion documents of it.

Want to see how major Global 2000 organizations like Bosch, U.S. Department of Veterans Affairs, Genentech, Facebook and many others scale with MongoDB? Easy. Just register to attend MongoDB World, June 24-25 in New York City. You can use my discount code to get 25% off: 25MattAsay.
MongoDB at SCALE 10x
The MongoDB team is thrilled to be part of SCALE's 10-year anniversary conference in Los Angeles this year. The Southern California Linux Expo has grown into a three-day mega-conference, attracting thousands of open source enthusiasts from around the world.

Meghan Gill, who supports community development and outreach for MongoDB, will present The Care and Feeding of an Open Source Community, discussing the developer-based outreach strategy that helped MongoDB become a well-known project in the open source world. First made available in 2009, the document-oriented database software is now downloaded more than 100,000 times each month. You can meet Meghan and other members of the 10gen team at the MongoDB booth during SCALE.

Because the MongoDB community has grown tremendously in Los Angeles, we're also hosting a full-day MongoDB conference on January 19, timed so that SCALE attendees from out of town can stop in to learn more about the document-oriented open source database. MongoDB Los Angeles will be the first full-day MongoDB conference in Southern California. Presentations include:
- Building Your First MongoDB Application (Antoine Girbal, 10gen)
- Using Spring and MongoDB with CloudFoundry (Josh Long, VMware)
- Indexing and Query Optimizer (Kevin Hanson, 10gen)
- N2M: node.js and MongoDB as the modern stack for the real-time web (Jason Hoffman, Joyent)

The LA MongoDB User Group meetup will be held on the evening of January 18th, featuring a presentation from 10gen. We're excited that the event will be co-located with SCALE, one of the premier events in the open source world.

Don’t forget to register for MongoDB Los Angeles!