Playback: 20 Billion Documents in the Cloud with HomeAway (Expedia) - MongoDB World 2018
Playback is where the MongoDB blog brings you selected talks from around the world and around the industry. Here, we are showcasing great talks from MongoDB World 2018 by people running MongoDB in production and at scale.
Behind the scenes of its holiday rental business, HomeAway analyzes its interactions with users like you, linking together the many interactions you might have over several weeks while booking a rental. Pulling those sessions together, when they usually span multiple devices, is a real challenge. HomeAway chose MongoDB as the engine to persist the linked data. In their MongoDB World 2018 talk, HomeAway Data Architect Singaram Rangunathan and Principal Software Engineer Naveen Malhotra talk about why they chose MongoDB.
HomeAway's solution is based around a KeyRing, which gathers identifiable user sessions together. When common keys are spotted between KeyRings, the KeyRings are chained together so that a complete picture of a user's overall experience is constructed. Incoming session documents are processed through a Kafka pipeline, which persists the links between KeyRings and KeyChains in MongoDB.
Rangunathan discusses how HomeAway achieved the scale it needed in the cloud to process the data. He also covers using MongoDB sharding for horizontal scaling, pre-splitting chunks on the shard key for best performance, and how they selected the appropriate cloud hardware to run the solution.
Learn more about data streaming with Kafka and MongoDB. See more talks like this, in person, at MongoDB World 2019.
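The talk summary does not spell out HomeAway's linking algorithm. As a rough illustration of the idea of chaining KeyRings that share a common key, here is a minimal sketch using a union-find (disjoint-set) structure, which is one standard way to compute such groupings. The class name `KeyChainer`, the key strings, and the representation of a KeyRing as a set of identifier strings are all hypothetical, and everything stays in memory rather than being persisted to MongoDB through Kafka.

```python
from __future__ import annotations

from collections import defaultdict


class KeyChainer:
    """Hypothetical sketch: chain KeyRings that share any key (union-find)."""

    def __init__(self) -> None:
        self.parent: dict[str, str] = {}  # KeyRing id -> parent KeyRing id
        self.owner: dict[str, str] = {}   # key -> first KeyRing id seen with it

    def _find(self, ring: str) -> str:
        # Walk to the root, halving the path as we go to keep trees shallow.
        while self.parent[ring] != ring:
            self.parent[ring] = self.parent[self.parent[ring]]
            ring = self.parent[ring]
        return ring

    def _union(self, a: str, b: str) -> None:
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a

    def add_ring(self, ring_id: str, keys: set[str]) -> None:
        """Register a KeyRing and chain it to every earlier ring sharing a key."""
        self.parent.setdefault(ring_id, ring_id)
        for key in keys:
            if key in self.owner:
                self._union(self.owner[key], ring_id)
            else:
                self.owner[key] = ring_id

    def chains(self) -> list[list[str]]:
        """Return each KeyChain (connected group of rings) as a sorted list."""
        groups: dict[str, list[str]] = defaultdict(list)
        for ring in self.parent:
            groups[self._find(ring)].append(ring)
        return sorted(sorted(group) for group in groups.values())


chainer = KeyChainer()
chainer.add_ring("ring-1", {"cookie:abc", "email:a@example.com"})
chainer.add_ring("ring-2", {"device:phone-9", "email:a@example.com"})
chainer.add_ring("ring-3", {"cookie:zzz"})
print(chainer.chains())  # [['ring-1', 'ring-2'], ['ring-3']]
```

Because links are transitive, a later ring that shares one key with `ring-2` and another with `ring-3` would merge all four rings into a single chain, which is the "complete picture" behavior the talk describes.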
How And Why Verizon Wireless Chose MongoDB
Even small organizations struggle with change. But imagine that you have 103 million retail customers, roughly 1,700 retail locations to serve them, and $81 billion in revenues at stake. Change necessarily comes hard to a company of that scale and reach. But change is precisely what Verizon Wireless increasingly enables using MongoDB.
The Times They Are a-Changing
In an organization the size of Verizon Wireless, business needs are constantly growing and changing, as Shivinder Singh, Senior Systems Architect at Verizon Wireless, told an audience at MongoDB World 2014. These forces push Verizon Wireless to explore new and innovative ways to process and manage its data as it seeks to deliver greater value to its customers. One of those "new and innovative ways" is MongoDB, which helps Verizon Wireless get greater value from its data while simultaneously accelerating time-to-market and improving its asset utilization. As the company looks to augment its existing technologies, however, there's always a fair amount of trepidation, not to mention the ever-looming question: why can't we just do this with the technologies we already own and/or know?
Data is changing. The world of relational databases doesn't always fit the new world of unstructured or semi-structured data. Setting up an environment on traditional technologies could tie up a dedicated resource for weeks; with MongoDB, the same work could be done fairly quickly. In one case, Verizon Wireless was "able to do that in two hours." Even so, Verizon Wireless discovered that one of the biggest challenges in moving to MongoDB was to "unlearn" RDBMS concepts and change the mindset to embrace MongoDB and NoSQL concepts. But we're getting ahead of ourselves here. How did Verizon Wireless start using MongoDB?
Getting Started With MongoDB
Verizon Wireless opted to start small with MongoDB, though it did try before it bought, one of the cardinal virtues of open source. (More on that below.)
The company decided to augment its employee portal, a business-critical application that is "basically the homepage of anyone who works for Verizon." The existing portal was good, but Verizon Wireless wanted to add new functionality that would capture social feeds from Twitter and Facebook and display them to each individual user. Not so easy for a relational database. The development team first put MongoDB through its paces with a proof of concept and then rolled it out. Nobody was dedicated to supporting it, however, so the development team asked Singh's team to take it on.
To bring himself up to speed with MongoDB, Singh took the route that over 200,000 other people have taken: MongoDB's free online training. As he describes it, within two days he was at a level where he could comfortably manage MongoDB. Within just two weeks he had re-architected Verizon Wireless' entire development setup to run as a replicated cluster rather than a standalone deployment. He then proceeded to test and break the cluster, recover it, test the recovery, test failover capabilities and more. But Singh wasn't done yet.
Putting The MongoDB Team To The Test
Going with a new technology can be risky, but choosing a new vendor to support it is perhaps even riskier. To minimize that risk, Singh decided to put MongoDB, the company, to the test. So Singh did what any other conscientious would-be buyer would do: he faked his death. Well, not his death, per se, but the death of his server (along with the secondary data center, just to make things doubly interesting). Of course, MongoDB would respond quickly to a marquee customer like Verizon Wireless, so he also faked his identity, using an @yahoo.com email address. In other words, MongoDB's support team got a call from some no-name person with a generic email address claiming "my-server-is-down-the-world-is-on-fire-someone-help-me-NOW!"
Within "a short period of time," MongoDB had assembled its engineers to resolve the issue and get Verizon Wireless back on track. Only then did the MongoDB team learn Singh's real identity, and it won the deal.
The Future Of MongoDB At Verizon Wireless
Looking forward, Verizon Wireless has already started a new proof of concept for an online log management system. Not surprisingly, Verizon has "some huge servers, some huge clusters, and all of them generate a huge amount of log data." Given Verizon Wireless' data volumes, it is also looking for ways to pair MongoDB with Hadoop to leverage the strengths of both. The company has been evaluating the MongoDB Connector for Hadoop. As Verizon Wireless moves forward, Singh notes that MongoDB is appropriate for "quite a lot" of its new use cases, and it is therefore being evaluated for these use cases alongside the company's traditional RDBMSes. That's a big change for a Fortune 50 enterprise, but Singh believes it's necessary to help the company grow and evolve to meet customer needs.
Singh's slides, "How Verizon Uses Disruptive Developments for Organized Progress," and a video of the talk are available from MongoDB.
A Mobile-First, Cloud-First Stack at Pearson
Pearson, the global online education leader, has a simple yet grand mission: to educate the world, with 1 billion students around the globe touching its content on a regular basis. The company is growing quickly, especially in emerging markets where the primary way to consume content is via mobile phones. But to reach global users, it needs to deploy in a multitude of private and public data centers around the globe. This demands a mobile-first, cloud-first platform, with the underlying goal of improving education efficacy. In 2018, Pearson will announce to the public markets what percentage of its revenue is associated with the efficacy of its products. There’s no question: that’s a bold move. As a result, apps have to be built so that they measure how users interact with them.
Front and center in Pearson’s strategy is MongoDB. With MongoDB, as Pearson CTO Aref Matin told the audience at MongoDB World, Pearson was able to replace a double-digit number of siloed, independent platforms with a consolidated platform that allows it to measure efficacy. “A platform should be open, usable by all who want to access functionality and services. But it’s not a platform until you’ve opened up APIs to the external world to introduce new apps on top of it,” declared Matin. A key part of Pearson’s redesigned technology stack, MongoDB proved to be a good fit for a multitude of reasons, including its agility and scalability, its document model, and its ability to perform fast reads and ad hoc queries. Also important to Matin was the ability to capture the growing treasure trove of unstructured data, such as the peer-to-peer and social interactions that are increasingly part of education.
So far, Pearson has leveraged MongoDB for use cases such as: identity and access management for 120 million user accounts, with nearly 50 million per day at peak; adaptive learning and analytics to detect, in near real time, what content is most effective and to identify areas for improvement; and the Pearson Activity Framework (akin to a “Google DoubleClick,” according to Matin), which collects data on how users interact with apps and feeds the analytics engine. All of this feeds into Matin’s personal vision of increasing the pace of learning. “Increasing the pace of learning will be a disruptive force,” said Matin. “If you can reduce the length of time spent on educating yourself, you can learn a lot more and not spend as much on it. That will help us be able to really educate the world at a more rapid pace.”
Enabling Extreme Agility At The Gap With MongoDB
The Gap's creative director insists that "Fashion is...about instinct and gut reaction." In the competitive world of retail, that "instinct" has been set to fast-forward as Gap seeks to outpace fast-fashion retailers and other trends that constantly push Gap and other retailers to meet consumer needs faster. As boring as it may seem, Gap's purchase order management system really, really matters in ensuring the company can quickly evolve to meet consumer tastes. Unable to meet its business agility requirements using traditional relational databases, Gap uses MongoDB for a wide range of supply chain systems, spanning master data management, inventory and logistics functions, including purchase order management.
Collecting Money From Happy Customers
This is no small feat given Gap's size. The Gap is a global specialty retailer offering clothing, accessories and personal care products for men, women, children and babies. Thanks to nearly 134,000 employees, almost 3,200 company-operated stores and an additional 400 franchise stores, fashion-conscious consumers can find The Gap around the world. And they do, spending over $16 billion annually on Gap's latest track pants, indigo-washed jeans and racerback tanks. That's both the good news and the bad news, as presented by Gap consultant Ryan Murray at MongoDB World. Good, because it means Gap, more than anyone else, dresses America and, increasingly, the world. Bad, because at its scale, change can be hard.
Square Pegs, Round Holes And Purchase Orders
Even something simple like a purchase order can have a huge impact on a company like Gap. A purchase order is a rich business object that contains various pieces of information (item type, color, price, vendor information, shipping information, etc.). A purchase order at Gap can be an order to a vendor to produce a certain article of clothing.
The critical thing is that the business thinks about the order as a single entity, while Gap's RDBMS broke the purchase order up into a variety of rows, columns and tables, joined together. Not very intuitive. While this may seem like a small thing, as Murray points out, the RDBMS "forced [developers] to shift away from the business concept (what is a purchase order and what are the business rules and capabilities around it) and shift gears into 'How do I make this technology work for me and help me solve a business problem?' [mode of thinking]. And that destroys flow." Developers may be more technical than the rest of us, but Gap wanted its developers helping to build its business, not merely its technology. Murray continues: "We don't want the developer having to work with the impedance mismatch between the business concept that they're trying to solve for and the technology they're using to solve it."
Enabling Supply Chain Agility By Improving Developer Productivity
As such, Gap realized it needed to evolve how it manages inventory and its vendors. It turned to MongoDB because MongoDB could easily make sense of data that comes in different shapes, which Gap needed to store quickly and transparently. MongoDB, in short, helped Gap become much more agile and, hence, far more competitive. One way Gap managed this was by moving from a monolithic application architecture to a microservices-based approach. Applications have traditionally been built as large monoliths. In this case, that meant the PO system was one big code base that handled everything related to a PO, whether that was handling demand from the planning systems and creating those purchase orders, or simply handling how the purchase orders integrate with other systems and get down to the vendors. All of those things are actually fairly independent of each other, but the code base that managed them was monstrously big and monolithic.
Instead, Murray and team introduced the concept of the microservice, a service dedicated to one business capability. For example, a microservice could handle notifying vendors, by EDI or whatever other technology they use, that a new purchase order has been registered. It turns out that MongoDB is perfect for such microservices because it's so simple and lightweight, Murray notes. Gap uses MongoDB to power these individual services and to connect them together. Each of these services lines up with a business function. Developers can work on separate microservices without bumping into or waiting on each other, as is common in a monolithic architecture. This enables them to be far more productive and to work much faster.
MongoDB As An "Extreme Enabler Of Agile Development"
In this and other ways, Murray lauds MongoDB as “an extreme enabler of agile development,” or iterative development. Waxing rhapsodic, Murray continues: “MongoDB allow[s our developers] to essentially forget about the storage layer that's underneath and just get work done. As the business evolves, the concept of a purchase order as an aggregate concept will also change as they add fields to it. MongoDB gets out of the way. [Developers] drop a collection, start up new code over that database, and MongoDB accepts whatever they throw at it. Again, developers don't have to stop, break the context of solving the business problem, and get back to what they're doing. They simply get to focus on the business problem. And so as an agile enabler, as an enabler of developers to work fast and smart, MongoDB is extremely useful.”
As just one example, Gap was able to develop this new MongoDB-based purchase order system in just 75 days, a record for the company. In true agile fashion, MongoDB enables Gap to continue to iterate on the system. Five months in, the business wanted to track the life of a purchase order in a dashboard.
With MongoDB, that business requirement turned out to require almost no development effort. Murray and team were able to add new types of purchase orders, have them easily coexist with old purchase orders in the same collection, and keep moving. Not in months. Or weeks. Rather, each day the development team was able to show the business what that feature might look like, thanks to MongoDB's flexibility. All of which makes Murray and his team at Gap so happy to work with MongoDB. "Software is ultimately about people," he insists, and giving developers software like MongoDB that they love to use makes them happy and productive. And agile.
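Gap's actual schema is not shown in the talk. To illustrate the two ideas above (a purchase order kept as one self-contained document, and older and newer document shapes living side by side in one collection), here is a minimal sketch that uses plain Python dicts in place of MongoDB documents. Every field name, including the hypothetical `schema_version` and `status_history` fields, is an assumption made for illustration.

```python
from __future__ import annotations

from typing import Any

# Hypothetical purchase orders as they might sit in one collection.
# The first uses an older shape; the second carries a status history that a
# later release added. Neither change required a schema migration.
purchase_orders: list[dict[str, Any]] = [
    {
        "_id": "PO-1001",
        "vendor": {"name": "Vendor A", "edi_id": "VA01"},
        "lines": [{"item": "track pant", "color": "navy", "qty": 500, "unit_price": 11.5}],
        "status": "shipped",
    },
    {
        "_id": "PO-1002",
        "schema_version": 2,
        "vendor": {"name": "Vendor B", "edi_id": "VB07"},
        "lines": [{"item": "racerback tank", "color": "white", "qty": 1200, "unit_price": 4.25}],
        "status_history": [
            {"status": "created", "at": "2014-03-01"},
            {"status": "confirmed", "at": "2014-03-03"},
        ],
    },
]


def lifecycle(po: dict[str, Any]) -> list[str]:
    """Return the status trail for a PO, tolerating both document shapes."""
    if "status_history" in po:  # newer documents carry the full history
        return [step["status"] for step in po["status_history"]]
    return [po["status"]]  # older documents only know their latest status


def order_total(po: dict[str, Any]) -> float:
    """Sum line totals; the nested lines travel with the order itself."""
    return sum(line["qty"] * line["unit_price"] for line in po["lines"])


for po in purchase_orders:
    print(po["_id"], lifecycle(po), order_total(po))
```

The design choice here is to push shape differences into the reading code rather than rewriting stored documents, which is one common way to let new and old records coexist while a dashboard feature is iterated on daily.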
Visualizing Mobile Broadband with MongoDB
The FCC has a mandate to collect and share information on mobile broadband quality. Traditionally, that has meant collecting data and then issuing a report. Before the report is completed (a process that involves drafting, writing, rewriting, and getting the right approvals), the public generally has no visibility into the data. MongoDB is helping change that. The FCC Speed Test App (available for iPhone and Android) measures network quality metrics, including upload and download speed, latency, and packet loss. Currently, users can test their own networks and view an archive of their test results. Soon, the Visualizing Mobile Broadband project will allow consumers to see aggregate test results overlaid on maps as the results become available. The visualization application is built on MongoDB using Node.js, and employs Mapbox for mapping. The results of the Speed Test app are collected, imported into MongoDB, cleaned and validated, and then aggregated by geography, carrier, and time. Eric Spry, the Acting Geographic Information Officer at the FCC, said “The data tended to be a little messy for our SQL schema, and we were constantly having to redefine fields and having to account for new edge cases in the data … MongoDB allowed us a great deal of flexibility in the data we could accept.” Using MongoDB, the FCC is able to store the results as they come in and perform data validation afterwards. Not only does MongoDB serve as the container for the speed test data, but it also provides the spatial operators for aggregating and analyzing test results based on location. The FCC also chose MongoDB for its ability to scale as its test results grow from millions to tens of millions per month. The application will allow consumers to see mobile broadband data and use that information to make more informed carrier choices.
In addition, an API and release of the source code will enable others to build their own applications using the mobile network information as it becomes available. Of course, being a government agency, the FCC faces its own set of challenges. “The MongoDB team understands government procurement and our unique security issues,” said Spry. “Their knowledge of our requirements meant that standing up a MongoDB server went very smoothly.” For more details, see the full recording of Eric Spry's talk at MongoDB World, available now. The FCC will launch the application in August 2014. To see all MongoDB World presentations, visit the [MongoDB World Presentations](https://www.mongodb.com/mongodb-world/presentations) page.
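The FCC pipeline described above stores speed-test results as they arrive, validates them afterwards, and then aggregates them by geography, carrier, and time. The sketch below mimics that store-first, validate-later flow in plain Python rather than with MongoDB's aggregation framework or spatial operators. The field names (`lat`, `lon`, `carrier`, `ts`, `download_mbps`), the validation rules, the sample rows, and the use of a 1-degree grid cell as a stand-in for "geography" are all hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from datetime import datetime
from math import floor
from statistics import mean
from typing import Any


def is_valid(result: dict[str, Any]) -> bool:
    """Hypothetical post-hoc checks on a stored speed-test result."""
    try:
        datetime.fromisoformat(result["ts"])  # timestamp must parse
        return bool(
            -90 <= result["lat"] <= 90
            and -180 <= result["lon"] <= 180
            and result["download_mbps"] >= 0
            and result["carrier"]
        )
    except (KeyError, TypeError, ValueError):
        return False


def aggregate(results: list[dict[str, Any]]) -> dict[tuple, dict[str, float]]:
    """Group valid results by (1-degree grid cell, carrier, month)."""
    buckets: dict[tuple, list[float]] = defaultdict(list)
    for r in results:
        if not is_valid(r):
            continue  # invalid rows stay stored; they are just left out here
        cell = (floor(r["lat"]), floor(r["lon"]))
        month = datetime.fromisoformat(r["ts"]).strftime("%Y-%m")
        buckets[(cell, r["carrier"], month)].append(r["download_mbps"])
    return {
        key: {"tests": len(speeds), "mean_download_mbps": mean(speeds)}
        for key, speeds in buckets.items()
    }


stored = [  # hypothetical raw results, kept exactly as they arrived
    {"lat": 38.9, "lon": -77.2, "carrier": "CarrierA", "ts": "2014-06-01T12:00:00", "download_mbps": 10.0},
    {"lat": 38.5, "lon": -77.8, "carrier": "CarrierA", "ts": "2014-06-15T08:30:00", "download_mbps": 20.0},
    {"lat": 38.9, "lon": -77.2, "carrier": "", "ts": "2014-06-02T09:00:00", "download_mbps": 5.0},
    {"lat": 38.9, "lon": -77.2, "carrier": "CarrierB", "ts": "not-a-date", "download_mbps": 7.0},
]
print(aggregate(stored))
# {((38, -78), 'CarrierA', '2014-06'): {'tests': 2, 'mean_download_mbps': 15.0}}
```

In MongoDB itself, this kind of grouping would naturally run server-side as a `$group` stage in an aggregation pipeline, with geospatial queries replacing the crude grid used here.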
Dating at eHarmony - 95% Faster on MongoDB
Thod Nguyen, CTO of eHarmony, offered fascinating insight into how the world’s largest relationship service provider improved customer experience by processing matches 95% faster, and increased subscriptions by 50%, after migrating from relational database technology to MongoDB. The full recording and slides from Thod’s MongoDB World session are available now. eHarmony currently operates in North America, Australia and the UK. The company has a great track record of success: since its launch in 2000, 1.2 million couples have married after being introduced by the service. Today eHarmony has 55 million registered users, a number that will increase dramatically as the service is rolled out to 20 other countries around the globe in the coming months. eHarmony employs some serious data science chops to match prospective partners. Users complete a detailed questionnaire when they sign up for the service. Sophisticated compatibility models are then executed to create a personality profile based on the user’s responses. Additional research based around machine learning and predictive analytics is added to the algorithms to enhance the matching of prospective partners. Unlike searching for a specific item or term on Google, the matching process used to identify prospective partners is bi-directional, with multiple attributes such as age, location, education, preferences and income cross-referenced and scored between each potential pair of partners. In eHarmony’s initial architecture, a single monolithic database stored all user data and matches; however, this didn’t scale as the service grew. eHarmony split the matches out into a distributed Postgres database, which bought it some headroom, but as the number of potential matches grew to 3 billion per day, generating 25TB of data, it could only scale so far. Running a complete matching analysis of the user base was taking 2 weeks.
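The talk does not detail eHarmony's compatibility models. To show what "bi-directional" means compared with a one-way search, here is a minimal sketch in which a pair qualifies only if each person passes the other's hard filters, and the pair's score combines both points of view. The `User` fields, the preference format, and the rule of keeping the weaker of the two one-way scores are all hypothetical choices made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class User:
    name: str
    age: int
    location: str
    education: str
    # Hypothetical preferences: hard filters plus per-attribute weights.
    min_age: int
    max_age: int
    locations: set[str]
    weights: dict[str, float] = field(default_factory=dict)


def accepts(seeker: User, candidate: User) -> bool:
    """One direction only: does the candidate pass the seeker's hard filters?"""
    return (
        seeker.min_age <= candidate.age <= seeker.max_age
        and candidate.location in seeker.locations
    )


def one_way_score(seeker: User, candidate: User) -> float:
    """Sum the seeker's weights for attributes both people share."""
    return sum(
        weight
        for attr, weight in seeker.weights.items()
        if getattr(seeker, attr) == getattr(candidate, attr)
    )


def match(a: User, b: User) -> float | None:
    """Bi-directional: each must accept the other; keep the weaker score."""
    if not (accepts(a, b) and accepts(b, a)):
        return None
    return min(one_way_score(a, b), one_way_score(b, a))


def scored_pairs(users: list[User]) -> list[tuple[str, str, float]]:
    """Score every mutually acceptable pair, best first."""
    pairs = []
    for a, b in combinations(users, 2):
        score = match(a, b)
        if score is not None:
            pairs.append((a.name, b.name, score))
    return sorted(pairs, key=lambda p: p[2], reverse=True)


ann = User("ann", 34, "LA", "MS", 30, 40, {"LA", "SF"}, {"education": 2.0, "location": 1.0})
bob = User("bob", 36, "SF", "MS", 28, 38, {"LA", "SF"}, {"education": 1.0})
cat = User("cat", 45, "LA", "BA", 25, 50, {"LA"}, {"location": 1.0})

print(accepts(cat, ann), match(ann, cat))  # True None: a one-way search would pass
print(scored_pairs([ann, bob, cat]))       # [('ann', 'bob', 1.0)]
```

The all-pairs loop also hints at the scale problem described above: the number of candidate pairs grows roughly with the square of the user base, which is why a full matching run could take weeks.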
In addition to the problems of scale, as the data models became richer and more complex, adjusting the schema required a full database dump and reload, causing operational complexity and downtime, as well as limiting how quickly the business could evolve. eHarmony knew it needed a different approach. It wanted a database that offered: support for the complex, multi-attribute queries that provide the foundation of the compatibility matching system; a flexible data model to seamlessly handle new attributes; and the ability to scale on commodity hardware without adding operational overhead for a team already managing over 1,000 servers. eHarmony explored Apache Solr as a possible solution, but it was eliminated because the matching system requires bi-directional searches rather than just conventional uni-directional searches. Apache Cassandra was also considered, but its API was too difficult to map to the data model, and there were imbalances between read and write performance. After extensive evaluation, eHarmony selected MongoDB. As well as meeting the three requirements above, eHarmony also gained a lot of value from the MongoDB community and from the enterprise support that is part of MongoDB Enterprise Advanced.
Thod provided the audience with key lessons based on eHarmony’s migration to MongoDB: engage MongoDB engineers early, as they can provide best practices in data modeling, sharding and deployment productization; when testing, use production data and queries; randomly kill nodes so you understand behavior under multiple failure conditions; and run in shadow mode alongside the existing relational database to characterize performance at scale.
Of course, MongoDB isn’t the only part of eHarmony’s data management infrastructure. The data science team integrates MongoDB with Hadoop, as well as Apache Spark and R for predictive analytics. The ROI from the migration has been compelling: 95% faster compatibility matching, with matching of the entire user base reduced from 2 weeks to 12 hours; 30% more communication between prospective partners; a 50% increase in paying subscribers; and a 60% increase in unique website visits.
And the story doesn’t end there. In addition to rolling out to 20 new countries, eHarmony also plans to bring its data science expertise in relationship matching to the jobs market, matching new hires to potential employers. It will start to add geolocation services as part of the mobile experience, taking advantage of MongoDB’s support for geospatial indexes and queries. eHarmony is also excited by the prospect of pluggable storage engines delivered in MongoDB 3.0. The ability to mix multiple storage engines within a MongoDB cluster can provide a foundation to consolidate search, matches and user data. Whether you’re looking for a new partner or a new job, it seems eHarmony has the data science and database to get you there. If you are interested in learning more about migrating to MongoDB from an RDBMS, read the RDBMS to MongoDB Migration Guide white paper.
MongoDB: A Single Platform for All Financial Data at AHL
AHL, a part of Man Group plc, is a quantitative investment manager based in London and Hong Kong, with over $11.3 billion in assets under management. The company relies on technology like MongoDB to be more agile and therefore gain an edge in the systematic trading space. With MongoDB, AHL can better support its quantitative researchers, or “quants,” as they research, construct and deploy new trading models in order to understand how markets behave. Importantly, AHL didn't embrace MongoDB piecemeal. Once AHL determined that MongoDB could significantly improve its operations, the financial services firm adopted MongoDB across the firm for an array of applications. AHL replaced a range of traditional technologies, such as relational databases, with a single platform built on MongoDB for every type and frequency of financial market data, and for every level of data SLA, including:
Low Frequency Data: MongoDB was 100x faster in retrieving data and also delivered consistent retrieval times. Not only is this more efficient for cluster computation, but it also leads to a more fluid experience for quants, with data ready for them to easily interact with, run analytics on and plot. MongoDB also delivered cost savings by replacing a proprietary parallel file system with commodity SSDs.
Multi-user, Versioned, Interactive Graph-based Computation: This includes 1 terabyte of data representing 10,000 stocks and 20 years of time-series data, used to help quants come up with trading signals for stock equities. While this is not a huge quantity of data, MongoDB reduced the time to recompute trading models from hours to minutes, accelerated quants’ ability to do interactive research, and enabled reads and writes of 600MB of data in less than 1 second.
Tick Data: Tick data captures all market activity, such as price changes for a security, at up to 150,000 ticks per second, and includes 30 terabytes of historic data. MongoDB quickly scaled to 250 million ticks per second, a 25X improvement in tick throughput (with just two commodity machines!) that enabled quants to fit models 25X as fast. AHL also cut disk storage down to a mere 40% of what its previous solution required, and realized a 40X cost savings.
The result? According to Gary Collier, AHL’s Technology Manager: “Happy developers. Happy accountants.”
US Department Of Veterans Affairs Goes From Wire Frame To Production App In Weeks, Not Months Or Years, With MongoDB
Would it surprise you that one of the biggest open-source software shops in the world, in fact one of the biggest NoSQL shops in the world, resides in the U.S. government? The Department of Veterans Affairs has more than 20 million primary customers and a $3.4B annual IT budget, with 400,000 users and over 5,000 applications. The VA turned to MongoDB to unlock enterprise services with a schema-agnostic enterprise CRUD (eCRUD) service. Previously, the VA was paying millions of dollars to lock away data in relational databases and millions more to get it back out. “It just didn’t make sense,” said Joe Paiva, Chief Technology Strategist at the U.S. Department of Veterans Affairs. “We realized early on we could never build all the apps that people want. We wanted to go from wire frame to app much, much faster.” To get there, the VA used MongoDB as one logical, federated data store for all of its different types of data. Now, people can freely code as long as they know how to make an AJAX web services call. “You can say you’re agile, that you’re incremental, but when you need change, you need all the change!” said Paiva. With MongoDB, they achieved just that. The VA had the first version of the service up and running in weeks. “It was that fast,” said Paiva. Through this effort, the VA has improved efficiency and information agility. Plus, it has increased security by consolidating data under standardized enterprise controls, all in the name of keeping costs low while better serving a greater number of veterans. To see all MongoDB World presentations, visit the [MongoDB World Presentations](https://www.mongodb.com/mongodb-world/presentations) page.
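The article does not describe the VA's eCRUD interface. As a rough picture of what "schema-agnostic CRUD" means in practice, here is a minimal in-memory sketch: any JSON object can be written to any named collection, and new fields are simply merged in without a migration. The class name `SchemaAgnosticStore`, its methods, and the sample collection names are hypothetical; a real service would expose these operations over HTTP (the AJAX calls mentioned above) and persist to MongoDB rather than to nested dicts.

```python
from __future__ import annotations

import json
import uuid
from typing import Any


class SchemaAgnosticStore:
    """Hypothetical in-memory stand-in for a schema-agnostic CRUD service.

    Any JSON object can go into any named collection; no fields are declared
    up front. A real service would sit behind HTTP endpoints and persist to
    MongoDB rather than to nested dicts.
    """

    def __init__(self) -> None:
        self._collections: dict[str, dict[str, dict[str, Any]]] = {}

    def create(self, collection: str, body: str) -> str:
        doc = json.loads(body)  # accept whatever JSON object the caller sends
        if not isinstance(doc, dict):
            raise ValueError("body must be a JSON object")
        doc_id = str(doc.get("_id") or uuid.uuid4().hex)
        doc["_id"] = doc_id
        self._collections.setdefault(collection, {})[doc_id] = doc
        return doc_id

    def read(self, collection: str, doc_id: str) -> dict[str, Any] | None:
        return self._collections.get(collection, {}).get(doc_id)

    def update(self, collection: str, doc_id: str, changes: dict[str, Any]) -> bool:
        doc = self.read(collection, doc_id)
        if doc is None:
            return False
        # New fields are merged in as-is; the id is left untouched.
        doc.update({k: v for k, v in changes.items() if k != "_id"})
        return True

    def delete(self, collection: str, doc_id: str) -> bool:
        return self._collections.get(collection, {}).pop(doc_id, None) is not None


store = SchemaAgnosticStore()
vid = store.create("veterans", '{"name": "Pat", "branch": "Navy"}')
store.update("veterans", vid, {"claims": [{"type": "education"}]})  # no migration
print(store.read("veterans", vid))
```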