Why I Wrote the New MongoDB Aggregations Book
In early May 2021, I published my book, Practical MongoDB Aggregations, which I released electronically and free for anyone to read. I love the MongoDB database and the uniqueness and power of its aggregation framework to analyse and manipulate massive amounts of data intuitively and efficiently. The opportunity to share this passion with others spurred me to write the book, with which I aim to help developers, architects, data analysts, data engineers, and data scientists better understand how to maximise their productivity and effectiveness when building aggregation pipelines, as well as how to optimise these pipelines.

Like many people over the past year during the pandemic, I’ve struggled to keep myself occupied when not busy doing my day job. Hence, my book was born not just from a desire to improve people’s knowledge but also as my pandemic project, written over many weekends, to stave off the boredom.

I believe aggregation pipelines provide a powerful domain-specific language for data processing in a way I’ve not seen before in other data-oriented tools, languages, or standards. SQL is a good data query language that caters to some analytical use cases via “group-by/having” statements. However, it typically has to be paired with a procedural language (e.g., Oracle’s PL/SQL) to encompass an ordered set of complex data transformation rules. In the big data world of Hadoop, I find the MapReduce approach too complex to develop with efficiently. Higher-level tools like Spark help alleviate some of this. However, because Spark still has to be general-purpose and versatile, the amount of Spark code required to process data sitting in any type of database is still too high for my liking. Many ETL tools provide proprietary data transformation capabilities, but these have to cater to the lowest common denominator of capabilities across all the different types of databases they interact with.

For these reasons and from experience, I consider MongoDB Aggregations to be the best tool for processing large data sets because it combines performance with productivity. Nevertheless, I sense the aggregation framework is shrouded in mystery for many people, hence my desire to demystify it with this book.

I believe I identified a knowledge gap that many users wanted filled. MongoDB Inc. provides excellent reference documentation about aggregations in the MongoDB Manual, and MongoDB University provides a tremendous free online training course on aggregations. What I felt was still to be addressed was an opinionated yet informed perspective on how best to assemble aggregation pipelines from the well-documented parts: something that points the way to achieve optimal productivity and performance, accompanied by fully formed example pipelines to help put these approaches into practice.

I hope readers of my book will learn some new things of value and enjoy reading it. A good test of the relevance of my book, in time, will be if people come back to it repeatedly as they continue with their journey of developing aggregations. Read the book for free now!
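For readers who have not yet seen one, an aggregation pipeline is an ordered list of stages, each describing one transformation step. The following is only a minimal sketch in Python, not an example taken from the book: the `orders` collection, its field names, and the `stage_names` helper are all hypothetical and exist purely for illustration.

```python
# Hypothetical pipeline: each stage is a dict, and stages run in list order.
# Field names (status, customer_id, value) are assumptions for illustration.
pipeline = [
    # Stage 1: keep only completed orders.
    {"$match": {"status": "completed"}},
    # Stage 2: total the order value per customer.
    {"$group": {"_id": "$customer_id", "total_spent": {"$sum": "$value"}}},
    # Stage 3: highest spenders first.
    {"$sort": {"total_spent": -1}},
    # Stage 4: keep the top five.
    {"$limit": 5},
]


def stage_names(stages):
    """Return the operator name of each stage, in execution order."""
    return [next(iter(stage)) for stage in stages]


if __name__ == "__main__":
    print(stage_names(pipeline))
    # With the PyMongo driver and a running database, the same list would
    # typically be passed to the driver, roughly like this (assumed names):
    #   results = db.orders.aggregate(pipeline)
```

Because the pipeline is plain data, stages can be added, removed, or reordered individually, which is part of what makes this style productive to work with.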
Intern Spotlight: Russell Kaplan
This year, MongoDB welcomed 33 university students to our intern program in Engineering, Marketing and Education. In this series, we'll introduce you to the talented students who are helping us transform development and operations for how we run applications today. We had the chance to sit down with intern Russell Kaplan, who is working on the C++ Driver team.

Where do you go to school, what is your major, and what year are you in?
I go to Stanford, where I am a computer science major and a rising sophomore.

What is your role at MongoDB?
I work on the C++ driver team, building a geospatial API.

How did you find out about the internship program at MongoDB? Why did you choose to come to MongoDB?
I met MongoDB at PennApps. The app I made there won the prize for best use in the MongoDB category. It was called screenshades, and was a Chrome extension that figured out what TV shows you watch and hides spoilers for them from your Twitter stream. It worked with machine learning, so we needed a lot of training data, which we scraped from Twitter and Reddit for spoiler hashtags and built a dataset from. We then used that to train a classifier. I chose to come to MongoDB because I already had a lot of experience with front-end development and building web apps and wanted to learn more about the back end of development.

What’s your hometown?
My hometown is NYC. Best city in the world!

Did you have previous experience using MongoDB before you arrived? If so, how are things different now that you work at MongoDB? If not, how did you learn MongoDB and how was the education process?
I used it at hackathons before. But I only really used its basic features. I learned a lot more about it after getting here. It’s really simple to use for quickly getting started with web applications.

Bike or public transportation to work?
Subway.

What’s a typical day (or week) for you?
I get into the office by 10am. Eat some breakfast in the café, catch up on emails for a bit and then get to coding. I code until lunch, have some Seamless, play a game of ping pong and then code for the rest of the day.

What do you love most about MongoDB?
I love the people I get to work with. It’s a lot of really smart, high-energy people that I have so much to learn from.

What’s the most challenging aspect of your job?
Because it’s a database and an open source company, the code really has to be production quality in a way that class work doesn’t. It’s a much more rigorous standard of development. That’s something that’s really cool to learn but challenging at times.

What do you hope to accomplish while you’re here?
I hope to have my code integrated into the rest of the MongoDB code base. I hope that the people who use the C++ driver appreciate the work I’ve done.

What’s your favorite Seamless lunch order?
Chop’t steak salad.

Name one secret skill you have, unrelated to work.
I can beatbox. A little bit, I’m an amateur.

Who’s your favorite tennis player?
Djokovic, he’s incredible. He also has a hilarious sense of humor and isn’t afraid to make jokes about himself and other players.

Kindle or book? What’s your favorite book?
Books. I’m old school. My favorite book is probably 1984.

Describe your perfect weekend.
Oh man. Sleep in late Saturday morning and then go play some tennis with some friends. Discover some obscure yet delicious restaurant for dinner, and then go see a Death Cab for Cutie concert. All while getting to hang out with friends and family.

Want to help build the next revolution in database technology? MongoDB offers summer internships and new graduate opportunities to foster computer science talent across the country. Learn more about the MongoDB University Relations program.
Beyond NoSQL: A Modern Database Manifesto
There is no such thing as NoSQL. Not as we tend to think of it, anyway. While NoSQL was born as a movement away from rigid relational data models so web giants could embrace Big Data with scale-out architectures, the term has come to categorize a set of databases that are more different than they are the same. This broad categorization doesn’t work. It’s not helpful. While we at MongoDB still sometimes refer to NoSQL, we try to do it sparingly, given its propensity to confuse rather than enlighten.

Deconstructing NoSQL

Today the NoSQL category includes a cacophony of over 100 document, key-value, wide-column and graph databases. Each of these database types comes with its own strengths and limits. Each differs markedly from the others, with disparate models and capabilities relative to data storage, querying, consistency, scalability and high availability.

Comparing a document database to a key-value store, for example, is like comparing a smartphone to a beeper. A beeper is exceptionally useful for getting a simple message from Point A to Point B. It’s fast. It’s reliable. But it’s nowhere near as functional as a smartphone, which can quickly and reliably transmit messages, but can also do so much more. Both are useful, but the smartphone fits a far broader range of applications than the more limited beeper.

As such, organizations searching for a database to tackle Gartner’s three V’s of Big Data (volume, velocity and variety) won’t find an immediate answer in “NoSQL.” Instead, they need to probe deeper for a modern database that can handle all of their Big Data application requirements.

Modern Databases For Modern Data

One of these requirements is, of course, the ability to handle large volumes of data, the original impetus behind the NoSQL movement. But the ability to handle volume, or scale, is something all databases categorized as “NoSQL” share. MongoDB, for example, counts among its users those who regularly store petabytes of data, perform over 1,000,000 operations per second and run clusters that exceed 1,000 nodes.

A modern database, however, must do more than scale. Scalability is table stakes. It also must enable agility to accelerate development and time to market. It must allow organizations to iterate as they embrace new business requirements. And a modern database must, above all, enable enterprises to take advantage of rapidly growing data variety. Indeed the “greatest challenge and opportunity” for enterprises, as Forrester notes, is managing a “variety of data sources,” including data types and sources that may not even exist today.

In general, all so-called NoSQL databases are much more helpful than relational databases at storing a wide variety of data types and sources, including mobile device, geospatial, social and sensor data. But the hallmark of a modern database is its ability to allow organizations to do useful things with their data.

Defining The Modern Database

To count as a modern database, then, a database must meet three requirements. While relational databases are able to manage some of these requirements, and newer so-called “NoSQL” key-value or wide column data stores meet others, only MongoDB meets all three requirements.

The database MUST scale. As data volume and velocity grow, so the database must grow too. It should scale horizontally and elegantly, without doing unnatural things to your application, in the cloud or on commodity hardware. Meeting the base requirements, like having enough capacity to serve your customers, should be a given.

The database MUST adapt to change. The speed of business accelerates and your database must keep pace, enabling iteration. This means you must be able to process and mine new data sources and data types without the database breaking a sweat (or you breaking your back or budget). Your schema must flow from your application requirements, rather than forcing your application to fit a predefined, rigid schema.

The database MUST unleash your data. Just storing data isn’t enough. You must be able to exploit the data, which particularly means you must be able to ask significant questions of your data. In part this means that the database must support rich queries, indexing, aggregation and search across multi-structured, rapidly changing data sets in real time. But it also means that it must support data for modern use cases including mobile, social, Internet of Things and other systems of engagement.

Some relational databases can handle a few of these requirements, yet fail in the essential need to deliver scale and adaptability. Some newer databases, including so-called “NoSQL” key-value or wide column data stores, meet still other requirements, yet don’t give organizations the latitude to unleash their data. In fact, they constrain you to look up data by the key with which it was written unless you integrate external search engines and analytics nodes, which can create other problems.

MongoDB: A Modern Database For Today's Business Needs

But only one database today can deliver on each of these critical components of a modern database. Only one database offers orders of magnitude more productivity for developers and operations teams alike, while still delivering petabyte scale and lightning-fast performance. Only MongoDB, the modern database that tens of thousands of organizations depend upon to build and run today’s most demanding applications.

To learn more about how MongoDB has enabled some of the world’s largest and most innovative companies to deliver applications and outcomes that were previously impossible, download our new whitepaper.
Announcing the Second Annual MongoDB March Madness
March Madness is a college basketball tournament, but it is also a month where we engage our MongoDB User Group network in a global activity. Last year we had the groups compete in a worldwide hackathon. This year, we are sending MongoDB engineers to 33 MongoDB User Groups (MUGs) around the globe! Our goal is to give our incredible MUG Leaders what they always ask for: a MongoDB engineer to share their expertise with the community!

Thanks to MUG Leaders Jason Ford, Bev Corwin, Flavio Percoco, Tobias Trelle, Ivan Hristov, John Puddifoot, Mário Cordeiro, Adrian Wolny, Stefan Rudnitzki, Brad Urani, Scott Shellabarger, Mario Koppen, Bertin Nono, Ben McCann, Sig Navarez and Victoria Malaya for making this event happen. We'll be launching March Madness at the Stockholm MUG and are looking forward to sharing the great stories, slides and photos with the community.

Find a March Madness event happening near you or find a MUG near you. Don't see a MUG in your area? Start a MUG for your local tech community.

Amsterdam, Barcelona, Berlin, Cambridge, Casablanca, Cincinnati, Copenhagen, Dnipropetrovsk, Dublin, Dusseldorf, Edinburgh, Geneva, Gent, Hamburg, Krakow, Lisbon, Little Rock, London, Madrid, Moscow, Milan, Munich, Nashville, Orange County, Oporto, Paris, Richmond, Rome, Sevilla, Stockholm, Tel Aviv, Vienna
Looking beyond labels like relational and NoSQL
According to a new Dice.com salary survey, MongoDB ranks as one of the top-10 most highly compensated technology skills. Indeed.com rates MongoDB as the second hottest job trend. And according to DB-Engines.com, which ranks over 200 databases on their relative popularity, MongoDB is now the fifth-most popular database in the world, this month surpassing IBM's DB2.

All great, right? Maybe. Buried in the Dice.com data, as well as the Indeed.com data, is evidence of real confusion. For example, one of the top-10 most highly compensated skills in Dice.com's survey is "NoSQL." NoSQL is not a technology. It's not really something a developer can "know" in any real sense. NoSQL is a movement that describes a different way of modeling data but, as Basho founder Justin Sheehy correctly noted, there are as many differences among so-called NoSQL databases as there are similarities.

As such, knowing Basho's Riak won't really help you understand MongoDB. Perhaps at a high, conceptual level, but expertise in one doesn't really translate into familiarity with another. They are different databases with different approaches.

Employers looking for generic NoSQL skills need to think more deeply about what their application requirements are. Looking beyond relational databases for modern application requirements is a good start, but looking to generic "NoSQL" is not sufficient. Organizations should be looking for a modern database that dramatically improves developer productivity, encourages application iteration and enables a new wave of transformational applications in areas like Big Data, Internet of Things, mobile and more. That database is MongoDB.

Is MongoDB "NoSQL"? Sure. But it's much bigger than that (based on what people search for on Google, many organizations already seem to understand this). MongoDB is the fastest-growing database in the world, not because it fits the NoSQL category, but because it significantly improves the productivity of developers and the organizations for which they work.

So if you're looking to hire technology talent, you're far more likely to be successful hiring an experienced MongoDB engineer than a "NoSQL engineer." MongoDB, after all, is an actual database. NoSQL simply describes an important movement.
MongoDB Named InfoWorld 2014 Technology of the Year: It's A Matter Of Innovation
When it rains, it pours. Right on the heels of being named DB-Engines' 2013 Database of the Year and Linux Journal's Best NoSQL Database, InfoWorld has given MongoDB its 2014 Technology of the Year award, alongside Amazon Web Services and GitHub, among others. More than just point solutions to finite business problems, InfoWorld's list includes technologies that "point the way to the data centers, clouds, and applications of tomorrow. They’re the innovations that are changing the way we work and do business," as Doug Dineley, executive editor of InfoWorld’s Test Center, declares.

Sometimes innovation is about lower costs. For example, one of the biggest advantages Hadoop brings is enabling data analytics on commodity hardware, as opposed to the expensive, proprietary solutions of yesterday. The real value of Linux, in its early years, was arguably less about product innovation and more a matter of helping enterprises transition away from expensive UNIX servers.

MongoDB enables a different type of innovation. Yes, MongoDB is dramatically less expensive than licensing and running a proprietary relational database. But that's not what has made it the fastest-growing, most popular non-relational database (by a wide, wide margin). Instead, MongoDB is popular because it reinvents data management, enabling developers to write a new breed of application that is impossible, or exceptionally difficult, with a relational database.

Part of this is a matter of simplifying data schema. And part of it is allowing the developer to focus on her application (pictured as a car in the accompanying graphic), and not the unnecessary overhead of object relational mapping and upkeep on a rigid, relational schema. But the overall value is about enabling and ennobling developers, giving them power to get work done for the line of business tasked with new marketing initiatives, optimizing business processes and more.
Ultimately, then, MongoDB has won InfoWorld's 2014 Technology of the Year award because it brings innovation back to the data management market, something that has been sorely lacking for a long time.
Hudl: Getting Athletes to the Top with MongoDB
Football is a resource-intensive sport. The strategy and people power that help bring a team into top shape are enormous. Playbooks look like phone books and the hours of game and practice footage are difficult to distribute to teams and coaching staff. Many teams, however, have gotten an edge by using Hudl, a platform that offers secure access to video analysis tools from any computer or mobile device. The MongoDB-based platform makes it easy to upload, sort, analyze and share video to help coaches learn about their teams, scout opponents and win.

After facing bottlenecks with SQL, Hudl turned to MongoDB to support its video metadata storage. MongoDB delivers a flexible data model, ensuring coaches are not restricted when defining variable data, such as football formations, camera angles, and custom notes used for post-game analysis. With MongoDB, Hudl can create a single collection with high-speed querying, while easily and cost-effectively sharding to scale linearly.

“Rather than partitioning SQL, we decided to invest in horizontal scale for the long term,” said Brian Kaiser, CTO at Hudl. “MongoDB makes it so easy to add shards that we don’t require a large capital expenditure to upgrade, which is great from a predictability point of view. Together with Amazon’s Provisioned IOPS, MongoDB delivers remarkably stable query.”

MongoDB has increased developer productivity by facilitating Hudl’s A/B testing and enabling the incremental, easy rollout of new features. In addition, Hudl relies on MongoDB Management Service (MMS) as a crucial asset to monitor MongoDB clusters and proactively address deployment issues.

“MongoDB is painless for developers and has proven to be battle-tested for the web-based video analysis that Hudl requires,” said Kaiser. “We appreciate having a strong company that backs the product with great support and a high level of innovation.”

Since 2001, over 1.6 million recruiting packages have been sent through Hudl, and over 162,000 college coaches have watched recruiting films through Hudl. We hope to see more from the Hudl team as they change the way athletes, coaches and recruiters build talent.
Mapping the Industry's Tectonic Shift in Data Management
We are clearly in the early stages of a "tectonic shift" in the database market, as eWeek terms it. Not because any particular database vendor decided that the world was ripe for a change, but because the nature of data we're generating and processing has changed. Dramatically. In a recent research note, Cowen & Co. analyst Peter Goldmacher clearly articulates this shift: It is well understood that the current database giants have written superb products to solve primarily one problem (automating standard business processes), but we no longer live in a one problem world. The proliferation of mobile devices is forcing an immense structural change as we increasingly overlay a digital existence on top of our analog existence. "If we can measure it, we can manage it" has transcended the world of business process automation and now has meaning in everything we do, as everything we do generates data. Driving, tweeting, gaming, friending, browsing, walking...it all generates data. We can capture, analyze and derive tremendous value from that data, but only if we can use low cost, high-quality data management products. This is the challenge MongoDB is laying down, and it is the challenge all other data management players must rise to meet if Big Data is going to realize its potential. I've called out before that NoSQL and Hadoop are the new normal in data management. This is why. And it's why, as much as the RDBMS establishment may wish it otherwise, the industry looks bright for NoSQL technologies like MongoDB.
The Changing Of The Technology Guard: NoSQL + Hadoop
Big Data truly is prompting a changing of the technology guard. In an excellent article today, The Wall Street Journal notes that Hadoop is "challenging tech heavyweights like Oracle and Teradata [whose] core database technology is too expensive and ill-suited for typical big data tasks." This follows my own observations that repeated earnings misses across the legacy technology vendor landscape indicate that real, tectonic shifts in the technology landscape are underway. In other words, NoSQL and Hadoop are the new normal.

What the Journal missed, however, was the right emphasis. As fantastic as Hadoop is, it's only one part of the Big Data story. And not necessarily the most significant part. For example, the Journal writes:

"Traditional databases organize easy-to-categorize information. Customer records or ATM transactions, for example, arrive in a predefined format that is easy to process and analyze. These so-called relational databases are the kind offered by Oracle and Teradata among others, and the market for them runs to an estimated $30 billion a year, according to IDC estimates. The Internet, though, is messy. Companies now also have to make sense of and store the mass of data being generated from tweets, Web-surfing logs and Internet-connected machines. Hadoop is a cheap technology to make that possible, and it was born of Google technologies detailed in academic papers."

The article is dead-on in most respects, except for the market that Hadoop truly tackles. Of the $30 billion database market, Hadoop addresses just a quarter of it: the OLAP market. The much larger market is the traditional OLTP market, and this is the home of NoSQL databases like MongoDB.

Perhaps unsurprisingly, then, MongoDB has the fastest growing Big Data community, and the second hottest job trend after only HTML5. Big Data, after all, isn't merely about analytics. It's primarily about operational databases that can help enterprises put their data to work in real time.
Making Sense Of NoSQL: A Layperson's Guide
Despite how simple MongoDB is to learn and use for application development, it still comes burdened with lingo that outsiders can find abstruse. Actually, this is a problem common to a lot of great technology, which is why I appreciate it when someone takes the time to make business sense of complex technology. For example, ReadWrite's Brian Proffitt recently wrote a primer on Hadoop that I found really helpful.

Closer to home, today I stumbled upon "The Business Person’s Minimalist Guide to NoSQL" by ServiceSource's SVP of Marketing Paula Reinman. In just a few short paragraphs, Reinman highlights why MongoDB is such a big deal, without resorting to deep technical jargon. Some examples:

Term: Replication
Engineers say: A master can perform reads and writes. A slave copies data from the master and can only be used for reads or backup (not writes). The slaves have the ability to select a new master if the current one goes down.
I say: Gives my customers real time, uninterrupted access to key data and analysis, even when part of the system goes down.

Term: Sharding
Engineers say: Distributes a single logical database system across a cluster of machines.
I say: Allows my customers to have consistent and timely access to information even when their data volumes grow beyond the capacity of individual servers.

Term: Schema-less
Engineers say: You can create collections without defining the structure, i.e. the fields or the types of their values, of the documents in the collection. You can change the structure of documents simply by adding new fields or deleting existing ones. Documents in a collection need not have an identical set of fields.
I say: My customers can get product more quickly since my engineers can more easily adapt our application to our customers’ variable and dynamic requirements.

As someone who appreciates great software but can't actually code it, I'm grateful for people like Reinman and Proffitt who can help explain why it matters.
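For readers who do want to see the "schema-less" idea in code, here is a minimal sketch that uses an ordinary Python list of dictionaries as a stand-in for a collection. It does not use MongoDB itself, and the names (`customers`, `field_sets`) and values are hypothetical, chosen only to mirror the engineers' description.

```python
# A plain-Python stand-in for a schema-less collection: a list of dicts.
# No structure is declared up front (all names and values are made up).
customers = []

customers.append({"name": "Ada", "email": "ada@example.com"})
# A later document can carry extra fields or omit earlier ones.
customers.append({"name": "Grace", "phone": "555-0100", "tags": ["vip"]})

# Changing one document's structure: add a new field, delete an existing one.
customers[0]["loyalty_points"] = 120
del customers[0]["email"]


def field_sets(docs):
    """Return the sorted field names of each document, in insertion order."""
    return [sorted(doc) for doc in docs]


if __name__ == "__main__":
    # The two documents end up with different sets of fields.
    print(field_sets(customers))
```

The point of the sketch is only that the two documents coexist with different fields and that neither change required a migration step.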
This Week in MongoDB: May 13-19
Learn More
MongoDB Indexing Best Practices
MongoDB, Build Parties and Deploying Your Web App at 11am
MongoDB Profiling Tips
MongoDB for Java Developers begins May 13. Sign up and get the first week's material. First homework due next Monday.

Upcoming MongoDB Days
MongoDB Pittsburgh, June 3
MongoDB Israel, June 16
MongoDB New York City, June 21

Upcoming Webinars
May 14: Utilisations courantes de MongoDB
May 16: Realizing the Promise of Machine to Machine (M2M) with MongoDB
May 22: Indexing and Query Optimization
May 23: How Financial Firms Create a Single Customer View with MongoDB

User Groups and Events this Week
May 14: NYC C++ Meetup
May 15: May MongoDC Meetup
May 17: Jeremy Mikola will be at PHPTek in Chicago discussing how to be a good open source contributor

Have something you'd like to share? Let us know.
Why It's the Right Time to Learn MongoDB
There are a number of technical considerations involved in choosing a database for a new project, but if you’re looking to learn a new technology, you need the reassurance that there is traction in the field and resources available to grow as a developer or ops professional. Here’s why it’s the right time to learn MongoDB.

The Technology Has Matured

Product maturity grows due to increased usage and familiarity. MongoDB is open source and has grown along with the community, thanks to code contributors, community testers and even those who vote on new features. If you’re learning MongoDB now, you will be learning to use a solid product that has industry validation and similar functionality to many RDBMS systems you’ve encountered before. You will also have the support of a community of experts who have been using MongoDB in different environments for three years or more.

You Need to Stay Relevant

Interest in MongoDB spiked in 2010, according to Google Search Insights, and the momentum has only continued to grow. This is because the technology has matured, 10gen’s development on MongoDB has increased and adoption has grown. MongoDB has enabled developers to build new types of applications for cloud, mobile and social, making MongoDB developers an invaluable resource for companies looking to innovate in each of these areas.

In May 2012, James Governor posted Indeed Job Trends for various NoSQL products, all heading uphill since 2010, and MongoDB came out on top. Additionally, MongoDB is the most widely adopted NoSQL technology according to 451 Group's monthly LinkedIn Skills Index, with 45% of LinkedIn profile mentions in the NoSQL category. MongoDB skills are in high demand from businesses, and your peers are learning the skills to stay relevant.

You Need to Get Ahead

Employers are looking for talented engineers who stay up-to-speed on new technologies. But even if you’re not looking for a new position, learning MongoDB can place you in line to lead a new project or oversee a large database migration. Developers at companies like eBay, Disney, Carfax, Edmunds and Cisco are running large production deployments of MongoDB. Companies like The Guardian have committed to prototype all new projects on MongoDB, calling it the “MongoDB First” philosophy. If you work at a large engineering company, it’s likely that some new projects for social communications, advanced analytics products, content management or archiving could use a MongoDB backend. With the right expertise, you can position yourself to lead the project.

The Resources Are There for You!

MongoDB has matured, and so have the resources for learning how to use the database. The docs, mailing lists and user forums are all at least three years old and are available in a number of languages. Additionally, there are community-developed resources for getting started, including the Little MongoDB book. Here are some more materials for getting started with MongoDB:

Online Education Courses: 10gen launched online education classes in November 2012, and has been adding new courses every few months. 10gen’s seven-week classes will help you learn the basics of data modeling, application design and operations with MongoDB. The next set of courses for MongoDB and Java will begin on May 13 and MongoDB for Developers will begin June 17.

Training: 10gen provides 2-3 day training for Developers and DBAs. These courses offer a deep dive into MongoDB. 10gen offers training regularly in New York, Palo Alto and London, and offers training in other cities in the United States and Europe. This is ideal for those interested in getting started on a new MongoDB project right away.

Webinars: If you’re chained to your desk all day, try attending an introductory webinar. At 10gen we host at least one webinar a week. These offer an in-depth, technical overview of a specific topic, and you’ll always get slides and video after.

Conferences: Full-day conferences are an excellent way to get a good overview of a particular technology and its ecosystem. Not only will you leave with practical knowledge on how to get started, but you’ll also get to hear from production users who have valuable experience in onboarding development teams and in designing and scaling applications. Check out 10gen’s conference schedule for the rest of 2013.