Securing Multi-Cloud Applications with MongoDB Atlas
The rise of multi-cloud applications offers more versatility and flexibility for teams and users alike. Developers can leverage the strengths of different cloud providers, such as more availability in certain regions, improved resilience and availability, and more diverse features for use cases such as machine learning or events. As organizations transition to a public, multi-cloud environment, however, they also need to adjust their mindset and workflows — especially where it concerns security. Using multiple cloud providers requires teams to understand different security policies, and take extra steps to avoid potential breaches. In this article, we’ll examine three security challenges associated with multi-cloud applications, and explore how MongoDB Atlas can help you mitigate or reduce the risks posed by these challenges. Challenge 1: More clouds, more procedures, more complexity Security protocols, such as authentication, authorization, and encryption, vary between cloud providers. And, as time goes on, cloud providers will continue to update their features to stay current with the market and remain competitive, adding more potential complications to multi-cloud environments. Although there are broad similarities between AWS, Azure, and GCP, there are also many subtle differences. AWS Identity and Access Management (IAM) is built around root accounts and identities, such as users and roles. Root accounts are basically administrators with unlimited access to resources, services, and billing. Users represent credentials for humans or applications that interact with AWS, whereas roles serve as temporary access permissions that can be assumed by users as needed. In contrast, Azure and GCP use role-based access control (RBAC) and implement it in different ways. Azure Active Directory allows administrators to nest different groups of users within one another, forming a hierarchy of sorts — and making it easier to assign permissions. 
GCP, by contrast, uses roles, which include both preset and customizable permissions (e.g., editor or viewer), and scopes, or permissions that are allotted to a specific identity concerning a certain resource or project. For example, one identity could be a read-only viewer on one project but an editor on another. Given these differences, keeping track of security permissions across various cloud providers can be tricky. As a result, teams may fail to grant access to key clients in a timely manner or accidentally authorize the wrong users, causing delays or even security breaches.

Challenge 2: Contributing factors

Security doesn’t exist in a vacuum, and some factors (organizational and otherwise) can complicate the work of security teams. For example, time constraints can make it harder to implement or adhere to security policies. Turnover can also create security concerns, including lost knowledge (e.g., a team may lose its AWS expert) or stolen credentials. To avoid the latter, organizations must immediately revoke access privileges for departing employees and promptly grant credentials to incoming ones. However, one study found that 50% of companies took three days or longer to revoke system access for departing employees, while 72% of companies took one week or longer to grant access to new employees.

Challenge 3: Misconfigurations and human error

According to the Verizon Data Breach Investigations Report, nearly 13% of breaches involved human error, primarily misconfigured cloud storage. Overall, the Verizon team found that the human element (which includes phishing and stolen credentials) was responsible for 82% of security incidents. Because misconfigurations are such common mistakes, they feature prominently in data breaches. For example, AWS governs permissions and resources through JSON files called policies. However, unless you’re an expert in AWS IAM, it’s hard to understand what a policy might really mean.
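To make that concrete, here is a minimal Python sketch of how a single added line can turn a read-only policy into one that also allows writes. The bucket name `example-bucket`, the helper `non_read_actions`, and the `READ_PREFIXES` heuristic are all assumptions made for illustration; the heuristic is not an official AWS classification of actions, and a real review would rely on the provider's own policy analysis tooling.

```python
import copy
import json

# Hypothetical read-only policy for a hypothetical bucket named "example-bucket".
READ_ONLY_POLICY = json.loads("""
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": ["arn:aws:s3:::example-bucket", "arn:aws:s3:::example-bucket/*"]
    }
  ]
}
""")

# Action-name prefixes this sketch treats as read-only (an assumption for
# illustration, not an official AWS list).
READ_PREFIXES = ("Get", "List", "Describe", "Head")


def non_read_actions(policy):
    """Return the sorted Allow actions whose names do not look read-only."""
    found = set()
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be a bare object
        statements = [statements]
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):  # a single action may be a bare string
            actions = [actions]
        for action in actions:
            name = action.split(":", 1)[-1]  # "s3:GetObject" -> "GetObject"
            if not name.startswith(READ_PREFIXES):
                found.add(action)
    return sorted(found)


# One extra line in the Action list is enough to change what the policy means.
altered = copy.deepcopy(READ_ONLY_POLICY)
altered["Statement"][0]["Action"].append("s3:PutObject")

print(non_read_actions(READ_ONLY_POLICY))  # []
print(non_read_actions(altered))           # ['s3:PutObject']
```

A check like this only inspects action names; it says nothing about who the policy is attached to or whether a resource is publicly reachable, which is part of why such policies are hard to audit by eye.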
Figure 1 shows a read-only policy that was accidentally altered to include writes through the addition of a single line of code, thereby inadvertently opening it to the public. That data could be sensitive personally identifiable information (PII); for example, it could be financial data — something that really shouldn’t be modified. Figure 1: Two examples of read-only policies laid out side by side, demonstrating how a single line of code can impact your security. Although the Verizon report concluded that misconfigurations have decreased during the past two years, these mistakes (often AWS S3 buckets improperly configured for public access) have resulted in high-profile leaks worldwide. In one instance, a former AWS engineer created a tool to find and download user data from misconfigured AWS accounts . She gained access to Capital One and more than 100 million customer credentials and credit card applications. The penalties for these vulnerabilities and violations are heavy. For example, the General Data Protection Regulation (GDPR) enacts a penalty of up to four percent of an organization’s worldwide revenue or €20,000,000 — whichever is larger. In the aftermath of the security event, Capital One was fined $80 million by regulators ; other incidents have resulted in fines ranging from $35 million to $700 million . Where does MongoDB Atlas come in? MongoDB Atlas is secure by default, which means minimal configuration is required, and it’s verified by leading global and regional certifications and assurances. These assurances include critical industry standards, such as ISO 27001 for information security, HIPAA for protected healthcare information, PCI-DSS for payment card transactions, and more . By abstracting away the details of policies, roles, and other protocols, Atlas centralizes and simplifies multi-cloud security controls. 
Atlas provides a regional selection option to control data residency, default virtual private clouds (VPCs) for resource isolation, RBAC for fine-tuning access permissions, and more. These tools support security across an entire environment, meaning you can simply configure them as needed, without worrying about the nuances of each cloud provider. Atlas is also compatible with many of the leading key management services, including Google Cloud KMS, Azure Key Vault, and AWS KMS, enabling users to either bring their own keys or secure their clusters with the software of their choice. Additionally, data is always encrypted in transit and at rest. Beyond that, you can run rich queries on fully encrypted data using Queryable Encryption, which allows you to extract insights without compromising security. Data is only decrypted when the results are returned to the driver, where the key is located; everywhere else, encrypted fields appear as randomized ciphertext. One real-world example involves a 2013 data breach at a supermarket chain in the United Kingdom, where a disgruntled employee accessed the personal data of nearly 100,000 employees. If Queryable Encryption had been available and in use at the time, the perpetrator would have downloaded only ciphertext.

With MongoDB Atlas, securing multi-cloud environments is simple and straightforward. Teams can use a single, streamlined interface to manage their security needs. There is no need to balance different security procedures and structures or keep track of separate tools for each hyperscaler or key management system. Enjoy a streamlined, secure multi-cloud experience: sign up for a free MongoDB Atlas cluster today.
Demystifying Sharding with MongoDB
Sharding is a critical part of modern databases, yet it is also one of the most complex and least understood. At MongoDB World 2022, sharding software engineer Sanika Phanse presented Demystifying Sharding in MongoDB, a brief but comprehensive overview of the mechanics behind sharding. Read on to learn about why sharding is necessary, how it is executed, and how you can optimize the sharding process for faster queries.

Watch this deep-dive presentation on the ins and outs of sharding, featuring MongoDB sharding software engineer Sanika Phanse.

What is sharding, and how does it work?

In MongoDB Atlas, sharding is a way to horizontally scale storage and workloads in the face of increased demand by splitting them across multiple machines. In contrast, vertical scaling requires more powerful hardware, for example, in the form of larger servers or added components like CPUs or RAM. Once you’ve hit the capacity of what your servers can support, sharding becomes your solution. Past a certain point, vertical scaling requires teams to spend significantly more time and money to keep pace with demand. Sharding, however, spreads data and traffic across your servers, so it’s not subject to the same physical limitations. Theoretically, sharding could enable you to scale infinitely, but, in practice, you are scaling proportionally to the number of servers you add. Each additional shard increases both storage and throughput, so your servers can simultaneously store more data and process more queries.

How do you distribute data and workloads across shards?

At a high level, sharding data storage is straightforward. First, a user must specify a shard key, or a subset of fields to partition their data by. Then, data is migrated across shards by a background process called the balancer, which ensures that each shard contains roughly the same amount of data. Once you specify what your shard key will be, the balancer will do the rest.
A common form of distribution is ranged sharding, which assigns data to various shards through a range of shard keys. Using this approach, one shard will contain all the data with shard keys ranging from 0-99, the next will contain 100-199, and so forth. In theory, sharding workloads is also simple. For example, if you receive 1,000 queries per second on a single server, sharding your workload across two servers would divide the number of queries per second equally, so that each server receives 500 queries per second. However, these ideal conditions aren’t always attainable, because workloads aren’t always evenly distributed across shards. Imagine a group of 50,000 students, whose grades are split between two shards. If half of them decide to check their grades, and all of their records happen to fall in the same shard key range, then all the data they request will be stored on the same shard. As a result, all the traffic will be routed to one shard server. Note that both of these examples are highly simplified; real-world situations are not as neat. Shards won’t always contain a balanced range of shard keys, because data might not be evenly divided across shards. Additionally, 50,000 student records, while a large number, is still far too small a data set to need a sharded cluster.

How do you map and query sharded data?

Without an elegant solution, users may encounter latency or failed queries when they try to retrieve sharded data. The challenge is to tie together all your shards, so it feels like you’re communicating with one database, rather than several. This solution starts with the config server, which holds metadata describing the sharded cluster, as well as the most up-to-date routing table, which maps shard keys to shard connection strings. To increase efficiency, routers regularly contact the config server to create a cached copy of this routing table.
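The ranged-sharding idea and the routing table described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how MongoDB implements routing: the shard names, the `ROUTING_TABLE` layout, and the helpers `route` and `load_per_shard` are hypothetical, and the last range is treated as open-ended for simplicity.

```python
from bisect import bisect_right
from collections import Counter

# Toy routing table for ranged sharding: (inclusive lower bound, shard name).
# Keys 0-99 map to shard-a, 100-199 to shard-b, and 200 and above to shard-c.
ROUTING_TABLE = [(0, "shard-a"), (100, "shard-b"), (200, "shard-c")]
LOWER_BOUNDS = [lower for lower, _ in ROUTING_TABLE]


def route(shard_key):
    """Return the name of the shard whose key range contains shard_key."""
    if shard_key < LOWER_BOUNDS[0]:
        raise ValueError(f"shard key {shard_key} is below the lowest range")
    # bisect_right finds the first lower bound greater than the key;
    # the range that owns the key is the one just before it.
    index = bisect_right(LOWER_BOUNDS, shard_key) - 1
    return ROUTING_TABLE[index][1]


def load_per_shard(shard_keys):
    """Count how many requests each shard receives for the given keys."""
    return Counter(route(key) for key in shard_keys)


# Requests spread evenly over the key space are shared evenly by the shards.
even_load = load_per_shard(range(0, 300))

# Requests that all fall in one key range all land on a single shard.
skewed_load = load_per_shard(range(0, 100))

print(dict(even_load))    # {'shard-a': 100, 'shard-b': 100, 'shard-c': 100}
print(dict(skewed_load))  # {'shard-a': 100}
```

The second result is the student-grades situation in miniature: adding shards does not help if every request targets keys that live in the same range.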
Nonetheless, at any given point in time, the config server’s version of the routing table can be considered the single source of truth. To query sharded data, your application sends your command to the team of routers. After a router picks up the command, it will then use the shard key from the command’s query, in conjunction with its cached copy of the routing table, to direct the query to the correct location. Rather than using the entire document, the user will only select one field (or combination of fields) to serve as the shard key. Then, the query will make its way to the correct shard, execute the command, update, and return a successful result to the router. Operations aren’t always so simple, especially when queries do not specify shard keys. In this case, the router realizes that it is unaware of where your data exists. Thus, it sends the query to all the shards, and then it waits to gather all the responses before returning to the application. Although this specific query is slow if you have many shards, it might not pose a problem if this query is infrequent or uncommon. How do you optimize shards for faster queries? Shard keys are critical for seamless operations. When selecting a shard key, use a field that matches on all (or most) of your data and has a high cardinality. This step ensures granularity among shard key values, which allows the data to be distributed evenly across shards. Additionally, your data can be resharded as needed, to fit changing requirements or to improve efficiency. Users can also accelerate queries with thoughtful planning and preparation, such as optimizing their data structures for the most common, business-critical query patterns. For example, if your workload makes lots of age-based queries and few _ID-based queries, then it might make sense to sort data by age to ensure more targeted queries. Hospitals are good examples, as they pose unique challenges. 
Assuming that the hospital’s patient documents would contain fields such as insurance, _ID value, and first and last names, which of these values would make sense as a shard key? Patient name is one possibility, but it is not unique, as many people might have the same name. Similarly, insurance can be eliminated, because there are only a handful of insurance providers, and people might not even have insurance. This key would violate both the high-cardinality principle, as well as the requirement that every document has this value filled. The best candidate for shard key would be the patient ID number or _ID value. After all, if one patient visits, that does not indicate whether another patient will (or will not) visit. As a result, the uniqueness of the _ID value will be very useful, as it will enable users to make targeted queries to the one document that is relevant to the patient. Faced with repeating values, users can also create compound shard keys instead. By including hyphenated versions of multiple fields, such as _ID value, patient names, and providers, a compound shard key can help reduce query bottlenecks and latency. Ultimately, sharding is a valuable tool for any developer, as well as a cost-effective way to scale out your database capacity. Although it may seem complicated in practice, sharding (and working effectively with sharded data) can be very intuitive with MongoDB. To learn more about sharding — and to see how you can set it up in your own environment — contact the MongoDB Professional Services team today.
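The reasoning in the hospital example can be checked with a small Python sketch that scores candidate shard keys on the two properties discussed above: whether every document has a value for the key, and how many distinct values the key takes. The sample documents, the field names (`first_name`, `last_name`, `insurance`), and the helper `key_stats` are made up for illustration, and `_id` here stands for the field written as _ID in the text.

```python
# Made-up patient documents; two patients share a name and one has no insurance.
PATIENTS = [
    {"_id": 1, "first_name": "Ana", "last_name": "Silva", "insurance": "Acme Health"},
    {"_id": 2, "first_name": "Ana", "last_name": "Silva", "insurance": "Acme Health"},
    {"_id": 3, "first_name": "Ben", "last_name": "Okafor", "insurance": None},
    {"_id": 4, "first_name": "Chloe", "last_name": "Tran", "insurance": "Beta Care"},
]


def key_stats(docs, fields):
    """Return (coverage, cardinality) for a candidate shard key.

    coverage: fraction of documents with a non-null value for every field.
    cardinality: number of distinct value combinations among those documents.
    """
    values = [tuple(doc.get(field) for field in fields) for doc in docs]
    complete = [value for value in values if all(part is not None for part in value)]
    coverage = len(complete) / len(docs) if docs else 0.0
    return coverage, len(set(complete))


for candidate in (["insurance"], ["first_name", "last_name"], ["_id"], ["last_name", "_id"]):
    coverage, cardinality = key_stats(PATIENTS, candidate)
    print(candidate, f"coverage={coverage:.2f}", f"cardinality={cardinality}")
```

On this tiny sample, `insurance` falls short on both counts, the name fields repeat, and `_id` (alone or as part of a compound key) is present and distinct in every document, which matches the conclusion in the text. Real data sets would need the same check at scale, along with a look at the query patterns the key has to serve.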
Top 10 MongoDB Blogs of 2022
2022 was a year of many milestones at MongoDB — from global events to enhancements to the MongoDB Atlas developer data platform , and everything in between. Our most popular blog posts reflected these achievements, covering announcements, events, and more, from MongoDB 6.0 to Queryable Encryption to our annual Partner of the Year awards. Read on for a roundup of all our top blog posts from 2022. 10. Recognizing MongoDB's Partners of the Year Partners are a big part of MongoDB’s success, helping our customers modernize their infrastructure with MongoDB, integrating MongoDB into their existing systems, and even selling Atlas on their marketplaces. Other partners also help adapt the MongoDB developer data platform to various sectors and niches, such as streaming data, ecommerce, and financial services. Read Working Together: MongoDB’s Partner of the Year Awards to learn about a few of these organizations. 9. Syncing data with cluster-to-cluster sync Data sync and backups are a key part of the Atlas developer data platform. Until recently, however, there was no easy way to sync data across clusters, especially if they were in different environments — think syncing a hybrid cluster with a cloud-native one. This blog introduces cluster-to-cluster sync, which enables one-way synchronization between any two MongoDB clusters, regardless of type. The sync process is very flexible, includes real-time monitoring, and controls to pause, resume, or reverse synchronization at any time. Read Keeping Data in Sync Anywhere with Cluster-to-Cluster Sync . 8. How MongoDB 6.0 improves time series data Utilized in everything from IoT devices to ecommerce, time series data is an important use case for many industries. 
Since their release in MongoDB 5.0, time series collections have been continuously improved to help developers tackle a wider range of problems, including columnar compression for smaller storage footprints, densification for filling data gaps, and enhanced indexing on time series collections for better read performance. Read MongoDB Announces New Time Series Capabilities Coming For MongoDB 6.0 . 7. Reducing complexity with Apollo GraphQL and MongoDB Atlas As a layer that unifies your cloud data, APIs, and services into a single, accessible schema (known as a graph), GraphQL brings a streamlined, monolithic approach to operations while retaining the benefits of a microservices architecture. This post discusses how the scalability, reliability, and usability of the MongoDB Atlas developer data platform make it an ideal data layer for any tech environment that utilizes a graph. Read Building a Modern App Stack with Apollo GraphQL and MongoDB Atlas . 6. Improvements to change streams Change streams enable applications to recognize data changes throughout their environment, simplifying the creation of event-driven applications and abilities like real-time personalization, notifications, and more. This post explains the upgrades to change streams in depth, including the ability to retrieve the before and after states of documents to address more use cases; support for data definition language (DDL) operations (such as creating or dropping indexes with a single command) to simplify database management; and filtering notifications from aggregation frameworks for improved performance. Read Change Streams in MongoDB 6.0 — Support Pre- and Post-Image Retrieval, DDL Operations, and more . 5. Breaking biases and getting more women into tech leadership Even as representation in tech improves, only 26.7% of technologists are women — who leave the sector at a rate 45% higher than their male counterparts. 
For International Women’s Day, MongoDB senior product designer Ksenia Samokhvalova sat down with executives for a panel discussion on the root causes of (and potential solutions for) this issue. They discussed the differences between inclusivity and diversity, the hurdles for retention, how bias begins at a young age, and the importance of mentorship. Read Breaking the Bias: How Can We Get More Women into Top-Level Tech . 4. Highlights from MongoDB World 2022 After three years, MongoDB World returned to New York City’s Javits Center from June 7-9 for three days of exploration, inspiration, and innovation. Readers were eager to get the scoop on everything they missed, from new product launches to workshops. Day 1 kicked off the conference with keynotes announcing the latest features and outlining the concept of a developer data platform, an integrated set of data and application services with a unified developer experience. Day 2 included over 80 breakout sessions on topics ranging from diversity to data modeling to building with Rust. The last day featured a keynote from renowned inventor Ray Kurzweil, interactive competitions and challenges at the Builder’s Fest, and the “Chaos Presentation” from MongoDB CTO Mark Porter. Read our recaps for day 1 , day 2 , and day 3 of MongoDB World. If you are interested in watching full sessions from MongoDB World, check out our playlist on YouTube . 3. Introducing MongoDB's Prometheus monitoring integration A popular open source monitoring platform, Prometheus features a flexible query language (PromQL), a versatile data model that supports time series data, customizable alerting, a large, active user community, and consistent updates. In this post, learn how the integration enables Prometheus to collect hardware and monitoring metrics from MongoDB and display them directly in the Prometheus UI or via Grafana dashboards. 
You can simplify monitoring with the Prometheus integration for MongoDB, removing the need to toggle between interfaces and keeping all your metrics in one place. Read Introducing MongoDB’s Prometheus Monitoring Integration . 2. Queryable Encryption For years, Queryable Encryption (QE) existed only as a theory: What if users could query fully-encrypted data, and only have to decrypt it once the results were returned? Given that data has traditionally been encrypted at rest or in transit — but not during the querying process — a feature like QE would add an additional layer of protection and remove a known vulnerability. We were happy to see that readers were as excited as we were at the release of Queryable Encryption in preview. This announcement was only possible after years of research and partnerships with outside experts from Brown University, the University of Chicago, and a leading organization in the field. For more background on the evolution of Queryable Encryption, check out Wired’s article, A Long Awaited Defense Against Data Leaks May Have Just Arrived . Now, you can run fast, rich queries on encrypted data at scale, keeping it secure throughout its lifecycle. Queryable Encryption also helps speed up app development because it is easy to use and set up, is compatible with MongoDB drivers, and supports strong key management and cryptography. Read MongoDB Releases Queryable Encryption Preview . 1. 7 reasons to upgrade to MongoDB 6.0 The release of MongoDB 6.0 was big news this year. It brought improvements and new features in areas, like security, change streams, time series collections, operations, and more — making the developer data platform even easier to run, scale, and build with. As with other releases, MongoDB 6.0 removes data silos, eliminates complexity, and frees up teams to spend less time troubleshooting custom architectures — and more time creating apps and products. 
Some highlights were the inclusion of Atlas Search facets to easily filter results, the creation of initial sync (via file copy) to quickly catch up new or slow nodes in your replica sets, and the addition of new operators to aggregation frameworks for faster analysis and deeper insights. Read 7 Big Reasons to Upgrade to MongoDB 6.0 . We hope you had a great 2022, and that you enjoyed attending our events, reading our blogs, and using the MongoDB Atlas developer data platform. As always, you can sign up for a free (forever) cluster on Atlas.
MongoDB highlights from AWS re:Invent 2022
Since its inception a decade ago, AWS re:Invent has become one of the preeminent conferences for the global cloud community — and a venue for inspiration, exploration, and innovation. This year, MongoDB attended and hosted talks, workshops, and sessions; met with customers and partners; and connected with developers and potential and current customers in the expo hall. MongoDB is the AWS Marketplace Partner of the Year - EMEA MongoDB was awarded the AWS Marketplace Partner of the Year for the Europe, Middle East, and Africa (EMEA) region. Since 2021, MongoDB Atlas on AWS has grown by 173 percent among EMEA users, a result of a deeper collaboration with AWS and increased attention to the customer experience. For instance, we’ve simplified the purchase and pricing of MongoDB on AWS; partnered with AWS Marketplace Vendor Insights for increased security, compliance, and confidence; helped customers accelerate their migrations to MongoDB on AWS; and more. For more details, read “ MongoDB and AWS: How a decade-old collaboration got even better in 2022 .” MongoDB customers were everywhere at AWS re:Invent Many MongoDB customers trust AWS for their cloud computing needs, making AWS re:Invent an ideal opportunity to better understand customer needs and use cases, strengthen relationships, and plan for the new year. We were happy to see how many MongoDB customers were mentioned in AWS CEO Adam Selipsky’s keynote address . In between debuting new AWS features and capabilities, Selipsky also mentioned many AWS and MongoDB customers, including Intuit , Okta , Palo Alto Networks , Expedia , and Epic Games . In fact, two-thirds of the brands mentioned across all four keynotes use and trust MongoDB. Additionally, joint MongoDB and AWS partners Vercel and BigID , as well as customers TEG/Ticketek and Midland Credit Management , were featured on the Voice of the Customer series . The videos should be posted on the Amazon Partner Network channel soon . 
MongoDB customer and partner live streams from re:Invent From fashion startups to telecommunications providers, MongoDB customers span a wide range of industries, sizes, and business models. To help them share their diverse experiences, we also live-streamed conversations with leaders from four innovative MongoDB customers and partners: Okta/Auth0 , VEERUM , Alloy Automation , and Vercel . For our first livestream, MongoDB developer relations lead Shane McAllister sat down with Okta VP of engineering Andrew Yu for a conversation on how Okta became the preferred identity provider of tech teams across a range of industries and sectors. MongoDB developer relations lead Shane McAllister discusses the meteoric rise of Okta with Okta VP of engineering Andrew Yu. Next, MongoDB senior developer advocate Jesse Hall spoke with VEERUM CTO Rob Southon about their unique “digital twin” technology that allows remote site visits, enabling VEERUM customers to reduce time, money, and environmental impact. MongoDB senior developer advocate Jesse Hall and VEERUM CTO Rob Southon discuss VEERUM's innovative asset management model and how it reduces costs and carbon footprint. Afterward, Gregg Mojica, co-founder of Alloy Automation , spoke to MongoDB principal developer advocate Mike Lynn about his tech journey, including Alloy’s participation in the MongoDB for Startups program , overcoming the challenges of the COVID-19 pandemic, and the successful completion of a Series A funding round. MongoDB principal developer advocate Michael Lynn talks to Alloy Automation co-founder Gregg Mojica about Alloy’s journey. Shane McAllister also sat down with Vercel CEO Guillermo Rauch for a discussion about the MongoDB integration in the Vercel Marketplace , new announcements for Next.js 13 , and the latest World Cup win for Argentina ⚽️. MongoDB developer relations lead Shane McAllister speaks with Vercel CEO Guillermo Rauch on all the new Vercel announcements at AWS re:Invent 2022. 
MongoDB executives on prime time

MongoDB CEO Dev Ittycheria and CISO Lena Smart were featured on theCube by SiliconANGLE, a leading tech news site. Ittycheria discussed some of the current data trends, including the link between productivity and innovation, how consolidating tech stacks accelerates the release cycle, and how the MongoDB developer data platform empowers teams to make decisions faster and shorten time to market.

MongoDB CEO Dev Ittycheria sits down with interviewers from SiliconANGLE’s theCube talk show to talk about today’s rapidly evolving data landscape.

In her interview on theCube, Smart covered a variety of subjects, including new security features like Queryable Encryption, how to turn security from an obstacle to an opportunity, and the evolution of MongoDB. She also compared and contrasted emerging security challenges (from AI to quantum computing) with past crises like Y2K.

In this episode of SiliconANGLE’s theCube, MongoDB CISO Lena Smart reflects on past crises, the development of MongoDB, and how to turn security from obstacle to opportunity.

Lastly, SVP of product management Andrew Davidson spoke with Patrick Moorhead and David Newman from Futurum Research on The Six Five on the Road at AWS re:Invent 2022 show. Davidson began with an explanation of the data landscape, defining and discussing transactional data before explaining how MongoDB’s innovative document data model empowers developers with its flexibility and ease of use. MongoDB was named as a leader in The Forrester Wave™: Translytical Data Platforms report for 2022. By bridging the gap between transactional and analytical data, translytical data enables teams to build smarter apps, get faster business insights, increase innovation, and outpace competitors.
Security and "The Rise of the Developer Data Platform" In CISO Lena Smart’s re:Invent fireside chat with MongoDB’s Karen Huaulme, Principal Developer Advocate; Andrew Davidson, SVP, Products; and Krista Braun, Executive Keynote Advisor; Smart discussed how security innovations played an important role in the growth of the MongoDB Atlas developer data platform , an integrated set of data and application services that share a unified developer experience. MongoDB CISO Lena Smart presents “The Rise of the Developer Data Platform,” before joining MongoDB employees Andrew Davidson, Krista Braun, and Karen Huaulme for a panel discussion on the evolution of MongoDB. Making the most out of your data On each day of AWS re:Invent, MongoDB hosted lightning talks at our in-booth theater, covering a variety of practical topics such as frontend development, real-time analytics, and more. In his two daily sessions, executive solutions architect Sigfrido “Sig” Narváez discussed different ways to maximize the value of your data. In the first workshop, Narváez and Ralph Capasso, director of engineering for MongoDB Data Lake , used real (and fictional) open source data from a Blue Origin rocket launch to demonstrate how the Atlas developer data platform can streamline tech stacks and provide real-time analytics and visualizations to boost customer engagement. Check out the rocket-analytics GitHub repo for more information. In his next workshop, Narváez discussed how to tap into data locked away in relational databases by migrating to the Atlas developer data platform. In his demo, Narváez covered several key competencies, including transforming data with the MongoDB Relational Migrator , invoking a GraphQL endpoint with Postman , and using the Realm SDK to build a mobile app enabled with cloud sync. Visit the liberate-data GitHub repo for more information, including a complete Postman collection to import into your environment. 
MongoDB executive solutions architect Sigfrido “Sig” Narváez presents a workshop at the in-booth theater at AWS re:Invent 2022. Demystifying the edge Following up on a MongoDB World presentation, “ Building Your First Edge Computing App with MongoDB Atlas Device Sync, Realm, & Verizon 5G Edge ,” MongoDB solutions architect and Realm specialist Mark Brown put together a practical, step-by-step tutorial on how developers can use LTE and 5G networks to bypass the physical fiber optic infrastructure of the internet, and deliver speedy, seamless service. Although Brown’s re:Invent sessions were not recorded, you can access the workshop modules for a self-guided walkthrough. Read the Mobile Edge Computing: Realizing the Benefits of 5G with MongoDB and Verizon 5G Edge white paper and blog series for more information. MongoDB solutions architect (and Realm specialist) Mark Brown hosts his workshop on bypassing physical internet infrastructure using LTE and 5G networks and edge applications. The evolution of a data-driven application In her workshop “ 10 Things You Didn’t Know Your Data Could Do for You ,” MongoDB principal developer advocate Karen Huaulme shared her experience in creating a data-driven application. Read “ Streamline, Simplify, Accelerate: New MongoDB Features Reduce Complexity ” to learn more. MongoDB principal developer advocate Karen Huaulme shares the struggles, rewards, and lessons learned from building a data-driven application. Transitioning from relational to NoSQL MongoDB developer relations director Rick Houlihan shared cultural and operational aspects of switching from relational to NoSQL. Watch the recording of his talk, " From RDBMS to NoSQL ,” or read " Relational to NoSQL at Enterprise Scale .” MongoDB developer relations director Rick Houlihan talks about the cultural shift from relational to NoSQL at AWS re:Invent 2022. If you attended AWS re:Invent 2022, we hope you had a good time, and that we’ll see you there next year. 
In the meantime, you can run MongoDB Atlas on AWS — just head to the AWS Marketplace to get started. Sign up for a free trial to test out all the features and abilities that you’ve heard so much about.
MongoDB at AWS re:Invent: Workshops, Talks, Parties, and More
Join us at AWS re:Invent 2022 in Las Vegas. At our re:Invent booth and in our sessions, we'll show how MongoDB Atlas on AWS lets you build applications that are highly available, performant at global scale, and compliant with the most demanding security and privacy standards. MongoDB Atlas on AWS also provides the convenience of consolidated billing and simplified procurement through your AWS account. Learn how developers can use AWS and MongoDB features together to build the next big thing in AI, application modernization, serverless analytics, or any number of use cases.
As the flagship conference of one of the leading cloud providers (and a close MongoDB Partner), AWS re:Invent features more than 1,500 workshops, presentations, and demos, and draws more than 50,000 attendees. AWS re:Invent runs from November 28 through December 2 at six properties on the Las Vegas strip: The Venetian, the Wynn and the Encore at the Wynn, Caesars Palace, MGM Grand, and Mandalay Bay. AWS re:Invent is a great place to experience next-generation products firsthand and to connect with like-minded peers and thought leaders.
Read on to learn what MongoDB has planned for this event, and to plan ahead for your own AWS re:Invent journey.
The MongoDB booth at AWS re:Invent 2021. This year, find us at Booth #1611, located in the Expo Hall at The Venetian.
Meet, learn, and engage at MongoDB locations
For questions about specific use cases and to meet with MongoDB experts, visit Booth #1611, located in the Expo Hall at The Venetian, or check out our after hours events, hosted at The Emerald Lounge at Sugarcane Raw Bar Grill at The Venetian. This year, the MongoDB booth includes interactive demo kiosks showcasing MongoDB Atlas, our fully managed developer data platform, as well as guided workshops on security, mobile app development, and more.
The MongoDB booth will also include a series of lightning talk sessions on a variety of subjects, from data modeling to Queryable Encryption. These lightning talks and tutorials cover specific topics, such as using MongoDB alongside AWS products such as Wavelength, and may also include an interactive component. To attend, head to our in-booth theater, where seats are available on a first-come, first-served basis.
In the evenings, check out our events hosted at Sugarcane. On Tuesday night from 6-8 p.m. PST, MongoDB, Vercel, and PluralSight will host the Gamer’s Paradise Pub Crawl, where you can mingle and play arcade, board, and video games. On Wednesday night at 9 p.m. PST, stop by for our Desert Disco, co-hosted with our partner, Confluent, and featuring DJ Malibu Cathy, top shelf drinks, and food. RSVP now to reserve your spot.
The 2021 party at the Sugarcane Raw Bar Grill. Join us this year for our Gamer’s Paradise Pub Crawl and our Desert Disco.
Listen to MongoDB speakers
MongoDB experts and executives will be featured at AWS re:Invent, hosting breakout sessions on the growing partnership between MongoDB and AWS; the evolving data landscape; how these dynamics affect developers, applications, and users; and the rise of edge computing.
First, CISO Lena Smart will give a talk on “The Rise of the Developer Data Platform,” highlighting the concept of a product ecosystem built around a common API, enabling developers to easily build more reliable, scalable applications, and to drive innovation. Smart will also touch on how this new digital paradigm has affected security, and how the developer data platform serves as a unifying philosophy for all MongoDB features and releases. Afterwards, Smart will join MongoDB Principal Developer Advocate Karen Huaulme, Senior Vice President of Products Andrew Davidson, and Executive Keynote Advisor Krista Braun for a fireside chat and live Q&A on the developer data platform.
In her talk on “10 Things You Didn’t Know Your Data Could Do for You,” Principal Developer Advocate Karen Huaulme will discuss how developers can avoid being overwhelmed by the abundance of data today. She’ll also dig into how to put data to work, whether it’s deriving analytical insights, powering diverse workloads, or developing practical functionality. Huaulme will draw on her extensive background to share common mistakes and teachable moments, so you can avoid the same pitfalls.
Developer Relations Director Rick Houlihan will cover going from RDBMS to NoSQL, introducing NoSQL in a new light: not just as a technology, but as a philosophy. Transitioning from relational to non-relational doesn’t only involve migrations, but also requires a shift in mindset in areas such as data modeling and everyday operations. As the former head of Amazon’s NoSQL Blackbelt team, Houlihan speaks from experience, as he led Amazon’s migration from relational to NoSQL, and played a pivotal role in modeling thousands of production workloads and retraining more than 25,000 developers on this new paradigm.
Alongside AWS team members, Realm Specialist Solutions Architect Mark Brown will deliver a talk on architecting and delivering applications at the edge with AWS hybrid cloud and edge computing services. Brown and his AWS collaborators will explain the unexpected challenges of edge computing and demonstrate possible solutions. Be sure to bring your laptop!
Learn how MongoDB can empower you to build apps on AWS faster and easier
To try MongoDB and AWS products for yourself, deploy and manage Atlas from your AWS environment through AWS Quick Start, as well as through AWS CloudFormation. From there, you can connect a wide range of AWS services with MongoDB tools for any use case.
For instance, you can build serverless, event-driven applications with MongoDB Application Services (formerly known as Realm) and Amazon EventBridge, migrate legacy applications with MongoDB Atlas on AWS, ingest and analyze streaming data with Amazon MSK and MongoDB, and more. For a more detailed list of AWS and MongoDB integrations, check out our Managed MongoDB on AWS resource.
If you’re eager to try out Atlas with AWS today, check out the AWS marketplace. Atlas is available in AWS regions across the world. To learn more about what MongoDB has planned for AWS re:Invent, check out our event web page.
Network, Build, and Learn at MongoDB.local Events — Now Free to Attend
Panel Discussion at MongoDB.local London, 2021
Every year, MongoDB hosts popular MongoDB.local events in major cities around the world. Packed with workshops, talks, and keynotes, these one-day, in-person gatherings bring together engineers, entrepreneurs, and executives from the surrounding area. This year, for the first time, admission to MongoDB.local events is free. (Note that admission is granted on a first-come, first-served basis, limited only by seating capacity.)
Five upcoming events
Five MongoDB.local events are scheduled for the remainder of 2022, and you can register for the .local event near you through the links below or through the MongoDB.local hub page.
Frankfurt, September 27, 2022
San Francisco, October 20, 2022
Dallas, October 27, 2022
London, November 15, 2022
Toronto, December 15, 2022
From sessions on the future of serverless to demos of next-generation technology, here’s what to expect at a MongoDB.local event near you.
Learn from the experts
Whether you attend keynote presentations or participate in customer discussions, you can tap into a wealth of knowledge from people and organizations that are thoroughly familiar with today’s technology landscape. You’ll learn from MongoDB experts, who will share hard-earned knowledge, practical solutions, and technical insight based on firsthand experience with common issues. You can also attend talks from MongoDB customers, which are generally centered around a specific use case and solution, a sort of shared retrospective for the public. At .local Frankfurt, for example, an engineer from Bosch will discuss the company’s evolution from individual documents to time series data in an IoT environment.
All MongoDB.locals include sessions for a wide array of skill levels and specialities, such as a deep dive into the new Queryable Encryption feature or an introduction to building a basic application using Atlas Device Sync and React.
These workshops offer practical, actionable advice that you can implement immediately upon returning to your office.
Expand your professional network
MongoDB.local events also offer many opportunities to expand your personal and professional network. In particular, these gatherings are a great way to connect with members of your local MongoDB User Group, who are likely working with the same technologies (or facing similar challenges) that you are. Whether you’re searching for a new job or business opportunity, looking for tips and techniques to implement in your own environment, or just browsing for inspiration, you’ll likely find what you seek at MongoDB.local.
Explore the latest products
Product booths are another highlight of MongoDB.local events. Staffed by MongoDB product teams, these booths are where you can pick up limited edition stickers, discuss the latest developments with expert engineers, and see new MongoDB features in action. Every event also features booths where third-party partners, vendors, and allies demonstrate cutting-edge technology, show how their platforms and services work in tandem with MongoDB, and answer any questions you may have. Stop by these booths to explore the next big thing in data, see how MongoDB can provide new solutions for pressing problems, and come away with helpful, personalized advice for your own challenges.
Enjoy a one-of-a-kind experience
From Frankfurt’s Klassikstadt to London’s Tobacco Dock, MongoDB.locals are held at unique, memorable venues. Step inside refurbished historical sites, such as a former factory turned automobile museum or a shipping wharf converted into a top-tier event space. In addition to a full day of talks and tutorials, attendees can enjoy breakfast, lunch, snacks, and drinks served at MongoDB.locals.
Join us for a day packed with learning and networking opportunities in a venue near you.
Whether you’re a decision-maker or a developer, you’ll find something interesting, enlightening, or useful at MongoDB.local. Learn more about our upcoming MongoDB.local events in Frankfurt, San Francisco, Dallas, London, and Toronto, and register for your free ticket.
Free your data with the MongoDB Relational Migrator
Nothing is more frustrating than data that is just out of reach. Imagine wanting to combine customer behavior data from your CRM and usage data from your legacy product to trigger tailored promotions in your new mobile app, but not being able to locate the required data in the sea of tables in your relational database.
As MongoDB CTO Mark Porter explains in his MongoDB World keynote, the data that can make a difference might be locked up “somewhere that you can’t use.” Relying on his own hard-earned experience with data, Porter adds that this information can be trapped “in a schema with hundreds or thousands of tables that have built up over decades.”
“Schema is a huge part of this problem,” MongoDB product manager Tom Hollander explains during a presentation on MongoDB Relational Migrator at MongoDB World 2022. “So we’ve spent a lot of time building out the tools to enable you to map your tabular relational schema into a document schema and make use of the full power of the MongoDB document model.”
To see MongoDB Relational Migrator in action, check out this introduction and demo from MongoDB World 2022, featuring MongoDB product manager Tom Hollander.
What is MongoDB Relational Migrator?
MongoDB Relational Migrator streamlines migrations from legacy data infrastructure to MongoDB by helping developers analyze relational database schemas, convert them into MongoDB schemas, and then migrate data from the source database to MongoDB. Currently, Relational Migrator is compatible with four of the most common relational databases: Oracle, SQL Server, MySQL, and PostgreSQL. Relational Migrator not only moves data from your relational database to MongoDB, but it also transforms it according to your new schema.
As Hollander and MongoDB product marketing director Eric Holzhauer point out, developers often use a mix of software and tools (e.g., extract-transform-load pipelines, change data capture (CDC), message queues, and streaming) to execute migrations, which can be complicated, risky, and error-prone. Relational Migrator provides a single tool that can streamline the process while simultaneously ensuring that your data lands in an organized, logical manner.
By simplifying schema translations, one of the most complex and difficult parts of any relational migration, Relational Migrator grants developers and other technical teams a greater degree of control over (and increased visibility into) their new MongoDB schema. The result is data that is more accessible for analysis and decision making. “Now I can get at the data in my program without going through a translation layer,” Porter explains.
A visual representation of how Migrator maps relational schema to document schema.
Migration mode: Snapshot or ongoing?
Migrator provides two modes of data transfer: a one-time snapshot or a continuous sync (which will be available later this year). To help decide which mode you should use, consider whether you can move over to MongoDB and immediately decommission your previous database or whether you need to keep your existing relational database up and running.
Organizations may wish to keep their relational database for various reasons, such as testing the effectiveness of their proposed document schema, running out the contract or licensing agreement to avoid expensive fees, or keeping old databases available for audits. In this situation, continuous sync, once available, will let you keep your relational database running while Relational Migrator continues to push data from your source to your new MongoDB clusters.
The limits of Relational Migrator
As Hollander points out, Relational Migrator is only a tool, one intended to facilitate schema mapping by providing many abilities and options for effective schema design. “It’s not a silver bullet that will immediately modernize your application portfolio,” Hollander says. “It’s not going to do everything for you. You still have to do the planning.”
Furthermore, because database schema is a tricky topic even for seasoned experts, Hollander recommends that developers work with architects, consultants, and partners, especially if they’re not familiar with MongoDB or schema design best practices.
Relational Migrator does not yet support continuous replication, which would enable your relational database and MongoDB clusters to coexist for an extended period of time. However, Hollander says that work on this feature is ongoing and it will be available in the future, along with additional capabilities like schema recommendations, an integration for the MongoDB Atlas developer data platform, and more.
MongoDB Relational Migrator is currently in private preview, for use on non-production workloads with assistance from our Product and Field Engineering teams. To learn more, get in touch with your MongoDB rep or contact us via our Migrator page to discuss your workload and next steps.
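To make the idea of mapping a tabular schema into a document schema more concrete, here is a minimal Python sketch. It is not how Relational Migrator works internally; the table names, columns, and the embed_children helper are all hypothetical and exist only for illustration. It shows one common mapping choice, folding child rows (orders) into their parent rows (customers) as an embedded array.

```python
from collections import defaultdict

# Hypothetical rows, as they might come out of two relational tables.
customers = [
    {"customer_id": 1, "name": "Ada"},
    {"customer_id": 2, "name": "Grace"},
]
orders = [
    {"order_id": 10, "customer_id": 1, "total": 25.0},
    {"order_id": 11, "customer_id": 1, "total": 40.0},
    {"order_id": 12, "customer_id": 2, "total": 15.5},
]


def embed_children(parents, children, key, embed_as):
    """Fold child rows into their parent rows, one document per parent."""
    grouped = defaultdict(list)
    for child in children:
        # Drop the foreign key: the parent document already carries it.
        grouped[child[key]].append({k: v for k, v in child.items() if k != key})
    documents = []
    for parent in parents:
        doc = dict(parent)
        doc[embed_as] = grouped.get(parent[key], [])
        documents.append(doc)
    return documents


customer_docs = embed_children(customers, orders, "customer_id", "orders")
```

Each resulting document carries a customer together with that customer's orders, so an application can read both in a single lookup instead of a join. Whether to embed or to keep a reference is exactly the kind of schema decision that still needs planning.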
3 Reasons (and 2 Ways) to Use MongoDB’s Improved Time Series Collections
Time series data, which reflects measurements taken at regular time intervals, plays a critical role in a wide variety of use cases for a diverse range of industries. For example, park management agencies can use time series data to examine attendance at public parks to better understand peak times and schedule services accordingly. Retail companies, such as Walmart, depend on it to analyze consumer spending patterns down to the minute, to better predict demand and improve shift scheduling, hiring, warehousing, and other logistics.
As more sensors and devices are added to networks, time series data and its associated tools have become more important. In this article, we’ll look at three reasons (and two ways) to use MongoDB time series collections in your stack.
This in-depth introduction to time series data features MongoDB Product Manager Michael Gargiulo.
Reason 1: Purpose-built for the challenges of time series data
At first glance, time series collections resemble other collections within MongoDB, with similar functionalities and usage. Beneath the surface, however, they are specifically designed for storing, sorting, and working with time series data.
For developers, query speed and data accessibility continue to be challenges associated with time series data. Because of how quickly time series data can accumulate, it must be organized and sorted in a logical way to ensure that queries and their associated operations can run smoothly and quickly.
To address this issue, time series collections implement a key tenet of the MongoDB developer data platform: Data that is stored together is accessed together. Documents (the basic building block of MongoDB data) are grouped into buckets, which are organized by time. Each bucket contains time series data from a variety of sources, all of which were gathered from the same time period and all of which are likely to show up in the same queries.
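The bucketing idea can be sketched in a few lines of Python. This is only an illustration of grouping measurements by time period, not MongoDB's internal bucketing logic. The field names (timestamp, station, temp_c), the monthly window, and the choice to key buckets by station as well as by month are all assumptions made for the sketch.

```python
from collections import defaultdict
from datetime import datetime


def bucket_key(doc, time_field="timestamp", meta_field="station"):
    """Key a measurement by its source and by the calendar month it falls in."""
    ts = doc[time_field]
    return (doc[meta_field], ts.year, ts.month)


def group_into_buckets(docs):
    """Group measurements so that readings from the same period sit together."""
    buckets = defaultdict(list)
    # Sorting first keeps each bucket in time order.
    for doc in sorted(docs, key=lambda d: d["timestamp"]):
        buckets[bucket_key(doc)].append(doc)
    return dict(buckets)


# Hypothetical readings, deliberately out of order.
readings = [
    {"timestamp": datetime(1991, 8, 2, 14, 0), "station": "valencia", "temp_c": 32.1},
    {"timestamp": datetime(1991, 7, 30, 14, 0), "station": "valencia", "temp_c": 30.4},
    {"timestamp": datetime(1991, 8, 1, 14, 0), "station": "valencia", "temp_c": 31.5},
]

buckets = group_into_buckets(readings)
august = buckets[("valencia", 1991, 8)]
```

A query for a given month then only needs to touch the matching bucket rather than every individual reading.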
For example, if you are using time series collections to analyze the rise in summer temperatures of Valencia, Spain from 1980 to 2020, then one bucket will contain temperatures for August 1991. Relevant, but distinct buckets (such as temperatures for the months of June and July 1991) would also be stored on the same page for faster, easier access.
MongoDB also lets you create secondary and compound indexes on fields in the bucket (including the timeField, the metaField, and measurement fields) for faster, more flexible queries. Because of the wide variety of indexing options, operations on time series data can be executed much more quickly than with competing products. For example, scan times are reduced by indexing buckets of documents (each of which has a unique identifier) rather than individual documents.
In terms of the previous example, you could create an index on the minimum and maximum average summer temperatures in Valencia, Spain from 1980 to 2020 to more quickly surface necessary data. That way, MongoDB does not have to scan the entire dataset to find min and max values over a period of four decades.
Another concern for developers is finding the last metadata value, which, in other solutions, requires users to scan the entire data set, a time-consuming process. Instead, time series collections use last point queries, where MongoDB simply retrieves the last measurement for each metadata value. As with other fields, users can also create indexes for last points in their data. In our example, you could create an index to identify the end of summer temperatures in Valencia from 1980 to 2020. By indexing the last values, time series collections can drastically reduce query times.
Another recurring challenge for time series applications is data loss from Internet of Things (IoT) applications for industries such as manufacturing, meteorology, and more.
As sensors go offline and gaps in your data appear, it becomes much more difficult to run analytics, which require a continuous, uninterrupted flow of data. As a solution, the MongoDB team created densification and gap filling. Densification, executed by the $densify aggregation stage, creates blank, placeholder documents to fill in any missing timestamps. Users can then sort data by time and run the $fill stage for gap filling. This process will estimate and add in any null or missing values in documents based on existing data. By using these two capabilities in tandem, you will get a steady flow of data to input into aggregation pipelines for insights.
Reason 2: Keep everything in house, in one data platform
Juggling different data tools and platforms can be exhausting. Cramming a bunch of separate products and technologies into a single infrastructure can create complex architectures and require significant operational overhead. Additionally, a third-party time series solution may not be compatible with your existing workflows and may necessitate more workarounds just to keep things running smoothly.
The MongoDB developer data platform brings together several products and features into a single, intuitive ecosystem, so developers can use MongoDB to address many common needs, from time series data to change streams, while reducing time-consuming maintenance and overhead. As a result, users can take advantage of the full range of MongoDB features to collect, analyze, and transform time series data. You can query time series collections through the MongoDB Compass GUI or the MongoDB Shell, utilize familiar MongoDB capabilities like nesting data within documents, secondary indexes, and operators like $lookup or $merge, and process time series data through aggregation pipelines to extract insights and inform decision making.
Reason 3: Logical ways to organize and access time series data
Time series collections are designed to be efficient, effective, and easy to use.
For example, these collections utilize a columnar storage format that is optimized for time series data. This approach improves efficiency across database operations, including queries, input/output, WiredTiger cache usage, and storage footprints for both data and secondary indexes.
Let’s look, for example, at how querying time series data collections works. When a query is executed, two things happen behind the scenes: Bucket unpacking and query rewrites. To begin with, time series collections will automatically unpack buckets, similar to the $unwind stage. MongoDB will decompress the data, sort it, and return it to the format in which it was inserted, so that it is easier for users to read and parse.
Query rewrites work alongside bucket unpacking to ensure efficiency. To avoid unpacking too many documents (which exacts a toll in time and resource usage), query rewrites use indexes on fields such as timestamps to automatically eliminate buckets that fall outside the desired range. For example, if you are searching for average winter temperatures in Valencia, Spain from 1980 to 2020, you can exclude all temperatures from the spring, summer, and fall months.
Now that we’ve examined several reasons to consider MongoDB time series collections, we’ll look at two specific use cases.
Use case 1: Algorithmic trading
Algorithmic trading is a major use case for time series data, and this market is predicted to grow to $15 billion by 2028. The strength of algorithms lies in their speed and automation; they reduce the possibility of mistakes stemming from human emotions or reaction time and allow for trading frequency beyond what a human can manage.
Trading algorithms also generate vast volumes of time series data, which cannot necessarily be deleted, due to compliance and forecasting needs. MongoDB, however, lets you set archival parameters, to automatically move that data into cheaper cloud object storage after a preset interval of time.
This approach preserves your valuable storage space for more recent data.
Using MongoDB products such as Atlas, materialized views, time series collections, and triggers, it is also possible to build a basic trading algorithm. Basically, time series data will be fed into this algorithm, and when the conditions are ideal, the algorithm can buy or sell as needed, thus executing a series of individual trades with cumulative profits and losses (P&L). Although you’ll need a Java app to actually execute the trades, MongoDB can provide a strong foundation on which to build.
The structure of such an algorithm is simple. Time series data is loaded from a live feed into MongoDB Atlas, which will then input it into a materialized view to calculate the averages that will serve as the basis of your trades. You can also add a scheduled trigger to execute when new data arrives, thereby refreshing your materialized views, keeping your algorithm up to date, and not losing out on any buying/selling opportunities.
To learn more, watch Wojciech Witoszynski’s MongoDB World 2022 presentation on building a simple trading algorithm using MongoDB Atlas, “Algorithmic Trading Made Easy.”
Use case 2: IoT
Due to the nature of IoT data, such as frequent sensor readings at fixed times throughout a day, IoT applications are ideally suited for time series collections. For example, Confluent, a leading streaming data provider, uses its platform alongside MongoDB Atlas Device Sync, mobile development services, time series collections, and triggers to gather, organize, and analyze IoT data from edge devices.
IoT apps often feature high volumes of data taken over time from a wide range of physical sensors, which makes it easy to fill in meta fields and take advantage of densification and gap filling features as described above. MongoDB’s developer data platform also addresses many of the challenges associated with IoT use cases.
To begin with, MongoDB is highly scalable, which is an important advantage, given the huge volumes of data generated by IoT devices. Furthermore, MongoDB includes key features to enable you to make the most of your IoT data in real time. These include change streams for identifying database events as they occur, or functions, which can be pre-scheduled or configured to execute instantaneously to respond to database changes and other events.
For users dealing with time-based data, real-time or otherwise, MongoDB’s time series collections offer a seamless, highly optimized way to accelerate operations, remove friction, and use tools, such as triggers, to further analyze and extract value from their data. Additionally, users no longer have to manually bucket, query, or otherwise troubleshoot time series data; instead, MongoDB does all that work for them.
Try MongoDB time series collections for free in MongoDB Atlas.
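As a rough illustration of the densification and gap filling steps discussed earlier in this article, the Python sketch below first writes out the two aggregation stages as plain pipeline documents, then mimics their effect on a small in-memory list. The field names (timestamp, temp_c), the hourly step, and the helper functions densify and fill_linear are assumptions for illustration only; inside MongoDB the real work is done by the $densify and $fill stages.

```python
from datetime import datetime, timedelta

# The two stages as they might appear in an aggregation pipeline.
# Field names are placeholders chosen for this sketch.
pipeline = [
    {"$densify": {"field": "timestamp",
                  "range": {"step": 1, "unit": "hour", "bounds": "full"}}},
    {"$fill": {"sortBy": {"timestamp": 1},
               "output": {"temp_c": {"method": "linear"}}}},
]


def densify(docs, field, step):
    """Add placeholder documents for each missing timestamp between first and last."""
    by_time = {d[field]: d for d in docs}
    current, end = min(by_time), max(by_time)
    result = []
    while current <= end:
        result.append(dict(by_time.get(current, {field: current})))
        current += step
    return result


def fill_linear(docs, time_field, value_field):
    """Fill missing values by linear interpolation between known neighbours."""
    known = [i for i, d in enumerate(docs) if d.get(value_field) is not None]
    for left, right in zip(known, known[1:]):
        t0, t1 = docs[left][time_field], docs[right][time_field]
        v0, v1 = docs[left][value_field], docs[right][value_field]
        for i in range(left + 1, right):
            fraction = (docs[i][time_field] - t0) / (t1 - t0)
            docs[i][value_field] = v0 + (v1 - v0) * fraction
    return docs


# A sensor that went offline for two hours.
readings = [
    {"timestamp": datetime(2022, 7, 1, 10), "temp_c": 20.0},
    {"timestamp": datetime(2022, 7, 1, 13), "temp_c": 26.0},
]

dense = densify(readings, "timestamp", timedelta(hours=1))
filled = fill_linear(dense, "timestamp", "temp_c")
```

After both steps, the series has one document per hour with interpolated temperatures for the two missing hours, which is the kind of continuous input that downstream analytics expect.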
MongoDB 6.0 Now Available!
MongoDB 6.0 is now available for download. This major release introduces improvements to existing features as well as new products to empower you to build faster, troubleshoot less, and cut out complexity from your workflows.
Continuing with the theme of the developer data platform concept introduced at MongoDB World 2022, MongoDB 6.0’s new and enhanced abilities help remove the need for outside platforms in your tech stacks or application architectures. This means less time managing fundamentally incompatible solutions and more time building applications and solutions.
MongoDB 6.0 includes several feature upgrades, more integrations, support for a diverse range of scenarios, and much more. For instance, time series collections and change streams can now be used for additional use cases, such as geo-indexing or finding the before and after states of documents, respectively.
Additionally, MongoDB 6.0 includes exciting new releases for security, analytics, search, and more. One innovative new capability is Queryable Encryption, a first-of-its-kind technology that allows you to efficiently query data even as it remains encrypted, only decrypting it when it’s made available to the user.
To learn more about MongoDB 6.0, read “7 Big Reasons to Upgrade to MongoDB 6.0,” or visit the MongoDB 6.0 homepage to upgrade now.
7 Big Reasons to Upgrade to MongoDB 6.0
First announced at MongoDB World 2022, MongoDB 6.0 is now generally available and ready for download. MongoDB 6.0 includes the capabilities introduced with the previous 5.1–5.3 Rapid Releases and debuts new abilities to help you address more use cases, improve operational resilience at scale, and secure and protect your data.
The common theme in MongoDB 6.0 is simplification: Rather than forcing you to turn to external software or third-party tools, these new MongoDB capabilities allow you to develop, iterate, test, and release applications more rapidly. The latest release helps developers avoid data silos, confusing architectures, wasted time on integrating external tech, missed SLAs and other opportunities, and the need for custom work (such as pipelines for exporting data).
Here’s what to expect in MongoDB 6.0.
1. Even more support for working with time series data
Used in everything from financial services to e-commerce, time series data is critical for modern applications. Properly collected, processed, and analyzed, time series data provides a gold mine of insights, from user growth to promising areas of revenue, helping you grow your business and improve your application.
First introduced in MongoDB 5.0, time series collections provide a way to handle these workloads without resorting to adding a niche technology and the resulting complexity. In addition, it was critical to overcome obstacles unique to time series data, such as high volume, storage and cost considerations, and gaps in data continuity (caused by sensor outages).
Since their introduction, time series collections have been continuously updated and improved with a string of rapid releases.
We began by introducing sharding for time series collections (5.1) to better distribute data, before rolling out columnar compression (5.2) to improve storage footprints, and finally moving on to densification and gap-filling (5.3) for allowing teams to run time series analytics, even when there are missing data points.
As of 6.0, time series collections now include secondary and compound indexes on measurements, improving read performance and opening up new use cases like geo-indexing. By attaching geographic information to time series data, developers can enrich and broaden analysis to include scenarios involving distance and location. This could take the form of tracking temperature fluctuations in refrigerated delivery vehicles during a hot summer day or monitoring the fuel consumption of cargo vessels on specific routes.
We’ve also improved query performance and sort operations. For example, MongoDB can now easily return the last data point in a series, rather than scanning the whole collection, for faster reads. You can also use clustered and secondary indexes to efficiently perform sort operations on time and metadata fields.
2. A better way to build event-driven architectures
With the advent of applications like Seamless or Uber, users have come to expect real-time, event-driven experiences, such as activity feeds, notifications, or recommendation engines. But moving at the speed of the real world is not easy, as your application must quickly identify and act on changes in your data.
Introduced in MongoDB 3.6, change streams provide an API to stream any changes to a MongoDB database, cluster, or collection, without the high overhead that comes from having to poll your entire system. This way, your application can automatically react, generating an in-app message notifying you that your delivery has left the warehouse or creating a pipeline to index new logs as they are generated.
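To show what reacting to a change stream can look like on the application side, here is a small Python sketch of an event handler. It does not connect to a database. The sample events are hand-written dictionaries that follow the general shape of change events (operationType, documentKey, fullDocument, updateDescription), and the orders scenario and the describe_change function are hypothetical.

```python
def describe_change(event):
    """Turn a change event document into a short notification message."""
    op = event["operationType"]
    key = event.get("documentKey")
    if key is None:
        # Some events (for example, dropping a collection) have no document key.
        return f"Collection-level event: {op}"
    doc_id = key["_id"]
    if op == "insert":
        return f"Order {doc_id} created with status {event['fullDocument']['status']}"
    if op == "update":
        changed = event["updateDescription"]["updatedFields"]
        details = ", ".join(f"{k}={v}" for k, v in sorted(changed.items()))
        return f"Order {doc_id} updated: {details}"
    if op == "delete":
        return f"Order {doc_id} deleted"
    return f"Order {doc_id}: unhandled operation {op}"


# Hand-written sample events; with a driver such as PyMongo, events of this
# shape would instead be read from a change stream cursor (collection.watch()).
sample_events = [
    {"operationType": "insert", "documentKey": {"_id": 42},
     "fullDocument": {"_id": 42, "status": "packed"}},
    {"operationType": "update", "documentKey": {"_id": 42},
     "updateDescription": {"updatedFields": {"status": "shipped"}, "removedFields": []}},
    {"operationType": "drop"},
]

messages = [describe_change(e) for e in sample_events]
```

In a real application, each message would be pushed to the user or to another service as the event arrives, instead of being collected into a list.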
The MongoDB 6.0 release enriches change streams with new abilities. Now, you can get the before and after state of a document that’s changed, enabling you to send updated versions of entire documents downstream, reference deleted documents, and more. Further, change streams now support data definition language (DDL) operations, such as creating or dropping collections and indexes. To learn more, check out our blog post on change streams updates.
3. Deeper insights from enriched queries
MongoDB’s aggregation capabilities allow users to process multiple documents and return computed results. By combining individual operators into aggregation pipelines, you can build complex data processing pipelines to extract the insights you need. MongoDB 6.0 adds capabilities to two key operators, $lookup and $graphLookup, improving joins and graph traversals, respectively. Both $lookup and $graphLookup now provide full support for sharded deployments.
The performance of $lookup has also been upgraded. For instance, if there is an index on the foreign key and a small number of documents have been matched, $lookup can get results between 5 and 10 times faster than before. If a larger number of documents are matched, $lookup will be twice as fast as previous iterations. If there are no indexes available (and the join is for exploratory or ad hoc queries), then $lookup will yield a hundredfold performance improvement.
The introduction of read concern snapshot and the optional atClusterTime parameter enables your applications to execute complex analytical queries against a globally and transactionally consistent snapshot of your live, operational data. Even as data changes beneath you, MongoDB will preserve point-in-time consistency of the query results returned to your users. These point-in-time analytical queries can span multiple shards with large distributed datasets.
By routing these queries to secondaries, you can isolate analytical workloads from transactional queries with both served by the same cluster, avoiding slow, brittle, and expensive ETL to data warehouses. To learn more, visit our documentation.
4. More operators, less work
Boost your productivity with a slate of new operators, which will enable you to push more work to the database while spending less time writing code or manipulating data manually. These new MongoDB operators will automate key commands and long sequences of code, freeing up more developer time to focus on other tasks. For instance, you can easily discover important values in your data set with operators like $maxN, $minN, or $lastN. Additionally, you can use an operator like $sortArray to sort elements in an array directly in your aggregation pipelines.
5. More resilient operations
From the beginning, MongoDB’s replica set design has allowed users to withstand and overcome outages. Initial sync is how a replica set member in MongoDB loads a full copy of data from an existing member, which is critical for catching up nodes that have fallen behind, or when adding new nodes to improve resilience, read scalability, or query latency. MongoDB 6.0 introduces initial sync via file copy, which is up to four times faster than existing methods. This feature is available with MongoDB Enterprise Server.
In addition to the work on initial sync, MongoDB 6.0 introduces major improvements to sharding, the mechanism that enables horizontal scalability. The default chunk size for sharded collections is now 128 MB, meaning fewer chunk migrations and higher efficiency from both a networking perspective and in internal overhead at the query routing layer. A new configureCollectionBalancing command also allows the defragmentation of a collection in order to reduce the impact of the sharding balancer.
6. Additional data security and operational efficiency
MongoDB 6.0 includes new features that eliminate the need to choose between secure data and efficient operations. Since its GA in 2019, client-side field-level encryption (CSFLE) has helped many organizations manage sensitive information with confidence, especially as they migrate more of their application estate into the public cloud. With MongoDB 6.0, CSFLE will include support for any KMIP-compliant key management provider. As a leading industry standard, KMIP streamlines storage, manipulation, and handling for cryptographic objects like encryption keys, certificates, and more.
MongoDB’s support for auditing allows administrators to track system activity for deployments with multiple users, ensuring accountability for actions taken across the database. While it is important that auditors can inspect audit logs to assess activities, the content of an audit log has to be protected from unauthorized parties as it may contain sensitive information. MongoDB 6.0 allows administrators to compress and encrypt audit events before they are written to disk, leveraging their own KMIP-compliant key management system. Encryption of the logs will protect the events' confidentiality and integrity. If the logs propagate through any central log management systems or SIEM, they stay encrypted.
Additionally, Queryable Encryption is now available in preview. Announced at MongoDB World 2022, this pioneering technology enables you to run expressive queries against encrypted data, only decrypting the data when it is made available to the user. This ensures that data remains encrypted throughout its lifecycle, and that rich queries can be run efficiently without having to decrypt the data first. For a deep dive into the inner workings of Queryable Encryption, check out this feature story in Wired.
7. A smoother search experience and seamless data sync
Alongside the 6.0 major release, MongoDB will also make ancillary features available, some generally available and some in preview. The first is Atlas Search facets, which enable fast filtering and counting of results, so that users can easily narrow their searches and navigate to the data they need. Released in preview at MongoDB World 2022, facets will now include support for sharded collections.
Another important new addition is Cluster-to-Cluster Sync, which enables you to effortlessly migrate data to the cloud, spin up dev, test, or analytics environments, and support compliance requirements and audits. Cluster-to-Cluster Sync provides continuous, unidirectional data synchronization of two MongoDB clusters across any environment, be it hybrid, Atlas, on-premises, or edge. You’ll also be able to control and monitor the synchronization process in real time, starting, stopping, resuming, or even reversing the synchronization as needed.
Ultimately, MongoDB 6.0’s new abilities are intended to facilitate development and operations, remove data silos, and eliminate the complexity that accompanies the unnecessary use of separate niche technologies. That means less custom work, troubleshooting, and confusing architectures, and more time brainstorming and building.
MongoDB 6.0 is not an automatic upgrade unless you are using Atlas serverless instances. If you are not an Atlas user, download MongoDB 6.0 directly from the download center. If you are already an Atlas user with a dedicated cluster, take advantage of the latest, most advanced version of MongoDB. Here’s how to upgrade your clusters to MongoDB 6.0.
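As a rough illustration of the Atlas Search facets described in this section, the Python sketch below assembles a `$searchMeta` stage that counts matching results per bucket. The `facet` collector with `string` and `number` facet types is part of the Atlas Search query syntax. The index name, the field names (`description`, `brand`, `price`), and the price boundaries are hypothetical, and the sketch assumes the search index already maps those fields for faceting.

```python
from typing import Any


def facet_stage(query: str, index: str = "default") -> dict[str, Any]:
    """Build a $searchMeta stage that returns facet counts for a text query."""
    return {
        "$searchMeta": {
            "index": index,
            "facet": {
                # The operator selects which documents are counted.
                "operator": {"text": {"query": query, "path": "description"}},
                "facets": {
                    # One bucket per distinct brand, capped at 10 buckets.
                    "brandFacet": {"type": "string", "path": "brand", "numBuckets": 10},
                    # Numeric ranges; values outside the boundaries land in "other".
                    "priceFacet": {
                        "type": "number",
                        "path": "price",
                        "boundaries": [0, 50, 100, 500],
                        "default": "other",
                    },
                },
            },
        }
    }


# The pipeline would be passed to an aggregate call on an Atlas cluster.
pipeline = [facet_stage("running shoes")]
```

A storefront could render the returned bucket counts as the filter sidebar next to the search results.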
The Developer Data Platform: Highlights from MongoDB World 2022 Keynotes
MongoDB World 2022 is the first in-person MongoDB conference in nearly three years, offering us an opportunity to announce new releases and outline the future of MongoDB. During three World keynotes on June 7, the company’s leaders discussed our vision for the company and our products — and how they form a developer data platform, a family of tools and services built around a common API to help developers reduce complexity, improve their experience, achieve operational excellence, and run deep analytics. The inspiration for this concept originated from the desire to empower developers to build and scale applications faster, thus transforming their organizations and businesses. As Dev Ittycheria has discovered over the course of his eight years as CEO, “No customer has complained about innovating too quickly.” “What they have complained about — and what they struggle with — is increasing their pace of innovation,” Ittycheria says. “Invariably, the thing that holds them back is their legacy, brittle, inflexible architecture and infrastructure.” Why developers? From the beginning, MongoDB was built by — and for — developers, a category that includes anyone who creates or works with applications, as well as those who lead them. “Every product we build, every feature we develop — is all geared towards developer productivity,” Ittycheria says. “The obvious question,” Ittycheria continues, “is how do you make developers insanely fast and productive?” Given that developers spend so much time troubleshooting data, the answer lay in removing the friction inherent to this process. That’s why MongoDB was built on the document model, which maps data to objects in code — transforming the way developers organized and interacted with data. We believed in the potential of the document model so strongly that we built our entire product family around it, streamlining the developer data experience and facilitating all data-related tasks and products, from search to analytics. 
Additionally, the world continues to digitize, a trend that was only accelerated by the COVID-19 pandemic and the ensuing lockdowns. “There will be 750 million new digital apps by 2025,” Ittycheria says, citing a study from analyst firm IDC. CTO Mark Porter agrees. “There will be more applications built over the next four years than were built in the first 40 years,” he says. “The pace of innovation is increasing, and that means developer productivity is essential.” To get ahead of these trends, Ittycheria says, MongoDB is doubling down on research and development — as well as empowering innovators to create, transform, and disrupt industries by unleashing the power of software and data. The struggles of a developer The root causes of many developer difficulties can be summed up in two parts: an obsolete, decades-old technology (the relational data model) and the complications that arise from its fundamental mismatch with modern applications. “Relational databases were not scalable,” Porter says, recalling his time as a developer. “No matter how hard I tried, we couldn’t make them available, and no matter what we did, we couldn’t make SQL and RMS easy to use.” In essence, the limitations of relational databases are becoming very clear, Ittycheria adds. “They’re too rigid, too inflexible, too cumbersome, and just don’t scale.” As a result, “there’s been a proliferation of niche databases — which are focused on some small point solution — to compensate.” In fact, these narrow, specialized products (such as key-value or in-memory databases) often add cost and complexity. Combining these disparate products into a single architecture can impede innovation by siloing data, fragmenting application infrastructure, and further confusing workflows. This also creates a training gap — slowing down developers as they spend valuable time learning the ins and outs of each product. A typical data architecture, with a number of specialized databases adding complexity. 
A better way to work with data
“We obsess about helping you get from an idea to a global reality,” says Sahir Azam, MongoDB’s chief product officer. The result of that obsession is MongoDB Atlas, our developer data platform, which reflects it in three key ways.
First, MongoDB offers an elegant developer experience. By getting the data, plumbing, and complexity out of the way, MongoDB enables users to “focus on innovating and building the differentiation for their companies and ideas,” Azam says. As a result, developers no longer have to create or run unwieldy, bespoke architectures for each new product or application.
Next, Atlas enables broad workload support, providing, in Azam’s words, “most, if not all, of the capabilities you need for demanding modern applications,” whether they’re operational, analytical, or transactional. This includes abilities like application search, data lake, and aggregation pipelines, to name a few.
Lastly, Atlas is resilient, scalable, stable, and secure, “so you can take an idea from a single geography to serving customers worldwide,” Azam says. When combined with the ease of use and versatility of the document model, the Atlas product family presents a uniquely valuable proposition for many developers.
In order to build the future, developers need a mission-critical foundation. “Applications have always needed a solid foundation, from silicon to chips,” Porter says. If “someone at the lower level misses a configuration file, someone at the lower level messes something up, and everything comes crashing down.” Ultimately, the strength of MongoDB is that it frees up developers to play to their strengths: building new products and applications, not wrangling existing components. By providing documents and a flexible schema, high availability and scalability, and seamless partner integration, MongoDB becomes the mission-critical foundation for developers to build upon.
“Just a database isn’t enough,” Porter says. For you to succeed, “there’s an actual, existential need to have this foundation. And we call it our developer data platform.” How far we've come Today, MongoDB is the world’s most popular data platform for building modern applications, Ittycheria says. The numbers back up this statement, with over 265 million downloads of MongoDB’s Community Edition, upwards of 150,000 new Atlas registrations per month, and more downloads in the past twelve months than in the first 12 years of MongoDB’s existence. Further, MongoDB has greatly expanded its global reach. From a humble beginning of four regions in AWS, MongoDB Atlas now runs in 95+ regions worldwide in AWS, Google Cloud, and Azure. MongoDB has also partnered with other cloud providers around the world. MongoDB’s core mission remains the same, even as our user base has expanded to 35,000+ customers across every industry and use case, as well as 100+ nations. MongoDB continues to simplify the developer experience, streamline the release process, speed up innovation, and help organizations ship faster. “Every week, we see new ideas spring up across the globe,” Azam says, many of which are powered by MongoDB. These organizations, which range from small startups to large corporations, include a digital-only challenger bank in Vietnam, a startup providing simulation training for Norwegian healthcare professionals, and a nonprofit that deals with surplus food from restaurants across Mexico. A serverless, mission-critical foundation MongoDB’s goal is to make Atlas the data platform for developers, empowering them to build the applications of the future. To achieve this objective, MongoDB is going serverless. “Modern development, in many ways, has been a constant search for higher levels of abstraction,” Azam points out, which removes complexity, and enables developers to move faster, differentiate, and pivot as needed. 
By going serverless, Atlas will minimize operational overhead down to almost zero, shifting the burden of servers, data centers, and provisioning away from developers. Further, Azam points out that many existing serverless databases “pose some significant limitations.” For instance, one popular type of serverless database is the key-value store, an ultra-simple database that cannot sustain complex workloads — and forces developers to add more databases in order to support additional application functionality. Instead, Atlas serverless combines all the best characteristics of serverless with the complete MongoDB experience — including the versatility of the rich document model, transactional guarantees, rich aggregations, and much more. This way, “we can support the full breadth of use cases you’re used to building on our platform,” Azam says. Unlike other serverless products, Atlas serverless instances also offer a competitive pricing model. Currently, “most serverless databases force a hard trade-off” when it comes to scaling, Azam says, requiring users to either deal with cold start delays when ramping up their serverless databases from zero, or pay extra (and pre-provision capacity) in order to scale quickly up from zero. In contrast, Atlas serverless enables users to “scale down to minimal usage and instantly scale up as your application needs — without any pre-committed capacity,” Azam says. Coupled with competitive pricing, flexibility for development and deployment, and instant scaling, Atlas serverless instances bring all of the advantages of serverless — without any of the downsides. What MongoDB can do for developers In essence, MongoDB will enable users to do their best work in four key ways. Reduce complexity Complicated application architectures, alongside an abundance of point solutions, force developers to spend more time and effort on operational “plumbing,” distracting them from their core mission of transformation through innovation. 
Using the MongoDB Atlas developer data platform, developers can, in Azam’s words, “remove complexity and the need for more niche databases in your architecture.” Relevant features include MongoDB Atlas Search, for a purpose-built search solution, and Atlas Device Sync, for ensuring data consistency between edge, cloud, and backend. Read our blog on reducing complexity to learn more.
Provide a better developer experience
“If you remove the friction from working with data,” Ittycheria says, “you make developers insanely productive.” An elegant developer experience “makes lives so much easier.” This is achieved through superior tooling and integration between MongoDB features, such as Atlas serverless instances, which abstract away considerations like provisioning and scaling, or the Atlas CLI, which packs the power and functionality of a GUI into the simplicity of a command line. Read our blog on the developer experience gap to learn more.
Application analytics
As businesses continue to digitize, their need to collect information for real-time analytics has only grown. To address this need, Atlas has added real-time application analytics abilities into its unified platform, Azam says. This means supporting analytical queries (and not just transactions), as well as making this data easily available for deep analysis and strategic decision making. This category includes Atlas Charts for rich data visualizations, and the Atlas SQL Interface for connecting third-party SQL-based analytics tools to Atlas. Read our blog on new analytics features to learn more.
Operational excellence
“We do this all with a strong foundation of resiliency, security, and scale,” Azam says. This means automating core operational processes to deploy and run global data infrastructure, plus simplifying complex procedures such as data secrecy, migrations, and cross-environment sync.
Related features include the Atlas Operator for Kubernetes, which allows developers to deploy, scale, and manage Atlas clusters using Kubernetes, or our pioneering Queryable Encryption, a cryptographically secure, operationally efficient solution for working with sensitive data. Read our blog on new features to improve security and operations to learn more.
Building the future with MongoDB
“But we’re not done yet, and neither are you,” Ittycheria says. “Tomorrow, we will help support newer and more inspiring applications. Just imagine what we’ll do tomorrow.” “We have 150,000 new ideas coming in every month,” Azam says. “I challenge you to think about how to transform your organization, how to take your next big idea to a global reality.” “What I’d like to challenge you to do is to grab your share of those 750 million apps,” Porter says. “Think about how you can change the world, and hopefully do it on our platform.... I am sure that the future is going to be built by you.”
Streamline, Simplify, Accelerate: New MongoDB Features Reduce Complexity
At MongoDB World 2022, we announced several developer-centric features that provide more powerful analytics, streamline operations, and reduce complexity. In this post, we look at MongoDB Atlas Data Federation, MongoDB Atlas Search, MongoDB Atlas Device Sync and its Flexible Sync, and change streams.
As consumer expectations of the applications they use grow, developers must continue to create richer experiences. To do that, many are adding a variety of data systems and components to their architectures, including single-purpose NoSQL datastores, dedicated search engines, and analytics systems. Piecing these disparate systems together adds complexity to workflows, schedules, and processes, however. For instance, one application could utilize a solution for database management, another solution for search functionality, and a third solution for mobile data sync. Even within an organization, teams often use different products to perform the same tasks, such as data analysis.
This way of building modern applications often causes significant problems, such as data silos and overly complex architectures. Additionally, developers are forced to spend extra time and effort to learn how each of these components functions, to ensure they work together, and to maintain them over the long term. It should not be the developer’s job to rationalize all these different technologies in order to build rich application experiences.
The developer data platform
For developers and their teams, cobbling together a data infrastructure from disparate components is inefficient and time-consuming. Providers have little incentive to ensure that their solutions can function alongside the products of their competitors. Further, internal documentation, which is key to demystifying the custom code and shortcuts in a bespoke architecture, might not be available or current, and organizational knowledge gets lost over time.
MongoDB Atlas, our developer data platform, was built to solve these issues. An ecosystem of intuitive, interlinked services, Atlas includes a full array of built-in data tools, all centered around the MongoDB Atlas database. Features are native to MongoDB, work with a common API, are designed for compatibility, and are intended to support any number of use cases or workloads, from transactional to operational, analytics to search, and anything in between. Equally important, Atlas removes the hidden, manual work of running a sprawling architecture, from scaling infrastructure to building integrations between two or more products. With these rote tasks automated or cleared away, developers are free to focus on what they do best: build, iterate, and release new products.
MongoDB Atlas Data Federation
MongoDB Atlas Data Federation allows you to write a single query to work with data across multiple sources, such as your Amazon S3 buckets, Atlas Data Lake, and MongoDB Atlas clusters. Atlas Data Federation is not a separate repository of data, but a service to combine, enrich, and transform data across multiple sources, regardless of origin, and output to your preferred location. With Atlas Data Federation, developers who want to aggregate data or federate queries do not need to use complex data pipelines or time-consuming transformations, a key advantage for those seeking to build real-time app features. Atlas Data Federation also makes it easier to quickly convert MongoDB data into file formats such as columnar Parquet or CSV, so you can facilitate ingestion and processing by downstream teams that are using a variety of different analytics tools.
MongoDB Atlas Search
Rich, responsive search functionality has become table stakes for both consumer-facing and internal applications. But building high-quality search experiences isn’t always easy.
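For the file-format conversion mentioned under Atlas Data Federation above, a hedged sketch in Python: it builds an aggregation pipeline whose final `$out` stage targets S3 in Parquet format, using the `s3` form of `$out` that federated database instances accept. The bucket name, region, file prefix, and the `status` filter are hypothetical, and the helper name `export_pipeline` is invented for this example.

```python
from typing import Any


def export_pipeline(bucket: str, region: str, prefix: str) -> list[dict[str, Any]]:
    """Build a pipeline that filters documents and writes them to S3 as Parquet."""
    return [
        # Hypothetical filter: export only completed orders.
        {"$match": {"status": "complete"}},
        {
            "$out": {
                "s3": {
                    "bucket": bucket,
                    "region": region,
                    "filename": prefix,
                    "format": {"name": "parquet"},
                }
            }
        },
    ]


# This pipeline would be run through a federated database instance connection.
pipeline = export_pipeline("analytics-exports", "us-east-1", "orders/2022-06")
```

Swapping `"parquet"` for `"csv"` in the `format` document is the only change needed to target the other format named in the text.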
Developers who use a third-party, bolt-on search engine to build search experiences have to deal with problems like the need to sync data between multiple systems; more operational overhead for scaling, securing, and provisioning; and using different query interfaces for database and search. Built on the industry-leading Apache Lucene search library, MongoDB Atlas Search is the easiest way to build rich, fast, and relevant search directly into your applications. It compresses three systems (database, search engine, and sync mechanism) into one, so developers don’t have to deal with the problems that bolt-on search engines introduce. It can be enabled with a few API calls or clicks and uses the same query language as the rest of the MongoDB product family.
Atlas Search provides the features developers need to deliver rich, personalized search experiences to users, such as facets, now generally available, which offer users a way to quickly filter and navigate search results. With facets, developers can index data to map fields to categories like brand, size, or cost, and update query results based on relevance. This allows users to easily define multiple search criteria and see results updated in near real-time.
MongoDB Atlas Device Sync
With apps such as TikTok, Instagram, and Spotify, mobile users have come to expect features such as real-time updates, reactive UIs, and an always-on, always-available experience. While the user experience is effortless, building these abilities into a mobile app is anything but. Such features require lots of time and resources to develop, test, debug, and maintain.
MongoDB Atlas Device Sync is designed to help developers address mobile app data challenges, including limited connectivity, dead zones, and multiple collaborators (all with varying internet speeds and access) by gathering, syncing, and resolving any sync conflicts between the mobile database and MongoDB Atlas — without the burden of learning, deploying, and managing separate data technologies. At World 2022, MongoDB announced Flexible Sync, a new way to sync data between devices and the cloud. Using Flexible Sync, developers can now define synced data using language-native queries and fine-grained permissioning, resulting in a faster, more seamless way of working — and one analogous to the way developers code and build. Previously, developers had to sync full partitions of data; Flexible Sync enables synchronization of only the data that’s relevant. With support for filter logic, asymmetric sync, and hierarchical permissioning, Flexible Sync can reduce the amount of required code by 20% or more, and speed up build times from months to weeks. Change Streams Data changes quickly, and your applications need to react just as quickly. When a customer’s order is shipped, for instance, they expect an in-app or email notification — and they expect it immediately. Yet building applications that can respond to events in real time is difficult and often requires the use of polling infrastructure or third-party tools, both of which add to developer overhead. Latency and long reaction times result in data that is outdated, and poor experiences for users of that data. Like Atlas’s Database Triggers, change streams enable developers to build event-driven applications and features that react to data changes as they happen. 
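The Python sketch below shows one way application code might compare the before and after state of a document once a change stream delivers both (a MongoDB 6.0 capability). It operates on a plain dictionary shaped like an update event, reading the `fullDocumentBeforeChange` and `fullDocument` fields. The order data and the `changed_fields` helper are hypothetical, and the sketch assumes pre- and post-images have been enabled for the collection, since otherwise those fields may be missing.

```python
from typing import Any


def changed_fields(event: dict[str, Any]) -> dict[str, tuple]:
    """Map each field that differs to an (old value, new value) pair."""
    # Either image may be absent, for example on inserts or deletes.
    before = event.get("fullDocumentBeforeChange") or {}
    after = event.get("fullDocument") or {}
    changes = {}
    for key in sorted(before.keys() | after.keys()):
        old, new = before.get(key), after.get(key)
        if old != new:
            changes[key] = (old, new)
    return changes


# Hypothetical update event for an order that was shipped and lost its "eta" field.
event = {
    "operationType": "update",
    "documentKey": {"_id": 7},
    "fullDocumentBeforeChange": {"_id": 7, "status": "packed", "eta": "2022-06-10"},
    "fullDocument": {"_id": 7, "status": "shipped"},
}

diff = changed_fields(event)
```

A field that was removed shows up with `None` as its new value, which a notification or audit routine can treat as a deletion.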
Along with reducing the complexity and cost of building this infrastructure from scratch, the new change stream enhancements (available in MongoDB 6.0) will enable you to determine the state of your database before and after an event occurs, so you can act on the changes and build business logic, analytics, and policies around them. That opens up new use cases, such as retrieving a copy of a document immediately after it is updated.
All of these updates and new capabilities focus on the critical need to eliminate complexity in order to build, deploy, and secure modern applications in any environment. Together, they help solve what MongoDB president and CEO Dev Ittycheria called a key developer challenge in his MongoDB World 2022 keynote: reducing the friction and cost of working with data.
Learn more about MongoDB World 2022 announcements at mongodb.com/new and in these stories:
5 New Analytics Features to Accelerate Insights and Automate Decision-Making
4 New MongoDB Features to Improve Security and Operations
Closing the Developer Experience Gap: MongoDB World Announcements