100x Faster Facets and Counts with MongoDB Atlas Search: Public Preview
Today we’ve released one of the most powerful features of Atlas Search in public preview, and ready for your evaluation: lightning fast facets and counts over large data sets. Faceted search allows users to filter and quickly navigate search results by categories and see the total number of results per category for at-a-glance statistics. With the new facet operator , facet and count operations are pushed down into Atlas Search’s embedded Lucene index and processed locally – taking advantage of 20+ years of Lucene optimizations – before returning the faceted result set back to the application. What this means is that now facet-heavy workloads such as ecommerce product catalogs, content libraries, and counts run up to 100x faster . The power of facets and counts in full-text search Faceting is a popular search and analytics capability that allows an application to group information into related categories by applying filters to query results. Users can narrow their search results by simply selecting a facet value as a filter criteria. They can intuitively explore complex data sets, providing fast and convenient navigation to quickly drill into the data that is of most interest. A common use of faceting is navigating product catalogs. With travel starting to reopen, let's take a travel site as an example. By using faceted search, the site can present vacation options by destination region, trip type (i.e. hotel, self-catering, beach, ski, city break), price band, season, and more, enabling users to quickly navigate to the category that is most relevant to them. Facets also enable fast results counting. Extending our travel site example, business analysts can use facets to quickly compare sales statistics by counting the number of trips sold by region and season. Prior to the new facet operator, the only way Atlas Search could facet and count data was to retrieve the entire result set back to MongoDB’s internal $facet aggregation pipeline stage . While that was OK for smaller data sets, it became slow when the result set exceeded tens of thousands of documents. This all changes as now operations are pushed down to Atlas Search’s embedded and optimized Lucene library in a single $search pipeline stage. From our internal testing of a collection with one million documents, the new Atlas Search faceting improves performance by 100x. How to use faceting in Atlas Search Our new Atlas Search facets tutorial will help you get started. It describes how to: Create an index with a facet definition on string, date, and numeric fields in the sample_mflix.movies collection. Then run an Atlas Search query against those fields for results grouped by values for the string field and by ranges for the date and numeric fields, including the count for each of those groups. To use Atlas Search facets, you must be running your Atlas cluster on MongoDB 4.4.11 and above or MongoDB 5.0.4 and above. These clusters must be running on the M10 tier or higher. Facets and counts currently work on non-sharded collections. Support for sharded collections is scheduled for next year. The power of Atlas Search in a unified application data platform in the cloud MongoDB Atlas Search makes it easy to build fast, relevant full-text search on top of your data in the cloud. A couple of API calls or clicks in the Atlas UI, and you instantly expose your data to sophisticated search experiences that boost engagement and improve satisfaction with your applications. Your data is immediately more discoverable, usable, and valuable. By embedding the Apache Lucene library directly alongside your database, data is automatically synchronized with the search index; developers get to work with a single API; there is no separate system to run and pay for; and everything is fully-managed for you on any cloud you choose. Figure 1: Rather than bolting-on a separate search engine to your database, Atlas Search provides a fully integrated platform. Atlas Search provides the power you get with Lucene — including faceted navigation, autocomplete, fuzzy search, built-in analyzers, highlighting, custom scoring, and synonyms — combining it with the productivity you get fromMongoDB. As a result, developers can ship search applications and new features 30%+ faster. Next steps You can try out Atlas Search with the public preview of lightning-fast facets and counts today: If you are new to Atlas Search, simply spin up a cluster (M10 tier or above) and get started with our Atlas Search facets tutorial . If you are already using Atlas Search on M10 tiers and above then update your indexes to use the facet field mapping , and then start querying ! Your data remains searchable while it is being re-indexed. If you want to dig into the use cases you can serve with Atlas Search — along with users who are already taking advantage of it today — download our new Atlas Search whitepaper . Safe Harbor The development, release, and timing of any features or functionality described for our products remains at our sole discretion. This information is merely intended to outline our general product direction and it should not be relied on in making a purchasing decision nor is this a commitment, promise or legal obligation to deliver any material, code, or functionality.
The Top 5 Data Trends Driving Competitive Advantage Today… — and Tomorrow
The latest market research from Cloudflight , a leading analyst firm based in Europe, identified 12 major technology trends for the current year. The trends found a radical shift in cloud adoption and an acceleration toward digital as people, society, the economy, and the environment all responded to the coronavirus pandemic. During a recent webinar , Dr. Stefan Ried (Cloudflight) and Mat Keep (MongoDB) shared key industry insights and explored in detail five of the most prevalent trends. The session found that, as the need for technological innovation grows, a company’s competitive advantage is increasingly tied to how well it can build software around its most important asset: data. In this post, Dr. Stefan Ried breaks down those five key trends and analyzes how businesses can drive data innovation to stay ahead of the field. Mat Keep then offers practical next steps to get started as data is increasingly managed in the cloud. Trend 1 Data becomes the differentiator — even beyond software Initially, many startups disrupted the incumbents in their industries with innovation based on software. All the while, non-digital-native enterprises caught up. Now data has become more important than software algorithms. Here’s an example: Imagine a traditional automotive company. The business could purchase components and software from a supplier to implement autonomous driving in its cars, but without enough learning data out of every region its cars wouldn’t drive reliably. In this case — and many more — the automotive firm cannot just buy a software competitive advantage off the shelf. Instead, it must build that advantage — and build it using data. It’s why data is quickly becoming the differentiator in all industries and why delivering a modern customer experience is increasingly reliant on this underlying infrastructure. Software Stack Eruption (Source: Cloudflight 2020) The above image illustrates just how the tech stack is evolving. Data quality is quickly becoming the outstanding differentiator compared to software algorithms. That’s why we consider the access, ownership, and quality of data to be the mountain of innovation in this decade and moving forward. Trend 2 Europe embraces various cloud scenarios Cloud adoption in Europe has always been behind that of the United States. One reason is obvious data sovereignty and compliance concerns. It would be an intriguing thought experiment to reflect on how the U.S. public cloud adoption would have developed over the past 10 years if the only strong and innovative providers were European or even Chinese companies. Europe, however, is now at an important inflection point. Global hyperscalers finally addressed these national privacy issues. Platform service providers, including MongoDB with MongoDB Atlas , have significantly increased support for these privacy requirements with technical features such as client-side-encryption and operational SLAs. This achievement enables enterprises and even public government agencies across Europe to embrace all three basic types of cloud scenarios. Lift and shift , moving existing legacy workloads without any change to new IaaS landscapes in the cloud. Modernization and decomposing existing application stacks into cloud-native services such as a DBaaS. Modernized workloads could leverage the public cloud PaaS stacks much better than monolithic legacy stacks. The new development of cloud-native applications and building modern applications with less code and more orchestration of many PaaS services. Trend 3 Hybrid-cloud is the dominant cloud choice and multicloud will come next Nearly 50 percent of participants in our recent webinar said hybrid-cloud is their current major deployment model. These organizations use different public and private clouds for different workloads. Just 20 percent of the attendees still restrict activities to a single cloud provider. Another equally sized group claimed the exact opposite approach to multicloud environments,where a single workload may use a mixture of cloud sources or may be developed on different providers to reach multiple regions. See below. Embracing the Cloud webinar poll results (June 2021) The increasing adoption of these real multicloud scenarios is yet another major trend we will see for many years. Less experienced customers may be afraid of the complexity of using multiple cloud providers, but independent vendors offer the management of a full-service domain across multiple providers. MongoDB Atlas offers this platform across AWS, Azure, and GCP, and paves the road for real multicloud adoption and innovation. Trend 4 Cloud-native is taking off with innovative enterprises In many client engagements, Cloudflight sees a strong correlation between new business models driven by digital products and cloud-native architectures. Real innovation happens when differentiated business logic meets the orchestration of a PaaS offering. That’s why car OEMs do not employ packaged asset-life-cycle-management systems but instead develop their own digital twins for the emerging fleet of millions of digitized vehicles. These PaaS architectures follow an API-first and service-oriented paradigm leveraging a lot of open-source software. Most of this open-source software is commercially managed by hyperscalers and their partner vendors to make it accessible and highly available without deep knowledge of the service itself. The approach provides very fast productive operations of new digital products. If compliance requires it, however, customers may operate the same open-source services on their own again. Once your product becomes extremely successful and you’re dealing with data volume far beyond one petabyte, you may also reconsider self-operations for cost reasons. This is because there is no operational lock-in for a specific service provider and you may become an “operations pro” on your own. Trend 5 Digital twins become cloud drivers in many industries Many people still connect the term “cloud computing” to virtualized compute-and-storage services. Yet cloud computing is far more. PaaS levels became increasingly attractive with prepackaged cloud-native services. It has been on the market for many years, but the perception and adoption — especially in Europe — is still behind its potential. Based on today’s PaaS services, cloud providers and their partners are already extending their offers to higher levels. The space of digital twins along with AI are clear opportunities here. There are offerings for each of the three major areas of digital twins: In modern automated manufacturing (industry 4.0), production twins are created when a product is ordered and they make production-relevant information (such as individual configurations) available to all manufacturing steps along the supply chain. Once the final product is delivered, the requirements for interactions and data models change significantly for these post-production-life-cycle twins . Production, post-production and simulation-twin (Source: Cloudflight ) Finally, simulation twins are a smart approach to test machine learning applications. Take, for example, the autonomous driving challenge: Instead of testing the ongoing iterations of driving “knowledge” on a physical vehicle, running virtual simulation twins is much preferred and safer than experiments in real traffic situations. Beyond manufacturing and automotive, there are many verticals in which digital twins make sense. Health care is a clear and obvious example in which real-life experiments may not always be the best approach. Success here depends mostly on the cooperation between technology vendors and the industry-specific digital twin ecosystems . In Summary Each of the five trends discussed center on or closely relate to cloud-native data management. A traditional database may be able to run for specific purposes on cloud infrastructure, but only a modern cloud-native application data platform is able to serve both the migration of legacy applications and the development of multiple new cloud-native applications. Next Steps Where and how can companies get started on a path to using data as a driver of competitive advantage? Mat Keep, Senior Director of Products at MongoDB, takes us through how to best embrace this journey. As companies move to embrace the cloud, they face an important choice. Do they: Lift and shift: move existing applications to run in the cloud on the same architecture and technologies used on premises. Transform (modernize): rearchitect applications to take advantage of new cloud-native capabilities such as elasticity, redundancy, global distribution, and managed services. Lift and shift is often seen as an easier and more predictable path since it reuses a lot of the technology you use on premises — albeit now running in the cloud — presenting both the lowest business risk and least internal cultural and organizational resistance. It can be the right path in some circumstances, but we need to define what those circumstances are. For your most critical applications, lift and shift rarely helps you move the business forward. You will be unable to fully exploit new cloud-native capabilities that enable your business to build, test, and adapt faster. The reality we all face is that every application is different, so there is no simple or single “right” answer to choosing lift and shift versus transformation. In some cases, lift and shift can be the right first step, helping your teams gain familiarity with operating in the cloud before embarking on a fuller transformation as they see everything the cloud has to offer. This can also be a risk, however, if your teams believe they are done with the cloud journey and don’t then progress beyond that first step. To help business and technology leaders make the right decisions as they embrace the cloud, we have created an Executive Perspective for Lift and Shift Versus Transformation . The perspective presents best practices that can help prioritize your efforts and mobilize your teams. By working with more than 25,000 customers, including more than 50 percent of the Fortune 100, the paper shares the evaluation frameworks we have built that can be used to navigate the right path for your business, along with the cultural transformations your teams need to make along the way. Embracing the Cloud: Assessment Framework Toyota Material Handling in Northern Europe has recently undergone its own cloud journey. As the team evolved its offerings for industry 4.0, it worked with MongoDB as part of its transformation. Moving from monolithic applications and aging relational databases running on premises to microservices deployed on a multicloud platform, the company completed its migration in just four months. It reduced costs by more than 60 percent while delivering an agile, resilient platform to power its smart factory business growth. To learn more about cloud trends and the role of data in your cloud journey, tune in to the on-demand webinar replay .
Client-Side Field Level Encryption is now on Azure and Google Cloud
We’re excited to announce expanded key management support for Client-Side Field Level Encryption (FLE). Initially released last year with Amazon’s Key Management Service (KMS), native support for Azure Key Vault and Google Cloud KMS is now available in beta with support for our C#/.Net, Java, and Python drivers. More drivers will be added in the coming months. Client-Side FLE provides amongst the strongest levels of data privacy available today. By expanding our native KMS support, it is even easier for organizations to further enhance the privacy and security of sensitive and regulated workloads with multi-cloud support across ~80 geographic regions. My databases are already encrypted. What can I do with Client-Side Field Level Encryption? What makes Client-Side FLE different from other database encryption approaches is that the process is totally separated from the database server. Encryption and decryption is instead handled exclusively within the MongoDB drivers in the client, before sensitive data leaves the application and hits the network. As a result, all encrypted fields sent to the MongoDB server – whether they are resident in memory, in system logs, at-rest in storage, and in backups – are rendered as ciphertext. Neither the server nor any administrators managing the database or cloud infrastructure staff have access to the encryption keys. Unless the attacker has a compromised DBA password, privileged network access, AND a stolen client encryption key, the data remains protected, securing it against sophisticated exploits. MongoDB’s Client-Side FLE complements existing network and storage encryption to protect the most highly classified, sensitive fields of your records without: Developers needing to write additional, highly complex encryption logic application-side Compromising your ability to query encrypted data Significantly impacting database performance By securing data with Client-Side FLE you can move to managed services in the cloud with greater confidence. This is because the database only works with encrypted fields, and you control the encryption keys, rather than having the database provider manage the keys for you. This additional layer of security enforces an even finer-grained separation of duties between those who use the database and those who administer and manage the database. You can also more easily comply with “right to erasure” mandates in modern privacy legislation such as the GDPR and the CCPA . When a user invokes their right to erasure, you simply destroy the associated field encryption key and the user’s Personally Identifiable Information (PII) is rendered unreadable and irrecoverable to anyone. Client-Side FLE Implementation Client-Side FLE is highly flexible. You can selectively encrypt individual fields within a document, multiple fields within the document, or the entire document. Each field can be optionally secured with its own key and decrypted seamlessly on the client. To check-out how Client-Side FLE works, take a look at this handy animation. Client-Side FLE uses standard NIST FIPS-certified encryption primitives including AES at the 256-bit security level, in authenticated CBC mode: AEAD AES-256-CBC encryption algorithm with HMAC-SHA-512 MAC. Data encryption keys are protected by strong symmetric encryption with standard wrapping Key Encryption Keys, which can be natively integrated with external key management services backed by FIPS 140-2 validated Hardware Security Modules (HSMs). Initially this was with Amazon’s KMS, and now with Azure Key Vault and Google Cloud KMS in beta. Alternatively, you can use remote secure web services to consume an external key or a secrets manager such as Hashicorp Vault. Getting Started To learn more, download our Guide to Client-Side FLE . The Guide will provide you an overview of how Client-Side FLE is implemented, use-cases for it, and how it complements existing encryption mechanisms to protect your most sensitive data. Review the Client-Side FLE key management documentation for more details on how to configure your chosen KMS. Safe Harbor The development, release, and timing of any features or functionality described for our products remains at our sole discretion. This information is merely intended to outline our general product direction and it should not be relied on in making a purchasing decision nor is this a commitment, promise or legal obligation to deliver any material, code, or functionality.