Performance Best Practices: Indexing
Welcome to the third in our series of blog posts covering performance best practices for MongoDB. In this series, we cover key considerations for achieving performance at scale across a number of important dimensions, including data modeling and sizing memory (the working set); query patterns and profiling; indexing, which we'll cover today; sharding; transactions and read/write concerns; hardware and OS configuration; and benchmarking. Having both worked for a variety of database vendors over the past 15 years, we can safely say that failing to define the right indexes is the single biggest performance problem that technical support teams have to address with users. So we need to get it right, and here are the best practices to help you.
Indexes in MongoDB
In any database, indexes support the efficient execution of queries. Without them, the database must scan every document in a collection or table to select those that match the query statement. If an appropriate index exists for a query, the database can use the index to limit the number of documents it must inspect. MongoDB offers a broad range of index types and features with language-specific sort orders to support complex access patterns to your data. MongoDB indexes can be created and dropped on demand to accommodate evolving application requirements and query patterns, and can be declared on any field within your documents, including fields nested within arrays. So let's look at how you can make the most of indexes in MongoDB.
Use Compound Indexes
Compound indexes are indexes composed of several different fields.
For example, instead of having one index on “Last name” and another on “First name”, it is typically most efficient to create an index that includes both “Last name” and “First name” if you query against both names. Our compound index can still be used to filter queries that specify only the last name.
Follow the ESR Rule
For compound indexes, this rule of thumb is helpful in deciding the order of fields in the index: first, add the fields against which Equality queries are run; the next fields to be indexed should reflect the Sort order of the query; the last fields represent the Range of data to be accessed.
Use Covered Queries When Possible
Covered queries return results directly from an index without having to access the source documents, and are therefore very efficient. For a query to be covered, all the fields needed for filtering, sorting, and/or being returned to the client must be present in an index. To determine whether a query is covered, use the explain() method. If the explain() output displays totalDocsExamined as 0 and the winning plan contains no FETCH stage, the query is covered by an index. Read more in the documentation on explain results. A common gotcha when trying to achieve covered queries is that the _id field is always returned by default. You need to either explicitly exclude it from the query results or add it to the index. In a sharded cluster, MongoDB internally needs to access the fields of the shard key. This means covered queries are only possible when the shard key is part of the index. It is usually a good idea to do this anyway.
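To make the ESR rule and the covered-query conditions concrete, here is a minimal Python sketch. It does not talk to MongoDB; it only models the reasoning. The key list uses the same list of (field, direction) pairs that PyMongo's `create_index` accepts, while the field names (`lastName`, `firstName`, `age`) and the helpers `esr_index_keys` and `is_covered` are hypothetical.

```python
ASCENDING, DESCENDING = 1, -1


def esr_index_keys(equality, sort, range_fields):
    """Order compound-index fields as Equality, then Sort, then Range.

    `sort` is a list of (field, direction) pairs, like a sort specification;
    equality and range fields are indexed ascending. A field that appears in
    more than one group keeps its first (highest-priority) position.
    """
    candidates = (
        [(f, ASCENDING) for f in equality]
        + list(sort)
        + [(f, ASCENDING) for f in range_fields]
    )
    keys, seen = [], set()
    for field, direction in candidates:
        if field not in seen:
            seen.add(field)
            keys.append((field, direction))
    return keys


def is_covered(index_keys, filter_fields, sort_fields, projection, shard_key=()):
    """Simplified check: could this index alone answer the query?

    `projection` is an inclusion projection such as {"lastName": 1, "_id": 0}.
    _id is returned unless explicitly set to 0, so it then has to be indexed
    too. In a sharded cluster, pass the shard key fields as `shard_key`.
    """
    included = {f for f, v in projection.items() if v and f != "_id"}
    if not included:
        # No inclusion fields means whole documents are returned: not covered.
        return False
    returned = included | ({"_id"} if projection.get("_id", 1) else set())
    needed = set(filter_fields) | set(sort_fields) | returned | set(shard_key)
    indexed = {field for field, _ in index_keys}
    return needed <= indexed


keys = esr_index_keys(["lastName"], [("firstName", ASCENDING)], ["age"])
print(keys)  # [('lastName', 1), ('firstName', 1), ('age', 1)]

# Filter on lastName (equality) and age (range), sort on firstName.
print(is_covered(keys, ["lastName", "age"], ["firstName"],
                 {"lastName": 1, "firstName": 1, "_id": 0}))  # True
# Same query without excluding _id: _id is not in the index, so not covered.
print(is_covered(keys, ["lastName", "age"], ["firstName"],
                 {"lastName": 1, "firstName": 1}))  # False
```

A static check like this only approximates the planner; explain() remains the authoritative way to confirm that a real query is covered.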
Use Caution When Considering Indexes on Low-Cardinality Fields
Queries on fields with a small number of unique values (low cardinality) can return large result sets. Compound indexes may include fields with low cardinality, but the combined values of the fields should exhibit high cardinality.
Eliminate Unnecessary Indexes
Indexes are resource-intensive: even with compression in the MongoDB WiredTiger storage engine, they consume RAM and disk. As fields are updated, the associated indexes must be maintained, incurring additional CPU and disk I/O overhead. MongoDB provides tooling to help you understand index usage, which we cover later in this post.
Wildcard Indexes Are Not a Replacement for Workload-Based Index Planning
For workloads with many ad hoc query patterns, or that handle highly polymorphic document structures, wildcard indexes give you a lot of extra flexibility. You can define a filter that automatically indexes all matching fields, subdocuments, and arrays in a collection. As with any index, they also need to be stored and maintained, so they add overhead to the database. If your application's query patterns are known in advance, you should use more selective indexes on the specific fields accessed by the queries.
Use Text Search to Match Words Inside a Field
Regular indexes are useful for matching the entire value of a field. If you only want to match a specific word in a field with a lot of text, use a text index. If you are running MongoDB in the Atlas service, consider using Atlas Full Text Search, which provides a fully managed Lucene index integrated with the MongoDB database. FTS provides higher performance and greater flexibility to filter, rank, and sort through your database to quickly surface the most relevant results to your users.
Use Partial Indexes
Reduce the size and performance overhead of indexes by only including documents that will be accessed through the index. For example, create a partial index on the orderID field that only includes order documents with an orderStatus of “In progress”, or that only indexes the emailAddress field for documents where it exists.
Take Advantage of Multikey Indexes for Querying Arrays
If your query patterns require accessing individual array elements, use a multikey index. MongoDB creates an index key for each element in the array, and multikey indexes can be built over arrays that hold both scalar values and nested documents.
Avoid Regular Expressions That Are Not Left-Anchored or Rooted
Indexes are ordered by value. Leading wildcards are inefficient and may result in full index scans. Trailing wildcards can be efficient if the expression contains enough case-sensitive leading characters.
Avoid Case-Insensitive Regular Expressions
If the sole reason for using a regular expression is case insensitivity, use a case-insensitive index instead, as those are faster.
Use Index Optimizations Available in the WiredTiger Storage Engine
If you are self-managing MongoDB, you can optionally place indexes on their own separate volume, allowing for faster disk paging and lower contention. See WiredTiger options for more information.
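The partial-index, multikey, and anchored-regex advice all follow from how index entries are kept: as (key, document) pairs in sorted order. The following Python sketch models that with a plain sorted list. It is a simplification, not how WiredTiger actually stores indexes, and the `orders` documents and the helpers `build_index` and `prefix_scan` are hypothetical.

```python
import bisect


def build_index(docs, field, partial_filter=None):
    """Return a sorted list of (key, _id) index entries for `field`.

    - Partial index: documents for which `partial_filter` returns False are
      left out entirely.
    - Multikey index: an array value yields one entry per element.
    Documents missing the field are skipped here, which mimics a partial
    index with {field: {"$exists": True}}; a regular MongoDB index would
    store them under a null key instead.
    """
    entries = []
    for doc in docs:
        if partial_filter is not None and not partial_filter(doc):
            continue
        if field not in doc:
            continue
        value = doc[field]
        for key in (value if isinstance(value, list) else [value]):
            entries.append((key, doc["_id"]))
    entries.sort()
    return entries


def prefix_scan(entries, prefix):
    """Serve a left-anchored, case-sensitive pattern like /^prefix/.

    All matching keys are contiguous in sorted order, so we jump to the first
    candidate with bisect and stop at the first non-matching key. An
    unanchored pattern like /prefix/ would have to examine every entry.
    """
    start = bisect.bisect_left(entries, (prefix,))
    matches = []
    for key, doc_id in entries[start:]:
        if not key.startswith(prefix):
            break
        matches.append(doc_id)
    return matches


orders = [
    {"_id": 1, "orderID": "A-100", "orderStatus": "In progress", "tags": ["gift", "express"]},
    {"_id": 2, "orderID": "A-200", "orderStatus": "Shipped", "tags": ["express"]},
    {"_id": 3, "orderID": "B-300", "orderStatus": "In progress", "tags": ["gift"]},
]

in_progress = build_index(orders, "orderID", lambda d: d["orderStatus"] == "In progress")
print(in_progress)  # [('A-100', 1), ('B-300', 3)]

print(build_index(orders, "tags"))
# [('express', 1), ('express', 2), ('gift', 1), ('gift', 3)]

print(prefix_scan(build_index(orders, "orderID"), "A-"))  # [1, 2]
```

The partial index holds two entries instead of three, and the prefix scan touches only the contiguous run of matching keys, which is the effect the advice above relies on.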
Use the Explain Plan
We covered using MongoDB's explain plan in the previous post on query patterns and profiling, and it is the best tool for checking index coverage for individual queries. Building on the explain plan, MongoDB provides visualization tools that further improve your understanding of your indexes, along with intelligent, automatic recommendations on which indexes to add.
Visualize Index Coverage with MongoDB Compass and Atlas Data Explorer
As the free GUI for MongoDB, Compass provides many features to help you optimize query performance, including exploring your schema and visualizing query explain plans, two areas already covered in this series. The Indexes tab in Compass adds another tool to your arsenal. It lists the existing indexes for a collection, reporting the name and keys of each index along with its type, size, and any special properties. Through the Indexes tab you can also add and drop indexes as needed. A really useful feature is index usage, which shows you how often an index has been used. Having too many indexes can hurt your performance nearly as much as having too few, which makes this feature especially valuable for identifying and removing indexes that are not being used. This helps you free up working set space and eliminates the database overhead that comes from maintaining the index. If you are running MongoDB in our fully managed Atlas service, the indexes view in the Data Explorer gives you the same functionality as Compass, without you having to connect to your database with a separate tool.
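The same usage counters can be pulled programmatically with the `$indexStats` aggregation stage (in PyMongo, `collection.aggregate([{"$indexStats": {}}])`). Each output document includes the index `name`, the reporting `host`, and an `accesses` subdocument with an `ops` count. The sketch below works on documents of that shape; the sample values are made up, and `unused_indexes` is a hypothetical helper, not part of any MongoDB tool.

```python
def unused_indexes(index_stats, min_ops=1):
    """Return names of indexes used fewer than `min_ops` times.

    `index_stats` is a list of documents shaped like $indexStats output.
    Each mongod reports its own counters, so per-host documents for the same
    index are summed first. The _id_ index is skipped because it cannot be
    dropped. Counters reset when mongod restarts or the index is dropped and
    recreated, so judge usage over a representative time window.
    """
    totals = {}
    for stat in index_stats:
        totals[stat["name"]] = totals.get(stat["name"], 0) + stat["accesses"]["ops"]
    return sorted(name for name, ops in totals.items()
                  if name != "_id_" and ops < min_ops)


# Hypothetical sample, trimmed to the fields used above.
sample = [
    {"name": "_id_", "host": "shard0:27017", "accesses": {"ops": 0}},
    {"name": "lastName_1_firstName_1", "host": "shard0:27017", "accesses": {"ops": 1520}},
    {"name": "status_1", "host": "shard0:27017", "accesses": {"ops": 0}},
    {"name": "status_1", "host": "shard1:27017", "accesses": {"ops": 0}},
    {"name": "email_1", "host": "shard1:27017", "accesses": {"ops": 3}},
]

print(unused_indexes(sample))             # ['status_1']
print(unused_indexes(sample, min_ops=5))  # ['email_1', 'status_1']
```

Treat the result as a list of candidates to investigate rather than indexes to drop automatically, since a rarely used index may still serve an important periodic job.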
You can also retrieve index statistics using the $indexStats aggregation pipeline stage.
Automated Index Recommendations
Even with all of the telemetry provided by MongoDB's tools, you are still responsible for pulling and analyzing the required data to decide which indexes to add. The Atlas Performance Advisor automates this analysis by monitoring slow queries and suggesting indexes that would improve them. The threshold for slow queries varies based on the average operation time on your cluster, so that the recommendations are pertinent to your workload. Recommended indexes are accompanied by sample queries, grouped by query shape (i.e., queries with a similar predicate structure, sort, and projection), that were run against a collection that would benefit from the addition of a suggested index. The Performance Advisor does not negatively affect the performance of your Atlas cluster. If you are happy with a recommendation, you can roll out the new index automatically, without incurring any application downtime.
What's Next
That wraps up this latest installment of the performance best practices series. MongoDB University offers a free, web-based training course on MongoDB performance. This is a great way to learn more about the power of indexing.
Building AI with MongoDB: Unlocking Value from Multimodal Data
One of the most powerful capabilities of AI is its ability to learn, interpret, and create from input data of any shape and modality. This data can range from structured records stored in a database to unstructured text, computer code, video, images, and audio streams. Vector embeddings are one of the key AI enablers in this space: encoding our data as vector embeddings dramatically expands our ability to work with multimodal data. Just a few years ago, we depended on data scientists to train highly specialized models; today, developers build general-purpose apps that incorporate NLP and computer vision. The beauty of vector embeddings is that data that is unstructured, and therefore completely opaque to a computer, can now have its meaning and structure inferred and represented via these embeddings. Using a vector store such as Atlas Vector Search means we can search and compute over unstructured and multimodal data in the same way we always have with structured business data, and we can now search it using natural language rather than specialized query languages. Considering that more than 80% of the data that enterprises create every day is unstructured, we start to see how vector search combined with LLMs and generative AI opens up new use cases and revenue streams. In this latest round-up of companies building AI with MongoDB, we feature three companies doing just that.
The future of business data: Unlocking the hidden potential of unstructured data
In today's data-driven world, businesses are always searching for ways to extract meaningful insights from the vast amounts of information at their disposal. From improving customer experiences to enhancing employee productivity, the ability to leverage data enables companies to make more informed and strategic decisions. However, most of this valuable data is trapped in complex formats, making it difficult to access and analyze. That's where Unstructured.io comes in.
Imagine an innovative tool that can take all of your unstructured data, be it a PDF report, a colorful presentation, or even an image, and transform it into an easily accessible format. This is exactly what Unstructured.io does. It pulls out the crucial data and presents it in a simple, universally understood JSON format, ready to be transformed, stored, and searched in MongoDB Atlas with Atlas Vector Search. What does this mean for your business? By automating the data extraction process, you can quickly derive actionable insights, offering enhanced value to your customers and improving operational efficiency. Unstructured is also preparing an image-to-text model, which will give users even more flexibility to ingest and process nearly any file containing natural language data. And keep an eye out for notable upgrades in table extraction, yet another step in ensuring you get the most from your data. Unstructured.io isn't just a tool for tech experts. It's for any business aiming to understand its customers better, seeking to innovate, and looking to stay ahead in a competitive landscape. Unstructured’s widespread usage is a testament to its value, with over 1.5 million downloads and adoption by thousands of enterprises and government organizations. Brian Raymond, the founder and CEO of Unstructured.io, captures this synergy: “As the world’s most widely used natural language ingestion and preprocessing platform, partnering with MongoDB was a natural choice for us. This collaboration allows for even faster development of intelligent applications. Together, we're paving the way businesses harness their data.” MongoDB and Unstructured.io are bridging the gap between data and insights, ensuring businesses are well equipped to navigate the challenges of the digital age.
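To illustrate the idea from the start of this post, that embeddings let us search unstructured text by similarity, here is a deliberately tiny Python sketch. It assumes text has already been extracted into JSON-like element records (the `type`/`text` shape is a simplified assumption, loosely inspired by partitioned-document output), and it substitutes a bag-of-words vector for a real embedding model, so it matches shared words rather than meaning. All helper names are hypothetical.

```python
import math
import re


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocab(texts):
    """Assign one vector dimension to every distinct word in the corpus."""
    words = sorted({w for t in texts for w in tokenize(t)})
    return {w: i for i, w in enumerate(words)}


def embed(text, vocab):
    """Bag-of-words vector, L2-normalized (toy stand-in for a real model)."""
    vec = [0.0] * len(vocab)
    for w in tokenize(text):
        if w in vocab:  # words outside the corpus vocabulary are ignored
            vec[vocab[w]] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def top_k(query, elements, vocab, k=2):
    """Exact nearest-neighbor search by cosine similarity.

    Every vector is normalized, so the dot product equals the cosine.
    Production vector search typically uses an approximate index instead of
    scoring every element.
    """
    q = embed(query, vocab)
    scored = [(sum(a * b for a, b in zip(q, el["embedding"])), el["text"])
              for el in elements]
    scored.sort(reverse=True)
    return [text for _, text in scored[:k]]


# Hypothetical extracted elements.
elements = [
    {"type": "NarrativeText", "text": "Quarterly revenue grew 12 percent year over year."},
    {"type": "NarrativeText", "text": "The office relocated to a new building in March."},
    {"type": "NarrativeText", "text": "Revenue growth was driven by new subscriptions."},
]
vocab = build_vocab(el["text"] for el in elements)
for el in elements:
    el["embedding"] = embed(el["text"], vocab)  # vector stored next to its text

print(top_k("quarterly revenue growth", elements, vocab))
# ['Revenue growth was driven by new subscriptions.', 'Quarterly revenue grew 12 percent year over year.']
```

In a real pipeline, the embedding would come from a trained model and the vectors would live in the database next to the extracted text, with the brute-force loop replaced by an indexed vector search.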
Whether you’re a seasoned entrepreneur or just starting out, it's time to harness the untapped potential of your unstructured data. Visit Unstructured.io to get started with any of its open-source libraries, or join Unstructured’s community Slack and explore how to seamlessly use your data in conjunction with large language models.
Making sense of complex contracts with entity extraction and analysis
Catylex is a revolutionary contract analytics solution for any business that needs to extract and optimize contract data. The company’s best-in-class contract AI automatically recognizes thousands of legal and business concepts out of the box, making it easy to get started and quickly generate value. Catylex’s AI models transform wordy, opaque documents into detailed insights revealing the rights, obligations, risks, and commitments associated with the business, its suppliers, and its customers. These insights can be used to accelerate contract review and to feed operational and risk data into core business systems (CLMs, ERPs, etc.) and teams. Documents are processed by Catylex’s proprietary extraction pipeline, which combines various machine learning/NLP techniques (custom Named Entity Recognition, text classification) with domain expert augmentation to parse documents into an easy-to-query ontology. This eliminates the need for end users to annotate data or train any custom models. The application is very intuitive and provides easy-to-use controls to quality-check the system-extracted data, search and query using a combination of text and concepts, and generate visualizations across portfolios. You can try all of this for free by signing up for the “Essentials” version of Catylex. Catylex leverages a suite of applications and features from the MongoDB Atlas developer data platform.
It uses the MongoDB Atlas database to store documents and extracted metadata, thanks to its flexible data model and easy-to-scale options, and it uses Atlas Search to provide end users with easy-to-use and efficient text search capabilities. Features like highlighting within Atlas Search add a lot of value and enhance the user experience. Atlas Triggers are used to handle change streams and efficiently relay information to various parts of the Catylex application, making it event-driven and scalable. Catylex is actively evaluating Atlas Vector Search. Bringing vector search, keyword search, and the database together in a single, fully synchronized, flexible storage layer accessed by a single API will simplify development and eliminate technology sprawl. Being part of the MongoDB AI Innovators Program gives Catylex’s engineers direct access to the product management team at MongoDB, helping them share feedback and receive the latest product updates and best practices. The provision of Atlas credits reduces the cost of experimenting with new features, and co-marketing initiatives help build visibility and awareness of the company’s offerings.
Harness Generative AI with observed and dark data for customer 360
Dataworkz enables enterprises to harness the power of LLMs with their own proprietary data for customer applications. The company’s products empower businesses to effortlessly develop and implement retrieval-augmented generation (RAG) applications using proprietary data, with either public LLM APIs or privately hosted open-source foundation models. Hallucinations are a notable obstacle to the widespread adoption of generative AI within enterprises. Dataworkz streamlines the implementation of RAG applications, enabling generative AI to reference the sources of its answers and consequently enhancing traceability.
As a result, users can easily use conversational natural language to produce high-quality, LLM-ready customer 360 views that power chatbots, question-answering systems, and summarization services. Dataworkz provides connectors for a vast array of customer data sources. These include back-office SaaS applications such as CRM, marketing automation, and finance systems. Leading relational and NoSQL databases, cloud object stores, data warehouses, and data lakehouses are also supported. Dataflows, also known as composable AI-enabled workflows, are sets of steps that users combine and arrange to perform any sort of data transformation, from creating vector embeddings to complex JSON transformations. Users can describe data wrangling tasks in natural language, have LLMs orchestrate the processing of data in any modality, and merge it into a “golden” 360-degree customer view. MongoDB Atlas is used to store the source document chunks for this 360-degree customer view, and Atlas Vector Search is used to index and query the associated vector embeddings. The outputs generated by the customer’s chosen LLM are augmented with similarity search and retrieval powered by Atlas. Public LLMs from providers such as OpenAI and Cohere, as well as privately hosted LLMs such as Databricks Dolly, are also available. The integrated experience of the MongoDB Atlas database and Atlas Vector Search simplifies developer workflows, and multi-cloud support gives Dataworkz the freedom and flexibility to meet its customers wherever they run their business. For Dataworkz, access to Atlas credits and the MongoDB partner ecosystem are key drivers for becoming part of the AI Innovators Program.
What's next?
If you are building AI-enabled apps on MongoDB, sign up for our AI Innovators Program. We’ve had applicants from all industries building for a huge diversity of new use cases.
To get a flavor, take a look at earlier blog posts in this series: Building AI with MongoDB: First Qualifiers includes AI at the network edge for computer vision and augmented reality; risk modeling for public safety; and predictive maintenance paired with question-answering systems for maritime operators. Building AI with MongoDB: Compliance to Copilots features AI in healthcare along with intelligent assistants that help product managers specify better products and help sales teams compose emails that convert at twice the rate. Finally, check out our MongoDB for Artificial Intelligence resources page for the latest best practices to get you started turning your idea into AI-driven reality.
Building AI with MongoDB: How VISO TRUST is Transforming Cyber Risk Intelligence
Since announcing the preview availability of MongoDB Atlas Vector Search back in June, we’ve seen rapid adoption from developers building a wide range of AI-enabled apps. Today we're going to talk to one of these customers. VISO TRUST puts reliable, comprehensive, actionable vendor security information directly in the hands of decision-makers who need to make informed risk assessments. The company uses a combination of state-of-the-art models from OpenAI, Hugging Face, Anthropic, Google, and AWS, augmented by vector search and retrieval from MongoDB Atlas. We sat down with Pierce Lamb, Senior Software Engineer on the Data and Machine Learning team at VISO TRUST, to learn more.
Tell us a little bit about your company. What are you trying to accomplish, and how does that benefit your customers or society more broadly?
VISO TRUST is an AI-powered third-party cyber risk and trust platform that enables any company to access actionable vendor security information in minutes. VISO TRUST delivers the fast and accurate intelligence needed to make informed cybersecurity risk decisions at scale for companies at any maturity level. Our commitment to innovation means that we are constantly looking for ways to optimize business value for our customers. VISO TRUST ensures that complex business-to-business (B2B) transactions adequately protect the confidentiality, integrity, and availability of trusted information. VISO TRUST’s mission is to become the largest global provider of cyber risk intelligence and the intermediary for business transactions. By using VISO TRUST, customers reduce their threat surface in B2B transactions with vendors, thereby reducing their overall risk and the potential for security incidents such as breaches and malicious injections. Today VISO TRUST has many great enterprise customers, like Instacart, Gusto, and Upwork, and they all say the same thing: 90% less work, an 80% reduction in time to assess risk, and near 100% vendor adoption.
Because it’s the only approach that can deliver accurate results at scale, customers are, for the first time, able to gain complete visibility into their entire third-party populations and take control of their third-party risk.
Describe what your application does and what role AI plays in it
The VISO TRUST Platform uses patented, proprietary machine learning and a team of highly qualified third-party risk professionals to automate this process at scale. Simply put, VISO TRUST automates vendor due diligence and reduces third-party risk at scale, so security teams can stop chasing vendors, reading documents, or analyzing spreadsheets.
Figure 1: VISO TRUST is the only SaaS third-party cyber risk management platform that delivers the rapid security intelligence needed for modern companies to make critical risk decisions early in the procurement process
The VISO TRUST Platform easily engages third parties, saving everyone time and resources. In a 5-minute web-based session, third parties are prompted to upload relevant artifacts of their existing security program, and our supervised AI, which we call Artifact Intelligence, does the rest. Security artifacts that enter VISO’s Artifact Intelligence pipeline interact with AI/ML in three primary ways. First, VISO deploys discriminator models that produce high-confidence predictions about features of the artifact. For example, one model performs artifact classification, another detects organizations inside the artifact, another predicts which pages are likely to contain security controls, and so on. Our modules reference a comprehensive set of over 25 security frameworks and use document heuristics and natural language processing to analyze any written material and extract all relevant control information.
Second, text content is parsed out of artifacts in the form of sentences, paragraphs, headers, table rows, and more; these text blobs are embedded and stored in MongoDB Atlas to become part of our dense retrieval system. This dense retrieval system performs retrieval-augmented generation (RAG), using MongoDB features like Atlas Vector Search to provide ranked context to large language model (LLM) prompts. Third, we use RAG results to seed LLM prompts and chain their outputs together to produce extremely accurate factual information about the artifact in the pipeline. This gives customers instant intelligence that previously took weeks to produce. VISO TRUST’s risk model analyzes your risk and delivers a complete assessment that provides everything you need to know to make qualified risk decisions about the relationship. In addition, the platform continuously monitors and reassesses third-party vendors to ensure compliance.
What specific AI/ML techniques, algorithms, or models are utilized in your application?
For our discriminator models, we research state-of-the-art pre-trained models (typically narrowed to those contained in Hugging Face's transformers package) and fine-tune these models on our dataset. For our dense retrieval system, we use MongoDB Atlas Vector Search, which internally uses the Hierarchical Navigable Small Worlds (HNSW) algorithm to retrieve embeddings similar to the embedded text content. We plan to re-rank these results as well. For our LLM system, we have experimented with GPT-3.5-turbo, GPT-4, Claude 1 & 2, Bard, Vertex, and Bedrock. We blend a variety of these based on our customers' accuracy, latency, and security needs.
Can you describe other AI technologies used in your application stack?
Some of the other frameworks we use are Hugging Face transformers, evaluate, accelerate, and Datasets, as well as PyTorch, WandB, and Amazon SageMaker.
We have a custom-built library for ML experiments (fine-tuning) and a custom-built library for workflow orchestration, and all of our prompt engineering is custom-built.
Why did you choose MongoDB as part of your application stack? Which MongoDB features are you using, and where are you running MongoDB?
The VISO TRUST Platform relies on effective solutions and tools, such as MongoDB's distinctive attributes, to fulfill specific objectives. MongoDB supports our platform's mechanism for engaging third parties efficiently, which employs both AI and human oversight to automate the assessment of security artifacts at scale. The fundamental value proposition of MongoDB, a robust document database, is why we originally chose it. It was first deployed as a storage and retrieval mechanism for all the factual information our artifact intelligence pipeline produces about artifacts. While it still performs this function today, it has now also become our “vector/metadata database.” MongoDB executes fast ranking of large quantities of embedded text blobs for us, while Atlas gives us all the ease of use of a cloud-ready database. We use both the Atlas search index visualization and the query profiler visualization daily. Even just the basic display of a few documents in a collection often saves time. Finally, when we recently backfilled embeddings across one of our MongoDB deployments, Atlas automatically provisioned more disk space for large indexes without anyone needing to intervene, which was incredibly helpful.
What are the benefits you've achieved by using MongoDB?
I would say there are two primary benefits that MongoDB and Atlas have brought us. First, MongoDB was already where we stored metadata about artifacts in our system; with the introduction of Atlas Vector Search, we now have a comprehensive vector/metadata database, battle-tested over a decade, that solves our dense retrieval needs.
There's no need to deploy, manage, and learn a new database: our vectors and artifact metadata can be stored right next to each other. Second, Atlas has made all the painful parts of database management easy. Creating indexes, provisioning capacity, alerting on slow queries, visualizing data, and much more have saved us time and allowed us to focus on more important things.
What are your future plans for new applications, and how does MongoDB fit into them?
Retrieval-augmented generation is going to continue to be a first-class feature of our application. In this regard, the evolution of Atlas Vector Search and its ecosystem in MongoDB will be highly relevant to us. MongoDB has become the database our ML team uses, so as our ML footprint expands, our use of MongoDB will expand too.
Getting started
Thanks so much to Pierce for sharing details on VISO TRUST’s AI-powered applications and experiences with MongoDB. The best way to get started with Atlas Vector Search is to head over to the product page. There you will find tutorials, documentation, and whitepapers, along with the ability to sign up for MongoDB Atlas. You’ll be just a few clicks away from spinning up your own vector search engine, where you can experiment with the power of vector embeddings and RAG. We’d love to see what you build, and we're eager for any feedback that will make the product even better in the future!
Building AI with MongoDB: From Compliance to Copilots
There has been a lot of recent reporting on the desire to regulate AI, but very little has been said about how AI itself can assist with regulatory compliance. In our latest round-up of qualifiers for the MongoDB AI Innovators Program, we feature a company that is doing just that in one of the world’s most heavily regulated industries. Helping comply with regulations is just one way AI can assist us. We hear a lot about copilots coaching developers to write higher-quality code faster, but this isn’t the only domain where AI-powered copilots can shine. To round out this blog post, we provide two additional examples: a copilot for product managers that helps them define better specifications, and a copilot for sales teams that helps them better engage customers. We launched the MongoDB AI Innovators Program back in June this year to help companies like these “build the next big thing” in AI. Whether you are a freshly minted start-up or an established enterprise, you can benefit from the program, so go ahead and sign up. In the meantime, let's explore how innovators are using MongoDB for use cases ranging from compliance to copilots.
AI-powered compliance for real-time healthcare data
Inovaare transforms complex compliance processes by designing configurable AI-driven automation solutions. These solutions help healthcare organizations collect real-time data across internal and external departments, creating a single compliance management system. Founded 10 years ago and now with 250 employees, Inovaare offers a comprehensive suite of HIPAA-compliant software solutions that enables healthcare organizations across the Americas to efficiently meet their unique business and regulatory requirements, sustain audit readiness, reduce non-compliance risks, and lower overall operating costs. Inovaare uses classic and generative AI models to power a range of services.
Custom models are built with PyTorch, while LLMs are built with transformers from Hugging Face and developed and orchestrated with LangChain. MongoDB Atlas powers the models’ underlying data layer. Models are used for document classification along with information extraction and enrichment. Healthcare professionals can work with this data in multiple ways, including semantic search and the company’s Question-Answering chatbot. A standalone vector database was originally used to store and retrieve each document’s vector embeddings as part of in-context model prompting. Inovaare has since migrated to Atlas Vector Search. This migration helps the company’s developers build faster through tight vector integration with the transactional, analytical, and full-text search data services provided by the MongoDB Atlas platform.
Next-generation healthcare compliance platform from Inovaare. The platform provides AI-powered health plan solutions with continuous monitoring, regulatory reporting, and business intelligence.
Inovaare also uses AI agents to orchestrate complex workflows across multiple healthcare business processes, with data collected from each process stored in the MongoDB Atlas database. Business users can visualize the latest state of healthcare data by asking natural language questions, which LLMs translate and send to Atlas Charts for dashboarding. Inovaare selected MongoDB because its flexible document data model enables the company's developers to store and query data of any structure. This, coupled with Atlas’ HIPAA compliance, end-to-end data encryption, and the freedom to run on any cloud (supporting almost any application workload), helps the company innovate and release with higher velocity and lower cost than stitching together an assortment of disparate databases and search engines would allow. Going forward, Inovaare plans to expand into other regions and compliance use cases.
As part of MongoDB’s AI Innovators Program, the company’s engineers get to work with MongoDB specialists at every stage of their journey.
The AI copilot for product managers
The ultimate goal of any venture is to create and deliver meaningful value while achieving product-market fit. Ventecon’s AI Copilot supports product managers in their mission to craft market-leading products and solutions that contribute to a better future for all. Hundreds of bots currently crawl the Internet, identifying and processing over 1,000,000 pieces of content every day. This content ranges from details on product offerings, features, user stories, reviews, scenarios, acceptance criteria, and issues through to market research data from target industries. Processed data is stored in MongoDB, where Ventecon’s proprietary NLP models use it to assist product managers in generating and refining product specifications directly within an AI-powered virtual space. Patrick Beckedorf, co-founder of Ventecon, says: “Product data is highly context-specific, and so we have to pre-train foundation models with specific product management goals, fine-tune with contextual product data, include context over time, and keep it up to date. In doing so, every product manager gets a digital, highly contextualized expert buddy.” Currently, vector embeddings from the product data stored in MongoDB are indexed and queried in a standalone vector database. As Beckedorf explains, the engineering team is now exploring a more integrated approach: “The complexity of keeping vector embeddings synchronized across both source and vector databases, coupled with the overhead of running the vector store, ties up engineering resources and may affect indexing and search performance. A solid architecture therefore provides opportunities to process and provide new knowledge very fast, e.g., in Retrieval-Augmented Generation (RAG), while bottlenecks in the architecture may introduce risks, especially at scale.
This is why we are evaluating Atlas Vector Search to bring source data and vectors together in a single data layer. We can use Atlas Triggers to call our embedding models as soon as new data is inserted into the MongoDB database. That means we can have those embeddings back in MongoDB and available for querying almost immediately.” For Beckedorf, the collaboration with data pioneers and the co-creation opportunities with MongoDB are the most valuable aspects of the AI Innovators Program.
AI sales email coaching: 2x reply rates in half the time
Lavender is an AI sales email coach. It assists users in real time to write better emails faster. Sales teams who use Lavender report that they write emails in less time and receive twice as many replies. The tool uses generative AI to help compose emails. It personalizes introductions for each recipient and scores each email as it is being written to identify anything that hurts the chances of a reply. Response rates are tracked so that teams can monitor progress and continuously improve performance using data-backed insights. OpenAI’s GPT LLMs, along with ChatGPT, generate email copy collaboratively with the user. The output is then analyzed and scored through a complex set of business logic layers built by Lavender’s data science team, yielding industry-leading, high-quality emails. Together, the custom and generative models help write subject lines, remove jargon and fix grammar, simplify unwieldy sentences, and optimize formatting for mobile devices. They can also retrieve the recipient’s (and their company’s) latest publicly posted information to help personalize and enrich outreach. MongoDB Atlas running on Google Cloud backs the platform. Lavender’s engineers selected MongoDB because of the flexibility of its document data model. They can add fields on demand without lengthy schema migrations and can store data of any structure.
This includes structured data such as user profiles and response tracking metrics through to semi-structured and unstructured email copy and associated ML-generated scores. The team is now exploring Atlas Vector Search to further augment LLM outputs by retrieving similar emails that have performed well. Storing, syncing, and querying vector embeddings right alongside application data will help the company’s engineers build new features faster while reducing technology sprawl.
What's next?
We have more places left in our AI Innovators Program, but they are filling up fast, so sign up directly on the program’s web page. We are accepting applications from a diverse range of AI use cases. To get a flavor of that diversity, take a look at our blog post announcing the first program qualifiers who are building AI with MongoDB. You’ll see use cases that take AI to the network edge for computer vision and augmented reality (AR), risk modeling for public safety, and predictive maintenance paired with Question-Answering systems for maritime operators. Also, check out our MongoDB for Artificial Intelligence resources page for the latest best practices to get you started in turning your idea into an AI-driven reality.
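Two ideas from this post fit together. Ventecon plans to generate embeddings as soon as new data is inserted, and Lavender plans to retrieve similar emails that have performed well. The following Python sketch is illustrative only, and it rests on a few assumptions:

* `handle_insert_event` models the logic an insert-triggered function might run. It takes an event shaped like a MongoDB change event (`operationType`, `documentKey`, `fullDocument`) and returns the filter and update that write the vector back.
* `fake_embed` is a hypothetical stand-in for a real embedding model.
* `similar_winning_emails` ranks stored emails with an in-memory cosine similarity in place of Atlas Vector Search. It keeps only emails above a hypothetical `reply_rate` threshold.

```python
import math
from typing import Any, Callable, Optional

Embedder = Callable[[str], list[float]]


def fake_embed(text: str) -> list[float]:
    # Hypothetical stand-in for an embedding model: a toy 2-dimension vector.
    return [float(len(text)), float(text.count(" "))]


def handle_insert_event(
    event: dict[str, Any], embed: Embedder, text_field: str = "text"
) -> Optional[tuple[dict[str, Any], dict[str, Any]]]:
    """Return (filter, update) that stores an embedding on a newly inserted document."""
    if event.get("operationType") != "insert":
        return None  # the write-back below is an update, so it is not re-embedded
    text = event["fullDocument"].get(text_field)
    if not text:
        return None  # nothing to embed
    return {"_id": event["documentKey"]["_id"]}, {"$set": {"embedding": embed(text)}}


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similar_winning_emails(
    draft_vec: list[float],
    emails: list[dict[str, Any]],
    min_reply_rate: float = 0.2,
    k: int = 2,
) -> list[str]:
    """Subjects of the k past emails most similar to the draft that met the reply threshold."""
    winners = [e for e in emails if e["reply_rate"] >= min_reply_rate]
    winners.sort(key=lambda e: cosine(draft_vec, e["embedding"]), reverse=True)
    return [e["subject"] for e in winners[:k]]
```

Outside Atlas Triggers, a change stream is one way to feed the handler. PyMongo's `collection.watch([{"$match": {"operationType": "insert"}}])` yields such events, and `collection.update_one(filter, update)` applies the returned pair. In Atlas, the reply-rate threshold would more naturally become a `filter` inside `$vectorSearch`.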
Building AI with MongoDB: Announcing the First Qualifiers for the Innovators Program
Artificial Intelligence is igniting so many brilliant ideas for new products and services. But turning those ideas into reality is a path that even the brightest minds struggle to navigate without some help along the way. That’s why we launched the MongoDB AI Innovators Program back in June this year. Access to expert technical advice, free MongoDB Atlas credits, co-marketing opportunities, and (for eligible startups) introductions to potential venture investors come together to help you “build the next big thing” in AI. Since opening the program, we’ve received applications from around the world addressing every industry and spanning the spectrum of generative to analytical AI use cases. They range from enterprise chat and video bots that improve customer service and unlock insights from vast internal information repositories, conversational intelligence for sales reps, AI agents for workflow orchestration, tools for talent recruitment and retention, and the identification of workplace burnout, through to news classifiers and summarization, personal wellbeing assistants, and the generation of bedtime stories that make science and technology more accessible to children. We’ve been amazed by the breadth of ideas and the pace at which innovators are building AI on top of MongoDB Atlas. In this blog post, I want to share an overview of three startups that have just qualified for our AI Innovators Program.
Elevating the edge experience: Deploy AI anywhere with Cloneable and MongoDB
Cloneable provides the application layer that brings AI to any device at the edge of the network. The Cloneable platform empowers developers to craft dynamic applications using intuitive low/no-code tools, instantly deployable to a spectrum of devices: mobile phones, IoT devices, robots, and beyond. By harnessing machine learning models, a business can seamlessly leverage complex technologies across its operations. Models are pushed down to the device, where they are converted to a native embedded format such as CoreML.
From here, they are executed by the device’s neural engine to provide low-latency inference, computer vision, and augmented reality. Cloneable uses MongoDB Atlas Device Sync to persist data locally on the device and sync it to the Atlas database backend in the cloud. This combination creates an ecosystem where enterprise apps become real-time gateways to track, measure, inspect, and respond to events across an operation. The company is also exploring creating vector embeddings from images and data collected on devices and storing them in Atlas Vector Search. With this expanded functionality, users can better search and analyze events collected from the field.
Predicting risks to public safety
ExTrac draws on thousands of data sources identified by domain experts, using AI-powered analytics to locate, track, and forecast both digital and physical risks to public safety in real time. Initially serving Western governments to predict risks of emerging or escalating conflicts overseas, ExTrac is expanding into enterprise use cases for reputational management, operational risk, and content moderation. “Data is at the core of what we do. Our domain experts find and curate relevant streams of data, and then we use AI to anonymize and make sense of it at scale,” said Matt King, CEO at ExTrac. “We take a base model, such as RoBERTa or an LLM, and fine-tune it with our own labeled data to create domain-specific models capable of identifying and classifying threats in real time.” Asked why ExTrac built on MongoDB Atlas, King said, “The flexibility of the document data model allows us to land, index, and analyze data of any shape and structure, no matter how complex. This helps us unlock instant insights for our customers.” King went on to say, “Atlas Vector Search is also proving to be incredibly powerful across a range of tasks where we use the results of the search to augment our LLMs and reduce hallucinations.
We can store vector embeddings right alongside the source data in a single system, enabling our developers to build new features way faster than if they had to bolt on a standalone vector database, many of which limit the amount of data that can be returned if it has metadata attached to it. We are also moving beyond text to vectorize images and videos from our archives dating back over a decade. Being able to query and analyze data in any modality will help us to better model trends, track evolving narratives, and predict risk for our customers.” Access to technical expertise provided by the AI Innovators Program will help ExTrac manage the ever-growing size of its data sets, keeping performance high and costs low as the business scales.
Using AI to cut maritime emissions and risk
CetoAI provides predictive analytics for the maritime industry. By combining high-frequency data, engineering expertise, and artificial intelligence, the company reduces machinery breakdowns, cuts carbon emissions, and manages operational risk. Sensors installed on each vessel generate real-time data feeds of engine and vessel performance. The company’s AI models use the data for predictive maintenance, fuel consumption optimization, and carbon intensity forecasting, and the vessel’s crew, owners, and insurers consume the outputs. The data feeds generated by CetoAI are highly complex. Sensors on each vessel emit around 90,000 JSON documents daily. Each document stores around 100 unique time-series measurements, all requiring heavy-duty analytics processing before feeding machine learning models. It was these demands that led CetoAI’s engineering team to select MongoDB, migrating from a standalone time-series database that couldn’t keep pace with business growth. Sensor measurements from each data feed are streamed through Microsoft Azure IoT Hub and ingested into MongoDB Atlas’ purpose-built time-series collections.
Here, MongoDB window functions process and transform the data before serving it to CetoAI’s machine learning models built with PyTorch and scikit-learn. CetoAI is now exploring additional capabilities available with MongoDB to expand its offerings. Atlas Device Sync can persist data locally and sync it between vessels and the cloud, withstanding the loss of network connectivity. Atlas Vector Search can be used for Retrieval-Augmented Generation with the company’s LLMs, which are being developed to help crews diagnose and remediate equipment failures using natural language queries. Access to Atlas credits and expert support provided as part of the AI Innovators Program enables CetoAI to accelerate and derisk the delivery of these new services.
Get started
Today we’ve focused on just three startups; many more are already enjoying the benefits of the AI Innovators Program. We have more places left, but they are filling up fast, so sign up directly on the program’s web page. Also, check out our MongoDB for Artificial Intelligence resources page for all of the latest best practices to get you started on building the “next big thing” with AI.
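CetoAI's numbers and pipeline become more concrete with a little arithmetic and a sketch of the time-series setup. Each vessel sends around 90,000 documents a day, and each document carries about 100 measurements. That works out to roughly 9,000,000 measurements per vessel per day, or a little over one document per second.

The Python below is a hedged sketch, not CetoAI's code:

* The field names `ts`, `vessel` and `engine_temp` are hypothetical.
* `timeseries_options` shows the `timeField`, `metaField` and `granularity` options that a time-series collection accepts.
* The pipeline uses `$setWindowFields` to compute a per-vessel moving average. This is one example of the window-function transformations the post describes.

A pure-Python version of the same average is included for checking results.

```python
from typing import Any

# Figures from the article: documents and measurements per vessel per day.
DOCS_PER_VESSEL_PER_DAY = 90_000
MEASUREMENTS_PER_DOC = 100
SECONDS_PER_DAY = 24 * 60 * 60

measurements_per_day = DOCS_PER_VESSEL_PER_DAY * MEASUREMENTS_PER_DOC
docs_per_second = DOCS_PER_VESSEL_PER_DAY / SECONDS_PER_DAY

# Options for a time-series collection (hypothetical field names).
timeseries_options = {
    "timeField": "ts",         # when the reading was taken
    "metaField": "vessel",     # identifies the source, e.g. {"id": ..., "engine": ...}
    "granularity": "seconds",  # roughly one document per second per vessel
}


def moving_average_pipeline(field: str, window_docs: int = 6) -> list[dict[str, Any]]:
    """Per-vessel average of the current reading and the previous window_docs - 1."""
    return [
        {
            "$setWindowFields": {
                "partitionBy": "$vessel.id",  # one window sequence per vessel
                "sortBy": {"ts": 1},          # document-relative bounds need a sort order
                "output": {
                    f"{field}_avg": {
                        "$avg": f"${field}",
                        "window": {"documents": [-(window_docs - 1), 0]},
                    }
                },
            }
        }
    ]


def moving_average(values: list[float], window_docs: int = 6) -> list[float]:
    """Pure-Python equivalent for one vessel's readings, already sorted by time."""
    out: list[float] = []
    for i in range(len(values)):
        window = values[max(0, i - window_docs + 1): i + 1]
        out.append(sum(window) / len(window))
    return out
```

With PyMongo, `db.create_collection("vessel_readings", timeseries=timeseries_options)` would create the collection. `db.vessel_readings.aggregate(moving_average_pipeline("engine_temp"))` would then run the window stage.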
Application-Driven Analytics: Why are Operational and Analytical Workloads Converging?
Fifteen years ago, our vision was to provide developers with a new approach to databases. As industry change is constant, we are working to bring you another shift so you can stay ahead of the curve: application-driven analytics. Application-driven analytics isn’t about replacing your centralized data warehouse or data lakehouse. Rather, it’s about augmenting them, bringing a new class of analytics directly into applications, where developers build it. In his recent Analyst Perspective, Matt Aslett, VP & Research Director at Ventana Research, was very clear that dedicated operational and analytical systems still have different functional requirements. However, he noted the growth in intelligent applications infused with the results of analytic processing. This in turn is driving operational/OLTP data platforms such as MongoDB to integrate native analytics functionality. In the Perspective, Aslett goes on to describe some of the recent product enhancements introduced by MongoDB to support analytics, and wraps up with this advice:
I recommend that organizations evaluating potential database providers for new, intelligent operational applications include MongoDB Atlas in their evaluations.
If you are interested in learning more about Ventana Research’s insights, take a look at the company’s Analyst Perspective: MongoDB’s Atlas Delivers Data-Driven Applications.
From manufacturing to retail and finance
Beyond the research from industry analysts, organizations are increasingly working to capture the opportunities presented by application-driven analytics. Bosch Global Software uses MongoDB at the core of its IoT systems powering automotive, industrial, and smart home use cases. Being able to generate analytics in real time is a key capability of the company’s applications.
In a recent article in The New Stack, Kai Hackbarth, senior technology evangelist at Bosch, talked about the value MongoDB provides:
From my history [of doing this for] 22 years, we never had the capabilities to do this before.
Global retailer Marks and Spencer rebuilt its Sparks rewards program, moving from a packaged app to an in-house solution built on MongoDB. The company reduced the time to build one million personalized customer offers from one hour to just five minutes. It is now able to serve those offers at 10x lower latency, with the loyalty program driving 8x higher customer spend. As part of its digital transformation initiative, Toyota Financial Services built a new operational data layer powered by MongoDB Atlas. The data layer connects the company’s internal mainframe backend systems with customers engaging the company through new digital channels. MongoDB handles customer onboarding along with fraud prevention. The native OLTP and analytics capabilities provided by MongoDB Atlas were key. They eliminated the need for Toyota Financial Services to integrate and build against separate database, cache, object store, and data warehouse technologies, dramatically simplifying the company’s technology estate.
MongoDB helps us make better decisions and build better products.
Ken Schuelke, Division Information Officer, Toyota Financial Services
Enabling developers for application-driven analytics
How is MongoDB helping developers make the shift to smarter apps and intelligent software? MongoDB Atlas unifies the core transactional and analytical data services needed to deliver app-driven analytics. It puts powerful analytics capabilities directly into the hands of developers in ways that fit their workflows. With Atlas, developers can land data of any structure; index, query, and analyze it in any way they want; and then archive it. All of this happens through a unified API, without having to build their own data pipelines or duplicate data.
MongoDB Atlas supports any level of application intelligence, from querying and searching records to aggregating and transforming data, through to feeding rules-based engines and machine learning models.
Figure 1: MongoDB Atlas combines transactional and analytical processing in a multi-cloud data platform.
Atlas automatically optimizes how data is ingested, processed, and stored, maximizing the efficiency of the application’s operational and analytical workloads. These capabilities are packaged in an elegant and integrated multi-cloud data architecture.
Getting started
There are many ways you can get started building more intelligent apps. If you want to read more about the use cases and business drivers, download our App-Driven Analytics whitepaper. Alternatively, if you want to dive straight in, sign up for an account on MongoDB Atlas. From there, you can create a free database cluster, load your own data or our sample data sets, and explore what’s possible within the platform. The MongoDB Developer Center hosts an array of resources including tutorials, sample code, videos, and documentation organized by programming language and product.
Choosing the Right Tool for the Job: Understanding the Analytics Spectrum
Data-driven organizations share a common desire to get more value out of the data they're generating. To maximize that value, many of them are asking the same or similar questions:
How long does it take to get analytics and insights from our application data?
What would be the business impact if we could make that process faster?
What new experiences could we create by having analytics integrated directly within our customer-facing apps?
How do our developers access the tools and APIs they need to build sophisticated analytics queries directly into their application code?
How do we make sense of voluminous streams of time-series data?
We believe the answer to these questions in today's digital economy is application-driven analytics.
What is Application-Driven Analytics?
Traditionally, organizations have separated the applications that run the business from the analytics that manage the business. They're built by different teams, they serve different audiences, and the data itself is replicated and stored in different systems. There are benefits to the traditional way of doing things, and it's not going away. However, in today's digital economy, where creating competitive advantage and reducing costs and risk are paramount, organizations will continue to innovate on the traditional model. Today, those needs manifest themselves in the demand for smarter applications that drive better customer experiences and surface insights to initiate intelligent actions automatically. This all happens within the flow of the application, on live operational data, in real time. Alongside those applications, the business also wants faster insights so it can see what's happening, when it's happening. This is known as business visibility, and its goal is to increase efficiency by enabling faster decisions on fresher data. In-app analytics and real-time visibility are enabled by what we call application-driven analytics.
Find out why the MongoDB Atlas developer data platform was recently named a Leader in The Forrester Wave™: Translytical Data Platforms, Q4 2022.
You can find examples of application-driven analytics in multiple real-world industry use cases, including:
Hyper-personalization in retail
Fraud prevention in financial services
Preventative maintenance in manufacturing
Single subscriber view in telecommunications
Fitness tracking in healthcare
A/B testing in gaming
Where Application-Driven Analytics Fits in the Analytics Ecosystem
Application-driven analytics complements existing analytics processes where data is moved out of operational systems into centralized data warehouses and data lakes. In no way does it replace them. However, a broader spectrum of capabilities is now required to meet more demanding business requirements. By contrast with centralized analytics, application-driven analytics is designed to continuously query data in your operational systems. The freshest data comes in from the application serving many concurrent users at very low latency. It involves working on much smaller subsets of data compared to centralized analytics systems. Application-driven analytics typically works with hundreds to possibly a few thousand records at a time, and it runs less complex queries against that data. At the other end of the spectrum is centralized analytics. These systems run much more complex queries across massive data sets (hundreds of thousands or maybe millions of records, and maybe at petabyte scale) that have been ingested from many different operational data sources across the organization. Table 1 below identifies the required capabilities across the spectrum of different classes of analytics. These are designed to help MongoDB’s customers match appropriate technologies and skill sets to each business use case they are building for.
By mapping required capabilities to use cases, you can see how these different classes of analytics serve different purposes. If, for example, we're dealing with recommendations in an e-commerce platform, the centralized data warehouse or data lake will regularly analyze vast troves of first- and third-party customer data. This analysis is then blended with available inventory to create a set of potential customer offers. These offers are then loaded back into operational systems, where application-driven analytics is used to decide which offers are most relevant to the customer based on a set of real-time criteria, such as actual stock availability and which items a shopper might already have in their basket. This real-time decision-making is important because you wouldn't want to serve an offer on a product that can no longer be fulfilled or on an item a customer has already decided to buy. This example demonstrates why it is essential to choose the right tool for the job. Specifically, to build a portfolio of potential offers, the centralized data warehouse or data lake is an ideal fit. Such technologies can process hundreds of TBs of customer records and order data in a single query. The same technologies, however, are completely inappropriate when it comes to serving those offers to customers in real time. Centralized analytics systems are not designed to serve thousands of concurrent user sessions. Nor can they access real-time inventory or basket data in order to make low-latency decisions in milliseconds. Instead, for these scenarios, application-driven analytics served from an operational system is the right technology fit. As we can see, application-driven analytics complements traditional centralized analytics and in no way competes with it.
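The real-time step in this recommendation example can be illustrated with a small Python sketch. It assumes that a batch job in the warehouse or lake has already produced scored offers, with hypothetical `sku` and `score` fields. It also assumes that live stock levels and basket contents come from the operational system. The function keeps only offers that can still be fulfilled and are not already in the basket, then returns the best-scoring ones. In an application, this logic would typically run as a query or aggregation against operational data rather than in Python. The sketch only shows the decision itself.

```python
from typing import Any


def eligible_offers(
    offers: list[dict[str, Any]],
    stock: dict[str, int],
    basket: list[str],
    k: int = 3,
) -> list[str]:
    """Top-k precomputed offers that are in stock and not already in the basket."""
    in_basket = set(basket)
    candidates = [
        o for o in offers
        if stock.get(o["sku"], 0) > 0 and o["sku"] not in in_basket
    ]
    candidates.sort(key=lambda o: o["score"], reverse=True)
    return [o["sku"] for o in candidates[:k]]


# Example: A is sold out and C is already in the basket.
offers = [
    {"sku": "A", "score": 0.9},
    {"sku": "B", "score": 0.8},
    {"sku": "C", "score": 0.7},
    {"sku": "D", "score": 0.6},
]
print(eligible_offers(offers, {"A": 0, "B": 4, "C": 2, "D": 1}, basket=["C"], k=2))
# prints ['B', 'D']
```

The warehouse decides *what* might be worth offering. This cheap, per-request filter decides *whether* each offer still makes sense for this shopper right now.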
The benefits to organizations of using these complementary classes of analytics include:
Maximizing competitive advantage through smarter and more intelligent applications
Out-innovating and differentiating in the market
Improving customer experience and loyalty
Reducing cost by improving business visibility and efficiency
Through its design, MongoDB Atlas unifies the essential data services needed to deliver on application-driven analytics. It gives developers the tools, tech, and skills they need to infuse analytics into their apps. At the same time, Atlas provides business analysts, data scientists, and data engineers direct access to live data using their regular tools without impacting the app. For more information about how to implement app-driven analytics and how the MongoDB developer data platform gives you the tools needed to succeed, download our white paper, Application-Driven Analytics: Defining the Next Wave of Modern Apps.
MongoDB Named as a Leader in The Forrester Wave™: Translytical Data Platforms, Q4 2022
In The Forrester Wave™: Translytical Data Platforms, Q4 2022, Forrester describes translytical data platforms as being “designed to support transactional, operational, and analytical workloads without sacrificing data integrity, performance, and analytics scale.” Characterizing these platforms as next-generation data platforms, the report further notes that “Adoption of these platforms continues to grow strongly to support new and emerging business cases, including real-time integrated insights, scalable microservices, machine learning (ML), streaming analytics, and extreme transaction processing.” To help users understand this emerging technology landscape, Forrester published its previous Translytical Data Platforms Wave back in 2019. Three years on, Forrester has named MongoDB a Leader in its latest Translytical Data Platforms Wave. We believe MongoDB was named a Leader in this report due to the R&D investments made in further building out capabilities in MongoDB Atlas, our multi-cloud developer data platform. These investments were driven by the demands of the developer communities we work with day in, day out. You told us how you struggle to bring together all of the data infrastructure needed to power modern digital experiences, from transactional databases to analytics processing, full-text search, and streaming. This is exactly what our developer data platform offers. It provides an elegant, integrated, and fully managed data architecture accessed via a unified set of APIs. With MongoDB Atlas, developers are more productive: they ship code faster and improve it more frequently.
Translytics and the Rise of Application-Driven Analytics
Translytics is part of an important shift that we at MongoDB call application-driven analytics. By building smarter apps and increasing the speed of business insights, application-driven analytics gives you the opportunity to out-innovate your competitors and improve efficiency.
To do this, you can no longer rely only on copying data out of operational systems into separate analytics stores. Moving data takes time and creates too much separation between application events and actions. Instead, analytics processing has to be “shifted left” to the source of your data: the applications themselves. This is the shift MongoDB calls application-driven analytics. It’s a shift that impacts both the skills and the technologies developers and analytics teams use every day. This is why understanding the technology landscape is so important.
Overall, MongoDB is good for customers that are driving their strategy around developers who are tasked with building analytics into their applications.
The Forrester Wave™: Translytical Data Platforms, Q4 2022
Evaluating the top vendors in the Translytical Data Platforms Wave
Forrester evaluated 15 of the most significant translytical data platform vendors against 26 criteria. These criteria span current offering and strategy through to market presence. Forrester gave MongoDB the highest possible scores across eleven criteria, including:
Number of customers
Performance
Scalability
Dev Tools/API
Multi-model
Streaming
Cloud / On-prem / distributed architecture
Commercial model
The report states that “MongoDB ramps up its translytical offering aggressively,” and that “Organizations use MongoDB to support real-time analytics, systems of insight, customer 360, internet of things (IoT), and mobile applications.” Access your complimentary copy of the report here.
Customer Momentum
Many development teams start out using MongoDB as an operational database for both new cloud-native services and modernized legacy apps. More and more of these teams are now improving customer experience and speeding business insight by adopting application-driven analytics. Examples include:
Bosch for predictive maintenance using IoT sensor data.
Keller Williams for relevance-based property search and sales dashboarding.
Iron Mountain for AI-based information discovery and intelligence.
Volvo Connect for fleet management.
Getting started on your Translytics Journey
The MongoDB Atlas developer data platform is engineered to help you make the shift to translytics and application-driven analytics, leading to smarter apps and increased business visibility. The best way to get started is to sign up for an account on MongoDB Atlas. Then create a free database cluster, load your own data or our sample data sets, and explore what’s possible within the platform. The MongoDB Developer Center hosts an array of resources including tutorials, sample code, videos, and documentation organized by programming language and product. Whether you are a developer or a member of an analytics team, it's never been easier to get started enriching your transactional workloads with analytics!
5 Key Questions for App-Driven Analytics
Note: This article originally appeared in The New Stack.
Data that powers applications and data that powers analytics typically live in separate domains in the data estate. This separation exists mainly because they serve different strategic purposes for an organization. Applications are for engaging with customers, while analytics are for insight. The two classes of workloads have different requirements, such as read and write access patterns, concurrency, and latency. Therefore, organizations typically deploy purpose-built databases and duplicate data between them to satisfy the unique requirements of each use case. As distinct as these systems are, they're also highly interdependent in today's digital economy. Application data is fed into analytics platforms, where it's combined and enriched with other operational and historical data. There it's supplemented with business intelligence (BI), machine learning (ML), and predictive analytics, and sometimes fed back to applications to deliver richer experiences. Picture, for example, an e-commerce system that segments users by demographic data and past purchases and then serves relevant recommendations when they next visit the website. The process of moving data between the two types of systems is here to stay. But today, that’s not enough. The current digital economy, with the seamless user experiences that customers have come to expect, requires that applications also become smarter, autonomously taking intelligent actions in real time on our behalf. Along with smarter apps, businesses want insights faster so they know what is happening “in the moment.” To meet these demands, we can no longer rely only on copying data out of our operational systems into centralized analytics stores. Moving data takes time and creates too much separation between application events and analytical actions. Instead, analytics processing must be “shifted left” to the source of the data: the applications themselves.
We call this shift application-driven analytics. And it’s a shift that both developers and analytics teams need to be ready to embrace.
Find out why the MongoDB Atlas developer data platform was recently named a Leader in Forrester Wave: Translytical Data Platforms, Q4 2022.
Defining required capabilities
Embracing the shift is one thing; having the capabilities to implement it is another. In this article, we break down the capabilities required to implement application-driven analytics into the following five critical questions for developers:
How do developers access the tools they need to build sophisticated analytics queries directly into their application code?
How do developers make sense of voluminous streams of time series data?
How do developers create intelligent applications that automatically react to events in real time?
How do developers combine live application data in hot database storage with aged data in cooler cloud storage to make predictions?
How can developers bring analytics into applications without compromising performance?
To take a deeper dive into app-driven analytics, including specific requirements for developers compared with data analysts and real-world success stories, download our white paper: Application-Driven Analytics.
1. How do developers access the tools they need to build sophisticated analytics queries directly into their application code?
To unlock the latent power of application data that exists across the data estate, developers rely on the ability to perform CRUD operations, sophisticated aggregations, and data transformations. The primary tool for delivering these capabilities is an API that allows them to query data any way they need, from simple lookups to more sophisticated data processing pipelines. Developers need that API implemented as an extension of their preferred programming language so they can remain "in the zone" as they work through problems in a flow state.
Alongside a powerful API, developers need a versatile query engine and indexing that return results in the most efficient way possible. Without indexing, the database engine needs to go through each record to find a match. With indexing, the database can find relevant results faster and with less overhead.
Once developers start interacting with the database systematically, they need tools that give them visibility into query performance so they can tune and optimize. Powerful tools like MongoDB Compass let users monitor real-time server and database metrics as well as visualize performance issues. Additionally, a column-oriented representation of data can be used to power in-app visualizations and analytics on top of transactional data. Other MongoDB Atlas tools make performance recommendations, such as index and schema suggestions, to further streamline database queries.
2. How do you make sense of voluminous streams of time series data?
Time series data is typical in many modern applications. Internet of Things (IoT) sensor data, financial trades, clickstreams, and logs enable businesses to surface valuable insights. To help, MongoDB developed the highly optimized time series collection type and clustered indexes. Built on a highly compressible columnar storage format, time series collections can reduce storage and I/O overhead by as much as 70%.
Developers need the ability to query and analyze this data across rolling time windows while filling any gaps in incoming data. They also need a way to visualize this data in real time to understand complex trends.
Another key requirement is a mechanism that automates the management of the time series data lifecycle. As data ages, it should be moved out of hot storage to avoid congestion on live systems; however, there is still value in that data, especially in aggregated form to provide historical analysis.
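The rolling-window and gap-filling requirement described above can be illustrated with a minimal, in-memory Python sketch. It assumes hypothetical sensor readings that should arrive once per minute and fills missing points by carrying the last observed value forward; the names Reading, fill_gaps, and rolling_mean are illustrative, not a MongoDB API. In MongoDB itself this work would more typically run server-side in an aggregation pipeline (for example with the $densify, $fill, and $setWindowFields stages), so treat this only as a model of what such a pipeline computes.

```python
from datetime import datetime, timedelta

# A reading is (timestamp, value); the data below is hypothetical.
Reading = tuple[datetime, float]


def fill_gaps(readings: list[Reading], step: timedelta) -> list[Reading]:
    """Insert a point at every missing step, carrying the last value forward.

    Assumes readings are expected every `step` (last-observation-carried-forward).
    """
    if not readings:
        return []
    ordered = sorted(readings)
    filled = [ordered[0]]
    for ts, value in ordered[1:]:
        last_ts, last_value = filled[-1]
        # Emit synthetic points until the next real reading is within one step.
        while ts - last_ts > step:
            last_ts += step
            filled.append((last_ts, last_value))
        filled.append((ts, value))
    return filled


def rolling_mean(points: list[Reading], window: int) -> list[float]:
    """Trailing mean over each point and up to `window - 1` points before it."""
    if window < 1:
        raise ValueError("window must be at least 1")
    means = []
    for i in range(len(points)):
        values = [value for _, value in points[max(0, i - window + 1) : i + 1]]
        means.append(sum(values) / len(values))
    return means


# Usage: the 10:02 reading is missing, so it is filled with the 10:01 value.
t0 = datetime(2023, 1, 1, 10, 0)
readings = [
    (t0, 20.0),
    (t0 + timedelta(minutes=1), 22.0),
    (t0 + timedelta(minutes=3), 26.0),
]
points = fill_gaps(readings, timedelta(minutes=1))
print(rolling_mean(points, window=2))  # [20.0, 21.0, 22.0, 24.0]
```

Filling gaps before windowing is the key design choice here: once the readings sit on a regular grid, a window of N points always covers the same length of time.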
Organizations therefore need a systematic way of tiering aged time series data into low-cost object storage so they can still access and query that data for the insights it can surface.
3. How do you create intelligent applications that automatically react to events in real time?
Modern applications must be able to continuously analyze data in real time as they react to live events. Dynamic pricing in a ride-hailing service, recalculating delivery times in a logistics app due to changing traffic conditions, triggering a service call when a factory machine component starts to fail, or initiating a trade when stock markets move: these are just a few examples of in-app analytics that require continuous, real-time data analysis.
MongoDB Atlas has a host of capabilities to support these requirements. With change streams, for example, all database changes are published to an API, notifying subscribing applications when an event matches predefined criteria. Atlas triggers and functions can then automatically execute application code in response to the event, allowing you to build reactive, real-time, in-app analytics.
4. How do you combine live application data in hot database storage with aged data in cooler cloud storage to make predictions?
Data is increasingly distributed across different applications, microservices, and even cloud providers. Some of that data consists of newly ingested time series measurements or orders placed in your ecommerce store and resides in hot database storage. Other data sets consist of older data that might be archived in lower-cost cloud object storage. Organizations must be able to query, blend, and analyze fresh data coming in from microservices and IoT devices along with cooler data, APIs, and third-party data sources that reside in object stores, in ways not possible with regular databases.
The ability to bring all key data assets together is critical for understanding trends and making predictions, whether that's handled by a human or as part of a machine learning process.
5. How can you bring analytics into your applications without compromising their performance?
Live, customer-facing applications need to serve many concurrent users while ensuring low, predictable latency, and do it consistently at scale. Any slowdown degrades customer experience and drives customers toward competitors. In one frequently cited study, Amazon found that just 100 milliseconds of extra load time cost them 1% in sales. So, it's critical that analytics queries on live data don’t affect app performance. A distributed architecture can help you enforce isolation between the transactional and analytical sides of an application within a single database cluster. You can also use sophisticated replication techniques to move data to systems that are totally isolated but look like a single system to the app.
Next steps to app-driven analytics
As application-driven analytics becomes pervasive, the MongoDB Atlas developer data platform unifies the core data services needed to make smarter apps and improved business visibility a reality. Atlas does this by seamlessly bridging the traditional divide between transactional and analytical workloads in an elegant and integrated data architecture. With MongoDB Atlas, you get a single platform managing a common data set for both developers and analysts. With its flexible document data model and unified query interface, the Atlas platform minimizes data movement and duplication and eliminates data silos and architectural complexity, while unlocking faster, lower-cost analytics on live operational data. It does all this while meeting the most demanding requirements for resilience, scale, and data privacy.
For more information about how to implement app-driven analytics and how the MongoDB developer data platform gives you the tools needed to succeed, download our white paper, Application-Driven Analytics.
5 Steps to Replacing Elasticsearch and Solr with Atlas Search
What do a global auto manufacturer, a multinational media and entertainment company, and a challenger bank have in common? They have all made the switch from Elasticsearch to MongoDB Atlas Search to simplify their technology stack and ship application search faster. But what problems were they solving, and how did they migrate? We have a new 5-step guide that takes you through why they switched and how they did it.
The need for application search
Type almost anything into a search bar on sites like Google, Amazon, and Netflix and you are instantly presented with relevant results. Whether you make a typo or enter a partial search term, the search engine figures out what you are looking for. Results are conveniently sorted by relevance and are easy to navigate with features like highlighting, filters, and counts. Everyone now expects these same fast and intuitive search experiences in every application they use, whether at home or at work. However, creating these experiences is hard, and the burden falls on the developers and ops teams who have to build and run the underlying systems.
The pain of building application search
MongoDB has always focused on accelerating and simplifying how developers build with data for any class of application. From our very earliest MongoDB releases, we saw developers needing to expose the application data stored in their database to search and information discovery. For simple use cases, where it was enough to just match text in a field, developers were able to use the basic text search operators and indexes built into the MongoDB database. However, these lacked the sophisticated speed and relevance-tuning features offered by dedicated search engines, which are typically built on top of Apache Lucene. As a result, many developers ended up bolting an external search engine such as Elasticsearch or Apache Solr onto their database. Elasticsearch and Solr were (and remain) popular and proven.
However, as Figure 1 shows, they introduced a huge amount of complexity to the application stack, reducing developer velocity while driving up risk, complexity, and cost.
Figure 1: The pain of bolting on a search engine to your database
Working with the MongoDB community, our product designers and engineers explored ways to make building application search easier for developers, without compromising on the key features they needed. The result is MongoDB Atlas Search.
What is Atlas Search and why switch to it?
Atlas Search embeds a fully managed Apache Lucene search index directly alongside the database and automatically synchronizes data between them. By integrating the database, search engine, and sync pipeline into a single, fully managed platform, you compress three systems into one and simplify your technology stack. Engineering teams and application owners have reported 30% to 50% improvements in development velocity after adopting Atlas Search. This is because they get to:
Eliminate the synchronization tax. Data is automatically and dynamically synced from the Atlas database to the Atlas Search indexes. Teams avoid having to stand up and manage their own sync mechanism, write custom transformation logic, or remap search indexes as their database schema evolves. They escape the 10% of engineering cycles typically lost to manually recovering from sync failures, and invest that time in innovating for their users instead. (1)
Ship new features faster. They work with a single, unified API across both database and search operations, simplifying query development. There is no more context switching between multiple query languages, and with a single driver, build dependencies are streamlined so they can release faster. They can test queries and preview results with interactive tools to fine-tune performance and scoring before deploying them directly into application code.
Remove operational heavy lifting.
The fully managed Atlas platform automates provisioning, replication, patching, upgrades, scaling, security, and disaster recovery while providing deep performance visibility into both database and search. By working with a single system, teams avoid an exponential increase in the number of system components they need to design, test, secure, monitor, and maintain.
Figure 2: Dramatic architectural simplification with integrated database, sync, and search in MongoDB Atlas
5 steps to make the switch to Atlas Search
The benefits Atlas Search provides have led engineering teams across all industry sectors and geographies to make the switch from bolt-on search engines. Drawing on the experience gained from working with these teams, we have put together a repeatable 5-step methodology for replacing Elasticsearch and Solr. The guide steps you through how to:
Qualify target workloads for Atlas Search.
Migrate your indexes to Atlas Search.
Migrate your queries to Atlas Search.
Validate and relevance-tune your Atlas Search queries and indexes.
Size and deploy your Atlas Search infrastructure.
Figure 3: 5-step methodology for replacing Elasticsearch and Solr with Atlas Search
The guide wraps up with examples of customers that have made the switch and provides guidance on how to get started with Atlas Search.
What's next?
You can get started today by downloading the 5-step guide to replacing Elasticsearch and Solr with Atlas Search. The guide is designed to help you plan and execute your migration project. MongoDB's Professional Services team is also available to you as a trusted delivery partner. We can help you through any of the steps in the methodology or throughout your entire journey to Atlas Search. If you want to dig deeper into Atlas Search, spin it up at no cost on the Atlas Free Tier.
You can follow along with reference materials and tutorials in the Atlas Search documentation using our sample data sets, or load your own data for experimentation within your own sandbox. Welcome to a world where application search is, at last, simplified!
(1) Based on interviews with engineering teams that have replaced bolt-on search engines and the associated sync mechanisms.
Scale Out Without Fear or Friction: Live Resharding in MongoDB
Live resharding was one of the key enhancements delivered in our MongoDB 5.0 Major Release. With live resharding, you can change the shard key for your collection on demand as your application evolves, with no database downtime or complex data migrations. In this blog post, we will be covering:
Product developments that have made sharding more flexible
What you had to do before MongoDB 5.0 to reshard your collection, and how that changed with 5.0 live resharding
Guidance on the performance and operational considerations of using live resharding
Before that, we should discuss why you should shard at all, and the importance of selecting a good shard key, even though live resharding gives you the flexibility to change it at any time. Go ahead and skip the next couple of sections if you are already familiar with sharding!
Why Shard your Database?
Sharding enables you to distribute your data across multiple nodes. You do that to:
Scale out horizontally: accommodate growing data or application load by sharding once your application starts to get close to the capacity limits of a single replica set.
Enforce data locality: for example, pin data to shards that are provisioned in specific regions so that the database delivers low-latency local access and maintains data sovereignty for regulatory compliance.
Sharding is the most effective way to scale a database horizontally, and MongoDB was developed to support sharding natively. Sharding in MongoDB is transparent to your applications, and it’s elastic, so you can add and remove shards at any time.
The Importance of Selecting a Good Shard Key
MongoDB’s native sharding has always been highly flexible: you can select any field or combination of fields in your documents to shard on. This means you can select a shard key that is best suited to your application’s requirements. The choice of shard key is important because it defines how data is distributed across the available shards.
Ideally, you want to select a shard key that:
Gives you low-latency, high-throughput reads and writes by matching data distribution to your application’s data access patterns.
Evenly distributes data across the cluster so you avoid any one shard taking most of the load (i.e., a “hot shard”).
Provides linear scalability as you add more shards in the future.
While you have the flexibility to select any field(s) of your documents as your shard key, it was previously difficult to change the shard key later on. This made some developers fearful of sharding. If you chose a shard key that doesn’t work well, or if application requirements change and the shard key no longer suits the new access patterns, the impact on performance could be significant. At the time of writing, no other mainstream distributed database allows users to change shard keys, but we wanted to give users this ability.
Making Shard Keys More Flexible
Over the past few releases, MongoDB engineers have been working to provide more sharding flexibility to users:
MongoDB 4.2 introduced the ability to modify a shard key’s value. Under the covers, the modification process uses a distributed, multi-document ACID transaction to change the placement of a document in a sharded cluster. This is useful when you want to rehome a document to a different geographic region or age data out to a slower storage tier.
MongoDB 4.4 went further with the ability to refine the shard key for a collection by adding a suffix to an existing key.
Both of these enhancements made sharding more flexible, but they didn’t help if you needed to reshard your collection using an entirely different shard key.
Manual Resharding: Before MongoDB 5.0
Resharding a collection was a manual and complex process that could only be achieved through one of two approaches:
Dumping the entire collection and then reloading it into a new collection with the new shard key.
This is an offline process, so your application is down until data reloading is complete. For example, it could take several days to dump and reload a 10 TB+ collection on a three-shard cluster.
Performing a custom migration that involved writing all the data from the old cluster to a new cluster with the resharded collection. You had to write the query routing and migration logic, and then constantly check the migration progress to ensure all data had been successfully migrated. Custom migrations entail less downtime, but they come with a lot of overhead. They are highly complex, labor-intensive, risky, and expensive (as you had to run two clusters side by side). It took one MongoDB user three months to complete the live migration of 10 billion documents.
How this Changed with MongoDB 5.0: Live Resharding
We made manual resharding a thing of the past with MongoDB 5.0. With 5.0, you just run the reshardCollection command from the shell, point at the database and collection you want to reshard, specify the new shard key, and let MongoDB take care of the rest:
db.adminCommand({ reshardCollection: "<database>.<collection>", key: <shardkey> })
When you invoke the reshardCollection command, MongoDB clones your existing collection into a new collection with the new shard key, then starts applying all new oplog updates from the existing collection to the new collection. This enables the database to keep pace with incoming application writes. When all oplog updates have been applied, MongoDB will automatically cut over to the new collection and remove the old collection in the background.
Let's walk through an example where live resharding would really help a user. The user has an orders collection. In the past, they needed to scale out and chose the order_id field as the shard key. Now they realize that they have to regularly query each customer’s orders to quickly display order history. This query does not use the order_id field.
To return the results for such a query, all shards need to provide data for the query. This is called a scatter-gather query. It would have been more performant and scalable to have each customer's orders localized to a single shard, avoiding scatter-gather, cross-shard queries. They realize that the optimal shard key would be { customer_id: 1, order_id: 1 } rather than just order_id. With MongoDB 5.0’s live resharding, the user can just run the reshard command, and MongoDB will reshard the orders collection for them using the new shard key, without having to bring the database and the application down. Watch our short Live Resharding talk from MongoDB.Live 2021 to see a demo with this exact example.
Not only can you change the field(s) of a shard key, you can also revisit your sharding strategy, switching between ranged, hashed, and zoned sharding.
Live Resharding: Performance and Operational Considerations
Even with the flexibility that live resharding gives you, it is still important to evaluate your choice of shard key carefully. Our documentation provides guidance to help you choose the best shard key. Of course, live resharding makes it much easier to change that key if your original choice turns out not to be optimal, or if your application changes in a way that you hadn’t anticipated. If you find yourself in this situation, it is essential to plan for live resharding.
What do you need to be thinking about before resharding?
Make sure you have sufficient storage capacity available on each node of your cluster. Since MongoDB temporarily clones your existing collection, spare storage capacity needs to be at least 1.2x the size of the collection you are going to reshard: 1x for the clone, plus 20% more storage to buffer writes that occur during the resharding process.
For example, if the size of the collection you want to reshard is 2 TB compressed, you should have at least 2.4 TB of free storage in the cluster before starting the resharding operation.
While the resharding process is efficient, it still consumes additional compute and I/O resources. You should therefore make sure you are not consistently running the database at or close to peak system utilization. If you see CPU usage in excess of 80% or I/O usage above 50%, you should scale up your cluster to larger instance sizes before resharding. Once resharding is done, it's fine to scale back down to your regular instance sizes.
Before you run resharding, you should update any queries that reference the existing shard key to include both the current shard key and the new shard key. When resharding is complete, you can remove the old shard key from your queries.
Review the resharding requirements documentation for a full rundown of the key factors to consider before resharding your collection.
What should you expect during resharding?
The total duration of the resharding process depends on the number of shards, the size of your collection, and the write load to your collection. For a constant data size, the more shards, the shorter the resharding duration. In a simple proof of concept (POC) on MongoDB Atlas, a 100 GB collection took just 2 hours 45 minutes to reshard on a 4-shard cluster and 5 hours 30 minutes on a 2-shard cluster. The process scales linearly with data size and number of shards, so a 1 TB collection will take about 10 times longer to reshard than a 100 GB collection. Of course, your mileage may vary based on the read/write ratio of your application along with the speed and quality of your underlying hardware infrastructure.
While resharding is in flight, you should expect the following impacts to application performance:
The latency and throughput of reads against the collection that is being resharded will be unaffected.
Even though we are writing to the existing collection and then applying oplog entries to both its replicas and the cloned collection, you should expect negligible impact to write latency, given enough spare CPU. If your cluster is CPU-bound, expect a latency increase of 5 to 10% during the cloning phase and 20 to 50% during the applying phase (*).
As long as you meet the aforementioned capacity requirements, the latency and throughput of operations to other collections in the database won't be impacted.
(*) Note: If you notice unacceptable write latencies to your collection, we recommend you stop resharding, increase your shard instance sizes, and then run resharding again. The abort and cleanup of the cloned collection are instantaneous.
If your application has periods with less traffic, reshard your collection during those periods if possible.
All of your existing isolation, consistency, and durability guarantees are honored while resharding is running. The process itself is resilient and crash-safe, so if any shard undergoes a replica set election, there is no impact to resharding; it simply resumes when the new primary has been elected.
You can monitor the resharding progress with the $currentOp pipeline stage. It reports an estimate of the remaining time to complete the resharding operation. You can also abort the resharding process at any time.
What happens after resharding is complete?
When resharding is done and the two collections are in sync, MongoDB will automatically cut over to the new collection and remove the old collection for you, reclaiming your storage and returning latency to normal. By default, cutover takes up to two seconds, during which time the collection will not accept writes, so your application will see a short spike in write latency. Any writes that time out are automatically retried by our drivers, so exceptions are not surfaced to your users.
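The monitoring and abort steps above, together with the reshardCollection and commitReshardCollection commands mentioned in this article, can be expressed as admin command documents. The following is a minimal Python sketch that only builds those documents, so it runs without a cluster. The namespace shop.orders is hypothetical, the helper names are illustrative, and the commented-out lines assume the PyMongo driver connected to a mongos router with a placeholder connection string. Field names follow the MongoDB 5.0 command and $currentOp documentation as we understand it; check them against the server version you run.

```python
from typing import Any

Document = dict[str, Any]


def _check_namespace(ns: str) -> str:
    # Resharding commands take a full "<database>.<collection>" namespace.
    db, _, coll = ns.partition(".")
    if not db or not coll:
        raise ValueError(f"expected '<database>.<collection>', got {ns!r}")
    return ns


def reshard_command(ns: str, new_key: Document) -> Document:
    # The command name must be the first field; dicts keep insertion order.
    return {"reshardCollection": _check_namespace(ns), "key": new_key}


def progress_pipeline(ns: str) -> list[Document]:
    # Run on the admin database; matching operations include an estimate
    # such as remainingOperationTimeEstimatedSecs.
    return [
        {"$currentOp": {"allUsers": True, "localOps": False}},
        {"$match": {"type": "op",
                    "originatingCommand.reshardCollection": _check_namespace(ns)}},
    ]


def commit_command(ns: str) -> Document:
    # Blocks writes early to force the operation to complete sooner.
    return {"commitReshardCollection": _check_namespace(ns)}


def abort_command(ns: str) -> Document:
    # Stops resharding; the cloned collection is cleaned up.
    return {"abortReshardCollection": _check_namespace(ns)}


# Hypothetical usage with PyMongo against a mongos (not executed here):
# from pymongo import MongoClient
# client = MongoClient("mongodb://<mongos-host>:27017")
# client.admin.command(reshard_command("shop.orders", {"customer_id": 1, "order_id": 1}))
# # Progress can be polled from another connection while resharding runs:
# for op in client.admin.aggregate(progress_pipeline("shop.orders")):
#     print(op.get("remainingOperationTimeEstimatedSecs"))

cmd = reshard_command("shop.orders", {"customer_id": 1, "order_id": 1})
print(cmd)  # {'reshardCollection': 'shop.orders', 'key': {'customer_id': 1, 'order_id': 1}}
```

Keeping the command documents in small builder functions makes the namespace check and field names easy to unit test without a running cluster.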
The cutover interval is tunable. Resharding will be quicker if you raise the interval above the two-second default, with the trade-off that the period of write unavailability will be longer. If you dial it down below two seconds, the period of write unavailability will be shorter. However, the resharding process will take longer to complete, and the odds of the window ever being short enough to cut over will be diminished.
You can block writes early to force resharding to complete by issuing the commitReshardCollection command. This is useful if the current time estimate to complete the resharding operation is an acceptable duration for your collection to block writes.
What you Get with Live Resharding
Live resharding is available wherever you run MongoDB, whether that’s in our fully managed Atlas data platform in the cloud, with Enterprise Advanced, or with the Community Edition of MongoDB. To recap how you benefit from live resharding:
Evolve with your apps with simplicity and resilience: As your applications evolve or as you need to improve on the original choice of shard key, a single command kicks off resharding. This process is automated, resilient, and non-disruptive to your application.
Compress weeks/months to minutes/hours: Live resharding is fully automated, so you eliminate disruptive and lengthy manual data migrations. To make scaling out even easier, you can evaluate the effectiveness of different shard keys in dev/test environments before committing your choice to production. Even then, you can change your shard key when you want to.
Extend flexibility and agility across every layer of your application stack: You have seen how MongoDB’s flexible document data model instantly adapts as you add new features to your app. With live resharding, you get that same flexibility when you shard. New features or new requirements? Simply reshard as and when you need to.
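The pre-resharding guidance from the performance and operational considerations earlier in this article boils down to a few calculations. The following minimal Python sketch encodes them: the 1.2x storage headroom, the 80% CPU and 50% I/O thresholds, and a duration estimate extrapolated linearly from the 100 GB, 4-shard POC figure (2 hours 45 minutes). The constants and function names are illustrative assumptions drawn from this article's numbers, not a MongoDB tool, and the estimate ignores write load and hardware differences.

```python
# Figures taken from this article; treat them as rules of thumb, not limits.
STORAGE_HEADROOM = 1.2   # free space >= 1.2x the collection size
CPU_LIMIT_PCT = 80.0     # scale up first if CPU is above this...
IO_LIMIT_PCT = 50.0      # ...or if I/O utilization is above this

# POC reference point: 100 GB resharded on 4 shards in 2h45m (165 minutes).
REF_SIZE_GB, REF_SHARDS, REF_MINUTES = 100.0, 4, 165.0


def required_free_storage(collection_size: float) -> float:
    """Free storage needed before resharding, in the same unit as the input."""
    return collection_size * STORAGE_HEADROOM


def needs_scale_up(cpu_pct: float, io_pct: float) -> bool:
    """True if the cluster is too busy to reshard comfortably."""
    return cpu_pct > CPU_LIMIT_PCT or io_pct > IO_LIMIT_PCT


def estimate_minutes(size_gb: float, shards: int) -> float:
    """Linear in data size, inversely proportional to shard count."""
    if shards < 1:
        raise ValueError("a sharded cluster needs at least one shard")
    return REF_MINUTES * (size_gb / REF_SIZE_GB) * (REF_SHARDS / shards)


def format_minutes(minutes: float) -> str:
    hours, mins = divmod(round(minutes), 60)
    return f"{hours}h {mins:02d}m"


# Usage: the worked examples from the article.
print(required_free_storage(2.0))                # 2.4 (TB free for a 2 TB collection)
print(format_minutes(estimate_minutes(100, 2)))  # 5h 30m, matching the 2-shard POC
print(format_minutes(estimate_minutes(1000, 4))) # 27h 30m, 10x the 100 GB time
print(needs_scale_up(cpu_pct=85, io_pct=30))     # True
```

This is only a first-order extrapolation; as the article notes, the read/write ratio and the underlying hardware affect real durations, so a trial run in a dev/test environment is a better guide.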
Summary
Live resharding is a huge step forward in the state of distributed systems, and is just the start of an exciting and fast-paced MongoDB roadmap that will make sharding even easier, more flexible, and automated. If you want to dig deeper, please take a look at the Live Resharding session recording from our developer conference and review the resharding documentation. To learn more about MongoDB 5.0 and our new Rapid Releases, download our guide to what’s new in MongoDB.
Data and the European Landscape: 3 Trends for 2022
The past two years have brought massive changes for IT leaders: large and complex cloud migrations; unprecedented numbers of people suddenly working, shopping, and learning from home; and a burst in demand for digital-first experiences. Like everyone else, we are hoping that 2022 isn’t so disruptive (fingers crossed!), but our customer conversations in Europe lead us to believe the new year will bring new business priorities. We’re already noticing three shifts: changing conversations around vendor lock-in, thanks to the Digital Markets Act; a new enthusiasm for combining operational and analytical data to drive new insights faster; and a more strategic embrace of sustainability. Here’s how we see these trends playing out in 2022.
Digital Markets Act draws new attention to cloud vendor lock-in in Europe
We’ve heard plenty about the European Commission’s Digital Markets Act, which, in the name of ensuring fair and open digital markets, would place new restrictions on companies that are deemed to be digital “gatekeepers” in the region. That discussion will be nothing compared to the vigorous debate we expect once the EU begins the very tricky political business of determining exactly which companies will fall under the act. If the EU sets the bar for revenues, users, and market size high enough, it’s possible that the regulation will end up affecting only Facebook, Amazon, Google, Apple, and Microsoft. But a European group representing 2,500 CIOs and almost 700 organisations is now pushing to have the regulation encompass more software companies. Their main concern centers on “distorted competition” in cloud infrastructure services and a worry that companies are being locked into one cloud vendor.
One trend likely to grow in 2022 as a pushback against cloud vendor lock-in is the adoption of multi-cloud strategies.
We should expect to see more organisations in the region pursuing multi-cloud environments as a means to improve business continuity and agility, whilst being able to access best-of-breed services from each cloud provider. As we have always said: “It’s fine to date your cloud provider, but don’t ever marry them.”
The convergence of operational and analytical data
The processing of operational and analytical data is almost always contained in different data systems, each tuned to its use case and managed by separate teams. But because that data lives in separate places, it’s almost impossible for organisations to generate insights and automate actions in real time against live data.
We believe 2022 is the year we’ll see a critical mass of companies in the region make significant progress toward converging their operational and analytical data. We’re already starting to see some of the principles of microservices in operational applications, such as domain ownership, applied to analytics as well. We’re hearing about this from many of our customers locally, who are looking at MongoDB as a data platform that allows them to perform queries across both real-time and historical data, using a unified platform and a single query API. As a result, the applications they build become more intelligent and contextual to their users, while avoiding dependencies on centralized analytics teams that otherwise slow down how quickly new, data-driven experiences can be released.
Sustainability drives local strategic IT choice
Technology always has some environmental cost. Sometimes that’s obvious, such as the energy needs and emissions associated with Bitcoin mining. More often, though, the environmental costs are well hidden. The European Green Deal commits the European Union to reducing emissions by 55% by 2030, with a focus on sustainable industry. With the U.N.
Climate Change Conference (COP26) having recently concluded in Glasgow, and with Europe coming off the hottest summer on record, climate issues have become top of mind. That means our customers are increasingly looking to make their technical operations more sustainable, including in their choice of cloud provider and data centers. According to research from IDC, more than 20% of CxOs say that sustainability is now important in selecting a strategic cloud service provider, and some 29% of CxOs are including sustainability in their RFPs for cloud services. Most interesting, 26% say they are willing to switch to providers with better sustainability credentials.
Historically, it’s been difficult to make a switch like that. That’s part of the reason we built MongoDB Atlas: to give our customers the flexibility to run in any region, with any of the three largest cloud providers, to make it easy to switch between them, and even to run a single database cluster across them. Publicly available information about the footprint of individual regions and even single data centers will make it simpler for companies to make informed decisions. Already, at least one cloud platform has added indicators to regions with the lowest carbon footprint.
So while we hope 2022 will not be as disruptive as the years gone by, it will still bring seminal changes to our industry. These changes will also prompt organisations toward more agile, cohesive, and sustainable data platform strategies as they seek to gain competitive advantage and exceed customer expectations.
Source: IDC, European Customers Engage Services Providers at All Stages of Their Cloud Journey, IDC Survey Spotlight, Doc #EUR248484021, Dec 2021