Don’t Just Build Agents, Build Memory-Augmented AI Agents

Insight Breakdown: This piece aims to reveal that regardless of architectural approach—whether Anthropic's multi-agent coordination or Cognition's single-threaded consolidation—sophisticated memory management emerges as the fundamental determinant of agent reliability, believability, and capability. It marks the evolution from stateless AI applications toward truly intelligent, memory-augmented systems that learn and adapt over time.

AI agents are intelligent computational systems that can perceive their environment, make informed decisions, use tools, and, in some cases, maintain persistent memory across interactions—evolving beyond stateless chatbots toward autonomous action. Multi-agent systems coordinate multiple specialized agents to tackle complex tasks, like a research team where different agents handle searching, fact-checking, citations, and research synthesis. Recently, two major players in the AI space released different perspectives on how to build these systems. Anthropic released an insightful piece highlighting their learnings on building multi-agent systems for deep research use cases. Cognition also released a post titled "Don't Build Multi-Agents," which appears to contradict Anthropic's approach directly. Two things stand out:

Both pieces are right
Yes, this sounds contradictory, but working with customers building agents of all scales and sizes in production, we find that both the use case and the application mode, in particular, are key factors to consider when determining how to architect your agent(s). Anthropic's multi-agent approach makes sense for deep research scenarios where sustained, comprehensive analysis across multiple domains over extended periods is required. Cognition's single-agent approach is optimal for conversational agents or coding tasks where consistency and coherent decision-making are paramount. The application mode—whether research assistant, conversational agent, or coding assistant—fundamentally shapes the optimal memory architecture. Anthropic also highlights this point when discussing the downsides of multi-agent architecture: "For instance, most coding tasks involve fewer truly parallelizable tasks than research, and LLM agents are not yet great at coordinating and delegating to other agents in real time." (Anthropic, Building Multi-Agent Research System)

Both pieces are saying the same thing
Memory is the foundational challenge that determines agent reliability, believability, and capability. Anthropic emphasizes sophisticated memory management techniques (compression, external storage, context handoffs) for multi-agent coordination. Cognition emphasizes context engineering and continuous memory flow to prevent the fragmentation that destroys agent reliability. Both teams arrived at the same core insight: agents fail without robust memory management. Anthropic chose to solve memory distribution across multiple agents, while Cognition chose to solve memory consolidation within single agents. The key takeaway from both pieces for AI engineers, or anyone developing an agentic platform, is: don't just build agents, build memory-augmented AI agents. With that out of the way, the rest of this piece will provide you with the essential insights from both pieces that we think are important and point to the memory management principles and design patterns we've observed in our customers' agent-building work.
The key insights
If you are building your agentic platform from scratch, you can extract much value from Anthropic's approach to building multi-agent systems, particularly their sophisticated memory management principles, which are essential for effective agentic systems. Their implementation reveals critical design considerations, including techniques to overcome context window limitations through compression, function calling, and storage functions that enable sustained reasoning across extended multi-agent interactions—foundational elements that any serious agentic platform must address from the architecture phase. Key insights:

- Agents are overthinkers
- Multi-agent systems trade efficiency for capability
- Systematic agent observation reveals failure patterns
- Context windows remain insufficient for extended sessions
- Context compression enables distributed memory management

Let's go a bit deeper into how these insights translate into practical implementation strategies.

Agents are overthinkers
Anthropic researchers mentioned using explicit guidelines to steer agents into allocating the right amount of resources (tool calls, sub-agent creation, etc.); otherwise, they tend to overengineer solutions. Without proper constraints, the agents would spawn excessive subagents for simple queries, conduct endless searches for nonexistent information, and apply complex multi-step processes to tasks requiring straightforward responses. Explicit guidance for agent behavior isn't entirely new—system prompts and instructions are typical parameters in most agent frameworks. However, the key insight here goes deeper than traditional prompting approaches. When agents are given access to resources such as data, tools, and the ability to create sub-agents, there needs to be explicit, unambiguous direction on how these resources are expected to be leveraged to address specific tasks. This goes beyond system prompts and instructions into resource allocation guidance, operational constraints, and decision-making boundaries that prevent agents from overengineering solutions or misusing available capabilities. Take, for example, the OpenAI Agent SDK, which exposes several parameters for describing the behavior of resources to the agent, such as handoff_description, used in multi-agent systems built with the SDK; this argument specifies how a subagent should be leveraged. Another is the explicit tool_use_behavior argument, which, as the name suggests, describes to the agent how a tool should be used (a minimal sketch follows below). The key takeaway for AI engineers is that multi-agent system implementation requires an extensive thinking process covering which tools the agents are expected to leverage, which subagents exist in the system, and how resource utilization is communicated to the calling agent. When implementing resource allocation constraints for your agents, also consider how traditional approaches of managing multiple specialized databases (vector DB for embeddings, graph DB for relationships, relational DB for structured data) compound the complexity problem and introduce tech stack sprawl, an anti-pattern to rapid AI innovation.

Multi-agent systems trade efficiency for capability
While multi-agent architectures can utilize more tokens and parallel processing for complex tasks, Anthropic found operational costs significantly higher due to coordination overhead, context management, and the computational expense of maintaining a coherent state across multiple agents.
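Returning to the resource-allocation guidance above, here is a minimal sketch using the OpenAI Agents SDK for Python. The agent names, the tool, and the budget wording are illustrative assumptions (not Anthropic's or Cognition's actual configuration), and parameter details should be checked against the SDK version you use.

# Minimal sketch: explicit resource-allocation guidance with the OpenAI Agents SDK.
# Agent names, tool, and instructions are illustrative assumptions.
from agents import Agent, Runner, function_tool

@function_tool
def search_web(query: str) -> str:
    """Hypothetical search tool; replace with a real implementation."""
    return f"results for: {query}"

# Sub-agent with an explicit description of when work should be handed to it.
fact_checker = Agent(
    name="fact_checker",
    handoff_description=(
        "Use only to verify claims already gathered; "
        "do not hand off open-ended research here."
    ),
    instructions="Verify the claims you are given and cite sources.",
    tools=[search_web],
    # Stop after the first tool call instead of looping on further LLM turns.
    tool_use_behavior="stop_on_first_tool",
)

coordinator = Agent(
    name="research_coordinator",
    instructions=(
        "Answer simple questions directly. Use at most one handoff and "
        "three tool calls per request; do not over-research."
    ),
    tools=[search_web],
    handoffs=[fact_checker],
)

result = Runner.run_sync(coordinator, "Summarize the latest MongoDB release.")
print(result.final_output)

The specific values matter less than the pattern: handoff and tool-usage expectations are stated explicitly rather than left for the agent to infer.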
In some cases, two heads are better than one, but they are also expensive within multi-agent systems. One thing we note here is that the use case in Anthropic's multi-agent system is deep research. This use case requires extensive exploration of resources, including lengthy research papers, sites, and documentation, to accumulate enough information to formulate the result (typically a 2,000+ word essay on the user's starting prompt). In other use cases, such as automated workflows in which agents represent processes within the workflow, there might not be as much token consumption, especially if the process encapsulates deterministic steps such as database read and write operations and its output consists of short execution results or summaries. The coordination overhead challenge becomes particularly acute when agents need to share state across different storage systems. Rather than managing complex data synchronization between specialized databases, MongoDB's native ACID compliance ensures that multi-agent handoffs maintain data integrity without external coordination mechanisms. This unified approach reduces both the computational overhead of distributed state management and the engineering complexity of maintaining consistency across multiple storage systems.

Context compression enables distributed memory management
Beyond reducing inference costs, compression techniques allow multi-agent systems to maintain shared context across distributed agents. Anthropic's approach involves summarizing completed work phases and storing essential information in external memory before agents transition to new tasks. This, coupled with the insight that context windows remain insufficient for extended sessions, points to the fact that prompt compression or compaction techniques are still relevant and useful in a world where LLMs have extensive context windows. Even with a 200K-token (approximately 150,000-word) capacity, Anthropic's agents in multi-round conversations require sophisticated context management strategies, including compression, external memory offloading, and spawning fresh agents when limits are reached. We previously partnered with Andrew Ng and DeepLearning.AI on a course covering prompt compression techniques and retrieval-augmented generation (RAG) optimization.

Systematic agent observation reveals failure patterns
Systematic agent observation represents one of Anthropic's most practical insights. Essentially, rather than relying on guesswork (or vibes), the team built detailed simulations using identical production prompts and tools and then systematically observed step-by-step execution to identify specific failure modes. This phase in an agentic system has an extensive operational cost. From our perspective, working with customers building agents in production, this methodology addresses a critical gap most teams face: understanding how your agents actually behave versus how you think they should behave. Anthropic's approach immediately revealed concrete failure patterns that many of us have encountered but struggled to diagnose systematically. Their observations uncovered agents overthinking simple tasks, as we mentioned earlier, using verbose search queries that reduced effectiveness, and selecting inappropriate tools for specific contexts.
As they note in their piece: " This immediately revealed failure modes: agents continuing when they already had sufficient results, using overly verbose search queries, or selecting incorrect tools. Effective prompting relies on developing an accurate mental model of the agent. " The key insight here is moving beyond trial-and-error prompt engineering toward purposeful debugging . Instead of making assumptions about what should work, Anthropic demonstrates the value of systematic behavioral observation to identify the root causes of poor performance. This enables targeted prompt improvements based on actual evidence rather than intuition. We find that gathering, tracking, and storing agent process memory serves a dual critical purpose: not only is it vital for agent context and task performance, but it also provides engineers with the essential data needed to evolve and maintain agentic systems over time. Agent memory and behavioral logging remain the most reliable method for understanding system behavior patterns, debugging failures, and optimizing performance, regardless of whether you implement a single comprehensive agent or a system of specialized subagents collaborating to solve problems. MongoDB's flexible document model naturally accommodates the diverse logging requirements for both operational memory and engineering observability within a single, queryable system. One key piece that would be interesting to know from the Anthropic research team is what evaluation metrics they use. We’ve spoken extensively about evaluating LLMs in RAG pipelines, but what new agentic system evaluation metrics are developers working towards? We are answering these questions ourselves and have partnered with Galileo, a key player in the AI Stack, whose focus is purely on evaluating RAG and Agentic applications and making these systems reliable for production. Our learning will be shared in this upcoming webinar , taking place on July 17, 2025. However, for anyone building agentic systems, this represents a shift in development methodology—building agents requires building the infrastructure to understand them, and sandbox environments might become a key component of the evaluation and observability stack for Agents. Advanced implementation patterns Beyond the aforementioned core insights, Anthropic's research reveals several advanced patterns worth examining: The Anthropic piece hints at the implementation of advanced retrieval mechanisms that go beyond vector-based similarity between query vectors and stored information. Their multi-agent architecture enables sub-agents to call tools (an approach also seen in MemGPT ) to store their work in external systems, then pass lightweight references—presumably unique identification numbers of summarized memory components—back to the coordinator. We generally emphasize the importance of the multi-model retrieval approach to our customers and developers, where hybrid approaches combine multiple retrieval methods—using vector search to understand intent while simultaneously performing text search for specific product details. MongoDB's native support for vector similarity search and traditional indexing within a single system eliminates the need for complex reference management across multiple databases, simplifying the coordination mechanisms that Anthropic's multi-agent architecture requires. 
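The "store work externally, pass back a lightweight reference" pattern described above can be sketched in a few lines of PyMongo. The database, collection, and field names below are illustrative assumptions, not Anthropic's implementation.

# Minimal sketch: a sub-agent persists a compressed summary and hands the
# coordinator only a small id. Names and fields are illustrative assumptions.
from datetime import datetime, timezone
from bson import ObjectId
from pymongo import MongoClient

client = MongoClient("mongodb+srv://<user>:<password>@<cluster>/")  # placeholder URI
memory = client["agent_platform"]["agent_memory"]

def store_subagent_result(agent_id: str, phase: str, summary: str) -> str:
    """Persist a compressed phase summary and return only its id."""
    doc = {
        "agent_id": agent_id,
        "phase": phase,
        "summary": summary,  # compressed summary, not the full transcript
        "created_at": datetime.now(timezone.utc),
    }
    return str(memory.insert_one(doc).inserted_id)

def load_reference(memory_id: str) -> dict:
    """The coordinator rehydrates the summary only when it actually needs it."""
    return memory.find_one({"_id": ObjectId(memory_id)})

ref = store_subagent_result("searcher-1", "source-gathering", "Found 12 relevant sources on ...")
print(load_reference(ref)["summary"])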
The Anthropic team implements continuity in the agent execution process by establishing clear boundaries between task completion and summarizing the current phase before moving to the next task. This creates a scalable system where memory constraints don't bottleneck the research process, allowing for truly deep and comprehensive analysis that spans beyond what any single context window could accommodate. In a multi-agent pipeline, each sub-agent produces partial results—intermediate summaries, tool outputs, extracted facts—and then hands them off into a shared “memory” database. Downstream agents will then read those entries, append their analyses, and write updated records back. Because these handoffs happen in parallel, you must ensure that one agent’s commit doesn’t overwrite another’s work or that a reader doesn’t pick up a half-written summary. Without atomic transactions and isolation guarantees, you risk: Lost updates , where two agents load the same document, independently modify it, and then write back, silently discarding one agent’s changes. Dirty or non-repeatable reads , where an agent reads another’s uncommitted or rolled-back write, leading to decisions based on phantom data. To coordinate these handoffs purely in application code would force you to build locking layers or distributed consensus, quickly becoming a brittle, error-prone web of external orchestrators. Instead, you want your database to provide those guarantees natively so that each read-modify-write cycle appears to execute in isolation and either fully succeeds or fully rolls back. MongoDB's ACID compliance becomes crucial here, ensuring that these boundary transitions maintain data integrity across multi-agent operations without requiring external coordination mechanisms that could introduce failure points. Application mode is crucial when discussing memory implementation . In Anthropic's case, the application functions as a research assistant, while in other implementations, like Cognition's approach, the application mode is conversational. This distinction significantly influences how agents operate and manage memory based on their specific application contexts. Through our internal work and customer engagements, we extend this insight to suggest that application mode affects not only agent architecture choices but also the distinct memory types used in the architecture. AI agents need augmented memory Anthropic’s research makes one thing abundantly clear: context window is not all you need. This extends to the key point that memory and agent engineering are two sides of the same coin. Reliable, believable, and truly capable agents depend on robust, persistent memory systems that can store, retrieve, and update knowledge over long, complex workflows. As the AI ecosystem continues to innovate on memory mechanisms, mastering sophisticated context and memory management approaches will be the key differentiator for the next generation of successful agentic applications. Looking ahead, we see “Memory Engineering” or “Memory Management” emerge as a key specialization within AI Engineering, focused on building the foundational infrastructure that lets agents remember, reason, and collaborate at scale. For hands-on guidance on memory management, check out our webinar on YouTube, which covers essential concepts and proven techniques for building memory-augmented agents. Head over to the MongoDB AI Learning Hub to learn how to build and deploy AI applications with MongoDB.
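Tying back to the handoff-integrity discussion above (lost updates, dirty reads), here is a minimal PyMongo sketch of an atomic read-modify-write handoff inside a transaction. It assumes an Atlas cluster or replica set, and the collection and field names are illustrative.

# Minimal sketch: each agent's read-modify-write runs in a transaction so a
# concurrent commit cannot be silently overwritten and readers never see a
# half-written update. Collection and field names are illustrative.
from pymongo import MongoClient

client = MongoClient("mongodb+srv://<user>:<password>@<cluster>/")  # placeholder URI
shared = client["agent_platform"]["shared_memory"]

def append_analysis(doc_id, agent_id: str, note: str) -> None:
    with client.start_session() as session:
        def txn(s):
            # Assumes the shared document already exists.
            current = shared.find_one({"_id": doc_id}, session=s)
            analyses = current.get("analyses", [])
            analyses.append({"agent": agent_id, "note": note})
            shared.update_one(
                {"_id": doc_id},
                {"$set": {"analyses": analyses}},
                session=s,
            )
        session.with_transaction(txn)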

July 9, 2025

Build an AI-Ready Data Foundation with MongoDB Atlas on Azure

It’s time for a database reality check. While conversations around AI usually focus on its immense potential, these advancements are also bringing developers face to face with an immediate challenge: Their organizations’ data infrastructure isn’t ready for AI. Many developers now find themselves trying to build tomorrow’s applications on yesterday’s foundations. But what if your database could shift from bottleneck to breakthrough? Is your database holding you back? Traditional databases were built for structured data in a pre-AI world—they’re simply not designed to handle today’s need for flexible, real-time data processing. Rigid schemas force developers to spend time managing database structure instead of building features, while separate systems for operational data and analytics create costly delays and complexity. Your data architecture might be holding you back if: Your developers spend more time wrestling with data than innovating. AI implementation feels like forcing a square peg into a round hole. Real-time analytics are anything but real-time. Go from theory to practice: Examples of modern data architecture at work Now is the time to rethink your data foundation by moving from rigid to flexible schemas that adapt as applications evolve. Across industries, leading organizations are unifying operational and analytical structures to eliminate costly synchronization processes. Most importantly, they’re embracing databases that speak developers’ language. In the retail sector , business demands include dynamic pricing that responds to market conditions in real-time. Using MongoDB Atlas with Azure OpenAI from Microsoft Azure, retailers are implementing sophisticated pricing engines that analyze customer behavior and market conditions, enabling data-driven decisions at scale. In the healthcare sector , organizations can connect MongoDB Atlas to Microsoft Fabric for advanced imaging analysis and results management, streamlining the flow of critical diagnostic information while maintaining security and compliance. More specifically, when digital collaboration platform Mural faced a 1,700% surge in users, MongoDB Atlas on Azure handled its unstructured application data. The results aligned optimally with modern data principles: Mural’s small infrastructure team maintained performance during massive growth, while other engineers were able to focus on innovation rather than database management. As noted by Mural’s Director of DevOps, Guido Vilariño, this approach enabled Mural’s team to “build faster, ship faster, and ultimately provide more expeditious value to customers.” This is exactly what happens when your database becomes a catalyst rather than an obstacle. Shift from “database as storage” to “database as enabler” Modern databases do more than store information—they actively participate in application intelligence. When your database becomes a strategic asset rather than just a record-keeping necessity, development teams can focus on innovation instead of infrastructure management. What becomes possible when data and AI truly connect? Intelligent applications can combine operational data with Azure AI services. Vector search capabilities can enhance AI-driven features with contextual data. Applications can handle unpredictable workloads through automated scaling. Seamless integration occurs between data processing and AI model deployment. 
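As a rough illustration of the point above that vector search can enhance AI-driven features with contextual data, the sketch below combines Azure OpenAI embeddings with Atlas Vector Search via PyMongo. The deployment name, index name, database, and fields are assumptions you would replace with your own.

# Minimal sketch: Azure OpenAI embeddings + Atlas Vector Search.
# Deployment, index, and collection names are illustrative assumptions.
from openai import AzureOpenAI
from pymongo import MongoClient

azure = AzureOpenAI(
    api_key="<azure-openai-key>",
    api_version="2024-02-01",
    azure_endpoint="https://<your-resource>.openai.azure.com",
)
products = MongoClient("mongodb+srv://<user>:<password>@<cluster>/")["retail"]["products"]

def embed(text: str) -> list[float]:
    # Assumes an embedding deployment named "text-embedding-3-small".
    return azure.embeddings.create(model="text-embedding-3-small", input=text).data[0].embedding

def semantic_search(query: str, limit: int = 5):
    # Assumes an Atlas Vector Search index named "vector_index" on the "embedding" field.
    pipeline = [
        {
            "$vectorSearch": {
                "index": "vector_index",
                "path": "embedding",
                "queryVector": embed(query),
                "numCandidates": 100,
                "limit": limit,
            }
        },
        {"$project": {"name": 1, "price": 1, "score": {"$meta": "vectorSearchScore"}}},
    ]
    return list(products.aggregate(pipeline))

print(semantic_search("waterproof hiking jacket under $100"))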
Take the path to a modern data architecture The deep integration between MongoDB Atlas and Microsoft’s Intelligent Data Platform eliminates complex middleware, so organizations can streamline their data architecture while maintaining enterprise-grade security. The platform unifies operational data, analytics, and AI capabilities—enabling developers to build modern applications without switching between multiple tools or managing separate systems. This unified approach means security and compliance aren’t bolt-on features—they’re core capabilities. From Microsoft Entra ID integration for access control to Azure Key Vault for data protection, the platform provides comprehensive security while simplifying the development experience. As your applications scale, the infrastructure scales with you, handling everything from routine workloads to unexpected traffic spikes without adding operational complexity. Make your first move Starting your modernization journey doesn’t require a complete infrastructure overhaul or the disruption of existing operations. You can follow a gradual migration path that prioritizes business continuity and addresses specific challenges. The key is having clear steps for moving from legacy to modern architecture. Make decisions that simplify rather than complicate: Choose platforms that reduce complexity rather than add to it. Focus on developer experience and productivity. Prioritize solutions that scale with your needs. For example, you can begin with a focused proof of concept that addresses a specific challenge—perhaps an AI feature that’s been difficult to implement or a data bottleneck that’s slowing development. Making small wins in these areas demonstrates value quickly and builds momentum for broader adoption. As you expand your implementation, focus on measurable results that matter to your organization. Tracking these metrics—whether they’re developer productivity, application performance, or new capabilities—helps justify further investment and refine your approach. Avoid these common pitfalls As you undertake your modernization journey, avoid these pitfalls: Attempting to modernize everything simultaneously: This often leads to project paralysis. Instead, prioritize applications based on business impact and technical feasibility. Creating new data silos: In your modernization efforts, the goal must be integration and simplification. Adding complexity: remember that while simplicity scales, complexity compounds. Each decision should move you toward a more streamlined architecture, not a more convoluted one. The path to a modern, AI-ready data architecture is an evolution, not a revolution. Each step builds on the last, creating a foundation that supports not just today’s applications but also tomorrow’s innovations. Take the next step: Ready to modernize your data architecture for AI? Explore these capabilities further by watching the webinar “ Enhance Developer Agility and AI-Readiness with MongoDB Atlas on Azure .” Then get started on your modernization journey! Visit the MongoDB AI Learning Hub to learn more about building AI applications with MongoDB.

July 8, 2025

Why Relational Databases Are So Expensive to Enterprises

Relational databases were designed with a foundational architecture based on the premise of normalization. This principle—often termed “3rd Normal Form”—dictates that repeating groups of information are systematically cast out into child tables, allowing them to be referenced by other entities. While this design inherently reduces redundancy, it significantly complicates underlying data structures. Figure 1. Relational database normalization structure for insurance policy data. Every entity in a business process, its attributes, and their complex interrelations must be dissected and spread across multiple tables—policies, coverages and insured items, each becoming a distinct table. This traditional decomposition results in a convoluted network of interconnected tables that developers must constantly navigate to piece back together the information they need. The cost of relational databases Shrewd C-levels and enterprise portfolio managers are interested in managing cost and risk, not technology. Full stop. This decomposition into countless interconnected tables comes at a significant cost across multiple layers of the organization. Let’s break down the cost of relational databases for three different personas/layers: Developer and software layer Let’s imagine that as a developer you’re dealing with a business application that must create and manage customers and their related insurance policies. That customer has addresses, coverages, and policies. Each policy has insured objects and each object has its own specificities. If you’re building relational databases, it’s likely that you may be dealing with a dozen or more database objects that represent the aggregate business object of policy. In this design, all of these tables require you to break up the logical dataset into many parts, insert that data across many tables, and then execute complex JOIN operations when you wish to retrieve and edit it. As a developer, you’re familiar with working with object-oriented design, and to you, all of those tables likely represent one to two major business objects: the customer and the policy. With MongoDB, these dozen or more relational database tables can be modeled as one single object (see Figure 2). Figure 2. Relational database complexity vs. MongoDB document model for insurance policy data. At the actual business application-scale, with production data volumes, we start to truly see just how complicated this can get for the developers. In order to render it meaningfully to the application user interface, it must be constantly joined back together. When it’s edited, it must again be split apart, and saved into those dozen or more underlying database tables. Relational is therefore not only a more complex storage model, but it’s also cognitively harder to figure out. It’s not uncommon for a developer who didn’t design the original database, and is newer to the application team, to struggle to understand, or even mis-interpret a legacy relational model. Additionally, the normalized relational requires more code to be written for basic create, update, and read operations. An object relational mapping layer will often be introduced to help translate the split-apart representation in the database to an interpretation that the application code can more easily navigate. Why is this so relevant? Because more code equals more developer time and ultimately more cost. 
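As a rough sketch of the single-document policy model shown in Figure 2, the PyMongo snippet below embeds the customer, coverages, and insured items in one document. The field names and values are illustrative, not a prescribed schema.

# Minimal sketch: the aggregate policy modeled as one document.
# Field names and values are illustrative assumptions.
from pymongo import MongoClient

policies = MongoClient("mongodb+srv://<user>:<password>@<cluster>/")["insurance"]["policies"]

policy = {
    "policy_number": "POL-10482",
    "customer": {
        "name": "Ada Lovelace",
        "addresses": [
            {"type": "home", "street": "12 Analytical Way", "city": "London"},
        ],
    },
    "coverages": [
        {"type": "collision", "deductible": 500, "limit": 25000},
        {"type": "liability", "limit": 100000},
    ],
    "insured_items": [
        {"kind": "vehicle", "make": "Toyota", "model": "RAV4", "year": 2022},
    ],
}

# One insert persists the whole aggregate; one find reads it back. No JOINs,
# and no object-relational mapping layer reassembling a dozen tables.
policies.insert_one(policy)
print(policies.find_one({"policy_number": "POL-10482"}))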
Overall, it takes noticeably longer to design, build, and test a business feature when using a relational database than it would with a database like MongoDB. Finally, changing a relational schema is a cumbersome process. ALTER TABLE statements are required to change the underlying database object structure. Since relational tables are like spreadsheets, they can only have one schema at any given point in time. Your business feature requires you to add new fields? You must alter the single, fixed schema that is bound to the underlying table. This might seem to be a quick and easy process to execute in a development environment, but by the time you get to the production database, deliberate care and caution must be applied, and extra steps are mandatory to ensure that you do not jeopardize the integrity of the business applications that use the database. Altering production table objects incurs significant risk, so organizations must put in place lengthy and methodical processes that ensure change is thoroughly tested and scheduled, in order to minimize possible disruption. The fundamental premise of normalization, and its corresponding single, rigid, and predefined table structures, is a constant bottleneck when it comes to speed and cost to market.

Infrastructure administrator
Performing JOIN operations across multiple database objects at runtime requires more computational resources than retrieving all of the data you need from a single database object. If your applications are running against well-designed, normalized relational databases, your infrastructure is most certainly feeling the resource impact of those joins. Across a portfolio of applications, the hardware costs of normalization add up. For a private data center, it can mean the need to procure additional, expensive hardware. For the cloud, it likely means your overall spending is higher than that of a portfolio running on a more efficient design (like MongoDB's Document Model). Ultimately, MongoDB allows more data-intensive workloads to be run on the same server infrastructure than relational databases allow, and this directly translates to lower infrastructure costs. In addition to being inefficient at the hardware layer, normalized relational tables result in complex ways in which the data must be conditionally joined together and queried, especially within the context of actual business rules. Application developers have long pushed this complex logic ‘down to the database’ in an effort to reduce complexity at the application layer, as well as to preserve application-tier memory and CPU. This decades-long practice can be found across every industry, and in nearly every flavor and variant of relational database platform. The impact is multi-fold. Database administrators, or those specialized in writing and modifying complex SQL ‘stored procedures,’ are often called upon to augment the application developers who maintain code at the application tier. This external dependency certainly slows down delivery teams tasked with making changes to these applications, but it’s just the tip of the iceberg. Below the waterline, there exists a wealth of complexity. Critical application business logic ends up bifurcated: some lives in the database as SQL, and some lives in the application tier in a programming language. The impact on teams wishing to modernize or refactor legacy applications is significant in terms of the level of complexity that must be dealt with.
At the root of this complexity is the premise of normalized database objects, which would otherwise be a challenge to join and search if done at the application tier.

Portfolio manager
An application portfolio manager is responsible for overseeing an organization’s suite of software applications, ensuring they align with business goals, provide value, and are managed efficiently. The role typically involves evaluating, categorizing, and rationalizing application catalogs to reduce redundancy, lower costs, and enhance the overall ability to execute the business strategy. In short, the portfolio manager cares deeply about speed, complexity, and cost to market. At a macro level, a portfolio built on relational databases translates into slower teams that deliver fewer features per agile cycle. In addition, a larger staff is needed, as database and infrastructure admins become a necessary interface between the developers and the database. Unlike relational databases, MongoDB allows developers to maintain more than one version of a schema at a given time. In addition, documents contain both data and structure, which means you don’t need the complex, lengthy, and risky change cycles that relational demands simply to add or edit fields within the database. The result? Software teams deliver more features than is possible with relational databases, with less time, cost, and complexity. That is something the business owners of the portfolio will certainly appreciate, even if they don’t understand the underlying technology. Add in the fact that MongoDB runs more efficiently than relational databases on the same hardware, and your portfolio will see even more cost benefits.

Beyond relational databases: A new path to efficiency and agility
The fundamental premise of normalization, and its corresponding single, rigid, and predefined table structures, is a constant bottleneck when it comes to speed, cost, and complexity to market. At a time when the imperative is to leverage AI to lower operating expenses, the cost, complexity, and agility of the underlying database infrastructure need to be scrutinized. In contrast, MongoDB’s flexible Document Model offers a superior, generational step-change forward: one that enables your developers to move more quickly, runs more efficiently on whatever hardware you use, whether your own data center or the cloud, and increases your application portfolio's speed to market for advancing the business agenda. Transform your enterprise data architecture today. Start with our free Overview of MongoDB and the Document Model course at MongoDB University, then experience the speed and flexibility firsthand with a free MongoDB Atlas cluster.

July 7, 2025

Real-Time Threat Detection With MongoDB & PuppyGraph

Security operations teams face an increasingly complex environment. Cloud-native applications, identity sprawl, and continuous infrastructure changes generate a flood of logs and events. From API calls in AWS to lateral movement between virtual machines, the volume of telemetry is enormous—and it’s growing. The challenge isn’t just scale. It’s structure. Traditional security tooling often looks at events in isolation, relying on static rules or dashboards to highlight anomalies. But real attacks unfold as chains of related actions: a user assumes a role, launches a resource, accesses data, and then pivots again. These relationships are hard to capture with flat queries or disconnected logs. That’s where graph analytics comes in. By modeling your data as a network of users, sessions, identities, and events, you can trace how threats emerge and evolve. And with PuppyGraph, you don’t need a separate graph database or batch pipelines to get there. In this post, we’ll show how to combine MongoDB and PuppyGraph to analyze AWS CloudTrail data as a graph—without moving or duplicating data. You’ll see how to uncover privilege escalation chains, map user behavior across sessions, and detect suspicious access patterns in real time.

Why MongoDB for cybersecurity data
MongoDB is a popular choice for managing security telemetry. Its document-based model is ideal for ingesting unstructured and semi-structured logs like those generated by AWS CloudTrail, GuardDuty, or Kubernetes audit logging. Events are stored as flexible JSON documents, which evolve naturally as logging formats change. This flexibility matters in security, where schemas can shift as providers update APIs or teams add new context to events. MongoDB handles these changes without breaking pipelines or requiring schema migrations. It also supports high-throughput ingestion and horizontal scaling, making it well suited for operational telemetry. Many security products and SIEM backends already support MongoDB as a destination for real-time event streams. That makes it a natural foundation for graph-based security analytics: the data is already there—rich, semi-structured, and continuously updated.

Why graph analytics for threat detection
Modern security incidents rarely unfold as isolated events. Attackers don’t just trip a single rule—they navigate through systems, identities, and resources, often blending in with legitimate activity. Understanding these behaviors means connecting the dots across multiple entities and actions. That’s precisely what graph analytics excels at. By modeling users, sessions, events, and assets as interconnected nodes and edges, analysts can trace how activity flows through a system. This structure makes it easy to ask questions that involve multiple hops or indirect relationships—something traditional queries often struggle to express. For example, imagine you’re investigating activity tied to a specific AWS account. You might start by counting how many sessions are associated with that account. Then, you might break those sessions down by whether they were authenticated using MFA. If some weren’t, the next question becomes: what resources were accessed during those unauthenticated sessions? This kind of multi-step investigation is where graph queries shine. Instead of scanning raw logs or filtering one table at a time, you can traverse the entire path from account to identity to session to event to resource, all in a single query.
You can also group results by attributes like resource type to identify which services were most affected. And when needed, you can go beyond metrics and pivot to visualization, mapping out full access paths to see how a specific user or session interacted with sensitive infrastructure. This helps surface lateral movement, track privilege escalation, and uncover patterns that static alerts might miss. Graph analytics doesn’t replace your existing detection rules; it complements them by revealing the structure behind security activity. It turns complex event relationships into something you can query directly, explore interactively, and act on with confidence. Query MongoDB data as a graph without ETL MongoDB is a popular choice for storing security event data, especially when working with logs that don’t always follow a fixed structure. Services like AWS CloudTrail produce large volumes of JSON-based records with fields that can differ across events. MongoDB’s flexible schema makes it easy to ingest and query that data as it evolves. PuppyGraph builds on this foundation by introducing graph analytics—without requiring any data movement. Through the MongoDB Atlas SQL Interface , PuppyGraph can connect directly to your collections and treat them as relational tables. From there, you define a graph model by mapping key fields into nodes and relationships. Figure 1. Architecture of the integration of MongoDB and PuppyGraph. This makes it possible to explore questions that involve multiple entities and steps, such as tracing how a session relates to an identity or which resources were accessed without MFA. The graph itself is virtual. There’s no ETL process or data duplication. Queries run in real time against the data already stored in MongoDB. While PuppyGraph works with tabular structures exposed through the SQL interface, many security logs already follow a relatively flat pattern: consistent fields like account IDs, event names, timestamps, and resource types. That makes it straightforward to build graphs that reflect how accounts, sessions, events, and resources are linked. By layering graph capabilities on top of MongoDB, teams can ask more connected questions of their security data, without changing their storage strategy or duplicating infrastructure. Investigating CloudTrail activity using graph queries To demonstrate how graph analytics can enhance security investigations, we’ll explore a real-world dataset of AWS CloudTrail logs. This dataset originates from flaws.cloud , a security training environment developed by Scott Piper. The dataset comprises anonymized CloudTrail logs collected over 3.5 years, capturing a wide range of simulated attack scenarios within a controlled AWS environment. It includes over 1.9 million events, featuring interactions from thousands of unique IP addresses and user agents. The logs encompass various AWS API calls, providing a comprehensive view of potential security events and misconfigurations. For our demonstration, we imported a subset of approximately 100,000 events into MongoDB Atlas. By importing this dataset into MongoDB Atlas and applying PuppyGraph’s graph analytics capabilities, we can model and analyze complex relationships between accounts, identities, sessions, events, and resources. Demo Let’s walk through the demo step by step! We have provided all the materials for this demo on GitHub . Please download the materials or clone the repository directly. 
If you’re new to integrating MongoDB Atlas with PuppyGraph, we recommend starting with the MongoDB Atlas + PuppyGraph Quickstart Demo to get familiar with the setup and core concepts. Prerequisites A MongoDB Atlas account (free tier is sufficient) Docker Python 3 Set up MongoDB Atlas Follow the MongoDB Atlas Getting Started guide to: Create a new cluster (free tier is fine). Add a database user. Configure IP access. Note your connection string for the MongoDB Python driver (you’ll need it shortly). Download and import CloudTrail logs Run the following commands to fetch and prepare the dataset: wget https://summitroute.com/downloads/flaws_cloudtrail_logs.tar mkdir -p ./raw_data tar -xvf flaws_cloudtrail_logs.tar --strip-components=1 -C ./raw_data gunzip ./raw_data/*.json.gz Create a virtual environment and install dependencies: # On some Linux distributions, install `python3-venv` first. sudo apt-get update sudo apt-get install python3-venv # Create a virtual environment, activate it, and install the necessary packages python -m venv venv source venv/bin/activate pip install ijson faker pandas pymongo Import the first chunk of CloudTrail data (replace the connection string with your Atlas URI): export MONGODB_CONNECTION_STRING="your_mongodb_connection_string" python import_data.py raw_data/flaws_cloudtrail00.json --database cloudtrail This creates a new cloudtrail database and loads the first chunk of data containing 100,000 structured events. Enable Atlas SQL interface and get JDBC URI To enable graph access: Create an Atlas SQL Federated Database instance. Ensure the schema is available (generate from sample, if needed). Copy the JDBC URI from the Atlas SQL interface. See PuppyGraph’s guide for setting up MongoDB Atlas SQL . Start PuppyGraph and upload the graph schema Start the PuppyGraph container: docker run -p 8081:8081 -p 8182:8182 -p 7687:7687 \ -e PUPPYGRAPH_PASSWORD=puppygraph123 \ -d --name puppy --rm --pull=always puppygraph/puppygraph:stable Log in to the web UI at http://localhost:8081 with: Username: puppygraph. Password: puppygraph123. Upload the schema: Open schema.json. Fill in your JDBC URI, username, and password. Upload via the Upload Graph Schema JSON section or run: curl -XPOST -H "content-type: application/json" \ --data-binary @./schema.json \ --user "puppygraph:puppygraph123" localhost:8081/schema Wait for the schema to upload and initialize (approximately five minutes). Figure 2: A graph visualization of the schema, which models the graph from relational data. Run graph queries to investigate security activity Once the graph is live, open the Query panel in PuppyGraph’s UI. Let's say we want to investigate the activity of a specific account. First, we count the number of sessions associated with the account. Cypher: MATCH (a:Account)-[:HasIdentity]->(i:Identity) -[:HasSession]->(s:Session) WHERE id(a) = "Account[811596193553]" RETURN count(s) Gremlin: g.V("Account[811596193553]") .out("HasIdentity").out("HasSession").count() Figure 3. Graph query in the PuppyGraph UI. Then, we want to see how many of these sessions are MFA-authenticated or not. Cypher: MATCH (a:Account)-[:HasIdentity]->(i:Identity) -[:HasSession]->(s:Session) WHERE id(a) = "Account[811596193553]" RETURN s.mfa_authenticated AS mfaStatus, count(s) AS count Gremlin: g.V("Account[811596193553]") .out("HasIdentity").out("HasSession") .groupCount().by("mfa_authenticated") Figure 4. Graph query results in the PuppyGraph UI. 
Next, we investigate those sessions that are not MFA authenticated and see what resources they accessed. Cypher: MATCH (a:Account)-[:HasIdentity]-> (i:Identity)-[:HasSession]-> (s:Session {mfa_authenticated: false}) -[:RecordsEvent]->(e:Event) -[:OperatesOn]->(r:Resource) WHERE id(a) = "Account[811596193553]" RETURN r.resource_type AS resourceType, count(r) AS count Gremlin: g.V("Account[811596193553]").out("HasIdentity") .out("HasSession") .has("mfa_authenticated", false) .out('RecordsEvent').out('OperatesOn') .groupCount().by("resource_type") Figure 5. PuppyGraph UI showing results that are not MFA authenticated. We show those access paths in a graph. Cypher: MATCH path = (a:Account)-[:HasIdentity]-> (i:Identity)-[:HasSession]-> (s:Session {mfa_authenticated: false}) -[:RecordsEvent]->(e:Event) -[:OperatesOn]->(r:Resource) WHERE id(a) = "Account[811596193553]" RETURN path Gremlin: g.V("Account[811596193553]").out("HasIdentity").out("HasSession").has("mfa_authenticated", false) .out('RecordsEvent').out('OperatesOn') .path() Figure 6. Graph visualization in PuppyGraph UI. Tear down the environment When you’re done: docker stop puppy Your MongoDB data will persist in Atlas, so you can revisit or expand the graph model at any time. Conclusion Security data is rich with relationships, between users, sessions, resources, and actions. Modeling these connections explicitly makes it easier to understand what’s happening in your environment, especially when investigating incidents or searching for hidden risks. By combining MongoDB Atlas and PuppyGraph, teams can analyze those relationships in real time without moving data or maintaining a separate graph database . MongoDB provides the flexibility and scalability to store complex, evolving security logs like AWS CloudTrail, while PuppyGraph adds a native graph layer for exploring that data as connected paths and patterns. In this post, we walked through how to import real-world audit logs, define a graph schema, and investigate access activity using graph queries. With just a few steps, you can transform a log collection into an interactive graph that reveals how activity flows across your cloud infrastructure. If you’re working with security data and want to explore graph analytics on MongoDB Atlas , try PuppyGraph’s free Developer Edition . It lets you query connected data, such as users, sessions, events, and resources, all without ETL or infrastructure changes.

July 7, 2025

New in MongoDB Atlas Stream Processing: External Function Support

Today we're excited to introduce External Functions, a new capability in MongoDB Atlas Stream Processing that lets you invoke AWS Lambda directly from your streaming pipelines. The addition of External Functions to Atlas Stream Processing unlocks new ways to enrich, validate, and transform data in-flight, enabling smarter and more modular event-driven applications. This functionality is available through a new pipeline stage, $externalFunction.

What are external functions?
External functions allow you to integrate Atlas Stream Processing with external logic services such as AWS Lambda. This lets you reuse existing business logic, perform AI/ML inference, or enrich and validate data as it moves through your pipeline, all without needing to rebuild that logic directly in your pipeline definition. AWS Lambda is a serverless compute service that runs your code in response to events, scales automatically, and supports multiple languages (JavaScript, Python, Go, etc.). Because there’s no infrastructure to manage, Lambda is ideal for event-driven systems. Now, by using external functions, you can seamlessly plug that logic into your streaming workloads.

Where $externalFunction fits in your pipeline
MongoDB Atlas Stream Processing can connect to a wide range of sources and output to various sinks. The diagram below shows a typical streaming architecture: Atlas Stream Processing ingests data, enriches it with stages like $https and $externalFunction, and routes the transformed results to various destinations. Figure 1. A high-level visual of a stream processing pipeline. The $externalFunction stage can be placed anywhere in your pipeline (except as the initial source stage), allowing you to inject external logic at any step. Atlas Stream Processing supports two modes for invoking external functions: synchronous and asynchronous.

Synchronous execution type
In synchronous mode, the pipeline calls the Lambda function and waits for a response. The result is stored in a user-defined field (using the "as" key) and passed into the following stages.

let syncEF = {
  $externalFunction: {
    connectionName: "myLambdaConnection",
    functionName: "arn:aws:lambda:region:account-id:function:function-name",
    execution: "sync",
    as: "response",
    onError: "fail",
    payload: [
      { $replaceRoot: { newRoot: "$fullDocument.payloadToSend" } },
      { $addFields: { sum: { $sum: "$randomArray" }}},
      { $project: { success: 1, sum: 1 }}
    ]
  }
}

Let’s walk through what each part of the $externalFunction stage does in this synchronous setup:
connectionName: the external function connection name specified in the Connection Registry.
functionName: the full AWS ARN or the name of the AWS Lambda function.
execution: indicates synchronous execution ("sync") as opposed to asynchronous ("async").
as: specifies that the Lambda response will be stored in the "response" field.
onError: behavior when the operator encounters an error (in this case, "fail" stops the processor). The default is to add the event to the dead letter queue.
payload: an inner pipeline that allows you to customize the request body sent; using this lets you decrease the size of the data passed and ensure only relevant data is sent to the external function.

This type is useful when you want to enrich or transform a document using external logic before it proceeds through the rest of the pipeline.

Asynchronous execution type
In async mode, the function is called, but the pipeline does not wait for a response.
This is useful when you want to notify downstream systems, trigger external workflows, or pass data into AWS without halting the pipeline. let asyncEF = { $externalFunction: { connectionName: "EF-Connection", functionName: "arn:aws:lambda:us-west-1:12112121212:function:EF-Test", execution: "async" } } Use the async execution type for propagating information outward, for example: Triggering downstream AWS applications or analytics Notifying external systems Firing off alerts or billing logic Real-world use case: Solar device diagnostics To illustrate the power of external functions, let’s walk through an example: a solar energy company wants to monitor real-time telemetry from thousands of solar devices. Each event includes sensor readings (e.g., temperature, power output) and metadata like device_id and timestamp. These events need to be processed, enriched and then stored into a MongoDB Atlas collection for dashboards and alerts. This can easily be accomplished using a synchronous external function. Each event will be sent to a Lambda function that enriches the record with a status (e.g., ok, warning, critical) as well as diagnostic comments. After which the function waits for the enriched events to be returned and then sends them to the desired MongoDB collection. Step 1: Define the external function connection First, create a new AWS Lambda connection in the Connection Registry within Atlas. You can authenticate using Atlas's Unified AWS Access, which securely connects Atlas and your AWS account. Figure 2. Adding an AWS Lambda connection in the UI. 2. Implement the lambda function Here’s a simple diagnostic function. It receives solar telemetry data, checks it against thresholds, and returns a structured result. export const handler = async (event) => { const { device_id, group_id, watts, temp, max_watts, timestamp } = event; // Default thresholds const expectedTempRange = [20, 40]; // Celsius const wattsLowerBound = 0.6 * max_watts; // 60% of max output let status = "ok"; let messages = []; // Wattage check if (watts < wattsLowerBound) { status = "warning"; messages.push(`Observed watts (${watts}) below 60% of max_watts (${max_watts}).`); } // Temperature check if (temp < expectedTempRange[0] || temp > expectedTempRange[1]) { status = "warning"; messages.push(`Temperature (${temp}°C) out of expected range [${expectedTempRange[0]}–${expectedTempRange[1]}].`); } // If multiple warnings, escalate to critical if (messages.length > 1) { status = "critical"; } return { device_id, status, timestamp, watts_expected_range: [wattsLowerBound, max_watts], temp_expected_range: expectedTempRange, comment: messages.length ? messages.join(" ") : "All readings within expected ranges." }; }; 3. Create the streaming pipeline Using VS Code, define a stream processor using the sample solar stream as input. 
let s = { $source: { connectionName: 'sample_stream_solar' } }; // Define the External Function let EFStage = { $externalFunction: { connectionName: "telemetryCheckExternalFunction", onError: "fail", functionName: "arn:aws:lambda:us-east-1:121212121212:function:checkDeviceTelemetry", as: "responseFromLambda", } }; // Replace the original document with the Lambda response let projectStage = { $replaceRoot: { newRoot: "$responseFromLambda" } }; // Merge the results into a DeviceTelemetryResults collection let sink = { $merge: { into: { connectionName: "IoTDevicesCluster", db: "SolarDevices", coll: "DeviceTelemetryResults" } } }; sp.createStreamProcessor("monitorSolarDevices", [s, EFStage, projectStage, sink]); sp.monitorSolarDevices.start(); Once running, the processor ingests live telemetry data, invokes the Lambda diagnostics logic, and returns enriched results to MongoDB Atlas, complete with status and diagnostic comments. 4. View enriched results in MongoDB Atlas Explore the enriched data in MongoDB Atlas using the Data Explorer . For example, filter all documents where status = "ok" after a specific date. Figure 3. Data Explorer filtering for all documents with a status of “ok” from June 14 onwards. Smarter stream processing with external logic MongoDB Atlas Stream Processing external functions allow you to enrich your data stream with logic that lives outside the pipeline, making your processing smarter and more adaptable. In this example, we used AWS Lambda to apply device diagnostics in real-time and store results in MongoDB. You could easily extend this to use cases in fraud detection, personalization, enrichment from third-party APIs, and more. Log in today to get started, or check out our documentation to create your first external function. Have an idea for how you'd use external functions in your pipelines? Let us know in the MongoDB community forum !

July 3, 2025

Introducing Query Shape Insights in MongoDB Atlas

As modern applications scale, databases are often the first to show signs of stress, especially when query patterns shift or inefficiencies arise. MongoDB has invested in building a robust observability suite to help teams monitor and optimize performance. Tools such as the Query Profiler and, more recently, Namespace Insights provide deep visibility into query behavior and collection-level activity. While powerful, these capabilities primarily focus on individual queries or collections, limiting their ability to surface systemic patterns that impact overall application performance. Today, MongoDB is excited to announce Query Shape Insights, a powerful new feature for MongoDB Atlas that offers a high-resolution, holistic view of how queries behave at scale across clusters. Query Shape Insights delivers a paradigm shift in visibility by surfacing aggregated statistics for the most resource-intensive query shapes. This accelerates root cause analysis, streamlines optimization workflows, and improves operational efficiency. Figure 1. Overview page of Query Shape Insights showing the most resource-intensive query shapes. A new granularity for performance analysis Previously, if a modern application experienced a traffic surge, it risked overloading the database with queries, causing rapid performance degradation. In those critical moments, developers and database administrators must quickly identify the queries contributing most acutely to the bottleneck. This necessitated scrutinizing logs or per-query samples. With the launch of Query Shape Insights, the top 100 query shapes are surfaced by grouping structurally similar queries with shared filters, projects, and aggregation stages into defined query shapes. These query shapes are then ranked by total execution time, offering MongoDB Atlas users greater visibility into the most resource-intensive queries. Each query shape is enriched with detailed metrics such as execution time, operation count, number of documents examined and returned, and bytes read. These metrics are rendered as time series data, enabling developers and database administrators to pinpoint when the regressions began, how long they persisted, and what triggered them. Figure 2. Detailed view of a query shape, with a pop-up displaying associated metrics. This new feature integrates seamlessly into the performance workloads teams use to monitor, debug, and optimize applications. Each query shape includes associated client metadata, such as application name, driver version, and host. This empowers teams to identify which services, applications, or teams impact performance. This level of visibility is particularly valuable for microservices-based environments, where inefficiencies might manifest across multiple teams and services. Query Shape Insights adapts based on cluster tier to support varying workload sizes. Teams can analyze the performance data of each query shape over a 7-day window. This enables them to track trends, find changes in application behavior, and identify slow regressions that might otherwise be missed. Integration with MongoDB’s observability suite Query Shape Insights was designed to enable MongoDB Atlas users to move from detection to resolution with unprecedented speed and clarity. Built directly into the MongoDB Atlas experience, this feature is a clear starting point for performance investigations. This is imperative for dynamic environments where application behavior evolves rapidly and bottlenecks must be identified and resolved rapidly. 
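To illustrate what grouping "structurally similar queries" into a shape means in practice, consider the two PyMongo queries below: they differ only in their literal values, so they normalize to the same query shape and their metrics would be aggregated together. The collection and fields are hypothetical; the normalization itself happens server-side in Atlas.

# Conceptual sketch: two queries, one query shape.
# Collection and field names are illustrative assumptions.
from pymongo import MongoClient

orders = MongoClient("mongodb+srv://<user>:<password>@<cluster>/")["shop"]["orders"]

# Both calls reduce to roughly the same shape:
#   find({status: <string>, total: {$gt: <number>}}) sorted by created_at descending
list(orders.find({"status": "shipped", "total": {"$gt": 100}}).sort("created_at", -1))
list(orders.find({"status": "pending", "total": {"$gt": 2500}}).sort("created_at", -1))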
The Query Shape Insights dashboard offers comprehensive, time series–based analysis of query patterns across clusters. It enables teams to detect inefficiencies and understand when and how workloads have changed. Query Shape Insights answers critical diagnostic questions by surfacing the most resource-intensive query shapes. It identifies the workloads that consume the most resources and can help determine whether these workloads are expected or anomalous. Query Shape Insights can also help identify the emergence of new workloads and reveal how workloads have changed over time. To support this level of analysis, Query Shape Insights offers a rich set of capabilities, giving teams the clarity and speed they need to troubleshoot intelligently and maintain high-performing applications: Unified query performance view: Monitor query shapes to rapidly identify and investigate bottlenecks. Detailed query shape statistics: Track key metrics including execution time, document return counts, and execution frequency. Interactive analysis tools: Drill down into individual query shapes to view detailed metadata and performance trends. Flexible filtering options: Narrow analysis by shard/host, date range, namespace, or operation type. Programmatic access: Leverage MongoDB’s new Admin API endpoint to integrate query shape data with your existing observability stack. After using Query Shape Insights, MongoDB Atlas users can pivot directly to the Query Profiler with filters pre-applied to the specific collection and operation type for details beyond what Query Shape Insights provides. Once they have traced the issue to its root, users can continue their diagnostics journey by visiting Performance Advisor, which recommends indexes tailored to the query shape, ensuring that cluster optimizations are data-driven and precise. Query Shape Insights is a leap forward in how teams manage, investigate, and respond to performance issues with MongoDB. By introducing a high-level, shape-aware view of query activity, Query Shape Insights enhances traditional reactive troubleshooting with greater clarity, enabling teams to troubleshoot faster and monitor performance effectively. Query Shape Insights is now available for all MongoDB Atlas dedicated cluster deployments (M10 and above). Clusters must run MongoDB 8.0 or later to access this feature. Support for Cloud Manager deployments is planned for the future. Check out MongoDB’s documentation for more details on Query Shape Insights. Start using Query Shape Insights today through your MongoDB Atlas portal.
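The announcement points to a new Admin API endpoint for programmatic access but does not show the call itself. As one hedged illustration of pulling query-shape-level statistics programmatically, the $queryStats aggregation stage on recent MongoDB versions and Atlas returns one document per recorded query shape; exact availability and output fields depend on your cluster, so treat this as a sketch rather than a reference for the new endpoint.

from pymongo import MongoClient

client = MongoClient("mongodb+srv://<user>:<password>@cluster0.example.mongodb.net")  # placeholder URI

# $queryStats is run against the admin database; each result carries the shape
# "key" plus cumulative execution metrics. The lookups below are defensive
# because the output format can vary by server version.
for shape in client["admin"].aggregate([{"$queryStats": {}}]):
    key = shape.get("key", {})
    metrics = shape.get("metrics", {})
    print(key.get("queryShape", {}).get("cmdNs"), metrics.get("execCount"))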

July 2, 2025

Rapid Prototyping a Safe, Logless Reconfiguration Protocol for MongoDB with TLA+

MongoDB provides high availability and fault tolerance using replica sets, which are a group of database servers that operate a Raft-like consensus protocol. Each database write operation is replicated in a sequential log (the oplog ) and applied to all replicas. The consensus protocol guarantees that once an oplog entry is committed on a majority of replica set nodes, the write will be durable even if some nodes fail. Over time, however, we may need to change the set of servers operating within a replica set, to remove or replace failed nodes, a problem known as dynamic reconfiguration . Reconfiguration is a critical operation within replica sets for dynamically expanding a cluster or replacing unhealthy nodes, so its correctness is crucial for enabling customer confidence in these operations and overall reliability within a replica set or sharded cluster. In 2019, we needed to implement a new, safe reconfiguration protocol with rigorous correctness guarantees. At the time, the MongoDB replication system had an existing, legacy reconfiguration mechanism, but it had several known correctness bugs which necessitated a new protocol design. Although the existing protocol had correctness issues, it also had some attractive design characteristics. In particular, it decoupled reconfigurations from the main database operation log and employed a logless design, storing configurations as single objects and replicating them between nodes in a gossip-based manner. Therefore, as part of our design process, we had a goal of developing a new, safe reconfiguration protocol while minimizing changes to this existing, legacy gossip-based reconfiguration protocol. We knew that dynamic reconfiguration protocols were notoriously difficult to design correctly, so we needed a design approach that would allow us to proceed efficiently and with high confidence. With the help of formal specification and model checking tools—specifically TLA+ and its model checker, TLC—we were able to embark on a process of rapidly developing the design of a new, safe, logless reconfiguration protocol in just a couple of weeks, and implementing it in production in a few months. In this post, we discuss our process of formally modeling the legacy reconfiguration protocol in TLA+, characterizing its bugs with a model checker, and iteratively developing modifications to lead to a safe, logless reconfiguration protocol design. There were a few key, high-level takeaways from our process. Most notably, rigorous, formal modeling didn’t slow us down, but instead accelerated design and delivery timelines while maintaining a high correctness bar. It also led to a simpler protocol design, allowing maintenance of a unified reconfiguration engine, rather than dealing with two parallel protocols, which could be prone to unexpected interactions and maintenance burden. The new protocol also provided novel performance benefits over standard reconfiguration approaches, due to the decoupling of reconfigurations from the main database log. Background and motivation The original MongoDB replication system used a legacy, gossip-based reconfiguration protocol that was fully decoupled from the main oplog. Each configuration was ordered by a numeric, monotonic config version, and nodes in a replica set learned the latest config from each other via periodic heartbeat messages. Upon learning of a higher config, it was immediately installed and took effect on that node. 
We refer to this original protocol design as logless, since it stored configurations as a single object and propagated them in a gossip-based manner, with no use of a sequential log for recording and replicating reconfiguration operations. This protocol also had a “force reconfig” feature, allowing users to install a new configuration even if a majority of nodes were offline. While the legacy protocol performed well in most scenarios, it was known to be unsafe in certain cases. Moreover, we expected reconfiguration to become a more common operation in MongoDB, necessitating the development of a new, safe reconfiguration protocol. Initially, we considered Raft's existing reconfiguration protocols, including its single-node reconfiguration protocol, which restricts reconfigurations to adding or removing a single server. The standard Raft approach, however, was ultimately deemed incompatible with "force reconfig," and would require maintenance of both a new, log-based implementation and the legacy, gossip-based one. It would also be complicated to ensure the two protocols didn’t interfere with each other. Instead, we hoped to develop a new protocol that minimized changes to the existing legacy protocol to simplify design and implementation. Ideally, we would be able to adapt ideas from Raft’s single-node reconfiguration protocol to our gossip-based, legacy reconfig protocol—which would allow for better compatibility with "force" reconfig, would be easier to upgrade and downgrade, and would eliminate the need for a new oplog entry format for reconfigurations. This idea of developing a safe, logless reconfiguration protocol seemed promising, as it would eliminate the need to mix two protocols and would let normal and force reconfigurations share the same basic mechanism. We needed, however, to be very confident in the correctness of such an approach, which was difficult to achieve manually and within a short design time frame. When we first pitched this idea early in the design process, it was unclear whether such a solution was possible and whether it could be implemented safely and successfully in production. There was some existing work on decoupling reconfigurations and on logless consensus, but none that directly applied to a Raft-based consensus system such as ours. Also, the discovery of a critical safety bug in one of Raft's reconfiguration protocols after its initial publication highlighted how challenging the task of designing or modifying reconfiguration protocols for consensus systems can be. This bug was only discovered over a year after Raft’s initial publication and required subtle protocol modifications to address. Around that time, in 2019, MongoDB’s replication team had had some past success with TLA+ and model checking on similar protocol design problems. Encouraged by these experiences, we set off to employ TLA+ and its model checker, TLC, to rapidly iterate on a candidate design and to develop a safe, logless reconfiguration protocol that was simpler, easier to implement, and which provided novel performance benefits.
To model the legacy, gossip-based protocol, we extended an existing TLA+ specification we had developed for an abstract version of the MongoDB replication protocol that did not include reconfiguration behavior. We extended this specification with two key reconfiguration-related actions: a Reconfig action, which represents the installation of a new config on a primary node, and a SendConfig action, which gossips a new config with a higher config version from one node to another. This model also defines the high-level safety properties of the protocol. The fundamental external guarantee is that when a majority write is committed on a replica set, the write will be durable as long as a majority of nodes are alive. This guarantee is largely captured in the LeaderCompleteness property, stating that any new leader in a higher term must contain all log entries committed in earlier terms. Along with this, we also include a lower-level correctness property of Raft-based systems, ElectionSafety, which states that there can never be two primaries in the same term. Iteratively strengthening our reconfiguration rules Our legacy protocol model and its underlying correctness properties served as the starting point for a series of experiments, guided by the model checker, that iteratively led us towards a safe protocol design. We explored a series of design candidates by incrementally analyzing and refining our design in response to counterexamples discovered by the model checker. Single node changes One of the fundamental, challenging aspects of dynamic reconfiguration is that the notion of “quorum” (i.e., majority) changes when the set of servers operating the protocol changes. For example, consider a reconfiguration that expands the protocol’s set of servers from C1 = {n1, n2, n3} to C2 = {n1, n2, n3, n4, n5}. An operation contacting a quorum in C1 may (correctly) reach the servers Q1 = {n1, n2}, but a valid quorum in C2 may be Q2 = {n3, n4, n5}, which is problematic since Q1 and Q2 do not intersect; quorum intersection is a key property of standard Raft (and of most other practical consensus protocols). Raft’s single-node approach attempts to partially address this by restricting configuration changes to those that add or remove a single node, which enforces overlapping quorums between such configurations. So we started by considering a basic initial question: does enforcing single-node changes partially address the safety issues of the legacy protocol? We had expected this would not be a fully sufficient condition for safety, but it was a stepping stone towards safer protocol revisions, and we wanted to confirm each of our hypotheses along the way. We introduced the single node change rule in the Reconfig action, which ensures that any majority of nodes in the old config and any majority of nodes in the new config share at least one common node. In our specification, we employed a slightly generalized definition of this property, which allows reconfigurations between any configs whose majority quorums overlap, even if they are not strictly a single-node change apart (e.g., all majority quorums of C1 = {n1, n2} and C2 = {n1, n2, n3, n4} intersect, but you cannot move from one to the other via a single addition or removal). One of the benefits of specifying the protocol in a high-level, mathematical specification language like TLA+ is that it enables concise definition of these kinds of properties, as seen below.
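The original post shows the TLA+ definition of this quorum-overlap condition at this point; it is not reproduced here. As a rough Python rendering of the same idea (the function names are ours, not the spec's), the generalized rule requires every majority quorum of the old config to intersect every majority quorum of the new one:

from itertools import combinations

def majority_quorums(config):
    # All subsets of `config` whose size is a strict majority.
    k = len(config) // 2 + 1
    return [set(q) for r in range(k, len(config) + 1)
            for q in combinations(sorted(config), r)]

def quorums_overlap(old_config, new_config):
    # Generalized single-node-change rule: every majority of the old config
    # intersects every majority of the new config.
    return all(q1 & q2
               for q1 in majority_quorums(old_config)
               for q2 in majority_quorums(new_config))

# The examples from the text:
assert quorums_overlap({"n1", "n2"}, {"n1", "n2", "n3", "n4"})                    # allowed
assert not quorums_overlap({"n1", "n2", "n3"}, {"n1", "n2", "n3", "n4", "n5"})    # the problematic expansion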
After adding this condition to our Reconfig action, TLC was able to produce a violation trace for this updated protocol in a few seconds, and the bug was clear to understand, as shown below (only modified variables are shown in each state): Essentially, single-node changes only guarantee safe quorum intersection between adjacent configurations, but a series of locally adjacent reconfigurations may lead to a globally unsafe situation—i.e., two configurations that are both active but violate the quorum overlap property. This is demonstrated concretely in the above trace, and leads to a violation of the ElectionSafety property, with two nodes acting as primary in the same term in State 6. Node n1 was safely elected in configuration {n1}, but then two subsequent reconfigurations moved the system to {n1, n2, n3}, and n2 was elected in this configuration with a quorum of {n2, n3}, which does not intersect the quorum used in the original config {n1}. Our initial expectation was that just adding the single-node change constraint would not be correct by itself, but it was reassuring to have the model checker confirm this with a counterexample in just a few seconds. This began to give us more confidence to iterate on a new protocol design, which we proceeded to develop over the next week or so, moving next to a deeper investigation of the protocol's safety requirements. Config commitment rule Adopting the single-node change condition is straightforward, as it only requires verifying new configurations in a pairwise, local manner. As we saw above, though, it is still problematic to move through arbitrary sequences of overlapping configurations, so we need to take extra care to avoid these problematic cases. Our first hunch was to add an explicit notion of “config commitment” within the protocol, similar to the commitment rules of Raft. That is, restrict a reconfiguration from taking place until some appropriate commitment conditions have been satisfied. Intuitively, this would place restrictions on how quickly, for example, a primary could execute reconfigurations—i.e., it would prevent a primary from moving to a new configuration before an older, non-overlapping configuration was, in a sense, “deactivated.” One natural idea was to borrow similar concepts from Raft on log commitment, adapted for our logless, gossip-based setting. After a few iterations, we developed the following additional preconditions for the Reconfig action: ConfigQuorumCheck: A quorum of nodes have the same config version as the primary executing the reconfig. TermQuorumCheck: A majority of nodes in the primary’s config have reached the term of the primary or newer. We modeled the protocol with these new TermQuorumCheck and ConfigQuorumCheck preconditions, and they were initially sufficient to rule out the counterexamples we encountered previously. They were not yet general enough to ensure safety, though, as we will see below, where we worked out a final solution for config commitment. Oplog commitment rule In addition to the "config commitment" idea, it is worth noting how the relationship between the config and the oplog changes because of the divergence from Raft. Raft sequences a reconfiguration among other oplog entries, thereby establishing a strong implicit ordering among them. However, since the gossip-based reconfig protocol does not include the configuration as part of the oplog, there may be implicit dependencies between oplog entries and configurations that are not accounted for.
We had started to think about this interaction between oplog entry commitment and reconfiguration, and conjectured a few problematic scenarios that we were able to confirm with the model checker. An example of this problem is illustrated by the following, simplified error trace: The core issue here is that config C3 = {n1, n2, n3} (with version=3) is installed even though the entry <<1,1>> (index, term) that was committed in a previous configuration, C1 = {n1}, has not been committed in the current configuration, C2 = {n1, n2}. Since quorums may not overlap for non-adjacent configurations (e.g., C1 and C3), by ensuring that the commitment of writes in a previous configuration is also guaranteed in the current configuration, we can "propagate" the durability guarantee of earlier configurations to the future. As a result, we need to explicitly check this property when accepting reconfiguration commands. The rules for accepting a new configuration now include an additional, newly developed oplog commitment precondition. This rule is about ensuring that durable, replicated log entries from older configs are transferred to newer configs, which must be upheld to ensure safe protocol operation over time. This requirement is implicit in Raft reconfiguration due to the tight coupling of reconfigurations and the main operation oplog, but must be handled explicitly here due to the decoupled design.
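The post presents these preconditions as TLA+ definitions, which are not reproduced above. As a rough Python sketch of the checks a primary performs before installing a new config (quorum overlap, ConfigQuorumCheck, TermQuorumCheck, and the oplog commitment rule), under assumed, simplified data structures whose names and shapes are ours, not the production code's:

from dataclasses import dataclass
from itertools import combinations

def majorities(members):
    k = len(members) // 2 + 1
    return [set(q) for r in range(k, len(members) + 1) for q in combinations(sorted(members), r)]

def quorums_overlap(old_members, new_members):
    return all(a & b for a in majorities(old_members) for b in majorities(new_members))

@dataclass
class NodeState:
    term: int                      # latest term the node has seen
    config_version: int            # version of the config installed on the node
    newest_oplog_index: int = 0    # highest oplog index present on the node (simplified)

def can_reconfig(primary, old_members, new_members, node_states, committed_index):
    # Preconditions a primary checks before installing a new config (sketch).
    majority = len(old_members) // 2 + 1
    # Generalized single-node-change rule: quorums of the old and new configs overlap.
    if not quorums_overlap(old_members, new_members):
        return False
    # ConfigQuorumCheck: a majority of the current config has the primary's config version.
    if sum(node_states[n].config_version == primary.config_version for n in old_members) < majority:
        return False
    # TermQuorumCheck: a majority of the current config has reached the primary's term or newer.
    if sum(node_states[n].term >= primary.term for n in old_members) < majority:
        return False
    # Oplog commitment: entries committed in earlier configs must also be replicated
    # to a majority of the current config before moving to the next config.
    if sum(node_states[n].newest_oplog_index >= committed_index for n in old_members) < majority:
        return False
    return True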
The config as logless state machine We were now confident that we had established strong rules to guarantee local quorum overlap, the proper sequential ordering of configs, and the appropriate transfer of oplog entries between configs. After re-checking our model with these new preconditions, though, the model checker discovered a new counterexample after running for several hours on a larger workstation. The following is a simplified version of this error trace: In this case, node n1 executes a reconfig to Ca = {n1, n2, n3}, but hasn't propagated it to any other nodes at state 3. Then, n2 becomes the primary and reconfigures to config Cb = {n1, n2, n4} in state 6. n1 can then be elected in term 3 with quorum {n1, n3}, and n2 can be elected in term 3 with quorum {n2, n4}, violating the ElectionSafety property. The problem in the above trace is that when n2 moved to a new config, it should have ensured that, in the future, no leaders would ever be elected in “earlier” configs. It failed to do so and, in the last step, a quorum was then able to be formed in a config with version 2, leading to two active, non-overlapping quorums. A key observation here is that divergence between configs in different terms leads to the issue. That is, config commitment as we defined it above was sufficient for a sequence of reconfigs by a single leader, but not for concurrent leaders in competing terms. Figure 1. Concurrent configurations with non-intersecting majority quorums. After going through these counterexamples, we understood the problem more clearly and had a path to refine our correctness argument. We realized that agreeing on the configuration among nodes can be viewed as a separate kind of consensus problem, separate from the oplog consensus but with similar rules. In our system, the config itself can be viewed as a compacted (i.e., rolled up) replicated state machine (RSM) that does not require a log (i.e., it is “logless”), since explicit maintenance of config history isn’t needed and only the latest config takes effect. Propagating the config via heartbeats can be viewed as “appending” to the config log (e.g., as in Raft), and rolling back a config is never explicitly required—i.e., we always simply install a more up-to-date config. This config RSM already shares many similarities with the oplog RSM, such as term propagation. The similarity suggests that just using the config version to identify a config is not sufficient. Viewing the config as its own RSM, we need to assign the primary’s term to configs. The config term is then a separate property of the config, similar to how the oplog entry’s term is part of every oplog entry. Thus, a config should be defined and ordered by the tuple (configVersion, configTerm), analogous to how an oplog entry is identified and ordered by its (timestamp, term), with term being compared first, followed by timestamp/version. The elections of these two consensus protocols can then be merged by adding a new rule that a voter checks whether the candidate’s config is stale, in addition to the other checks. Moreover, we can carry the definition of “commitment” over from the oplog RSM to the config RSM. That is, when a config is propagated to a majority of nodes in the primary’s term, the config is committed. It also became clear that the RSM only moves ahead through committed configs sequentially: the config RSM can choose the next config and commit it only if its current one is committed. Putting it all together Our final protocol specification included all of the above preconditions and features, producing a version of the protocol which we refer to as safe, logless dynamic reconfiguration. We conducted final model checking runs for several cases over 20 hours, exploring over 800 million protocol states, with configurations of four and five servers, along with pen-and-paper arguments for the correctness of the final result. Note that, at a high level, we can understand dynamic reconfiguration protocols like this one as needing to deal with two core conceptual aspects: (1) config deactivation and (2) state transfer. Our various config commitment rules combine to address the first, which is about ensuring that different configs that diverge over time cannot both be concurrently active. Aspect (2) relates to the fact that various types of replicated, durable state within a configuration must be appropriately transferred over to newer configurations. This is what the oplog commitment rules address, as well as the rules for ensuring that the term state propagates appropriately between configurations. Once we had the abstract protocol ironed out and gained confidence in its correctness, we were ready to move forward swiftly to implementation, and completed it in the MongoDB replication system over the course of a few months. The protocol has been running reliably in production for several years since its introduction, and the implementation and protocol were significantly simpler than our original design alternatives. Takeaways Overall, we were able to get a draft protocol in one week, and within two weeks we finalized the protocol and successfully passed correctness checks using the model checker. It was motivating to see our vague ideas turn into something tangible, and the successful outcome from this design phase gave us the confidence to move forward to the implementation phase. Model checking is an excellent tool for rapidly and precisely answering "what if" design questions.
Our efforts also emphasized an important feature of lightweight, design-level formal methods: these techniques are about more than simply ensuring the correctness of a system design. Rather, they enable the exploration of protocol optimizations at a level of aggressiveness and velocity that would typically be infeasible with manual design methods. From this perspective, we can view these formal methods tools not only as a means of improving the correctness of our systems and protocols, but also as a means of efficiently exploring the optimization design space while maintaining a high correctness bar. This also speaks to the potential value of investing some amount of time upfront in models for key protocols that are highly critical and may need to evolve over time. Due to our novel protocol design, the scope of the implementation changes also became much smaller. We delivered the project in three months with three to four developers, and "force reconfig" was implemented using the same mechanism with relaxed rules. Version upgrade/downgrade only involves a small on-disk format change of the config, avoiding switching between two different reconfig approaches. In addition, our approach also provided potential performance improvements. Specifically, the decoupled reconfiguration design can bypass the oplog to recover the system when the oplog becomes the bottleneck. Similar ideas have since been explored in other, recent reconfiguration protocols like Matchmaker Paxos. Since its introduction in MongoDB 4.4, the new, logless reconfiguration protocol has proven to be reliable and has served as a solid building block for other features, such as automatically granting new nodes votes only after their initial sync. There have been no significant protocol bugs discovered since its deployment, a testament to the value of these rigorous protocol design techniques. While we focused on the intuition of the new protocol and the experience of leveraging model checking in this article, our paper, published at OPODIS 2021, includes a much more detailed description of the reconfiguration protocol, and a formal safety proof was also published. The final versions of the specifications we developed and published can be found in this GitHub repository, as well as some of the original specs we used in the MongoDB repository.

July 2, 2025

Build Event-Driven Apps Locally with MongoDB Atlas Stream Processing

Building event-driven architectures (EDAs) often poses challenges, particularly when you’re integrating complex cloud components with local development services. For developers, working directly from a local environment provides convenience, speed, and flexibility. Our demo application demonstrates a unique development workflow that balances local service integration with cloud stream processing, showcasing portable, real-time event handling using MongoDB Atlas Stream Processing and ngrok. With MongoDB Atlas Stream Processing, you can streamline the development of event-driven systems while maintaining all the components locally. Using this service’s capabilities alongside ngrok, this demo application shows a secure way to interact with cloud services directly from your laptop, ensuring you can build, test, and refine applications with minimal friction and maximum efficiency. Using MongoDB Atlas Stream Processing MongoDB Atlas Stream Processing is a powerful feature within the MongoDB Atlas modern database that enables you to process data streams in real time using the familiar MongoDB Query API (and aggregation pipeline syntax). It integrates seamlessly with MongoDB Atlas clusters, Apache Kafka, AWS Lambda, and external HTTP endpoints. Key takeaway #1: Build event-driven apps more easily with MongoDB Atlas Stream Processing One of the primary goals of MongoDB Atlas Stream Processing is to simplify the development of event-driven applications. Instead of managing separate stream processing clusters or complex middleware, you can define your processing logic directly within MongoDB Atlas. This means: A unified platform: Keep your data storage and stream processing within the same ecosystem. Familiar syntax: Use the MongoDB Query API and aggregation pipelines you already know. Managed infrastructure: Let MongoDB Atlas handle the underlying infrastructure, scaling, and availability for your stream processors. Key takeaway #2: Develop and test locally, deploy globally A significant challenge in developing event-driven systems is bridging the gap between your local development environment and cloud-based services. How do you test interactions with services running on your laptop? You can configure MongoDB Atlas Stream Processing to connect securely to HTTP services and even Apache Kafka instances running directly on your development machine! You can typically achieve this using a tunneling service like ngrok, which creates secure, publicly accessible URLs for your local services. MongoDB Atlas Stream Processing requires HTTPS for HTTP endpoints and specific Simple Authentication and Security Layer protocols for Apache Kafka, making ngrok an essential tool for this local development workflow. Introducing the real-time order fulfillment demo To showcase these capabilities in action, we’ve built a full-fledged demo application available on GitHub . Figure 1. High-level architecture diagram. This demo simulates a real-time order fulfillment process using an event-driven architecture orchestrated entirely by MongoDB Atlas Stream Processing. What the demo features A shopping cart service: Generates events when cart items change. An order processing service: Handles order creation and validation (running locally as an HTTP service). A shipment service: Manages shipment updates. Event source flexibility: Can ingest events from either a MongoDB capped collection or an Apache Kafka topic (which can also run locally). 
Processors from Atlas Stream Processing: Act as the central nervous system, reacting to events and triggering actions in the different services. An order history database: Centralizes status updates for easy tracking. Figure 2. High-level sequence diagram of a flow. How the demo uses MongoDB Atlas Stream Processing and local development Event orchestration: MongoDB Atlas Stream Processing instances listen for shopping cart events (from MongoDB or Kafka). Local service interaction: An Atlas Stream Processing (ASP) processor calls the Order Processing Service running locally via an ngrok HTTPS tunnel. Kafka integration (optional): Demonstrates ASP connecting to a local Kafka broker, also tunneled via ngrok. Data enrichment & routing: Processors enrich events and route them appropriately (e.g., validating orders, triggering shipments). Centralized logging: All services write status updates to a central MongoDB collection that functions as a continuously materialized view of order status and history. This demo practically illustrates how you can build sophisticated, event-driven applications using ASP while performing key development and testing directly on your laptop, interacting with local services just as you would in a deployed environment. What the demo highlights Real-world EDA: Provides a practical example of asynchronous service communication. Orchestration powered by MongoDB Atlas Stream Processing: Shows how this service manages complex event flows. Local development workflow: Proves the concept of connecting this service to local HTTP services and Apache Kafka brokers via ngrok. Flexible event ingestion: Supports both MongoDB and Apache Kafka sources. Centralized auditing: Demonstrates easy status tracking via a dedicated history collection. Get started with the demo! MongoDB Atlas Stream Processing significantly lowers the barrier to entry for building robust, real-time EDAs. Its ability to integrate seamlessly with MongoDB Atlas, external services, and, crucially, your local development environment (thanks to tools like ngrok) makes it a powerful addition to the developer toolkit. Explore the demo project, dive into the code, and see for yourself how ASP can simplify your next event-driven architecture, starting right from your own laptop! Ready to see it in action? Head over to the GitHub repository! The repository’s README.md file contains comprehensive, step-by-step instructions to get you up and running. In summary, you’ll: Clone the repository. Set up a Python virtual environment and install dependencies. Crucially, set up ngrok to expose your local order-processing service (and Apache Kafka, if applicable) via secure tunnels. (Details in the README.md appendix!) Configure your .env file with MongoDB Atlas credentials, API keys, and the ngrok URLs. Run scripts to create the necessary databases, collections, and the MongoDB Atlas Stream Processing instance/connections/processors. Start the local order_processing_service.py. Run the shopping_cart_event_generator.py to simulate events. Query the order history to see the results! For detailed setup guidance, especially regarding ngrok configuration for multiple local services (HTTP and TCP / Apache Kafka), please refer to the appendix of the project's README.md.
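The post does not include a processor definition, so here is a rough sketch of the kind of pipeline the demo might register, written as a Python list of aggregation stages. The stage shapes follow the Atlas Stream Processing documentation, but the connection names, topic, and namespaces are assumptions for this demo; in practice the pipeline would be registered by the demo's setup scripts or from mongosh.

# Illustrative only: connection names, topic, and namespaces below are placeholders.
shopping_cart_pipeline = [
    # Read shopping cart events from a Kafka topic (a change stream on an Atlas
    # collection could be used as the source instead).
    {"$source": {"connectionName": "localKafkaViaNgrok", "topic": "shopping_cart_events"}},
    # Keep only events that represent a completed checkout.
    {"$match": {"eventType": "CHECKOUT"}},
    # Write the resulting events into the central order-history collection.
    {"$merge": {"into": {"connectionName": "atlasCluster", "db": "orders", "coll": "order_history"}}},
]

# In mongosh, a processor would then be created and started roughly like:
#   sp.createStreamProcessor("cartToOrderHistory", shopping_cart_pipeline)
#   sp.cartToOrderHistory.start()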

July 1, 2025

Data Modeling Strategies for Connected Vehicle Signal Data in MongoDB

Today’s connected vehicles generate massive amounts of data. According to an article from S&P Global Mobility, a single modern car produces nearly 25GB of data per hour. To put that in perspective: that’s like each car pumping out the equivalent of six full-length Avatar movies in 4K—every single day! Now scale that across millions of vehicles, and it’s easy to see the challenge ahead. Of course, not all of that data needs to be synchronized to the cloud—but even a fraction of it puts significant pressure on the systems tasked with processing, storing, and analyzing it at scale. The challenge isn’t just about volume. The data is fast-moving and highly diverse—from telematics and location tracking to infotainment usage and driver behavior. Without a consistent structure, this data is hard to use across systems and organizations. That’s why organizations across the industry are working to standardize how vehicle data is defined and exchanged. One such example is the Connected Vehicle Systems Alliance or COVESA , which developed the Vehicle Signal Specification (VSS)—a widely adopted, open data model that helps normalize vehicle signals and improve interoperability. But once data is modeled, how do you ensure it's persistent and available at all times in real-time? To meet these demands, you need a data layer that's flexible, reliable, and performant at scale. This is precisely where a robust data solution designed for modern needs becomes essential. In this blog, we’ll explore data strategies for connected vehicle systems using VSS as a reference model, with a focus on real-world applications like fleet management. These strategies are particularly effective when implemented on flexible, high-performance databases like MongoDB, a solution trusted by leading automotive companies . Is your data layer ready for the connected car era? Relational databases were built in an era when saving storage space was the top priority. They work well when data fits neatly into tables and columns—but that’s rarely the case with modern, high-volume, and fast-moving vehicle data. Telematics, GPS coordinates, sensor signals, infotainment activity, diagnostic logs—data that’s complex, semi-structured, and constantly evolving. Trying to force it into a rigid schema quickly becomes a bottleneck. That’s why many in the automotive world are moving to document-oriented databases. A full-fledged data solution, designed for modern needs, can significantly simplify how one works with data, scale effortlessly as demands grow, and adapt quickly as systems evolve. A solution embodying these capabilities, like MongoDB, supports the demands of complex connected vehicle systems. Its features include: Reduced complexity: The document model mirrors the way developers already structure data in their code. This makes it a natural fit for vehicle data, where data often comes in nested, hierarchical formats. Scale by design: MongoDB’s distributed architecture and flexible schema design help simplify scaling. It reduces interdependencies, making it easier to shard workloads without performance headaches. Built for change: Vehicle platforms are constantly evolving, and MongoDB makes it easy to update data models without costly migrations or downtime, keeping development fast and agile. AI-ready: MongoDB supports a wide variety of data types—structured, time series, vector, graph—which are essential for AI-driven applications. 
This makes it the natural choice for AI workloads, simplifying data integration and accelerating the development of smart systems. Figure 1. The MongoDB connected car data platform. These capabilities are especially relevant in connected vehicle systems. Volvo Connect, for example, uses MongoDB Atlas to track 65 million daily events from over a million vehicles, ensuring real-time visibility at massive scale. Another example is SHARE NOW, which handles 2TB of IoT data per day from 11,000 vehicles across 16 cities, using MongoDB to streamline operations and deliver better mobility experiences. It’s not just the data—it’s how you use it Data modeling is where good design turns into great performance. In traditional relational systems, modeling starts with entities and relationships, with a focus on minimizing data duplication. MongoDB flips that mindset. You still care about entity relationships—but what really drives design is how the data will be used. The core principle? Data that is accessed together should be stored together. Let’s bring this to life. Take a fleet management system. The workload includes vehicle tracking, diagnostics, and usage reporting. Modeling in MongoDB starts by understanding how that data is produced and consumed. Who’s reading it, when, and how often? What’s being written, and at what rate? Below, we show a simplified workload table that maps out entities, operations, and expected rates. Table 1. Fleet management workload example. Now, to the big question: how do you model connected vehicle signal data in MongoDB? It depends on the workload. If you're using COVESA’s VSS as your signal definition model, you already have a helpful structure. VSS defines signals as a hierarchy: attributes (rarely change, like tank size), sensors (update often, like speed), and actuators (reflect commands, like door lock requests). This classification is a great modeling hint. VSS’s tree structure maps neatly to MongoDB documents. You could store the whole tree in a single document, but in most cases, it’s more effective to use multiple documents per vehicle. This approach better reflects how the data is produced and consumed—leading to a model that’s better suited for performance at scale. Now, let’s look at two examples that show different strategies depending on the workload. Figure 2. Sample VSS tree. Source: Vehicle Signal Specification documentation. Example 1: Modeling for historical analysis For historical analysis—like tracking fuel consumption trends—time-stamped data needs to be stored efficiently. Access patterns may include queries like “What was the average fuel consumption per km in the last hour?” or “How did the fuel level change over time?” Here, separating static attributes from dynamic sensor signals helps minimize unnecessary updates. Grouping signals by component (e.g., powertrain, battery) allows updates to be scoped and efficient. MongoDB Time Series collections are built for exactly this kind of data, offering optimized storage, automatic bucketing, and fast time-based queries.
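As a minimal pymongo sketch of this time series approach (the field names, granularity, and signal grouping below are illustrative assumptions, not taken from the article):

from datetime import datetime, timezone
from pymongo import MongoClient

client = MongoClient("mongodb+srv://<user>:<password>@cluster0.example.mongodb.net")  # placeholder URI
db = client["fleet"]

# Historical sensor signals land in a time series collection; static attributes
# (tank size, model, etc.) would live in a separate, rarely updated collection.
if "vehicle_signals" not in db.list_collection_names():
    db.create_collection(
        "vehicle_signals",
        timeseries={"timeField": "ts", "metaField": "meta", "granularity": "seconds"},
    )

db.vehicle_signals.insert_one({
    "ts": datetime.now(timezone.utc),
    "meta": {"vin": "VIN123", "component": "powertrain"},  # signals grouped by component
    "speed_kph": 87.5,
    "fuel_level": 0.62,
})

# "How did the fuel level change over the last hour?" becomes a time-bounded query:
# db.vehicle_signals.find({"meta.vin": "VIN123", "ts": {"$gte": one_hour_ago}})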
Example 2: Modeling for the latest vehicle state If your focus is real-time state—like retrieving the latest signal values for a vehicle—you’ll prioritize fast reads and lightweight updates. Common queries include “What’s the latest coolant temperature?” or “Where are all fleet vehicles right now?” In this case, storing a single document per vehicle or update group with only the most recent signal values works well. Updating fields in place avoids document growth and keeps read complexity low. Grouping frequently updated signals together and flattening nested structures ensures that performance stays consistent as data grows. These are just two examples—tailored for different workloads—but MongoDB offers the flexibility to adapt your model as needs evolve. For a deeper dive into MongoDB data modeling best practices, check out our MongoDB University course and explore our Building with Patterns blog series. The right model isn't one-size-fits-all—it’s the one that matches your workload. How to model your vehicle signal data At the COVESA AMM Spring 2025 event, the MongoDB Industry Solutions team presented a prototype to help simplify how connected vehicle systems adopt the Vehicle Signal Specification. The concept: make it easier to move from abstract signal definitions to practical, scalable database designs. The goal wasn’t to deliver a production-ready tool—it was to spark discussion, test ideas, and validate patterns. It resonated with the community, and we’re continuing to iterate on it. For now, the use cases are limited, but they highlight important design decisions: how to structure vehicle signals, how to tailor that structure to the needs of an application, and how to test those assumptions in MongoDB. Figure 3. Vehicle Signal Data Model prototype high-level architecture. This vehicle signals data modeler is a web-based prototype built with Next.js and powered by MongoDB Atlas. It’s made up of three core modules: Schema builder: This is where it starts. You can visually explore the vehicle signals tree, select relevant data points, and define how they should be structured in your schema. Use case mapper: Once the schema is defined, this module helps map how the signals are used. Which signals are read together? Which are written most often? These insights help identify optimization opportunities before the data even hits your database. Database exporter: Finally, based on what you’ve defined, the tool generates an initial database schema optimized for your workload. You can load it with sample data, export it to a live MongoDB instance, and run aggregation pipelines to validate the design. Together, these modules walk you through the journey—from signal selection to schema generation and performance testing—all within a simple, intuitive interface. Figure 4. Vehicle signal data modeler demo in action. Build smarter, adapt faster, and scale more confidently Connected vehicle systems aren’t just about collecting data—they’re about using it, fast and at scale. To get there, you need more than a standardized signal model. You need a data solution that can keep up with constant change, massive volume, and real-time demands. That’s where MongoDB stands out. Its flexible document model, scalable architecture, and built-in support for time series and AI workloads make it a natural fit for the complexities of connected mobility. Whether you're building fleet dashboards, predictive maintenance systems, or next-gen mobility services, MongoDB helps you turn vehicle data into real-world outcomes—faster. To learn more about MongoDB’s connected mobility solutions, visit the MongoDB for Manufacturing & Mobility webpage. You can also explore the vehicle signals data modeler prototype and related resources on our GitHub repository.

July 1, 2025

Introducing Text-to-MQL with LangChain: Query MongoDB using Natural Language

We're excited to announce that we've added a powerful new capability to the MongoDB integration for LangChain: Text-to-MQL. This enhancement allows developers to easily transform natural language queries into MongoDB Query Language (MQL), enabling them to build new and intuitive application interfaces powered by large language models (LLMs). Whether you're building chatbots to interact with internal company data stored on MongoDB or AI agents that will work directly with MongoDB, this LangChain toolkit delivers out-of-the-box natural language querying with Text-to-MQL. Enabling new interfaces with Text-to-MQL LLMs are transforming the workplace by enabling people to “talk” to their data. Historically, accessing and querying databases required specialized knowledge or tools. Now, with natural language querying enabled by LLMs, developers can create new, intuitive interfaces that give virtually anyone access to data and insights—no specialized skills required. Using Text-to-MQL, developers can build applications that rely on natural language to generate insights or create visualizations for their users. This includes conversational interfaces that query MongoDB directly, democratizing database exploration and interactions. Robust database querying capabilities through natural language are also critical for building more sophisticated agentic systems. Agents leveraging MongoDB through MQL can interact autonomously with both operational and analytical data, greatly enhancing productivity across a wide range of operational and business tasks. Figure 1. Agent components and how MongoDB powers tools and memory. For instance, customer support agents leveraging Text-to-MQL capabilities can autonomously retrieve the most recent customer interactions and records directly from MongoDB databases, enabling faster and more informed responses. Similarly, agents generating application code can query database collections and schemas to ensure accurate and relevant data retrieval logic. In addition, MongoDB’s flexible document model aligns more naturally with how users describe data in plain language. Its support for nested, denormalized data in JSON-like BSON documents reduces the need for multi-table joins—an area where LLMs often struggle—making MongoDB more LLM-friendly than traditional SQL databases. Implementing Text-to-MQL with MongoDB and LangChain The LangChain and MongoDB integration package provides a comprehensive set of tools to accelerate AI application development. It supports advanced retrieval-augmented generation (RAG) implementations through integrations with MongoDB for vector search, hybrid search, GraphRAG, and more. It also enables agent development using LangGraph, with built-in support for memory persistence. The latest addition, Text-to-MQL, can be used either as a standalone component in your application or as a tool integrated into LangGraph agents. Figure 2. LangChain and MongoDB integration overview. Released in version 0.6.0 of the langchain-mongodb package, the agent_toolkit module introduces a set of tools that enable reliable interaction with MongoDB databases, without the need to develop custom integrations.
The integration enables reliable database operations, including the following pre-defined tools: List the collections in the database Retrieve the schema and sample documents for specific collections Execute MongoDB queries to retrieve data Check MongoDB queries for correctness before executing them You can leverage the LangChain database toolkit as a standalone component in your application to interact with MongoDB from natural language and build custom text interfaces or more complex agentic systems. It is highly customizable, providing the flexibility and control needed to adapt it to your specific use cases. More specifically, you can tweak and expand the standard prompts and parameters offered by the integration. When building agents using LangGraph—LangChain’s orchestration framework—this integration serves as a reliable way to give your agents access to MongoDB databases and execute queries against them. Real-world considerations when implementing Text-to-MQL Natural language querying of databases by AI applications and agentic systems is a rapidly evolving space, with best practices still taking shape. Here are a few key considerations to keep in mind as you build: Ensuring accuracy The generated MongoDB Query Language (MQL) relies heavily on the capabilities of the underlying language model and the quality of the schema or data samples provided. Ambiguities in schemas, incomplete metadata, or vague instructions can lead to incorrect or suboptimal queries. It's important to validate outputs, apply rigorous testing, and consider adding guardrails or human review, especially for complex or sensitive queries. Preserving performance Providing AI applications and agents with access to MongoDB databases can present performance challenges. The non-deterministic nature of LLMs makes workload patterns unpredictable. To mitigate the impact on production performance, consider routing agent queries to a replica set or using dedicated, optimized search nodes. Maintaining security and privacy Granting AI apps and agents access to your database should be approached with care. Apply common security principles and best practices: define and enforce roles and policies to implement least-privilege access, granting only the minimum permissions necessary for the task. Giving access to your data may involve sharing private and sensitive information with LLM providers. You should evaluate what kind of data should actually be sent (such as database names, collection names, or data samples) and whether that access can be toggled on or off to accommodate users. Build reliable AI apps and agents with MongoDB LLMs are redefining how we interact with databases. We're committed to providing developers the best paths forward for building reliable AI interfaces with MongoDB. We invite you to dive in, experiment, and explore the power of connecting AI applications and agents to your data. Try the LangChain MongoDB integration today! Ready to build? Dive into Text-to-MQL with this tutorial and get started building your own agents powered by LangGraph and MongoDB Atlas!
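As a minimal end-to-end sketch of the standalone usage described above (the connection string, database name, and model choice are placeholders, and the pattern mirrors the toolkit's documented usage rather than code from this post):

import os
from langchain_openai import ChatOpenAI
from langchain_mongodb.agent_toolkit import MongoDBDatabase, MongoDBDatabaseToolkit
from langgraph.prebuilt import create_react_agent

# Assumed environment: MONGODB_URI points at an Atlas cluster with the sample_mflix dataset loaded.
db = MongoDBDatabase.from_connection_string(os.environ["MONGODB_URI"], database="sample_mflix")
llm = ChatOpenAI(model="gpt-4o")  # any tool-calling chat model supported by LangChain

toolkit = MongoDBDatabaseToolkit(db=db, llm=llm)
agent = create_react_agent(llm, toolkit.get_tools())

# The agent lists collections, inspects schemas, generates and checks MQL, then executes it.
result = agent.invoke({"messages": [("user", "Which five movies have the highest IMDb rating?")]})
print(result["messages"][-1].content)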

June 30, 2025

Natural-Language Agents: MongoDB Text-to-MQL + LangChain

The text-to-MQL capability available in the LangChain MongoDB package converts natural language into MongoDB Query Language, enabling applications to process queries like, "Show me movies from the 1990s with ratings above 8.0," and automatically generate the corresponding MongoDB operations. This guide demonstrates how to build production-ready applications that leverage text-to-MQL for conversational database interfaces, covering agent architectures, conversation memory, and reliable database interactions at scale. Understanding text-to-MQL: Beyond simple query translation Text-to-MQL shifts database interaction from manual query construction to natural language processing. Traditional database applications require developers to parse user intent, construct queries, handle validation, and format results. Text-to-MQL applications can accept natural language directly:

# Traditional approach
def get_top_movies_by_rating(min_rating, limit):
    return db.movies.aggregate([
        {"$match": {"imdb.rating": {"$gte": min_rating}}},
        {"$sort": {"imdb.rating": -1}},
        {"$limit": limit}
    ])

# Text-to-MQL approach
def process_natural_language_query(user_query):
    return agent.invoke({"messages": [("user", user_query)]})

This transformation enables natural language interfaces for complex database operations, making data access intuitive for end users while reducing development effort for database interaction logic. The MongoDB agent toolkit: Implementing text-to-MQL Users can access agent_toolkit by installing:

pip install langchain-mongodb langchain-openai langgraph

The LangChain MongoDB agent_toolkit provides four core tools that work together to implement text-to-MQL functionality:

from langchain_mongodb.agent_toolkit import MongoDBDatabase, MongoDBDatabaseToolkit

db = MongoDBDatabase.from_connection_string(connection_string, database="sample_mflix")
toolkit = MongoDBDatabaseToolkit(db=db, llm=llm)
tools = toolkit.get_tools()

Set up MongoDB Atlas with our sample movie dataset and start experimenting with text-to-MQL in minutes: Get started with Atlas Free Tier. Load the sample MFlix Dataset. Full notebook demonstration. The text-to-MQL workflow When a user asks, "Which theaters are furthest west?" the text-to-MQL system follows this process:

Step | Tool | What happens | Example
1. Discovery | mongodb_list_collections | Agent identifies available data | Finds the theaters, movies, and users collections
2. Schema understanding | mongodb_schema | Agent examines relevant collection structure | Discovers location.geo field in theaters
3. Query generation | LLM reasoning | Natural language converts to MongoDB syntax | Creates geospatial aggregation pipeline
4. Validation | mongodb_query_checker | Agent verifies query correctness | Checks syntax and field references
5. Execution | mongodb_query | Agent runs the validated query | Returns sorted theaters by longitude

This workflow handles complex operations automatically—including geospatial queries, aggregations, and multi-collection operations—without requiring manual aggregation pipeline development. Building your first agent? Follow our step-by-step guide: Build Agents with LangGraph and MongoDB. Complex query examples Text-to-MQL handles sophisticated analytical queries:

def demo_basic_queries():
    queries = [
        "List the top 5 movies with highest IMDb ratings",
        "Who are the top 10 most active commenters?",
        # ... additional queries for theaters, geographic analysis, director analytics
    ]
    for i, query in enumerate(queries):
        # Execute each text-to-MQL query in a separate conversation thread
        execute_graph_with_memory(f"demo_{i}", query)

Query complexity examples: Temporal analysis: "Show me movie rating trends by decade for sci-fi films"—automatically filters by genre, groups by decade, and calculates statistical aggregations. Geographic intelligence: "Which states have the most theaters and what's their average capacity?"—discovers geographic fields, groups by state boundaries, and calculates regional statistics. Cross-collection analytics: "Find directors with at least 10 films who have the highest average ratings"—joins movie and director data, applies complex filtering and ranking logic. See these workflows in action Our interactive notebook demonstrates each step with live code examples you can run and modify. Explore the complete notebook in our Gen AI Showcase. Agent architecture patterns for text-to-MQL Two proven patterns address different text-to-MQL requirements based on your application's predictability needs. Pattern 1: ReAct agents for dynamic processing ReAct (Reasoning + Acting) agents provide flexible text-to-MQL processing where the optimal query strategy isn't predetermined:

from langgraph.checkpoint.mongodb import MongoDBSaver  # checkpointer covered in the memory section below
from langgraph.prebuilt import create_react_agent

def create_flexible_text_to_mql_agent():
    # Create ReAct agent with MongoDB tools and conversation memory
    checkpointer = MongoDBSaver(client)
    return create_react_agent(llm, toolkit.get_tools(), checkpointer=checkpointer)

# Usage: Create agent and execute queries with conversation context
agent = create_flexible_text_to_mql_agent()
config = {"configurable": {"thread_id": "exploration_session"}}
agent.invoke({"messages": [("user", "Find anomalies in user behavior patterns")]}, config)

For more details, see how the MongoDBDatabaseToolkit can be used to develop ReAct-style agents. Pattern 2: Structured workflows for predictable operations For applications requiring consistent text-to-MQL behavior, implement deterministic workflows:

def list_collections(state: MessagesState):
    # Call mongodb_list_collections tool to discover available data
    # Returns updated message state with collection list
    return {"messages": [call_msg, tool_response]}

def generate_query(state: MessagesState):
    # Use LLM with MongoDB tools to convert natural language to MQL
    # Returns updated message state with generated query
    return {"messages": [llm_response]}

# See notebook for complete node implementations

def create_langgraph_agent_with_enhanced_memory():
    summarizing_checkpointer = LLMSummarizingMongoDBSaver(client, llm)
    g = StateGraph(MessagesState)
    g.add_node("list_collections", list_collections)
    g.add_node("get_schema", schema_node)
    # ... add nodes for: generate_query, run_query, format_answer
    g.add_edge(START, "list_collections")
    g.add_edge("list_collections", "get_schema")
    # ... connect remaining edges: get_schema → generate_query → run_query → format_answer → END
    return g.compile(checkpointer=summarizing_checkpointer)

Choosing your agent pattern

Considerations | ReAct agents | Structured workflows
Exploratory analytics | ✅ Adapts to unpredictable queries by dynamically selecting and chaining appropriate tools at runtime | ❌ Too rigid for exploration, as a fixed workflow must be manually updated to support new "what-if" paths
Interactive dashboards | ✅ Flexible drill-down capabilities, enabling on-the-fly responses to any dashboard interaction | ❌ Fixed workflow is limiting, because a structured graph requires enumerating interactions in advance
API endpoint optimization | ❌ Unpredictable response times, since ReAct's dynamic reasoning loops can lead to variable per-request latency | ✅ Consistent performance, as a structured agent runs the same sequence of steps
Customer-facing apps | ❌ Variable behavior, as ReAct may choose different tool paths for identical inputs | ✅ Predictable user experience, since a fixed workflow yields the same sequence and similar output the majority of the time
Automated systems | ❌ Hard to debug failures, as troubleshooting requires tracing through a dynamic chain of LLM decisions and tool calls | ✅ Clear failure isolation, where failures immediately point to the specific node that broke, speeding up diagnostics

Conversational text-to-MQL: Maintaining query context Text-to-MQL's real power emerges in multi-turn conversations where users can build complex analytical workflows through natural dialogue. LangGraph's MongoDB checkpointing implementation preserves conversation context across interactions. LangGraph MongoDB checkpointer for stateful text-to-MQL Users can install the MongoDBSaver checkpointer with the following command:

pip install -U langgraph-checkpoint-mongodb pymongo

The MongoDBSaver checkpointer transforms text-to-MQL from isolated query translation into conversational analytics:

from langgraph.checkpoint.mongodb import MongoDBSaver

class LLMSummarizingMongoDBSaver(MongoDBSaver):
    def __init__(self, client, llm):
        super().__init__(client)
        self.llm = llm
        # ... initialize summary cache

    def put(self, config, checkpoint, metadata, new_versions):
        # Generate human-readable step summary using LLM
        step_summary = self.summarize_step(checkpoint)
        # Add summary to checkpoint metadata for debugging
        enhanced_metadata = metadata.copy() if metadata else {}
        enhanced_metadata['step_summary'] = step_summary
        # ... add timestamp and other metadata
        return super().put(config, checkpoint, enhanced_metadata, new_versions)

def create_react_agent_with_enhanced_memory():
    # Create ReAct agent with intelligent conversation memory
    summarizing_checkpointer = LLMSummarizingMongoDBSaver(client, llm)
    return create_react_agent(llm, toolkit.get_tools(), checkpointer=summarizing_checkpointer)

Conversational workflows in practice The checkpointer enables sophisticated, multi-turn text-to-MQL conversations:

def demo_conversation_memory():
    thread_id = f"conversation_demo_{uuid.uuid4().hex[:8]}"
    conversation = [
        "List the top 3 directors by movie count",
        "What was the movie count for the first director?",
        # ... additional contextual follow-up questions
    ]
    for query in conversation:
        # Execute each query in the same thread to maintain conversation context
        execute_graph_with_memory(thread_id, query)

What conversation memory enables: Contextual follow-ups: Users can ask "What about comedies?" after querying movie genres.
Progressive refinement: Each query builds on previous results for natural drill-down analysis.

Session persistence: Conversations survive application restarts and resume exactly where they left off.

Multi-user isolation: Different users maintain separate conversation threads.

This creates readable execution logs for debugging:

Step 1 [14:23:45] User asks about movie trends
Step 2 [14:23:46] Text-to-MQL discovers movies collection
Step 3 [14:23:47] Generated aggregation pipeline
Step 4 [14:23:48] Query validation successful
Step 5 [14:23:49] Returned 15 trend results

Production implementation guide

Moving text-to-MQL applications from development to production requires addressing performance, monitoring, testing, and integration concerns.

Performance optimization

Text-to-MQL applications face a unique challenge: LLM API calls are expensive, while generated queries can be inefficient. Implement comprehensive optimization:

# Optimization ideas for production text-to-MQL systems:
class OptimizedTextToMQLAgent:
    def __init__(self):
        # Cache frequently requested queries and schema information
        self.query_cache = {}
        self.schema_cache = {}

    def process_query(self, user_query):
        # Check the cache for similar queries to reduce LLM API calls
        if cached_result := self.check_query_cache(user_query):
            return cached_result
        # Generate a new query and cache the result
        mql_result = self.agent.invoke({"messages": [("user", user_query)]})
        # ... cache result for future use
        return mql_result

def optimize_generated_mql(query, collection_name):
    # Add performance hints and limits to agent-generated queries
    # Example: Add index hints for known collections
    if collection_name == 'movies' and '$sort' in str(query):
        query.append({'$hint': {'imdb.rating': -1}})
    # Always limit result sets to prevent runaway queries
    if not any('$limit' in stage for stage in query):
        query.append({'$limit': 1000})
    return query

Optimization strategies:

Query caching: Cache based on semantic similarity rather than exact string matching.

Index hints: Map common query patterns to existing indexes for better performance.

Result limits: Always add limits to prevent runaway queries from returning entire collections.

Schema caching: Cache collection schemas to reduce repeated discovery operations.

Read more about how to implement caching using the MongoDBCache module.

Monitoring and testing

Unlike traditional database applications, text-to-MQL systems require monitoring conversation state and agent decision-making:

def memory_system_stats():
    # Monitor text-to-MQL conversation system health
    # (assumes checkpoints live in the checkpointer's default 'checkpoints' collection)
    checkpoints = client['checkpointing_db']['checkpoints']
    total_checkpoints = checkpoints.count_documents({})
    total_threads = len(checkpoints.distinct('thread_id'))
    # ... additional metrics like average session length, memory usage
    return {"checkpoints": total_checkpoints, "threads": total_threads}

def test_enhanced_summarization():
    # Test the agent with a variety of query patterns in a single thread
    thread_id = f"summarization_test_{uuid.uuid4().hex[:8]}"
    test_queries = [
        "How many movies are in the database?",
        "Find the average rating of all movies",
        # ... additional test queries covering different analytical patterns
    ]
    # Execute all queries in the same thread to test conversation flow
    for query in test_queries:
        execute_graph_with_memory(thread_id, query)
    # Inspect results to verify LLM summarization quality
    inspect_thread_history(thread_id)

def compare_agents_with_memory(query: str):
    # Compare ReAct vs. structured workflow performance on the same query
    react_thread = f"react_{uuid.uuid4().hex[:8]}"
    graph_thread = f"graph_{uuid.uuid4().hex[:8]}"
    # Execute the same query with both agent types
    execute_react_with_memory(react_thread, query)
    execute_graph_with_memory(graph_thread, query)
    return {"react_thread": react_thread, "graph_thread": graph_thread}

Essential monitoring

Monitoring is crucial for maintaining the reliability and performance of your text-to-MQL agents. Start by tracking conversation thread growth and average session length to understand usage patterns and memory demands over time. Keep a close eye on query success rates, response times, and LLM API usage to identify potential performance bottlenecks. Following MongoDB monitoring best practices can help you set up robust observability across your stack. Additionally, set alerts for any degradation in key text-to-MQL performance metrics, such as increased latency or failed query generation. Finally, implement automated cleanup policies to archive or delete stale conversation threads, ensuring that your system remains performant and storage-efficient.

Testing strategies

Thorough testing ensures your agents produce consistent and accurate results under real-world conditions. Begin by testing semantically similar natural language queries to validate that they generate equivalent MQL results. It's also helpful to regularly compare the behavior and output of different agent execution modes—such as ReAct-style agents versus structured workflow agents—to benchmark performance and consistency. Establish baseline metrics for success rates and response times so you can track regressions or improvements over time. Don't forget to simulate concurrent conversations and introduce varying query complexity in your tests to evaluate how your system handles real-time load and edge cases.

Integration patterns

Text-to-MQL agents can be integrated into applications in several ways, depending on your architecture and latency requirements. One common pattern is exposing agent functionality via RESTful endpoints or WebSocket streams, allowing client apps to send natural language queries and receive real-time responses; a minimal sketch of this pattern appears at the end of this section. Alternatively, you can deploy agents as dedicated microservices, making it easier to scale, monitor, and update them independently from the rest of your system. For deeper integration, agents can be embedded directly into existing data access layers, enabling seamless transitions between traditional query logic and natural language interfaces without major architectural changes.

Security and access control

To safely run text-to-MQL agents in production, robust security practices must be in place. Start by implementing role-based query restrictions so that different agents or user groups have tailored access to specific data. Logging all agent-generated queries, along with the user identities and corresponding natural language inputs, creates an audit trail for traceability and debugging. To prevent runaway queries or abuse, enforce limits on query complexity and result set size. Lastly, use connection pooling strategies that can scale with agent activity while maintaining session isolation, ensuring responsiveness and security across high-traffic workloads.
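To make the REST endpoint pattern above concrete, here is a minimal sketch that wraps a text-to-MQL agent behind a FastAPI route. It assumes the execute_graph_with_memory helper referenced throughout this article is importable; FastAPI itself, the route path, the request model, and the per-user thread naming are illustrative assumptions rather than a prescribed API.

# Minimal sketch of the REST integration pattern (assumptions: FastAPI is installed,
# and execute_graph_with_memory from this article's notebook is importable).
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class QueryRequest(BaseModel):
    user_id: str   # used to isolate conversation threads per user
    question: str  # natural language question to translate into MQL

@app.post("/text-to-mql")  # route path is an illustrative choice
def text_to_mql(request: QueryRequest):
    # One thread per user keeps follow-up questions contextual while
    # preserving the multi-user isolation provided by the checkpointer.
    thread_id = f"user_{request.user_id}"
    result = execute_graph_with_memory(thread_id, request.question)
    return {"thread_id": thread_id, "answer": result}

Rate limiting, authentication, and the query-logging audit trail described under security and access control would typically sit in middleware around a route like this.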
Production deployment checklist

Before deploying your text-to-MQL agent system to production, it's important to implement safeguards and best practices that ensure reliability, security, and maintainability:

Resource limits: Set timeouts for both LLM API calls and MongoDB queries to prevent long-running or stalled requests from impacting performance.

Error handling: Ensure the system can gracefully degrade or return fallback messages when query generation or execution fails.

Rate limiting: Enforce per-user query limits to protect the system from abuse or unintentional overuse.

Environment separation: Use different agents and database connections for development, staging, and production to reduce the risk of cross-environment interference.

Configuration management: Externalize critical parameters such as the LLM model in use, timeout thresholds, and database settings, making it easier to tune the system without redeploying code.

Monitoring integration: Track text-to-MQL-specific metrics alongside broader application health metrics.

Backup strategy: Back up conversation history and agent memory according to your organization's data retention and recovery policies.

Together, these practices create a resilient foundation for deploying intelligent agents at scale.

Atlas database features supporting agents

Atlas offers powerful core database features that make it a strong foundation for LangChain text-to-MQL agents. While these features aren't specific to text-to-MQL, they provide the performance, scalability, and flexibility needed to support production-grade agentic systems.

3-in-one backend architecture

Atlas can serve as a unified backend that fulfills three critical roles in an agentic stack by acting as the:

Primary data store, housing your queryable application collections, such as movies, users, or analytics.

Vector store for embedding-based semantic search, if you're leveraging vector search capabilities.

Memory store, enabling conversation history persistence and agent checkpointing across user interactions.

This 3-in-one architecture reduces the need for external services and simplifies your overall infrastructure.

Single connection benefits

By using a single Atlas cluster to manage your data, vectors, and memory, you streamline the development and deployment process. This unified approach minimizes configuration complexity and makes it easier to maintain your system. It also provides performance advantages through data locality, allowing your agent to query related information efficiently without needing to switch between services or endpoints.

Logical database organization

To keep your agent system organized and maintainable, you can logically separate storage needs within your Atlas cluster. Application data can reside in collections like movies, users, or analytics. Agent-related infrastructure, such as conversation state and memory, can be stored in a dedicated checkpointing_db. If your agent uses semantic search, vector embeddings can be stored in purpose-built vector_search collections. This structure supports clear boundaries between functionality while maintaining the simplicity of a single database backend; a brief sketch of this layout follows below.
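To illustrate this layout, the sketch below opens one connection and addresses the three roles separately. The sample_mflix database comes from the MFlix sample dataset used in this article; the checkpoint and embedding collection names are assumptions, since the checkpointer and vector store manage their own collections.

# Minimal sketch of the 3-in-one layout on a single Atlas cluster.
# Collection names for checkpoints and embeddings are illustrative assumptions.
from pymongo import MongoClient

client = MongoClient("<YOUR_ATLAS_CONNECTION_STRING>")

# 1. Primary data store: application collections the agent queries
movies = client["sample_mflix"]["movies"]

# 2. Memory store: conversation state written by the LangGraph checkpointer
checkpoints = client["checkpointing_db"]["checkpoints"]

# 3. Vector store: embeddings for semantic search, if the agent uses it
embeddings = client["vector_search"]["movie_embeddings"]

# One client, one cluster: the agent reaches data, memory, and vectors
# without switching services or connection strings.
print(movies.estimated_document_count(), checkpoints.estimated_document_count())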
Future directions for text-to-MQL applications

Text-to-MQL represents the foundation for several emerging application patterns:

Multi-modal data interfaces: Applications that combine text-to-MQL with vector search and graph queries, enabling users to ask questions that span structured data, semantic search, and relationship analysis within single conversations.

Autonomous data exploration: Text-to-MQL agents that can suggest follow-up questions and identify interesting patterns in data, guiding users through exploratory analysis workflows.

Intelligent query optimization: Text-to-MQL systems that learn from usage patterns to automatically optimize query generation, suggest more efficient question phrasings, and recommend database schema improvements.

Collaborative analytics: Multi-user text-to-MQL environments where teams can share conversation contexts and build on each other's analytical discoveries through natural language interfaces.

These trends point toward a future where natural language becomes a powerful, flexible layer for interacting with data across every stage of the analytics and application lifecycle.

Conclusion

The text-to-MQL capabilities available in the LangChain MongoDB package provide the foundation for building data-driven applications with conversational interfaces. The architectural patterns shown here—ReAct agents for flexibility and structured workflows for predictability—address different technical requirements while sharing common patterns for memory management and error handling.

When choosing between these patterns, consider your specific requirements: ReAct agents work well for flexible data exploration and dynamic query generation, while structured workflows provide predictable performance and easier debugging. The memory systems and production patterns demonstrated here help ensure these agents can operate reliably at scale.

These implementation patterns show how to move beyond basic database APIs toward more natural, conversational data interfaces. The LangChain text-to-MQL toolkit provides the building blocks, and these patterns provide the architectural guidance for building reliable, production-ready systems.

The future of application development increasingly lies in natural language interfaces for data. Text-to-MQL provides the technical foundation to build that future today, enabling applications that understand what users want to know and automatically translate those questions into precise database operations.

Start building conversational database apps today

The LangChain MongoDB text-to-MQL package gives you everything needed to build production-ready applications with natural language database interfaces.

What's next?

Get hands-on: Load the MFlix sample dataset and run your first text-to-MQL queries.

Go deeper: Implement conversation memory and production patterns from our notebook.

Get support: Join thousands of developers building AI-powered apps with MongoDB.

Join the MongoDB Developer Community to learn about MongoDB events, discuss relevant topics, and meet other community members from around the world. Visit the MongoDB AI Learning Hub to learn more about how MongoDB can support your AI use case. Get implementation support through the MongoDB support portal.

The complete implementation demonstrating these text-to-MQL patterns is available in our companion notebook, which includes both agent architectures with conversation memory and production-grade debugging capabilities specifically designed for natural language database interfaces.

June 30, 2025

MongoDB Commits to FedRAMP High and DoD Impact Level 5 Authorizations

I’m pleased to announce that today, MongoDB is taking the next step in our commitment to the U.S. public sector by pursuing Federal Risk and Authorization Management Program (FedRAMP) High and Impact Level 5 (IL5) authorizations for MongoDB Atlas for Government. Obtaining these advanced authorizations, set by FedRAMP and the U.S. Department of Defense (DoD), will expand MongoDB’s ability to support the U.S. public sector’s most sensitive and mission-critical workloads.

These authorizations are essential for agencies and other federal entities handling highly sensitive data, including information related to national security, law enforcement, and emergency response. MongoDB Atlas for Government, with its existing FedRAMP Moderate authorization, already enables public sector organizations to meet evolving cybersecurity mandates and to accelerate their migration of legacy applications to the cloud.

What are FedRAMP High and Impact Level 5 authorizations?

The cloud sits at the center of many public sector organizations’ priority lists as they strive to modernize legacy applications and migrate workloads. To mitigate severe risks to operations, assets, and individuals, FedRAMP mandates protective standards for these agencies to follow when partnering with Cloud Service Providers (CSPs). In particular, agencies that handle highly sensitive, unclassified data, especially in areas such as national security, law enforcement, and emergency response, are required to meet FedRAMP’s High Baseline. This standard includes 421 stringent security controls designed to protect data related to life and safety.

Impact Level 5 (IL5) is a DoD certification that authorizes CSPs to securely store and process some of the DoD’s highly sensitive data. The DoD uses an “Impact Level” system to classify data by its sensitivity and the potential impact of a data breach. IL5 covers unclassified yet highly sensitive information that requires strong protections. An IL5 certification permits verified vendors to handle Controlled Unclassified Information, mission-critical details, and data related to national security systems.

MongoDB Atlas for Government is currently authorized at FedRAMP Moderate Baseline, which offers the U.S. public sector a dedicated, secure environment for deploying, managing, and scaling applications in the cloud. Attaining FedRAMP High and IL5 authorization will extend MongoDB’s capabilities to support agencies handling acutely sensitive and critical workloads with the highest compliance standards.

MongoDB Atlas for Government: Driving cloud and AI adoption in the public sector

MongoDB Atlas for Government is an independent, dedicated version of MongoDB Atlas. It is designed specifically to meet the unique needs of the U.S. public sector and independent software vendors (ISVs) developing government solutions. This AI-ready database provides the versatility and scalability required to modernize legacy applications and migrate workloads to the cloud. It operates in a secure, fully managed environment that is authorized at the FedRAMP Moderate level. MongoDB Atlas for Government is the most versatile means by which the U.S. public sector can deploy, run, and scale workloads in the cloud.

With Atlas for Government, public sector organizations and ISVs can:

Accelerate time to mission: Move rapidly with a database designed for developer productivity and broad workload support.
Achieve multi-cloud flexibility and resilience: Distribute data across multiple cloud providers to meet an organization’s specific needs while optimizing for performance, cost, and resilience.

Scale critical workloads reliably: Ensure high availability with automated backup and data recovery options, and scale on demand to meet changing needs.

Protect sensitive data confidently: Meet strict government security standards with a platform that is secure by default, FedRAMP Moderate Authorized, and trusted for handling criminal justice information (CJIS).

MongoDB Atlas for Government combines the full functionality of MongoDB’s document database with a powerful suite of data services, including Atlas Search, Vector Search, and more. This unified, multi-cloud database supports a broad range of use cases, including the creation of AI-powered applications, mobile development, transactional workloads, real-time analytics, Internet of Things (IoT) solutions, and single customer views.

Atlas for Government maintains business continuity and minimizes downtime by ensuring robust resilience and comprehensive disaster recovery. Public sector organizations have the peace of mind that data is always protected, given the ~99.995% uptime SLA, auto-scaling to handle data consumption fluctuations, and automated backup and recovery.

How the State of Utah used MongoDB to meet a statewide mandate

The State of Utah’s Electronic Resource and Eligibility Product (eREP) is a mission-critical software system that helps caseworkers determine eligibility for public benefits programs across one of the fastest-growing states in the U.S. To comply with a statewide mandate, the Utah Department of Technology Services needed to migrate eREP off legacy on-premises infrastructure and into a FedRAMP-compliant cloud environment. The department chose MongoDB Atlas for Government and used cluster-to-cluster sync to migrate from its on-premises data center with minimal disruption.

The move to a fully managed, secure cloud database platform delivered immediate performance and operational gains:

25% faster benefits calculations and document retrievals

Significantly reduced management overhead for database architects

Disaster recovery time cut from 58 hours to less than 5 minutes

“It’s much less cumbersome to maintain our databases now that we’re using fully managed MongoDB Atlas for Government,” said Manoj Gangwar, Principal Data Architect at the Utah Department of Technology Services.

Indeed, State of Utah database architects no longer have to manage shards, configurations, or load balancing. They can spend their time on more important developments, such as upgrading the rule engine at the backend of the eREP solution. With automated backups and improved resilience, eREP now delivers better service to Utah’s roughly 3.5 million citizens. It also reduces the burden on the Department of Technology Services’ IT staff, freeing up resources for modernization and innovation. Learn more here.

Expanding secure cloud support with advanced authorizations

FedRAMP High and IL5 authorizations are required by many public sector organizations to work securely with cloud service providers. Attaining these authorizations enables MongoDB Atlas for Government to support those highly regulated organizations.
With these approvals, government agencies aiming to modernize outdated databases and embrace cloud technology will be able to rely on MongoDB Atlas for Government for secure, fully managed, and compliant operations tailored to meet stringent regulatory standards. Visit the MongoDB Atlas for Government webpage to learn more.

June 30, 2025