
Build AI Memory Systems with MongoDB Atlas, AWS and Claude

When working with conversational AI, most developers fall into a familiar trap: they treat memory as simple storage—write data in, read data out. But human memory doesn't work this way. Our brains actively evaluate information importance, strengthen connections through repetition, and let irrelevant details fade over time. This disconnect creates AI systems that either remember too much (overwhelming users with irrelevant details) or too little (forgetting critical context). The stakes are significant: without sophisticated memory management, AI assistants can't provide truly personalized experiences, maintain consistent personalities, or build meaningful relationships with users.

The application we're exploring represents a paradigm shift—treating AI memory not as a database problem but as a cognitive architecture challenge. This transforms AI memory from passive storage into an active, evolving knowledge network. A truly intelligent cognitive memory isn't one that never forgets, but one that forgets with intention and remembers with purpose.

Imagine an AI assistant that doesn't just store information but builds a living, adaptive memory system that carefully evaluates, reinforces, and connects knowledge just like a human brain. This isn't science fiction—it's achievable today by combining MongoDB Atlas Vector Search with AWS Bedrock and Anthropic's Claude. You'll move from struggling with fragmented AI memory systems to building sophisticated knowledge networks that evolve organically, prioritize important information, and recall relevant context exactly when needed.

The cognitive architecture of AI memory

At its simplest, our memory system mimics three core aspects of human memory:

- Importance-weighted storage: Not all memories are equally valuable.
- Reinforcement through repetition: Important concepts strengthen over time.
- Contextual retrieval: Memories are recalled based on relevance to current context.

This approach differs fundamentally from traditional conversation storage:

Traditional conversation storage | Cognitive memory architecture
Flat history retention | Hierarchical knowledge graph
Equal weighting of all information | Importance-based prioritization
Keyword or vector-only search | Hybrid semantic & keyword retrieval
Fixed memory lifetime | Dynamic reinforcement & decay
Isolated conversation fragments | Connected knowledge network

The practical implication is an AI that "thinks" before remembering—evaluating what information to prioritize, how to connect it with existing knowledge, and when to let less important details fade.

Let's build a minimum viable implementation of this cognitive memory architecture using MongoDB Atlas, AWS Bedrock, and Anthropic's Claude. Our focus will be on creating the fundamental components that make this system work.

Service architecture

The following service architecture defines the foundational components and their interactions that power the cognitive memory system.

Figure 1. AI memory service architecture.

Built on AWS infrastructure, this comprehensive architecture connects user interactions with sophisticated memory management processes. The User Interface (Client application) serves as the entry point where humans interact with the system, sending messages and receiving AI responses enriched with conversation summaries and relevant contextual memories.
At the centre sits the AI Memory Service, the critical processing hub that coordinates information flow, processes messages, and manages memory operations across the entire system. MongoDB Atlas provides a scalable, secure, multi-cloud database foundation. The system processes data through the following key functions: Bedrock Titan Embeddings for converting text to vector representations. Memory Reinforcement for strengthening important information. Relevance-based Retrieval for finding contextually appropriate memories. Anthropic’s Claude LLM handles the importance assessment to evaluate long-term storage value, memory merging for efficient information organization, and conversation summary generation. This architecture ultimately enables AI systems to maintain contextual awareness across conversations, providing more natural, consistent, and personalized interactions over time. Database structure The database structure organizes information storage with specialized collections and indexes that enable efficient semantic retrieval and importance-based memory management. Figure 2. Example of a database structure. The database design strategically separates raw conversation data from processed memory nodes to optimize performance and functionality. The Conversations Collection maintains chronological records of all interactions, preserving the complete historical context, while the Memory Nodes Collection stores higher-level semantic information with importance ratings that facilitate cognitive prioritization. Vector Search Indexes enable efficient semantic similarity searches with O(log n) performance, allowing the system to rapidly identify contextually relevant information regardless of database size. To manage storage growth automatically, TTL(Time-To-Live) Indexes expire older conversations based on configurable retention policies. Finally, Importance and User ID indexes optimize retrieval patterns critical to the system's function, ensuring that high-priority information and user-specific context can be accessed with minimal latency. Memory node structure The Memory node structure defines the data schemas that combine content with cognitive metadata to enable human-like memory operations. Figure 3. The memory node structure. Each node includes an importance score that enables memory prioritization similar to human memory processes, allowing the system to focus on what matters most. The structure tracks access count, which facilitates reinforcement learning by recording how frequently memories are retrieved. A critical feature is the summary field, providing quick semantic access without processing the full content, significantly improving efficiency. Vector embeddings within each node enable powerful semantic search capabilities that mirror human associative thought, connecting related concepts across the knowledge base. Complementing this, the ConversationMessage structure preserves raw conversational context without interpretation, maintaining the original exchange integrity. Both structures incorporate vector embeddings as a unifying feature, enabling sophisticated semantic operations that allow the system to navigate information based on meaning rather than just keywords, creating a more human-like cognitive architecture. 
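To make the memory node structure concrete, here is a minimal sketch of what such a document might look like when inserted with PyMongo. The field names, database and collection names, and placeholder values are illustrative assumptions based on the description above, not the exact schema used by the reference implementation.

```python
# Illustrative only: field names, database/collection names, and values are
# assumptions based on the description above, not the project's exact schema.
from datetime import datetime, timezone
from pymongo import MongoClient

client = MongoClient("<your-atlas-connection-string>")  # placeholder URI
memory_nodes = client["ai_memory"]["memory_nodes"]

embedding_vector = [0.0] * 1536  # placeholder; real vectors come from Bedrock Titan Embeddings

memory_node = {
    "user_id": "user_123",
    "summary": "User prefers email over phone for follow-ups",
    "content": "During onboarding, the user asked to be contacted by email only.",
    "embedding": embedding_vector,
    "importance": 7,        # 1-10 score assigned by the LLM evaluator
    "access_count": 1,      # incremented on each retrieval (reinforcement)
    "created_at": datetime.now(timezone.utc),
    "last_accessed": datetime.now(timezone.utc),
}
memory_nodes.insert_one(memory_node)
```

The importance and access_count fields are the hooks for the reinforcement and decay behaviors described in the sections that follow.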
Memory creation process The memory creation process transforms conversational exchanges into structured memory nodes through a cognitive pipeline mimicking human memory formation by thoughtfully evaluating new information against existing knowledge, rather than indiscriminately storing everything. Figure 3. The memory creation process. Through repetition, memories are strengthened via reinforcement, similar to human cognitive processes. At its core, the LLM functions as an "importance evaluator" that assigns each memory a value on a 1-10 scale, reflecting how humans naturally prioritize information based on relevance, uniqueness, and utility. This importance rating directly affects a memory's persistence, recall probability, and survival during pruning operations. As the system evolves, memory merging simulates the human brain's ability to consolidate related concepts over time, while importance updating reflects how new discoveries change our perception of existing knowledge. The framework's pruning mechanism mirrors our natural forgetting of less significant information. Rather than simply accumulating data, this dynamic system creates an evolving memory architecture that continuously refines itself through processes remarkably similar to human cognition. Memory retrieval process The memory retrieval process leverages multiple search methodologies that optimize both recall and precision to find and contextualize relevant information across conversations and memory nodes. Figure 4. The memory retrieval process. When initiated, the system converts user queries into vector embeddings while simultaneously executing parallel operations to enhance performance. The core of this system is its hybrid search methodology that combines vector-based semantic understanding with traditional text-based keyword search, allowing it to capture both conceptual similarities and exact term matches. The process directly searches memory nodes and applies different weighting algorithms to combine scores from various search methods, producing a comprehensive relevance ranking. After identifying relevant memories, the system fetches surrounding conversation context to ensure retrieved information maintains appropriate background, followed by generating concise summaries that distill essential insights. A key innovation is the effective importance calculation that dynamically adjusts memory significance based on access patterns and other usage metrics. The final step involves building a comprehensive response package that integrates the original memories, their summaries, relevance scores, and contextual information, providing users with a complete understanding of retrieved information without requiring exhaustive reading of all content. This multi-faceted approach ensures that memory retrieval is both comprehensive and precisely tailored to user needs. Code execution flowchart The code execution flowchart provides a comprehensive mapping of how API requests navigate through the system architecture, illuminating the runtime path from initial client interaction to final response delivery. Figure 5. The code execution flowchart. When a request enters the system, it first encounters the FastAPI endpoint, which serves as the primary entry point for all client communications. From there, specialized API route handlers direct the request to appropriate processing functions based on its type and intent. 
During processing, the system creates and stores message objects in the database, ensuring a permanent record of all conversation interactions. For human-generated messages meeting specific significance criteria, a parallel memory creation branch activates, analyzing the content for long-term storage. This selective approach preserves only meaningful information while reducing storage overhead. The system then processes queries through embedding generation, transforming natural language into vector representations that enable semantic understanding. One of the most sophisticated aspects is the implementation of parallel search functions that simultaneously execute different retrieval strategies, dramatically improving response times while maintaining comprehensive result quality. These searches connect to MongoDB Atlas to perform complex database operations against the stored knowledge base. Retrieved information undergoes context enrichment and summary generation, where the AWS Bedrock (Anthropic’s Claude) LLM augments raw data with contextual understanding and concise overviews of relevant conversation history. Finally, the response combination module assembles diverse data components—semantic matches, text-based results, contextual information, and generated summaries—into a coherent, tailored response that addresses the original request. The system's behavior can be fine-tuned through configurable parameters that govern memory processing, AI model selection, database structure, and service operations, allowing for optimization without code modifications. Memory updating process The memory updating process dynamically adjusts memory importance through sophisticated reinforcement and decay mechanisms that mimic human cognitive functions. Figure 6. The memory updating process. When new information arrives, the system first retrieves all existing user memories from the database, then methodically calculates similarity scores between this new content and each stored memory. Memories exceeding a predetermined similarity threshold are identified as conceptually related and undergo importance reinforcement and access count incrementation, strengthening their position in the memory hierarchy. Simultaneously, unrelated memories experience gradual decay as their importance values diminish over time, creating a naturally evolving memory landscape. This balanced approach prevents memory saturation by ensuring that frequently accessed topics remain prominent while less relevant information gracefully fades. The system maintains a comprehensive usage history through access counts, which informs more effective importance calculations and provides valuable metadata for memory management. All these adjustments are persistently stored in MongoDB Atlas, ensuring continuity across user sessions and maintaining a dynamic memory ecosystem that evolves with each interaction. Client integration flow The following diagram illustrates the complete interaction sequence between client applications and the memory system, from message processing to memory retrieval. This flow encompasses two primary pathways: Message sending flow: When a client sends a message, it triggers a sophisticated processing chain where the API routes it to the Conversation Service, which generates embeddings via AWS Bedrock. 
After storing the message in MongoDB Atlas, the Memory Service evaluates it for potential memory creation, performing importance assessment and summary generation before creating or updating a memory node in the database. The flow culminates with a confirmation response returning to the client. Check out the code reference on Github . Memory retrieval flow: During retrieval, the client's request initiates parallel search operations where query embeddings are generated simultaneously across conversation history and memory nodes. These dual search paths—conversation search and memory node search—produce results that are intelligently combined and summarized to provide contextual understanding. The client ultimately receives a comprehensive memory package containing all relevant information. Check out the code reference on Github . Figure 7. The client integration flow. The architecture deliberately separates conversation storage from memory processing, with MongoDB Atlas serving as the central persistence layer. Each component maintains clear responsibilities and interfaces, ensuring that despite complex internal processing, clients receive unified, coherent responses. Action plan: Bringing your AI memory system to life To implement your own AI memory system: Start with the core components: MongoDB Atlas, AWS Bedrock, and Anthropic’s Claude. Focus on cognitive functions: Importance assessment, memory reinforcement, relevance-based retrieval, and memory merging Tune parameters iteratively: Start with the defaults provided, then adjust based on your application's needs. Measure the right metrics: Track uniqueness of memories, retrieval precision, and user satisfaction—not just storage efficiency. To evaluate your implementation, ask these questions: Does your system effectively prioritize truly important information? Can it recall relevant context without excessive prompting? Does it naturally form connections between related concepts? Can users perceive the system's improving memory over time? Real-world applications and insights Case Study: From repetitive Q&A to evolving knowledge A customer service AI using traditional approaches typically needs to relearn user preferences repeatedly. With our cognitive memory architecture: First interaction: User mentions they prefer email communication. The system stores this with moderate importance. Second interaction: User confirms email preference. The system reinforces this memory, increasing its importance. Future interactions: The system consistently recalls email preference without asking again, but might still verify after long periods due to natural decay. The result? A major reduction in repetitive questions, leading to a significantly better user experience. Benefits Applications implementing this approach achieved unexpected benefits: Emergent knowledge graphs: Over time, the system naturally forms conceptual clusters of related information. Insight mining: Analysis of high-importance memories across users reveals shared concerns and interests not obvious from raw conversation data. Reduced compute costs: Despite the sophisticated architecture, the selective nature of memory storage reduces overall embedding and storage costs compared to retaining full conversation histories. Limitations When implementing this system, teams typically face three key challenges: Configuration tuning: Finding the right balance of importance thresholds, decay rates, and reinforcement factors requires experimentation. 
Prompt engineering: Getting consistent, numeric importance ratings from LLMs requires careful prompt design. Our implementation uses clear constraints and numeric-only output requirements. Memory sizing: Determining the optimal memory depth per user depends on the application context. Too shallow and the AI seems forgetful; too deep and it becomes sluggish. Future directions The landscape for AI memory systems is evolving rapidly. Here are key developments on the horizon: Short-term developments Emotion-aware memory: Extending importance evaluation to include emotional salience, remembering experiences that evoke strong reactions. Temporal awareness: Adding time-based decay that varies by information type (factual vs. preferential). Multi-modal memory: Incorporating image and voice embeddings alongside text for unified memory systems. Long-term possibilities Self-supervised memory optimization: Systems that learn optimal importance ratings, decay rates, and memory structures based on user satisfaction. Causal memory networks: Moving beyond associative memory to create causal models of user intent and preferences. Privacy-preserving memory: Implementing differential privacy and selective forgetting capabilities to respect user privacy boundaries. This approach to AI memory is still evolving. The future of AI isn't just about more parameters or faster inference—it's about creating systems that learn and remember more like humans do. With the cognitive memory architecture we've explored, you're well on your way to building AI that remembers what matters. Transform your AI applications with cognitive memory capabilities today. Get started with MongoDB Atlas for free and implement vector search in minutes. For hands-on guidance, explore our GitHub repository containing complete implementation code and examples.
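As a concrete illustration of the prompt-engineering point above (clear constraints, numeric-only output), here is a minimal sketch of an importance-rating call to Claude through Amazon Bedrock's converse API. The prompt wording, model ID, and fallback behavior are assumptions for illustration, not the repository's actual implementation.

```python
# Minimal sketch (not the repository's actual prompt) of requesting a numeric-only
# importance rating from Claude via the Amazon Bedrock converse API.
# The model ID and parameter values are assumptions; adjust for your deployment.
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

def rate_importance(memory_text: str) -> int:
    prompt = (
        "Rate the long-term importance of the following user information for a "
        "personal assistant on a scale of 1 to 10. Respond with a single integer "
        "and nothing else.\n\n"
        f"Information: {memory_text}"
    )
    response = bedrock.converse(
        modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",  # assumed model ID
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        inferenceConfig={"maxTokens": 5, "temperature": 0.0},
    )
    raw = response["output"]["message"]["content"][0]["text"].strip()
    try:
        return max(1, min(10, int(raw)))  # clamp and guard against malformed output
    except ValueError:
        return 5  # fall back to a neutral score if the model strays from the format

print(rate_importance("User mentioned they prefer email communication."))
```

Clamping and a neutral fallback keep the pipeline moving even when the model occasionally ignores the numeric-only constraint.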

June 18, 2025

Backup MongoDB Enterprise Advanced via Cohesity or Rubrik

In a world where software drives business strategy, data resilience has become a core business imperative. In fact, 90% of IT and security leaders report their organizations experienced a cyberattack in the past year, with 18% facing more than 25 attacks. 1 Every mission-critical workload must be secure, compliant, and able to recover quickly from any disruption. To help customers meet these demands, MongoDB is introducing a major data resilience enhancement: third-party backup integrations in MongoDB Enterprise Advanced . As the most flexible way to run MongoDB across on-premises, private, or hybrid cloud environments, MongoDB Enterprise Advanced now makes it even easier to integrate with customers’ existing enterprise backup tools. Previously, MongoDB Enterprise Advanced customers relied on our self-hosted database management platform, MongoDB Ops Manager , to handle backup and restore operations. For the first time, MongoDB Ops Manager now supports certified integrations with trusted vendors Cohesity and Rubrik . This enables organizations to unify MongoDB backups with the platforms they already use, streamlining operations and reinforcing existing resilience and compliance strategies. Streamlined and secure backups for enterprises As modern applications grow more complex, backup requirements scale alongside them. Enterprises managing multi-terabyte workloads or operating in highly regulated environments often need tailored solutions that match their infrastructure standards and processes. Policies may also require cold storage, where backup snapshots are stored for the long term. Cohesity DataProtect and Rubrik Security Cloud are two trusted solutions for securely backing up large volumes of data and recovering with minimal downtime. While MongoDB Ops Manager offers native backup features, these integrations provide alternatives for customers with specific vendor preferences or compliance mandates without compromising on resilience or speed. These integrations enable customers to run MongoDB on-premises or in private or hybrid clouds and: Reduce complexity by consolidating backup management into existing enterprise tools. Streamline recovery using familiar vendor platforms optimized for scale. Support compliance through enterprise-grade features like backup immutability and policy enforcement. Deliver greater support for sophisticated backup policies, including long-term storage of snapshots. Easy startup Getting started with third-party backup integrations for MongoDB Enterprise Advanced is straightforward. While the bulk of the configuration is handled on the backup provider’s side (Cohesity DataProtect or Rubrik Security Cloud), there are a few setup steps within Ops Manager to ensure a successful integration: Enabling the integration: Setting a feature flag to enable third-party backup management. Generating API keys: Creating global and project-level API keys to enable secure communication with MongoDB Ops Manager. Installing MongoDB Agents: Deploying the MongoDB Ops Manager agent on each server in the cluster. Setting permissions: Verifying that agents have read/write access to the configured directory. Connecting third-party software: Using the generated API keys to integrate with Cohesity DataProtect or Rubrik Security Cloud. Synchronizing system clocks: Ensuring consistent timestamps across machines using Network Time Protocol. Configuring the oplog export path: Defining a directory for MongoDB to store oplog data. 
Activating monitoring and backup: Turning on both services for each server. Marking the deployment as third-party managed: Using the UI or API to flag the cluster. For detailed setup and integration guidance, refer to the MongoDB Ops Manager documentation , as well as the Cohesity demo and Rubrik demo . With these steps complete, backup operations are managed through the third-party platform—no additional complexity inside MongoDB. For more information on these integrations, check out the announcements from Cohesity and Rubrik .

June 18, 2025

Now in Public Preview: The MongoDB for IntelliJ Plugin

The MongoDB for IntelliJ plugin empowers Java developers to build and ship applications quickly and confidently by enhancing the Database Explorer experience in the IntelliJ IDEA. After first announcing the plugin in private preview at .local London in the fall of 2024, we’ve partnered with our friends at JetBrains to release a new and improved experience in public preview. Using the MongoDB for IntelliJ plugin, developers can analyze their application code alongside their database, accelerating query development, validating accuracy, and highlighting anti-patterns with proactive performance insights. What’s in the MongoDB for IntelliJ plugin? As part of the public preview, we’re committed to ensuring that the MongoDB for IntelliJ plugin not only meets developers' technical requirements but also paves the way for a seamless developer experience with MongoDB Atlas . The MongoDB for IntelliJ plugin Public Preview offers developers the following capabilities: Field-level autocompletion for Java queries - Auto-suggests field names from MongoDB collections as developers write queries. Schema and type validation - Surfaces inline warnings when query values don’t match the expected field type based on the collection schema, and validates that a field exists in your collection’s schema. Java query execution in IntelliJ console - Allows developers to test Java queries with a single click without needing to switch tools or translate syntax. Proactive anti-pattern detection - Identifies potential performance issues (such as a query missing an index) and provides inline warnings and documentation links. Spring and Java driver support - Supports query syntax across popular Java patterns, criteria API, and aggregation patterns. Code smarter with your AI - Plugin-generated linting insights help your in-IDE AI assistant detect and resolve code issues. Figure 1. Code smarter with your AI. Benefits of using the official MongoDB for IntelliJ plugin Java development often involves working with complex, evolving data models, making MongoDB’s flexible document model an ideal choice for Java applications' data layer. The plugin provides developers with a unified experience for building with MongoDB directly inside IntelliJ, enabling faster and more focused development. By eliminating the need to switch between IntelliJ and external tools, the plugin streamlines query development and testing workflows. Features like field-level autocomplete and inline schema validation reduces errors before runtime, allowing developers to build and validate MongoDB queries with confidence and speed. Whether writing queries with the MongoDB Java driver, Spring Data, or aggregation pipelines, the plugin provides context-aware suggestions and real-time feedback that accelerate development. Additionally, the plugin proactively flags common MongoDB query anti-patterns—such as missing indexes or inefficient operators—within your line of code, helping teams catch performance issues before they hit production. With the ability to test queries directly in the IntelliJ MongoDB console and view execution metadata like query plans and durations, the plugin brings performance awareness and code correctness to where developers actually write the code for their applications. How to get started with the MongoDB for IntelliJ plugin You can get started using the MongoDB for IntelliJ plugin through the JetBrains marketplace . Questions? Feedback? Please post on our community forums or through UserVoice . 
We value your input as we continue to develop a compelling offering for the Java community.

June 17, 2025

Introducing Kingfisher: Real-Time Secret Detection and Validation

Foreword from Kingfisher’s developer As a Staff Security Engineer at MongoDB, I spend a lot of time thinking about how to further harden the environments that our customers rely on to protect their data. Central to that is detecting and managing exposed secrets before they turn into security risks. My role involves using an array of tools, from static code analyzers 1 to secrets managers. 2 However, I have never been fully satisfied with the tools at my disposal. Frustrated by the performance issues, limited flexibility, and high false positive rates of existing open source secret scanners, I started building my own tool in July 2024. Ten months later, that project became Kingfisher , an open-source secret scanner that goes beyond detection. It also verifies the validity of the secrets it detects. What began as a pet project has grown into a core component of MongoDB’s internal security workflows. Kingfisher now helps MongoDB’s engineering teams rapidly scan and verify secrets across Git repositories, directories, and more. Kingfisher, along with moving to short-term credentials, is our answer to the growing challenges of stolen credentials and credential-stuffing attacks. I am happy to announce that we are now releasing Kingfisher to the broader community so all developers and security teams can benefit from it. And by releasing Kingfisher as open source, we’re continuing a tradition that goes back to MongoDB’s roots—empowering developers through open, accessible tools. What is Kingfisher? Kingfisher is a high-performance, open-source secret scanning tool that combs through code repositories, Git commit histories, and file systems. Kingfisher performs this to rapidly uncover hard-coded credentials, API keys, and other sensitive data. It can be used seamlessly across GitHub and GitLab repositories, both remote and local, as well as files and directories on disk, helping security teams quickly catch exposed secrets wherever they live. However, Kingfisher goes a step beyond traditional secret scanners. Most tools simply flag anything that may look like a secret, which means engineers need to sift through false positives. Kingfisher is different. It actively validates the secrets it detects by testing them against external systems, such as the relevant cloud services or API endpoints. This dynamic approach helps identify which secrets are truly active and, thus, high-risk. Figure 1. An example of an active AWS secret access key detected and validated by Kingfisher. Figure 2. An example of an inactive Slack app token discovered and validated by Kingfisher. Figure 3. An example scan summary produced by Kingfisher showing one active secret and four inactive secrets detected. Kingfisher is designed for on-premises use, running entirely within the user’s own infrastructure. As a result, discovered secrets never leave the environment or pass through a third-party service. This ensures that developers and security teams retain full control over sensitive data without inheriting a third party’s security posture or introducing yet another external store of credentials. Kingfisher is also cloud-agnostic: It verifies credentials from AWS, Azure, Google Cloud, and any other platform in use. Unlike cloud provider-specific tools that overlook cross-cloud risks, Kingfisher supports security teams’ unified visibility and control, no matter where secrets live. 
Built with both performance and security in mind, Kingfisher combines extremely fast pattern matching, source code parsing, entropy analysis, and real-time validation. This all reduces noise to surface only what actually matters. It is designed for practical, real-world use, whether scanning a single repo or integrating it into a larger CI/CD pipeline.

Why MongoDB built Kingfisher

The threat landscape is constantly evolving, and credential-related attacks are on the rise. Stolen credentials are frequently sold on underground markets. Attackers use automated tools to launch credential-stuffing attacks that can lead to unauthorized access and serious data breaches.

Traditional secret-scanning tools have not kept up. Such tools often flood teams with false positives, are slow to run, and do not confirm whether a detected secret remains active or dangerous. This means developers and security teams waste time and effort chasing down dead ends while missing actual threats.

Kingfisher was built to solve this challenge. It is fast, lightweight, and designed to detect exposed secrets. It then validates them in real time by checking whether the secret remains active. By cutting through the noise and focusing on active risks, Kingfisher enables teams to respond faster and protect systems effectively.

Kingfisher also helps security teams progress toward higher Supply-chain Levels for Software Artifacts (SLSA) compliance. It does this by supporting secure configuration management through proactive detection and verification of exposed secrets across codebases and repositories. At the foundational level, it supports SLSA's core requirement of preventing secrets from being embedded in source code. This is one of the most common and critical vulnerabilities in the software supply chain. For organizations targeting SLSA Levels 2 and above, Kingfisher also helps strengthen source code integrity by reducing the risk of malicious or accidental secret exposure, which could compromise the trustworthiness of builds. Secure configuration management is a critical part of achieving higher SLSA levels. Kingfisher helps teams adopt these best practices by helping keep secrets out of source code and managing them securely throughout the development lifecycle.

Figure 4. Runtime chart comparing Kingfisher with two other popular open-source secret scanning tools.

The runtime chart above presents the results of internal testing conducted by MongoDB engineers. It compares Kingfisher against two other popular open-source secret scanning tools: TruffleHog and GitLeaks. In this comparison, lower runtime values indicate superior performance. This underscores Kingfisher's balance of speed and robust, real-time secret validation.

How Kingfisher works

Kingfisher is built in Rust, which was chosen for its speed, safety, and concurrency capabilities. Initially inspired by and built on top of a forked version of the Apache 2-licensed "Nosey Parker" code, Kingfisher re-engineers and extends its foundation with modern, high-performance technologies. Kingfisher's features include:

- Rust-powered performance: Writing Kingfisher in Rust maximizes performance while providing memory safety. This makes it ideal for scanning large codebases without sacrificing reliability.
- High-speed regex matching with Hyperscan: Kingfisher uses Hyperscan to handle complex and high-volume pattern matching. This engine delivers high-speed regular expression matching that enables real-time scanning on the largest code repositories.
- Multi-language source parsing with Tree-sitter: Kingfisher employs Tree-sitter to parse source code accurately across 20+ programming languages. This enables Kingfisher to understand language-specific syntax, reducing false positives and improving detection accuracy.
- Efficient scanning engine: In addition to its advanced parsing and regex capabilities, Kingfisher uses multi-threaded scanning to traverse files, commit histories, and binary blobs. Custom-built rules combine pattern matching with Shannon entropy checks, 3 flagging only high-confidence secret exposures.
- Dynamic validation: Once a potential secret is detected, Kingfisher validates it by performing external checks. This includes testing database connectivity and calling cloud service APIs to confirm whether the secret is active and poses an immediate risk.
- Extensible rulesets: Kingfisher supports a rich set of rules defined in YAML files. These rules describe the patterns and metadata to look for, including confidence levels, examples, and dependency rules, to provide nuanced secret detection and validation.
- Integration ready: Kingfisher is designed to be easily integrated into automated CI/CD pipelines and used in conjunction with GitHub's secret scanning program. This enhances its role as part of a comprehensive security strategy.

How MongoDB uses Kingfisher internally

At MongoDB, Kingfisher plays a critical role in safeguarding code repositories and internal systems. As part of the company's comprehensive security strategy, Kingfisher is used across various stages of MongoDB's development and deployment pipeline. This helps secure MongoDB's codebase and complements our move away from long-lived secrets. Below are four key ways Kingfisher is used at MongoDB:

- Pre-commit scanning: MongoDB developers run Kingfisher locally to catch accidentally hard-coded secrets before they commit code.
- CI/CD integration: Kingfisher is integrated into MongoDB's continuous integration and deployment (CI/CD) pipelines, automatically ensuring that every build is scanned for potential secret exposure.
- Historical code analysis: Kingfisher scans Git commit histories to identify and remediate legacy exposures in MongoDB's code repositories.
- Cloud and database validation: Kingfisher automatically tests whether a detected credential is still valid using its dynamic validation capabilities. This allows MongoDB engineers to take immediate action if a secret has been compromised.

Get started with Kingfisher

The development—and now release—of Kingfisher represents a major leap forward in MongoDB's approach to securing code and infrastructure. More than a tool, it embodies our ongoing commitment to contribute open-source solutions that empower organizations to protect their critical assets against evolving cyber threats.

Kingfisher builds on a solid foundation and introduces significant improvements. These include:

- Real-time secret validation
- Enhanced accuracy through source code parsing with Tree-sitter
- Over 700 rules for detecting and validating a broader range of secrets
- Cross-platform support for macOS, Linux, and Windows

To learn more about Kingfisher and start using it in your own workflows, visit our GitHub repository for detailed documentation and join the community discussions.

1 Tools that examine source code without executing it to identify potential errors, vulnerabilities, or code quality issues.
2 Tools used to securely store, manage, and access sensitive information like API keys, credentials, and tokens. 3 A method of measuring randomness in a string, often used to identify high-entropy values like passwords or API keys that may indicate a secret.
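To illustrate the entropy analysis mentioned in the feature list (and defined in footnote 3), here is a generic sketch of a Shannon entropy check in Python. It is not Kingfisher's implementation, which is written in Rust; it only shows why random-looking key material tends to score higher than ordinary identifiers.

```python
# Generic illustration of the Shannon entropy heuristic described in footnote 3.
import math
from collections import Counter

def shannon_entropy(s: str) -> float:
    """Bits of entropy per character of the string s."""
    if not s:
        return 0.0
    counts = Counter(s)
    total = len(s)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

# An ordinary identifier vs. the example secret key from the AWS documentation;
# a scanner would combine a threshold on this score with pattern matching.
print(shannon_entropy("hello_world_function"))
print(shannon_entropy("wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"))
```

On its own, entropy produces many false positives, which is why tools like Kingfisher pair it with language-aware parsing and live validation.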

June 16, 2025

Digital Receipts: Mining for Customer & Business Insight with MongoDB

Imagine walking out of your favorite store and moments later receiving a personalized recommendation for a matching item, based not only on what you just bought, but your entire purchase history. This level of tailored experience has long been difficult to achieve in brick-and-mortar retail, but that’s changing thanks to digital receipts. Digital receipts are gaining traction, with Realtimes UK reporting that a quarter of UK retailers now offer them exclusively . In physical stores, traditional paper receipts represent missed opportunities: static, one-time records that serve little purpose beyond proof of purchase. In contrast, digital receipts unlock a dynamic stream of customer insights, which are a gateway to AI-powered personalization, enabling retailers to transform sales data into timely, relevant recommendations. Retailers are also seeing greater adoption of their customer loyalty apps by embedding features like digital receipts and personalized offers, giving shoppers more reasons to engage after leaving the store. Retailers are increasingly investing in digital receipts, and MongoDB enables them to digitize in-store transactions, understand shopper behavior, and deliver personalized product suggestions immediately after checkout. With MongoDB’s flexible document model , retailers can efficiently store and analyze rich transactional data, powering real-time personalization and adaptive customer experiences. It’s a smarter, data-driven approach to customer engagement, built for the physical retail world. The challenge in capturing the in-store customer journey Personalized shopping experiences are a proven driver of customer loyalty and revenue, but to deliver them effectively, retailers need a complete view of each customer’s journey. For retailers who have a brick-and-mortar presence, that’s where the gap lies. Today, many retailers are making personalization decisions based on incomplete data. While loyalty programs and customer profiles may capture some purchase history, in-store transactions often go unrecorded or take too long to turn into actionable insights. Paper receipts dominate the checkout process, and without a digital trail, these interactions are lost to the retailer’s systems. This means that even a highly engaged, in-store shopper may appear invisible when it comes to targeting and recommendations. The impact of this is twofold. First, it limits the retailer’s ability to offer relevant product suggestions, personalized promotions, or timely follow-ups, missing key opportunities to increase basket size and repeat visits. Second, it affects the customer experience, particularly in the retailer’s mobile app. Shoppers who frequent physical stores often find that their app doesn’t reflect their recent purchases or preferences, making it feel disconnected and less useful. By digitizing receipts, retailers can close this gap. Every in-store purchase becomes a rich source of insight, directly tied to the customer profile. This enables more accurate, real-time personalization, both right after checkout and in future interactions. It also adds meaningful value to the retailer’s mobile app: customers see their full purchase history, receive smarter recommendations, and access personalized offers that feel relevant. The business impact is significant: better personalization drives more revenue, while a more engaging app experience leads to higher adoption, increased usage, and stronger loyalty. 
Getting the most out of day-to-day data: Building a digital receipt solution

Retailers aiming to enhance personalization must first digitize in-store transactional data, particularly the information generated at checkout from point-of-sale (POS) systems. However, the majority of existing POS systems have fixed, non-changeable data formats, designed primarily for payment processing. These systems often vary across store locations, lack integration with customer profiles, and don't support rapid data access.

To address these challenges, retailers should centralize transaction data from all stores into a consistent and accessible format. Ensuring each purchase is reliably linked to a customer identity, through loyalty sign-ins or digital prompts, and storing that information in a manner that supports immediate, personalized engagement is crucial. Integration with POS systems is essential, allowing retailers to capture transaction data instantly and store it.

A flexible document model (like MongoDB's) stores structured, unstructured, and AI-ready data in one format, making it ideal for managing complex customer profiles and purchase histories. It captures detailed transaction data, including items, prices, context, and nested info like product attributes, preferences, and loyalty activity, all within a single document.

Figure 1. MongoDB's document model contains the data used to render the digital receipts.

This image shows how MongoDB's document model supports digital receipts by instantly ingesting all receipt details. It features a MongoDB document (left) containing both purchased product information and personalized recommendations, and the digital receipt on PDF (right). It also makes the data instantly usable for personalization engines and AI models, without the need for heavy transformation or complex joins across multiple systems.
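As a concrete illustration of such a receipt document, here is a minimal sketch inserted with PyMongo. The field names, collection names, and values are illustrative assumptions rather than a prescribed schema.

```python
# Illustrative receipt document; field names and values are assumptions,
# not a prescribed schema.
from datetime import datetime, timezone
from pymongo import MongoClient

client = MongoClient("<your-atlas-connection-string>")  # placeholder URI
receipts = client["retail"]["digital_receipts"]

receipt = {
    "customer_id": "cust_48722",          # linked via loyalty sign-in at checkout
    "store_id": "store_015",
    "purchased_at": datetime.now(timezone.utc),
    "total": {"amount": 102.97, "currency": "EUR"},
    "items": [
        {
            "sku": "SKU-1098",
            "name": "Trail running shoes",
            "quantity": 1,
            "price": 89.99,
            "attributes": {"size": "42", "color": "blue"},
        },
        {
            "sku": "SKU-2210",
            "name": "Running socks",
            "quantity": 2,
            "price": 6.49,
        },
    ],
    "loyalty": {"points_earned": 103, "tier": "gold"},
    "recommendations": [
        {"sku": "SKU-3321", "reason": "Frequently bought with trail running shoes"},
    ],
}
receipts.insert_one(receipt)
```

Because items, attributes, and recommendations live in the same document, the app can render a receipt and its suggestions with a single read.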
Should the retailer have several different brands or types of POS systems whose data arrives in different formats, the flexible document model allows them to be combined more easily, including fast onboarding when new types are introduced. Seamless integration allows connectivity with existing POS systems and third-party analytics tools, reducing friction in adoption. MongoDB enables this through features like real-time data ingestion with change streams, flexible data connectors for systems like Kafka, and an API-driven approach that supports REST. Combined with MongoDB Atlas's multi-cloud deployment support, retailers can connect and scale across diverse infrastructures without needing to re-architect their existing systems.

Retailers can surface digital receipts directly in the customer-facing app, enhancing the post-purchase experience. Shoppers gain instant access to their full purchase history, enabling features like receipt lookups, easy reorders, warranty tracking, and personalized product suggestions. This drives more app adoption and keeps customers engaged beyond the store visit.

To support this experience at scale, retailers need an architecture that can handle high volumes of receipt data from numerous store locations. MongoDB Atlas supports this through horizontal scalability and workload isolation, ensuring operational workloads like customer app interactions remain fast and reliable as data grows. Some retailers optimize storage by keeping receipt metadata in MongoDB while storing the full receipt in an object store like Azure Blob Storage or Google Cloud Storage, enabling a cost-effective approach.

Figure 2. Architecture diagram showing the Digital Receipts components.

MongoDB's ability to serve real-time queries with low latency ensures that every tap or search in the app feels instant, helping reinforce customer trust and satisfaction. This makes the app not just a digital companion but a key driver of loyalty and repeat visits. By making digital receipts easily accessible in the app, alongside personalized recommendations and seamless post-purchase interactions, retailers create a more engaging and convenient experience that keeps customers coming back. Increased app adoption leads to more touchpoints, better data collection, and more opportunities to upsell or cross-sell, ultimately boosting revenue and retention.

A notable example of a retailer leveraging MongoDB for digital receipts is Albert Heijn, the largest supermarket chain in the Netherlands. By utilizing MongoDB Atlas, Albert Heijn developed a digital receipts feature within their customer-facing app, providing shoppers with real-time and historical insights into their in-store purchases. This adoption of MongoDB Atlas led to annual savings of 25%, improved developer productivity, and a more efficient customer experience.

Retailers use digital receipt data to improve personalized recommendations by combining purchase history, preferences, and behavior. Digitized receipts enable tracking of items, frequency, and context, allowing real-time linking of in-store purchases to customer profiles for more accurate, timely offers.

Figure 3. Diagram showing the Digital Receipts process flow.

The image illustrates the digital receipts process:

1. A customer makes a purchase in-store,
2. receives a digital receipt via email or SMS,
3. verifies it through an app,
4. accesses purchase history and personalized recommendations, and
5. can repurchase items through the app.

Using MongoDB's aggregation pipelines and change streams, retailers can process data efficiently and enable AI-driven personalization immediately after checkout, as sketched below. This streamlined handling of structured and unstructured receipt data supports rapid analysis of customer preferences and purchasing patterns. MongoDB's workload isolation ensures that analytical processes do not impact the performance of customer-facing applications, maintaining a seamless user experience. Retailers can enhance customer engagement by leveraging this data to offer personalized promotions, loyalty rewards, and cross-selling opportunities.
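As a rough illustration of the change streams mentioned above, a listener on the receipts collection could hand each newly ingested receipt to a personalization step. The collection names and the update_recommendations helper below are hypothetical placeholders, not part of a specific reference implementation.

```python
# Minimal change-stream sketch: react to newly inserted receipts and trigger a
# downstream personalization step. Names are hypothetical placeholders.
from pymongo import MongoClient

client = MongoClient("<your-atlas-connection-string>")  # placeholder URI
receipts = client["retail"]["digital_receipts"]

def update_recommendations(receipt: dict) -> None:
    # Placeholder: call your recommendation or embedding pipeline here.
    print(f"Refreshing recommendations for {receipt['customer_id']}")

pipeline = [{"$match": {"operationType": "insert"}}]
with receipts.watch(pipeline) as stream:
    for change in stream:
        update_recommendations(change["fullDocument"])
```

Running this listener on a separate node (or an isolated analytics tier) keeps the personalization work off the operational path that serves the customer app.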
Ready to embrace digital receipts?

Digital receipts are reshaping how brick-and-mortar retailers unlock customer insights and deliver AI-driven personalization. With MongoDB Atlas, retailers can instantly analyze transactional data, customer preferences, and purchase history within a flexible document model, powering real-time, tailored recommendations that increase basket size, drive repeat purchases, and boost conversions. Beyond personalization, digital receipts reduce printing costs and support sustainability by eliminating paper waste, while offering customers a convenient, app-based way to access and search past purchases.

The real value lies in the data: by capturing rich, real-time insights from every in-store transaction, retailers can unify physical and digital touchpoints, improving customer engagement and business agility. MongoDB's scalable architecture and real-time processing empower retailers to adapt quickly to changing behavior and deliver seamless, data-driven experiences. Now is the time to modernize your customer engagement strategy. Digital receipts aren’t just a convenience; they’re a competitive advantage! Discover how MongoDB Atlas can help you deliver seamless customer experiences across all channels through our solutions page.

June 12, 2025

PointHealth AI: Scaling Precision Medicine for Millions

For years, the healthcare industry has grappled with a persistent, frustrating challenge: the absence of a unified, precise approach to patient treatment. Patients often endure "trial-and-error prescribing," leading to delayed recovery and a system bogged down by inefficiency. The core problem lies in scaling precision medicine—making advanced, individualized care accessible to millions of people. This was the big obstacle that Rachel Gollub, CTO and co-founder of the VC-backed startup PointHealth AI , set out to overcome. With a vision to integrate precision medicine into mainstream healthcare, Gollub and her team are transforming how care is delivered, a mission significantly bolstered by their pivotal partnership with MongoDB . Uncovering the gaps in healthcare treatment decisions Over a decade working within the insurance industry, Gollub and her co-founder, Joe Waggoner, observed a frustrating reality: persistent gaps in how treatment decisions were made. This wasn't just about inefficiency; it directly impacted patients, who often experienced "trial-and-error prescribing" that delayed their recovery. As Gollub states, they witnessed "the frustrating gaps in treatment decision-making." It motivated them to seek a better solution. The fundamental challenge they faced was scaling precision medicine. How could something so powerful be made accessible to millions rather than just a select few hundred? The biggest obstacle wasn't solely about the technology itself; it was about seamlessly integrating that technology into existing healthcare workflows. How PointHealth AI eliminates treatment guesswork PointHealth AI's approach involves a proprietary AI reinforcement learning model. This system analyzes a range of data, including similar patient cases, detailed medical histories, drug interactions, and pharmacogenomic insights. When a physician enters a diagnosis into their health record system, PointHealth AI generates a comprehensive patient report. This report offers tailored treatments, actionable insights, and clinical considerations, all designed to guide decision-making. Gollub explains the company’s mission: "to integrate precision medicine into mainstream healthcare, ensuring every diagnosis leads to the right treatment from the start." Its focus is on "eliminating guesswork and optimizing care from the very first prescription." The objective is "to deliver personalized, data-driven treatment recommendations." Its strategy for implementation involves direct partnerships with insurance companies and employers. By embedding its technology directly into these healthcare workflows, PointHealth AI aims to ensure widespread accessibility across the entire system. It’s also collaborating with health systems, electronic health record (EHR) companies, and other insurers. The natural choice: Why PointHealth AI chose MongoDB Atlas A significant enabler of this progress has been PointHealth AI's partnership with MongoDB. Gollub's prior experience with both self-hosted and managed MongoDB provided confidence in its performance and reliability. MongoDB Atlas was a "natural choice" when selecting a data platform for PointHealth AI. It offered the features the team was looking for, including vector search , text search , and managed scalability . The provision of Atlas credits also swayed the decision. PointHealth AI had specific requirements for its data platform. It needed "high security, HIPAA compliance, auto-scaling, fast throughput, and powerful search capabilities." 
The fact that MongoDB Atlas provided these features within a single, managed solution was huge. MongoDB Atlas ensures seamless backups and uptime through its managed database infrastructure. Its vector and text search capabilities are critical for effectively training AI models. The scaling experience has been "seamless," according to Gollub. The MongoDB team has offered "invaluable guidance in architecting a scalable system." This support has enabled PointHealth AI to optimize for performance while remaining on budget. Gollub emphasizes that "HIPAA compliance, scalability, expert support, and advisory sessions have all played critical roles in shaping our infrastructure." The MongoDB for Startups program has proven impactful. The "free technical advisor sessions provided a clear roadmap for our database architecture." The Atlas credits offered flexibility, allowing the team to "fine-tune our approach without financial strain." Furthermore, the "invaluable expert recommendations and troubleshooting support from the MongoDB advisor team" have been a vital resource. Gollub extends a "huge thank you to the MongoDB Atlas team for their support in building and scaling our system, and handling such an unusual use case." From pilots to Series A: PointHealth AI's next steps Looking forward, PointHealth AI has an ambitious roadmap for the current year. Its focus includes launching pilot installations and expanding partnerships with insurance and EHR companies. It’s also dedicated to refining its AI model to support a wider range of health conditions beyond depression. The overarching goal is to bring "precision-driven treatment recommendations to physicians and patients." The aim, Gollub said, is to "launch successful pilots, acquire new customers, and complete our Series A round." As Gollub states, "Precision medicine isn’t the future—it’s now." The team possesses the technology to deliver targeted treatment options, aiming to ensure patients receive the correct care from the outset. Their vision is to shape a healthcare system where personalized treatments are the standard. Visit PointHealth AI to learn more about how this innovative startup is making advanced, individualized care accessible to millions. Join the MongoDB for Startups program to start building faster and scaling further with MongoDB!

June 11, 2025

Scaling Vector Search with MongoDB Atlas Quantization & Voyage AI Embeddings

Key Takeaways

- Vector quantization fundamentals: A technique that compresses high-dimensional embeddings from 32-bit floats to lower precision formats (scalar/int8 or binary/1-bit), enabling significant performance gains while maintaining semantic search capabilities.
- Performance vs. precision trade-offs: Binary quantization provides maximum speed (80% faster queries) with minimal resources; scalar quantization offers balanced performance and accuracy; float32 maintains the highest fidelity at significant resource cost.
- Resource optimization: Vector quantization can reduce RAM usage by up to 24x (binary) or 3.75x (scalar); storage footprint decreases by 38% using the BSON binary format.
- Scaling benefits: Performance advantages multiply at scale and are most significant for vector databases exceeding 1M embeddings.
- Semantic preservation: Quantization-aware models like Voyage AI's retain high representation capacity even after compression.
- Search quality control: Binary quantization may require rescoring for maximum accuracy; scalar quantization typically maintains 90%+ retention of float32 results.
- Implementation ease: MongoDB's automatic quantization requires minimal code changes to leverage quantization techniques.

As vector databases scale into the millions of embeddings, the computational and memory requirements of high-dimensional vector operations become critical bottlenecks in production AI systems. Without effective scaling strategies, organizations face:

- Infrastructure costs that grow exponentially with data volume
- Unacceptable query latency that degrades user experience and limits real-time applications
- Limited and restricted deployment options, particularly on edge devices or resource-constrained environments
- Diminished competitive advantage as AI capabilities become limited by technical constraints and bottlenecks rather than use case innovation

This technical guide demonstrates advanced techniques for optimizing vector search operations through precision-controlled quantization—transforming resource-intensive 32-bit float embeddings into performance-optimized representations while preserving semantic fidelity. By leveraging MongoDB Atlas Vector Search's automatic quantization capabilities with Voyage AI's quantization-aware embedding models, we'll implement systematic optimization strategies that dramatically reduce both computational overhead and memory footprint.

This guide provides an empirical analysis of the critical performance metrics:

- Retrieval latency benchmarking: Quantitative comparison of search performance across binary, scalar, and float32 precision levels with controlled evaluation of HNSW (hierarchical navigable small world) graph exploration parameters and k-retrieval variations.
- Representational capacity retention: Precise measurement of semantic information preservation through direct comparison of quantized vector search results against full-fidelity retrieval, with particular attention to retention curves across varying retrieval depths.

We'll present implementation strategies and evaluation methodologies for vector quantization that simultaneously optimize for both computational efficiency and semantic fidelity—enabling you to make evidence-based architectural decisions for production-scale AI retrieval systems handling millions of embeddings.
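Before diving in, it helps to pin down what "representational capacity retention" means operationally: the share of full-fidelity (float32) results that still appear when the same query runs against a quantized index. A minimal sketch of that measurement follows; the function and identifiers are illustrative, not taken from the reference implementation.

```python
# Minimal sketch of retention@k: the share of full-fidelity (float32) results
# that also appear in the quantized result set. Names are illustrative.
def retention_at_k(float32_ids: list, quantized_ids: list, k: int) -> float:
    baseline = set(float32_ids[:k])
    candidate = set(quantized_ids[:k])
    return len(baseline & candidate) / k if k else 0.0

# Example: 9 of the top 10 float32 results survive quantization -> 0.9
print(retention_at_k(["d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8", "d9", "d10"],
                     ["d2", "d1", "d4", "d3", "d6", "d5", "d8", "d7", "d10", "d99"], 10))
```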
The techniques demonstrated here are directly applicable to enterprise-grade RAG architectures, recommendation engines, and semantic search applications where millisecond-level latency improvements and dramatic RAM reduction translate to significant infrastructure cost savings. The full end-to-end implementation for automatic vector quantization and the other operations involved in RAG/agent pipelines can be found in our GitHub repository.

Auto-quantization of Voyage AI embeddings with MongoDB

Our approach addresses the complete optimization cycle for vector search operations, covering:

- Generating embeddings with quantization-aware models
- Implementing automatic vector quantization in MongoDB Atlas
- Creating and configuring specialized vector search indices
- Measuring and comparing latency across different quantization strategies
- Quantifying representational capacity retention
- Analyzing performance trade-offs between binary, scalar, and float32 implementations
- Making evidence-based architectural decisions for production AI retrieval systems

Figure 1. Vector quantization architecture with MongoDB Atlas and Voyage AI.

Using text data as an example, we convert documents into numerical vector embeddings that capture semantic relationships. MongoDB then indexes and stores these embeddings for efficient similarity searches. By comparing queries run against float32, int8, and binary embeddings, you can gauge the trade-offs between precision and performance and better understand which quantization strategy best suits large-scale, high-throughput workloads. One key takeaway from this article is that representational capacity retention is highly dependent on the embedding model used. With quantization-aware models like Voyage AI's voyage-3-large at appropriate dimensionality (1024 dimensions), our tests demonstrate that we can achieve 95%+ recall retention at reasonable numCandidates values. This means organizations can significantly reduce memory and computational requirements while preserving semantic search quality, provided they select embedding models specifically designed to maintain their representation capacity after quantization. For more information on why vector quantization is crucial for AI workloads, refer to this blog post.

Dataset information

Our quantization evaluation framework leverages two complementary datasets designed specifically to benchmark semantic search performance across different precision levels. Primary Dataset (Wikipedia-22-12-en-voyage-embed): Contains approximately 300,000 Wikipedia article fragments with pre-generated 1024-dimensional embeddings from Voyage AI's voyage-3-large model. This dataset serves as a diverse vector corpus for testing vector quantization effects in semantic search. Throughout this tutorial, we'll use the primary dataset to demonstrate the technical implementation of quantization.

Embedding generation with Voyage AI

For generating new embeddings for AI search applications, we use Voyage AI's voyage-3-large model, which is designed to be quantization-aware. The model generates 1024-dimensional vectors and has been specifically trained to maintain semantic properties even after quantization, making it ideal for our AI retrieval optimization strategy. For more information on how MongoDB and Voyage AI work together for optimal retrieval, see our previous article, Rethinking Information Retrieval with MongoDB and Voyage AI.
import voyageai

# Initialize the Voyage AI client
client = voyageai.Client()

def get_embedding(text, task_prefix="document"):
    """
    Generate embeddings using the voyage-3-large model for AI Retrieval.

    Parameters:
        text (str): The input text to be embedded.
        task_prefix (str): A prefix describing the task; this is prepended to the text.

    Returns:
        list: The embedding vector (1024 dimensions).
    """
    if not text.strip():
        print("Attempted to get embedding for empty text.")
        return []

    # Call the Voyage API to generate the embedding
    result = client.embed([text], model="voyage-3-large", input_type=task_prefix)

    # Return the first embedding from the result
    return result.embeddings[0]

Converting embeddings to BSON BinData format

A critical optimization step is converting embeddings to MongoDB's BSON BinData format, which significantly reduces storage and memory requirements. The BinData vector format provides significant advantages:

- Reduces disk space by approximately 3x compared to arrays
- Enables more efficient indexing with alternate types (int8, binary)
- Reduces RAM usage by 3.75x for scalar and 24x for binary quantization

from bson.binary import Binary, BinaryVectorDtype

def generate_bson_vector(array, data_type):
    return Binary.from_vector(array, BinaryVectorDtype(data_type))

# Convert embeddings to BSON BinData vector format
wikipedia_data_df["embedding"] = wikipedia_data_df["embedding"].apply(
    lambda x: generate_bson_vector(x, BinaryVectorDtype.FLOAT32)
)

Vector index creation with different quantization strategies

The cornerstone of our performance optimization framework lies in creating specialized vector indices with different quantization strategies. This process leverages MongoDB not only for general-purpose database functionality but, more specifically, for its high-performance vector database capabilities, which efficiently handle million-scale embedding collections. This implementation step shows how to set up MongoDB's vector search capabilities with automatic quantization, focusing on two primary quantization strategies: scalar (int8) and binary. These quantized indices are created alongside a full-fidelity float32 index so that retrieval latency and recall can be measured and compared across precision data types. MongoDB's vector indexes use HNSW, a graph-based indexing algorithm that organizes vectors in a hierarchical structure of layers. In this structure, vector data points within a layer are contextually similar, while higher layers are sparse compared to lower layers, which are denser and contain more vector data points. The code snippet below showcases the implementation of two quantization strategies in parallel; this enables the systematic evaluation of the latency, memory usage, and representational capacity trade-offs across the precision spectrum, enabling data-driven decisions about the optimal approach for specific application requirements. MongoDB Atlas automatic quantization is activated entirely through the vector index definition. By including the "quantization" attribute and setting its value to either "scalar" or "binary", you enable automatic compression of your embeddings at index creation time. This declarative approach means no separate preprocessing of vectors is required—MongoDB handles the dimensional reduction transparently while maintaining the original embeddings for potential rescoring operations.
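The index-creation snippet below relies on a setup_vector_search_index helper whose body is elided. For readers who want something runnable, here is one possible sketch of that helper using PyMongo's create_search_index and SearchIndexModel; the readiness-polling loop and timeout values are illustrative assumptions, not the article's original implementation.

import time
from pymongo.operations import SearchIndexModel

def setup_vector_search_index(collection, index_definition, index_name="vector_index"):
    """Create an Atlas Vector Search index and wait until it is queryable.

    Hypothetical implementation of the helper stubbed out in the next snippet;
    the polling loop and timeout are assumptions rather than the article's code.
    """
    model = SearchIndexModel(
        definition=index_definition,
        name=index_name,
        type="vectorSearch",
    )
    collection.create_search_index(model=model)

    # Poll until Atlas reports the index as queryable (bounded wait of ~5 minutes).
    for _ in range(60):
        index_info = list(collection.list_search_indexes(index_name))
        if index_info and index_info[0].get("queryable"):
            print(f"Index '{index_name}' is ready.")
            return
        time.sleep(5)
    print(f"Timed out waiting for index '{index_name}' to become queryable.")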
from pymongo.operations import SearchIndexModel

def setup_vector_search_index(collection, index_definition, index_name="vector_index"):
    """Setup a vector search index with the specified configuration"""
    ...

# 1. Scalar Quantized Index (int8)
vector_index_definition_scalar_quantized = {
    "fields": [
        {
            "type": "vector",
            "path": "embedding",
            "quantization": "scalar",  # Uses int8 quantization
            "numDimensions": 1024,
            "similarity": "cosine",
        }
    ]
}

# 2. Binary Quantized Index (1-bit)
vector_index_definition_binary_quantized = {
    "fields": [
        {
            "type": "vector",
            "path": "embedding",
            "quantization": "binary",  # Uses binary (1-bit) quantization
            "numDimensions": 1024,
            "similarity": "cosine",
        }
    ]
}

# 3. Float32 ANN Index (no quantization)
vector_index_definition_float32_ann = {
    "fields": [
        {
            "type": "vector",
            "path": "embedding",
            "numDimensions": 1024,
            "similarity": "cosine",
        }
    ]
}

# Create the indices
setup_vector_search_index(
    wiki_data_collection,
    vector_index_definition_scalar_quantized,
    "vector_index_scalar_quantized",
)
setup_vector_search_index(
    wiki_data_collection,
    vector_index_definition_binary_quantized,
    "vector_index_binary_quantized",
)
setup_vector_search_index(
    wiki_data_collection,
    vector_index_definition_float32_ann,
    "vector_index_float32_ann",
)

Implementing vector search functionality

Vector search serves as the computational foundation of modern generative AI systems. While LLMs provide reasoning and generation capabilities, vector search delivers the contextual knowledge necessary for grounding these capabilities in relevant information. This semantic retrieval operation forms the backbone of RAG architectures that power enterprise-grade AI applications, such as knowledge-intensive chatbots and domain-specific assistants. In more advanced implementations, vector search enables agentic RAG systems where autonomous agents dynamically determine what information to retrieve, when to retrieve it, and how to incorporate it into complex reasoning chains. The implementation below provides the technical overview that transforms raw embedding vectors into intelligent search components that move beyond lexical matching to true semantic understanding. Our implementation below supports both approximate nearest neighbor (ANN) search and exact nearest neighbor (ENN) search through the use_full_precision parameter:

Approximate nearest neighbor (ANN) search: When use_full_precision=False, the system performs an approximate search using:

- The specified quantized index (binary or scalar)
- The HNSW graph navigation algorithm
- A controlled exploration breadth via numCandidates

This approach sacrifices perfect accuracy for dramatic performance gains, particularly at scale. The HNSW algorithm enables sub-linear time complexity by intelligently sampling the vector space, making it possible to search billions of vectors in milliseconds instead of seconds. When combined with quantization, ANN delivers order-of-magnitude improvements in both speed and memory efficiency.

Exact nearest neighbor (ENN) search: When use_full_precision=True, the system performs exact search using:

- The original float32 embeddings (regardless of the index specified)
- An exhaustive comparison approach
- The exact=True directive to bypass approximation techniques

ENN guarantees finding the mathematically optimal nearest neighbors by computing distances between the query vector and every single vector in the database.
This brute-force approach provides perfect recall but scales linearly with collection size, becoming prohibitively expensive as vector counts increase beyond millions. We include both search modes for several critical reasons:

- Establishing ground truth: ENN provides the "perfect" baseline against which we measure the quality degradation of approximation techniques. The representational retention metrics discussed later directly compare ANN results against this ENN ground truth.
- Varying application requirements: Not all AI applications prioritize the same metrics. Time-sensitive applications (real-time customer service) might favor ANN's speed, while high-stakes applications (legal document analysis) might require ENN's accuracy.

def custom_vector_search(
    user_query,
    collection,
    embedding_path,
    vector_search_index_name="vector_index",
    top_k=5,
    num_candidates=25,
    use_full_precision=False,
):
    """
    Perform vector search with configurable precision and parameters for AI Search applications.
    """
    # Generate embedding for the query
    query_embedding = get_embedding(user_query, task_prefix="query")

    # Define the vector search stage
    vector_search_stage = {
        "$vectorSearch": {
            "index": vector_search_index_name,
            "queryVector": query_embedding,
            "path": embedding_path,
            "limit": top_k,
        }
    }

    # Configure search precision approach
    if not use_full_precision:
        # For approximate nearest neighbor (ANN) search
        vector_search_stage["$vectorSearch"]["numCandidates"] = num_candidates
    else:
        # For exact nearest neighbor (ENN) search
        vector_search_stage["$vectorSearch"]["exact"] = True

    # Project only needed fields
    project_stage = {
        "$project": {
            "_id": 0,
            "title": 1,
            "text": 1,
            "wiki_id": 1,
            "url": 1,
            "score": {"$meta": "vectorSearchScore"},
        }
    }

    # Build and execute the pipeline
    pipeline = [vector_search_stage, project_stage]
    ...

    # Execute the query
    results = list(collection.aggregate(pipeline))

    return {"results": results, "execution_time_ms": execution_time_ms}

Measuring the retrieval latency of various quantized vectors

In production AI retrieval systems, query latency directly impacts user experience, operational costs, and system throughput capacity. Vector search operations typically constitute the primary performance bottleneck in RAG architectures, making latency optimization a critical engineering priority. Sub-100ms response times are often necessary for interactive and mission-critical applications, while batch processing systems may tolerate higher latencies but require consistent predictability for resource planning. Our latency measurement methodology employs a systematic, parameterized approach that models real-world query patterns while isolating the performance characteristics of different quantization strategies. This parameterized benchmarking enables us to:

- Construct detailed latency profiles across varying retrieval depths
- Identify performance inflection points where quantization benefits become significant
- Map the scaling curves of different precision levels as the data volume increases
- Determine optimal configuration parameters for specific throughput targets

def measure_latency_with_varying_topk(
    user_query,
    collection,
    vector_search_index_name,
    use_full_precision=False,
    top_k_values=[5, 10, 50, 100],
    num_candidates_values=[25, 50, 100, 200, 500, 1000, 2000],
):
    """
    Measure search latency across different configurations.
    """
    results_data = []

    for top_k in top_k_values:
        for num_candidates in num_candidates_values:
            # Skip invalid configurations
            if num_candidates < top_k:
                continue

            # Get precision type from index name
            precision_name = vector_search_index_name.split("vector_index")[1]
            precision_name = precision_name.replace("quantized", "").capitalize()
            if use_full_precision:
                precision_name = "_float32_ENN"

            # Perform search and measure latency
            vector_search_results = custom_vector_search(
                user_query=user_query,
                collection=collection,
                embedding_path="embedding",
                vector_search_index_name=vector_search_index_name,
                top_k=top_k,
                num_candidates=num_candidates,
                use_full_precision=use_full_precision,
            )
            latency_ms = vector_search_results["execution_time_ms"]

            # Store results
            results_data.append({
                "precision": precision_name,
                "top_k": top_k,
                "num_candidates": num_candidates,
                "latency_ms": latency_ms,
            })
            print(
                f"Top-K: {top_k}, NumCandidates: {num_candidates}, "
                f"Latency: {latency_ms} ms, Precision: {precision_name}"
            )

    return results_data

Latency results analysis

Our systematic benchmarking reveals dramatic performance differences between quantization strategies across different retrieval scenarios. The visualizations below capture these differences for top-k=10 and top-k=100 configurations.

Figure 2. Search latency vs the number of candidates for top-k=10.

Figure 3. Search latency vs the number of candidates for top-k=100.

Several critical patterns emerge from these latency profiles:

- Quantization delivers order-of-magnitude performance gains: The float32_ENN approach (purple line) demonstrates latency measurements an order of magnitude higher than any quantized approach. At top-k=10, ENN latency starts at ~1600ms and never drops below 500ms, while quantized approaches maintain sub-100ms performance until extremely high candidate counts. This performance gap widens further as data volume scales.
- Scalar quantization offers the best performance profile: Somewhat surprisingly, scalar quantization (orange line) consistently outperforms both binary quantization and float32 ANN across most configurations. This is particularly evident at higher num_candidates values, where scalar quantization maintains near-flat latency scaling. This suggests scalar quantization achieves an optimal balance in the memory-computation trade-off for HNSW traversal.
- Binary quantization shows linear latency scaling: While binary quantization (red line) starts with excellent performance, its latency increases more steeply as num_candidates grows, eventually exceeding scalar quantization at very high exploration depths. This suggests that while binary vectors require less memory, their distance computation savings are partially offset by the need for more complex traversal patterns in the HNSW graph and rescoring.
- All quantization methods maintain interactive-grade performance: Even with 10,000 candidate explorations and top-k=100, all quantized approaches maintain sub-200ms latency, well within interactive application requirements. This demonstrates that quantization enables order-of-magnitude increases in exploration depth without sacrificing user experience, allowing for dramatic recall improvements while maintaining acceptable latency.

These empirical results validate our theoretical understanding of quantization benefits and provide concrete guidance for production deployment: scalar quantization offers the best general-purpose performance profile, while binary quantization excels in memory-constrained environments with moderate exploration requirements.
In the images below, we employ logarithmic scaling for both axes in our latency analysis because search performance data typically spans multiple orders of magnitude. When comparing different precision types (scalar, binary, float32_ann) across varying numbers of candidates, the latency values can range from milliseconds to seconds, while candidate counts may vary from hundreds to millions. Linear plots would compress smaller values and make it difficult to observe performance trends across the full range (as we see above). Logarithmic scaling transforms exponential relationships into linear ones, making it easier to identify proportional changes, compare relative performance improvements, and detect patterns that would otherwise be obscured. This visualization approach is particularly valuable for understanding how each precision type scales with increasing workload and for identifying the optimal operating ranges where certain methods outperform others (as shown below).

Figure 4. Search latency vs the number of candidates (log scale) for top-k=10.

Figure 5. Search latency vs the number of candidates (log scale) for top-k=100.

The performance characteristics observed in the logarithmic plots above directly reflect the architectural differences inherent in binary quantization's two-stage retrieval process. Binary quantization employs a coarse-to-fine search strategy: an initial fast retrieval phase using low-precision binary representations, followed by a refinement phase that rescores the top-k candidates using full-precision vectors to restore accuracy. This dual-phase approach creates a fundamental performance trade-off that manifests differently across varying candidate pool sizes. For smaller candidate sets, the computational savings from binary operations during the initial retrieval phase can offset the rescoring overhead, making binary quantization competitive with other methods. However, as the candidate pool expands, the rescoring phase—which must compute full-precision similarity scores for an increasing number of retrieved candidates—begins to dominate the total latency profile.

Measuring representational capacity retention

While latency optimization is critical for operational efficiency, the primary concern for AI applications remains semantic accuracy. Vector quantization introduces a fundamental trade-off: computational efficiency versus representational capacity. Even the most performant quantization approach is useless if it fails to maintain the semantic relationships encoded in the original embeddings. To quantify this critical quality dimension, we developed a systematic methodology for measuring representational capacity retention—the degree to which quantized vectors preserve the same nearest-neighbor relationships as their full-precision counterparts. This approach provides an objective, reproducible framework for evaluating semantic fidelity across different quantization strategies.

def measure_representational_capacity_retention_against_float_enn(
    ground_truth_collection,
    collection,
    quantized_index_name,
    top_k_values,
    num_candidates_values,
    num_queries_to_test=1,
):
    """
    Compare quantized search results against full-precision baseline.

    For each test query:
    1. Perform baseline search with float32 exact search
    2. Perform same search with quantized vectors
    3. Calculate retention as % of baseline results found in quantized results
    """
    retention_results = {"per_query_retention": {}}
    overall_retention = {}

    # Initialize tracking structures
    for top_k in top_k_values:
        overall_retention[top_k] = {}
        for num_candidates in num_candidates_values:
            if num_candidates < top_k:
                continue
            overall_retention[top_k][num_candidates] = []

    # Get precision type
    precision_name = quantized_index_name.split("vector_index")[1]
    precision_name = precision_name.replace("quantized", "").capitalize()

    # Load test queries from ground truth annotations
    ground_truth_annotations = list(
        ground_truth_collection.find().limit(num_queries_to_test)
    )

    # For each annotation, test all its questions
    for annotation in ground_truth_annotations:
        ground_truth_wiki_id = annotation["wiki_id"]
        ...

    # Calculate average retention for each configuration
    avg_overall_retention = {}
    for top_k, cand_dict in overall_retention.items():
        avg_overall_retention[top_k] = {}
        for num_candidates, retentions in cand_dict.items():
            if retentions:
                avg = sum(retentions) / len(retentions)
            else:
                avg = 0
            avg_overall_retention[top_k][num_candidates] = avg

    retention_results["average_retention"] = avg_overall_retention
    return retention_results

Our methodology takes a rigorous approach to retention measurement:

- Establishing ground truth: We use float32 exact nearest neighbor (ENN) search as the baseline "perfect" result set, acknowledging that these are the mathematically optimal neighbors.
- Controlled comparison: For each query in our annotation dataset, we perform parallel searches using different quantization strategies, carefully controlling for top-k and num_candidates parameters.
- Retention calculation: We compute retention as the ratio of overlapping results between the quantized search and the ENN baseline: |quantized_results ∩ baseline_results| / |baseline_results|.
- Statistical aggregation: We average retention scores across multiple queries to account for query-specific variations and produce robust, generalizable metrics.

This approach provides a direct, quantitative measure of how much semantic fidelity is preserved after quantization. A retention score of 1.0 indicates that the quantized search returns exactly the same results as the full-precision search, while lower scores indicate divergence.

Representational capacity results analysis

The findings from the representational capacity retention evaluation provide empirical validation that properly implemented quantization—particularly scalar quantization—can maintain semantic fidelity while dramatically reducing computational and memory requirements. Note that in the chart below, the scalar curve (yellow) exactly matches the float32_ann performance (blue)—so much so that the blue line is completely hidden beneath the yellow. The near-perfect retention of scalar quantization should alleviate concerns about quality degradation, while binary quantization's retention profile suggests it's suitable for applications with higher performance demands that can tolerate slight quality trade-offs or compensate with increased exploration depth.

Figure 6. Retention score vs the number of candidates for top-k=10.

Figure 7. Retention score vs the number of candidates for top-k=50.

Figure 8. Retention score vs the number of candidates for top-k=100.

- Scalar quantization achieves near-perfect retention: The scalar quantization approach (orange line) demonstrates extraordinary representational capacity preservation, achieving 98-100% retention across nearly all configurations. At top-k=10, it reaches perfect 1.0 retention with just 100 candidates, effectively matching full-precision ENN results while using 4x less memory. This remarkable performance validates the effectiveness of int8 quantization when implemented with MongoDB's automatic quantization.
- Binary quantization shows a retention-exploration trade-off: Binary quantization (red line) exhibits a clear correlation between exploration depth and retention quality. At top-k=10, it starts at ~91% retention with minimal candidates but improves to 98% at 500 candidates. The effect is more pronounced at higher top-k values (50 and 100), where initial retention drops to ~74% but recovers substantially with increased exploration. This suggests that binary quantization's information loss can be effectively mitigated by exploring more of the vector space.
- Retention dynamics change with retrieval depth: As top-k increases from 10 to 100, the retention patterns become more differentiated between quantization strategies. This reflects the increasing challenge of maintaining accurate rankings as more results are requested. While scalar quantization remains relatively stable across different top-k values, binary quantization shows more sensitivity, indicating it's better suited for targeted retrieval scenarios (low top-k) than for broad exploration.
- Exploration depth compensates for precision loss: A fascinating pattern emerges across all quantization methods: increased num_candidates consistently improves retention. This demonstrates that reduced precision can be effectively counterbalanced by broader exploration of the vector space. For example, binary quantization at 500 candidates achieves better retention than scalar quantization at 25 candidates, despite using 32x less memory per vector.
- Float32 ANN vs. scalar quantization: The float32 ANN approach (blue line) shows virtually identical retention to scalar quantization at higher top-k values, while consuming 4x more memory. This suggests scalar quantization represents an optimal balance point, offering full-precision quality with significantly reduced resource requirements.

Conclusion

This guide has demonstrated the powerful impact of vector quantization in optimizing vector search operations, using MongoDB Atlas Vector Search's automatic quantization feature together with Voyage AI embeddings. The results bear out the analysis above: properly implemented quantization, particularly scalar quantization, maintains semantic fidelity while dramatically reducing computational and memory requirements. In summary:

- Binary quantization achieves optimal latency and resource efficiency, particularly valuable for high-scale deployments where speed is critical.
- Scalar quantization provides an effective balance between performance and precision, suitable for most production applications.
- Float32 maintains maximum accuracy but incurs significant performance and memory costs.

Figure 9. Performance and memory usage metrics for binary quantization, scalar quantization, and float32 implementation.
Based on the figure above, our implementation demonstrated substantial efficiency gains:

- Binary Quantized Index: a compact disk footprint at 407.66MB, representing approximately 4KB per document. This compression comes from representing high-dimensional vectors as binary bits, dramatically reducing storage requirements while maintaining retrieval capability.
- Float32 ANN Index: requires 394.73MB of disk space, slightly less than binary due to optimized index structures, but demands the full storage footprint be loaded into memory for optimal performance.
- Scalar Quantized Index: shows the largest storage requirement at 492.83MB (approximately 5KB per document), suggesting this method maintains higher precision than binary while still applying compression techniques, resulting in a middle-ground approach between full precision and extreme quantization.

The most striking difference lies in memory requirements. Binary quantization demonstrates a 23:1 memory efficiency ratio, requiring only 16.99MB in RAM versus the 394.73MB needed by float32_ann. Scalar quantization provides a 3:1 memory optimization, requiring 131.42MB compared to float32_ann's full memory footprint.

For production AI retrieval implementations, the general guidance is as follows:

- Use scalar quantization for general use cases requiring a good balance of speed and accuracy.
- Use binary quantization for large-scale applications (1M+ vectors) where speed is critical.
- Use float32 only for applications requiring maximum precision, where accuracy is paramount.

Vector quantization becomes particularly valuable for databases exceeding 1M vectors, where it enables significant scalability improvements without compromising retrieval accuracy. When combined with MongoDB Atlas Search Nodes, this approach effectively addresses both cost and performance constraints in advanced vector search applications. Boost your MongoDB skills today through our Atlas Learning Hub. Head over to our quick start guide to get started with Atlas Vector Search.
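Finally, for readers who want to reproduce the retention curves in Figures 6-8, here is a minimal sketch of how the retention harness above might be invoked. The ground-truth collection name, parameter grids, and number of test queries are illustrative assumptions; the function, collection, and index names reuse those defined in the earlier snippets.

# Illustrative driver for the retention measurement (assumes the definitions above are in scope).
retention_by_index = {}
for index_name in ["vector_index_scalar_quantized", "vector_index_binary_quantized"]:
    retention_by_index[index_name] = measure_representational_capacity_retention_against_float_enn(
        ground_truth_collection=ground_truth_annotation_collection,  # assumed annotation collection
        collection=wiki_data_collection,
        quantized_index_name=index_name,
        top_k_values=[10, 50, 100],
        num_candidates_values=[25, 50, 100, 200, 500],
        num_queries_to_test=10,
    )

# Average retention per (top_k, num_candidates) configuration, for example:
# retention_by_index["vector_index_scalar_quantized"]["average_retention"][10][100]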

June 10, 2025

Enhancing AI Observability with MongoDB and Langtrace

Building high-performance AI applications isn’t just about choosing the right models—it’s also about understanding how they behave in real-world scenarios. Langtrace offers the tools necessary to gain deep insights into AI performance, ensuring efficiency, accuracy, and scalability. San Francisco-based Langtrace AI was founded in 2024 with a mission of providing cutting-edge observability solutions for AI-driven applications. While still in its early stages, Langtrace AI has rapidly gained traction in the developer community, positioning itself as a key player in AI monitoring and optimization. Its open-source approach fosters collaboration, enabling organizations of all sizes to benefit from advanced tracing and evaluation capabilities. The company’s flagship product, Langtrace AI, is an open-source observability tool designed for building applications and AI agents that leverage large language models (LLMs). Langtrace AI enables developers to collect and analyze traces and metrics, optimizing performance and accuracy. Built on OpenTelemetry standards, Langtrace AI offers real-time tracing, evaluations, and metrics for popular LLMs, frameworks, and vector databases, with integration support for both TypeScript and Python. Beyond its core observability tools, Langtrace AI is continuously evolving to address the challenges of AI scalability and efficiency. By leveraging OpenTelemetry, the company ensures seamless interoperability with various observability vendors. Its strategic partnership with MongoDB enables enhanced database performance tracking and optimization, ensuring that AI applications remain efficient even under high computational loads. Langtrace AI's technology stack Langtrace AI is built on a streamlined—yet powerful—technology stack, designed for efficiency and scalability. Its SDK integrates OpenTelemetry libraries, ensuring tracing without disruptions. On the backend, MongoDB works with the rest of their tech stack, to manage metadata and trace storage effectively. For the client-side, Next.js powers the interface, utilizing cloud-deployed API functions to deliver robust performance and scalability. Figure 1. How Langtrace AI uses MongoDB Atlas to power AI traceability and feedback loops “We have been a MongoDB customer for the last three years and have primarily used MongoDB as our metadata store. Given our longstanding confidence in MongoDB's capabilities, we were thrilled to see the launch of MongoDB Atlas Vector Search and quickly integrated it into our feedback system, which is a RAG (retrieval-augmented generation) architecture that powers real-time feedback and insights from our users. Eventually, we added native support to trace MongoDB Atlas Vector Search to not only trace our feedback system but also to make it natively available to all MongoDB Atlas Vector Search customers by partnering officially with MongoDB.” Karthik Kalyanaraman, Co Founder and CTO, Langtrace AI. Use cases and impact The integration of Langtrace AI with MongoDB has proven transformative for developers using MongoDB Atlas Vector Search . As highlighted in Langtrace AI's MongoDB partnership announcement , our collaboration equips users with the tools needed to monitor and optimize AI applications, enhancing performance by tracking query efficiency, identifying bottlenecks, and improving model accuracy. The partnership enhances observability within the MongoDB ecosystem, facilitating faster, more reliable application development. 
Integrating MongoDB Atlas with advanced observability tools like Langtrace AI offers a powerful approach to monitoring and optimizing AI-driven applications. By tracing every stage of the vector search process—from embedding generation to query execution—MongoDB Atlas provides deep insights that allow developers to fine-tune performance and ensure smooth, efficient system operations. To explore how Langtrace AI integrates with MongoDB Atlas for real-time tracing and optimization of vector search operations, check out this insightful blog by Langtrace AI, where they walk through the process in detail. Opportunities for growth and the evolving AI ecosystem Looking ahead, Langtrace AI is excited about the prospects of expanding the collaboration with MongoDB. As developers craft sophisticated AI agents using MongoDB Atlas, the partnership aims to equip them with the advanced tools necessary to fully leverage these powerful database solutions. Together, both companies support developers in navigating increasingly complex AI workflows efficiently. As the AI landscape shifts towards non-deterministic systems with real-time decision-making, the demand for advanced observability and developer tools intensifies. MongoDB is pivotal in this transformation, providing solutions that optimize AI-driven applications and ensuring seamless development as the ecosystem evolves. Explore further Interested in learning more about Langtrace AI and MongoDB partnership? Discover the enriching capabilities Langtrace AI brings to developers within the MongoDB ecosystem. Learn about tracing MongoDB Atlas Vector Search with Langtrace AI to improve AI model performance. Access comprehensive documentation for integrating Langtrace AI with MongoDB Atlas. Start enhancing your AI applications today and experience the power of optimized observability. To learn more about building AI-powered apps with MongoDB, check out our AI Learning Hub and stop by our Partner Ecosystem Catalog to read about our integrations with MongoDB’s ever-evolving AI partner ecosystem.

June 9, 2025

What I Wish I’d Known Before Becoming a Solutions Architect

My journey to becoming a solutions architect (SA) has been anything but straightforward. After working as an engineer in telecom, receiving my PhD in computer science, and spending time in the energy efficiency and finance industries, I joined MongoDB to work at the intersection of AI and data solutions, guiding enterprises to success with MongoDB’s flexible, scalable database platform. It’s a role that requires having both deep technical knowledge and business acumen, and while the nature of the SA role has evolved over time, one thing has remained constant: the need to understand people, their problems, and how the technology we use can solve them. As I reflect on my career journey, here are some key lessons I’ve learned about being an SA—and things I wish I’d known when I first started. 1. Influence comes from understanding In my earlier roles, I thought that presenting clients with a perfect technical solution was the key to success. However, I quickly learned that being a successful solutions architect requires much more than technical excellence. Instead, the solutions that you offer need to be aligned with customers’ business needs. You also need to understand the underlying challenges driving the conversation. In my role, I frequently work with clients facing complex data challenges, whether in real-time analytics, scaling operations, or AI applications. The first step is always understanding their business goals and technical pain points, which is more important than simply proposing the “best” solution. By stepping back and listening, you can not only better design a solution that addresses their needs but also gain their trust. I’ve found that the more I understand the context, the better I can guide clients through the complexities of data architecture—whether they're building on MongoDB Atlas, optimizing for performance, or leveraging our data products to drive innovation. What I wish I’d known: Influence doesn’t come from showing how much you know—it comes from showing how much you understand. Listening is your most powerful design tool. 2. Building champions drives success You can build the most scalable, secure, and elegant system in the world — but if it doesn’t align with stakeholder priorities, it will stall. In reality, architecture is rarely a purely technical exercise. Success depends on alignment with a diverse set of stakeholders, each with their own priorities. Whether you're collaborating with engineering teams, product managers, security specialists, or leadership, the key to success is to engage everyone early and often. Stakeholders are not just passive recipients of your solution; they are active participants who co-own the outcome. In many cases, your design will be shaped by their feedback, and finding a champion within the organization can make all the difference. This champion—whether from the technical side or the business side—will help advocate for your solution internally, align the team, and overcome any resistance. This is particularly important for MongoDB SAs because we’re often addressing diverse needs, from data privacy concerns to performance scalability. Building a strong internal advocate ensures that your design gains the necessary momentum and credibility within the client’s organization. What I wish I’d known: Success doesn’t come from being right—it comes from being aligned. Influence is earned through empathy, clarity, and trust. 
As a solutions architect, your greatest value is not just in solving technical problems—it’s in helping diverse teams pull in the same direction. And nothing accelerates that more than having a strong, trusted internal champion on your side. 3. Winning deals requires teamwork At MongoDB, we’re not just selling a product—we’re selling a solution. Winning deals involves close collaboration with Sales, Engineering, and Client Services. The most successful deals come when the entire team is aligned, from understanding the customer’s unique needs to crafting a solution that fits their long-term goals. You want to win? Here’s what that actually looks like: You prep with sales like it’s a final exam. Know the account history, know the politics, know what was promised six months ago that never landed. Be the person who connects past pain to future value. You do dry runs and anticipate the tough questions. Then you hand those questions to someone else on your team who can knock them out of the park. That’s trust. You turn strategy decks into conversations . A flashy diagram is great, but asking “Does this actually solve the headache you told us about last week?” — that’s where momentum starts. You loop in Professional Services early to pressure-test feasibility. You loop in CSMs to ask, “If we win this, what does success look like a year from now?” You help sales write the follow-up  — not just with a thank-you, but with a crisp summary of what we heard, what we proposed, and what comes next. You make the path forward obvious. One of the most valuable lessons I’ve learned is that winning a deal doesn’t rely solely on delivering a flawless demo. It’s the little things that matter—anticipating questions, making quick adjustments based on client feedback, and being agile in your communication. Being part of a unified team that works seamlessly together is the key to winning deals and ensuring client success. What I wish I’d known: Winning a deal is a series of micro-decisions made together, not a solo act. Great architecture doesn’t close a deal—great alignment does. Your best asset isn’t the system you design—it’s the trust you build with your team and the confidence you project to your client that we’ve got this. Together. 4. You don’t have to know everything When I first transitioned into this role, I felt the pressure to master every piece of the tech stack—especially at MongoDB, where our solutions touch on everything from cloud data platforms to AI, real-time data processing, and beyond. It was overwhelming to think that I needed to be an expert in all of it. But here’s the truth: As a solutions architect, your real value lies not in knowing every detail, but in understanding how the pieces fit together. You don’t need to be the deepest expert in each technology—what’s important is knowing how MongoDB’s platform integrates with client needs and when to bring in the right specialists. The role is about connecting the dots, asking the right questions, and collaborating across teams. The more you embrace curiosity and rely on your colleagues, the better your solutions will be. What I wish I’d known: Mastery isn’t about knowing all the answers. It’s about knowing which questions to ask, and who to ask them to. Focus on principles, patterns, and clarity. Let go of the pressure to be the smartest person at the table—you’re there to make the table work better together. Curiosity is your compass, and collaboration is your fuel. 5. 
Architecture lives beyond the diagram When most people think of a solutions architect, they picture designing systems, building technical architectures, and drawing elegant diagrams. While that’s part of the job, the true value lies in how well those designs are communicated, understood, and adopted by the client. Specifically, your architecture needs to work in real-world scenarios. You’re not just drawing idealized diagrams on a whiteboard—you’re helping clients translate those ideas into actionable steps. That means clear communication, whether through shared documentation, interactive walkthroughs, or concise explanations. Understanding your client’s needs and constraints is just as important as the technical design itself. And when it comes to sizing and scaling, MongoDB’s flexibility makes it easy to adapt and grow as the business evolves. What I wish I knew: Architecture doesn’t end at the diagram—it begins there. The real value is realized in how well the design is communicated, contextualized, sized, and adopted. Use whatever format helps people get it. And before you document the system, understand the system of people and infrastructure you’re building it for. 6. It’s not just about data Data may be the foundation of my work as a solutions architect, but the real magic happens when you connect with people. Being a great architect means being a great communicator, listener, and facilitator. You’ll frequently find yourself between business leaders seeking faster insights and developers looking for the right data model. Translating these needs and building consensus is a big part of the role. The solutions we design are only valuable if they meet the diverse needs of the client’s teams. Whether it’s simplifying data operations, optimizing query performance, or enabling AI-driven insights, your ability to connect with stakeholders and address their unique challenges is key. Emotional intelligence, empathy, and collaboration are essential. What I wish I’d known: Being a great architect means being a great communicator, listener, and facilitator. Emotional intelligence is your secret weapon. The more time you invest in understanding your stakeholders’ pain points, motivations, and language, the more successful your architecture will be—because people will actually use it. 7. The job is constantly evolving and so are you The field of data architecture is rapidly evolving, and MongoDB is at the forefront of this change. From cloud migrations to AI-driven data products, the technology landscape is always shifting. As a solutions architect, you have to be adaptable and prepared for the next big change. At MongoDB, we work with cutting-edge technologies and constantly adapt to new trends, whether it’s AI, machine learning, or serverless computing. The key is to embrace change and continuously learn. The more you stay curious and open to new ideas, the more you’ll grow in your role and your ability to drive client success. As MongoDB continues to innovate, the learning curve is steep, but that’s what keeps the job exciting. What I wish I knew: You don’t “arrive” as a solutions architect—you evolve. And that evolution doesn’t stop. But everything you learn builds on itself. No effort is wasted. Every challenge adds depth. Every mistake adds clarity. The technologies may change, but the thinking compounds—and that’s what makes you valuable over the long run. 
It’s not just a role–it’s a journey Reflecting on my path to becoming a solutions architect at MongoDB, I realize that the journey is far from linear. From network protocols to financial systems and AI-driven data solutions, each role added a new layer to my experience. Becoming a solutions architect didn’t mean leaving behind my past—it meant integrating it into a broader perspective. At MongoDB, every day brings new challenges and opportunities. Whether you’re designing a solution for a global enterprise or helping a startup scale their data operations, the core of the job remains the same: solving problems, connecting people, and helping others succeed. And as you grow in the role, you’ll find that the most powerful thing you bring to the table isn’t just your expertise—it’s your ability to keep learning, to show up with intention, and to simplify complexity for everyone around you. To anyone stepping into this role at MongoDB: welcome. The journey is just beginning! Join our talent community for the latest MongoDB culture and careers content.

June 5, 2025

Navigating the AI Revolution: The Importance of Adaptation

In 1999, Steve Ballmer gave a famous speech in which he said that the “key to industry transformation, the key to success is developers developers developers developers developers developers developers, developers developers developers developers developers developers developers! Yes!” A similar mantra applies when discussing how to succeed with AI: adaptation, adaptation, adaptation! Artificial intelligence has already begun to transform how we work and live, and the changes AI is bringing to the world will only accelerate. Businesses rely ever more heavily on software to run and execute their strategies. So, to keep up with competitors, their processes and products must deliver what end-users increasingly expect: speed, ease of use, personalization—and, of course, AI features. Delivering all of these things (and doing so well) requires having the right tech stack and software foundation in place and then successfully executing. To better understand the challenges organizations adopting AI face, MongoDB and Capgemini recently worked with the research organization TDWI to assess the state of AI readiness across industries. The road ahead Based on a survey “representing a diverse mix of industries and company sizes,” TDWI’s “The State of Data and Operational Readiness for AI ” contains lots of super interesting findings. One I found particularly compelling is the percentage of companies with AI apps in production: businesses largely recognize the potential AI holds, but only 11% of survey respondents indicated that they had AI applications in production. Still only 11%! We’re well past the days of exploring whether AI is relevant. Now, every organization sees the value. The question is no longer ‘if’ but ‘how fast and how effectively’ they can scale it. Mark Oost, VP, AI and Generative AI Group Offer Leader, Capgemini There’s clearly work to be done; data readiness challenges highlighted in the report include managing diverse data types, ensuring accessibility, and providing sufficient compute power. Less than half (39%) of companies surveyed manage newer data formats, and only 41% feel they have enough compute. The report also shows how much AI has changed the very definition of software, and how software is developed and managed. Specifically, AI applications continuously adapt, and they learn and respond to end-user behavior in real-time; they can also autonomously make decisions and execute tasks. All of which depends on having a solid, flexible software foundation. Because the agility and adaptability of software are intrinsically linked to the data infrastructure upon which it's built, rigid legacy systems cannot keep pace with the demands of AI-driven change. So modern database solutions (like, ahem, MongoDB)—built with change in mind—are an essential part of a successful AI technology stack. Keeping up with change The tech stack can be said to comprise three layers: at the “top,” the interface or user experience layer; then the business logic layer; and a data foundation at the bottom. With AI, the same layers are there, but they’ve evolved: Unlike traditional software applications, AI applications are dynamic . Because AI-enriched software can reason and learn, the demands placed on the stack have changed. For example, AI-powered experiences include natural language interfaces, augmented reality, and those that anticipate user needs by learning from other interactions (and from data). 
In contrast, traditional software is largely static: it requires inputs or events to execute tasks, and its logic is limited by pre-defined rules. A database underpinning AI software must, therefore, be flexible and adaptable, and able to handle all types of data; it must enable high-quality data retrieval; it must respond instantly to new information; and it has to deliver the core requirements of all data solutions: security, resilience, scalability, and performance. So, to take action and generate trustworthy, reliable responses, AI-powered software needs access to up-to-date, context-rich data. Without the right data foundation in place, even the most robust AI strategy will fail. Figure 1. The frequency of change across eras of technology. Keeping up with AI can be head-spinning, both because of the many players in the space (the number of AI startups has jumped sharply since 2022, when ChatGPT was first released 1 ), and because of the accelerating pace of AI capabilities. Organizations that want to stay ahead must evolve faster than ever. As the figure above dramatically illustrates, this sort of adaptability is essential for survival. Execution, execution, execution But AI success requires more than just the right technology: expert execution is critical. Put another way, the difference between success and failure when adapting to any paradigm shift isn’t just having the right tools; it’s knowing how to wield those tools. So, while others experiment, MongoDB has been delivering real-world successes, helping organizations modernize their architectures for the AI era, and building AI applications with speed and confidence. For example, MongoDB teamed up with the Swiss bank Lombard Odier to modernize its banking tech systems. We worked with the bank to create customizable generative AI tooling, including scripts and prompts tailored for the bank’s unique tech stack, which accelerated its modernization by automating integration testing and code generation for seamless deployment. And, after Victoria’s Secret transformed its database architecture with MongoDB Atlas , the company used MongoDB Atlas Vector Search to power an AI-powered visual search system that makes targeted recommendations and helps customers find products. Another way MongoDB helps organizations succeed with AI is by offering access to both technology partners and professional services expertise. For example, MongoDB has integrations with companies across the AI landscape—including leading tech companies (AWS, Google Cloud, Microsoft), system integrators (Capgemini), and innovators like Anthropic, LangChain, and Together AI. Adapt (or else) In the AI era, what organizations need to do is abundantly clear: modernize and adapt, or risk being left behind. Just look at the history of smartphones, which have had an outsized impact on business and communication. For example, in its Q4 2007 report (which came out a few months after the first iPhone’s release), Apple reported earnings of $6.22 billion, of which iPhone sales comprised less than 2% 2 ; in Q1 2025, the company reported earnings of $124.3 billion, of which 56% was iPhone sales. 3 The mobile application market is now estimated to be in the hundreds of billions of dollars, and there are more smartphones than there are people in the world. 4 The rise of smartphones has also led to a huge increase in the number of people globally who use the internet. 5 However, saying “you need to adapt!” is much easier said than done. 
TDWI’s research, therefore, is both important and useful—it offers companies a roadmap for the future, and helps them answer their most pressing questions as they confront the rise of AI. Click here to read the full TDWI report. To learn more about how MongoDB can help you create transformative, AI-powered experiences, check out MongoDB for Artificial Intelligence. P.S. ICYMI, here’s Steve Ballmer’s famous “developers!” speech. 1 https://ourworldindata.org/grapher/newly-funded-artificial-intelligence-companies 2 https://www.apple.com/newsroom/2007/10/22Apple-Reports-Fourth-Quarter-Results/ 3 https://www.apple.com/newsroom/pdfs/fy2025-q1/FY25_Q1_Consolidated_Financial_Statements.pdf 4 https://www.weforum.org/stories/2023/04/charted-there-are-more-phones-than-people-in-the-world/ 5 https://ourworldindata.org/grapher/number-of-internet-users

June 4, 2025

Luna AI and MongoDB Throw Lifeline to Product Teams

Product and engineering leaders face a constant battle: making crucial real-time decisions amidst a sea of fragmented, reactive, and disconnected progress data. The old ways—chasing updates, endlessly pinging teams on Slack, digging through Jira, and enduring endless status meetings—simply aren't cutting it. This struggle leaves product and engineering leads wasting precious hours on manual updates, while critical risks silently slip through the cracks. This crucial challenge is precisely what Luna AI , powered by its robust partnership with MongoDB , is designed to overcome. Introducing Luna AI: Your intelligent program manager Luna AI was founded to tackle this exact problem, empowering product and engineering leaders with the visibility and context they need, without burying their PMs in busy work. Imagine having an AI program manager dedicated to giving you clear insights into goals, roadmap ROI, initiative progress, and potential risks throughout the entire product lifecycle. Luna AI makes this a reality by intelligently summarizing data from your existing tools like Jira and Slack. It can even automatically generate launch and objective and key result (OKR) status updates, create your roadmap, and analyze your Jira sprints, drastically reducing the need for manual busywork. From concept to command center: The evolution of Luna AI Luna AI’s Co-founder, Paul Debahy, a seasoned product leader with experience at Google, personally felt the pain of fragmented data during his time as a CPO. Inspired by Google's internal LaunchCal, which provided visibility into upcoming launches, Luna AI initially began as a launch management tool. However, a key realization quickly emerged: Customers primarily needed help "managing up." This insight led to a pivotal shift, focusing Luna AI on vertical management—communicating status, linking execution to strategy, and empowering leaders, especially product leaders, to drive decisions. Today, Luna AI has evolved into a sophisticated AI-driven insights platform. Deep Jira integration and advanced LLM modules have transformed it from a simple tracker into a strategic visibility layer. Luna AI now provides essential capabilities like OKR tracking, risk detection, resource and cost analysis, and smart status summaries. Luna AI believes product leadership is increasingly strategic, aiming to be the system of record for outcomes, not just tasks. Its mission: to be everyone’s AI program manager, delivering critical strategy and execution insights for smarter decision-making. The power under the hood: Building with MongoDB Atlas Luna AI’s robust technology stack includes Node.js, Angular, and the latest AI/LLM models. Its infrastructure leverages Google Cloud and, crucially, MongoDB Atlas as its primary database. When selecting a data platform, Luna AI prioritized flexibility, rapid iteration, scalability, and security. Given the dynamic, semi-structured data ingested from diverse sources like Jira, Slack, and even meeting notes, a platform that could handle this complexity was essential. Key requirements included seamless tenant separation, robust encryption, and minimal operational overhead. MongoDB proved to be the perfect fit for several reasons. The developer-friendly experience was a major factor, as was the flexible schema of its document database, which naturally accommodated Luna AI’s complex and evolving data model. 
This flexibility was vital for tracking diverse information such as Jira issues, OKRs, AI summaries, and Slack insights, enabling quick adaptation and iteration. MongoDB also offered effortless support for the startup’s multi-tenant architecture. Scaling with MongoDB Atlas has been smooth and fast, according to Luna AI. Atlas effortlessly scaled as the company added features and onboarded workspaces ranging from startups to enterprises. The monitoring dashboard has been invaluable, offering insights that helped identify performance bottlenecks early. In fact, index suggestions from the dashboard directly led to significant improvements to speed. Debahy even remarked, "Atlas’s built-in insights make it feel like we have a DB ops engineer on the team." Luna AI relies heavily on Atlas's global clusters and automated scaling . The monitoring and alerting features provide crucial peace of mind, especially during launches or data-intensive tasks like Jira AI epic and sprint summarization. The monitoring dashboard was instrumental in resolving high-latency collections by recommending the right indexes. Furthermore, in-house backups are simple, fast, and reliable, with painless restores offering peace of mind. Migrating from serverless to dedicated instances was seamless and downtime-free. Dedicated multi-tenant support allows for unlimited, isolated databases per customer. Auto-scaling is plug-and-play, with Atlas handling scaling across all environments. Security features like data-at-rest encryption and easy access restriction management per environment are also vital benefits. The support team has consistently been quick, responsive, and proactive. A game-changer for startups: The MongoDB for Startups program Operating on a tight budget as a bootstrapped and angel-funded startup, Luna AI found the MongoDB for Startups program to be a true game changer. It stands out as one of the most founder-friendly programs the company has encountered. The Atlas credits completely covered the database costs, empowering the team to test, experiment, and even make mistakes without financial pressure. This freedom allowed them to scale without worrying about database expenses or meticulously tracking every compute and resource expenditure. Access to technical advisors and support was equally crucial, helping Luna AI swiftly resolve issues ranging from load management to architectural decisions and aiding in designing a robust data model from the outset. The program also opened doors to a valuable startup community, fostering connections and feedback. Luna AI’s vision: The future of product leadership Looking ahead, Luna AI is focused on two key areas: Building a smarter, more contextual insights layer for strategy and execution. Creating a stakeholder visibility layer that requires no busy work from product managers. Upcoming improvements include predictive risk alerts spanning Jira, Slack, and meeting notes. They are also developing ROI-based roadmap planning and prioritization, smart AI executive status updates, deeper OKR traceability, and ROI-driven tradeoff analysis. Luna AI firmly believes that the role of product leadership is becoming increasingly strategic. With the support of programs like MongoDB for Startups, they are excited to build a future where Luna AI is the definitive system of record for outcomes. Ready to empower your product team? Discover how Luna AI helps product teams thrive. Join the MongoDB for Startups program to start building faster and scaling further with MongoDB!

June 3, 2025

Conformance Checking at MongoDB: Testing That Our Code Matches Our TLA+ Specs

Some features mentioned below have been sunset since this paper was originally written. Visit our docs to learn more. At MongoDB, we design a lot of distributed algorithms—algorithms with lots of concurrency and complexity, and dire consequences for mistakes. We formally specify some of the scariest algorithms in TLA+, to check that they behave correctly in every scenario. But how do we know that our implementations conform to our specs? And how do we keep them in sync as the implementation evolves? This problem is called conformance checking. In 2020, my colleagues and I experimented with two MongoDB products, to see if we could test their fidelity to our TLA+ specs. Here's a video of my presentation on this topic at the VLDB conference. (It'll be obvious to you that I recorded it from my New York apartment in deep Covid lockdown.) Below, I write about our experience with conformance checking from 2025's perspective. I'll tell you what worked for us in 2020 and what didn't, and what developments there have been in the field in the five years since our paper. Agile modelling Our conformance-checking project was born when I read a paper from 2011—"Concurrent Development of Model and Implementation"—which described a software methodology called eXtreme Modelling. The authors argued that there's a better way to use languages like TLA+, and I was convinced. They advocated a combination of agile development and rigorous formal specification: Multiple specifications model aspects of the system. Specifications are written just prior to the implementation. Specifications evolve with the implementation. Tests are generated from the model, and/or trace-checking verifies that test traces are legal in the specification. I was excited about this vision. Too often, an engineer tries to write one huge TLA+ spec for the whole system. It's too complex and detailed, so it's not much easier to understand than the implementation code, and state-space explosion dooms model checking. The author abandons the spec and concludes that TLA+ is impractical. In the eXtreme Modelling style, a big system is modeled by a collection of small specs, each focusing on an aspect of the whole. This was the direction MongoDB was already going, and it seemed right to me. In eXtreme Modelling, the conformance of the spec and implementation is continuously tested. The authors propose two conformance checking techniques. To understand these, let's consider what a TLA+ spec is: it's a description of an algorithm as a state machine. The state machine has a set of variables, and each state is an assignment of specific values to those variables. The state machine also has a set of allowed actions, which are transitions from one state to the next state. You can make a state graph by drawing states as nodes and allowed actions as edges. A behavior is any path through the graph. This diagram shows the whole state graph for some very simple imaginary spec. One of the spec's behaviors is highlighted in green. Figure 1. A formal spec's state graph, with one behavior highlighted. The spec has a set of behaviors B_spec, and the implementation has a set of behaviors B_impl. An implementation refines a spec if B_impl ⊂ B_spec. If the converse is also true, if B_spec ⊂ B_impl, then this is called bisimulation, and it's a nice property to have, though not always necessary for a correctly implemented system.
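To make the two subset relations concrete, here is a minimal Python sketch (a toy, not TLA+ tooling or MongoDB code). It treats a spec and an implementation as tiny hand-written state graphs with invented state names, enumerates their behaviors, and checks containment in both directions.

# Toy illustration (not TLA+ tooling): a spec and an implementation as tiny
# state graphs, where a "behavior" is a path from the initial state to a state
# with no outgoing transitions. Refinement means every implementation
# behavior is also a spec behavior.

def behaviors(graph, init):
    """Enumerate all behaviors (paths) of a finite, acyclic state graph."""
    paths = []

    def walk(state, path):
        nexts = graph.get(state, [])
        if not nexts:                      # terminal state: record the path
            paths.append(tuple(path))
            return
        for nxt in nexts:
            walk(nxt, path + [nxt])

    walk(init, [init])
    return set(paths)

# Hypothetical spec: from S0 the system may take either branch.
spec_graph = {"S0": ["S1", "S2"], "S1": ["S3"], "S2": ["S3"], "S3": []}
# Hypothetical implementation: it only ever takes the S1 branch.
impl_graph = {"S0": ["S1"], "S1": ["S3"], "S3": []}

b_spec = behaviors(spec_graph, "S0")
b_impl = behaviors(impl_graph, "S0")

print(b_impl <= b_spec)   # True: the implementation refines the spec
print(b_spec <= b_impl)   # False: the spec allows behaviors the implementation never takes

Checking these containments against a real spec and a real codebase, rather than toy dictionaries, is the hard part, and that is what the two techniques below attempt.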
You can test each direction: Test-case generation: For every behavior in B_spec, generate a test case that forces the implementation to follow the same sequence of transitions. If there's a spec behavior the implementation can't follow, then B_spec ⊄ B_impl, and the test fails. Trace-checking: For every behavior in B_impl, generate a trace: a log file that records the implementation's state transitions, including all implementation variables that match spec variables. If the behavior recorded in the trace isn't allowed by the spec, then B_impl ⊄ B_spec and the test fails. Figure 2. Two ways to test that the spec's behaviors are the same as the implementation's. Non-conforming behaviors are highlighted in red. Both techniques can be hard, of course. For test-case generation, you must somehow control every decision the implementation makes, squash all nondeterminism, and force it to follow a specific behavior. If the spec's state space is huge, you have to generate a huge number of tests, or choose an incomplete sample. Trace-checking, on the other hand, requires you to somehow map the implementation's state back to the spec's, and log a snapshot of the system state each time it changes—this is really hard with multithreaded programs and distributed systems. And you need to make the implementation explore a variety of behaviors, via fault-injection and stress-testing, and so on. Completeness is usually impossible. We found academic papers that demonstrated both techniques on little example applications, but we hadn’t seen them tried on production-scale systems like ours. I wanted to see how well they work, and what it would take to make them practical. I recruited my colleagues Judah Schvimer and Max Hirschhorn to try it with me. Judah and I tried trace-checking the MongoDB server (in the next section), and Max tried test-case generation with the MongoDB Mobile SDK (the remainder of this article). Figure 3. We tried two conformance checking techniques on two MongoDB products. Trace-checking the MongoDB server For the trace-checking experiment, the first step Judah and I took was to choose a TLA+ spec. MongoDB engineers had already written and model-checked a handful of specs that model different aspects of the MongoDB server (see this presentation and this one). We chose RaftMongo.tla, which focuses on how servers learn the commit point, which I'll explain now. MongoDB is typically deployed as a replica set of cooperating servers, usually three of them. They achieve consensus with a Raft-like protocol. First, they elect one server as the leader. Clients send all writes to the leader, which appends them to its log along with a monotonically increasing logical timestamp. Followers replicate the leader's log asynchronously, and they tell the leader how up-to-date they are. The leader keeps track of the commit point—the logical timestamp of the newest majority-replicated write. All writes up to and including the commit point are committed; all the writes after it are not. The commit point must be correctly tracked even when leaders and followers crash, messages are lost, a new leader is elected, uncommitted writes are rolled back, and so on. RaftMongo.tla models this protocol, and it checks two invariants: A safety property, which says that no committed write is ever lost, and a liveness property, which says that all servers eventually learn the newest commit point. Figure 4. MongoDB replica set servers and their logs.
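As a rough illustration of the commit-point rule just described (not MongoDB's actual replication code), here is a hedged Python sketch. The function name and input format are invented: each member reports the logical timestamp of the last write it has applied, and the leader advances the commit point to the newest timestamp that a majority has reached.

# Simplified sketch of the commit-point rule described above (not MongoDB's
# replication code). Each server reports the logical timestamp of the last
# log entry it has applied; the leader advances the commit point to the
# newest timestamp that a majority of servers have reached.

def commit_point(applied_timestamps):
    """applied_timestamps: one reported timestamp per replica set member,
    e.g. [10, 8, 5] for a three-node set (leader included)."""
    majority = len(applied_timestamps) // 2 + 1
    # The newest timestamp that at least a majority has reached is the
    # majority-th largest reported value. (Real Raft-style protocols also
    # require the entry to come from the current term, which this sketch ignores.)
    return sorted(applied_timestamps, reverse=True)[majority - 1]

# Leader has applied up to 10, one follower up to 8, the laggard up to 5:
# writes through timestamp 8 are majority-replicated, so they are committed.
print(commit_point([10, 8, 5]))   # 8

# If the laggard catches up, the commit point advances.
print(commit_point([10, 8, 10]))  # 10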
Judah and I wanted to test that MongoDB's C++ implementation matched our TLA+ spec, using trace-checking. Here are the steps: Run randomized tests of the implementation. Collect execution traces. Translate the execution traces into TLA+. Check the trace is permitted by the spec. Figure 5. The trace-checking workflow. The MongoDB server team has hundreds of integration tests handwritten in JavaScript, from which we chose about 300 for this experiment. We also have randomized tests; we chose one called the "rollback fuzzer" which does random CRUD operations while randomly creating and healing network partitions, causing uncommitted writes to be logged and rolled back. We added tracing code to the MongoDB server and ran each test with a three-node replica set. Since all server processes ran on one machine and communicated over localhost, we didn't worry about clock synchronization: we just merged the three logs, sorting by timestamp. We wrote a Python script to read the combined log and convert it into a giant TLA+ spec named Trace.tla with a sequence of states for the whole three-server system. Trace.tla asserted only one property: "This behavior conforms to RaftMongo.tla." Here's some more detail about the Python script. At each moment during the test, the system has some state V, which is the values of the state variables for each node. The script tries to reconstruct all the changes to V and record them in Trace.tla. It begins by setting V to a hardcoded initial state V_0, and outputs it as the first state of the sequence:

\* Each TLA+ tuple is
\* <<action, committedEntries, currentTerm, log, role, commitPoint,
\* serverLogLocation>>
\* We know the first state: all nodes are followers with empty logs.
Trace == <<
  <<"Init", \* action name
    <<"Follower","Follower","Follower">>, \* role per node
    <<1, 1, 1>>, \* commitPoint per node
    <<<<...>>,<<...>>,<<...>>>>, \* log per node
    "">>, \* trace log location (empty)
  \* ... more states will follow ...

The script reads events from the combined log and updates V. Here's an example where Node 1 was the leader in state V_i, then Node 2 logs that it became leader. The script combines these to produce V_{i+1} where Node 2 is the leader and Node 1 is now a follower. Note, this is a lie. Node 1 didn't actually become a follower in the same instant Node 2 became leader. Foreshadowing! This will be a problem for Judah and me. Figure 6. Constructing the next state from a trace event. Anyway, the Python script appends a state to the sequence in Trace.tla:

Trace == <<
  \* ... thousands of events ...
  <<"BecomePrimary", \* action name for debugging
    <<"Follower","Leader","Follower">>, \* role per node
    <<1, 1, 1>>, \* commitPoint per node
    <<<<...>>,<<...>>,<<...>>>>, \* log per node
    \* trace log location, for debugging:
    "/home/emptysquare/RollbackFuzzer/node2.log:12345">>,
  \* ... thousands more events ...
>>

We used the Python script to generate a Trace.tla file for each of the hundreds of tests we'd selected: handwritten JavaScript tests and the randomized "rollback fuzzer" test. Now we wanted to use the model-checker to check that this state sequence was permitted by our TLA+ spec, so we know our C++ code behaved in a way that conforms to the spec. Following a technique published by Ron Pressler, we added these lines to each Trace.tla:

VARIABLES log, role, commitPoint
\* Instantiate our hand-written spec, RaftMongo.tla.
Model == INSTANCE RaftMongo
VARIABLE i \* the trace index
\* Load one trace event.
Read ==
  /\ log = Trace[i][4]
  /\ role = Trace[i][5]
  /\ commitPoint = Trace[i][6]

ReadNext ==
  /\ log' = Trace[i'][4]
  /\ role' = Trace[i'][5]
  /\ commitPoint' = Trace[i'][6]

Init == i = 1 /\ Read

Next ==
  \/ i < Len(Trace) /\ i' = i + 1 /\ ReadNext
  \/ UNCHANGED <<i, vars>> \* So that we don’t get a deadlock error in TLC

TraceBehavior == Init /\ [][Next]_<<vars, i>>

\* To verify, we check the spec TraceBehavior in TLC, with Model!SpecBehavior
\* as a temporal property.

We run the standard TLA+ model-checker ("TLC"), which tells us if this trace is an allowed behavior in RaftMongo.tla. But this whole experiment failed. Our traces never matched our specification. We didn't reach our goal, but we learned three lessons that could help future engineers. What disappointment taught us Lesson one: It's hard to snapshot a multithreaded program's state. Each time a MongoDB node executes a state transition, it has to snapshot its state variables in order to log them. MongoDB is highly concurrent with fairly complex locking within each process—it was built to avoid global locking. It took us a month to figure out how to instrument MongoDB to get a consistent snapshot of all these values at one moment. We burned most of our budget for the experiment, and we worried we'd changed MongoDB too much (on a branch) to test it realistically. The 2024 paper "Validating Traces of Distributed Programs Against TLA+ Specifications" describes how to do trace-checking when you can only log some of the values (see my summary at the bottom of this page). We were aware of this option back in 2020, and we worried it would make trace-checking too permissive; it wouldn't catch every bug. Lesson two: The implementation must actually conform to the spec. This is obvious to me now. After all, conformance checking was the point of the project. In our real-life implementation, when an old leader votes for a new one, first the old leader steps down, then the new leader steps up. The spec we chose for trace-checking wasn't focused on the election protocol, though, so for simplicity, the spec assumed these two actions happened at once. (Remember I said a few paragraphs ago, "This is a lie"?) Judah and I knew about this discrepancy—we'd deliberately made this simplification in the spec. We tried to paper over the difference with some post-processing in our Python script, but it never worked. By the end of the project, we decided we should have backtracked, making our spec much more complex and realistic, but we'd run out of time. The eXtreme Modelling methodology says we should write the spec just before the implementation. But our spec was written long after most of the implementation, and it was highly abstract. I can imagine another world where we knew about eXtreme Modelling and TLA+ at the start, when we began coding MongoDB. In that world, we wrote our spec before the implementation, with trace-checking in mind. The spec and implementation would've been structured similarly, and this would all have been much easier. Lesson three: Trace-checking should extend easily to multiple specs. Judah and I put in 10 weeks of effort without successfully trace-checking one spec, and most of the work was specific to that spec, RaftMongo.tla. Sure, we learned general lessons (you're reading some of them) and wrote some general code, but even if we'd gotten trace-checking to work for one spec we'd be practically starting over with the next spec.
Our original vision was to gather execution traces from all our tests, and trace-check them against all of our specifications, on every git commit. We estimated that the marginal cost of implementing trace-checking for more specs wasn't worth the marginal value, so we stopped the project. Practical trace-checking If we started again, we'd do it differently. We'd ensure the spec and implementation conform at the start, and we'd fix discrepancies by fixing the spec or the implementation right away. We'd model easily observed events like network messages, to avoid snapshotting the internal state of a multithreaded process. I still think trace-checking is worthwhile. I know it's worked for other projects. In fact MongoDB is sponsoring a grad student Finn Hackett , whom I'm mentoring, to continue trace-checking research. Let's move on to the second half of our project. Test-case generation for MongoDB Mobile SDK The MongoDB Mobile SDK is a database for mobile devices that syncs with a central server (since we wrote the paper, MongoDB has sunsetted the product ). Mobile clients can make changes locally. These changes are periodically uploaded to the server and downloaded by other clients. The clients and the server all use the same algorithm to resolve write conflicts: Operational Transformation , or OT. Max wanted to test that the clients and server implement OT correctly, meaning they resolve conflicts the same way, eventually resulting in identical data everywhere. Originally, the clients and server shared one C++ implementation of OT, so we knew they implemented the same algorithm. But in 2020, we'd recently rewritten the server in Go, so testing their conformance became urgent. Figure 7. MongoDB mobile SDK. My colleague Max Hirschhorn used test-case generation to check conformance. This technique goes in the opposite direction from trace-checking: trace-checking starts with an implementation and checks that its behaviors are allowed by the spec, but test-case generation starts with a spec and checks that its behaviors are in the implementation. But first, we needed a TLA+ spec. Before this project, the mobile team had written out the OT algorithm in English and implemented it in C++. Max manually translated the algorithm from C++ to TLA+. In the mobile SDK, clients can do 19 kinds of operations on data; six of these can be performed on arrays, resulting in 21 array merge rules, which are implemented in about 1000 lines of C++. Those 21 rules are the most complex, and Max focused his specification there. He used the model-checker to verify that his TLA+ spec ensured all participants eventually had the same data. This translation was a gruelling job, but the model-checker caught Max's mistakes quickly, and he finished in two weeks. There was one kind of write conflict that crashed the model-checker: if one participant swapped two array elements, and another moved an element, then the model-checker crashed with a Java StackOverflowError. Surprisingly, this was an actual infinite-recursion bug in the algorithm. Max verified that the bug was in the C++ code. It had hidden there until he faithfully transcribed it into TLA+ and discovered it with the model-checker. He disabled the element-swap operation in his TLA+ spec, and the mobile team deprecated it in their implementation. To test conformance, Max used the model-checker to output the entire state graph for the spec. 
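For intuition about what one of these merge rules does, here is a hedged Python toy of a single conflict, a concurrent set and erase on the same array, mirroring the generated test shown a little further on. The helper function and its rules are invented simplifications, not the SDK's actual OT code.

# Minimal sketch of operational-transformation-style convergence for two of
# the array operations discussed here: "set element at index i" and "erase
# element at index j". This is an invented toy, not the mobile SDK's 21 merge
# rules; it only shows why both clients below end up with the same array.

def transform_set_against_erase(set_index, erase_index):
    """Rewrite a SET so it still targets the same element after a
    concurrent ERASE has been applied."""
    if erase_index < set_index:
        return set_index - 1   # the target shifted left by one
    if erase_index == set_index:
        return None            # the target itself was erased: drop the SET
    return set_index           # the erase happened to the right: no change

# Start state on both clients.
client0 = [1, 2, 3]
client1 = [1, 2, 3]

# Concurrent local edits: client 0 sets index 2 to 4, client 1 erases index 1.
client0[2] = 4          # client 0 -> [1, 2, 4]
del client1[1]          # client 1 -> [1, 3]

# Client 0 receives the remote ERASE; a SET doesn't shift indexes, so it
# applies unchanged.
del client0[1]          # client 0 -> [1, 4]

# Client 1 receives the remote SET, transformed against its own ERASE.
new_index = transform_set_against_erase(2, 1)   # 2 -> 1
client1[new_index] = 4  # client 1 -> [1, 4]

assert client0 == client1 == [1, 4]
print(client0, client1)

The generated C++ test below asserts exactly this kind of convergence, once per behavior the model-checker enumerated.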
He constrained the algorithm to three participants, all editing a three-element array, each executing one (possibly conflicting) write operation. With these constraints, the state space is a DAG, with a finite number of behaviors (paths from an initial state to a final state). There are 30,184 states and 4913 behaviors. Max wrote a Go program to parse the model-checker's output and write out a C++ unit test for each behavior. Here’s an example unit test. (It's edited down from three participants to two.) At the start, there's an array containing {1, 2, 3}. One client sets the third element of an array to 4 and the second client removes the second element from the array. The test asserts that both clients agree the final array is {1, 4}. TEST(Transform_Array) { size_t num_clients = 2; TransformArrayFixture fixture{test_context, num_clients, {1, 2, 3}}; fixture.transaction(0, [](TableRef array) { array->set_int(0, 2, 4); }); fixture.transaction(1, [](TableRef array) { array->remove(1); }); fixture.sync_all_clients(); fixture.check_array({1, 4}); fixture.check_ops(0, {ArrayErase{1}}); fixture.check_ops(1, {ArraySet{1, 4}}); } These 4913 tests immediately achieved 100% branch coverage of the implementation, which we hadn't accomplished with our handwritten tests (21%) or millions of executions with the AFL fuzzer (92%). Retrospective Max's test-case generation worked quite well. He discovered a bug in the algorithm, and he thoroughly checked that the mobile SDK's Operational Transformation code conforms to the spec. Judah's and my trace-checking experiment didn't work: our spec and code were too far apart, and adding tracing to MongoDB took too long. Both techniques can work, given the right circumstances and strategy. Both techniques can fail, too! We published our results and lessons as a paper in VLDB 2020, titled " eXtreme Modelling in Practice ." In the subsequent five years, I've seen some progress in conformance checking techniques. Test-case generation: Model Checking Guided Testing for Distributed Systems . The "Mocket" system generates tests from a TLA+ spec, and instruments Java code (with a fair amount of human labor) to force it to deterministically follow each test, and check that its variables have the same values as the spec after each action. The authors tested the conformance of three Java distributed systems and found some new bugs. Their technique is Java-specific but could be adapted for other languages. Multi-Grained Specifications for Distributed System Model Checking and Verification . The authors wrote several new TLA+ specs of Zookeeper, at higher and lower levels of abstraction. They checked conformance between the most concrete specs and the implementation, with a technique similar to Mocket: a human programmer instruments some Java code to map Java variables to spec variables, and to make all interleavings deterministic. The model-checker randomly explores spec behaviors, while the test framework checks that the Java code can follow the same behaviors. SandTable: Scalable Distributed System Model Checking with Specification-Level State Exploration . This system is not language-specific: it overrides system calls to control nondeterminism and force the implementation to follow each behavior of the spec. It samples the spec's state space to maximize branch coverage and event diversity while minimizing the length of each behavior. 
As in the "Multi-Grained" paper, the SandTable authors wisely developed new TLA+ specs that closely matched the implementations they were testing, rather than trying to use existing, overly abstract specs like Judah and I did. Plus, my colleagues Will Schultz and Murat Demirbas are publishing a paper in VLDB 2025 that uses test-case generation with a new TLA+ spec of MongoDB's WiredTiger storage layer, the paper is titled "Design and Modular Verification of Distributed Transactions in MongoDB." Trace-checking: Protocol Conformance with Choreographic PlusCal . The authors write new specs in an extremely high-level language that compiles to TLA+. From their specs they generate Go functions for trace-logging, which they manually add to existing Go programs. They check that the resulting traces are valid spec behaviors and find some bugs. Validating Traces of Distributed Programs Against TLA+ Specifications . Some veteran TLA+ experts demonstrate in detail how to trace-log from a Java program and validate the traces with TLC, the TLA+ model-checker. They've written small libraries and added TLC features for convenience. This paper focuses on validating incomplete traces: if you can only log some of the variables, TLC will infer the rest. Smart Casual Verification of the Confidential Consortium Framework . The authors started with an existing implementation of a secure consensus protocol. Their situation was like mine in 2020 (new specs of a big old C++ program) and so was their goal: to continuously check conformance and keep the spec and implementation in sync. Using the new TLC features announced in the "Validating Traces" paper above, they toiled for months, brought their specs and code into line, found some bugs, and realized the eXtreme Modelling vision. Finn Hackett is a PhD student I'm mentoring, he's developed a TLA+-to-Go compiler . He's now prototyping a trace-checker to verify that the Go code he produces really conforms to its source spec. We're doing a summer project together with Antithesis to thoroughly conformance-check the implementation's state space. I'm excited to see growing interest in conformance checking, because I think it's a serious problem that needs to be solved before TLA+ goes mainstream. The "Validating Traces" paper announced some new trace-checking features in TLC, and TLC's developers are discussing a better way to export a state graph for test-case generation . I hope these research prototypes lead to standard tools, so engineers can keep their code and specs in sync. Join our MongoDB Community to learn about upcoming events, hear stories from MongoDB users, and connect with community members from around the world.

June 2, 2025