MongoDB Blog

Announcements, updates, news, and more

PLAID, Inc. Optimizes Real-Time Data With MongoDB Atlas Stream Processing

A MongoDB customer since 2015, Tokyo, Japan-based PLAID, Inc. works to “maximize the value of people with the power of data,” according to the company’s mission statement. PLAID’s customer experience platform, KARTE, analyzes and visualizes website and application users’ data in real time, offering the company’s customers a one-stop solution that helps them better understand their customers and provide personalized experiences.

After running a self-hosted instance of MongoDB for several years, in 2021, PLAID adopted MongoDB Atlas, a fully managed suite of cloud database services. Subsequently, however, the company ran into real-time data challenges. Specifically, PLAID faced challenges when trying to migrate an existing batch processing system that sent real-time data from MongoDB Atlas to Google BigQuery, which helps organizations “go from data to AI action faster.” While their initial cloud setup with Kafka connectors provided valuable streaming capabilities by capturing events from MongoDB and streaming them to BigQuery, the complexity tied to the number of pipelines became a concern. The staging environment, which required duplicate pipelines, further exacerbated the issue, and rising costs could hinder PLAID's ability to scale and expand its real-time data processing system efficiently.

Easy event data processing with Atlas Stream Processing

To address these challenges, PLAID turned to MongoDB Atlas Stream Processing, which enables development teams to process streams of complex data using the same query API used in their MongoDB Atlas databases. Atlas Stream Processing provided PLAID with a cost-effective way of acquiring and processing event data in real time, all while being natively integrated within their existing MongoDB Atlas environment for a seamless developer experience. This allowed them to replace some of their costly Kafka source connectors while maintaining the overall data flow to BigQuery via their existing Confluent Cloud Kafka setup.
Key aspects of the solution included:

Replacing Kafka source connectors: Atlas Stream Processing efficiently captures event data from MongoDB Atlas databases and writes it to Kafka, reducing costs associated with the previous Kafka source connectors.

MongoDB Atlas Stream Processing:

Stream processing instance (SPI): PLAID used SPIs, where cost is determined by the instance tier and the number of workers, which in turn depends on the number of stream processors. This offered a more optimized cost structure compared to the previous connector-task-based pricing.

Connection management: Atlas Stream Processing simplifies connection management. Connecting to Atlas databases is straightforward, and a single connection can be used for the Kafka cluster.

Stream processors: These processing units perform data transformation and routing with the same aggregation pipelines used by MongoDB databases. Thus, the PLAID team leveraged their existing MongoDB knowledge to define pipeline logic, making the transition smoother.

Custom backfill mechanism: To address the lack of a backfill feature in Stream Processing, PLAID developed a custom application to synchronize existing data.

Custom metric collection: Since native monitoring integration with Datadog was unavailable, PLAID created a bot to collect Atlas Stream Processing metrics and send them to Datadog for monitoring and alerting.

“Atlas Stream Processing provided us with a robust solution for real-time data processing, which has significantly reduced costs and improved scalability throughout our platform.”
Hajime Shiozawa, senior software engineer, PLAID, Inc.

The outcome: Lower costs, improved efficiency

By implementing MongoDB Atlas Stream Processing, PLAID achieved significant improvements. These include everything from reduced costs to operational efficiencies:

Reduced costs: PLAID eliminated the cost structure that was proportional to the number of pipelines, resulting in substantial cost savings. The new cost model based on Atlas Stream Processing workers offered a more scalable and predictable pricing structure.

Improved scalability: The optimized architecture allowed PLAID to scale their real-time data processing system efficiently, supporting the addition of new products and Atlas clusters without escalating costs.

Simplified management: Because Stream Processing is a native MongoDB Atlas capability, it simplified connection management and pipeline configuration, reducing operational overhead.

Stable operation: PLAID successfully deployed and operated more than 20 pipelines, processing over 3 million events per day to BigQuery.

Enhanced real-time data capabilities: The improved system strengthened the real-time nature of their data, improving operational efficiency.

MongoDB Atlas Stream Processing provided PLAID with a robust and cost-effective solution for real-time data processing to BigQuery. By replacing costly Kafka Source Connectors and optimizing their architecture, PLAID significantly reduced costs and improved scalability. The seamless integration with MongoDB Atlas and the developer-friendly API further enhanced their operational efficiency. PLAID’s success with Atlas Stream Processing demonstrates that it is a valuable tool for organizations that are looking to streamline their data integration pipelines and leverage real-time data effectively.

To learn how Atlas Stream Processing helps organizations integrate MongoDB with Apache Kafka to build event-driven applications, see the MongoDB Atlas Stream Processing page.
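As a rough illustration of the stream processors described above, the following sketch defines a pipeline that reads change events from an Atlas collection and emits them to a Kafka topic. The connection names, database, collection, and topic are hypothetical placeholders, not PLAID's configuration. The pipeline is built as plain Python data so it can be inspected without a running cluster; it would still need to be registered with a stream processing instance (for example with sp.createStreamProcessor in mongosh).

```python
import json


def build_processor_pipeline(atlas_connection, db, coll, kafka_connection, topic):
    """Return an Atlas Stream Processing pipeline as a list of stage documents."""
    return [
        # $source: read the change stream of one Atlas collection.
        {"$source": {"connectionName": atlas_connection, "db": db, "coll": coll}},
        # $match: forward only the change events of interest.
        {"$match": {"operationType": {"$in": ["insert", "update", "replace"]}}},
        # $emit: write each event to a Kafka topic through a registered connection.
        {"$emit": {"connectionName": kafka_connection, "topic": topic}},
    ]


# All names below are hypothetical and exist only for this example.
pipeline = build_processor_pipeline(
    atlas_connection="atlas_events",
    db="karte_demo",
    coll="events",
    kafka_connection="confluent_kafka",
    topic="events.to_bigquery",
)
print(json.dumps(pipeline, indent=2))
```

Because both the source and the sink are referenced by connection name, one Kafka connection can be reused across many processors, which mirrors the connection management point made above.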

July 17, 2025
Home

Revolutionizing Inventory Classification with Generative AI

In today's volatile geopolitical environment, the global automotive industry faces compounding disruptions that require a fundamental rethink of data and operations strategy. After decades of low import taxes, the return of tariffs as a tool of economic negotiations has led the global automotive industry to delay model-year transitions and disrupt traditional production and release cycles. As of June 2025, only 3% of US automotive inventory comprises next-model-year vehicles —less than half the number seen at this time in previous years. This severe decline in new-model availability, compounded by a 12.2% year-over-year drop in overall inventory, is pressuring consumer pricing and challenging traditional dealer inventory management. In this environment of constrained supply, better tools are urgently needed to classify and control vehicle, spare part, and raw material inventories for both dealers and manufacturers. Traditionally, dealerships and automakers have relied on ABC analysis to segment and control inventory by value. This widely used method classifies items into Category A, B, or C. For example, Category A items typically represent just 20% of stock but drive 80% of sales, while Category C items might comprise half the inventory yet contribute only 5% to the bottom line. This approach effectively helps prioritize resource allocation and promotional efforts. Figure 1. ABC analysis for inventory classification. While ABC analysis is known for its ease of use, it has been criticized for its focus on dollar usage. For example, not all Category C items are necessarily low-priority, as some may be next-model-year units arriving early or aging stock affected by shifting consumer preferences. Other criteria—such as lead-time, commonality, obsolescence, durability, inventory cost, and order size requirements—have also been recognized as critical for inventory classification. 
A multi-criteria inventory classification (MCIC) methodology, therefore, adds additional criteria to dollar usage. MCIC can be achieved with methods like statistical clustering or unsupervised machine learning techniques. Yet, a significant blind spot remains: the vast amount of unstructured data that organizations must deal with; unstructured data accounts for an estimated 80% of the world's total. Traditional ABC analysis—and even MCIC—often overlook the growing influence of insights gleaned from unstructured sources like customer sentiment and product reviews on digital channels. But now, valuable intelligence from reviews, social media posts, and dealer feedback can be vectorized and transformed into actionable features using large language models (LLMs). For instance, analyzing product reviews can yield qualitative metrics like the probability of recommending or repurchasing a product, or insights into customer expectations vs. the reality of ownership. This textual analysis can also reveal customers' product perspectives, directly informing future demand. By integrating these signals into inventory classification models, businesses can gain a deeper understanding of true product value and demand elasticity. This fusion of structured and unstructured data represents a crucial shift from reactive inventory management to predictive and customer-centric decision-making. In this blog post, we propose a novel methodology to convert unstructured data into powerful feature sets for augmenting inventory classification models. Figure 2. Transforming unstructured data into features for machine learning models. How MongoDB enables AI-driven inventory classification So, how does MongoDB empower the next generation of AI-driven inventory classification? It all comes down to four crucial steps, and MongoDB provides the robust technology and features to support every single one. Figure 3. Methodology and requirements for gen AI-powered inventory classification. 
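For reference, the traditional ABC analysis described earlier can be sketched in a few lines. The item names, sales values, and the 80%/95% cumulative cutoffs below are illustrative assumptions, not data from this post.

```python
def abc_classify(sales_by_item, a_cutoff=0.80, b_cutoff=0.95):
    """Assign each item to class A, B, or C by cumulative share of total sales."""
    total = sum(sales_by_item.values())
    classes = {}
    cumulative = 0.0
    # Walk items from highest to lowest sales value.
    ranked = sorted(sales_by_item.items(), key=lambda kv: kv[1], reverse=True)
    for item, value in ranked:
        cumulative += value
        share = cumulative / total
        # An item that pushes the running share past a cutoff falls into the
        # next class; other conventions exist.
        if share <= a_cutoff:
            classes[item] = "A"
        elif share <= b_cutoff:
            classes[item] = "B"
        else:
            classes[item] = "C"
    return classes


# Hypothetical annual sales figures in dollars.
sales = {
    "sedan": 450_000,
    "suv": 300_000,
    "brake_pads": 150_000,
    "wipers": 60_000,
    "floor_mats": 40_000,
}
classes = abc_classify(sales)
print(classes)
```

Because the only input is dollar value, nothing in this function can capture lead time, obsolescence, or customer sentiment, which is the gap the multi-criteria approach aims to close.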
Step 1: Create and store vector embeddings from unstructured data

MongoDB Atlas enables modern vector search workflows. Unstructured data like product reviews, supplier notes, or customer support transcripts can be vectorized via embedding models (such as Voyage AI models) and ingested into MongoDB Atlas, where they are stored next to the original text chunks. This data then becomes searchable using MongoDB Atlas Vector Search, which allows you to run native semantic search queries directly inside the database. Unlike solutions that require separate databases for structured and vector data, MongoDB stores them side by side using the flexible document model, enabling unified access via one API. This reduces system complexity, technical debt, and infrastructure footprint, and allows for low-latency semantic searches.

Figure 4. Product reviews can be stored as vector embeddings in MongoDB Atlas.

Step 2: Design and store evaluation criteria

In a gen AI-powered inventory classification system, evaluation criteria are no longer a set of static rules stored in a spreadsheet. Instead, the criteria are dynamic and data-backed: they are generated by an AI agent using structured and unstructured data, and enriched by domain experts using business objectives and constraints. As shown in Figure 5, the criteria for features like “Product Durability” can be defined based on relevant unstructured data stored in MongoDB (product reviews, audit reports) as well as structured data like inventory turnover and sales history. Such criteria are not just instructions or rules, but are knowledge objects with structure and semantic depth. The AI agent uses tools such as generate_criteria and embed_criteria and iterates over each product in the inventory. It leverages the LLM to create the criteria definition and uses an embedding model (e.g., voyage-3-large) to generate embeddings of each definition.

MongoDB Atlas is uniquely suited to store these dynamic criteria. Each rule is modeled as a flexible JSON document containing the name of the feature, the criteria definition, the data sources used, and the embeddings. Since there are different types of products (different car models/makes and different car parts), the documents can evolve over time without requiring schema migrations and be queried and retrieved by the AI agent in real time. MongoDB Atlas provides all the necessary tools for this design (a flexible document model database, vector search, and full-text search) that can be leveraged by the AI agent to create the criteria.

Figure 5. Unstructured and structured data are used by the AI agent to create criteria for feature generation.

Step 3: Create an agentic application to perform transformation based on the criteria

In the third step, we have another AI agent that operates over products, criteria, and unstructured data to generate enriched feature sets. For every product, this agent uses MongoDB Atlas Vector Search to find relevant customer reviews, applies the criteria to them, and calculates a numerical feature score. The new features are added to the original features JSON document in MongoDB. In Figure 6, the agent has created “durability” and “criticality” features from the product reviews. MongoDB Atlas is the ideal foundation for this agentic architecture. Again, it provides the agent the tools it needs for features to evolve, adding new dimensions without requiring schema redesign. This results in an adaptive classification dataset that contains both structured and unstructured data.

Figure 6. An AI agent enriches product features with vectorized review data to generate new features.

Step 4: Rerun the inventory classification model with new features added

As a final step, the inventory classification domain experts can assign or balance weights to existing and new features, choose a classification technique, and rerun inventory classification to find new inventory classes. Figure 7 shows the process where generative AI features are used in the existing inventory classification algorithm.

Figure 7. Domain experts can rerun classification after balancing weights.

Figure 8 shows the solution in action. The customer satisfaction score is created by an LLM using a vectorized collection of customer reviews and is then used in the inventory classification model with a new weight of 0.2.

Figure 8. Inventory classification using generative AI.

Driving smarter inventory decisions

As the automotive industry navigates slowing sales and uneven inventory, traditional inventory classification techniques also need to evolve. Though such techniques provide a solid foundation, they fall short in the face of geopolitical uncertainty, tariff-driven supply shifts, and fast-evolving consumer expectations. By combining structured sales and consumption data with unstructured insights, and enabling agentic AI using MongoDB, the automotive industry can enable a new era of inventory intelligence where products are dynamically classified based on all available data, both structured and unstructured.

Clone the GitHub repository if you are interested in trying out this solution yourself. To learn more about MongoDB’s role in the manufacturing industry, please visit our manufacturing and automotive webpage.
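Step 4 can be sketched as a weighted score over normalized features. Only the 0.2 weight on the customer satisfaction feature comes from the example above; the other feature names, weights, feature values, and class thresholds are illustrative assumptions.

```python
def weighted_score(features, weights):
    """Combine features (each already scaled to the range 0..1) into one score."""
    total_weight = sum(weights.values())
    return sum(features[name] * w for name, w in weights.items()) / total_weight


def classify(score, a_min=0.7, b_min=0.4):
    """Map a score to an inventory class using assumed thresholds."""
    if score >= a_min:
        return "A"
    return "B" if score >= b_min else "C"


# customer_satisfaction is the LLM-derived feature; its 0.2 weight follows Figure 8.
weights = {"annual_dollar_usage": 0.5, "lead_time": 0.3, "customer_satisfaction": 0.2}

# Hypothetical products with hypothetical, pre-normalized feature values.
products = {
    "brake_pads": {"annual_dollar_usage": 0.9, "lead_time": 0.5, "customer_satisfaction": 0.8},
    "floor_mats": {"annual_dollar_usage": 0.2, "lead_time": 0.3, "customer_satisfaction": 0.9},
}

results = {name: classify(weighted_score(f, weights)) for name, f in products.items()}
print(results)
```

Dividing by the total weight keeps scores comparable when domain experts rebalance weights that no longer sum to one.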

July 16, 2025
Artificial Intelligence

Introducing MongoDB’s Multimodal Search Library For Python

AI applications increasingly rely on a variety of different data types (text, images, charts, and complex documents) to drive rich user experiences. For developers building these applications, determining how to effectively search and retrieve information that spans these data types presents a challenge. Developers have to consider different chunking strategies, figure out how to incorporate figures and tables, and manage context that could bleed across chunks.

To simplify this, we're excited to announce the public preview of MongoDB’s Multimodal Search Python Library. This new library makes it easy to build sophisticated applications using multimodal data, providing a single interface for integrating MongoDB Atlas Vector Search, AWS S3, and Voyage AI's multimodal embedding model voyage-multimodal-3. The library handles:

Processing and storage: It interacts with S3 for storing PDFs from a URL or referring to a PDF already stored in S3. PDFs are then turned into single-page images and stored in S3.

Generating embeddings: Images use voyage-multimodal-3 to produce high-quality embeddings.

Vector indexing: Finally, it indexes the embeddings using Atlas Vector Search and provides a reference back to S3.

The power of multimodal

Traditional search methods often struggle when dealing with documents that contain text alongside visual elements like charts and graphs, which are common in research papers, financial reports, and more. Developers typically need to build complex, custom pipelines to handle image storage, embedding generation, and vector indexing. Our Multimodal Search Library abstracts this complexity away, using the best-in-class voyage-multimodal-3. It empowers developers to build applications that can understand and search the content of images just as easily as text. This enables accurate and efficient information retrieval and richer user experiences when working with either multimodal data or visually rich PDFs.

Figure 1. Traditional chunking vs. multimodal embedding.

Imagine you're a financial analyst sifting through hundreds of annual reports (dense PDFs filled with text, tables, and charts) to find a specific trend. With our Multimodal Search Library, you can simply ask a question in natural language, like: "Show me all the charts illustrating revenue growth over the past three years." The library will process the query and retrieve pages containing the relevant charts from your corpus of knowledge.

Likewise, consider an e-commerce platform with a large product catalog. A shopper might be looking for a specific style of shoes but may not know the right keywords to describe exactly what they are looking for. By leveraging multimodal search, the user could upload an image of the shoes they like, and the application finds visually similar in-stock items, creating a seamless product discovery journey.

Learn how to get started

To get started, you’ll need:

A MongoDB Atlas cluster (sign up for the free tier)
A MongoDB collection in that cluster
A MongoDB Atlas Vector Search index
A Voyage AI API key (sign up)
An S3 bucket (sign up)

Installation and setup

First, we’ll ensure that we can connect to MongoDB Atlas, AWS S3, and Voyage AI.

    pip install pymongo-voyageai-multimodal

    import os
    from pymongo import MongoClient
    from pymongo_voyageai_multimodal import PyMongoVoyageAI

    # Credentials and resource names are read from environment variables.
    client = PyMongoVoyageAI.from_connection_string(
        connection_string=os.environ["MONGODB_ATLAS_CONNECTION_STRING"],
        database_name="db_name",
        collection_name="collection_name",
        s3_bucket_name=os.environ["S3_BUCKET_NAME"],
        voyageai_api_key=os.environ["VOYAGEAI_API_KEY"],
    )

Adding documents

Next, we’ll add relevant documents for embedding generation.

    from pymongo_voyageai_multimodal import TextDocument, ImageDocument

    text = TextDocument(text="foo", metadata={"baz": "bar"})
    # Convert the PDF at this URL into one image per page.
    images = client.url_to_images(
        "https://www.fdrlibrary.org/documents/356632/390886/readingcopy.pdf"
    )
    documents = [text, images[0], images[1]]
    ids = ["1", "2", "3"]
    client.add_documents(documents=documents, ids=ids)

Performing search

Finally, we’ll search for content most semantically similar to our query.

    results = client.similarity_search(query="example", k=1)
    for doc in results:
        print(f"* {doc['id']} [{doc['inputs']}]")

Loading data already stored in S3

Developers can also query against documents already stored in S3. See more information in the documentation.

    import os
    from pymongo_voyageai_multimodal import PyMongoVoyageAI

    client = PyMongoVoyageAI(
        voyageai_api_key=os.environ["VOYAGEAI_API_KEY"],
        s3_bucket_name=os.environ["S3_BUCKET_NAME"],
        mongo_connection_string=os.environ["MONGODB_URI"],
        collection_name="test",
        database_name="test_db",
    )

    query = "The consequences of a dictator's peace"
    url = "s3://my-bucket-name/readingcopy.pdf"
    # Reference a PDF that already lives in S3 and index its pages.
    images = client.url_to_images(url)
    resp = client.add_documents(images)
    client.wait_for_indexing()
    data = client.similarity_search(query, extract_images=True)
    print(f"Found {len(data)} relevant pages")
    client.close()

A few important notes:

Automatic updates to source data are not supported. Changes to indexed data need to be made via application code calling the client using the add_documents and delete functions.

This library is primarily meant to support integrating multimodal embeddings and MongoDB Atlas on relatively static datasets. It is not intended to support sophisticated aggregation pipelines that combine multiple stages or data that updates frequently.

voyage-multimodal-3 is the only embedding model supported directly, and AWS is the only cloud provider supported directly.

Ready to try it yourself? Learn more in our documentation, and please share feedback. We can't wait to see what you build!
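The prerequisites above include an Atlas Vector Search index. A minimal sketch of such an index definition follows, written as plain Python data. The field path "embedding" is an assumption (check the library documentation for the field it actually writes), and the default of 1024 dimensions reflects our understanding of the voyage-multimodal-3 output size; verify both before creating the index.

```python
import json

SUPPORTED_SIMILARITY = {"cosine", "euclidean", "dotProduct"}


def vector_index_definition(path="embedding", num_dimensions=1024, similarity="cosine"):
    """Build an Atlas Vector Search index definition for one embedding field."""
    if similarity not in SUPPORTED_SIMILARITY:
        raise ValueError(f"unsupported similarity: {similarity}")
    return {
        "fields": [
            {
                "type": "vector",                 # marks the field as a vector embedding
                "path": path,                     # assumed field name, not confirmed
                "numDimensions": num_dimensions,  # must match the embedding model output
                "similarity": similarity,
            }
        ]
    }


definition = vector_index_definition()
print(json.dumps(definition, indent=2))
```

The resulting document can be pasted into the Atlas UI's JSON editor when creating a vector search index, or passed to a driver helper that creates search indexes.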

July 16, 2025
Home

“Hello, Community!”: Meet the 2025 MongoDB Community Champions!

We are so excited to announce this year’s new cohort of MongoDB Community Champions! Community Champions are the connective tissue between MongoDB and our community, keeping them informed about MongoDB’s latest developments and offerings. Community Champions also share their knowledge and experiences with others through a variety of media channels and event engagements. “The MongoDB Community Champions program is one of the best influencer programs,” says Shrey Batra, Head of Engineering and a fifth-year returning Champion. “We can contribute directly to the product development, participate in developer outreach, get developer feedback to the right people, and so much more!” This year’s 47-member group includes 21 new champions. They come to us from countries all over the world, including Canada, the United States, South Korea, Malaysia, China, Australia, Serbia, Germany, India, Portugal, and Brazil. As a group, they represent a broad range of expertise and serve in a variety of community and professional roles, ranging from engineering leads to chief architects to heads of developer relations. “I’m excited to join the MongoDB Community Champions program because it brings together engineers who are deeply invested in solving real-world data challenges,” says Ruthvik Reddy Anumasu, Principal Database Engineer and a first-year Champion. “As someone who’s worked on scaling, securing, and optimizing critical data systems, I see this as a chance to both share practical insights and learn from others pushing boundaries.” Each Community Champion demonstrates exceptional leadership in advancing the growth and knowledge of MongoDB’s brand and technology. “Being part of the MongoDB Community Champions program is like a solo leveling process, from gathering like-minded personnel to presenting valuable insights that help others in their careers,” says Lai Kai Yong, a Software Engineer and first-year Champion.
“I’m excited to continue shipping things, as I believe MongoDB is not only a great product and an amazing company, but also a vibe.” As members of this program, Community Champions gain a variety of experiences—including exclusive access to executives, product roadmaps, preview programs, an annual Champions Summit with product leaders—and relationships that grow their professional stature as MongoDB practitioners, helping them be seen as leaders in the technology community. “After working with MongoDB for more than a decade, I’m happy to be a MongoDB Community Champion,” says Patrick Pittich-Rinnerthaler, Hands-on Web Architect and first-year Champion. “One of the things I’m interested in particular, is the connection to other Champions and Engineers. Together, we enable customers and users to do more with MongoDB.” And now, without further ado, let’s meet the 2025 cohort of Community Champions! NEW COMMUNITY CHAMPIONS: Maria Khalusova, Margaret Menzin, Samuel Molling, Karen Zhang, Shaun Roberts, Joey Marburger, Steve Jones, Ruthvik Reddy Anumasu, Karen Huaulme, Lai Kai Yong, XiaoLei Dai, Luke Thompson, Darae Park, Kim Joong Hui, Rishi Agrawal, Sachin Hejip, Sachin Gupta, Patrick Pittich-Rinnerthaler, Marko Aleksendrić, PhD, Markus Wildgruber, Carla Barata. RETURNING COMMUNITY CHAMPIONS: Abirami Sukumaran, Arek Borucki, Azri Azmi, Christoph Strobl, Christopher Dellaway, Claudia Cardeno Cano, Elie Hannouch, Flavia da Silva Bomfim Policante, Igor Alekseev, Justin Jenkins, Kevin Smith, Leandro Domingues, Malak Abu Hammad, Mateus Leonardi, Michael Höller, Mustafa Kadioglu, Nancy Agarwal, Nenad Milosavljevic, Nilesh Soni, Nuri Halperin, Rajesh Nair, Roman Right, Shrey Batra, Tamara Manzi de Azevedo, Vivekanandan Sakthivelu, Zidan M. For more, visit our MongoDB Community Champions page. If you’d like to connect with your local MongoDB community, check out our MongoDB User Groups on Meetup .

July 15, 2025
Home

Improving Industrial Safety with Game Theory and MongoDB

In industrial operations, safety is both a business and a human imperative. Heavy-asset industries like aerospace, shipbuilding, and construction constantly invest in better safety systems and policies to keep their staff safe. But a variety of factors—tight physical environments, time pressures, and steep production targets—can lead workers to take unsafe shortcuts to meet quotas. For instance, the European Maritime Safety Agency (EMSA) cited 650 fatalities and over 7,600 injuries linked to marine incidents involving EU-registered ships between 2014 and 2023, and human factors contributed to 80% of these incidents. Traditional safety incident reporting tools focus on retrospective data. Such systems capture and document safety incidents only after they have occurred, meaning that companies are reacting to events rather than proactively preventing them. On the ground, factory and shipyard workers often find themselves having to make split-second choices: safety versus speed, following protocols versus meeting production targets, etc. To move beyond hindsight—and to proactively guarantee safety—organizations must be able to model and analyze these behavioral trade-offs in real time to build informed policy (as well as an organizational culture) that supports safe behavior on the ground. In this blog post, we’ll dive into how organizations can leverage MongoDB as a unified operational data store for time series sensor telemetry, worker decisions, and contextual factors. By consolidating this information into a single database, MongoDB makes it possible to easily generate proactive insights into how workers will act under different conditions, thereby improving safety policies and incentives. Modeling human decisions and trade-offs in industrial environments Game theory, a mathematical framework used to model and analyze strategic interactions between individuals or entities, can be leveraged here to better anticipate and influence operational decisions. 
Let’s use the example of a shipyard, in which workers must constantly weigh critical decisions—balancing safety against speed, following rules versus meeting deadlines, deciding whether to take a shortcut that helps them hit a deadline. These decisions are not random and are shaped by peer pressures, working conditions, management oversight, and the incentive structures in place. So in an industrial context, game theory allows us to simulate these decisions as an ongoing, repeated game. For example, “if a policy is too strict, do workers take more risks to save time?” or “if incentives favor speed, does safety compliance drop?” and most importantly, “how do these patterns evolve as conditions and oversight change?” By modeling these decisions and choices as part of a repeated game, we can simulate how workers behave under different combinations of policy strictness and incentive strength. To create such a game-theoretic system, we need to bring together different data sets—real-time environmental sensor telemetry, worker profiles, operations context, etc.—and use this data to simulate a game-theoretic model. A behavior-aware safety simulation engine powered by MongoDB enables this approach; the engine brings together disparate data and models it using MongoDB’s flexible document model. The document model can easily adapt to the fast-changing, real-time conditions, meaning that companies can leverage MongoDB to build data-driven and dynamic safety policy tuning systems in order to predict where, when, and why risky behavior might occur during daily operations . MongoDB Atlas: Turning game theory into industrial intelligence To bring this model to life, we need to simulate, store, and analyze decision flows in real time. This is where MongoDB Atlas plays a central role. In this example, we will build this solution for shipyard operations. 
Figure 1 shows the conceptual architecture of our simulation engine, in which MongoDB acts as both the behavioral memory and analytical core, capturing decisions, scoring risk, and enabling feedback-driven policy experimentation.

Figure 1. A closed feedback loop for safer shipyards.

The elements of this architecture work together as follows:

Time series data storage: All worker actions/decisions and sensor (temperature, gas, humidity, etc.) data are stored in MongoDB collections as a central, flexible operational database.

Game theoretic decision modeling: A game theory-based simulator models worker trade-offs under different policy and incentive setups.

Data contextualization and storage: MongoDB stores not just the raw sensor data but context as well, which includes payoff and risk. The flexibility of the document model makes this data easy to model.

Risk scoring and analysis: MongoDB’s Aggregation Framework helps analyze trends over time to detect rising risk profiles or policy blind spots.

Adaptive safety design: Safety teams can tweak policies and incentives directly, shaping safer behavior before incidents occur.
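The decision modeling element above can be sketched as a small repeated game. Everything in this block is an illustrative assumption rather than the simulator from the linked repository: the payoff numbers, the detection probability under each policy, the safety bonus under each incentive level, and the rule that a worker takes the shortcut whenever its noisy payoff beats the noisy payoff of following procedure.

```python
import random

# Assumed chance that a shortcut is caught under each policy setting.
DETECTION = {"strict": 0.8, "lenient": 0.2}
# Assumed bonus a worker earns for following procedure at each incentive level.
SAFETY_BONUS = {"high": 2.0, "low": 0.0}


def expected_payoffs(policy, incentive, time_saved=3.0, penalty=5.0, base=1.0):
    """Return the (follow, shortcut) expected payoffs for one round."""
    follow = base + SAFETY_BONUS[incentive]
    p = DETECTION[policy]
    # A shortcut saves time if undetected and costs a penalty if detected.
    shortcut = base + (1 - p) * time_saved - p * penalty
    return follow, shortcut


def simulate(policy, incentive, rounds=1000, noise=0.5, seed=0):
    """Play repeated rounds and return the fraction in which the shortcut is taken."""
    rng = random.Random(seed)
    follow, shortcut = expected_payoffs(policy, incentive)
    shortcuts = 0
    for _ in range(rounds):
        # Noise stands in for fatigue, peer pressure, and other unobserved factors.
        if shortcut + rng.gauss(0, noise) > follow + rng.gauss(0, noise):
            shortcuts += 1
    return shortcuts / rounds


for policy in DETECTION:
    for incentive in SAFETY_BONUS:
        print(policy, incentive, simulate(policy, incentive))
```

Under these assumed numbers, the shortcut rate is highest when the policy is lenient and the incentive is low, which is the kind of pattern a safety team would look for before changing a real policy.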
MongoDB acts as the data backbone for the entire solution, storing three key datasets. The sample documents below show the document model for each collection in Atlas.

Environmental telemetry (sensor_data time series collection) from simulated or actual sensors in the shipyard:

    {
      "timestamp": { "$date": "2025-06-06T20:00:22.970Z" },
      "zone": "Tank Zone",
      "run_id": "9722c0e7-c10d-4526-a1a1-2647c9731589",
      "_id": { "$oid": "684348d687d59464d1f498d0" },
      "temperature": 42.6,
      "gas": "normal"
    }

Worker profiles (workers collection) capturing static attributes and evolving risk indicators:

    {
      "_id": "W539",
      "name": "Worker89",
      "role": "Welder",
      "risk_profile": {
        "avg_shortcut_rate": 0,
        "historical_decision_trends": [
          { "policy": "strict", "incentive": "high", "rate": 0 }
        ]
      },
      "metadata": {
        "ppe_compliance": "good",
        "training_completed": [ "confined space", "hazmat" ]
      }
    }

Behavior logs (worker_behavior time series collection) recording every simulated or real decision made in context (policy, incentive, zone):

    {
      "timestamp": "2025-04-15T01:57:04.938Z",
      "workerId": "W539",
      "zone": "Tank Zone",
      "environment": { "temperature": 35.3, "gas": "normal" },
      "incentive": "high",
      "decision": "followed_procedure",
      "policy": "strict",
      "computed": { "risk_score": 0.24, "payoff": 3 },
      "_id": { "$oid": "67fdbcf0b9b3624b42add7b4" }
    }

Figure 2, meanwhile, shows the physical architecture of the behavior-aware simulation system. Here, MongoDB acts as the central data backbone, providing data to the risk and decision dashboard for trend analysis and policy experimentation.

Figure 2. Physical architecture of the behavior-aware simulation system.

MongoDB provides all the foundational building blocks to power our simulation engine from end to end. The time series collections enable high-speed ingestion of sensor data while built-in compression and windowing functions support efficient risk scoring and trend analysis at scale. This eliminates the need for an external time series database.
Change streams and Atlas Stream Processing power real-time dashboards and risk analytics pipelines that respond to new inputs as they occur. As policies, sensors, or simulator logic evolve over time, MongoDB’s flexible schema ensures that you do not need to rework your data model or incur any downtime. Finally, Atlas Vector Search can help derive insights from unstructured text data such as incident reports or operator feedback. Figure 3 shows the solution in action; over time, the risk profiles of simulated workers rise because of the policy leniency and low incentive levels. The figure highlights how even well-meaning safety policies can unintentionally encourage risky behavior and even workplace accidents—which is why it’s critical to simulate and evaluate policies’ impact before deploying them in the real world. Figure 3. Game theoretic safety simulation overview. With these safety insights stored and analyzed in MongoDB, organizations can run what-if scenarios, adjust policy configurations, and measure predicted behavioral outcomes in advance. The organizational impact of such a system is significant because safety leaders can move away from reactive investigations to proactive policy design. For example, a shipyard might decide to introduce targeted safety training for specific zones, or fine-tune supervision protocols based on the simulation outcomes, rather than waiting for an actual incident to occur. Together, these features make MongoDB uniquely suited to drive safety innovation where real-world complexity demands flexible and scalable infrastructure. Check out the repo of this solution that you can clone and try out yourself. To learn more about MongoDB’s role in the manufacturing industry, please visit our manufacturing and automotive page .
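The risk scoring and trend analysis described above could be expressed as an aggregation pipeline over the worker_behavior collection. The sketch below uses the field names from the sample behavior log document; the five-decision rolling window and the output field names are assumptions chosen for illustration. The pipeline is built as plain Python data, so running it against Atlas would mean passing it to a driver call such as PyMongo's collection.aggregate.

```python
def rolling_risk_pipeline(zone, window_size=5):
    """Build a pipeline that tracks each worker's rolling average risk in one zone."""
    return [
        # Restrict to decisions recorded in the requested zone.
        {"$match": {"zone": zone}},
        # Rolling average of risk_score over each worker's most recent decisions.
        {"$setWindowFields": {
            "partitionBy": "$workerId",
            "sortBy": {"timestamp": 1},
            "output": {
                "rolling_risk": {
                    "$avg": "$computed.risk_score",
                    "window": {"documents": [-(window_size - 1), 0]},
                }
            },
        }},
        # Summarize by policy and incentive to expose risky combinations.
        {"$group": {
            "_id": {"policy": "$policy", "incentive": "$incentive"},
            "avg_rolling_risk": {"$avg": "$rolling_risk"},
            "decisions": {"$sum": 1},
        }},
        # Highest-risk combinations first.
        {"$sort": {"avg_rolling_risk": -1}},
    ]


pipeline = rolling_risk_pipeline("Tank Zone")
for stage in pipeline:
    print(stage)
```

Grouping by policy and incentive puts the riskiest combinations at the top of the result, which is where a what-if analysis would start.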

July 14, 2025
Home

Build an AI-Ready Data Foundation with MongoDB Atlas on Azure

It’s time for a database reality check. While conversations around AI usually focus on its immense potential, these advancements are also bringing developers face to face with an immediate challenge: Their organizations’ data infrastructure isn’t ready for AI. Many developers now find themselves trying to build tomorrow’s applications on yesterday’s foundations. But what if your database could shift from bottleneck to breakthrough? Is your database holding you back? Traditional databases were built for structured data in a pre-AI world—they’re simply not designed to handle today’s need for flexible, real-time data processing. Rigid schemas force developers to spend time managing database structure instead of building features, while separate systems for operational data and analytics create costly delays and complexity. Your data architecture might be holding you back if: Your developers spend more time wrestling with data than innovating. AI implementation feels like forcing a square peg into a round hole. Real-time analytics are anything but real-time. Go from theory to practice: Examples of modern data architecture at work Now is the time to rethink your data foundation by moving from rigid to flexible schemas that adapt as applications evolve. Across industries, leading organizations are unifying operational and analytical structures to eliminate costly synchronization processes. Most importantly, they’re embracing databases that speak developers’ language. In the retail sector , business demands include dynamic pricing that responds to market conditions in real-time. Using MongoDB Atlas with Azure OpenAI from Microsoft Azure, retailers are implementing sophisticated pricing engines that analyze customer behavior and market conditions, enabling data-driven decisions at scale. 
In the healthcare sector , organizations can connect MongoDB Atlas to Microsoft Fabric for advanced imaging analysis and results management, streamlining the flow of critical diagnostic information while maintaining security and compliance. More specifically, when digital collaboration platform Mural faced a 1,700% surge in users, MongoDB Atlas on Azure handled its unstructured application data. The results aligned optimally with modern data principles: Mural’s small infrastructure team maintained performance during massive growth, while other engineers were able to focus on innovation rather than database management. As noted by Mural’s Director of DevOps, Guido Vilariño, this approach enabled Mural’s team to “build faster, ship faster, and ultimately provide more expeditious value to customers.” This is exactly what happens when your database becomes a catalyst rather than an obstacle. Shift from “database as storage” to “database as enabler” Modern databases do more than store information—they actively participate in application intelligence. When your database becomes a strategic asset rather than just a record-keeping necessity, development teams can focus on innovation instead of infrastructure management. What becomes possible when data and AI truly connect? Intelligent applications can combine operational data with Azure AI services. Vector search capabilities can enhance AI-driven features with contextual data. Applications can handle unpredictable workloads through automated scaling. Seamless integration occurs between data processing and AI model deployment. Take the path to a modern data architecture The deep integration between MongoDB Atlas and Microsoft’s Intelligent Data Platform eliminates complex middleware, so organizations can streamline their data architecture while maintaining enterprise-grade security. 
The platform unifies operational data, analytics, and AI capabilities—enabling developers to build modern applications without switching between multiple tools or managing separate systems. This unified approach means security and compliance aren’t bolt-on features—they’re core capabilities. From Microsoft Entra ID integration for access control to Azure Key Vault for data protection, the platform provides comprehensive security while simplifying the development experience. As your applications scale, the infrastructure scales with you, handling everything from routine workloads to unexpected traffic spikes without adding operational complexity. Make your first move Starting your modernization journey doesn’t require a complete infrastructure overhaul or the disruption of existing operations. You can follow a gradual migration path that prioritizes business continuity and addresses specific challenges. The key is having clear steps for moving from legacy to modern architecture. Make decisions that simplify rather than complicate: Choose platforms that reduce complexity rather than add to it. Focus on developer experience and productivity. Prioritize solutions that scale with your needs. For example, you can begin with a focused proof of concept that addresses a specific challenge—perhaps an AI feature that’s been difficult to implement or a data bottleneck that’s slowing development. Making small wins in these areas demonstrates value quickly and builds momentum for broader adoption. As you expand your implementation, focus on measurable results that matter to your organization. Tracking these metrics—whether they’re developer productivity, application performance, or new capabilities—helps justify further investment and refine your approach. Avoid these common pitfalls As you undertake your modernization journey, avoid these pitfalls: Attempting to modernize everything simultaneously: This often leads to project paralysis. 
Instead, prioritize applications based on business impact and technical feasibility. Creating new data silos: In your modernization efforts, the goal must be integration and simplification. Adding complexity: Remember that while simplicity scales, complexity compounds. Each decision should move you toward a more streamlined architecture, not a more convoluted one. The path to a modern, AI-ready data architecture is an evolution, not a revolution. Each step builds on the last, creating a foundation that supports not just today’s applications but also tomorrow’s innovations.

Take the next step: Ready to modernize your data architecture for AI? Explore these capabilities further by watching the webinar “Enhance Developer Agility and AI-Readiness with MongoDB Atlas on Azure.” Then get started on your modernization journey! Visit the MongoDB AI Learning Hub to learn more about building AI applications with MongoDB.

July 8, 2025
Artificial Intelligence

Why Relational Databases Are So Expensive to Enterprises

Relational databases were designed with a foundational architecture based on the premise of normalization. This principle, often termed “3rd Normal Form,” dictates that repeating groups of information are systematically cast out into child tables, allowing them to be referenced by other entities. While this design inherently reduces redundancy, it significantly complicates underlying data structures.

Figure 1. Relational database normalization structure for insurance policy data.

Every entity in a business process, its attributes, and their complex interrelations must be dissected and spread across multiple tables: policies, coverages, and insured items each become a distinct table. This traditional decomposition results in a convoluted network of interconnected tables that developers must constantly navigate to piece back together the information they need.

The cost of relational databases

Shrewd C-levels and enterprise portfolio managers are interested in managing cost and risk, not technology. Full stop. This decomposition into countless interconnected tables comes at a significant cost across multiple layers of the organization. Let’s break down the cost of relational databases for three different personas/layers:

Developer and software layer

Let’s imagine that as a developer you’re dealing with a business application that must create and manage customers and their related insurance policies. That customer has addresses, coverages, and policies. Each policy has insured objects and each object has its own specificities. If you’re building on a relational database, you are likely dealing with a dozen or more database objects that represent the aggregate business object of policy. In this design, all of these tables require you to break up the logical dataset into many parts, insert that data across many tables, and then execute complex JOIN operations when you wish to retrieve and edit it.
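To make the cost of that decomposition concrete, here is a minimal sketch in plain JavaScript. The table and column names (policies, coverages, insuredItems, policy_id, and so on) are hypothetical stand-ins for a normalized insurance schema. The function performs, in application code, the stitching that JOINs perform in the database every time a policy is read.

```javascript
// Hypothetical normalized rows, as they might come back from separate tables.
const policies = [{ policy_id: 1, customer_id: 10, type: "auto" }];
const coverages = [
  { coverage_id: 100, policy_id: 1, kind: "collision", limit: 50000 },
  { coverage_id: 101, policy_id: 1, kind: "liability", limit: 100000 },
];
const insuredItems = [
  { item_id: 500, policy_id: 1, description: "2021 sedan" },
];

// Reassemble one policy aggregate from its parts. This mirrors the join
// work needed each time the policy is rendered or edited.
function assemblePolicy(policyId) {
  const policy = policies.find((p) => p.policy_id === policyId);
  if (!policy) return null;
  return {
    _id: policy.policy_id,
    customerId: policy.customer_id,
    type: policy.type,
    coverages: coverages
      .filter((c) => c.policy_id === policyId)
      .map(({ kind, limit }) => ({ kind, limit })),
    insuredItems: insuredItems
      .filter((i) => i.policy_id === policyId)
      .map(({ description }) => ({ description })),
  };
}

const policyDocument = assemblePolicy(1);
console.log(JSON.stringify(policyDocument, null, 2));
```

In a document database, the assembled object is the shape that would be stored and read back directly, so the reassembly step largely disappears from the read path.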
As a developer, you’re familiar with working with object-oriented design, and to you, all of those tables likely represent one to two major business objects: the customer and the policy. With MongoDB, these dozen or more relational database tables can be modeled as one single object (see Figure 2).

Figure 2. Relational database complexity vs. MongoDB document model for insurance policy data.

At actual business-application scale, with production data volumes, we start to truly see just how complicated this can get for developers. To render the data meaningfully in the application user interface, it must be constantly joined back together. When it’s edited, it must again be split apart and saved into those dozen or more underlying database tables. Relational is therefore not only a more complex storage model, but it’s also cognitively harder to figure out. It’s not uncommon for a developer who didn’t design the original database, and is newer to the application team, to struggle to understand, or even misinterpret, a legacy relational model.

Additionally, the normalized relational model requires more code to be written for basic create, update, and read operations. An object-relational mapping layer will often be introduced to help translate the split-apart representation in the database to an interpretation that the application code can more easily navigate. Why is this so relevant? Because more code equals more developer time and ultimately more cost. Overall, it takes noticeably longer to design, build, and test a business feature when using a relational database than it would with a database like MongoDB.

Finally, changing a relational schema is a cumbersome process. ALTER TABLE statements are required to change the underlying database object structure. Since relational tables are like spreadsheets, they can only have one schema at any given point in time. Your business feature requires you to add new fields?
You must alter the single, fixed schema that is bound to the underlying table. This might seem to be a quick and easy process to execute in a development environment, but by the time you get to the production database, deliberate care and caution must be applied, and extra steps are mandatory to ensure that you do not jeopardize the integrity of the business applications that use the database. Altering production table objects incurs significant risk, so organizations must put in place lengthy and methodical processes that ensure change is thoroughly tested and scheduled, in order to minimize possible disruption. The fundamental premise of normalization, and its corresponding single, rigid, and predefined table structures, are a constant bottleneck when it comes to speed and cost to market.

Infrastructure administrator

Performing JOIN operations across multiple database objects at runtime requires more computational resources than if you were to retrieve all of the data you need from a single database object. If your applications are running against well-designed, normalized relational databases, your infrastructure is almost certainly feeling the resource impact of those joins. Across a portfolio of applications, the hardware costs of normalization add up. For a private data center, it can mean the need to procure additional, expensive hardware. For the cloud, it likely means your overall spending is higher than that of a portfolio running on a more efficient design (like MongoDB’s Document Model). Ultimately, MongoDB allows more data-intensive workloads to be run on the same server infrastructure than relational databases do, and this directly translates to lower infrastructure costs.

In addition to being inefficient at the hardware layer, normalized relational tables result in complex ways in which the data must be conditionally joined together and queried, especially within the context of actual business rules.
Application developers have long pushed this complex logic “down to the database” in an effort to reduce complexity at the application layer, as well as preserve application tier memory and CPU. This decades-long practice can be found across every industry, and in nearly every flavor and variant of relational database platforms. The impact is multifold. Database administrators, or those specialized in writing and modifying complex SQL stored procedures, are often called upon to augment the application developers who maintain code at the application tier. This external dependency certainly slows down delivery teams tasked with making changes to these applications, but it’s just the tip of the iceberg. Below the waterline, there exists a wealth of complexity. Critical application business logic ends up bifurcated: some in the database as SQL, and some in the application tier in a programming language. The impact on teams wishing to modernize or refactor legacy applications is significant in terms of the level of complexity that must be dealt with. At the root of this complexity is the premise of normalized database objects, which would otherwise be a challenge to join and search if done at the application tier.

Portfolio manager

An application portfolio manager is responsible for overseeing an organization’s suite of software applications, ensuring they align with business goals, provide value, and are managed efficiently. The role typically involves evaluating, categorizing, and rationalizing application catalogs to reduce redundancy, lower costs, and enhance the overall ability to execute the business strategy. In short, the portfolio manager cares deeply about speed, complexity, and cost to market. At a macro level, a portfolio with relational databases translates into slower teams that deliver fewer features per agile cycle.
In addition, a larger staff is needed, as database and infrastructure admins are a necessary interface between the developers and the database. Unlike relational databases, MongoDB allows developers to maintain more than one version of a schema at a given time. In addition, documents contain both data and structure, which means you don’t need the complex, lengthy, and risky change cycles that relational demands simply to add or edit fields within the database. The result? Software teams deliver more features than is possible with relational databases, in less time, at lower cost, and with less complexity. That is something the business owners of the portfolio will certainly appreciate, even if they don’t understand the underlying technology. Add in the fact that MongoDB runs more efficiently than relational databases on the same hardware, and your portfolio will see even more cost benefits.

Beyond relational databases: A new path to efficiency and agility

The fundamental premise of normalization, and its corresponding single, rigid, and predefined table structures, are a constant bottleneck when it comes to speed, cost, and complexity to market. At a time when the imperative is to leverage AI to lower operating expenses, the cost, complexity, and agility of the underlying database infrastructure need to be scrutinized. In contrast, MongoDB’s flexible Document Model offers a generational step change: one that enables your developers to move more quickly, runs more efficiently on anyone's hardware, yours or a cloud data center, and increases your application portfolio's speed to market for advancing the business agenda.

Transform your enterprise data architecture today. Start with our free Overview of MongoDB and the Document Model course at MongoDB University, then experience the speed and flexibility firsthand with a free MongoDB Atlas cluster.
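The point about keeping more than one schema version alive at once can be sketched with what is often called the schema versioning pattern. The example below is an assumption-laden illustration in plain JavaScript: the schemaVersion field, the holder and riders fields, and the two document shapes are all hypothetical. It shows application code reading old and new shapes side by side, without a table-wide migration.

```javascript
// Two hypothetical policy documents written by different application versions.
const docs = [
  { _id: 1, schemaVersion: 1, holder: "Ana Silva" },
  {
    _id: 2,
    schemaVersion: 2,
    holder: { first: "Ben", last: "Ito" },
    riders: ["roadside"],
  },
];

// Normalize either shape at read time so the rest of the application
// only ever sees the newest structure.
function readPolicy(doc) {
  const version = doc.schemaVersion ?? 1; // documents without a version are treated as v1
  if (version === 1) {
    const [first, ...rest] = doc.holder.split(" ");
    return { id: doc._id, holder: { first, last: rest.join(" ") }, riders: [] };
  }
  return { id: doc._id, holder: doc.holder, riders: doc.riders ?? [] };
}

for (const doc of docs) {
  console.log(JSON.stringify(readPolicy(doc)));
}
```

Older documents can then be upgraded lazily, for example the next time they are written, rather than in one scheduled change window.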

July 7, 2025
Home

New in MongoDB Atlas Stream Processing: External Function Support

Today we're excited to introduce External Functions, a new capability in MongoDB Atlas Stream Processing that lets you invoke AWS Lambda functions directly from your streaming pipelines. The addition of External Functions to Atlas Stream Processing unlocks new ways to enrich, validate, and transform data in flight, enabling smarter and more modular event-driven applications. This functionality is available through a new pipeline stage, $externalFunction.

What are external functions?

External functions allow you to integrate Atlas Stream Processing with external logic services such as AWS Lambda. This lets you reuse existing business logic, perform AI/ML inference, or enrich and validate data as it moves through your pipeline, all without needing to rebuild that logic directly in your pipeline definition. AWS Lambda is a serverless compute service that runs your code in response to events, scales automatically, and supports multiple languages (JavaScript, Python, Go, etc.). Because there’s no infrastructure to manage, Lambda is ideal for event-driven systems. Now, by using external functions, you can seamlessly plug that logic into your streaming workloads.

Where $externalFunction fits in your pipeline

MongoDB Atlas Stream Processing can connect to a wide range of sources and output to various sinks. The diagram below shows a typical streaming architecture: Atlas Stream Processing ingests data, enriches it with stages like $https and $externalFunction, and routes the transformed results to various destinations.

Figure 1. A high-level visual of a stream processing pipeline.

The $externalFunction stage can be placed anywhere in your pipeline (except as the initial source stage), allowing you to inject external logic at any step. Atlas Stream Processing supports two modes for invoking external functions: synchronous and asynchronous.

Synchronous execution type

In synchronous mode, the pipeline calls the Lambda function and waits for a response.
The result is stored in a user-defined field (using the “as” key) and passed into the following stages.

let syncEF = {
  $externalFunction: {
    connectionName: "myLambdaConnection",
    functionName: "arn:aws:lambda:region:account-id:function:function-name",
    execution: "sync",
    as: "response",
    onError: "fail",
    payload: [
      { $replaceRoot: { newRoot: "$fullDocument.payloadToSend" } },
      { $addFields: { sum: { $sum: "$randomArray" } } },
      { $project: { success: 1, sum: 1 } }
    ]
  }
}

Let’s walk through what each part of the $externalFunction stage does in this synchronous setup:

connectionName: the external function connection name specified in the Connection Registry.
functionName: the full AWS ARN or the name of the AWS Lambda function.
execution: indicates synchronous execution ("sync") as opposed to asynchronous ("async").
as: specifies that the Lambda response will be stored in the “response” field.
onError: the behavior when the operator encounters an error (in this case, "fail" stops the processor). The default is to add the event to the dead letter queue.
payload: an inner pipeline that lets you customize the request body. Using it reduces the size of the data passed and ensures that only relevant data is sent to the external function.

This type is useful when you want to enrich or transform a document using external logic before it proceeds through the rest of the pipeline.

Asynchronous execution type

In async mode, the function is called, but the pipeline does not wait for a response. This is useful when you want to notify downstream systems, trigger external workflows, or pass data into AWS without halting the pipeline.
let asyncEF = {
  $externalFunction: {
    connectionName: "EF-Connection",
    functionName: "arn:aws:lambda:us-west-1:12112121212:function:EF-Test",
    execution: "async"
  }
}

Use the async execution type for propagating information outward, for example:

Triggering downstream AWS applications or analytics
Notifying external systems
Firing off alerts or billing logic

Real-world use case: Solar device diagnostics

To illustrate the power of external functions, let’s walk through an example: a solar energy company wants to monitor real-time telemetry from thousands of solar devices. Each event includes sensor readings (e.g., temperature, power output) and metadata like device_id and timestamp. These events need to be processed, enriched, and then stored in a MongoDB Atlas collection for dashboards and alerts. This can easily be accomplished using a synchronous external function. Each event is sent to a Lambda function that enriches the record with a status (e.g., ok, warning, critical) as well as diagnostic comments. The pipeline waits for the enriched event to be returned and then writes it to the desired MongoDB collection.

1. Define the external function connection

First, create a new AWS Lambda connection in the Connection Registry within Atlas. You can authenticate using Atlas's Unified AWS Access, which securely connects Atlas and your AWS account.

Figure 2. Adding an AWS Lambda connection in the UI.

2. Implement the Lambda function

Here’s a simple diagnostic function. It receives solar telemetry data, checks it against thresholds, and returns a structured result.
export const handler = async (event) => {
  const { device_id, group_id, watts, temp, max_watts, timestamp } = event;

  // Default thresholds
  const expectedTempRange = [20, 40]; // Celsius
  const wattsLowerBound = 0.6 * max_watts; // 60% of max output

  let status = "ok";
  let messages = [];

  // Wattage check
  if (watts < wattsLowerBound) {
    status = "warning";
    messages.push(`Observed watts (${watts}) below 60% of max_watts (${max_watts}).`);
  }

  // Temperature check
  if (temp < expectedTempRange[0] || temp > expectedTempRange[1]) {
    status = "warning";
    messages.push(`Temperature (${temp}°C) out of expected range [${expectedTempRange[0]}–${expectedTempRange[1]}].`);
  }

  // If multiple warnings, escalate to critical
  if (messages.length > 1) {
    status = "critical";
  }

  return {
    device_id,
    status,
    timestamp,
    watts_expected_range: [wattsLowerBound, max_watts],
    temp_expected_range: expectedTempRange,
    comment: messages.length ? messages.join(" ") : "All readings within expected ranges."
  };
};

3. Create the streaming pipeline

Using VS Code, define a stream processor using the sample solar stream as input.
let s = {
  $source: { connectionName: 'sample_stream_solar' }
};

// Define the External Function
let EFStage = {
  $externalFunction: {
    connectionName: "telemetryCheckExternalFunction",
    onError: "fail",
    functionName: "arn:aws:lambda:us-east-1:121212121212:function:checkDeviceTelemetry",
    as: "responseFromLambda",
  }
};

// Replace the original document with the Lambda response
let projectStage = {
  $replaceRoot: { newRoot: "$responseFromLambda" }
};

// Merge the results into a DeviceTelemetryResults collection
let sink = {
  $merge: {
    into: {
      connectionName: "IoTDevicesCluster",
      db: "SolarDevices",
      coll: "DeviceTelemetryResults"
    }
  }
};

sp.createStreamProcessor("monitorSolarDevices", [s, EFStage, projectStage, sink]);
sp.monitorSolarDevices.start();

Once running, the processor ingests live telemetry data, invokes the Lambda diagnostics logic, and returns enriched results to MongoDB Atlas, complete with status and diagnostic comments.

4. View enriched results in MongoDB Atlas

Explore the enriched data in MongoDB Atlas using the Data Explorer. For example, filter all documents where status = "ok" after a specific date.

Figure 3. Data Explorer filtering for all documents with a status of “ok” from June 14 onwards.

Smarter stream processing with external logic

MongoDB Atlas Stream Processing external functions allow you to enrich your data stream with logic that lives outside the pipeline, making your processing smarter and more adaptable. In this example, we used AWS Lambda to apply device diagnostics in real time and store results in MongoDB. You could easily extend this to use cases in fraud detection, personalization, enrichment from third-party APIs, and more.

Log in today to get started, or check out our documentation to create your first external function. Have an idea for how you'd use external functions in your pipelines? Let us know in the MongoDB community forum!
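Before wiring diagnostic logic like the function in step 2 into a pipeline, it can help to exercise the rules locally. The sketch below is a condensed, synchronous copy of those rules (same 60% wattage bound, same 20 to 40 °C range, same escalation to critical on two findings). The diagnose name, the shortened comment strings, and the three sample events are hypothetical and exist only for this example.

```javascript
// Condensed, synchronous copy of the diagnostic rules from the Lambda
// handler in step 2, so they can be checked without deploying anything.
function diagnose({ device_id, watts, temp, max_watts, timestamp }) {
  const expectedTempRange = [20, 40]; // Celsius
  const wattsLowerBound = 0.6 * max_watts; // 60% of max output
  const messages = [];

  if (watts < wattsLowerBound) {
    messages.push("low output");
  }
  if (temp < expectedTempRange[0] || temp > expectedTempRange[1]) {
    messages.push("temperature out of range");
  }

  // One finding is a warning; more than one escalates to critical.
  let status = "ok";
  if (messages.length === 1) status = "warning";
  if (messages.length > 1) status = "critical";

  return {
    device_id,
    status,
    timestamp,
    comment: messages.length ? messages.join("; ") : "All readings within expected ranges.",
  };
}

// Hypothetical sample events covering each status.
const samples = [
  { device_id: "device_1", watts: 200, temp: 25, max_watts: 250, timestamp: "2025-06-14T00:00:00Z" },
  { device_id: "device_2", watts: 100, temp: 25, max_watts: 250, timestamp: "2025-06-14T00:01:00Z" },
  { device_id: "device_3", watts: 100, temp: 45, max_watts: 250, timestamp: "2025-06-14T00:02:00Z" },
];

for (const sample of samples) {
  console.log(sample.device_id, diagnose(sample).status);
}
```

Keeping the rules in a plain function like this also makes it straightforward to wrap them in an async Lambda handler later.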

July 3, 2025
Home

Introducing Query Shape Insights in MongoDB Atlas

As modern applications scale, databases are often the first to show signs of stress, especially when query patterns shift or inefficiencies arise. MongoDB has invested in building a robust observability suite to help teams monitor and optimize performance. Tools such as the Query Profiler and, more recently, Namespace Insights provide deep visibility into query behavior and collection-level activity. While powerful, these capabilities primarily focus on individual queries or collections, limiting their ability to surface systemic patterns that impact overall application performance.

Today, MongoDB is excited to announce Query Shape Insights, a powerful new feature for MongoDB Atlas that offers a high-resolution, holistic view of how queries behave at scale across clusters. Query Shape Insights delivers a paradigm shift in visibility by surfacing aggregated statistics for the most resource-intensive query shapes. This accelerates root cause analysis, streamlines optimization workflows, and improves operational efficiency.

Figure 1. Overview page of Query Shape Insights showing the most resource-intensive query shapes.

A new granularity for performance analysis

Previously, if an application experienced a traffic surge, it risked overloading the database with queries, causing rapid performance degradation. In those critical moments, developers and database administrators had to quickly identify the queries contributing most to the bottleneck, which meant scrutinizing logs or per-query samples.

With the launch of Query Shape Insights, the top 100 query shapes are surfaced by grouping structurally similar queries with shared filters, projections, and aggregation stages into defined query shapes. These query shapes are then ranked by total execution time, offering MongoDB Atlas users greater visibility into the most resource-intensive queries.
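The grouping-and-ranking idea can be illustrated with a small, self-contained sketch. This is not how Atlas computes query shapes internally. It is a simplified approximation that replaces every literal value in a filter with a placeholder, groups queries whose resulting structure matches, and ranks the groups by total execution time. The toShape and rankShapes functions and the sample log entries are hypothetical.

```javascript
// Replace every literal value in a filter with a "?" placeholder so that
// queries differing only in their values collapse to the same shape.
// Keys are sorted so that field order does not create distinct shapes.
function toShape(value) {
  if (Array.isArray(value)) return value.map(toShape);
  if (value !== null && typeof value === "object") {
    return Object.fromEntries(
      Object.keys(value).sort().map((key) => [key, toShape(value[key])])
    );
  }
  return "?";
}

// Hypothetical query log entries: namespace, filter, and execution time.
const queryLog = [
  { ns: "shop.orders", filter: { status: "open", total: { $gt: 100 } }, millis: 40 },
  { ns: "shop.orders", filter: { total: { $gt: 5 }, status: "closed" }, millis: 60 },
  { ns: "shop.users", filter: { email: "a@example.com" }, millis: 15 },
];

// Group entries by namespace plus shape, then rank by total execution time.
function rankShapes(entries) {
  const byShape = new Map();
  for (const { ns, filter, millis } of entries) {
    const key = `${ns} ${JSON.stringify(toShape(filter))}`;
    const stats = byShape.get(key) ?? { shape: key, count: 0, totalMillis: 0 };
    stats.count += 1;
    stats.totalMillis += millis;
    byShape.set(key, stats);
  }
  return [...byShape.values()].sort((a, b) => b.totalMillis - a.totalMillis);
}

console.log(rankShapes(queryLog));
```

Here the two orders queries differ in values and field order but share one shape, so their execution times are summed and that shape ranks first.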
Each query shape is enriched with detailed metrics such as execution time, operation count, number of documents examined and returned, and bytes read. These metrics are rendered as time series data, enabling developers and database administrators to pinpoint when regressions began, how long they persisted, and what triggered them.

Figure 2. Detailed view of a query shape, with a pop-up displaying associated metrics.

This new feature integrates seamlessly into the performance workflows teams use to monitor, debug, and optimize applications. Each query shape includes associated client metadata, such as application name, driver version, and host. This empowers teams to identify which services, applications, or teams impact performance. This level of visibility is particularly valuable for microservices-based environments, where inefficiencies might manifest across multiple teams and services. Query Shape Insights adapts based on cluster tier to support varying workload sizes. Teams can analyze the performance data of each query shape over a 7-day window. This enables them to track trends, find changes in application behavior, and identify slow regressions that might otherwise be missed.

Integration with MongoDB’s observability suite

Query Shape Insights was designed to enable MongoDB Atlas users to move from detection to resolution with unprecedented speed and clarity. Built directly into the MongoDB Atlas experience, this feature is a clear starting point for performance investigations. This is imperative for dynamic environments where application behavior evolves rapidly and bottlenecks must be identified and resolved quickly. The Query Shape Insights dashboard offers comprehensive, time series–based analysis of query patterns across clusters. It enables teams to detect inefficiencies and understand when and how workloads have changed. Query Shape Insights answers critical diagnostic questions by surfacing the most resource-intensive query shapes.
It identifies the workloads that consume the most resources and can help determine whether these workloads are expected or anomalous. Query Shape Insights can also help identify the emergence of new workloads and reveal how workloads have changed over time. To support this level of analysis, Query Shape Insights offers a rich set of capabilities, giving teams the clarity and speed they need to troubleshoot intelligently and maintain high-performing applications:

Unified query performance view: Monitor query shapes to rapidly identify and investigate bottlenecks.
Detailed query shape statistics: Track key metrics including execution time, document return counts, and execution frequency.
Interactive analysis tools: Drill down into query shapes to view detailed metadata and performance trends.
Flexible filtering options: Narrow analysis by shard/host, date range, namespace, or operation type.
Programmatic access: Leverage MongoDB’s new Admin API endpoint to integrate query shape data with the existing observability stack.

After using Query Shape Insights, MongoDB Atlas users can pivot directly to Query Profiler with filters pre-applied to the specific collection and operation type for more information beyond that provided by Query Shape Insights. Once they have traced the issue to its root, users can continue their diagnostics journey by visiting Performance Advisor, which recommends indexes tailored to the query shape, ensuring that cluster optimizations are data-driven and precise.

Query Shape Insights is a leap forward in how teams manage, investigate, and respond to performance issues with MongoDB. By introducing a high-level, shape-aware view of query activity, Query Shape Insights enhances traditional reactive troubleshooting with greater clarity. This enables teams to troubleshoot faster and monitor performance effectively. Query Shape Insights is now available for all MongoDB Atlas dedicated cluster deployments (M10 and above).
Clusters must run on MongoDB 8.0 or later to access this feature. Support for Cloud Manager deployments is planned for the future. Check out MongoDB’s documentation for more details on Query Shape Insights. Start using Query Shape Insights today through your MongoDB Atlas portal.

July 2, 2025
Home

Build Event-Driven Apps Locally with MongoDB Atlas Stream Processing

Building event-driven architectures (EDAs) often poses challenges, particularly when you’re integrating complex cloud components with local development services. For developers, working directly from a local environment provides convenience, speed, and flexibility. Our demo application demonstrates a unique development workflow that balances local service integration with cloud stream processing, showcasing portable, real-time event handling using MongoDB Atlas Stream Processing and ngrok. With MongoDB Atlas Stream Processing, you can streamline the development of event-driven systems while maintaining all the components locally. Using this service’s capabilities alongside ngrok, this demo application shows a secure way to interact with cloud services directly from your laptop, ensuring you can build, test, and refine applications with minimal friction and maximum efficiency. Using MongoDB Atlas Stream Processing MongoDB Atlas Stream Processing is a powerful feature within the MongoDB Atlas modern database that enables you to process data streams in real time using the familiar MongoDB Query API (and aggregation pipeline syntax). It integrates seamlessly with MongoDB Atlas clusters, Apache Kafka, AWS Lambda, and external HTTP endpoints. Key takeaway #1: Build event-driven apps more easily with MongoDB Atlas Stream Processing One of the primary goals of MongoDB Atlas Stream Processing is to simplify the development of event-driven applications. Instead of managing separate stream processing clusters or complex middleware, you can define your processing logic directly within MongoDB Atlas. This means: A unified platform: Keep your data storage and stream processing within the same ecosystem. Familiar syntax: Use the MongoDB Query API and aggregation pipelines you already know. Managed infrastructure: Let MongoDB Atlas handle the underlying infrastructure, scaling, and availability for your stream processors. 
Key takeaway #2: Develop and test locally, deploy globally

A significant challenge in developing event-driven systems is bridging the gap between your local development environment and cloud-based services. How do you test interactions with services running on your laptop?

You can configure MongoDB Atlas Stream Processing to connect securely to HTTP services and even Apache Kafka instances running directly on your development machine! You can typically achieve this using a tunneling service like ngrok, which creates secure, publicly accessible URLs for your local services. MongoDB Atlas Stream Processing requires HTTPS for HTTP endpoints and specific Simple Authentication and Security Layer (SASL) protocols for Apache Kafka, making ngrok an essential tool for this local development workflow.

Introducing the real-time order fulfillment demo

To showcase these capabilities in action, we’ve built a full-fledged demo application available on GitHub.

Figure 1. High-level architecture diagram.

This demo simulates a real-time order fulfillment process using an event-driven architecture orchestrated entirely by MongoDB Atlas Stream Processing.

What the demo features

A shopping cart service: Generates events when cart items change.
An order processing service: Handles order creation and validation (running locally as an HTTP service).
A shipment service: Manages shipment updates.
Event source flexibility: Can ingest events from either a MongoDB capped collection or an Apache Kafka topic (which can also run locally).
Processors from Atlas Stream Processing: Act as the central nervous system, reacting to events and triggering actions in the different services.
An order history database: Centralizes status updates for easy tracking.

Figure 2. High-level sequence diagram of a flow.

How the demo uses MongoDB Atlas Stream Processing and local development

Event orchestration: MongoDB Atlas Stream Processing instances listen for shopping cart events (from MongoDB or Kafka).
Local service interaction: An Atlas Stream Processing (ASP) processor calls the Order Processing Service running on localhost via an ngrok HTTPS tunnel.
Kafka integration (optional): Demonstrates ASP connecting to a local Kafka broker, also tunneled via ngrok.
Data enrichment & routing: Processors enrich events and route them appropriately (e.g., validating orders, triggering shipments).
Centralized logging: All services write status updates to a central MongoDB collection that functions as a continuously materialized view of order status and history.

This demo illustrates in practice how you can build sophisticated, event-driven applications using ASP while performing key development and testing directly on your laptop, interacting with local services just as you would in a deployed environment.

What the demo highlights

Real-world EDA: Provides a practical example of asynchronous service communication.
Orchestration powered by MongoDB Atlas Stream Processing: Shows how this service manages complex event flows.
Local development workflow: Proves the concept of connecting this service to local HTTP / Apache Kafka via ngrok.
Flexible event ingestion: Supports both MongoDB and Apache Kafka sources.
Centralized auditing: Demonstrates easy status tracking via a dedicated history collection.

Get started with the demo!

MongoDB Atlas Stream Processing significantly lowers the barrier to entry for building robust, real-time EDAs. Its ability to integrate seamlessly with MongoDB Atlas, external services, and, crucially, your local development environment (thanks to tools like ngrok) makes it a powerful addition to the developer toolkit. Explore the demo project, dive into the code, and see for yourself how ASP can simplify your next event-driven architecture, starting right from your own laptop!

Ready to see it in action? Head over to the GitHub repository! The repository’s README.md file contains comprehensive, step-by-step instructions to get you up and running.
In summary, you’ll:

Clone the repository.
Set up a Python virtual environment and install dependencies.
Crucially, set up ngrok to expose your local order-processing service (and Apache Kafka, if applicable) via secure tunnels. (Details in the README.md appendix!)
Configure your .env file with MongoDB Atlas credentials, API keys, and the ngrok URLs.
Run scripts to create necessary databases, collections, and the MongoDB Atlas Stream Processing instance/connections/processors.
Start the local order_processing_service.py.
Run the shopping_cart_event_generator.py to simulate events.
Query the order history to see the results!

For detailed setup guidance, especially regarding ngrok configuration for multiple local services (HTTP and TCP / Apache Kafka), please refer to the appendix of the project’s README.md.
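For readers who want a feel for the local side of this workflow before cloning the repository, the following is a rough sketch of a local HTTP order-validation service using only the Python standard library. It is not the demo’s order_processing_service.py; the payload shape (order_id, items, sku, quantity), the validation rules, and port 8000 are all assumptions made for illustration.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def validate_order(order):
    """Return a status document for an order (hypothetical payload shape)."""
    errors = []
    if not order.get("order_id"):
        errors.append("missing order_id")
    items = order.get("items") or []
    if not items:
        errors.append("order has no items")
    for item in items:
        quantity = item.get("quantity")
        if not isinstance(quantity, int) or quantity <= 0:
            errors.append(f"invalid quantity for {item.get('sku', 'unknown')}")
    return {
        "order_id": order.get("order_id"),
        "status": "rejected" if errors else "validated",
        "errors": errors,
    }


class OrderHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read and parse the JSON body sent by the caller.
        length = int(self.headers.get("Content-Length", 0))
        try:
            order = json.loads(self.rfile.read(length) or b"{}")
        except json.JSONDecodeError:
            self.send_error(400, "body must be JSON")
            return
        if not isinstance(order, dict):
            self.send_error(400, "body must be a JSON object")
            return
        body = json.dumps(validate_order(order)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Start the service on localhost; call this explicitly to run it."""
    HTTPServer(("127.0.0.1", port), OrderHandler).serve_forever()
```

Calling serve() starts the service on localhost, and running `ngrok http 8000` in another terminal would then give it a public HTTPS URL that a cloud-side processor can reach. Keeping the validation logic in a plain function makes it easy to test without any tunnel or network in place.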

July 1, 2025

Data Modeling Strategies for Connected Vehicle Signal Data in MongoDB

Today’s connected vehicles generate massive amounts of data. According to an article from S&P Global Mobility, a single modern car produces nearly 25 GB of data per hour. To put that in perspective: that’s like each car pumping out the equivalent of six full-length Avatar movies in 4K, every single day! Now scale that across millions of vehicles, and it’s easy to see the challenge ahead. Of course, not all of that data needs to be synchronized to the cloud, but even a fraction of it puts significant pressure on the systems tasked with processing, storing, and analyzing it at scale.

The challenge isn’t just about volume. The data is fast-moving and highly diverse, from telematics and location tracking to infotainment usage and driver behavior. Without a consistent structure, this data is hard to use across systems and organizations. That’s why organizations across the industry are working to standardize how vehicle data is defined and exchanged. One such example is the Connected Vehicle Systems Alliance, or COVESA, which developed the Vehicle Signal Specification (VSS), a widely adopted, open data model that helps normalize vehicle signals and improve interoperability.

But once data is modeled, how do you ensure it’s persistent and available at all times, in real time? To meet these demands, you need a data layer that’s flexible, reliable, and performant at scale. This is precisely where a robust data solution designed for modern needs becomes essential.

In this blog, we’ll explore data strategies for connected vehicle systems using VSS as a reference model, with a focus on real-world applications like fleet management. These strategies are particularly effective when implemented on flexible, high-performance databases like MongoDB, a solution trusted by leading automotive companies.

Is your data layer ready for the connected car era?

Relational databases were built in an era when saving storage space was the top priority.
They work well when data fits neatly into tables and columns, but that’s rarely the case with modern, high-volume, and fast-moving vehicle data. Telematics, GPS coordinates, sensor signals, infotainment activity, and diagnostic logs are complex, semi-structured, and constantly evolving. Trying to force this data into a rigid schema quickly becomes a bottleneck.

That’s why many in the automotive world are moving to document-oriented databases. A full-fledged data solution, designed for modern needs, can significantly simplify how one works with data, scale effortlessly as demands grow, and adapt quickly as systems evolve. A solution embodying these capabilities, like MongoDB, supports the demands of complex connected vehicle systems. Its features include:

Reduced complexity: The document model mirrors the way developers already structure data in their code. This makes it a natural fit for vehicle data, which often comes in nested, hierarchical formats.
Scale by design: MongoDB’s distributed architecture and flexible schema design help simplify scaling. It reduces interdependencies, making it easier to shard workloads without performance headaches.
Built for change: Vehicle platforms are constantly evolving, and MongoDB makes it easy to update data models without costly migrations or downtime, keeping development fast and agile.
AI-ready: MongoDB supports a wide variety of data types (structured, time series, vector, graph), which are essential for AI-driven applications. This makes it the natural choice for AI workloads, simplifying data integration and accelerating the development of smart systems.

Figure 1. The MongoDB connected car data platform.

These capabilities are especially relevant in connected vehicle systems. Companies like Volvo Connect use MongoDB Atlas to track 65 million daily events from over a million vehicles, ensuring real-time visibility at massive scale.
Another example is SHARE NOW, which handles 2TB of IoT data per day from 11,000 vehicles across 16 cities, using MongoDB to streamline operations and deliver better mobility experiences.

It’s not just the data, it’s how you use it

Data modeling is where good design turns into great performance. In traditional relational systems, modeling starts with entities and relationships to focus on minimizing data duplication. MongoDB flips that mindset. You still care about entity relationships, but what really drives design is how the data will be used. The core principle? Data that is accessed together should be stored together.

Let’s bring this to life. Take a fleet management system. The workload includes vehicle tracking, diagnostics, and usage reporting. Modeling in MongoDB starts by understanding how that data is produced and consumed. Who’s reading it, when, and how often? What’s being written, and at what rate? Below, we show a simplified workload table that maps out entities, operations, and expected rates.

Table 1. Fleet management workload example.

Now, to the big question: how do you model connected vehicle signal data in MongoDB? It depends on the workload. If you’re using COVESA’s VSS as your signal definition model, you already have a helpful structure. VSS defines signals as a hierarchy: attributes (rarely change, like tank size), sensors (update often, like speed), and actuators (reflect commands, like door lock requests). This classification is a great modeling hint.

VSS’s tree structure maps neatly to MongoDB documents. You could store the whole tree in a single document, but in most cases, it’s more effective to use multiple documents per vehicle. This approach better reflects how the data is produced and consumed, leading to a model that’s better suited for performance at scale. Now, let’s look at two examples that show different strategies depending on the workload.

Figure 2. Sample VSS tree. Source: Vehicle Signal Specification documentation.
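As a rough illustration of the "multiple documents per vehicle" idea, the sketch below walks a small VSS-like tree and produces one document per signal type (attribute, sensor, actuator). The tree encoding, the signal names and values, and the vehicleId / signalType / signals field names are simplified assumptions for this example; they are not an official VSS serialization or a schema prescribed by MongoDB.

```python
SIGNAL_TYPES = ("attribute", "sensor", "actuator")

# A tiny, simplified VSS-like tree; names and values are illustrative only.
SAMPLE_TREE = {
    "type": "branch",
    "children": {
        "Speed": {"type": "sensor", "value": 87.5},
        "Powertrain": {"type": "branch", "children": {
            "FuelSystem": {"type": "branch", "children": {
                "TankCapacity": {"type": "attribute", "value": 60},
                "RelativeLevel": {"type": "sensor", "value": 42},
            }},
        }},
        "Cabin": {"type": "branch", "children": {
            "Door": {"type": "branch", "children": {
                "IsLocked": {"type": "actuator", "value": True},
            }},
        }},
    },
}


def _set_nested(target, parts, value):
    """Store value under a nested path, creating sub-documents as needed."""
    for part in parts[:-1]:
        target = target.setdefault(part, {})
    target[parts[-1]] = value


def split_by_signal_type(vehicle_id, tree):
    """Return one document per signal type, each keeping the tree hierarchy."""
    groups = {signal_type: {} for signal_type in SIGNAL_TYPES}

    def walk(node, parts):
        if node["type"] == "branch":
            for name, child in node["children"].items():
                walk(child, parts + [name])
        else:
            _set_nested(groups[node["type"]], parts, node["value"])

    walk(tree, [])
    return [
        {"vehicleId": vehicle_id, "signalType": signal_type, "signals": signals}
        for signal_type, signals in groups.items()
        if signals
    ]


for doc in split_by_signal_type("veh-001", SAMPLE_TREE):
    print(doc)
```

With this split, a frequent sensor update touches only the sensor document, while the rarely changing attributes sit in their own document. Grouping by component instead of by signal type is an equally valid variation when that matches the access pattern better.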
Example 1: Modeling for historical analysis

For historical analysis, like tracking fuel consumption trends, time-stamped data needs to be stored efficiently. Access patterns may include queries like “What was the average fuel consumption per km in the last hour?” or “How did the fuel level change over time?”

Here, separating static attributes from dynamic sensor signals helps minimize unnecessary updates. Grouping signals by component (e.g., powertrain, battery) allows updates to be scoped and efficient. MongoDB Time Series collections are built for exactly this kind of data, offering optimized storage, automatic bucketing, and fast time-based queries.

Example 2: Modeling for the last vehicle state

If your focus is real-time state, like retrieving the latest signal values for a vehicle, you’ll prioritize fast reads and lightweight updates. Common queries include “What’s the latest coolant temperature?” or “Where are all fleet vehicles right now?”

In this case, storing a single document per vehicle or update group with only the most recent signal values works well. Updating fields in place avoids document growth and keeps read complexity low. Grouping frequently updated signals together and flattening nested structures ensures that performance stays consistent as data grows.

These are just two examples, tailored for different workloads, but MongoDB offers the flexibility to adapt your model as needs evolve. For a deeper dive into MongoDB data modeling best practices, check out our MongoDB University course and explore our Building with Patterns blog series. The right model isn’t one-size-fits-all; it’s the one that matches your workload.

How to model your vehicle signal data

At the COVESA AMM Spring 2025 event, the MongoDB Industry Solutions team presented a prototype to help simplify how connected vehicle systems adopt the Vehicle Signal Specification. The concept: make it easier to move from abstract signal definitions to practical, scalable database designs.
The goal wasn’t to deliver a production-ready tool; it was to spark discussion, test ideas, and validate patterns. It resonated with the community, and we’re continuing to iterate on it. For now, the use cases are limited, but they highlight important design decisions: how to structure vehicle signals, how to tailor that structure to the needs of an application, and how to test those assumptions in MongoDB.

Figure 3. Vehicle Signal Data Model prototype high-level architecture.

This vehicle signals data modeler is a web-based prototype built with Next.js and powered by MongoDB Atlas. It’s made up of three core modules:

Schema builder: This is where it starts. You can visually explore the vehicle signals tree, select relevant data points, and define how they should be structured in your schema.
Use case mapper: Once the schema is defined, this module helps map how the signals are used. Which signals are read together? Which are written most often? These insights help identify optimization opportunities before the data even hits your database.
Database exporter: Finally, based on what you’ve defined, the tool generates an initial database schema optimized for your workload. You can load it with sample data, export it to a live MongoDB instance, and run aggregation pipelines to validate the design.

Together, these modules walk you through the journey, from signal selection to schema generation and performance testing, all within a simple, intuitive interface.

Figure 4. Vehicle signal data modeler demo in action.

Build smarter, adapt faster, and scale more confidently

Connected vehicle systems aren’t just about collecting data; they’re about using it, fast and at scale. To get there, you need more than a standardized signal model. You need a data solution that can keep up with constant change, massive volume, and real-time demands. That’s where MongoDB stands out.
Its flexible document model, scalable architecture, and built-in support for time series and AI workloads make it a natural fit for the complexities of connected mobility. Whether you’re building fleet dashboards, predictive maintenance systems, or next-gen mobility services, MongoDB helps you turn vehicle data into real-world outcomes, faster.

To learn more about MongoDB’s connected mobility solutions, visit the MongoDB for Manufacturing & Mobility webpage. You can also explore the vehicle signals data modeler prototype and related resources on our GitHub repository.
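As a closing illustration of the two modeling strategies discussed earlier (historical analysis and last vehicle state), here is a small sketch that builds the relevant documents, options, and queries as plain Python dictionaries. The collection layout and the field names (ts, meta, vehicleId, component, signals, updatedAt) are assumptions chosen for this example; the time series options and the $set update follow standard MongoDB syntax.

```python
from datetime import datetime, timedelta, timezone


def timeseries_options(granularity="seconds"):
    """Options for creating a time series collection (Example 1)."""
    return {"timeseries": {"timeField": "ts", "metaField": "meta",
                           "granularity": granularity}}


def history_point(vehicle_id, component, signals, ts):
    """One time-stamped measurement, with signals grouped by component."""
    return {"ts": ts,
            "meta": {"vehicleId": vehicle_id, "component": component},
            **signals}


def average_pipeline(vehicle_id, field, since):
    """Aggregation answering 'what was the average <field> since <time>?'"""
    return [
        {"$match": {"meta.vehicleId": vehicle_id, "ts": {"$gte": since}}},
        {"$group": {"_id": "$meta.vehicleId",
                    "average": {"$avg": f"${field}"}}},
    ]


def last_state_update(vehicle_id, signals, ts):
    """Filter and in-place $set update for a per-vehicle state document
    (Example 2). Only the latest value of each signal is kept."""
    fields = {f"signals.{name}": value for name, value in signals.items()}
    fields["updatedAt"] = ts
    return {"_id": vehicle_id}, {"$set": fields}


now = datetime(2025, 7, 1, 12, 0, tzinfo=timezone.utc)
point = history_point("veh-001", "powertrain",
                      {"fuelLevel": 42.0, "speed": 87.5}, now)
pipeline = average_pipeline("veh-001", "fuelLevel", now - timedelta(hours=1))
query, update = last_state_update("veh-001",
                                  {"coolantTemp": 91, "speed": 87.5}, now)

# With PyMongo and a live deployment, these could be used roughly as:
#   db.create_collection("signal_history", **timeseries_options())
#   db.signal_history.insert_one(point)
#   db.signal_history.aggregate(pipeline)
#   db.vehicle_state.update_one(query, update, upsert=True)
print(point, pipeline, query, update, sep="\n")
```

The history collection only ever receives inserts, while the state collection receives small in-place updates to one document per vehicle. That is the practical difference between the two strategies.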

July 1, 2025

Introducing Text-to-MQL with LangChain: Query MongoDB using Natural Language

We’re excited to announce that we’ve added a powerful new capability to the MongoDB integration for LangChain: Text-to-MQL. This enhancement allows developers to easily transform natural language queries into MongoDB Query Language (MQL), enabling them to build new and intuitive application interfaces powered by large language models (LLMs). Whether you’re building chatbots to interact with internal company data stored on MongoDB or AI agents that will work directly with MongoDB, this LangChain toolkit delivers out-of-the-box natural language querying with Text-to-MQL.

Enabling new interfaces with Text-to-MQL

LLMs are transforming the workplace by enabling people to “talk” to their data. Historically, accessing and querying databases required specialized knowledge or tools. Now, with natural language querying enabled by LLMs, developers can create new, intuitive interfaces that give virtually anyone access to data and insights, with no specialized skills required. Using Text-to-MQL, developers can build applications that rely on natural language to generate insights or create visualizations for their users. This includes conversational interfaces that query MongoDB directly, democratizing database exploration and interactions.

Robust database querying capabilities through natural language are also critical for building more sophisticated agentic systems. Agents leveraging MongoDB through MQL can interact autonomously with both operational and analytical data, greatly enhancing productivity across a wide range of operational and business tasks.

Figure 1. Agent components and how MongoDB powers tools and memory.

For instance, customer support agents leveraging Text-to-MQL capabilities can autonomously retrieve the most recent customer interactions and records directly from MongoDB databases, enabling faster and more informed responses.
Similarly, agents generating application code can query database collections and schemas to ensure accurate and relevant data retrieval logic. In addition, MongoDB’s flexible document model aligns more naturally with how users describe data in plain language. Its support for nested, denormalized data in JSON-like BSON documents reduces the need for multi-table joins (an area where LLMs often struggle), making MongoDB more LLM-friendly than traditional SQL databases.

Implementing Text-to-MQL with MongoDB and LangChain

The LangChain and MongoDB integration package provides a comprehensive set of tools to accelerate AI application development. It supports advanced retrieval-augmented generation (RAG) implementations through integrations with MongoDB for vector search, hybrid search, GraphRAG, and more. It also enables agent development using LangGraph, with built-in support for memory persistence. The latest addition, Text-to-MQL, can be used either as a standalone component in your application or as a tool integrated into LangGraph agents.

Figure 2. LangChain and MongoDB integration overview.

Released in version 0.6.0 of the langchain-mongodb package, the agent_toolkit module introduces a toolkit that enables reliable interaction with MongoDB databases, without the need to develop custom integrations. It includes the following pre-defined tools:

List the collections in the database
Retrieve the schema and sample documents for specific collections
Execute MongoDB queries to retrieve data
Check MongoDB queries for correctness before executing them

You can leverage the LangChain database toolkit as a standalone class in your application to interact with MongoDB from natural language and build custom text interfaces or more complex agentic systems. It is highly customizable, providing the flexibility and control needed to adapt it to your specific use cases.
More specifically, you can tweak and expand the standard prompts and parameters offered by the integration. When building agents using LangGraph, LangChain’s orchestration framework, this integration serves as a reliable way to give your agents access to MongoDB databases and execute queries against them.

Real-world considerations when implementing Text-to-MQL

Natural language querying of databases by AI applications and agentic systems is a rapidly evolving space, with best practices still taking shape. Here are a few key considerations to keep in mind as you build:

Ensuring accuracy

The generated MongoDB Query Language (MQL) relies heavily on the capabilities of the underlying language model and the quality of the schema or data samples provided. Ambiguities in schemas, incomplete metadata, or vague instructions can lead to incorrect or suboptimal queries. It’s important to validate outputs, apply rigorous testing, and consider adding guardrails or human review, especially for complex or sensitive queries.

Preserving performance

Providing AI applications and agents with access to MongoDB databases can present performance challenges. The non-deterministic nature of LLMs makes workload patterns unpredictable. To mitigate the impact on production performance, consider routing agent queries to a replica set or using dedicated, optimized search nodes.

Maintaining security and privacy

Granting AI apps and agents access to your database should be considered with care. Apply the common security principles and best practices: define and enforce roles and policies to implement least-privilege access, granting only the minimum permissions necessary for the task. Giving access to your data may involve sharing private and sensitive information with LLM providers. You should evaluate what kind of data should actually be sent (such as database names, collection names, or data samples) and whether that access can be toggled on or off to accommodate users.
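One way to picture the guardrails mentioned above is a small check that runs on a generated aggregation pipeline before it reaches the database. The sketch below is independent of the LangChain toolkit and is only an assumption about how such a check might look: it rejects stages that write data or run server-side JavaScript, restricts queries to an allow-list of collections, and caps the result size. The function name, the blocked-operator list, and the default limit are all hypothetical, and the list is not exhaustive.

```python
# Stages and operators this sketch refuses to run. $out and $merge write
# data; $where, $function, and $accumulator execute JavaScript on the server.
BLOCKED = {"$out", "$merge", "$where", "$function", "$accumulator"}


def _find_blocked(value):
    """Return the first blocked key found anywhere in a nested structure."""
    if isinstance(value, dict):
        for key, inner in value.items():
            if key in BLOCKED:
                return key
            found = _find_blocked(inner)
            if found:
                return found
    elif isinstance(value, list):
        for inner in value:
            found = _find_blocked(inner)
            if found:
                return found
    return None


def guard_pipeline(pipeline, collection, allowed_collections, max_docs=100):
    """Validate an LLM-generated pipeline and append a result cap.

    Raises ValueError when the pipeline should not be executed.
    """
    if collection not in allowed_collections:
        raise ValueError(f"collection not allowed: {collection}")
    if not isinstance(pipeline, list):
        raise ValueError("pipeline must be a list of stages")
    for stage in pipeline:
        if not isinstance(stage, dict) or len(stage) != 1:
            raise ValueError("each stage must be a single-key document")
    blocked = _find_blocked(pipeline)
    if blocked:
        raise ValueError(f"operator not allowed: {blocked}")
    # Cap the number of returned documents regardless of what was generated.
    return list(pipeline) + [{"$limit": max_docs}]


generated = [{"$match": {"customer_id": 42}}, {"$sort": {"created_at": -1}}]
safe = guard_pipeline(generated, "interactions", {"interactions", "orders"})
print(safe)
```

A check like this complements, rather than replaces, database-level controls: a read-only role with least-privilege access remains the stronger safeguard, since it holds even if the validation logic misses a case.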
Build reliable AI apps and agents with MongoDB

LLMs are redefining how we interact with databases. We’re committed to providing developers the best paths forward for building reliable AI interfaces with MongoDB. We invite you to dive in, experiment, and explore the power of connecting AI applications and agents to your data.

Try the LangChain MongoDB integration today! Ready to build? Dive into Text-to-MQL with this tutorial and get started building your own agents powered by LangGraph and MongoDB Atlas!

June 30, 2025