MongoDB Developer Blog
Deep dives into technical concepts, architectures, and innovations with MongoDB.
Automotive After Sales Diagnostics Using GraphRAG and Multimodal AI
Modern vehicles act as distributed computing systems and generate terabytes of telemetry. However, the majority of after-sales diagnostic and repair workflows still depend on static documentation and basic keyword search. In 2025, J.D. Power reported that 12% of repairs are not completed correctly on the first visit.1 These repeat repairs increase costs, reduce workshop throughput, and erode customer trust.
High vs Low Ingestion: A Practical Study of MongoDB Time Series Bucket Behavior
Time series data captures any signal, metric, or observation whose state changes continuously over time. Infrastructure metrics, IoT sensor readings, financial market data, observability signals, and distributed system telemetry all qualify. What they share is the need to record an ordered sequence of measurements efficiently.
db.youtube.insert(): Our Developer YouTube Channel is Officially Live
If you’ve spent any time learning MongoDB on YouTube, you’ve likely visited our main channel. It’s been the hub for all video content—from company news and keynote highlights to the tutorials that help you get your first cluster up and running.
Near Real-time Analytics Powered by Mirroring in Microsoft Fabric for MongoDB Atlas
MongoDB’s accelerator for mirroring enables customers to bring operational data from MongoDB Atlas to Microsoft Fabric in near real-time for big data analytics, AI, and business intelligence (BI), combining it with the rest of the data estate of the enterprise. Open mirroring in Fabric provides a unique way to import data from operational data stores to the uniform data layer of OneLake in Fabric. Once mirroring is enabled for a MongoDB Atlas collection, the corresponding table in OneLake stays in sync with the changes in the source MongoDB Atlas collection, unlocking opportunities for various analytics and for AI and BI in near real-time.
Port Mapping for Google Private Service Connect on MongoDB Atlas
For organizations leveraging MongoDB Atlas on Google Cloud, network architecture is a critical component of performance and scalability. Today, we are excited to announce a significant architectural enhancement that simplifies the connection between these two platforms. This new feature, Port Mapping for Private Service Connect (PSC), reduces developer efforts and enables faster scaling by streamlining connection management and resource allocation.
A How-To Guide to Building Fast, Cheap, and Accurate Retrieval
Building Gen AI prototypes is straightforward. Whether you're building search, RAG, or agentic applications, the main focus when prototyping is often accuracy. But production is different. In production, you’re handling thousands or millions of queries instead of a handful of tests. Your users expect accurate responses, and they want them instantly. This requires optimizing for three things at once: accuracy, speed, and operating costs.
Building a Movie Recommendation Engine with Hugging Face and Voyage AI
This guest blog post is from Arek Borucki, Machine Learning Platform & Data Engineer at Hugging Face, a collaboration platform for the machine learning community. The Hugging Face Hub works as a central place where anyone can share, explore, discover, and experiment with open-source ML. Hugging Face empowers the next generation of machine learning engineers, scientists, and end users to learn, collaborate, and share their work to build an open and ethical AI future together. With a fast-growing community, some of the most widely used open-source ML libraries and tools, and a talented science team exploring the edge of tech, Hugging Face is at the heart of the AI revolution.
Optimizing the MongoDB Java Driver: How minor optimizations led to macro gains
Donald Knuth, widely recognized as the ‘father of the analysis of algorithms,’ warned against premature optimization—spending effort on code that appears inefficient but is not on the critical path. He observed that programmers often focus on the wrong 97% of the codebase. Real performance gains come from identifying and optimizing the critical 3%. But, how can you identify the critical 3%? Well, that’s where the philosophy of ‘never guess, always measure’ comes in.
Evaluation of Update-Heavy Workloads With PostgreSQL JSONB and MongoDB BSON
JSON has become a common data format for modern applications, and as a result, many teams evaluate whether a single database can serve both relational and document-style workloads. PostgreSQL’s JSONB support and MongoDB’s BSON document model often appear comparable at a glance, leading to the assumption that they can be used interchangeably.
Build an Agentic Video Search System Using Voyage AI, MongoDB, and Anthropic
As natural language queries replace keyword searches and search systems embrace multimodal data, a single information retrieval strategy can no longer capture the full spectrum of user intent.
Vision RAG: Enabling Search on Any Documents
Information comes in many shapes and forms. While retrieval-augmented generation (RAG) primarily focuses on plain text, it overlooks vast amounts of data along the way. Most enterprise knowledge resides in complex documents, slides, graphics, and other multimodal sources. Yet extracting useful information from these formats using optical character recognition (OCR) or other parsing techniques is often low-fidelity, brittle, and expensive.

Vision RAG makes complex documents, including their figures and tables, searchable by using multimodal embeddings, eliminating the need for complex and costly text extraction. This guide explores how Voyage AI's latest model powers this capability and provides a step-by-step implementation walkthrough.

Vision RAG: Building upon text RAG

Vision RAG is an evolution of traditional RAG, built on the same two components: retrieval and generation. In traditional RAG, unstructured text data is indexed for semantic search. At query time, the system retrieves relevant documents or chunks and appends them to the user's prompt so the large language model (LLM) can produce more grounded, context-aware answers.

Figure 1. Text RAG with Voyage AI and MongoDB.

Enterprise data, however, is rarely just clean plain text. Critical information often lives in PDFs, slides, diagrams, dashboards, and other visual formats. Today, this is typically handled by parsing tools and OCR services. Those approaches create several problems:

- Significant engineering effort to handle many file types, layouts, and edge cases
- Accuracy issues across different OCR or parsing setups
- High costs when scaled across large document collections

Next-generation multimodal embedding models provide a simpler and more cost-effective alternative. They can ingest not only text but also images or screenshots of complex document layouts, and generate vector representations that capture the meaning and structure of that content.
Vision RAG uses these multimodal embeddings to index entire documents, slides, and images directly, even when they contain interleaved text and images. This makes them searchable via vector search without requiring heavy parsing or OCR. At query time, the system retrieves the most relevant visual assets and feeds them, along with the text prompt, into a vision-capable LLM to inform its answer.

Figure 2. Vision RAG with Voyage AI and MongoDB.

As a result, vision RAG gives LLM-based systems native access to rich, multimodal enterprise data, while reducing engineering complexity and avoiding the performance and cost pitfalls associated with traditional text-focused preprocessing pipelines.

Voyage AI's latest multimodal embedding model

The multimodal embedding model is where the magic happens. Historically, building such a system was challenging due to the modality gap. Early multimodal embedding models, such as contrastive language-image pretraining (CLIP)-based models, processed text and images using separate encoders. Because the outputs were generated independently, results were often biased toward one modality, making retrieval across mixed content unreliable. These models also struggled to handle interleaved text and images, a critical limitation for vision RAG in real-world environments.

Voyage-multimodal-3 adopts an architecture similar to modern vision-capable LLMs. It uses a single encoder for both text and visual inputs, closing the modality gap and producing unified representations. This ensures that textual and visual features are treated consistently and accurately within the same vector space.

Figure 3. CLIP-based architecture vs. voyage-multimodal-3's architecture.

This architectural shift enables true multimodal retrieval, making vision RAG a viable and efficient solution.
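The query-time flow just described — embed the query, rank the stored page embeddings by similarity, and hand the top hits to a vision-capable LLM — can be sketched with stub vectors. This is a minimal illustration only: `PAGE_VECTORS`, `cosine`, and `retrieve` are hypothetical names, and in a real system each vector would come from a multimodal embedding model such as voyage-multimodal-3 rather than being hard-coded.

```python
import math

# Illustrative stand-ins for stored page embeddings (hypothetical data, not
# real model output). Each key is a document page rendered as an image.
PAGE_VECTORS = {
    "slide_03_revenue_chart.png": [0.9, 0.1, 0.0],
    "slide_07_org_diagram.png":   [0.1, 0.9, 0.0],
    "slide_12_roadmap.png":       [0.0, 0.2, 0.9],
}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve(query_vector, k=2):
    """Rank stored page embeddings by similarity and return the top k pages."""
    ranked = sorted(
        PAGE_VECTORS.items(),
        key=lambda item: cosine(query_vector, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]

# A query embedding close to the "revenue chart" vector retrieves that page
# first; the retrieved images would then be sent to a vision-capable LLM
# alongside the user's prompt.
print(retrieve([0.8, 0.2, 0.1], k=1))  # → ['slide_03_revenue_chart.png']
```

In production, this brute-force ranking is replaced by an approximate nearest neighbor index (for example, MongoDB Atlas Vector Search), but the retrieval logic is conceptually the same.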
For more details, refer to the voyage-multimodal-3 blog announcement.

Implementation of vision RAG

Let's take a simple example and showcase how to implement vision RAG. Traditional text-based RAG often struggles with complex documents, such as slide decks, financial reports, or technical papers, where critical information is often locked inside charts, diagrams, and figures. By using Voyage AI's multimodal embedding models alongside Anthropic's vision-capable LLMs, we can bridge this gap. We will treat images (or screenshots of document pages) as first-class citizens, retrieving them directly based on their visual and semantic content and passing them to a vision-capable LLM for reasoning.

To demonstrate this, we will build a pipeline that extracts insights from the charts and figures of the GitHub Octoverse 2025 survey, which simulates the type of information typically found in enterprise data. The Jupyter Notebook for this tutorial is available on GitHub in our GenAI Showcase repository. To follow along, run the notebook in Google Colab (or similar), and refer to this tutorial for explanations of key code blocks.

Step 1: Install necessary libraries

First, we need to set up our Python environment. We will install the voyageai client for generating embeddings and the anthropic client for our generative model.

....
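The setup in Step 1 amounts to a single install command. A minimal sketch is below; `pymongo` is an assumption for the MongoDB side of the pipeline and is not named in the step itself:

```shell
# Install the Voyage AI and Anthropic clients; pymongo (assumed) for MongoDB Atlas
pip install --quiet voyageai anthropic pymongo
```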