Thibaut Gourdel


Introducing Text-to-MQL with LangChain: Query MongoDB using Natural Language

We're excited to announce that we've added a powerful new capability to the MongoDB integration for LangChain: Text-to-MQL. This enhancement allows developers to easily transform natural language queries into MongoDB Query Language (MQL), enabling them to build new and intuitive application interfaces powered by large language models (LLMs). Whether you're building chatbots to interact with internal company data stored on MongoDB or AI agents that will work directly with MongoDB, this LangChain toolkit delivers out-of-the-box natural language querying with Text-to-MQL.

Enabling new interfaces with Text-to-MQL

LLMs are transforming the workplace by enabling people to “talk” to their data. Historically, accessing and querying databases required specialized knowledge or tools. Now, with natural language querying enabled by LLMs, developers can create new, intuitive interfaces that give virtually anyone access to data and insights, with no specialized skills required.

Using Text-to-MQL, developers can build applications that rely on natural language to generate insights or create visualizations for their users. This includes conversational interfaces that query MongoDB directly, democratizing database exploration and interactions.

Robust database querying capabilities through natural language are also critical for building more sophisticated agentic systems. Agents leveraging MongoDB through MQL can interact autonomously with both operational and analytical data, greatly enhancing productivity across a wide range of operational and business tasks.

Figure 1. Agent components and how MongoDB powers tools and memory.

For instance, customer support agents leveraging Text-to-MQL capabilities can autonomously retrieve the most recent customer interactions and records directly from MongoDB databases, enabling faster and more informed responses.
Similarly, agents generating application code can query database collections and schemas to ensure accurate and relevant data retrieval logic.

In addition, MongoDB’s flexible document model aligns more naturally with how users describe data in plain language. Its support for nested, denormalized data in JSON-like BSON documents reduces the need for multi-table joins, an area where LLMs often struggle, making MongoDB more LLM-friendly than traditional SQL databases.

Implementing Text-to-MQL with MongoDB and LangChain

The LangChain and MongoDB integration package provides a comprehensive set of tools to accelerate AI application development. It supports advanced retrieval-augmented generation (RAG) implementations through integrations with MongoDB for vector search, hybrid search, GraphRAG, and more. It also enables agent development using LangGraph, with built-in support for memory persistence. The latest addition, Text-to-MQL, can be used either as a standalone component in your application or as a tool integrated into LangGraph agents.

Figure 2. LangChain and MongoDB integration overview.

Released in version 0.6.0 of the langchain-mongodb package, the agent_toolkit module introduces a set of methods that enable reliable interaction with MongoDB databases, without the need to develop custom integrations. The integration enables reliable database operations, including the following pre-defined tools:

- List the collections in the database
- Retrieve the schema and sample rows for specific collections
- Execute MongoDB queries to retrieve data
- Check MongoDB queries for correctness before executing them

You can leverage the LangChain database toolkit as a standalone class in your application to interact with MongoDB from natural language and build custom text interfaces or more complex agentic systems. It is highly customizable, providing the flexibility and control needed to adapt it to your specific use cases.
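To make the schema-and-samples tool more concrete, here is a minimal sketch of the kind of summary an LLM might receive before it writes a query. It is not the toolkit's implementation: the function name `summarize_schema`, the dotted-path notation, and the sample documents are all assumptions made for illustration, and the code uses only the Python standard library.

```python
from collections import defaultdict


def summarize_schema(sample_docs):
    """Map each dotted field path to the sorted list of type names seen in the samples."""
    seen = defaultdict(set)

    def walk(value, path):
        if isinstance(value, dict):
            # Descend into sub-documents, extending the dotted path.
            for key, child in value.items():
                walk(child, f"{path}.{key}" if path else key)
        elif isinstance(value, list):
            # Record the array itself, then describe its elements under "path[]".
            seen[path].add("array")
            for item in value:
                walk(item, path + "[]")
        else:
            seen[path].add(type(value).__name__)

    for doc in sample_docs:
        walk(doc, "")
    return {path: sorted(types) for path, types in sorted(seen.items())}


# Hypothetical sample documents, standing in for rows read from a collection.
samples = [
    {"name": "Ada", "address": {"city": "Paris"}, "orders": [{"total": 42.5}]},
    {"name": "Lin", "address": {"city": "Lyon", "zip": "69001"}, "orders": []},
]

schema = summarize_schema(samples)
for path, types in schema.items():
    print(f"{path}: {', '.join(types)}")
```

A compact summary like this shows nested and optional fields (such as `address.zip`) without sending full documents to the model, which may also help limit how much data leaves the database.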
More specifically, you can tweak and expand the standard prompts and parameters offered by the integration. When building agents using LangGraph, LangChain’s orchestration framework, this integration serves as a reliable way to give your agents access to MongoDB databases and execute queries against them.

Real-world considerations when implementing Text-to-MQL

Natural language querying of databases by AI applications and agentic systems is a rapidly evolving space, with best practices still taking shape. Here are a few key considerations to keep in mind as you build:

Ensuring accuracy

The generated MongoDB Query Language (MQL) relies heavily on the capabilities of the underlying language model and the quality of the schema or data samples provided. Ambiguities in schemas, incomplete metadata, or vague instructions can lead to incorrect or suboptimal queries. It's important to validate outputs, apply rigorous testing, and consider adding guardrails or human review, especially for complex or sensitive queries.

Preserving performance

Providing AI applications and agents with access to MongoDB databases can present performance challenges. The non-deterministic nature of LLMs makes workload patterns unpredictable. To mitigate the impact on production performance, consider routing agent queries to a replica set or using dedicated, optimized search nodes.

Maintaining security and privacy

Granting AI apps and agents access to your database should be considered with care. Apply the common security principles and best practices: define and enforce roles and policies to implement least-privilege access, granting only the minimum permissions necessary for the task. Giving access to your data may involve sharing private and sensitive information with LLM providers. You should evaluate what kind of data should actually be sent (such as database names, collection names, or data samples) and whether that access can be toggled on or off to accommodate users.
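As one illustration of the guardrails mentioned above, the sketch below inspects an LLM-generated aggregation pipeline before it is executed. It is a hypothetical application-side check, not part of langchain-mongodb: the function names, the `max_limit` default, and the specific rules are assumptions. The stage and operator names it looks for (`$out`, `$merge`, `$where`, `$function`, `$accumulator`, `$limit`) are standard MQL.

```python
WRITE_STAGES = {"$out", "$merge"}  # aggregation stages that write data
SCRIPT_OPERATORS = {"$where", "$function", "$accumulator"}  # run server-side JavaScript


def find_operators(node):
    """Yield every key starting with '$' anywhere inside a nested query."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key.startswith("$"):
                yield key
            yield from find_operators(value)
    elif isinstance(node, list):
        for item in node:
            yield from find_operators(item)


def check_pipeline(pipeline, max_limit=100):
    """Return a list of problems; an empty list means the pipeline passed."""
    if not isinstance(pipeline, list) or not all(
        isinstance(stage, dict) and len(stage) == 1 for stage in pipeline
    ):
        return ["pipeline must be a list of single-key stage documents"]
    problems = []
    for stage in pipeline:
        name = next(iter(stage))
        if name in WRITE_STAGES:
            problems.append(f"write stage not allowed: {name}")
    for operator in find_operators(pipeline):
        if operator in SCRIPT_OPERATORS:
            problems.append(f"JavaScript operator not allowed: {operator}")
    # Require a bounded result size so an agent cannot pull a whole collection.
    limits = [stage["$limit"] for stage in pipeline if "$limit" in stage]
    if not any(isinstance(n, int) and 0 < n <= max_limit for n in limits):
        problems.append(f"pipeline needs a $limit stage of at most {max_limit}")
    return problems


safe = [{"$match": {"status": "open"}}, {"$sort": {"created_at": -1}}, {"$limit": 10}]
unsafe = [{"$match": {"$where": "sleep(1000)"}}, {"$out": "copy"}]

print(check_pipeline(safe))
print(check_pipeline(unsafe))
```

A check like this complements, and does not replace, database-side controls: a read-only role with least-privilege access remains the more robust way to prevent writes.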
Build reliable AI apps and agents with MongoDB

LLMs are redefining how we interact with databases. We're committed to providing developers the best paths forward for building reliable AI interfaces with MongoDB. We invite you to dive in, experiment, and explore the power of connecting AI applications and agents to your data.

Try the LangChain MongoDB integration today! Ready to build? Dive into Text-to-MQL with this tutorial and get started building your own agents powered by LangGraph and MongoDB Atlas!

June 30, 2025

GraphRAG with MongoDB Atlas: Integrating Knowledge Graphs with LLMs

A key challenge AI developers face is providing context to large language models (LLMs) to build reliable AI-enhanced applications; retrieval-augmented generation (RAG) is widely used to tackle this challenge. While vector-based RAG, the standard (or baseline) implementation of retrieval-augmented generation, is useful for many use cases, it is limited in providing LLMs with reasoning capabilities that can understand relationships between diverse concepts scattered throughout large knowledge bases. As a result, the accuracy of vector RAG-enhanced LLM outputs in applications can disappoint, and even mislead, end users.

Now generally available, MongoDB Atlas’ new LangChain integration for GraphRAG, a variation of RAG architecture that integrates a knowledge graph with LLMs, can help address these limitations.

GraphRAG: Connecting the dots

First, a short explanation of knowledge graphs: a knowledge graph is a structured representation of information in which entities (such as people, organizations, or concepts) are connected by relationships. Knowledge graphs work like maps, and show how different pieces of information relate to each other. This structure helps computers understand connections between facts, answer complex questions, and find relevant information more easily.

Traditional RAG applications split knowledge data into chunks, vectorize them into embeddings, and then retrieve chunks of data through semantic similarity search; GraphRAG builds on this approach. But instead of treating each document or chunk as an isolated piece of information, GraphRAG considers how different pieces of knowledge are connected and relate to each other through a knowledge graph.

Figure 1. Embedding-based vector search vs. entity-based graph search.

GraphRAG improves RAG architectures in three ways:

First, GraphRAG can improve response accuracy. Integrating knowledge graphs into the retrieval component of RAG has shown significant improvements in multiple publications.
For example, benchmarks in the AWS investigation, “Improving Retrieval Augmented Generation Accuracy with GraphRAG,” demonstrated nearly double the correct answers compared to traditional embedding-based RAG.

Second, embedding-based methods rely on numerical vectors, which can make it difficult to interpret why certain chunks are related. Conversely, a graph-based approach provides a visual and auditable representation of document relationships. Consequently, GraphRAG offers more explainability and transparency into retrieved information for improved insight into why certain data is being retrieved. These insights can help optimize data retrieval patterns to improve accuracy.

Finally, GraphRAG can help answer questions that RAG is not well-suited for, particularly when understanding a knowledge base's structure, hierarchy, and links is essential. Vector-based RAG struggles in these cases because breaking documents into chunks loses the big picture. For example, prompts like “What are the themes covered in the 2025 strategic plan?” are not well handled. This is because the semantic similarity between the prompt, with keywords like “themes,” and the actual themes in the document may be weak, especially if they are scattered across different sections. Another example prompt, “What is John Doe’s role in ACME’s renewable energy projects?”, presents challenges because if the relationships between the person, the company, and the related projects are mentioned in different places, it becomes difficult to provide accurate responses with vector-based RAG.

Traditional vector-based RAG can struggle in cases like these because it relies solely on semantic similarity search. The logical connections between different entities (such as contract clauses, legal precedents, financial indicators, and market conditions) are often complex and lack semantic keyword overlap. Making logical connections across entities is often referred to as multi-hop retrieval or reasoning in GraphRAG.
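Multi-hop retrieval can be illustrated with a small in-memory sketch built around the John Doe and ACME prompt above. The graph contents (project names, relationship labels) and the function `multi_hop` are invented for illustration; this is a plain breadth-first traversal with a depth limit, not the retrieval logic of any particular GraphRAG library.

```python
from collections import deque

# Hypothetical knowledge graph: entity -> list of (relationship, target entity).
GRAPH = {
    "John Doe": [("works_at", "ACME")],
    "ACME": [("runs", "Solar Farm Project"), ("runs", "Wind Turbine Project")],
    "Solar Farm Project": [("category", "Renewable Energy")],
    "Wind Turbine Project": [("category", "Renewable Energy")],
}


def multi_hop(graph, start, max_depth):
    """Breadth-first traversal returning (source, relationship, target, depth) facts."""
    facts = []
    visited = {start}
    queue = deque([(start, 0)])
    while queue:
        entity, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand entities at the depth limit
        for relationship, target in graph.get(entity, []):
            facts.append((entity, relationship, target, depth + 1))
            if target not in visited:
                visited.add(target)
                queue.append((target, depth + 1))
    return facts


# One hop only links the person to the company; two hops reach the projects.
for fact in multi_hop(GRAPH, "John Doe", max_depth=2):
    print(fact)
```

With a depth of 1 the traversal only learns that John Doe works at ACME. A second hop is needed to connect him to the company's projects, which is the kind of link that chunk-level similarity search may miss when the facts sit in different documents.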
However, GraphRAG has its own limitations, and whether it achieves better accuracy than vector-based RAG depends on the use case:

- It introduces an extra step: creating the knowledge graph using LLMs to extract entities and relationships. Maintaining and updating the graph as new data arrives becomes an ongoing operational burden. Unlike vector-based RAG, which requires only embedding and indexing (a relatively lightweight and fast process), GraphRAG depends on a large LLM to accurately understand and map complex relationships and integrate them into the existing graph.
- The added complexity of graph traversal can lead to response latency and scalability challenges as the knowledge base grows. Latency is closely tied to the depth of traversal and the chosen retrieval strategy, both of which must align with the specific requirements of the application.
- GraphRAG introduces additional retrieval options. While this allows developers more flexibility in the implementation, it also adds complexity. The additional retrieval options include keyword and entity-based retrieval, semantic similarity on the first node, and more.

MongoDB Atlas: A unified database for operational data, vectors, and graphs

MongoDB Atlas is perfectly suited as a unified database for documents, vectors, and graphs. As a unified platform, it’s ideal for powering LLM-based applications with vector-based or graph-based RAG. Indeed, adopting MongoDB Atlas eliminates the need for point or bolt-on solutions for vector or graph functionality, which often introduce unnecessary complexity, such as data synchronization challenges that can lead to increased latency and potential errors. The unified approach offered by MongoDB Atlas simplifies the architecture and reduces operational overhead, but most importantly, it greatly simplifies the development experience.
In practice, this means you can leverage MongoDB Atlas' document model to store rich application data, use vector indexes for similarity search, and model relationships using document references for graph-like structures.

Implementing GraphRAG with MongoDB Atlas and LangChain

Starting from version 0.5.0, the langchain-mongodb package introduces a new class to simplify the implementation of a GraphRAG architecture.

Figure 2. GraphRAG architecture with MongoDB Atlas and LangChain

First, it enables the automatic creation of a knowledge graph. Under the hood, it uses a specific prompt sent to an LLM of your choice to extract entities and relationships, structuring the data to be stored as a graph in MongoDB Atlas. Then, at query time, it sends the user's query to the LLM to extract entities and searches within the graph to find connected entities, their relationships, and associated data. This information, along with the original query, then goes back to the LLM to generate an accurate final response.

MongoDB Atlas’ integration in LangChain for GraphRAG follows an entity-based graph approach. However, you can also develop and implement your own GraphRAG with a hybrid approach using MongoDB drivers and MongoDB Atlas’ rich search and aggregation capabilities.

Enhancing knowledge retrieval with GraphRAG

GraphRAG complements traditional RAG methods by enabling deeper understanding of complex, hierarchical relationships, supporting effective information aggregation and multi-hop reasoning. Hybrid approaches that combine GraphRAG with embedding-based vector search further enhance knowledge retrieval, making them especially effective for advanced RAG and agentic systems.

MongoDB Atlas’ unified database simplifies RAG implementation and its variants, including GraphRAG and other hybrid approaches, by supporting documents, vectors, and graph representations in a unified data model that can seamlessly scale from prototype to production.
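As a concrete illustration of the graph search described in the implementation section, the sketch below builds an aggregation pipeline around MongoDB's `$graphLookup` stage, which recursively follows references between documents in a collection. The document shape (`_id` as the entity name, `relationships.target_ids` as the list of referenced entities), the collection name, and the helper `build_graph_lookup` are assumptions for illustration and may differ from what the LangChain integration stores.

```python
import json

# Assumed entity document shape (hypothetical):
# {"_id": "John Doe", "type": "Person",
#  "relationships": {"target_ids": ["ACME"], "types": ["works_at"]}}


def build_graph_lookup(entity_names, collection, max_depth=2):
    """Build a pipeline that starts from named entities and follows their references."""
    return [
        # Start from the entities extracted from the user's question.
        {"$match": {"_id": {"$in": list(entity_names)}}},
        {
            "$graphLookup": {
                "from": collection,
                "startWith": "$relationships.target_ids",
                "connectFromField": "relationships.target_ids",
                "connectToField": "_id",
                "as": "connections",
                # maxDepth counts extra recursion levels; 0 returns only
                # the directly referenced entities.
                "maxDepth": max_depth,
                "depthField": "depth",
            }
        },
    ]


pipeline = build_graph_lookup(["John Doe"], "entities", max_depth=1)
print(json.dumps(pipeline, indent=2))
```

Running the pipeline requires a live deployment; with PyMongo it would typically be passed to `collection.aggregate(pipeline)`. Keeping `maxDepth` small is one way to manage the traversal latency discussed earlier.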
With robust retrieval capabilities ranging from full-text and semantic search to graph search, MongoDB Atlas provides a comprehensive solution for building AI applications. And its integration with proven developer frameworks like LangChain accelerates the development experience, enabling AI developers to build more advanced and efficient retrieval-augmented generation systems that underpin AI applications.

Ready to dive into GraphRAG? Learn how to implement it with MongoDB Atlas and LangChain. Head over to the Atlas Learning Hub to boost your MongoDB skills and knowledge.

April 14, 2025

AI-Powered Java Applications With MongoDB and LangChain4j

MongoDB is pleased to introduce its integration with LangChain4j, a popular framework for integrating large language models (LLMs) into Java applications. This collaboration simplifies the integration of MongoDB Atlas Vector Search into Java applications for building AI applications.

The advent of generative AI has opened up many new possibilities for developing novel applications. These advancements have led to the development of AI frameworks that simplify the complexities of orchestrating and integrating LLMs and the various components of the AI stack, where MongoDB plays a key role as an operational and vector database.

Simplifying AI development for Java

The first AI frameworks to emerge were developed for Python and JavaScript, which were favored by early AI developers. However, Java remains widespread in enterprise software. This has led to the development of LangChain4j to address the needs of the Java ecosystem. While largely inspired by LangChain and other popular AI frameworks, LangChain4j is independently developed. As with other LLM frameworks, LangChain4j offers several advantages for developing AI systems and applications by providing:

- A unified API for integrating LLM providers and vector stores. This enables developers to adopt a modular approach with an interchangeable stack while ensuring a consistent developer experience.
- Common abstractions for LLM-powered applications, such as prompt templating, chat memory management, and function calling, offering ready-to-use building blocks for common AI applications like retrieval-augmented generation (RAG) and agents.

Powering RAG and agentic systems with MongoDB and LangChain4j

MongoDB worked with the LangChain4j open-source community to integrate MongoDB Atlas Vector Search into the framework, enabling Java developers to develop AI-powered applications from simple RAG to agentic applications.
In practice, this means developers can now use the unified LangChain4j API to store vector embeddings in MongoDB Atlas and use Atlas Vector Search capabilities for retrieving relevant context data. These capabilities are essential for enabling RAG pipelines, where private, often enterprise data is retrieved based on relevancy and combined with the original prompt to get more accurate results in LLM-based applications.

LangChain4j supports various levels of RAG, from basic to advanced implementations, making it easy to prototype and experiment before customizing and scaling your solution to your needs. A basic RAG setup with LangChain4j typically involves loading and parsing unstructured data from documents stored locally or on remote services like Amazon S3 or Azure Storage using the Document API. The process then transforms and splits the data, then embeds it to capture the semantic meaning of the content. For more details, check out the documentation on core RAG APIs.

However, real-world use cases often demand solutions with advanced RAG and agentic systems. LangChain4j optimizes RAG pipelines with predefined components designed to enhance accuracy, latency, and overall efficiency through techniques like query transformation, routing, content aggregation, and reranking. It also supports AI agent implementation through dedicated APIs, such as AI Services and Tools, with function calling and RAG integration, among others. Learn more about the MongoDB Atlas Vector Search integration in LangChain4j’s documentation.

MongoDB’s dedication to providing the best developer experience for building AI applications across different ecosystems remains strong, and this integration reinforces that commitment. We will continue strengthening our integration with LLM frameworks, enabling developers to build more innovative AI applications, agentic systems, and AI agents.

Ready to start building AI applications with Java? Learn how to create your first RAG system by visiting our tutorial: How to Make a RAG Application With LangChain4j.

March 4, 2025