Peter Richards


Checkpointers and Native Parent Child Retrievers with LangChain and MongoDB

MongoDB and LangChain, the company known for its eponymous large language model (LLM) application framework, are excited to announce new developments in an already strong partnership. Two additional enhancements have just been added to the LangChain codebase, making it easier than ever to build cutting-edge AI solutions with MongoDB.

Checkpointer support

In LangGraph, LangChain’s library for building stateful, multi-actor applications with LLMs, memory is provided through checkpointers. Checkpointers are snapshots of the graph state at a given point in time. They provide a persistence layer, allowing developers to interact with and manage the graph’s state. This has a number of advantages for developers: human-in-the-loop workflows, "memory" between interactions, and more.

Figure adapted from “Launching Long-Term Memory Support in LangGraph”. LangChain Blog. Oct. 8, 2024. https://blog.langchain.dev/launching-long-term-memory-support-in-langgraph/

MongoDB has developed a custom checkpointer implementation, the MongoDBSaver class, which needs only a MongoDB URI (local or Atlas) to store LangGraph state in MongoDB. By making checkpointers a first-class feature, developers can have confidence that their stateful AI applications built on MongoDB will be performant. And that’s not all: this implementation actually includes two new checkpointers, one synchronous and one asynchronous, serving developers across a wide range of use cases. Both implementations include helpful utility functions that make them painless to use, letting developers easily store instances of StateGraph inside MongoDB. A performant persistence layer that stores data in an intuitive way means a better end-user experience and a more robust system, no matter what a developer is building with LangGraph.

Native parent child retrievers

Second, MongoDB has implemented a native parent child retriever inside LangChain.
This approach enhances the performance of retrieval-augmented generation (RAG) applications by providing the LLM with broader context to consider. In essence, the original documents are divided into relatively small chunks, each chunk is embedded, and the chunks are stored in MongoDB. Using small chunks (a sentence or a couple of sentences) helps embedding models better reflect their meaning; at query time, the retriever matches against the chunks but returns the larger parent documents they came from.

Developers can now use MongoDBAtlasParentDocumentRetriever to persist one collection for both vector and document storage. In this implementation, both parent and child documents are stored in a single collection, while embedding vectors are computed and indexed only for the chunks. This has a number of performance advantages, because storing vectors with their associated documents means no need to join tables or worry about painful schema migrations.

Additionally, as part of this work, MongoDB has also added a MongoDBDocStore class, which provides many helpful utility functions. It is now easier than ever to use documents as a key-value store and insert, update, and delete them with ease.

Taken together, these two new classes allow developers to take full advantage of MongoDB’s abilities. MongoDB and LangChain continue to be a strong pair for building agentic AI, combining performance and ease of development to provide a developer-friendly experience. Stay tuned as we build out additional functionality!

To learn more about these LangChain integrations, here are some resources to get you started:

Check out our tutorial.
Experiment with checkpointers and native parent child retrievers to see their utility for yourself.
Read the previous announcement with LangChain about AI Agents, Hybrid Search, and Indexing.

December 16, 2024

AI Agents, Hybrid Search, and Indexing with LangChain and MongoDB

Since we announced our integration with LangChain last year, MongoDB has been building out tooling to help developers create advanced AI applications with LangChain. With recent releases, MongoDB has made it easier to develop agentic AI applications (with a LangGraph integration), perform hybrid search by combining Atlas Search and Atlas Vector Search, and ingest large-scale documents more effectively. For more on each development, plus new support for the LangChain Indexing API, please read on!

The rise of AI agents

Agentic applications have emerged as a compelling next step in the development of AI. Imagine an application able to act on its own, working towards complicated goals and drawing on context to create a strategy. These applications leverage large language models (LLMs) to dynamically determine their execution path, breaking free from the constraints of traditional, deterministic logic.

Consider an application tasked with answering a question like "In our most profitable market, what is the current weather?" While a traditional retrieval-augmented generation (RAG) app may falter, unable to obtain information about “current weather,” an agentic application shines. It can intelligently deduce the need for an external API call to obtain current weather information, seamlessly integrating this with data retrieved from a vector search to identify the most profitable market. These systems take action and gather additional information with limited human intervention, supplementing what they already know. Building such a system is easier than ever thanks to MongoDB’s continued work with LangGraph.

Unleashing the power of AI agents with LangGraph and MongoDB

Because it now offers LangGraph, a framework for multi-agent orchestration, LangChain is more effective than ever at simplifying the creation of applications using LLMs, including AI agents.
These agents require memory to maintain context across multiple interactions, allowing users to engage with them repeatedly while the agent retains information from previous exchanges. Basic agentic applications can utilize in-memory structures, but for more complicated use cases these structures are not sufficient. MongoDB allows developers to build stateful, multi-actor applications with LLMs, storing and retrieving the “checkpoints” needed by LangGraph.js.

The new MongoDBSaver class makes integration simpler than ever before, as LangGraph.js is able to utilize historical user interactions to enhance agentic AI. By segmenting this history into checkpoints, the library allows for persistent session memory, easier error recovery, and even the ability to “time travel”: users can jump back to a previous state in the graph to explore alternative execution paths. The MongoDBSaver class builds all of this functionality right into LangGraph.js, with sensible defaults and MongoDB-specific optimizations. To learn more, please visit the source code, the documentation, and our new tutorial (which includes both a written and video version).

Improve retrieval accuracy with Hybrid Search Retriever

Hybrid search is particularly well-suited for queries that have both semantic and keyword-based components. Consider a query such as "find recent scientific papers about climate change impacts on coral reefs that specifically mention ocean acidification." A hybrid search approach would combine semantic search to identify papers discussing climate change effects on coral ecosystems, keyword matching to ensure "ocean acidification" is mentioned, and potential date-based filtering or boosting to prioritize recent publications. This combination allows for more comprehensive and relevant results than either semantic or keyword search alone could provide.
With our recent release of retrievers in langchain-mongodb, building such advanced retrieval patterns is more accessible than ever. Retrievers are how LangChain integrates external data sources into LLM applications. MongoDB has added two new custom, purpose-built retrievers to the langchain-mongodb Python package, giving developers a unified way to perform hybrid search and full-text search with sensible defaults and extensive code annotation. These new classes make it easier than ever to use the full capabilities of MongoDB Vector Search with LangChain.

The new MongoDBAtlasFullTextSearchRetriever class performs full-text searches ranked with Best Match 25 (BM25). The MongoDBAtlasHybridSearchRetriever class builds on this work, combining full-text search with vector search and fusing the results with the Reciprocal Rank Fusion (RRF) algorithm. The combination of these two techniques is a potent tool for improving the retrieval step of a retrieval-augmented generation (RAG) application, enhancing the quality of the results. To find out more, please dive into the MongoDBAtlasHybridSearchRetriever and MongoDBAtlasFullTextSearchRetriever classes.

Seamless synchronization using LangChain Indexing API

In addition to these releases, we’re also excited to announce that MongoDB now supports the LangChain Indexing API, allowing for seamless loading and synchronization of documents from any source into MongoDB while leveraging LangChain's intelligent indexing features. This new support will help users avoid duplicate content, minimize unnecessary rewrites, and optimize embedding computations. The LangChain Indexing API's record management system ensures efficient tracking of document writes, computing a hash for each document and storing essential information like write time and source ID.
This feature is particularly valuable for large-scale document processing and retrieval applications, offering flexible cleanup modes to manage documents effectively in MongoDB vector search. To read more about how to use the Indexing API, please visit the LangChain Indexing API documentation.

We’re excited about these LangChain integrations and we hope you are too. Here are some resources to further your learning:

Check out our written and video tutorial to walk you through building your own JavaScript AI agent with LangGraph.js and MongoDB.
Experiment with hybrid search retrievers to see the power of hybrid search for yourself.
Read the previous announcement with LangChain about Semantic Caching.

September 12, 2024