Retrieval Augmented Generation (RAG): The Open-Book Test for Gen AI

Steve Jurczak
October 26, 2023 | Updated: April 3, 2024
#genAI

The release of ChatGPT in November 2022 marked a groundbreaking moment for AI, introducing the world to an entirely new realm of possibilities created by generative AI built on machine learning foundation models such as large language models (LLMs). To truly unlock the power of LLMs, organizations need not only to access innovative commercial and open-source models but also to supply them with high-quality, up-to-date internal data. By grounding model responses in a mix of proprietary and public data, organizations can expect more accurate and relevant LLM responses that better reflect what's happening at the moment.

One of the most effective ways to do this today is retrieval-augmented generation (RAG), a powerful approach in natural language processing (NLP) that combines information retrieval with text generation. Most people are familiar by now with prompt engineering, which augments a prompt with instructions that direct the LLM to answer in a certain way. RAG goes a step further: it augments the prompt with relevant proprietary or up-to-date data retrieved at query time, so the LLM answers based on that contextual data. The retrieved information serves as the basis for generating coherent, contextually relevant text, which allows AI models to provide more accurate, informative, and context-aware responses to queries or prompts.
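To make the distinction concrete, here is a minimal Python sketch, not taken from the original article, of the two kinds of prompts. The function names, instruction wording, and refund-policy snippet are hypothetical. In a real RAG system the snippets would come from a retrieval step rather than a hard-coded list, and the resulting string would be sent to whichever LLM you use.

```python
def engineered_prompt(question: str) -> str:
    """Prompt engineering: instructions shape the answer, but no external data is added."""
    return (
        "You are a support assistant. Answer in at most two sentences.\n"
        f"Question: {question}"
    )


def rag_prompt(question: str, retrieved: list[str]) -> str:
    """RAG: the same instructions plus retrieved context the model is told to rely on."""
    context = "\n".join(f"[{i}] {text}" for i, text in enumerate(retrieved, start=1))
    return (
        "You are a support assistant. Answer in at most two sentences.\n"
        "Base your answer on the numbered context below; if it does not contain "
        "the answer, say so.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )


# Hypothetical proprietary snippet that the base model could not have seen in training.
snippets = ["Refunds are issued within 14 days of receiving a returned item."]
print(rag_prompt("How long do refunds take?", snippets))
```

Only the second prompt gives the model facts it could not otherwise know. The instruction to say so when the context lacks an answer is an optional, commonly used guard against hallucination.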

Check out our AI resource page to learn more about building AI-powered apps with MongoDB.

Applying retrieval-augmented generation (RAG) in the real world

Let's use a stock quote as an example of how retrieval-augmented generation helps in a real-world scenario. LLMs aren't trained on recent data such as today's stock prices, so when asked, an LLM may hallucinate and make up an answer or deflect from the question entirely. With retrieval-augmented generation, you first fetch the latest news snippets from a database that contains current stock news (often using vector embeddings in a vector database or with MongoDB Atlas Vector Search). Then you insert, or "augment," these snippets into the LLM prompt. Finally, you instruct the LLM to reference the up-to-date stock news when answering the question. Because RAG requires no retraining of the LLM, new information is usable as soon as it lands in the database, and the retrieval step itself is very fast (sub-100 ms latency), which makes RAG well suited to real-time applications.
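The three steps above (retrieve, augment, instruct) might look like the following Python sketch. It builds an aggregation pipeline that uses the Atlas Vector Search `$vectorSearch` stage and then assembles the augmented prompt. The collection layout (fields `ticker`, `published`, `text`, `embedding`), the index name `news_vector_index`, and the example values are hypothetical assumptions. The index would need to map `embedding` as a vector field and `ticker` as a filter field, and the query vector must come from the same embedding model used for the stored documents.

```python
# Hypothetical index and field names; adjust to your own collection and index definition.
INDEX_NAME = "news_vector_index"


def news_search_pipeline(query_vector: list[float], ticker: str, k: int = 5) -> list[dict]:
    """Step 1 (retrieve): an aggregation pipeline using the $vectorSearch stage."""
    return [
        {
            "$vectorSearch": {
                "index": INDEX_NAME,
                "path": "embedding",
                "queryVector": query_vector,
                "numCandidates": k * 20,  # over-fetch candidates to improve approximate-search recall
                "limit": k,
                "filter": {"ticker": {"$eq": ticker}},  # requires "ticker" indexed as a filter field
            }
        },
        {"$project": {"_id": 0, "published": 1, "text": 1,
                      "score": {"$meta": "vectorSearchScore"}}},
    ]


def augment_prompt(question: str, snippets: list[dict]) -> str:
    """Steps 2 and 3 (augment, instruct): insert the snippets and tell the LLM to use them."""
    news = "\n".join(f"- ({s['published']}) {s['text']}" for s in snippets)
    return (
        "Answer the question using the latest stock news below. "
        "Base your answer on this news rather than on prior knowledge.\n"
        f"Latest news:\n{news}\n"
        f"Question: {question}"
    )


# With a live cluster, step 1 would run through PyMongo, for example:
#   docs = list(collection.aggregate(news_search_pipeline(embed(question), "MDB")))
# where embed() is a hypothetical wrapper around your embedding model.
example = [{"published": "2024-04-03", "text": "Placeholder headline about the company."}]
print(augment_prompt("What is the latest news on this stock?", example))
```

Keeping retrieval as an ordinary pipeline means a single query can combine vector similarity with regular filters such as the ticker, and fresh news is searchable as soon as it is written, with no model retraining.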

Another common application of retrieval-augmented generation is in chatbots and question-answering systems. When a user asks a question, the system uses its retrieval mechanism to gather relevant information from a large dataset and then generates a natural language response that incorporates the retrieved facts.
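For a question-answering system, the retrieval step ranks stored passages by similarity to the user's question. The minimal Python sketch below does this in memory with cosine similarity. The knowledge base, the three-dimensional toy embeddings, and the `generate` callback (standing in for an LLM call) are all hypothetical; a production system would use real embedding vectors and a vector index such as Atlas Vector Search instead of scanning every entry.

```python
import math
from typing import Callable

# Hypothetical knowledge base. Real embeddings come from an embedding model and have
# hundreds of dimensions; these 3-d vectors are toy values for illustration only.
KNOWLEDGE_BASE = [
    {"text": "The support desk is open 9am to 5pm on weekdays.", "embedding": [0.9, 0.1, 0.0]},
    {"text": "Premium plans include 24/7 phone support.", "embedding": [0.7, 0.3, 0.1]},
    {"text": "Passwords can be reset from the login page.", "embedding": [0.0, 0.2, 0.9]},
]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of the vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def retrieve(query_embedding: list[float], k: int = 2) -> list[str]:
    """Return the texts of the k entries most similar to the query (brute-force scan)."""
    ranked = sorted(
        KNOWLEDGE_BASE,
        key=lambda entry: cosine(query_embedding, entry["embedding"]),
        reverse=True,
    )
    return [entry["text"] for entry in ranked[:k]]


def answer(question: str, query_embedding: list[float], generate: Callable[[str], str]) -> str:
    """Retrieve facts, put them in the prompt, and let the (hypothetical) LLM respond."""
    facts = retrieve(query_embedding)
    prompt = (
        "Answer the question using these facts:\n"
        + "\n".join(f"- {fact}" for fact in facts)
        + f"\nQuestion: {question}"
    )
    return generate(prompt)


# A stub "LLM" that echoes its prompt, so the example runs without any API.
print(answer("When can I reach support?", [1.0, 0.0, 0.0], generate=lambda p: p))
```

Swapping the linear scan for an indexed vector search changes only `retrieve`; the augment-then-generate step stays the same.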

RAG vs. fine-tuning

Users quickly run into the limits of GenAI whenever a question requires information that sits outside the LLM's training corpus; the result is hallucination, inaccuracy, or deflection. RAG fills the gaps in knowledge the LLM wasn't trained on, essentially turning question answering into an “open-book quiz,” which is simpler than an open-ended, unbounded question-answering task.

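Fine-tuning is another way to augment LLMs with custom data, but unlike RAG, it changes the model itself, which is like giving it entirely new memories (or a lobotomy). It's also time- and resource-intensive and generally not viable for grounding LLMs in a specific context. It is especially unsuitable for highly volatile, time-sensitive information and personal data: every change to the data would require another round of training, and once information is absorbed into the model's weights it cannot easily be updated or removed.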

Conclusion

Retrieval-augmented generation can improve the quality of generated text by grounding it in relevant, contextual, real-world knowledge. It also helps when the AI model needs information it wasn't trained on, which makes it particularly useful for tasks that demand factual accuracy, such as research, customer support, or content generation. By applying RAG to your own proprietary data, you can better serve your current customers and gain a significant competitive edge through reliable, relevant, and accurate AI-generated output.

To learn more about how Atlas helps organizations integrate and operationalize GenAI and LLM data, download our white paper, Embedding Generative AI and Advanced Search into your Apps with MongoDB. If you're interested in leveraging generative AI at your organization, reach out to us today and find out how we can help your digital transformation.
