Lower-Cost Vector Retrieval with Voyage AI’s Model Options

James Gentile

Vector search is often the first step in retrieval-augmented generation (RAG) systems. In a previous post, we discussed the future of AI-powered search at MongoDB, where we're making it easier to select an embedding model so your search solution can scale. At scale, the choice of vector representation and dimensionality can have a significant impact on the cost and performance of a vector search system.

In this blog post, we discuss options to reduce the storage and compute costs of your vector search solution.

The cost of dimensionality

Many MongoDB customers use vector indexes on the order of hundreds of gigabytes. An index’s size is determined by the number of documents and the dimensionality (or count of floating-point numbers) of the vector representation that encodes a document’s semantics. For example, it would require ~500 GB to store 41M documents if the embedding model uses 3072 dimensions to represent a document. At query time, each vector similarity computation will require 3072 floating-point operations.

However, if these documents could be represented by a 512-dimensional vector, a similar index would only require 84GB of storage—a six-fold reduction in storage costs, as well as less computation at query time. In sum, the dimensionality of a document’s vector representation will directly affect the storage and retrieval costs of the system. Smaller vectors are cheaper to store, index, and query since they are represented with fewer floating-point numbers, but this may come with a tradeoff in accuracy.
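As a quick sanity check on those numbers, here is a back-of-the-envelope sizing helper. It counts only the raw float32 vector data and ignores index structures and metadata, so real-world figures will be somewhat larger:

def raw_vector_storage_gb(num_docs, dimensions, bytes_per_float=4):
    # Raw float32 storage only; real indexes add graph and metadata overhead.
    return num_docs * dimensions * bytes_per_float / 1e9

print(raw_vector_storage_gb(41_000_000, 3072))  # ~504 GB
print(raw_vector_storage_gb(41_000_000, 512))   # ~84 GB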

Put simply, an embedding model converts an input into a vector of fixed dimensions. The model was trained to ensure that vectors from similar documents are close to one another in the embedding space. The amount of storage and compute required for a vector search query is directly proportional to the dimensionality of the vector. If we can reduce the vector representation without compromising retrieval accuracy, our system can more quickly answer queries while using less storage space.

Matryoshka representation learning

So, how do we shrink vectors without losing meaning? One answer is Matryoshka Representation Learning (MRL). Instead of reducing the vector size through quantization, MRL structures the embedding vector like a stacking doll, in which smaller representations are packed inside the full vector and behave very similarly to the larger representation. This means we can select the level of fidelity we would like to use within our system, because the similarity between lower-dimensional vectors approximates the similarity of their full-fidelity counterparts. With MRL representations, we can find the right balance of storage, compute, and accuracy.

Figure 1. Visualization of Matryoshka Representation Learning.

To use MRL, we must first select an embedding model that was trained for it. When training with MRL, an additional term is added to the loss function to ensure that similarities between lower-dimensional representations approximate those of their full-fidelity counterparts. Voyage AI’s latest text embedding models—voyage-3-large, voyage-3.5, and voyage-3.5-lite—are trained with MRL terms and allow the user to specify an output dimension of 256, 512, 1024, or 2048. We can use the output_dimension parameter to specify which representation we want to consider.
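For example, a request for 512-dimensional embeddings with the voyageai Python client might look like the following. This is a minimal sketch: it assumes a recent client version that supports output_dimension and a VOYAGE_API_KEY set in the environment, and the sample text is a placeholder.

import voyageai

vo = voyageai.Client()  # reads the VOYAGE_API_KEY environment variable

# Request 512-dimensional MRL embeddings directly from the model.
result = vo.embed(
    ["MongoDB Atlas supports approximate nearest neighbor vector search."],
    model="voyage-3.5",
    input_type="document",
    output_dimension=512,
)
doc_vector = result.embeddings[0]  # a list of 512 floats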

Let’s see how the similarities among shorter vectors can approximate the similarities among full-fidelity vectors with voyage-3.5:

import math

def cosine_similarity(v1, v2):
    # Cosine similarity between two equal-length vectors.
    dot_product = 0.0
    magnitude_v1 = 0.0
    magnitude_v2 = 0.0
    for a, b in zip(v1, v2):
        dot_product  += a * b
        magnitude_v1 += a ** 2
        magnitude_v2 += b ** 2
    return dot_product / (math.sqrt(magnitude_v1) * math.sqrt(magnitude_v2))

# Calculating cosine similarities at each MRL prefix length.
# query, relevant_doc, and non_relevant_doc are full 2048-dimensional
# voyage-3.5 embeddings; slicing keeps the first N dimensions.
cosine_similarity(query[:256], relevant_doc[:256])
cosine_similarity(query[:256], non_relevant_doc[:256])

cosine_similarity(query[:512], relevant_doc[:512])
cosine_similarity(query[:512], non_relevant_doc[:512])

cosine_similarity(query[:1024], relevant_doc[:1024])
cosine_similarity(query[:1024], non_relevant_doc[:1024])

cosine_similarity(query, relevant_doc)
cosine_similarity(query, non_relevant_doc)

We’ll compare three vectors: a query, a relevant document, and a non-relevant document. Cosine similarity measures the alignment between two vectors: a score of 1.0 means the vectors point in the same direction in the embedding space, and a score of 0.0 means they are orthogonal. We expect the cosine similarity between the query and the relevant document to be higher than the similarity between the query and the non-relevant document.

Let’s see if that’s the case:

Table 1. Similarity scores under several MRL sub-dimensions.
Dimensions    query / relevant_doc    query / non_relevant_doc
256           0.728                   0.386
512           0.704                   0.346
1024          0.697                   0.340
2048          0.702                   0.340

Note: at 512 dimensions—25% of the full vector length—the similarities are nearly identical to the full 2048-dimensional scores.

The full-fidelity similarity scores can be approximated well with the 256, 512, and 1024 dimension vectors, so using all 2048 dimensions may not be necessary. For example, the full fidelity scores, 0.702 and 0.340, are close to the cosine similarities of the 512-dimensional representations, 0.704 and 0.346. This suggests that indexes built using the shorter vectors will have similar performance to indexes that use the 2048-dimensional vectors.

Next, we load the embeddings into MongoDB vector search collections. We will generate four vector search indexes, each with a different MRL configuration, and measure the retrieval performance on the dataset’s queries for each index (for existing vectors, we can build MRL indexes with Views). We will examine the normalized discounted cumulative gain (NDCG) and the mean reciprocal rank (MRR).
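As an illustration (not the exact setup used for this benchmark), the sketch below shows how a 512-dimensional index and query might look with pymongo 4.7+ against an Atlas cluster that supports vector search. The collection, field, and index names (documents, embedding, vector_index_512) are placeholders.

from pymongo import MongoClient
from pymongo.operations import SearchIndexModel

client = MongoClient("<your-atlas-connection-string>")
collection = client["rag_demo"]["documents"]

# Define a vector search index over 512-dimensional MRL embeddings.
index_model = SearchIndexModel(
    definition={
        "fields": [
            {
                "type": "vector",
                "path": "embedding",
                "numDimensions": 512,
                "similarity": "cosine",
            }
        ]
    },
    name="vector_index_512",
    type="vectorSearch",
)
collection.create_search_index(model=index_model)

# Query the index with a 512-dimensional query vector.
pipeline = [
    {
        "$vectorSearch": {
            "index": "vector_index_512",
            "path": "embedding",
            "queryVector": query_vector_512,  # first 512 dimensions of the query embedding
            "numCandidates": 100,
            "limit": 10,
        }
    },
    {"$project": {"title": 1, "score": {"$meta": "vectorSearchScore"}}},
]
results = list(collection.aggregate(pipeline))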

The results are below:

Table 2. Retrieval accuracy and storage costs at each MRL dimension.

Dimensions   NDCG@10   MRR@10   Relative Performance   Storage for 100M Vectors
256          0.703     0.653    0.963                  102 GB
512          0.721     0.672    0.987                  205 GB
1024         0.729     0.681    0.998                  410 GB
2048         0.730     0.682    1.000                  820 GB
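
For reference, both metrics can be computed per query from a ranked result list. Below is a minimal binary-relevance sketch (the actual evaluation tooling may differ); scores are then averaged over all queries in the dataset.

import math

def mrr_at_k(ranked_ids, relevant_ids, k=10):
    # Reciprocal rank of the first relevant document in the top k, else 0.
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0

def ndcg_at_k(ranked_ids, relevant_ids, k=10):
    # Binary-relevance NDCG: gain of 1 for relevant documents, 0 otherwise.
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc_id in enumerate(ranked_ids[:k], start=1)
        if doc_id in relevant_ids
    )
    ideal_hits = min(len(relevant_ids), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0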

We can then analyze the plot of relative accuracy versus storage costs:

Figure 2. Retrieval accuracy versus storage costs.

The results indicate that for this corpus, we can represent documents with vectors of 512 dimensions, as the system provides retrieval accuracies comparable to those of higher-dimensional vectors, while achieving a significant reduction in storage and compute costs.

This choice dramatically cuts the storage and compute required for vector retrieval: the system achieves ~99% relative performance at a quarter of the storage and compute cost, delivering the best retrieval quality for each dollar spent on storage.

Faster and cheaper search

This blog post demonstrates how we can easily assess shorter vector representations to reduce the cost of our vector search systems using the MRL output-dimension options exposed in Voyage AI’s models. Using retrieval quality analysis tools, we found that a vector 25% the length of the full-fidelity representation is suitable for our use case, making the system both less expensive and faster.

MRL options enable our customers to select the optimal representation for their data. Evaluating new vector search options can lead to improved overall system performance. We’re continuing to make it easy to tune vector search solutions and will be releasing additional features to tune and measure search system performance.

For more information about Voyage AI's models, check out the Voyage AI documentation page.

Join our MongoDB Community to learn about upcoming events, hear stories from MongoDB users, and connect with community members from around the world.