rerank-2.5 and rerank-2.5-lite: Instruction-Following Rerankers
August 11, 2025 | Updated: September 9, 2025
Note to readers: rerank-2.5 and rerank-2.5-lite are available through the Voyage AI APIs directly. For access, sign up for Voyage AI.
TL;DR – We are excited to introduce the rerank-2.5 series, which significantly improves upon rerank-2’s performance while also introducing instruction-following capabilities for the first time. On our standard suite of 93 retrieval datasets spanning multiple domains, rerank-2.5 and rerank-2.5-lite improve retrieval accuracy over Cohere Rerank v3.5 by 7.94% and 7.16%, respectively.
Furthermore, the new instruction-following feature allows users to steer the model’s output relevance scores using natural language. On the Massive Instructed Retrieval Benchmark (MAIR), rerank-2.5 and rerank-2.5-lite outperform Cohere Rerank v3.5 by 12.70% and 10.36%, respectively, and by similar margins on our in-house evaluation datasets.
Both models now support a 32K token context length – 8x that of Cohere Rerank v3.5 and double that of rerank-2 – enabling more accurate retrieval across longer documents.
Rerankers are a critical component in sophisticated retrieval systems, refining initial search results to deliver superior accuracy. Today, we are excited to announce rerank-2.5 and rerank-2.5-lite. Both models outperform LLMs used as rerankers – a topic we will explore in more depth in an upcoming blog post. These models are the product of an improved mixture of training data and advanced distillation from our larger, in-house instruction-following models.
Both rerank-2.5 and rerank-2.5-lite now support a 32K token context length, an 8x increase over Cohere Rerank v3.5. This allows for the reranking of much longer documents without truncation and comes with no change in pricing.
For an introduction to rerankers, check out our previous post.
Instruction-following capability
A key feature of the rerank-2.5 series is its instruction-following capability. This allows users to dynamically steer the reranking process by providing explicit instructions alongside their query. These instructions can define the user’s notion of relevance or specify the desired characteristics of the documents to be retrieved. Leveraging the new instruction-following capability is straightforward. Users can simply append or prepend natural language instructions to their queries. The model is designed to understand these instructions and adjust the output relevance score accordingly.
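As a concrete illustration, below is a minimal sketch of passing an instruction together with a query through the Voyage AI Python client. The instruction wording, the prepending format, and the example documents are illustrative assumptions rather than a prescribed pattern; consult the docs for the recommended usage in your application.

```python
import voyageai

# Assumes the VOYAGE_API_KEY environment variable is set.
vo = voyageai.Client()

documents = [
    "The Jaguar XF is a mid-size luxury sedan with a turbocharged engine.",
    "Jaguars are large cats native to the Americas, known for their powerful bite.",
    "Our dealership offers certified pre-owned Jaguar models with extended warranties.",
]

# Prepend a natural-language instruction to steer the relevance scores
# toward the e-commerce (car) interpretation of "Jaguar".
instruction = "This is an e-commerce application about cars."
query = f"{instruction} Query: Jaguar reviews"

reranking = vo.rerank(
    query=query,
    documents=documents,
    model="rerank-2.5",  # or "rerank-2.5-lite"
    top_k=2,
)

for result in reranking.results:
    print(f"{result.relevance_score:.3f}  {result.document}")
```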
Examples of instructions – Instructions can include, but are not limited to, the following:
- Emphasizing document components: Specify which parts of a document are most important. For an application that searches academic papers, a standing instruction could be “Prioritize the title and ignore the abstract” to consistently surface the most relevant research based on titles.
- Defining document types: Direct the reranker to retrieve a specific type of document. For example, a legal research tool handling queries such as “legal implications of AI” could be configured with the standing instruction “Retrieve regulatory documents and legal statutes, not court cases” to ensure that statutory law is prioritized over case law.
- Disambiguating queries with context: Provide complementary information so that ambiguous queries can be clarified. For example, an instruction could be “This is an e-commerce application about cars” so that the word “Jaguar” is interpreted as the car brand rather than the animal.
Concrete examples of instructions and the impact of instructions on search results are available in Appendix A.
Accuracy gains from instruction following: The instruction-following feature is particularly useful for search and retrieval tasks where user intent can be nuanced. To demonstrate this, we built an in-house evaluation suite of 24 domain-specific instruction-following datasets across 7 domains (web, tech, legal, finance, conversational, medical, and code). On this domain-specific data, the accuracy of rerank-2.5 and rerank-2.5-lite increases by an average of 8.13% and 7.55%, respectively, when leveraging instructions.

Evaluation details
Datasets: For standard results without instruction following, we conducted an evaluation across 9 domains: technical documentation, code, law, finance, web reviews, multilingual, long documents, medical, and conversations. The multilingual domain is composed of 51 datasets from 31 languages. Detailed information about each of the domains and languages can be found in the rerank-2 release blog.
To evaluate instruction-following capabilities, we utilize a set of in-house domain-specific and real-world instruction-following datasets (detailed in the previous section) as well as the MAIR (Massive Instructed Retrieval) benchmark, an academic benchmark with task-specific instructions in domains such as web, legal, and biomedical search.
Method and metrics: We evaluate the retrieval quality of various rerankers on top of four first-stage search methods: (1) lexical search with BM25, (2) OpenAI v3 large (text-embedding-3-large), (3) voyage-3-large, and (4) voyage-3.5. For each query, the first-stage method retrieves up to 100 candidate documents. The reranker then re-orders these documents, and we retrieve the top 10. We report the normalized discounted cumulative gain (NDCG@10), the standard metric for retrieval quality.
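For reference, here is a small, generic sketch of how NDCG@10 can be computed from graded relevance labels. It uses the common linear-gain formulation and normalizes by the ideal ordering of the judged candidates; it is a simplification for illustration, not our exact evaluation harness.

```python
import math

def dcg_at_k(relevances, k=10):
    # Discounted cumulative gain: gains are discounted by log2 of the rank position.
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(ranked_relevances, k=10):
    # Normalize by the DCG of the ideal (best possible) ordering of the same labels.
    ideal_dcg = dcg_at_k(sorted(ranked_relevances, reverse=True), k)
    return dcg_at_k(ranked_relevances, k) / ideal_dcg if ideal_dcg > 0 else 0.0

# Graded relevance labels of the reranked top-10 documents, in rank order.
print(round(ndcg_at_k([3, 2, 3, 0, 1, 2, 0, 0, 1, 0], k=10), 4))
```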
Baselines: We compare our models against rerank-2-lite, rerank-2, Cohere Rerank v3.5, and Qwen3-Reranker-8B.
Results
rerank-2.5 and rerank-2.5-lite collectively set a new cost-to-performance frontier. Specifically, rerank-2.5 outperforms rerank-2 by 1.85% at the same price per token, while rerank-2.5-lite outperforms rerank-2-lite by 3.40% at the same price per token. Furthermore, rerank-2.5-lite outperforms Qwen3-Reranker-8B, the best open-source reranker, despite being more than an order of magnitude smaller.

Real-world instruction following: In addition to the 24 domain-specific instruction-following datasets, we also curated 3 instruction-following datasets from real-world applications. Evaluation on these datasets shows that the accuracy of rerank-2.5 and rerank-2.5-lite increases by an average of 11.48% and 7.83%, respectively, when leveraging instructions.

Results without instruction following: The first bar chart below shows the average accuracy of each reranker when evaluated across 9 domains without instruction following. rerank-2.5 and rerank-2.5-lite consistently emerge as the top-performing rerankers, regardless of the first-stage retrieval method used. This is not the case for Cohere Rerank v3.5, which hurts retrieval quality when applied on top of voyage-3-large (the most powerful first-stage retrieval method). In particular:
- Averaged across the four first-stage retrieval methods, rerank-2.5 outperforms Cohere Rerank v3.5, Qwen3-Reranker-8B, and rerank-2 by 7.94%, 2.25%, and 1.85%, respectively.
- rerank-2.5-lite, while optimized for latency, still outperforms Cohere Rerank v3.5, Qwen3-Reranker-8B, and rerank-2 by 7.16%, 1.47%, and 1.08%, respectively.
- Both rerank-2.5 and rerank-2.5-lite provide a significant quality improvement on top of all first-stage retrieval results.

The bar charts below illustrate NDCG@10 across different languages. Both rerank-2.5 and rerank-2.5-lite consistently increase performance across the board for all languages and first-stage retrieval methods. Specifically:
- Averaged across the four first-stage retrieval methods, rerank-2.5 outperforms Cohere Rerank v3.5, Qwen3-Reranker-8B, and rerank-2 by 3.26%, 2.34%, and 1.35%, respectively.
- Likewise, rerank-2.5-lite outperforms Cohere Rerank v3.5, Qwen3-Reranker-8B, and rerank-2-lite by 1.93%, 1.01%, and 2.70%, respectively.

Detailed domain-specific and multilingual results using BM25, voyage-3-large, and voyage-3.5 as first-stage retrieval methods can be found in Appendix B.
MAIR benchmark - The figures below illustrate the accuracy gains attained by rerank-2.5 and rerank-2.5-lite on MAIR. Both rerank-2.5 and rerank-2.5-lite consistently improve atop all first-stage search results. Specifically:
- rerank-2.5 outperforms Cohere Rerank v3.5 and rerank-2 by an average of 12.70% and 4.90%, respectively, when evaluated atop the four first-stage retrieval methods.
- rerank-2.5-lite outperforms Cohere Rerank v3.5 and rerank-2 by an average of 10.36% and 2.57%, respectively, when evaluated atop the four first-stage retrieval methods.

Detailed results: Numeric results for all evaluations are available in this spreadsheet.
Try rerank-2.5 and rerank-2.5-lite today!
Both rerank-2.5 and rerank-2.5-lite are available today with flexible, token-based pricing. For existing rerank-2 and rerank-2-lite users, we recommend upgrading to rerank-2.5 and rerank-2.5-lite, respectively. This upgrade provides better quality and double the context length at the same cost. We will continue to offer the rerank-2 series for existing users who do not wish to upgrade to rerank-2.5.
For new users, head over to our docs to get started and learn more; the first 200M tokens are free. As our results show, combining Voyage embedding models with Voyage rerankers delivers the highest possible retrieval accuracy.
Appendix A – Examples of instruction following
| Query & Instruction | Model Results (Top-1 Document) |
|---|---|
| Query: Who is at highest risk for Hand-foot-and-mouth disease? Instruction: Focus on age-related risk factors and the most common age group. | rerank-2 (No Instruction): Children are at the highest risk of getting the disease. rerank-2.5 (With Instruction): The most important risk factor is age. The infection occurs most often in children under age 10, but can be seen in adolescents and occasionally adults. The outbreaks occur most often in the summer and early fall. |
| Query: what does unlock my device mean? Instruction: I am an international business person and need to unlock my work phone for use with multiple carriers. My search should prioritize the implications of unlocking a device provided by my employer, focusing on adherence to my company’s BYOD policies and the impact on global connectivity. | rerank-2 (No Instruction): The term “unlock my device” typically means making a mobile phone or tablet work with a different service provider’s network… It’s like having a key to open up your selection of providers. rerank-2.5 (With Instruction): Gaining access to other network services on a phone initially configured for a specific telecommunications company… For a professional who travels across borders on business transactions, this process allows seamless switching to local service providers, ensuring constant connection to corporate networks and clients, given that this does not contravene any pre-established protocols… |
| Query: Why are historical prices of stocks different on different websites? Which one should I believe? Instruction: Explain the process and challenges of collecting and reconciling historical stock price data across different financial websites. | rerank-2 (No Instruction): I still can’t understand why there is a price discrepancy. There isn’t. It’s the same stock and price differences between such major exchanges will always be minimal… rerank-2.5 (With Instruction): The cause of incomplete/inaccurate financial data’s appearing on free sites is that it is both complicated and expensive to obtain and parse these data. Even within a single country, different pieces of financial data are handled by different authorities… There are some companies (e.g. Bloomberg) whose entire business model is to do the above… |
Appendix B – Figures for domain-specific and multilingual results
The figures below show results on domain-specific datasets without instructions for BM25, voyage-3-large, and voyage-3.5 as the first-stage retrieval method, respectively:

The figures below show results on multilingual datasets when using BM25, voyage-3-large, and voyage-3.5 as the first-stage retrieval method, respectively:
