A reranker receives a query and a set of candidate documents, and returns those documents ranked by their relevance to the query. The candidates are typically the preliminary results from an embedding-based retrieval system; the reranker refines their order and assigns more accurate relevance scores.
Unlike embedding models, which encode queries and documents separately, rerankers are cross-encoders that jointly process each query-document pair, enabling more accurate relevance prediction. Apply a reranker to the top candidates retrieved with embedding-based search or with lexical search algorithms such as BM25 and TF-IDF.
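The sketch below illustrates this two-stage pattern: a first-stage retriever produces candidate documents, and the reranker scores each query-document pair. The endpoint URL, request fields, and response shape are assumptions for illustration only; substitute the actual rerank API and a model identifier from the tables below.

```python
import requests

# Hypothetical endpoint for illustration; replace with the real rerank API URL.
RERANK_URL = "https://api.example.com/v1/rerank"

def rerank(query: str, documents: list[str], model: str, top_n: int = 5) -> list[dict]:
    """Send a query and candidate documents to a reranker and return
    results sorted by relevance score (highest first)."""
    response = requests.post(
        RERANK_URL,
        headers={"Authorization": "Bearer YOUR_API_KEY"},
        json={
            "model": model,
            "query": query,
            "documents": documents,
            "top_n": top_n,
        },
    )
    response.raise_for_status()
    # Assumed response shape: {"results": [{"index": 0, "relevance_score": 0.97}, ...]}
    return response.json()["results"]

# Candidates from a first-stage retriever (embedding search or BM25);
# the reranker then scores each (query, document) pair jointly.
candidates = [
    "BM25 is a lexical ranking function based on term frequencies.",
    "Cross-encoders jointly encode the query and document for scoring.",
    "Embedding models encode queries and documents into separate vectors.",
]
for result in rerank("How does a cross-encoder reranker work?", candidates, model="<reranker-model>"):
    print(result["index"], result["relevance_score"])
```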
Available Models
| Model | Context Length (tokens) | Description |
|---|---|---|
| | 32,000 | Highest accuracy. Recommended for most applications. To learn more, see the blog post. |
| | 32,000 | Fast and cost-effective model optimized for latency-sensitive applications. To learn more, see the blog post. |
Our latest models outperform the legacy models below across quality, context length, latency, and throughput.
| Model | Context Length (tokens) | Description |
|---|---|---|
| | 16,000 | Our generalist second-generation reranker optimized for quality with multilingual support. To learn more, see the blog post. |
| | 8,000 | Our generalist second-generation reranker optimized for both latency and quality with multilingual support. To learn more, see the blog post. |
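The context lengths above bound how many tokens a single query-document pair can occupy. The helper below is a minimal sketch of trimming long documents to fit a model's budget, assuming a rough characters-per-token ratio; a production pipeline would count tokens with the model's actual tokenizer.

```python
def fit_to_context(query: str, documents: list[str],
                   context_length: int, chars_per_token: int = 4) -> list[str]:
    """Roughly trim each document so that query + document stays within the
    model's context length. The characters-per-token ratio is only an
    estimate; use the model's tokenizer for exact counts."""
    query_tokens = len(query) // chars_per_token + 1
    doc_budget_chars = max(context_length - query_tokens, 0) * chars_per_token
    return [doc[:doc_budget_chars] for doc in documents]

# Example: prepare candidates for a 16,000-token legacy model.
docs = ["A very long document ... " * 5000, "A short document."]
trimmed = fit_to_context("what is a cross-encoder?", docs, context_length=16_000)
```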
Tutorials
For tutorials on using rerankers, see the following resources: