A reranker receives a query and a set of candidate documents, and returns those documents ranked by their relevance to the query. The candidates are typically the preliminary results from an embedding-based retrieval system; the reranker refines their order and assigns more accurate relevance scores.
Unlike embedding models, which encode queries and documents separately, rerankers are cross-encoders that jointly process each query-document pair, enabling more accurate relevance prediction. Apply a reranker to the top candidates retrieved with embedding-based search or with lexical search algorithms such as BM25 and TF-IDF.
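The sketch below illustrates this two-stage pattern: a first-stage retriever produces candidate documents, and the reranker scores each query-document pair. The endpoint URL, request fields, and response shape are assumptions for illustration only; substitute the actual rerank API and a model identifier from the tables below.

```python
import requests

# Hypothetical endpoint for illustration; replace with the real rerank API URL.
RERANK_URL = "https://api.example.com/v1/rerank"

def rerank(query: str, documents: list[str], model: str, top_n: int = 5) -> list[dict]:
    """Send a query and candidate documents to a reranker and return
    results sorted by relevance score (highest first)."""
    response = requests.post(
        RERANK_URL,
        headers={"Authorization": "Bearer YOUR_API_KEY"},
        json={
            "model": model,
            "query": query,
            "documents": documents,
            "top_n": top_n,
        },
    )
    response.raise_for_status()
    # Assumed response shape: {"results": [{"index": 0, "relevance_score": 0.97}, ...]}
    return response.json()["results"]

# Candidates from a first-stage retriever (embedding search or BM25);
# the reranker then scores each (query, document) pair jointly.
candidates = [
    "BM25 is a lexical ranking function based on term frequencies.",
    "Cross-encoders jointly encode the query and document for scoring.",
    "Embedding models encode queries and documents into separate vectors.",
]
for result in rerank("How does a cross-encoder reranker work?", candidates, model="<reranker-model>"):
    print(result["index"], result["relevance_score"])
```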
Available Models
| Model | Context Length (tokens) | Description |
|---|---|---|
| | 32,000 | Highest accuracy. Recommended for most applications. To learn more, see the blog post. |
| | 32,000 | Fast and cost-effective model optimized for latency-sensitive applications. To learn more, see the blog post. |
Our latest models outperform the legacy models below across quality, context length, latency, and throughput.
| Model | Context Length (tokens) | Description |
|---|---|---|
| | 16,000 | Our generalist second-generation reranker optimized for quality with multilingual support. To learn more, see the blog post. |
| | 8,000 | Our generalist second-generation reranker optimized for both latency and quality with multilingual support. To learn more, see the blog post. |
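The context lengths above bound how many tokens a single query-document pair can occupy. The helper below is a minimal sketch of trimming long documents to fit a model's budget, assuming a rough characters-per-token ratio; a production pipeline would count tokens with the model's actual tokenizer.

```python
def fit_to_context(query: str, documents: list[str],
                   context_length: int, chars_per_token: int = 4) -> list[str]:
    """Roughly trim each document so that query + document stays within the
    model's context length. The characters-per-token ratio is only an
    estimate; use the model's tokenizer for exact counts."""
    query_tokens = len(query) // chars_per_token + 1
    doc_budget_chars = max(context_length - query_tokens, 0) * chars_per_token
    return [doc[:doc_budget_chars] for doc in documents]

# Example: prepare candidates for a 16,000-token legacy model.
docs = ["A very long document ... " * 5000, "A short document."]
trimmed = fit_to_context("what is a cross-encoder?", docs, context_length=16_000)
```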
Tutorials
For tutorials on using rerankers, see the following resources: