Embedding and Reranking API Overview

The Embedding and Reranking API provides programmatic access to the latest Voyage AI embedding and reranking models through a RESTful interface. This page provides an overview of the API and its features.

For detailed information and parameters, see the API specification.

You use MongoDB Atlas to manage API keys for the Embedding and Reranking API. This includes creating and managing your model API keys across your organization and projects, monitoring usage, and configuring rate limits.

To learn more, see Model API Keys.

Note

The key is called a model API key to distinguish it from other API keys in Atlas. You use it the same way you use API keys from other model providers.

All requests to the Embedding and Reranking API must include an Authorization header with your model API key using the Bearer token format.

Authorization: Bearer VOYAGE_API_KEY

When you use a client SDK, you set the API key when constructing a client, and the SDK sends the header on your behalf with every request. When you integrate directly with the API, you must send this header yourself.
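For direct integration, the header can be assembled by hand. The following is a minimal Python sketch using only the standard library. The helper name build_auth_headers is hypothetical, and reading the key from a VOYAGE_API_KEY environment variable is an assumption that mirrors the placeholder shown above.

```python
import os


def build_auth_headers(api_key: str) -> dict[str, str]:
    """Return the header that a direct (non-SDK) request must carry."""
    if not api_key:
        raise ValueError("a model API key is required")
    # Bearer token format, as described above.
    return {"Authorization": f"Bearer {api_key}"}


# Assumption: the key is stored in the VOYAGE_API_KEY environment variable.
# "example-key" is a placeholder fallback so the sketch runs anywhere.
headers = build_auth_headers(os.environ.get("VOYAGE_API_KEY", "example-key"))
```

Keeping the key in the environment rather than in source code is a common choice, not a requirement stated on this page.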

All entities are represented in JSON. The following rules and conventions apply:

Content Type Request Header
When you send JSON to the server with a POST request, specify the Content-Type: application/json header. Client SDKs handle this automatically.
Invalid Requests
If you send a request with invalid JSON, incorrect data types, or constraint violations (such as exceeding token limits or batch sizes), the server responds with a 400 status code and an error message describing the issue.
Field Names for Fields with Numbers
Fields that contain numeric values are named to disambiguate the unit being used. For example, total_tokens reports a count of tokens, and output_dimension specifies a number of vector dimensions.

The Embedding and Reranking API implements rate limiting to ensure fair usage and optimal performance. Your rate limits increase as you advance through usage tiers. Rate limits are applied per API key and measured in two dimensions:

  • TPM (Tokens Per Minute): Maximum number of tokens processed per minute

  • RPM (Requests Per Minute): Maximum number of API requests per minute

If you exceed the rate limit, the API returns a 429 (Rate Limit Exceeded) HTTP status code.
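One common way to handle a 429 response, although not one this page prescribes, is to retry with exponential backoff. The following Python sketch assumes a hypothetical send callable that performs a single API request and returns a (status_code, body) pair. The function name, the default of five attempts, and the one-second base delay are all illustrative choices.

```python
import time
from typing import Callable, Tuple


def call_with_backoff(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call send() until it stops returning 429 or attempts run out.

    send is a hypothetical zero-argument callable that performs one API
    request and returns (status_code, response_body).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            # Success or a non-rate-limit error: hand it back to the caller.
            return status, body
        if attempt < max_attempts - 1:
            # Wait 1x, 2x, 4x, ... base_delay before the next attempt.
            sleep(base_delay * (2 ** attempt))
    # Still rate limited after the final attempt.
    return status, body
```

The sleep parameter exists only so the delay can be replaced in tests. In production code, adding random jitter to each delay is a frequent refinement.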

Free trial rate limits without a payment method are 3 RPM and 10K TPM. To qualify for higher rate limits, add a payment method to your account.

The following table lists the rate limits for Usage Tier 1:

| Model | Tokens Per Min (TPM) | Requests Per Min (RPM) |
| --- | --- | --- |
| voyage-4-lite, voyage-3.5-lite | 16,000,000 | 2,000 |
| voyage-4, voyage-3.5 | 8,000,000 | 2,000 |
| voyage-4-large | 3,000,000 | 2,000 |
| voyage-3-large, voyage-context-3, voyage-code-3, voyage-code-2, voyage-law-2, voyage-finance-2 | 3,000,000 | 2,000 |
| voyage-multimodal-3.5, voyage-multimodal-3 | 2,000,000 | 2,000 |
| rerank-2-lite, rerank-2.5-lite | 4,000,000 | 2,000 |
| rerank-2, rerank-2.5 | 2,000,000 | 2,000 |

The rate limits for Usage Tier 2 are twice those of Usage Tier 1.

| Model | Tokens Per Min (TPM) | Requests Per Min (RPM) |
| --- | --- | --- |
| voyage-4-lite, voyage-3.5-lite | 32,000,000 | 4,000 |
| voyage-4, voyage-3.5 | 16,000,000 | 4,000 |
| voyage-4-large | 6,000,000 | 4,000 |
| voyage-3-large, voyage-context-3, voyage-code-3, voyage-code-2, voyage-law-2, voyage-finance-2 | 6,000,000 | 4,000 |
| voyage-multimodal-3.5, voyage-multimodal-3 | 4,000,000 | 4,000 |
| rerank-2-lite, rerank-2.5-lite | 8,000,000 | 4,000 |
| rerank-2, rerank-2.5 | 4,000,000 | 4,000 |

The rate limits for Usage Tier 3 are three times those of Usage Tier 1.

| Model | Tokens Per Min (TPM) | Requests Per Min (RPM) |
| --- | --- | --- |
| voyage-4-lite, voyage-3.5-lite | 48,000,000 | 6,000 |
| voyage-4, voyage-3.5 | 24,000,000 | 6,000 |
| voyage-4-large | 9,000,000 | 6,000 |
| voyage-3-large, voyage-context-3, voyage-code-3, voyage-code-2, voyage-law-2, voyage-finance-2 | 9,000,000 | 6,000 |
| voyage-multimodal-3.5, voyage-multimodal-3 | 6,000,000 | 6,000 |
| rerank-2-lite, rerank-2.5-lite | 12,000,000 | 6,000 |
| rerank-2, rerank-2.5 | 6,000,000 | 6,000 |
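The relationship between the tiers is simple multiplication, which the following Python sketch makes explicit. The TPM values are copied from the first table above for a subset of models. The names TIER1_TPM, TIER1_RPM, TIER_MULTIPLIER, and limits_for are hypothetical and are not part of the API.

```python
# Base tokens-per-minute limits, copied from the first table above
# (a subset of models, for illustration only).
TIER1_TPM = {
    "voyage-4-lite": 16_000_000,
    "voyage-4": 8_000_000,
    "voyage-4-large": 3_000_000,
    "rerank-2.5-lite": 4_000_000,
    "rerank-2.5": 2_000_000,
}

# Every model in the first table shares the same requests-per-minute limit.
TIER1_RPM = 2_000

# Tier 2 doubles and Tier 3 triples the base limits, as stated above.
TIER_MULTIPLIER = {1: 1, 2: 2, 3: 3}


def limits_for(model: str, tier: int) -> tuple[int, int]:
    """Return (TPM, RPM) for a model at a given usage tier."""
    factor = TIER_MULTIPLIER[tier]
    return TIER1_TPM[model] * factor, TIER1_RPM * factor


print(limits_for("voyage-4-large", 3))
```

These figures are a snapshot of the tables on this page. Custom rate limits configured for an organization are not reflected in the sketch.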

To learn more about usage tiers, see Usage Tiers.

To set custom rate limits for your organization, use the Atlas UI. To learn more, see Manage Rate Limits.

The following example demonstrates how you can use cURL to make a request to the embedding service. You can also use an HTTP client in any programming language to access the API.

curl \
  --request POST 'https://ai.mongodb.com/v1/embeddings' \
  --header "Authorization: Bearer $VOYAGE_API_KEY" \
  --header "Content-Type: application/json" \
  --data '{
    "input": [
      "MongoDB is redefining what a database is in the AI era.",
      "Voyage AI embedding and reranking models are state-of-the-art."
    ],
    "model": "voyage-4-large"
  }'
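The same request can be expressed with any HTTP client. As a rough Python equivalent of the cURL command, the sketch below uses only the standard library. The function name build_embeddings_request is hypothetical, and the key is assumed to live in the VOYAGE_API_KEY environment variable. The request is built but not sent, so the snippet runs without network access.

```python
import json
import os
import urllib.request

API_URL = "https://ai.mongodb.com/v1/embeddings"


def build_embeddings_request(texts, model, api_key):
    """Build (but do not send) a POST request matching the cURL example."""
    payload = {"input": texts, "model": model}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_embeddings_request(
    [
        "MongoDB is redefining what a database is in the AI era.",
        "Voyage AI embedding and reranking models are state-of-the-art.",
    ],
    "voyage-4-large",
    # Assumption: the key is stored in this environment variable.
    os.environ.get("VOYAGE_API_KEY", ""),
)

# To send the request, uncomment the following lines:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```

With a valid key, urlopen returns the JSON response body. A 4xx or 5xx status raises urllib.error.HTTPError, which is where handling for 400 and 429 responses would go.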

To learn more about errors returned by the API, see the API specification.

Consider the following best practices when you use the API:

For semantic search and retrieval tasks, set the input_type parameter to query or document to optimize how Voyage AI models create the vectors. Do not omit this parameter.

The input_type parameter prepends the following prompts to your input before generating embeddings:

  • query: "Represent the query for retrieving supporting documents: "

  • document: "Represent the document for retrieval: "

Example

input_type="query" transforms "When is Apple's conference call scheduled?" into "Represent the query for retrieving supporting documents: When is Apple's conference call scheduled?"
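The transformation described above can be mimicked locally to see what text the model effectively embeds. The Python sketch below is for illustration only: the API applies the prompt for you when input_type is set, so you should not prepend it yourself. The names INPUT_TYPE_PROMPTS, with_prompt, and embeddings_payload are hypothetical.

```python
# Prompts quoted from the list above.
INPUT_TYPE_PROMPTS = {
    "query": "Represent the query for retrieving supporting documents: ",
    "document": "Represent the document for retrieval: ",
}


def with_prompt(text: str, input_type: str) -> str:
    """Show the text that is effectively embedded for a given input_type."""
    return INPUT_TYPE_PROMPTS[input_type] + text


def embeddings_payload(texts: list[str], model: str, input_type: str) -> dict:
    """Build a request body that sets input_type explicitly."""
    if input_type not in INPUT_TYPE_PROMPTS:
        raise ValueError("input_type must be 'query' or 'document'")
    # The raw text is sent unchanged; the API adds the prompt server-side.
    return {"input": texts, "model": model, "input_type": input_type}


print(with_prompt("When is Apple's conference call scheduled?", "query"))
```

In a retrieval pipeline, stored texts would typically be embedded with input_type set to document and incoming searches with input_type set to query.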

If you're using the Python client, you must use version 0.3.7 or later. To check the version of your Python client installation, run the following command in your terminal:

python -c "import voyageai; print(voyageai.__version__)"
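If you want to enforce the minimum version in code, compare the version numerically rather than as a string, because "0.10.0" sorts before "0.3.7" in a plain string comparison. The following Python sketch assumes the version string consists only of dot-separated integers. The helper names parse_version and meets_minimum are hypothetical.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Convert a version such as '0.3.7' into (0, 3, 7).

    Assumption: the string contains only dot-separated integers,
    with no pre-release or build suffix.
    """
    return tuple(int(part) for part in version.split("."))


def meets_minimum(installed: str, minimum: str = "0.3.7") -> bool:
    """Return True if the installed version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


print(meets_minimum("0.3.6"))
```

You could pass voyageai.__version__ as the installed argument to apply the check to your own installation.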
