Embedding and Reranking API Overview

The Embedding and Reranking API is in Preview. The feature and the corresponding documentation might change at any time during the preview period.

The Embedding and Reranking API provides programmatic access to the latest Voyage AI embedding and reranking models through a RESTful interface. This page provides an overview of the API and its features.

For detailed information and parameters, see the API specification.

API Key Management

You use MongoDB Atlas to manage API keys for the Embedding and Reranking API. This includes creating and managing your model API keys across your organization and projects, monitoring usage, and configuring rate limits.

To learn more, see Model API Keys.

Note

The key is called a model API key to distinguish it from other API keys in Atlas. You use it the same way as an API key from any other model provider.

Authentication

All requests to the Embedding and Reranking API must include an Authorization header with your model API key using the Bearer token format.

Authorization: Bearer VOYAGE_API_KEY

When you use a client SDK, you set the API key when constructing a client, and the SDK sends the header on your behalf with every request. When you integrate directly with the API, you must send this header yourself.
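
If you integrate directly with the API, the header can be built as in the following minimal Python sketch. The helper name auth_headers and the "example-key" fallback are hypothetical and only for illustration; the header format and the VOYAGE_API_KEY variable name come from this page.

```python
import os


def auth_headers(api_key: str) -> dict[str, str]:
    """Return the headers for an authenticated JSON request."""
    if not api_key:
        raise ValueError("a model API key is required")
    return {
        # Bearer token format, as described in the Authentication section.
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }


# Reading the key from the environment keeps it out of source code.
# The "example-key" fallback is a placeholder, not a working key.
headers = auth_headers(os.environ.get("VOYAGE_API_KEY", "example-key"))
```

Pass the resulting dictionary as the headers of every request that your HTTP client sends.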

JSON

All entities are represented in JSON. The following rules and conventions apply:

Content Type Request Header: When you send JSON to the server with a POST request, specify the Content-Type: application/json header. Client SDKs handle this automatically.
Invalid Requests: If you attempt to create a request with invalid JSON, incorrect data types, or constraint violations (such as exceeding token limits or batch sizes), the server responds with a 400 status code and an error message describing the issue.
Field Names for Fields with Numbers: Fields that contain numeric values are named to indicate what they measure. For example, total_tokens reports a token count, and output_dimension specifies the number of dimensions in the output embedding.

Rate Limits and Usage Tiers

The Embedding and Reranking API implements rate limiting to ensure fair usage and optimal performance. Your rate limits increase as you advance through usage tiers. Rate limits are applied per API key and measured in two dimensions:

TPM (Tokens Per Minute): Maximum number of tokens processed per minute
RPM (Requests Per Minute): Maximum number of API requests per minute

If you exceed the rate limit, the API returns a 429 (Rate Limit Exceeded) HTTP status code.
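
One common way to handle a 429 response is to retry with exponential backoff. The following sketch uses only the Python standard library. The function name send_with_backoff, the attempt count, and the delays are assumptions chosen for illustration, not values recommended by the API.

```python
import time
import urllib.error
import urllib.request


def send_with_backoff(request, max_attempts=5, base_delay=1.0,
                      opener=urllib.request.urlopen, sleep=time.sleep):
    """Send a request, retrying with exponential backoff on HTTP 429.

    `opener` and `sleep` are injectable so the logic can be tested
    without network access or real waiting.
    """
    for attempt in range(max_attempts):
        try:
            with opener(request) as response:
                return response.read()
        except urllib.error.HTTPError as err:
            # Retry only on rate limiting; re-raise other errors and
            # give up after the final attempt.
            if err.code != 429 or attempt == max_attempts - 1:
                raise
            # Wait 1s, 2s, 4s, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)
```

Errors other than 429, such as a 400 for an invalid request, are raised immediately because retrying them unchanged would not help.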

Without a payment method, free trial rate limits are 3 RPM and 10,000 TPM. To qualify for higher rate limits, add a payment method to your account.

The rate limits for Usage Tier 1 are as follows.

Model	Tokens Per Min (TPM)	Requests Per Min (RPM)
`voyage-4-lite`, `voyage-3.5-lite`	16,000,000	2,000
`voyage-4`, `voyage-3.5`	8,000,000	2,000
`voyage-4-large`	3,000,000	2,000
`voyage-3-large`, `voyage-context-3`, `voyage-code-3`, `voyage-code-2`, `voyage-law-2`, `voyage-finance-2`	3,000,000	2,000
`voyage-multimodal-3.5`, `voyage-multimodal-3`	2,000,000	2,000
`rerank-2-lite`, `rerank-2.5-lite`	4,000,000	2,000
`rerank-2`, `rerank-2.5`	2,000,000	2,000

The rate limits for Usage Tier 2 are twice those of Usage Tier 1.

Model	Tokens Per Min (TPM)	Requests Per Min (RPM)
`voyage-4-lite`, `voyage-3.5-lite`	32,000,000	4,000
`voyage-4`, `voyage-3.5`	16,000,000	4,000
`voyage-4-large`	6,000,000	4,000
`voyage-3-large`, `voyage-context-3`, `voyage-code-3`, `voyage-code-2`, `voyage-law-2`, `voyage-finance-2`	6,000,000	4,000
`voyage-multimodal-3.5`, `voyage-multimodal-3`	4,000,000	4,000
`rerank-2-lite`, `rerank-2.5-lite`	8,000,000	4,000
`rerank-2`, `rerank-2.5`	4,000,000	4,000

The rate limits for Usage Tier 3 are three times those of Usage Tier 1.

Model	Tokens Per Min (TPM)	Requests Per Min (RPM)
`voyage-4-lite`, `voyage-3.5-lite`	48,000,000	6,000
`voyage-4`, `voyage-3.5`	24,000,000	6,000
`voyage-4-large`	9,000,000	6,000
`voyage-3-large`, `voyage-context-3`, `voyage-code-3`, `voyage-code-2`, `voyage-law-2`, `voyage-finance-2`	9,000,000	6,000
`voyage-multimodal-3.5`, `voyage-multimodal-3`	6,000,000	6,000
`rerank-2-lite`, `rerank-2.5-lite`	12,000,000	6,000
`rerank-2`, `rerank-2.5`	6,000,000	6,000
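
The relationship between tiers is simple multiplication, which the following sketch works through. It takes the first table as the Usage Tier 1 baseline (an assumption based on the multiples stated above) and copies only a few rows from it. The names TIER_1_LIMITS and limits_for_tier are hypothetical.

```python
# Baseline limits copied from the first table: model -> (TPM, RPM).
# Only a few models are listed here to keep the example short.
TIER_1_LIMITS = {
    "voyage-4-lite": (16_000_000, 2_000),
    "voyage-4": (8_000_000, 2_000),
    "voyage-4-large": (3_000_000, 2_000),
    "rerank-2.5": (2_000_000, 2_000),
}


def limits_for_tier(model: str, tier: int) -> tuple[int, int]:
    """Return (TPM, RPM) for a model at Usage Tier 1, 2, or 3."""
    if tier not in (1, 2, 3):
        raise ValueError("this page documents Usage Tiers 1 to 3 only")
    tpm, rpm = TIER_1_LIMITS[model]
    # Tier 2 is twice the baseline and Tier 3 is three times the baseline,
    # so the multiplier equals the tier number.
    return tpm * tier, rpm * tier


print(limits_for_tier("voyage-4", 3))  # (24000000, 6000), matching the Tier 3 table
```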

To learn more about usage tiers, see Usage Tiers.

To set custom rate limits for your organization, use the Atlas UI. To learn more, see Manage Rate Limits.

Making Requests

The following example demonstrates how you can use cURL to make a request to the embedding service. You can also use an HTTP client in any programming language to access the API.

For additional usage examples, see the following resources:

Accessing Voyage AI Models for HTTP request and client SDK examples.
Model pages for model-specific usage.
API specification for full details on all API endpoints.

curl \
  --request POST 'https://ai.mongodb.com/v1/embeddings' \
  --header "Authorization: Bearer $VOYAGE_API_KEY" \
  --header "Content-Type: application/json" \
  --data '{
    "input": [
      "MongoDB is redefining what a database is in the AI era.",
      "Voyage AI embedding and reranking models are state-of-the-art."
    ],
    "model": "voyage-4-large"
  }'
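
The same request can be made from Python without a client SDK. The following sketch uses only the standard library and mirrors the cURL command above. The function names build_embeddings_request and fetch_embeddings are hypothetical, and the sketch makes no assumption about the shape of the response beyond it being JSON.

```python
import json
import urllib.request

# Endpoint taken from the cURL example.
API_URL = "https://ai.mongodb.com/v1/embeddings"


def build_embeddings_request(texts, model, api_key):
    """Build the POST request that the cURL example sends."""
    body = json.dumps({"input": texts, "model": model}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def fetch_embeddings(request):
    """Send the request and return the parsed JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Usage (requires network access and a valid model API key):
#   import os
#   request = build_embeddings_request(
#       ["MongoDB is redefining what a database is in the AI era."],
#       "voyage-4-large",
#       os.environ["VOYAGE_API_KEY"],
#   )
#   print(fetch_embeddings(request))
```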

Errors

To learn more about errors returned by the API, see the API specification.

Best Practices

Consider the following best practices when you use the API:

Specifying Input Type

For semantic search and retrieval tasks, set the input_type to query or document to optimize how Voyage AI models create the vectors. Do not omit this parameter.

Depending on its value, the parameter prepends one of the following prompts to your input before the model generates embeddings:

query: "Represent the query for retrieving supporting documents: "
document: "Represent the document for retrieval: "

Example

input_type="query" transforms "When is Apple's conference call scheduled?" into "Represent the query for retrieving supporting documents: When is Apple's conference call scheduled?"
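
The following sketch reproduces that transformation locally so you can see the effect. It is illustrative only: the API applies the prompt for you when you set input_type, so you should not prepend it yourself. The names INPUT_TYPE_PROMPTS and with_prompt are hypothetical.

```python
# Prompts as listed above, keyed by input_type value.
INPUT_TYPE_PROMPTS = {
    "query": "Represent the query for retrieving supporting documents: ",
    "document": "Represent the document for retrieval: ",
}


def with_prompt(text: str, input_type: str) -> str:
    """Return the text as it reads after the prompt is prepended."""
    if input_type not in INPUT_TYPE_PROMPTS:
        raise ValueError("input_type must be 'query' or 'document'")
    return INPUT_TYPE_PROMPTS[input_type] + text


print(with_prompt("When is Apple's conference call scheduled?", "query"))

# In a real request, you only set the parameter in the JSON body:
payload = {
    "input": ["When is Apple's conference call scheduled?"],
    "model": "voyage-4-large",
    "input_type": "query",
}
```

Use query for the search text and document for the content being indexed, so that both sides of the retrieval task receive the matching prompt.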

Troubleshooting

If you're using the Python client, you must use version 0.3.7 or later. To check the version of your Python client installation, run the following command in your terminal:

python -c "import voyageai; print(voyageai.__version__)"
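
To enforce the minimum version in a script, you can compare version numbers as tuples of integers rather than as strings. The following sketch assumes plain dotted numeric versions such as 0.3.7 and does not handle pre-release suffixes. The names parse_version and meets_minimum are hypothetical.

```python
MINIMUM_VERSION = "0.3.7"  # minimum Python client version stated above


def parse_version(version: str) -> tuple[int, ...]:
    """Convert a dotted numeric version such as '0.3.7' to (0, 3, 7)."""
    return tuple(int(part) for part in version.split("."))


def meets_minimum(installed: str, minimum: str = MINIMUM_VERSION) -> bool:
    """Return True if the installed version is at least the minimum."""
    # Tuple comparison is numeric, so '0.10.0' correctly ranks above '0.3.7'.
    return parse_version(installed) >= parse_version(minimum)


# Usage with the installed client:
#   import voyageai
#   if not meets_minimum(voyageai.__version__):
#       raise RuntimeError("upgrade the voyageai package to 0.3.7 or later")
```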
