Rate limits are restrictions on the frequency and number of tokens you can request from Voyage AI within a specified period of time. To learn more about rate limits, see Best Practices.
Atlas enforces rate limits based on model API key usage: requests per minute (RPM) and tokens per minute (TPM). If you exceed the number of requests or tokens allowed in the most recent minute, the API denies any subsequent request and returns a 429 (Rate Limit Exceeded) HTTP status code.
Manage Rate Limits
The following sections describe how to manage rate limits in the Atlas UI.
Required Permissions
To set and reset rate limits at the project level, you must have
Project Owner access or higher to Atlas.
To view rate limits:
At the organization and project levels, you must have Organization Read Only or higher access to Atlas.
At only the project level, you must have Project Read Only or higher access to Atlas.
Set Rate Limits
You can set different rate limits for each project. Project-level rate limits can't exceed the organization's rate limits. Rate limits set at the project level apply to all model API keys for the project.
Log in to Atlas.
Go to the AI Models page in the Atlas UI.
If it's not already displayed, select your desired organization from the Organizations menu in the navigation bar.
If it's not already displayed, select your desired project from the Projects menu in the navigation bar.
At the project level, click AI Models under the Services header in the navigation bar.
Set the rate limits for the project.
From the navigation bar, select Rate Limits.
In the Actions column corresponding to the embeddings model for which you want to modify rate limits, click the edit icon.
Modify the TPM and RPM values.
Project-level rate limits for each model can be any value less than or equal to the organization's rate limit.
Example
At Usage Tier 1, rate limits for the voyage-4 embedding model for a project can be set to 2,000 RPM and 8,000,000 TPM, or lower.
Click to apply the rate limit.
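This constraint can be expressed as a quick check. The helper name below is hypothetical, and the numbers come from the Tier 1 example:

```python
def valid_project_limit(project_tpm, project_rpm, org_tpm, org_rpm):
    """A project-level limit is valid only if neither TPM nor RPM exceeds
    the organization's limit for that model."""
    return project_tpm <= org_tpm and project_rpm <= org_rpm

# Usage Tier 1 voyage-4 organization limits: 8,000,000 TPM and 2,000 RPM.
print(valid_project_limit(4_000_000, 1_000, 8_000_000, 2_000))  # True
print(valid_project_limit(9_000_000, 2_000, 8_000_000, 2_000))  # False
```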
View Rate Limits
You can view the rate limits at the organization and project levels.
Log in to Atlas.
The page displays the following information:
| Name | Description |
|---|---|
| Model | List of Voyage AI embedding models. |
| Tokens Per Minute (TPM) | Number of tokens that you can request within a minute from the Embedding and Reranking API endpoints. |
| Requests Per Min (RPM) | Number of API requests that you can send within a minute to the Embedding and Reranking API endpoints. |
Log in to Atlas.
Go to the AI Models page in the Atlas UI.
If it's not already displayed, select your desired organization from the Organizations menu in the navigation bar.
If it's not already displayed, select your desired project from the Projects menu in the navigation bar.
At the project level, click AI Models under the Services header in the navigation bar.
Select Rate Limits from the left navigation.
The page displays the following information about the rate limits:
| Column Name | Column Description |
|---|---|
| Model | List of Voyage AI embedding models. |
| Tokens Per Minute (TPM) | Number of tokens that you can request within a minute from the Voyage AI Embedding and Reranking API endpoints. |
| Requests Per Min (RPM) | Number of requests that you can send within a minute to the Voyage AI Embedding and Reranking API endpoints. |
| Actions | Actions that you can take for each model's rate limits. |

If you set custom limits, the page also displays a Reset all limits button that reverts all custom rate limits on the page to the organization's defaults.
Reset All Rate Limits
You can reset all the custom limits that you set for a project at any time. When you reset the limits, the rate limits for the project revert to the default rate limits for the organization.
Log in to Atlas.
Go to the AI Models page in the Atlas UI.
If it's not already displayed, select your desired organization from the Organizations menu in the navigation bar.
If it's not already displayed, select your desired project from the Projects menu in the navigation bar.
At the project level, click AI Models under the Services header in the navigation bar.
Select Rate Limits, then click Reset all limits to revert to the organization's default rate limits.
Usage Tiers
Rate limits follow a tiered system, with higher tiers offering increased limits. Qualification for a tier is based on billed usage (excluding free tokens). Atlas offers 200 million free tokens for each model. The multimodal models also include 150 billion free pixels. Once you qualify for a tier, you are never downgraded. As your usage and spending increase, Atlas automatically promotes you to the next usage tier, raising rate limits across all models.
To learn more, see Rate Limits and Usage Tiers.
Default Rate Limits
This section describes the default rate limits for each usage tier that are applied at the organization level. It also describes the rate limits that you can configure for each project.
Organization Rate Limits
The following tables show the default rate limits (TPM and RPM) for each Voyage AI embedding model, by usage tier. The first table lists the Usage Tier 1 limits.
| Model | Tokens Per Min (TPM) | Requests Per Min (RPM) |
|---|---|---|
| | 16,000,000 | 2,000 |
| | 8,000,000 | 2,000 |
| | 3,000,000 | 2,000 |
| | 3,000,000 | 2,000 |
| | 2,000,000 | 2,000 |
| | 4,000,000 | 2,000 |
| | 2,000,000 | 2,000 |
The rate limits for Usage Tier 2 are twice those of Usage Tier 1.
| Model | Tokens Per Min (TPM) | Requests Per Min (RPM) |
|---|---|---|
| | 32,000,000 | 4,000 |
| | 16,000,000 | 4,000 |
| | 6,000,000 | 4,000 |
| | 6,000,000 | 4,000 |
| | 4,000,000 | 4,000 |
| | 8,000,000 | 4,000 |
| | 4,000,000 | 4,000 |
The rate limits for Usage Tier 3 are three times those of Usage Tier 1.
| Model | Tokens Per Min (TPM) | Requests Per Min (RPM) |
|---|---|---|
| | 48,000,000 | 6,000 |
| | 24,000,000 | 6,000 |
| | 9,000,000 | 6,000 |
| | 9,000,000 | 6,000 |
| | 6,000,000 | 6,000 |
| | 12,000,000 | 6,000 |
| | 6,000,000 | 6,000 |
Project Rate Limits
By default, projects inherit the organization's rate limits. However, you can set different limits for each project: project-level rate limits can't exceed the organization's limits, and they apply to all model API keys in the project. If the organization rate limit is reached first, projects might be throttled to a lower rate than their configured limits. This can occur when the sum of all project rate limits exceeds the organization limit.
Example
Consider an organization with rate limit O and three projects with rate limits P1, P2, and P3. The table below illustrates three scenarios where the sum of the project rate limits is less than, equal to, or greater than the organization rate limit. For each scenario, the table indicates whether the organization limit can be reached and whether one project's usage can impact another.
| | Scenario 1: P1 + P2 + P3 < O | Scenario 2: P1 + P2 + P3 = O | Scenario 3: P1 + P2 + P3 > O |
|---|---|---|---|
| Scenario Description | Sum of all project rate limits is less than the organization limit. | Sum of all project rate limits is equal to the organization limit. | Sum of all project rate limits is greater than the organization limit. |
| Can the organization limit be reached? | No. Even if all projects reach their rate limits, the organization rate limit will not be exceeded. | Yes. If all projects reach their rate limits, the organization limit will also be reached. | Yes. Because the sum of all project rate limits exceeds the organization limit, the organization limit can be reached before individual projects hit their own limits. |
| Can one project's usage impact another? | No. | No. | Yes. If projects collectively reach the organization limit before each project reaches its individual limit, projects can be rate-limited to a lower rate than their configured limits. |
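Scenario 3 can be illustrated with a short sketch. All numbers are hypothetical, and the proportional split below is just one way throttling could play out, not documented Atlas behavior:

```python
# Illustrative Scenario 3: the sum of project limits exceeds the org limit.
ORG_RPM = 6_000                                        # organization limit O
project_rpm = {"P1": 3_000, "P2": 3_000, "P3": 2_000}  # project limits

total = sum(project_rpm.values())                      # 8,000 > ORG_RPM

if total > ORG_RPM:
    # If all projects run at full speed, the organization limit is hit
    # first, so each project is throttled below its own limit. A
    # proportional share is one possible outcome (purely illustrative):
    effective = {name: rpm * ORG_RPM // total for name, rpm in project_rpm.items()}
else:
    effective = dict(project_rpm)

print(effective)  # {'P1': 2250, 'P2': 2250, 'P3': 1500}
```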
Best Practices
Rate limits ensure a balanced and efficient utilization of the API's resources, preventing excessive traffic that could impact the overall performance and accessibility of the service. Specifically, rate limits serve the following vital purposes:
Rate limits promote equitable access to the API for all users. If one individual or organization generates an excessive volume of requests, it could potentially impede the API's performance for others. Through rate limiting, we ensure that a larger number of users can utilize the API without encountering performance issues.
Rate limits enable Voyage AI to effectively manage the workload on its infrastructure. Sudden and substantial spikes in API requests could strain server resources and lead to performance degradation. By establishing rate limits, Voyage AI can effectively maintain a consistent and reliable experience for all users.
Rate limits act as a safeguard against potential abuse or misuse of the API. For instance, malicious actors might attempt to inundate the API with excessive requests to overload it or disrupt its services. By instituting rate limits, Voyage AI can thwart such activity.
To avoid and manage rate limit errors, we recommend the following best practices.
Use Large Batches
If you have many documents to embed, you can increase your overall throughput by sending larger batches, embedding more documents per request. A batch is the collection of documents you embed in one request, and the batch size is the number of documents in the batch, that is, the length of the list of documents.
Example
Suppose you want to vectorize 512 documents. If you used a batch size of 1, then this would require 512 requests and you could hit your RPM limit. However, if you used a batch size of 128, then this would require only 4 requests and you would not hit your RPM limit. You can control the batch size by changing the number of documents you provide in the request, and using larger batch sizes will reduce your overall RPM for a given number of documents.
When selecting your batch size, consider the API's maximum batch size and token limits. You cannot exceed the maximum batch size, and if your documents are long, the per-request token limit might constrain you to a smaller batch size.
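The batching logic can be sketched as follows. `embed_batch` is a placeholder for your embeddings client call, and the 128-document maximum is an assumption, so check the API's current limits:

```python
from typing import Callable

MAX_BATCH_SIZE = 128  # assumed API maximum; check the documented limit

def embed_in_batches(documents: list[str],
                     embed_batch: Callable[[list[str]], list[list[float]]],
                     batch_size: int = MAX_BATCH_SIZE) -> list[list[float]]:
    """Embed all documents in ceil(len(documents) / batch_size) requests
    instead of one request per document."""
    vectors: list[list[float]] = []
    for start in range(0, len(documents), batch_size):
        vectors.extend(embed_batch(documents[start:start + batch_size]))
    return vectors
```

With 512 documents and a batch size of 128, this makes 4 requests instead of 512.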
Set a Wait Period
Make requests less frequently. The most straightforward way to pace your requests is to insert a wait period between them.
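A minimal pacing sketch, where `send` is a placeholder for your request function and the default wait is illustrative (waiting roughly 60 / RPM-limit seconds between requests keeps you under your RPM limit):

```python
import time

def paced_requests(payloads, send, wait_seconds=0.5):
    """Send requests one at a time with a fixed pause between them."""
    results = []
    for payload in payloads:
        results.append(send(payload))
        time.sleep(wait_seconds)  # pause to stay under the RPM limit
    return results
```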
Perform Exponential Backoff
Back off once you've hit your rate limit (that is, once you receive a 429 error). Wait an exponentially increasing amount of time after each rate limit error before retrying, until the request succeeds or a maximum number of retries is reached.
Example
If your initial wait time was one second and you got three consecutive rate limit errors before success, you would wait one, two, and four seconds after each rate limit error, respectively, before resending the request.
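The retry loop above can be sketched as follows. `RateLimitError` stands in for however your HTTP client surfaces a 429 response:

```python
import time

class RateLimitError(Exception):
    """Stand-in for a 429 (Rate Limit Exceeded) response."""

def with_exponential_backoff(request, max_retries=5, base_wait=1.0):
    """Call `request`, doubling the wait after each 429, until it
    succeeds or all retries are used."""
    wait = base_wait
    for attempt in range(max_retries + 1):
        try:
            return request()
        except RateLimitError:
            if attempt == max_retries:
                raise  # out of retries; surface the error to the caller
            time.sleep(wait)
            wait *= 2
```

With `base_wait=1.0` and three consecutive 429s before success, the waits are one, two, and four seconds, matching the example above.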