You can explore and deploy Voyage AI by MongoDB models from the Google Cloud Model Garden.
Model Garden manages licenses for Voyage AI by MongoDB models and provides deployment options using either on-demand hardware or your existing Compute Engine reservations.
Voyage AI by MongoDB models are self-deployed partner models, meaning you pay for both the model usage and the Vertex AI infrastructure consumed. Vertex AI handles deployment and provides endpoint management features.
Available Models
To see which models you can deploy, search for "Voyage" in the Google Cloud Model Garden.
To learn more about Voyage AI models, see Models Overview.
Pricing
Pricing for Voyage AI by MongoDB models in the Google Cloud Model Garden includes:
Model Usage fee: A cost for using the Voyage AI model container, billed at an hourly rate. The usage fee depends on the specific model and the hardware configuration you choose for deployment. For detailed pricing information, see the pricing section on the model's listing page in the Google Cloud Marketplace.
Google Cloud underlying instance in your region: The cost of the underlying Google Cloud GPU instance (such as NVIDIA L4, A100, or H100), which is specific to your region, is billed monthly, and is priced per vCPU. To learn more, see Google Cloud Compute Engine pricing.
All billing charges appear as the use of Vertex AI on your Google Cloud bill.
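As a rough illustration, the total hourly cost of a deployment is the model usage fee plus the underlying instance cost. The rates in the following sketch are hypothetical placeholders, not actual prices; always check the model's Marketplace listing and Compute Engine pricing for real figures:

```python
# A back-of-the-envelope cost sketch. Both rates below are hypothetical
# placeholders, not real prices; consult the model's Google Cloud
# Marketplace listing and Compute Engine pricing for actual figures.
MODEL_USAGE_PER_HOUR = 2.00   # hypothetical model usage fee (USD/hour)
INSTANCE_PER_HOUR = 3.67      # hypothetical GPU instance cost (USD/hour)

# A dedicated endpoint runs continuously, so estimate a full month.
hours_per_month = 24 * 30
monthly_cost = hours_per_month * (MODEL_USAGE_PER_HOUR + INSTANCE_PER_HOUR)
print(f"Estimated monthly cost: ${monthly_cost:,.2f}")
```

This estimate assumes a single replica running around the clock; multiply by your replica count for autoscaled deployments.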
To view pricing for a specific Voyage AI model:
Go to Model Garden.
Quotas
When you deploy Voyage AI models, you consume Vertex AI resources that are subject to quotas. You can view and manage your quotas in the Quotas section of the IAM & Admin page in the Google Cloud console. For more information, see View the quotas for your project. On the same page, you can select any current quota, click Edit quota, and submit a request to increase it if needed.
Prerequisites
To get started using the Voyage AI by MongoDB models through Google Cloud Vertex AI, you must:
Set up a Google Cloud project and a development environment. For instructions, see Set up your project and development environment.
Enable the Vertex AI API. For instructions, see Setup.
Hardware Configuration
Each model in the Model Garden lists its recommended hardware configuration. For example, the Vertex AI Model Garden suggests the following instances for deploying the voyage-4 model. These recommendations may change, so consult the official Model Garden page for a particular Voyage AI model to confirm its current recommended hardware.
A2 instances, such as a2-highgpu-1g or a2-ultragpu-1g, with A100 GPUs are the default choice.
A3 instances, such as a3-highgpu-1g, with H100 GPUs are recommended for higher performance needs.
Supported Regions
The Model Garden lists supported regions for each Voyage AI model. If you need support in another region for any of the models, contact MongoDB support.
Best Practices and Limitations
Endpoint Type: All Voyage AI models require a dedicated public endpoint type. For more information, see Choose an endpoint type.
Understand input_type: Query vs. Document: The input_type parameter optimizes embeddings for retrieval tasks. Use "query" for search queries and "document" for content being searched. This optimization improves retrieval accuracy. To learn more about the input_type parameter, see the Embedding and Reranking API Overview.
Use Different Output Dimensions: Voyage 4 models support multiple output dimensions: 256, 512, 1024 (default), and 2048. Smaller dimensions reduce storage and computation costs, while larger dimensions may provide better accuracy. Choose the dimension that best balances your accuracy requirements with resource constraints.
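To illustrate the two practices above, the following sketch builds /embeddings request bodies that set the input_type and output_dimension fields described in this guide. The build_embeddings_body helper is a hypothetical convenience function, not part of any SDK; see the Embedding and Reranking API Overview for the authoritative request schema.

```python
import json

# Hypothetical helper that builds an /embeddings request body. The field
# names mirror the examples in this guide; check the Embedding and
# Reranking API Overview for the authoritative schema.
def build_embeddings_body(texts, input_type, output_dimension=1024):
    if input_type not in ("query", "document"):
        raise ValueError("input_type must be 'query' or 'document'")
    if output_dimension not in (256, 512, 1024, 2048):
        raise ValueError("unsupported output_dimension for Voyage 4 models")
    return json.dumps({
        "input": texts,
        "input_type": input_type,
        "output_dimension": output_dimension,
    }).encode("utf-8")

# Embed stored content as "document" at a smaller dimension to cut storage
# costs, and embed the search query as "query" at the same dimension so
# the resulting vectors are comparable.
doc_body = build_embeddings_body(["MongoDB is a document database."], "document", 512)
query_body = build_embeddings_body(["What is MongoDB?"], "query", 512)
```

Both bodies can be passed to the endpoint the same way as in the prediction example later in this page.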
Locate the Voyage AI Models
To find Voyage AI by MongoDB models in the Model Garden:
Go to Model Garden.
Go to the Model Garden console.
Search for Voyage models.
In the Search Models field, enter "Voyage" to display the list of Voyage AI by MongoDB models.
Note
The Google Cloud Marketplace has two search boxes: one for the entire Marketplace and one within the Vertex AI Model Garden site. To locate Voyage AI by MongoDB models, use the search box on the Vertex AI Model Garden site.
Alternatively, you can navigate to Voyage AI models through Model Garden > Model Collections > Partner Models, and then select any of the Voyage AI models listed there.
You can also scroll down to Task-specific solutions to find Voyage AI models that you can use as-is or customize to your needs.
Deploy a Voyage AI Model in Vertex AI
To make predictions using a Voyage AI by MongoDB model, you must deploy it to a dedicated public endpoint for online inferences. Deployment associates physical resources with the model for low-latency, high-throughput online predictions. You can deploy multiple models to one endpoint, or the same model to multiple endpoints.
When you deploy a model, consider the following options:
Endpoint location
Model container
Compute resources required to run the model
Once you deploy a model, you can't change these settings. If you need to modify any deployment configuration, you must undeploy the model and redeploy it with the new settings.
Voyage AI models require a dedicated public endpoint. For more information, see Create a public endpoint in the Google Cloud Vertex AI documentation.
To deploy a model in the Google Cloud Vertex AI using the console:
Locate the model.
Go to the Model Garden console and search for "Voyage" in the Search Models field to display the list of Voyage AI by MongoDB models.
Enable the model and accept the agreement.
Click Enable. The MongoDB Marketplace End User Agreement opens. Review and accept the agreement to enable the model and get the necessary commercial use licenses.
Review deployment options.
After you accept the agreement, the model page displays the following options:
Deploy a model: Saves the model to the Model Registry and deploys it to an endpoint in Google Cloud. Continue with the following steps to deploy using the console.
Create an Open Notebook for Voyage Embedding Models Family: Lets you fine-tune and customize your model in a collaborative environment, and mix and match models for optimal cost and performance. See Vertex AI Notebook Samples for Voyage AI.
View Code: Displays code samples for deploying and using the model. To deploy programmatically using code, see Deploy Using Code.
Fill out the deployment form.
A form opens that allows you to review and edit the deployment options. Vertex AI provides default settings that are optimized for the model, but you can customize them as needed. For example, you can select the machine type, GPU type, and number of replicas. The following example shows default settings for the voyage-4 model, but these may change, so review the settings carefully before deploying.
| Field | Description |
|---|---|
| Resource ID | Select from the dropdown menu (preselected). |
| Model Name | Select from the dropdown menu (preselected). |
| Region | Select your desired region. |
| Endpoint name | Provide a name for your endpoint. |
| Serving spec | Select the machine type. |
| Accelerator type | Select the GPU type. |
| Accelerator count | Specify the number of GPUs. |
| Replica count | Specify the minimum and maximum number of replicas. |
| Reservation type | Select the reservation type. |
| VM provisioning model | Select the provisioning model. |
| Endpoint access | Select Public (Dedicated endpoint). |
Deploy Using Code
If you selected View Code from the model details page, you can deploy a model programmatically using the Vertex AI SDK. This approach provides full control over deployment configuration through code.
For more information about the Google Cloud Vertex AI SDK, see the Vertex AI SDK for Python documentation.
Note
The code examples in this section are for the voyage-4 model and are subject to change. For the most current code examples, consult the View Code tab on the model's page in the Model Garden. For other Voyage AI models, the code is similar, but check that model's page in the Model Garden for model-specific details.
To deploy a model using code:
Deploy to an endpoint.
Choose whether to deploy a new model or use an existing endpoint:
```python
# Choose whether to deploy a new model or use an existing endpoint:
deployment_option = "deploy_new"  # ["deploy_new", "use_existing"]

# If using an existing endpoint, provide the endpoint ID:
ENDPOINT_ID = ""  # {type:"string"}

if deployment_option == "deploy_new":
    print("Deploying new model...")
    endpoint = model.deploy(
        machine_type="a3-highgpu-1g",
        accelerator_type="NVIDIA_H100_80GB",
        accelerator_count=1,
        accept_eula=True,
        use_dedicated_endpoint=True,
    )
    print(f"Endpoint deployed: {endpoint.display_name}")
    print(f"Endpoint resource name: {endpoint.resource_name}")
else:
    if not ENDPOINT_ID:
        raise ValueError("Please provide an ENDPOINT_ID when using existing endpoint")
    from google.cloud import aiplatform

    print(f"Connecting to existing endpoint: {ENDPOINT_ID}")
    endpoint = aiplatform.Endpoint(
        endpoint_name=f"projects/{PROJECT_ID}/locations/{LOCATION}/endpoints/{ENDPOINT_ID}"
    )
    print(f"Using endpoint: {endpoint.display_name}")
    print(f"Endpoint resource name: {endpoint.resource_name}")
```
Important
Set use_dedicated_endpoint to True because Voyage AI models require a dedicated public endpoint.
Vertex AI deploys the model to a managed endpoint that you can access to make online inferences or batch inferences through the Google Cloud console or the Vertex AI API.
For more information, see Deploy a model to an endpoint in the Google Cloud Vertex AI documentation.
Make predictions.
After deployment, you can make predictions using the Vertex AI endpoint.
For all endpoint parameters and prediction options, see the Embedding and Reranking API Overview.
```python
import json

# Multiple texts to embed
texts = [
    "Machine learning enables computers to learn from data.",
    "Natural language processing helps computers understand human language.",
    "Computer vision allows machines to interpret visual information.",
    "Deep learning uses neural networks with multiple layers."
]

# Prepare the batch request and make the invoke call
body = {
    "input": texts,
    "output_dimension": 1024,
    "input_type": "document"
}
response = endpoint.invoke(
    request_path="/embeddings",
    body=json.dumps(body).encode("utf-8"),
    headers={"Content-Type": "application/json"}
)

# Extract embeddings
result = response.json()
embeddings = [item["embedding"] for item in result["data"]]

print(f"Number of texts embedded: {len(embeddings)}")
print(f"Embedding dimension: {len(embeddings[0])}")
print(f"\nFirst embedding (first 5 values): {embeddings[0][:5]}")
print(f"Second embedding (first 5 values): {embeddings[1][:5]}")
```
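Once you have embeddings back from the endpoint, a common next step is ranking stored documents against a query embedding. The following is a minimal sketch using placeholder vectors; in practice you would reuse the embeddings returned by the endpoint, embedding the query with input_type "query" and the documents with input_type "document".

```python
import math

# Rank documents by cosine similarity to a query embedding. The vectors
# below are small placeholders for illustration; real Voyage embeddings
# have 256 to 2048 dimensions depending on output_dimension.
def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

query_vec = [0.1, 0.9, 0.2]
doc_vecs = {
    "doc_a": [0.1, 0.8, 0.3],
    "doc_b": [0.9, 0.1, 0.0],
}

ranked = sorted(
    doc_vecs,
    key=lambda d: cosine_similarity(query_vec, doc_vecs[d]),
    reverse=True,
)
print(ranked)  # doc_a ranks above doc_b: its vector is closer to the query
```

For production workloads you would typically store the document embeddings in a vector index, such as MongoDB Atlas Vector Search, rather than comparing them in application code.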
Undeploy a Model and Delete the Endpoint
To remove a deployed model and its endpoint:
Undeploy the model from the endpoint.
Optionally delete the endpoint itself.
For detailed instructions, see Undeploy a model and delete the endpoint in the Google Cloud Vertex AI documentation.
Important
You can delete the endpoint only after all models have been undeployed from it. Undeploying models and deleting the endpoint stops all inference services and billing for that endpoint.