Creates vector embeddings for multimodal inputs consisting of text, images, or a combination of both. The endpoint accepts inputs containing text and images in any combination and returns their vector representations.
Body
Required
- `inputs`
  A list of multimodal inputs to be vectorized. Each input in the list is a dictionary containing a single key, `"content"`, whose value represents a sequence of text and images.
  - The value of `"content"` is a list of dictionaries, each representing a single piece of text or an image. The dictionaries have four possible keys:
    - `type`: Specifies the type of the content piece. Allowed values are `text`, `image_url`, or `image_base64`.
    - `text`: Only present when `type` is `text`. The value should be a text string.
    - `image_base64`: Only present when `type` is `image_base64`. The value should be a Base64-encoded image in the data URL format `data:[<mediatype>];base64,<data>`. Currently supported media types are `image/png`, `image/jpeg`, `image/webp`, and `image/gif`.
    - `image_url`: Only present when `type` is `image_url`. The value should be a URL linking to the image. PNG, JPEG, WEBP, and GIF images are supported.
  - Note: Only one of the keys `image_base64` or `image_url` should be present in each dictionary for image data. Consistency is required within a request: each request should use either `image_base64` or `image_url` exclusively for images, not both.
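As an illustrative sketch (the helper name is hypothetical, not part of the API), a data URL of the form accepted by `image_base64` can be built from a local image file with Python's standard library:

```python
import base64
import mimetypes

# Media types the endpoint documents as supported for image_base64.
SUPPORTED = {"image/png", "image/jpeg", "image/webp", "image/gif"}

def to_data_url(path: str) -> str:
    # Hypothetical helper: encode a local image file as
    # data:<mediatype>;base64,<data>.
    media_type, _ = mimetypes.guess_type(path)
    if media_type not in SUPPORTED:
        raise ValueError(f"unsupported media type: {media_type}")
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return f"data:{media_type};base64,{encoded}"
```

The resulting string can be used directly as the value of an `image_base64` key.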
Example payload where `inputs` contains an image as a URL. The `inputs` list contains a single input, which consists of a piece of text and an image provided via a URL:

```json
{
  "inputs": [
    {
      "content": [
        {
          "type": "text",
          "text": "This is a banana."
        },
        {
          "type": "image_url",
          "image_url": "https://raw.githubusercontent.com/voyage-ai/voyage-multimodal-3/refs/heads/main/images/banana.jpg"
        }
      ]
    }
  ],
  "model": "voyage-multimodal-3.5"
}
```

Example payload where `inputs` contains a Base64 image. The example below is equivalent to the one above, except that the image content is a Base64 image instead of a URL. (Base64 images can be lengthy, so the example shows only a shortened version.)

```json
{
  "inputs": [
    {
      "content": [
        {
          "type": "text",
          "text": "This is a banana."
        },
        {
          "type": "image_base64",
          "image_base64": "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAA..."
        }
      ]
    }
  ],
  "model": "voyage-multimodal-3.5"
}
```

The following constraints apply to the `inputs` list:
- The list must not contain more than 1000 inputs.
- Each image must not contain more than 16 million pixels or be larger than 20 MB in size.
- Every 560 pixels of an image counts as one token. Each input in the list must not exceed 32,000 tokens, and the total number of tokens across all inputs must not exceed 320,000.
At least `1` but not more than `1000` elements.
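Under the 560-pixels-per-token rule above, a client-side budget check might look like the following sketch (the constants mirror the documented limits; the helper names are not part of the API):

```python
import math

PIXELS_PER_TOKEN = 560        # every 560 pixels of an image counts as one token
MAX_TOKENS_PER_INPUT = 32_000
MAX_TOTAL_TOKENS = 320_000

def image_tokens(width: int, height: int) -> int:
    # Convert an image's pixel count into its token cost, rounding up.
    return math.ceil(width * height / PIXELS_PER_TOKEN)

def within_budget(token_counts_per_input: list[int]) -> bool:
    # token_counts_per_input: total tokens (text + image) for each input.
    return (all(t <= MAX_TOKENS_PER_INPUT for t in token_counts_per_input)
            and sum(token_counts_per_input) <= MAX_TOTAL_TOKENS)
```

For example, a 1024x1024 image costs `image_tokens(1024, 1024)` tokens, well under the per-input limit.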
- `model`
  The multimodal embedding model to use. Recommended model: `voyage-multimodal-3.5`.
  Values are `voyage-multimodal-3.5` or `voyage-multimodal-3`.
- `input_type`
  Type of the input. Defaults to `null`. Other options: `query`, `document`.
  - When `input_type` is `null`, the embedding model directly converts the `inputs` into numerical vectors. For retrieval or search purposes, where a "query" (which can be text or an image in this case) searches for relevant information among a collection of data referred to as "documents," specify whether your `inputs` are queries or documents by setting `input_type` to `query` or `document`, respectively. In these cases, Voyage automatically prepends a prompt to your `inputs` before vectorizing them, creating vectors more tailored for retrieval or search tasks. Since inputs can be multimodal, "queries" and "documents" can be text, images, or an interleaving of both modalities. Embeddings generated with and without the `input_type` argument are compatible.
  - For transparency, the following prompts are prepended to your input:
    - For `query`, the prompt is "Represent the query for retrieving supporting documents: ".
    - For `document`, the prompt is "Represent the document for retrieval: ".
  Values are `query`, `document`, or `null`.
- `truncation`
  Whether to truncate the inputs to fit within the context length. Defaults to `true`.
  - If `true`, over-length inputs are truncated to fit within the context length before vectorization by the embedding model. If the truncation happens in the middle of an image, the entire image is discarded.
  - If `false`, an error occurs if any input exceeds the context length.
  Default value is `true`.
- `output_encoding`
  Format in which the embeddings are encoded. Defaults to `null`.
  - If `null`, the embeddings are represented as a list of floating-point numbers.
  - If `base64`, the embeddings are represented as a Base64-encoded NumPy array of single-precision floats.
  Values are `base64` or `null`.
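When `output_encoding` is `base64`, the returned string can be decoded back into floats. A stdlib-only sketch, assuming the bytes are little-endian float32 values (the layout NumPy uses on common platforms):

```python
import base64
import struct

def decode_embedding(b64: str) -> list[float]:
    # Decode the Base64 payload, then reinterpret the raw bytes as
    # little-endian single-precision floats (4 bytes each).
    raw = base64.b64decode(b64)
    count = len(raw) // 4
    return list(struct.unpack(f"<{count}f", raw))
```

With NumPy available, `numpy.frombuffer(base64.b64decode(b64), dtype=numpy.float32)` achieves the same result.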
Example request:

```shell
curl \
  --request POST 'https://ai.mongodb.com/v1/multimodalembeddings' \
  --header "Authorization: Bearer $ACCESS_TOKEN" \
  --header "Content-Type: application/json" \
  --data '{"inputs":[{"content":[{"type":"text","text":"string"}]}],"model":"voyage-multimodal-3.5","input_type":"query","truncation":true,"output_encoding":"base64"}'
```
Request body:

```json
{
  "inputs": [
    {
      "content": [
        {
          "type": "text",
          "text": "string"
        }
      ]
    }
  ],
  "model": "voyage-multimodal-3.5",
  "input_type": "query",
  "truncation": true,
  "output_encoding": "base64"
}
```
Example response:

```json
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "embedding": [
        42.0
      ],
      "index": 42
    }
  ],
  "model": "string",
  "usage": {
    "text_tokens": 42,
    "image_pixels": 42,
    "total_tokens": 42
  }
}
```
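Given a success response shaped like the example above, the embeddings can be pulled out in input order (a sketch; `extract_embeddings` is illustrative, and the sample response below is fabricated for demonstration):

```python
def extract_embeddings(response: dict) -> list[list[float]]:
    # Sort by "index" so each embedding lines up with its original input.
    items = sorted(response["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]

# A fabricated response in the documented shape, with "data" out of order.
sample = {
    "object": "list",
    "data": [
        {"object": "embedding", "embedding": [0.2, 0.3], "index": 1},
        {"object": "embedding", "embedding": [0.0, 0.1], "index": 0},
    ],
    "model": "voyage-multimodal-3.5",
    "usage": {"text_tokens": 5, "image_pixels": 0, "total_tokens": 5},
}
```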
Error responses return a body of the form:

```json
{
  "detail": "string"
}
```