POST /contextualizedembeddings

Creates contextualized vector embeddings for document chunks. These embeddings capture both local details within each chunk and global context from the entire document.

This endpoint accepts queries, full documents, or document chunks and returns embeddings that are context-aware across the entire document.

application/json

Body Required

  • inputs array[array] Required

    A list of lists, where each inner list contains a query, a document, or document chunks to be vectorized.

    Each inner list in inputs represents a set of text elements that are embedded together. Each element in the list is encoded not just independently, but also encodes context from the other elements in the same list.

    inputs = [["text_1_1", "text_1_2", ..., "text_1_n"],
              ["text_2_1", "text_2_2", ..., "text_2_m"]]
    

    Document Chunks. Most commonly, each inner list contains chunks from a single document, ordered by their position in the document. In this case:

    inputs = [["doc_1_chunk_1", "doc_1_chunk_2", ..., "doc_1_chunk_n"],
              ["doc_2_chunk_1", "doc_2_chunk_2", ..., "doc_2_chunk_m"]]
    

    Each chunk is encoded in context with the others from the same document, resulting in more context-aware embeddings. Supplied chunks should not have any overlap.
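Preparing the inputs field from raw documents can be sketched as below. split_into_chunks is a hypothetical helper for illustration; in practice you would use your own chunking strategy (by paragraph, sentence, or token count), making sure the chunks do not overlap:

```python
def split_into_chunks(document: str) -> list[str]:
    # Naive illustration: split on blank lines and drop empty pieces.
    # A real chunker would split by tokens or sentences instead.
    return [part.strip() for part in document.split("\n\n") if part.strip()]

documents = [
    "First paragraph of doc 1.\n\nSecond paragraph of doc 1.",
    "Only paragraph of doc 2.",
]

# Each inner list holds the ordered, non-overlapping chunks of one document.
inputs = [split_into_chunks(doc) for doc in documents]
```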

    Context-Agnostic Behavior for Queries and Documents. If there is one element per inner list, each text is embedded independently—similar to standard (context-agnostic) embeddings:

    inputs = [["query_1"], ["query_2"], ..., ["query_k"]]
    inputs = [["doc_1"], ["doc_2"], ..., ["doc_k"]]
    

    Therefore, if the inputs are queries, each inner list should contain a single query (a length of one), as shown above, and the input_type should be set to query.

    The following constraints apply to the inputs list:

    • The list must not contain more than 1,000 inputs.
    • The total number of tokens across all inputs must not exceed 120K.
    • The total number of chunks across all inputs must not exceed 16K.
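These limits can be checked client-side before sending a request. A minimal sketch, where count_tokens is a stand-in for a real tokenizer (the API counts tokens with the model's own tokenizer, so a whitespace split is only an approximation):

```python
MAX_INPUTS = 1000            # at most 1,000 inner lists
MAX_TOTAL_TOKENS = 120_000   # across all inputs
MAX_TOTAL_CHUNKS = 16_000    # across all inputs
MAX_LIST_TOKENS = 32_000     # within any single inner list

def count_tokens(text: str) -> int:
    # Assumption: crude whitespace-based approximation for illustration.
    return len(text.split())

def validate_inputs(inputs: list[list[str]]) -> None:
    if not 1 <= len(inputs) <= MAX_INPUTS:
        raise ValueError("inputs must contain between 1 and 1,000 inner lists")
    if sum(len(lst) for lst in inputs) > MAX_TOTAL_CHUNKS:
        raise ValueError("total chunks across all inputs exceed 16K")
    if sum(count_tokens(c) for lst in inputs for c in lst) > MAX_TOTAL_TOKENS:
        raise ValueError("total tokens across all inputs exceed 120K")
    for lst in inputs:
        if not lst:
            raise ValueError("each inner list needs at least one element")
        if sum(count_tokens(c) for c in lst) > MAX_LIST_TOKENS:
            raise ValueError("a single inner list exceeds 32,000 tokens")
```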

    For queries, each inner list contains only a single query. For documents or document chunks, each inner list should include all chunks from a single document, ordered by their position in the document; alternatively, the entire document may be provided as a single chunk. The total number of tokens within any single inner list must not exceed 32,000.

    The inputs list must contain at least 1 and at most 1,000 inner lists, and each inner list must contain at least 1 element.

  • model string Required

    The contextualized embedding model to use. Recommended model: voyage-context-3.

    Value is voyage-context-3.

  • input_type

    Type of the input text. Defaults to null. Other options: query, document.

    • When input_type is null, the embedding model converts the inputs directly into numerical vectors. For retrieval or search tasks, where a "query" searches for relevant information among a collection of data referred to as "documents," set input_type to query or document, respectively. In these cases, Voyage automatically prepends a prompt to your inputs before vectorizing them, creating vectors better tailored for retrieval or search. Embeddings generated with and without the input_type argument are compatible.
    • For transparency, the following prompts are prepended to your input:
      • For query, the prompt is "Represent the query for retrieving supporting documents: ".
      • For document, the prompt is "Represent the document for retrieval: ".

    Values are query, document, or null.

  • output_dimension integer | null

    The number of dimensions for resulting output embeddings. Defaults to null. voyage-context-3 supports the following output_dimension values: 2048, 1024 (default), 512, and 256. If set to null, the model uses the default value of 1024.

    Values are 256, 512, 1024, 2048, or null.

  • output_dtype string

    The data type for the embeddings to be returned. Defaults to float. Other options: int8, uint8, binary, ubinary.

    • float: Each returned embedding is a list of 32-bit (4-byte) single-precision floating-point numbers. This is the default and provides the highest precision / retrieval accuracy.
    • int8 and uint8: Each returned embedding is a list of 8-bit (1-byte) integers ranging from -128 to 127 and 0 to 255, respectively.
    • binary and ubinary: Each returned embedding is a list of 8-bit integers representing bit-packed, quantized single-bit embedding values: int8 for binary and uint8 for ubinary. The length of the returned list is 1/8 of output_dimension, which remains the actual dimension of the embedding. The binary type uses the offset binary method.

    Default value is float.
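The bit-packed types can be expanded back into single-bit values on the client. A sketch in pure Python (with NumPy you would typically use numpy.unpackbits instead); for binary, the offset binary method means each stored int8 equals the packed unsigned byte minus 128:

```python
def unpack_ubinary(embedding: list[int]) -> list[int]:
    """Expand uint8 bit-packed values into 0/1 bits, most significant bit first."""
    bits = []
    for byte in embedding:
        bits.extend((byte >> shift) & 1 for shift in range(7, -1, -1))
    return bits

def unpack_binary(embedding: list[int]) -> list[int]:
    """int8 values use offset binary: add 128 to recover the unsigned byte."""
    return unpack_ubinary([b + 128 for b in embedding])
```

For example, a ubinary value of 160 (0b10100000) unpacks to the bits [1, 0, 1, 0, 0, 0, 0, 0], and the binary value 32 unpacks identically since 32 + 128 = 160.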

  • encoding_format

    Format in which the embeddings are encoded. Defaults to null. Other options: base64.

    • If null, each embedding is an array of float numbers when output_dtype is set to float and an array of integers for all other values of output_dtype (int8, uint8, binary, and ubinary). See output_dtype for more details.
    • If base64, the embeddings are represented as a Base64-encoded NumPy array of:
      • Floating-point numbers (numpy.float32) for output_dtype set to float.
      • Signed integers (numpy.int8) for output_dtype set to int8 or binary.
      • Unsigned integers (numpy.uint8) for output_dtype set to uint8 or ubinary.

    Values are base64 or null.
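A base64-encoded embedding can be decoded with NumPy's frombuffer, or with only the standard library as sketched below, assuming output_dtype is float (numpy.float32, i.e. little-endian 32-bit floats):

```python
import base64
import struct

def decode_base64_floats(encoded: str) -> list[float]:
    """Decode a base64 string of little-endian float32 values."""
    raw = base64.b64decode(encoded)
    count = len(raw) // 4  # 4 bytes per float32
    return list(struct.unpack(f"<{count}f", raw))
```

For the integer dtypes you would substitute the corresponding struct format characters (b for int8, B for uint8).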

Responses

  • 200 application/json

    Success

    • object string Required

      The object type, which is always "list".

      Value is list.

    • data array[object] Required

      An array of contextualized embeddings.

      • object string Required

        The object type, which is always "list".

        Value is list.

      • data array[object] Required

        An array of embedding objects, one for each chunk in the input list.

        • object string Required

          The object type, which is always "embedding".

          Value is embedding.

        • embedding array[number] | string Required

          The embedding vector. When encoding_format is null, this is an array of numbers (floats when output_dtype is float, integers for int8, uint8, binary, and ubinary). When encoding_format is base64, this is a base64-encoded string.

          One of:

          Array format when encoding_format is null

          Base64-encoded format when encoding_format is base64

        • index integer Required

          An integer representing the index of the query or the contextualized chunk embedding within the list of embeddings from the same document.

      • index integer Required

        An integer representing the index of the query or document within the list of queries or documents, respectively.

    • model string Required

      Name of the model.

    • usage object Required
      • total_tokens integer Required

        The total number of tokens used for computing the embeddings.

  • 400 application/json

    Invalid Request

    • detail string

      The request is invalid. This error can occur due to invalid JSON, invalid parameter types, incorrect data types, batch size too large, total tokens exceeding the limit, or tokens in an example exceeding context length.

  • 401 application/json

    Unauthorized

    • detail string

      Invalid authentication. Ensure your model API key is correctly specified in the Authorization header as Bearer VOYAGE_API_KEY.

  • 403 application/json

    Forbidden

    • detail string

      Access forbidden. This may occur if the IP address you are sending the request from is not allowed.

  • 429 application/json

    Rate Limit Exceeded

    • detail string

      Rate limit exceeded. Your request frequency or token usage is too high. Reduce your request rate or wait before retrying.

  • 500 application/json

    Internal Server Error

    • detail string

      An unexpected error occurred on the server. Retry your request after a brief wait.

  • 502 application/json

    Bad Gateway

    • detail string

      The server received an invalid response from an upstream server. Retry your request after a brief wait.

  • 503 application/json

    Service Unavailable

    • detail string

      The service is temporarily unavailable due to high traffic or maintenance. Retry your request after a brief wait.

  • 504 application/json

    Gateway Timeout

    • detail string

      The server did not receive a timely response from an upstream server. Retry your request after a brief wait.

curl \
 --request POST 'https://ai.mongodb.com/v1/contextualizedembeddings' \
 --header "Authorization: Bearer $ACCESS_TOKEN" \
 --header "Content-Type: application/json" \
 --data '{"inputs":[["string"]],"model":"voyage-context-3","input_type":"query","output_dimension":256,"output_dtype":"float","encoding_format":"base64"}'
Request examples
{
  "inputs": [
    [
      "string"
    ]
  ],
  "model": "voyage-context-3",
  "input_type": "query",
  "output_dimension": 256,
  "output_dtype": "float",
  "encoding_format": "base64"
}
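The same request can be built in Python with only the standard library. A sketch mirroring the curl example; the API key is read from the VOYAGE_API_KEY environment variable, and the network call itself is left commented out:

```python
import json
import os
import urllib.request

payload = {
    "inputs": [["doc_1_chunk_1", "doc_1_chunk_2"]],
    "model": "voyage-context-3",
    "input_type": "document",
}

request = urllib.request.Request(
    "https://ai.mongodb.com/v1/contextualizedembeddings",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {os.environ.get('VOYAGE_API_KEY', '')}",
        "Content-Type": "application/json",
    },
    method="POST",
)

# response = urllib.request.urlopen(request)  # network call omitted here
# result = json.load(response)
```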
Response examples (200)
{
  "object": "list",
  "data": [
    {
      "object": "list",
      "data": [
        {
          "object": "embedding",
          "embedding": [
            42.0
          ],
          "index": 42
        }
      ],
      "index": 42
    }
  ],
  "model": "string",
  "usage": {
    "total_tokens": 42
  }
}
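The nested response can be flattened into (document index, chunk index, vector) triples for downstream indexing. A sketch parsing a response shaped like the 200 example above:

```python
def flatten_response(response: dict) -> list[tuple[int, int, list[float]]]:
    """Yield (document_index, chunk_index, embedding) for every chunk."""
    triples = []
    for doc in response["data"]:
        for chunk in doc["data"]:
            triples.append((doc["index"], chunk["index"], chunk["embedding"]))
    return triples

example = {
    "object": "list",
    "data": [
        {
            "object": "list",
            "data": [
                {"object": "embedding", "embedding": [42.0], "index": 0},
            ],
            "index": 0,
        }
    ],
    "model": "voyage-context-3",
    "usage": {"total_tokens": 42},
}

print(flatten_response(example))  # → [(0, 0, [42.0])]
```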
Response examples (400)
{
  "detail": "string"
}
Response examples (401)
{
  "detail": "string"
}
Response examples (403)
{
  "detail": "string"
}
Response examples (429)
{
  "detail": "string"
}
Response examples (500)
{
  "detail": "string"
}
Response examples (502)
{
  "detail": "string"
}
Response examples (503)
{
  "detail": "string"
}
Response examples (504)
{
  "detail": "string"
}