Now Available: voyage-multimodal-3.5

January 15, 2026

What it is:

voyage-multimodal-3.5 is a next-generation multimodal embedding model built for retrieval over text, images, and videos. It embeds interleaved text and images (screenshots, PDFs, tables, figures, slides), and adds explicit support for video frames. It's also the first production-grade video embedding model to support flexible dimensionality, enabled by Matryoshka learning.
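Flexible dimensionality via Matryoshka learning means an embedding can be shortened to a prefix of its coordinates and renormalized, trading storage for accuracy without re-embedding. A minimal sketch of that post-processing step (pure Python; the vector values and target size are illustrative, not documented defaults):

```python
import math

def truncate_embedding(embedding: list[float], dim: int) -> list[float]:
    """Keep the first `dim` coordinates of a Matryoshka-style embedding
    and L2-renormalize so cosine similarity stays meaningful."""
    prefix = embedding[:dim]
    norm = math.sqrt(sum(x * x for x in prefix))
    if norm == 0.0:
        return prefix
    return [x / norm for x in prefix]

# Illustrative 4-dim vector truncated to 2 dims; this prefix is
# already unit-norm, so renormalization leaves it unchanged.
vec = [0.6, 0.8, 0.0, 0.0]
short = truncate_embedding(vec, 2)
print(short)  # → [0.6, 0.8]
```

Because the model is trained so that leading coordinates carry the most information, the truncated vector remains a usable embedding rather than an arbitrary slice.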

Who it’s for:

This new model is designed for developers seeking higher retrieval accuracy across multiple modalities—text, images, and videos—while matching state-of-the-art text models on pure-text search.

Why it matters:

voyage-multimodal-3, the industry’s first production-grade multimodal model capable of embedding interleaved text and images, was released over a year ago. Since then, it has enabled numerous customers to build search and retrieval pipelines over text, PDFs, figures, tables, and other visually rich documents. voyage-multimodal-3.5 adds support for embedding videos while further improving retrieval quality over voyage-multimodal-3.

How to get started:

Sign up, generate a model API key, and get 200M free tokens and 150B pixels. Dive into the documentation and start building with the quick start.
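A multimodal embedding request interleaves text and image entries in a single input. The sketch below only builds the JSON request body; the field names and the endpoint's exact schema are assumptions to be checked against the API reference, and the image URL is a placeholder:

```python
# Sketch of a multimodal embedding request body. The "inputs"/"content"
# shape and type tags below are assumptions about the request schema;
# consult the documentation for the authoritative format.
def build_request(content: list[dict], model: str = "voyage-multimodal-3.5") -> dict:
    """Wrap interleaved text/image entries into one embedding request."""
    return {"model": model, "inputs": [{"content": content}]}

payload = build_request([
    {"type": "text", "text": "Quarterly revenue chart"},
    {"type": "image_url", "image_url": "https://example.com/chart.png"},
])
print(payload["model"])  # → voyage-multimodal-3.5
```

Sending the payload with your API key in an Authorization header returns one embedding per input, covering both the text and the image in a shared vector space.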

Related Content

- Blog: voyage-multimodal-3.5: a new multimodal retrieval frontier with video support
- Blog: Introducing the Embedding and Reranking API on MongoDB Atlas
- Web: Voyage AI