January 15, 2026
What it is:
voyage-multimodal-3.5 is a next-generation multimodal embedding model built for retrieval over text, images, and videos. It embeds interleaved text and images (screenshots, PDFs, tables, figures, slides), and adds explicit support for video frames. It's also the first production-grade video embedding model to support flexible dimensionality, enabled by Matryoshka learning.
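Flexible dimensionality via Matryoshka learning typically means the leading dimensions of the full embedding carry the most information, so a vector can be truncated to a shorter prefix and re-normalized. The sketch below illustrates that idea only; the function name and the toy 8-dimensional vector are illustrative, not part of the Voyage API.

```python
import math

def truncate_embedding(embedding, target_dim):
    """Truncate a Matryoshka-style embedding to target_dim and L2-renormalize.

    Matryoshka-trained models concentrate information in the leading
    dimensions, so a re-normalized prefix remains a usable embedding.
    """
    if target_dim > len(embedding):
        raise ValueError("target_dim exceeds the embedding's dimensionality")
    prefix = embedding[:target_dim]
    norm = math.sqrt(sum(x * x for x in prefix))
    return [x / norm for x in prefix]

# Toy example: shrink an 8-dimensional unit vector to 4 dimensions.
full = [0.5, 0.5, 0.5, 0.5, 0.0, 0.0, 0.0, 0.0]
short = truncate_embedding(full, 4)
```

In practice you would request the dimensionality you need directly from the API when it supports it, and use truncation like this only when storing full vectors and searching at a smaller size.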
Who it’s for:
This new model is designed for developers who need higher retrieval accuracy across text, images, and videos, while matching state-of-the-art text embedding models on pure-text search.
Why it matters:
voyage-multimodal-3, the industry’s first production-grade multimodal model capable of embedding interleaved text and images, was released over a year ago. Since then, it has enabled numerous customers to build search and retrieval pipelines over text, PDFs, figures, tables, and other documents rich with visuals. voyage-multimodal-3.5 introduces support for embedding videos while further improving on voyage-multimodal-3 in retrieval quality.
Sign up, generate a model API key, and get 200M free tokens and 150B pixels. Dive into the documentation and start building with the quick start.
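As a starting point before consulting the quick start, here is a sketch of how a request body for a multimodal embeddings call might be assembled. The endpoint URL, field names, and the `build_request` helper are assumptions based on the general shape of Voyage's multimodal API, not a verbatim copy of it; check the current documentation before use.

```python
import json

# Assumed endpoint; confirm against the official documentation.
API_URL = "https://api.voyageai.com/v1/multimodalembeddings"

def build_request(items, model="voyage-multimodal-3.5"):
    """Build a JSON payload for one interleaved text+image input.

    `items` is a list of dicts with a "type" ("text" or "image") and a
    "value" (the text string or a base64-encoded image). Field names in
    the payload are assumptions, not a guaranteed API contract.
    """
    content = []
    for item in items:
        if item["type"] == "text":
            content.append({"type": "text", "text": item["value"]})
        else:
            content.append({"type": "image_base64",
                            "image_base64": item["value"]})
    return {"model": model, "inputs": [{"content": content}]}

payload = build_request([
    {"type": "text", "value": "Quarterly revenue chart"},
    {"type": "image", "value": "<base64-encoded PNG>"},
])
body = json.dumps(payload)  # send with your HTTP client plus an API key header
```

A real call would POST `body` with an `Authorization: Bearer <key>` header; the official quick start and SDK handle these details for you.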
voyage-multimodal-3.5: a new multimodal retrieval frontier with video support