Introducing MongoDB’s Multimodal Search Library For Python
July 16, 2025 | Updated: August 18, 2025
AI applications increasingly rely on a variety of data types—text, images, charts, and complex documents—to drive rich user experiences. For developers building these applications, effectively searching and retrieving information that spans these data types is a real challenge: they have to weigh different chunking strategies, figure out how to incorporate figures and tables, and manage context that can bleed across chunks.
To simplify this, we're excited to announce the public preview of MongoDB's Multimodal Search Python Library. This new library makes it easy to build sophisticated applications using multimodal data, providing a single interface for integrating MongoDB Atlas Vector Search, AWS S3, and Voyage AI's multimodal embedding model, voyage-multimodal-3.
The library handles:
- Processing and storage: It stores PDFs in S3, fetching them from a URL or referencing PDFs already stored in S3. Each PDF is then split into single-page images, which are also stored in S3.
- Generating embeddings: Each page image is embedded with voyage-multimodal-3 to produce a high-quality embedding.
- Vector indexing: Finally, the embeddings are indexed with Atlas Vector Search, with a reference back to the source objects in S3.
The power of multimodal
Traditional search methods often struggle when dealing with documents that contain text alongside visual elements like charts and graphs, which are common in research papers, financial reports, and more. Developers typically need to build complex, custom pipelines to handle image storage, embedding generation, and vector indexing.
Our Multimodal Search Library abstracts this complexity away, using the best-in-class voyage-multimodal-3 embedding model. It empowers developers to build applications that can understand and search the content of images just as easily as text, enabling accurate, efficient information retrieval and richer user experiences when working with multimodal data or visually rich PDFs.

Imagine you're a financial analyst sifting through hundreds of annual reports—dense PDFs filled with text, tables, and charts—to find a specific trend. With our Multimodal Search Library, you can simply ask a question in natural language, like: "Show me all the charts illustrating revenue growth over the past three years." The library will process the query and retrieve pages containing the relevant charts from your corpus of knowledge.
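Once those reports have been ingested with the library, that question is just a text query against the indexed pages. A minimal sketch, assuming a client configured as shown in the setup section below:

# Natural-language query over the indexed report pages.
results = client.similarity_search(
    query="Show me all the charts illustrating revenue growth over the past three years",
    k=10,
)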
Likewise, consider an e-commerce platform with a large product catalog. A shopper might be looking for a specific style of shoes but may not know the right keywords to describe exactly what they are looking for. By leveraging multimodal search, the user could upload an image of the shoes they like, and the application finds visually similar in-stock items, creating a seamless product discovery journey.
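The examples below query with text; for an image query like this one, a reasonable approach is to embed the shopper's photo with Voyage AI's own Python client and run an Atlas $vectorSearch aggregation directly with PyMongo. This is a sketch, not the library's documented API: the index name (vector_index) and embedding field (embedding) are assumptions and must match how your collection is actually configured.

import os

import voyageai
from PIL import Image
from pymongo import MongoClient

# Embed the query image with the same model the library uses for documents.
vo = voyageai.Client(api_key=os.environ["VOYAGEAI_API_KEY"])
query_embedding = vo.multimodal_embed(
    inputs=[[Image.open("shoes.jpg")]],
    model="voyage-multimodal-3",
    input_type="query",
).embeddings[0]

# Search the collection the library populated, via Atlas Vector Search.
collection = MongoClient(os.environ["MONGODB_ATLAS_CONNECTION_STRING"])[
    "db_name"
]["collection_name"]
for doc in collection.aggregate([
    {
        "$vectorSearch": {
            "index": "vector_index",  # assumed index name
            "path": "embedding",  # assumed embedding field
            "queryVector": query_embedding,
            "numCandidates": 100,
            "limit": 5,
        }
    },
    {"$project": {"score": {"$meta": "vectorSearchScore"}}},
]):
    print(doc["_id"], doc["score"])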
Learn how to get started
To get started, you'll need:
- A MongoDB Atlas cluster (sign up for the free tier)
- A MongoDB collection in that cluster
- A MongoDB Atlas Vector Search index (see the sketch after this list)
- A Voyage AI API key (sign up)
- An S3 bucket (sign up)
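If you haven't created the Vector Search index yet, one way to do it is programmatically with PyMongo. This is a rough sketch: voyage-multimodal-3 produces 1024-dimensional embeddings, but the field name (embedding) and index name (vector_index) below are assumptions; match them to how the library actually stores your documents.

import os

from pymongo import MongoClient
from pymongo.operations import SearchIndexModel

collection = MongoClient(os.environ["MONGODB_ATLAS_CONNECTION_STRING"])[
    "db_name"
]["collection_name"]

# voyage-multimodal-3 returns 1024-dimensional vectors.
index_model = SearchIndexModel(
    definition={
        "fields": [
            {
                "type": "vector",
                "path": "embedding",  # assumed embedding field
                "numDimensions": 1024,
                "similarity": "cosine",
            }
        ]
    },
    name="vector_index",
    type="vectorSearch",
)
collection.create_search_index(model=index_model)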
Installation and setup
First, we’ll ensure that we can connect to MongoDB Atlas, AWS S3, and Voyage AI.
pip install pymongo-voyageai-multimodal

import os

from pymongo_voyageai_multimodal import PyMongoVoyageAI

# One client wires together MongoDB Atlas, AWS S3, and Voyage AI.
client = PyMongoVoyageAI.from_connection_string(
    connection_string=os.environ["MONGODB_ATLAS_CONNECTION_STRING"],
    database_name="db_name",
    collection_name="collection_name",
    s3_bucket_name=os.environ["S3_BUCKET_NAME"],
    voyageai_api_key=os.environ["VOYAGEAI_API_KEY"],
)

Adding documents
Next, we’ll add relevant documents for embedding generation.
from pymongo_voyageai_multimodal import TextDocument, ImageDocument

# A plain text document with optional metadata.
text = TextDocument(text="foo", metadata={"baz": "bar"})

# Download a PDF and convert each page into an ImageDocument.
images = client.url_to_images(
    "https://www.fdrlibrary.org/documents/356632/390886/readingcopy.pdf"
)

# Embed and index a mix of text and page images under explicit ids.
documents = [text, images[0], images[1]]
ids = ["1", "2", "3"]
client.add_documents(documents=documents, ids=ids)
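Embeddings are indexed asynchronously, so a search issued immediately after adding documents may come back empty. As in the S3 example later in this post, you can block until indexing completes:

# Wait until Atlas has finished indexing the new embeddings.
client.wait_for_indexing()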

Performing search
Finally, we’ll search for content most semantically similar to our query.
results = client.similarity_search(query="example", k=1)
for doc in results:
    print(f"* {doc['id']} [{doc['inputs']}]")

Loading data already stored in S3
Developers can also query against documents already stored in S3. See more information in the documentation.
import os
from pymongo_voyageai_multimodal import PyMongoVoyageAI

client = PyMongoVoyageAI(
    voyageai_api_key=os.environ["VOYAGEAI_API_KEY"],
    s3_bucket_name=os.environ["S3_BUCKET_NAME"],
    mongo_connection_string=os.environ["MONGODB_URI"],
    collection_name="test",
    database_name="test_db",
)

query = "The consequences of a dictator's peace"
url = "s3://my-bucket-name/readingcopy.pdf"

# Convert the PDF already stored in S3 into page images, then embed and index them.
images = client.url_to_images(url)
resp = client.add_documents(images)
client.wait_for_indexing()

# Retrieve the most relevant pages, returning the extracted page images as well.
data = client.similarity_search(query, extract_images=True)

print(f"Found {len(data)} relevant pages")
client.close()

A few important notes:
- Automatic updates to source data are not supported. Changes to indexed data must be made from application code via the client's add_documents and delete functions (see the sketch after this list).
- The library is primarily meant to support integrating multimodal embeddings and MongoDB Atlas over relatively static datasets. It is not intended for sophisticated aggregation pipelines that combine multiple stages, or for data that updates frequently.
- voyage-multimodal-3 is the only embedding model supported directly, and AWS is the only cloud provider supported directly.
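Since updates don't propagate automatically, refreshing an indexed document means removing the stale entries and re-adding the new version through the client. A minimal sketch, assuming delete accepts the ids used at insert time (check the documentation for its exact signature) and using a hypothetical URL:

# Replace an indexed page: remove the stale entry, then re-embed and re-index.
client.delete(ids=["2"])  # assumed signature; check the documentation
new_images = client.url_to_images("https://example.com/updated-report.pdf")  # hypothetical URL
client.add_documents(documents=[new_images[0]], ids=["2"])
client.wait_for_indexing()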
Ready to try it yourself?
Check out the GitHub project today to get started.
Learn more in our documentation, and please share feedback.
We can't wait to see what you build!