Keywords Meet Vectors: Hybrid Search on MongoDB

Modern applications increasingly rely on intelligent search capabilities. Users expect results that are both precise (matching keywords) and smart (understanding meaning). MongoDB brings these worlds together with hybrid search, combining semantic vector search and traditional full-text search into one unified experience.

What you’ll learn

In this article, you’ll learn:

  • What vector search is and how it works in MongoDB.

  • What full-text (BM25) search is and what problems it solves.

  • Why combining both techniques creates more relevant search results.

  • How embeddings are stored and indexed directly in MongoDB.

  • How to build hybrid pipelines that blend semantics and metadata (e.g., IMDb ratings).

  • How ranking strategies such as Reciprocal Rank Fusion (RRF) merge results effectively.

What is vector search?

Vector search uses embeddings — high-dimensional numerical representations created by machine learning models — to find items that are semantically similar.
Instead of matching exact words, vector search measures the meaning behind text.

This allows your application to retrieve documents that are conceptually related, even when they don’t share any keywords. For example, a vector query based on “dream heist” may return Inception even if the phrase never appears in the plot description.
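To make "measuring meaning" concrete, here is a minimal sketch of cosine similarity, one common way to compare embeddings. The three-dimensional vectors and their values are invented for illustration; real embedding models produce hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings (not produced by any real model).
query = [0.9, 0.1, 0.3]  # stands in for the query "dream heist"
movies = {
    "Inception": [0.8, 0.2, 0.4],
    "Toy Story": [0.1, 0.9, 0.2],
}

# Rank titles by how closely their vector points in the query's direction.
ranked = sorted(
    movies,
    key=lambda title: cosine_similarity(query, movies[title]),
    reverse=True,
)
print(ranked)
```

The title whose vector points in nearly the same direction as the query ranks first, even though no keywords were compared.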

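Vector search in MongoDB is powered by k-nearest neighbors (k-NN) search over embeddings stored directly in your documents, for example as compact BSON Binary (Float32) vectors. Current MongoDB versions index these fields with a Vector Search index and query them with the $vectorSearch aggregation stage; the older knnVector field type is deprecated.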
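As a rough sketch, a vector query is usually written as an aggregation pipeline whose first stage is $vectorSearch. The index name `plot_vector_index`, the field `plot_embedding`, and the helper function below are assumptions made for illustration, not names taken from the article.

```python
import json


def build_vector_search_pipeline(query_vector, limit=5):
    """Return an aggregation pipeline that runs one vector query."""
    return [
        {
            "$vectorSearch": {
                "index": "plot_vector_index",  # hypothetical index name
                "path": "plot_embedding",      # hypothetical vector field
                "queryVector": query_vector,
                # Neighbors considered before the top `limit` are returned.
                "numCandidates": limit * 20,
                "limit": limit,
            }
        },
        {
            "$project": {
                "_id": 0,
                "title": 1,
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]


# Toy query vector; a real one comes from the same model as the stored vectors.
pipeline = build_vector_search_pipeline([0.12, -0.03, 0.44])
print(json.dumps(pipeline, indent=2))
```

With PyMongo, a pipeline like this would be passed to `collection.aggregate(pipeline)` on a collection that has a matching vector index.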

What is full-text search?

Full-text search in MongoDB (via MongoDB Search and Lucene) uses BM25, a highly effective ranking algorithm for keyword-based retrieval.
It excels when the query is literal: matching exact terms and phrases, ranking documents by textual relevance, and, where synonym mappings are configured, matching synonyms.

Full-text search handles:

  • keywords

  • phrases

  • stemming

  • language analyzers

  • scoring based on term frequency and importance

For literal queries like “computer hacker” or “time travel movie,” where the user’s words are likely to appear in the documents, this is often the most accurate approach.
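To show what "scoring based on term frequency and importance" means, here is a minimal sketch of the classic Okapi BM25 formula. It is a simplification, not Lucene's implementation: it tokenizes on whitespace with no stemming or analyzers, and the sample documents are invented. The values k1=1.2 and b=0.75 are commonly used defaults.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document against the query with classic Okapi BM25."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization: long documents are penalized.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "a computer hacker learns the truth about reality",
    "a hacker breaks into a computer network",
    "a cowboy doll feels threatened by a new toy",
]
scores = bm25_scores("computer hacker", docs)
print(scores)
```

The third document shares no terms with the query and scores zero, while the shorter of the two matching documents scores slightly higher because of length normalization.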

Why use hybrid search?

When you combine vector search (semantic meaning) with full-text search (keyword precision), you get the best of both worlds.

  • Precision + context: Match exact terms and understand the deeper meaning.

  • Better ranking: Rerank by metadata — rating, genre, year, popularity.

  • All-in-one system: Run BM25 and k-NN directly inside MongoDB.

  • Production-ready pipelines: Use aggregation to filter, fuse, and customize scoring.

  • Ideal for AI: Recommendation engines, semantic search, assistants, and LLM retrieval.
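Reciprocal Rank Fusion (RRF), mentioned earlier as a ranking strategy, can be sketched in a few lines. Each document earns 1 / (k + rank) from every list it appears in, and the sums decide the final order. The constant k=60 is a commonly used default, and the two result lists are invented for illustration.

```python
def reciprocal_rank_fusion(ranked_lists, k=60):
    """Merge ranked lists of document IDs into (doc_id, score) pairs."""
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            # A document near the top of any list contributes more.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical results from a vector query and a full-text query.
vector_hits = ["Inception", "Paprika", "The Matrix"]
text_hits = ["Hackers", "The Matrix", "Inception"]

fused = reciprocal_rank_fusion([vector_hits, text_hits])
for title, score in fused:
    print(f"{title}: {score:.4f}")
```

Documents that appear in both lists rise to the top, which is why RRF works without having to compare BM25 scores against vector similarity scores directly.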

Read it here: Keywords Meet Vectors: Hybrid Search on MongoDB

Thanks @Arkadiusz_Borucki for sharing! This is a great tutorial.

Like @Veronica_Cooley-Perry said, this is a great tutorial! Along those lines, we now have a $rankFusion stage that does the same sort of thing, fusing lexical and vector searches together, but in a more streamlined way under the hood. Check it out here: $rankFusion (aggregation) - Database Manual - MongoDB Docs
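For readers who want a feel for the shape of such a stage, here is a hedged sketch of a $rankFusion pipeline built as a Python structure. The index names, field names, pipeline labels, weights, and helper function are all assumptions for illustration; check the linked documentation for the exact options and version requirements.

```python
import json


def build_rank_fusion_pipeline(query_text, query_vector, limit=10):
    """Return a pipeline that fuses a vector query and a full-text query."""
    return [
        {
            "$rankFusion": {
                "input": {
                    "pipelines": {
                        # Labels "vector" and "text" are arbitrary names.
                        "vector": [
                            {
                                "$vectorSearch": {
                                    "index": "plot_vector_index",  # hypothetical
                                    "path": "plot_embedding",      # hypothetical
                                    "queryVector": query_vector,
                                    "numCandidates": limit * 20,
                                    "limit": limit * 2,
                                }
                            }
                        ],
                        "text": [
                            {
                                "$search": {
                                    "index": "plot_text_index",    # hypothetical
                                    "text": {"query": query_text, "path": "plot"},
                                }
                            },
                            {"$limit": limit * 2},
                        ],
                    }
                },
                # Equal weights; tune these per application.
                "combination": {"weights": {"vector": 0.5, "text": 0.5}},
            }
        },
        {"$limit": limit},
    ]


pipeline = build_rank_fusion_pipeline("dream heist", [0.12, -0.03, 0.44])
print(json.dumps(pipeline, indent=2))
```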

I’ve just written a bit about fusion techniques using this new operator as well as the cool $scoreFusion stage ($scoreFusion (aggregation) - Database Manual - MongoDB Docs). It goes well with this article, showing how these fusion techniques work at their most basic level: no lexical or vector searches, just two ranked/scored lists being blended together. Check it out here:

Reciprocal Rank Fusion and Relative Score Fusion: Classic Hybrid Search Techniques

Hi @Erik_Hatche

Thank you for sharing this information. I see that this is a new aggregation pipeline stage. I’ll use this next time!
