Vector search and, more broadly, artificial intelligence (AI) are more popular now than ever. These terms seem to appear everywhere, and technology companies around the globe are scrambling to release vector search and AI features in an effort to be part of this growing trend. As a result, it's rare to come across the homepage of a data-driven business without seeing a reference to vector search or large language models (LLMs). In this blog, we'll cover what these terms mean while examining the events that led to their current popularity.
Check out our AI resource page to learn more about building AI-powered apps with MongoDB.
What is vector search?
Vectors are encoded representations of unstructured data like text, images, and audio in the form of arrays of numbers.
These vectors are produced by machine learning (ML) models called "embedding models," which are trained on large corpora of data. Embedding models effectively capture meaningful relationships and similarities between data points, enabling users to query data based on its meaning rather than on exact matches against the data itself. This unlocks more efficient data analysis tasks like recommendation systems, language understanding, and image recognition.
Every search starts with a query, and in vector search, the query is represented by a vector. The job of vector search is to find, among the vectors stored in a database, those that are most similar to the query vector. This is the basic premise: it is all about similarity, which is why vector search is often called similarity search. (Note that similarity also applies to ranking algorithms that work with non-vector data.)
To understand the concept of vector similarity, let’s picture a three-dimensional space. In this space, the location of a data point is fully determined by three coordinates.
In the same way, if a space has 1024 dimensions, it takes 1024 coordinates to locate a data point.
Vectors also provide the location of data points in multidimensional spaces. In fact, we can treat the values in a vector as an array of coordinates. Once we have the location of the data points — the vectors — their similarity to each other is calculated by measuring the distance between them in the vector space. Points that are closer to each other in the vector space represent concepts that are more similar in meaning.
For example, "tire" is strongly related to "car" and more weakly to "airplane"; "wing," on the other hand, is related only to "airplane." Therefore, the distance between the vectors for "tire" and "car" is smaller than the distance between the vectors for "tire" and "airplane," while the distance between "wing" and "car" is large. In other words, "tire" is relevant when we talk about a "car," and to a lesser extent an "airplane," whereas a "wing" is relevant only to an "airplane" and not at all to a "car" (at least until flying cars are a viable mode of transport). This contextualization of data, regardless of its type, is what allows vector search to retrieve the most relevant results for a given query.
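To make this concrete, here is a toy sketch in Python. The three-dimensional vectors below are invented purely for illustration (real embedding models produce vectors with hundreds or thousands of dimensions); only the relative distances between them matter.

```python
import math

# Hand-made toy vectors purely for illustration; a real embedding model
# would generate these automatically from the words themselves.
vectors = {
    "car":      [0.9, 0.8, 0.1],
    "tire":     [0.8, 0.7, 0.2],
    "airplane": [0.3, 0.6, 0.9],
    "wing":     [0.2, 0.5, 0.9],
}

def euclidean(a, b):
    # Straight-line distance between two points in the vector space.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

for word in ("car", "airplane"):
    print(f"tire -> {word}: {euclidean(vectors['tire'], vectors[word]):.3f}")
    print(f"wing -> {word}: {euclidean(vectors['wing'], vectors[word]):.3f}")
# "tire" lands closer to "car" than to "airplane"; "wing" is the opposite.
```

With these toy coordinates, "tire" sits about 0.17 units from "car" but roughly 0.87 units from "airplane," mirroring the intuition above.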
A simple example of similarity
What are Large Language Models?
LLMs are what bring AI to the vector search equation. LLMs and human minds both understand and associate concepts in order to perform certain natural language tasks, such as following a conversation or understanding an article. LLMs, like humans, need training in order to understand different concepts. For example, do you know what the term “corium” pertains to? Unless you're a nuclear engineer, probably not. The same happens with LLMs: if they are not trained in a specific domain, they are not able to understand concepts and therefore perform poorly. Let’s look at an example.
LLMs understand pieces of text thanks to their embedding layer. This is where words or sentences are converted into vectors. In order to visualize vectors, we are going to use word clouds. Word clouds are closely related to vectors in the sense that they are representations of concepts and their context. First, let’s see the word cloud that an embedding model would generate for the term “corium” if it was trained with nuclear engineering data:
As shown in the picture above, the word cloud indicates that corium is a radioactive material that has something to do with safety and containment structures. But, corium is a special term that can also be applied to another domain. Let’s see the word cloud resulting from an embedding model that has been trained in biology and anatomy:
In this case, the word cloud indicates that corium is a concept related to skin and its layers. What happened here? Is one of the embedding models wrong? No. They have simply been trained on different data sets. That is why finding the most appropriate model for a specific use case is crucial. A common industry practice is to adopt a pre-trained embedding model with strong general background knowledge and then fine-tune it with the domain-specific data needed to perform particular tasks.
The quantity and quality of the data used to train a model matter as well. We can agree that a person who has read just one article on aerodynamics will be less informed on the subject than a person who studied physics and aerospace engineering. Similarly, models trained on large sets of high-quality data will be better at understanding concepts and at generating vectors that accurately represent them. This creates the foundation for a successful vector search system.
It is worth noting that although LLMs use text embedding models, vector search goes beyond text: it can deal with audio, images, and more. The embedding models used in these cases follow the same approach. They, too, must be trained on data — images, sounds, etc. — in order to understand the meaning behind it and produce the appropriate similarity vectors.
When was vector search created?
MongoDB Atlas Vector Search currently provides three approaches to calculate vector similarity. These are also referred to as distance metrics, and consist of:

- Euclidean distance
- Cosine similarity
- Dot product
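As a sketch, the three metrics can be computed in a few lines of plain Python. This is purely illustrative of the math involved; Atlas Vector Search computes the chosen metric internally when a query runs.

```python
import math

def euclidean_distance(a, b):
    # Straight-line distance; smaller means more similar.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dot_product(a, b):
    # Considers both angle and magnitude; larger means more similar.
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    # Angle between the vectors, ignoring magnitude; 1.0 means same direction.
    return dot_product(a, b) / (
        math.sqrt(dot_product(a, a)) * math.sqrt(dot_product(b, b))
    )

a, b = [1.0, 0.0], [0.0, 1.0]
print(euclidean_distance(a, b))  # ~1.414 (sqrt of 2)
print(dot_product(a, b))         # 0.0
print(cosine_similarity(a, b))   # 0.0 (perpendicular vectors)
```

Note that cosine similarity ignores vector length, which is why it is often preferred when embeddings are not normalized to the same magnitude.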
While each metric is different, for the purpose of this blog, we will focus on the fact that they all measure distance. Atlas Vector Search feeds these distance metrics into an approximate nearest neighbor (ANN) algorithm to find the stored vectors that are most similar to the vector of the query. In order to speed this process up, vectors are indexed using an algorithm called hierarchical navigable small world (HNSW). HNSW guides the search through a network of interconnected data points so that only the most relevant data points are considered.
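For intuition, here is a sketch of exact nearest-neighbor search — the brute-force scan that ANN algorithms like HNSW approximate without visiting every vector. The stored vectors and the query vector are toy values invented for this example, and cosine similarity is assumed as the metric.

```python
import math

def cosine_similarity(a, b):
    # Angle between the vectors; closer to 1.0 means more similar.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

def exact_nearest(query, stored, k=2):
    # Score EVERY stored vector against the query, then keep the top k.
    # HNSW avoids this full scan by walking a graph of linked neighbors,
    # considering only the most promising candidates along the way.
    scored = sorted(stored.items(),
                    key=lambda kv: cosine_similarity(query, kv[1]),
                    reverse=True)
    return [name for name, _ in scored[:k]]

stored = {
    "car":      [0.9, 0.8, 0.1],
    "airplane": [0.3, 0.6, 0.9],
    "wing":     [0.2, 0.5, 0.9],
}
query = [0.8, 0.7, 0.2]  # pretend this is the embedding of "tire"
print(exact_nearest(query, stored))  # → ['car', 'airplane']
```

The brute-force scan is exact but scales linearly with the number of stored vectors, which is why production systems trade a little accuracy for the logarithmic-style traversal of an HNSW index.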
Using one of the three distance metrics in conjunction with nearest neighbor algorithms like HNSW and k-nearest neighbors (KNN) constitutes the foundation for performing vector search on MongoDB Atlas. But how old are these technologies? We might assume they are recent inventions from a bleeding-edge quantum computing lab, but the truth is far from that.
Euclidean distance was formulated in the year 300 BC, the cosine and the dot product in 1881, the KNN algorithm in 1951, and the HNSW algorithm in 2016. What this means is that the foundations for state-of-the-art vector search were fully available back in 2016. So, although vector search is today’s hot topic, it has been possible to implement it for several years.
When were LLMs created?
In 2017, there was a breakthrough: the transformer architecture. Presented in the famous paper "Attention Is All You Need," this architecture introduced a neural network model for natural language processing (NLP) tasks that enabled ML algorithms to process language data at a scale orders of magnitude greater than was previously possible. As a result, the amount of information that could be used to train models increased exponentially, paving the way for the first LLM to appear in 2018: GPT-1 by OpenAI. LLMs use embedding models to understand pieces of text and perform certain natural language tasks like question answering or machine translation. LLMs are essentially NLP models that were re-branded for the large amounts of data they are trained on — hence the word "large" in LLM. The graph below shows the scale of ML models, measured in parameters, over the years; a dramatic increase can be observed after the transformer architecture was published in 2017.
Why are vector search and LLMs so popular?
As stated above, the technology for vector search was fully available back in 2016. However, it did not become particularly popular until the end of 2022. Why?
Although the ML industry has been very active since 2018, LLMs were not widely available or easy to use until OpenAI’s release of ChatGPT in November 2022. The fact that OpenAI allowed everyone to interact with an LLM with a simple chat is the key to its success. ChatGPT revolutionized the industry by enabling the average person to interact with NLP algorithms in a way that would have otherwise been reserved for researchers and scientists. As can be seen in the figure below, OpenAI’s breakthrough led to the popularity of LLMs skyrocketing. Concurrently, ChatGPT became a mainstream tool used by the general public. The influence of OpenAI on the popularity of LLMs is also evidenced by the fact that both OpenAI and LLMs had their first popularity peak simultaneously. (See figure 8.)
Here is why: OpenAI made LLMs famous with the release of ChatGPT, and because LLMs work with embeddings, storing and searching large numbers of vectors suddenly became a widespread challenge. The adoption of vector search therefore increased in tandem, and this was the largest contributing factor to the industry shift — one that led many data companies to introduce support for vector search and other functionality related to LLMs and the AI behind them.
Vector search is a modern disruptor. The increasing value of vector embeddings and of the mathematical search processes behind them has catalyzed the adoption of vector search and is transforming the field of information retrieval. Vector generation and vector search may be independent processes, but when they work together, their potential is vast.