Database Search Explained Every day, each of us conducts countless database searches without even thinking about it — the movie we choose on Netflix, browsing shoes on our favorite retailer website, even logging into our home Wi-Fi. These are all examples of database searches, or queries, made in the course of daily life.
Read on to discover what database search really is, understand how database search works, and learn how to conduct your own database searches using our tutorial.
Note: The technical term for a database search is a query. If you came across this article looking for free text search, which is similar to how search engines work and allows you to search for words and phrases in blocks of text, then you can skip to the section How to build a search engine.
Table of contents
Databases are collections of individual data records organized to make it easy to access and update them. Databases usually store structured data, which means that a record will contain one or more fields, each with a name. An example would be a payroll record, which might contain fields for an employee ID, employee name, employee date of birth, and department.
A simple database, like a key-value store, may only allow you to find records where a field exactly matches the value that you provide, or maybe with some flexibility, like a range of employee IDs, or a subset of an employee name.
More powerful databases, like MongoDB, will allow you to search for specific records (we call them documents in MongoDB) using a flexible range of criteria. An example might be "all employees in department X AND who were born this month." (This is not an actual database query string!) Because sometimes, the kind of query you need to make might be quite complex, database query languages can be quite complex too.
MongoDB is queried with a set of function calls, called the MongoDB Query API. (It used to be called the MongoDB Query Language, or MQL.) The exact API depends on the driver that you're using. MongoDB provides drivers for all popular programming languages. The MongoDB Compass application even provides the ability to search your data with natural language. If, instead, you're using a relational database, it will probably be queried using Structured Query Language or SQL. Other databases have different query languages tailored to the type of data that they store.
Queries in MongoDB are provided to the API methods as structured data. A query in MongoDB is actually a BSON document, as is the data that's stored in the database. Depending on the driver, the query will be encoded as BSON behind the scenes, so this isn't something that the user has to worry about.
The query data is sent to the database, which will then generate an efficient query plan, which is a series of operations the database will execute to locate data matching the query within your database. The specifics of how a query plan is generated are unique to each database engine, but the end result is similar — the database is attempting to work out what indexes (see the upcoming section, Indexes) to use and in what order to search for the data your query has requested.
MongoDB's Query API has been tailored toward searching for documents stored in MongoDB, where fields can contain arrays of data, or even subdocuments, containing nested fields.
A search for a single employee document in MongoDB by employee ID would look something like this:
payrollCollection.findOne({
employee_id: "eid-654321"
})
The query is contained within the curly braces and is executed by calling the collection's findOne
method, which returns a single document, or null
if no document is found.
More complicated queries are also possible, such as querying the value of embedded data. The following query searches for all documents that contain a "size" subdocument that in turn contains an "h" field with a value less than 15:
// Search for documents that contain a "size" subdocument containing "h" less than 15:
db.inventory.find( { "size.h": { $lt: 15 } } )
Because MongoDB can store arrays as field values, there are a number of different ways to search for documents where an array exactly matches some criteria, contains individual items, or overall matches some criteria.
// Search for documents where "tags" is an array exactly matching the provided array:
db.inventory.find( { tags: ["red", "blank"] } );
// Search for documents where "tags" is an array containing both "red" and "blank" in any order:
db.inventory.find( { tags: { $all: ["red", "blank"] } } );
// Search for documents where "tags" contains "red":
db.inventory.find( { tags: "red" } );
Different parts of a find query can be combined using operators like "$and" and "$or" to build complex descriptions of the kind of data you're searching for. In fact, the middle expression above can be rewritten like this:
// This is the same as using "$all" in the query above:
db.inventory.find({tags: "red"}, {tags: "blank"});
Aggregation queries are the most powerful of all, allowing you to search for your data using a pipeline of operations that, step-by-step, filter and transform the data you're searching for, before returning the results. Aggregation operations in MongoDB process multiple documents and return computed results. In the example below, we utilize two stages, $match (similar to $eq) and $count, to retrieve the total number of inventory items with the tag “red.”
db.inventory.aggregate([
{
// first stage
$match: {
tags: "red"
}
},
{
// second stage
$count: 'total'
}
]);
Indexes work in a similar way to the old card-file indexes that were once used in libraries, where cards would be stored in a box, sorted by title or name, allowing the librarian to find a card for a particular book. The card would contain the location in the library where the book could be found, so the librarian could walk directly to the right place, instead of having to search every book in the library by hand.
Most databases support indexes in some form, even if you don't explicitly create an index on your data. In MongoDB, for example, every document has a primary key, stored in the "_id"
field, that is always indexed. So if you only ever look up your document by _id"
, then you won't need to create any other indexes. We can check it by running the following command:
db.inventory.getIndexes()
/*
Result:
[
{
"v": 2,
"key": {
"_id": 1
},
"name": "_id_"
}
]
*/
One of the most important ways to make your searches more efficient is to have an index that supports the queries you are executing. If you don't have an appropriate index for the query being executed, then MongoDB must scan every document in the collection to find matching results. If you have an appropriate index, then MongoDB can jump directly to a subset of matching documents instead of searching all of them. Even if the index only supports part of your query, it can still be used to whittle down all of the documents so that only a small subset needs to be more slowly scanned.
Indexes come with a performance cost for database writes, so in this case, you need to balance the speed of searching for data with the speed of updating data.
In any database, how you organize your data is going to have a big impact on how quickly you can find the data that you're looking for. Data modeling is an advanced topic, but the golden rule in MongoDB is that data that is accessed together should be stored together. What this means is that if related data, such as the comments associated with a blog post, are always loaded with the blog content, then the comments and the blog post itself should be stored in a single document. If the comments are stored in separate documents, then the database needs to do extra work, effectively searching for many documents before returning the results.
You can find out more about how to structure your data to make queries more efficient in the MongoDB documentation on data modeling, and you can learn some useful, reusable MongoDB design patterns, many of which are designed to make your database searches more efficient.
It's important to understand the work that your database has to do to find the documents you're searching for, and you can find out what the database is doing by using MongoDB's explain feature. Providing a database query to explain will list the steps that the database executed when searching for the relevant data and will highlight when no index was used to look up the data and how many documents had to be scanned.
You should use explain as a routine part of building your database application and not just when you notice a particularly slow query. Optimizing your database searches will save you money as well as make your application more responsive because less hardware will be required to run all of your database searches.
Many applications require a "search" feature, which works a little differently from a typical database search. A search box will usually search all of the text in each document, looking for a particular word or phrase. Maybe the word has been misspelled or even a synonym has been used! This database feature is called "full-text search," and in MongoDB Atlas, you can implement full-text search on one or more text fields by creating an Atlas Search index.
Behind the scenes, Atlas Search creates a text index using Lucene, which is a system used by many database text search platforms. Lucene breaks down the content of a text field into a series of words or tokens and then creates an efficient index linking each word to the documents that contain it. When it comes time to search for matching documents, the Lucene index is used to look up matching documents, and because sometimes, document content doesn't exactly match the query, documents can be ordered by how closely they match. This is pretty much how search engines work, albeit on a massive scale.
Here's a great tutorial on how to build a search service in Java.
It's often useful to be able to search for documents by location. Maybe a document contains the location of a restaurant or the outline of a city, or you would like to search for all of the restaurants within a particular city boundary. Some systems use MongoDB to track the location of ships in the ocean and to predict intercept courses for cargo vessels.
These types of search are called geospatial queries. MongoDB supports the use of GeoJSON structures to store this kind of geographic information in the database and can index them efficiently, so searching your database by location is efficient.
The fast growth of AI has resulted in powerful new ways to search your database. There are a number of specialized vector databases that can store embeddings. These are special data structures that describe the semantic meaning of a set of data. This can allow you to search for text data by describing it in general terms, not just the literal contents of the document. It can also be used to create indexes for image, audio, or video data, allowing you to search for an image by describing its contents, or allowing you to compare the noise of a car engine to a database of car engine noises that describe various engine faults. Vector search also allows you to, for example, search for an image with another image, allowing you to find similar media.
MongoDB Atlas Vector Search implements vector indexes in MongoDB Atlas, and this new way to index and search for data is very powerful, especially because these queries can be combined with all of the other features of the MongoDB Query API. To work with embeddings, we need to convert the data into embeddings using machine learning models such as OpenAI or Hugging Face. Subsequently, we utilize MongoDB's vector search index to perform similarity searches.
There are many resources to help you learn more about how to search for data in MongoDB. One MongoDB tutorial contains a section on querying documents. We also have an in-depth tutorial on the Aggregation Framework. Additionally, check out the MongoDB University courses on inserting and finding documents, the Aggregation Framework, Atlas Search, and Atlas Vector Search.
Depending on the kind of application you're building, you don't need to know everything about how to search your database and how to optimize your queries. But, knowing about the different types of indexes, and that there are techniques for effectively modeling your data to make your searches more efficient, will ensure that you can effectively scale your database and application as you need to store more data and as your users drive more database searches.
If you are writing software that needs to search a MongoDB database, you should use one of the language drivers that MongoDB provides for all popular programming languages. If you are making ad-hoc queries or building a query to insert into your code, there are a number of tools you can use to search the database:
Mongosh is a command-line tool that allows you to write and execute queries in a JavaScript shell.
MongoDB Compass is a GUI application that provides a useful interface for building queries, especially aggregation pipelines, for searching your database. You can also save frequently used queries so that they can be reused in the future.
MongoDB Atlas Data Explorer is part of the MongoDB Atlas web interface and allows you to build queries and search your database in a similar way to Compass.