MongoDB as a Graph Database

MongoDB offers graphing capabilities with its $graphLookup stage. Give $graphLookup a try by creating a free cluster in MongoDB Atlas.

Graph databases fulfill a need that traditional databases have left unmet: They prioritize relationships between entities. This makes graph databases incredibly efficient for looking for patterns, making predictions, and finding solutions.

While it’s a general purpose document database, MongoDB provides graph and tree traversal capabilities with its $graphLookup stage in the aggregation pipeline. For applications that require more advanced graph capabilities or that use graph capabilities frequently, MongoDB can also be coupled with a dedicated graph database.

In this article, we'll explore the answers to the following questions:

What is a Graph Database?

A graph database is a database that stores data in nodes and edges—nodes for information about an entity and edges for information about the relationship or actions between nodes. This data model allows graph databases to prioritize relationships between the data, which makes searching for connections and patterns in the data easier than it is with traditional databases.

Graph databases use the following key terms:

  • Nodes (or vertices). You can think of nodes as the nouns in your databases; they store information about people, places, and things.
  • Edges (or relationships). You can think of edges as the verbs in your databases; they store information about the actions that are taken between the nodes.
  • Properties. A property is a key-value pair that stores information about a particular node or edge.
  • Labels. A label can optionally be used to tag a group of related nodes.

The section below contains an example of how data is modeled in a graph database using nodes, edges, properties, and labels.

How is Data Stored in a Graph Database? An Example

Let's walk through how data is stored in a graph database. Let's say you want to store air travel information.

Each airport would be represented by a node, and information about the airports would be stored in the nodes' properties. For example, you could create a node for Auckland Airport in New Zealand with properties such as airport_code: AKL and country: New-Zealand.

A flight between two airports—that is, the relationship between two nodes—is represented by an edge. The edge's properties contain information about the flight. In this example, you could create an edge between the nodes for Auckland Airport and Brisbane Airport. The edge could have properties like airplane: 73H 388 and airline: Qantas.

Two ovals (one with information about Auckland Airport and one with information about Brisbane Airport) are connected by an arrow (edge). The arrow has the following text beneath it: airplane: 73H 388 and airline: Qantas

A node represents Auckland Airport, with its properties listed. An edge represents a flight from Auckland Airport to Brisbane Airport, with properties for airplane and airline.

In addition to the nodes for airports, you might create nodes for other related entities such as airport restaurants. You might choose to apply a label like "Airport" to the airport nodes and "Restaurant" to the restaurant nodes. The diagram below shows a graph with several nodes, edges, properties, and labels.

A graph representation of three airports, four flights, and two restaurants. The airport nodes are labeled Airport at the top of their ovals. The restaurant nodes are labeled Restaurant at the top of their ovals.

A graph that represents airports, airport restaurants, and flights.
Open image in new tab.

Why Should You Care About Graph Databases?

Because graph databases prioritize relationships, they allow users to efficiently query those relationships between nodes to look for patterns and make predictions. In a world where intelligently making decisions and recommendations is becoming increasingly important, graph databases shine where relational databases struggle.

Additionally, graph databases provide a flexible schema. This allows developers to easily make changes to their database as requirements inevitably change and evolve. Flexible schemas are hugely valuable for agile teams building modern applications.

What are Common Use Cases of Graph Data?

Graph databases are suited for a variety of use cases. Some of the most common are detecting fraud, building recommendation engines, managing IT networks, and computing graph algorithms between data.

  • Detecting fraud. Graph databases are commonly used for the detection of money laundering, credit card fraud, insurance fraud, tax fraud, and other criminal activity. In recent years, criminals have become increasingly sophisticated in their schemes. Rather than working from a single point or account, they create a fraud ring of smaller schemes that are much more likely to go undetected. Graph databases allow analysts to look at the connections between the data to find patterns of fraud.

  • Building recommendation engines. Modern businesses want to provide intelligent recommendations to their customers as a strategic way to increase revenue. Graph databases allow an application to analyze previous purchase data to determine what the customer may want to purchase next.

  • Managing IT networks. Graph databases can help IT specialists model and manage networks, IT infrastructure, and IoT devices. Because these entities are physically connected in the real world, they map naturally to graphs. Graph databases can enable organizations to easily perform impact analysis and plan for outages.

  • Computing graph algorithms. When you need to efficiently perform calculations on the connections between data, graph databases are a great option. They are used extensively for computing graph algorithms such as shortest path and PageRank. These algorithms are used in everything from finding the cheapest flight between two locations to determining influence in social networks.

What are the Key Considerations for Graph Databases?

As described above, graph databases are a great choice when you need to analyze relationships between entities to look for patterns, generate intelligent recommendations, model networks, or compute graph algorithms.

Graph databases tend to be less performant for non-traversal queries. Most applications require such queries, so they require a general purpose database. For applications that have use cases that would benefit from both a graph database as well as a general purpose database, developers have two options:

  • Couple a general purpose database (like a document or relational database) with a graph database.
  • Use a general purpose database (like MongoDB) that has graphing capabilities.

How Does MongoDB Integrate with Graph Data?

MongoDB is a general purpose document database. As shown in the diagram below, the document model is a superset of other ways to work with data including key-value pairs, relational, objects, graph, and geospatial.

A multi-layer diagram. The top layer is the document model. The second layer has sections for key-value pairs, relational, objects, graph, and geospatial. The third layer is the unified interface. The outermost layer has sections for transactional, search, mobile, real-time analytics, and data lake.

Documents are a superset of all other data models.

MongoDB has a $graphLookup aggregation pipeline stage for querying graphs. $graphLookup is ideal for traversing graphs, trees, and hierarchical data. Keep in mind that $graphLookup may not be as performant or provide the full graphing algorithm capabilities that a standalone graph database does.

If your application will be frequently running graphing queries, consider coupling your MongoDB database with a graph database. Note that coupling databases introduces significant complexity into your architecture, requiring you to keep data in sync between two distinct systems as well as use different languages to query them. Therefore, if your requirement for graph queries can be served by the capabilities that are built into MongoDB, you may be better off keeping everything together in one place and using a single API to interact with your data.

For more information on the graph capabilities in MongoDB, check out this webinar on working with graph data in MongoDB.

How Can You Execute a Graph Query in MongoDB?

MongoDB has a $graphLookup aggregation pipeline stage that performs a recursive search. $graphLookup can be used to traverse connections between documents.

Let's take a look at an example from the routes collection, which is included in the MongoDB Atlas sample dataset. The routes collection contains information about airline routes between airports. Below is a sample document from the routes collection.

{
   _id:"56e9b39c732b6122f878b0cc",
   airline: {
     id: 4089, 
     name: 'Qantas', 
     alias: 'QF', 
     iata: 'QFA' 
   },
   src_airport: 'AKL',
   dst_airport: 'BNE',
   airplane: '73H 388'
 }

The document above stores information about a route from Auckland Airport (AKL) to Brisbane Airport (BNE). The document indicates that the airplanes assigned to this route are the Boeing 737-800 (73H) and the Airbus 380-800 (388). The document also indicates that the airline that operates this route is Qantas.

Let's query for all possible destinations that can be reached from Auckland Airport either directly or through connecting flights. We can use MongoDB's aggregation pipeline to craft our query. We can begin our aggregation pipeline with the $match stage to match documents where the src_airport is AKL. Then we can use the $graphLookup stage to recursively search for documents where the src_airport matches the current document's dst_airport. Below is the full aggregate command to be run in the MongoDB shell. Note that this query will likely take a few minutes to run as it searches for all possible routes.

db.routes.aggregate(
[
  {
    $match: {
        src_airport: 'AKL'
    }
  }, 
  {
    $graphLookup: {
        from: 'routes',
        startWith: '$dst_airport',
        connectFromField: 'dst_airport',
        connectToField: 'src_airport',
        as: 'routesThatCanBeReached',
        depthField: 'numberOfAdditionalStops'
    }
 }
])

The result is a set of documents. Each document in the result is a route that originates from AKL. Each document also contains an array named routesThatCanBeReached, which lists all of the routes that can be reached either directly or indirectly from the route listed in the document. The numberOfAdditionalStops field indicates how many additional stops a passenger would have after their first flight in order to complete this route. Below is part of the output from running the aggregate command above.

[
 {
   _id: "56e9b39c732b6122f878b0cc",
   airline: { id: 4089, name: "Qantas", alias: "QF", iata: "QFA" },
   src_airport: "AKL",
   dst_airport: "BNE",
   airplane: "73H 388",
   routesThatCanBeReached: [
     {
       _id: "56e9b39c732b6122f8787a6d",
       airline: { id: 2822, name: "Iberia Airlines", alias: "IB", iata: "IBE" },
       src_airport: "ORY",
       dst_airport: "IBZ",
       airplane: "320",
       numberOfAdditionalStops: 3
     },
     {
       _id: "56e9b39c732b6122f87882a1",
       airline: { id: 3052, name: "Jetstar Airways", alias: "JQ", iata: "JST" },
       src_airport: "BNE",
       dst_airport: "LST",
       airplane: "320",
       numberOfAdditionalStops: 0
     }
     ...
   ]
 },
 {
   _id: "56e9b39c732b6122f878b0cd",
   airline: { id: 4089, name: "Qantas", alias: "QF", iata: "QFA" },
   src_airport: "AKL",
   dst_airport: "MEL",
   airplane: "73H",
   routesThatCanBeReached: [
     ...    
   ]
 },
 ...
]

Let's say we want to further refine this query. We want to limit our results to trips with, at most, one stop. To do this, we can set maxDepth to 0 to indicate the query should use a maximum recursion depth of 0. We can also choose to only search for flights on a particular airline by using the restrictSearchWithMatch property. Below is the full aggregate command to be run in the MongoDB shell:

db.routes.aggregate([
    {
        '$match': {
            'src_airport': 'AKL', 
            'airline.name': 'Qantas'
        }
    }, {
        '$graphLookup': {
            'from': 'routes', 
            'startWith': '$dst_airport', 
            'connectFromField': 'dst_airport', 
            'connectToField': 'src_airport', 
            'as': 'airportsThatCanBeReachedWithOnlyOneStop', 
            'maxDepth': 0, 
            'restrictSearchWithMatch': {
                'airline.name': 'Qantas'
            }
        }
    }
  ]
)

Just like the previous command that we ran, the result is a set of documents. Each document contains an array named airportsThatCanBeReachedWithOnlyOneStop, which contains information about routes that begin at the connection and are operated by Qantas. Below is part of the output from running the aggregate command above.

[
  {
    _id: ObjectId("56e9b39c732b6122f878b0cc"),
    airline: { id: 4089, name: 'Qantas', alias: 'QF', iata: 'QFA' },
    src_airport: 'AKL',
    dst_airport: 'BNE',
    airplane: '73H 388',
    airportsThatCanBeReachedWithOnlyOneStop: [
      {
        _id: ObjectId("56e9b39c732b6122f878b104"),
        airline: { id: 4089, name: 'Qantas', alias: 'QF', iata: 'QFA' },
        src_airport: 'BNE',
        dst_airport: 'LDH',
        airplane: 'DH8'
      },
      {
        _id: ObjectId("56e9b39c732b6122f878b0f8"),
        airline: { id: 4089, name: 'Qantas', alias: 'QF', iata: 'QFA' },
        src_airport: 'BNE',
        dst_airport: 'CBR',
        airplane: '717 73H'
      },
…
 {
    _id: ObjectId("56e9b39c732b6122f878b0cd"),
    airline: { id: 4089, name: 'Qantas', alias: 'QF', iata: 'QFA' },
    src_airport: 'AKL',
    dst_airport: 'MEL',
    airplane: '73H',
    airportsThatCanBeReachedWithOnlyOneStop: [
      {
        _id: ObjectId("56e9b39c732b6122f878b1bd"),
        airline: { id: 4089, name: 'Qantas', alias: 'QF', iata: 'QFA' },
        src_airport: 'MEL',
        dst_airport: 'HBA',
        airplane: '73H'
      },
    …
    ]
}
…
]

The examples above are just a taste of what you can do with $graphLookup. Check out the official MongoDB documentation on $graphLookup, the free MongoDB University Course on the Aggregation Framework, or this video tutorial on how to analyze network graph data in MongoDB for more information.

Getting Started with MongoDB as a Graph Database

MongoDB Atlas
MongoDB offers graphing capabilities with its $graphLookup stage. Give $graphLookup a try by creating a free cluster in MongoDB Atlas, MongoDB’s fully managed Database-as-a-Service.
MongoDB University
Learn how to use $graphLookup in the free MongoDB University Course: M121 The MongoDB Aggregation Framework.

FAQ

Is MongoDB a graph database?

No, MongoDB is not a graph database. MongoDB is a general purpose document database with graph traversal capabilities.

When should you use a graph database vs MongoDB?

Most applications will require a general purpose database like MongoDB. Some applications will also require graph capabilities. In those cases, MongoDB's $graphLookup may provide sufficient graph capabilities, providing developers with the simplicity of using a single database. Other applications will require more advanced graph capabilities that are currently only available in a dedicated graph database. In those cases, a graph database can be coupled with a general purpose database like MongoDB.

Can MongoDB be used as a graph database?

MongoDB's $graphLookup aggregation pipeline stage can be used to traverse graphs. For applications that require more advanced graph capabilities, a general purpose database like MongoDB can be coupled with a dedicated graph database.