
How to Query from Multiple MongoDB Databases Using MongoDB Atlas Data Lake

Joe Karlsson • Published Feb 07, 2022 • Updated Aug 17, 2022
AWS • Atlas • Data Federation
As of June 2022, the functionality previously known as Atlas Data Lake is now named Atlas Data Federation. Atlas Data Federation’s functionality is unchanged and you can learn more about it here. Atlas Data Lake will remain in the Atlas Platform, with newly introduced functionality that you can learn about here.
Have you ever needed to run queries across databases, clusters, or data centers, or even combine that data with data stored in an AWS S3 bucket? You probably haven't had to do all of these at once, but I'm guessing you've needed to do at least one of them at some point in your career. I'll also bet that you didn't know that this is possible (and easy) to do with MongoDB federated queries on a MongoDB Atlas Data Lake! Federated queries let you configure multiple remote MongoDB deployments and then query across all of the configured deployments.
MongoDB Atlas Data Lake design graphic showing a lake with JSON brackets for waves and a sample data overlay.
MongoDB Federated Query allows you to perform queries across many MongoDB systems, including clusters, databases, and even AWS S3 buckets. Here's how MongoDB federated query works in practice.
Diagram showing how MongoDB Atlas Data Lake uses a compute plane to distribute and perform queries across multiple MongoDB Databases.
Note: In this post, we will be demoing how to query from two separate databases. However, if you want to query data from two separate collections that are in the same database, I would personally recommend the $lookup aggregation pipeline stage instead. $lookup performs a left outer join to an unsharded collection in the same database, pulling in matching documents from the "joined" collection for processing. In this scenario, using a data lake is not necessary.
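For that same-database case, a minimal $lookup sketch might look like the following. It assumes the sample_mflix sample dataset is loaded, where each document in comments carries a movie_id that matches _id in movies. The variable name is mine, and the commented-out lines show how you could run the pipeline from a Playground or the MongoDB Shell.

```javascript
// Sketch: join two collections that live in the SAME database with $lookup.
// Assumption: the sample_mflix dataset, where comments.movie_id references movies._id.
const commentsWithMovies = [
  // Limit before the join so the example stays cheap to run.
  { $limit: 5 },
  {
    $lookup: {
      from: "movies",          // collection to join, in the same database
      localField: "movie_id",  // field on the comments documents
      foreignField: "_id",     // field on the movies documents
      as: "movie"              // matching movies are stored in this array field
    }
  }
];

// In a Playground or mongosh session connected to the cluster:
// use("sample_mflix");
// db.comments.aggregate(commentsWithMovies);
```

Because $lookup is a left outer join, a comment with no matching movie is still returned, just with an empty movie array.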
tl;dr: In this post, I will guide you through the process of creating and connecting to a Data Lake in MongoDB Atlas, configuring paths to collections in two separate MongoDB databases stored in separate datacenters, and querying data from both databases using only a single query.

Prerequisites

To follow along with this tutorial, you need to:
Screenshot of MongoDB Atlas Cluster overview page. There are red boxes highlighting that the two clusters used in this example are being hosted in two different cloud providers, AWS and GCP.

Deploy a Data Lake

First, make sure you are logged into MongoDB Atlas. Next, select the Data Lake option on the left-hand navigation.
Screenshot from the MongoDB Atlas cluster overview page with a red box highlighting the Data Lake navigation button on the left side of the screen.
Create a Data Lake.
  • For your first Data Lake, click Create a Data Lake.
  • For your subsequent Data Lakes, click Configure a New Data Lake.
Screenshot from the MongoDB Atlas Data Lake overview page with a red box highlighting the Create Data Lake button.
Click Connect Data on the Data Lake Configuration page, and select MongoDB Atlas Cluster. Select your first cluster, then enter sample_airbnb as the database and listingsAndReviews as the collection. For this tutorial, we will be analyzing Airbnb rental data and some sample weather data to see if we can draw any insights into how the weather relates to renting behavior on Airbnb.
Screenshot from the MongoDB Atlas Data Lake creation modal showing how I filled in the form for this demo.
Repeat the steps above to connect the data for your other cluster and data source.
Screenshot from the MongoDB Atlas Data Lake creation modal showing how I filled in the form for this demo.
Next, drag these new data stores into your data lake and click save. It should look like this.
Screenshot from the MongoDB Atlas Data Lake overview page with a red box highlighting the data sources we created in the previous step and how I positioned them in the data lake.
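The visual editor steps above produce a storage configuration that maps each cluster (a "store") to a virtual database and collection. The sketch below builds a configuration of that general shape as a plain JavaScript object. Treat it as an approximation: the cluster names, the project ID placeholder, the second data source (sample_weatherdata.data), the virtual names VirtualDatabase0 and VirtualCollection0, and the helper buildStorageConfig are all assumptions for illustration, so compare the field names against the JSON editor in Atlas before relying on them.

```javascript
// Sketch of a storage configuration that exposes collections from two
// Atlas clusters through one virtual collection.
// Assumptions: cluster names, project ID placeholder, the second data
// source, and the virtual database/collection names are illustrative.
function buildStorageConfig(projectId, sources) {
  return {
    // One store per Atlas cluster that holds source data.
    stores: sources.map((s) => ({
      name: s.clusterName,
      provider: "atlas",
      clusterName: s.clusterName,
      projectId
    })),
    // One virtual database with one virtual collection that reads
    // from every configured data source.
    databases: [
      {
        name: "VirtualDatabase0",
        collections: [
          {
            name: "VirtualCollection0",
            dataSources: sources.map((s) => ({
              storeName: s.clusterName,
              database: s.database,
              collection: s.collection
            }))
          }
        ]
      }
    ]
  };
}

const storageConfig = buildStorageConfig("<your-project-id>", [
  { clusterName: "ClusterOnAWS", database: "sample_airbnb", collection: "listingsAndReviews" },
  { clusterName: "ClusterOnGCP", database: "sample_weatherdata", collection: "data" }
]);

console.log(JSON.stringify(storageConfig, null, 2));
```

Mapping both data sources into the same virtual collection is what lets a single aggregation see documents from both clusters.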

Connect to Your Data Lake

Now that our data lake is set up, we need to connect to it so we can start running queries on all of our data. First, click Connect in the second box on the data lake overview page.
Screenshot from the MongoDB Atlas Data Lake overview page with a red box highlighting the "Connect to your Data Lake" button.
Click Add Your Current IP Address. Enter your IP address and an optional description, then click Add IP Address. In the Create a MongoDB User step of the dialog, enter a Username and a Password for your database user. (Note: You'll use this username and password combination to access data on your cluster.)

Run Queries Against Your Data Lake

You can run your queries any way you feel comfortable. You can use MongoDB Compass, the MongoDB Shell, your own application, or anything else you see fit. For this demo, I'm going to be running my queries using the MongoDB Visual Studio Code plugin and leveraging its Playgrounds feature. For more information on using this plugin, check out this post on our Developer Hub.
Make sure you are using the connection string for your data lake and not for your individual MongoDB databases. To get the connection string for your new data lake, click the connect button on the MongoDB Atlas Data Lake overview page. Then click on Connect using MongoDB Compass. Copy this connection string to your clipboard. Note: You will need to add the password of the user that you authorized to access your data lake here.
Screenshot from the MongoDB Atlas Data Lake connection modal with a red box highlighting your connection string that you will use to connect to your data lake.
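If your password contains characters such as @, :, or /, it needs to be percent-encoded before it goes into the connection string. Here is a small sketch of that step. The helper name fillPassword, the user name, and the host name are hypothetical, and it assumes the string you copied still contains a literal <password> placeholder.

```javascript
// Sketch: insert a percent-encoded password into a copied connection string.
// Assumption: the copied string contains the literal placeholder "<password>".
function fillPassword(connectionString, password) {
  if (!connectionString.includes("<password>")) {
    throw new Error("Connection string has no <password> placeholder");
  }
  // A replacer function avoids special handling of "$" sequences in the password.
  return connectionString.replace("<password>", () => encodeURIComponent(password));
}

// Hypothetical user and host; use the string copied from the Atlas UI instead.
const template =
  "mongodb://demoUser:<password>@example-datalake.a.query.mongodb.net/?ssl=true&authSource=admin";

const connectionString = fillPassword(template, "p@ss/word:1");
console.log(connectionString);
```

Without the encoding, a password containing @ or / would be misread as part of the host or path.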
You're going to paste this connection string into the MongoDB Visual Studio Code plugin when you add a new connection.
Screenshot from the MongoDB Visual Studio Code plugin showing where to paste your MongoDB Atlas Data Lake Connection string.
Note: If you need assistance with getting started with the MongoDB Visual Studio Code Plugin, be sure to check out my post, How To Use The MongoDB Visual Studio Code Plugin, and the official documentation.
You can run operations using the MongoDB Query Language (MQL), which includes most, but not all, standard server commands. To learn which MQL operations are supported, see the MQL Support documentation.
The following queries use the paths that you added to your Data Lake during deployment.
For this query, I wanted to construct a unique aggregation that could only be used if both sample datasets were combined using federated query and MongoDB Atlas Data Lake. For this example, I am running a query to determine the number of theaters and restaurants in each zip code, by analyzing the sample_restaurants.restaurants and the sample_mflix.theaters datasets. If you haven't added these data sources to your data lake, be sure to do that before moving forward with this query.
I want to make it clear that these data sources are still being stored in different MongoDB databases in completely different datacenters, but by leveraging MongoDB Atlas Data Lake, we can query all of our databases at once as if all of our data were in a single collection! The following query is only possible using federated query! How cool is that?
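One way to write such an aggregation is sketched below. It assumes that both sample_restaurants.restaurants and sample_mflix.theaters are mapped into a single virtual collection, that the virtual names are VirtualDatabase0 and VirtualCollection0 (substitute your own if you renamed them), and that restaurants store their zip code in address.zipcode while theaters store it in location.address.zipcode, as in the Atlas sample datasets. The variable and output field names are mine.

```javascript
// Sketch: count restaurants and theaters per zip code across both datasets.
// Assumptions: restaurants use address.zipcode, theaters use
// location.address.zipcode, and both feed one virtual collection.
const zipCodeCounts = [
  // Stage 1: normalize the zip code and flag which dataset each document came from.
  {
    $project: {
      zipcode: { $ifNull: ["$address.zipcode", "$location.address.zipcode"] },
      isRestaurant: { $cond: [{ $ifNull: ["$address.zipcode", false] }, 1, 0] },
      isTheater: { $cond: [{ $ifNull: ["$location.address.zipcode", false] }, 1, 0] }
    }
  },
  // Stage 2: group by zip code and add up the flags.
  {
    $group: {
      _id: "$zipcode",
      total: { $sum: 1 },
      total_restaurants: { $sum: "$isRestaurant" },
      total_theaters: { $sum: "$isTheater" }
    }
  },
  // Stage 3: put the zip codes with the most venues first.
  { $sort: { total: -1 } }
];

// In a Playground or mongosh session connected to the data lake:
// use("VirtualDatabase0");
// db.getCollection("VirtualCollection0").aggregate(zipCodeCounts);
```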
This outputs the zip codes with the most theaters and restaurants.

Wrap-Up

Congratulations! You just set up an Atlas Data Lake that contains databases running on different cloud providers. Then, you queried both databases with the MongoDB aggregation pipeline by leveraging Atlas Data Lake and federated queries. This makes it easier to run queries on data that is stored in multiple MongoDB database deployments across clusters, data centers, and even in different formats, including S3 blob storage.
Screenshot from the MongoDB Atlas Data Lake overview page showing the information for our new Data Lake.
If you have questions, please head to our developer community website where the MongoDB engineers and the MongoDB community will help you build your next big idea with MongoDB.
