Combining Your Database With Azure Blob Storage Using Data Federation

Tim Kelly • 7 min read • Published Aug 29, 2024 • Updated Aug 29, 2024
For as long as you have been reviewing restaurants, you've been storing your data in MongoDB. You've gathered so much data that you decide to team up with your friends and host it online, so other restaurantgoers can decide where to eat, informed by your detailed insights. But your friend has been storing their data in Azure Blob Storage. They use JSON now, but they have reviews upon reviews stored as .csv files. How can you pool all this data together without the often arduous process of migrating databases or transforming data? With MongoDB's Data Federation, you can combine all your data into one unified view, allowing you to easily search for the best French diner in your borough.
This tutorial walks you through combining your MongoDB database with your Azure Blob Storage using MongoDB's Data Federation.

Prerequisites

To follow along with this tutorial, you'll need:
  • A MongoDB Atlas account, if you don't have one already
  • A Microsoft Azure account with a storage account and container set up. If you don't have these, follow the steps in the Microsoft documentation for the storage account and the container.
  • Azure CLI. You can install Azure PowerShell instead, but this tutorial uses Azure CLI. Sign in and configure your command line tool following the steps in the documentation for Azure CLI or Azure PowerShell.
  • Node.js 18 or higher and npm: Make sure you have Node.js and npm (Node.js package manager) installed. Node.js is the runtime environment required to run your JavaScript code server-side. npm is used to manage the dependencies.

Add your sample data

To have something to view when your data stores are connected, let's add some reviews to your blob. First, you'll add a review for a new restaurant you just reviewed in Manhattan. Create a file called example1.json, and copy in the following:
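The review document can take whatever shape suits your data. As a minimal sketch, the script below writes an example1.json whose field names mirror the documents in the sample_restaurants.restaurants collection used later in this tutorial. The script name, the restaurant, and every value in it are hypothetical placeholders.

```javascript
// writeExample1.js: writes a hypothetical review document to example1.json.
const fs = require("node:fs");

// Field names mirror sample_restaurants.restaurants; all values are made up.
const review = {
  name: "Example Bistro",
  borough: "Manhattan",
  cuisine: "French",
  restaurant_id: "99999999",
  address: { building: "1", street: "Example Street", zipcode: "10001" },
  grades: [{ date: "2024-08-01", grade: "A", score: 9 }],
};

// Write the document as a single line of JSON.
fs.writeFileSync("example1.json", JSON.stringify(review) + "\n");
console.log("Wrote example1.json");
```

Run it with node writeExample1.js, or create the file by hand with the same contents.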
Upload this file as a blob to your container:
Here, BlobName is the name you want to assign to your blob (just use the same name as the file), and PathToFile is the path to the file you want to upload (example1.json).
But you're not just restricted to JSON in your federated database. You're going to create another file, called example2.csv. Copy the following data into the file:
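As with the JSON file, the CSV contents are up to you. The sketch below writes a one-row example2.csv. The Restaurant ID column and its value 40356030 match the query used at the end of this tutorial. The script name and the remaining columns and values are hypothetical.

```javascript
// writeExample2.js: writes a hypothetical one-row CSV to example2.csv.
const fs = require("node:fs");

// "Restaurant ID" / 40356030 come from the query later in this tutorial;
// the other columns and values are illustrative assumptions.
const header = ["Restaurant ID", "Name", "Cuisine", "Borough", "Grade", "Score"];
const rows = [["40356030", "Example Diner", "American", "Brooklyn", "A", "11"]];

// Quote a field only when it contains a comma, a double quote, or a line break.
const quote = (field) =>
  /[",\r\n]/.test(field) ? `"${field.replace(/"/g, '""')}"` : field;

// Join fields with commas and rows with newlines, header first.
const csv =
  [header, ...rows].map((row) => row.map(quote).join(",")).join("\n") + "\n";

fs.writeFileSync("example2.csv", csv);
console.log("Wrote example2.csv");
```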
Upload example2.csv to your container using the same command as above.
You can list the blobs in your container to verify that your files were uploaded:
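If you prefer to script these steps, the sketch below builds the upload and listing commands in Node.js and prints them for you to run. It is an illustration under assumptions rather than the exact commands from this tutorial. It assumes the az storage blob upload and az storage blob list subcommands with the --account-name, --container-name, --name, and --file flags, and that you are already signed in to Azure CLI. The AZ_STORAGE_ACCOUNT and AZ_CONTAINER environment variables are hypothetical names chosen for this example. Depending on how you authenticate, you may need extra flags such as --auth-mode.

```javascript
// blobCommands.js: prints Azure CLI commands for uploading and listing blobs.

// Arguments for `az storage blob upload`.
function uploadArgs(account, container, blobName, pathToFile) {
  return [
    "storage", "blob", "upload",
    "--account-name", account,
    "--container-name", container,
    "--name", blobName,
    "--file", pathToFile,
  ];
}

// Arguments for `az storage blob list`, formatted as a table.
function listArgs(account, container) {
  return [
    "storage", "blob", "list",
    "--account-name", account,
    "--container-name", container,
    "--output", "table",
  ];
}

// Turn an argument list into a command line you can copy into a terminal.
const toCommand = (args) => ["az", ...args].join(" ");

// Hypothetical environment variables; placeholders are printed if unset.
const account = process.env.AZ_STORAGE_ACCOUNT || "<YourStorageAccount>";
const container = process.env.AZ_CONTAINER || "<YourContainerName>";

for (const file of ["example1.json", "example2.csv"]) {
  console.log(toCommand(uploadArgs(account, container, file, file)));
}
console.log(toCommand(listArgs(account, container)));
```

Each blob keeps the same name as its file, matching the BlobName suggestion above.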

Connect your databases using Data Federation

The first step is to set up your MongoDB cluster. For this tutorial, you're going to create a free M0 cluster. Once it is created, click "Load Sample Dataset." In the sample dataset, you'll see a database called sample_restaurants with a collection called restaurants, containing thousands of restaurants with reviews. This is the collection you'll focus on.
Now that you have your Azure Storage and MongoDB cluster set up, you are ready to deploy your federated database instance.
  1. Select "Data Federation" from the left-hand navigation menu.
  2. Click "Create New Federated Database" and, from the dropdown, select "Set up manually."
  3. Choose Azure as your cloud provider and give your federated database instance a name.
  4. To add your data source, click "Add Data Source" and select Azure Blob Storage as your data store.
  5. Next, you need to select an Azure Service Principal. You can use an existing one, but you'll create a new one for this tutorial. From the dropdown, select "Authorize an Azure Service Principal" and click "Continue."
  6. To assign the relationships, follow the onscreen instructions. In this tutorial, you are going to use Azure CLI.
    1. First, you need to get the tenant ID by running the command:
    2. Next, run the following command to create your new service principal for Atlas, and copy the "id" from the output:
  7. Next, you need to grant access to your storage account. Again, there will be prompts on the screen to guide you through the steps, but you can follow along here.
    1. First, run az storage account list --query "[].id" to get the Storage Account Resource ID. This will auto-populate the command on the page that you can copy and run in your terminal to set up the credentials delegation.
    2. After this, set up your storage container access. Choose whether you want to grant "Read-only" or "Read and write" privileges. Select your storage container region, and enter your storage container name. This will auto-populate the command below on the page that you can copy and paste in.
  8. Now, you need to provide sample paths so you can query your data. If you've been following along with this simple example, copy the path https://<YourStorageAccount>.blob.core.windows.net/<YourContainerName>/sample.json (with your own account and container names) into the box.
  9. To link your Azure Blob Storage path components, from the dropdown, accept "any value (*)" and click “Next.”
  10. You should see your container name under your data sources now. All you need to do is drag your dataset and drop it into your virtual collection.
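The Azure CLI calls in steps 6 and 7 can be sketched as argument lists in Node.js, together with a small helper that pulls the "id" field out of the service principal output. This is a sketch under assumptions: the exact commands are the ones shown on your Atlas screen. It assumes az account show --query tenantId for the tenant ID and az ad sp create --id for the service principal. The Atlas application ID is a value you copy from the Atlas screen, and the function names are hypothetical.

```javascript
// Step 6.1: print the tenant ID of the signed-in account as plain text.
const tenantIdArgs = ["account", "show", "--query", "tenantId", "--output", "tsv"];

// Step 6.2: create a service principal for the application ID shown in Atlas.
// `atlasAppId` is a placeholder for the value on your Atlas screen.
const createSpArgs = (atlasAppId) => ["ad", "sp", "create", "--id", atlasAppId];

// Step 7.1: list storage account resource IDs (the command from the tutorial).
const storageAccountIdsArgs = ["storage", "account", "list", "--query", "[].id"];

// The service principal command prints a JSON object; step 6.2 needs its "id".
function servicePrincipalId(azJsonOutput) {
  const { id } = JSON.parse(azJsonOutput);
  if (typeof id !== "string" || id === "") {
    throw new Error('No "id" field found in the az output');
  }
  return id;
}

console.log(["az", ...tenantIdArgs].join(" "));
```

On a system where az is directly executable, each argument list can be passed to child_process.execFileSync("az", args, { encoding: "utf8" }), and the output of the service principal command can then be handed to servicePrincipalId.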
Let's add your MongoDB dataset to your federated database now.
  1. Click "Add Data Sources" and choose “Atlas Cluster.”
  2. Select the cluster that contains the sample dataset.
  3. This will bring up all the databases in the cluster. Click the dropdown arrow next to sample_restaurants to view the collections inside that database, choose restaurants, and click “Next.”
  4. Now, you should see sample_restaurants.restaurants under data sources. You can drag your dataset over to your virtual collection, just like before.
Now that you have both datasets in your federated database instance, click “Create,” and it's time to view your data.

Connect to your federated database

There are many ways to view the data in the federated database. For this tutorial, you’ll create a simple Node.js application. To do this, you'll need a connection string. Once your federated database instance is created, you'll be able to see it under "Data Federation" on the left-hand navigation menu. On your instance, click “Connect.” Select “Driver” as your connection method and copy your connection string. Now, open up an IDE of your choice.
For this tutorial, only a simple Node.js application is needed. If you want to learn more about developing with MongoDB and JavaScript, check out Developer Center, where you’ll find a whole variety of tutorials, or explore MongoDB with other languages.
Before you start, make sure you have Node.js installed in your environment.
  1. Set up a new Node.js project:
    • Create a new directory for your project.
    • Initialize a new Node.js project by running npm init -y in your terminal within that directory.
    • Install the MongoDB Node.js driver by running npm install mongodb.
  2. Create a JavaScript file:
    • Create a file named searchApp.js in your project directory.
  3. Implement the application:
    • Edit searchApp.js to include the following code, which connects to your MongoDB database and creates a client.
    • Now, create a function called searchDatabase that takes an input string and field from the command line and searches for documents containing that string in the specified field.
    • Lastly, create a main function to control the flow of the application.
  4. Run your application with node searchApp.js fieldName "searchString".
    • The script expects two command line arguments: the field name and the search string. It constructs a dynamic query object using these arguments, where the field name is determined by the first argument, and the search string is used to create a regex query.
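The steps above can be sketched as a single searchApp.js. This is a minimal version under stated assumptions, not necessarily the exact code the tutorial was written with. It reads the connection string from a hypothetical FEDERATED_URI environment variable, and it assumes the virtual database and collection are named VirtualDatabase0 and VirtualCollection0. Replace those with the names shown in your federated database instance. The buildQuery helper is also an addition for illustration.

```javascript
// searchApp.js: search the federated collection by field name and text.

// Assumptions: FEDERATED_URI is a hypothetical environment variable holding
// your connection string; the database and collection names are placeholders.
const uri = process.env.FEDERATED_URI;
const dbName = "VirtualDatabase0";
const collectionName = "VirtualCollection0";

// Build a case-insensitive regex query on the given field.
function buildQuery(fieldName, searchString) {
  return { [fieldName]: { $regex: searchString, $options: "i" } };
}

// Connect, run the query, and return the matching documents.
async function searchDatabase(fieldName, searchString) {
  // Loaded here so buildQuery can be used without the driver installed.
  const { MongoClient } = require("mongodb");
  const client = new MongoClient(uri);
  try {
    await client.connect();
    const collection = client.db(dbName).collection(collectionName);
    return await collection.find(buildQuery(fieldName, searchString)).toArray();
  } finally {
    await client.close();
  }
}

// Read the two command line arguments and print the results.
async function main() {
  const [fieldName, searchString] = process.argv.slice(2);
  if (!fieldName || !searchString) {
    console.log('Usage: node searchApp.js <fieldName> "<searchString>"');
    return;
  }
  if (!uri) {
    console.error("Set FEDERATED_URI to your federated database connection string.");
    process.exitCode = 1;
    return;
  }
  const documents = await searchDatabase(fieldName, searchString);
  console.log(`Found ${documents.length} document(s):`);
  console.dir(documents, { depth: null });
}

main().catch((err) => {
  console.error(err);
  process.exitCode = 1;
});
```

Because the search string is passed to $regex as-is, characters such as . or * are treated as regular expression syntax rather than literal text.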
In the terminal, you can run node searchApp.js "Restaurant ID" "40356030" to find the record from your example2.csv file as if it were stored in a MongoDB database. Or run node searchApp.js borough "Manhattan" to find all restaurants in Manhattan across every data source in your virtual database. You're not limited to simple queries: most operators and aggregations are available on your federated database. There are some limitations and variations in the MongoDB operators and aggregation pipeline stages on a federated database, which you can read about in our documentation.
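To illustrate going beyond a simple find, the sketch below builds an aggregation pipeline that counts restaurants per cuisine within one borough and keeps the most common ones. It assumes documents carry borough and cuisine fields, as in sample_restaurants.restaurants. The function name and the default limit are hypothetical. Check the stages against the documentation's list of supported stages before relying on them.

```javascript
// Build a pipeline: filter by borough, count per cuisine, keep the top N.
function topCuisinesPipeline(borough, limit = 5) {
  return [
    { $match: { borough } },
    { $group: { _id: "$cuisine", count: { $sum: 1 } } },
    // Sort by count, then by cuisine name so ties come back in a stable order.
    { $sort: { count: -1, _id: 1 } },
    { $limit: limit },
  ];
}

console.log(JSON.stringify(topCuisinesPipeline("Manhattan"), null, 2));
```

With the MongoDB Node.js driver, the pipeline can be passed to collection.aggregate(topCuisinesPipeline("Manhattan")).toArray(). Field names are case-sensitive, so documents whose fields are named differently will not be counted.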

Conclusion

By following the steps outlined, you've learned how to set up Azure Blob Storage, upload diverse data formats like JSON and CSV, and connect these with your MongoDB dataset using a federated database.
This tutorial highlights the potential of data federation in breaking down data silos, promoting data interoperability, and enhancing the overall data analysis experience. Whether you're a restaurant reviewer looking to share insights or a business seeking to unify disparate data sources, MongoDB's Data Federation along with Azure Blob Storage provides a robust, scalable, and user-friendly platform to meet your data integration needs.
Are you ready to start building with Atlas on Azure? Get started for free today with MongoDB Atlas on Azure Marketplace. If you found this tutorial useful, make sure to check out some more of our articles in Developer Center, like MongoDB Provider for EF Core Tutorial. Or pop over to our Community Forums to see what other people in the community are building!