AWS Glue Visual ETL for Your Data in MongoDB Atlas

Venkatesh Shanbhag, Igor Alekseev, Anuj Panchal • 3 min read • Published Nov 22, 2024 • Updated Nov 22, 2024
AWS Glue is a serverless data integration service that simplifies data processing. Its visual ETL (Extract, Transform, and Load) capabilities make it efficient to move data between MongoDB Atlas and AWS. The visual interface and built-in connectors let you read data from and write data to MongoDB Atlas collections, applying optional transformations before the data is loaded into the target location. This serverless, scalable approach keeps data movement cost-effective while maintaining security through IAM roles. In this post, we walk through a visual approach to building pipelines between MongoDB Atlas and AWS with AWS Glue.
To follow this post and test AWS Glue's capabilities with MongoDB Atlas, you need an AWS account and a subscription to MongoDB Atlas. You can subscribe to MongoDB Atlas from the AWS Marketplace.
The post describes how to transfer data between MongoDB Atlas and Amazon S3 using AWS Glue's visual ETL capabilities. With AWS Glue Studio, developers can create ETL pipelines without knowing Spark or SQL. The post also highlights the benefits of using AWS Glue for data transformation and integration with other AWS services. Amazon S3 is highly scalable, durable, and cost-effective object storage, and it can serve as a data lake or as the storage layer for data warehousing, machine learning, media streaming, backup and recovery, and web hosting.

Set up MongoDB Atlas

  1. Configure a MongoDB cluster on AWS. For instructions, refer to How to Set Up a MongoDB Cluster.
  2. Configure PrivateLink by following the steps described in Connecting Applications Securely to a MongoDB Atlas Data Plane with AWS PrivateLink. With AWS PrivateLink, we will simplify our networking architecture and make sure the traffic stays on the AWS network.
  3. To obtain your MongoDB cluster connection string, navigate to your Atlas home screen and click Connect for the AWS cluster you want to connect to. In the Connect UI, select Private Endpoint as the connection type and then choose a connection method.
  4. Copy the SRV connection string. We use this SRV connection string in the subsequent steps.
Image 1: Connect to MongoDB Atlas using connection string
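Before pasting the connection string into AWS Glue, it can help to confirm that you copied the SRV form and to keep credentials out of any notes or logs. The following is a minimal sketch using only the Python standard library. The function name `describe_srv_uri` and the example host are hypothetical and not part of Atlas or AWS Glue.

```python
from urllib.parse import urlsplit


def describe_srv_uri(uri: str) -> dict:
    """Split a MongoDB SRV connection string into parts that are safe to log."""
    parts = urlsplit(uri)
    if parts.scheme != "mongodb+srv":
        raise ValueError(f"expected a mongodb+srv:// URI, got scheme {parts.scheme!r}")
    if not parts.hostname:
        raise ValueError("connection string has no host name")
    return {
        "host": parts.hostname,
        "has_credentials": parts.username is not None,
        # Rebuild scheme, host, and path only, so user name and password are dropped.
        "redacted": f"{parts.scheme}://{parts.hostname}{parts.path}",
    }


# Hypothetical connection string; replace it with the one copied from Atlas.
info = describe_srv_uri("mongodb+srv://user:secret@cluster0-pl-0.example.mongodb.net/")
print(info["redacted"])
```

The redacted form is only for display. The Glue connection itself still needs the full connection string and credentials.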
The following screenshot shows that we have loaded a sample collection (in this case, sample weather data) in MongoDB Atlas, which we will connect to in the next steps. Note: The records in this collection include several arrays as well as nested data.
Image 2: MongoDB Sample dataset

Set up the MongoDB Atlas connection with AWS Glue

Before we can build the ETL job, we need to create the MongoDB Atlas connection in AWS Glue.
  1. On the AWS Glue Studio console, choose Connectors in the navigation pane.
  2. Choose Create connection.
Image 3: AWS Glue Data Connections interface
  3. When filling out the connection details, use the SRV connection string we obtained earlier in MongoDB Atlas.
  4. In the Network options section, add the VPC and subnet. Important: The VPC and the subnet must correspond to the PrivateLink settings you configured earlier.
Image 4: Create Glue connection for MongoDB Atlas
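The console steps above can also be expressed programmatically. The sketch below builds the `ConnectionInput` payload that boto3's `create_connection` call for AWS Glue accepts. It is an illustration under assumptions, not the method used in this post: the helper name `build_connection_input` and every value (connection name, subnet, security group, Availability Zone, credentials) are placeholders.

```python
def build_connection_input(name, srv_uri, username, password,
                           subnet_id, security_group_ids, availability_zone):
    """Assemble a Glue ConnectionInput dict for a MongoDB connection."""
    return {
        "Name": name,
        "ConnectionType": "MONGODB",
        "ConnectionProperties": {
            "CONNECTION_URL": srv_uri,
            "USERNAME": username,
            "PASSWORD": password,
        },
        # The subnet and security groups should match the VPC used for PrivateLink.
        "PhysicalConnectionRequirements": {
            "SubnetId": subnet_id,
            "SecurityGroupIdList": list(security_group_ids),
            "AvailabilityZone": availability_zone,
        },
    }


# All values below are placeholders for illustration.
connection_input = build_connection_input(
    name="mongodb-atlas-connection",
    srv_uri="mongodb+srv://cluster0-pl-0.example.mongodb.net/mydatabase",
    username="glue_user",
    password="replace-me",
    subnet_id="subnet-0123456789abcdef0",
    security_group_ids=["sg-0123456789abcdef0"],
    availability_zone="us-east-1a",
)

# With AWS credentials configured, the payload could be submitted like this:
# import boto3
# boto3.client("glue").create_connection(ConnectionInput=connection_input)
```

In practice, avoid hardcoding the password in a script. Read it from a secret store or an environment variable instead.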

Create and run an ETL job to extract data from MongoDB Atlas into S3

Once the connection is configured, navigate to Glue | ETL jobs | Visual ETL to create the ETL job.
To open the Glue visual editor, click either Visual ETL or Author and edit ETL jobs, and then click Visual ETL on the next screen.
Image 5: Navigation to AWS Glue ETL interface
If you have saved jobs, you can also access them from AWS Glue Studio.
Image 6: Creating a Visual ETL job on AWS Glue
Click on (+) on the home screen of the Glue visual editor to add nodes. Search for MongoDB and select MongoDB as the source to read from the previously created connection.
Image 7: MongoDB connector on AWS Glue visual ETL
Select the MongoDB connection and provide the database and collection name. Save the pipeline.
Image 8: Configuring the connector on AWS Glue Visual ETL
Click on (+) to add one more node and search for S3. Select Amazon S3 as Target. Select MongoDB as Node parent.
Image 9: Configuring S3 bucket connection AWS Glue Visual ETL
Select the data format for your destination data file. Select the S3 bucket where you want to write the data and click on Save.
Image 10: Configuring S3 bucket connection AWS Glue Visual ETL
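Behind the visual editor, a Glue job with a MongoDB source node and an S3 target node corresponds to a short PySpark script. The sketch below approximates that script by hand; it is not the exact code Glue Studio generates. It assumes a Glue version whose MongoDB reader accepts the `connectionName`, `database`, and `collection` options. The helper names and all example values are hypothetical.

```python
def mongodb_source_options(connection_name, database, collection):
    """Connection options for reading a collection through a Glue connection."""
    return {
        "connectionName": connection_name,
        "database": database,
        "collection": collection,
    }


def s3_target_options(bucket, prefix=""):
    """Connection options for writing to an S3 location, with no partitioning."""
    path = f"s3://{bucket}/{prefix.strip('/')}"
    if not path.endswith("/"):
        path += "/"
    return {"path": path, "partitionKeys": []}


def copy_collection_to_s3(glue_context, source_options, target_options,
                          data_format="parquet"):
    """Read a MongoDB collection into a DynamicFrame and write it to S3."""
    frame = glue_context.create_dynamic_frame.from_options(
        connection_type="mongodb",
        connection_options=source_options,
    )
    glue_context.write_dynamic_frame.from_options(
        frame=frame,
        connection_type="s3",
        connection_options=target_options,
        format=data_format,
    )
    return frame


# Placeholder names; use your own connection, database, collection, and bucket.
source = mongodb_source_options("mongodb-atlas-connection", "mydatabase", "mycollection")
target = s3_target_options("my-example-bucket", "weather/")

# Inside a Glue job, a GlueContext would be created and passed in:
# from awsglue.context import GlueContext
# from pyspark.context import SparkContext
# copy_collection_to_s3(GlueContext(SparkContext.getOrCreate()), source, target)
```

Keeping the option dictionaries in small functions makes it easy to review the source and target settings before the job runs.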
Run the job by clicking the button in the top right corner.
Image 11: Run ETL job to move data between MongoDB Atlas and AWS S3
The job will take a few minutes to complete. You can monitor your job on the Runs tab, as shown below.
Image 12: Visualizing the ETL job running
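If you prefer to monitor a run from a script rather than the Runs tab, the Glue API exposes the run state. The following is a minimal polling sketch, assuming a boto3 Glue client is passed in. The function names, the job name, and the 30-second polling interval are illustrative choices, not values from this post.

```python
import time

# Job run states after which the run will not change any further.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT", "ERROR", "EXPIRED"}


def is_terminal(state):
    """Return True when a Glue job run has finished, successfully or not."""
    return state in TERMINAL_STATES


def wait_for_job_run(glue_client, job_name, run_id, poll_seconds=30, sleep=time.sleep):
    """Poll get_job_run until the run reaches a terminal state and return it."""
    while True:
        run = glue_client.get_job_run(JobName=job_name, RunId=run_id)["JobRun"]
        state = run["JobRunState"]
        if is_terminal(state):
            return state
        sleep(poll_seconds)


# With AWS credentials configured (job name is a placeholder):
# import boto3
# glue = boto3.client("glue")
# run_id = glue.start_job_run(JobName="mongodb-to-s3")["JobRunId"]
# print(wait_for_job_run(glue, "mongodb-to-s3", run_id))
```

The `sleep` parameter exists only so the wait can be replaced in tests. In normal use, the default is sufficient.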
You can verify your data written to S3 by navigating to your S3 bucket.
Image 13: Verify that the documents were written to the S3 bucket in Parquet format
Watch this video to see the steps in action.
AWS Glue Visual ETL simplifies creating and managing data transformations, allowing developers to create ETL pipelines without specialized knowledge of data engineering tools. It offers connectors to numerous third-party and AWS-native products and services, so you can combine and enrich data from various sources for analysis in a data warehouse. For advanced data transformations involving MongoDB Atlas, refer to the AWS Glue documentation.
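The same check can be scripted. The sketch below filters an S3 object listing down to Parquet files. It works on the response shape returned by boto3's `list_objects_v2`; the sample response, bucket name, and prefix are made up for illustration.

```python
def parquet_keys(list_response):
    """Return the keys of Parquet objects from a list_objects_v2 response."""
    return [
        obj["Key"]
        for obj in list_response.get("Contents", [])
        if obj["Key"].endswith(".parquet")
    ]


# Made-up response in the shape returned by list_objects_v2.
sample_response = {
    "Contents": [
        {"Key": "weather/part-00000.snappy.parquet"},
        {"Key": "weather/_SUCCESS"},
    ]
}
print(parquet_keys(sample_response))

# Against a real bucket (names are placeholders):
# import boto3
# response = boto3.client("s3").list_objects_v2(Bucket="my-example-bucket", Prefix="weather/")
# print(parquet_keys(response))
```

A single `list_objects_v2` call returns at most 1,000 keys, so a large output prefix would need pagination.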
Questions? Join us in the MongoDB Developer Community.