Extracting data from MongoDB to S3

Pinaka_Pani · March 5, 2024, 12:16pm

Hi,
Need help with the following issue.
I am exporting MongoDB data (hosted on an EC2 node) to an S3 bucket in JSON format. The current approach uses an intermediate EC2 instance to run mongoexport and then copy the exported JSON from that instance to S3. The collection size varies between 10 and 20 TB, and mongoexport is taking too long.
Constraint:
The collection does not contain any date fields that could be used to split the job into batches.

Please suggest any alternative or better approach for the issue. Any suggestion is much appreciated. Thank you for your help.

eric_N_A3 · March 5, 2024, 6:31pm

To speed up exporting MongoDB data to S3 without date fields:

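Stream the export directly to S3 instead of staging it on disk: mongoexport writes to stdout when no output file is given, and aws s3 cp can read from stdin. (mongodump is another option, but it produces BSON rather than JSON.)
Run parallel exports by splitting the collection into ranges. With no date field, the always-indexed _id field can serve as the split key.
Tune the export side of MongoDB; for example, if the deployment is a replica set, read from a secondary so the export does not compete with production traffic.
Compress the data (for example with gzip) before transferring it to S3.
Explore AWS Data Pipeline for automation.
Evaluate other destinations for long-term needs, such as Amazon Glacier for archival or Redshift for analytics.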
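
As a rough sketch of the parallel-export idea, the Python below splits an _id range into equal slices and prints one mongoexport pipeline per slice, each compressed with gzip and streamed to S3. It assumes _id holds default ObjectId values; the connection string, collection name, bucket, and the two boundary ids are placeholders, and in practice the first and last _id would be looked up from the collection. The helper names are made up for illustration.

```python
import json
import shlex


def id_range_queries(first_oid: str, last_oid: str, parts: int) -> list[dict]:
    """Split [first_oid, last_oid] into `parts` contiguous _id range filters.

    ObjectIds sort by their 12 raw bytes, so comparing them as big-endian
    integers gives the same order. Slices are equal in id space, not
    necessarily in document count.
    """
    lo, hi = int(first_oid, 16), int(last_oid, 16)
    if parts < 1 or hi < lo:
        raise ValueError("need parts >= 1 and first_oid <= last_oid")
    cuts = [lo + (hi - lo) * i // parts for i in range(parts + 1)]
    queries = []
    for i in range(parts):
        # The final slice is closed on the right so the last document is kept.
        upper_op = "$lte" if i == parts - 1 else "$lt"
        queries.append({
            "_id": {
                "$gte": {"$oid": format(cuts[i], "024x")},
                upper_op: {"$oid": format(cuts[i + 1], "024x")},
            }
        })
    return queries


def export_command(uri: str, collection: str, query: dict, s3_uri: str) -> str:
    """Build a shell pipeline: mongoexport (stdout) -> gzip -> aws s3 cp (stdin)."""
    export = ["mongoexport", "--uri", uri, "--collection", collection,
              "--query", json.dumps(query)]
    upload = ["aws", "s3", "cp", "-", s3_uri]
    return " | ".join([shlex.join(export), "gzip", shlex.join(upload)])


if __name__ == "__main__":
    # Placeholder boundary ids; replace with the collection's real min/max _id.
    slices = id_range_queries("5f0000000000000000000000",
                              "650000000000000000000000", 4)
    for n, query in enumerate(slices):
        print(export_command("mongodb://localhost:27017/mydb", "events", query,
                             f"s3://my-bucket/export/part-{n:04d}.json.gz"))
```

Each printed line can be run as its own job, several at a time, on the intermediate instance. Two things to watch: aws s3 cp documents an --expected-size option for streamed uploads larger than 50 GB, and a single S3 object cannot exceed 5 TB, so the number of slices should be high enough to keep each part well below that.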