I am trying to copy data from MongoDB to an S3 bucket. I followed this tutorial: How to Automate Continuous Data Copying from MongoDB to S3 | MongoDB
Steps:
Created an S3 bucket and an IAM role with all the required permissions (including the access policy)
Created a data lake in MongoDB
Connected the data lake to S3
While creating the Trigger, I am facing this issue.
exports = function() {
  const datalake = context.services.get("v3ProdCluster-us-east-1");
  const db = datalake.db("v3StagingDB");
  const events = db.collection("work_sessions");
  const pipeline = [
    {
      $match: {
        "time": {
          $gte: new Date(Date.now() - 60 * 60 * 10000000000000000),
          $lt: new Date(Date.now())
        }
      }
    },
    {
      "$out": {
        "s3": {
          "bucket": "mongodb-s3-staging",
          "region": "us-east-1",
          "filename": { "$concat": ["work_sessions/", "$_id"] },
          "format": {
            "name": "json",
            "maxFileSize": "10GB"
          }
        }
      }
    }
  ];
  return events.aggregate(pipeline);
};
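The error I am getting is:
If an object is passed to $out it must have exactly 2 fields: ‘db’ and ‘coll’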
I am having the same problem.
Stennie_X (Stennie), October 3, 2022, 2:31am, #3
Welcome to the MongoDB Community @Marina_Stolet !
Can you confirm the MongoDB Atlas cluster tier you are using (M_)? Are you following the same tutorial as the original poster?
Regards,
Stennie
I am using M10 and yes, the same tutorial, although I have now adapted it. What I did was create a federated database using my cluster, with an “analytics” db and an “assessments” collection. I am not getting that same error anymore, but no data comes into my S3 bucket. That’s the code:
exports = function () {
  const datalake = context.services.get("FederatedDatabaseInstance-analytics");
  const db = datalake.db("analytics");
  const coll = db.collection("assessments");
  const pipeline = [
    {
      "$out": {
        "s3": {
          "bucket": "322104163088-mongodb-data-ingestion",
          "region": "eu-west-2",
          "filename": "analytics/",
          "format": {
            "name": "json",
            "maxFileSize": "100GB"
          }
        }
      }
    }
  ];
  return coll.aggregate(pipeline);
};
Hi All,
@Marina_Stolet I would suggest raising a new topic, as this may be a completely separate issue since you are not getting the same error message.
For reference, the original error message and the service lookup from the original code were:
If an object is passed to $out it must have exactly 2 fields: ‘db’ and ‘coll’
const datalake = context.services.get("v3ProdCluster-us-east-1");
Regarding the original topic title and error message, one thing you may wish to check is that you are specifying the Federated Database Instance service rather than your Atlas cluster. As per the blog page:
You must connect to your Federated Database Instance to use $out to S3.
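In other words, the name passed to context.services.get() in the trigger function should be the linked Federated Database Instance, not the Atlas cluster itself. A rough sketch of what the first lines would look like (the instance name here is just a placeholder, not something from the tutorial):

// Placeholder: replace with the name of your own linked Federated Database Instance
// (as shown under Linked Data Sources), not the Atlas cluster name.
const datalake = context.services.get("FederatedDatabaseInstance0");
const db = datalake.db("v3StagingDB");
const events = db.collection("work_sessions");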
Regards,
Jason
Brock (Brock), April 11, 2023, 11:12pm, #6
@Nirmal_Patil
Code is fixed; please review the changes/differences. I also made it easier to read.
exports = function() {
const datalake = context.services.get("v3ProdCluster-us-east-1");
const db = datalake.db("v3StagingDB");
const events = db.collection("work_sessions");
const pipeline = [
{
$match: {
"time": {
$gte: new Date(Date.now() - 60 * 60 * 1000),
$lt: new Date(Date.now())
}
}
},
{
$out: {
s3: {
bucket: "mongodb-s3-staging",
region: "us-east-1",
filename: { $concat: ["work_sessions/", "$_id"] },
format: {
name: "json",
maxFileSize: "10GB"
}
}
}
}
];
return events.aggregate(pipeline).toArray();
};
@Marina_Stolet I corrected yours as well.
exports = function () {
const datalake = context.services.get("FederatedDatabaseInstance-analytics");
const db = datalake.db("analytics");
const coll = db.collection("assessments");
const pipeline = [
{
$out: {
s3: {
bucket: "322104163088-mongodb-data-ingestion",
region: "eu-west-2",
filename: "analytics/",
format: {
name: "json",
maxFileSize: "100GB"
}
}
}
}
];
return coll.aggregate(pipeline).toArray();
};
The issues with both:
Too many curly braces around your $out stages.
You needed to call .toArray() on each of your aggregate calls.
I hope this helps.
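One more thing worth checking on the second pipeline if the bucket still stays empty: the first pipeline in this thread (and the tutorial) builds a per-document filename with $concat, whereas the corrected code above still writes to the fixed name "analytics/". A rough sketch of that variant is below; the $toString on _id is an assumption, only needed if _id is an ObjectId rather than a string:

const pipeline = [
  {
    $out: {
      s3: {
        bucket: "322104163088-mongodb-data-ingestion",
        region: "eu-west-2",
        // One object per document under the analytics/ prefix.
        filename: { $concat: ["analytics/", { $toString: "$_id" }] },
        format: {
          name: "json",
          maxFileSize: "100GB"
        }
      }
    }
  }
];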