Using Data Federation and triggers to copy data to an S3 bucket

Recently I have been following this tutorial (How to Automate Continuous Data Copying from MongoDB to S3 | MongoDB) to try to copy data from assessments (collection) > analytics (database) > dev (cluster) to an S3 bucket. I first created a federated database called FederatedDatabaseInstance-analytics, and then created a trigger with this function:

exports = function () {
   const datalake = context.services.get("FederatedDatabaseInstance-analytics");
   const db = datalake.db("analytics");
   const coll = db.collection("assessments");

   const pipeline = [
      /*{
         "$match": {
            "time": {
               "$gt": new Date(Date.now() - 60 * 60 * 1000),
               "$lt": new Date(Date.now())
            }
         }
      },
      {
         "$out": {
            "s3": {
               "bucket": "322104163088-mongodb-data-ingestion",
               "region": "eu-west-2",
               "filename": { "$concat": ["analytics", "$_id"] },
               "format": {
                  "name": "JSON",
                  "maxFileSize": "40GB"
                  //"maxRowGroupSize": "30GB" // only applies to Parquet
               }
            }
         }
      },*/
      {
         "$out": {
            "s3": {
               "bucket": "322104163088-mongodb-data-ingestion",
               "region": "eu-west-2",
               "filename": "analytics/",
               "format": {
                  "name": "json",
                  "maxFileSize": "100GB"
               }
            }
         }
      }
   ];

   return coll.aggregate(pipeline);
};

The thing is, I get no errors running it, but nothing appears in my bucket.

Hey Marina,

Can you confirm that you are using the right namespaces in your Federated Database? You should be using the ones that you’ve set in the visual editor.

So if you have VirtualCollectionFoo (and inside it you have AtlasCollectionBar1 and AtlasCollectionBar2), then your aggregation should look like:

db.VirtualCollectionFoo.aggregate([])

Then, when the aggregation pipeline reads from VirtualCollectionFoo, it will in turn pull data from the underlying Atlas collections.
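As a rough sketch, the trigger function would read from the virtual namespace like this (the virtual names below are placeholders; swap in whatever database and collection names you've defined in your instance):

exports = function () {
   // The service name is the Federated Database Instance name
   const federated = context.services.get("FederatedDatabaseInstance-analytics");

   // These must be the virtual database/collection names defined in the
   // Data Federation visual editor, not the names from the underlying cluster
   const coll = federated.db("VirtualDatabaseFoo").collection("VirtualCollectionFoo");

   return coll.aggregate([
      {
         "$out": {
            "s3": {
               "bucket": "322104163088-mongodb-data-ingestion",
               "region": "eu-west-2",
               "filename": "analytics/",
               "format": { "name": "json", "maxFileSize": "100GB" }
            }
         }
      }
   ]);
};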

Let me know if that works.

Best,
Ben


Well, “FederatedDatabaseInstance-analytics” is what I set in the visual editor for the data federation configuration, and that is what I am using (see picture 1). Is that correct, or should I put the name of something else?

Also, do you mean using return db.coll.aggregate(pipeline);? When I do that I get this:

Cannot access member 'aggregate' of undefined
	at exports (function.js:51:11(47))
	at function_wrapper.js:4:27(18)
	at <eval>:8:8(2)
	at <eval>:2:15(6)
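(I assume db.coll fails because coll isn’t a property on the database object, the collection has to be fetched with db.collection("assessments"), so db.coll comes back undefined. Is something like this what you meant?)

return db.collection("assessments").aggregate(pipeline);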

Every time I run the trigger function the number of queries on this instance rises, so it must mean I am “connecting” to it.

Sorry for the delay, this seems to have gotten lost in my inbox!

Yes, that is the right thing to have there, but I’m referring to the database name and collection name. Are they the names of the DB and collection in your Federated Database Instance, or are they the names from your cluster itself? They should be the names that you manually edit inside the Federated Database Instance.

I’m having the exact same problem. I’m using the database and collection names from the federated database.

When you created the federated database, did you add both the cluster you want to ingest from and the landing bucket on AWS as data sources? They need to be in the same federated database.
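For reference, a sketch of roughly how the instance’s JSON storage configuration should look once both sources are attached (the store names below are placeholders, and the exact fields may differ slightly from what the editor generates for you):

{
   "stores": [
      {
         "name": "devClusterStore",
         "provider": "atlas",
         "clusterName": "dev",
         "projectId": "<your-project-id>"
      },
      {
         "name": "s3LandingStore",
         "provider": "s3",
         "bucket": "322104163088-mongodb-data-ingestion",
         "region": "eu-west-2",
         "delimiter": "/",
         "prefix": ""
      }
   ],
   "databases": [
      {
         "name": "analytics",
         "collections": [
            {
               "name": "assessments",
               "dataSources": [
                  {
                     "storeName": "devClusterStore",
                     "database": "analytics",
                     "collection": "assessments"
                  }
               ]
            }
         ],
         "views": []
      }
   ]
}

The $out-to-S3 stage in the trigger only references the bucket, region, and filename, but the write only goes through when that bucket is part of the same Federated Database Instance the function is querying.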