Copy Atlas collection to another database

We need to clone or copy a collection from one database (let’s call it main) to another (let’s call it analytics) every 24 hours.

Currently the best idea I’ve come up with is to run mongodump to an S3 bucket and then use mongorestore to copy it to the analytics db.

Is there a built-in tool or db sync, or an established best practice for doing this in MongoDB? All the docs seem to point at mongodump → mongorestore or older versions of those.

Take a look at

Hello Kai, to go a bit deeper: is this other database in another cluster, or is it all within the same cluster?

You could use a delayed node as steevej recommended or you could use a scheduled trigger with $out.
Every 24 hours run a $out to the new collection.

Furthermore, if the other database is in another cluster, you could use Atlas Data Lake along with “$out to Atlas”, and again a Realm trigger to schedule it.

This tutorial covers that use case, except instead of $out to S3 you would use $out to Atlas.

Lastly, we have some new functionality coming at MongoDB World that I think might solve your problem better than these others. Please reach out at benjamin.flast@mongodb.com if you’d like to discuss it further, and I can get you early access.

Best,
Ben

Looking forward to this:

Hi @steevej & @Benjamin_Flast, thanks for the responses and links.

The other database will be within a separate cluster. The idea is to create some anonymised data in the collection, based on various collections in the main database, and then copy this to a separate db. That db will be used for analytics through Tableau (assuming we’ll be using the MongoDB Tableau connector) without any connection or reference back to the main db.

I’ll send you an email; I wouldn’t mind having a look at the new functionality if it’s going to be a better solution.

Cheers
Kai

Hi @Benjamin_Flast, thanks again.
Looks like we’ll take the Data Lake route using a trigger with $merge or $out.

I have one last question around Data Lake and Atlas triggers.

Is there a way to set up Data Lake or Atlas triggers locally or in a Docker container?
We currently have a mongodb-memory-server that gets spun up for our integration tests. I’m hoping to do something similar for this scheduled task that will copy / update the analytics db. My thinking isn’t to test the triggers or MongoDB functionality; I assume you have that covered.
I mainly want a canary test that indicates if something may be broken and ensures our queries are correct, so we can catch things early. For example, a schema change that affects the aggregate query.

In case someone else stumbles across this thread searching for the same thing, here is a rough, pseudocode-ish overview of our approach, run on the same db using the mflix movies example collection:

  1. Link Data Lake to the db
  2. Create an Atlas trigger to run every 24 hours
     Example trigger function below.
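exports = async function() {
  // Collection exposed through the linked Data Lake service.
  const movies = context.services
    .get("DataLake0")
    .db("Database0")
    .collection("Collection0");

  const pipeline = [
    // Match every document; narrow this filter to copy a subset.
    { $match: {} },
    // Write the results to a collection on the target Atlas cluster.
    {
      $out: {
        atlas: {
          projectId: "111111111111111111111111",
          clusterName: "mflix",
          db: "analytics",
          coll: "test"
        }
      }
    }
  ];

  return movies.aggregate(pipeline);
};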

Hello Kai,

Unfortunately not: there is no way to deploy Data Lake or Triggers locally. They are only available in Atlas.

Do you have a Dev or QA project in Atlas where you could run these integration tests on a low-tier cluster?

-Ben

Hi Ben,

We do. The problem is that the data in those environments could change whenever someone uses them, which would lead to inconsistent automated tests.
I’ll just test the query in isolation for now using the in-memory db. Ideally it would be a little closer to production, but at least it will cover the most likely potential cause of issues.
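
One way to keep that isolated test close to the trigger is to build the transformation stages in their own function and append the `$out` stage separately, since the `atlas` form of `$out` is specific to Data Lake and will not run against an in-memory server. A rough sketch follows; the function names `buildTransformStages` and `withAtlasOut` and the `$project` fields are hypothetical placeholders, not our real query.

```javascript
// Transformation stages only: these can run against mongodb-memory-server.
// The projected fields are placeholders from the mflix movies sample.
function buildTransformStages() {
  return [
    { $match: {} },
    { $project: { _id: 0, title: 1, year: 1 } }
  ];
}

// Append the Data Lake "$out to Atlas" stage; only the trigger uses this.
function withAtlasOut(stages, target) {
  return [...stages, { $out: { atlas: { ...target } } }];
}

// Same placeholder target as in the trigger function posted earlier.
const target = {
  projectId: "111111111111111111111111",
  clusterName: "mflix",
  db: "analytics",
  coll: "test"
};

const triggerPipeline = withAtlasOut(buildTransformStages(), target);
```

The integration test can then run `buildTransformStages()` through `aggregate` on the in-memory db and assert on the resulting documents, while the trigger runs `triggerPipeline`. Both share the same stages, so a schema change that breaks the query should show up in the test.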

Cheers
Kai