Filtered Sync

On this page

  • Configure a Filter
  • Replace an Existing Filter
  • Adding and Renaming Collections
  • Filtering with $out
  • Limitations
  • Examples

New in version 1.1.

Cluster-to-Cluster Sync provides continuous data synchronization or a one-time data migration between two MongoDB clusters. You can use filtered sync to specify which databases and collections the mongosync utility transfers between the source and destination clusters.

Important

Once you start mongosync with a filter in place, the filter cannot be modified. If you do need to create a new filter, see: Replace an Existing Filter.

Configure a Filter

1

Identify the databases and collections that you want to sync to the destination cluster. When you add a set of databases to the filter, you also exclude any other databases in the cluster.

When you specify a collection in your filter, you also exclude any other collections that are in the same database.

2

The includeNamespaces parameter specifies an optional filter that you can pass to the /start API.

If you don't specify a filter, mongosync does a full cluster sync.

The filter syntax is:

[
  {
    "database": "databaseOne",  // required
    "collections": [            // optional
      "collectionOne",
      "collectionTwo"
    ]
  },
  {
    "database": "databaseTwo"
  }
]

Create an entry in the includeNamespaces array for each database that you identified in step 1. Use the "database" field to specify the database name.

If you want to filter on collections within a database, add those collections to the list in the "collections" field for that database entry.
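
The shape described above can be checked before you send it. The following is a minimal sketch, not part of mongosync or any MongoDB API: checkIncludeNamespaces is a hypothetical helper that only verifies the structure shown in the syntax example (a required "database" string and an optional "collections" array of strings).

```javascript
// Hypothetical helper: verifies the documented shape of an includeNamespaces
// array. It does not contact mongosync and returns a list of problems found.
function checkIncludeNamespaces(filter) {
  if (!Array.isArray(filter)) {
    return ["includeNamespaces must be an array"];
  }
  const problems = [];
  filter.forEach((entry, i) => {
    if (entry === null || typeof entry !== "object") {
      problems.push(`entry ${i}: must be an object`);
      return;
    }
    // "database" is required and must be a non-empty string.
    if (typeof entry.database !== "string" || entry.database === "") {
      problems.push(`entry ${i}: "database" is required`);
    }
    // "collections" is optional; when present it must be an array of strings.
    if (entry.collections !== undefined) {
      const valid =
        Array.isArray(entry.collections) &&
        entry.collections.every((name) => typeof name === "string");
      if (!valid) {
        problems.push(`entry ${i}: "collections" must be an array of strings`);
      }
    }
  });
  return problems;
}

const sampleFilter = [
  { database: "databaseOne", collections: ["collectionOne", "collectionTwo"] },
  { database: "databaseTwo" },
];
console.log(checkIncludeNamespaces(sampleFilter)); // prints []
```

An empty result means the structure matches the syntax above. It does not mean mongosync will accept the filter, since mongosync applies further restrictions such as those on system databases.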

3

To use the filter, attach the filter JSON when you make the /start API call to begin syncing.

curl -X POST "http://localhost:27182/api/v1/start" --data '
   {
      "source": "cluster0",
      "destination": "cluster1",
      "includeNamespaces" : [
         {
            "database" : "sales",
            "collections": [ "EMEA", "APAC" ]
         },
         {
            "database" : "marketing"
         }
      ]
   } '

For an example configuration, see: Start mongosync with a Filter.
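
If you script the call instead of using curl, the request body can be assembled programmatically. This is a sketch under assumptions: buildStartBody and startSync are hypothetical names, the host and port are taken from the curl example, and a runtime with a global fetch (such as Node.js 18 or later) is assumed. Request headers and the response format are not covered on this page, so the sketch sets no headers and returns the raw response text.

```javascript
// Hypothetical helper: builds the JSON body for the /start call.
function buildStartBody(source, destination, includeNamespaces) {
  const body = { source, destination };
  // Leaving out includeNamespaces requests a full cluster sync.
  if (includeNamespaces !== undefined) {
    body.includeNamespaces = includeNamespaces;
  }
  return JSON.stringify(body);
}

// Hypothetical helper: posts the body to the endpoint used in the curl
// example. Assumes mongosync listens on localhost:27182.
async function startSync(body) {
  const response = await fetch("http://localhost:27182/api/v1/start", {
    method: "POST",
    body,
  });
  return response.text();
}

const startBody = buildStartBody("cluster0", "cluster1", [
  { database: "sales", collections: ["EMEA", "APAC"] },
  { database: "marketing" },
]);
console.log(startBody);
// To send the request: startSync(startBody).then(console.log);
```

Keeping the body construction separate from the network call lets you inspect the filter before starting a sync, which matters because the filter cannot be changed afterwards.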

Replace an Existing Filter

You cannot update an existing filter. You must stop the ongoing sync process, prepare the destination cluster, and restart mongosync with a new filter.

When mongosync ran your original filter, it created databases with your data ("user databases") and the mongosync_reserved_for_internal_use system database on the destination cluster. You must remove those databases before restarting mongosync with your new filter.

Follow these steps to prepare the destination cluster for a new filter.

1
  1. Stop the mongosync process.

  2. Connect to the destination cluster with mongosh. If the destination is a sharded cluster, connect to the mongos instance. If the destination is a replica set, connect to the primary mongod instance.

  3. Drop the mongosync_reserved_for_internal_use system database.

    use mongosync_reserved_for_internal_use
    db.dropDatabase()
2
  1. List the databases in the cluster:

    show databases
  2. Remove user databases. The admin, local, and config databases are system databases. You should not edit these system databases without instructions from MongoDB support.

    If the show databases command lists any user databases on the destination cluster, you must remove them.

    Repeat this step for each user database listed:

    use <user database name>
    db.dropDatabase()

    Note: After the first db.dropDatabase() operation completes, you may need to run it a second time to remove the database.

3
  1. Decide which databases and collections you want to filter on. Add the databases and collections to the includeNamespaces array. For configuration details, see Configure a Filter.

  2. Run mongosync to reconnect to the source and destination clusters.

  3. Use the /start API endpoint to start syncing. Be sure to attach your new filter when you call /start.
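
When the destination holds many databases, it can help to compute the list of databases to drop before dropping anything. The sketch below is illustrative only: databasesToDrop is a hypothetical helper that takes the names reported by show databases and excludes the three system databases named in the steps above.

```javascript
// System databases named in the steps above. These must not be dropped.
const SYSTEM_DATABASES = ["admin", "local", "config"];

// Hypothetical helper: given the database names on the destination cluster,
// return the ones to drop before restarting mongosync with a new filter.
function databasesToDrop(names) {
  return names.filter((name) => !SYSTEM_DATABASES.includes(name));
}

// Sample names for illustration; your cluster will report its own list.
const listedDatabases = [
  "admin",
  "config",
  "local",
  "mongosync_reserved_for_internal_use",
  "sales",
  "marketing",
];
console.log(databasesToDrop(listedDatabases));
```

In mongosh, the names could come from db.adminCommand({ listDatabases: 1 }), and each returned name could be dropped with db.getSiblingDB(name).dropDatabase(). Dropping is destructive, so review the computed list before running any drop.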

Adding and Renaming Collections

You can, with some restrictions, add or rename a collection during a filtered sync.

Warning

If your renaming operation violates the renaming restrictions, mongosync stops syncing and reports an error.

To clean up and restart after an error, follow the steps to replace an existing filter.

You can add new collections or rename an existing collection if the entire database is part of the filter.

You can also rename a collection if the old name and the new name are both specified in the filter.

See the renaming examples.

You can only rename a collection across databases if the entire target database is part of a filter. If the filter specifies individual collections in the target database, renaming across databases does not work.

See the renaming examples.
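
The renaming rules above can be modeled as a small decision function. This is a hedged sketch of the rules as stated on this page, not mongosync's implementation: parseNamespace and checkRename are hypothetical names, and the sample filter mirrors the students and staff example on this page. Renames that touch no database in the filter are not discussed here, so the sketch reports them as "unspecified" rather than guessing.

```javascript
// Splits "db.collection" at the first dot. Database names cannot contain
// dots, but collection names can.
function parseNamespace(ns) {
  const dot = ns.indexOf(".");
  return { db: ns.slice(0, dot), coll: ns.slice(dot + 1) };
}

// Hypothetical model of the documented rules.
// Returns "ok", "error", or "unspecified".
function checkRename(filter, from, to) {
  const src = parseNamespace(from);
  const dst = parseNamespace(to);
  const sourceEntry = filter.find((e) => e.database === src.db);
  const targetEntry = filter.find((e) => e.database === dst.db);

  // Assumption: this page does not cover renames outside the filter.
  if (sourceEntry === undefined && targetEntry === undefined) {
    return "unspecified";
  }
  // The entire target database is in the filter: allowed both within
  // that database and across databases.
  if (targetEntry !== undefined && targetEntry.collections === undefined) {
    return "ok";
  }
  // Across databases, anything less than the whole target database fails.
  if (src.db !== dst.db) {
    return "error";
  }
  // Same database with listed collections: both names must be listed.
  const listed = targetEntry.collections;
  return listed.includes(src.coll) && listed.includes(dst.coll) ? "ok" : "error";
}

const renameFilter = [
  { database: "students", collections: ["undergrad", "graduate", "adjuncts"] },
  { database: "staff" },
];
console.log(checkRename(renameFilter, "staff.employees", "staff.salaried"));
console.log(checkRename(renameFilter, "students.graduate", "students.notAFilteredCollection"));
```

With this filter, the first call returns "ok" because the whole staff database is in the filter, and the second returns "error" because the new name is not listed for the students database.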

Filtering with $out

The $out aggregation stage creates a new collection when it runs. You can use the $out stage with filtering if you are filtering on the whole database and not just the collection specified in the $out statement.

For example, consider this aggregation pipeline:

use library
db.books.aggregate( [
   { $group : { _id : "$author", titles: { $push: "$title" } } },
   { $out : "authors" }
] )

The $out stage creates the authors collection in the library database. If you want to sync the authors collection, you must specify the entire library database in your filter. The filter will not work if you only specify the authors collection.

This filter works:

"includeNamespaces": [
   {
      "database": "library"
   }
]

This filter does not work with $out:

"includeNamespaces": [
   {
      "database": "library",
      "collections": [ "authors", "books" ]   // DOES NOT WORK WITH $OUT
   }
]
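
The $out rule reduces to one check: the database that $out writes to must appear in the filter without a "collections" list. A minimal sketch, where outIsSupported is a hypothetical helper and not a mongosync API:

```javascript
// Hypothetical helper: true when the filter includes the entire database,
// which is the condition this page gives for using $out with filtering.
function outIsSupported(filter, dbName) {
  const entry = filter.find((e) => e.database === dbName);
  return entry !== undefined && entry.collections === undefined;
}

const wholeDatabase = [{ database: "library" }];
const collectionsOnly = [
  { database: "library", collections: ["authors", "books"] },
];
console.log(outIsSupported(wholeDatabase, "library"));   // prints true
console.log(outIsSupported(collectionsOnly, "library")); // prints false
```
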

Limitations

  • Filtering is not supported with reversible sync.

  • The destination cluster must not contain user data prior to starting.

  • The destination cluster must not contain the mongosync_reserved_for_internal_use system database prior to starting.

  • You cannot modify a filter that is in use. To create a new filter, see: Replace an Existing Filter.

  • You can only rename collections in certain situations. For more details see: Adding and Renaming Collections.

  • If a filter includes a view but not the base collection, only the view is replicated.

  • You cannot specify system collections or system databases in a filter.

  • Operations that use the $out aggregation stage are only supported if the entire database is specified in the filter. You cannot limit the filter to a collection within the database. See: Filtering with $out.

Examples

Start mongosync with a Filter

The following example starts a sync job between cluster0 and cluster1. The source cluster is cluster0 and the destination cluster is cluster1.

cluster0 contains the sales, marketing, and engineering databases.

The sales database contains the EMEA, APAC, and AMER collections.

The includeNamespaces array in this example defines a filter on two of the databases, sales and marketing.

Within the sales database, the filter includes only the EMEA and APAC collections.

"includeNamespaces" : [
   {
      "database" : "sales",
      "collections": [ "EMEA", "APAC" ]
   },
   {
      "database" : "marketing"
   }
]

After you call the /start API with this filter in place, mongosync:

  • Syncs all of the collections in the marketing database

  • Filters out the engineering database

  • Syncs the EMEA and APAC collections from the sales database

  • Filters out the AMER collection
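
The outcome listed above follows from a simple inclusion rule. The sketch below models it for illustration: isSynced is a hypothetical helper, not mongosync code, and the campaigns and roadmap collection names are made up for the demonstration because this page does not name any collections in the marketing or engineering databases.

```javascript
// Hypothetical model of the inclusion rule: a namespace is synced when its
// database is in the filter and either no collections are listed for that
// database or the collection is one of those listed.
function isSynced(filter, database, collection) {
  // No filter at all means a full cluster sync.
  if (filter === undefined) return true;
  const entry = filter.find((e) => e.database === database);
  if (entry === undefined) return false;
  if (entry.collections === undefined) return true;
  return entry.collections.includes(collection);
}

const salesFilter = [
  { database: "sales", collections: ["EMEA", "APAC"] },
  { database: "marketing" },
];
console.log(isSynced(salesFilter, "marketing", "campaigns")); // prints true
console.log(isSynced(salesFilter, "engineering", "roadmap")); // prints false
console.log(isSynced(salesFilter, "sales", "EMEA"));          // prints true
console.log(isSynced(salesFilter, "sales", "AMER"));          // prints false
```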

The following example starts a sync job between cluster0 and cluster1. The source cluster is cluster0 and the destination cluster is cluster1.

cluster0 contains the students, staff, and prospects databases.

  • The students database contains the undergrad and graduate collections.

  • The staff database contains the employees and contractors collections.

The includeNamespaces array in this example defines a filter on two of the databases:

{
   "source": "cluster0",
   "destination": "cluster1",
   "includeNamespaces":
      [
         { "database" : "students", "collections": ["undergrad", "graduate", "adjuncts"] },
         { "database" : "staff" }
      ]
}

With this filter in place, mongosync syncs:

  • The entire staff database

  • The undergrad, graduate, and adjuncts collections in the students database

mongosync does not sync any information from the prospects database.

mongosync syncs the entire staff database. If you add new collections to the staff database, mongosync syncs them too.

mongosync does not sync new collections that are added to the students database unless the collection is a part of the filter.

For example, if you add the postdocs collection to the students database, mongosync does not sync it. If you add the adjuncts collection, mongosync syncs it because adjuncts is part of the filter.

You can rename any collection in the staff database.

// This code works
use admin
db.runCommand( { renameCollection: "staff.employees", to: "staff.salaried" } )

You can only rename a collection within the students database if the new and old names are both in the filter. If either of the names is not in the filter, mongosync reports an error and exits.

// This code works
use admin
db.runCommand( { renameCollection: "students.graduate", to: "students.adjuncts" } )

If a collection is specified in the filter, you can drop it, but you cannot rename it to remove it from the filter.

// This code produces an error and mongosync stops syncing
use admin
db.runCommand( { renameCollection: "students.graduate", to: "students.notAFilteredCollection" } )

When the whole target database is included in the filter, you can rename collections to add them to the filter:

  • Source collection is specified in the filter

    use admin
    db.runCommand( { renameCollection: "students.adjuncts", to: "staff.adjuncts" } )
  • Source collection is not specified in the filter

    use admin
    db.runCommand( { renameCollection: "prospects.current", to: "staff.newHires" } )

You can also rename collections in the source database when the whole target database is in the filter:

use admin
db.runCommand( { renameCollection: "staff.employees", to: "staff.onPayroll" } )

Important

If you anticipate renaming collections, consider adding the entire database to the filter rather than specifying individual collections.

© 2023 MongoDB, Inc.
