Filtered Sync
New in version 1.1.
Cluster-to-Cluster Sync provides continuous data synchronization or a one-time data migration between two MongoDB clusters. You can use filtered sync to specify which databases and collections the mongosync utility transfers between the source and destination clusters.
Configure a Filter
Important
Once you start mongosync with a filter in place, the filter cannot be modified. If you need to create a new filter, see Replace an Existing Filter.
Identify Databases and Collections.
Identify the databases and collections that you want to sync to the destination cluster. When you add a set of databases to the filter, you also exclude any other databases in the cluster.
When you specify a collection in your filter, you also exclude any other collections that are in the same database.
Create a Filter.
The includeNamespaces parameter specifies an optional filter that you can pass to the /start API. If you don't specify a filter, mongosync does a full cluster sync.
The filter syntax is:
[
   {
      "database": "databaseOne",   // required
      "collections": [             // optional
         "collectionOne",
         "collectionTwo"
      ]
   },
   {
      "database": "databaseTwo"
   }
]
Create an entry in the includeNamespaces array for each database that you identified in step 1. Use the "database" field to specify the database name.
If you want to filter on collections within a database, add those collections to the list in the "collections" field for that database entry.
Use the Filter.
To use the filter, attach the filter JSON when you make the /start API call to begin syncing.
curl -X POST "http://localhost:27182/api/v1/start" --data '
   {
      "source": "cluster0",
      "destination": "cluster1",
      "includeNamespaces" : [
         {
            "database" : "sales",
            "collections": [ "EMEA", "APAC" ]
         },
         {
            "database" : "marketing"
         }
      ]
   } '
For an example configuration, see Start mongosync with a Filter.
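The three steps above can also be sketched in JavaScript. This is a minimal illustration, not part of mongosync: buildIncludeNamespaces, buildStartBody, and startSync are hypothetical helper names, the convention that null means "whole database" belongs to this sketch only, and the Content-Type header is an assumption (the curl example does not set one). startSync relies on the global fetch available in Node.js 18 and later.

```javascript
// Hypothetical helper: convert { databaseName: [collections] | null } into
// the includeNamespaces array. In this sketch, null or an empty list means
// "sync the whole database", so the "collections" field is left out.
function buildIncludeNamespaces(selection) {
  return Object.entries(selection).map(([database, collections]) => {
    if (Array.isArray(collections) && collections.length > 0) {
      return { database, collections: [...collections] };
    }
    return { database };
  });
}

// Hypothetical helper: assemble the JSON body for the /start call.
function buildStartBody(source, destination, selection) {
  return {
    source,
    destination,
    includeNamespaces: buildIncludeNamespaces(selection),
  };
}

// Sends the body to a running mongosync instance. The URL matches the curl
// example; the Content-Type header is an assumption of this sketch.
async function startSync(body, baseUrl = "http://localhost:27182") {
  const response = await fetch(`${baseUrl}/api/v1/start`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  return response.json();
}

// Build the same request body as the curl example.
const startBody = buildStartBody("cluster0", "cluster1", {
  sales: ["EMEA", "APAC"],
  marketing: null,
});
console.log(JSON.stringify(startBody, null, 2));
```

Calling startSync(startBody) would send the request. The call is left out here because it needs a running mongosync instance and because the filter cannot be changed once the sync starts.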
Replace an Existing Filter
You cannot update an existing filter. You must stop the ongoing sync process, prepare the destination cluster, and restart mongosync with a new filter.
When mongosync ran your original filter, it created databases with your data ("user databases") and the mongosync_reserved_for_internal_use system database on the destination cluster. You must remove those databases before restarting mongosync with your new filter.
Follow these steps to prepare the destination cluster for a new filter.
Remove mongosync_reserved_for_internal_use.
Stop the mongosync process.
Connect to the destination cluster with mongosh. If the destination is a sharded cluster, connect to the mongos instance. If the destination is a replica set, connect to the primary mongod instance.
Drop the mongosync_reserved_for_internal_use system database.
use mongosync_reserved_for_internal_use
db.dropDatabase()
Remove user databases.
List the databases in the cluster:
show databases
Remove user databases. The admin, local, and config databases are system databases. You should not edit these system databases without instructions from MongoDB support.
If the show databases command lists any user databases on the destination cluster, you must remove them.
Repeat this step for each user database listed:
use <user database name>
db.dropDatabase()
Note: After the first db.dropDatabase() operation completes, you may need to run it a second time to remove the database.
Configure a new filter.
Decide which databases and collections you want to filter on. Add the databases and collections to the includeNamespaces array. For configuration details, see Configure a Filter.
Run mongosync to reconnect to the source and destination clusters.
Use the /start API endpoint to start syncing. Be sure to attach your new filter when you call /start.
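For the "Remove user databases" step, the selection of databases to drop could be sketched as follows. The helper name userDatabasesToDrop is hypothetical. The sketch only decides which names count as user databases; the mongosh calls that would list and drop them appear as comments because they are destructive and should be reviewed before use.

```javascript
// System databases named on this page, plus the mongosync internal database
// that is dropped in its own step.
const SYSTEM_DATABASES = ["admin", "local", "config"];
const MONGOSYNC_INTERNAL = "mongosync_reserved_for_internal_use";

// Hypothetical helper: given the names reported by "show databases",
// return only the user databases.
function userDatabasesToDrop(databaseNames) {
  return databaseNames.filter(
    (name) => !SYSTEM_DATABASES.includes(name) && name !== MONGOSYNC_INTERNAL
  );
}

// In mongosh, connected to the destination cluster, the helper could be
// used like this (commented out because it drops data):
//
//   const names = db.adminCommand({ listDatabases: 1, nameOnly: true })
//     .databases.map((d) => d.name);
//   for (const name of userDatabasesToDrop(names)) {
//     db.getSiblingDB(name).dropDatabase();
//   }

const toDrop = userDatabasesToDrop(["admin", "config", "local", "sales", "marketing"]);
console.log(toDrop); // [ 'sales', 'marketing' ]
```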
Adding and Renaming Collections
You can, with some restrictions, add or rename a collection during a filtered sync.
Warning
If your renaming operation violates the renaming restrictions, mongosync stops syncing and reports an error.
To clean up and restart after an error, follow the steps to replace an existing filter.
Adding and Renaming Within a Single Database
You can add new collections or rename an existing collection if the entire database is part of the filter.
You can also rename a collection if the old name and the new name are both specified in the filter.
See the renaming examples.
Renaming Across Different Databases
You can only rename a collection across databases if the entire target database is part of a filter. If the filter specifies individual collections in the target database, renaming across databases does not work.
See the renaming examples.
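The renaming rules above can be expressed as a small check that you run before issuing a renameCollection command. This is a sketch of the rules as stated on this page, not of mongosync's own logic. The function names are hypothetical, and the null result marks a case the page does not describe: a rename where neither database appears in the filter.

```javascript
// Split "database.collection" at the first dot.
function splitNamespace(namespace) {
  const dot = namespace.indexOf(".");
  return [namespace.slice(0, dot), namespace.slice(dot + 1)];
}

// Find the filter entry for a database, or undefined if there is none.
function findEntry(filter, database) {
  return filter.find((entry) => entry.database === database);
}

// True when the entry covers the whole database (no "collections" list).
function coversWholeDatabase(entry) {
  return entry !== undefined && entry.collections === undefined;
}

// Returns true (rename permitted), false (mongosync would report an error),
// or null (neither database is in the filter; not described on this page).
function checkRename(filter, from, to) {
  const [fromDb, fromColl] = splitNamespace(from);
  const [toDb, toColl] = splitNamespace(to);
  const source = findEntry(filter, fromDb);
  const target = findEntry(filter, toDb);

  if (source === undefined && target === undefined) return null;

  // Across databases: the entire target database must be in the filter.
  if (fromDb !== toDb) return coversWholeDatabase(target);

  // Within one database: either the whole database is in the filter,
  // or both the old and the new name are listed.
  if (coversWholeDatabase(target)) return true;
  return target.collections.includes(fromColl) && target.collections.includes(toColl);
}

const renameFilter = [
  { database: "students", collections: ["undergrad", "graduate", "adjuncts"] },
  { database: "staff" },
];
console.log(checkRename(renameFilter, "staff.employees", "staff.salaried"));
console.log(checkRename(renameFilter, "students.graduate", "students.postdocs"));
console.log(checkRename(renameFilter, "students.adjuncts", "staff.adjuncts"));
```

With this filter, the first and third calls return true and the second returns false, because postdocs is not listed in the students entry.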
Filtering with $out
The $out aggregation stage creates a new collection when it runs. You can use the $out stage with filtering if you are filtering on the whole database and not just the collection specified in the $out statement.
For example, consider this aggregation pipeline:
use library
db.books.aggregate( [
   { $group : { _id : "$author", titles: { $push: "$title" } } },
   { $out : "authors" }
] )
The $out stage creates the authors collection in the library database. If you want to sync the authors collection, you must specify the entire library database in your filter. The filter will not work if you only specify the authors collection.
This filter works:
"includeNamespaces": [ { "database": "library" } ]
This filter does not work with $out:
"includeNamespaces": [
   {
      "database": "library",
      "collections": [ "authors", "books" ]   // DOES NOT WORK WITH $OUT
   }
]
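To check a filter before running a pipeline that ends in $out, you can test whether the target database appears in the filter without a "collections" list. The helper below is a hypothetical sketch of that rule and is not a mongosync API.

```javascript
// Hypothetical helper: true when the filter includes the entire database,
// which is the condition this page gives for $out to work with filtering.
function filterSupportsOut(includeNamespaces, database) {
  return includeNamespaces.some(
    (entry) => entry.database === database && entry.collections === undefined
  );
}

const wholeLibrary = [{ database: "library" }];
const listedCollections = [
  { database: "library", collections: ["authors", "books"] },
];

console.log(filterSupportsOut(wholeLibrary, "library"));      // true
console.log(filterSupportsOut(listedCollections, "library")); // false
```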
Limitations
Filtering is not supported with reversible sync.
The destination cluster must not contain user data prior to starting.
The destination cluster must not contain the mongosync_reserved_for_internal_use system database prior to starting.
You cannot modify a filter that is in use. To create a new filter, see: Replace an Existing Filter.
You can only rename collections in certain situations. For more details see: Adding and Renaming Collections.
If a filter includes a view but not the base collection, only the view is replicated.
You cannot specify system collections or system databases in a filter.
Operations that use the $out aggregation stage are only supported if the entire database is specified in the filter. You cannot limit the filter to a collection within the database. See: Filtering with $out.
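Some of these limitations can be checked on the filter itself before you call /start. The sketch below is a hypothetical pre-flight check, not something mongosync provides. It assumes that the system databases are the ones named on this page and that system collections follow MongoDB's "system." naming prefix. The checks that mongosync itself performs are not described here.

```javascript
// Databases that this sketch treats as system databases (assumption: the
// three named on this page plus the mongosync internal database).
const RESERVED_DATABASES = [
  "admin",
  "local",
  "config",
  "mongosync_reserved_for_internal_use",
];

// Hypothetical helper: return a list of human-readable problems found in an
// includeNamespaces array. An empty list means no problem was detected.
function findFilterProblems(includeNamespaces) {
  const problems = [];
  for (const entry of includeNamespaces) {
    if (typeof entry.database !== "string" || entry.database === "") {
      problems.push('entry is missing the required "database" field');
      continue;
    }
    if (RESERVED_DATABASES.includes(entry.database)) {
      problems.push(`system database in filter: ${entry.database}`);
    }
    for (const collection of entry.collections ?? []) {
      if (collection.startsWith("system.")) {
        problems.push(`system collection in filter: ${entry.database}.${collection}`);
      }
    }
  }
  return problems;
}

console.log(findFilterProblems([
  { database: "sales", collections: ["EMEA", "system.views"] },
  { database: "admin" },
]));
```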
Examples
Start mongosync with a Filter
The following example starts a sync job between cluster0 and cluster1. The source cluster is cluster0 and the destination cluster is cluster1.
cluster0 contains the sales, marketing, and engineering databases.
The sales database contains the EMEA, APAC, and AMER collections.
The includeNamespaces array in this example defines a filter on two of the databases, sales and marketing.
Within the sales database, the filter includes only the EMEA and APAC collections.
"includeNamespaces" : [
   {
      "database" : "sales",
      "collections": [ "EMEA", "APAC" ]
   },
   {
      "database" : "marketing"
   }
]
After you call the /start API with this filter in place, mongosync:
Syncs all of the collections in the marketing database
Filters out the engineering database
Syncs the EMEA and APAC collections from the sales database
Filters out the AMER collection
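The outcome listed above follows from a simple membership rule, sketched below. The function isNamespaceSynced is a hypothetical illustration of how the filter is read, not mongosync code. The collection names campaigns and designs are made up for the example, since the page does not name any collections in marketing or engineering.

```javascript
// Hypothetical helper: decide whether database.collection falls inside the
// filter. A database entry without a "collections" list includes every
// collection in that database.
function isNamespaceSynced(includeNamespaces, database, collection) {
  const entry = includeNamespaces.find((e) => e.database === database);
  if (entry === undefined) return false;           // database filtered out
  if (entry.collections === undefined) return true; // whole database included
  return entry.collections.includes(collection);
}

const salesFilter = [
  { database: "sales", collections: ["EMEA", "APAC"] },
  { database: "marketing" },
];

console.log(isNamespaceSynced(salesFilter, "marketing", "campaigns")); // true
console.log(isNamespaceSynced(salesFilter, "engineering", "designs")); // false
console.log(isNamespaceSynced(salesFilter, "sales", "EMEA"));          // true
console.log(isNamespaceSynced(salesFilter, "sales", "AMER"));          // false
```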
Adding and Renaming Collections While Syncing
The following example starts a sync job between cluster0 and cluster1. The source cluster is cluster0 and the destination cluster is cluster1.
cluster0 contains the students, staff, and prospects databases.
The students database contains the undergrad and graduate collections.
The staff database contains the employees and contractors collections.
The includeNamespaces array in this example defines a filter on two of the databases:
{
   "source": "cluster0",
   "destination": "cluster1",
   "includeNamespaces": [
      {
         "database" : "students",
         "collections": ["undergrad", "graduate", "adjuncts"]
      },
      {
         "database" : "staff"
      }
   ]
}
With this filter in place, mongosync syncs:
The entire staff database
The undergrad, graduate, and adjuncts collections in the students database
mongosync does not sync any information from the prospects database.
Adding a Collection
mongosync syncs the entire staff database. If you add new collections to the staff database, mongosync syncs them too.
mongosync does not sync new collections that are added to the students database unless the collection is a part of the filter.
For example, mongosync does not sync the new collection if you add the postdocs collection to the students database. If you add the adjuncts collection, mongosync syncs it since adjuncts is part of the filter.
Renaming a Collection
You can rename any collection in the staff database.
// This code works
use admin
db.runCommand( { renameCollection: "staff.employees", to: "staff.salaried" } )
You can only rename a collection within the students database if the new and old names are both in the filter. If either of the names is not in the filter, mongosync reports an error and exits.
// This code works
use admin
db.runCommand( { renameCollection: "students.graduate", to: "students.adjuncts" } )
If a collection is specified in the filter, you can drop it, but you cannot rename it to remove it from the filter.
// This code produces an error and mongosync stops syncing
use admin
db.runCommand( { renameCollection: "students.graduate", to: "students.notAFilteredCollection" } )
When the whole target database is included in the filter, you can rename collections to add them to the filter:
Source collection is specified in the filter
use admin
db.runCommand( { renameCollection: "students.adjuncts", to: "staff.adjuncts" } )
Source collection is not specified in the filter
use admin
db.runCommand( { renameCollection: "prospects.current", to: "staff.newHires" } )
You can also rename collections in the source database when the whole target database is in the filter:
use admin
db.runCommand( { renameCollection: "staff.employees", to: "staff.onPayroll" } )
Important
If you anticipate renaming collections, consider adding the entire database to the filter rather than specifying individual collections.