New in version 1.1.
Cluster-to-Cluster Sync provides continuous data synchronization or a one-time data migration between two MongoDB clusters. You can use filtered sync to specify which databases and collections the mongosync utility transfers between the source and destination clusters.
Starting in 1.1, mongosync supports inclusion filters to specify which
databases and collections to include in sync. Starting in 1.6, mongosync
also supports exclusion filters and regular expressions.
With inclusion filters, mongosync syncs matching databases and collections.
With exclusion filters, mongosync syncs all databases and collections, except for those that match the filters.
With both inclusion and exclusion filters, mongosync only syncs databases and collections that match the inclusion filters, then excludes any that also match the exclusion filters.
With no filters, mongosync syncs all databases and collections.
Filter Syntax
The start API endpoint accepts two fields that configure
filtered sync: includeNamespaces and excludeNamespaces.
Each field takes an array of filters that specify the databases and collections
to include or exclude from sync.
Note
If the start call uses both includeNamespaces and
excludeNamespaces parameters, mongosync first matches databases
and collections from the inclusion filters, then excludes those that
also match an exclusion filter.
Filters have the following syntax:
"includeNamespaces": [
   {
      "database": "<database-name>",
      "collections": [ "<collection-name>" ],
      "databaseRegex": {
         "pattern": "<regex-pattern>",
         "options": "<options>"
      },
      "collectionsRegex": {
         "pattern": "<regex-pattern>",
         "options": "<options>"
      }
   }
],
"excludeNamespaces": [
   {
      "database": "<database-name>",
      "collections": [ "<collection-name>" ],
      "databaseRegex": {
         "pattern": "<regex-pattern>",
         "options": "<options>"
      },
      "collectionsRegex": {
         "pattern": "<regex-pattern>",
         "options": "<options>"
      }
   }
]
Filters must include either the database field or the databaseRegex field.
If you need the filter to match specific collections, you can use either
the collections array to specify collections individually or define
a regular expression using the collectionsRegex field.
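The matching rules described so far can be modelled in a few lines of JavaScript. This is only a sketch for reasoning about a filter before you use it, not mongosync's implementation. The helper names (toRegExp, filterMatches, isSynced) are hypothetical, and the sketch assumes that a filter with both collections and collectionsRegex matches either one, that database takes precedence over databaseRegex, and that the regular expression options map onto the JavaScript flags i, m, and s.

```javascript
// Illustrative model only: mirrors the rules in the text, not mongosync itself.

// Convert a regex filter into a JavaScript RegExp.
// Assumption: only options that JavaScript also understands (i, m, s) are kept.
function toRegExp(regexFilter) {
  const kept = (regexFilter.options || "").replace(/[^ims]/g, "");
  const flags = [...new Set(kept)].join(""); // drop repeated flags
  return new RegExp(regexFilter.pattern, flags);
}

// Does one filter entry match the namespace <db>.<coll>?
function filterMatches(filter, db, coll) {
  if (filter.database === undefined && filter.databaseRegex === undefined) {
    throw new Error("a filter needs either database or databaseRegex");
  }
  const dbMatches =
    filter.database !== undefined
      ? filter.database === db
      : toRegExp(filter.databaseRegex).test(db);
  if (!dbMatches) return false;

  const hasList = Array.isArray(filter.collections);
  const hasRegex = filter.collectionsRegex !== undefined;
  // No collection-level field: the filter covers the whole database.
  if (!hasList && !hasRegex) return true;
  if (hasList && filter.collections.includes(coll)) return true;
  return hasRegex && toRegExp(filter.collectionsRegex).test(coll);
}

// Inclusion filters are applied first, then exclusion filters.
// With no inclusion filters, every namespace starts out included.
function isSynced(db, coll, includeNamespaces = [], excludeNamespaces = []) {
  const included =
    includeNamespaces.length === 0 ||
    includeNamespaces.some((f) => filterMatches(f, db, coll));
  const excluded = excludeNamespaces.some((f) => filterMatches(f, db, coll));
  return included && !excluded;
}

// Usage with a hypothetical pair of filters.
const include = [{ database: "sales", collectionsRegex: { pattern: "^accounts_", options: "i" } }];
const exclude = [{ database: "sales", collections: ["accounts_old"] }];
console.log(isSynced("sales", "accounts_2024", include, exclude)); // true
console.log(isSynced("sales", "accounts_old", include, exclude));  // false
console.log(isSynced("marketing", "leads", include, exclude));     // false
```

Running a handful of namespaces through a model like this is a cheap way to check a filter before starting a sync, since the filter cannot be changed afterwards.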
Configure a Filter
Important
Once you start mongosync with a filter in place, the filter
cannot be modified. If you do need to create a new filter,
see: Replace an Existing Filter.
Identify Databases and Collections.
Identify the databases and collections that you want to sync to the destination cluster.
When you add a set of databases to the filter, you also exclude any other databases in the cluster.
When you specify a collection in your filter, you also exclude any other collections that are in the same database.
Create a Filter.
The start API accepts two parameters that configure
optional filters:
The includeNamespaces parameter takes an array of filters that determine which databases and collections mongosync should include in the sync.
The excludeNamespaces parameter takes an array of filters that determine which databases and collections mongosync should exclude from the sync.
If you don't specify a filter, mongosync performs a full cluster
sync.
Create inclusion and/or exclusion filters to identify the databases and collections you want to sync.
For example, the following inclusion and exclusion filters configure mongosync to sync
only collections whose names begin with accounts_ from the sales
database, except for the accounts_old collection:
"includeNamespaces": [
   {
      "database": "sales",
      "collectionsRegex": {
         "pattern": "^accounts_.+?$",
         "options": "ms"
      }
   }
],
"excludeNamespaces": [
   {
      "database": "sales",
      "collections": [ "accounts_old" ]
   }
]
For more information on filters, see Filter Syntax.
Use the Filter.
To use the filter, attach the filter JSON when you make the /start API call to begin syncing.
curl -X POST "http://localhost:27182/api/v1/start" --data '
   {
      "source": "cluster0",
      "destination": "cluster1",
      "includeNamespaces": [
         {
            "database": "sales",
            "collectionsRegex": {
               "pattern": "^accounts_.+$",
               "options": "i"
            }
         },
         {
            "database": "marketing"
         }
      ]
   } '
For an example configuration, see: Start mongosync with a Filter.
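If you prefer to make the /start call from a script rather than curl, the request can be sketched as follows for Node.js 18 or later, which provides a global fetch. The URL and the source and destination values are taken from the curl example above. The helper names buildStartRequest and startSync and the Content-Type header are assumptions made for this illustration.

```javascript
// Sketch only: builds the same request body as the curl example.
// buildStartRequest and startSync are hypothetical helper names.
function buildStartRequest(includeNamespaces, excludeNamespaces) {
  const body = { source: "cluster0", destination: "cluster1" };
  // Only attach the filter fields that are actually in use.
  if (Array.isArray(includeNamespaces) && includeNamespaces.length > 0) {
    body.includeNamespaces = includeNamespaces;
  }
  if (Array.isArray(excludeNamespaces) && excludeNamespaces.length > 0) {
    body.excludeNamespaces = excludeNamespaces;
  }
  return {
    url: "http://localhost:27182/api/v1/start",
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" }, // assumption
      body: JSON.stringify(body),
    },
  };
}

// Sends the request and returns the raw response text.
async function startSync(includeNamespaces, excludeNamespaces) {
  const { url, options } = buildStartRequest(includeNamespaces, excludeNamespaces);
  const response = await fetch(url, options);
  return response.text();
}

// Example call, left commented out because it starts a real sync:
// startSync([
//   { database: "sales", collectionsRegex: { pattern: "^accounts_.+$", options: "i" } },
//   { database: "marketing" },
// ]).then(console.log);
```

Keeping the request construction separate from the network call makes it possible to print and review the exact JSON before anything is sent.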
Replace an Existing Filter
You cannot update an existing filter. You must stop the ongoing sync
process, prepare the destination cluster, and restart mongosync with
a new filter.
When mongosync ran your original filter, it created databases with
your data ("user databases") and the
mongosync_reserved_for_internal_use system database on the
destination cluster. You must remove those databases before restarting
mongosync with your new filter.
Follow these steps to prepare the destination cluster for a new filter.
Remove mongosync_reserved_for_internal_use.
Stop the mongosync process.
Connect to the destination cluster with mongosh. If the destination is a sharded cluster, connect to the mongos instance. If the destination is a replica set, connect to the primary mongod instance.
Drop the mongosync_reserved_for_internal_use system database.
use mongosync_reserved_for_internal_use
db.dropDatabase()
Remove user databases.
List the databases in the cluster:
show databases
Remove user databases. The admin, local, and config databases are system databases. You should not edit these system databases without instructions from MongoDB support.
If the show databases command lists any user databases on the destination cluster, you must remove them.
Repeat this step for each user database listed:
use <user database name>
db.dropDatabase()
Note: After the first db.dropDatabase() operation completes, you may need to run it a second time to remove the database.
Configure a new filter.
Decide which databases and collections you want to filter on. Add the databases and collections to the includeNamespaces array. For configuration details, see Configure a Filter.
Run mongosync to reconnect to the source and destination clusters.
Use the /start API endpoint to start syncing. Be sure to attach your new filter when you call /start.
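When the destination holds many user databases, the "Remove user databases" step above can be scripted in mongosh instead of repeated by hand. The following is a minimal sketch under stated assumptions: databasesToDrop and dropUserDatabases are hypothetical helper names, and the script treats every database other than admin, local, and config as one to drop. That includes mongosync_reserved_for_internal_use if it is still present. Review the list it would act on before running it against a real cluster.

```javascript
// System databases named in the text; these are never dropped.
const SYSTEM_DATABASES = ["admin", "local", "config"];

// Pure helper: choose the databases that the cleanup step has to remove.
function databasesToDrop(names) {
  return names.filter((name) => !SYSTEM_DATABASES.includes(name));
}

// Hypothetical mongosh helper. Call it manually as dropUserDatabases(db)
// while connected to the destination cluster.
function dropUserDatabases(db) {
  const listing = db.adminCommand({ listDatabases: 1, nameOnly: true });
  const names = listing.databases.map((d) => d.name);
  for (const name of databasesToDrop(names)) {
    db.getSiblingDB(name).dropDatabase();
  }
  // Report what is still listed, so a second pass can be run if needed.
  const after = db.adminCommand({ listDatabases: 1, nameOnly: true });
  return databasesToDrop(after.databases.map((d) => d.name));
}
```

The function returns the user databases that are still listed after the drop, which matches the note above that a second db.dropDatabase() run may be needed.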
Adding and Renaming Collections
You can, with some restrictions, add or rename a collection during a filtered sync.
Warning
If your renaming operation violates the renaming restrictions,
mongosync stops syncing and reports an error.
To clean up and restart after an error, follow the steps to replace an existing filter.
Adding and Renaming Within a Single Database
You can add new collections or rename an existing collection if the entire database is part of the filter.
You can also rename a collection if the old name and the new name are both specified in the filter.
See the renaming examples.
Renaming Across Different Databases
You can only rename a collection across databases if the entire target database is part of a filter. If the filter specifies individual collections in the target database, renaming across databases does not work.
See the renaming examples.
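The renaming restrictions in this section can be summarised as a small decision function. This is an illustrative model of the rules as stated here, not mongosync's own logic. The function names are hypothetical, and the sketch only handles inclusion filters that use database with an optional collections array (no regular expressions and no exclusion filters). Renames that touch no filtered namespace are reported as "unspecified" because this page does not describe them.

```javascript
// Split "db.collection" at the first dot; database names cannot contain dots.
function splitNamespace(ns) {
  const dot = ns.indexOf(".");
  return [ns.slice(0, dot), ns.slice(dot + 1)];
}

// True when a filter entry names the database without listing collections.
function wholeDatabaseInFilter(db, include) {
  return include.some((f) => f.database === db && !Array.isArray(f.collections));
}

// True when the collection is covered by a whole-database or per-collection entry.
function collectionInFilter(db, coll, include) {
  return include.some(
    (f) =>
      f.database === db &&
      (!Array.isArray(f.collections) || f.collections.includes(coll))
  );
}

// Returns "allowed", "error", or "unspecified" for a proposed rename.
function renameOutcome(from, to, include) {
  const [fromDb, fromColl] = splitNamespace(from);
  const [toDb, toColl] = splitNamespace(to);
  const fromIn = collectionInFilter(fromDb, fromColl, include);
  const toIn = collectionInFilter(toDb, toColl, include);

  // The entire target database is part of the filter.
  if (wholeDatabaseInFilter(toDb, include)) return "allowed";
  // Same database, and both the old and the new name are in the filter.
  if (fromDb === toDb && fromIn && toIn) return "allowed";
  // Neither name touches the filter: not described by the text.
  if (!fromIn && !toIn) return "unspecified";
  return "error";
}

// Usage with a hypothetical filter.
const renameFilter = [
  { database: "inventory", collections: ["current", "archive"] },
  { database: "reports" },
];
console.log(renameOutcome("inventory.current", "inventory.archive", renameFilter)); // allowed
console.log(renameOutcome("inventory.current", "inventory.scratch", renameFilter)); // error
console.log(renameOutcome("inventory.current", "reports.current", renameFilter));   // allowed
```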
Filtering with mapReduce and $out
To use the $out aggregation stage or
the mapReduce command (when set to create
or replace a collection) with filtering, you must
filter the whole database and not just
the specified collection.
For example, consider this aggregation pipeline:
use library
db.books.aggregate( [
   { $group : { _id : "$author", titles: { $push: "$title" } } },
   { $out : "authors" }
] )
The $out stage creates the authors collection in the library
database. If you want to sync the authors collection, you must
specify the entire library database in your filter. The filter will
not work if you only specify the authors collection.
This filter works:
"includeNamespaces": [ { "database": "library" } ]
This filter does not work:
"includeNamespaces": [
   {
      "database": "library",
      "collections": [ "authors", "books" ]   // DOES NOT WORK WITH $OUT
   }
]
Limitations
Filtering is not supported with reversible sync.
The destination cluster must not contain user data prior to starting.
The destination cluster must not contain the mongosync_reserved_for_internal_use system database prior to starting.
You cannot modify a filter that is in use. To create a new filter, see: Replace an Existing Filter.
You can only rename collections in certain situations. For more details see: Adding and Renaming Collections.
If a filter includes a view but not the base collection, only the view metadata syncs to the destination cluster. To include the view documents, you must also sync the base collection.
You cannot specify system collections or system databases in a filter.
To use the $out aggregation stage or the mapReduce command (when set to create or replace a collection) with filtering, you must configure the filter to use the entire database. You cannot limit the filter to collections within the database.
For more information, see Filtering with mapReduce and $out.
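Because a filter cannot be changed once the sync starts, it can help to check the $out and mapReduce limitation beforehand. The sketch below lists the inclusion filter entries that narrow a given database to specific collections. The function name collectionLevelFilters is hypothetical, and treating collectionsRegex as a collection-level restriction is an assumption based on the wording of the limitation.

```javascript
// Returns the filter entries that restrict <db> to specific collections.
// An empty result means no entry limits that database to collections.
function collectionLevelFilters(db, includeNamespaces) {
  return includeNamespaces.filter(
    (f) =>
      f.database === db &&
      // Assumption: a collectionsRegex narrows the database just like a list.
      (Array.isArray(f.collections) || f.collectionsRegex !== undefined)
  );
}

// Usage with the two filters from "Filtering with mapReduce and $out".
const wholeLibrary = [{ database: "library" }];
const narrowedLibrary = [{ database: "library", collections: ["authors", "books"] }];
console.log(collectionLevelFilters("library", wholeLibrary).length);    // 0
console.log(collectionLevelFilters("library", narrowedLibrary).length); // 1
```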
Examples
Start mongosync with a Filter
The following example starts a sync job between cluster0 and
cluster1. The source cluster is cluster0 and the destination
cluster is cluster1.
cluster0 contains the sales, marketing, and
engineering databases.
The sales database contains the EMEA, APAC, and AMER
collections.
The includeNamespaces array in this example defines a filter on two
of the databases, sales and marketing.
Within the sales database, the filter also limits the sync to the EMEA and APAC
collections.
"includeNamespaces" : [
   {
      "database" : "sales",
      "collections": [ "EMEA", "APAC" ]
   },
   {
      "database" : "marketing"
   }
]
After you call the /start API with this filter in place,
mongosync:
Syncs all of the collections in the marketing database
Filters out the engineering database
Syncs the EMEA and APAC collections from the sales database
Filters out the AMER collection
Adding and Renaming Collections While Syncing
The following example starts a sync job between cluster0 and
cluster1. The source cluster is cluster0 and the destination
cluster is cluster1.
cluster0 contains the students, staff, and prospects
databases.
The students database contains the undergrad and graduate collections.
The staff database contains the employees and contractors collections.
The includeNamespaces array in this example defines a filter on two
of the databases:
{
   "source": "cluster0",
   "destination": "cluster1",
   "includeNamespaces": [
      {
         "database" : "students",
         "collections": [ "undergrad", "graduate", "adjuncts" ]
      },
      {
         "database" : "staff"
      }
   ]
}
With this filter in place, mongosync syncs:
The entire staff database
The undergrad, graduate, and adjuncts collections in the students database
mongosync does not sync any information from the prospects
database.
Adding a Collection
mongosync syncs the entire staff database. If you add new
collections to the staff database, mongosync syncs them too.
mongosync does not sync new collections that are added to
the students database unless the collection is a part of the filter.
For example, mongosync does not sync the new collection if you add
the postdocs collection to the students database. If you add the
adjuncts collection, mongosync syncs it since adjuncts is
part of the filter.
Renaming a Collection
You can rename any collection in the staff database.
// This code works
use admin
db.runCommand( { renameCollection: "staff.employees", to: "staff.salaried" } )
You can only rename a collection within the students database if the
new and old names are both in the filter. If either of the names is not
in the filter, mongosync reports an error and exits.
// This code works
use admin
db.runCommand( { renameCollection: "students.graduate", to: "students.adjuncts" } )
If a collection is specified in the filter, you can drop it, but you cannot rename it to remove it from the filter.
// This code produces an error and mongosync stops syncing
use admin
db.runCommand( { renameCollection: "students.graduate", to: "students.notAFilteredCollection" } )
When the whole target database is included in the filter, you can rename collections to add them to the filter:
Source collection is specified in the filter
use admin
db.runCommand( { renameCollection: "students.adjuncts", to: "staff.adjuncts" } )
Source collection is not specified in the filter
use admin
db.runCommand( { renameCollection: "prospects.current", to: "staff.newHires" } )
You can also rename collections in the source database when the whole target database is in the filter:
use admin
db.runCommand( { renameCollection: "staff.employees", to: "staff.onPayroll" } )
Important
If you anticipate renaming collections, consider adding the entire database to the filter rather than specifying individual collections.