Push non-matching documents and collections from a local database to a remote database

Prasanna_Sasne · January 13, 2023, 11:29pm

I want to push non-matching documents and collections from a local database to a remote database. What would be the best way to do that? Please note that both databases are standalone databases, not replicas or shards.

Requirement:

I want to show the documents from the local database that do not match the remote database, and then push only a few of the resulting documents. I do not want to push all documents automatically from the local to the remote database.

How can I push only selected documents to a remote database once the difference has been calculated? I read about aggregation pipelines, but they do not work across two standalone databases. Please suggest the best way to handle this case.

Or, if aggregation pipelines can be used without replicas, please suggest that approach too.
Or can I achieve this with replicas, for example by updating a few documents remotely instead of updating everything automatically? I want to update some documents from local to remote only on certain API calls. Can someone help me get this right?

Justin_Jenkins · January 14, 2023, 12:35am

Hello @Prasanna_Sasne,

I believe if you have two separate servers you cannot (without some custom code) do something like a $merge in one operation, to aggregate and then output the results to another collection, on another server.

Even if you output your aggregated documents to another database on your local server, if you used replication, you’d need to replicate the whole server (so all databases).

Besides custom code, you could script (on your local database server) something to:

1. Regularly run mongodump or mongoexport to export the documents of whatever collection your aggregation output goes to on Server A, either into a file or to standard out.
2. Then use mongorestore or mongoimport with a connection string to connect to Server B and insert the documents there.

In both cases you can pass a query to mongodump or mongoexport, via options to those programs, to define what data gets dumped or exported. Also, if you are “restoring” to an existing collection, mongorestore only inserts documents and will not overwrite a document whose _id already exists, so you avoid duplication.
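For the custom-code route, here is a minimal sketch of how the "difference" between the two sides might be computed in application code. It assumes both collections are small enough to load into memory and that documents hold plain JSON-like values plus dates; other BSON types may need extra handling. The names `stableStringify` and `diffDocuments` are made up for this illustration and are not part of any MongoDB API.

```javascript
// Serialize a value with sorted object keys so that field order
// does not affect the equality check between two documents.
function stableStringify(value) {
  if (Array.isArray(value)) {
    return "[" + value.map(stableStringify).join(",") + "]";
  }
  if (value instanceof Date) {
    return JSON.stringify(value.toISOString());
  }
  if (value !== null && typeof value === "object") {
    const keys = Object.keys(value).sort();
    const parts = keys.map(
      (k) => JSON.stringify(k) + ":" + stableStringify(value[k])
    );
    return "{" + parts.join(",") + "}";
  }
  return JSON.stringify(value);
}

// Return the local documents that are missing on the remote side,
// and those that exist remotely (same _id) but with different content.
function diffDocuments(localDocs, remoteDocs) {
  const remoteById = new Map(
    remoteDocs.map((doc) => [String(doc._id), stableStringify(doc)])
  );
  const missing = [];
  const changed = [];
  for (const doc of localDocs) {
    const key = String(doc._id);
    if (!remoteById.has(key)) {
      missing.push(doc);
    } else if (remoteById.get(key) !== stableStringify(doc)) {
      changed.push(doc);
    }
  }
  return { missing, changed };
}

// Hypothetical usage with the Node.js driver (collection handles assumed):
//   const localDocs = await localColl.find().toArray();
//   const remoteDocs = await remoteColl.find().toArray();
//   const { missing, changed } = diffDocuments(localDocs, remoteDocs);
```

The result can be shown to a user, who then picks which documents to push. Nothing is written to the remote server at this stage.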

Prasanna_Sasne · January 14, 2023, 3:12am

Thank you for your response. I am more interested in pushing local database collections to a remote database on API calls. I also do not want to use the command line interface, so mongodump will not be useful.

I tried creating a replica set and writing to the secondary node (ideally I should not write to the secondary node), but collections from the secondary node were also automatically synchronizing with the remote database. Can you please tell me what the correct behavior of a replica set is? I set a high priority for the primary node and a low priority for the secondary node, but still every write on the secondary node was reflected in the primary node. So my questions are:

1. What configuration should I apply to the secondary node so that it won’t sync automatically with the primary node?
2. Can I really push only a few documents from the secondary to the primary, without synchronizing everything? Or will everything written to the secondary node automatically sync with the primary node?

Justin_Jenkins · January 14, 2023, 5:17am

Great question! A replica set copies (replicates) the contents of one server node to another node. This does not happen at the database or collection level. There are various technical reasons for this, but suffice it to say it is meant to work at the server level.

There is an election process when a replica set starts up, and based on various factors a “primary” is “elected”. While the “priority” affects which node will be the “primary” (the server you can write to and whose contents are replicated to the other nodes), it is merely a weighting. You can set a priority from 1-1000, or 0. A 0 is the only way to make sure a node won’t be a primary.
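To illustrate the priority setting, here is a small sketch that takes a replica set configuration object (the shape returned by `rs.conf()`, with a `members` array whose entries carry `host` and `priority`) and returns a copy in which one member has priority 0. The helper name `withPriorityZero` and the host string in the usage comment are made up for this example.

```javascript
// Return a copy of the replica set config in which the member with the
// given host has priority 0, so it can never be elected primary.
// The input object is left unmodified.
function withPriorityZero(config, host) {
  const members = config.members.map((member) =>
    member.host === host ? { ...member, priority: 0 } : member
  );
  return { ...config, members };
}

// Hypothetical usage in mongosh, run while connected to the primary:
//   rs.reconfig(withPriorityZero(rs.conf(), "server-b.example.net:27017"));
```

Note that this only stops the node from becoming primary. It still receives all replicated data from the primary.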

As for writes on the secondary showing up on the primary: unless you have overridden the defaults, that isn’t what is happening. You can only write to the primary. Perhaps the server you don’t consider to be the primary is actually the primary? You should be able to determine this by logging into the server. Here are some commands you can try:

rs.status()

rs.isMaster()
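If reading the full `rs.status()` output is tedious, a small helper along these lines can pull out the current primary. It relies on the `members` array in that output, where each member has a `name` and a `stateStr`. The function name `findPrimary` is made up for this example. Newer MongoDB versions also offer `db.hello()` as the replacement for `rs.isMaster()`.

```javascript
// Given the object returned by rs.status(), return the host name of the
// member currently in the PRIMARY state, or null if there is none.
function findPrimary(status) {
  const members = status.members || [];
  const primary = members.find((member) => member.stateStr === "PRIMARY");
  return primary ? primary.name : null;
}

// Hypothetical usage in mongosh:
//   findPrimary(rs.status());
```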

Again, the priority doesn’t guarantee which node will be the primary, it just makes it more likely. The reason this doesn’t really matter is each node will have the same data.

Configuring a secondary so that it does not sync with the primary is impossible. The idea of a replica set is that the same data is replicated across all the nodes, so if any one node goes down you can still provide all the same data from one of the other nodes. This is also why, when you connect to a replica set, you don’t connect directly to a particular server, but rather allow MongoDB to direct you to the primary.

All the data on a node will be replicated to the other nodes in the set, by design. If you want subsets of your data on another server that requires a programmatic solution.
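One way such a programmatic solution might look, as a sketch only: keep the two servers standalone, let the application decide which documents to push, and turn the selection into upsert operations for the remote collection. The helper `buildUpsertOps` is made up for this illustration. The operation shape it produces (`replaceOne` with `filter`, `replacement`, `upsert`) is the one accepted by the driver's `bulkWrite`.

```javascript
// Build bulkWrite operations that upsert only the selected documents.
// docs: array of local documents; selectedIds: the _id values chosen
// (for example by a user through an API call).
function buildUpsertOps(docs, selectedIds) {
  const wanted = new Set(selectedIds.map(String));
  return docs
    .filter((doc) => wanted.has(String(doc._id)))
    .map((doc) => ({
      replaceOne: {
        filter: { _id: doc._id },
        replacement: doc,
        upsert: true // insert when the remote has no document with this _id
      }
    }));
}

// Hypothetical usage inside an API handler (remoteColl is an assumed
// collection handle obtained from a MongoClient connected to Server B):
//   const ops = buildUpsertOps(localDocs, requestedIds);
//   if (ops.length > 0) await remoteColl.bulkWrite(ops, { ordered: false });
```

Because each operation is keyed on `_id`, running the same push twice leaves the remote collection in the same state.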

Replica Sets are meant to copy everything in an idempotent manner between servers so any one server can become the primary at any time.

P.S.

There is one caveat, although I don’t recommend it … technically anything in the local database isn’t replicated to other nodes. You could (in theory) store the data you don’t want replicated in that database and then aggregate out into some other database the documents you want replicated (i.e. data in any other database).

This is basically circumventing how replication is supposed to work though, so again … not recommended for most use cases.