Sync batch database with collection database

I have 2 databases that hold exactly the same data but store it in different structures. One database uses batches and indexing. The other database stores all the records within collections.

The databases are accessed by proprietary applications. Ideally, whenever either database is updated, the other would be updated too. Failing that, if I could get the data from the database that uses the batch structure with indexes to update the other database, that would be great.

Please let me know what information anyone would need to accomplish this.

You may certainly do that with change streams.

If running on Atlas, it’s easy to set up change stream processing within a server-side app, we’ve done this many times. Set up a trigger and process the data in that; you may need a “Last Changed” flag or something similar to avoid infinite calls between the two triggers…
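That ping-pong guard can be sketched roughly like this (Python; the field name `lastSyncedBy` is an invented placeholder, and on a live deployment these helpers would sit inside a pymongo change stream loop):

```python
# Sketch of guarding against sync ping-pong between two change streams.
# The marker field name "lastSyncedBy" is an assumption; any field works.

SYNC_ORIGIN = "sync-worker"

def should_propagate(full_document: dict) -> bool:
    """Skip changes that this sync process wrote itself, so the two
    watchers don't bounce the same update back and forth forever."""
    return full_document.get("lastSyncedBy") != SYNC_ORIGIN

def mark_as_synced(document: dict) -> dict:
    """Stamp a copy of the document before writing it to the other DB."""
    stamped = dict(document)
    stamped["lastSyncedBy"] = SYNC_ORIGIN
    return stamped

# With a live deployment this would run inside a loop such as:
#   for change in db.batches.watch(full_document="updateLookup"):
#       doc = change["fullDocument"]
#       if should_propagate(doc):
#           other_db.records.replace_one({"_id": doc["_id"]},
#                                        mark_as_synced(doc), upsert=True)
```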

If one app only reads the data and the other writes it, could you create a view on the data so that you have a single set of data presented in multiple formats?
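As a rough illustration of the view idea: an aggregation pipeline could reshape the flat collection documents into a batch-shaped layout. The field names below (`batchId`, `records`) are invented placeholders, since the real schemas haven't been shared:

```python
# Hypothetical pipeline that presents flat "records" documents grouped
# into batch-shaped documents. Field names are placeholders.

BATCH_VIEW_PIPELINE = [
    {"$group": {"_id": "$batchId", "records": {"$push": "$$ROOT"}}},
    {"$project": {"batchId": "$_id", "records": 1, "_id": 0}},
]

# On a live deployment the read-only view would be created with e.g.:
#   db.command("create", "recordsAsBatches",
#              viewOn="records", pipeline=BATCH_VIEW_PIPELINE)
```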

@Carmelo_Rizza, it has been 8 days since you received replies from your original post.

The best way to keep this forum useful is to follow up on your issues.

Thanks

@steevej
these are proprietary applications creating the databases. @John_Sewell I’m not running Atlas.

Both databases are running on a local server isolated from the internet. I can bring the server online only for updates.
One application sets up a web portal front end so the database can be updated manually, and the database can also be updated through the application itself. The portal uses the same port that the application uses to access the data.

The database that uses batches has no portal; all data is uploaded and configured through the application. To make matters more complicated, it uses a web portal for authentication. However, I have created a user that doesn’t need authentication for updating, but that user is not authorized to delete files.

I have the schema from both databases, but this forum won’t allow me to upload the data since I am a new user.

You should be able to embed code fragments that show examples of the data, or upload an example of data from both environments onto Mongo Playground:

Assuming your data has something like a “last updated” timestamp, you could either:

  • Create a monitoring application as @steevej suggested, which opens a change stream on each database and, based on the change detected, issues an update to keep the other in sync.
  • Have a scheduled task run every X period that checks what has changed and merges the modified documents over to the other collection. You need to be VERY careful with this in case updates have been made on both sides, which then need to be replicated in both directions without losing changes.
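For the scheduled-task option, the simplest (and lossy, if both sides change the same document) conflict rule is last-writer-wins on the timestamp. A minimal sketch, assuming a `lastUpdated` field exists in both databases:

```python
from datetime import datetime, timezone

def pick_winner(doc_a: dict, doc_b: dict, stamp: str = "lastUpdated") -> dict:
    """Last-writer-wins merge: keep whichever copy of the document
    carries the newer timestamp. The field name "lastUpdated" is an
    assumption; substitute whatever the real schemas use."""
    return doc_a if doc_a[stamp] >= doc_b[stamp] else doc_b

# Example documents with conflicting edits on both sides:
older = {"_id": 1, "val": "old", "lastUpdated": datetime(2024, 1, 1, tzinfo=timezone.utc)}
newer = {"_id": 1, "val": "new", "lastUpdated": datetime(2024, 6, 1, tzinfo=timezone.utc)}
```

Note that last-writer-wins silently drops the losing side's edit, which is exactly the "be VERY careful" caveat above.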

If you give sample documents, a better solution may be obvious! It does seem a bad situation to be in, though, having the data duplicated like this; long term it may be best to see if you can re-architect the system to use a single database and cut down on complexity!


The 2 applications used to belong to 2 different companies, but one has since merged with/bought out the other. Eventually they will move the architecture over, but who knows when.

I was actually trying to figure out how to get the other application to connect to the same database. That might be easier than trying to get these 2 databases to sync.

The database structures are very different, and the database that stores everything in batches actually converts the hash values to base64, whereas the other database doesn’t. I will upload data to the playground later today.
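The base64 difference, at least, is easy to bridge during sync. A sketch assuming one database stores hex digests and the other stores the same bytes base64-encoded:

```python
import base64
import binascii

def hex_hash_to_base64(hex_digest: str) -> str:
    """Re-encode a hex hash digest as base64 (the batch DB's format,
    assuming it base64-encodes the raw digest bytes)."""
    return base64.b64encode(binascii.unhexlify(hex_digest)).decode("ascii")

def base64_hash_to_hex(b64_digest: str) -> str:
    """Reverse direction: decode base64 back to a hex digest string."""
    return base64.b64decode(b64_digest).hex()
```

The round trip is lossless, so the conversion can be applied in either direction when copying documents between the two databases.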

Another thought…you may want to look at your support contract to ensure that anything like you’re planning does not invalidate any form of support going forward.

That would be my preferred option too. In particular, I would make the one that uses batches use the more structured version of the data, because updating the structured version is probably more efficient. The aggregation framework can probably be used to present the structured data in batch form, and whatever aggregation pipeline is used to present the data in batches, you can then create a view that uses this pipeline.
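As a sketch of that last step: a view is defined via MongoDB's `create` command with a `viewOn` source and a pipeline. All names below (`batchesView`, `structuredRecords`, `batchId`) are placeholders, since the real reshaping depends on the two schemas:

```python
def create_view_command(view_name: str, source: str, pipeline: list) -> dict:
    """Build the MongoDB "create" command document that defines a
    read-only view over `source` using an aggregation pipeline."""
    return {"create": view_name, "viewOn": source, "pipeline": pipeline}

# Hypothetical view presenting structured records in batch form:
cmd = create_view_command(
    "batchesView",
    "structuredRecords",
    [{"$group": {"_id": "$batchId", "records": {"$push": "$$ROOT"}}}],
)
# On a live deployment this would be run via db.command(cmd) in pymongo.
```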

If the data lives in 2 places it will go out of sync, and both versions will probably end up wrong.