How to synchronize 2 collections after aggregation when data is added to original collection?

Chloe_Gwon · January 13, 2023, 9:39pm

Hi, I’m working on mongoDB and Next.js. Before I call API, I want to aggregate in Mongo Compass and call the aggregated API. And I’d like to synchronize the 2 collections when data is added to original collection. The data is added through MongoDB, not my Next.js website (server side). So I guess trigger is not the case. How to make aggregate collection to have new added data based on insert of original collection? The data is added by an hour FYI.
This is the sample of my data.

Original Collection

_id: "ABC123"
date: "2022-10-10 01:00:00"
powerIn: 30
powerOut: 50

Aggregated Collection

_id: "ABC123"
date: "2022-10-10 01:00:00"
powerIn: 30
powerOut: 50
netzero: -20
cumulativePowerIn: 80
cumulativePowerOut: 120
cumulativeNetzero: -40

Justin_Jenkins · January 13, 2023, 10:45pm

Hello @Chloe_Gwon !

If I am understanding your need correctly …

You should be able to use a $merge stage (at the end of your aggregation, to just add new documents.

Make sure to check out the options for various different matching strategies you can configure:

{ $merge: {
     into: <collection> -or- { db: <db>, coll: <collection> },
     on: <identifier field> -or- [ <identifier field1>, ...],  // Optional
     let: <variables>,                                         // Optional
     whenMatched: <replace|keepExisting|merge|fail|pipeline>,  // Optional
     whenNotMatched: <insert|discard|fail>                     // Optional
} }

Chloe_Gwon · January 13, 2023, 10:51pm

Hi! Thanks for your reply, Justin. I just tried merge, and this is the output that I was hoping. Just to clarify tho, my main issue is make new data automatically aggregate without me merging it.

While I was waiting for the response, I found out scheduled trigger. Will it work in this case?

Prasanna_Sasne · January 13, 2023, 11:20pm

I think creating replicas will help you in this scenario. This will be an efficient way

Chloe_Gwon · January 13, 2023, 11:23pm

Oh yeah this makes more sense. Thanks so much, I’ll try it right now

Chloe_Gwon · January 13, 2023, 11:31pm

But can you do aggregation in replicas tho?

Prasanna_Sasne · January 13, 2023, 11:33pm

I think aggregation will only work with replicas or shards. The database should be connected to the same mongo instance. Aggregation can not be used on standalone databases. If we can add aggregation on two standalone databases, please let me know how to do it?

Justin_Jenkins · January 13, 2023, 11:35pm

As I understand it you have:

An “original” collection with documents, and new documents get inserted at some regular basis?
Your goal is to have a separate collection, with documents that have a different structure, based off the original documents? Hence the aggrgation?

Are those “original” inserts happening at any time, or at the hourly mark you mentioned? Or is the hourly when you planned to do this sync?

The exact performance really depends on your application as a whole, etc. so it is a little hard to say.

However you could use these concepts to guide your decision:

If you are using Atlas you could use either Database Triggers or Scheduled Triggers.

The main difference being the Database Triggers will fire when a document is added or in some way modified … so performance here will mostly be around what your trigger code does, and if doing that per document is best. This has the advantages of keeping things up-to-date with the original collection automatically, but the downside of it might run very often, depending on how many modifications you have.

The Scheduled Triggers can let you batch together an operation, so for example you can run a single aggregation with a $merge at the end to mass insert any new data, and run that on an hourly schedule.

If you aren’t using Atlas you could still trigger something (like a cron job) to run the aggregation pipeline every hour as well.

Or, you can as part of your application code do this aggregation and insertion into the your aggregated collection.

steevej · January 14, 2023, 2:10pm

This is false. Please verify your sources.

Chloe_Gwon · January 15, 2023, 1:19am

Thanks so much! This works very well : ) I appreciate your reply!

Prasanna_Sasne · January 15, 2023, 5:08am

I have mentioned in the question itself, if we can do aggregation with two standalone databases(passing collections from one database to another), please suggest. You must provide the answer before saying some concept is wrong. I will definitely appreciate your answer in this case.

steevej · January 15, 2023, 2:06pm

Writing that what you wrote is wrong is the answer.

It might be true that your think aggregation will only work with replicas or shards. But what you think is wrong. It is false that aggregation only work with replicas or shards. Aggregation works even if you do not run a replica set or shards.

And you repeat the same wrong affirmation.

Aggregation can be used on standalone databases. You do not need replica set or shard to run an aggregation.

But you cannot use the aggregation framework to $out or $merge from one standalone to another standalone.

Just to be clear, aggregation can be used on standalone (that is no replica set and no shard) mongod instances but it cannot be used to synchronized the database of 1 instance into the other instance.

If Atlas triggers are not available to you. Change stream, which requires running a replica set of at least one mongod instance, can be used to synchronized databases from one database system to another database system. And just to be clear a database system can be a replica set of 1 mongod instance or a normal replica set. The target database system does not need to be a replica set.

Benjamin_Flast · January 17, 2023, 2:59pm

One thing to add in here for future readers. Triggers are a great way to detect a change in a collection. And if you need to then write data into another collection using the Aggregation language, you can do this using the $merge or $out syntax.

Additionally, if you need to write data not just into a different Database and Collection, but also into an entirely different Cluster or Serverless instance, you can use our Data Federation service. With Data Federation you can use $out and $merge, reading data from one Cluster or Instance and writing it to another Cluster or Instance as long as it is in the same Atlas Project.