We have a requirement to use an “aggregation pipeline” that operates over 2 or 3 “source collections” and, at the final stage of the pipeline, writes the results to a “Target Collection”.
Now, the data in the source collections can change on a daily basis, so we want the data in the “Target Collection” to be refreshed periodically as well. The “On-Demand Materialized Views” page of the MongoDB Manual mentions that a function can be used to trigger the pipeline again, but it gives no information on how this refresh can be scheduled. So we have the following questions:
Could you kindly suggest how we can schedule the refresh of the Target Collection on a periodic basis using a standard MongoDB instance? We don't have an Atlas instance.
If some JSON record inserts/updates to the Target Collection are rejected during the aggregation pipeline run, how can we get a summary of this information at the end of the run?
MongoDB version : 4.2.17
Hosted on: AWS EC2
Config: Standalone 3 node replica set
You could write a script and then invoke it with a cron job (assuming Linux OS).
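As a rough illustration of such a script, here is a minimal Python sketch that assumes the pymongo driver is installed. The collection names (orders, customers, order_summary), the join field customerId, and the environment variables MONGODB_URI and MONGODB_DB are all hypothetical placeholders, not anything taken from your setup. It ends the pipeline with a $merge stage, which is available in MongoDB 4.2.

```python
import os

# Hypothetical names for illustration only; replace with your own.
SOURCE = "orders"
LOOKUP = "customers"
TARGET = "order_summary"


def build_pipeline(target=TARGET):
    """Return an aggregation pipeline whose last stage writes to `target`."""
    return [
        # Join each source document with its matching customer documents.
        {"$lookup": {
            "from": LOOKUP,
            "localField": "customerId",
            "foreignField": "_id",
            "as": "customer",
        }},
        # Write the results into the target collection ($merge needs 4.2+).
        {"$merge": {
            "into": target,
            "on": "_id",
            "whenMatched": "replace",
            "whenNotMatched": "insert",
        }},
    ]


def refresh(db):
    """Run the pipeline once against a pymongo Database-like object."""
    # With $merge as the last stage the returned cursor is empty.
    list(db[SOURCE].aggregate(build_pipeline()))


# Only connects when MONGODB_URI is set (an assumed convention for this sketch).
if __name__ == "__main__" and os.environ.get("MONGODB_URI"):
    from pymongo import MongoClient  # third-party: pip install pymongo

    client = MongoClient(os.environ["MONGODB_URI"])
    refresh(client[os.environ.get("MONGODB_DB", "test")])
```

A crontab entry such as `0 2 * * * /usr/bin/python3 /path/to/refresh_view.py` would then run it daily at 02:00, provided MONGODB_URI is set in that job's environment. The path and file name are placeholders.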
It depends on what you mean by “rejected”. If an insert or update fails, the aggregation aborts with an error message. If the aggregation completes without an error, all the documents were processed as the pipeline directs. Unfortunately, we do not provide a summary of how many documents were processed, except potentially in the mongod logs.
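Since the server does not return a per-run summary, one possible workaround is to collect a rough one on the client side. The sketch below assumes pymongo-style collection objects (with aggregate and count_documents methods); the function name refresh_with_summary is made up for this example. It only records the target's document count before and after the run plus any error message, so it shows newly inserted documents but not how many existing ones were replaced.

```python
def refresh_with_summary(source, target, pipeline):
    """Run `pipeline` on `source` and report target counts before and after."""
    summary = {"before": target.count_documents({}), "error": None}
    try:
        # With $merge as the last stage the returned cursor is empty.
        list(source.aggregate(pipeline))
    except Exception as exc:
        # With pymongo a server-side failure would typically surface as
        # pymongo.errors.OperationFailure; caught broadly in this sketch.
        summary["error"] = str(exc)
    summary["after"] = target.count_documents({})
    # Net change in document count; replaced documents are not reflected.
    summary["net_new"] = summary["after"] - summary["before"]
    return summary
```

The returned dictionary could be printed or logged at the end of each scheduled run.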
We tried this option; however, due to organization policies we do not have permission to schedule a cron job on the server. Is there any alternative to cron jobs? The MongoDB instance is hosted on AWS EC2, so is there any AWS service you are aware of that might help us with scheduling? Thanks in advance.
The cron job does not have to run on the server that mongod is running on. You can literally run it on any server, in the cloud or on prem. It just needs to be able to connect to the cluster.
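To check that some other machine can actually reach the replica set, something like the following Python sketch could be run there first. It assumes pymongo is installed; the helper names replica_set_uri and can_connect are invented for this example, and the host names and replica set name you pass in are whatever your deployment uses.

```python
def replica_set_uri(hosts, replica_set, database="admin"):
    """Build a mongodb:// URI that lists every replica set member."""
    return "mongodb://{}/{}?replicaSet={}".format(
        ",".join(hosts), database, replica_set
    )


def can_connect(uri, timeout_ms=5000):
    """Return True if a ping succeeds from the machine running this code."""
    # Imported here so the URI helper works without the driver installed.
    from pymongo import MongoClient  # third-party: pip install pymongo
    from pymongo.errors import PyMongoError

    client = MongoClient(uri, serverSelectionTimeoutMS=timeout_ms)
    try:
        client.admin.command("ping")
        return True
    except PyMongoError:
        return False
    finally:
        client.close()
```

If can_connect returns True on a given host, that host should be able to run the scheduled refresh script, subject to the usual authentication and network rules.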