We suspend some of our Atlas clusters overnight when not in use as they are quite large and there is no point keeping some of the testing environment active when all the developers are offline.
The issue we have is that the triggers fail when this happens, we have the advanced option set to Auto Resume the trigger but I believe this is just if the resume token fails as opposed to the whole data connection going away.
Currently we need to manually resume the trigger after the resume has taken place in the morning.
What’s the best solution to this? Get the DBA script to pause the triggers before the cluster is suspended and then resume it after? Is there an alternative setting that could be set to cope with this a touch more gracefully?
Hi, triggers under the hood are just a Change Stream that we operate for you. Therefore, when you pause your cluster, the trigger attempts to connect to the cluster and open a change stream (it retries this for a while) and ultimately errors and enters the failed state.
We have some customers that do similar things to you and the best solution is to do the following:
Thanks Tyler, I suspected that was the approach to take. Our DBA team already use the API for pausing the cluster so I’ll get them to add that call to their scripts.