Triggers not working with Global Deployment

Try_Catch_Do_Nothing · April 11, 2023, 1:16am

Has anyone experienced issues with Atlas App Services triggers not working in a Global Deployment setup?
I have an app in one project that is a Single Deployment, and the triggers work fine and execute as expected.

When I push the exact same app code to a project with a Global Deployment, the same trigger code NEVER fires.

I updated the Global Deployment to be Single Deployment in the second project and the triggers work fine with no code changes.

WHY???

Mansoor_Omar · April 11, 2023, 1:31am

Hello,

Please clarify this point.

When I push the exact same app code to a project with a Global Deployment, the same trigger code NEVER fires.

1. Do your app logs show that the trigger fires, but the function does not appear to carry out its logic? Or
2. Do your logs show no sign of the trigger firing at all?

Are there any errors in the logs at all? If so, please share them.

Regards
Manny

Try_Catch_Do_Nothing · April 11, 2023, 1:43am

#2 - There’s no sign the trigger is ever fired.
No errors, nothing.

Mansoor_Omar · April 11, 2023, 2:51am

Do you have exactly the same configuration for the trigger on both apps? (Particularly the match expression)

Please share the json config files for the trigger on both apps.

Regards

Try_Catch_Do_Nothing · April 11, 2023, 3:51am

The trigger config is the same and is attached.
The config references the databaseSuffix environment variable, which is different between the two projects.

Once the app is deployed, the full database name ends up being:

DEV: users-FAV-24-Add-modified-
TEST: users-2023-04-10_1681166526

TEST is the environment where the trigger did not fire with Global Deployment. After I changed to Single Deployment with no other changes, the trigger began firing.

onUserProfileSavedTrigger.json (1.2 KB)

Mansoor_Omar · April 13, 2023, 6:00am

Thanks for that, I’ve found the app in question based on the trigger name.

As the trigger has a match condition, please try this troubleshooting step: create a duplicate of this trigger without the match condition, and link it to a function that contains only the following:

exports = function(changeEvent){

console.log(JSON.stringify(changeEvent.updateDescription));

};

Perform an update operation and check the logs to see whether the printed updateDescription would allow your match expression {"updateDescription.updatedFields._modifiedTS":{"$exists":false}} to pass.

Does the duplicate trigger fire at all?
Does the console log accommodate the match expression you’re using?

If the trigger still does not fire without a match expression, this is most likely because the cluster is an M0. Shared-tier clusters like M0 commonly run into issues with change streams (which triggers use), since their resources are shared with other clusters. In fact, I do see error logs on my side for this trigger related to change stream limitations. Please try upgrading to a larger tier (preferably a dedicated tier such as M10).

Regards
Manny

Try_Catch_Do_Nothing · April 13, 2023, 4:45pm

So the trigger does fire with the setup you mentioned. I can see entries in the logs and when I expand one of the entries, I see the below info.
This leads me to believe your last statement must be true – the trigger not firing is related to the cluster size.

My next concern is, if I upgrade to a larger tier size, and then hit a certain limit, how will I know the trigger is no longer working, similar to the current situation?

There’s no alert that I can see to inform me of this.

Logs:

[ "{\"updatedFields\":{\"email\":\"bob@bob.com\"},\"removedFields\":[],\"truncatedArrays\":[]}" ]
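To check a logged updateDescription against the match expression without another trigger run, a minimal sketch is below. It assumes the expression only ever tests whether `_modifiedTS` is present in `updatedFields`; the helper name `passesMatch` is hypothetical and not an Atlas API. Like `$exists: false`, it treats a missing parent path as "field does not exist".

```javascript
// Hypothetical helper: evaluates the trigger's match expression
//   {"updateDescription.updatedFields._modifiedTS": {"$exists": false}}
// against an updateDescription object copied from the function logs.
function passesMatch(updateDescription) {
  // A missing updateDescription or updatedFields means the path does not exist,
  // which is exactly what {"$exists": false} tests for.
  const updatedFields = (updateDescription && updateDescription.updatedFields) || {};
  return !Object.prototype.hasOwnProperty.call(updatedFields, '_modifiedTS');
}

// The log entry is an array holding the JSON string printed by console.log
const logEntry = [
  '{"updatedFields":{"email":"bob@bob.com"},"removedFields":[],"truncatedArrays":[]}'
];
const logged = JSON.parse(logEntry[0]);

console.log(passesMatch(logged)); // true: _modifiedTS was not among the updated fields
console.log(passesMatch({ updatedFields: { _modifiedTS: new Date() } })); // false
```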

Try_Catch_Do_Nothing · April 13, 2023, 9:39pm

And also, FYI, I just upgraded to a dedicated M10 cluster and triggers still DID NOT work with the app deployed globally. However, I changed the app deployment to local (same region as the cluster) and the triggers finally did work.

So, it seems triggers are not supported with global deployment or something…

I would like to know about alerting though. If triggers simply are not firing, then there needs to be an alert of some kind.

Try_Catch_Do_Nothing · April 14, 2023, 12:49am

Another interesting twist… it seems that just redeploying the app sometimes fixes the trigger issue (not necessarily switching the deployment location). I made a simple variable update (unrelated to triggers), redeployed using the UI, and then re-ran my tests. The trigger did fire after redeployment.

Any ideas here??

Brock · April 14, 2023, 5:31am

You have the option to sign up for a free trial subscription for development support and request a comprehensive evaluation of the backend associated with your triggers and functions. As M10 is designed for low-traffic production environments, it might be too small for your requirements. Therefore, it would be advisable to conduct a more in-depth analysis.

Regarding identifying trigger failure, there are a few indicators:

The data does not appear.
The logs report errors related to the trigger.
The triggers menu displays the shut-down trigger.

One way I get alerts is with my text alert script: for all of my Atlas accounts (and all of my DevOps, honestly), I have text alerts sent to me if something goes wrong.

const { MongoClient } = require('mongodb');
const twilio = require('twilio');

// Replace with your Twilio credentials
const accountSid = 'your_account_sid';
const authToken = 'your_auth_token';
// Named twilioClient so it is not shadowed by the MongoDB client below
const twilioClient = twilio(accountSid, authToken);

// Replace with your MongoDB Atlas connection string
const uri = 'mongodb+srv://<username>:<password>@<clustername>.mongodb.net/test?retryWrites=true&w=majority';

const dbName = 'test';
const collectionName = 'myCollection';

async function checkTrigger() {
  const mongoClient = await MongoClient.connect(uri);
  try {
    const collection = mongoClient.db(dbName).collection(collectionName);

    // Count the documents instead of loading the whole collection into memory
    const count = await collection.countDocuments({});

    // If the collection is empty, assume the trigger is not working
    if (count === 0) {
      // Send an SMS alert using the Twilio API
      const message = await twilioClient.messages.create({
        body: 'Alert: Trigger is not working',
        from: '+1your_twilio_number',
        to: '+1your_phone_number'
      });
      console.log(`Alert sent: ${message.sid}`);
    }
  } finally {
    // Always close the connection, even if the query or SMS fails
    await mongoClient.close();
  }
}

// Check the trigger every 5 minutes; log failures instead of leaving the promise unhandled
setInterval(() => {
  checkTrigger().catch(err => console.error('Trigger check failed:', err));
}, 5 * 60 * 1000);
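An empty-collection check only catches a trigger that has never written anything. A possible refinement, sketched here under assumptions, is to alert when the newest document the trigger wrote is older than some threshold. The field name `_modifiedTS` (taken from the match expression), the assumption that it holds a `Date`, the 15-minute threshold, and the helper names are all hypothetical; `collection` is expected to be a Node.js driver `Collection`.

```javascript
// Hypothetical threshold: treat the trigger as stale after 15 minutes without a write
const MAX_AGE_MS = 15 * 60 * 1000;

// Pure staleness test, kept separate so it can be checked without a database.
// lastWrite is a Date, or null if nothing has been written yet.
function isTriggerStale(lastWrite, now, maxAgeMs) {
  if (!lastWrite) return true;
  return now.getTime() - lastWrite.getTime() > maxAgeMs;
}

// Finds the most recent write with findOne + sort.
// Assumes the trigger stamps each document it writes with a Date in `field`.
async function checkTriggerFreshness(collection, field, maxAgeMs = MAX_AGE_MS) {
  const latest = await collection.findOne(
    { [field]: { $exists: true } },
    { sort: { [field]: -1 }, projection: { [field]: 1 } }
  );
  const lastWrite = latest ? latest[field] : null;
  return isTriggerStale(lastWrite, new Date(), maxAgeMs);
}
```

In the script above, `checkTriggerFreshness(collection, '_modifiedTS')` could replace the empty-collection test, with the SMS sent when it resolves to `true`.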

Try_Catch_Do_Nothing · April 14, 2023, 4:55pm

As M10 is designed for low-traffic production environments, it might be too small for your requirements.

So the answer is to continue paying more and more money and eventually triggers will work?
We’re talking one simple trigger here with the same code that is used in a separate project and works.

Every time this trigger is deployed to the second project, it never works until the app is manually re-deployed.

Tyler_Kaye · April 15, 2023, 2:17pm

Hi, I am pretty sure that there should be no difference in trigger behavior between local and global deployments, and your trigger should definitely work regardless of your cluster tier. We do some rate limiting and limit the number of triggers you can have, but you have not hit those limits.

I think that Manny is correct and this issue is more about your match expression. If I understand correctly, removing the match expression led to your trigger firing correctly, so the conclusion of that experiment is not that “the trigger not firing is related to the cluster size”, but rather that the match expression is likely misconfigured.

Match expressions can admittedly be tricky. One thing that stands out is that your trigger is configured for Insert, Replace, and Update events, but your match expression is:

{"updateDescription.updatedFields._modifiedTS":{"$exists":false}}

This match expression will only pass for an update event where one of the modified fields is “_modifiedTS”. If something is inserted or replaced (and many tools like Compass are replacing objects, not updating them) then the match expression here will skip over the event (as designed). I see that you have specific logic in your function for handling insert events so I suspect this is the issue. This also explains why removing the match expression led to the execution of your trigger.

I think it is worth pointing out that the Match Expression is more of a power-user feature meant to prevent the trigger from firing too often under heavy load. However, it can be tricky to configure, since it filters on change events, which most people are not used to working with. Therefore, I often advise people to use no Match Expression and instead write the filtering logic directly into the function, where you have more control over and understanding of the input.
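Following that advice, moving the filter into the function might look like the sketch below. It assumes the match expression's intent is to skip updates that set `_modifiedTS` (for example, writes made by the trigger itself) while still handling inserts and replaces. The names `shouldHandle` and `onUserProfileSaved` and the per-operation handling are hypothetical placeholders, not the original function's logic.

```javascript
// Hypothetical replacement for the match expression: decide inside the function
// which change events to act on, so skipped events can be logged and inspected.
function shouldHandle(changeEvent) {
  switch (changeEvent.operationType) {
    case 'insert':
    case 'replace':
      return true;
    case 'update': {
      const updated = (changeEvent.updateDescription || {}).updatedFields || {};
      // Skip updates that set _modifiedTS (assumed to be the trigger's own writes)
      return !Object.prototype.hasOwnProperty.call(updated, '_modifiedTS');
    }
    default:
      return false;
  }
}

// Trigger function body; in Atlas App Services it would be assigned to `exports`.
function onUserProfileSaved(changeEvent) {
  if (!shouldHandle(changeEvent)) {
    console.log(`Skipping ${changeEvent.operationType} event`);
    return 'skipped';
  }
  // Placeholder for the real insert/replace/update handling
  console.log(`Handling ${changeEvent.operationType} event`);
  return 'handled';
}
```

Because every skipped event is logged, this makes it easy to see which events the filter drops, which is the visibility the Match Expression does not give you.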

If you do want to continue with the Match Expression, can you clarify a few things:

Can you clarify the goal of your trigger? When do you want it to run exactly?
What events are you not seeing come through? Are they updates, inserts, deletes, etc? How are you making these modifications?

Thanks,
Tyler

Brock · April 15, 2023, 3:59pm

@Tyler_Kaye

There might actually be a bug in that case, as this isn’t the first time I’ve heard of something working in a regional deployment but not in a global one.

It has also been observed that people’s apps connect as much as 64.4% more easily to the nearest regional cluster than through a global deployment (per one user who launched their company’s dispatch app).

This has been observed since global deployment was released, and it shows up in the TS department’s previous ticket histories as well.

It could be the match expression, but something working in a regional deployment and not in a global one is a fairly consistent phenomenon that has been seen on more than one occasion.

Try_Catch_Do_Nothing · April 15, 2023, 4:31pm

I think this is an incorrect assessment. I’ve done more testing and this issue does not seem to be related to global vs. local deployment like I first thought.

To summarize my setup:
I have a GitHub repo with GitHub Actions that automatically deploy a Realm app on push and pull request events.
On push, the Realm app is automatically deployed to a “DEV” environment (a separate Atlas project), and integration tests, which include testing trigger functionality, run immediately after deployment. So far, the triggers have been working as expected in DEV.

On pull request, the exact SAME codebase is deployed to a “TEST” environment (another Atlas project) and the SAME integration tests are run; they currently fail consistently on the trigger tests.

What I’ve noticed is that the trigger doesn’t seem to “initialize” for at least 10-15 minutes after deployment in the TEST environment, so when the integration tests run immediately after deployment, the trigger is not yet running. I can confirm this in the UI (“Last Cluster Time Processed” is blank). If I wait (I don’t have to re-deploy, as I initially thought), the trigger does eventually start.
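Until the cause of the delay is known, one way to make the integration tests tolerate it is to poll for the trigger's effect with a timeout instead of asserting immediately after deployment. The sketch below is generic; `waitFor`, its default timeout, and its polling interval are hypothetical choices, not part of any Atlas or test-framework API.

```javascript
// Polls an async condition until it returns true or the timeout expires.
// Resolves to true on success; rejects with an Error on timeout.
async function waitFor(condition, { timeoutMs = 5 * 60 * 1000, intervalMs = 5000 } = {}) {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    if (await condition()) return true;
    if (Date.now() >= deadline) {
      throw new Error(`Condition not met within ${timeoutMs} ms`);
    }
    // Sleep before the next check
    await new Promise(resolve => setTimeout(resolve, intervalMs));
  }
}
```

In a test, the condition could query for a field the trigger is expected to set on a probe document (a hypothetical `processed` flag, for example), so the test fails only if the trigger has not acted within the timeout.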

But now I’m wondering why there is a delay with the trigger in TEST project vs. DEV.

Brock · April 15, 2023, 4:33pm

Which cloud provider and region is each environment on?
Is one on global and the other on regional?
Are they both in the same region?
@Try_Catch_Do_Nothing

The behavior you’re describing has been observed with globals (Time delays).

Also @Try_Catch_Do_Nothing and @Tyler_Kaye

GitLab and GitHub deployments to Atlas can vary based on which cloud provider the cluster is on.

This is largely speculated to be due to the hardware differences in say an Azure cluster vs AWS cluster specs etc.

Tyler_Kaye · April 15, 2023, 4:46pm

Hi Brock, I appreciate you chiming in but I am happy to take this over.

@Try_Catch_Do_Nothing do you mind answering my questions above? While it might not answer the entire question, I think it is worth noting that your function code asserts on the operation type being “insert”, but your match expression will only let in update events. However, I realize there might be two issues going on here.

Can you link to your Test app? I see all of your organizations. Prod / Dev is a single region in Oregon and Test has no application linked to it from what I can see. I can tell you that there should not be any difference between any of these environments. The environment badges you place on your application do not affect anything in our deployment or the service we offer you.

The conversation here seems to have gotten a little confusing to follow, so mind if I summarize:

There seems to be some match expression issues occurring here. It might not be the cause of your issues, but it does seem like at the very least an issue that will come up later on.
This does not seem to be related to local vs. global deployment.
It seems like some tests are occasionally failing and we can try to get to the bottom of it. Do you mind linking to an application where the trigger is added and a timestamp of when this is happening? Generally speaking, it can take up to a minute for a trigger to begin running when it is created. Normally this time is much lower, but it is possible due to the underlying system that there is some delay. If you can provide a link to the trigger that you are observing this delay on I would be happy to look into our logs to see what might be going on.

Let me know if this sounds correct.

Best,
Tyler

Try_Catch_Do_Nothing · April 15, 2023, 4:55pm

Hi Tyler, I’m deploying the latest build to test right now and it will automatically kick off the integration tests.

Tyler_Kaye · April 15, 2023, 5:03pm

Hi, I need to step out for a few hours, but I just took a quick look and I can see trigger with:

App ID: 643ad6b60ee565a48b911db7
Trigger ID: 643ad6c9d74168373224fcc5

Starting up properly at: 4/15/23 4:57:44.315 PM (this is 3 minutes ago from the time of this post)

Can you let me know how the test works? Also, can you let me know what update you are making that you are expecting to see the trigger fire on?

Lastly, I am a touch confused still because it sounds like when you removed the match expression from the trigger, it worked. Is that correct?

Try_Catch_Do_Nothing · April 15, 2023, 5:06pm

The same tests just ran against both environments, with the same outcome as before: the trigger in TEST was not initialized in time when the tests ran, so they failed. DEV passed as usual.


Try_Catch_Do_Nothing · April 15, 2023, 5:15pm

Starting up properly at: 4/15/23 4:57:44.315 PM (this is 3 minutes ago from the time of this post)

This is the problem!
The tests began executing immediately after deployment which was:

Sat, 15 Apr 2023 16:55:15 GMT

Is it normal to take several minutes after deployment for triggers to initialize??
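For reference, the gap between the deployment and the trigger start can be computed directly. This assumes Tyler's `4/15/23 4:57:44.315 PM` timestamp is in UTC like the deployment time, which the forum post times suggest but the thread does not state.

```javascript
// Deployment time from the CI log and trigger start time from Tyler's post,
// both interpreted as UTC (an assumption for the trigger start time).
const deployedAt = new Date('2023-04-15T16:55:15.000Z');
const triggerStartedAt = new Date('2023-04-15T16:57:44.315Z');

const delayMs = triggerStartedAt - deployedAt;
const minutes = Math.floor(delayMs / 60000);
const seconds = (delayMs % 60000) / 1000;

console.log(`Trigger started ${minutes} min ${seconds} s after deployment`);
// prints "Trigger started 2 min 29.315 s after deployment"
```

That is roughly two and a half minutes, longer than the up-to-a-minute startup Tyler describes as possible, and the tests had already started by then.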