What does "runtime per request" mean in Atlas Functions limits?

Trdat_Mkrtchyan · October 19, 2022, 2:41pm

Hi community!

What does “per request” mean here? Does it mean that only the requests made inside a function must each complete within 150 seconds, or does it mean that the function itself must finish in less than 150 seconds?

I’m planning to write a function that traverses a big database and makes some changes. It may take hours, but every single request to the DB is pretty fast. Can I do that?

Mansoor_Omar · October 19, 2022, 10:45pm

Hi Trdat,

It means the entire function, including any other functions called from it, needs to complete within 150 seconds.

Regards
Manny

Trdat_Mkrtchyan · October 20, 2022, 9:09am

Maybe you have some clues on how I can organize my task in the Atlas ecosystem? I don’t want to set up a dedicated machine to run jobs on MongoDB whenever something changes in the DB, but on the other hand the jobs will take hours, so I’m not able to use functions.

Paolo_Manna · October 20, 2022, 12:31pm

Hi @Trdat_Mkrtchyan,

There are two questions that arise from your description:

How frequently do the tasks need to run?
Is the whole DB traversal always necessary, or can it be split into logical chunks?

If the tasks take hours because they run, say, once per day or at even longer intervals, then functions can still be used from scheduled triggers that run more frequently (for example, every 10-15 mins), limiting the work each run has to do to a defined chunk and staying under the 150 secs.
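A minimal sketch of the chunked approach, in plain TypeScript rather than actual Atlas Function code. The names `runWithinBudget` and `processChunk` are hypothetical, and how the returned index gets persisted between scheduled runs is left out:

```typescript
// Process chunks one by one until all are done or the time budget is used up.
// Returns the index of the next unprocessed chunk, which the caller would
// persist so that the next scheduled run can resume from it.
function runWithinBudget(
  startIndex: number,
  totalChunks: number,
  budgetMs: number,
  processChunk: (index: number) => void, // hypothetical per-chunk job
  now: () => number = Date.now,          // injectable clock, handy for testing
): number {
  const deadline = now() + budgetMs;
  let index = startIndex;
  while (index < totalChunks && now() < deadline) {
    processChunk(index);
    index++;
  }
  return index;
}
```

The budget is only checked before each chunk starts, so a chunk that begins just before the deadline can overrun it. The budget would therefore need to sit comfortably below the 150 secs, by at least the duration of the slowest chunk.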

If, however, the tasks take hours and need to run on the whole DB all the time, then having a dedicated machine is the least of your problems: an architecture that requires such continuous maintenance/processing would also cost in computing hours, and would probably require a higher cluster tier just to ensure that these tasks don’t affect overall performance…

Trdat_Mkrtchyan · October 20, 2022, 3:51pm

Ideally I’d like to run a function when the collection changes. But there are some restrictions:

I have a very big collection (millions of documents) which is sometimes completely updated by a third party, but the updates arrive randomly and are not separated into logical chunks. The only way I can tell that the third party has finished is to detect that updates started and that no update has occurred in the collection for a period of time. So I’d like to fire the trigger not on every insert but, say, an hour after the last update.

The function itself can work chunk by chunk: I have fields in the collection that allow grouping jobs by some attribute. But I can’t figure out how to organize the whole task:

Third party updates the collection → I’m sure the updates are finished → Run the function chunk by chunk

Trdat_Mkrtchyan · October 20, 2022, 3:53pm

And I’m OK with starting the job manually if it’s possible to iterate through the DB.

Paolo_Manna · October 20, 2022, 4:10pm

Hi @Trdat_Mkrtchyan,

There are a number of possible solutions to that. One possibility, for example:

Have a database trigger on the collection you want to observe: the connected function does nothing but keep track of the $clusterTime (i.e. the time the change was applied to the DB), and saves it somewhere if it is higher than the previous one.
Have a scheduled trigger, running every 5-10 minutes, that checks the latest registered time the collection was changed: if more than an hour has passed, then the collection is ready for cleanup, and a relatively small chunk of data can be processed.
Let the scheduled trigger cycle through all the chunks until it’s done, and then wait for the next update.
You can also add more complex logic, e.g. by saving a different $clusterTime for each 3rd party, and processing at any given time only the updates of the one that finished first.
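The decision logic of the steps above could be sketched roughly as follows. This is plain TypeScript, not Atlas Function code: the `WatchState` shape, its field names, and the three helper functions are all assumptions made for illustration, and timestamps are simplified to millisecond numbers.

```typescript
// Hypothetical bookkeeping record, stored wherever the triggers can reach it.
type WatchState = {
  lastChangeAt: number; // time of the latest change seen by the database trigger
  cleanedUpTo: number;  // value of lastChangeAt for which all chunks were processed
  nextChunk: number;    // next chunk to handle in the current pass
};

type Action = "idle" | "wait" | "process";

// Database trigger side: only remember the newest change time.
function recordChange(state: WatchState, changeAt: number): WatchState {
  return changeAt > state.lastChangeAt ? { ...state, lastChangeAt: changeAt } : state;
}

// Scheduled trigger side: decide what this run should do.
function nextAction(state: WatchState, now: number, quietMs: number): Action {
  if (state.lastChangeAt <= state.cleanedUpTo) return "idle"; // nothing new since last pass
  if (now - state.lastChangeAt < quietMs) return "wait";      // updates may still be arriving
  return "process";                                           // quiet long enough: handle one chunk
}

// Called after one chunk has been processed successfully.
function afterChunk(state: WatchState, totalChunks: number): WatchState {
  const next = state.nextChunk + 1;
  if (next < totalChunks) return { ...state, nextChunk: next };
  // Pass complete: mark this update as handled and wait for the next one.
  return { ...state, nextChunk: 0, cleanedUpTo: state.lastChangeAt };
}
```

This sketch assumes the database trigger does not fire for the writes made by the cleanup itself; otherwise `lastChangeAt` would keep moving forward and the quiet period would never elapse.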

Does the above make sense?

Trdat_Mkrtchyan · October 20, 2022, 5:43pm

Hi @Paolo_Manna

Thanks for the response. Scheduling itself is not essential, and moreover I feel observing the collection is not quite a good idea, because the functions will change the collection and fall into an infinite loop. The thing I can’t understand is how to run a function on chunks. Ideally I’d like to run “something” that will traverse the collection. The only thing that comes to mind is:

1. run the function with arguments
2. write a message {from, to, current} into a separate collection, say named operations
3. a trigger which observes operations runs the function with argument current (initially current = from)
4. when the function finishes, it updates operations with current = current + 1
5. go to 3 until current == to
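The loop described in these steps can be sketched as a small state machine. This is plain TypeScript with no Atlas calls: `Operation` mirrors the {from, to, current} message, while `startOperation`, `step`, and `isDone` are hypothetical helpers, and treating `to` as exclusive is an assumption.

```typescript
// Mirrors the {from, to, current} message kept in the operations collection.
type Operation = { from: number; to: number; current: number };

// Steps 1-2: create the record that kicks off the whole run.
function startOperation(from: number, to: number): Operation {
  return { from, to, current: from };
}

// `to` is treated as exclusive here (an assumption).
function isDone(op: Operation): boolean {
  return op.current >= op.to;
}

// Steps 3-4: what one trigger-invoked function call would do.
// It handles exactly one chunk and returns the updated record; writing that
// record back is what would fire the trigger for the next chunk.
function step(op: Operation, processChunk: (index: number) => void): Operation {
  if (isDone(op)) return op; // nothing left, so no further update and no further trigger
  processChunk(op.current);
  return { ...op, current: op.current + 1 };
}
```

Since each invocation handles a single chunk, only the per-chunk duration would have to fit within the 150 second limit, not the whole traversal.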

Paolo_Manna · October 21, 2022, 8:04am

Hi @Trdat_Mkrtchyan ,

You’re of course aware of the tasks, so, as I wrote, mine was just one possibility; you may well find a different one that suits you better. One thing, however, I wanted to clarify, regarding functions changing the collection and falling into an infinite loop:

That’s a common point that triggers have to face, and there are standard procedures to avoid that (for example, that’s what match expressions are for): as long as you can identify which kind of changes you want to react to (or not), you’ll be fine.
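To illustrate the idea of filtering which changes to react to, here is a rough sketch in plain TypeScript. It is not match expression syntax: `ChangeInfo` is a simplified, hypothetical stand-in for a change event, and `processedAt` is an assumed field that only the cleanup function writes.

```typescript
// Simplified, hypothetical view of a change event.
type ChangeInfo = {
  operationType: "insert" | "update" | "replace" | "delete";
  updatedFields?: Record<string, unknown>;
};

// Fields written only by our own function (assumed for this example).
const OWN_FIELDS = ["processedAt"];

// React to everything except updates that touch nothing but our own fields.
function shouldReact(change: ChangeInfo): boolean {
  if (change.operationType !== "update") return true;
  const fields = Object.keys(change.updatedFields ?? {});
  if (fields.length === 0) return true; // no field info available: stay on the safe side
  return fields.some((field) => OWN_FIELDS.indexOf(field) === -1);
}
```

With a filter of this kind, the function’s own bookkeeping writes would be ignored, while third-party updates to any other field would still be noticed.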
