How to schedule tasks in MongoDB

First of all, sorry for my English; I had help from Google for this question =D

I am looking to migrate my MySQL solution to MongoDB because of a number of advantages it offers. In MySQL, I have a stored procedure that takes records that expired on a certain date and moves them into a secondary backup table, to speed up searches in the main table.

I looked in the MongoDB documentation but didn't find anything that looks like a stored procedure for this. I cannot have this process handled by an external application because I work with clustered solutions, so I would end up with the same application doing the same job on the database at least 3 times (because of the multiple zones in the Amazon cloud).

Is there any way to schedule a task, procedure, or execution of a particular process within MongoDB itself, so that I can transfer these documents to a backup collection hourly?

Hello Leandro!

I think this is what you are looking for:
You set a TTL for your documents: https://docs.mongodb.com/manual/tutorial/expire-data/
Then you capture the deleted documents with a change stream: https://docs.mongodb.com/manual/changeStreams/
When you receive them, you insert them into the secondary collection.
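A minimal sketch of this approach with the Node.js driver (the `events`/`events_archive` collection names, the `createdAt` field, and the one-hour TTL are my assumptions, not from the post). As noted further down this thread, the delete event only carries the document's `_id`, so only the `_id` can be recorded here:

```javascript
const { MongoClient } = require("mongodb");

async function main() {
  // Change streams require a replica set or sharded cluster.
  const client = await MongoClient.connect("mongodb://localhost:27017");
  const db = client.db("mydb");
  const events = db.collection("events");
  const archive = db.collection("events_archive");

  // TTL index: documents become eligible for removal one hour after
  // their createdAt value (the TTL monitor runs roughly every 60s).
  await events.createIndex({ createdAt: 1 }, { expireAfterSeconds: 3600 });

  // Watch only delete events on the main collection.
  const stream = events.watch([{ $match: { operationType: "delete" } }]);
  for await (const change of stream) {
    // Caveat: only documentKey (the _id) is available; the document
    // body is already gone by the time this event arrives.
    await archive.insertOne({
      _id: change.documentKey._id,
      deletedAt: new Date(),
    });
  }
}

main().catch(console.error);
```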

In Atlas, on the other hand, you could use Triggers: https://docs.atlas.mongodb.com/triggers/

Welcome to the community, Leandro!

The MongoDB server does not have a built-in scheduler for running periodic tasks such as archiving documents, so you will have to use an external application and scheduler for a self-managed deployment.
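For example, a minimal sketch of such an external job in Node.js, using the node-cron package to run hourly (the `events` collection, `expireAt` field, and `events_archive` collection are assumptions for illustration):

```javascript
const cron = require("node-cron");
const { MongoClient } = require("mongodb");

const client = new MongoClient("mongodb://localhost:27017");

async function archiveExpired() {
  const db = client.db("mydb");
  const events = db.collection("events");
  const archive = db.collection("events_archive");

  // Copy expired documents first, then delete only what was copied,
  // so a crash between the two steps duplicates rather than loses data.
  const expired = await events.find({ expireAt: { $lte: new Date() } }).toArray();
  if (expired.length === 0) return;

  await archive.insertMany(expired);
  await events.deleteMany({ _id: { $in: expired.map((d) => d._id) } });
}

async function main() {
  await client.connect();
  // Run at the top of every hour.
  cron.schedule("0 * * * *", () => archiveExpired().catch(console.error));
}

main().catch(console.error);
```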

MongoDB Atlas (our managed cloud service) does have a Scheduled Triggers feature enabling custom functions to run on a schedule.

If you are working with clustered MongoDB deployments (for example, a sharded cluster with multiple zones), you should only have to execute your archival task once per deployment. If you are managing multiple deployments, you will have to run separate tasks per deployment.
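If several identical app instances would otherwise all run the same job, one common pattern (my suggestion, not something from this thread) is to take a short lease in a `locks` collection so only the instance that wins the lease does the work that hour:

```javascript
// Returns true for exactly one caller per lease period: it either
// inserts a new lock document or takes over one whose lease expired.
async function tryAcquireLease(db, name, ttlMs) {
  const now = new Date();
  try {
    await db.collection("locks").findOneAndUpdate(
      { _id: name, expiresAt: { $lte: now } },
      { $set: { expiresAt: new Date(now.getTime() + ttlMs) } },
      { upsert: true }
    );
    return true;
  } catch (err) {
    // Duplicate key on _id: another instance holds an unexpired lease.
    if (err.code === 11000) return false;
    throw err;
  }
}

// Usage inside the scheduled job:
//   if (await tryAcquireLease(db, "hourly-archive", 55 * 60 * 1000)) {
//     await archiveExpired();
//   }
```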

However, before adding the I/O overhead of moving documents to a new collection, I would consider whether this is actually the best approach.

Some alternative approaches to consider:

  • If you are archiving on a schedule (like daily), you could write to different collections using a date-based naming scheme (e.g. collection_yyyymmdd; sketched after this list). However, this won't be ideal if queries typically span multiple date collections.
  • If your concern is index size, you could use partial indexes to support common queries for documents that are active (also sketched below).
  • You could use zone sharding with tiered storage to move archival data to lower-spec hardware.
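Rough sketches of the first two options in mongosh (all names are illustrative assumptions):

```javascript
// Date-based collection naming: write today's data to e.g. events_20240115.
const suffix = new Date().toISOString().slice(0, 10).replace(/-/g, "");
db.getCollection(`events_${suffix}`).insertOne({ createdAt: new Date() });

// Partial index: index only documents that are still active, keeping
// the index small even when inactive documents remain in the collection.
db.events.createIndex(
  { createdAt: -1 },
  { partialFilterExpression: { status: "active" } }
);
```

Note that queries need to include the partial filter condition (here, status: "active") for the partial index to be eligible.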

A TTL index removes matching documents once the indexed date field is older than a configured number of seconds (expireAfterSeconds); with expireAfterSeconds: 0, each document expires at the date stored in the indexed field.
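For example, in mongosh, following the linked expire-data tutorial (collection and field names assumed):

```javascript
// Each document is removed once its own expireAt date has passed.
// The TTL monitor runs roughly every 60 seconds, so removal is not
// instantaneous.
db.events.createIndex({ expireAt: 1 }, { expireAfterSeconds: 0 });

db.events.insertOne({
  payload: "example",
  expireAt: new Date("2021-01-01T00:00:00Z"), // illustrative expiry date
});
```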

The change stream delete event will be fired after documents are removed, but will only include the document _id:

The fullDocument document is omitted as the document no longer exists at the time the change stream cursor sends the delete event to the client.
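For illustration, a delete event has roughly this shape (simplified, with placeholder values):

```javascript
// Simplified shape of a change stream delete event; note the absence
// of a fullDocument field.
const exampleDeleteEvent = {
  operationType: "delete",
  ns: { db: "mydb", coll: "events" },
  documentKey: { _id: "..." }, // only the _id survives
};
```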

MongoDB Atlas provides two kinds of triggers:

  • Database Triggers are based on change streams, so would not be suitable for archiving documents.
  • Scheduled Triggers run functions on a periodic basis, so would be a viable approach for Atlas users (see the sketch below).
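A sketch of what the Scheduled Triggers approach might look like as an Atlas Function run on an hourly cron schedule ("mongodb-atlas" is the default linked data source name; the database, collection, and field names are my assumptions):

```javascript
exports = async function () {
  const db = context.services.get("mongodb-atlas").db("mydb");
  const events = db.collection("events");
  const archive = db.collection("events_archive");

  // Copy first, then delete, so an interrupted run duplicates
  // rather than loses documents.
  const expired = await events.find({ expireAt: { $lte: new Date() } }).toArray();
  if (expired.length === 0) return;

  await archive.insertMany(expired);
  await events.deleteMany({ _id: { $in: expired.map((d) => d._id) } });
};
```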

Regards,
Stennie

