I’m building a website monitor that checks around 10k websites, roughly twice per month.
The monitor should start, run some tests on a batch of websites (maybe 100 per batch), wait a minute or so, and then run the tests on the next batch of 100.
In the website_seeds collection I’ve got the list of all websites with some meta info.
The test results should be stored in website_results.
How would I model / structure the data to track the progress?
If I do something like:
website_seeds.doc.status = new
start batch run
website_seeds.doc.status = processing
finished batch run
website_seeds.doc.status = completed
I’d need to “reset” all values to ‘new’ whenever I start the 2nd run.
In this model, each website document carries a run_log array, and each entry in that array represents one monitoring run. An entry includes the run_start timestamp, the run_end timestamp (if available), and the status of the run. This lets you track the history of runs for each website and determine the most recent one.
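A document along these lines could look like the sketch below. Only run_log, run_start, run_end and status come from the description above; the _id and url fields, the status values stored inside each run entry, and the helpers new_seed, start_run and finish_run are assumptions for illustration. Plain Python dicts stand in for the stored documents, so nothing here talks to a database.

```python
from datetime import datetime, timezone


def new_seed(site_id, url):
    """Build a website_seeds document with an empty run history (hypothetical helper)."""
    return {"_id": site_id, "url": url, "status": "new", "run_log": []}


def start_run(seed, now=None):
    """Append a new run entry and mark the website as processing."""
    now = now or datetime.now(timezone.utc)
    seed["run_log"].append(
        {"run_start": now, "run_end": None, "status": "processing"}
    )
    seed["status"] = "processing"


def finish_run(seed, now=None):
    """Close the most recent run entry and mark the website as completed."""
    now = now or datetime.now(timezone.utc)
    latest = seed["run_log"][-1]
    latest["run_end"] = now
    latest["status"] = "completed"
    seed["status"] = "completed"


# Walk one website through a single run.
seed = new_seed(1, "https://example.com")
start_run(seed)
finish_run(seed)
```

Because every run appends a new entry instead of overwriting the previous one, the history survives across runs and the latest entry is always the last element of run_log.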
To initiate a new batch run, check the latest entry in the run_log array for each website. If the latest run is completed, or if a certain time threshold has passed since it started, set the status field of the affected documents in the website_seeds collection back to “new” to indicate that they need to be processed again.
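The check and reset could be sketched roughly as follows, again on plain dicts. The 14-day interval is an assumption derived from "roughly 2 times per month", and is_due, reset_due and next_batch are hypothetical helper names, not part of any library. For simplicity this sketch uses only the time threshold, measured from the latest run_start.

```python
from datetime import datetime, timedelta, timezone

RUN_INTERVAL = timedelta(days=14)  # assumption: about two runs per month


def is_due(seed, now, interval=RUN_INTERVAL):
    """A website is due if it has never run or its latest run started long enough ago."""
    if not seed["run_log"]:
        return True
    latest = seed["run_log"][-1]
    return now - latest["run_start"] >= interval


def reset_due(seeds, now):
    """Set status back to "new" for every due website; return how many were reset."""
    count = 0
    for seed in seeds:
        if seed["status"] != "new" and is_due(seed, now):
            seed["status"] = "new"
            count += 1
    return count


def next_batch(seeds, size=100):
    """Pick up to `size` websites that are waiting to be processed."""
    return [s for s in seeds if s["status"] == "new"][:size]


def utc(day):
    return datetime(2024, 1, day, tzinfo=timezone.utc)


now = utc(31)
seeds = [
    # Last run 30 days ago: due again.
    {"_id": 1, "status": "completed",
     "run_log": [{"run_start": utc(1), "run_end": utc(1), "status": "completed"}]},
    # Last run 6 days ago: not due yet.
    {"_id": 2, "status": "completed",
     "run_log": [{"run_start": utc(25), "run_end": utc(25), "status": "completed"}]},
    # Never processed.
    {"_id": 3, "status": "new", "run_log": []},
]

reset_count = reset_due(seeds, now)
batch_ids = [s["_id"] for s in next_batch(seeds, size=100)]
```

Against a real database the reset would presumably be one bulk update with a filter on the latest run timestamp rather than a Python loop, and the scheduler would sleep for a minute or so between calls to next_batch.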
By following this data model, you can easily track the progress of each website, store the test results separately, and have a history of the monitoring runs for reference.
Remember to adapt this model to your specific requirements and consider any additional fields or information that might be relevant to your monitoring process.