Hello,
I’m building a website monitor that has to check around 10k websites, roughly twice per month.
The monitor should start, run some tests on a batch of websites (maybe 100 per batch), wait a minute or so, and then run the next batch of 100.
The collection website_seeds holds the list of all websites with some meta info.
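To make the batching concrete, this is roughly the loop I have in mind, as a minimal Python sketch. The names `batches`, `run_monitor` and `test_site` are placeholders I made up for illustration, and the seeds are plain Python objects here rather than real database documents.

```python
from itertools import islice
import time


def batches(items, size=100):
    """Yield consecutive lists of at most `size` items."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def run_monitor(seeds, test_site, batch_size=100, pause_seconds=60, sleep=time.sleep):
    """Test all seeds batch by batch, pausing between batches (not before the first)."""
    results = []
    first = True
    for batch in batches(seeds, batch_size):
        if not first:
            sleep(pause_seconds)  # wait a minute or so between batches
        first = False
        for seed in batch:
            results.append(test_site(seed))  # test_site is a hypothetical callback
    return results
```

Passing `sleep` as a parameter is only there so the pause can be replaced when trying the loop out.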
The test results should be stored in website_results.
How should I model the data that tracks the progress?
If I do something like:
website_seeds.doc.status = new
start batch run
website_seeds.doc.status = processing
finished batch run
website_seeds.doc.status = completed
I’d need to “reset” all values to ‘new’ when I want to start the 2nd run.
So would it be reasonable to nest this info instead?
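As a rough sketch of that single-status variant, using plain Python dicts in place of the website_seeds documents (the function names `start_batch`, `finish_batch` and `reset_for_new_run` are hypothetical, not part of any library):

```python
def start_batch(seeds, batch_size=100):
    """Pick up to batch_size seeds that are still 'new' and mark them 'processing'."""
    batch = [s for s in seeds if s["status"] == "new"][:batch_size]
    for seed in batch:
        seed["status"] = "processing"
    return batch


def finish_batch(batch):
    """Mark every seed of a finished batch as 'completed'."""
    for seed in batch:
        seed["status"] = "completed"


def reset_for_new_run(seeds):
    """The extra step this model requires: set everything back to 'new'."""
    for seed in seeds:
        seed["status"] = "new"
```

The reset touches every document in the collection and throws away the information about when the previous run happened, which is what bothers me about this variant.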
website_seeds.doc.run_log = [{'run_start': datetime, 'status': 'completed', ...},
{'run_start': datetime, 'status': 'processing'}]
That way, I can check the run_end field of the most recent run_log entry in every document and start a new run for the whole collection after x days.
Is there a better approach?
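A minimal sketch of how I imagine working with the nested run_log, again with plain Python dicts standing in for the documents. The helper names (`latest_run`, `is_due`, `start_run`, `finish_run`) and the 14-day interval are my own assumptions for illustration; the interval is just one possible value for "x days".

```python
from datetime import datetime, timedelta


def latest_run(seed):
    """Return the most recent run_log entry, or None if the seed never ran."""
    log = seed.get("run_log", [])
    return log[-1] if log else None


def is_due(seed, now, interval=timedelta(days=14)):
    """A seed is due if it never ran, or its last run completed at least `interval` ago."""
    last = latest_run(seed)
    if last is None:
        return True
    if last["status"] != "completed":
        return False  # still processing, do not start it again
    return now - last["run_end"] >= interval


def start_run(seed, now):
    """Append a new log entry instead of overwriting a single status field."""
    seed.setdefault("run_log", []).append({"run_start": now, "status": "processing"})


def finish_run(seed, now):
    """Close the most recent log entry and record when it ended."""
    last = seed["run_log"][-1]
    last["status"] = "completed"
    last["run_end"] = now
```

With this layout nothing has to be reset between runs, and older entries stay around as history. The price is that run_log grows by one entry per run for each website.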