Hi we have a time series based collection. Devices continually send their data into this collection. It has a designated time field (ts) and a metaFiled by name metadata.serial_number. There are currently about 4000 device contributing documents (records) to this collection. I want to get the latest document from each of these devices. This is proving to be quite slow, takes about 4 minutes!! Any suggestions on how to speed this up is greatly appreciated. I am currently running this query as these steps:
- Get a list of unique serial numbers. This is reasonably fast takes about a couple of seconds.
Query line in python: db.command({ “distinct”: “collection”, “key”: “metadata.serial_number”}) - For each serial number run an aggregate to get that serial number’s last document. This takes a long time (almost 4 minutes). Below is the aggregate pipeline I am using:
pipeline = [
{ “$match”: { “metadata.serial_number”: serial_number} },
{ “$sort”: { “ts”: -1 }},
{ “$limit”: 1 }
]
Again, any suggestions on how to speed this up is greatly appreciated!!
Thanks & regards,
Sreenath