In a MongoDB time series collection, get the latest document from each device

Sreenath_Ramanna · October 3, 2022, 6:01pm

Hi we have a time series based collection. Devices continually send their data into this collection. It has a designated time field (ts) and a metaFiled by name metadata.serial_number. There are currently about 4000 device contributing documents (records) to this collection. I want to get the latest document from each of these devices. This is proving to be quite slow, takes about 4 minutes!! Any suggestions on how to speed this up is greatly appreciated. I am currently running this query as these steps:

Get a list of unique serial numbers. This is reasonably fast takes about a couple of seconds.
Query line in python: db.command({ “distinct”: “collection”, “key”: “metadata.serial_number”})
For each serial number run an aggregate to get that serial number’s last document. This takes a long time (almost 4 minutes). Below is the aggregate pipeline I am using:
pipeline = [
{ “$match”: { “metadata.serial_number”: serial_number} },
{ “$sort”: { “ts”: -1 }},
{ “$limit”: 1 }
]

Again, any suggestions on how to speed this up is greatly appreciated!!

Thanks & regards,
Sreenath