TimeSeries Collection - Query Targeting: Scanned Objects / Returned has gone above 1000

Hi all,

Thank you for reading my post! I really need your help.
I have a timeseries collection to query.
My Atlas Cluster installs a MongoDB 6.0.2 Enterprise.

My collection is created with the following command which leverages pymongo.


    def create_timeseries_collection(self, name: str, granularity: str) -> Collection:
        try:
            self.db.command(
                "create",
                name,
                timeseries={
                    "timeField": "date_utc",
                    "metaField": "symbol",
                    "granularity": granularity,
                },
            )
            self.db.get_collection(name).create_index([("symbol", ASCENDING), ("date_utc", DESCENDING)])
        except OperationFailure:
            logger.exception(f"Collection {name} already exists. Skipping..")
        coll = self.db.get_collection(name)
        
        return coll

I would need to get the last value of my timeseries.
These are stock market timeseries.

These are the ways I have tried.

    def _get_latest_historical_values(self, symbol, coll_name):
       
        cursor = self.db[coll_name].aggregate([
                            {
                                '$sort': {
                                    'symbol': 1, 
                                    'date_utc': -1
                                }
                            }, {
                                '$match': {
                                    'symbol': symbol
                                }
                            }, {
                                '$limit': 1
                            }
                        ])
        for el in cursor:
            latest_value = el
        cursor.close()
        
        return latest_value

Or:

        latest_value = self.db[coll_name].find({"symbol": symbol}).hint('symbol_1_date_utc_-1').sort({"symbol":1, "date_utc":-1}).limit(1)[0]

But I keep getting the alert (by email) every 5 minutes–because I run the query every 5 minutes.

I’ve tried following Mongo’s recommended best practices…but I don’t understand what I’m doing wrong and especially if there is a better way to grab the last value of a collection by filtering on a metaField.

Please Help Me!

Hi @Vatemecum

I believe a very similar question was answered by @Aasawari on TimeSeries last x documents please have a look at that thread and see if it helps your case.

Basically due to the way the time series collection is working currently, I think adding a timestamp window should make this better. For example, if you add {$match: {date_utc: {$gte: <some recent date time>}}} as the first stage of the pipeline, it should perform better.

If this doesn’t work, could you post the output of db.collection.explain('executionStats').aggregate(...) output, both before and after adding the timestamp window matching? Also some example documents will be helpful.

Best regards
Kevin

1 Like