Improving Performance for a Find query with filters

Hi everyone,

I am quite new to MongoDB and I’ve been assigned to deal with a really large data set (a collection with over a billion documents), querying it with a filter on dates and one other value. Naturally I created a compound index, because without it nothing would have worked, but query performance is still poor. I don’t know if adding more indexes would really help (the compound index includes the date field plus the other one, in ascending order). I am considering migrating the collection to a time-series collection. I’d be glad to hear everyone’s thoughts and advice, plus: what would be the cheapest way to migrate to a time-series collection?

Does your query return a lot of fields? If it’s targeting only a few, it may be worth looking at a covering index. This means that all the data needed for the query to return results is within the index itself, so no documents need to be fetched at all.
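A rough sketch with the C# driver of what I mean (the index, the choice of lastPrice, and the variable names are assumptions on my part):

// Assumed index: { securityId: 1, dateTime: 1, lastPrice: 1 }
// A query is "covered" only if every filtered and projected field is in
// the index and _id is explicitly excluded from the projection.
var projection = Builders<Quote>.Projection
    .Include(x => x.SecurityId)
    .Include(x => x.DateTime)
    .Include(x => x.LastPrice)
    .Exclude("_id");

// 'collection' is your IMongoCollection<Quote>; 'filter' is your existing filter
// (it must also touch only indexed fields for the query to stay covered).
var covered = collection.Find(filter)
    .Project(projection)
    .ToList();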

I’m afraid I’ve not used time-series, but I’m sure there are lots of experts about.

It may be helpful to post the .explain() output of your query to see what is actually taking the time. It’s probably also worth looking at the server setup: is it Atlas or on-prem, how large are the indexes, and can they fit into the available memory?
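If the GUI is giving you trouble, one way to get executionStats is to run the explain command directly from the driver. A sketch (the collection name "quotes" and the raw filter are assumptions; 'database' is your IMongoDatabase):

using MongoDB.Bson;
using MongoDB.Bson.IO;
using MongoDB.Driver;

// Sketch: wrap the find in an explain command with executionStats verbosity.
var explain = database.RunCommand<BsonDocument>(new BsonDocument
{
    { "explain", new BsonDocument
        {
            { "find", "quotes" },
            { "filter", new BsonDocument
                {
                    { "securityId", 1111 },
                    { "dateTime", new BsonDocument { { "$gte", from }, { "$lte", to } } }
                }
            }
        }
    },
    { "verbosity", "executionStats" }
});
Console.WriteLine(explain.ToJson(new JsonWriterSettings { Indent = true }));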

Hi John_Sewell,

My query does return a lot of fields, mainly because of the data structure.
I am trying to get the output of the .explain() method for my query, but so far today I’ve had no success in my Robo 3T Studio client; I just get a blank response.
I am using an on-prem server, and the compound index size is 48.9 GB. The unique _id index size is 41.4 GB. I would need to ask the sysadmin about the available memory, but he has never alerted me about memory overuse so far.
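In the meantime I might try checking it myself with something like this (a sketch I pieced together; I’m assuming my user is allowed to run serverStatus):

// Sketch: compare WiredTiger's configured cache with what is currently in it.
// Rule of thumb: the hot indexes should fit comfortably inside this cache.
var status = database.RunCommand<BsonDocument>(new BsonDocument("serverStatus", 1));
var cache = status["wiredTiger"]["cache"];
Console.WriteLine("maximum bytes configured:     " + cache["maximum bytes configured"]);
Console.WriteLine("bytes currently in the cache: " + cache["bytes currently in the cache"]);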

Sounds like a covering index may consume a vast quantity of RAM! I think an example document, as well as an example query, would help others assist with optimisations and comment on conversion to a time-series collection, if that’s the best option.
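As I said I’ve not used time-series myself, but going purely off the documentation, a conversion sketch might look something like this (MongoDB 5.0+; the collection names, choosing securityId as the metaField, and the granularity are all assumptions):

// Sketch: create a time-series collection keyed on dateTime,
// grouping buckets by securityId as metadata.
database.CreateCollection("quotes_ts", new CreateCollectionOptions
{
    TimeSeriesOptions = new TimeSeriesOptions(
        timeField: "dateTime",
        metaField: "securityId",
        granularity: TimeSeriesGranularity.Seconds)
});

// Cheapest migration I can think of is a batched copy during quiet hours
// (Enumerable.Chunk needs .NET 6+).
var source = database.GetCollection<Quote>("quotes");   // assumed source name
var target = database.GetCollection<Quote>("quotes_ts");
foreach (var batch in source.Find(FilterDefinition<Quote>.Empty)
                            .ToEnumerable()
                            .Chunk(10_000))
{
    target.InsertMany(batch);
}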

This is an example document:

{
    "_id" : ObjectId("647dd8996bb8634c52e492cd"),
    "dateTime" : ISODate("2018-01-01T09:00:00.920+0000"),
    "securityId" : NumberInt(1111),
    "priceType" : " ",
    "lastVolume" : NumberInt(0),
    "isMega" : false,
    "lastPrice" : NumberInt(222),
    "bidLevel1" : NumberInt(0),
    "bidLevel2" : NumberInt(0),
    "bidLevel3" : NumberInt(0),
    "bidLevel4" : NumberInt(0),
    "bidLevel5" : NumberInt(0),
    "bidSizeLevel1" : NumberInt(0),
    "bidSizeLevel2" : NumberInt(0),
    "bidSizeLevel3" : NumberInt(0),
    "bidSizeLevel4" : NumberInt(0),
    "bidSizeLevel5" : NumberInt(0),
    "askLevel1" : 51,
    "askLevel2" : 52,
    "askLevel3" : NumberInt(54),
    "askLevel4" : NumberInt(0),
    "askLevel5" : NumberInt(0),
    "askSizeLevel1" : NumberInt(22),
    "askSizeLevel2" : NumberInt(23),
    "askSizeLevel3" : NumberInt(24),
    "askSizeLevel4" : NumberInt(0),
    "askSizeLevel5" : NumberInt(0)
}

Example query as written with the C# driver:

// Equality on securityId plus a closed dateTime range.
var filter = Builders<Quote>.Filter.And(
    Builders<Quote>.Filter.Eq(x => x.SecurityId, int.Parse(assetId, System.Globalization.NumberStyles.Integer)),
    Builders<Quote>.Filter.Gte(x => x.DateTime, from),
    Builders<Quote>.Filter.Lte(x => x.DateTime, to));

Did you check the ESR (Equality, Sort, Range) rule?

And if your indexed fields contain lots of identical values (low cardinality), an index doesn’t help much, because the number of matched documents is still high.
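For your shape of query, ESR would put the equality field before the range field, so the compound index would look something like this (a sketch; I’m assuming yours isn’t already in this order):

// Sketch: equality field (securityId) first, range field (dateTime) second.
var keys = Builders<Quote>.IndexKeys
    .Ascending(x => x.SecurityId)
    .Ascending(x => x.DateTime);
collection.Indexes.CreateOne(new CreateIndexModel<Quote>(keys));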

I did; nothing helpful.