Is it possible our index not covering well our demand?

Istvan_Hornyak · January 27, 2022, 6:15pm

Hi,

Our Mongo

First, our collection ( name it XY ) has more than ~3m documents.
The documents are mostly consistent, complex and nested.
All documents have these fields > {X: <str>}, Y: <datetime>
There is {X: 1, Y: 1} index on the collection

What we want

want to fetch for a big date range ( more than a year ) where most cases of X consist of thousands documents

Problem:
The query is simple > {X: <str>, Y: {$gte: <datetime>, $lte: <datetime>}}
But, when it runs, the results only come after long time that means 30-40 seconds or more.
According to the explain the query uses the index >

( copied from my company, that’s why somewhere I just replaced the values, for instance epoch )

'winningPlan': {
    'stage': 'FETCH', 'inputStage': {
        'stage': 'IXSCAN',
        'keyPattern': {'X': 1, 'Y': 1},
        'indexName': 'XY',
        'isMultiKey': False,
        'multiKeyPaths': {'X': [], 'Y': []},
        'isUnique': False,
        'isSparse': False
        'isPartial': False
        'indexVersion': 2
        'direction': 'forward'
        'indexBounds': {'X': ['["abc", "abc"]'], 'Y': ['(new Date(<epoch>), new Date(<epoch>))']}
    }
}

The results of explain was copied from python, and we’ve arrived to the main point of the issue.
For take all documents in one’s stride, in the code we use the built-in list method like this: list(<cursor>). But the query breaks because reaches the SSL connection timeout threshold.

According to the count, the results should be 16k documents, but that retrieves so slow…
What can we do? Could you please give advice, suggestions about issue?

Thanks for your attention in advance.

steevej · January 29, 2022, 12:55pm

Many things can explain the slowness. May be your station does not have enough RAM to load all 16k documents and it starts swapping. May be the working set is not in the server cache and documents have to be read from disk. May be you connection is too slow ti transfert 16k documents.

This being said, I would be worry of a use-case that requires 16k documents. I would check if the computation done on my station could be done with the aggregation framework instead.

You could also try to $project to reduce the amount of data transfer.

system · February 3, 2022, 12:56pm

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.