Hello @Daniel_Lux and welcome to the MongoDb community!!
MongoDB time series collections are basically a non materialised views under system.buckets
backed by the internal collections. Querying in the time-series collection, utilises this format and returns results faster.
The query optimisation on the time series collection works differently than the normal collection and hence, the db.<collection>.explain().<query>
works on the underlying collection rather on the non-materialised views being created.
The time series makes use the clustered index created by default and to further increase the performance, you can manually add the secondary indexes to the collection.
However, for the above dataset provided, I tried to reproduce the issue on the MongoDB version 6.0 with dataset of around 1 GB of dataset and observed the following:
Using the query :
db.sample.explain().aggregate([ {$sort:{timestamp:-1}}, {$limit:10} ])
This query takes around 16 seconds to respond with the appropriate documents and uses the COLLSCAN
to fetch the documents which explains the delay.
....
executionStats: {
executionSuccess: true,
nReturned: 11001,
executionTimeMillis: 16937,
totalKeysExamined: 0,
totalDocsExamined: 11001,
executionStages: {
stage: 'COLLSCAN',
nReturned: 11001
....
However, using the range query as:
db.sample.explain().aggregate([ {$match:{timestamp:{$gte:ISODate(“2021-01-10T10:02:18.242Z”)}}}, {$sort:{timestamp:-1}}, {$limit:10} ])
subsequently reduced the execution time for the query and makes use of the clustered Index.
....
executionStats: {
executionSuccess: true,
nReturned: 11001,
executionTimeMillis: 8607,
totalKeysExamined: 0,
totalDocsExamined: 11001,
executionStages: {
stage: 'CLUSTERED_IXSCAN',
filter: {
'$and': [
{
_id: { '$gte': ObjectId("5fd343aa0000000000000000") }
},
{
'control.max.timestamp': {
'$_internalExprGte': ISODate("2021-01-10T10:02:18.242Z")
}
},
{
'control.min.timestamp': {
'$_internalExprGte': ISODate("2020-12-11T10:02:18.242Z")
}
}
]
},
nReturned: 11001,
executionTimeMillisEstimate: 2,
....
This further explains that, the range query targets the buckets and unpacks and produces the output, however, query without the range, would basically unpack all and give the output documents which explains the COLLSCAN
in the execution status.
For the above two query, the executionTimeMillisEstimate
field value has efficiently reduced from ~16 seconds to ~8 seconds on average.
Therefore, the recommendation would be make use of the range query to optimise the response for the collection.
In conclusion, a time series collection works differently than a normal collection, and as of MongoDB 6.0, the usual query optimisation methods may not work as expected. Note however that this may change in the future
Let us know if you have any further questions.
Best regards
Aasawari