The issue I'd like to discuss is the slow retrieval of 60,000 documents from a time-series collection.
Environment and tech-stack facts:
Server Version: MongoDB 8.0.4-2
Hosted on Linux (bare-metal server, 24 cores, 128 GB RAM, a sufficient SSD; I doubt it's down to the hardware or network).
Using dotnet MongoDB.Driver v2.12.3
Collection definition:
{
"name" : "gpsTS",
"type" : "timeseries",
"options" : {
"timeseries" : {
"timeField" : "utcCreated",
"metaField" : "metaData",
"granularity" : "seconds",
"bucketMaxSpanSeconds" : 3600.0
}
},
"info" : {
"readOnly" : false
}
}
Sample document (including the metaData contents):
{
"utcCreated" : ISODate("2024-04-16T22:22:55.000+0000"),
"metaData" : {
"assetId" : 26658,
"companyId" : 1399,
"driverId" : null
},
"direction" : 308,
"latitude" : 41.72669982910156,
"utcReceived" : ISODate("2025-05-10T18:29:50.242+0000"),
"satellites" : 16,
"_id" : ObjectId("681f9b1e2e4b9663c0fedf8b"),
"speed" : 0.0,
"hdop" : 0.0,
"newDistance" : 0.0,
"longitude" : -87.80660247802734
}
All the documents are uniform, with the exact same fields present.
The indexes are:
[
{
"v" : 2.0,
"key" : {
"metaData" : 1.0,
"utcCreated" : 1.0
},
"name" : "metaData_1_utcCreated_1"
},
{
"v" : 2.0,
"key" : {
"metaData.companyId" : -1.0,
"metaData.assetId" : -1.0,
"utcCreated" : -1.0
},
"name" : "metaData.companyId_-1_metaData.assetId_-1_utcCreated_-1"
}
]
I'd highlight "metaData.companyId_-1_metaData.assetId_-1_utcCreated_-1" as the relevant index for the query in question (a hit on this index was confirmed with explain("executionStats")).
Since the query was too slow, I tried three approaches:
1. The first and most basic one: Find().ToListAsync()
public async Task<IEnumerable<GpsTSDto>> GetAsync()
{
var filter = Builders<GpsTSDto>.Filter.And(
Builders<GpsTSDto>.Filter.Gt(d => d.MetaData.CompanyId, 1000),
Builders<GpsTSDto>.Filter.Gte(d => d.MetaData.AssetId, 26657),
Builders<GpsTSDto>.Filter.Lte(d => d.MetaData.AssetId, 26659),
Builders<GpsTSDto>.Filter.Gte(d => d.UtcCreated, new DateTime(2025, 04, 16, 0, 0, 0, DateTimeKind.Utc)),
Builders<GpsTSDto>.Filter.Lte(d => d.UtcCreated, new DateTime(2025, 05, 10, 21, 50, 0, DateTimeKind.Utc))
);
return await _gpsTSCollection.Find(filter).ToListAsync().ConfigureAwait(false);
}
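For reference, the Builders filter above should render to roughly the following MQL (a sketch in plain JavaScript, assuming the default class map producing camelCase element names; the driver may emit an explicit $and instead of merged predicates):

```javascript
// Hypothetical sketch of the MQL filter the C# Builders code renders to.
const filter = {
  "metaData.companyId": { $gt: 1000 },
  "metaData.assetId":   { $gte: 26657, $lte: 26659 },
  utcCreated: {
    $gte: new Date(Date.UTC(2025, 3, 16, 0, 0, 0)),  // JS months are 0-based
    $lte: new Date(Date.UTC(2025, 4, 10, 21, 50, 0)),
  },
};
console.log(Object.keys(filter).length); // 3 top-level predicates
```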
2. Tried using a cursor with the default .NET driver settings:
public async Task<IEnumerable<GpsTSDto>> GetToTestAsync()
{
var filter = Builders<GpsTSDto>.Filter.And(
Builders<GpsTSDto>.Filter.Gt(d => d.MetaData.CompanyId, 1000),
Builders<GpsTSDto>.Filter.Gte(d => d.MetaData.AssetId, 26657),
Builders<GpsTSDto>.Filter.Lte(d => d.MetaData.AssetId, 26659),
Builders<GpsTSDto>.Filter.Gte(d => d.UtcCreated, new DateTime(2025, 04, 16, 0,0,0, DateTimeKind.Utc)),
Builders<GpsTSDto>.Filter.Lte(d => d.UtcCreated, new DateTime(2025, 05, 10, 21, 50, 0, DateTimeKind.Utc))
);
var documents = new List<GpsTSDto>();
var i = 1;
using var cursor = await _gpsTSCollection.FindAsync(filter);
while (await cursor.MoveNextAsync())
{
documents.AddRange(cursor.Current);
logger.LogInformation($"Batch #{i++}: {documents.Count} docs so far");
}
logger.LogInformation($"GetToTestAsync {i} times");
return documents;
}
It turned out that the default batching in the MongoDB .NET driver is 101 documents for the first batch, then roughly 4 MB worth of documents per subsequent batch.
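Given those defaults, the number of server round trips for ~60,000 documents can be estimated. A back-of-the-envelope sketch (the ~200-byte average document size is my assumption, not a measured value):

```javascript
// Rough round-trip estimate under the driver defaults described above.
// Assumptions (hypothetical): ~200 bytes per BSON document, ~4 MB getMore batches.
const totalDocs = 60000;
const firstBatch = 101;                              // default first-batch size
const bytesPerDoc = 200;                             // assumed average doc size
const batchBytes = 4 * 1024 * 1024;                  // ~4 MB per subsequent batch
const docsPerBatch = Math.floor(batchBytes / bytesPerDoc); // ~20971 docs/batch
const roundTrips = 1 + Math.ceil((totalDocs - firstBatch) / docsPerBatch);
console.log(roundTrips); // 4
```

So even with the defaults, only a handful of round trips are needed, which is consistent with the observation below that raising the batch size changes little.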
3. Then I overrode the driver's fetch batch size to 60,000 so that all documents would come back in the first batch, avoiding additional round trips to the database, but the result did not change much.
The code is the same as under point 2, with the only difference being:
var options = new FindOptions<GpsTSDto>
{
BatchSize = 60000
};
using var cursor = await _gpsTSCollection.FindAsync(filter, options);
All of the attempts (each tried at least 10 times) return the documents in approximately 24.5 seconds.
The collection and the indexes were created "by the book".
One of the main selling points (along with reduced storage, a single index per bucket, and fast BulkInsert) is faster data retrieval via native bucket unpacking.
The granularity is set to seconds since a single document is usually generated every 5 seconds (although BulkInsert-ed for optimization, also by the book).
It seems that MongoDB is REALLY SLOW at retrieving larger result sets (whether fetched through an explicit cursor or not).
QUESTIONS:
Could it be that the bucketing granularity needs to be changed from seconds to hours?
That would presumably mean fewer buckets, and better fetch performance in terms of the engine unpacking them.
But then MongoDB would probably auto-close buckets once the document count reaches 1000.
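If wider buckets do turn out to help, my understanding from the docs is that the granularity can be raised in place with collMod (it can only be increased, never decreased). A mongosh sketch (requires a live server, so not runnable standalone; treat as a hypothetical command, not something I have verified on this deployment):

```javascript
// mongosh sketch: widen the bucketing granularity of the existing collection.
// Note: granularity can only move up (seconds -> minutes -> hours), which also
// widens the effective bucketMaxSpanSeconds accordingly.
db.runCommand({
  collMod: "gpsTS",
  timeseries: { granularity: "hours" }
})
```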
Additional note:
I’ve inspected the distribution of documents per bucket:
Query:
db.getCollection("system.buckets.gpsTS").aggregate([
{
$project: {
bucketSize: { $size: { $objectToArray: "$data" } }
}
},
{
$group: {
_id: null,
averageDocsPerBucket: { $avg: "$bucketSize" },
minDocsPerBucket: { $min: "$bucketSize" },
maxDocsPerBucket: { $max: "$bucketSize" },
countBuckets: { $sum: 1 }
}
}
])
Result:
{
"_id" : null,
"averageDocsPerBucket" : 10.0,
"minDocsPerBucket" : 10,
"maxDocsPerBucket" : 10,
"countBuckets" : 108719.0
}
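One caveat worth double-checking before drawing conclusions from the result above (this is my reading, not something confirmed): $objectToArray on "$data" yields one entry per stored field, not per document, and the sample document has exactly 10 fields outside metaData, which matches the reported min/max/avg of 10 exactly. A small sanity-check sketch using only figures from this post (the cadence math is mine):

```javascript
// Checks whether "10 docs per bucket" could actually be counting fields.
// Field list taken from the sample document; everything except metaData
// lands in the bucket's "data" object.
const dataFields = ["utcCreated", "direction", "latitude", "utcReceived",
                    "satellites", "_id", "speed", "hdop", "newDistance",
                    "longitude"];
const reportedPerBucket = 10;                      // from the aggregation result
console.log(dataFields.length === reportedPerBucket); // true: suspicious match

// If documents really arrive every ~5 s, a 3600 s bucket could hold up to:
const maxDocsPerBucket = 3600 / 5;
console.log(maxDocsPerBucket); // 720
```

If the buckets really do hold far fewer documents than the hour-wide span allows, that would be an argument for the wider granularity discussed above; if the query was counting fields, the distribution is still unknown.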
It seems it might be good to set the granularity to a wider span, like hours instead of seconds, thus reducing the number of buckets and gaining some speed.
But my basic concern is that fetching a larger set of documents will still be slow, since a cursor is used behind the scenes anyway.
I claim this isn't a network issue: after struggling with this for quite a while, I inserted the same data into an MS SQL table with a clustered index, and the response time there is usually below 120 ms (hosted on the exact same physical machine).
I hope I'm doing something wrong and can get some tips on how to fix it.
Thank you all in advance!