Hi,
Could you have a look at the repo?
Is there anything I can clarify?
Thanks!
Hi @Joao_Passos!
I have cloned your repo and have been trying to debug the situation. I was also able to reproduce your issue, so I’ve pulled in a few of the driver engineers to see if they can help me debug further!
I’ll continue to post here when I have any meaningful updates. In the meantime, thank you for your patience while we figure this out!
Hi! Thanks for all your analysis @yo_adrienne
In my case, after many hours of research and work, we concluded that toList() was not what was wrong in our application. It was a little bit of everything. Apart from the potential bandwidth delays, which we certainly had, we were retrieving a lot of data from the database (around 13MB in one particular scenario), and that was the bottleneck. First we had to change our model a little by implementing the bucket pattern. Then we also changed our aggregate queries, setting up the lookups properly and using the needed projections there as well.
After all those changes, it now works as expected.
Thanks a lot!
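For readers unfamiliar with the bucket pattern mentioned above, here is a minimal sketch of the idea in plain C#, with no driver involved. The sensor readings, field names, and bucket size are all hypothetical and are not taken from the application discussed in this thread. It assumes .NET 6 or later for Enumerable.Chunk.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical flat model: one tiny document per reading.
var readings = Enumerable.Range(0, 10)
    .Select(i => (SensorId: "sensor-1", Minute: i, Value: 20.0 + i))
    .ToList();

// Bucket pattern: group the readings of each sensor into fixed-size buckets,
// so that one stored document carries many readings plus a small summary.
const int bucketSize = 4;
var buckets = readings
    .GroupBy(r => r.SensorId)
    .SelectMany(g => g.Chunk(bucketSize).Select(chunk => (
        SensorId: g.Key,
        FirstMinute: chunk[0].Minute,
        Count: chunk.Length,
        Values: chunk.Select(r => r.Value).ToArray())))
    .ToList();

foreach (var b in buckets)
    Console.WriteLine($"{b.SensorId} from minute {b.FirstMinute}: {b.Count} readings");
```

With fewer, larger documents, a query that only needs the summary fields can project just those and skip the Values array, which is one way the amount of data retrieved might be reduced.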
That is so awesome to hear @Sergio_Fiorillo! Thanks for coming back and posting about it as it helps others as well
@Joao_Passos an update!
TL;DR: Most of the performance hit is due to dealing with the complexity of the Facility objects.
Longer version:
After spending quite some time with the .NET driver team, we’ve found the following:
- Creating the Facility objects alone takes about 2 seconds. This is pure C# without the driver involved at all.
- Comparing the performance of RawBsonDocument to Facility objects isn’t really an equal or fair comparison (more on that below).

Why comparing the performance of RawBsonDocument to Facility objects is not an equal comparison, as explained by one of our Sr. .NET driver engineers:
Comparing the performance of RawBsonDocument to Facility objects isn’t exactly a fair comparison as we wrap the raw payload in an object but don’t iterate it or deserialize it into objects. So we are comparing time-on-the-wire (RawBsonDocument case) to time-on-the-wire plus deserialization overhead. That said, retrieving the raw BSON bytes and wrapping them in a collection of RawBsonDocuments takes 897ms but deserializing them into C# objects takes ~14 seconds.
If we deserialize into BsonDocument (rather than RawBsonDocument) then we are avoiding the mapping overhead of converting raw BSON bytes into C# objects, but are still reading the bytes to convert into a series of nested BSON documents. Deserializing to BsonDocument takes 12s.
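To make the difference between wrapping and deserializing concrete, here is a small sketch that uses only the MongoDB.Bson package and needs no server. The document shape (an _id plus a 500-element array) is invented for illustration and is much simpler than the Facility model. It is not a benchmark, it only shows which call does which kind of work.

```csharp
using System;
using System.Linq;
using MongoDB.Bson;
using MongoDB.Bson.Serialization;

// Build a hypothetical document with a 500-element array and serialize it to BSON bytes.
var source = new BsonDocument
{
    { "_id", 1 },
    { "values", new BsonArray(Enumerable.Range(0, 500)) }
};
byte[] bytes = source.ToBson();

// Wrapping: the RawBsonDocument just holds the byte buffer; no element is parsed here.
using var raw = new RawBsonDocument(bytes);

// Deserializing: every element is read and materialized as a BsonValue.
var parsed = BsonSerializer.Deserialize<BsonDocument>(bytes);

Console.WriteLine($"{bytes.Length} bytes; parsed array has {parsed["values"].AsBsonArray.Count} elements");
```

Deserializing into a mapped class such as Facility adds the class-map lookup and object construction on top of the parsing step shown here.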
As a final test, we also modified just your Program.cs file from your repo to see if any performance increases could be made. We were able to get some performance increases, but in a “quick and dirty, but gets results done” kind of way:
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;
using MongoDB.Bson;
using MongoDB.Bson.Serialization;
using MongoDB.Driver;

namespace mongodb
{
    internal class Program
    {
        private static readonly FacilityAttributeStringValue Country = new()
            {Id = Guid.NewGuid(), AttributeId = Guid.NewGuid(), Value = "Spain"};

        private static async Task Main(string[] args)
        {
            BsonClassMap.RegisterClassMap<FacilityAttributeDateTimeValue>();
            BsonClassMap.RegisterClassMap<FacilityAttributeStringValue>();
            BsonClassMap.RegisterClassMap<FacilityAttributeDecimalValue>();
            BsonClassMap.RegisterClassMap<MonitoredDataSourceId>();
            BsonClassMap.RegisterClassMap<CustomDataSourceId>();

            var client =
                new MongoClient(
                    "mongodb://localhost:27017/?readPreference=primary&appname=mongodb-vscode%200.4.1&ssl=false");
            await client.DropDatabaseAsync("mongo-benchmark");
            var db = client.GetDatabase("mongo-benchmark");
            var typedCollection = db.GetCollection<Facility>("facilities");

            //Insert 5000 facilities with 150 parameter and 500 attributes each
            var sw = Stopwatch.StartNew();
            var facilities = GetFacilities().ToList();
            sw.Stop();
            Console.WriteLine($"Created 5000 facilities in {sw.ElapsedMilliseconds} ms.");
            await typedCollection.InsertManyAsync(facilities);

            //Bson
            var bsonCollection = db.GetCollection<BsonDocument>("facilities");
            sw.Restart();
            var bsonValues = await (await bsonCollection
                .FindAsync(new BsonDocumentFilterDefinition<BsonDocument>(new BsonDocument()))).ToListAsync();
            sw.Stop();
            Console.WriteLine(bsonValues.Count + " BSON values took " + sw.ElapsedMilliseconds);

            //RawBson
            var rawCollection = db.GetCollection<RawBsonDocument>("facilities");
            sw.Restart();
            var rawValues = await (await rawCollection
                .FindAsync(new BsonDocumentFilterDefinition<RawBsonDocument>(new BsonDocument()))).ToListAsync();
            sw.Stop();
            Console.WriteLine(rawValues.Count + " raw values took " + sw.ElapsedMilliseconds);

            sw.Restart();
            foreach (var doc in rawValues)
            {
                foreach (var field in doc.Elements)
                {
                    switch (field.Name)
                    {
                        case "_id":
                            var id = field.Value;
                            break;
                        default:
                            foreach (var elem in (RawBsonArray)field.Value)
                            {
                                var value = elem.AsBsonValue;
                            }
                            break;
                    }
                }
            }
            sw.Stop();
            Console.WriteLine($"Iterating the raw BSON took {sw.ElapsedMilliseconds} ms.");

            //Typed
            sw.Restart();
            var typedValues = await (await typedCollection.FindAsync(new BsonDocument())).ToListAsync();
            sw.Stop();
            Console.WriteLine(typedValues.Count + " typed values took " + sw.ElapsedMilliseconds);
        }

        private static IEnumerable<Facility> GetFacilities()
        {
            var parameter = Enumerable.Range(0, 150).Select(_ => Guid.NewGuid()).ToList();
            var attributes = Enumerable.Range(0, 500).Select(_ => Guid.NewGuid()).ToList();
            for (var i = 0; i < 5000; i++)
                yield return new Facility
                {
                    Id = Guid.NewGuid(),
                    ParametersValues = parameter.Select(p => new ParameterValue
                    {
                        Id = Guid.NewGuid(),
                        ParameterId = p,
                        DataSourceId = new CustomDataSourceId {Id = Guid.NewGuid()}
                    }).ToList(),
                    AttributesValues =
                        i % 3 == 0
                            ? new List<AttributeValue>(attributes.Take(499).Select(a =>
                                new FacilityAttributeStringValue
                                    {Id = Guid.NewGuid(), AttributeId = a, Value = a.ToString()}).Append(Country))
                            : new List<AttributeValue>(attributes.Select(a =>
                                new FacilityAttributeStringValue
                                    {Id = Guid.NewGuid(), AttributeId = a, Value = a.ToString()}))
                };
        }
    }
}
Those changes resulted in these benchmarks:
Created 5000 facilities in 3229 ms.
5000 BSON values took 16468
5000 raw values took 1014
Iterating the raw BSON took 3627 ms.
5000 typed values took 16698
So while the total time of about 16.7 seconds is longer than we’d like, it’s only twice as slow as the component parts, not 16x slower.
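For reference, the component parts from the benchmark above are:
- Creating the 5000 facilities: about 3.2 seconds
- Retrieving the raw BSON: about 1.0 second
- Iterating the raw BSON: about 3.6 seconds

which means best case scenario = 7.8 seconds without the class maps or other deserialization infrastructure involved.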
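The 7.8 second figure lines up with the sum of three timings in the benchmark output above: object creation, raw retrieval, and raw iteration. Treating those three as the "component parts" is an assumption based on the numbers, not something stated explicitly in the thread. A quick sketch of the arithmetic behind that figure and the "only twice as slow" claim:

```csharp
using System;

// Timings in milliseconds, copied from the benchmark output above.
long createObjects = 3229;  // "Created 5000 facilities"
long fetchRaw = 1014;       // "5000 raw values took"
long iterateRaw = 3627;     // "Iterating the raw BSON took"
long typedTotal = 16698;    // "5000 typed values took"

// Assumed best case: the sum of the parts, with no class maps involved.
long components = createObjects + fetchRaw + iterateRaw; // 7870 ms
double ratio = (double)typedTotal / components;

Console.WriteLine($"Component parts: {components} ms");
Console.WriteLine($"Typed query vs. component parts: {ratio:F2}x");
```

The ratio comes out a little above 2, which matches the "twice as slow" description.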
I hope this helps better explain why you may be experiencing this expected delay with the objects you have!