C# .NET Core 3.1 - Driver (2.11.1) - Slow ToList() data manifestation

Hi,

Could you have a look at the repo?
Is there anything I can clarify?

Thanks!

Hi @Joao_Passos!

I have cloned your repo and have been trying to debug the situation. I was also able to reproduce your issue, so I’ve pulled in a few of the driver engineers to see if they can help me debug further!

I’ll continue to post here when I have any meaningful updates. In the meantime, thank you for your patience while we figure this out!

Hi! Thanks for all your analysis @yo_adrienne

In my case, after many hours of research and work, we concluded that toList() was not what was wrong in our application. It was a little bit of everything: apart from the potential bandwidth delays we certainly had, we were retrieving a lot of data from the database (around 13MB in one particular scenario), and that was the bottleneck. We first had to change our model a little by implementing the bucket pattern. Then we also changed our aggregate queries, setting up the lookups properly and using the needed projections there as well.
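For anyone wondering what "using the needed projections" can look like in the C# driver, here is a minimal sketch for a plain find query. The field names ("name", "country") are hypothetical and not taken from any schema in this thread, and the actual queries discussed here were aggregations with lookups, so treat this only as an illustration of the general idea of asking the server for fewer fields.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using MongoDB.Bson;
using MongoDB.Driver;

public static class ProjectionSketch
{
    // Hypothetical field names; replace them with the fields the caller really needs.
    public static ProjectionDefinition<BsonDocument> OnlyNeededFields() =>
        Builders<BsonDocument>.Projection
            .Include("name")
            .Include("country")
            .Exclude("_id");

    // Returns only the projected fields, so less data crosses the wire
    // and less data has to be deserialized on the client.
    public static Task<List<BsonDocument>> LoadAsync(IMongoCollection<BsonDocument> collection) =>
        collection
            .Find(FilterDefinition<BsonDocument>.Empty)
            .Project<BsonDocument>(OnlyNeededFields())
            .ToListAsync();
}
```

Calling `await ProjectionSketch.LoadAsync(db.GetCollection<BsonDocument>("facilities"))` would then return documents containing only the included fields.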

After all those changes, it now works as expected.

Thanks a lot!

That is so awesome to hear @Sergio_Fiorillo! Thanks for coming back and posting about it as it helps others as well :)

@Joao_Passos an update!

TL;DR: Most of the performance hit is due to dealing with the complexity of the Facility objects.

Longer version:

After spending quite some time with the .NET driver team, we’ve found the following:

  • Instantiation of 5000 Facility objects alone takes about 2 seconds. This is pure C# without the driver involved at all.
  • Comparing the performance of RawBsonDocument to Facility objects isn’t really an equal or fair comparison (more on that below).
  • Taking the sample repo you’ve shared with us, most of the time is spent traversing 5000 Facility objects with 150 parameters and 500 attributes each; that’s about 4.75 million in-memory objects!

Why comparing the performance of RawBsonDocument to Facility objects is not an equal comparison, as explained by one of our Sr. .NET driver engineers:

Comparing the performance of RawBsonDocument to Facility objects isn’t exactly a fair comparison as we wrap the raw payload in an object but don’t iterate it or deserialize it into objects. So we are comparing time-on-the-wire (RawBsonDocument case) to time-on-the-wire plus deserialization overhead. That said, retrieving the raw BSON bytes and wrapping them in a collection of RawBsonDocuments takes 897ms but deserializing them into C# objects takes ~14 seconds.

If we deserialize into BsonDocument (rather than RawBsonDocument) then we are avoiding the mapping overhead of converting raw BSON bytes into C# objects, but are still reading the bytes to convert into a series of nested BSON documents. Deserializing to BsonDocument takes 12s.
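To make the difference between wrapping and deserializing more concrete, here is a small sketch that needs only the MongoDB.Bson package and no running server. The sample document and the class and method names are made up for illustration; the point is that RawBsonDocument just holds on to the bytes and reads an element only when asked for it, while BsonSerializer.Deserialize builds the whole BsonDocument tree up front.

```csharp
using MongoDB.Bson;
using MongoDB.Bson.Serialization;

public static class RawVersusParsed
{
    // Hypothetical stand-in for one stored document, serialized to BSON bytes.
    public static byte[] SampleBytes()
    {
        var document = new BsonDocument
        {
            { "_id", 1 },
            { "values", new BsonArray { "a", "b", "c" } }
        };
        return document.ToBson();
    }

    // Wrapping: the bytes are kept as-is; an element is read only when it is accessed.
    public static int CountViaRaw(byte[] bytes)
    {
        using (var raw = new RawBsonDocument(bytes))
        {
            return raw["values"].AsBsonArray.Count;
        }
    }

    // Deserializing: every byte is read and the full BsonDocument tree is built first.
    public static int CountViaParsed(byte[] bytes)
    {
        var parsed = BsonSerializer.Deserialize<BsonDocument>(bytes);
        return parsed["values"].AsBsonArray.Count;
    }
}
```

Both methods return the same count for the same bytes; what differs is when, and how much of, the payload gets read.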

As a final test, we also modified just your Program.cs file from your repo to see if any performance increases could be made. We were able to get some performance increases, but in a “quick and dirty, but gets results done” kind of way:


using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;
using MongoDB.Bson;
using MongoDB.Bson.Serialization;
using MongoDB.Driver;
namespace mongodb
{
	internal class Program
	{
		private static readonly FacilityAttributeStringValue Country = new()
			{Id = Guid.NewGuid(), AttributeId = Guid.NewGuid(), Value = "Spain"};
		private static async Task Main(string[] args)
		{
			BsonClassMap.RegisterClassMap<FacilityAttributeDateTimeValue>();
			BsonClassMap.RegisterClassMap<FacilityAttributeStringValue>();
			BsonClassMap.RegisterClassMap<FacilityAttributeDecimalValue>();
			BsonClassMap.RegisterClassMap<MonitoredDataSourceId>();
			BsonClassMap.RegisterClassMap<CustomDataSourceId>();
			var client =
				new MongoClient(
					"mongodb://localhost:27017/?readPreference=primary&appname=mongodb-vscode%200.4.1&ssl=false");
			await client.DropDatabaseAsync("mongo-benchmark");
			var db = client.GetDatabase("mongo-benchmark");
			var typedCollection = db.GetCollection<Facility>("facilities");
			//Insert 5000 facilities with 150 parameter and 500 attributes each
			var sw = Stopwatch.StartNew();
			var facilities = GetFacilities().ToList();
			sw.Stop();
			Console.WriteLine($"Created 5000 facilities in {sw.ElapsedMilliseconds} ms.");
			await typedCollection.InsertManyAsync(facilities);
			// BsonDocument: every document is fully parsed into a BsonDocument tree (no class mapping).
			var bsonCollection = db.GetCollection<BsonDocument>("facilities");
			sw.Restart();
			var bsonValues = await (await bsonCollection
				.FindAsync(new BsonDocumentFilterDefinition<BsonDocument>(new BsonDocument()))).ToListAsync();
			sw.Stop();
			Console.WriteLine(bsonValues.Count + " BSON values took " + sw.ElapsedMilliseconds);
			// RawBsonDocument: the raw bytes are only wrapped; nothing is parsed until it is accessed.
			var rawCollection = db.GetCollection<RawBsonDocument>("facilities");
			sw.Restart();
			var rawValues = await (await rawCollection
				.FindAsync(new BsonDocumentFilterDefinition<RawBsonDocument>(new BsonDocument()))).ToListAsync();
			sw.Stop();
			Console.WriteLine(rawValues.Count + " raw values took " + sw.ElapsedMilliseconds);
			// Walk every element of the raw documents so that the bytes actually get read.
			sw.Restart();
			foreach (var doc in rawValues)
			{
				foreach (var field in doc.Elements)
				{
					switch (field.Name)
					{
						case "_id":
							var id = field.Value;
							break;
						default:
							// Every other top-level field is an array in this data set.
							foreach (var elem in (RawBsonArray)field.Value)
							{
								var value = elem.AsBsonValue;
							}
							break;
					}
				}
			}
			sw.Stop();
			Console.WriteLine($"Iterating the raw BSON took {sw.ElapsedMilliseconds} ms.");
			//Typed
			sw.Restart();
			var typedValues = await (await typedCollection.FindAsync(new BsonDocument())).ToListAsync();
			sw.Stop();
			Console.WriteLine(typedValues.Count + " typed values took " + sw.ElapsedMilliseconds);
		}
		private static IEnumerable<Facility> GetFacilities()
		{
			var parameter = Enumerable.Range(0, 150).Select(_ => Guid.NewGuid()).ToList();
			var attributes = Enumerable.Range(0, 500).Select(_ => Guid.NewGuid()).ToList();
			for (var i = 0; i < 5000; i++)
				yield return new Facility
				{
					Id = Guid.NewGuid(),
					ParametersValues = parameter.Select(p => new ParameterValue
					{
						Id = Guid.NewGuid(),
						ParameterId = p,
						DataSourceId = new CustomDataSourceId {Id = Guid.NewGuid()}
					}).ToList(),
					AttributesValues =
						i % 3 == 0
							? new List<AttributeValue>(attributes.Take(499).Select(a =>
								new FacilityAttributeStringValue
									{Id = Guid.NewGuid(), AttributeId = a, Value = a.ToString()}).Append(Country))
							: new List<AttributeValue>(attributes.Select(a =>
								new FacilityAttributeStringValue
									{Id = Guid.NewGuid(), AttributeId = a, Value = a.ToString()}))
				};
		}
	}
}

Those changes resulted in these benchmarks:

Created 5000 facilities in 3229 ms.
5000 BSON values took 16468
5000 raw values took 1014
Iterating the raw BSON took 3627 ms.
5000 typed values took 16698

So while the total time of about 16.7 seconds is longer than we’d like, it’s only twice as slow as the component parts, not 16x slower.

For reference:

  • Retrieving raw BSON: ~1 second
  • Traversing raw BSON: 3.6 seconds
  • Instantiating all C# objects: 3.2 seconds

which means the best-case scenario is about 7.8 seconds, without the class maps or other deserialization infrastructure involved.
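For anyone who wants to redo this arithmetic with their own measurements, here is a small sketch that plugs in the exact millisecond values from the benchmark output above. The class and method names are just illustrative. With the unrounded figures the sum comes out at 7870 ms, slightly above the 7.8 seconds obtained from the rounded components, and the typed read is roughly 2.1 times that.

```csharp
using System;
using System.Globalization;

public static class BenchmarkBreakdown
{
    // Best case: fetch the raw bytes, walk them once, and instantiate the C# objects,
    // with no class maps or other deserialization infrastructure in between.
    public static double BestCaseMs(double retrieveRawMs, double traverseRawMs, double instantiateMs) =>
        retrieveRawMs + traverseRawMs + instantiateMs;

    public static void Main()
    {
        double bestCaseMs = BestCaseMs(1014, 3627, 3229); // values from the benchmark output
        double typedMs = 16698;                           // "5000 typed values took 16698"

        Console.WriteLine(string.Format(CultureInfo.InvariantCulture,
            "Best case: {0:F0} ms", bestCaseMs));
        Console.WriteLine(string.Format(CultureInfo.InvariantCulture,
            "Typed read / best case: {0:F1}x", typedMs / bestCaseMs));
    }
}
```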

I hope this helps better explain why you may be experiencing this expected delay with the objects you have!
