Poor driver performance, how to optimize?

Hi,

We are comparing MongoDB vs our current SQL database solution.
One thing we noticed is that when we retrieve around 1 million documents with the C# driver, it’s super slow compared to the SQL version.

Some background information: we have a collection containing 70 million documents. One query we are currently testing retrieves documents based on a filter. It takes over 1 minute before the list is filled. In SQL, the same query on the same table takes 15 seconds to fill the same list.

Is there a way to optimize this?

Have you performed your comparison on the same hardware?

Have you performed your comparison on a cold or a hot server? Filling the cache with data from disk might be slow, so if SQL was hot and Mongo was cold, you are comparing apples and oranges.

Do you have indexes that support your query?

I think that a use-case that needs to download 1 million documents/rows should be redesigned. For example, if you download 1 million documents to do some calculations, then you should redesign so that the calculations are done on the server with the aggregation framework.
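To make that concrete, here is a minimal sketch of pushing a per-group sum to the server instead of downloading the documents; the collection handle, field names, and grouping key are all made up for illustration:

// Hypothetical example: compute a per-category total on the server
// instead of downloading 1M documents to sum them client-side.
var collection = database.GetCollection<BsonDocument>("orders"); // assumed handle

var pipeline = new[]
{
    new BsonDocument("$match", new BsonDocument("status", "active")),
    new BsonDocument("$group", new BsonDocument
    {
        { "_id", "$category" },
        { "total", new BsonDocument("$sum", "$amount") }
    })
};

var totals = collection.Aggregate<BsonDocument>(pipeline).ToList();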

Note that, being schema-less, MongoDB needs to send field names over the wire. With 1M documents you might have hit a bandwidth bottleneck. One way to optimize is to $project only the fields needed for the computation.
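For example (a sketch; the field names are placeholders), a projection that keeps only the fields the computation needs cuts both the data and the repeated field names sent over the wire:

// Fetch only two fields instead of whole documents.
var projection = Builders<BsonDocument>.Projection
    .Include("price")
    .Include("qty")
    .Exclude("_id");

var slim = collection.Find(Builders<BsonDocument>.Filter.Empty)
                     .Project(projection)
                     .ToList();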

Also, blindly copying one SQL table to one MongoDB collection fails to tap into the flexible nature of JSON documents. For example, if your rows/docs have dates, you could use the bucketing pattern and group all rows/docs with the same date into a single document. This will reduce the total number of documents and the data duplication. Your 1_000_000 docs might end up being only 100_000.
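As a sketch of what a bucketed document could look like (field names invented for illustration), all the rows sharing one date collapse into a single document:

// One document holds every reading for a given date instead of one
// document per reading.
var bucket = new BsonDocument
{
    { "date", "2023-05-01" },
    { "readings", new BsonArray
        {
            new BsonDocument { { "sensor", "A" }, { "value", 42 } },
            new BsonDocument { { "sensor", "B" }, { "value", 17 } }
        }
    }
};

collection.InsertOne(bucket);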

That is it for now. I hope others will join the conversation.

Hi Steeve,

Thanks for your insights. Let me indeed provide some more info:
Yes, the comparison is done on the same hardware, and on the exact same server, so I don’t think hot or cold servers are the issue. We do have an index that supports the query, both in SQL and in MongoDB.

I also think you’re right about the use case being bad design. In this case it was merely a benchmark.

Hopefully others will chime in as well 🙂


Hi there,
I’m using C# too, and I’m also doing a benchmark project in an isolated environment (yes, my laptop: i7, SSD).
I replicated the same dataset on MS SQL Express and MongoDB Community 7.0, around 931k records (documents), and indexed the properties the query needs in both.

In MongoDB Compass, the result is very impressive: a filtered result with nearly 20k records, and the explain plan said it took 400ms. Really cool. But in C# with the MongoDB driver, the same query condition returning the same records took nearly 10 seconds to convert into a model array.

Can anyone tell me what’s happening with the C# driver, or how I can improve the read speed?
Thanks!

Do you have the source code you’re running to do the comparison?

What SQL flavour are you using? MariaDB, MS SQL, Oracle, or something else?

/Edit: I see in the follow-up from James that he’s using SQL Express.

In the C# code, is it pulling ALL results back or just the first batch? I assume that Compass opens a cursor and gets the first batch of results back, which could be quick. You could check the network usage between the two use-cases to verify that the same amount of data is flowing.

Hi John,
Thanks for your feedback,

var collection = IntraMongoDB.GetCollection<RawBsonDocument>("br_Report");

// Build the filter: dbname in ["TMT"] and Data_year in ["2023"]
var filter = new BsonDocument
{
    { "dbname", new BsonDocument("$in", new BsonArray { "TMT" }) },
    { "Data_year", new BsonDocument("$in", new BsonArray { "2023" }) }
};

// Exclude _id from the results
var projection = Builders<RawBsonDocument>.Projection.Exclude("_id");

var rawBsonDoc = collection.Find(filter).Project<RawBsonDocument>(projection).ToList();

I’ve tested another object type, RawBsonDocument, after seeing another forum thread, and boom, the result was outstanding: it took just ~1100ms to get all the data (around 22k records), about 8~9 times faster than BsonDocument.
But all I need is my data model, so I used BsonSerializer.Deserialize to convert RawBsonDocument to my model; unfortunately, that’s very slow.
Here’s the forum thread I mentioned.
https://www.mongodb.com/community/forums/t/c-net-core-3-1-driver-2-11-1-slow-tolist-data-manifestation/8783/28

So I don’t think the bottleneck is the network, because I can fetch the data very fast (and yes, MongoDB is located on localhost).
Could it be a mapping or converter issue?

Depending on your model, could you create your own de-serialiser optimised to your model?
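Something like this, perhaps — a minimal sketch assuming a hypothetical ReportRow model with just the two fields from the earlier snippet. RawBsonDocument materializes elements lazily from the raw bytes, so hand-mapping only the needed fields should avoid deserializing everything:

public sealed class ReportRow   // hypothetical model, for illustration
{
    public string DbName { get; set; }
    public string DataYear { get; set; }
}

// Hand-map only the needed fields instead of deserializing every field.
var rows = new List<ReportRow>(rawBsonDoc.Count);
foreach (var doc in rawBsonDoc)
{
    rows.Add(new ReportRow
    {
        DbName = doc["dbname"].AsString,
        DataYear = doc["Data_year"].AsString
    });
}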

Some questions:

Do you really really need to download 22_000 documents and convert them?

Are you sure that whatever you do with that many documents cannot be done with an appropriate aggregation pipeline?

You only exclude the _id; are all the other fields needed for whatever you do with these documents? Could you include only the fields that are needed (see the sketch below)? Even in SQL, doing select * from is wrong.
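For instance, a sketch assuming a slim hypothetical PivotRow model: project only the needed fields and let the driver deserialize straight into it, so far fewer bytes pass through the serializer:

// [BsonElement] comes from MongoDB.Bson.Serialization.Attributes.
public sealed class PivotRow    // hypothetical model, for illustration
{
    [BsonElement("dbname")]
    public string DbName { get; set; }

    [BsonElement("Data_year")]
    public string DataYear { get; set; }
}

var projection = Builders<BsonDocument>.Projection
    .Include("dbname")
    .Include("Data_year")
    .Exclude("_id");

var rows = IntraMongoDB.GetCollection<BsonDocument>("br_Report")
    .Find(filter)
    .Project<PivotRow>(projection)
    .ToList();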


Good point… our model has 1700 fields, and everything we do has a projection down to just what we’re dealing with. If we tried to deserialise a full document, that would be a serious overhead when you may only need a handful.


Yes, maybe others don’t need such a huge amount of data.

But my main point is that the driver seems to need a significant amount of time to do the deserialization.

However, in my case, we need to acquire this data to perform some post-calculation (pivot analysis) within our application.
If we want the flexibility of the pivot, the data should be almost raw. Otherwise we would need to prepare 10+ kinds of aggregation pipelines.

*We’ve also grouped the data before putting it into MongoDB to reduce the dataset, but currently, these ~22k data points are all we need.

Since mongod is on localhost on a laptop, I suspect that your laptop is memory-starved at this point. I do not know much about C#, but I suspect that RawBsonDocument is how the data is received from the server. When you start converting to your model, you might need twice the amount of memory.

Do you have any metrics about CPU and RAM usage during the process?

Rather than using a high-level API like ToList(), you might want to try converting to your model in a loop, one document at a time, making sure each RawBsonDocument object is released (to free memory) before converting the next one.
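One way to approximate that (a sketch; MyModel stands in for the real model) is to stream with a cursor instead of calling ToList(), disposing each RawBsonDocument as soon as it has been converted:

using (var cursor = collection.Find(filter)
                              .Project<RawBsonDocument>(projection)
                              .ToCursor())
{
    while (cursor.MoveNext())           // one server batch at a time
    {
        foreach (var raw in cursor.Current)
        {
            using (raw)                 // RawBsonDocument holds a byte buffer
            {
                var model = BsonSerializer.Deserialize<MyModel>(raw);
                // ... feed model into the pivot calculation ...
            }
        }
    }
}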

What does a document look like? Can you give a sample one?

The most likely cause of this isn’t the driver itself; it’s more likely an issue of syntax and formatting.

Make sure that you’re not building your data model the same way you would an SQL model, because I can tell you, as someone who has built a 700TB blockchain node, that SQL doesn’t hold a candle to the speed and performance of the MongoDB C# driver.

Please post your data model and how you have it laid out, and I’m sure we can help you optimize it.

SQL will never come close to the speed of JSON/BSON NoSQL…

I’m being dead honest: there’s nothing close. And that’s from recent experience with 970 million transactions a minute across a multi-sharded cluster, in what’s already not an ideal setup.

I would encourage maybe checking out MongoDB University “MongoDB For SQL Developers” class, as that may highlight some issues you might not realize you’ve built.

This also isn’t some fanboy thing; I’m very agnostic about database administration and usage. But make sure that your use case is even relevant to NoSQL DBs, as NoSQL DBs are inherently for horizontal, not vertical, data model designs. So do make sure your use case is a fit for MongoDB, and vice versa.

Hi Brock and John,

Thanks for your advice; I’ll dive into MongoDB University.
But as you said, an RDBMS can never handle a mega dataset the way MongoDB can; perhaps I just haven’t found the right way to work with MongoDB yet.

Because my dataset contains sensitive data, please give me some time to create a similar scenario to simulate my issue. Thanks.