Explanation for query performance on Linux between Python and C# drivers

I initially ran into a big performance difference between the C# MongoDB driver running on my local Windows 11 machine and the same code and data set running in a Linux Docker image on a Linux box.

Some difference would be expected because of the difference in performance of the hosts, but as the query returned more rows the performance became unacceptable.

To explore the performance I condensed the query code down to a Python version and an equivalent C# version, so I could compare performance on various hosts, with no Docker this time.

Here are the results as simple timed console outputs, where I run both versions on Windows and Linux

The performance drop of C# on Linux is crushing and this is the query and data set that our actual application would be using.

The collection being used is on Atlas. Because the two versions of the code are so similar, and given the various scenarios I’ve run them in, I cannot see what else could be causing the performance issue other than the C# MongoDB driver when built for Linux via Visual Studio 2022. I’m using .NET 8.0.

If anyone has any advice on something to help identify what might be going on here I’d be very happy to hear it and try it out.

Thank you

Here’s the Python version

```python
from pymongo import MongoClient
from datetime import datetime, timezone
from bson.objectid import ObjectId
import timeit
import pprint  # only needed for the commented-out explain() output below

print('hello')

CONNECTION_STRING = ''  # set your connection string here

client = None
db = None
coll = None

cartIds = [
    ObjectId("657aac82c5505fd69c69cdb8"),
    ObjectId("657aad7dc5505fd69c69cdb9"),
    ObjectId("657aaddbc5505fd69c69cdba"),
    ObjectId("657aaf0dc5505fd69c69cdbb"),
    ObjectId("657ab0dbc5505fd69c69cdbc"),
    ObjectId("657abccec5505fd69c69cdbd"),
    ObjectId("657abe3ec5505fd69c69cdbe"),
    ObjectId("657ac76bc5505fd69c69cdbf"),
    ObjectId("657ac88ec5505fd69c69cdc0"),
    ObjectId("657ac90611075a550153fd74"),
    ObjectId("65c1129dd514fb5eeb002aad"),
    ObjectId("65c15922d1666792496206ae"),
    ObjectId("65e001f8ef75fac698b5774a")
]

def startup():
    global client
    global db
    global coll

    client = MongoClient(CONNECTION_STRING)
    db = client['']    # database name removed
    coll = db['']      # collection name removed

taken = timeit.timeit(startup, number=1)
print('Startup: ' + str(taken))

count = 0

def query():
    global count

    items = coll.find(
        {
            '$and': [
                {'Eid': { '$in': cartIds }},
                {'Data.Ts': { '$gte': datetime(2024, 3, 15, tzinfo=timezone.utc) }},
                {'Data.Ts': { '$lt': datetime(2024, 4, 1, tzinfo=timezone.utc) }}
            ]
        },
        {
            'Data.Lcn':1,
            'Eid':1,
            'Data.Flags':1,
            'Data.Rnd.Diff':1,
            '_id':0
        }
    )#.explain()

    #pprint.pp(items)

    # Touch every document so the cursor is fully drained and decoded
    for item in items:
        if (item['Eid'] == ''):
            count = count + 2
        count = count + 1

# Run the query three times and time each run separately
for i in range(3):
    count = 0
    taken = timeit.timeit(query, number=1)

    print('Count: ' + str(count) + ' in ' + str(taken))
```

and for C#

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using MongoDB.Bson;
using MongoDB.Driver;

class Program
{
    private static string connectionString = ""; // Set your connection string here
    private static MongoClient client = null!;
    private static IMongoDatabase db = null!;
    private static IMongoCollection<RawBsonDocument> coll = null!;

    private static List<ObjectId> cartIds = new List<ObjectId>
    {
        new ObjectId("657aac82c5505fd69c69cdb8"),
        new ObjectId("657aad7dc5505fd69c69cdb9"),
        new ObjectId("657aaddbc5505fd69c69cdba"),
        new ObjectId("657aaf0dc5505fd69c69cdbb"),
        new ObjectId("657ab0dbc5505fd69c69cdbc"),
        new ObjectId("657abccec5505fd69c69cdbd"),
        new ObjectId("657abe3ec5505fd69c69cdbe"),
        new ObjectId("657ac76bc5505fd69c69cdbf"),
        new ObjectId("657ac88ec5505fd69c69cdc0"),
        new ObjectId("657ac90611075a550153fd74"),
        new ObjectId("65c1129dd514fb5eeb002aad"),
        new ObjectId("65c15922d1666792496206ae"),
        new ObjectId("65e001f8ef75fac698b5774a")
    };

    static void Main(string[] args)
    {
        Console.WriteLine("hello");

        Stopwatch stopwatch = Stopwatch.StartNew();
        Startup();
        Console.WriteLine($"Startup: {stopwatch.Elapsed.TotalSeconds}");

        // Run the query three times and time each run separately
        for (int i = 0; i < 3; i++)
        {
            stopwatch.Restart();
            int count = Query();
            stopwatch.Stop();

            Console.WriteLine($"Count: {count} in {stopwatch.Elapsed.TotalSeconds}");
        }
    }

    private static void Startup()
    {
        client = new MongoClient(connectionString);
        db = client.GetDatabase("");                      // database name removed
        coll = db.GetCollection<RawBsonDocument>("");     // collection name removed
    }

    private static int Query()
    {
        int count = 0;

        var filter = Builders<RawBsonDocument>.Filter.And(
            Builders<RawBsonDocument>.Filter.In("Eid", cartIds),
            Builders<RawBsonDocument>.Filter.Gte("Data.Ts", new DateTime(2024, 3, 15, 0, 0, 0, DateTimeKind.Utc)),
            Builders<RawBsonDocument>.Filter.Lt("Data.Ts", new DateTime(2024, 4, 1, 0, 0, 0, DateTimeKind.Utc))
        );

        var projection = Builders<RawBsonDocument>.Projection.Include("Data.Lcn")
                                                        .Include("Eid")
                                                        .Include("Data.Flags")
                                                        .Include("Data.Rnd.Diff")
                                                        .Exclude("_id");
        var items = coll.Find(filter).Project(projection).ToEnumerable();

        foreach (var item in items)
        {
            // Just a redundant check to make sure each document is actually materialized
            if (item.GetValue("Eid").Equals(""))
            {
                count += 2;
            }
            count++;
        }

        return count;
    }
}
```

Hi, @Chris_N_A3,

Thank you for taking the time to reach out with your performance results and provide a self-contained repro in both C# and Python. 10x slower performance on Linux is not expected. While there are some necessary code differences between Linux and Windows, these are kept to an absolute minimum. Since you’re running .NET 8 on both Windows and Linux, you would be using the same .NET Standard 2.1 assemblies from our NuGet packages. Additionally the Python version (on Ubuntu and Windows) has roughly similar performance to C# on Windows. It is C# on Ubuntu that is 10x slower.

We are going to try to reproduce the results that you are seeing and investigate why the C# app on Linux is 10x slower. We will keep you apprised of our investigations and let you know if we require any additional information. Thanks again for reporting this issue.

Sincerely,
James

Hi, @Chris_N_A3,

I ran the repro on a VM running Ubuntu 20.04 and didn’t observe the 10x perf difference reported.

hello
Startup: 0.002088544000116599
Count: 200000 in 0.8493798950000837
Count: 200000 in 0.8191589279999789
Count: 200000 in 0.8060957580000832
(.venv) james@jameskovacs-ubuntu:~/code/Net80LinuxPerf$ dotnet build && dotnet run
MSBuild version 17.9.6+a4ecab324 for .NET
  Determining projects to restore...
  All projects are up-to-date for restore.
  Net80LinuxPerf -> /home/james/code/Net80LinuxPerf/bin/Debug/net8.0/Net80LinuxPerf.dll

Build succeeded.
    0 Warning(s)
    0 Error(s)

Time Elapsed 00:00:01.14
hello
Startup: 0.1510909
Count: 200000 in 1.6469108
Count: 200000 in 1.2501779
Count: 200000 in 1.2015398

Runtime information:
Ubuntu 20.04 x64 (Intel Core i9 8-core 2.4GHz)
.NET 8.0
Python 3.8.10

I populated the database with 200K documents as follows:

db = db.getSiblingDB("perf");
db.data.drop();
for (const i of Array(200000).keys())
    db.data.insertOne({Eid: new ObjectId("657aac82c5505fd69c69cdb8"), Data: {Lcn: 42, Flags: "Y", Rnd: {Diff: 99}, Ts: ISODate("2024-03-31")}});
db.data.createIndex({Eid:1});

While there is a performance difference between pymongo and the .NET/C# Driver, it is on the order of 50%, with pymongo completing a run in ~0.8 seconds and C# in ~1.2 seconds. I haven’t checked whether this scales linearly with document count. It is not the 10x difference that you observed on Ubuntu in your tests.

In order to ensure that we are investigating the same problem, please try re-running your test with the test data noted above to see if that makes a difference in the results. Any additional information regarding Ubuntu version, MongoDB version, Python version, hardware, document schema (with dummy data), indices, etc. would be helpful in investigating further. I would encourage you to create a ticket at https://jira.mongodb.org/browse/CSHARP with this information, as it will be easier to track and collaborate further.

Sincerely,
James

Hi James

I’ll try the data you provided and, if required, raise a ticket on Jira. However, I have found the cause of the problem, though not what the best way to fix it would be.

It appears to be these two lines of code within the driver

If I comment those lines out my code now runs faster than python on my Linux host.

On my Linux environment the buffer sizes read from the socket options are both 87380 bytes. Even if I try to enforce that value via the settings of the driver, it still affects the performance detrimentally. Please note this from the manual:

SO_SNDBUF Sets or gets the maximum socket send buffer in bytes. The kernel doubles this value (to allow space for bookkeeping overhead) when it is set using setsockopt(2) and this doubled value is returned by getsockopt(2). The default value is set by the /proc/sys/net/core/wmem_default file and the maximum allowed value is set by the /proc/sys/net/core/wmem_max file. The minimum (doubled) value for this option is 2048.

At the moment the only way I can avoid the performance drop is to comment those lines out. If I set the settings values so that when doubled they equal 87380, it is still slow.
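The doubling described in the manual excerpt can be observed directly with a few lines of Python. This is only a minimal sketch: the helper name `set_and_read_rcvbuf` and the 64 KiB request are illustrative assumptions, not code from the driver. One possible explanation for the slowdown, which is not confirmed anywhere in this thread, is that Linux stops auto-tuning a TCP socket's receive buffer once SO_RCVBUF has been set explicitly, so even writing back the value that was just read could change behavior.

```python
import socket


def set_and_read_rcvbuf(requested):
    """Return (before, after) SO_RCVBUF values around an explicit setsockopt.

    'requested' is a hypothetical size in bytes chosen by the caller.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # Value the OS assigned by default to a fresh TCP socket
        before = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
        # Explicitly request a size, as a driver might do from its settings
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested)
        # Value the OS reports back after the explicit request
        after = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    return before, after


if __name__ == "__main__":
    before, after = set_and_read_rcvbuf(64 * 1024)
    print("default:", before, "after setting 65536:", after)
```

On Linux the second number is typically twice the requested size (subject to the system maximum), while other operating systems may report the requested value unchanged.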

I have two Linux environments where this is happening but I’ve only tested my change on one of them. The environments are from AWS EC2 and Vultr.

For Vultr where I’m testing it is

Welcome to Ubuntu 22.04.4 LTS (GNU/Linux 5.15.0-101-generic x86_64)

and for AWS EC2 where I have not yet tried my fix

Welcome to Ubuntu 22.04.3 LTS (GNU/Linux 6.2.0-1016-aws x86_64)

Thank you for the mongo code. I did try it quickly: my Windows-built code runs fine, but the Linux version is slow unless I put in my fix. As a note, to build .NET for Linux I’m using Publish from Visual Studio 2022 and creating a self-contained app.

I’m using Atlas as the server, and the latest mongo driver.

Regards, Chris

Hi, @Chris_N_A3,

This is an excellent root cause analysis. Very interesting about SO_SNDBUF and SO_RCVBUF . According to the API docs for Socket.ReceiveBufferSize, the default is 8K.

/// <summary>Gets or sets a value that specifies the size of the receive buffer of the <see cref="T:System.Net.Sockets.Socket" />.</summary>
/// <exception cref="T:System.Net.Sockets.SocketException">An error occurred when attempting to access the socket.</exception>
/// <exception cref="T:System.ObjectDisposedException">The <see cref="T:System.Net.Sockets.Socket" /> has been closed.</exception>
/// <exception cref="T:System.ArgumentOutOfRangeException">The value specified for a set operation is less than 0.</exception>
/// <returns>An <see cref="T:System.Int32" /> that contains the size, in bytes, of the receive buffer. The default is 8192.</returns>
public int ReceiveBufferSize { get; set; }

According to MSDN docs for Socket.ReceiveBufferSize, the size is OS dependent:

An Int32 that contains the size, in bytes, of the receive buffer. The default value depends on the operating system.

The same is true for Socket.SendBufferSize where API docs say 8K and MSDN says OS dependent.

In the .NET/C# Driver, we “increase” the TCP send and receive buffer sizes to 64KB in both MongoDefaults and the TcpStreamSettings constructor. Either the original documentation was incorrect regarding the 8K default or operating systems have adjusted their defaults upwards. macOS Sonoma and Ubuntu 20.04 use a 128KB default for both. (No rhyme or reason why I chose those particular OSes other than I happened to have them handy.)

You mentioned that adjusting the defaults in the driver still resulted in slow performance. Please confirm that you used code similar to the following:

var settings = MongoClientSettings.FromConnectionString(MONGODB_URI);
settings.ClusterConfigurator = 
    cc =>
        cc.ConfigureTcp(tcp => tcp.With(receiveBufferSize: 128 * 1024, sendBufferSize: 128 * 1024));
var client = new MongoClient(settings);

Given that you are reading ~158K documents back, I suspect that your performance will be affected most by SO_RCVBUF. I notice that you’re projecting out the following fields: { _id:0, Eid:1, Data.Flags:1, Data.Lcn:1, Data.Rnd.Diff:1}. I can tell that Eid is an ObjectId, but what are the other data types (and sizes if they’re arrays or strings)? I’m trying to figure out how many 16MB batches the results would be split into. Even if it fit into a single 16MB batch, that batch would be transmitted as multiple TCP packets which would be affected by SO_RCVBUF.
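The batch question above can be roughed out with some arithmetic. The sketch below estimates the BSON size of one projected document from fixed-width field sizes, then scales it to ~158K documents. It rests on assumptions: the field types (ObjectId for Eid, doubles for Lat and Lng, a 32-bit integer for Diff, a 64-bit integer for Flags) are inferred from the sample document shared in this thread, and the estimate ignores wire-protocol headers and the per-element overhead of the reply's batch array. The names `bson_size`, `PROJECTED`, and `estimate` are illustrative only.

```python
import math

# Fixed-width BSON value sizes in bytes
OBJECT_ID, DOUBLE, INT32, INT64 = 12, 8, 4, 8


def bson_size(doc):
    """Size in bytes of a BSON document whose leaves are given as value sizes."""
    size = 4 + 1  # int32 length prefix + trailing NUL terminator
    for name, value in doc.items():
        size += 1 + len(name.encode("utf-8")) + 1  # type byte + NUL-terminated key
        size += bson_size(value) if isinstance(value, dict) else value
    return size


# Shape of one projected document (assumed types, see the note above)
PROJECTED = {
    "Eid": OBJECT_ID,
    "Data": {
        "Lcn": {"Lat": DOUBLE, "Lng": DOUBLE},
        "Rnd": {"Diff": INT32},
        "Flags": INT64,
    },
}

DOC_COUNT = 158_000               # approximate result count mentioned above
BATCH_LIMIT = 16 * 1024 * 1024    # 16 MiB reply size limit
SOCKET_BUFFER = 64 * 1024         # 64 KiB, the buffer size the driver sets


def estimate(doc_count=DOC_COUNT):
    """Return (bytes per doc, total bytes, 16 MiB batches, 64 KiB buffer fills)."""
    per_doc = bson_size(PROJECTED)
    total = per_doc * doc_count
    return (per_doc, total,
            math.ceil(total / BATCH_LIMIT),
            math.ceil(total / SOCKET_BUFFER))


if __name__ == "__main__":
    print(estimate())
```

Under these assumptions each projected document is about 104 bytes, so the whole result is roughly 16.4 MB: close to a single 16 MiB batch, but a couple of hundred times larger than a 64 KiB socket buffer.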

We are going to try to reproduce the behaviour by retrieving larger payloads from an Atlas cluster. (My initial repro attempt was against a local cluster.) Knowing your approximate data sizes of the projected documents would be helpful in our investigation. It would also be useful to know the minimum/default/maximum values for SO_RCVBUF and SO_SNDBUF. Here are the values from my Ubuntu 20.04 VM:

$ cat /proc/sys/net/ipv4/tcp_rmem
4096	131072	6291456
$ cat /proc/sys/net/ipv4/tcp_wmem
4096	16384	4194304
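For anyone collecting the same numbers from several hosts, the three fields in these files (minimum, default, maximum, in bytes) can be read with a short script. This is a small sketch with hypothetical helper names (`parse_tcp_mem`, `read_tcp_mem`); the file paths are the ones shown above and only exist on Linux.

```python
from pathlib import Path


def parse_tcp_mem(text):
    """Parse the three whitespace-separated integers of tcp_rmem or tcp_wmem."""
    minimum, default, maximum = (int(field) for field in text.split())
    return {"min": minimum, "default": default, "max": maximum}


def read_tcp_mem(name):
    """Read and parse /proc/sys/net/ipv4/<name> (Linux only)."""
    return parse_tcp_mem(Path("/proc/sys/net/ipv4", name).read_text())


if __name__ == "__main__":
    # Parse the values quoted above rather than touching the local system
    print(parse_tcp_mem("4096\t131072\t6291456"))
    print(parse_tcp_mem("4096\t16384\t4194304"))
```

Calling `read_tcp_mem("tcp_rmem")` on a Linux host returns the live values in the same dictionary form.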

Thanks in advance for your continued collaboration with investigating this issue.

Sincerely,
James

James,

A quick reply to provide you with details.

Here’s an example doc

{
    "_id": {
        "$oid": "657aad5f3151fd4f1b87d447"
    },
    "Added": {
        "$date": "2023-12-14T07:23:11.328Z"
    },
    "Updated": {
        "$date": "2023-12-14T07:23:11.328Z"
    },
    "Removed": null,
    "Eid": {
        "$oid": "657aacfc11075a550153fd67"
    },
    "Data": {
        "Ts": {
            "$date": "2023-12-14T07:23:10.016Z"
        },
        "Names": [],
        "Lcn": {
            "Lat": 48.60299,
            "Lng": 101.98106833333335
        },
        "Rnd": {
            "Start": {
                "$date": "2023-12-13T23:48:20.728Z"
            },
            "Course": 2,
            "Hole": 1,
            "Diff": 116533,
            "Thru": 7
        },
        "Man": null,
        "Stats": null,
        "Flags": {
            "$numberLong": "15"
        },
        "Events": null,
        "Acks": []
    },
    "Writes": {
        "$numberLong": "1"
    },
    "Ins": true
}

and from my Vultr host

$ cat /proc/sys/net/ipv4/tcp_rmem
4096 87380 33554432
$ cat /proc/sys/net/ipv4/tcp_wmem
4096 87380 33554432

I’ll try your config approach. Previously I just hard-coded the values into the driver code to try it out, since the only way to not set the properties was to change the code by commenting those lines out. What I found strange was that just setting the properties, even to the same values I had read from them, still messed up the performance.

It might be nice to have a way in the config to NOT even try to set the properties, for instance if they are null in the config. If I had that I would be able to use the driver as is.

Thanks again for digging into this.

Regards, Chris

Hi, @Chris_N_A3,

Thank you for the provided information. I have created CSHARP-5030 to track this issue.

Sincerely,
James
