MongoDB scaling with cores flattens beyond 30 client threads

I’m new to the MongoDB project, and I’m interested in how the latest MongoDB server (v5.0.13), running in a container from Docker Hub, scales with core count. As a first step, I ran the YCSB client against a MongoDB server on a high-core-count machine (64 cores) using read-only queries (the workloadc configuration file from YCSB). Scaling tapers off after 30 client threads. At 40 threads and beyond, overall throughput drops by more than 5% relative to the 30-thread result as client threads are added.
The perf output, collected over 30 seconds, shows almost 50% of cycles spent in the kernel (native_queued_spin_lock_slowpath). I’m not able to get any symbols from MongoDB to identify the call sites while these queries execute. I would appreciate any suggestions for debugging this further.
My steps:

  1. docker pull mongo (this pulls v5.0.13)
  2. sudo docker run -d --rm --name mongo mongo:5.0.13
  3. Start the YCSB client with load and run operations:
     1. Load the database with records.
     2. Execute read-only queries with 10, 20, 30, 40, and 80 threads.

    The following is a sample scaling result (throughput relative to the 1-thread baseline):
    Threads : Relative throughput
    1: 1.00 ← baseline
    10: 9.45 (scaling with 10 threads over 1 thread)
    20: 17.65
    30: 23.60
    40: 22.17
    80: 20.51
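
To put numbers on the taper, here is a minimal sketch that computes per-thread scaling efficiency (relative throughput divided by thread count) and the drop from the best result, using only the figures listed above. The function and variable names are mine and purely illustrative.

```python
# Relative throughput from the runs above (1-thread run = 1.00).
RESULTS = {1: 1.00, 10: 9.45, 20: 17.65, 30: 23.60, 40: 22.17, 80: 20.51}


def scaling_efficiency(results):
    """Return {threads: speedup / threads}; 1.0 would be perfect linear scaling."""
    return {threads: speedup / threads for threads, speedup in results.items()}


def drop_from_peak(results):
    """Return {threads: percent below the best observed relative throughput}."""
    peak = max(results.values())
    return {
        threads: 100.0 * (peak - speedup) / peak
        for threads, speedup in results.items()
    }


if __name__ == "__main__":
    eff = scaling_efficiency(RESULTS)
    drop = drop_from_peak(RESULTS)
    print("threads  efficiency  below_peak_%")
    for threads in sorted(RESULTS):
        print(f"{threads:7d}  {eff[threads]:10.3f}  {drop[threads]:12.2f}")
```

With these figures, efficiency falls from 0.945 at 10 threads to about 0.55 at 40 threads and about 0.26 at 80 threads, and the 40- and 80-thread runs land roughly 6% and 13% below the 30-thread peak.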

The question is: “Is some kind of synchronization (even for read queries) destroying the potential throughput gains?”
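
One rough way to frame this question from the throughput numbers alone is the Karp-Flatt metric, which estimates a serial fraction e = (1/S - 1/p) / (1 - 1/p) from a measured speedup S at p workers. The sketch below treats the client thread count as p, which is an assumption on my part since it ignores client-side and network limits. If e stayed roughly constant, a fixed serial portion could explain the curve. If e grows with p, overhead that increases with concurrency (lock contention, for example) looks more likely.

```python
# Measured speedups from the post (threads -> throughput relative to 1 thread).
SPEEDUPS = {10: 9.45, 20: 17.65, 30: 23.60, 40: 22.17, 80: 20.51}


def karp_flatt(speedup, p):
    """Experimentally determined serial fraction for a speedup at p workers."""
    if p <= 1:
        raise ValueError("Karp-Flatt is only defined for p > 1")
    return (1.0 / speedup - 1.0 / p) / (1.0 - 1.0 / p)


if __name__ == "__main__":
    for p in sorted(SPEEDUPS):
        print(f"{p:3d} threads: e = {karp_flatt(SPEEDUPS[p], p):.4f}")
```

For the figures in this post, e rises steadily from roughly 0.006 at 10 threads to roughly 0.037 at 80 threads, so it is not constant. That is consistent with contention growing as threads are added, but it does not identify where the contention is.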