PyMongo cursor optimization because it is very slow

It’s a topic that keeps coming up, but it’s still not settled for me.

If I have a DB with an average document size of 1.6 KB and 100,000 documents, the cursor needs about 50 ms to load with PyMongo, but by the time I have saved everything into a list, over 2 seconds have passed.

How can I optimize this?

code:

document = list(col.find({},{"_id": 0, "id": 0}).limit(100000))
takes 2734 ms on average

document = col.find({},{"_id": 0, "id": 0}).explain("executionStats")
tells me it took 54 ms
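
A minimal sketch of how that wall-clock number can be measured (the connection string and database/collection names below are placeholders, not taken from my actual setup):

import time
from pymongo import MongoClient

# Placeholder connection details; adjust to the actual setup.
client = MongoClient("mongodb://localhost:27017")
col = client["test_db"]["test_collection"]

start = time.perf_counter()
documents = list(col.find({}, {"_id": 0, "id": 0}).limit(100000))
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"{len(documents)} documents in {elapsed_ms:.0f} ms")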

python -m cProfile -s time .\test_call.py shows:

         409630 function calls (407971 primitive calls) in 2.813 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     4904    2.368    0.000    2.368    0.000 {method 'recv_into' of '_socket.socket' objects}
       12    0.132    0.011    0.132    0.011 {built-in method bson._cbson._decode_all}
       24    0.059    0.002    2.437    0.102 network.py:287(_receive_data_on_socket)
   100001    0.035    0.000    2.646    0.000 cursor.py:1244(next)
       95    0.025    0.000    0.025    0.000 {built-in method io.open_code}
      486    0.023    0.000    0.023    0.000 {built-in method nt.stat}
       12    0.014    0.001    2.597    0.216 cursor.py:1128(_refresh)
        1    0.013    0.013    2.813    2.813 test_call.py:1(<module>)
       10    0.011    0.001    0.012    0.001 {built-in method _imp.create_dynamic}
        2    0.009    0.004    0.009    0.004 {method 'read' of '_io.TextIOWrapper' objects}
106150/106041    0.008    0.000    0.008    0.000 {built-in method builtins.len}
       95    0.007    0.000    0.007    0.000 {built-in method marshal.loads}
   100011    0.007    0.000    0.007    0.000 {method 'popleft' of 'collections.deque' objects}
     1302    0.006    0.000    0.009    0.000 <frozen importlib._bootstrap_external>:96(_path_join)
       83    0.006    0.000    0.006    0.000 {method 'acquire' of '_thread.lock' objects}
        2    0.006    0.003    0.006    0.003 {built-in method _winapi.CreateProcess}
     4904    0.004    0.000    0.004    0.000 network.py:250(wait_for_read)
     4995    0.004    0.000    0.006    0.000 _csot.py:29(get_timeout)
      306    0.003    0.000    0.015    0.000 {built-in method builtins.__build_class__}
       95    0.003    0.000    0.003    0.000 {method 'read' of '_io.BufferedReader' objects}
      266    0.002    0.000    0.031    0.000 <frozen importlib._bootstrap_external>:1536(find_spec)
       11    0.002    0.000    2.578    0.234 cursor.py:1037(__send_message)

As you wrote, the server took 54 ms to determine which documents to send.

The cProfile output seems to indicate that most of the time is spent reading data from the network (recv_into alone accounts for 2.368 s of the 2.813 s total).

I may be wrong, but 100K documents × 1.6 KB is roughly 164 MB. At 100 MB/s, that is about 1.6 s to get the data over.
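
As a back-of-the-envelope check (assuming 1.6 KB means 1.6 KiB per document and an effective throughput of 100 MB/s, both of which are assumptions):

# Rough transfer-time estimate for the reported workload.
num_docs = 100_000
doc_size_bytes = 1.6 * 1024                  # ~1.6 KiB per document
total_mb = num_docs * doc_size_bytes / 1e6   # ~164 MB in total
throughput_mb_per_s = 100                    # assumed effective throughput
print(f"{total_mb:.0f} MB, {total_mb / throughput_mb_per_s:.1f} s")  # ~164 MB, ~1.6 s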


I wouldn’t say it takes 54 ms to send; afterwards the data still has to be fetched via the cursor.

The DB runs locally, so network latency should not play a role. Is it normal that it takes this long to load?

I did not write that it takes 54 ms to send. I wrote that it takes 54 ms to determine which documents to send.

When both the client and server are local, you are right that network delay is not a major part of the equation. But the profile above, with most of the time in recv_into, still seems to indicate a lot of time spent in the network code.

But when both the client and server are local, they both fight for the same CPU and RAM. Context switches might be the culprit. What are the specifications of your CPU, RAM and disk?

Hi steeve, sorry, I misunderstood you. Yes, you are right: 54 ms for the decision.

The computer has 16 GB RAM, a Ryzen 7 5800U, and a 500 GB Samsung SSD (Windows; the DB runs in Docker, the client does not).

I have also tried it on another PC with a Ryzen 7 5800, 24 GB RAM, and a 500 GB Samsung SSD (Linux; the DB runs in Docker, the client does not).

There I got the same results, if anything even worse.

I would try running the client on one machine and the server on the other. Network latency will come into play, but resource contention will be reduced.

Have you observed swapping? Probably not. What are the CPU, RAM, and SSD loads when running the test? Do you run anything else at the same time? When testing performance you must make sure to eliminate all interference.
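
One way to sample those loads while the test runs elsewhere, for example with psutil (a sketch; the once-per-second sampling is an arbitrary choice):

import psutil

# Sample system load once per second for ten seconds.
for _ in range(10):
    cpu = psutil.cpu_percent(interval=1)    # CPU utilisation over the last second
    ram = psutil.virtual_memory().percent   # RAM utilisation
    disk = psutil.disk_io_counters()        # cumulative disk I/O counters
    print(f"CPU {cpu:5.1f}%  RAM {ram:5.1f}%  read {disk.read_bytes}  write {disk.write_bytes}")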

When running on the other machine the cache is cold, so the storage engine first has to read all the documents from disk.

Yes, the idea is good; I’ll test it with the different devices tomorrow.

When I tested, I only had the Docker container and the program running.

Just now I observed something funny.

When I also run the client as a Docker container, the times are 10 times better.

That means on the same system, with two Docker containers, one running the database and one that fetches and stores the data, 100,000 documents at ~1.6 KB each are loaded in 240-300 ms, which is pretty good.
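
For reference, the only difference on the client side between the two runs is the connection target (a sketch; the service name mongo and the shared Docker network are assumptions):

from pymongo import MongoClient

# Client on the Windows host: traffic goes through the port published by Docker.
client_from_host = MongoClient("mongodb://localhost:27017")

# Client in a container on the same Docker network: the database container is
# reached directly by its service/container name.
client_from_container = MongoClient("mongodb://mongo:27017")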

I still can’t quite explain why it’s so much faster this way; possibly because the RAM and CPU are reserved for the containers?


This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.