Hard to explain timeout errors

Hello,

We’re running a fairly small (in terms of traffic / volume) Python app on Heroku.
It uses pymongo 4.6.3 with mongoengine 0.28.2 on top to connect to an M10 Atlas cluster.

We’re facing a tough problem with regularly occurring (at least daily) episodes of queries timing out. All the server metrics are super-happy and show hardly any sweat, and Query Insights shows max latencies in the tens of ms. Yet we start seeing errors like these pop up:

pymongo.errors.ExecutionTimeout: operation would exceed time limit, remaining timeout:-1.55924 <= network round trip time:0.00538  (configured timeouts: timeoutMS: 10000.0ms, connectTimeoutMS: 3000.0ms), full error: {'ok': 0, 'errmsg': 'operation would exceed time limit, remaining timeout:-1.55924 <= network round trip time:0.00538  (configured timeouts: timeoutMS: 10000.0ms, connectTimeoutMS: 3000.0ms)', 'code': 50}

which are eventually followed by

pymongo.errors.NetworkTimeout: duckbill-prod-a9f81d6-shard-00-02.j73pa.mongodb.net:27017: timed out (configured timeouts: timeoutMS: 10000.0ms, connectTimeoutMS: 3000.0ms)

and/or

pymongo.errors.WaitQueueTimeoutError: Timed out while checking out a connection from connection pool. maxPoolSize: 100, timeout: 10.0

We had a few support cases open, but the response is always:

  • these are “network issues” (duh), and
  • set up VPC peering (we’d love to, but it’s super expensive on Heroku)

Here are the MongoClient connection options for reference:

        w='majority',
        read_preference=ReadPreference.PRIMARY_PREFERRED,
        connectTimeoutMS=3_000,
        socketTimeoutMS=20_000,
        timeoutMS=10_000,
        maxIdleTimeMS=600_000,
        minPoolSize=5

Perhaps somebody here is also running on Heroku? Have you ever faced similar issues?
Any general / directional troubleshooting recommendations?

Thank you!


These errors occur when a pymongo operation (find_one(), insert_one(), etc.) exceeds the configured timeoutMS. When they occur, do they coincide with server maintenance events such as rolling restarts or the election of a new primary?
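
To make the two figures in the ExecutionTimeout message easier to compare across incidents, here is a minimal sketch that pulls them out of the error text. The helper name parse_timeout_budget is hypothetical, the pattern only targets the exact wording quoted in the original post, and the assumption that both values are in seconds is my reading of the message rather than something confirmed in this thread. A negative "remaining timeout" would then suggest the 10 s budget was already used up before the command could be sent.

```python
import re

# Matches only the wording quoted in the original post; other PyMongo
# versions may phrase the message differently (assumption).
_BUDGET_PATTERN = re.compile(
    r"remaining timeout:(?P<remaining>-?\d+(?:\.\d+)?)"
    r" <= network round trip time:(?P<rtt>-?\d+(?:\.\d+)?)"
)


def parse_timeout_budget(message):
    """Return (remaining, rtt) as floats, or None if the text does not match."""
    match = _BUDGET_PATTERN.search(message)
    if match is None:
        return None
    return float(match.group("remaining")), float(match.group("rtt"))


# Error text copied from the original post.
sample = (
    "operation would exceed time limit, remaining timeout:-1.55924 <= "
    "network round trip time:0.00538  (configured timeouts: "
    "timeoutMS: 10000.0ms, connectTimeoutMS: 3000.0ms)"
)

remaining, rtt = parse_timeout_budget(sample)

# 10.0 is the timeoutMS from the post, expressed in seconds (assumed unit).
elapsed = 10.0 - remaining
print(f"remaining={remaining} rtt={rtt} elapsed={elapsed:.2f}")
```

Running this over the messages collected during one episode should show whether operations overshoot the budget by a little or by a lot, which may help separate a slow network from time spent waiting on the client side.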

It would be great if you could enable debug logging in pymongo so we can trace the lifecycle of one of these failing operations. Note that logging was added in pymongo 4.7.0, so you would need to upgrade from 4.6.3 first. See the "Logging" page of the PyMongo Driver documentation (MongoDB Docs).
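
As a rough sketch of what enabling that could look like with Python's standard logging module: the logger names below (pymongo.command, pymongo.connection, pymongo.serverSelection) are the ones I recall from the PyMongo logging docs, so please double-check them against the docs for the version you install. The function name and the log format are just illustrative choices.

```python
import logging

# Logger names as I recall them from the PyMongo logging docs (4.7.0+);
# treat them as an assumption and verify for your version.
PYMONGO_LOGGERS = (
    "pymongo.command",
    "pymongo.connection",
    "pymongo.serverSelection",
)


def enable_pymongo_debug_logging(level=logging.DEBUG):
    """Send PyMongo's log records to stderr with timestamps."""
    handler = logging.StreamHandler()
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
    )
    # One handler on the parent logger; child records propagate up to it.
    logging.getLogger("pymongo").addHandler(handler)
    for name in PYMONGO_LOGGERS:
        logging.getLogger(name).setLevel(level)


# Call once at application startup, before the client is created.
enable_pymongo_debug_logging()
```

Debug-level command logging can be noisy, so it may be worth turning it on only for the dyno or time window where the timeouts show up.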


Thanks for sharing. Timeout issues despite healthy metrics can be frustrating. Curious to see whether it’s a network latency issue or something to do with connection pooling.