Hard to explain timeout errors

Hello,

We’re running a fairly small (in terms of traffic / volume) Python app on Heroku.
It uses pymongo 4.6.3 with mongoengine 0.28.2 on top to connect to an M10 Atlas cluster.

We’re facing a tough problem: episodes of queries timing out that recur regularly (at least daily). All the server metrics look healthy and show hardly any load, and Query Insights shows max latencies in the tens of ms. Yet we start seeing errors like this pop up:

pymongo.errors.ExecutionTimeout: operation would exceed time limit, remaining timeout:-1.55924 <= network round trip time:0.00538  (configured timeouts: timeoutMS: 10000.0ms, connectTimeoutMS: 3000.0ms), full error: {'ok': 0, 'errmsg': 'operation would exceed time limit, remaining timeout:-1.55924 <= network round trip time:0.00538  (configured timeouts: timeoutMS: 10000.0ms, connectTimeoutMS: 3000.0ms)', 'code': 50}
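
For what it's worth, here is a minimal sketch of how we read the numbers in that message. It assumes the "remaining timeout" is simply the timeoutMS budget minus the time already spent on the operation, and that all figures other than the "ms" ones are in seconds; both are our interpretation, not something we have confirmed. The helper name parse_timeout_message is made up for illustration.

```python
import re

# Message text copied from the ExecutionTimeout above.
MESSAGE = (
    "operation would exceed time limit, remaining timeout:-1.55924 <= "
    "network round trip time:0.00538  (configured timeouts: "
    "timeoutMS: 10000.0ms, connectTimeoutMS: 3000.0ms)"
)

# A signed decimal number such as -1.55924 or 10000.0.
_NUMBER = r"(-?\d+(?:\.\d+)?)"


def parse_timeout_message(message: str) -> dict:
    """Pull the timing figures out of the message and convert them to seconds."""
    remaining_s = float(re.search(r"remaining timeout:" + _NUMBER, message).group(1))
    rtt_s = float(re.search(r"round trip time:" + _NUMBER, message).group(1))
    # \b keeps this from matching inside "connectTimeoutMS".
    budget_ms = float(re.search(r"\btimeoutMS: " + _NUMBER + "ms", message).group(1))
    budget_s = budget_ms / 1000.0
    return {
        "budget_s": budget_s,
        "remaining_s": remaining_s,
        "rtt_s": rtt_s,
        # Assumption: remaining = budget - elapsed, so elapsed = budget - remaining.
        "elapsed_s": round(budget_s - remaining_s, 5),
    }


if __name__ == "__main__":
    for key, value in parse_timeout_message(MESSAGE).items():
        print(f"{key}: {value}")
```

If that reading is right, about 11.56 s had already passed against a 10 s budget by the time this check ran, while the measured round trip was only about 5 ms.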

which are eventually followed by

pymongo.errors.NetworkTimeout: duckbill-prod-a9f81d6-shard-00-02.j73pa.mongodb.net:27017: timed out (configured timeouts: timeoutMS: 10000.0ms, connectTimeoutMS: 3000.0ms)

and/or

pymongo.errors.WaitQueueTimeoutError: Timed out while checking out a connection from connection pool. maxPoolSize: 100, timeout: 10.0

We have had a few support cases open, but the response is always:

  • these are “network issues” (duh), and
  • set up VPC peering (we’d love to but it’s super expensive on Heroku)

Here are the MongoClient connection options for reference:

        w='majority',
        read_preference=ReadPreference.PRIMARY_PREFERRED,
        connectTimeoutMS=3_000,
        socketTimeoutMS=20_000,
        timeoutMS=10_000,
        maxIdleTimeMS=600_000,
        minPoolSize=5
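
To make the above easier to reproduce, here is a minimal sketch of those options passed to pymongo's MongoClient directly. In our app they actually go through mongoengine, so this is a simplification; the names CLIENT_OPTIONS and make_client are made up for illustration, and the URI is a placeholder.

```python
from typing import Any, Dict

# Values copied from the option list above.
CLIENT_OPTIONS: Dict[str, Any] = {
    "w": "majority",
    "connectTimeoutMS": 3_000,
    "socketTimeoutMS": 20_000,
    "timeoutMS": 10_000,
    "maxIdleTimeMS": 600_000,
    "minPoolSize": 5,
}


def make_client(uri: str):
    """Build a MongoClient with the options above (requires pymongo)."""
    # Imported lazily so the option values can be inspected without pymongo.
    from pymongo import MongoClient, ReadPreference

    return MongoClient(
        uri,
        read_preference=ReadPreference.PRIMARY_PREFERRED,
        **CLIENT_OPTIONS,
    )


if __name__ == "__main__":
    # Placeholder URI; substitute a real Atlas connection string.
    client = make_client("mongodb+srv://user:password@example.mongodb.net/")
    print(client.admin.command("ping"))
```

Note that the error messages above report only timeoutMS and connectTimeoutMS, so we are not sure socketTimeoutMS has any effect when timeoutMS is also set.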

Perhaps somebody here is also running on Heroku? Have you ever faced similar issues?
Any general / directional troubleshooting recommendations?

Thank you!