ServerSelectionTimeoutError errors during performance tests

Hi,
We developed a REST API using AWS Lambda. In these Lambdas we connect to MongoDB Atlas (we tested M1 to M3 instances). During a performance test that we recently ran, we noticed the following errors (reported by the Python MongoDB client). There are two types of errors:

[ERROR] ServerSelectionTimeoutError: No replica set members match selector "Primary()", Timeout: 1.0s, Topology Description: <TopologyDescription id: 65b8bcfbb1668c0d7aca80ae, topology_type: ReplicaSetNoPrimary, servers: [<ServerDescription ('aws-prod-prod-shard-00-00.rip0z.mongodb.net', 27017) server_type: RSSecondary, rtt: 0.007091070000001309>, <ServerDescription ('aws-prod-prod-shard-00-01.rip0z.mongodb.net', 27017) server_type: RSSecondary, rtt: 0.018172891799986247>, <ServerDescription ('aws-prod-prod-shard-00-02.rip0z.mongodb.net', 27017) server_type: Unknown, rtt: None, error=AutoReconnect('SSL handshake failed: aws-prod-prod-shard-00-02.rip0z.mongodb.net:27017: EOF occurred in violation of protocol (_ssl.c:1006) (configured timeouts: socketTimeoutMS: 20000.0ms, connectTimeoutMS: 20000.0ms)')>]>
Traceback (most recent call last):
  File "/var/lang/lib/python3.11/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1204, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1176, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1147, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 940, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/var/task/handler.py", line 14, in <module>
    from lib.platform.defaults import get_platform_dependent_defaults
  File "/var/task/lib/platform/defaults.py", line 36, in <module>
    from lib.access_layer.mongodb.mdb_access_layer import TTMongoDBAccessLayer
  File "/var/task/lib/access_layer/mongodb/mdb_access_layer.py", line 13, in <module>
    from lib.connections.prime_db_connection import default_mongo_client
  File "/var/task/lib/connections/prime_db_connection.py", line 69, in <module>
    default_mongo_client = _initialize_mongodb_client()
  File "/var/task/lib/connections/prime_db_connection.py", line 60, in _initialize_mongodb_client
    mongo_client.admin.command('ping')  # blocks until connected
  File "/var/task/pymongo/_csot.py", line 107, in csot_wrapper
    return func(self, *args, **kwargs)
  File "/var/task/pymongo/database.py", line 890, in command
    with self.__client._conn_for_reads(read_preference, session) as (
  File "/var/task/pymongo/mongo_client.py", line 1346, in _conn_for_reads
    server = self._select_server(read_preference, session)
  File "/var/task/pymongo/mongo_client.py", line 1303, in _select_server
    server = topology.select_server(server_selector)
  File "/var/task/pymongo/topology.py", line 302, in select_server
    server = self._select_server(selector, server_selection_timeout, address)
  File "/var/task/pymongo/topology.py", line 286, in _select_server
    servers = self.select_servers(selector, server_selection_timeout, address)
  File "/var/task/pymongo/topology.py", line 237, in select_servers
    server_descriptions = self._select_servers_loop(selector, server_timeout, address)
  File "/var/task/pymongo/topology.py", line 259, in _select_servers_loop
    raise ServerSelectionTimeoutError(

and the second type of error:

[ERROR] ServerSelectionTimeoutError: SSL handshake failed: aws-prod-prod-shard-00-02.rip0z.mongodb.net:27017: EOF occurred in violation of protocol (_ssl.c:1006) (configured timeouts: socketTimeoutMS: 20000.0ms, connectTimeoutMS: 20000.0ms), Timeout: 1.0s, Topology Description: <TopologyDescription id: 65b8bcfbdd51187a286bc39f, topology_type: ReplicaSetNoPrimary, servers: [<ServerDescription ('aws-prod-prod-shard-00-00.rip0z.mongodb.net', 27017) server_type: Unknown, rtt: None>, <ServerDescription ('aws-prod-prod-shard-00-01.rip0z.mongodb.net', 27017) server_type: Unknown, rtt: None>, <ServerDescription ('aws-prod-prod-shard-00-02.rip0z.mongodb.net', 27017) server_type: Unknown, rtt: None, error=AutoReconnect('SSL handshake failed: aws-prod-prod-shard-00-02.rip0z.mongodb.net:27017: EOF occurred in violation of protocol (_ssl.c:1006) (configured timeouts: socketTimeoutMS: 20000.0ms, connectTimeoutMS: 20000.0ms)')>]>
Traceback (most recent call last):
  File "/var/lang/lib/python3.11/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1204, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1176, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1147, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 940, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/var/task/handler.py", line 14, in <module>
    from lib.platform.defaults import get_platform_dependent_defaults
  File "/var/task/lib/platform/defaults.py", line 36, in <module>
    from lib.access_layer.mongodb.mdb_access_layer import TTMongoDBAccessLayer
  File "/var/task/lib/access_layer/mongodb/mdb_access_layer.py", line 13, in <module>
    from lib.connections.prime_db_connection import default_mongo_client
  File "/var/task/lib/connections/prime_db_connection.py", line 69, in <module>
    default_mongo_client = _initialize_mongodb_client()
  File "/var/task/lib/connections/prime_db_connection.py", line 60, in _initialize_mongodb_client
    mongo_client.admin.command('ping')  # blocks until connected
  File "/var/task/pymongo/_csot.py", line 107, in csot_wrapper
    return func(self, *args, **kwargs)
  File "/var/task/pymongo/database.py", line 890, in command
    with self.__client._conn_for_reads(read_preference, session) as (
  File "/var/task/pymongo/mongo_client.py", line 1346, in _conn_for_reads
    server = self._select_server(read_preference, session)
  File "/var/task/pymongo/mongo_client.py", line 1303, in _select_server
    server = topology.select_server(server_selector)
  File "/var/task/pymongo/topology.py", line 302, in select_server
    server = self._select_server(selector, server_selection_timeout, address)
  File "/var/task/pymongo/topology.py", line 286, in _select_server
    servers = self.select_servers(selector, server_selection_timeout, address)
  File "/var/task/pymongo/topology.py", line 237, in select_servers
    server_descriptions = self._select_servers_loop(selector, server_timeout, address)
  File "/var/task/pymongo/topology.py", line 259, in _select_servers_loop
    raise ServerSelectionTimeoutError(

Does this mean we should choose a more powerful MongoDB instance? Or do we need to change some instance settings?

See:

M0 free clusters and M2/M5 shared clusters can only have a maximum of 500 connections.

With PyMongo 4.6 this means at most 250 MongoClients can be created, i.e. at most 250 AWS Lambda instances, assuming this function is the only app connecting to this cluster (each MongoClient creates a minimum of 2 connections). The errors you are seeing can start happening once the cluster reaches the connection limit and starts rejecting new connection requests.

My advice would be to scale up your Atlas cluster or set the function's reserved concurrency to 250 or less (lower still to account for other apps connecting to the cluster).
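
The arithmetic behind that number can be sketched roughly as follows. This is only an illustration, not an Atlas or AWS API: the function name `max_reserved_concurrency` and the `reserved_for_other_apps` parameter are made up for this example, and the default of 2 connections per MongoClient is the minimum quoted above, so real usage may be higher.

```python
def max_reserved_concurrency(connection_limit, connections_per_client=2,
                             reserved_for_other_apps=0):
    """Upper bound on concurrent Lambda instances, assuming one MongoClient each."""
    # Connections left for this function after other apps take their share.
    available = connection_limit - reserved_for_other_apps
    if available <= 0:
        return 0
    # Each instance needs at least `connections_per_client` connections.
    return available // connections_per_client


print(max_reserved_concurrency(500))                               # 250
print(max_reserved_concurrency(500, reserved_for_other_apps=100))  # 200
```

Because 2 connections per client is a minimum, it is probably safer to treat the result as a ceiling and pick a lower reserved concurrency.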

How many Lambda instances are you running concurrently?

During the performance test we had even more than 500 Lambdas running concurrently. You wrote that PyMongo has a limit of 250 MongoClients, but each Lambda should be treated as a separate virtual machine, which means we have only a few connections per PyMongo instance. Maybe you meant the server's connection limits, but according to the documentation we have 1500 for M10 and 3000 for M30. I will check once again, but as I remember, during the test we had only several hundred open connections (I checked in MongoDB monitoring).

It is not clear what you tested.

Is it

or is it

Did you also have issues with M10 or M30?

My mistake, we ran the tests on M10 to M30.

What PyMongo version is your app using? I recommend upgrading to PyMongo >=4.6 to see if these issues improve. 4.6 introduced changes to lower the number of connections required when running on AWS Lambda.

PyMongo does not have a limit on the number of MongoClients that can be created. The server has a maximum number of connections that can be created. If each Lambda instance creates a MongoClient and runs a command (ping/insert/find), then each MongoClient creates 2 connections to the primary (note that with PyMongo <4.6 it’s actually 3 connections). So 500 instances should translate to at least 1000 connections to the primary. Since you say these issues happen even on M30, which supports 3000 connections, this is unlikely to be the issue (assuming you are already on >=4.6).
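
As a rough sketch of that estimate (the helper name `estimated_primary_connections` is hypothetical, and the per-client counts are the lower bounds mentioned above, not measured values):

```python
def estimated_primary_connections(instances, pymongo_version):
    """Lower-bound estimate of connections to the primary, one MongoClient per instance."""
    major, minor = (int(part) for part in pymongo_version.split(".")[:2])
    # Per the figures above: 2 connections per client on >=4.6, 3 on older versions.
    per_client = 2 if (major, minor) >= (4, 6) else 3
    return instances * per_client


for version in ("4.5.0", "4.6.1"):
    print(version, estimated_primary_connections(500, version))
# 4.5.0 1500
# 4.6.1 1000
```

With 500 instances, the older driver would already sit at the 1500-connection figure quoted for M10, while 4.6.1 leaves some room; both estimates are well under the 3000 quoted for M30.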

Another theory is that your workload is creating too many connections at once (i.e. a connection storm). To mitigate this you could try:

  • starting your instances at a slower pace, or
  • using reserved concurrency, or
  • increasing serverSelectionTimeoutMS to wait out the connection storm. Note, the serverSelectionTimeoutMS should always be less than the Lambda function timeout which defaults to 3 seconds.
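
For the last option, one way to keep serverSelectionTimeoutMS below the function timeout is sketched here. The helper name and the `safety_margin_s` parameter are assumptions made for illustration; only `serverSelectionTimeoutMS` itself is an actual MongoClient option, and the 3 second default is the Lambda timeout mentioned above.

```python
def server_selection_timeout_ms(lambda_timeout_s=3.0, safety_margin_s=0.5):
    """Pick a serverSelectionTimeoutMS that stays below the Lambda function timeout."""
    budget_s = lambda_timeout_s - safety_margin_s
    if budget_s <= 0:
        raise ValueError("safety margin must be smaller than the Lambda timeout")
    return int(budget_s * 1000)


options = {"serverSelectionTimeoutMS": server_selection_timeout_ms()}
print(options)  # {'serverSelectionTimeoutMS': 2500}

# With PyMongo installed, the options could then be passed along, e.g.:
#   client = MongoClient(uri, **options)
```

The tracebacks above report "Timeout: 1.0s", so the current setting appears to be 1000 ms; raising it only helps if the Lambda timeout leaves enough room.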

Also, the mongo_client.admin.command('ping')  # blocks until connected line can probably be removed to improve startup time. The extra ping command adds latency at startup.

We use 4.6.1. We certainly produce connection storms, because from time to time users produce a request storm, which triggers the creation of Lambda instances and new connections.

I do not understand what you mean. Should we add a ping to our code?

I was referring to this:

  File "/var/task/lib/connections/prime_db_connection.py", line 69, in <module>
    default_mongo_client = _initialize_mongodb_client()
  File "/var/task/lib/connections/prime_db_connection.py", line 60, in _initialize_mongodb_client
    mongo_client.admin.command('ping')  # blocks until connected

The “ping” call should be removed because it does nothing besides add latency to your app.

Did you find a solution here?
