Error in Topology: ReplicaSetNoPrimary - AWS Lambda

francisco_Innocenti · May 23, 2023, 3:28pm

What problem are you facing?
I'm having an issue connecting to MongoDB Atlas from AWS Lambda.

What driver and relevant dependency versions are you using?
Node.js 14.x
mongodb 5.5 (MongoClient)
MongoDB Atlas 6.0.6

Steps to reproduce?
This is the code I use to connect:

const { MongoClient } = require('mongodb')
const URI = process.env.MONGO_URI
let cachedDb = null
const connect = async () => {
  const payload = { success: false, data: null, message: [], errors: [] }
  try {
    if (!cachedDb) {
      // If not connected yet, establish the connection
      const client = await MongoClient.connect(URI)
      cachedDb = client.db()
      payload.success = true
      payload.message.push('Mongo connection established successfully')
      payload.data = cachedDb
    }
    return payload
  } catch (error) {
    console.error('Error connecting with database:', error)
    payload.success = false
    payload.data = null
    payload.errors.push('Uncontrolled error in mongoConnection.service.connect')
    return payload
  }
}
module.exports = { connect }

Then, when I run it in AWS Lambda, serverless.com reports this error (from AWS):

2023-05-22T22:30:55.830Z b4f960ec-6c55-43d6-b6a0-b4ea59a63538 ERROR Error conecting with database: MongoServerSelectionError: Server selection timed out after 30000 ms
    at Timeout._onTimeout (/var/task/node_modules/mongodb/lib/sdam/topology.js:278:38)
    at listOnTimeout (internal/timers.js:557:17)
    at processTimers (internal/timers.js:500:7) {
  reason: TopologyDescription {
    type: 'ReplicaSetNoPrimary',
    servers: Map(3) {
      'plata-dev-shard-00-01.w4zuh.mongodb.net:27017' => [ServerDescription],
      'plata-dev-shard-00-02.w4zuh.mongodb.net:27017' => [ServerDescription],
      'plata-dev-shard-00-00.w4zuh.mongodb.net:27017' => [ServerDescription]
    },
    stale: false,
    compatible: true,
    heartbeatFrequencyMS: 10000,
    localThresholdMS: 15,
    setName: 'atlas-yu8x8e-shard-0',
    maxElectionId: null,
    maxSetVersion: null,
    commonWireVersion: 0,
    logicalSessionTimeoutMinutes: null
  },
  code: undefined,
  [Symbol(errorLabels)]: Set(0) {}
}

—

I have already set up peering with my VPC, allowed public connections, and tried all the tips I found on the internet, but I don’t know why it is still failing.
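
One detail of the connect function above is worth illustrating: it fills in the payload only on the first (cold) call, so a warm invocation with `cachedDb` already set returns `success: false` and `data: null`. The following is a minimal sketch of one alternative, not the poster's code. It assumes a hypothetical `createCachedConnector` helper that memoizes the connection promise, with `connectFn` injected so the sketch runs without a database:

```javascript
// Hypothetical helper (not part of the mongodb driver): memoizes the promise
// returned by connectFn so warm and concurrent invocations share one connection.
function createCachedConnector(connectFn) {
  let cachedPromise = null

  return async function connect() {
    const payload = { success: false, data: null, message: [], errors: [] }
    if (!cachedPromise) {
      // Cache the promise itself, so parallel callers do not each connect.
      cachedPromise = Promise.resolve().then(connectFn)
    }
    const attempt = cachedPromise
    try {
      payload.data = await attempt
      payload.success = true
      payload.message.push('Mongo connection established successfully')
    } catch (error) {
      // Drop the failed promise (if still current) so the next call can retry.
      if (cachedPromise === attempt) cachedPromise = null
      payload.errors.push(`Connection failed: ${error.message}`)
    }
    return payload
  }
}

// Assumed usage with the driver (requires the mongodb package):
// const { MongoClient } = require('mongodb')
// const connect = createCachedConnector(async () => {
//   const client = await MongoClient.connect(process.env.MONGO_URI)
//   return client.db()
// })
```

Caching the promise rather than the resolved handle means two invocations that start at the same time wait on the same connection attempt, and a failed attempt is not cached.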

Jason_Tran · May 24, 2023, 4:07am

Thanks for providing those details @francisco_Innocenti.

To try to narrow down what the issue could be, please confirm / provide the following information:

1. Whether the cluster you’re connecting to is a Serverless Instance or an M10+ tier. If neither, please advise what shared tier instance the cluster is.
2. Are you able to connect to the cluster from outside of the VPC (via public internet) from a different client (e.g. your own laptop through the internet)?
3. You noted you’ve allowed public connection and VPC peering - is your main goal simply to connect to the cluster, or do you want to connect via the VPC peering connection only?
4. From within the same VPC where the Lambda instance exists, can you perform simple networking tests such as ping and telnet to the Atlas cluster nodes / hosts? More details on this here. Please advise the results.
5. Have you ever been able to connect to this cluster in the past?
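
Where telnet is not available (for example inside a Lambda function), the TCP part of the networking test can be approximated in Node.js. This is a minimal sketch using the built-in `net` module; `canReach` is a hypothetical helper name, and a successful TCP connection only shows that the port is reachable, not that TLS or authentication will succeed:

```javascript
const net = require('net')

// Hypothetical helper: resolves to true if a TCP connection to host:port
// opens within timeoutMs, and to false on timeout or connection error.
function canReach(host, port = 27017, timeoutMs = 5000) {
  return new Promise((resolve) => {
    const socket = net.createConnection({ host, port })
    const finish = (ok) => {
      socket.destroy() // always release the socket before reporting
      resolve(ok)
    }
    socket.setTimeout(timeoutMs)
    socket.once('connect', () => finish(true))
    socket.once('timeout', () => finish(false))
    socket.once('error', () => finish(false))
  })
}

// Assumed usage: hostnames are the ones listed in the error output.
// canReach('plata-dev-shard-00-00.w4zuh.mongodb.net').then(console.log)
```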

Additionally, please review the following articles which may be of use:

Regards,
Jason

francisco_Innocenti · May 24, 2023, 2:32pm

Hello @Jason_Tran ,

Sure, here are more details:

1. The cluster is an M0.
2. Yes, I am currently also testing connections from outside (all IPs allowed).
3. We use different stages: prod goes over peering, while develop/testing allows public connections. The errors show up in AWS Lambda with public connections allowed.
4. Yes, it is working.
5. Yes.

I am also following the posts about AWS Lambda and MongoDB. Some of them contradict each other, which makes it confusing to know what is right:

This one advises calling “.connect” every time the Lambda function runs, or checking for a “stored” client instance => Write A Serverless Function with AWS Lambda and MongoDB | MongoDB
This one advises creating the client with the “new” constructor, declared outside of the handler, with no need to call “.connect” => Using the Node.js MongoDB Driver with AWS Lambda | MongoDB

The second one explains the problem we are having with the “connection pool”.

After applying the second approach, we realized that some functions in our Lambda are running into issues with the “connection pool”. Digging deeper into the code, we found that we are using a loop and making queries in parallel. Could that be a problem for the “connection pool”?
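
To make the parallel-query concern concrete: a loop that starts every query at once can ask for more simultaneous connections than the pool holds. Below is a minimal sketch of capping the number of queries in flight. `mapWithConcurrency` is a hypothetical helper, not a driver API, and whether this is related to the error in this thread is not established:

```javascript
// Hypothetical helper: runs fn over items with at most `limit` calls in flight,
// returning results in the same order as the input.
async function mapWithConcurrency(items, limit, fn) {
  const results = new Array(items.length)
  let next = 0

  // Each worker pulls the next unclaimed index until the list is exhausted.
  async function worker() {
    while (next < items.length) {
      const index = next++
      results[index] = await fn(items[index], index)
    }
  }

  const workerCount = Math.min(Math.max(1, limit), items.length)
  await Promise.all(Array.from({ length: workerCount }, () => worker()))
  return results
}

// Assumed usage (collection and ids are placeholders):
// const docs = await mapWithConcurrency(ids, 5, (id) => collection.findOne({ _id: id }))
```

The limit would typically be kept below the client's `maxPoolSize` option, so that one invocation cannot occupy the whole pool.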

Jason_Tran · May 24, 2023, 10:17pm

Based on your answers, it seems the cluster can be connected to, but only from other instances (as opposed to AWS Lambda). However, since you’ve also stated it’s an M0 tier cluster, it won’t be able to utilise the network peering connection, as per the Set Up a Network Peering Connection documentation:

This feature is not available for M0 free clusters, M2, and M5 clusters. To learn more, see Atlas M0 (Free Cluster), M2, and M5 Limitations.

Can you advise the output of the networking tests from the client that you tested it from that was from the same VPC as the AWS lambda instance?

It is difficult to say at this moment whether or not this would be related to the connection issue. Can you explain further here? Are you in a scenario where you’re maxing out connections?

Regards,
Jason

francisco_Innocenti · May 30, 2023, 6:40pm

I am using an M0 for testing, but we are having the same issue with an M30.

The problem is not the instance or the peering. This error started some weeks after we deployed our first Lambda.

When we deployed our first Lambda functions, everything was working well. THEN, after some weeks, these issues started to appear. There is no apparent reason and I can't find an explanation of why. SOMETIMES it connects to the M30 instance and other times NOT.

This is a serious issue that is putting our business at risk because we cannot connect to MongoDB Atlas. This is terrible!

We need to fix this AS SOON AS POSSIBLE.

I found that many other users have been having the same issue, but with NO response as to why it is happening: even the post doesn’t seem to have a solution.

Jason_Tran · May 31, 2023, 1:47am

The community forums can be a starting point for discussion on development or product questions if you do not have a paid support plan. There is no SLA (or guarantee) around responses, but anyone in the community is encouraged to share suggestions or experience so you should get more eyes on your posts. Our engineering and product teams also look for community discussions where we can help, but have to balance availability with development and product priorities. If this project is of the utmost importance to you, then perhaps raising a support case with agreed SLA is the best option. Details on support plans are available through the UI as part of the procedure to change your support plan or by contacting MongoDB.

Regarding the post that you’ve linked, although the error is the same, I do not believe that alone can be a direct indicator that the issue / root cause is the same. For example, I have a test environment in which I’ve removed all Network Access List entries and tried connecting and got the same error:

Note: The test client is using the MongoDB NodeJS Driver version 4.11.0 for the below test.

MongoServerSelectionError: connection <monitor> to <redacted>:27017 closed
    at Timeout._onTimeout (/home/ubuntu/tour/node_modules/mongodb/lib/sdam/topology.js:293:38)
    at listOnTimeout (node:internal/timers:559:17)
    at processTimers (node:internal/timers:502:7) {
  reason: TopologyDescription {
    type: 'ReplicaSetNoPrimary',
    servers: Map(3) {
      'ac-<redacted>-shard-00-01.qemgxcq.mongodb.net:27017' => [ServerDescription],
      'ac-<redacted>-shard-00-02.qemgxcq.mongodb.net:27017' => [ServerDescription],
      'ac-<redacted>-shard-00-00.qemgxcq.mongodb.net:27017' => [ServerDescription]
    },
    stale: false,
    compatible: true,
    heartbeatFrequencyMS: 10000,
    localThresholdMS: 15,
    setName: 'atlas-<redacted>-shard-0',
    maxElectionId: null,
    maxSetVersion: null,
    commonWireVersion: 0,
    logicalSessionTimeoutMinutes: null
  },
  code: undefined,
  [Symbol(errorLabels)]: Set(0) {}
}

However, with the above example and as stated previously, the error may be the same but the root cause may differ. In this example, it was due to the client’s IP not being on the Network Access List for my test environment.
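
Since the printed error collapses each server entry to `[ServerDescription]`, expanding those entries may help distinguish one root cause from another. This is a minimal sketch, assuming the caught error has the shape shown in the output above (a `reason.servers` Map keyed by address). `summarizeSelectionError` is a hypothetical helper, and the `type` and `error` fields read from each description are assumptions about the driver's objects:

```javascript
// Hypothetical helper: returns one summary row per server in a
// server selection error, or an empty array if the shape does not match.
function summarizeSelectionError(err) {
  const servers = err && err.reason && err.reason.servers
  if (!(servers instanceof Map)) return []
  return Array.from(servers, ([address, description]) => ({
    address,
    // Assumed field: the server type as seen by the driver's monitor.
    type: description && description.type ? description.type : 'Unknown',
    // Assumed field: the last error recorded for this server, if any.
    error: description && description.error ? String(description.error.message) : null
  }))
}

// Assumed usage inside the existing catch block:
// console.error(JSON.stringify(summarizeSelectionError(error), null, 2))
```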

That said, this is probably not the case for you, as you’ve allowed all IP entries and you’re stating that the connection issue is intermittent. Assuming there are no cluster issues (resource exhaustion, outages, etc.), and although I understand it is not ideal, for the purposes of troubleshooting you can try connecting from a non-AWS Lambda instance and see if the same intermittent connection issue also happens there. This could determine whether the issue is on the AWS Lambda side or the Atlas side.

You can have this client connect during the same periods to see if the timeout error also occurs when connecting to the same cluster, keeping most, if not all, other variables the same (driver, driver version, etc.).

This will be one step to help narrow down what the root cause could be. On top of this, to ensure we cover as many possibilities as we can, you can also consider contacting AWS support to see if there is anything Lambda-specific that may cause the intermittent timeouts.
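
The comparison described above can be sketched as a small probe that runs on the non-Lambda client and records a timestamped result for each attempt, so its failures can be lined up against the Lambda logs. `probe` is a hypothetical helper, and the `attempt` function is injected so the sketch does not depend on any particular driver call:

```javascript
// Hypothetical helper: calls attempt() `count` times, `intervalMs` apart,
// and records when each call started, whether it succeeded, and how long it took.
async function probe(attempt, { count = 5, intervalMs = 60000 } = {}) {
  const log = []
  for (let i = 0; i < count; i++) {
    const startedAt = new Date()
    const entry = { at: startedAt.toISOString(), ok: true }
    try {
      await attempt()
    } catch (error) {
      entry.ok = false
      entry.error = `${error.name}: ${error.message}`
    }
    entry.ms = Date.now() - startedAt.getTime()
    log.push(entry)
    // Wait between attempts, but not after the last one.
    if (i < count - 1) await new Promise((resolve) => setTimeout(resolve, intervalMs))
  }
  return log
}

// Assumed usage with an already constructed MongoClient named client:
// probe(() => client.db().command({ ping: 1 }), { count: 60 }).then(console.table)
```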

Regards,
Jason

Jason_Tran · June 21, 2023, 11:10pm

A post was split to a new topic: M10 and AWS Lambda connection issue