Primary replica slow to download a single document

Hi all,

I currently have a collection with multiple 15 MB JSON documents, stored on an M5 cluster.

Using the following command, it takes 2.5 seconds to fetch a document in dev from my M2 primary replica (this is good).

db.my_collection.find({_id: ObjectId("123abc815c237fcd9ad50744")})

The same command, however, in my mock production environment (M5, same document, but about 10x the number of documents) takes 2 minutes to fetch a single document!

In summary:

  • In the “development” environment on the primary replica, it takes 2.5 seconds to download. (good)
  • In the “production” environment on the secondary replica, it takes 2.5 seconds to download. (good)
  • In the “production” environment on the primary replica, it takes 120 seconds to download. (bad)

I’m fetching the document by _id, so there shouldn’t be a delay in the time it takes to query the document.

I’m the only user currently on the system.

I can’t tell what else could be the cause.

Why am I getting this slow performance for downloading a single document in my primary replica only?


Attaching the .explain("executionStats"), in case it sheds light.

{
  explainVersion: '1',
  queryPlanner: {
    namespace: 'mydb.my_collection',
    indexFilterSet: false,
    parsedQuery: {
      _id: {
        '$eq': ObjectId("123abc815c237fcd9ad50744")
      }
    },
    queryHash: '740C02B0',
    planCacheKey: 'E351FFEC',
    maxIndexedOrSolutionsReached: false,
    maxIndexedAndSolutionsReached: false,
    maxScansToExplodeReached: false,
    winningPlan: {
      stage: 'IDHACK'
    },
    rejectedPlans: []
  },
  executionStats: {
    executionSuccess: true,
    nReturned: 1,
    executionTimeMillis: 1,
    totalKeysExamined: 1,
    totalDocsExamined: 1,
    executionStages: {
      stage: 'IDHACK',
      nReturned: 1,
      executionTimeMillisEstimate: 0,
      works: 2,
      advanced: 1,
      needTime: 0,
      needYield: 0,
      saveState: 0,
      restoreState: 0,
      isEOF: 1,
      keysExamined: 1,
      docsExamined: 1
    }
  },
  command: {
    find: 'my_collection',
    filter: {
      _id: ObjectId("123abc815c237fcd9ad50744")
    },
    '$db': 'mydb'
  },
  serverInfo: {
    host: 'mydb-shard-00-01.abcde.mongodb.net',
    port: 27017,
    version: '6.0.9',
    gitVersion: '90c65f9cc8fc4e6664a5848230abaa9b3f3b02f7'
  },
  serverParameters: {
    internalQueryFacetBufferSizeBytes: 104857600,
    internalQueryFacetMaxOutputDocSizeBytes: 104857600,
    internalLookupStageIntermediateDocumentMaxSizeBytes: 16793600,
    internalDocumentSourceGroupMaxMemoryBytes: 104857600,
    internalQueryMaxBlockingSortMemoryUsageBytes: 33554432,
    internalQueryProhibitBlockingMergeOnMongoS: 0,
    internalQueryMaxAddToSetBytes: 104857600,
    internalDocumentSourceSetWindowFieldsMaxMemoryBytes: 104857600
  },
  ok: 1,
  '$clusterTime': {
    clusterTime: Timestamp({ t: 1693412063, i: 5 }),
    signature: {
      hash: Binary(Buffer.from("abcdefa76c986469f16ba2c5ae5f348475bc8743", "hex"), 0),
      keyId: 1234894767884730000
    }
  },
  operationTime: Timestamp({ t: 1693412063, i: 5 })
}

Hi @Nick_Grealy,

Firstly - thanks for providing the detailed summary :slight_smile: The 120-second document fetch using _id on the M5 tier cluster is definitely interesting, just from an initial glance.

Based on the “executionStats”, the server-side execution shows only 1 ms. Just to provide some extra context so I can better understand the issue, can you advise how you are measuring the 120 seconds / 2 minutes on the M5 tier cluster? Additionally, do you see the same 120-second query time when you run the same find command while connected to the M5 tier cluster via the mongosh shell?
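
For example, a rough way to separate the server-side execution time from the end-to-end fetch time in mongosh (just a sketch reusing the _id from your explain output; the manual timing below is a suggestion, not an official diagnostic):

// Server-side execution time only (this is what explain reports, ~1 ms here).
db.my_collection.find({ _id: ObjectId("123abc815c237fcd9ad50744") })
  .explain("executionStats").executionStats.executionTimeMillis

// Wall-clock time as seen by the client, which also includes transferring
// the ~15 MB document over the network.
const start = Date.now();
db.my_collection.findOne({ _id: ObjectId("123abc815c237fcd9ad50744") });
print(`End-to-end fetch took ${Date.now() - start} ms`);

If the explain number stays tiny while the wall-clock number is around 120 seconds, the time is being spent outside query execution (for example, transferring the result back to the client).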

Lastly, has this behaviour always been experienced on this M5 tier cluster? Or is this something more recent? I am thinking that perhaps exceeding the data transfer limitation may be a possibility here.

Look forward to hearing from you.

Regards,
Jason


Hi @Jason_Tran ,

I measure the time in Compass, from the moment I execute the query ({_id: ObjectId("123abc815c237fcd9ad50744")}) to the moment it returns the document.

I experience the same behaviour in my application (NodeJS), and in a second DB client (NoSQL).

I haven’t tried mongosh… I can try it, but I don’t think it’ll make much difference.

I think you might’ve hit the nail on the head with the data transfer limitation. How do I find out:

  • if it’s in force?
  • how much I’ve used (*in the 7 day window)?

Kind regards,
Nick

Sorry, just to clarify here: are you using a particular feature in Compass to measure this, or just counting seconds in general?

You’ll need to contact the Atlas in-app chat support team, since they’ll have more insight into your Atlas project / cluster. Provide them with the cluster name / link.

Regards,
Jason

Compass - didn’t show execution time, so I’m “counting seconds”.

NoSQL - shows an execution time of 112.581 sec

NodeJS - performance logging shows:

  • primary: 112,437 ms
  • readPreference=secondary: 2,115 ms
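
For anyone wanting to reproduce the primary vs. secondary comparison, here is a minimal sketch using the Node.js driver (the connection string is a placeholder, and this is a simplified sketch rather than my exact application code):

const { MongoClient, ObjectId } = require('mongodb');

async function timeFetch(uri, readPreference) {
  const client = new MongoClient(uri, { readPreference });
  await client.connect();
  const coll = client.db('mydb').collection('my_collection');
  const start = Date.now(); // time only the fetch, not the connection setup
  await coll.findOne({ _id: new ObjectId('123abc815c237fcd9ad50744') });
  const elapsedMs = Date.now() - start;
  await client.close();
  return elapsedMs;
}

(async () => {
  const uri = 'mongodb+srv://user:pass@mydb.abcde.mongodb.net'; // placeholder
  console.log('primary  :', await timeFetch(uri, 'primary'), 'ms');
  console.log('secondary:', await timeFetch(uri, 'secondary'), 'ms');
})();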

Support haven’t been helpful - they seem to be interested only in funnelling me into purchasing a higher-tier database, despite my already being a paying customer.

How do I find out if I’m being rate limited?

Update

I upgraded to M10, and somehow the performance is worse. Waiting for support to get back to me…

Shout out to Seemi on support ( @Seemi_Hasan ?) :clap: :tada: :pray: - who was actually able to help confirm…

Hi Nick,
I apologise for the delay in my response as this required some in-depth log analysis. To address your original concern before your upgrade to M10 cluster tier:

My database has become extremely slow to return documents. I am on the M5 tier. I believe I have hit the Data Transfer Limits, as described here, and am being rate limited. https://www.mongodb.com/docs/atlas/reference/free-shared-limitations/#operational-limitations

How do I see on the dashboard whether I have hit my rate limit?

I have confirmed from the internal logs that your M5 cluster was throttled due to the Network Limit.

[2023/08/31 01:14:38.577] [ProxySession(_,mydb-shard-00-01.abcde.mongodb.net,139.59.100.19:43514).info] [commands.go:InterceptMongoToClient:1376] Network transfer limit (inbound: 50.000 GB/week, outbound: 50.000 GB/week) exceeded on mydb-shard-00-01.abcde.mongodb.net. Weekly (past 7 days) inbound usage = 0.053 GB, weekly outbound usage = 189.617 GB. Throttling down to 100000 bytes/week by sleeping 0.002 secs.

I hope that this provides a clearer understanding of the earlier instance of slow cluster response.


Thanks for the update Nick! Glad to hear you were helped out by Seemi on the chat support team.

As you are working with the in-app chat support team, would it be okay to close this particular post? I presume the original M5 issue was due to the throttling you mentioned in your most recent reply.

Regards,
Jason

@Jason_Tran - before you close this off, I’d like to know how to see the following from the dashboard:

→ Weekly (past 7 days) inbound usage / weekly outbound usage

That way I can monitor the limits and proactively determine whether I’m going to be rate limited in the future.
(Perhaps even set up an alert! I hate finding out Production is down after the fact.)

Unfortunately, there are no alerts available for this limit. In a previous post about whether this could be verified, I advised checking with the in-app chat support as well, since they have more insight into it per Atlas account.

To my knowledge, the only way currently is to approximate it manually using the Network chart for shared-tier clusters. You could change the zoom to 1 week and adjust the granularity. I believe the limit applies across all nodes (i.e. you need to add up the network usage for all 3, and if that total exceeds the limit, you will be throttled). Hope this makes sense.
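
As a rough illustration of that manual calculation (the per-node figures below are made-up placeholders you would read off the Network chart yourself; the 50 GB/week outbound limit is the one quoted in the throttling log earlier in this thread):

// Hypothetical outbound usage (GB over the past 7 days), read manually from the
// Atlas Network chart for each node of the replica set -- replace with your own readings.
const outboundGB = {
  'mydb-shard-00-00': 60.2,
  'mydb-shard-00-01': 95.4,
  'mydb-shard-00-02': 34.0,
};

const WEEKLY_LIMIT_GB = 50; // outbound limit shown in the throttling log above

const total = Object.values(outboundGB).reduce((sum, gb) => sum + gb, 0);
console.log(`Total outbound over 7 days: ${total.toFixed(1)} GB (limit: ${WEEKLY_LIMIT_GB} GB)`);
if (total > WEEKLY_LIMIT_GB) {
  console.log('Expect the cluster to be throttled until usage drops back under the limit.');
}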

Feature requested… hopefully this saves someone else from a sleepless night!


@Jason_Tran - while I have you. Does M10 have any rate limits?

Hi @Nick_Grealy

Glad to hear that your issue was solved to your satisfaction.

However I’d like to circle back to one of your earlier comments:

Sorry if you feel that way, but for the record, let me assure you that no one in support is incentivized to sell you anything. They are not sales. Their goal is to help you be successful with MongoDB. However, if it was determined that your workload is too large for your current deployment size, then support is obliged to tell you that.

As for your question:

If you deploy on AWS, as per MongoDB on AWS Cloud Pricing, you pay for data transfer costs instead of being throttled. There are more details on the page Atlas Cluster Sizing and Tier Selection that may be of interest as well.

Thanks for your patience and welcome to the community!

Best regards
Kevin


This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.