Slow queries between Mongoose and MongoDB Atlas on Google Cloud Functions

Hi,
Here is a summary of our case:

  • We are using MongoDB Atlas (M30).
  • We have a collection with 400K records (we expect it to grow to about 40M).
  • We are using Google Cloud Functions, Node.js, and Mongoose.
  • To connect to MongoDB Atlas, we use an SRV connection string (‘mongodb+srv://…’) with SSL enabled.
  • MongoDB Atlas and the Google Cloud Function are in the same region.
  • We use the lean() option on our queries to get plain JSON objects and save memory.
  • We use a projection to fetch only the fields we need.
  • We have an index for our query, and we tested to make sure it is actually used.
  • Here is our index:
{
  brand: 1,
  facture: 1,
  'results.content.algo1.version': 1,
  'results.content.algo1.bestOf.segments.cluster': 1,
  _id: 1
}

Note: we added _id to the index just to try _id-based pagination (see the sketch after this list).

  • Our query is something like this:
find({
  brand: "MY_BRAND",
  facture: "MY_FACTURE",
  'results.content.algo1.version': MY_VERSION,
  'results.content.algo1.bestOf.segments.cluster': MY_CLUSTER
})
.projection({
  'platforms.platform1.devices': 1,
  'platforms.platform2.devices': 1,
})
.sort({ _id: 1 })
.limit(1000)
  • In total, about 50K-100K records match the query (we page through them with multiple limited queries or a cursor). This is with 400K records; when we reach 40M it will grow proportionally.
  • “segments” and “devices” are both arrays, so we couldn’t build a fully covered index.
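  • A minimal sketch of the _id-based pagination loop we mean (the model name CombinedUser is a placeholder, not our real model):

async function fetchAllPages(filter) {
  const results = [];
  let lastId = null;

  for (;;) {
    // resume after the last _id we have seen so far
    const pageFilter = lastId ? { ...filter, _id: { $gt: lastId } } : filter;

    const page = await CombinedUser.find(pageFilter)
      .select({
        'platforms.platform1.devices': 1,
        'platforms.platform2.devices': 1,
      })
      .sort({ _id: 1 })
      .limit(1000)
      .lean(); // plain objects instead of Mongoose documents

    if (page.length === 0) break;
    results.push(...page);
    lastId = page[page.length - 1]._id;
  }

  return results;
}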

So here is the problem:

  • When we query the data from GCF, a single query (limit 1000) takes 3-4 seconds to respond. But the same query takes at most 300-400 ms in the MongoDB Atlas shell, and when we monitor the same queries in the Atlas console we see that execution also takes only 300-400 ms.
  • The result can be about 10-15 MB. We know that is large, but we are not sure it should cause a 10x higher response time.
  • We tried a cursor with batchSize and also _id pagination to fetch the data faster, but neither made a difference (a sketch of the cursor variant follows this list).
  • We know that fetching less data would be best, but we need all of it. On the other hand, decreasing the limit does reduce the per-query response time, but the total time is higher and the query count for the whole operation grows, because in the end we have to fetch all the data we need.
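  • For reference, a rough sketch of the cursor-with-batchSize variant we tried (CombinedUser and filter are placeholders):

const cursor = CombinedUser.find(filter)
  .select({
    'platforms.platform1.devices': 1,
    'platforms.platform2.devices': 1,
  })
  .sort({ _id: 1 })
  .lean()
  .cursor({ batchSize: 1000 }); // ask the driver for 1000 documents per batch

for await (const doc of cursor) {
  // process each document as it streams in
}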

We may be missing something about dealing with large result sets in MongoDB.
So, do you think the gap between query time and response time (300 ms → 3 seconds) is normal given the size of the returned data?
And do you have any other suggestions for handling this situation?
Thank you.


Hey @hazal,

Welcome to the MongoDB Community!

  • You mentioned the query takes 3-4 seconds even when returning a smaller dataset. To clarify - if you reduce the limit to say 100 records, do you still see the query taking 3-4 seconds consistently? Or does it scale down with the smaller result set size?

  • To eliminate network latency as a possible factor, could you try running the query from your local machine using mongosh connected directly to Atlas? How does the performance compare when running from your laptop?

  • Could you share the full output of db.collection.find(...).explain()? That may provide some insights into what operations are taking time during the query execution.

  • As one of the performance tests - could you try structuring the same query as an aggregation pipeline with $match, $project, and $limit stages (e.g., something like the sketch after this list)? Does running it as an aggregation show any difference in timing compared to the find() query?

  • Also, could you share a sample document from your collection on which you are working?
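  • For the aggregation test above, roughly something like this (the filter values are the placeholders from your query):

db.collection.aggregate([
  {
    $match: {
      brand: 'MY_BRAND',
      facture: 'MY_FACTURE',
      'results.content.algo1.version': MY_VERSION,
      'results.content.algo1.bestOf.segments.cluster': MY_CLUSTER,
    },
  },
  { $sort: { _id: 1 } },
  {
    $project: {
      'platforms.platform1.devices': 1,
      'platforms.platform2.devices': 1,
    },
  },
  { $limit: 1000 },
])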

These additional troubleshooting steps and comparisons will provide us with further clues into what may be causing the slower performance. The objective is to narrow down where the slowness is occurring - in the network, query execution, or other factors.

Look forward to hearing from you.

Regards,
Kushagra


Hi Kushagra,
thanks for your reply!

  • Yes, we tried with limit 100 and, as expected, it decreased the response time: one query with limit 100 only took about 200-300 ms on GCF. But in that case we had many more queries, and the total time was no lower than in the other case.
  • We observed the queries in the MongoDB Atlas slow-queries console, as I mentioned before. These queries take at most 300-400 ms, and most of them didn't even show up as slow queries, by the way.
  • Here is the result of explain() for the query I wrote before (find with _id pagination and limit 1000):
{
  explainVersion: '1',
  queryPlanner: {
    namespace: 'myDb.combinedUsers',
    indexFilterSet: false,
    parsedQuery: {
      '$and': [
        {
          'results.content.algo1.version': {
            '$eq': '2023-06-01-0-12'
          }
        },
        {
          'results.content.algo1.bestOf.segments.cluster': {
            '$eq': 1
          }
        },
        {
          brand: {
            '$eq': 'mybrand'
          }
        },
        {
          facture: {
            '$eq': 'myfacture'
          }
        }
      ]
    },
    maxIndexedOrSolutionsReached: false,
    maxIndexedAndSolutionsReached: false,
    maxScansToExplodeReached: false,
    winningPlan: {
      stage: 'LIMIT',
      limitAmount: 1000,
      inputStage: {
        stage: 'PROJECTION_DEFAULT',
        transformBy: {
          'platforms.platform1.devices': 1,
          'platforms.platform2.devices': 1
        },
        inputStage: {
          stage: 'FETCH',
          inputStage: {
            stage: 'IXSCAN',
            keyPattern: {
              brand: 1,
              facture: 1,
              'results.content.algo1.version': 1,
              'results.content.algo1.bestOf.segments.cluster': 1,
              _id: 1
            },
            indexName: 'brand_1_facture_1_results.content.algo1.version_1_results.content.algo1.bestOf.segments.cluster__id_1',
            isMultiKey: true,
            multiKeyPaths: {
              brand: [],
              facture: [],
              'results.content.algo1.version': [],
              'results.content.algo1.bestOf.segments.cluster': [
                'results.content.algo1.bestOf.segments'
              ],
              _id: []
            },
            isUnique: false,
            isSparse: false,
            isPartial: false,
            indexVersion: 2,
            direction: 'forward',
            indexBounds: {
              brand: [
                '["mybrand", "mybrand"]'
              ],
              facture: [
                '["myfacture", "myfacture"]'
              ],
              'results.content.algo1.version': [
                '["2023-06-01-0-12", "2023-06-01-0-12"]'
              ],
              'results.content.algo1.bestOf.segments.cluster': [
                '[1, 1]'
              ],
              _id: [
                '[MinKey, MaxKey]'
              ]
            }
          }
        }
      }
    },
    rejectedPlans: [
      ...
    ]
  },
  command: {
    find: 'combinedUsers',
    filter: {
      brand: 'mybrand',
      facture: 'myfacture',
      'results.content.algo1.version': '2023-06-01-0-12',
      'results.content.algo1.bestOf.segments.cluster': 1
    },
    sort: {
      _id: 1
    },
    projection: {
      'platforms.platform1.devices': 1,
      'platforms.platform2.devices': 1
    },
    limit: 1000,
    '$db': 'myDb'
  },
  serverInfo: {
   ...
  },
  serverParameters: {
    internalQueryFacetBufferSizeBytes: 104857600,
    internalQueryFacetMaxOutputDocSizeBytes: 104857600,
    internalLookupStageIntermediateDocumentMaxSizeBytes: 104857600,
    internalDocumentSourceGroupMaxMemoryBytes: 104857600,
    internalQueryMaxBlockingSortMemoryUsageBytes: 104857600,
    internalQueryProhibitBlockingMergeOnMongoS: 0,
    internalQueryMaxAddToSetBytes: 104857600,
    internalDocumentSourceSetWindowFieldsMaxMemoryBytes: 104857600
  },
  ok: 1,
  '$clusterTime': {
    clusterTime: Timestamp({ t: 1689950028, i: 139 }),
    signature: {
      ...
    }
  },
  operationTime: Timestamp({ t: 1689950028, i: 139 })
}
  • Yes, we tried an aggregation pipeline too. There was no difference compared to find().

  • Additionally, this is our monitoring of the service network while we tried to fetch this data (120K matched records in total):
    ** the first big peak is from when we tried to get the data with limit 1000,
    ** the 2nd and 3rd ones belong to the tries with limit 100

So,
I think the problem is not MongoDB performance; it is about the network.
Maybe we need to change something in the network settings, I don't know. Is there a setting for max network-out bytes or anything similar on MongoDB Atlas?

Best regards

Hey,
We found the solution for our case. Maybe it will help someone else.

It looks like it was about the Google Cloud Function's resources in our case. Although the metrics graphs looked fine, when we increased the memory of the GCF to 4 GB we got much better results (3 s → 400-500 ms). More memory is good on its own, but it also directly affects the CPU allocated to the function in Google Cloud Functions (Memory limits | Cloud Functions documentation | Google Cloud).
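For reference, the memory can be raised when redeploying the function; roughly like this (function name and region are placeholders, gen1 flag syntax):

gcloud functions deploy MY_FUNCTION \
  --memory=4096MB \
  --region=MY_REGION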

We were trying so many complex things that we were surprised by this basic solution! :smile:

But keep in mind:
This is not a solution by itself.

  • create a proper collection schema
  • create proper indexes
  • write proper queries
  • check that your MongoDB Atlas cluster and your server are in the same region
  • set the right pool sizes when connecting to MongoDB (see the sketch below)
  • choose the right MongoDB Atlas cluster tier for your needs
  • use lean() to save memory
  • use a projection and fetch only the fields you need
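For the pool-size point, a minimal connection sketch (the option values are examples, not tuned recommendations):

// create the connection once, outside the handler, so warm GCF
// instances reuse the pooled sockets across invocations
const mongoose = require('mongoose');

const connPromise = mongoose.connect(process.env.MONGODB_URI, {
  maxPoolSize: 10, // upper bound on concurrent sockets (example value)
  minPoolSize: 1,  // keep at least one socket warm
});

exports.handler = async (req, res) => {
  await connPromise; // ensure the connection is ready before querying
  // ...run your queries here...
  res.sendStatus(200);
};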


Best Regards,
Hazal
