How to speed up find and aggregate operations?

Dear all,

I’m trying to get a list of elements from MongoDB 4.2 using a pair of Bash functions and the mongo CLI command:

getnewlist ()
{
ssh $ssh_user@$mongo_host_source -i $ssh_key_source sudo -u someuser mongo --quiet -u someuser mongodb://$mongo_ip_source/metadata -p $db_admin_password_source --eval \'DBQuery.shellBatchSize\="$shell_batch_size"\;db.Unit.aggregate\(\[\{\$group\:\{_id\:\"\$_"$1"\"\}\}\],\{allowDiskUse\:true\}\)\' | cut -d ":" -f 2 | tr -d '[:blank:][:punct:]'
}

With that first function I’m querying the metadata database to get a list of operations to work with. That list is saved to a file for later use. $1 equals “opi”, short for “operation identifier”.
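As a side note, the backslash escaping can be avoided by building the `--eval` payload in a single-quoted `printf` format, so only the field name and batch size are spliced in. A minimal sketch (`build_distinct_eval` is a hypothetical helper, not part of the original functions, and it only constructs the string that would be passed to `mongo --eval`):

```shell
# Sketch: build the --eval payload in one single-quoted printf format so no
# backslash escaping is needed; only the field name and batch size are spliced in.
# build_distinct_eval is a hypothetical helper, not part of the original functions.
build_distinct_eval() {
  local field=$1 batch=${2:-1000}
  printf 'DBQuery.shellBatchSize=%s;db.Unit.aggregate([{$group:{_id:"$_%s"}}],{allowDiskUse:true})' \
    "$batch" "$field"
}

# Prints the exact string that would be passed to mongo --eval
build_distinct_eval opi 500
```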

getnumberofua ()
{
ssh $ssh_user@$mongo_host_source -i $ssh_key_source sudo -u someuser mongo --quiet -u someuser mongodb://$mongo_ip_source/metadata -p $db_admin_password_source --eval \'db.Unit.aggregate\(\[\{\$group\:\{_id\:\"\$_"$1"\"\}\},\{\$facet\:\{totalCount\:\[\{\$count\:\"count\"\}\]\}\}\],\{allowDiskUse\:true\}\)\' | cut -d ":" -f 3 | tr -d '[:blank:][:punct:]'
}

With that second function I’m getting a list of “archiving units” from the metadata database; they are a subdivision of the aforementioned “opi”. In that context $1 equals “id”, i.e. archiving units. At first I wanted to get a complete list of archiving units, similar to my list of OPIs, but then I realized I didn’t care much about a list and a simple total count would be sufficient.
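Since only the total is needed, the `$facet` wrapper can likely be dropped: a `$count` stage straight after the `$group` yields a single document with the total. A sketch under that assumption (`count_eval` is a hypothetical helper that only builds the `--eval` string):

```shell
# Sketch: a $count stage after the $group replaces the $facet/totalCount
# wrapper; count_eval is a hypothetical helper that only builds the string.
count_eval() {
  printf 'db.Unit.aggregate([{$group:{_id:"$_%s"}},{$count:"total"}],{allowDiskUse:true})' "$1"
}

count_eval id
```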

These two functions work great on small databases, but they don’t scale well once millions of documents (around 7 million) are involved.

What am I doing wrong?

Best regards,

Samuel

Hello @Samuel_VISCAPI1 ,

Welcome to The MongoDB Community Forums! :wave:

If the queries are fast for a small amount of data but struggle with a larger amount, please make sure that all your queries are backed by indexes. To investigate the observed slowness further, could you share the details below:

  1. The queries you are using.
  2. Any error messages observed while executing the queries against the large dataset.
  3. The output of your queries with .explain().
  4. Your deployment specifications.
  5. Your hardware specifications (e.g. disk space, RAM, etc.).
  6. How many documents are returned by these queries?

Regards,
Tarun

Welcome to the MongoDB community @Samuel_VISCAPI1 !

As @Tarun_Gaur mentioned, further details would be useful in order to provide relevant advice.

However, I noticed your aggregation queries use $group over all documents, so the amount of work will scale with the size of the collection.

A few ideas to consider:

  • Improve your aggregation query by limiting the number of documents to process with an initial $match stage (see Improve Performance with Indexes and Document Filters).

  • Maintain a count in your application logic when documents are updated or deleted so you don’t have to calculate this dynamically.

  • If you need a count of all documents in a collection and speed is more important than accuracy, use estimatedDocumentCount().

  • If you have other functions that return larger result sets, consider implementing them with an officially supported MongoDB driver instead of piping to the mongo shell. You will likely see better performance and can do any extra transformations within a single implementation rather than piping through other tools like cut or tr.
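As a rough illustration of the first and third ideas above, here is a sketch of two cheaper `--eval` payloads. Both helpers are hypothetical, and the `$match` field and value are placeholders, not fields from the actual `Unit` schema:

```shell
# Sketch: two cheaper --eval payloads. estimated_count_eval reads collection
# metadata instead of scanning documents; matched_group_eval narrows the
# pipeline with an initial $match (ideally on an indexed field).
estimated_count_eval() {
  printf 'db.Unit.estimatedDocumentCount()'
}

matched_group_eval() {
  local match_field=$1 match_value=$2 group_field=$3
  printf 'db.Unit.aggregate([{$match:{%s:"%s"}},{$group:{_id:"$%s"}}],{allowDiskUse:true})' \
    "$match_field" "$match_value" "$group_field"
}

# Placeholder field/value pair; substitute real, indexed fields.
estimated_count_eval
matched_group_eval status done _opi
```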

Regards,
Stennie
