Aggregation query optimization

Here is a link to what looks like exactly the same issue, being worked on by different people.

I am skeptical of use-cases that need to return 95K documents to the application. Why do you need that many documents? What is your use-case exactly? Is this a frequent query?

The $lookup seems to perform well since the _id index is used. I suspect most of the time is spent reading documents from disk. What is the average size of documents in the users collection? What are the characteristics of the server? Is the application making the query running on the same machine? Do you need all the fields from users? If not, a sub-pipeline with a $project inside the $lookup might be appropriate. Combined with an index on _id and the $project-ed fields, this might reduce totalDocsExamined and hence the disk I/O.
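A rough sketch of such a $lookup, written as a plain object you could drop into an aggregate() call. The collection and field names ("orders", "users", "userId", "name", "email") are placeholders, since the original collections were not shown; the concise localField/foreignField + pipeline form requires MongoDB 5.0+.

```javascript
// Sketch only -- "orders", "users", "userId", "name" and "email" are
// placeholder names, not taken from the original post.
// The sub-pipeline $project discards unneeded fields before the joined
// documents are returned to the outer pipeline.
const lookupStage = {
  $lookup: {
    from: "users",
    localField: "userId",
    foreignField: "_id",
    as: "user",
    pipeline: [
      // Keep only the fields the application actually needs.
      { $project: { name: 1, email: 1 } },
    ],
  },
};

// In the shell you would run something like:
// db.orders.aggregate([lookupStage]);
console.log(JSON.stringify(lookupStage, null, 2));
```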

Sample, non-redacted documents from both collections might help us understand.

You should store your dates as Date rather than string. You would then not need $dateFromString, and native dates take less space and are faster to compare than their string counterparts.
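One way to do the conversion is a one-off aggregation-pipeline update (MongoDB 4.2+) that rewrites the string field in place. "orders" and "createdAt" are assumed names for illustration:

```javascript
// Sketch only -- "orders" and "createdAt" are placeholder names.
// Match only documents whose date is still stored as a string, then
// convert it to a native Date so reads no longer need $dateFromString.
const stringDatesOnly = { createdAt: { $type: "string" } };
const convertToDate = [
  { $set: { createdAt: { $dateFromString: { dateString: "$createdAt" } } } },
];

// In the shell you would run something like:
// db.orders.updateMany(stringDatesOnly, convertToDate);
console.log(JSON.stringify({ stringDatesOnly, convertToDate }));
```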

Sometimes, aggregating from the $lookup-ed collection is faster than the other way around, as it might reduce the number of duplicates in the result.
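To illustrate the two directions, here is a sketch with placeholder names. If many orders share the same user, starting from "orders" repeats the same joined user document for every matching order, while starting from "users" fetches each user once and gathers its orders into one array:

```javascript
// Sketch only -- collection and field names are placeholders.
// Direction 1: one output document per order, user repeated per order.
const fromOrders = [
  { $lookup: { from: "users", localField: "userId", foreignField: "_id", as: "user" } },
];
// Direction 2: one output document per user, orders collected in an array.
const fromUsers = [
  { $lookup: { from: "orders", localField: "_id", foreignField: "userId", as: "orders" } },
];
console.log(JSON.stringify({ fromOrders, fromUsers }));
```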
