Optimize aggregation query limit based on lookup results

Lucas_GHANEM · April 25, 2023, 7:30pm

Hi everyone,

I’m trying to optimize an aggregation query having a match based on some calculated fields from a lookup stage, using limit and skip for pagination.

Let’s consider 2 collections, invoice and invoiceLine.

invoice will have a property called status, based the paidAt of the invoiceLines.
invoiceLine will have an invoiceId, and a paidAt that is a date.

invoice status can have different values depending on all the items paidAt. If one is unpaid, status is unpaid, if all are paid, status is paid for exemple.

If i want to query only the 500 first paid invoices, how can I limit my query?

For now, the limit operator is located at the end of my aggregation pipeline, but I feel like it term of efficiency, i could do better.

I’ve created an index on invoiceId in the invoiceLines collection, that helps, but with a lot of data, it’s still quite slow.

If I use limit at the start of my pipeline, I only have 500 invoices to work with, and then some are filtered, so I do not have the 500 expected invoices in the result.

I also feel like I can’t use a match in the lookup stage, because the status is calculated using all the invoiceLines paidAt of the invoice.

So I don’t really know how to paginate efficiently this use case.

Hoping that the question is clear, tell me if you need any complementary information.
Thanks in advance.

steevej · April 29, 2023, 1:05pm

Please share sample documents from all collections so that we can experiment with your use case.