Optimize aggregation query limit based on lookup results

Hi everyone,

I’m trying to optimize an aggregation query that applies a $match on fields calculated from a $lookup stage, using $limit and $skip for pagination.

Let’s consider two collections, invoice and invoiceLine.

invoice has a property called status, derived from the paidAt values of its invoiceLines.
invoiceLine has an invoiceId, and a paidAt that is a date.

The invoice status can take different values depending on the paidAt of all its lines: for example, if one line is unpaid the status is unpaid, and if all lines are paid the status is paid.
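For concreteness, the status derivation described above might look like the following $addFields stage, a sketch that assumes a prior $lookup has already joined the lines into a `lines` array (the field names here are assumptions, not your actual schema):

```javascript
// Sketch: status is "paid" only if every joined line has a non-null paidAt.
// Assumes a prior $lookup produced the "lines" array.
{
  $addFields: {
    status: {
      $cond: [
        { $allElementsTrue: [
            { $map: { input: "$lines", as: "l", in: { $ne: ["$$l.paidAt", null] } } }
        ] },
        "paid",
        "unpaid"
      ]
    }
  }
}
```

Note that with this expression an invoice with no lines at all would come out "paid" ($allElementsTrue is true for an empty array), so you may want an extra condition for that case.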

If I want to query only the first 500 paid invoices, how can I limit my query?

For now, the $limit stage sits at the end of my aggregation pipeline, but in terms of efficiency I feel I could do better.

I’ve created an index on invoiceId in the invoiceLines collection, which helps, but with a lot of data it’s still quite slow.
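For reference, the index supporting the $lookup’s foreignField equality match would be something like this (collection and field names assumed from the description):

```javascript
// Index on the foreignField used by the $lookup
db.invoiceLines.createIndex({ invoiceId: 1 })
```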

If I put $limit at the start of my pipeline, I only have 500 invoices to work with, and since some of them are then filtered out, I don’t end up with the 500 expected invoices in the result.

I also feel I can’t use a $match inside the $lookup stage, because the status is calculated from the paidAt of all the invoice’s invoiceLines.
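For what it’s worth, one streaming-friendly ordering is a sketch like the one below (all names are assumptions based on the description). Because aggregation stages stream documents, keeping $match immediately followed by $limit, with no blocking stage such as $sort before them, lets the pipeline stop pulling invoices as soon as 500 paid ones have been produced, rather than joining and classifying the whole collection first:

```javascript
// Sketch (mongosh): the pipeline can stop once 500 paid invoices pass the $match,
// since there is no blocking stage ($sort, $group) before the $limit.
db.invoice.aggregate([
  { $lookup: {
      from: "invoiceLine",
      localField: "_id",
      foreignField: "invoiceId",
      as: "lines"
  } },
  { $addFields: {
      status: {
        $cond: [
          { $allElementsTrue: [
              { $map: { input: "$lines", as: "l", in: { $ne: ["$$l.paidAt", null] } } }
          ] },
          "paid",
          "unpaid"
        ]
      }
  } },
  { $match: { status: "paid" } },
  { $skip: 0 },      // page offset
  { $limit: 500 }
])
```

The $lookup still runs per invoice until 500 matches accumulate, so this helps most when paid invoices are common; it does not avoid the join itself.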

So I don’t really know how to paginate this use case efficiently.

I hope the question is clear; let me know if you need any further information.
Thanks in advance.

Please share sample documents from all collections so that we can experiment with your use case.