I’m trying to optimize an aggregation query having a match based on some calculated fields from a lookup stage, using limit and skip for pagination.
Let’s consider 2 collections, invoice and invoiceLine.
invoice will have a property called status, based the paidAt of the invoiceLines.
invoiceLine will have an invoiceId, and a paidAt that is a date.
invoice status can have different values depending on all the items paidAt. If one is unpaid, status is unpaid, if all are paid, status is paid for exemple.
If i want to query only the 500 first paid invoices, how can I limit my query?
For now, the limit operator is located at the end of my aggregation pipeline, but I feel like it term of efficiency, i could do better.
I’ve created an index on invoiceId in the invoiceLines collection, that helps, but with a lot of data, it’s still quite slow.
If I use limit at the start of my pipeline, I only have 500 invoices to work with, and then some are filtered, so I do not have the 500 expected invoices in the result.
I also feel like I can’t use a match in the lookup stage, because the status is calculated using all the invoiceLines paidAt of the invoice.
So I don’t really know how to paginate efficiently this use case.
Hoping that the question is clear, tell me if you need any complementary information.
Thanks in advance.