Aggregate pipeline - keeping count of records found

Simen_Munter · April 4, 2022, 9:10am

I am using an aggregation pipeline to drill down through a dataset through several stages and come up with an outcome dataset. That works well.

However, I would also like to tell the user how many records were “dropped” (or remaining - doesnt matter) on the way through the pipeline. Eg. if I start with 2000 records and after the first match the aggregation pipeline contains 750 records I want to either collect 1250 or 750 into a separate variable.

Likewise, if the next step takes those 750 and reduces that to 350, I want to store either 400 or 350.

However I am struggling (no doubt due to my lack of knowledge!) as if I include “count” then that becomes the output of that stage and thereby disrupts the flow into the next stage. If I try to count the pipeline results I get an error. I have spent to much time on this, wondering to do the aggregation twice, first to get the records and second time to count them but that just feels wrong.

Some sample code to help illustrate my struggles:

Pipeline1= [
{’$match’: {‘Int_Nominal_IntRate’: {’$lt’: CustomerRate}}},
#This is a problem
** {’$count’: ‘LowerRate’}**
]
results = Deal_collection.aggregate(Pipeline1)

#This also doesnt work
LowerRate= Deal_collection.count({‘Int_Nominal_IntRate’: {’$lt’: CustomerRate}})

Pipeline2 =[
{’$match’: {CustomerLTV : {’$lt’: ‘Restr_MaxLTV’, ‘$gt’: ‘Restr_MinLTV’}}},
{’$count’: ‘WithinLTV’}
results = Deal_collection.aggregate(Pipeline2)

Any help much appreciated.

steevej · April 4, 2022, 12:49pm

Your feeling is correct. It would be wrong to do it with 2 accesses to the DB because you may end up with inconsistent results because the data might be modified between the 2 accesses.

I am not sure how you could do that but you might want to experiment with

or