$exists query vs. full scan: which is faster, and which is the proper way?

I have 20m documents in total, and about 17m of them have an Addr key.
This key is a kind of address, and there are about 1m distinct values.

I need to get all documents where Addr exists, then classify them and sum a value (each document has Price and Type keys).
A sample of my script looks like this:

for docu in collection.find():
    # TODO - classify by Type and decide whether or not to add Price
    pass

In this case, which is the faster and proper way?

  1. Addr: {$exists: True}
  2. Scan the whole collection and continue if Addr not in docu

I’m confused about which one is the right way.

You definitely do NOT want to do this with a client-side for-loop.

You want to use the aggregation framework. It is made for this kind of work.

You start with a $match stage for Addr existence.
You then use a $group stage with _id:$Type and $sum for Price.
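
For illustration, the two stages above could be sketched in Python roughly like this. This is only a minimal sketch: the function names and the total_price output field are made up for the example, and it sums every Price because the rule for deciding whether to add a Price is not given in the question. It assumes collection is a pymongo collection, whose aggregate() method takes the pipeline as a list of stage documents.

```python
from typing import Any, Dict, List


def build_price_by_type_pipeline() -> List[Dict[str, Any]]:
    """Pipeline: keep documents that have Addr, then sum Price per Type."""
    return [
        # Stage 1: keep only documents where the Addr field is present.
        {"$match": {"Addr": {"$exists": True}}},
        # Stage 2: one result per distinct Type, with the summed Price.
        # "total_price" is a hypothetical output field name.
        {"$group": {"_id": "$Type", "total_price": {"$sum": "$Price"}}},
    ]


def sum_price_by_type(collection) -> Dict[Any, float]:
    """Run the pipeline on the server and return {Type: summed Price}."""
    cursor = collection.aggregate(build_price_by_type_pipeline())
    return {row["_id"]: row["total_price"] for row in cursor}


# Usage (assumes a reachable MongoDB server; all names are placeholders):
#   from pymongo import MongoClient
#   collection = MongoClient("mongodb://localhost:27017")["mydb"]["mycoll"]
#   totals = sum_price_by_type(collection)
```

With this shape the server does the filtering and summing, and only one small result document per Type travels to the client instead of millions of full documents.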
