I have a collection with over 10 million records. I need to match on a particular field and get the distinct _ids of the resulting record set.
After the $match stage the result set drops to fewer than 5 million documents. If I then group by id to get the unique ids, the execution time on my local environment is over 20 seconds.
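For reference, the pipeline described above might look roughly like the sketch below. The field name `status`, the value `"active"`, and the helper `build_distinct_pipeline` are placeholders invented for illustration, since the post does not show the actual query.

```python
def build_distinct_pipeline(match_field, match_value, group_field="_id"):
    """Build a two-stage pipeline: filter first, then collapse to unique keys.

    All argument values used below are hypothetical; substitute the real
    field names from your own collection.
    """
    return [
        # Stage 1: keep only documents whose field equals the given value.
        {"$match": {match_field: match_value}},
        # Stage 2: emit one output document per distinct value of group_field.
        {"$group": {"_id": f"${group_field}"}},
    ]


pipeline = build_distinct_pipeline("status", "active")
print(pipeline)
```

With PyMongo, a list like this would be passed to `collection.aggregate(pipeline)`, where `collection` is whatever collection handle you already have.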
I’ve already tried the above approaches and they take around 14 seconds in my local environment; the execution time doubles when I run the query on my hosted production DB. Running only $match or only $group finishes in under 1 ms, but combining the two stages increases the execution time. What could be the reason? My expectation was that, since the first $match already reduces the dataset, the $group should run even faster.
I have the same problem with $match and $group.
I think it is slow because the aggregation pipeline works like a process pipe.
The documents that are output from one stage are passed to the next stage.
So it brings the 5 million records (the full documents) to the next stage for $group. That is why it is slow when we combine $match and $group.
P.S. When you run a single $match, you only take a limited number of records, which is why it executes in under 1 ms. See https://www.mongodb.com/docs/manual/core/aggregation-pipeline/
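To make the point above concrete, here is a toy model in plain Python. It is not MongoDB code and not how the server is implemented. It only assumes that a match stage can stream documents one at a time, while a group stage has to read all of its input before it can emit anything. The function names, the `status` field, and the batch size of 20 are all made up for the illustration.

```python
from itertools import islice


def match_stage(docs, field, value, counter):
    """Streaming stage: yields each matching document as soon as it is seen."""
    for doc in docs:
        counter["scanned"] += 1  # count how many input documents were read
        if doc.get(field) == value:
            yield doc


def group_stage(docs, key):
    """Blocking stage: consumes ALL input before emitting the first group."""
    seen = set()
    for doc in docs:
        seen.add(doc[key])
    for value in seen:
        yield {"_id": value}


def make_docs(n):
    """Hypothetical collection: every second document has status 'active'."""
    return ({"_id": i, "status": "active" if i % 2 == 0 else "inactive"}
            for i in range(n))


# Match only, reading just a first batch of 20 results.
counter_match = {"scanned": 0}
first_batch = list(islice(
    match_stage(make_docs(100_000), "status", "active", counter_match), 20))

# Match followed by group, again reading only 20 results.
counter_group = {"scanned": 0}
first_group_batch = list(islice(
    group_stage(match_stage(make_docs(100_000), "status", "active",
                            counter_group), "_id"), 20))

print(counter_match["scanned"])  # 39
print(counter_group["scanned"])  # 100000
```

In this model the match-only run stops after reading 39 documents because the caller asked for only 20 results, while the match-plus-group run has to read all 100,000 before returning the first one. If the real query behaves the same way, the sub-millisecond timing for $match alone would reflect only the first batch, not the full 5 million matches.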