Hello all,
I want to apply a projection to limit the size of each document before I start my aggregation pipeline.
My document format is roughly:
sibilingField:""
metadata:{
importantField:"...",
extremlyLargeField:Binary(...)
}
As I scaled up my DB, I noticed that my aggregations became incredibly slow. After investigating, I learned that the size of each document is the source of this latency. Using the explain() feature, I saw that the query executed at the start of each aggregation causes 95% of the latency. That is my current understanding of the problem.
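For reference, this is roughly how I gathered those numbers (the collection name and pipeline contents here are placeholders for my real ones):

```javascript
// Inspect execution stats for the aggregation in mongosh.
db.myCollection.explain("executionStats").aggregate([
  /* ...my actual pipeline stages... */
])
// The initial query stage dominates the reported execution time.
```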
As for finding a solution, this is what I understand.
The “select *” style query that the aggregation executes is the same as a find({}). Given this, I want to omit the field that is incredibly large and not used in my aggregation. With find() I can do find({}, { "metadata.extremlyLargeField": 0 }), which works and slashes my times! So my goal is to accomplish the same thing in my aggregation.
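In mongosh terms, this is the shape of the fast find() (collection name is a placeholder):

```javascript
// Empty filter plus an exclusion projection on the huge binary field.
// Because the field is excluded up front, far less data is returned per document.
db.myCollection.find(
  {},
  { "metadata.extremlyLargeField": 0 }
)
```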
However, that option does not exist on aggregate(). The obvious answer is to use the $project aggregation stage or $unset (which I understand to be equivalent when only excluding fields), but that does not work either: the initial query is still 95% of my latency. The documentation states that unused fields will be automatically omitted, but my large field very clearly is not.
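For clarity, this is roughly the shape of what I tried (pipeline simplified, collection name a placeholder):

```javascript
// Drop the large field as the very first stage of the pipeline.
db.myCollection.aggregate([
  { $unset: "metadata.extremlyLargeField" },
  // or equivalently, an exclusion projection:
  // { $project: { "metadata.extremlyLargeField": 0 } },
  /* ...the rest of my pipeline... */
])
// explain() still shows the initial query taking ~95% of the time.
```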
There is a clear similarity between find() and what occurs at the start of every aggregation, but there is a very clear difference between a projection on find() and a projection stage in an aggregation.
Given this preface, here is my question: How do I apply the same pre-fetch projection to the start of my aggregation?