$bucketAuto accepts an optional granularity parameter, which ensures that the boundaries of all buckets adhere to a specified preferred number series. If you don’t specify the granularity, it distributes the documents automatically into buckets of equal size.
Note: the resulting distribution depends purely on the number of incoming documents.
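As a minimal sketch of how the stage might be written with and without granularity, assuming the documents are grouped by _id: the helper bucketAutoStage and the collection name things are hypothetical and only used for illustration, and "R5" is one of the preferred number series documented for this option.

```javascript
// Hypothetical helper that builds a $bucketAuto stage; granularity is optional.
function bucketAutoStage(groupBy, buckets, granularity) {
  const spec = { groupBy, buckets };
  if (granularity !== undefined) {
    // Documented values include "R5", "E12", "1-2-5" and "POWERSOF2".
    spec.granularity = granularity;
  }
  return { $bucketAuto: spec };
}

// Without granularity: boundaries are derived from the data itself.
const plainStage = bucketAutoStage("$_id", 5);

// With granularity: boundaries adhere to the Renard R5 number series.
const r5Stage = bucketAutoStage("$_id", 5, "R5");

// In mongosh a stage would be passed to aggregate, for example:
//   db.things.aggregate([plainStage])   // "things" is an assumed collection name
console.log(JSON.stringify(plainStage));
console.log(JSON.stringify(r5Stage));
```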
For example:
A collection of things has documents with an _id numbered from 1 to 100:
{ _id: 1 }
{ _id: 2 }
...
{ _id: 100 }
If I use $bucketAuto without specifying the granularity, it distributes the documents into buckets with an equal count of 20 documents each.
It is simply because 101 is not evenly divisible by 5. Overall, with $bucketAuto we can specify the number of buckets, but not the number of documents each bucket will contain.
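The divisibility point can be checked with a little arithmetic. The sketch below only models an idealized even split; the function name evenSplit is made up for illustration, and it does not model which bucket $bucketAuto actually gives the extra document to.

```javascript
// Idealized even split of `docs` documents over `buckets` buckets.
// This is plain arithmetic, not MongoDB's actual $bucketAuto algorithm.
function evenSplit(docs, buckets) {
  const base = Math.floor(docs / buckets);
  const remainder = docs % buckets;
  return {
    smallest: base,
    // If there is a remainder, at least one bucket must hold one more document.
    largest: remainder === 0 ? base : base + 1,
    remainder,
  };
}

console.log(evenSplit(100, 5)); // { smallest: 20, largest: 20, remainder: 0 }
console.log(evenSplit(101, 5)); // { smallest: 20, largest: 21, remainder: 1 }
```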
I guess “evenly distributed” is vague; my intention is to divide the data into buckets of equal distance.
The $bucketAuto doc says “Bucket boundaries are automatically determined in an attempt to evenly distribute the documents into the specified number of buckets.”, so it looks like it tries to divide the population into buckets that each have almost the same frequency.
However, my intention is to create buckets whose boundaries span equal ranges, while the frequency of each bucket may differ.
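To make the difference between the two ideas concrete, here is a small sketch in plain JavaScript. The data is made up for illustration, both function names are hypothetical, and the equal-frequency version is a naive sort-and-chunk stand-in rather than MongoDB's real algorithm.

```javascript
// Illustrative data, invented for this sketch: nine small values and one outlier.
const values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100];

// Equal frequency: sort, then cut into chunks of (almost) the same size.
function equalFrequencyBuckets(data, buckets) {
  const sorted = [...data].sort((a, b) => a - b);
  const result = [];
  for (let i = 0; i < buckets; i++) {
    const start = Math.floor((i * sorted.length) / buckets);
    const end = Math.floor(((i + 1) * sorted.length) / buckets);
    result.push(sorted.slice(start, end));
  }
  return result;
}

// Equal width: cut the range [min, max] into slices of the same length.
// Assumes max > min, otherwise the width would be zero.
function equalWidthBuckets(data, buckets) {
  const min = Math.min(...data);
  const max = Math.max(...data);
  const width = (max - min) / buckets;
  const result = Array.from({ length: buckets }, () => []);
  for (const v of data) {
    // The maximum value is clamped into the last bucket.
    const index = Math.min(Math.floor((v - min) / width), buckets - 1);
    result[index].push(v);
  }
  return result;
}

const frequencyCounts = equalFrequencyBuckets(values, 2).map((b) => b.length);
const widthCounts = equalWidthBuckets(values, 2).map((b) => b.length);
console.log(frequencyCounts); // [ 5, 5 ]
console.log(widthCounts);     // [ 9, 1 ]
```

With skewed data, the equal-frequency split balances the counts while the equal-width split balances the ranges, which is the distinction being asked about.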
Here, the 10/6 split is the best it can do with 2 buckets, as other potential boundaries would result in similarly balanced or even less balanced splits.
Further, if you increase the number of buckets to 4:
The boundaries are determined automatically in an attempt to split the documents evenly across the given number of buckets.
So, if you wish to determine the bucket boundaries manually, I’d recommend using the $bucket aggregation stage instead. As you stated in your first post, it may require fetching the range of the data first to determine the boundaries, but this is an essential step in the process. However, feel free to reach out if you need any assistance or further guidance.
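The two-step approach could look roughly like the sketch below. It assumes integer values in the grouped field and a collection called things; the helper equalWidthBoundaries is hypothetical. Because $bucket treats each upper boundary as exclusive, the sketch uses max + 1 as the last boundary so that the largest value still lands in the last bucket. This is one possible choice, not the only one.

```javascript
// Step 1: pipeline that fetches the range of the field (here _id).
const rangePipeline = [
  { $group: { _id: null, min: { $min: "$_id" }, max: { $max: "$_id" } } },
];

// Step 2: hypothetical helper that turns the range into equal-width boundaries.
// Assumes integer values; the last boundary is max + 1 because $bucket
// upper bounds are exclusive.
function equalWidthBoundaries(min, max, buckets) {
  const upper = max + 1;
  const width = (upper - min) / buckets;
  const boundaries = [];
  for (let i = 0; i < buckets; i++) {
    boundaries.push(min + i * width);
  }
  boundaries.push(upper);
  return boundaries;
}

// Step 3: plug the boundaries into a $bucket stage.
// In practice min and max would come from the result of rangePipeline.
const bucketPipeline = [
  {
    $bucket: {
      groupBy: "$_id",
      boundaries: equalWidthBoundaries(1, 100, 5),
      output: { count: { $sum: 1 } },
    },
  },
];

console.log(bucketPipeline[0].$bucket.boundaries); // [ 1, 21, 41, 61, 81, 101 ]

// In mongosh, with an assumed collection name:
//   db.things.aggregate(rangePipeline)
//   db.things.aggregate(bucketPipeline)
```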
I guess this is a feature request then.
To summarize, here is a detailed example of my request:
For example, say the requirement is to automatically bucket the data into 5 buckets of equal-distance range. Then:
a) if the total range of the data is 1-100, then it would automatically pick the following range: