How to use $bucketAuto to divide into evenly distributed buckets

Is there a way to use $bucketAuto to divide my data into N buckets with equal distance?
I am trying to use this in my $facet query.

(Using $bucket with boundaries is not ideal, as I would then have to fetch the range of the data first to decide the boundaries.)

Thank you!

Hello :wave: @williamwjs,

Welcome back to the MongoDB Community forums :sparkles:

$bucketAuto accepts an optional granularity parameter, which ensures that the boundaries of all buckets adhere to a specified preferred number series. If you don’t specify the granularity, it distributes the documents automatically into equal sets**.

**Note: The result depends purely on the number of incoming documents.

For example:

A collection named things has documents with _id values numbered from 1 to 100:

{ _id: 1 }
{ _id: 2 }
...
{ _id: 100 }

If I use $bucketAuto without specifying the granularity, it distributes the documents into 5 buckets of 20 documents each.

db.things.aggregate( [
  {
    $bucketAuto: {
      groupBy: "$_id",
      buckets: 5
      // granularity is intentionally omitted
    }
  }
] )
{ "_id" : { "min" : 1, "max" : 21 }, "count" : 20 }
{ "_id" : { "min" : 21, "max" : 41 }, "count" : 20 }
{ "_id" : { "min" : 41, "max" : 61 }, "count" : 20 }
{ "_id" : { "min" : 61, "max" : 81 }, "count" : 20 }
{ "_id" : { "min" : 81, "max" : 100 }, "count" : 20 }

But if I add just one more document to the collection, the counts are no longer the same across all buckets.

{ "_id" : { "min" : 1, "max" : 21 }, "count" : 20 }
{ "_id" : { "min" : 21, "max" : 41 }, "count" : 20 }
{ "_id" : { "min" : 41, "max" : 61 }, "count" : 20 }
{ "_id" : { "min" : 61, "max" : 81 }, "count" : 20 }
{ "_id" : { "min" : 81, "max" : 101 }, "count" : 21 }

This is simply because 101 is not evenly divisible by 5. Overall, with $bucketAuto we can specify the number of buckets, but not the number of documents each bucket will contain.
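
The arithmetic behind those counts can be sketched in plain JavaScript. This is only a simplified model that reproduces the counts shown above, not the server's actual algorithm, and `splitCounts` is a hypothetical helper made up for this illustration.

```javascript
// Simplified model of the counts shown above; NOT MongoDB's actual algorithm.
// splitCounts is a hypothetical helper made up for this illustration.
function splitCounts(totalDocs, buckets) {
  const base = Math.floor(totalDocs / buckets); // whole documents per bucket
  const remainder = totalDocs % buckets;        // documents left over
  const counts = new Array(buckets).fill(base);
  counts[buckets - 1] += remainder;             // leftovers land in the last bucket, matching the output above
  return counts;
}

console.log(splitCounts(100, 5)); // [ 20, 20, 20, 20, 20 ]
console.log(splitCounts(101, 5)); // [ 20, 20, 20, 20, 21 ]
```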

I hope it answers your question.

Best,
Kushagra

@Kushagra_Kesav Thank you for your response!

I guess “evenly distributed” was vague; my intention is to divide the data into buckets that each span an equal distance.

The $bucketAuto doc says “Bucket boundaries are automatically determined in an attempt to evenly distribute the documents into the specified number of buckets.”, so it looks like it tries to divide the population into buckets that each have almost the same frequency.

However, my intention is to make the bucket boundaries span equal ranges, while the frequency of each bucket could differ.
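
To make the difference concrete, here is a rough sketch in plain JavaScript (not a MongoDB pipeline) of the kind of result I am after. The helper name `equalWidthBuckets` and the sample values are made up for illustration only.

```javascript
// Plain JavaScript illustration of equal-width bucketing; not a MongoDB pipeline.
// equalWidthBuckets and the sample data are hypothetical.
function equalWidthBuckets(values, n) {
  const min = Math.min(...values);
  const max = Math.max(...values);
  const width = (max - min) / n; // every bucket spans the same range

  const buckets = Array.from({ length: n }, (_, i) => ({
    min: min + i * width,
    max: min + (i + 1) * width,
    count: 0,
  }));

  for (const v of values) {
    // Clamp so the maximum value falls into the last bucket;
    // if all values are identical (width 0), everything goes into bucket 0.
    const i = width === 0 ? 0 : Math.min(Math.floor((v - min) / width), n - 1);
    buckets[i].count += 1;
  }
  return buckets;
}

const skewed = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100];
console.log(equalWidthBuckets(skewed, 3).map((b) => b.count)); // [ 9, 0, 1 ]
```

The boundaries here are 1, 34, 67 and 100, so each bucket has the same width, while the counts (9, 0, 1) differ. With $bucketAuto, the counts would be roughly equal and the widths would differ instead.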

May I ask if you know how to do that? Thank you!