Get a histogram of a collection's fields

Frequency (histogram) was a topic covered in chapter 5. I examined the frequency of stars field by bucketing it to find out whether it’s monotonically increasing using the following aggregate query:

db.restaurants.aggregate([ {$match: {stars: {$gt: 0}}}, {$bucketAuto: {groupBy: “$stars”, buckets: 5 } } ])

I got " Sort exceeded memory limit of 104857600 bytes", however, an index on stars exists. if stars > 2 or so, no error is issued. How is such problem solved? Also, is there another way of getting a histogram of all fields or specified fields beside $facet?


Look at Compass schema analysis.

You haven’t answered why the query issues a memory error. How can I optimize the query and get the intended result?
Appreciate your help.

Because I do not know. You might try sorting first.

was my answer to

I added sort, the same memory error was issued:
db.restaurants.aggregate([ {$match: {stars: {$gt: 0}}}, {$sort: {stars: 1}}, {$bucketAuto: {groupBy: “$stars”, buckets: 5 } } ]).pretty()

I had one index only {stars: 1}, plus the default _id.
Can you try running the query to see if you get the same memory error like I did? Appreciate your help.

I think that you’re hitting the memory limit for $bucketAuto. See MongoDB documentation:

“The $bucketAuto stage has a limit of 100 megabytes of RAM. By default, if the stage exceeds this limit, $bucketAuto returns an error. To allow more space for stage processing, use the use the allowDiskUse option to enable aggregation pipeline stages to write data to temporary files.”

1 Like