$bucket
The $bucket stage in MongoDB is used within an aggregation pipeline to categorize incoming documents into groups, called buckets, based on a specified expression and a list of bucket boundaries. It is particularly useful for dividing a collection of documents into ranges and performing aggregate calculations on each range.
Syntax
Here's the basic syntax of the $bucket stage:
{
  $bucket: {
    groupBy: <expression>,
    boundaries: [<lowerbound1>, <lowerbound2>, ...],
    default: <default_value>,
    output: {
      <output_field1>: { <accumulator1>: <expression1> },
      ...
    }
  }
}
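groupBy: The expression by which to group documents. Each document is placed in the bucket whose range contains the value of this expression.
boundaries: An array of values, in ascending order, that specify the boundaries for each bucket. Each adjacent pair of values defines one bucket with an inclusive lower bound and an exclusive upper bound, so [0, 200, 400] defines the buckets [0, 200) and [200, 400).
default: Optional. The _id of an additional bucket that collects documents whose groupBy value falls outside the boundaries. If default is omitted and such a document is encountered, the stage returns an error.
output: Optional. The fields to include in the output documents, along with their corresponding accumulator expressions. If output is omitted, each bucket contains only a count field.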
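The way a groupBy value is matched to a bucket can be sketched in plain JavaScript. This is only an illustration of the rule (inclusive lower bound, exclusive upper bound, fallback to default), not part of the MongoDB API; findBucketId is a hypothetical helper and assumes the boundaries are already sorted.

```javascript
// Hypothetical helper: returns the _id of the bucket a value would land in.
// Assumes `boundaries` is sorted in ascending order.
function findBucketId(value, boundaries, defaultId) {
  for (let i = 0; i < boundaries.length - 1; i++) {
    // Lower bound is inclusive, upper bound is exclusive.
    if (value >= boundaries[i] && value < boundaries[i + 1]) {
      return boundaries[i]; // a bucket's _id is its lower boundary
    }
  }
  if (defaultId === undefined) {
    // Mirrors the stage's behavior of failing when no default is given.
    throw new Error(`value ${value} is outside the boundaries and no default is set`);
  }
  return defaultId;
}

console.log(findBucketId(199, [0, 200, 400, 600], "Other")); // 0
console.log(findBucketId(200, [0, 200, 400, 600], "Other")); // 200
console.log(findBucketId(600, [0, 200, 400, 600], "Other")); // Other
```

Note that 600 itself is not covered by any bucket, because the last boundary is only an exclusive upper bound.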
Example
Consider a sales collection with the following documents:
[
  { "_id": 1, "amount": 100 },
  { "_id": 2, "amount": 200 },
  { "_id": 3, "amount": 300 },
  { "_id": 4, "amount": 400 },
  { "_id": 5, "amount": 500 }
]
You can use the $bucket stage to categorize these sales into different ranges:
db.sales.aggregate([
  {
    $bucket: {
      groupBy: "$amount",
      boundaries: [0, 200, 400, 600],
      default: "Other",
      output: {
        count: { $sum: 1 },
        average_amount: { $avg: "$amount" }
      }
    }
  }
])
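This will produce:
[
  { "_id": 0, "count": 1, "average_amount": 100 },
  { "_id": 200, "count": 2, "average_amount": 250 },
  { "_id": 400, "count": 2, "average_amount": 450 }
]
Each bucket's _id is its lower boundary. The lower boundary is inclusive and the upper boundary is exclusive, so the sale of 200 lands in the 200 bucket and the sale of 400 in the 400 bucket. No document falls outside the boundaries, so no "Other" bucket appears.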
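To cross-check how the five sample sales are distributed without a running server, the stage can be emulated in plain JavaScript. This is a rough sketch, not MongoDB code: emulateBucketStage is a hypothetical function that hard-codes the amount field and the two accumulators used in this example.

```javascript
// Plain JavaScript emulation of the example stage; not part of the MongoDB API.
const sales = [
  { _id: 1, amount: 100 },
  { _id: 2, amount: 200 },
  { _id: 3, amount: 300 },
  { _id: 4, amount: 400 },
  { _id: 5, amount: 500 },
];

function emulateBucketStage(docs, boundaries, defaultId) {
  const groups = new Map(); // bucket _id -> running { count, total }
  for (const doc of docs) {
    // Index of the first boundary strictly greater than the amount.
    const upper = boundaries.findIndex((b) => doc.amount < b);
    // upper === 0: below the lowest boundary; upper === -1: at or above the highest.
    const id = upper > 0 ? boundaries[upper - 1] : defaultId;
    const group = groups.get(id) || { count: 0, total: 0 };
    group.count += 1;
    group.total += doc.amount;
    groups.set(id, group);
  }
  // Rows come out in first-seen order, which is ascending for this sorted input.
  return [...groups.entries()].map(([id, g]) => ({
    _id: id,
    count: g.count,
    average_amount: g.total / g.count,
  }));
}

const result = emulateBucketStage(sales, [0, 200, 400, 600], "Other");
console.log(result);
// Buckets 0, 200, 400 with counts 1, 2, 2 and averages 100, 250, 450.
```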
Considerations
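The boundaries array must contain at least two values of the same type, sorted in ascending order, and it cannot contain duplicate values.
The groupBy expression can include field paths, literals, and other expressions.
The default field is optional, but without it the stage returns an error as soon as a document's groupBy value falls outside the boundaries. Its value must be less than the lowest boundary or greater than or equal to the highest boundary, and it may be of a different type than the boundaries, as with the string "Other" in the example.
The output field allows you to apply accumulator expressions such as $sum, $avg, $min, and $max to the documents in each bucket. When output is specified, the count field is included only if you define it explicitly.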
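The constraints on the boundaries array can be checked on the client before a pipeline is sent. The sketch below is a simplified illustration: checkBoundaries is a hypothetical helper, it compares types with JavaScript's typeof rather than BSON types, and the server still performs its own validation.

```javascript
// Hypothetical client-side check of a boundaries array.
// Returns a list of problems; an empty list means no problem was found.
function checkBoundaries(boundaries) {
  const problems = [];
  if (!Array.isArray(boundaries) || boundaries.length < 2) {
    problems.push("boundaries must be an array with at least two values");
    return problems;
  }
  for (let i = 1; i < boundaries.length; i++) {
    const prev = boundaries[i - 1];
    const curr = boundaries[i];
    if (typeof prev !== typeof curr) {
      problems.push(`values at index ${i - 1} and ${i} differ in type`);
    } else if (!(prev < curr)) {
      // Catches both descending order and duplicate values.
      problems.push(`values at index ${i - 1} and ${i} are not strictly ascending`);
    }
  }
  return problems;
}

console.log(checkBoundaries([0, 200, 400, 600])); // []
console.log(checkBoundaries([0, 200, 200, 600])); // one problem, for index 1 and 2
```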