What are some of the biggest mistakes people make in aggregation pipelines? How can people avoid these mistakes?
Paging @Asya_Kamsky after her fantastic talk at .live today
What are some of the biggest mistakes people make in aggregation pipelines? How can people avoid these mistakes?
Paging @Asya_Kamsky after her fantastic talk at .live today
Hi
the biggest mistake is not to use an aggregation pipeline. Many, literally over 50% of the projects I see, have the same pattern: SQL people try out MongoDB. They limit their test to CRUD operations, some do not even know that there is something like an aggregation pipeline, other feel lost in too many curly braces …
Cheers
Michael
PS if you want to get a concrete answer on the biggest mistakes
I find these most often:
find
, when aggregation operators need to be used.$map
, $reduce
and $filter
, when working with array fields.Yes - The aggregation pipeline is a hidden gem
Hi
this would be nice to have in VSCode with the MongoDB plugin
I don’t have any written rules. Its is about aesthetics and about code readability. These are subjective. I guess, proper indentation is the simple way of saying it.
An aggregation stage is too long (vertically), this makes a stage more than a page length or with all operators and fields put together and too wide (horizontally). Both cases affect code readability. When the readability of code is affected, it in turn affects the maintainability.
So, what is the correct indentation? For example, the following are three samples of the same code with different indentation. The first and last are either too wide or too long to read clearly. The second sample, fits within a page and I see it is indented appropriately. A set of related code when it fits within a page and also properly indented is good readability, the eye can glance from top to bottom of the page and get the meaning of the code. If scrolling is involved it becomes awkward and then difficult.
Sample 1:
db.collection.aggregate([
{
$group: {
_id: "$workers.order",
order_avg: { $avg: { $subtract: [ "$workers.endAt", "$workers.startAt" ] } },
global_values: { $addToSet: { some_id: "$_id", duration: { $subtract: [ "$endAt", "$startAt" ] } } },
another_field: { ... ... ...}
}
}
])
Sample 2:
{
$group: {
_id: "$workers.order",
order_avg: {
$avg: {
$subtract: [ "$workers.endAt", "$workers.startAt" ]
}
},
global_values: {
$addToSet: {
some_id: "$_id",
duration: { $subtract: [ "$endAt", "$startAt" ] }
}
},
another_field: {
... ... ...
}
}
}
Sample 3:
{
$group: {
_id: "$workers.order",
order_avg: {
$avg: {
$subtract: [
"$workers.endAt",
"$workers.startAt"
]
}
},
global_values: {
$addToSet: {
some_id: "$_id",
duration: {
$subtract: [
"$endAt",
"$startAt"
]
}
}
},
another_field: {
...
...
...
}
}
}
One thing I do to make it more readable is to assign each stage a variable and have the pipeline be an array of my variables. For example:
match = { "$match" : { ... } } ;
sort = { "$sort" : { ... } } ;
lookup = { "$lookup" : { ... } } ;
group = { "$group" : { ... } } ;
pipeline = [ match , lookup , sort , group ] ;
db.collection.aggregate( pipeline ) ;
I find it is easier to modify a stage because it is by itself rather than being embedded in a myriad of braces. I can also easily remove a stage from the pipeline. As for indentation, being as old as I am, I prefer the K&R/Allman braces style. So it would be, taking Prasad_Saya example:
group =
{
$group:
{
_id: "$workers.order",
order_avg:
{
$avg: { $subtract: [ "$workers.endAt", "$workers.startAt" ] }
},
global_values:
{
$addToSet:
{
some_id: "$_id",
duration: { $subtract: [ "$endAt", "$startAt" ] }
}
},
another_field:
{
... ... ...
}
}
} ;
And I like to indent with tabs, which makes me not like yaml and python very much. B-)
Done