What are some of the biggest mistakes people make in aggregation pipelines?

What are some of the biggest mistakes people make in aggregation pipelines? How can people avoid these mistakes?

Paging @Asya_Kamsky after her fantastic talk at .live today

Hi
the biggest mistake is not to use an aggregation pipeline. Many, literally over 50% of the projects I see, have the same pattern: SQL people try out MongoDB. They limit their test to CRUD operations, some do not even know that there is something like an aggregation pipeline, other feel lost in too many curly braces …
Cheers
Michael

PS if you want to get a concrete answer on the biggest mistakes

  1. $match is not user as the first stage
  2. in correct use of $unwind
3 Likes

I find these most often:

  • Trying to use the operators from the MQL’s find, when aggregation operators need to be used.
  • Having too many stages - explains the logic, but inefficient.
  • Avoid using the $map, $reduce and $filter, when working with array fields.
  • Bad formatting of the query - not very readable (there can be some kind formatting rule! or even an Aggregation Query Formatter tool).
2 Likes

Yes - The aggregation pipeline is a hidden gem

1 Like

@Prasad_Saya Thanks for sharing! What formatting rules do you use?

Hi

this would be nice to have in VSCode with the MongoDB plugin

2 Likes

I don’t have any written rules. Its is about aesthetics and about code readability. These are subjective. I guess, proper indentation is the simple way of saying it.

An aggregation stage is too long (vertically), this makes a stage more than a page length or with all operators and fields put together and too wide (horizontally). Both cases affect code readability. When the readability of code is affected, it in turn affects the maintainability.

So, what is the correct indentation? For example, the following are three samples of the same code with different indentation. The first and last are either too wide or too long to read clearly. The second sample, fits within a page and I see it is indented appropriately. A set of related code when it fits within a page and also properly indented is good readability, the eye can glance from top to bottom of the page and get the meaning of the code. If scrolling is involved it becomes awkward and then difficult.

Sample 1:

db.collection.aggregate([
  { 
      $group: {
           _id: "$workers.order", 
           order_avg: { $avg: { $subtract: [ "$workers.endAt", "$workers.startAt" ] } },
           global_values: { $addToSet: { some_id: "$_id",  duration: { $subtract: [ "$endAt", "$startAt" ] } } },
           another_field: { ... ... ...}
      } 
  }
])

Sample 2:

  { 
      $group: {
           _id: "$workers.order", 
           order_avg: { 
              $avg: { 
                  $subtract: [ "$workers.endAt", "$workers.startAt" ] 
              } 
           },
           global_values: { 
               $addToSet: {
                   some_id: "$_id",  
                   duration: { $subtract: [ "$endAt", "$startAt" ] } 
               } 
           },
          another_field: { 
              ... ... ...
          }
      } 
  }

Sample 3:

  { 
      $group: {
           _id: "$workers.order", 
           order_avg: { 
              $avg: { 
                  $subtract: [ 
                      "$workers.endAt", 
                      "$workers.startAt" 
                  ] 
              } 
           },
           global_values: { 
               $addToSet: {
                   some_id: "$_id",  
                   duration: { 
                       $subtract: [ 
                           "$endAt", 
                            "$startAt" 
                       ] 
                   } 
               } 
           },
           another_field: { 
              ...
              ... 
              ...
          }
      } 
  }
2 Likes

One thing I do to make it more readable is to assign each stage a variable and have the pipeline be an array of my variables. For example:

match = { "$match" : { ... } } ;
sort = { "$sort" : { ... } } ;
lookup = { "$lookup" : { ... } } ;
group = { "$group" : { ... } } ;
pipeline = [ match , lookup , sort , group ] ;
db.collection.aggregate( pipeline ) ;

I find it is easier to modify a stage because it is by itself rather than being embedded in a myriad of braces. I can also easily remove a stage from the pipeline. As for indentation, being as old as I am, I prefer the K&R/Allman braces style. So it would be, taking Prasad_Saya example:

group = 
{ 
    $group:
    {
        _id: "$workers.order", 
        order_avg:
        { 
            $avg: { $subtract: [ "$workers.endAt", "$workers.startAt"  ] } 
        },
        global_values:
        { 
            $addToSet:
            {
                some_id: "$_id",  
                duration: { $subtract: [ "$endAt", "$startAt" ] } 
            } 
        },
        another_field:
        { 
              ... ... ...
        }
    } 
} ;

And I like to indent with tabs, which makes me not like yaml and python very much. B-)

3 Likes

Love that idea! You can submit feature requests at https://feedback.mongodb.com/

Done :slight_smile:

2 Likes