The Journey of 100DaysofCode aka 100DaysofMongoDB (@Aasawari_24)

Aasawari · March 3, 2022, 4:44pm

Day02 of 100DaysOfCode as 100DaysOfMongoDB

Share on twitter: https://twitter.com/Aasawar61618175

I started with the basics of aggregation and came across some amazing features that make life easier.

Conceptually, aggregation is built on the idea of a pipeline, where the output of one stage (a query or operation on the data) becomes the input of the next stage, and so on.

Aggregation has proved to be widely used in real-time analytics, Big Data, the transformation step of ETL processes, and various other applications.

The syntax and structure of an aggregation pipeline:
db.<collection_name>.aggregate( [ { stage_1 }, { stage_2 }, ..., { stage_N } ] )

I began with two aggregation stages:

$match and $project

$match should be placed as early as possible in the pipeline so that it can take advantage of indexes. Below is an example that shows $match and $project in use:

MongoDB Enterprise Cluster0-shard-0:PRIMARY> db.solarSystem.aggregate( [ { $match: { type: "Terrestrial planet" } }, { $project: { _id: 0, name: 1, orderFromSun: 1}}])
{ "name" : "Earth", "orderFromSun" : 3 }
{ "name" : "Venus", "orderFromSun" : 2 }
{ "name" : "Mercury", "orderFromSun" : 1 }
{ "name" : "Mars", "orderFromSun" : 4 }
MongoDB Enterprise Cluster0-shard-0:PRIMARY>
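
To make the pipeline idea more concrete, here is a rough sketch in plain JavaScript that mimics the $match then $project flow above. It is only an illustration and not how MongoDB works internally. The helpers match, project and runPipeline are hypothetical names I made up, and the sample documents (including Jupiter and the _id values) are made-up data.

```javascript
// Plain-JavaScript imitation of an aggregation pipeline (not MongoDB code).
// The documents below are illustrative sample data, not the real collection.
const solarSystem = [
  { _id: 1, name: "Earth", type: "Terrestrial planet", orderFromSun: 3 },
  { _id: 2, name: "Jupiter", type: "Gas giant", orderFromSun: 5 },
  { _id: 3, name: "Mars", type: "Terrestrial planet", orderFromSun: 4 },
];

// Hypothetical helper: keeps documents whose fields equal those in `query`.
const match = (query) => (docs) =>
  docs.filter((doc) =>
    Object.entries(query).every(([field, value]) => doc[field] === value)
  );

// Hypothetical helper: keeps only the listed fields (inclusion projection).
const project = (fields) => (docs) =>
  docs.map((doc) =>
    Object.fromEntries(fields.filter((f) => f in doc).map((f) => [f, doc[f]]))
  );

// Each stage receives the output of the previous stage as its input.
const runPipeline = (docs, stages) =>
  stages.reduce((current, stage) => stage(current), docs);

const result = runPipeline(solarSystem, [
  match({ type: "Terrestrial planet" }),
  project(["name", "orderFromSun"]),
]);

console.log(result);
```

Because the filtering stage runs first, the projection only has to touch the documents that survived it, which is the same reason for putting $match early in a real pipeline.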

Stages such as $project, combined with expression operators, let you calculate, process and evaluate data as per your requirements. Aggregation gives you the freedom to perform these operations without having to change the schema of the database.
Here is a query that helped me work through a big collection to find the highest average high temperature (avg_high_tmp) across 1000 cities.

MongoDB Enterprise Cluster0-shard-0:PRIMARY> db.icecream_data.aggregate( [
  {
    $project: {
      _id: 0,
      max_high: {
        $reduce: {
          input: "$trends",
          initialValue: -Infinity,
          in: {
            $cond: [
              { $gt: [ "$$this.avg_high_tmp", "$$value" ] },
              "$$this.avg_high_tmp",
              "$$value"
            ]
          }
        }
      }
    }
  }
] )

{ "max_high" : 87 }
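
The $reduce expression above took me a while to read, so here is a small plain-JavaScript sketch of the same logic as I understand it. The trends array is made-up sample data (the month field is hypothetical; only avg_high_tmp comes from the query), so treat it as an illustration rather than the real collection.

```javascript
// Plain-JavaScript equivalent of the $reduce expression (not MongoDB code).
// `trends` is made-up sample data; only avg_high_tmp comes from the query.
const doc = {
  trends: [
    { month: "January", avg_high_tmp: 42 },
    { month: "July", avg_high_tmp: 87 },
    { month: "October", avg_high_tmp: 64 },
  ],
};

// value plays the role of $$value (the accumulator),
// item plays the role of $$this (the current array element).
const maxHigh = doc.trends.reduce(
  (value, item) => (item.avg_high_tmp > value ? item.avg_high_tmp : value),
  -Infinity // same starting point as initialValue in the query
);

console.log({ max_high: maxHigh });
```

Starting from -Infinity means the first element always wins the first comparison, so the result is the largest avg_high_tmp in the array. As far as I can tell, { $max: "$trends.avg_high_tmp" } inside $project should give the same result with less typing.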

This is only a very basic look at aggregation as I understand it. I am sure there is much more to it, and I will keep posting about what I learn and the challenges I face as I get to know the aggregation framework better.

Here are a few challenges I faced while making the pipelines:

Understanding the schema of the collection.
Understanding how to use $project and apply calculations and operations to the data.
Choosing the correct fields to apply the operations on.

The approaches I followed to overcome these are:

Performing the right operation against the right schema definition.
Optimising the aggregation pipelines using different operators and syntax.

Please feel free to share your own challenges in learning aggregation. Any comments and reviews would be appreciated.

Thanks
Aasawari