How to reduce multiple aggregation stages

manoj.gupta.91 · July 24, 2021, 5:02pm

Hi All,

I’ve good experience in RDBMS but new to MongoDB. Please excuse me if I’m relating many things of MongoDB to RDBMS. I’ve one of the problem statement as below and then few questions on the same. Please help me in understanding them one by one.

Problem Statement :

Count number of movies from movies collection where released year is greater than or equal to 1990.

My Solution to the problem:

db.movies.aggregate([
    {
        $match: 
        {
			"released": {"$exists": true}
		}
	},
	{
		"$addFields":
		{
			"release_year":{$year:"$released"}
		}
	},
    {
        $match: 
        {
			"release_year":{"$gte":1990}
		}
	}
]).itcount()

My Queries :

Is there is shorter way to do this where I can merge these stages?

If it was RDBMS (eg. Oracle) I would have done this as (SELECT COUNT(1) FROM MOVIES WHERE TO_NUMBER(TO_CHAR(released,‘YYYY’)) >= 1990). I mean extraction of year part from date and the condition comparison in one step/stage.

If I keep on adding stages unnecessarily (due to not knowing shorter ways to do things) will that not degrade performance?

Because output from one stage will again be processed in memory to calculate next stage and similarly for other stages. Whereas reducing number of stages should reduce repetition in processing. Isn’t it?

This is just an example for understanding simple problem and concept please don’t relate it with real life scenarios, I know those will be much more complex than this.

Regards
Manoj

s77rt · July 24, 2021, 10:04pm

Hi Manoj,
regarding your first question:
you don’t need aggregate here, just store the date as it is in the released field (or keep it empty if not released yet)
since you need only the number of records (and not the records themselves) you can use db.collection.count()

db.movies.count( { released: { $gte: new Date('01/01/1990') } } )

Takis · July 25, 2021, 1:55am

Hello : )

Aggregations stages can seen easier at first,but aggregation operators are so powerful,and useful,you can nest use variables etc you dont need to use $addFields for example for variables

A translation of your query only in 1 stage with nested aggregate operators and variables
would look like bellow.

{
      "$match": {
        "$expr": {
          "$and": [
            {
              "$ne": [
                {
                  "$type": "$released"
                },
                "missing"
              ]
            },
            {
              "$let": {
                "vars": {
                  "released_year": {
                    "$year": "$released"
                  }
                },
                "in": {
                  "$gt": [
                    "$$released_year",
                    1990
                  ]
                }
              }
            }
          ]
        }
      }
    }

For the sort order when you have missing fields,nulls,different types etc to compare see
sort order

You can do the count in database with a $countStage in a pipeline or if not pipeline,you
can use count documents with a query like the the simple solution from @s77rt you dont need anything else here

If you want to search for a field,index is very usefull,but use null for that field not missing.
with null you can use the index,for example find the null values,with missing you cant use
the index to find the missing values

For the second mongodb does optimization you can see optimization and stages sometimes change in order or become 1 stage,like match combine.

I tried a simple benchmark with like 6 $addFields and it was very fast,faster than 6 nested $let.
So applying many stages seems to be very fast.Maybe someone can give us more details.
But variables can be used in any place,so they have another purpose

manoj.gupta.91 · July 28, 2021, 11:32am

Thanks a lot for detailed reply. It was really helpful. I’ll have to work lot on aggregation and performance tuning.

system · August 2, 2021, 11:33am

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.