Problem solving Lab: Using Cursor-like Stages

I'm working through this lab. Here is the pseudocode I wrote for my approach to the problem:

  1. $match tomatoes.viewer.rating greater than 3 and released in USA
  2. Add variable called num_favs
  3. If Sandra Bullock is in cast add 1 to num_favs
  4. If Tom Hanks is in cast add 1 to num_favs
  5. If Kevin Spacey is in cast add 1 to num_favs
  6. If George Clooney is in cast add 1 to num_favs
  7. If Julia Roberts is in cast add 1 to num_favs
  8. Sort results by num_favs then tomatoes.viewer.rating then title

Based on that, I wrote this aggregation pipeline:

    {$match: [{$tomatoes.viewer.rating: {$gt: 3}}, {countries: {$in: ["USA"]}}]},
    {$project: {"_id": 0, "num_favs": 1}},
    {$cond: {if: {cast: {$in: "Sandra Bullock"}}, then: {$inc: {"$num_fav": 1}}}},
    {$cond: {if: {cast: {$in: "Tom Hanks"}}, then: {$inc: {"$num_fav": 1}}}},
    {$cond: {if: {cast: {$in: "Kevin Spacey"}}, then: {$inc: {"$num_fav": 1}}}},
    {$cond: {if: {cast: {$in: "George Clooney"}}, then: {$inc: {"$num_fav": 1}}}},
    {$cond: {if: {cast: {$in: "Julia Roberts"}}, then: {$inc: {"$num_fav": 1}}}},
    {$sort: { "num_fav": -1, "tomatoes.viewer.rating": -1, "title": -1 }}

When I run it in my terminal, I get this error: “uncaught exception: SyntaxError: missing : after property id :”

Any idea what I’m doing wrong here?

The error message is not the most helpful, but you have a missing or extra brace or bracket at character 41 of the first line.

In the mongo shell, which is JS-based, it helps to use variables to make big aggregations easier to read and modify.

For example:

rating = { "tomatoes.viewer.rating" : { "$gt" : 3 } }
countries = { "countries" : { "$in" : [ "USA" ] } }
match = { "$match" : { ...rating , ...countries /* , ... */ } }

While writing the above example, I noticed that $tomatoes.viewer.rating is not between quotes. Quotes are not mandatory for field names, but they are when the name contains a dot. So this might be the cause of the syntax error, rather than a missing or extra brace or bracket.
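
A minimal sketch of that quoting rule in plain JavaScript, runnable outside the shell. It assumes $match takes plain field names (no leading $) as keys; the values come from the lab pseudocode, and nothing here talks to the database.

```javascript
// A key containing a dot must be quoted in a JavaScript object literal.
// Operator names such as $gt are valid identifiers, so quotes are optional there.
const rating = { "tomatoes.viewer.rating": { $gt: 3 } };
const countries = { countries: { $in: ["USA"] } };

// Spread both conditions into one $match document.
const match = { $match: { ...rating, ...countries } };

console.log(JSON.stringify(match, null, 2));
```

Writing the key without quotes, as in { tomatoes.viewer.rating: ... }, is what triggers a "missing : after property id" style of syntax error.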

Even if the $inc approach worked ($inc is an update operator, not an aggregation expression), imagine having a favorites array of 25 actors. See for a better approach.

Good advice on variables. I think I’m getting closer but am not there yet. I’m entering the following but then just get a … in the terminal. Any further suggestions?

var favorites = ['Sandra Bullock', 'Tom Hanks', 'Julia Roberts', 'Kevin Spacey', 'George Clooney']
var rating = { "$tomatoes.viewer.rating" : { $gt : 3 } }
var countries = { "$countries" : { $in : [ "USA" ] } }

var pipeline = [{$match: {rating, countries}}, 
    {$addFields: {'favs_in_cast': {$setIntersection: [favorites, '$cast']}}}, 
    {$addFields: {'num_favs': {$size: '$favs_in_cast'}}}, 
    {$sort: {'num_favs': -1, 'tomatoes.viewer.rating': -1, 'title': -1}}, 
    {$limit: 25}]

Those three dots indicate that your command is not complete.
You are missing a closing curly brace } at the end.
Check again.
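
Separately from the unfinished input, it may be worth checking what {rating, countries} actually produces. A small sketch in plain JavaScript; the two variables are copied from the post above, and nothing here talks to the database.

```javascript
const rating = { "$tomatoes.viewer.rating": { $gt: 3 } };
const countries = { "$countries": { $in: ["USA"] } };

// Property shorthand nests each variable under its own name.
const shorthand = { rating, countries };
// Spread syntax copies the keys of each variable into one object.
const spread = { ...rating, ...countries };

console.log(Object.keys(shorthand)); // [ 'rating', 'countries' ]
console.log(Object.keys(spread));    // [ '$tomatoes.viewer.rating', '$countries' ]
```

With the shorthand form, the $match document ends up with keys literally named rating and countries, which is probably not what was intended.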

My query is very similar to yours, but when I try to run it, it throws an error saying: The argument to $size must be an array, but was of type: null

It means $setIntersection is somehow not returning the result we are hoping for. Did you face a similar issue?

Here is a hint to what is happening.

Consider the following documents:

{ _id: ObjectId("621266ba86b239a31605233f"), a: [ 1 ], b: null }
{ _id: ObjectId("621266c386b239a316052340"), a: [ 1 ] }
{ _id: ObjectId("6212678286b239a316052341"), a: [ 1 ], b: [] }

Run the following aggregation:

c.aggregate( { '$addFields' :
    { 'result' : { '$setIntersection' : [ '$a' , '$b' ] } }
} )

From the results, you should be able to figure out where the null in the error message comes from.
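
For readers without a shell at hand, here is a plain JavaScript model of the three documents. It is not the server implementation; it assumes $setIntersection yields null when any operand is null or missing, and the helper name setIntersection is made up for this sketch.

```javascript
// Plain JavaScript model of the hint, NOT the server implementation.
// Assumption: the result is null when any operand is null or missing.
function setIntersection(a, b) {
  if (a == null || b == null) return null; // == null also matches undefined
  return [...new Set(a)].filter((x) => b.includes(x));
}

const docs = [
  { a: [1], b: null }, // b is explicitly null
  { a: [1] },          // b is missing
  { a: [1], b: [] },   // b is an empty array
];

const results = docs.map((d) => setIntersection(d.a, d.b));
console.log(results); // [ null, null, [] ]
```

Only the third document gives an array that a later $size could accept.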

Thanks for the hint, I got it to work and I won’t comment how because that will not be good for the learning of others.

However, I did want to ask: is it okay, and is it efficient, to have multiple $match stages in your pipeline?

For the second lab in this chapter, where we need to do scaling as well, I have prepared my pipeline, but it’s not producing the expected result:

    '$match': {
      'languages': 'English',
      'imdb.rating': { '$gte': 1 },
      'imdb.votes': { '$gte': 1 },
      'released': { '$gte': new Date('1990-01-01')}
    '$addFields': {
      'scaled_votes':   {
        $add: [
            $multiply: [
                $divide: [
                  { $subtract: ['$imdb.votes', 5] },
                  { $subtract: [1521105, 5] }
    '$addFields': {
      'normalized_rating': {
        $avg: ['$scaled_votes', 'imdb.rating']
    '$sort': {
      'normalized_rating': 1
    '$limit': 1

Can you tell me what I am doing wrong here?

A $match stage usually weeds out documents that are of no interest, so it usually makes the pipeline more efficient. (Usually, because if you have a very complicated $match stage that weeds out only a few documents, processing the $match may cost more than letting a few odd documents go through.)

The earlier the $match stages are in the pipeline, the better, especially if you have indexes that support the query. That’s covered either later in the course or in M201, which I highly recommend.
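
To illustrate why an early filter helps, here is a toy model over an in-memory array. It is not how the server executes a pipeline; keep stands in for $match, expensive stands in for a costly later stage, and both names are made up for this sketch.

```javascript
// Toy model: count how many documents reach an "expensive" stage.
const docs = Array.from({ length: 1000 }, (_, i) => ({ n: i }));
const keep = (d) => d.n % 100 === 0; // stands in for $match
let calls = 0;
const expensive = (d) => {
  calls += 1; // record every document this stage has to process
  return { ...d, sq: d.n * d.n };
};

// Filter last: the expensive stage sees every document.
calls = 0;
const late = docs.map(expensive).filter(keep);
const callsMatchLate = calls;

// Filter first: the expensive stage sees only the survivors.
calls = 0;
const early = docs.filter(keep).map(expensive);
const callsMatchEarly = calls;

console.log(callsMatchLate, callsMatchEarly); // 1000 10
```

Both orders return the same 10 documents; only the amount of work differs.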

When trying to debug a pipeline, I start by removing some stages, like $limit and $sort to see if the results make sense. I look at computed values, in this case normalized_rating and scaled_votes to see if the values make sense.
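
One way to do that without retyping the pipeline is to filter the stages in plain JavaScript. This is only a sketch; the pipeline below is a hypothetical placeholder, not the lab solution.

```javascript
// Hypothetical pipeline; the stage contents are placeholders.
const pipeline = [
  { $match: { languages: "English" } },
  { $addFields: { scaled_votes: "$imdb.votes" } },
  { $sort: { scaled_votes: 1 } },
  { $limit: 1 },
];

// Drop the stages that hide the intermediate values.
const hidden = ["$sort", "$limit"];
const debugPipeline = pipeline.filter(
  (stage) => !hidden.includes(Object.keys(stage)[0])
);

console.log(debugPipeline.map((stage) => Object.keys(stage)[0]));
// [ '$match', '$addFields' ]
```

In the shell, debugPipeline could then be passed to aggregate on the collection (assuming it is named movies, as in db.movies.aggregate(debugPipeline)) to inspect the computed fields.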

The only thing I could see at first is this:

The arguments of $avg are not consistent. With one of the fields you use the dollar sign, but not with the other. Since they are both fields of the documents coming into the $addFields stage, they should be accessed in the same way. I am pretty sure the $ sign way is the correct one. You can remove the $sort and $limit to see if you have numbers or something else out of this stage.
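
A toy model of why the missing dollar sign matters, in plain JavaScript. It is not the server implementation, and it rests on two assumptions: a string starting with $ is read as a field path while any other string is a literal, and $avg skips values that are not numbers. The helper names resolve and avg and the document values are made up.

```javascript
// Toy resolver, NOT the server implementation.
// Assumption: "$a.b" is a field path; any other string is a plain literal.
function resolve(doc, operand) {
  if (typeof operand !== "string" || !operand.startsWith("$")) return operand;
  return operand
    .slice(1)
    .split(".")
    .reduce((value, key) => (value == null ? value : value[key]), doc);
}

// Assumption: the average skips values that are not numbers.
function avg(doc, operands) {
  const nums = operands
    .map((o) => resolve(doc, o))
    .filter((v) => typeof v === "number");
  return nums.length ? nums.reduce((s, v) => s + v, 0) / nums.length : null;
}

const doc = { scaled_votes: 2, imdb: { rating: 8 } }; // made-up values
console.log(avg(doc, ["$scaled_votes", "imdb.rating"]));  // 2
console.log(avg(doc, ["$scaled_votes", "$imdb.rating"])); // 5
```

Under these assumptions the version without the dollar sign silently averages a single number, which would explain a result that runs but looks wrong.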

Thanks @steevej for the help. It was a typo like always. Thanks for pointing it out and also for the explanation regarding the use of match stage. Appreciate it.
