Chapter 2 : Bringing it all together - don't understand what should be normalized

Hello,

The last lab from chapter 2 isn’t clear to me.
I don’t understand how the imdb.rating can be normalized with respect to imdb.votes.

I understand from the scaling.js file that the “imdb.votes” field must be rescaled. Movies with few votes get a normalized value close to 1, movies with many votes get a value close to 10.
However, I don’t see how this can have an effect on the rating. Does a rating of e.g. 1.7 have to be interpreted differently if there are more or fewer votes?

Furthermore, the last line in the scaling.js file says:

normalized_rating = average(scaled_votes, imdb.rating)

I don’t see what an average of two “distinct” values could mean. Is this a function call? An average of two arguments on different scales doesn’t really make sense. Is this a formula? Then the comma must be some kind of typo?
Perhaps it’s something obvious, but at this moment I don’t see what is meant by this at all.

Thank you for explaining what is meant by this “normalization”.

Bart

imdb.rating and scaled_votes are two separate values, each in the range [1,10].

I think the question is asking you to calculate a simple average of these two values for each document. As it says in the handout, you can use $avg within $addFields. Passing an array of two values as the argument to $avg is one way to compute the average of two values.

Note that you’re not averaging anything with respect to the entire collection, just computing an average of two values within a single document, and sorting by this computed average to find the document with the lowest average.
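
To make the per-document averaging concrete, here is a minimal sketch of such a stage, written as a plain JavaScript object of the kind passed to aggregate() in the mongo shell. It assumes an earlier stage has already produced scaled_votes and kept imdb.rating. The helper averageOfTwo is hypothetical and only mirrors what $avg does with two numeric operands; the sample values are made up.

```javascript
// Sketch of an $addFields stage that averages two fields of the same document.
// Assumes an earlier stage already added `scaled_votes` and kept `imdb.rating`.
const addNormalizedRating = {
  $addFields: {
    normalized_rating: { $avg: ["$scaled_votes", "$imdb.rating"] }
  }
};

// Hypothetical plain-JavaScript equivalent, for checking one document by hand.
function averageOfTwo(scaledVotes, rating) {
  return (scaledVotes + rating) / 2;
}

// Made-up values for a single document.
console.log(averageOfTwo(3, 7)); // 5
```

Nothing here aggregates across documents: the stage is evaluated once per document, which is why no $group is needed.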

That’s what I did, and got the correct answer. I must admit that I was initially just as confused as you with the way the question was worded.

Thanks, your explanation makes it clear.
I do have my doubts about their algorithm. Apparently they want the number of votes to count for 50% of the normalized rating, which seems rather silly to me. A very bad movie with a lot of votes (scaled votes = 10) may get a better normalized rating than a good movie with only a few votes.
In this exercise, the logic behind the algorithm isn’t very important, but it could have been explained better.
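
A quick worked example of that objection, as a sketch with made-up numbers. It assumes the equal-weight average discussed in this thread; normalizedRating is a hypothetical helper, not something from scaling.js.

```javascript
// Equal-weight average of scaled votes and rating, as discussed in this thread.
function normalizedRating(scaledVotes, rating) {
  return (scaledVotes + rating) / 2;
}

// Made-up values: a poorly rated movie with the maximum scaled votes,
// and a well rated movie with the minimum scaled votes.
const badButPopular = normalizedRating(10, 2.0);  // (10 + 2) / 2
const goodButObscure = normalizedRating(1, 8.0);  // (1 + 8) / 2

console.log(badButPopular, goodButObscure);  // 6 4.5
console.log(badButPopular > goodButObscure); // true
```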

Bart

I agree. The algorithm could be explained a little better, both in the comments and through the variable names, in the context of the problem.

Thank you very much for your feedback. I will pass this on to the curriculum developers.

José Carlos

Another thing to consider is that the problem description refers to the released date, but the answer detail uses the year field.

I too am confused about the requirements of this lab. An example of a particular record’s calculation might show what the instructor is trying to have students achieve.

I agree with @Bart_22366 - silly is a good word. The algorithm in the requirements is not practical, and I can’t imagine this kind of information being very useful. I was expecting the lab to be a bit more realistic, and this caused me to stumble. Once I read these comments I realized the requirements are academic at best and nowhere near a real-world problem, and I was able to continue.

normalized_rating = average(scaled_votes, imdb.rating)

This piece of code from scaling.js confused me a bit. I kept using the formula { "$divide": [ scaled_votes, imdb.rating ] }, but the posts in this thread say otherwise. After reading them I was able to complete the lab.

Hi,

The idea of re-scaling is to reduce the range of votes, which goes from 5 to 1521105 (this is the maximum number of votes, which corresponds to “The Shawshank Redemption”). The idea is to reduce this range to fall into 1-10.
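
As a rough sketch of that rescaling: the bounds 5 and 1521105 are the ones quoted above, while the linear min-max formula is an assumption about what scaling.js does. The names scaleVotes, VOTES_MIN, VOTES_MAX and scaledVotesExpr are made up for illustration.

```javascript
// Linear min-max rescaling of a vote count into [1, 10].
// Bounds come from this thread; the linear formula itself is an assumption.
const VOTES_MIN = 5;
const VOTES_MAX = 1521105;

function scaleVotes(votes, min = 1, max = 10) {
  return min + (max - min) * ((votes - VOTES_MIN) / (VOTES_MAX - VOTES_MIN));
}

console.log(scaleVotes(5));       // 1
console.log(scaleVotes(1521105)); // 10

// The same arithmetic written as an aggregation expression on imdb.votes.
const scaledVotesExpr = {
  $add: [
    1,
    {
      $multiply: [
        9,
        {
          $divide: [
            { $subtract: ["$imdb.votes", VOTES_MIN] },
            { $subtract: [VOTES_MAX, VOTES_MIN] }
          ]
        }
      ]
    }
  ]
};
```

The subtraction and division have to happen before the multiplication by 9 and the addition of 1, otherwise the result no longer lands in [1, 10].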

José Carlos

Hi Team, I didn’t understand the concept of scaling and normalization. Can anyone explain it to me in detail?

I was getting a wrong answer (“Blues Story”) until I realized that I had not included “imdb.rating” in my projection, so it was not available in the next stage for the average calculation with “scaled_votes”. Perhaps this will help somebody.
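
One plausible explanation for that kind of wrong answer, sketched below: $avg skips missing and non-numeric operands instead of failing, so if the rating is projected away the "average" silently becomes scaled_votes on its own. The function avgLikeMongo is a hypothetical emulation of that behaviour, and the numbers are made up.

```javascript
// Hypothetical emulation of how $avg treats its operands: non-numeric and
// missing values are skipped rather than counted as zero.
function avgLikeMongo(values) {
  const nums = values.filter((v) => typeof v === "number");
  return nums.length ? nums.reduce((a, b) => a + b, 0) / nums.length : null;
}

console.log(avgLikeMongo([4, 2]));         // 3  (both fields present)
console.log(avgLikeMongo([4, undefined])); // 4  (rating was projected away)
```

Keeping "imdb.rating": 1 in the $project stage, or computing scaled_votes with $addFields so that existing fields are retained, avoids the problem.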