Chapter 2: Basic Aggregation - Utility Stages Lab - Bringing it all together

Hi, I’m new here and still learning. Why do we need to normalize imdb.votes? Can you please help?


Hey @byezid_71920

Welcome to the forums!

From the lab

Calculate an average rating for each movie in our collection where English is an available language, the minimum imdb.rating is at least 1, the minimum imdb.votes is at least 1, and it was released in 1990 or after. You’ll be required to rescale (or normalize) imdb.votes. The formula to rescale imdb.votes and calculate normalized_rating is included as a handout.

And from the link


Since the range of values of raw data varies widely, in some machine learning algorithms, objective functions will not work properly without normalization. For example, many classifiers calculate the distance between two points by the Euclidean distance. If one of the features has a broad range of values, the distance will be governed by this particular feature. Therefore, the range of all features should be normalized so that each feature contributes approximately proportionately to the final distance.
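To make the quoted point concrete, here is a minimal Python sketch (not from the lab) of how a wide-ranging feature dominates Euclidean distance until both features are min-max scaled. The three movies, the 1 to 10 rating range, and the 0 to 100,000 vote range are made-up values chosen purely for illustration.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def min_max(value, lo, hi):
    """Rescale value from the range [lo, hi] onto [0, 1]."""
    return (value - lo) / (hi - lo)

def scale(movie):
    """Scale a (rating, votes) pair using assumed, hypothetical ranges."""
    rating, votes = movie
    return (min_max(rating, 1, 10), min_max(votes, 0, 100_000))

# Hypothetical movies as (rating, votes); not taken from the lab data set.
a = (8.0, 1000)
b = (2.0, 1100)  # very different rating, similar vote count
c = (8.0, 1500)  # same rating, somewhat more votes

# Raw distances: the vote counts swamp the rating difference,
# so b looks closer to a than c does.
raw_ab = euclidean(a, b)
raw_ac = euclidean(a, c)

# Scaled distances: both features now contribute proportionately,
# so c (same rating) is the closer neighbour.
scaled_ab = euclidean(scale(a), scale(b))
scaled_ac = euclidean(scale(a), scale(c))

print(raw_ab < raw_ac, scaled_ab > scaled_ac)
```

On the raw values, the movie with the very different rating looks like the nearer neighbour only because its vote count happens to be close; after scaling, the ordering flips.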

I believe this is to handle the fact that some movies have a lot of votes and some have very few, which could skew the rating. Think of a movie with 5 votes vs a movie with 5,000 votes. As an analogy, consider what a star rating means to you on Amazon when it is based on only 5 reviews vs 100,000.
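As a rough illustration of that idea, here is a small Python sketch of a generic min-max rescale. The handout's exact formula is not reproduced in this thread, so the target range of 1 to 10, the vote bounds, and the choice to average the scaled votes with the rating are all assumptions made for the example, and the function names are hypothetical.

```python
def rescale(value, old_min, old_max, new_min=1.0, new_max=10.0):
    """Generic min-max rescale of value from [old_min, old_max] to [new_min, new_max]."""
    return new_min + (new_max - new_min) * (value - old_min) / (old_max - old_min)

def normalized_rating(rating, votes, votes_min, votes_max):
    """Assumed combination: average the rescaled votes with the rating."""
    scaled_votes = rescale(votes, votes_min, votes_max)
    return (scaled_votes + rating) / 2

# Hypothetical bounds for the vote counts; real values would come from the data.
VOTES_MIN, VOTES_MAX = 5, 100_000

# A 9.0 rating backed by only 5 votes vs an 8.0 rating backed by 100,000 votes.
few_votes = normalized_rating(9.0, 5, VOTES_MIN, VOTES_MAX)
many_votes = normalized_rating(8.0, 100_000, VOTES_MIN, VOTES_MAX)

print(few_votes, many_votes)
```

Under these assumptions the well-reviewed movie with only a handful of votes ends up below the slightly lower-rated movie that many more people voted on, which matches the Amazon intuition.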