Lab - $group and Accumulators - should we actually use stdDevPop?


In this exercise, the instructions say “Use the sample standard deviation expression”. I know this is not a course about statistics, and I’m not an expert in that field either, but I wonder whether, for this example, we should use the population standard deviation instead.

As explained here:

…you would normally calculate the population standard deviation if: (1) you have the entire population or (2) you have a sample of a larger population, but you are only interested in this sample and do not wish to generalize your findings to the population.

In the lab, we are in situation 2: we are interested in the sample of movies with Oscars, and we do not want to generalize that value to the population. That is, we don’t want to generalize that value to movies without Oscars.

On second thought, though, maybe we should use stdDevSamp, because it is understood that we don’t have the entire population of movies in our database.

If any expert in statistics reads this, please clarify. Thanks! :slight_smile:

Bingo! :star_struck: :beers:

There’s the keyword:

Can you elaborate, 007_jb?

Hi @fmaylinch,

What I’m highlighting is that the choice of StDev is determined by the requirement, the dataset, and whether you want a biased or unbiased estimate of the variance.

In this lab, the dataset doesn’t completely represent a real-life example. We’re probably going to want an unbiased variance, since we are sampling, and, most importantly, the requirement is to use Sample StDev. There’s nothing stopping us from using Population StDev if we were to treat the subset of data as a population. When measuring deviation, it mainly boils down to the use case and how we wish to sample.
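
To make the biased vs. unbiased point concrete, here is a minimal Python sketch of the arithmetic behind the two choices. It is not the lab solution: the `ratings` list is made up for illustration and the `std_dev` helper is hypothetical. As far as I understand it, `$stdDevPop` divides the sum of squared deviations by n, while `$stdDevSamp` divides by n - 1, and that is the only difference shown here.

```python
import math
import statistics

# Hypothetical ratings, invented for illustration (not the lab's data).
ratings = [6.5, 7.0, 7.5, 8.0, 8.5]


def std_dev(values, sample):
    """Standard deviation with divisor n (population) or n - 1 (sample)."""
    mean = sum(values) / len(values)
    squared_deviations = sum((v - mean) ** 2 for v in values)
    divisor = len(values) - 1 if sample else len(values)
    return math.sqrt(squared_deviations / divisor)


population_sd = std_dev(ratings, sample=False)  # divisor n
sample_sd = std_dev(ratings, sample=True)       # divisor n - 1

print(f"population: {population_sd:.4f}")
print(f"sample:     {sample_sd:.4f}")
```

The sample figure is always a little larger for the same data, and the gap shrinks as the number of documents grows, so for a large collection the two accumulators give nearly the same answer.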

For this lab, I really wouldn’t worry about the theory as to whether Population or Sample StDev is correct.