Summarizing Numerical Data

STAT 20: Introduction to Probability and Statistics

Concept Questions

Describing Shape

Which of these variables do you expect to be uniformly distributed?

  1. bill length of Gentoo penguins
  2. salaries of a random sample of people from California
  3. house sale prices in San Francisco
  4. birthdays of classmates (day of the month)

Please vote at pollev.com.

Concept Activity - Measures of Center

Mean, median, mode: which is best?

It depends on your desiderata: the nature of your data and what you seek to capture in your summary.

Get out a piece of paper. You’ll be watching a 3-minute video that discusses characteristics of a typical human. Note which numerical summaries are used and what each is used for.

General Advice

  1. Means are often a good default for symmetric data.
  2. Means are sensitive to very large and very small values, so they can be deceptive on skewed data. In that case, use a median.
  3. Modes are often the only option for categorical data.
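
To make the advice above concrete, here is a minimal sketch in Python using the standard library's statistics module. The salary and species values are made up for illustration and do not come from any course data set.

```python
from statistics import mean, median, mode

# Hypothetical salaries in thousands of dollars (illustrative values only).
# The last value is a single very large observation that skews the data right.
salaries = [42, 48, 50, 50, 55, 61, 950]

salary_mean = mean(salaries)      # pulled upward by the one large value
salary_median = median(salaries)  # middle value; unaffected by how large 950 is

print("mean:", round(salary_mean, 1))
print("median:", salary_median)

# For categorical data there is no mean or median, but the mode still works.
# Hypothetical penguin species labels.
species = ["Gentoo", "Adelie", "Gentoo", "Chinstrap", "Gentoo"]
species_mode = mode(species)
print("mode:", species_mode)
```

On skewed data like this, the mean lands far from where most observations sit, while the median stays near the bulk of the data.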

But there are other notions of typical… what about a maximum?

Concept Question 3 - Measures of Spread

  • Why are measures of spread so important? Consider the following question.

Two new food delivery services open in Berkeley: Oski Eats and Cal Cravings. A friend of yours who took Stat 20 collected data on each and noted that Oski Eats has a mean delivery time of 29 minutes and Cal Cravings has a mean delivery time of 27 minutes. Which would you rather order from?

One possible reality

Which would you rather order from?
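
As a rough sketch of why the means alone may not settle the question, the Python snippet below builds two made-up sets of delivery times whose means match the 29 and 27 minutes quoted above but whose spreads differ. The individual times are assumptions chosen for illustration, not the data your friend collected.

```python
from statistics import mean, stdev

# Hypothetical delivery times in minutes (illustrative values only).
oski_eats = [27, 28, 29, 29, 30, 31]      # tightly clustered
cal_cravings = [10, 12, 15, 30, 45, 50]   # widely scattered

for name, times in [("Oski Eats", oski_eats), ("Cal Cravings", cal_cravings)]:
    center = mean(times)
    spread_sd = stdev(times)                # sample standard deviation
    spread_range = max(times) - min(times)  # range: largest minus smallest
    print(f"{name}: mean={center:.0f}, sd={spread_sd:.1f}, range={spread_range}")
```

Under these assumed numbers, the service with the lower mean is also the one where a delivery might take close to an hour, which is the kind of information a measure of spread captures and a mean does not.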

Worksheet: Summarizing Numerical Data

25:00

Break

05:00

Lab: Computing on the Data

25:00