Describing Distributions Numerically: Measures of Variation and Boxplots

Post on 18-Jan-2016

Boxplots

Range: highest number - lowest number

Five-number summary: Minimum, Q1, Median, Q3, Maximum
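A minimal Python sketch of the range and the five-number summary. The sample values are made up for illustration, and the "inclusive" quartile method of the standard statistics module is an assumption; textbooks and software use slightly different quartile conventions, so Q1 and Q3 may differ a little.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    data = sorted(values)
    # quantiles(..., n=4) returns the three quartile cut points.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]

# Hypothetical sample, not taken from the slides.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9]

summary = five_number_summary(sample)
data_range = summary[4] - summary[0]  # highest number - lowest number

print("Five-number summary:", summary)
print("Range:", data_range)
```

These five values are exactly what a boxplot draws: the box runs from Q1 to Q3 with a line at the median.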

Boxplot Continued

Interquartile Range: IQR = Q3 - Q1

* Tells us how much territory the middle half of the data covers.

Percentile: for whole number P (where 1≤P≤99), the Pth percentile of a distribution is a value such that P% of the data fall at or below it and (100-P)% of the data fall at or above it.
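As a rough illustration, the sketch below computes the IQR for a made-up sample and checks the percentile definition for Q1, which is the 25th percentile. The data, the helper names share_at_or_below and share_at_or_above, and the "inclusive" quartile method are all assumptions chosen for the example.

```python
import statistics

def share_at_or_below(values, cutoff):
    """Fraction of the data that falls at or below cutoff."""
    return sum(1 for x in values if x <= cutoff) / len(values)

def share_at_or_above(values, cutoff):
    """Fraction of the data that falls at or above cutoff."""
    return sum(1 for x in values if x >= cutoff) / len(values)

# Hypothetical sample, not taken from the slides.
data = [2, 4, 4, 5, 7, 8, 10, 12, 15]

q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1  # width of the middle half of the data

print("Q1:", q1, "Q3:", q3, "IQR:", iqr)
# Q1 is the 25th percentile: at least 25% of the data at or below it,
# and at least 75% at or above it.
print("Share at or below Q1:", share_at_or_below(data, q1))
print("Share at or above Q1:", share_at_or_above(data, q1))
```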

Histogram

Median: splits the histogram into two halves with equal area

Mean: the point at which the histogram would balance
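One way to see the difference between the two centers is a small Python sketch with a single extreme value. The income figures below are invented for illustration only.

```python
import statistics

# Hypothetical family incomes in thousands of dollars (made up).
incomes = [30, 35, 40, 45, 50, 400]

mean_income = statistics.mean(incomes)
median_income = statistics.median(incomes)

# Balance point: the deviations from the mean cancel out.
total_deviation = sum(x - mean_income for x in incomes)

# Equal halves: as many values lie below the median as above it.
below = sum(1 for x in incomes if x < median_income)
above = sum(1 for x in incomes if x > median_income)

print("Mean:", mean_income, "Median:", median_income)
print("Sum of deviations from the mean:", total_deviation)
print("Values below / above the median:", below, above)
```

The one large income pulls the mean far above the typical value, while the median stays near the middle of the other five.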

Measures of Variation

Deviation: how far each data value is from the mean

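Variance (s²): average (almost) of the squared deviations; "almost" because the sum is divided by n - 1 rather than n: s² = Σ(x - x̄)² / (n - 1)

Standard Deviation (s): the square root of the variance: s = √( Σ(x - x̄)² / (n - 1) )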
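The three definitions can be worked through step by step in Python. This is a sketch on a made-up sample, using the usual sample formulas that divide by n - 1.

```python
import math
import statistics

# Hypothetical sample, not taken from the slides.
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)

mean = sum(data) / n

# Deviation: how far each data value is from the mean.
deviations = [x - mean for x in data]

# Variance: sum of squared deviations divided by n - 1.
variance = sum(d ** 2 for d in deviations) / (n - 1)

# Standard deviation: square root of the variance.
std_dev = math.sqrt(variance)

print("Mean:", mean)
print("Deviations:", deviations)
print("Variance:", variance)
print("Standard deviation:", std_dev)
```

The results agree with statistics.variance and statistics.stdev from the standard library, which use the same n - 1 divisor.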

Thinking about Variation…

The U.S. Census Bureau reports the median family income in its summary of census data. Why do you suppose they use the median instead of the mean? What might be the disadvantages of reporting the mean?

Thinking about Variation…

You’ve just bought a new car that claims to get a highway fuel efficiency of 31 mpg. Of course, your mileage will vary. If you had to guess, would you expect the IQR of gas mileage attained by all cars like yours to be 30 mpg, 3 mpg, or 0.3 mpg? Why?

Thinking about Variation…

A company selling a new MP3 player advertises that the player has a mean lifetime of 5 years. If you were in charge of quality control at the factory, would you prefer that the standard deviation of lifespans of the players you produce be 2 years or 2 months? Why?

Rules about shape, center, and spread

1. If the shape is skewed, report the median and IQR.

2. If the shape is symmetrical, report the mean and standard deviation. IQR is usually larger than the standard deviation.

3. If there are outliers, report the mean and standard deviation with the outliers present and with the outliers removed.
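Rule 3 can be sketched in Python as follows. The slides do not say how to identify outliers, so the common 1.5 × IQR fence rule used here is an assumption, as are the sample values and the helper name flag_outliers.

```python
import statistics

def flag_outliers(values):
    """Return values outside the 1.5 * IQR fences (an assumed rule)."""
    q1, _, q3 = statistics.quantiles(sorted(values), n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in values if x < low_fence or x > high_fence]

# Hypothetical sample with one suspicious value.
data = [10, 12, 12, 13, 14, 15, 40]

outliers = flag_outliers(data)
trimmed = [x for x in data if x not in outliers]

# Report both versions, as rule 3 suggests.
mean_with, sd_with = statistics.mean(data), statistics.stdev(data)
mean_without, sd_without = statistics.mean(trimmed), statistics.stdev(trimmed)

print("Outliers:", outliers)
print("With outliers:    mean =", mean_with, " s =", sd_with)
print("Without outliers: mean =", mean_without, " s =", sd_without)
```

Reporting both pairs lets the reader see how much a single extreme value moves the mean and inflates the standard deviation.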

Summarizing a Distribution

A man owned a 1989 Nissan Maxima for 8 years. Being a statistician, he recorded the car’s fuel efficiency (in mpg) each time he filled the tank. He wanted to know what fuel efficiency to expect as “ordinary” for his car.

Knowing this, he was able to predict when he’d need to fill the tank again, and notice if the fuel efficiency suddenly got worse, which could be a sign of trouble. What does the data say?

When comparing boxplots

• Compare the medians; which group has the higher center?

• Compare the IQRs; which group is more spread out?

• Judged by the size of the IQRs, are the medians very different?

• Check for possible outliers. Identify them if you can.
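The first three comparisons in this checklist can be sketched numerically in Python. The two groups below are invented for illustration, and the "inclusive" quartile method is an assumption.

```python
import statistics

def median_and_iqr(values):
    """Return (median, IQR) for one group."""
    q1, median, q3 = statistics.quantiles(sorted(values), n=4, method="inclusive")
    return median, q3 - q1

# Hypothetical groups, not taken from the slides.
groups = {
    "A": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "B": [4, 5, 5, 6, 6, 7, 7, 8, 20],
}

summary = {name: median_and_iqr(values) for name, values in groups.items()}

# Which group has the higher center, and which is more spread out?
higher_center = max(summary, key=lambda name: summary[name][0])
more_spread = max(summary, key=lambda name: summary[name][1])

# Judge the gap between medians against the size of the IQRs.
median_gap = abs(summary["A"][0] - summary["B"][0])

print(summary)
print("Higher center:", higher_center)
print("More spread out:", more_spread)
print("Gap between medians:", median_gap)
```

Here the gap between the medians is smaller than either IQR, so the centers are not very different relative to the spread.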

Comparing Boxplots

A student designed an experiment to test the efficiency of various coffee containers by placing hot liquid in each of 4 different container types 8 different times. After 30 minutes she measured the temperature again and recorded the difference in temperature. What can we say about the effectiveness of these four mugs?

* Because these are temperature differences, smaller differences mean that the liquid stayed hot.

Measure of Variation Continued

Coefficient of Variation: CV = (s / x̄) × 100%, the standard deviation expressed as a percentage of the mean.
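The coefficient of variation is commonly defined as the standard deviation divided by the mean, expressed as a percentage. The Python sketch below assumes that definition and a made-up sample, and shows that rescaling the data leaves the CV unchanged.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100

# Hypothetical sample, not taken from the slides.
data = [2, 4, 4, 4, 5, 5, 7, 9]
# The same measurements in a unit ten times smaller.
scaled = [x * 10 for x in data]

cv_data = coefficient_of_variation(data)
cv_scaled = coefficient_of_variation(scaled)

print("CV of data (%):", cv_data)
print("CV after rescaling (%):", cv_scaled)
```

Because the CV has no units, it can be used to compare the relative spread of measurements recorded on different scales.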

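Chebyshev’s Theorem: for any distribution and any k > 1, at least 1 - 1/k² of the data lie within k standard deviations of the mean.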
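Chebyshev's theorem, in its usual form, guarantees that at least 1 - 1/k² of the data lie within k standard deviations of the mean for any k > 1, whatever the shape of the distribution. The Python sketch below checks that bound on a deliberately skewed, made-up sample; the helper name fraction_within is hypothetical.

```python
import statistics

def fraction_within(values, k):
    """Fraction of the data within k standard deviations of the mean."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    return sum(1 for x in values if abs(x - mean) <= k * s) / len(values)

# Hypothetical, strongly skewed sample.
data = [1, 1, 2, 2, 3, 3, 4, 50]

for k in (2, 3):
    bound = 1 - 1 / k ** 2  # Chebyshev's guaranteed minimum
    observed = fraction_within(data, k)
    print(f"k={k}: observed {observed:.3f}, guaranteed at least {bound:.3f}")
```

The observed fractions are usually well above the bound, since the theorem is a worst-case guarantee that has to hold for every possible distribution.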