Chapter 4: Describing Distributions
4.1 Graphs: good and bad
4.2 Displaying distributions with graphs
4.3 Describing distributions with numbers
Dow Jones Industrial Average
Pie Graph
Definitions
Types of variables:
- Categorical (e.g., gender, type of degree)
- Quantitative (e.g., time, mass, force, dollars)
The distribution of a variable tells us what values it takes and how often it takes these values.
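To make "what values it takes and how often" concrete, here is a minimal Python sketch that tabulates the distribution of a categorical variable as counts and percents of the total. The degree-type responses are hypothetical, made up only for illustration.

```python
from collections import Counter

def distribution(values):
    """Return {value: (count, percent of total)} for each distinct value."""
    counts = Counter(values)
    total = len(values)
    return {v: (c, 100 * c / total) for v, c in counts.items()}

# Hypothetical responses for a categorical variable (type of degree).
degrees = ["BA", "BS", "BS", "MA", "BS", "BA", "PhD", "BS"]
for value, (count, pct) in distribution(degrees).items():
    print(f"{value:>4}: {count} ({pct:.1f}%)")
```

The percents always add to 100 (up to rounding), which is what makes a bar graph or pie chart of them a picture of the whole distribution.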
Bar graph showing a distribution
Education Level in U.S. (adults age 25+)
Years of Schooling        Percent of Total
No high school degree     15.9
High school only          33.1
1-3 years of college      25.4
4+ years of college       25.6
Exercises, pp. 207-208
4.1 4.5
Bar graph for 4.1: Lottery Game Sales Distribution
Type of Game    Sales (million $)
Instant         16420
3-digit         5245
4-digit         2776
Lotto           8865
Other           5134
Pie Chart for 4.1: Lottery Game Sales Distribution (percent of total)
Type of Game    Percent of Total
Instant         42.7
3-digit         13.6
4-digit         7.2
Lotto           23.1
Other           13.4
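The pie chart percentages for 4.1 come straight from the sales figures in the bar graph: each game's sales divided by total sales, times 100. A short Python sketch of that arithmetic, using the sales values shown for 4.1:

```python
# Lottery sales by game type (million $), from the bar graph for 4.1.
sales = {
    "Instant": 16420,
    "3-digit": 5245,
    "4-digit": 2776,
    "Lotto": 8865,
    "Other": 5134,
}

total = sum(sales.values())  # total sales across all game types
percents = {game: round(100 * amount / total, 1) for game, amount in sales.items()}
for game, pct in percents.items():
    print(f"{game:>8}: {pct:5.1f}%")
```

Total sales come to 38,440 million dollars, and the rounded percents match the pie chart (42.7, 13.6, 7.2, 23.1, 13.4).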
Misleading Pictogram (p. 209)
[Pictogram: Worker Salary $2000/mo; Manager Salary $4000/mo]
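One common way a pictogram like this misleads: if the manager's icon is drawn twice as tall *and* twice as wide (an assumption here, since the picture itself is not reproduced), its area is four times the worker's, even though the salary is only twice as large. A small Python sketch of that arithmetic:

```python
worker_salary = 2000   # $/month
manager_salary = 4000  # $/month

ratio = manager_salary / worker_salary  # the data say "twice as much"

# Assumption: the manager's icon is the worker's icon scaled by `ratio`
# in BOTH height and width, as in a typical misleading pictogram.
area_ratio = ratio * ratio  # what the eye compares is area
print(f"Salary ratio: {ratio:.0f}x, apparent (area) ratio: {area_ratio:.0f}x")
```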
Dow Jones Industrial Average: This is a line graph (p. 210)
Misleading Graphs?
Salaries are Going Up!
[Graph: Monthly Salary ($) vs. Year, 1994 to 2004; vertical axis runs from 2000 to 2350]
Salaries Barely Increased
[Graph: Monthly Salary ($) vs. Year, 1994 to 2004; vertical axis runs from 0 to 3000]
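The difference between the two salary graphs is the vertical axis range (2000 to 2350 versus 0 to 3000). The sketch below uses hypothetical salaries ($2100/mo in 1994 and $2300/mo in 2004; these values are made up, not read from the slide) to show how much of each axis the same increase would cover.

```python
def share_of_axis(low_value, high_value, axis_min, axis_max):
    """Fraction of the vertical axis spanned by the change from low_value to high_value."""
    return (high_value - low_value) / (axis_max - axis_min)

# Hypothetical salaries: $2100/mo in 1994, $2300/mo in 2004.
start, end = 2100, 2300

going_up = share_of_axis(start, end, 2000, 2350)  # axis of "Salaries are Going Up!"
barely = share_of_axis(start, end, 0, 3000)       # axis of "Salaries Barely Increased"
print(f"{going_up:.0%} of the axis vs. {barely:.0%} of the axis")
```

The same $200 rise fills more than half of the first axis but only a sliver of the second, so the first graph looks dramatic and the second looks flat.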
Making good graphs (p. 213)
Graphs must have labels, legends, and titles.
Make the data stand out. Pay attention to what the eye sees.
3-D is really not necessary!
Exercises, pp. 214-216
4.6 through 4.8
Homework
Problems 4.11 and 4.15, pp. 219-221, to be done in Excel. Email the Excel file by class time on Monday.
Section 4.2 Reading, pp. 221-242
4.2 Displaying Distributions with Graphs
Displaying distributions graphically
The distribution of a variable tells us what values it takes and how often it takes these values.
Ways to display distributions for quantitative variables: dotplots, histograms, stemplots.
See example on pp. 221-222.
Figure 4.15: A histogram
Figure 4.16: A stemplot
Histograms
Most common graph of the distribution of a quantitative variable.
How to make a histogram: Example 4.9, p. 224
Range: 5.7 to 17.6
Shoot for 6-15 classes (bars).
Read paragraph on p. 226
Size of intervals: $\frac{17.6 - 5.7}{10} = 1.19 \approx 1$
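The class-width calculation above (range divided by about 10 classes, then rounded to a convenient width) can be sketched in Python. The data are hypothetical, chosen only to have the same range as Example 4.9 (5.7 to 17.6). The rule that a value on a class boundary goes into the class on its right is an assumption; check the convention your text uses.

```python
import math

def histogram_counts(data, width, start):
    """Count observations in classes [start, start+width), [start+width, start+2*width), ...

    Assumption: a value exactly on a class boundary goes into the class on its right.
    """
    n_classes = math.floor((max(data) - start) / width) + 1
    counts = [0] * n_classes
    for x in data:
        counts[math.floor((x - start) / width)] += 1
    return counts

# Hypothetical data with the same range as Example 4.9 (5.7 to 17.6).
data = [5.7, 6.2, 7.9, 8.1, 8.4, 9.0, 9.6, 10.3, 10.8, 11.5, 12.2, 13.0, 14.7, 17.6]

rough_width = (max(data) - min(data)) / 10  # about 1.19 for ten classes
width = 1                                    # round to a convenient width
start = math.floor(min(data))                # first class starts at 5

for i, count in enumerate(histogram_counts(data, width, start)):
    low = start + i * width
    print(f"{low:>2} to <{low + width:>2}: {'*' * count}")
```

With width 1 and a start of 5, this gives 13 classes, inside the suggested 6-15 range.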
Example 4.9, pp. 224-226
Practice Problem: 4.18, p. 226
Exercise 4.18
Histogram: by hand and using the calculator
Stemplot: by hand
Interpreting the graphical displays
Concentrate on the main features:
- Overall pattern (p. 230): shape, center, spread
- Outliers: individual observations outside the overall pattern of the graph
Example 4.10, p. 230
Shape
Is it symmetric or skewed (p. 231)?
Is it unimodal (one hump) or bimodal (two humps)?
Homework
Reading: pp. 221-242
Stemplots
Usually reserved for smaller data sets.
Advantage: the actual (or rounded) data values are shown.
Possible drawback: many people are not used to this type of plot, so the presenter/writer has to explain how to read it.
How to make a stemplot, p. 236
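A minimal Python sketch of a stemplot for whole-number data, assuming the usual split: the stem is everything but the final digit and the leaf is the final digit, with leaves in increasing order. It uses Mike's data from Practice Problem 4.52 and assumes non-negative integers.

```python
from collections import defaultdict

def stemplot(data):
    """Return stemplot rows for non-negative whole numbers: stem = all but last digit, leaf = last digit."""
    leaves = defaultdict(list)
    for x in sorted(data):
        stem, leaf = divmod(x, 10)
        leaves[stem].append(leaf)
    rows = []
    # Include empty stems between the smallest and largest so gaps stay visible.
    for stem in range(min(leaves), max(leaves) + 1):
        row = " ".join(str(leaf) for leaf in leaves.get(stem, []))
        rows.append(f"{stem} | {row}".rstrip())
    return rows

mike = [59, 69, 71, 52, 65, 55, 72, 50, 75, 67, 51, 69, 68, 62, 69]
print("\n".join(stemplot(mike)))
```

Each data value can be read back from the plot (stem 6, leaf 2 is 62), which is the stemplot's advantage over a histogram.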
More problems
Exercises 4.24, 4.25, and 4.26, p. 233
Practice
Exercises 4.30, p. 239 and 4.32, p. 240
4.28, p. 238
Wrapping up Section 4.2 …
4.28, p. 238; 4.33, p. 242; 4.36; 4.37
4.3 Describing Distributions with Numbers
Until now, we’ve been satisfied with using words to describe the center and spread of distributions. Now we will use numbers to describe these characteristics of a distribution.
The 5-number summary:
- Center: the median (p. 248)
- Spread: the quartiles, Q1 and Q3 (p. 250)
- Spread: the minimum and maximum
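A minimal Python sketch of the 5-number summary, using Mike's data from Practice Problem 4.52. It assumes the "median of each half" rule for quartiles (when n is odd, the overall median is left out of both halves), which is the rule we believe the textbook and the TI-83/84 use. Software that interpolates, such as Excel's QUARTILE function, may report slightly different quartiles.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max).

    Q1 and Q3 are the medians of the lower and upper halves of the ordered data;
    when n is odd, the overall median is left out of both halves.
    """
    xs = sorted(data)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower = xs[:half]
    upper = xs[half + 1:] if n % 2 else xs[half:]
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

mike = [59, 69, 71, 52, 65, 55, 72, 50, 75, 67, 51, 69, 68, 62, 69]
print(five_number_summary(mike))
```

For Mike's 15 values, the median is the 8th ordered value (67), and each quartile is the median of the 7 values on its side.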
Boxplots
We can use this information to construct a boxplot:
Practice
4.46, p. 254
Enter the data in the Stat Edit menu of your calculator, and put them in order.
Boxplot vs. Modified Boxplot
The modified boxplot shows outliers, which are marked with a *. The lines extending from the quartiles go to the last value that is not an outlier.
If there are no outliers, the modified boxplot and the regular boxplot are identical.
Below are a boxplot (on the left) and modified boxplot (on the right) for Problem 4.39, p. 245.
Side-by-side boxplots (p. 252)
Practice
Exercises 4.49 and 4.50, p. 256
Testing for Outliers
Find the interquartile range: IQR = Q3 - Q1
Multiply: 1.5 * IQR
Outliers on the low side: values below Q1 - 1.5 * IQR
Outliers on the high side: values above Q3 + 1.5 * IQR
Are there any numbers outside these values? If so, they are outliers, and they are marked on boxplots with an asterisk. The tail is drawn to the highest (or lowest) value that is not an outlier.
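The 1.5 * IQR test and the modified-boxplot whiskers can be sketched in Python. Quartiles are computed with the same "median of each half" rule as the 5-number summary (an assumption about which rule your text uses). The second data set is hypothetical and includes a deliberate outlier.

```python
from statistics import median

def quartiles(data):
    """Q1 and Q3 as medians of the lower and upper halves (overall median excluded when n is odd)."""
    xs = sorted(data)
    half = len(xs) // 2
    upper_start = half + 1 if len(xs) % 2 else half
    return median(xs[:half]), median(xs[upper_start:])

def outlier_test(data):
    """Apply the 1.5 * IQR rule; return the fences, the outliers, and the whisker ends."""
    q1, q3 = quartiles(data)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    outliers = sorted(x for x in data if x < low_fence or x > high_fence)
    kept = [x for x in data if low_fence <= x <= high_fence]
    # Modified boxplot: whiskers stop at the most extreme values that are not outliers.
    return (low_fence, high_fence), outliers, (min(kept), max(kept))

mike = [59, 69, 71, 52, 65, 55, 72, 50, 75, 67, 51, 69, 68, 62, 69]
print(outlier_test(mike))                      # no outliers
print(outlier_test([1, 2, 3, 4, 5, 6, 30]))    # hypothetical data with one outlier
```

For Mike's data, IQR = 69 - 55 = 14, so the fences are 34 and 90 and nothing is flagged. In the hypothetical set, 30 lies above the upper fence of 12, so the upper whisker stops at 6.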
Measures of Center and Spread
Median and IQR; mean and standard deviation
The mean is the arithmetic average of the observations.
The standard deviation s measures, roughly, the average distance of the observations from their mean.
The variance is simply the square of the standard deviation.
All of these statistics can be calculated by hand, but we use technology to do these today …
We use 1-sample stats on our calculators, or a stats program.
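For reference, here is what the calculator computes, as a Python sketch. It assumes the usual sample standard deviation, which divides by n - 1 (the value labeled Sx in the TI-83/84 one-variable statistics output), and uses Mike's data from Practice Problem 4.52.

```python
import math
import statistics

def mean_and_sd(data):
    """Return the mean and the sample standard deviation s (divisor n - 1)."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    xbar = sum(data) / n
    variance = sum((x - xbar) ** 2 for x in data) / (n - 1)  # this is s squared
    return xbar, math.sqrt(variance)

# Mike's data from Practice Problem 4.52.
mike = [59, 69, 71, 52, 65, 55, 72, 50, 75, 67, 51, 69, 68, 62, 69]
xbar, s = mean_and_sd(mike)
print(f"mean = {xbar:.2f}, s = {s:.2f}, variance = {s ** 2:.2f}")

# Cross-check against Python's statistics module.
assert math.isclose(xbar, statistics.mean(mike))
assert math.isclose(s, statistics.stdev(mike))

# Identical observations have no spread, so s = 0.
print(mean_and_sd([7, 7, 7, 7]))
```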
Properties of standard deviation (p. 259)
Use s as a measure of spread when you use the mean.
If s = 0, there is no spread (all the observations are equal). The larger the value of s, the larger the spread of the distribution.
Practice Problem
4.52, p. 263
Mike: 59, 69, 71, 52, 65, 55, 72, 50, 75, 67, 51, 69, 68, 62, 69
Practice Problem
4.55, p. 263
Example 4.21, p. 265
Choosing a summary
The book has a section on which summary to use (mean and std. dev., or median with the quartiles).
I like to report all of them.
However, when writing about a distribution, or comparing distributions, we should think about which summary works best. See p. 266.
Skewed, or outliers present: use the median and quartiles.
Symmetric, no (or few) outliers: use the mean and standard deviation.
The mean and standard deviation are the most common. One reason is that more advanced statistical methods are built on them.
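Why skewed data or outliers call for the median: the mean is pulled toward a few extreme values, and the median is not. A quick Python sketch with hypothetical right-skewed data (for example, incomes in thousands of dollars), made up for illustration:

```python
from statistics import mean, median

# Hypothetical right-skewed data (e.g., incomes in $1000s) with one very large value.
incomes = [20, 22, 25, 27, 30, 200]

print(f"mean = {mean(incomes):.1f}, median = {median(incomes):.1f}")
# The single large value drags the mean far above the typical observation;
# the median stays in the middle of the bulk of the data.
```

Here the mean is 54 while five of the six values are 30 or less; the median of 26 describes a typical value much better.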
More Practice …
p. 271: 4.57, 4.58, 4.60