Chapter 4: Describing Distributions


Transcript of Chapter 4: Describing Distributions

Page 1: Chapter 4: Describing Distributions

1

Chapter 4: Describing Distributions

4.1 Graphs: good and bad

4.2 Displaying distributions with graphs

4.3 Describing distributions with numbers

Page 2: Chapter 4: Describing Distributions

2

Dow Jones Industrial Average

Page 3: Chapter 4: Describing Distributions

3

Pie Graph

Page 4: Chapter 4: Describing Distributions

4

Definitions

Types of variables:

Categorical, e.g., gender, type of degree

Quantitative, e.g., time, mass, force, dollars

The distribution of a variable tells us what values it takes and how often it takes these values.

Page 5: Chapter 4: Describing Distributions

5

Bar graph showing a distribution

[Bar graph: Education Level in U.S. (adults age 25+). Horizontal axis: Years of Schooling. Vertical axis: Percent of Total, 0 to 50. Bars: No high school degree, 15.9; High school only, 33.1; 1-3 years of college, 25.4; 4+ years of college, 25.6.]

Page 6: Chapter 4: Describing Distributions

6

Exercises, pp. 207-208

4.1, 4.5

Page 7: Chapter 4: Describing Distributions

7

Bar graph for 4.1

[Bar graph: Lottery Game Sales Distribution. Horizontal axis: Type of Game. Vertical axis: Sales (million $), 0 to 18000. Bars: Instant, 16420; 3-digit, 5245; 4-digit, 2776; Lotto, 8865; Other, 5134.]

Page 8: Chapter 4: Describing Distributions

8

Pie Chart for 4.1

[Pie chart: Lottery Game Sales Distribution (percent of total). Slices: Instant, 42.7; 3-digit, 13.6; 4-digit, 7.2; Lotto, 23.1; Other, 13.4.]
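The percentages in the pie chart can be recovered from the sales figures in the bar graph for Exercise 4.1. The sketch below is one way to do that conversion in Python; the function name `percent_of_total` is illustrative and not from the book.

```python
# Sales by type of game, in millions of dollars (values from the Exercise 4.1 bar graph).
sales = {
    "Instant": 16420,
    "3-digit": 5245,
    "4-digit": 2776,
    "Lotto": 8865,
    "Other": 5134,
}


def percent_of_total(counts):
    """Return each category's share of the total, as a percent rounded to one decimal."""
    total = sum(counts.values())
    return {name: round(100 * value / total, 1) for name, value in counts.items()}


shares = percent_of_total(sales)
for name, pct in shares.items():
    print(f"{name}: {pct}%")
```

A pie chart only makes sense when the categories are parts of one whole, which is why the conversion divides every value by the same total.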

Page 9: Chapter 4: Describing Distributions

9

Misleading Pictogram (p. 209)

Worker Salary

$2000/mo

Manager Salary

$4000/mo

Page 10: Chapter 4: Describing Distributions

10

Dow Jones Industrial Average: This is a line graph (p. 210)

Page 11: Chapter 4: Describing Distributions

11

Misleading Graphs?

[Two graphs of Monthly Salary ($) by Year (1994, 2004). "Salaries are Going Up!": vertical axis runs from 2000 to 2350. "Salaries Barely Increased": vertical axis runs from 0 to 3000.]
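To see why the choice of vertical axis changes the impression, here is a rough sketch comparing how tall the 2004 value is drawn relative to the 1994 value under each axis range. The two salary values are hypothetical (the slide does not give them); only the axis ranges, 2000 to 2350 and 0 to 3000, come from the two graphs.

```python
def drawn_height(value, axis_min, axis_max):
    """Fraction of the axis height at which a value is drawn."""
    return (value - axis_min) / (axis_max - axis_min)


def apparent_ratio(old, new, axis_min, axis_max):
    """How many times taller the new value looks than the old one on this axis."""
    return drawn_height(new, axis_min, axis_max) / drawn_height(old, axis_min, axis_max)


# Hypothetical monthly salaries, chosen only for illustration.
salary_1994 = 2050
salary_2004 = 2300

truncated = apparent_ratio(salary_1994, salary_2004, 2000, 2350)
zero_based = apparent_ratio(salary_1994, salary_2004, 0, 3000)

print(f"Axis 2000 to 2350: 2004 looks {truncated:.1f} times as tall as 1994")
print(f"Axis 0 to 3000:    2004 looks {zero_based:.2f} times as tall as 1994")
```

With an axis that starts at zero, the ratio of drawn heights equals the actual ratio of the salaries; starting the axis near the smaller value exaggerates the same change.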

Page 12: Chapter 4: Describing Distributions

12

Making good graphs (p. 213)

Graphs must have labels, legends, and titles.

Make the data stand out. Pay attention to what the eye sees.

3-D is really not necessary!

Page 13: Chapter 4: Describing Distributions

13

Exercises, pp. 214-216

4.6 through 4.8

Page 14: Chapter 4: Describing Distributions

14

Homework

Problems, pp. 219-221, to be done in Excel: 4.11, 4.15

Email Excel file by class time on Monday

Section 4.2 Reading, pp. 221-242

Page 15: Chapter 4: Describing Distributions

15

4.2 Displaying Distributions with Graphs

Page 16: Chapter 4: Describing Distributions

16

Displaying distributions graphically

The distribution of a variable tells us what values it takes and how often it takes these values.

Ways to display distributions for quantitative variables: dotplots, histograms, stemplots

See example on pp. 221-222.

Page 17: Chapter 4: Describing Distributions

17

Figure 4.15: A histogram

Page 18: Chapter 4: Describing Distributions

18

Figure 4.16: A stemplot

Page 19: Chapter 4: Describing Distributions

19

Histograms

Most common graph of the distribution of a quantitative variable.

How to make a histogram: Example 4.9, p. 224

Range: 5.7 to 17.6

Shoot for 6-15 classes (bars)

$\text{size of intervals} \approx \frac{17.6 - 5.7}{10} = 1.19 \approx 1$

Read paragraph on p. 226
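The class-width arithmetic for Example 4.9 can be sketched in Python as follows. Only the range (5.7 to 17.6) and the target of about 10 classes come from the slide; the helper names, and the choice to start the first class at a multiple of the width, are assumptions made for illustration.

```python
import math


def class_width(low, high, target_classes):
    """Raw class width: the range divided by the number of classes aimed for."""
    return (high - low) / target_classes


def class_edges(low, high, width):
    """Class boundaries of equal width that cover every value from low to high."""
    start = math.floor(low / width) * width  # begin at a convenient multiple of the width
    edges = [start]
    while edges[-1] <= high:
        edges.append(edges[-1] + width)
    return edges


raw = class_width(5.7, 17.6, 10)   # about 1.19
width = round(raw)                 # round to a convenient width of 1
edges = class_edges(5.7, 17.6, width)

print(f"raw width = {raw:.2f}, rounded width = {width}")
print(f"{len(edges) - 1} classes, from {edges[0]} to {edges[-1]}")
```

Rounding the width to 1 gives 13 classes here, which still falls inside the 6-15 range suggested above.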

Page 20: Chapter 4: Describing Distributions

20

Example 4.9, pp. 224-226

Page 21: Chapter 4: Describing Distributions

21

Practice Problem: 4.18, p. 226

Page 22: Chapter 4: Describing Distributions

22

Exercise 4.18

Histogram: by hand, using calculator

Stemplot: by hand

Page 23: Chapter 4: Describing Distributions

23

Interpreting the graphical displays

Concentrate on the main features.

Overall pattern (p. 230): shape, center, spread

Outliers: individual observations outside the overall pattern of the graph

Page 24: Chapter 4: Describing Distributions

24

Example 4.10, p. 230

Page 25: Chapter 4: Describing Distributions

25

Shape

Symmetric or skewed (p. 231)?

Is it unimodal (one hump) or bimodal (two humps)?

Page 26: Chapter 4: Describing Distributions

26

Homework

Reading: pp. 221-242

Page 27: Chapter 4: Describing Distributions

27

Stemplots

Usually reserved for smaller data sets.

Advantage: Actual (or rounded) data are provided.

Possible drawback: Many people are not used to this type of plot, so the presenter/writer has to describe it.

Page 28: Chapter 4: Describing Distributions

28

How to make a stemplot, p. 236
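As a rough illustration of the idea (not the book's procedure from p. 236), the sketch below builds a stemplot for two-digit whole numbers by using the tens digit as the stem and the ones digit as the leaf. The data are Mike's scores from Exercise 4.52 later in this deck; the function name `stemplot` is illustrative.

```python
def stemplot(values):
    """Return the lines of a stemplot: tens digit as stem, ones digit as leaf."""
    stems = {}
    for v in sorted(values):
        stems.setdefault(v // 10, []).append(v % 10)
    lines = []
    # Include every stem between the smallest and largest, even if it has no leaves.
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem} | {leaves}")
    return lines


mike = [59, 69, 71, 52, 65, 55, 72, 50, 75, 67, 51, 69, 68, 62, 69]
plot = stemplot(mike)
print("\n".join(plot))
```

Every leaf is an actual data value, which is the advantage noted for stemplots: the original numbers can be read back off the plot.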

Page 29: Chapter 4: Describing Distributions

29

More problems

Exercises: 4.24 and 4.25, p. 233; 4.26, p. 233

Page 30: Chapter 4: Describing Distributions

30

Practice

Exercises 4.30, p. 239 and 4.32, p. 240

4.28, p. 238

Page 31: Chapter 4: Describing Distributions

31

Wrapping up Section 4.2 …

4.28, p. 238; 4.33, p. 242; 4.36; 4.37

Page 32: Chapter 4: Describing Distributions

32

4.3 Describing Distributions with Numbers

Until now, we’ve been satisfied with using words to describe the center and spread of distributions. Now, we will use numbers to describe these characteristics of a distribution.

The 5-number summary:

Center: Median (p. 248)

Spread: Find the Quartiles, Q1 and Q3. (p. 250)

Spread: Min and Max
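A minimal sketch of the 5-number summary in Python, using Mike's scores from Exercise 4.52 later in this deck. It assumes the quartiles are found as the medians of the lower and upper halves of the ordered data, leaving out the overall median when the count is odd; calculators and software sometimes use slightly different quartile rules, so results can differ a little.

```python
from statistics import median


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule for quartiles."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    # For an odd count, the middle observation belongs to neither half.
    upper = data[half + 1:] if n % 2 else data[half:]
    return (data[0], median(lower), median(data), median(upper), data[-1])


mike = [59, 69, 71, 52, 65, 55, 72, 50, 75, 67, 51, 69, 68, 62, 69]
summary = five_number_summary(mike)
print("min, Q1, median, Q3, max =", summary)
```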

Page 33: Chapter 4: Describing Distributions

33

Boxplots

We can use this information to construct a boxplot:

Page 34: Chapter 4: Describing Distributions

34

Practice

4.46, p. 254

Enter data in the Stat Edit menu in your calculator, and order them.

Page 35: Chapter 4: Describing Distributions

35

Boxplot vs. Modified Boxplot

The modified boxplot shows outliers … they are marked with a *. The lines extending from the quartiles go to the last number which is not an outlier.

If there are no outliers, the modified boxplot and the regular boxplot are identical.

Below are a boxplot (on the left) and modified boxplot (on the right) for Problem 4.39, p. 245.

Page 36: Chapter 4: Describing Distributions

36

Side-by-side boxplots (p. 252)

Page 37: Chapter 4: Describing Distributions

37

Practice

Exercises: 4.50, p. 256; 4.49, p. 256

Page 38: Chapter 4: Describing Distributions

38

Testing for Outliers

Find the Inter-Quartile Range: IQR = Q3 - Q1

Multiply: 1.5*IQR

Outliers on low side: values below Q1 - 1.5*IQR

Outliers on high side: values above Q3 + 1.5*IQR

Are there any numbers outside of these values? If so, they are outliers, and are marked on boxplots with an asterisk.

The tail is drawn to the highest (or lowest) value which is not an outlier.
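The outlier test can be sketched in Python as below. The data are Mike's scores from Exercise 4.52 later in this deck, and the quartiles Q1 = 55 and Q3 = 69 assume the median-of-halves rule; both function names are illustrative.

```python
def outlier_fences(q1, q3):
    """Return the low and high cutoffs: Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def split_outliers(values, q1, q3):
    """Return the outliers and the (low, high) ends of the tails of a modified boxplot."""
    low, high = outlier_fences(q1, q3)
    outliers = sorted(v for v in values if v < low or v > high)
    inside = [v for v in values if low <= v <= high]
    # Each tail stops at the most extreme value that is not an outlier.
    return outliers, (min(inside), max(inside))


mike = [59, 69, 71, 52, 65, 55, 72, 50, 75, 67, 51, 69, 68, 62, 69]
fences = outlier_fences(55, 69)
outliers, tails = split_outliers(mike, 55, 69)

print("cutoffs:", fences)
print("outliers:", outliers)
print("tails end at:", tails)
```

For these scores no value falls outside the cutoffs, so the modified boxplot and the regular boxplot would look the same.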

Page 39: Chapter 4: Describing Distributions

39

Measures of Center and Spread

Median and IQR

Mean and Standard Deviation

Mean is the arithmetic average.

Standard deviation measures the average distance of the observations from their mean.

Variance is simply the squared standard deviation.

All of these statistics can be calculated by hand, but today we use technology to calculate them …

We use 1-sample stats on our calculators, or a stats program.
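For reference, these are the usual sample formulas behind the definitions above, for n observations x_1, …, x_n. Dividing by n - 1 is the standard convention for sample data; it is assumed here that the book uses the same one (see p. 259).

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2,
\qquad
s = \sqrt{s^2}
```

Strictly speaking, s is the square root of an average squared distance from the mean, which is why "average distance" is best read as a loose description.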

Page 40: Chapter 4: Describing Distributions

40

Properties of standard deviation (p. 259)

Use s as a measure of spread when you use the mean.

If s=0, there is no spread.

The larger the value for s, the larger the spread of the distribution.

Page 41: Chapter 4: Describing Distributions

41

Practice Problem

4.52, p. 263

Mike: 59, 69, 71, 52, 65, 55, 72, 50, 75, 67, 51, 69, 68, 62, 69
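One way to check a calculator's 1-sample statistics for Mike's scores is Python's standard `statistics` module. The sketch below also repeats the standard deviation "by hand" with the n - 1 divisor, which is what `statistics.stdev` uses; whether the exercise asks for anything beyond these values is not stated on the slide.

```python
import math
from statistics import mean, stdev, variance

mike = [59, 69, 71, 52, 65, 55, 72, 50, 75, 67, 51, 69, 68, 62, 69]

x_bar = mean(mike)      # arithmetic average
s2 = variance(mike)     # sample variance (divides by n - 1)
s = stdev(mike)         # sample standard deviation, the square root of s2

# The same standard deviation computed step by step.
squared_distances = [(x - x_bar) ** 2 for x in mike]
s_by_hand = math.sqrt(sum(squared_distances) / (len(mike) - 1))

print(f"mean = {x_bar:.1f}")
print(f"variance = {s2:.2f}")
print(f"standard deviation = {s:.2f} (by hand: {s_by_hand:.2f})")
```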

Page 42: Chapter 4: Describing Distributions

42

Practice Problem

4.55, p. 263

Page 43: Chapter 4: Describing Distributions

43

Example 4.21, p. 265

Page 44: Chapter 4: Describing Distributions

44

Choosing a summary

The book has a section on which summary to use (mean and std. dev., or median with the quartiles).

I like to report all of them.

However, when writing about a distribution, or comparing distributions, we should think about which summary works best. See p. 266.

Skewed, outliers … median and quartiles

Symmetrical, no (or few) outliers … mean and std. dev.

Mean and standard deviation are most common. One reason is that they allow for more sophisticated calculations to be used in higher statistics.
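A quick sketch of why the median is preferred when there are outliers: adding one extreme value moves the mean much more than the median. The scores are Mike's from Exercise 4.52; the extra score of 5 is hypothetical and added only for illustration.

```python
from statistics import mean, median

mike = [59, 69, 71, 52, 65, 55, 72, 50, 75, 67, 51, 69, 68, 62, 69]
with_outlier = mike + [5]  # hypothetical extreme low score

mean_shift = mean(mike) - mean(with_outlier)
median_shift = median(mike) - median(with_outlier)

print(f"mean:   {mean(mike):.2f} -> {mean(with_outlier):.2f} (drops {mean_shift:.2f})")
print(f"median: {median(mike)} -> {median(with_outlier)} (drops {median_shift})")
```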

Page 45: Chapter 4: Describing Distributions

45

More Practice …

p. 271: 4.57, 4.58, 4.60