4/12/2015 Chapter 2: Describing Distributions with Numbers




Numerical Summaries of:

• Central location
– Mean
– Median

• Spread
– Range
– Quartiles
– Standard deviation / variance

• Shape: measures not covered


Arithmetic Mean

• Most common measure of central location

• Notation (“x-bar”):

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

where

n is the sample size

x_i is the ith data value

∑ is the summation symbol


Example: Sample Mean

Data: Metabolic rates, calories / day:

1792 1666 1362 1614 1460 1867 1439

$$\bar{x} = \frac{1792 + 1666 + 1362 + 1614 + 1460 + 1867 + 1439}{7} = \frac{11{,}200}{7} = 1600$$
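A minimal Python sketch of the sample-mean formula, applied to the metabolic-rate data above. The helper name `sample_mean` is hypothetical (it is not part of the original slides); Python's built-in `statistics.mean` would give the same value.

```python
# Sample mean: x-bar = (x_1 + x_2 + ... + x_n) / n
# sample_mean is a hypothetical helper name used only for illustration.

def sample_mean(values):
    """Return the arithmetic mean: the sum of the values divided by n."""
    if not values:
        raise ValueError("mean requires at least one value")
    return sum(values) / len(values)


# Metabolic rates (calories/day), n = 7, from the example above
rates = [1792, 1666, 1362, 1614, 1460, 1867, 1439]

total = sum(rates)          # numerator of the formula
xbar = sample_mean(rates)   # total divided by n = 7
print(total, len(rates), xbar)  # 11200 7 1600.0
```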


Median (M)

• Half the values are less than the median, half are greater

• If n is odd, the median is the middle ordered value

• If n is even, the median is the average of the two middle ordered values


Examples: Median

• Example 1: 2 4 6
Median = 4

• Example 2: 2 4 6 8
Median = 5 (average of 4 and 6)

• Example 3: 6 2 4
Median ≠ 2 (the values must be ordered first: 2 4 6, so Median = 4)


Example: Median

Data = metabolic rates from the sample mean example (n = 7)

Ordered array:

1362 1439 1460 1614 1666 1792 1867

The location of the median in the ordered array: L(M) = (n + 1) / 2 = (7 + 1) / 2 = 4

Value of median = 1614 (the 4th ordered value)
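A sketch of the median rule using the location L(M) = (n + 1) / 2 on the ordered data: when L(M) is a whole number (n odd) it picks the middle value, and when it ends in .5 (n even) it averages the two middle values. The function name `median_by_location` is a hypothetical illustration, not a library function.

```python
# Median via the location rule L(M) = (n + 1) / 2 on the ORDERED data.
# median_by_location is a hypothetical helper name.

def median_by_location(values):
    ordered = sorted(values)           # values must be ordered first
    n = len(ordered)
    if n == 0:
        raise ValueError("median requires at least one value")
    loc = (n + 1) / 2                  # 1-based location of the median
    if loc.is_integer():               # n odd: the single middle value
        return ordered[int(loc) - 1]
    # n even: loc ends in .5, so average the two middle ordered values
    below = ordered[int(loc) - 1]
    above = ordered[int(loc)]
    return (below + above) / 2


print(median_by_location([2, 4, 6]))      # 4
print(median_by_location([2, 4, 6, 8]))   # 5.0
print(median_by_location([6, 2, 4]))      # 4 (not 2: order the data first)
print(median_by_location([1792, 1666, 1362, 1614, 1460, 1867, 1439]))  # 1614
```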


The Median is robust to outliers

This data set:

1362 1439 1460 1614 1666 1792 1867

has median 1614 and mean 1600

This similar data set with a high outlier:

1362 1439 1460 1614 1666 1792 9867

still has median 1614 but now has mean 2742.9
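A short check of the robustness claim, assuming Python's standard-library `statistics` module: replacing 1867 with 9867 should leave the median at 1614 while moving the mean from 1600 to about 2742.9.

```python
import statistics

original = [1362, 1439, 1460, 1614, 1666, 1792, 1867]
with_outlier = [1362, 1439, 1460, 1614, 1666, 1792, 9867]  # 1867 replaced by 9867

for label, data in [("original", original), ("with outlier", with_outlier)]:
    # The median depends only on the middle ordered value, so it does not move;
    # the mean uses every value, so one large outlier drags it upward.
    print(f"{label:>12}: median = {statistics.median(data)}, "
          f"mean = {statistics.mean(data):.1f}")
```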


The skew pulls the mean

• The mean salary at a high-tech firm is $250K / year

• The median salary is $60K

• What does this tell you?

• Answer: A few very highly paid executives pull the mean far above the median, while most workers make modest salaries; i.e., the distribution has a positive (right) skew


Spread = Variability

• Amount of spread around the center!

• Statistical measures of spread
– Range
– Inter-Quartile Range
– Standard deviation


Range and IQR

• Range = maximum – minimum

• Easy to compute, but NOT as good as the quartiles and IQR, because it depends only on the two most extreme values

• Quartiles & Inter-Quartile Range (IQR)
– Quartile 1 (Q1) cuts off the bottom 25% of the data (“25th percentile”)
– Quartile 2 (Q2) cuts off the bottom two quarters (50%) of the data: same as the Median!
– Quartile 3 (Q3) cuts off the bottom three quarters (75%) of the data (“75th percentile”)


Obtaining Quartiles

• Order the data

• Find the median

• Look at the lower half of the data set (when n is odd, leave the median itself out of both halves)
– Find the “median” of this lower half
– This is Q1

• Look at the upper half of the data set
– Find the “median” of this upper half
– This is Q3
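A sketch of the “median of each half” procedure above, checked against the 10 ages in the next example. The function names are hypothetical. This is only one of several quartile conventions; other software (for example Python's `statistics.quantiles` or spreadsheet functions) may interpolate differently and give slightly different Q1 and Q3.

```python
# Quartiles by the "median of each half" method described above.
# Hypothetical helper names; other software may use other quartile rules.

def median_of_ordered(ordered):
    """Median of a list that is already sorted."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # odd: middle value
    return (ordered[mid - 1] + ordered[mid]) / 2   # even: average the two middle values


def quartiles(values):
    """Return (Q1, median, Q3).

    For odd n the median itself is left out of both halves, which matches
    the n = 53 example (two halves of 26 values).
    """
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = ordered[:half]              # bottom half
    upper = ordered[half + n % 2:]      # top half, skipping the middle value when n is odd
    return median_of_ordered(lower), median_of_ordered(ordered), median_of_ordered(upper)


ages = [5, 11, 21, 24, 27, 28, 30, 42, 50, 52]
print(quartiles(ages))  # (21, 27.5, 42)
```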


Example: Quartiles

Consider these 10 ages:

05 11 21 24 27 28 30 42 50 52

Median = (27 + 28) / 2 = 27.5

The median of the bottom half (05 11 21 24 27) is Q1 = 21

The median of the top half (28 30 42 50 52) is Q3 = 42


Example 2: Quartiles, n = 53

Ordered data (read down each column):

100 124 148 170 185 215
101 125 150 170 185 220
106 127 150 172 186 260
106 128 152 175 187
110 130 155 175 192
110 130 157 180 194
119 133 165 180 195
120 135 165 180 203
120 139 165 180 210
123 140 170 185 212

L(M) = (53 + 1) / 2 = 27, so the median is the 27th ordered value: Median = 165


Bottom half has n* = 26 values: L(Q1) = (26 + 1) / 2 = 13.5 from the bottom

Q1 = avg(127, 128) = 127.5 (the 13th and 14th ordered values)


Top half has n* = 26 values: L(Q3) = 13.5 from the top!

Q3 = avg(185, 185) = 185


Example 2: Quartiles (stemplot of the 53 values; key: 10 | 0 = 100)

10 | 0166
11 | 009
12 | 0034578
13 | 00359
14 | 08
15 | 00257
16 | 555
17 | 000255
18 | 000055567
19 | 245
20 | 3
21 | 025
22 | 0
23 |
24 |
25 |
26 | 0

Q1 = 127.5

Q2 = 165

Q3 = 185

“5-point summary”

= {Min, Q1, Median, Q3, Max}

= {100, 127.5, 165, 185, 260}


Inter-quartile Range (IQR)

• Q1 = 127.5

• Q3 = 185

Inter-Quartile Range (IQR) = Q3 − Q1

= 185 − 127.5 = 57.5

The IQR is the “spread of the middle 50%” of the data.
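A sketch that computes the 5-point summary and IQR for the n = 53 values of Example 2, using the same median-of-halves convention as the slides (for odd n, the median is left out of both halves). The list `values` is the ordered data transcribed from the Example 2 table; the helper names are hypothetical.

```python
# 5-point summary {Min, Q1, Median, Q3, Max} and IQR for Example 2 (n = 53).
# Median-of-halves convention; helper names are hypothetical.

def median_of_ordered(ordered):
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2


def five_point_summary(values):
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    q1 = median_of_ordered(ordered[:half])            # median of bottom half
    q3 = median_of_ordered(ordered[half + n % 2:])    # median of top half
    return ordered[0], q1, median_of_ordered(ordered), q3, ordered[-1]


# The 53 ordered values, transcribed column by column from the Example 2 table
values = [
    100, 101, 106, 106, 110, 110, 119, 120, 120, 123,
    124, 125, 127, 128, 130, 130, 133, 135, 139, 140,
    148, 150, 150, 152, 155, 157, 165, 165, 165, 170,
    170, 170, 172, 175, 175, 180, 180, 180, 180, 185,
    185, 185, 186, 187, 192, 194, 195, 203, 210, 212,
    215, 220, 260,
]

lo, q1, m, q3, hi = five_point_summary(values)
print(len(values), (lo, q1, m, q3, hi))  # 53 (100, 127.5, 165, 185.0, 260)
print("IQR =", q3 - q1)                  # IQR = 57.5
```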


Simple Boxplot: the 5-point summary shown graphically

[Figure: boxplot marking min, Q1, M, Q3, and max along a Weight axis running from 100 to 275]


Boxplots are useful for comparing groups


Standard Deviation & Variance

• Most popular measures of spread

• Each data value has a deviation from the mean, defined as:

$$x_i - \bar{x}$$


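Example: Deviations

Metabolic data (n = 7, x̄ = 1600)

$$x_i - \bar{x} = 1439 - 1600 = -161$$

$$x_i - \bar{x} = 1792 - 1600 = 192$$

Values below the mean have negative deviations; values above the mean have positive deviations.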


Variance

• Find the mean
• Find the deviation of each value
• Square the deviations
• Sum the squared deviations
• Divide by (n − 1)

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$$


Data: Metabolic rates, n = 7

1792 1666 1362 1614 1460 1867 1439


“Sum of Squares”

Obs (x_i)   Deviation (x_i − x̄)    Squared deviation (x_i − x̄)²
1792        1792 − 1600 = 192       (192)² = 36,864
1666        1666 − 1600 = 66        (66)² = 4,356
1362        1362 − 1600 = −238      (−238)² = 56,644
1614        1614 − 1600 = 14        (14)² = 196
1460        1460 − 1600 = −140      (−140)² = 19,600
1867        1867 − 1600 = 267       (267)² = 71,289
1439        1439 − 1600 = −161      (−161)² = 25,921
SUMS: 11,200                0                       214,870

x̄ = 11,200 / 7 = 1600

Σ(x_i − x̄)² = 214,870 is the “Sum of Squares”
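A sketch that rebuilds the “Sum of Squares” table for the metabolic data. It also checks a useful property: deviations from the mean always sum to 0, which is one reason they are squared before being summed. The helper name `deviation_table` is hypothetical.

```python
# "Sum of Squares" table for the metabolic data.
# deviation_table is a hypothetical helper name.

def deviation_table(values):
    """Return (mean, rows, sum_of_deviations, sum_of_squares).

    Each row is (x_i, x_i - mean, (x_i - mean) ** 2).
    """
    xbar = sum(values) / len(values)
    rows = [(x, x - xbar, (x - xbar) ** 2) for x in values]
    sum_dev = sum(dev for _, dev, _ in rows)
    sum_sq = sum(sq for _, _, sq in rows)
    return xbar, rows, sum_dev, sum_sq


rates = [1792, 1666, 1362, 1614, 1460, 1867, 1439]
xbar, rows, sum_dev, sum_sq = deviation_table(rates)

print(f"mean = {xbar:.0f}")
for x, dev, sq in rows:
    print(f"{x:>6} {dev:>8.0f} {sq:>10,.0f}")
print(f"{'Sums':>6} {sum_dev:>8.0f} {sum_sq:>10,.0f}")  # deviations sum to 0; SS = 214,870
```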


Variance

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2 = \frac{214{,}870}{7-1} = 35{,}811.67$$

where Σ(x_i − x̄)² = 214,870 is the Sum of Squares


Standard Deviation

Square root of variance:

$$s = \sqrt{s^2} = \sqrt{35{,}811.67} = 189.24$$


Standard Deviation: Direct Formula

$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2} = \sqrt{\frac{214{,}870}{7-1}} \approx 189$$
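A sketch of the variance and standard-deviation formulas (divide the sum of squares by n − 1, then take the square root), cross-checked against Python's `statistics.variance` and `statistics.stdev`, which also use the n − 1 denominator. The helper names `sample_variance` and `sample_sd` are hypothetical.

```python
import math
import statistics

# Sample variance s^2 = SS / (n - 1) and standard deviation s = sqrt(s^2).
# sample_variance / sample_sd are hypothetical helper names.

def sample_variance(values):
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two values")
    xbar = sum(values) / n
    sum_sq = sum((x - xbar) ** 2 for x in values)   # "Sum of Squares"
    return sum_sq / (n - 1)                          # divide by n - 1, not n


def sample_sd(values):
    return math.sqrt(sample_variance(values))


rates = [1792, 1666, 1362, 1614, 1460, 1867, 1439]
s2 = sample_variance(rates)
s = sample_sd(rates)
print(f"s^2 = {s2:,.2f}, s = {s:.2f}")   # s^2 = 35,811.67, s = 189.24

# Cross-check with the standard library (also uses the n - 1 denominator)
print(statistics.variance(rates), statistics.stdev(rates))
```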


Use calculator to check work!

TI-30XIIS sequence:
• On > CLEAR > 2nd > STAT > Scroll > Clear Data > Enter
• 2nd > STAT > 1-VAR or 2-VAR
• DATA > enter the data
• STATVAR key

I’m supporting the TI-30XIIS only


Choosing Summary Statistics

• Use the mean and standard deviation to describe distributions that are reasonably symmetric and free of outliers

• Use the median and quartiles (IQR) to describe distributions that are skewed or have outliers


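Example: Number of Books Read

Ordered data (read down each column):

0 1 2 4 10 30
0 1 2 4 10 99
0 1 2 4 12
0 1 3 5 13
0 2 3 5 14
0 2 3 5 14
0 2 3 5 15
0 2 4 5 15
0 2 4 5 20
1 2 4 6 20

n = 52, L(M) = (52 + 1) / 2 = 26.5, so M is the average of the 26th and 27th ordered values: M = (3 + 3) / 2 = 3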


Example: Books read, n = 52

5-point summary: 0, 1, 3, 5.5, 99

Highly asymmetric distribution

The mean (“xbar” = 7.06) and standard deviation (s = 14.43) give false impressions of location and spread for this distribution and are considered inappropriate. Use the median and 5-point summary instead.

[Figure: plot of the data along a Number of books axis running from 0 to 100]
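A sketch contrasting the two kinds of summaries for the books data (n = 52, transcribed from the ordered table in the books example). The mean and standard deviation come from Python's `statistics` module; the 5-point summary uses the same median-of-halves convention as the earlier quartile examples. The helper names are hypothetical.

```python
import statistics

# Books read (n = 52), transcribed column by column from the ordered table.
books = [
    0, 0, 0, 0, 0, 0, 0, 0, 0, 1,
    1, 1, 1, 1, 2, 2, 2, 2, 2, 2,
    2, 2, 2, 3, 3, 3, 3, 4, 4, 4,
    4, 4, 4, 5, 5, 5, 5, 5, 5, 6,
    10, 10, 12, 13, 14, 14, 15, 15, 20, 20,
    30, 99,
]


def median_of_ordered(ordered):
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2


def five_point_summary(values):
    """{Min, Q1, Median, Q3, Max} by the median-of-halves method."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    return (ordered[0],
            median_of_ordered(ordered[:half]),
            median_of_ordered(ordered),
            median_of_ordered(ordered[half + n % 2:]),
            ordered[-1])


print(len(books), five_point_summary(books))   # 52 (0, 1.0, 3.0, 5.5, 99)
# The mean and SD are pulled upward by the long right tail (30 and 99)
print(round(statistics.mean(books), 2), round(statistics.stdev(books), 2))  # 7.06 14.43
```

The median (3) and IQR (5.5 − 1 = 4.5) describe a typical reader far better here than x̄ = 7.06 and s = 14.43.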