Summary Statistics: Measures of Location and Dispersion.


Transcript of Summary Statistics: Measures of Location and Dispersion.

Page 1: Summary Statistics: Measures of Location and Dispersion.

Summary Statistics: Measures of Location and Dispersion

Page 2: Summary Statistics: Measures of Location and Dispersion.

The sum of values, $x_1 + x_2 + \cdots + x_n$, can be denoted as $\sum_{i=1}^{n} x_i$.

Page 3: Summary Statistics: Measures of Location and Dispersion.

Select 4 students and ask “how many brothers and sisters do you have?”

• Data: 2, 3, 1, 3

$\sum_{i=1}^{4} x_i = 2 + 3 + 1 + 3 = 9$

Or we can write $\sum x = 9$

Page 4: Summary Statistics: Measures of Location and Dispersion.

$\sum cx = c \sum x$

$\sum c = nc$

$\sum (x + c) = \sum x + \sum c = \sum x + nc$
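These summation identities can be checked numerically. The sketch below is only an illustration: it reuses the sibling data from the earlier example, and the constant `c = 4` is an arbitrary choice made here, not a value from the slides.

```python
# Sibling counts from the earlier example; c is an arbitrary constant (assumption).
x = [2, 3, 1, 3]
c = 4
n = len(x)

sum_x = sum(x)                            # sum of the x values
sum_cx = sum(c * xi for xi in x)          # sum of c times each x
sum_c = sum(c for _ in x)                 # the constant c added n times
sum_x_plus_c = sum(xi + c for xi in x)    # sum of (x + c)

print(sum_x, sum_cx, sum_c, sum_x_plus_c)
print(sum_cx == c * sum_x, sum_c == n * c, sum_x_plus_c == sum_x + n * c)
```

Each comparison on the last line is one identity: a constant factor moves outside the sum, and a constant summed n times is n times the constant.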

Page 5: Summary Statistics: Measures of Location and Dispersion.

Solve the following:

$\sum 4x$

$\sum (x - 3)$

$\sum (4x - 3)$

$\sum (4x - 3)^2$

Page 6: Summary Statistics: Measures of Location and Dispersion.

Measure of Central Tendency - Description of Average (Typical Value)

Sample Mean: $\bar{x} = \frac{\sum x}{n}$

Page 7: Summary Statistics: Measures of Location and Dispersion.

Number of siblings – Data: 2, 3, 1, 3, so $\bar{x} = 9/4 = 2.25$

Suppose we had selected a 5th person for our sample who had 10 siblings.

• New Data: 2, 3, 1, 3, 10, so $\bar{x} = 19/5 = 3.8$

The sample mean is sensitive to extreme values and does not have to be a possible data value.
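A short sketch of this sensitivity, using Python's standard `statistics` module on the sibling data (the variable names are illustrative only):

```python
from statistics import mean

siblings = [2, 3, 1, 3]
with_extreme = siblings + [10]     # add the 5th person with 10 siblings

mean_before = mean(siblings)       # 9 / 4
mean_after = mean(with_extreme)    # 19 / 5

print(mean_before, mean_after)
```

One extra observation pulls the mean from 2.25 up to 3.8, and neither value is a number of siblings anyone could actually have.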

Page 8: Summary Statistics: Measures of Location and Dispersion.

Sample median, $\tilde{x}$:

Rank the data from smallest to largest

If n is odd, the median is the middle score

If n is even, the median is the mean of the two middle scores

Page 9: Summary Statistics: Measures of Location and Dispersion.

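Number of siblings – Data: 2, 3, 1, 3 (ranked: 1, 2, 3, 3), so $\tilde{x} = (2 + 3)/2 = 2.5$

New Data: 2, 3, 1, 3, 10 (ranked: 1, 2, 3, 3, 10), so $\tilde{x} = 3$

The sample median is not sensitive to extreme scores

Roughly half the data fall above the sample median and roughly half fall below it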
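The same comparison for the median, again as a small sketch with the standard `statistics` module:

```python
from statistics import median

siblings = [2, 3, 1, 3]
with_extreme = siblings + [10]

median_before = median(siblings)       # ranked 1, 2, 3, 3: mean of the two middle scores
median_after = median(with_extreme)    # ranked 1, 2, 3, 3, 10: the middle score

print(median_before, median_after)
```

Adding the value 10 moves the median only from 2.5 to 3, a much smaller shift than the one seen in the mean.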

Page 10: Summary Statistics: Measures of Location and Dispersion.

The median is a better measure of central tendency if extreme scores exist.

If extreme scores are unlikely, the mean varies less from sample to sample than the median and is a better measure.

Page 11: Summary Statistics: Measures of Location and Dispersion.

If the distribution is right skewed, $\bar{x} > \tilde{x}$

If the distribution is symmetric, $\bar{x} = \tilde{x}$

If the distribution is left skewed, $\bar{x} < \tilde{x}$

Page 12: Summary Statistics: Measures of Location and Dispersion.

Sample mode: the most frequent score

Example: number of siblings – Data: 2, 3, 1, 3; Mode = 3

New Data: 2, 3, 1, 3, 10; Mode = 3

The mode does not always exist, and there can be more than one

Also, it is unstable

It should be used with qualitative data
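A minimal sketch of these points using `statistics.multimode` (available in Python 3.8 and later), which returns every value tied for most frequent. The lists `no_repeats` and `eye_colors` are hypothetical data invented here for illustration.

```python
from statistics import multimode

siblings = [2, 3, 1, 3]
with_extreme = siblings + [10]
no_repeats = [1, 2, 3]                     # hypothetical: every value occurs once
eye_colors = ["brown", "blue", "brown"]    # hypothetical qualitative data

modes_siblings = multimode(siblings)
modes_extreme = multimode(with_extreme)
modes_no_repeats = multimode(no_repeats)   # all values tie, so no single mode
modes_colors = multimode(eye_colors)

print(modes_siblings, modes_extreme, modes_no_repeats, modes_colors)
```

When every value ties, the result lists all of them, which is one way of seeing that a single useful mode does not always exist.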

Page 13: Summary Statistics: Measures of Location and Dispersion.

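Midrange = $\frac{\text{Low} + \text{High}}{2}$

Example: number of siblings – Data: 2, 3, 1, 3

Midrange = $\frac{\text{Low} + \text{High}}{2} = \frac{1 + 3}{2} = 2$

New Data: 2, 3, 1, 3, 10

Midrange = $\frac{\text{Low} + \text{High}}{2} = \frac{1 + 10}{2} = 5.5$

Midrange is totally dependent on extreme scores.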
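The midrange calculation can be sketched in a few lines; the helper name `midrange` is an illustrative choice, not a standard-library function.

```python
def midrange(data):
    """Average of the lowest and highest values."""
    return (min(data) + max(data)) / 2

siblings = [2, 3, 1, 3]
with_extreme = siblings + [10]

print(midrange(siblings), midrange(with_extreme))
```

Only the low and high scores enter the calculation, so the single value 10 moves the midrange from 2 to 5.5.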

Page 14: Summary Statistics: Measures of Location and Dispersion.

Percentiles – give the percentage of the data falling below an observation

Quartiles – divide the data into four equally sized parts

Q1, First Quartile: 25th percentile

Q2, Second Quartile ($\tilde{x}$): 50th percentile

Q3, Third Quartile: 75th percentile

Page 15: Summary Statistics: Measures of Location and Dispersion.

Order the data from smallest to largest

Find $\tilde{x}$. This is Q2

Q1 is the median of the lower half of the data; that is, it is the median of the data falling below Q2 (not including Q2)

Q3 is the median of the upper half of the data; that is, it is the median of the data falling above Q2 (not including Q2)
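The quartile procedure above can be sketched as follows. The function name `quartiles` is illustrative, and the handling of an even sample size (each half simply gets n/2 values) is an assumption, since the slides only describe excluding Q2 itself. Other software may use different quartile conventions and give slightly different answers.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves rule; needs at least 2 values."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    q2 = median(ordered)
    lower = ordered[:half]                                     # values below Q2
    upper = ordered[half + 1:] if n % 2 else ordered[half:]    # values above Q2
    return median(lower), q2, median(upper)

print(quartiles([2, 3, 1, 3, 10]))
```

For the sibling data ranked as 1, 2, 3, 3, 10, the middle value 3 is Q2, the lower half is 1, 2 and the upper half is 3, 10.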

Page 16: Summary Statistics: Measures of Location and Dispersion.

Interquartile range (IQR) = Q3 – Q1

Range of the middle 50% of the data

5 number summary – the low score, Q1, Q2, Q3, and the high score

Page 17: Summary Statistics: Measures of Location and Dispersion.

Stem-and-leaf plots (stem = tens, leaf = ones)

Students
0 | 0 0 1 3 5 5 5 6 7 8
1 | 0
2 |
3 |
4 |
5 |
6 |
7 |

Faculty
0 |
1 | 0 5 5
2 | 0 4 5 8 8
3 | 1
4 | 3
5 |
6 |
7 | 3

Page 18: Summary Statistics: Measures of Location and Dispersion.

Students: Low = 0, Q1 = 1, Q2 = 5, Q3 = 7, High = 10

Faculty: Low = 10, Q1 = 15, Q2 = 25, Q3 = 31, High = 73
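The two five-number summaries can be reproduced with a short sketch. The data lists are read off the stem-and-leaf display on the previous slide, assuming stems are tens and leaves are ones; the function name is illustrative.

```python
from statistics import median

def five_number_summary(data):
    """Return (low, Q1, Q2, Q3, high), taking Q1 and Q3 as medians of the halves."""
    ordered = sorted(data)
    n = len(ordered)
    lower = ordered[: n // 2]          # values below the median position
    upper = ordered[(n + 1) // 2 :]    # values above the median position
    return ordered[0], median(lower), median(ordered), median(upper), ordered[-1]

# Values as read from the stem-and-leaf display (assumption: stem = tens, leaf = ones).
students = [0, 0, 1, 3, 5, 5, 5, 6, 7, 8, 10]
faculty = [10, 15, 15, 20, 24, 25, 28, 28, 31, 43, 73]

for name, data in (("Students", students), ("Faculty", faculty)):
    low, q1, q2, q3, high = five_number_summary(data)
    print(name, (low, q1, q2, q3, high), "IQR =", q3 - q1)
```

The printed summaries agree with the values on this slide, and the IQR follows directly as Q3 – Q1.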

Page 19: Summary Statistics: Measures of Location and Dispersion.

The box goes from Q1 to Q3 and represents IQR

The line through the box is Q2 ($\tilde{x}$)

Extreme values are identified by *’s

Lines, called whiskers, run from Q1 to the lowest value and from Q3 to the highest value (if the low or high is extreme, then the whisker goes to the next value)
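The slides do not say how a value is judged to be extreme. The sketch below assumes the common 1.5 × IQR convention (an assumption, not something stated here): a value is flagged if it lies more than 1.5 × IQR below Q1 or above Q3. The function name and the `k` parameter are illustrative, and the quartiles are the ones given in the five-number summary.

```python
def whiskers_and_extremes(data, q1, q3, k=1.5):
    """Return (lower whisker end, upper whisker end, extreme values).

    Assumes the k * IQR fence rule, with k = 1.5 as a common convention.
    """
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    extremes = [v for v in data if v < low_fence or v > high_fence]
    return min(inside), max(inside), extremes

students = [0, 0, 1, 3, 5, 5, 5, 6, 7, 8, 10]
faculty = [10, 15, 15, 20, 24, 25, 28, 28, 31, 43, 73]

print(whiskers_and_extremes(students, q1=1, q3=7))
print(whiskers_and_extremes(faculty, q1=15, q3=31))
```

Under this assumed rule the faculty value 73 is drawn as an extreme point and the upper whisker stops at the next value, 43, while the student data have no extreme values.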

Page 20: Summary Statistics: Measures of Location and Dispersion.

[Figure: side-by-side box plots of the Students and Faculty data; vertical axis from 0 to 80]

Page 21: Summary Statistics: Measures of Location and Dispersion.
Page 22: Summary Statistics: Measures of Location and Dispersion.

[Figure: box plots for A, B, and C; axis ticks at 33, 38, and 43]

Page 23: Summary Statistics: Measures of Location and Dispersion.

Distribution #1
1 |
2 | 5
3 | 5 5 5 5 5 5 5
4 | 5
5 |

Distribution #2
1 | 5
2 | 5 5
3 | 5 5 5
4 | 5 5
5 | 5

Distribution #1: $\bar{X} = 35$, $\tilde{X} = 35$, mode = 35, midrange = 35

Distribution #2: $\bar{X} = 35$, $\tilde{X} = 35$, mode = 35, midrange = 35
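A quick check of why location alone is not enough. The lists below are read from the two stem-and-leaf displays (assuming stem = tens, leaf = ones); the `midrange` helper is an illustrative name.

```python
from statistics import mean, median, mode, stdev

# Values as read from the two displays (assumption: stem = tens, leaf = ones).
dist1 = [25] + [35] * 7 + [45]
dist2 = [15, 25, 25, 35, 35, 35, 45, 45, 55]

def midrange(data):
    return (min(data) + max(data)) / 2

for name, d in (("Distribution #1", dist1), ("Distribution #2", dist2)):
    location = (mean(d), median(d), mode(d), midrange(d))
    spread = (max(d) - min(d), round(stdev(d), 2))   # range, sample standard deviation
    print(name, "location:", location, "spread:", spread)
```

All four measures of location equal 35 for both distributions, yet the second is clearly more spread out, which is what measures of dispersion are meant to capture.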

Page 24: Summary Statistics: Measures of Location and Dispersion.

Example: Years of experience of faculty – Data: 1, 30, 22, 10, 5

Range = High – Low = 30 – 1 = 29

Range is sensitive to extreme scores (based entirely on the high and low)

Range is easy to compute

Page 25: Summary Statistics: Measures of Location and Dispersion.

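Sample variance:

$S^2 = \frac{\text{Sum of Squared } X}{n-1} = \frac{SS_X}{n-1} = \frac{\sum (x - \bar{x})^2}{n-1} = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n-1}$

Large values of $S^2$ suggest large variability

It is difficult to interpret since it is in square units

Keep in mind it can never be negative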

Page 26: Summary Statistics: Measures of Location and Dispersion.

Example: Years of experience of faculty – Data: 1, 30, 22, 10, 5

Sample standard deviation – measures roughly the average distance of the data points from $\bar{x}$: $S = \sqrt{S^2}$

Standard deviation is in the same units as the data
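A worked sketch of the sample variance and standard deviation for the faculty experience data. It computes the sum of squares two ways (from deviations about the mean, and from the shortcut using the sum of x squared and the squared sum of x) and divides by n – 1; variable names are illustrative.

```python
from math import sqrt
from statistics import variance

years = [1, 30, 22, 10, 5]
n = len(years)
x_bar = sum(years) / n

# Sum of squared deviations, computed two equivalent ways.
ss_definition = sum((x - x_bar) ** 2 for x in years)
ss_shortcut = sum(x * x for x in years) - sum(years) ** 2 / n

s_squared = ss_definition / (n - 1)    # sample variance, in squared years
s = sqrt(s_squared)                    # sample standard deviation, in years

print(round(ss_definition, 1), round(ss_shortcut, 1))
print(round(s_squared, 2), round(s, 2))
print(round(variance(years), 2))       # cross-check with the standard library
```

Both routes give the same sum of squares, and taking the square root returns the result to the original units (years).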

Page 27: Summary Statistics: Measures of Location and Dispersion.

Z-score – gives the number of standard deviations an observation is above or below the mean

$z = \frac{x - \bar{x}}{s}$

Example: Test scores $\bar{x} = 79$, s = 9

If your score is 88%, what is your z-score?

If your score is 63%, what is your z-score?
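The two questions can be worked with a one-line helper (the function name `z_score` is an illustrative choice):

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

z_88 = z_score(88, mean=79, sd=9)
z_63 = z_score(63, mean=79, sd=9)

print(z_88, round(z_63, 2))
```

A score of 88 is exactly one standard deviation above the mean, while a score of 63 is about 1.78 standard deviations below it.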

Page 28: Summary Statistics: Measures of Location and Dispersion.

Approximately 68% of the data fall within 1 standard deviation of the mean: $(\bar{x} - s, \bar{x} + s)$

Approximately 95% of the data fall within 2 standard deviations of the mean: $(\bar{x} - 2s, \bar{x} + 2s)$

Approximately 99.7% of the data fall within 3 standard deviations of the mean: $(\bar{x} - 3s, \bar{x} + 3s)$

Page 29: Summary Statistics: Measures of Location and Dispersion.

Example: Suppose that the amount of liquid in “12 oz.” Pepsi cans is a mound shaped distribution with $\bar{x} = 12$ oz. and s = 0.1 oz.
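As a sketch of how the 68%, 95%, and 99.7% statements apply to this example, the intervals of 1, 2, and 3 standard deviations around the mean can be computed directly. The function name is illustrative, and the percentages are approximations that rely on the mound-shaped assumption.

```python
def empirical_intervals(x_bar, s):
    """Return (k, low, high) for k = 1, 2, 3 standard deviations around the mean."""
    return [(k, x_bar - k * s, x_bar + k * s) for k in (1, 2, 3)]

approx_share = {1: "68%", 2: "95%", 3: "99.7%"}

for k, low, high in empirical_intervals(12.0, 0.1):
    print(f"About {approx_share[k]} of cans: between {low:.1f} and {high:.1f} oz.")
```

So, under these assumptions, nearly all cans would be expected to hold between 11.7 and 12.3 oz.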