Summary Statistics: Measures of Location and Dispersion.
Uploaded by tracy-eaton
The sum of values, $x_1 + x_2 + \cdots + x_n$, can be denoted as $\sum_{i=1}^{n} x_i$.
Select 4 students and ask “how many brothers and sisters do you have?”
•Data: 2, 3, 1, 3
$\sum_{i=1}^{4} x_i = 2 + 3 + 1 + 3 = 9$
Or we can write $\sum x = 9$.
$\sum cx = c \sum x$
$\sum_{i=1}^{n} c = nc$
$\sum (x + c) = \sum x + \sum c = \sum x + nc$
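As a quick numerical check of the three summation rules (a constant factor comes out of the sum, summing a constant n times gives nc, and adding a constant to every term adds nc to the total), here is a minimal Python sketch on the sibling data. The constant c = 5 is an arbitrary value chosen only for illustration.

```python
# Sibling counts from the example: x1 = 2, x2 = 3, x3 = 1, x4 = 3
x = [2, 3, 1, 3]
n = len(x)
c = 5  # arbitrary constant, chosen only for illustration

sum_x = sum(x)                          # sum of x_i = 9
sum_cx = sum(c * xi for xi in x)        # sum of c * x_i
sum_c = sum(c for _ in x)               # the constant c added n times
sum_x_plus_c = sum(xi + c for xi in x)  # sum of (x_i + c)

assert sum_cx == c * sum_x              # sum(cx) = c * sum(x)
assert sum_c == n * c                   # sum(c) = n * c
assert sum_x_plus_c == sum_x + n * c    # sum(x + c) = sum(x) + n * c
print(sum_x, sum_cx, sum_c, sum_x_plus_c)  # 9 45 20 29
```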
Solve the following:
x4
3x
34x
234x
Measures of Central Tendency: Descriptions of the Average (Typical Value)
Sample mean: $\bar{x} = \frac{\sum x}{n}$
Example: number of siblings – Data: 2, 3, 1, 3
$\bar{x} = \frac{9}{4} = 2.25$
Suppose we had selected a 5th person for our sample who had 10 siblings.
•New Data: 2, 3, 1, 3, 10
$\bar{x} = \frac{19}{5} = 3.8$
The sample mean is sensitive to extreme values, and it does not have to be a possible data value (no one has 2.25 siblings).
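A minimal Python sketch of the sample mean that reproduces the sibling example; the helper name `sample_mean` is hypothetical (the standard library's `statistics.mean` gives the same values). It shows how one extreme value pulls the mean upward.

```python
def sample_mean(data):
    """Sample mean: the sum of the values divided by n."""
    return sum(data) / len(data)


siblings = [2, 3, 1, 3]
with_outlier = [2, 3, 1, 3, 10]  # a 5th person with 10 siblings

mean_before = sample_mean(siblings)     # 9 / 4
mean_after = sample_mean(with_outlier)  # 19 / 5
print(mean_before, mean_after)  # 2.25 3.8
```

A single value of 10 raises the mean from 2.25 to 3.8, even though four of the five people have at most 3 siblings.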
Sample median ($\tilde{x}$):
Rank the data from smallest to largest.
If n is odd, the median is the middle score.
If n is even, the median is the mean of the two middle scores.
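Example: number of siblings – Data: 2, 3, 1, 3
Ranked: 1, 2, 3, 3, so $\tilde{x} = \frac{2 + 3}{2} = 2.5$
New Data: 2, 3, 1, 3, 10
Ranked: 1, 2, 3, 3, 10, so $\tilde{x} = 3$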
The sample median is not sensitive to extreme scores.
About half the data fall above the sample median and half below it.
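The ranking rule for the median can be sketched in Python as follows; `sample_median` is a hypothetical helper (the standard library's `statistics.median` follows the same odd/even rule).

```python
def sample_median(data):
    """Middle score if n is odd; mean of the two middle scores if n is even."""
    ranked = sorted(data)  # rank from smallest to largest
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]
    return (ranked[mid - 1] + ranked[mid]) / 2


print(sample_median([2, 3, 1, 3]))      # 2.5
print(sample_median([2, 3, 1, 3, 10]))  # 3
```

Adding the extreme value 10 moves the median only from 2.5 to 3, compared with the mean's jump from 2.25 to 3.8.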
The median is a better measure of central tendency if extreme scores exist.
If extreme scores are unlikely, the mean varies less from sample to sample than the median and is the better measure.
If the distribution is right skewed: $\bar{x} > \tilde{x}$
If the distribution is symmetric: $\bar{x} \approx \tilde{x}$
If the distribution is left skewed: $\bar{x} < \tilde{x}$
Sample mode: the most frequent score
Example: number of siblings – Data: 2, 3, 1, 3; Mode = 3
New Data: 2, 3, 1, 3, 10; Mode = 3
The mode does not always exist, and there can be more than one.
It is also unstable; it is best suited to qualitative data.
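A sketch of the mode that returns every most-frequent value, so it can report more than one mode. Returning an empty list when no value repeats is a convention assumed here to reflect "the mode does not always exist"; the standard library's `statistics.multimode` instead returns every value in that case. The inputs `[1, 1, 2, 2, 5]`, `[4, 7, 9]`, and the colour list are made-up illustrations.

```python
from collections import Counter


def sample_modes(data):
    """Return every most-frequent value (sorted).

    ASSUMED convention: if no value occurs more than once, return [] (no mode).
    """
    counts = Counter(data)
    top = max(counts.values())
    if top == 1:
        return []
    return sorted(value for value, count in counts.items() if count == top)


print(sample_modes([2, 3, 1, 3]))                     # [3]
print(sample_modes([2, 3, 1, 3, 10]))                 # [3]
print(sample_modes([1, 1, 2, 2, 5]))                  # [1, 2]: two modes
print(sample_modes([4, 7, 9]))                        # []: no mode
print(sample_modes(["red", "blue", "red", "green"]))  # ['red']: qualitative data
```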
Sample midrange: $\text{Midrange} = \frac{\text{High} + \text{Low}}{2}$
Example: number of siblings – Data: 2, 3, 1, 3
Midrange = $\frac{3 + 1}{2} = 2$
New Data: 2, 3, 1, 3, 10
Midrange = $\frac{10 + 1}{2} = 5.5$
The midrange is totally dependent on the extreme scores.
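The midrange formula is a one-liner; this minimal Python sketch (with a hypothetical `midrange` helper) reproduces the two sibling examples.

```python
def midrange(data):
    """(High + Low) / 2: uses only the two most extreme scores."""
    return (max(data) + min(data)) / 2


print(midrange([2, 3, 1, 3]))      # 2.0
print(midrange([2, 3, 1, 3, 10]))  # 5.5
```

One extra value moves the midrange from 2 to 5.5, while the median of the same data moves only from 2.5 to 3.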
Percentiles – a percentile gives the percentage of the data that fall below an observation
Quartiles – divide the data into four equally sized parts
$Q_1$, First Quartile: 25th percentile
$Q_2$, Second Quartile ($\tilde{x}$): 50th percentile
$Q_3$, Third Quartile: 75th percentile
Order the data from smallest to largest.
Find $Q_2$. This is $\tilde{x}$.
$Q_1$ is the median of the lower half of the data; that is, it is the median of the data falling below $Q_2$ (not including $Q_2$).
$Q_3$ is the median of the upper half of the data; that is, it is the median of the data falling above $Q_2$ (not including $Q_2$).
Interquartile range (IQR) = Q3 – Q1
Range of the middle 50% of the data
5-number summary: the low score, Q1, Q2, Q3, and the high score
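Stem-and-leaf displays (stem = tens digit, leaf = ones digit):
Students          Faculty
0 | 0013555678    0 |
1 | 0             1 | 055
2 |               2 | 04588
3 |               3 | 1
4 |               4 | 3
5 |               5 |
6 |               6 |
7 |               7 | 3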
         Students   Faculty
Low          0         10
Q1           1         15
Q2           5         25
Q3           7         31
High        10         73
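The quartile procedure (median of each half, leaving $Q_2$ out of both halves when n is odd) can be sketched in Python as below. The data lists are read from the Students and Faculty stem-and-leaf displays, taking stems as tens and leaves as ones; that reading reproduces the five-number summaries in the table. Other software may use a different percentile rule and give slightly different quartiles, and the helper names are hypothetical.

```python
def median_of_sorted(values):
    """Median of an already-sorted list."""
    n = len(values)
    mid = n // 2
    if n % 2 == 1:
        return values[mid]
    return (values[mid - 1] + values[mid]) / 2


def five_number_summary(data):
    """Return (Low, Q1, Q2, Q3, High) using the median-of-halves rule.

    When n is odd, the middle value (Q2) is left out of both halves.
    Assumes at least 2 values.
    """
    x = sorted(data)
    n = len(x)
    half = n // 2
    lower = x[:half]          # data below Q2
    upper = x[half + n % 2:]  # data above Q2 (skips the middle value if n is odd)
    return x[0], median_of_sorted(lower), median_of_sorted(x), median_of_sorted(upper), x[-1]


# Values read from the stem-and-leaf displays (stem = tens, leaf = ones)
students = [0, 0, 1, 3, 5, 5, 5, 6, 7, 8, 10]
faculty = [10, 15, 15, 20, 24, 25, 28, 28, 31, 43, 73]

for name, data in [("Students", students), ("Faculty", faculty)]:
    low, q1, q2, q3, high = five_number_summary(data)
    print(f"{name}: Low={low}, Q1={q1}, Q2={q2}, Q3={q3}, High={high}, IQR={q3 - q1}")
```

This prints an IQR of 6 for the students and 16 for the faculty.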
The box goes from Q1 to Q3 and represents the IQR.
The line through the box is Q2 ($\tilde{x}$).
Extreme values are identified by asterisks (*).
Lines, called whiskers, run from Q1 to the lowest value and from Q3 to the highest value. (If the low or high value is extreme, the whisker stops at the next value instead.)
Boxplots: Students and Faculty (scale 0 to 80)
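The slides mark extreme values with * but do not say which rule decides what is extreme. A common convention, assumed here, flags values more than 1.5 × IQR below Q1 or above Q3. Under that assumption, this sketch builds the boxplot parts for the Students and Faculty data (values read from their stem-and-leaf displays); the helper names and the `k` parameter are hypothetical.

```python
def median_of_sorted(values):
    """Median of an already-sorted list."""
    n = len(values)
    mid = n // 2
    if n % 2 == 1:
        return values[mid]
    return (values[mid - 1] + values[mid]) / 2


def boxplot_parts(data, k=1.5):
    """Box, median line, whiskers, and extreme values for a boxplot.

    ASSUMPTION: a value is 'extreme' if it lies more than k * IQR below Q1
    or above Q3 (k = 1.5 is a common convention, not stated in the slides).
    """
    x = sorted(data)
    n = len(x)
    half = n // 2
    q1 = median_of_sorted(x[:half])
    q3 = median_of_sorted(x[half + n % 2:])
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    extremes = [v for v in x if v < low_fence or v > high_fence]
    inside = [v for v in x if low_fence <= v <= high_fence]
    return {
        "box": (q1, q3),
        "median": median_of_sorted(x),
        "whiskers": (inside[0], inside[-1]),  # most extreme values that are not flagged
        "extremes": extremes,                 # drawn as * on the plot
    }


students = [0, 0, 1, 3, 5, 5, 5, 6, 7, 8, 10]
faculty = [10, 15, 15, 20, 24, 25, 28, 28, 31, 43, 73]

print(boxplot_parts(students))
print(boxplot_parts(faculty))
```

Under this rule the faculty value 73 lies above the upper fence 31 + 1.5 × 16 = 55, so it is drawn as * and the upper faculty whisker stops at 43; no student value is flagged.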
Distribution #1        Distribution #2
1 |                    1 | 5
2 | 5                  2 | 55
3 | 5555555            3 | 555
4 | 5                  4 | 55
5 |                    5 | 5

                       Distribution #1   Distribution #2
mean, $\bar{x}$              35                35
median, $\tilde{x}$          35                35
mode                         35                35
midrange                     35                35

Both distributions have the same center, but their spread is different.
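To show that the measures of location cannot tell these two distributions apart while measures of spread can, here is a sketch using Python's standard `statistics` module. The data lists are read from the two stem-and-leaf displays (stem = tens, leaf = ones); the `describe` helper is hypothetical.

```python
import statistics

# Values read from the two stem-and-leaf displays (stem = tens, leaf = ones)
dist1 = [25, 35, 35, 35, 35, 35, 35, 35, 45]
dist2 = [15, 25, 25, 35, 35, 35, 45, 45, 55]


def describe(data):
    """Four measures of location and two measures of spread for one sample."""
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "mode": statistics.mode(data),
        "midrange": (max(data) + min(data)) / 2,
        "range": max(data) - min(data),
        "s": statistics.stdev(data),  # sample standard deviation (n - 1 divisor)
    }


for name, data in [("Distribution #1", dist1), ("Distribution #2", dist2)]:
    print(name, describe(data))
```

All four measures of location equal 35 for both, but the range is 20 versus 40 and s is 5 versus about 12.2.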
Range = High – Low
Example: Years of experience of faculty. Data: 1, 30, 22, 10, 5
Range = 30 – 1 = 29 years
The range is sensitive to extreme scores (it is based entirely on the high and low).
The range is easy to compute.
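Sample variance, $s^2$:
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{SS_X}{n - 1}$, where $SS_X$ is the sum of squares of X.
Large values of $s^2$ suggest large variability.
It is difficult to interpret because it is in squared units.
Keep in mind that it can never be negative.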
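Example: Years of experience of faculty. Data: 1, 30, 22, 10, 5
$\bar{x} = \frac{68}{5} = 13.6$, $SS_X = 1510 - \frac{68^2}{5} = 585.2$, $s^2 = \frac{585.2}{4} = 146.3$ (in squared years)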
Sample standard deviation, s: measures roughly the average distance of the data points from $\bar{x}$
$s = \sqrt{s^2} = \sqrt{\frac{SS_X}{n - 1}}$
Standard deviation is in the same units as the data.
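The faculty-experience example can be checked with a short Python sketch that computes $SS_X$ both from the definition and from the computational shortcut, then the variance, standard deviation, and range. The standard library's `statistics.variance` and `statistics.stdev` should give the same s² and s.

```python
import math

# Years of experience of faculty (example data)
years = [1, 30, 22, 10, 5]
n = len(years)
mean = sum(years) / n                                # 68 / 5 = 13.6

# Definitional form: sum of squared deviations from the mean
ss_definition = sum((x - mean) ** 2 for x in years)
# Computational form: sum(x^2) - (sum x)^2 / n
ss_shortcut = sum(x ** 2 for x in years) - sum(years) ** 2 / n

variance = ss_shortcut / (n - 1)                     # s^2, in squared years
s = math.sqrt(variance)                              # s, back in years
data_range = max(years) - min(years)                 # High - Low

print(f"range = {data_range}, SS = {ss_shortcut:.1f}, s^2 = {variance:.1f}, s = {s:.2f}")
# range = 29, SS = 585.2, s^2 = 146.3, s = 12.10
```

Taking the square root returns the spread to the original units (years), which is why s is easier to interpret than s².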
Z-score – gives the number of standard deviations an observation is above or below the mean
$z = \frac{x - \bar{x}}{s}$
Example: Test scores: $\bar{x}$ = 79, s = 9
If your score is 88%, what is your z-score?
If your score is 63%, what is your z-score?
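A minimal Python sketch of the z-score formula applied to the two test-score questions; `z_score` is a hypothetical helper name.

```python
def z_score(x, mean, s):
    """Standard deviations that x lies above (positive) or below (negative) the mean."""
    return (x - mean) / s


mean_score, s = 79, 9
print(z_score(88, mean_score, s))            # 1.0
print(round(z_score(63, mean_score, s), 2))  # -1.78
```

A score of 88% is exactly one standard deviation above the mean; 63% is about 1.78 standard deviations below it.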
Empirical rule (mound-shaped distributions):
Approximately 68% of the data fall within 1 standard deviation of the mean: $(\bar{x} - s, \bar{x} + s)$
Approximately 95% of the data fall within 2 standard deviations of the mean: $(\bar{x} - 2s, \bar{x} + 2s)$
Approximately 99.7% of the data fall within 3 standard deviations of the mean: $(\bar{x} - 3s, \bar{x} + 3s)$
Example: Suppose that the amount of liquid in “12 oz.” Pepsi cans has a mound-shaped distribution with $\bar{x} = 12$ oz. and s = 0.1 oz.
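Applying the empirical rule to the stated values ($\bar{x}$ = 12 oz., s = 0.1 oz.) gives three intervals. This sketch only computes those intervals; `empirical_rule_intervals` is a hypothetical helper, and the coverages are the rule's approximations, which hold only for roughly mound-shaped data.

```python
def empirical_rule_intervals(mean, s):
    """Intervals mean - k*s to mean + k*s for k = 1, 2, 3, with approximate coverage."""
    coverage = {1: "68%", 2: "95%", 3: "99.7%"}
    return {k: (mean - k * s, mean + k * s, coverage[k]) for k in (1, 2, 3)}


for k, (low, high, pct) in empirical_rule_intervals(12.0, 0.1).items():
    print(f"about {pct} of cans hold between {low:.1f} and {high:.1f} oz")
```

The three intervals are 11.9 to 12.1 oz., 11.8 to 12.2 oz., and 11.7 to 12.3 oz.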