Working with one variable data
Spread
Joaquin’s Tests: 76, 45, 83, 68, 64
Taran’s Tests: 67, 70, 70, 62, 62
What can you infer, justify and conclude about Joaquin’s and Taran’s test scores? (Hint: Calculate the mean, median and mode for each. What do they tell you?)
Joaquin: mean = ___, median = ___, mode = none
Taran: mean = ___, median = ___, mode = ___
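One way to check the hint's calculations is a short Python sketch. The helper names `modes_or_none` and `summarise` are made up for this illustration; the scores are the ones listed on the slide.

```python
from collections import Counter
from statistics import mean, median


def modes_or_none(scores):
    """Return the sorted list of modes, or None when no value repeats."""
    counts = Counter(scores)
    top = max(counts.values())
    if top == 1:
        # Every score occurs once, which the slide records as "mode = none".
        return None
    return sorted(value for value, count in counts.items() if count == top)


def summarise(scores):
    """Collect the three measures of centre for one student."""
    return {
        "mean": mean(scores),
        "median": median(scores),
        "mode": modes_or_none(scores),
    }


joaquin = [76, 45, 83, 68, 64]
taran = [67, 70, 70, 62, 62]

for name, scores in [("Joaquin", joaquin), ("Taran", taran)]:
    print(name, summarise(scores))
```

The centres come out close together, so they say little about how differently the two sets of scores are scattered.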
Spread
Mean, median and mode are all good ways to find the centre of your data.
This information is most useful when the sets of data being compared are similar.
It is also important to find out how much your data is spread out. This gives a lot more insight into data sets that vary from each other.
Consider the following two data sets with identical mean and median values. Why is this information misleading? (Mean = 5, Median = 5)
Set A) 1, 2, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 7, 7, 8, 8, 9
Set B) 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7, 7
[Bar chart: Set A, frequency of each value from 1 to 9]
[Bar chart: Set B, frequency of each value from 1 to 9]
This information is misleading because Set A's graph is bell-shaped and Set B's is uniform, but the identical mean and median make the two sets appear similar when really A and B are spread out quite differently.
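The comparison can be reproduced with a small Python sketch that prints the centre of each set and a rough text histogram. The function name `frequency_table` is an assumption made for this example; the data are Set A and Set B as given.

```python
from collections import Counter
from statistics import mean, median

set_a = [1, 2, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 7, 7, 8, 8, 9]
set_b = [3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7, 7]


def frequency_table(data):
    """Count how often each value from 1 to 9 occurs (0 if absent)."""
    counts = Counter(data)
    return {value: counts.get(value, 0) for value in range(1, 10)}


for label, data in [("A", set_a), ("B", set_b)]:
    print(f"Set {label}: mean = {mean(data)}, median = {median(data)}")
    for value, freq in frequency_table(data).items():
        # One '#' per occurrence gives a sideways bar chart.
        print(f"  {value}: {'#' * freq}")
```

Both sets report a mean and median of 5, while the bars show Set A peaking in the middle and Set B staying flat between 3 and 7.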
Measures of Spread
In analysing data, it is often important to know whether it is spread out, or whether it is clustered around the mean.
Measures of spread are used to quantify the spread of the data.
The measures of spread, or dispersion, are:
• Range
• Quartiles
• Variance
• Standard deviation
Range
• The simplest measure of dispersion.
• Calculated by finding the difference between the greatest and the least values of the data.
• Useful since it is the easiest to understand.
• Affected by extreme data.
The range of the values 1, 2, 4, 6, 9, 11, 15, 25 is 25 – 1 = 24
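A minimal Python sketch of the range calculation, using the values from the slide. The function name `data_range` is chosen only for this illustration. Dropping the largest value shows how strongly one extreme value affects the result.

```python
def data_range(values):
    """Range = greatest value minus least value."""
    return max(values) - min(values)


values = [1, 2, 4, 6, 9, 11, 15, 25]

print(data_range(values))       # 25 - 1 = 24
print(data_range(values[:-1]))  # without the extreme value 25: 15 - 1 = 14
```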
Quartiles and Interquartile Ranges
Quartiles divide a set of ordered data into four groups with equal numbers of values.
[Diagram: ordered data divided into four groups, marked Lowest Datum, First Quartile (Q1), Median (Q2), Third Quartile (Q3), Highest Datum]
The three “dividing points” are the first quartile (Q1), the median (sometimes called the second quartile, Q2), and the third quartile (Q3).
The interquartile range is Q3 – Q1, which is the range of the middle half of the data.
The semi-interquartile range is one half of the interquartile range.
Both these ranges indicate how closely the data are clustered around the median.
Box and Whisker Plot
• Illustrates the quartiles
• The box shows the interquartile range
• The whiskers represent the lowest and highest values
• A modified box and whisker plot shows outliers outside of the whiskers
• See Page 141 for illustrations
Standard Deviation
A deviation is the difference between an individual value in a set of data and the mean for the data.
Standard deviation is the square root of the average of the squared distances of each piece of data from the mean.
The smaller the standard deviation, the more compact the data set.
Standard Deviation – Population
\( \sigma = \sqrt{\dfrac{\sum (x - \mu)^2}{N}} \)
σ = Standard Deviation - Population
∑ = Sum
μ = Mean
N = Number of data in population
Standard Deviation – Sample
\( s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}} \)
s = Standard Deviation - Sample
∑ = Sum
x̄ = Mean
n = Number of data in sample
Variance
The variance can be found by calculating the average squared difference (or deviation) of each value from the mean.
Population: \( \sigma^2 = \dfrac{\sum (x - \mu)^2}{N} \)
Sample: \( s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1} \)
Or square the standard deviation.
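The population and sample formulas differ only in the divisor, which a short Python sketch can make concrete. The function names are assumptions made for this illustration, and the data are Set B from earlier in the lesson (mean 5). The standard library's `statistics` module is used only as a cross-check.

```python
import math
import statistics


def population_variance(data):
    """Sum of squared deviations from the mean, divided by N."""
    mu = sum(data) / len(data)
    return sum((x - mu) ** 2 for x in data) / len(data)


def sample_variance(data):
    """Sum of squared deviations from the mean, divided by n - 1."""
    x_bar = sum(data) / len(data)
    return sum((x - x_bar) ** 2 for x in data) / (len(data) - 1)


data = [3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7, 7]

pop_var = population_variance(data)   # 30 / 15
samp_var = sample_variance(data)      # 30 / 14
pop_sd = math.sqrt(pop_var)           # standard deviation = square root of variance
samp_sd = math.sqrt(samp_var)

print(pop_var, pop_sd)
print(samp_var, samp_sd)

# Cross-check against the standard library implementations.
print(math.isclose(pop_var, statistics.pvariance(data)))
print(math.isclose(samp_sd, statistics.stdev(data)))
```

Dividing by n – 1 always gives a slightly larger value than dividing by N, and the gap shrinks as the data set grows.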
Standard Deviation – Group Data
If you are working with grouped data, you can estimate the standard deviation using the following formulas.
Population: \( \sigma = \sqrt{\dfrac{\sum f_i (m_i - \mu)^2}{N}} \)
Sample: \( s = \sqrt{\dfrac{\sum f_i (m_i - \bar{x})^2}{n - 1}} \)
fi = the frequency for a given interval
mi = the midpoint of the interval
Find the Measures of Spread
Rachelle works part-time at a gas station. Her gross earnings for the past eight weeks are shown.
$55 $68 $83 $59 $68 $95 $75 $65
Calculate the range, variance, standard deviation, interquartile range, and semi-interquartile range for her weekly earnings.
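Find the Measures of Dispersion
Range:
Range = 95 – 55 = 40
The range of Rachelle’s earnings is $40.

Find the Measures of Dispersion
Variance:
Gross Earnings (x)    x – x̄    (x – x̄)²
55                    –16      256
68                    –3       9
83                    12       144
59                    –12      144
68                    –3       9
95                    24       576
75                    4        16
65                    –6       36
Total: 568                     1190
Mean = 568 / 8 = 71
\( \text{Variance} = \dfrac{\sum (x - \bar{x})^2}{N} = \dfrac{1190}{8} = 148.75 \)
The variance of Rachelle’s earnings is 148.75 (in dollars squared).

Find the Measures of Dispersion
Standard Deviation:
\( \sqrt{\text{Variance}} = \sqrt{148.75} \approx 12.20 \)
The standard deviation of Rachelle’s earnings is about $12.20.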
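The range, variance and standard deviation for Rachelle's earnings can be checked with a short Python sketch. It assumes, as the formula on the slide does, that the eight weeks are treated as a whole population (divide by N). The variable names are chosen for this example only.

```python
import math

earnings = [55, 68, 83, 59, 68, 95, 75, 65]

n = len(earnings)
mean = sum(earnings) / n                      # 568 / 8

earnings_range = max(earnings) - min(earnings)

# Squared deviation of each week's earnings from the mean.
squared_deviations = [(x - mean) ** 2 for x in earnings]

variance = sum(squared_deviations) / n        # population form: divide by N
std_dev = math.sqrt(variance)

print("mean:", mean)
print("range:", earnings_range)
print("variance:", variance)
print("standard deviation:", round(std_dev, 2))
```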
Find the Measures of Spread
Interquartile range:
First, put the data into numerical order
55 59 65 68 68 75 83 95
\( Q_1 = \dfrac{59 + 65}{2} = 62 \)
\( Q_3 = \dfrac{75 + 83}{2} = 79 \)
Interquartile range = Q3 - Q1
= 79 – 62 = 17

Find the Measures of Spread
Semi-Interquartile range:
Semi-Interquartile range = 17/2
= 8.5
Therefore the interquartile range is 17 and the semi-interquartile range is 8.5.
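The quartile steps above can be sketched in Python. The function name `quartiles` is hypothetical, and the sketch assumes the lesson's method: take the median of the lower half for Q1 and of the upper half for Q3, leaving the middle value out of both halves when the count is odd. Other quartile conventions exist and can give slightly different answers. It needs at least two data values.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ordered = sorted(data)
    half = len(ordered) // 2
    lower = ordered[:half]      # values below the middle
    upper = ordered[-half:]     # values above the middle
    return median(lower), median(ordered), median(upper)


earnings = [55, 68, 83, 59, 68, 95, 75, 65]

q1, q2, q3 = quartiles(earnings)
iqr = q3 - q1
semi_iqr = iqr / 2

print("Q1:", q1, "median:", q2, "Q3:", q3)
print("interquartile range:", iqr)
print("semi-interquartile range:", semi_iqr)
```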
Standard Deviation Group Data - Example
The following table represents the number of hours per day of watching TV in a sample of 500 people.

Number of hours    0-1    2-3    4-5    6-7    8-9    10-11    12-13
Frequency          64     92     141    86     71     35       11

Using a mean of approximately 5.1 hours:

Interval    Midpoint (mi)    Frequency (fi)    (mi – μ)²                 fi(mi – μ)²
0 - 1       0.5              64                (0.5 – 5.1)² = 21.16      64 x 21.16 = 1354.24
2 - 3       2.5              92                6.76                      92 x 6.76 = 621.92
4 - 5       4.5              141               0.36                      141 x 0.36 = 50.76
6 - 7       6.5              86                1.96                      86 x 1.96 = 168.56
8 - 9       8.5              71                11.56                     71 x 11.56 = 820.76
10 - 11     10.5             35                29.16                     35 x 29.16 = 1020.6
12 - 13     12.5             11                54.76                     11 x 54.76 = 602.36
Total                        500                                         4639.2

\( \sigma = \sqrt{\dfrac{\sum f_i (m_i - \mu)^2}{N}} = \sqrt{\dfrac{4639.2}{500}} \approx 3.05 \)
Therefore the standard deviation is approximately 3.05.
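The grouped-data estimate can be reproduced with a brief Python sketch. Two assumptions are worth flagging: it follows the slide in dividing by N rather than n – 1, and it uses the unrounded mean computed from the midpoints instead of the rounded 5.1, so intermediate figures differ slightly from the table while the final answer still rounds to 3.05.

```python
import math

# Midpoints and frequencies of the intervals 0-1, 2-3, ..., 12-13 hours.
midpoints = [0.5, 2.5, 4.5, 6.5, 8.5, 10.5, 12.5]
frequencies = [64, 92, 141, 86, 71, 35, 11]

n = sum(frequencies)

# Estimated mean: each midpoint weighted by its frequency.
grouped_mean = sum(f * m for f, m in zip(frequencies, midpoints)) / n

# Sum of f_i * (m_i - mean)^2 over all intervals.
weighted_squares = sum(
    f * (m - grouped_mean) ** 2 for f, m in zip(frequencies, midpoints)
)

grouped_std_dev = math.sqrt(weighted_squares / n)

print("N:", n)
print("mean:", round(grouped_mean, 3))
print("standard deviation:", round(grouped_std_dev, 2))
```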
Z-Scores
• The number of standard deviations that a data point is away from the mean
– Thus if our standard deviation is 8, then how many 8’s is a data point (13) away from the average or centre?
– It is found by dividing the deviation by the standard deviation
• If your value is below the mean, its z-score will be negative.
• Similarly, if your value is above the mean, its z-score will be positive.
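The z-score rule can be written as a one-line Python function. The name `z_score` and the numbers in the example (a mean of 50 with a standard deviation of 8) are hypothetical and chosen only to show the sign of the result.

```python
def z_score(value, mean, std_dev):
    """Deviation from the mean, measured in standard deviations."""
    return (value - mean) / std_dev


# Hypothetical data set with mean 50 and standard deviation 8.
print(z_score(66, 50, 8))   # above the mean, so positive
print(z_score(42, 50, 8))   # below the mean, so negative
print(z_score(50, 50, 8))   # exactly at the mean
```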
Percentiles
• Similar to quartiles
• Percentiles divide the data into 100 intervals that have equal numbers of values.
• k percent of the data are less than or equal to the kth percentile, Pk
• Which means that you are finding what percent of the data is at or below your specific value in question
• Often used for standardized tests
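As a rough sketch of that idea, the Python function below reports what percent of a data set is less than or equal to a chosen value. The name `percent_at_or_below` is made up for this illustration, and the data are Rachelle's weekly earnings from the earlier example. Textbooks and software use several slightly different percentile conventions, so treat this as one simple reading of the definition.

```python
def percent_at_or_below(data, value):
    """Percent of the data that is less than or equal to value."""
    count = sum(1 for x in data if x <= value)
    return 100 * count / len(data)


earnings = [55, 68, 83, 59, 68, 95, 75, 65]

# Six of the eight weeks are at or below $75.
print(percent_at_or_below(earnings, 75))
# Only the lowest week is at or below $55.
print(percent_at_or_below(earnings, 55))
```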
Homework
Pg 148 #1-6, 14
I LOVE HOMEWORK