Graphical Displays of Information 3.1 – Tools for Analyzing Data Learning Goal: Identify the shape...
Graphical Displays of Information
3.1 – Tools for Analyzing Data
Learning Goal: Identify the shape of a histogram
MSIP / Home Learning: p. 146 #1, 2, 4, 9, 11 (data in Excel file on wiki), 13
Histograms
Show how data is spread out
Best choice for:
  continuous data
  discrete data with a large spread
Data is divided into 5-6 intervals
Bin width = width of each interval (the same for every interval)
Different bin widths can produce differently shaped distributions
Histogram Example
These histograms represent the same data
One shows much less of the structure of the data
Too many bins (bin width too small) is also a problem
[Figure: three histograms of the same data, each titled "Data Histogram", with Count on the vertical axis and SomeData on the horizontal axis, drawn with different bin widths]
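The effect of bin width can be tried out with a short Python sketch. The data set below is made up for illustration (it is not the data behind the slides), and the function name `histogram_counts` is an assumption. The same 20 values are counted with too few, a moderate number of, and too many bins.

```python
def histogram_counts(data, start, bin_width, num_bins):
    """Count values in num_bins equal-width bins beginning at `start`.

    Bin i covers start + i*bin_width <= x < start + (i + 1)*bin_width.
    Values outside the covered range are ignored.
    """
    counts = [0] * num_bins
    for x in data:
        i = int((x - start) // bin_width)
        if 0 <= i < num_bins:
            counts[i] += 1
    return counts

# Hypothetical data with two clusters (near 50 and near 100).
some_data = [42, 45, 47, 48, 50, 51, 53, 55, 58, 61,
             88, 92, 95, 97, 99, 100, 102, 104, 108, 115]

too_few = histogram_counts(some_data, start=40, bin_width=40, num_bins=2)
moderate = histogram_counts(some_data, start=40, bin_width=10, num_bins=8)
too_many = histogram_counts(some_data, start=40, bin_width=2, num_bins=40)

# Print simple text histograms, one row of # marks per bin.
for label, counts in [("2 bins", too_few), ("8 bins", moderate)]:
    print(label)
    for c in counts:
        print("  " + "#" * c)
```

With 2 bins the two bars are equal and the gap between the clusters disappears; with 40 bins no bar is taller than 2, so the clusters are hard to see as well.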
Histogram Applet – Old Faithful
http://www.stat.sc.edu/~west/javahtml/Histogram.html
Bin Width Calculation
Bin width = (range) ÷ (number of intervals)
  where range = (max) – (min)
Number of intervals is usually 5-6
Bins should not overlap
  Incorrect: 0-10, 10-20, 20-30, 30-40, etc.
  Correct:
    Discrete: 0-9, 10-19, 20-29, 30-39, etc.
    Continuous: 0-9.99, 10-19.99, 20-29.99, 30-39.99, etc.
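The bin width rule and the no-overlap rule can be sketched in Python. This is a minimal sketch, not the textbook's procedure: the names `make_bins` and `count_per_bin`, the sample marks and the choice of 5 intervals are assumptions for illustration. Each bin is treated as half-open (lower edge included, upper edge excluded), which is one common way to make sure no value can land in two bins.

```python
def make_bins(data, num_intervals=5):
    """Return (bin_width, edges) for equal-width, non-overlapping bins.

    Bin i covers edges[i] <= x < edges[i + 1]; the last bin also
    includes the maximum value.
    """
    low, high = min(data), max(data)
    data_range = high - low                    # range = max - min
    bin_width = data_range / num_intervals     # bin width = range / intervals
    edges = [low + i * bin_width for i in range(num_intervals + 1)]
    return bin_width, edges

def count_per_bin(data, edges):
    """Count how many values fall into each bin."""
    counts = [0] * (len(edges) - 1)
    width = edges[1] - edges[0]
    for x in data:
        # The maximum value would index one past the end, so clamp it.
        i = min(int((x - edges[0]) / width), len(counts) - 1)
        counts[i] += 1
    return counts

# Hypothetical marks, made up for this example.
marks = [40, 45, 52, 58, 61, 63, 67, 70, 74, 90]
bin_width, edges = make_bins(marks, num_intervals=5)
counts = count_per_bin(marks, edges)
print(bin_width, edges, counts)
```

Here the range is 90 – 40 = 50, so 5 intervals give a bin width of 10 and edges at 40, 50, 60, 70, 80 and 90.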
Shapes of Distributions
Symmetric
  Mound-shaped
  U-shaped
  Uniform
Unsymmetrical
  Left-skewed
  Right-skewed
Mound-shaped distribution
Middle interval(s) have the greatest frequency / tallest bars
Bars get shorter as you move away
E.g. roll 2 dice, height, memory
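The two-dice example can be checked by listing every possible roll. The Python sketch below counts how often each total occurs across all 36 equally likely outcomes of two six-sided dice; the variable names are illustrative only.

```python
from collections import Counter

# Tally the total for every (first die, second die) combination.
totals = Counter(a + b for a in range(1, 7) for b in range(1, 7))

# Text histogram: one # per outcome that gives that total.
for total in range(2, 13):
    print(f"{total:2d} {'#' * totals[total]}")
```

The bar for 7 is the tallest (6 outcomes) and the bars shrink by one on each side down to a single outcome for 2 and for 12, which is the mound shape described above.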
U-shaped distribution
Lowest frequency in the centre, higher towards the outside
E.g. height of a combined grade 1 and 6 class
[Figure: histogram titled "Student Heights"; horizontal axis Height (cm) in 5 cm intervals from 105.5-110.5 to 160.5-165.5; vertical axis Frequency, scaled 0 to 12]
Uniform distribution
All bars are approximately the same height
E.g. roll a die 50 times
Symmetric distribution
A distribution that is the same on either side of the centre
U-shaped, uniform and mound-shaped distributions are symmetric
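One rough way to test "the same on either side of the centre" is to compare each bar with its mirror-image bar. The Python sketch below is only an illustration: the function name, the bar heights and the tolerance of 1 are all assumptions, and real data rarely matches exactly, so some tolerance is needed.

```python
def is_roughly_symmetric(frequencies, tolerance=1):
    """True if every bar is within `tolerance` of its mirror-image bar."""
    mirrored = list(reversed(frequencies))
    return all(abs(a - b) <= tolerance
               for a, b in zip(frequencies, mirrored))

# Hypothetical bar heights (frequencies), listed left to right.
mound = [1, 4, 9, 4, 1]
u_shaped = [8, 3, 1, 2, 8]
uniform = [8, 9, 8, 8, 9]
left_skewed = [1, 2, 4, 7, 10]

for name, bars in [("mound", mound), ("U-shaped", u_shaped),
                   ("uniform", uniform), ("left-skewed", left_skewed)]:
    print(name, is_roughly_symmetric(bars))
```

The first three shapes pass the check and the skewed one fails, matching the grouping above.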
Skewed distribution (left or right)
Highest frequencies at one end
Left-skewed has higher bars on the right and drops off to the left
  E.g. the years on a handful of quarters
Right-skewed has higher bars on the left and drops off to the right
  E.g. the years of cars on a classic car lot
MSIP / Home Learning
Define in your notes:
  Frequency distribution (p. 142-143)
  Cumulative frequency (p. 148)
  Relative frequency (p. 148)
Complete p. 146 #1, 2, 4, 9, 11 (data in Excel file on wiki), 13
Warm up - Class marks
[Figure: histogram titled "Collection 1 Histogram"; horizontal axis Mark, 0 to 100; vertical axis scaled 1 to 7]
What shape is this distribution?
Which of the following can you tell from the graph: mean? median? modal interval?
Answers:
  Shape: left-skewed
  Modal interval: 72-80
  Median: 64-72 (70 actual)
  Mean: 66
Minds On!
Mr. Lieff recorded the following 20 quiz marks:
60 60 60 60 60 60 70 70 70 70
80 80 80 80 80 100 100 100 100 100
Find the average mark 2 different ways.
Measures of Central Tendency (Mean, Median, Mode)
Chapter 3.2 – Tools for Analyzing Data
Learning Goal: Calculate the mean, median and mode for weighted / grouped data
Due now: p. 146 #1, 2, 4, 9, 11 (data in Excel file on wiki), 13
MSIP / Home Learning: p. 159 #4, 5, 6, 8, 10-13
Sigma Notation
The sigma notation is used to compactly express a mathematical series
  ex: 1 + 2 + 3 + 4 + … + 15
This can be expressed:
  $\sum_{k=1}^{15} k$
The variable k is called the index of summation
The number 1 is the lower limit and the number 15 is the upper limit
We would say: “the sum of k for k = 1 to k = 15”
Example 1:
Write in expanded form:
  $\sum_{n=4}^{7} (2n+1)$
This is the sum of the term 2n+1 as n takes on the values from 4 to 7.
  = (2×4 + 1) + (2×5 + 1) + (2×6 + 1) + (2×7 + 1)
  = 9 + 11 + 13 + 15
  = 48
NOTE: any letter can be used for the index of summation, though a, n, i, j, k & x are the most common
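Sigma notation maps directly onto a loop that adds up one term per index value. The Python sketch below (the helper name `sigma` is an assumption, not a built-in) evaluates the series 1 + 2 + … + 15 and the sum of 2n+1 for n from 4 to 7.

```python
def sigma(term, lower, upper):
    """Add term(k) for k = lower, lower + 1, ..., upper (both limits included)."""
    return sum(term(k) for k in range(lower, upper + 1))

sum_of_k = sigma(lambda k: k, 1, 15)            # 1 + 2 + ... + 15
example_1 = sigma(lambda n: 2 * n + 1, 4, 7)    # 9 + 11 + 13 + 15

print(sum_of_k)    # 120
print(example_1)   # 48
```

Note that `range(lower, upper + 1)` is needed because Python's `range` stops one short of its second argument, while the upper limit in sigma notation is included.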
Example 2: write the following in sigma notation
  $3 + \frac{3}{2} + \frac{3}{4} + \frac{3}{8}$
  $= \sum_{n=0}^{3} \frac{3}{2^n}$
The Mean (Average)
  $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
Found by dividing the sum of all the data points by the number of data points
Affected greatly by outliers
Deviation (3.5)
  the distance of a data point from the mean
  calculated by subtracting the mean from the value, i.e. $x - \bar{x}$
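The mean and the deviations can be computed in a few lines of Python. The data values below are made up for illustration. One useful check is that the deviations from the mean always add up to zero.

```python
from statistics import mean

# Hypothetical data, made up for this example.
values = [4, 6, 8, 10, 12]

x_bar = mean(values)                        # sum of the values / number of values
deviations = [x - x_bar for x in values]    # each value minus the mean

print(x_bar)            # 8
print(deviations)       # [-4, -2, 0, 2, 4]
print(sum(deviations))  # 0
```

A negative deviation means the value lies below the mean, a positive one means it lies above.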
The Weighted Mean
  $\bar{x} = \frac{\sum_{i=1}^{n} x_i w_i}{\sum_{i=1}^{n} w_i}$
where $x_i$ represents the data points and $w_i$ represents the weight or the frequency
“The sum of the products of each item and its weight divided by the sum of the weights”
See examples on pages 153 and 154
Example: 7 students have a mark of 70 and 10 students have a mark of 80
  mean = (70×7 + 80×10) ÷ (7+10) = 75.9
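The weighted mean can be written as a small Python function. This is a sketch with an assumed function name, `weighted_mean`, applied to the marks example above (7 students with 70 and 10 students with 80).

```python
def weighted_mean(values, weights):
    """Sum of (value x weight) divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    return sum(x * w for x, w in zip(values, weights)) / total_weight

marks = [70, 80]
students = [7, 10]     # weights: how many students earned each mark

result = weighted_mean(marks, students)
print(round(result, 1))   # 75.9
```

The result is closer to 80 than to 70 because more students earned 80, so that mark carries more weight.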
Means with grouped data
For data that is already grouped into class intervals (assuming you do not have the original data), you must use the midpoint of each class as the value and the class frequency as the weight to estimate the mean
See the example on pages 154-155
Median
the midpoint of the data
calculated by placing all the values in order
if there is an odd number of values, the median is the middle number
  1 4 6 8 9   median = 6
if there is an even number of values, the median is the mean of the middle two numbers
  1 4 6 8 9 12   median = 7
not affected greatly by outliers
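Both median cases can be checked with Python's standard `statistics.median`, which sorts the values itself before picking the middle. The variable names are illustrative only.

```python
from statistics import median

odd_count = median([1, 4, 6, 8, 9])          # middle value
even_count = median([1, 4, 6, 8, 9, 12])     # mean of the middle two, 6 and 8
unsorted_input = median([9, 1, 8, 4, 6])     # same values as the first list, out of order

print(odd_count, even_count, unsorted_input)
```

The first and third results are both 6, which shows that the order in which the values were recorded does not matter.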
Mode
The number that occurs most often
There may be no mode, one mode, two modes (bimodal), etc.
Which distributions from yesterday have one mode?
  Mound-shaped, Left/Right-skewed
Two modes?
  U-shaped, Mound-shaped (could), Uniform (could)
Modes are appropriate for discrete data or non-numerical data
  Shoe size, Number of siblings
  Eye colour, Favourite subject
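Modes can be found with `statistics.multimode` (Python 3.8 or later), which returns every value tied for the highest count and also works on non-numerical data. The survey answers below are made up for illustration.

```python
from statistics import multimode

# Hypothetical survey answers, made up for this example.
shoe_sizes = [7, 8, 8, 9, 9, 9, 10]
eye_colours = ["brown", "blue", "brown", "green", "blue"]
siblings = [0, 1, 2, 3]

one_mode = multimode(shoe_sizes)      # a single most common value
two_modes = multimode(eye_colours)    # two values tied for most common (bimodal)
all_tied = multimode(siblings)        # every value occurs exactly once

print(one_mode, two_modes, all_tied)
```

When every value occurs equally often, `multimode` returns all of them; that situation is often described as having no mode.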
Distributions and Central Tendency
the relationship between the three measures changes depending on the shape of the distribution
  symmetric (mound-shaped): mean = median = mode
  right-skewed: mean > median > mode
  left-skewed: mean < median < mode
[Figure: three histograms, each titled "Data Histogram", with Count on the vertical axis and data (0 to 7) on the horizontal axis]
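The ordering of mean, median and mode for skewed data can be checked on a small example. The data below is made up for illustration; the left-skewed set is simply the mirror image of the right-skewed one. Real data sets do not always follow this ordering exactly, so treat it as a typical pattern.

```python
from statistics import mean, median, mode

# Hypothetical right-skewed data: most values are low, with a tail to the right.
right_skewed = [1, 1, 1, 2, 2, 3, 4, 7]
# Mirror image (8 - x): most values are high, with a tail to the left.
left_skewed = [8 - x for x in right_skewed]

for name, data in [("right-skewed", right_skewed), ("left-skewed", left_skewed)]:
    print(name, "mean =", mean(data), "median =", median(data), "mode =", mode(data))
```

For the right-skewed set the mean (2.625) is greater than the median (2), which is greater than the mode (1); for the left-skewed set the order is reversed.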
What Method is Most Appropriate?
Outliers are data points that are quite different from the other points
Outliers affect the mean the most
The median is least affected by outliers
Skewed data is best represented by the median
If symmetric, use either the median or the mean
If not numeric, or if the frequency is the most critical measure, use the mode
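The effect of an outlier on the mean and the median can be seen by adding one extreme value to a small data set. The quiz marks below are made up for illustration.

```python
from statistics import mean, median

# Hypothetical quiz marks, made up for this example.
marks = [68, 70, 72, 74, 76]
with_outlier = marks + [0]    # one extra mark far below the rest

mean_before, median_before = mean(marks), median(marks)
mean_after, median_after = mean(with_outlier), median(with_outlier)

print("mean:", mean_before, "->", mean_after)
print("median:", median_before, "->", median_after)
```

The single outlier pulls the mean down from 72 to 60, while the median only moves from 72 to 71.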
Example 3
Survey responses   1   2   3    4
Frequency          2   8   14   3
a) Find the mean, median and mode
  mean = [(1×2) + (2×8) + (3×14) + (4×3)] / 27 = 2.7
  median = 3 (27 data points, so #14 falls in bin 3)
  mode = 3
b) Which way is it skewed?
  Left
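The Example 3 results can be checked in Python by expanding the frequency table back into the 27 individual responses. This is one possible approach, not the only one; the variable names are illustrative.

```python
from statistics import mean, median, mode

responses = [1, 2, 3, 4]
frequency = [2, 8, 14, 3]

# Repeat each response as many times as its frequency.
expanded = [r for r, f in zip(responses, frequency) for _ in range(f)]

example_mean = mean(expanded)
example_median = median(expanded)
example_mode = mode(expanded)

print(len(expanded), round(example_mean, 1), example_median, example_mode)
```

The mean (about 2.7) is below the median (3), which is consistent with the left skew.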
Example 4
Height            141-150   151-160   161-170
No. of Students   3         7         4
Find the mean, median and mode
  mean = [(145.5×3) + (155.5×7) + (165.5×4)] ÷ 14 = 156.2
  median = 151-160 or 155.5
  mode = 151-160 or 155.5
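The grouped-data estimates in Example 4 can be sketched in Python. The function name `median_interval` and its rule (take the first interval whose cumulative frequency reaches half the total) are assumptions for illustration; when the two middle values fall in different intervals, this rule reports only the lower one.

```python
intervals = [(141, 150), (151, 160), (161, 170)]
students = [3, 7, 4]

# Use the midpoint of each interval as the value and the frequency as its weight.
midpoints = [(low + high) / 2 for low, high in intervals]
estimated_mean = sum(m * f for m, f in zip(midpoints, students)) / sum(students)

# Modal interval: the interval with the highest frequency.
modal_interval = intervals[students.index(max(students))]

def median_interval(intervals, frequencies):
    """First interval whose cumulative frequency reaches half the total."""
    half = sum(frequencies) / 2
    running = 0
    for interval, f in zip(intervals, frequencies):
        running += f
        if running >= half:
            return interval

print(midpoints, round(estimated_mean, 1))
print(modal_interval, median_interval(intervals, students))
```

With 14 students, the 7th and 8th heights both fall in 151-160 (cumulative frequencies 3, then 10), so the median interval and the modal interval agree.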
MSIP / Home Learning
p. 159 #4, 5, 6, 8, 10-13