Central Tendency
Mechanics
Notation
• When we describe a set of data corresponding to the values of some variable, we will refer to that set using an uppercase letter such as X or Y.
• When we want to talk about specific data points within that set, we specify those points by adding a subscript to the uppercase letter, like $X_1$.
– $X$ = variable, $X_i$ = specific value
Example
5, 8, 12, 3, 6, 8, 7
$X_1, X_2, X_3, X_4, X_5, X_6, X_7$
Summation
• The Greek letter sigma, which looks like $\Sigma$, means “add up” or “sum” whatever follows it.
• For example, $\sum X_i$ means “add up all the $X_i$s”.
• If we use the $X_i$s from the previous example, $\sum X_i = 49$ (or just $\sum X$).
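As a quick check, the summation can be sketched in a few lines of Python. The list name `x` is only an illustrative choice, and note that Python counts positions from 0, so `x[0]` plays the role of $X_1$.

```python
# Data from the example: X_1 ... X_7
x = [5, 8, 12, 3, 6, 8, 7]

# "Sigma X_i" means: add up every X_i
total = sum(x)

print(x[0])   # X_1 -> 5
print(total)  # 49
```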
Example
| Student | Pred. Score (X) | Actual Score (Y) |
|---|---|---|
| 1 | 82 | 84 |
| 2 | 66 | 51 |
| 3 | 70 | 72 |
| 4 | 81 | 56 |
| 5 | 61 | 73 |
Example
$\sum X = 82 + 66 + 70 + 81 + 61 = 360$
$\sum Y = 84 + 51 + 72 + 56 + 73 = 336$
$\sum (X - Y) = (82-84) + (66-51) + (70-72) + (81-56) + (61-73) = -2 + 15 + (-2) + 25 + (-12) = 24$
$\sum X^2 = 82^2 + 66^2 + 70^2 + 81^2 + 61^2 = 6724 + 4356 + 4900 + 6561 + 3721 = 26262$
One can also see it written as $\sum (X^2)$.
$(\sum X)^2 = 360^2 = 129600$
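The difference between the sum of squares and the squared sum is easy to miss on paper, so here is a small Python sketch of the same calculations. The variable names are illustrative only.

```python
x = [82, 66, 70, 81, 61]  # predicted scores (X)
y = [84, 51, 72, 56, 73]  # actual scores (Y)

sum_x = sum(x)                                    # Sigma X
sum_y = sum(y)                                    # Sigma Y
sum_diff = sum(xi - yi for xi, yi in zip(x, y))   # Sigma (X - Y)
sum_of_squares = sum(xi ** 2 for xi in x)         # Sigma X^2: square first, then add
square_of_sum = sum(x) ** 2                       # (Sigma X)^2: add first, then square

print(sum_x, sum_y, sum_diff)         # 360 336 24
print(sum_of_squares, square_of_sum)  # 26262 129600
```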
Calculations of Measures of Central Tendency
• Mode = most commonly occurring value
– May have bimodal, trimodal, etc. distributions.
– A uniform distribution is one in which every value has an equal chance of occurring.
• Median
– Once the data are ordered, the position of the median value can be calculated using the following formula:
$\text{Median Location} = \frac{N + 1}{2}$
Median
• If there are an odd number of data points:
(1, 2, 2, 3, 3, 4, 4, 5, 6)
$\text{Median Location} = \frac{9 + 1}{2} = 5$
• The median is the item in the fifth position of the ordered data set, therefore the median is 3.
Median
• If there are an even number of data points:
(1, 2, 2, 3, 3, 4, 4, 5, 6, 793)
• The formula would tell us to look in the 5.5th place, which we can’t really do.
• However, we can take the average of the 5th and 6th values to give us the median.
• In the above scenario, 3 is in the fifth place and 4 is in the sixth place, so we can use 3.5 as our median.
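The odd and even cases can be sketched together in Python. The function name `median_by_location` is a hypothetical helper for illustration; it follows the (N + 1)/2 location rule rather than any particular library's implementation.

```python
def median_by_location(values):
    """Median via the (N + 1) / 2 location rule."""
    if not values:
        raise ValueError("need at least one value")
    ordered = sorted(values)
    n = len(ordered)
    location = (n + 1) / 2        # 1-based position in the ordered data
    k = int(location)             # whole part of the location
    if n % 2 == 1:
        return ordered[k - 1]     # odd N: the location is a real position
    # even N: location is k + 0.5, so average the k-th and (k+1)-th values
    return (ordered[k - 1] + ordered[k]) / 2

print(median_by_location([1, 2, 2, 3, 3, 4, 4, 5, 6]))       # 3
print(median_by_location([1, 2, 2, 3, 3, 4, 4, 5, 6, 793]))  # 3.5
```

Note that the extreme value 793 barely moves the median.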
The Arithmetic Mean
$\bar{X} = \frac{\sum X}{N}$
• For example, given the data set that we used to calculate the median (odd number example), the corresponding mean would be:
$\bar{X} = \frac{30}{9} = 3.33$
• Note that they are not exactly the same.
– When would they be?
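A minimal Python sketch of the mean for the odd-number data set follows. The second calculation, using the even-number data set with its extreme value of 793, is added here only as an illustration of how strongly one value can pull the mean.

```python
data = [1, 2, 2, 3, 3, 4, 4, 5, 6]

# X-bar = Sigma X / N
mean = sum(data) / len(data)
print(round(mean, 2))  # 3.33 (the median of the same data is 3)

# Same data plus the extreme value from the even-number median example
data_with_extreme = data + [793]
mean_with_extreme = sum(data_with_extreme) / len(data_with_extreme)
print(round(mean_with_extreme, 1))  # 82.3 (the median only moved to 3.5)
```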
Example: Slices of Pizza Eaten Last Week

| Value | Freq |
|---|---|
| 0 | 4 |
| 1 | 2 |
| 2 | 8 |
| 3 | 6 |
| 4 | 6 |
| 5 | 6 |
| 6 | 5 |
| 8 | 5 |
| 10 | 2 |
| 15 | 1 |
| 16 | 1 |
| 20 | 1 |
| 40 | 1 |

Mode = 2 slices per week
Median = 4 slices per week
Mean = 5.7 slices per week
• This raises the issue of which measure is best
Other Means
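• Geometric mean
$GM = \sqrt[n]{x_1 x_2 x_3 x_4 \cdots x_n}$
– For $X_i = 2, 2, 6, 4, 5$: $GM = \sqrt[5]{2 \cdot 2 \cdot 6 \cdot 4 \cdot 5} = \sqrt[5]{480} = 3.438$
• Harmonic mean
$HM = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \frac{1}{x_3} + \frac{1}{x_4} + \cdots + \frac{1}{x_n}}$
– For $X_i = 2, 2, 6, 4, 5$: $HM = \frac{5}{\frac{1}{2} + \frac{1}{2} + \frac{1}{6} + \frac{1}{4} + \frac{1}{5}} = 3.093$
• Compare both to the arithmetic mean of 3.8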
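The three means for the data 2, 2, 6, 4, 5 can be computed side by side in Python. This is a sketch using only the standard library (`math.prod` needs Python 3.8 or later); the variable names are illustrative.

```python
import math

x = [2, 2, 6, 4, 5]
n = len(x)

am = sum(x) / n                      # arithmetic mean
gm = math.prod(x) ** (1 / n)         # geometric mean: n-th root of the product
hm = n / sum(1 / xi for xi in x)     # harmonic mean: n over the sum of reciprocals

print(round(am, 3))  # 3.8
print(round(gm, 3))  # 3.438
print(round(hm, 3))  # 3.093
```

For positive data that are not all equal, the ordering HM < GM < AM seen here always holds.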
Other Means
• Weighted mean
– Multiply each score by the weight, sum those, then divide by the sum of the weights.
$WM = \frac{\sum w_i x_i}{\sum w_i}$
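A short Python sketch of the weighted mean is given below. The scores and weights are made-up values for illustration (they do not come from the slides), and `weighted_mean` is a hypothetical helper name.

```python
def weighted_mean(scores, weights):
    """Sum of (weight * score) divided by the sum of the weights."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    return sum(w * x for w, x in zip(weights, scores)) / sum(weights)

# Hypothetical example: the second score counts twice as much as the others
scores = [80, 90, 70]
weights = [1, 2, 1]

wm = weighted_mean(scores, weights)
print(wm)  # (80 + 180 + 70) / 4 = 82.5
```

With all weights equal, this reduces to the ordinary arithmetic mean.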
Trimmed mean
• You are very familiar with this in terms of the median, in which essentially all but the middle value is trimmed (i.e., a 50% trimmed mean).
• But now we want to retain as much of the data as possible for best performance, while trimming enough to ensure resistance to outliers.
• How much to trim?
– About 20%, and that means from both sides
– Example: 15 values. .2 * 15 = 3, so remove the 3 largest and 3 smallest
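The 15-value case can be sketched in Python as follows. The data set is hypothetical (chosen so that one value is clearly extreme), and `trimmed_mean` is an illustrative helper that rounds the number of trimmed values down, which is one common convention.

```python
def trimmed_mean(values, proportion=0.2):
    """Drop `proportion` of the values from EACH end, then average the rest."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    ordered = sorted(values)
    k = int(proportion * len(ordered))       # number removed from each side
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

# Hypothetical data: 15 values, one of them extreme
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 100]

print(round(sum(data) / len(data), 2))  # ordinary mean: 13.67
print(trimmed_mean(data))               # .2 * 15 = 3 trimmed per side -> 8.0
```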
Winsorized Mean
• Make some percentage of the most extreme values the same as the previous, non-extreme value
• Think of the 20% Winsorized mean as affecting the same number of values as the trimming
– Original data: 1, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5, 6, 8, 10
– 20% Winsorized data: 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5
• Median = 3.5
• Huber’s $M_1$ = 3.56
• $M_{.20}$ (20% trimmed mean) = 3.583
• $WM_{.20}$ (20% Winsorized mean) = 3.75
• Mean = 3.95
• Which of these best represents the sample’s central tendency?
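Winsorizing the 20 values above can be sketched in Python like this. `winsorized_mean` is a hypothetical helper name, and rounding the number of affected values down is an assumed convention.

```python
def winsorized_mean(values, proportion=0.2):
    """Replace the k most extreme values on each side with their nearest kept neighbor."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    ordered = sorted(values)
    n = len(ordered)
    k = int(proportion * n)                       # values affected on each side
    if k > 0:
        ordered[:k] = [ordered[k]] * k            # pull the low end up
        ordered[n - k:] = [ordered[n - k - 1]] * k  # pull the high end down
    return sum(ordered) / n

data = [1, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5, 6, 8, 10]

print(sum(data) / len(data))   # ordinary mean: 3.95
print(winsorized_mean(data))   # 20% Winsorized mean: 3.75
```

Unlike trimming, Winsorizing keeps the sample size at 20, which is why the divisor is still `n`.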
M-estimators
• Wilcox’s text example with more detail, to show the ‘gist’ of the calculation
• Data = 3, 4, 8, 16, 24, 53
• We will start by using a measure of outlierness as follows:
$\frac{|X - M|}{MAD / .6745} > 1.28$
• What it means:
– M = median
– MAD = median absolute deviation
• Order the absolute deviations from the median, pick the median of those deviations
– .6745 = dividing MAD by this allows this measure of variability to estimate the population standard deviation
• When we do, we will call it MADN in the upcoming formula
– So basically it’s the old ‘Z score > x’ approach, just made resistant to outliers
M-estimators
• Median = 12
• Median absolute deviation
– Deviations from the median: -9, -8, -4, 4, 12, 41
– Absolute deviations, ordered: 4, 4, 8, 9, 12, 41
– MAD is 8.5, 8.5/.6745 = 12.6
• So if the absolute deviation from the median divided by 12.6 is greater than 1.28, we will call it an outlier
• In this case the value of 53 is an outlier
– (53-12)/12.6 = 3.25
– If one used the poorer method of using a simple z-score > 2 (or whatever) based on means and standard deviations, its influence is such that the z-score of 1.85 would not signify it as an outlier
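The outlier check above can be reproduced with Python's standard `statistics` module. This is a sketch of the rule as described here, with illustrative variable names; the z-score uses the sample standard deviation (n - 1 in the denominator), which is the assumption that reproduces the 1.85 quoted above.

```python
from statistics import mean, median, stdev

data = [3, 4, 8, 16, 24, 53]

m = median(data)                                  # 12.0
mad = median([abs(x - m) for x in data])          # median absolute deviation: 8.5
madn = mad / 0.6745                               # about 12.6

# Flag a value when its rescaled distance from the median exceeds 1.28
outliers = [x for x in data if abs(x - m) / madn > 1.28]
print(round(madn, 1), outliers)                   # 12.6 [53]

# The classic z-score for 53 is dragged down by 53's own influence
z_53 = (53 - mean(data)) / stdev(data)
print(round(z_53, 2))                             # 1.85
```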
M-estimators
• L = number of outliers less than the median
– For our data none qualify
• U = number of outliers greater than the median
– For our data 1 value is an upper outlier
• B = sum of values that are not outliers
$M_{est} = \frac{1.28(MADN)(U - L) + B}{n - L - U}$
• Notice that if there are no outliers, this would default to the mean
M-estimators
$M_{est} = \frac{1.28(12.6)(1 - 0) + 55}{6 - 0 - 1} = 14.226$
• Compare with the mean of 18
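Putting the pieces together, the whole calculation can be sketched as one Python function. The name `one_step_m_estimate` is hypothetical, and the code follows the L, U, B formula as given in these slides rather than any particular library's routine. It assumes MAD is not zero.

```python
from statistics import median

def one_step_m_estimate(values, k=1.28):
    """M-estimate using the L, U, B formula from the slides (assumes MAD > 0)."""
    m = median(values)
    madn = median([abs(x - m) for x in values]) / 0.6745
    lower = [x for x in values if (x - m) / madn < -k]   # outliers below the median
    upper = [x for x in values if (x - m) / madn > k]    # outliers above the median
    b = sum(x for x in values if abs(x - m) / madn <= k)  # sum of the non-outliers
    n, L, U = len(values), len(lower), len(upper)
    return (k * madn * (U - L) + b) / (n - L - U)

data = [3, 4, 8, 16, 24, 53]
m_est = one_step_m_estimate(data)

print(round(m_est, 3))          # 14.226
print(sum(data) / len(data))    # ordinary mean: 18.0
```

The estimate sits between the median (12) and the mean (18), since the outlying 53 is down-weighted rather than dropped or kept at full value.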