Mathematical Statistics Anna Janickastrony.wne.uw.edu.pl/ajanicka/wp-content/uploads/... · Range...
![Page 1: Mathematical Statistics Anna Janickastrony.wne.uw.edu.pl/ajanicka/wp-content/uploads/... · Range the most simple measure, does not take into account anything but the extreme values](https://reader033.fdocuments.net/reader033/viewer/2022053015/5f1494f0cf6b65555e04c30a/html5/thumbnails/1.jpg)
Mathematical Statistics
Anna Janicka
Lecture II, 24.02.2020
DESCRIPTIVE STATISTICS, PART II
![Page 2: Mathematical Statistics Anna Janickastrony.wne.uw.edu.pl/ajanicka/wp-content/uploads/... · Range the most simple measure, does not take into account anything but the extreme values](https://reader033.fdocuments.net/reader033/viewer/2022053015/5f1494f0cf6b65555e04c30a/html5/thumbnails/2.jpg)
Plan for today
1. Descriptive Statistics, part II:
mode
quantiles
measures of variability
measures of asymmetry
the boxplot
![Page 3: Mathematical Statistics Anna Janickastrony.wne.uw.edu.pl/ajanicka/wp-content/uploads/... · Range the most simple measure, does not take into account anything but the extreme values](https://reader033.fdocuments.net/reader033/viewer/2022053015/5f1494f0cf6b65555e04c30a/html5/thumbnails/3.jpg)
Measures of central tendency – reminder
Classic:
arithmetic mean
Position (order, rank):
median
mode
quartile
![Page 4: Mathematical Statistics Anna Janickastrony.wne.uw.edu.pl/ajanicka/wp-content/uploads/... · Range the most simple measure, does not take into account anything but the extreme values](https://reader033.fdocuments.net/reader033/viewer/2022053015/5f1494f0cf6b65555e04c30a/html5/thumbnails/4.jpg)
Example 1 – cont.
| Grade | Number | Frequency |
|---|---|---|
| 2 | 74 | 29.84% |
| 3 | 76 | 30.65% |
| 3.5 | 48 | 19.35% |
| 4 | 31 | 12.50% |
| 4.5 | 9 | 3.63% |
| 5 | 10 | 4.03% |
| Total | 248 | 100% |
![Page 5: Mathematical Statistics Anna Janickastrony.wne.uw.edu.pl/ajanicka/wp-content/uploads/... · Range the most simple measure, does not take into account anything but the extreme values](https://reader033.fdocuments.net/reader033/viewer/2022053015/5f1494f0cf6b65555e04c30a/html5/thumbnails/5.jpg)
Example 3 – cont.
| Interval | Class mark | Number | Frequency | Cumulative number cn_i | Cumulative frequency cf_i |
|---|---|---|---|---|---|
| (30,40] | 35 | 11 | 0.11 | 11 | 0.11 |
| (40,50] | 45 | 23 | 0.23 | 34 | 0.34 |
| (50,60] | 55 | 33 | 0.33 | 67 | 0.67 |
| (60,70] | 65 | 12 | 0.12 | 79 | 0.79 |
| (70,80] | 75 | 6 | 0.06 | 85 | 0.85 |
| (80,90] | 85 | 8 | 0.08 | 93 | 0.93 |
| (90,100] | 95 | 3 | 0.03 | 96 | 0.96 |
| (100,110] | 105 | 2 | 0.02 | 98 | 0.98 |
| (110,120] | 115 | 2 | 0.02 | 100 | 1 |
| Total | | 100 | 1 | | |
![Page 6: Mathematical Statistics Anna Janickastrony.wne.uw.edu.pl/ajanicka/wp-content/uploads/... · Range the most simple measure, does not take into account anything but the extreme values](https://reader033.fdocuments.net/reader033/viewer/2022053015/5f1494f0cf6b65555e04c30a/html5/thumbnails/6.jpg)
Mode
Mode
the value that appears most often
for grouped data:
Mo = most frequent value
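for grouped class interval data:
$$Mo = c_L + \frac{n_{Mo} - n_{Mo-1}}{(n_{Mo} - n_{Mo-1}) + (n_{Mo} - n_{Mo+1})} \cdot b$$
where
$n_{Mo}$ – number of elements in mode’s class,
$n_{Mo-1}$, $n_{Mo+1}$ – numbers of elements in the classes directly before and after the mode’s class,
$c_L$, $b$ – analogous to the median (lower end and length of the mode’s class interval)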
![Page 7: Mathematical Statistics Anna Janickastrony.wne.uw.edu.pl/ajanicka/wp-content/uploads/... · Range the most simple measure, does not take into account anything but the extreme values](https://reader033.fdocuments.net/reader033/viewer/2022053015/5f1494f0cf6b65555e04c30a/html5/thumbnails/7.jpg)
Mode – examples
Example 1:
Mo = 3
Example 3:
the mode’s interval is (50,60], with 33 elements
nMo = 33, cL = 50, b = 10, nMo-1 = 23, nMo+1 = 12
$$Mo = 50 + \frac{33 - 23}{(33 - 23) + (33 - 12)} \cdot 10 \approx 53.23$$
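The interpolation used for the mode of class interval data can be sketched in a few lines of Python. This is only an illustration under the slide's own assumption of equal class lengths; the function name `grouped_mode` and its parameter names are hypothetical, not part of the lecture.

```python
def grouped_mode(lower, width, n_mode, n_prev, n_next):
    """Interpolated mode for class interval data (equal class lengths assumed).

    lower  -- lower end of the mode's class (c_L)
    width  -- length of the class interval (b)
    n_mode -- number of elements in the mode's class
    n_prev -- number of elements in the preceding class
    n_next -- number of elements in the following class
    """
    d_prev = n_mode - n_prev
    d_next = n_mode - n_next
    return lower + d_prev / (d_prev + d_next) * width


# Example 3: the mode's class is (50,60] with 33 elements
mode_example_3 = grouped_mode(lower=50, width=10, n_mode=33, n_prev=23, n_next=12)
print(round(mode_example_3, 2))  # 53.23
```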
![Page 8: Mathematical Statistics Anna Janickastrony.wne.uw.edu.pl/ajanicka/wp-content/uploads/... · Range the most simple measure, does not take into account anything but the extreme values](https://reader033.fdocuments.net/reader033/viewer/2022053015/5f1494f0cf6b65555e04c30a/html5/thumbnails/8.jpg)
Which measure should we choose?
Arithmetic mean: for typical data series
(single maximum, monotonic frequencies)
Mode: for typical data series, grouped data
(the lengths of the mode’s class and
neighboring classes should be equal)
Median: no restrictions. The most robust (to
outlier observations, fluctuations,
etc.)
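The robustness of the median can be illustrated with a small Python sketch. The two data series below are made up for the illustration (they do not come from the lecture); the second differs from the first only by one outlier observation.

```python
from statistics import mean, median

# Hypothetical data: the same series without and with one outlier
clean = [1.0, 2.0, 3.0, 4.0, 5.0]
with_outlier = [1.0, 2.0, 3.0, 4.0, 500.0]

# How far each measure moves when the outlier is introduced
mean_shift = mean(with_outlier) - mean(clean)
median_shift = median(with_outlier) - median(clean)

print("shift of the mean:  ", mean_shift)
print("shift of the median:", median_shift)
```

A single extreme value moves the mean substantially, while the median of this series does not change at all.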
![Page 9: Mathematical Statistics Anna Janickastrony.wne.uw.edu.pl/ajanicka/wp-content/uploads/... · Range the most simple measure, does not take into account anything but the extreme values](https://reader033.fdocuments.net/reader033/viewer/2022053015/5f1494f0cf6b65555e04c30a/html5/thumbnails/9.jpg)
Quantiles, quartiles
p-th quantile (quantile of rank p): a number
such that the fraction of observations less
than or equal to it is at least p, and the fraction
of observations greater than or equal to it is at least 1-p
Q1 : first quartile = quantile of rank ¼
Second quartile = median
= quantile of rank ½
Q3: Third quartile = quantile of rank ¾
![Page 10: Mathematical Statistics Anna Janickastrony.wne.uw.edu.pl/ajanicka/wp-content/uploads/... · Range the most simple measure, does not take into account anything but the extreme values](https://reader033.fdocuments.net/reader033/viewer/2022053015/5f1494f0cf6b65555e04c30a/html5/thumbnails/10.jpg)
Quantiles – cont.
Empirical quantile of rank p:
$$Q_p = \begin{cases} \dfrac{X_{np:n} + X_{np+1:n}}{2}, & np \in \mathbb{Z} \\[2mm] X_{[np]+1:n}, & np \notin \mathbb{Z} \end{cases}$$
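The empirical quantile can be sketched in Python as follows: when np is an integer, the order statistics number np and np+1 are averaged; otherwise the order statistic number [np]+1 is taken. The function name `empirical_quantile` is hypothetical, and passing p as a `Fraction` is a design choice of this sketch (it keeps the test "is np an integer" exact), not something prescribed by the lecture. The data are the Example 1 grades rebuilt from the frequency table.

```python
from fractions import Fraction


def empirical_quantile(data, p):
    """Empirical quantile of rank p (0 < p < 1) based on order statistics."""
    xs = sorted(data)
    n = len(xs)
    np_exact = n * Fraction(p)  # exact arithmetic avoids float round-off in n*p
    k = int(np_exact)           # integer part [np]
    if np_exact.denominator == 1:
        # np is an integer: average the order statistics number np and np+1
        # (1-based), i.e. list indices k-1 and k
        return (xs[k - 1] + xs[k]) / 2
    # np is not an integer: order statistic number [np]+1, i.e. list index k
    return xs[k]


# Example 1 grades rebuilt from the frequency table (248 observations)
grades = [2] * 74 + [3] * 76 + [3.5] * 48 + [4] * 31 + [4.5] * 9 + [5] * 10

q1 = empirical_quantile(grades, Fraction(1, 4))
med = empirical_quantile(grades, Fraction(1, 2))
q3 = empirical_quantile(grades, Fraction(3, 4))
print(q1, med, q3)  # 2.0 3.0 3.5
```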
![Page 11: Mathematical Statistics Anna Janickastrony.wne.uw.edu.pl/ajanicka/wp-content/uploads/... · Range the most simple measure, does not take into account anything but the extreme values](https://reader033.fdocuments.net/reader033/viewer/2022053015/5f1494f0cf6b65555e04c30a/html5/thumbnails/11.jpg)
Quartiles – cont.
Quantiles for p = ¼ and p = ¾.
For grouped class interval data –
analogous to the median
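$$Q_k = c_L + \frac{b}{n_{M_k}} \left( \frac{k \cdot n}{4} - \sum_{i=1}^{M_k - 1} n_i \right)$$
for k=1 or 3
where $M_1$, $M_3$ – number of the quartile’s class
$n_{M_k}$ – number of elements in the quartile’s class
$b$ – length of quartile class interval
$c_L$ – lower end of the quartile class interval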
![Page 12: Mathematical Statistics Anna Janickastrony.wne.uw.edu.pl/ajanicka/wp-content/uploads/... · Range the most simple measure, does not take into account anything but the extreme values](https://reader033.fdocuments.net/reader033/viewer/2022053015/5f1494f0cf6b65555e04c30a/html5/thumbnails/12.jpg)
Quartiles – examples
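Example 1:
$$248 \cdot \tfrac{1}{4} = 62, \qquad 248 \cdot \tfrac{3}{4} = 186$$
so
$$Q_1 = \frac{X_{62:248} + X_{63:248}}{2} = 2, \qquad Q_3 = \frac{X_{186:248} + X_{187:248}}{2} = 3.5$$
Example 3:
$$100 \cdot \tfrac{1}{4} = 25, \qquad 100 \cdot \tfrac{3}{4} = 75, \qquad M_1 = 2, \quad M_3 = 4$$
so
$$Q_1 = 40 + \frac{10}{23}(25 - 11) \approx 46.09, \qquad Q_3 = 60 + \frac{10}{12}(75 - 67) \approx 66.67$$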
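For class interval data, the quartile computation can be sketched in Python as below. The helper `grouped_quartile` is a hypothetical name, and the rule for picking the quartile's class (the first class whose cumulative count reaches k·n/4) is an assumption of this sketch that reproduces the class numbers 2 and 4 used in Example 3.

```python
def grouped_quartile(k, lower_ends, counts, width):
    """Interpolated k-th quartile (k = 1, 2, 3) for equal-width class intervals."""
    n = sum(counts)
    target = k * n / 4
    cumulative = 0  # number of observations in the classes before the current one
    for lower, n_i in zip(lower_ends, counts):
        if cumulative + n_i >= target:
            # quartile's class found: interpolate linearly inside it
            return lower + width / n_i * (target - cumulative)
        cumulative += n_i
    raise ValueError("k must be 1, 2 or 3")


# Example 3: classes (30,40], (40,50], ..., (110,120]
lower_ends = [30, 40, 50, 60, 70, 80, 90, 100, 110]
counts = [11, 23, 33, 12, 6, 8, 3, 2, 2]

q1_grouped = grouped_quartile(1, lower_ends, counts, width=10)
q3_grouped = grouped_quartile(3, lower_ends, counts, width=10)
print(round(q1_grouped, 2), round(q3_grouped, 2))  # 46.09 66.67
```

With k = 2 the same function gives the interpolated median of the grouped data.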
![Page 13: Mathematical Statistics Anna Janickastrony.wne.uw.edu.pl/ajanicka/wp-content/uploads/... · Range the most simple measure, does not take into account anything but the extreme values](https://reader033.fdocuments.net/reader033/viewer/2022053015/5f1494f0cf6b65555e04c30a/html5/thumbnails/13.jpg)
Variability measures
Classical measures
variance, standard deviation
average (absolute) deviation
coefficient of variation
Measures based on order statistics
range
interquartile range
quartile deviation
coefficients of variation (based on order stats)
median absolute deviation
![Page 14: Mathematical Statistics Anna Janickastrony.wne.uw.edu.pl/ajanicka/wp-content/uploads/... · Range the most simple measure, does not take into account anything but the extreme values](https://reader033.fdocuments.net/reader033/viewer/2022053015/5f1494f0cf6b65555e04c30a/html5/thumbnails/14.jpg)
Measures based on order statistics
Range
the simplest measure, does not take into
account anything but the extreme values
$$r = X_{n:n} - X_{1:n}$$
Interquartile Range (midspread, middle fifty)
length of the interval that covers the
middle 50% of observations
more robust than the range
$$IQR = Q_3 - Q_1$$
may be further used to calculate the quartile deviation Q = IQR/2, the
coefficients of variation V_Q = Q/Med or V_Q1Q3 = IQR/(Q3+Q1) (quartile
variation coefficient), and the typical range [Med – Q, Med + Q]
![Page 15: Mathematical Statistics Anna Janickastrony.wne.uw.edu.pl/ajanicka/wp-content/uploads/... · Range the most simple measure, does not take into account anything but the extreme values](https://reader033.fdocuments.net/reader033/viewer/2022053015/5f1494f0cf6b65555e04c30a/html5/thumbnails/15.jpg)
Range, interquartile range – examples
Example 1:
$$r = 5 - 2 = 3, \qquad IQR = 3.5 - 2 = 1.5$$
Example 3:
$$r = 120 - 30 = 90 \quad (\text{in reality } 118.9 - 32.45 = 86.45)$$
$$IQR = 66.67 - 46.09 = 20.58$$
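The measures derived from the interquartile range (quartile deviation, the two coefficients of variation and the typical range) are not worked out in the examples, so here is a small Python sketch for Example 3. It uses the rounded values Q1 = 46.09 and Q3 = 66.67 together with the grouped median 54.85 that appears later in the slides; the function name and the dictionary keys are hypothetical.

```python
def order_based_measures(x_min, x_max, q1, med, q3):
    """Variability measures based on order statistics."""
    iqr = q3 - q1
    q_dev = iqr / 2  # quartile deviation Q
    return {
        "range": x_max - x_min,
        "iqr": iqr,
        "quartile_deviation": q_dev,
        "v_q": q_dev / med,         # V_Q = Q / Med
        "v_q1q3": iqr / (q3 + q1),  # quartile variation coefficient
        "typical_range": (med - q_dev, med + q_dev),
    }


# Example 3, with the class limits 30 and 120 standing in for the extremes
ex3 = order_based_measures(x_min=30, x_max=120, q1=46.09, med=54.85, q3=66.67)
for name, value in ex3.items():
    print(name, value)
```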
![Page 16: Mathematical Statistics Anna Janickastrony.wne.uw.edu.pl/ajanicka/wp-content/uploads/... · Range the most simple measure, does not take into account anything but the extreme values](https://reader033.fdocuments.net/reader033/viewer/2022053015/5f1494f0cf6b65555e04c30a/html5/thumbnails/16.jpg)
Classical measures of dispersion
Variance
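raw data
$$\hat{S}^2 = \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2 = \frac{1}{n}\sum_{i=1}^{n}X_i^2 - (\bar{X})^2$$
grouped data
$$\hat{S}^2 = \frac{1}{n}\sum_{i=1}^{k}n_i(X_i - \bar{X})^2 = \frac{1}{n}\sum_{i=1}^{k}n_i X_i^2 - (\bar{X})^2$$
grouped class interval data ($\bar{c}_i$ – class mark)
$$\hat{S}^2 = \frac{1}{n}\sum_{i=1}^{k}n_i(\bar{c}_i - \bar{X})^2 = \frac{1}{n}\sum_{i=1}^{k}n_i \bar{c}_i^2 - (\bar{X})^2$$
+ Sheppard’s correction
$$\bar{S}^2 = \hat{S}^2 - \frac{c^2}{12}$$
c = length of class interval (for equal intervals)
in general
$$\bar{S}^2 = \hat{S}^2 - \frac{1}{12n}\sum_{i=1}^{k}n_i(c_i - c_{i-1})^2$$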
![Page 17: Mathematical Statistics Anna Janickastrony.wne.uw.edu.pl/ajanicka/wp-content/uploads/... · Range the most simple measure, does not take into account anything but the extreme values](https://reader033.fdocuments.net/reader033/viewer/2022053015/5f1494f0cf6b65555e04c30a/html5/thumbnails/17.jpg)
Variance – examples
Example 1:
$$\hat{S}^2 = \tfrac{1}{248}\left(74(2-3.06)^2 + 76(3-3.06)^2 + 48(3.5-3.06)^2 + 31(4-3.06)^2 + 9(4.5-3.06)^2 + 10(5-3.06)^2\right) \approx 0.71$$
Example 3:
$$\hat{S}^2 = \tfrac{1}{100}\left(11(35-58.7)^2 + 23(45-58.7)^2 + 33(55-58.7)^2 + 12(65-58.7)^2 + 6(75-58.7)^2 + 8(85-58.7)^2 + 3(95-58.7)^2 + 2(105-58.7)^2 + 2(115-58.7)^2\right) = 331.31$$
with Sheppard’s correction:
$$\bar{S}^2 = 331.31 - \frac{10^2}{12} \approx 322.98$$
in reality
$$\hat{S}^2 = 333.85$$
distribution not normal or
sample too small for
Sheppard’s correction –
larger errors from small
sample size than from class
grouping.
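The Example 3 computation can be reproduced with a short Python sketch: the variance from class marks (dividing by n), followed by Sheppard's correction for equal class intervals of length c = 10. The function name `grouped_variance` and the variable names are hypothetical.

```python
def grouped_variance(marks, counts):
    """Mean and variance (divisor n) of class interval data, from class marks."""
    n = sum(counts)
    mean = sum(n_i * c_i for c_i, n_i in zip(marks, counts)) / n
    variance = sum(n_i * (c_i - mean) ** 2 for c_i, n_i in zip(marks, counts)) / n
    return mean, variance


# Example 3: class marks and class counts
marks = [35, 45, 55, 65, 75, 85, 95, 105, 115]
counts = [11, 23, 33, 12, 6, 8, 3, 2, 2]

mean_3, var_3 = grouped_variance(marks, counts)
class_length = 10
var_3_sheppard = var_3 - class_length ** 2 / 12  # Sheppard's correction

print(round(mean_3, 2), round(var_3, 2), round(var_3_sheppard, 2))
# 58.7 331.31 322.98
```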
![Page 18: Mathematical Statistics Anna Janickastrony.wne.uw.edu.pl/ajanicka/wp-content/uploads/... · Range the most simple measure, does not take into account anything but the extreme values](https://reader033.fdocuments.net/reader033/viewer/2022053015/5f1494f0cf6b65555e04c30a/html5/thumbnails/18.jpg)
Standard deviation
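In the same units as the initial variable
$$\hat{S} = \sqrt{\hat{S}^2}, \qquad \bar{S} = \sqrt{\bar{S}^2}$$
Example 1:
$$\hat{S} \approx 0.84 \ [\text{grade}]$$
Example 3:
$$\hat{S} \approx 18.2 \ [\text{m}^2]$$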
![Page 19: Mathematical Statistics Anna Janickastrony.wne.uw.edu.pl/ajanicka/wp-content/uploads/... · Range the most simple measure, does not take into account anything but the extreme values](https://reader033.fdocuments.net/reader033/viewer/2022053015/5f1494f0cf6b65555e04c30a/html5/thumbnails/19.jpg)
Average (absolute) deviation, mean deviation
Nowadays seldom used. Simple
calculations.
for raw data
$$d = \frac{1}{n}\sum_{i=1}^{n}|X_i - \bar{X}|$$
etc.
We have: $d \le \hat{S}$
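As a hedged illustration, the mean deviation for grouped data (the count-weighted analogue of the raw-data formula, which is an assumption of this sketch since the slide only says "etc.") can be computed for the Example 1 grades and compared with the standard deviation computed with divisor n. All names below are hypothetical.

```python
def grouped_mean(values, counts):
    return sum(n_i * x for x, n_i in zip(values, counts)) / sum(counts)


def mean_absolute_deviation(values, counts):
    """Average absolute deviation from the mean, weighted by the counts."""
    m = grouped_mean(values, counts)
    return sum(n_i * abs(x - m) for x, n_i in zip(values, counts)) / sum(counts)


def standard_deviation(values, counts):
    """Standard deviation with divisor n."""
    m = grouped_mean(values, counts)
    var = sum(n_i * (x - m) ** 2 for x, n_i in zip(values, counts)) / sum(counts)
    return var ** 0.5


# Example 1: grades and their counts
grades = [2, 3, 3.5, 4, 4.5, 5]
counts = [74, 76, 48, 31, 9, 10]

d = mean_absolute_deviation(grades, counts)
s = standard_deviation(grades, counts)
print(round(d, 2), round(s, 2))  # 0.67 0.84
```

The mean deviation comes out below the standard deviation, as the inequality on the slide states.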
![Page 20: Mathematical Statistics Anna Janickastrony.wne.uw.edu.pl/ajanicka/wp-content/uploads/... · Range the most simple measure, does not take into account anything but the extreme values](https://reader033.fdocuments.net/reader033/viewer/2022053015/5f1494f0cf6b65555e04c30a/html5/thumbnails/20.jpg)
Coefficient of variation (classical)
For comparisons of the same variable
across populations or different
variables for the same population
$$V_S = \frac{\hat{S}}{\bar{X}} \ (\cdot 100\%) \quad \text{or} \quad V_d = \frac{d}{\bar{X}} \ (\cdot 100\%)$$
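Because the coefficient of variation is unit-free, it allows the grades of Example 1 to be compared with the surface areas of Example 3. The sketch below uses the rounded means and standard deviations quoted in the slides (3.06 and 0.84; 58.7 and 18.2); the function name is hypothetical.

```python
def coefficient_of_variation(spread, mean):
    """Classical coefficient of variation in percent (spread = S or d)."""
    return spread / mean * 100


cv_grades = coefficient_of_variation(spread=0.84, mean=3.06)  # Example 1
cv_areas = coefficient_of_variation(spread=18.2, mean=58.7)   # Example 3

print(f"Example 1: {cv_grades:.1f}%")
print(f"Example 3: {cv_areas:.1f}%")
```

Under these rounded inputs the surface areas show slightly larger relative variability than the grades.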
![Page 21: Mathematical Statistics Anna Janickastrony.wne.uw.edu.pl/ajanicka/wp-content/uploads/... · Range the most simple measure, does not take into account anything but the extreme values](https://reader033.fdocuments.net/reader033/viewer/2022053015/5f1494f0cf6b65555e04c30a/html5/thumbnails/21.jpg)
Skewness (asymmetry)
| left (negative) | symmetry (zero) | right (positive) |
|---|---|---|
| $\bar{X} < Med < Mo$ | $\bar{X} = Med = Mo$ | $\bar{X} > Med > Mo$ |

(typical order)
![Page 22: Mathematical Statistics Anna Janickastrony.wne.uw.edu.pl/ajanicka/wp-content/uploads/... · Range the most simple measure, does not take into account anything but the extreme values](https://reader033.fdocuments.net/reader033/viewer/2022053015/5f1494f0cf6b65555e04c30a/html5/thumbnails/22.jpg)
Measures of asymmetry
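Skewness
$$A = \frac{M_3}{\hat{S}^3}$$
where $M_3$ is the third
central moment
Skewness coefficient
$$A_1 = \frac{\bar{X} - Mo}{\hat{S}} \quad \text{or} \quad A_1 = \frac{\bar{X} - Med}{\hat{S}}$$
Quartile skewness coefficient
$$A_2 = \frac{Q_3 - 2\,Med + Q_1}{Q_3 - Q_1}$$
measures skewness for the
middle 50% of observations only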
![Page 23: Mathematical Statistics Anna Janickastrony.wne.uw.edu.pl/ajanicka/wp-content/uploads/... · Range the most simple measure, does not take into account anything but the extreme values](https://reader033.fdocuments.net/reader033/viewer/2022053015/5f1494f0cf6b65555e04c30a/html5/thumbnails/23.jpg)
Interpretation
positive values = positive asymmetry
(right-skewed distribution)
negative values = negative asymmetry
(left-skewed distribution)
For the skewness coefficient (with the
median) and the quartile skewness
coefficient, the strength of asymmetry
(absolute value):
0 – 0.33: weak
0.34 – 0.66: medium
0.67 – 1: strong
![Page 24: Mathematical Statistics Anna Janickastrony.wne.uw.edu.pl/ajanicka/wp-content/uploads/... · Range the most simple measure, does not take into account anything but the extreme values](https://reader033.fdocuments.net/reader033/viewer/2022053015/5f1494f0cf6b65555e04c30a/html5/thumbnails/24.jpg)
Asymmetry – examples
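Example 1:
$$A = 0.28$$
$$A_1(Mo) = \frac{3.06 - 3}{0.84} \approx 0.07, \qquad A_1(Med) = \frac{3.06 - 3}{0.84} \approx 0.07$$
$$A_2 = \frac{3.5 + 2 - 2 \cdot 3}{3.5 - 2} = -\frac{1}{3}$$
Example 3:
$$A = 1.15$$
$$A_1(Mo) = \frac{58.7 - 53.23}{18.2} \approx 0.30, \qquad A_1(Med) = \frac{58.7 - 54.85}{18.2} \approx 0.21$$
$$A_2 = \frac{66.67 + 46.09 - 2 \cdot 54.85}{66.67 - 46.09} \approx 0.15$$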
[Figure: histogram for Example 3; x-axis: Surface area (25 to 125), y-axis: Frequency]
[Figure: two frequency charts for Example 1; x-axis: Grade (2 to 5), y-axis: Frequency]
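Two of the asymmetry measures can be checked with a short Python sketch: the moment-based skewness (third central moment divided by the cube of the standard deviation, both with divisor n) for the Example 1 grades, and the quartile skewness coefficient for both examples. The function names are hypothetical, and the quartile and median values are the rounded ones quoted in the slides.

```python
def moment_skewness(values, counts):
    """Third central moment divided by the cubed standard deviation (divisor n)."""
    n = sum(counts)
    mean = sum(n_i * x for x, n_i in zip(values, counts)) / n
    m2 = sum(n_i * (x - mean) ** 2 for x, n_i in zip(values, counts)) / n
    m3 = sum(n_i * (x - mean) ** 3 for x, n_i in zip(values, counts)) / n
    return m3 / m2 ** 1.5


def quartile_skewness(q1, med, q3):
    """Skewness of the middle 50% of observations."""
    return (q3 - 2 * med + q1) / (q3 - q1)


# Example 1: grades and their counts
a_ex1 = moment_skewness([2, 3, 3.5, 4, 4.5, 5], [74, 76, 48, 31, 9, 10])
a2_ex1 = quartile_skewness(q1=2, med=3, q3=3.5)

# Example 3: quartiles and median of the grouped data
a2_ex3 = quartile_skewness(q1=46.09, med=54.85, q3=66.67)

print(round(a_ex1, 2), round(a2_ex1, 2), round(a2_ex3, 2))
```

For Example 1 the two measures disagree in sign, which is possible because the quartile coefficient looks only at the middle half of the data.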
![Page 25: Mathematical Statistics Anna Janickastrony.wne.uw.edu.pl/ajanicka/wp-content/uploads/... · Range the most simple measure, does not take into account anything but the extreme values](https://reader033.fdocuments.net/reader033/viewer/2022053015/5f1494f0cf6b65555e04c30a/html5/thumbnails/25.jpg)
Boxplot
(Box and whisker plot)
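Allows comparison of two (or more)
populations
[Figure: box-and-whisker diagram, labeled from top to bottom: xmax, outliers, X^*, Q3, Med, Q1, X_*, outliers, xmin]
$$X^* = \max\{X_i : X_i \in [Q_3,\ Q_3 + \tfrac{3}{2} IQR]\}$$
$$X_* = \min\{X_i : X_i \in [Q_1 - \tfrac{3}{2} IQR,\ Q_1]\}$$
outliers:
$$x < X_* \quad \text{or} \quad x > X^*$$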
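The whisker ends and outliers of a boxplot can be sketched in Python as follows. The sample is made up for the illustration, the function names are hypothetical, and falling back to the quartile itself when no observation lies in a whisker interval is an assumption of this sketch, not something stated in the lecture.

```python
from fractions import Fraction


def quantile(xs, p):
    """Empirical quantile of an already sorted list (average at integer n*p)."""
    k_exact = len(xs) * Fraction(p)
    k = int(k_exact)
    if k_exact.denominator == 1:
        return (xs[k - 1] + xs[k]) / 2
    return xs[k]


def boxplot_summary(data):
    xs = sorted(data)
    q1 = quantile(xs, Fraction(1, 4))
    q3 = quantile(xs, Fraction(3, 4))
    iqr = q3 - q1
    # whisker ends: extreme observations within 1.5 IQR of the box
    lower = min((x for x in xs if q1 - 1.5 * iqr <= x <= q1), default=q1)
    upper = max((x for x in xs if q3 <= x <= q3 + 1.5 * iqr), default=q3)
    outliers = [x for x in xs if x < lower or x > upper]
    return {"q1": q1, "q3": q3, "lower_whisker": lower,
            "upper_whisker": upper, "outliers": outliers}


# Hypothetical sample with one clearly extreme observation
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 30]
summary = boxplot_summary(sample)
print(summary)
```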
![Page 26: Mathematical Statistics Anna Janickastrony.wne.uw.edu.pl/ajanicka/wp-content/uploads/... · Range the most simple measure, does not take into account anything but the extreme values](https://reader033.fdocuments.net/reader033/viewer/2022053015/5f1494f0cf6b65555e04c30a/html5/thumbnails/26.jpg)
Boxplot – example of comparison
[Figure: boxplots of two groups (labeled 1 and 2); vertical axis from 0 to 15]
![Page 27: Mathematical Statistics Anna Janickastrony.wne.uw.edu.pl/ajanicka/wp-content/uploads/... · Range the most simple measure, does not take into account anything but the extreme values](https://reader033.fdocuments.net/reader033/viewer/2022053015/5f1494f0cf6b65555e04c30a/html5/thumbnails/27.jpg)
Examples (1)
Source: CSO Poland, 2009
![Page 28: Mathematical Statistics Anna Janickastrony.wne.uw.edu.pl/ajanicka/wp-content/uploads/... · Range the most simple measure, does not take into account anything but the extreme values](https://reader033.fdocuments.net/reader033/viewer/2022053015/5f1494f0cf6b65555e04c30a/html5/thumbnails/28.jpg)
Examples (2)
Dispersion (coefficient of variation) of unemployment
rates
Source: European Commission
![Page 29: Mathematical Statistics Anna Janickastrony.wne.uw.edu.pl/ajanicka/wp-content/uploads/... · Range the most simple measure, does not take into account anything but the extreme values](https://reader033.fdocuments.net/reader033/viewer/2022053015/5f1494f0cf6b65555e04c30a/html5/thumbnails/29.jpg)
Examples (3)
Growth charts
Source: WHO
![Page 30: Mathematical Statistics Anna Janickastrony.wne.uw.edu.pl/ajanicka/wp-content/uploads/... · Range the most simple measure, does not take into account anything but the extreme values](https://reader033.fdocuments.net/reader033/viewer/2022053015/5f1494f0cf6b65555e04c30a/html5/thumbnails/30.jpg)
Examples (4)
Gross hourly earnings
Source: European Commission
![Page 31: Mathematical Statistics Anna Janickastrony.wne.uw.edu.pl/ajanicka/wp-content/uploads/... · Range the most simple measure, does not take into account anything but the extreme values](https://reader033.fdocuments.net/reader033/viewer/2022053015/5f1494f0cf6b65555e04c30a/html5/thumbnails/31.jpg)
Examples (5)
Salary by occupational group and gender
Source: New Zealand State Services
![Page 32: Mathematical Statistics Anna Janickastrony.wne.uw.edu.pl/ajanicka/wp-content/uploads/... · Range the most simple measure, does not take into account anything but the extreme values](https://reader033.fdocuments.net/reader033/viewer/2022053015/5f1494f0cf6b65555e04c30a/html5/thumbnails/32.jpg)