![Page 1: Chapter 4. Elements of Statistics # brief introduction to some concepts of statistics # descriptive statistics inductive statistics(statistical inference)](https://reader033.fdocuments.net/reader033/viewer/2022061305/551442925503462d4e8b49f1/html5/thumbnails/1.jpg)
Chapter 4. Elements of Statistics
# brief introduction to some concepts of statistics
# descriptive statistics and inductive statistics (statistical inference)
# Classification of the field of statistics
i) Sampling theory
ii) Estimation theory
iii) Hypothesis testing
iv) Curve fitting or regression
v) Analysis of variance
4.2 Sampling Theory – The Sample Mean
How many samples are required for a given degree of confidence in the result?
# Terminology
- population: size N, very large or ∞
- (random) sample: size n
# one of the most important quantities is the sample mean
How close is the sample mean likely to be to the average value of the population?
Let the sample have the numerical values $x_1, x_2, \ldots, x_n$.
Then the sample mean is given by
$$\hat{\bar{x}} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
Note that we are interested in the statistical properties of arbitrary random samples rather than any particular sample.
That is, the sample mean becomes a random variable.
Therefore, it is appropriate to denote the sample mean as
$$\hat{\bar{X}} = \frac{1}{n}\sum_{i=1}^{n} X_i$$
We want the mean value of the sample mean to be close to the true mean value of the population:
$$E[\hat{\bar{X}}] = E\left[\frac{1}{n}\sum_{i=1}^{n} X_i\right] = \frac{1}{n}\sum_{i=1}^{n} E[X_i] = \frac{1}{n}\,n\bar{X} = \bar{X}$$
the mean value of the sample mean = the true mean value of the population
The sample mean is an unbiased estimate of the true mean.
But this is not sufficient to indicate whether the sample mean is a good estimator of the true population mean.
What is the variance of the sample mean?
Assume N ≫ n (the characteristics of the population do not change during sampling).
Var = (mean square of $\hat{\bar{X}}$) - (square of the mean)
$$\mathrm{Var}(\hat{\bar{X}}) = E\left[\frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n} X_i X_j\right] - (\bar{X})^2$$
Assumption: $X_i$ and $X_j$ are statistically independent for $i \ne j$.
$$\mathrm{Var}(\hat{\bar{X}}) = \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n} E[X_i X_j] - (\bar{X})^2$$
$$E[X_i X_j] = \begin{cases} \overline{X^2}, & i = j \\ (\bar{X})^2, & i \ne j \end{cases}$$
Therefore
$$\mathrm{Var}(\hat{\bar{X}}) = \frac{1}{n^2}\left[n\,\overline{X^2} + (n^2 - n)(\bar{X})^2\right] - (\bar{X})^2 = \frac{\overline{X^2} - (\bar{X})^2}{n} = \frac{\sigma^2}{n} \quad (!)$$
where $\sigma^2$ is the true variance of the population. As n → ∞, the variance → 0,
which means that large sample sizes lead to a better estimate.
* Note: 1) When N is not large, "sampling with replacement" gives the same effect as a large N.
2) When N is small and the samples cannot be replaced,
$$\mathrm{Var}(\hat{\bar{X}}) = \frac{\sigma^2}{n}\left(\frac{N-n}{N-1}\right)$$
As N → ∞ this converges to the previous expression; when N = n it is 0 (as expected!).
Two examples: see textbook pp. 163–165.
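The two variance results for the sample mean can be checked exactly on a tiny population by enumerating every possible sample. The sketch below is only an illustration: the population `[1, ..., 6]` and the sample size `n = 2` are hypothetical choices, and the helper name is made up for this example.

```python
from fractions import Fraction
from itertools import permutations, product

population = [1, 2, 3, 4, 5, 6]   # hypothetical small population (N = 6)
N, n = len(population), 2         # n is a hypothetical sample size

# True mean and true variance (divisor N) of the population.
mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N

def mean_and_var_of_sample_mean(samples):
    """Exact mean and variance of the sample mean over equally likely samples."""
    means = [Fraction(sum(s), len(s)) for s in samples]
    m = sum(means) / len(means)
    v = sum((xb - m) ** 2 for xb in means) / len(means)
    return m, v

# With replacement: behaves like sampling from an infinite population.
m_with, v_with = mean_and_var_of_sample_mean(list(product(population, repeat=n)))
# Without replacement: the finite-population factor (N - n)/(N - 1) appears.
m_wo, v_wo = mean_and_var_of_sample_mean(list(permutations(population, n)))

print(m_with == mu, v_with == sigma2 / n)
print(m_wo == mu, v_wo == sigma2 / n * Fraction(N - n, N - 1))
```

Exact fractions are used so that the comparisons are equalities rather than approximate floating-point checks.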
4.3 Sampling Theory – The Sample Variance
The population variance is needed for determining the sample size required to achieve a desired variance of the sample mean (see eq. 4-4).
Definition (Sample Variance):
$$S^2 = \frac{1}{n}\sum_{i=1}^{n}\left(X_i - \hat{\bar{X}}\right)^2$$
The expected value of the sample variance can be derived easily using
$$\hat{\bar{X}} = \frac{1}{n}\sum_{j=1}^{n} X_j$$
$$E[S^2] = \frac{n-1}{n}\,\sigma^2$$
This is not the true variance $\sigma^2$; that is, $S^2$ is a biased estimate rather than an unbiased one.
Now we redefine the sample variance to obtain an unbiased estimate of the population variance:
$$\tilde{S}^2 = \frac{n}{n-1}\,S^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \hat{\bar{X}}\right)^2$$
Note that these hold for very large N, that is, N = ∞.
What about when the population size is not large?
# When N is not large, the expected value of $S^2$ is given by
$$E[S^2] = \frac{N}{N-1}\,\frac{n-1}{n}\,\sigma^2$$
To obtain an unbiased estimate, we redefine
$$\tilde{S}^2 = \frac{N-1}{N}\,\frac{n}{n-1}\,S^2$$
# The variance of the estimates of the variance (for large n):
the variance of $S^2$:
$$\mathrm{Var}(S^2) \approx \frac{\mu_4 - \sigma^4}{n}$$
the variance of $\tilde{S}^2$:
$$\mathrm{Var}(\tilde{S}^2) \approx \frac{n(\mu_4 - \sigma^4)}{(n-1)^2}$$
where $\mu_4 = E[(X - \bar{X})^4]$ is the 4th central moment of the population
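The bias of $S^2$ and the two correction factors can be verified by exact enumeration. This is a minimal sketch under assumed values: the population `[1, ..., 6]`, the sample size `n = 3`, and the function name `expected_S2` are illustrative only.

```python
from fractions import Fraction
from itertools import permutations, product

population = [1, 2, 3, 4, 5, 6]   # hypothetical small population (N = 6)
N, n = len(population), 3         # n is a hypothetical sample size

mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N   # true variance

def expected_S2(samples):
    """Exact expected value of S^2 = (1/n) * sum (x_i - sample mean)^2."""
    total = Fraction(0)
    for s in samples:
        xbar = Fraction(sum(s), len(s))
        total += sum((x - xbar) ** 2 for x in s) / len(s)
    return total / len(samples)

e_with = expected_S2(list(product(population, repeat=n)))     # N effectively infinite
e_without = expected_S2(list(permutations(population, n)))    # finite N, no replacement

# Biased as defined; unbiased after rescaling.
print(e_with == Fraction(n - 1, n) * sigma2)
print(e_without == Fraction(N, N - 1) * Fraction(n - 1, n) * sigma2)
print(e_with * Fraction(n, n - 1) == sigma2)
print(e_without * Fraction(N - 1, N) * Fraction(n, n - 1) == sigma2)
```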
4.4 Sampling Distributions & Confidence Intervals
What is the probability that the estimates are within specified bounds?
The pdf must be known. Two cases are considered, and only for the sample mean!
Normalized sample mean, when the $X_i$ are Gaussian and independent:
$$Z = \frac{\hat{\bar{X}} - \bar{X}}{\sigma/\sqrt{n}}$$
=> Gaussian (0,1)
Even if the $X_i$ are not Gaussian, Z is asymptotically Gaussian as n → ∞ by the central limit theorem
(as a rule of thumb, n should usually be at least 30).
H.W) Solve the problems in chap. 4: 4-2.1, 4-2.5, 4-3.1, 4-4.1, 4-5.1, 4-6.1
When $\sigma$ is unknown, $\tilde{S}$ is used in place of $\sigma$. However,
$$T = \frac{\hat{\bar{X}} - \bar{X}}{\tilde{S}/\sqrt{n}} = \frac{\hat{\bar{X}} - \bar{X}}{S/\sqrt{n-1}}$$
is no longer Gaussian => "Student's t distribution" with n-1 degrees of freedom.
See Figure 4-2 on p. 170.
pdf of Student's t distribution:
$$f_T(t) = \frac{\Gamma\!\left(\frac{\nu+1}{2}\right)}{\sqrt{\pi\nu}\;\Gamma\!\left(\frac{\nu}{2}\right)}\left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu+1}{2}}, \qquad \nu = n - 1$$
where $\Gamma(\cdot)$ is the gamma function:
$$\Gamma(k+1) = k\,\Gamma(k) \ \text{for any } k, \qquad \Gamma(k+1) = k! \ \text{for integer } k$$
$$\Gamma(1) = \Gamma(2) = 1, \qquad \Gamma\!\left(\tfrac{1}{2}\right) = \sqrt{\pi}$$
The t distribution has heavier tails than the Gaussian and becomes similar to it for n ≥ 30.
What is a confidence interval?
An interval estimate: it asks with what probability the quantity lies within the interval.
q-percent confidence interval: the interval holds with probability q/100 (the confidence level).
$$\bar{X} - \frac{k\sigma}{\sqrt{n}} \le \hat{\bar{X}} \le \bar{X} + \frac{k\sigma}{\sqrt{n}}$$
• Here k is a constant that depends on q and on the pdf of $\hat{\bar{X}}$:
$$q = 100\int_{\bar{X} - k\sigma/\sqrt{n}}^{\bar{X} + k\sigma/\sqrt{n}} f_{\hat{\bar{X}}}(x)\,dx$$
• For specific values of k, see Table 4-1 on p. 172.
• (The larger q is, the larger k becomes.)
• Example) q = 95% -> $9.804 \le \hat{\bar{x}} \le 10.196$
• The probability that $\hat{\bar{x}}$ lies in this interval is 0.95.
• The smaller the interval, the smaller the probability.
• (For q = 99% the interval for the same $\hat{\bar{x}}$ becomes wider, but the estimate is less informative!)
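As a rough illustration of how k and the interval follow from q in the Gaussian case, the sketch below uses Python's standard-library `statistics.NormalDist`. The values mean = 10, σ = 1 and n = 100 are assumptions chosen only so that σ/√n = 0.1 reproduces the 95% interval quoted above; the slides do not state them.

```python
from math import sqrt
from statistics import NormalDist

def confidence_interval(true_mean, sigma, n, q):
    """q-percent interval for the sample mean of n independent Gaussian samples."""
    # Two-sided interval: q/100 of the area lies between -k and +k.
    k = NormalDist().inv_cdf(0.5 + q / 200)
    half_width = k * sigma / sqrt(n)
    return true_mean - half_width, true_mean + half_width

# Hypothetical values: mean 10, sigma 1, n 100 (so sigma/sqrt(n) = 0.1).
lo95, hi95 = confidence_interval(10.0, 1.0, 100, 95)
lo99, hi99 = confidence_interval(10.0, 1.0, 100, 99)

print(round(lo95, 3), round(hi95, 3))   # 9.804 10.196
print(hi99 - lo99 > hi95 - lo95)        # True: a larger q gives a wider interval
```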
• Note: q from the probability distribution function (PDF)
$$q = 100\left[F_{\hat{\bar{X}}}\!\left(\bar{X} + \frac{k\sigma}{\sqrt{n}}\right) - F_{\hat{\bar{X}}}\!\left(\bar{X} - \frac{k\sigma}{\sqrt{n}}\right)\right]$$
• Here F is the probability distribution function of Student's t.
• (See Appendix F or Table 4-2, page 172, for v = 8)
4.5 Hypothesis Testing
• The question arises: how does one decide to accept or reject a given hypothesis when the sample size and the confidence level are specified?
• Two steps:
i) make some hypothesis about the population
ii) determine whether the observed sample confirms or rejects this hypothesis
• Two kinds of test: one-sided or two-sided.
– One-sided example: the average lifetime of a light bulb ≥ 1000 hours
– Two-sided example: 100 Ω resistors, either too high or too low
One-sided test
Example) A capacitor manufacturer claims that the mean value of the breakdown voltage is ≥ 300 V.
• a sample of 100 capacitors -> $(\hat{\bar{x}}, \tilde{s}^2) = (290\ \mathrm{V}, 40^2\ \mathrm{V}^2)$
• a 99% confidence level is used
• Q) Is the manufacturer's claim valid?
• A) We would reject the hypothesis!
Normalized r.v. Z, with $\bar{X} = 300$ V and $\sigma \approx \tilde{s} = 40$ V:
$$z = \frac{\hat{\bar{x}} - \bar{X}}{\sigma/\sqrt{n}} = \frac{290 - 300}{40/\sqrt{100}} = -2.5$$
The 99% confidence level, however, requires
$$\int_{z_c}^{\infty} f_Z(z)\,dz = 1 - F_Z(z_c) = 0.99 \;\Rightarrow\; z_c = -2.33 > -2.5$$
so z = -2.5 falls below the critical value $z_c$ = -2.33.
• If the confidence level were 99.5%: $z_c = -2.575 < -2.5$ – accept the hypothesis
• The lower the confidence level, the narrower the interval and the less likely the hypothesis is to be accepted
• That is, a lower confidence level imposes a more severe requirement
• This seems contradictory in meaning
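The capacitor example can be replayed numerically. The following sketch assumes the Gaussian approximation is adequate for n = 100 and uses the standard-library `statistics.NormalDist`; the function name and its `significance` parameter (1 minus the confidence level) are illustrative, not taken from the textbook.

```python
from math import sqrt
from statistics import NormalDist

def one_sided_z_test(sample_mean, claimed_mean, sigma, n, significance):
    """Test the claim 'true mean >= claimed_mean' with a lower-tail z test."""
    z = (sample_mean - claimed_mean) / (sigma / sqrt(n))
    z_c = NormalDist().inv_cdf(significance)   # lower-tail critical value
    return z, z_c, z >= z_c                    # accept if z is not below z_c

# 99% confidence level (1% significance): reject.
z, zc_99, accept_99 = one_sided_z_test(290, 300, 40, 100, 0.01)
# 99.5% confidence level (0.5% significance): accept.
_, zc_995, accept_995 = one_sided_z_test(290, 300, 40, 100, 0.005)

print(z, round(zc_99, 2), accept_99)
print(z, round(zc_995, 3), accept_995)
```

Passing the significance level directly avoids the apparent paradox noted above: a larger significance level moves the critical value toward zero and makes the test more severe.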
• So let us restate the test in terms of the level of significance
• That is, (100% - confidence level)
• The larger the level of significance, the more severe the test!
• Example, continued) sample size = 9, $(\hat{\bar{x}}, \tilde{s}^2) = (290, 40^2)$
• no longer Gaussian -> Student's t distribution
• v = n-1 = 8 dof
• 99% confidence level:
$$t = \frac{\hat{\bar{x}} - \bar{X}}{\tilde{s}/\sqrt{n}} = -0.75, \qquad t_c = -2.896 < -0.75$$
– accept the hypothesis
• A small sample size increases t, and
• the heavier tails of the t distribution decrease $t_c$, so
t is more likely to exceed the critical value:
small-sample tests are less reliable (less severe) than large-sample tests
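A small sketch of the same comparison for the small-sample case. The critical value 2.896 (8 degrees of freedom, 99% one-sided) is taken from the slide rather than computed, since the Python standard library has no Student's t distribution; the function name is illustrative.

```python
from math import sqrt

def t_statistic(sample_mean, claimed_mean, s_tilde, n):
    """Normalized sample mean using the unbiased sample standard deviation."""
    return (sample_mean - claimed_mean) / (s_tilde / sqrt(n))

T_CRIT_99_DOF8 = 2.896   # tabulated value quoted on the slide (8 dof, 99% one-sided)
Z_CRIT_99 = 2.33         # Gaussian value quoted on the slide for n = 100

t_small = t_statistic(290, 300, 40, 9)     # about -0.75
z_large = t_statistic(290, 300, 40, 100)   # -2.5

accept_small = t_small >= -T_CRIT_99_DOF8  # True: 9 samples cannot reject the claim
accept_large = z_large >= -Z_CRIT_99       # False: 100 samples reject it
print(round(t_small, 2), accept_small, z_large, accept_large)
```

The same sample mean and standard deviation lead to opposite decisions, which is the sense in which the small-sample test is less severe.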
Two-sided test
• Example) A manufacturer of Zener diodes claims that the true mean breakdown voltage = 10 V
• Q) Should the hypothesis about the true mean be accepted or rejected?
• 100 samples -> $(\hat{\bar{x}}, \tilde{s}^2) = (10.3\ \mathrm{V}, 1.2^2\ \mathrm{V}^2)$
• 95% confidence level
• A) Rejected!
$$z = \frac{\hat{\bar{x}} - \bar{X}}{\sigma/\sqrt{n}} = \frac{10.3 - 10}{1.2/\sqrt{100}} = 2.5$$
• z is outside the interval $-1.96 \le z \le 1.96$
• Q) Continued: 9 samples, $(\hat{\bar{x}}, \tilde{s}^2) = (10.3\ \mathrm{V}, 1.2^2\ \mathrm{V}^2)$
$$t = \frac{\hat{\bar{x}} - \bar{X}}{\tilde{s}/\sqrt{n}} = \frac{10.3 - 10}{1.2/\sqrt{9}} = 0.75$$
t is inside the interval $-2.306 \le t \le 2.306$
• accepted!
– Less severe than a large-sample test
[Figure: Student's t pdf with 95% of the area between ±t_c and 2.5% in each tail; t_c = 2.306]
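The Zener diode example can be sketched the same way for the two-sided case. The Gaussian critical value is computed with the standard-library `statistics.NormalDist`; the t critical value 2.306 (8 degrees of freedom, 95% two-sided) is simply the tabulated number quoted above. The function name is an assumption made for this illustration.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_test(sample_mean, claimed_mean, s, n, critical):
    """Accept the claim 'true mean == claimed_mean' if |statistic| <= critical."""
    stat = (sample_mean - claimed_mean) / (s / sqrt(n))
    return stat, abs(stat) <= critical

z_c = NormalDist().inv_cdf(0.975)   # 95% two-sided: 2.5% in each tail, about 1.96
T_CRIT_95_DOF8 = 2.306              # tabulated value quoted on the slide

z, accept_large = two_sided_test(10.3, 10.0, 1.2, 100, z_c)            # n = 100
t, accept_small = two_sided_test(10.3, 10.0, 1.2, 9, T_CRIT_95_DOF8)   # n = 9

print(round(z, 2), accept_large)   # 2.5 False
print(round(t, 2), accept_small)   # 0.75 True
```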
4.6 Curve Fitting and Linear Regression
• Regression analysis statistically finds the functional relationship between variables (independent and dependent) from the data. That is, it examines how x and y are related by finding a suitable regression equation.
• Usually a first-order (linear) or second-order equation
• In contrast, the correlation analysis of the next section examines how x and y are related by computing a correlation coefficient.
• Terminology
– Scatter diagram: a plot of the data
- n samples: $x_1, x_2, \ldots, x_n$ and $y_1, y_2, \ldots, y_n$
- Curve fitting: finding a mathematical relationship
- Regression curve (equation): the resulting curve
- What is the "best" fit? The best fit in a least-squares sense
– Let $\varepsilon_i$ be the errors between the regression curve and the scatter diagram:
$$\varepsilon_1^2 + \varepsilon_2^2 + \cdots + \varepsilon_n^2$$
– The problem is to determine the unknown coefficients that minimize this sum.
– First choose the type of equation to be fitted to the data, e.g. $y = a + bx + cx^2$; keeping the number of unknown coefficients much smaller than n gives a smoothing effect.
• Linear regression: $y = a + bx$
• Which a and b minimize
$$J = \sum_{i=1}^{n}\left[y_i - (a + b x_i)\right]^2 \;?$$
• Solution)
$$\frac{\partial J}{\partial a} = 0 \;\Rightarrow\; \sum_{i=1}^{n} y_i = a\,n + b\sum_{i=1}^{n} x_i$$
$$\frac{\partial J}{\partial b} = 0 \;\Rightarrow\; \sum_{i=1}^{n} x_i y_i = a\sum_{i=1}^{n} x_i + b\sum_{i=1}^{n} x_i^2$$
• Solving the simultaneous equations gives
$$b = \frac{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}$$
$$a = \frac{\sum_{i=1}^{n} y_i - b\sum_{i=1}^{n} x_i}{n}$$
MATLAB built-in function: p = polyfit(x, y, n)
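For readers without MATLAB, the closed-form least-squares solution for the line y = a + bx can be sketched in a few lines of plain Python. The data points below are hypothetical, chosen to lie roughly on y = 1 + 2x, and `fit_line` is an illustrative name.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x from the two normal equations."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)   # slope
    a = (sy - b * sx) / n                           # intercept
    return a, b

# Hypothetical scatter data, roughly y = 1 + 2x with small errors.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 7.1, 8.8]

a, b = fit_line(xs, ys)
J = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))   # minimized squared error
print(round(a, 3), round(b, 3), round(J, 4))
```

The denominator is zero when all x values are identical, so a production version would need to guard against that case.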
• A second-order regression (textbook p. 180, Table 4-3, Figure 4-6)
$$v_B = c_2 T^2 + c_1 T + c_0, \qquad |c_2| = 0.0334,\ |c_1| = 0.6540,\ |c_0| = 426.0500$$
(for the signs of the coefficients, see the textbook)
4.7 Correlation between Two Sets of Data
• Are two data sets correlated or not?
$$x_1, x_2, \ldots, x_n, \qquad \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
$$y_1, y_2, \ldots, y_n, \qquad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$$
• Linear correlation coefficient, "Pearson's r":
$$r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2\,\sum_{i=1}^{n}(y_i - \bar{y})^2}}, \qquad |r| \le 1$$
r is also random; for large n (n > 500) it is approximately Gaussian.
Usage: useful in determining the sources of errors
Example) a point-to-point digital communication link
The quality of this link is judged by its BER (bit error rate); the BER may fluctuate randomly due to wind.
Q) Is wind the error source?
A correlation test between 20 wind-speed measurements and the resulting BER gives r = 0.891, which is large enough, so yes!
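Pearson's r is straightforward to compute directly from its definition. The sketch below uses small hypothetical data sets (not the wind-speed and BER measurements, which are not given here), and `pearson_r` is an illustrative name.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Linear correlation coefficient of two equal-length data sets."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data sets for illustration only.
xs = [1, 2, 3, 4, 5]
r_perfect = pearson_r(xs, [2, 4, 6, 8, 10])   # exactly linear, increasing
r_partial = pearson_r(xs, [2, 1, 4, 3, 5])    # noisy but increasing
r_negative = pearson_r(xs, [5, 4, 3, 2, 1])   # exactly linear, decreasing

print(round(r_perfect, 3), round(r_partial, 3), round(r_negative, 3))   # 1.0 0.8 -1.0
```

The function divides by zero if either data set is constant, in which case r is undefined.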