Chapter 9hosting03.snu.ac.kr/~hokim/int/2014/chap9.pdf · 2014-05-14 · Chapter 9...

Chapter 9

Multiple Regression and Correlation

2014/5/15

9.1 Introduction

• One dependent variable $Y$ and $k$ independent variables $x_1, \dots, x_k$

• Dependent variable (response variable): $Y$

• Independent variables (explanatory variables, predictor variables): $x_1, \dots, x_k$

9.2 Multiple Regression Model

• Model

$$y_j = \beta_0 + \beta_1 x_{1j} + \beta_2 x_{2j} + \cdots + \beta_k x_{kj} + e_j, \qquad j = 1, \dots, n$$

$$e_j \overset{iid}{\sim} N(0, \sigma^2)$$

(iid: independently and identically distributed)

• Interpreting the coefficients

e.g. 2 independent variables

$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + e$$

($Y$: length of hospital stay, $x_1$: number of previous admissions, $x_2$: age)

• $\beta_0 = E(Y \mid x_1 = x_2 = 0)$: the expected value of $Y$ when $x_1$ and $x_2$ are both 0. Centering is needed for this to be meaningful.

• $\beta_1$: the increment of $E(Y)$ corresponding to a unit increase of $x_1$ when $x_2$ is held fixed:

$$E[Y \mid x_1 = a+1, x_2 = b] - E[Y \mid x_1 = a, x_2 = b] = \{\beta_0 + \beta_1 (a+1) + \beta_2 b\} - \{\beta_0 + \beta_1 a + \beta_2 b\} = \beta_1$$

• Equivalently, $\beta_1$ is the effect of $x_1$ on $Y$ after adjusting for (controlling) the effect of $x_2$.

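9.3 Obtaining the Multiple Regression Equation (estimating the regression coefficients)

• Estimate $\beta_0, \beta_1, \beta_2$ by the values that minimize $L$:

$$L = \sum_j e_j^2 = \sum_j (y_j - \beta_0 - \beta_1 x_{1j} - \beta_2 x_{2j})^2$$

$$\frac{dL}{d\beta_0} = \frac{dL}{d\beta_1} = \frac{dL}{d\beta_2} = 0$$

• Normal equations

$$n b_0 + b_1 \sum x_{1j} + b_2 \sum x_{2j} = \sum y_j$$

$$b_0 \sum x_{1j} + b_1 \sum x_{1j}^2 + b_2 \sum x_{1j} x_{2j} = \sum x_{1j} y_j$$

$$b_0 \sum x_{2j} + b_1 \sum x_{1j} x_{2j} + b_2 \sum x_{2j}^2 = \sum x_{2j} y_j$$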
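The three normal equations for k = 2 can be set up and solved directly. The following is a minimal sketch in Python (standard library only); the data are hypothetical, generated exactly from y = 1 + 2 x1 + 3 x2 so that the solution is known in advance, and the helper names are illustrative.

```python
# Minimal sketch: build and solve the normal equations for k = 2.
# The data are hypothetical and generated exactly from y = 1 + 2*x1 + 3*x2.

def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [[float(v) for v in row] + [float(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def normal_equations(x1, x2, y):
    """Return the 3x3 coefficient matrix and right-hand side of the normal equations."""
    n = len(y)

    def sp(u, v):
        # Sum of products, e.g. sum of x1j * x2j.
        return sum(p * q for p, q in zip(u, v))

    lhs = [
        [n,       sum(x1),    sum(x2)],
        [sum(x1), sp(x1, x1), sp(x1, x2)],
        [sum(x2), sp(x1, x2), sp(x2, x2)],
    ]
    rhs = [sum(y), sp(x1, y), sp(x2, y)]
    return lhs, rhs

x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
y = [1 + 2 * a + 3 * b for a, b in zip(x1, x2)]

lhs, rhs = normal_equations(x1, x2, y)
b0, b1, b2 = solve_linear_system(lhs, rhs)
print(b0, b1, b2)
```

Because the hypothetical y values lie exactly on a plane, the solution recovers the generating coefficients up to floating-point error; with real data the same equations give the least squares estimates.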

Homework

• Exercise 9.3.4

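9.4 Evaluating the Multiple Regression Equation

• Coefficient of multiple determination

Total sum of squares = explained sum of squares + unexplained sum of squares

$$SST = SSR + SSE$$

$$R^2_{y.12\ldots k} = \frac{\sum (\hat{y}_j - \bar{y})^2}{\sum (y_j - \bar{y})^2} = \frac{SSR}{SST}$$

• ANOVA table

Source        SS     d.f.      MS                   V.R.
Regression    SSR    k         MSR = SSR/k          MSR/MSE
Residual      SSE    n-k-1     MSE = SSE/(n-k-1)
Total         SST    n-1

$H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0$ vs. $H_1$: not $H_0$

If V.R. $> F(k, n-k-1, \alpha)$ then reject $H_0$.

• Distribution of each estimated coefficient: $b_i \sim N(\beta_i, c_{ii}\sigma^2)$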

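Matrix form

$$\underset{n \times 1}{y} = \underset{n \times (k+1)}{X}\ \underset{(k+1) \times 1}{\beta} + \underset{n \times 1}{e}$$

$$\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} 1 & x_{11} & x_{21} & \cdots & x_{k1} \\ 1 & x_{12} & x_{22} & \cdots & x_{k2} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{1n} & x_{2n} & \cdots & x_{kn} \end{pmatrix} \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{pmatrix} + \begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{pmatrix}$$

$$L = (y - X\beta)'(y - X\beta) = y'y - 2\beta'X'y + \beta'X'X\beta$$

$$\frac{\partial L}{\partial \beta} = -2X'y + 2X'X\beta = 0$$

$$\hat{\beta} = (X'X)^{-1}X'y$$

Least squares estimator (LSE): $\hat{\beta} = (X'X)^{-1}X'Y$, where

$$X'X = \begin{pmatrix} n & \sum x_{1j} & \cdots & \sum x_{kj} \\ \sum x_{1j} & \sum x_{1j}^2 & \cdots & \sum x_{1j}x_{kj} \\ \vdots & \vdots & & \vdots \\ \sum x_{kj} & \sum x_{kj}x_{1j} & \cdots & \sum x_{kj}^2 \end{pmatrix}, \qquad X'Y = \begin{pmatrix} \sum y_j \\ \sum x_{1j}y_j \\ \vdots \\ \sum x_{kj}y_j \end{pmatrix}$$

$$Var(\hat{\beta}) = \sigma^2 (X'X)^{-1}$$

When $k = 2$:

$$Var(\hat{\beta}) = \begin{pmatrix} var(b_0) & cov(b_0, b_1) & cov(b_0, b_2) \\ cov(b_0, b_1) & var(b_1) & cov(b_1, b_2) \\ cov(b_0, b_2) & cov(b_1, b_2) & var(b_2) \end{pmatrix} = \sigma^2 \begin{pmatrix} n & \sum x_{1j} & \sum x_{2j} \\ \sum x_{1j} & \sum x_{1j}^2 & \sum x_{1j}x_{2j} \\ \sum x_{2j} & \sum x_{1j}x_{2j} & \sum x_{2j}^2 \end{pmatrix}^{-1} = \sigma^2 \begin{pmatrix} C_{00} & C_{01} & C_{02} \\ C_{01} & C_{11} & C_{12} \\ C_{02} & C_{12} & C_{22} \end{pmatrix}$$

so that $var(b_i) = \sigma^2 C_{ii}$, with $C = (X'X)^{-1}$.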

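Testing

Hypothesis: $H_0: \beta_i = 0$

Test statistic:

$$t = \frac{b_i}{s_{b_i}}, \qquad s_{b_i} = s_{y.12\ldots k}\sqrt{c_{ii}} \quad \text{(standard error of } b_i\text{)}$$

If $|t| > t(n-k-1, \alpha/2)$ then reject $H_0$.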
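The matrix formulas and the t test can be checked numerically. The sketch below (Python, standard library only) fits serum cholesterol on weight and SBP using the eleven observations listed for Table 9.7.1 in mcorr.sas later in this chapter; the function names `invert` and `fit` are illustrative choices, not part of the original material.

```python
import math

# Table 9.7.1 values as listed in mcorr.sas (chol, weight, sbp).
chol = [162.2, 158.0, 157.0, 155.0, 156.0, 154.1, 169.1, 181.0, 174.9, 180.2, 174.0]
weight = [51.0, 52.9, 56.0, 56.5, 58.0, 60.1, 58.0, 61.0, 59.4, 56.1, 61.2]
sbp = [108, 111, 115, 116, 117, 120, 124, 127, 122, 121, 125]

def invert(a):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [[float(v) for v in row] + [float(i == j) for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [row[n:] for row in m]

def fit(y, *xs):
    """Least squares fit; returns (coefficients, standard errors, t statistics)."""
    n, k = len(y), len(xs)
    rows = [[1.0] + [x[j] for x in xs] for j in range(n)]          # design matrix X
    xtx = [[sum(r[a] * r[c] for r in rows) for c in range(k + 1)]
           for a in range(k + 1)]                                  # X'X
    xty = [sum(r[a] * yj for r, yj in zip(rows, y)) for a in range(k + 1)]
    c = invert(xtx)                                                # C = (X'X)^{-1}
    b = [sum(c[a][i] * xty[i] for i in range(k + 1)) for a in range(k + 1)]
    resid = [yj - sum(bi * ri for bi, ri in zip(b, r)) for r, yj in zip(rows, y)]
    s2 = sum(e * e for e in resid) / (n - k - 1)                   # residual mean square
    se = [math.sqrt(s2 * c[i][i]) for i in range(k + 1)]           # s_b = s * sqrt(c_ii)
    t = [bi / si for bi, si in zip(b, se)]
    return b, se, t

b, se, t = fit(chol, weight, sbp)
for name, bi, si, ti in zip(["intercept", "weight", "sbp"], b, se, t):
    print(f"{name:9s} b = {bi:9.4f}  se = {si:8.4f}  t = {ti:7.3f}")
```

Each t value is compared with the critical value of the t distribution with n-k-1 = 8 degrees of freedom; the critical value itself is not computed here because the standard library has no t quantile function.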

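9.5 Using the Multiple Regression Equation (Application)

• Confidence interval for the mean of the subpopulation of Y values at given values of the x's (estimating the mean of Y for a given X)

• Prediction interval for a single Y value at given values of the x's (predicting Y for a given X)

For $k = 2$, the confidence interval for the mean is

$$\hat{y}_j \pm t_{(1-\alpha/2),\, n-k-1}\ s_{y.12}\sqrt{\frac{1}{n} + c_{11}x_{1j}^2 + c_{22}x_{2j}^2 + 2c_{12}x_{1j}x_{2j}}$$

Here $x_{1j}$ and $x_{2j}$ are taken as deviations from their sample means, with the $c_{ij}$ computed on that centered scale; the $1/n$ term takes this form only under that convention. The prediction interval for a single Y value has the same form with 1 added under the square root.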

9.6 Qualitative Independent Variables

• Variables

Quantitative (continuous): score, age

Qualitative (categorical): sex, race, job

• A qualitative variable enters the model through dummy variables (a dummy variable takes the values 0 or 1). A qualitative variable with k categories uses k-1 dummy variables.

Examples of dummy variables

• Sex (male, female): $x_1 = 1$ if male, 0 otherwise

• Residential area (urban, rural, suburban): $x_1 = 1$ if urban, 0 otherwise; $x_2 = 1$ if rural, 0 otherwise

• Smoking status (current smoker, ex-smoker (<=5 years), ex-smoker (>5 years), nonsmoker): $x_1 = 1$ if smoker, 0 otherwise; $x_2 = 1$ if ex-smoker (<=5 years), 0 otherwise; $x_3 = 1$ if ex-smoker (>5 years), 0 otherwise
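The k categories to k-1 dummy variables rule can be written as a small function. This is a minimal sketch in Python; the function name `dummy_code` and the choice of "nonsmoker" as the reference category are assumptions made for illustration.

```python
def dummy_code(value, categories, reference):
    """Return k-1 indicators (0/1) for a variable with k categories.

    The reference category is coded as all zeros.
    """
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [int(value == c) for c in categories if c != reference]

smoking = ["smoker", "ex-smoker (<=5 years)", "ex-smoker (>5 years)", "nonsmoker"]
for status in smoking:
    print(status, dummy_code(status, smoking, reference="nonsmoker"))
```

Four smoking categories give three indicators; each regression coefficient on an indicator is then a contrast between that category and the reference category.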

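Example 9.6.1

Data: case number, birth weight (grams), gestation (weeks), smoking status of the mother

Model 1:

$$E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$$

$Y$: birth weight (grams), $x_1$: gestation (weeks), $x_2$: smoking status of the mother ($x_2 = 1$ for a smoker, 0 for a nonsmoker)

for nonsmoker: $E(Y) = \beta_0 + \beta_1 x_1$

for smoker: $E(Y) = (\beta_0 + \beta_2) + \beta_1 x_1$

→ same slope, different intercept

$\beta_2 = E(Y \mid X_1 = x_1, \text{smoker}) - E(Y \mid X_1 = x_1, \text{nonsmoker})$: the expected difference in birth weight between babies of smoking and nonsmoking mothers at the same gestation (for a given value of $x_1$)

Test of $H_0: \beta_2 = 0$: $|t^*| = 5.83 > 2.0452$, so reject $H_0$; the two groups are significantly different.

Confidence interval (CI) for $\beta_2$: $(-330.3975, -158.6825)$

[Figure: birth weight against gestation under Model 1, with parallel lines for smokers and nonsmokers]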
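The test statistic and the confidence interval in Example 9.6.1 are two views of the same estimate and standard error, so one can be recovered from the other. The sketch below assumes that both interval limits are negative (the minus signs appear to have been lost in extraction; the interval is only correctly ordered with them) and uses only the numbers quoted in the example.

```python
# Numbers quoted in Example 9.6.1 for the smoking coefficient (beta_2).
# Assumption: both limits are negative (signs restored, see the note above).
ci_low, ci_high = -330.3975, -158.6825   # confidence interval for beta_2
t_crit = 2.0452                          # critical t value quoted in the example

b2 = (ci_low + ci_high) / 2              # point estimate = midpoint of the interval
se_b2 = (ci_high - ci_low) / 2 / t_crit  # half-width = t_crit * standard error
t_stat = b2 / se_b2                      # test statistic for H0: beta_2 = 0

print(round(b2, 2), round(se_b2, 2), round(t_stat, 2))
```

The recovered statistic agrees in absolute value with the 5.83 reported in the example, which supports reading the interval with negative limits.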

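Model 2:

$$E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2$$

for nonsmoker: $E(Y) = \beta_0 + \beta_1 x_1$

for smoker: $E(Y) = (\beta_0 + \beta_2) + (\beta_1 + \beta_3) x_1$

→ different slope, different intercept

• If $\beta_3$ is significant → the slopes differ between smokers and nonsmokers

• If $\beta_2$ is significant → the intercepts differ

→ the intercept difference is not important without centering, because it refers to $x_1 = 0$

[Figure: Model 2. Horizontal axis: gestation; vertical axis: birth weight; separate lines for nonsmokers and smokers; 38 weeks marked on the gestation axis]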

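• Centering

Replace $x_1$ by $x_1 - 38$ (weeks):

$$E(Y) = \beta_0 + \beta_1 (x_1 - 38) + \beta_2 x_2 + \beta_3 (x_1 - 38) x_2$$

for nonsmoker: $E(Y) = \beta_0 + \beta_1 (x_1 - 38)$

for smoker: $E(Y) = (\beta_0 + \beta_2) + (\beta_1 + \beta_3)(x_1 - 38)$

• $\beta_0$ is the expected value of $Y$ for nonsmokers when $x_1 = 38$, $E(Y \mid x_1 = 38)$: a meaningful parameter of interest.

• $\beta_2$ is the difference in expected value between smokers and nonsmokers at $x_1 = 38$, not at $x_1 = 0$: $E(Y \mid x_1 = 38, \text{smoker}) - E(Y \mid x_1 = 38, \text{nonsmoker})$.

* Lesson: after a continuous variable is centered, the intercept is the expected value at a chosen value of $x$ rather than at $x = 0$, so it becomes more meaningful.

* Another effect of centering: it weakens the multicollinearity among the x's.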
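To see which coefficients centering changes, the centered model can be expanded and compared term by term with the uncentered Model 2, $E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2$. The $\gamma$ symbols are notation introduced here for the coefficients of the centered model; they do not appear in the original slides.

```latex
\begin{aligned}
E(Y) &= \gamma_0 + \gamma_1 (x_1 - 38) + \gamma_2 x_2 + \gamma_3 (x_1 - 38) x_2 \\
     &= (\gamma_0 - 38\gamma_1) + \gamma_1 x_1 + (\gamma_2 - 38\gamma_3) x_2 + \gamma_3 x_1 x_2 \\[4pt]
\gamma_1 &= \beta_1, \qquad \gamma_3 = \beta_3, \qquad
\gamma_0 = \beta_0 + 38\beta_1, \qquad \gamma_2 = \beta_2 + 38\beta_3
\end{aligned}
```

The slopes are unchanged, while the intercept and the smoker/nonsmoker difference are both re-expressed at 38 weeks, which is why only those two coefficients gain a more useful interpretation.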

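• Example 9.6.2

$Y$: treatment effect (effect); $x_1$: age (quantitative); treatment method (qualitative; trt = A, B, C): $x_2 = 1$ if trt = A, 0 otherwise; $x_3 = 1$ if trt = B, 0 otherwise

Model:

$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_1 x_2 + \beta_5 x_1 x_3 + e$$

for trt = A: $E(Y) = (\beta_0 + \beta_2) + (\beta_1 + \beta_4) x_1$

for trt = B: $E(Y) = (\beta_0 + \beta_3) + (\beta_1 + \beta_5) x_1$

for trt = C: $E(Y) = \beta_0 + \beta_1 x_1$

$\beta_0, \beta_1$: intercept and slope for the reference cell C

$\beta_2$: difference of intercepts (A - C), = 0?

$\beta_3$: difference of intercepts (B - C), = 0?

$\beta_4$: difference of slopes (A - C), = 0?

$\beta_5$: difference of slopes (B - C), = 0?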
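The mapping from the six coefficients to the three treatment-specific lines can be made explicit in code. This is a minimal sketch in Python; the coefficient values are hypothetical (they are not fitted values from the example), and the function names are illustrative. `design_row` mirrors the variables x1, x2, x3, x12, x13 built in the data step of mreg.sas.

```python
def treatment_lines(beta):
    """Map (b0, ..., b5) to {treatment: (intercept, slope)} with C as the reference."""
    b0, b1, b2, b3, b4, b5 = beta
    return {
        "A": (b0 + b2, b1 + b4),
        "B": (b0 + b3, b1 + b5),
        "C": (b0, b1),
    }

def design_row(age, method):
    """Regressors x1, x2, x3, x12, x13 for one observation."""
    x1, x2, x3 = age, int(method == "A"), int(method == "B")
    return [x1, x2, x3, x1 * x2, x1 * x3]

# Hypothetical coefficients, for illustration only (not fitted values).
beta = [10.0, 1.0, 40.0, 20.0, -0.5, -0.25]
lines = treatment_lines(beta)
print(lines)

# The full model and the per-treatment line give the same expected value.
age, method = 21, "A"
full = beta[0] + sum(b * x for b, x in zip(beta[1:], design_row(age, method)))
intercept, slope = lines[method]
print(full, intercept + slope * age)
```

Testing whether a single coefficient among beta2 to beta5 is zero therefore asks whether one treatment's intercept or slope differs from that of the reference treatment C.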

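Example 9.6.2 - mreg.sas

/* File mreg.sas: multiple regression for Table 9.6.3 */

data reg;

input effect age method $;

/* x1: age; x2, x3: dummy variables for methods A and B (C is the reference) */

x1=age; x2=(method='A'); x3=(method='B');

/* age-by-method interaction terms */

x12=x1*x2; x13=x1*x3;

cards;

56 21 A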

41 23 B

40 30 B

28 19 C

55 28 A

25 23 C

46 33 B

71 67 C

48 42 B

63 33 A

52 33 A

62 56 C

50 45 C

45 43 B

58 38 A

46 37 C

58 43 B

34 27 C

65 43 A

55 45 B

57 48 B

59 47 C

64 48 A

61 53 A

62 58 B

36 29 C

69 53 A

47 29 B

73 58 A

64 66 B

60 67 B

62 63 A

71 59 C

62 51 C

70 67 A

71 63 C

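;

proc reg;

/* x13 is included so that the fit matches the six-coefficient model above */

model effect=x1 x2 x3 x12 x13;

output out=d p=pred;

id age method;

run;

proc sort; by method;

/* plot effect against age, one plotting symbol and regression line per method */

proc gplot;

plot effect*age=method / legend;

symbol1 v='A' i=r c=c2;

symbol2 v='B' i=r c=c2;

symbol3 v='C' i=r c=c2;

run;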

homework

• Exercise 9.6.3

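9.7 Multiple Correlation Model

$$y_j = \beta_0 + \beta_1 x_{1j} + \cdots + \beta_k x_{kj} + e_j$$

• The x's are random variables rather than fixed constants, so both x and y are random variables.

• $y$ and $x_1, \dots, x_k$ follow a multivariate normal distribution.

• The degree of correlation between $y$ and $x_1, \dots, x_k$ is measured by the multiple correlation coefficient.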

Example 9.7.1

$Y$: serum cholesterol, $X_1$: weight, $X_2$: SBP (systolic blood pressure)

Multiple correlation coefficient:

$$R^2_{y.12} = \frac{SSR}{SST} = \frac{817.876}{1{,}099.669} = .7437, \qquad R_{y.12} = \sqrt{.7437} = .86$$

$$F = \frac{R^2}{1 - R^2} \cdot \frac{n-k-1}{k} \sim F(k, n-k-1)$$

$$F = 11.61 > 11.04 \quad (p < 0.005)$$

⇒ There is a significant linear association between serum cholesterol and (SBP and weight).
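The quantities in this example can be reproduced from the data listed in mcorr.sas. The sketch below (Python, standard library only) solves the two-regressor least squares problem on the centered scale and then forms SST, SSR, R-squared and F; the helper name `cross` is an illustrative choice.

```python
import math

# Table 9.7.1 values as listed in mcorr.sas (chol, weight, sbp).
chol = [162.2, 158.0, 157.0, 155.0, 156.0, 154.1, 169.1, 181.0, 174.9, 180.2, 174.0]
weight = [51.0, 52.9, 56.0, 56.5, 58.0, 60.1, 58.0, 61.0, 59.4, 56.1, 61.2]
sbp = [108, 111, 115, 116, 117, 120, 124, 127, 122, 121, 125]

def cross(u, v):
    """Sum of cross products of deviations from the means."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))

n, k = len(chol), 2
s11, s22, s12 = cross(weight, weight), cross(sbp, sbp), cross(weight, sbp)
s1y, s2y = cross(weight, chol), cross(sbp, chol)

# Slopes from the 2x2 centered normal equations (Cramer's rule).
det = s11 * s22 - s12 ** 2
b1 = (s1y * s22 - s12 * s2y) / det
b2 = (s11 * s2y - s12 * s1y) / det

sst = cross(chol, chol)              # total sum of squares
ssr = b1 * s1y + b2 * s2y            # sum of squares explained by the regression
r2 = ssr / sst                       # coefficient of multiple determination
f_stat = (r2 / k) / ((1 - r2) / (n - k - 1))

print(f"SST = {sst:.3f}  SSR = {ssr:.3f}  R2 = {r2:.4f}  "
      f"R = {math.sqrt(r2):.2f}  F = {f_stat:.2f}")
```

The F statistic is then compared with the tabulated critical value quoted in the example (11.04), which is not recomputed here.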

Example 9.7.1 - mcorr.sas

/* file mcorr.sas: SAS example for Table 9.7.1 */

data mcorr;

input chol weight sbp;

cards;

162.2 51.0 108

158.0 52.9 111

157.0 56.0 115

155.0 56.5 116

156.0 58.0 117

154.1 60.1 120

169.1 58.0 124

181.0 61.0 127

174.9 59.4 122

180.2 56.1 121

174.0 61.2 125

proc plot;

plot chol*weight chol*sbp

sbp*weight;

proc corr;

var chol weight sbp ;

proc corr;

var chol weight ;

partial sbp ;

proc corr;

var chol sbp ;

partial weight ;

proc corr;

var weight sbp ;

partial chol ;

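Partial Correlation Coefficient

• Examines the relationship between two variables while the effects of the other variables are controlled (linear association after controlling for other covariates)

e.g. $r_{y1.2}$: the partial correlation coefficient between $y$ and $X_1$ when $X_2$ is held constant (fixed at the same value)

e.g. $r_{12.y}$: the partial correlation coefficient between $X_1$ and $X_2$ when $y$ is held constant

Test of $H_0: \rho_{y1.2\ldots k} = 0$:

$$t = r_{y1.2\ldots k}\sqrt{\frac{n-k-1}{1 - r^2_{y1.2\ldots k}}}$$

For $H_0: \rho_{12.y} = 0$ ($X_1$: weight, $X_2$: SBP, $y$: serum cholesterol):

$$t = .948\sqrt{\frac{11-2-1}{1 - .948^2}} = 8.425$$

∴ We conclude that there is a significant linear association between SBP and weight when serum cholesterol is held constant (that is, after adjusting for the cholesterol effect).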
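The partial correlation between weight and SBP given cholesterol can be computed from the three pairwise correlations. The sketch below uses the standard first-order partial correlation formula, which is a well-known identity but is not written out in the slides, together with the data listed in mcorr.sas.

```python
import math

# Table 9.7.1 values as listed in mcorr.sas (chol, weight, sbp).
chol = [162.2, 158.0, 157.0, 155.0, 156.0, 154.1, 169.1, 181.0, 174.9, 180.2, 174.0]
weight = [51.0, 52.9, 56.0, 56.5, 58.0, 60.1, 58.0, 61.0, 59.4, 56.1, 61.2]
sbp = [108, 111, 115, 116, 117, 120, 124, 127, 122, 121, 125]

def corr(u, v):
    """Pearson correlation coefficient."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)

def partial_corr(r_ab, r_ac, r_bc):
    """Correlation of a and b after removing the linear effect of c."""
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))

# r_{12.y}: weight (X1) and SBP (X2), controlling for cholesterol (y).
r_12_y = partial_corr(corr(weight, sbp), corr(weight, chol), corr(sbp, chol))

n, k = len(chol), 2
t_stat = r_12_y * math.sqrt((n - k - 1) / (1 - r_12_y ** 2))
print(f"r_12.y = {r_12_y:.3f}  t = {t_stat:.3f}")
```

This corresponds to the last `proc corr` step in mcorr.sas (`var weight sbp; partial chol;`). The t value can differ slightly from the slide's 8.425 because the slide rounds r to three decimals before computing t.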

homework

• Exercise 9.7.1

Pair-wise plot in R

## pair.r

## put histograms on the diagonal

panel.hist <- function(x, ...)
{
    usr <- par("usr"); on.exit(par(usr))
    par(usr = c(usr[1:2], 0, 1.5) )
    h <- hist(x, plot = FALSE)
    breaks <- h$breaks; nB <- length(breaks)
    y <- h$counts; y <- y/max(y)
    rect(breaks[-nB], 0, breaks[-1], y, col="cyan", ...)
}

## put (absolute) correlations on the upper panels,

## with size proportional to the correlations.

panel.cor <- function(x, y, digits=2, prefix="", cex.cor, ...)
{
    usr <- par("usr"); on.exit(par(usr))
    par(usr = c(0, 1, 0, 1))
    r <- abs(cor(x, y))
    txt <- format(c(r, 0.123456789), digits=digits)[1]
    txt <- paste(prefix, txt, sep="")
    if(missing(cex.cor)) cex.cor <- 0.8/strwidth(txt)
    text(0.5, 0.5, txt, cex = cex.cor * r)
}

?swiss

summary(swiss)

cor(swiss)

pairs(swiss)

pairs(swiss, lower.panel=panel.smooth, upper.panel=panel.cor)

pairs(swiss, lower.panel=panel.smooth, upper.panel=panel.cor, diag.panel=panel.hist)

9.8 Variable (Model) Selection

• Forward selection

• Backward elimination

• Stepwise selection
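As an illustration of the first of these procedures, forward selection can be sketched in a few lines of Python (standard library only). This is a simplified sketch, not the SAS algorithm: the entry rule here is a fixed partial-F threshold (`f_to_enter`, an assumed parameter), whereas SAS's `selection=` methods are driven by significance levels. The data are hypothetical, and the sketch assumes more observations than candidate variables plus one and no exact fit.

```python
def sse(y, columns):
    """Residual sum of squares from regressing y on an intercept plus the given columns."""
    n = len(y)
    rows = [[1.0] + [col[j] for col in columns] for j in range(n)]
    p = len(rows[0])
    # Augmented normal equations [X'X | X'y], solved by Gauss-Jordan elimination.
    m = [[sum(r[a] * r[c] for r in rows) for c in range(p)]
         + [sum(r[a] * yj for r, yj in zip(rows, y))] for a in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        d = m[col][col]
        m[col] = [v / d for v in m[col]]
        for r in range(p):
            if r != col:
                f = m[r][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    coef = [row[p] for row in m]
    return sum((yj - sum(bi * ri for bi, ri in zip(coef, r))) ** 2
               for r, yj in zip(rows, y))

def forward_selection(y, candidates, f_to_enter=4.0):
    """Greedy forward selection using a partial F statistic as the entry rule."""
    n = len(y)
    selected = []
    current = sse(y, [])                      # intercept-only model: SSE = SST
    while len(selected) < len(candidates):
        best = None
        for name in candidates:
            if name in selected:
                continue
            trial = sse(y, [candidates[s] for s in selected + [name]])
            df = n - (len(selected) + 1) - 1  # residual d.f. after adding this variable
            f = (current - trial) / (trial / df)
            if best is None or f > best[1]:
                best = (name, f, trial)
        if best[1] < f_to_enter:              # no remaining variable passes the entry rule
            break
        selected.append(best[0])
        current = best[2]
    return selected

# Hypothetical data: y depends strongly on x1 and not on x2.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [5, 3, 8, 1, 7, 2, 6, 4]
noise = [0.5, -0.5, 0.3, -0.3, 0.2, -0.2, 0.4, -0.4]
y = [3 + 2 * a + e for a, e in zip(x1, noise)]

print(forward_selection(y, {"x1": x1, "x2": x2}))
```

Backward elimination runs the same idea in reverse (start with all variables and drop the weakest), and stepwise selection alternates entry and removal checks after each step.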

mod18.sas

/* file : mod18.sas
   Multiple Regression Model with stepwise selection */

filename electric 'd:\myweb\int\electric.dat';

data peak;
infile electric;
input housize 1-3 income 6-11 aircapac 14-16 applindx 19-23 family 26-28 peak 31-35;
label housize = 'House Size'
      income = 'Family Income'
      aircapac = 'Air Conditioning Capacity'
      applindx = 'Appliance Index'
      family = 'Number of Family Members'
      peak = 'Peak Hour Electric Load';

/* stepwise selection */
proc reg data=peak;
model peak = housize income aircapac applindx family / selection=stepwise;
title 'Multiple Regression Model with stepwise selection';

/* all-subsets summary: the best 2 models of each size by R-square,
   with Cp, adjusted R-square and MSE; estimates are saved in est */
proc reg data=peak outest=est;
model peak = housize income aircapac applindx family / selection=rsquare cp adjrsq mse best=2;
title 'Multiple Regression Model with R-square selection';

proc print;
title 'Actual Coefficients, etc.';

/* plot Cp (symbol C) and the number of parameters p (symbol *) against the number of regressors */
proc plot;
plot _cp_*_in_='C' _p_*_in_='*' / overlay vaxis=0 to 25 by 5 haxis=1 to 5 hpos=40 vpos=30;
title;
run;

homework

• Review exercise 10, using SAS
