Graduate School of Public Health, Seoul National University · Ho Kim · hosting03.snu.ac.kr/~hokim/seminar/multilevel20121107.pdf ·...

Statistical Models for Multilevel Analysis, Graduate School of Public Health, Seoul National University

Page 1: Graduate School of Public Health, Seoul National University · Ho Kim · hosting03.snu.ac.kr/~hokim/seminar/multilevel20121107.pdf · 2012-11-04 · Intro: Model Context •Second reason why it is important to model

Statistical Models for Multilevel Analysis

Graduate School of Public Health, Seoul National University

Ho Kim

Intro: Nested Data

• Organizations are inherently hierarchical

– Individuals in groups/teams

– Groups/teams in departments

– Departments in organizations

• Why the nested nature of data matters

– Requires matching theory with data

– Allows one to model contextual effects

– Makes it important to correctly specify statistical models

• Another Option => Hierarchical Bayesian Models

Intro: Match Theory & Data

• Proper alignment of level of analysis and level of inference is crucial

• Misalignment can lead to incorrect inferences

– Ecological Fallacy (using aggregate-level results to make inferences about individuals)

– Atomistic Fallacy (using individual-level results to make inferences about groups)

• Diez-Roux (1998)

Intro: Match Theory & Data

[Figure: individual-level scatterplot; axes: Work Hours and Well-Being]

Raw Correlation = -.17

Individual-level correlation between work hours and well-being is low (n=7,382, Bliese & Halverson 1996)

Intro: Match Theory & Data

[Figure: scatterplot of group means; axes: Average Work Hours and Average Well-Being]

Unweighted group-level correlation = -.67

Group-level analysis based on group means (n=99)

Intro: Match Theory & Data

• Which correlation is right?

– Group-level correlation is right if we want to make inferences about groups

– Individual-level correlation is right if we want to make inferences about individuals

– Mismatch results in a fallacy (ecological or atomistic)
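The contrast between the two correlations can be reproduced in outline as follows. This is only a sketch: the dataset `work.survey` and the variables `unit`, `workhours`, and `wellbeing` are hypothetical names chosen for illustration and are not the Bliese & Halverson data.

```sas
/* Sketch only: work.survey, unit, workhours and wellbeing are hypothetical names. */

/* Individual-level correlation: one row per person. */
proc corr data = work.survey;
  var workhours wellbeing;
run;

/* Aggregate to one row per group (unweighted group means). */
proc means data = work.survey noprint nway;
  class unit;
  var workhours wellbeing;
  output out = work.groupmeans mean = mean_workhours mean_wellbeing;
run;

/* Group-level correlation: one row per group. */
proc corr data = work.groupmeans;
  var mean_workhours mean_wellbeing;
run;
```

The first correlation supports inferences about individuals and the second about groups; using either one for the other level is the mismatch described above.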

Intro: Match Theory & Data

• Different results at different levels of analysis raise questions about the link between lower-level and aggregate variables.

• When results differ across levels, lower-level and aggregate variables are non-isomorphic

– Individual work hour preferences versus externally mandated work requirements for the group

– Individual well-being versus shared strain

Intro: Match Theory & Data

• Aggregation models force one to describe the link between lower-level and aggregate variables

– See James (1982); Chan (1998); Klein & Kozlowski (2000); Bliese (2000)

• Agreement and reliability indices used to validate the aggregation model

– Intraclass correlation coefficients (ICCs) indicate degree of non-isomorphism (Bliese, 2000)

Intro: Match Theory & Data

• Why is it important to model the hierarchical nature of data? (Summary of Point 1)

– To ensure that data and theory are aligned

– To avoid inferential fallacies

• Key theoretical and analysis points raised

– Need to specify aggregation model

– Need to test this aggregation model (within-group agreement, reliability or both)

Intro: Model Context

• Second reason why it is important to model the hierarchical nature of data is to explicitly account for contextual effects.

• Often organizational analyses implicitly assume that group membership does not matter. That is, analyses treat individuals as though they are unaffected by social contexts.

• By modeling the hierarchical nature of data we explore the possibility that social context matters.

Intro: Model Context

• For instance, consider the typical work stress paradigm (note that social context is missing)

Stressors

•Family Problems

•Work overload

•Lack of Purpose

•Role Conflict

Individual Moderators

•Personality Hardiness

•Job Involvement

•Self Efficacy

Strains

•Morale

•Well being

Intro: Model Context

• Versus a model with social context (Bliese & Jex, 2002; Bliese, Jex & Halverson, 2002)

Stressors

•Family Problems

•Work overload

•Lack of Purpose

•Role Conflict

Individual Moderators

•Personality Hardiness

•Job Involvement

•Self Efficacy

Strains

•Morale

•Well being

Contextual Moderators

•Cohesion

•Leadership

•Collective Efficacy

Intro: Model Context

• Example 1 (Jex & Bliese, 1999)

[Figure: Job Satisfaction (vertical axis, 2.0 to 4.0) at Low Work Overload and High Work Overload, with separate lines for Low Collective Efficacy and High Collective Efficacy]

Intro: Model Context and Time

• Modeling dynamic processes over time is a key variation on modeling context.

• Individual is the higher-level variable and the repeated observations from the individual are the lower-level observations.

• Goal is to model how attributes of the individual relate to different initial starting states and changes over time.
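A growth model of this kind might be set up in PROC MIXED roughly as follows. This is a sketch under stated assumptions: the long-format dataset `work.jobsat_long` (one row per person per measurement occasion) and the variables `id`, `time`, and `jobsat` are hypothetical and do not come from the seminar data.

```sas
/* Sketch only: work.jobsat_long, id, time and jobsat are hypothetical names.  */
/* Repeated observations (level 1) are nested within individuals (level 2).   */
proc mixed data = work.jobsat_long covtest noclprint;
  class id;
  /* Fixed part: average initial status (intercept) and average change per unit of time. */
  model jobsat = time / solution ddfm = bw;
  /* Random part: each individual gets its own intercept and its own slope over time. */
  random intercept time / subject = id type = un;
run;
```

Individual attributes could then be added to the model statement, alone and in interaction with `time`, to relate them to initial status and to change over time.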

Intro: Model Context and Time

[Figure: panel plot of Job Satisfaction (1 to 5) against Time (in six-month intervals, 0.0 to 2.0), one panel per individual (panels labeled SSN), with examples marked Increased Slope and Decreased Slope]

Intro: Model Context

• Why is it important to model the hierarchical nature of data? (Summary of Point 2)

– It is important to model the hierarchical nature of data to understand how context affects relationships

– Enrich organizational models

• Key theoretical and analysis points raised

– How do we model context?

• Contextual Analysis

• Random Coefficient modeling (e.g., HLM)

• Growth Modeling

Intro: Account for variance

• Third reason why it is important to model the hierarchical nature of data is to correctly account for sources of variance

• Variance across levels can:

– Affect form of relationships

– And also (Kenny & Judd, 1986):

• Affect one’s standard error estimates, which then affect

• One’s t-values, which then affect

• One’s conclusions about what is and is not significant

Intro: Account for variance

• Example 1 from Bliese (2002): two groups and pooled regression

[Figure: scatterplot of Y against X, both axes running from 0 to 6]

Intro: Specify variance

• Bliese & Hanges (2002) simulation

ICC value   ICC value   RC Parameter   RC Standard   OLS Parameter   OLS Standard   Ratio of OLS t-value
for X       for Y       Estimate       Error         Estimate        Error          to RC t-value
0.010       0.301       0.299          0.019         0.299           0.023          0.865
0.051       0.297       0.301          0.019         0.301           0.021          0.926
0.095       0.293       0.303          0.019         0.303           0.020          0.954
0.148       0.295       0.300          0.019         0.300           0.020          0.968
0.198       0.297       0.300          0.019         0.300           0.020          0.975
0.256       0.301       0.300          0.019         0.300           0.020          0.981

Intro: Specify variance

• Why is it important to model the hierarchical nature of data? (Summary of Point 3)

– It is important to model the hierarchical nature of data to account for known sources of variance

– Improve one’s statistical models

• Key theoretical and analysis points raised

– How do we account for sources of variation?

• ANCOVA models; Split-Plot ANOVA models

• Random Coefficient modeling (e.g., HLM)

Intro: Overall Summary

• Why are multilevel models important?

– Match theory and data

• Issues: Specify aggregation models

– Model contextual effects and time

• Issues: multilevel data analysis methods

– Correctly specify statistical models

• Issues: multilevel data analysis methods

•Statistical Computing Seminar: Introduction to Multilevel Modeling Using SAS

•This seminar is based on the paper "Using SAS Proc Mixed to Fit Multilevel Models, Hierarchical Models, and Individual Growth Models" by Judith Singer, which can be downloaded from Professor Singer's web site at http://gseweb.harvard.edu/~faculty/singer/Papers/sasprocmixed.pdf .

• modeling organizational research;

• students nested within classes, children nested within families, patients nested within hospitals;

•Model 1: Unconditional Means Model

•Model 2: Including Effects of School Level (level 2) Predictors

•Model 3: Including Effects of Student-Level Predictors

•Model 4: Including Both Level-1 and Level-2 Predictors

• School Effect Model

SCHOOL   MATHACH   SES      MEANSES   SECTOR
1296     6.588     -0.178   -0.420    0
1296     11.026    0.392    -0.420    0
1296     7.095     -0.358   -0.420    0
...

7185 students nested in 160 schools.

The outcome variable of interest is student-level math achievement score (MATHACH).

Variable SES is the socioeconomic status of a student and is therefore a student-level variable.

• School Effect Model

Variable MEANSES is the group mean of SES and therefore is a school-level variable.

Both SES and MEANSES are centered at the grand mean (they both have means of 0).

Variable SECTOR is an indicator of whether a school is public or Catholic and is therefore a school-level variable. There are 90 public schools (SECTOR=0) and 70 Catholic schools (SECTOR=1) in the sample.

• Multilevel data often forms some sort of hierarchical structure (hence the term ‘multilevel modeling’).

• Students within schools: schools are level 2, students are level 1

Model 0: simple regression

proc mixed data = in.hsb12;

model mathach = SES / solution;

run;

Traditional Regression:

•Equal Slopes

•Equal Intercepts

[Figure: plot with ses on the horizontal axis]

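The importance of centering

$$Y = b_0 + b_1 X + e, \qquad E(Y \mid X = 0) = b_0$$

$$Y = b_0 + b_1 (X - m) + e, \qquad E(Y \mid X = m) = b_0$$

• The intercept is the expected value of Y when all explanatory variables equal 0 (often not meaningful)

• Using X - Mean(X) in place of X makes the intercept the expected value of Y when X is at its mean

• It is advisable to center all continuous explanatory variables

• Another advantage: centering reduces multicollinearity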
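Grand-mean centering could be carried out in SAS along these lines. This is a minimal sketch: the dataset `work.raw` and the variable `x` are hypothetical, and PROC SQL is just one of several ways to subtract the mean.

```sas
/* Sketch only: work.raw and x are hypothetical names. */
proc sql;
  create table work.centered as
  select *,
         x - mean(x) as xc   /* xc = X - Mean(X); SAS remerges the overall mean onto every row */
  from work.raw;
quit;

/* Check: the centered variable should have a mean of (numerically) zero. */
proc means data = work.centered mean;
  var xc;
run;
```

Regressing Y on `xc` instead of `x` leaves the slope unchanged and makes the intercept the expected value of Y at the mean of X.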

Model 1: Unconditional Means Model

• This model is referred to as a one-way ANOVA with random effects and is the simplest possible random-effects linear model.

• The motivation for this model is the question of how much schools vary in their mean mathematics achievement. In terms of regression equations, we have the following, where $r_{ij} \sim N(0, \sigma^2)$ and $u_{0j} \sim N(0, \tau^2)$:

$$\text{MATHACH}_{ij} = \beta_{0j} + r_{ij}$$

$$\beta_{0j} = \gamma_{00} + u_{0j}$$

Model 1: Unconditional Means Model

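Combining the two equations into one by substituting the level-2 equation into the level-1 equation, we have

$$\text{MATHACH}_{ij} = \gamma_{00} + u_{0j} + r_{ij}$$

proc mixed data = in.hsb12 covtest noclprint;

class school;

model mathach = / solution; /* no predictors: the fixed part is the intercept only */

random intercept / subject = school; /* random intercept for each school */

run;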

Model 1: Unconditional Means Model

Notes on the proc mixed statements for Model 1:

• The intercept is included in the model statement by default

• solution: prints the estimates of the regression coefficients

• covtest: prints the variance-covariance parameters with their tests

• random intercept / subject = school: gives each school its own intercept

Covariance Parameter Estimates

Cov Parm    Subject   Estimate   Standard Error   Z Value   Pr Z
Intercept   SCHOOL    8.6097     1.0778           7.99      <.0001
Residual              39.1487    0.6607           59.26     <.0001

Fit Statistics

-2 Res Log Likelihood       47116.8
AIC (smaller is better)     47120.8
AICC (smaller is better)    47120.8
BIC (smaller is better)     47126.9

Solution for Fixed Effects

Effect      Estimate   Standard Error   DF    t Value   Pr > |t|
Intercept   12.6370    0.2443           159   51.72     <.0001

The Intercept row of the covariance table is the estimate of $\tau^2$ and the Residual row is the estimate of $\sigma^2$.

Intraclass correlation: $\hat\tau^2 / (\hat\tau^2 + \hat\sigma^2) = 8.6097/(8.6097 + 39.1487) = 0.1803$

The range of plausible values for the school means is $12.637 \pm 1.96\sqrt{8.61} = (6.89, 18.39)$.
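Rather than copying the two variance estimates by hand, the intraclass correlation could be computed from the PROC MIXED output itself. The sketch below assumes the `in` libref from the seminar code is assigned; the dataset names `work.cp` and `work.icc` are hypothetical, introduced here only for illustration.

```sas
/* Sketch only: work.cp and work.icc are hypothetical dataset names. */
/* Capture the Covariance Parameter Estimates table as a dataset. */
ods output CovParms = work.cp;

proc mixed data = in.hsb12 covtest noclprint;
  class school;
  model mathach = / solution;
  random intercept / subject = school;
run;

/* ICC = tau2 / (tau2 + sigma2): between-school variance over total variance. */
data work.icc;
  set work.cp end = last;
  retain tau2 sigma2;
  if CovParm = "Intercept" then tau2 = Estimate;       /* between-school variance */
  else if CovParm = "Residual" then sigma2 = Estimate; /* within-school variance  */
  if last then do;
    icc = tau2 / (tau2 + sigma2);
    output;
  end;
  keep tau2 sigma2 icc;
run;

proc print data = work.icc;
run;
```

With the estimates shown above, this should reproduce the hand calculation 8.6097/(8.6097 + 39.1487).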

Model 2: Including Effects of School Level (level 2) Predictors -- predicting mathach from meanses

• This model is referred to as regression with Means-as-Outcomes by Raudenbush and Bryk. The motivation for this model is the question of whether schools with high MEANSES also have high math achievement. In other words, we want to understand why schools differ in mathematics achievement. In terms of regression equations, we have the following.

proc mixed data = in.hsb12 covtest noclprint;

class school;

model mathach = meanses / solution ddfm = bw;

random intercept / subject = school;

run;

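Note: ddfm = bw (the between-within method for denominator degrees of freedom) becomes necessary when the data are unbalanced.

$$\text{MATHACH}_{ij} = \beta_{0j} + r_{ij}$$

$$\beta_{0j} = \gamma_{00} + \gamma_{01}(\text{MEANSES}) + u_{0j}$$

Combining the two equations into one by substituting the level-2 equation into the level-1 equation, we have

$$\text{MATHACH}_{ij} = \gamma_{00} + \gamma_{01}(\text{MEANSES}) + u_{0j} + r_{ij}$$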

Covariance Parameter Estimates

Cov Parm    Subject   Estimate   Standard Error   Z Value   Pr Z
Intercept   SCHOOL    2.6357     0.4036           6.53      <.0001
Residual              39.1578    0.6608           59.26     <.0001

Fit Statistics

-2 Res Log Likelihood       46961.3
AIC (smaller is better)     46965.3
AICC (smaller is better)    46965.3
BIC (smaller is better)     46971.4

Solution for Fixed Effects

Effect      Estimate   Standard Error   DF    t Value   Pr > |t|
Intercept   12.6495    0.1492           158   84.77     <.0001
MEANSES     5.8635     0.3613           158   16.23     <.0001

Type 3 Tests of Fixed Effects

Num Den

Intercept: the expected value of Y when MEANSES is at its mean.

Intercept variance: a large decrease (from 8.61 in Model 1), so MEANSES explains much of the school-to-school variation. (8.6097 - 2.6357)/8.6097 = 69%, that is, MEANSES explains 69% of the school-to-school variation in Y.

A range of plausible values for school means, given that all schools have MEANSES of zero, is $12.65 \pm 1.96\sqrt{2.64} = (9.47, 15.83)$.

The school-level intercept variance in the Covariance Parameter Estimates (2.6357, Z = 6.53, Pr Z < .0001) is highly significant: even after controlling for meanses, variation between schools remains.
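As a supplementary worked step (not on the original slides), the remaining between-school variation can be expressed as a residual intraclass correlation, using the same formula as in Model 1 with the Model 2 estimates:

```latex
\[
\hat{\rho}_{\text{residual}}
  = \frac{\hat{\tau}^2}{\hat{\tau}^2 + \hat{\sigma}^2}
  = \frac{2.6357}{2.6357 + 39.1578}
  = \frac{2.6357}{41.7935}
  \approx 0.063
\]
```

So, among schools with the same MEANSES, about 6% of the variation in math achievement lies between schools, compared with about 18% before MEANSES was included.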

Model 3: Including Effects of Student-Level Predictors--predicting mathach from centered student-level ses, cses

This model is referred to as a random-coefficient model by Raudenbush and Bryk. Suppose we regress mathach on centered ses separately within each school; that is, we run 160 regressions.

What would be the average of the 160 regression equations (both intercept and slope)?

How much do the regression equations vary from school to school?

What is the correlation between the intercepts and slopes?
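The thought experiment of 160 separate regressions can be run literally. The following sketch assumes the `in` libref from the seminar code is assigned; the dataset names `work.hsbc_sketch` and `work.est` are hypothetical, introduced only for illustration.

```sas
/* Sketch only: work.hsbc_sketch and work.est are hypothetical dataset names. */
data work.hsbc_sketch;
  set in.hsb12;
  cses = ses - meanses;   /* SES centered at the school mean */
run;

proc sort data = work.hsbc_sketch;
  by school;
run;

/* One OLS regression per school; OUTEST= stores one row of coefficients per school. */
proc reg data = work.hsbc_sketch outest = work.est noprint;
  by school;
  model mathach = cses;
run;
quit;

/* Average intercept and slope, and how much they vary across schools. */
proc means data = work.est mean var;
  var Intercept cses;
run;

/* Correlation between the school-specific intercepts and slopes. */
proc corr data = work.est;
  var Intercept cses;
run;
```

The variances obtained this way include sampling error from each small within-school regression, so they would generally be larger than the variance components estimated by the random-coefficient model, which addresses the same three questions in a single fit.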

Model 3: Including Effects of Student-Level Predictors--predicting mathach from centered student-level ses, cses

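$$\text{MATHACH}_{ij} = \beta_{0j} + \beta_{1j}(\text{SES} - \text{MEANSES}) + r_{ij}$$

$$\beta_{0j} = \gamma_{00} + u_{0j}$$

$$\beta_{1j} = \gamma_{10} + u_{1j}$$

Combining the two equations into one by substituting the level-2 equations into the level-1 equation, we have

$$\text{MATHACH}_{ij} = \gamma_{00} + \gamma_{10}(\text{SES} - \text{MEANSES}) + u_{0j} + u_{1j}(\text{SES} - \text{MEANSES}) + r_{ij}$$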

data hsbc;

set in.hsb12;

cses = ses - meanses; /* SES centered at the school mean */

run;

proc mixed data = hsbc noclprint covtest noitprint;

class school;

model mathach = cses / solution ddfm = bw notest;

random intercept cses / subject = school type = un gcorr;

run;

Note: listing cses in the random statement allows the effect of cses to differ across schools.

Estimated G Correlation Matrix

Row   Effect      SCHOOL   Col1      Col2
1     Intercept   1224     1.0000    0.02068
2     cses        1224     0.02068   1.0000

Covariance Parameter Estimates

Cov Parm   Subject   Estimate   Standard Error   Z Value   Pr Z
UN(1,1)    SCHOOL    8.6769     1.0786           8.04      <.0001
UN(2,1)    SCHOOL    0.05075    0.4062           0.12      0.9006
UN(2,2)    SCHOOL    0.6940     0.2808           2.47      0.0067
Residual             36.7006    0.6258           58.65     <.0001

Fit Statistics

-2 Res Log Likelihood       46714.2
AIC (smaller is better)     46722.2
AICC (smaller is better)    46722.2
BIC (smaller is better)     46734.5

Null Model Likelihood Ratio Test

DF   Chi-Square   Pr > ChiSq
3    1065.70      <.0001

Solution for Fixed Effects

Effect      Estimate   Standard Error   DF     t Value   Pr > |t|
Intercept   12.6493    0.2445           159    51.75     <.0001
cses        2.1932     0.1283           7024   17.10     <.0001

UN(2,2), the slope variance, is significant (Pr Z = 0.0067): we conclude that the slopes differ across schools.

The 95% plausible value range for the school means is $12.65 \pm 1.96\sqrt{8.68} = (6.88, 18.42)$.

The 95% plausible value range for the SES-achievement slope is $2.19 \pm 1.96\sqrt{0.69} = (0.56, 3.82)$.

Comparing the residual variance with Model 1: (39.15 - 36.70)/39.15 = 6.3%.

Using student-level SES reduces the within-school variance by 6.3%.

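Model 4: Including Both Level-1 and Level-2 Predictors -- predicting mathach from meanses, sector, cses and the cross-level interaction of meanses and sector with cses

$$\text{MATHACH}_{ij} = \beta_{0j} + \beta_{1j}(\text{SES} - \text{MEANSES}) + r_{ij}$$

$$\beta_{0j} = \gamma_{00} + \gamma_{01}(\text{MEANSES}) + \gamma_{02}(\text{SECTOR}) + u_{0j}$$

$$\beta_{1j} = \gamma_{10} + \gamma_{11}(\text{MEANSES}) + \gamma_{12}(\text{SECTOR}) + u_{1j}$$

Combining the two equations into one by substituting the level-2 equations into the level-1 equation, we have

$$\text{MATHACH}_{ij} = \gamma_{00} + \gamma_{01}(\text{MEANSES}) + \gamma_{02}(\text{SECTOR}) + \gamma_{10}(\text{SES} - \text{MEANSES}) + \gamma_{11}(\text{MEANSES})(\text{SES} - \text{MEANSES}) + \gamma_{12}(\text{SECTOR})(\text{SES} - \text{MEANSES}) + u_{0j} + u_{1j}(\text{SES} - \text{MEANSES}) + r_{ij}$$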

The questions that we are interested in are:

1. Do MEANSES and SECTOR significantly predict the intercept?

2. Do MEANSES and SECTOR significantly predict the within-school slopes?

3. How much variation in the intercepts and the slopes is explained by MEANSES and SECTOR?

proc mixed data = hsbc noclprint covtest noitprint;

class school;

model mathach = meanses sector cses meanses*cses sector*cses / solution ddfm = bw notest;

random intercept cses / subject = school type = un;

run;

Solution for Fixed Effects

Effect         Estimate   Standard Error   DF     t Value   Pr > |t|
Intercept      12.1136    0.1988           157    60.93     <.0001
MEANSES        5.3391     0.3693           157    14.46     <.0001
SECTOR         1.2167     0.3064           157    3.97      0.0001
cses           2.9388     0.1551           7022   18.95     <.0001
MEANSES*cses   1.0389     0.2989           7022   3.48      0.0005
SECTOR*cses    -1.6426    0.2398           7022   -6.85     <.0001

MEANSES: after adjusting for the other variables, positively related to Y

SECTOR: Catholic schools have higher Y

MEANSES*cses: the higher the MEANSES, the steeper the cses slope

SECTOR*cses: the cses slope is smaller in Catholic schools

(In Catholic schools, SES has a smaller effect on math achievement; that is, educational outcomes are less tied to SES (better education?). Needs further analysis!)
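As a supplementary worked step (not on the original slides), the fixed-effect estimates above imply the following within-school slope of cses, evaluated here for a school with MEANSES = 0:

```latex
\[
\widehat{\text{slope}}_{\text{cses}}
  = \hat\gamma_{10} + \hat\gamma_{11}(\text{MEANSES}) + \hat\gamma_{12}(\text{SECTOR})
  = 2.9388 + 1.0389\,(\text{MEANSES}) - 1.6426\,(\text{SECTOR})
\]
\[
\text{Public } (\text{SECTOR}=0,\ \text{MEANSES}=0):\quad 2.9388
\qquad
\text{Catholic } (\text{SECTOR}=1,\ \text{MEANSES}=0):\quad 2.9388 - 1.6426 = 1.2962
\]
```

At average MEANSES, the estimated SES slope in Catholic schools is therefore less than half the slope in public schools.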

data toplot;
set hsbc;
length strata $ 4; /* long enough to hold "High" */
/* assign each school the Q1, median, or Q3 value of meanses */
if meanses <= -0.317 then do; ms = -0.317; strata = "Low"; end;
else if meanses >= 0.333 then do; ms = 0.333; strata = "High"; end;
else do; ms = 0.038; strata = "Med"; end;
/* predicted value from the Model 4 fixed effects */
predicted = 12.1136 + 5.3391*ms + 1.2167*sector + 2.9388*cses +
1.0389*ms*cses - 1.6426*sector*cses;
run;

proc sort data = toplot; by strata; run;

goptions reset = all;
symbol1 v = none i = join c = red ;
symbol2 v = none i = join c = blue ;
axis1 order = (-4 to 3 by 1) minor = none label=("Group Centered SES");
axis2 order = (0 to 22 by 2) minor = none label=(a = 90 "Math Achievement Score");

/* one plot per stratum, one line per sector */
proc gplot data = toplot;
by strata;
plot predicted*cses = sector / vaxis = axis2 haxis = axis1;
run; quit;

/* source of the cut points used above */
proc univariate data = hsbc;
var meanses;
run;

/*
Quantiles of meanses from proc univariate:
90%         0.523
75% Q3      0.333
50% Median  0.038
25% Q1     -0.317
10%        -0.579
5%         -0.696
1%         -1.043
0% Min     -1.188
*/

Thank you

Ho Kim

Graduate School of Public Health, Seoul National University

[email protected]

http://plaza.snu.ac.kr/~hokim -> Open Lecture Room