Leuven Workshop August 2008

60
The Measurement and Analysis of Complex Traits Everything you didn’t want to know about measuring behavioral and psychological constructs Leuven Workshop August 2008

description

The Measurement and Analysis of Complex Traits Everything you didn’t want to know about measuring behavioral and psychological constructs. Leuven Workshop August 2008. Overview. SEM factor model basics Group differences: - practical Relative merits of factor scores & sum scores - PowerPoint PPT Presentation

Transcript of Leuven Workshop August 2008

Page 1: Leuven Workshop  August 2008

The Measurement and Analysis of Complex Traits

Everything you didn’t want to know about measuring behavioral and psychological constructs

Leuven Workshop August 2008

Page 2: Leuven Workshop  August 2008

Overview

• SEM factor model basics

• Group differences: - practical

• Relative merits of factor scores & sum scores

• Test for normal distribution of factor

• Alternatives to the factor model

• Extensions for multivariate linkage & association

Page 3: Leuven Workshop  August 2008

Structural Equation Model basics• Two kinds of relationships

– Linear regression X -> Y single-headed– Unspecified Covariance X<->Y double-headed

• Four kinds of variable– Squares – observed variables– Circles – latent, not observed variables

– Triangles – constant (zero variance) for specifying means– Diamonds -- observed variables used as moderators (on paths)

Page 4: Leuven Workshop  August 2008

Single Factor Model

1.00

F

S1 S2 S3 Sm

l1 l2 l3lm

e1 e2 e3 e4

Page 5: Leuven Workshop  August 2008

Factor Model with Means

1.00

F1

S1 S2 S3 Sm

l1 l2 l3lm

e1 e2 e3 e4

F2

MF

B8

mF1 mF2

mS1mS2 mS3

mSm

1.00

Page 6: Leuven Workshop  August 2008

Factor model essentials

• Diagram translates directly to algebraic formulae

• Factor typically assumed to be normally distributed: SEM

• Error variance is typically assumed to be normal as well

• May be applied to binary or ordinal data– Threshold model

Page 7: Leuven Workshop  August 2008

What is the best way to measure factors?

• Use a sum score

• Use a factor score

• Use neither - model-fit

Page 8: Leuven Workshop  August 2008

Factor Score Estimation

• Formulae for continuous case– Thompson 1951 (Regression method)– C = LL’ + V– f = (I+J)-1L’V-1x– Where J = L’V-1 L

Page 9: Leuven Workshop  August 2008

Factor Score Estimation

• Formulae for continuous case

– Bartlett 1938

– C = LL’ + V

– fb = J-1L’V-1x

– where J = L’V-1 L

• Neither is suitable for ordinal data

Page 10: Leuven Workshop  August 2008

Estimate factor score by ML

M1 M2 M3 M4 M5 M6

F1

l6l1

Want ML estimate of this

Page 11: Leuven Workshop  August 2008

ML Factor Score Estimation• Marginal approach

• L(f&x) = L(f)L(x|f) (1)• L(f) = pdf(f) • L(x|f) = pdf(x*) • x* ~ N(V,Lf)

• Maximize (1) with respect to f• Repeat for all subjects in sample

– Works for ordinal data too!

Page 12: Leuven Workshop  August 2008

Multifactorial Threshold ModelNormal distribution of liability x. ‘Yes’ when liability x > t

0 1 2 3 4-1-2-3-40

0.1

0.2

0.3

0.4

0.5

x

t

Page 13: Leuven Workshop  August 2008

Item Response Theory - Factor model equivalence

• Normal Ogive IRT Model

• Normal Theory Threshold Factor Model

• Takane & DeLeeuw (1987 Psychometrika)– Same fit– Can transform parameters from one to

the other

Page 14: Leuven Workshop  August 2008

Item Response ProbabilityExample item response probability shown in white

0 1 2 3 4-1-2-3-4

0.1

0.2

0.3

0.4

0.5

.25

.5

.75

1

ResponseProbability

1

.5

.0

Page 15: Leuven Workshop  August 2008

Do groups differ on a measure?

• Observed– Function of observed categorical variable (sex)– Function of observed continuous variable (age)

• Latent– Function of unobserved variable– Usually categorical – Estimate of class membership probability

• Has statistical issues with LRT

Page 16: Leuven Workshop  August 2008

Practical: Find the Difference(s)

Item 1 Item 2 Item 3

1.00

Factor

l1 l2 l3

1Mean

r1 r2 r3

1

mean1 mean2 mean3

Item 1 Item 2 Item 3

Variance

Factor

l1 l2 l3

1Mean

r1 r2 r3

1

mean1 mean2 mean3

Page 17: Leuven Workshop  August 2008

Sequence of MNI testing1. Model fx of covariates on factor mean & variance

2. Model fx of covariates on factor loadings & thresholds

2. Identify which loadings & thresholds are non-invariant

1 beats

2?

Yes Measurement invariance: Sum* or ML scores

No

3. Revise scale

MNI: Compute ML factor scores using covariates

* If factor loadings equal

Page 18: Leuven Workshop  August 2008

Continuous Age as a Moderator in the Factor Model

Stims Tranq MJ

Factor

l1l2

l3

1

0.00

r1 r2 r3

1

mean1

mean2

mean3

Z4

j

Age

1.00

L5

1.00

O5 b

Age

d

V

1.00

W5

k

b - Factor Varianced - Factor Mean

j - Factor Loadings

k - Item meansv - Item variances

[dk and bj confound]

Page 19: Leuven Workshop  August 2008

What is the best way to measure and model variation in my trait?

• Behavioral / Psychological characteristics usually Likert– Might use ipsative?

• What if Measurement Invariance does not hold?– How do we judge:

• Development• GxE interaction• Sex limitation

• Start simple: Finding group differences in mean

Page 20: Leuven Workshop  August 2008

Simulation Study (MK)

• Generate True factor score f ~ N(0,1)• Generate Item Errors ej ~ N(0,1)

• Obtain vector of j item scores sj = L*fj + ej

• Repeat N times to obtain sample• Compute sum score• Estimate factor score by ML

Page 21: Leuven Workshop  August 2008

Two measures of performance

• Reliability

QuickTime™ and aTIFF (LZW) decompressor

are needed to see this picture.

Page 22: Leuven Workshop  August 2008

Two measures of performance

QuickTime™ and aTIFF (LZW) decompressor

are needed to see this picture.

• Validity

Page 23: Leuven Workshop  August 2008

Simulation parameters

• 10 binary item scale

• Thresholds – [-1.8 -1.35 -0.9 -0.45 0.0 0.45 0.9 1.35 1.8]

• Factor Loadings– [.30 .80 .43 .74 .55 .68 .36 .61 .49]

Page 24: Leuven Workshop  August 2008

Mess up measurement parameters

• Randomly reorder thresholds

• Randomly reorder factor loadings

• Blend reordered estimates with originals 0% - 100% ‘doses’

Page 25: Leuven Workshop  August 2008
Page 26: Leuven Workshop  August 2008
Page 27: Leuven Workshop  August 2008
Page 28: Leuven Workshop  August 2008
Page 29: Leuven Workshop  August 2008

Measurement non-invariance

• Which works better: ML or Sum score?

• Three tests:– SEM - Likelihood ratio test difference in latent factor

mean– ML Factor score t-test– Sum score t-test

Page 30: Leuven Workshop  August 2008

MNI figures

Page 31: Leuven Workshop  August 2008

More Factors: Common Pathway Model

Page 32: Leuven Workshop  August 2008

More Factors: Independent Pathway Model

= 1 = 1 = 1

Independent pathway model is submodel of 3 factor common pathway model

Page 33: Leuven Workshop  August 2008

Example: Fat MZT MZA

Page 34: Leuven Workshop  August 2008

Results of fitting twin

model

Page 35: Leuven Workshop  August 2008

ML A, C, E or P Factor Scores• Compute joint likelihood of data and factor scores

– p(FS,Items) = p(Items|FS)*p(FS)– works for non-normal FS distribution

• Step 1: Estimate parameters of (CP/IP) (Moderated) Factor Model

• Step 2: Maximize likelihood of factor scores for each (family’s) vector of observed scores– Plug in estimates from Step 1

Page 36: Leuven Workshop  August 2008

Business end of FS script

Page 37: Leuven Workshop  August 2008

The guts of it

! Residuals only

Page 38: Leuven Workshop  August 2008

Shell script to FS everyone

Page 39: Leuven Workshop  August 2008

Central Limit TheoremAdditive effects of many small factors

0

1

2

3

1 Gene 3 Genotypes 3 Phenotypes

0

1

2

3

2 Genes 9 Genotypes 5 Phenotypes

01234567

3 Genes 27 Genotypes 7 Phenotypes

0

5

10

15

20

4 Genes 81 Genotypes 9 Phenotypes

Page 40: Leuven Workshop  August 2008

Measurement artifacts

• Few binary items• Most items rarely endorsed (floor effect)• Most items usually endorsed (ceiling effect)

• Items more sensitive at some parts of distribution

• Non-linear models of item-trait relationship

Page 41: Leuven Workshop  August 2008

Assessing the distribution of latent trait• Schmitt et al 2006 MBR method

• N-variate binary item data have 2N possible patterns

• Normal theory factor model predicts pattern frequencies– E.g., high factor loadings but different thresholds– 0 0 0 0– 0 0 0 1 but 0 0 1 0 would be uncommon– 0 0 1 1– 0 1 1 1– 1 1 1 1 1 2 3 4

item threshold

Page 42: Leuven Workshop  August 2008

Latent Trait (Factor) Model

M1 M2 M3 M4 M5 M6

F1

l6l1Discrimination

Difficulty

Use Gaussian quadrature weights to integrate over factor; then relax constraints on weights

Page 43: Leuven Workshop  August 2008

Latent Trait (Factor) Model

M1 M2 M3 M4 M5 M6

F1

l6l1Discrimination

Difficulty

Difference in model fit: LRT~ 2

Page 44: Leuven Workshop  August 2008

Chi-squared test for non-normality performs well

Page 45: Leuven Workshop  August 2008

Detecting latent heterogeneityScatterplot of 2 classes

S1

S2

Mean S1|c1

Mean S1|c2

Mean S2|c1 Mean S2|c2

Page 46: Leuven Workshop  August 2008

Scatterplot of 2 classesCloser means

S1

S2

Mean S1|c1

Mean S1|c2

Mean S2|c1 Mean S2|c2

Page 47: Leuven Workshop  August 2008

Scatterplot of 2 classesLatent heterogeneity: Factors or classes?

S1

S2

Page 48: Leuven Workshop  August 2008

Latent Profile Model

Class 1: p

Class 2: (1-p)

ClassMembership probability

S 1 | c 2 S 2 | c 2 S 3 | c 2 S p | c 2

1

e 1 | c 2

m 1 | c 2

e 2 | c 2

m 2 | c 2

e 3 | c 2

m 3 | c 2

e 4 | c 2

m p | c 2

S 1 | c 1 S 2 | c 1 S 3 | c 1 S p | c 1

1

e 1 | c 1

m 1 | c 1

e 2 | c 1

m 2 | c 1

e 3 | c 1

m 3 | c 1

e 4 | c 1

m p | c 1

Page 49: Leuven Workshop  August 2008

Factor Mixture Model

1.00

F

S1 S2 S3 Sm

l1 l2 l3lm

e1 e2 e3 e4

1.00

F

S1 S2 S3 Sm

l1 l2 l3lm

e1 e2 e3 e4

Class 1: p

Class 2: (1- p)

ClassMembership probability

NB means omitted

Page 50: Leuven Workshop  August 2008

Classes or Traits? A Simulation Study

• Generate data under:– Latent class models– Latent trait models– Factor mixture models

• Fit above 3 models to find best-fitting model– Vary number of factors– Vary number of classes

• See Lubke & Neale Multiv Behav Res (2007 & In press)

Page 51: Leuven Workshop  August 2008

What to do about conditional data

• Two things– Different base rates of “Stem” item– Different correlation between Stem and “Probe”

items

• Use data collected from relatives

Page 52: Leuven Workshop  August 2008

Data from Relatives: Likely failure of conditional independence

STEM P2 P3 P4 P5 P6

F1

l6l1

STEM P2 P3 P4 P5 P6

F2

l6l1

R > 0

Page 53: Leuven Workshop  August 2008

Series of bivariate integrals

m/2 t1i t2i

Π ( (x1, x2) dx1 dx2 )j

j=1 t1 t2

i-1 i-1

-3 -2 -1 0 1 2 3 -3-2-10

1 2 3

0.30.40.5

Can work with p-variate integration, best if p<m “Generalized MML” built into Mx

Page 54: Leuven Workshop  August 2008

Dependence 1Did your use of it cause you physical problems or make you depressed or very nervous?

Consequence: physical & psychological

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

-4 -3 -2 -1 0 1 2 3 4

cannabis

cocaine

stimulants

sedatives

opioids

hallucinogens

Page 55: Leuven Workshop  August 2008

Extensions to More Complex Applications

• Endophenotypes

• Linkage Analysis

• Association Analysis

Page 56: Leuven Workshop  August 2008

Basic Linkage (QTL) Model

Q: QTL Additive Genetic F: Family Environment E: Random Environment3 estimated parameters: q, f and e Every sibship may have different model

= p(IBD=2) + .5 p(IBD=1)

F1

P1

F2

P2

11

1

1

Q1

1

Q2

1

E2

1

E1

Pihat

e f q q f e

Page 57: Leuven Workshop  August 2008

Measurement Linkage (QTL) Model

Q: QTL Additive Genetic F: Family Environment E: Random Environment3 estimated parameters: q, f and e Every sibship may have different model

= p(IBD=2) + .5 p(IBD=1)

P1

F2

P2

1

1

1

Q2

1

E2

Pihat

F1

1 1

Q1

1

E1

e f q q f e

M1 M2 M3 M4 M5 M6

l6l1

M1M2M3M4M5M6

l6 l1

F1

1 1

Q1

1

E1

e f q

Page 58: Leuven Workshop  August 2008

Fulker Association Model

Multilevel model for the means

LDL1 LDL2

M

G1 G2

S D

B W

w

-0.50

-1.00

RR

C

0.75

1.00

1.00

1.00

0.50

0.50

0.50

b

Geno1 Geno2

m m

Page 59: Leuven Workshop  August 2008

Measurement Fulker Association Model (SM)M

G1 G2

S D

B W

w

-0.50

-1.001.00

1.00

1.00

0.50

0.50

0.50

b

Geno1 Geno2

m m

F2

M1M2M3M4M5M6

l6 l1

M1 M2 M3 M4 M5 M6

l6l1

F1

M

G1

G2

S D

B W

w

-0.50

-1.00

1.00

1.00

1.00

0.50

0.50

0.50

b

Geno1

Geno2

m m

Page 60: Leuven Workshop  August 2008

Multivariate Linkage & Association Analyses

• Computationally burdensome• Distribution of test statistics questionable• Permutation testing possible

– Even heavier burden• Potential to refine both assessment and genetic models• Lots of long & wide datasets on the way

– Dense repeated measures EMA– fMRI– Need to improve software! Open source Mx