SEM 1: Con rmatory Factor Analysis - Sacha Epskampsachaepskamp.com/files/SEM12017/SEM1Week1.pdfSEM...

63
Introduction Course Outline Notation Stats recap Latent Causes Estimation Conclusion SEM 1: Confirmatory Factor Analysis Week 1 - Common Cause Modeling Sacha Epskamp 03-04-2017

Transcript of SEM 1: Con rmatory Factor Analysis - Sacha Epskampsachaepskamp.com/files/SEM12017/SEM1Week1.pdfSEM...

Page 1: SEM 1: Con rmatory Factor Analysis - Sacha Epskampsachaepskamp.com/files/SEM12017/SEM1Week1.pdfSEM 1: Con rmatory Factor Analysis Week 1 - Common Cause Modeling Sacha Epskamp 03-04-2017.

Introduction Course Outline Notation Stats recap Latent Causes Estimation Conclusion

SEM 1: Confirmatory Factor AnalysisWeek 1 - Common Cause Modeling

Sacha Epskamp

03-04-2017

Page 2: SEM 1: Con rmatory Factor Analysis - Sacha Epskampsachaepskamp.com/files/SEM12017/SEM1Week1.pdfSEM 1: Con rmatory Factor Analysis Week 1 - Common Cause Modeling Sacha Epskamp 03-04-2017.

Introduction Course Outline Notation Stats recap Latent Causes Estimation Conclusion

Factor Analysis

• Many research questions in psychology relate to psychologicalconstructs

• Intelligence, ability, personality traits, disorders, types, etcetera

• To use these in research, they must first be measured• Questionnaires, tests

• Factor analysis allows us to evaluate how well a test measurespresumed underlying traits

• Exploratory Factor Analysis (EFA): no hypothesis onunderlying causal structure

• Confirmatory Factor Analysis (CFA): Able to test hypotheses

• Allows also to test for measurement invariance• Does the test measure the same thing in multiple groups?

Page 3: SEM 1: Con rmatory Factor Analysis - Sacha Epskampsachaepskamp.com/files/SEM12017/SEM1Week1.pdfSEM 1: Con rmatory Factor Analysis Week 1 - Common Cause Modeling Sacha Epskamp 03-04-2017.

Introduction Course Outline Notation Stats recap Latent Causes Estimation Conclusion

Structural Equation Modeling

• Factor analysis is part of the more general framework ofStructural Equation Modeling (SEM)

• SEM allows for testing of causal relationships between bothobserved and latent variables

• We will discuss SEM in SEM 2!

Page 4: SEM 1: Con rmatory Factor Analysis - Sacha Epskampsachaepskamp.com/files/SEM12017/SEM1Week1.pdfSEM 1: Con rmatory Factor Analysis Week 1 - Common Cause Modeling Sacha Epskamp 03-04-2017.

Introduction Course Outline Notation Stats recap Latent Causes Estimation Conclusion

Today

• Course outline

• Short stats recap

• CFA modeling• Very technical, but covers the most important concepts of CFA• Next weeks will be more practical and conceptual!

Page 5: SEM 1: Con rmatory Factor Analysis - Sacha Epskampsachaepskamp.com/files/SEM12017/SEM1Week1.pdfSEM 1: Con rmatory Factor Analysis Week 1 - Common Cause Modeling Sacha Epskamp 03-04-2017.

Introduction Course Outline Notation Stats recap Latent Causes Estimation Conclusion

Introduction Round

• Who are you?

• What is your specialization?

• Why did you choose SEM 1?

• What do you expect from SEM 1?

• Will you do SEM 2?

Page 6: SEM 1: Con rmatory Factor Analysis - Sacha Epskampsachaepskamp.com/files/SEM12017/SEM1Week1.pdfSEM 1: Con rmatory Factor Analysis Week 1 - Common Cause Modeling Sacha Epskamp 03-04-2017.

Introduction Course Outline Notation Stats recap Latent Causes Estimation Conclusion

The SEM 1 team

• Sacha Epskamp ([email protected])

• Gaby Lunansky ([email protected])

• Eiko Fried ([email protected])

Page 7: SEM 1: Con rmatory Factor Analysis - Sacha Epskampsachaepskamp.com/files/SEM12017/SEM1Week1.pdfSEM 1: Con rmatory Factor Analysis Week 1 - Common Cause Modeling Sacha Epskamp 03-04-2017.

Introduction Course Outline Notation Stats recap Latent Causes Estimation Conclusion

Schedule

Week 1 – Introduction to common cause modeling

• Monday April 3: Lecture

• Wednesday April 5 - Practical

Week 2 – Fitting and modifying CFA models

• Monday April 10 - Lecture

• Wednesday April 12 Practical

Week 3 – Measurement invariance, assumptions and power

• Wednesday April 19 - Lecture

Week 4 – Advanced CFA models

• Monday April 24 - Lecture

• Wednesday April 26 Practical

Week 5 – Presentations

• Monday May 1

Page 8: SEM 1: Con rmatory Factor Analysis - Sacha Epskampsachaepskamp.com/files/SEM12017/SEM1Week1.pdfSEM 1: Con rmatory Factor Analysis Week 1 - Common Cause Modeling Sacha Epskamp 03-04-2017.

Introduction Course Outline Notation Stats recap Latent Causes Estimation Conclusion

Course Grading

Syllabus on Blackboard!

• Individual assignments: 40%• Every week one assignment worth 10%

• Group project: 30%• Pre-data report, post-data report and presentation (final week)

• Individual project: 30%• Report due in final week

NO EXAM!

Page 9: SEM 1: Con rmatory Factor Analysis - Sacha Epskampsachaepskamp.com/files/SEM12017/SEM1Week1.pdfSEM 1: Con rmatory Factor Analysis Week 1 - Common Cause Modeling Sacha Epskamp 03-04-2017.

Introduction Course Outline Notation Stats recap Latent Causes Estimation Conclusion

Individual AssignmentsEach week, the assignment will be made available 11:00 onWednesday, and will be due 11:00 the next Wednesday. Eachassignment will contribute to 10% of your grade.

• Work on the assignments alone.

• Hand in a PDF file and an .R file (in case R was used). If youuse Jasp, hand in the Jasp object as well as a screenshot ofthe options used.

• Make sure your PDF report is as standalone readable aspossible. E.g., if you are asked to report a factor loadingmatrix, then report it in the PDF and not just say “look at .Rfile”.

• Assignments are due before 11:00. If you do not hand in anassignment before 11:00, you will get a 1.

• If you encounter any problems, or have any feedback, pleaselet me know before the deadline, as then I can take it intoaccount or help you.

Page 10: SEM 1: Con rmatory Factor Analysis - Sacha Epskampsachaepskamp.com/files/SEM12017/SEM1Week1.pdfSEM 1: Con rmatory Factor Analysis Week 1 - Common Cause Modeling Sacha Epskamp 03-04-2017.

Introduction Course Outline Notation Stats recap Latent Causes Estimation Conclusion

Group Project

In this project, you will be asked to go through the entire empiricalcycle of performing a confirmatory factor analysis. The groupproject will contribute to 30% of your final grade. Form groups of4-5 people (ideally at most 6 groups).

• Step 1: Research setup

• Step 2: Research hypothesis• Pre-data report (5%)

• Step 3: Data collection

• Step 4: Data analysis

• Step 5: Present your findings• Presentation (5%)• Post-data report (20%)

You have to justify each member’s contributions in the final report.See syllabus for details!

Page 11: SEM 1: Con rmatory Factor Analysis - Sacha Epskampsachaepskamp.com/files/SEM12017/SEM1Week1.pdfSEM 1: Con rmatory Factor Analysis Week 1 - Common Cause Modeling Sacha Epskamp 03-04-2017.

Introduction Course Outline Notation Stats recap Latent Causes Estimation Conclusion

Individual Project

• Find a paper that estimated a CFA model and reported thedata

• Analysis must not have originally been performed in Lavaan,Jasp or Onyx

• Ideally a slightly complicated analysis (many factors, higherorder model, bifactor model, latent growth model ormeasurement invariance tests)

• Replicate the analysis in Lavaan, Onyx or Jasp

• Write a report on your findings (30% of your grade), criticallydiscussing the original paper. Did you replicate their findings?Did the original authors fail to mention important details? Tryto include as many steps as possible as recommended byBrown

See syllabus for details!

Page 12: SEM 1: Con rmatory Factor Analysis - Sacha Epskampsachaepskamp.com/files/SEM12017/SEM1Week1.pdfSEM 1: Con rmatory Factor Analysis Week 1 - Common Cause Modeling Sacha Epskamp 03-04-2017.

Introduction Course Outline Notation Stats recap Latent Causes Estimation Conclusion

Practicals

The practicals will have three goals:

• Work on group project

• Work on individual project

• Discuss assignments

Your attendance is not required for lectures or practicals (yourgroup might think differently about this though). I recommend youto start forming groups as soon as possible, and use the firstpractical to setup your questionnaire and work on the pre-datareport. Start collecting data as soon as possible.

Page 13: SEM 1: Con rmatory Factor Analysis - Sacha Epskampsachaepskamp.com/files/SEM12017/SEM1Week1.pdfSEM 1: Con rmatory Factor Analysis Week 1 - Common Cause Modeling Sacha Epskamp 03-04-2017.

Introduction Course Outline Notation Stats recap Latent Causes Estimation Conclusion

Book

• Course roughly follows thebook

• Book is not required for thecourse

• Reference material

• Some differences in notationand computation as to book

• We use different software asin the book

Page 14: SEM 1: Con rmatory Factor Analysis - Sacha Epskampsachaepskamp.com/files/SEM12017/SEM1Week1.pdfSEM 1: Con rmatory Factor Analysis Week 1 - Common Cause Modeling Sacha Epskamp 03-04-2017.

Introduction Course Outline Notation Stats recap Latent Causes Estimation Conclusion

Presumed Prior Knowledge

• Undergraduate level statistics• Linear regression• Correlations / covariances• Normal distribution

• Matrix algebra• Matrix multiplication!

• R• Although you are allowed to work in Jasp (essentially GUI

around the R package we will use) or Onyx (a graphical SEMprogram), knowing R is highly reccomended

If you lack prior knowledge, you are expected to try and catch upon these topics! I can send you some materials if needed

Page 15: SEM 1: Con rmatory Factor Analysis - Sacha Epskampsachaepskamp.com/files/SEM12017/SEM1Week1.pdfSEM 1: Con rmatory Factor Analysis Week 1 - Common Cause Modeling Sacha Epskamp 03-04-2017.

Introduction Course Outline Notation Stats recap Latent Causes Estimation Conclusion

Scalars, Vectors and MatricesNormal faced letters indicate scalars (just a single number),boldfaced letters indicate column vectors, and UPPERCASEBOLDFACED letters indicate matrices. If a scalar is an elementof a matrix, the first subscript will denote the row and the secondsubscript the column. If a vector is part of a matrix, the subscriptwill denote either which row or column (the vector is alwaystransformed to a column-vector).

YYY =

1 2 34 5 67 8 9

10 11 12

yyy2 =

456

y23 = 6

Page 16: SEM 1: Con rmatory Factor Analysis - Sacha Epskampsachaepskamp.com/files/SEM12017/SEM1Week1.pdfSEM 1: Con rmatory Factor Analysis Week 1 - Common Cause Modeling Sacha Epskamp 03-04-2017.

Introduction Course Outline Notation Stats recap Latent Causes Estimation Conclusion

Roman and Greek letters

A Roman letter indicates something that is observed (e.g.,observed scores yyy i of subject i) or specified (e.g., sample size n).A Greek letter indicates something that is not observed, e.g., latentvariable scores ηηηi or factor loadings matrix ΛΛΛ.

Page 17: SEM 1: Con rmatory Factor Analysis - Sacha Epskampsachaepskamp.com/files/SEM12017/SEM1Week1.pdfSEM 1: Con rmatory Factor Analysis Week 1 - Common Cause Modeling Sacha Epskamp 03-04-2017.

Introduction Course Outline Notation Stats recap Latent Causes Estimation Conclusion

Page 18: SEM 1: Con rmatory Factor Analysis - Sacha Epskampsachaepskamp.com/files/SEM12017/SEM1Week1.pdfSEM 1: Con rmatory Factor Analysis Week 1 - Common Cause Modeling Sacha Epskamp 03-04-2017.

Introduction Course Outline Notation Stats recap Latent Causes Estimation Conclusion

Random variables and parameters

Subscript i will denote the case, the row in your data set. Oftenthis will correspond to a particular subject, and thus we will usethe term subject often to denote cases.

• A variable with subscript i can differ per case. These arecalled random variables.

• e.g., observed responses yyy i , latent variables ηηηi or residual εεεi

• I will omit subscript i from random variables when I describetheir distribution.

• e.g., yyy ∼ N(...), indicating that the distribution of all subjectsresponse patterns follow a normal distribution

• Letters without subscript i to begin with do not differ percase. These are called parameters. They have no distribution(no variance across people)

• e.g., factor loadings ΛΛΛ, variance–covariance matrices ΣΣΣ, ΨΨΨ andΘΘΘ

Page 19: SEM 1: Con rmatory Factor Analysis - Sacha Epskampsachaepskamp.com/files/SEM12017/SEM1Week1.pdfSEM 1: Con rmatory Factor Analysis Week 1 - Common Cause Modeling Sacha Epskamp 03-04-2017.

Introduction Course Outline Notation Stats recap Latent Causes Estimation Conclusion

Transpose

The transpose operator, >, switches rows and columns:

YYY =

1 2 34 5 67 8 9

10 11 12

YYY> =

1 4 7 102 5 8 113 6 9 12

yyy>2 =

[4 5 6

]

Page 20: SEM 1: Con rmatory Factor Analysis - Sacha Epskampsachaepskamp.com/files/SEM12017/SEM1Week1.pdfSEM 1: Con rmatory Factor Analysis Week 1 - Common Cause Modeling Sacha Epskamp 03-04-2017.

Introduction Course Outline Notation Stats recap Latent Causes Estimation Conclusion

Matrix Multiplication

If AAABBB = CCC , with AAA being a x × y matrix and BBB being a y × zmatrix, then CCC is a x × z matrix with following elements:

cij =

y∑k=1

aikbkj

For example:4 5 82 1 73 6 9

5 1 94 3 62 8 7

=

56 83 12228 61 7357 93 126

In R, use %*% to compute this!

Page 21: SEM 1: Con rmatory Factor Analysis - Sacha Epskampsachaepskamp.com/files/SEM12017/SEM1Week1.pdfSEM 1: Con rmatory Factor Analysis Week 1 - Common Cause Modeling Sacha Epskamp 03-04-2017.

Introduction Course Outline Notation Stats recap Latent Causes Estimation Conclusion

Matrix Multiplication

Source: Wikipedia

Page 22: SEM 1: Con rmatory Factor Analysis - Sacha Epskampsachaepskamp.com/files/SEM12017/SEM1Week1.pdfSEM 1: Con rmatory Factor Analysis Week 1 - Common Cause Modeling Sacha Epskamp 03-04-2017.

Introduction Course Outline Notation Stats recap Latent Causes Estimation Conclusion

Multivariate Normal

We assume data are multivariate normally distributed:

yyy i ∼ N(µµµ,ΣΣΣ)

• yyy i is a vector of item responses (e.g., answers to “are youfatigued?”, “are you concentrating well?”) of person i

• µµµ is a vector of means

• ΣΣΣ is a variance–covariance matrix, standardizable to acorrelation matrix.

Page 23: SEM 1: Con rmatory Factor Analysis - Sacha Epskampsachaepskamp.com/files/SEM12017/SEM1Week1.pdfSEM 1: Con rmatory Factor Analysis Week 1 - Common Cause Modeling Sacha Epskamp 03-04-2017.

Introduction Course Outline Notation Stats recap Latent Causes Estimation Conclusion

source: Wikipedia

Page 24: SEM 1: Con rmatory Factor Analysis - Sacha Epskampsachaepskamp.com/files/SEM12017/SEM1Week1.pdfSEM 1: Con rmatory Factor Analysis Week 1 - Common Cause Modeling Sacha Epskamp 03-04-2017.

Introduction Course Outline Notation Stats recap Latent Causes Estimation Conclusion

Example data of n = 10 subjects:

YYY =

yyy>1yyy>2yyy>3yyy>4yyy>5yyy>6yyy>7yyy>8yyy>9yyy>10

=

4.09 4.94 4.582.48 4.27 5.444.93 4.24 6.242.17 5.36 5.793.39 4.62 4.811.41 3.74 5.212.87 2.60 5.172.62 2.37 4.553.49 3.74 6.754.10 5.69 3.00

Page 25: SEM 1: Con rmatory Factor Analysis - Sacha Epskampsachaepskamp.com/files/SEM12017/SEM1Week1.pdfSEM 1: Con rmatory Factor Analysis Week 1 - Common Cause Modeling Sacha Epskamp 03-04-2017.

Introduction Course Outline Notation Stats recap Latent Causes Estimation Conclusion

MeansThe center of the normal distribution of variable yj is controlled byits mean µj :

Page 26: SEM 1: Con rmatory Factor Analysis - Sacha Epskampsachaepskamp.com/files/SEM12017/SEM1Week1.pdfSEM 1: Con rmatory Factor Analysis Week 1 - Common Cause Modeling Sacha Epskamp 03-04-2017.

Introduction Course Outline Notation Stats recap Latent Causes Estimation Conclusion

Means

Means per column:

yj =1

n

n∑p=1

yij ,

in which yij indicates row i , column j of data matrix YYY . Combiningthese values in a vector yyy we get:

yyy> =[3.16 4.16 5.16

]In R: colMeans(). This is an unbiased estimator of µµµ:

E (yyy) = µµµ

Page 27: SEM 1: Con rmatory Factor Analysis - Sacha Epskampsachaepskamp.com/files/SEM12017/SEM1Week1.pdfSEM 1: Con rmatory Factor Analysis Week 1 - Common Cause Modeling Sacha Epskamp 03-04-2017.

Introduction Course Outline Notation Stats recap Latent Causes Estimation Conclusion

Diagonal elements of ΣΣΣ correspond to variances:

Var(yj) = σ2j = σjj

It’s square root equals the standard deviation, which usually isinterpreted:

SD(yj) =√

Var(yj) = σj

This parameter controls the width of the normal distribution. E.g.,95% of the probability mass is contained in(µj − 1.96σj , µj + 1.96σj)

Page 28: SEM 1: Con rmatory Factor Analysis - Sacha Epskampsachaepskamp.com/files/SEM12017/SEM1Week1.pdfSEM 1: Con rmatory Factor Analysis Week 1 - Common Cause Modeling Sacha Epskamp 03-04-2017.

Introduction Course Outline Notation Stats recap Latent Causes Estimation Conclusion

Covariances encode association, a positive covariance encodes thatwhen one variable is above (below) its mean, the other variableprobably also is above (below) its mean. Negative covariancesencode that whenever one variable is above (below) its mean, theother variable is likely below (above) its mean:

Page 29: SEM 1: Con rmatory Factor Analysis - Sacha Epskampsachaepskamp.com/files/SEM12017/SEM1Week1.pdfSEM 1: Con rmatory Factor Analysis Week 1 - Common Cause Modeling Sacha Epskamp 03-04-2017.

Introduction Course Outline Notation Stats recap Latent Causes Estimation Conclusion

Covariances can be standardized to correlations, which are alwaysbetween −1 and 1.

Page 30: SEM 1: Con rmatory Factor Analysis - Sacha Epskampsachaepskamp.com/files/SEM12017/SEM1Week1.pdfSEM 1: Con rmatory Factor Analysis Week 1 - Common Cause Modeling Sacha Epskamp 03-04-2017.

Introduction Course Outline Notation Stats recap Latent Causes Estimation Conclusion

Variance–covariance matrix

Computing the variance–covariance matrix:

s∗jk =

∑ni=1(yij − yj)(yik − yk)

n − 1,

Which becomes:

SSS∗ =

1.11 0.35 −0.090.35 1.18 −0.29−0.09 −0.29 1.07

In R: cov(). This is an unbiased estimator of ΣΣΣ:

E (SSS∗) = Σ

Page 31: SEM 1: Con rmatory Factor Analysis - Sacha Epskampsachaepskamp.com/files/SEM12017/SEM1Week1.pdfSEM 1: Con rmatory Factor Analysis Week 1 - Common Cause Modeling Sacha Epskamp 03-04-2017.

Introduction Course Outline Notation Stats recap Latent Causes Estimation Conclusion

Variance–covariance matrix

Dividing by n − 1 leads to an unbiased estimator, but not themaximum likelihood estimate! This is obtained by dividing by n:

sjk =

∑ni=1(yij − yj)(yik − yk)

n,

The R function cov() returns the unbiased estimator (divided byn − 1), to obtain the ML solution:

SSS =n − 1

nSSS∗

Lavaan, the software we use, automatically applies this correction!So you use SSS∗ as input, but SSS in computations.

Page 32: SEM 1: Con rmatory Factor Analysis - Sacha Epskampsachaepskamp.com/files/SEM12017/SEM1Week1.pdfSEM 1: Con rmatory Factor Analysis Week 1 - Common Cause Modeling Sacha Epskamp 03-04-2017.

Introduction Course Outline Notation Stats recap Latent Causes Estimation Conclusion

Correlation matrix

Standardizing to correlations:

rjk =sjk√

sjj√skk

,

which becomes

RRR =

1.00 0.31 −0.080.31 1.00 −0.26−0.08 −0.26 1.00

.In R: cov2cor() (or simply use cor() on data)

Page 33: SEM 1: Con rmatory Factor Analysis - Sacha Epskampsachaepskamp.com/files/SEM12017/SEM1Week1.pdfSEM 1: Con rmatory Factor Analysis Week 1 - Common Cause Modeling Sacha Epskamp 03-04-2017.

Introduction Course Outline Notation Stats recap Latent Causes Estimation Conclusion

Psychometrics

The field of science concerned with measurement of psychologicalconstructs.For example:

• Personality traits

• Cognitive skills

• Ability

• Psychopathological disorders

• Types

Page 34: SEM 1: Con rmatory Factor Analysis - Sacha Epskampsachaepskamp.com/files/SEM12017/SEM1Week1.pdfSEM 1: Con rmatory Factor Analysis Week 1 - Common Cause Modeling Sacha Epskamp 03-04-2017.

Introduction Course Outline Notation Stats recap Latent Causes Estimation Conclusion

PsychometricsThe field of science concerned with measurement of psychologicalconstructs.

For example:

• Personality traits

• Cognitive skills

• Ability

• Psychopathological disorders

• Types

Page 35: SEM 1: Con rmatory Factor Analysis - Sacha Epskampsachaepskamp.com/files/SEM12017/SEM1Week1.pdfSEM 1: Con rmatory Factor Analysis Week 1 - Common Cause Modeling Sacha Epskamp 03-04-2017.

Introduction Course Outline Notation Stats recap Latent Causes Estimation Conclusion

PsychometricsThe field of science concerned with measurement of psychologicalconstructs.For example:

• Personality traits

• Cognitive skills

• Ability

• Psychopathological disorders

• Types

Page 36: SEM 1: Con rmatory Factor Analysis - Sacha Epskampsachaepskamp.com/files/SEM12017/SEM1Week1.pdfSEM 1: Con rmatory Factor Analysis Week 1 - Common Cause Modeling Sacha Epskamp 03-04-2017.

Introduction Course Outline Notation Stats recap Latent Causes Estimation Conclusion

How do you measure temperature?

Page 37: SEM 1: Con rmatory Factor Analysis - Sacha Epskampsachaepskamp.com/files/SEM12017/SEM1Week1.pdfSEM 1: Con rmatory Factor Analysis Week 1 - Common Cause Modeling Sacha Epskamp 03-04-2017.

Introduction Course Outline Notation Stats recap Latent Causes Estimation Conclusion

How do you measuretemperature?

• By looking at a thermometer

• For this to make sense, weneed to assume that:

• Temperature causes thelevel shown in athermometer

• The thermometer featuresrelatively littlemeasurement error

Page 38: SEM 1: Con rmatory Factor Analysis - Sacha Epskampsachaepskamp.com/files/SEM12017/SEM1Week1.pdfSEM 1: Con rmatory Factor Analysis Week 1 - Common Cause Modeling Sacha Epskamp 03-04-2017.

Introduction Course Outline Notation Stats recap Latent Causes Estimation Conclusion

We can summarize a causal hypothesis in a path diagram:

ε

• Circular nodes (or suns): latent (unobserved) variables

• Square nodes: observed variables

• Also termed indicators of a latent variable

• Unidirectional links: causal effects

• Bidirectional links: (co)variances

Page 39: SEM 1: Con rmatory Factor Analysis - Sacha Epskampsachaepskamp.com/files/SEM12017/SEM1Week1.pdfSEM 1: Con rmatory Factor Analysis Week 1 - Common Cause Modeling Sacha Epskamp 03-04-2017.

Introduction Course Outline Notation Stats recap Latent Causes Estimation Conclusion

We can summarize a causal hypothesis in a path diagram:

ε

• Circular nodes (or suns): latent (unobserved) variables

• Square nodes: observed variables

• Also termed indicators of a latent variable

• Unidirectional links: causal effects

• Bidirectional links: (co)variances

Page 40: SEM 1: Con rmatory Factor Analysis - Sacha Epskampsachaepskamp.com/files/SEM12017/SEM1Week1.pdfSEM 1: Con rmatory Factor Analysis Week 1 - Common Cause Modeling Sacha Epskamp 03-04-2017.

Introduction Course Outline Notation Stats recap Latent Causes Estimation Conclusion

We can summarize a causal hypothesis in a path diagram:

ε

• Circular nodes (or suns): latent (unobserved) variables

• Square nodes: observed variables• Also termed indicators of a latent variable

• Unidirectional links: causal effects

• Bidirectional links: (co)variances

Page 41: SEM 1: Con rmatory Factor Analysis - Sacha Epskampsachaepskamp.com/files/SEM12017/SEM1Week1.pdfSEM 1: Con rmatory Factor Analysis Week 1 - Common Cause Modeling Sacha Epskamp 03-04-2017.

Introduction Course Outline Notation Stats recap Latent Causes Estimation Conclusion

We can summarize a causal hypothesis in a path diagram:

ε

• Circular nodes (or suns): latent (unobserved) variables

• Square nodes: observed variables• Also termed indicators of a latent variable

• Unidirectional links: causal effects

• Bidirectional links: (co)variances

Page 42: SEM 1: Con rmatory Factor Analysis - Sacha Epskampsachaepskamp.com/files/SEM12017/SEM1Week1.pdfSEM 1: Con rmatory Factor Analysis Week 1 - Common Cause Modeling Sacha Epskamp 03-04-2017.

Introduction Course Outline Notation Stats recap Latent Causes Estimation Conclusion

We can summarize a causal hypothesis in a path diagram:

ε

• Circular nodes (or suns): latent (unobserved) variables

• Square nodes: observed variables• Also termed indicators of a latent variable

• Unidirectional links: causal effects

• Bidirectional links: (co)variances

Page 43: SEM 1: Con rmatory Factor Analysis - Sacha Epskampsachaepskamp.com/files/SEM12017/SEM1Week1.pdfSEM 1: Con rmatory Factor Analysis Week 1 - Common Cause Modeling Sacha Epskamp 03-04-2017.

Introduction Course Outline Notation Stats recap Latent Causes Estimation Conclusion

Assume all observed and latent variables are normally distributedand all causal effects are linear. Without loss of information, wecan center data and assume all means are 0 (we don’t use meansuntil week 3). Now, the path diagram encodes a causal equation:

λ11 1 θ11ψ11 η1 y1 ε1

yi1 = λ11ηi1 + εi1

η1 ∼ N(0,√ψ11)

ε1 ∼ N(0,√θ11)

λ11 is called a factor loading, εi1the residual variance and ψ11 thefactor variance.

Page 44: SEM 1: Con rmatory Factor Analysis - Sacha Epskampsachaepskamp.com/files/SEM12017/SEM1Week1.pdfSEM 1: Con rmatory Factor Analysis Week 1 - Common Cause Modeling Sacha Epskamp 03-04-2017.

Introduction Course Outline Notation Stats recap Latent Causes Estimation Conclusion

λ11 1 θ11ψ11 η1 y1 ε1

yi1 = λ11ηi1 + εi1

η1 ∼ N(0,√ψ11)

ε1 ∼ N(0,√θ11)

Variance of y1:

Var(y1) = λ211ψ11 + θ11

How much variance does thelatent variable explain?

λ211ψ11

λ211ψ11 + θ11

Page 45: SEM 1: Con rmatory Factor Analysis - Sacha Epskampsachaepskamp.com/files/SEM12017/SEM1Week1.pdfSEM 1: Con rmatory Factor Analysis Week 1 - Common Cause Modeling Sacha Epskamp 03-04-2017.

Introduction Course Outline Notation Stats recap Latent Causes Estimation Conclusion

1 1 θ11ψ11 η1 y1 ε1

Multiplying λ11 by constant cand dividing ψ11 by c2 leads tothe same variance. Thus, theseparameters are not identified. Weneed to scale the latent variableby fixing one of these parameters.For instance, let λ11 = 1

yi1 = ηi1 + εi1

η1 ∼ N(0,√ψ11)

ε1 ∼ N(0,√θ11)

Var(y1) = ψ11 + θ11

Page 46: SEM 1: Con rmatory Factor Analysis - Sacha Epskampsachaepskamp.com/files/SEM12017/SEM1Week1.pdfSEM 1: Con rmatory Factor Analysis Week 1 - Common Cause Modeling Sacha Epskamp 03-04-2017.

Introduction Course Outline Notation Stats recap Latent Causes Estimation Conclusion

Problems in Estimating Latent Variables

• Well known problem in Psychometrics: you can not jointlyestimate the parameters (parameters stable over people, suchas λ11) and latent variables (e.g., ηi1)

• We need to be able to add data to get better estimates (e.g.,smaller standard error), but adding an item adds parametersand adding a person adds latent variables!

• Paradoxically, the first step in solving this problem is gettingrid of the latent variable ηi1

• In item-response theory (binary items), the latent variable isintegrated out

• In factor analysis, we make use of covariance modeling

• After estimating parameters, the latent variable can beextrapolated. Although as we will see in SEM 2, often this isnot needed and we can use covariance modeling to test all ourhypotheses!

Page 47: SEM 1: Con rmatory Factor Analysis - Sacha Epskampsachaepskamp.com/files/SEM12017/SEM1Week1.pdfSEM 1: Con rmatory Factor Analysis Week 1 - Common Cause Modeling Sacha Epskamp 03-04-2017.

Introduction Course Outline Notation Stats recap Latent Causes Estimation Conclusion

1 1 θ11ψ11 η1 y1 ε1

We match observed variancesand covariances to parameters sothat we do not need to estimatelatent variables themselves. If theparameters reproduce theobserved variances andcovariances, we explain the data!

yi1 = ηi1 + εi1

η1 ∼ N(0,√ψ11)

ε1 ∼ N(0,√θ11)

Var(y1) = ψ11 + θ11

Problem: two parameters andonly one observed variance,model is under-identified!

Page 48: SEM 1: Con rmatory Factor Analysis - Sacha Epskampsachaepskamp.com/files/SEM12017/SEM1Week1.pdfSEM 1: Con rmatory Factor Analysis Week 1 - Common Cause Modeling Sacha Epskamp 03-04-2017.

Introduction Course Outline Notation Stats recap Latent Causes Estimation Conclusion

1 1 θ11ψ11 η1 y1 ε1

We match observed variancesand covariances to parameters sothat we do not need to estimatelatent variables themselves. If theparameters reproduce theobserved variances andcovariances, we explain the data!

yi1 = ηi1 + εi1

η1 ∼ N(0,√ψ11)

ε1 ∼ N(0,√θ11)

Var(y1) = ψ11 + θ11

Problem: two parameters andonly one observed variance,model is under-identified!

Page 49: SEM 1: Con rmatory Factor Analysis - Sacha Epskampsachaepskamp.com/files/SEM12017/SEM1Week1.pdfSEM 1: Con rmatory Factor Analysis Week 1 - Common Cause Modeling Sacha Epskamp 03-04-2017.

Introduction Course Outline Notation Stats recap Latent Causes Estimation Conclusion

Covariance modeling

In general, we aim to estimate parameters that lead to a modelimplied variance–covariance matrix ΣΣΣ by minimizing (note that theBrown book makes a mistake here):

FML = trace(SSSΣΣΣ−1

)− ln |SSSΣΣΣ−1| − p,

in which SSS is the observed variance–covariance matrix, the traceoperator takes the sum of diagonal values, the | . . . | notationindicates the determinant and p is the number of observedvariables. Optimizing this expression is called maximum likelihoodestimation. In principle, This expression is optimized if ΣΣΣ resemblesSSS as much as possible! When SSS = ΣΣΣ, FML = 0.

Page 50: SEM 1: Con rmatory Factor Analysis - Sacha Epskampsachaepskamp.com/files/SEM12017/SEM1Week1.pdfSEM 1: Con rmatory Factor Analysis Week 1 - Common Cause Modeling Sacha Epskamp 03-04-2017.

Introduction Course Outline Notation Stats recap Latent Causes Estimation Conclusion

Identification

Two main rules:

• Because the unit of the latent variable is unknown, we need toscale the latent variable by fixing its variance to 1 or fixingone (usually the first) factor loading to 1.

• We need at least as many observations (sample variances andcovariances) as the number or parameters; we requirenon-negative degrees of freedom (DF)

• DF = a− b• a: number of observations: a = p(p + 1)/2 variances and

covariances.• b: number of parameters we need to estimate (do not count

parameters we fixed for scaling)

• In general, we need 3 indicators for a single latent variablemodel, or 2 per factor for models with multiple (correlated)latent variables.

Page 51: SEM 1: Con rmatory Factor Analysis - Sacha Epskampsachaepskamp.com/files/SEM12017/SEM1Week1.pdfSEM 1: Con rmatory Factor Analysis Week 1 - Common Cause Modeling Sacha Epskamp 03-04-2017.

Introduction Course Outline Notation Stats recap Latent Causes Estimation Conclusion

1 1 θ11ψ11 η1 y1 ε1

We need at least as manyobserved variances andcovariances as parameters!

yi1 = ηi1 + εi1

η1 ∼ N(0,√ψ11)

ε1 ∼ N(0,√θ11)

Var(y1) = ψ11 + θ11

Two parameters, oneobservation. DF = −1.Under-identified!

Page 52: SEM 1: Con rmatory Factor Analysis - Sacha Epskampsachaepskamp.com/files/SEM12017/SEM1Week1.pdfSEM 1: Con rmatory Factor Analysis Week 1 - Common Cause Modeling Sacha Epskamp 03-04-2017.

Introduction Course Outline Notation Stats recap Latent Causes Estimation Conclusion

1θ11

ψ11

λ21

θ22

η1y1 ε1

y2 ε2

yi1 = ηi1 + εi1

yi2 = λ21ηi1 + εi2

η1 ∼ N(0,√ψ11)

ε1 ∼ N(0,√θ11)

ε2 ∼ N(0,√θ22)

Number of parameters: 4 (λ21,ψ11, θ11 and θ22)

ηi1 is now a common cause. yi1and yi2 are assumed independentafter controlling for ηi1: localindependence. Number ofobservations: 2 variances(Var(y1) and Var(y2)) + 1covariance (Cov(y1, y2)) = 3.Model is under-identified!

Page 53: SEM 1: Con rmatory Factor Analysis - Sacha Epskampsachaepskamp.com/files/SEM12017/SEM1Week1.pdfSEM 1: Con rmatory Factor Analysis Week 1 - Common Cause Modeling Sacha Epskamp 03-04-2017.

Introduction Course Outline Notation Stats recap Latent Causes Estimation Conclusion

General factor analysis framework:

yyy i = ΛΛΛηηηi + εεεi

yyy ∼ N(000,ΣΣΣ)

ηηη ∼ N(000,ΨΨΨ)

εεε ∼ N(000,ΘΘΘ),

in which:

• yyy i is a p-length vector of item responses

• ηηηi an m-length vector of latent variables

• εεεi an p-length vector of residuals

• ΛΛΛ a p ×m matrix of factor loadings

• ΨΨΨ an m ×m symmetric variance–covariance matrix (assumealways all latent variables are correlated)

• ΘΘΘ is a p × p symmetric variance–covariance matrix, mostlydiagonal (unless you explicitly expect violations of localindependence)

Page 54: SEM 1: Con rmatory Factor Analysis - Sacha Epskampsachaepskamp.com/files/SEM12017/SEM1Week1.pdfSEM 1: Con rmatory Factor Analysis Week 1 - Common Cause Modeling Sacha Epskamp 03-04-2017.

Introduction Course Outline Notation Stats recap Latent Causes Estimation Conclusion

The general framework:

yyy i = ΛΛΛηηηi + εεεi

yyy ∼ N(000,ΣΣΣ)

ηηη ∼ N(000,ΨΨΨ)

εεε ∼ N(000,ΘΘΘ),

Allows you to derive the model-implied variance–covariance matrix:

ΣΣΣ = ΛΛΛΨΨΨΛΛΛ> + ΘΘΘ

Page 55: SEM 1: Con rmatory Factor Analysis - Sacha Epskampsachaepskamp.com/files/SEM12017/SEM1Week1.pdfSEM 1: Con rmatory Factor Analysis Week 1 - Common Cause Modeling Sacha Epskamp 03-04-2017.

Introduction Course Outline Notation Stats recap Latent Causes Estimation Conclusion

λ11 1 θ11ψ11 η1 y1 ε1

ΛΛΛ =[1],ΨΨΨ =

[ψ11

],ΘΘΘ =

[θ11]

ΣΣΣ =[ψ11 + θ11

]

Page 56: SEM 1: Con rmatory Factor Analysis - Sacha Epskampsachaepskamp.com/files/SEM12017/SEM1Week1.pdfSEM 1: Con rmatory Factor Analysis Week 1 - Common Cause Modeling Sacha Epskamp 03-04-2017.

Introduction Course Outline Notation Stats recap Latent Causes Estimation Conclusion

1θ11

ψ11

λ21

θ22

η1y1 ε1

y2 ε2

ΛΛΛ =

[1λ21

],ΨΨΨ =

[ψ11

],ΘΘΘ =

[θ110 θ22

]

ΣΣΣ =

[ψ11 + θ11ψ11λ21 λ221ψ11 + θ22

](upper triangular elements in symmetric matrices not shown)

Page 57: SEM 1: Con rmatory Factor Analysis - Sacha Epskampsachaepskamp.com/files/SEM12017/SEM1Week1.pdfSEM 1: Con rmatory Factor Analysis Week 1 - Common Cause Modeling Sacha Epskamp 03-04-2017.

Introduction Course Outline Notation Stats recap Latent Causes Estimation Conclusion

1θ11

ψ11 λ21 θ22

λ31

θ33

η1

y1 ε1

y2 ε2

y3 ε3

ΛΛΛ =

1λ21λ31

,ΨΨΨ =[ψ11

],ΘΘΘ =

θ110 θ220 0 θ33

ΣΣΣ =

ψ11 + θ11ψ11λ21 λ221ψ11 + θ22ψ11λ31 ψ11λ21λ31 λ231ψ11 + θ33

DF = 0, just identified (but saturated, will explain the dataperfectly)

Page 58: SEM 1: Con rmatory Factor Analysis - Sacha Epskampsachaepskamp.com/files/SEM12017/SEM1Week1.pdfSEM 1: Con rmatory Factor Analysis Week 1 - Common Cause Modeling Sacha Epskamp 03-04-2017.

Introduction Course Outline Notation Stats recap Latent Causes Estimation Conclusion

1 λ21 λ31 1 λ52 λ62

ψ11 ψ22

ψ21

θ11 θ22 θ33 θ44 θ55 θ66

η1 η2

y1 y2 y3 y4 y5 y6

ε1 ε2 ε3 ε4 ε5 ε6

ΛΛΛ =

1 0λ21 0λ31 00 10 λ52

0 λ62

,ΨΨΨ =

[ψ11

ψ21 ψ22

],ΘΘΘ =

θ110 θ220 0 θ330 0 0 θ440 0 0 0 θ550 0 0 0 0 θ66

Page 59: SEM 1: Con rmatory Factor Analysis - Sacha Epskampsachaepskamp.com/files/SEM12017/SEM1Week1.pdfSEM 1: Con rmatory Factor Analysis Week 1 - Common Cause Modeling Sacha Epskamp 03-04-2017.

Introduction Course Outline Notation Stats recap Latent Causes Estimation Conclusion

1 λ21 λ31 1 λ52 λ62

ψ11 ψ22

ψ21

θ11 θ22 θ33 θ44 θ55 θ66

θ43

η1 η2

y1 y2 y3 y4 y5 y6

ε1 ε2 ε3 ε4 ε5 ε6

ΛΛΛ =

1 0λ21 0λ31 00 10 λ52

0 λ62

,ΨΨΨ =

[ψ11

ψ21 ψ22

],ΘΘΘ =

θ110 θ220 0 θ330 0 θ43 θ440 0 0 0 θ550 0 0 0 0 θ66

Residual covariance / correlation

Page 60: SEM 1: Con rmatory Factor Analysis - Sacha Epskampsachaepskamp.com/files/SEM12017/SEM1Week1.pdfSEM 1: Con rmatory Factor Analysis Week 1 - Common Cause Modeling Sacha Epskamp 03-04-2017.

Introduction Course Outline Notation Stats recap Latent Causes Estimation Conclusion

1 λ21 λ31 1 λ52 λ62

ψ11 ψ22

ψ21

θ11 θ22 θ33 θ44 θ55 θ66

λ41

η1 η2

y1 y2 y3 y4 y5 y6

ε1 ε2 ε3 ε4 ε5 ε6

ΛΛΛ =

1 0λ21 0λ31 0λ41 10 λ52

0 λ62

,ΨΨΨ =

[ψ11

ψ21 ψ22

],ΘΘΘ =

θ110 θ220 0 θ330 0 0 θ440 0 0 0 θ550 0 0 0 0 θ66

Cross-loading

Page 61: SEM 1: Con rmatory Factor Analysis - Sacha Epskampsachaepskamp.com/files/SEM12017/SEM1Week1.pdfSEM 1: Con rmatory Factor Analysis Week 1 - Common Cause Modeling Sacha Epskamp 03-04-2017.

Introduction Course Outline Notation Stats recap Latent Causes Estimation Conclusion

Conclusion

• Factor analysis allows one to test a measurement model• Does a test or questionnaire measure a single latent variable?

• Done via the common cause model• one or more latent variables cause responses on observed

indicators• Indicators assumed to be mostly locally independent

• Estimation done by estimating elements of ΛΛΛ, ΨΨΨ and ΘΘΘ

• These matrices result in model implied variance–covariancematrix ΣΣΣ, which should resemble observed variance–covariancematrix SSS

Page 62: SEM 1: Con rmatory Factor Analysis - Sacha Epskampsachaepskamp.com/files/SEM12017/SEM1Week1.pdfSEM 1: Con rmatory Factor Analysis Week 1 - Common Cause Modeling Sacha Epskamp 03-04-2017.

Introduction Course Outline Notation Stats recap Latent Causes Estimation Conclusion

Practical

• Form groups and start working on group project!

• Aim to have a questionnaire designed by the end of the firstpractical

• You can ask on the blackboard forum if can not find a group

• First assignment will be available on Wednesday

Page 63: SEM 1: Con rmatory Factor Analysis - Sacha Epskampsachaepskamp.com/files/SEM12017/SEM1Week1.pdfSEM 1: Con rmatory Factor Analysis Week 1 - Common Cause Modeling Sacha Epskamp 03-04-2017.

Introduction Course Outline Notation Stats recap Latent Causes Estimation Conclusion

Thank you for your attention!