Estimating latent variable models in R with OpenMx

Lawrence L. Lo

September 2010

Outline

Brief R introduction

What is R and why should I use it?

R basics

Latent variable model basics

What is a latent variable model?

Structural equation models (SEM)

OpenMx

OpenMx basics

Basic model examples

The Future

What is R?

- R is a programming language and software environment for statistical computing and graphics.

- Open source and free to use.

- Includes many user-submitted packages for cutting-edge analysis.

- One of the most popular platforms for statistics courses.

Where and how do I get R?

- http://www.r-project.org/

    - The homepage for R.

- http://cran.r-project.org/

    - The Comprehensive R Archive Network. It also hosts information and downloads for R packages and libraries.

- http://www.sciviews.org/Tinn-R/

    - A great editor for R that many people use.

What is OpenMx?

- OpenMx is a free, open-source matrix optimization package for R. It is built primarily for structural equation model (SEM) analysis.

- Several universities are part of the development team, but the main developers are from UVA and VCU.

- Very flexible software with powerful optimization capabilities.

- Allows model construction by path or matrix specification.

Why R and OpenMx?

- FREE!

- Both are well maintained and have responsive development teams.

- Analytic techniques that once required multiple SEM runs, or transferring SEM output to another program, can now be easily integrated.

- Powerful customized algorithms can be built from the tools in these programs.

Object oriented programming

- Programming in R involves creating and manipulating objects. R objects can be numbers, lists of numbers, data sets, character strings, etc.

- Example: create an object, A, that contains the number 7.

      A <- 7

- You can display an object's contents by entering its name alone on the command line.

- Standard arithmetic operators can be used when creating objects.

- Example: create an object, B, that is equal to A + 4.

      B <- A + 4

R functions

- The other main component of the R language is function usage.

- Functions take the form function(), where the function name appears before the parentheses and the thing(s) to be evaluated appear within the parentheses.

- Examples: the c() function combines values into a vector, and the mean() function takes the mean of a vector of numbers. Store A and B in E and take the mean of E.

      E <- c(A, B)
      mean(E)

Reading in data

- There are several functions for reading in data:

    - There are data reading functions for specific kinds of data (e.g., tables, strings).

    - There are also functions for data in different formats (e.g., ASCII, SPSS, SAS).

- One of the most useful data reading functions is read.table().

- Example: read a data set and store it in an object called DATA.

      DATA <- read.table("filelocation.dat")
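
A minimal sketch of the read.table() step, assuming a small whitespace-delimited file with no header row. The file contents and the temporary file are made up for illustration; they are not the data used in the talk.

```r
# Write a small whitespace-delimited file to a temporary location
# (hypothetical values, used only for illustration).
tmp <- tempfile(fileext = ".dat")
writeLines(c("1.2 3.4 5.6",
             "2.3 4.5 6.7",
             "3.1 5.9 7.2"), tmp)

# read.table() returns a data frame; with no header row,
# the columns are named V1, V2, V3 by default.
DATA <- read.table(tmp)
names(DATA)
dim(DATA)

# If the file has a header row, pass header = TRUE instead.
unlink(tmp)
```

The default V1, V2, ... names are why the later examples refer to variables such as V3.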

Examining data

- You can look at the data table by entering DATA on the command line.

- You can list the variables in the data set with the names() function, and extract a specific variable with the $ operator.

- Example: examine variable V3 in the data set.

      DATA$V3

- You can select a particular cell with the [] index operator. Example: case 70 for variable 3.

      DATA$V3[70]
      DATA[70, 3]
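
The indexing forms above can be tried on a toy data set. This sketch assumes a simulated data frame with 100 cases and three variables; it stands in for DATA and is not the talk's data.

```r
# Toy data frame with 100 cases and 3 variables (simulated, not the talk's data).
set.seed(1)
DATA <- data.frame(V1 = rnorm(100), V2 = rnorm(100), V3 = rnorm(100))

names(DATA)        # variable names
head(DATA$V3)      # first few values of V3

# Both forms return case 70 of variable 3.
DATA$V3[70]
DATA[70, 3]
DATA[70, "V3"]     # indexing by column name also works
```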

Other useful basic functions

- The summary() function gives useful univariate descriptive information.

- The cor() function gives the correlation between two numeric vectors.

- Example: obtain the correlation between V1 and V3.

      cor(DATA$V1, DATA$V3)

- The cor() function also produces a correlation matrix if used on a data matrix.

- Example: obtain the correlation matrix for the data set.

      cor(DATA)
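
A short sketch tying the two uses of cor() together, again on simulated stand-in data (V3 is constructed to correlate with V1; the values are arbitrary).

```r
set.seed(7)
# Simulated stand-in for DATA: V3 is built to correlate with V1.
V1 <- rnorm(200)
DATA <- data.frame(V1 = V1, V2 = rnorm(200), V3 = 0.5 * V1 + rnorm(200))

summary(DATA$V1)              # univariate descriptives for one variable

r13 <- cor(DATA$V1, DATA$V3)  # correlation between two numeric vectors
R   <- cor(DATA)              # full correlation matrix

# The [V1, V3] element of the matrix is the same pairwise correlation.
all.equal(R["V1", "V3"], r13)
```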

R Questions?

- Enter help.search("keyword") to search the documentation of installed packages.

- You can type a ? before any function name to open its help file.

- Example entry: ?cor

- Typing a function's name without parentheses displays the function's code.

- For a more in-depth introduction to R, visit http://cran.r-project.org/doc/manuals/R-intro.pdf

What is a latent variable model?

- A statistical model that relates a set of manifest variables to a set of latent variables.

- There is controversy over how latent variables should be conceptualized (Borsboom, Mellenbergh, & Van Heerden, 2003; Borsboom, 2008):

    - Unobservable or "hidden" variables.

    - Variables containing measurement error.

- Assumptions:

    - 1) The responses on the indicators, or manifest variables, are the result of an individual's position on the latent variable(s).

    - 2) The manifest variables have nothing in common after controlling for the latent variable (local independence).
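
Local independence can be illustrated with a small simulation. This is only a sketch under assumed values: one continuous latent variable, two indicators, and arbitrary loadings (0.8 and 0.7) with unit error variances. In real data the latent variable is not observed, so this check is possible only because the data are simulated.

```r
set.seed(123)
n <- 5000

# One continuous latent variable and two manifest indicators.
# Loadings (0.8, 0.7) and unit error variances are arbitrary choices.
f  <- rnorm(n)
y1 <- 0.8 * f + rnorm(n)
y2 <- 0.7 * f + rnorm(n)

# Marginally, the indicators are correlated because they share f.
marginal <- cor(y1, y2)

# After regressing each indicator on f, the residuals are nearly uncorrelated.
partial <- cor(resid(lm(y1 ~ f)), resid(lm(y2 ~ f)))

round(c(marginal = marginal, partial = partial), 3)
```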

Types of latent variable models

- Different types of latent variable models can be grouped according to whether the manifest and latent variables are categorical or continuous:

                    Manifest
    Latent          Continuous                 Categorical
    Continuous      Factor analysis            Latent trait analysis
    Categorical     Latent profile analysis    Latent class analysis

- Factor analysis and structural equation modeling, which are represented by the upper-left quadrant, will be the focus of this presentation.

Diagram representations of latent variable models

- Unidirectional arrows: regressive relationships.

- Bidirectional arrows: covarying relationships.

- Circles: latent variables.

- Squares: manifest variables.

- Triangles: constants.

Structural equation models (SEM)

- SEM analysis involves specifying a model that gives the relationships among manifest and latent variables.

- This usually involves specifying:

    - Relationships to be estimated.

    - Relationships to be constrained to some value, often zero (no relationship).

- The estimation procedure usually fits the model to a summary of the data. In SEM this is often the covariance/correlation matrix.

- After fitting a SEM and obtaining parameter estimates, the researcher can look at several criteria gauging the appropriateness of particular parameters as well as the overall model.

An example SEM diagram

- The Greek symbols (σv, μ, and λ) refer to parameters to be estimated.

- The 1's refer to constrained relationships.

- Any two components without an arrow between them can be thought of as having a zero-arrow.

The LISREL model

- The LISREL (Linear Structural Relations) model is one of the most popular SEM models (Jöreskog & Sörbom, 1974).

- The covariance (Σ) model is:

\[ \Sigma = \Lambda (I - B)^{-1} \Psi (I - B)^{-\top} \Lambda^{\top} + \Theta \]

- and the means (μ) model is:

\[ \mu = \tau + \Lambda (I - B)^{-1} \alpha \]

- The parameters on the right-hand sides produce estimated covariances and means (on the left-hand sides) that are fitted to the observed covariances and means.
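
The two equations above can be evaluated directly with base R matrix algebra. The sketch below assumes a made-up model with two latent and four manifest variables; every numeric value is illustrative, and the object names simply mirror the LISREL matrix names.

```r
# Two latent variables, four manifest variables (all values are illustrative).
Lambda <- matrix(c(1.0, 0.0,
                   0.8, 0.0,
                   0.0, 1.0,
                   0.0, 0.7), nrow = 4, byrow = TRUE)  # factor loadings
B      <- matrix(c(0.0, 0.0,
                   0.5, 0.0), nrow = 2, byrow = TRUE)  # factor 2 regressed on factor 1
Psi    <- diag(c(1.00, 0.75))                          # latent residual covariances
Theta  <- diag(rep(0.3, 4))                            # measurement error covariances
tau    <- rep(0, 4)                                    # manifest intercepts
alpha  <- c(1.0, 0.5)                                  # latent means/intercepts

IBinv <- solve(diag(2) - B)                            # (I - B)^{-1}

# Model-implied covariance matrix and mean vector.
Sigma <- Lambda %*% IBinv %*% Psi %*% t(IBinv) %*% t(Lambda) + Theta
mu    <- tau + Lambda %*% IBinv %*% alpha

Sigma
drop(mu)
```

Estimation amounts to searching for the parameter values that bring Sigma and mu as close as possible to the observed covariances and means.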

The LISREL parameters

- The Lambda (Λ) matrix is the factor loading matrix. It gives the regressive relationships from the latent to the manifest variables.

- Theta (Θ) is the measurement error covariance matrix. It gives the variance of the observed variables that is not explained by the latent variables.

- The Psi (Ψ) matrix is the residual covariance matrix of the latent variables. If no other variable predicts a latent variable, the corresponding psi value is the variance of that latent variable.

- The Beta (B) matrix is the factor regression matrix. It gives the regressive relationships among factors.

The LISREL parameters (cont.)

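- Tau (τ) is the vector of manifest variable intercepts; it equals the observed mean vector when the latent means are zero.

- Alpha (α) is the latent mean vector (the latent intercepts when B is nonzero).

- Note: there are several other parameter matrices that appear in the LISREL model, but the parameters mentioned above suffice for most modeling purposes.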

Basic matrix concepts

- A particular element of a matrix specifies a relation in the model.

- The row and column indices denote which variables the element pertains to.

- For covariance matrices, the indices specify a particular variance or covariance (bidirectional).

- For regression matrices, they specify a particular regression relationship (unidirectional).

Matrix element examples

- Examples:

    - Element (2,2), i.e., row 2, column 2, of the Ψ matrix denotes the (residual) variance of the second latent variable.

    - Element (1,2) of Ψ denotes the covariance of the first and second latent variables.

    - Element (2,3) of the Λ matrix gives the regression of the second manifest variable on the third latent variable (arrow from 3rd latent to 2nd manifest).

    - Element (4,2) of the B matrix gives the regression of the fourth latent variable on the second latent variable (arrow from 2nd latent to 4th latent).
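
The row-is-target, column-is-source convention in these examples can be made concrete with named matrices. The sizes and labels below (four latent variables F1 to F4, six manifest variables Y1 to Y6) and the two coefficient values are assumptions made only for illustration.

```r
# Hypothetical sizes: 4 latent variables (F1..F4) and 6 manifest variables (Y1..Y6).
latents   <- paste0("F", 1:4)
manifests <- paste0("Y", 1:6)

# Rows are the variables being predicted ("to"), columns are the predictors ("from").
Lambda <- matrix(0, nrow = 6, ncol = 4, dimnames = list(manifests, latents))
B      <- matrix(0, nrow = 4, ncol = 4, dimnames = list(latents, latents))

Lambda[2, 3] <- 0.7   # arrow from 3rd latent to 2nd manifest
B[4, 2]      <- 0.4   # arrow from 2nd latent to 4th latent

# The same elements addressed by name: [to, from].
Lambda["Y2", "F3"]
B["F4", "F2"]
```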

Model specification

- Just as the diagram example depicted path-arrows that could be estimated or constrained, the LISREL matrices can be specified in the same way.

- OpenMx, the focus of this presentation, allows the user to specify models by either paths OR matrices.

Installing OpenMx

- The OpenMx website is http://openmx.psyc.virginia.edu/

- You can automatically download and install OpenMx into R by copying and pasting the code below:

      source('http://openmx.psyc.virginia.edu/getOpenMx.R')

- You may need to choose a CRAN repository; if so, choose one of the Pennsylvania (PA) locations.

- Other download and installation details can be found at: http://openmx.psyc.virginia.edu/installing-openmx

Core OpenMx commands

- There are many commands in OpenMx, but these are the most used:

    - mxModel: creates a model to test. Many of the other OpenMx objects get placed into it.

    - These commands create model specification objects:

        - mxData: specifies the data and structures to be fit.

        - mxPath: specifies a path or set of paths between variables.

        - mxMatrix: specifies a matrix.

    - mxRun: runs an object created by mxModel.

- Type a ? before a command to see its complete help file, for example: ?mxPath

RAM model

- The default matrix specification for OpenMx is the Reticular Action Model (RAM; McArdle & McDonald, 1984).

- OpenMx path specifications convert to the RAM model.

- Path results are presented in RAM format with variable indices.

- RAM matrices:

    - S: symmetric paths

    - A: asymmetric paths

    - M: means
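
As a rough sketch of how the RAM matrices produce expected moments, the block below works through a one-factor model with three indicators in base R. It uses the standard RAM algebra, which also needs a filter matrix (called Fm here) that selects the manifest variables; that matrix is not listed on the slide, and all numeric values are illustrative.

```r
# One latent factor G with three indicators; variable order: x1, x2, x3, G.
# All numeric values are illustrative.
vars <- c("x1", "x2", "x3", "G")

# A: asymmetric (single-headed) paths, stored as A[to, from].
A <- matrix(0, 4, 4, dimnames = list(vars, vars))
A[c("x1", "x2", "x3"), "G"] <- c(1.0, 0.8, 0.6)

# S: symmetric (double-headed) paths, i.e. variances and covariances.
S <- diag(c(0.5, 0.5, 0.5, 1.0))
dimnames(S) <- list(vars, vars)

# M: means (single-headed paths from the constant).
M <- c(x1 = 0, x2 = 0, x3 = 0, G = 2)

# Fm: filter matrix that keeps the manifest rows (not listed on the slide).
Fm <- cbind(diag(3), 0)

IAinv   <- solve(diag(4) - A)
expCov  <- Fm %*% IAinv %*% S %*% t(IAinv) %*% t(Fm)
expMean <- drop(Fm %*% IAinv %*% M)

expCov
expMean
```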

LISREL function

- Because of the flexibility of OpenMx, we can also define custom models. I have designed a function to build SEMs with LISREL parameters.

- Refer to the accompanying syntax file for the LISREL commands.

Follow the syntax

- For the next few slides, we will be following along with some example R syntax.

Common factor model (CFM)

- The most basic factor analytic model.

- In this particular model, there is 1 factor with 5 manifest indicators.
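
A one-factor, five-indicator model of this kind might be written with OpenMx path specification roughly as follows. This is a sketch, not the accompanying syntax file: the data are simulated in base R with arbitrary loadings, the names x1 to x5, G, and CommonFactor are placeholders, and the model is only fitted if OpenMx is installed.

```r
set.seed(42)
n <- 500

# Simulate 5 indicators of a single factor (loadings and error SD are arbitrary).
G        <- rnorm(n)
loadings <- c(0.9, 0.8, 0.7, 0.6, 0.5)
simData  <- as.data.frame(sapply(loadings, function(l) l * G + rnorm(n, sd = 0.5)))
names(simData) <- paste0("x", 1:5)
manifests <- names(simData)

# Fit the model only if OpenMx is installed.
if (requireNamespace("OpenMx", quietly = TRUE)) {
  library(OpenMx)
  cfm <- mxModel("CommonFactor", type = "RAM",
                 manifestVars = manifests,
                 latentVars   = "G",
                 # factor loadings: G -> x1..x5, freely estimated
                 mxPath(from = "G", to = manifests, arrows = 1,
                        free = TRUE, values = 0.5),
                 # residual variances of the indicators
                 mxPath(from = manifests, arrows = 2, free = TRUE, values = 1),
                 # factor variance fixed at 1 to set the latent scale
                 mxPath(from = "G", arrows = 2, free = FALSE, values = 1),
                 mxData(observed = cov(simData), type = "cov", numObs = n))
  cfmFit <- mxRun(cfm)
  print(summary(cfmFit))
}
```

Fixing the factor variance at 1 is one of several possible scaling choices; fixing one loading at 1 and freeing the factor variance would be an equivalent alternative.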

Model identifiability issues

- Most of the time, constraint choices must be made in order to have an identifiable model.

- This sometimes involves setting the scale of the latent variables by fixing latent variances or factor loadings.

- Different starting values may also be needed for estimation.

- These are complicated issues beyond the scope of this tutorial.
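
One reason a scaling constraint is needed can be seen numerically: different combinations of loadings and factor variance imply exactly the same covariance matrix. The sketch below assumes a one-factor model with three indicators and arbitrary values; impliedCov is a helper defined here for illustration.

```r
# Implied covariance of a one-factor model: Sigma = Lambda * psi * Lambda' + Theta.
impliedCov <- function(Lambda, psi, Theta) {
  Lambda %*% (psi * t(Lambda)) + Theta
}

Theta   <- diag(c(0.5, 0.5, 0.5))
Lambda1 <- matrix(c(1.0, 0.8, 0.6), ncol = 1)

# Doubling every loading while quartering the factor variance ...
Lambda2 <- 2 * Lambda1
Sigma1  <- impliedCov(Lambda1, psi = 1.00, Theta)
Sigma2  <- impliedCov(Lambda2, psi = 0.25, Theta)

# ... leaves the implied covariance matrix unchanged.
all.equal(Sigma1, Sigma2)
```

Because the data cannot distinguish the two parameter sets, either a loading or the factor variance has to be fixed before the rest can be estimated.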

Latent growth curve model (LGCM)

- Another common type of SEM, used for longitudinal group data.

- Here latent factors represent variance in intercepts and linear trends.

LGCM 2

- This is how you would actually want to constrain the model.
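
The slide diagram is not reproduced here, so the following is only a sketch of a common way to constrain a linear growth model: the intercept factor's loadings are fixed at 1 and the slope factor's loadings are fixed at the time codes. The four occasions coded 0 to 3 and all parameter values are assumptions.

```r
# Assumed design: 4 measurement occasions, coded 0, 1, 2, 3.
times <- 0:3

# Fixed loadings: a column of 1s for the intercept factor, the time codes for the slope.
Lambda <- cbind(intercept = 1, slope = times)

alpha <- c(10, 2)                       # mean intercept and mean slope (illustrative)
Psi   <- matrix(c(4.0, 0.5,
                  0.5, 1.0), 2, 2)      # intercept/slope variances and covariance
Theta <- diag(rep(1, 4))                # residual variances

mu    <- drop(Lambda %*% alpha)         # implied means: 10, 12, 14, 16
Sigma <- Lambda %*% Psi %*% t(Lambda) + Theta

mu
Sigma
```

With the loadings fixed this way, the free parameters are the latent means, the latent variances and covariance, and the residual variances.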

Latent vector autoregression (LVAR)

- Autoregression: factors are explained by their past values.

- Longitudinal cross-lag: factor 2 is partially explained by past values of factor 1.

State Space Model (SSM)

- The State Space Model (SSM) is a popular dynamic system and time series model.

- This model focuses on an individual time series.

- The state equation is:

\[ x_t = B x_{t-1} + w_t \]

- where:

    - x_t is the state vector at time t

    - B is the autoregressive matrix showing time-dependent relationships of state variables

    - w_t is the state error at time t

SSM (cont.)

- The measurement equation is:

\[ y_t = \Lambda x_t + e_t \]

- where:

    - y_t is the observed data at time t

    - Λ is the factor loading matrix

    - e_t is the measurement error at time t
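
The state and measurement equations can be simulated in a few lines of base R. This sketch assumes two states, four observed series, normally distributed errors, and arbitrary parameter values; the final regression uses the true simulated states, which would not be available with real data.

```r
set.seed(2010)
nT <- 200   # number of time points

# Illustrative parameter values: 2 states, 4 observed series.
B <- matrix(c(0.6, 0.0,
              0.3, 0.5), 2, 2, byrow = TRUE)   # autoregressive matrix
Lambda <- matrix(c(1.0, 0.0,
                   0.8, 0.0,
                   0.0, 1.0,
                   0.0, 0.7), 4, 2, byrow = TRUE)

x <- matrix(0, nT, 2)   # states, one row per time point
y <- matrix(0, nT, 4)   # observations

for (i in 1:nT) {
  prev   <- if (i == 1) c(0, 0) else x[i - 1, ]
  x[i, ] <- B %*% prev + rnorm(2)                   # state equation
  y[i, ] <- Lambda %*% x[i, ] + rnorm(4, sd = 0.5)  # measurement equation
}

dim(y)
# Regressing x_t on x_{t-1} (no intercept) roughly recovers B from the true states.
Bhat <- t(coef(lm(x[-1, ] ~ x[-nT, ] - 1)))
round(Bhat, 2)
```

The off-diagonal entry B[2, 1] plays the role of a cross-lag: state 2 is partly explained by the previous value of state 1.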

SSsem program

- SSsem is a program for fitting state space models in a structural equation modeling framework.

- Two-step procedure:

    - 1) Conducts exploratory P-technique factor analysis, which converts the observed time series to state series. [Measurement Model]

    - 2) Conducts vector autoregression via a brute-force likelihood ratio method. [State Model]

- SSsem is exploratory: the user only needs to feed it a data matrix. SSsem takes care of the rest.

- SSsem is still in development and will be expanded into a more general program.

The Future

- Things I am working on:

    - Fast and straightforward methods of doing dynamic modeling and "hybrid" mixed effect models.

    - Automatic search routines for model building.

- Other OpenMx capabilities yet to be tapped:

    - Multi-group (or multiple time series) models

    - Multilevel latent variable models

    - Mixture models

Thank you

References

- Borsboom, D., Mellenbergh, G. J., & Van Heerden, J. (2003). The theoretical status of latent variables. Psychological Review, 110, 203-219.

- Borsboom, D. (2008). Latent variable theory. Measurement, 6, 25-53.

- Jöreskog, K. G., & Sörbom, D. (1974). LISREL III [Computer software]. Chicago, IL: Scientific Software International, Inc.

- McArdle, J. J., & McDonald, R. P. (1984). Some algebraic properties of the Reticular Action Model for moment structures. British Journal of Mathematical and Statistical Psychology, 37, 234-251.