19-12-2012
Bidin Yatim, PhD, Assoc. Prof., School of Quantitative Sciences
College of Arts & Sciences
019-3394959
PhD Applied Statistics (Exeter, UK) MSc Industrial Mathematics (Aston, UK) BSc Mathematics and Statistics (Nottingham, UK)
INTRODUCTORY STRUCTURAL EQUATION MODELING WITH AMOS WORKSHOP
TOPICS TO DISCUSS
TOPIC 1: Introduction to SEM
TOPIC 2: Modeling and Computing
5 Basic SEM steps
Model Specification
Model Identification
Model Estimation
Model Testing
Model Modification
TOPIC 3: Implementing SEM Using AMOS
How to draw a model
Running AMOS and evaluating outputs
How to modify and re-specify a non-fitting model
Part One: Introduction to SEM
An overview of SEM
Chapters 1 and 2 of Structural Equation Modeling with AMOS: Basic Concepts, Applications, and Programming, by Barbara M. Byrne
Welcome to SEM: The Musical. Lyrics by Alan Reifman
(May be sung to the tune of "Matchmaker," Bock/Harnick, from Fiddler on the Roof)
SEM, SEM, it can be sung, You'll be amazed, at what we've sprung, We hope you'll learn more 'bout this stats technique, Through songs of which you're among, SEM, SEM, we like to run, It takes awhile, but we get it done, We hope you'll learn of the steps that we take, And take home from this, some fun…
A Simple SEM
SEM is an attempt to model causal relations between variables by including all variables that are known to have some involvement in the process of interest.
SEM is a family of statistical techniques that incorporates and integrates:
Path analysis
Linear regression
Factor analysis
SEM conveys two important aspects of the procedure:
The causal processes under study are represented by a series of structural relations, and these structural relations can be modeled pictorially to enable a clearer conceptualization of the theory under study.
The model can be tested simultaneously to determine the extent to which it is consistent with the data: if the goodness of fit is adequate, the model is not rejected; otherwise, the hypothesized relations are rejected.
The Basic Idea Behind SEM
Statisticians have developed procedures for testing whether a set of variances and covariances in a covariance matrix fits a specified structure. The way SEM works is as follows:
1. You state how you (or the theory) believe the variables are interrelated, often using a path diagram.
2. AMOS works out, via its internal rules, what this implies for the variances and covariances of the variables.
3. You test whether the observed variances and covariances fit those implied by the model.
4. The results of the statistical test are reported, together with parameter estimates and standard errors for the numerical coefficients in the linear equations.
5. On the basis of this information, you decide whether the model seems to fit your data well.
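Steps 2 and 3 above can be illustrated with a small numerical sketch. This is not how AMOS is implemented internally; it assumes a hypothetical one-factor model with three indicators, builds the model-implied covariance matrix Σ = φλλ′ + Θ from fixed (not estimated) parameter values, and compares it with a hypothetical sample covariance matrix S using the maximum likelihood discrepancy F_ML = ln|Σ| + tr(SΣ⁻¹) − ln|S| − p. The function names `implied_cov` and `f_ml`, the loadings, error variances, sample matrix and N are all illustrative assumptions.

```python
import numpy as np


def implied_cov(loadings, error_vars, factor_var=1.0):
    """Model-implied covariance of a one-factor model: Sigma = phi * lam lam' + Theta."""
    lam = np.asarray(loadings, dtype=float).reshape(-1, 1)
    theta = np.diag(np.asarray(error_vars, dtype=float))  # uncorrelated error terms
    return factor_var * (lam @ lam.T) + theta


def f_ml(S, Sigma):
    """Maximum likelihood discrepancy between sample S and model-implied Sigma."""
    p = S.shape[0]
    _, logdet_sigma = np.linalg.slogdet(Sigma)
    _, logdet_s = np.linalg.slogdet(S)
    return logdet_sigma + np.trace(S @ np.linalg.inv(Sigma)) - logdet_s - p


# Hypothetical parameter values, chosen so each indicator has variance 1.
Sigma = implied_cov([0.8, 0.7, 0.6], [0.36, 0.51, 0.64])

# Hypothetical sample covariance: identical to Sigma except one covariance (0.56 -> 0.70).
S = Sigma.copy()
S[0, 1] = S[1, 0] = 0.70

N = 200  # hypothetical sample size
print("F_ML when S equals Sigma:", round(f_ml(Sigma, Sigma), 6))
print("F_ML for the perturbed S:", round(f_ml(S, Sigma), 4))
print("chi-square statistic, (N - 1) * F_ML:", round((N - 1) * f_ml(S, Sigma), 2))
```

A perfect match between data and model gives F_ML = 0, and any mismatch makes it positive. In a real analysis the parameters would be estimated by minimizing F_ML rather than fixed in advance, and the exact scaling of the chi-square statistic can differ between programs.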
SEM
Is a statistical methodology that takes a confirmatory (hypothesis testing) approach to the analysis of a structural theory
Most other multivariate procedures are descriptive/ exploratory in nature.
The theory represents “causal” processes that generate observations on multiple variables.
SEM is a very general, powerful and popular multivariate analysis technique. It provides a comprehensive method for the quantification and testing of theories, and has been applied in econometrics, psychology, sociology, political science, education, market research, medical research and other fields.
It is also known as covariance structure analysis, covariance structure modeling, latent variable modeling, confirmatory factor analysis, linear structural relationships, and analysis of covariance structures.
SEM serves purposes similar to multiple regression, but in a more powerful way that takes into account the modeling of interactions, nonlinearities, correlated IVs, measurement error, correlated error terms, multiple latent IVs each measured by multiple indicators, and one or more latent DVs, each also measured by multiple indicators.
It may be used as a more powerful alternative to multiple regression, path analysis, factor analysis, time series analysis, and analysis of covariance; these procedures are special cases of SEM.
Advantages of SEM compared to multiple regression
More flexible assumptions (particularly allowing interpretation even in the face of multicollinearity)
Use of confirmatory factor analysis to reduce measurement error by having multiple indicators per latent variable
The attraction of SEM's graphical modeling interface
The desirability of testing models overall rather than coefficients individually
Enables researchers to test a set of regression equations simultaneously
The ability to:
test models with multiple dependents,
model mediating variables,
model error terms,
test coefficients across multiple between-subjects groups, and
handle difficult data (time series with autocorrelated error, non-normal data, incomplete data).
Major applications of structural equation modeling
1. Causal modeling, or path analysis: hypothesizes causal relationships among variables and tests the causal models with a system of linear equations. Causal models can involve manifest variables, latent variables, or both.
2. Confirmatory factor analysis: an extension of factor analysis in which specific hypotheses about the structure of the factor loadings and intercorrelations are tested.
3. Regression models, in which regression weights may be constrained to be equal to each other or to specified numerical values.
4. Covariance structure models, which hypothesize that a covariance matrix has a particular form. For example, you can use this procedure to test the hypothesis that a set of variables all have equal variances.
5. Correlation structure models, which hypothesize that a correlation matrix has a particular form.
How Useful is a Statistical Model?
"All models are wrong, but some are useful." (G. E. P. Box)
SEM models can never be accepted; they can only fail to be rejected. This leads researchers to provisionally accept a given model.
While models that fit the data well can only be provisionally accepted, models that do not fit the data well can be rejected outright.
Statistical Modeling
A statistical model does NOT necessarily have a theoretical basis; it may be interpreted as either 'making sense' or 'nonsense'.
[Figure: example variables: Smoking, Heart Disease, Weight, Income, Number of Newspaper Readers, Number of Road Accidents]
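One common way a statistically strong but "nonsense" relationship arises is a shared cause. The sketch below is a hypothetical simulation, not data from the slides: it assumes a made-up town-population variable that drives both the number of newspaper readers and the number of road accidents, with no link between the two. The raw correlation is high, but the partial correlation (after removing the linear effect of population from both) is near zero. All variable names and coefficients are illustrative assumptions.

```python
import numpy as np


def partial_corr(x, y, z):
    """Correlation of x and y after regressing each on z (with intercept)."""
    Z = np.column_stack([np.ones_like(z), z])
    rx = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    ry = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    return np.corrcoef(rx, ry)[0, 1]


rng = np.random.default_rng(1)
n = 1_000
population = rng.uniform(10, 100, size=n)               # hypothetical common cause
readers = 0.3 * population + rng.normal(0, 3, n)        # depends on population only
accidents = 0.05 * population + rng.normal(0, 0.5, n)   # depends on population only

r_raw = np.corrcoef(readers, accidents)[0, 1]
r_partial = partial_corr(readers, accidents, population)
print(f"raw r = {r_raw:.2f}, partial r controlling for population = {r_partial:.2f}")
```

The statistics alone cannot tell the two situations apart; only theory about which variables belong in the model can.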
SEM: Terminologies
Independent variables (IVs), which are assumed to be measured without error, are called exogenous variables.
Dependent variables (DVs) or mediating variables are called endogenous variables.
Manifest/observed variables (indicators) are directly measured by researchers, while
latent/unobserved variables are not directly measured but are inferred from the relationships (correlations) among the measured variables in the analysis. Examples: self-concept, motivation, powerlessness, anomie, verbal ability, capitalism, social class.
SEM Terminologies
SEM illustrates the relationships between observed and unobserved variables using path diagrams.
Ovals or circles: latent variables
Rectangles/squares: measured variables (indicators)
Residuals are always unobserved, so they are represented by ovals or circles.
Model identification
Identified: a unique solution exists, so the parameters are estimable.
Not identified: the parameters are subject to arbitrariness, implying that different parameter values define the same model; hence consistent estimates cannot be obtained for all parameters, and the model cannot be tested. For example, it is impossible to determine unique values of X and Y when the only information is X + Y = 15.
Just-identified: the model can never be rejected (zero degrees of freedom).
Over-identified: the model can sometimes be rejected (positive degrees of freedom).
Under-identified: there is insufficient information to estimate the parameters.
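The degrees of freedom mentioned above come from a simple count: the number of distinct variances and covariances among p observed variables, p(p + 1)/2, minus the number of free parameters. The sketch below assumes hypothetical helper names and applies the count to a single factor whose variance is fixed to 1, so each indicator contributes one loading and one error variance. Note that df ≥ 0 is a necessary but not a sufficient condition for identification.

```python
def sem_degrees_of_freedom(n_observed, n_free_params):
    """df = distinct variances/covariances minus free parameters."""
    n_moments = n_observed * (n_observed + 1) // 2
    return n_moments - n_free_params


def identification_status(df):
    """Classify by the counting rule (necessary, not sufficient, for identification)."""
    if df < 0:
        return "under-identified"
    if df == 0:
        return "just-identified"
    return "over-identified"


# One factor with variance fixed to 1: each indicator adds a loading and an error variance.
for k in (2, 3, 4):
    df = sem_degrees_of_freedom(k, 2 * k)
    print(k, "indicators -> df =", df, "->", identification_status(df))
```

With two indicators the model is under-identified, with three it is just-identified (it fits perfectly and cannot be rejected), and with four it is over-identified and therefore testable. The X + Y = 15 example fails the same count: one piece of information, two unknowns.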
SEM, Oh, SEM. Lyrics by Alan Reifman, dedicated to Peter Westfall
(May be sung to the tune of "Galveston," Jimmy Webb, popularized by Glen Campbell)
Ultimately, SEM, Your LV's cannot be measured, Which gives the critics some displeasure, There's nothing physical to grab on, When you run SEM, SEM, Oh, SEM, You make many an assumption, Is it recklessness or gumption? Assume the e's uncorrelated... When you run SEM, I can see the critics' point of view, now, They're saying the models aren't unique, That, we must willingly acknowledge, In response to the critique, if we want to keep on using... SEM, Oh, SEM...
I Am an Indicator. Lyrics by Alan Reifman
(May be sung to the tune of "The Entertainer," Billy Joel)
I am an indicator, a latent construct I represent, I'm measurable, sometimes pleasurable, A manifestation of what is meant, I am an indicator, I usually come in a multiple set, With other signs of the same construct, you may instruct, I'm correlated with my co-indicators, you can bet, I am an indicator, from my presence the construct is inferred, I'm tap-able, the construct is not palpable, The distinction should not be blurred
At Least Three. Lyrics by Alan Reifman
(May be sung to the tune of "Think of Me," Lloyd Webber/Hart/Stilgoe, from Phantom of the Opera)
At least three, indicators are urged, For each latent construct shown, At least three, indicators should help, Avoid output where you groan, With less than three, your construct sure will be, locally unidentified, Though the model might still run, you could have a rough ride
SEM
consists of two components:
Structural model: illustrates the relationships among the latent constructs (exogenous and endogenous variables).
Measurement models: represent how the constructs are related to their indicators, or manifest variables.
Example
Ability: exogenous latent construct
Aspiration: endogenous latent construct
Achievement: endogenous latent construct
Structural Model
The structural model specifies certain relationships among latent variables, depicted by lines or arrows in the path diagram.
In the path diagram earlier, we specified that Ability and Achievement were related in a specific way, i.e., that ability (intelligence) had some influence on later achievement.
One result of the structural model is an indication of the extent to which these a priori hypothesized relationships are supported by our sample data.
Structural Model
The structural model addresses the following questions:
Are Ability and Achievement related?
Exactly how strong is the influence of Ability on Achievement?
Could there be other latent variables that we need to consider to get a better understanding of the influences on Achievement?
Measurement Model
Specifies the relationships between latent variables and observed variables.
Answers the questions:
To what extent are the observed variables actually measuring the hypothesized latent variables?
Which observed variable is the best measure of a particular latent variable?
To what extent are the observed variables actually measuring something other than the hypothesized latent variable?
EFA or CFA is used to determine the significant observed variables related to each of the latent variables.
[Figure: using factor analysis (factor loadings): Ability measured by Academic Skill, Interpersonal Skill, and Communication Skill]
[Figure: using factor analysis (factor loadings): Aspiration measured by Family Status, Father's Occupation, and Peer's Influence]
[Figure: using factor analysis (factor loadings): Achievement measured by Personal Actualization, Professional Status, and Social Status]
Measurement Model
The relationships between observed variables and latent variables are described by factor loadings
Factor loadings provide information about the extent to which a given observed variable is able to measure the latent variable. They serve as validity coefficients.
Measurement error is defined as that portion of an observed variable that is measuring something other than what the latent variable is hypothesized to measure. It serves as a measure of unreliability.
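In a standardized solution where each indicator loads on only one latent variable and errors are uncorrelated, a loading λ translates directly into these two quantities: λ² is the share of the indicator's variance explained by the latent variable, and 1 − λ² is the standardized measurement error variance. The sketch below assumes that setting; the helper name and the loadings for the three Ability indicators are hypothetical.

```python
def indicator_reliability(std_loadings):
    """Map each indicator to (loading, variance explained, error variance),
    assuming standardized loadings on a single factor with uncorrelated errors."""
    return {
        name: (lam, lam ** 2, 1.0 - lam ** 2)
        for name, lam in std_loadings.items()
    }


# Hypothetical standardized loadings for the Ability indicators.
ability = {"Academic Skill": 0.80, "Interpersonal Skill": 0.65, "Communication Skill": 0.70}
for name, (lam, r2, err) in indicator_reliability(ability).items():
    print(f"{name:20s} loading={lam:.2f}  explained={r2:.2f}  error={err:.2f}")
```

Under these assumptions, a loading of 0.80 means the latent variable accounts for 64% of that indicator's variance and the remaining 36% is measurement error.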
Full Latent Variable Model
[Figure: full latent variable model linking Ability, Aspiration, and Achievement, with indicators Academic Skill (x1), Interpersonal Skill (x2), Communication Skill (x3), Personal Actualization (x4), Professional Status (x5), Social Status (x6), Family Status (y1), Father's Occupation (y2), and Peer's Influence (y3)]
Total, Direct and Indirect Effects
There is a direct effect between two latent variables when a single directed line or arrow connects them.
There is an indirect effect between two variables when the second latent variable is connected to the first latent variable through one or more other latent variables.
The total effect between two latent variables is the sum of any direct effect and all indirect effects that connect them.
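For a recursive model, the indirect effect along a path is the product of the coefficients on that path, and the total effect is direct plus indirect. In matrix form, if B holds the direct effects (B[i, j] = effect of variable j on variable i), the total effects are (I − B)⁻¹ − I = B + B² + …, and indirect = total − direct. The sketch below assumes hypothetical paths Ability → Aspiration, Aspiration → Achievement and Ability → Achievement with made-up coefficients; the slides do not give these values.

```python
import numpy as np


def effects(B):
    """Return (direct, indirect, total) effects for a recursive path model.

    B[i, j] is the direct effect of variable j on variable i.
    Total = (I - B)^-1 - I, which equals B + B^2 + ... for a recursive model.
    """
    B = np.asarray(B, dtype=float)
    identity = np.eye(B.shape[0])
    total = np.linalg.inv(identity - B) - identity
    return B, total - B, total


names = ["Ability", "Aspiration", "Achievement"]
B = np.zeros((3, 3))
B[1, 0] = 0.5  # Ability -> Aspiration (hypothetical)
B[2, 1] = 0.4  # Aspiration -> Achievement (hypothetical)
B[2, 0] = 0.3  # Ability -> Achievement (hypothetical)

direct, indirect, total = effects(B)
print("Ability -> Achievement: direct", direct[2, 0],
      "indirect", round(indirect[2, 0], 3), "total", round(total[2, 0], 3))
```

Here the indirect effect of Ability on Achievement is 0.5 × 0.4 = 0.2, so the total effect is 0.3 + 0.2 = 0.5.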
Types of Measurement Scale
There are four types of measurement scale: nominal, ordinal, interval, and ratio.
Other common scales, such as Likert scales, semantic differential scales, and dichotomous scales, can be classified into one of these four types.
This matters because the assumptions of SEM depend on the measurement scale of the variables.
SEM Assumptions (1)
SEM in general assumes that observed variables are measured on a linear, continuous scale.
Dichotomous and ordinal variables cause problems because their correlations/covariances tend to be truncated (attenuated). Their scores are not normally distributed, and responses to individual items may not be very reliable.
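The attenuation caused by dichotomizing can be seen in a quick simulation. This is a hypothetical sketch, not an analysis from the slides: it draws bivariate normal data with correlation ρ = 0.6, splits each variable at its median (0), and recomputes the Pearson correlation (the phi coefficient). For a median split of bivariate normal data the population value is (2/π)·arcsin(ρ), about 0.41 here. The function name, ρ, n and seed are assumptions.

```python
import numpy as np


def correlation_before_and_after_split(rho, n=200_000, seed=0):
    """Simulate bivariate normal data with correlation rho, dichotomize each
    variable at its median (0), and return both Pearson correlations."""
    rng = np.random.default_rng(seed)
    cov = [[1.0, rho], [rho, 1.0]]
    x = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    b = (x > 0).astype(float)  # 1 = above median, 0 = below
    r_continuous = np.corrcoef(x[:, 0], x[:, 1])[0, 1]
    r_binary = np.corrcoef(b[:, 0], b[:, 1])[0, 1]  # phi coefficient
    return r_continuous, r_binary


r_cont, r_bin = correlation_before_and_after_split(0.6)
print(f"continuous r = {r_cont:.3f}, after median split r = {r_bin:.3f}")
print(f"theory for a median split: (2/pi)*arcsin(0.6) = {2 / np.pi * np.arcsin(0.6):.3f}")
```

The same relationship, measured more coarsely, shows up as a noticeably weaker correlation, which then distorts the covariance matrix SEM works from.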
SEM Assumptions (2)
Many SEM estimation procedures assume multivariate normality.
Univariate non-normality is commonly considered severe when the absolute skew index is greater than 3.0 or the kurtosis index is greater than 10.
Multivariate non-normality can be detected with indices of multivariate skew or kurtosis.
Non-normality can (sometimes) be corrected by transforming variables.
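The indices mentioned above can be computed directly. The sketch below assumes moment-based estimators: univariate skew m₃/m₂^1.5 and excess kurtosis m₄/m₂² − 3 (both 0 for a normal distribution), and Mardia's multivariate kurtosis b₂,p, the mean of the squared Mahalanobis distances squared, whose expected value under multivariate normality is p(p + 2). The helper names, the z-type ratio scaling, treating the two univariate cut-offs separately, and the simulated data are all assumptions for illustration; statistical packages may use slightly different small-sample corrections.

```python
import numpy as np


def skew_kurtosis(x):
    """Univariate skew and excess kurtosis (moment estimators; 0 for a normal)."""
    d = np.asarray(x, dtype=float) - np.mean(x)
    m2, m3, m4 = (np.mean(d ** k) for k in (2, 3, 4))
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def mardia_kurtosis(X):
    """Mardia's b2p, its excess over p(p+2), and a z-type ratio."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    D = X - X.mean(axis=0)
    S = D.T @ D / n  # ML covariance (divides by n)
    d2 = np.einsum("ij,jk,ik->i", D, np.linalg.inv(S), D)  # squared Mahalanobis distances
    b2p = np.mean(d2 ** 2)
    excess = b2p - p * (p + 2)
    return b2p, excess, excess / np.sqrt(8 * p * (p + 2) / n)


def flag_univariate(X, skew_cut=3.0, kurt_cut=10.0):
    """Column indices with |skew| > skew_cut or excess kurtosis > kurt_cut."""
    flags = []
    for j in range(X.shape[1]):
        s, k = skew_kurtosis(X[:, j])
        if abs(s) > skew_cut or k > kurt_cut:
            flags.append(j)
    return flags


rng = np.random.default_rng(0)
normal = rng.normal(size=(20_000, 3))
heavy = rng.laplace(size=(20_000, 3))  # heavier tails than the normal
print("normal  (b2p, excess, ratio):", mardia_kurtosis(normal))
print("laplace (b2p, excess, ratio):", mardia_kurtosis(heavy))
print("flagged columns (normal):", flag_univariate(normal))
```

For the normal sample the excess multivariate kurtosis is close to 0; for the heavy-tailed (leptokurtic) sample it is clearly positive.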
Violations of Assumptions
The best-known distribution with no excess kurtosis is the multivariate normal.
Leptokurtic (more peaked) distributions result in too many rejections of H0 based on the chi-square statistic.
Platykurtic distributions lead to chi-square estimates that are too low.
Correlation Matrix
We say that:
If the input matrix is the covariance matrix, the estimated parameter coefficients are unstandardized estimates.
If the input matrix is the correlation matrix, the estimated parameter coefficients are standardized estimates.
So what?
Correlation Matrix
When we want to test a theory (to validate the causal relationships among constructs), we use the variance-covariance matrix.
When we just want to explain the pattern of the relationships (theory testing is not required), we use the correlation matrix.
Covariance
The covariance between two variables equals their correlation times the product of the variables' standard deviations. The covariance of a variable with itself is the variable's variance.
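The definition above, applied to every pair of variables at once, gives S = D R D, where R is the correlation matrix and D is the diagonal matrix of standard deviations; dividing each covariance by the product of the two standard deviations reverses it. The sketch below uses hypothetical helper names and a made-up two-variable example.

```python
import numpy as np


def corr_to_cov(R, sds):
    """Covariance matrix from correlations and SDs: S = D R D with D = diag(sds)."""
    D = np.diag(np.asarray(sds, dtype=float))
    return D @ np.asarray(R, dtype=float) @ D


def cov_to_corr(S):
    """Correlation matrix and SDs from a covariance matrix."""
    S = np.asarray(S, dtype=float)
    sds = np.sqrt(np.diag(S))  # variances sit on the diagonal
    return S / np.outer(sds, sds), sds


R = [[1.0, 0.5], [0.5, 1.0]]
S = corr_to_cov(R, [2.0, 3.0])
print(S)  # off-diagonal: 0.5 * 2 * 3 = 3; diagonal: the variances 4 and 9
```

This also shows why analysing a correlation matrix yields standardized estimates: it is simply the covariance matrix of the variables rescaled to unit variance.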
Covariance. Lyrics by Alan Reifman
(May be sung to the tune of "Aquarius," Rado/Ragni/MacDermot, from Hair, also popularized by the Fifth Dimension)
You draw paths to show relationships, You hope align with the known r's, Your model will guide the tracings, From constructs near to constructs far, You will compare this with the data's covariance, The data's covariance... Covariance! Covariance! Similar to correlation, With the variables unstandardized, Does each known covariance match up with, The one the model tracings will derive? Covariance! Covariance!
Factors Affecting Correlation/Covariance
Type of scale and range of values: the Pearson correlation is the basis for analysis in regression, path analysis, factor analysis and SEM; hence data must be in metric form. There must be enough variation in scores to allow a correlation to manifest.
Linearity: the Pearson correlation coefficient measures the degree of linear relationship between two variables, hence linearity needs to be tested.
Sample size: SEM requires a large sample. Rule of thumb: 10–20 times the number of variables. Ding, Velicer and Harlow (1995): 100–150; Boomsma (1982, 1983): 400; Hu, Bentler and Kano (1992): in some cases 5000 is still insufficient; Schumacker and Lomax (1999): 250–500 in many articles; Bentler and Chou (1987): for normal data, 5 subjects per variable is sufficient.
What if the sample is small?
Consequences of small sample size:
Convergence failures (the software cannot obtain a satisfactory solution)
Improper solutions (including negative error variance estimates for measured variables)
Lowered accuracy of parameter estimates and, in particular, of their standard errors (SEs)
SEM Assumptions (3)
SEM programs compute standard errors under the assumption of a large sample size.
Violations of Assumptions
High degrees of skewness lead to excessively large chi-square estimates.
In small samples (N < 100), the chi-square statistic tends to be too large.
SEM Assumptions (4): Multicollinearity
Multicollinearity occurs when intercorrelations among some variables are so high that certain mathematical operations are impossible, or results are unstable because denominators are close to 0.
Bivariate correlations > 0.85
Multiple correlations > 0.90
May cause a non-positive definite/singular covariance matrix
May be due to the inclusion of both individual variables and composites of them
Detection: Tolerance = 1 − R² < 0.10; Variance Inflation Factor (VIF) = 1/(1 − R²) > 10
Can be corrected by eliminating or combining redundant variables.
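The tolerance and VIF rules above can be checked without running one regression per variable, because VIF_j = 1/(1 − R_j²) equals the j-th diagonal element of the inverse correlation matrix. The sketch below assumes a hypothetical helper name and simulated data in which x3 is a near-redundant composite of x1 and x2, mimicking the "individual plus composite variables" case.

```python
import numpy as np


def vif(X):
    """VIF_j = 1 / (1 - R_j^2), computed as the diagonal of the inverse
    correlation matrix of the columns of X."""
    R = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    return np.diag(np.linalg.inv(R))


rng = np.random.default_rng(42)
n = 5_000
x1, x2, x4 = rng.normal(size=(3, n))
x3 = x1 + x2 + 0.1 * rng.normal(size=n)  # near-redundant composite of x1 and x2
X = np.column_stack([x1, x2, x3, x4])

vifs = vif(X)
tolerance = 1.0 / vifs  # tolerance = 1 - R_j^2
for name, v, t in zip(["x1", "x2", "x3", "x4"], vifs, tolerance):
    flag = "  <- VIF > 10" if v > 10 else ""
    print(f"{name}: VIF = {v:8.1f}, tolerance = {t:.3f}{flag}")
```

x1, x2 and x3 are all flagged because each can be predicted almost perfectly from the others, while the unrelated x4 has a VIF close to 1. Dropping x3 (or dropping x1 and x2) removes the problem.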
SEM Assumptions (5): Outliers
Univariate outliers: more than 3 standard deviations from the mean
Detection: inspection of frequency distributions and univariate skewness and kurtosis
Multivariate outliers may have extreme scores on two or more variables, or their configuration of scores may be unusual
Detection: inspection of multivariate skewness and kurtosis
Can be remedied by correcting errors, dropping these cases, or transforming the variables.
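The difference between the two kinds of outlier can be shown with a small sketch. Univariate outliers are flagged with the 3-standard-deviation rule above; for multivariate outliers the sketch swaps in the squared Mahalanobis distance from the centroid, a common alternative that is not the method named on the slide. Under multivariate normality D² is approximately chi-square with p degrees of freedom; for p = 2 the upper 0.001 point is exactly −2·ln(0.001) ≈ 13.8. The helper names, the simulated data and the planted case are hypothetical.

```python
import numpy as np


def univariate_outliers(X, z_cut=3.0):
    """(row, column) pairs whose z-score exceeds z_cut in absolute value."""
    X = np.asarray(X, dtype=float)
    z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    return np.argwhere(np.abs(z) > z_cut)


def mahalanobis_d2(X):
    """Squared Mahalanobis distance of each row from the centroid."""
    X = np.asarray(X, dtype=float)
    D = X - X.mean(axis=0)
    S_inv = np.linalg.inv(np.cov(X, rowvar=False))
    return np.einsum("ij,jk,ik->i", D, S_inv, D)


rng = np.random.default_rng(7)
X = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], size=500)
X = np.vstack([X, [2.0, -2.0]])  # planted case (row 500): moderate scores, unusual combination

cutoff = -2.0 * np.log(0.001)  # chi-square(2) upper 0.001 point
d2 = mahalanobis_d2(X)
print("univariate outliers (row, col):", univariate_outliers(X).tolist())
print("multivariate outliers (rows):", np.flatnonzero(d2 > cutoff).tolist())
```

The planted case is within 3 standard deviations on each variable, so the univariate rule misses it, but because the two variables are strongly positively correlated, a high score on one paired with a low score on the other is an unusual configuration and its Mahalanobis distance is far above the cut-off.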
Applications involving single-group analyses (Section II)
First-order confirmatory factor analytic (CFA) models demonstrate testing for the validity of the multidimensional theoretical structure of a construct (Chapter 3). Specifically, this application tests the hypothesis that self-concept (SC), for early adolescents (grade 7), is a multidimensional construct composed of four factors: general SC (GSC), academic SC (ASC), English SC (ESC), and mathematics SC (MSC).
A second-order CFA model bears on the factorial structure of a measuring instrument (Chapter 5), examining a CFA model that comprises a second-order factor.
The final single-group application tests for the validity of an empirically derived causal structure (Chapter 6). Here we take our first look at a full structural equation model (SEM). The hypothesis to be tested relates to the pattern of causal structure linking several stressor variables that bear on the construct of burnout. The original study from which this application is taken (Byrne, 1994a) tested and cross-validated the impact of organizational and personality variables on three dimensions of burnout for elementary, intermediate, and secondary teachers. For purposes of illustration here, however, the application is limited to the calibration sample of elementary teachers only (N = 599).
In Section III, I present three applications relat
6 JULY 2011
Testing for the factorial validity of a theoretical construct (First-order CFA model)
Testing for the factorial validity of scores from a measuring instrument (Second-order CFA model)
Testing for the validity of a full causal structure
Applications involving multiple-group analyses (Section III)
These applications test for measurement and structural equivalence across groups with respect to a measuring instrument, i.e., a first-order CFA model (Chapter 7), and to a full causal structure (Chapter 9).
We focus on applications involving more than one sample, where the central concern is whether or not components of the measurement model and/or the structural model are equivalent (i.e., invariant) across particular groups of interest.
17-12-2012
THE END
Thank you