Structural equation models: opportunities, risks and discussion of some applications in the travel behavior research domain
Marco Diana, Politecnico di Torino (I)
University of Maryland, College Park, 29th November 2014
Structure of the seminar
1. Structural equation models are grounded on two multivariate statistical analysis techniques:
• Multiple regression
• Principal component and factor analysis
2. Basic notions on structural equation models (SEM)
3. Use of SEM: needed input, range of output, most commonplace issues in travel behavior research
4. Available software packages
5. Discussion on some applications in the study of mobility behaviours
Marco Diana, Structural equations models – University of Maryland, College Park, 29/11/2014
Measurement scales (Stevens, 1946)
Metric (quantitative) variables:
• Ratio scales (e.g., body weight, road length)
• Interval scales (e.g., temperature)
Nonmetric (qualitative) variables:
• Ordinal scales (e.g., degree of satisfaction)
• Categorical scales (e.g., sex)
Univariate and bivariate analyses
One random variable:
• Univariate distributions and related moments (mean, variance…)
Two random variables:
• Bivariate, joint and conditional distributions and related moments
• Interdependence analyses => correlations (Pearson, Spearman…), contingency tables
• Dependence analyses => linear regression, ANOVA
Multivariate statistical analysis techniques
[Figure: overview of multivariate statistical analysis techniques. From: Hair et al. (1998)]
Multiple linear regression (1/2)
Operating instructions:
1. Dependence technique => need to identify x and y
2. A unique linear relationship
3. Only one metric dependent variable (y)
4. Two or more linear independent variables (x1, x2, …), either metric or binary
Objective:
Find the values of the parameters a0, a1, a2, … in
y = a0 + a1x1 + a2x2 + …
such that the sum of squared errors (the differences between the two sides of the equation) is minimised (OLS).
[Path diagram: x1 and x2 point to y, with coefficients a1 and a2]
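The OLS objective stated above can be sketched numerically. The snippet below is a minimal illustration in Python with NumPy and is not part of the original seminar; the function name `fit_ols` and the synthetic data are assumptions made for the example.

```python
import numpy as np

def fit_ols(X, y):
    """Return [a0, a1, ..., ak] minimising the sum of squared errors."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so that a0 (the intercept) is estimated too.
    design = np.column_stack([np.ones(len(X)), X])
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coeffs

# Hypothetical data: y is an exact linear function of x1 and x2 (no noise),
# so OLS should recover a0 = 1.0, a1 = 2.0, a2 = -0.5.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1]
coeffs = fit_ols(X, y)
```

With real survey data the errors are not zero, and the assumptions listed on the next slide decide how far the estimated coefficients can be trusted.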
Multiple linear regression (2/2)
Assumptions:
1. Linear relationship
2. Independence of errors
3. Normal distribution of errors
4. Constant variance of errors (homoskedasticity)
NB1: multicollinearity of the x variables is «slightly less problematic» than in some discrete choice models
NB2: measurement errors cannot be distinguished from the rest of the error term
SEM can be helpful in both cases!
Factor & Principal Components Analysis
Operating instructions:
1. Interdependence analysis => «We only have x»
2. Metric variables (possible extensions)
Objective:
• Analyse the correlation matrix of the variables, looking for clusters of variables that are more correlated with each other and less correlated with the rest
• Find latent variables (factors, constructs, components, dimensions) from such groups, which can therefore «synthesise» or «represent» the observed x variables
Common, specific and total variance
• Both methods are based on the study of the variance in the data
• The common variance is the variance that is shared among all x variables
• The specific variance is associated only with a specific variable xi (including the part due to measurement errors)
• The total variance is the sum of the two
• PCA: the input is the correlation matrix => this method considers the total variance
• FA: the main diagonal of the correlation matrix contains an estimate of the common variance => the method considers only the common variance
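The difference between the two inputs can be made concrete. The sketch below replaces the unit diagonal of a correlation matrix with squared multiple correlations (SMC), which is one common initial estimate of the common variance. The slides do not say which estimate is used, so SMC and the name `reduced_correlation_matrix` are assumptions of this example.

```python
import numpy as np

def reduced_correlation_matrix(R):
    """Replace the diagonal of R with squared multiple correlations.

    SMC_i = 1 - 1 / (R^-1)_ii is the share of the variance of x_i that is
    explained by all the other x variables.
    """
    R = np.asarray(R, dtype=float)
    smc = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    reduced = R.copy()
    np.fill_diagonal(reduced, smc)
    return reduced

# Hypothetical two-variable case: with r = 0.6 the SMC equals r^2 = 0.36.
R = np.array([[1.0, 0.6],
              [0.6, 1.0]])
reduced = reduced_correlation_matrix(R)
```

PCA would work on `R` as it is (diagonal equal to 1, total variance), whereas a factor analysis would start from a matrix like `reduced`.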
Principal component analysis (Pearson, 1901)
Transformation of p observed variables x into p latent variables t, which are linear combinations of the x
i.e., find the values of the coefficients a11, a12, … in
$$
\begin{aligned}
t_1 &= a_{11}x_1 + a_{12}x_2 + \dots + a_{1p}x_p \\
t_2 &= a_{21}x_1 + a_{22}x_2 + \dots + a_{2p}x_p \\
&\;\;\vdots \\
t_p &= a_{p1}x_1 + a_{p2}x_2 + \dots + a_{pp}x_p
\end{aligned}
$$
… such that:
• The components t1 … tp are sorted by decreasing variance
• The components ti are uncorrelated with each other
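A minimal sketch of this transformation, assuming the correlation matrix is used as input (so the x variables are standardised first). It relies on a plain eigendecomposition in NumPy; the function name `pca_from_correlation` and the synthetic data are hypothetical.

```python
import numpy as np

def pca_from_correlation(X):
    """Return (variances, coefficients, scores) of the principal components.

    Row i of `coefficients` holds a_i1 ... a_ip; `scores` holds t_1 ... t_p
    computed on the standardised x variables.
    """
    X = np.asarray(X, dtype=float)
    R = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)      # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]         # sort by decreasing variance
    eigvals = eigvals[order]
    A = eigvecs[:, order].T
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    T = Z @ A.T
    return eigvals, A, T

# Hypothetical data: three variables sharing a common source, two pure noise.
rng = np.random.default_rng(1)
shared = rng.normal(size=(500, 1))
X = np.hstack([shared + 0.3 * rng.normal(size=(500, 3)),
               rng.normal(size=(500, 2))])
eigvals, A, T = pca_from_correlation(X)
```

The covariance matrix of the scores `T` is diagonal, with the sorted eigenvalues on the diagonal: the components are uncorrelated and ordered by decreasing variance, and their variances add up to p.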
Factor analysis (Spearman, 1904)
Regression of p observed variables x on k<p latent variables ξ
i.e., find the values of the loadings λ11, λ21, … in
$$
\begin{aligned}
x_1 &= \lambda_{11}\xi_1 + \lambda_{12}\xi_2 + \dots + \lambda_{1k}\xi_k + \delta_1 \\
x_2 &= \lambda_{21}\xi_1 + \lambda_{22}\xi_2 + \dots + \lambda_{2k}\xi_k + \delta_2 \\
&\;\;\vdots \\
x_p &= \lambda_{p1}\xi_1 + \lambda_{p2}\xi_2 + \dots + \lambda_{pk}\xi_k + \delta_p
\end{aligned}
$$
… such that the factors can explain the common variance among the x variables
Unlike PCA, here we assume that factors actually exist (more formally, the covariance matrix of x variables must have some properties)
Common requirements and results
• Both PCA and FA give meaningful results iff the x variables are at least partly correlated => multicollinearity is desirable!
• Sample size: at least 5 observations per observed variable x, and in any case at least 100
• We consider the first k<p components of a PCA or we look for k<p factors through a FA => methods to choose k are needed
• If the common variance is a substantial part of the total variance, the two methods give similar results
PCA: scope of use
• Aim: to represent data variability with the minimum number of latent variables
• Theoretical assumptions: none, we simply want to summarise the variables while trying to preserve the patterns within the dataset
• Data characteristics: the specific variance and the variance due to measurement errors are a negligible proportion of the total variance
[Path diagram: observed variables x1 … x7 linked to components t1 and t2 through the coefficients aij]
Factor analysis: scope of use
• Aim: identifying the dimensions, or latent factors, implied by the set of x variables being considered
• Theoretical assumptions: latent factors do exist, on the basis of a theory that allows the interpretation of the observed correlations
• Data characteristics: specific and measurement error variances are not negligible, therefore only the common variance is considered
[Path diagram: factors 1 and 2 linked to the observed variables x1 … x7 through the loadings λij]
Exploratory vs confirmatory analysis
The factor analysis we introduced is exploratory (EFA): the number of latent factors and their relationships with the observed variables are found a posteriori, through the analysis itself.
If we have a well-founded theory that is empirically supported by previous EFAs, it is better to define the factors and their relations with the observed variables a priori, computing the loadings λij and checking the model «goodness of fit» => confirmatory technique (CFA)
SEM can be used to implement a CFA!
Combining regression and factor analysis
Examples of combinations of the two methods:
• Regression where some variables are latent
• Higher-order factor analyses
• Chained regressions: path analysis (Wright, 1934)
[Three path diagrams. Recoverable node labels: Education, Age, Children <14, Trip rates, Income; Safety, Reliability, Cognitive, Car attitudes, Freedom, Well-being, Affective; Mobility, Systematic trips, Holidays, VFR, Transfers, Income, Rootedness, Education, Nationality]
SEM – Structural equation models
It would be possible to estimate the previous models by decomposing them and running n distinct regressions and/or factor analyses
However, this would be an inefficient use of the data
Structural equation models (Jöreskog et al., 1973)
Regression and FA are generalised and combined, through simultaneous estimation of all parameters:
• Further results and «diagnostic tools»
• Further applications compared to the previous examples
LISREL notation of a SEM model
Measurement model:
$$
x = \Lambda_x \xi + \delta, \qquad y = \Lambda_y \eta + \varepsilon
$$
where x and y are the observed exogenous and endogenous variables, ξ and η the latent ones, Λx and Λy are the loadings matrices, and δ and ε the error terms
Structural model:
$$
\eta = B\eta + \Gamma\xi + \zeta
$$
where B and Γ are the structural coefficients matrices and ζ the error terms
The two models are jointly estimated.
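To see how such a model connects to the data, consider only the exogenous part of the measurement model, x = Λx ξ + δ. If δ is uncorrelated with ξ, the model implies a covariance matrix of x equal to Λx Φ Λx' + Θδ, where Φ is the covariance matrix of ξ and Θδ that of δ. The symbols Φ and Θδ do not appear on the slide and are introduced here as assumptions, as are all numeric values in this sketch.

```python
import numpy as np

def implied_covariance(lambda_x, phi, theta_delta):
    """Covariance of x implied by x = Lambda_x xi + delta."""
    lambda_x = np.asarray(lambda_x, dtype=float)
    phi = np.asarray(phi, dtype=float)
    theta_delta = np.asarray(theta_delta, dtype=float)
    return lambda_x @ phi @ lambda_x.T + theta_delta

# Hypothetical model: two latent variables, two indicators each.
lambda_x = np.array([[1.0, 0.0],
                     [0.8, 0.0],
                     [0.0, 1.0],
                     [0.0, 0.7]])
phi = np.array([[1.0, 0.3],
                [0.3, 1.0]])                      # covariance of the latent variables
theta_delta = np.diag([0.5, 0.4, 0.6, 0.3])       # uncorrelated measurement errors
sigma = implied_covariance(lambda_x, phi, theta_delta)
```

Estimation then searches for the parameter values that bring this implied matrix as close as possible to the covariance matrix observed in the sample.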
Model path diagram
Example (Hair et al., 1998)
[Figure: path diagram of the example model]
Complete model
Example, cont. (Hair et al., 1998)
[Figure: complete model]
Parameters that can be estimated
• Structural coefficients (regression coefficients)
• Factor loadings, both of exogenous and endogenous variables
• Correlations between endogenous constructs (to be avoided!) or between exogenous constructs (obviously not between endogenous and exogenous ones)
• Variance of the measurement error of the observed variables (endogenous and exogenous)
• Covariance of the measurement errors of the observed variables (endogenous and exogenous)
Input and assumptions
Confirmatory technique => the analyst chooses which parameters should be estimated
Input: covariance or correlation matrix of the observed variables, as in factor analysis:
• Covariances: total effects are found, comparison between different models/populations/samples (transferability)
• Correlations: understanding patterns among variables and their relative importance
Assumptions:
• From regression: linear relationship, multivariate normal distributions
• From sampling theory: random sample, independent observations
Data requirement and estimation
Sample size:
• At least 100-150 observations
• 10 observations per parameter, 15 when non-normality is detected
• Overfitting when we use more than 400 observations (the model becomes too sensitive)
Estimation methods:
• Parametric: maximum likelihood (ML)
• Non-parametric: ADF-WLS => 1000 observations are needed
• Resampling: bootstrap, jackknife
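The sample-size rules of thumb above can be turned into a quick check. This is only a sketch of one possible reading of the slide: the function name `min_sample_size` is hypothetical, and taking the lower bound of the 100-150 range as the floor is an assumption.

```python
def min_sample_size(n_parameters, non_normal=False, floor=100):
    """Rule-of-thumb minimum sample size for ML estimation of a SEM.

    10 observations per estimated parameter (15 if non-normality is
    detected), and never fewer than `floor` observations.
    """
    per_parameter = 15 if non_normal else 10
    return max(floor, per_parameter * n_parameters)

small_model = min_sample_size(8)                       # the floor applies
large_model = min_sample_size(25)                      # 10 per parameter
large_non_normal = min_sample_size(25, non_normal=True)  # 15 per parameter
```

These are heuristics, not guarantees: the slide also warns that very large samples (more than 400 observations) can make the model too sensitive.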
Common problems in SEM
A single symptom could be due to different problems: estimation process not converging, variances<0, loadings>1, «mysterious» error messages…
• Unsound theoretical basis, specification errors
• Model identification: degrees of freedom, scales and number of indicators per construct, rank and order conditions…
• Non-normality when using a parametric estimation method
• Algebraic properties of the input matrix (positive definite…)
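Two of these issues can be checked before running any estimation. The sketch below counts the degrees of freedom (distinct elements of the covariance matrix minus free parameters; a non-negative value is necessary but not sufficient for identification) and tests whether an input matrix is positive definite. Function names and example matrices are hypothetical.

```python
import numpy as np

def degrees_of_freedom(n_observed, n_free_parameters):
    """Distinct variances/covariances minus the number of free parameters."""
    n_moments = n_observed * (n_observed + 1) // 2
    return n_moments - n_free_parameters

def is_positive_definite(matrix):
    """True if the matrix is symmetric with strictly positive eigenvalues."""
    matrix = np.asarray(matrix, dtype=float)
    if not np.allclose(matrix, matrix.T):
        return False
    return bool(np.all(np.linalg.eigvalsh(matrix) > 0))

df = degrees_of_freedom(n_observed=4, n_free_parameters=9)
good = np.array([[1.0, 0.5],
                 [0.5, 1.0]])
bad = np.array([[1.0, 2.0],      # an off-diagonal value of 2 cannot be
                [2.0, 1.0]])     # a correlation: eigenvalues are 3 and -1
```

A matrix like `bad` typically arises from data-entry errors or from correlations computed pairwise on different subsets of observations.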
Goodness of fit measures in SEM
Problems and symptoms are not univocally linked; the same goes for fit measures:
• Absolute fit
• Parsimonious fit
• Incremental fit
• Structural model fit (sign and significance of coefficients, rho-squared)
• Measurement model fit (unidimensionality of constructs, Cronbach’s alpha)
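Among the measures listed, Cronbach's alpha is simple enough to compute by hand. A minimal sketch, assuming the items of one construct are stored as columns of an array (the function name and the toy scores are hypothetical):

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for a (respondents x items) array of scores."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)
    total_variance = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1.0 - item_variances.sum() / total_variance)

# Hypothetical answers of four respondents to two items of the same construct.
scores = np.array([[1, 2],
                   [2, 1],
                   [3, 4],
                   [4, 3]])
alpha = cronbach_alpha(scores)

# Perfectly consistent items give the maximum value.
alpha_identical = cronbach_alpha([[1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4]])
```

Alpha grows with the correlation among the items and with their number, so it should be read together with the other measurement model checks.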
Advanced SEM applications
• Path analysis:
  • Reciprocal implications (non-recursive models)
  • Direct, indirect and total effects
  • Mean structures (different means of latent variables)
• Regression with an estimation of correlations among variables (endogenous or exogenous, observed or latent)
• Models with repeated observations
• Models with longitudinal data (latent growth)
• Including categorical variables
• Multiple sample models, mixture models
You simply can’t do all this by combining regression and FA!
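The decomposition into direct, indirect and total effects follows from the matrix of structural coefficients among the endogenous variables. If B[i, j] is the direct effect of variable j on variable i and I - B is invertible, the total effects are (I - B)^-1 - I and the indirect effects are the difference between total and direct. The three-variable chain below is a hypothetical example.

```python
import numpy as np

def effect_decomposition(B):
    """Return (direct, indirect, total) effects among endogenous variables."""
    B = np.asarray(B, dtype=float)
    identity = np.eye(B.shape[0])
    total = np.linalg.inv(identity - B) - identity
    indirect = total - B
    return B, indirect, total

# Hypothetical recursive model: y1 -> y2 (0.5), y2 -> y3 (0.4), y1 -> y3 (0.2).
B = np.array([[0.0, 0.0, 0.0],
              [0.5, 0.0, 0.0],
              [0.2, 0.4, 0.0]])
direct, indirect, total = effect_decomposition(B)
```

Here y1 affects y3 directly (0.2) and through y2 (0.5 × 0.4 = 0.2), for a total effect of 0.4. A set of separate regressions would report only the direct part.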
Software for SEM estimation
• LISREL 9.1 (Jöreskog et al.)
• EQS 6.1 (Bentler et al.)
• Mplus 7 (Muthén et al.)
• SAS => PROC CALIS (SAS Institute)
• Statistica => SEPATH (StatSoft)
• SPSS => Amos (IBM)
• R => sem, lavaan, …
(Packages that I used to be familiar with are in bold; they are not necessarily the best ones…)
SEM applications in travel research
Golob (2003) reviewed more than 50 papers on a wealth of topics:
• Mode choice behaviors
• Determinants of car ownership and use
• Longitudinal and panel data analyses
• Activity-based models
• Travel attitudes-behaviors relationships
• Driving behaviors and safety issues
Obviously many more SEM papers have appeared since then, although I would have expected an even sharper increase
Example: primary utility (Diana, 2008)
Travel demand is derived only from the need to perform activities in different places…
• Activity-based models
• Utility-maximising models by minimising travel times
…but is it always true?
• «Teleportation test»: 3% of the sample indicates an ideal commute time <2 min, 50% >20 min (Mokhtarian and Salomon, 2001)
• Random utility models where travel-time coefficients >= 0: always garbage or…
Example: primary utility (Diana, 2008)
Goal: capturing and measuring the «primary utility» latent construct
Theoretical model => EFA => primary utility is due to different factors:
• Importance of on-trip activities
• Importance of activities at different locations
• Ideal trip length
• Travel-related cognitive and affective attitudes
• Performance and use of the travel means
Item analysis => 6 constructs are related to primary utility => second-order CFA
Model specification (Diana, 2008)
[Figure: model specification]
Primary utility measurement scale
[Figure: primary utility measurement scale. Panels: drivers versus transit riders; commuting versus other trips]
Modal diversion (Diana, 2010)
• Modal diversion versus mode choice
• Demand for unknown services: «cognitive asymmetry» <=> SP surveys
• Attitudes and rational evaluations have a different relative importance according to the alternative
• Behavioral modal diversion model: the endogenous variable measures the propensity to change on a Likert scale
• Data limitations => implementation of submodels and use of standardized estimates
Modal diversion (Diana, 2010)
Standardized estimation => comparing different structural coefficients
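Why standardized estimates allow such comparisons can be shown with a small sketch. A standardized coefficient rescales the unstandardized one by the ratio of the standard deviations of the explanatory and dependent variables, so it no longer depends on measurement units. The variable names and numbers below are hypothetical.

```python
import numpy as np

def standardized_coefficient(b, x, y):
    """Rescale an unstandardized coefficient b to b * sd(x) / sd(y)."""
    return b * np.std(x, ddof=1) / np.std(y, ddof=1)

# Hypothetical example: the same travel time expressed in minutes and in hours.
travel_time_min = np.array([10.0, 20.0, 30.0, 40.0])
y = 0.5 * travel_time_min                 # unstandardized slope: 0.5 per minute
travel_time_h = travel_time_min / 60.0    # same data, slope becomes 30 per hour

beta_min = standardized_coefficient(0.5, travel_time_min, y)
beta_h = standardized_coefficient(30.0, travel_time_h, y)
```

Both calls return the same value even though the unstandardized slopes differ by a factor of 60, which is what makes coefficients of variables measured in different units comparable within one model.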
SEM with subsamples
Is there a difference in the diversion to buses and to shared taxis? => Comparing unstandardized estimates of the single structural equations in the two subsamples
Model with MULTIM    All      Buses      DRT
REL_COST            -0.20    -0.11 *    -0.07 *
REL_TIME            -0.25    -0.39      -0.21
REL_WAIT            -0.15    -0.29      -0.14
REL_WALK            -0.14    -0.05 **   -0.15
MULTIM               0.17     0.29 *     0.15 *

Model with COGNIT    All      Buses      DRT
REL_COST            -0.19    -0.08 *    -0.07 *
REL_TIME            -0.26    -0.38      -0.21
REL_WAIT            -0.13    -0.27      -0.11 *
REL_WALK            -0.09     0.01      -0.10 *
COGNIT              -0.20    -0.08 **   -0.29

* = not significant at the 5% level
** = not significant at the 20% level
Thank you for your attention!
Questions, remarks, …
Marco Diana
List of acronyms
ADF-WLS = Asymptotically distribution-free weighted least squares
CFA = Confirmatory factor analysis
EFA = Exploratory factor analysis
FA = Factor analysis
ML = Maximum likelihood
OLS = Ordinary least squares
PCA = Principal components analysis
SEM = Structural equation model
VFR = Visiting friends and relatives
Mentioned references
• Diana, M. (2008) Making the “primary utility of travel” concept operational: a measurement model for the assessment of the intrinsic utility of reported trips, Transportation Research A, 42(3), 455-474.
• Diana, M. (2010) From mode choice to modal diversion: a new behavioural paradigm and an application to the study of the demand for innovative transport services, Technological Forecasting & Social Change, 77(3), 429-441.
• Golob, T.F. (2003) Structural equation modeling for travel behavior research, Transportation Research B, 37(1), 1-25.
• Hair, J.F., Anderson, R.E., Tatham, R.L., Black, W.C. (1998) Multivariate Data Analysis, 5th ed., Prentice Hall (more recent editions are now available).
• Mokhtarian, P.L., Salomon, I. (2001) How derived is the demand for travel? Some conceptual and measurement considerations, Transportation Research A, 35(8), 695-719.