
Target Rotations and Assessing the Impact of Model Violations on the Parameters of Unidimensional Item Response Theory Models

Steven Reise1, Tyler Moore1, and Alberto Maydeu-Olivares2

Abstract

Reise, Cook, and Moore proposed a "comparison modeling" approach to assess the distortion in item parameter estimates when a unidimensional item response theory (IRT) model is imposed on multidimensional data. Central to their approach is the comparison of item slope parameter estimates from a unidimensional IRT model (a restricted model) with the item slope parameter estimates from the general factor in an exploratory bifactor IRT model (the unrestricted comparison model). In turn, these authors suggested that the unrestricted comparison bifactor model be derived from a target factor rotation. The goal of this study was to provide further empirical support for the use of target rotations as a method for deriving a comparison model. Specifically, we conducted Monte Carlo analyses exploring (a) the use of the Schmid–Leiman orthogonalization to specify a viable initial target matrix and (b) the recovery of true bifactor pattern matrices using target rotations as implemented in Mplus. Results suggest that to the degree that item response data conform to independent cluster structure, target rotations can be used productively to establish a plausible comparison model.

Keywords

item response theory, bifactor model, ordinal factor analysis, unidimensionality, multidimensionality, Schmid–Leiman transformation, target rotations

1 University of California–Los Angeles, Los Angeles, CA, USA
2 University of Barcelona, Barcelona, Spain

Corresponding Author:
Steven Reise, Department of Psychology, University of California–Los Angeles, 1285 Franz Hall, Los Angeles, CA 90091, USA
Email: [email protected]

Educational and Psychological Measurement 71(4) 684–711
© The Author(s) 2011
Reprints and permission: sagepub.com/journalsPermissions.nav
DOI: 10.1177/0013164410378690
http://epm.sagepub.com


Most applications of item response theory (IRT) modeling have focused on unidimensional models (i.e., models that assume a single latent trait). Yet if more than one latent trait is required to achieve local independence, then using a unidimensional model will result in biased IRT item parameters. There is a large literature that has explored the robustness of unidimensional IRT model parameter estimates to unidimensionality violations (e.g., Drasgow & Parsons, 1983; Reckase, 1979). One conclusion arising from this literature is that if factor analysis or a related method is applied to the data and a "strong" common factor, or multiple highly correlated factors are obtained, then unidimensional IRT item parameter estimates are reasonably robust, that is, they are not seriously distorted.

Drawing from the IRT robustness literature, many researchers have adopted the idea that it is safe to apply unidimensional IRT models when data have a "strong" common dimension. Consequently, researchers considering the application of unidimensional IRT models rely almost exclusively on rule-of-thumb guidelines (e.g., ratio of first to second eigenvalues greater than 3) and post hoc model analysis (e.g., small residuals after fitting a single factor, or acceptable "fit" to a one-factor model; McDonald & Mok, 1995) to judge whether item response data are "unidimensional enough" for application of IRT.

Reise, Cook, and Moore (2010), on the other hand, argued that no matter how reasonable, intuitive, or statistically rigorous, no rule-of-thumb index can provide an estimate of the amount of bias that exists in the item parameter estimates obtained by fitting a unidimensional IRT model when multidimensionality is present. Thus, as a complement to rule-of-thumb guidelines and post hoc model checking, they suggested fitting a multidimensional and a unidimensional IRT model and then comparing the resulting item slope parameters (see also Ip, 2010). For this comparison to be meaningful, the multidimensional IRT model must consist of an overall general dimension (to capture what is in common among the items) plus some additional group dimensions (to capture local dependencies). Furthermore, the comparison should be performed using an intercept/slope parameterization, not a threshold/factor loading parameterization, for reasons that will be discussed shortly.

Now, to obtain a multidimensional IRT solution with an overall dimension and specific additional dimensions (i.e., a hierarchical solution) requires some work. One first needs to find a plausible multidimensional model for the data at hand. Then, the resulting solution is transformed into a hierarchical solution using a Schmid–Leiman transformation (SL; Schmid & Leiman, 1957). However, as we shall show, the resulting SL solution may yield biased estimates of the parameters on the general dimension and should not be compared with the parameters estimated in the unidimensional model. This is because the SL solution leads to a constrained hierarchical solution: there are restrictions among the estimated parameters (see Yung, Thissen, & McLeod, 1999). Thus, an additional step is needed. Using the SL solution (i.e., the constrained hierarchical solution) as a target, a target rotation needs to be performed on the parameters of the multidimensional solution. The resulting solution is an unconstrained hierarchical model whose parameters on the general dimension are to be compared with those obtained in the unidimensional solution. Notice that if in the unconstrained solution each item depends on just the overall factor and a group factor (regardless of the number of group factors), this is a bifactor model (Gibbons & Hedeker, 1992), and throughout this article we shall use the terms bifactor and hierarchical model interchangeably.

Obviously, finding the best multidimensional IRT model may be rather computer intensive and one may question the utility of so much computational effort simply to investigate the robustness of IRT parameter estimates. However, the multidimensional IRT model can be estimated using limited information methods (from tetrachoric/polychoric correlations) and as a result, the full procedure may take mere seconds. Note that when estimating the model in this fashion, thresholds and factor loadings are estimated. Ultimately, these need to be transformed to intercepts and slopes for comparison to the unidimensional results, as described below.

Reise et al. (2010) have summarized the steps needed to investigate the robustness of unidimensional IRT parameter estimates to local independence violations as follows:

1. Find the best fitting and plausible multidimensional factor model for the data at hand, estimating the model using tetrachoric/polychoric correlations.
2. Apply an SL transformation to the factor loadings obtained in Step 1 to obtain a restricted hierarchical solution.
3. Using the solution obtained in Step 2 as a target, obtain an unrestricted hierarchical solution using target rotations (Browne, 2001).
4. Transform the factor loadings obtained in Step 3 to IRT slopes. Compare them with the slopes obtained for a unidimensional solution. It is preferable that for this comparison the parameters of the unidimensional solution are also obtained using limited information methods (i.e., estimating the model using tetrachoric/polychoric correlations) to avoid possible biases in the comparison induced by the use of different estimation methods.

Reise et al. (2010) suggested that a finding of little or no difference in IRT slopes argues for the application of a unidimensional IRT model, whereas a finding of large differences argues for (a) fitting a multidimensional IRT model, such as a bifactor IRT model (Gibbons & Hedeker, 1992), a unidimensional model with local dependencies modeled through correlated residuals (Ip, 2010), or an independent clusters solution (McDonald, 1999, 2000); (b) revising the scale by deleting problematic items from the analyses (e.g., items that load on more than one group factor in a hierarchical solution); or (c) abandoning any latent variable model (i.e., concluding the scale lacks any identifiable and interpretable latent structure).

In this study, our primary goal is to elaborate on and provide empirical support for the approach suggested in Reise et al. (2010). To do so, first we provide an example to illustrate their proposal. Two sections follow the numerical example: First, we review the relations between ordinal factor analysis (OFA) and IRT models to justify our use of OFA for studying the parameters of IRT models. Second, Monte Carlo simulation is used to explore (a) the viability of using an SL transformation to suggest an initial bifactor target matrix (Steps 2 and 3) and (b) assuming that the target pattern matrix is specified correctly, the ability of targeted bifactor pattern rotations to recover true population parameters.

Ordinal Factor Analysis and Item Response Theory

In this section, we review the different parameterizations that may be used in two-parameter IRT models. These are well known and have received considerable treatment in the psychometric literature (e.g., Ackerman, 2005; Bartholomew & Knott, 1999; Forero & Maydeu-Olivares, 2009; Knol & Berger, 1991; McDonald, 2000; McDonald & Mok, 1995; McLeod, Swygert, & Thissen, 2001; Takane & de Leeuw, 1987; Wirth & Edwards, 2007). However, as we shall see, in the case of bifactor models, equivalent models can appear very different in terms of the implied data structure just because of the choice of parameterization.

Item Response Theory Models

IRT models specify an item response curve (IRC) to characterize the nonlinear relation between individual differences on a continuous latent variable (θ) and the probability of endorsing a dichotomous item x_i. Equation (1) shows the two-parameter normal-ogive model:

$$P(x_i = 1 \mid \theta) = E(x_i \mid \theta) = \int_{-\infty}^{z_i} \frac{1}{\sqrt{2\pi}} \exp\left(-t^2/2\right) dt. \quad (1)$$

In Equation (1), z is a normal deviate equal to a(θ − b), where a is an item discrimination parameter and b is an item location parameter corresponding to where on the latent trait scale the probability of endorsing the item is .50. Stated in different terms, Equation (1) is a cumulative normal distribution with a mean equal to b and standard deviation σ = 1/a. Thus, z = (θ − μ)/σ is simply the z score derived from a particular normal distribution, which is used to determine the cumulative proportion of cases that fall at or below the normal deviate.

A more commonly applied IRT model that yields a very similar fit is the two-parameter logistic model (2PL) shown in Equation (2),

$$P(x_i = 1 \mid \theta) = E(x_i \mid \theta) = \frac{\exp(1.7 z_i)}{1 + \exp(1.7 z_i)} = \frac{\exp(1.7 a_i(\theta - b_i))}{1 + \exp(1.7 a_i(\theta - b_i))}, \quad (2)$$

where 1.7 is a scaling factor that makes the item slope parameter comparable to its value under the normal-ogive parameterization (Equation 1). Including the 1.7 is critical because only the normal-ogive parameterization can be easily linked with ordinal factor analytic terms.
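As a quick numerical check of the 1.7 scaling, the following sketch (Python with NumPy/SciPy; it is not part of the original article, and the item parameters a = 1.2, b = 0.5 are arbitrary illustrative values) evaluates Equations (1) and (2) on a grid of θ values and reports the largest discrepancy between the two curves.

```python
import numpy as np
from scipy.stats import norm

def p_normal_ogive(theta, a, b):
    """Two-parameter normal-ogive model, Equation (1): Phi(a * (theta - b))."""
    return norm.cdf(a * (theta - b))

def p_2pl(theta, a, b, D=1.7):
    """Two-parameter logistic model, Equation (2), with scaling constant D = 1.7."""
    z = a * (theta - b)
    return 1.0 / (1.0 + np.exp(-D * z))

theta = np.linspace(-4, 4, 401)
a, b = 1.2, 0.5                      # illustrative (hypothetical) item parameters
gap = np.abs(p_normal_ogive(theta, a, b) - p_2pl(theta, a, b))
print(f"max |normal-ogive - 2PL| over theta in [-4, 4]: {gap.max():.4f}")  # ~0.01
```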


Equations (1) and (2) express the two-parameter model using its most commonly found parameterization. However, this is not the parameterization that is used by most IRT software packages, and this parameterization is not readily generalizable to the multidimensional case. Instead it is better to write

$$z_i = a_i\theta - a_i b_i \quad (3)$$

and then define an intercept term γ = −ab and rewrite the deviate as shown in Equation (4):

$$z_i = a_i\theta + \gamma_i. \quad (4)$$

Obviously, because the slope is always positive, items with negative locations (easy items) will have positive intercepts and items with positive locations (difficult items) will have negative intercepts. Thus γ reflects the relative "easiness" of the item.

Equation (4) has a straightforward generalization to the multivariate case, where there are p = 1, ..., P (possibly correlated) factors:

$$z_i = \sum_{p} a_{ip}\theta_p + \gamma_i. \quad (5)$$

In Equation (5), γ is now a multidimensional intercept parameter representing the deviate corresponding to the expected proportion endorsing an item for individuals who are at the mean on all latent traits. In the simple case of a bifactor IRT model (e.g., Gibbons & Hedeker, 1992), where each item loads on a single general factor (G) and at most one additional orthogonal group factor (p′ = relevant group factor), Equation (5) becomes

$$z_i = a_{iG}\theta_G + a_{ip'}\theta_{p'} + \gamma_i. \quad (6)$$
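The intercept/slope parameterization in Equations (4) through (6) is easy to compute directly. The sketch below (Python; all parameter values are arbitrary and only for illustration) builds the bifactor deviate of Equation (6) for one item and converts it to a normal-ogive response probability; it also shows the location-to-intercept conversion γ = −ab for the unidimensional case.

```python
import numpy as np
from scipy.stats import norm

# Unidimensional case, Equations (3)-(4): slope/location -> slope/intercept.
a, b = 1.2, 0.5                 # illustrative discrimination and location
gamma = -a * b                  # intercept term, gamma = -a * b
theta = 0.8
z_uni = a * theta + gamma       # identical to a * (theta - b)

# Bifactor case, Equation (6): general slope, one group slope, and an intercept.
a_gen, a_grp, gamma_i = 0.96, 0.80, 0.0   # illustrative item parameters
theta_gen, theta_grp = 0.5, -0.3          # person's scores on the general and group traits
z_bifactor = a_gen * theta_gen + a_grp * theta_grp + gamma_i

print(f"unidimensional deviate: {z_uni:.3f}")
print(f"bifactor deviate:       {z_bifactor:.3f}")
print(f"P(x = 1 | theta), normal ogive: {norm.cdf(z_bifactor):.3f}")
```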

Ordinal Factor Analytic Models

Applying the usual (linear) factor analysis model to dichotomous or polytomous items is problematic for a variety of reasons (see Bernstein & Teng, 1989; Wirth & Edwards, 2007). However, the model can be easily modified to accommodate ordinal responses by adding a threshold process to the linear factor model for continuously distributed responses (Christoffersson, 1975; Maydeu-Olivares, 2006; Muthen, 1979). For example, consider the binary case and n items. The ordinal factor analysis model assumes that the observed 0 or 1 item response is a discrete realization of a continuous and normally distributed latent response process (x*) underlying the items. More specifically, consider the fully standardized model:

$$E(x_i^* \mid \theta_1, \ldots, \theta_P) = x_i^* = \lambda_{i1}\theta_1 + \lambda_{i2}\theta_2 + \lambda_{i3}\theta_3 + \cdots + \lambda_{iP}\theta_P + e_i. \quad (7)$$

In this model, x* is not an observed 0 or 1 item score but rather a continuous and normally distributed response propensity variable assumed to underlie item performance. In addition, λ is a factor loading (correlation), which, in the unidimensional case, can be shown to be analogous to an item-test biserial correlation. To develop the model further, an item threshold parameter (τ) needs to be specified such that x_i = 1 if x_i* ≥ τ_i, and x_i = 0 if x_i* < τ_i. That is, an individual will endorse an item only if his or her response propensity is above the item's threshold.

The model in Equation (7) assumes that the errors are multivariate normal and uncorrelated with each other and with the latent traits. The implied correlation matrix is thus

$$\Sigma^* = \Lambda\Phi\Lambda' + \Theta, \quad (8)$$

where Σ* is the implied correlation matrix for the "hypothetical" latent propensities (i.e., it is a tetrachoric correlation matrix), Λ is an n × p matrix of factor loadings, Φ is a p × p matrix of factor correlations, and Θ is an n × n diagonal matrix of residual variances such that Θ = I − diag(ΛΦΛ′). It turns out that the OFA model just described and the normal-ogive model in Equation (1) are equivalent models. Indeed, we can write the normal deviate for the multidimensional or bifactor cases:

$$z_i = \frac{\lambda_{i1}\theta_1 + \lambda_{i2}\theta_2 + \cdots + \lambda_{iP}\theta_P - \tau_i}{\sqrt{1 - \sum_p \lambda_{ip}^2}} \quad \text{or} \quad z_i = \frac{\lambda_{iG}\theta_G + \lambda_{ip'}\theta_{p'} - \tau_i}{\sqrt{1 - \sum_p \lambda_{ip}^2}}. \quad (9)$$

Clearly, in either case the normal deviate is formed as a loading-weighted linear combination of the latent traits minus the threshold, which is then scaled relative to the square root of an item's unique variance (the residual standard deviation of x*). Finally, when either of the normal deviates in Equation (9) is substituted into Equation (1), the model becomes a normal-ogive OFA model.

Transformations

The equations to translate between an OFA and the two-parameter normal-ogive IRT model are cited in dozens of readily available sources (e.g., Ackerman, 2005; Knol & Berger, 1991; McDonald, 2000; McLeod et al., 2001), and are obvious through inspection of Equation (9). Assuming that the latent factors are uncorrelated, the translations between FA loadings and IRT slopes (normal-ogive metric) are

$$\lambda_{ip} = \frac{a_{ip}}{\sqrt{1 + \sum_{p=1}^{P} a_{ip}^2}}; \qquad a_{ip} = \frac{\lambda_{ip}}{\sqrt{c_i}} = \frac{\lambda_{ip}}{\sqrt{1 - \sum_{p=1}^{P} \lambda_{ip}^2}}. \quad (10)$$

The denominator in the latter equation is readily interpretable as the square root of the item's uniqueness, c_i (one minus the communality). Thus, it is immediately apparent that a slope a on a particular dimension is a function of both an item's loading on that dimension, λ, and its uniqueness, c, within the context of a specific model (Muthen & Christoffersson, 1981, p. 411); items with more common variance in the OFA model will have relatively higher slopes in the IRT parameterization. This is exactly why the interpretation of equivalent multidimensional IRT and OFA models is challenging, as we will show shortly.

Note that the denominator of either of the above equations becomes considerably more complicated in the case of multiple correlated dimensions. For example, in the case of two correlated factors, the denominator for the IRT slope equation is

$$\sqrt{c_i} = \sqrt{1 - \sum_{p=1}^{P} \lambda_{ip}^2 - 2\lambda_1\lambda_2\phi_{12}}. \quad (11)$$

Finally, the translation between IRT item intercepts and factor analytic thresholds follows a logic similar to that of Equation (10):

$$\tau_i = \frac{-\gamma_i}{\sqrt{1 + \sum_{p=1}^{P} a_{ip}^2}}; \qquad \gamma_i = \frac{-\tau_i}{\sqrt{1 - \sum_{p=1}^{P} \lambda_{ip}^2}}. \quad (12)$$
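A minimal sketch of these translations (Python/NumPy; written for this transcript rather than taken from the article) is given below. It assumes orthogonal factors, as in Equations (10) and (12), and reproduces the conversion used for the first content cluster of the multidimensional example in Table 1 (general loading .60, group loading .50).

```python
import numpy as np

def loadings_to_irt(lambdas, tau):
    """Equations (10) and (12): FA loadings/threshold -> normal-ogive slopes/intercept.

    lambdas : 1-D array of an item's loadings on orthogonal factors.
    tau     : the item's threshold.
    """
    uniqueness = 1.0 - np.sum(np.square(lambdas))            # c_i = 1 - communality
    a = np.asarray(lambdas) / np.sqrt(uniqueness)            # slopes
    gamma = -tau / np.sqrt(uniqueness)                       # intercept
    return a, gamma

def irt_to_loadings(a, gamma):
    """Inverse direction of Equations (10) and (12): slopes/intercept -> loadings/threshold."""
    denom = np.sqrt(1.0 + np.sum(np.square(a)))
    lambdas = np.asarray(a) / denom
    tau = -gamma / denom
    return lambdas, tau

# Item from the top half of Table 1: general loading .60, group loading .50, threshold 0.
a, gamma = loadings_to_irt([0.60, 0.50], tau=0.0)
print(np.round(a, 2))           # ~ [0.96, 0.80], matching the ag and a1 values in Table 1

# Round trip back to the loading metric.
lambdas, tau = irt_to_loadings(a, gamma)
print(np.round(lambdas, 2))     # recovers [0.60, 0.50]
```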

Interpreting Multidimensional Solutions in OFA and IRT

When data are strictly unidimensional the relations between OFA and IRT are simple because the item parameter estimates leave the researcher with much the same impression of the data structure and psychometric functioning of the items. For example, in the first column in the top half of Table 1 are factor loadings for 15 items. This matrix mimics a hypothetical test with three content clusters of five items each, where the relation between the item and common latent trait is equal within cluster but not between. In the next column are the transformed IRT parameters. Clearly, the same message about the data structure is conveyed: Items 1 through 5 are the best, Items 11 through 15 are the worst, and the relations between item and latent trait are the same within cluster but not between.

Such symmetry of translation is, generally speaking, not possible when data have a multidimensional structure and the items vary in communality. In the top middle portion of Table 1 we display the loading matrix for a bifactor model with one general and three orthogonal group factors. The items are all equally related to the general trait, but the content clusters differ in their loadings on the group factors. In the final set of columns are shown the equivalent transformed IRT model parameters. The items do not have equal relations to the general trait, but rather the subset of items with the smallest uniqueness appear to be the best items in terms of measuring the general factor. In short, the factor analytic parameterization suggests a different psychometric interpretation than under the IRT parameterization.

In the bottom portion of Table 1 we reversed the process by starting with an IRT model with equal slopes on the general factor and varying the group factor slopes. Specifically, the item slopes on the general factor are 0.75, and the group slopes are 0.58, 0.32, and 0.10. The true slopes in the bottom portion of Table 1 mimic the true factor values in the top half of Table 1 in that they are the IRT slopes implied by those factor loadings when the multidimensionality is ignored; that is, the true slopes are simply the true loadings divided by the square root of 1 minus the loading squared. Clearly, the right-hand side in the lower half of Table 1 demonstrates that when IRT parameters are translated to OFA terms, the items no longer load on the general factor equally.

Table 1. Demonstration of OFA and IRT Transformations

      Unidimensional     Multidimensional
      FA     IRT         FA                        IRT
Item  λ      a           λg    λ1    λ2    λ3      ag    a1    a2    a3
1     0.78   1.17        0.60  0.50              0.96  0.80
2     0.78   1.17        0.60  0.50              0.96  0.80
3     0.78   1.17        0.60  0.50              0.96  0.80
4     0.78   1.17        0.60  0.50              0.96  0.80
5     0.78   1.17        0.60  0.50              0.96  0.80
6     0.67   0.90        0.60        0.30        0.81        0.40
7     0.67   0.90        0.60        0.30        0.81        0.40
8     0.67   0.90        0.60        0.30        0.81        0.40
9     0.67   0.90        0.60        0.30        0.81        0.40
10    0.67   0.90        0.60        0.30        0.81        0.40
11    0.61   0.77        0.60              0.10  0.76              0.13
12    0.61   0.77        0.60              0.10  0.76              0.13
13    0.61   0.77        0.60              0.10  0.76              0.13
14    0.61   0.77        0.60              0.10  0.76              0.13
15    0.61   0.77        0.60              0.10  0.76              0.13

      Unidimensional     Multidimensional
      IRT    FA          IRT                       FA
Item  a      λ           ag    a1    a2    a3      λg    λ1    λ2    λ3
1     1.17   0.78        0.75  0.58              0.54  0.42
2     1.17   0.78        0.75  0.58              0.54  0.42
3     1.17   0.78        0.75  0.58              0.54  0.42
4     1.17   0.78        0.75  0.58              0.54  0.42
5     1.17   0.78        0.75  0.58              0.54  0.42
6     0.90   0.67        0.75        0.32        0.58        0.25
7     0.90   0.67        0.75        0.32        0.58        0.25
8     0.90   0.67        0.75        0.32        0.58        0.25
9     0.90   0.67        0.75        0.32        0.58        0.25
10    0.90   0.61        0.75        0.32        0.58        0.25
11    0.77   0.61        0.75              0.10  0.60              0.08
12    0.77   0.61        0.75              0.10  0.60              0.08
13    0.77   0.61        0.75              0.10  0.60              0.08
14    0.77   0.61        0.75              0.10  0.60              0.08
15    0.77   0.61        0.75              0.10  0.60              0.08

Note: OFA = ordinal factor analysis; IRT = item response theory. Blank cells are zero loadings/slopes.


The results in Table 1 demonstrate that although the IRT and OFA models are equivalent in implied correlational structure, in the multidimensional case there is a nonsymmetry to the interpretation owing to the critical role of an item's uniqueness. This nonsymmetry of interpretation must be considered when a researcher is using the results of one parameterization to understand what the equivalent model would look like under the alternative parameterization. Fortunately, programs for estimating multidimensional models such as TESTFACT (Bock et al., 2002) and NOHARM (Fraser, 1988; Fraser & McDonald, 1988) routinely provide results in both IRT and OFA terms. Nevertheless, this nonsymmetry of interpretation in multidimensional models can cause confusion when interpreting the results of research into the effects of model violations on item parameter estimates.

Parameter Estimation in OFA and IRT

The above section outlined the theoretical relations between OFA and IRT parameters; in practice, however, such parameters need to be estimated. Generally speaking, there are two strategies for estimating these parameters. One strategy, used in OFA, estimates the parameters using only bivariate information, in several stages, generally through the use of tetrachoric (or polychoric) correlations (see, e.g., Muthen, 1993). The other strategy, used in IRT, estimates the parameters using full information maximum likelihood estimation (Bock & Aitkin, 1981). Thus, even when the same model is estimated, as in the case of the normal-ogive model, OFA and IRT use slightly different approaches. In OFA, the threshold and factor loading parameterization is used, and parameters are estimated using bivariate information. In IRT, the intercept and slope parameterization is used, and parameters are estimated using full information maximum likelihood. Full information maximum likelihood is theoretically superior to bivariate estimation methods, but the latter are considerably faster for multidimensional models. Another advantage of the bivariate information methods is that a goodness-of-fit test of the model to the tetrachoric correlations is available (see Muthen, 1993; Maydeu-Olivares, 2006), along with an associated root mean square error of approximation. In contrast, comparable procedures for goodness-of-fit assessment in the case of full information maximum likelihood estimation (e.g., Maydeu-Olivares & Joe, 2005, 2006) are not yet available in standard software.

Reise et al. (2010) argued that the simplest bivariate estimation method, common factoring via unweighted least squares estimation of a tetrachoric matrix, is a good enough approach for the purposes of determining the robustness of unidimensional modeling of multidimensional data. This view is supported by research such as Forero and Maydeu-Olivares (2009), who performed an exhaustive Monte Carlo comparison of full and bivariate estimation strategies and concluded that for samples with more than 500 observations the differences in parameter accuracy between the two methods were negligible (and even slightly favoring unweighted least squares). In the next section, we consider the development and specification of a comparison bifactor model using the SL and target rotations.


The Schmid–Leiman and Target Bifactor Rotations

As described above, in the comparison modeling approach, estimating the degree to which model violations affect unidimensional IRT item parameter estimates involves the comparison of an exploratory bifactor model (i.e., the unrestricted model) with a highly restricted model (i.e., a unidimensional model). Although the model comparison idea is simple enough, its realization in practice is challenging because commonly used analytic rotations are geared toward identifying simple structure solutions (see Browne, 2001, for a detailed discussion). When the true data structure is bifactor in the population, exploratory analytic rotations such as promax or oblimin will fail to identify the correct underlying data structure.

It must be recognized that in practice, to use exploratory multidimensional models to judge the effects of forcing data into a restricted model, the comparison model must be plausible and reasonably reflect the population loading pattern. That is, it must approximate what the true factor loadings or IRT slopes would be without the imposed restrictions; one cannot meaningfully compare a wrong bifactor model with a more wrong unidimensional model. The challenge, given the limitations of analytic rotations, is how researchers can find an acceptable exploratory method that can identify a bifactor structure.

As noted in the steps above, Reise et al. (2010) suggested that an SL orthogonalization (Schmid & Leiman, 1957) be used as a basis for specifying a bifactor target pattern, and then a target pattern rotation (Browne, 2001) be used to estimate an unrestricted comparison model. This raises two questions. First, assuming that in the population the data structure is bifactor, under what conditions can the SL orthogonalization identify a good target matrix? By a good target matrix, we mean a matrix in which all near-zero loadings are specified (0) and all nontrivial loadings are unspecified (?). Second, given a correct target matrix, how well can target rotations recover true population parameters? In the following two sections, we use Monte Carlo simulation to address each of these issues in turn.

The Schmid–Leiman: Getting the Target Bifactor Pattern Right

Target pattern rotations require an initial target matrix of specified (0) and unspecified (?) elements. "This target matrix reflects partial knowledge as to what the factor pattern should be" (Browne, 2001, p. 124). In addition, the target matrix forms the basis for a rotation that minimizes the sum of squared differences between the target and the rotated factor pattern.

Although target rotations potentially have a self-correcting element (i.e., just because a loading is specified as zero does not mean the loading estimate must be zero, and vice versa), the robustness of target rotations to misspecified elements of the target matrix is currently unknown. Thus, at this point, target rotations (Browne, 2001) can only be expected to function optimally when the target pattern is correctly specified.


It follows that an important goal of comparison modeling is to identify a bifactor structure so that a researcher can form a target matrix such that items with nontrivial (e.g., ≥ .30) loadings on the group factors in the population are unspecified (?), and items with trivial or near-zero loadings are specified (0). Note that it is assumed that all items have nontrivial loadings on the general factor. Of course, there are many ways to specify an initial target matrix. For example, a target can be generated from theory, or from exploratory data analyses such as factor analysis, cluster analysis, or multidimensional scaling.

In conjunction with these methods, Reise et al. (2010) suggested that the SL orthogonalization be used as an exploratory tool to determine the number of identified group factors in a data matrix and to determine the pattern of trivial and nontrivial loadings on the group factors. Here, we define a nontrivial loading as being greater than or equal to .15 in an SL (Schmid & Leiman, 1957) solution (as opposed to a population value), for reasons that will be made clear shortly. In this section, we consider the ability of the SL to identify an appropriate target matrix.

The SL is an orthogonalization of a higher order factor model, which itself is a product of a common factor analysis with oblique rotation (McDonald, 2000). For example, starting with a tetrachoric correlation matrix, a researcher (a) extracts (e.g., by unweighted least squares) a specified number of primary factors and rotates the factors obliquely (e.g., oblimin) and (b) factor analyzes the primary factor correlation matrix to determine the relation between each primary factor and a general factor. Given this higher order representation, the SL is easily obtained. Specifically, an item's loading on the general factor is simply its loading on the primary factor multiplied by the loading of the primary factor on the general factor. An item's loading on a group factor is its loading on the primary factor multiplied by the square root of the disturbance (i.e., that part of the primary factor that is not explained by the general factor).
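The arithmetic in that description is simple enough to sketch directly. The fragment below (Python/NumPy; not the SCHMID routine used in the article, just an illustration of the same orthogonalization, with made-up loading values) takes a first-order oblique pattern matrix and the loadings of the primary factors on the general factor and returns the SL general and group loadings.

```python
import numpy as np

def schmid_leiman(primary_pattern, general_loadings):
    """Schmid-Leiman orthogonalization of a higher order solution.

    primary_pattern  : (n_items, n_primaries) oblique pattern matrix, assumed
                       independent cluster (each item loads on one primary).
    general_loadings : (n_primaries,) loadings of each primary factor on the
                       second-order general factor.
    Returns (general, groups): item loadings on the general factor and on the
    orthogonalized group factors.
    """
    primary_pattern = np.asarray(primary_pattern, dtype=float)
    general_loadings = np.asarray(general_loadings, dtype=float)
    # General loading = primary loading x (primary's loading on the general factor).
    general = primary_pattern @ general_loadings
    # Group loading = primary loading x sqrt(disturbance of that primary factor).
    disturbance = np.sqrt(1.0 - general_loadings ** 2)
    groups = primary_pattern * disturbance            # broadcasts over the columns
    return general, groups

# Toy higher order solution (illustrative numbers only): 6 items, 2 primaries,
# each primary loading .70 on the general factor.
pattern = np.array([[0.7, 0.0], [0.7, 0.0], [0.7, 0.0],
                    [0.0, 0.6], [0.0, 0.6], [0.0, 0.6]])
general, groups = schmid_leiman(pattern, [0.7, 0.7])
print(np.round(general, 2))   # [0.49 0.49 0.49 0.42 0.42 0.42]
print(np.round(groups, 2))    # group loadings = primary loading * sqrt(1 - .49), ~ .50 / .43
```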

The Achilles' heel of the SL procedure is that, because of the well-known proportionality constraints (Yung et al., 1999), the SL estimates of population loadings are unbiased under only a narrow range of conditions. The proportionality constraints emerge because, as described above, for each item the group and general loadings in the SL are functions of common elements (i.e., the loading of the primary on the general and the unique variance of the primary). As a consequence, a researcher cannot rely on the SL to derive an acceptable comparison model, because of the bias in factor loadings that emerges under some conditions.

loadings within each group factor. In the bottom portion is a SL of the implied corre-

lation matrix, extracting three factors using unweighted least squares and rotating

using oblimin. The SCHMID routine included in the PSYCH package (Revelle,

2009) of the R software program (R Development Core Team, 2008) was used for

all analyses. Obviously, in this example, the SL perfectly reflects the population struc-

ture because the loadings within the group factors are proportional to the general fac-

tor. If real data were to always mimic this perfectly proportional structure, the SL

694 Educational and Psychological Measurement 71(4)

Page 12: Target Rotations and - ub.edu · rotations can be used productively to establish a plausible comparison model. Keywords item response theory, bifactor model, ordinal factor analysis,

would always yield the correct comparison matrix (given enough data) and there

would be no need for target bifactor rotations.

Table 2. The Schmid–Leiman Orthogonalization Under Three Conditions

True population structure

      IC: proportional        IC: not proportional      IC basis
Item  Gen  G1   G2   G3       Gen  G1   G2   G3         Gen  G1   G2   G3
1     .50  .60                .40  .27                  .50  .60  .50
2     .50  .60                .40  .68                  .50  .60
3     .50  .60                .40  .65                  .50  .60
4     .50  .60                .40  .21                  .50  .60
5     .50  .60                .40  .65                  .50  .60
6     .50       .50           .40       .43             .50       .50  .50
7     .50       .50           .40       .51             .50       .50
8     .50       .50           .40       .29             .50       .50
9     .50       .50           .40       .27             .50       .50
10    .50       .50           .40       .19             .50       .50
11    .50            .40      .40            .61        .50  .50       .40
12    .50            .40      .40            .13        .50            .40
13    .50            .40      .40            .46        .50            .40
14    .50            .40      .40            .64        .50            .40
15    .50            .40      .40            .53        .50            .40

Schmid–Leiman

      IC: proportional        IC: not proportional      IC basis
Item  Gen  G1   G2   G3       Gen  G1   G2   G3         Gen  G1   G2   G3
1     .50  .60  .00  .00      .33  .32  .09  .05        .65  .49  .36  .11
2     .50  .60  .00  .00      .41  .67  .01  .01        .49  .61  .01  .02
3     .50  .60  .00  .00      .40  .65  .00  .00        .49  .61  .01  .02
4     .50  .60  .00  .00      .31  .27  .10  .06        .49  .61  .01  .02
5     .50  .60  .00  .00      .40  .65  .00  .00        .49  .61  .01  .02
6     .50  .00  .50  .00      .41  .01  .41  .01        .61  .05  .40  .38
7     .50  .00  .50  .00      .43  .02  .45  .02        .50  .01  .50  .03
8     .50  .00  .50  .00      .37  .02  .33  .02        .50  .01  .50  .03
9     .50  .00  .50  .00      .36  .02  .32  .03        .50  .01  .50  .03
10    .50  .00  .50  .00      .33  .04  .27  .05        .50  .01  .50  .03
11    .50  .00  .00  .40      .41  .01  .01  .60        .56  .45  .05  .33
12    .50  .00  .00  .40      .30  .07  .12  .20        .42  .05  .04  .48
13    .50  .00  .00  .40      .38  .02  .03  .48        .42  .05  .04  .48
14    .50  .00  .00  .40      .42  .01  .02  .63        .42  .05  .04  .48
15    .50  .00  .00  .40      .39  .01  .01  .54        .42  .05  .04  .48

Note: IC = independent cluster. In the published table, the loadings for the three items with cross-loadings in the population (Items 1, 6, and 11 under IC basis) are shown in boldface.

In the top portion of Table 2, in the middle set of columns, we specified an extreme condition where loadings on the general factor are all .40 but loadings within each group factor were selected from a random uniform distribution ranging from 0.10 to 0.70. Note that this structure is perfect independent cluster (IC; McDonald, 1999, 2000); that is, there are no items with cross-loadings on the group factors (which is equivalent to saying there are no items with cross-loadings in the oblimin solution), and each orthogonal group factor has three items to identify it. This particular true loading pattern, although probably unrealistic in practice, was selected because of its clear violations of proportionality.

After converting these loadings into a correlation matrix, in the bottom portion of Table 2 in the middle set of columns are shown the SL loadings for these independent cluster data. Consider first group factor one (Items 1 through 5), where for three of the five items the true group factor loadings are higher than the true general factor loadings, and for the remaining two items the group factor loadings are well below the loadings for the general. For the latter two items, in the SL the loadings on the general factor are underestimated, and the loadings on the group factor are overestimated. In addition, small positive loadings occur for these items on Group Factors 2 and 3. For the items in Group Factor 2, Items 6 and 7 have overestimated general factor loadings and underestimated group factor loadings. The reverse occurs for Items 8 through 10.

Such findings are easily anticipated from inspection of the original communalities and of the loadings in an obliquely rotated factor pattern. However, the important lesson is that although the SL can reasonably reproduce the correct pattern of trivial and nontrivial loadings even under these extreme conditions, the specific factor loadings in the SL are not proper estimates of their corresponding population values. This result can be entirely attributed to the proportionality constraints, and it implies that the SL loadings should not, generally speaking, be used as a comparison matrix.

Finally, in the third set of columns we have reproduced the population loading pattern from the first demonstration but added large (.50) cross-loadings for Items 1, 6, and 11. The data are no longer independent cluster but rather have an independent cluster basis (ICB; each group factor is identified by three items with simple loadings, but there are also items with cross-loadings). In the bottom portion we see the effects of cross-loadings on the SL results. Specifically, a cross-loading raises an item's communality and thus causes an item's loading on the general factor to increase. This overestimation is proportional to the communality of the item. In turn, loadings on the group factors decrease relative to their true population values. The critical lesson learned from this demonstration is that in the SL one must proceed very cautiously in interpreting the size of the loadings, and they should not be taken seriously as estimates of population parameters.

The above demonstrations do not necessarily spoil our ability to use the SL to identify an appropriate target matrix. In specifying a target pattern, a researcher does not have to get the loadings right but only the pattern of trivial and nontrivial loadings. However, the fact that loadings in the SL are not good estimates of their population values must be considered in judging how to specify an initial target in real data situations, a topic we now address.


As noted, we are interested in identifying any group factor loading that is greater than or equal to .30 in the population (we always assume items on the general factor meet this criterion). To accomplish this, we use the SL results to build a target pattern by (a) leaving all items unspecified (?) on the general factor (if preliminary analyses suggest otherwise, we delete the item) and (b) on the group factors, leaving an element unspecified (?) if its SL loading is greater than or equal to .15 and specifying it (0) otherwise. The reason we selected this liberal criterion is that we are most concerned with not missing any cross-loadings if they exist in the population; if an item has a sizable loading on the general and group factor, as well as a cross-loading, its communality will tend to be high and its group factor loadings underestimated in the SL. To allow for this potential underestimation we selected the liberal .15 criterion.
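The decision rule just described is easy to express in code. Below is a small sketch (Python/NumPy; the function and variable names are ours, not the authors') that converts an SL loading matrix into a target pattern: the general column is always left unspecified, and a group-factor element is left unspecified only when its SL loading is at least .15. Unspecified cells are coded NaN here and specified cells 0, mirroring the (?) / (0) notation in the text.

```python
import numpy as np

UNSPECIFIED = np.nan   # corresponds to "?" in the text
SPECIFIED = 0.0        # corresponds to "0" in the text

def build_target(sl_loadings, cutoff=0.15):
    """Build a bifactor target pattern from a Schmid-Leiman solution.

    sl_loadings : (n_items, 1 + n_groups) array; column 0 is the general factor.
    Returns an array of the same shape containing NaN (unspecified) or 0 (specified).
    """
    sl = np.asarray(sl_loadings, dtype=float)
    target = np.where(np.abs(sl) >= cutoff, UNSPECIFIED, SPECIFIED)
    target[:, 0] = UNSPECIFIED          # all items left unspecified on the general factor
    return target

# Tiny illustration with made-up SL loadings (general, G1, G2):
sl = np.array([[0.49, 0.55, 0.08],
               [0.42, 0.16, 0.03],
               [0.45, 0.02, 0.61]])
print(build_target(sl))
# [[nan nan  0.]
#  [nan nan  0.]
#  [nan  0. nan]]
```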

This decision rule raises the following question: How well does it work in practice? Even without data, it is obvious that the largest problems will occur with loadings that are around .30 in the population. Depending on sample size and the distortions of the SL, such values may be underestimated in real data and thus incorrectly specified. The other, less serious problem is that of loadings that are truly zero in the population but overestimated in the SL using real data. This would cause the error of not specifying an element when it truly should be specified. Because of limited research on the effects of incorrect targets in target rotations, it is unclear which of these potential errors is more serious. Given the self-correcting nature of target rotations mentioned previously, neither may be serious if a researcher allows an iterative method of defining a target rotation. Such iteration strategies are beyond the present scope, however.

In what follows, we demonstrate the fitting of targeted rotations to multidimensional data in order to evaluate unidimensional and restricted bifactor IRT models. First, we evaluate the ability of the SL to recover a good target matrix, and second, we evaluate the target rotations themselves (assuming a correct target). In both cases, we specify a true population factor loading matrix, and then we convert that matrix into a true population tetrachoric matrix Σ* using the relation given in Equation (13):

$$\Sigma^* = \Lambda\Phi\Lambda' + \Theta. \quad (13)$$

This tetrachoric matrix is then used to generate dichotomous data. A sample tetrachoric matrix is estimated from the simulated dichotomous data and submitted to the analyses detailed in later sections. In all analyses, we specified a structure with 15 items, and OFA threshold (or IRT intercept) parameters are fixed to zero for all items. This was done to control for the possibility that the estimation of loadings or slopes is affected by the value of the thresholds or intercepts, respectively.
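A sketch of this data-generating step is shown below (Python/NumPy; a re-expression of the procedure as we read it, not the authors' code). It builds the implied tetrachoric matrix from a population bifactor loading matrix via Equation (13) with orthogonal factors (Φ = I), draws multivariate normal response propensities, and dichotomizes them at thresholds of zero.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_binary(loadings, n_persons, thresholds=None):
    """Generate dichotomous item responses implied by a population loading matrix.

    loadings   : (n_items, n_factors) orthogonal bifactor pattern (general + groups).
    thresholds : item thresholds; zero for all items, as in the simulations described above.
    """
    loadings = np.asarray(loadings, dtype=float)
    n_items = loadings.shape[0]
    if thresholds is None:
        thresholds = np.zeros(n_items)
    # Equation (13) with Phi = I: implied tetrachoric matrix of the latent propensities.
    sigma = loadings @ loadings.T
    np.fill_diagonal(sigma, 1.0)                      # i.e., Theta = I - diag(Lambda Lambda')
    # Draw latent propensities x* and apply the threshold process.
    x_star = rng.multivariate_normal(np.zeros(n_items), sigma, size=n_persons)
    return (x_star >= thresholds).astype(int)

# Population structure: 15 items, general loading .50, group loadings .50 in blocks of 5.
gen = np.full(15, 0.50)
groups = np.zeros((15, 3))
for g in range(3):
    groups[5 * g:5 * (g + 1), g] = 0.50
pop_loadings = np.column_stack([gen, groups])

data = simulate_binary(pop_loadings, n_persons=500)
print(data.shape, data.mean(axis=0).round(2))   # 500 x 15; endorsement rates near .50
```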

Simulations

To explore the efficacy of the SL to suggest an initial target, we conducted two sets of Monte Carlo analyses. In the first, we specified true population bifactor structures that were independent cluster: each item loads on the general factor and one and only one group factor. For all analyses, we specified 15 items, with 5 items loading on each of three group factors. We defined three levels of loadings: low = .3 to .5, medium = .4 to .6, and high = .5 to .7. The design was then completely crossed, with general and group factor loadings randomly selected from a uniform distribution with boundaries set to either low, medium, or high values (defined above). For example, conditions ran from general factor loadings high and group factor loadings low to general factor loadings low and group factor loadings high. Note that because we knew the true population pattern matrix, we automatically knew the correct target matrix. Note also that the true loadings specified in the leftmost columns of Tables 3 and 4 are means of the uniform distribution from which the simulated loadings were sampled. For example, loadings sampled from a uniform distribution ranging from 0.4 to 0.6 ("medium," as defined above) would be indicated in the tables as 0.5 (the mean of said distribution).

Based on the true loading matrices simulated above, we computed the implied population correlation matrix. This process was then repeated 1,000 times for three sample size conditions of 250, 500, and 1,000. For each simulated data set, we then estimated the tetrachoric correlation matrix and conducted an SL orthogonalization, extracting three group factors using unweighted least squares and rotating to an oblimin solution. For each SL result, we specified a target matrix using the .15 criterion mentioned above. The outcome of interest is simply the number of times each element of the target matrix was correctly specified.
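For completeness, the outcome tally can be written in a few lines of code. The sketch below (Python/NumPy; our own illustration, reusing the NaN-for-unspecified coding from the earlier build_target sketch but otherwise self-contained) compares a target derived from one replication against the correct target and counts the two kinds of misspecification discussed next.

```python
import numpy as np

def target_errors(derived, correct):
    """Count misspecified group-factor cells in a derived target matrix.

    Both arrays use NaN for unspecified (?) cells and 0 for specified (0) cells;
    column 0 (the general factor) is excluded because it is always unspecified.
    Returns (false_unspecified, false_specified):
      false_unspecified - cells that are zero in the population but were left unspecified,
      false_specified   - cells that are nonzero in the population but were specified.
    """
    d = np.isnan(np.asarray(derived)[:, 1:])   # True where the derived target is "?"
    c = np.isnan(np.asarray(correct)[:, 1:])   # True where the correct target is "?"
    false_unspecified = int(np.sum(d & ~c))
    false_specified = int(np.sum(~d & c))
    return false_unspecified, false_specified

# Toy example: one group-factor cell of each error type.
correct = np.array([[np.nan, np.nan, 0.0],
                    [np.nan, np.nan, 0.0]])
derived = np.array([[np.nan, np.nan, np.nan],   # error (a): zero loading left unspecified
                    [np.nan, 0.0,    0.0]])     # error (b): real loading specified as zero
print(target_errors(derived, correct))          # (1, 1)
```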

There are two types of mistakes possible: (a) a loading that is zero in the population is estimated above .15 and is thus left erroneously unspecified (?), and (b) a loading that is .30 or above in the population is underestimated in the data and gets erroneously specified (0). The results under IC conditions are very easy to summarize. When the data are IC and the sample size is 500 and above, errors in specifying the cells of the target matrix occur less than 1% of the time. Although we simulated a total of six conditions (strong general, strong groups; strong general, medium groups; etc.), each with sample sizes of 250, 500, and 1,000, we show only the four most extreme conditions. Table 3 shows the percentages of times errors occurred in each cell of the target matrix under conditions of a strong general trait with strong group factors (top portion) and a weak general trait with weak group factors (bottom portion). Note that the proportions of errors for the general factor are not shown, because those cells are always (correctly) unspecified in the comparison modeling procedure. As expected, strong general and group factor loadings are almost always discovered and specified correctly (for the target) by the SL, even in small sample sizes. In sample sizes above 500, it is fair to say the SL will always recover a correctly specified target in this strong general/strong groups condition.

The condition involving a weak general factor combined with weak group factors (bottom portion of Table 3) is far more likely to cause problems for the SL when trying to recover the true target. This is because low loadings are more vulnerable to underestimation below 0.15, especially in small samples. Estimation below this threshold results in the target element erroneously being specified (0) rather than unspecified (?). Even given this problematic condition of weak general and group factors, however, sufficient sample size (500-1,000) can correct the problem almost entirely.

Table 3. Effects of Sample Size on the Ability of the Schmid–Leiman to Specify a Correct Target

Demonstration A: Strong general, strong groups

      True loadings   Frequency of incorrect specification (%)
                       N = 250        N = 500        N = 1,000
Item  Gen   Group      G1  G2  G3     G1  G2  G3     G1  G2  G3
1     0.60  0.60       0   3   1      0   0   0      0   0   0
2     0.60  0.60       0   0   2      0   0   0      0   0   0
3     0.60  0.60       0   1   2      0   0   0      0   0   0
4     0.60  0.60       0   1   1      0   0   0      0   0   0
5     0.60  0.60       0   3   2      0   0   0      0   0   0
6     0.60  0.60       2   0   1      0   0   1      0   0   0
7     0.60  0.60       1   0   2      0   0   0      0   0   0
8     0.60  0.60       1   0   0      0   0   0      0   0   0
9     0.60  0.60       3   0   3      0   0   0      0   0   0
10    0.60  0.60       1   0   2      0   0   0      0   0   0
11    0.60  0.60       2   0   0      1   0   0      0   0   0
12    0.60  0.60       0   3   0      0   0   0      0   0   0
13    0.60  0.60       1   1   0      0   0   0      0   0   0
14    0.60  0.60       1   1   0      0   0   0      0   0   0
15    0.60  0.60       4   4   0      0   0   0      0   0   0

Demonstration B: Weak general, weak groups

      True loadings   Frequency of incorrect specification (%)
                       N = 250        N = 500        N = 1,000
Item  Gen   Group      G1  G2  G3     G1  G2  G3     G1  G2  G3
1     0.40  0.40       3   17  15     0   4   2      0   0   0
2     0.40  0.40       1   16  18     1   4   1      0   1   1
3     0.40  0.40       2   24  14     2   5   3      0   0   0
4     0.40  0.40       2   22  13     1   3   2      0   0   2
5     0.40  0.40       1   18  17     2   10  6      0   3   0
6     0.40  0.40       25  6   15     6   0   5      0   0   1
7     0.40  0.40       16  4   19     2   0   3      0   0   0
8     0.40  0.40       20  5   17     4   1   6      1   0   2
9     0.40  0.40       19  2   9      4   0   6      1   0   0
10    0.40  0.40       14  2   21     6   0   9      0   0   0
11    0.40  0.40       23  12  1      5   5   0      0   0   0
12    0.40  0.40       23  10  1      3   6   0      0   3   0
13    0.40  0.40       19  15  0      2   4   0      2   1   0
14    0.40  0.40       13  15  2      7   5   1      0   1   0
15    0.40  0.40       20  24  3      6   1   0      1   0   0

Note: Group = the item's own group factor (Items 1-5 load on G1, Items 6-10 on G2, and Items 11-15 on G3).

Table 4 shows the equivalent results for conditions in which the general factor is strong and the group factors are weak (top portion) and vice versa (bottom portion). Results in the top portion of Table 4 suggest that the existence of a strong general factor improves the SL's ability to recover the true target, but perhaps not enough to justify sample sizes below 500. Whereas a condition involving strong general and group factors (top portion of Table 3) appears robust to small samples, the same cannot necessarily be said of the condition shown in the top portion of Table 4. The bottom portion of Table 4 contains the results when the top portion is reversed. Interestingly, the existence of strong group factors (and a weak general factor) is even more conducive to correct target specification by the SL than the reverse condition (strong general, weak groups). Indeed, given strong group factors, one can expect the SL to specify a correct target 90% of the time with a sample size of 250. Conditions involving medium general and group factor strengths are not shown, because these can be interpolated from the four extreme conditions in Tables 3 and 4.

Target Bifactor Rotations: How Accurate?

Given the results above, we now turn to consideration of the accuracy of targeted rotations under different conditions, assuming that the target matrix is correctly specified. As noted, rotations to a target structure are not new (e.g., Tucker, 1940), but the rotation of a factor pattern to a partially specified target matrix (Browne, 1972a, 1972b, 2001) is only recently gaining increased attention due to the availability of software packages that implement targeted and other types of nonstandard rotation methods (e.g., Mplus, Asparouhov & Muthen, 2008; comprehensive exploratory factor analysis [CEFA], Browne, Cudeck, Tateneni, & Mels, 2004). The Mplus program allows the user to conduct Monte Carlo simulations by specifying a true population loading matrix and a target pattern matrix of specified (0) and unspecified (?) elements. The weighted least squares estimator was used.

"This target matrix reflects partial knowledge as to what the factor pattern should be" (Browne, 2001, p. 124). The target matrix forms the basis for a rotation that minimizes the sum of squared differences between the target and the rotated factor pattern. It is important to recognize that a specified element of a target pattern matrix is not the same as a fixed element in structural equation modeling. A fixed element in structural equation modeling is not allowed to vary at all from its specified value. If that value is unreasonable, it will manifest in poor fit statistics (comparative fit index, root mean square error of approximation, etc.). In targeted rotations, however, a "fixed/specified" element is free to vary from its "specified" value if a different value will lead to better overall fit of the target to the estimated loading matrix. As mentioned above, in targeted rotation, "fit" is measured by the sum of squared deviations between the target and the rotated pattern matrix.
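To make the idea concrete, here is a compact sketch of a partially specified orthogonal target rotation using a gradient projection scheme of the sort described by Browne (2001). It is written in Python/NumPy for this transcript and is not the Mplus or CEFA implementation used in the article (both of which also handle oblique target rotation); the loading values below are arbitrary illustrative numbers, and only the specified (zero) cells of the target contribute to the least squares criterion.

```python
import numpy as np

rng = np.random.default_rng(7)

def target_rotation_orth(A, target, mask, n_starts=5, max_iter=500, tol=1e-8):
    """Orthogonal rotation of A toward a partially specified target.

    Minimizes sum over specified cells of (A T - target)^2 over orthogonal T;
    `mask` is 1 where a target element is specified and 0 where it is unspecified.
    """
    best_L, best_f = None, np.inf
    k = A.shape[1]
    for start in range(n_starts):
        T = np.eye(k) if start == 0 else np.linalg.qr(rng.standard_normal((k, k)))[0]
        alpha = 1.0
        L = A @ T
        f = np.sum(mask * (L - target) ** 2)
        for _ in range(max_iter):
            G = A.T @ (2 * mask * (L - target))      # gradient of the criterion wrt T
            M = T.T @ G
            Gp = G - T @ (M + M.T) / 2               # projection onto the orthogonal manifold
            s = np.linalg.norm(Gp)
            if s < tol:
                break
            alpha *= 2
            for _ in range(30):                      # backtracking line search
                U, _, Vt = np.linalg.svd(T - alpha * Gp)
                T_new = U @ Vt                       # nearest orthogonal matrix
                L_new = A @ T_new
                f_new = np.sum(mask * (L_new - target) ** 2)
                if f_new < f - 0.5 * s ** 2 * alpha:
                    break
                alpha /= 2
            T, L, f = T_new, L_new, f_new
        if f < best_f:
            best_f, best_L = f, L
    return best_L, best_f

# True orthogonal bifactor pattern: one general factor plus two group factors.
true = np.array([[0.6, 0.5, 0.0], [0.5, 0.4, 0.0], [0.7, 0.6, 0.0],
                 [0.6, 0.0, 0.5], [0.5, 0.0, 0.6], [0.4, 0.0, 0.3]])
# "Unrotated" loadings: the same factor space in an arbitrary orientation.
A = true @ np.linalg.qr(rng.standard_normal((3, 3)))[0]
# Partially specified target: the zeros are specified, everything else is unspecified.
mask = (true == 0.0).astype(float)
target = np.zeros_like(true)

L_rot, f_min = target_rotation_orth(A, target, mask)
print(round(f_min, 6))        # ~0: the specified zeros are reproduced
print(np.round(L_rot, 2))     # should reproduce `true` up to column sign
```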

Table 4. Effects of Sample Size on the Ability of the Schmid–Leiman to Specify a Correct Target

Demonstration C: Strong general, weak groups

      True loadings   Frequency of incorrect specification (%)
                       N = 250        N = 500        N = 1,000
Item  Gen   Group      G1  G2  G3     G1  G2  G3     G1  G2  G3
1     0.60  0.40       0   13  7      0   4   4      0   0   0
2     0.60  0.40       2   13  20     0   4   2      0   0   0
3     0.60  0.40       0   13  9      0   4   4      0   0   0
4     0.60  0.40       0   15  9      0   2   0      0   0   0
5     0.60  0.40       0   4   11     0   2   4      0   0   0
6     0.60  0.40       7   0   13     0   0   4      0   0   0
7     0.60  0.40       20  0   7      0   0   2      0   0   0
8     0.60  0.40       15  0   7      4   0   2      0   0   0
9     0.60  0.40       17  0   11     2   0   4      0   0   2
10    0.60  0.40       11  2   7      2   0   7      0   0   0
11    0.60  0.40       13  15  0      2   2   0      2   0   0
12    0.60  0.40       17  13  0      2   0   0      0   0   0
13    0.60  0.40       9   22  0      0   2   0      0   0   0
14    0.60  0.40       22  15  0      2   2   0      0   0   0
15    0.60  0.40       13  13  0      0   2   0      0   0   0

Demonstration D: Weak general, strong groups

      True loadings   Frequency of incorrect specification (%)
                       N = 250        N = 500        N = 1,000
Item  Gen   Group      G1  G2  G3     G1  G2  G3     G1  G2  G3
1     0.40  0.60       2   4   4      0   0   0      0   0   0
2     0.40  0.60       2   4   7      0   0   0      0   0   0
3     0.40  0.60       2   2   2      0   0   0      0   0   0
4     0.40  0.60       2   4   9      0   0   0      0   1   0
5     0.40  0.60       2   9   2      0   0   2      0   0   0
6     0.40  0.60       4   0   2      0   0   0      0   0   0
7     0.40  0.60       2   0   7      0   0   2      0   0   0
8     0.40  0.60       7   0   4      0   0   0      0   0   0
9     0.40  0.60       2   0   2      0   0   2      0   0   0
10    0.40  0.60       2   0   7      2   0   0      0   0   0
11    0.40  0.60       4   4   0      0   0   0      0   0   0
12    0.40  0.60       2   4   0      0   0   0      1   0   0
13    0.40  0.60       4   7   0      0   0   0      0   0   0
14    0.40  0.60       2   9   0      0   5   0      0   0   0
15    0.40  0.60       2   11  0      0   0   0      0   0   0

Note: Group = the item's own group factor (Items 1-5 load on G1, Items 6-10 on G2, and Items 11-15 on G3).

Above, we focused on the first relevant question in evaluating the effectiveness of the comparison modeling approach: How often does the SL orthogonalization succeed in pinpointing a good target matrix? We now turn to the second question: Assuming the SL pinpoints a good target, how effective is the target rotation itself in recovering the population parameters?

Whereas population parameters in the above simulations were sampled from a uniform distribution, population parameters in the present simulations were fixed at a single number. This was necessary to compare true and estimated parameters on a continuous scale, whereas in the previous simulations we were interested only in counts of success/failure. The fixed population parameters used in the present simulations were specified simply as the means of the analogous uniform distributions from the previous simulations. For example, loadings drawn from a 0.3-0.5 uniform distribution in the above simulations are defined as 0.4 presently.

The ‘‘Monte Carlo’’ procedure in Mplus requires typical inputs: number of simu-

lated samples, number of persons per sample, population parameters from which to

generate observations, and the desired analysis to be run on said observations. Popu-

lation parameters were specified as variations on bifactor independent cluster struc-

ture directly analogous to the structures simulated in the above Monte Carlo

analyses. Weak/moderate/strong group and general factors were completely crossed

with three sample sizes (250, 500, and 1,000). Additionally, we simulated some

extreme conditions, which are indicated below. The target matrix in each condition

was specified perfectly—that is, if a particular loading is zero in the population, it

was specified as zero. Otherwise, Mplus was told to estimate it.
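As an illustration of what ‘‘specified perfectly’’ means here, the sketch below (Python with NumPy; the function name and the np.nan coding for ‘‘freely estimated’’ are our own conventions, not Mplus syntax) builds such a target for the 15-item, three-cluster bifactor structures used in these simulations: the general column is always estimated, each group loading is estimated only for the five items in its cluster, and every other element is fixed at zero.

    import numpy as np

    def bifactor_ic_target(n_items=15, n_groups=3):
        # Items 1-5 define G1, items 6-10 define G2, items 11-15 define G3.
        # np.nan = "estimate this loading"; 0.0 = "fixed at zero in the target".
        per_group = n_items // n_groups
        target = np.zeros((n_items, 1 + n_groups))
        target[:, 0] = np.nan  # general factor loadings are always estimated
        for g in range(n_groups):
            rows = slice(g * per_group, (g + 1) * per_group)
            target[rows, 1 + g] = np.nan
        return target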

Table 5 (Demonstration A) presents true and estimated loadings when the general

and group factors are equal and of moderate strength. To save space, only sample sizes

of 250 and 1,000 are shown. At a sample size of 250, loadings were recovered reason-

ably well, with an average absolute error of 0.020. Exceptions to this include Item 4’s

loading on the first group factor, with an error of 0.08, and an apparent small positive

bias on the group factors. The larger sample size (1,000) offered surprisingly little

improvement. Average absolute error (AAE) decreased to 0.014, an improvement

of only 0.006. Additionally, the group factors remained positively biased, and the

worst error (Item 6, Group 3) was not much better than when N = 250.
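The AAE values reported in this section are simply mean absolute differences between the true and estimated loading matrices. A minimal sketch follows; we assume here that the average is taken over all cells of the matrix.

    import numpy as np

    def average_absolute_error(true_loadings, estimated_loadings):
        # Mean absolute difference across all cells of the loading matrix.
        true_loadings = np.asarray(true_loadings, dtype=float)
        estimated_loadings = np.asarray(estimated_loadings, dtype=float)
        return float(np.mean(np.abs(true_loadings - estimated_loadings)))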

Demonstration B (bottom portion of Table 5) presents true and estimated loadings

when the general factor is strong but the group factors are weak. An immediately

observable phenomenon is that the general factor (which is stronger than in Demonstra-

tion A) is estimated more accurately, with AAEs of 0.009 (N = 250) and 0.004 (N = 1,000). This improved accuracy over Demonstration A extends to the group factors as well. Even though Item 14's loading on Group 3 is off by 0.10 when N = 250, the AAE on the group factors was lower than in Demonstration A (despite their being weaker).

Table 6 contains two other independent cluster examples: a weak general factor

with strong group factors (Demonstration C), and a weak general factor with unbal-

anced group factors (Demonstration D). Demonstration C is analogous to Demonstra-

tion B (Table 5), only reversed, such that the group factors actually account for more

item variance than does the general factor. Interestingly, even though the communal-

ities do not differ between Demonstrations B and C, the latter proved more challeng-

ing for the targeted rotation to estimate. At sample sizes of 250 and 1,000, the AAE


was 0.027 and 0.017, respectively (compared with 0.009 and 0.004 in Demonstration

B). The source of this larger error in Demonstration C is due mainly to consistent

underestimation of the general factor loadings. Whereas the strong general factor in Demonstration B was estimated almost perfectly by the target rotation, the weaker general factor in Demonstration C appears to be vulnerable to slight underestimation. As sample size increases, however, this underestimation approaches zero.

Table 5. True and Estimated Loadings by Monte Carlo Simulation in Mplus

Demonstration A: Simple structure (moderate general) (a)

         Population (true)        Estimated, N = 250          Estimated, N = 1,000
Item   Gen    G1    G2    G3     Gen    G1    G2    G3       Gen    G1    G2    G3
  1   0.50  0.50                0.48  0.51  0.01  0.02      0.50  0.49 -0.01 -0.01
  2   0.50  0.50                0.47  0.51  0.04  0.02      0.46  0.52  0.02  0.02
  3   0.50  0.50                0.50  0.49 -0.01  0.00      0.48  0.54  0.00  0.01
  4   0.50  0.50                0.47  0.58  0.02  0.02      0.48  0.51  0.01  0.02
  5   0.50  0.50                0.49  0.50  0.01  0.02      0.47  0.55  0.01  0.02
  6   0.50        0.50          0.48  0.02  0.50  0.03      0.50  0.01  0.51  0.01
  7   0.50        0.50          0.49  0.01  0.51  0.00      0.52  0.00  0.47  0.00
  8   0.50        0.50          0.47  0.02  0.53  0.02      0.52 -0.01  0.51  0.00
  9   0.50        0.50          0.48  0.02  0.52  0.02      0.51  0.01  0.50  0.01
 10   0.50        0.50          0.50  0.01  0.48  0.01      0.50  0.02  0.51  0.01
 11   0.50              0.50    0.46  0.02  0.01  0.53      0.48  0.01  0.02  0.50
 12   0.50              0.50    0.45  0.02  0.03  0.55      0.48  0.02  0.01  0.56
 13   0.50              0.50    0.47  0.03  0.04  0.54      0.49  0.01  0.00  0.52
 14   0.50              0.50    0.50  0.01 -0.01  0.49      0.48  0.01  0.01  0.52
 15   0.50              0.50    0.47  0.01  0.02  0.52      0.49  0.01  0.01  0.50

Demonstration B: Simple structure (strong general, weak groups) (a)

         Population (true)        Estimated, N = 250          Estimated, N = 1,000
Item   Gen    G1    G2    G3     Gen    G1    G2    G3       Gen    G1    G2    G3
  1   0.60  0.40                0.60  0.43  0.02  0.02      0.60  0.39  0.00 -0.01
  2   0.60  0.40                0.60  0.40  0.01 -0.01      0.60  0.39  0.00  0.00
  3   0.60  0.40                0.60  0.40  0.01  0.02      0.59  0.41  0.02  0.01
  4   0.60  0.40                0.59  0.46  0.00  0.01      0.59  0.44  0.01  0.01
  5   0.60  0.40                0.60  0.37  0.00  0.01      0.59  0.42  0.00 -0.01
  6   0.60        0.40          0.59  0.02  0.40  0.01      0.60  0.00  0.39 -0.01
  7   0.60        0.40          0.59 -0.01  0.41  0.00      0.59  0.01  0.40  0.01
  8   0.60        0.40          0.59  0.00  0.48  0.02      0.60  0.01  0.40  0.02
  9   0.60        0.40          0.59  0.01  0.42  0.01      0.59  0.01  0.41  0.01
 10   0.60        0.40          0.59  0.02  0.42  0.02      0.60  0.02  0.41  0.00
 11   0.60              0.40    0.59  0.00  0.00  0.40      0.60  0.01  0.01  0.43
 12   0.60              0.40    0.58  0.02  0.02  0.44      0.60  0.00  0.00  0.40
 13   0.60              0.40    0.60 -0.02  0.01  0.39      0.60  0.01  0.00  0.40
 14   0.60              0.40    0.58  0.03  0.02  0.50      0.60  0.00  0.01  0.39
 15   0.60              0.40    0.59  0.01  0.00  0.44      0.60 -0.01  0.01  0.42

Note: blank cells indicate true loadings of zero.
a. Although 1,000 replications were specified in Mplus, not all replications converged. Thus, the actual number of replications is less than 1,000.



Table 6. True and Estimated Loadings by Monte Carlo Simulation in Mplus

Demonstration C: Simple structure (weak general, strong groups) (a)

         Population (true)        Estimated, N = 250          Estimated, N = 1,000
Item   Gen    G1    G2    G3     Gen    G1    G2    G3       Gen    G1    G2    G3
  1   0.40  0.60                0.40  0.60  0.01  0.00      0.36  0.64  0.02  0.02
  2   0.40  0.60                0.36  0.61  0.04  0.01      0.38  0.59  0.00  0.01
  3   0.40  0.60                0.37  0.63  0.02  0.02      0.37  0.64  0.01  0.01
  4   0.40  0.60                0.36  0.64  0.02  0.03      0.36  0.60  0.02  0.02
  5   0.40  0.60                0.38  0.58  0.03  0.03      0.38  0.66  0.01  0.01
  6   0.40        0.60          0.36  0.02  0.65  0.03      0.39  0.01  0.64  0.01
  7   0.40        0.60          0.34  0.02  0.60  0.02      0.39  0.01  0.58  0.01
  8   0.40        0.60          0.35  0.02  0.61  0.03      0.39  0.01  0.64  0.02
  9   0.40        0.60          0.36  0.02  0.60  0.02      0.40  0.01  0.58  0.01
 10   0.40        0.60          0.34  0.04  0.62  0.03      0.39  0.02  0.58  0.01
 11   0.40              0.60    0.36  0.02  0.02  0.61      0.40  0.01  0.00  0.56
 12   0.40              0.60    0.37  0.01  0.02  0.62      0.39  0.01  0.01  0.59
 13   0.40              0.60    0.37  0.02  0.03  0.60      0.38  0.02  0.02  0.60
 14   0.40              0.60    0.35  0.03  0.02  0.65      0.38  0.02  0.02  0.63
 15   0.40              0.60    0.35  0.02  0.03  0.62      0.40  0.00  0.01  0.63

Demonstration D: Simple structure (moderate general, unbalanced groups) (a)

         Population (true)        Estimated, N = 250          Estimated, N = 1,000
Item   Gen    G1    G2    G3     Gen    G1    G2    G3       Gen    G1    G2    G3
  1   0.40  0.60                0.44  0.55 -0.02 -0.01      0.38  0.60  0.00  0.00
  2   0.40  0.60                0.36  0.62  0.03  0.03      0.35  0.62  0.02  0.02
  3   0.40  0.60                0.37  0.62  0.02  0.03      0.38  0.61  0.00  0.00
  4   0.40  0.60                0.36  0.71  0.02  0.03      0.36  0.62  0.02  0.02
  5   0.40  0.60                0.39  0.57  0.02  0.02      0.35  0.64  0.02  0.02
  6   0.40        0.50          0.39  0.01  0.50  0.03      0.39  0.01  0.49  0.01
  7   0.40        0.50          0.37  0.01  0.53  0.02      0.41  0.01  0.53  0.00
  8   0.40        0.50          0.35  0.02  0.51  0.03      0.40  0.01  0.47  0.02
  9   0.40        0.50          0.36  0.03  0.51  0.03      0.40  0.01  0.51  0.02
 10   0.40        0.50          0.36  0.02  0.52  0.02      0.42  0.01  0.50 -0.01
 11   0.40              0.40    0.36  0.01  0.03  0.40      0.38  0.02  0.02  0.46
 12   0.40              0.40    0.34  0.02  0.03  0.47      0.39  0.01  0.00  0.40
 13   0.40              0.40    0.33  0.03  0.06  0.47      0.38  0.03  0.02  0.46
 14   0.40              0.40    0.34  0.04  0.02  0.52      0.40  0.01  0.00  0.42
 15   0.40              0.40    0.34  0.03  0.03  0.43      0.38  0.01  0.02  0.38

Note: blank cells indicate true loadings of zero.
a. Although 1,000 replications were specified in Mplus, not all replications converged. Thus, the actual number of replications is less than 1,000.


Demonstration D (also in Table 6) is the most complicated of the independent clus-

ter examples shown presently, because the general factor is weak and the group factors

vary from explaining 36% of item variance (G1) to only 16% (G3). It is also the most

extreme example of sample size effects, with AAE dropping from 0.033 (N = 250) to 0.016 (N = 1,000). This is due mainly to a tendency, at the smaller sample size, to produce a few very large errors. Item 14's loading on G3 and Item 4's load-

ing on G1 are both misestimated by more than 0.10. Further simulations (not shown)

confirmed that errors of this size are indeed common with this factor structure at small

sample sizes. A second noticeable phenomenon in Demonstration D is a pattern of

increasing underestimation of the general factor as the group factors decrease in

size, with general factor loadings underestimated (on average) by 0.02 on G1, 0.04

on G2, and 0.06 on G3. This indicates a misalignment of the general latent vector caused by the unbalanced group factors.

Table 7 displays the true and estimated factor loadings for more problematic con-

ditions involving violations of independent cluster structure. In Demonstration E, the

general and group factor loadings are moderate and equal, but Item 3 cross-loads on

the third group factor. Cross-loadings present problems for a few reasons, but one of

the most important is that they indicate the existence of factors unaccounted for by the

hypothesized structure. Discovery of cross-loadings is crucial to proper item param-

eter estimation and should lead researchers to consider extracting additional factors or removing the problematic item(s) altogether.

As in Demonstrations A to C, the effect of sample size in Demonstration E (Table

7) appears negligible, decreasing average absolute error from 0.024 (N = 250) to 0.021 (N = 1,000). Additionally, even though we assume here that the target matrix

is perfect, the targeted rotation still does not estimate the cross-loading (0.50, Item 3)

perfectly. Specifically, it tends to overestimate the general factor loading and under-

estimate both the group loading and the cross-loading. These over- and underestima-

tions, which average 0.09, are large enough to cause problems for item parameter

estimation. Nonetheless, the SL (not shown) performed even worse, over- and under-

estimating the cross-loadings almost twice as badly. This lends support to our argu-

ment that performing a target rotation after performing an SL will lead to more

accurate loading estimation, even though the target is not perfect. Furthermore,

with the exception of Item 3, the target rotation recovered the true factor loadings

with accuracy comparable with Demonstrations A through D.

Finally, the population factor structure in Demonstration F (also in Table 7) is bor-

dering on incoherent. Given the severity of the violations of independent cluster structure (three unbalanced sets of two cross-loadings each), a researcher would be unlikely to overlook such

a problem, and would probably deem the hypothesized factor structure unreasonable.

We include Demonstration F here to test the limits of our proposed method. As

expected, the average absolute error was greater for Demonstration F than for any

of the previous demonstrations. The effect of sample size is also large, with AAE

decreasing from 0.039 (N = 250) to 0.027 (N = 1,000). Remarkably, however, given

sufficient sample size (1,000), the largest absolute error in Demonstration F is only

0.07. Indeed, whereas the SL orthogonalization failed miserably with such a complex


structure (not shown), the target rotation recovered 90% of the loadings within 0.06 of

their true values.

Table 7. True and Estimated Loadings by Monte Carlo Simulation in Mplus

Demonstration E: IC basis, moderate general, balanced groups, one cross-loading (a)

         Population (true)        Estimated, N = 250          Estimated, N = 1,000
Item   Gen    G1    G2    G3     Gen    G1    G2    G3       Gen    G1    G2    G3
  1   0.50  0.50                0.49  0.53  0.03  0.01      0.49  0.52  0.01  0.00
  2   0.50  0.50                0.47  0.54  0.04  0.02      0.47  0.51  0.03  0.02
  3   0.50  0.50        0.50    0.58  0.40 -0.06  0.42      0.60  0.43 -0.08  0.37
  4   0.50  0.50                0.46  0.54  0.03  0.04      0.47  0.51  0.03  0.02
  5   0.50  0.50                0.47  0.51  0.04  0.02      0.47  0.55  0.02  0.01
  6   0.50        0.50          0.48  0.01  0.51  0.00      0.47  0.01  0.57  0.02
  7   0.50        0.50          0.47  0.00  0.51  0.00      0.49  0.00  0.51 -0.01
  8   0.50        0.50          0.46  0.03  0.54  0.02      0.49  0.00  0.50  0.01
  9   0.50        0.50          0.46  0.02  0.53  0.04      0.48  0.01  0.50  0.01
 10   0.50        0.50          0.49  0.02  0.48  0.01      0.49  0.02  0.50  0.00
 11   0.50              0.50    0.47  0.03  0.05  0.48      0.49  0.01  0.02  0.51
 12   0.50              0.50    0.50  0.00  0.01  0.50      0.49  0.01  0.02  0.54
 13   0.50              0.50    0.52  0.01  0.01  0.48      0.50  0.01  0.02  0.48
 14   0.50              0.50    0.49  0.03  0.02  0.52      0.50  0.00  0.01  0.54
 15   0.50              0.50    0.50  0.01  0.01  0.50      0.50  0.01  0.02  0.49

Demonstration F: IC basis, weak general, unbalanced groups, six unbalanced cross-loadings (a)

         Population (true)        Estimated, N = 250          Estimated, N = 1,000
Item   Gen    G1    G2    G3     Gen    G1    G2    G3       Gen    G1    G2    G3
  1   0.40  0.60        0.20    0.45  0.55  0.01  0.14      0.42  0.59  0.00  0.16
  2   0.40  0.60        0.20    0.40  0.56  0.02  0.17      0.40  0.60  0.01  0.18
  3   0.40  0.60                0.36  0.64  0.03  0.02      0.37  0.62  0.03  0.00
  4   0.40  0.60                0.35  0.66  0.02  0.00      0.35  0.63  0.03  0.01
  5   0.40  0.60                0.39  0.58  0.02  0.02      0.36  0.61  0.02  0.01
  6   0.40        0.50          0.32  0.04  0.58  0.02      0.33  0.03  0.53  0.02
  7   0.40        0.50          0.34  0.01  0.53  0.02      0.35  0.02  0.51  0.02
  8   0.40        0.50  0.40    0.43 -0.02  0.45  0.35      0.44 -0.01  0.48  0.35
  9   0.40        0.50  0.40    0.43  0.00  0.46  0.33      0.45 -0.01  0.48  0.33
 10   0.40        0.50          0.30  0.05  0.63  0.03      0.36  0.02  0.56  0.00
 11   0.40              0.40    0.36  0.02  0.03  0.41      0.40  0.01  0.03  0.37
 12   0.40              0.40    0.35  0.03  0.04  0.45      0.41  0.01  0.01  0.38
 13   0.40              0.40    0.37  0.03  0.04  0.41      0.40  0.01  0.02  0.38
 14   0.40  0.60        0.40    0.45  0.55 -0.02  0.32      0.46  0.57 -0.02  0.33
 15   0.40  0.60        0.40    0.46  0.57 -0.03  0.32      0.46  0.56 -0.02  0.33

IC = independent cluster. Note: blank cells indicate true loadings of zero.
a. Although 1,000 replications were specified in Mplus, not all replications converged. Thus, the actual number of replications is less than 1,000.


Discussion

Parameters can only be estimated correctly to the degree that the data are appropriate for

a specific model. In an IRT context, when considering a highly restrictive unidimen-

sional model, item response data must have a structure that is consistent with this restric-

tion to accurately estimate population parameters. From an applied standpoint, accurate

item parameter estimation is critical because it underlies all applications of IRT models,

including (a) scaling individual differences, (b) performing scale linking, (c) developing

and administering computerized adaptive tests, (d) analyzing the psychometric func-

tioning of items and scales, and (e) identifying differential item functioning.

Nevertheless, it is generally recognized that even well-constructed psychological meas-

ures are unlikely to produce item response data that are perfectly consistent with the

restrictions of a unidimensional model. In fact, it is arguable that many psychological

measures produce item response matrices that are consistent with both a single general

common factor (i.e., a ‘‘strong’’ common factor) and additional multidimensionality caused by

the presence of two or more content parcels. Precisely because the assessment of ‘‘unidi-

mensionality’’ is so challenging, applied researchers rely heavily on rule-of-thumb guide-

lines to judge when a restricted model may be safely applied to a given dataset.

In this report, we reviewed and provided further empirical support for a proposed

‘‘comparison modeling’’ approach to evaluating the viability of particular restricted

IRT model applications (Reise et al., 2010). Specifically, in comparison modeling,

a researcher identifies an exploratory multidimensional factor rotation that is assumed

to better represent the true structure of the data. The researcher then compares the item slopes from this model with those estimated under the more restricted model. As stated by Browne

(2001, p. 113) in the context of structural equation modeling, ‘‘The discovery of misspe-

cified loadings, however, is more direct through rotation of the factor matrix than

through the examination of model modification indices.’’ We believe that a similar logic

applies to IRT model fitting as well. That is, we believe that it is better to thoroughly

understand the multidimensional structure of the data, and to identify potential modeling

problems, prior to fitting an IRT model as opposed to either relying on a rule of thumb

(e.g., ratio of first to second eigenvalue greater than 3), or fitting an IRT model and then

searching for inadequacies (e.g., inspection of correlated residuals or fit indices).

As illustrated by the present research, under the best conditions (an adequate sample size and a multidimensional structure of the independent cluster type), an SL orthogonalization can yield an adequate initial target pattern. In turn, assuming the target is correct, a tar-

get rotation can provide, in either a factor analytic or IRT metric, an accurate unre-

stricted bifactor comparison model. Finally, this bifactor comparison model can

provide a direct index of the degree of item parameter bias caused by forcing multi-

dimensional data into unidimensional models. Of course, we note that no targeted

rotation procedure can be expected to work under all the possible conditions that con-

front a researcher. Moreover, the application of targeted rotations is much more com-

plicated when working with real data, as opposed to the hypothetical examples


provided here. Thus, in the next section, we briefly consider challenges and limitations

of targeted rotations and describe directions for future research.
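As a purely illustrative rendering of the comparison itself: under the standard normal-ogive correspondence between factor loadings and IRT slopes, the bias index amounts to the difference between the slope implied by the unidimensional loading and the general-factor slope implied by the bifactor comparison model. The sketch below assumes that conversion (multiply by 1.7 for a logistic metric); it is offered as an illustration, not as the exact computation used in Reise et al. (2010).

    import numpy as np

    def slope_distortion(uni_loadings, gen_loadings, group_loadings):
        # Normal-ogive conversion of loadings to IRT slopes:
        #   a_j = lambda_j / sqrt(1 - sum of squared loadings for item j).
        # Returns the difference between the unidimensional slope and the
        # general-factor slope from the bifactor comparison model.
        uni_loadings = np.asarray(uni_loadings, dtype=float)
        gen_loadings = np.asarray(gen_loadings, dtype=float)
        group_loadings = np.asarray(group_loadings, dtype=float)  # items x groups
        a_uni = uni_loadings / np.sqrt(1.0 - uni_loadings ** 2)
        uniqueness = 1.0 - gen_loadings ** 2 - np.sum(group_loadings ** 2, axis=1)
        a_gen = gen_loadings / np.sqrt(uniqueness)
        return a_uni - a_gen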

Limitations and Future Directions

The most significant challenge of comparison IRT modeling is to identify a correct

exploratory factor model to compare with a more restricted model. The ability to

accomplish this goal depends on (a) the true structure of the data (e.g., the severity

and form of multidimensionality); (b) the ability of the target rotation method to

recover this true structure (e.g., which particular function should be minimized? should

weighted or unweighted least squares be used?), which depends partially on (c) how

the researcher specifies an initial target pattern (e.g., based on theory or a priori data

analysis?). We will focus the following discussion on this last task: identifying the

dimensionality of a target matrix and specifying the values of an initial target.

To specify a target matrix, the first challenge is the historically important problem

of identifying the dimensionality of the target matrix. Recall that to test a restricted

model (e.g., unidimensionality), the researcher needs to compare it with an explor-

atory multidimensional model. Drawing partly from experience and partly from the

present results, we suggest that in many situations, an unweighted least squares extrac-

tion followed by an oblique rotation can provide considerable insight both into how many factors should be extracted and into how the target should

be specified. As shown in the above simulations, when the data are multidimensional

because of having at least three ‘‘clusters’’ of items, and the items have an indepen-

dent cluster structure, the SL is capable of adequately recovering the configuration of

meaningful loadings. However, although not explicitly illustrated by the present sim-

ulations, it is easy to demonstrate that the SL can run into serious problems when the

data depart drastically from independent cluster structure (i.e., many cross-loadings in

an oblique rotated solution). On the other hand, the degree to which any model should

be applied to such data is highly questionable.
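For reference, the SL orthogonalization that supplies the initial target is itself a simple transformation of a higher-order solution. The sketch below implements the standard Schmid and Leiman (1957) computation from first-order (item-by-group) loadings and the second-order loadings of the group factors on a general factor; the function name and input layout are our own conventions.

    import numpy as np

    def schmid_leiman(first_order, second_order):
        # first_order: item-by-group pattern loadings from the higher-order solution.
        # second_order: loadings of the group factors on the general factor.
        # Returns an item-by-(general + groups) orthogonalized loading matrix:
        # the general column is first_order @ second_order, and each group column
        # is rescaled by the square root of that group factor's residual variance.
        first_order = np.asarray(first_order, dtype=float)
        second_order = np.asarray(second_order, dtype=float).ravel()
        general = first_order @ second_order.reshape(-1, 1)
        residual = first_order * np.sqrt(1.0 - second_order ** 2)
        return np.hstack([general, residual])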

In closing, we note that although it is critical to correctly specify the dimensionality

of the initial target matrix, it may not be so critical to correctly specify the particular

pattern of specified and unspecified elements in that matrix. We have observed that if

a wrong element is specified (e.g., a zero is placed for a loading that is truly nonzero),

Mplus will identify this problem by estimating a nonzero rotated factor loading. In

turn, the researcher may then respecify the target matrix, and reestimate the rotation.

The opposite is also true in many circumstances. That is, if a researcher mistakenly specifies a question mark for an element that should be zero, Mplus can identify this problem. The conditions under which a researcher can iterate to the correct target in this manner remain to be demonstrated in future studies. Finally, although not detailed here, we warn that despite the present positive results, bifactor structures are often challenging to estimate even in simulated data. The reasons are very complicated and beyond the present scope. We hope to address these issues more thoroughly in future work.
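To illustrate the respecification step described above, the sketch below performs one such pass: target elements fixed at zero whose rotated loadings turn out to be salient are freed, and freely estimated elements whose rotated loadings turn out to be trivial are fixed at zero. The 0.20 cutoff is an assumed value chosen for illustration only.

    import numpy as np

    def respecify_target(target, rotated, cutoff=0.20):
        # np.nan marks a freely estimated element; 0.0 marks an element fixed at zero.
        new_target = target.copy()
        fixed_but_salient = (target == 0) & (np.abs(rotated) > cutoff)
        free_but_trivial = np.isnan(target) & (np.abs(rotated) < cutoff)
        new_target[fixed_but_salient] = np.nan
        new_target[free_but_trivial] = 0.0
        return new_target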


Summary

We described the use of exploratory bifactor IRT models (estimated as exploratory

factor models) to judge the effects of forcing multidimensional data into a unidimen-

sional model. We propose that the comparison modeling approach allows researchers

to evaluate the item parameter bias caused by forcing data into a restricted model. At

the very least, exploratory modeling forces the researcher to consider alternative multidimensional models and thus to think seriously about the structure of the data and its impact on item parameter estimates. For this reason, we suggest that research

that proposes a unidimensional IRT application should be required to also report the

tetrachoric or polychoric correlation matrix so that other researchers can make their

own judgments.

Declaration of Conflicting Interests

The authors declared no conflicts of interest with respect to the authorship and/or publication of

this article.

Funding

This work was supported by the Consortium for Neuropsychiatric Phenomics (NIH Roadmap for Medical Research grants UL1-DE019580, PI: Robert Bilder, and RL1DA024853, PI: Edythe London). Additional research support was obtained through the NIH Roadmap for Medical Research grant AR052177 (PI: David Cella); a predoctoral advanced quantitative methodology training grant (#R305B080016) awarded to UCLA by the Institute of Education Sciences of the U.S. Department of Education; and an NCI grant, 4R44CA137841-03 (PI: Patrick Mair), for IRT software development for health outcomes

and behavioral cancer research. The content is solely the responsibility of the authors and

does not necessarily represent the official views of the National Cancer Institute or the National

Institutes of Health.

References

Ackerman, T. A. (2005). Multidimensional item response theory modeling. In A. Maydeu-

Olivares & J. J. McArdle (Eds.), Contemporary psychometrics (pp. 3-26). Mahwah, NJ:

Lawrence Erlbaum.

Asparouhov, T., & Muthen, B. (2008). Exploratory structural equation modeling (Tech. Rep.).

Retrieved from http://www.statmodel.com

Bartholomew, D. J., & Knott, M. (1999). Latent variable models and factor analysis. London,

England: Arnold.

Bernstein, I. H., & Teng, G. (1989). Factoring items and factoring scales are different: Spurious

evidence of multidimensionality. Psychological Bulletin, 105, 467-477.

Bock, R. D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parame-

ters: Application of an EM algorithm. Psychometrika, 46, 443-458.

Bock, R. D., Gibbons, R., Schilling, S. G., Muraki, E., Wilson, D. T., & Wood, R. (2002). TESTFACT 4. Chicago, IL: Scientific Software.


Browne, M. W. (1972a). Orthogonal rotation to a partially specified target. British Journal of

Mathematical and Statistical Psychology, 25, 115-120.

Browne, M. W. (1972b). Oblique rotation to a partially specified target. British Journal of

Mathematical and Statistical Psychology, 25, 207-212.

Browne, M. W. (2001). An overview of analytic rotation in exploratory factor analysis. Multi-

variate Behavioral Research, 35, 111-150.

Browne, M. W., Cudeck, R., Tateneni, K., & Mels, G. (2004). CEFA: Comprehensive explor-

atory factor analysis, Version 2.00 [Computer software and manual]. Retrieved from http://

quantrm2.psy.ohio-state.edu/browne/

Christoffersson, A. (1975). Factor analysis of dichotomized variables. Psychometrika, 40, 5-32.

Drasgow, F., & Parsons, C. K. (1983). Application of unidimensional item response theory

models to multidimensional data. Applied Psychological Measurement, 7, 189-199.

Forero, C. G., & Maydeu-Olivares, A. (2009). Estimation of IRT graded models for rating data:

Limited versus full information methods. Psychological Methods, 14, 275-299.

Fraser, C. (1988). NOHARM II. A Fortran program for fitting unidimensional and multidimen-

sional normal ogive models of latent trait theory. Armidale, New South Wales, Australia:

The University of New England, Center for Behavioral Studies.

Fraser, C., & McDonald, R. P. (1988). NOHARM: Least squares item factor analysis. Multivar-

iate Behavioral Research, 23, 267-269.

Gibbons, R. D., & Hedeker, D. R. (1992). Full-information item bi-factor analysis. Psychome-

trika, 57, 423-436.

Ip, E. H. (2010). Empirically indistinguishable multidimensional IRT and locally dependent unidimensional item response models. British Journal of Mathematical and Statistical Psychology, 63, 395-416.
Knol, D. L., & Berger, M. P. F. (1991). Empirical comparison between factor analysis and multidimensional item response models. Multivariate Behavioral Research, 26, 457-477.

Maydeu-Olivares, A. (2006). Limited information estimation and testing of discretized multi-

variate normal structural models. Psychometrika, 71, 57-77.

Maydeu-Olivares, A., & Joe, H. (2005). Limited and full information estimation and testing in

2n contingency tables: A unified framework. Journal of the American Statistical Associa-

tion, 100, 1009-1020.

Maydeu-Olivares, A., & Joe, H. (2006). Limited information goodness-of-fit testing in multi-

dimensional contingency tables. Psychometrika, 71, 713-732.

McDonald, R. P. (1999). Test theory: A unified approach. Mahwah, NJ: Lawrence Erlbaum.

McDonald, R. P. (2000). A basis for multidimensional item response theory. Applied Psycho-

logical Measurement, 24, 99-114.

McDonald, R. P., & Mok, M. M. C. (1995). Goodness of fit in item response models. Multivar-

iate Behavioral Research, 30, 23-40.

McLeod, L. D., Swygert, K. A., & Thissen, D. (2001). Factor analysis for items scored in two

categories. In D. Thissen & H. Wainer (Eds.), Test scoring (pp. 189-216). Mahwah, NJ:

Lawrence Erlbaum.

Muthen, B. (1979). A structural probit model with latent variables. Journal of the American Statistical Association, 74, 807-811.

Muthen, B. (1993). Goodness of fit with categorical and other nonnormal variables. In K. A.

Bollen & J. S. Long (Eds.), Testing structural equations models (pp. 205-234). Newbury

Park, CA: Sage.


Muthen, B., & Christoffersson, A. (1981). Simultaneous factor analysis of dichotomous varia-

bles in several groups. Psychometrika, 46, 407-419.

R Development Core Team. (2008). R: A language and environment for statistical computing.

Vienna, Austria: R Foundation for Statistical Computing.

Reckase, M. D. (1979). Unifactor latent trait models applied to multifactor tests: Results and

implications. Journal of Educational Statistics, 4, 207-230.

Revelle, W. (2009). psych: Procedures for psychological, psychometric, and personality

research. R package version 1.0-68. Retrieved from http://personality-project.org/r/

psych.manual.pdf

Reise, S. P., Cook, K. F., & Moore, T. M. (2010). A comparison modeling approach for eval-

uating the impact of multidimensionality on unidimensional item response theory model

parameters. Manuscript submitted for publication.

Schmid, J., & Leiman, J. M. (1957). The development of hierarchical factor solutions. Psycho-

metrika, 22, 53-61.

Takane, Y., & de Leeuw, J. (1987). On the relationship between item response theory and factor

analysis of discretized variables. Psychometrika, 52, 393-408.

Tucker, L. R. (1940). A rotational method based on the mean principal axis of a subgroup of

tests. Psychological Bulletin, 3, 289-294.

Wirth, R. J., & Edwards, M. C. (2007). Item factor analysis: Current approaches and future

directions. Psychological Methods, 12, 58-79.

Yung, Y.-F., Thissen, D., & McLeod, L. D. (1999). On the relationship between the higher-

order factor model and the hierarchical factor model. Psychometrika, 64, 113-128.
