The SAGE Handbook of Multilevel Modeling

Edited by Marc A. Scott, Jeffrey S. Simonoff and Brian D. Marx

1

The Multilevel Model Framework

Jeff Gill, Washington University, USA
Andrew J. Womack, University of Florida, USA

1.1 OVERVIEW

Multilevel models account for different levels of aggregation that may be present in data. Sometimes researchers are confronted with data that are collected at different levels such that attributes about individual cases are provided as well as the attributes of groupings of these individual cases. In addition, these groupings can also have higher groupings with associated data characteristics. This hierarchical structure is common in data across the sciences, ranging from the social, behavioral, health, and economic sciences to the biological, engineering, and physical sciences, yet is commonly ignored by researchers performing statistical analyses. Unfortunately, neglecting hierarchies in data can have damaging consequences for subsequent statistical inferences.

The frequency of nested data structures in the data-analytic sciences is startling. In the United States and elsewhere, individual voters are nested in precincts, which are, in turn, nested in districts, which are nested in states, which are nested in the nation. In healthcare, patients are nested in wards, which are then nested in clinics or hospitals, which are then nested in healthcare management systems, which are nested in states, and so on. In the classic example, students are nested in classrooms, which are nested in schools, which are nested in districts, which are then nested in states, which again are nested in the nation. In another familiar context, it is often the case that survey respondents are nested in areas such as rural versus urban, then these areas are nested by nation, and the nations in regions. Famous studies such as the American National Election Studies, Latinobarometer, Eurobarometer, and Afrobarometer are obvious cases. Often in population biology a hierarchy is built using ancestral information, and phenotypic variation is used to estimate the heritability of certain traits, in what is commonly referred to as the "animal model." In image processing, spatial relationships emerge between the intensity and hue of pixels. There are many hierarchies that emerge in language processing, such as topic of discussion, document type, region of origin, or intended audience. In longitudinal studies, more complex hierarchies emerge. Units or groups of units are repeatedly observed over a period of time. In addition to group hierarchies, observations are also grouped by the unit being measured. These models are extensively used in the medical/health sciences to model the effect of a stimulus or treatment regime conditional on measures of interest, such as socioeconomic status, disease prevalence in the environment, drug use, or other demographic information. Furthermore, the frequency of data at different levels of aggregation is increasing as more data are generated from geocoding, biometric monitoring, Internet traffic, social networks, an amplification of government and corporate reporting, and high-resolution imaging.

Multilevel models are a powerful and flexible extension to conventional regression frameworks. They extend the linear model and the generalized linear model by incorporating levels directly into the model statement, thus accounting for aggregation present in the data. As a result, all of the familiar model forms for linear, dichotomous, count, restricted range, ordered categorical, and unordered categorical outcomes are supplemented by adding a structural component. This structure classifies cases into known groups, which may have their own set of explanatory variables at the group level. So a hierarchy is established such that some explanatory variables are assigned to explain differences at the individual level and some explanatory variables are assigned to explain differences at the group level. This is powerful because it takes into account correlations between subjects within the same group as distinct from correlations between groups. Thus, with nested data structures the multilevel approach immediately provides a set of critical advantages over conventional, flat modeling where these structures emerge as unaccounted-for heterogeneity and correlation.

What does a multilevel model look like? At the core, there is a regression equation that relates an outcome variable on the left-hand side to a set of explanatory variables on the right-hand side. This is the basic individual-level specification, and looks immediately like a linear model or generalized linear model. The departure comes from the treatment of some of the coefficients assigned to the explanatory variables. What can be done to modify a model when a point estimate is inadequate to describe the variation due to a measured variable? An obvious modification is to treat this coefficient as having a distribution as opposed to being a fixed point. A regression equation can be introduced to model the coefficient itself, using information at the group level to describe the heterogeneity in the coefficient. This is the heart of the multilevel model. Any right-hand side effect can get its own regression expression with its own assumptions about functional form, linearity, independence, variance, distribution of errors, and so on. Such models are often referred to as "mixed," meaning some of the coefficients are modeled while others are unmodeled.

What this strategy produces is a method of accounting for structured data through utilizing regression equations at different hierarchical levels in the data. The key linkage is that these higher-level models are describing distributions at the level just beneath them for the coefficient that they model as if it were itself an outcome variable. This means that multilevel models are highly symbiotic with Bayesian specifications because the focus in both cases is on making supportable distributional assumptions.

Allowing multiple levels in the same model actually provides an immense amount of flexibility. First, the researcher is not restricted to a particular number of levels. The coefficients at the second grouping level can also be assigned a regression equation, thus adding another level to the hierarchy, although it has been shown that there are diminishing returns as the number of levels goes up, and it is rarely efficient to go past three levels from the individual level (Goel and DeGroot 1981, Goel 1983). This is because the effects of the parameterizations at these super-high levels get washed out as they come down the hierarchy. Second, as stated, any coefficient at these levels can be chosen to be modeled or unmodeled, and in this way the mixture of these decisions at any level gives a combinatorially large set of choices. Third, the form of the link function can differ for any level of the model. In this way the researcher may mix linear, logit/probit, count, constrained, and other forms throughout the total specification.

1.2 BACKGROUND

It is often the case that fundamental ideas in statistics hide for a while in some applied area before scholars realize that these are generalizable and broadly applicable principles. For instance, the well-known EM algorithm of Dempster, Laird, and Rubin (1977) was pre-dated in less fully articulated forms by Newcomb (1886), McKendrick (1926), Healy and Westmacott (1956), Hartley (1958), Baum and Petrie (1966), Baum and Eagon (1967), and Zangwill (1969), who gives the critical conditions for monotonic convergence. In another famous example, the core Markov chain Monte Carlo (MCMC) algorithm (Metropolis et al. 1953) slept quietly in the Journal of Chemical Physics before emerging in the 1990s to revolutionize the entire discipline of statistics. It turns out that hierarchical modeling follows this same storyline, roughly originating with the statistical analysis of agricultural data around the 1950s (Eisenhart 1947, Henderson 1950, Scheffé 1956, Henderson et al. 1959). A big step forward came in the 1980s when education researchers realized that their data fit this structure perfectly (students nested in classes, classes nested in schools, schools nested in districts, districts nested in states), and that important explanatory variables could be found at all of these levels. This flurry of work focused on the hierarchical linear model (HLM) and was developed in detail in works such as Burstein (1980), Mason et al. (1983), Aitkin and Longford (1986), Bryk and Raudenbush (1987), Bryk et al. (1988), De Leeuw and Kreft (1986), Raudenbush and Bryk (1986), Goldstein (1987), Longford (1987), Raudenbush (1988), and Lee and Bryk (1989). These applications continue today as education policy remains an important empirical challenge. Work in this literature was accelerated by the development of the standalone software packages HLM, ML2, and VARCL, as well as incorporation into the SAS procedure MIXED, and others. Additional work by Goldstein (notably 1985) took the two-level model and extended it to situations with further nested groupings, non-nested groupings, time series cross-sectional data, and more. At roughly the same time, a series of influential papers and applications grew out of Laird and Ware (1982), where a random effects model for Gaussian longitudinal data is established. This Laird–Ware model was extended to binary outcomes by Stiratelli, Laird, and Ware (1984), and GEE estimation was established by Zeger and Liang (1986). An important extension to non-linear mixed effects models is presented in Lindstrom and Bates (1988). In addition, Breslow and Clayton (1993) developed quasi-likelihood methods to analyze generalized linear mixed models (GLMMs).

Beginning around the 1990s, hierarchical modeling took on a much more Bayesian complexion now that stochastic simulation tools (e.g. MCMC) had arrived to solve the resulting estimation challenges. Since the Bayesian paradigm and the hierarchical reliance on distributional relationships between levels have a natural affinity, many papers were produced and continue to be produced in the intersection of the two. Computational advances during this period centered around customizing MCMC solutions for particular problems (Carlin et al. 1992, Albert and Chib 1993, Liu 1994, Hobert and Casella 1996, Jones and Hobert 2001, Cowles 2002). Other works focused on solving specific applied problems with Bayesian models: Steffey (1992) incorporates expert information into the model, Stangl (1995) develops prediction and decision rules, Cohen et al. (1998) model arrest rates, Zeger and Karim (1991) use GLMMs to study infectious disease, Christiansen and Morris (1997) build on count models hierarchically, Hodges and Sargent (2001) refine inference procedures, Pauler et al. (2001) model cancer risk, and Pettitt et al. (2006) model survey data from immigrants. Recently, Bayesian prior specifications in hierarchical models have received attention (Hadjicostas and Berry 1999, Daniels and Gatsonis 1999, Gelman 2006, Booth et al. 2008). Finally, the text by Gelman and Hill (2007) has been enormously influential.

A primary reason for the large increase in interest in the use of multilevel models in recent years is the ready availability of sophisticated general software solutions for estimating more complex specifications. A review of software available for fitting these models is presented in Chapter 26 of this volume. For basic models, the lme4 package in R works quite well and preserves R's intuitive model language. Also, Stata provides some support through the XTMIXED routine. However, researchers now routinely specify generalized linear multilevel models with categorical, count, or truncated outcome variables. It is also now common to see non-nested hierarchies expressing cross-classification, mixtures of nonlinear relationships within hierarchical groupings, and longitudinal considerations such as panel specifications and standard time-series relations. All of this provides a rich, but sometimes complicated, set of variable relationships. Since most applied users are unwilling to spend the time to derive their own likelihood functions or posterior distributions and maximize or explore these forms, software like WinBUGS and its cousin JAGS are popular (Bayesian) solutions (Mplus is also a helpful choice).
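To make the model language concrete, here is a minimal sketch of what a flat GLM and a multilevel counterpart look like in R's lme4; the data frame `dat` and the variables `y`, `x`, and `grp` are hypothetical placeholders rather than names from any dataset in this chapter.

```r
# Hedged sketch: a flat GLM versus a varying-intercept multilevel GLM.
# `dat`, `y`, `x`, and `grp` are hypothetical names.
library(lme4)

flat  <- glm(y ~ x, family = poisson, data = dat)               # ignores grouping
multi <- glmer(y ~ x + (1 | grp), family = poisson, data = dat) # varying intercepts

AIC(flat, multi)  # quick comparison of the two specifications
```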

1.3 FOUNDATIONAL MODELS

The development of multilevel models starts with the simple linear model specification for individual i that relates the outcome variable, y_i, to the systematic component, x_iβ_1, with unexplained variance falling to the error term, ε_i, giving:

y_i = β_0 + x_iβ_1 + ε_i,   (1.1)

which is assumed to meet the standard Gauss–Markov assumptions (linear functional form, independent errors with mean zero and constant variance, no relationship between x_i and the errors). The normality of the errors is not a necessary assumption for making inferences since standard least squares procedures produce an estimate of the standard error (Amemiya 1985, Ravishanker and Dey 2002), but with reasonable sample size and finite variance the central limit theorem applies. For maximum likelihood results, which produce the same estimates as does least squares, the derivation of the estimator begins with the assumption of normality of the population residuals. See the discussion on pages 583–6 of Casella and Berger (2001) for a detailed derivation.

1.3.1 Basic Linear Forms, Individual-Level Explanatory Variables

How does one model the heterogeneity that arises because each case i belongs to one of j = 1, ..., J groups, where J < n? Even if there does not exist explanatory variable information about these J assignments, model fit may be improved by binning each case i into its respective group. This can be done by loosening the definition of the single intercept, β_0, in (1.1) to J distinct intercepts, β_{0j}, which then groups the n cases, giving them a common intercept with other cases if they land in the same group. Formally, for i = 1, ..., n_j (where n_j is the size of the jth group):

y_{ij} = β_{0j} + x_{ij}β_1 + ε_{ij},

Figure 1.1 Varying Intercepts

where the added j subscript indicates that this case belongs to the jth group and gets intercept β_{0j}. The β_{0j} are group-specific intercepts and are usually given a common normal distribution with mean β_0 and standard deviation σ_{u_0}. The overall intercept β_0 is referred to as a fixed effect and the difference u_{0j} = β_{0j} − β_0 is a random effect. Subscripting the u with 0 denotes that it relates to the intercept term and distinguishes it from other varying quantities to be discussed shortly. Since the β_1 coefficient is not indexed by the grouping term j, it is still constant across all of the n = n_1 + ··· + n_J cases and evaluated with a standard point estimate. This model is illustrated in Figure 1.1, which shows that while different groups start at different intercepts, they progress at the same rate (slope). This model is sufficiently fundamental that it has its own name: the varying-intercept or random-intercept model.
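In lme4 notation, a varying-intercept specification of this kind could be written as follows (a sketch with hypothetical names):

```r
# Varying-intercept sketch: one common slope on x, with intercepts that
# vary by group around the fixed intercept beta_0. Names are hypothetical.
library(lme4)
fit <- lmer(y ~ x + (1 | grp), data = dat)
fixef(fit)   # beta_0 and beta_1
ranef(fit)   # estimated u_{0j} = beta_{0j} - beta_0 for each group
```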

In a different context, one may want to account for the groupings in the data, but the researcher may feel that the effect is not through the intercept, where the groups start at a zero level of the explanatory variable x, having reason to believe that the grouping affects the slopes instead: as x increases, group membership dictates a different change in y. So now loosen the definition of the single slope, β_1, in (1.1) to account for the groupings according to:

y_{ij} = β_0 + x_{ij}β_{1j} + ε_{ij},

Figure 1.2 Varying Slopes

where the added j subscript indicates that the n_j cases in the jth group get slope β_{1j}. The intercept now remains fixed across the cases in the data and the slopes are given a common normal distribution with mean β_1 and standard deviation σ_{u_1}. In this situation, β_1 is a fixed effect and the difference u_{1j} = β_{1j} − β_1 is a random effect, and subscripting the u with 1 denotes that these random effects relate to the slope as opposed to the intercepts. This is illustrated in Figure 1.2, showing divergence from the same starting point for the groups as x increases. This model is also fundamental enough that it gets its own name: the varying-slope or random-slope model.
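The corresponding lme4 sketch suppresses the random intercept so that only the slope varies (again with hypothetical names):

```r
# Varying-slope sketch: a fixed common intercept, with the slope on x
# varying by group; `0 + x` inside the parentheses suppresses the random
# intercept so only the slope varies. Names are hypothetical.
library(lme4)
fit <- lmer(y ~ x + (0 + x | grp), data = dat)
```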

Suppose the researcher suspects that the heterogeneity in the sample is sufficiently complex that it needs to be modeled with both a varying intercept and a varying slope. This is a simple combination of the previous two models and takes the form:

y_{ij} = β_{0j} + x_{ij}β_{1j} + ε_{ij},

Figure 1.3 Varying Intercepts and Slopes

where membership in group j for case ij has two effects, one that is constant and one that differs from others with increasing x. The vectors (β_{0j}, β_{1j}) are given a common multivariate normal distribution with mean vector (β_0, β_1) and covariance matrix Σ_u. The vector of means (β_0, β_1) is the fixed effect and the vectors of differences (u_{0j}, u_{1j}) = (β_{0j} − β_0, β_{1j} − β_1) are the random effects. A synthetic, possibly exaggerated, model result is given in Figure 1.3. Not surprisingly, this is called the varying-intercept, varying-slope or random-intercept, random-slope model. Notice from the simple artificial example in the figure that it is already possible to model quite intricate differences in the groups for this basic linear form.
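Since Figures 1.1–1.3 are synthetic, it may help to see how such data can be simulated and fit; the following sketch draws group intercepts and slopes from a common multivariate normal distribution, with all names and parameter values chosen purely for illustration:

```r
# Simulation sketch of the varying-intercept, varying-slope model under
# the Gaussian assumptions in the text. All names and values are
# illustrative only.
library(lme4)
library(MASS)   # provides mvrnorm for multivariate normal draws
set.seed(42)

J <- 10; n_j <- 30                                # 10 groups of 30 cases
Sigma_u <- matrix(c(1.0, 0.3, 0.3, 0.5), 2, 2)    # covariance of (u0, u1)
u   <- mvrnorm(J, mu = c(0, 0), Sigma = Sigma_u)  # group-level deviations
grp <- factor(rep(1:J, each = n_j))
x   <- runif(J * n_j)
beta0 <- 2; beta1 <- 1.5                          # fixed effects
y <- (beta0 + u[as.integer(grp), 1]) +
     (beta1 + u[as.integer(grp), 2]) * x +
     rnorm(J * n_j, sd = 0.5)

fit <- lmer(y ~ x + (1 + x | grp))                # varying intercepts and slopes
VarCorr(fit)                                      # should approximate Sigma_u
```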

Before moving on to more complicated models, a clear explanation of what is meant by fixed effects and random effects is necessary. Fixed effects are coefficients in the mean structure that are assumed to have point values, whereas random effects are coefficients that are assumed to have distributions. For example, in the models thus far considered, the random effects have been assumed to have a common Gaussian distribution. A distinction must be made, then, for when group-level effects are assumed to be fixed effects or random effects. In the random-intercepts model, for example, the β_{0j} can be assumed to be point values instead of being modeled as coming from a distribution, that is, as fixed effects as opposed to random effects. This distinction is quite important and modifies both the assumptions of the models as well as the estimation strategy employed in analyzing them.

1.3.2 Basic Linear Forms, Adding Group-Level Explanatory Variables

The bivariate linear form can be additively extended on the right-hand side to include more covariates, which may or may not receive the grouping treatment. A canonical mixed form is one where the intercept and the first q explanatory variables have coefficients that vary by the j = 1, ..., J groupings (for a total of q + 1 modeled coefficients), but the next p coefficients, q + 1, q + 2, ..., q + p, are fixed at the individual level. This is given by the specification:

y_{ij} = β_{0j} + x_{1i}β_{1j} + ··· + x_{qi}β_{qj} + x_{(q+1)i}β_{q+1} + ··· + x_{(q+p)i}β_{q+p} + ε_{ij},

where membership in group j for case ij has q + 1 effects. The vectors of group-level coefficients (β_{0j}, ..., β_{qj}) are given a common distribution, which for now will be assumed to be Gaussian with mean vector (β_0, ..., β_q) (the fixed effects) and covariance matrix Σ_u. As before, the vectors of differences u_j = (β_{0j} − β_0, ..., β_{qj} − β_q) are the random effects for this model.

An important aspect of these models, which greatly facilitates their use in generalized linear models and nonlinear models, is writing them in a hierarchical fashion. The model is written as a specification of a regression equation at each level of the hierarchy. For example, consider the model where

y_{ij} = β_{0j} + x_{1ij}β_{1j} + x_{2ij}β_2 + ε_{ij},   (1.2)

where there are two random coefficients and one fixed coefficient. The group-level coefficients can be written as

β_{0j} = β_0 + u_{0j},   β_{1j} = β_1 + u_{1j},   (1.3)

and the u_{0j}, u_{1j} appear as errors at the second level of the hierarchy. Assumptions are then made about the distributions of the individual-level errors (ε_{ij}) and group-level errors (u_{0j}, u_{1j}). For example, a common set of assumptions in linear regression is that the ε_{ij} are independent and identically distributed (iid) with common variance, and (u_{0j}, u_{1j}) is bivariate normal with mean 0 and covariance matrix Σ.


The model given in (1.2), (1.3) does not impose explanatory variables at the second level, given by the J groupings, since u_j ∼ N_q(0, Σ) by assumption. Until explanatory variables at this second level are added, there is not a modeled reason for the differences in the group-level coefficients. The fact that one can model the variation at the group level is a key feature of treating them as random effects as opposed to fixed effects. Returning to the linear model with two group-level coefficients and one individual-level coefficient in (1.2), y_{ij} = β_{0j} + x_{1ij}β_{1j} + x_{2ij}β_2 + ε_{ij}, model each of the two group-level coefficients with their own regression equation and index these by j = 1 to J:

β_{0j} = β_0 + β_3x_{3j} + u_{0j}
β_{1j} = β_1 + β_4x_{4j} + u_{1j},   (1.4)

and the group-level variation is modeled as depending on covariates. The explanatory variables at the second level are called context-level variables, and the idea of contextual specificity is that of the existence of legitimately comparable groups. These context-level variables are constant in each group and are subscripted by j instead of ij to identify that they only depend on group identification.

Substituting the two definitions in (1.4) into the individual-level model in (1.2) and rearranging produces:

y_{ij} = (β_0 + β_3x_{3j} + u_{0j}) + (β_1 + β_4x_{4j} + u_{1j})x_{1ij} + β_2x_{2ij} + ε_{ij}
      = (β_0 + β_1x_{1ij} + β_2x_{2ij} + β_3x_{3j} + β_4x_{4j}x_{1ij}) + (u_{0j} + u_{1j}x_{1ij} + ε_{ij})   (1.5)

for the ijth case. The composite fixed effects now have a richer structure, accounting for variation at both the individual and group levels. In addition, the error structure is now modeled as being due to specific group-level variables. In this example, the composite error structure, (u_{1j}x_{1ij} + u_{0j} + ε_{ij}), is heteroscedastic since it is conditioned on levels of the explanatory variable x_{1ij}. This composite error shows how the uncertainty in a standard linear model (ε_{ij}) is modified by introducing terms that are correlated for observations in the same group. Though it seems that this has increased uncertainty in the data, it is just modeling the data in a fuller fashion. This richer model accounts for the hierarchical structure in the data and can provide a significantly better fit to observed data than standard linear regression.

It is important to understand the exact role of the new coefficients. First, β_0 is a universally assigned intercept that all cases share. Second, β_1 gives another shared term that is the slope coefficient corresponding to the effect of changes in x_{1ij}, as does β_2 for x_{2ij}. These three terms have no effect from the multilevel composition of the model. Third, β_3 gives the slope coefficient for the effect of changes in the variable x_{3j} for group j, and is applied to all individual cases assigned to this group. It therefore varies by group and not individual. Fourth, and surprisingly, β_4 is the coefficient on the interaction term between x_{1ij} and x_{4j}. Though no interaction term was specified in the hierarchical form of the model, this illustrates an important point: any hierarchy that models a slope on the right-hand side imposes an interaction term if this hierarchy contains group-level covariates. While it is easy to see the multiplicative implications from (1.5), it is surprising to some that this is an automatic consequence.
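In software, this implied interaction is written explicitly. Continuing the hedged lme4 sketches above, a fit of (1.5) might look like the following, where the variable names mirror the subscripts in the equation and are hypothetical:

```r
# Sketch of fitting (1.5) directly: x3 and x4 are group-level covariates,
# and modeling the slope on x1 at the group level forces the x1:x4
# interaction; no x4 main effect appears, exactly as in (1.5).
# All variable and data names are hypothetical.
library(lme4)
fit <- lmer(y ~ x1 + x2 + x3 + x1:x4 + (1 + x1 | grp), data = dat)
```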

In the Laird–Ware form of the model, the fixed effects are separated from the random effects as in (1.5). In this formulation, the covariates on the random effects are represented by zs rather than xs in order to distinguish the two. This structure can be written (in matrix form) for group j as y_j = X_jβ + Z_ju_j + ε_j. Here, y_j is the vector of observations, X_j is the fixed effects design matrix, and Z_j is the random effects design matrix for group j. The vector β is the vector of fixed effects, which are assumed to have point values, whereas u_j is the vector of random effects for group j, which are modeled by a distributional assumption. Finally, ε_j is a vector of individual-level error terms for group j. From this formulation, it is apparent that multilevel models can be expressed in a single-level form, although this does not always lead to a more intuitive expression.

1.3.3 The Model Spectrum

In language that Gelman and Hill (2007) emphasize, multilevel models can be thought of as sitting between two extremes that are available to the researcher when groupings are known: fully pooled and fully unpooled. The fully pooled model treats the group-level variables as individual variables, meaning that group-level distinctions are ignored and these effects are treated as if they are case-specific. For a model with one explanatory variable measured at the individual level (x_1) and one measured at the group level (x_2), this specification is:

y_i = β_0 + x_{1i}β_1 + x_{2i}β_2 + ε_i.

In contrast to (1.5), there is no u_{0j} modeled here. This is an assertion that the group distinctions do not matter and the cases should all be treated homogeneously, ignoring the (possibly important) variation between categories. At the other end of the spectrum is a set of models in which each group can be treated as a separate dataset and modeled completely separately:

y_{ij} = β_{0j} + x_{ij}β_{1j} + ε_{ij},

for j = 1, ..., J. Note that the group-level predictor x_2 does not enter into this equation because x_{2i}β_2 is constant within a group and is therefore subsumed into the intercept term. Here there is no second level to the hierarchy and the βs are assumed to be fixed parameters, in contrast to the distributional assumptions made in the mixed model. The fully unpooled approach is the opposite distinction from the fully pooled approach and asserts that the groups are so completely different that it does not make sense to associate them in the same model. In particular, the values of slopes and intercepts from one group have no relationship to those in other groups. Such separate regression models clearly overstate variation between groups, making them look more different than they really should be.

Between these two polar group distinctions lies the multilevel model. The word "between" here means that groups are recognized as different, but because there is a single model in which they are associated by common individual-level fixed effects as well as distributional assumptions on the random effects, the resulting model compromises between full distinction of groups and full ignoring of groups. This can be thought of as partial pooling or semi-pooling in the sense that the groups are collected together in a single model, but their distinctness is preserved.

To illustrate this "betweenness", consider a simple varying-intercepts model with no explanatory variables:

y_{ij} = β_{0j} + ε_{ij},   (1.6)

which is also called a mean model since β_{0j} represents the mean of the jth group. If there is an assumption that β_{0j} = β_0 is constant across all cases, then this becomes the fully pooled model. Conversely, if there are J separate models, each with its own β_{0j} which do not derive from a common distribution, then it is the fully unpooled approach. Estimating (1.6) as a partial-pooling model (with Gaussian distributional assumptions) gives group means that are a weighted average of the n_j cases in group j and the overall mean from all cases. Define first:

ȳ_j   the fully unpooled mean for group j
ȳ     the fully pooled mean
σ²_0  the within-group variance (variance of the ε_{ij})
σ²_1  the between-group variance (variance of the ȳ_j)
n_j   the size of the jth group.

Then an approximation of the multilevel model estimate for the group mean is given by:

β̂_{0j} = [ (n_j/σ²_0) ȳ_j + (1/σ²_1) ȳ ] / [ n_j/σ²_0 + 1/σ²_1 ].   (1.7)

This is a very revealing expression. The estimate of the mean for a group is a weighted average of the contribution from the full sample and the contribution from that group, where the weighting depends on relative variances and the size of the group. As the size of an arbitrary group j gets small, ȳ_j becomes less important and the group estimate borrows more strength from the full sample. A zero-size group, perhaps a hypothesized case, relies completely on the full sample, since (1.7) reduces to β̂_{0j} = ȳ. On the other hand, as group j gets large, its estimate dominates the contribution from the fully pooled mean, and is also a big influence on this fully pooled mean. This is called the shrinkage of the mean effects towards the common mean. In addition, as σ²_1 → 0, then β̂_{0j} → ȳ, and as σ²_1 → ∞, then β̂_{0j} → ȳ_j. Thus, the group effect, which is at the heart of a multilevel model, is a balance between the size of the group and the variances at the individual and group levels.
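The approximation in (1.7) is simple enough to compute directly; the following sketch implements it as a small R function, where the inputs are the quantities defined above and the example values are invented for illustration:

```r
# Direct computation of the approximate partial-pooling estimate (1.7);
# a small self-contained sketch of the formula in the text.
shrink_mean <- function(ybar_j, ybar, n_j, sigma2_0, sigma2_1) {
  w_j <- n_j / sigma2_0   # precision weight on the group mean
  w   <- 1 / sigma2_1     # precision weight on the pooled mean
  (w_j * ybar_j + w * ybar) / (w_j + w)
}

# A large group is barely shrunk; a tiny group is pulled toward the pool:
shrink_mean(ybar_j = 5, ybar = 3, n_j = 200, sigma2_0 = 4, sigma2_1 = 1)  # ~4.96
shrink_mean(ybar_j = 5, ybar = 3, n_j = 2,   sigma2_0 = 4, sigma2_1 = 1)  # ~3.67
```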

1.4 EXTENSIONS BEYOND THE TWO-LEVEL MODEL

Multilevel models are not restricted to linear forms with interval-measured outcomes over the entire real line, nor are they restricted to hierarchies which contain only one level of grouping or nested levels of grouping. The stochastic assumptions at each level of the hierarchy can be made in any appropriate fashion for the problem being modeled. This added flexibility of the MLM provides a much richer class of models and captures many of the models used in modern scientific research.

1.4.1 Nested Groupings

The generalization of the mixed effects model to nested groupings is straightforward and is most easily understood in the hierarchical framework of (1.2), (1.3) as opposed to the single equation of (1.5).

Consider the common case of survey respondents nested in regions, which are then nested in states, and so on. The individual level comprises the first hierarchy of the model and captures the variation in the data that can be explained by individual-level covariates. In this example, the outcome of interest is measured support for a political candidate or party, with covariates that are individualized, such as race, gender, income, age, and attentiveness to public affairs. The second level of the model in this example is immediate region of residence, and this comes with its own set of covariates, including rural/urban measurement, crime levels, dominant industry, coastal access, and so on. The third level is state, the fourth level is national region, and so on. Each level of the model comes with a regression equation where the intercepts or slopes that are assumed to vary do so with the possible inclusion of group-level covariates.

Consider a three-level model with individual-level covariate x_1, level-two group covariate x_2, and level-three covariate x_3. The data come as y_{ijk}, indicating the ith individual in the jth level-two group, which is contained in the kth level-three group. In the previous example, i represents survey respondents, j represents immediate region, and k represents state. Allowing both varying intercepts and varying slopes in the regression equation at the individual level gives:

y_{ijk} = β_{0jk} + β_{1jk}x_{1ijk} + ε_{ijk},

where the ε_{ijk} are assumed to be independently and normally distributed. At the second level of the model, there are separate regression equations for the intercepts and slopes:

β_{0jk} = β_{0k} + β_{2k}x_{2jk} + u_{0jk}
β_{1jk} = β_{1k} + β_{3k}x_{2jk} + u_{1jk},

where the vectors (u_{0jk}, u_{1jk}) are assumed to have a common multivariate normal distribution. At the third level of the model, there are separate regression equations for the intercepts and slopes:

β_{0k} = β_0 + β_4x_{3k} + u_{3k}
β_{2k} = β_2 + β_5x_{3k} + u_{4k}
β_{1k} = β_1 + β_6x_{3k} + u_{5k}
β_{3k} = β_3 + β_7x_{3k} + u_{6k},

where the vectors of level-three residuals (u_{3k}, u_{4k}, u_{5k}, u_{6k}) are assumed to have a common multivariate normal distribution. In analogy to (1.5), this model includes eight fixed effects parameters capturing the intercept (β_0), the three main effects (β_1, β_2, β_4), the three two-way interactions (β_3, β_5, β_6), and the three-way interaction of the covariates (β_7), as well as a rich error structure capturing the nested groupings within the data. Just from this simple framework, extensions abound. For example, since the level-two residuals are indexed by both j and k, a natural relaxation of the model is to let the distribution of u_{jk} depend on k and then bring these distributions together at the third level. Alternatively, one can specify the model such that the level-three covariate only affects intercepts and not slopes, or that slopes and intercepts vary at level two but only intercepts vary at level three, both of which are easy modifications in this hierarchical specification.

1.4.2 Non-Nested Groupings

In order to generalize to the case of non-nested groupings, consider data with two different groupings at the second level. In an economics example, imagine modeling the income of individuals in a state who have both an immediate region of residence and an occupation. These workers are then naturally grouped by the multiple regions and the jobs, where these groups obviously are not required to have the same number of individuals: there are more residents in a large urban region than a nearby rural county, and one would expect more clerical office workers than clergymen, for example. This is non-nested in the sense that there are multiple people in the same region with the same and different jobs. Represent region of residence with the index r and occupation with the index o, letting iro refer to the ith individual who is in both the rth region class and the oth occupation class. Of course, such an individual does not necessarily exist; there may be no ranchers in New York City. A regression equation with individual-level covariate x and intercepts which vary with both groupings is given by:

y_{iro} = β_0 + x_{iro}β_1 + u_r + u_o + ε_{iro},   (1.8)

where the random effects u_r have one common normal distribution and the random effects u_o have a different common normal distribution. To add varying slopes to (1.8), simply modify the equation to be

y_{iro} = β_0 + x_{iro}β_1 + u_{0r} + u_{0o} + u_{1r}x_{iro} + u_{1o}x_{iro} + ε_{iro},

and make appropriate distributional assumptions about the random effects.

The addition of a random effect u_{ro} to (1.8) that depends on both region and occupation would give this model three levels: the individual level, the level of intersections of region and occupation, and the level of region or occupation. The second level of the hierarchy would naturally nest in both of the level-three groupings. There are many ways to extend the MLM with crossed groupings to take into account complicated structures that could generate observed data. The key to effectively using these models in practice is to consider the possible ways in which different groupings can affect the outcome variable and then include these in appropriately defined regression equations.
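In lme4, crossed groupings are expressed simply by listing one random-effects term per grouping factor, as in this sketch of (1.8) with hypothetical names:

```r
# Sketch of the non-nested (crossed) model (1.8): two independent sets of
# random intercepts, one per region and one per occupation.
# Variable names are hypothetical.
library(lme4)
fit <- lmer(income ~ x + (1 | region) + (1 | occupation), data = dat)
```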

1.4.3 Generalized Linear Forms

The extension to generalized linear models with non-Gaussian distributional assumptions at the individual level is also straightforward. This involves inserting a link function between the outcome variable and the additive-linear form based on the right-hand side with the explanatory variables. For a two-level model, this means modeling the linked variable as:

η_{ij} = β_{0j} + x_{1ij}β_{1j} + ··· + x_{qij}β_{qj} + x_{(q+1)ij}β_{q+1} + ··· + x_{(q+p)ij}β_{q+p},

where this is then related to the conditional mean of the outcome, η_{ij} = g(E[y_{ij} | β_j]), and where there is conditioning on the vector of group-level coefficients β_j. This two-level model is completed by making an assumption about the distribution of β_j. In contrast to (1.5), the stochastic components of the model at the individual and group levels cannot simply be added together because the link function g is only linear in the case of normally distributed data. Thus, the random effects are difficult or impossible to "integrate out" of the likelihood, and the marginal likelihood of the data cannot be obtained in closed form.

To illustrate this model, consider a simplified logistic case. Suppose there are outcomes from the same binary choice at the individual level, a single individual-level covariate (x_1), and a single group-level covariate (x_2). Then the regression equation for varying intercepts and varying slopes is given by:

p(y_{ij} = 1 | β_{0j}, β_{1j}) = logit⁻¹(β_{0j} + x_{1ij}β_{1j}) = 1 − (1 + e^{β_{0j} + x_{1ij}β_{1j}})⁻¹
β_{0j} = β_0 + β_2x_{2j} + u_{0j}
β_{1j} = β_1 + β_3x_{2j} + u_{1j}.

Assume that the random effects u_j are multivariate normal with mean 0 and covariance matrix Σ. The parameters to be estimated are the coefficients in the mean structure (the fixed effects) and the elements of the covariance matrix Σ. Notice that the distribution at the second level is given explicitly as a normal distribution, whereas the distribution at the first level is implied by the assumption of having Bernoulli trials. It is common to stipulate normal forms at higher levels in the model since they provide an easy way to consider the possible correlation of the random effects.

However, it is important to understand that the assumption of normality at this level is exactly that, an assumption, and thus it must be investigated. If evidence is found to suggest that the random effects are not normally distributed, this assumption must be relaxed or fixed effects should be used.

Standard forms for the link function in g(E[y_{ij} | β_j]) include probit, Poisson (log-linear), gamma, multinomial, ordered categorical forms, and more. The theory and estimation of GLMMs is discussed in Chapter 15, and the specific case of qualitative outcomes is discussed in Chapter 16. Many statistical packages (see Chapter 26) are available for fitting a variety of these models, but as assumptions are relaxed about distributional forms or more exotic generalized linear models are used, one must resort to more flexible estimation strategies.
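For instance, here is a hedged sketch of the logistic varying-intercept, varying-slope model above in lme4's glmer; the variable names are hypothetical, and the group-level covariate x2 enters as a main effect plus an interaction with x1, mirroring the substitution used for (1.5):

```r
# Sketch of the logistic varying-intercept, varying-slope model: the
# fixed part beta_0 + beta_1*x1 + beta_2*x2 + beta_3*x1*x2 follows from
# substituting the group-level equations. Names are hypothetical.
library(lme4)
fit <- glmer(y ~ x1 + x2 + x1:x2 + (1 + x1 | grp),
             data = dat, family = binomial)
```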

1.5 VOCABULARY

An unfortunate consequence of the development of multilevel models in disparate literatures is that the vocabulary describing these models differs, even for the exact same specification. The primary confusion is between the synonymous terms multilevel model and hierarchical model and the descriptions that use effects. Varying-coefficient models, either intercepts or slopes, are often called random effects models since they are associated with distributional statements like β_{0j} ∼ N(β_0 + β_1x_{0j}, σ²_u). A related term is mixed models, meaning that the specification has both modeled and unmodeled coefficients. The term fixed effects is unfortunately not used as cleanly as implied above, with different meanings in different particular settings. Sometimes it is applied to unmodeled coefficients that are constant across individuals, or "nuisance" coefficients that are uninteresting but included by necessity in the form of control variables, or even to the case where the data represent a population instead of a sample.

The meanings of "fixed" and "random" can also differ in definition by literature (Kreft and De Leeuw 1988; Section 1.3.3, Gelman 2005). The obvious solution to this confusion is to not worry about labels but to pay attention to the implications of subscripting in the described model. These specifications can be conceptualized as members of a larger multilevel family where indices are purposely turned on to create a level, or turned off to create a point estimate.

1.6 CASE STUDY: PARTY IDENTIFICATION IN WESTERN EUROPE

As an illustration, consider 23,355 citizens' feeling of alignment with a political party in ten Western European countries¹ taken from the Comparative Study of Electoral Systems (CSES) for 16 elections from 2001 to 2007. The natural hierarchy for these eligible voters is: district, election, and country (some countries held more than one parliamentary election during this time period). The proportion of those surveyed who felt close to one party varies from 0.29 (Ireland 2002) to 0.65 (Spain 2004).

Running a logistic regression model on these data using individual-, district-, and country-level covariates as though they are individual-specific (fully pooled) requires dramatically different ranges for the explanatory variables to produce reliable coefficients. Since Western European countries do not show such differences in individual-level covariates and the country-level covariates do not vary strongly with the outcome variable (correlation of −0.25), the model needs to take into account higher-level variations. This is done by specifying a hierarchical model to take into account the natural groupings in the data.

The CSES dataset provides multiple ways to consider hierarchies through region and time. Respondents can be nested in voting districts, elections, and countries. Additionally, one could add a time dynamic taking into account that elections within a single country have a temporal ordering. Twelve of the elections considered belong to groupings of size two (grouped by country), and in four countries there was a single election. If a researcher expects heterogeneity to be explained by dynamics within and between particular elections, the developed model will be hierarchical with two levels based on districts and elections. In Figure 1.4 this is shown by plotting the observed fraction of "Yes" answers for each age, separated by gender (gray dots for men, black dots for women), for these elections. Notice that in the aggregated data women are generally less likely to identify with a party at any given age, even though identification for both men and women increases with age.

Figure 1.4 Empirical Proportion of "Yes" Versus Age, by Gender (x-axis: Age; y-axis: Proportion of "Yes")

The outcome variable for our model is a dichotomous measure from the question "Do you usually think of yourself as close to any particular political party?" coded zero for "No" and one for "Yes" (numbers B3028, C3020_1). Here attention is focused on only a modest subset of the total number of possible explanatory variables. The individual-level demographic variables are: Age of respondent in years (number 2001); Female, with men equal to zero and women equal to one (number 2002); the respondent's income quintile, labeled Income; and the respondent's municipality, coded Rural/Village, Small/Middle Town, City Suburb, City Metropolitan, with the first category used as the reference category in the model specification (numbers B2027, C2027). Subjects are nested within electoral districts with a district-level variable describing the number of seats elected by proportional representation in the given district. Additionally, these districts are nested within elections with an election-level variable describing the effective number of parties in the election. The variable Parties (number 5094) gives the effective number of political parties in each country, and Seats (number 4001) indicates the number of seats contested in each district of the first segment of the legislature's lower house where the respondent resides. Further details can be found at http://www.cses.org/varlist/varlist_full.htm.

For an initial analysis, ignore the nested structure of the data and simply analyze the dataset using a logistic generalized linear model. The outcome variable is modeled as

y_i | p_i ∼ Bern(p_i),   log( p_i / (1 − p_i) ) = x_iβ,

where x_i is the vector of covariates for the ith respondent and β is a vector of coefficients to be estimated. The base categories for this model are men for Female, the first income quintile, and Rural/Village for region. Table 1.1 provides this standard logistic model in the first block of results.

For the second model, analyze a two-level hierarchy: one at the district level and one at the election level, represented by random intercept contributions to the individual level. The outcome variable is now modeled as:

y_{ijk} | p_{ijk} ∼ Bern(p_{ijk})
log( p_{ijk} / (1 − p_{ijk}) ) = β_{0jk} + x_{ijk}β
β_{0jk} = β_{0k} + β_Seats × Seats_{jk} + u_{0jk}
β_{0k} = β_0 + β_Parties × Parties_k + u_{0k}
u_{0jk} ∼ N(0, σ²_d)
u_{0k} ∼ N(0, σ²_e).

Therefore, Seats and Parties are predictors at different levels of the model. Since Parties is constant in an election, it predicts the change in intercept at the election level of the hierarchy. Seats is constant in districts but varies within an election, so it predicts the change in intercept at the district level. In addition, all of the district-level random effects have a common normal distribution which does not change depending on election, and the election-level random effects have a different common normal distribution. The three levels of the hierarchy are evident and stochastic processes are introduced at each level: Bernoulli distributions at the data level and normal distributions at the district and election levels.
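A specification of this form could be fit along the following lines; this is a sketch only, and the data frame and recoded variable names are hypothetical stand-ins rather than the authors' actual estimation code:

```r
# Hedged sketch of the case-study model: random intercepts for districts
# nested within elections, with seats and parties entering as fixed
# effects at their respective levels. Names are hypothetical.
library(lme4)
fit <- glmer(close ~ age + female + income + town + seats + parties +
               (1 | election/district),
             data = cses, family = binomial)
```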

The multilevel model fits better by several standard measures, with a difference in AIC (Akaike information criterion) of 525 and a difference in BIC (Bayesian information criterion) of 509 in favor of the multilevel model. This shows that the multilevel model fits the data dramatically better than the GLM. As a description of model fit, also consider the percent correctly predicted with the naïve criterion (splitting predictions at the arbitrary 0.5 threshold). The standard GLM gives 57.367% estimated correctly, whereas the multilevel model gives 60.522%.

Table 1.1 Contrasting Specifications, Voting Satisfaction

                        Standard Logit GLM               Random Intercepts Version
                    Estimate  Std Error  z-score     Estimate  Std Error  z-score
Intercept            –0.3946   0.0417    –9.46       –0.1751    0.0944    –1.855
Age                   0.0156   0.0008    19.25        0.0168    0.0008    20.207
Female               –0.2076   0.0267    –7.76       –0.2098    0.0272    –7.709
Income Level 2        0.1178   0.0425     2.77        0.0350    0.0436     0.801
Income Level 3        0.2282   0.0427     5.35        0.1474    0.0439     3.357
Income Level 4        0.2677   0.0443     6.04        0.2468    0.0453     5.454
Income Level 5        0.2388   0.0451     5.30        0.2179    0.0466     4.673
Small/Middle Town     0.0665   0.0392     1.70       –0.0822    0.0429    –1.916
City Suburb           0.1746   0.0431     4.05       –0.0507    0.0500    –1.014
City Metropolitan     0.1212   0.0359     3.38        0.0636    0.0417     1.525
Parties              –0.0408   0.0142    –2.87       –0.1033    0.0846    –1.222
Seats                 0.0047   0.0010     4.82        0.0027    0.0019     1.403

Residual Deviance    31608 on 23343 df               31079 on 23341 df
Null Deviance        32134 on 23354 df               31590 on 23352 df
                                                     σ_d = 0.2402, σ_e = 0.32692

In Table 1.1 there are essentially no differences in the coefficient estimates between the two models for Age, Female, Income Level 4, Income Level 5, and Small/Middle Town. However, notice the differences between the two models for the coefficient estimates of Income Level 2, Income Level 3, City Suburb, City Metropolitan, Parties, and Seats. In all cases where observed differences exist, the standard generalized linear model gives more reliable coefficient estimates (smaller standard errors), but this is misleading. Once the correlation structure of the data is taken into account, there is more uncertainty in the estimated values of these parameters.

To further evaluate the fit of the regular GLM, consider a plot of the ranked fitted probabilities from the model against binned estimates of the observed proportion of success in the reordered observed data. Figure 1.5 indicates that although the model does describe the general trend of the data, it misses some features, since the point cloud does not adhere closely to the fitted line underlying the assumption of the model. The curve of fitted probabilities only describes 47% of the variance in these empirical proportions of success. This suggests that there are additional features of the data, and it is possible to capture these with the multilevel specification. The fitted probabilities of the multilevel model describe 71% of the variation in the binned estimates of the observed proportion of success.
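The binned comparison described here can be reproduced along these lines; this is a sketch under the assumption that `fit` is one of the fitted models and `y` the observed 0/1 outcome, with the bin count chosen arbitrarily:

```r
# Sketch of the binned fit assessment: order cases by fitted probability,
# bin them, and compare each bin's mean fitted probability with its
# observed proportion of "Yes". `fit` and `y` are hypothetical names.
p_hat <- fitted(fit)
ord   <- order(p_hat)
bins  <- cut(seq_along(ord), breaks = 50)       # 50 equal-size bins
emp   <- tapply(y[ord], bins, mean)             # observed proportions
pred  <- tapply(p_hat[ord], bins, mean)         # mean fitted values
summary(lm(emp ~ pred))$r.squared               # share of variance described
```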

After running the initial GLM, one could have improperly concluded that the number of seats in a district, the effective number of political parties, and city type significantly influenced the response. However, these variables in this specification are mimicking the correlation structure of the data, which was more effectively taken into account through the multilevel model framework, as evidenced by the large gains in predictive accuracy. This is also apparent from taking the binned empirical values and breaking down their variance in various ways. To see that the random effects have a meaningful impact on data fit, compare how well the fixed effects and the full model predict the binned values. Normally, a researcher would be happy if the group-level standard deviations were of similar size to the residual standard deviation, as small group effects relative to the residual effect indicate that the grouping is not effective in the model specification. Use of the binned empirical values mimics this kind of analysis for a generalized linear mixed model. The variance of the binned values minus the fixed effects is 0.0111 and the variance of the binned values minus the fitted values from the full model is 0.0060, indicating that a significant amount of variation in the data is indeed captured by the random effects.

Figure 1.5 Probability of "Yes" Versus Fitted Rank (x-axis: Fitted Rank; y-axis: Probability of "Yes")

1.6.1 Computational Considerations

Multilevel models are more demanding of statistical software than more standard regressions. In the case of the Gaussian–Gaussian multilevel model, such as the one-way random effects model, it is possible to integrate out all levels of the hierarchy to be left with a likelihood for y which only depends on the fixed effects and the induced covariance structure. Maximum likelihood (or restricted maximum likelihood) estimates for the coefficients on the fixed effects, as well as the parameters of the variance components, can be computed.

Moving beyond the simple Gaussian–Gaussian model greatly complicates estimation considerations. This includes both relaxations of the assumptions of the linear mixed effects models and complications arising from nonlinear link functions in generalized linear mixed models. A sophisticated set of estimation strategies using approximations to the likelihood has been developed to overcome these difficulties and is discussed in Chapter 3. Alternatively, the Bayesian paradigm offers a suite of MCMC methods for producing samples from the posterior distribution of the model parameters. These alternatives are discussed in Chapter 4.

1.7 SUMMARY

This introduction to multilevel models provides an overview of a class of regression models that account for hierarchical structure in data. Such data occur when there are natural levels of aggregation whereby individual cases are nested within groups, and those groups may also be nested in higher-level groups. It provides a general description of the model features that enable multilevel models to account for such structure, demonstrates that ignoring hierarchies produces incorrect inferential statements in model summaries, and illustrates that point with a simple example using a real dataset.

Aitkin and his co-authors (especially 1981, 1986) introduced the linear multilevel model in the 1980s, concentrating on applications in education research since the hierarchy in that setting is obvious: students in classrooms, classrooms in schools, schools in districts, and districts in states. These applications were all just linear models and yet they substantially improved fit to the data in educational settings. Since this era, more elaborate specifications have been developed for nonlinear outcomes, non-nested hierarchies, correlations between hierarchies, and more. This has been a very active area of research both theoretically and in applied settings. These developments are described in detail in subsequent chapters.

Multilevel models are flexible tools because they exist in the spectrum between fully pooled models, where groupings are ignored, and fully unpooled models, where each group gets its own regression statement. This means that multilevel models recognize both commonalities within the cases and differences between group effects. The gained efficiency is both notational and substantive. The notational efficiency occurs because there are direct means of expressing hierarchies with subscripts, nested subscripts, and sets of subscripts. This contrasts with messy "dummy" coding of group definitions with large numbers of categories. Multilevel models account for individual- versus group-level variation because these two sources of variability are both explicitly taken into account. Since all non-modeled variation falls to the residuals, multilevel models are guaranteed to capture between-group variability when it exists. These forms are also a convenient way of estimating regression coefficients for groups separately but concurrently. The alternative is to construct separate models, whereby between-group variability is completely lost. In addition, multilevel models provide more flexibility for expressing scientific theories, which routinely consider settings where individual cases are contained in larger groups, which themselves are contained in even larger groups, and so on. Moreover, there are real problems with ignoring hierarchies in data. The resulting models will have the wrong standard errors on group-affected coefficients since fully pooled results assume that the apparent commonalities are results of individual effects. This problem also spills over into covariances between coefficient estimates. Thus, the multilevel modeling framework not only respects the data, but provides better statistical inference which can be used to describe phenomena and inform decisions.

NOTE

1 These are: Switzerland, Germany, Spain, Finland, Ireland, Iceland, Italy, Norway, Portugal, and Sweden.

REFERENCES

Aitkin, M., Anderson, D., and Hinde, J. (1981) ‘Statistical Modeling of Data on Teaching Styles’, Journal of the Royal Statistical Society, Series A, 144: 419–61.

Aitkin, M. and Longford, N. (1986) ‘Statistical Modeling Issues in School Effectiveness Studies’, Journal of the Royal Statistical Society, Series A, 149: 1–43.

Albert, J.H. and Chib, S. (1993) ‘Bayesian Analysis of Binary and Polychotomous Response Data’, Journal of the American Statistical Association, 88: 669–79.

Amemiya, T. (1985) Advanced Econometrics. Cambridge, MA: Harvard University Press.

Baum, L.E. and Eagon, J.A. (1967) ‘An Inequality with Applications to Statistical Estimation for Probabilistic Functions of Markov Processes and to a Model for Ecology’, Bulletin of the American Mathematical Society, 73: 360–3.

Baum, L.E. and Petrie, T. (1966) ‘Statistical Inference for Probabilistic Functions of Finite Markov Chains’, Annals of Mathematical Statistics, 37: 1554–63.

Booth, J.G., Casella, G., and Hobert, J.P. (2008) ‘Clustering Using Objective Functions and Stochastic Search’, Journal of the Royal Statistical Society, Series B, 70: 119–39.

Breslow, N.E. and Clayton, D.G. (1993) ‘Approximate Inference in Generalized Linear Mixed Models’, Journal of the American Statistical Association, 88: 9–25.

Bryk, A.S. and Raudenbush, S.W. (1987) ‘Applications of Hierarchical Linear Models to Assessing Change’, Psychological Bulletin, 101: 147–58.

Bryk, A.S., Raudenbush, S.W., Seltzer, M., and Congdon, R. (1988) An Introduction to HLM: Computer Program and User’s Guide. 2nd edn. Chicago: University of Chicago Department of Education.

Burstein, L. (1980) ‘The Analysis of Multi-Level Data in Educational Research and Evaluation’, Review of Research in Education, 8: 158–233.

Carlin, B.P., Gelfand, A.E., and Smith, A.F.M. (1992) ‘Hierarchical Bayesian Analysis of Changepoint Problems’, Applied Statistics, 41: 389–405.

Casella, G. and Berger, R.L. (2001) Statistical Inference. 2nd edn. Belmont, CA: Duxbury Advanced Series.

Christiansen, C.L. and Morris, C.N. (1997) ‘Hierarchical Poisson Regression Modeling’, Journal of the American Statistical Association, 92: 618–32.

Cohen, J., Nagin, D., Wallstrom, G., and Wasserman, L. (1998) ‘Hierarchical Bayesian Analysis of Arrest Rates’, Journal of the American Statistical Association, 93: 1260–70.

Cowles, M.K. (2002) ‘MCMC Sampler Convergence Rates for Hierarchical Normal Linear Models: A Simulation Approach’, Statistics and Computing, 12: 377–89.

Daniels, M.J. and Gatsonis, C. (1999) ‘Hierarchical Generalized Linear Models in the Analysis of Variations in Health Care Utilization’, Journal of the American Statistical Association, 94: 29–42.

De Leeuw, J. and Kreft, I. (1986) ‘Random Coefficient Models for Multilevel Analysis’, Journal of Educational Statistics, 11: 57–85.

Dempster, A.P., Laird, N.M., and Rubin, D.B. (1977) ‘Maximum Likelihood from Incomplete Data via the EM Algorithm’, Journal of the Royal Statistical Society, Series B, 39: 1–38.

Eisenhart, C. (1947) ‘The Assumptions Underlying the Analysis of Variance’, Biometrics, 3: 1–21.

Gelman, A. (2005) ‘Analysis of Variance: Why It Is More Important Than Ever’, Annals of Statistics, 33: 1–53.

Gelman, A. (2006) ‘Prior Distributions for Variance Parameters in Hierarchical Models’, Bayesian Analysis, 1: 515–33.

Gelman, A. and Hill, J. (2007) Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge: Cambridge University Press.

Goel, P.K. (1983) ‘Information Measures and Bayesian Hierarchical Models’, Journal of the American Statistical Association, 78: 408–10.

Goel, P.K. and DeGroot, M.H. (1981) ‘Information About Hyperparameters in Hierarchical Models’, Journal of the American Statistical Association, 76: 140–7.

Goldstein, H. (1985) Multilevel Statistical Models. New York: Halstead Press.

Goldstein, H. (1987) Multilevel Models in Education and Social Research. Oxford: Oxford University Press.

Hadjicostas, P. and Berry, S.M. (1999) ‘Improper and Proper Posteriors with Improper Priors in a Poisson-Gamma Hierarchical Model’, Test, 8: 147–66.

Hartley, H.O. (1958) ‘Maximum Likelihood Estimation From Incomplete Data’, Biometrics, 14: 174–94.

Healy, M. and Westmacott, M. (1956) ‘Missing Values in Experiments Analysed on Automatic Computers’, Journal of the Royal Statistical Society, Series C, 5: 203–6.

Henderson, C.R. (1950) ‘Estimation of Genetic Parameters’, Biometrics, 6: 186–7.

Henderson, C.R., Kempthorne, O., Searle, S.R., and von Krosigk, C.M. (1959) ‘The Estimation of Environmental and Genetic Trends From Records Subject To Culling’, Biometrics, 15: 192.

Hobert, J.P. and Casella, G. (1996) ‘The Effect of Improper Priors on Gibbs Sampling in Hierarchical Linear Mixed Models’, Journal of the American Statistical Association, 91: 1461–73.

Hodges, J.S. and Sargent, D.J. (2001) ‘Counting Degrees of Freedom in Hierarchical and Other Richly Parameterized Models’, Biometrika, 88: 367–79.

Jones, G.L. and Hobert, J.P. (2001) ‘Honest Exploration of Intractable Probability Distributions via Markov Chain Monte Carlo’, Statistical Science, 16: 312–34.

Kreft, I.G.G. and De Leeuw, J. (1988) ‘The Seesaw Effect: A Multilevel Problem?’, Quality and Quantity, 22: 127–37.

Laird, N.M. and Ware, J.H. (1982) ‘Random-Effects Models for Longitudinal Data’, Biometrics, 38: 963–74.

Lee, V.E. and Bryk, A.S. (1989) ‘Multilevel Model of the Social Distribution of High School Achievement’, Sociology of Education, 62: 172–92.

Lindstrom, M.J. and Bates, D.M. (1988) ‘Newton-Raphson and EM Algorithms for Linear Mixed-Effects Models for Repeated-Measures Data’, Journal of the American Statistical Association, 83: 1014–22.

Liu, J.S. (1994) ‘The Collapsed Gibbs Sampler in Bayesian Computations with Applications to a Gene Regulation Problem’, Journal of the American Statistical Association, 89: 958–66.

Longford, N.T. (1987) ‘A Fast Scoring Algorithm for Maximum Likelihood Estimation in Unbalanced Mixed Models With Nested Random Effects’, Biometrika, 74: 817–27.

Mason, W.M., Wong, G.Y., and Entwistle, B. (1983) ‘Contextual Analysis Through the Multilevel Linear Model’, in S. Leinhardt (ed.), Sociological Methodology 1983–1984. Oxford: Blackwell, 72–103.

McKendrick, A.G. (1926) ‘Applications of Mathematics to Medical Problems’, Proceedings of the Edinburgh Mathematical Society, 44: 98–130.

Metropolis, N., Rosenbluth, A.W., Rosenbluth, M.N., Teller, A.H., and Teller, E. (1953) ‘Equation of State Calculations by Fast Computing Machines’, Journal of Chemical Physics, 21: 1087–91.

Newcomb, S. (1886) ‘A Generalized Theory of the Combination of Observations So As to Obtain the Best Results’, American Journal of Mathematics, 8: 343–66.

Pauler, D.K., Menon, U., McIntosh, M., Symecko, H.L., Skates, S.J., and Jacobs, I.J. (2001) ‘Factors Influencing Serum CA125II Levels in Healthy Postmenopausal Women’, Cancer Epidemiology, Biomarkers & Prevention, 10: 489–93.

Pettitt, A.N., Tran, T.T., Haynes, M.A., and Hay, J.L. (2006) ‘A Bayesian Hierarchical Model for Categorical Longitudinal Data From a Social Survey of Immigrants’, Journal of the Royal Statistical Society, Series A, 169: 97–114.

Raudenbush, S.W. (1988) ‘Education Applications of Hierarchical Linear Models: A Review’, Journal of Educational Statistics, 12: 85–116.

Raudenbush, S.W. and Bryk, A.S. (1986) ‘A Hierarchical Model for Studying School Effects’, Sociology of Education, 59: 1–17.


Ravishanker, N. and Dey, D.K. (2002) A First Course in Linear Model Theory. New York: Chapman & Hall/CRC.

Scheffé, H. (1956) ‘Alternative Models for the Analysis of Variance’, Annals of Mathematical Statistics, 27: 251–71.

Stangl, D.K. (1995) ‘Prediction and Decision Making Using Bayesian Hierarchical Models’, Statistics in Medicine, 14: 2173–90.

Steffey, D. (1992) ‘Hierarchical Bayesian Modeling With Elicited Prior Information’, Communications in Statistics–Theory and Methods, 21: 799–821.

Stiratelli, R., Laird, N., and Ware, J.H. (1984) ‘Random-Effects Models for Serial Observations with Binary Response’, Biometrics, 40: 961–71.

Zangwill, W.I. (1969) Nonlinear Programming: A Unified Approach. Englewood Cliffs, NJ: Prentice-Hall.

Zeger, S.L. and Karim, M.R. (1991) ‘Generalized Linear Models With Random Effects: A Gibbs Sampling Approach’, Journal of the American Statistical Association, 86: 79–86.

Zeger, S.L. and Liang, K.-Y. (1986) ‘Longitudinal Data Analysis for Discrete and Continuous Outcomes’, Biometrics, 42: 121–30.