Wooldridge Cluster

download Wooldridge Cluster

of 7

Transcript of Wooldridge Cluster

  • 7/27/2019 Wooldridge Cluster

    1/7

    American Economic Association

    Cluster-Sample Methods in Applied EconometricsAuthor(s): Jeffrey M. WooldridgeSource: The American Economic Review, Vol. 93, No. 2, Papers and Proceedings of the OneHundred Fifteenth Annual Meeting of the American Economic Association, Washington, DC,January 3-5, 2003 (May, 2003), pp. 133-138Published by: American Economic AssociationStable URL: http://www.jstor.org/stable/3132213 .

    Accessed: 20/04/2013 15:04

    Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at .

    http://www.jstor.org/page/info/about/policies/terms.jsp

    .JSTOR is a not-for-profit service that helps scholars, researchers, and students discover, use, and build upon a wide range of

    content in a trusted digital archive. We use information technology and tools to increase productivity and facilitate new forms

    of scholarship. For more information about JSTOR, please contact [email protected].

    .

    American Economic Association is collaborating with JSTOR to digitize, preserve and extend access to The

    American Economic Review.

    http://www.jstor.org

    This content downloaded from 128.122.149.154 on Sat, 20 Apr 2013 15:04:26 PMAll use subject to JSTOR Terms and Conditions

    http://www.jstor.org/action/showPublisher?publisherCode=aeahttp://www.jstor.org/stable/3132213?origin=JSTOR-pdfhttp://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/stable/3132213?origin=JSTOR-pdfhttp://www.jstor.org/action/showPublisher?publisherCode=aea
  • 7/27/2019 Wooldridge Cluster

    2/7

    Cluster-SampleMethodsinAppliedEconometricsBy JEFFREYM. WOOLDRIDGE*

    Inference methodsthatrecognize the cluster-ing of individualobservationshave been avail-able for more than 25 years. Brent Moulton(1990) caughtthe attentionof economistswhenhe demonstrated he serious biases that can re-sult in estimating the effects of aggregateexplanatoryvariables on individual-specificre-sponse variables. The source of the downwardbias in the usual ordinary east-squares(OLS)standarderrors is the presence of an unob-served,state-leveleffect in the error erm.Morerecently, John Pepper (2002) showed how ac-counting for multi-level clustering can havedramaticeffects on t statistics. While adjustingfor clustering s muchmore commonthan t was10 years ago, inferencemethodsrobustto clus-ter correlationare not used routinelyacross allrelevant settings. In this paper, I provide anoverview of applications of cluster-samplemethods, both to cluster samples and to paneldata sets.Potentialproblemswith inference n thepres-ence of group effects when the number ofgroups is small have been highlightedin a re-cent paperby StephenDonald and Kevin Lang(2001). I review differentways of handlingthesmall numberof groupscase in Section III.

    I. The ModelThe goal is to estimate the parametersn thefollowing linearmodel:

    (1) Ygm= a + xgJ + Zgmy + vgm = 1, ... , Mg g = 1, ... , G

    where g indexes the "group"or "cluster,"mindexes observationswithin group, Mg is thegroupsize, and G is the numberof groups.The* Departmentof Economics,MichiganStateUniversity,EastLansing,MI48824-1038 (e-mail:wooldril @msu.edu).I thank Steven Haider and John Pepper for very helpfulcommentson an earlierdraft. An expandedversion of thispaperis availablefrom the authorupon request.

    133

    1 X K vectorXgcontainsexplanatoryvariablesthatvaryonly at the grouplevel, andthe 1 X Lvector zgmcontains explanatoryvariables thatvary within group.The approach o estimationandinference n equation 1) dependson severalfactors, ncludingwhetherone is interestedn theeffects of aggregatevariables(3S)or individual-specific variables (y). Plus, it is necessary tomake assumptions about the error terms. Animportantssue is whetherVgm ontains a com-mon groupeffect, as in(2) Vgm Cg+ gm m= 1,...,Mgwhere Cg is an unobservedcluster effect andUgm is the idiosyncraticerror.[In the statisticsliterature, 1) and (2) arereferred o as a "hier-archicallinearmodel."]One important ssue iswhetherthe explanatoryvariables n (1) can betakento be appropriatelyxogenous. Under(2),exogeneity issues can be brokendown by sep-aratelyconsideringCg and Ugm.I assumethatthe samplingscheme generatesobservations hatareindependentacrossg. Ap-propriatesampling assumptionswithin clusterare more complicated.Theoretically,the sim-plest case also allows the most flexibility forrobust inference: from a large population ofrelativelysmallclusters,drawa largenumberofclusters (G), of sizes Mg. This setup is appro-priatein randomlysamplinga large numberoffamilies or classrooms.The key feature is thatthe number of groups is large enough so thatone can allow general within-clustercorrela-tion. Randomly sampling a large number ofclusters also applies to many panel data sets,where the cross-sectional population size islarge (say, individuals)and the numberof timeperiods s small. Forpaneldata,G is thenumberof cross-sectionalunits, and Mg is the numberof time periodsfor unit g.Stratifiedsampling also results in data setsthatcan be arrangedby group,where the pop-ulation is first stratified nto G > 2 nonover-lapping groups and then a random sample ofsize Mg is obtainedfrom each group. Ideally,

    This content downloaded from 128.122.149.154 on Sat, 20 Apr 2013 15:04:26 PMAll use subject to JSTOR Terms and Conditions

    http://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsp
  • 7/27/2019 Wooldridge Cluster

    3/7

    AEAPAPERSAND PROCEEDINGSthe strata izes arelargein thepopulation, esult-ingin largeMg.I consider hissampling cheme nSectionHIl.

    II. The Numberof Groups s "Large"The asymptotic theory for G -> co is welldeveloped;for a recenttreatment, ee Chapters7, 10, and 11 in Wooldridge (2002). Supposethat the covariatesare exogenous in the sensethat

    (3) E(Vgmlg, Zg) = 0m=l, ...,Mg g=l ,...G

    where Zg contains Zgm, m = 1, ..., Mg. Thena pooled ordinary east-squares(OLS) estima-tor, where Ygm is regressed on 1, Xg, Zgm (m =1, .... Mg; g = 1, ..., G) is consistent for A-(a, 1',y')' (as G -> oowith Mg fixed) andV/-asymptotically normal. Withoutmore as-sumptions,a robustvariancematrix s needed toaccount for correlationwithin clusters or het-eroscedasticity n Var(vgmlxg,Zg). When Vgmhas the form in (2), the within-clustercorrela-tion can be substantial,which means the usualOLS standarderrorscan be very misleading.Section7.8 in Wooldridge(2002) gives the for-mula for a variance-matrixestimatorthat as-sumes no particular kind of within-clustercorrelationnor a particular orm of heterosce-dasticity.These formulasapplywithoutchangeto panel data with a large number of cross-sectional observations.Such variancematricesareeasy to computenow with existing softwarepackages.Under (2) one can use generalized leastsquares(GLS) to exploit the presence of Cg nvgm.The standardassumptionsimply that theMg X Mg variance-covariancematrixof Vg =(Vgl, g2, ..., VgM )' has the "random effects"form, Var(vg) = (rcjM jM + 0ruIM, wherejM is the Mg X 1 vector of l's and IM isthe Mg X Mg identitymatrix.The standard s-sumptions also include the "system homosce-dasticity"assumption,Var(vglxg,Zg) = Var(vg).The resultingGLS estimator s the well-knownrandom-effects RE)estimator see Section 10.3in Wooldridge[2002]).The RE estimator s asymptoticallymore ef-ficient than pooled OLS under the usual

    RE assumptions, and RE estimates and teststatistics are computed by popular softwarepackages.Somethingoften overlooked n appli-cations s thatone can make nference ompletelyrobust to an unknown form of Var(vglxg, Zg).Equation7.49 in Wooldridge(2002) gives therobustformula.Even if Var(vglxg,Zg) does nothave the RE form, the RE estimator is stillconsistentand

  • 7/27/2019 Wooldridge Cluster

    4/7

    RECENTADVANCES N ECONOMETRICMETHODOLOGY

    Var(ugIZg) o have an arbitraryorm, includingwithin-groupcorrelationandheteroscedasticity.ManuelArellano(1987) proposeda fully robustvariance-matrixestimatorfor the fixed-effectsestimator,and it works with clustersamples orpanel data (see also equation 10.59 in Wool-dridge [2002]). Reasons for wanting a fullyrobustvariance-matrix stimator or FE appliedto cluster samples are similar to the RE case.

    III. The Numberof Groups s "Small"The proceduresdescribed in Section II areeasy to implement,so it is natural o ask:Eventhoughthose proceduresare theoretically usti-

    fied for large G, might they work well formoderate,or even small, G? Joshua D. AngristandVictorLavy (2002) providereferencesthatshow how cluster-robust stimatorsafterpooledOLS do not work very well, even when G is aslargeas 40 or 50. Less is known abouthow wellthe fully robust variance-matrix stimatorandthe associated robust inference work after REestimation.Recently, in the context of fixed-effectsesti-mationandpaneldata,GaborK6zde(2001) andMarianneBertrand t al. (2002) studythe finite-samplepropertiesof robustvariance-matrix s-timators that are theoreticallyjustified only asG --> oo.One common findingis that the fullyrobust estimator works reasonably well evenwhen the cross-sectionalsample size is not es-pecially largerelativeto the time-seriesdimen-sion. When Var(ugIZg)does not dependon Zg,a variancematrix hatexploits systemhomosce-dasticitycanperformbetter han thefully robustvariance-matrix stimator.Importantly, he encouragingfindingsof the

    simulationsfor fixed effects with paneldata arenot in conflict with findings that the robustvariance matrix for the pooled OLS estimatorwith a small number of groups can behavepoorly. For FE estimationusing panel data,theissue is serial correlation n {ugm: m = 1, ....Mg}, which dies out as the time periodsget farapart.The pooled OLS estimatorthatkeeps cgin the error ermsuffersbecauseof the constantcorrelationacross all observationswithin clus-ter.Plus, FE estimates y, while for pooled OLSwith clusteringthe focus is usually on f.When G is very small, relying on large Gasymptotics can be very misleading. Donald

    and Lang (2001; hereafter,DL) have recentlyoffered an alternative approachto inference,particularly or hypothesis testing about ,3. Tobegin, consider a special case of (1) whereZgmis not in the equation and Xg is a scalar. Theequationis(5) ygm= a +1Xg + cg + ugm

    m= l,...,Mg g= 1,...,Gwhere Cg and {ugm: m = 1, ..., Mg} areindependent of xg and { ugm: m = 1, ... , Mg }is a mean-zero, ndependent, denticallydistrib-uted sequence for each g. Even with small G,the pooled OLS estimator s natural or estimat-ing /3.If the cluster effect Cg s absentfrom themodel and Var(ugm) is constant across g, thenprovided N M + ... + MG is large enough(whetheror not G is not large), we can use theusual t statistics from the pooled OLS regres-sion as having an approximate tandardnormaldistribution.Making inference robust to het-eroscedasticity s straightforwardor large N.As pointed out by DL, the presence of Cgmakes the usualpooled-OLSinferencemethodspoorly behavedwith small G. With a commoncluster effect, there is no averagingout withincluster that allows applicationof the central-limit theorem.One way to see the problem s tonote that the pooled-OLSestimator, 3, is iden-tical to the "between"estimatorobtainedfromthe regressionof yg on 1, xg (g = 1, ..., G).Given the Xg, I8 inherits its distributionfrom{vg: g = 1, ..., G}, the within-groupaveragesof the vgm. The presence of Cg means newobservationswithingroupdo not provideaddi-tional information or estimating13beyond af-fecting the groupaverage,yg.If some assumptions readded, here s a solu-tion to the inferenceproblem.In particular, s-sume Cg- .N(0, oa2) is independentof Ugm.N(0, oru)and Mg = M for all g where JVdenotesa normaldistribution.Then vg - N(0,o2 + c2M). Since independenceacross g isassumed,the equation(6) yg = a + Bxg + -g g = 1, ..., Gsatisfiesthe classical linear-modelassumptions.Therefore,one can use inferencebased on thetG-2 distribution to test hypotheses about 13,

    VOL.93 NO. 2 135

    This content downloaded from 128.122.149.154 on Sat, 20 Apr 2013 15:04:26 PMAll use subject to JSTOR Terms and Conditions

    http://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsp
  • 7/27/2019 Wooldridge Cluster

    5/7

    AEA PAPERSAND PROCEEDINGS

    providedG > 2. When G is small, the require-mentsfor a significant statisticare muchmorestringenthen f oneuses the M, + M2+-- + MG - 2distribution, which is what one would bedoing by using the usualpooled OLS statistics.When Xgis a 1 X K vector, one needs G >K + 1 to use the tG -K-1 distributionforinferenceafterestimatingthe aggregatedequa-tion (6) by OLS.If Zgm is in the model,thenonecan addthe groupaverages,zg, to (6), providedG > K + L + 1, and use the tG-K-L-distributionfor inference. (An alternativeap-proachthat conserves on degrees of freedom,but is only approximatelyvalid, is describedbelow.)

    Importantly, erforming he correct nferencein the presence of Cgis not just a matter ofcorrectingthe pooled-OLS standarderrorsforcluster correlation,or using the RE estimator.All threeestimationmethods ead to the same,.But using the between regressionin (6) givesthe appropriate tandarderrorand reportsthesmall degrees of freedomin the t distribution.If the commongroupsize M is large,thenugwill be approximatelynormalvery generally,sovg is approximatelynormalwith constantvari-ance. Even if the group sizes differ, for verylargegroupsizes ug will be a negligible partofvg. ProvidedCg s normallydistributed,classi-cal linear model analysis on (6) should beroughlyvalid.ForsmallG andlargeMg, inferenceobtainedfrom analyzing (6) as a classical linear modelcan be very conservativeif there is no clustereffect. Perhapsthis is desirable,but it also ex-cludes some staples of policy analysis. In thesimplest case, suppose there are two popula-tions with means /ug (g = 1, 2), and thequestionis whethertheirdifferenceis zero. Un-derrandomsamplingfromeachpopulation,andassumingnormalityand equal populationvari-ances, the usual comparison-of-means tatisticis distributedexactly as tM +M2-2 under thenull hypothesis of equal means. With evenmoderate-sizedMl andM2, one can relax nor-mality andadjustthe statisticfor differentpop-ulationvariances.In the DL setup,the standardcomparison-of-means ase cannoteven be ana-lyzed, because G = 2. DL criticizeDavid Cardand Alan B. Krueger (1994) for comparingmean wage changes of fast-food workers be-cause CardandKrueger ail to accountfor cg in

    vgm, but the criticism in the G = 2 case isindistinguishable rom a common criticism ofdifference-in-differences DID) analyses:Howcan one be surethatany observeddifferenceinmeans is due entirelyto the policy change?More generally, in studies with G > 2 itoften makes sense to view the observationsascoming from standard tratified ampling.Withlarge group sample sizes one can get preciseestimates of the group populationmeans, /ug.Forexample, supposethatG = 4 andgroups 1and 2 are controlgroups,while groups3 and4aretreatedgroups.One mightestimatethe pol-icy effect by T = (x3 + /x4)/2 - (,1 + /2)/2,ordifferent ixedweightscouldbe usedto allowfordifferentgrouppopulationsizes. In anycase,one canget a good estimatorof r by plugginginthe groupmeans, and when properlystandard-ized, T will be approximatelystandardnormaleven if theMgareas smallas, say, 30. To obtaina valid standarderror, it is not necessary toassume that the group means or varianceswithin, say, the treatedgroup,are the same. IntheDL approach,heestimated reatment ffect,A, is obtainedby pooling withinthe treatedandcontrolgroups,and then differencingthe treat-ment and controlmeans. Their inferenceusingthe t2 distributions a differentway of account-ing for ,1 :/ /2 or /23+ ,L4.It seems that moreworkis neededto reconcilethe two approacheswhen G is small.With large group sizes, a minimumdistance(MD) approach o estimating18 heds additionalinsight.For each groupg, write a linearmodelwith individual-specificcovariatesas(7) Ygm=6g +ZgmYg + Ugm m = 1, ..., Mgassuming random sampling within groups.Also, make the assumptions for OLS to beconsistent asMg-> oo)andVKg-asymptoticallynormalseeWooldridge,002Ch.4). Thepresenceof group-levelvariablesXg n (1) can be viewedas puttingrestrictionson the intercepts,8g. Inparticular,(8) g = a + Xg g1,...,Gwhere we now think of xg as fixed observedattributes f the differentgroups.Giventhatonecan estimatethe hg precisely,a simpletwo-stepestimationstrategysuggests itself. First,obtain

    136 MAY2003

    This content downloaded from 128.122.149.154 on Sat, 20 Apr 2013 15:04:26 PMAll use subject to JSTOR Terms and Conditions

    http://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsp
  • 7/27/2019 Wooldridge Cluster

    6/7

    RECENTADVANCES N ECONOMETRICMETHODOLOGYthe 8g (along with 'g) from an OLS regressionwithin each group. Alternatively, to imposeyg = y for all g, then pool across groups andinclude group dummy variablesto get the 8g.If G = K + 1 then one can solve for =- (a,A')'uniquely n termsof theG X 1 vector&:O=X -1, where X is the (K + 1) X (K + 1)matrix with gth row (1, Xg). If G > K + 1,then in a second step, one can use an MDapproach,as described n Section 14.6 of Wool-dridge (2002). If the G X G identitymatrixisthe weighting matrix,the MD estimatorcan becomputedfrom the OLS regressionof(9) g on 1,xg g = 1, ..., G.IfMg = pgM where0 < pg ' 1 and M oo,the MD estimator 0 is consistent and VMM-asymptoticallynormal.However, this MD esti-mator s asymptotically nefficientexcept understrongassumptions.It is not difficult to obtainthe efficient MD estimator-also called the"minimumchi-square"estimator.The simplestcase is when zgm does not appearin the first-stage estimation, so that the 8g are samplemeans.Let g^denotethe usual samplevariancefor group g. The minimumchi-squareestimatorcan be computedby using the weighted-least-squares(WLS) version of (9), wheregroupg isweightedby Mg/l2. Conveniently,the reportedt statisticsfrom the WLS regressionareasymp-totically standardnormal as the groupsizes Mgget large.An example of this kind of procedureis given by Susanna Loeb and John Bound(1996).A by-product of the WLS regression is aminimumchi-squarestatisticthat can be usedtotest the G - K - 1 overidentifyingrestrictions.The statistic is easily obtainedas the weightedsum of squaredresiduals(SSR): underthe nullhypothesisin (8), SSRw a X2-K-1 If the nullhypothesisHois rejectedat a small significancelevel, the Xgare not sufficientfor characterizingthe changing intercepts across groups. If onefails to rejectHo, one canhave some confidencein the specificationandperform nferenceusingthe standardnormaldistribution or t statistics.If Zg, appears n the first stage, one can useas weights the asymptotic variances of the Gintercepts.These mightbe made fully robusttoheteroscedasticity n E(u2mIZgm),r at least al-low different o2. In any case, the weights are

    given by 1/[SE(8g)]2 (g = 1, ... , G), whereSE(8g) are the asymptoticstandard rrors.Forexample,supposeXg s a binary reatmentindicator.Then ,3 is an estimate of an averagetreatmenteffect. If G = 2 thereare no restric-tions to test. With G > 2 one can test theoveridentifying restrictions. Rejection impliesthat there are missing group-level characteris-tics, and one might re-specify the model byaddingelementsto Xg,even if the new elementsare not thoughtto be systematicallyrelated tothe originalelements (as when treatment s ran-domly assigned at the group level).Alternatively, if the restrictions in (8) arerejected, one concludes that 6g = a + Xg3 +Cg,where Cg s the errormade in imposing therestrictions. This leads to the DL approach,which is to analyze the OLS regressionin (9)where inferenceis based on the tG - K-1 i distri-bution. Why is this approach ustified? Since8g = Og -+ Op(Mg 1/2), for largeMg one mightignore the estimation error in 8g. Then, it isas if the equation 8g = a + xg1 + Cg(g =1, ... , G) is being estimatedby OLS. Classicalanalysisis applicablewhenCg-- V(0, 0o2)andCgis independentof Xg.The latterassumptionmeans that differences in the intercepts6g notdue to Xgmustbe unrelated o xg, which seemsreasonable if G is not too small and Xg is arandomlyassigned treatmentvariableassignedat the group level, as in Angrist and Lavy(2002).

    REFERENCESAngrist,JoshuaD. andLavy,Victor."TheEffectof High School MatriculationAwards:Evi-dence from Randomized Trials." Workingpaper, Massachusetts Institute of Technol-ogy, 2002.Arellano,Manuel."ComputingRobustStandardErrorsfor Within-GroupsEstimators."Ox-ford Bulletin of Economics and Statistics,November 1987, 49(4), pp. 431-34.Bertrand, Marianne; Duflo, Esther and Mullain-athan,Sendhil."HowMuchShouldWe TrustDifferences-in-Differencesstimates?"Work-ing paper, MassachusettsInstituteof Tech-nology, 2002.Card, David and Krueger, Alan B. "MinimumWages and Employment:A Case Study ofthe Fast-Food Industryin New Jersey and

    VOL.93 NO. 2 137

    This content downloaded from 128.122.149.154 on Sat, 20 Apr 2013 15:04:26 PMAll use subject to JSTOR Terms and Conditions

    http://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsp
  • 7/27/2019 Wooldridge Cluster

    7/7

    AEA PAPERSAND PROCEEDINGS

    Pennsylvania."AmericanEconomicReview,September1994, 84(4), pp. 772-93.Donald,StephenG. andLang,Kevin. "Inferencewith Difference in Differences and OtherPanel Data." Working paper, University ofTexas, 2001.Kezde,Gaibor."Robust StandardErrorEstima-tion in Fixed-Effects Panel Models." Work-ing paper,Universityof Michigan,2001.Liang,Kung-YeendZeger,ScottL."LongitudinalDataAnalysisUsing GeneralizedLinearMod-els."Biometrika,April1986, 73(1),pp. 13-22.Loeb,Susannaand Bound,John."TheEffect ofMeasured School Inputs on AcademicAchievement: Evidence from the 1920s,

    1930s, and 1940s BirthCohorts."Review ofEconomics and Statistics, November 1996,78(4), pp. 653-64.Moulton,BrentR. "AnIllustration f a Pitfall inEstimating the Effects of Aggregate Vari-ables on Micro Units."Reviewof Economicsand Statistics,May 1990, 72(2), pp. 334-38.Pepper,John V. "RobustInferencesfrom Ran-dom ClusteredSamples:An ApplicationUs-ing Data from the Panel Study of IncomeDynamics." Economics Letters, May 2002,75(3), pp. 341-45.Wooldridge, effreyM. Econometricanalysis ofcross section and panel data. Cambridge,MA: MIT Press, 2002.

    138 MAY2003

    This content downloaded from 128.122.149.154 on Sat, 20 Apr 2013 15:04:26 PMAll bj JSTOR T d C di i

    http://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsp