Introduction to Robust and Clustered Standard...

Introduction to Robust and Clustered Standard Errors

Miguel Sarzosa

Department of Economics

University of Maryland

Econ626: Empirical Microeconomics, 2012

1 Standard Errors, why should you worry about them

2 Obtaining the Correct SE

3 Consequences

4 Now we go to Stata!

Regressions and what we estimateA regression does not calculate the value of a relation between twovariables. In fact, it calculates a distribution of values of suchrelation. That is why, when you calculate a regression the two mostimportant outputs you get are:

I The conditional mean of the coe�cientI The standard deviation of the distribution of that coe�cient

Are we in the presence of a star?

The standard errors determine how accurate is your estimation.Therefore, it a�ects the hypothesis testing.That is why the standard errors are so important: they are crucial indetermining how many stars your table gets.And like in any business, in economics, the stars matter a lot.Hence, obtaining the correct SE, is critical

SE and the data

The correct SE estimation procedure is given by the underlying structureof the data.

It is very unlikely that all observations in a data set are unrelated, butdrawn from identical distributionsFor instance, the variance of income is often grater in familiesbelonging to top deciles than among poorer familiesSome phenomena do not a�ect observations individually, but theya�ect groups of observations uniformly within each group.

SE and the data

The correct SE estimation procedure is given by the underlying structureof the data.

It is very unlikely that all observations in a data set are unrelated, butdrawn from identical distributionsFor instance, the variance of income is often grater in familiesbelonging to top deciles than among poorer families(heteroskedasticity)

Some phenomena do not a�ect observations individually, but theya�ect groups of observations uniformly within each group. (clustered

The Basic Case (i.i.d.)

b + ei

⇠�0,s2

This means that E

hee 0 |X

i= ⌦= s2

We know b =hX

0y = b +

0e and

Var (b ) = E

0ee 0X

Heteroskedasticity (i.n.i.d)

b + ei

⇠�0,s2

This means that E

hee 0 |X

i= ⌦= diag

�(big di�erence)

Heteroskedasticity (i.n.i.d)

Var (b )=E

0ee 0X

�=hX

0ee 0X

No further simplification is possibleNeed to estimate E

0ee 0X

i= ÂN

Be aware that ÂN

0^e^eX

Then the Huber-Eicker-White (HEW) VC estimator is:

⇣b⌘=hX

Two-step procedure: 1. Get ei

, 2. Estimate (1). In Stata:vce(robust)

Clustered Data

Observations are related with each other within certain groups

ExampleYou want to regress kids’ grades on class size.The unobservables of kids belonging to the same classroom will becorrelated (e.g., teachers’ quality, recess routines) while will not becorrelated with kids in far away classrooms.

This means, we are assuming independence across clusters butcorrelation within clusters. That is:

b + eig

where g = 1, . . . ,G

(0 if g = g

s(ij)g if g 6= g

Clustered Data

Let us stack observations by cluster

b + eg

where g = 1, . . . ,G

The OLS estimator of b is still b =hX

0y. (We just stacked

the data)The variance is given by

Var (b ) = E

Clustered Data

By (2), we know that

⌦= diag (⌃g

666666666666666666666664

(11)1 . . . s(1N

)1 0 . . . 0 . . . 0 . . . 0

1)1 s(N1

)1 0 0 0 0

0 . . . 0 s2

(11)2 . . . s(1N

0 . . . 0 s(N2

1)2 . . . s2

(11)g . . . s(1N

1)g . . . s2

777777777777777777777775

Clustered Data

With this in mind, we can now write the VC matrix for clustered data

⇣b⌘=

In Stata: vce(cluster clustvar ). Where clustvar is a variablethat identifies the groups in which onobservables are allowed tocorrelate.

The importance of knowing your data

In real world you should never go with the i.i.d. case. Life is not thatsimpleYou need to know your data in order to choose the correct errorstructure and then infer the required SE calculationIf you have aggregate variables (like class size), clustering at that levelis required.You might think your data correlates in more than one way

I If nested (e.g., classroom and school district), you should cluster at thehighest level of aggregation

I If not nested (e.g., time and space), you can:1

Include fixed-e�ects in one dimension and cluster in the other one.

Multi-way clustering extension (see Cameron, Gelbach and Miller, 2006)

How much will my SE increase?

Clustered SE will increase your confidence intervals because you areallowing for correlation between observations. Hence, less stars in yourtables. The higher the clustering level, the larger the resulting SE.For one regressor the clustered SE inflate the default (i.i.d.) SE by

re�N �1

were rx

is the within-cluster correlation of the regressor, re is thewithin-cluster error correlation and N is the average cluster size.Here it is easy to see the importance of clustering when you haveaggregate regressors (i.e., r

Now we go to Stata!

Introduction to Robust and Clustered Standard...

Documents

Transcript of Introduction to Robust and Clustered Standard...

The Clustered Community of

CLUSTERED DATA ONTAP ADMINISTRATION, 8.2 UPDATE …s804283110.onlinehome.us/Netapp Clustered Data ONTAP Administrati… · FEATURED NETAPP PRODUCTS Clustered Data ONTAP 8.2 NetApp

Clustered Data ONTAP 8

Clustered File Systems: No Limits - SNIA · Clustered File Systems: No Limits ... Clustered File System Types 8 Distributed/Clustered File Systems Parallel ... code management, troubleshooting

Clustered-Dot Color HalftoneWatermarks

Clustrix Parallelized Clustered Database

Clustered Data ONTAP 8.1 and An Introduction - CDWwebobjects.cdw.com/.../netapp/clustered-data-ontap-introduction.pdf · 6 Clustered Data ONTAP 8.1 AND 8.1.1: An Introduction system

Robust and Clustered Standard Errors

Leverage causes fat tails and clustered volatility · PDF fileLeverage causes fat tails and clustered volatility ... to previous explanations of fat tails and clustered volatility,

Robust Calibration for Localization in Clustered Wireless Sensor Networks-0Rm

Practical Clustered Shading - Humus

Silvana de Jesús Olalla Sarzosa Mario Troya, MA., Director de ...

The Economic Performance of Clustered and Non Clustered Firms

Robust Inference with Clustered Data - Colin Cameron, …cameron.econ.ucdavis.edu/research/slidesrobustsurvey… · · 2011-05-25Clustering and Its Consequences for OLS Equicorrelated

Clustered graphs

Clustered Naive Bayes

Clustered data ontap_8.3_physical_storage

Clustered Data ONTAP Security GuidanceCli Technical Report Clustered Data ONTAP Security Guidance Recommendations for Security Dave Buster, NetApp April 2015 | TR-4393 Abstract Clustered

Clustered compressive sensingbased

Clustered graph, visualization, hierarchical visualization