Ps602 notes part1
Advanced Quantitative Methods - PS 602 Notes – Version as of 8/30/12
Robert D. Duval - WVU Dept of Political Science
Class: 116 Woodburn; Office: 301A Woodburn; Office hours: Th 9:00-11:30, M-F 12:00-1:00; Phone 304-293-9537 (office)
304-599-8913 (home)
email: [email protected] (Do not use Mix email address!)
April 10, 2023
Syllabus
Required texts
Additional readings
Computer exercises
Course requirements:
Midterm - in class, open book (30%)
Final - in class, open book (30%)
Research paper (30%)
Participation (10%)
http://www.polsci.wvu.edu/duval/ps602/602syl.htm
Slide 2
Prerequisites
A fundamental understanding of calculus
An informal but intuitive understanding of the mathematics of probability
A sense of humor
Slide 3
Statistics is an innate cognitive skill
We all possess the ability to do rudimentary statistical analysis - in our heads - intuitively.
The cognitive machinery for stats is built into us, just like it is for calculus.
This is part of how we process information about the world.
It is not simply mysterious arcane jargon; it is simply the mysterious arcane way you already think. Much of it formalizes simple intuition.
Why do we set alpha (α) to .05?
Slide 4
Introduction
This course is about regression analysis, the principal method in the social sciences.
Three basic parts to the course:
An introduction to the general model
The formal assumptions and what they mean
Selected special topics that relate to regression and linear models
Slide 5
Introduction: The General Linear Model
The General Linear Model (GLM) is a phrase used to indicate a class of statistical models which include simple linear regression analysis.
Regression is the predominant statistical tool used in the social sciences due to its simplicity and versatility.
Also called Linear Regression Analysis, Multiple Regression, or Ordinary Least Squares.
Slide 6
Simple Linear Regression: The Basic Mathematical Model
Regression is based on the concept of the simple proportional relationship - also known as the straight line.
We can express this idea mathematically!
Theoretical aside: All theoretical statements of relationship imply a mathematical theoretical structure.
Just because it isn't explicitly stated doesn't mean that the math isn't implicit in the language itself!
Slide 7
Math and language
When we speak about the world, we often have embedded implicit relationships that we are referring to.
For instance:
Increasing taxes will send us into recession.
Decreasing taxes will spur economic growth.
Slide 8
From language to models
The idea that reducing taxes on the wealthy will spur economic growth (or that increasing taxes will harm economic growth) suggests that there is a proportional relationship between tax rates and growth in domestic product.
So let's look!
Disclaimer! The "models" that follow are meant to be examples. They are not "good" models, only useful ones to talk about!
Slide 9
Sources: (1) US Bureau of Economic Analysis, http://www.bea.gov/iTable/iTable.cfm?ReqID=9&step=1 (2) US Internal Revenue Service, http://www.irs.gov/pub/irs-soi/09in05tr.xls
GDP and Average Tax Rates: 1986-2009
The Stats
Slide 11
regress gdp avetaxrate
      Source |       SS       df       MS              Number of obs =      24
-------------+------------------------------           F(  1,    22) =    5.11
       Model |    44236671     1    44236671           Prob > F      =  0.0340
    Residual |   190356376    22  8652562.57           R-squared     =  0.1886
-------------+------------------------------           Adj R-squared =  0.1517
       Total |   234593048    23  10199697.7           Root MSE      =  2941.5

------------------------------------------------------------------------------
         gdp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  avetaxrate |  -1341.942   593.4921    -2.26   0.034    -2572.769   -111.1148
       _cons |   26815.88   7910.084     3.39   0.003     10411.37    43220.39
------------------------------------------------------------------------------
But wait, there's more…
What is the model? There is a directly proportional relationship between tax rates and economic growth.
How about an equation? We will get back to this…
Can you critique the "model"?
Can we look at it differently?
Slide 13
Effective Tax Rate and Growth in GDP:1986-2009
Sources: (1) US Bureau of Economic Analysis, http://www.bea.gov/iTable/iTable.cfm?ReqID=9&step=1 (2) US Internal Revenue Service, http://www.irs.gov/pub/irs-soi/09in05tr.xls
. regress gdpchange avetaxrate
Source | SS df MS Number of obs = 23
-------------+------------------------------ F( 1, 21) = 6.15
Model | 22.4307279 1 22.4307279 Prob > F = 0.0217
Residual | 76.5815075 21 3.64673845 R-squared = 0.2265
-------------+------------------------------ Adj R-squared = 0.1897
Total | 99.0122354 22 4.50055615 Root MSE = 1.9096
------------------------------------------------------------------------------
gdpchange | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
avetaxrate | .9889805 .3987662 2.48 0.022 .1597008 1.81826
_cons | -7.977765 5.292757 -1.51 0.147 -18.98466 3.029126
------------------------------------------------------------------------------
Slide 15
Finally…sort of
Slide 16
. regress gdpchange taxratechange
      Source |       SS       df       MS              Number of obs =      23
-------------+------------------------------           F(  1,    21) =   11.00
       Model |   34.0305253     1   34.0305253         Prob > F      =  0.0033
    Residual |   64.9817101    21   3.09436715         R-squared     =  0.3437
-------------+------------------------------           Adj R-squared =  0.3124
       Total |   99.0122354    22   4.50055615         Root MSE      =  1.7591

-------------------------------------------------------------------------------
    gdpchange |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
taxratechange |   .2803104   .0845261     3.32   0.003     .1045288     .456092
        _cons |   5.414708   .3780098    14.32   0.000     4.628593    6.200822
-------------------------------------------------------------------------------
Alternate Mathematical Notation for the Line
Alternate mathematical notation for the straight line - don't ask why!
10th Grade Geometry: y = mx + b
Statistics Literature: Yi = a + bXi + ei
Econometrics Literature (like your textbook): Yi = B0 + B1Xi + ei
Slide 17
Alternate Mathematical Notation for the Line – cont.
These are all equivalent. We simply have to live with this inconsistency.
We won’t use the geometric tradition, and so you just need to remember that B0 and a are both the same thing.
Slide 18
Linear Regression: the Linguistic Interpretation
In general terms, the linear model states that the dependent variable is directly proportional to the value of the independent variable.
Thus if we state that some variable Y increases in direct proportion to some increase in X, we are stating a specific mathematical model of behavior - the linear model.
Hence, if we say that the crime rate goes up as unemployment goes up, we are stating a simple linear model.
Slide 19
Linear Regression:A Graphic Interpretation
Slide 20
[Figure: "The Straight Line" - a plot of Y (0 to 12) against X (1 to 10)]
The linear model is represented by a simple picture
Slide 21
[Figure: "Simple Linear Regression" - a scatterplot of Y (0 to 12) against X (1 to 10) with a fitted line]
The Mathematical Interpretation: The Meaning of the Regression Parameters
a = the intercept
the point where the line crosses the Y-axis
(the value of the dependent variable when all of the independent variables = 0)
b = the slope
the increase in the dependent variable per unit change in the independent variable (also known as the 'rise over the run')
Slide 22
The Error Term
Such models do not predict behavior perfectly.
So we must add a component to adjust or compensate for the errors in prediction.
Having fully described the linear model, the rest of the semester (as well as several more) will be spent on the error.
Slide 23
The Nature of Least Squares Estimation
There is 1 essential goal and there are 4 important concerns with any OLS Model
Slide 24
The 'Goal' of Ordinary Least Squares
Ordinary Least Squares (OLS) is a method of finding the linear model which minimizes the sum of the squared errors.
Such a model provides the best explanation/prediction of the data.
Slide 25
Why Least Squared Error?
Why not simply minimum error?
The errors about the line sum to 0.0!
Minimum absolute deviation (error) models now exist, but they are mathematically cumbersome.
Try algebra with |absolute value| signs!
Slide 26
Other models are possible...
Best parabola...? (i.e. nonlinear or curvilinear relationships)
Best maximum likelihood model ... ?
Best expert system...?
Complex systems…?
Chaos/non-linear systems models
Catastrophe models
Others
Slide 27
The Simple Linear Virtue
I think we overemphasize the linear model.
It does, however, embody the rather important notion that Y is proportional to X.
As noted, we can state such relationships in simple English:
As unemployment increases, so does the crime rate.
As domestic conflict increases, national leaders will seek to distract their populations by initiating foreign disputes.
Slide 28
The Notion of Linear Change
The linear aspect means that the same amount of increase in unemployment will have the same effect on crime at both low and high unemployment.
A nonlinear change would mean that as unemployment increases, its impact upon the crime rate might increase at higher unemployment levels.
Slide 29
Why squared error? Because:
(1) the sum of the errors expressed as deviations would be zero as it is with standard deviations, and
(2) some feel that big errors should be more influential than small errors.
Therefore, we wish to find the values of a and b that produce the smallest sum of squared errors.
Slide 30
The Parameter estimates
In order to do this, we must find parameter estimates which accomplish this minimization.
In calculus, if you wish to know when a function is at its minimum, you set the first derivative equal to zero.
In this case we must take partial derivatives since we have two parameters (a & b) to worry about.
We will look closer at this and it’s not a pretty sight!
Slide 31
Decomposition of the error in LS
Slide 32
Goodness of Fit
Since we are interested in how well the model performs at reducing error, we need to develop a means of assessing that error reduction.
Since the mean of the dependent variable represents a good benchmark for comparing predictions, we calculate the improvement in the prediction of Yi relative to the mean of Y (the best guess of Y with no other information).
Slide 33
Sum of Squares Terminology
In mathematical jargon we seek to minimize the Unexplained Sum of Squares (USS), where:
Slide 34
USS = Σ(Yi − Ŷi)² = Σei²
Sums of Squares
This gives us the following 'sum-of-squares' measures:
Total Variation = Explained Variation + Unexplained Variation
TSS = Total Sum of Squares = Σ(Yi − Ȳ)²
ESS = Explained Sum of Squares = Σ(Ŷi − Ȳ)²
USS = Unexplained Sum of Squares = Σ(Yi − Ŷi)²
Slide 35
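The decomposition above can be checked numerically. A small Python sketch with made-up data (the helper `ols_fit` just applies the standard bivariate OLS formulas):

```python
# Made-up illustrative data; any OLS fit with an intercept satisfies TSS = ESS + USS.
def ols_fit(x, y):
    """Bivariate OLS: returns intercept a and slope b."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
        sum((xi - xbar) ** 2 for xi in x)
    a = ybar - b * xbar
    return a, b

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2]
a, b = ols_fit(x, y)
yhat = [a + b * xi for xi in x]
ybar = sum(y) / len(y)

TSS = sum((yi - ybar) ** 2 for yi in y)          # total variation
ESS = sum((yh - ybar) ** 2 for yh in yhat)       # explained variation
USS = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))  # unexplained variation
print(abs(TSS - (ESS + USS)) < 1e-9)  # the decomposition holds
```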
Sums of Squares Confusion
Note: Occasionally you will run across ESS and RSS, which generate confusion since they can be used interchangeably.
ESS can be the error sum-of-squares, or alternatively, the estimated or explained SSQ.
Likewise RSS can be the residual SSQ, or the regression SSQ.
Hence the use of USS for unexplained SSQ in this treatment.
Gujarati uses ESS (explained sum of squares) and RSS (residual sum of squares).
Slide 36
The Parameter Estimates
In order to find the 'best' parameters, we must find parameter estimates which accomplish this minimization - that is, the parameters that have the smallest sum of squared errors.
In calculus, if you wish to know when a function is at its minimum, you take the first derivative and set it equal to 0.0. (The second derivative must be positive as well, but we do not need to go there.)
In this case we must take partial derivatives since we have two parameters to worry about.
Slide 37
Deriving the Parameter Estimates
Since
USS = Σ(Yi − Ŷi)² = Σei² = Σ(Yi − a − bXi)²
we can take the partial derivative of USS with respect to both a and b.
Slide 38
∂USS/∂a = −2Σ(Yi − a − bXi)(1)
∂USS/∂b = −2Σ(Yi − a − bXi)(Xi)
Deriving the Parameter Estimates (cont.)
Which simplifies to
∂USS/∂a = −2Σ(Yi − a − bXi) = 0
∂USS/∂b = −2ΣXi(Yi − a − bXi) = 0
We also set these derivatives to 0 to indicate that we are at a minimum.
Slide 39
Deriving the Parameter Estimates (cont.)
We now add a 'hat' to the parameters to indicate that the results are estimators.
We also set these derivatives equal to zero:
∂USS/∂â = −2Σ(Yi − â − b̂Xi) = 0
∂USS/∂b̂ = −2ΣXi(Yi − â − b̂Xi) = 0
Slide 40
Deriving the Parameter Estimates (cont.)
Dividing through by −2 and rearranging terms, we get the normal equations:
ΣYi = an + b(ΣXi)
ΣXiYi = a(ΣXi) + b(ΣXi²)
Slide 41
Deriving the Parameter Estimates (cont.)
We can solve these equations simultaneously to get our estimators:
b = [nΣXiYi − (ΣXi)(ΣYi)] / [nΣXi² − (ΣXi)²]
  = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)²
a = Ȳ − bX̄
Slide 42
Deriving the Parameter Estimates (cont.)
The estimator for a also shows that the regression line always goes through the point (X̄, Ȳ) - the intersection of the two means.
This formula is quite manageable for bivariate regression. If there are two or more independent variables, the formula for b2, etc. becomes unmanageable!
See matrix algebra in POLS603!
Slide 43
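The two forms of b are algebraically equivalent, and a = Ȳ − bX̄ forces the line through the point of means. A Python sketch with made-up numbers:

```python
# Made-up illustrative data.
x = [2.0, 4.0, 6.0, 8.0, 10.0]
y = [1.0, 2.0, 2.0, 4.0, 5.0]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

# "Computational" form: b = [n*Sum(XY) - Sum(X)Sum(Y)] / [n*Sum(X^2) - (Sum(X))^2]
b1 = (n * sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y)) / \
     (n * sum(xi ** 2 for xi in x) - sum(x) ** 2)

# "Deviation" form: b = Sum((X - Xbar)(Y - Ybar)) / Sum((X - Xbar)^2)
b2 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
     sum((xi - xbar) ** 2 for xi in x)

a = ybar - b2 * xbar
print(abs(b1 - b2) < 1e-9)                 # the two forms agree
print(abs((a + b2 * xbar) - ybar) < 1e-9)  # line passes through (Xbar, Ybar)
```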
Tests of Inference
t-tests for coefficients
F-test for entire model
Since we are interested in how well the model performs at reducing error, we need to develop a means of assessing that error reduction.
Since the mean of the dependent variable represents a good benchmark for comparing predictions, we calculate the improvement in the prediction of Yi relative to the mean of Y.
Remember that the mean of Y is your best guess of Y with no other information. (Well, often - assuming the data are normally distributed!)
Slide 44
t-Tests
Since we wish to make probability statements about our model, we must do tests of inference.
Fortunately,
t(n−2) = B̂ / seB
OK, so what is the seB?
Slide 45
Standard Errors of Estimates
These estimates have variation associated with them
Slide 46
This gives us the F test:
The F-test tells us whether the full model is significant.
Note that F statistics have 2 different degrees of freedom: k−1 and n−k, where k is the number of regressors in the model.
F(k−1, n−k) = [ESS/(k−1)] / [USS/(n−k)]
Slide 47
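To see the arithmetic, the F statistic in the gdpchange ~ avetaxrate output above can be recomputed from its Model (ESS) and Residual (USS) sums of squares; here k counts both estimated parameters (intercept and slope):

```python
# ESS = Model SS, USS = Residual SS from the gdpchange ~ avetaxrate output; n = 23.
ESS, USS = 22.4307279, 76.5815075
n, k = 23, 2
F = (ESS / (k - 1)) / (USS / (n - k))
print(round(F, 2))  # ≈ 6.15, matching the printed "F(1, 21)"
```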
More on the F-testIn addition, the F test for the entire model must
be adjusted to compensate for the changed degrees of freedom.
Note that F increases as n or R2 increases and decreases as k – the number of independent variables - increases.
Slide 48
F test for different Models
The F test can tell us whether two different models of the same dependent variable are significantly different - i.e., whether adding a new variable will significantly improve estimation.
F = [(R²new − R²old) / number of new regressors] / [(1 − R²new) / (n − number of parameters in the new model)]
Slide 49
The Correlation Coefficient
A measure of how close the residuals are to the regression line.
It ranges between −1.0 and +1.0.
It is closely related to the slope.
Slide 50
R² (r-square, r²)
The r² (or R-square) is also called the coefficient of determination.
It is the percent of the variation in Y explained by X.
It must range between 0.0 and 1.0.
An r² of .95 means that 95% of the variation about the mean of Y is "caused" (or at least explained) by the variation in X.
Slide 51
r² = ESS/TSS = 1 − USS/TSS
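The printed R-squared in the earlier gdpchange ~ avetaxrate output can be recovered from its sums of squares:

```python
# Model SS (ESS) and Residual SS (USS) from the gdpchange ~ avetaxrate output.
ESS, USS = 22.4307279, 76.5815075
TSS = ESS + USS          # 99.0122354, the printed Total SS
r2 = ESS / TSS           # equivalently: 1 - USS / TSS
print(round(r2, 4))      # ≈ 0.2265, matching the printed "R-squared"
```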
Observations on r2
R² always increases as you add independent variables. The r² will go up even if X2, or any new variable, is a completely random variable.
r2 is an important statistic, but it should never be seen as the focus of your analysis.
The coefficient values, their interpretation, and the tests of inference are really more important.
Beware of r2 = 1.0 !!!
Slide 52
The Adjusted R2
Since R2 always increases with the addition of a new variable, the adjusted R2 compensates for added explanatory variables.
Note that it may range < 0.0 and greater than 1.0!!! But these values indicate poorly formed models.
Adjusted R² = 1 − [USS/(n − k − 1)] / [TSS/(n − 1)]
where k is the number of regressors.
Slide 53
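Plugging the same Stata run's numbers into this formula (k = 1 regressor, n = 23):

```python
# Total SS and Residual SS (USS) from the gdpchange ~ avetaxrate output.
TSS, USS = 99.0122354, 76.5815075
n, k = 23, 1
adj_r2 = 1 - (USS / (n - k - 1)) / (TSS / (n - 1))
print(round(adj_r2, 4))  # ≈ 0.1897, matching the printed "Adj R-squared"
```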
Comments on the adjusted R-squared
R-squared will always go up with the addition of a new variable.
Adjusted R-squared will go down if the variable contributes nothing new to the explanation of the model.
As a rule of thumb, if the new variable has a t-value greater than 1.0, it increases the adjusted R-squared.
Slide 54
The assumptions of the model
We will spend the next 7 weeks on this!
Slide 55
The Multiple Regression Model: The Scalar Version
The basic multiple regression model is a simple extension of the bivariate equation.
By adding extra independent variables, we are creating a multiple-dimensioned space, where the model is fit as a surface in that space.
For instance, if there are two independent variables, we are fitting the points to a 'plane in space'.
Visualizing this in more dimensions is a good trick.
Slide 56
The Scalar Equation
The basic linear model:
Yi = a + b1X1i + b2X2i + ... + bkXki + ei
If bivariate regression can be described as a line on a plane, multiple regression represents a k-dimensional object in a (k+1)-dimensional space.
Slide 57
The Matrix Model
We can use a different type of mathematical structure to describe the regression model, frequently called matrix (or linear) algebra.
The multiple regression model may be easily represented in matrix terms:
Y = XB + e
where Y, X, B and e are all matrices of data, coefficients, or residuals.
Slide 58
The Matrix Model (cont.)
The matrices in Y = XB + e are represented by
Slide 59

Y = [Y1, Y2, ..., Yn]'        (n × 1 vector of observations on Y)

    [ X11  X12  ...  X1k ]
X = [ X21  X22  ...  X2k ]    (n × k matrix of observations on the X's)
    [ ...  ...  ...  ... ]
    [ Xn1  Xn2  ...  Xnk ]

B = [B1, B2, ..., Bk]'        (k × 1 vector of coefficients)
e = [e1, e2, ..., en]'        (n × 1 vector of residuals)

Note that we postmultiply X by B, since this order makes them conformable.
Also note that X1 - the first column of X - is a column of 1's, so that B1 serves as the intercept term.
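In the bivariate case the matrix estimator B = (X'X)⁻¹X'Y can be written out by hand, since X'X is only 2 × 2. A Python sketch with made-up data:

```python
# Made-up illustrative data; X = [1 | x], i.e. a column of 1's plus one regressor.
x = [2.0, 4.0, 6.0, 8.0, 10.0]
y = [1.0, 2.0, 2.0, 4.0, 5.0]
n = len(x)

# Entries of X'X and X'Y
Sx = sum(x)
Sxx = sum(xi * xi for xi in x)
Sy = sum(y)
Sxy = sum(xi * yi for xi, yi in zip(x, y))

det = n * Sxx - Sx * Sx            # |X'X|; zero would mean perfect collinearity
B0 = (Sxx * Sy - Sx * Sxy) / det   # intercept row of (X'X)^(-1) X'Y
B1 = (n * Sxy - Sx * Sy) / det     # slope row of (X'X)^(-1) X'Y
print(round(B0, 4), round(B1, 4))  # → -0.2 0.5
```

These match the scalar formulas for a and b exactly, which is the point: the matrix form is the same estimator written compactly for any number of regressors.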
Assumptions of the modelScalar Version
The OLS model has seven fundamental assumptions. (This count varies from author to author, based upon what each author thinks is a separate assumption!)
These assumptions form the foundation for all regression analysis.
Failure of a model to conform to these assumptions frequently presents problems for estimation and inference. The "problem" may range from minor to severe!
These problems, or violations of the assumptions, almost invariably arise out of substantive or theoretical problems!
Slide 60
The Assumptions of the Model
Scalar Version (cont.)
1. The ei's are normally distributed.
2. E(ei) = 0
3. E(ei²) = σ²
4. E(eiej) = 0 (i ≠ j)
5. The X's are nonstochastic, with values fixed in repeated samples, and Σ(Xi − X̄)²/n is a finite nonzero number.
6. The number of observations is greater than the number of coefficients estimated.
7. No exact linear relationship exists between any of the explanatory variables.
Slide 61
The Assumptions of the Model
The English Version
The errors have a normal distribution.
The errors are homoskedastic. (The variation in the errors doesn't change across values of the independent or dependent variables.)
There is no serial correlation in the errors. (The errors are unrelated to their neighbors.)
There is no multicollinearity. (No variable is a perfect function of another variable.)
The X's are fixed (non-stochastic) and have some variation, but no infinite values.
There are more data points than unknowns.
The model is linear in its parameters. (All modeled relationships are directly proportional.)
OK… so it's not really English….
Slide 62
The Assumptions of the Model:
The Matrix Version
These same assumptions expressed in matrix format are:
1. e ~ N(0, Σ)
2. Σ = E(ee') = σ²I
3. The elements of X are fixed in repeated samples, and (1/n)X'X is nonsingular and its elements are finite.
Slide 63
Properties of Estimators (θ̂)
Since we are concerned with error, we will be concerned with those properties of estimators which have to do with the errors produced by the estimates.
We use the symbol θ̂ to denote a general parameter estimate.
It could represent a regression slope (B), a sample mean (X̄), a standard deviation (s), or many other statistics or estimators based on some sample.
Slide 64
Types of estimator error
Estimators are seldom exactly correct due to any number of reasons, most notably sampling error and biased selection.
There are several important concepts that we need to understand in examining how well estimators do their job.
Slide 65
Sampling Error
Sampling error is simply the difference between the true value of a parameter and its estimate in any given sample.
This sampling error means that an estimator will vary from sample to sample, and therefore estimators have variance.
Slide 66
Sampling error = θ̂ − θ
Var(θ̂) = E[θ̂ − E(θ̂)]² = E(θ̂²) − [E(θ̂)]²
Bias
The bias of an estimate is the difference between its expected value and its true value.
If the estimator is always low (or high) then the estimator is biased.
An estimator is unbiased if
E(θ̂) = θ
and its bias is
Bias = E(θ̂) − θ
Slide 67
Mean Squared Error
The mean square error (MSE) is different from the estimator's variance in that the variance measures dispersion about the estimated parameter, while mean squared error measures the dispersion about the true parameter.
If the estimator is unbiased, then the variance and MSE are the same.
Slide 68
Mean square error = E(θ̂ − θ)²
Mean Squared Error (cont.)
The MSE is important for time series and forecasting since it allows for both bias and efficiency:
MSE = variance + (bias)²
These concepts lead us to look at the properties of estimators. Estimators may behave differently in large and small samples, so we look at both small-sample and large (asymptotic) sample properties.
Slide 69
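This identity can be checked by simulation. A Python sketch using the biased variance estimator S² = Σ(Xi − X̄)²/n (all sample values drawn at random, purely illustrative):

```python
# Simulate repeated samples of n = 10 from Normal(0, sd = 2), so true variance = 4.
import random

random.seed(1)
true_var = 4.0
estimates = []
for _ in range(20000):
    sample = [random.gauss(0, 2) for _ in range(10)]
    m = sum(sample) / len(sample)
    # S^2 with divisor n (not n-1): a biased estimator of the variance
    estimates.append(sum((v - m) ** 2 for v in sample) / len(sample))

mean_est = sum(estimates) / len(estimates)
bias = mean_est - true_var                                          # negative: S^2 underestimates
variance = sum((e - mean_est) ** 2 for e in estimates) / len(estimates)
mse = sum((e - true_var) ** 2 for e in estimates) / len(estimates)
print(abs(mse - (variance + bias ** 2)) < 1e-6)  # MSE = variance + bias^2
```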
Small Sample Properties
These are the ideal properties. We desire these to hold:
Unbiasedness
Efficiency
Best Linear Unbiased Estimator
If the small sample properties hold, then by extension, the large sample properties hold.
Slide 70
Bias
A parameter estimate is unbiased if
E(θ̂) = θ
In other words, the average value of the estimator in repeated sampling equals the true parameter.
Note that whether an estimator is biased or not implies nothing about its dispersion.
Slide 71
Bias
[Figure: two probability densities over the parameter space - an unbiased Normal (s = 1.0) centered on the true value, and a second, biased series centered away from it]
Slide 72
Efficiency
An estimator is efficient if it is unbiased and its variance is less than that of any other unbiased estimator of the parameter:
θ̂ is unbiased;
Var(θ̂) ≤ Var(θ̃), where θ̃ is any other unbiased estimator of θ.
There might be instances in which we might choose a biased estimator, if it has a smaller variance.
Slide 73
Efficiency
Slide 74
Series 1 (s=1.0) is more efficient than Series 2 (s=2.0)
BLUE (Best Linear Unbiased Estimate)
An estimator θ̂ is described as a BLUE estimator if it:
is a linear function
is unbiased
satisfies Var(θ̂) ≤ Var(θ̃), where θ̃ is any other linear unbiased estimator of θ
Slide 75
What is a linear estimator?
A linear estimator looks like the formula for a straight line:
θ̂ = a1x1 + a2x2 + a3x3 + ... + anxn
The linearity referred to is linearity in the parameters.
Note that the sample mean is an example of a linear estimator:
X̄ = (1/n)X1 + (1/n)X2 + (1/n)X3 + ... + (1/n)Xn
Slide 76
BLUE is Bueno
If your estimator (e.g. the regression B) is the BLUE estimator, then you have a very good estimator - relative to other regression-style estimators.
The problem is that if certain assumptions are violated, then OLS may no longer be the "best" estimator. There might be a better one!
You can still hope that the large sample properties hold, though!
Slide 77
Asymptotic (Large Sample) Properties
Asymptotically unbiased
Consistency
Asymptotic efficiency
Slide 78
Asymptotic Bias
An estimator is asymptotically unbiased if
lim(n→∞) E(θ̂n) = θ
As the sample size gets larger, the estimated parameter gets closer to the true value.
For instance, for S² = Σ(Xi − X̄)²/n:
E(S²) = σ²(1 − 1/n), which approaches σ² as n → ∞.
Slide 79
Consistency
The point at which a distribution collapses is called the probability limit (plim).
If the bias and variance both decrease as the sample size gets larger, the estimator is consistent.
Usually noted by
plim(n→∞) θ̂ = θ, i.e. lim(n→∞) P(|θ̂ − θ| < δ) = 1 for any δ > 0
Slide 80
Asymptotic Efficiency
An estimator is asymptotically efficient if:
it has an asymptotic distribution with finite mean and variance
it is consistent
no other estimator has smaller asymptotic variance
Slide 81
Rifle and Target Analogy
Small sample properties:
Bias: The shots cluster around some spot other than the bull's-eye.
Efficiency: One rifle's cluster is smaller than another's.
BLUE: Smallest scatter for rifles of a particular type of simple construction.
Slide 82
Rifle and Target Analogy (cont.)
Asymptotic properties: Think of increased sample size as getting closer to the target.
Asymptotic unbiasedness means that as the sample size gets larger, the center of the point cluster moves closer to the target center.
With consistency, the point cluster moves closer to the target center and the cluster shrinks in size.
If it is asymptotically efficient, then no other rifle has a smaller cluster that is closer to the true center.
When all of the assumptions of the OLS model hold, its estimators are:
unbiased
minimum variance, and
BLUE
Slide 83
Assumption Violations: How we will approach the question
Definition
Implications
Causes
Tests
Remedies
Slide 84
Non-zero Mean for the Residuals (Definition)
Definition: The residuals have a mean other than 0.0.
Note that this refers to the true residuals: the estimated residuals will have a mean of 0.0 even while the true residuals are non-zero.
Slide 85
Non-zero Mean for the Residuals (Implications)
The true regression line is
Yi = a + bXi + ei
Therefore the intercept is biased.
The slope, b, is unbiased. There is also no way of separating out a and the non-zero error mean.
Slide 86
Non-zero Mean for the Residuals (Causes, Tests, Remedies)
Causes: Non-zero means result from some form of specification error. Something has been omitted from the model which accounts for that mean in the estimation.
We will discuss tests and remedies when we look closely at specification errors.
Slide 87
Non-normally Distributed Errors: Definition
The residuals are not NID(0, σ²).
Slide 88
[Figure: histogram of the residuals of rate90, ranging from about −1000 to 2000]
Normality Tests Section (5% level):
Skewness 5.1766, p = 0.000000 - Rejected
Kurtosis 4.6390, p = 0.000004 - Rejected
Omnibus 48.3172, p = 0.000000 - Rejected
Non-normally Distributed Errors: Implications
The existence of residuals which are not normally distributed has several implications.
First, it implies that the model is to some degree misspecified.
A collection of truly stochastic disturbances should have a normal distribution: the central limit theorem states that as the number of random variables increases, the sum of their distributions tends to be a normal distribution.
(Distribution theory - beyond the scope of this course.)
Slide 89
Non-normally Distributed Errors: Implications (cont.)
If the residuals are not normally distributed, then the estimators of a and b are also not normally distributed.
Estimates are, however, still BLUE: unbiased, with minimum variance among linear unbiased estimators.
They are no longer efficient relative to all possible estimators, even though they are asymptotically unbiased and consistent.
It is only our hypothesis tests which are suspect.
Slide 90
Non-normally Distributed Errors: Causes
Generally caused by a misspecification error - usually an omitted variable.
Can also result from:
Outliers in the data
Wrong functional form
Slide 91
Non-normally Distributed Errors: Tests for Non-normality
Chi-square goodness of fit:
Since the cumulative normal frequency distribution has a chi-square distribution, we can test for the normality of the error terms using a standard chi-square statistic.
We take our residuals, group them, and count how many occur in each group, along with how many we would expect in each group.
Slide 92
Non-normally Distributed Errors: Tests for Non-normality (cont.)
We then calculate the simple χ² statistic:
χ² = Σ(i = 1 to k) (Oi − Ei)² / Ei
This statistic has (N − 1) degrees of freedom, where N is the number of classes.
Slide 93
Non-normally Distributed Errors: Tests for Non-normality (cont.)
Jarque-Bera test:
This test examines both the skewness and kurtosis of a distribution to test for normality:
JB = n[S²/6 + (K − 3)²/24]
where S is the skewness and K is the kurtosis of the residuals.
JB has a χ² distribution with 2 df.
Slide 94
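A Python sketch of the JB computation on a deliberately skewed set of "residuals" (made-up data, shifted exponential draws):

```python
# Skewed fake residuals: exponential draws recentered to mean ~0.
import random

random.seed(3)
resid = [random.expovariate(1.0) - 1.0 for _ in range(500)]

n = len(resid)
m = sum(resid) / n
m2 = sum((r - m) ** 2 for r in resid) / n   # 2nd central moment
m3 = sum((r - m) ** 3 for r in resid) / n   # 3rd central moment
m4 = sum((r - m) ** 4 for r in resid) / n   # 4th central moment
S = m3 / m2 ** 1.5                          # skewness (0 for a normal)
K = m4 / m2 ** 2                            # kurtosis (3 for a normal)
JB = n * (S ** 2 / 6 + (K - 3) ** 2 / 24)
print(JB > 5.99)  # exceeds the chi-square(2) 5% critical value: reject normality
```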
Non-normally Distributed Errors: Remedies
Try to modify your theory. Omitted variable? Outlier needing specification?
Modify your functional form by taking some variance-transforming step such as square root, exponentiation, logs, etc. Be mindful that you are changing the nature of the model.
Bootstrap it! (From the shameless commercial division!)
Slide 95
Multicollinearity: Definition
Multicollinearity is the condition where the independent variables are related to each other. Causation is not implied by multicollinearity.
As any two (or more) variables become more and more closely correlated, the condition worsens, and ‘approaches singularity’.
Since the X's are fixed (or they are supposed to be, anyway), this is a sample problem.
Since multicollinearity is almost always present, it is a problem of degree, not merely existence.
Slide 96
Multicollinearity: Implications
Consider the following cases
A. No multicollinearity:
The regression would appear to be identical to separate bivariate regressions.
This produces variances which are biased upward (too large) making t-tests too small.
The coefficients are unbiased.
For multiple regression this satisfies the assumption.
Slide 97
Multicollinearity: Implications (cont.)
B. Perfect Multicollinearity
Some variable Xi is a perfect linear combination of one or more other variables Xj, therefore X'X is singular, and |X'X| = 0.
This is matrix algebra notation. It means that one variable is a perfect linear function of another. (e.g. X2 = X1+3.2)
The effects of X1 and X2 cannot be separated.
The standard errors for the B’s are infinite.
A model cannot be estimated under such circumstances. The computer dies.
And takes your model down with it…
Slide 98
Multicollinearity: Implications (cont.)
C. A high degree of multicollinearity:
When the independent variables are highly correlated, the variances and covariances of the Bi's are inflated (t-ratios are lower) and R² tends to be high as well.
The B's are unbiased (but perhaps useless due to their imprecise measurement, a result of their variances being too large). In fact they are still BLUE.
OLS estimates tend to be sensitive to small changes in the data.
Relevant variables may be discarded.
Slide 99
Multicollinearity: Causes
Sampling mechanism: poorly constructed design and measurement scheme, or limited range. Too small a sample range.
Constrained theory: X1 does affect X2 (e.g. electricity consumption = wealth + house size).
Statistical model specification: adding polynomial terms or trend indicators.
Too many variables in the model: the model is over-determined.
Theoretical specification is wrong: inappropriate construction of theory or even measurement. If your dependent variable is constructed using an independent variable, the two are related by construction.
Slide 100
Multicollinearity: Tests/Indicators
|X'X| approaches 0.0.
If the variance-covariance matrix is singular, its determinant is 0.0.
Since the determinant is a function of variable scale, this measure doesn't help a whole lot. We could, however, use the determinant of the correlation matrix and thereby bound the range from 0.0 to 1.0.
Slide 101
Multicollinearity: Tests/Indicators (cont.)
Tolerance: Tolj = 1 − Rj² = 1/VIFj
If the tolerance equals 1, the variables are unrelated. If Tolj = 0, then they are perfectly correlated.
To calculate, regress each independent variable on all the other independent variables.
Variance Inflation Factors (VIFs): VIFk = 1 / (1 − Rk²)
Slide 102
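The regress-each-X-on-the-others recipe can be coded directly. A minimal sketch (the `vif` helper is hypothetical, assuming NumPy; X is the design matrix without a constant column):

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X.
    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing
    X_j on the remaining columns plus an intercept."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = []
    for j in range(k):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        Z = np.column_stack([np.ones(n), others])   # add intercept
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        resid = y - Z @ beta
        tss = np.sum((y - y.mean())**2)
        r2 = 1.0 - (resid @ resid) / tss
        out.append(1.0 / (1.0 - r2))
    return np.array(out)
```

Orthogonal columns give VIFs of exactly 1.0; nearly collinear columns push the VIFs toward infinity.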
Interpreting VIFs
No multicollinearity produces VIFs = 1.0.
If the VIF is greater than 10.0, then multicollinearity is probably severe: 90% of the variance of Xj is explained by the other X's.
In small samples, a VIF of about 5.0 may indicate problems.
Slide 103
Multicollinearity: Tests/Indicators (cont.)
R2 deletes: try all possible models of the X's, including/excluding based on small changes in R2 with the inclusion/omission of the variables (taken 1 at a time).
F is significant, but no t value is.
Adjusted R2 declines with a new variable.
Multicollinearity is of concern when either
rX1X2 > rX1Y  or  rX1X2 > rX2Y
Slide 104
Multicollinearity: Tests/Indicators (cont.)
I would avoid the rule of thumb rX1X2 > .6.
Beta's are > 1.0 or < -1.0.
Sign changes occur with the introduction of a new variable.
The R2 is high, but few t-ratios are significant.
Eigenvalues and Condition Index: if this topic is beyond Gujarati, it's beyond me.
Slide 105
Multicollinearity: Remedies
Increase sample size (pooled cross-sectional time series).
Thereby introducing all sorts of new problems!
Omit variables.
Scale construction/transformation.
Factor analysis.
Constrain the estimation, such as the case where you can set the value of one coefficient relative to another.
Slide 106
Multicollinearity: Remedies (cont.)
Change design (LISREL maybe, or pooled cross-sectional time series).
Thereby introducing all sorts of new problems!
Ridge regression: this technique introduces a small amount of bias into the coefficients to reduce their variance.
Ignore it: report adjusted R2 and claim it warrants retention in the model.
Slide 107
Heteroskedasticity: Definition
Heteroskedasticity is a problem where the error terms do not have a constant variance:
E(ei²) = σi²
That is, they may have a larger variance when values of some Xi (or the Yi's themselves) are large (or small).
Slide 108
Heteroskedasticity: Definition
This often gives the plots of the residuals by the dependent variable (or appropriate independent variables) a characteristic fan or funnel shape.
[Figure: scatter plot of residuals fanning out as X increases]
Slide 109
Heteroskedasticity: Implications
The regression B's are unbiased.
But they are no longer the best estimator. They are not BLUE (not minimum variance - hence not efficient).
They are, however, consistent.
Slide 110
Heteroskedasticity: Implications (cont.)
The estimator variances are not asymptotically efficient, and they are biased, so confidence intervals are invalid.
What do we know about the direction of the bias of the variance?
If Yi is positively correlated with ei, the bias is negative (hence t values will be too large).
With positive bias, many t's will be too small.
Slide 111
Heteroskedasticity: Implications (cont.)
Types of heteroskedasticity
There are a number of types of heteroskedasticity:
Additive
Multiplicative
ARCH (autoregressive conditional heteroskedastic), a time series problem.
Slide 112
Heteroskedasticity: Causes
It may be caused by:
Model misspecification: omitted variable or improper functional form.
Learning behaviors across time.
Changes in data collection or definitions.
Outliers or breakdown in the model.
Frequently observed in cross-sectional data sets where demographics are involved (population, GNP, etc.).
Slide 113
Heteroskedasticity: Tests
Informal methods
Plot the data and look for patterns!
Plot the residuals by the predicted dependent variable (resids on the Y-axis).
Plotting the squared residuals actually makes more sense, since that is what the assumption refers to!
Homoskedasticity will appear as a random scatter horizontally across the plot.
Slide 114
Heteroskedasticity: Tests (cont.)
Park test
As an exploratory test, log the squared residuals and regress them on the logged values of the suspected independent variable:
ln ûi² = ln σ² + B ln Xi + vi = a + B ln Xi + vi
If the B is significant, then heteroskedasticity may be a problem.
Slide 115
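The Park regression is just OLS on the logged squared residuals. A minimal sketch (the `park_test` helper and its return values are hypothetical, assuming NumPy; it reports the slope and its t-ratio):

```python
import numpy as np

def park_test(resid, x):
    """Park test sketch: regress ln(u^2) on ln(x).
    A significant slope suggests heteroskedasticity related to x."""
    lu2 = np.log(np.asarray(resid, dtype=float)**2)
    lx = np.log(np.asarray(x, dtype=float))
    n = lx.size
    Z = np.column_stack([np.ones(n), lx])
    beta, *_ = np.linalg.lstsq(Z, lu2, rcond=None)
    e = lu2 - Z @ beta
    s2 = (e @ e) / (n - 2)                       # error variance estimate
    se_b = np.sqrt(s2 / np.sum((lx - lx.mean())**2))
    return beta[1], beta[1] / se_b               # slope and its t-ratio
```

If the residual spread grows roughly proportionally with x, the estimated slope should come out near 2 (since u² then scales with x²).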
Heteroskedasticity: Tests (cont.)
Glejser Test
This test is quite similar to the Park test, except that it uses the absolute values of the residuals, and a variety of transformed X's:
|ûi| = B1 + B2 Xi + vi
|ûi| = B1 + B2 √Xi + vi
|ûi| = B1 + B2 (1/Xi) + vi
|ûi| = B1 + B2 (1/√Xi) + vi
A significant B2 indicates heteroskedasticity.
Easy test, but has problems.
Slide 116
Heteroskedasticity: Tests (cont.)
Goldfeld-Quandt test
Order the n cases by the X that you think is correlated with ei².
Drop a section of c cases out of the middle (one-fifth is a reasonable number).
Run separate regressions on both upper and lower samples.
Slide 117
Heteroskedasticity: Tests (cont.)
Goldfeld-Quandt test (cont.)
Do an F-test for the difference in error variances:
F[(n−c−2k)/2, (n−c−2k)/2] = s²e2 / s²e1
F has (n − c − 2k)/2 degrees of freedom for each sample.
Slide 118
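The sort, drop-the-middle, and compare-variances steps above can be sketched directly (the `goldfeld_quandt` helper is hypothetical, assuming NumPy and a simple one-regressor model, so k = 2 parameters per fit):

```python
import numpy as np

def goldfeld_quandt(y, x, drop_frac=0.2):
    """Goldfeld-Quandt sketch: sort by x, drop the middle fifth,
    fit OLS on each tail, and compare error variances with an F ratio."""
    order = np.argsort(x)
    y = np.asarray(y, dtype=float)[order]
    x = np.asarray(x, dtype=float)[order]
    n = len(x)
    c = int(n * drop_frac)                 # middle cases to drop
    half = (n - c) // 2
    lo, hi = slice(0, half), slice(n - half, n)

    def rss(ys, xs):
        Z = np.column_stack([np.ones(len(xs)), xs])
        b, *_ = np.linalg.lstsq(Z, ys, rcond=None)
        e = ys - Z @ b
        return e @ e

    k = 2                                  # intercept + slope
    df = (n - c - 2 * k) / 2
    F = (rss(y[hi], x[hi]) / df) / (rss(y[lo], x[lo]) / df)
    return F, df
```

A large F (upper-tail variance much bigger than lower-tail) points toward heteroskedasticity increasing in x.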
Heteroskedasticity: Tests (cont.)
Breusch-Pagan-Godfrey Test (Lagrangian multiplier test)
Estimate the model with OLS and obtain the residuals.
Obtain σ̃² = Σ ûi² / n
Construct the variables pi = ûi² / σ̃²
Slide 119
Heteroskedasticity: Tests (cont.)
Breusch-Pagan-Godfrey Test (cont.)
Regress pi on the X (and other?!) variables:
pi = α1 + α2 Z2i + α3 Z3i + … + αm Zmi + vi
Calculate Θ = ESS / 2
Note that Θ ~ χ²(m−1)
Slide 120
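The BPG recipe, scale the squared residuals, regress them on the Z's, and take half the explained sum of squares, fits in a few lines (the `breusch_pagan` helper is hypothetical, assuming NumPy; Z here holds the suspected variables without a constant column):

```python
import numpy as np

def breusch_pagan(resid, Z):
    """BPG sketch: p_i = u_i^2 / sigma~^2 regressed on the Z's;
    the statistic is Theta = ESS / 2."""
    u2 = np.asarray(resid, dtype=float)**2
    sigma2 = u2.mean()                     # sigma~^2 = sum(u_i^2) / n
    p = u2 / sigma2
    Zc = np.column_stack([np.ones(len(p)), np.asarray(Z, dtype=float)])
    b, *_ = np.linalg.lstsq(Zc, p, rcond=None)
    fitted = Zc @ b
    ess = np.sum((fitted - p.mean())**2)   # explained sum of squares
    return ess / 2.0
```

Theta is then compared against χ² with m − 1 degrees of freedom.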
Heteroskedasticity: Tests (cont.)
White's generalized heteroskedasticity test
Estimate the model with OLS and obtain the residuals.
Run the following auxiliary regression (shown here for two regressors):
ûi² = α1 + α2 X2i + α3 X3i + α4 X2i² + α5 X3i² + α6 X2i X3i + vi
Higher powers may also be used, along with more X's.
Slide 121
Heteroskedasticity: Tests (cont.)
White's generalized heteroskedasticity test (cont.)
Note that n·R² ~ χ²
The degrees of freedom is the number of coefficients estimated above.
Slide 122
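The auxiliary regression and the n·R² statistic can be sketched as follows (the `white_test` helper is hypothetical, assuming NumPy and a design matrix X without a constant column; df here counts the slope coefficients):

```python
import numpy as np

def white_test(resid, X):
    """White test sketch: regress u^2 on the X's, their squares, and
    cross products; n * R^2 from that auxiliary regression is
    chi-square with df = number of slope coefficients."""
    u2 = np.asarray(resid, dtype=float)**2
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    cols = [np.ones(n)]
    cols += [X[:, j] for j in range(k)]                     # levels
    cols += [X[:, j]**2 for j in range(k)]                  # squares
    cols += [X[:, i] * X[:, j]                              # cross products
             for i in range(k) for j in range(i + 1, k)]
    Z = np.column_stack(cols)
    b, *_ = np.linalg.lstsq(Z, u2, rcond=None)
    e = u2 - Z @ b
    r2 = 1.0 - (e @ e) / np.sum((u2 - u2.mean())**2)
    return n * r2, Z.shape[1] - 1
```

A statistic near n (R² near 1) indicates the squared residuals are almost fully explained by the X's, strong heteroskedasticity.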
Heteroskedasticity: Remedies
GLS: we will cover this after autocorrelation.
Weighted Least Squares: si² is a consistent estimator of σi².
Use the same (BLUE) formula to get a and ß.
Slide 123
Iteratively weighted least squares (IWLS)
1. Obtain estimates of ei² using OLS: Yi = a + bXi + ei
2. Use these to get "1st round" estimates of σi.
3. Using the formula above, replace wi with 1/si and obtain new estimates for a and ß.
4. Adjust the data: Yi* = Yi / si ,  Xi* = Xi / si
5. Use these to re-estimate Yi* = α* + ß*Xi* + ei*
6. Repeat steps 3-5 until a and ß converge.
Slide 124
White's corrected standard errors
For normal OLS:
Var(ß̂) = σ² / TSSx
We can restate this as
Var(ß̂) = Σ_{i=1}^{n} (xi − x̄)² σi² / TSSx²
Since TSSx = Σ_{i=1}^{n} (xi − x̄)², this is the same when σi² = σ².
Slide 125
White's corrected standard errors (cont.)
White's solution is to use the robust estimator
Vâr(ß̂j) = Σ_{i=1}^{n} r̂ij² ûi² / RSSj²
where the r̂ij are the residuals from regressing Xj on the other X's and RSSj is the residual sum of squares from that regression.
When you see robust standard errors, it usually refers to this estimator.
Slide 126
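For the one-regressor case the robust estimator reduces to Σ(xi − x̄)² ûi² / TSSx². A minimal sketch (the `white_se_slope` helper is hypothetical, assuming NumPy; no degrees-of-freedom correction is applied, i.e. this is the HC0 flavor):

```python
import numpy as np

def white_se_slope(y, x):
    """Robust (HC0) standard error for the slope in a simple regression:
    sqrt( sum((x_i - xbar)^2 * u_i^2) / TSS_x^2 )."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    Z = np.column_stack([np.ones(len(x)), x])
    b, *_ = np.linalg.lstsq(Z, y, rcond=None)
    u = y - Z @ b                      # OLS residuals
    dx = x - x.mean()
    var_b = np.sum(dx**2 * u**2) / np.sum(dx**2)**2
    return np.sqrt(var_b)
```

Under exact homoskedasticity (all ûi² equal) this collapses back to the classical formula, without the n − k correction.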
Obtaining robust errors
In Stata, just add ", r" to the regress command:
regress approval unemrate
becomes
regress approval unemrate, r
Slide 127
Autocorrelation: Definition
Autocorrelation is simply the presence of correlation between adjacent (contemporaneous) residuals.
If a residual is negative (or positive), then its neighbors tend to also be negative (or positive).
Most often autocorrelation is between adjacent observations; however, lagged or seasonal patterns can also occur.
Autocorrelation is also usually a function of order by time, but it can occur for other orderings as well, such as firm or state size.
Slide 128
Autocorrelation: Definition (cont.)
The assumption violated is E(ei ej) = 0 for i ≠ j.
This means that the Pearson's r (correlation coefficient) between the residuals from OLS and the same residuals lagged one period (or more) is non-zero.
Slide 129
Autocorrelation: Definition (cont.)
Most autocorrelation is what we call 1st order autocorrelation, meaning that the residuals are related to their contiguous values.
Autocorrelation can be rather complex, producing counterintuitive patterns and correlations.
Slide 130
Autocorrelation: Definition (cont.)
Types of autocorrelation:
Autoregressive processes
Moving averages
Slide 131
Autocorrelation: Definition (cont.)
Autoregressive processes AR(p)
The residuals are related to their preceding values:
et = ρ et−1 + ut
This is classic 1st order autocorrelation.
Slide 132
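A tiny sketch of how AR(1) errors are built from white-noise innovations (the `ar1_errors` helper is hypothetical, assuming NumPy; the first error is simply set to the first innovation):

```python
import numpy as np

def ar1_errors(rho, u):
    """Generate e_t = rho * e_{t-1} + u_t from innovations u (e_0 = u_0)."""
    u = np.asarray(u, dtype=float)
    e = np.empty(len(u))
    e[0] = u[0]
    for t in range(1, len(u)):
        e[t] = rho * e[t - 1] + u[t]
    return e
```

A single unit shock decays geometrically at rate ρ, which is the "long memory" behavior described on the later ARMA slides.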
Autocorrelation: Definition (cont.)
Autoregressive processes (cont.)
In 2nd order autocorrelation the residuals are related to their t−2 values as well:
et = ρ1 et−1 + ρ2 et−2 + ut
Larger order processes may occur as well:
et = ρ1 et−1 + ρ2 et−2 + … + ρp et−p + ut
Slide 133
Autocorrelation: Definition (cont.)
Moving average processes MA(q)
The error term is a function of some random error plus a portion of the previous random error:
et = ut + θ ut−1
Slide 134
Autocorrelation: Definition (cont.)
Moving average processes (cont.)
Higher order processes for MA(q) also exist:
et = ut + θ1 ut−1 + θ2 ut−2 + … + θq ut−q
Slide 135
Autocorrelation: Definition (cont.)
Mixed processes ARMA(p,q)
The error term is a complex function of both autoregressive and moving average processes:
et = ρ1 et−1 + … + ρp et−p + ut + θ1 ut−1 + … + θq ut−q
Slide 136
Autocorrelation: Definition (cont.)
There are substantive interpretations that can be placed on these processes.
AR processes represent shocks to systems that have long-term memory.
MA processes are quick shocks to a system that can handle the process 'efficiently,' having only short-term memory.
Slide 137
Autocorrelation: Implications
Coefficient estimates are unbiased, but the estimates are not BLUE
The variances are often greatly underestimated (biased small)
Hence hypothesis tests are exceptionally suspect. In fact, strongly significant t-tests (P < .001) may
well be insignificant once the effects of autocorrelation are removed.
Slide 138
Autocorrelation: Causes
Specification error:
Omitted variable (e.g. inflation)
Wrong functional form
Lagged effects
Data transformations:
Interpolation of missing data
Differencing
Slide 139
Autocorrelation: Tests
Observation of residuals: graph/plot them!
Runs of signs (Geary test)
Slide 140
Autocorrelation: Tests (cont.)
Durbin-Watson d
d = Σ_{t=2}^{n} (ût − ût−1)² / Σ_{t=1}^{n} ût²
Criteria for the hypothesis of autocorrelation:
Reject if d < dL
Do not reject if d > dU
Test is inconclusive if dL ≤ d ≤ dU
Slide 141
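The d statistic is easy to compute directly from the residuals (the `durbin_watson` helper is hypothetical, assuming NumPy):

```python
import numpy as np

def durbin_watson(resid):
    """d = sum((u_t - u_{t-1})^2) / sum(u_t^2).
    Near 2: no autocorrelation; near 0: positive AC; near 4: negative AC."""
    u = np.asarray(resid, dtype=float)
    return np.sum(np.diff(u)**2) / np.sum(u**2)
```

Constant-sign runs drive d toward 0, while strict sign alternation drives it toward 4, matching the symmetry-about-2 point on the next slide.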
Autocorrelation: Tests (cont.)
Durbin-Watson d (cont.)
Note that d is symmetric about 2.0, so negative autocorrelation will be indicated by a d > 2.0.
Use the same distances above 2.0 as upper and lower bounds.
Slide 142
Analysis of Time Series. http://cnx.org/content/m34544/latest/
Autocorrelation: Tests (cont.)
Durbin's h
Cannot use the DW d if there is a lagged endogenous variable in the model.
h = (1 − d/2) √( T / (1 − T sc²) )
sc² is the estimated variance of the coefficient on the Yt−1 term.
h has a standard normal distribution.
Slide 143
Autocorrelation: Tests (cont.)
Tests for higher order autocorrelation
Ljung-Box Q (χ² statistic), also called the Portmanteau test:
Q' = T(T + 2) Σ_{j=1}^{L} rj² / (T − j)
Breusch-Godfrey
Slide 144
Autocorrelation: Remedies
Generalized least squares: later!
First difference method:
Take 1st differences of your X's and Y.
Regress ΔY on ΔX.
Assumes that ρ = 1!
This changes your model from one that explains rates to one that explains changes.
Generalized differences: requires that ρ be known.
Slide 145
Autocorrelation: Remedies
Cochrane-Orcutt method
(1) Estimate the model using OLS and obtain the residuals, ût.
(2) Using the residuals from the OLS, run the following regression:
ût = ρ̂ ût−1 + vt
Slide 146
Autocorrelation: Cochrane-Orcutt method (cont.)
(3) Using the ρ̂ obtained, perform the regression on the generalized differences:
Yt − ρ̂Yt−1 = B1(1 − ρ̂) + B2(Xt − ρ̂Xt−1) + (ut − ρ̂ut−1)
(4) Substitute the values of B1 and B2 into the original regression to obtain new estimates of the residuals.
(5) Return to step 2 and repeat until ρ̂ no longer changes.
"No longer changes" means (approximately) changes of less than 3 significant digits, or, for instance, at the 3rd decimal place.
Slide 147
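The iteration above can be sketched for the one-regressor case (the `cochrane_orcutt` helper, its convergence tolerance, and the iteration cap are illustrative choices, not the deck's code; assumes NumPy):

```python
import numpy as np

def cochrane_orcutt(y, x, tol=1e-3, max_iter=50):
    """Cochrane-Orcutt sketch: alternate between estimating rho from
    the residuals and re-fitting OLS on generalized differences."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)

    def ols(yy, xx):
        Z = np.column_stack([np.ones(len(xx)), xx])
        b, *_ = np.linalg.lstsq(Z, yy, rcond=None)
        return b

    b = ols(y, x)                 # step 1: OLS starting values
    rho = 0.0
    for _ in range(max_iter):
        u = y - (b[0] + b[1] * x)
        # step 2: regress u_t on u_{t-1} (no intercept) to get rho-hat
        new_rho = (u[1:] @ u[:-1]) / (u[:-1] @ u[:-1])
        # step 3: OLS on the generalized differences
        ys = y[1:] - new_rho * y[:-1]
        xs = x[1:] - new_rho * x[:-1]
        bs = ols(ys, xs)
        b = np.array([bs[0] / (1.0 - new_rho), bs[1]])  # undo B1(1 - rho)
        if abs(new_rho - rho) < tol:    # step 5: stop when rho settles
            rho = new_rho
            break
        rho = new_rho
    return b, rho
```

The transformed intercept is divided by (1 − ρ̂) to recover B1 on the original scale.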
Autocorrelation with lagged dependent variables
The presence of a lagged dependent variable causes special estimation problems.
Essentially you must purge the lagged error term of its autocorrelation by using a two-stage IV solution.
Be careful with lagged dependent variable models: the lagged dependent variable may simply scoop up all the variance to be explained.
A variety of models use lagged dependent variables:
Adaptive expectations
Partial adjustment
Rational expectations
Slide 148
Model Specification: Definition
The analyst should understand one fundamental "truth" about statistical models: they are all misspecified.
We exist in a world of incomplete information at best; hence model misspecification is an ever-present danger.
We do, however, need to come to terms with the problems associated with misspecification so we can develop a feeling for the quality of information, description, and prediction produced by our models.
Slide 149
Criteria for a "Good Model": Hendry & Richard criteria
Be data admissible: predictions must be logically possible.
Be consistent with theory.
Have weakly exogenous regressors (errors and X's uncorrelated).
Exhibit parameter stability: the relationship cannot vary over time, unless modeled in that way.
Exhibit data coherency: random residuals.
Be encompassing: contain or explain the results of other models.
Slide 150
Model Specification: Definition (cont.)
There are basically 4 types of misspecification we need to examine:functional form
inclusion of an irrelevant variable
exclusion of a relevant variable
measurement error and misspecified error term
Slide 151
Model Specification: Implications
If an omitted variable is correlated with the included variables, the estimates are biased as well as inconsistent.
In addition, the error variance is incorrect, and usually overestimated.
If the omitted variable is uncorrelated with the included variables, the error variance is still biased, even though the B's are not.
Slide 152
Model Specification: Implications
Incorrect functional form can result in autocorrelation or heteroskedasticity.
See the notes for these problems for the implications of each.
Slide 153
Model Specification: Causes
This one is easy: theoretical design.
Something is omitted, irrelevantly included, mismeasured, or non-linear.
This problem is explicitly theoretical.
Slide 154
Data Mining
There are techniques out there that look for variables to add.
These are often atheoretical, but can they work?
Note that data mining may alter the 'true' level of significance.
With c candidates for variables in the model, and k actually chosen with an α = .05, the true level of significance is:
α* = 1 − (1 − α)^(c/k)
Note the similarity to the Bonferroni correction.
Slide 155
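Plugging hypothetical numbers into the formula (20 candidate variables, 5 retained, nominal α = .05; these counts are illustrative, not from the deck) shows how far the true level drifts:

```python
# True significance level after searching c candidate regressors
# and retaining k, at a nominal alpha of .05
alpha = 0.05
c, k = 20, 5                                   # hypothetical counts
true_alpha = 1.0 - (1.0 - alpha) ** (c / k)    # 1 - 0.95**4 = 0.18549375
```

So a nominal 5% test behaves more like an 18.5% test once the search over candidates is accounted for.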
Model Specification: Tests
Actual Specification Tests
No test can reveal poor theoretical construction per se.
The best indicator that your model is misspecified is the discovery that the model has some undesirable statistical property; e.g. a misspecified functional form will often be indicated by a significant test for autocorrelation.
Sometimes time-series models will have negative autocorrelation as a result of poor design.
Slide 156
Ramsey RESET Test
The Ramsey RESET test is a "Regression Specification Error Test."
You add powers of the predicted values of Y (e.g. Ŷ², Ŷ³) to the regression model.
If they have a significant coefficient, then the errors are related to the predicted values, indicating that there is a specification error.
This is based on demonstrating that there is some non-random behavior left in the residuals.
Slide 157
Model Specification: Tests
Specification criteria for lagged designs
Most useful for comparing time series models with the same set of variables but differing numbers of parameters.
Slide 158
Model Specification: Tests (cont)
Schwarz Criterion
ln SC = (m/n) ln n + ln σ̃²
where σ̃² equals RSS/n, m is the number of lags (variables), and n is the number of observations.
Note that this is designed for time series.
Slide 159
Model Specification: Tests (cont)
AIC (Akaike Information Criterion)
ln AIC = (2k/n) + ln(RSS/n)
Both of these criteria (AIC and Schwarz) are to be minimized for improved model specification. Note that they both have a lower bound which is a function of sample size and number of parameters.
Slide 160
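Both criteria are simple functions of RSS, n, and the parameter count. Minimal sketches of the two formulas above (the `ln_aic` and `ln_sc` helper names are hypothetical, assuming NumPy):

```python
import numpy as np

def ln_aic(rss, n, k):
    """ln AIC = 2k/n + ln(RSS/n); smaller is better."""
    return 2.0 * k / n + np.log(rss / n)

def ln_sc(rss, n, m):
    """ln SC = (m/n) ln n + ln(RSS/n); smaller is better."""
    return (m / n) * np.log(n) + np.log(rss / n)
```

Adding a parameter raises the penalty term, so an extra variable must reduce RSS enough to pay for itself.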
Model Specification: Remedies
Model building
A. "Theory trimming" (Pedhazur: 616)
B. Hendry and the LSE school of "top-down" modeling.
C. Nested models
D. Stepwise regression
Stepwise regression is a process of including the variables in the model "one step at a time." This is a highly controversial technique.
Slide 161
Model Specification: Remedies (cont.)
Stepwise regression
Twelve things someone else says are wrong with stepwise:
Philosophical problems
1. Completely atheoretical
2. Subject to spurious correlation
3. Information tossed out: insignificant variables may be useful
4. Computer replacing the scientist
5. Utterly mechanistic
Slide 162
Model Specification: Remedies (cont.)
Stepwise regression
Statistical problems
6. Population model from sample data
7. Large N: statistical significance can be an artifact
8. Inflates the alpha level
9. The scientist becomes beholden to the significance tests
10. Overestimates the effect of the variables added early, and underestimates the variables added later
11. Prevents data exploration
12. Not even least squares for stagewise
Slide 163
Model Specification: Remedies (cont.)
Stepwise regression
Twelve responses:
1. Selection of the data for the procedure implies some minimal level of theorization.
2. All analysis is subject to spurious correlation. If you think it might be spurious, omit it.
3. True, but this can happen anytime.
4. All the better.
5. If it "works," is this bad? We use statistical decision rules in a mechanistic manner anyway.
Slide 164
Model Specification: Remedies (cont.)
Stepwise regression (responses, cont.)
6. This is true of regular regression as well.
7. This is true of regular regression as well.
8. No.
9. No more than OLS.
10. Not true.
11. Also not true: this is a data exploration technique.
12. Huh? An antiquated view of stepwise; probably not accurate in the last 20 years.
Slide 165
Measurement Error
Not much to say.
If the measurement error is random, estimates are unbiased, but results are weaker.
If measurement is biased, results are biased.
Occasionally non-random measurement error produces other statistical problems: e.g. heteroskedasticity.
Slide 166