Instrumental Variables: 2-Stage and 3-Stage Least Squares Regression of a Linear Systems of...
- Slide 1
  Instrumental Variables: 2-Stage and 3-Stage Least Squares Regression of a Linear System of Equations
  2009 LPGA Performance Statistics and Prize Winnings
  www.lpga.com
  S.J. Callan and J.M. Thomas (2007). Modeling the Determinants of a Professional Golfer's Tournament Earnings, Journal of Sports Economics, Vol. 8, No. 4, pp. 394-411
- Slide 2
  Data Description
  Prize winnings and performance statistics for n = 146 professional women (LPGA) golfers for the 2009 season
  Exogenous Performance Variables:
  - Average Driving Distance
  - Percentage of Fairways reached on Drive
  - Percentage of Greens Reached in Regulation
  - Percentage of Sand Saves (in the hole in 2 shots from close traps)
  - Average Putts per hole on greens reached in regulation
  - Numbers of Events, Events Completed, Rounds
  Endogenous Result (Dependent & Independent) Variables:
  - Average Score per Round
  - Average Rank (Percentile in Tournaments)
  - Log(Prize Winnings)
- Slide 3
  Variables in Systems of Equations
  - Endogenous Variables: Jointly dependent (response) variables that are system determined.
    They can also appear as predictor variables in other equations.
  - Exogenous Variables: Independent variables that do not depend on the endogenous variables.
  - Predetermined Variables: Exogenous and lagged endogenous variables.
  - Instrumental Variables: Predetermined variables used to predict endogenous variables in first-stage regressions, with the predicted values used in place of the endogenous predictors in the system of equations.
- Slide 4
  System of Equations (Callan and Thomas, 2007)
  1. Average Score (per 18 holes) is related to the golfer's skills and experience (number of rounds played).
  2. Average Rank (transformed to percentile) in tournaments is related to average score and the number of events she competed in.
  3. Season Earnings is related to average rank and the number of tournaments she completed.
- Slide 5
  Potential Problems with Endogenous Predictors
  When endogenous variables are included as predictors, they can be correlated with the error term for that equation, particularly when omitted variables are related to the outcome. This causes Ordinary Least Squares estimates to be biased and inconsistent.
  - In equation 2, SCORE may be correlated with the error term when the model lacks a variable measuring average course difficulty (Callan and Thomas, p. 402).
  - In equation 3, Rank may be correlated with the error term when the model lacks a variable measuring the golfer's human capital investment, such as diet and concentration level (Callan and Thomas, p. 402).
- Slide 6
  Model Building Process
  1. Regress all endogenous variables (Score, Rank, and ln(Prize)) on all exogenous variables.
  2. Obtain the predicted values for each endogenous variable from the regressions in step 1.
  3. In the system of equations, replace any right-hand-side endogenous predictors with their fitted values from step 2.
  4. Note that software (e.g. SAS and STATA) will fit all the regressions in step 1, even for an endogenous variable that does not appear as a predictor (ln(Prize) in this example).
  5. This method provides correct coefficient estimates, but not a correct ANOVA table or correct standard errors.
- Slide 7
  First Stage Regressions for Score and Rank
  The fitted (predicted) values for SCORE will be used in equation 2 in place of SCORE, and the fitted values for RANK will be used in equation 3 in place of RANK. Equation 1 has no right-hand-side endogenous variables.
- Slide 8
  Equation 1: SCORE is related to SKILLS and experience
  All variables except average driving distance are significant. All else equal:
  - Average SCORE decreases as Percent Fairways Hit increases (a 10-percentage-point increase in fairways hit corresponds to a 0.19 decrease in SCORE)
  - Average SCORE decreases by 1.36 with a 10-percentage-point increase in Greens in Regulation
  - Average SCORE decreases by 0.16 with a 10-percentage-point increase in Sand Saves
  - Average SCORE increases by 1.32 with a 0.1 increase in putts per Green in Regulation hole
  - Average SCORE decreases by 0.08 for a 10-round increase in Rounds played
- Slide 9
  Equation 2: Rank is related to SCORE and Events
  Rank (as a percentile, with 100 meaning the golfer won every tournament she played in) is:
  - Negatively associated with predicted SCORE (decreases by 12.5 with a unit increase in average SCORE)
  - Positively associated with number of Events (increases by 0.28 with a unit increase in the number of EVENTS played)
  Note: The estimated coefficients are correct, but the standard errors, t-tests, and Analysis of Variance are incorrect (see slide 11)
- Slide 10
  Equation 3: ln(Prize) is related to Rank and Completed Events
  Prize Winnings (in log form):
  - Increase with (Predicted) Rank. A 10-point increase in Rank (percentile) increases ln(Prize) by 0.56.
  - Increase with Completed Events. For each tournament completed, ln(Prize) increases by 0.080.
  Note: The estimated coefficients are correct, but the standard errors, t-tests, and Analysis of Variance are incorrect (see slide 11)
- Slide 11
  Matrix Approach: Models w/ Endogenous Predictors
- Slide 12
  Model 2: Rank = f(Score, Events)
- Slide 13
  Model 3: ln(Prize) = f(Rank, Completed)
- Slide 14
  Robust Estimate of Variance of 2SLS Estimator
  Exact same method for equation 3
- Slide 15
  Results for Model 2: Rank = f(Score, Events)
- Slide 16
  Results for Model 3: ln(Prize) = f(Rank, Completed)
- Slide 17
  3-Stage Least Squares
  - Extension of 2-Stage Least Squares that allows for a covariance structure among the equations in the system
  - Errors from 2SLS are obtained and used to estimate the within-individual (golfer) variance-covariance structure among the equations
  - The response vector is stacked, with the n responses from model 1 stacked over the n responses from model 2, which are stacked over the n responses from model 3
  - The X matrices are arranged block-diagonally, with 0 matrices off the block diagonal
- Slide 18
  Model Description - I
- Slide 19
  Model Description - II
- Slide 20
  Estimation Results: EQ1, EQ2, EQ3
- Slide 21
  SAS Program
    /* Read the 2009 LPGA data and compute log prize winnings */
    data lpga2009;
      infile 'lpga2009.dat';
      input golfer drive fairway green putts sandsv prize lnprize events
            girputts complete aveposrank rounds strokes;
      lnprize1=log(prize);
    run;

    /* 2SLS; with no DATA= option, the most recently created data set is used */
    proc syslin 2sls out=regout;
      instruments drive fairway green girputts sandsv rounds events complete;
      strokes: model strokes = drive fairway green girputts sandsv rounds;
      output residual=e1;
      rank: model aveposrank = strokes events;
      output residual=e2;
      prize: model lnprize1 = aveposrank complete;
      output residual=e3;
    run;

    /* 3SLS on the same system; XPX prints the crossproducts matrices */
    proc syslin 3sls data=lpga2009 itprint out=regout3;
      instruments drive fairway green girputts sandsv rounds events complete;
      strokes: model strokes = drive fairway green girputts sandsv rounds / xpx;
      output residual=e1;
      rank: model aveposrank = strokes events / xpx;
      output residual=e2;
      prize: model lnprize1 = aveposrank complete / xpx;
      output residual=e3;
    run;
- Slide 22
  STATA Program
    * Read the data and compute log prize winnings
    insheet using lpga_2009_meq.csv
    generate lnprize=ln(prize)

    * Two-stage least squares for the three-equation system
    reg3 (avestrokes=drive fairway green sandsvpct girputtshole rounds) ///
        (averagepospct=avestrokes events) (lnprize=averagepospct completed), ///
        2sls

    * Three-stage least squares for the same system
    reg3 (avestrokes=drive fairway green sandsvpct girputtshole rounds) ///
        (averagepospct=avestrokes events) (lnprize=averagepospct completed), ///
        3sls
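
The warning on Slides 6, 9, and 10 (a hand-run second-stage regression gives correct coefficients but incorrect standard errors) can be illustrated with a minimal Python sketch. It is not taken from the slides. The data are simulated rather than the LPGA file, the names `two_stage_least_squares`, `z1`, `z2`, and `u` are hypothetical, and the variance used is the standard textbook 2SLS form (residuals computed from the observed regressors), which may differ from the robust estimator named on Slide 14.

```python
import numpy as np


def two_stage_least_squares(y, X, Z):
    """Fit a single equation by 2SLS (illustrative sketch).

    y : (n,) response
    X : (n, k) regressors as they appear in the equation
        (intercept, exogenous and endogenous predictors)
    Z : (n, m) instruments (intercept plus all exogenous variables), m >= k
    Returns (beta, se, se_naive).
    """
    n, k = X.shape
    # Stage 1: regress every column of X on Z and keep the fitted values.
    first_stage_coefs, *_ = np.linalg.lstsq(Z, X, rcond=None)
    X_hat = Z @ first_stage_coefs
    # Stage 2: OLS of y on the fitted regressors.
    XhXh_inv = np.linalg.inv(X_hat.T @ X_hat)
    beta = XhXh_inv @ X_hat.T @ y
    # Residuals that a plain second-stage OLS printout is based on.
    resid_naive = y - X_hat @ beta
    # Residuals based on the observed regressors, used for the 2SLS variance.
    resid = y - X @ beta
    s2_naive = resid_naive @ resid_naive / (n - k)
    s2 = resid @ resid / (n - k)
    se_naive = np.sqrt(s2_naive * np.diag(XhXh_inv))
    se = np.sqrt(s2 * np.diag(XhXh_inv))
    return beta, se, se_naive


# Simulated data (assumption: NOT the LPGA data; n matches the slides only for scale).
rng = np.random.default_rng(0)
n = 146
z1, z2, u = rng.normal(size=(3, n))           # two instruments and an omitted variable
x = 1.0 + z1 + z2 + u + rng.normal(size=n)    # endogenous predictor
y = 2.0 - 1.5 * x + 2.0 * u + rng.normal(size=n)  # u is omitted, so the error is correlated with x

ones = np.ones(n)
X = np.column_stack([ones, x])
Z = np.column_stack([ones, z1, z2])

beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
beta_2sls, se, se_naive = two_stage_least_squares(y, X, Z)

print("OLS coefficients: ", beta_ols)
print("2SLS coefficients:", beta_2sls)
print("2SLS standard errors (observed-X residuals):", se)
print("Second-stage OLS standard errors (naive):   ", se_naive)
```

Both sets of standard errors share the same (X̂'X̂)⁻¹ matrix and differ only in the residual variance, which is why the coefficients from a manual second stage can be trusted while its ANOVA table and t-tests cannot.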