Regression in Stata
Ista Zahn
Harvard MIT Data Center
January 24 2013
The Institute for Quantitative Social Science at Harvard University
Ista Zahn (Harvard MIT Data Center) Regression in Stata January 24 2013 1 / 35
Outline
1 Introduction
2 Univariate regression
3 Multiple regression
4 Interactions
5 Exporting and saving results
6 Wrap-up
Introduction
Organization
Please feel free to ask questions at any point if they are relevant to the current topic (or if you are lost!)
There will be a Q&A after class for more specific, personalized questions
Collaboration with your neighbors is encouraged
If you are using a laptop, you will need to adjust paths accordingly
Make comments in your Do-file rather than on hand-outs
Save on flash drive or email to yourself
Copy the workshop materials to your home directory
Log in to an Athena workstation using your Athena user name and password
Click on the “Ubuntu” button on the upper-left and type “term”
Click on the “Terminal” icon
In the terminal, type this line exactly as shown:
cd; wget http://tinyurl.com/stata-stats-zip; unzip stata-stats-zip
If you see “ERROR 404: Not Found”, then you mistyped the command; try again, making sure to type the command exactly as shown
Launch Stata on Athena
To start Stata type these commands in the terminal:
add stata
xstata
Open up today’s Stata script
In Stata, go to Window => New do file => Open
Locate and open the StatStatistics.do script in the StataStatistics folder in your home directory
I encourage you to add your own notes to this file!
Today’s Dataset
We have data on a variety of variables for all 50 states
Population, density, energy use, voting tendencies, graduation rates, income, etc.
We’re going to be predicting SAT scores
Univariate Regression: SAT scores and Education Expenditures
Does the amount of money spent on education affect the mean SAT score in a state?
Dependent variable: csat
Independent variable: expense
Opening Files in Stata
Look at the bottom left-hand corner of the Stata screen; this is the directory Stata is currently reading from
Files are located in the StataStatistics folder
Start by changing directory and loading the data
// change directory
cd "C:/Users/dataclass/Desktop/StataStatistics"
// use dir to see what is in the directory:
dir
// tell Stata to use the states data set
use states.dta
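Before going further it can help to confirm that the right file was loaded. The following is a minimal sketch, assuming states.dta is now in memory and contains the csat and expense variables named above; the choice of five rows is arbitrary.

```stata
// show the directory Stata is currently reading from
pwd
// brief overview of the data set in memory (observations, variables)
describe, short
// storage type and label for the two variables used in the first model
describe csat expense
// peek at the first five rows (5 is an arbitrary choice)
list csat expense in 1/5
```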
Steps for Running Regression
1 Examine descriptive statistics
2 Look at relationship graphically and test correlation(s)
3 Run and interpret regression
4 Test regression assumptions
Univariate regression
Univariate Regression: Preliminaries
We want to predict csat scores from expense
First, let’s look at some descriptives
// generate summary statistics for csat and expense
sum csat expense
// look at codebook
codebook csat expense
Univariate Regression
Look at scatterplots, compute correlation matrix, and regress SAT scores on expenditures
// graph expense by csat
twoway scatter expense csat
// correlate csat and expense
pwcorr csat expense, star(.05)
// run the regression
regress csat expense
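To connect the plot to the regression output, one option is to put the outcome (csat) on the vertical axis, overlay the fitted line, and then read predictions off the stored coefficients. This is a sketch only, assuming states.dta is in memory; the value 5000 for expense is an illustrative number, not one taken from the workshop.

```stata
// scatterplot with csat on the y-axis and the fitted regression line overlaid
twoway (scatter csat expense) (lfit csat expense)

// refit the model so the coefficients are available in _b[]
regress csat expense

// the slope is the expected change in mean csat per one-unit change in expense
display "Intercept: " _b[_cons]
display "Slope for expense: " _b[expense]

// predicted csat at an illustrative expense value (5000 is hypothetical)
display "Predicted csat at expense = 5000: " _b[_cons] + _b[expense]*5000
```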
Linear Regression Assumptions
Assumption 1: Normal distribution (the errors of the regression equation are normally distributed)
Assumption 2: Homoscedasticity (the variance around the regression line is the same for all values of the predictor variable)
Assumption 3: Errors are independent
Assumption 4: Relationships are linear
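For the univariate model of csat on expense, the four assumptions can be written compactly as follows. The notation (β₀, β₁, ε, σ²) is introduced here for illustration and does not appear in the slides.

```latex
% requires amsmath for \text and \overset
\[
  \text{csat}_i = \beta_0 + \beta_1\,\text{expense}_i + \varepsilon_i,
  \qquad
  \varepsilon_i \overset{\text{iid}}{\sim} N(0, \sigma^2)
\]
```

The straight-line form of the mean corresponds to linearity, "iid" to independent errors, the normal distribution of the errors to normality, and the single variance σ² shared by all observations to homoscedasticity.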
Homoscedasticity
Testing Assumptions: Normality
A simple histogram of the residuals can be informative
// graph the residual values of csat
predict resid, residual
histogram resid, normal
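A histogram can be hard to judge with only about 50 observations, so a quantile plot and a formal test are sometimes used alongside it. The sketch below assumes states.dta is in memory; r_chk is a hypothetical variable name chosen so it does not clash with resid.

```stata
// refit the univariate model so predict has estimates to work from
regress csat expense

// store the residuals under a hypothetical name
predict double r_chk, residuals

// quantile-normal plot: points close to the diagonal suggest normal errors
qnorm r_chk

// Shapiro-Wilk test: a small p-value is evidence against normality
swilk r_chk
```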
Testing Assumptions: Homoscedasticity
rvfplot
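rvfplot draws residuals against fitted values for the most recent regression, so it has to be run right after regress. A possible extension, assuming states.dta is in memory, is to add a reference line, run the Breusch-Pagan test, and refit with robust standard errors if the constant-variance assumption looks doubtful.

```stata
// fit the model first; rvfplot and estat use the most recent estimates
regress csat expense

// residual-versus-fitted plot with a horizontal reference line at zero
rvfplot, yline(0)

// Breusch-Pagan / Cook-Weisberg test; a small p-value suggests heteroscedasticity
estat hettest

// one common response: same coefficients, heteroscedasticity-robust standard errors
regress csat expense, vce(robust)
```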
Multiple regression
Multiple Regression
Just keep adding predictors
Let’s try adding some predictors to the model of SAT scores:
Income (income)
% students taking SATs (percent)
% adults with HS diploma (high)
Multiple Regression Preliminaries
As before, start with descriptive statistics and correlations
// descriptive statistics
sum income percent high
// generate correlation matrix
pwcorr csat expense income percent high
// regress csat on expense, income, percent, and high
regress csat expense income percent high
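With several correlated predictors in the model, it is often worth checking how much they overlap and comparing their effects on a common scale. This is a sketch of two built-in options, assuming states.dta is in memory; neither command is part of the original workshop script.

```stata
// refit the multiple regression
regress csat expense income percent high

// variance inflation factors: large values indicate strongly collinear predictors
estat vif

// same model, reporting standardized (beta) coefficients instead of confidence intervals
regress csat expense income percent high, beta
```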
Exercise 1: Multiple Regression
Open the datafile, states.dta.
1 Select a few variables to use in a multiple regression of your own. Before running the regression, examine descriptive statistics for the variables and generate a few scatterplots.
2 Run your regression
3 Examine the plausibility of the assumptions of normality and homogeneity of variance (homoscedasticity)
Interactions
Interactions
What if we wanted to test an interaction between percent & high?
Option 1: generate product terms by hand
// generate product of percent and high
gen percenthigh = percent*high
regress csat expense income percent high percenthigh
Interactions
What if we wanted to test an interaction between percent & high?
Option 2: Let Stata do your dirty work
// use the # sign to represent interactions
// the c. prefix marks a variable as continuous
regress csat percent high c.percent#c.high
// same as:
regress csat c.percent##c.high
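An interaction coefficient is easier to interpret as a set of slopes. One way to do that, sketched here under the assumption that states.dta is in memory, is to ask margins for the slope of percent at a few values of high. The values 65, 75 and 85 are illustrative picks, not values from the slides.

```stata
// fit the model with both main effects and their interaction
regress csat c.percent##c.high

// slope of percent evaluated at three illustrative values of high
margins, dydx(percent) at(high=(65 75 85))

// plot those slopes with their confidence intervals
marginsplot
```

This works only with the factor-variable syntax; with a hand-made product term such as percenthigh, margins cannot tell that the term depends on percent and high.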
Categorical Predictors
For categorical variables, we first need to dummy code
Use region as example
Option 1: create dummy codes before fitting regression model
// create region dummy codes using tab
// (could also use gen / replace)
tab region, gen(region)
// regress csat on region
regress csat region1 region2 region3
Categorical Predictors
For categorical variables, we first need to dummy code
Use region as example
Option 2: Let Stata do it for you
// regress csat on region using fvvarlist syntax
// see help fvvarlist for details
regress csat i.region
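With i.region, each coefficient compares one region to the omitted base category, so two follow-up questions often come up: is region significant overall, and how can the base category be changed? A minimal sketch, assuming states.dta is in memory:

```stata
// fit the model; the lowest-numbered region is the base category by default
regress csat i.region

// joint test that all region coefficients are zero
testparm i.region

// refit using the highest-numbered region as the base category instead
regress csat ib(last).region
```

Changing the base category changes which comparisons the coefficients report, but not the overall fit of the model.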
Exercise 2: Regression, Categorical Predictors, & Interactions
Open the datafile, states.dta.
1 Add on to the regression equation that you created in exercise 1 by generating an interaction term and testing the interaction.
2 Try adding a categorical variable to your regression (remember, it will need to be dummy coded). You could use region or high25, or generate a new categorical variable from one of the continuous variables in the dataset.
Exporting and saving results
Saving and exporting regression tables
Usually when we’re running regression, we’ll be testing multiple models at a time
Can be difficult to compare results
Stata offers several user-friendly options for storing and viewing regression output from multiple models
First, download the necessary packages:
* install outreg2 package
findit outreg2
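findit opens a search window from which the package is installed by clicking through. As an alternative sketch, outreg2 is also distributed through the SSC archive, so it can be installed from the command line; this assumes the machine has internet access and that you have write permission to your ado directory.

```stata
* install outreg2 directly from the SSC archive
ssc install outreg2

* confirm that Stata can now find the command
which outreg2
```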
Saving and replaying
You can store regression model results in Stata
// fit two regression models and store the results
regress csat expense income percent high
estimates store Model1
regress csat expense income percent high i.region
estimates store Model2
Saving and replaying
Stored models can be recalled
// Display Model1
estimates replay Model1
Saving and replaying
Stored models can be compared
// Compare Model1 and Model2 coefficients
estimates table Model1 Model2
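By default estimates table shows only the coefficients. The sketch below adds significance stars and a few fit statistics; it assumes Model1 and Model2 were stored with estimates store as shown earlier. The display format %9.3f is an arbitrary choice.

```stata
// list the estimation results currently stored in memory
estimates dir

// coefficients to three decimals, with significance stars,
// plus sample size, R-squared and adjusted R-squared for each model
estimates table Model1 Model2, b(%9.3f) star stats(N r2 r2_a)
```

Reporting N for each model is a useful check that both models were fit to the same observations, which can differ when an added predictor has missing values.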
Exporting into Excel
Avoid human error when transferring coefficients into tables
Excel can be used to format publication-ready tables
outreg2 [Model1 Model2] using csatprediction.xls, replace
Wrap-up
Help Us Make This Workshop Better
Please take a moment to fill out a very short feedback form
These workshops exist for you: tell us what you need!
http://tinyurl.com/StataRegressionFeedback
Additional resources
Training and consulting
  IQSS workshops: http://projects.iq.harvard.edu/rtc/filter_by/workshops
  IQSS statistical consulting: http://rtc.iq.harvard.edu
Stata resources
  UCLA website: http://www.ats.ucla.edu/stat/Stata/
    Great for self-study
    Links to resources
  Stata website: http://www.stata.com/help.cgi?contents
  Email list: http://www.stata.com/statalist/