Microeconometria Day # 5
Transcript of Microeconometria Day # 5
L. Cembalo: Two-variable regression and OLS assumptions
Multiple regression model: classical assumptions of a regression model

Assumption 1: Linear regression model. The regression model is linear in the parameters.
Assumption 2: X values are fixed in repeated sampling (e varies from sample to sample while X does not). Values taken by the regressor X are considered fixed in repeated samples. More technically, X is assumed to be nonstochastic.
Assumption 3: Zero mean value of the disturbance ei. Given the value of X, the mean, or expected, value of the random disturbance term ei is zero. Technically, the conditional mean value of ei is zero. Symbolically, we have E(ei | Xi) = 0 (ei has no systematic effect on Y).
Assumption 4: Homoscedasticity or equal variance of ei. Given the value of X, the variance of ei is the same for all observations. That is, the conditional variances of ei are identical. Symbolically, we have

var(ei | Xi) = σ²

where var stands for variance.
Assumption 5: No autocorrelation between the disturbances. Given any two X values, Xi and Xj (i ≠ j), the correlation between any two ei and ej (i ≠ j) is zero. Symbolically,

cov(ei, ej | Xi, Xj) = 0
The matrix of regressors has full column rank k (no column is an exact linear combination of the others).
Assumption 10: No perfect multicollinearity. That is, there are no perfect linear relationships among the explanatory variables
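To see why perfect multicollinearity breaks OLS, here is a minimal numpy sketch (the data are made up for illustration): when one regressor is an exact linear function of another, the matrix X'X is singular and the normal equations have no unique solution.

```python
import numpy as np

# Hypothetical regressors: x2 is exactly 2 * x1, a perfect linear relation
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X = np.column_stack([np.ones(5), x1, 2.0 * x1])  # [intercept, x1, x2]

# X has k = 3 columns but rank 2, so X'X is singular and the normal
# equations (X'X) b = X'y cannot be solved uniquely
print(np.linalg.matrix_rank(X))   # 2, not 3: perfect multicollinearity
print(np.linalg.det(X.T @ X))     # ~0: X'X is singular
```

Dropping the redundant column (or respecifying the model) restores full rank.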
Anatomy of econometric modeling
Multiple regression model: Ordinary Least Squares
It is possible to show, through the GAUSS-MARKOV theorem, that the OLS estimators are BLUE (Best Linear Unbiased Estimators). This means that the OLS estimators are unbiased, linear in the observations, consistent, and, among all linear unbiased estimators, the ones with the minimum variance.
1. It is linear: a linear function of a random variable, such as the dependent variable Y in the regression model.
2. It is unbiased: its average or expected value is equal to the true value of beta.
3. It is efficient: it has the minimum variance in the class of all such linear unbiased estimators.
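The unbiasedness property can be illustrated numerically. The sketch below (with a made-up data-generating process, not from the slides) solves the OLS normal equations for many samples drawn with X held fixed, and shows that the estimates average out to the true coefficients.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical DGP: y = 1 + 2x + e, with X fixed in repeated sampling
n = 200
x = rng.uniform(0, 10, n)
X = np.column_stack([np.ones(n), x])
beta_true = np.array([1.0, 2.0])

# Redraw only the disturbances e across 500 samples and re-estimate
estimates = []
for _ in range(500):
    e = rng.normal(0, 1, n)
    y = X @ beta_true + e
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)  # normal equations
    estimates.append(beta_hat)

# The average estimate is close to beta_true: E[beta_hat] = beta
print(np.mean(estimates, axis=0))
```

Each individual estimate fluctuates around the truth; only the average across repeated samples pins it down, which is exactly what unbiasedness asserts.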
Thus far we have been concerned with the problem of estimating regression coefficients, their standard errors, and some of their properties. We now consider the goodness of fit of the fitted regression line to a set of data; that is, we shall find out how "well" the sample regression line fits the data. It is clear in the figure that if all the observations were to lie on the regression line, we would obtain a "perfect" fit, but this is rarely the case. Generally, there will be some positive errors and some negative ones. What we hope for is that the residuals around the regression line are as small as possible. The coefficient of determination r² (two-variable case) or R² (multiple regression) is a summary measure that tells how well the sample regression line fits the data.
Multiple regression model: Goodness of fit
r² measures the proportion of the total variation in Y explained by the regression model.
Total sum of squares (TSS) = Explained sum of squares (ESS) + Residual sum of squares (RSS):

TSS = ESS + RSS
r² = ESS / TSS = 1 − RSS / TSS
It is non-negative, ranging from 0 to 1, where 1 means a perfect fit and 0 no fit at all (no relation between X and Y).
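The sum-of-squares decomposition and the two equivalent formulas for r² can be checked directly. A sketch with simulated data (names and the data-generating process are illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical two-variable sample
n = 100
x = rng.normal(0, 1, n)
y = 3.0 + 1.5 * x + rng.normal(0, 1, n)

# OLS fit via the normal equations
X = np.column_stack([np.ones(n), x])
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ beta_hat

TSS = np.sum((y - y.mean()) ** 2)      # total sum of squares
ESS = np.sum((y_hat - y.mean()) ** 2)  # explained sum of squares
RSS = np.sum((y - y_hat) ** 2)         # residual sum of squares

# TSS = ESS + RSS, so the two r-square formulas agree
r2 = ESS / TSS
print(r2, 1 - RSS / TSS)
```

With a larger error variance the same code produces a lower r², since more of the variation in Y is left unexplained.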
Hypothesis testing

Hypothesis testing means looking for a rule that, given a pre-fixed probability of committing an error, allows us to decide whether to accept or reject a hypothesis about a random variable or a population.
Let us assume we have a phenomenon described by a rv X with a PDF f(x; θ) whose form is known and whose parameter θ is unknown; that is, X is distributed with a probabilistic law known except for θ.

Suppose we have extracted from X the sample (X1, ..., Xn). We formulate the following hypotheses on θ, and therefore on the probabilistic framework of X:

H0: θ ∈ Θ0 vs H1: θ ∈ Θ1

with Θ0 and Θ1 such that Θ0 ∩ Θ1 = ∅ and Θ0 ∪ Θ1 = Θ.
Since θ is unknown, we will never know whether H0 or H1 is true. The only thing we are able to say is that H0 is true with a certain level of probability. Θ is called the "parametric set" generated by θ, while the hypotheses H0 and H1 are respectively named the null hypothesis and the alternative hypothesis. The decision to accept or reject H0 is taken using the sample information contained in (X1, ..., Xn).
Once we fix the sample size n, we can obtain the sample rv (X1, ..., Xn), whose realizations describe a set C, called the "sample space".
Testing means looking in C for a region C1 such that if the sample falls in C1 we reject H0, while if it falls in the complementary set C0 = C − C1 we accept H0.
As we have seen, the decision is taken in the sample space, but it has consequences in the parametric space. Graphically:
[Diagram: correspondence between the parametric space and the sample space]
The possible consequences of rejecting or accepting H0 are (given that we do not know in which region the sample falls):

E1 = we reject H0 when H0 is true
E2 = we accept H0 when H0 is false
G1 = we accept H0 when H0 is true
G2 = we reject H0 when H0 is false
We are fully satisfied with G1 and G2 but we are unhappy with the other two. More formally:
E1 is named the "Type I error"; E2 is named the "Type II error".
E1 is deemed the most relevant error for its practical consequences. Since E1, E2, G1, and G2 are "events" (they are functions of the sample rv (X1, ..., Xn)), they admit a probability; more precisely:

P(E1) = α and P(E2) = β

and obviously:

P(G1) = 1 − α and P(G2) = 1 − β

The probability α of the Type I error is called the "level of significance" of the test. The probability P(G2) = 1 − β is named the "power" of the test. The region C1 is called the "critical region", while C0 is called the "acceptance region".
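A short stdlib sketch makes the significance/power vocabulary concrete for a one-sided z-test (all numbers are hypothetical, chosen only so the arithmetic is easy to follow): we build the critical region from α, then compute the power against a specific alternative.

```python
from statistics import NormalDist

# Hypothetical setup: X ~ N(mu, sigma^2) with sigma known,
# H0: mu = mu0 vs H1: mu = mu1 > mu0, sample of size n
mu0, mu1, sigma, n = 15.0, 16.0, 4.0, 64
alpha = 0.05

Z = NormalDist()          # standard normal
se = sigma / n ** 0.5     # standard error of the sample mean

# Critical region C1: reject H0 when the sample mean exceeds c
c = mu0 + Z.inv_cdf(1 - alpha) * se

# P(Type I error) = P(reject H0 | H0 true) = alpha by construction
type_I = 1 - Z.cdf((c - mu0) / se)
# Power = P(G2) = P(reject H0 | H1 true)
power = 1 - Z.cdf((c - mu1) / se)

print(round(type_I, 3), round(power, 3))
```

Increasing n shrinks the standard error, which raises the power at the same α: the trade-off between the two error probabilities is governed by the sample size.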
Example. Consider a population where X represents the income of some worker category, with mean income μ. Suppose those workers declared an income of 15 thousand euro. The Italian Minister is not convinced by this figure and sets up the test:

H0: μ = 15.000 euro vs H1: μ ≠ 15.000 euro
The parametric space is Θ = Θ0 ∪ Θ1, with Θ0 = {15.000} and Θ1 = Θ − {15.000}.
A sample of n elements is extracted from X.
As the sample varies, the observed value xn changes; that is, the realization of the sample rv that describes our sample space changes.
Possible outcomes:

E1 = the Minister sues the workers (rejects H0 when H0 is true): the worst scenario
E2 = the Minister accepts what the workers declared (accepts H0 when H0 is false)
G1 = accept H0 when H0 is true
G2 = reject H0 when H0 is false
• A null hypothesis that is commonly tested in empirical work is H0: β2 = 0, that is, the slope coefficient is zero. This "zero" null hypothesis is a kind of straw man, the objective being to find out whether Y is related at all to X, the explanatory variable. If there is no relationship between Y and X to begin with, then testing a hypothesis such as β2 = 0.3 or any other value is meaningless. This null hypothesis can be easily tested by the confidence interval or the t-test approach discussed in the preceding sections. But very often such formal testing can be shortcut by adopting the "2-t" rule of significance, which may be stated as:

"2-t" rule of significance: if the number of degrees of freedom is 20 or more and if α, the level of significance, is set at 0.05, then the null hypothesis H0: β2 = 0 can be rejected if the t value |t| = |β̂2 / se(β̂2)| exceeds 2 in absolute value.
• The rationale for this rule is not too difficult to grasp. We know that we will reject H0: β2 = 0 if

t = β̂2 / se(β̂2) > t_{α/2} when β̂2 > 0

or

t = β̂2 / se(β̂2) < −t_{α/2} when β̂2 < 0

or, rearranging,

|t| = |β̂2 / se(β̂2)| > t_{α/2}
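The rule above amounts to computing the slope's t-statistic and comparing it with 2. A sketch with simulated data (the data-generating process is made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical two-variable regression; test H0: beta2 = 0
n = 30
x = rng.normal(0, 1, n)
y = 0.5 + 1.0 * x + rng.normal(0, 1, n)  # true slope is 1, so H0 is false

X = np.column_stack([np.ones(n), x])
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat

sigma2_hat = resid @ resid / (n - 2)            # unbiased error variance
var_beta = sigma2_hat * np.linalg.inv(X.T @ X)  # var-cov matrix of beta_hat
t_stat = beta_hat[1] / np.sqrt(var_beta[1, 1])  # t = beta2_hat / se(beta2_hat)

# "2-t" rule: df = n - 2 = 28 >= 20 and alpha = 0.05, so reject if |t| > 2
print(t_stat, abs(t_stat) > 2)
```

Since the true slope is 1 and the noise is modest, the t-statistic comes out well above 2 and the zero null is rejected, as the rule predicts.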
Example
F-test on the RSS

F = [(RSSr − RSSur) / k] / [RSSur / (n1 + n2 − 2k)]

which under H0 follows an F distribution with (k, n1 + n2 − 2k) degrees of freedom.
Stata for a numerical example.
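The slides point to Stata for the numerical example; a rough Python equivalent of this Chow-type F-statistic, using simulated data (all values hypothetical), is sketched below. The restricted RSS comes from pooling the two subsamples; the unrestricted RSS from fitting each subsample separately.

```python
import numpy as np

rng = np.random.default_rng(3)

# Two hypothetical subsamples of sizes n1 and n2; k parameters per fit
n1, n2, k = 40, 40, 2

def rss(x, y):
    """Residual sum of squares of an OLS fit of y on [1, x]."""
    X = np.column_stack([np.ones(len(x)), x])
    b = np.linalg.solve(X.T @ X, X.T @ y)
    r = y - X @ b
    return r @ r

x1, x2 = rng.normal(0, 1, n1), rng.normal(0, 1, n2)
y1 = 1.0 + 2.0 * x1 + rng.normal(0, 1, n1)  # group 1
y2 = 1.0 + 2.0 * x2 + rng.normal(0, 1, n2)  # group 2: same coefficients

RSS_ur = rss(x1, y1) + rss(x2, y2)  # unrestricted: separate regressions
RSS_r = rss(np.concatenate([x1, x2]),
            np.concatenate([y1, y2]))  # restricted: pooled regression

# F ~ F(k, n1 + n2 - 2k) under H0 of equal coefficients across groups
F = ((RSS_r - RSS_ur) / k) / (RSS_ur / (n1 + n2 - 2 * k))
print(F)
```

Here the two groups share the same coefficients, so F should be small; regenerating group 2 with a different slope would inflate it and lead to rejection.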