Topic II: Functional Form

Rómulo A. Chumacero, Universidad de Chile

Motivation

• What? Extend the OLS framework

• Why? Crucial in practice

• How? Using what we have learned

Outline

1. Scaling

2. Dummy variables / Time trends

3. Possible nonlinearities

4. Diagnostic tests

5. Measurement errors in variables

6. Omitting relevant variables

7. Including irrelevant variables

8. Multicollinearity

9. Influential analysis

10. Model selection

11. Specification searches


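Effects of Scaling

• Data are not always conveniently scaled

• Changing the scale of $x$ (by a constant $c \neq 0$):

$$y_i = \beta_1 + \beta_2 x_i + u_i = \beta_1 + (c\beta_2)\left(\frac{x_i}{c}\right) + u_i = \beta_1 + \beta_2^* x_i^* + u_i, \qquad \beta_2^* = c\beta_2,\; x_i^* = x_i/c$$

— Changes the magnitude of the coefficient by the factor $c$
— Changes the standard error of the coefficient by the same factor
— $t$-statistics are not affected
— Everything else remains unchanged

• Changing the scale of $y$:

$$y_i = \beta_1 + \beta_2 x_i + u_i \;\Longrightarrow\; \frac{y_i}{c} = \frac{\beta_1}{c} + \frac{\beta_2}{c}x_i + \frac{u_i}{c}$$

$$y_i^* = \beta_1^* + \beta_2^* x_i + u_i^*$$

— Changes the magnitude of ALL coefficients by the factor $1/c$
— Changes the standard errors of the coefficients by the same factor
— $t$-statistics are not affected
— Residuals are scaled, and the SER changes, by the same factor
— Everything else remains unchanged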
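A quick numerical check of these scaling rules, as a minimal sketch on simulated data: the helper `ols` and the unit change `c = 1000` are hypothetical choices, and only NumPy is assumed. Dividing $x$ by $c$ multiplies the slope and its standard error by $c$; dividing $y$ by $c$ divides every coefficient, standard error and the SER by $c$; $t$-statistics never move.

```python
import numpy as np

def ols(X, y):
    """Return OLS coefficients, conventional standard errors, t-statistics and SER."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    resid = y - X @ b
    s2 = resid @ resid / (n - k)              # unbiased estimate of sigma^2
    se = np.sqrt(s2 * np.diag(XtX_inv))
    return b, se, b / se, np.sqrt(s2)

rng = np.random.default_rng(0)
n, c = 200, 1000.0                            # c: hypothetical change of units
x = rng.normal(10.0, 2.0, n)
y = 1.0 + 0.5 * x + rng.normal(0.0, 1.0, n)
X = np.column_stack([np.ones(n), x])
b, se, t, ser = ols(X, y)

# Scale the regressor, x* = x / c: the slope and its s.e. are multiplied by c
b_x, se_x, t_x, ser_x = ols(np.column_stack([np.ones(n), x / c]), y)

# Scale the dependent variable, y* = y / c: coefficients, s.e. and SER are divided by c
b_y, se_y, t_y, ser_y = ols(X, y / c)
```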


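Dummy Variables

• Example: $E(y \mid \text{gender})$. Equivalent ways to model this:

— Define a dummy variable

$$D_{1i} = \begin{cases} 1 & \text{female} \\ 0 & \text{male} \end{cases}$$

Thus, $y_i = \beta_0 + \beta_1 D_{1i} + u_i$, with $\beta_0 = E(y \mid \text{male})$ and $\beta_0 + \beta_1 = E(y \mid \text{female})$

— Alternatively, define the variable

$$D_{2i} = \begin{cases} 0 & \text{female} \\ 1 & \text{male} \end{cases}$$

Thus, $y_i = \beta_0 + \beta_1 D_{2i} + u_i$, with $\beta_0 = E(y \mid \text{female})$ and $\beta_0 + \beta_1 = E(y \mid \text{male})$

— Or $y_i = \beta_1 D_{1i} + \beta_2 D_{2i} + u_i$, with $\beta_1 = E(y \mid \text{female})$ and $\beta_2 = E(y \mid \text{male})$

• Standard mistake: include an intercept, $D_1$ and $D_2$. The regressors are perfectly collinear ($D_{1i} + D_{2i} = 1$)

• If the equation of interest is $E(y \mid \text{gender}, \text{education})$: $y_i = \beta_0 + \beta_1 D_{1i} + \beta_2\, educ_i + u_i$

• This allows an intercept effect for gender, but the return to education is the same for both groups

• A regression model allowing for slope differences (interactions) is

$$y_i = \beta_0 + \beta_1 D_{1i} + \beta_2\, educ_i + \beta_3 D_{1i}\, educ_i + u_i$$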


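Dummy Variables

• It is interesting to see how our estimators algebraically handle dummy variables:

$$y = D_1\beta_1 + D_2\beta_2 + u$$

• By construction $D_1'D_2 = 0$. Thus, with $n_j = \sum_{i=1}^n D_{ji}$,

$$\hat\beta_1 = (D_1'D_1)^{-1}D_1'y = \frac{\sum_{i=1}^n D_{1i}y_i}{\sum_{i=1}^n D_{1i}} = \frac{1}{n_1}\sum_{i=1}^n D_{1i}y_i = \bar y_1$$

$$\hat V(\hat\beta) = \hat\sigma^2\begin{bmatrix} (D_1'D_1)^{-1} & 0 \\ 0 & (D_2'D_2)^{-1} \end{bmatrix} = \begin{bmatrix} \hat\sigma^2/n_1 & 0 \\ 0 & \hat\sigma^2/n_2 \end{bmatrix}, \qquad \hat\sigma^2 = \frac{1}{n}\sum_{i=1}^n \hat u_i^2$$

• Another candidate for the variance-covariance estimate is

$$\tilde V(\hat\beta) = \begin{bmatrix} \hat\sigma_1^2(D_1'D_1)^{-1} & 0 \\ 0 & \hat\sigma_2^2(D_2'D_2)^{-1} \end{bmatrix} = \begin{bmatrix} \hat\sigma_1^2/n_1 & 0 \\ 0 & \hat\sigma_2^2/n_2 \end{bmatrix}, \qquad \hat\sigma_j^2 = \frac{1}{n_j}\sum_{i=1}^n D_{ji}\hat u_i^2 \text{ for } j = 1, 2$$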
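The two dummy-variable slides can be checked together in a short sketch. It assumes simulated data (a hypothetical female/male indicator and group-specific error variances) and NumPy only: the intercept plus both dummies is rank deficient, while the regression on $D_1, D_2$ alone returns the group means, and the pooled and group-specific variance estimators differ when the error variance differs by group.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
D1 = rng.integers(0, 2, n).astype(float)          # D1 = 1 if female, 0 if male
D2 = 1.0 - D1                                     # D2 = 1 if male
# hypothetical group means 10 and 12, error s.d. 1 (female) and 3 (male)
y = np.where(D1 == 1, 10.0, 12.0) + rng.normal(0.0, np.where(D1 == 1, 1.0, 3.0))

# Dummy-variable trap: [1, D1, D2] has rank 2 because D1 + D2 = 1
rank_trap = np.linalg.matrix_rank(np.column_stack([np.ones(n), D1, D2]))

# Without an intercept, OLS on [D1, D2] returns the two group means
X = np.column_stack([D1, D2])
b = np.linalg.solve(X.T @ X, X.T @ y)
u = y - X @ b
n1, n2 = D1.sum(), D2.sum()

# Pooled estimator: sigma^2 * diag(1/n1, 1/n2), with sigma^2 = (1/n) sum u_i^2
s2 = u @ u / n
V_pooled = np.diag([s2 / n1, s2 / n2])

# Group-specific estimator: diag(s1^2/n1, s2^2/n2), with s_j^2 = (1/n_j) sum D_ji u_i^2
s2_1 = (D1 * u ** 2).sum() / n1
s2_2 = (D2 * u ** 2).sum() / n2
V_groups = np.diag([s2_1 / n1, s2_2 / n2])
```

The pooled $\hat\sigma^2$ is the $n_j$-weighted average of $\hat\sigma_1^2$ and $\hat\sigma_2^2$, so the second estimator only matters when the groups have different error variances, as in this simulated example.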


Time Trend

• Many economic variables exhibit trends

• Consider a series growing at a constant rate:

$$Y_t = Y_0(1+g)^t$$

where $g$ is the rate of growth per period

• Taking logs (ln) of both sides:

$$\ln Y_t = \ln Y_0 + t\ln(1+g)$$

• Adding a shock and changing notation:

$$y_t = \beta_1 + \beta_2 t + u_t$$

where $y_t = \ln Y_t$, $\beta_1 = \ln Y_0$ and $\beta_2 = \ln(1+g) \simeq g$

• Easily checked by taking the first difference and ignoring the disturbance:

$$\Delta y_t = \beta_2$$

• Important: the choice of unit for $t$ is irrelevant as long as it is used consistently
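A minimal sketch of the log-linear trend regression, assuming a simulated series with a hypothetical 3% growth rate per period and a small multiplicative shock (NumPy only). It recovers $g$ as $e^{\hat\beta_2} - 1$ and shows that measuring $t$ in different units (here $t/4$) rescales $\hat\beta_2$ but leaves the fitted values unchanged.

```python
import numpy as np

rng = np.random.default_rng(2)
T, g, Y0 = 80, 0.03, 100.0                        # hypothetical growth rate and initial level
t = np.arange(T, dtype=float)
Y = Y0 * (1 + g) ** t * np.exp(rng.normal(0.0, 0.02, T))   # multiplicative shock

# ln Y_t = beta_1 + beta_2 t + u_t
X = np.column_stack([np.ones(T), t])
b1, b2 = np.linalg.lstsq(X, np.log(Y), rcond=None)[0]
g_hat = np.exp(b2) - 1                            # exact inversion of beta_2 = ln(1 + g)
Y0_hat = np.exp(b1)                               # beta_1 = ln Y_0

# A different unit for t (t / 4): the slope is multiplied by 4, fitted values are identical
X4 = np.column_stack([np.ones(T), t / 4])
b4 = np.linalg.lstsq(X4, np.log(Y), rcond=None)[0]
```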


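Seasonality

• Economic time series: $Y_t = T_t + C_t + S_t + \varepsilon_t$ (trend, cycle, seasonal and irregular components). With quarterly dummies $Q_{jt}$ (equal to 1 in quarter $j$; the fourth quarter is the base) and a trend:

$$y_t = \beta_0 + \beta_1 Q_{1t} + \beta_2 Q_{2t} + \beta_3 Q_{3t} + \beta_4 t + u_t$$

• $E(y_t \mid \text{first quarter}) = \beta_0 + \beta_1 + \beta_4 t$; $E(y_t \mid \text{fourth quarter}) = \beta_0 + \beta_4 t$

• Seasonally adjusted series:

$$y_t^* = y_t - \left(\hat\beta_1 Q_{1t} + \hat\beta_2 Q_{2t} + \hat\beta_3 Q_{3t}\right) \tag{1}$$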

[Figure 1: Use of Quarterly Dummies. GDP and Seasonally Adjusted GDP, 1987 to 1997.]

• Detrended series
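Equation (1) in code, as a sketch on a simulated quarterly series (the seasonal pattern, trend and noise level are hypothetical; NumPy only). Only three quarterly dummies enter alongside the intercept, to avoid the dummy-variable trap; subtracting the fitted seasonal terms leaves a series whose seasonal coefficients are exactly zero when the same regression is re-run.

```python
import numpy as np

rng = np.random.default_rng(3)
T = 44                                            # 11 years of quarterly data
t = np.arange(T, dtype=float)
quarter = np.arange(T) % 4                        # 0 = Q1, 1 = Q2, 2 = Q3, 3 = Q4
Q = np.column_stack([(quarter == j).astype(float) for j in range(3)])  # Q1..Q3; Q4 is the base

seasonal = np.array([0.10, -0.05, 0.02, 0.0])[quarter]   # hypothetical seasonal pattern
y = 4.5 + seasonal + 0.02 * t + rng.normal(0.0, 0.01, T)

X = np.column_stack([np.ones(T), Q, t])          # beta_0, beta_1..beta_3, beta_4
b = np.linalg.lstsq(X, y, rcond=None)[0]

y_sa = y - Q @ b[1:4]                             # seasonally adjusted series, equation (1)
```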


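Nonlinearity in Regressors

• We are interested in $E(y \mid x) = g(x)$, $x \in \mathbb{R}$, and the form of $g$ is unknown

• Common approach: polynomial approximation:

$$y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \cdots + \beta_p x_i^p + u_i$$

• Let $\beta = (\beta_0\ \beta_1\ \cdots\ \beta_p)'$ and $z_i = (1\ x_i\ x_i^2\ \cdots\ x_i^p)'$; then $y_i = z_i'\beta + u_i$, which is a linear regression model (LRM). Typically, $p$ is kept small

• If $x \in \mathbb{R}^2$, a simple quadratic approximation is

$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{1i}^2 + \beta_4 x_{2i}^2 + \beta_5 x_{1i}x_{2i} + u_i$$

• As dimensionality increases, such approximations become non-parsimonious

• Most applications use quadratic terms; some add cubics without interactions (or use neural nets, Fourier series, splines, wavelets, etc.):

$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{1i}^2 + \beta_4 x_{2i}^2 + \beta_5 x_{1i}x_{2i} + \beta_6 x_{1i}^3 + \beta_7 x_{2i}^3 + u_i$$

• Since these models are nonlinear in the regressors but linear in the parameters, they can be estimated by OLS, and inference is conventional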
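A minimal sketch of the polynomial approximation as an ordinary LRM, assuming simulated data in which the "unknown" $g$ is $\sin(x)$ and $p = 3$ (both hypothetical choices; NumPy only). `np.vander` with `increasing=True` builds the rows $z_i = (1, x_i, \ldots, x_i^p)'$, after which plain least squares applies.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 300, 3
x = rng.uniform(-2.0, 2.0, n)
y = np.sin(x) + rng.normal(0.0, 0.1, n)          # g(x) = sin(x) is unknown to the analyst

# z_i = (1, x_i, x_i^2, ..., x_i^p)': nonlinear in x, linear in beta
Z = np.vander(x, N=p + 1, increasing=True)
beta = np.linalg.lstsq(Z, y, rcond=None)[0]
fitted = Z @ beta
rmse = np.sqrt(np.mean((y - fitted) ** 2))
```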


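Nonlinearity in Regressors

• However, the model is nonlinear in the regressors, so interpretation must take this into account

• For example, in the cubic model the slope with respect to $x_1$ is

$$\frac{\partial E(y \mid x)}{\partial x_1} = \beta_1 + 2\beta_3 x_1 + \beta_5 x_2 + 3\beta_6 x_1^2$$

which is a function of $x_1$ and $x_2$, making reporting of “the slope” difficult

• It is important to report slopes for different values of the regressors, chosen to illustrate the point of interest

• In other applications, the average slope may be sufficient. Two obvious candidates:

— Derivative evaluated at the sample averages

$$\left.\frac{\partial E(y \mid x)}{\partial x_1}\right|_{x = \bar x} = \beta_1 + 2\beta_3\bar x_1 + \beta_5\bar x_2 + 3\beta_6\bar x_1^2$$

— Average derivative

$$\frac{1}{n}\sum_{i=1}^n \frac{\partial E(y \mid x_i)}{\partial x_1} = \beta_1 + 2\beta_3\bar x_1 + \beta_5\bar x_2 + 3\beta_6\frac{1}{n}\sum_{i=1}^n x_{1i}^2$$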
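Both candidate summaries can be computed from the fitted cubic model. This sketch uses simulated regressors and hypothetical true coefficients (NumPy only); the helper `slope_x1` is illustrative. Because the slope is linear in $x_1$ and $x_2$ except for the cubic term, the two summaries differ exactly by $3\hat\beta_6\left(\frac1n\sum x_{1i}^2 - \bar x_1^2\right)$.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 1000
x1 = rng.normal(1.0, 1.0, n)
x2 = rng.normal(0.0, 1.0, n)
# cubic model without cubic interactions, terms ordered as in the text (beta_0 ... beta_7)
X = np.column_stack([np.ones(n), x1, x2, x1**2, x2**2, x1 * x2, x1**3, x2**3])
beta_true = np.array([1.0, 0.5, -0.3, 0.2, 0.1, 0.4, -0.05, 0.02])   # hypothetical values
y = X @ beta_true + rng.normal(0.0, 1.0, n)
b = np.linalg.lstsq(X, y, rcond=None)[0]

def slope_x1(b, x1, x2):
    """dE(y|x)/dx1 = b1 + 2 b3 x1 + b5 x2 + 3 b6 x1^2 (works on scalars or arrays)."""
    return b[1] + 2 * b[3] * x1 + b[5] * x2 + 3 * b[6] * x1 ** 2

slope_at_means = slope_x1(b, x1.mean(), x2.mean())    # derivative at sample averages
average_slope = slope_x1(b, x1, x2).mean()            # average derivative
gap = 3 * b[6] * (np.mean(x1 ** 2) - x1.mean() ** 2)  # the only source of the difference
```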


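Transformations

• Even the simplest models usually involve nonlinearities

— Example: Cobb-Douglas production function

$$Y_i = A K_i^{\alpha} L_i^{\beta} e^{u_i} \tag{2}$$

• Taking logs of (2): $\ln Y_i = \ln A + \alpha\ln K_i + \beta\ln L_i + u_i$

• Or: $y_i = x_i'\gamma + u_i$, with $y_i = \ln Y_i$, $x_i = (1, \ln K_i, \ln L_i)'$ and $\gamma = (\ln A, \alpha, \beta)'$

• Instances in which no transformation can be used: CES

$$Y_i = A\left[\delta K_i^{\rho} + (1-\delta)L_i^{\rho}\right]^{1/\rho} + u_i$$

• OLS cannot be applied; nonlinear least squares (NLLS) is required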
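A sketch of the log transformation of (2), assuming simulated capital and labor inputs and hypothetical values $A = 2$, $\alpha = 0.3$, $\beta = 0.6$ (NumPy only). With a multiplicative error the model becomes linear in $(\ln A, \alpha, \beta)$ and OLS on the logs recovers the parameters.

```python
import numpy as np

rng = np.random.default_rng(6)
n, A, alpha, beta = 500, 2.0, 0.3, 0.6            # hypothetical technology parameters
K = rng.lognormal(3.0, 0.5, n)
L = rng.lognormal(4.0, 0.5, n)
Y = A * K ** alpha * L ** beta * np.exp(rng.normal(0.0, 0.1, n))   # multiplicative error

# ln Y = ln A + alpha ln K + beta ln L + u: linear in the parameters
X = np.column_stack([np.ones(n), np.log(K), np.log(L)])
lnA_hat, alpha_hat, beta_hat = np.linalg.lstsq(X, np.log(Y), rcond=None)[0]
```

The CES form with an additive error has no such transformation; it would need an iterative NLLS routine (for instance `scipy.optimize.curve_fit`), which is outside this sketch.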


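Some Useful Functions

• The choice of functional form affects the interpretation of results

• It calls for good analytic skills and experience

• Slope $= dy/dx$ and elasticity $= (dy/dx)(x/y)$ for common specifications:

— Linear, $y = \beta_1 + \beta_2 x$: slope $\beta_2$; elasticity $\beta_2 x/y$
— Quadratic, $y = \beta_1 + \beta_2 x^2$: slope $2\beta_2 x$; elasticity $2\beta_2 x^2/y$
— Cubic, $y = \beta_1 + \beta_2 x^3$: slope $3\beta_2 x^2$; elasticity $3\beta_2 x^3/y$
— Log-log, $\ln y = \beta_1 + \beta_2\ln x$: slope $\beta_2 y/x$; elasticity $\beta_2$
— Log-linear, $\ln y = \beta_1 + \beta_2 x$: slope $\beta_2 y$; elasticity $\beta_2 x$
— Linear-log, $y = \beta_1 + \beta_2\ln x$: slope $\beta_2/x$; elasticity $\beta_2/y$

• If ln is applied to a variable, that variable must be strictly positive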


$\ln(y)$ versus $y$ as Dependent Variable

• The econometrician can estimate $y = x'\hat\beta + \hat u$ or $\ln(y) = x'\hat\beta + \hat u$ (or both). Which is preferable?

• Plain truth: either is fine, in the sense that $E(y \mid x)$ and $E(\ln(y) \mid x)$ are both well defined (so long as $y > 0$)

• Selecting one specification over the other requires imposing additional structure (such as that the conditional expectation is linear in $x$, and $u \sim N(0, \sigma^2)$)

• Some good reasons for preferring the $\ln(y)$ regression over the $y$ regression:

— $E(\ln(y) \mid x)$ may be roughly linear in $x$, while $E(y \mid x)$ may be nonlinear, and linear models are easier to report and interpret

— $e = \ln(y) - E(\ln(y) \mid x)$ may be less heteroskedastic than the errors from the linear specification (although the reverse may be true!)

— As long as $y > 0$, the range of $\widehat{\ln(y)}$ is $\mathbb{R}$, so every fitted value is admissible; this is not the case for $\hat y$, which for some values of $x$ may produce $\hat y < 0$ (Tobit)

— If the distribution of $y$ is skewed, $E(y \mid x)$ may not be a useful measure of central tendency, and estimates will be influenced by extreme observations (“outliers”); $E(\ln(y) \mid x)$ may be a better measure of central tendency, and more interesting to estimate and report

• Be careful when the ln specification is used if interest lies in $E(y \mid x)$; by Jensen's inequality, $\exp[E(\ln(y) \mid x)] \neq E[\exp(\ln(y)) \mid x] = E(y \mid x)$
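To see the Jensen gap numerically, this sketch simulates a log-linear model with normal errors (hypothetical coefficients and $\sigma = 0.8$; NumPy only). The naive back-transform $\exp(x'\hat\beta)$ estimates the conditional median; under the added assumption that $u \sim N(0, \sigma^2)$, the log-normal formula $\exp(x'\hat\beta + \hat\sigma^2/2)$ recovers $E(y \mid x)$. The correction relies on normality and would not carry over to other error distributions.

```python
import numpy as np

rng = np.random.default_rng(7)
n, sigma = 20000, 0.8
x = rng.uniform(0.0, 1.0, n)
ln_y = 1.0 + 2.0 * x + rng.normal(0.0, sigma, n)   # hypothetical log-linear model

X = np.column_stack([np.ones(n), x])
b = np.linalg.lstsq(X, ln_y, rcond=None)[0]
resid = ln_y - X @ b
s2 = resid @ resid / (n - 2)

x0 = 0.5
naive = np.exp(b[0] + b[1] * x0)                    # exp[E(ln y | x)]: conditional median
lognormal = np.exp(b[0] + b[1] * x0 + s2 / 2)       # E(y | x) if u ~ N(0, sigma^2)
true_mean = np.exp(1.0 + 2.0 * x0 + sigma ** 2 / 2)
```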


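Testing for Omitted Nonlinearity

• Simple test: add nonlinear functions of $x$, and test their significance

— Let $z_i = h(x_i)$ denote nonlinear functions of $x_i$. Fit $y_i = x_i'\tilde\beta + z_i'\tilde\gamma + \tilde u_i$ by OLS, and test $H_0: \gamma = 0$

• Ramsey RESET test. The null model is $y_i = x_i'\beta + u_i$. Estimate it by OLS to obtain the fitted values $\hat y_i = x_i'\hat\beta$, and let

$$z_i = \begin{pmatrix} \hat y_i^2 \\ \vdots \\ \hat y_i^m \end{pmatrix}$$

Then fit

$$y_i = x_i'\tilde\beta + z_i'\tilde\gamma + \tilde u_i \tag{3}$$

by OLS, and form the Wald statistic $W_n \xrightarrow{d} \chi^2_{m-1}$ for $H_0: \gamma = 0$

— Typically $m = 2$, 3 or 4 seems to work best

— It works well as a test of functional form against smooth alternatives, and is powerful at detecting single-index models of the form

$$y_i = G(x_i'\beta) + u_i$$

where $G(\cdot)$ is a smooth “link” function. To see why this is the case, note that (3) may be written as

$$y_i = x_i'\tilde\beta + (x_i'\hat\beta)^2\tilde\gamma_1 + (x_i'\hat\beta)^3\tilde\gamma_2 + \cdots + (x_i'\hat\beta)^m\tilde\gamma_{m-1} + \tilde u_i$$

which has essentially approximated $G(\cdot)$ by an $m$-th order polynomial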
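A minimal RESET sketch with NumPy only. The helper `reset_test` is hypothetical; it uses the homoskedastic form of the Wald statistic, $W_n = (SSR_0 - SSR_1)/s^2_1$, and the simulated data (a correct linear model versus an omitted quadratic term) are illustrative. With $m = 3$ the reference distribution is $\chi^2_2$, whose 5% critical value is 5.991.

```python
import numpy as np

def reset_test(X, y, m=3):
    """Ramsey RESET: augment the null regression with yhat^2, ..., yhat^m and return the
    homoskedastic Wald statistic for H0: gamma = 0 (asymptotically chi2 with m - 1 d.f.)."""
    n, k = X.shape
    yhat = X @ np.linalg.lstsq(X, y, rcond=None)[0]
    ssr0 = np.sum((y - yhat) ** 2)                                        # null (restricted) SSR
    XZ = np.column_stack([X] + [yhat ** j for j in range(2, m + 1)])
    ssr1 = np.sum((y - XZ @ np.linalg.lstsq(XZ, y, rcond=None)[0]) ** 2)  # augmented SSR
    s2 = ssr1 / (n - k - (m - 1))
    return (ssr0 - ssr1) / s2

rng = np.random.default_rng(9)
n = 500
x = rng.uniform(0.0, 2.0, n)
X = np.column_stack([np.ones(n), x])
y_lin = 1.0 + x + rng.normal(0.0, 0.5, n)              # null model is correct
y_quad = 1.0 + x + x ** 2 + rng.normal(0.0, 0.5, n)    # omitted nonlinearity
W_lin, W_quad = reset_test(X, y_lin), reset_test(X, y_quad)
CRIT_5PCT = 5.991                                       # 95% quantile of chi2(2), i.e. m = 3
```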


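Are Errors Normally Distributed?

• Normality of $u$ is not crucial for the desired properties of the OLS estimator (including inference)

• However, it justifies the use of exact distributions in small samples, helps in forecasting, etc.

• Jarque-Bera test: the most popular test for normality

$$JB = \frac{n}{6}\left[S^2 + \frac{(K-3)^2}{4}\right] \xrightarrow{d} \chi^2_2 \quad \text{(under normality)}$$

— $S$ (skewness) is a measure of the asymmetry of the distribution around the mean

$$S = \frac{1}{n}\sum_{i=1}^n\left(\frac{y_i - \bar y}{\hat\sigma}\right)^3, \qquad \text{applied to OLS residuals: } S = \frac{1}{n}\sum_{i=1}^n\left(\frac{\hat u_i}{\hat\sigma}\right)^3$$

$S = 0$: symmetric; $S > 0$: long right tail; $S < 0$: long left tail

— $K$ (kurtosis) measures the peakedness or flatness of the distribution

$$K = \frac{1}{n}\sum_{i=1}^n\left(\frac{y_i - \bar y}{\hat\sigma}\right)^4, \qquad \text{applied to OLS residuals: } K = \frac{1}{n}\sum_{i=1}^n\left(\frac{\hat u_i}{\hat\sigma}\right)^4$$

$K = 3$: normal; $K > 3$: peaked (leptokurtic); $K < 3$: flat (platykurtic), relative to the normal

• Important: if $\ln y \sim N(\mu, \sigma^2)$, then $y$ is log-normal, with

$$E(y) = e^{\mu + 0.5\sigma^2}, \qquad \text{Median}(y) = e^{\mu}, \qquad V(y) = e^{2\mu + \sigma^2}\left(e^{\sigma^2} - 1\right)$$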
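The Jarque-Bera statistic follows directly from the $S$ and $K$ formulas. In this sketch `jarque_bera` is a hypothetical helper (NumPy only) that standardizes with the $1/n$ variance, as in the formulas; it is applied to a simulated normal sample and to an exponential sample, which is skewed to the right (population $S = 2$, $K = 9$).

```python
import numpy as np

def jarque_bera(u):
    """JB = n/6 [S^2 + (K - 3)^2 / 4]; asymptotically chi2(2) under normality."""
    u = np.asarray(u, dtype=float)
    n = u.size
    z = (u - u.mean()) / u.std()          # std() uses the 1/n divisor, as in the formulas
    S = np.mean(z ** 3)                   # skewness
    K = np.mean(z ** 4)                   # kurtosis (3 for the normal)
    return n / 6 * (S ** 2 + (K - 3) ** 2 / 4), S, K

rng = np.random.default_rng(8)
jb_norm, S_norm, K_norm = jarque_bera(rng.normal(size=2000))
jb_exp, S_exp, K_exp = jarque_bera(rng.exponential(size=2000))   # long right tail
```

For OLS residuals from a regression with an intercept the demeaning step is harmless, since the residuals already average zero.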


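Measurement Errors

$$y_i^* = \beta x_i^* + u_i$$

$y^*$ or $x^*$ is not observed: $y_i = y_i^* + v_i$, $x_i = x_i^* + w_i$; $v_i \sim (0, \sigma_v^2)$, $w_i \sim (0, \sigma_w^2)$

• Consider first that $y^*$ is observed with error. Then

$$y_i = \beta x_i^* + u_i + v_i = \beta x_i^* + u_i^*, \qquad u_i^* = u_i + v_i$$

The model satisfies the assumptions of the LRM, so $\hat\beta$ is unbiased and efficient (though not as efficient as when $y^*$ is observed)

• Now consider that $x^*$ is measured with error:

$$y_i^* = \beta(x_i - w_i) + u_i = \beta x_i + u_i^*$$

where $u_i^* = u_i - \beta w_i$. Since $x_i = x_i^* + w_i$, the regressor is correlated with the disturbance, given that

$$\text{Cov}(x_i, u_i^*) = \text{Cov}(x_i^* + w_i,\, u_i - \beta w_i) = -\beta\sigma_w^2$$

This violates the assumption of no correlation between the regressor and the error term: $\hat\beta$ is biased and inconsistent.

• The assumption that measurement errors are unsystematic is naive

• If they are systematic, $\hat\beta$ may be biased and inconsistent even in the first case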
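A simulation sketch of the two cases, assuming (in addition to the text) that $x^*$, $u$, $v$ and $w$ are mutually independent, with hypothetical variances; NumPy only. Error in $y^*$ leaves the slope centred on $\beta$. Error in $x^*$ attenuates it toward zero, to the standard probability limit $\beta\,\sigma_{x^*}^2/(\sigma_{x^*}^2 + \sigma_w^2)$, which follows from the covariance above under these independence assumptions.

```python
import numpy as np

rng = np.random.default_rng(11)
n, beta = 100_000, 2.0
sx2, sw2 = 1.0, 0.5                               # hypothetical Var(x*) and Var(w)
x_star = rng.normal(0.0, np.sqrt(sx2), n)
y_star = beta * x_star + rng.normal(0.0, 1.0, n)

def slope(x, y):
    """OLS slope of a regression through the origin, as in y = beta x + u."""
    return (x @ y) / (x @ x)

# Case 1: y* observed with error v; the slope stays centred on beta
b_yerr = slope(x_star, y_star + rng.normal(0.0, 1.0, n))

# Case 2: x* observed with error w; the slope is attenuated toward zero
b_xerr = slope(x_star + rng.normal(0.0, np.sqrt(sw2), n), y_star)
plim_xerr = beta * sx2 / (sx2 + sw2)              # 2 * 1 / 1.5
```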


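Omitted Variables

Correct model: $y = X_1\beta_1 + X_2\beta_2 + u$

Estimated model: $y = X_1\beta_1 + u$

$$\hat\beta_1 = (X_1'X_1)^{-1}X_1'y = \beta_1 + (X_1'X_1)^{-1}X_1'X_2\beta_2 + (X_1'X_1)^{-1}X_1'u$$

$$E(\hat\beta_1) = \beta_1 + \underbrace{(X_1'X_1)^{-1}X_1'X_2}_{P}\,\beta_2$$

• Each column of $P$ is the column of slopes of the regression of the corresponding column of $X_2$ on $X_1$. $\hat\beta_1$ will generally be biased

• It is unbiased if either $P = 0$, which states that $X_1$ and $X_2$ are orthogonal, or $\beta_2 = 0$

• The direction of the bias is difficult to assess in the general case; consider $x_1$ and $x_2$ scalars:

$$E(\hat\beta_1) = \beta_1 + \frac{\text{Cov}(x_1, x_2)}{V(x_1)}\beta_2$$

• If $\text{sgn}(\text{Cov}(x_1, x_2)\beta_2) > 0$, then $E(\hat\beta_1) > \beta_1$ and the estimator will overestimate the effect of $x_1$ on $y$

(Friedman: Permanent Income)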
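A Monte Carlo sketch of the scalar bias formula, assuming hypothetical values $\beta_1 = 1$, $\beta_2 = 0.5$ and an omitted $x_2$ with $\text{Cov}(x_1, x_2) = 0.8$, $V(x_1) = 1$ (NumPy only). Since $\text{Cov}(x_1, x_2)\beta_2 > 0$, the short regression should overestimate $\beta_1$, on average by $0.8 \times 0.5 = 0.4$.

```python
import numpy as np

rng = np.random.default_rng(12)
n, b1, b2, reps = 200, 1.0, 0.5, 2000
estimates = np.empty(reps)
for r in range(reps):
    x1 = rng.normal(0.0, 1.0, n)
    x2 = 0.8 * x1 + rng.normal(0.0, 0.6, n)      # Cov(x1, x2) = 0.8, V(x1) = 1
    y = b1 * x1 + b2 * x2 + rng.normal(0.0, 1.0, n)
    estimates[r] = (x1 @ y) / (x1 @ x1)           # short regression omits x2

predicted = b1 + 0.8 / 1.0 * b2                   # beta_1 + Cov(x1, x2) / V(x1) * beta_2
```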


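Omitted Variables

$$V(\hat\beta_1 \mid X_1) = \sigma^2(X_1'X_1)^{-1}$$

• If we had estimated the “correct” model, $E(\hat\beta_1^*) = \beta_1$ and $V(\hat\beta_1^* \mid X_1, X_2)$ would be the upper-left block of $\sigma^2(X'X)^{-1}$, with $X = [X_1\ X_2]$:

$$V(\hat\beta_1^* \mid X) = \sigma^2(X_1'M_2X_1)^{-1}, \qquad M_2 = I - X_2(X_2'X_2)^{-1}X_2'$$

• To compare both expressions, analyze their inverses:

$$\left[V(\hat\beta_1 \mid X_1)\right]^{-1} - \left[V(\hat\beta_1^* \mid X)\right]^{-1} = \sigma^{-2}\left[X_1'X_2(X_2'X_2)^{-1}X_2'X_1\right] \tag{4}$$

which is positive semidefinite!

• We may be inclined to conclude that although $\hat\beta_1$ is biased, it has a smaller variance than $\hat\beta_1^*$

• Nevertheless, $\sigma^2$ is not known and needs to be estimated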


Omitted Variables

• Proceeding as usual (thinking that the estimated model is correct), we would obtain

$$\tilde\sigma^2 = \frac{\hat u'\hat u}{n - k_1}$$

but $\hat u = M_1y = M_1(X_1\beta_1 + X_2\beta_2 + u) = M_1X_2\beta_2 + M_1u$. Then

$$E(\hat u'\hat u) = \beta_2'X_2'M_1X_2\beta_2 + \sigma^2\,\text{tr}(M_1) = \beta_2'X_2'M_1X_2\beta_2 + \sigma^2(n - k_1)$$

• First term: population counterpart to the increase in the SSR due to dropping $X_2$. As this term is positive, $\tilde\sigma^2$ will be biased upward (the true variance is smaller). Unfortunately, to take this bias into account we would need to know $\beta_2$.

• In conclusion

— If we omit a relevant variable, $\hat\beta_1$ and $\tilde\sigma^2$ are biased
— Even when $\hat\beta_1$ may be more precise than $\hat\beta_1^*$, $\sigma^2$ cannot be estimated consistently
— The only case in which $\hat\beta_1$ would be unbiased is if $X_1$ and $X_2$ were orthogonal


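Irrelevant Variables

Correct model: $y = X_1\beta_1 + u$

Estimated model: $y = X_1\beta_1 + X_2\beta_2 + u$

$$\hat\beta_1 = (X_1'M_2X_1)^{-1}X_1'M_2y = \beta_1 + (X_1'M_2X_1)^{-1}X_1'M_2u$$

$$E(\hat\beta) = E\begin{bmatrix}\hat\beta_1 \\ \hat\beta_2\end{bmatrix} = \begin{bmatrix}\beta_1 \\ 0\end{bmatrix}; \qquad E(\tilde\sigma^2) = E\left(\frac{\hat u'\hat u}{n - k_1 - k_2}\right) = \sigma^2$$

• What is the problem? “Overfit”!! Cost: reduction in precision. Recall $\hat\beta_1 = \beta_1 + (X_1'M_2X_1)^{-1}X_1'M_2u$, so

$$V(\hat\beta_1 \mid X) = \sigma^2(X_1'M_2X_1)^{-1}$$

• But the variance of $\hat\beta_1$ is larger than if the correct model were estimated:

$$V(\hat\beta_1^* \mid X_1) = \sigma^2(X_1'X_1)^{-1}$$

• $\hat\beta_1$ is asymptotically as efficient as $\hat\beta_1^*$ if $X_1$ and $X_2$ are orthogonal

• If $X_1$ and $X_2$ are highly correlated, including $X_2$ greatly inflates the variance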
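The cost of an irrelevant regressor can be computed exactly for a given design. This sketch (NumPy only, simulated $x_1$, hypothetical correlations $\rho$, $\sigma^2 = 1$, hypothetical helper `var_b1_long`) compares $\sigma^2(x_1'M_2x_1)^{-1}$ with $\sigma^2(x_1'x_1)^{-1}$. The ratio equals $1/(1 - R^2)$ from the (uncentered) regression of $x_1$ on $x_2$, so it is never below one and explodes as the correlation approaches one.

```python
import numpy as np

rng = np.random.default_rng(13)
n, sigma2 = 100, 1.0
x1 = rng.normal(0.0, 1.0, n)
v_short = sigma2 / (x1 @ x1)                      # correct model: sigma^2 (x1'x1)^{-1}

def var_b1_long(x1, x2, sigma2):
    """sigma^2 (x1' M2 x1)^{-1}: variance of b1 when the irrelevant x2 is also included."""
    M2 = np.eye(len(x2)) - np.outer(x2, x2) / (x2 @ x2)
    return sigma2 / (x1 @ M2 @ x1)

ratios = {}
for rho in (0.0, 0.5, 0.9, 0.99):                 # correlation between x1 and x2
    x2 = rho * x1 + np.sqrt(1.0 - rho ** 2) * rng.normal(0.0, 1.0, n)
    ratios[rho] = var_b1_long(x1, x2, sigma2) / v_short   # = 1 / (1 - R^2) >= 1
```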


Multicollinearity

• Arises when measured variables are too highly intercorrelated to allow for precise analysis of the individual effects of each one

• We will discuss:

— its nature

— ways to detect it

— its effects

— “remedies”

Perfect Collinearity

• If $\text{rank}(X'X) < k$, $\hat\beta$ is not defined. This happens iff the columns of $X$ are linearly dependent

• Most commonly, it arises when sets of regressors are identically related

• Example: let $X$ include $\ln(X_1)$, $\ln(X_2)$ and $\ln(X_1X_2)$, since $\ln(X_1X_2) = \ln(X_1) + \ln(X_2)$

• When this happens, the error is quickly discovered: software will be unable to construct $(X'X)^{-1}$

• Since the error is quickly discovered, this is rarely a problem in applied econometric practice

• Thus, the problem with perfect multicollinearity is not with the data, but with a bad specification


Near Multicollinearity

• In contrast to perfect collinearity, near multicollinearity is a statistical “problem”

• The problem is not identification but precision

• The higher the correlation between regressors, the less precise the estimates will be

• What is troubling about this definition of a “problem” is that our complaint is with the sample that was given to us!

• The usual “symptoms” of the “problem” are:

— Small changes in the data produce wide swings in the estimates

— $t$ statistics are not significant while $R^2$ is high (excuse?)

— Coefficients have the wrong sign or implausible magnitudes

• The problem arises when $X'X$ is “near singular” and the columns of $X$ are close to linear dependence (a loose definition)

• One implication of near singularity is that the numerical reliability of calculations is reduced (it is more likely that reported calculations will be in error due to floating-point calculation difficulties)


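Near Multicollinearity

• The problem is with $(X'X)^{-1}$; its $j$-th diagonal element is ($j = 1$ for convenience, with $x_1$ the first column of $X$ and $X_2$ the remaining columns):

$$(x_1'M_2x_1)^{-1} = \left(x_1'x_1 - x_1'X_2(X_2'X_2)^{-1}X_2'x_1\right)^{-1} = \left(x_1'x_1\left(1 - \frac{x_1'X_2(X_2'X_2)^{-1}X_2'x_1}{x_1'x_1}\right)\right)^{-1} = \left(x_1'x_1(1 - R_1^2)\right)^{-1} = \frac{1}{x_1'x_1(1 - R_1^2)}$$

$R_1^2$ is the (uncentered) $R^2$ of the regression of $x_1$ on the other regressors. Thus,

$$V(\hat\beta_1) = \frac{\sigma^2}{x_1'x_1(1 - R_1^2)}$$

If a set of regressors is highly correlated with $x_1$, then $R_1^2 \to 1$ and $V(\hat\beta_1) \to \infty$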
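The identity above can be verified numerically. In this sketch (NumPy only, simulated regressors with a hypothetical strong dependence of $x_1$ on $x_2$, $\sigma^2 = 1$), the $(1,1)$ element of $\sigma^2(X'X)^{-1}$ is compared with $\sigma^2/[x_1'x_1(1 - R_1^2)]$, where $R_1^2$ is the uncentered $R^2$ from regressing $x_1$ on the remaining columns (here an intercept and $x_2$).

```python
import numpy as np

rng = np.random.default_rng(14)
n, sigma2 = 200, 1.0
x2 = rng.normal(0.0, 1.0, n)
x1 = 0.95 * x2 + 0.3 * rng.normal(0.0, 1.0, n)     # x1 highly correlated with x2
X = np.column_stack([x1, np.ones(n), x2])          # x1 first; X2 = remaining columns

V_direct = sigma2 * np.linalg.inv(X.T @ X)[0, 0]   # (1,1) element of sigma^2 (X'X)^{-1}

# Uncentered R^2 of x1 on the other regressors
X2 = X[:, 1:]
fit = X2 @ np.linalg.lstsq(X2, x1, rcond=None)[0]
R2_1 = (fit @ fit) / (x1 @ x1)                     # equals 1 - e'e / x1'x1
V_formula = sigma2 / ((x1 @ x1) * (1 - R2_1))
```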


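Detection

• Rule of thumb: be concerned when the overall $R^2$ is smaller than any $R_j^2$

• Alternative measure (Belsley) based on the condition number ($\gamma$):

$$\gamma = \sqrt{\frac{\lambda_{\max}}{\lambda_{\min}}}$$

where the $\lambda$'s are the eigenvalues of $SX'XS$, with $S = \text{diag}\left(1/\sqrt{x_j'x_j}\right)$:

$$S = \begin{bmatrix} \frac{1}{\sqrt{x_1'x_1}} & 0 & \cdots & 0 \\ 0 & \frac{1}{\sqrt{x_2'x_2}} & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & \frac{1}{\sqrt{x_k'x_k}} \end{bmatrix}$$

• If the regressors are orthogonal ($R_j^2 = 0\ \forall j$), $\gamma = 1$. The higher the intercorrelation, the higher the condition number. With perfect collinearity, $\lambda_{\min} = 0$ and $\gamma \to \infty$

• Belsley suggests that $\gamma > 20$ indicates potential problems

• Approaches used to deal with the “problem”:

— Reduce the dimension of $X$ (drop variables). Obvious problem: omitted relevant variables, so $\hat\beta$ is biased
— Principal components, ridge regression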
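A sketch of Belsley's condition number with NumPy only; `condition_number` is a hypothetical helper and the designs are simulated. Each column is scaled to unit length before taking eigenvalues of $SX'XS$, so orthonormal columns give $\gamma = 1$ and two nearly identical columns give a value far above the threshold of 20.

```python
import numpy as np

def condition_number(X):
    """gamma = sqrt(lambda_max / lambda_min) of S X'X S, S = diag(1 / sqrt(x_j'x_j))."""
    S = np.diag(1.0 / np.sqrt(np.sum(X ** 2, axis=0)))
    lam = np.linalg.eigvalsh(S @ X.T @ X @ S)      # symmetric matrix: real eigenvalues
    return np.sqrt(lam.max() / lam.min())

rng = np.random.default_rng(15)
n = 100
Q, _ = np.linalg.qr(rng.normal(size=(n, 3)))       # orthonormal columns
gamma_orth = condition_number(Q)

z = rng.normal(size=n)
X_near = np.column_stack([np.ones(n), z, z + 0.01 * rng.normal(size=n)])  # near collinear
gamma_near = condition_number(X_near)
```

Since the eigenvalues of $SX'XS$ are the squared singular values of $XS$, the same number can be obtained as `np.linalg.cond` of the column-normalized matrix.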


Bottom Line

• There is no pair of words that is more misused than “multicollinearity problem”

• That explanatory variables are highly collinear is a fact of life

• It is clear that there are realizations of $X'X$ which would be much preferred to the actual data

• Complaining about the apparent malevolence of nature is not constructive

• Ad hoc cures for a “bad” sample can be disastrously inappropriate

• It is better to accept the fact that non-experimental data are sometimes not very informative about the parameters of interest


Bottom Line

• An example clarifies what we are really talking about

— Consider $y = x_1\beta_1 + x_2\beta_2 + u$

— A regression of $x_2$ on $x_1$ yields $x_2 = x_1\hat c + \hat v$, where $\hat v$ is (by construction) orthogonal to $x_1$

— Substitute this auxiliary relationship into the original one to obtain the model

$$y = x_1\beta_1 + \beta_2(x_1\hat c + \hat v) + u = (\beta_1 + \beta_2\hat c)x_1 + \beta_2\hat v + u = z_1\alpha_1 + z_2\alpha_2 + u$$

where $\alpha_1 = \beta_1 + \beta_2\hat c$, $\alpha_2 = \beta_2$, $z_1 = x_1$ and $z_2 = x_2 - x_1\hat c$

— A researcher who used $x_1$ and $x_2$ and the parameters $\beta_1$ and $\beta_2$ reports that $\beta_2$ is estimated inaccurately because of the collinearity problem

— A researcher who happened to stumble on the model with variables $z_1$ and $z_2$ and parameters $\alpha_1$ and $\alpha_2$ would report that there is no collinearity problem, because $z_1$ and $z_2$ are orthogonal (recall that $x_1$ and $\hat v$ are orthogonal by construction). This researcher would nonetheless report that $\alpha_2\ (= \beta_2)$ is estimated inaccurately, not because of collinearity, but because $z_2$ does not vary adequately
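The two researchers' results can be compared directly. This sketch (NumPy only, simulated and strongly collinear $x_1, x_2$, hypothetical helper `ols_se`) fits both parameterizations. Because $[x_1, x_2]$ and $[z_1, z_2]$ span the same space, the residuals coincide, $\hat\alpha_2 = \hat\beta_2$, and the two standard errors for that coefficient are identical: the imprecision is the same whether it is blamed on collinearity or on the small variation of $z_2$.

```python
import numpy as np

rng = np.random.default_rng(16)
n = 100
x1 = rng.normal(0.0, 1.0, n)
x2 = x1 + 0.1 * rng.normal(0.0, 1.0, n)            # highly collinear with x1
y = 1.0 * x1 + 0.5 * x2 + rng.normal(0.0, 1.0, n)

def ols_se(X, y):
    """OLS coefficients and conventional standard errors."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ b
    s2 = e @ e / (len(y) - X.shape[1])
    return b, np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))

c_hat = (x1 @ x2) / (x1 @ x1)                      # regression of x2 on x1
v_hat = x2 - c_hat * x1                            # z2: orthogonal to x1 by construction
b, se_b = ols_se(np.column_stack([x1, x2]), y)     # (beta_1, beta_2)
a, se_a = ols_se(np.column_stack([x1, v_hat]), y)  # (alpha_1, alpha_2)
```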


Bottom Line

• The example illustrates that collinearity as a cause of weak evidence is indistinguishable from inadequate variability as a cause of weak evidence

• In light of that fact, it is surprising that all econometrics texts have sections dealing with the “collinearity problem” but none has a section on the “inadequate variability problem”

• In summary

— Collinearity is bound to be present in applied econometric practice

— There is no simple solution to this “problem”

— Fortunately, multicollinearity does not lead to errors in inference

— The asymptotic distribution is still valid: estimates are asymptotically normal, and estimated standard errors are consistent

— Confidence intervals are not misleading. They are wide, correctly indicating the inherent uncertainty about the true parameter values


Influential Analysis

• OLS seeks to prevent a few large residuals at the expense of incurring many relatively small residuals

• A few observations can be extremely influential, in the sense that dropping them from the sample changes elements of $\hat\beta$ substantially

• A systematic way to find those influential observations: let $\hat\beta_{(i)}$ be the OLS estimate of $\beta$ that would be obtained if the $i$-th observation were omitted

• The key equation is

$$\hat\beta_{(i)} - \hat\beta = -\left(\frac{1}{1 - h_i}\right)(X'X)^{-1}x_i\hat u_i \tag{5}$$

where

$$h_i \equiv x_i'(X'X)^{-1}x_i$$

which is the $i$-th diagonal element of $P = X(X'X)^{-1}X'$. It is easy to show that

$$0 \le h_i \le 1 \quad \text{and} \quad \sum_{i=1}^n h_i = k \tag{6}$$

so $h_i$ equals $k/n$ on average.

• What should be done with influential observations? Keep or drop?
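Equations (5) and (6) in code, as a sketch on simulated data with one deliberately extreme regressor value (a hypothetical high-leverage point; NumPy only). The leverages $h_i$ are the diagonal of the hat matrix, computed without forming $P$, and every row of `delta` gives $\hat\beta_{(i)} - \hat\beta$ without re-estimating the model $n$ times.

```python
import numpy as np

rng = np.random.default_rng(17)
n, k = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
X[0, 1] = 8.0                                      # hypothetical high-leverage observation
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
u = y - X @ b
h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)        # h_i = x_i'(X'X)^{-1} x_i

# Equation (5): row i holds b_(i) - b = -(X'X)^{-1} x_i u_i / (1 - h_i)
delta = -(X @ XtX_inv) * (u / (1.0 - h))[:, None]
```

Observations with a large $h_i$ relative to $k/n$, or a large row in `delta`, are the natural candidates for closer inspection before deciding whether to keep or drop them.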


[Figure 2: Monetary Policy Rate, Growth, and … (caption truncated in source). Three scatter plots against the Policy Rate, two of them of Growth; the observation 1998:09 is labeled.]


Model Selection

• We discussed the costs and benefits of including or excluding variables

• How do we select a specification when theory does not provide complete guidance?

• This is the question of model selection

— The question “What is the right model for $y$?” is not well posed: it does not make clear the conditioning set

— The question “Which subset of $(x_1, \ldots, x_K)$ enters $E(y \mid x_1, \ldots, x_K)$?” is well posed

• In many cases, model selection reduces to comparing two nested models

$$y = X_1\beta_1 + X_2\beta_2 + u$$

where $X_1$ is $n \times k_1$ and $X_2$ is $n \times k_2$. Compare

$$\mathcal{M}_1: y = X_1\beta_1 + u$$

$$\mathcal{M}_2: y = X_1\beta_1 + X_2\beta_2 + u$$


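Model Selection

$$\mathcal{M}_1: y = X_1\beta_1 + u$$

$$\mathcal{M}_2: y = X_1\beta_1 + X_2\beta_2 + u$$

• Note that $\mathcal{M}_1 \subset \mathcal{M}_2$

• We say that $\mathcal{M}_2$ is true if $\beta_2 \neq 0$

• $\mathcal{M}_1$ and $\mathcal{M}_2$ are estimated by OLS, with residuals $\hat u_1$ and $\hat u_2$, estimated variances $\hat\sigma_1^2$ and $\hat\sigma_2^2$, etc., respectively

• A model selection procedure is a data-dependent rule which selects one of the models ($\hat{\mathcal{M}}$)

• Desirable property for a model selection procedure: consistency

$$\Pr\left[\hat{\mathcal{M}} = \mathcal{M}_1 \mid \mathcal{M}_1\right] \to 1$$

$$\Pr\left[\hat{\mathcal{M}} = \mathcal{M}_2 \mid \mathcal{M}_2\right] \to 1$$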


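Selection Based on Fit

• Natural measures of the fit of a regression are

— the sum of squared residuals, $\hat u'\hat u$
— $R^2 = 1 - (\hat u'\hat u)/(n\hat\sigma_y^2)$, where $\hat\sigma_y^2 = n^{-1}\sum_{i=1}^n (y_i - \bar y)^2$
— the Gaussian log-likelihood

$$\ell(\hat\beta, \hat\sigma^2) = -\frac{n}{2}\ln\hat\sigma^2 + c \qquad (c \text{ is a constant})$$

• It might be thought attractive to base model selection on one of these measures of fit

• Problem: these measures are monotonic between nested models, $\hat u_1'\hat u_1 \ge \hat u_2'\hat u_2$, $R_1^2 \le R_2^2$ and $\ell_1 \le \ell_2$, so $\mathcal{M}_2$ would always be selected, regardless of the actual data and probability structure

• Clearly an inappropriate decision rule!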


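Selection Based on Testing

• A common approach to model selection is to base the decision on a statistical test such as the Wald test

$$W_n = n\left(\frac{\hat\sigma_1^2 - \hat\sigma_2^2}{\hat\sigma_2^2}\right)$$

• The model selection rule is: for a critical level $\alpha$, let $c_\alpha$ satisfy $\Pr[\chi^2_{k_2} > c_\alpha] = \alpha$. Select $\mathcal{M}_1$ if $W_n \le c_\alpha$, else select $\mathcal{M}_2$

• A major problem with this approach is that the critical level $\alpha$ is indeterminate

• The reasoning which helps guide the choice of $\alpha$ in hypothesis testing (controlling Type I error) is not relevant for model selection. If $\alpha$ is set to be a small number, then $\Pr[\hat{\mathcal{M}} = \mathcal{M}_1 \mid \mathcal{M}_1] \approx 1 - \alpha$, but $\Pr[\hat{\mathcal{M}} = \mathcal{M}_2 \mid \mathcal{M}_2]$ could vary dramatically, depending on the sample size, etc.

• Another problem is that if $\alpha$ is held fixed, the model selection procedure is inconsistent, as

$$\Pr\left[\hat{\mathcal{M}} = \mathcal{M}_1 \mid \mathcal{M}_1\right] \to 1 - \alpha < 1$$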


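Selection Based on Adjusted R-squared

• As $R^2$ is not a useful model selection rule (it “prefers” the larger model), Theil proposed an adjusted coefficient of determination

$$\bar R^2 = 1 - \frac{\hat u'\hat u/(n - k)}{\hat\sigma_y^2} = 1 - \frac{\tilde\sigma^2}{\hat\sigma_y^2}$$

• At one time, it was popular to pick between models based on $\bar R^2$

• The rule is to select $\mathcal{M}_1$ if $\bar R_1^2 > \bar R_2^2$, else select $\mathcal{M}_2$

• Since $\bar R^2$ is monotonically decreasing in $\tilde\sigma^2$, the rule is the same as selecting the model with the smaller $\tilde\sigma^2$, or equivalently, the smaller $\ln(\tilde\sigma^2)$

• It is helpful to observe that

$$\ln(\tilde\sigma^2) = \ln\left(\hat\sigma^2\frac{n}{n - k}\right) = \ln(\hat\sigma^2) + \ln\left(1 + \frac{k}{n - k}\right) \simeq \ln(\hat\sigma^2) + \frac{k}{n - k} \simeq \ln(\hat\sigma^2) + \frac{k}{n}$$

(the first approximation is $\ln(1 + x) \simeq x$ for small $x$)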


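Selection Based on Adjusted R-squared

• Selecting based on $\bar R^2$ is the same as selecting based on $\ln(\hat\sigma^2) + k/n$, which is a particular choice of penalized likelihood criterion

• It turns out that model selection based on any criterion of the form

$$\ln(\hat\sigma^2) + c\,\frac{k}{n}, \qquad c > 0 \tag{7}$$

is inconsistent, as the rule tends to overfit. Indeed, since under $\mathcal{M}_1$

$$n\left(\ln\hat\sigma_1^2 - \ln\hat\sigma_2^2\right) \simeq W_n \xrightarrow{d} \chi^2_{k_2} \tag{8}$$

it follows that

$$\begin{aligned}
\Pr\left[\hat{\mathcal{M}} = \mathcal{M}_1 \mid \mathcal{M}_1\right] &= \Pr\left[\bar R_1^2 > \bar R_2^2 \mid \mathcal{M}_1\right] \\
&\simeq \Pr\left[n\ln(\tilde\sigma_1^2) < n\ln(\tilde\sigma_2^2) \mid \mathcal{M}_1\right] \\
&\simeq \Pr\left[n\ln(\hat\sigma_1^2) + k_1 < n\ln(\hat\sigma_2^2) + (k_1 + k_2) \mid \mathcal{M}_1\right] \\
&= \Pr\left[W_n < k_2 \mid \mathcal{M}_1\right] \\
&\to \Pr\left[\chi^2_{k_2} < k_2\right] < 1
\end{aligned}$$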


Selection Based on Information Criteria

Akaike Information Criterion

• Akaike proposed an information criterion which takes the form

$$AIC = -2\ell + 2k$$

which, with a Gaussian log-likelihood, can be approximated by (7) with $c = 2$:

$$AIC \simeq \ln(\hat\sigma^2) + \frac{2k}{n}$$

• It imposes a larger penalty on overparameterization than does $\bar R^2$

• Rule: select $\mathcal{M}_1$ if $AIC_1 < AIC_2$, else select $\mathcal{M}_2$

• Since $AIC$ takes the form (7), it is an inconsistent model selection criterion, and tends to overfit


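Selection Based on Information Criteria

Schwarz Criterion

• Modification of $AIC$: Schwarz (based on Bayesian arguments)

$$SIC = -2\ell + k\ln(n)$$

which, with a Gaussian log-likelihood, can be approximated by

$$SIC \simeq \ln(\hat\sigma^2) + \frac{k\ln(n)}{n}$$

• Since $\ln(n) > 2$ (if $n \ge 8$), $SIC$ places a larger penalty than $AIC$ on the number of estimated parameters (it is more parsimonious)

• $SIC$ is consistent. Indeed, since (8) holds under $\mathcal{M}_1$,

$$\frac{W_n}{\ln(n)} \xrightarrow{p} 0$$

so

$$\begin{aligned}
\Pr\left[\hat{\mathcal{M}} = \mathcal{M}_1 \mid \mathcal{M}_1\right] &= \Pr\left[SIC_1 < SIC_2 \mid \mathcal{M}_1\right] \\
&\simeq \Pr\left[W_n < k_2\ln(n) \mid \mathcal{M}_1\right] \\
&= \Pr\left[\frac{W_n}{\ln(n)} < k_2 \mid \mathcal{M}_1\right] \\
&\to \Pr(0 < k_2 \mid \mathcal{M}_1) = 1
\end{aligned}$$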


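Selection Based on Information Criteria

Schwarz Criterion

• Also under $\mathcal{M}_2$, one can show that

$$\frac{W_n}{\ln(n)} \xrightarrow{p} \infty$$

so

$$\begin{aligned}
\Pr\left[\hat{\mathcal{M}} = \mathcal{M}_2 \mid \mathcal{M}_2\right] &= \Pr\left[SIC_2 < SIC_1 \mid \mathcal{M}_2\right] \\
&= \Pr\left[\frac{W_n}{\ln(n)} > k_2 \mid \mathcal{M}_2\right] \to 1
\end{aligned}$$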


Selection Based on Information Criteria

Hannan-Quinn Criterion

• Another popular model selection criterion is

$$HQ = -2\ell + 2k\ln(\ln(n))$$

which, with a Gaussian log-likelihood, can be approximated by

$$HQ \simeq \ln(\hat\sigma^2) + \frac{2k\ln(\ln(n))}{n}$$

• Since $\ln(\ln(n)) > 1$ (if $n \ge 16$), $HQ$ places a larger penalty than $AIC$ on the number of estimated parameters and is more parsimonious

• As $2\ln(\ln(n)) < \ln(n)$ (for all $n > 1$), $SIC$ places a larger penalty than $HQ$ and selects more parsimonious models

• $HQ$ is consistent
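A small Monte Carlo sketch of the three criteria, using the per-observation approximations $\ln(\hat\sigma^2) + c\,k/n$ from the text. The helper `criteria`, the sample size $n = 500$ and the $k_2 = 3$ irrelevant regressors are hypothetical choices (NumPy only). Data are generated under $\mathcal{M}_1$, so every selection of $\mathcal{M}_2$ is an overfit; the larger the penalty, the rarer the overfit, with $AIC$ overfitting roughly $\Pr[\chi^2_3 > 6] \approx 11\%$ of the time and $SIC$ almost never.

```python
import numpy as np

def criteria(X, y):
    """Approximate AIC, SIC and HQ as ln(s2) + c k / n, with s2 = e'e / n."""
    n, k = X.shape
    e = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    ls2 = np.log(e @ e / n)
    return {"AIC": ls2 + 2 * k / n,
            "SIC": ls2 + k * np.log(n) / n,
            "HQ": ls2 + 2 * k * np.log(np.log(n)) / n}

rng = np.random.default_rng(18)
n, k2, reps = 500, 3, 1000
overfit = {"AIC": 0, "SIC": 0, "HQ": 0}
for _ in range(reps):
    X1 = np.column_stack([np.ones(n), rng.normal(size=n)])
    X2 = rng.normal(size=(n, k2))                  # irrelevant regressors: M1 is true
    y = X1 @ np.array([1.0, 0.5]) + rng.normal(size=n)
    c1, c2 = criteria(X1, y), criteria(np.column_stack([X1, X2]), y)
    for name in overfit:
        overfit[name] += int(c2[name] < c1[name])  # M2 selected although M1 is true
overfit_rate = {name: count / reps for name, count in overfit.items()}
```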


Selection Based on Information Criteria

A Final Word of Caution

• These results were obtained in the OLS context with Gaussian innovations

• To compare different models, the dependent variable and the sample size need to be the same

• Which model selection criterion is “best”? This is an open question and an active field of research

• While consistency is desirable, there may be cases in which more parsimonious models run the risk of excluding relevant variables, and that is why some researchers prefer $HQ$, which is consistent and not as parsimonious as $SIC$

• From a practical standpoint, it is important to look at all three criteria. Who knows, they may all choose the same model!


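Selection Among Multiple Regressors

• Selection among multiple regressors:

$$y = \beta_1x_1 + \beta_2x_2 + \cdots + \beta_Kx_K + u$$

Which regressors enter the regression?

— Ordered (nested) case:

$$\begin{aligned}
\mathcal{M}_1 &: \beta_1 \neq 0,\ \beta_2 = \beta_3 = \cdots = \beta_K = 0 \\
\mathcal{M}_2 &: \beta_1 \neq 0,\ \beta_2 \neq 0,\ \beta_3 = \cdots = \beta_K = 0 \\
&\ \ \vdots \\
\mathcal{M}_K &: \beta_1 \neq 0,\ \beta_2 \neq 0,\ \beta_3 \neq 0,\ \ldots,\ \beta_K \neq 0
\end{aligned}$$

which are nested. Select the model that minimizes the criterion

— Unordered case: $2^K$ models. For example, $2^{10} = 1{,}024$ and $2^{20} = 1{,}048{,}576$. Computationally demanding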


Specification Searches

• Theory is often vague about the relationship between variables

• As a result, many relations are established from empirical regularities

• If not accounted for, this practice can generate serious biases in inference

• Names: data mining, data snooping, data grubbing, data fishing

• Examples:

“Because of space limitations, only the best of a variety of alternative models are presented here.”

“The precise variables included in the regression were determined on the basis of extensive experimentation (on the same body of data).”

“Since there is no firmly validated theory, we avoided a priori specification of the functions we wished to fit.”

“We let the data specify the model.”

• Newsletter scam

• Conventional hypothesis testing is valid when a priori considerations, rather than exploratory data mining, determine the set of variables included

• When a miner uncovers $t$-statistics that appear significant at the 0.05 level by running a large number of alternative regressions on the same body of data, the probability of Type I error is much greater than the claimed 5%
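A simulation sketch of this size distortion (NumPy only; the design is hypothetical): $y$ is pure noise, a “miner” tries $K = 20$ independent candidate regressors one at a time and reports the best $|t|$, using 1.96 as an approximate 5% critical value. If the candidates are independent, the chance that at least one looks significant is roughly $1 - 0.95^{20} \approx 0.64$, far above the nominal 5%.

```python
import numpy as np

rng = np.random.default_rng(19)
n, K, reps = 100, 20, 1000
hits = 0
for _ in range(reps):
    y = rng.normal(size=n)                         # no regressor truly matters
    Z = rng.normal(size=(n, K))                    # K candidate regressors
    best_t = 0.0
    for j in range(K):
        X = np.column_stack([np.ones(n), Z[:, j]])
        XtX_inv = np.linalg.inv(X.T @ X)
        b = XtX_inv @ X.T @ y
        e = y - X @ b
        se = np.sqrt(e @ e / (n - 2) * XtX_inv[1, 1])
        best_t = max(best_t, abs(b[1] / se))
    hits += int(best_t > 1.96)                     # the "best" regression looks significant
actual_size = hits / reps                          # compare with the nominal 0.05
```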
