Transcript of extremum_estimators_computation

Extremum Estimators: Algorithms and Bootstrap (Estimadores Extremos: Algoritmos e Bootstrap)

    Cristine Campos de Xavier Pinto

    CEDEPLAR/UFMG

    May 2010


Direct computation of an extremum estimator is in general not possible. We need to use numerical methods to compute these estimators.

    In this lecture, we will review methods that are iterative algorithms searching for the maximum of a function of several arguments.

    When we deal with computation, we face problems such as multiple local maxima, discontinuities, numerical instability, and large dimensions.


    Grid Search

Consider the one-dimensional maximization problem

    $$\max_{\theta \in [a,b]} Q(\theta)$$

    The interval $[a, b]$ can be divided into a number of subintervals,

    $$\{[a, \theta_1], [\theta_1, \theta_2], \ldots, [\theta_N, b]\}$$

    We compute the function value at each boundary and infer that the maximum lies in one of the intervals with a boundary that attains the highest function value:

    $$[\theta_i, \theta_{i+1}] \ \text{such that} \ \max_j Q(\theta_j) = \max\{Q(\theta_i), Q(\theta_{i+1})\}$$


One then repeats the process in each of the chosen intervals, treating them as the original interval (iterations).

    The process will lead to smaller and smaller intervals that contain local maxima.

    Sometimes this method does not find the global maximum. We can mistakenly drop the interval that contains the global maximum if the grid is not fine enough.

    If we choose many short intervals at each iteration, we increase computation time.

    An exhaustive search is infeasible.
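
    A minimal sketch of this refine-the-grid idea in Python (the test function, grid size, and tolerance below are illustrative choices, not from the slides):

```python
import numpy as np

def grid_search_max(Q, a, b, n_points=11, tol=1e-8, max_iter=100):
    """Iteratively refine a one-dimensional grid to locate a maximizer of Q on [a, b]."""
    for _ in range(max_iter):
        grid = np.linspace(a, b, n_points)
        values = Q(grid)
        j = int(np.argmax(values))              # boundary with the highest function value
        a = grid[max(j - 1, 0)]                 # keep the subintervals adjacent to that boundary
        b = grid[min(j + 1, n_points - 1)]
        if b - a < tol:
            break
    return 0.5 * (a + b)

# Example: Q(theta) = -(theta - 1.3)**2 has its maximum at theta = 1.3.
theta_hat = grid_search_max(lambda t: -(t - 1.3) ** 2, a=0.0, b=5.0)
print(theta_hat)   # approximately 1.3
```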


Multidimensional settings: the grid search must cover every dimension. Calculations increase exponentially with the dimension of the parameter space.

    If we have $n$ intervals and $\theta \in \mathbb{R}^k$, each iteration requires on the order of $n^k$ calculations.

    Sometimes we have information about the function that can help in the search.


    Polynomial Approximation

We can exploit the differentiability of the maximand and approximate $Q(\theta)$ with a polynomial.

    The optimum of the polynomial approximation is an approximation of the optimum of $Q$.

    Let's use a quadratic approximation:

    $$Q(\theta) \approx a + b(\theta - \theta_0) + \tfrac{1}{2} c (\theta - \theta_0)^2$$

    where $a$, $b$ and $c$ are chosen to fit $Q(\theta)$ well in a neighborhood of the starting value $\theta_0$.

    Given values $a$, $b$ and $c$, the approximation to the location of the optimum of $Q$ is $\theta_0 - b/c$, provided $c < 0$.


    There are many ways to choose these parameters.

If $Q(\theta)$ is differentiable, a second-order Taylor series yields a quadratic approximation based on $Q$ and its first two derivatives,

    $$Q(\theta) \approx Q(\theta_0) + \nabla Q(\theta_0)(\theta - \theta_0) + \tfrac{1}{2} \nabla^2 Q(\theta_0)(\theta - \theta_0)^2$$

    Another way is to fit three points where $Q(\theta)$ has been computed,

    $$Q(\theta_0) = a + b\theta_0 + \tfrac{1}{2} c\theta_0^2$$
    $$Q(\theta_1) = a + b\theta_1 + \tfrac{1}{2} c\theta_1^2$$
    $$Q(\theta_2) = a + b\theta_2 + \tfrac{1}{2} c\theta_2^2$$
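
    The three-point fit amounts to solving a small linear system for $a$, $b$ and $c$. A sketch in Python (the test function and evaluation points are arbitrary illustrative choices); with this parameterization in $\theta$, the vertex of the fitted quadratic is $-b/c$:

```python
import numpy as np

def fit_quadratic(Q, theta_points):
    """Fit a + b*theta + 0.5*c*theta**2 through three points where Q has been computed."""
    t = np.asarray(theta_points, dtype=float)
    A = np.column_stack([np.ones(3), t, 0.5 * t ** 2])   # design matrix of the 3x3 system
    a, b, c = np.linalg.solve(A, Q(t))
    return a, b, c

# Example with Q(theta) = -(theta - 2)**2, whose maximum is at theta = 2.
a, b, c = fit_quadratic(lambda t: -(t - 2.0) ** 2, [0.0, 1.0, 3.0])
theta_star = -b / c if c < 0 else None   # vertex of the fitted quadratic
print(theta_star)   # 2.0
```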


    Line Searches

Idea: overcome high-dimensional maximization by using a grid search in one dimension (a line search) through a parameter space with several dimensions.

    Given a starting point $\theta_1$ and a search direction ("line") $\Delta$, each iteration attempts to solve the one-dimensional problem:

    $$\lambda^* = \arg\max_{\lambda} Q(\theta_1 + \lambda\Delta)$$

    $\lambda$ is the step length. The starting point of the next iteration is

    $$\theta_2 = \theta_1 + \lambda^*\Delta$$

    There are many possible choices of $\Delta$ and of the method for approximating $\lambda^*$.

    By convention, we restrict $\lambda \geq 0$.


The directional derivative of $Q$ is

    $$\frac{\partial Q(\theta_1 + \lambda\Delta)}{\partial \lambda} = \nabla Q(\theta_1 + \lambda\Delta)'\Delta$$

    and all line search methods require

    $$\left.\frac{\partial Q(\theta_1 + \lambda\Delta)}{\partial \lambda}\right|_{\lambda=0} = \nabla Q(\theta_1)'\Delta > 0$$

    so that $Q$ is increasing with respect to the step length in a neighborhood of the starting value $\theta_1$.

    A positive value of $\lambda$ that increases $Q$ will then always exist. We will see two types of line search: steepest ascent and quadratic methods.


    The Method of Steepest Ascent

In this case, $\Delta = \nabla Q(\theta_1)$.

    The elements of the gradient are the rates of change in the function for a small ceteris paribus change in each element of $\theta$.

    This search direction guarantees that the function value will improve if the entire vector is moved (at least locally) in that direction:

    $$\left.\frac{\partial Q(\theta_1 + \lambda\nabla Q(\theta_1))}{\partial \lambda}\right|_{\lambda=0} = \nabla Q(\theta_1)'\nabla Q(\theta_1) > 0$$

    unless $\theta_1$ is a critical value.


The gradient has an optimality property: among all the directions with the same length, setting $\Delta = \nabla Q(\theta_1)$ gives the fastest rate of increase of $Q(\theta_1 + \lambda\Delta)$ with respect to $\lambda$:

    $$\nabla Q(\theta_1) = \arg\max_{\{\Delta : \|\Delta\| = \|\nabla Q(\theta_1)\|\}} \left.\frac{\partial Q(\theta_1 + \lambda\Delta)}{\partial \lambda}\right|_{\lambda=0}$$

    This method implicitly approximates the maximand $Q(\theta)$ as a linear function in the neighborhood of $\theta_1$:

    $$Q(\theta) \approx Q(\theta_1) + \nabla Q(\theta_1)'(\theta - \theta_1)$$


This method gives no guidance for the step length $\lambda$.

    Maximization involves the curvature of a function.

    This method does not exploit curvature, which makes the algorithm slow for many practical problems.


Example: OLS. Let's apply this algorithm to solve the following problem:

    $$\max_{\beta} \, -\tfrac{1}{2}(Y - X\beta)'(Y - X\beta)$$

    where $\theta = \beta$ and $Q(\beta) = -\tfrac{1}{2}(Y - X\beta)'(Y - X\beta)$.


On the $i$-th iteration, let the starting point be $\beta_i$, so $\Delta_i = X'(y - X\beta_i)$, and each line search solves

    $$\lambda_i = \arg\max_{\lambda} \, -\tfrac{1}{2}\left[y - X(\beta_i + \lambda\Delta_i)\right]'\left[y - X(\beta_i + \lambda\Delta_i)\right]$$
    $$= \arg\max_{\lambda} \, \lambda\Delta_i'X'(y - X\beta_i) - \tfrac{1}{2}\lambda^2\Delta_i'X'X\Delta_i$$
    $$= \frac{\Delta_i'X'(y - X\beta_i)}{\Delta_i'X'X\Delta_i} = \frac{\Delta_i'\Delta_i}{\Delta_i'X'X\Delta_i}$$

    and the best step yields

    $$\beta_{i+1} = \beta_i + \lambda_i\Delta_i = \beta_i + \frac{(y - X\beta_i)'X X'(y - X\beta_i)}{(y - X\beta_i)'X X'X X'(y - X\beta_i)} \, X'(y - X\beta_i)$$
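
    A minimal Python sketch of this steepest-ascent recursion for OLS; the simulated design, true coefficients, and convergence tolerance are my own illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
N, k = 200, 3
X = rng.normal(size=(N, k))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=N)

beta = np.zeros(k)                           # starting value beta_1
for _ in range(10_000):
    delta = X.T @ (y - X @ beta)             # search direction = gradient of Q at beta_i
    if np.linalg.norm(delta) < 1e-8:
        break
    lam = (delta @ delta) / (delta @ X.T @ X @ delta)   # optimal step length from the line search
    beta = beta + lam * delta                # beta_{i+1} = beta_i + lambda_i * delta_i

print(beta)
print(np.linalg.solve(X.T @ X, X.T @ y))     # closed-form OLS for comparison
```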


    Quadratic Methods

Let's assume that $Q$ is exactly quadratic:

    $$Q(\theta) = a + b'\theta + \tfrac{1}{2}\theta'C\theta$$

    where

    $$\nabla Q(\theta) = b + C\theta, \qquad \nabla^2 Q(\theta) = C$$

    The Hessian $C$ is negative definite if $Q$ is strictly concave. In that case, $Q$ attains its maximum at

    $$\theta^* = -C^{-1}b = \theta_1 - C^{-1}(b + C\theta_1) = \theta_1 - \left[\nabla^2 Q(\theta_1)\right]^{-1}\nabla Q(\theta_1)$$


This expression suggests a modification to the search direction of steepest ascent.

    For quadratic functions,

    $$\Delta = -\left[\nabla^2 Q(\theta_1)\right]^{-1}\nabla Q(\theta_1)$$

    A single line search would yield the optimal value of $\theta$ at a step length equal to one, no matter the starting value.


Example: Let's return to the OLS example using the quadratic method:

    $$\nabla Q(\beta) = X'(y - X\beta), \qquad \nabla^2 Q(\beta) = -X'X$$

    In this case, the best step yields

    $$\beta_{i+1} = \beta_i + (X'X)^{-1}X'(y - X\beta_i) = (X'X)^{-1}X'y$$

Quadratic optimization methods approximate general functions with quadratic functions,

    $$Q(\theta) \approx Q(\theta_1) + \nabla Q(\theta_1)'(\theta - \theta_1) + \tfrac{1}{2}(\theta - \theta_1)'\nabla^2 Q(\theta_1)(\theta - \theta_1)$$

    The maximum of the quadratic approximation serves as a further approximation of the maximum of the original function.

    For the Taylor series approximation, the search direction is

    $$\Delta = -\left[\nabla^2 Q(\theta_1)\right]^{-1}\nabla Q(\theta_1)$$

    We will explore some examples of quadratic methods.


    Newton-Raphson

The Newton-Raphson method uses the quadratic expansion through the score:

    $$\sum_{i=1}^N s_i(\theta_{g+1}) = \sum_{i=1}^N s_i(\theta_g) + \left[\sum_{i=1}^N H_i(\theta_g)\right](\theta_{g+1} - \theta_g) + r_g$$

    where $s_i(\theta)$ is the $P \times 1$ score with respect to $\theta$, $H_i(\theta)$ is the $P \times P$ Hessian, and $r_g$ is a $P \times 1$ vector of remainder terms.

    In this case, ignoring the remainder term,

    $$\theta_{g+1} = \theta_g - \left[\sum_{i=1}^N H_i(\theta_g)\right]^{-1}\left[\sum_{i=1}^N s_i(\theta_g)\right]$$

    Idea: as we get close to the solution, $\sum_{i=1}^N s_i(\theta_g)$ will get close to zero, and the search direction will get smaller.


In general, we can use a stopping rule: the requirement that the largest absolute change $|\theta_{g+1} - \theta_g|$ is smaller than a constant. Another stopping criterion used by these quadratic methods is

    $$\left[\sum_{i=1}^N s_i(\theta_g)\right]'\left[-\sum_{i=1}^N H_i(\theta_g)\right]^{-1}\left[\sum_{i=1}^N s_i(\theta_g)\right]$$

    being less than a small number, e.g. 0.0001.

    This expression will be zero when a maximum has been reached.

    We need to check that the Hessian is negative definite before claiming convergence.

    We need to try many different starting values to make sure that, in the end, the maximum is a global one and not a local one.
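
    A minimal Python sketch of the Newton-Raphson update with this quadratic-form stopping rule. The objective is the log likelihood of a simulated logit model; the logit example, the data-generating process, and the tolerance are my own choices, not from the slides:

```python
import numpy as np

def logistic(u):
    return 1.0 / (1.0 + np.exp(-u))

# Simulated logit data (illustrative).
rng = np.random.default_rng(1)
N = 500
X = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])
theta_true = np.array([0.5, 1.0, -1.0])
y = (rng.uniform(size=N) < logistic(X @ theta_true)).astype(float)

theta = np.zeros(X.shape[1])                             # starting value
for g in range(100):
    p = logistic(X @ theta)
    score = X.T @ (y - p)                                # sum of per-observation scores s_i
    hessian = -(X * (p * (1 - p))[:, None]).T @ X        # sum of per-observation Hessians H_i
    # Stopping rule: quadratic form score' (-H)^{-1} score below a small tolerance.
    if score @ np.linalg.solve(-hessian, score) < 1e-8:
        break
    theta = theta - np.linalg.solve(hessian, score)      # Newton-Raphson update

print(theta)                                             # close to theta_true for large N
```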


    Drawbacks:

Computation of the second derivative. The sum of the Hessians may not be negative definite at a particular value of $\theta$, and we can go in the wrong direction.

    We check that progress is being made by computing the difference in the values of the objective function at each iteration:

    $$\sum_{i=1}^N Q_i(\theta_{g+1}) - \sum_{i=1}^N Q_i(\theta_g)$$

    Since we are maximizing the objective function, we should expect that this change from step $g$ to $g+1$ is positive.


    BHHH Algorithm

Use the outer product of the score in place of the Hessian,

    $$\theta_{g+1} = \theta_g + \lambda\left[\sum_{i=1}^N s_i(\theta_g)s_i(\theta_g)'\right]^{-1}\left[\sum_{i=1}^N s_i(\theta_g)\right]$$

    where $\lambda$ is the step size.

    It avoids the problem of estimating a second derivative.
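
    A sketch of the BHHH update on the same illustrative simulated logit model used above (my own example); only first derivatives are needed:

```python
import numpy as np

def logistic(u):
    return 1.0 / (1.0 + np.exp(-u))

# Same illustrative simulated logit data as in the Newton-Raphson sketch.
rng = np.random.default_rng(1)
N = 500
X = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])
y = (rng.uniform(size=N) < logistic(X @ np.array([0.5, 1.0, -1.0]))).astype(float)

theta = np.zeros(X.shape[1])
lam = 1.0                                                # fixed step size (a simple choice)
for g in range(200):
    s_i = (y - logistic(X @ theta))[:, None] * X         # N x P matrix of per-observation scores
    outer = s_i.T @ s_i                                  # sum of s_i s_i' replaces the Hessian
    score = s_i.sum(axis=0)
    step = np.linalg.solve(outer, score)
    if np.max(np.abs(step)) < 1e-8:                      # stop when the update is negligible
        break
    theta = theta + lam * step                           # BHHH update

print(theta)
```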


    The Generalized Gauss-Newton Method

Another possibility to estimate the Hessian is to use the expected value of $H(z, \theta_0)$ conditional on $x$, where $z$ is partitioned into $y$ and $x$.

    We call this conditional expectation $A(x, \theta_0)$. The generalized Gauss-Newton method uses the updating equation:

    $$\theta_{g+1} = \theta_g - \left[\sum_{i=1}^N A_i(\theta_g)\right]^{-1}\left[\sum_{i=1}^N s_i(\theta_g)\right]$$


Sometimes it is computationally convenient to concentrate out one set of parameters.

    Suppose that we can partition $\theta$ into the vectors $\beta$ and $\gamma$. In this case, the first-order conditions are:

    $$\sum_{i=1}^N \nabla_{\beta} Q(z_i, \beta, \gamma) = 0$$
    $$\sum_{i=1}^N \nabla_{\gamma} Q(z_i, \beta, \gamma) = 0$$

    Suppose that the second set of equations can be solved for $\gamma$ as a function of $z$ and $\beta$ over the parameter set, $\gamma = g(z, \beta)$, so that

    $$\sum_{i=1}^N \nabla_{\beta} Q(z_i, \beta, g(z, \beta)) = 0$$


When we plug $g(z, \beta)$ into the original objective function, we get the concentrated objective function, which only depends on $\beta$:

    $$Q^c(z, \beta) = \sum_{i=1}^N Q(z_i, \beta, g(z, \beta))$$

    Under some regularity conditions, the $\hat{\beta}$ that solves the maximization problem using the concentrated objective function is the same as the one from the original problem.

    Having found $\hat{\beta}$, we can get $\hat{\gamma} = g(z, \hat{\beta})$.


Resampling allows us to improve the asymptotic distribution approximation.

    Sometimes we know that the approximate distribution of $\hat{\theta}$ works well, but we are interested in a function of the parameters,

    $$\gamma_0 = g(\theta_0)$$

    One way to obtain the approximate distribution of this function is to use the delta method applied to $\hat{\gamma} = g(\hat{\theta})$. Sometimes it is hard to apply the delta method, or the approximations are not good.

    Resampling can improve on the usual asymptotics (standard errors and confidence intervals).


    Bootstrapping

There are several variants of the bootstrap. Idea: approximate the distribution of $\hat{\theta}$ without relying on first-order asymptotic theory.

    Let $\{z_1, \ldots, z_N\}$ be the outcome of a random sample.

    At each bootstrap iteration $b$, a random sample of size $N$ is drawn from $\{z_1, \ldots, z_N\}$ with replacement, $\{z_1^{(b)}, \ldots, z_N^{(b)}\}$.

    At each iteration, we use the bootstrap sample to obtain the estimate $\hat{\theta}^{(b)}$ by solving

    $$\max_{\theta \in \Theta} \sum_{i=1}^N Q(z_i^{(b)}, \theta)$$


We iterate the process $B$ times, obtaining $\hat{\theta}^{(b)}$, $b = 1, \ldots, B$.

    Then we compute the average of the $\hat{\theta}^{(b)}$, say $\bar{\theta}$, and use this average as the estimated value of the parameter.

    The sample variance

    $$\frac{1}{B-1}\sum_{b=1}^B \left(\hat{\theta}^{(b)} - \bar{\theta}\right)^2$$

    can be used to estimate the variance; its square root is the bootstrap standard error.

    A 95% bootstrapped confidence interval for $\theta_0$ can be obtained by finding the 2.5 and 97.5 percentiles in the list of values $\{\hat{\theta}^{(b)} : b = 1, \ldots, B\}$.

    This is the nonparametric bootstrap.
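
    A minimal sketch of the nonparametric bootstrap in Python, taking OLS on simulated data as the extremum estimator (the data-generating process, B = 999, and the focus on the slope coefficient are my own illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)
N, B = 200, 999
x = rng.normal(size=N)
y = 1.0 + 2.0 * x + rng.normal(size=N)
Z = np.column_stack([y, x])                              # the observations z_i = (y_i, x_i)

def estimate(sample):
    """Extremum estimator on one sample: here, the OLS slope."""
    ys, xs = sample[:, 0], sample[:, 1]
    Xs = np.column_stack([np.ones(len(xs)), xs])
    return np.linalg.solve(Xs.T @ Xs, Xs.T @ ys)[1]

boot = np.empty(B)
for b in range(B):
    idx = rng.integers(0, N, size=N)                     # draw N indices with replacement
    boot[b] = estimate(Z[idx])                           # estimate on the bootstrap sample

se = boot.std(ddof=1)                                    # bootstrap standard error
ci = np.percentile(boot, [2.5, 97.5])                    # 95% percentile interval
print(estimate(Z), se, ci)
```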


Parametric bootstrap: assume that the distribution of $z$ is known up to the parameter $\theta_0$.

    Let $f(\cdot, \theta)$ denote the parametric density.

    On each bootstrap iteration, we draw a random sample of size $N$ from $f(\cdot, \hat{\theta})$, which gives $\{z_1^{(b)}, \ldots, z_N^{(b)}\}$.

    We do the resampling thousands of times.


Another alternative: in a regression model, we first estimate $\hat{\theta}$ by NLS and obtain the residuals

    $$\hat{u}_i = y_i - m(x_i, \hat{\theta})$$

    Then we draw bootstrap samples of the residuals, $\{\hat{u}_i^{(b)}\}$ for $b = 1, \ldots, B$, and obtain $y_i^{(b)} = m(x_i, \hat{\theta}) + \hat{u}_i^{(b)}$.

    Using the generated data $\{(x_i, y_i^{(b)}) : i = 1, \ldots, N\}$, we compute $\hat{\theta}^{(b)}$.
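
    A sketch of this residual bootstrap in Python; for simplicity the mean function m(x, theta) is linear here, whereas the slides allow a general NLS mean function, and the data-generating process and B are my own choices:

```python
import numpy as np

rng = np.random.default_rng(3)
N, B = 200, 999
x = rng.normal(size=N)
y = 1.0 + 2.0 * x + rng.normal(size=N)
X = np.column_stack([np.ones(N), x])

theta_hat = np.linalg.solve(X.T @ X, X.T @ y)            # first-stage estimate
fitted = X @ theta_hat                                   # m(x_i, theta_hat)
resid = y - fitted                                       # residuals u_hat_i

boot = np.empty((B, 2))
for b in range(B):
    u_star = resid[rng.integers(0, N, size=N)]           # resample residuals with replacement
    y_star = fitted + u_star                             # y_i^(b) = m(x_i, theta_hat) + u_i^(b)
    boot[b] = np.linalg.solve(X.T @ X, X.T @ y_star)     # re-estimate on the generated data

print(theta_hat)
print(boot.std(axis=0, ddof=1))                          # bootstrap standard errors
```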


    References

Amemiya, Chapter 4.

    Wooldridge, Chapter 12.

    Ruud, Chapter 16.

    Newey, W. and D. McFadden (1994), "Large Sample Estimation and Hypothesis Testing," Handbook of Econometrics, Volume IV, Chapter 36.
