Transcript of extremum_estimators_computation

Extremum Estimators: Algorithms and Bootstrap (Estimadores Extremos: Algoritmos e Bootstrap)

    Cristine Campos de Xavier Pinto

    CEDEPLAR/UFMG

    May 2010


Direct computation of an extremum estimator is in general not possible. We need to use numerical methods to compute these estimators.

    In this lecture, we will review methods that are iterative algorithms searching for the maximum of a function of several arguments.

    When we deal with computation, we face problems such as multiple local maxima, discontinuities, numerical instability, and large dimensions.


    Grid Search

Consider the one-dimensional maximization problem

    $$\max_{\theta \in [a,b]} Q(\theta)$$

    The interval $[a, b]$ can be divided into a number of subintervals,

    $$\{[a, \theta_1], [\theta_1, \theta_2], \ldots, [\theta_N, b]\}$$

    We compute the function value at each boundary and infer that the maximum lies in one of the intervals with a boundary that attains the highest function value:

    $$[\theta_i, \theta_{i+1}] \ \text{such that} \ \max_j Q(\theta_j) = \max\{Q(\theta_i), Q(\theta_{i+1})\}$$


One then repeats the process in each of the chosen intervals, treating them as the original interval (iterations).

    The process will lead to smaller and smaller intervals that contain local maxima.

    Sometimes this method does not find the global maximum. We can mistakenly drop the interval that contains the global maximum if the grid is not fine enough.

    If we choose many short intervals at each iteration, we increase computation time.

    An exhaustive search is infeasible.
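
    A minimal sketch of this refine-the-grid idea in Python (the test function, grid size, and tolerance below are illustrative choices, not from the slides):

```python
import numpy as np

def grid_search_max(Q, a, b, n_points=11, tol=1e-8, max_iter=100):
    """Iteratively refine a one-dimensional grid to locate a maximizer of Q on [a, b]."""
    for _ in range(max_iter):
        grid = np.linspace(a, b, n_points)
        values = Q(grid)
        j = int(np.argmax(values))              # boundary with the highest function value
        a = grid[max(j - 1, 0)]                 # keep the subintervals adjacent to that boundary
        b = grid[min(j + 1, n_points - 1)]
        if b - a < tol:
            break
    return 0.5 * (a + b)

# Example: Q(theta) = -(theta - 1.3)**2 has its maximum at theta = 1.3.
theta_hat = grid_search_max(lambda t: -(t - 1.3) ** 2, a=0.0, b=5.0)
print(theta_hat)   # approximately 1.3
```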


Multidimensional settings: the grid search must cover every dimension. Calculations increase exponentially with the dimension of the parameter space.

    If we have $n$ intervals and $\theta \in \mathbb{R}^k$, each iteration requires on the order of $n^k$ calculations.

    Sometimes we have information about the function that can help in the search.


    Polynomial Approximation

We can exploit the differentiability of the maximand and approximate $Q(\theta)$ with a polynomial.

    The optimum of the polynomial approximation is an approximation of the optimum of $Q$.

    Let's use a quadratic approximation:

    $$Q(\theta) \approx a + b(\theta - \theta_0) + \tfrac{1}{2} c (\theta - \theta_0)^2$$

    where $a$, $b$ and $c$ are chosen to fit $Q(\theta)$ well in a neighborhood of the starting value $\theta_0$.

    Given values $a$, $b$ and $c$, the approximation to the location of the optimum of $Q$ is $\theta_0 - b/c$, provided $c < 0$.


    There are many ways to choose these parameters.

If $Q(\theta)$ is differentiable, a second-order Taylor series yields a quadratic approximation based on $Q$ and its first two derivatives,

    $$Q(\theta) \approx Q(\theta_0) + \nabla Q(\theta_0)(\theta - \theta_0) + \tfrac{1}{2} \nabla^2 Q(\theta_0)(\theta - \theta_0)^2$$

    Another way is to fit three points where $Q(\theta)$ has been computed,

    $$Q(\theta_0) = a + b\theta_0 + \tfrac{1}{2} c\theta_0^2$$
    $$Q(\theta_1) = a + b\theta_1 + \tfrac{1}{2} c\theta_1^2$$
    $$Q(\theta_2) = a + b\theta_2 + \tfrac{1}{2} c\theta_2^2$$
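
    The three-point fit amounts to solving a small linear system for $a$, $b$ and $c$. A sketch in Python (the test function and evaluation points are arbitrary illustrative choices); with this parameterization in $\theta$, the vertex of the fitted quadratic is $-b/c$:

```python
import numpy as np

def fit_quadratic(Q, theta_points):
    """Fit a + b*theta + 0.5*c*theta**2 through three points where Q has been computed."""
    t = np.asarray(theta_points, dtype=float)
    A = np.column_stack([np.ones(3), t, 0.5 * t ** 2])   # design matrix of the 3x3 system
    a, b, c = np.linalg.solve(A, Q(t))
    return a, b, c

# Example with Q(theta) = -(theta - 2)**2, whose maximum is at theta = 2.
a, b, c = fit_quadratic(lambda t: -(t - 2.0) ** 2, [0.0, 1.0, 3.0])
theta_star = -b / c if c < 0 else None   # vertex of the fitted quadratic
print(theta_star)   # 2.0
```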


    Line Searches

Idea: overcome high-dimensional maximization by using a grid search in one dimension (a line search) through a parameter space with several dimensions.

    Given a starting point $\theta_1$ and a search direction ("line") $\Delta$, each iteration attempts to solve the one-dimensional problem:

    $$\lambda^* = \arg\max_{\lambda} Q(\theta_1 + \lambda\Delta)$$

    $\lambda$ is the step length. The starting point of the next iteration is

    $$\theta_2 = \theta_1 + \lambda^*\Delta$$

    There are many possible choices of $\Delta$ and of the method for approximating $\lambda^*$.

    By convention, we restrict $\lambda \geq 0$.


The directional derivative of $Q$ is

    $$\frac{\partial Q(\theta_1 + \lambda\Delta)}{\partial \lambda} = \nabla Q(\theta_1 + \lambda\Delta)'\Delta$$

    and all line search methods require

    $$\left.\frac{\partial Q(\theta_1 + \lambda\Delta)}{\partial \lambda}\right|_{\lambda=0} = \nabla Q(\theta_1)'\Delta > 0$$

    so that $Q$ is increasing with respect to the step length in a neighborhood of the starting value $\theta_1$.

    A positive value of $\lambda$ that increases $Q$ will then always exist. We will see two types of line search: steepest ascent and quadratic methods.


    The Method of Steepest Ascent

In this case, $\Delta = \nabla Q(\theta_1)$.

    The elements of the gradient are the rates of change in the function for a small ceteris paribus change in each element of $\theta$.

    This search direction guarantees that the function value will improve if the entire vector is moved (at least locally) in that direction:

    $$\left.\frac{\partial Q(\theta_1 + \lambda\nabla Q(\theta_1))}{\partial \lambda}\right|_{\lambda=0} = \nabla Q(\theta_1)'\nabla Q(\theta_1) > 0$$

    unless $\theta_1$ is a critical value.


The gradient has an optimality property: among all the directions with the same length, setting $\Delta = \nabla Q(\theta_1)$ gives the fastest rate of increase of $Q(\theta_1 + \lambda\Delta)$ with respect to $\lambda$:

    $$\nabla Q(\theta_1) = \arg\max_{\{\Delta : \|\Delta\| = \|\nabla Q(\theta_1)\|\}} \left.\frac{\partial Q(\theta_1 + \lambda\Delta)}{\partial \lambda}\right|_{\lambda=0}$$

    This method implicitly approximates the maximand $Q(\theta)$ as a linear function in the neighborhood of $\theta_1$:

    $$Q(\theta) \approx Q(\theta_1) + \nabla Q(\theta_1)'(\theta - \theta_1)$$


This method gives no guidance for the step length $\lambda$.

    Maximization involves the curvature of a function.

    This method does not exploit curvature, which makes the algorithm slow for many practical problems.


Example: OLS. Let's apply this algorithm to solve the following problem:

    $$\max_{\beta} \, -\tfrac{1}{2}(Y - X\beta)'(Y - X\beta)$$

    where $\theta = \beta$ and $Q(\beta) = -\tfrac{1}{2}(Y - X\beta)'(Y - X\beta)$.


On the $i$-th iteration, let the starting point be $\beta_i$, so $\Delta_i = X'(y - X\beta_i)$, and each line search solves

    $$\lambda_i = \arg\max_{\lambda} \, -\tfrac{1}{2}\left[y - X(\beta_i + \lambda\Delta_i)\right]'\left[y - X(\beta_i + \lambda\Delta_i)\right]$$
    $$= \arg\max_{\lambda} \, \lambda\Delta_i'X'(y - X\beta_i) - \tfrac{1}{2}\lambda^2\Delta_i'X'X\Delta_i$$
    $$= \frac{\Delta_i'X'(y - X\beta_i)}{\Delta_i'X'X\Delta_i} = \frac{\Delta_i'\Delta_i}{\Delta_i'X'X\Delta_i}$$

    and the best step yields

    $$\beta_{i+1} = \beta_i + \lambda_i\Delta_i = \beta_i + \frac{(y - X\beta_i)'X X'(y - X\beta_i)}{(y - X\beta_i)'X X'X X'(y - X\beta_i)} \, X'(y - X\beta_i)$$
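
    A minimal Python sketch of this steepest-ascent recursion for OLS; the simulated design, true coefficients, and convergence tolerance are my own illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
N, k = 200, 3
X = rng.normal(size=(N, k))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=N)

beta = np.zeros(k)                           # starting value beta_1
for _ in range(10_000):
    delta = X.T @ (y - X @ beta)             # search direction = gradient of Q at beta_i
    if np.linalg.norm(delta) < 1e-8:
        break
    lam = (delta @ delta) / (delta @ X.T @ X @ delta)   # optimal step length from the line search
    beta = beta + lam * delta                # beta_{i+1} = beta_i + lambda_i * delta_i

print(beta)
print(np.linalg.solve(X.T @ X, X.T @ y))     # closed-form OLS for comparison
```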


    Quadratic Methods

Let's assume that $Q$ is exactly quadratic:

    $$Q(\theta) = a + b'\theta + \tfrac{1}{2}\theta'C\theta$$

    where

    $$\nabla Q(\theta) = b + C\theta, \qquad \nabla^2 Q(\theta) = C$$

    The Hessian $C$ is negative definite if $Q$ is strictly concave. In that case, $Q$ attains its maximum at

    $$\theta^* = -C^{-1}b = \theta_1 - C^{-1}(b + C\theta_1) = \theta_1 - \left[\nabla^2 Q(\theta_1)\right]^{-1}\nabla Q(\theta_1)$$


This expression suggests a modification to the search direction of steepest ascent.

    For quadratic functions,

    $$\Delta = -\left[\nabla^2 Q(\theta_1)\right]^{-1}\nabla Q(\theta_1)$$

    A single line search would yield the optimal value of $\theta$ at a step length equal to one, no matter the starting value.


Example: Let's return to the OLS example using the quadratic method:

    $$\nabla Q(\beta) = X'(y - X\beta), \qquad \nabla^2 Q(\beta) = -X'X$$

    In this case, the best step yields

    $$\beta_{i+1} = \beta_i + (X'X)^{-1}X'(y - X\beta_i) = (X'X)^{-1}X'y$$

Quadratic optimization methods approximate general functions with quadratic functions,

    $$Q(\theta) \approx Q(\theta_1) + \nabla Q(\theta_1)'(\theta - \theta_1) + \tfrac{1}{2}(\theta - \theta_1)'\nabla^2 Q(\theta_1)(\theta - \theta_1)$$

    The maximum of the quadratic approximation serves as a further approximation of the maximum of the original function.

    For the Taylor series approximation, the search direction is

    $$\Delta = -\left[\nabla^2 Q(\theta_1)\right]^{-1}\nabla Q(\theta_1)$$

    We will explore some examples of quadratic methods.


    Newton-Raphson

The Newton-Raphson method uses the quadratic expansion through the score:

    $$\sum_{i=1}^N s_i(\theta_{g+1}) = \sum_{i=1}^N s_i(\theta_g) + \left[\sum_{i=1}^N H_i(\theta_g)\right](\theta_{g+1} - \theta_g) + r_g$$

    where $s_i(\theta)$ is the $P \times 1$ score with respect to $\theta$, $H_i(\theta)$ is the $P \times P$ Hessian, and $r_g$ is a $P \times 1$ vector of remainder terms.

    In this case, ignoring the remainder term,

    $$\theta_{g+1} = \theta_g - \left[\sum_{i=1}^N H_i(\theta_g)\right]^{-1}\left[\sum_{i=1}^N s_i(\theta_g)\right]$$

    Idea: as we get close to the solution, $\sum_{i=1}^N s_i(\theta_g)$ will get close to zero, and the search direction will get smaller.


In general, we can use a stopping rule: the requirement that the largest absolute change $|\theta_{g+1} - \theta_g|$ is smaller than a constant. Another stopping criterion used by these quadratic methods is

    $$\left[\sum_{i=1}^N s_i(\theta_g)\right]'\left[-\sum_{i=1}^N H_i(\theta_g)\right]^{-1}\left[\sum_{i=1}^N s_i(\theta_g)\right]$$

    being less than a small number, e.g. 0.0001.

    This expression will be zero when a maximum has been reached.

    We need to check that the Hessian is negative definite before claiming convergence.

    We need to try many different starting values to make sure that, in the end, the maximum is a global one and not a local one.
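
    A minimal Python sketch of the Newton-Raphson update with this quadratic-form stopping rule. The objective is the log likelihood of a simulated logit model; the logit example, the data-generating process, and the tolerance are my own choices, not from the slides:

```python
import numpy as np

def logistic(u):
    return 1.0 / (1.0 + np.exp(-u))

# Simulated logit data (illustrative).
rng = np.random.default_rng(1)
N = 500
X = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])
theta_true = np.array([0.5, 1.0, -1.0])
y = (rng.uniform(size=N) < logistic(X @ theta_true)).astype(float)

theta = np.zeros(X.shape[1])                             # starting value
for g in range(100):
    p = logistic(X @ theta)
    score = X.T @ (y - p)                                # sum of per-observation scores s_i
    hessian = -(X * (p * (1 - p))[:, None]).T @ X        # sum of per-observation Hessians H_i
    # Stopping rule: quadratic form score' (-H)^{-1} score below a small tolerance.
    if score @ np.linalg.solve(-hessian, score) < 1e-8:
        break
    theta = theta - np.linalg.solve(hessian, score)      # Newton-Raphson update

print(theta)                                             # close to theta_true for large N
```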


    Drawbacks:

Computation of the second derivative. The sum of the Hessians may not be negative definite at a particular value of $\theta$, and we can go in the wrong direction.

    We check that progress is being made by computing the difference in the values of the objective function at each iteration:

    $$\sum_{i=1}^N Q_i(\theta_{g+1}) - \sum_{i=1}^N Q_i(\theta_g)$$

    Since we are maximizing the objective function, we should expect that this change from step $g$ to $g+1$ is positive.


    BHHH Algorithm

Use the outer product of the score in place of the Hessian,

    $$\theta_{g+1} = \theta_g + \lambda\left[\sum_{i=1}^N s_i(\theta_g)s_i(\theta_g)'\right]^{-1}\left[\sum_{i=1}^N s_i(\theta_g)\right]$$

    where $\lambda$ is the step size.

    It avoids the problem of estimating a second derivative.
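
    A sketch of the BHHH update on the same illustrative simulated logit model used above (my own example); only first derivatives are needed:

```python
import numpy as np

def logistic(u):
    return 1.0 / (1.0 + np.exp(-u))

# Same illustrative simulated logit data as in the Newton-Raphson sketch.
rng = np.random.default_rng(1)
N = 500
X = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])
y = (rng.uniform(size=N) < logistic(X @ np.array([0.5, 1.0, -1.0]))).astype(float)

theta = np.zeros(X.shape[1])
lam = 1.0                                                # fixed step size (a simple choice)
for g in range(200):
    s_i = (y - logistic(X @ theta))[:, None] * X         # N x P matrix of per-observation scores
    outer = s_i.T @ s_i                                  # sum of s_i s_i' replaces the Hessian
    score = s_i.sum(axis=0)
    step = np.linalg.solve(outer, score)
    if np.max(np.abs(step)) < 1e-8:                      # stop when the update is negligible
        break
    theta = theta + lam * step                           # BHHH update

print(theta)
```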


    The Generalized Gauss-Newton Method

Another possibility to estimate the Hessian is to use the expected value of $H(z, \theta_0)$ conditional on $x$, where $z$ is partitioned into $y$ and $x$.

    We call this conditional expectation $A(x, \theta_0)$. The generalized Gauss-Newton method uses the updating equation:

    $$\theta_{g+1} = \theta_g - \left[\sum_{i=1}^N A_i(\theta_g)\right]^{-1}\left[\sum_{i=1}^N s_i(\theta_g)\right]$$


Sometimes it is computationally convenient to concentrate out one set of parameters.

    Suppose that we can partition $\theta$ into the vectors $\beta$ and $\gamma$. In this case, the first-order conditions are:

    $$\sum_{i=1}^N \nabla_{\beta} Q(z_i, \beta, \gamma) = 0$$
    $$\sum_{i=1}^N \nabla_{\gamma} Q(z_i, \beta, \gamma) = 0$$

    Suppose that the second set of equations can be solved for $\gamma$ as a function of $z$ and $\beta$ over the parameter set, $\gamma = g(z, \beta)$, so that

    $$\sum_{i=1}^N \nabla_{\beta} Q(z_i, \beta, g(z, \beta)) = 0$$


When we plug $g(z, \beta)$ into the original objective function, we get the concentrated objective function, which only depends on $\beta$:

    $$Q^c(z, \beta) = \sum_{i=1}^N Q(z_i, \beta, g(z, \beta))$$

    Under some regularity conditions, the $\hat{\beta}$ that solves the maximization problem using the concentrated objective function is the same as the one from the original problem.

    Having found $\hat{\beta}$, we can get $\hat{\gamma} = g(z, \hat{\beta})$.


Resampling allows us to improve the asymptotic distribution approximation.

    Sometimes we know that the approximate distribution of $\hat{\theta}$ works well, but we are interested in a function of the parameters,

    $$\gamma_0 = g(\theta_0)$$

    One way to obtain the approximate distribution of this function is to use the delta method applied to $\hat{\gamma} = g(\hat{\theta})$. Sometimes it is hard to apply the delta method, or the approximations are not good.

    Resampling can improve on the usual asymptotics (standard errors and confidence intervals).


    Bootstrapping

There are several variants of the bootstrap. Idea: approximate the distribution of $\hat{\theta}$ without relying on first-order asymptotic theory.

    Let $\{z_1, \ldots, z_N\}$ be the outcome of a random sample.

    At each bootstrap iteration $b$, a random sample of size $N$ is drawn from $\{z_1, \ldots, z_N\}$ with replacement, $\{z_1^{(b)}, \ldots, z_N^{(b)}\}$.

    At each iteration, we use the bootstrap sample to obtain the estimate $\hat{\theta}^{(b)}$ by solving

    $$\max_{\theta \in \Theta} \sum_{i=1}^N Q(z_i^{(b)}, \theta)$$


We iterate the process $B$ times, obtaining $\hat{\theta}^{(b)}$, $b = 1, \ldots, B$.

    Then we compute the average of the $\hat{\theta}^{(b)}$, say $\bar{\theta}$, and use this average as the estimated value of the parameter.

    The sample variance

    $$\frac{1}{B-1}\sum_{b=1}^B \left(\hat{\theta}^{(b)} - \bar{\theta}\right)^2$$

    can be used to estimate the variance; its square root is the bootstrap standard error.

    A 95% bootstrapped confidence interval for $\theta_0$ can be obtained by finding the 2.5 and 97.5 percentiles in the list of values $\{\hat{\theta}^{(b)} : b = 1, \ldots, B\}$.

    This is the nonparametric bootstrap.
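
    A minimal sketch of the nonparametric bootstrap in Python, taking OLS on simulated data as the extremum estimator (the data-generating process, B = 999, and the focus on the slope coefficient are my own illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)
N, B = 200, 999
x = rng.normal(size=N)
y = 1.0 + 2.0 * x + rng.normal(size=N)
Z = np.column_stack([y, x])                              # the observations z_i = (y_i, x_i)

def estimate(sample):
    """Extremum estimator on one sample: here, the OLS slope."""
    ys, xs = sample[:, 0], sample[:, 1]
    Xs = np.column_stack([np.ones(len(xs)), xs])
    return np.linalg.solve(Xs.T @ Xs, Xs.T @ ys)[1]

boot = np.empty(B)
for b in range(B):
    idx = rng.integers(0, N, size=N)                     # draw N indices with replacement
    boot[b] = estimate(Z[idx])                           # estimate on the bootstrap sample

se = boot.std(ddof=1)                                    # bootstrap standard error
ci = np.percentile(boot, [2.5, 97.5])                    # 95% percentile interval
print(estimate(Z), se, ci)
```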


Parametric bootstrap: assume that the distribution of $z$ is known up to the parameter $\theta_0$.

    Let $f(\cdot, \theta)$ denote the parametric density.

    On each bootstrap iteration, we draw a random sample of size $N$ from $f(\cdot, \hat{\theta})$, which gives $\{z_1^{(b)}, \ldots, z_N^{(b)}\}$.

    We do the resampling thousands of times.


Another alternative: in a regression model, we first estimate $\hat{\theta}$ by NLS and obtain the residuals

    $$\hat{u}_i = y_i - m(x_i, \hat{\theta})$$

    Then we draw bootstrap samples of the residuals, $\{\hat{u}_i^{(b)}\}$ for $b = 1, \ldots, B$, and obtain $y_i^{(b)} = m(x_i, \hat{\theta}) + \hat{u}_i^{(b)}$.

    Using the generated data $\{(x_i, y_i^{(b)}) : i = 1, \ldots, N\}$, we compute $\hat{\theta}^{(b)}$.
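
    A sketch of this residual bootstrap in Python; for simplicity the mean function m(x, theta) is linear here, whereas the slides allow a general NLS mean function, and the data-generating process and B are my own choices:

```python
import numpy as np

rng = np.random.default_rng(3)
N, B = 200, 999
x = rng.normal(size=N)
y = 1.0 + 2.0 * x + rng.normal(size=N)
X = np.column_stack([np.ones(N), x])

theta_hat = np.linalg.solve(X.T @ X, X.T @ y)            # first-stage estimate
fitted = X @ theta_hat                                   # m(x_i, theta_hat)
resid = y - fitted                                       # residuals u_hat_i

boot = np.empty((B, 2))
for b in range(B):
    u_star = resid[rng.integers(0, N, size=N)]           # resample residuals with replacement
    y_star = fitted + u_star                             # y_i^(b) = m(x_i, theta_hat) + u_i^(b)
    boot[b] = np.linalg.solve(X.T @ X, X.T @ y_star)     # re-estimate on the generated data

print(theta_hat)
print(boot.std(axis=0, ddof=1))                          # bootstrap standard errors
```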


    References

Amemiya, Chapter 4.

    Wooldridge, Chapter 12.

    Ruud, Chapter 16.

    Newey, W. and D. McFadden (1994), "Large Sample Estimation and Hypothesis Testing," Handbook of Econometrics, Volume IV, Chapter 36.
