Transcript of extremum_estimators_computation (7/30/2019)
Computation and Resampling

Extremum Estimators: Algorithms and Bootstrap
Cristine Campos de Xavier Pinto
CEDEPLAR/UFMG
May 2010
Direct computation of an extremum estimator is in general not possible; we need to use numerical methods to compute these estimators.
In this lecture, we will review iterative algorithms that search for the maximum of a function of several arguments.
When we deal with computation, we face problems such as multiple local maxima, discontinuities, numerical instability, and large dimensions.
Grid Search
Consider the one-dimensional maximization problem
$$\max_{\theta \in [a,b]} Q(\theta)$$
The interval $[a,b]$ can be divided into a number of subintervals,
$$\{[a, \theta_1],\ [\theta_1, \theta_2],\ \ldots,\ [\theta_N, b]\}$$
We compute the function value at each boundary and infer that the maximum lies in one of the intervals with a boundary that includes the highest function value: $[\theta_i, \theta_{i+1}]$ such that
$$\max_j Q(\theta_j) = \max\{Q(\theta_i),\ Q(\theta_{i+1})\}$$
One then repeats the process in each of the chosen intervals, treating it as the original interval (iterations).
The process leads to smaller and smaller intervals that contain local maxima.
Sometimes this method does not find the global maximum: we can mistakenly drop the interval that contains the global maximum if the grid is not fine enough.
If we choose many short intervals at each iteration, we increase computation time. An exhaustive search is infeasible.
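A minimal sketch of the iterated grid search in Python (the objective `Q` and the interval below are illustrative, not from the lecture):

```python
import numpy as np

def grid_search(Q, a, b, n_points=11, iterations=20):
    """Iterated grid search for the maximum of Q on [a, b].

    At each iteration, evaluate Q on a grid, keep the subinterval
    around the best grid point, and refine.  As noted above, this
    may miss the global maximum if the grid is not fine enough.
    """
    for _ in range(iterations):
        grid = np.linspace(a, b, n_points)
        j = np.argmax(Q(grid))
        # Keep the interval whose boundaries surround the best grid point.
        a = grid[max(j - 1, 0)]
        b = grid[min(j + 1, n_points - 1)]
    return 0.5 * (a + b)

# Illustrative concave objective with maximum at theta = 2.
theta_hat = grid_search(lambda t: -(t - 2.0) ** 2, 0.0, 5.0)
```

Each iteration shrinks the bracketing interval by a factor of roughly $n/2$, so a handful of iterations already localizes the maximum tightly in one dimension.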
Multidimensional settings: the grid search must cover every dimension, so calculations increase exponentially with the dimension of the parameter space.
If we have $n$ intervals and $\theta \in \mathbb{R}^k$, each iteration requires on the order of $n^k$ calculations.
Sometimes we have information about the function that can help in the search.
Polynomial Approximation
We can exploit the differentiability of the maximand and approximate $Q(\theta)$ with a polynomial. The optimum of the polynomial approximation is an approximation of the optimum of $Q$.
Let's use a quadratic approximation:
$$Q(\theta) \approx a + b\,(\theta - \theta_0) + \tfrac{1}{2}\,c\,(\theta - \theta_0)^2$$
where $a$, $b$ and $c$ are chosen to fit $Q(\theta)$ well in a neighborhood of the starting value $\theta_0$.
Given values $a$, $b$ and $c$, the approximation to the location of the optimum of $Q$ is $\theta_0 - b/c$, provided $c < 0$.
There are many ways to choose these parameters.
If $Q(\theta)$ is differentiable, a second-order Taylor series yields a quadratic approximation based on $Q$ and its first two derivatives,
$$Q(\theta) \approx Q(\theta_0) + \nabla_\theta Q(\theta_0)\,(\theta - \theta_0) + \tfrac{1}{2}\,\nabla^2_{\theta\theta} Q(\theta_0)\,(\theta - \theta_0)^2$$
Another way is to fit three points where $Q(\theta)$ has been computed:
$$Q(\theta_0) = a + b\,\theta_0 + \tfrac{1}{2}\,c\,\theta_0^2$$
$$Q(\theta_1) = a + b\,\theta_1 + \tfrac{1}{2}\,c\,\theta_1^2$$
$$Q(\theta_2) = a + b\,\theta_2 + \tfrac{1}{2}\,c\,\theta_2^2$$
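The three-point fit amounts to solving a 3×3 linear system for $a$, $b$, $c$; a sketch in Python (the objective and evaluation points are illustrative):

```python
import numpy as np

def quadratic_fit_optimum(Q, t0, t1, t2):
    """Fit Q(t) ~ a + b*t + (1/2)*c*t^2 through three points and
    return the optimum -b/c of the fitted quadratic (requires c < 0)."""
    ts = np.array([t0, t1, t2])
    # Rows [1, t_j, 0.5 t_j^2] match the three fitting equations above.
    A = np.column_stack([np.ones(3), ts, 0.5 * ts ** 2])
    a, b, c = np.linalg.solve(A, Q(ts))
    assert c < 0, "fitted quadratic must be concave"
    return -b / c

# For an exactly quadratic Q, the fit recovers the maximizer exactly.
t_star = quadratic_fit_optimum(lambda t: -(t - 3.0) ** 2, 0.0, 1.0, 2.0)
```

For non-quadratic objectives the returned point is only an approximation, and one would iterate: refit around the new point until the change is negligible.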
Line Searches
Idea: overcome high-dimensional maximization by using a grid search in one dimension (a line search) through a parameter space with several dimensions.
Given a starting point $\theta_1$ and a search direction ("line") $\delta$, each iteration attempts to solve the one-dimensional problem
$$\lambda^* = \arg\max_{\lambda} Q(\theta_1 + \lambda\delta)$$
$\lambda$ is the step length. The starting point of the next iteration is
$$\theta_2 = \theta_1 + \lambda^*\delta$$
There are many possible choices of $\delta$ and of the method for approximating $\lambda^*$.
By convention, we restrict $\lambda \geq 0$.
The directional derivative of $Q$ is
$$\frac{\partial Q(\theta_1 + \lambda\delta)}{\partial \lambda} = \nabla_\theta Q(\theta_1 + \lambda\delta)'\,\delta$$
and all line search methods require
$$\left.\frac{\partial Q(\theta_1 + \lambda\delta)}{\partial \lambda}\right|_{\lambda = 0} = \nabla_\theta Q(\theta_1)'\,\delta > 0$$
so that $Q$ is increasing with respect to the step length in a neighborhood of the starting value $\theta_1$. A positive value of $\lambda$ that increases $Q$ will then always exist.
We will see two types of line search: steepest ascent and quadratic methods.
The Method of Steepest Ascent
In this case, $\delta = \nabla_\theta Q(\theta_1)$.
The elements of the gradient are the rates of change in the function for a small ceteris paribus change in each element of $\theta$.
This search direction guarantees that the function value will improve if the entire vector is moved (at least locally) in that direction:
$$\left.\frac{\partial Q(\theta_1 + \lambda\,\nabla_\theta Q(\theta_1))}{\partial \lambda}\right|_{\lambda = 0} = \nabla_\theta Q(\theta_1)'\,\nabla_\theta Q(\theta_1) > 0$$
unless $\theta_1$ is a critical value.
The gradient has an optimality property: among all the directions with the same length, setting $\delta = \nabla_\theta Q(\theta_1)$ gives the fastest rate of increase of $Q(\theta_1 + \lambda\delta)$ with respect to $\lambda$:
$$\nabla_\theta Q(\theta_1) = \arg\max_{\{\delta\,:\,\|\delta\| = \|\nabla_\theta Q(\theta_1)\|\}} \left.\frac{\partial Q(\theta_1 + \lambda\delta)}{\partial \lambda}\right|_{\lambda = 0}$$
This method implicitly approximates the maximand $Q(\theta)$ as a linear function in the neighborhood of $\theta_1$:
$$Q(\theta) \approx Q(\theta_1) + \nabla_\theta Q(\theta_1)'\,(\theta - \theta_1)$$
This method gives no guidance for the step length $\lambda$.
Maximization involves the curvature of a function. This method does not exploit curvature, which makes the algorithm slow for many practical problems.
Example: OLS. Let's apply this algorithm to solve the following problem:
$$\max_{\beta} -\tfrac{1}{2}\,(Y - X\beta)'(Y - X\beta)$$
where $\theta = \beta$ and $Q(\beta) = -\tfrac{1}{2}(Y - X\beta)'(Y - X\beta)$.
On the $i$th iteration, let the starting point be $\beta_i$, so $\delta_i = X'(y - X\beta_i)$, and each line search solves
$$\lambda_i = \arg\max_{\lambda} -\tfrac{1}{2}\,[y - X(\beta_i + \lambda\delta_i)]'\,[y - X(\beta_i + \lambda\delta_i)]$$
$$= \arg\max_{\lambda}\ \lambda\,\delta_i' X'(y - X\beta_i) - \tfrac{1}{2}\,\lambda^2\,\delta_i' X'X\delta_i$$
$$= \frac{\delta_i' X'(y - X\beta_i)}{\delta_i' X'X\delta_i} = \frac{\delta_i'\delta_i}{\delta_i' X'X\delta_i}$$
and the best step yields
$$\beta_{i+1} = \beta_i + \lambda_i\,\delta_i = \beta_i + \frac{(y - X\beta_i)'XX'(y - X\beta_i)}{(y - X\beta_i)'XX'XX'(y - X\beta_i)}\,X'(y - X\beta_i)$$
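A sketch of this steepest-ascent iteration in Python, on simulated data (the data-generating process and the iteration count are illustrative); it converges to the closed-form OLS solution:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.normal(size=200)

beta = np.zeros(3)                      # starting value beta_1
for _ in range(500):
    delta = X.T @ (y - X @ beta)        # gradient = steepest-ascent direction
    # Closed-form optimal step length for the quadratic OLS objective.
    step = (delta @ delta) / (delta @ (X.T @ (X @ delta)))
    beta = beta + step * delta

beta_ols = np.linalg.solve(X.T @ X, X.T @ y)  # closed-form OLS solution
```

The linear convergence rate depends on the conditioning of $X'X$: the worse the conditioning, the more the iterates zigzag and the more iterations are needed, illustrating why ignoring curvature makes steepest ascent slow.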
Quadratic Methods
Let's assume that $Q$ is exactly quadratic,
$$Q(\theta) = a + b'\theta + \tfrac{1}{2}\,\theta' C\,\theta$$
where
$$\nabla_\theta Q(\theta) = b + C\theta, \qquad \nabla^2_{\theta\theta'} Q(\theta) = C$$
The Hessian $C$ is negative definite if $Q$ is strictly concave. In that case, $Q$ attains its maximum at
$$\theta^* = -C^{-1}b = \theta_1 - C^{-1}(b + C\theta_1) = \theta_1 - \left[\nabla^2_{\theta\theta'} Q(\theta_1)\right]^{-1} \nabla_\theta Q(\theta_1)$$
This expression suggests a modification to the search direction of steepest ascent. For quadratic functions,
$$\delta = -\left[\nabla^2_{\theta\theta'} Q(\theta)\right]^{-1} \nabla_\theta Q(\theta)$$
A single line search would yield the optimal value of $\theta$ at a step length equal to one, no matter the starting value.
Example: Let's redo the OLS example using the quadratic method.
$$\nabla_\beta Q(\beta) = X'(y - X\beta), \qquad \nabla^2_{\beta\beta'} Q(\beta) = -X'X$$
In this case, the best step yields
$$\beta_{i+1} = \beta_i + (X'X)^{-1} X'(y - X\beta_i) = (X'X)^{-1} X'y$$
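Because the OLS objective is exactly quadratic, a single quadratic-method step reaches $(X'X)^{-1}X'y$ from any starting value; a quick check in Python (simulated data, illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
y = X @ np.array([2.0, -1.0]) + rng.normal(size=100)

beta0 = rng.normal(size=2)              # arbitrary starting value
# One quadratic-method step: beta0 + (X'X)^{-1} X'(y - X beta0).
beta1 = beta0 + np.linalg.solve(X.T @ X, X.T @ (y - X @ beta0))

beta_ols = np.linalg.solve(X.T @ X, X.T @ y)   # closed-form OLS solution
```

The $\beta_0$ terms cancel exactly, so `beta1` equals the OLS solution up to floating-point error, in contrast with the hundreds of iterations steepest ascent may need.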
Quadratic optimization methods approximate general functions with quadratic functions,
$$Q(\theta) \approx Q(\theta_1) + \nabla_\theta Q(\theta_1)'\,(\theta - \theta_1) + \tfrac{1}{2}\,(\theta - \theta_1)'\,\nabla^2_{\theta\theta'} Q(\theta_1)\,(\theta - \theta_1)$$
The maximum of the quadratic approximation serves as a further approximation of the maximum of the original function.
For the Taylor series approximation, the search direction is
$$\delta = -\left[\nabla^2_{\theta\theta'} Q(\theta_1)\right]^{-1} \nabla_\theta Q(\theta_1)$$
We will explore some examples of quadratic methods.
Newton-Raphson
The Newton-Raphson method uses the quadratic expansion for the score:
$$\sum_{i=1}^{N} s_i(\theta_{g+1}) = \sum_{i=1}^{N} s_i(\theta_g) + \left[\sum_{i=1}^{N} H_i(\theta_g)\right](\theta_{g+1} - \theta_g) + r_g$$
where $s_i(\theta)$ is the $P \times 1$ score with respect to $\theta$, $H_i(\theta)$ is the $P \times P$ Hessian, and $r_g$ is a $P \times 1$ vector of remainder terms.
In this case, ignoring the remainder term and setting the left-hand side to zero,
$$\theta_{g+1} = \theta_g - \left[\sum_{i=1}^{N} H_i(\theta_g)\right]^{-1}\left[\sum_{i=1}^{N} s_i(\theta_g)\right]$$
Idea: as we get close to the solution, $\sum_{i=1}^{N} s_i(\theta_g)$ will get close to zero, and the search direction will get smaller.
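A sketch of the Newton-Raphson update for a concrete likelihood, the logit log-likelihood, where the score is $X'(y - p)$ and the Hessian is $-\sum_i p_i(1 - p_i)\,x_i x_i'$ (the data-generating process below is illustrative, not from the lecture):

```python
import numpy as np

def newton_raphson_logit(X, y, tol=1e-8, max_iter=50):
    """Newton-Raphson for the logit MLE: theta_{g+1} = theta_g - H^{-1} s."""
    theta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ theta))
        score = X.T @ (y - p)                 # sum of the scores s_i(theta)
        W = p * (1.0 - p)
        hessian = -(X * W[:, None]).T @ X     # sum of the Hessians H_i(theta)
        step = np.linalg.solve(hessian, score)
        theta = theta - step
        if np.max(np.abs(step)) < tol:        # stopping rule on |theta_{g+1} - theta_g|
            break
    return theta

rng = np.random.default_rng(2)
X = np.column_stack([np.ones(500), rng.normal(size=500)])
theta_true = np.array([0.3, 1.0])
y = (rng.uniform(size=500) < 1.0 / (1.0 + np.exp(-X @ theta_true))).astype(float)
theta_hat = newton_raphson_logit(X, y)
```

The logit Hessian is negative definite everywhere, so the update direction is always an ascent direction here; for general objectives that check cannot be skipped.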
In general, we can use a stopping rule: the requirement that the largest absolute change $|\theta_{g+1} - \theta_g|$ be smaller than a constant.
Another stopping criterion used by these quadratic methods is
$$\left[\sum_{i=1}^{N} s_i(\theta_g)\right]'\left[\sum_{i=1}^{N} H_i(\theta_g)\right]^{-1}\left[\sum_{i=1}^{N} s_i(\theta_g)\right]$$
being less in absolute value than a small number, e.g. 0.0001.
This expression will be zero when a maximum has been reached.
We need to check that the Hessian is negative definite before claiming convergence.
We need many different starting values to make sure that, in the end, the maximum is a global one and not a local one.
Drawbacks:
Computation of the second derivative.
The sum of the Hessians may not be negative definite at a particular value of $\theta$, and we can go in the wrong direction.
We check that progress is being made by computing the difference in the values of the objective function at each iteration:
$$\sum_{i=1}^{N} Q_i(\theta_{g+1}) - \sum_{i=1}^{N} Q_i(\theta_g)$$
Since we are maximizing the objective function, we should expect the step from $\theta_g$ to $\theta_{g+1}$ to make this difference positive.
BHHH Algorithm
Use the outer product of the score in place of the Hessian:
$$\theta_{g+1} = \theta_g + \lambda\left[\sum_{i=1}^{N} s_i(\theta_g)\,s_i(\theta_g)'\right]^{-1}\left[\sum_{i=1}^{N} s_i(\theta_g)\right]$$
where $\lambda$ is the step size.
It solves the problem of estimating a second derivative.
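A BHHH sketch for a logit log-likelihood (illustrative; a fixed step $\lambda = 1$ is assumed here, whereas in practice $\lambda$ would come from a line search):

```python
import numpy as np

def bhhh_logit(X, y, lam=1.0, tol=1e-9, max_iter=300):
    """BHHH: use sum_i s_i s_i' (outer product of scores) in place of -H."""
    theta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ theta))
        s_i = X * (y - p)[:, None]        # N x P matrix of individual scores
        score = s_i.sum(axis=0)
        opg = s_i.T @ s_i                 # outer product of the gradient
        step = lam * np.linalg.solve(opg, score)
        if np.max(np.abs(step)) < tol:
            break
        theta = theta + step
    return theta

rng = np.random.default_rng(3)
X = np.column_stack([np.ones(400), rng.normal(size=400)])
y = (rng.uniform(size=400) < 1.0 / (1.0 + np.exp(-(0.2 + 0.8 * X[:, 1])))).astype(float)
theta_bhhh = bhhh_logit(X, y)
```

The outer-product matrix is positive semidefinite by construction, so (unlike the raw Hessian) it cannot point the update in a descent direction, at the cost of being only an approximation to the curvature away from the optimum.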
The Generalized Gauss-Newton Method
Another possibility to estimate the Hessian is to use the expected value of $H(z, \theta_0)$ conditional on $x$, where $z$ is partitioned into $y$ and $x$.
We call this conditional expectation $A(x, \theta_0)$. The generalized Gauss-Newton method uses the updating equation
$$\theta_{g+1} = \theta_g - \left[\sum_{i=1}^{N} A_i(\theta_g)\right]^{-1}\left[\sum_{i=1}^{N} s_i(\theta_g)\right]$$
Sometimes it is computationally convenient to concentrate out one set of parameters.
Suppose that we can partition $\theta$ into the vectors $\beta$ and $\gamma$. In this case, the first-order conditions are
$$\sum_{i=1}^{N} \nabla_\beta Q(z_i, \beta, \gamma) = 0$$
$$\sum_{i=1}^{N} \nabla_\gamma Q(z_i, \beta, \gamma) = 0$$
Suppose that the second equation can be solved for $\gamma$ as a function of $z$ and $\beta$ in the parameter set, $\gamma = g(z, \beta)$. Then
$$\sum_{i=1}^{N} \nabla_\beta Q(z_i, \beta, g(z, \beta)) = 0$$
When we plug $g(z, \beta)$ into the original objective function, we get the concentrated objective function, which depends only on $\beta$:
$$Q^{c}(z, \beta) = \sum_{i=1}^{N} Q(z_i, \beta, g(z, \beta))$$
Under some regularity conditions, the $\hat{\beta}$ that solves the maximization problem using the concentrated objective function is the same as the one for the original problem.
Having found $\hat{\beta}$, we can get $\hat{\gamma} = g(z, \hat{\beta})$.
Resampling allows us to improve the asymptotic distribution approximation.
Sometimes we know that the approximate distribution for $\hat{\theta}$ works well, but we are interested in a function of the parameters,
$$\gamma_0 = g(\theta_0)$$
One way to obtain the approximate distribution of this function is to use the Delta Method to approximate the distribution of $\hat{\gamma} = g(\hat{\theta})$. Sometimes it is hard to apply the Delta Method, or the approximations are not good.
Resampling can improve on the usual asymptotics (standard errors and confidence intervals).
Bootstrapping
There are several variants of the bootstrap. Idea: approximate the distribution of $\hat{\theta}$ without relying on first-order asymptotic theory.
Let $\{z_1, \ldots, z_N\}$ be the outcome of a random sample.
At each bootstrap iteration $b$, a random sample of size $N$ is drawn from $\{z_1, \ldots, z_N\}$ with replacement, $\{z_1^{(b)}, \ldots, z_N^{(b)}\}$.
At each iteration, we use the bootstrap sample to obtain the estimate $\hat{\theta}^{(b)}$ by solving
$$\max_{\theta \in \Theta} \sum_{i=1}^{N} Q(z_i^{(b)}, \theta)$$
We iterate the process $B$ times, obtaining $\hat{\theta}^{(b)}$, $b = 1, \ldots, B$.
Then we compute the average of the $\hat{\theta}^{(b)}$, say $\bar{\theta}$, and use this average as the estimated value of the parameter.
The sample variance
$$\frac{1}{B-1}\sum_{b=1}^{B}\left(\hat{\theta}^{(b)} - \bar{\theta}\right)^2$$
can be used to estimate the variance of $\hat{\theta}$; its square root is the bootstrap standard error.
A 95% bootstrapped confidence interval for $\theta_0$ can be obtained by finding the 2.5 and 97.5 percentiles in the list of values $\{\hat{\theta}^{(b)} : b = 1, \ldots, B\}$.
This is the nonparametric bootstrap.
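The nonparametric bootstrap loop can be sketched in Python for a deliberately simple extremum estimator, the sample mean, so the per-iteration maximization step is trivial (the data are simulated for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
z = rng.exponential(scale=2.0, size=300)   # observed random sample z_1, ..., z_N
theta_hat = z.mean()                       # the extremum estimator (sample mean)

B = 2000
boot = np.empty(B)
for b in range(B):
    z_b = rng.choice(z, size=z.size, replace=True)  # resample with replacement
    boot[b] = z_b.mean()                            # re-estimate on the bootstrap sample

se_boot = boot.std(ddof=1)                          # bootstrap standard error
ci_low, ci_high = np.percentile(boot, [2.5, 97.5])  # percentile 95% CI
```

For the sample mean the bootstrap standard error should track the analytic one, $s/\sqrt{N}$; the payoff of the method is that the same loop works unchanged for estimators with no tractable analytic standard error.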
Parametric bootstrap: assume that the distribution of $z$ is known up to the parameter $\theta_0$.
Let $f(\cdot, \theta)$ denote the parametric density.
On each bootstrap iteration, we draw a random sample of size $N$ from $f(\cdot, \hat{\theta})$, which gives $\{z_1^{(b)}, \ldots, z_N^{(b)}\}$.
We do the resampling thousands of times.
Another alternative: in a regression model, we first estimate $\hat{\beta}$ by NLS and obtain the residuals
$$\hat{u}_i = y_i - m(x_i, \hat{\beta})$$
Then we draw a bootstrap sample of the residuals $\{\hat{u}_i^{(b)} : i = 1, \ldots, N\}$ and obtain
$$y_i^{(b)} = m(x_i, \hat{\beta}) + \hat{u}_i^{(b)}$$
Using the generated data $\{(x_i, y_i^{(b)}) : i = 1, \ldots, N\}$, we compute $\hat{\beta}^{(b)}$.
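A sketch of this residual bootstrap for a linear model estimated by OLS, standing in for the general NLS case (data simulated for illustration):

```python
import numpy as np

rng = np.random.default_rng(5)
X = np.column_stack([np.ones(200), rng.normal(size=200)])
y = X @ np.array([1.0, 0.5]) + rng.normal(size=200)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)  # first-stage estimate
fitted = X @ beta_hat                         # m(x_i, beta_hat)
resid = y - fitted                            # residuals u_hat_i

B = 1000
boot = np.empty((B, 2))
for b in range(B):
    u_b = rng.choice(resid, size=resid.size, replace=True)  # resample residuals
    y_b = fitted + u_b                        # y_i^(b) = m(x_i, beta_hat) + u_i^(b)
    boot[b] = np.linalg.solve(X.T @ X, X.T @ y_b)           # re-estimate

se_slope = boot[:, 1].std(ddof=1)             # bootstrap SE of the slope
```

Resampling residuals rather than whole observations keeps the design $x_i$ fixed across bootstrap samples, which is appropriate when the regressors are treated as fixed.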
References
Amemiya, ch. 4
Wooldridge, ch. 12
Ruud, ch. 16
Newey, W. and D. McFadden (1994). "Large Sample Estimation and Hypothesis Testing," Handbook of Econometrics, Volume IV, chapter 36.