Optimization
Issues
• What is optimization?
• What real-life situations give rise to optimization problems?
• When is it easy to optimize?
• What are we trying to optimize?
• What can cause problems when we try to optimize?
• What methods can we use to optimize?
One-Dimensional Minimization
Golden section search
Brent’s method
One-Dimensional Minimization
Golden section search: successively narrowing the brackets of upper and lower bounds
Terminating condition: |x3 − x1| < ε
Start with x1 < x2 < x3, where f2 is smaller than both f1 and f3.
Iteration: choose x4 somewhere in the larger interval.
Two cases for f4:
• f4a: new bracket [x1, x2, x4]
• f4b: new bracket [x2, x4, x3]
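The narrowing loop above can be sketched in Python. This is a minimal illustration (function and variable names are my own, not from the slides), assuming f is unimodal on the starting bracket:

```python
import math

def golden_section_minimize(f, x1, x3, tol=1e-8):
    """Minimize a unimodal f on [x1, x3] by golden section search."""
    invphi = (math.sqrt(5) - 1) / 2      # 1/phi ~ 0.618
    x2 = x3 - invphi * (x3 - x1)         # lower interior probe
    x4 = x1 + invphi * (x3 - x1)         # upper interior probe
    while abs(x3 - x1) > tol:            # terminating condition |x3 - x1| < tol
        if f(x2) < f(x4):                # minimum bracketed in [x1, x4]
            x3, x4 = x4, x2
            x2 = x3 - invphi * (x3 - x1)
        else:                            # minimum bracketed in [x2, x3]
            x1, x2 = x2, x4
            x4 = x1 + invphi * (x3 - x1)
    return (x1 + x3) / 2
```

Each iteration reuses one of the two interior probes, so only one new function evaluation is needed per step.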
Initial bracketing…
min_{x ∈ R} f(x)
Lower bound a, upper bound b, initial estimate x, with f(a) > f(x) < f(b). This condition guarantees that a minimum is
contained somewhere within the interval. On each iteration a new point x' is selected using one
of the available algorithms. If the new point is a better estimate of the minimum,
i.e. where f(x') < f(x), then the current estimate of the minimum x is updated.
The new point also allows the size of the bounded interval to be reduced, by choosing the most compact set of points which satisfies the constraint f(a) > f(x) < f(b).
The interval is reduced until it encloses the true minimum to a desired tolerance.
This provides a best estimate of the location of the minimum and a rigorous error estimate.
From GSL
Golden Section Search
Golden ratio: φ = (1 + √5) / 2 ≈ 1.618
The probe points divide the bracket so that the ratio of the larger segment to the whole interval is 1/φ ≈ 0.618; this proportion is preserved from one iteration to the next.
Guaranteed linear convergence:[x1,x3]/[x1,x4] = 1.618
[GSL] Choosing the golden section as the bisection ratio can be shown to provide the fastest convergence for this type of algorithm.
Golden Section (reference)
Fibonacci Search (ref)
lim_{k→∞} F_{k+1} / F_k = (1 + √5) / 2
F_i: 0, 1, 1, 2, 3, 5, 8, 13, …
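Fibonacci search places its probes using ratios of consecutive Fibonacci numbers instead of the fixed golden ratio. A minimal sketch (my own naming, assuming a unimodal f and a fixed budget of n reduction steps):

```python
def fib_search(f, a, b, n=30):
    """Fibonacci search: n interval reductions on unimodal f over [a, b]."""
    F = [0, 1]
    while len(F) <= n + 2:               # need F[0] .. F[n+2]
        F.append(F[-1] + F[-2])
    # interior probes split [a, b] at Fibonacci ratios
    x1 = a + F[n] / F[n + 2] * (b - a)
    x2 = a + F[n + 1] / F[n + 2] * (b - a)
    f1, f2 = f(x1), f(x2)
    for k in range(n, 0, -1):
        if f1 < f2:                      # minimum in [a, x2]
            b, x2, f2 = x2, x1, f1
            x1 = a + F[k - 1] / F[k + 1] * (b - a)
            f1 = f(x1)
        else:                            # minimum in [x1, b]
            a, x1, f1 = x1, x2, f2
            x2 = a + F[k] / F[k + 1] * (b - a)
            f2 = f(x2)
    return (a + b) / 2
```

As k grows, F_{k+1}/F_k approaches φ, so the behavior converges to golden section search.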
Related…
Parabolic Interpolation (Brent)
Brent Details (From GSL)
The minimum of the parabola is taken as a guess for the minimum.
If it lies within the bounds of the current interval then the interpolating point is accepted, and used to generate a smaller interval.
If the interpolating point is not accepted then the algorithm falls back to an ordinary golden section step.
The full details of Brent's method include some additional checks to improve convergence.
Brent (details)
The abscissa x that is the minimum of a parabola through three points (a, f(a)), (b, f(b)), (c, f(c)):
x = b − ½ · [ (b−a)² (f(b)−f(c)) − (b−c)² (f(b)−f(a)) ] / [ (b−a) (f(b)−f(c)) − (b−c) (f(b)−f(a)) ]
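The parabolic-interpolation step can be written directly from the three bracketing points. A small sketch (hypothetical helper name, not from GSL or the slides):

```python
def parabola_min_abscissa(a, fa, b, fb, c, fc):
    """Abscissa of the vertex of the parabola through (a,fa), (b,fb), (c,fc)."""
    num = (b - a)**2 * (fb - fc) - (b - c)**2 * (fb - fa)
    den = (b - a) * (fb - fc) - (b - c) * (fb - fa)
    return b - 0.5 * num / den           # caller must guard against den == 0
```

For points sampled from an exact parabola this returns its vertex; in Brent's method the result is accepted only if it falls inside the current bracket.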
Multi-Dimensional Minimization
Gradient Descent
Conjugate Gradient
Gradient and Hessian
f: R^n → R, the objective function; assume f is of class C².
Gradient of f: ∇f = (∂f/∂x1, …, ∂f/∂xn)^T
Hessian of f: H_ij = ∂²f / ∂x_i ∂x_j
Optimality: ∇f(x*) = 0 with positive semi-definite Hessian, from the Taylor expansion of f about x*.
For one-dimensional f(x): f′(x*) = 0 and f″(x*) ≥ 0.
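The first-order optimality condition ∇f(x*) = 0 can be checked numerically with finite differences. A minimal sketch (my own helper, using a central difference of step h):

```python
def grad(f, x, h=1e-6):
    """Central-difference approximation to the gradient of f: R^n -> R at x."""
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        g.append((f(xp) - f(xm)) / (2 * h))
    return g

# Example objective with minimizer at (1, -2): gradient vanishes there.
f = lambda v: (v[0] - 1)**2 + 3 * (v[1] + 2)**2
g_at_min = grad(f, [1.0, -2.0])
```

Away from the minimizer the gradient is nonzero, e.g. grad(f, [2.0, -2.0]) has first component ≈ 2.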
Multi-Dimensional Optimization
f: R^n → R
Critical points: ∇f(x) = 0
Higher-dimensional root finding is no easier than minimization.
Quasi-Newton Method
The various quasi-Newton methods (DFP, BFGS, Broyden) differ in how they update the approximation B.
Taylor series of f(x) around xk:
f(xk + Δx) ≈ f(xk) + ∇f(xk)^T Δx + ½ Δx^T B Δx
B: an approximation to the Hessian matrix
The gradient of this approximation: ∇f(xk) + B Δx
Setting this gradient to zero provides the Newton step: Δx = −B⁻¹ ∇f(xk)
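For a 2×2 approximation B the Newton step can be computed in closed form. A minimal sketch (hypothetical helper, solving B Δx = −∇f by Cramer's rule):

```python
def newton_step(grad, B):
    """Newton step dx solving B dx = -grad, for a 2x2 matrix B."""
    (a, b), (c, d) = B
    det = a * d - b * c                  # assumed nonzero (B invertible)
    gx, gy = grad
    # Cramer's rule applied to B dx = (-gx, -gy)
    return ((-gx * d + gy * b) / det, (-gy * a + gx * c) / det)

# For f(x, y) = x^2 + y^2, exact Hessian B = 2I; step from (3, 4) lands at 0.
dx = newton_step((6.0, 8.0), ((2.0, 0.0), (0.0, 2.0)))
```

When B is the exact Hessian of a quadratic, a single step reaches the minimizer; quasi-Newton methods approach this as B improves.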
Gradient Descent
Are successive directions always orthogonal? Yes: with exact line search, each new gradient is orthogonal to the previous search direction.
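This orthogonality can be verified numerically on a quadratic, where the exact line-search step has the closed form α = gᵀg / gᵀAg. A sketch with a made-up 2×2 example (names and data are mine, not from the slides):

```python
def descent_steps(A, b, x, n):
    """Steepest descent with exact line search on f(x) = 0.5 x^T A x - b^T x.

    Returns the final iterate and the list of gradients used as directions.
    """
    dirs = []
    for _ in range(n):
        g = [A[0][0]*x[0] + A[0][1]*x[1] - b[0],     # gradient A x - b
             A[1][0]*x[0] + A[1][1]*x[1] - b[1]]
        Ag = [A[0][0]*g[0] + A[0][1]*g[1],
              A[1][0]*g[0] + A[1][1]*g[1]]
        alpha = (g[0]*g[0] + g[1]*g[1]) / (g[0]*Ag[0] + g[1]*Ag[1])
        x = [x[0] - alpha*g[0], x[1] - alpha*g[1]]   # exact line-search step
        dirs.append(g)
    return x, dirs

x_final, dirs = descent_steps([[3.0, 1.0], [1.0, 2.0]], [1.0, 1.0],
                              [4.0, 4.0], 2)
```

The dot product of consecutive gradients is zero up to rounding, which is exactly the zig-zag pattern of steepest descent.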
Example: minimize a quadratic; with exact line search the iterates zig-zag toward the minimum.
Gradient is perpendicular to level curves and surfaces
(proof)
Weakness of Gradient Descent
Narrow valley: the iterates zig-zag and progress is slow.
Any function f(x) can be locally approximated by a quadratic:
f(x) ≈ ½ x^T A x − b^T x + c, where A is the local Hessian.
The conjugate gradient method works well on this kind of problem.
Conjugate Gradient
An iterative method for solving linear systems Ax=b, where A is symmetric and positive definite
Guaranteed to converge in at most n steps in exact arithmetic, where n is the system size
Symmetric A is positive definite if it has (any of these equivalent conditions):
1. All n eigenvalues are positive
2. All n upper-left determinants are positive
3. All n pivots are positive
4. x^T A x > 0 for all x ≠ 0
Details (from wikipedia)
Two nonzero vectors u and v are conjugate w.r.t. A if u^T A v = 0.
{pk} are n mutually conjugate directions. {pk} form a basis of Rn.
x*, the solution to Ax = b, can be expressed in this basis:
x* = Σ_{k=1}^{n} α_k p_k
Multiplying by A and taking the inner product with p_k gives α_k = p_k^T b / (p_k^T A p_k), since the cross terms vanish by conjugacy.
Therefore: find the p_k's, then solve for the α_k's.
The Iterative Method
Equivalent problem: find the minimum of the quadratic function
f(x) = ½ x^T A x − b^T x
Take the first basis vector p1 to be the negative gradient of f at x = x0; the other basis vectors will be conjugate to the gradient.
r_k: the residual at the kth step, r_k = b − A x_k
Note that r_k is the negative gradient of f at x = x_k.
The Algorithm
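The iteration described above can be sketched in pure Python; this is a minimal dense version for a small symmetric positive definite A (my own naming, not a library implementation):

```python
def conjugate_gradient(A, b, x, tol=1e-10):
    """Solve A x = b (A symmetric positive definite) by conjugate gradients."""
    n = len(b)
    mv = lambda v: [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
    r = [b[i] - Ax_i for i, Ax_i in enumerate(mv(x))]   # residual = -gradient
    p = list(r)                                         # first direction
    rs = sum(ri * ri for ri in r)
    for _ in range(n):                                  # at most n steps
        Ap = mv(p)
        alpha = rs / sum(p[i] * Ap[i] for i in range(n))
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        rs_new = sum(ri * ri for ri in r)
        if rs_new < tol:
            break
        # next direction: residual made conjugate to the previous direction
        p = [r[i] + (rs_new / rs) * p[i] for i in range(n)]
        rs = rs_new
    return x

sol = conjugate_gradient([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0], [0.0, 0.0])
```

For this 2×2 system the exact solution is (1/11, 7/11), reached in two steps as the theory predicts.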
Example
Minimize the quadratic f(x) = ½ x^T A x − b^T x for a symmetric 2×2 matrix A and vector b.
Stationary point at [-1/26, -5/26]
Solving Linear Equations
The optimality condition suggests that CG can be used to solve linear equations.
CG is only applicable for symmetric positive definite A. For an arbitrary linear system, solve the normal equations instead:
A^T A x = A^T b
since A^T A is symmetric and positive semi-definite for any A. But κ(A^T A) = κ(A)², so convergence is slower and accuracy worse.
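The κ(AᵀA) = κ(A)² penalty is easiest to see for a diagonal matrix, where the singular values are just the diagonal entries. A tiny illustration (my own example values):

```python
# For diagonal A, kappa(A) = sigma_max / sigma_min over the diagonal entries;
# forming A^T A squares each entry, so the condition number squares too.
A_diag = [10.0, 1.0]                           # diagonal entries of A
kappa_A = max(A_diag) / min(A_diag)            # condition number of A
AtA_diag = [d * d for d in A_diag]             # diagonal of A^T A
kappa_AtA = max(AtA_diag) / min(AtA_diag)      # condition number of A^T A
```

Here κ(A) = 10 while κ(AᵀA) = 100, which is why the normal-equations route converges more slowly.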
BiCG (biconjugate gradient) is the approach to use for general A
Multidimensional Minimizer [GSL]
Conjugate gradient: Fletcher-Reeves, Polak-Ribiere
Quasi-Newton: Broyden-Fletcher-Goldfarb-Shanno (BFGS); utilizes a 2nd-order approximation
Steepest descent: inefficient (for demonstration purposes)
Simplex algorithm (Nelder and Mead): derivative-free
GSL Example
Objective function: paraboloid
f(x, y) = p2 (x − p0)² + p3 (y − p1)² + p4
= 10 (x − 1)² + 20 (y − 2)² + 30
Starting from (5, 7)
Conjugate gradient: converges in 12 iterations
Steepest descent: converges in 158 iterations
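As a sanity check on this example, the paraboloid's minimum at (1, 2) can be recovered even with plain fixed-step gradient descent. This is a toy sketch (not GSL's implementation; step size and iteration count are my own choices):

```python
def gd_paraboloid(x, y, lr=0.02, iters=500):
    """Fixed-step gradient descent on f(x, y) = 10(x-1)^2 + 20(y-2)^2 + 30."""
    for _ in range(iters):
        x -= lr * 20 * (x - 1)   # df/dx = 20 (x - 1)
        y -= lr * 40 * (y - 2)   # df/dy = 40 (y - 2)
    return x, y

x_min, y_min = gd_paraboloid(5.0, 7.0)   # same start (5, 7) as the GSL example
```

With exact line search (as in GSL's steepest descent) or conjugate directions, far fewer iterations are needed, as the counts above show.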
[Solutions in Numerical Recipes]
Sec. 2.7 linbcg (biconjugate gradient): general A; references A implicitly through atimes
Sec. 10.6 frprmn (minimization); model test problem: spacetime, …