Posted: 21-Dec-2015
Math for CS Tutorial 5-6
Tutorial 5-6
Function Optimization.
Line Search.
Taylor Series for $\mathbb{R}^n$
Steepest Descent
Newton’s Method
Conjugate Gradients Method
Line search
Line search runs as follows. Let
$$\varphi(\alpha) = f(\mathbf{x}_k + \alpha\,\mathbf{p}_k)$$
be the scalar function of α representing the possible values of f(x) in the direction of p_k. Let (a, b, c) be three values of α, with a < b < c, such that the point α' of the (constrained) minimum lies between a and c: a < α' < c.
Then the following algorithm approaches α' arbitrarily closely:
If b − a > c − b: u = (a + b)/2; if φ(u) < φ(b), (a, b, c) = (a, u, b), else (a, b, c) = (u, b, c).
Otherwise (b − a ≤ c − b): u = (b + c)/2; if φ(u) < φ(b), (a, b, c) = (b, u, c), else (a, b, c) = (a, b, u).
[Figure: the bracketing points a, b, c and the trial point u on the α axis.]
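The bracket-shrinking rule above can be sketched in Python as follows. This is a minimal sketch, not the tutorial's own code: it assumes φ is unimodal on [a, c] and that the starting triple satisfies a < b < c with φ(b) < φ(a) and φ(b) < φ(c). The name `bracket_line_search` and the tolerance `tol` are illustrative choices. The usage lines apply it to φ(α) = f(α p) for the elliptic example f(x) = (x1 − 1)² + 4(x2 − 2)² used later, with p = −∇f(0) = (2, 16), whose exact minimizer is α = 260/2056.

```python
from typing import Callable


def bracket_line_search(phi: Callable[[float], float],
                        a: float, b: float, c: float,
                        tol: float = 1e-8) -> float:
    """Shrink a bracket a < b < c (phi(b) below phi(a), phi(c)) until c - a <= tol."""
    if not (a < b < c and phi(b) < phi(a) and phi(b) < phi(c)):
        raise ValueError("need a < b < c with phi(b) < phi(a) and phi(b) < phi(c)")
    fb = phi(b)
    while c - a > tol:
        if b - a > c - b:
            # Probe the midpoint of the larger left interval [a, b].
            u = 0.5 * (a + b)
            fu = phi(u)
            if fu < fb:
                a, b, c, fb = a, u, b, fu   # new bracket (a, u, b)
            else:
                a = u                       # new bracket (u, b, c)
        else:
            # Probe the midpoint of the larger (or equal) right interval [b, c].
            u = 0.5 * (b + c)
            fu = phi(u)
            if fu < fb:
                a, b, c, fb = b, u, c, fu   # new bracket (b, u, c)
            else:
                c = u                       # new bracket (a, b, u)
    return b


def phi(t: float) -> float:
    """f(t * p) for f(x) = (x1 - 1)^2 + 4 (x2 - 2)^2 and p = (2, 16)."""
    return (2 * t - 1) ** 2 + 4 * (16 * t - 2) ** 2


alpha = bracket_line_search(phi, 0.0, 0.1, 0.5)
print(alpha)  # close to 260/2056, about 0.12646
```

Each step probes the larger of the two sub-intervals, so the bracket keeps shrinking while the point with the lowest φ stays in the middle.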
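Taylor Series
The Taylor series for f(x) is
$$f(x+h) = f(x) + \frac{h}{1!}f'(x) + \frac{h^2}{2!}f''(x) + \cdots + \frac{h^n}{n!}f^{(n)}(x) + R_n,$$
where
$$R_n = \frac{h^{n+1}}{(n+1)!}f^{(n+1)}(\xi)$$
for some ξ between x and x + h.
For a function of m variables, the expression is
$$f(x+h,\, y+k,\, \ldots,\, t+l) = f(x, y, \ldots, t) + \sum_{i=1}^{n}\frac{1}{i!}\left(h\frac{\partial}{\partial x} + k\frac{\partial}{\partial y} + \cdots + l\frac{\partial}{\partial t}\right)^{i} f(x, y, \ldots, t) + R_n.$$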
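As a quick numeric check of the one-dimensional formula, the sketch below sums the first n + 1 terms for f = exp (chosen here only because every derivative of exp is exp itself) and compares the actual error with the bound on R_n obtained by taking the largest value of f^(n+1) = exp between x and x + h. The point x = 0.5 and step h = 0.3 are arbitrary illustrative values.

```python
import math


def taylor_exp(x: float, h: float, n: int) -> float:
    """n-th order Taylor polynomial of exp about x, evaluated at x + h.

    Every derivative of exp is exp, so the i-th term is exp(x) * h**i / i!.
    """
    return sum(math.exp(x) * h ** i / math.factorial(i) for i in range(n + 1))


def remainder_bound(x: float, h: float, n: int) -> float:
    """Bound on |R_n| = exp(xi) |h|^(n+1) / (n+1)!, with xi between x and x + h."""
    return math.exp(max(x, x + h)) * abs(h) ** (n + 1) / math.factorial(n + 1)


x, h = 0.5, 0.3
for n in range(1, 6):
    error = abs(math.exp(x + h) - taylor_exp(x, h, n))
    print(f"n={n}: error={error:.2e}, bound={remainder_bound(x, h, n):.2e}")
```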
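2D Taylor Series: Example
Consider the elliptic function $f(x_1, x_2) = (x_1-1)^2 + 4(x_2-2)^2$ and find the first three terms of its Taylor expansion. About the origin, these are
$$f(\mathbf{x}) = f(\mathbf{0}) + \frac{1}{1!}\nabla f(\mathbf{0})^T\mathbf{x} + \frac{1}{2!}\mathbf{x}^T\begin{pmatrix}\dfrac{\partial^2 f}{\partial x_1^2} & \dfrac{\partial^2 f}{\partial x_1\,\partial x_2}\\[2mm] \dfrac{\partial^2 f}{\partial x_2\,\partial x_1} & \dfrac{\partial^2 f}{\partial x_2^2}\end{pmatrix}\mathbf{x} + R_3,$$
where $\nabla f = \left(\partial f/\partial x_1,\ \partial f/\partial x_2\right)^T$ and all derivatives are evaluated at $\mathbf{0}$.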
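A minimal sketch of the three-term model at the origin follows. It estimates the gradient and Hessian by central finite differences, a numerical shortcut assumed here rather than something the slides prescribe. The test function is f(x1, x2) = (x1 − 1)² + 4(x2 − 2)², the elliptic function of the following slides; because it is quadratic, R_3 = 0 and the model reproduces f up to rounding.

```python
def f(x1: float, x2: float) -> float:
    return (x1 - 1) ** 2 + 4 * (x2 - 2) ** 2


def gradient_and_hessian(func, x1, x2, h=1e-4):
    """Central-difference gradient g and Hessian Q of func at (x1, x2)."""
    g = [(func(x1 + h, x2) - func(x1 - h, x2)) / (2 * h),
         (func(x1, x2 + h) - func(x1, x2 - h)) / (2 * h)]
    f0 = func(x1, x2)
    q11 = (func(x1 + h, x2) - 2 * f0 + func(x1 - h, x2)) / h ** 2
    q22 = (func(x1, x2 + h) - 2 * f0 + func(x1, x2 - h)) / h ** 2
    q12 = (func(x1 + h, x2 + h) - func(x1 + h, x2 - h)
           - func(x1 - h, x2 + h) + func(x1 - h, x2 - h)) / (4 * h ** 2)
    return g, [[q11, q12], [q12, q22]]


def taylor_model(f0, g, Q, d1, d2):
    """Three-term model f(0) + g^T d + (1/2) d^T Q d for a displacement d = (d1, d2)."""
    quad = d1 * (Q[0][0] * d1 + Q[0][1] * d2) + d2 * (Q[1][0] * d1 + Q[1][1] * d2)
    return f0 + g[0] * d1 + g[1] * d2 + 0.5 * quad


g, Q = gradient_and_hessian(f, 0.0, 0.0)
print(g)  # approximately [-2, -16]
print(Q)  # approximately [[2, 0], [0, 8]]
print(taylor_model(f(0, 0), g, Q, 0.7, -1.2), f(0.7, -1.2))  # nearly equal
```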
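Steepest Descent
Consider the elliptic function $f(x_1, x_2) = (x_1-1)^2 + 4(x_2-2)^2$ and find the first three terms of its Taylor expansion about the origin. With the displacement written as (h, v):
$$f(h, v) = 17 + (-2,\ -16)\begin{pmatrix}h\\ v\end{pmatrix} + \frac{1}{2}(h,\ v)\begin{pmatrix}2 & 0\\ 0 & 8\end{pmatrix}\begin{pmatrix}h\\ v\end{pmatrix}$$
The steepest-descent direction at the origin is therefore $-f'(0) = (2,\ 16)^T$.
[Figure: isolines of f with the steepest-descent direction −f′(0) drawn from the origin.]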
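To see steepest descent at work on this function, the sketch below (an illustration assumed here, not taken from the slides) starts at the origin and uses the exact step α = gᵀg / (gᵀQg). That step is valid because f is quadratic with the constant Hessian Q = diag(2, 8). The iterates zig-zag toward the minimum (1, 2) instead of reaching it in one step.

```python
def grad(x):
    """Gradient of f(x1, x2) = (x1 - 1)^2 + 4 (x2 - 2)^2."""
    return (2.0 * (x[0] - 1.0), 8.0 * (x[1] - 2.0))


HESS_DIAG = (2.0, 8.0)  # the Hessian of f is diag(2, 8)


def steepest_descent(x, tol=1e-10, max_iter=1000):
    """Steepest descent with exact line search; returns (x, iterations)."""
    for k in range(max_iter):
        g = grad(x)
        gg = g[0] ** 2 + g[1] ** 2
        if gg < tol ** 2:
            return x, k
        gQg = HESS_DIAG[0] * g[0] ** 2 + HESS_DIAG[1] * g[1] ** 2
        alpha = gg / gQg  # minimizes f(x - alpha * g) over alpha
        x = (x[0] - alpha * g[0], x[1] - alpha * g[1])
    return x, max_iter


x_min, iters = steepest_descent((0.0, 0.0))
print(x_min, iters)  # approaches (1, 2) through a zig-zag of steps
```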
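Newton’s Method
In Lecture 5 we saw that the steepest descent method can suffer from slow convergence. Newton’s method fixes this problem in cases where the function f(x) near x* can be approximated by a paraboloid:
$$f(\mathbf{x}_k + \Delta) \approx f(\mathbf{x}_k) + \mathbf{g}_k^T\Delta + \frac{1}{2}\Delta^T Q_k\,\Delta \qquad (1)$$
where
$$\mathbf{g}_k = \nabla f(\mathbf{x}_k)$$
and
$$Q_k = \left.\frac{\partial^2 f}{\partial\mathbf{x}\,\partial\mathbf{x}^T}\right|_{\mathbf{x}_k}.$$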
Newton’s Method 2
Here g_k is the gradient and Q_k is the Hessian of the function f, evaluated at x_k. They appear in the 2nd and 3rd terms of the Taylor expansion of f about x_k. At the minimum of the paraboloid (1), its gradient with respect to Δ must vanish:
$$\mathbf{g}_k + Q_k\,\Delta = 0, \quad\text{i.e.}\quad \Delta = -Q_k^{-1}\mathbf{g}_k \qquad (2)$$
The solution of this equation gives the step direction and the step size toward the minimum of the paraboloid (1), which is, presumably, close to the minimum of f(x). The minimization algorithm in which x_{k+1} = y(x_k) = x_k + Δ, with Δ defined by (2), is called Newton’s method.
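The iteration x_{k+1} = x_k + Δ with Q_k Δ = −g_k can be sketched for two variables as below. This is a minimal sketch under stated assumptions: the 2×2 system is solved by Cramer's rule, there is no safeguard for an indefinite or singular Hessian, and the test function f(x1, x2) = e^{x1} − 2x1 + (x2 − 1)² is a hypothetical example (minimum at (ln 2, 1)) chosen because it is not a paraboloid, so several steps are needed.

```python
import math


def newton_2d(grad, hess, x, tol=1e-12, max_iter=50):
    """Newton's method in two variables: solve Q_k delta = -g_k, then x <- x + delta.

    No safeguard: assumes the Hessian stays positive definite along the way.
    """
    for k in range(max_iter):
        g1, g2 = grad(x)
        (q11, q12), (q21, q22) = hess(x)
        det = q11 * q22 - q12 * q21
        # Cramer's rule for [[q11, q12], [q21, q22]] @ delta = (-g1, -g2)
        delta = ((-g1 * q22 + q12 * g2) / det, (-q11 * g2 + q21 * g1) / det)
        x = (x[0] + delta[0], x[1] + delta[1])
        if math.hypot(*delta) < tol:
            return x, k + 1
    return x, max_iter


def grad(x):
    """Gradient of the hypothetical f(x1, x2) = exp(x1) - 2 x1 + (x2 - 1)^2."""
    return (math.exp(x[0]) - 2.0, 2.0 * (x[1] - 1.0))


def hess(x):
    """Hessian of the same hypothetical function."""
    return ((math.exp(x[0]), 0.0), (0.0, 2.0))


x_star, steps = newton_2d(grad, hess, (0.0, 0.0))
print(x_star, steps)  # x_star approaches (ln 2, 1)
```

Near the minimum the error roughly squares at each step, which is the fast local convergence that motivates the method.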
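Newton’s Method: Example
Consider the same elliptic function $f(x_1, x_2) = (x_1-1)^2 + 4(x_2-2)^2$ and find the first step of Newton’s method from $\mathbf{x}_0 = (0,\ 0)^T$:
$$\mathbf{g}_0 = (-2,\ -16)^T, \qquad Q = \begin{pmatrix}2 & 0\\ 0 & 8\end{pmatrix},$$
$$\mathbf{x}_1 = \mathbf{x}_0 - Q^{-1}\mathbf{g}_0 = (1,\ 2)^T.$$
Because f is itself a paraboloid, this single step reaches the minimum.
[Figure: isolines of f with the steepest-descent direction −f′(0) drawn from the origin.]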
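The arithmetic of this example can be checked exactly with Python's `fractions` module (an illustrative choice that avoids rounding). Since the Hessian is diagonal, the system Q Δ = −g_0 decouples into two scalar equations.

```python
from fractions import Fraction


def f(x1, x2):
    return (x1 - 1) ** 2 + 4 * (x2 - 2) ** 2


x0 = (Fraction(0), Fraction(0))
g0 = (2 * (x0[0] - 1), 8 * (x0[1] - 2))                        # gradient at x0: (-2, -16)
Q = ((Fraction(2), Fraction(0)), (Fraction(0), Fraction(8)))   # constant Hessian

# Q is diagonal, so Q @ delta = -g0 is solved component by component.
delta = (-g0[0] / Q[0][0], -g0[1] / Q[1][1])
x1 = (x0[0] + delta[0], x0[1] + delta[1])                      # (1, 2)
print(x1, f(*x1))
```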
Conjugate Gradient
Suppose that we want to minimize the quadratic function
$$f(\mathbf{x}) = c - \mathbf{a}^T\mathbf{x} + \frac{1}{2}\mathbf{x}^T Q\,\mathbf{x}$$
where Q is a symmetric, positive definite matrix, and x has n components. As we saw in the discussion of steepest descent, the minimum x* is the solution to the linear system
$$Q\mathbf{x} = \mathbf{a}.$$
Solving this system explicitly requires about O(n^3) operations and O(n^2) memory, which is very expensive for large n.
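For contrast with the iterative method that follows, here is a sketch of the direct approach, assuming the quadratic is written as f(x) = c − aᵀx + ½xᵀQx so that the system is Qx = a. Gaussian elimination stores the whole n×n matrix (the O(n²) memory), and its three nested loops are where the O(n³) operations come from. No pivoting is done, which is acceptable here only because Q is assumed symmetric positive definite; the 3×3 matrix is a hypothetical example.

```python
def solve_spd(Q, a):
    """Solve Q x = a by Gaussian elimination without pivoting (Q assumed SPD)."""
    n = len(a)
    M = [row[:] + [a_i] for row, a_i in zip(Q, a)]  # augmented matrix: O(n^2) memory
    for k in range(n):                              # elimination: O(n^3) operations
        for i in range(k + 1, n):
            factor = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= factor * M[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):                    # back substitution: O(n^2)
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


Q = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
a = [1.0, 2.0, 3.0]
x_star = solve_spd(Q, a)
# The residual Q x* - a is the gradient of f at x*, which should vanish.
residual = [sum(Q[i][j] * x_star[j] for j in range(3)) - a[i] for i in range(3)]
print(x_star, residual)
```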
Conjugate Gradients 2
We now consider an alternative solution method that does not need Q, but only the gradient of f evaluated at n different points x_1, . . ., x_n.
[Figure: search paths labelled "Conjugate Gradient" and "Gradient".]
Conjugate Gradients 3
Consider the case n = 3, in which the variable x in f(x) is a three-dimensional vector x = (x_1, x_2, x_3)^T. Then the quadratic function f(x) is constant over ellipsoids, called isosurfaces, centered at the minimum x*. How can we start from a point x_0 on one of these ellipsoids and reach x* by a finite sequence of one-dimensional searches? In steepest descent, each new search direction is orthogonal to the previous one, and for poorly conditioned Hessians these orthogonal directions lead to many small steps, that is, to slow convergence.
Conjugate Gradients: Spherical Case
When the ellipsoids are spheres, on the other hand, convergence is much faster: the first step takes us from x_0 to x_1, and the line between x_0 and x_1 is tangent to an isosurface at x_1. The next step, in the direction of the gradient, takes us to x* right away. Suppose, however, that we cannot afford to compute this special direction p_1 orthogonal to p_0, but that we can only compute some direction p_1 orthogonal to p_0 (there is an (n−1)-dimensional space of such directions!) and reach the minimum of f(x) in this direction.
In that case n steps will take us to the center x* of the sphere, since the coordinate of the minimum along each of the n directions is independent of the others.
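The claim that any n orthogonal directions suffice for spherical isosurfaces can be checked numerically. In this sketch (an illustration, not from the slides) f(x) = ½ S‖x − x*‖², with a hypothetical scale S and minimum x*. The n orthonormal directions come from Gram-Schmidt applied to arbitrary random vectors, and the exact step along p is α = −gᵀp / (S pᵀp).

```python
import random


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def gram_schmidt(vectors):
    """Orthonormalize a list of linearly independent vectors."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            c = dot(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = dot(w, w) ** 0.5
        basis.append([wi / norm for wi in w])
    return basis


S = 5.0                    # hypothetical curvature: Q = S * I (spherical isosurfaces)
X_STAR = [1.0, -2.0, 0.5]  # hypothetical minimum


def grad(x):
    """Gradient of f(x) = S/2 * ||x - X_STAR||^2."""
    return [S * (xi - ci) for xi, ci in zip(x, X_STAR)]


random.seed(0)
n = len(X_STAR)
directions = gram_schmidt([[random.uniform(-1, 1) for _ in range(n)] for _ in range(n)])

x = [0.0] * n
for p in directions:
    # Exact line minimum along p when Q = S * I.
    alpha = -dot(grad(x), p) / (S * dot(p, p))
    x = [xi + alpha * pi for xi, pi in zip(x, p)]
print(x)  # equal to X_STAR up to rounding after n line searches
```

Each line search zeroes the component of x − x* along one direction, and later steps along orthogonal directions never disturb it.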
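Conjugate Gradients: Elliptical Case
Any set of orthogonal directions, with a line search in each direction, will lead to the minimum for spherical isosurfaces. Given an arbitrary set of ellipsoidal isosurfaces, there is a one-to-one mapping with a spherical system: if Q = UEU^T is the SVD of the symmetric, positive definite matrix Q, then we can write
$$f(\mathbf{x}) = f(\mathbf{x}^*) + \frac{1}{2}(\mathbf{x}-\mathbf{x}^*)^T Q\,(\mathbf{x}-\mathbf{x}^*) = f(\mathbf{x}^*) + \frac{1}{2}\mathbf{y}^T\mathbf{y} \qquad (4)$$
where
$$\mathbf{y} = E^{1/2}U^T(\mathbf{x}-\mathbf{x}^*). \qquad (5)$$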
Elliptical Case 2
Consequently, there must be a condition for the original problem (in terms of Q) that is equivalent to orthogonality for the spherical problem. If two directions q_i and q_j are orthogonal in the spherical context, that is, if
$$\mathbf{q}_i^T\mathbf{q}_j = 0,$$
what does this translate into in terms of the directions p_i and p_j for the ellipsoidal problem? From (5) we have
$$\mathbf{q}_i = E^{1/2}U^T\mathbf{p}_i, \qquad \mathbf{q}_j = E^{1/2}U^T\mathbf{p}_j. \qquad (6)$$
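Elliptical Case 3
Consequently,
$$\mathbf{q}_i^T\mathbf{q}_j = \mathbf{p}_i^T U E^{1/2} E^{1/2} U^T\mathbf{p}_j = \mathbf{p}_i^T Q\,\mathbf{p}_j = 0. \qquad (7)$$
This condition is called Q-conjugacy, or Q-orthogonality: if equation (7) holds, then p_i and p_j are said to be Q-conjugate or Q-orthogonal to each other, or simply "conjugate".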
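A small numerical illustration of why conjugacy matters follows; the matrix Q = [[4, 1], [1, 3]], the vector a and the directions are hypothetical examples. The coordinate axes are orthogonal but not Q-conjugate, so two exact line searches along them miss the minimum, whereas p_0 = (1, 0) and p_1 = (1, −4), which satisfy p_0ᵀ Q p_1 = 0, reach it in two steps. The sketch assumes f(x) = c − aᵀx + ½xᵀQx, whose gradient is g = Qx − a and whose exact step along p is α = −gᵀp / (pᵀQp).

```python
def matvec(Q, v):
    """Dense matrix-vector product Q v."""
    return [sum(qij * vj for qij, vj in zip(row, v)) for row in Q]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def line_searches(Q, a, x, directions):
    """Exact line minimization of c - a^T x + x^T Q x / 2 along each direction in turn."""
    for p in directions:
        g = [qx - ai for qx, ai in zip(matvec(Q, x), a)]  # gradient Q x - a
        alpha = -dot(g, p) / dot(p, matvec(Q, p))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
    return x


Q = [[4.0, 1.0], [1.0, 3.0]]
a = [1.0, 2.0]
x_star = [1.0 / 11.0, 7.0 / 11.0]  # solution of Q x = a

conjugate = [[1.0, 0.0], [1.0, -4.0]]
orthogonal = [[1.0, 0.0], [0.0, 1.0]]
x_conj = line_searches(Q, a, [0.0, 0.0], conjugate)
x_orth = line_searches(Q, a, [0.0, 0.0], orthogonal)
print(dot(conjugate[0], matvec(Q, conjugate[1])))  # 0.0: the pair is Q-conjugate
print(x_conj)  # reaches x_star
print(x_orth)  # does not
```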
Elliptical Case 4
In summary, if we can find n directions p_0, . . ., p_{n−1} that are mutually conjugate, i.e., satisfy (7), and if we do a line minimization along each direction p_k, we reach the minimum in at most n steps. Of course, we cannot use the transformation (5) in the algorithm, because E and especially U^T are too large. So we need a method for generating n conjugate directions without using either Q or its SVD.
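Hestenes Stiefel Procedure
$$\mathbf{x}_0 \ \text{arbitrary}, \qquad \mathbf{p}_0 = -\mathbf{g}_0,$$
$$\mathbf{x}_{k+1} = \mathbf{x}_k + \alpha_k\mathbf{p}_k, \qquad \mathbf{p}_{k+1} = -\mathbf{g}_{k+1} + \gamma_k\mathbf{p}_k,$$
where g_k = ∇f(x_k), α_k is the step to the line minimum of f along p_k, and
$$\alpha_k = -\frac{\mathbf{g}_k^T\mathbf{p}_k}{\mathbf{p}_k^T Q\,\mathbf{p}_k}, \qquad \gamma_k = \frac{\mathbf{g}_{k+1}^T Q\,\mathbf{p}_k}{\mathbf{p}_k^T Q\,\mathbf{p}_k}.$$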
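The procedure can be sketched as follows, assuming the quadratic is written f(x) = c − aᵀx + ½xᵀQx, so that g_k = Qx_k − a, with α_k = −g_kᵀp_k / (p_kᵀQp_k) and γ_k = g_{k+1}ᵀQp_k / (p_kᵀQp_k). This version still uses Q explicitly in both coefficients. The 3×3 system is a hypothetical example, and the printout checks that the generated directions are mutually Q-conjugate.

```python
def matvec(Q, v):
    """Dense matrix-vector product Q v."""
    return [sum(qij * vj for qij, vj in zip(row, v)) for row in Q]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def hestenes_stiefel(Q, a, x):
    """Conjugate directions for f(x) = c - a^T x + x^T Q x / 2.

    Returns the final iterate and the directions p_0, ..., p_{n-1}.
    """
    n = len(a)
    g = [qx - ai for qx, ai in zip(matvec(Q, x), a)]      # g_0 = Q x_0 - a
    p = [-gi for gi in g]                                 # p_0 = -g_0
    directions = []
    for _ in range(n):
        if dot(g, g) < 1e-28:                             # already at the minimum
            break
        Qp = matvec(Q, p)
        pQp = dot(p, Qp)
        alpha = -dot(g, p) / pQp                          # exact line minimum along p_k
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        g = [qx - ai for qx, ai in zip(matvec(Q, x), a)]  # g_{k+1}
        directions.append(p)
        gamma = dot(g, Qp) / pQp                          # gamma_k, still needs Q
        p = [-gi + gamma * pi for gi, pi in zip(g, p)]    # p_{k+1}
    return x, directions


Q = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]  # hypothetical SPD matrix
a = [1.0, 2.0, 3.0]
x_n, P = hestenes_stiefel(Q, a, [0.0, 0.0, 0.0])
print(x_n)  # close to the exact solution (2/9, 1/9, 13/9) of Q x = a
for i in range(len(P)):
    for j in range(i + 1, len(P)):
        print(i, j, dot(P[i], matvec(Q, P[j])))  # near 0: directions are Q-conjugate
```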
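Hestenes Stiefel Procedure 2
It is simple to see that p_k and p_{k+1} are conjugate. In fact,
$$\mathbf{p}_k^T Q\,\mathbf{p}_{k+1} = \mathbf{p}_k^T Q\left(-\mathbf{g}_{k+1} + \gamma_k\mathbf{p}_k\right) = -\mathbf{p}_k^T Q\,\mathbf{g}_{k+1} + \frac{\mathbf{g}_{k+1}^T Q\,\mathbf{p}_k}{\mathbf{p}_k^T Q\,\mathbf{p}_k}\,\mathbf{p}_k^T Q\,\mathbf{p}_k = 0,$$
since Q is symmetric. The proof that p_i and p_{k+1} for i = 0, . . ., k are also conjugate can be done by induction, based on the observation that the vectors p_k are found by a generalization of Gram-Schmidt that produces conjugate rather than orthogonal vectors.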
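Removing the Hessian
In the described algorithm the expression for γ_k contains the Hessian Q, which is too large. We now show that γ_k can be rewritten in terms of the gradient values g_k and g_{k+1} only. To this end, we notice that
$$\mathbf{g}_{k+1} - \mathbf{g}_k = \alpha_k Q\,\mathbf{p}_k,$$
or
$$Q\,\mathbf{p}_k = \frac{1}{\alpha_k}\left(\mathbf{g}_{k+1} - \mathbf{g}_k\right).$$
Proof: since the gradient of the quadratic is g_k = Qx_k − a,
$$\mathbf{g}_{k+1} - \mathbf{g}_k = (Q\mathbf{x}_{k+1} - \mathbf{a}) - (Q\mathbf{x}_k - \mathbf{a}) = Q(\mathbf{x}_{k+1} - \mathbf{x}_k),$$
so that, because x_{k+1} − x_k = α_k p_k,
$$\mathbf{g}_{k+1} - \mathbf{g}_k = \alpha_k Q\,\mathbf{p}_k.$$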
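Removing the Hessian 2
We can therefore write
$$\gamma_k = \frac{\mathbf{g}_{k+1}^T Q\,\mathbf{p}_k}{\mathbf{p}_k^T Q\,\mathbf{p}_k} = \frac{\mathbf{g}_{k+1}^T(\mathbf{g}_{k+1} - \mathbf{g}_k)}{\mathbf{p}_k^T(\mathbf{g}_{k+1} - \mathbf{g}_k)}$$
and Q has disappeared (the factor 1/α_k cancels between numerator and denominator).
This expression for γ_k can be further simplified by noticing that
$$\mathbf{p}_k^T\mathbf{g}_{k+1} = 0$$
because the line along p_k is tangent to an isosurface at x_{k+1}, while the gradient g_{k+1} is orthogonal to the isosurface at x_{k+1}.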
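Polak-Ribière formula
Similarly,
$$\mathbf{p}_{k-1}^T\mathbf{g}_k = 0.$$
Then, using p_k = −g_k + γ_{k−1} p_{k−1}, the denominator of γ_k becomes
$$\mathbf{p}_k^T(\mathbf{g}_{k+1} - \mathbf{g}_k) = -\mathbf{p}_k^T\mathbf{g}_k = \left(\mathbf{g}_k - \gamma_{k-1}\mathbf{p}_{k-1}\right)^T\mathbf{g}_k = \mathbf{g}_k^T\mathbf{g}_k.$$
In conclusion, we obtain the Polak-Ribière formula
$$\gamma_k = \frac{\mathbf{g}_{k+1}^T(\mathbf{g}_{k+1} - \mathbf{g}_k)}{\mathbf{g}_k^T\mathbf{g}_k}.$$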
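Putting the pieces together, here is a minimal sketch of conjugate gradients with the Polak-Ribière coefficient that only ever calls a gradient function, never Q. One assumption replaces the general line search: for a quadratic f, φ′(α) = g(x + αp)ᵀp is linear in α, so its root follows from two gradient evaluations (a secant step on φ′, exact only for quadratics). For a non-quadratic f that step would be replaced by a proper one-dimensional minimization, such as the bracketing search at the start of the tutorial. The test problem, given only through its matrix-free gradient Qx − a with Q = tridiag(−1, 3, −1) and a = (1, …, 1), is hypothetical.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def axpy(alpha, p, x):
    """Return x + alpha * p."""
    return [xi + alpha * pi for xi, pi in zip(x, p)]


def cg_polak_ribiere(grad, x, n_steps, tol=1e-12):
    """Conjugate gradients with the Polak-Ribiere gamma; only calls grad()."""
    g = grad(x)
    p = [-gi for gi in g]                       # p_0 = -g_0
    for _ in range(n_steps):
        if dot(g, g) < tol ** 2:
            break
        # Line search: for quadratic f, phi'(alpha) = grad(x + alpha p)^T p is
        # linear in alpha, so its root follows from its values at 0 and 1.
        d0 = dot(g, p)
        d1 = dot(grad(axpy(1.0, p, x)), p)
        alpha = d0 / (d0 - d1)
        x = axpy(alpha, p, x)
        g_new = grad(x)
        diff = [gn - go for gn, go in zip(g_new, g)]
        gamma = dot(g_new, diff) / dot(g, g)    # Polak-Ribiere formula
        p = [-gn + gamma * pi for gn, pi in zip(g_new, p)]
        g = g_new
    return x


def grad(x):
    """Matrix-free gradient Q x - a, hypothetical Q = tridiag(-1, 3, -1), a = ones."""
    n = len(x)
    return [3.0 * x[i]
            - (x[i - 1] if i > 0 else 0.0)
            - (x[i + 1] if i < n - 1 else 0.0)
            - 1.0
            for i in range(n)]


n = 6
x = cg_polak_ribiere(grad, [0.0] * n, n_steps=n)
print(max(abs(gi) for gi in grad(x)))  # tiny: the minimum is reached within n steps
```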