CS B553: Algorithms for Optimization and Learning

Post on 31-Dec-2015


GRADIENT DESCENT

KEY CONCEPTS

Gradient descent
Line search
Convergence rates depend on scaling
Variants: discrete analogues, coordinate descent
Random restarts

The gradient direction is orthogonal to the level sets (contours) of f and points in the direction of steepest increase
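This orthogonality can be checked numerically. A minimal sketch, assuming an illustrative quadratic f(x, y) = x² + 2y² (chosen for the demo, not taken from the slides): step to a nearby point on the same contour and dot the resulting tangent with the gradient.

```python
import math

# Illustrative quadratic f(x, y) = x^2 + 2*y^2 (not from the slides).
def grad_f(x, y):
    return (2 * x, 4 * y)

# (3, 1) lies on the level set x^2 + 2*y^2 = 11. Approximate the contour's
# tangent there by stepping to a nearby point on the same level set.
px, py = 3.0, 1.0
qx = 3.001
qy = math.sqrt((11 - qx ** 2) / 2)
tangent = (qx - px, qy - py)

gx, gy = grad_f(px, py)
dot = gx * tangent[0] + gy * tangent[1]
print(abs(dot))  # close to 0: the gradient is orthogonal to the contour
```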

Gradient descent: iteratively move in the direction −∇f(x), i.e. update x ← x − α∇f(x) for some step size α > 0
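A single application of this update can be sketched on an illustrative quadratic (chosen for the demo, not from the slides), confirming that a small step along −∇f decreases f:

```python
# Illustrative quadratic (not from the slides); its minimum is at the origin.
def f(x, y):
    return x ** 2 + 2 * y ** 2

def grad_f(x, y):
    return (2 * x, 4 * y)

x, y = 3.0, 1.0
alpha = 0.1                                     # fixed step size for this demo
gx, gy = grad_f(x, y)
x_new, y_new = x - alpha * gx, y - alpha * gy   # move along -grad f
print(f(x_new, y_new) < f(x, y))                # True: the step decreased f
```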

Line search: pick the step size α to lead to a decrease in the function value

(Use your favorite univariate optimization method)

[Figure: f(x − α∇f(x)) plotted as a function of the step size α, with minimizer α*]
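The slides leave the univariate method open; one common concrete choice is backtracking with the Armijo sufficient-decrease condition. A sketch, with conventional default parameters (alpha0, beta, c are assumptions, not from the slides):

```python
def backtracking_line_search(f, grad_f, x, alpha0=1.0, beta=0.5, c=1e-4):
    """Shrink alpha until the Armijo condition holds for the step x - alpha*grad.
    alpha0, beta, c are conventional defaults, not taken from the slides."""
    g = grad_f(x)
    fx = f(x)
    gg = sum(gi * gi for gi in g)        # ||grad f(x)||^2
    alpha = alpha0
    # Armijo: f(x - alpha*g) <= f(x) - c*alpha*||g||^2; otherwise halve alpha.
    while f([xi - alpha * gi for xi, gi in zip(x, g)]) > fx - c * alpha * gg:
        alpha *= beta
    return alpha

# Usage on an illustrative quadratic f(x) = x0^2 + 2*x1^2:
f = lambda v: v[0] ** 2 + 2 * v[1] ** 2
grad = lambda v: [2 * v[0], 4 * v[1]]
alpha = backtracking_line_search(f, grad, [3.0, 1.0])
print(alpha)  # 0.5
```

The full step alpha = 1 overshoots (f rises from 11 to 27), so one halving suffices here.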

GRADIENT DESCENT PSEUDOCODE

Input: f, starting value x1, termination tolerances εg, εx

For t = 1, 2, …, maxIters:
    Compute the search direction dt = −∇f(xt)
    If ||dt|| < εg then:
        return “Converged to critical point”, output xt
    Find αt so that f(xt + αt dt) < f(xt) using line search
    If ||αt dt|| < εx then:
        return “Converged in x”, output xt
    Let xt+1 = xt + αt dt

Return “Max number of iterations reached”, output xmaxIters
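The pseudocode translates almost line for line into Python. In this sketch, a simple step-halving loop stands in for “find αt using line search” (an assumption, not the slides' prescribed method), and the tolerances are illustrative defaults:

```python
def gradient_descent(f, grad_f, x1, eps_g=1e-6, eps_x=1e-12, max_iters=1000):
    """Gradient descent following the slide's pseudocode. The halving line
    search is a stand-in for 'find alpha_t using line search'."""
    x = list(x1)
    for t in range(max_iters):
        d = [-gi for gi in grad_f(x)]                 # d_t = -grad f(x_t)
        d_norm = sum(di * di for di in d) ** 0.5
        if d_norm < eps_g:
            return "Converged to critical point", x
        alpha, fx = 1.0, f(x)
        while f([xi + alpha * di for xi, di in zip(x, d)]) >= fx:
            alpha *= 0.5                              # shrink until f decreases
            if alpha < 1e-20:                         # safeguard against stalls
                break
        if alpha * d_norm < eps_x:
            return "Converged in x", x
        x = [xi + alpha * di for xi, di in zip(x, d)]
    return "Max number of iterations reached", x

# Usage on an illustrative quadratic with minimum at the origin:
status, x_min = gradient_descent(
    lambda v: v[0] ** 2 + 2 * v[1] ** 2,
    lambda v: [2 * v[0], 4 * v[1]],
    [3.0, 1.0],
)
print(status, x_min)  # converges to the minimum at the origin
```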

RELATED METHODS

Steepest descent (discrete analogue)
Coordinate descent

Many local minima: use a good initialization, or random restarts
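Random restarts can be sketched as a wrapper around any local minimizer such as the gradient descent pseudocode above: run it from several random starting points and keep the best result. All names here are illustrative, and the two-minimum test function is an assumption for the demo.

```python
import random

def random_restarts(minimize, f, sample_start, n_restarts=20, seed=0):
    """Run a local minimizer from several random starts; keep the best optimum."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_restarts):
        x = minimize(sample_start(rng))
        if best is None or f(x) < f(best):
            best = x
    return best

# Usage: f(x) = (x^2 - 1)^2 + 0.3*x has a local minimum near x = +0.96 and a
# better (global) one near x = -1.0; restarts escape the worse basin.
f = lambda x: (x * x - 1) ** 2 + 0.3 * x
df = lambda x: 4 * x * (x * x - 1) + 0.3

def minimize(x0, alpha=0.01, iters=2000):
    x = x0
    for _ in range(iters):
        x -= alpha * df(x)              # fixed-step gradient descent
    return x

best = random_restarts(minimize, f, lambda rng: rng.uniform(-2.0, 2.0))
print(best)  # lands in the global minimum's basin, near -1.0
```

Starting points that land right of the local maximum near x ≈ 0.08 flow to the worse minimum; with 20 uniform draws on [−2, 2], some start left of it and find the global one.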