CS B553: Algorithms for Optimization and Learning

Post on 31-Dec-2015


GRADIENT DESCENT

KEY CONCEPTS

Gradient descent
Line search
Convergence rates depend on scaling
Variants: discrete analogues, coordinate descent
Random restarts

The gradient direction is orthogonal to the level sets (contours) of f and points in the direction of steepest increase
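This orthogonality can be checked numerically. A minimal sketch, assuming an illustrative quadratic f(x, y) = x² + 2y² (chosen for the demo, not taken from the slides): step to a nearby point on the same contour and dot the resulting tangent with the gradient.

```python
import math

# Illustrative quadratic f(x, y) = x^2 + 2*y^2 (not from the slides).
def grad_f(x, y):
    return (2 * x, 4 * y)

# (3, 1) lies on the level set x^2 + 2*y^2 = 11. Approximate the contour's
# tangent there by stepping to a nearby point on the same level set.
px, py = 3.0, 1.0
qx = 3.001
qy = math.sqrt((11 - qx ** 2) / 2)
tangent = (qx - px, qy - py)

gx, gy = grad_f(px, py)
dot = gx * tangent[0] + gy * tangent[1]
print(abs(dot))  # close to 0: the gradient is orthogonal to the contour
```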

Gradient descent: iteratively move in the direction −∇f(x), i.e. update x ← x − α∇f(x) for some step size α > 0
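A single application of this update can be sketched on an illustrative quadratic (chosen for the demo, not from the slides), confirming that a small step along −∇f decreases f:

```python
# Illustrative quadratic (not from the slides); its minimum is at the origin.
def f(x, y):
    return x ** 2 + 2 * y ** 2

def grad_f(x, y):
    return (2 * x, 4 * y)

x, y = 3.0, 1.0
alpha = 0.1                                     # fixed step size for this demo
gx, gy = grad_f(x, y)
x_new, y_new = x - alpha * gx, y - alpha * gy   # move along -grad f
print(f(x_new, y_new) < f(x, y))                # True: the step decreased f
```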

Line search: pick the step size α to lead to a decrease in the function value

(Use your favorite univariate optimization method)

[Figure: f(x − α∇f(x)) plotted as a function of the step size α, with minimizer α*]
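The slides leave the univariate method open; one common concrete choice is backtracking with the Armijo sufficient-decrease condition. A sketch, with conventional default parameters (alpha0, beta, c are assumptions, not from the slides):

```python
def backtracking_line_search(f, grad_f, x, alpha0=1.0, beta=0.5, c=1e-4):
    """Shrink alpha until the Armijo condition holds for the step x - alpha*grad.
    alpha0, beta, c are conventional defaults, not taken from the slides."""
    g = grad_f(x)
    fx = f(x)
    gg = sum(gi * gi for gi in g)        # ||grad f(x)||^2
    alpha = alpha0
    # Armijo: f(x - alpha*g) <= f(x) - c*alpha*||g||^2; otherwise halve alpha.
    while f([xi - alpha * gi for xi, gi in zip(x, g)]) > fx - c * alpha * gg:
        alpha *= beta
    return alpha

# Usage on an illustrative quadratic f(x) = x0^2 + 2*x1^2:
f = lambda v: v[0] ** 2 + 2 * v[1] ** 2
grad = lambda v: [2 * v[0], 4 * v[1]]
alpha = backtracking_line_search(f, grad, [3.0, 1.0])
print(alpha)  # 0.5
```

The full step alpha = 1 overshoots (f rises from 11 to 27), so one halving suffices here.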

GRADIENT DESCENT PSEUDOCODE

Input: f, starting value x1, termination tolerances εg, εx

For t = 1, 2, …, maxIters:
    Compute the search direction dt = −∇f(xt)
    If ||dt|| < εg then:
        return “Converged to critical point”, output xt
    Find αt so that f(xt + αt dt) < f(xt) using line search
    If ||αt dt|| < εx then:
        return “Converged in x”, output xt
    Let xt+1 = xt + αt dt

Return “Max number of iterations reached”, output xmaxIters
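The pseudocode translates almost line for line into Python. In this sketch, a simple step-halving loop stands in for “find αt using line search” (an assumption, not the slides' prescribed method), and the tolerances are illustrative defaults:

```python
def gradient_descent(f, grad_f, x1, eps_g=1e-6, eps_x=1e-12, max_iters=1000):
    """Gradient descent following the slide's pseudocode. The halving line
    search is a stand-in for 'find alpha_t using line search'."""
    x = list(x1)
    for t in range(max_iters):
        d = [-gi for gi in grad_f(x)]                 # d_t = -grad f(x_t)
        d_norm = sum(di * di for di in d) ** 0.5
        if d_norm < eps_g:
            return "Converged to critical point", x
        alpha, fx = 1.0, f(x)
        while f([xi + alpha * di for xi, di in zip(x, d)]) >= fx:
            alpha *= 0.5                              # shrink until f decreases
            if alpha < 1e-20:                         # safeguard against stalls
                break
        if alpha * d_norm < eps_x:
            return "Converged in x", x
        x = [xi + alpha * di for xi, di in zip(x, d)]
    return "Max number of iterations reached", x

# Usage on an illustrative quadratic with minimum at the origin:
status, x_min = gradient_descent(
    lambda v: v[0] ** 2 + 2 * v[1] ** 2,
    lambda v: [2 * v[0], 4 * v[1]],
    [3.0, 1.0],
)
print(status, x_min)  # converges to the minimum at the origin
```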

RELATED METHODS

Steepest descent (discrete analogue)
Coordinate descent

Many local minima: use a good initialization, or random restarts
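Random restarts can be sketched as a wrapper around any local minimizer such as the gradient descent pseudocode above: run it from several random starting points and keep the best result. All names here are illustrative, and the two-minimum test function is an assumption for the demo.

```python
import random

def random_restarts(minimize, f, sample_start, n_restarts=20, seed=0):
    """Run a local minimizer from several random starts; keep the best optimum."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_restarts):
        x = minimize(sample_start(rng))
        if best is None or f(x) < f(best):
            best = x
    return best

# Usage: f(x) = (x^2 - 1)^2 + 0.3*x has a local minimum near x = +0.96 and a
# better (global) one near x = -1.0; restarts escape the worse basin.
f = lambda x: (x * x - 1) ** 2 + 0.3 * x
df = lambda x: 4 * x * (x * x - 1) + 0.3

def minimize(x0, alpha=0.01, iters=2000):
    x = x0
    for _ in range(iters):
        x -= alpha * df(x)              # fixed-step gradient descent
    return x

best = random_restarts(minimize, f, lambda rng: rng.uniform(-2.0, 2.0))
print(best)  # lands in the global minimum's basin, near -1.0
```

Starting points that land right of the local maximum near x ≈ 0.08 flow to the worse minimum; with 20 uniform draws on [−2, 2], some start left of it and find the global one.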