Ellipsoid Method - Stanford University · 2003. 9. 18. · Ellipsoid Method †ellipsoidmethod...

29
Ellipsoid Method ellipsoid method convergence proof inequality constraints feasibility problems Prof. S. Boyd, EE392o, Stanford University

Transcript of Ellipsoid Method - Stanford University · 2003. 9. 18. · Ellipsoid Method †ellipsoidmethod...

  • Ellipsoid Method

    • ellipsoid method

    • convergence proof

    • inequality constraints

    • feasibility problems

    Prof. S. Boyd, EE392o, Stanford University

  • Challenges in cutting-plane methods

    • can be difficult to compute appropriate next query point

    • localization polyhedron grows in complexity as algorithm progresses

    can get around these challenges . . .

    ellipsoid method is another approach

    • developed in 70s by Shor and Yudin

    • used in 1979 by Khachian to give polynomial time algorithm for LP

    Prof. S. Boyd, EE392o, Stanford University 1

  • Ellipsoid algorithm

    idea: localize x? in an ellipsoid instead of a polyhedron

    1. at iteration k we know x? ∈ E(k)

    2. set x(k+1) := center(E (k)); evaluate ∇f(x(k+1)) (or g(k) ∈ ∂f(x(k+1)))

    3. hence we know

    x? ∈ E(k) ∩ {z | ∇f(x(k+1))T (z − x(k+1)) ≤ 0}

    (a half-ellipsoid)

    4. set E(k+1) := minimum volume ellipsoid coveringE(k) ∩ {z | ∇f(x(k+1))T (z − x(k+1)) ≤ 0}

    Prof. S. Boyd, EE392o, Stanford University 2

  • PSfrag replacements

    E(k)

    x(k+1)

    ∇f(x(k+1))

    E(k+1)

    compared to cutting-plane method:

    • localization set doesn’t grow more complicated

    • easy to compute query point

    • but, we add unnecessary points in step 4

    Prof. S. Boyd, EE392o, Stanford University 3

  • Properties of ellipsoid method

    • reduces to bisection for n = 1

    • simple formula for E (k+1) given E(k), ∇f(x(k+1))

    • E(k+1) can be larger than E (k) in diameter (max semi-axis length), butis always smaller in volume

    • vol(E(k+1)) < e−12n vol(E(k))

    (note that volume reduction factor depends on n)

    Prof. S. Boyd, EE392o, Stanford University 4

  • Example

    PSfrag replacements

    px(0)

    PSfrag replacements

    px(1)

    PSfrag replacements

    px(2)

    Prof. S. Boyd, EE392o, Stanford University 5

  • PSfrag replacements

    px(3)

    PSfrag replacements

    px(4)

    PSfrag replacements

    px(5)

    Prof. S. Boyd, EE392o, Stanford University 6

  • Updating the ellipsoid

    E(x,A) ={

    z | (z − x)TA−1(z − x) ≤ 1}

    PSfrag replacements rxrx+

    r

    ¡¡¡ª

    E

    @@@R

    E+g

    Prof. S. Boyd, EE392o, Stanford University 7

  • (for n > 1) minimum volume ellipsoid containing

    E ∩{

    z | gT (z − x) ≤ 0}

    is given by

    x+ = x−1

    n + 1Ag̃

    A+ =n2

    n2 − 1

    (

    A−2

    n + 1Ag̃g̃TA

    )

    where g̃∆= g

    /

    gTAg

    Prof. S. Boyd, EE392o, Stanford University 8

  • Stopping criterion

    x? ∈ Ek, so

    f(x?) ≥ f(x(k)) +∇f(x(k))T (x? − x(k))

    ≥ f(x(k)) + infx∈E(k)

    ∇f(x(k))T (x− x(k))

    = f(x(k))−√

    ∇f(x(k))TA(k)∇f(x(k))

    simple stopping criterion:

    ∇f(x(k))TA(k)∇f(x(k)) ≤ ²

    Prof. S. Boyd, EE392o, Stanford University 9

  • PSfrag replacements

    AK

    f(x(k))−√

    ∇f(x(k))TA(k)∇f(x(k))

    ¡ªf(x(k))

    f?

    k0 5 10 15 20 25 30

    Prof. S. Boyd, EE392o, Stanford University 10

  • more sophisticated stopping criterion: Uk − Lk ≤ ², where

    Uk = mini≤k

    f(x(i))

    Lk = maxi≤k

    (

    f(x(i))−√

    ∇f(x(i))TA(i)∇f(x(i))

    )

    Prof. S. Boyd, EE392o, Stanford University 11

  • PSfrag replacements

    @ILk

    ¡ªUk

    f?

    k0 5 10 15 20 25 30

    Prof. S. Boyd, EE392o, Stanford University 12

  • Basic ellipsoid algorithm

    ellipsoid described as E(x,A) = { z | (z − x)TA−1(z − x) ≤ 1 }

    given ellipsoid E(x,A) containing x?, accuracy ² > 0

    repeat1. evaluate ∇f(x) (or g ∈ ∂f(x))

    2. if√

    ∇f(x)TA∇f(x) ≤ ², return(x)3. update ellipsoid

    3a. g̃ := ∇f(x)/

    ∇f(x)TA∇f(x)

    3b. x := x− 1n+1Ag̃

    3c. A := n2

    n2−1

    (

    A− 2n+1Ag̃g̃TA

    )

    Prof. S. Boyd, EE392o, Stanford University 13

  • Interpretation

    • change coordinates so uncertainty (E) is unit ball

    • take gradient (or subgradient) step with fixed length 1/(n + 1)

    properties:

    • can propagate Cholesky factor of A; get O(n2) update

    • not a descent method

    • often slow but robust in practice

    Prof. S. Boyd, EE392o, Stanford University 14

  • Proof of convergence

    assumptions:

    • f is Lipschitz: |f(y)− f(x)| ≤ G‖y − x‖

    • E(0) is ball with radius R

    suppose f(x(i)) > f? + ², i = 0, . . . , k

    thenf(x) ≤ f? + ² =⇒ x ∈ E (k)

    since at iteration i we only discard points with f ≥ f(x(i))

    Prof. S. Boyd, EE392o, Stanford University 15

  • from Lipschitz condition,

    ‖x− x?‖ ≤ ²/G =⇒ f(x) ≤ f? + ² =⇒ x ∈ E (k)

    so B = {x | ‖x− x?‖ ≤ ²/G} ⊆ E(k)

    hence vol(B) ≤ vol(E (k)), so

    βn(²/G)n ≤ e−k/2n vol(E(0)) = e−k/2nβnR

    n

    (βn is volume of unit ball in Rn)

    therefore k ≤ 2n2 log(RG/²)

    Prof. S. Boyd, EE392o, Stanford University 16

  • PSfrag replacements

    E(0)

    E(k)

    x(k)

    f(x) ≤ f? + ²

    B = {x | ‖x− x?‖ ≤ ²/G}

    x?

    conclusion: for K > 2n2 log(RG/²),

    mini=0,...,K

    f(x(i)) ≤ f? + ²

    Prof. S. Boyd, EE392o, Stanford University 17

  • Interpretation of complexity

    since x? ∈ E0 = {x | ‖x− x(0)‖ ≤ R}, our prior knowledge of f? is

    f? ∈ [f(x(0))−GR, f(x(0))]

    our prior uncertainty in f? is GR

    after k iterations our knowledge of f? is

    f? ∈

    [

    mini=0,...,k

    f(x(i))− ², mini=0,...,k

    f(x(i))

    ]

    posterior uncertainty in f? is ≤ ²

    Prof. S. Boyd, EE392o, Stanford University 18

  • iterations required:

    2n2 logRG

    ²= 2n2 log

    prior uncertainty

    posterior uncertainty

    efficiency: 0.72/n2 bits per gradient evaluation (degrades with n)

    Prof. S. Boyd, EE392o, Stanford University 19

  • Inequality constrained problems

    minimize f0(x)subject to fi(x) ≤ 0, i = 1, . . . ,m

    same idea: maintain ellipsoids E (k) that

    • contain x?

    • decrease in volume to zero

    Prof. S. Boyd, EE392o, Stanford University 20

  • case 1: x(k) feasible, i.e., fi(x(k)) ≤ 0, i = 1, . . . ,m

    • then do usual update of E (k) based on ∇f0(x(k))

    • rules out halfspace of points with larger function value than currentpoint

    case 2: x(k) infeasible, say, fj(x(k)) > 0;

    • then ∇fj(x(k))T (x− x(k)) ≥ 0 =⇒ fj(x) > 0 =⇒ x infeasible so

    update E(k) based on ∇fj(x(k))

    • rules out halfspace of infeasible points

    Prof. S. Boyd, EE392o, Stanford University 21

  • Example

    PSfrag replacements

    ªf1(x) = 0

    px(0)

    ∇f1(x(0))

    PSfrag replacements

    px(1)∇f0(x

    (1))

    PSfrag replacements

    px(2)

    ∇f0(x(2))

    Prof. S. Boyd, EE392o, Stanford University 22

  • PSfrag replacementspx(3)

    ∇f1(x(3))

    PSfrag replacements

    px(4)∇f0(x

    (4))

    PSfrag replacements

    px(5)

    ∇f0(x(5))

    Prof. S. Boyd, EE392o, Stanford University 23

  • Stopping criterion

    if x(k) is feasible, we have a lower bound on f? as before:

    f? ≥ f(x(k))−√

    ∇f(x(k))TA(k)∇f(x(k))

    if x(k) is infeasible, we have for all x ∈ E (k)

    fj(x) ≥ fj(x(k)) +∇fj(x

    (k))T (x− x(k))

    ≥ fj(x(k)) + inf

    x∈E(k)∇fj(x

    (k))T (x− x(k))

    = fj(x(k))−

    ∇fj(x(k))TA(k)∇fj(x(k))

    Prof. S. Boyd, EE392o, Stanford University 24

  • hence, problem is infeasible if for some j,

    fj(x(k))−

    ∇fj(x(k))TA(k)∇fj(x(k)) > 0

    stopping criteria:

    • if x(k) is feasible and√

    ∇f0(x(k))TA(k)∇f0(x(k)) ≤ ²(x(k) is ²-suboptimal)

    • if fj(x(k))−

    ∇fj(x(k))TA(k)∇fj(x(k)) > 0(problem is infeasible)

    Prof. S. Boyd, EE392o, Stanford University 25

  • Ellipsoid method for feasibility

    abstract feasibility problem: find x ∈ C ⊂ Rn or determine C = ∅

    separating hyperplane oracle: for any x, oracle either

    • confirms x ∈ C, or

    • returns g 6= 0 s.t. z ∈ C ⇒ gT (z − x) ≤ 0

    PSfrag replacementsx(k)

    E(k+1)

    E(k)

    C

    g(k)

    Prof. S. Boyd, EE392o, Stanford University 26

  • start with E(0) which intersects C

    1. If x(k) := center(E (k)) ∈ C, quit. Else, compute g 6= 0, s.t.x ∈ C ⇒ gT (x− x(k)) ≤ 0

    2. E(k+1) := minimum volume ellipsoid covering

    E(k) ∩ {z | gT (z − x(k)) ≤ 0}

    Prof. S. Boyd, EE392o, Stanford University 27

  • Example

    PSfrag replacements ••PSfrag replacements ••

    PSfrag replacements•

    •PSfrag replacements

    ••

    Prof. S. Boyd, EE392o, Stanford University 28