
Sample Average Approximation (SAA) for Stochastic Programs

with an eye towards computational SAA

Dave Morton

Industrial Engineering & Management Sciences

Northwestern University

Outline

• SAA

– Results for Monte Carlo estimators: no optimization

– What results should we want for SAA?

– Results for SAA

1. Bias

2. Consistency

3. Central limit theorem (CLT)

• SAA Algorithm

– A basic algorithm

– A sequential algorithm

• Multi-Stage Problems

• What We Didn’t Discuss

Stochastic Programming Models

z* = min_{x∈X} E f(x, ξ)

Such problems arise in statistics, simulation, and mathematical programming.

Our focus: mathematical programming with X deterministic.

We'll assume:

(A1) X ≠ ∅ and compact

(A2) E f(·, ξ) is lower semicontinuous

(A3) E sup_{x∈X} f²(x, ξ) < ∞

ξ is a random vector and P_ξ ≠ P_ξ(x); i.e., the distribution of ξ does not depend on x.

We can evaluate f(x, ξ(ω)) for a fixed x and realization ξ(ω).

The choice of f determines the problem class.

Sample Average Approximation

• True or population problem:

z* = min_{x∈X} E f(x, ξ)    (SP)

Denote the optimal solution x*.

• SAA problem:

z*_n = min_{x∈X} f_n(x), where f_n(x) ≡ (1/n) ∑_{j=1}^n f(x, ξ^j)    (SP_n)

Here, ξ^1, ξ^2, ..., ξ^n are iid as ξ, or sampled another way. Denote the optimal solution x*_n.

• View z*_n as an estimator of z* and x*_n as an estimator of x*

Want names? External sampling method, sample-path optimization, stochastic counterpart, retrospective optimization, non-recursive method, and sample average approximation.

Let's start in a simpler setting, momentarily putting aside optimization...

Monte Carlo Sampling

Suppressing the (fixed) decision x:

Let z = E f(ξ), σ² = var f(ξ) < ∞, and let ξ^1, ξ^2, ..., ξ^n be iid as ξ.

Let z̄_n = (1/n) ∑_{i=1}^n f(ξ^i) be the sample-mean estimator of z.

FACT 1. E z̄_n = z: z̄_n is an unbiased estimator of z

FACT 2. z̄_n → z, wp1 (strong LLN): z̄_n is a strongly consistent estimator of z

FACT 3. √n(z̄_n − z) ⇒ N(0, σ²) (CLT): the rate of convergence is 1/√n, and the scaled error is normally distributed

FACTS 4, 5, ... law of the iterated logarithm, concentration inequalities, ...
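As a quick empirical check of Facts 1 and 3, here is a minimal Python sketch (an illustration, not from the slides) taking f(ξ) = ξ² with ξ ∼ N(0, 1), so that z = 1 and σ² = 2:

```python
# Monte Carlo check of Facts 1 and 3 for f(xi) = xi^2, xi ~ N(0,1):
# z = E xi^2 = 1 and sigma^2 = var(xi^2) = 2.
import numpy as np

rng = np.random.default_rng(0)
n, reps = 1_000, 2_000

# reps independent sample means, each built from n iid draws
zbar = (rng.standard_normal((reps, n)) ** 2).mean(axis=1)

print(zbar.mean())               # ~ 1.0: unbiased (Fact 1)
print(np.sqrt(n) * zbar.std())   # ~ sqrt(2) ~ 1.414: CLT scaling (Fact 3)
```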

Do such results carry over to SAA?

SAA

• Population problem (SP): z* = min_{x∈X} E f(x, ξ), with optimal solution x*

• SAA problem (SP_n): z*_n = min_{x∈X} f_n(x), with optimal solution x*_n

• View z*_n as an estimator of z* and x*_n as an estimator of x*

• What can we say about z*_n and x*_n as n → ∞?

• What should we want to say about z*_n and x*_n as n → ∞?

SAA: Possible Goals¹

1. x*_n → x*, wp1, and √n(x*_n − x*) ⇒ N(0, Σ)

2. z*_n → z*, wp1, and √n(z*_n − z*) ⇒ N(0, σ²)

3. E f(x*_n, ξ) → z*, wp1

4. lim_{n→∞} P(E f(x*_n, ξ) − z* ≤ ε_n) ≥ 1 − α, where ε_n → 0

Modeling Issues:

• If (SP_n) is for maximum-likelihood estimation, then goal 1 could be appropriate

• If (SP) is to price a financial option, then goal 2 could be appropriate

• When (SP) is a decision-making model, 1 may be more than we need and 2 is of secondary interest. Goals 3 and 4 arguably suffice

Technical Issues:

• In general, we shouldn't expect {x*_n}_{n=1}^∞ to converge when (SP) has multiple optimal solutions. In this case, we want: "limit points of {x*_n}_{n=1}^∞ solve (SP)"

• If we achieve the "limit points" result, X is compact, and E f(·, ξ) is continuous, then we obtain goal 3

• The limiting distributions may not be normal

¹Again, these goals aren't true in general; i.e., they may be impossible goals.

1. Bias

2. Consistency

3. CLT

SAA: Example

z* = min_{−1≤x≤1} [E f(x, ξ) = E ξx], where ξ ∼ N(0, 1)

Every feasible solution x ∈ [−1, 1] is optimal, and z* = 0.

z*_n = min_{−1≤x≤1} ((1/n) ∑_{j=1}^n ξ^j) x

x*_n = ±1,  z*_n = −|N(0, 1/n)|

Observations:

1. E z*_n ≤ z* ∀n (negative bias)

2. E z*_n ≤ E z*_{n+1} ∀n (monotonically shrinking bias)

3. z*_n → z*, wp1 (strongly consistent)

4. √n(z*_n − z*) = −|N(0, 1)| (non-normal errors)

5. b(z*_n) ≡ E z*_n − z* = a/√n (O(n^{−1/2}) bias)

So, optimization changes the nature of sample-mean estimators.

Note: What if x ∈ [−1, 1] is replaced by x ∈ ℝ? SAA fails, spectacularly.
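Since ξ̄_n ∼ N(0, 1/n) exactly, the example can be checked numerically. A minimal sketch (illustration only) that samples z*_n = −|ξ̄_n| directly and recovers the bias constant a = −E|N(0, 1)| = −√(2/π):

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 100, 50_000

xibar = rng.normal(0.0, 1.0 / np.sqrt(n), reps)  # xibar_n ~ N(0, 1/n) exactly
zn = -np.abs(xibar)                              # SAA optimal values; z* = 0

print(np.sqrt(n) * zn.mean())  # ~ -sqrt(2/pi) ~ -0.798: the constant a
print((zn <= 0).mean())        # = 1.0: errors are -|N(0,1)|, one-sided, non-normal
```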

1. Bias

All you need to know: min_{x∈X} [f(x) + g(x)] ≥ min_{x∈X} f(x) + min_{x∈X} g(x)

SAA: Bias

Theorem. Assume (A1), (A2), and E f_n(x) = E f(x, ξ), ∀x ∈ X. Then, E z*_n ≤ z*. If, in addition, ξ^1, ξ^2, ..., ξ^n are iid, then E z*_n ≤ E z*_{n+1}.

Notes:

• The first result does not require iid realizations, just an unbiased estimator

• The hypothesis can be relaxed to: E f_n(x) ≤ E f(x, ξ), ∀x ∈ X

• The hypothesis can be relaxed to: ξ^1, ξ^2, ..., ξ^n are exchangeable random variables

Proof of Bias Result

For each x ∈ X,

E [(1/n) ∑_{j=1}^n f(x, ξ^j)] = E f(x, ξ),

so

min_{x∈X} E [(1/n) ∑_{j=1}^n f(x, ξ^j)] = min_{x∈X} E f(x, ξ) = z*,

and since the expectation of a minimum is at most the minimum of the expectations, we obtain

E z*_n = E [min_{x∈X} (1/n) ∑_{j=1}^n f(x, ξ^j)] ≤ min_{x∈X} E f(x, ξ) = z*.

Aside: Simple example when n = 1:

E min_{x∈X} f(x, ξ) ≤ min_{x∈X} E f(x, ξ)

Interpretation: We'll do better if we "wait and see" ξ's realization before choosing x.

Next, we show bias decreases monotonically: E z*_n ≤ E z*_{n+1}. Intuition...

Proof of Bias Monotonicity Result

E z*_{n+1} = E min_{x∈X} [(1/(n+1)) ∑_{i=1}^{n+1} f(x, ξ^i)]

= E min_{x∈X} (1/(n+1)) ∑_{i=1}^{n+1} [(1/n) ∑_{j=1, j≠i}^{n+1} f(x, ξ^j)]

≥ E (1/(n+1)) ∑_{i=1}^{n+1} min_{x∈X} (1/n) ∑_{j=1, j≠i}^{n+1} f(x, ξ^j)

= (1/(n+1)) ∑_{i=1}^{n+1} E min_{x∈X} (1/n) ∑_{j=1, j≠i}^{n+1} f(x, ξ^j)

= E z*_n

(The second line writes the sample mean of n+1 terms as the average of its n+1 leave-one-out sample means; the inequality holds because the minimum of an average is at least the average of the minima; the last equality holds because each leave-one-out sample consists of n observations iid as ξ.)
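The monotonicity result can also be seen numerically in the earlier toy example, where E z*_n = −√(2/(πn)) is available in closed form. A minimal sketch, illustration only:

```python
import numpy as np

rng = np.random.default_rng(2)
reps = 200_000
for n in (10, 20, 40, 80):
    zn = -np.abs(rng.normal(0.0, 1.0 / np.sqrt(n), reps))  # z*_n = -|N(0,1/n)|
    print(n, zn.mean(), -np.sqrt(2 / (np.pi * n)))  # Monte Carlo vs. exact E z*_n
```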

✓ Bias

2. Consistency: z*_n and x*_n

3. CLT

2. Consistency of z*_n

All you need to know: E f(x*, ξ) ≤ E f(x*_n, ξ) and f_n(x*_n) ≤ f_n(x*)

SAA: Consistency of z*_n

Theorem. Assume (A1), (A2), and the USLLN:

lim_{n→∞} sup_{x∈X} |f_n(x) − E f(x, ξ)| = 0, wp1.

Then, z*_n → z*, wp1.

Notes:

• Does not assume ξ^1, ξ^2, ..., ξ^n are iid

• Instead, assumes the uniform strong law of large numbers (USLLN)

• Important to realize: the USLLN implies the pointwise LLN,

lim_{n→∞} |f_n(x) − E f(x, ξ)| = 0, wp1, ∀x ∈ X,

but the converse is false. Think of our example with f_n(x) = ξ̄_n · x and X = ℝ.
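For the example, the distinction is easy to make concrete: sup_{x∈X} |f_n(x) − E f(x, ξ)| = |ξ̄_n| · sup_{x∈X} |x|, which vanishes on the compact set X = [−1, 1] but is infinite for every n on X = ℝ. A minimal sketch (illustration only) of the sup-norm error on X = [−1, 1]:

```python
import numpy as np

rng = np.random.default_rng(3)
for n in (10, 100, 1_000, 10_000):
    xibar = rng.standard_normal(n).mean()
    # for f(x, xi) = xi*x on X = [-1, 1], the sup-norm error equals |xibar_n|
    print(n, abs(xibar))  # shrinks like 1/sqrt(n); on X = R it would be infinite
```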

Proof of consistency of z*_n

|z*_n − z*| = |f_n(x*_n) − E f(x*, ξ)|

= max{ f_n(x*_n) − E f(x*, ξ), E f(x*, ξ) − f_n(x*_n) }

≤ max{ f_n(x*) − E f(x*, ξ), E f(x*_n, ξ) − f_n(x*_n) }

≤ max{ |f_n(x*) − E f(x*, ξ)|, |f_n(x*_n) − E f(x*_n, ξ)| }

≤ sup_{x∈X} |f_n(x) − E f(x, ξ)|

The first inequality uses f_n(x*_n) ≤ f_n(x*) in the first term and E f(x*, ξ) ≤ E f(x*_n, ξ) in the second. Taking n → ∞ completes the proof.

2. Consistency of x*_n

All you need to know: If g is continuous and lim_{k→∞} x_k = x, then lim_{k→∞} g(x_k) = g(x)

SAA: Consistency of x*_n

Theorem. Assume (A1), (A2), E f(·, ξ) is continuous, and the USLLN:

lim_{n→∞} sup_{x∈X} |f_n(x) − E f(x, ξ)| = 0, wp1.

Then, every limit point of {x*_n} solves (SP), wp1.

Notes:

• Assumes the USLLN rather than assuming ξ^1, ξ^2, ..., ξ^n are iid

• And assumes continuity of E f(·, ξ)

• The result doesn't say: lim_{n→∞} x*_n = x*, wp1. Why not?

Proof of consistency of x*_n

Let x̄ be a limit point of {x*_n}_{n=1}^∞ and let n ∈ N index a convergent subsequence. (Note such a limit point exists and x̄ ∈ X because X is compact.)

By the USLLN,

lim_{n→∞, n∈N} f_n(x*_n) = z*, wp1, where f_n(x*_n) ≡ z*_n,

and

|f_n(x*_n) − E f(x̄, ξ)| = |f_n(x*_n) − E f(x*_n, ξ) + E f(x*_n, ξ) − E f(x̄, ξ)|

≤ |f_n(x*_n) − E f(x*_n, ξ)| + |E f(x*_n, ξ) − E f(x̄, ξ)|

Taking n → ∞ for n ∈ N ...

The first term goes to zero by the USLLN, and the second goes to zero by continuity of E f(·, ξ), since x*_n → x̄ along N.

Thus, E f(x̄, ξ) = z*.

✓ Bias

✓− Consistency: z*_n and x*_n

• When does the USLLN hold?

• What if we have a stochastic MIP, where continuity doesn't make sense?

Sufficient Conditions for the USLLN

Fact.² Assume X is compact and assume:

• f(·, ξ) is continuous, wp1, on X

• ∃ g(ξ) satisfying sup_{x∈X} |f(x, ξ)| ≤ g(ξ), wp1, and E g(ξ) < ∞

• ξ^1, ξ^2, ..., ξ^n are iid as ξ.

Then, the USLLN holds.

²Facts are theorems that we won't prove.

Sufficient Conditions for the USLLN

Fact. Let X be compact and convex and assume:

• f(·, ξ) is convex and continuous, wp1, on X

• the LLN holds pointwise: lim_{n→∞} |f_n(x) − E f(x, ξ)| = 0, wp1, ∀x ∈ X.

Then, the USLLN holds.

SAA: Consistency of z*_n and x*_n under Finite X

Fact. Assume X is finite, and assume

lim_{n→∞} |f_n(x) − E f(x, ξ)| = 0, wp1, ∀x ∈ X.

Then, the USLLN holds, z*_n → z*, and every limit point of {x*_n} solves (SP), wp1.

Notes:

• E f(·, ξ) need not be continuous (which would be unnatural since the domain X is finite)

• Assumes the pointwise LLN rather than iid sampling

• Here, the pointwise LLN plus X finite implies the USLLN:

lim_{n→∞} sup_{x∈X} |f_n(x) − E f(x, ξ)| = 0, wp1

SAA: Consistency of z*_n and x*_n under LSC f(·, ξ)

Fact. Assume

• ξ^1, ξ^2, ..., ξ^n are iid as ξ

• f(·, ξ) is lower semicontinuous on X, ∀ξ

• ∃ g(ξ) satisfying inf_{x∈X} f(x, ξ) ≥ g(ξ), wp1, where E|g(ξ)| < ∞.

Then, z*_n → z*, wp1, and every limit point of {x*_n} solves (SP), wp1.

Notes:

• The proof relies on epi-convergence of f_n(x) to E f(x, ξ)

• Epi-convergence provides theory for approximation in optimization beyond SAA

• For f_n(x) convex and continuous on compact, convex X: epi-convergence ⇔ USLLN

• But epi-convergence provides a more general framework in the non-convex setting

• Epi-convergence can be viewed as precisely the relaxation of uniform convergence that yields the desired convergence results

[Slide reproduces the first page of: P. Kall, "Approximation to optimization problems: an elementary review," Mathematics of Operations Research, Vol. 11, No. 1, February 1986.]

Abstract: During the last two decades the concept of epi-convergence was introduced and then was used in various investigations in optimization and related areas. The aim of this review is to show in an elementary way how closely the arguments in the epi-convergence approach are related to those of the classical theory of convergence of functions.

From the introduction: Since solutions to the approximating problems are in many cases not unique, we cannot expect them to converge. Hence the only reasonable requirement for a meaningful approximating procedure is that every accumulation point of the approximate solutions be a solution of the original problem. To assure this, the classical type of assumption is uniform convergence of the approximating objectives on each compact subset of ℝⁿ, together with some continuity of the limiting objective.

✓ Bias

✓ Consistency: z*_n and x*_n

3. CLT

3. One-sided CLT for z*_n

All you need to know: the CLT for iidrvs and f_n(x*_n) ≤ f_n(x), ∀x ∈ X

SAA: Towards a CLT for z*_n

We have conditions under which z*_n − z* shrinks to zero.

Is √n the correct scaling factor, so that √n(z*_n − z*) converges to something nontrivial?

Notation:

f_n(x) = (1/n) ∑_{j=1}^n f(x, ξ^j)

σ²(x) = var[f(x, ξ)]

s²_n(x) = (1/(n−1)) ∑_{j=1}^n [f(x, ξ^j) − f_n(x)]²

X* is the set of optimal solutions to (SP)

z_α satisfies P(N(0, 1) ≤ z_α) = 1 − α

SAA: Towards a CLT for z*_n

z*_n = f_n(x*_n) ≤ f_n(x), wp1, ∀x ∈ X,

and so

(z*_n − z*)/(σ(x)/√n) ≤ (f_n(x) − z*)/(σ(x)/√n), wp1.

Let x* ∈ X* ⊂ X. Then,

P( (z*_n − z*)/(σ(x*)/√n) ≤ z_α ) ≥ P( (f_n(x*) − z*)/(σ(x*)/√n) ≤ z_α ).

By the CLT for iidrvs,

lim_{n→∞} P( (f_n(x*) − z*)/(σ(x*)/√n) ≤ z_α ) = 1 − α.

Thus ...

SAA: One-sided CLT for z*_n

Theorem. Assume a pointwise CLT:

lim_{n→∞} P( (f_n(x) − E f(x, ξ))/(σ(x)/√n) ≤ u ) = P(N(0, 1) ≤ u), ∀x ∈ X.

Let x* ∈ X*. Then,

lim inf_{n→∞} P( (z*_n − z*)/(σ(x*)/√n) ≤ z_α ) ≥ 1 − α.

Notes:

• (A3) and ξ^1, ξ^2, ..., ξ^n iid as ξ suffice for the pointwise CLT. Other possibilities, too

• For sufficiently large n, we infer that

P{ z*_n − z_α σ(x*)/√n ≤ z* } ≥ 1 − α

• Of course, we don't know σ(x*), and so this is practically useless. But ...

SAA: Towards (a better) CLT for z*_n

z*_n = f_n(x*_n) ≤ f_n(x), wp1, ∀x ∈ X,

and so

(z*_n − z*)/(s_n(x*_n)/√n) ≤ (f_n(x) − z*)/(s_n(x*_n)/√n), wp1.

Let x = x*_min ∈ arg min_{x∈X*} σ²(x). Then,

P( (z*_n − z*)/(s_n(x*_n)/√n) ≤ z_α ) ≥ P( (f_n(x*_min) − z*)/(s_n(x*_n)/√n) ≤ z_α )

= P( (f_n(x*_min) − z*)/(σ(x*_min)/√n) ≤ z_α [s_n(x*_n)/σ(x*_min)] ).

If z_α > 0 and lim inf_{n→∞} s_n(x*_n) ≥ inf_{x∈X*} σ(x), then ...

lim inf_{n→∞} P( (z*_n − z*)/(s_n(x*_n)/√n) ≤ z_α ) ≥ 1 − α

SAA: One-sided CLT for z*_n

Theorem. Assume

• (A1)-(A3)

• ξ^1, ξ^2, ..., ξ^n are iid as ξ

• inf_{x∈X*} σ²(x) ≤ lim inf_{n→∞} s²_n(x*_n) ≤ lim sup_{n→∞} s²_n(x*_n) ≤ sup_{x∈X*} σ²(x), wp1.

Then, given 0 < α < 1,

lim inf_{n→∞} P( (z*_n − z*)/(s_n(x*_n)/√n) ≤ z_α ) ≥ 1 − α.

Notes:

• Could have assumed a pointwise CLT instead

• For sufficiently large n, we infer that

P{ z*_n − z_α s_n(x*_n)/√n ≤ z* } ≥ 1 − α

• How does this relate to the bias result, E z*_n ≤ z*?
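In practice, this bound is used as a one-sided lower confidence bound z*_n − z_α s_n(x*_n)/√n on z*. A minimal sketch, assuming SciPy is available and that the array costs_at_xn (our name, not the slides') holds f(x*_n, ξ^j) for the n sampled scenarios:

```python
import numpy as np
from scipy.stats import norm

def lower_confidence_bound(costs_at_xn, alpha=0.05):
    """costs_at_xn[j] = f(x*_n, xi^j); their mean is z*_n = f_n(x*_n)."""
    n = len(costs_at_xn)
    zn_star = np.mean(costs_at_xn)
    sn = np.std(costs_at_xn, ddof=1)   # s_n(x*_n)
    z_alpha = norm.ppf(1 - alpha)      # P(N(0,1) <= z_alpha) = 1 - alpha
    return zn_star - z_alpha * sn / np.sqrt(n)
```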

✓ Bias

✓ Consistency: z*_n and x*_n

✓− CLT for z*_n

• Two-sided CLT for z*_n?

Two-sided CLT for z*_n

Fact. Assume

• (A1)-(A3)

• ξ^1, ξ^2, ..., ξ^n are iid as ξ

• |f(x₁, ξ) − f(x₂, ξ)| ≤ g(ξ)‖x₁ − x₂‖, ∀x₁, x₂ ∈ X, where E g²(ξ) < ∞.

If (SP) has a unique optimal solution x*, then √n(z*_n − z*) ⇒ N(0, σ²(x*)).

Notes:

• But there are frequently multiple optimal solutions ...

Two-sided CLT for z*_n

Fact. Assume

• (A1)-(A3)

• ξ^1, ξ^2, ..., ξ^n are iid as ξ

• |f(x₁, ξ) − f(x₂, ξ)| ≤ g(ξ)‖x₁ − x₂‖, ∀x₁, x₂ ∈ X, where E g²(ξ) < ∞.

Then, √n(z*_n − z*) ⇒ inf_{x∈X*} N(0, σ²(x)).

Notes:

• What is inf_{x∈X*} N(0, σ²(x))?

• √n(f_n(x) − E f(x, ξ)) ⇒ N(0, σ²(x))

• {N(0, σ²(x))}_{x∈X*} is a family of correlated normal random variables

• Recall the example: inf_{x∈X*} N(0, σ²(x)) = −|N(0, 1)|

• How does inf_{x∈X*} N(0, σ²(x)) relate to the bias result, E z*_n ≤ z*?

✓ Bias

✓ Consistency: z*_n and x*_n

✓ CLT for z*_n

3. CLT for x*_n

SAA: CLT for x*_n

Fact. Assume

• (A1)-(A3)

• f(·, ξ) is convex and twice continuously differentiable

• X = {x : Ax ≤ b}

• (SP) has a unique optimal solution x*

• (x₁ − x₂)ᵀH*(x₁ − x₂) > 0, ∀x₁, x₂ ∈ X, x₁ ≠ x₂, where H* = E∇²_x f(x*, ξ)

• ∇_x f(x, ξ) satisfies ‖∇_x f(x₁, ξ) − ∇_x f(x₂, ξ)‖ ≤ g(ξ)‖x₁ − x₂‖, ∀x₁, x₂ ∈ X, where E g²(ξ) < ∞ for some real-valued function g.

Then, √n(x*_n − x*) ⇒ u, where u solves the random QP:

min_u (1/2) uᵀH*u + cᵀu

s.t. A_i· u ≤ 0, i ∈ {i : A_i· x* = b_i}

uᵀ E∇_x f(x*, ξ) = 0,

and c is multivariate normal with mean 0 and covariance matrix Σ*, where Σ*_ij = cov(∂f(x*, ξ)/∂x_i, ∂f(x*, ξ)/∂x_j).

✓ Bias: z*_n

✓ Consistency: z*_n and x*_n

✓ CLT: z*_n and x*_n

SAA: Revisiting Possible Goals

1. x*_n → x*, wp1, and √n(x*_n − x*) ⇒ u, where u solves a random QP

2. z*_n → z*, wp1, and √n(z*_n − z*) ⇒ inf_{x∈X*} N(0, σ²(x))

3. E f(x*_n, ξ) → z*, wp1

4. lim_{n→∞} P(E f(x*_n, ξ) − z* ≤ ε_n) ≥ 1 − α, where ε_n → 0

• We now have conditions under which variants of 1-3 hold

• Let's next start by aiming for a more modest version of 4:

Given x ∈ X and α, find a (random) CI width ε with:

P(E f(x, ξ) − z* ≤ ε) ≳ 1 − α

An SAA Algorithm

Assessing Solution Quality: Towards an SAA Algorithm

z* = min_{x∈X} E f(x, ξ)

Goal: Given x ∈ X and α, find a (random) CI width ε with:

P(E f(x, ξ) − z* ≤ ε) ≳ 1 − α

Using the bias result,

E [ (1/n) ∑_{j=1}^n f(x, ξ^j) − min_{x′∈X} (1/n) ∑_{j=1}^n f(x′, ξ^j) ] ≥ E f(x, ξ) − z*,

where the bracketed quantity is the gap estimator G_n(x).

Remarks

• Anticipate var G_n(x) ≤ var[(1/n) ∑_{j=1}^n f(x, ξ^j)] + var z*_n

• G_n(x) ≥ 0, but not asymptotically normal (what to do?)

• Not much of an algorithm if the solution, x, comes as input!

An SAA Algorithm

Input: CI level 1 − α, sample sizes n_x and n, replication size n_g

Output: Solution x*_{n_x} and an approximate (1 − α)-level CI on E f(x*_{n_x}, ξ) − z*

0. Sample iid observations ξ^1, ξ^2, ..., ξ^{n_x}, and solve (SP_{n_x}) to obtain x*_{n_x}

1. For k = 1, 2, ..., n_g:

1.1. Sample iid observations ξ^{k1}, ξ^{k2}, ..., ξ^{kn} from the distribution of ξ

1.2. Solve (SP_n) using ξ^{k1}, ξ^{k2}, ..., ξ^{kn} to obtain x^{k*}_n

1.3. Calculate G^k_n(x*_{n_x}) = (1/n) ∑_{j=1}^n f(x*_{n_x}, ξ^{kj}) − (1/n) ∑_{j=1}^n f(x^{k*}_n, ξ^{kj})

2. Calculate the gap estimate and sample variance:

Ḡ_n(n_g) = (1/n_g) ∑_{k=1}^{n_g} G^k_n(x*_{n_x})  and  s²_G(n_g) = (1/(n_g − 1)) ∑_{k=1}^{n_g} ( G^k_n(x*_{n_x}) − Ḡ_n(n_g) )²

3. Let ε_g = t_{n_g−1, α} s_G(n_g)/√n_g, and output x*_{n_x} and the one-sided CI [0, Ḡ_n(n_g) + ε_g]
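Below is a minimal Python sketch of this algorithm. The callables sample_xi, solve_saa, and f are assumptions standing in for problem-specific code: sample_xi(m, rng) draws m scenarios, solve_saa(xis) returns an optimal solution of the SAA problem built from xis, and f(x, xi) evaluates the cost of decision x under scenario xi.

```python
import numpy as np
from scipy.stats import t

def saa_gap_ci(sample_xi, solve_saa, f, nx, n, ng, alpha=0.05, seed=None):
    rng = np.random.default_rng(seed)
    x_cand = solve_saa(sample_xi(nx, rng))    # Step 0: candidate solution x*_{n_x}
    gaps = np.empty(ng)
    for k in range(ng):                       # Step 1: n_g replications
        xis = sample_xi(n, rng)               # 1.1 fresh sample of size n
        x_k = solve_saa(xis)                  # 1.2 SAA solution x^{k*}_n
        # 1.3 gap estimate; both terms use the same scenarios
        gaps[k] = np.mean([f(x_cand, xi) - f(x_k, xi) for xi in xis])
    gbar, sg = gaps.mean(), gaps.std(ddof=1)  # Step 2: mean and sample std deviation
    eps_g = t.ppf(1 - alpha, ng - 1) * sg / np.sqrt(ng)   # Step 3
    return x_cand, (0.0, gbar + eps_g)        # one-sided CI on the optimality gap

# Toy usage on the earlier example min_{-1<=x<=1} E[xi x] with xi ~ N(0,1):
x, ci = saa_gap_ci(sample_xi=lambda m, r: r.standard_normal(m),
                   solve_saa=lambda xis: -1.0 if xis.mean() > 0 else 1.0,
                   f=lambda x, xi: x * xi,
                   nx=500, n=100, ng=15, seed=0)
```

Note that step 1.3 evaluates both terms on the same scenarios ξ^{k1}, ..., ξ^{kn}, i.e., with common random numbers; this is the variance-reduction device quantified in the table below.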

An SAA Algorithm

Input: CI level 1 − α, sample sizes n_x and n, replication size n_g

• Fix α = 0.05 and n_g = 15 (say)

• Choose n_x and n based on what is computationally reasonable

• Choose n_x > n, perhaps n_x ≫ n

Then,

• For fixed n and n_x, we can justify the algorithm with n_g → ∞

• For fixed n_g, we can justify the algorithm with n → ∞

• We can even use n_g = 1, albeit with a different variance estimator

An SAA Algorithm

Output: Solution x*_{n_x} and an approximate (1 − α)-level CI on E f(x*_{n_x}, ξ) − z*

• x*_{n_x} is the decision we will make

• The confidence interval is on x*_{n_x}'s optimality gap, E f(x*_{n_x}, ξ) − z*

• Here, E f(x*_{n_x}, ξ) = E_ξ[f(x*_{n_x}, ξ) | x*_{n_x}]

• So, this is a posterior assessment, given the decision we will make

An SAA Algorithm

0. Sample iid observations ξ^1, ξ^2, ..., ξ^{n_x}, and solve (SP_{n_x}) to obtain x*_{n_x}

• ξ^1, ξ^2, ..., ξ^{n_x} need not be iid

• Agnostic to the algorithm used to solve (SP_{n_x})

An SAA Algorithm

1. For k = 1, 2, ..., n_g:

1.1. Sample iid observations ξ^{k1}, ξ^{k2}, ..., ξ^{kn} from the distribution of ξ

1.2. Solve (SP_n) using ξ^{k1}, ξ^{k2}, ..., ξ^{kn} to obtain x^{k*}_n

1.3. Calculate G^k_n(x*_{n_x}) = (1/n) ∑_{j=1}^n f(x*_{n_x}, ξ^{kj}) − (1/n) ∑_{j=1}^n f(x^{k*}_n, ξ^{kj})

• ξ^{k1}, ξ^{k2}, ..., ξ^{kn} need not be iid, but should satisfy E f_n(x) = E f(x, ξ) (could use Latin hypercube sampling or randomized quasi-Monte Carlo sampling)

• (ξ^{k1}, ξ^{k2}, ..., ξ^{kn}), k = 1, 2, ..., n_g, should be iid

• Agnostic to the algorithm used to solve (SP_n)

• Can solve a relaxation of (SP_n) if a lower bound is used in the second term of 1.3 (recall the E f_n(x) ≤ E f(x, ξ) relaxation in the bias result)

• Can also use independent samples and different sample sizes, n_u and n_ℓ, for the upper- and lower-bound estimators in step 1.3

An SAA Algorithm

2. Calculate the gap estimate and sample variance:

Ḡ_n(n_g) = (1/n_g) ∑_{k=1}^{n_g} G^k_n(x*_{n_x})  and  s²_G(n_g) = (1/(n_g − 1)) ∑_{k=1}^{n_g} ( G^k_n(x*_{n_x}) − Ḡ_n(n_g) )²

3. Let ε_g = t_{n_g−1, α} s_G(n_g)/√n_g, and output x*_{n_x} and the one-sided CI [0, Ḡ_n(n_g) + ε_g]

• Standard calculation of the sample mean and sample variance

• Standard calculation of a one-sided confidence interval for a nonnegative parameter

• Again, here the "parameter" is E f(x*_{n_x}, ξ) − z*

• The SAA Algorithm tends to be conservative, i.e., exhibit over-coverage

• Why?

SAA Algorithm Applied to a Few Two-Stage SLPs

Problem                          DB      WRPM    20TERM   SSN
n_x in (SP_{n_x}) for x*_{n_x}   50      50      50       2000
Optimality gap:
  n                              25      25      25       1000
  n_g                            30      30      30       30
  95% CI width                   0.2%    0.08%   0.5%     8%
  × variance reduction           4300    480     1300     17

Variance reduction is with respect to an algorithm that estimates the upper and lower bounds defining G with independent, rather than common, random-number streams.

SAA Algorithm

Network Capacity Expansion Model (z* ≈ 8.3) (Higle & Sen)

[Figure: network with nodes A-E and arcs 1-7]

[Figures: optimality gap and sampling error, as a % of z*, versus n = n_x, shown at successively finer scales; note n = n_x]

[Figures: upper and lower bounds versus n, converging toward z* ≈ 8.3]

[Figure: log-log plot of the gap estimate versus n]

• If E G_n(x*_n) = a/n^p, then log[E G_n(x*_n)] = log[a] − p log[n]

• From these four points, p ≈ 0.74 with R² = 0.9998
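The rate fit is an ordinary least-squares regression on the log-log scale. A minimal sketch with hypothetical gap values (the numbers below are made up for illustration, not read from the actual experiments):

```python
import numpy as np

n_vals = np.array([10.0, 100.0, 1_000.0, 10_000.0])   # assumed sample sizes
gap_vals = np.array([0.9, 0.17, 0.03, 0.005])         # hypothetical gap estimates

# fit log(gap) = log(a) - p log(n) by least squares
slope, intercept = np.polyfit(np.log(n_vals), np.log(gap_vals), 1)
p, a = -slope, np.exp(intercept)
```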

SAA Algorithm

Network Capacity Expansion Model (z* ≈ 8.3) (Higle & Sen)

[Figure: network with nodes A-E and arcs 1-7]

• Enforce symmetry constraints: x₁ = x₆, x₂ = x₇, x₃ = x₅

SAA Algorithm

Network Capacity Expansion Model (z* ≈ 8.3) (Higle & Sen)

[Figures: the same plots side by side, no extra constraints vs. with symmetry constraints: optimality gap and sampling error (% of z*) versus n = n_x at successively finer scales; upper and lower bounds versus n; and log-log plots of the gap versus n]

• Fitting E G_n(x*_n) = a/n^p: no extra constraints, p ≈ 0.74 with R² = 0.999; with symmetry constraints, p ≈ 0.61 with R² = 0.986

• Rate worse, constant "a" better

If you are happy with your results from the SAA Algorithm, then stop now!

Why Are You Unhappy?

1. The computational effort to solve n_g = 15 instances of (SP_n) is prohibitive;

2. The bias of z*_n is large;

3. The sampling error, ε_g, is large; or,

4. The solution x*_{n_x} is far from optimal to (SP)

Remedy 1: Single-replication procedure: n_g = 1

Remedy 2: LHS, randomized QMC, adaptive jackknife estimator

Remedy 3: CRNs reduce variance. Other ideas help: LHS and randomized QMC

Remedy 4: A sequential SAA algorithm

A Sequential SAA Algorithm

Step 1: Generate a candidate solution

Step 2: Check the stopping criterion. If satisfied, stop. Else, go to Step 1

Instead of a single candidate solution x = x*_{n_x} ∈ X, we have a sequence {x_k} with each x_k ∈ X.

Stopping criterion rooted in the above procedure (with n_g = 1):

G_k ≡ G_{n_k}(x_k) = (1/n_k) ∑_{j=1}^{n_k} ( f(x_k, ξ^j) − f(x*_{n_k}, ξ^j) )

and

s²_k ≡ s²_{n_k}(x*_{n_k}) = (1/(n_k − 1)) ∑_{j=1}^{n_k} [ ( f(x_k, ξ^j) − f(x*_{n_k}, ξ^j) ) − ( f_{n_k}(x_k) − f_{n_k}(x*_{n_k}) ) ]²

A Sequential SAA Algorithm

Stopping criterion:

T = inf_{k≥1} { k : G_k ≤ h′ s_k }    (1)

Sample-size criterion:

n_k ≥ ( 1/(h − h′) )² ( c_{q,α} + 2q ln²k )    (2)

Fact. Consider the sequential sampling procedure in which the sample size is increased according to (2), and the procedure stops at iteration T according to (1). Then, under some regularity assumptions (including uniform integrability of a moment generating function),

lim inf_{h↓h′} P( E f(x_T, ξ) − z* ≤ h s_T ) ≥ 1 − α
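A minimal sketch of the sequential loop. The candidate generator, the gap/standard-deviation oracle, and the constants h, h′, q, and c_{q,α} are all assumed placeholders, chosen only to make the sketch self-contained:

```python
import numpy as np

def sequential_saa(candidate, gap_and_std, h=0.32, h_prime=0.30,
                   q=0.6, c_q_alpha=10.0, k_max=1_000):
    for k in range(1, k_max + 1):
        # sample-size rule (2): n_k >= (1/(h - h'))^2 (c_{q,alpha} + 2 q ln^2 k)
        n_k = int(np.ceil((c_q_alpha + 2 * q * np.log(k) ** 2) / (h - h_prime) ** 2))
        x_k = candidate(k)                 # Step 1: generate candidate solution x_k
        G_k, s_k = gap_and_std(x_k, n_k)   # gap estimate and its std dev (n_g = 1)
        if G_k <= h_prime * s_k:           # Step 2: stopping rule (1)
            return x_k, h * s_k            # gap <= h*s_T with confidence ~ 1 - alpha
    return x_k, h * s_k                    # fallback if the rule never triggers
```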

A Word (well, pictures) About Multi-Stage Stochastic Programming

What Does "Solution" Mean?

In the multistage setting, assessing solution quality means assessing policy quality.

One Family of Algorithms & SAA

Assume interstage independence, or dependence with special structure.

Stochastic dual dynamic programming (SDDP):

[Figures: (a) forward pass and (b) backward pass through the stagewise scenario tree]

Small Sampling of Things We Didn’t Talk About

• Non-iid sampling (well, we did a bit)

• Bias and variance reduction techniques (some brief allusion)

• Multi-stage SAA (in any detail)

• Large-deviation results, concentration-inequality results, finite-sample guarantees

• More generally, results with coefficients that are difficult to estimate

• SAA for expected-value constraints, including chance constraints

• SAA for other models, such as those with equilibrium constraints

• Results that exploit more specific special structure of f , ξ, and/or X

• Results that study interaction between an optimization algorithm and SAA

• Stochastic approximation, stochastic gradient descent, stochastic mirror descent, stochastic cutting-plane methods, stochastic dual dynamic programming ...

• Statistical testing of optimality conditions

• Results for risk measures not expressed as expected (dis)utility.

• Decision-dependent probability distributions

• Distributionally robust data-driven variants of SAA

Summary: SAA

• SAA

– Results for Monte Carlo estimators: no optimization

– What results should we want for SAA?

– Results for SAA

1. Bias

2. Consistency

3. CLT

• SAA Algorithm

– A basic algorithm

– A sequential algorithm

• Multi-Stage Problems

• What We Didn’t Discuss

Small Sampling of References

• Lagrange, Bernoulli, Euler, Laplace, Gauss, Edgeworth, Hotelling, Fisher ... (leading to maximum likelihood)

• H. Robbins and S. Monro, "A stochastic approximation method," Annals of Mathematical Statistics 22, 400-407, 1951.

• G. Dantzig and A. Madansky, "On the solution of two-stage linear programs under uncertainty," Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, 1961.

——————————————— Overviews and Tutorials ———————————————

• A. Shapiro, A. Ruszczynski, and D. Dentcheva, Lectures on Stochastic Programming: Modeling and Theory (Chapter 5, Statistical Inference), 2014.

• A. Shapiro, "Monte Carlo sampling methods." In A. Ruszczynski and A. Shapiro (editors), Stochastic Programming, Handbooks in Operations Research and Management Science, 2003.

• S. Kim, R. Pasupathy, and S. Henderson, "A guide to sample-average approximation." In Handbook of Simulation Optimization, edited by M. Fu, 2015.

• T. Homem-de-Mello and G. Bayraksan, "Monte Carlo sampling-based methods for stochastic optimization," Surveys in Operations Research and Management Science 19, 56-85, 2014.

• G. Bayraksan and D.P. Morton, "Assessing solution quality in stochastic programs via sampling," Tutorials in Operations Research, M.R. Oskoorouchi (ed.), 102-122, INFORMS, 2009.

——————————————— Further References ———————————————

• G. Bayraksan and D.P. Morton, "Assessing solution quality in stochastic programs," Mathematical Programming 108, 495-514, 2006.

• G. Bayraksan and D.P. Morton, "A sequential sampling procedure for stochastic programming," Operations Research 59, 898-913, 2011.

• J. Dupacova and R. Wets, "Asymptotic behavior of statistical estimators and of optimal solutions of stochastic optimization problems," The Annals of Statistics 16, 1517-1549, 1988.

• M. Freimer, J. Linderoth, and D. Thomas, "The impact of sampling methods on bias and variance in stochastic linear programs," Computational Optimization and Applications 51, 51-75, 2012.

• P. Glynn and G. Infanger, "Simulation-based confidence bounds for two-stage stochastic programs," Mathematical Programming 138, 15-42, 2013.

• J. Higle and S. Sen, "Stochastic decomposition: an algorithm for two-stage linear programs with recourse," Mathematics of Operations Research 16, 650-669, 1991.

• J. Higle and S. Sen, "Duality and statistical tests of optimality for two stage stochastic programs," Mathematical Programming 75, 257-275, 1996.

• T. Homem-de-Mello, "On rates of convergence for stochastic optimization problems under non-iid sampling," SIAM Journal on Optimization 19, 524-551, 2008.

• G. Infanger, "Monte Carlo (importance) sampling within a Benders decomposition algorithm for stochastic linear programs," Annals of Operations Research 39, 41-67, 1991.

• A. King and R. Rockafellar, "Asymptotic theory for solutions in statistical estimation and stochastic programming," Mathematics of Operations Research 18, 148-162, 1993.

• A. King and R. Wets, "Epiconsistency of convex stochastic programs," Stochastics 34, 83-92, 1991.

• A. Kleywegt, A. Shapiro, and T. Homem-de-Mello, "The sample average approximation method for stochastic discrete optimization," SIAM Journal on Optimization 12, 479-502, 2001.

• V. Kozmik and D.P. Morton, "Evaluating policies in risk-averse multi-stage stochastic programming," Mathematical Programming 152, 275-300, 2015.

• J. Luedtke and S. Ahmed, "A sample approximation approach for optimization with probabilistic constraints," SIAM Journal on Optimization 19, 674-699, 2008.

• J. Linderoth, A. Shapiro, and S. Wright, "The empirical behavior of sampling methods for stochastic programming," Annals of Operations Research 142, 215-241, 2006.

• W. Mak, D. Morton, and R. Wood, "Monte Carlo bounding techniques for determining solution quality in stochastic programs," Operations Research Letters 24, 47-56, 1999.

• B. Pagnoncelli, S. Ahmed, and A. Shapiro, "Sample average approximation method for chance constrained programming: theory and applications," Journal of Optimization Theory and Applications 142, 399-416, 2009.

• R. Pasupathy, "On choosing parameters in retrospective-approximation algorithms for stochastic root finding and simulation optimization," Operations Research 58, 889-901, 2010.

• J. Royset and R. Szechtman, "Optimal budget allocation for sample average approximation," Operations Research 61, 762-776, 2013.

Sorry for All the Acronyms (SAA)

• CI: Confidence Interval

• CLT: Central Limit Theorem

• CRN: Common Random Numbers

• DB: Donohue Birge test instance

• iid: independent and identically distributed

• iidrvs: iid random variables

• LHS: Latin Hypercube Sampling

• LLN: Law of Large Numbers

• LSC: Lower Semi-Continuous

• MIP: Mixed Integer Program

• QMC: Quasi Monte Carlo

• QP: Quadratic Program

• SAA: Sample Average Approximation

• SDDP: Stochastic Dual Dynamic Programming

• SLP: Stochastic Linear Program

• SSN: SONET Switched Network test instance. Or, Suvrajeet Sen's Network

• SONET: Synchronous Optical Networking

• USLLN: Uniform Strong LLN

• wp1: with probability one

• WRPM: West-coast Regional Planning Model

• 20TERM: 20 TERMinal test instance