DP can give complete quantitative solution


ECES 741: Stochastic Decision & Control Processes – Chapter 1: The DP Algorithm


DP can give complete quantitative solution

Example 1: Discrete, finite capacity, inventory control problem

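• $S_k = C_k = D_k = \{0, 1, 2\}$

• $x_k + u_k \le 2$ : finite capacity

• $x_{k+1} = \max(0,\, x_k + u_k - w_k)$ : no backlogging

• $x_k + u_k \le 2 \;\Rightarrow\; u_k \le 2 - x_k$, so $U(x_k) = \{0, \dots, 2 - x_k\}$

• $\mathrm{Prob}\{w_k = 0\} = 0.1$, $\mathrm{Prob}\{w_k = 1\} = 0.7$, $\mathrm{Prob}\{w_k = 2\} = 0.2$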


DP can give complete quantitative solution

Example 1 continued: Inventory control problem

• N = 3

• $g_N(x_N) = 0$

• $g_k(x_k, u_k, w_k) = u_k + 1\cdot\max(0,\, x_k + u_k - w_k) + 3\cdot\max(0,\, w_k - x_k - u_k)$ : order cost + holding cost + lost-demand cost
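The backward recursion for this example is small enough to run by hand or in a few lines of code. The following is a minimal sketch, not part of the original slides: it assumes the lost-demand term is $3\cdot\max(0, w_k - x_k - u_k)$ (demand exceeding available stock), and the names `stage_cost` and `solve` are illustrative.

```python
# Backward DP for the finite-capacity inventory example (N = 3).
N = 3
STATES = (0, 1, 2)
CAPACITY = 2
DEMAND = {0: 0.1, 1: 0.7, 2: 0.2}  # Prob{w_k = w}


def stage_cost(x, u, w):
    # order cost + 1 * holding + 3 * lost demand
    # (the lost-demand term is an assumption: demand in excess of x + u)
    return u + max(0, x + u - w) + 3 * max(0, w - x - u)


def solve():
    J = {N: {x: 0.0 for x in STATES}}  # terminal cost g_N = 0
    policy = {}
    for k in range(N - 1, -1, -1):
        J[k], policy[k] = {}, {}
        for x in STATES:
            best_u, best_cost = None, float("inf")
            for u in range(CAPACITY - x + 1):  # U(x) = {0, ..., 2 - x}
                expected = sum(
                    p * (stage_cost(x, u, w) + J[k + 1][max(0, x + u - w)])
                    for w, p in DEMAND.items()
                )
                if expected < best_cost:
                    best_u, best_cost = u, expected
            J[k][x], policy[k][x] = best_cost, best_u
    return J, policy


J, policy = solve()
for k in range(N):
    print(k, J[k], policy[k])
```

Under these assumptions the recursion yields both the cost-to-go table and the optimal order for every state and stage, which is the sense in which DP gives a complete quantitative solution.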


DP can give closed-form solution

Example 2: A gambling model

A gambler is going to bet in N successive plays. The gambler can bet any (nonnegative) amount up to his present fortune. What betting strategy maximizes his final fortune?

P(win) = p, P(lose) = 1 – p = q : Bernoulli

Solution: For convenience, and with no loss of generality, we look to maximize the log of the final fortune. The model is as follows.

• Utility of fortune: marginal utility ∝ 1 / wealth

U(x) = log(x) : also Bernoulli!


DP can give closed-form solution

Example 2 continued: Variable definitions

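• $x_k$ = fortune at the beginning of the $k$th play (after the outcome of the $(k-1)$th play, before the $k$th)

• $u_k$ = bet for the $k$th play as a percentage of $x_k$

• $w_k = \begin{cases} 1 & \text{: win w.p. } p \\ -1 & \text{: lose w.p. } q = 1 - p \end{cases}$

• $g_k(x_k, u_k, w_k) = 0, \quad 0 \le k \le N - 1$

• $g_N(x_N) = -\log(x_N)$ (negative sign: to maximize)

• $x_{k+1} = x_k + w_k u_k x_k$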


DP can give closed-form solution

Example 2 continued: DP algorithm for the problem

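$$
J_N(x_N) = \log(x_N)
$$

$$
\begin{aligned}
J_k(x_k) &= \max_{0 \le u_k \le 1} E_{w_k}\{0 + J_{k+1}(x_{k+1})\} \\
&= \max_{0 \le u_k \le 1} E_{w_k}\{J_{k+1}(x_k + w_k u_k x_k)\} \\
&= \max_{0 \le u_k \le 1} \{p\,J_{k+1}(x_k + u_k x_k) + q\,J_{k+1}(x_k - u_k x_k)\}, \qquad k = 0, 1, \dots, N - 1
\end{aligned}
$$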


DP can give closed-form solution

Example 2 continued: Solving the DP at k=N-1

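$$
\begin{aligned}
J_{N-1}(x_{N-1}) &= \max_{0 \le u_{N-1} \le 1} \{p\log(x_{N-1} + u_{N-1}x_{N-1}) + q\log(x_{N-1} - u_{N-1}x_{N-1})\} \\
&= \max_{0 \le u_{N-1} \le 1} \{p\log(1 + u_{N-1}) + q\log(1 - u_{N-1}) + \log(x_{N-1})\}
\end{aligned}
$$

Thus, setting the derivative with respect to $u_{N-1}$ to zero,

$$
\frac{p}{1 + u_{N-1}} - \frac{q}{1 - u_{N-1}} = 0
\;\Rightarrow\; (p - q) - u_{N-1}(p + q) = 0
\;\Rightarrow\; u^*_{N-1} = p - q
$$

: feasible iff $0 \le 2p - 1 \le 1 \iff \tfrac{1}{2} \le p \le 1$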


DP can give closed-form solution

Example 2 continued: Solving the DP at k=N-1

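$$
\begin{aligned}
J_{N-1}(x_{N-1}) &= \max_{0 \le u_{N-1} \le 1} \{p\log(x_{N-1} + u_{N-1}x_{N-1}) + q\log(x_{N-1} - u_{N-1}x_{N-1})\} \\
&= \max_{0 \le u_{N-1} \le 1} \{p\log(1 + u_{N-1}) + q\log(1 - u_{N-1}) + \log(x_{N-1})\}
\end{aligned}
$$

Thus,

• if $p = 1$ ($q = 0$), then $u^*_{N-1} = p - q = 1$ : bet it all!

• if $0 \le p < \tfrac{1}{2}$, then $u^*_{N-1} = 0$ ($p < q \Rightarrow q\log(1 - u_{N-1})$ dominates):

$$
p\log(1 + u_{N-1}) + q\log(1 - u_{N-1}) < q\log(1 - u_{N-1}^2) \le 0
$$

(consider $u_{N-1} = 1$ separately)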
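The single-stage result can be checked numerically. The sketch below is an illustration rather than part of the slides: it maximizes $p\log(1+u) + q\log(1-u)$ over a grid of fractions and compares the maximizer with $\max(0, p - q)$. The function name `best_fraction` and the grid size are assumptions, and $u = 1$ is left off the grid to avoid $\log(0)$.

```python
import math


def best_fraction(p, grid=10_000):
    """Grid maximizer of p*log(1+u) + q*log(1-u) over u in [0, 1)."""
    q = 1.0 - p
    candidates = [i / grid for i in range(grid)]  # u = 1 is excluded
    return max(candidates, key=lambda u: p * math.log(1 + u) + q * math.log(1 - u))


for p in (0.3, 0.5, 0.7, 0.9):
    # second column: grid maximizer, third column: closed form max(0, p - q)
    print(p, best_fraction(p), max(0.0, 2 * p - 1))
```

For win probabilities below one half the maximizer sits at the boundary $u = 0$, and above one half it tracks $p - q$, matching the case analysis.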


DP can give closed-form solution

Example 2 continued: Closed-form solution for k=N-1

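Hence, for $\tfrac{1}{2} \le p \le 1$ (with $u^*_{N-1} = p - q = 2p - 1$):

$$
\begin{aligned}
J_{N-1}(x_{N-1}) &= p\log(2p) + q\log(2 - 2p) + \log x_{N-1} \\
&= p\log 2 + p\log p + q\log 2 + q\log q + \log x_{N-1} \\
&= \log 2 + p\log p + q\log q + \log x_{N-1}
\end{aligned}
$$

and for $0 \le p \le \tfrac{1}{2}$ (with $u^*_{N-1} = 0$):

$$
J_{N-1}(x_{N-1}) = \log(1) + \log x_{N-1} = \log x_{N-1}
$$

Combining both cases:

$$
J_{N-1}(x_{N-1}) = \log x_{N-1} + C\cdot\mathbf{1}\!\left[\tfrac{1}{2} \le p \le 1\right], \qquad C = \log 2 + p\log p + q\log q, \quad C \ge 0
$$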


DP can give closed-form solution

Example 2 continued: Closed-form solution for k=N-1

Hence,

$$
u^*_{N-1} =
\begin{cases}
p - q, & \tfrac{1}{2} \le p \le 1 \\
0, & 0 \le p \le \tfrac{1}{2}
\end{cases}
$$

We can view these as constant functions (controls = percentage) or as feedback policies (total bet $= u^*_k x_k$).


DP can give closed-form solution

Example 2 continued: Solving the DP at k=N-2

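Proceeding one stage (play) back:

$$
\begin{aligned}
J_{N-2}(x_{N-2}) &= \max_{0 \le u_{N-2} \le 1} E_{w_{N-2}}\{0 + J_{N-1}(x_{N-2} + w_{N-2}u_{N-2}x_{N-2})\} \\
&= \max_{0 \le u_{N-2} \le 1} \{p\log(x_{N-2} + u_{N-2}x_{N-2}) + q\log(x_{N-2} - u_{N-2}x_{N-2})\} + C\cdot\mathbf{1}\!\left[\tfrac{1}{2} \le p \le 1\right]
\end{aligned}
$$

But except for the constant C, this is the same equation as for k = N – 1

⇒ the solution is the same, plus the constant C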


DP can give closed-form solution

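Example 2 continued: General closed-form DP solution

$$
J_{N-2}(x_{N-2}) =
\begin{cases}
\log x_{N-2} + 2C, & \tfrac{1}{2} \le p \le 1 \\
\log x_{N-2}, & 0 \le p \le \tfrac{1}{2}
\end{cases}
\qquad
u^*_{N-2} =
\begin{cases}
p - q, & \tfrac{1}{2} \le p \le 1 \\
0, & 0 \le p \le \tfrac{1}{2}
\end{cases}
$$

$$
J_{N-k}(x_{N-k}) =
\begin{cases}
\log x_{N-k} + kC, & \tfrac{1}{2} \le p \le 1 \\
\log x_{N-k}, & 0 \le p \le \tfrac{1}{2}
\end{cases}
\qquad
u^*_{N-k} =
\begin{cases}
p - q, & \tfrac{1}{2} \le p \le 1 \\
0, & 0 \le p \le \tfrac{1}{2}
\end{cases}
$$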
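As a sanity check on the general formula, the recursion can be evaluated by brute force for a short horizon. This is a sketch under stated assumptions, not the slides' method: the values `p = 0.7` and `N = 3`, the 0.05-spaced grid of betting fractions, and the constant $C = \log 2 + p\log p + q\log q$ are chosen for illustration, and the grid happens to contain the optimal fraction $p - q = 0.4$.

```python
import math

p, N = 0.7, 3                        # hypothetical win probability and horizon
q = 1.0 - p
GRID = [i / 20 for i in range(20)]   # fractions 0, 0.05, ..., 0.95 (u = 1 excluded)


def J(k, x):
    """Value of fortune x at play k, by direct recursion on the DP equation."""
    if k == N:
        return math.log(x)           # terminal reward J_N(x) = log(x)
    return max(
        p * J(k + 1, x * (1 + u)) + q * J(k + 1, x * (1 - u))
        for u in GRID
    )


C = math.log(2) + p * math.log(p) + q * math.log(q)
x0 = 1.0
print(J(0, x0), math.log(x0) + N * C)
```

The two printed numbers should agree up to rounding, since with N plays remaining the formula predicts $\log x_0 + NC$ when $p \ge \tfrac{1}{2}$.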


DP can be used to obtain qualitative properties (structure) of optimal solutions

Example 3: A stock option model

• $x_k$ : price of a given stock at the beginning of the $k$th day

• $x_{k+1} = x_k + w_k = x_0 + \sum_{l=0}^{k} w_l$ : Random Walk

• $\{w_k\}$ i.i.d., $w_k \sim F(\cdot)$, with mean $\int w\,dF(w)$


DP can be used to obtain qualitative properties (structure) of optimal solutions

Example 3 continued: A stock option model

• Actions: Have an option to buy one share of the stock at fixed price c; N days to exercise option. If you buy when stock’s price is s:

s – c = profit (can be negative)

What strategy maximizes profit?

Terminating Process (Bertsekas, Prob. 8, Ch. 1)


DP can be used to obtain qualitative properties (structure) of optimal solutions

Example 3 continued: Solution

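$$
u_k =
\begin{cases}
B \ (1) & \text{: buy} \\
DB \ (0) & \text{: don't buy}
\end{cases}
$$

$$
r_k(x_k, u_k, w_k) =
\begin{cases}
x_k - c; & u_k = B \\
0; & u_k = DB
\end{cases}
$$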


DP can be used to obtain qualitative properties (structure) of optimal solutions

Example 3 continued: Solution

However, the process terminates (see prob. 8, ch. 1) when $u_k = B$

⇒ introduce a fictitious termination state T s.t.

$$
x_{k+1} =
\begin{cases}
x_k + w_k; & u_k = DB,\ x_k \ne T \\
T; & u_k = DB,\ x_k = T \\
T; & u_k = B
\end{cases}
$$

mixed symbolic and numeric states ⇒ discrete event system


DP can be used to obtain qualitative properties (structure) of optimal solutions

Example 3 continued: Solution

Cost structure changed to:

$$
r_k(x_k, u_k, w_k) =
\begin{cases}
x_k - c; & x_k \ne T,\ u_k = B \\
0; & \text{otherwise}
\end{cases}
$$

There is no simple analytical solution for $J_k(x_k)$ or $u^*_k = \mu^*_k(x_k)$, but we can obtain some qualitative properties (structure) of solutions.


DP can be used to obtain qualitative properties (structure) of optimal solutions

Example 3 continued: DP algorithm for the problem

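$$
J_{N-1}(x_{N-1}) = \max\{\underbrace{x_{N-1} - c}_{u_{N-1} = B},\ \underbrace{0}_{u_{N-1} = DB}\}, \qquad x_N = T
$$

$$
J_k(x_k) = \max\Big\{\underbrace{x_k - c}_{u_k = B},\ \underbrace{\int J_{k+1}(x_k + w_k)\,dF(w_k)}_{u_k = DB}\Big\}
$$

where the integral is the expected “profit-to-go”.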


DP can be used to obtain qualitative properties (structure) of optimal solutions

Example 3 continued: Lemma (Ross)

(i) Jk+1(xk) – xk + c is decreasing in xk

after a certain value of stock price profit-to-go is negative buy none

(ii) Jk(xk) is increasing and continuous in xk (backward induction)

constant does not affect property


DP can be used to obtain qualitative properties (structure) of optimal solutions

Example 3 continued: Theorem (Ross)

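There exist numbers $s_1 \le s_2 \le \dots \le s_{N-k} \le \dots \le s_N$ such that

$$
u^*_{N-k}(x_{N-k}) =
\begin{cases}
B; & x_{N-k} \ge s_k \\
DB; & x_{N-k} < s_k
\end{cases}
$$

where,

$$
s_k = \min\{s : J_{N-k}(s) = s - c\}
$$

are the critical stock price values with k periods remaining.

These results can be used to solve the problem numerically, or to gain insight into the process.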
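The threshold structure can be seen numerically on a small instance. The sketch below is illustrative and rests on assumptions not in the slides: integer prices, a three-point step distribution `STEP` with negative mean standing in for F, and hypothetical values for `N` and `c`. It computes the smallest price at which buying is at least as good as waiting, for each number of periods remaining.

```python
from functools import lru_cache

N = 5                                # days to exercise (hypothetical)
c = 10                               # fixed option price (hypothetical)
STEP = {-1: 0.5, 0: 0.2, 1: 0.3}     # hypothetical F: Prob{w_k = w}, mean -0.2


@lru_cache(maxsize=None)
def J(k, x):
    """Profit-to-go at day k with price x, option not yet exercised."""
    if k == N:
        return 0.0                   # option expired
    wait = sum(p * J(k + 1, x + w) for w, p in STEP.items())
    return max(x - c, wait)          # buy now vs. don't buy


def threshold(periods_left):
    """Smallest price s at which buying is optimal with this many periods left."""
    k = N - periods_left
    for s in range(c, c + N + 1):    # buying below c is never optimal
        wait = sum(p * J(k + 1, s + w) for w, p in STEP.items())
        if s - c >= wait:
            return s
    raise ValueError("no threshold found in the search range")


thresholds = [threshold(j) for j in range(1, N + 1)]
print(thresholds)
```

On this instance the computed thresholds are nondecreasing in the number of periods remaining, which is the ordering the theorem asserts.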


DP for deterministic problems

Example 3 continued: Remark

For a deterministic situation, optimizing over policies (feedback) results in no advantage over optimizing over actions (sequences of controls/decisions)

Hence, the optimization problem can be solved using linear/nonlinear programming. Furthermore, for a finite state and action deterministic problem, we can equivalently formulate the problem as a shortest path problem for an acyclic graph.


DP for deterministic problems

Example 3 continued: Forward search

There are efficient ways to find the shortest path, e.g. branch-and-bound algorithms. However, DP has some advantages:

• always leads to global optimum

• can handle difficult constraint sets

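[Figure: staged shortest-path graph over k = 0, 1, 2, …, N-1, N. The start node at k = 0 connects to nodes 1, 2, 3 with arc costs c01, c02, c03; arcs between successive stages carry costs cij; zero-cost arcs lead from the last stage into an artificial end node at k = N.]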
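A staged graph of this kind can be solved by a backward pass from the artificial end node. The following sketch uses made-up arc costs (`start_costs` and `costs` are hypothetical, as are the function names) and checks the backward DP against exhaustive enumeration of all paths.

```python
from itertools import product

# Hypothetical arc costs: start_costs[i] is the cost from the start node to
# node i of the first stage; costs[k][i][j] is the cost from node i at one
# stage to node j at the next. Arcs into the artificial end node cost 0.
start_costs = [3, 1, 4]
costs = [
    [[2, 5, 1], [4, 1, 3], [3, 2, 6]],
    [[1, 4, 2], [2, 2, 5], [6, 1, 1]],
]


def backward_dp(start_costs, costs):
    n = len(start_costs)
    J = [0] * n                                  # cost-to-go at the last stage
    for stage in reversed(costs):
        J = [min(stage[i][j] + J[j] for j in range(n)) for i in range(n)]
    return min(start_costs[i] + J[i] for i in range(n))


def brute_force(start_costs, costs):
    n = len(start_costs)
    best = float("inf")
    for path in product(range(n), repeat=len(costs) + 1):
        total = start_costs[path[0]]
        total += sum(costs[k][path[k]][path[k + 1]] for k in range(len(costs)))
        best = min(best, total)
    return best


print(backward_dp(start_costs, costs), brute_force(start_costs, costs))
```

The backward pass visits each arc once, while enumeration grows exponentially with the number of stages; both return the same optimal cost because the graph is acyclic.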


DP can handle difficult constraint sets

Example 4: Integer-valued variables

Remark: reachable set from x0 = 1 is Z2

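$$
\min_{u_0, u_1} \left\{ \tfrac{1}{2}\left(u_0^2 + x_1^2 + u_1^2\right) \right\}
$$

: no cost at final stage N = 2

s.t.

$$
\begin{aligned}
& x_{k+1} = x_k + u_k, \quad k = 0, 1 \\
& x_k,\ u_k \in Z \\
& x_0 = 1,\ x_2 = 0 \quad \text{: boundary values}
\end{aligned}
$$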


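DP can handle difficult constraint sets

Example 4 continued: Solution

k = 2

$$
J_2(x_2) = 0
$$

k = 1

$$
J_1(x_1) = \min_{u_1 \in U_1(x_1)} \Big\{ \underbrace{\tfrac{1}{2}\left(x_1^2 + u_1^2\right)}_{\text{one-stage cost}} + \underbrace{0}_{J_2} \Big\}
$$

$$
U_1(x_1) = \{u_1 \in Z : x_1 + u_1 = 0\} \quad \text{: singleton}
$$

$$
u_1^* = \mu_1^*(x_1) = -x_1 \;\Rightarrow\; J_1(x_1) = x_1^2
$$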


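DP can handle difficult constraint sets

Example 4 continued: Solution

k = 0

$$
\begin{aligned}
J_0(x_0) &= \min_{u_0 \in U_0(x_0)} \left\{ \tfrac{1}{2}u_0^2 + J_1(x_1) \right\} \\
&= \min_{u_0 \in U_0(x_0)} \left\{ \tfrac{1}{2}u_0^2 + (x_0 + u_0)^2 \right\} \\
&= \min_{u_0 \in U_0(x_0)} \left\{ \tfrac{3}{2}u_0^2 + 2x_0u_0 + x_0^2 \right\}
\end{aligned}
$$

$x_0 = 1$:

$$
J_0(1) = \min_{u_0 \in Z} \left\{ \tfrac{3}{2}u_0^2 + 2u_0 + 1 \right\} = \tfrac{1}{2}, \qquad u_0^* = -1
$$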


DP can handle difficult constraint sets

Example 4 continued: Optimal Policy

k = 0: $u_0^* = \mu_0^*(1) = -1$, so $x_1^* = x_0 + u_0^* = 0$

$u_1^* = \mu_1^*(x_1^*) = -x_1^* = 0$, so $x_2 = 0$
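The integer constraint is easy to handle by direct enumeration. The sketch below assumes the cost $\tfrac{1}{2}(u_0^2 + x_1^2 + u_1^2)$ with $x_0 = 1$ and $x_2 = 0$, which is a reading of the garbled slide rather than a confirmed statement of the problem; the function name `solve_example4` and the search bound are illustrative.

```python
def solve_example4(x0=1, bound=5):
    """Enumerate integer u0; u1 is forced by the boundary condition x2 = 0."""
    best = None
    for u0 in range(-bound, bound + 1):
        x1 = x0 + u0
        u1 = -x1                       # singleton U_1(x_1) = {-x_1}
        cost = 0.5 * (u0**2 + x1**2 + u1**2)   # assumed cost function
        if best is None or cost < best[0]:
            best = (cost, u0, u1)
    return best


print(solve_example4())
```

The unconstrained minimizer of $\tfrac{3}{2}u_0^2 + 2u_0 + 1$ is $u_0 = -\tfrac{2}{3}$, which is not an integer, so rounding arguments are needed in a continuous formulation, whereas the DP simply searches the admissible set.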