DP can give complete quantitative solution

ECES 741: Stochastic Decision & Control Processes – Chapter 1: The DP Algorithm


Example 1: Discrete, finite capacity, inventory control problem

• Sk = Ck = Dk = {0, 1, 2}

• xk + uk ≤ 2 : finite capacity, so uk ≤ 2 – xk and U(xk) = {0, …, 2 – xk}

• xk+1 = max(0, xk + uk – wk) : no backlogging

• Prob{wk=0}=0.1, Prob{wk=1}=0.7, Prob{wk=2}=0.2

Example 1 continued: Inventory control problem

• N = 3

• gN(xN) = 0

• gk(xk, uk, wk) = uk + 1∙max(0, xk + uk – wk) + 3∙max(0, wk – xk – uk), where the three terms are the order cost, the holding cost, and the lost-demand cost
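As a rough illustration of how the DP recursion yields the complete quantitative solution for Example 1, the Python sketch below runs the backward recursion over every state and stage. It assumes the lost-demand term is max(0, wk – xk – uk); the names `stage_cost` and `solve_inventory` are hypothetical and do not come from the slides.

```python
N = 3                               # horizon from the slide
STATES = (0, 1, 2)                  # stock levels, capacity 2
DEMAND = {0: 0.1, 1: 0.7, 2: 0.2}   # Prob{w_k = w}

def stage_cost(x, u, w):
    # order cost + holding cost + lost-demand penalty
    # (assumption: lost demand is max(0, w - x - u))
    return u + 1 * max(0, x + u - w) + 3 * max(0, w - x - u)

def solve_inventory():
    """Backward DP: returns cost-to-go tables J[k][x] and policy[k][x]."""
    J = [dict() for _ in range(N + 1)]
    policy = [dict() for _ in range(N)]
    for x in STATES:
        J[N][x] = 0.0                       # terminal cost g_N = 0
    for k in range(N - 1, -1, -1):
        for x in STATES:
            best_u, best_val = None, float("inf")
            for u in range(0, 2 - x + 1):   # U(x) = {0, ..., 2 - x}
                val = sum(
                    p * (stage_cost(x, u, w) + J[k + 1][max(0, x + u - w)])
                    for w, p in DEMAND.items()
                )
                if val < best_val:
                    best_u, best_val = u, val
            J[k][x] = best_val
            policy[k][x] = best_u
    return J, policy

J, policy = solve_inventory()
for k in range(N):
    print(k, {x: (round(J[k][x], 3), policy[k][x]) for x in STATES})
```

Each printed row lists, for stage k, the pair (cost-to-go, optimal order) for every stock level, which is the full table a hand computation of this example would produce.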



DP can give closed-form solution

Example 2: A gambling model

A gambler is going to bet in N successive plays. The gambler can bet any (nonnegative) amount up to his present fortune. What betting strategy maximizes his final fortune?

P(win) = p, P(lose) = 1 – p = q : Bernoulli

Solution: For convenience, and with no loss in generality, we look to maximize the log of the final fortune. The model is as follows.

• Marginal utility of fortune ∝ 1 / wealth

U(x) = log(x) : also Bernoulli!


Example 2 continued: Variable definitions

• xk = fortune at beginning of kth play (after outcome of (k – 1)th play, before kth)

• uk = bet for kth play as a percentage (fraction) of xk, 0 ≤ uk ≤ 1

• wk = 1 : win w.p. p; wk = –1 : lose w.p. q = 1 – p

• xk+1 = xk + wk uk xk

• gk(xk, uk, wk) = 0, 0 ≤ k ≤ N – 1

• gN(xN) = –log(xN) : minimizing this terminal cost is the same as maximizing log(xN)


Example 2 continued: DP algorithm for the problem

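\[
\begin{aligned}
J_N(x_N) &= \log(x_N) \\
J_k(x_k) &= \max_{0 \le u_k \le 1} E_{w_k}\{\, 0 + J_{k+1}(x_{k+1}) \,\} \\
&= \max_{0 \le u_k \le 1} E_{w_k}\{\, J_{k+1}(x_k + w_k u_k x_k) \,\} \\
&= \max_{0 \le u_k \le 1} \{\, p\, J_{k+1}(x_k + u_k x_k) + q\, J_{k+1}(x_k - u_k x_k) \,\}, \qquad k = 0, 1, \ldots, N-1
\end{aligned}
\]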


Example 2 continued: Solving the DP at k=N-1

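\[
\begin{aligned}
J_{N-1}(x_{N-1}) &= \max_{0 \le u_{N-1} \le 1} \{\, p \log(x_{N-1} + u_{N-1} x_{N-1}) + q \log(x_{N-1} - u_{N-1} x_{N-1}) \,\} \\
&= \max_{0 \le u_{N-1} \le 1} \{\, p \log(1 + u_{N-1}) + q \log(1 - u_{N-1}) + \log(x_{N-1}) \,\}
\end{aligned}
\]

Thus, setting the derivative with respect to uN-1 to zero,

\[
\frac{p}{1 + u_{N-1}} - \frac{q}{1 - u_{N-1}} = 0
\;\Rightarrow\; p - q = (p + q)\, u_{N-1} = u_{N-1}
\;\Rightarrow\; u^*_{N-1} = p - q,
\]

\[
\text{feasible iff } 0 \le p - q \le 1: \quad 0 \le 2p - 1 \le 1 \;\Leftrightarrow\; \tfrac{1}{2} \le p \le 1.
\]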



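Cases for the maximizing uN-1 in JN-1(xN-1) = max over 0 ≤ uN-1 ≤ 1 of {p log(1 + uN-1) + q log(1 – uN-1) + log(xN-1)}:

• if ½ ≤ p < 1, then u*N-1 = p – q

• if p = 1 (q = 0), then u*N-1 = 1 : bet it all! (consider uN-1 = 1 separately, because log(1 – uN-1) is unbounded below there and the term drops out only when q = 0)

• if 0 ≤ p < ½, then u*N-1 = 0 (p < q ⇒ q log(1 – uN-1) dominates): for 0 < uN-1 < 1,

p log(1 + uN-1) + q log(1 – uN-1) < q log(1 + uN-1) + q log(1 – uN-1) = q log(1 – u²N-1) ≤ 0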
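The case analysis can be checked numerically. The sketch below is only an illustration: it grid-searches the u-dependent part of the objective, p log(1 + u) + q log(1 – u), and compares the maximizer with max(0, p – q). The function names and the grid resolution are assumptions made for this example.

```python
import math

def objective(u, p):
    """p*log(1+u) + q*log(1-u), the part of J_{N-1} that depends on u."""
    q = 1.0 - p
    if u == 1.0:
        # log(1-u) is -infinity at u = 1; the term vanishes only when q = 0
        return math.log(2.0) if q == 0.0 else -math.inf
    return p * math.log(1.0 + u) + q * math.log(1.0 - u)

def best_fraction(p, steps=1000):
    """Grid search over u in {0, 1/steps, ..., 1}; first maximizer wins ties."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda u: objective(u, p))

for p in (0.3, 0.5, 0.7, 1.0):
    # second column: grid maximizer, third column: max(0, p - q)
    print(p, best_fraction(p), max(0.0, 2 * p - 1))
```

The grid maximizer agrees with the analytic rule up to the grid spacing, including the boundary cases p < ½ (bet nothing) and p = 1 (bet everything).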




Example 2 continued: Closed-form solution for k=N-1

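Hence, for ½ ≤ p ≤ 1 (u*N-1 = p – q, so 1 + u*N-1 = 2p and 1 – u*N-1 = 2 – 2p),

\[
\begin{aligned}
J_{N-1}(x_{N-1}) &= p \log(2p) + q \log(2 - 2p) + \log x_{N-1} \\
&= p \log p + q \log q + \log 2 + \log x_{N-1}
\end{aligned}
\]

and in general

\[
\begin{aligned}
J_{N-1}(x_{N-1}) &= \begin{cases}
\log x_{N-1} + \log 2 + p \log p + q \log q, & \tfrac{1}{2} \le p \le 1 \\
\log(1) + \log x_{N-1} = \log x_{N-1}, & 0 \le p < \tfrac{1}{2}
\end{cases} \\
&= \log x_{N-1} + C \cdot 1[\tfrac{1}{2} \le p \le 1], \qquad C = \log 2 + p \log p + q \log q \ge 0
\end{aligned}
\]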


Example 2 continued: Closed-form solution for k=N-1

Hence,

\[
u^*_{N-1} = \begin{cases} p - q, & \tfrac{1}{2} \le p \le 1 \\ 0, & 0 \le p < \tfrac{1}{2} \end{cases}
\]

We can view these as constant functions (controls = percentage) or as feedback policies (total bet μ*k = u*k xk).



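Proceeding one stage (play) back:

\[
\begin{aligned}
J_{N-2}(x_{N-2}) &= \max_{0 \le u_{N-2} \le 1} E_{w_{N-2}}\{\, 0 + J_{N-1}(x_{N-2} + w_{N-2} u_{N-2} x_{N-2}) \,\} \\
&= \max_{0 \le u_{N-2} \le 1} \{\, p \log(x_{N-2} + u_{N-2} x_{N-2}) + q \log(x_{N-2} - u_{N-2} x_{N-2}) \,\} + C \cdot 1[\tfrac{1}{2} \le p \le 1]
\end{aligned}
\]

But except for the constant C, this is the same equation as for k = N – 1 ⇒ the solution is the same, plus the constant C.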


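Example 2 continued: General closed-form DP solution

\[
J_{N-2}(x_{N-2}) = \begin{cases} \log x_{N-2} + 2C, & \tfrac{1}{2} \le p \le 1 \\ \log x_{N-2}, & 0 \le p < \tfrac{1}{2} \end{cases}
\qquad
u^*_{N-2} = \begin{cases} p - q, & \tfrac{1}{2} \le p \le 1 \\ 0, & 0 \le p < \tfrac{1}{2} \end{cases}
\]

In general, with k plays remaining,

\[
J_{N-k}(x_{N-k}) = \begin{cases} \log x_{N-k} + kC, & \tfrac{1}{2} \le p \le 1 \\ \log x_{N-k}, & 0 \le p < \tfrac{1}{2} \end{cases}
\qquad
u^*_{N-k} = \begin{cases} p - q, & \tfrac{1}{2} \le p \le 1 \\ 0, & 0 \le p < \tfrac{1}{2} \end{cases}
\]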
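A small sanity check of the closed form J(x) = log x + kC with k plays remaining: the sketch below evaluates the expected log of the final fortune under the constant-fraction policy u* = max(0, p – q) by enumerating win/lose outcomes, and compares it with log x + kC. The function names are hypothetical, and 0·log 0 is taken as 0 so that p = 1 is handled.

```python
import math

def xlogx(t):
    # convention 0 * log(0) = 0, needed when p = 1 (q = 0)
    return 0.0 if t == 0.0 else t * math.log(t)

def growth_constant(p):
    """C = log 2 + p log p + q log q for p >= 1/2, and 0 otherwise."""
    q = 1.0 - p
    return math.log(2.0) + xlogx(p) + xlogx(q) if p >= 0.5 else 0.0

def optimal_fraction(p):
    return max(0.0, p - (1.0 - p))

def expected_log_fortune(x, k, p):
    """E[log x_N] from fortune x with k plays left, betting u* each play."""
    if k == 0:
        return math.log(x)
    q, u = 1.0 - p, optimal_fraction(p)
    total = 0.0
    if p > 0.0:   # skip zero-probability branches (avoids log(0))
        total += p * expected_log_fortune(x * (1.0 + u), k - 1, p)
    if q > 0.0:
        total += q * expected_log_fortune(x * (1.0 - u), k - 1, p)
    return total

x, k = 100.0, 5
for p in (0.3, 0.7, 1.0):
    print(p, expected_log_fortune(x, k, p), math.log(x) + k * growth_constant(p))
```

The two printed columns should agree to floating-point accuracy, since each play adds p log(1 + u*) + q log(1 – u*) = C to the expected log fortune when p ≥ ½.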



DP can be used to obtain qualitative properties (structure) of optimal solutions

Example 3: A stock option model

• xk : price of a given stock at beginning of kth day

• xk+1 = xk + wk = x0 + Σ(l = 0 to k) wl : Random Walk

• {wk} i.i.d., wk ~ F(·)

• mean of wk: ∫ w dF(w)


Example 3 continued: A stock option model

• Actions: Have an option to buy one share of the stock at fixed price c; N days to exercise option. If you buy when stock’s price is s:

s – c = profit (can be negative)

What strategy maximizes profit?

Terminating Process (Bertsekas, Prob. 8, Ch. 1)




Example 3 continued: Solution

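\[
u_k = \begin{cases} B\ (1) & \text{: buy} \\ DB\ (0) & \text{: don't buy} \end{cases}
\qquad
r_k(x_k, u_k, w_k) = \begin{cases} x_k - c, & u_k = B \\ 0, & u_k = DB \end{cases}
\]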



However, the process terminates (see Prob. 8, Ch. 1) when uk = B ⇒ introduce a fictitious termination state T s.t.

\[
x_{k+1} = \begin{cases}
x_k + w_k, & u_k = DB,\ x_k \ne T \\
T, & u_k = DB,\ x_k = T \\
T, & u_k = B
\end{cases}
\]

Mixed symbolic and numeric states ⇒ discrete event system.



Cost structure changed to:

\[
\bar{r}_k(x_k, u_k, w_k) = \begin{cases} r_k(x_k, u_k, w_k), & x_k \ne T \\ 0, & \text{otherwise} \end{cases}
\]

There is no simple analytical solution for Jk(xk) or u*k = μ*k(xk), but we can obtain some qualitative properties (structure) of solutions.


Example 3 continued: DP algorithm for the problem

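\[
\begin{aligned}
J_{N-1}(x_{N-1}) &= \max\{\, \underbrace{x_{N-1} - c}_{u_{N-1} = B},\ \underbrace{0}_{u_{N-1} = DB} \,\}, \qquad x_N = T \\
J_k(x_k) &= \max\{\, \underbrace{x_k - c}_{u_k = B},\ \underbrace{\textstyle\int J_{k+1}(x_k + w_k)\, dF(w_k)}_{u_k = DB:\ \text{expected ``profit-to-go''}} \,\}
\end{aligned}
\]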
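To make the recursion concrete, here is a minimal Python sketch on integer prices. Everything numeric in it is an assumption chosen for illustration: the strike c = 10 and a two-point law for wk (–1 w.p. 3/5, +1 w.p. 2/5, so the mean is negative). The names `profit_to_go` and `critical_price` are hypothetical. Exact fractions are used so that the comparison J(s) = s – c is not affected by rounding.

```python
from fractions import Fraction
from functools import lru_cache

C = 10                                            # strike price c (illustrative)
STEP = {-1: Fraction(3, 5), 1: Fraction(2, 5)}    # assumed law F of w_k

@lru_cache(maxsize=None)
def profit_to_go(n, x):
    """Optimal expected profit with n exercise days left and current price x."""
    if n == 0:
        return Fraction(0)                        # option expired unexercised
    buy = Fraction(x - C)                         # u = B: immediate profit
    wait = sum(p * profit_to_go(n - 1, x + w) for w, p in STEP.items())  # u = DB
    return max(buy, wait)

def critical_price(n):
    """Smallest integer price s with J(s) = s - c when n days are left."""
    for s in range(C - 1, C + n + 2):
        if profit_to_go(n, s) == s - C:
            return s
    raise ValueError("no critical price in the search window")

thresholds = [critical_price(n) for n in range(1, 6)]
print(thresholds)
```

With n days left the sketch buys exactly when the price is at or above `critical_price(n)`; on the last day that price equals the strike c.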




Example 3 continued: Lemma (Ross)

(i) Jk+1(xk) – xk + c is decreasing in xk (the constant c does not affect the property)

⇒ above a certain stock price, the profit-to-go from waiting falls below the immediate profit xk – c ⇒ buy now

(ii) Jk(xk) is increasing and continuous in xk (backward induction)


Example 3 continued: Theorem (Ross)

There exist numbers s1 ≤ s2 ≤ … ≤ sN-k ≤ … ≤ sN such that

\[
u^*_{N-k} = \begin{cases} B, & x_{N-k} \ge s_k \\ DB, & x_{N-k} < s_k \end{cases}
\]

where, with k periods remaining, the critical stock price values are

\[
s_k = \min\{\, s : J_{N-k}(s) = s - c \,\}.
\]

These results can be used to solve the problem numerically, or to gain insight into the process.

DP for deterministic problems

Example 3 continued: Remark

For a deterministic situation, optimizing over policies (feedback) results in no advantage over optimizing over actions (sequences of controls/decisions)

Hence, the optimization problem can be solved using linear/nonlinear programming. Furthermore, for a finite state and action deterministic problem, we can equivalently formulate the problem as a shortest path problem for an acyclic graph.



Example 3 continued: Forward search

There are efficient ways to find a shortest path, e.g. branch-and-bound algorithms. However, DP has some advantages:

• always leads to global optimum

• can handle difficult constraint sets

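[Figure: staged shortest-path graph with stages k = 0, k = 1, k = 2, …, k = N – 1, k = N. The start node at k = 0 connects to nodes 1, 2, 3 at k = 1 through arcs of cost c01, c02, c03; arcs between later stages carry costs cij; arcs labeled 0 lead into the artificial End node at k = N.]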
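The shortest-path view of a finite deterministic problem can be sketched as a backward DP over a staged acyclic graph. The code below is an illustration only: the node names, the arc costs, and the helper names `shortest_path_dp` and `recover_path` are all made up for the example, with zero-cost arcs into an artificial end node.

```python
import math

def shortest_path_dp(stages, cost):
    """Backward DP on a staged acyclic graph.

    stages: list of lists of node names, one list per stage k = 0..N
    cost:   dict mapping an arc (i, j) between consecutive stages to c_ij
    Returns the cost-to-go J and the best successor of each node.
    """
    J = {node: 0.0 for node in stages[-1]}       # terminal stage costs nothing
    successor = {}
    for k in range(len(stages) - 2, -1, -1):     # k = N-1, ..., 0
        for i in stages[k]:
            best_j, best = None, math.inf
            for j in stages[k + 1]:
                if (i, j) in cost and cost[(i, j)] + J[j] < best:
                    best_j, best = j, cost[(i, j)] + J[j]
            J[i], successor[i] = best, best_j
    return J, successor

def recover_path(start, successor):
    """Follow best successors from the start node to the terminal stage."""
    path = [start]
    while successor.get(path[-1]) is not None:
        path.append(successor[path[-1]])
    return path

# Hypothetical data: three nodes at k = 1, two at k = 2, artificial end node.
stages = [["start"], ["1", "2", "3"], ["a", "b"], ["end"]]
cost = {
    ("start", "1"): 2, ("start", "2"): 1, ("start", "3"): 4,
    ("1", "a"): 1, ("1", "b"): 5,
    ("2", "a"): 3, ("2", "b"): 1,
    ("3", "a"): 1, ("3", "b"): 1,
    ("a", "end"): 0, ("b", "end"): 0,
}
J, successor = shortest_path_dp(stages, cost)
print(J["start"], recover_path("start", successor))
```

Because every node's cost-to-go is computed from all of its successors, the result is a global optimum for the given graph; infeasible arcs are handled simply by leaving them out of `cost`.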



DP can handle difficult constraint sets

Example 4: Integer-valued variables

Remark: reachable set from x0 = 1 is Z2

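\[
\begin{aligned}
\min_{u_0, u_1}\ & \Big\{ \tfrac{1}{2} \sum_{k=0}^{1} (x_k^2 + u_k^2) \Big\} \\
\text{s.t. } & x_{k+1} = x_k + u_k, \quad k = 0, 1 \\
& x_k,\ u_k \in \mathbb{Z} \\
& x_0 = 1,\ x_2 = 0 \quad \text{: boundary values}
\end{aligned}
\]

x2 = 0 : no cost at final stage N = 2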


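k = 2:

\[
J_2(x_2) = 0
\]

k = 1:

\[
\begin{aligned}
J_1(x_1) &= \min_{u_1 \in U_1(x_1)} \{\, \underbrace{\tfrac{1}{2}(x_1^2 + u_1^2)}_{\text{one-stage cost}} + \underbrace{0}_{J_2} \,\} \\
U_1(x_1) &= \{\, u_1 \in \mathbb{Z} : x_1 + u_1 = 0 \,\} \quad \text{(singleton)} \\
u^*_1 &= \mu^*_1(x_1) = -x_1, \qquad J_1(x_1) = \tfrac{1}{2}(x_1^2 + x_1^2) = x_1^2
\end{aligned}
\]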


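k = 0:

\[
\begin{aligned}
J_0(x_0) &= \min_{u_0 \in U_0(x_0)} \{\, \tfrac{1}{2}(x_0^2 + u_0^2) + J_1(x_1) \,\} \\
&= \min_{u_0 \in U_0(x_0)} \{\, \tfrac{1}{2}(x_0^2 + u_0^2) + (x_0 + u_0)^2 \,\} \\
&= \min_{u_0 \in U_0(x_0)} \{\, \tfrac{3}{2} x_0^2 + 2 x_0 u_0 + \tfrac{3}{2} u_0^2 \,\} \\
x_0 = 1: \quad J_0(1) &= \min_{u_0 \in \mathbb{Z}} \{\, \tfrac{3}{2} + 2 u_0 + \tfrac{3}{2} u_0^2 \,\} = 1, \qquad u^*_0 = -1
\end{aligned}
\]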

Example 4 continued: Optimal Policy

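k = 0:

\[
u^*_0 = \mu^*_0(1) = -1 \;\Rightarrow\; x_1 = x_0 + u^*_0 = 0
\]

k = 1:

\[
u^*_1 = \mu^*_1(x_1) = -x_1 = 0 \;\Rightarrow\; x_2 = x_1 + u^*_1 = 0
\]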
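The integer-constrained example is small enough to verify by enumeration. The sketch below assumes the stage cost ½(xk² + uk²) and the boundary values x0 = 1, x2 = 0; it replaces the unbounded set of integer controls by a finite search window, which is an assumption that is harmless here because the cost grows quadratically in u0. The function names are hypothetical.

```python
def stage_cost(x, u):
    # assumed one-stage cost (1/2)(x^2 + u^2)
    return 0.5 * (x * x + u * u)

def solve_integer_example(x0=1, search=range(-10, 11)):
    """Two-stage problem with x_{k+1} = x_k + u_k, integer u_k, and x_2 = 0.

    Returns (optimal cost, u0*, u1*).
    """
    def stage_one(x1):
        u1 = -x1                       # the only control that reaches x_2 = 0
        return stage_cost(x1, u1), u1  # J_1(x_1) = x_1^2

    best = None
    for u0 in search:                  # finite window standing in for u_0 in Z
        x1 = x0 + u0
        cost1, u1 = stage_one(x1)
        total = stage_cost(x0, u0) + cost1
        if best is None or total < best[0]:
            best = (total, u0, u1)
    return best

print(solve_integer_example())
```

The enumeration handles the integrality constraint directly, with no relaxation or rounding step, which is the point of the example: the continuous minimizer of the stage-0 objective is not an integer.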

