ECES 741: Stochastic Decision & Control Processes – Chapter 1: The DP Algorithm
DP can give complete quantitative solution
Example 1: Discrete, finite capacity, inventory control problem
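• $S_k = C_k = D_k = \{0, 1, 2\}$ (state, control, and disturbance spaces)
• $x_k + u_k \le 2$ : finite capacity, so $u_k \le 2 - x_k$ and $U(x_k) = \{0, \dots, 2 - x_k\}$
• $x_{k+1} = \max(0,\ x_k + u_k - w_k)$ : no backlogging
• Prob{$w_k = 0$} = 0.1, Prob{$w_k = 1$} = 0.7, Prob{$w_k = 2$} = 0.2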
Example 1 continued: Inventory control problem
• $N = 3$
• $g_N(x_N) = 0$
• $g_k(x_k, u_k, w_k) = u_k + 1 \cdot \max(0,\ x_k + u_k - w_k) + 3 \cdot \max(0,\ w_k - x_k - u_k)$
  (order cost + holding cost + lost-demand cost)
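The problem is small enough that the backward recursion can be run directly. The following is a minimal Python sketch, not the course's own code. It assumes the lost-demand term is max(0, w_k - x_k - u_k), that is, demand in excess of available stock; the names `stage_cost` and `solve` are illustrative.

```python
# Backward DP for the 3-stage inventory example (illustrative sketch).
N = 3
STATES = (0, 1, 2)
DEMAND = {0: 0.1, 1: 0.7, 2: 0.2}   # Prob{w_k = w}
CAPACITY = 2

def stage_cost(x, u, w):
    # order cost + holding cost + lost-demand cost (assumed form of g_k)
    return u + 1 * max(0, x + u - w) + 3 * max(0, w - x - u)

def solve():
    # J[N][x] = g_N(x) = 0; earlier stages are filled in backwards
    J = [[0.0] * len(STATES) for _ in range(N + 1)]
    policy = [[0] * len(STATES) for _ in range(N)]
    for k in range(N - 1, -1, -1):
        for x in STATES:
            best_u, best_val = None, float("inf")
            for u in range(CAPACITY - x + 1):      # U(x) = {0, ..., 2 - x}
                val = sum(
                    p * (stage_cost(x, u, w) + J[k + 1][max(0, x + u - w)])
                    for w, p in DEMAND.items()
                )
                if val < best_val:
                    best_u, best_val = u, val
            J[k][x], policy[k][x] = best_val, best_u
    return J, policy

J, policy = solve()
for k in range(N):
    print(k, [round(v, 3) for v in J[k]], policy[k])
```

Under these assumptions the computed policy is the same at every stage: order one unit when the stock is empty and nothing otherwise.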
DP can give closed-form solution
Example 2: A gambling model
A gambler is going to bet in N successive plays. The gambler can bet any (nonnegative) amount up to his present fortune. What betting strategy maximizes his final fortune?
P(win) = p, P(lose) = 1 – p = q : Bernoulli
Solution: For convenience, and without loss of generality, we look to maximize the log of the final fortune. The model is as follows.
• Marginal utility of fortune ∝ 1 / wealth, which gives U(x) = log(x) : also Bernoulli!
Example 2 continued: Variable definitions
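• $x_k$ = fortune at the beginning of the kth play (after the outcome of the (k – 1)th play, before the kth)
• $u_k$ = bet for the kth play as a fraction (percentage) of $x_k$, $0 \le u_k \le 1$
• $w_k = \begin{cases} 1 & \text{: win, w.p. } p \\ -1 & \text{: lose, w.p. } q = 1 - p \end{cases}$
• $x_{k+1} = x_k + w_k u_k x_k$
• $g_k(x_k, u_k, w_k) = 0$, $0 \le k \le N - 1$
• $g_N(x_N) = -\log(x_N)$ (minimizing this cost is the same as maximizing $\log(x_N)$)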
Example 2 continued: DP algorithm for the problem
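$$
\begin{aligned}
J_N(x_N) &= \log(x_N) \\
J_k(x_k) &= \max_{0 \le u_k \le 1} E_{w_k}\{0 + J_{k+1}(x_{k+1})\} \\
&= \max_{0 \le u_k \le 1} E_{w_k}\{J_{k+1}(x_k + w_k u_k x_k)\} \\
&= \max_{0 \le u_k \le 1} \{p\,J_{k+1}(x_k + u_k x_k) + q\,J_{k+1}(x_k - u_k x_k)\}, \qquad k = 0, 1, \dots, N-1
\end{aligned}
$$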
Example 2 continued: Solving the DP at k=N-1
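$$
\begin{aligned}
J_{N-1}(x_{N-1}) &= \max_{0 \le u_{N-1} \le 1} \{p \log(x_{N-1} + u_{N-1} x_{N-1}) + q \log(x_{N-1} - u_{N-1} x_{N-1})\} \\
&= \max_{0 \le u_{N-1} \le 1} \{p \log(1 + u_{N-1}) + q \log(1 - u_{N-1}) + \log(x_{N-1})\}
\end{aligned}
$$
Thus, setting the derivative with respect to $u_{N-1}$ to zero:
$$
\frac{p}{1 + u_{N-1}} - \frac{q}{1 - u_{N-1}} = 0
\;\Rightarrow\; p(1 - u_{N-1}) = q(1 + u_{N-1})
\;\Rightarrow\; u^*_{N-1} = p - q = 2p - 1
$$
$u^*_{N-1} = p - q$ is feasible iff $p \ge q$: $\ 0 \le 2p - 1 \le 1 \iff \tfrac{1}{2} \le p \le 1$.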
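Example 2 continued: Solving the DP at k=N-1 (special cases)
With $J_{N-1}(x_{N-1}) = \log(x_{N-1}) + \max_{0 \le u_{N-1} \le 1} \{p \log(1 + u_{N-1}) + q \log(1 - u_{N-1})\}$:
• if $p = 1$ ($q = 0$): $u^*_{N-1} = p - q = 1$ : bet it all!
• if $0 \le p < \tfrac{1}{2}$, then $u^*_{N-1} = 0$ ($p < q \Rightarrow q \log(1 - u_{N-1})$ dominates). For $0 < u_{N-1} < 1$:
$$
p \log(1 + u_{N-1}) + q \log(1 - u_{N-1}) < q \log(1 - u_{N-1}^2) \le 0,
$$
whereas $u_{N-1} = 0$ gives exactly 0.
• consider $u_{N-1} = 1$ separately: for $q > 0$ the term $q \log(1 - u_{N-1})$ tends to $-\infty$.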
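The stationary point and the two special cases can be checked numerically. The sketch below is an illustration only: it maximizes p log(1+u) + q log(1-u) over a uniform grid on [0, 1] and compares the maximizer with max(0, 2p - 1). The function names and the grid size are assumptions made for this example.

```python
import math

def stage_objective(u, p):
    """p*log(1+u) + q*log(1-u): the part of J_{N-1} that depends on the bet fraction u."""
    q = 1.0 - p
    win = p * math.log(1.0 + u)
    if q == 0.0:
        return win                 # no losing branch when p = 1
    if u >= 1.0:
        return -math.inf           # q*log(0): betting everything risks ruin
    return win + q * math.log(1.0 - u)

def best_fraction(p, grid=1000):
    """Grid search for the maximizing bet fraction on [0, 1]."""
    candidates = [i / grid for i in range(grid + 1)]
    return max(candidates, key=lambda u: stage_objective(u, p))

for p in (0.3, 0.5, 0.6, 0.75, 1.0):
    print(p, best_fraction(p), max(0.0, 2 * p - 1))
```

A grid search is used instead of a root finder so that the boundary cases (p < 1/2 and p = 1) are handled by the same code path.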
Example 2 continued: Closed-form solution for k=N-1
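Hence,
$$
J_{N-1}(x_{N-1}) =
\begin{cases}
p \log(2p) + q \log(2 - 2p) + \log(x_{N-1}) = \log(x_{N-1}) + \log 2 + p \log p + q \log q, & \tfrac{1}{2} \le p \le 1 \\
\log(1) + \log(x_{N-1}) = \log(x_{N-1}), & 0 \le p < \tfrac{1}{2}
\end{cases}
$$
$$
J_{N-1}(x_{N-1}) = \log(x_{N-1}) + C \cdot 1[\tfrac{1}{2} \le p \le 1], \qquad C = \log 2 + p \log p + q \log q \ge 0
$$
(with the convention $0 \log 0 = 0$ at $p = 1$).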
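Example 2 continued: Closed-form solution for k=N-1 (optimal control)
Hence,
$$
u^*_{N-1} =
\begin{cases}
p - q, & \tfrac{1}{2} \le p \le 1 \\
0, & 0 \le p < \tfrac{1}{2}
\end{cases}
$$
These can be viewed as constant functions (controls = percentage of fortune) or as feedback policies (total bet $\mu^*_k(x_k) = u^*_k x_k$).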
Example 2 continued: Solving the DP at k=N-2
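Proceeding one stage (play) back:
$$
\begin{aligned}
J_{N-2}(x_{N-2}) &= \max_{0 \le u_{N-2} \le 1} E_{w_{N-2}}\{0 + J_{N-1}(x_{N-2} + w_{N-2} u_{N-2} x_{N-2})\} \\
&= \max_{0 \le u_{N-2} \le 1} \{p \log(x_{N-2} + u_{N-2} x_{N-2}) + q \log(x_{N-2} - u_{N-2} x_{N-2})\} + C \cdot 1[\tfrac{1}{2} \le p \le 1]
\end{aligned}
$$
But except for the constant C, this is the same equation as for k = N – 1
⇒ the solution is the same, plus the constant C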
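Example 2 continued: General closed-form DP solution
$$
J_{N-2}(x_{N-2}) =
\begin{cases}
\log(x_{N-2}) + 2C, & \tfrac{1}{2} \le p \le 1 \\
\log(x_{N-2}), & 0 \le p < \tfrac{1}{2}
\end{cases}
\qquad
u^*_{N-2} =
\begin{cases}
p - q, & \tfrac{1}{2} \le p \le 1 \\
0, & 0 \le p < \tfrac{1}{2}
\end{cases}
$$
In general, with k plays remaining:
$$
J_{N-k}(x_{N-k}) =
\begin{cases}
\log(x_{N-k}) + kC, & \tfrac{1}{2} \le p \le 1 \\
\log(x_{N-k}), & 0 \le p < \tfrac{1}{2}
\end{cases}
\qquad
u^*_{N-k} =
\begin{cases}
p - q, & \tfrac{1}{2} \le p \le 1 \\
0, & 0 \le p < \tfrac{1}{2}
\end{cases}
$$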
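One way to sanity-check the general formula is to substitute it into the DP recursion and confirm that both sides agree. The Python sketch below does this on a grid of bet fractions; it is an illustration under the closed form stated above, and the helper names (`constant_C`, `closed_form_J`, `bellman_rhs`) are made up for this example.

```python
import math

def constant_C(p):
    """C = log 2 + p log p + q log q, with 0*log(0) taken as 0."""
    q = 1.0 - p
    xlogx = lambda t: t * math.log(t) if t > 0.0 else 0.0
    return math.log(2.0) + xlogx(p) + xlogx(q)

def closed_form_J(x, k, p):
    """J_{N-k}(x) = log x + k*C if p >= 1/2, else log x."""
    return math.log(x) + (k * constant_C(p) if p >= 0.5 else 0.0)

def bellman_rhs(x, k, p, grid=2000):
    """Grid maximum of p*J_{N-k+1}(x + u x) + q*J_{N-k+1}(x - u x) over 0 <= u < 1."""
    q = 1.0 - p
    best = -math.inf
    for i in range(grid):          # u = 1 is skipped: x - u x = 0 has log = -inf
        u = i / grid
        val = (p * closed_form_J(x * (1 + u), k - 1, p)
               + q * closed_form_J(x * (1 - u), k - 1, p))
        best = max(best, val)
    return best

for p in (0.3, 0.6, 0.75):
    for k in (1, 2, 5):
        print(p, k, closed_form_J(100.0, k, p), bellman_rhs(100.0, k, p))
```

The test values of p are chosen so that the optimal fraction p - q lies on the grid; for other values the two columns agree only up to the grid resolution.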
DP can be used to obtain qualitative properties (structure) of optimal solutions
Example 3: A stock option model
• $x_k$ : price of a given stock at the beginning of the kth day
• $x_{k+1} = x_k + w_k = x_0 + \sum_{l=0}^{k} w_l$ : random walk
• $\{w_k\}$ i.i.d., $w_k \sim F(\cdot)$, with mean $E\{w_k\} = \int w \, dF(w)$
Example 3 continued: A stock option model
• Actions: Have an option to buy one share of the stock at fixed price c; N days to exercise option. If you buy when stock’s price is s:
s – c = profit (can be negative)
What strategy maximizes profit?
Terminating Process (Bertsekas, Prob. 8, Ch. 1)
Example 3 continued: Solution
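$$
u_k =
\begin{cases}
B \ (1) & \text{: buy} \\
DB \ (0) & \text{: don't buy}
\end{cases}
\qquad
r_k(x_k, u_k, w_k) =
\begin{cases}
x_k - c; & u_k = B \\
0; & u_k = DB
\end{cases}
$$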
Example 3 continued: Solution
However, the process terminates (see Prob. 8, Ch. 1) when $u_k = B$
⇒ introduce a fictitious termination state $T$ s.t.
$$
x_{k+1} =
\begin{cases}
T; & u_k = B \\
T; & u_k = DB,\ x_k = T \\
x_k + w_k; & u_k = DB,\ x_k \ne T
\end{cases}
$$
Mixed symbolic and numeric states ⇒ discrete event system
Example 3 continued: Solution
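Cost structure changed to:
$$
r_k(x_k, u_k, w_k) =
\begin{cases}
x_k - c; & u_k = B,\ x_k \ne T \\
0; & \text{otherwise}
\end{cases}
$$
that is, the earlier reward applies only while $x_k \ne T$.
There is no simple analytical solution for $J_k(x_k)$ or $u^*_k = \mu^*_k(x_k)$, but we can obtain some qualitative properties (structure) of solutions.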
Example 3 continued: DP algorithm for the problem
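$$
J_{N-1}(x_{N-1}) = \max\{\underbrace{x_{N-1} - c}_{u_{N-1} = B},\ \underbrace{0}_{u_{N-1} = DB}\}, \qquad x_N = T
$$
$$
J_k(x_k) = \max\Big\{\underbrace{x_k - c}_{u_k = B},\ \underbrace{\int J_{k+1}(x_k + w_k)\, dF(w_k)}_{u_k = DB:\ \text{expected “profit-to-go”}}\Big\}
$$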
Example 3 continued: Lemma (Ross)
(i) Jk+1(xk) – xk + c is decreasing in xk
after a certain value of stock price profit-to-go is negative buy none
(ii) Jk(xk) is increasing and continuous in xk (backward induction)
constant does not affect property
Example 3 continued: Theorem (Ross)
There exist numbers $s_1 \le s_2 \le \dots \le s_{N-k} \le \dots \le s_N$ such that
$$
u^*_{N-k} =
\begin{cases}
B; & x_{N-k} \ge s_k \\
DB; & x_{N-k} < s_k
\end{cases}
$$
where
$$
s_k = \min\{s : J_{N-k}(s) = s - c\}
$$
are the critical stock price values with k periods remaining.
These results can be used to solve the problem numerically, or to gain insight into the process.
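As an illustration of the numerical route, the Python sketch below computes the critical prices for a toy instance. Everything specific here is an assumption made for the example and does not come from the slides: integer prices, an exercise price c = 10, and a three-point step distribution with negative drift. The function `J(k, s)` is indexed by the number of periods remaining, so it plays the role of $J_{N-k}(s)$.

```python
from functools import lru_cache

C = 10                                  # exercise price c (hypothetical value)
STEPS = {-1: 0.5, 0: 0.2, 1: 0.3}       # assumed distribution F of w_k

@lru_cache(maxsize=None)
def J(k, s):
    """Maximal expected profit with k periods remaining and current price s."""
    if k == 0:
        return 0.0                      # option expired: nothing left to earn
    return max(s - C, continuation(k, s))

def continuation(k, s):
    """Expected profit-to-go if we do not buy now (u = DB)."""
    return sum(p * J(k - 1, s + w) for w, p in STEPS.items())

def threshold(k, lo=C - 20, hi=C + 20):
    """s_k: smallest price at which buying is optimal with k periods remaining."""
    for s in range(lo, hi + 1):
        if s - C >= continuation(k, s):
            return s
    return None

thresholds = [threshold(k) for k in range(1, 7)]
print(thresholds)
```

With one period remaining the threshold is simply c, since the alternative to buying is worth 0. The printed list can be checked against the ordering stated in the theorem.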
DP for deterministic problems
Example 3 continued: Remark
For a deterministic problem, optimizing over policies (feedback) offers no advantage over optimizing over sequences of controls/decisions (actions), because the state trajectory is fully determined by the initial state and the chosen controls.
Hence, the optimization problem can be solved using linear/nonlinear programming. Furthermore, for a finite-state, finite-action deterministic problem, we can equivalently formulate the problem as a shortest-path problem on an acyclic graph.
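The shortest-path view can be sketched in a few lines of Python. The graph below is hypothetical: three nodes per stage, made-up arc costs, and zero-cost arcs into an artificial end node. The backward sweep is the DP recursion specialized to a deterministic problem.

```python
# Backward DP over a staged (acyclic) graph; all arc costs are made up.
# stages[k][i][j] = cost of the arc from node i at stage k to node j at stage k+1.
stages = [
    {"start": {1: 2.0, 2: 5.0, 3: 1.0}},
    {1: {1: 4.0, 2: 1.0, 3: 6.0},
     2: {1: 2.0, 2: 3.0, 3: 2.0},
     3: {1: 7.0, 2: 4.0, 3: 3.0}},
    {1: {"end": 0.0}, 2: {"end": 0.0}, 3: {"end": 0.0}},   # artificial terminal arcs
]

def shortest_path(stages):
    cost_to_go = {"end": 0.0}
    best_next = []
    for layer in reversed(stages):
        new_cost, choice = {}, {}
        for node, arcs in layer.items():
            # pick the successor minimizing arc cost + cost-to-go
            nxt = min(arcs, key=lambda j: arcs[j] + cost_to_go[j])
            new_cost[node] = arcs[nxt] + cost_to_go[nxt]
            choice[node] = nxt
        cost_to_go = new_cost
        best_next.insert(0, choice)
    # recover the optimal path by following the stored choices forward
    path, node = ["start"], "start"
    for choice in best_next:
        node = choice[node]
        path.append(node)
    return cost_to_go["start"], path

cost, path = shortest_path(stages)
print(cost, path)
```

Because the problem is deterministic, the result is a single open-loop sequence of nodes; no feedback policy is needed to follow it.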
Example 3 continued: Forward search
There are efficient ways to find the shortest path, e.g., branch-and-bound algorithms. However, DP has some advantages:
• always leads to global optimum
• can handle difficult constraint sets
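[Figure: stage graph from a start node through stages k=0, k=1, k=2, ..., k=N-1 to an artificial End node at k=N. Arcs leaving the start node carry costs c01, c02, c03; arcs between successive stages carry costs cij; arcs into the End node have cost 0.]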
DP can handle difficult constraint sets
Example 4: Integer-valued variables
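$$
\min \{\tfrac{1}{2} u_0^2 + x_1^2 + \tfrac{1}{2} u_1^2\}
$$
s.t.
$$
x_{k+1} = x_k + u_k,\ k = 0, 1; \qquad x_k, u_k \in Z; \qquad x_0 = 1,\ x_2 = 0 \ \text{(boundary values)}
$$
$x_2 = 0$ : no cost at final stage N=2
Remark: the reachable set from $x_0 = 1$ is $Z$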
Example 4 continued: Solution
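k = 2
$$
J_2(x_2) = 0
$$
k = 1
$$
J_1(x_1) = \min_{u_1 \in U_1(x_1)} \{\underbrace{x_1^2 + \tfrac{1}{2} u_1^2}_{\text{one-stage cost}} + \underbrace{0}_{J_2}\}
$$
$$
U_1(x_1) = \{u_1 \in Z : x_1 + u_1 = 0\} \quad \text{: singleton}
$$
$$
u^*_1 = \mu^*_1(x_1) = -x_1 \;\Rightarrow\; J_1(x_1) = (1 + \tfrac{1}{2})\, x_1^2 = \tfrac{3}{2} x_1^2
$$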
Example 4 continued: Solution
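k = 0
$$
\begin{aligned}
J_0(x_0) &= \min_{u_0 \in U_0(x_0)} \{\tfrac{1}{2} u_0^2 + J_1(x_1)\} \\
&= \min_{u_0 \in U_0(x_0)} \{\tfrac{1}{2} u_0^2 + \tfrac{3}{2} (x_0 + u_0)^2\}
\end{aligned}
$$
$x_0 = 1$:
$$
J_0(1) = \min_{u_0 \in Z} \{\tfrac{1}{2} u_0^2 + \tfrac{3}{2} (1 + u_0)^2\} = \tfrac{1}{2}, \qquad u^*_0 = -1
$$
so that $x_1 = 0$ and $u^*_1 = 0$.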
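The integer constraint makes this problem easy to check by enumeration. The Python sketch below is an illustration that assumes the objective is (1/2)u_0^2 + x_1^2 + (1/2)u_1^2 with x_0 = 1 and x_2 = 0; the search window for u_0 and the function names are arbitrary choices for the example.

```python
def total_cost(u0, u1, x0=1):
    """Assumed objective: 0.5*u0^2 + x1^2 + 0.5*u1^2 with x1 = x0 + u0."""
    x1 = x0 + u0
    return 0.5 * u0**2 + x1**2 + 0.5 * u1**2

def solve(x0=1, search=range(-10, 11)):
    """Enumerate integer u0; u1 is forced by the terminal constraint x2 = 0."""
    best = None
    for u0 in search:
        x1 = x0 + u0
        u1 = -x1                     # U_1(x_1) is a singleton: x1 + u1 = 0
        cost = total_cost(u0, u1, x0)
        if best is None or cost < best[0]:
            best = (cost, u0, u1)
    return best

print(solve())
```

Dropping the integer constraint would put the stage-0 minimizer at u_0 = -3/4, which is not admissible here; the enumeration only ever evaluates integer controls.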