Planning in Factored Hybrid-MDPs
Milos Hauskrecht, Department of Computer Science, University of Pittsburgh
Branislav Kveton, Intelligent Systems Program, University of Pittsburgh
Carlos Guestrin, Department of Computer Science, CMU
Factored MDPs
• Limitations
  - The standard MDP formulation permits only flat discrete state and action spaces
  - Real-world problems are complex, factored, and involve continuous quantities
• Factored MDPs with discrete components
  - State and action spaces grow exponentially in the number of variables
  - Traditional methods for solving MDPs are polynomial in the size of the state and action spaces
• Factored MDPs with continuous components
  - State and action spaces are infinitely large
  - The optimal value function or policy may not have finite support
  - Naive discretization of continuous variables often leads to exponential complexity of solution methods
Hybrid Factored MDPs
• A hybrid factored MDP (HMDP) is a 4-tuple M = (X, A, P, R):
  - X = (X_D, X_C) is a vector of state variables (discrete or continuous)
  - A = (A_D, A_C) is a vector of action variables (discrete or continuous)
  - P and R are factored transition and reward models represented by a dynamic Bayesian network (DBN)
[Figure: DBN over two time slices, t and t + 1, with state variables X1, X2, X3, action variables A1, A2, A3, next-state variables X'1, X'2, X'3, and reward nodes R1, R2]
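The 4-tuple definition can be made concrete with a small container type. The following is only a sketch: the class names, the `parents` and `reward_scopes` fields, the discount value, and every edge in the toy instance are assumptions made for illustration, since the slide's DBN figure does not survive in this transcript.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Variable:
    name: str
    kind: str      # "discrete" or "continuous"
    domain: Tuple  # discrete: the allowed values; continuous: (low, high)


@dataclass
class HybridFactoredMDP:
    state_vars: List[Variable]            # X = (X_D, X_C)
    action_vars: List[Variable]           # A = (A_D, A_C)
    parents: Dict[str, List[str]]         # DBN edges: next-state variable -> parents at time t
    reward_scopes: Dict[str, List[str]]   # local reward R_j -> variables it depends on
    gamma: float = 0.95                   # discount factor (assumed value)

    def split(self, variables):
        """Partition variables into (discrete, continuous) name lists."""
        disc = [v.name for v in variables if v.kind == "discrete"]
        cont = [v.name for v in variables if v.kind == "continuous"]
        return disc, cont

    def max_parent_count(self):
        """Largest number of parents of any next-state variable."""
        return max(len(p) for p in self.parents.values())


# Hypothetical 3-variable instance; the edges below are illustrative only.
toy = HybridFactoredMDP(
    state_vars=[
        Variable("X1", "discrete", (0, 1)),
        Variable("X2", "continuous", (0.0, 1.0)),
        Variable("X3", "continuous", (0.0, 1.0)),
    ],
    action_vars=[Variable(f"A{i}", "discrete", (0, 1)) for i in (1, 2, 3)],
    parents={
        "X1'": ["X1", "A1"],
        "X2'": ["X1", "X2", "A2"],
        "X3'": ["X2", "X3", "A3"],
    },
    reward_scopes={"R1": ["X1", "X2"], "R2": ["X3"]},
)
```

The factored representation stays compact because each next-state variable lists only its few parents, even though the joint state space is hybrid and infinite.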
Optimal Value Function and Policy
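• Optimal policy π* maximizes the infinite-horizon discounted reward:
  $$E_{\pi}\left[\sum_{t=0}^{\infty} \gamma^{t} R(\mathbf{x}_t, \pi(\mathbf{x}_t))\right]$$
• Optimal value function V* is a fixed point of the Bellman equation:
  $$V^{*}(\mathbf{x}) = \sup_{\mathbf{a}} \left[ R(\mathbf{x}, \mathbf{a}) + \gamma \sum_{\mathbf{x}'_D} \int_{\mathbf{x}'_C} P(\mathbf{x}' \mid \mathbf{x}, \mathbf{a}) \, V^{*}(\mathbf{x}') \, d\mathbf{x}'_C \right]$$
• Linear value function approximation
  - A compact representation of an MDP does not guarantee the same structure in the optimal value function or policy
  - Approximation by a linear combination of |w| basis functions:
  $$V^{\mathbf{w}}(\mathbf{x}) = \sum_{i} w_i f_i(\mathbf{x})$$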
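The linear approximation is simply a weighted sum of basis function values. A minimal sketch follows, assuming monomial basis functions over individual state variables; the basis choice, the weights, and the helper names are all illustrative and not taken from the slides.

```python
def make_monomial(var_index, degree):
    """Basis function f(x) = x[var_index] ** degree (an assumed basis choice)."""
    def f(x):
        return x[var_index] ** degree
    return f


def linear_value(weights, basis, x):
    """Evaluate V^w(x) = sum_i w_i * f_i(x)."""
    return sum(w * f(x) for w, f in zip(weights, basis))


# Three basis functions: a constant, x[0], and x[1] squared.
basis = [lambda x: 1.0, make_monomial(0, 1), make_monomial(1, 2)]
weights = [0.5, 2.0, -1.0]  # hypothetical weights

value = linear_value(weights, basis, (0.5, 0.5))
```

Each basis function here depends on a single state variable, which is the kind of restricted scope that lets a factored solver avoid enumerating the full state space.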
Hybrid Approximate Linear Programming
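• Hybrid approximate linear programming (HALP) formulation:
  $$\text{minimize}_{\mathbf{w}} \; \sum_{i} w_i \alpha_i$$
  $$\text{subject to:} \; \sum_{i} w_i F_i(\mathbf{x}, \mathbf{a}) - R(\mathbf{x}, \mathbf{a}) \geq 0 \quad \forall \, \mathbf{x} \in \mathbf{X}, \, \mathbf{a} \in \mathbf{A}$$
• Basis function relevance weight:
  $$\alpha_i = \sum_{\mathbf{x}_D} \int_{\mathbf{x}_C} \psi(\mathbf{x}) \, f_i(\mathbf{x}) \, d\mathbf{x}_C$$
• Difference between the basis function f_i(x) and its discounted backprojection:
  $$F_i(\mathbf{x}, \mathbf{a}) = f_i(\mathbf{x}) - \gamma \sum_{\mathbf{x}'_D} \int_{\mathbf{x}'_C} P(\mathbf{x}' \mid \mathbf{x}, \mathbf{a}) \, f_i(\mathbf{x}') \, d\mathbf{x}'_C$$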
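To make the two integrals concrete, the sketch below evaluates the relevance weight and the backprojection difference numerically for one continuous state variable on [0, 1] and one binary action. Everything model-specific is an assumption: the transition density, the uniform relevance density ψ, the discount γ = 0.9, and the midpoint-rule quadrature stand in for whatever closed forms a real HALP implementation would use.

```python
def midpoint_integral(g, low=0.0, high=1.0, n=1000):
    """Approximate the integral of g over [low, high] with the midpoint rule."""
    h = (high - low) / n
    return h * sum(g(low + (k + 0.5) * h) for k in range(n))


def transition_density(x_next, x, a):
    """Hypothetical density on [0, 1]: action 1 shifts mass upward, action 0 downward."""
    c = (2 * a - 1) * x
    return 1.0 + c * (2.0 * x_next - 1.0)


def relevance_weight(f, psi=lambda x: 1.0):
    """alpha_i = integral of psi(x) * f_i(x); psi is uniform by assumption."""
    return midpoint_integral(lambda x: psi(x) * f(x))


def backprojection_difference(f, x, a, gamma=0.9):
    """F_i(x, a) = f_i(x) - gamma * E[f_i(x') | x, a]."""
    expected_next = midpoint_integral(
        lambda x_next: transition_density(x_next, x, a) * f(x_next)
    )
    return f(x) - gamma * expected_next


f = lambda x: x                      # basis function f(x) = x
alpha = relevance_weight(f)          # analytically 1/2
F = backprojection_difference(f, x=0.6, a=1)
# Analytically E[x'] = 1/2 + (2a - 1) * x / 6 = 0.6, so F = 0.6 - 0.9 * 0.6 = 0.06.
```

With discrete state components, the integral would be wrapped in a sum over the discrete values, matching the sum-then-integrate form of the definitions.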
HALP solutions
• The HALP formulation contains an infinite number of constraints:
  $$\sum_{i} w_i F_i(\mathbf{x}, \mathbf{a}) - R(\mathbf{x}, \mathbf{a}) \geq 0 \quad \forall \, \mathbf{x} \in \mathbf{X}, \, \mathbf{a} \in \mathbf{A}$$
• Methods:
  - Monte Carlo (Hauskrecht, Kveton, 2004):
      Sample the constraint space
  - ε-HALP (Guestrin, Hauskrecht, Kveton, 2004):
      Discretize the constraint space on a regular grid
      Take advantage of the structure
      Cutting plane methods on the discrete state space
      Exponential in the treewidth of the problem
  - Direct cutting plane methods (Kveton, Hauskrecht, 2005):
      Stochastic optimization for finding a maximally violated constraint
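The three ways of taming the infinite constraint set can be illustrated on a toy problem with one continuous state variable on [0, 1], a binary action, and the basis {1, x}. This is a rough sketch under stated assumptions: the closed-form F_i, the reward, and the discount are hypothetical, and plain random search stands in for the stochastic optimization used in the cited work. Solving the resulting finite LP is left to any LP solver.

```python
import random

GAMMA = 0.9  # assumed discount factor


def F(x, a):
    """Closed-form F_i(x, a) for a hypothetical 1-D model with basis {1, x}."""
    return [1.0 - GAMMA, x - GAMMA * (0.5 + (2 * a - 1) * x / 6.0)]


def R(x, a):
    """Hypothetical reward: prefer large x, small cost for action 1."""
    return x - 0.1 * a


def slack(w, x, a):
    """Constraint value sum_i w_i F_i(x, a) - R(x, a); negative means violated."""
    return sum(wi * Fi for wi, Fi in zip(w, F(x, a))) - R(x, a)


def sample_constraints(n, rng):
    """Monte Carlo: n rows (coefficients, rhs) meaning coefficients . w >= rhs."""
    rows = []
    for _ in range(n):
        x, a = rng.random(), rng.choice((0, 1))
        rows.append((F(x, a), R(x, a)))
    return rows


def grid_constraints(eps):
    """Regular grid with spacing eps over x in [0, 1], for every discrete action."""
    steps = int(round(1.0 / eps))
    return [(F(k * eps, a), R(k * eps, a)) for k in range(steps + 1) for a in (0, 1)]


def most_violated(w, n_trials, rng):
    """Random-search separation oracle: smallest slack found among n_trials."""
    best = None
    for _ in range(n_trials):
        x, a = rng.random(), rng.choice((0, 1))
        s = slack(w, x, a)
        if best is None or s < best[0]:
            best = (s, x, a)
    return best


rng = random.Random(0)
mc_rows = sample_constraints(5, rng)
grid_rows = grid_constraints(0.25)
worst_slack, worst_x, worst_a = most_violated([0.0, 0.0], 2000, rng)
```

A cutting plane loop would alternate between solving the LP over the constraints collected so far and calling the oracle with the new weights, stopping once the smallest slack found is non-negative.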
Scale-up potential
• Problems with over 25 state and 20 action variables
[Figure: Large irrigation network with action nodes A1 to A15 and state variables X1 to X17. Each irrigation channel is represented by a continuous variable and each regulation device by a discrete action node; the legend marks inflow and outflow regulation devices.]
Traffic Management
• Recent surveys show
  - Americans spend billions of hours in city traffic jams
  - Congestion has grown in urban areas of every size
  - Congestion has an increasing economic impact
  - The problem is too complex and growing rapidly
  - No technology has provided a definitive solution yet
• Our approach
  - Automatic traffic management to ease congestion
  - Traffic management exhibits local structure with long-term global consequences
Automatic Traffic Management
XF-1 – # cars on Forbes Ave between S Bouquet St and Bigelow Blvd
XF-2 – # cars on Forbes Ave between Bigelow Blvd and Schenley Dr
XF-3 – # cars on Forbes Ave between Schenley Dr and S Bellefield Ave
AF-1 – traffic lights at Forbes Ave and Bigelow Blvd
AF-2 – traffic lights at Forbes Ave and Schenley Dr
XB-1 – # cars on Bigelow Blvd between 5th Ave and Forbes Ave