(c) 2002-3, C. Boutilier, E. Hansen, D. Weld
Logistics
Reading for Wed
Project Meetings
Thanks to Craig Boutilier, Eric Hansen
Outline
BDDs & ADDs
MDP Review
BDD Definition
Defn
• A Binary Decision Diagram (BDD) is a directed acyclic graph with two terminal nodes (0-terminal, 1-terminal). Each non-terminal node has an index to identify an input variable of the Boolean function and has two outgoing edges, called the 0-edge and the 1-edge.
Why care?
• Compact representation of Boolean functions
• $f: \mathbb{B}^n \to \mathbb{B}$, where $\mathbb{B} = \{0, 1\}$
OBDD Definition
An OBDD (ordered BDD) is a BDD in which the input variables appear in the same fixed order on every path of the graph, and no variable appears more than once on a path.
Example: (x3 and x2) or not x1
[Figure: the binary decision tree for this function over x1, x2, x3, and the equivalent OBDD]
BDD reduction example
(x3 and x2) or not x1
[Figure: binary decision tree over x1, x2, x3, with 0- and 1-leaves]
BDD reduction example
(x3 and x2) or not x1
[Figure: binary decision diagram over x1, x2, x3, with terminals 0 and 1]
After ELIMINATION
BDD reduction example
(x3 and x2) or not x1
[Figure: binary decision diagram over x1, x2, x3, with terminals 0 and 1]
MERGING
BDD reduction example
(x3 and x2) or not x1
[Figure: binary decision diagram over x1, x2, x3, with terminals 0 and 1]
After MERGING
BDD reduction example
(x3 and x2) or not x1
[Figure: binary decision diagram over x1, x2, x3, with terminals 0 and 1]
MERGING
BDD reduction example
(x3 and x2) or not x1
[Figure: binary decision diagram over x1, x2, x3, with terminals 0 and 1]
After MERGING
BDD reduction example
(x3 and x2) or not x1
[Figure: binary decision diagram over x1, x2, x3, with terminals 0 and 1]
ELIMINATION
BDD reduction example
(x3 and x2) or not x1
[Figure: the binary decision diagram after the final elimination step, which is now the reduced OBDD from the earlier example]
After ELIMINATION
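The elimination and merging steps above can be sketched with a "unique table": a node is created only if its two children differ (elimination), and never if an identical (variable, low, high) node already exists (merging). This is a minimal Python sketch, assuming variables are numbered 0..n-1 in the fixed order x1, x2, x3; the class and method names (`BDD`, `mk`, `build`, `evaluate`) are hypothetical, and building by full Shannon expansion is exponential, so it is only meant for tiny examples.

```python
class BDD:
    """Minimal reduced ordered BDD (illustrative sketch). Node ids 0 and 1 are the terminals."""

    def __init__(self, num_vars):
        self.num_vars = num_vars
        self.nodes = [None, None]  # nodes[i] = (var, low, high) for internal nodes
        self.unique = {}           # (var, low, high) -> node id

    def mk(self, var, low, high):
        if low == high:            # ELIMINATION: the test on var is redundant
            return low
        key = (var, low, high)
        if key not in self.unique:  # MERGING: reuse an identical node if one exists
            self.unique[key] = len(self.nodes)
            self.nodes.append(key)
        return self.unique[key]

    def build(self, f, var=0, assignment=()):
        """Shannon-expand f (a function of num_vars 0/1 arguments) in variable order."""
        if var == self.num_vars:
            return 1 if f(*assignment) else 0
        low = self.build(f, var + 1, assignment + (0,))
        high = self.build(f, var + 1, assignment + (1,))
        return self.mk(var, low, high)

    def evaluate(self, node, assignment):
        """Follow 0-edges and 1-edges from node down to a terminal."""
        while node > 1:
            var, low, high = self.nodes[node]
            node = high if assignment[var] else low
        return node


bdd = BDD(3)
f = lambda x1, x2, x3: (x3 and x2) or not x1
root = bdd.build(f)
print(len(bdd.unique))  # 3 internal nodes (one each for x1, x2, x3) remain
```

Because nodes are shared through the unique table, building any equivalent formula, e.g. `lambda a, b, c: (not a) or (b and c)`, returns the same root id; this is the uniqueness property of reduced OBDDs.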
Unary and Binary Operations
Negation:
• Computing not f
• Just exchange 0-terminal and 1-terminal.
• Constant time
• No increase in size!
[Figure: binary operations on BDDs: the BDDs for x1 and x2 combine into the BDD for x1 and x2, which then combines with x3 into the BDD for (x1 and x2) or x3]
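Binary operations like these are commonly implemented by a memoized "apply" that walks both argument graphs together, splitting on whichever top variable comes first in the order, and rebuilds the result through the same elimination/merging table. This Python sketch assumes that approach; `BDD`, `apply`, `cofactors`, and the other names are hypothetical, not a specific library's API.

```python
import operator


class BDD:
    """Minimal ROBDD with a memoized apply (illustrative sketch). Ids 0 and 1 are terminals."""

    def __init__(self, num_vars):
        self.num_vars = num_vars
        self.nodes = [None, None]  # internal nodes stored as (var, low, high)
        self.unique = {}

    def mk(self, var, low, high):
        if low == high:
            return low
        key = (var, low, high)
        if key not in self.unique:
            self.unique[key] = len(self.nodes)
            self.nodes.append(key)
        return self.unique[key]

    def var(self, i):
        """BDD for the single variable x_i."""
        return self.mk(i, 0, 1)

    def top(self, u):
        """Variable tested at u; terminals sort after every variable."""
        return self.num_vars if u < 2 else self.nodes[u][0]

    def cofactors(self, u, i):
        """(low, high) children of u with respect to variable i."""
        if self.top(u) == i:
            return self.nodes[u][1], self.nodes[u][2]
        return u, u  # u does not test i, so both cofactors are u itself

    def apply(self, op, u, v, memo=None):
        """Combine two BDDs with a binary Boolean op by walking both graphs at once."""
        memo = {} if memo is None else memo
        if (u, v) in memo:
            return memo[(u, v)]
        if u < 2 and v < 2:
            result = int(op(bool(u), bool(v)))
        else:
            i = min(self.top(u), self.top(v))
            u0, u1 = self.cofactors(u, i)
            v0, v1 = self.cofactors(v, i)
            result = self.mk(i, self.apply(op, u0, v0, memo), self.apply(op, u1, v1, memo))
        memo[(u, v)] = result
        return result

    def evaluate(self, u, assignment):
        while u > 1:
            var, low, high = self.nodes[u]
            u = high if assignment[var] else low
        return u


bdd = BDD(3)
x1, x2, x3 = bdd.var(0), bdd.var(1), bdd.var(2)
g = bdd.apply(operator.and_, x1, x2)  # x1 and x2
h = bdd.apply(operator.or_, g, x3)    # (x1 and x2) or x3
```

The memo table means each pair of argument nodes is visited at most once, which is why apply runs in time polynomial in the sizes of its arguments rather than in the number of assignments.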
Restriction Operation
Set function argument xi to constant k (0 or 1):
$F[x_i = k] = F(x_1, \ldots, x_{i-1}, k, x_{i+1}, \ldots, x_n)$
$F_x$ equivalent to $F[x = 1]$
$F_{\bar{x}}$ equivalent to $F[x = 0]$
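Restriction can be sketched on the same node representation: every node testing $x_i$ is bypassed in favor of its k-edge, nodes below $x_i$ in the order are left untouched, and nodes above are rebuilt so the result is reduced again. The function F below is a hypothetical example (the slide's F is only given as a figure); it is chosen so that F[b=1] reduces to a diagram over c and d only, like the slide's result. All names are illustrative.

```python
class BDD:
    """Minimal ROBDD with restriction (illustrative sketch). Ids 0 and 1 are terminals."""

    def __init__(self, num_vars):
        self.num_vars = num_vars
        self.nodes = [None, None]  # internal nodes stored as (var, low, high)
        self.unique = {}

    def mk(self, var, low, high):
        if low == high:
            return low
        key = (var, low, high)
        if key not in self.unique:
            self.unique[key] = len(self.nodes)
            self.nodes.append(key)
        return self.unique[key]

    def build(self, f, var=0, assignment=()):
        """Shannon expansion of f in variable order (exponential; tiny examples only)."""
        if var == self.num_vars:
            return 1 if f(*assignment) else 0
        return self.mk(var,
                       self.build(f, var + 1, assignment + (0,)),
                       self.build(f, var + 1, assignment + (1,)))

    def restrict(self, u, var, k, memo=None):
        """Return the reduced BDD for F[x_var = k]."""
        memo = {} if memo is None else memo
        if u < 2:
            return u
        if u not in memo:
            i, low, high = self.nodes[u]
            if i == var:
                memo[u] = high if k else low  # bypass the node, keeping its k-edge
            elif i > var:
                memo[u] = u                   # var precedes i in the order, so it cannot occur below
            else:
                memo[u] = self.mk(i, self.restrict(low, var, k, memo),
                                  self.restrict(high, var, k, memo))
        return memo[u]

    def support(self, u):
        """Indices of the variables that appear in the BDD rooted at u."""
        seen, stack = set(), [u]
        while stack:
            n = stack.pop()
            if n > 1 and n not in seen:
                seen.add(n)
                stack.extend(self.nodes[n][1:])
        return {self.nodes[n][0] for n in seen}


A, B, C, D = range(4)  # variable order a, b, c, d
bdd = BDD(4)
F = bdd.build(lambda a, b, c, d: (a and not b) or (c and d))  # hypothetical F
G = bdd.restrict(F, B, 1)
print(sorted(bdd.support(G)))  # [2, 3]: only c and d remain after reduction
```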
Restriction Execution Example
[Figure: the argument F, an OBDD over a, b, c, d; the restriction F[b=1], over a, c, d; and the reduced result, over c, d]
Properties
Uniqueness
• With respect to each fixed variable order, the reduced OBDD of a Boolean function f is unique
• Is f satisfiable? (Yes iff its reduced OBDD is not the 0-terminal)
Operations
• Function =, apply, restrict, compose
  Could just merge and reduce; faster to walk both trees, building the new one
• Polynomial time in size of BDD
• Fast C library implementations
Popular
• Model checking … And now AI…
Size of BDDs
n-input Boolean functions: there are $2^{2^n}$ of them
• Require $2^n$ bits in the worst case
• Truth tables always require $2^n$ bits
Many practical functions require much less space in BDD representation.
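Good Ordering vs. Bad Ordering for $(a_1 \land b_1) \lor (a_2 \land b_2) \lor (a_3 \land b_3)$
[Figure: with the ordering a1, b1, a2, b2, a3, b3 the OBDD shows linear growth; with the ordering a1, a2, a3, b1, b2, b3 it shows exponential growth]
Finding an optimal ordering is NP-complete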
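The effect of ordering can be checked by counting internal nodes of the reduced OBDD under each order. This sketch builds the diagram by brute-force Shannon expansion with elimination and merging (fine for six variables, not for real problems); `robdd_size` and `pairs` are hypothetical helper names.

```python
def robdd_size(f, order):
    """Count internal nodes of the reduced OBDD of f under a variable order.

    f takes a dict {variable name: 0/1}; only tiny numbers of variables are practical."""
    unique = {}

    def mk(level, low, high):
        if low == high:  # elimination
            return low
        # merging: reuse the id of an identical node; new ids start at 2 (0/1 are terminals)
        return unique.setdefault((level, low, high), len(unique) + 2)

    def build(level, assignment):
        if level == len(order):
            return 1 if f(assignment) else 0
        name = order[level]
        low = build(level + 1, {**assignment, name: 0})
        high = build(level + 1, {**assignment, name: 1})
        return mk(level, low, high)

    build(0, {})
    return len(unique)


def pairs(n):
    """(a1 and b1) or ... or (an and bn)"""
    return lambda v: any(v[f"a{i}"] and v[f"b{i}"] for i in range(1, n + 1))


n = 3
good = [x for i in range(1, n + 1) for x in (f"a{i}", f"b{i}")]               # a1, b1, a2, b2, a3, b3
bad = [f"a{i}" for i in range(1, n + 1)] + [f"b{i}" for i in range(1, n + 1)]  # a1, a2, a3, b1, b2, b3
print(robdd_size(pairs(n), good), robdd_size(pairs(n), bad))  # 6 14
```

For n pairs the two orderings give 2n and $2^{n+1} - 2$ internal nodes respectively: under the bad order, every subset of the a's that are true leads to a distinct subfunction that must be remembered.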
Symbolic Manipulation with OBDDs
Strategy
• Represent data as set of OBDDs
  Identical variable orderings
• Express solution method as sequence of symbolic ops
  Sequence of constructor & query operations
• Implement each operation by OBDD manipulation
  Do all the work in the constructor operations
Key Algorithmic Properties
• Arguments are OBDDs with identical variable orderings
• Result is OBDD with same ordering
• Each step polynomial complexity (in |OBDD|)
Set Operations
[Figure: Union and Intersection of sets A and B]
Characteristic Functions
• $A \subseteq \{0,1\}^n$: a set of bit vectors of length n
• Represent set A as a Boolean function A of n variables: $X \in A$ if and only if $A(X) = 1$
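With characteristic functions, union becomes OR and intersection becomes AND of the two functions. The sketch below checks that correspondence by explicit enumeration; the sets A and B are hypothetical examples, and with OBDDs the same OR/AND would be computed symbolically on the diagrams (via apply) rather than by enumerating $\{0,1\}^n$.

```python
from itertools import product

n = 3
universe = list(product((0, 1), repeat=n))  # all bit vectors of length n


# Hypothetical example sets, given by their characteristic functions:
# X is in the set iff chi(X) == 1
def chi_A(x):
    return int(x[0] == 1)        # A: first bit is 1


def chi_B(x):
    return int(x[1] == x[2])     # B: last two bits agree


def chi_union(x):
    return chi_A(x) | chi_B(x)   # union corresponds to OR


def chi_intersection(x):
    return chi_A(x) & chi_B(x)   # intersection corresponds to AND


def as_set(chi):
    """Enumerate the set a characteristic function describes (for checking only)."""
    return {x for x in universe if chi(x) == 1}


print(sorted(as_set(chi_intersection)))  # [(1, 0, 0), (1, 1, 1)]
```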
Algebraic Decision Diagram (ADD)
Defn
• A directed acyclic graph with k terminal nodes, each a real number.
• Each non-terminal node has an index to identify an input variable of the Boolean function and has two outgoing edges, called the 0-edge and the 1-edge.
Why care?
• Compact representation of functions $f: \mathbb{B}^n \to \mathbb{R}$
Efficient operations
• Add, Multiply, Max, …
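An ADD can reuse the BDD machinery almost unchanged: leaves hold numbers instead of 0/1, and the same memoized apply combines two ADDs pointwise with any binary numeric operation. This is a minimal Python sketch; the `("leaf", value)` encoding, the class and method names, and the example functions f and g are assumptions for illustration.

```python
import operator


class ADD:
    """Minimal algebraic decision diagram (illustrative sketch): a BDD whose
    terminals hold real numbers instead of 0/1."""

    def __init__(self, num_vars):
        self.num_vars = num_vars
        self.nodes = []   # ("leaf", value) or (var, low, high)
        self.unique = {}  # shared table, so equal leaves and equal nodes are merged

    def _intern(self, key):
        if key not in self.unique:
            self.unique[key] = len(self.nodes)
            self.nodes.append(key)
        return self.unique[key]

    def leaf(self, value):
        return self._intern(("leaf", float(value)))

    def mk(self, var, low, high):
        return low if low == high else self._intern((var, low, high))

    def is_leaf(self, u):
        return self.nodes[u][0] == "leaf"

    def top(self, u):
        return self.num_vars if self.is_leaf(u) else self.nodes[u][0]

    def cofactors(self, u, i):
        if self.top(u) == i:
            return self.nodes[u][1], self.nodes[u][2]
        return u, u

    def apply(self, op, u, v, memo=None):
        """Pointwise op (e.g. +, *, max) of two ADDs."""
        memo = {} if memo is None else memo
        if (u, v) not in memo:
            if self.is_leaf(u) and self.is_leaf(v):
                memo[(u, v)] = self.leaf(op(self.nodes[u][1], self.nodes[v][1]))
            else:
                i = min(self.top(u), self.top(v))
                u0, u1 = self.cofactors(u, i)
                v0, v1 = self.cofactors(v, i)
                memo[(u, v)] = self.mk(i, self.apply(op, u0, v0, memo),
                                       self.apply(op, u1, v1, memo))
        return memo[(u, v)]

    def evaluate(self, u, assignment):
        while not self.is_leaf(u):
            var, low, high = self.nodes[u]
            u = high if assignment[var] else low
        return self.nodes[u][1]


add = ADD(2)
f = add.mk(0, add.leaf(0), add.leaf(10))  # f = 10 if x0 else 0
g = add.mk(1, add.leaf(-1), add.leaf(5))  # g = 5 if x1 else -1
total = add.apply(operator.add, f, g)     # f + g
best = add.apply(max, f, g)               # max(f, g)
```

Sums and maxima of this kind are exactly what a Bellman backup needs, which is why ADDs are a natural fit for representing value functions and rewards over Boolean state variables.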
Markov Decision Processes
An MDP has four components, S, A, R, Pr:
• (finite) state set S (|S| = n)
• (finite) action set A (|A| = m)
• transition function Pr(s,a,t)
  each Pr(s,a,-) is a distribution over S
  represented by a set of n x n stochastic matrices
• bounded, real-valued reward function R(s)
  represented by an n-vector
  can be generalized to include action costs: R(s,a)
  can be stochastic (but replaceable by its expectation)
Model easily generalizable to countable or continuous state and action spaces
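The explicit representation can be written down directly: one n x n stochastic matrix per action and an n-vector of rewards. The sketch below uses a tiny hypothetical MDP (all numbers invented for illustration) and a hypothetical helper `is_valid_mdp` that checks each Pr(s,a,-) is a probability distribution over S.

```python
# A tiny hypothetical MDP (all numbers invented for illustration).
S = ["s1", "s2", "s3"]
A = ["stay", "move"]

# Pr[a] is an n x n stochastic matrix: Pr[a][i][j] = Pr(S[i], a, S[j])
Pr = {
    "stay": [[0.9, 0.1, 0.0],
             [0.0, 1.0, 0.0],
             [0.0, 0.0, 1.0]],
    "move": [[0.1, 0.8, 0.1],
             [0.0, 0.2, 0.8],
             [0.5, 0.0, 0.5]],
}

# Reward function as an n-vector: R[i] = R(S[i])
R = [0.0, 1.0, 10.0]


def is_valid_mdp(S, A, Pr, R, tol=1e-9):
    """Check that every Pr(s, a, -) is a probability distribution over S."""
    n = len(S)
    if len(R) != n:
        return False
    for a in A:
        matrix = Pr[a]
        if len(matrix) != n:
            return False
        for row in matrix:
            if len(row) != n or any(p < 0 for p in row) or abs(sum(row) - 1.0) > tol:
                return False
    return True


print(is_valid_mdp(S, A, Pr, R))  # True
```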
System Dynamics
Finite State Space S
State s1013:
Loc = 236
Joe needs printout
Craig needs coffee
...
System Dynamics
Finite Action Space A
Pick up Printouts?
Go to Coffee Room?
Go to charger?
System Dynamics
Transition Probabilities: Pr(si, a, sj)
Prob. = 0.95
System Dynamics
Transition Probabilities: Pr(si, a, sk)
Prob. = 0.05
      s1    s2    ...   sn
s1    0.9   0.05  ...   0.0
s2    0.0   0.20  ...   0.1
...
sn    0.1   0.0   ...   0.0
Reward Process
Reward Function: R(si)
- action costs possible
Reward = -10
      R
s1    12
s2    0.5
...
sn    10
Graphical View of MDP
[Figure: time-sliced graph with state nodes St, St+1, St+2, action nodes At, At+1, and reward nodes Rt, Rt+1, Rt+2]
Assumptions
Markovian dynamics (history independence)
• Pr(St+1|At,St,At-1,St-1,..., S0) = Pr(St+1|At,St)
Markovian reward process
• Pr(Rt|At,St,At-1,St-1,..., S0) = Pr(Rt|At,St)
Stationary dynamics and reward
• Pr(St+1|At,St) = Pr(St’+1|At’,St’) for all t, t’
Full observability
• though we can’t predict what state we will reach when we execute an action, once it is realized, we know what it is
Policies
Nonstationary policy
• π: S x T → A
• π(s,t) is action to do at state s with t stages-to-go
Stationary policy
• π: S → A
• π(s) is action to do at state s (regardless of time)
• analogous to reactive or universal plan
These assume or have these properties:
• full observability
• history-independence
• deterministic action choice
Value Iteration (Bellman 1957)
Markov property allows exploitation of DP principle for optimal policy construction
• no need to enumerate $|A|^{Tn}$ possible policies
Value Iteration
$V^0(s) = R(s), \quad \forall s$
$V^k(s) = R(s) + \max_a \sum_{s'} \Pr(s, a, s') \, V^{k-1}(s')$   (Bellman backup)
$\pi^*(s, k) = \arg\max_a \sum_{s'} \Pr(s, a, s') \, V^{k-1}(s')$
$V^k$ is the optimal k-stage-to-go value function
Value Iteration
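[Figure: states s1, s2, s3, s4 unrolled over stages Vt+1, Vt, Vt-1, Vt-2; from s4, one action reaches s1 with probability 0.7 and s4 with 0.3, the other reaches s2 with 0.4 and s3 with 0.6]
$V^t(s_4) = R(s_4) + \max\left\{ 0.7\, V^{t+1}(s_1) + 0.3\, V^{t+1}(s_4),\; 0.4\, V^{t+1}(s_2) + 0.6\, V^{t+1}(s_3) \right\}$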
Value Iteration
[Figure: the same backup repeated at every state s1, s2, s3, s4 and every stage Vt+1, Vt, Vt-1, Vt-2, with the same transition probabilities (0.3/0.7 and 0.4/0.6)]
Value Iteration
Note how DP is used
• optimal soln to k-1 stage problem can be used without modification as part of optimal soln to k-stage problem
Because of finite horizon, policy nonstationary
In practice, Bellman backup computed using:
$Q^k(a, s) = R(s) + \sum_{s'} \Pr(s, a, s') \, V^{k-1}(s'), \quad \forall a, s$
$V^k(s) = \max_a Q^k(a, s)$
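The Q-value form of the backup translates directly into a loop over stages-to-go. This is a minimal sketch, assuming `Pr[s][a]` is a dict mapping each successor s' to Pr(s, a, s') and `R[s]` is the reward; the function name and the two-state example MDP are hypothetical.

```python
def finite_horizon_vi(S, A, Pr, R, horizon):
    """Finite-horizon value iteration via Q-values (illustrative sketch).

    Pr[s][a] maps each successor s' to Pr(s, a, s'); R[s] is the reward.
    Returns V[k][s], the optimal k-stage-to-go value, and pi[k][s], an optimal
    action with k stages to go (pi[0] is None: no action is taken at the end)."""
    V = [{s: R[s] for s in S}]  # V^0(s) = R(s)
    pi = [None]
    for k in range(1, horizon + 1):
        # Q^k(a, s) = R(s) + sum_{s'} Pr(s, a, s') V^{k-1}(s')
        Q = {s: {a: R[s] + sum(p * V[k - 1][t] for t, p in Pr[s][a].items())
                 for a in A}
             for s in S}
        V.append({s: max(Q[s].values()) for s in S})    # V^k(s) = max_a Q^k(a, s)
        pi.append({s: max(A, key=Q[s].get) for s in S})  # nonstationary: may depend on k
    return V, pi


# Hypothetical two-state example (numbers invented for illustration)
S = ["low", "high"]
A = ["wait", "work"]
R = {"low": 0.0, "high": 1.0}
Pr = {
    "low": {"wait": {"low": 1.0}, "work": {"high": 0.5, "low": 0.5}},
    "high": {"wait": {"high": 0.5, "low": 0.5}, "work": {"high": 1.0}},
}
V, pi = finite_horizon_vi(S, A, Pr, R, horizon=2)
print(V[2])  # {'low': 1.25, 'high': 3.0}
```

Each stage reuses only `V[k - 1]`, which is the DP point made above: the optimal (k-1)-stage solution is used unmodified inside the k-stage one.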
Complexity
T iterations
At each iteration, |A| computations of an n x n matrix times an n-vector: $O(|A| n^2)$
Total $O(T|A|n^2)$
Can exploit sparsity of the matrix: roughly $O(T|A|n)$ when each state has only a bounded number of successors
Summary
Resulting policy is optimal
$V^{\pi^*}_k(s) \ge V^{\pi}_k(s), \quad \forall \pi, s, k$
• convince yourself of this; convince yourself that non-Markovian, randomized policies are not necessary
Note: optimal value function is unique, but optimal policy is not
Discounted Infinite Horizon MDPs
Total reward problematic (usually)
• many or all policies have infinite expected reward
• some MDPs (e.g., zero-cost absorbing states) OK
“Trick”: introduce discount factor 0 ≤ β < 1
• future rewards discounted by β per time step
$V^{\pi}(s) = E\left[\sum_{t=0}^{\infty} \beta^t R^t \,\middle|\, \pi, s\right]$
Note: $V^{\pi}(s) \le E\left[\sum_{t=0}^{\infty} \beta^t R^{\max}\right] = \frac{1}{1-\beta} R^{\max}$
Motivation: economic? failure prob? convenience?
Some Notes
Optimal policy maximizes value at each state
Optimal policies guaranteed to exist (Howard60)
Can restrict attention to stationary policies
• why change action at state s at new time t?
We define $V^*(s) = V^{\pi}(s)$ for some optimal π
Value Equations (Howard 1960)
Value equation for fixed policy value
$V^{\pi}(s) = R(s) + \beta \sum_{s'} \Pr(s, \pi(s), s') \, V^{\pi}(s')$
Bellman equation for optimal value function
$V^*(s) = R(s) + \max_a \beta \sum_{s'} \Pr(s, a, s') \, V^*(s')$
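Since 0 ≤ β < 1, the right-hand side of the Bellman equation is a contraction, so applying it repeatedly from any starting values converges to V*. This sketch does that until successive iterates differ by less than a tolerance, then reads off a greedy policy; the function names and the two-state example MDP are hypothetical, and `Pr[s][a]` is assumed to map each successor s' to Pr(s, a, s').

```python
def value_iteration(S, A, Pr, R, beta, tol=1e-10, max_iters=100_000):
    """Approximate V* by repeatedly applying the Bellman equation
    V(s) = R(s) + beta * max_a sum_{s'} Pr(s, a, s') V(s')   (illustrative sketch).

    Pr[s][a] maps each successor s' to Pr(s, a, s'); requires 0 <= beta < 1."""
    V = {s: 0.0 for s in S}
    for _ in range(max_iters):
        new_V = {
            s: R[s] + beta * max(sum(p * V[t] for t, p in Pr[s][a].items()) for a in A)
            for s in S
        }
        if max(abs(new_V[s] - V[s]) for s in S) < tol:
            return new_V
        V = new_V
    return V


def greedy_policy(S, A, Pr, V):
    """Pick, at each state, an action maximizing the expected next-state value."""
    return {s: max(A, key=lambda a: sum(p * V[t] for t, p in Pr[s][a].items())) for s in S}


S = ["low", "high"]
A = ["wait", "work"]
R = {"low": 0.0, "high": 1.0}
Pr = {
    "low": {"wait": {"low": 1.0}, "work": {"high": 0.5, "low": 0.5}},
    "high": {"wait": {"high": 0.5, "low": 0.5}, "work": {"high": 1.0}},
}
V = value_iteration(S, A, Pr, R, beta=0.9)
print({s: round(v, 4) for s, v in V.items()})  # {'low': 8.1818, 'high': 10.0}
```

Here the fixed point can be checked by hand: with "work" everywhere, V*(high) = 1 + 0.9 V*(high) = 10 and V*(low) = 0.9 (0.5 · 10 + 0.5 V*(low)) = 90/11.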
Policy Iteration
Given fixed policy, can compute its value exactly:
$V^{\pi}(s) = R(s) + \beta \sum_{s'} \Pr(s, \pi(s), s') \, V^{\pi}(s')$
Policy iteration exploits this
1. Choose a random policy π
2. Loop:
   (a) Evaluate $V^{\pi}$
   (b) For each s in S, set $\pi'(s) = \arg\max_a \sum_{s'} \Pr(s, a, s') \, V^{\pi}(s')$
   (c) Replace π with π'
   Until no improving action possible at any state
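The "compute its value exactly" step is a linear system, $(I - \beta P_\pi) V^\pi = R$, which for small n can be solved directly. This sketch uses plain Gauss-Jordan elimination for the evaluation step and then improves the policy greedily. Assumptions: `Pr[s][a]` maps successors to probabilities, the initial policy is the first action everywhere (rather than random, for reproducibility), and all names and the example MDP are hypothetical.

```python
def solve_linear(M, b):
    """Solve M x = b by Gauss-Jordan elimination with partial pivoting (small dense systems)."""
    n = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                for c in range(col, n + 1):
                    aug[r][c] -= factor * aug[col][c]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def evaluate_policy(S, Pr, R, beta, pi):
    """Exact V^pi: solve (I - beta * P_pi) V = R."""
    M = [[(1.0 if i == j else 0.0) - beta * Pr[s][pi[s]].get(t, 0.0)
          for j, t in enumerate(S)] for i, s in enumerate(S)]
    return dict(zip(S, solve_linear(M, [R[s] for s in S])))


def policy_iteration(S, A, Pr, R, beta):
    pi = {s: A[0] for s in S}  # arbitrary initial policy
    while True:
        V = evaluate_policy(S, Pr, R, beta, pi)   # (a) evaluate
        improved = False
        for s in S:                                # (b)+(c) improve, updating pi in place (V stays fixed)
            q = {a: sum(p * V[t] for t, p in Pr[s][a].items()) for a in A}
            best = max(A, key=q.get)
            if q[best] > q[pi[s]] + 1e-12:         # switch only on strict improvement, so ties cannot cycle
                pi[s] = best
                improved = True
        if not improved:                           # no improving action at any state
            return pi, V


S = ["low", "high"]
A = ["wait", "work"]
R = {"low": 0.0, "high": 1.0}
Pr = {
    "low": {"wait": {"low": 1.0}, "work": {"high": 0.5, "low": 0.5}},
    "high": {"wait": {"high": 0.5, "low": 0.5}, "work": {"high": 1.0}},
}
pi, V = policy_iteration(S, A, Pr, R, beta=0.9)
print(pi)  # {'low': 'work', 'high': 'work'}
```

On this example policy iteration stops after two evaluations, illustrating the "fewer, more expensive iterations" trade-off: each iteration solves a linear system instead of doing one backup.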
Policy Iteration Notes
Convergence assured (Howard)
• intuitively: no local maxima in value space, and each policy must improve value; since finite number of policies, will converge to optimal policy
Very flexible algorithm
• need only improve policy at one state (not each state)
Gives exact value of optimal policy
Generally converges much faster than VI
• each iteration more complex, but fewer iterations
• quadratic rather than linear rate of convergence
Outline
Logical or Feature-based Problems
AI problems are most naturally viewed in terms of logical propositions, random variables, objects and relations, etc. (logical, feature-based)
E.g., consider “natural” spec. of robot example
• propositional variables: robot’s location, Craig wants coffee, tidiness of lab, etc.
• could easily define things in first-order terms as well
|S| exponential in number of logical variables
• Spec./Rep’n of problem in state form impractical
• Explicit state-based DP impractical
• Bellman’s curse of dimensionality
Solution?
Require structured representations
• exploit regularities in probabilities, rewards
• exploit logical relationships among variables
Require structured computation
• exploit regularities in policies, value functions
• can aid in approximation (anytime computation)
We start with propositional represent’ns of MDPs
• probabilistic STRIPS
• dynamic Bayesian networks
• BDDs/ADDs
Propositional Representations
States decomposable into state variables
$S = X_1 \times X_2 \times \cdots \times X_n$
Structured representations the norm in AI
• STRIPS, Sit-Calc., Bayesian networks, etc.
• Describe how actions affect/depend on features
• Natural, concise, can be exploited computationally
Same ideas can be used for MDPs
Robot Domain as Propositional MDP
Propositional variables for single user version
• Loc (robot’s locat’n): Off, Hall, MailR, Lab, CoffeeR
• T (lab is tidy): boolean
• CR (coffee request outstanding): boolean
• RHC (robot holding coffee): boolean
• RHM (robot holding mail): boolean
• M (mail waiting for pickup): boolean
Actions/Events
• move to an adjacent location, pickup mail, get coffee, deliver mail, deliver coffee, tidy lab
• mail arrival, coffee request issued, lab gets messy
Rewards
• rewarded for tidy lab, satisfying a coffee request, delivering mail
• (or penalized for their negation)
State Space
State of MDP: assignment to these six variables
• 160 states
• grows exponentially with number of variables
Transition matrices
• 25600 (or 25440) parameters required per matrix
• one matrix per action (6 or 7 or more actions)
Reward function
• 160 reward values needed
Factored state and action descriptions will break this exponential dependence (generally)
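The counts above follow from the variable domains: Loc has 5 values and the other five variables are Boolean. The figure 25440 is presumably 160 × 159, i.e. one entry per row is fixed because each row of a stochastic matrix sums to 1:

```latex
|S| = 5 \cdot 2^5 = 160 \\
160 \times 160 = 25600 \ \text{entries per transition matrix} \\
160 \times (160 - 1) = 25440 \ \text{free parameters per matrix}
```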
Probabilistic STRIPS
PSTRIPS is a generalization of STRIPS that allows compact action (trans. matrix) represent’n
Intuition:
• state = a list of variable values (one per variable)
• state transitions = changes in variable values
• actions tend to affect only a small number of variables
PSTRIPS gains compactness by describing only how particular variables change under an action
• each distinct outcome of a stochastic action will be described by a “change list” w/ associated probability
• changes/probs can vary with initial conditions
Example of PSTRIPS Action
Action: DelC
Condition    Outcome       Probability
Off, RHC     -CR, -RHC     0.8
             -RHC          0.1
             (no change)   0.1
-Off, RHC    -RHC          0.8
             (no change)   0.2
-RHC         (no change)   1.0
Procedural semantics: replace state values
Much more concise than explicit transition matrix
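The procedural semantics can be sketched directly: find the condition that holds in the current state, then overwrite the listed variables for each outcome and accumulate its probability. The encoding below (conditions as predicates over a dict state, change lists as dicts, an empty dict meaning "no change") and the names `DELC` and `successor_distribution` are assumptions for illustration; "Off" is read as Loc = Off.

```python
# Hypothetical encoding of the DelC action from the table:
# a list of (condition, [(changes, probability), ...]) pairs.
# An empty change dict means "no change".
DELC = [
    (lambda s: s["Loc"] == "Off" and s["RHC"],
     [({"CR": False, "RHC": False}, 0.8), ({"RHC": False}, 0.1), ({}, 0.1)]),
    (lambda s: s["Loc"] != "Off" and s["RHC"],
     [({"RHC": False}, 0.8), ({}, 0.2)]),
    (lambda s: not s["RHC"],
     [({}, 1.0)]),
]


def successor_distribution(action, state):
    """Apply the matching condition's outcomes (procedural semantics: overwrite changed values)."""
    for condition, outcomes in action:
        if condition(state):
            dist = {}
            for changes, prob in outcomes:
                nxt = {**state, **changes}
                key = tuple(sorted(nxt.items()))  # hashable representation of the successor state
                dist[key] = dist.get(key, 0.0) + prob
            return dist
    raise ValueError("no condition of the action matches this state")


state = {"Loc": "Off", "T": True, "CR": True, "RHC": True, "RHM": False, "M": False}
for succ, prob in successor_distribution(DELC, state).items():
    print(dict(succ), prob)
```

Six probabilities plus three conditions describe this action for all 160 states, which is the compactness gain over a 160 x 160 matrix.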
Dynamic Bayesian Networks (DBNs)
Bayesian networks (BNs) are a common representation for probability distributions
• A graph (DAG) represents conditional independence
• Tables (CPTs) quantify local probability distributions
Recall Pr(s,a,-) is a distribution over S = X1 x ... x Xn
• BNs can be used to represent this too
Before discussing dynamic BNs (DBNs), we’ll have a brief excursion into Bayesian networks
Bayes Nets
In general, a joint distribution P over a set of variables (X1 x ... x Xn) requires exponential space for representation and inference
BNs provide a graphical representation of conditional independence relations in P
• usually quite compact
• requires assessment of fewer parameters, those being quite natural (e.g., causal)
• efficient (usually) inference: query answering and belief update
Extreme Independence
If X1, X2,... Xn are mutually independent, then
P(X1, X2,... Xn ) = P(X1)P(X2)... P(Xn)
Joint can be specified with n parameters
• cf. the usual $2^n - 1$ parameters required
Though such extreme independence is unusual, some conditional independence is common in most domains
BNs exploit this conditional independence
An Example Bayes Net
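[Figure: Bayes net with Earthquake and Burglary as parents of Alarm, Alarm as parent of Nbr1Calls and Nbr2Calls, plus a Radio node]
Pr(B=t) = 0.05, Pr(B=f) = 0.95
Pr(A=t | E, B), with Pr(A=f | E, B) in parentheses:
e, b: 0.9 (0.1)
e, ¬b: 0.2 (0.8)
¬e, b: 0.85 (0.15)
¬e, ¬b: 0.01 (0.99)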
Earthquake Example (con’t)
If I know whether Alarm, no other evidence influences my degree of belief in Nbr1Calls
• P(N1|N2,A,E,B) = P(N1|A)
• also: P(N2|A,E,B) = P(N2|A) and P(E|B) = P(E)
By the chain rule we have
P(N1,N2,A,E,B) = P(N1|N2,A,E,B) · P(N2|A,E,B) · P(A|E,B) · P(E|B) · P(B)
= P(N1|A) · P(N2|A) · P(A|B,E) · P(E) · P(B)
Full joint requires only 10 parameters (cf. 32)
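The factored joint and the parameter count can be checked numerically. In this sketch P(B) comes from the slide; the P(A | E, B) values are the slide's, but their assignment to the rows (e,b), (e,¬b), (¬e,b), (¬e,¬b) is an assumption; P(E), P(N1|A) and P(N2|A) are hypothetical numbers, since the slides do not give them.

```python
from itertools import product

# CPTs. P(B) is from the slide; the P(A | E, B) row order is assumed;
# P(E), P(N1|A), P(N2|A) are hypothetical.
P_B = {True: 0.05, False: 0.95}
P_E = {True: 0.02, False: 0.98}
P_A = {(True, True): 0.9, (True, False): 0.2, (False, True): 0.85, (False, False): 0.01}  # P(A=t | E, B)
P_N1 = {True: 0.9, False: 0.05}   # P(N1=t | A)
P_N2 = {True: 0.7, False: 0.01}   # P(N2=t | A)


def bern(p_true, value):
    """Probability of a Boolean value given P(value = True)."""
    return p_true if value else 1.0 - p_true


def joint(n1, n2, a, e, b):
    """P(N1,N2,A,E,B) = P(N1|A) P(N2|A) P(A|B,E) P(E) P(B)."""
    return bern(P_N1[a], n1) * bern(P_N2[a], n2) * bern(P_A[(e, b)], a) * P_E[e] * P_B[b]


bn_params = 1 + 1 + 4 + 2 + 2   # P(B), P(E), P(A|E,B), P(N1|A), P(N2|A)
table_entries = 2 ** 5          # explicit joint over five Boolean variables
total = sum(joint(*values) for values in product((True, False), repeat=5))
print(bn_params, table_entries, round(total, 12))  # 10 32 1.0
```

The ten local numbers define a proper distribution over all 32 joint assignments: summing the product form over every assignment gives 1.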
BNs: Qualitative Structure
Graphical structure of BN reflects conditional independence among variables
Each variable X is a node in the DAG
Edges denote direct probabilistic influence
• usually interpreted causally
• parents of X are denoted Par(X)
X is conditionally independent of all nondescendants given its parents
• Graphical test exists for more general independence
• “Markov Blanket”
BNs: Quantification
To complete specification of joint, quantify BN
For each variable X, specify CPT: P(X | Par(X))
• number of params locally exponential in |Par(X)|
If X1, X2, ..., Xn is any topological sort of the network, then we are assured:
P(Xn, Xn-1, ..., X1) = P(Xn | Xn-1, ..., X1) · P(Xn-1 | Xn-2, …, X1) ··· P(X2 | X1) · P(X1)
= P(Xn | Par(Xn)) · P(Xn-1 | Par(Xn-1)) ··· P(X1)
Inference in BNs
The graphical independence representation gives rise to efficient inference schemes
We generally want to compute Pr(X) or Pr(X|E), where E is (conjunctive) evidence
Computations are organized by network topology
One simple algorithm: variable elimination (VE)
Variable Elimination
A factor is a function from (assignments to) some set of variables to a real value: e.g., f(E,A,N1)
• CPTs are factors, e.g., P(A|E,B) is a function of A, E, B
VE works by eliminating all variables in turn until only a factor over the query variable remains
To eliminate a variable:
• join all factors containing that variable (like a database join)
• sum out the influence of the variable on new factor
• exploits product form of joint distribution
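The two factor operations above (join, then sum out) can be sketched as a small table class. This assumes boolean variables and a dictionary-of-tuples table layout; the class name `Factor` and the CPT numbers in the usage lines are hypothetical. The usage computes f1(A,B) = Σ_E P(A|B,E) P(E), the first step of the VE example.

```python
from itertools import product
from typing import Dict, Tuple

class Factor:
    """A table over boolean variables: maps value tuples (in `vars` order) to reals."""

    def __init__(self, vars: Tuple[str, ...], table: Dict[Tuple[bool, ...], float]):
        self.vars = vars
        self.table = table

    def __mul__(self, other: "Factor") -> "Factor":
        # Join: the result ranges over the union of both scopes.
        vars_ = self.vars + tuple(v for v in other.vars if v not in self.vars)
        table = {}
        for vals in product([True, False], repeat=len(vars_)):
            a = dict(zip(vars_, vals))
            table[vals] = (self.table[tuple(a[v] for v in self.vars)]
                           * other.table[tuple(a[v] for v in other.vars)])
        return Factor(vars_, table)

    def sum_out(self, var: str) -> "Factor":
        # Marginalize `var` by adding entries that agree on the remaining variables.
        i = self.vars.index(var)
        rest = self.vars[:i] + self.vars[i + 1:]
        table: Dict[Tuple[bool, ...], float] = {}
        for vals, p in self.table.items():
            key = vals[:i] + vals[i + 1:]
            table[key] = table.get(key, 0.0) + p
        return Factor(rest, table)

# Hypothetical CPTs: P(E) and P(A | B, E), keyed by (B, E) for P(A = True).
P_A_TRUE = {(True, True): 0.95, (True, False): 0.94, (False, True): 0.29, (False, False): 0.001}
pE = Factor(("E",), {(True,): 0.002, (False,): 0.998})
pA = Factor(("A", "B", "E"), {
    (a, b, e): P_A_TRUE[(b, e)] if a else 1.0 - P_A_TRUE[(b, e)]
    for a in (True, False) for b in (True, False) for e in (True, False)
})

f1 = (pA * pE).sum_out("E")  # f1(A, B) = sum_E P(A | B, E) P(E)
```

Because f1 still conditions on B, f1(A=T, b) + f1(A=F, b) equals one for each value b, which is a handy check on the join/sum-out code.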
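Example of VE: P(N1)
[Figure: network with Earthqk and Burgl as parents of Alarm; Alarm is the parent of N1 and N2]
$$
\begin{aligned}
P(N_1) &= \sum_{N_2,A,B,E} P(N_1,N_2,A,B,E) \\
&= \sum_{N_2,A,B,E} P(N_1 \mid A)\,P(N_2 \mid A)\,P(B)\,P(A \mid B,E)\,P(E) \\
&= \sum_{A} P(N_1 \mid A) \sum_{N_2} P(N_2 \mid A) \sum_{B} P(B) \sum_{E} P(A \mid B,E)\,P(E) \\
&= \sum_{A} P(N_1 \mid A) \sum_{N_2} P(N_2 \mid A) \sum_{B} P(B)\, f_1(A,B) \\
&= \sum_{A} P(N_1 \mid A) \sum_{N_2} P(N_2 \mid A)\, f_2(A) \\
&= \sum_{A} P(N_1 \mid A)\, f_3(A) \\
&= f_4(N_1)
\end{aligned}
$$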
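The derivation can be traced numerically, one intermediate factor per line, and checked against brute-force summation of the full joint. This is a sketch with hypothetical CPT numbers (P(X = True | parents)) and plain dictionaries standing in for factors.

```python
# Hypothetical CPTs (P(X = True | parents)) for the network E, B -> A -> N1, N2.
P_E, P_B = 0.002, 0.001
P_A = {(True, True): 0.95, (True, False): 0.94, (False, True): 0.29, (False, False): 0.001}  # key (B, E)
P_N1 = {True: 0.90, False: 0.05}  # key A
P_N2 = {True: 0.70, False: 0.01}  # key A
TF = (True, False)

def pr(p_true, value):
    """Probability that a boolean variable takes `value`, given P(True) = p_true."""
    return p_true if value else 1.0 - p_true

# f1(A, B) = sum_E P(A | B, E) P(E)
f1 = {(a, b): sum(pr(P_A[(b, e)], a) * pr(P_E, e) for e in TF) for a in TF for b in TF}
# f2(A) = sum_B P(B) f1(A, B)
f2 = {a: sum(pr(P_B, b) * f1[(a, b)] for b in TF) for a in TF}
# f3(A) = sum_N2 P(N2 | A) f2(A); equals f2(A) because P(N2 | A) sums to 1
f3 = {a: sum(pr(P_N2[a], n2) * f2[a] for n2 in TF) for a in TF}
# f4(N1) = sum_A P(N1 | A) f3(A)
f4 = {n1: sum(pr(P_N1[a], n1) * f3[a] for a in TF) for n1 in TF}

# Brute force: sum the full joint over N2, A, B, E (2**4 terms per value of N1).
brute = {
    n1: sum(pr(P_N1[a], n1) * pr(P_N2[a], n2) * pr(P_B, b) * pr(P_A[(b, e)], a) * pr(P_E, e)
            for n2 in TF for a in TF for b in TF for e in TF)
    for n1 in TF
}
```

The largest intermediate table here is f1, over two variables (three are touched while computing it), whereas brute force enumerates all five variables at once.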
Notes on VE
Each operation is simply a multiplication of factors followed by summing out a variable
Complexity is determined by the size of the largest factor
• e.g., in the example, 3 variables (not 5)
• linear in the number of variables, exponential in the size of the largest factor
• elimination ordering has a great impact on factor size
• finding an optimal elimination ordering is NP-hard
• heuristics and special structure (e.g., polytrees) exist
In practice, exploiting structure of this sort makes inference much more tractable
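The effect of elimination ordering can be measured by tracking only factor scopes, not numbers. This sketch assumes each factor is represented by its set of variables; `eliminate`, `max_factor_size` and `greedy_order` are hypothetical helper names, and the greedy rule (pick the variable whose join is smallest, ties broken by name) is one simple heuristic among several, not necessarily the one the slides have in mind.

```python
from typing import FrozenSet, List, Sequence, Tuple

Scope = FrozenSet[str]

def eliminate(scopes: List[Scope], var: str) -> Tuple[List[Scope], int]:
    """Join all factors mentioning var, sum var out; return new scopes and the join's size."""
    touching = [s for s in scopes if var in s]
    joined = frozenset().union(*touching)
    rest = [s for s in scopes if var not in s]
    return rest + [joined - {var}], len(joined)

def max_factor_size(scopes: List[Scope], order: Sequence[str]) -> int:
    largest = 0
    for v in order:
        scopes, size = eliminate(scopes, v)
        largest = max(largest, size)
    return largest

def greedy_order(scopes: List[Scope], hidden: Sequence[str]) -> List[str]:
    """Min-size heuristic: repeatedly eliminate the variable whose join is smallest."""
    order, remaining = [], set(hidden)
    while remaining:
        v = min(remaining, key=lambda u: (eliminate(scopes, u)[1], u))
        scopes, _ = eliminate(scopes, v)
        order.append(v)
        remaining.remove(v)
    return order

# Scopes of P(N1|A), P(N2|A), P(B), P(A|B,E), P(E) from the VE example.
SCOPES = [frozenset(s) for s in ({"N1", "A"}, {"N2", "A"}, {"B"}, {"A", "B", "E"}, {"E"})]

good = max_factor_size(SCOPES, ["E", "B", "N2", "A"])  # the order used in the example
bad = max_factor_size(SCOPES, ["A", "E", "B", "N2"])   # eliminating A first joins everything
```

With the example's order the largest join has 3 variables; eliminating A first creates a factor over all 5, matching the "3 vars (not 5)" remark.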
Dynamic BNs
Dynamic Bayes net action representation
• one Bayes net for each action a, representing the set of conditional distributions Pr(St+1 | At, St)
• each state variable occurs at time t and at time t+1
• the dependence of t+1 variables on t variables and on other t+1 variables is specified (the graph must be acyclic)
• no quantification of time-t variables is given (since we don't care about the prior over St)
DBN Representation: DelC
[Figure: DBN for action DelC over the time-t nodes Tt, Lt, CRt, RHCt, RHMt, Mt and their time-t+1 copies]

fCR(Lt,CRt,RHCt,CRt+1)
L   CR   RHC   CR(t+1)=T   CR(t+1)=F
O   T    T     0.2         0.8
E   T    T     1.0         0.0
O   F    T     0.0         1.0
E   F    T     0.0         1.0
O   T    F     1.0         0.0
E   T    F     1.0         0.0
O   F    F     0.0         1.0
E   F    F     0.0         1.0

fT(Tt,Tt+1)
T   T(t+1)=T   T(t+1)=F
T   0.91       0.09
F   0.0        1.0

fRHM(RHMt,RHMt+1)
RHM   RHM(t+1)=T   RHM(t+1)=F
T     1.0          0.0
F     0.0          1.0
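The CPTs above can be stored compactly by keeping only Pr(var(t+1) = True) per row, since for a boolean variable the other column is its complement. This sketch covers only the three factors shown (fCR, fT, fRHM); the other DelC factors (for L, RHC, M) are omitted, and the names `P_CR_TRUE`, `bern` and `delc_prob` are hypothetical. With no arcs among the t+1 variables shown, the transition probability for these three variables is just the product of their CPT entries.

```python
from itertools import product

# Pr(CR(t+1) = True | L, CR, RHC): the first probability column of fCR.
P_CR_TRUE = {
    ("O", True, True): 0.2, ("E", True, True): 1.0,
    ("O", False, True): 0.0, ("E", False, True): 0.0,
    ("O", True, False): 1.0, ("E", True, False): 1.0,
    ("O", False, False): 0.0, ("E", False, False): 0.0,
}
P_T_TRUE = {True: 0.91, False: 0.0}    # fT, keyed by T(t)
P_RHM_TRUE = {True: 1.0, False: 0.0}   # fRHM, keyed by RHM(t)

def bern(p_true, value):
    # Pr(var = value) when Pr(True) = p_true; each row sums to 1 by construction.
    return p_true if value else 1.0 - p_true

def delc_prob(s, s_next):
    """Pr(CR', T', RHM' | s, DelC) as the product of the three local CPT entries."""
    return (bern(P_CR_TRUE[(s["L"], s["CR"], s["RHC"])], s_next["CR"])
            * bern(P_T_TRUE[s["T"]], s_next["T"])
            * bern(P_RHM_TRUE[s["RHM"]], s_next["RHM"]))

# Robot in office O with coffee and a pending request: delivery succeeds with 0.8.
s = {"L": "O", "CR": True, "RHC": True, "T": True, "RHM": False}
p = delc_prob(s, {"CR": False, "T": True, "RHM": False})  # 0.8 * 0.91 * 1.0

# Distribution over the 2**3 next-state combinations of (CR, T, RHM) sums to 1.
total = sum(delc_prob(s, dict(zip(("CR", "T", "RHM"), v)))
            for v in product([True, False], repeat=3))
```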
Benefits of DBN Representation
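$$
\begin{aligned}
&\Pr(Rm_{t+1}, M_{t+1}, T_{t+1}, L_{t+1}, Cr_{t+1}, Rc_{t+1} \mid Rm_t, M_t, T_t, L_t, Cr_t, Rc_t) \\
&\quad = f_{Rm}(Rm_t, Rm_{t+1}) \cdot f_M(M_t, M_{t+1}) \cdot f_T(T_t, T_{t+1}) \cdot f_L(L_t, L_{t+1}) \cdot f_{Cr}(L_t, Cr_t, Rc_t, Cr_{t+1}) \cdot f_{Rc}(Rc_t, Rc_{t+1})
\end{aligned}
$$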
- Only 48 parameters vs. 25440 for the explicit transition matrix
- Removes the global exponential dependence on the number of state variables
[Figure: the explicit transition matrix over states s1 … s160, shown beside the DBN over Tt, Lt, CRt, RHCt, RHMt, Mt and their t+1 copies]
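The "48 vs. 25440" comparison can be reproduced by counting free parameters. This is a sketch under an assumption not stated on the slide: that L takes five values and the other five variables are binary, which gives 2**5 * 5 = 160 states, matching the s1 … s160 matrix. Each CPT contributes one row per parent assignment with |domain| - 1 free entries; the matrix has 160 rows with 159 free entries each.

```python
from math import prod

# ASSUMED domain sizes: five binary variables plus a five-valued location L.
DOMAIN = {"Rm": 2, "M": 2, "T": 2, "L": 5, "Cr": 2, "Rc": 2}
# Time-t parents of each time-(t+1) variable, read off the factorization above.
PARENTS = {"Rm": ["Rm"], "M": ["M"], "T": ["T"], "L": ["L"],
           "Cr": ["L", "Cr", "Rc"], "Rc": ["Rc"]}

def dbn_params() -> int:
    total = 0
    for var, pars in PARENTS.items():
        rows = prod(DOMAIN[p] for p in pars)   # one CPT row per parent assignment
        total += rows * (DOMAIN[var] - 1)      # free entries per row
    return total

def matrix_params() -> int:
    n = prod(DOMAIN.values())                  # number of states
    return n * (n - 1)                         # n rows, n - 1 free entries each
```

Under this reading the DBN needs 2+2+2+20+20+2 = 48 numbers, while the matrix needs 160 * 159 = 25440.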
Structure in CPTs
Notice that there is regularity in the CPTs
• e.g., fCr(Lt,Crt,Rct,Crt+1) has many similar entries
• corresponds to context-specific independence in BNs
Compact function representations for CPTs can be used to great effect
• decision trees
• algebraic decision diagrams (ADDs/BDDs)
• Horn rules
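The regularity in fCR shows up clearly as a decision tree. Reading the Pr(CR(t+1) = True) column of the DelC table: if CR is false the result is 0.0 regardless of L and RHC, and if the robot lacks coffee the result is 1.0 regardless of L. That is context-specific independence. This sketch uses a hypothetical nested-tuple tree format and checks that the 4-leaf tree reproduces all 8 table rows.

```python
# Pr(CR(t+1) = True | L, CR, RHC), copied from the first probability column of fCR.
TABLE = {
    ("O", True, True): 0.2, ("E", True, True): 1.0,
    ("O", False, True): 0.0, ("E", False, True): 0.0,
    ("O", True, False): 1.0, ("E", True, False): 1.0,
    ("O", False, False): 0.0, ("E", False, False): 0.0,
}

# Hypothetical tree format: ("test", var, {value: subtree}) or a float leaf.
TREE = ("test", "CR", {
    False: 0.0,                               # no request: L and RHC are irrelevant
    True: ("test", "RHC", {
        False: 1.0,                           # no coffee: request persists, L irrelevant
        True: ("test", "L", {"O": 0.2, "E": 1.0}),
    }),
})

def evaluate(tree, assignment):
    # Follow the branch for each tested variable until a leaf is reached.
    while not isinstance(tree, float):
        _, var, branches = tree
        tree = branches[assignment[var]]
    return tree

def count_leaves(tree):
    if isinstance(tree, float):
        return 1
    return sum(count_leaves(t) for t in tree[2].values())
```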
Action Representation – DBN/ADD
[Figure: the DelC DBN over Tt, Lt, CRt, RHCt, RHMt, Mt and their t+1 copies, with fCR(Lt,CRt,RHCt,CRt+1) drawn as an Algebraic Decision Diagram (ADD) that tests CR, RHC and L (branches t/f and o/e); leaf values 0.0, 1.0, 0.2 and 0.8]
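An ADD differs from a decision tree in that identical subgraphs (including leaves) are shared and redundant tests are skipped. This sketch builds a reduced ADD for Pr(CR(t+1) = True) by Shannon expansion with a unique table. It assumes a boolean encoding of L (hypothetical variable `L_is_O`, True for O and False for E) and the variable order CR, RHC, L; the `ADD` class is illustrative, not a particular library's API.

```python
from typing import Callable, Dict, List, Optional

class ADD:
    """Reduced, ordered decision diagram with real-valued leaves over boolean variables."""

    def __init__(self, order: List[str]):
        self.order = order
        self.nodes: List[tuple] = []        # id -> ("leaf", value) or ("node", var, lo_id, hi_id)
        self.unique: Dict[tuple, int] = {}  # unique table: identical nodes are stored once

    def _make(self, key: tuple) -> int:
        if key not in self.unique:
            self.unique[key] = len(self.nodes)
            self.nodes.append(key)
        return self.unique[key]

    def leaf(self, value: float) -> int:
        return self._make(("leaf", value))

    def node(self, var: str, lo: int, hi: int) -> int:
        if lo == hi:  # both branches identical: the test is redundant, skip it
            return lo
        return self._make(("node", var, lo, hi))

    def build(self, f: Callable[[Dict[str, bool]], float], i: int = 0,
              env: Optional[Dict[str, bool]] = None) -> int:
        # Shannon expansion on order[i]; visits 2**n assignments, fine for small n.
        env = {} if env is None else env
        if i == len(self.order):
            return self.leaf(f(env))
        var = self.order[i]
        lo = self.build(f, i + 1, {**env, var: False})
        hi = self.build(f, i + 1, {**env, var: True})
        return self.node(var, lo, hi)

    def evaluate(self, root: int, env: Dict[str, bool]) -> float:
        key = self.nodes[root]
        while key[0] == "node":
            _, var, lo, hi = key
            key = self.nodes[hi if env[var] else lo]
        return key[1]

def p_cr_next(env: Dict[str, bool]) -> float:
    # Pr(CR(t+1) = True | L, CR, RHC) from the fCR table.
    if not env["CR"]:
        return 0.0
    if not env["RHC"]:
        return 1.0
    return 0.2 if env["L_is_O"] else 1.0

add = ADD(["CR", "RHC", "L_is_O"])
root = add.build(p_cr_next)
```

The result has 6 nodes (3 shared leaves and 3 tests), compared with 15 for a complete tree of depth 3 and 7 for the decision tree of the previous slide, because the 1.0 leaf is shared.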
Reward Representation
Rewards are represented with ADDs in a similar fashion
• saves on the 2^n size of the vector representation
[Figure: a reward function drawn as an ADD with internal nodes JC, CP, CC, JP, BC and numeric leaves 10, 0, 12 and 9]
Reward Representation
Rewards are represented similarly
• saves on the 2^n size of the vector representation
Additively independent rewards are also very common
• as in multiattribute utility theory
• offers more natural and concise representation for many types of problems
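[Figure: an additive reward drawn as the sum (+) of two ADDs over CP, CC and CT, one with leaves 10 and 0, the other with leaves 20 and 0]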
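An additive reward is stored as a list of small local tables and evaluated by summing them, instead of one entry per full state. This is a minimal sketch: the variable names are borrowed from the figure, but the scopes and values are hypothetical and do not claim to reproduce the slide's diagrams.

```python
from itertools import product
from typing import Dict, List, Tuple

# Hypothetical additive reward: each component is (scope, table over the scope's values).
COMPONENTS: List[Tuple[Tuple[str, ...], Dict[Tuple[bool, ...], float]]] = [
    (("CP", "CC"), {(True, True): 10.0, (True, False): 0.0,
                    (False, True): 0.0, (False, False): 0.0}),
    (("CT",), {(True,): 20.0, (False,): 0.0}),
]
VARIABLES = sorted({v for scope, _ in COMPONENTS for v in scope})

def reward(state: Dict[str, bool]) -> float:
    # R(s) = sum of the local component rewards
    return sum(table[tuple(state[v] for v in scope)] for scope, table in COMPONENTS)

def flat_vector() -> Dict[Tuple[bool, ...], float]:
    # Explicit representation: one entry per full state, 2**n entries.
    return {vals: reward(dict(zip(VARIABLES, vals)))
            for vals in product([True, False], repeat=len(VARIABLES))}

local_entries = sum(len(table) for _, table in COMPONENTS)  # grows with scope sizes only
```

With three variables the saving is small (6 local entries vs. 8), but the flat vector doubles with every added variable while the local tables grow only with the size of each component's scope.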