Perfect recall:
• Every decision node observes all earlier decision nodes and their parents (along a “temporal” order)
• Sum-max-sum rule (dynamic programming):
• Perfect recall is unrealistic: memory limits, decentralized systems
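The sum-max-sum rule can be illustrated on a single-decision problem. The sketch below assumes a hypothetical weather/forecast/activity setup with made-up probabilities and utilities (none of the numbers come from the poster): the forecast is observed before acting, the weather is not. It sums over the observed forecast, maximizes over the activity, and sums over the hidden weather.

```python
# Hypothetical numbers, chosen only for illustration.
P_WEATHER = {"sun": 0.7, "rain": 0.3}
P_FORECAST = {  # p(forecast | weather)
    "sun": {"sunny": 0.8, "rainy": 0.2},
    "rain": {"sunny": 0.3, "rainy": 0.7},
}
UTILITY = {  # u(weather, activity)
    ("sun", "out"): 10.0, ("sun", "in"): 4.0,
    ("rain", "out"): 0.0, ("rain", "in"): 6.0,
}
FORECASTS = ("sunny", "rainy")
ACTIVITIES = ("out", "in")


def sum_max_sum():
    """Return (MEU, policy): sum over forecast, max over activity, sum over weather."""
    meu, policy = 0.0, {}
    for f in FORECASTS:                        # outer sum: observed chance node
        scores = {
            a: sum(P_WEATHER[w] * P_FORECAST[w][f] * UTILITY[(w, a)]
                   for w in P_WEATHER)         # inner sum: hidden chance node
            for a in ACTIVITIES
        }
        best = max(scores, key=scores.get)     # max: decision node
        policy[f] = best
        meu += scores[best]
    return meu, policy


meu, policy = sum_max_sum()
print(meu, policy)
```

With perfect recall the maximization decouples per observation, which is why a single pass suffices here; this decoupling is what is lost under imperfect recall.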
Variational methods:
• Log-partition function duality:
• Junction graph BP: approximating the marginal polytope and the entropy
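The log-partition duality can be checked numerically on a model small enough to enumerate. The sketch below assumes a hypothetical vector of log-potentials `theta` over four joint states, so that the set of valid marginals is just the probability simplex. It evaluates the objective ⟨θ, τ⟩ + H(τ) at the model distribution and at random distributions; the former should equal log Z and the latter should never exceed it.

```python
import math
import random

theta = [0.5, -1.0, 2.0, 0.0]  # hypothetical log-potentials over 4 joint states

log_z = math.log(sum(math.exp(t) for t in theta))


def objective(tau):
    """<theta, tau> + H(tau) for a distribution tau over the joint states."""
    energy = sum(t * q for t, q in zip(theta, tau))
    entropy = -sum(q * math.log(q) for q in tau if q > 0)
    return energy + entropy


# The maximizer of the objective is the model distribution itself.
p = [math.exp(t - log_z) for t in theta]

# Random points of the simplex for comparison.
rng = random.Random(0)
others = []
for _ in range(1000):
    raw = [rng.random() for _ in theta]
    total = sum(raw)
    others.append([r / total for r in raw])

print(log_z, objective(p), max(objective(q) for q in others))
```

In larger models the simplex and the exact entropy are intractable, which is where approximations such as junction graph BP come in.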
Belief Propagation for Structured Decision Making
Qiang Liu, Alexander Ihler
Department of Computer Science, University of California, Irvine
Abstract
Variational inference methods such as loopy BP have revolutionized inference abilities on graphical models.
Influence diagrams (or decision networks) are an extension of graphical models for representing structured decision-making problems.
Our contributions:
• A general variational framework for solving influence diagrams
• A junction graph belief propagation algorithm for IDs with an intuitive interpretation and strong theoretical guarantees
• A convergent double-loop algorithm
• Significant empirical improvement over the baseline algorithm
Variational Framework for structured decision
Influence Diagram
Graphical Models and Variational Methods
Graphical models:
• Factors & exponential family form
• Graphical representations: Bayes nets, Markov random fields …
Inference: answering queries about graphical models
Our Algorithms
Experiments
Junction graph belief propagation for MEU:
• Construct a junction graph over the augmented distribution
Main result:
• Intuition: the last term encourages policies to be deterministic
• Perfect recall: convex optimization (easier)
• Imperfect recall: non-convex optimization (harder)
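For reference, the duality behind the main result can be sketched as follows. The notation is an assumption carried over from the log-partition duality: θ denotes the log-factors of the augmented distribution, τ a vector of marginals in the marginal polytope M, H(x; τ) the joint entropy, and H(d_i | pa(d_i); τ) the conditional entropy of decision d_i given its parents.

```latex
\log \mathrm{MEU}(\theta)
  \;=\; \max_{\tau \in \mathbb{M}}
  \Big\{ \langle \theta, \tau \rangle + H(x;\tau)
         - \sum_{i \in D} H\big(d_i \mid \mathrm{pa}(d_i);\tau\big) \Big\}
```

Subtracting the conditional entropies rewards marginals whose decision conditionals have zero entropy, which is the sense in which the last term pushes policies toward deterministic ones.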
Bethe-Kikuchi approximation: locally consistent polytope
[Figure: Loopy Junction graph. Clusters: abc, bcd, abe. Other labels: bc, ab, b, a, c, d, e]
Influence diagram:
• Chance nodes (C):
Augmented distribution:
Maximum expected utility (MEU):
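A minimal sketch of how the two definitions relate, under assumptions not stated on the poster: one binary chance node c with prior p(c), one decision d that observes c, a positive utility u(c, d), and made-up numbers. The augmented distribution is taken here to be the unnormalized product p(c) δ(d | c) u(c, d), so the expected utility of a policy δ is its normalization constant, and the MEU is the largest such constant over policies.

```python
from itertools import product

P_C = [0.6, 0.4]                      # hypothetical p(c)
U = [[1.0, 0.0], [0.0, 2.0]]          # hypothetical u(c, d), indexed U[c][d]


def augmented(policy):
    """Unnormalized augmented distribution q(c, d) = p(c) * delta(d | c) * u(c, d)."""
    return {(c, d): P_C[c] * policy[c][d] * U[c][d]
            for c, d in product(range(2), range(2))}


def expected_utility(policy):
    """Expected utility equals the partition function of the augmented distribution."""
    return sum(augmented(policy).values())


def deterministic_policies():
    """Yield every deterministic rule delta(d | c) as a 2x2 table of 0/1 entries."""
    for choice in product(range(2), repeat=2):   # choice[c] is the action taken at c
        yield [[1.0 if d == choice[c] else 0.0 for d in range(2)]
               for c in range(2)]


meu = max(expected_utility(pi) for pi in deterministic_policies())
uniform = [[0.5, 0.5], [0.5, 0.5]]               # a randomized policy, for comparison
print(meu, expected_utility(uniform))
```

Enumerating policies is only feasible for tiny problems; it is used here to make the definition concrete, not as a solution method.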
Imperfect recall:
• No closed-form solution
• Dominant algorithm: single policy updating (SPU), with policy-by-policy optimality
If is the maximum, the optimal strategy is
Causes policies to be deterministic
Significance:
• Enables converting arbitrary variational methods to MEU algorithms
• “Integrates” the policy evaluation and policy improvement steps (avoiding expensive inner loops)
[Figure: three panels: influence diagram, augmented distribution (factor graph), and junction graph with clusters c1c4d1, c1c2d2, c3d1, c2c3d3, c4d2d3]
• For each decision node, identify a unique cluster (called a decision cluster) that includes the node and its parents
[Figure legend: decision cluster of d1; normal cluster]
• Message passing algorithm:
Sum-messages (from normal clusters):
MEU-messages (from decision clusters):
Optimal policies:
• Strong local optimality: provably better than SPU
Convergent algorithm by proximal point method:
• Iteratively optimize a smoothed objective,
Diagnostic network (UAI08 inference challenge):
e.g., calculating (log) partition function:
Decentralized Sensor network:
Conditional probability:
Decision rule:
Global utility function:
Local utility function:
• Decision nodes (D):
• Utility nodes (U):
or
[Figure: two influence diagrams over d1, d2 and u. Panels: Perfect recall, Imperfect recall]
Additive
Toy example:
d1    d2    utility
+1    +1    2
-1    -1    1
+1    -1    0
-1    +1    0
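The toy utility table illustrates why policy-by-policy optimality is weaker than global optimality. The sketch below is a simplified stand-in for single policy updating, assuming deterministic choices, no chance nodes, and that neither decision observes the other; the function name is illustrative. It changes one decision at a time while that strictly improves the utility.

```python
# Utility table of the toy example, keyed by (d1, d2).
UTILITY = {(+1, +1): 2, (-1, -1): 1, (+1, -1): 0, (-1, +1): 0}
VALUES = (+1, -1)


def single_policy_update(d1, d2):
    """Coordinate-wise improvement: update one decision at a time until stuck."""
    improved = True
    while improved:
        improved = False
        for cand in VALUES:                      # try to improve d1 with d2 fixed
            if UTILITY[(cand, d2)] > UTILITY[(d1, d2)]:
                d1, improved = cand, True
        for cand in VALUES:                      # try to improve d2 with d1 fixed
            if UTILITY[(d1, cand)] > UTILITY[(d1, d2)]:
                d2, improved = cand, True
    return d1, d2


global_best = max(UTILITY, key=UTILITY.get)      # joint optimum by enumeration
stuck = single_policy_update(-1, -1)             # start at the suboptimal corner
print(global_best, stuck)
```

Starting from (-1, -1), changing either decision alone drops the utility from 1 to 0, so the update never leaves that point even though (+1, +1) yields 2.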
Multiplicative
[Figure: example influence diagram with nodes Weather, Forecast, Activity, Happiness]
[Figure: node labels c1, c2, c3, c4, d1, d2, d3, u]