Perfect recall:



Transcript of Perfect recall:

Page 1: Perfect recall:

Perfect recall:

• Every decision node observes all earlier decision nodes and their parents (along a “temporal” order)

• Sum-max-sum rule (dynamic programming):

• Perfect recall is unrealistic: memory limits, decentralized systems
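The sum-max-sum rule named above can be sketched on a tiny perfect-recall problem. Everything in this example is a hypothetical illustration, not taken from the poster: the variable names loosely follow its Weather / Forecast / Activity / Happiness labels, and all probabilities and utilities are made-up numbers. The decision (activity) observes the forecast, so the rule alternates a sum over the observed variable, a max over the decision, and a sum over the hidden variable.

```python
import itertools

# Hypothetical toy problem; every number below is an illustrative assumption.
# w: weather (0 = sun, 1 = rain), hidden chance node
# f: forecast (0 = "sun", 1 = "rain"), observed chance node that depends on w
# a: activity (0 = outdoor, 1 = indoor), decision node that observes f
p_w = [0.7, 0.3]
p_f_given_w = [[0.8, 0.2],   # p(f | w = sun)
               [0.3, 0.7]]   # p(f | w = rain)
utility = [[10.0, 4.0],      # u(w = sun, a)
           [0.0, 6.0]]       # u(w = rain, a)


def meu_sum_max_sum():
    """MEU = sum_f max_a sum_w p(w) p(f|w) u(w, a)."""
    total = 0.0
    policy = {}
    for f in range(2):
        # Inner sum over the hidden variable, for each candidate action.
        scores = [
            sum(p_w[w] * p_f_given_w[w][f] * utility[w][a] for w in range(2))
            for a in range(2)
        ]
        best = max(range(2), key=lambda a: scores[a])
        policy[f] = best
        total += scores[best]   # outer sum over the observed variable
    return total, policy


def meu_brute_force():
    """Check: enumerate every deterministic rule f -> a and keep the best."""
    best = float("-inf")
    for rule in itertools.product(range(2), repeat=2):
        eu = sum(
            p_w[w] * p_f_given_w[w][f] * utility[w][rule[f]]
            for w in range(2) for f in range(2)
        )
        best = max(best, eu)
    return best


meu, policy = meu_sum_max_sum()
print(meu, policy, meu_brute_force())
```

With perfect recall the max can be taken separately for each observed value, which is why the dynamic-programming answer matches brute-force enumeration over whole policies.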

Variational methods:

• Log-partition function duality:

• Junction graph BP: approximating the marginal polytope and the entropy
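The formula for the log-partition function duality did not survive in this transcript. The standard statement is that log Z(θ) equals the maximum over distributions τ of ⟨θ, τ⟩ + H(τ), attained at τ ∝ exp(θ). The sketch below checks this numerically on a single discrete variable; the parameter vector `theta` and the helper names are assumptions made for illustration.

```python
import math
import random


def log_partition(theta):
    """log sum_x exp(theta[x]), computed stably."""
    m = max(theta)
    return m + math.log(sum(math.exp(t - m) for t in theta))


def softmax(theta):
    lz = log_partition(theta)
    return [math.exp(t - lz) for t in theta]


def entropy(tau):
    return -sum(p * math.log(p) for p in tau if p > 0)


def dual_objective(theta, tau):
    """<theta, tau> + H(tau)."""
    return sum(t * p for t, p in zip(theta, tau)) + entropy(tau)


def random_distribution(n, rng):
    weights = [rng.random() + 1e-12 for _ in range(n)]
    total = sum(weights)
    return [w / total for w in weights]


theta = [0.5, -1.0, 2.0, 0.0]   # hypothetical parameters
log_z = log_partition(theta)

# The maximizer is the distribution proportional to exp(theta).
at_optimum = dual_objective(theta, softmax(theta))

# No other distribution should exceed log Z.
rng = random.Random(0)
best_random = max(
    dual_objective(theta, random_distribution(len(theta), rng))
    for _ in range(1000)
)
print(log_z, at_optimum, best_random)
```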

Belief Propagation for Structured Decision Making Qiang Liu Alexander Ihler

Department of Computer Science, University of California, Irvine

Abstract

Variational inference methods such as loopy BP have revolutionized inference on graphical models.

Influence diagrams (or decision networks) are extensions of graphical models for representing structured decision-making problems.

Our contribution:

• A general variational framework for solving influence diagrams

• A junction graph belief propagation for IDs with an intuitive interpretation and strong theoretical guarantees

• A convergent double-loop algorithm

• Significant empirical improvement over the baseline algorithm

Variational Framework for structured decision

Influence Diagram

Graphical Models and Variational Methods

Graphical models:

• Factors & exponential family form

• Graphical representations: Bayes nets, Markov random fields …

Inference: answering queries about graphical models

Our Algorithms

Experiments

Junction graph belief propagation for MEU:

• Construct junction graph over the augmented distribution

Main result:

• Intuition: the last term encourages policies to be deterministic

• Perfect recall: convex optimization (easier)

• Imperfect recall: non-convex optimization (harder)
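The main-result formula is missing from this transcript. The sketch below assumes it has the form log MEU = max over τ of ⟨θ, τ⟩ + H(x; τ) − Σ_i H(d_i | pa(d_i); τ), which matches the remark that the last term encourages deterministic policies; treat that form as an assumption. The example uses one chance variable c and one decision d with no parents, so the last term reduces to H(d). The table `theta` is hypothetical.

```python
import math
import random

# theta[c][d]: hypothetical log-factor of the augmented distribution,
# with 2 chance states and 3 decision states.
theta = [[0.2, 1.0, -0.5],
         [1.5, -0.3, 0.4]]
N_C, N_D = 2, 3


def log_meu(theta):
    """log max_d sum_c exp(theta[c][d])."""
    return max(
        math.log(sum(math.exp(theta[c][d]) for c in range(N_C)))
        for d in range(N_D)
    )


def objective(theta, tau):
    """<theta, tau> + H(c, d) - H(d) for a joint table tau[c][d]."""
    inner = sum(theta[c][d] * tau[c][d] for c in range(N_C) for d in range(N_D))
    h_joint = -sum(
        tau[c][d] * math.log(tau[c][d])
        for c in range(N_C) for d in range(N_D) if tau[c][d] > 0
    )
    marg_d = [sum(tau[c][d] for c in range(N_C)) for d in range(N_D)]
    h_d = -sum(p * math.log(p) for p in marg_d if p > 0)
    return inner + h_joint - h_d


def optimal_tau(theta):
    """All mass on the best decision; c given d is proportional to exp(theta)."""
    col_sums = [sum(math.exp(theta[c][d]) for c in range(N_C)) for d in range(N_D)]
    d_star = max(range(N_D), key=lambda d: col_sums[d])
    tau = [[0.0] * N_D for _ in range(N_C)]
    for c in range(N_C):
        tau[c][d_star] = math.exp(theta[c][d_star]) / col_sums[d_star]
    return tau, d_star


def random_tau(rng):
    weights = [[rng.random() + 1e-12 for _ in range(N_D)] for _ in range(N_C)]
    total = sum(sum(row) for row in weights)
    return [[w / total for w in row] for row in weights]


tau_star, d_star = optimal_tau(theta)
rng = random.Random(1)
best_random = max(objective(theta, random_tau(rng)) for _ in range(2000))
print(log_meu(theta), objective(theta, tau_star), best_random, d_star)
```

In this sketch the maximizing τ puts all of its mass on a single decision value, which is the deterministic-policy behavior the intuition bullet describes.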

Bethe-Kikuchi approximation: locally consistent polytope

[Figure: a loopy graph over variables a, b, c, d, e, and a junction graph with clusters abc, bcd, abe, ed joined by separators bc, ab, d, e]

Influence diagram:

• Chance nodes (C):

Augmented distribution:

Maximum expected utility (MEU):

Imperfect recall:

• No closed-form solution

• Dominant algorithm: single policy updating (SPU), with policy-by-policy optimality
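Single policy updating can be sketched as coordinate ascent over policies: hold every policy but one fixed, replace that one with its best response, and repeat until nothing changes. The example below uses the two-decision toy utility table that appears in this transcript (d1, d2 in {+1, −1}). It assumes that neither decision observes the other and that there are no chance nodes, so each policy is a single value; that setup is an assumption about what the toy illustrates.

```python
# Utility table from the poster's toy example: (d1, d2) -> utility.
UTILITY = {(+1, +1): 2, (-1, -1): 1, (+1, -1): 0, (-1, +1): 0}
VALUES = (+1, -1)


def single_policy_update(d1, d2, max_sweeps=10):
    """Update one decision at a time, keeping the other fixed."""
    for _ in range(max_sweeps):
        previous = (d1, d2)
        # Best response of d1 given the current d2.
        d1 = max(VALUES, key=lambda v: UTILITY[(v, d2)])
        # Best response of d2 given the updated d1.
        d2 = max(VALUES, key=lambda v: UTILITY[(d1, v)])
        if (d1, d2) == previous:
            break   # policy-by-policy optimal: no single change helps
    return (d1, d2), UTILITY[(d1, d2)]


for start in [(+1, +1), (-1, -1), (+1, -1), (-1, +1)]:
    print(start, "->", single_policy_update(*start))
```

Started from (−1, −1), the updates stay there with utility 1, because changing either decision alone drops the utility to 0, even though (+1, +1) achieves 2. This is the sense in which policy-by-policy optimality is weaker than global optimality.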

• If τ* is the maximum, the optimal strategy is given by its conditionals τ*(d_i | pa(d_i))

• The last term causes policies to be deterministic

Significance:

• Enables converting arbitrary variational methods to MEU algorithms

• “Integrates” the policy evaluation and policy improvement steps (avoiding expensive inner loops)

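[Figure: an influence diagram, its augmented distribution as a factor graph, and the corresponding junction graph with clusters c1c4d1, c1c2d2, c3d1, c2c3d3, and c4d2d3. In the junction graph, the decision cluster of d1 is distinguished from the normal clusters.]

• For each decision node, identify a unique cluster (called a decision cluster) that includes the node and its parents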

• Message passing algorithm:

Sum-messages (from normal clusters):

MEU-messages (from decision clusters):

Optimal policies:

• Strong local optimality: provably better than SPU

Convergent algorithm by proximal point method:

• Iteratively optimize a smoothed objective
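The smoothed objective itself is missing from this transcript, so the sketch below is only a generic stand-in for the proximal point idea, not the poster's algorithm. It maximizes a linear objective ⟨θ, τ⟩ over the probability simplex, and each step adds a KL penalty that keeps the new iterate close to the previous one. For this choice the step has the closed form τ_new ∝ τ_old · exp(θ / w). The names `theta` and `weight` are assumptions.

```python
import math


def proximal_step(theta, tau, weight):
    """argmax over tau' of <theta, tau'> - weight * KL(tau' || tau)."""
    unnormalized = [p * math.exp(t / weight) for t, p in zip(theta, tau)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def proximal_point(theta, weight=1.0, steps=50):
    """Run the smoothed updates from the uniform distribution."""
    tau = [1.0 / len(theta)] * len(theta)
    values = []
    for _ in range(steps):
        tau = proximal_step(theta, tau, weight)
        values.append(sum(t * p for t, p in zip(theta, tau)))
    return tau, values


theta = [1.0, 0.5, 0.0]   # hypothetical scores for three policy choices
tau, values = proximal_point(theta)
print(tau, values[0], values[-1])
```

Each smoothed subproblem is easy to solve, the objective value does not decrease from one step to the next, and the iterates concentrate on the best choice, so the limit is a deterministic distribution.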

Diagnostic network (UAI08 inference challenge):

e.g., calculating (log) partition function:

Decentralized Sensor network:

Conditional probability:

Decision rule:

Global utility function:

Local utility function:

• Decision nodes (D):

• Utility nodes (U): additive or multiplicative

[Figure: two influence diagrams over d1, d2, and u, one with perfect recall and one with imperfect recall]

Toy example:

d1    d2    utility
+1    +1    2
-1    -1    1
+1    -1    0
-1    +1    0

[Figure: example influence diagram with nodes Weather, Forecast, Activity, and Happiness]
