Fifth International Conference on Autonomous Agents and Multi-agent Systems (AAMAS-06)
Exact Solutions of Interactive POMDPs Using Behavioral Equivalence
Speaker: Prashant Doshi, University of Georgia
Authors: B. Rathnasabapathy, Prashant Doshi, and Piotr Gmytrasiewicz


Page 2: Overview

• I-POMDP
– Framework for sequential decision making for an agent in a multi-agent setting
– Takes the perspective of an individual in an interaction
• Problem
– Cardinality of the interactive state space → infinite
– Other agents' models (incl. beliefs) are part of an agent's state space (interactive epistemology)
• An algorithm for solving I-POMDPs exactly
– Aggregate behaviorally equivalent models of other agents

Page 3: Background – Properties of POMDPs and I-POMDPs

• Finitely nested ($\text{I-POMDP}_{i,l}$)
– Beliefs are nested up to a finite strategic level $l$
– Level 0 models are POMDPs
• Value function of POMDP and finitely nested I-POMDP is piecewise linear and convex (PWLC)
• Agents’ behaviors in POMDP and finitely nested I-POMDP can be represented using policy trees
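
The PWLC property can be made concrete with a small sketch. A PWLC value function is the upper surface of a finite set of linear functions of the belief, usually called alpha vectors, each tied to a policy tree. The example below assumes a one-step horizon and the single-agent tiger rewards shown on the next slides (+10, -100, -1); the name `pwlc_value`, the state order (tiger-left, tiger-right), and the choice of alpha vectors are illustrative assumptions, not taken from the talk.

```python
# Hypothetical one-step alpha vectors for the single-agent tiger problem.
# State order is an assumption: (tiger-left, tiger-right).
TIGER_ALPHAS = [
    (-1.0, -1.0),     # listen
    (-100.0, 10.0),   # open the left door
    (10.0, -100.0),   # open the right door
]


def pwlc_value(belief, alpha_vectors):
    """Return (value, index of the maximizing alpha vector) at a belief.

    The value function is the pointwise maximum of the linear functions
    b . alpha, which is what makes it piecewise linear and convex.
    """
    values = [sum(b * a for b, a in zip(belief, alpha)) for alpha in alpha_vectors]
    best = max(range(len(values)), key=values.__getitem__)
    return values[best], best


value, index = pwlc_value((0.5, 0.5), TIGER_ALPHAS)
print(value, index)  # listening is best when the agent is fully uncertain
```

Each alpha vector stands for one policy tree, so the index returned alongside the value also identifies which behavior is optimal at that belief.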

Page 4: Interactive POMDPs

• Definition
• Interactive state space
– $S$: set of physical states
– $\Theta_{j,l-1}$: set of intentional models
– $SM_j$: set of subintentional models
– Intentional models contain the other agent’s beliefs
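
The definition on this slide did not survive extraction. For orientation, the block below restates the form commonly given in the I-POMDP literature for the two-agent case; treat it as an assumption about what the slide showed, not a recovery of it. The symbols $M_{j,l-1}$, $b_{j,l-1}$, and $\hat{\theta}_j$ (agent j's frame) are introduced here and do not appear in the transcript.

```latex
\begin{align*}
\text{I-POMDP}_{i,l} &= \langle IS_{i,l},\, A,\, T_i,\, \Omega_i,\, O_i,\, R_i \rangle \\
IS_{i,l} &= S \times M_{j,l-1}, \qquad M_{j,l-1} = \Theta_{j,l-1} \cup SM_j \\
\theta_{j,l-1} &= \langle b_{j,l-1},\, \hat{\theta}_j \rangle \in \Theta_{j,l-1},
\qquad b_{j,l-1} \in \Delta(IS_{j,l-1})
\end{align*}
```

Because each intentional model carries a belief $b_{j,l-1}$, a continuous quantity, the set $\Theta_{j,l-1}$ and therefore $IS_{i,l}$ is infinite, which is the problem named in the overview.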

Page 5: Example: Single-Agent Tiger Problem

[Figure: the tiger problem, with reward labels +10, -100, and -1]

Page 6: Behaviorally Equivalent Models

[Figure: Equivalence Classes of Beliefs, with regions labeled P1, P2, P3]
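
As a rough illustration of equivalence classes of beliefs, the sketch below groups sampled beliefs by the action that is optimal for them, so all beliefs in one group prescribe the same behavior. It assumes a one-step horizon, where a policy tree reduces to a single action, and reuses the tiger rewards from the previous slide; `ALPHAS`, `best_action`, and `equivalence_classes` are hypothetical names, and the three resulting groups are only loosely analogous to the regions P1, P2, P3 in the figure.

```python
from collections import defaultdict

# Hypothetical one-step alpha vectors; state order (tiger-left, tiger-right).
ALPHAS = {
    "open-left": (-100.0, 10.0),
    "listen": (-1.0, -1.0),
    "open-right": (10.0, -100.0),
}


def best_action(p_tiger_left):
    """Return the action whose alpha vector is maximal at this belief."""
    belief = (p_tiger_left, 1.0 - p_tiger_left)
    return max(ALPHAS, key=lambda a: sum(b * v for b, v in zip(belief, ALPHAS[a])))


def equivalence_classes(beliefs):
    """Partition beliefs into classes that share the same optimal behavior."""
    classes = defaultdict(list)
    for p in beliefs:
        classes[best_action(p)].append(p)
    return dict(classes)


# Sample the belief simplex at interior grid points (avoids exact ties).
grid = [(i + 0.5) / 20 for i in range(20)]
for action, members in equivalence_classes(grid).items():
    print(action, len(members))
```

For longer horizons the class label would be a full policy tree instead of a single action, but the grouping step is the same.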

Page 7: Equivalence Classes of Interactive States

• Definition
– Combination of a physical state and an equivalence class of models

Page 8: Lossless Aggregation

• In a finitely nested I-POMDP, a probability distribution over the equivalence classes of interactive states, $ECIS_i$, provides a sufficient statistic for the past history of i’s observations
• Transformation of the interactive state space into behavioral equivalence classes is value-preserving
• Optimal policy of the transformed finitely nested I-POMDP remains unchanged
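
The aggregation step can be sketched as follows, under simplifying assumptions: agent i's belief is held over a finite sample of (physical state, model) pairs, and a map from each model of j to its equivalence class is already available. The function `aggregate_belief` and the toy data are hypothetical; in the setting of the talk the model space is infinite, so the sum below stands in for an integral over each class.

```python
from collections import defaultdict


def aggregate_belief(belief, model_class):
    """Collapse a belief over (state, model) into one over (state, class).

    belief:      dict mapping (state, model) -> probability
    model_class: dict mapping model -> equivalence class label
    """
    aggregated = defaultdict(float)
    for (state, model), prob in belief.items():
        aggregated[(state, model_class[model])] += prob
    return dict(aggregated)


# Toy data: each model of j is identified by j's belief that the tiger is
# on the left; the class label is the behavior that belief prescribes.
model_class = {0.02: "open-left", 0.05: "open-left", 0.5: "listen", 0.95: "open-right"}
belief_i = {
    ("TL", 0.02): 0.10,
    ("TL", 0.05): 0.15,
    ("TL", 0.50): 0.25,
    ("TR", 0.50): 0.25,
    ("TR", 0.95): 0.25,
}
print(aggregate_belief(belief_i, model_class))
```

Models in the same class lead j to act identically, so i loses nothing relevant to its own value by tracking only the total mass per class.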

Page 9: Solving I-POMDPs Exactly

Procedure Solve-IPOMDP ( AGENTi, Belief Nesting L ) : Returns Policy
  If L = 0 Then
    Return Policy := Solve-POMDP ( AGENTi )
  Else
    For all AGENTj <> AGENTi
      Policyj := Solve-IPOMDP ( AGENTj, L-1 )
      Mj := Behavioral-Equivalence-Models ( Policyj )
    End
    ECISi := S × ( ×j Mj )
    Policy := Modified-GIP ( ECISi, Ai, Ti, Ωi, Oi, Ri )
    Return Policy
  End
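
The recursive structure of the procedure can be sketched in Python. This is a skeleton under stated assumptions, not the authors' implementation: the three solver arguments are placeholders for Solve-POMDP, Behavioral-Equivalence-Models, and Modified-GIP, and the `toy_*` functions are hypothetical stubs that only make the control flow runnable.

```python
from itertools import product


def solve_ipomdp(agent, level, agents, states,
                 solve_pomdp, equivalence_models, modified_gip):
    """Recursive skeleton mirroring the Solve-IPOMDP pseudocode."""
    if level == 0:
        return solve_pomdp(agent)  # level 0 models are POMDPs
    class_sets = []
    for other in agents:
        if other == agent:
            continue
        # Solve the other agent's problem one nesting level down.
        policy_j = solve_ipomdp(other, level - 1, agents, states,
                                solve_pomdp, equivalence_models, modified_gip)
        class_sets.append(equivalence_models(policy_j))
    # ECIS_i = S x (product over j of M_j)
    ecis = list(product(states, *class_sets))
    return modified_gip(agent, ecis)


# Hypothetical stubs: each returns labels instead of real policies.
def toy_solve_pomdp(agent):
    return ["open-left", "listen", "open-right"]


def toy_equivalence_models(policy):
    return sorted(set(policy))  # one class per distinct behavior


def toy_modified_gip(agent, ecis):
    return [f"{agent}-policy-over-{len(ecis)}-ECIS"]


print(solve_ipomdp("i", 1, ["i", "j"], ["tiger-left", "tiger-right"],
                   toy_solve_pomdp, toy_equivalence_models, toy_modified_gip))
```

Passing the solvers as arguments keeps the recursion separate from the choice of POMDP solver, so the stubs could be swapped for real ones without touching `solve_ipomdp`.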

Page 10: Multi-Agent Persistent-Tiger Problem

[Figure: the multi-agent tiger problem, with reward labels +10 and -100]

Observations: {Growl Left, Growl Right} × {Creak Right, Creak Left, Silence}

Page 11: Beliefs on ECIS

[Figure: Agent j’s policy]

Page 12: Agent i’s policy in the presence of another agent j

Policy becomes more diverse as i’s ability to observe j’s actions improves


Page 14: Conclusions

• A method that enables exact solution of finitely nested interactive POMDPs
• Aggregate agent models into behavioral equivalence classes
– Discretization is lossless
• Interesting behaviors emerge in the multi-agent Tiger problem

Page 15: Questions

Thank you, and please stop by my poster