
Fifth International Conference on Autonomous Agents and Multi-agent Systems (AAMAS-06)

Exact Solutions of Interactive POMDPs Using Behavioral Equivalence

Speaker: Prashant Doshi

University of Georgia

Authors: B. Rathnasabapathy, Prashant Doshi, and Piotr Gmytrasiewicz

2 Overview

I-POMDP: framework for sequential decision making for an agent in a multi-agent setting

– Takes the perspective of an individual in an interaction

Problem: cardinality of the interactive state space → infinite

– Other agent's models (incl. beliefs) are part of an agent's state space (interactive epistemology)

An algorithm for solving I-POMDPs exactly

– Aggregate behaviorally equivalent models of other agents

3 Background – Properties of POMDPs and I-POMDPs

• Finitely nested

– Beliefs are nested up to a finite strategic level l

– Level 0 models are POMDPs

• Value function of POMDP and finitely nested I-POMDP is piecewise linear and convex (PWLC)

• Agents’ behaviors in POMDP and finitely nested I-POMDP can be represented using policy trees

Notation: $\text{I-POMDP}_{i,l}$ (finitely nested I-POMDP of agent $i$ with strategy level $l$)
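The PWLC property can be illustrated with a minimal Python sketch: a PWLC value function is the upper surface of a finite set of linear functions (alpha vectors) over the belief simplex. The function name `pwlc_value` and the alpha vectors in `ALPHAS` are hypothetical and chosen only for illustration; they are not taken from the talk.

```python
from typing import Sequence

def pwlc_value(belief: Sequence[float],
               alpha_vectors: Sequence[Sequence[float]]) -> float:
    """Evaluate a PWLC value function at a belief.

    The value is the maximum, over all alpha vectors, of the dot
    product between the belief and the alpha vector.
    """
    return max(
        sum(b * a for b, a in zip(belief, alpha))
        for alpha in alpha_vectors
    )

# Hypothetical alpha vectors for a two-state problem (illustration only).
ALPHAS = [(10.0, -100.0), (-100.0, 10.0), (-1.0, -1.0)]
```

Each alpha vector corresponds to one policy tree, so the vector attaining the maximum at a belief also identifies the behavior that is optimal there.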

4 Interactive POMDPs

• Definition

• Interactive state space

– $S$: set of physical states

– $\Theta_{j,l-1}$: set of intentional models

– $SM_j$: set of subintentional models

– Intentional models contain the other agent’s beliefs

5 Example: Single-Agent Tiger Problem

[Figure: tiger problem; the tiger's location is unknown (?). Reward labels: +10, -100, -1]
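As a rough illustration of how an agent's belief evolves in the single-agent tiger problem, the sketch below applies Bayes' rule after one listen action. The listening accuracy of 0.85 is an assumption (a value commonly used for this benchmark), not a number stated on the slide, and the function name is hypothetical.

```python
def update_tiger_belief(p_left: float, heard_left: bool,
                        accuracy: float = 0.85) -> float:
    """Return P(tiger behind left door) after one listen action.

    p_left     -- prior probability that the tiger is behind the left door
    heard_left -- True if the growl was heard from the left
    accuracy   -- assumed probability that the growl comes from the correct side
    """
    # Likelihood of the observation under each physical state.
    like_if_left = accuracy if heard_left else 1.0 - accuracy
    like_if_right = 1.0 - accuracy if heard_left else accuracy
    numerator = like_if_left * p_left
    return numerator / (numerator + like_if_right * (1.0 - p_left))
```

Starting from a uniform prior, one growl from the left moves the belief to the assumed accuracy, and a second consistent growl sharpens it further.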

6 Behaviorally Equivalent Models

[Figure: Equivalence Classes of Beliefs, with regions labeled P1, P2, P3]
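One way to picture equivalence classes of beliefs is to group beliefs by the policy that is optimal at each of them. The sketch below does this for a two-state problem, using the index of the maximizing alpha vector as a stand-in for the policy. The function name, the alpha vectors, and the belief grid used in the usage note are all hypothetical; this is a simplified illustration, not the authors' procedure.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

def equivalence_classes(
    beliefs: Sequence[float],
    alpha_vectors: Sequence[Tuple[float, float]],
) -> Dict[int, List[float]]:
    """Partition beliefs by the alpha vector that is optimal at each one.

    Each belief is a single number p = P(state 0) in a two-state problem.
    Beliefs mapped to the same index share an optimal policy, so they are
    behaviorally equivalent in this simplified setting.
    """
    classes: Dict[int, List[float]] = defaultdict(list)
    for p in beliefs:
        values = [p * a0 + (1.0 - p) * a1 for a0, a1 in alpha_vectors]
        classes[values.index(max(values))].append(p)
    return dict(classes)
```

For example, with the assumed alpha vectors `[(10, -100), (-1, -1), (-100, 10)]`, the beliefs 0.95 and 1.0 fall in one class, 0.5 in a second, and 0.0 and 0.05 in a third, so an infinite set of beliefs collapses to three classes.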

7 Equivalence Classes of Interactive States

• Definition

– Combination of a physical state and an equivalence class of models

8 Lossless Aggregation

• In a finitely nested I-POMDP, a probability distribution over the equivalence classes of interactive states provides a sufficient statistic for the past history of i’s observations

• Transformation of the interactive state space into behavioral equivalence classes is value-preserving

• Optimal policy of the transformed finitely nested I-POMDP remains unchanged
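The aggregation step can be sketched as follows: a belief over (physical state, model) pairs is mapped to a belief over (physical state, equivalence class) pairs by summing the probability of all models in the same class. The dictionary layout, the function name, and the labels used in the usage note are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Hashable, Tuple

def aggregate_belief(
    belief: Dict[Tuple[Hashable, Hashable], float],
    model_class: Dict[Hashable, Hashable],
) -> Dict[Tuple[Hashable, Hashable], float]:
    """Collapse a belief over interactive states onto equivalence classes.

    belief      -- maps (physical_state, model) to a probability
    model_class -- maps each model to the label of its equivalence class
    """
    aggregated: Dict[Tuple[Hashable, Hashable], float] = defaultdict(float)
    for (state, model), prob in belief.items():
        # Models in the same class prescribe the same behavior, so their
        # probability mass can be pooled.
        aggregated[(state, model_class[model])] += prob
    return dict(aggregated)
```

For instance, if hypothetical models `m1` and `m2` both belong to class `P1`, then probabilities 0.2 on `("TL", "m1")` and 0.3 on `("TL", "m2")` become a single entry of 0.5 on `("TL", "P1")`, and the total mass still sums to 1.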

9 Solving I-POMDPs Exactly

Procedure Solve-IPOMDP(AGENTi, Belief Nesting L) : Returns Policy
    If L = 0 Then
        Return Policy := Solve-POMDP(AGENTi)
    Else
        For all AGENTj ≠ AGENTi
            Policyj := Solve-IPOMDP(AGENTj, L-1)
            Mj := Behavioral-Equivalence-Models(Policyj)
        End
        ECISi := S × (×j Mj)
        Policy := Modified-GIP(ECISi, Ai, Ti, Ωi, Oi, Ri)
        Return Policy
    End
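The control flow of the procedure above can be sketched in Python. The three solver arguments (`solve_pomdp`, `equivalence_models`, `modified_gip`) are hypothetical callables standing in for Solve-POMDP, Behavioral-Equivalence-Models, and Modified-GIP; their signatures are assumptions, and the sketch reads the pseudocode as computing one set of model classes per other agent.

```python
from itertools import product

def solve_ipomdp(agent, level, agents, states,
                 solve_pomdp, equivalence_models, modified_gip):
    """Recursive skeleton of Solve-IPOMDP (structure only, solvers injected).

    agent, level        -- the agent to solve for and its belief nesting
    agents, states      -- all agents and the physical states S
    solve_pomdp         -- assumed callable: agent -> policy
    equivalence_models  -- assumed callable: policy -> list of model classes
    modified_gip        -- assumed callable: (agent, ecis) -> policy
    """
    if level == 0:
        # Level 0 models are ordinary POMDPs.
        return solve_pomdp(agent)

    model_classes = []
    for other in agents:
        if other == agent:
            continue
        # Solve the other agent's problem one nesting level down.
        policy_j = solve_ipomdp(other, level - 1, agents, states,
                                solve_pomdp, equivalence_models, modified_gip)
        model_classes.append(equivalence_models(policy_j))

    # ECIS_i = S x (product over j of M_j), as tuples (state, class_j, ...).
    ecis = [(s,) + combo
            for s in states
            for combo in product(*model_classes)]
    return modified_gip(agent, ecis)
```

Passing the solvers in as arguments keeps the recursion testable with simple stubs; a real implementation would also need the remaining I-POMDP components (Ai, Ti, Ωi, Oi, Ri), which are omitted here.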

10 Multi-Agent Persistent-Tiger Problem

[Figure: multi-agent tiger problem. Reward labels: +10, -100. Observations: {Growl Left, Growl Right} × {Creak Right, Creak Left, Silence}]

11 Beliefs on ECIS

Agent j’s policy

Agent i’s policy in the presence of another agent j

Agent i’s policy becomes more diverse as i’s ability to observe j’s actions improves

14 Conclusions

• A method that enables exact solution of finitely nested interactive POMDPs

• Aggregate agent models into behavioral equivalence classes

– Discretization is lossless

• Interesting behaviors emerge in the multi-agent Tiger problem

Thank You and Please Stop by my Poster

Questions
