An Analysis Framework for Ad Hoc Teamwork Taskspstone/Papers/bib2html-links/AAMAS12-Barrett... ·...

An Analysis Framework for Ad Hoc Teamwork Tasks

Samuel Barrett Peter Stone

The University of Texas at Austin
{sbarrett,pstone}@cs.utexas.edu

AAMAS 2012
June 8, 2012


Ad Hoc Teamwork

Only in control of a single agent
Unknown teammates
Shared goals
No pre-coordination

Examples in humans:
  Pickup soccer
  Accident response


Motivation

Agents are becoming more common and lasting longer
Pre-coordination may not be possible
Previous work focuses on specific subsets of the ad hoc teamwork problem
Unify research in ad hoc teamwork


Overview

Analyze ad hoc team problems in terms of 3 dimensions
Analysis helps reuse prior algorithms on new domains
Better identify areas for future research


Ad Hoc Agent Evaluation

Not whether they win, but how well they cooperate
Compare against other ad hoc agents
Depends on possible tasks
Depends on possible teammates










Dimensions

Analyze ad hoc team problems
Identify informative dimensions
Give explicit measures to compare problems


Team Knowledge

Does the ad hoc agent know what its teammates' actions will be for a given state, before interacting with them?

What the ad hoc agent knows ahead of time, not what it can learn
Compare expected distribution of teammates' actions to true distribution
Averaged over all states and teammates
Higher values mean more knowledge and easier planning


Team Knowledge

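\[
K(T, P) =
\begin{cases}
1 & \text{if } JS(T, P) = 0 \\
1 - \dfrac{JS(T, P)}{JS(T, U)} & \text{if } JS(T, P) < JS(T, U) \\
-\dfrac{JS(P, U)}{JS(U, \mathit{Point})} & \text{otherwise}
\end{cases}
\]

\[
\text{Team Knowledge} = \frac{1}{nk} \sum_{s=1}^{n} \sum_{t=1}^{k} K(T_t(s), P_t(s))
\]

T - True distribution
P - Predicted distribution
U - Uniform distribution
JS - Jensen-Shannon divergence, a symmetric variant of KL
Point - Distribution with all weight on one point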
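
The measure above can be sketched in a few lines of Python. This is only an illustrative reading of the slide, not the authors' implementation: it takes U to be the uniform distribution over actions, measures JS in bits, and treats JS values below a small tolerance `eps` as zero. The function names and the `[state][teammate]` list layout are assumptions made for the example.

```python
import math

def kl(p, q):
    """KL(p || q) in bits; terms with p[i] == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js(p, q):
    """Jensen-Shannon divergence: average KL of p and q to their midpoint m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def knowledge(true_dist, pred_dist, eps=1e-12):
    """K(T, P) as on the slide; U is assumed uniform, Point has all weight on index 0."""
    n = len(true_dist)
    uniform = [1.0 / n] * n
    point = [1.0] + [0.0] * (n - 1)
    d_pred = js(true_dist, pred_dist)
    d_unif = js(true_dist, uniform)
    if d_pred < eps:          # JS(T, P) = 0: the prediction is exact
        return 1.0
    if d_pred < d_unif:       # the prediction beats a uniform guess
        return 1.0 - d_pred / d_unif
    # The prediction is no better than uniform: score in [-1, 0]
    return -js(pred_dist, uniform) / js(uniform, point)

def team_knowledge(true_dists, pred_dists):
    """Average K over all states s and teammates t; inputs are indexed [s][t]."""
    scores = [
        knowledge(t, p)
        for true_row, pred_row in zip(true_dists, pred_dists)
        for t, p in zip(true_row, pred_row)
    ]
    return sum(scores) / len(scores)
```

Under these assumptions an exact prediction scores 1, a uniform prediction scores 0, and a confident prediction of the wrong action scores about -1.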


Environment Knowledge

Does the ad hoc agent know the transition and reward distributions given a joint action and state, before interacting with the environment?

2 parts - transition and reward are separate
Compare expected next state/reward distribution to true distribution
Given full information of joint action
Averaged over all states
Higher values mean more knowledge and easier planning


Reactivity of Teammates

How much do the ad hoc agent's actions affect those of its teammates?

Compare resulting joint action distribution for different ad hoc agent actions
One step effects
Similar to empowerment
Higher values mean more reactive teammates and harder planning, but more control
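
The slide does not give the reactivity formula, so the following is only a rough sketch of the idea of comparing teammates' one-step responses across the ad hoc agent's actions. It assumes, purely for illustration, that reactivity in a single state is the mean pairwise Jensen-Shannon divergence (in bits) between the teammates' action distributions induced by each ad hoc action. The name `one_step_reactivity` and the dictionary layout are hypothetical.

```python
import math
from itertools import combinations

def js(p, q):
    """Jensen-Shannon divergence in bits, so the result lies in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl_to_m(x):
        return sum(xi * math.log2(xi / mi) for xi, mi in zip(x, m) if xi > 0)

    return 0.5 * kl_to_m(p) + 0.5 * kl_to_m(q)

def one_step_reactivity(response):
    """Mean pairwise JS between teammate action distributions in one fixed state.

    response maps each ad hoc action to the distribution over teammate
    actions that follows it (an assumed, illustrative representation).
    """
    pairs = list(combinations(response.values(), 2))
    if not pairs:  # fewer than two ad hoc actions: nothing to compare
        return 0.0
    return sum(js(p, q) for p, q in pairs) / len(pairs)

# Teammates that ignore the ad hoc agent versus teammates that mirror it.
ignoring = {"up": [0.5, 0.5], "down": [0.5, 0.5]}
mirroring = {"up": [1.0, 0.0], "down": [0.0, 1.0]}
```

With this stand-in measure, `ignoring` scores 0 and `mirroring` scores 1, matching the intuition that higher values mean the ad hoc agent has more influence over its teammates.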



Overview

Analyze variations on an example domain
Identify how to reuse prior work
Analyze more domains
Identify areas for future research


Pursuit Domain

Grid world - Torus
N Predators and 1 Prey
Predators' goal is to capture the prey as quickly as possible
Act simultaneously
Collisions randomly decided - loser stays still
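
As a rough illustration of these rules, one time step of such a domain might be sketched as follows. The slide does not say how collisions are resolved beyond "randomly", so this sketch assumes agents are processed in a random order and an agent whose target cell is occupied stays still. The move names, the `step` function, and the square grid of side `size` are all assumptions for the example.

```python
import random

# Hypothetical move set: four grid directions plus staying in place.
MOVES = {
    "stay": (0, 0),
    "up": (0, -1),
    "down": (0, 1),
    "left": (-1, 0),
    "right": (1, 0),
}

def step(positions, actions, size, rng=random):
    """Apply one action per agent on a size x size torus.

    positions: dict agent -> (x, y); actions: dict agent -> move name.
    Agents are resolved in random order; an agent whose target cell is
    occupied at its turn loses the collision and stays where it is.
    """
    new_positions = dict(positions)
    order = list(positions)
    rng.shuffle(order)
    for agent in order:
        x, y = new_positions[agent]
        dx, dy = MOVES[actions[agent]]
        target = ((x + dx) % size, (y + dy) % size)  # wrap around the edges
        if target not in new_positions.values():
            new_positions[agent] = target
    return new_positions
```

For example, an agent at `(0, 0)` that moves `"left"` on a 5 x 5 grid ends up at `(4, 0)`, and when two agents head for the same empty cell exactly one of them gets it.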


[Video: vid.mpg]

4-Predator Known Behaviors

4 predators
Teammates use 1 of 4 behaviors
Teammate behavior is known


Problem Analysis

Domain      Team Knowledge   Environment Knowledge (Trans, Reward)   Teammate Reactivity
4 Known     1                (1,1)                                   0.00105–0.501


4-Predator Unknown Behaviors

4 predators
Teammates are drawn from set of 4 behaviors
Teammate behavior is not known


Problem Analysis



Domain      Team Knowledge   Environment Knowledge (Trans, Reward)   Teammate Reactivity
4 Unknown   0.155–0.807      (1,1)                                   0.00105–0.501


2-Predator Simultaneous

2 predators
Choose high level behavior to play for an episode
Both know expected capture time for behaviors
Teammate plays best response
Capture by both predators neighboring the prey


Problem Analysis



Domain      Team Knowledge   Environment Knowledge (Trans, Reward)   Teammate Reactivity
4 Unknown   0.155–0.807      (1,1)                                   0.00105–0.501
2 Simul     1                (1,1)                                   0.198


2-Predator Teaching

2 predators taking turns
Choose high level behavior to play for an episode
Ad hoc agent knows expected capture times for behaviors
Teammate chooses greedily based on observed capture times
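
A teammate of this kind could be sketched as below. The slide only says the teammate is greedy with respect to observed capture times, so the details here are assumptions: the teammate tracks the sample mean per behavior, picks the lowest mean, and falls back to the first listed behavior when nothing has been observed. The class name `GreedyTeammate` is hypothetical, the behavior labels TA and GR are borrowed from the deck's payoff matrix, and the capture times are invented for illustration.

```python
class GreedyTeammate:
    """Picks the behavior with the lowest mean observed capture time."""

    def __init__(self, behaviors):
        # Observed capture times per behavior, in the order given.
        self.observed = {b: [] for b in behaviors}

    def observe(self, behavior, capture_time):
        """Record the capture time of one episode played with a behavior."""
        self.observed[behavior].append(capture_time)

    def choose(self):
        """Return the behavior with the lowest sample mean so far."""
        means = {
            b: sum(times) / len(times)
            for b, times in self.observed.items()
            if times
        }
        if not means:  # nothing observed yet: fall back to the first behavior
            return next(iter(self.observed))
        return min(means, key=means.get)


teammate = GreedyTeammate(["TA", "PD", "GR"])
teammate.observe("TA", 6.0)
teammate.observe("GR", 5.0)
first_choice = teammate.choose()   # GR has the lowest mean so far
teammate.observe("GR", 9.0)        # an unlucky episode raises GR's mean to 7.0
second_choice = teammate.choose()  # TA now looks better
```

Because the teammate only sees sampled capture times, the episodes the ad hoc agent chooses to play shape what the teammate believes, which is where the teaching aspect of the problem comes from.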


Problem Analysis



Domain      Team Knowledge   Environment Knowledge (Trans, Reward)   Teammate Reactivity
4 Unknown   0.155–0.807      (1,1)                                   0.00105–0.501
2 Simul     1                (1,1)                                   0.198
2 Teach     1                (1,1)                                   0.0342–0.118


2-Predator Simultaneous Revisited

Repeated normal-form game
Shared payoffs
Choosing behaviors corresponds to choosing row/column
Try to find lowest cost path to optimal cell

        TA       PD       GR
TA    -4.583   -5.123   -5.152
PD    -5.123   -4.946   -4.615
GR    -5.152   -4.615   -4.379
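
To make the "lowest cost path" idea concrete, here is a small sketch using the payoff matrix above. It rests on assumptions not stated on the slide: the ad hoc agent picks the row, the teammate picks the column that is the best response to the ad hoc agent's previous row (a memory size of 1), and the cost of a step is the gap between the best payoff in the matrix and the payoff actually received. The function names are hypothetical.

```python
import heapq

BEHAVIORS = ["TA", "PD", "GR"]
PAYOFF = [
    [-4.583, -5.123, -5.152],
    [-5.123, -4.946, -4.615],
    [-5.152, -4.615, -4.379],
]

def best_response(row):
    """Column the teammate plays after seeing the ad hoc agent play `row`."""
    return max(range(len(PAYOFF[row])), key=lambda col: PAYOFF[row][col])

def cheapest_path(start_row):
    """Dijkstra search over states, where a state is the ad hoc agent's previous row.

    Returns (total_cost, path); path lists the (row, column) behaviors
    played at each step, ending with the optimal cell.
    """
    best = max(max(row) for row in PAYOFF)
    frontier = [(0.0, start_row, [])]
    settled = set()
    while frontier:
        cost, prev_row, path = heapq.heappop(frontier)
        if prev_row in settled:
            continue
        settled.add(prev_row)
        col = best_response(prev_row)
        for row in range(len(PAYOFF)):
            step_cost = best - PAYOFF[row][col]
            new_path = path + [(BEHAVIORS[row], BEHAVIORS[col])]
            if step_cost == 0:
                # Reaching the optimal cell adds no cost to the cheapest
                # open state, so no cheaper path can exist.
                return cost, new_path
            heapq.heappush(frontier, (cost + step_cost, row, new_path))
    return None

cost, path = cheapest_path(BEHAVIORS.index("TA"))
```

Starting from TA, this sketch finds the path (PD, TA) then (GR, GR) at a cost of about 0.744. Switching straight to GR would cost about 0.773 in the first step, so a detour through PD is slightly cheaper under these assumptions.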


2-Predator Simultaneous Revisited

[?]

Efficient algorithm when the memory size is 1

Exponential algorithm when the memory size is greater than 1

Can handle non-deterministic teammates


Related Work

[?], [?], [?],


Problem Descriptions

Flocking control - controlling a flock with a shill agent
  Han, Li, and Guo 2006

Unknown teammates (UTM) - cooperative box pushing, meeting in a 3x3 grid, and multi-channel broadcast
  UTM-1 - follow a fixed set of actions
  UTM-2 - attempt to play optimally, but have limited observations
  Wu, Zilberstein, and Chen 2011

Simulated pickup soccer - ad hoc agent given a different playbook
  Bowling and McCracken 2005


Problem Analysis


Domain                             Team Knowledge   Environment Knowledge (Trans, Reward)   Teammate Reactivity
Flocking control                   1                (1,1)                                   0.0732–0.880
Cooperating with UTM-1 teammates   0                (1,1)                                   0
Cooperating with UTM-2 teammates   0                (1,1)                                   >0
Simulated pickup soccer            >0               (>0,1)                                  >0


Future Research

Most prior work focused on planning to interact with teammates
Low team knowledge - must explore teammates' behaviors
Low environment knowledge - must explore environment
Trade-off between exploiting current knowledge, exploring teammates, and exploring the environment
More complex domains


Conclusions

Can analyze ad hoc team problems in terms of:
  Team Knowledge
  Environment Knowledge
  Teammate Reactivity

Analysis helps reuse prior algorithms on new domains
Better identify areas for future research


Thank You!

Analyzing domains can help ad hoc team agents cooperate with a variety of teammates

