An Analysis Framework for Ad Hoc Teamwork Taskspstone/Papers/bib2html-links/AAMAS12-Barrett... ·...

35
An Analysis Framework for Ad Hoc Teamwork Tasks Samuel Barrett Peter Stone The University of Texas at Austin {sbarrett,pstone}@cs.utexas.edu AAMAS 2012 June 8, 2012 Samuel Barrett, Peter Stone An Analysis Framework for Ad Hoc Teamwork Tasks

Transcript of An Analysis Framework for Ad Hoc Teamwork Taskspstone/Papers/bib2html-links/AAMAS12-Barrett... ·...

An Analysis Framework forAd Hoc Teamwork Tasks

Samuel Barrett Peter Stone

The University of Texas at Austin{sbarrett,pstone}@cs.utexas.edu

AAMAS 2012June 8, 2012

Samuel Barrett, Peter Stone An Analysis Framework for Ad Hoc Teamwork Tasks

Ad Hoc Teamwork

Only in control of a singleagentUnknown teammatesShared goalsNo pre-coordination

Examples in humans:Pick up soccerAccident response

Samuel Barrett, Peter Stone An Analysis Framework for Ad Hoc Teamwork Tasks

Motivation

Agents are becoming more common and lasting longerPre-coordination may not be possiblePrevious work focuses on specific subsets of the ad hocteamwork problemUnify research in ad hoc teamwork

Samuel Barrett, Peter Stone An Analysis Framework for Ad Hoc Teamwork Tasks

Overview

Analyze ad hoc team problems in terms of 3 dimensionsAnalysis helps for reusing prior algorithms on new domainsBetter identify areas for future research

Samuel Barrett, Peter Stone An Analysis Framework for Ad Hoc Teamwork Tasks

Ad Hoc Agent Evaluation

Not whether they win, buthow well they cooperateCompare against other adhoc agentsDepends on possible tasksDepends on possibleteammates

Samuel Barrett, Peter Stone An Analysis Framework for Ad Hoc Teamwork Tasks

Ad Hoc Agent Evaluation

Samuel Barrett, Peter Stone An Analysis Framework for Ad Hoc Teamwork Tasks

Ad Hoc Agent Evaluation

Samuel Barrett, Peter Stone An Analysis Framework for Ad Hoc Teamwork Tasks

Ad Hoc Agent Evaluation

Samuel Barrett, Peter Stone An Analysis Framework for Ad Hoc Teamwork Tasks

Ad Hoc Agent Evaluation

Samuel Barrett, Peter Stone An Analysis Framework for Ad Hoc Teamwork Tasks

Dimensions

Analyze ad hoc team problemsIdentify informative dimensionsGive explicit measures to compare problems

Samuel Barrett, Peter Stone An Analysis Framework for Ad Hoc Teamwork Tasks

Team Knowledge

Does the ad hoc agent know what its teammates actions willbe for a given state, before interacting with them?

What the ad hoc agent knows ahead of time, not what itcan learnCompare expected distribution of teammates actions totrue distributionAveraged over all states and teammatesHigher values more knowledge easier planning

Samuel Barrett, Peter Stone An Analysis Framework for Ad Hoc Teamwork Tasks

Team Knowledge

Does the ad hoc agent know what its teammates actions willbe for a given state, before interacting with them?

What the ad hoc agent knows ahead of time, not what itcan learnCompare expected distribution of teammates actions totrue distributionAveraged over all states and teammatesHigher values more knowledge easier planning

Samuel Barrett, Peter Stone An Analysis Framework for Ad Hoc Teamwork Tasks

Team Knowledge

K (T ,P) =

1 if JS(T ,P) = 0

1 JS(T ,P)JS(T ,U)

if JS(T ,P) < JS(T ,U)

JS(P,U)JS(U,Point)

otherwise

Team Knowledge =

ns=1

kt=1

K (Tt(s),Pt(s))

nkT - True distributionP - Predicted distributionJS - Jensen-Shannon divergence, a symmetric variant of KLPoint - distribution with all weight on one point

Samuel Barrett, Peter Stone An Analysis Framework for Ad Hoc Teamwork Tasks

Environment Knowledge

Does the ad hoc agent know the transition and reward dis-tribution given a joint action and state before interacting withthe environment?

2 parts - transition and reward are separateCompare expected next state/reward distribution to truedistributionGiven full information of joint actionAveraged over all statesHigher values more knowledge easier planning

Samuel Barrett, Peter Stone An Analysis Framework for Ad Hoc Teamwork Tasks

Environment Knowledge

Does the ad hoc agent know the transition and reward dis-tribution given a joint action and state before interacting withthe environment?

2 parts - transition and reward are separateCompare expected next state/reward distribution to truedistributionGiven full information of joint actionAveraged over all statesHigher values more knowledge easier planning

Samuel Barrett, Peter Stone An Analysis Framework for Ad Hoc Teamwork Tasks

Reactivity of Teammates

How much does the ad hoc agents actions affect those of itsteammates?

Compare resulting joint action distribution for different adhoc agents actionsOne step effectsSimilar to empowermentHigher values more reactive harder planning, butmore control

Samuel Barrett, Peter Stone An Analysis Framework for Ad Hoc Teamwork Tasks

Reactivity of Teammates

How much does the ad hoc agents actions affect those of itsteammates?

Compare resulting joint action distribution for different adhoc agents actionsOne step effectsSimilar to empowerment

[?]

Higher values more reactive harder planning, butmore control

Samuel Barrett, Peter Stone An Analysis Framework for Ad Hoc Teamwork Tasks

Overview

Analyze variations on an example domainIdentify how to reuse prior workAnalyze more domainsIdentify areas for future research

Samuel Barrett, Peter Stone An Analysis Framework for Ad Hoc Teamwork Tasks

Pursuit Domain

Grid world - TorusN Predators and 1 PreyPredators goal is tocapture the prey as quicklyas possibleAct simultaneouslyCollisions randomlydecided - loser stays still

Samuel Barrett, Peter Stone An Analysis Framework for Ad Hoc Teamwork Tasks

vid.mpgMedia File (video/mpeg)

4-Predator Known Behaviors

4 predatorsTeammates use 1 of 4behaviorsTeammate behavior isknown

Samuel Barrett, Peter Stone An Analysis Framework for Ad Hoc Teamwork Tasks

Problem Analysis

DomainTeam Environment Teammate

Knowledge (Trans, Reward) Reactivity4 Known 1 (1,1) 0.001050.501

Samuel Barrett, Peter Stone An Analysis Framework for Ad Hoc Teamwork Tasks

4-Predator Unknown Behaviors

4 predatorsTeammates are drawnfrom set of 4 behaviorsTeammate behavior is notknown

Samuel Barrett, Peter Stone An Analysis Framework for Ad Hoc Teamwork Tasks

Problem Analysis

DomainTeam Environment Teammate

Knowledge (Trans, Reward) Reactivity4 Known 1 (1,1) 0.001050.501

4 Unknown 0.1550.807 (1,1) 0.001050.501

Samuel Barrett, Peter Stone An Analysis Framework for Ad Hoc Teamwork Tasks

2-Predator Simultaneous

2 predatorsChoose high level behaviorto play for an episodeBoth know expectedcapture time for behaviorsTeammate plays bestresponseCapture by both predatorsneighboring the prey

Samuel Barrett, Peter Stone An Analysis Framework for Ad Hoc Teamwork Tasks

Problem Analysis

DomainTeam Environment Teammate

Knowledge (Trans, Reward) Reactivity4 Known 1 (1,1) 0.001050.501

4 Unknown 0.1550.807 (1,1) 0.001050.5012 Simul 1 (1,1) 0.198

Samuel Barrett, Peter Stone An Analysis Framework for Ad Hoc Teamwork Tasks

2-Predator Teaching

2 predators taking turnsChoose high level behaviorto play for an episodeAd hoc agent knowsexpected capture times forbehaviorsTeammate choosesgreedily based onobserved capture times

Samuel Barrett, Peter Stone An Analysis Framework for Ad Hoc Teamwork Tasks

Problem Analysis

DomainTeam Environment Teammate

Knowledge (Trans, Reward) Reactivity4 Known 1 (1,1) 0.001050.501

4 Unknown 0.1550.807 (1,1) 0.001050.5012 Simul 1 (1,1) 0.1982 Teach 1 (1,1) 0.03420.118

Samuel Barrett, Peter Stone An Analysis Framework for Ad Hoc Teamwork Tasks

2-Predator Simultaneous Revisited

Repeated normal-formgameShared payoffsChoosing behaviorscorresponds to choosingrow/columnTry to find lowest cost pathto optimal cell

=TA PD GR

TA -4.583 -5.123 -5.152PD -5.123 -4.946 -4.615GR -5.152 -4.615 -4.379

Samuel Barrett, Peter Stone An Analysis Framework for Ad Hoc Teamwork Tasks

2-Predator Simultaneous Revisited

[?]

Efficient algorithm when memory size of 1

Exponential algorithm when memory size of >1

Can handle non-deterministic teammates

Samuel Barrett, Peter Stone An Analysis Framework for Ad Hoc Teamwork Tasks

Related Work

[?], [?], [?],

Samuel Barrett, Peter Stone An Analysis Framework for Ad Hoc Teamwork Tasks

Problem Descriptions

Flocking control - controlling a flock with a shill agentHan, Li, and Guo 2006

Unknown teammates (UTM) - cooperative box pushing,meeting in a 3x3 grid, and multi-channel broadcast

UTM-1 - follow a fixed set of actionsUTM-2 - attempt to play optimally, but have limitedobservationsWu, Zilberstein, and Chen 2011

Simulated pickup soccer - ad hoc agent given a differentplaybook

Bowling and McCracken 2005

Samuel Barrett, Peter Stone An Analysis Framework for Ad Hoc Teamwork Tasks

Problem Analysis

DomainTeam Environment Teammate

Knowledge (Trans, Reward) ReactivityFlocking control 1 (1,1) 0.07320.880Cooperating with

0 (1,1) 0UTM-1 teammatesCooperating with

0 (1,1) >0UTM-2 teammates

Simulated pickup soccer >0 (>0,1) >0

Samuel Barrett, Peter Stone An Analysis Framework for Ad Hoc Teamwork Tasks

Future Research

Most prior work focused on planning to interact withteammatesLow team knowledge - must explore teammates behaviorsLow environment knowledge - must explore environmentTrade-off between exploiting current knowledge, exploringteammates, and exploring the environmentMore complex domains

Samuel Barrett, Peter Stone An Analysis Framework for Ad Hoc Teamwork Tasks

Conclusions

Can analyze ad hoc team problems in terms of:Team KnowledgeEnvironmental KnowledgeTeammate Reactivity

Analysis helps for reusing prior algorithms on new domainsBetter identify areas for future research

Samuel Barrett, Peter Stone An Analysis Framework for Ad Hoc Teamwork Tasks

Thank You!

Analyzing domains canhelp ad hoc team agentscooperate with a variety ofteammates

Samuel Barrett, Peter Stone An Analysis Framework for Ad Hoc Teamwork Tasks