An Analysis Framework for Ad Hoc Teamwork Taskspstone/Papers/bib2html-links/AAMAS12-Barrett... ·...
Transcript of An Analysis Framework for Ad Hoc Teamwork Taskspstone/Papers/bib2html-links/AAMAS12-Barrett... ·...
An Analysis Framework forAd Hoc Teamwork Tasks
Samuel Barrett Peter Stone
The University of Texas at Austin{sbarrett,pstone}@cs.utexas.edu
AAMAS 2012June 8, 2012
Samuel Barrett, Peter Stone An Analysis Framework for Ad Hoc Teamwork Tasks
Ad Hoc Teamwork
Only in control of a singleagentUnknown teammatesShared goalsNo pre-coordination
Examples in humans:Pick up soccerAccident response
Samuel Barrett, Peter Stone An Analysis Framework for Ad Hoc Teamwork Tasks
Motivation
Agents are becoming more common and lasting longerPre-coordination may not be possiblePrevious work focuses on specific subsets of the ad hocteamwork problemUnify research in ad hoc teamwork
Samuel Barrett, Peter Stone An Analysis Framework for Ad Hoc Teamwork Tasks
Overview
Analyze ad hoc team problems in terms of 3 dimensionsAnalysis helps for reusing prior algorithms on new domainsBetter identify areas for future research
Samuel Barrett, Peter Stone An Analysis Framework for Ad Hoc Teamwork Tasks
Ad Hoc Agent Evaluation
Not whether they win, buthow well they cooperateCompare against other adhoc agentsDepends on possible tasksDepends on possibleteammates
Samuel Barrett, Peter Stone An Analysis Framework for Ad Hoc Teamwork Tasks
Ad Hoc Agent Evaluation
Samuel Barrett, Peter Stone An Analysis Framework for Ad Hoc Teamwork Tasks
Ad Hoc Agent Evaluation
Samuel Barrett, Peter Stone An Analysis Framework for Ad Hoc Teamwork Tasks
Ad Hoc Agent Evaluation
Samuel Barrett, Peter Stone An Analysis Framework for Ad Hoc Teamwork Tasks
Ad Hoc Agent Evaluation
Samuel Barrett, Peter Stone An Analysis Framework for Ad Hoc Teamwork Tasks
Dimensions
Analyze ad hoc team problemsIdentify informative dimensionsGive explicit measures to compare problems
Samuel Barrett, Peter Stone An Analysis Framework for Ad Hoc Teamwork Tasks
Team Knowledge
Does the ad hoc agent know what its teammates actions willbe for a given state, before interacting with them?
What the ad hoc agent knows ahead of time, not what itcan learnCompare expected distribution of teammates actions totrue distributionAveraged over all states and teammatesHigher values more knowledge easier planning
Samuel Barrett, Peter Stone An Analysis Framework for Ad Hoc Teamwork Tasks
Team Knowledge
Does the ad hoc agent know what its teammates actions willbe for a given state, before interacting with them?
What the ad hoc agent knows ahead of time, not what itcan learnCompare expected distribution of teammates actions totrue distributionAveraged over all states and teammatesHigher values more knowledge easier planning
Samuel Barrett, Peter Stone An Analysis Framework for Ad Hoc Teamwork Tasks
Team Knowledge
K (T ,P) =
1 if JS(T ,P) = 0
1 JS(T ,P)JS(T ,U)
if JS(T ,P) < JS(T ,U)
JS(P,U)JS(U,Point)
otherwise
Team Knowledge =
ns=1
kt=1
K (Tt(s),Pt(s))
nkT - True distributionP - Predicted distributionJS - Jensen-Shannon divergence, a symmetric variant of KLPoint - distribution with all weight on one point
Samuel Barrett, Peter Stone An Analysis Framework for Ad Hoc Teamwork Tasks
Environment Knowledge
Does the ad hoc agent know the transition and reward dis-tribution given a joint action and state before interacting withthe environment?
2 parts - transition and reward are separateCompare expected next state/reward distribution to truedistributionGiven full information of joint actionAveraged over all statesHigher values more knowledge easier planning
Samuel Barrett, Peter Stone An Analysis Framework for Ad Hoc Teamwork Tasks
Environment Knowledge
Does the ad hoc agent know the transition and reward dis-tribution given a joint action and state before interacting withthe environment?
2 parts - transition and reward are separateCompare expected next state/reward distribution to truedistributionGiven full information of joint actionAveraged over all statesHigher values more knowledge easier planning
Samuel Barrett, Peter Stone An Analysis Framework for Ad Hoc Teamwork Tasks
Reactivity of Teammates
How much does the ad hoc agents actions affect those of itsteammates?
Compare resulting joint action distribution for different adhoc agents actionsOne step effectsSimilar to empowermentHigher values more reactive harder planning, butmore control
Samuel Barrett, Peter Stone An Analysis Framework for Ad Hoc Teamwork Tasks
Reactivity of Teammates
How much does the ad hoc agents actions affect those of itsteammates?
Compare resulting joint action distribution for different adhoc agents actionsOne step effectsSimilar to empowerment
[?]
Higher values more reactive harder planning, butmore control
Samuel Barrett, Peter Stone An Analysis Framework for Ad Hoc Teamwork Tasks
Overview
Analyze variations on an example domainIdentify how to reuse prior workAnalyze more domainsIdentify areas for future research
Samuel Barrett, Peter Stone An Analysis Framework for Ad Hoc Teamwork Tasks
Pursuit Domain
Grid world - TorusN Predators and 1 PreyPredators goal is tocapture the prey as quicklyas possibleAct simultaneouslyCollisions randomlydecided - loser stays still
Samuel Barrett, Peter Stone An Analysis Framework for Ad Hoc Teamwork Tasks
vid.mpgMedia File (video/mpeg)
4-Predator Known Behaviors
4 predatorsTeammates use 1 of 4behaviorsTeammate behavior isknown
Samuel Barrett, Peter Stone An Analysis Framework for Ad Hoc Teamwork Tasks
Problem Analysis
DomainTeam Environment Teammate
Knowledge (Trans, Reward) Reactivity4 Known 1 (1,1) 0.001050.501
Samuel Barrett, Peter Stone An Analysis Framework for Ad Hoc Teamwork Tasks
4-Predator Unknown Behaviors
4 predatorsTeammates are drawnfrom set of 4 behaviorsTeammate behavior is notknown
Samuel Barrett, Peter Stone An Analysis Framework for Ad Hoc Teamwork Tasks
Problem Analysis
DomainTeam Environment Teammate
Knowledge (Trans, Reward) Reactivity4 Known 1 (1,1) 0.001050.501
4 Unknown 0.1550.807 (1,1) 0.001050.501
Samuel Barrett, Peter Stone An Analysis Framework for Ad Hoc Teamwork Tasks
2-Predator Simultaneous
2 predatorsChoose high level behaviorto play for an episodeBoth know expectedcapture time for behaviorsTeammate plays bestresponseCapture by both predatorsneighboring the prey
Samuel Barrett, Peter Stone An Analysis Framework for Ad Hoc Teamwork Tasks
Problem Analysis
DomainTeam Environment Teammate
Knowledge (Trans, Reward) Reactivity4 Known 1 (1,1) 0.001050.501
4 Unknown 0.1550.807 (1,1) 0.001050.5012 Simul 1 (1,1) 0.198
Samuel Barrett, Peter Stone An Analysis Framework for Ad Hoc Teamwork Tasks
2-Predator Teaching
2 predators taking turnsChoose high level behaviorto play for an episodeAd hoc agent knowsexpected capture times forbehaviorsTeammate choosesgreedily based onobserved capture times
Samuel Barrett, Peter Stone An Analysis Framework for Ad Hoc Teamwork Tasks
Problem Analysis
DomainTeam Environment Teammate
Knowledge (Trans, Reward) Reactivity4 Known 1 (1,1) 0.001050.501
4 Unknown 0.1550.807 (1,1) 0.001050.5012 Simul 1 (1,1) 0.1982 Teach 1 (1,1) 0.03420.118
Samuel Barrett, Peter Stone An Analysis Framework for Ad Hoc Teamwork Tasks
2-Predator Simultaneous Revisited
Repeated normal-formgameShared payoffsChoosing behaviorscorresponds to choosingrow/columnTry to find lowest cost pathto optimal cell
=TA PD GR
TA -4.583 -5.123 -5.152PD -5.123 -4.946 -4.615GR -5.152 -4.615 -4.379
Samuel Barrett, Peter Stone An Analysis Framework for Ad Hoc Teamwork Tasks
2-Predator Simultaneous Revisited
[?]
Efficient algorithm when memory size of 1
Exponential algorithm when memory size of >1
Can handle non-deterministic teammates
Samuel Barrett, Peter Stone An Analysis Framework for Ad Hoc Teamwork Tasks
Related Work
[?], [?], [?],
Samuel Barrett, Peter Stone An Analysis Framework for Ad Hoc Teamwork Tasks
Problem Descriptions
Flocking control - controlling a flock with a shill agentHan, Li, and Guo 2006
Unknown teammates (UTM) - cooperative box pushing,meeting in a 3x3 grid, and multi-channel broadcast
UTM-1 - follow a fixed set of actionsUTM-2 - attempt to play optimally, but have limitedobservationsWu, Zilberstein, and Chen 2011
Simulated pickup soccer - ad hoc agent given a differentplaybook
Bowling and McCracken 2005
Samuel Barrett, Peter Stone An Analysis Framework for Ad Hoc Teamwork Tasks
Problem Analysis
DomainTeam Environment Teammate
Knowledge (Trans, Reward) ReactivityFlocking control 1 (1,1) 0.07320.880Cooperating with
0 (1,1) 0UTM-1 teammatesCooperating with
0 (1,1) >0UTM-2 teammates
Simulated pickup soccer >0 (>0,1) >0
Samuel Barrett, Peter Stone An Analysis Framework for Ad Hoc Teamwork Tasks
Future Research
Most prior work focused on planning to interact withteammatesLow team knowledge - must explore teammates behaviorsLow environment knowledge - must explore environmentTrade-off between exploiting current knowledge, exploringteammates, and exploring the environmentMore complex domains
Samuel Barrett, Peter Stone An Analysis Framework for Ad Hoc Teamwork Tasks
Conclusions
Can analyze ad hoc team problems in terms of:Team KnowledgeEnvironmental KnowledgeTeammate Reactivity
Analysis helps for reusing prior algorithms on new domainsBetter identify areas for future research
Samuel Barrett, Peter Stone An Analysis Framework for Ad Hoc Teamwork Tasks
Thank You!
Analyzing domains canhelp ad hoc team agentscooperate with a variety ofteammates
Samuel Barrett, Peter Stone An Analysis Framework for Ad Hoc Teamwork Tasks