RoboCup Standard Platform League: Strategies and Challenges

Aris Valtazanos

School of Informatics

Structure and Synthesis of Robot Motion

March 3, 2011

www.inf.ed.ac.uk

Talk overview

• The RoboCup Standard Platform League

• Team EdInferno
  • Overall framework
  • Novel techniques and algorithms

• Future endeavours


Humanoid league

• Custom-made robots, focus on hardware and control


Middle Size league

• More standardised design

• Fully autonomous

• On-board sensing and omnidirectional vision

• Only ball is colour-coded - not even the goals!


Small Size league

• Very fast wheeled robots

• Can probably already beat humans!

• But not fully autonomous - off-board, overhead vision system


Simulation league

• Two categories: 2-D and 3-D league

• 2-D: focus on multi-agent coordination, team strategies, etc.

• 3-D: simulated matches between teams of NAO robots, (basic) modelling of dynamics


So why a Standard Platform League?

• A league that provides a testbed with realistic constraints . . .

• . . . without the need to invest too much effort in hardware and dynamic primitives

• Moreover, all teams use the same robot platform!

• So, success depends solely on algorithmic merit

• Various domains of interest:
  • Physical actions (locomotion, kicking)
  • Decision making algorithms
  • Multi-robot communication and cooperation
  • Vision-based localisation
  • Belief estimation
  • . . . and several more


Platform

• 2000(?)-2007: SONY Aibo (4-legged league)

• 2008-present: Aldebaran NAO (SPL replaces the 4-legged league)


History

• 2008: Total disaster (according to eyewitnesses)

• 2009-2010: Improvement and expansion

  • Winners: B-Human (x2)

• But state-of-the-art still consists of:
  • Fast, robust locomotion
  • Good, strong kicks
  • Good enough vision-based localisation

• Very little in the way of:
  • Team cooperation (e.g. passing)
  • Team coordination (e.g. role assignment)
  • . . . and anything else you would label as “artificial intelligence”


Some technicalities

• Pitch size: 6x4m

• Team size: 4 robots (as of 2011 - previously 3)

• Visual cues:
  • Goalmouths: one blue, one yellow (localisation)
  • Localisation beacons
  • Ball: orange
  • Waistbands: pink for one team, light blue for the other (swap at half time)
  • Lines, boxes, penalty spots, etc.


Some more technicalities - NAO robot

• Height: ∼ 60cm

• Built-in closed-loop walking engine - max speed: 9.5cm/s (some teams have their own, faster engines)

• Two cameras - top & bottom (normally only use the latter)
  • Field of view: 58° (diagonal)
  • To change their FoV, robots can either move or turn their head

• Two sonar sensors across the chestboard - range up to 2m

• Various other sensors: touch, force-sensitive etc.


Team EdInferno

• Sep. 2009: team established (i.e. first robots arrived)

• Sep. 2009 - Mar. 2010: familiarisation with the platform - walking engine did not exist at that time, lots of frustrating problems

• Apr. 2010 - Aug. 2010: first serious attempt to create a complete framework, 2 more robots acquired

• Sep. 2010 - Dec. 2010: main development period, qualification for RoboCup

• Jan. 2011 - present: 5 more robots acquired, work towards full integration of all modules


Behaviour module

• Basically everything that doesn’t involve vision, localisation, or inter-robot communication

• Main functionalities:
  • Belief estimation and sensor fusion
  • Role assignment and decision making
  • Path planning and action execution

• Required inputs: locations of salient objects in field of view, communicated info from teammates, other sensor readings (sonar)


(Main) Behaviour module components

• Vision “helper”

• Belief estimator

• Decision maker

• Path target selector

• Path planner

• Action executor


Vision “helper”

• Basic trigonometric functions
  • E.g. convert image coordinates to real world distances through the robot’s kinematics

• Ball tracker

• Field of view bounds calculation
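
As an illustration of the image-to-distance conversion above, here is a minimal sketch. It assumes a pinhole camera over flat ground, with the camera height and downward pitch already known (on the robot these would come from the kinematic chain). The function name, parameters and the vertical field of view value are hypothetical and are not taken from the team's code.

```python
import math

def pixel_to_ground_distance(v_px, image_height_px, vertical_fov_rad,
                             camera_height_m, camera_pitch_rad):
    """Horizontal distance to the ground point seen at image row v_px.

    Assumptions: pinhole camera, flat ground, pitch measured downwards
    from the horizontal, image rows growing downwards.
    Returns None if the pixel ray does not hit the ground.
    """
    # Focal length in pixels, derived from the vertical field of view.
    focal_px = (image_height_px / 2.0) / math.tan(vertical_fov_rad / 2.0)
    # Angle of the pixel ray relative to the optical axis.
    ray_offset = math.atan2(v_px - image_height_px / 2.0, focal_px)
    angle_below_horizon = camera_pitch_rad + ray_offset
    if angle_below_horizon <= 0.0:
        return None  # ray points at or above the horizon
    return camera_height_m / math.tan(angle_below_horizon)
```

For the image centre row, the result reduces to camera height divided by the tangent of the pitch, which is a quick sanity check for any calibration.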


Belief estimator

• Preliminaries:
  • Observation: anything the robot sees or senses, e.g. “a robot at location (0.5, 0.5)”
  • Belief: a confidence-based deduction drawn from a history of observations, e.g. “I am 80% confident that the robot at (0.5, 0.5) is teammate #1”

• Sensor fusion: combine vision and sonar readings into a single set of observations

• Information sharing: update these observations from corresponding teammate observations

• Observation assignment: for each current observation, find best matching past belief
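
The observation assignment step could be sketched as a greedy nearest-neighbour match. This is only one plausible reading of "best matching": the gating threshold `max_distance`, the labels and the data layout below are assumptions made for illustration.

```python
import math

def assign_observations(observations, beliefs, max_distance=0.5):
    """Greedily match each observation to the closest past belief.

    observations: list of (x, y) positions seen in the current frame.
    beliefs: dict mapping a label (e.g. "teammate1") to its last (x, y).
    Returns a dict observation_index -> label, or None where no belief
    lies within max_distance (a hypothetical threshold in metres).
    """
    pairs = []
    for i, obs in enumerate(observations):
        for label, pos in beliefs.items():
            pairs.append((math.dist(obs, pos), i, label))
    pairs.sort()  # closest pairs first

    assignment = {i: None for i in range(len(observations))}
    used = set()
    for distance, i, label in pairs:
        if distance > max_distance:
            break  # all remaining pairs are even further apart
        if assignment[i] is None and label not in used:
            assignment[i] = label
            used.add(label)
    return assignment
```

An observation left unassigned would then start a new belief rather than update an existing one.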


Belief estimator (cont.)

• Particle filtering: for each teammate/adversary, maintain a set of hypotheses (particles) over their possible states

• Two main steps:
  • Predict: given a (probabilistic) motion model, estimate how each particle might next move
  • Update: compute the likelihood of each particle based on the current sensor readings

• Subject to consistent observations, particles may converge

• Role assignment: egocentrically determine each team member’s role (e.g. who should go kick the ball)
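
The predict and update steps can be sketched as follows for a 2-D position state. The Gaussian motion and sensor noise, the resampling step and all parameter names are assumptions chosen for brevity, not details given in the talk.

```python
import math
import random

def predict(particles, velocity, dt, noise_std, rng):
    """Move each (x, y) particle by the motion model plus Gaussian noise."""
    vx, vy = velocity
    return [(x + vx * dt + rng.gauss(0.0, noise_std),
             y + vy * dt + rng.gauss(0.0, noise_std))
            for x, y in particles]

def update(particles, observation, sensor_std):
    """Return normalised weights from a Gaussian observation likelihood."""
    ox, oy = observation
    weights = [math.exp(-((x - ox) ** 2 + (y - oy) ** 2)
                        / (2.0 * sensor_std ** 2))
               for x, y in particles]
    total = sum(weights)
    if total == 0.0:
        # Observation is incompatible with every particle: stay uniform.
        return [1.0 / len(particles)] * len(particles)
    return [w / total for w in weights]

def resample(particles, weights, rng):
    """Draw a new particle set in proportion to the weights."""
    return rng.choices(particles, weights=weights, k=len(particles))

def estimate(particles):
    """Mean position of the particle set."""
    n = len(particles)
    return (sum(x for x, _ in particles) / n,
            sum(y for _, y in particles) / n)
```

Calling `predict`, `update` and `resample` in a loop with a repeated, consistent observation makes the particle cloud contract around it, which is the convergence behaviour mentioned above.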


Particle filter toy example


Decision maker

• Based on own inferred role and current beliefs, determine the appropriate action

• Possible actions: move(dx, dy, dθ), kick(type, speed), scan(dyaw, dpitch), getup(front/back)

• Choice of action should depend on belief confidence (e.g. if we’re not sure where the ball is, scanning should be the highest priority)

• Also requires fine-tuned thresholds, e.g. for kicking
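
A confidence-driven action choice of this kind might look like the sketch below. The priority order and both thresholds are hypothetical placeholders. In practice they would be the fine-tuned values the slide refers to.

```python
def choose_action(ball_confidence, ball_distance_m, fallen,
                  scan_threshold=0.5, kick_range_m=0.2):
    """Pick one of the action names from the slide.

    scan_threshold and kick_range_m are illustrative values only.
    """
    if fallen:
        return "getup"   # nothing else is possible while on the ground
    if ball_confidence < scan_threshold:
        return "scan"    # unsure where the ball is: look for it first
    if ball_distance_m <= kick_range_m:
        return "kick"
    return "move"
```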


Path target selector

• Invoked if selected action == move

• Chooses an appropriate target for path planning

• More challenging than it sounds! E.g., for kickers:
  • Determine where we would like the ball to eventually be, from a list of candidate affordable locations, and subject to a set of constraints
  • Compute best kicking position and posture that will allow us to kick ball to this desired location


Path planner

• As with path target selector, invoked only if selected action == move

• As the name suggests, plans a path that will lead the robot to the desired location

• Two cases:
  • If no objects (e.g. other robots, goal posts) in view, simply plan a straight path
  • Else, plan a path, every point of which is at least some safety distance from each obstacle
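
The two cases can be illustrated with a small grid search. The talk does not say which planner is used, so breadth-first search over a discretised pitch is a stand-in here, and the cell size, safety distance and pitch bounds are assumed values. Note that the safety distance is only guaranteed at the returned waypoints, not along the segments between them.

```python
import math
from collections import deque

def plan_path(start, goal, obstacles, safety=0.3, cell=0.1,
              x_range=(-3.0, 3.0), y_range=(-2.0, 2.0)):
    """Return a list of (x, y) waypoints from start to goal, or None."""
    if not obstacles:
        return [start, goal]  # nothing in view: straight path

    nx = int(round((x_range[1] - x_range[0]) / cell)) + 1
    ny = int(round((y_range[1] - y_range[0]) / cell)) + 1

    def to_cell(p):
        return (int(round((p[0] - x_range[0]) / cell)),
                int(round((p[1] - y_range[0]) / cell)))

    def to_point(c):
        return (x_range[0] + c[0] * cell, y_range[0] + c[1] * cell)

    def is_free(c):
        if not (0 <= c[0] < nx and 0 <= c[1] < ny):
            return False
        return all(math.dist(to_point(c), o) >= safety for o in obstacles)

    s, g = to_cell(start), to_cell(goal)
    if not is_free(s) or not is_free(g):
        return None

    parent = {s: None}
    queue = deque([s])
    while queue:
        c = queue.popleft()
        if c == g:
            path = []
            while c is not None:  # walk back to the start
                path.append(to_point(c))
                c = parent[c]
            return path[::-1]
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            n = (c[0] + dx, c[1] + dy)
            if n not in parent and is_free(n):
                parent[n] = c
                queue.append(n)
    return None
```

Since the action executor only takes the first step of the trajectory, a planner like this would be re-run every cycle with the latest obstacle beliefs.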


Action executor

• Simply executes the selected action!

• If selected action == move, executes first step of computed trajectory

• May also execute two moves at once, e.g. move and scan


Research contributions (in progress!)

• Reachable sets: improve particle filtering algorithm by accounting for the physical capabilities of the adversaries

• Intent inference, escape, deceit: synthesise more intelligent behaviours that exploit the observability constraints and strategic limitations of the adversaries

• Bringing the above together in a closed-loop sense


Reachable sets


Composable reachable sets

• Initial idea: composable reachable sets

• Compute different sets for each capability hypothesis for the adversary offline

• Online, always pick the one that most closely matches the adversary’s observed behaviour, based on particle filter estimates

• Result: more flexible decision making that adapts locally, in the face of noisy observations


Composable reachable sets (cont.)

• Offers some performance improvement

• But sensory information is too noisy to allow accurate estimation of velocities

• Need a more flexible approach that adapts to adversary over time

• New approach: use the reachable set as a proposal distribution inside the particle filter

• State estimation is still probabilistic and data-driven, but with additional physical constraints
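
A rough sketch of a reachable-set proposal: each particle's successor is drawn only from positions the adversary could physically reach within one time step. Modelling the reachable set as a disc of radius `max_speed * dt` is a simplifying assumption made here. The talk does not specify the shape of the sets.

```python
import math
import random

def sample_reachable(particle, max_speed, dt, rng):
    """Draw a successor uniformly from the disc reachable within dt."""
    # Taking the square root of a uniform sample gives uniform density
    # over the disc's area instead of clustering near the centre.
    radius = max_speed * dt * math.sqrt(rng.random())
    angle = rng.uniform(0.0, 2.0 * math.pi)
    x, y = particle
    return (x + radius * math.cos(angle), y + radius * math.sin(angle))

def predict_with_reachable_set(particles, max_speed, dt, rng):
    """Predict step that never proposes a physically impossible move."""
    return [sample_reachable(p, max_speed, dt, rng) for p in particles]
```

If the motion model is itself taken to be uniform over the reachable set, the usual likelihood weighting in the update step can stay unchanged. Otherwise the weights would need the standard importance correction for the proposal.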


Intent inference, escape and deceit

• Very difficult for robots to execute complicated strategies (passing, attack formations etc.)

• But they can be strategic in different ways!

• Approach: form flexible probabilistic models of the adversaries, through which their capabilities may be exploited


Intent inference

• Decompose adversary’s behaviour into a set of coarse classes (intent templates), and define a probability distribution over them

• E.g. {Move towards ball, move towards me, move randomly, stand still}

• At time t:
  • Compute the expected moves for each template
  • Pick a template randomly (proportionally to its weight)

• At time t + 1, adjust intent template weights based on the actual move of the robot
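
The loop above might be sketched as follows. The Gaussian-shaped reweighting and the parameter `sigma` are assumptions. The "move randomly" template is left out because it has no single expected move to compare against.

```python
import math
import random

def unit_step(src, dst, step):
    """Displacement of length `step` from src towards dst."""
    d = math.dist(src, dst)
    if d == 0.0:
        return (0.0, 0.0)
    return ((dst[0] - src[0]) / d * step, (dst[1] - src[1]) / d * step)

def expected_moves(adversary, ball, me, step):
    """Expected displacement of the adversary under each template."""
    return {
        "towards_ball": unit_step(adversary, ball, step),
        "towards_me": unit_step(adversary, me, step),
        "stand_still": (0.0, 0.0),
    }

def sample_template(weights, rng):
    """Pick a template at random, proportionally to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]

def update_weights(weights, expected, actual_move, sigma=0.05):
    """Favour templates whose expected move was close to the actual one."""
    scored = {
        n: weights[n] * math.exp(
            -math.dist(expected[n], actual_move) ** 2 / (2.0 * sigma ** 2))
        for n in weights
    }
    total = sum(scored.values())
    if total == 0.0:
        return dict(weights)  # no template explains the move: keep as is
    return {n: s / total for n, s in scored.items()}
```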


Escape strategies

• Idea: Robots are faced with strong sensory limitations. . .

• . . . but this is also true of their opponents!

• Select actions and trajectories so they exploit these capabilitiesand hide information from the adversary:

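\[
\hat{\beta} = \arg\max_{\beta \in B_T} \frac{1}{|\beta|} \sum_{k=1}^{|\beta|} \mathrm{dist}(\beta_k, \mathit{vbs}_{ij}) \tag{1}
\]

\[
\hat{\rho} = \arg\max_{\rho \in R_T} \frac{1}{|\rho|} \sum_{k=1}^{|\rho|} \mathrm{dist}(\rho_k, \mathit{sbs}_{ij}) \tag{2}
\]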
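
Equations (1) and (2) share one structure: among a set of candidate trajectories, pick the one whose points have the largest mean distance from a reference quantity. The sketch below treats that reference as a single 2-D point, which is an assumption, since the slide does not define the symbols vbs and sbs.

```python
import math

def mean_distance(trajectory, reference_point):
    """Average distance of the trajectory's points from the reference."""
    return (sum(math.dist(p, reference_point) for p in trajectory)
            / len(trajectory))

def best_escape_trajectory(candidates, reference_point):
    """Arg max over candidate trajectories of the mean distance."""
    return max(candidates,
               key=lambda traj: mean_distance(traj, reference_point))
```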


Deceit

• Escape strategies are one-step predictive

• Can we extend this to greater time horizons?

• Deceptive move: maximise deviation from the move the adversary expects you to make, while minimising the distance to your own goal:

\[
\widehat{dm} = \arg\min_{m \in DM} \left[ w_D D^t_\mu(m) + w_U U^t_\mu(m) \right] \tag{3}
\]

where

\[
D^t_\mu(dm) = -\mathrm{dist}(dm, E^t_\mu), \tag{4}
\]

\[
U^t_\mu(dm) = \mathrm{dist}(dm, G^t_\mu) \tag{5}
\]
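
Equations (3) to (5) translate almost directly into code. In this sketch, moves, the expected move E and the goal G are all represented as 2-D points, and the default weights are arbitrary. Both choices are assumptions made for illustration.

```python
import math

def deceptive_move(candidates, expected_move, own_goal, w_d=1.0, w_u=1.0):
    """Arg min over candidate moves of w_D * D + w_U * U, as in eq. (3)."""
    def cost(m):
        deviation_term = -math.dist(m, expected_move)  # D, eq. (4)
        goal_term = math.dist(m, own_goal)             # U, eq. (5)
        return w_d * deviation_term + w_u * goal_term
    return min(candidates, key=cost)
```

Setting `w_u` to zero yields the move that deviates most from the adversary's expectation, while setting `w_d` to zero yields the purely goal-directed move. The weights trade one off against the other.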


Regret minimisation

• Well-studied game-theoretic concept

• Aim: learn and adapt to adversary’s strategic model

• Our approach: adjust weight distributions for intent templates and deceit online, based on difference between expected and actual moves
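
The talk's own algorithm is not reproduced in this transcript. As a stand-in, the sketch below uses Hedge (multiplicative weights), a standard no-regret scheme, with the per-round loss of each option assumed to be the distance between its expected move and the actual move. Class and attribute names are hypothetical.

```python
import math

class HedgeLearner:
    """Multiplicative-weights learner over a fixed set of options."""

    def __init__(self, options, learning_rate=1.0):
        self.weights = {o: 1.0 / len(options) for o in options}
        self.cumulative_loss = {o: 0.0 for o in options}
        self.expected_loss = 0.0
        self.learning_rate = learning_rate

    def observe(self, losses):
        """Update weights from a dict option -> loss for this round."""
        # Loss the current mixed strategy incurred in expectation.
        self.expected_loss += sum(self.weights[o] * losses[o]
                                  for o in self.weights)
        for o in self.weights:
            self.cumulative_loss[o] += losses[o]
        scaled = {o: self.weights[o]
                  * math.exp(-self.learning_rate * losses[o])
                  for o in self.weights}
        total = sum(scaled.values())
        self.weights = {o: v / total for o, v in scaled.items()}

    def regret(self):
        """Expected loss so far minus that of the best fixed option."""
        return self.expected_loss - min(self.cumulative_loss.values())
```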


Regret minimisation algorithm


Complete decision making algorithm

