Learning-Based Automatic Generation of Collision Avoidance Algorithms for Multiple Autonomous Mobile Robots

Yukiyoshi Fujita

Ichiro Suzuki

Satoshi Fujita

Hajime Asama

Masafumi Yamashita

Ami Shakked

236805 - Seminar in CS (Robotics) 2

Abstract

• This is a discussion of the automatic generation of a collision avoidance algorithm:
– An effective algorithm for two robots that simulates human trial and error
– Use of a reward function that is also learned by the robots
– Sole use of the sensors' output


Abstract (cont.)

• How a robot can use its gained "experience" in a more complex environment
• Use of a reduced state space
• Use of omni-directional robots
• Comparison of the theoretical results with the actual results


Introduction

• An autonomous multi-robot system is one in which:
– There is no fixed "leader"; each robot is driven only by its own design and data
– Each robot adjusts itself independently
• This is an advantage with respect to failures, scalability, communication overhead, etc.
• On the other hand, the design of algorithms is more difficult


Introduction (cont.)

• The robots discussed have eight sensors
• Each sensor can detect:
– A nearby object (robot or wall)
– The direction of the object's motion (one of eight)
– The speed of the object (one of three)
• This results in a state space of sensor outputs that consists of $(8 \cdot 3 + 2)^8$ states
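As a quick check of the size quoted above, the count can be evaluated directly. This is only an arithmetic sketch: the variable names are illustrative, and the slide does not say what the two extra readings per sensor stand for, so they are simply counted as two further values.

```python
# Per-sensor readings: 8 directions x 3 speeds, plus 2 further readings
# (the slide does not say what these two are; they are only counted here).
DIRECTIONS = 8
SPEEDS = 3
EXTRA_READINGS = 2
SENSORS = 8

readings_per_sensor = DIRECTIONS * SPEEDS + EXTRA_READINGS
state_count = readings_per_sensor ** SENSORS

print(readings_per_sensor)  # 26
print(state_count)          # 208827064576
```

With roughly 2 x 10^11 states, keeping one action table per raw sensor output is impractical, which is the motivation for state merging.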


Introduction (cont.)

• This motivates combined research on:
– A collision avoidance algorithm
– Automatic reduction of the number of states (by automatic state merging)


Introduction (cont.)

• A robot in an unknown environment repeatedly evaluates its performance
• Actions that were more successful in the past are more likely to be chosen
• We will investigate how these robots autonomously organize the state space and generate a collision avoidance algorithm based on the reduced state space


Introduction (cont.)

• We will examine a learning algorithm that simulates naive human trial and error, and see that it gives relatively good results
• All algorithm parameters are adjusted without any external intervention
• We will discuss how robots can use their experience in a more complicated environment (three robots)


Introduction (cont.)

• In addition to the theoretical discussion and simulations, we will also present physical experiments
• The results will show a very high probability of collision avoidance, especially for two robots
• The algorithm works reasonably well for the case of three robots


The Model of the Robots

• The omni-directional robots discussed have 8 infra-red sensors (transmitters and receivers) and can detect the position of robot $i$ at a relative movement angle $j$
• For convenience we will disregard the other detection possibilities (such as detecting a wall)


The Model of the Robots (cont.)

• Let $\Omega$ be the set of distinct outputs of the sensors, each $\omega \in \Omega$ being a vector of the individual sensor outputs
• A state space is a partition $Q$ of $\Omega$
• For each state $q \in Q$ we prepare an action table $S_q$ whose $k$-th element $S_q(k)$ is the probability that a robot in state $q$ will move in direction $k$, so that $\sum_{k=0}^{7} S_q(k) = 1$



• Each robot decides how to move according to its sensor output $\omega$, meaning that it moves in direction $k$ according to the distribution $S_q$ of its current state $q \in Q$
• The task of the robots is to autonomously build a partition $Q$ and an action table $S_q$ for each $q \in Q$
• From here on, $a_k$ denotes the action of moving in direction $k$
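Choosing a direction from an action table amounts to sampling from a discrete distribution over the eight directions. The following is a minimal sketch of that step; the function name `choose_direction` and the example table are hypothetical and not taken from the source.

```python
import random


def choose_direction(action_table, rng=random):
    """Draw a direction k with probability action_table[k]."""
    directions = range(len(action_table))
    return rng.choices(directions, weights=action_table, k=1)[0]


# Hypothetical action table for one state: direction 0 is strongly preferred.
example_table = [0.65, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05]

# A seeded generator keeps the example reproducible.
k = choose_direction(example_table, random.Random(0))
print(k)
```

Once a table has converged so that a single entry is 1.0, the draw becomes deterministic and the table reduces to a lookup.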



• An example view of the robots and how their positions are labeled:

[Figure: two robots, each with sensors numbered 0 to 7 around its body. Labels: Direction of motion; Robot A, S(1,6); Robot B, S(6,1).]


Collision Avoidance for Two Robots

Construction of Action Tables by Learning


Action Tables

• We start with the case of two robots
• The state $\omega = (i,j)$ denotes that sensor $i$ is facing sensor $j$ of the other robot, so $|\Omega_2| = 64$
• $Q_2 = \{\{\omega\} \mid \omega \in \Omega_2\}$ is a partition of $\Omega_2$
• $p_{ijk}$ is the value of the $k$-th element of the action table $S(i,j)$: the probability that a robot will take action $a_k$ when the sensor output is $(i,j)$


Action Tables (cont.)

• To create an unbiased system we initially assign $\forall k:\ p_{ijk} = 1/8$
• The influence of $a_k$ is evaluated by:
$$\text{reward} = \beta\left(\alpha\,(f_t - f_{t+1}) + (1 - \alpha)(d_{t+1} - d_t)\right)$$
• $f_t$: distance between the robot and the target at time $t$
• $d_t$: distance between the two robots at time $t$



• $0 \le \alpha \le 1$ and $\beta > 0$ will be determined by the robots
• The reward expresses the need to get as close as possible to the target without getting near another robot
• A robot that takes action $a_k$ from state $(i,j)$ updates the action table $S(i,j)$ by:
– $p_{ijk} = \max\{p_{ijk} + \text{reward},\ 0\}$, while $\sum_{k=0}^{7} S_q(k) = 1$ continues to hold
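The reward and the table update can be sketched as follows. The symbol names alpha (weight between 0 and 1) and beta (positive scale) follow the tuning slides, the default values are the ones used in the simulation, and the function names are hypothetical. The slides only require that the probabilities keep summing to 1; dividing by the sum after the update is an assumption about how that is achieved.

```python
def reward(f_t, f_next, d_t, d_next, alpha=0.5, beta=0.05):
    """beta * (alpha * (f_t - f_next) + (1 - alpha) * (d_next - d_t))."""
    return beta * (alpha * (f_t - f_next) + (1 - alpha) * (d_next - d_t))


def update_action_table(table, k, r):
    """Apply p_k <- max(p_k + r, 0), then rescale so the entries sum to 1.

    Rescaling by the sum is an assumption; the source only states that
    the sum must remain 1.
    """
    updated = list(table)
    updated[k] = max(updated[k] + r, 0.0)
    total = sum(updated)
    if total == 0.0:
        # Degenerate case not covered by the source: fall back to uniform.
        return [1.0 / len(updated)] * len(updated)
    return [p / total for p in updated]


# One step in which the robot gets 0.5 closer to its target
# and 0.5 farther from the other robot.
table = [1 / 8] * 8
r = reward(f_t=10.0, f_next=9.5, d_t=2.0, d_next=2.5)
table = update_action_table(table, k=0, r=r)
print(r, table[0])
```

A positive reward raises the probability of the action just taken relative to the others; a negative one lowers it, never below zero.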



• Simulation: set $\alpha = 0.5$, $\beta = 0.05$, $d_0 = 1.0$ and pick a state $(i,j)$
• Move the robots one step and update the action table
• Repeat this 64k times, so that each of the 64 states is updated about 1k times
• The vectors $S(i,j)$ converge to $p_{ijk} = 1.0$ for a single $k$ for most states $(i,j)$



• The following table shows $k$ for all $i$ and $j$
• An action in parentheses is the one with the highest probability in a state where convergence to a single action did not occur

i \ j    7    6    5    4    3    2    1    0
0        2    2    2    2    6    6   (6)   6
1        6    7    7    7    7    6    7    7
2        7    7    7    7    7    0    7    7
3        7    7    7    0    0    0    7    0
4        0    7    0    0    0    0    0    0
5        0    0    0    0    1    1    1    1
6        1    1    1    1    1    1    1   (1)
7        1    2    1    2    1    1    1    1



• Let us test the algorithm's performance (in a simulation):
– Each robot is a disc of radius 1.0
– A sensor can sense up to a distance of 2.0
– A robot moves in steps of 0.5
– The initial distance between the robots is 2.0
– The target for each robot is at a distance of 10.0 in direction 0



• CASE (i,j) denotes an experiment in which the initial state of one of the robots is $(i,j)$
• Each robot moves according to the action selection table unless it can move directly towards its target


Action Tables (cont.)

• The results of the simulation show success in all 64 cases
• Below are the more difficult cases:
– CASE(0,0), CASE(1,6), CASE(1,7), CASE(2,7)


Action Tables (cont.)

• For comparison we will simulate a heuristic algorithm in which the robot chooses the first free direction (0, 1, ..., 7)
• There is no difference in performance between the two
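The baseline heuristic is simple enough to sketch in a few lines. How a direction is judged "free" is not given on the slide, so this sketch assumes the caller supplies the set of currently blocked directions; the function name is hypothetical.

```python
def first_free_direction(blocked, n_directions=8):
    """Return the lowest-numbered direction not in `blocked`, or None if all are blocked."""
    for k in range(n_directions):
        if k not in blocked:
            return k
    return None


print(first_free_direction({0, 1}))  # 2
print(first_free_direction(set()))   # 0
```

Because direction 0 points at the target, the heuristic heads straight for the target whenever that direction is free.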


Tuning $\alpha$ and $\beta$

• $\alpha$, which is used to update the probability table, determines the robots' collision avoidance policy:
– A greater $\alpha$: move forward in direction 0 (less avoidance)
– A smaller $\alpha$: stronger avoidance


Tuning $\alpha$ and $\beta$ (cont.)

• $\beta$ determines the "strength" of the last experience:
– A larger $\beta$: stronger consideration of the last experience
– A smaller $\beta$: a slower learning process
• An ideal learning process should need no human assistance


Tuning $\alpha$ and $\beta$ (cont.)

• We use the following tuning process (requiring that the robots reach their target within 30 steps without a collision), starting with $\alpha = 1.0$ (and a fixed value of $\beta$):
– With the current value of $\alpha$, build the 64 action tables $S(i,j)$ from the previous chapter in 30k updates for random states $(i,j)$
– Evaluate the algorithm for CASE(0,0) to CASE(7,7), changing $\alpha$ until the robots reach their target in 30 steps or less


Tuning $\alpha$ and $\beta$ (cont.)

• The rules for changing $\alpha$:
– If a collision occurs in one of the 64 cases, decrease $\alpha$ by $\delta$
– If no collision occurs in any of the 64 cases but the robots can't reach their target in 30 steps, increase $\alpha$ by $\delta$
• $\delta$ begins at 0.1 and is halved every time $\alpha$ returns to its previous value
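The tuning rules can be sketched as a small search loop. Everything named here is hypothetical: `evaluate` stands in for "rebuild the 64 action tables and run all 64 cases" and is assumed to return "collision", "timeout" or "ok". The halving rule is read as "halve the step when the next move would bring the weight back to the value it had one step earlier", which is an interpretation of the slide rather than a confirmed detail.

```python
def tune_alpha(evaluate, alpha=1.0, delta=0.1, max_rounds=100):
    """Adjust alpha until evaluate(alpha) reports "ok" or max_rounds is reached."""
    previous = None  # value of alpha before the most recent change
    for _ in range(max_rounds):
        outcome = evaluate(alpha)
        if outcome == "ok":
            return alpha
        # Collision: avoid more strongly (smaller alpha). Timeout: the opposite.
        sign = -1.0 if outcome == "collision" else 1.0
        candidate = alpha + sign * delta
        if previous is not None and abs(candidate - previous) < 1e-9:
            # alpha would bounce back to its previous value: halve the step.
            delta /= 2
            candidate = alpha + sign * delta
        previous, alpha = alpha, min(1.0, max(0.0, candidate))
    return alpha


def toy_evaluate(alpha):
    """Stand-in for the real evaluation: only a narrow band of alpha succeeds."""
    if alpha > 0.47:
        return "collision"
    if alpha < 0.43:
        return "timeout"
    return "ok"


print(tune_alpha(toy_evaluate))
```

In the toy run the weight steps down from 1.0, overshoots the acceptable band, and lands inside it after a single halving of the step.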


Tuning $\alpha$ and $\beta$ (cont.)

• Figure 3 shows the results of this experiment
• $\alpha$ eventually stabilizes at 0.4


Tuning $\alpha$ and $\beta$ (cont.)

• Assumption:
– The robots "want" to create the set of 64 action tables $S(i,j)$ within 20k to 30k updates
• We start with $\beta = 1.0$ (and a fixed value of $\alpha$)
• If more than 30k updates occur, $\beta = \beta/2$, and if fewer than 20k updates occur, $\beta = 2\beta$
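The halving and doubling rule can be sketched in the same style. The callback `updates_needed` is hypothetical: it stands in for running the learning process with a given value and counting the updates until the 64 action tables are built. The toy stand-in below is invented purely so that the loop can run.

```python
def tune_beta(updates_needed, beta=1.0, low=20_000, high=30_000, max_rounds=50):
    """Halve beta above `high` updates, double it below `low`, stop in between."""
    for _ in range(max_rounds):
        n = updates_needed(beta)
        if n > high:
            beta /= 2
        elif n < low:
            beta *= 2
        else:
            break
    return beta


def toy_updates_needed(beta):
    """Invented stand-in: only beta values between 0.1 and 0.3 hit the target window."""
    if beta > 0.3:
        return 40_000
    if beta < 0.1:
        return 10_000
    return 25_000


print(tune_beta(toy_updates_needed))  # 0.25
```

The `max_rounds` cap is an added safeguard against oscillating between two values; the slide does not mention one.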


Automatic state space creation

• Reminder:
– $Q_2 = \{\{\omega\} \mid \omega \in \Omega_2\}$ is a state space
– $S(i,j)$ and the action tables (slide 19) are built
– $Q_2^{*}$ can be created by merging two adjacent states with the same action; $|Q_2^{*}| = 24$
• The algorithm based on $Q_2^{*}$ has the same performance as the original
• $Q_2^{*}$ can be built automatically at the end of the learning process of the action tables
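The merging step can be sketched on the two-robot action selection table. Two assumptions are made here: "adjacent" is read as circularly neighbouring values of j for the same i, and entries that did not converge (shown in parentheses) are kept apart from converged ones. Under this reading the table yields 24 merged states, which agrees with the count on the slide. The function names are hypothetical.

```python
# Rows i = 0..7 of the two-robot action selection table, columns j = 7..0 as printed.
# An entry such as "(6)" marks a state whose table did not converge.
TABLE_ROWS = [
    "2 2 2 2 6 6 (6) 6",
    "6 7 7 7 7 6 7 7",
    "7 7 7 7 7 0 7 7",
    "7 7 7 0 0 0 7 0",
    "0 7 0 0 0 0 0 0",
    "0 0 0 0 1 1 1 1",
    "1 1 1 1 1 1 1 (1)",
    "1 2 1 2 1 1 1 1",
]


def merge_row(entries):
    """Group circularly adjacent j values that carry the same entry."""
    n = len(entries)
    # Start at a boundary so that a run wrapping from j=7 to j=0 stays in one piece.
    start = next((j for j in range(n) if entries[j] != entries[j - 1]), 0)
    groups = []
    for offset in range(n):
        j = (start + offset) % n
        if groups and entries[j] == entries[groups[-1][-1]]:
            groups[-1].append(j)
        else:
            groups.append([j])
    return groups


def merged_states(rows):
    """Return (i, list of merged j values, action) for every merged state."""
    states = []
    for i, row in enumerate(rows):
        entries = row.split()[::-1]  # reverse so that the list index equals j
        for js in merge_row(entries):
            states.append((i, js, entries[js[0]]))
    return states


states = merged_states(TABLE_ROWS)
print(len(states))  # 24
```

For example, row i = 7 collapses into four states: j in {7, 0, 1, 2, 3} with action 1, and j = 4, 5, 6 as single states.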


Collision avoidance for three robots

• A similar approach can be used for a more complex environment, building on the results from the simpler environment
• We will compare the method from the previous chapters with a simpler learning method


Direct learning algorithm

• For three robots, a robot's sensor output is $\omega = ((i_1,j_1),(i_2,j_2))$
• $(i_k,j_k)$, where $k = 1,2$, means that sensor $i_k$ is facing sensor $j_k$ of the $k$-th visible robot; $(i_2,j_2)$ is undefined if only one robot is visible
• Assume $Q_3$ is a partition of $\Omega_3$ and that $Q_3 = \{\{\omega\} \mid \omega \in \Omega_3\}$, and write $(i_1,j_1,i_2,j_2)$ instead of $((i_1,j_1),(i_2,j_2))$
• We will concentrate on cases with two robots in sight, since in the case of one we can adopt the previous action tables


Direct learning algorithm (cont.)

• We choose a state $((i_1,j_1),(i_2,j_2))$ and update $S(i_1,j_1,i_2,j_2)$ after a single step, using the previously described reward
• The process is repeated 1,792k times (each table is updated ~1k times), with $\alpha = 0.5$, $\beta = 0.05$
• From the results we can deduce an action selection table similar to the one we saw


Direct learning algorithm (cont.)

• CASE (0,0,1,0), CASE (0,1,1,6), CASE (1,0,7,0)
• The second figure compares the first (heuristic) algorithm with the learning-based one we just saw
• We clearly see that the latter performs better: the heuristic can't handle some of the cases well


Reduced state learning for 3 robots

• We adopt $Q_2^{*}$ from our previous discussion and turn it into a reduced state space for three robots
• We get 300 states instead of 1792 states (including the states in which a single robot is visible)
• Repeat the learning process as discussed in the last three slides, with the reduced state space
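One reading that reproduces the figure of 300 is sketched below. It is an assumption rather than something the slide states: the 24 merged two-robot states cover the case where one robot is visible, and each unordered pair of distinct merged states gives one state for the case where two robots are visible.

```python
from math import comb

reduced_two_robot_states = 24                           # size of the merged state space
one_robot_visible = reduced_two_robot_states            # reuse the merged states directly
two_robots_visible = comb(reduced_two_robot_states, 2)  # unordered pairs of distinct states

total = one_robot_visible + two_robots_visible
print(two_robots_visible)  # 276
print(total)               # 300
```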


Reduced state learning for 3 robots (cont.)

• The table shows the actions whose probability converged to some value
• Parentheses mark the action with the highest probability where no convergence took place

class   i1   j1    i2   j2    action
1       0    0     -    -     6
2       0    1     -    -     (6)
3       0    2-3   -    -     6
4       0    4-7   -    -     2
...
21      7    7-3   -    -     1
22      7    4     -    -     2
23      7    5     -    -     1
24      7    6     -    -     2
25      0    0     0    1     (6)
26      0    0     0    2-3   (2)
27      0    0     0    4-7   2
28      0    0     1    0-1   6
...
295     7    7-3   7    4     1
296     7    7-3   7    5     (1)
297     7    7-3   7    6     1
298     7    4     7    5     2
299     7    4     7    6     1
300     7    5     7    6     (0)


Reduced state learning for 3 robots (cont.)

• As we can see, the reduced state space has an advantage
• We believe the advantage arises because a single update in the reduced state space can be equivalent to many updates in the original one

Experiment with Physical Robots

• We installed the obtained algorithms on the omni-directional robots
• Two robots:
– 10 experiments for each of CASE(0,0), CASE(1,7) and CASE(1,6)
– 8, 6 and 7 avoidances, respectively
• Three robots:
– The results are not shown here; the robots avoided collisions in 5 out of 10 experiments, but in some cases the algorithm didn't perform well when compared to the reduced-state algorithm (time-wise)

Experiment with Physical Robots (cont.)

• We attribute the differences to the following:
– Non-discrete movement of the robots
– Non-synchronized movement of the robots

Conclusions

• The robots built a collision avoidance algorithm without any intervention
• We demonstrated how good algorithms can be used for a more complex environment containing more than three robots
• Most of the time the resulting algorithm gives good results
• We didn't discuss the memory demands


Future Study

• A more complex state space
• Copying methods from a simple to a complex environment
• Improving the simulation model
