Learning and imitation in heterogeneous robot groups

Wilhelm Richert [email protected], Fakultät für Elektrotechnik, Informatik und Mathematik, Universität Paderborn, 22 December 2009

description

As robots become more affordable, they are used in ever more diverse areas to perform increasingly complex tasks. These tasks are typically preprogrammed by a human expert. In some cases, however, this is not feasible, either because of the inherent complexity of the task or because of the dynamics of the environment. The only option then is to let the robot learn the task by itself. This usually involves a long training period in which the robot experiments with its surroundings to learn the desired behavior. If robots in a group have to learn a shared goal, they should imitate each other to reduce their individual learning time. This thesis considers how that can be done: how robots in a group can learn to achieve their shared goal and imitate each other, spreading learned knowledge through the group to increase both performance and learning speed.

To allow for this intertwined learning and imitation, a dedicated robot architecture has been developed. On the one hand, it fosters autonomous and self-exploratory learning. On the other hand, it allows the learned knowledge and behavior to be modified to account for new knowledge gathered through imitation. Behavior is learned separately at two levels of abstraction. At the higher level, the strategy is learned as a mapping from abstract states to symbolic actions. At the lower level, the symbolic actions are grounded autonomously by learned low-level actions.

The imitation approaches presented in this thesis are unique in that they relax the requirements that have governed multi-robot imitation so far. They enable robots in a group to imitate each other in a non-obtrusive manner. The robots can thus increase their learning speed, and thereby the overall performance of the group, simply by observing the other group members, without requiring them to follow a communication protocol that would provide the necessary information. With the presented approach, a robot is able to infer the behavior that an observed demonstrator is performing and to replay the beneficial behavior with its own capabilities.

In addition, the presented approaches allow robots to apply imitation even if the group is heterogeneous. Normally, the performance of a group degrades if robots with incompatible capabilities imitate each other. Capability differences arise when robot morphologies differ within a group, for example when robots from different manufacturers form a group that has to achieve shared goals. This thesis presents an approach that determines similarities and differences between robots. It can guide the robots in a heterogeneous group to choose for imitation those robots that are most similar to themselves.

Transcript of Learning and imitation in heterogeneous robot groups

Page 1: Learning and imitation in heterogeneous robot groups

Introduction Architecture Imitation in robot groups Conclusion

Learning and imitation in heterogeneous robot groups

Wilhelm Richert [email protected]

Fakultät für Elektrotechnik, Informatik und Mathematik, Universität Paderborn

22 December 2009

Learning and imitation in heterogeneous robot groups 1 / 58

Page 2: Learning and imitation in heterogeneous robot groups

Motivation
Why do we need learning and imitation?

State of the art

- Off-line learning (mostly population-based)
- Behavior is fixed afterwards

Swarmanoid [Dorigo et al., 2006], Symbrion [Baele et al., 2009]

Desired

- On-line learning to react intelligently to unforeseeable events and problems
- Means to benefit from the "redundancy" in group behavior
- Robustness to arbitrary robot groups

Page 4: Learning and imitation in heterogeneous robot groups

The five big challenges in imitation [Dautenhahn and Nehaniv, 2002]

Five big challenges governing successful imitation in multi-robot systems:

- whom: heterogeneous robot groups
- when: concentrate on salient behavior
- what: the results, the actions, or the hidden goals of the imitatee?
- how: correspondence problem
- how to evaluate: What should be counted as successful imitation?

Page 5: Learning and imitation in heterogeneous robot groups

Thesis objectives

Robots in a group shall be able to

1. combine learning with imitation,
2. recognize and learn observed behavior non-obtrusively, and
3. choose potential imitatees wisely, also in heterogeneous robot groups.

Page 6: Learning and imitation in heterogeneous robot groups

Robot architecture

[Figure: three-layer robot architecture consisting of motivation layer, strategy layer, and skill layer. Labels: current motivation, request, result, perception, action, choice of the imitatee, imitation.]

Page 15: Learning and imitation in heterogeneous robot groups

Strategy layer

- Inspired by AMPS [Kochenderfer, 2006]

[Figure: processing pipeline of the strategy layer]

1. raw perception and motivation: $I$, $\mu_i$
2. perception filtering: $o_t \in I_s$
3. experience: $\langle (o, a, d, \mu_i, f)_{t-N}, \ldots, (o, a, d, \mu_i, f)_t \rangle$
4. abstraction: $s = \xi(o)$, maintained by heuristics
5. model: $T$, $R$, $\gamma$
6. reinforcement learning
7. policy: $\pi$
8. action selection: $a = \pi(s) \in A$

Page 16: Learning and imitation in heterogeneous robot groups

Strategy layer

- The state abstraction function $\xi$ may use any abstraction method that supports
  - insertion of new state observations,
  - deletion of old state observations, and
  - querying the state observation most similar to a new state observation.
- The experiments use nearest neighbor.

[Figure: state observations ("raw states") grouped into regions ("abstract states").]
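The three operations required of the abstraction method can be sketched with a brute-force nearest-neighbor store. This is only an illustration, not the implementation used in the thesis: the class name `NearestNeighborAbstraction`, the use of Euclidean distance, and the representation of observations as tuples of floats are all assumptions.

```python
import math


class NearestNeighborAbstraction:
    """Hypothetical sketch: maps raw state observations to regions."""

    def __init__(self):
        # raw observation (tuple of floats) -> region ("abstract state")
        self._regions = {}

    def insert(self, observation, region):
        """Insert a new state observation and the region it belongs to."""
        self._regions[tuple(observation)] = region

    def delete(self, observation):
        """Delete an old state observation (no-op if it is unknown)."""
        self._regions.pop(tuple(observation), None)

    def nearest(self, observation):
        """Return the stored observation most similar to the new one."""
        if not self._regions:
            raise ValueError("no state observations stored")
        # Assumption: similarity is Euclidean distance.
        return min(self._regions, key=lambda p: math.dist(p, observation))

    def xi(self, observation):
        """Abstraction function: region of the nearest stored observation."""
        return self._regions[self.nearest(observation)]
```

A linear scan is enough for a sketch; a spatial index would be the natural replacement once the number of stored observations grows.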

Page 17: Learning and imitation in heterogeneous robot groups

Strategy layer

- Heuristics maintain the models so that the same action "feels" similar in all observations of the same state.
- Heuristics may split or merge regions: transition, failure, reward, simplification, experience.
- Example: transition heuristic

[Figure: state observations ("raw states") grouped into regions ("abstract states").]

Page 19: Learning and imitation in heterogeneous robot groups

Building a policy

- Reinforcement learning with an SMDP:

  $Q(s, a) = R(s, a) + \sum_{s' \in S} P(s' \mid s, a)\, \gamma(s, a, s')\, V^{\pi}(s')$

- Determine the current best policy:

  $V^{\pi}(s) = \max_{a \in A} Q(s, a)$

  $\pi(s) = \arg\max_{a \in A} Q(s, a)$
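The equations above can be turned into a small value-iteration sketch. It assumes a tabular model stored in plain dictionaries and a fixed number of sweeps; the function name `solve_smdp` and the data layout are hypothetical and not taken from the thesis.

```python
def solve_smdp(states, actions, R, P, gamma, sweeps=100):
    """Iterate Q(s,a) = R(s,a) + sum_s' P(s'|s,a) * gamma(s,a,s') * V(s').

    R:     dict (s, a) -> expected reward
    P:     dict (s, a) -> dict s' -> transition probability
    gamma: dict (s, a, s') -> discount for that transition
    """
    V = {s: 0.0 for s in states}
    Q = {}
    for _ in range(sweeps):
        for s in states:
            for a in actions:
                Q[(s, a)] = R[(s, a)] + sum(
                    p * gamma[(s, a, s2)] * V[s2]
                    for s2, p in P[(s, a)].items()
                )
        # V(s) = max_a Q(s, a)
        V = {s: max(Q[(s, a)] for a in actions) for s in states}
    # pi(s) = argmax_a Q(s, a)
    policy = {s: max(actions, key=lambda a: Q[(s, a)]) for s in states}
    return Q, V, policy
```

The discount is indexed by the full transition because, in a semi-Markov model, actions can take different amounts of time.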

Page 20: Learning and imitation in heterogeneous robot groups

Strategy layer

- The strategy layer requests symbolic actions.
- Execution of these actions is up to the skill layer.

[Figure: the three-layer architecture next to the strategy layer pipeline, from raw perception and motivation to action selection $a = \pi(s) \in A$.]

Page 21: Learning and imitation in heterogeneous robot groups

Skill layer

Tasks

1. discover and learn a set of skills that are useful to the strategy layer, i.e. ground the symbols in $A$
2. execute them when requested and optimize at runtime

Skill

- A skill is $s = (f_e^1, \ldots, f_e^N)$, where
- an error function $f_e : I_a \times I_a \to \mathbb{R}^{+}$ assigns an error value to a pair of perceptions $(I(t_i), I(t_j))$.

Example: "approach the ball and orient towards it"

- $f_e^1(I(t_i), I(t_j)) = d_{ball}(I(t_j))$: minimize the ball distance
- $f_e^2(I(t_i), I(t_j)) = |\alpha_{ball}(I(t_j))|$: minimize the ball angle
- $s = (f_e^1, f_e^2)$: approach the ball and orient towards it

Page 22: Learning and imitation in heterogeneous robot groups

Skill layer
Measuring a skill's progress

- The progress function $f_p : I_a \times I_a \to [0, 1]$ measures a skill's progress.
- For a skill $s = (f_e^1, \ldots, f_e^N)$ it is defined as

$$
f_p(I(t_i), I(t_j)) =
\begin{cases}
0 & \text{if } C_a \le W(I(t_i), I(t_j)) \\
\dfrac{C_a - W(I(t_i), I(t_j))}{C_a - C_s} & \text{if } C_s < W(I(t_i), I(t_j)) < C_a \\
1 & \text{if } W(I(t_i), I(t_j)) \le C_s
\end{cases}
$$

$f_e^i$: error function, $I(t_i)$: perception when the skill was started, $I(t_j)$: current perception, success and abort thresholds $C_s \in \mathbb{R}^{+}$ and $C_a \in \mathbb{R}^{+}$ ($C_s < C_a$)

- $W(I(t_i), I(t_j)) = \sum_{k=1}^{N} f_e^k(I(t_i), I(t_j))$
- Example graph: $C_s = 0.15$, $C_a = 0.75$
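The skill definition and the progress function can be written down directly. The following sketch assumes that a perception is a dictionary with the hypothetical keys `ball_dist` and `ball_angle`; the function names are illustrative and not from the thesis.

```python
def total_error(skill, start, current):
    """W: sum of all error functions of the skill for a perception pair."""
    return sum(f_e(start, current) for f_e in skill)


def progress(skill, start, current, c_s, c_a):
    """f_p: 0 at or above the abort threshold, 1 at or below the success
    threshold, and linear in between."""
    if not 0 <= c_s < c_a:
        raise ValueError("thresholds must satisfy 0 <= c_s < c_a")
    w = total_error(skill, start, current)
    if w >= c_a:
        return 0.0
    if w <= c_s:
        return 1.0
    return (c_a - w) / (c_a - c_s)


# Hypothetical error functions for "approach the ball and orient towards it".
def ball_distance(start, current):
    return current["ball_dist"]


def ball_angle(start, current):
    return abs(current["ball_angle"])


approach_and_orient = (ball_distance, ball_angle)
```

With the thresholds from the example graph (0.15 and 0.75), a perception with ball distance 0.3 and ball angle -0.15 yields a total error of 0.45 and therefore a progress of about 0.5.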

Page 23: Learning and imitation in heterogeneous robot groups

Imitation
Overview of the approach

- Robots observe each other permanently.
- A moving window of observations and well-being states is kept for each observed robot.
- The imitation process starts when a well-being improvement is detected.

[Figure: the three-layer architecture next to the imitation pipeline]

1. observed episode: $\langle (o^I_1, e^I_1), \ldots, (o^I_N, e^I_N) \rangle$
2. transform observations
3. subjective observation data: $\langle (o^D_1, e_1), \ldots, (o^D_N, e_N) \rangle$
4. interpret behavior
5. recognized episodes: $\langle \ldots, ((t, o_D, s), a_t, (t', o'_D, s')), \ldots \rangle$
6. estimate rewards
7. observed interpreted experience: $\langle \ldots, ((t, o_D, s), a_t, r_t, (t', o'_D, s')), \ldots \rangle$
8. integrate into experience, update SMDP

Page 24: Learning and imitation in heterogeneous robot groups

Imitation
HMM and the Viterbi connection [Viterbi, 1967]

[Figure: hidden Markov model with states $s_a$, $s_b$, $s_c$ and observations $o_x$, $o_y$, $o_z$; transition probabilities $P(s_b \mid s_a)$, $P(s_c \mid s_a)$ and emission probabilities $P(o_x \mid s_a)$, $P(o_y \mid s_a)$, $P(o_z \mid s_a)$.]

$o_1 o_2 \ldots o_T \rightarrow \text{Viterbi} \rightarrow s_1 s_2 \ldots s_T$

$V(s, t) = P(o_t \mid s_t = s)\, \max_{s'} \left[ P(s_t = s \mid s_{t-1} = s')\, V(s', t-1) \right]$
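For reference, the recursion can be sketched as a textbook Viterbi decoder. The initial state distribution `start_p` does not appear on the slide and is an assumption of this sketch, as is the dictionary-based model layout.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely state sequence for an observation sequence.

    start_p[s]    : probability of starting in s (assumption, not on the slide)
    trans_p[s'][s]: P(s_t = s | s_{t-1} = s')
    emit_p[s][o]  : P(o_t = o | s_t = s)
    """
    # V[t][s] is the probability of the best path that ends in s at time t.
    V = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    back = []
    for o in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda sp: V[-1][sp] * trans_p[sp][s])
            scores[s] = emit_p[s][o] * V[-1][prev] * trans_p[prev][s]
            pointers[s] = prev
        V.append(scores)
        back.append(pointers)
    # Backtrack from the best final state.
    path = [max(states, key=lambda s: V[-1][s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path
```

For long sequences the products underflow, so a practical version would work with log probabilities.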

Page 32: Learning and imitation in heterogeneous robot groups

Imitation
Interpreting observed behavior with the imitator's own knowledge

Knowledge in strategy layer

[Figure: states $s_0$, $s_1$, $s_2$ with transitions $T(s_0, a_i, s_1)$ and $T(s_0, a_i, s_2)$ for the actions $a_0$, $a_1$, $a_2$.]

- The imitator's own transition probabilities are used instead of "foreign" HMM transition probabilities.

Knowledge in skill layer

Perceptual changes $\Delta o$ associated with each skill, with components (ball dist, goal dist, ball height):

- $a_0$ (approach ball): $\Delta o_0 = (-1, -0.4, 0)$
- $a_1$ (approach goal): $\Delta o_1 = (-0.2, -1, 0)$
- $a_2$ (lift ball): $\Delta o_2 = (0, 0, 0.3)$

[Figure: emission probabilities $P(\Delta o_2 \mid a_0)$, $P(\Delta o_2 \mid a_1)$, $P(\Delta o_2 \mid a_2)$.]

- Skills vote on perceptual changes via $f_p^a$, plus the following heuristics ...

Page 33: Learning and imitation in heterogeneous robot groups

Recognition

1. Recognize observation changes $o_{t-1} \rightarrow o_t$
   a) Prefer nearer goals (ambiguous situation: the robot might drive either to the red or to the yellow goal base)
   b) Ignore skills that "seem to have finished"
   c) Clip votes to $[0, 1]$

$$
P_a(o_t \mid o_{t-1}) =
\begin{cases}
\min\left( \max\left( \dfrac{f_p^a(o_t) - f_p^a(o_{t-1})}{1 - f_p^a(o_t)},\, 0 \right),\, 1 \right), & 1 - f_p^a(o_t) > \epsilon \\
0, & \text{otherwise}
\end{cases}
$$

2. Recognize actions in the sequence $o_{t_1}^{t_2} = o_{t_1} o_{t_1 + \Delta} \ldots o_{t_2}$

$$
a_{ml} = \arg\max_a \frac{\sum_{t = t_1}^{t_2} P_a(o_t \mid o_{t-1})}{t_2 - t_1}
$$

3. Recognize state transitions

$$
P(s_{t_2} \mid s_{t_1}) = T(s_{t_1}, a_{ml}, s_{t_2})
$$
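Steps 1 and 2 can be sketched as follows. The sketch makes two assumptions that are not confirmed by the slide: the guard is read as "the remaining progress exceeds ε" (so that skills that seem to have finished contribute no vote and the division stays well defined), and the average is taken over consecutive pairs of a precomputed progress trace. The names `vote` and `most_likely_action` are hypothetical.

```python
def vote(progress_now, progress_before, eps=1e-3):
    """Vote of one skill for the change o_{t-1} -> o_t, clipped to [0, 1].

    progress_now and progress_before are the skill's progress values
    f_p^a(o_t) and f_p^a(o_{t-1}).
    """
    remaining = 1.0 - progress_now
    if remaining <= eps:
        # Assumption: a skill that seems to have finished is ignored.
        return 0.0
    # Dividing by the remaining progress favors skills close to their goal.
    gain = (progress_now - progress_before) / remaining
    return min(max(gain, 0.0), 1.0)


def most_likely_action(progress_traces):
    """Return the action with the highest mean vote.

    progress_traces maps each action to its progress values over the
    observed window; every trace needs at least two entries.
    """
    def mean_vote(trace):
        votes = [vote(now, before) for before, now in zip(trace, trace[1:])]
        return sum(votes) / len(votes)

    return max(progress_traces, key=lambda a: mean_vote(progress_traces[a]))
```

The recognized action would then be used to look up the state transition probability in the imitator's own model, as in step 3.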

Page 37: Learning and imitation in heterogeneous robot groups

Evaluation
Recognition scenario: description

- The demonstrator (right robot) has to transport the yellow ball onto the base.
- The imitator (left robot) tries to "understand" its observations.
- Two scenarios:
  1. The imitator is only able to drive (and thereby push the ball).
  2. The imitator is also able to lift the ball.

[Figure: demonstrator and imitator robots in the lifting scenario.]

Page 38: Learning and imitation in heterogeneous robot groups

Evaluation
Recognition scenario: results

1. Without lifting capabilities

[Figure: distance [m] over time, with segments labeled "move to ball", "???", and "move to goal".]

- Recognized "drive to ball" (B) and "drive to goal" (G) correctly
- Detected "missing behavior" in between

2. With lifting capabilities

[Figure: distance [m] over time, with segments labeled "move to ball", "lift the ball", and "move to goal".]

- Recognized "drive to ball" (B), "lift the ball" (L), and "drive to goal" (G) correctly

Page 40: Learning and imitation in heterogeneous robot groups

Evaluation
Multi-robot scenario "three bases"

- Task: transport objects to goal bases
- Reward for reaching an object: 10
- Goal bases provide different rewards
- The state space consists of
  - distance to the closest object
  - distance of the closest object to the closest goal
  - ID of the closest goal

Page 41: Learning and imitation in heterogeneous robot groups

Conclusion
Objectives achieved in this thesis

1. Combination of learning and imitation
2. Non-obtrusive recognition and learning of observed behavior
3. Support for heterogeneous robot groups

Thank you for your attention!

Page 43: Learning and imitation in heterogeneous robot groups

- Architecture
  - State of the art
  - Overview
  - Layer interaction
- Motivation layer
  - Excitation
  - Prioritizing goals
- Strategy layer
  - State abstraction
  - Heuristics
  - Policy
  - Sample frequency
  - Strategy example
- Skill layer
  - Overview of the approach: explore, exploit
  - Skill manager
  - Model manager
  - Error minimizer
  - Configuration
  - Skill example
- Imitation in robot groups
  - Overview of the approach
- Recognizing behavior
  - Viterbi
  - Interpreting observed behavior
  - Recognition example
- Integrating recognized behavior
- Evaluation
  - CTF with three bases
  - Performance
  - State abstraction
  - Group homogeneity
  - CTF with five bases
  - Performance
  - State abstraction
  - Group homogeneity
- Choice of the imitatee
  - Affordance detection
  - Affordance network generation
  - Comparing ANs
  - Choice of the imitatee
  - Evaluation
  - Parameterization of the environment
  - Robustness experiment
  - Clustering experiment

Page 44: Learning and imitation in heterogeneous robot groups

State of the art

[Takahashi et al., 2008] use imitation to learn robotic soccer behaviors (approaching, shooting a ball)

- combines learning with imitation
- requires the robot group to stop whenever a robot imitates
- needs multiple presentations of the same behavior
- needs sufficient prior knowledge of the task to imitate

[Priesterjahn, 2008] evolves game bots with performance similar to that of the human player

- shows that imitation-based adaptation is able to outperform the purely evolutionary approach
- targeted at computer game scenarios, not stochastic real-world applications
- assumes group homogeneity

[Figure: The Rule-Based Operation Cycle of an Agent]

[Inamura et al., 2003] combine top-down teaching with bottom-up learning from the robot's side

- exclusive approach (cannot be combined with other learning techniques)
- the HMM is learned and then fixed throughout the robot's lifetime

[Figure: Motion capturing system: motion for learning data]

[Figure: A result of motion generation on a humanoid robot]

Page 47: Learning and imitation in heterogeneous robot groups

Layer interaction

(1) Strategy step is triggered

- The current motivation and the corresponding next strategy action are determined.
- The strategy layer requires the most current motivation as feedback regarding its last chosen action, so both are synchronous.

(2) Skill step is triggered

- The strategy step does not have to be finished yet.
- The skill layer simply executes according to the action most recently delivered by the strategy layer.

(3), (4) Strategy step has finished

- It signals the next action to execute to the skill layer.
- Subsequent skill steps then perform this action accordingly.

[Figure: sequence diagram with the participants clock, motivation layer, strategy layer, skill layer, perception, and action. (1) Next strategy step event: request $I_m$, processed perception, set next motivation, request $I_s$, processed perception, determine next strategy step, set next skill. (2) Next skill step event: request $I_a$, processed perception, calculate best actuator command, set next low-level action. (3) Next skill step event with the same sequence as (2). (4) Next skill step event.]

Page 48: Learning and imitation in heterogeneous robot groups

Introduction Architecture Imitation in robot groups Conclusion

Motivation layer: Motivation system example

- The current motivation $\mu$ is the vector to the current drive state, dependent on
  - time
  - perception
- Each drive measures the status of accomplishing a sub-goal (0 = fully accomplished).
- A drive $i$ is called satisfied (goal achieved) if the corresponding motivation is below its threshold: $\mu_i < \mu_i^\theta$

[Figure: drive space spanned by drive 1, drive 2, and drive 3, showing the well-being region, the current drive state, the current motivation, and $\mu_p$, the shortest vector to the desired drive area, which is used for prioritization.]


A sub-goal subjected to an excitation

[Figure: excitation of a drive over time $t$ on a scale from 0 to 1, showing the well-being region and the threshold that triggers behavior.]

- Excitation describes the force to which the current drive state is subjected.
- By specifying it dependent on the perception and on the internal state of the robot, the user is "programming" the final behavior.


Prioritizing goals

- At each time step, the motivation layer provides the current motivation vector to the strategy layer.
- With $\mu_p$ the strategy layer prioritizes which of the sub-goals are to be handled first:

\[
\mu_p = \begin{pmatrix}
\max(0, \mu_1 - \mu_1^\theta) \\
\max(0, \mu_2 - \mu_2^\theta) \\
\vdots \\
\max(0, \mu_n - \mu_n^\theta)
\end{pmatrix}
\]

- Different drives can be prioritized by means of an according scaling ⇒ modeling a hierarchy of needs
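The prioritization vector above can be sketched in a few lines. This is a minimal illustration, not the implementation behind the slides: the function names and the optional `weights` argument (one possible way to realize the "according scaling") are assumptions.

```python
def priority_vector(motivation, thresholds, weights=None):
    """Return mu_p: per-drive excess over the threshold, clipped at 0.

    `weights` is a hypothetical per-drive scaling used to rank drives
    against each other; the slides only say that a scaling exists.
    """
    if weights is None:
        weights = [1.0] * len(motivation)
    return [w * max(0.0, mu - theta)
            for mu, theta, w in zip(motivation, thresholds, weights)]


def most_urgent_drive(mu_p):
    """Index of the drive with the largest excess, or None if all are satisfied."""
    if not any(value > 0 for value in mu_p):
        return None
    return max(range(len(mu_p)), key=lambda i: mu_p[i])


mu = [0.2, 0.9, 0.5]       # current motivation, one entry per drive
theta = [0.3, 0.4, 0.5]    # per-drive thresholds
mu_p = priority_vector(mu, theta)
urgent = most_urgent_drive(mu_p)
```

Here only the second drive exceeds its threshold, so it is the one the strategy layer would handle first.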


Strategy layer: Sample frequency

A new interaction is made under one of the following conditions:

- Sufficiently different perception (measured by some scenario-specific distance metric $d$): $d(o_{t_1}, o_{t_2}) > \theta_o$
- Sufficiently interesting motivation change: $|\mu_{t_2} - \mu_{t_1}| > \theta_r$
- Enough time has passed: $t_2 - t_1 > \theta_t$

$\theta_o$, $\theta_r$, and $\theta_t$ are application specific and have to be determined empirically.
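The three trigger conditions can be combined into a single predicate. The sketch below assumes a Euclidean distance both for the perception metric $d$ (which the slides leave scenario-specific) and for the motivation change; all names are illustrative.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equally long vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def new_interaction(o1, o2, mu1, mu2, t1, t2,
                    theta_o, theta_r, theta_t, dist=euclidean):
    """True if any of the three sampling conditions holds."""
    perception_changed = dist(o1, o2) > theta_o        # d(o_t1, o_t2) > theta_o
    motivation_changed = euclidean(mu1, mu2) > theta_r  # |mu_t2 - mu_t1| > theta_r
    enough_time = (t2 - t1) > theta_t                   # t2 - t1 > theta_t
    return perception_changed or motivation_changed or enough_time
```

Passing a different `dist` callable swaps in another scenario-specific metric without touching the other two conditions.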


Strategy example

[Figure: example strategy graphs with start state S and goal state G; the transitions are annotated with number pairs such as (3, 1), (4, 1), and (6, 5).]


Skill layer

1. Discover and learn a set of skills that are useful to the strategy layer ⇒ ground the symbols $a \in A$
2. Execute them when requested and optimize them at runtime

[Figure: block diagrams of the skill layer in exploration mode and in exploitation mode, each with the components skill manager, skills, model manager, and error minimizer between perception and action.]


Skill layer: Data flow in exploration mode

[Figure: skill layer in exploration mode. Components: strategy layer, skill manager, skills, model manager, error minimizer, perception, action. Labeled flows: training mode, notify new skill, create & fetch skills, create models, $I_a$, $O$, explore actions.]


Skill layer: Data flow in exploitation mode

[Figure: skill layer in exploitation mode. Components: strategy layer, skill manager, skills, model manager, error minimizer, perception, action. Labeled flows: execution mode, request skill, set current skill, fetch current skill, $I_a$, update models, $O$.]


Skill definition

- extraction function $f_{ext}: I_a \to \mathbb{R}$ extracts information from a perception $I(t) \in I_a$
- control function $f_c: \mathbb{R} \times \mathbb{R} \to \mathbb{R}^+$ associates an error value to the tuple $(v_{t_i}, v_{t_j})$
  - decrease: $f_c(v_{t_i}, v_{t_j}) = |v_{t_j}|$
  - increase: $f_c(v_{t_i}, v_{t_j}) = \frac{1}{|v_{t_j}|}$
  - keep value: $f_c(v_{t_i}, v_{t_j}) = |v_{t_i} + \delta - v_{t_j}|$
- error function $f_e: I_a \times I_a \to \mathbb{R}^+$ assigns an error value to a perception pair
- progress function $f_p: I_a \times I_a \to [0, 1]$ measures a skill's progress between two time points
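The three control functions are small enough to write down directly. In this sketch the sign convention of the "keep value" case (start value plus offset $\delta$ minus current value) and the handling of a zero value in the "increase" case are assumptions; the function names are illustrative.

```python
import math


def decrease(v_start, v_now):
    """Error shrinks as the extracted value approaches 0."""
    return abs(v_now)


def increase(v_start, v_now):
    """Error shrinks as the extracted value grows in magnitude.

    Returning infinity for v_now == 0 is an assumption; the slides do
    not state how the undefined case is treated.
    """
    return math.inf if v_now == 0 else 1.0 / abs(v_now)


def keep_value(v_start, v_now, delta=0.0):
    """Error is the deviation from the start value shifted by delta."""
    return abs(v_start + delta - v_now)
```

For example, a skill such as "minimize angle to object" would pair an extraction function that returns the angle with `decrease`.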


Skill manager

- exploration phase
  - generate skills that enable the robot to control the perceived properties
  - assign a priority to each skill dependent on its execution priority
  - determine the skills the robot can reliably perform and notify them as new skills to the strategy layer

[Figure: skill layer in exploration mode (training mode: notify new skill, create & fetch skills, create models, explore actions).]

- exploitation phase
  - manage the execution of requested skills

[Figure: skill layer in exploitation mode (execution mode: request skill, set current skill, fetch current skill, update models).]


Model manager

- creating prediction models for each perceived property
  - a prediction model is the tuple $(id_p, S, M, m)$
    - $id_p \in ID_p$: perception feature to be predicted
    - $S \subseteq ID_o \cup ID_p$: subset of the perceptual features
    - $M \subseteq O$: subset of the actuators to control
    - $m: \mathbb{R}^{|S| + |M|} \to \mathbb{R}$ predicts the value for the perceptual feature $id_p$ at the next input perception given the values of $S$ and $M$
  - $m$ in experiments: Poly, RBF
- updating prediction models to reflect new experiences
- scoring each model dependent on its prediction accuracy:

\[
\mathrm{score}(m) = \frac{n}{\sum_{i=k}^{k+n} \left( m(S(t_i), M(t_i)) - v_{t_{i+1}} \right)^2}
\]

[Figure: skill layer block diagrams in exploration mode and exploitation mode, highlighting the model manager (create models, update models).]
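The scoring rule rewards models with a small squared prediction error. The sketch below is one way to compute it and to pick the best-scoring model; representing a model as a plain callable, using the number of trace entries as $n$, and returning infinity for a perfect fit are assumptions made for the example.

```python
import math


def score(model, trace):
    """Number of samples divided by the summed squared prediction error.

    `trace` is a list of (s_values, m_values, next_value) tuples, where
    next_value is the feature value observed at the following perception.
    """
    squared_error = sum((model(s, m) - v_next) ** 2 for s, m, v_next in trace)
    return math.inf if squared_error == 0 else len(trace) / squared_error


def best_model(models, trace):
    """Return the model with the highest score."""
    return max(models, key=lambda model: score(model, trace))


def additive_model(s, m):
    # Hypothetical model: the actuator value is added to the feature value.
    return s[0] + m[0]


def constant_model(s, m):
    # Hypothetical model: the feature value does not change.
    return s[0]


trace = [((0.0,), (1.0,), 1.0), ((1.0,), (1.0,), 2.0), ((2.0,), (1.0,), 3.0)]
chosen = best_model([constant_model, additive_model], trace)
```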


Error minimizer

1. $I_c(t)$: only those perceptual features on which the error functions of the current skill $s$ depend at the current time $t$

2. Estimate the next perception, $I_c(t+1)$*, dependent on the motor action $M$ as predicted by $m_j^{best} = \arg\max_m(\mathrm{score}(m))$:

\[
I_c^M(t+1) = \{\, m_j^{best}(I_c(t), M(t)) \mid p_j \in I_c(t) \,\}
\]

3. For each error function $f_e^k$: calculate the expected next error $e_k^M(t+1)$, with $I_c(t_i)$ being the perception when the skill was started:

\[
e_k^M(t+1) = f_e^k(I_c(t_i), I_c^M(t+1))
\]

4. Determine the best actuator command $M(t)$ by finding the one that minimizes the accumulated expected error:

\[
M_{next}(t) = \arg\min_M \sum_{k=1}^{N} e_k^M(t+1)
\]

*$t+1$ is the time point of the next interaction after time $t$
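The four steps amount to a one-step lookahead over candidate actuator commands. The following sketch searches a finite candidate list, which is an assumption (the slides do not say how the minimization is carried out); `predict` stands in for the best-scoring prediction models, and the toy robot model is hypothetical.

```python
def best_command(candidates, predict, error_functions,
                 start_perception, current_perception):
    """Pick the command with the lowest summed expected error.

    predict(perception, command) returns the expected next perception;
    each error function compares the perception at skill start with the
    predicted one.
    """
    def expected_error(command):
        predicted = predict(current_perception, command)
        return sum(f_e(start_perception, predicted) for f_e in error_functions)

    return min(candidates, key=expected_error)


def predict_angle(angle, rotation):
    # Hypothetical model: each unit of rotation reduces the angle by 0.5.
    return angle - 0.5 * rotation


def angle_error(start_angle, predicted_angle):
    # "Decrease" control function applied to the predicted angle.
    return abs(predicted_angle)


command = best_command([-1, 0, 1], predict_angle, [angle_error],
                       start_perception=1.0, current_perception=1.0)
```

With an angle of 1.0 to the object, the command 1 leads to the smallest predicted angle and is selected.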


Skill layer configuration

Greater universality leads to a bigger exploration space. It is wise to limit the exploration space by specifying non-changing parameters beforehand. This can be achieved by configuring the following parameters:

- Degrees of freedom specify the number of actuators the skill layer has to control.
- Extraction functions define the language that can be used to specify the error functions.
- Control functions specify the functions that the error minimizer will minimize by means of the error functions.
- Regression models are used by the model manager to build predictions for the environment interaction. A regression model consists of two methods: one that fits a model to an experience trace and one that predicts the value of the modeled property.
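The two-method regression interface can be illustrated with a deliberately simple model. The class below is a stand-in, not one of the polynomial or RBF models used in the experiments: it fits a straight line to a one-dimensional trace, and its name and method signatures are assumptions.

```python
class LinearModel:
    """Least-squares line through (input, next value) pairs."""

    def __init__(self):
        self.slope = 0.0
        self.intercept = 0.0

    def fit(self, inputs, next_values):
        """Fit the model to an experience trace."""
        n = len(inputs)
        mean_x = sum(inputs) / n
        mean_y = sum(next_values) / n
        variance = sum((x - mean_x) ** 2 for x in inputs)
        covariance = sum((x - mean_x) * (y - mean_y)
                         for x, y in zip(inputs, next_values))
        self.slope = covariance / variance if variance else 0.0
        self.intercept = mean_y - self.slope * mean_x
        return self

    def predict(self, value):
        """Predict the modeled property for a new input."""
        return self.slope * value + self.intercept


model = LinearModel().fit([0.0, 1.0, 2.0], [1.0, 3.0, 5.0])
prediction = model.predict(3.0)
```

Any model exposing the same `fit` and `predict` pair could be plugged into the model manager in the same way.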


Skill example: "Minimize angle to object" learned with radial basis functions

[Figure: controlling speed dependent on angle and distance to the object]

[Figure: controlling rotational speed dependent on angle and distance to the object]


Imitation: Viterbi [Viterbi, 1967]

Problem description

- Given the observation sequence $o_1^N = \langle o_1, o_2, \ldots, o_N \rangle$ $(o_i \in \mathbb{R}^d)$
- Find the most likely hidden state sequence $s_1^N = \langle s_1, s_2, \ldots, s_N \rangle$ $(s_i \in S)$

Approach

- Maximizing probability $P(s_1^N \mid o_1^N)$:

\[
s_1^{N*} = \arg\max_{s_1^N} P(s_1^N \mid o_1^N)
\]

  by recursively calculating the probability $V(s, t) = \max_{s_1^{t-1}} P(o_1^t, s_1 \ldots s_{t-1}, s_t = s)$ that $s \in S$ is the observed hidden state at time $t$ given the observations $o_1^t$:
  - $V(s, 1) = P(o_1 \mid s_1 = s)\, P(s_1 = s) \quad \forall s \in S$
  - $V(s, t) = P(o_t \mid s_t = s) \max_{s'} \left[ P(s_t = s \mid s_{t-1} = s')\, V(s', t-1) \right]$
  - $\phi(s, t) = \arg\max_{s'} \left[ P(s_t = s \mid s_{t-1} = s')\, V(s', t-1) \right]$
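The recursion for $V$ and the backpointer $\phi$ translate directly into code. The sketch below is a textbook Viterbi decoder over a small discrete example; the example model (two states, three symbolic observations) and all names are illustrative and not taken from the slides, which use real-valued observations.

```python
def viterbi(observations, states, prior, trans, emit):
    """Most likely state sequence for the given observations.

    prior[s]        = P(s_1 = s)
    trans[s_prev][s] = P(s_t = s | s_{t-1} = s_prev)
    emit(s, o)      = P(o | s)
    """
    V = [{s: emit(s, observations[0]) * prior[s] for s in states}]
    phi = [{}]
    for o in observations[1:]:
        prev = V[-1]
        row, back = {}, {}
        for s in states:
            best_prev = max(states, key=lambda sp: trans[sp][s] * prev[sp])
            row[s] = emit(s, o) * trans[best_prev][s] * prev[best_prev]
            back[s] = best_prev
        V.append(row)
        phi.append(back)
    # Backtrack from the most probable final state.
    path = [max(states, key=lambda s: V[-1][s])]
    for back in reversed(phi[1:]):
        path.append(back[path[-1]])
    path.reverse()
    return path


states = ("Healthy", "Fever")
prior = {"Healthy": 0.6, "Fever": 0.4}
trans = {"Healthy": {"Healthy": 0.7, "Fever": 0.3},
         "Fever": {"Healthy": 0.4, "Fever": 0.6}}
emission = {"Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
            "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6}}

path = viterbi(["normal", "cold", "dizzy"], states, prior, trans,
               lambda s, o: emission[s][o])
# path == ["Healthy", "Healthy", "Fever"]
```

Passing the emission probability as a callable keeps the decoder usable when $P(o_t \mid s_t = s)$ is computed from real-valued observations instead of a lookup table.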


Imitation: Recognition

Problem description

- Given the observation sequence $o_1^N = \langle o_1, o_2, \ldots, o_N \rangle$ $(o_i \in \mathbb{R}^d)$
- Find the most likely behavior sequence ($t \in \mathbb{R}^+$, $o \in \mathbb{R}^d$, $s \in S$, $a \in A$)

  à $= \langle \ldots, ((t_k, o_k, s_k), a_k, (t_{k+1}, o_{k+1}, s_{k+1})), \ldots \rangle$

Approach

- Maximizing probability $P(s_1^n, a_1^{n-1} \mid o_1^N)$, $n \ll N$
- Adapting $V(s, 1)$ and $V(s, t)$:
  - Use own state and action space for $S$ and $A$
  - Support bootstrapping of probabilities
  - Let actions recognize themselves ⇒ technical realization of the mirror neuron system


Recognition: determining $P(o_t \mid s_t = s)$

Viterbi: $V(s, t) = P(o_t \mid s_t = s) \max_{s'} \left[ P(s_t = s \mid s_{t-1} = s')\, V(s', t-1) \right]$

- Every state gets a chance dependent on the distance of its observations:

\[
P(o_t \mid s_t = s) = \frac{\sum_{o \in N_o^k \cap s} \lVert o_t - o \rVert^{-2}}{\sum_{o \in N_o^k} \lVert o_t - o \rVert^{-2}}
\]

- Example: $P(o \mid s = s_2)$, $k = 3$

[Figure: state observations ("raw states") grouped into regions ("abstract states").]
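The observation probability is an inverse-squared-distance vote among the $k$ nearest stored observations. The sketch below assumes that each stored observation is labeled with its abstract state and guards against a zero distance with a small constant; both choices, and all names, are assumptions for illustration.

```python
def observation_probability(o_t, labeled_observations, state, k=3):
    """Share of inverse squared distance held by `state` among the k nearest.

    labeled_observations is a list of (observation_vector, abstract_state).
    """
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    nearest = sorted(labeled_observations,
                     key=lambda pair: squared_distance(o_t, pair[0]))[:k]
    # 1e-12 avoids a division by zero for an exact match (assumption).
    weights = [(1.0 / max(squared_distance(o_t, o), 1e-12), s)
               for o, s in nearest]
    total = sum(weight for weight, _ in weights)
    return sum(weight for weight, s in weights if s == state) / total


known = [((0.0,), "s1"), ((1.0,), "s2"), ((2.0,), "s2"), ((10.0,), "s3")]
p_s2 = observation_probability((1.5,), known, "s2", k=3)
```

In this example the two nearest observations belong to `s2`, so it receives 18/19 of the probability mass, while `s3` lies outside the three nearest neighbors and receives none.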


Recognition: determining $P(s_{t_2} \mid s_{t_1})$

Viterbi: $V(s, t) = P(o_t \mid s_t = s) \max_{s'} \left[ P(s_t = s \mid s_{t-1} = s')\, V(s', t-1) \right]$

- Replace $P(s_{t_2} \mid s_{t_1})$ with $T(s_{t_1}, a_{ml}, s_{t_2})$, where

\[
a_{ml} = \arg\max_a \frac{\sum_{t=t_1}^{t_2} P_a(o_t \mid o_{t-1})}{t_2 - t_1}
\]

- $P_a(o_t \mid o_{t-1})$ is the vote of skill $a$ by means of its progress function $(f_p^a(o_{t-1}) \to f_p^a(o_t))$ plus the following heuristics ...
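Selecting $a_{ml}$ means averaging each skill's votes over the observed interval and taking the skill with the highest average. The sketch below takes the votes as given; the dictionary layout and the skill names are hypothetical.

```python
def most_likely_action(votes_by_skill, t1, t2):
    """Skill whose votes, summed and divided by t2 - t1, are largest.

    votes_by_skill[a] holds the votes P_a(o_t | o_{t-1}) collected
    between t1 and t2; t2 must be greater than t1.
    """
    def average_vote(skill):
        return sum(votes_by_skill[skill]) / (t2 - t1)

    return max(votes_by_skill, key=average_vote)


votes = {
    "drive to red base": [0.1, 0.2, 0.1],
    "drive to yellow base": [0.4, 0.5, 0.6],
}
a_ml = most_likely_action(votes, t1=0, t2=3)
```

Because the divisor is the same for every skill, it does not change which skill wins; it only normalizes the value.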


Recognition: determining $P_a(o_t \mid o_{t-1})$

Viterbi: $V(s, t) = P(o_t \mid s_t = s) \max_{s'} \left[ P(s_t = s \mid s_{t-1} = s')\, V(s', t-1) \right]$

1. Prefer nearer goals
   (Ambiguous situation: the robot might drive either to the red or to the yellow goal base)
2. Ignore skills that "seem to have finished"
3. Clip votes to $[0, 1]$

\[
P_a(o_t \mid o_{t-1}) =
\begin{cases}
\min\left( \max\left( \dfrac{f_p^a(o_t) - f_p^a(o_{t-1})}{1 - f_p^a(o_t)},\, 0 \right),\, 1 \right), & 1 - f_p^a(o_t) < \epsilon \\
0, & \text{otherwise}
\end{cases}
\]


Integrating recognized behavior

- Estimate missing information
  - recognition output: à $= \langle \ldots, ((t_k, o_k, s_k), a_k, (t_{k+1}, o_{k+1}, s_{k+1})), \ldots \rangle$
  - needed for learning: $I_{t_k}^{t_{k+1}} = (o_{t_k}, a_{t_k}, d_{t_k}, \mu_{t_k}, f_{t_k}, o_{t_{k+1}})$
    - duration: $d_{t_k} = t_{k+1} - t_k$
    - motivation: $\mu^I_{t_k} = \mu^D_{t_k} \cdot \dfrac{\mu^I_{max} - \mu^I_{min}}{\mu^D_{max} - \mu^D_{min}}$
    - failure: $f_{t_k} = \text{false}$
- Integrate recognized behavior into existing experience
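The three estimated fields can be filled in mechanically from one recognized step. The sketch below assumes that the demonstrator's motivation value and both motivation ranges are available as inputs (how they are obtained is not covered here); the dictionary keys and function names are illustrative.

```python
def rescale_motivation(mu_demo, demo_range, own_range):
    """Scale the demonstrator's motivation by the ratio of the two ranges."""
    d_min, d_max = demo_range
    i_min, i_max = own_range
    return mu_demo * (i_max - i_min) / (d_max - d_min)


def complete_interaction(before, action, after, mu_demo, demo_range, own_range):
    """Build the interaction tuple needed for learning from one recognized step.

    `before` and `after` are (time, observation, state) triples.
    """
    t_k, o_k, _state_k = before
    t_next, o_next, _state_next = after
    return {
        "o_start": o_k,
        "action": action,
        "duration": t_next - t_k,          # d = t_{k+1} - t_k
        "motivation": rescale_motivation(mu_demo, demo_range, own_range),
        "failure": False,                  # recognized steps count as successes
        "o_end": o_next,
    }


interaction = complete_interaction(
    (1.0, (0.0,), "s1"), "drive to object", (3.5, (1.0,), "s2"),
    mu_demo=0.5, demo_range=(0.0, 1.0), own_range=(0.0, 2.0))
```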


Evaluation: Multi-robot scenario "three bases": results

- Time to reach the goal decreased drastically in the beginning
- Imitation slightly above no-imitation in the long run
- Reason: robots have learned to prefer the black base: higher reward, but longer way
- More realistic measure: reward/time
- Result: imitation increases learning speed by up to 50%


Evaluation: Multi-robot scenario "three bases": abstraction

- Experience heuristic cuts off at 2000 interactions
- Imitation starts with a lower amount of experiences:
  - Reaching the goals faster ⇒ less experience
  - Learning faster to drive to the black base ⇒ more experience
- Fewer than 6 abstract states with no-imitation
- Fewer than 10 abstract states with imitation


Evaluation: Multi-robot scenario "three bases": group homogeneity

- Percentage of goal bases

[Figure: percentage of goal bases chosen (yellow, red, black) over the number of episodes (0 to 50), without imitation and with imitation.]

- Group homogeneity $G(X)$ measured by normalized Shannon entropy $H(X)$ ($H_{max} = \log |X|$):
  - $H(X) = -\sum_{x \in X} p(x) \log p(x)$
  - $G(X) = \dfrac{H_{max} - H(X)}{H_{max}}$

[Figure: group homogeneity over the number of episodes (0 to 50) for imitation and no-imitation; 1.0 = all robots prefer the same goal.]
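The homogeneity measure follows directly from the two formulas. In this sketch, `choices` lists the goal base chosen by each robot and `num_goals` is $|X|$; both names are assumptions, and at least two goals are required so that $H_{max}$ is non-zero.

```python
import math
from collections import Counter


def group_homogeneity(choices, num_goals):
    """G(X) = (H_max - H(X)) / H_max with H_max = log(num_goals)."""
    counts = Counter(choices)
    total = len(choices)
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    h_max = math.log(num_goals)
    return (h_max - entropy) / h_max


all_same = group_homogeneity(["black", "black", "black"], num_goals=3)
evenly_split = group_homogeneity(["black", "red", "yellow"], num_goals=3)
```

A group in which every robot prefers the same base yields 1.0, and an even split across all bases yields a value of approximately 0.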


Evaluation: Multi-robot scenario "five bases": description

- Robots have to transport objects to goal bases
- No reward for reaching an object
- Goal bases provide different rewards
  - black: 10,000 points
  - blue, green, red, yellow: 20 points
- State space consists of
  - distance to closest object
  - distance of closest object to closest goal
  - ID of closest goal


Evaluation: Multi-robot scenario "five bases": results

- Time to reach the goal: lower for imitation in the beginning, higher in the long run
- Imitation above no-imitation in the long run
- Reason: bottleneck ("needle eye") between the green and the blue base
- More realistic measure: reward/time
- Result: imitation increases learning speed in the long run by up to 100%


Evaluation: Multi-robot scenario "five bases": abstraction

- Experience heuristic cuts off at 2000 interactions
- Fewer than 8 abstract states with no-imitation
- Fewer than 10 abstract states with imitation


Evaluation: Multi-robot scenario "five bases": group homogeneity

- Percentage of goal bases

[Figure: percentage of goal bases chosen, without imitation and with imitation.]

- Group homogeneity $G(X)$ measured by normalized Shannon entropy $H(X)$ ($H_{max} = \log |X|$):
  - $H(X) = -\sum_{x \in X} p(x) \log p(x)$
  - $G(X) = \dfrac{H_{max} - H(X)}{H_{max}}$

[Figure: group homogeneity; 1.0 = all robots prefer the same goal.]


Choice of the imitatee

Task

- Find the best imitatee prior to the imitation process itself

[Figure: architecture with motivation layer, strategy layer, and skill layer between perception and action (current motivation, request, result); the choice of the imitatee precedes imitation.]

Approach

1. Observe behavior capabilities by means of affordances
2. Encode recognized affordances stochastically
3. Compare representation differences

[Figure: processing pipeline. Raw perception $I$ → 1. affordance detection → accumulated affordances $T$ → 2. affordance network generation → affordance networks $AN_1, \ldots, AN_n$ → 3. choice of the imitatee, with the outcomes "don't imitate" and "imitate" and an exit labeled "quit".]

\[
R_{imitate} = \arg\min_{R_i \in R,\ R_i \neq R_m} D_{AN}(AN_i, AN_m)
\]


Affordance detection: Affordance testing conditions modeled as FSAs

[Figure: four finite state automata whose transitions are labeled "ok" and "fail".]

(a) seizable: drive to object → align to object → seize object → finish
(b) liftable: drive to object → align to object → seize object → lift object → finish
(c) pushable: drive to object → align to object → drive forward → finish
(d) pullable: drive to object → align to object → seize object → drive backward → finish
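The four automata can be read as skill chains that must succeed step by step. The sketch below encodes that reading under the assumption that an "ok" transition advances to the next skill and a "fail" transition ends the test with a negative result; the slides only show the transition labels, so this interpretation and all names are hypothetical.

```python
AFFORDANCE_TESTS = {
    "seizable": ["drive to object", "align to object", "seize object"],
    "liftable": ["drive to object", "align to object", "seize object",
                 "lift object"],
    "pushable": ["drive to object", "align to object", "drive forward"],
    "pullable": ["drive to object", "align to object", "seize object",
                 "drive backward"],
}


def check_affordance(name, execute):
    """Run the skill chain; True only if every skill reports success.

    execute(skill) is a callable that performs (or observes) the skill
    and returns True for "ok" and False for "fail".
    """
    for skill in AFFORDANCE_TESTS[name]:
        if not execute(skill):
            return False      # "fail" edge: the affordance is not valid
    return True               # reached "finish"


def cannot_lift(skill):
    # Hypothetical robot that succeeds at everything except lifting.
    return skill != "lift object"


results = {name: check_affordance(name, cannot_lift)
           for name in AFFORDANCE_TESTS}
```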


Affordance network generation: Preparing collected data

Affordances: Λ1 = seizable, Λ2 = liftable, Λ3 = pushable, Λ4 = pullable

Collected data $T^{red}$, split into $T^{red}_{red}$ (entries $Valid_j(I_{red}(t), o_k, R_{red})$) and $T^{red}_{blue}$ (entries $Valid_j(I_{red}(t), o_k, R_{blue})$):

Λj   ok   Valid_j(I_red(t), ok, R_red)   Valid_j(I_red(t), ok, R_blue)
Λ1   o1   T                              T
Λ2   o1   -                              -
Λ3   o1   T                              T
Λ4   o1   F                              F
Λ1   o2   F                              F
Λ2   o2   -                              -
Λ3   o2   F                              F
Λ4   o2   F                              F
Λ1   o3   T                              T
Λ2   o3   T                              T
Λ3   o3   T                              T
Λ4   o3   T                              F

$T^{red}_{red}$:

ok   Λ1   Λ2   Λ3   Λ4
o1   T    -    T    F
o2   F    -    F    F
o3   T    T    T    T

$T^{red}_{blue}$:

ok   Λ1   Λ2   Λ3   Λ4
o1   T    -    T    F
o2   F    -    F    F
o3   T    T    T    F


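Affordance network generation: Learned Bayesian networks [Friedman, 1997]

$T^{red}_{red}$:

ok   Λ1   Λ2   Λ3   Λ4
o1   T    -    T    F
o2   F    -    F    F
o3   T    T    T    T

Learned network $AN_{red}$:

- A1 with parent A3: P(A1 | A3 = 0) = 0.00, P(A1 | A3 = 1) = 1.00
- A4 with parent A3: P(A4 | A3 = 0) = 0.00, P(A4 | A3 = 1) = 0.33
- P(A3) = 0.60
- P(A2) = 0.60

$T^{red}_{blue}$:

ok   Λ1   Λ2   Λ3   Λ4
o1   T    -    T    F
o2   F    -    F    F
o3   T    T    T    F

Learned network $AN_{blue}$:

- A1 with parent A3: P(A1 | A3 = 0) = 0.00, P(A1 | A3 = 1) = 1.00
- P(A2) = 0.60
- P(A3) = 0.60
- P(A4) = 0.00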


Comparing affordance networks: Based on Graph Edit Distance [Dickinson et al., 2003]

[Figure: the affordance networks $AN_{red}$ (A1 and A4 each conditioned on A3; P(A2) = 0.60, P(A3) = 0.60) and $AN_{blue}$ with their probability tables.]

- Affordance network difference:

\[
D_{AN}(AN_{red}, AN_{blue}) = \eta \cdot D_{struct}(AN_{red}, AN_{blue}) + (1 - \eta) \cdot D_{param}(AN_{red}, AN_{blue})
\]

  - structural difference: $D_{struct}(AN_{red}, AN_{blue}) = |C_{red}| + |C_{blue}| - 2|C_0| = 2 + 1 - 2 = 1$
  - parameter difference: $D_{param}(AN_{red}, AN_{blue}) = \sum_{i=1}^{4} D_{param}(A_i^{red}, A_i^{blue}) = 0.33$
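The worked numbers above can be reproduced with a short sketch. It interprets $C_{red}$ and $C_{blue}$ as the edge sets of the two networks and $C_0$ as their common edges, which is an assumption consistent with $2 + 1 - 2 \cdot 1 = 1$; the parameter difference is passed in as a number rather than computed, and the weight $\eta = 0.5$ is an arbitrary example value.

```python
def structural_difference(edges_a, edges_b):
    """|C_a| + |C_b| - 2 |C_0|, with C_0 the edges both networks share."""
    common = edges_a & edges_b
    return len(edges_a) + len(edges_b) - 2 * len(common)


def network_difference(edges_a, edges_b, d_param, eta=0.5):
    """Weighted sum of structural and parameter difference."""
    d_struct = structural_difference(edges_a, edges_b)
    return eta * d_struct + (1 - eta) * d_param


# Edges are (parent, child) pairs.
edges_red = {("A3", "A1"), ("A3", "A4")}
edges_blue = {("A3", "A1")}

d_struct = structural_difference(edges_red, edges_blue)
d_an = network_difference(edges_red, edges_blue, d_param=0.33, eta=0.5)
```

A larger $\eta$ emphasizes differences in network structure, a smaller one differences in the learned probabilities.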


Choosing the best imitatee

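[Figure: affordance network $AN_{red}$ with A1 and A4 each conditioned on A3 (P(A1 | A3 = 0) = 0.00, P(A1 | A3 = 1) = 1.00; P(A4 | A3 = 0) = 0.00, P(A4 | A3 = 1) = 0.33), P(A3) = 0.60, and P(A2) = 0.60.]

\[
R_{imitate} = \arg\min_{R_i \in R,\ R_i \neq R_{red}} D_{AN}(AN_i, AN_{red})
\]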
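Choosing the imitatee is an argmin over all other robots. The sketch below assumes that the network distances to the robot's own affordance network have already been computed and stored per robot; the dictionary layout, robot names, and distance values are hypothetical.

```python
def choose_imitatee(own_id, distances):
    """Robot with the smallest network distance, excluding the robot itself.

    distances maps robot id -> D_AN(AN_i, AN_own). Returns None if no
    other robot is known.
    """
    candidates = {robot: d for robot, d in distances.items() if robot != own_id}
    if not candidates:
        return None
    return min(candidates, key=candidates.get)


distances_to_red = {"red": 0.0, "blue": 0.665, "green": 1.2}
imitatee = choose_imitatee("red", distances_to_red)
```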


Evaluation: Experimental setup

- Robots imitate other robots that are performing different skills on the different objects
- Thereby: listening to the imitator's error functions of the involved skills
- The number of failure signals received by the strategy layer serves as an indicator of how wise the demonstrator choice has been.


Evaluation: Parametrization of the robots and objects

robots

group     parameter       values             description
motor     power           [0.3, 6.0] kg      maximal weight a robot can pull/push
motor     speed           [0.03, 0.2] m/s    controls the impulse with which a robot impacts an object
gripper   length          [0.08, 0.2] m      the longer the gripper the deeper the objects can be
gripper   span            [0.16, 0.5] m      limits the diameter of objects that can be gripped
gripper   closing force   [1.0, 30.0] kg     controls the contact pressure (to pull heavier objects the closing force must be higher)
gripper   lifting force   [30.0, 80.0] kg    controls the friction (to lift heavier objects the closing force must be higher)
gripper   form            {normal, barb}     see figures below

objects

parameter   values                     discretization
mass        [1.0, 5.0] kg              0.5 kg
width       [0.04, 0.24] m             0.05 m
height      [0.17, 0.2] m              0.05 m
friction    [50, 100] %                0.1 %
shape       {sphere, cube, cylinder}


Evaluation: Robustness experiment

Figure: Impact of the demonstrator selection algorithm on the failure rates for different fractions of unknown data (with 95% confidence interval for the 0% case)


EvaluationClustering experiment

L Cluster objects by their appearanceprior to AN creation

L Generate one AN for each clusterL Distance is the weighted sum:

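D_{cAN}(R_a, R_b) = \sum_{l=1}^{n} \frac{k_l}{k} \cdot D_{AN}(AN_{a,l}, AN_{b,l})

k = \min \{ ( |T^m_{a,l}| \,\Box\, |T^m_{b,l}| ) \mid 1 \le l \le n \}, where \Box marks an operator that is illegible in the source

k_l = \min( |T^m_{a,l}|, |T^m_{b,l}| )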

Figure: Impact of the clustered demonstrator selection algorithm
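The weighted sum on this slide can be sketched in a few lines. The sketch assumes that the weight of cluster l is the fraction k_l / k, with k_l the smaller of the two robots' sample counts |T^m_{a,l}| and |T^m_{b,l}|. Because the definition of k is not fully legible on the slide, k is passed in by the caller rather than computed. The function name `clustered_distance` and its argument names are hypothetical, and the per-cluster distances D_AN are taken as given numbers.

```python
def clustered_distance(counts_a, counts_b, an_distances, k):
    """Weighted sum over clusters: sum_l (k_l / k) * D_AN(AN_a,l, AN_b,l).

    counts_a[l], counts_b[l]: sample counts of robots a and b in cluster l.
    an_distances[l]: precomputed distance between the two ANs of cluster l.
    k: normalizer, supplied by the caller (its definition is an assumption).
    """
    if not (len(counts_a) == len(counts_b) == len(an_distances)):
        raise ValueError("all inputs must have one entry per cluster")
    if k <= 0:
        raise ValueError("k must be positive")

    total = 0.0
    for n_a, n_b, dist in zip(counts_a, counts_b, an_distances):
        k_l = min(n_a, n_b)  # weight numerator for this cluster
        total += (k_l / k) * dist
    return total


if __name__ == "__main__":
    # Two clusters; k = 10 is chosen purely for illustration.
    print(clustered_distance([10, 4], [6, 8], [0.2, 0.5], k=10))
```

With these illustrative numbers the cluster weights are 6/10 and 4/10, so clusters for which both robots have gathered more data contribute more to the overall distance.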
