Learning and imitation in heterogeneous robot groups
Introduction Architecture Imitation in robot groups Conclusion
Learning and imitation in
heterogeneous robot groups
Wilhelm Richert
Faculty of Electrical Engineering, Computer Science and Mathematics, Universität Paderborn
22 December 2009
Learning and imitation in heterogeneous robot groups 1 / 58
Motivation: Why do we need learning and imitation?
State of the art
• Off-line learning (mostly population-based)
• Behavior is fixed afterwards
Examples: Swarmanoid [Dorigo et al., 2006], Symbrion [Baele et al., 2009]
Desired
• On-line learning to react intelligently to unforeseeable events and problems
• Means to benefit from the “redundancy” in group behavior
• Robustness to arbitrary robot groups
The five big challenges in imitation [Dautenhahn and Nehaniv, 2002]
Five big challenges governing successful imitation in multi-robot systems:
• Whom → heterogeneous robot groups
• When → concentrate on salient behavior
• What → the results, the actions, or the hidden goals of the imitatee?
• How → correspondence problem
• How to evaluate → what should be counted as successful imitation?
Thesis objectives
Robots in a group shall be able to
1. combine learning with imitation,
2. recognize and learn observed behavior non-obtrusively, and
3. choose potential imitatees wisely, also in heterogeneous robot groups.
Robot architecture
[Figure: three-layer robot architecture (motivation layer, strategy layer, skill layer); labeled flows: current motivation, request/result, perception, action, choice of the imitatee, imitation]
Strategy layer
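• Inspired by AMPS [Kochenderfer, 2006]
[Figure: strategy-layer data flow]
raw perception and motivation: $I$, $\mu_i$
→ perception filtering: $o_t \in I_s$
→ experience: $\langle (o, a, d, \mu_i, f)_{t-N}, \ldots, (o, a, d, \mu_i, f)_t \rangle$
→ abstraction $s = \xi(o)$ (with heuristics)
→ model: $T$, $R$, $\gamma$
→ reinforcement learning
→ policy $\pi$
→ action selection: $a = \pi(s) \in A$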
• The state abstraction function $\xi$ may use any abstraction method that supports
  ◦ insertion of new state observations,
  ◦ deletion of old state observations, and
  ◦ querying the stored state observation most similar to a new state observation.
• The experiments use nearest neighbor.
[Figure: state observations (“raw states”) grouped into regions (“abstract states”)]
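A minimal Python sketch of an abstraction function $\xi$ that supports the three operations listed above, using a linear-scan nearest-neighbor search. The Euclidean distance, the class and method names, and the region labels are assumptions for illustration, not the thesis implementation.

```python
import math
from typing import Dict, Hashable, Optional, Sequence, Tuple


class NearestNeighborAbstraction:
    """Hypothetical state abstraction xi: a raw state observation is mapped to the
    region ("abstract state") of the most similar stored observation."""

    def __init__(self) -> None:
        # observation ID -> (raw state observation, region it belongs to)
        self._store: Dict[Hashable, Tuple[Tuple[float, ...], Hashable]] = {}

    def insert(self, obs_id: Hashable, observation: Sequence[float], region: Hashable) -> None:
        """Insert a new state observation and assign it to a region."""
        self._store[obs_id] = (tuple(observation), region)

    def delete(self, obs_id: Hashable) -> None:
        """Delete an old state observation (no-op if it is unknown)."""
        self._store.pop(obs_id, None)

    def query(self, observation: Sequence[float]) -> Optional[Hashable]:
        """Return the ID of the stored observation nearest (Euclidean) to `observation`."""
        if not self._store:
            return None
        return min(self._store, key=lambda k: math.dist(self._store[k][0], observation))

    def xi(self, observation: Sequence[float]) -> Optional[Hashable]:
        """Abstract state s = xi(o): the region of the nearest stored observation."""
        nearest = self.query(observation)
        return None if nearest is None else self._store[nearest][1]
```

A linear scan keeps the sketch short; with many stored observations a spatial index such as a k-d tree would be the usual replacement.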
• Heuristics maintain the models so that the same action feels similar in all observations of the same state.
• Heuristics may split or merge regions: transition, failure, reward, simplification, and experience heuristics.
• Example: transition heuristic
Building a policy
• Reinforcement learning with an SMDP:
$$Q(s, a) = R(s, a) + \sum_{s' \in S} P(s' \mid s, a)\, \gamma(s, a, s')\, V^{\pi}(s')$$
• Determine the current best policy:
$$V^{\pi}(s) = \max_{a \in A} Q(s, a)$$
$$\pi(s) = \arg\max_{a \in A} Q(s, a)$$
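One standard way to compute these quantities is value iteration over the SMDP model: repeat the $Q$ update until $V$ stops changing, then read off $\pi$. The slide does not say which solver is used, so treat the following as an assumed sketch; the callable signatures are hypothetical.

```python
from typing import Callable, Dict, Hashable, List, Tuple

State = Hashable
Action = Hashable


def smdp_value_iteration(
    states: List[State],
    actions: List[Action],
    R: Callable[[State, Action], float],              # R(s, a)
    P: Callable[[State, State, Action], float],       # P(s' | s, a), called as P(s', s, a)
    gamma: Callable[[State, Action, State], float],   # state-action dependent discount
    tol: float = 1e-9,
    max_iter: int = 10_000,
) -> Tuple[Dict[State, float], Dict[State, Action]]:
    V = {s: 0.0 for s in states}
    Q: Dict[Tuple[State, Action], float] = {}
    for _ in range(max_iter):
        # Q(s, a) = R(s, a) + sum_{s'} P(s' | s, a) * gamma(s, a, s') * V(s')
        Q = {
            (s, a): R(s, a) + sum(P(s2, s, a) * gamma(s, a, s2) * V[s2] for s2 in states)
            for s in states
            for a in actions
        }
        # V(s) = max_a Q(s, a)
        new_V = {s: max(Q[(s, a)] for a in actions) for s in states}
        delta = max(abs(new_V[s] - V[s]) for s in states)
        V = new_V
        if delta < tol:
            break
    # pi(s) = argmax_a Q(s, a), using the Q values of the last sweep
    policy = {s: max(actions, key=lambda a: Q[(s, a)]) for s in states}
    return V, policy
```

For example, with two states where `go` moves from `s0` to an absorbing `s1` with reward 1 and a constant discount of 0.9, the sketch returns a value of 1 for `s0` and chooses `go` there.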
Strategy layer
• The strategy layer requests symbolic actions.
• Executing these actions is up to the skill layer.
Skill layer
Tasks
1. Discover and learn a set of skills that are useful to the strategy layer → ground the symbols in $A$
2. Execute them when requested and optimize them at runtime
Skill
• A skill is a tuple of error functions $s = (f_e^1, \ldots, f_e^N)$, where
• an error function $f_e: I_a \times I_a \to \mathbb{R}^+$ assigns an error value to a pair of perceptions $(I(t_i), I(t_j))$.
Example: “approach the ball and orient towards it”
$f_e^1(I(t_i), I(t_j)) = d_{ball}(I(t_j))$ → minimize the ball distance
$f_e^2(I(t_i), I(t_j)) = |\alpha_{ball}(I(t_j))|$ → minimize the ball angle
$s = (f_e^1, f_e^2)$ → approach the ball and orient towards it
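The example skill can be written down directly as a tuple of error functions. This sketch assumes a perception is a dictionary of named features (`ball_dist` and `ball_angle` are hypothetical names); each function returns the error for the pair $(I(t_i), I(t_j))$.

```python
from typing import Callable, Dict, List, Sequence

Perception = Dict[str, float]  # hypothetical: feature name -> value
ErrorFunction = Callable[[Perception, Perception], float]


def f_e1(I_ti: Perception, I_tj: Perception) -> float:
    """Minimize the ball distance: the error is the current distance d_ball(I(t_j))."""
    return I_tj["ball_dist"]


def f_e2(I_ti: Perception, I_tj: Perception) -> float:
    """Minimize the ball angle: the error is |alpha_ball(I(t_j))|."""
    return abs(I_tj["ball_angle"])


# s = (f_e1, f_e2): "approach the ball and orient towards it"
approach_and_orient: Sequence[ErrorFunction] = (f_e1, f_e2)


def skill_errors(skill: Sequence[ErrorFunction], I_ti: Perception, I_tj: Perception) -> List[float]:
    """Evaluate every error function of a skill on the perception pair (I(t_i), I(t_j))."""
    return [f_e(I_ti, I_tj) for f_e in skill]
```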
Skill layer: Measuring a skill’s progress
• The progress function $f_p: I_a \times I_a \to [0, 1]$ measures a skill’s progress.
• For a skill $s = (f_e^1, \ldots, f_e^N)$ it is defined as
$$f_p(I(t_i), I(t_j)) = \begin{cases} 0 & \text{if } C_a \le W(I(t_i), I(t_j)) \\ \dfrac{C_a - W(I(t_i), I(t_j))}{C_a - C_s} & \text{if } C_s < W(I(t_i), I(t_j)) < C_a \\ 1 & \text{if } W(I(t_i), I(t_j)) \le C_s \end{cases}$$
with $f_e^k$: error function, $I(t_i)$: perception when the skill was started, $I(t_j)$: current perception, and success and abort thresholds $C_s \in \mathbb{R}^+$ and $C_a \in \mathbb{R}^+$ ($C_s < C_a$)
• $W(I(t_i), I(t_j)) = \sum_{k=1}^{N} f_e^k(I(t_i), I(t_j))$
• Example graph: $C_s = 0.15$, $C_a = 0.75$
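The piecewise definition of $f_p$ translates into a few comparisons. In this sketch the caller passes the already evaluated error values $f_e^k(I(t_i), I(t_j))$; the function name is hypothetical.

```python
from typing import Sequence


def progress(errors: Sequence[float], C_s: float, C_a: float) -> float:
    """Progress f_p in [0, 1] of a skill.

    `errors` holds the values f_e^k(I(t_i), I(t_j)) of the skill's N error
    functions; C_s < C_a are the success and abort thresholds."""
    if not C_s < C_a:
        raise ValueError("the success threshold C_s must be below the abort threshold C_a")
    W = sum(errors)          # W(I(t_i), I(t_j)) = sum_k f_e^k(I(t_i), I(t_j))
    if W >= C_a:             # abort region: no progress
        return 0.0
    if W <= C_s:             # success region: skill completed
        return 1.0
    return (C_a - W) / (C_a - C_s)  # linear interpolation in between
```

With the example thresholds $C_s = 0.15$ and $C_a = 0.75$, a summed error of $W = 0.45$ lies halfway between them and yields a progress of 0.5 (up to floating-point rounding).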
Imitation: Overview of the approach
• Robots observe each other continuously.
• A moving window of observations and well-being states is kept for each observed robot.
• The imitation process starts when an improvement in well-being is detected.
[Figure: imitation pipeline]
observed episode $\langle (o^I_1, e^I_1), \ldots, (o^I_N, e^I_N) \rangle$
→ transform observations →
subjective observation data $\langle (o^D_1, e_1), \ldots, (o^D_N, e_N) \rangle$
→ interpret behavior →
recognized episodes $\langle \ldots, ((t, o^D, s), a_t, (t', o'^D, s')), \ldots \rangle$
→ estimate rewards →
observed interpreted experience $\langle \ldots, ((t, o^D, s), a_t, r_t, (t', o'^D, s')), \ldots \rangle$
→ integrate into experience, update SMDP
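The observation window and the trigger can be sketched as follows. The slide does not specify how an improvement is detected, so this sketch assumes a scalar well-being value where larger is better, and hypothetical `window_size` and `threshold` parameters: imitation starts when the current well-being exceeds the lowest value in the window by more than the threshold.

```python
from collections import deque
from typing import Any, Deque, List, Optional, Tuple


class ObservedRobot:
    """Moving window of (observation, well-being) pairs for one observed robot."""

    def __init__(self, window_size: int = 50, threshold: float = 0.2) -> None:
        # deque with maxlen drops the oldest entry automatically
        self.window: Deque[Tuple[Any, float]] = deque(maxlen=window_size)
        self.threshold = threshold

    def observe(self, observation: Any, well_being: float) -> Optional[List[Tuple[Any, float]]]:
        """Record an observation; return the observed episode if imitation should start."""
        self.window.append((observation, well_being))
        lowest = min(wb for _, wb in self.window)
        if well_being - lowest > self.threshold:
            episode = list(self.window)
            self.window.clear()  # start a fresh window after triggering
            return episode
        return None
```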
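Imitation: HMM and the Viterbi connection [Viterbi, 1967]
[Figure: HMM with hidden states $s_a$, $s_b$, $s_c$ and observations $o_x$, $o_y$, $o_z$; transition probabilities $P(s_b \mid s_a)$, $P(s_c \mid s_a)$; emission probabilities $P(o_x \mid s_a)$, $P(o_y \mid s_a)$, $P(o_z \mid s_a)$]
$o_1 o_2 \ldots o_T \to \text{Viterbi} \to s_1 s_2 \ldots s_T$
$$V(s, t) = P(o_t \mid s_t = s) \max_{s'} \left[ P(s_t = s \mid s_{t-1} = s')\, V(s', t-1) \right]$$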
Imitation: Interpreting observed behavior with the imitator’s own knowledge
Knowledge in strategy layer
[Figure: from state $s_0$, each action $a_0$, $a_1$, $a_2$ leads to $s_1$ or $s_2$ with transition probabilities $T(s_0, a_i, s_1)$ and $T(s_0, a_i, s_2)$]
• Imitator’s own transition probabilities instead of “foreign” HMM transition probabilities
Knowledge in skill layer
[Figure: each skill $a_i$ is associated with a perception change $\Delta o_i$ over the features (ball dist, goal dist, ball height): $a_0$ approach ball, $\Delta o_0 = (-1, -0.4, 0)$; $a_1$ approach goal, $\Delta o_1 = (-0.2, -1, 0)$; $a_2$ lift ball, $\Delta o_2 = (0, 0, 0.3)$; an observed change is rated by $P(\Delta o_2 \mid a_0)$, $P(\Delta o_2 \mid a_1)$, $P(\Delta o_2 \mid a_2)$]
• Skills vote on perceptual changes → $f_p^a$, plus the following heuristics ...
Recognition
1. Recognize observation changes $o_{t-1} \to o_t$
   a) Prefer nearer goals
   [Figure: ambiguous situation: the robot might drive either to the red or to the yellow goal base]
   b) Ignore skills that “seem to have finished”
   c) Clip votes to $[0, 1]$
$$P_a(o_t \mid o_{t-1}) = \begin{cases} \min\left(\max\left(\dfrac{f_p^a(o_t) - f_p^a(o_{t-1})}{1 - f_p^a(o_t)}, 0\right), 1\right) & \text{if } 1 - f_p^a(o_t) > \epsilon \\ 0 & \text{otherwise} \end{cases}$$
2. Recognize actions in a sequence $o_{t_1}^{t_2} = o_{t_1} o_{t_1+\Delta} \ldots o_{t_2}$:
$$a_{ml} = \arg\max_a \frac{\sum_{t=t_1}^{t_2} P_a(o_t \mid o_{t-1})}{t_2 - t_1}$$
3. Recognize state transitions:
$$P(s_{t_2} \mid s_{t_1}) = T(s_{t_1}, a_{ml}, s_{t_2})$$
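Steps 1 and 2 can be sketched as follows, assuming each skill exposes its progress function $f_p^a$ as a callable on single observations and that observations are sampled at equal intervals. Averaging over the number of observation pairs replaces the division by $t_2 - t_1$; since that denominator is the same for every action, the $\arg\max$ is unaffected. Heuristic (a) is not modeled, and the default ε is a hypothetical value. Step 3 is then a lookup of $T(s_{t_1}, a_{ml}, s_{t_2})$ in the imitator's own model.

```python
from typing import Callable, Dict, Sequence

Observation = Dict[str, float]
ProgressFn = Callable[[Observation], float]  # f_p^a(o) in [0, 1] for one skill a


def vote(f_p: ProgressFn, o_prev: Observation, o_t: Observation, eps: float = 1e-3) -> float:
    """P_a(o_t | o_{t-1}): clipped progress of skill a, relative to the progress still remaining."""
    remaining = 1.0 - f_p(o_t)
    if remaining <= eps:  # (b) the skill seems to have finished: ignore it
        return 0.0
    raw = (f_p(o_t) - f_p(o_prev)) / remaining
    return min(max(raw, 0.0), 1.0)  # (c) clip the vote to [0, 1]


def most_likely_action(skills: Dict[str, ProgressFn], obs: Sequence[Observation]) -> str:
    """a_ml: the skill with the highest average vote over o_{t1}, o_{t1+Delta}, ..., o_{t2}."""
    pairs = len(obs) - 1
    if pairs < 1:
        raise ValueError("need at least two observations")

    def average_vote(name: str) -> float:
        f_p = skills[name]
        return sum(vote(f_p, obs[i - 1], obs[i]) for i in range(1, len(obs))) / pairs

    return max(skills, key=average_vote)
```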
Evaluation: Recognition scenario (description)
• The demonstrator (right robot) has to transport the yellow ball onto the base.
• The imitator (left robot) tries to “understand” its observations.
• Two scenarios:
  1. The imitator is only able to drive (and thereby push the ball).
  2. The imitator is also able to lift the ball.
[Figure: lifting scenario]
Evaluation: Recognition scenario (results)
1. Without lifting capabilities
[Plot: distance [m]; segments labeled “move to ball”, “???”, “move to goal”]
• Recognized “drive to ball” (B) and “drive to goal” (G) correctly
• Detected “missing behavior” in between
2. With lifting capabilities
[Plot: distance [m]; segments labeled “move to ball”, “lift the ball”, “move to goal”]
• Recognized “drive to ball” (B), “lift the ball” (L), and “drive to goal” (G) correctly
Evaluation: Multi-robot scenario “three bases”
• Task: transport objects to goal bases
• Reward for reaching an object: 10
• Goal bases provide different rewards
• The state space consists of
  ◦ the distance to the closest object,
  ◦ the distance of the closest object to the closest goal, and
  ◦ the ID of the closest goal.
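The three state variables can be computed from object and goal positions. This sketch assumes 2-D positions, Euclidean distances, and that “closest goal” means the goal closest to the closest object; the function name and goal IDs are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float]


def three_bases_state(robot: Point, objects: Sequence[Point], goals: Dict[str, Point]) -> Tuple[float, float, str]:
    """(distance to closest object, distance of that object to its closest goal, ID of that goal)."""
    closest_obj = min(objects, key=lambda o: math.dist(robot, o))
    goal_id = min(goals, key=lambda g: math.dist(closest_obj, goals[g]))
    return (
        math.dist(robot, closest_obj),
        math.dist(closest_obj, goals[goal_id]),
        goal_id,
    )
```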
Conclusion: Objectives achieved in this thesis
1. Combination of learning and imitation
2. Non-obtrusive recognition and learning of observed behavior
3. Support for heterogeneous robot groups
Thank you for your attention!
• Architecture: State of the art; Overview; Layer interaction
• Motivation layer: Excitation; Prioritizing goals
• Strategy layer: State abstraction; Heuristics; Policy; Sample frequency; Strategy example
• Skill layer: Overview of the approach; Explore, exploit; Skill manager; Model manager; Error minimizer; Configuration; Skill example
• Imitation in robot groups: Overview of the approach
  ◦ Recognizing behavior: Viterbi; Interpreting observed behavior; Recognition example
  ◦ Integrating recognized behavior
  ◦ Evaluation: CTF with three bases (Performance, State abstraction, Group homogeneity); CTF with five bases (Performance, State abstraction, Group homogeneity)
  ◦ Choice of the imitatee: Affordance detection; Affordance network generation; Comparing ANs; Choice of the imitatee; Evaluation (Parameterization of the environment, Robustness experiment, Clustering experiment)
State of the art
[Takahashi et al., 2008] use imitation to learn robotic soccer behaviors (approaching, shooting a ball)
  ◦ combines learning with imitation
  ◦ requires the robot group to stop whenever a robot imitates
  ◦ needs multiple presentations of the same behavior
  ◦ needs sufficient prior knowledge of the task to imitate
[Priesterjahn, 2008] evolves game bots whose performance is similar to that of the human player
  ◦ shows that imitation-based adaptation can outperform the purely evolutionary approach
  ◦ targeted at computer game scenarios, not stochastic real-world applications
  ◦ assumes group homogeneity
  [Figure: The Rule-Based Operation Cycle of an Agent]
[Inamura et al., 2003] combine top-down teaching with bottom-up learning from the robot’s side
  ◦ exclusive approach (cannot be combined with other learning techniques)
  ◦ the HMM is learned and then fixed throughout the robot’s lifetime
  [Figures: Motion capturing system: motion for learning data; A result of motion generation on a humanoid robot]
Layer interaction
① Strategy step is triggered
• Determine the current motivation and the corresponding next strategy action.
• The strategy layer requires the most current motivation as feedback regarding its last chosen action → both are synchronous.
② Skill step is triggered
• The strategy step does not have to be finished yet.
• The skill layer simply executes the action most recently delivered by the strategy layer.
③, ④ Strategy step has finished
• It signals the next action to execute to the skill layer.
• Subsequent skill steps then perform this action accordingly.
[Sequence diagram: lifelines clock, motivation layer, strategy layer, skill layer, perception, action]
① next strategy step event → request $I_m$ → processed perception → set next motivation → request $I_s$ → processed perception → determine next strategy step → set next skill (strategy step)
② next skill step event → request $I_a$ → processed perception → calculate best actuator command → set next low-level action (skill step)
③ next skill step event → request $I_a$ → processed perception → calculate best actuator command → set next low-level action (skill step)
④ next skill step event
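The timing can be illustrated with a discrete clock: strategy steps start periodically and deliver their action only after some latency, while a skill step runs on every tick and always executes the most recently delivered action. The period, the latency, and the `choose_action` callback are assumptions; the sketch omits the motivation and perception requests.

```python
from typing import Callable, List, Optional, Tuple


def run_layers(
    ticks: int,
    strategy_period: int,
    strategy_latency: int,
    choose_action: Callable[[int], str],
) -> List[Optional[str]]:
    """Simulate asynchronous strategy and skill steps on a common clock."""
    current_action: Optional[str] = None
    pending: List[Tuple[int, str]] = []  # (delivery tick, action) of unfinished strategy steps
    executed: List[Optional[str]] = []
    for t in range(ticks):
        if t % strategy_period == 0:
            # strategy step triggered: its result is available only after the latency
            pending.append((t + strategy_latency, choose_action(t)))
        # strategy steps that have finished signal their action to the skill layer
        for item in [p for p in pending if p[0] <= t]:
            current_action = item[1]
            pending.remove(item)
        # skill step: execute the action most recently delivered (None before the first one)
        executed.append(current_action)
    return executed
```

For instance, `run_layers(6, 3, 2, lambda t: f"a{t}")` executes nothing for the first two ticks, then `a0` until the second strategy step delivers `a3` at tick 5.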
Motivation layer: Motivation system example
• The current motivation $\mu$ is the vector to the current drive state; it depends on time and perception.
• Each drive measures the status of accomplishing a sub-goal (0 = fully accomplished).
• A drive $i$ is called satisfied (goal achieved) if the corresponding motivation is below its threshold: $\mu_i < \mu_i^\theta$
[Figure: drive space spanned by drive 1, drive 2, and drive 3, showing the well-being region, the current drive state, the current motivation, and the shortest vector to the desired drive area ($\mu_p$), used for prioritization]
A sub-goal subjected to an excitation
[Figure: drive state over time $t$ on a scale from 0 to 1, with the well-being region, an excitation, and the threshold triggering behavior]
• Excitation describes the force to which the current drive state is subjected.
• By specifying the excitation as a function of the perception and of the robot’s internal state, the user “programs” the final behavior.
Prioritizing goals
• At each time step, the motivation layer provides the current motivation vector to the strategy layer.
• Using $\mu_p$, the strategy layer prioritizes which of the sub-goals are to be handled first:
$$\mu_p = \begin{pmatrix} \max(0, \mu_1 - \mu_1^\theta) \\ \max(0, \mu_2 - \mu_2^\theta) \\ \vdots \\ \max(0, \mu_n - \mu_n^\theta) \end{pmatrix}$$
• Different drives can be prioritized by an appropriate scaling → modeling a hierarchy of needs
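The satisfaction test and $\mu_p$ are element-wise operations. In this sketch the optional per-drive weights stand in for the “appropriate scaling” mentioned above; their values, and picking the largest entry as the sub-goal to handle first, are assumptions.

```python
from typing import List, Optional, Sequence


def satisfied(mu: Sequence[float], mu_theta: Sequence[float]) -> List[bool]:
    """Drive i is satisfied if mu_i < mu_i^theta."""
    return [m < th for m, th in zip(mu, mu_theta)]


def prioritize(
    mu: Sequence[float], mu_theta: Sequence[float], scale: Optional[Sequence[float]] = None
) -> List[float]:
    """mu_p with entries max(0, mu_i - mu_i^theta), optionally weighted per drive."""
    if len(mu) != len(mu_theta):
        raise ValueError("mu and mu_theta must have the same length")
    weights = [1.0] * len(mu) if scale is None else list(scale)
    return [w * max(0.0, m - th) for m, th, w in zip(mu, mu_theta, weights)]


def most_urgent(
    mu: Sequence[float], mu_theta: Sequence[float], scale: Optional[Sequence[float]] = None
) -> int:
    """Index of the sub-goal with the largest (scaled) entry in mu_p."""
    mu_p = prioritize(mu, mu_theta, scale)
    return max(range(len(mu_p)), key=mu_p.__getitem__)
```

Raising the weight of a drive lets it win over drives with a larger raw deviation, which is one way to express a hierarchy of needs.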
Strategy layer: Sample frequency
A new interaction is triggered if one of the following conditions holds:
• Sufficiently different perception (measured by some scenario-specific distance metric $d$): $d(o_{t_1}, o_{t_2}) > \theta_o$
• Sufficiently interesting motivation change: $|\mu_{t_2} - \mu_{t_1}| > \theta_r$
• Enough time has passed: $t_2 - t_1 > \theta_t$
$\theta_o$, $\theta_r$, and $\theta_t$ are application-specific and have to be determined empirically.
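The three trigger conditions combine with a logical OR. This sketch assumes $\mu$ is a vector and uses the Euclidean norm for $|\mu_{t_2} - \mu_{t_1}|$; the distance metric $d$ and the thresholds are passed in because they are application-specific.

```python
import math
from typing import Any, Callable, Sequence


def should_interact(
    o1: Any, o2: Any,
    mu1: Sequence[float], mu2: Sequence[float],
    t1: float, t2: float,
    d: Callable[[Any, Any], float],
    theta_o: float, theta_r: float, theta_t: float,
) -> bool:
    """True if at least one of the three conditions for a new interaction holds."""
    return (
        d(o1, o2) > theta_o                # sufficiently different perception
        or math.dist(mu1, mu2) > theta_r   # sufficiently interesting motivation change
        or t2 - t1 > theta_t               # enough time has passed
    )
```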
Strategy example
[Figure: grid-world strategy example with start (S) and goal (G) cells; regions labeled by grid coordinates such as (1, 1), (4, 2), and (6, 6)]
Skill layer
1. Discover and learn a set of skills that are useful to the strategy layer → ground the symbols in $A$
2. Execute them when requested and optimize them at runtime
[Figure, exploration mode: strategy layer and skill layer (skill manager, skills, model manager, error minimizer); labeled flows: training mode, notify new skill, create & fetch skills, create models, perception $I_a$, actuator output $O$ (explore actions)]
[Figure, exploitation mode: strategy layer and skill layer (skill manager, skills, model manager, error minimizer); labeled flows: execution mode, request skill, set current skill, fetch current skill, update models, perception $I_a$, actuator output $O$]
Skill layer: Data flow in exploration mode
[Figure: strategy layer and skill layer (skill manager, skills, model manager, error minimizer); labeled flows: training mode, notify new skill, create & fetch skills, create models, perception $I_a$, actuator output $O$ (explore actions)]
Skill layer: Data flow in exploitation mode
[Figure: strategy layer and skill layer (skill manager, skills, model manager, error minimizer); labeled flows: execution mode, request skill, set current skill, fetch current skill, update models, perception $I_a$, actuator output $O$]
Skill definition
• Extraction function $f_{ext}: I_a \to \mathbb{R}$ extracts information from a perception $I(t) \in I_a$
• Control function $f_c: \mathbb{R} \times \mathbb{R} \to \mathbb{R}^+$ associates an error value with the tuple $(v_{t_i}, v_{t_j})$:
  ◦ decrease: $f_c(v_{t_i}, v_{t_j}) = |v_{t_j}|$
  ◦ increase: $f_c(v_{t_i}, v_{t_j}) = \dfrac{1}{|v_{t_j}|}$
  ◦ keep value: $f_c(v_{t_i}, v_{t_j}) = |v_{t_i} + \delta - v_{t_j}|$
• Error function $f_e: I_a \times I_a \to \mathbb{R}^+$ assigns an error value to a perception pair
• Progress function $f_p: I_a \times I_a \to [0, 1]$ measures a skill’s progress between two time points
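Reading the slide as a composition, an error function applies one extraction function to both perceptions and feeds the two values to a control function. That composition, the dictionary-based perception, and the small constant guarding the division in `increase` are assumptions for illustration; “keep value” is left out because its exact form is uncertain.

```python
from typing import Callable, Dict

Perception = Dict[str, float]                # hypothetical: feature name -> value
Extraction = Callable[[Perception], float]   # f_ext: I_a -> R
Control = Callable[[float, float], float]    # f_c: R x R -> R+
ErrorFunction = Callable[[Perception, Perception], float]


def extract(feature: str) -> Extraction:
    """Hypothetical extraction function: read one named feature from a perception."""
    return lambda I: I[feature]


def decrease(v_ti: float, v_tj: float) -> float:
    """Control function 'decrease': error |v_tj|."""
    return abs(v_tj)


def increase(v_ti: float, v_tj: float, tiny: float = 1e-9) -> float:
    """Control function 'increase': error 1/|v_tj| (`tiny` guards against division by zero)."""
    return 1.0 / max(abs(v_tj), tiny)


def make_error_function(f_ext: Extraction, f_c: Control) -> ErrorFunction:
    """Assumed composition: f_e(I(t_i), I(t_j)) = f_c(f_ext(I(t_i)), f_ext(I(t_j)))."""
    return lambda I_ti, I_tj: f_c(f_ext(I_ti), f_ext(I_tj))
```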
Skill manager
• Exploration phase
  ◦ generate skills that enable the robot to control the perceived properties
  ◦ assign a priority to each skill dependent on its execution priority
  ◦ determine the skills the robot can reliably perform and report them to the strategy layer as new skills
[Figure: exploration-mode data flow (training mode: notify new skill, create & fetch skills, create models, explore actions)]
• Exploitation phase
  ◦ manage the execution of requested skills
[Figure: exploitation-mode data flow (execution mode: request skill, set current skill, fetch current skill, update models)]
Model manager
• Creating prediction models for each perceived property
  ◦ A prediction model is the tuple $(id_p, S, M, m)$:
    $id_p \in ID_p$: perception feature to be predicted
    $S \subseteq ID_o \times ID_p$: subset of the perceptual features
    $M \subseteq O$: subset of the actuators to control
    $m: \mathbb{R}^{|S| + |M|} \to \mathbb{R}$ predicts the value of the perceptual feature $id_p$ in the next input perception, given the values of $S$ and $M$
  ◦ $m$ in the experiments: polynomial (Poly) and radial basis function (RBF) regression
• Updating prediction models to reflect new experiences
• Scoring each model according to its prediction accuracy:
$$\mathrm{score}(m) = \frac{n}{\sum_{i=k}^{k+n} \left( m(S(t_i), M(t_i)) - v_{t_{i+1}} \right)^2}$$
[Figures: exploration-mode and exploitation-mode data flow]
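The scoring rule and the selection of the best model used by the error minimizer ($\arg\max_m \mathrm{score}(m)$) can be sketched as follows. This sketch takes $n$ to be the number of samples in the scored window (the exact index range on the slide is uncertain) and returns infinity for a model without error; the trace format is hypothetical.

```python
from typing import Callable, Sequence, Tuple

Predictor = Callable[[Sequence[float], Sequence[float]], float]  # m(S(t_i), M(t_i))
Sample = Tuple[Sequence[float], Sequence[float], float]          # (S(t_i), M(t_i), v_{t_{i+1}})


def score(m: Predictor, trace: Sequence[Sample]) -> float:
    """score(m) = n / sum of squared one-step prediction errors over the n samples in `trace`."""
    n = len(trace)
    sse = sum((m(S, M) - v_next) ** 2 for S, M, v_next in trace)
    return float("inf") if sse == 0 else n / sse


def best_model(models: Sequence[Predictor], trace: Sequence[Sample]) -> Predictor:
    """m_best = argmax_m score(m)."""
    return max(models, key=lambda m: score(m, trace))
```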
Error minimizer
1. $I_c(t)$: only those perceptual features on which the error functions of the current skill $s$ depend, at the current time $t$
2. Estimate the next perception $I_c(t+1)$* as a function of the motor action $M$, as predicted by $m^j_{best} = \arg\max_m \mathrm{score}(m)$:
$$I_c^M(t+1) = \{\, m^j_{best}(I_c(t), M(t)) \mid p_j \in I_c(t) \,\}$$
3. For each error function $f_e^k$, calculate the expected next error $e_k^M(t+1)$, with $I_c(t_i)$ being the perception when the skill was started:
$$e_k^M(t+1) = f_e^k(I_c(t_i), I_c^M(t+1))$$
4. Determine the best actuator command $M(t)$ by finding the one that minimizes the accumulated expected error:
$$M_{next}(t) = \arg\min_M \sum_{k=1}^{N} e_k^M(t+1)$$
*$t+1$ is the time point of the next interaction after time $t$.
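Step 4 requires a search over actuator commands. The slide does not say how the minimum is found, so this sketch assumes a finite set of candidate commands and a `predict` callable standing in for the best-scoring prediction models; both are hypothetical.

```python
from typing import Callable, Dict, Sequence, Tuple

Perception = Dict[str, float]
Command = Tuple[float, ...]
ErrorFunction = Callable[[Perception, Perception], float]


def next_command(
    I_start: Perception,                                   # I_c(t_i): perception when the skill started
    I_now: Perception,                                     # I_c(t): current relevant perception
    candidates: Sequence[Command],                         # hypothetical finite set of commands M
    predict: Callable[[Perception, Command], Perception],  # stands in for the m^j_best models
    error_functions: Sequence[ErrorFunction],              # f_e^1 ... f_e^N of the current skill
) -> Command:
    """M_next(t) = argmin_M sum_k f_e^k(I_c(t_i), I_c^M(t+1))."""

    def expected_error(M: Command) -> float:
        I_pred = predict(I_now, M)  # I_c^M(t+1)
        return sum(f_e(I_start, I_pred) for f_e in error_functions)

    return min(candidates, key=expected_error)
```

A finer candidate grid or a continuous optimizer would trade computation for precision; the grid keeps the sketch dependency-free.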
Skill layer configuration
Greater universality leads to a larger exploration space. It is therefore wise to limit the exploration space by specifying non-changing parameters beforehand. This can be achieved by configuring the following parameters:
- Degrees of freedom specify the number of actuators the skill layer has to control.
- Extraction functions define the language that can be used to specify the error functions.
- Control functions specify the functions that the error minimizer will minimize by means of the error functions.
- Regression models are used by the model manager to build predictions for the environment interaction. A regression model consists of two methods: one that fits a model to an experience trace and one that predicts the value of the modeled property.
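A regression model, as described above, only needs a fit method and a predict method. The sketch below expresses that contract as a Python `Protocol` and fills it with an ordinary least-squares line; the class names, the one-dimensional input and the choice of a linear model (instead of the Poly and RBF models used in the experiments) are assumptions made for brevity.

```python
from typing import Protocol, Sequence, Tuple

# An experience trace: (input, observed output) pairs for the modeled property.
Trace = Sequence[Tuple[float, float]]


class RegressionModel(Protocol):
    """The two methods named above: fit to a trace, predict a value."""

    def fit(self, trace: Trace) -> None: ...

    def predict(self, x: float) -> float: ...


class LinearModel:
    """Ordinary least-squares line y = a + b*x; a stand-in for the Poly/RBF models."""

    def __init__(self) -> None:
        self.a = 0.0  # intercept
        self.b = 0.0  # slope

    def fit(self, trace: Trace) -> None:
        n = len(trace)
        mean_x = sum(x for x, _ in trace) / n
        mean_y = sum(y for _, y in trace) / n
        sxx = sum((x - mean_x) ** 2 for x, _ in trace)
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in trace)
        # Degenerate trace (all x equal): fall back to a constant prediction.
        self.b = sxy / sxx if sxx else 0.0
        self.a = mean_y - self.b * mean_x

    def predict(self, x: float) -> float:
        return self.a + self.b * x
```

Fitted on the trace `[(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]`, the model learns `a = 1.0`, `b = 2.0` and predicts `7.0` for `x = 3.0`. Keeping the interface this small is what lets the model manager swap regression techniques without touching the rest of the skill layer.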
Skill example: “Minimize angle to object”, learned with radial basis functions
[Figure: controlling speed dependent on angle and distance to the object]
[Figure: controlling rotational speed dependent on angle and distance to the object]
Imitation: Viterbi [Viterbi, 1967]
Problem description
- Given the observation sequence $o_1^N = \langle o_1, o_2, \ldots, o_N \rangle$ ($o_i \in \mathbb{R}^d$)
- Find the most likely hidden state sequence $s_1^N = \langle s_1, s_2, \ldots, s_N \rangle$ ($s_i \in S$)
Approach
- Maximize the probability $P(s_1^N \mid o_1^N)$:
$$s_1^{N*} = \arg\max_{s_1^N} P(s_1^N \mid o_1^N)$$
by recursively calculating the probability $V(s,t) = \max_{s_1^{t-1}} P(o_1^t, s_1 \ldots s_{t-1}, s_t = s)$ that $s \in S$ is the hidden state at time $t$, given the observations $o_1^t$:
- $V(s,1) = P(o_1 \mid s_1 = s)\, P(s_1 = s) \quad \forall s \in S$
- $V(s,t) = P(o_t \mid s_t = s) \max_{s'} \left( P(s_t = s \mid s_{t-1} = s')\, V(s', t-1) \right)$
- $\phi(s,t) = \arg\max_{s'} \left( P(s_t = s \mid s_{t-1} = s')\, V(s', t-1) \right)$
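The recursion for $V(s,t)$ and $\phi(s,t)$, followed by backtracking through $\phi$, is the textbook Viterbi algorithm. A minimal Python sketch for a finite state set is shown below; the emission probability is passed in as a function so that continuous observations $o_i \in \mathbb{R}^d$ could be plugged in, and all names are illustrative rather than taken from the thesis.

```python
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

State = Hashable


def viterbi(
    observations: Sequence,
    states: Sequence[State],
    start_p: Dict[State, float],
    trans_p: Dict[State, Dict[State, float]],
    emit_p: Callable[[object, State], float],
) -> Tuple[List[State], float]:
    """Most likely hidden state sequence for the observations, and its probability.

    start_p[s]         = P(s_1 = s)
    trans_p[s_prev][s] = P(s_t = s | s_{t-1} = s_prev)
    emit_p(o, s)       = P(o_t = o | s_t = s)
    """
    # V(s, 1) = P(o_1 | s_1 = s) P(s_1 = s)
    V = [{s: emit_p(observations[0], s) * start_p[s] for s in states}]
    phi: List[Dict[State, State]] = [{}]
    for t in range(1, len(observations)):
        V.append({})
        phi.append({})
        for s in states:
            # phi(s, t) = argmax_{s'} P(s_t = s | s_{t-1} = s') V(s', t-1)
            prev = max(states, key=lambda sp: trans_p[sp][s] * V[t - 1][sp])
            phi[t][s] = prev
            V[t][s] = emit_p(observations[t], s) * trans_p[prev][s] * V[t - 1][prev]
    # Backtrack from the best final state using the stored phi pointers.
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(phi[t][path[-1]])
    path.reverse()
    return path, V[-1][last]
```

On the common two-state toy model (`"Healthy"`/`"Fever"` with observations `"normal"`, `"cold"`, `"dizzy"`), the sketch returns the path `["Healthy", "Healthy", "Fever"]` with probability of about 0.01512.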
Imitation: Recognition
Problem description
- Given the observation sequence $o_1^N = \langle o_1, o_2, \ldots, o_N \rangle$ ($o_i \in \mathbb{R}^d$)
- Find the most likely behavior sequence ($t \in \mathbb{R}^+$, $o \in \mathbb{R}^d$, $s \in S$, $a \in A$)
$$\mathcal{B} = \left( \ldots, \left( (t_k, o_k, s_k), a_k, (t_{k+1}, o_{k+1}, s_{k+1}) \right), \ldots \right)$$
Approach
- Maximize the probability $P(s_1^n, a_1^{n-1} \mid o_1^N)$, $n \le N$
- Adapt $V(s,1)$ and $V(s,t)$:
  - Use own state and action space for $S$ and $A$
  - Support bootstrapping of probabilities
  - Let actions recognize themselves → technical realization of the mirror neuron system
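Recognition: determining $P(o_t \mid s_t = s)$
Viterbi: $V(s,t) = P(o_t \mid s_t = s) \max_{s'} \left( P(s_t = s \mid s_{t-1} = s')\, V(s', t-1) \right)$
- Every state gets a chance, depending on the distance of its observations:
$$P(o_t \mid s_t = s) = \frac{\sum_{o \in N_k,\ o \in s} \lVert o_t - o \rVert^{-2}}{\sum_{o \in N_k} \lVert o_t - o \rVert^{-2}}$$
where $N_k$ is the set of the $k$ stored observations nearest to $o_t$, and $o \in s$ selects those belonging to state $s$.
- Example: $P(o \mid s = s_2)$ with $k = 3$ [Figure: state observations ("raw state") and region ("abstract state")]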
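The nearest-neighbour observation model can be sketched as follows, assuming the robot keeps a memory of raw observations, each tagged with the abstract state it was assigned to. The function name, the memory layout and the small `eps` guard against zero distances are assumptions; the weighting uses inverse squared distances as in the formula for $P(o_t \mid s_t = s)$.

```python
import math
from typing import Hashable, List, Sequence, Tuple

# Memory of raw observations, each tagged with the abstract state it belongs to.
Memory = List[Tuple[Sequence[float], Hashable]]


def observation_likelihood(
    o_t: Sequence[float], memory: Memory, s: Hashable, k: int, eps: float = 1e-9
) -> float:
    """P(o_t | s_t = s): inverse-squared-distance share of state s among the k nearest neighbours."""
    neighbours = sorted(memory, key=lambda item: math.dist(o_t, item[0]))[:k]
    # eps guards against division by zero when o_t coincides with a stored observation.
    weights = [
        (state, 1.0 / max(math.dist(o_t, obs), eps) ** 2) for obs, state in neighbours
    ]
    total = sum(w for _, w in weights)
    return sum(w for state, w in weights if state == s) / total
```

With the memory `[((1.0,), "s1"), ((2.0,), "s2"), ((-1.0,), "s2"), ((5.0,), "s1")]`, the query `(0.0,)` and `k = 3`, state `"s2"` receives $(1 + 0.25) / 2.25 = 5/9$ of the weight and `"s1"` the remaining $4/9$, so every state among the neighbours keeps a non-zero chance.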
Recognition: determining $P(s_{t_2} \mid s_{t_1})$
Viterbi: $V(s,t) = P(o_t \mid s_t = s) \max_{s'} \left( P(s_t = s \mid s_{t-1} = s')\, V(s', t-1) \right)$
- Replace $P(s_{t_2} \mid s_{t_1})$ with $T(s_{t_1}, a_{ml}, s_{t_2})$, where
$$a_{ml} = \arg\max_a \frac{\sum_{t=t_1}^{t_2} P_a(o_t \mid o_{t-1})}{t_2 - t_1}$$
- $P_a(o_t \mid o_{t-1})$ is the vote of skill $a$, computed by means of its progress function (the change from $f^a_p(o_{t-1})$ to $f^a_p(o_t)$), plus the following heuristics ...
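Selecting $a_{ml}$ amounts to averaging each skill's votes over the interval and taking the skill with the highest average. The sketch below follows the formula literally (summing from $t_1$ to $t_2$ inclusive and dividing by $t_2 - t_1$); the `votes` layout and the skill names in the example are hypothetical.

```python
from typing import Dict, Hashable, Sequence


def most_likely_action(votes: Dict[Hashable, Sequence[float]], t1: int, t2: int) -> Hashable:
    """a_ml = argmax_a sum_{t=t1}^{t2} P_a(o_t | o_{t-1}) / (t2 - t1).

    votes[a][t] holds the vote P_a(o_t | o_{t-1}) of skill a at time step t.
    """
    if t2 <= t1:
        raise ValueError("need t2 > t1")

    def mean_vote(a: Hashable) -> float:
        return sum(votes[a][t] for t in range(t1, t2 + 1)) / (t2 - t1)

    return max(votes, key=mean_vote)
```

With votes `[0.0, 0.2, 0.4, 0.6]` for a hypothetical `"drive_to_red"` skill and `[0.0, 0.1, 0.1, 0.0]` for `"drive_to_yellow"` over $t_1 = 1$, $t_2 = 3$, the red skill wins with an average of 0.6 against 0.1.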
Recognition: determining $P_a(o_t \mid o_{t-1})$
Viterbi: $V(s,t) = P(o_t \mid s_t = s) \max_{s'} \left( P(s_t = s \mid s_{t-1} = s')\, V(s', t-1) \right)$
1. Prefer nearer goals
   [Figure: ambiguous situation: the robot might drive either to the red or to the yellow goal base]
2. Ignore skills that “seem to have finished”
3. Clip votes to $[0, 1]$
$$P_a(o_t \mid o_{t-1}) = \begin{cases} \min\left( \max\left( \frac{f^a_p(o_t) - f^a_p(o_{t-1})}{1 - f^a_p(o_t)},\ 0 \right),\ 1 \right), & 1 - f^a_p(o_t) > \epsilon \\ 0, & \text{otherwise} \end{cases}$$
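The clipped vote can be written as a small function of the two progress values. In this sketch the threshold `eps = 0.05` is a hypothetical default (no value is given for $\epsilon$), and progress values are assumed to lie in $[0, 1]$ with 1 meaning the skill's goal is reached. Dividing by the remaining progress $1 - f^a_p(o_t)$ is one way the "prefer nearer goals" heuristic shows up in the formula.

```python
def skill_vote(progress_prev: float, progress_now: float, eps: float = 0.05) -> float:
    """P_a(o_t | o_{t-1}) from the progress values f_p^a(o_{t-1}) and f_p^a(o_t)."""
    remaining = 1.0 - progress_now
    # Heuristic 2: a skill whose progress is (almost) complete gets no vote.
    if remaining <= eps:
        return 0.0
    # Heuristic 1: normalizing by the remaining progress favours nearer goals.
    raw = (progress_now - progress_prev) / remaining
    # Heuristic 3: clip the vote to [0, 1].
    return min(max(raw, 0.0), 1.0)
```

The same progress step of 0.1 thus yields a vote of 0.25 when it moves from 0.5 to 0.6, but only about 0.11 when it moves from 0.0 to 0.1; steps that move away from the goal are clipped to 0.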
Integrating recognized behavior
- Estimate missing information
  - Recognition output: $\mathcal{B} = \left( \ldots, \left( (t_k, o_k, s_k), a_k, (t_{k+1}, o_{k+1}, s_{k+1}) \right), \ldots \right)$
  - Needed for learning: $I^{t_{k+1}}_{t_k} = (o_{t_k}, a_{t_k}, d_{t_k}, \mu_{t_k}, f_{t_k}, o_{t_{k+1}})$
    - duration: $d_{t_k} = t_{k+1} - t_k$
    - motivation: $\mu^I_{t_k} = \mu^D_{t_k} \cdot \frac{\mu^I_{max} - \mu^I_{min}}{\mu^D_{max} - \mu^D_{min}}$
    - failure: $f_{t_k} = \mathrm{false}$
- Integrate recognized behavior into existing experience
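Turning the recognition output into learning experiences can be sketched as follows. Where the demonstrator's motivation values $\mu^D_{t_k}$ come from is not shown here, so they are passed in as a separate list; the `Experience` dataclass and all parameter names are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Hashable, List, Sequence, Tuple

# One recognized step: ((t_k, o_k, s_k), a_k, (t_{k+1}, o_{k+1}, s_{k+1})).
Step = Tuple[Tuple[float, Any, Hashable], Hashable, Tuple[float, Any, Hashable]]


@dataclass
class Experience:
    """I_{t_k}^{t_{k+1}} = (o, a, d, mu, f, o_next)."""
    obs: Any
    action: Hashable
    duration: float
    motivation: float
    failure: bool
    next_obs: Any


def to_experiences(
    recognized: Sequence[Step],
    mu_demo: Sequence[float],
    imitator_range: Tuple[float, float],
    demo_range: Tuple[float, float],
) -> List[Experience]:
    i_min, i_max = imitator_range
    d_min, d_max = demo_range
    scale = (i_max - i_min) / (d_max - d_min)  # (mu^I_max - mu^I_min) / (mu^D_max - mu^D_min)
    experiences = []
    for ((t_k, o_k, _), a_k, (t_next, o_next, _)), mu_d in zip(recognized, mu_demo):
        experiences.append(
            Experience(
                obs=o_k,
                action=a_k,
                duration=t_next - t_k,    # d_{t_k} = t_{k+1} - t_k
                motivation=mu_d * scale,  # mu^I_{t_k} = mu^D_{t_k} * scale
                failure=False,            # recognized behavior is stored as not failed
                next_obs=o_next,
            )
        )
    return experiences
```

A recognized step from $t = 0.0$ to $t = 2.5$ with demonstrator motivation 10 on a $[0, 100]$ scale thus becomes an experience with duration 2.5 and imitator motivation 0.1 on a $[0, 1]$ scale.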
Evaluation: multi-robot scenario “three bases”: results
- The time to reach the goal decreased drastically in the beginning
- Imitation slightly above no-imitation in the long run
- Reason: the robots have learned to prefer the black base: higher reward, but a longer way
- More realistic measure: reward/time
- Result: imitation increases learning speed by up to 50%
Evaluation: multi-robot scenario “three bases”: abstraction
- Experience heuristic cuts off at 2000 interactions
- Imitation starts with less experience:
  - Reaching the goals faster → less experience
  - Learning faster to drive to the black base → more experience
- Fewer than 6 abstract states with no-imitation
- Fewer than 10 abstract states with imitation
Evaluation: multi-robot scenario “three bases”: group homogeneity
- Percentage of goal bases
[Plot: goal base [%] (yellow, red, black) over the number of episodes, no imitation vs. imitation]
- Group homogeneity $G(X)$ measured by the normalized Shannon entropy $H(X)/H_{max}$, with $H_{max} = \log |X|$:
  - $H(X) = -\sum_{x \in X} p(x) \log p(x)$
  - $G(X) = \frac{H_{max} - H(X)}{H_{max}}$
[Plot: group homogeneity over the number of episodes, imitation vs. no-imitation; 1.0 = all robots prefer the same goal]
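The homogeneity measure can be computed directly from the goal base each robot currently prefers. The sketch assumes $|X|$ is the number of possible goal bases (passed as `num_goals`) and, as an added convention, returns 1.0 when only one goal exists, since $H_{max} = 0$ would otherwise make the ratio undefined.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def group_homogeneity(preferences: Sequence[Hashable], num_goals: int) -> float:
    """G(X) = (H_max - H(X)) / H_max with H_max = log |X|.

    preferences: the goal base each robot prefers; num_goals: |X|.
    """
    if num_goals < 2:
        return 1.0  # added convention: a single possible goal is trivially homogeneous
    n = len(preferences)
    counts = Counter(preferences)
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    h_max = math.log(num_goals)
    return (h_max - entropy) / h_max
```

Four robots that all prefer the black base give $G = 1.0$; three robots spread evenly over three bases give $G = 0$.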
Evaluation: multi-robot scenario “five bases”: description
- Robots have to transport objects to goal bases
- No reward for reaching an object
- Goal bases provide different rewards:
  - black: 10,000 points
  - blue, green, red, yellow: 20 points
- The state space consists of:
  - distance to the closest object
  - distance of the closest object to the closest goal
  - ID of the closest goal
Evaluation: multi-robot scenario “five bases”: results
- Time to reach the goal: lower for imitation in the beginning, higher in the long run
- Imitation above no-imitation in the long run
- Reason: a bottleneck ("needle eye") between the green and blue bases
- More realistic measure: reward/time
- Result: imitation increases learning speed in the long run by up to 100%
Evaluation: multi-robot scenario “five bases”: abstraction
- Experience heuristic cuts off at 2000 interactions
- Fewer than 8 abstract states with no-imitation
- Fewer than 10 abstract states with imitation
Evaluation: multi-robot scenario “five bases”: group homogeneity
- Percentage of goal bases
[Plot: percentage of goal bases, no imitation vs. imitation]
- Group homogeneity $G(X)$ measured by the normalized Shannon entropy $H(X)/H_{max}$, with $H_{max} = \log |X|$:
  - $H(X) = -\sum_{x \in X} p(x) \log p(x)$
  - $G(X) = \frac{H_{max} - H(X)}{H_{max}}$
[Plot: group homogeneity; 1.0 = all robots prefer the same goal]
Choice of the imitatee
Task
- Find the best imitatee prior to the imitation process itself
[Figure: motivation layer, strategy layer and skill layer; labels: current motivation, request result, perception, action, choice of the imitatee, imitation]
Approach
1. Observe behavior capabilities by means of affordances
2. Encode recognized affordances stochastically
3. Compare representation differences
[Figure: raw perception I → 1. affordance detection → accumulated affordances T → 2. affordance network generation → affordance networks AN_1, ..., AN_n → 3. choice of the imitatee → imitate or don't imitate (a further exit is labeled “quit”)]
$$R_{imitate} = \arg\min_{R_i \in R,\ R_i \neq R_m} D_{AN}(AN_i, AN_m)$$
Affordance detection: affordance testing conditions modeled as FSAs
(a) seizable: drive to object → align to object → seize object → finish
(b) liftable: drive to object → align to object → seize object → lift object → finish
(c) pushable: drive to object → align to object → drive forward → finish
(d) pullable: drive to object → align to object → seize object → drive backward → finish
Transitions between the steps are labeled ok or fail.
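Read as step sequences, the four FSAs can be sketched as a table of behaviors that must all succeed. This assumes that every "fail" transition ends the test with a negative result, which the diagrams suggest but do not spell out; the step names and the `execute` callback (standing in for running a skill on the robot) are hypothetical.

```python
from typing import Callable, Dict, List

# Step sequences read off the four FSAs; "ok" advances to the next step.
AFFORDANCE_TESTS: Dict[str, List[str]] = {
    "seizable": ["drive_to_object", "align_to_object", "seize_object"],
    "liftable": ["drive_to_object", "align_to_object", "seize_object", "lift_object"],
    "pushable": ["drive_to_object", "align_to_object", "drive_forward"],
    "pullable": ["drive_to_object", "align_to_object", "seize_object", "drive_backward"],
}


def check_affordance(name: str, execute: Callable[[str], bool]) -> bool:
    """Run the FSA for one affordance; True if it reaches 'finish'."""
    for step in AFFORDANCE_TESTS[name]:
        if not execute(step):  # assumed: any "fail" transition ends the test negatively
            return False
    return True  # every step reported "ok": the FSA reached "finish"
```

A robot whose gripper can seize but not lift an object would pass the seizable test and fail the liftable one, which yields exactly the kind of T/F entries collected for the affordance networks.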
Affordance network generation: preparing collected data
Λ1 = seizable, Λ2 = liftable, Λ3 = pushable, Λ4 = pullable
Collected data $T^{red}$, split into $T^{red}_{red}$ and $T^{red}_{blue}$:

Λj  ok  Valid_j(I_red(t), o_k, R_red)  Valid_j(I_red(t), o_k, R_blue)
Λ1  o1  T                              T
Λ2  o1  ?                              ?
Λ3  o1  T                              T
Λ4  o1  F                              F
Λ1  o2  F                              F
Λ2  o2  ?                              ?
Λ3  o2  F                              F
Λ4  o2  F                              F
Λ1  o3  T                              T
Λ2  o3  T                              T
Λ3  o3  T                              T
Λ4  o3  T                              F

$T^{red}_{red}$:
ok  Λ1  Λ2  Λ3  Λ4
o1  T   ?   T   F
o2  F   ?   F   F
o3  T   T   T   T

$T^{red}_{blue}$:
ok  Λ1  Λ2  Λ3  Λ4
o1  T   ?   T   F
o2  F   ?   F   F
o3  T   T   T   F
Affordance network generation: learned Bayesian networks [Friedman, 1997]
$T^{red}_{red}$ → AN_red:
ok  Λ1  Λ2  Λ3  Λ4
o1  T   ?   T   F
o2  F   ?   F   F
o3  T   T   T   T
- A1 (parent A3): P(A1 | A3 = 0) = 0.00, P(A1 | A3 = 1) = 1.00
- A4 (parent A3): P(A4 | A3 = 0) = 0.00, P(A4 | A3 = 1) = 0.33
- P(A3) = 0.60
- P(A2) = 0.60
$T^{red}_{blue}$ → AN_blue:
ok  Λ1  Λ2  Λ3  Λ4
o1  T   ?   T   F
o2  F   ?   F   F
o3  T   T   T   F
- A1 (parent A3): P(A1 | A3 = 0) = 0.00, P(A1 | A3 = 1) = 1.00
- P(A2) = 0.60
- P(A3) = 0.60
- P(A4) = 0.00
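Comparing affordance networks: based on graph edit distance [Dickinson et al., 2003]
AN_red: A1 (parent A3): P(A1 | A3 = 0) = 0.00, P(A1 | A3 = 1) = 1.00; A4 (parent A3): P(A4 | A3 = 0) = 0.00, P(A4 | A3 = 1) = 0.33; P(A3) = 0.60; P(A2) = 0.60
compared with AN_blue
- Affordance network difference:
$$D_{AN}(AN_{red}, AN_{blue}) = \eta \cdot D_{struct}(AN_{red}, AN_{blue}) + (1 - \eta) \cdot D_{param}(AN_{red}, AN_{blue})$$
- Structural difference, with $C$ the edge set of a network and $C_0 = C_{red} \cap C_{blue}$ the shared edges:
$$D_{struct}(AN_{red}, AN_{blue}) = |C_{red}| + |C_{blue}| - 2|C_0| = 2 + 1 - 2 = 1$$
- Parameter difference:
$$D_{param}(AN_{red}, AN_{blue}) = \sum_{i=1}^{4} D_{param}(A^{red}_i, A^{blue}_i) = 0.33$$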
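The two difference terms can be checked against the example numbers with a short sketch. Edges are modeled as `(parent, child)` pairs, the per-node parameter difference is taken as a precomputed input because its definition is not given here, and the weight `eta = 0.5` in the example is an arbitrary choice.

```python
from typing import FrozenSet, Tuple

Edge = Tuple[str, str]  # (parent, child) in an affordance network


def structural_difference(edges_a: FrozenSet[Edge], edges_b: FrozenSet[Edge]) -> int:
    """D_struct = |C_a| + |C_b| - 2|C_0|, with C_0 the edges both networks share."""
    common = edges_a & edges_b
    return len(edges_a) + len(edges_b) - 2 * len(common)


def network_difference(
    edges_a: FrozenSet[Edge], edges_b: FrozenSet[Edge], d_param: float, eta: float
) -> float:
    """D_AN = eta * D_struct + (1 - eta) * D_param."""
    return eta * structural_difference(edges_a, edges_b) + (1 - eta) * d_param
```

With AN_red having the edges A3 → A1 and A3 → A4 and AN_blue only A3 → A1, the structural difference is $2 + 1 - 2 = 1$; combined with $D_{param} = 0.33$ and $\eta = 0.5$ this gives $D_{AN} = 0.665$.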
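Choosing the best imitatee
[Figure: AN_red: A1 (parent A3): P(A1 | A3 = 0) = 0.00, P(A1 | A3 = 1) = 1.00; A4 (parent A3): P(A4 | A3 = 0) = 0.00, P(A4 | A3 = 1) = 0.33; P(A3) = 0.60; P(A2) = 0.60]
$$R_{imitate} = \arg\min_{R_i \in R,\ R_i \neq R_{red}} D_{AN}(AN_i, AN_{red})$$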
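Given the network differences to all other robots, picking the imitatee is an argmin that skips the robot itself. The sketch assumes the differences $D_{AN}(AN_i, AN_{red})$ have already been computed; the robot identifiers and the `None` result when no other robot is known are illustrative choices.

```python
from typing import Dict, Hashable, Optional


def choose_imitatee(distances: Dict[Hashable, float], me: Hashable) -> Optional[Hashable]:
    """R_imitate = argmin over robots R_i != me of D_AN(AN_i, AN_me).

    distances[r] holds D_AN(AN_r, AN_me); returns None if no other robot is known.
    """
    candidates = [r for r in distances if r != me]
    if not candidates:
        return None
    return min(candidates, key=lambda r: distances[r])
```

For the red robot with differences `{"red": 0.0, "blue": 0.665, "green": 1.2}`, the blue robot is chosen: its affordance network is the most similar one among the other robots.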
Evaluation: experimental setup
- Robots imitate other robots that are performing different skills on the different objects
- Meanwhile, the imitator listens to the error functions of the involved skills
- The number of failure signals received by the strategy layer serves as an indicator of how wise the choice of demonstrator was.
Evaluation: parametrization of the robots and objects
Robots
group    parameter      values            description
motor    power          [0.3, 6.0] kg     maximal weight a robot can pull/push
motor    speed          [0.03, 0.2] m/s   controls the impulse a robot imparts to an object
gripper  length         [0.08, 0.2] m     the longer the gripper, the deeper the objects can be
gripper  span           [0.16, 0.5] m     limits the diameter of objects that can be gripped
gripper  closing force  [1.0, 30.0] kg    controls the contact pressure (to pull heavier objects, the closing force must be higher)
gripper  lifting force  [30.0, 80.0] kg   controls the friction (to lift heavier objects, the closing force must be higher)
gripper  form           {normal, barb}    see figures below

Objects
parameter  values                      discretization
mass       [1.0, 5.0] kg               0.5 kg
width      [0.04, 0.24] m              0.05 m
height     [0.17, 0.2] m               0.05 m
friction   [50, 100] %                 0.1 %
shape      {sphere, cube, cylinder}
Evaluation: robustness experiment
Figure: Impact of the demonstrator selection algorithm on the failure rates for different fractions of unknown data (with 95% confidence interval for the 0% case)
Evaluation: clustering experiment
- Cluster objects by their appearance prior to AN creation
- Generate one AN for each cluster
- The distance is the weighted sum:
$$D^c_{AN}(R_a, R_b) = \sum_{l=1}^{n} \frac{k_l}{k} \cdot D_{AN}(AN_{a,l}, AN_{b,l})$$
$$k = \min\left\{ |T^m_{a,l}| + |T^m_{b,l}| \;\middle|\; 1 \le l \le n \right\}$$
$$k_l = \min\left( |T^m_{a,l}|, |T^m_{b,l}| \right)$$
Figure: Impact of the clustered demonstrator selection algorithm
Baele, G., Bredeche, N., Haasdijk, E., Maere, S., Michiels, N., Van de Peer, Y., Schwarzer, C., and Thenius, R. (2009). Open-ended on-board evolutionary robotics for robot swarms. In Tyrrell, A., editor, 2009 IEEE Congress on Evolutionary Computation, pages –, Trondheim, Norway. IEEE Computational Intelligence Society, IEEE Press.
Dautenhahn, K. and Nehaniv, C. (2002). Imitation in Animals and Artifacts, chapter “An agent-based perspective on imitation”. MIT Press.
Dickinson, P. J., Bunke, H., Dadej, A., and Kraetzl, M. (2003). On graphs with unique node labels. In Graph Based Representations in Pattern Recognition, volume 2726, pages 409–437, Heidelberg, DE. Springer Berlin.
Dorigo, M., Tuci, E., Trianni, V., Groß, R., Nouyan, S., Ampatzis, C., Labella, T. H., O’Grady, R., Bonani, M., and Mondada, F. (2006). SWARM-BOT: Design and Implementation of Colonies of Self-Assembling Robots. In Computational Intelligence: Principles and Practice. IEEE Computational Intelligence Society, New York.
Friedman, N. (1997). Learning belief networks in the presence of missing values and hidden variables. In Proc. 14th International Conference on Machine Learning, pages 125–133. Morgan Kaufmann.
Inamura, T., Toshima, I., Nakamura, Y., and Saitama, J. (2003). Acquiring Motion Elements for Bidirectional Computation of Motion Recognition and Generation. Experimental Robotics VIII.
Kochenderfer, M. (2006). Adaptive Modelling and Planning for Learning Intelligent Behaviour. PhD thesis, School of Informatics, University of Edinburgh.
Priesterjahn, S. (2008). Online imitation and adaptation in modern computer games. PhD thesis, University of Paderborn.
Takahashi, Y., Tamura, Y., and Asada, M. (2008). Mutual development of behavior acquisition and recognition based on value system. In From Animals to Animats 10, 10th International Conference on Simulation of Adaptive Behavior (SAB2008), pages 291–300.
Viterbi, A. (1967). Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Transactions on Information Theory, 13(2):260–269.