Behavior Hierarchy Learning in a Behavior-based System using Reinforcement Learning

IROS04 (Japan, Sendai)
University of Tehran

Amir massoud Farahmand - Majid Nili Ahmadabadi - Babak Najar Araabi

farahmand@ipm.ir, {mnili, araabi}@ut.ac.ir

Department of Electrical and Computer Engineering, University of Tehran, Iran


Paper Outline

• Challenges and Requirements of Robotic Systems
• Behavior-based Approach to AI
• How should we design a Behavior-based System (BBS)?!
• Learning in BBS
• Structure Learning in BBS
• Value Function Decomposition
• Experiments: Multi-Robot Object Lifting
• Conclusions, Ongoing Research, and Future Work


Challenges and Requirements of Robotic Systems

Challenges

• Sensor and Effector Uncertainty
• Partial Observability
• Non-Stationarity

Requirements (among many others)

• Multi-goal
• Robustness
• Multiple Sensors
• Scalability
• Automatic design
• [Learning]


Behavior-based Approach to AI

• Behavior-based approach as a good candidate for low-level intelligence.
• Behavioral (activity) decomposition
  – as opposed to functional decomposition
• Behavior: Sensor -> Action (direct link between perception and action)
• Situatedness
  – Situatedness motto: The world is its own best model!
• Embodiment
• Intelligence as Emergence
  – (interaction of agent with environment)


Behavioral decomposition

[Figure: behavior layers connecting sensors to actuators, in slide order: manipulate the world, build maps, explore, locomote, avoid obstacles]


Behavior-based System Design

• Hand Design
  – Common almost everywhere (just ask some people at IROS04)
  – Complicated: may be infeasible in complex problems
  – Even if it is possible to find a working system, it is probably not optimal.
• Evolution
  – Time consuming
  – Good solutions can be found
  – Biologically feasible
• Learning
  – Biologically feasible
  – Learning is essential for life-time survival of the agent.

We focus on learning in this presentation.


The Importance of Learning

• Unknown environment/body
  – [exact] Model of environment/body is not known
• Non-stationary environment/body
  – Changing environment (offices, houses, streets, and almost everywhere)
  – Aging
• Designer may not know how to benefit from every aspect of her agent/environment
  – Let the agent learn it by itself (learning as optimization)
• etc.


Learning in Behavior-based Systems

• There are a few works on behavior-based learning
  – Mataric, Mahadevan, Maes, and ...
• … but there is no deep investigation of it (especially a mathematical formulation)!


Learning in Behavior-based Systems

There are different methods of learning with different viewpoints, but we have concentrated on Reinforcement Learning.
  – [Agent] Did I perform it correctly?!
  – [Tutor] Yes/No!


Learning in Behavior-based Systems

We have divided learning in BBS into these two parts:

• Structure Learning
  – How should we organize behaviors in the architecture, assuming we have a repertoire of working behaviors?
• Behavior Learning
  – How should each behavior behave? (we do not have the necessary toolbox)


Structure Learning Assumptions

• Structure Learning in the Subsumption Architecture as a good sample for BBS
• Purely parallel case
• We know B1, B2, and … but we do not know how to arrange them in the architecture
  – we know how to {avoid obstacles, pick an object, stop, move forward, turn, …} but we don’t know which one is superior to the others.


Structure Learning

[Figure: Behavior Toolbox containing manipulate the world, build maps, explore, locomote, avoid obstacles]

The agent wants to learn how to arrange these behaviors in order to get maximum reward from its environment (or tutor).



Structure Learning

[Figure: Behavior Toolbox containing manipulate the world, build maps, explore, locomote, avoid obstacles]

1. explore becomes the controlling behavior and suppresses avoid obstacles
2. The agent hits a wall!
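
The suppression step in this example can be sketched in a few lines of Python. This is only an illustration of the purely parallel case described earlier: the function name `controlling_behavior`, the list-based representation of a structure (highest layer first), and the three-behavior structures are assumptions made for the example, not something given on the slides.

```python
from typing import Optional, Sequence, Set


def controlling_behavior(structure: Sequence[str], activated: Set[str]) -> Optional[str]:
    """Return the highest-placed activated behavior, or None if none is activated.

    `structure` lists behaviors from the highest layer to the lowest
    (a hypothetical representation chosen for this sketch).
    """
    for behavior in structure:
        if behavior in activated:
            # This behavior suppresses every activated behavior placed below it.
            return behavior
    return None


# Hypothetical situation: the robot is near a wall, so both behaviors are activated.
activated = {"explore", "avoid obstacles"}

bad_structure = ["explore", "avoid obstacles", "locomote"]
good_structure = ["avoid obstacles", "explore", "locomote"]

print(controlling_behavior(bad_structure, activated))   # explore
print(controlling_behavior(good_structure, activated))  # avoid obstacles
```

With explore on top, avoid obstacles never gets control near the wall, which is the failure the slide describes.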


Structure Learning

[Figure: Behavior Toolbox containing manipulate the world, build maps, explore, locomote, avoid obstacles]

The tutor (environment) gives explore a punishment for being in that place of the structure.


Structure Learning

[Figure: Behavior Toolbox containing manipulate the world, build maps, explore, locomote, avoid obstacles]

“explore” is not a very good behavior for the highest position of the structure, so it is replaced by “avoid obstacles”.


Structure Learning Issues

• How should we represent structure?
  – Sufficient (Concept space should be covered by Hypothesis space)
  – Tractable (small Hypothesis space)
  – Well-defined credit assignment
• How should we assign credits to architecture?
  – If the agent receives a reward/punishment, how should we reward/punish the structure of the architecture?


Value Function Decomposition and Structure Learning

Each structure has a value, defined by the reinforcement signal the agent receives with it.

$$V_T = E\left[ r_t \mid \text{agent with the structure } T \right]$$

• The objective is to find a structure T with a high value.
• We have decomposed the value function into simpler components that enable us to benefit from previous experiments.


Value Function Decomposition

• It is possible to decompose the total system’s value into the value of each behavior in each layer.
• We call it the Zero-Order method.

$$V_{ZO}(i,j) = V_{ij} = E\left[ r_t \mid B_j \text{ is the controlling behavior in the } i^{\text{th}} \text{ layer} \right]$$


Value Function Decomposition: Zero Order Method

It stores the value of a behavior being in a specific layer.

[Figure: ZO value table in the agent’s mind, with entries for a higher layer and a lower layer. Entries shown: avoid obstacles (0.8), avoid obstacles (0.6), explore (0.7), explore (0.9), locomote (0.4), locomote (0.4)]


Credit Assignment for Zero Order Method

• The controlling behavior is the only behavior responsible for the current reinforcement signal.
• An appropriate method for updating the ZO value table is available.


Value Function Decomposition: Another Method (First Order)

It stores the value of the relative order of behaviors
  – How good/bad is it if “B1 is placed higher than B2”?!

• V(avoid obstacles>explore) = 0.8
• V(explore>avoid obstacles) = -0.3

• Sorry! Not that easy (and informative) to show graphically!!
• Credits are assigned to all (controlling, activated) pairs of behaviors.
  – The agent receives reward while B1 is controlling and B3 and B5 are activated
    • (B1>B3): +
    • (B1>B5): +
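
The pairwise credit assignment in this example can be sketched as follows. As with any code here, the update rule (an exponential moving average with step size `alpha`) and the names `FirstOrderTable` and `update` are assumptions for illustration only; the slides specify which pairs are credited, not how the values are updated.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


class FirstOrderTable:
    """Stores an estimate of V(Ba > Bb): the value of Ba being placed higher than Bb."""

    def __init__(self, alpha: float = 0.1) -> None:
        self.alpha = alpha  # hypothetical step size, not taken from the slides
        self.values: Dict[Tuple[str, str], float] = defaultdict(float)

    def update(self, controlling: str, activated: Iterable[str], reward: float) -> None:
        """Credit every (controlling, activated) pair with the received reward."""
        for other in activated:
            if other == controlling:
                continue  # a behavior is not ordered relative to itself
            key = (controlling, other)  # read as "controlling > other"
            self.values[key] += self.alpha * (reward - self.values[key])


fo = FirstOrderTable(alpha=0.5)
# Reward received while B1 is controlling and B3, B5 are activated (suppressed).
fo.update(controlling="B1", activated=["B3", "B5"], reward=1.0)
print(dict(fo.values))  # {('B1', 'B3'): 0.5, ('B1', 'B5'): 0.5}
```

A punishment received in the same situation would lower both pair values instead.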


Structure Representation

Both of these methods are supported by a lot of probabilistic reasoning, which shows how to
  – decompose the total system value into simple components
  – assign credits
  – update the value table

Check the Proceedings for the Mathematical Formulation!
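
One way to see why decomposing into simple components keeps the hypothesis space tractable is to count what has to be stored. The sketch below assumes, which the slides do not state explicitly, that a structure is a total ordering of n behaviors with one behavior per layer; the function names are hypothetical.

```python
from math import factorial


def num_structures(n: int) -> int:
    """Number of total orderings of n behaviors (one behavior per layer)."""
    return factorial(n)


def zero_order_entries(n: int) -> int:
    """One value per (layer, behavior) pair, assuming n layers."""
    return n * n


def first_order_entries(n: int) -> int:
    """One value per ordered pair (Ba placed higher than Bb) with Ba != Bb."""
    return n * (n - 1)


for n in (5, 10):
    print(n, num_structures(n), zero_order_entries(n), first_order_entries(n))
# 5 120 25 20
# 10 3628800 100 90
```

Under these assumptions, valuing every whole structure separately grows factorially, while both tables grow quadratically, and each table entry is shared by many structures, so experience with one structure informs others.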


Example: Multi-Robot Object Lifting

• A group of three robots wants to lift an object using their own local sensors
  – No central control
  – No communication
  – Local sensors
• Objectives
  – Reaching the prescribed height
  – Keeping the tilt angle small


Example: Multi-Robot Object Lifting

Behavior Toolbox

• Stop
• Push More
• Hurry Up
• Slow Down
• Don’t Go Fast

?!


Example: Multi-Robot Object Lifting

Sample shot of tilt angle of the object after sufficient learning

[Figure: average total reward per episode versus episodes (5 to 50), with curves for mean hand-designed performance, zero order, and first order]


Example: Multi-Robot Object Lifting

Sample shot of height of each robot after sufficient learning

[Figure: z of robots versus steps (0 to 90), with curves for robots 1, 2, and 3 and the goal height]


Example: Multi-Robot Object Lifting

Sample shot of tilt angle of the object after sufficient learning

[Figure: tilt angle (in degrees) versus steps (0 to 90)]


Conclusions, Ongoing Research, and Future Work

• We have devised two different methods for structure learning in behavior-based systems.
• Good results in two different tasks
  – Multi-robot Object Lifting
  – An Abstract Problem (not reported yet)


Conclusions, Ongoing Research, and Future Work

• … but where should we find the necessary behaviors?!
  – Behavior Learning
• We have devised some methods for behavior learning, which will be reported soon.


Conclusions, Ongoing Research, and Future Work

• However, many steps remain before fully automated agent design
  – How should we generate new behaviors without even knowing which sensory information is necessary for the task (feature selection)?
  – Problem of Reinforcement Signal Design
    • Designing a good reinforcement signal is not easy at all.
