Learning From Demonstration Atkeson and Schaal Dang, RLAB Feb 28 th, 2007.


Learning From Demonstration

Atkeson and Schaal

Dang, RLAB

Feb 28th, 2007

Feb 28th, 2007 Dang, RLAB 2

Goal

• Robot Learning from Demonstration
  – Small number of human demonstrations
  – Task-level learning (learn intent, not just mimicry)

• Explore
  – Parametric vs. nonparametric learning
  – Role of a priori knowledge


Known Task

• Pendulum swing-up task
  – Like pole balancing, but more complex
  – Difficult, but success is easy to evaluate

• Simplified
  – Restricted to horizontal motion
  – Important variables picked out:
    • Pendulum angle
    • Pendulum angular velocity
    • Hand location
    • Hand velocity
    • Hand acceleration


Implementation details

• SARCOS 7-DOF arm
• Stereo vision, colored ball markers
• 0.12 s vision delay overcome with a Kalman filter
  – Idealized pendulum dynamics
• Redundant inverse kinematics and real-time inverse dynamics for control
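A Kalman filter has a measurement update and a prediction step. The sketch below shows only the prediction half, which is the part that bridges the 0.12 s latency: the delayed pendulum estimate is rolled forward through a model using the hand accelerations the robot itself commanded. The pendulum model θ̈ = (g sin θ + ẍ cos θ)/l (with θ measured from upright), its sign convention, `L_EFF`, `DT`, and the function names are all hypothetical assumptions. The slides do not give the filter the authors used.

```python
import math

G = 9.81       # gravity, m/s^2
L_EFF = 0.5    # hypothetical effective pendulum length, m
DT = 0.01      # hypothetical integration step, s
DELAY = 0.12   # vision latency quoted on the slide, s


def pendulum_accel(theta, hand_acc):
    """Idealized pendulum on a horizontally moving hand (assumed model).

    theta is measured from upright; the sign of the hand term depends on
    the axis convention, which is an assumption here.
    """
    return (G * math.sin(theta) + hand_acc * math.cos(theta)) / L_EFF


def predict_through_delay(state, hand_accels, dt=DT):
    """Roll a delayed estimate (theta, theta_dot, x, x_dot) forward.

    hand_accels holds the hand accelerations commanded since the delayed
    camera frame, one per step of length dt; the robot knows these.
    """
    theta, theta_dot, x, x_dot = state
    for acc in hand_accels:
        # semi-implicit Euler: update velocities first, then positions
        theta_dot += pendulum_accel(theta, acc) * dt
        theta += theta_dot * dt
        x_dot += acc * dt
        x += x_dot * dt
    return theta, theta_dot, x, x_dot


steps = round(DELAY / DT)  # number of steps spanning the delay
now = predict_through_delay((0.1, 0.0, 0.0, 0.0), [0.0] * steps)
```

The upright state with no hand motion stays put. A small tilt grows over the delay window, which is exactly the drift a stale camera estimate would miss.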


Learning

• Task composed of two subtasks
  – 1. Pole swing-up: open-loop
  – 2. Upright balance: feedback
• Belief: learning subtasks accelerates learning of new tasks
• Focus here is on swing-up
  – Balancing already learned


First approach

• Directly mimic human hand movement
  – Fails:
    • Differences in human and robot capabilities
    • Improper demonstration (not horizontal)
    • Imprecise mimicry


Approach the second

• Learn reward: $r_k = C(\mathbf{x}_k, \mathbf{u}_k, k)$

• Learn a model: $\mathbf{x}_{k+1} = f(\mathbf{x}_k, \mathbf{u}_k)$

• Use human demonstration as seed so a planner can find a good policy
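The learned model, the learned reward, and the demonstration-seeded planner can be sketched together. The action sequence is rolled through a model `f`, the one-step cost is summed, and the sequence is improved starting from the demonstrated actions. This is a toy illustration and not the authors' planner. The finite-difference gradient descent, the scalar `toy_model` and `toy_cost`, the step sizes, and the seed values are all hypothetical assumptions.

```python
def rollout_cost(x0, actions, f, cost):
    """Total cost of an open-loop action sequence under model f."""
    x, total = x0, 0.0
    for k, u in enumerate(actions):
        total += cost(x, u, k)   # one-step cost C(x_k, u_k, k)
        x = f(x, u)              # x_{k+1} = f(x_k, u_k)
    return total


def optimize_from_seed(x0, seed_actions, f, cost, step=0.05, eps=1e-4, iters=200):
    """Finite-difference gradient descent that starts at the demonstration."""
    actions = list(seed_actions)
    for _ in range(iters):
        base = rollout_cost(x0, actions, f, cost)
        grad = []
        for i in range(len(actions)):
            bumped = actions[:]
            bumped[i] += eps
            grad.append((rollout_cost(x0, bumped, f, cost) - base) / eps)
        actions = [u - step * g for u, g in zip(actions, grad)]
    return actions


def toy_model(x, u):
    return x + u  # hypothetical: the command integrates into position


def toy_cost(x, u, k):
    return (x - 1.0) ** 2 + 0.01 * u ** 2  # hypothetical target at x = 1


seed = [0.6, 0.2, 0.1, 0.0, 0.0]  # stand-in for a demonstrated action sequence
plan = optimize_from_seed(0.0, seed, toy_model, toy_cost)
```

Seeding with the demonstration matters because a local optimizer like this only finds the optimum near its starting point. A reasonable demonstration places the search in the right basin.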


Learn Task Model

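• Parametric:
  – $\dot\theta_{k+1} = \alpha_1\dot\theta_k + \alpha_2\left(\sin\theta_k + \ddot{x}_k\cos\theta_k/g\right)$
  – Learn parameters via linear regression

• Nonparametric:
  – $\dot\theta_{k+1} = f(\theta_k, \dot\theta_k, x_k, \dot{x}_k, \ddot{x}_k)$
  – Use Locally Weighted Learning
  – Given desired variable and a set of possibly relevant input variables
  – Cross validation to tune meta-parameters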
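One common form of locally weighted learning is locally weighted linear regression with a Gaussian kernel. A minimal pure-Python sketch of it follows, with leave-one-out cross validation picking the kernel width (the meta-parameter). The kernel shape, ridge term, candidate widths, and function names are assumptions. In this task the inputs would be the pendulum and hand variables. The usage fits a synthetic linear function only so the result is easy to check.

```python
import math


def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (M[i][n] - s) / M[i][i]
    return w


def lwr_predict(X, y, query, bandwidth, ridge=1e-6):
    """Locally weighted linear regression evaluated at one query point."""
    d = len(query) + 1
    A = [[0.0] * d for _ in range(d)]
    b = [0.0] * d
    for xi, yi in zip(X, y):
        dist2 = sum((a - q) ** 2 for a, q in zip(xi, query))
        w = math.exp(-dist2 / (2.0 * bandwidth ** 2))   # Gaussian kernel weight
        z = [1.0] + [a - q for a, q in zip(xi, query)]  # inputs centred on query
        for r in range(d):
            b[r] += w * z[r] * yi
            for c in range(d):
                A[r][c] += w * z[r] * z[c]
    for r in range(d):
        A[r][r] += ridge  # small ridge keeps A invertible when data are sparse
    return solve(A, b)[0]  # intercept = local model's value at the query


def loo_bandwidth(X, y, candidates):
    """Pick the kernel width with the lowest leave-one-out squared error."""
    def loo_error(h):
        err = 0.0
        for i in range(len(X)):
            pred = lwr_predict(X[:i] + X[i + 1:], y[:i] + y[i + 1:], X[i], h)
            err += (pred - y[i]) ** 2
        return err
    return min(candidates, key=loo_error)


# Usage on synthetic data (hypothetical): a linear target on a 5x5 grid
X = [[0.1 * i, 0.1 * j] for i in range(5) for j in range(5)]
y = [2.0 * a - b + 0.5 for a, b in X]
h = loo_bandwidth(X, y, [0.01, 0.3])
pred = lwr_predict(X, y, [0.23, 0.17], h)
```

A width that is too narrow leaves each held-out point with almost no weighted neighbours, so cross validation rejects it. This is the kind of meta-parameter tuning the slide refers to.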


Swing up

• Transition to balance occurs at ±0.5 rad from upright with angular velocity < 3 rad/s

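• Reward function set to make the robot want to be like the demonstrator
  – $r(\mathbf{x}_k, \mathbf{u}_k, k) = (\mathbf{x}_k - \mathbf{x}_k^d)^T(\mathbf{x}_k - \mathbf{x}_k^d) + \mathbf{u}_k^T\mathbf{u}_k$, where $\mathbf{x}_k^d$ is the demonstrated state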
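The two pieces of the swing-up slide translate directly into small functions. The first is the hand-off test to the balance controller (±0.5 rad, 3 rad/s). The second is the quadratic demonstration-tracking term, which a planner would minimize. The angle wrapping, list-based vectors, and function names are assumptions made for illustration.

```python
import math


def wrap_angle(theta):
    """Map any angle to (-pi, pi] so 'near upright' is well defined (assumption)."""
    return math.atan2(math.sin(theta), math.cos(theta))


def ready_to_balance(theta, theta_dot, window=0.5, max_speed=3.0):
    """Hand-off test: within +/-window rad of upright and slower than max_speed rad/s."""
    return abs(wrap_angle(theta)) <= window and abs(theta_dot) < max_speed


def step_cost(x, x_demo, u):
    """(x - x_d)^T (x - x_d) + u^T u for one step; lower means closer to the demo."""
    dx = [a - b for a, b in zip(x, x_demo)]
    return sum(e * e for e in dx) + sum(v * v for v in u)


def trajectory_cost(xs, xs_demo, us):
    """Sum of step costs over a whole trajectory."""
    return sum(step_cost(x, xd, u) for x, xd, u in zip(xs, xs_demo, us))
```

The `u^T u` term keeps the plan from buying a closer match to the demonstration with arbitrarily large hand commands.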


Parametric

• Parameters learned from failure data

• Trajectory optimized using human trajectory as seed

• SUCCESS
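A parametric model that is linear in its two parameters can be fitted to logged trial data by ordinary least squares. This sketch assumes the form $\dot\theta_{k+1} = \alpha_1\dot\theta_k + \alpha_2(\sin\theta_k + \ddot{x}_k\cos\theta_k/g)$, and the sign convention, sample tuple layout, and names are assumptions. The synthetic samples below come from known parameter values only so the fit can be checked. They are not data from the experiments.

```python
import math

G = 9.81  # gravity, m/s^2


def features(theta, theta_dot, hand_acc):
    """Regressors of the assumed model:
    theta_dot_next = a1 * theta_dot + a2 * (sin(theta) + hand_acc * cos(theta) / G)
    """
    return theta_dot, math.sin(theta) + hand_acc * math.cos(theta) / G


def fit_alphas(samples):
    """Least-squares (a1, a2) from (theta, theta_dot, hand_acc, theta_dot_next) tuples."""
    s11 = s12 = s22 = t1 = t2 = 0.0
    for theta, theta_dot, hand_acc, target in samples:
        f1, f2 = features(theta, theta_dot, hand_acc)
        s11 += f1 * f1
        s12 += f1 * f2
        s22 += f2 * f2
        t1 += f1 * target
        t2 += f2 * target
    det = s11 * s22 - s12 * s12  # 2x2 normal equations, solved by Cramer's rule
    if abs(det) < 1e-12:
        raise ValueError("data do not excite both regressors")
    return (s22 * t1 - s12 * t2) / det, (s11 * t2 - s12 * t1) / det


TRUE_A1, TRUE_A2 = 0.98, 0.2  # hypothetical values used only to make test data
inputs = [(0.1, 0.5, 1.0), (-0.3, 1.2, -2.0), (1.0, -0.7, 0.5), (2.0, 2.0, 3.0)]
samples = []
for theta, theta_dot, hand_acc in inputs:
    f1, f2 = features(theta, theta_dot, hand_acc)
    samples.append((theta, theta_dot, hand_acc, TRUE_A1 * f1 + TRUE_A2 * f2))
a1, a2 = fit_alphas(samples)
```

Failed trials are still useful for this fit. What matters is that they exercise both regressors, not that they reach the goal.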


Nonparametric

• Slower, but still successful


Harder Task

• Double-pump swing-up
  – Approach fails

• Failure believed to be due to improper modeling of the system

• Solved by direct task-level learning


Direct task-level learning

• Learn a correction term to add to the target angle
  – Target is now ±(0.5 + ∆) rad
  – Use binary search

• Worked for parametric
• Didn’t work for nonparametric
  – Left the region of validity of the local models
  – So, tweak the velocity all over

• Binary search for the coefficient c
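Both searches on this slide, for the target-angle correction ∆ and for the velocity coefficient c, can use one bisection routine that spends one real trial per step. This sketch assumes that each trial can be classified as overshooting, falling short, or succeeding, and that overshoot grows monotonically with the searched value. `run_trial`, `fake_trial`, and the success band are hypothetical stand-ins for running the robot.

```python
def bisect_correction(run_trial, lo, hi, max_trials=10, tol=1e-3):
    """Binary search over a scalar correction, one trial per iteration.

    run_trial(value) is a hypothetical callback returning
        +1 if the swing overshoots the balance window,
        -1 if it falls short,
         0 if the hand-off to the balance controller succeeds.
    Returns the first successful value, else the midpoint of the final interval.
    """
    for _ in range(max_trials):
        mid = 0.5 * (lo + hi)
        outcome = run_trial(mid)
        if outcome == 0:
            return mid
        if outcome > 0:
            hi = mid  # too much correction: keep the lower half
        else:
            lo = mid  # too little correction: keep the upper half
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


def fake_trial(delta):
    """Hypothetical plant: any delta in [0.12, 0.14] rad succeeds."""
    if delta < 0.12:
        return -1
    if delta > 0.14:
        return +1
    return 0


delta = bisect_correction(fake_trial, 0.0, 0.5)
```

Bisection needs only a directional outcome from each trial, not a model gradient. That makes it robust when the model itself is the thing that is wrong.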


Results


Summary of Technique

Technique                               Succeeds for
Watch demo, mimic hand                  None
Learn model, optimize demo trajectory   Parametric, single
Tune model, reoptimize                  Nonparametric, single
Binary search for delta                 Parametric, double
Binary search for c                     Nonparametric, double

Math

Math

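$\dot\theta_{k+1} = \alpha_1\dot\theta_k + \alpha_2\left(\sin\theta_k + \ddot{x}_k\cos\theta_k/g\right)$

$\dot\theta_{k+1} = f(\theta_k, \dot\theta_k, x_k, \dot{x}_k, \ddot{x}_k)$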

Tct /1


Discussion points

• Was the reward function given or learned?
• Does direct task-level learning make sense?
  – Only useful in this task/implementation?
  – Analogous to the I term in PID?

• Nonparametrics don’t avoid all modeling errors
  – Poor planner?
  – Not enough data?

• A priori knowledge
  – Human selects inputs, outputs, control system, perception, model selection, reward function, task segmenting, task factors

• It Works!