Learning From Demonstration (Atkeson and Schaal). Dang, RLAB, Feb 28th, 2007.



Goal

• Robot Learning from Demonstration
– Small number of human demonstrations
– Task level learning (learn intent, not just mimicry)

• Explore
– Parametric vs. nonparametric learning
– Role of a priori knowledge


Known Task

• Pendulum swing-up task
– Like pole balancing, but more complex
– Difficult, but easy to evaluate success

• Simplified
– Restricted to horizontal motion
– Important variables picked out

• Pendulum angle

• Pendulum angular velocity

• Hand location

• Hand velocity

• Hand acceleration


Implementation details

• SARCOS 7DOF arm
• Stereo vision, colored ball indicators
• 0.12 s delay overcome with Kalman filter
– Idealized pendulum dynamics
• Redundant inverse kinematics and real-time inverse dynamics for control


Learning

• Task composed of two subtasks
• Believe that subtask learning accelerates new task learning
– 1 Pole swing-up
• Open-loop
– 2 Upright balance
• Feedback

• Focus here on swing-up
– Balancing already learned


First approach

• Directly mimic human hand movement
– Fails
• Differences in human and robot capabilities
• Improper demonstration (not horizontal)
• Imprecise mimicry


Approach the second

• Learn reward
– $C = \sum_k r(\mathbf{x}_k, \mathbf{u}_k, k)$

• Learn a model
– $\mathbf{x}_{k+1} = f(\mathbf{x}_k, \mathbf{u}_k)$

• Use human demonstration as seed so a planner can find a good policy
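The relationship between the per-step term r, the model f, and the total C that a planner works with can be sketched in a few lines. This is only an illustration: `rollout_cost`, `toy_model`, and `toy_reward` are hypothetical names, and the toy dynamics are a 1-D integrator, not the pendulum. Here r is treated as a penalty that the planner would minimize.

```python
import numpy as np

def rollout_cost(f, r, x0, controls):
    """Simulate x_{k+1} = f(x_k, u_k) and accumulate C = sum_k r(x_k, u_k, k)."""
    x = np.asarray(x0, dtype=float)
    total = 0.0
    for k, u in enumerate(controls):
        total += r(x, u, k)   # per-step term evaluated before the state advances
        x = f(x, u)           # one step of the (learned) model
    return total, x

# Toy stand-ins (assumptions for illustration only, not the pendulum task).
def toy_model(x, u):
    return x + u

def toy_reward(x, u, k):
    return float(x @ x + u * u)

cost, final_state = rollout_cost(toy_model, toy_reward, [1.0], [-0.5, -0.5])
```

A planner seeded with the demonstration would start from the demonstrated control sequence and adjust `controls` to lower `cost`.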


Learn Task Model

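• Parametric:
– $\dot\theta_{k+1} = (1 - \alpha_1)\,\dot\theta_k + \alpha_2\,(\sin\theta_k + \ddot{x}_k \cos\theta_k / g)$
– Learn parameters via linear regression

• Nonparametric
– $\dot\theta_{k+1} = f(\theta_k, \dot\theta_k, x_k, \dot{x}_k, \ddot{x}_k)$
– Use Locally Weighted Learning
– Given desired variable and a set of possibly relevant input variables
• Cross validation to tune meta-parameters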
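Linear regression suffices for the parametric case when the model is linear in its parameters. The sketch below assumes the idealized form $\dot\theta_{k+1} = (1 - \alpha_1)\dot\theta_k + \alpha_2(\sin\theta_k + \ddot{x}_k\cos\theta_k/g)$, with angle $\theta$ and hand acceleration $\ddot{x}$. The function names, the synthetic data, and the parameter values 0.02 and 0.15 are made up for the check and are not taken from the slides.

```python
import numpy as np

G = 9.81  # gravitational acceleration, m/s^2

def step(theta, theta_dot, x_acc, a1, a2):
    """One step of the assumed parametric model for angular velocity."""
    return (1.0 - a1) * theta_dot + a2 * (np.sin(theta) + x_acc * np.cos(theta) / G)

def fit_parameters(theta, theta_dot, x_acc, theta_dot_next):
    """Least-squares estimate of (a1, a2); the model is linear in both."""
    # Rearranged: theta_dot_next - theta_dot
    #             = -a1 * theta_dot + a2 * (sin(theta) + x_acc * cos(theta) / g)
    A = np.column_stack([-theta_dot, np.sin(theta) + x_acc * np.cos(theta) / G])
    b = theta_dot_next - theta_dot
    (a1, a2), *_ = np.linalg.lstsq(A, b, rcond=None)
    return a1, a2

# Synthetic, noise-free check with made-up parameters (not values from the paper).
rng = np.random.default_rng(0)
theta = rng.uniform(-np.pi, np.pi, 200)
theta_dot = rng.uniform(-5.0, 5.0, 200)
x_acc = rng.uniform(-10.0, 10.0, 200)
target = step(theta, theta_dot, x_acc, 0.02, 0.15)
est_a1, est_a2 = fit_parameters(theta, theta_dot, x_acc, target)
```

With real, noisy recordings the estimates would only approximate the underlying values, and the quality of the fit depends on how well the data excite both terms.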


Swing up

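• Transition to balance occurs at ±0.5 radians with angular velocity < 3 rad/sec

• Reward function set to make robot want to be like demonstrator
– $r(\mathbf{x}_k, \mathbf{u}_k, k) = (\mathbf{x}_k - \mathbf{x}_k^d)^T (\mathbf{x}_k - \mathbf{x}_k^d) + \mathbf{u}_k^T \mathbf{u}_k$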
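As a rough illustration of the two ingredients on this slide, the sketch below computes a quadratic tracking term of the form $(\mathbf{x} - \mathbf{x}^d)^T(\mathbf{x} - \mathbf{x}^d) + \mathbf{u}^T\mathbf{u}$ and checks the hand-off condition. The function names are hypothetical, and two points are assumptions rather than statements from the slides: the angle is measured from upright, and the velocity bound applies to its magnitude.

```python
import numpy as np

def tracking_penalty(x, u, x_demo):
    """(x - x_d)^T (x - x_d) + u^T u for a single time step."""
    dx = np.asarray(x, dtype=float) - np.asarray(x_demo, dtype=float)
    u = np.atleast_1d(np.asarray(u, dtype=float))
    return float(dx @ dx + u @ u)

def ready_to_balance(angle_from_upright, angular_velocity,
                     angle_tol=0.5, velocity_tol=3.0):
    """Hand-off test: within 0.5 rad of upright and slower than 3 rad/s (assumed reading)."""
    return abs(angle_from_upright) <= angle_tol and abs(angular_velocity) < velocity_tol

penalty = tracking_penalty([1.0, 0.0], 2.0, [0.0, 0.0])  # 1 + 4 = 5.0
```

Because the term is small only when the state stays close to the demonstrated state, minimizing its sum pulls the planned trajectory toward the demonstration while discouraging large commands.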


Parametric

• Parameters learned from failure data

• Trajectory optimized using human trajectory as seed

• SUCCESS


Nonparametric

• Slower, but still successful
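The nonparametric model in this talk is built with locally weighted learning. A minimal sketch of one common variant, locally weighted linear regression with a Gaussian kernel, is given below. The kernel choice, the name `lwr_predict`, and the toy sine data are assumptions for illustration and may differ from what was run on the robot.

```python
import numpy as np

def lwr_predict(X, y, query, bandwidth):
    """Locally weighted linear regression prediction at a single query point."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    query = np.asarray(query, dtype=float)
    # Gaussian kernel weights based on squared distance to the query.
    d2 = np.sum((X - query) ** 2, axis=1)
    w = np.exp(-0.5 * d2 / bandwidth ** 2)
    # Weighted least squares on inputs augmented with a constant term.
    Xa = np.hstack([X, np.ones((len(X), 1))])
    sw = np.sqrt(w)[:, None]
    beta, *_ = np.linalg.lstsq(sw * Xa, sw[:, 0] * y, rcond=None)
    return float(np.append(query, 1.0) @ beta)

# Toy 1-D data (an assumption for illustration): y = sin(x).
X = np.linspace(0.0, 3.0, 61).reshape(-1, 1)
y = np.sin(X[:, 0])
pred = lwr_predict(X, y, [1.0], bandwidth=0.2)
```

The `bandwidth` argument is the kind of meta-parameter that cross validation would tune. Each prediction refits a local model around the query, so predictions are only trustworthy near the recorded data.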


Harder Task

• Double pump swing up
– Approach fails
• Believed to be due to improper modeling of the system
• Solved by direct task-level learning


Direct task-level learning

• Learn a correction term to add to the target angle
– Now target ± (0.5+∆) rad
– Use binary search
• Worked for parametric
• Didn’t for nonparametric
– Left region of validity of local models
– So, tweak velocity all over
• Binary search for coefficient
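A binary search over a single task-level correction might look like the sketch below. It assumes that each robot trial can be summarized as "fell short", "overshot", or "succeeded", and that the outcome changes monotonically with the correction. Neither assumption is stated on the slides. `run_trial` and `fake_trial` are hypothetical stand-ins for a real trial, and the numbers are made up.

```python
def binary_search_offset(run_trial, lo, hi, max_trials=20):
    """Bisect on an offset, assuming the outcome changes sign once on [lo, hi].

    run_trial(offset) is a hypothetical callback returning a negative number
    when the swing-up falls short, a positive number when it overshoots,
    and 0 on success.
    """
    for _ in range(max_trials):
        mid = 0.5 * (lo + hi)
        outcome = run_trial(mid)
        if outcome == 0:
            return mid
        if outcome < 0:
            lo = mid   # fell short: try a larger offset
        else:
            hi = mid   # overshot: try a smaller offset
    return 0.5 * (lo + hi)

# Stand-in for the robot: success when the offset is within 0.01 of 0.13 (made-up numbers).
def fake_trial(offset):
    error = offset - 0.13
    return 0 if abs(error) < 0.01 else (1 if error > 0 else -1)

delta = binary_search_offset(fake_trial, 0.0, 1.0)
```

The same loop would apply to the velocity coefficient in the nonparametric case, with the coefficient taking the place of the angle offset.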


Results


Summary of Technique

Technique                                 Succeeds for
Watch demo, mimic hand                    None
Learn model, optimize demo trajectory     Parametric, single
Tune model, reoptimize                    Nonparametric, single
Binary search for delta                   Parametric, double
Binary search for c                       Nonparametric, double

Math

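$\dot\theta_{k+1} = (1 - \alpha_1)\,\dot\theta_k + \alpha_2\,(\sin\theta_k + \ddot{x}_k \cos\theta_k / g)$

$\dot\theta_{k+1} = f(\theta_k, \dot\theta_k, x_k, \dot{x}_k, \ddot{x}_k)$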

Tct /1


Discussion points

• Reward function was given or learned?
• Does task-level direct learning make sense?
– Only useful in this task / implementation?
– I in PID?

• Nonparametrics don’t avoid all modeling errors
– Poor planner?
– Not enough data?

• A priori knowledge
– Human selects inputs, outputs, control system, perception, model selection, reward function, task segmenting, task factors

• It Works!
