Learning to Control an Octopus Arm with Gaussian Process...
Transcript of Learning to Control an Octopus Arm with Gaussian Process...
Learning to Control an Octopus Arm
with Gaussian Process Temporal
Difference Methods
Yaakov Engel
Collaborators: Peter Szabo and Dmitry Volkinshtein
Bayesian RL Tutorial 2/16
The Octopus Arm
Can bend and twist at any point
Can do this in any direction
Can be elongated and shortened
Can change cross section
Can grab using any part of the arm
Virtually infinitely many DOF
Bayesian RL Tutorial 3/16
Octopus Arm Anatomy 101
Bayesian RL Tutorial 4/16
Our Arm Model
CN
� �� �� �� �� �� �� �� �� �� �
� �� �� �� �� �� �� �� �
C1
� �� ��
����
����
��
���� �� !
"#$%&'
()*+
, ,, ,, ,- -- -- -. .. .. .. ./ // // // / 0 00 00 01 11 11 1
2 22 22 23 33 33 3
ventral side
dorsal side
pair #1
pair #N+1
longitudinal muscle
longitudinal muscle
transverse muscle
transverse muscle
arm tip
arm base
Bayesian RL Tutorial 5/16
The Muscle Model
� � � �� � � �� � � �� � � �� � � �
� � � �� � � �� � � �� � � �� � � �
f(a) = (k0 + a(kmax − k0)) (` − `0) + βd`
dt
a ∈ [0, 1]
Bayesian RL Tutorial 6/16
Other Forces
• Gravity
• Buoyancy
• Water drag
• Internal pressures (maintain constant compartmental volume)
Dimensionality
10 compartments ⇒
22 point masses × (x, y, x, y)
= 88 state variables
Bayesian RL Tutorial 7/16
The Control Problem
Starting from a random position, bring {any part, tip} of arm
into contact with a goal region, optimally.
Optimality criteria:
Time, energy, obstacle avoidance
Constraint:
We only have access to sampled trajectories
Our approach:
Define problem as a MDP
Apply Reinforcement Learning algorithms
Bayesian RL Tutorial 8/16
The Task
−0.1 −0.05 0 0.05 0.1 0.15
−0.1
−0.05
0
0.05
0.1
0.15
t = 1.38
Bayesian RL Tutorial 9/16
Actions
Each action specifies a set of fixed activations –
one for each muscle in the arm.
Action # 1 Action # 2 Action # 3
Action # 4 Action # 5 Action # 6
Base rotation adds duplicates of actions 1,2,4 and 5 with
positive and negative torques applied to the base.
Bayesian RL Tutorial 10/16
Rewards
Deterministic rewards:
+10 for a goal state,
Large negative value for obstacle hitting,
-1 otherwise.
Energy economy:
A constant multiple of the energy expended by the muscles in
each action interval was deducted from the reward.
Bayesian RL Tutorial 11/16
Now, to the Movies...
Bayesian RL Tutorial 12/16
Fixed Base Task I
Bayesian RL Tutorial 13/16
Fixed Base Task II
Bayesian RL Tutorial 14/16
Rotating Base Task I
Bayesian RL Tutorial 15/16
Rotating Base Task II
Bayesian RL Tutorial 16/16