
Tactile Sensing and Deep Reinforcement Learning for In-Hand Manipulation Tasks

Andrew Melnik1, Luca Lach1, Matthias Plappert2, Timo Korthals1, Robert Haschke1, and Helge Ritter1

Abstract— Deep Reinforcement Learning techniques demonstrate advances in the domain of robotics. One of the limiting factors is the large number of interaction samples usually required for training in simulated and real-world environments. In this work, we demonstrate that tactile information substantially increases sample efficiency for training (by 97% on average) and simultaneously increases performance in dexterous in-hand object-manipulation tasks (by 21% on average). To examine the role of tactile-sensor parameters in these improvements, we conducted experiments with varied sensor-measurement accuracy (Boolean vs. float values) and varied spatial resolution of the tactile sensors (92 sensors vs. 16 sensors on the hand). We conclude that ground-truth touch-sensor readings as well as dense tactile resolution do not further improve performance and sample efficiency in the tasks. We make these touch-sensor extensions available as part of the OpenAI Gym robotics Shadow-Dexterous-Hand environments.

Fig. 1. Shadow Dexterous Hand equipped with fabric-based tactile sensors in the palm and finger phalanges (indicated green) and fingertip sensors realized by Molded Interconnect Devices (indicated yellow) [1, 2].

Fig. 2. 92 touch sensors covering the Shadow Dexterous Hand model. This is a technical visualization to represent the essence of our model. Red sites represent activated touch sensors, where a block is pressing against the touch-sensitive area. Green sites represent inactive touch sensors. A video demonstration of the extended environments can be found at https://rebrand.ly/TouchSensors.

This work was supported by the Cluster of Excellence Cognitive Interaction Technology (CITEC), which is funded by the DFG (EXC 277).

1 CITEC, Bielefeld University, 33619 Bielefeld, Germany. [email protected]

2 OpenAI, San Francisco, USA

I. INTRODUCTION

Deep Reinforcement Learning techniques demonstrate advances in the domain of robotics, for example, dexterous in-hand manipulation of objects with an anthropomorphic robotic hand [3, 4]. An agent with a model-free policy was able to learn complex in-hand manipulation tasks using just proprioceptive feedback and visual information about the manipulated object. Despite the good performance of the agent, the absence of tactile sensing imposes certain limitations on these approaches. In cases such as insufficient lighting or partial visibility of the manipulated object due to occlusion (e.g., by the manipulator itself), tactile sensing may be necessary to perform an in-hand manipulation task [5]. Continuous haptic feedback can also improve grasp acquisition in terms of robustness under uncertainty [6]. In contrast to these works with static objects, we present empirical results in simulation showing that including tactile information in the state improves sample efficiency and performance for dynamic in-hand manipulation of objects.

Humans perform better in dexterous in-hand manipulation tasks than robotic systems. One of the reasons is the rich tactile perception available to humans, which allows them to recognize and manipulate an object even without vision, relying on tactile information alone [7]. In such cases, tactile perception is one of the key abilities for in-hand manipulation of objects and tool usage. The importance of tactile sensing was demonstrated for object recognition with a multi-fingered robotic hand [8] as well as for successful grasping with a two-fingered gripper equipped with high-resolution tactile sensors [9].

Multisensory fusion [10] techniques may help at the level of geometry, contact, and force physics. For example, an autoencoder can translate high-dimensional sensor representations into a compact space, and a reinforcement learner can then learn stable, non-linear policies [11]. Learning curves for tactile representations learned with different types of autoencoders can be found in [11].

Recent works describe approaches to bring tactile sensing to anthropomorphic hands like the Shadow Dexterous Hand, by providing integrated tactile fingertips [1], as shown in Fig. 1, and by constructing a flexible tactile skin [2]. The tactile skin comprises stretchable and flexible, fabric-based tactile sensors capable of capturing typical human interaction forces within the palm and the proximal and distal phalanges of the hand. This enables the hand to exploit tactile information, e.g., for contact or slip detection [12, 13]. The distribution of tactile sensors in these works resembles our segmentation of the simulated Shadow Dexterous Hand

into 92 tactile-sensitive areas. Tian et al. [14] proposed a deep tactile model-predictive control framework for non-prehensile manipulation, performing tactile servoing and repositioning an object to user-specified configurations, indicated by a goal tactile reading, using the learned tactile predictive model. Van Hoof et al. [15] studied how tactile feedback can be exploited to adapt to unknown rollable objects located on a table and demonstrated the possibility of learning feedback controllers for in-hand manipulation using reinforcement learning on an underactuated compliant platform. The feedback controller was hand-tuned to successfully complete the tasks.

Another challenge is the sample efficiency of learning with model-free DRL methods. Assuming the importance of tactile information for in-hand manipulation of objects, it is necessary to measure how this sensory information influences learning and overall performance in in-hand manipulation tasks. We selected the well-established and benchmarked OpenAI Gym simulated robotics environments, which include in-hand manipulation of objects with an anthropomorphic hand. We extended these environments with touch sensors and compared the steepness of the learning curves with and without tactile information. We demonstrate that sample efficiency and performance can be increased when tactile information is available to the agent. To fully examine the role of tactile sensing, we provide an analysis of sensor-measurement accuracy, a highly important aspect of using tactile sensing on physical robots, and examine the role of the spatial resolution of the tactile sensors on the hand. These experiments provide beneficial knowledge for those looking to build robots with tactile sensors for manipulation.

A logical next step is to employ such hands to study the impact of tactile information on manual skill learning. With the current sample efficiency of model-free reinforcement learning, this will require a combined strategy that connects simulation and real-world learning stages. As a step towards such an approach, and to generate insights about how the availability of touch sensing impacts manual skill learning with model-free DRL approaches, we here present the results of a simulation study, particularly with respect to sample efficiency and performance level. To this end, we (i) extended the OpenAI Gym robotics environments with touch sensors designed to closely mimic the touch sensing of the robot hand [1, 2], and (ii) compared learning results for the OpenAI Gym robotics environments with and without touch sensing. We find for all four learning tasks a significant increase in sample efficiency along with improved performance of the trained agents.

II. METHODS

OpenAI Gym [16] contains several simulated robotics environments with the Shadow Dexterous Hand. These environments use the MuJoCo [17] physics engine for fast and accurate simulation. They are open-source environments designed for experimental work and research with Deep Reinforcement Learning. The anthropomorphic Shadow Dexterous Hand model, comprising 24 degrees of freedom (20 actuated and 4 coupled), has to manipulate an object (block,

TABLE I: New OpenAI Gym robotics environments with touch sensors; each is available as -v0 (Boolean) and -v1 (float-value). https://github.com/openai/gym

- HandManipulateBlockRotateZTouchSensors
- HandManipulateBlockRotateParallelTouchSensors
- HandManipulateBlockRotateXYZTouchSensors
- HandManipulateBlockTouchSensors
- HandManipulateEggRotateTouchSensors
- HandManipulateEggTouchSensors
- HandManipulatePenRotateTouchSensors
- HandManipulatePenTouchSensors

TABLE II: The 92 and 16 touch-sensor environments.

| Hand part | 92-sensor layout | 16-sensor layout |
|---|---|---|
| lower phalanx of the fingers (4x) | 7 sensors x 4 | 1 sensor x 4 |
| middle phalanxes of the fingers (4x) | 5 sensors x 4 | 1 sensor x 4 |
| tip phalanxes of the fingers (4x) | 5 sensors x 4 | 1 sensor x 4 |
| thumb phalanxes (3x) | 5 sensors x 3 | 1 sensor x 3 |
| palm (1x) | 9 sensors x 1 | 1 sensor x 1 |
| All touch sensors | 92 sensors | 16 sensors |

egg, or pen) so that it matches a given goal orientation, position, or both position and orientation.
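A minimal usage sketch for loading one of the touch-sensor environments from Table I; it assumes the classic Gym API (reset() returning the observation dict) and MuJoCo being installed, and only inspects shapes rather than asserting specific values.

```python
# Sketch: create a touch-sensor environment listed in Table I
# (-v1 suffix: float touch readings, -v0: Boolean readings).
import gym

env = gym.make("HandManipulateBlockTouchSensors-v1")
obs = env.reset()  # goal-based envs return a dict observation (classic Gym API)

print(obs["observation"].shape)   # proprioception + object state + touch readings
print(obs["desired_goal"].shape)  # 7-dimensional target pose (position + quaternion)
print(env.action_space.shape)     # 20 actuated degrees of freedom

env.close()
```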

As a step towards touch-augmented RL, we extended the Shadow Dexterous Hand model with touch sensors, available as new environments (Table I) in the OpenAI Gym package [16]. We covered all five fingers and the palm of the Shadow Dexterous Hand model with 92 touch sensors (Fig. 2, Table II). For the agent, the only difference between the robotics environments with and without touch sensors is the length of the state vector that the agent receives as input at each time step. For the original OpenAI Gym simulated robotics environments (Table I) without touch information, the state vector is 68-dimensional (Table III) [4]. In the environments with 92 touch sensors, the state vector is 160-dimensional (68+92). As an additional experiment, we grouped the 92 sensors into 16 sub-groups (Table II) to reduce the tactile sensory resolution ("16 Sensors-v0"). If any sensor in a group has a value greater than zero, the sub-group returns True, otherwise False. The grouping was done per phalanx (3 phalanges x 5 digits) plus the palm, resulting in 16 sub-groups (Table II). In the environments with 16 touch-sensor sub-groups, the state vector is 84-dimensional

TABLE III: Neural network input vector in the hand environments.

| Type | Length |
|---|---|
| Joint angles | 24 |
| Joint angles' velocities | 24 |
| Object's position XYZ | 3 |
| Object's velocity XYZ | 3 |
| Object's orientation (quaternion) | 4 |
| Object's angular velocities | 3 |
| Target position of the object XYZ | 3 |
| Target orientation of the object (quaternion) | 4 |
| All values without touch sensors | 68 |
| Touch sensors | 92 |
| All values with touch sensors | 160 |

(68+16). For a given state of the environment, a trained policy outputs an action vector of 20 float numbers used for position control (actuation center + action * actuation range) of the 20 actuated degrees of freedom.
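A minimal sketch of the 92-to-16 reduction described above. The grouping itself follows Table II, but the concrete ordering of the 92 sensor channels assumed here is illustrative.

```python
import numpy as np

# Illustrative channel layout following Table II: per finger 7+5+5 sensors,
# per thumb phalanx 5 sensors, palm 9 sensors. The actual channel ordering
# in the environment is an assumption here.
GROUP_SIZES = [7, 5, 5] * 4 + [5, 5, 5] + [9]   # 4 fingers, thumb, palm -> 16 groups
assert sum(GROUP_SIZES) == 92 and len(GROUP_SIZES) == 16

def group_touch_readings(touch_92: np.ndarray) -> np.ndarray:
    """Collapse 92 float touch readings into 16 Boolean sub-group readings:
    a sub-group is True if any of its sensors reports a force > 0."""
    groups, start = [], 0
    for size in GROUP_SIZES:
        groups.append(np.any(touch_92[start:start + size] > 0.0))
        start += size
    return np.asarray(groups, dtype=bool)

# Example with random placeholder readings.
touch_16 = group_touch_readings(np.random.rand(92) * (np.random.rand(92) > 0.8))
print(touch_16.shape)  # (16,)
```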

The MuJoCo [17] physics engine provides methods to mimic touch sensing at specified locations. This is based on specifying the tactile sensors' active zones by so-called sites. Each site can be represented as either an ellipsoid or a box. In Fig. 2, the sites are visualized as red and green transparent shapes attached to the hand model. If a body's contact point falls within a site's volume and involves a geometry attached to the same body as the site, the corresponding contact force is included in the sensor reading. The output of such a sensor is a non-negative scalar. At runtime, the contact normal forces of all included contact points of a single zone are added and returned as the simulated output of the corresponding sensor [18, 19].
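The fragment below illustrates this mechanism; it is not taken from the actual model XML. The site and sensor names, sizes, and positions are hypothetical, and the sensordata access path assumes a mujoco_py-based environment.

```python
# Illustrative MuJoCo model fragment (hypothetical names/values): a box-shaped
# site defines the active zone on a phalanx body, and a <touch> sensor
# attached to that site reports the summed contact normal force inside it.
TOUCH_SENSOR_XML = """
<body name="robot0:ffdistal">
  <site name="robot0:T_ffdistal" type="box" size="0.009 0.004 0.013"
        pos="0 -0.004 0.018" rgba="0 1 0 0.3"/>
</body>
<sensor>
  <touch name="robot0:TS_ffdistal" site="robot0:T_ffdistal"/>
</sensor>
"""

# At runtime the simulated touch readings are exposed as one flat array of
# non-negative scalars; with a mujoco_py-based Gym environment this is
# (assumption) reachable via:
# touch_values = env.unwrapped.sim.data.sensordata
```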

The learning agent consists of an actor and a critic network, each with 3 layers of 256 units and ReLU non-linearities. We evaluate the learning using the Deep Deterministic Policy Gradient (DDPG) [20] and Hindsight Experience Replay (HER) [21] techniques. All hyperparameters and the training procedure are described in detail in [21, 4] and are available in a code implementation as part of the OpenAI Baselines (HER) repository (https://github.com/openai/baselines) [22].
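The training relies on the Baselines HER/DDPG implementation; the following is an illustrative PyTorch rendering of the stated architecture (3 hidden layers of 256 ReLU units), assuming PyTorch as the framework, and is not the exact Baselines code.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """3 hidden layers of 256 ReLU units; tanh output for bounded actions."""
    def __init__(self, obs_dim: int, goal_dim: int, act_dim: int = 20):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + goal_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, act_dim), nn.Tanh(),
        )

    def forward(self, obs, goal):
        return self.net(torch.cat([obs, goal], dim=-1))

class Critic(nn.Module):
    """Q(s, g, a): same 3x256 ReLU trunk with a scalar output."""
    def __init__(self, obs_dim: int, goal_dim: int, act_dim: int = 20):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + goal_dim + act_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, obs, goal, act):
        return self.net(torch.cat([obs, goal, act], dim=-1))
```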

For all environments, we train on a single machine with 19 CPU cores. Each core generates experience using two parallel rollouts and uses MPI for synchronization. We train for 200 epochs, which amounts to a total of 38 x 10^6 timesteps. We evaluate the performance after each epoch by performing 10 deterministic test rollouts per MPI worker and then compute the test success rate by averaging across rollouts and MPI workers. A test is counted as successful when, at the end of the test rollout, the manipulated object is in a rewarded state as determined by the reward function. The training environments, the original ones and the newly contributed touch-augmented ones as listed in Table I, are available as part of OpenAI Gym (https://github.com/openai/gym) [22]. In all cases, we repeat an experiment with five different random seeds (4, 20, 1299, 272032, 8112502) and report results by computing the median test success rate as well as the interquartile range.
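A minimal sketch of how the reported statistics can be aggregated, assuming a per-rollout success array indexed by seed, epoch, worker, and rollout (this array layout is an assumption, not the Baselines code).

```python
import numpy as np

# success[seed, epoch, worker, rollout] -> True if the final state was rewarded.
# Shapes follow the text: 5 seeds, 200 epochs, 19 MPI workers, 10 test rollouts.
success = np.random.rand(5, 200, 19, 10) < 0.5   # placeholder data

# Per-seed test success rate: average over rollouts and MPI workers.
success_rate = success.mean(axis=(2, 3))          # shape (5, 200)

# Reported statistics: median and interquartile range across the 5 seeds.
median_curve = np.median(success_rate, axis=0)            # curves in Fig. 3
q25, q75 = np.percentile(success_rate, [25, 75], axis=0)  # shaded areas
```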

Following [4], in all tasks we use rewards that are sparse and binary: the agent obtains a reward of 0 if the goal has been achieved (within some task-specific tolerance) and -1 otherwise. Actions are 20-dimensional: we use absolute position control for all non-coupled joints of the hand. We apply the same action in 20 subsequent simulator steps (with Δt = 0.002 s each) before returning control to the agent, i.e., the agent's action frequency is f = 25 Hz. Observations include the 24 positions and velocities of the robot's joints. In the case of an object that is being manipulated, we also include its Cartesian position and rotation represented by a quaternion (hence 7-dimensional) as well as its linear and angular velocities. In the case of a "TouchSensors-v0" environment with touch sensors, we also include a vector

(92 Boolean values) representing tactile information. In the case of a "TouchSensors-v1" environment with touch sensors, we include a vector (92 float values) representing tactile information.

Manipulate Block. A block is placed on the palm of the hand. The task is to manipulate the block such that a target pose is achieved. The goal is 7-dimensional and includes the target position (in Cartesian coordinates) and target rotation (as a quaternion). We include multiple variants:

HandManipulateBlockFullTouchSensors: Random target rotation for all axes of the block. Random target position.

HandManipulateBlockRotateXYZTouchSensors: Random target rotation for all axes of the block. No target position. A goal is considered achieved if the distance between the manipulated object's position and its desired position is less than 1 cm (applicable only in the Full variant) and the difference in rotation is less than 0.1 rad.

Manipulate Egg. An egg-shaped object is placed on the palm of the hand. The goal is 7-dimensional and includes the target position (in Cartesian coordinates) and target rotation (as a quaternion).

HandManipulateEggFullTouchSensors: Random target rotation for all axes of the egg. Random target position. A goal is considered achieved if the distance between the manipulated object's position and its desired position is less than 1 cm and the difference in rotation is less than 0.1 rad.

Manipulate Pen. A pen-shaped object is placed on the palm of the hand. The goal is 7-dimensional and includes the target position (in Cartesian coordinates) and target rotation (as a quaternion).

HandManipulatePenRotateTouchSensors: Random target rotation about the x and y axes of the pen and no target rotation around the z axis. No target position. A goal is considered achieved if the difference in rotation, ignoring the z axis, is less than 0.1 rad.
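The goal checks above share a common structure: a position tolerance of 1 cm (where a target position is used) and a rotation tolerance of 0.1 rad. The sketch below is illustrative; the helper name and the quaternion-based angle computation are assumptions, not the environments' exact code, and the pen variant's exclusion of z-axis rotation is not implemented here.

```python
import numpy as np

def goal_achieved(pos, quat, goal_pos, goal_quat,
                  pos_tol=0.01, rot_tol=0.1, check_position=True):
    """Success test sketched from the text: position error < 1 cm (if a target
    position is used) and rotation difference < 0.1 rad."""
    pos_ok = (not check_position) or np.linalg.norm(pos - goal_pos) < pos_tol
    # Angle between two unit quaternions: theta = 2 * arccos(|<q1, q2>|).
    dot = np.clip(np.abs(np.dot(quat, goal_quat)), -1.0, 1.0)
    rot_ok = 2.0 * np.arccos(dot) < rot_tol
    return pos_ok and rot_ok
```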

For the sake of brevity, further details about the training procedure, reward function, goal-aware observation space, and neural network parameters are available in [4], since our main contribution focuses on the extension of the existing Shadow Dexterous Hand model by tactile sensors.

III. EXPERIMENTAL RESULTS

We tested how tactile information helps to increase performance and sample efficiency of training on the object-manipulation tasks. To this end, we reproduced the experiments where no touch sensors were used [4] and conducted the same experiments with additional touch-sensor readings. To provide insights about how different aspects of tactile information (accuracy, tactile resolution) influence learning and performance, we conducted three experiments. In the first experiment, we added float-value readings from the 92 sensors to the state (red curves in Fig. 3). This experiment can be reproduced in the OpenAI Gym robotics environments ending in "TouchSensors-v1". In the second experiment, we added Boolean-value readings from the same 92 sensors to the state (black curves in Fig. 3). This experiment can be reproduced in the OpenAI Gym robotics environments

TABLE IV: Average performance over epochs 150-200 (Fig. 3) with and without tactile information (left four columns); performance-increase ratio of learning with tactile information in comparison to learning without tactile information (right three columns).

| Environment | NoSensors (avg. perf.) | 92 Sensors-v1 (avg. perf.) | 92 Sensors-v0 (avg. perf.) | 16 Sensors-v0 (avg. perf.) | 92 Sensors-v1 (perf. incr.) | 92 Sensors-v0 (perf. incr.) | 16 Sensors-v0 (perf. incr.) |
|---|---|---|---|---|---|---|---|
| HandManipulateEgg | 0.74 | 0.83 | 0.85 | 0.85 | 1.12 | 1.15 | 1.15 |
| HandManipulateBlockRotateXYZ | 0.92 | 0.94 | 0.94 | 0.93 | 1.02 | 1.02 | 1.01 |
| HandManipulatePenRotate | 0.28 | 0.32 | 0.31 | 0.30 | 1.14 | 1.11 | 1.07 |
| HandManipulateBlock | 0.19 | 0.30 | 0.30 | 0.29 | 1.58 | 1.58 | 1.53 |
| Mean | | | | | 1.22 | 1.22 | 1.19 |

ending in "TouchSensors-v0". In the third experiment, we grouped the 92 sensors into 16 sub-groups (Table II) to reduce the tactile sensory resolution. If any sensor in a group has a Boolean True value, the sub-group returns True, otherwise False. The grouping was done per phalanx (3 phalanges x 5 digits) plus the palm, resulting in 16 sub-groups (Table II). Thus, in the third experiment, we added Boolean-value readings from the 16 sub-groups to the state (green curves in Fig. 3).

Fig. 3 and the results in Table IV demonstrate that tactile information increases the performance of the agent in the tasks. We define the 100% performance level as the average over the last 50 epochs (epochs 150-200) of the median test success rate in the original "NoSensors" environments (blue curves in Fig. 3). To compare the performance-increase ratio of learning with tactile information to learning without

[Fig. 3: four panels, HandManipulateEgg (HME), HandManipulateBlockRotateXYZ (HMBRXYZ), HandManipulatePenRotate (HMPR), and HandManipulateBlock (HMB), each plotting the median test success rate over training epochs 0-200 for the NoSensors, 92 Sensors-v1, 92 Sensors-v0, and 16 Sensors-v0 variants.]

Fig. 3. Curves: median test success rate. Shaded areas: interquartile range [five random seeds]. Blue curves: learning without tactile information (NoSensors); red curves: learning with float-value tactile readings (Sensors-v1) from 92 sensors; black curves: learning with Boolean-value tactile readings (Sensors-v0) from 92 sensors; green curves: learning with Boolean-value tactile readings (Sensors-v0) from 16 sensor sub-groups.

TABLE V: Left four columns: first epoch at which a learning curve (Fig. 3) reaches the average performance (epochs 150-200) obtained without tactile information. Right three columns: sample efficiency of learning with tactile information in comparison to learning without tactile information.

| Environment | NoSensors (first epoch) | 92 Sensors-v1 (first epoch) | 92 Sensors-v0 (first epoch) | 16 Sensors-v0 (first epoch) | 92 Sensors-v1 (sample eff.) | 92 Sensors-v0 (sample eff.) | 16 Sensors-v0 (sample eff.) |
|---|---|---|---|---|---|---|---|
| HandManipulateEgg | 136 | 77 | 59 | 58 | 1.77 | 2.31 | 2.35 |
| HandManipulateBlockRotateXYZ | 85 | 56 | 60 | 75 | 1.52 | 1.42 | 1.13 |
| HandManipulatePenRotate | 108 | 34 | 40 | 56 | 3.18 | 2.70 | 1.93 |
| HandManipulateBlock | 113 | 62 | 60 | 69 | 1.82 | 1.88 | 1.64 |
| Mean | | | | | 2.07 | 2.08 | 1.76 |

tactile information, we divide the average performance over the last 50 epochs (epochs 150-200) of the three experiments ("92 Sensors-v1", "92 Sensors-v0", "16 Sensors-v0") by the average performance in the original experiment ("NoSensors"). In each experiment, we observe on average a more than 1.19 times better performance (Table IV) when tactile information is available.

The results in Table V demonstrate the increase in sample efficiency during training when tactile information is available. To compare the sample efficiency of learning with and without tactile information, we measured how many training epochs were necessary to reach the performance level of an agent trained without tactile information in an environment. For example, in the HandManipulateBlock environment, the "NoSensors" performance level equals 0.19 (Table IV) and is reached after 113 epochs (Table V) when training without tactile information. When training with float-value tactile readings (92 Sensors-v1), the agent reaches the "NoSensors" performance level of 0.19 for the first time already after 62 epochs (Table V), which results in a 1.82 times increase (Table V) in sample efficiency for training. When training with Boolean-value tactile readings (92 Sensors-v0), the agent reaches the "NoSensors" performance level of 0.19 for the first time after 60 epochs, which results in a 1.88 times increase in sample efficiency. When training with Boolean-value tactile readings at lower resolution (16 Sensors-v0, Table II), the agent reaches the "NoSensors" performance level of 0.19 for the first time after 69 epochs, which results in a 1.64 times increase in sample efficiency. In each experiment, we observe on average a more than 1.76 times faster convergence (Table V) when tactile information is available.
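The ratios reported in Tables IV and V are plain quotients of the tabulated values; for example, for HandManipulateBlock with 92 Sensors-v1:

```python
# How the ratios in Tables IV and V are obtained (HandManipulateBlock,
# 92 Sensors-v1); values taken from the tables above.
no_sensors_perf, sensors_perf = 0.19, 0.30
no_sensors_epochs, sensors_epochs = 113, 62

performance_increase = sensors_perf / no_sensors_perf    # ~1.58 (Table IV)
sample_efficiency = no_sensors_epochs / sensors_epochs   # ~1.82 (Table V)
```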

IV. DISCUSSION

In this work, we study the inclusion of tactile information into model-free deep reinforcement learning. We consider tasks with dynamic in-hand manipulation of known objects by the Shadow-Dexterous-Hand model. We observe that feedback from tactile sensors improves performance and sample efficiency for training. Next to vision, tactile sensing is a crucial source of information for the manipulation of objects and tool usage, for humans and robots alike [23, 24, 25, 26]. Visual reconstruction of contact areas suffers from occlusion of those areas by the manipulator [27]. Better inference of the geometry, mass, contact, and force physics of manipulated objects is possible when touch information is available. Tactile sensing thus can provide essential information for manual interaction and grasping, making it one of the key components for better generalization. Tactile information also allows training of gentle manipulation of fragile objects. It is a challenging objective to tap the potential of touch for dexterous robot hands; this requires combining hardware and software developments for robot-hand touch sensing.

In this work, we concatenated tactile, proprioceptive, and visual information at the input level of a neural network. A possible further extension of this work is multi-modal sensor fusion. Multi-modal sensor fusion [10] allows end-to-end training of Bayesian information fusion on raw data for all subsets of a sensor setup. It can potentially deliver better performance and more sample-efficient training with model-free deep reinforcement learning approaches.

V. CONCLUSIONS

In this work, we introduce touch-sensor extensions to the OpenAI Gym [16] robotics Shadow-Dexterous-Hand environments [4], modeled after our touch-sensor developments [1, 2]. We find that adding tactile information substantially increases sample efficiency for training (by 97% on average, Table V) and performance (by 21% on average, Table IV) in these environments when training with deep reinforcement learning techniques [21]. To examine the role of tactile-sensor parameters in these improvements, we conducted experiments (Fig. 3) with varied sensor-measurement accuracy (Boolean vs. float values) and varied spatial resolution of the tactile sensors (92 sensors vs. 16 sensors on the hand). We conclude that accurate sensory readings as well as dense tactile resolution do not substantially improve performance and sample efficiency when training with deep reinforcement learning techniques, in comparison to Boolean sensor readings and sparse sensor localization (one sensor per phalanx); performance and sample efficiency for training are similar in these cases. These experiments provide beneficial knowledge for those looking to build robots with tactile sensors for manipulation.

REFERENCES

[1] Risto Koiva et al. "A highly sensitive 3D-shaped tactile sensor". In: 2013 IEEE/ASME International Conference on Advanced Intelligent Mechatronics. IEEE. 2013, pp. 1084-1089.
[2] Gereon H. Buscher et al. "Flexible and stretchable fabric-based tactile sensor". In: Robotics and Autonomous Systems 63 (2015), pp. 244-252.
[3] Marcin Andrychowicz et al. "Learning dexterous in-hand manipulation". In: arXiv preprint arXiv:1808.00177 (2018).
[4] Matthias Plappert et al. "Multi-goal reinforcement learning: Challenging robotics environments and request for research". In: arXiv preprint arXiv:1802.09464 (2018).
[5] Timo Korthals et al. "Multisensory Assisted In-hand Manipulation of Objects with a Dexterous Hand". In: juxi.net (2019).
[6] Hamza Merzic et al. "Leveraging Contact Forces for Learning to Grasp". In: 2019 International Conference on Robotics and Automation (ICRA). IEEE. 2019, pp. 3615-3621.
[7] Susan J. Lederman and Roberta L. Klatzky. "Hand movements: A window into haptic object recognition". In: Cognitive Psychology 19.3 (1987), pp. 342-368.
[8] Alexander Schmitz et al. "Tactile object recognition using deep learning and dropout". In: 2014 IEEE-RAS International Conference on Humanoid Robots. IEEE. 2014, pp. 1044-1050.
[9] Roberto Calandra et al. "More than a feeling: Learning to grasp and regrasp using vision and touch". In: IEEE Robotics and Automation Letters 3.4 (2018), pp. 3300-3307.
[10] Timo Korthals et al. "Jointly Trained Variational Autoencoder for Multi-Modal Sensor Fusion". In: 22nd International Conference on Information Fusion (FUSION) 2019, Ottawa, CA, July 2-5, 2019. 2019.
[11] Herke Van Hoof et al. "Stable reinforcement learning with autoencoders for tactile and visual data". In: 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE. 2016, pp. 3928-3934.
[12] Martin Meier et al. "Distinguishing sliding from slipping during object pushing". In: 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE. 2016, pp. 5579-5584.
[13] Guillaume Walck et al. "Robot self-protection by virtual actuator fatigue: Application to tendon-driven dexterous hands during grasping". In: 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE. 2017, pp. 2200-2205.
[14] Stephen Tian et al. "Manipulation by feel: Touch-based control with deep predictive models". In: arXiv preprint arXiv:1903.04128 (2019).
[15] Herke Van Hoof et al. "Learning robot in-hand manipulation with tactile features". In: 2015 IEEE-RAS 15th International Conference on Humanoid Robots (Humanoids). IEEE. 2015, pp. 121-127.
[16] Greg Brockman et al. OpenAI Gym. 2016. eprint: arXiv:1606.01540.
[17] Emanuel Todorov, Tom Erez, and Yuval Tassa. "MuJoCo: A physics engine for model-based control". In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE. 2012, pp. 5026-5033.
[18] Emanuel Todorov. Sensor: touch. http://www.mujoco.org/book/XMLreference.html#sensor-touch. 2019.
[19] Emanuel Todorov. MPL sensors. http://www.mujoco.org/book/haptix.html#mpSensors. 2019.
[20] Timothy P. Lillicrap et al. "Continuous control with deep reinforcement learning". In: arXiv preprint arXiv:1509.02971 (2015).
[21] Marcin Andrychowicz et al. "Hindsight experience replay". In: Advances in Neural Information Processing Systems. 2017, pp. 5048-5058.
[22] Prafulla Dhariwal et al. "OpenAI Baselines". In: GitHub repository (2017).
[23] Jonathan Maycock et al. "Approaching manual intelligence". In: KI-Künstliche Intelligenz 24.4 (2010), pp. 287-294.
[24] Roland S. Johansson and Göran Westling. "Roles of glabrous skin receptors and sensorimotor memory in automatic control of precision grip when lifting rougher or more slippery objects". In: Experimental Brain Research 56.3 (1984), pp. 550-564.
[25] Hao Dang, Jonathan Weisz, and Peter K. Allen. "Blind grasping: Stable robotic grasping using tactile feedback and hand kinematics". In: ICRA. 2011, pp. 5917-5922.
[26] Markus Fritzsche, Norbert Elkmann, and Erik Schulenburg. "Tactile sensing: A key technology for safe physical human robot interaction". In: 2011 6th ACM/IEEE International Conference on Human-Robot Interaction (HRI). IEEE. 2011, pp. 139-140.
[27] Timo Korthals et al. "Multisensory Assisted In-hand Manipulation of Objects with a Dexterous Hand". In: 2019 IEEE International Conference on Robotics and Automation, Workshop on Integrating Vision and Touch for Multimodal and Cross-modal Perception (ViTac), Montreal, CA, May 20-25, 2019. 2019.

Page 2: Tactile Sensing and Deep Reinforcement Learning for In-Hand … · 2019. 10. 31. · Tactile Sensing and Deep Reinforcement Learning for In-Hand Manipulation Tasks Andrew Melnik 1,

into 92 tactile-sensitive areas [14] proposed a deep tac-tile model-predictive control framework for non-prehensilemanipulation to perform tactile servoing and to repositionan object to user-specified configurations indicated by agoal tactile reading using the learned tactile predictivemodel [15] studied how tactile feedback can be exploitedto adapt to unknown rollable objects located on a table anddemonstrated the possibility of learning feedback controllersfor in-hand manipulation using reinforcement learning on anunderactuated compliant platform The feedback controllerwas hand-tuned to successfully complete the tasks

Another challenge is the sample efficiency in learningof model free DRL methods Assuming the importance oftactile information for in-hand manipulation of objects itis necessary to measure how sensory information influenceslearning and overall performance for in-hand manipulationtasks We selected well established and benchmarked Ope-nAI Gym simulated environments for robotics which includein-hand manipulation of objects with anthropomorphic-handtasks We extended these environments with touch sensorsand compared the steepness of the learning curves with andwithout tactile information We demonstrate that the sampleefficiency and performance can be increased when tactileinformation is available to the agent To fully examine theroll of tactile sensing we are providing analysis of sensormeasurement accuracy - a highly important aspect in usingtactile sensing on physical robots and examining the roleof the spatial resolution of the tactile sensors on the handThese experiments provide beneficial knowledge for thoselooking to build robots with tactile sensors for manipulation

A logical next step is to employ such hands to study theimpact of tactile information on manual skill learning Withthe current sample efficiency of model free reinforcementlearning this will require a combined strategy that connectssimulation and real-world learning stages As a step towardssuch an approach and to generate insights about how theavailability of touch sensing impacts on manual skill learningwith model free DRL approaches we here present results ofa simulation study particularly with respect to the factorssample efficiency and performance level To this end we (i)extended the OpenAI Gym robotics environments with touchsensors designed to closely mimic the touch sensing of therobot hand [1][2] (ii) compared learning results for OpenAIGym robotics environments with and without touch sensingWe find for all four learning tasks a significantly increase insampling efficiency along with improved performance of thetrained agents

II METHODS

OpenAI Gym [16] contains several simulated roboticsenvironments with the Shadow Dexterous Hand These en-vironments use the MuJoCo [17] physics engine for fastand accurate simulation These are open source environmentsdesigned for experimental work and research with Deep Re-inforcement Learning The anthropomorphic Shadow Dex-terous Hand model comprising 24 degrees of freedom (20actuated and 4 coupled) has to manipulate an object (block

TABLE INEW OPENAI GYM ROBOTICS ENVIRONMENTS WITH TOUCH SENSORS

-V0 (BOOLEAN) -V1 (FLOAT-VALUE)HTTPSGITHUBCOMOPENAIGYM

HandManipulateBlockRotateZTouchSensorsHandManipulateBlockRotateParallelTouchSensorsHandManipulateBlockRotateXYZTouchSensors

HandManipulateBlockTouchSensorsHandManipulateEggRotateTouchSensors

HandManipulateEggTouchSensorsHandManipulatePenRotateTouchSensors

HandManipulatePenTouchSensors

TABLE IITHE 92 AND 16 TOUCH-SENSOR ENVIRONMENTS

lower phalanx of the fingers (4x) 7 sensors x 4 1 sensor x 4middle phalanxes of the fingers (4x) 5 sensors x 4 1 sensor x 4

tip phalanxes of the fingers (4x) 5 sensors x 4 1 sensor x 4thumb phalanxes (3x) 5 sensors x 3 1 sensor x 3

palm (1x) 9 sensors x 1 1 sensor x 1All touch sensors 92 sensors 16 sensors

egg or pen) so that it matches a given goal orientationposition or both position and orientation

As a step towards touch-augmented RL we extended theShadow Dexterous Hand model with touch sensors avail-able as new environments (Table I) in the OpenAI Gympackage [16] We covered all five fingers and the palm ofthe Shadow Dexterous Hand model with 92 touch sensors(Fig 2 Table II) For the agent the only difference betweenthe robotics environments with and without touch sensorsis the length of the state vector that the agent gets asan input at each time step For the original OpenAI Gymsimulated environments for robotics (Table I) without touchinformation the state vector is 68-dimensional (Table III) [4]In the environments with 92 touch sensors the state vector is160-dimensional (68+92) As an additional experiment wegrouped 92 sensors into 16 sub-groups (Table II) to reducethe tactile sensory resolution (rdquo16 Sensors-v0rdquo) If any ofsensors in a group has a greater than zero value then thesub-group returns True otherwise False The grouping wasdone per phalanx (3 phalanx x 5 digits) plus a palm resultingin 16 sub-groups (Table II) In the environments with 16touch sensors sub-groups the state vector is 84-dimensional

TABLE IIINEURAL NETWORK INPUT VECTOR IN THE HAND ENVIRONMENTS

Type Length

Joint angles 24Joint angleslsquo velocity 24

Objectlsquos position XYZ 3Objectlsquos velocity XYZ 3

Objectlsquos orientation (quaternion) 4Objectlsquos angular velocities 3

Target position of the object XYZ 3Target orientation of the object XYZ (quaternion) 4

All values without touch sensors 68Touch sensors 92

All values with touch sensors 160

(68+16) For a given state of the environment a trainedpolicy outputs an action vector of 20 float numbers used forposition-control (actuation center + action actuation range)of the 20 actuated degrees of freedom

The MuJoCo [17] physics engine provides methods tomimic touch sensing at specified locations This is basedon specifying the tactile sensorsrsquo active zones by so-calledsites Each site can be represented as either ellipsoid or boxIn Fig 2 the sites are visualized as red and green transparentshapes attached to the hand model If a bodyrsquos contact pointfalls within a sitersquos volume and involves a geometry attachedto the same body as the site the corresponding contact forceis included in the sensor reading The output of this sensoris non-negative scalar At runtime the contact normal forcesof all included contact points of a single zone are added andreturned as the simulated output of the corresponding sensor[18][19]

The learning agent in the environments is an Actor andCritic network 3 layers with 256 units each and ReLUnon-linearities We evaluate the learning using Deep De-terministic Policy Gradients (DDPG) [20] and HindsightExperience Replay (HER) [21] techniques All hyperpa-rameters and the training procedure are described in de-tail in [21 4] and are available in a code implementa-tion as a part of the OpenAI Baselines (HER) repository(httpsgithubcomopenaibaselines)[22]

For all environments we train on a single machine with 19CPU cores Each core generates experience using two paral-lel rollouts and uses MPI for synchronization We train for200 epochs which amounts to a total of 38x106 timestepsWe evaluate the performance after each epoch by performing10 deterministic test rollouts per MPI worker and thencompute the test success rate by averaging across rolloutsand MPI workers A test is counted as successful when atthe end of the test rollout the manipulated object occurs in arewarded state determined by the reward function The train-ing environments - the original and the newly contributedtouch-augmented ones as listed in Table I - are available aspart of OpenAI-Gym (httpsgithubcomopenaigym) [22]In all cases we repeat an experiment with five differentrandom seeds (4 20 1299 272032 8112502) and reportresults by computing the median test success rate as well asthe interquartile range

Following [4] in all tasks we use rewards that are sparseand binary The agent obtains a reward of 0 if the goalhas been achieved (within some task-specific tolerance) and1 otherwise Actions are 20-dimensional we use absoluteposition control for all non-coupled joints of the hand Weapply the same action in 20 subsequent simulator steps (witht = 0002 s each) before returning control to the agent iethe agentrsquos action frequency is f = 25 Hz Observationsinclude the 24 positions and velocities of the robotrsquos jointsIn case of an object that is being manipulated we alsoinclude its Cartesian position and rotation represented bya quaternion (hence 7-dimensional) as well as its linearand angular velocities In case of a rdquoTouchSensors-v0rdquoenvironment with touch-sensors we also include a vector

(92 Boolean values) representing tactile information In caseof a rdquoTouchSensors-v1rdquo environment with touch-sensorswe include a vector (92 float values) representing tactileinformation

Manipulate Block A block is placed on the palm of thehand The task is to manipulate the block such that a targetpose is achieved The goal is 7-dimensional and includes thetarget position (in Cartesian coordinates) and target rotation(in quaternions) We include multiple variants

HandManipulateBlockFullTouchSensors Random targetrotation for all axes of the block Random target position

HandManipulateBlockRotateXYZTouchSensors Randomtarget rotation for all axes of the block No target positionA goal is considered achieved if the distance between themanipulated objects position and its desired position is lessthan 1 cm (applicable only in the Full variant) and thedifference in rotation is less than 01 rad

Manipulate Egg An egg-shaped object is placed on thepalm of the hand The goal is 7-dimensional and includes thetarget position (in Cartesian coordinates) and target rotation(in quaternions)

HandManipulateEggFullTouchSensors Random target ro-tation for all axes of the egg Random target position Agoal is considered achieved if the distance between themanipulated objects position and its desired position is lessthan 1 cm and the difference in rotation is less than 01 rad

Manipulate Pen A pen-shaped object is placed on thepalm of the hand The goal is 7-dimensional and includes thetarget position (in Cartesian coordinates) and target rotation(in quaternions)

HandManipulatePenRotateTouchSensors Random targetrotation x and y axes of the pen and no target rotation aroundthe z axis No target position A goal is considered achievedif the difference in rotation ignoring the z axis is less than01 rad

For the sake of brevity further details about trainingprocedure reward function goal-aware observation spaceand neural network parameters are available in [4] since ourmain contribution focuses on the extension of the existingShadow Dexterous Hand model by tactile sensors

III EXPERIMENTAL RESULTS

We tested how tactile information helps to increase per-formance and samples efficiency of training on the objectmanipulation tasks To this end we have reproduced theexperiments where no touch sensors were used [4] andconducted the same experiments with additional touch-sensorreadings To provide insights about how different aspectsof tactile information (accuracy tactile resolution) influencelearning and performance we conducted three experimentsIn the first experiment we added float-value readings from92 sensors to the state (red curves in Fig 3) This experimentcan be reproduced in the OpenAI-gym-robotics environmentsending at rdquoTouchSensors-v1rdquo In the second experimentwe added Boolean-value reading from the same 92 sensorsto the state (black curves in Fig 3) The experiment canbe reproduced in the OpenAI-gym-robotics environments

TABLE IVAVERAGE PERFORMANCE IN THE RANGE OR 150-200 EPOCHS (FIG 3)WITH AND WITHOUT TACTILE INFORMATION (LEFT FOUR COLUMNS)

PERFORMANCE-INCREASE RATIO OF LEARNING WITH TACTILE

INFORMATION IN COMPARISON TO LEARNING WITHOUT TACTILE

INFORMATION (RIGHT THREE COLUMNS)

No

Sens

ors

(ave

per

form

)

92Se

nsor

s-v1

(ave

per

form

)

92Se

nsor

s-v0

(ave

per

form

)

16Se

nsor

s-v0

(ave

per

form

)

92Se

nsor

s-v1

(per

form

inc

r)

92Se

nsor

s-v0

(per

form

inc

r)

16Se

nsor

s-v0

(per

form

inc

r)

HandManipulateEgg 074 083 085 085 112 115 115

HandManipulateBlockRoatateXYZ 092 094 094 093 102 102 101

HandManipulatePenRotate 028 032 031 030 114 111 107

HandManipulateBlock 019 030 030 029 158 158 153

Mean 122 122 119

ending at rdquoTouchSensors-v0rdquo In the third experiment wegrouped 92 sensors into 16 sub-groups (Table II) to reducethe tactile sensory resolution If any of sensors in a grouphas a Boolen-True value then the sub-group returns Trueotherwise False The grouping was done per phalanx (3phalanx x 5 digits) plus a palm resulting in 16 sub-groups(Table II) Thus in the third experiment we added Boolean-value readings from the 16 sub-groups to the state (greencurves in Fig 3)

Fig 3 and results in Table IV demonstrate that tactileinformation increases performance of the agent in the tasksWe define 100 performance level as the average over thelast 50 epochs (epochs 150-200) of median test success ratein the original rdquoNoSensorsrdquo environments (blue curves inFig 3) To compare performance-increase ratio of learningwith tactile information in comparison to learning without

0 50 100 150 200Epoch

00

02

04

06

08

10

Med

ian

Test

Suc

cess

Rat

e

HandManipulateEgg (HME)

HME NoSensorsHME 92 Sensors-v1HME 92 Sensors-v0HME 16 Sensors-v0

0 50 100 150 200Epoch

00

02

04

06

08

10

Med

ian

Test

Suc

cess

Rat

e

HandManipulateBlockRotateXYZ (HMBRXYZ)

HMBRXYZ NoSensorsHMBRXYZ 92 Sensors-v1HMBRXYZ 92 Sensors-v0HMBRXYZ 16 Sensors-v0

0 50 100 150 200Epoch

00

02

04

Med

ian

Test

Suc

cess

Rat

e

HandManipulatePenRotate (HMPR)

HMPR NoSensorsHMPR 92 Sensors-v1HMPR 92 Sensors-v0HMPR 16 Sensors-v0

0 50 100 150 200Epoch

00

02

04

Med

ian

Test

Suc

cess

Rat

e

HandManipulateBlock (HMB)

HMB NoSensorsHMB 92 Sensors-v1HMB 92 Sensors-v0HMB 16 Sensors-v0

[4 1299 8112502 20 272032]

Fig 3 Curves - median test success rate Shaded areas - interquartilerange [five random seeds] Blue curves - learning without tactile infor-mation (NoSensors) red curves - learning with float-value tactile readings(Sensors-v1) from 92 sensors black curves - learning with Boolean-valuetactile readings (Sensors-v0) from 92 sensors green curves - learning withBoolean-value tactile readings (Sensors-v0) from 16 sensor sub-groups

TABLE VLEFT FOUR COLUMNS FIRST EPOCH WHEN A CONVERGENCE CURVE

(FIG 3) REACHES THE AVERAGE PERFORMANCE (EPOCHS 150-200)WITHOUT TACTILE INFORMATION RIGHT THREE COLUMNS SAMPLE

EFFICIENCY OF LEARNING WITH TACTILE INFORMATION IN

COMPARISON TO LEARNING WITHOUT TACTILE INFORMATION

No

Sens

ors

(firs

tep

och)

92Se

nsor

s-v1

(firs

tep

och)

92Se

nsor

s-v0

(firs

tep

och)

16Se

nsor

s-v0

(firs

tep

och)

92Se

nsor

s-v1

(sam

ple

eff)

92Se

nsor

s-v0

(sam

ple

eff)

16Se

nsor

s-v0

(sam

ple

eff)

HandManipulateEgg 136 77 59 58 177 231 235

HandManipulateBlockRoatateXYZ 85 56 60 75 152 142 113

HandManipulatePenRotate 108 34 40 56 318 270 193

HandManipulateBlock 113 62 60 69 182 188 164

Mean 207 208 176

tactile information we divide the average performance overthe last 50 epochs (epochs 150-200) of the three experiments(rdquo92 Sensors-v1rdquo rdquo92 Sensors-v0rdquo rdquo16 Sensors-v0rdquo) by theaverage performance in the original experiment (rdquoNoSen-sorsrdquo) In each experiment we observe on average morethan 119 times better performance (Table IV) when tactileinformation is available

Results in Table V demonstrate sample-efficiency increasewhile training when tactile information is available Tocompare the sample efficiency for learning with and with-out tactile information we measured how many trainingepochs were necessary to reach the performance level of anagent trained without tactile information in an environmentFor example in the HandManipulateBlock environment therdquoNoSensorsrdquo performance level equals 019 (Table IV) andis reached after 113 epochs (Table V) when training withouttactile information When training with float-value tactilereadings (92 Sensors-v1) the agent reaches the rdquoNoSensorsrdquo-performance level of 019 first time already after 62 epochs(Table V) which results in the 182 times increase (Ta-ble V) in sample efficiency for training When training withBoolean-values tactile readings (92 Sensors-v0) the agentreaches the rdquoNoSensorsrdquo-performance level of 019 first timealready after 60 epochs which results in the 188 timesincrease in sample efficiency for training When training withBoolean-values tactile readings with lower resolution (16Sensors-v0)(Table II) the agent reaches the rdquoNoSensorsrdquo-performance level of 019 first time already after 69 epochswhich results in the 164 times increase in sample efficiencyfor training In each experiment we observe on average morethan 176 times faster convergence (Table V) when tactileinformation is available (Table V)

IV DISCUSSION

In this work we study the inclusion of tactile informationinto model-free deep reinforcement learning We considertasks with dynamic in-hand manipulation of known objectsby the Shadow-Dexterous-Hand model We observe thatfeedback from tactile sensors improves performance andsample efficiency for training Next to vision tactile sensingis a crucial source of information for manipulation of objectsand tool usage for humans and robots [23 24 25 26] Visualreconstruction of contact areas suffers from occlusions by themanipulator at the contact areas [27] Better inferring thegeometry mass contact and force physics of manipulatedobjects is possible when touch information is availableTactile sensing thus can provide essential information formanual interaction and grasping making it one of keycomponents for better generalization Tactile informationalso allows training of gentle manipulation of fragile objectsIt is a challenging objective to tap the potential of touch fordexterous robot hands This requires to combine hardwareand software developments for robot hand touch sensing

In this work we concatenated tactile proprioceptive andvisual information at the input level of a neural networkA possible further extension of this work is a multi-modalsensor fusion The multi-modal sensor fusion [10] allowsend-to-end training of Bayesian information fusion on rawdata for all subsets of a sensor setup It can potentially deliverbetter performance and more sample efficient training withmodel-free deep reinforcement learning approaches

V CONCLUSIONS


(68+16). For a given state of the environment, a trained policy outputs an action vector of 20 float numbers used for position control (actuation center + action × actuation range) of the 20 actuated degrees of freedom.
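For concreteness, this mapping can be sketched as follows. This is a minimal illustration assuming the usual convention of the Gym robotics hand environments, where actions in [-1, 1] are scaled into each actuator's control range; the variable names are illustrative, not the exact implementation:

```python
import numpy as np

def action_to_ctrl(action, ctrl_range):
    """Map a 20-dimensional policy action in [-1, 1] to absolute position targets.

    ctrl_range: array of shape (20, 2) holding (low, high) per actuated joint.
    Implements ctrl = actuation_center + action * actuation_range.
    """
    action = np.clip(action, -1.0, 1.0)
    actuation_center = (ctrl_range[:, 1] + ctrl_range[:, 0]) / 2.0
    actuation_range = (ctrl_range[:, 1] - ctrl_range[:, 0]) / 2.0
    ctrl = actuation_center + action * actuation_range
    return np.clip(ctrl, ctrl_range[:, 0], ctrl_range[:, 1])
```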

The MuJoCo [17] physics engine provides methods to mimic touch sensing at specified locations. This is based on specifying the tactile sensors' active zones by so-called sites. Each site can be represented as either an ellipsoid or a box. In Fig. 2, the sites are visualized as red and green transparent shapes attached to the hand model. If a body's contact point falls within a site's volume and involves a geometry attached to the same body as the site, the corresponding contact force is included in the sensor reading. The output of this sensor is a non-negative scalar. At runtime, the contact normal forces of all included contact points of a single zone are added and returned as the simulated output of the corresponding sensor [18, 19].
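As an illustration of how such simulated readings might be consumed, the sketch below reads the summed contact forces from MuJoCo's sensordata array and optionally binarizes them. It assumes a mujoco-py style simulation handle, and the index lookup is simplified (in practice the sensors would be looked up by name):

```python
import numpy as np

def read_touch_sensors(sim, touch_sensor_ids, boolean=True, threshold=1e-5):
    """Return tactile readings for the given touch sensors.

    sim.data.sensordata holds one non-negative scalar per defined sensor
    (the summed contact normal forces within the sensor site).
    touch_sensor_ids: positions of the touch sensors within sensordata.
    """
    forces = np.asarray(sim.data.sensordata)[touch_sensor_ids]
    if boolean:
        return (forces > threshold).astype(np.float32)  # "Sensors-v0" style
    return forces.astype(np.float32)                    # "Sensors-v1" style
```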

The learning agent in the environments consists of Actor and Critic networks with 3 layers of 256 units each and ReLU non-linearities. We evaluate the learning using the Deep Deterministic Policy Gradients (DDPG) [20] and Hindsight Experience Replay (HER) [21] techniques. All hyperparameters and the training procedure are described in detail in [21, 4] and are available in a code implementation as part of the OpenAI Baselines (HER) repository (https://github.com/openai/baselines) [22].
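The network shape can be sketched as follows. This is a PyTorch illustration of the architecture described above, not the actual Baselines implementation (which is in TensorFlow); the dimensions shown are placeholders:

```python
import torch.nn as nn

def mlp(in_dim, out_dim, out_act=None):
    # Three hidden layers of 256 units with ReLU non-linearities.
    layers = [nn.Linear(in_dim, 256), nn.ReLU(),
              nn.Linear(256, 256), nn.ReLU(),
              nn.Linear(256, 256), nn.ReLU(),
              nn.Linear(256, out_dim)]
    if out_act is not None:
        layers.append(out_act)
    return nn.Sequential(*layers)

obs_dim, goal_dim, act_dim = 153, 7, 20               # placeholder dimensions
actor = mlp(obs_dim + goal_dim, act_dim, nn.Tanh())   # action in [-1, 1]^20
critic = mlp(obs_dim + goal_dim + act_dim, 1)         # scalar Q-value
```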

For all environments, we train on a single machine with 19 CPU cores. Each core generates experience using two parallel rollouts and uses MPI for synchronization. We train for 200 epochs, which amounts to a total of 38 × 10^6 timesteps. We evaluate the performance after each epoch by performing 10 deterministic test rollouts per MPI worker and then compute the test success rate by averaging across rollouts and MPI workers. A test is counted as successful when, at the end of the test rollout, the manipulated object occurs in a rewarded state determined by the reward function. The training environments (the original and the newly contributed touch-augmented ones, as listed in Table I) are available as part of OpenAI-Gym (https://github.com/openai/gym) [22]. In all cases, we repeat an experiment with five different random seeds (4, 20, 1299, 272032, 8112502) and report results by computing the median test success rate as well as the interquartile range.
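The reported curves can be summarized with a few lines of NumPy; this sketch assumes an array of per-epoch test success rates with one row per random seed:

```python
import numpy as np

def summarize_runs(success_rates):
    """success_rates: shape (n_seeds, n_epochs), e.g. (5, 200).

    Returns the median test success rate per epoch and the
    interquartile range, as plotted in Fig. 3.
    """
    median = np.median(success_rates, axis=0)
    q25, q75 = np.percentile(success_rates, [25, 75], axis=0)
    return median, (q25, q75)
```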

Following [4], in all tasks we use rewards that are sparse and binary. The agent obtains a reward of 0 if the goal has been achieved (within some task-specific tolerance) and -1 otherwise. Actions are 20-dimensional: we use absolute position control for all non-coupled joints of the hand. We apply the same action in 20 subsequent simulator steps (with Δt = 0.002 s each) before returning control to the agent, i.e. the agent's action frequency is f = 25 Hz. Observations include the 24 positions and velocities of the robot's joints. In case of an object that is being manipulated, we also include its Cartesian position and rotation represented by a quaternion (hence 7-dimensional) as well as its linear and angular velocities. In case of a "TouchSensors-v0" environment with touch sensors, we also include a vector of 92 Boolean values representing tactile information. In case of a "TouchSensors-v1" environment with touch sensors, we include a vector of 92 float values representing tactile information.
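A minimal sketch of how such an observation vector could be assembled, with the touch readings appended in either Boolean ("-v0") or float ("-v1") form; the names and the binarization threshold are illustrative:

```python
import numpy as np

def build_observation(joint_pos, joint_vel, object_pose, object_vel,
                      touch, boolean_touch=True, threshold=1e-5):
    """Concatenate proprioception, object state and tactile readings.

    joint_pos, joint_vel: 24 values each; object_pose: 7 (position + quaternion);
    object_vel: 6 (linear + angular); touch: 92 (or 16) raw sensor forces.
    """
    touch = np.asarray(touch, dtype=np.float32)
    if boolean_touch:
        touch = (touch > threshold).astype(np.float32)
    return np.concatenate([joint_pos, joint_vel, object_pose, object_vel, touch])
```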

Manipulate Block. A block is placed on the palm of the hand. The task is to manipulate the block such that a target pose is achieved. The goal is 7-dimensional and includes the target position (in Cartesian coordinates) and target rotation (in quaternions). We include multiple variants:

HandManipulateBlockFullTouchSensors: Random target rotation for all axes of the block. Random target position.

HandManipulateBlockRotateXYZTouchSensors: Random target rotation for all axes of the block. No target position. A goal is considered achieved if the distance between the manipulated object's position and its desired position is less than 1 cm (applicable only in the Full variant) and the difference in rotation is less than 0.1 rad.

Manipulate Egg. An egg-shaped object is placed on the palm of the hand. The goal is 7-dimensional and includes the target position (in Cartesian coordinates) and target rotation (in quaternions).

HandManipulateEggFullTouchSensors: Random target rotation for all axes of the egg. Random target position. A goal is considered achieved if the distance between the manipulated object's position and its desired position is less than 1 cm and the difference in rotation is less than 0.1 rad.

Manipulate Pen. A pen-shaped object is placed on the palm of the hand. The goal is 7-dimensional and includes the target position (in Cartesian coordinates) and target rotation (in quaternions).

HandManipulatePenRotateTouchSensors: Random target rotation around the x and y axes of the pen and no target rotation around the z axis. No target position. A goal is considered achieved if the difference in rotation, ignoring the z axis, is less than 0.1 rad.
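The success criteria above can be expressed compactly; the sketch below is a simplified check that assumes unit quaternions and ignores the z-axis exception of the PenRotate variant:

```python
import numpy as np

def rotation_difference(q_achieved, q_goal):
    """Angle in rad between two unit quaternions."""
    d = abs(float(np.dot(q_achieved, q_goal)))      # |cos(theta / 2)|
    return 2.0 * np.arccos(np.clip(d, -1.0, 1.0))

def is_success(pos_achieved, pos_goal, q_achieved, q_goal,
               check_position=True, pos_tol=0.01, rot_tol=0.1):
    """Goal check: position within 1 cm (Full variants) and rotation within 0.1 rad.

    The sparse reward is 0 on success and -1 otherwise.
    """
    ok_rot = rotation_difference(q_achieved, q_goal) < rot_tol
    ok_pos = (np.linalg.norm(np.asarray(pos_achieved) - np.asarray(pos_goal)) < pos_tol
              if check_position else True)
    return ok_pos and ok_rot
```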

For the sake of brevity, further details about the training procedure, reward function, goal-aware observation space, and neural network parameters are available in [4], since our main contribution focuses on the extension of the existing Shadow Dexterous Hand model by tactile sensors.

III. EXPERIMENTAL RESULTS

We tested how tactile information helps to increase performance and sample efficiency of training on the object manipulation tasks. To this end, we reproduced the experiments where no touch sensors were used [4] and conducted the same experiments with additional touch-sensor readings. To provide insights about how different aspects of tactile information (accuracy, tactile resolution) influence learning and performance, we conducted three experiments. In the first experiment, we added float-value readings from 92 sensors to the state (red curves in Fig. 3). This experiment can be reproduced in the OpenAI-Gym robotics environments ending in "TouchSensors-v1". In the second experiment, we added Boolean-value readings from the same 92 sensors to the state (black curves in Fig. 3). This experiment can be reproduced in the OpenAI-Gym robotics environments ending in "TouchSensors-v0".
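For reference, reproducing one of these runs amounts to instantiating the corresponding environment; the exact registered IDs are assumed here to follow the naming pattern described in the text:

```python
import gym

# "-v1" exposes float touch readings, "-v0" Boolean ones (assumed ID pattern).
env = gym.make("HandManipulateEggTouchSensors-v1")
obs = env.reset()
# Goal-based observation dict: proprioception + object state + tactile vector.
print(obs["observation"].shape, obs["desired_goal"].shape)
```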

TABLE IV
Average performance in the range of epochs 150-200 (Fig. 3) with and without tactile information (left four columns); performance-increase ratio of learning with tactile information in comparison to learning without tactile information (right three columns).

Environment                  | NoSensors (avg. perform.) | 92 Sensors-v1 (avg. perform.) | 92 Sensors-v0 (avg. perform.) | 16 Sensors-v0 (avg. perform.) | 92 Sensors-v1 (perform. incr.) | 92 Sensors-v0 (perform. incr.) | 16 Sensors-v0 (perform. incr.)
HandManipulateEgg            | 0.74 | 0.83 | 0.85 | 0.85 | 1.12 | 1.15 | 1.15
HandManipulateBlockRotateXYZ | 0.92 | 0.94 | 0.94 | 0.93 | 1.02 | 1.02 | 1.01
HandManipulatePenRotate      | 0.28 | 0.32 | 0.31 | 0.30 | 1.14 | 1.11 | 1.07
HandManipulateBlock          | 0.19 | 0.30 | 0.30 | 0.29 | 1.58 | 1.58 | 1.53
Mean                         |      |      |      |      | 1.22 | 1.22 | 1.19

In the third experiment, we grouped the 92 sensors into 16 sub-groups (Table II) to reduce the tactile sensory resolution. If any sensor in a group has a Boolean True value, the sub-group returns True, otherwise False. The grouping was done per phalanx (3 phalanges × 5 digits) plus the palm, resulting in 16 sub-groups (Table II). Thus, in the third experiment, we added Boolean-value readings from the 16 sub-groups to the state (green curves in Fig. 3).
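A minimal sketch of this grouping, assuming the 92 Boolean readings and one index list per phalanx/palm sub-group (the concrete index assignment of Table II is not reproduced here):

```python
import numpy as np

def group_touch_sensors(touch_boolean, groups):
    """Reduce 92 Boolean touch readings to 16 sub-group readings.

    groups: list of 16 index arrays, one per phalanx (3 phalanges x 5 digits)
    plus the palm. A sub-group is True if any of its member sensors is True.
    """
    touch_boolean = np.asarray(touch_boolean, dtype=bool)
    return np.array([touch_boolean[idx].any() for idx in groups],
                    dtype=np.float32)
```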

Fig. 3 and the results in Table IV demonstrate that tactile information increases the performance of the agent in the tasks. We define the 100% performance level as the average, over the last 50 epochs (epochs 150-200), of the median test success rate in the original "NoSensors" environments (blue curves in Fig. 3).

[Fig. 3: four panels of learning curves, one per task: HandManipulateEgg (HME), HandManipulateBlockRotateXYZ (HMBRXYZ), HandManipulatePenRotate (HMPR), HandManipulateBlock (HMB). x-axis: Epoch (0-200); y-axis: Median Test Success Rate.]

Fig. 3. Curves: median test success rate. Shaded areas: interquartile range over five random seeds (4, 20, 1299, 272032, 8112502). Blue curves: learning without tactile information (NoSensors); red curves: learning with float-value tactile readings (Sensors-v1) from 92 sensors; black curves: learning with Boolean-value tactile readings (Sensors-v0) from 92 sensors; green curves: learning with Boolean-value tactile readings (Sensors-v0) from 16 sensor sub-groups.

TABLE V
Left four columns: first epoch at which a convergence curve (Fig. 3) reaches the average performance (epochs 150-200) achieved without tactile information. Right three columns: sample efficiency of learning with tactile information in comparison to learning without tactile information.

Environment                  | NoSensors (first epoch) | 92 Sensors-v1 (first epoch) | 92 Sensors-v0 (first epoch) | 16 Sensors-v0 (first epoch) | 92 Sensors-v1 (sample eff.) | 92 Sensors-v0 (sample eff.) | 16 Sensors-v0 (sample eff.)
HandManipulateEgg            | 136 | 77 | 59 | 58 | 1.77 | 2.31 | 2.35
HandManipulateBlockRotateXYZ | 85  | 56 | 60 | 75 | 1.52 | 1.42 | 1.13
HandManipulatePenRotate      | 108 | 34 | 40 | 56 | 3.18 | 2.70 | 1.93
HandManipulateBlock          | 113 | 62 | 60 | 69 | 1.82 | 1.88 | 1.64
Mean                         |     |    |    |    | 2.07 | 2.08 | 1.76

To compare the performance-increase ratio of learning with and without tactile information, we divide the average performance over the last 50 epochs (epochs 150-200) of the three experiments ("92 Sensors-v1", "92 Sensors-v0", "16 Sensors-v0") by the average performance in the original experiment ("NoSensors"). In each experiment, we observe on average at least 1.19 times better performance (Table IV) when tactile information is available.

The results in Table V demonstrate the increase in sample efficiency during training when tactile information is available. To compare the sample efficiency for learning with and without tactile information, we measured how many training epochs were necessary to reach the performance level of an agent trained without tactile information in an environment. For example, in the HandManipulateBlock environment, the "NoSensors" performance level equals 0.19 (Table IV) and is reached after 113 epochs (Table V) when training without tactile information. When training with float-value tactile readings (92 Sensors-v1), the agent reaches the "NoSensors" performance level of 0.19 for the first time already after 62 epochs (Table V), which results in a 1.82-fold increase in sample efficiency for training. When training with Boolean-value tactile readings (92 Sensors-v0), the agent reaches the "NoSensors" performance level of 0.19 for the first time already after 60 epochs, which results in a 1.88-fold increase in sample efficiency. When training with Boolean-value tactile readings at lower resolution (16 Sensors-v0, Table II), the agent reaches the "NoSensors" performance level of 0.19 for the first time already after 69 epochs, which results in a 1.64-fold increase in sample efficiency. Across the tasks, we observe on average at least 1.76 times faster convergence (Table V) when tactile information is available.
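The two quantities reported in Tables IV and V can be computed from the median success-rate curves as sketched below (assuming 200-epoch curves; the epoch count is 1-based as in the tables):

```python
import numpy as np

def performance_increase(curve_tactile, curve_baseline, last=50):
    """Ratio of average success over the last `last` epochs (Table IV)."""
    return np.mean(curve_tactile[-last:]) / np.mean(curve_baseline[-last:])

def sample_efficiency(curve_tactile, curve_baseline, last=50):
    """First epoch at which each curve reaches the baseline's final level,
    and the resulting speed-up factor (Table V)."""
    level = np.mean(curve_baseline[-last:])
    first_baseline = int(np.argmax(np.asarray(curve_baseline) >= level)) + 1
    first_tactile = int(np.argmax(np.asarray(curve_tactile) >= level)) + 1
    return first_tactile, first_baseline / first_tactile
```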

IV. DISCUSSION

In this work, we study the inclusion of tactile information into model-free deep reinforcement learning. We consider tasks with dynamic in-hand manipulation of known objects by the Shadow Dexterous Hand model. We observe that feedback from tactile sensors improves performance and sample efficiency for training. Next to vision, tactile sensing is a crucial source of information for manipulation of objects and tool usage for humans and robots [23, 24, 25, 26]. Visual reconstruction of contact areas suffers from occlusions by the manipulator at the contact areas [27]. Better inference of the geometry, mass, contact, and force physics of manipulated objects is possible when touch information is available. Tactile sensing thus can provide essential information for manual interaction and grasping, making it one of the key components for better generalization. Tactile information also allows training of gentle manipulation of fragile objects. It is a challenging objective to tap the potential of touch for dexterous robot hands. This requires combining hardware and software developments for robot-hand touch sensing.

In this work, we concatenated tactile, proprioceptive, and visual information at the input level of a neural network. A possible further extension of this work is multi-modal sensor fusion. Multi-modal sensor fusion [10] allows end-to-end training of Bayesian information fusion on raw data for all subsets of a sensor setup. It can potentially deliver better performance and more sample-efficient training with model-free deep reinforcement learning approaches.

V. CONCLUSIONS

In this work, we introduce the touch-sensor extensions to the OpenAI-Gym [16] robotics Shadow-Dexterous-Hand environments [4], modeled after our touch sensor developments [1, 2]. We find that adding tactile information substantially increases sample efficiency for training (by 97% on average, Table V) and performance (by 21% on average, Table IV) in the environments when training with deep reinforcement learning techniques [21]. To examine the role of tactile-sensor parameters in these improvements, we conducted experiments (Fig. 3) with varied sensor-measurement accuracy (Boolean vs. float values) and varied spatial resolution of the tactile sensors (92 sensors vs. 16 sensors on the hand). We conclude that accurate sensory readings as well as dense tactile resolution do not substantially improve performance and sample efficiency when training with deep reinforcement learning techniques, in comparison to Boolean sensor readings and sparse sensor localization (one sensor per phalanx). The performance and sample efficiency for training are similar in these cases. These experiments provide beneficial knowledge for those looking to build robots with tactile sensors for manipulation.

REFERENCES

[1] Risto Koiva et al. "A highly sensitive 3D-shaped tactile sensor". In: 2013 IEEE/ASME International Conference on Advanced Intelligent Mechatronics. IEEE, 2013, pp. 1084-1089.
[2] Gereon H. Buscher et al. "Flexible and stretchable fabric-based tactile sensor". In: Robotics and Autonomous Systems 63 (2015), pp. 244-252.
[3] Marcin Andrychowicz et al. "Learning dexterous in-hand manipulation". In: arXiv preprint arXiv:1808.00177 (2018).
[4] Matthias Plappert et al. "Multi-goal reinforcement learning: Challenging robotics environments and request for research". In: arXiv preprint arXiv:1802.09464 (2018).
[5] Timo Korthals et al. "Multisensory Assisted In-hand Manipulation of Objects with a Dexterous Hand". In: juxi.net (2019).
[6] Hamza Merzic et al. "Leveraging Contact Forces for Learning to Grasp". In: 2019 International Conference on Robotics and Automation (ICRA). IEEE, 2019, pp. 3615-3621.
[7] Susan J. Lederman and Roberta L. Klatzky. "Hand movements: A window into haptic object recognition". In: Cognitive Psychology 19.3 (1987), pp. 342-368.
[8] Alexander Schmitz et al. "Tactile object recognition using deep learning and dropout". In: 2014 IEEE-RAS International Conference on Humanoid Robots. IEEE, 2014, pp. 1044-1050.
[9] Roberto Calandra et al. "More than a feeling: Learning to grasp and regrasp using vision and touch". In: IEEE Robotics and Automation Letters 3.4 (2018), pp. 3300-3307.
[10] Timo Korthals et al. "Jointly Trained Variational Autoencoder for Multi-Modal Sensor Fusion". In: 22nd International Conference on Information Fusion (FUSION), Ottawa, CA, July 2-5, 2019. 2019.
[11] Herke van Hoof et al. "Stable reinforcement learning with autoencoders for tactile and visual data". In: 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2016, pp. 3928-3934.
[12] Martin Meier et al. "Distinguishing sliding from slipping during object pushing". In: 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2016, pp. 5579-5584.
[13] Guillaume Walck et al. "Robot self-protection by virtual actuator fatigue: Application to tendon-driven dexterous hands during grasping". In: 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2017, pp. 2200-2205.
[14] Stephen Tian et al. "Manipulation by feel: Touch-based control with deep predictive models". In: arXiv preprint arXiv:1903.04128 (2019).
[15] Herke van Hoof et al. "Learning robot in-hand manipulation with tactile features". In: 2015 IEEE-RAS 15th International Conference on Humanoid Robots (Humanoids). IEEE, 2015, pp. 121-127.
[16] Greg Brockman et al. OpenAI Gym. 2016. eprint: arXiv:1606.01540.
[17] Emanuel Todorov, Tom Erez, and Yuval Tassa. "MuJoCo: A physics engine for model-based control". In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, 2012, pp. 5026-5033.
[18] Emanuel Todorov. Sensor: touch. http://www.mujoco.org/book/XMLreference.html#sensor-touch. 2019.
[19] Emanuel Todorov. MPL sensors. http://www.mujoco.org/book/haptix.html#mpSensors. 2019.
[20] Timothy P. Lillicrap et al. "Continuous control with deep reinforcement learning". In: arXiv preprint arXiv:1509.02971 (2015).
[21] Marcin Andrychowicz et al. "Hindsight experience replay". In: Advances in Neural Information Processing Systems. 2017, pp. 5048-5058.
[22] Prafulla Dhariwal et al. "OpenAI Baselines". In: GitHub repository (2017).
[23] Jonathan Maycock et al. "Approaching manual intelligence". In: KI - Künstliche Intelligenz 24.4 (2010), pp. 287-294.
[24] Roland S. Johansson and Goran Westling. "Roles of glabrous skin receptors and sensorimotor memory in automatic control of precision grip when lifting rougher or more slippery objects". In: Experimental Brain Research 56.3 (1984), pp. 550-564.
[25] Hao Dang, Jonathan Weisz, and Peter K. Allen. "Blind grasping: Stable robotic grasping using tactile feedback and hand kinematics". In: ICRA. 2011, pp. 5917-5922.
[26] Markus Fritzsche, Norbert Elkmann, and Erik Schulenburg. "Tactile sensing: A key technology for safe physical human robot interaction". In: 2011 6th ACM/IEEE International Conference on Human-Robot Interaction (HRI). IEEE, 2011, pp. 139-140.
[27] Timo Korthals et al. "Multisensory Assisted In-hand Manipulation of Objects with a Dexterous Hand". In: 2019 IEEE International Conference on Robotics and Automation, Workshop on Integrating Vision and Touch for Multimodal and Cross-modal Perception (ViTac), Montreal, CA, May 20-25, 2019. 2019.
