
Markerless visual servoing for humanoid robot platforms

Claudio Fantacci1, Ugo Pattacini1, Vadim Tikhanoff1 and Lorenzo Natale1

Fig. 1. Block representation of the proposed markerless visual servoing framework (steps S1–S4).

ABSTRACT

The recent surge of interest in humanoid robots and their use in private or public contexts has raised the need for robust and resilient techniques for manipulation and interaction tasks [1]. These contexts present real-world challenges in that the environment is unstructured, complex and time varying. Precise and reliable manipulation and interaction tasks can be achieved when accurate knowledge of the end-effector pose is available. This is possible in industrial settings, where similar tasks are repeated over time, in a finely calibrated setup and in a well-known and structured environment. Many humanoid robots, instead: I) are supposed to act in dynamic and unknown environments wherein object poses and shapes are unknown, and II) have unreliable proprioception due to measurement noise, sensor biases, mechanical elasticity of the links and so forth. With these motivations, we propose a framework that addresses I and II in the context of grasping tasks. In particular, we use vision to compensate for the robot’s proprioception errors and, as a result, this refined information allows the design of a visual servoing control [2]–[4] for precise reaching and grasping. Our approach is markerless and makes use of stereo vision information and RGB images. As the main contribution over our previous work [5], we propose an image-based visual servoing approach with decoupled translation and orientation controls.

The framework we propose for markerless visual servoing consists of the following steps (cf. Fig. 1); a schematic sketch of the pipeline is given after the list:

S1. The grasping goal pose is set using the 3D point cloud acquired from stereo vision.
S2. An open-loop phase brings the robot’s end-effector into the proximity of the object and within the cameras’ fields of view; the 3D model-aided particle filter of [5] then estimates the end-effector pose using RGB images.
S3. Visual servoing uses the particle filter output of S2 to reach the goal pose computed in S1.
S4. Reaching completes and the robot grasps the object.
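
The following is a minimal, hypothetical sketch of how steps S1–S4 could be orchestrated. All injected callables (grasp planning, hand-pose estimation, the servoing step, the error metric) and the robot object’s methods are assumed placeholders, not the interfaces of the released implementation.

```python
# Hypothetical orchestration of steps S1-S4; the injected callables and the
# robot methods are illustrative placeholders, not the released C++ API.

def markerless_grasp_pipeline(robot, plan_grasp_pose, estimate_hand_pose,
                              visual_servo_step, pose_error, tolerance=1e-3):
    # S1: grasping goal pose from the stereo point cloud.
    goal_pose = plan_grasp_pose(robot.acquire_stereo_point_cloud())

    # S2: open-loop reach near the object and into the cameras' view, then
    # estimate the hand pose with the 3D model-aided particle filter.
    robot.open_loop_reach(goal_pose)
    hand_pose = estimate_hand_pose(robot.camera_images())

    # S3: closed-loop visual servoing toward the goal pose.
    while pose_error(hand_pose, goal_pose) > tolerance:
        robot.send_cartesian_velocity(visual_servo_step(hand_pose, goal_pose))
        hand_pose = estimate_hand_pose(robot.camera_images())

    # S4: reaching completed, grasp the object.
    robot.close_hand()
```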

Within S2, the 3D model-aided particle filter of [5] estimates the 6D pose (position and orientation) of the robot’s end-effector without the use of markers. To compute our estimates we resort to the Computer Aided Design (CAD) schematics of the robot in order to render 3D mesh models of the end-effector as they would appear from the robot’s viewpoints (cf. Fig. 2). In particular: 1) we render, for each particle, an image of the 3D mesh model of the end-effector as it would appear from the robot’s viewpoints (as in augmented reality contexts); 2) we then use this state representation to directly estimate the 6D pose of the end-effector in the robot operative space using 2D image descriptors. Specifically, we use Histograms of Oriented Gradients (HOG) [6] to compare the rendered images with the robot’s camera images and, as a result, the particles that are more likely to represent the end-effector receive higher weights. Further details and pseudocode of the 3D model-aided sequential importance sampling particle filter (SIS PF) can be found in [5].
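
To make the weighting step concrete, the snippet below sketches how a single rendered hypothesis could be scored against the camera frame with HOG descriptors, here via OpenCV’s HOGDescriptor. The CAD-based renderer is assumed and not shown, and the Gaussian-like mapping from descriptor distance to particle weight is an illustrative assumption rather than the exact likelihood of [5].

```python
import cv2
import numpy as np

# Illustrative HOG configuration for 320x240 frames:
# (window, block, stride, cell, bins); the parameters used in [5] may differ.
hog = cv2.HOGDescriptor((320, 240), (32, 32), (16, 16), (16, 16), 9)


def particle_weight(camera_image, rendered_image, sigma=0.1):
    """Score one particle: compare the HOG descriptor of the camera frame
    with that of the end-effector rendered at the particle's pose hypothesis.
    The exponential mapping below is an assumed likelihood, not that of [5]."""
    d_cam = hog.compute(cv2.cvtColor(camera_image, cv2.COLOR_BGR2GRAY))
    d_ren = hog.compute(cv2.cvtColor(rendered_image, cv2.COLOR_BGR2GRAY))
    distance = np.linalg.norm(d_cam - d_ren)
    return np.exp(-distance ** 2 / (2.0 * sigma ** 2))


def reweight_particles(camera_image, rendered_images, weights):
    """Update and normalize particle weights given one camera frame and one
    rendered image per particle (produced by the assumed CAD renderer)."""
    likelihoods = np.array([particle_weight(camera_image, img)
                            for img in rendered_images])
    weights = np.asarray(weights) * likelihoods
    return weights / weights.sum()
```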

During S3, a visual servoing controller commands the robot’s end-effector to accurately reach a desired pose. A good visual servoing approach for humanoid robots requires the design of a robust and reliable control law and a human-like motion of the upper body. The two main ingredients to design a visual servo control are the goal pose $x^g$ and the current pose of the end-effector $x^e$, which is given by the 3D model-aided particle filter of [5]. The visual servoing objective is to minimize the error

$e_k \triangleq s(x^e_k) - s(x^g) = s^e_k - s^g$,  (1)

where $s^e_k$ and $s^g$ are features representing, respectively, the manipulator pose and the goal pose. Once a feature $s$ is selected, the aim is to design a velocity controller

$\dot{x}^e = -K_e J^\dagger e_k$,  (2)

1 Claudio Fantacci, Ugo Pattacini, Vadim Tikhanoff and Lorenzo Natale are with Istituto Italiano di Tecnologia, iCub Facility, Humanoid Sensing and Perception, Via Morego 30, Genova, Italy. [email protected], [email protected], [email protected], [email protected]


Fig. 2. Left: mechanical model of the iCub right arm. Center: image from the left camera of iCub. Right: rendered image of the right end-effector (hand) of the iCub.

where $K_e > 0$ is a proportional gain, $J \in \mathbb{R}^{m \times 6}$ is the feature Jacobian, or simply Jacobian, and $J^\dagger$ its Moore-Penrose pseudo-inverse.
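
As a concrete reading of (1) and (2), the snippet below computes a 6D velocity command from the current and goal feature vectors and a given feature Jacobian via the Moore-Penrose pseudo-inverse. It is a minimal numerical sketch under these definitions, not the controller of the released implementation, and the gain value is an arbitrary illustrative choice.

```python
import numpy as np


def velocity_command(s_current, s_goal, jacobian, gain=0.5):
    """Proportional visual servoing law (2): x_dot = -K_e * pinv(J) * e_k,
    with e_k = s_current - s_goal as in (1).

    s_current, s_goal : (m,) image-feature vectors (e.g. stacked pixel
                        coordinates of the tracked points).
    jacobian          : (m, 6) feature (image) Jacobian.
    Returns the commanded 6D end-effector velocity (linear and angular).
    """
    error = np.asarray(s_current, dtype=float) - np.asarray(s_goal, dtype=float)
    return -gain * np.linalg.pinv(jacobian) @ error
```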

We adopt the image-based visual servoing approach [4], which is preferable over position-based visual servoing because it allows precise control despite errors in the extrinsic camera parameters. Thus, we use the image-plane coordinates of a set of points to define the feature vector $s$. Unfortunately, in our setting, popular visual servo controllers [4] provide unsatisfactory camera velocity profiles and unexpected translation motion when a rotation is needed to reach the goal pose. As a result, unsuitable Cartesian trajectories of the end-effector are generated (see the dashed lines in Fig. 3). To tackle this problem, we propose a new image-based visual servoing control that provides satisfactory Cartesian trajectories.
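
For reference, the feature Jacobian of such point features can be assembled from the classic interaction matrix of [4], which relates the motion of a normalized image point (x, y) with depth Z to the camera velocity; stacking one 2x6 block per tracked point gives the J of (2), up to the change of frame between camera and end-effector, which is omitted here. The code below is the textbook construction, shown only to make the point-feature formulation explicit.

```python
import numpy as np


def point_interaction_matrix(x, y, Z):
    """2x6 interaction matrix of a point feature, following [4].
    (x, y) are normalized image-plane coordinates and Z is the depth of the
    point in the camera frame."""
    return np.array([
        [-1.0 / Z, 0.0,      x / Z, x * y,        -(1.0 + x ** 2),  y],
        [0.0,     -1.0 / Z,  y / Z, 1.0 + y ** 2, -x * y,          -x],
    ])


def stacked_image_jacobian(points):
    """Stack the 2x6 blocks of several tracked points (x, y, Z) into the
    (2N)x6 feature Jacobian used by the velocity law (2)."""
    return np.vstack([point_interaction_matrix(x, y, Z) for x, y, Z in points])
```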

Fig. 3. Left camera view of four image-plane trajectories performed by the right end-effector using different image Jacobians. The green and red crosses represent, respectively, the initial and final position of the end-effector. The reaching task was carried out in simulation to avoid damaging the robot, and it mainly consists of a translation toward the left and a small positive rotation. The solid black line on the right highlights the end of the image frame, which in our setting is 320 × 240. Note that only the green solid line, representing our image-based visual servoing control, is capable of providing a satisfactory trajectory.

Our approach considers two image-based visual servoing problems to be solved. The first solves for the translation motion assuming the rotation completed. This is equivalent to considering the current pose $x^e_t$ of the end-effector as the combination of the 3D Cartesian component of $x^e$ and of the orientation of $x^g$. Conversely, in the second problem we compute the rotation motion under the assumption of achieved translation, which is equivalent to considering the current pose $x^e_o$ of the end-effector as the combination of the 3D Cartesian component of $x^g$ and of the orientation of $x^e$. We then proceed with the classic approach presented in [4], defining four coplanar 3D points around $x^e_t$ and $x^e_o$ and computing the two image Jacobians $J^e_t$ and $J^e_o$ to synthesize two velocity controllers as defined in (2). The resulting trajectory turns out to be satisfactory, combining decoupled translation and rotation motions. A comparison of the trajectories is shown in Fig. 3.
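
The decoupling can be summarized in code as follows. The sketch assumes poses are given as a 3D position plus a 3x3 rotation matrix, and that two helpers are available for the four coplanar points attached to a pose: features_of(p, R), returning their image-plane feature vector, and jacobian_of(p, R), returning the corresponding stacked image Jacobian. Keeping the linear part of the first command and the angular part of the second is our illustrative way of combining the two controllers; the released implementation may combine them differently.

```python
import numpy as np


def decoupled_servo_step(p_e, R_e, p_g, R_g, features_of, jacobian_of, gain=0.5):
    """One step of the decoupled image-based controller (schematic).

    p_e, R_e : current end-effector position (3,) and rotation (3, 3),
               as estimated by the particle filter.
    p_g, R_g : goal position and rotation.
    Returns (linear_velocity, angular_velocity)."""
    s_goal = features_of(p_g, R_g)

    # Translation problem: the pose x_t^e mixes the current position with
    # the goal orientation (rotation assumed completed).
    s_t = features_of(p_e, R_g)
    v_t = -gain * np.linalg.pinv(jacobian_of(p_e, R_g)) @ (s_t - s_goal)

    # Orientation problem: the pose x_o^e mixes the goal position with the
    # current orientation (translation assumed completed).
    s_o = features_of(p_g, R_e)
    v_o = -gain * np.linalg.pinv(jacobian_of(p_g, R_e)) @ (s_o - s_goal)

    # Combine: translational part of the first command, rotational part of
    # the second (illustrative combination).
    return v_t[:3], v_o[3:]
```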

We tested the effectiveness and robustness of the proposed framework on the iCub humanoid robot platform. We achieved sub-pixel precision while performing reaching and grasping tasks, decreasing the Cartesian error by an order of magnitude with respect to using the pose of the hand provided by the direct kinematics, and reaching millimeter precision. The C++ implementation of our method is freely available on GitHub1.

REFERENCES

[1] J. Bohg, A. Morales, T. Asfour, and D. Kragic, “Data-driven grasp synthesis - A survey,” IEEE Transactions on Robotics, vol. 30, no. 2, pp. 289–309, 2014.

[2] B. Espiau, F. Chaumette, and P. Rives, “A new approach to visual servoing in robotics,” IEEE Transactions on Robotics and Automation, vol. 8, no. 3, pp. 313–326, 1992.

[3] S. Hutchinson, G. D. Hager, and P. I. Corke, “A tutorial on visual servo control,” IEEE Transactions on Robotics and Automation, vol. 12, no. 5, pp. 651–670, 1996.

[4] F. Chaumette and S. Hutchinson, “Visual servo control. I. Basic approaches,” IEEE Robotics & Automation Magazine, vol. 13, no. 4, pp. 82–90, 2006.

[5] C. Fantacci, U. Pattacini, V. Tikhanoff, and L. Natale, “Visual end-effector tracking using a 3D model-aided particle filter for humanoid robot platforms,” in IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, Vancouver, BC, Canada, September 24–28, 2017. arXiv preprint 1703.04771.

[6] N. Dalal and B. Triggs, “Histograms of oriented gradients for human detection,” in IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), vol. 1, pp. 886–893, IEEE, 2005.

1 github.com/robotology/visual-tracking-control/tree/master