Object Recognition on the REEM robot
7/24/2019 Object Recognition on the REEM robot
http://slidepdf.com/reader/full/object-recognition-on-the-reem-robot 1/88
Implementing visual perception tasks for
the REEM robot
Visual pre-grasping pose
Author: Bence Magyar
Supervisors: Jordi Pages, PhD; Dr. Zoltan Istenes, PhD
Barcelona & Budapest, 2012
Master Thesis Computer Science
Eotvos Lorand University Faculty of Informatics - Department of Software Technology and
Methodology
Contents
List of Tables IV
List of Figures V
1 Introduction and background 1
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 REEM introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3 Outline and goal of the thesis . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.4 ROS Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.5 OpenCV . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.6 Computer Vision basics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.7 Grasping problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.8 Visual servoing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2 Object detection survey and State of the Art 10
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.2 Available sensors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.3 Survey work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.4 Brief summary of survey . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
3 Pose estimation of an object 17
3.1 Introduction and overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
3.2 CAD Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.3 Rendering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.4 Edge detection on color image . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.5 Particle filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.6 Feature detection (SIFT) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.7 kNN and RANSAC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.7.1 kNN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.7.2 RANSAC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.8 Implemented application . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.8.1 Learning module . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.8.2 Detector module . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.8.3 Tracker module . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.8.4 State design pattern for node modes . . . . . . . . . . . . . . . . . . . 33
3.9 Pose estimation results and ways for improvement . . . . . . . . . . . . . . . . 35
3.10 Published software, documentation and tutorials . . . . . . . . . . . . . . . . . 35
3.11 Hardware requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.12 Future work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
4 Increasing the speed of pose estimation using image segmentation 38
4.1 Image segmentation problem . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4.2 Segmentation using image processing . . . . . . . . . . . . . . . . . . . . . . 39
4.3 ROS node design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
4.4 Stereo disparity-based segmentation . . . . . . . . . . . . . . . . . . . . . . . 41
4.4.1 Computing depth information from stereo imaging . . . . . . . . . . . 41
4.4.2 Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
4.5 Template-based segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
4.5.1 Template matching . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
4.5.2 Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
4.6 Histogram backprojection-based segmentation . . . . . . . . . . . . . . . . . . 44
4.6.1 Histogram backprojection . . . . . . . . . . . . . . . . . . . . . . . . 44
4.6.2 Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
4.7 Combined results with BLORT . . . . . . . . . . . . . . . . . . . . . . . . . . 46
4.8 Published software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
4.9 Hardware requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
4.10 Future work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
5 Tracking the hand of the robot 49
5.1 Hand tracking problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
5.2 AR Toolkit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
5.3 ESM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
5.4 Aruco . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
5.5 Application examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
5.6 Software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
5.7 Hardware requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
6 Experimental results for visual pre-grasping 56
6.1 Putting it all together: visual servoing architecture . . . . . . . . . . . . . . . . 56
6.2 Tests on the REEM RH2 robot . . . . . . . . . . . . . . . . . . . . . . . . . . 57
6.3 Hardware requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
7 Conclusion 60
7.1 Key results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
7.2 Contribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
7.3 Future work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
8 Bibliography 63
A Appendix 1: Deep survey tables 67
B Appendix 2: Shallow survey tables 71
List of Tables
2.1 Survey summary table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
4.1 Effect of segmentation on detection . . . . . . . . . . . . . . . . . . . . . . . 47
A.1 Filtered deep survey part 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
A.2 Filtered deep survey part 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
A.3 Filtered deep survey part 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
B.1 Wide shallow survey part 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
B.2 Wide shallow survey part 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
B.3 Wide shallow survey part 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
B.4 Wide shallow survey part 4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
B.5 Wide shallow survey part 5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
B.6 Wide shallow survey part 6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
B.7 Wide shallow survey part 7 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
B.8 Wide shallow survey part 8 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
B.9 Wide shallow survey part 9 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
List of Figures
1.1 Willowgarage’s PR2 finishing a search task . . . . . . . . . . . . . . . . . . . 2
1.2 PAL Robotics’ REEM robot . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3 Two nodes in the ROS graph connected through topics . . . . . . . . . . . . . 4
1.4 Real scene with the REEM robot . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.5 Examples for grasping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.6 REEM grasping a juicebox . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.7 Closed loop architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.1 Most common sensor types . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.2 Stereo camera theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.3 LINE-Mod . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.4 ODUFinder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.5 RoboEarth Detector . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.6 ViSP Tracker . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.7 ESM Tracker . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.1 CAD model of a Pringles box in MeshLab . . . . . . . . . . . . . . . . . . . . 20
3.2 Examples of rendering in case of BLORT . . . . . . . . . . . . . . . . . . . . 21
3.3 Image convolution example . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.4 Steps of image processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.5 Particle filter used for localization . . . . . . . . . . . . . . . . . . . . . . . . 25
3.6 Particles visualized on the detected object of BLORT. Greens are valid, reds are invalid particles. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.7 Extracted SIFT and ORB feature points of the same scene . . . . . . . . . . . . 27
3.8 SIFT orientation histogram example . . . . . . . . . . . . . . . . . . . . . . . 28
3.9 Extracted SIFTs. Red SIFTs are not in the codebook, yellow and green ones are
considered as object points, green ones are inliers of the model and yellow ones
are outliers. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.10 Detection result . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.11 Tracking result, the object visible in the image is rendered . . . . . . . . . . . 32
3.12 Diagram of the tracking mode . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.13 Diagram of the singleshot mode . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.14 Screenshots of the ROS wiki documentation . . . . . . . . . . . . . . . . . . . 36
4.1 Example of erosion where the black pixel class was eroded . . . . . . . . . . . 39
4.2 Example of dilation where the black pixel class was dilated . . . . . . . . . . . 40
4.3 ROS node design of segmentation nodes . . . . . . . . . . . . . . . . . . . . . 40
4.4 Parameters exposed through dynamic reconfigure . . . . . . . . . . . . . . . . 41
4.5 Example of stereo vision . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
4.6 Masked, input and disparity images . . . . . . . . . . . . . . . . . . . . . . . 42
4.7 Template-based segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
4.8 The process of histogram backprojection-based segmentation . . . . . . . . . . 44
4.9 Histogram segmentation using a template of the target orange ball . . . . . . . 45
4.10 The segmentation process and BLORT . . . . . . . . . . . . . . . . . . . . . . 46
4.11 Test scenes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
4.12 Screenshot of the ROS wiki documentation . . . . . . . . . . . . . . . . . . . 48
5.1 ARToolkit markers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
5.2 ARToolkit in action . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
5.3 Tests using the ESM ROS wrapper . . . . . . . . . . . . . . . . . . . . . . . . 52
5.4 Example markers of Aruco . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
5.5 Otsu thresholding example . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
5.6 Video of Aruco used for visual servoing. Markers are attached to the hand and to the target object in the Gazebo simulator. . . . . . . . . . . . . . . . . . . . 54
5.7 Tests done with Aruco . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
6.1 Putting it all together: visual servoing architecture . . . . . . . . . . . . . . . . 56
6.2 A perfect result with the juicebox . . . . . . . . . . . . . . . . . . . . . . . . . 57
6.3 An experiment gone wrong . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
6.4 An experiment where tracking was tested . . . . . . . . . . . . . . . . . . . . 58
Acknowledgements
First I would like to thank my family and my girlfriend for their support and patience during
my 5-month journey in science and technology in Barcelona. The same appreciation goes to
my friends. I would also like to say thanks to Eotvos Lorand University and PAL Robotics
for providing the opportunity, as an Erasmus internship, to conduct such research in a foreign
country. Many thanks to my advisors: Jordi Pages, who mentored me at PAL, and Zoltan Istenes; both helped me shape this manuscript and organize my work so that it could be presented. Thumbs up for Thomas Morwald, who was always willing to answer my questions
about BLORT. The conversations and emails exchanged with Ferran Rigual, Julius Adorf and
Dejan Pangercic helped a great deal with my research.
I really enjoyed the friendly environment created by the co-workers and interns of PAL
Robotics, especially Laszlo Szabados, Jordi Pages, Don Joven Agravante, Adolfo Rodriguez,
Enrico Mingo, Hilario Tome, Carmen Lopera and everyone else.
I would also like to give credit to everyone whose work served as a basis for my thesis.
These people are the members of the open source community and the developers of: Ubuntu,
C++, OpenCV, ROS, Texmaker, LaTeX, Qt Creator, GIMP, Inkscape and many more.
Thank you.
Chapter 1
Introduction and background
1.1 Introduction

Even though we are not aware of it, we are already surrounded by robots. The most widely
accepted definition of a robot is a machine that is automated in order to help its owner by
completing certain tasks. Robots need not have a human form, as one might assume; humanoid
robots are simply bigger and more complex than their counterparts. A humanoid robot could
replace humans in various hazardous situations where a human form is still required, for
example when the tools available for a rescue mission are hand tools designed for humans.
Although popular science fiction, and sometimes even scientists, like to paint a highly
developed and idealized picture of robotics, the field is still maturing.
Despite its initial football-oriented goal, even RoboCup - one of the most respected robotics
competitions - has a special league called RoboCup@Home, where humanoid robots compete in
well-defined common tasks in home environments. 1 Also, the DARPA Grand Challenge - one of the
best-funded competitions - has announced its latest challenge centered around a humanoid
robot. 2
1 http://www.ai.rug.nl/robocupathome/
2 http://spectrum.ieee.org/automaton/robotics/humanoids/darpa-robotics-challenge-here-are-the-official-details
Figure 1.1: Willowgarage’s PR2 finishing a search task
However, at the current time there is no generally accepted solution even for manipulating
simple objects like boxes and glasses, although this field has been through a lot of development
lately. It is still an open problem, but promising works such as [15] have been published.
Finding and identifying an object to be grasped highly depends on the type and number of
sensors a robot has.
1.2 REEM introduction
The latest creation of PAL Robotics is the robot named REEM.
Figure 1.2: PAL Robotics’ REEM robot
With its 22 degrees of freedom, 8 hours of battery time, 30 kg payload and 4 km/h speed, it
is one of the top humanoid service robots. Each arm has 7 degrees of freedom, with 2 more for
the torso and 2 for the head. The head unit of REEM holds a pair of cameras as well as a
microphone and speaker system while a touch screen is available on the chest for multimedia
applications such as map navigation.
The main goal of this thesis work was to develop applications for this specific robot while
making sure that the end result remains general enough to be used on other robot
platforms.
1.3 Outline and goal of the thesis
Before going into more detail, the basic problems need to be defined.
The goal of this thesis was to implement computer vision modules for grasping tabletop
objects with the REEM robot. To be more precise: it consisted of implementing solutions for
the sub-problems of visual servoing in order to solve the grasping problem. This covers the
following two tasks from the topic statement: ”detection of textured and non-textured objects”,
”detection of tabletop objects for robot grasping”.
The first and primary problem encountered is the pose estimation problem which was the
main task of this thesis work. There are several examples in the scientific literature solving
slightly different problems related to pose estimation. One of them is the object detection
problem and the other is the object tracking problem. It is crucial to always keep these
problems in mind when dealing with objects through vision.
In the pose estimation problem we have to compute an estimated pose of an object
given some input image(s) and - possibly - additional background knowledge. Ways of
defining a pose can be found in Section 1.6.
An object detection problem can be identified by its desired answer type. One is dealing
with object detection if the desired answer for an image or image sequence is whether an object
is present or its number of appearances. This problem is typically solved using features.
Numerous examples and articles can also be found for the object tracking problem. Usually
these types of methods are specialized to provide real-time speed. To do so they require an
initialization stage before starting the tracker. Concretely: the target object has to be set to an
initial pose or the tracker has to be initialized with the pose of the object.
The secondary task of this thesis was to provide a solution for tracking the hand of the
REEM robot during the grasping process so the manipulator position and the target position
can be inserted into a visual servoing architecture.
A necessary overall objective was to provide all results at a speed that makes these
applications eligible for deployment in a real scene on the REEM robot.
1.4 ROS Introduction
ROS [29] (Robot Operating System) is a meta operating system designed to help and enhance
the work of robotics scientists. ROS is not an operating system in the traditional sense of process
management and scheduling; rather, it provides a structured communications layer above the
host operating systems of a heterogeneous computer cluster.
At the very core of it, ROS provides an implementation of the Observer Design Pattern [14,
p. 293] and additional software tools to keep the system well organized. A ROS system is made up of
nodes, which serve as computational processes in the system. ROS nodes communicate
via typed messages through topics, which are registered using simple strings as names. A
node can publish and/or subscribe to a topic.
A ROS system is completely modular: each node can be dynamically started or stopped,
and nodes are all independent components of the system, depending on each other only for
data input. Topics provide continuous dataflow-style processing of messages, but they have
limitations if one would like to use a node's functionality in a ”blocking call” way. There is a
way to create such interfaces for nodes; these are called services. To support a dynamic way to
store and modify commonly used global or local parameters, ROS also has a Parameter Server
through which nodes can read, create and modify parameters.
The link below provides more information about ROS:
http://ros.org/wiki
Figure 1.3: Two nodes in the ROS graph connected through topics
Among others, ROS also provides
• a build system - rosbuild, rosmake
• a launch system - roslaunch
• monitoring tools - rxgraph, rosinfo, rosservice, rostopic
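
The topic mechanism described above is essentially the Observer pattern: publishers do not know who, if anyone, receives their messages. A minimal Python sketch of the idea (illustrative only; `Topic` is an invented class, not the ROS API):

```python
class Topic:
    """A named channel that fans messages out to all subscribers.

    This mirrors the Observer pattern underlying ROS topics: the
    publisher never knows who (if anyone) is listening.
    """
    def __init__(self, name):
        self.name = name
        self._subscribers = []

    def subscribe(self, callback):
        """Register a callback, like a node subscribing to a topic."""
        self._subscribers.append(callback)

    def publish(self, message):
        """Deliver a message to every registered subscriber."""
        for callback in self._subscribers:
            callback(message)

# Usage: a "camera" node publishes an image header, a "detector" node receives it.
received = []
image_topic = Topic("/stereo/left/image")
image_topic.subscribe(received.append)
image_topic.publish({"width": 640, "height": 480})
```

Real ROS topics add typed messages and network transport on top of this dispatch loop; a service would correspond instead to a blocking call that returns a reply.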
1.5 OpenCV
OpenCV [9] stands for Open Source Computer Vision; it is a programming library developed
for real-time computer vision tasks. OpenCV is released under a BSD license and is free for both
academic and commercial use. It has C++, C and Python interfaces running on Windows, Linux,
Android and Mac. It provides implementations of several image processing and computer
vision algorithms, classic and state-of-the-art alike. A great amount of supplementary
material is available on the internet, such as [22]. It is developed by Willowgarage along
with ROS and is widely used for vision-oriented applications on all platforms. All tasks related
to image processing in this thesis work were solved using OpenCV.
1.6 Computer Vision basics
This section will go through the very basic definitions of Computer Vision.
A rigid body in 3D space is defined by its position and orientation, which are commonly
referred to as its pose. Such a pose is always defined with respect to an orthonormal reference
frame O−xyz, where x, y, z are the unit vectors of the frame axes.
The position of a point O' on the rigid body with respect to the coordinate frame O−xyz is
expressed by the relation
o = o_x x + o_y y + o_z z    (1.1)
where o_x, o_y, o_z denote the components of the vector o ∈ R^3 along the frame axes. The
position of the point can therefore be written as the vector
o = \begin{bmatrix} o_x \\ o_y \\ o_z \end{bmatrix}    (1.2)
So far we have covered the position component of the object's pose.
The orientation can be defined w.r.t. the reference frame through the unit vectors x', y', z' of a frame attached to the body, as follows:
x' = x'_x x + x'_y y + x'_z z
y' = y'_x x + y'_y y + y'_z z
z' = z'_x x + z'_y y + z'_z z    (1.3)
A more practical form is the following, usually called a rotation matrix R:

R = \begin{bmatrix} x' & y' & z' \end{bmatrix}
  = \begin{bmatrix} x'_x & y'_x & z'_x \\ x'_y & y'_y & z'_y \\ x'_z & y'_z & z'_z \end{bmatrix}
  = \begin{bmatrix} x'^T x & y'^T x & z'^T x \\ x'^T y & y'^T y & z'^T y \\ x'^T z & y'^T z & z'^T z \end{bmatrix}    (1.4)
The columns of matrix R are mutually orthogonal unit vectors, so as a consequence

R^T R = I_3    (1.5)

where I_3 denotes the (3 × 3) identity matrix.
It is clear that the rotation matrix above is a redundant representation. In some cases a
unit quaternion representation is used instead. Given a unit quaternion q = (w, x, y, z), the
equivalent rotation matrix can be computed as:
Q = \begin{bmatrix} 1 - 2y^2 - 2z^2 & 2xy - 2zw & 2xz + 2yw \\ 2xy + 2zw & 1 - 2x^2 - 2z^2 & 2yz - 2xw \\ 2xz - 2yw & 2yz + 2xw & 1 - 2x^2 - 2y^2 \end{bmatrix}    (1.6)
Note: When talking about transformations the components of a pose are usually called
translation and rotation instead of position and orientation.
[35] was of great help in writing this section.
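
Equation (1.6) can be checked numerically against the orthonormality constraint (1.5). A small illustrative Python sketch (plain Python, no external libraries; the function names are mine):

```python
import math

def quat_to_rot(w, x, y, z):
    """Rotation matrix of a *unit* quaternion q = (w, x, y, z), Eq. (1.6)."""
    return [
        [1 - 2*y*y - 2*z*z, 2*x*y - 2*z*w,     2*x*z + 2*y*w],
        [2*x*y + 2*z*w,     1 - 2*x*x - 2*z*z, 2*y*z - 2*x*w],
        [2*x*z - 2*y*w,     2*y*z + 2*x*w,     1 - 2*x*x - 2*y*y],
    ]

def is_orthonormal(R, eps=1e-9):
    """Check R^T R = I_3, i.e. Eq. (1.5), column pair by column pair."""
    for i in range(3):
        for j in range(3):
            dot = sum(R[k][i] * R[k][j] for k in range(3))
            if abs(dot - (1.0 if i == j else 0.0)) > eps:
                return False
    return True

# A rotation of 90 degrees about the z axis: q = (cos 45°, 0, 0, sin 45°).
s = math.sqrt(2) / 2
R = quat_to_rot(s, 0.0, 0.0, s)
assert is_orthonormal(R)   # the constraint of Eq. (1.5) holds
```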
1.7 Grasping problem
A grasping problem has several definitions depending on specific parameters. Since the goal of
this thesis was not visual servoing itself, the grasping problem presented here is a simplified version.
Figure 1.4: Real scene with the REEM robot
Given an object frame in 3D space, the task is to find an appropriate sequence of operations
that brings the robot manipulator into a pose meeting the desired definition of the goal frame.
Let F^c_o denote the object frame w.r.t. the camera frame and define the goal frame as

F^c_g = T_off · F^c_o    (1.7)

where T_off is a transformation that defines a desired offset from the object frame. Also let F^c_m
denote the manipulator frame w.r.t. the camera frame.
The next task is to find the sequence T_1, T_2, ..., T_n for which

|| T_1 · T_2 · ... · T_n · F^c_m − F^c_g || < ε    (1.8)

holds, where ε is a pre-defined error threshold. The transformations T_1, T_2, ..., T_n are applied to a
kinematic chain describing the robot's current state.
1. Initialize robot manipulator
2. Detect object(s) and determine pose
3. Compute goal frame
4. while || T_1 · T_2 · ... · T_n · F^c_m − F^c_g || >= ε do
5.     Manipulate/modify T_1, T_2, ..., T_n to minimize the error
6. end
7. Grasp object - close hand/gripper
Algorithm 1: General grasping algorithm
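
The goal-frame computation of Eq. (1.7) and the stopping test of Algorithm 1 can be sketched with 4×4 homogeneous transforms. A minimal Python sketch; the numeric frames and the 0.1 m pre-grasp offset are invented for illustration:

```python
def mat_mul(A, B):
    """Multiply two 4x4 homogeneous transformation matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def frame_error(F_a, F_b):
    """Frobenius-norm distance between two frames (stand-in for ||.||)."""
    return sum((F_a[i][j] - F_b[i][j]) ** 2
               for i in range(4) for j in range(4)) ** 0.5

# Hypothetical object frame w.r.t. the camera: 0.5 m along the optical axis.
F_co = [[1.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.5],
        [0.0, 0.0, 0.0, 1.0]]
# Desired offset T_off: stop 0.1 m short of the object (a pre-grasp pose).
T_off = [[1.0, 0.0, 0.0, 0.0],
         [0.0, 1.0, 0.0, 0.0],
         [0.0, 0.0, 1.0, -0.1],
         [0.0, 0.0, 0.0, 1.0]]
# Goal frame, Eq. (1.7): F_cg = T_off * F_co -> 0.4 m along the optical axis.
F_cg = mat_mul(T_off, F_co)
```

The while-loop of Algorithm 1 would then compare `frame_error` of the accumulated manipulator frame against the threshold ε.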
(a) (b) (c)
(d) (e)
Figure 1.5: Examples for grasping
Figure 1.6: REEM grasping a juicebox
1.8 Visual servoing
Following the same principles as motor control, visual servoing stands for controlling a
robot manipulator based on feedback; in this particular case the feedback is obtained using
computer vision. It is also referred to as Vision-Based Control and has 3 main types:
• Image Based (IBVS): The feedback is the error between the current and the desired image
points on the image plane. It does not involve the 3D pose at all and is therefore often
referred to as 2D visual servoing.
• Position Based (PBVS): The main feedback is the 3D pose error between the current pose
and the goal pose. Usually referred to as 3D visual servoing.
• Hybrid: 2D-3D visual servoing approaches take image features as well as 3D pose
information into account, combining the two servoing methods mentioned above.
Visual servoing is categorized as closed loop control. Figure 1.7 shows the general archi-
tecture of visual servoing.
Figure 1.7: Closed loop architecture
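
For the PBVS case, one iteration of this closed loop reduces to a proportional law on the pose error. A toy Python sketch; the gain, the poses and the unit timestep are illustrative assumptions, not the controller used on REEM:

```python
def pbvs_step(current, goal, gain=0.5):
    """One closed-loop PBVS iteration: the feedback is the 3D position
    error between the current and the goal pose, and the command is
    proportional to it (integrated here over a unit timestep)."""
    error = [g - c for c, g in zip(current, goal)]
    command = [gain * e for e in error]
    new_pose = [c + v for c, v in zip(current, command)]
    return new_pose, error

# Hypothetical manipulator and goal positions in camera coordinates (meters).
pose = [0.0, 0.0, 0.0]
goal = [0.2, 0.0, 0.4]
for _ in range(20):          # in reality, each iteration uses a new image
    pose, error = pbvs_step(pose, goal)
# After enough iterations the pose converges toward the goal.
```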
Chapter 2
Object detection survey and State of the
Art
2.1 Introduction
As a precursor to this project, a wide survey of existing software packages and techniques
needed to be done. The survey consisted of 2 stages.
1. A wider survey with shallow testing and research to classify possible subjects. The table
of results can be found in Appendix B.
2. A filtered survey, based on the attributes and the previous results and experiences, with
more detailed tests and research, also taking the available sensors into account. The table
of results can be found in Appendix A.
This chapter introduces the most relevant software packages and techniques resulting from the
above surveys, describing the benefits and drawbacks experienced.
2.2 Available sensors
There are several ways to address the task of digitally recording the world. While there is a
wide variety of sensors suitable for image-based applications, when building an actual humanoid
robot one has to choose the type best fitting the application - and one that can fit into
a robot body or, more preferably, into a robot head.
(a) Monocular camera (b) Stereo cameras (c) RGB-D devices (d) Laser scanner
Figure 2.1: Most common sensor types
• Monocular cameras usually provide accurate colors and can be reasonably fast compared
to the other sensor types.
• Stereo cameras usually require extra processing time, since such a system consists of two
calibrated monocular cameras with a predefined distance (the baseline) between them.
They are also used for digital and analog 3D filming and photography.
Figure 2.2: Stereo camera theory
• RGB-D sensors operate with structured light or time-of-flight techniques and have
become quite popular thanks to the Microsoft Kinect and the Asus Xtion. These
sensors are cheaper than a medium-quality stereo camera system and also require little or
no calibration, but their image quality is fixed at the level of standard webcams. They have
special hardware for processing the data into RGB images with depth information, namely
RGB-D. A really common use case for these sensors is human-PC virtual reality interaction
interfaces.
• Laser scanners are more industrial devices and usually substantially more expensive than
the others. Due to their primarily industrial design, laser scanners have an extremely low
error rate and high resolution. They are mostly used on mobile robots for mapping tasks
or for 3D object scanning in graphical or medical applications.
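
The stereo principle mentioned above recovers depth by triangulation: for a rectified pair with focal length f (in pixels), baseline B (in meters) and disparity d (in pixels), depth is Z = f·B/d. A minimal Python sketch with invented camera numbers:

```python
def depth_from_disparity(f_px, baseline_m, disparity_px):
    """Triangulated depth for a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        return float("inf")   # no horizontal shift: point at infinity / no match
    return f_px * baseline_m / disparity_px

# Hypothetical rig: 700 px focal length, 12 cm baseline, 40 px disparity -> 2.1 m.
z = depth_from_disparity(700.0, 0.12, 40.0)
```

Note the inverse relation: the smaller the disparity, the farther the point, which is why stereo depth accuracy degrades with distance.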
2.3 Survey work
This section summarizes the research conducted for this thesis, mentioning test experiences
where there were any.
Holzer et al. defined so-called distance templates and applied them using regular template
matching methods.
Hinterstoisser et al. introduced a method using Dominant Orientation Templates to identify
textureless objects and estimate their pose in real time. In their very recent work, Hinterstoisser
et al. engineered the LINE-Mod method for detecting textureless objects using image gradients
and surface normals. The advantage of their approach is that even though an RGB-D sensor is
required in the learning stage, a simple webcam is enough for detection - of course the error will
increase, since no surface points are available from a webcam. A compact implementation
has been available since OpenCV 2.4.
Experiments done with LINE-Mod showed that this method cannot be applied to textured
objects, although it is a reasonably nice alternative for textureless ones. An experience gained
by using this method is that the false detection rate was extremely high and no applicable 3D
pose result could be obtained; it only reported whether an object was detected or not. The first
implementation was released at the time of this thesis work, so it is possible that future
versions will improve these results. The product of this thesis work could be extended to textureless
objects using this technique.
Test videos prepared for this thesis:
• http://www.youtube.com/watch?v=2cCsYfwQGxI
• http://www.youtube.com/watch?v=3e3Wola4EWA
Figure 2.3: LINE-Mod
Nister and Stewenius sped up the classic feature-based matching approach by utilizing
a tree-based search optimized data structure. They also ran experiments on a PR2 robot and
released the ROS package named Objects of Daily Use Finder (ODUFinder).
Figure 2.4: ODUFinder
Although the theoretical base of this method is solid, the conducted experiments showed that the
practical results were not applicable for a mobile robot working in a human environment at the
time of this work.
Muja et al. implemented the general Recognition Infrastructure to host and coordinate the
modules of recognition pipelines, while Rusu et al. provided an example use case applying
Binarized Gradient Grids and Viewpoint Feature Histograms.
The OpenCV group, Rublee et al., defined a new type of feature detector/extractor, Oriented
BRief (ORB), to provide a BSD-licensed solution in contrast to SURF [5]. The work of [4]
was to experiment and create benchmarks for TOD [34] using ORB as its main feature
detection/extraction method.
Experimental work was done for this thesis to see whether SIFT could be replaced with ORB in
Chapter 3, but due to deadlines it was not possible to implement it. As future work, however,
it would be a nice addition to the final software.
The work of [38], RoboEarth, is a general communication platform for robots and has a
ROS package which contains a database client and a detector module. Even though the detector
module of RoboEarth was not precise enough for the task of this thesis, it is still exemplary
as robotics software.
Testing the RoboEarth package was smooth and easy since tutorials and convenient interfaces
were provided for the software. The requirements of the system, however, did not match the
available hardware: the detector of RoboEarth needs an RGB-D sensor, while REEM only has a
stereo camera. Experiments showed that obtaining a precise pose is hard due to its high
variance, and the false detection rate was also high.
Figure 2.5: RoboEarth Detector
The library published by Eric Marchand et al., called ViSP, contains tools for visual servoing
tasks along with image processing and other fields as well. ViSP is also available as a ROS
package and contains a ready-to-use model-based tracker which tracks the edges of the object
model, starting from a known initial position.
Figure 2.6: ViSP Tracker
Thanks to the ROS package provided for it, the ViSP tracker was easy to test. The good
results almost made it the primary choice of tracker; however, it still requires a known initial
position to start, and it only does tracking, which alone does not solve the pose estimation
problem. Though it provided good results, problems occurred due to the limitation of only
using grayscale images. The tracker finally chosen (Section 3.4) also takes colors into
account.
A remarkably unique approach for tracking 2D planar patterns is ESM [7]. It has a free
version for demoing but also a licensed version which is highly optimized. It did
not prove reliable enough, and the output format also raised problems for this task. During the
tracking process the searched template is continuously modified to adapt to small changes over
time. Because of this it can only work when there is little difference between two consecutive
images and, more importantly, the target pattern must not travel large distances between such
images. It is also worth mentioning that since this technique is also a tracker, an initial pose is
required. The implementation provided for this technique did not make testing easy, with its
dynamically linked C library and C header file. No open-source version is provided.
Figure 2.7: ESM Tracker
Mixing techniques proved to be successful in markerless object detection. Feature detection
and edge-tracking based methods were presented and discussed in [26], [37], [30], [25], [31],
[8], [19], [20] and [21], leading to the birth of a software package called The Blocks World
Robotic Vision Toolbox. The basic idea is to use a feature-based pose estimation module to
acquire a rough estimate and then use this result to initialize a real-time edge-detection-based
tracker, making the system much faster and more dynamic. As a result of this survey work,
BLORT was chosen to be tested further and integrated into the software of the REEM robot.
2.4 Brief summary of survey
Table 2.1 summarizes the previous section in tabular form, highlighting the most relevant
attributes.
Name         | Tracker | Detector | Hybrid | Sensor                                     | Texture     | Speed  | Output          | Keywords
ViSP tracker | Yes     | No       | No     | Monocular                                  | Only edges  | 30 Hz  | Pose            | edge tracking, grayscale, particle filter
RoboEarth    | No      | Yes      | No     | RGB-D (train, detect), monocular (detect)  | Needed      | 11 Hz  | Pattern matched | kinect, point cloud matching, texture matching
LINE-Mod     | No      | Yes      | No     | RGB-D (train, detect), monocular (detect)  | Low texture | 30 Hz  | Pattern matched | surface and color gradient normals, kinect
ESM          | Yes     | No       | No     | Monocular                                  | Needed      | 30 Hz  | Homography      | custom minimization, pattern matching
ODUFinder    | No      | Yes      | No     | Monocular                                  | Needed      | 4-6 Hz | Matched SIFTs   | SIFT, vocabulary tree
BLORT        | No      | No       | Yes    | Monocular                                  | Needed      | 30+ Hz | 3D pose         | SIFT, edge, CAD, RANSAC, OpenGL
Table 2.1: Survey summary table
Chapter 3
Pose estimation of an object
3.1 Introduction and overview
As a result of the wide and then deep survey, one software package was chosen to be integrated
with the REEM robot. A crucial factor for all surveyed techniques was the type of sensor
required, because the REEM robot does not have a visual depth sensor in its head.
Early experiments with the BLORT system showed that it could be capable of serving as a pose
estimator on REEM for the grasping task. It provided correct results with a low ratio of false
detections, especially when compared to the others, along with reasonably good speed.
BLORT - The Blocks World Robotic Vision Toolbox
The vision and robotics communities have developed a large number of increas-
ingly successful methods for tracking, recognizing and online learning of objects,
all of which have their particular strengths and weaknesses. A researcher aiming
to provide a robot with the ability to handle objects will typically have to pick
amongst these and engineer a system that works for her particular setting. The
toolbox is aimed at robotics research and as such we have in mind objects typically of interest for robotic manipulation scenarios, e.g. mugs, boxes and packaging of
various sorts. We are not aiming to cover articulated objects (such as walking hu-
mans), highly irregular objects (such as potted plants) or deformable objects (such
as cables). The system does not require specialized hardware and simply uses a sin-
gle camera allowing usage on about any robot. The toolbox integrates state-of-the
art methods for detection and learning of novel objects, and recognition and track-
ing of learned models. Integration is currently done via our own modular robotics
framework, but of course the libraries making up the modules can also be separately
integrated into own projects.
Source: http://www.acin.tuwien.ac.at/?id=290 (Last accessed: 2012.10.16.)
Credit for the core of BLORT goes to Thomas Mörwald, Johann Prankl, Michael Zillich,
Andreas Richtsfeld and Markus Vincze at the Vision for Robotics (V4R) lab of the Automation
and Control Institute (ACIN) at the Vienna University of Technology (TU Wien). BLORT was
originally designed to provide a toolkit for robotics, hence its full name: Blocks World
Robotic Vision Toolbox.
For a better understanding of this chapter it is recommended to read Section 1.3.
The list of papers connected to BLORT:
1. ”BLORT - The Blocks World Robotic Vision Toolbox Best Practice in 3D Perception and
Modeling for Mobile Manipulation” [26]
2. ”Anytimeness Avoids Parameters in Detecting Closed Convex Polygons” [37]
3. ”Basic Object Shape Detection and Tracking using Perceptual Organization” [30]
4. ”Edge Tracking of Textured Objects with a Recursive Particle Filter” [25]
5. ”Taking in Shape: Detection and Tracking of Basic 3D Shapes in a Robotics Context”
[31]
Since no ROS package was provided, the integration had to start at that level. The
system itself is composed of the separate works of the above authors but performs reasonably
well integrated together. A positive aspect of BLORT is that it was designed to be used with a
single webcam. This way no extra sensor is required on most robots and it still performs well.
Of course, like most scientific software, BLORT was developed indoors without ever leaving the
lab. The task with BLORT was to integrate it into a system running ROS and tune it so that
it can operate on a real robot outside a laboratory environment.
For the above objectives to work out, the software had to be thoroughly tested while also
identifying the regions where most of the computation is done. The code had to be
refactored to provide more convenient interfaces and to eliminate bugs such as
small memory leaks and other problems stemming from incorrect memory usage. All the
components and algorithms used by BLORT also had to be inspected and their parameters either
exposed to end-users for deploy-time configuration or modified internally for better results.
The algorithmic design of BLORT is a sequence of the detector and tracker modules.
initialization;
while object not detected or (object detected and confidence < threshold_detector) do
    // Run object detector
    Extract SIFT features;
    Match extracted SIFTs to the codebook using kNN;
    Estimate object pose from matched SIFTs using RANSAC;
    Validate confidence;
    publish object pose for tracker;
end
while object tracking confidence is high do
    // Run object tracker
    Copy the input image and render the textured object into the scene at its known location;
    Run colored edge detection on both (input and rendered) images;
    Use a particle filter to match the images around the estimated pose;
    Average particle guesses and compute confidence rate;
    Smooth confidence values (edge, color) to avoid unrealistically fast flashes;
    if confidence > threshold_tracker then
        publish pose of the object;
    end
end
Algorithm 2: BLORT algorithmic overview
3.2 CAD Model
CAD models are commonly used in Computer-Aided Design software, mainly by engineers of
various disciplines. These models define simple 3D objects as well as more complex ones.
Object trackers often rely on CAD models of the target object(s) to perform edge detection-
based tracking.
Related articles of BLORT: [26], [25]
MeshLab [11] proved to be a great tool to handle simple objects and generate convex hulls
of complex meshes.
A demonstration video about the process of creating a simple juicebox brick can be found
on the following link: http://www.youtube.com/watch?v=OtduI5MWVag
Figure 3.1: CAD model of a Pringles box in MeshLab
3.3 Rendering
Rendering is commonly known from computer games and scientific visualization. It loads or
generates 3D shapes and (usually) projects them onto a 2D surface - the screen. The OpenGL
and DirectX libraries are often used through their APIs to utilize the computational power of
the GPU (video card) for rendering tasks. Unlike CUDA, which is quite young compared to the
other two, these libraries were not designed for scientific computation, but they are still
used for it.
(a) Visualizer of TomGine(part of BLORT) (b) Rendering a 3D model onto a camera
image
Figure 3.2: Examples of rendering in case of BLORT
In the case of BLORT, rendering is used in the tracker module to validate the actual pose
guess. An intuitive description of this step is that the tracker module imagines (renders) how
the object should look given the current pose guess, then validates the guess using a
comparison method.
3.4 Edge detection on color image
To validate a pose guess, the tracker module compares the original input image with the one
that has the 3D object rendered onto it. Such a comparison can be done in several ways. In the
case of object tracking it is reasonable to use the edges of the object, which can be extracted
by detecting the edges of the image.
The following steps were implemented using OpenGL shaders - a technique highly optimized
for computing image convolutions. The procedure takes an input image I and a convolution
kernel K and outputs O. A simplified definition could be

O[x, y] = I[f(x, y)] ∗ K[g(x, y)]   (3.1)
where f(x, y) and g(x, y) are the corresponding indexing functions. The result, however, often
needs to be normalized. This can be arranged by adding a normalizing factor to Equation 3.1:
the sum of the multiplication factors, more concretely the elements of kernel K.
The final formula for convolution looks like the following:
O[x, y] = ( 1 / Σ(a,b) K[a, b] ) · I[f(x, y)] ∗ K[g(x, y)]   (3.2)
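As an illustration, Equation 3.2 can be sketched as a plain CPU convolution. This is a minimal, illustrative C++ version, not BLORT's shader code; the function and variable names are ad hoc:

```cpp
#include <vector>

// Illustrative sketch of Equation 3.2: normalized convolution of a
// grayscale image (row-major vector) with a kernel. Borders are clamped.
std::vector<float> convolve(const std::vector<float>& img, int w, int h,
                            const std::vector<float>& kernel, int kw, int kh) {
    std::vector<float> out(img.size(), 0.0f);
    float norm = 0.0f;                       // sum of kernel elements (Eq. 3.2)
    for (float k : kernel) norm += k;
    if (norm == 0.0f) norm = 1.0f;           // derivative kernels sum to zero
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            float acc = 0.0f;
            for (int ky = 0; ky < kh; ++ky)
                for (int kx = 0; kx < kw; ++kx) {
                    int ix = x + kx - kw / 2;
                    int iy = y + ky - kh / 2;
                    if (ix < 0) ix = 0;      // clamp indices at the border
                    if (ix >= w) ix = w - 1;
                    if (iy < 0) iy = 0;
                    if (iy >= h) iy = h - 1;
                    acc += img[iy * w + ix] * kernel[ky * kw + kx];
                }
            out[y * w + x] = acc / norm;     // apply the normalizing factor
        }
    return out;
}
```

A kernel whose elements sum to zero (such as derivative kernels) is left unnormalized here, which keeps the sketch consistent with the unnormalized form of Equation 3.1.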
Figure 3.3: Image convolution example
Steps of image processing in the tracker module of BLORT:
1. A blurring operator is applied to the input image to minimize point-like errors such as
isolated black and white pixels. This is usually an important pre-filtering step for edge-
detection methods. For this purpose a 5x5 Gaussian operator was chosen.
K = (1/159) ·
    |  2   4   5   4   2 |
    |  4   9  12   9   4 |
    |  5  12  15  12   5 |
    |  4   9  12   9   4 |
    |  2   4   5   4   2 |   (3.3)
2. Edge detection using a Scharr operator. By applying K_x and K_y as convolutions to the
input image the corresponding estimated derivatives can be computed.

K_x = (1/22) ·
    |  -3   0   3 |
    | -10   0  10 |
    |  -3   0   3 |   (3.4)

K_y = (1/22) ·
    |   3  10   3 |
    |   0   0   0 |
    |  -3 -10  -3 |   (3.5)
3. Nonmaxima suppression to keep only the strongest edges of the edge detection.

K_x =
    | 0 0 0 |
    | 1 0 1 |
    | 0 0 0 |   (3.6)
K_y =
    | 0 1 0 |
    | 0 0 0 |
    | 0 1 0 |   (3.7)

In this step the above convolutions serve as indicators of whether the current pixel is a
maximal edge compared to its neighborhood. If it is not, the pixel is discarded and an extremal
element is returned (RGB(0, 127, 128)).
4. Spreading operation to grow the remaining edges from the previous step.

K =
    | 1/√2   1   1/√2 |
    |  1     0    1   |
    | 1/√2   1   1/√2 |   (3.8)

This step enlarges the previously determined strongest edges, which is important for
suppressing the small errors caused by falsely detected edges.
(a) Input image (b) Gaussian blur
(c) Scharr operator (d) Nonmaxima suppression
(e) Spreading
Figure 3.4: Steps of image processing
The above method is implemented as an OpenGL shader (which exploits the parallelizable
nature of image processing techniques), but a pure CPU version using OpenCV was also
implemented during this thesis work, though it proved considerably slower than the shader
version.
3.5 Particle filter
Particle filters, a technique well grounded in statistical methods, are often used in robotics
for localization tasks. In object detection they are utilized for tracking objects in real time.
At its very core, particle filtering is a simulation-based model estimation technique. In
such a system a particle can be thought of as an elementary guess about one possible estimate
of the model, while the simulation continuously validates and resamples these particles to
adapt the model to new information given by measurements or additional data.
(a) (b)
(c) (d)
Figure 3.5: Particle filter used for localization
Figure 3.5 shows a particle filter used for localization. In the initial situation, where no
information is available, the particles are spread widely across the map. As the robot moves
and measures its environment with its sensors, the particles begin to concentrate around the
areas most likely to contain the robot.
The design of particle filters makes it possible to utilize parallel computing techniques in
the implementation, such as multiple processor threads or the graphics card. This is an
important feature which makes the algorithm suitable for real-time tracking.
Generate initial particles;
while Error > Threshold do
    Wait for additional information;
    Calculate normalized particle importance weights;
    Resample particles based on importance weights to generate a new particle set;
    Calculate Error;
end
Algorithm 3: Particle filter algorithm
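To make the resampling step of Algorithm 3 concrete, the following is a minimal C++ sketch of systematic resampling, one common way to implement it. It is illustrative only; BLORT's actual implementation may differ:

```cpp
#include <vector>
#include <random>
#include <cstddef>

// A particle: one elementary guess (here a 1D state) plus its importance weight.
struct Particle { double state; double weight; };

// Systematic resampling: each particle survives in proportion to its
// importance weight; the particle count stays fixed.
std::vector<Particle> resample(const std::vector<Particle>& in, std::mt19937& rng) {
    double total = 0.0;
    for (const Particle& p : in) total += p.weight;
    const double step = total / in.size();
    std::uniform_real_distribution<double> u(0.0, step);
    double pos = u(rng);                     // one random offset for all draws
    double cum = 0.0;                        // cumulative weight up to index i
    std::size_t i = 0;
    std::vector<Particle> out;
    out.reserve(in.size());
    for (std::size_t n = 0; n < in.size(); ++n) {
        while (cum + in[i].weight < pos) {   // walk to the particle covering pos
            cum += in[i].weight;
            ++i;
        }
        out.push_back({in[i].state, 1.0});   // weights reset after resampling
        pos += step;                         // evenly spaced sampling positions
    }
    return out;
}
```

High-weight particles get duplicated and low-weight ones are dropped, which is exactly how the filter concentrates guesses around likely states.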
The tracker module uses a particle filter to track and refine the pose of an object. One
particle in this specific case holds a pose value which is evaluated by running the
edge-detection-based comparison method described in Section 3.4.
Figure 3.6: Particles visualized on the detected object of BLORT. Greens are valid, reds are
invalid particles.
3.6 Feature detection (SIFT)
Image processing is often only the first step towards further goals such as image analysis or
pattern matching. The term image processing refers to operations done at the pixel level, where
the information gained is also often pixel-level information; the features used are the
individual pixels. However, it is necessary to define higher-level features in order to increase
complexity, robustness or speed, or all of these at the same time. Though a successfully
extracted line in an image is also considered a feature, feature detection usually refers to
feature types centered around a point. Such feature detectors are for example:
• FAST
• SIFT[23]
• SURF[5]
• BRIEF
• ORB [32]
• FREAK
Figure 3.7: Extracted SIFT and ORB feature points of the same scene
The SIFT detector proved to be one of the strongest in the literature and in existing
applications, and was therefore chosen as the main feature detector of BLORT. The SIFTs
extracted from the surface of the current object in the learning stage are saved in a data
structure which will be referred to as the codebook (or object SIFTs) from now on. Later this
codebook is used to match image SIFTs: features extracted from the current image.
SIFT details:
• invariant
– scaling
– orientation
• partially invariant
– affine distortion
– illumination changes
SIFT procedure:
• Image convolved using a Laplacian of Gaussian (LoG) filter at different scales (scale pyramid)
• Compute difference between the neighboring filtered images
• Keypoints: local max/min of difference of LoG
– Compare to its 8 neighbors on the same scale
– Compare to the 9 corresponding pixels on the neighboring levels
• Keypoint localization
– Problem: too many (unstable) keypoints
– Discard the low contrast points
– Eliminate the weak edge points
• Orientation assignment
– Invariant for rotation
– Each keypoint is assigned one or more orientations from local gradient features
m(x, y) = √( (L(x+1, y) − L(x−1, y))² + (L(x, y+1) − L(x, y−1))² )   (3.9)

φ(x, y) = arctan( (L(x, y+1) − L(x, y−1)) / (L(x+1, y) − L(x−1, y)) )   (3.10)
– Calculate for every pixel in a neighboring region to create an orientation histogram
– Determine dominant orientation based on the histogram
Figure 3.8: SIFT orientation histogram example
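The orientation assignment of Equations 3.9 and 3.10 can be sketched as follows. This is a toy C++ version with 8 histogram bins; real SIFT uses 36 bins and Gaussian weighting of the votes:

```cpp
#include <vector>
#include <cmath>

// Toy sketch of Equations 3.9 and 3.10: per-pixel gradient magnitude and
// orientation over an image patch L (row-major), accumulated into an
// 8-bin orientation histogram. Returns the index of the dominant bin.
int dominant_orientation_bin(const std::vector<float>& L, int w, int h) {
    const int bins = 8;
    const float PI = 3.14159265358979f;
    std::vector<float> hist(bins, 0.0f);
    for (int y = 1; y < h - 1; ++y)
        for (int x = 1; x < w - 1; ++x) {
            float dx = L[y * w + (x + 1)] - L[y * w + (x - 1)]; // L(x+1,y) - L(x-1,y)
            float dy = L[(y + 1) * w + x] - L[(y - 1) * w + x]; // L(x,y+1) - L(x,y-1)
            float m = std::sqrt(dx * dx + dy * dy);             // Eq. 3.9
            float phi = std::atan2(dy, dx);                     // Eq. 3.10, full range
            int bin = static_cast<int>((phi + PI) / (2.0f * PI) * bins) % bins;
            hist[bin] += m;                                     // magnitude-weighted vote
        }
    int best = 0;
    for (int b = 1; b < bins; ++b)
        if (hist[b] > hist[best]) best = b;
    return best;
}
```

For a horizontal intensity ramp the gradients all point along +x (phi = 0), so all votes land in the middle bin of the histogram.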
On the implementation level the feature detection step again utilizes the graphics card,
using the SiftGPU [36] library to extract image SIFTs. As part of this thesis work a
ROS wrapper package for this library was also created.
http://ros.org/wiki/siftgpu (Last accessed: 2012.11.05.)
3.7 kNN and RANSAC
3.7.1 kNN
k-Nearest Neighbors [12] is an algorithm for solving classification problems. Given a distance
measure over the dataset's data type, it classifies the current element based on the attributes
and classes of its nearest neighbors. It is also often used for clustering tasks.
In BLORT, kNN is used to select a fixed-size set (of size k) of features from the codebook
that are most similar to the feature currently being matched during the detection stage.
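A brute-force version of this matching step might look as follows. This is an illustrative C++ sketch; the names and the squared Euclidean distance are assumptions, and real SIFT matching typically works on 128-dimensional descriptors with approximate search structures:

```cpp
#include <vector>
#include <algorithm>
#include <utility>
#include <cstddef>

// Illustrative brute-force k-nearest-neighbour search over descriptor
// vectors, standing in for the codebook matching step of the detector.
std::vector<std::size_t> knn(const std::vector<std::vector<float>>& codebook,
                             const std::vector<float>& query, std::size_t k) {
    std::vector<std::pair<float, std::size_t>> dist;   // (distance, index)
    for (std::size_t i = 0; i < codebook.size(); ++i) {
        float d = 0.0f;
        for (std::size_t j = 0; j < query.size(); ++j) {
            float diff = codebook[i][j] - query[j];
            d += diff * diff;                          // squared Euclidean distance
        }
        dist.push_back({d, i});
    }
    std::size_t kk = std::min(k, dist.size());
    // Only the k smallest distances need to be ordered.
    std::partial_sort(dist.begin(), dist.begin() + kk, dist.end());
    std::vector<std::size_t> idx;
    for (std::size_t i = 0; i < kk; ++i) idx.push_back(dist[i].second);
    return idx;
}
```

With a large codebook the linear scan becomes the bottleneck, which is why tree-based or approximate methods are used in practice.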
3.7.2 RANSAC
The RANSAC [13] algorithm is possibly the most widely used robust estimator in the field of
computer vision. The abbreviation stands for RANdom SAmple Consensus. RANSAC is an iterative
model estimation algorithm which operates by assuming that the input data set contains
outliers - elements not inside the validation range of the estimated mathematical model - and
minimizes the ratio of outliers to inliers. It is a non-deterministic algorithm since random
number generation is used in the sampling stage.
In BLORT RANSAC is used to estimate the pose of the object using image features (SIFTs
in this case) to initialize the tracker module therefore a RANSAC method can be found in the
detector module.
Figure 3.9: Extracted SIFTs. Red SIFTs are not in the codebook, yellow and green ones are
considered as object points, green ones are inliers of the model and yellow ones are outliers.
Data: dataset;
model - whose parameters need to be estimated;
n-points-to-sample - number of points used to form a new estimate;
max-ransac-trials - maximum number of iterations;
t - threshold for the maximal error when fitting a model;
n-points-to-match - the number of dataset elements required to set up a valid model;
η0 - an optional, tolerable error limit
Result: best-model;
best-inliers;
best-error
iterations = 0;
idx = NIL;
ε = n-points-to-match / dataset.size;
while iterations < max-ransac-trials or (1.0 − ε)^iterations ≥ η0 do
    idx = random indices from dataset;
    model = Compute model(idx);
    inliers = Get inliers(model, dataset);
    if inliers.size ≥ n-points-to-match then
        error = Compute error(model, dataset, idx);
        if error < best-error then
            best-model = model;
            best-inliers = inliers;
            best-error = error;
        end
    end
    increment iterations;
end
Algorithm 4: RANSAC algorithm in BLORT
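Algorithm 4 can be illustrated on a toy problem: fitting a 2D line y = a·x + b to points containing an outlier. This is a minimal C++ sketch in the spirit of the algorithm, not the BLORT pose-estimation code:

```cpp
#include <vector>
#include <random>
#include <cmath>
#include <cstddef>

struct Point { double x, y; };
struct Line  { double a, b; };

// Toy RANSAC: repeatedly fit a line to a minimal sample (2 points),
// count inliers within threshold t, and keep the best-supported model.
Line ransac_line(const std::vector<Point>& data, int max_trials, double t,
                 std::size_t min_inliers, std::mt19937& rng) {
    std::uniform_int_distribution<std::size_t> pick(0, data.size() - 1);
    Line best{0.0, 0.0};
    std::size_t best_count = 0;
    for (int trial = 0; trial < max_trials; ++trial) {
        // Minimal sample: two distinct points define a candidate line.
        std::size_t i = pick(rng);
        std::size_t j = pick(rng);
        if (i == j || data[i].x == data[j].x) continue;
        double a = (data[j].y - data[i].y) / (data[j].x - data[i].x);
        double b = data[i].y - a * data[i].x;
        // Inliers: points whose residual stays below the threshold t.
        std::size_t count = 0;
        for (const Point& p : data)
            if (std::fabs(p.y - (a * p.x + b)) < t) ++count;
        // Keep the model supported by the most inliers.
        if (count >= min_inliers && count > best_count) {
            best = Line{a, b};
            best_count = count;
        }
    }
    return best;
}
```

Because any sample consisting of two inliers reproduces the true line while a sample containing the outlier gathers almost no support, the outlier has practically no influence on the result.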
3.8 Implemented application
All implementations were done using ROS 1.4 and C++.
• Training and detecting a Pringles box: http://www.youtube.com/watch?v=
HoMSHIzhERI
• Training and detecting a juicebox: http://www.youtube.com/watch?v=0QVc9x3ZRx8
3.8.1 Learning module
Like similar applications, BLORT requires a learning stage before any detection can be done.
To start the process a CAD model of the object is needed. This model gets textured during the
learning process, and SIFTs are registered onto surface points of the model. The software
itself runs the tracker module, which is able to operate without texture based only on the
peripheral edges of the object (i.e. its outline). The learning stage is operated manually.
After the operator starts the tracker from an initial pose displayed on the screen, the
tracker will follow the object. By pressing a single button, both texture and SIFT descriptors
are registered for the most dominant face of the object (i.e. the one that is most orthogonal
to the camera). All captured information is used on-the-fly from the moment of recording during
the learning stage. As the tracker gains more information by registering textures to different
faces of the object, the operator's task becomes more convenient. 1
To make this step easier for new users of BLORT demonstrative videos were recorded:
• Training a Pringles container: http://www.youtube.com/watch?v=pp6RwxbUwrI
• Training with a juicebox: http://www.youtube.com/watch?v=Hfg7spaPmY0
3.8.2 Detector module
The detector module, despite its name, does both object detection and pose estimation;
however, the resulting pose is often not completely precise. The detection stage starts with
the extraction of SIFTs (Section 3.6), then continues with a kNN method (Section 3.7.1) to
determine the best matches from the codebook, which are then used by a RANSAC method
(Section 3.7.2) to approximate the pose of the object.
1Cylindrical objects tend to keep rotating when there is no texture due to the completely symmetric form.
Figure 3.10: Detection result
To overcome the imprecision of the feature-based approach and to further validate the result,
an object tracker is initialized with this pose.
3.8.3 Tracker module
As mentioned before, the tracker module runs a particle-filter-based tracker (Section 3.5)
using 3D rendering to "imagine" the result, then validating it with edge detection and matching
running on the GPU to achieve real-time speed. This step is best summarized in the overview of
BLORT, Algorithm 2, at the beginning of this chapter.
Figure 3.11: Tracking result, the object visible in the image is rendered
3.8.4 State design pattern for node modes
Although BLORT was designed to provide real-time tracking (tracker module) after feature-based
initialization (detector module), it allows a different use-case which is more desirable for
this thesis than the original functionality.
When used in an almost still scene to determine the pose of an object to be grabbed, tracking
provides the refinement and validation of the pose acquired by the detector. Defining a timeout
for the tracker in these cases results in significant resource savings, which is important on a
real robot. After the timeout has passed and the confidence is sufficient, the last determined
pose can be used. This way, for example, the robot does not have to run all the costly
algorithms until it reaches the table where it needs to grab an object.
The above behaviour, however, is not always an option; therefore a full-featured tracker which
can recover when the object is lost is also required.
Even though the mode is a launch-time parameter of BLORT, the run-time design pattern called
"State" [14, p. 305] brings convenience to the implementation and future use.
tracking
The full-featured version of BLORT. When BLORT is launched in tracking mode it will recover
(or initialize) when needed and track continuously.
Figure 3.12: Diagram of the tracking mode
singleshot
When launched in singleshot mode, BLORT will initialize using the detector module and then
refine the obtained pose by launching the tracker module, but only when queried for this
service through a ROS service interface. The result of one service call is one pose, or an
empty answer if the pose estimation failed due to an absent object or a bad detection.
Figure 3.13: Diagram of the singleshot mode
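The two modes above can be sketched with the State pattern as follows. Class and method names are illustrative only, not the actual blort_ros interfaces:

```cpp
#include <memory>
#include <string>

// Minimal sketch of the "State" design pattern applied to the node modes.
class Mode {                                   // abstract state
public:
    virtual ~Mode() = default;
    virtual std::string handleFrame() = 0;     // called once per camera image
};

class TrackingMode : public Mode {             // continuous: recover and track
public:
    std::string handleFrame() override { return "detect-if-lost, then track"; }
};

class SingleShotMode : public Mode {           // on demand: one service call, one pose
public:
    std::string handleFrame() override { return "idle until service call"; }
};

class TrackerNode {                            // context: delegates to current mode
    std::unique_ptr<Mode> mode_;
public:
    explicit TrackerNode(bool singleshot)
        : mode_(singleshot ? std::unique_ptr<Mode>(new SingleShotMode())
                           : std::unique_ptr<Mode>(new TrackingMode())) {}
    std::string onFrame() { return mode_->handleFrame(); }
};
```

The per-frame behaviour is chosen once at launch, yet the shared interface keeps the rest of the node identical in both modes.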
3.9 Pose estimation results and ways for improvement
The goal of this thesis was to find a way to start tabletop object grasping on the REEM
robot and to provide an initial solution for it.
Table 4.1 shows detection statistics of BLORT in a few given scenes. The average pose
estimation time was between 3 seconds and 5 seconds.
Since the part which takes most of the CPU time is the RANSAC algorithm inside the
detector module, it is desirable to decrease the number of extracted SIFT (or any other) features.
Most of the failed attempts were caused by matching the bottom or the top of the boxes to the
wall or some other untextured surface. In these cases the detector made a mistake by
initializing the tracker with a wrong pose, but the tracker was satisfied with it because the
edge-based matching (which does not require texture) was perfect. It would be useful to provide
a way to exclude specific faces of the object in case they are poorly textured.
3.10 Published software, documentation and tutorials
All software developed for BLORT was published open source on the ROS wiki and can be
found at the following link:
http://www.ros.org/wiki/perception_blort
It is a ROS stack which consists of 3 packages.
• blort: holds the modified version of the original BLORT sources, used as a library.
• blort_ros: contains the nodes using the BLORT library, completely separate from it.
• siftgpu: a necessary dependency of the blort package.
The code is hosted on PAL Robotics' public GitHub account at https://github.com/pal-robotics.
(a) (b) (c)
(d) (e)
Figure 3.14: Screenshots of the ROS wiki documentation
Links to documentation and tutorials:
• BLORT stack: http://ros.org/wiki/perception_blort
• blort package: http://ros.org/wiki/blort
• blort ros package: http://ros.org/wiki/blort_ros
• siftgpu package: http://ros.org/wiki/siftgpu
• Training tutorial: http://www.ros.org/wiki/blort_ros/Tutorials/Training
• Track and detect tutorial: http://www.ros.org/wiki/blort_ros/Tutorials/TrackAndDetect
• ”How to tune?” tutorial: http://www.ros.org/wiki/blort_ros/Tutorials/Tune
3.11 Hardware requirements
BLORT requires a graphics card with OpenGL support and GLSL >= 2.0 (OpenGL Shading
Language) for running the parallelized image processing in the tracker module, and also in the
detector module, where SiftGPU uses it to extract SIFTs fast.
3.12 Future work
• Use a database to store learned object models. This could also be used to interface with
other object detection systems.
• SIFT dependency: Remove the mandatory usage of SiftGPU and SIFT in general. Provide
a way to use different keypoint extractor/detector techniques.
• OpenGL dependency: It would be elegant to have build options which also support CUDA
or non-GPU modes.
Chapter 4
Increasing the speed of pose estimation
using image segmentation
4.1 Image segmentation problem
Image processing operators such as feature extractors are usually quite expensive in terms of
computation time, therefore it is often beneficial to limit their operating space. Much faster
speeds can be achieved by limiting the space of frequently called, costly image processing
operators. The question is how to do so.
Trying to mimic nature is usually a good way to start in engineering. Consider how human
visual processing works: perception keeps things simple and fast while the brain devotes only
a small part of itself to the task. Most of the information received through our eyes is
discarded by the time it reaches the brain. The information that does arrive is centered on a
small, highly detailed area of our vision called the focus point, while only sparse information
is kept about other areas. In this chapter the same approach is followed to increase the speed
of image-based systems, in this case focused on boosting BLORT.
In order to limit the operating space, the input image needs to be segmented. Segmentation
can be done via direct masking, painting the masked regions of the image a certain color, or
by assigning a matrix of 0s and 1s to the image as a mask marking the valid and invalid pixels,
carrying this mask along with the image.
In general, a priori knowledge is required to know which areas of the input are interesting for
a specific costly operator. Most of the time this depends on the application environment,
which is defined by the hardware, software, camera and physical surroundings. The result of
the segmentation is a mask which in the end is used to indicate which pixels are valid for
further analysis and which are invalid. Formally it could be written as
O[i, j] = image_operator(I, i, j),  where M[i, j] == valid
        = extremal_element,         where M[i, j] == invalid        (4.1)
where O is the output image, I is the input image, i, j are the current indices, M is the mask,
and image_operator and extremal_element depend on the current use-case.
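Equation 4.1 can be sketched in a few lines of Python with numpy. The names here (`masked_operator`, the identity operator, the extremal element 0) are hypothetical stand-ins for illustration, not code from the actual nodes:

```python
import numpy as np

def masked_operator(image, mask, operator, extremal=0):
    """Apply `operator` only where the mask is valid (1); write the
    extremal element everywhere else, as in Equation 4.1."""
    out = np.full(image.shape, extremal, dtype=image.dtype)
    ys, xs = np.nonzero(mask)           # indices of valid pixels only
    for i, j in zip(ys, xs):
        out[i, j] = operator(image, i, j)
    return out

def identity(img, i, j):
    """A trivial stand-in for a costly per-pixel operator."""
    return img[i, j]

img = np.arange(9).reshape(3, 3)
mask = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]])  # valid on the diagonal
out = masked_operator(img, mask, identity)
```

The point of the formulation is that `operator` is never called on invalid pixels, which is exactly where the speedup comes from.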
As image segmentation plays an important role in Computer Vision, it is a strong tool in
medical imaging, face and iris recognition, and agricultural imaging, as well as in optimizing
image operators.
4.2 Segmentation using image processing
After a mask has been created with a specific method, point-like errors should be eliminated.
This step is done by running the erode operator known from image processing. Erosion
operates on a binary image: every pixel of the target color (or class) is tried for survival, and
survives only if its whole neighbourhood under the structuring element belongs to the same
class. Figure 4.1 shows an example and how the erosion trial works at pixel level.
(a) Original (b) Result (c) Structuring element
Figure 4.1: Example of erosion where the black pixel class was eroded
In order to make sure that the mask was not shrunk too much, a dilate step may follow. Like
erode, the dilate operator works on a binary image, but it tries all non-target pixels for
survival. Figure 4.2 shows an example and how the dilation trial works at pixel level.1
1 Figures in this section were taken from http://docs.opencv.org/doc/tutorials/imgproc/erosion_dilatation/erosion_dilatation.html and [10]
(a) Original (b) Result (c) Structuring element
Figure 4.2: Example of dilation where the black pixel class was dilated
By combining erode and dilate, point-like noise can be eliminated and masking errors fixed in
an adaptive way, reducing mask noise. The parameters of the two operators are exposed to the
end-user and can be tuned at run-time.
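This erode-then-dilate combination (morphological "opening") can be sketched in numpy. The sketch assumes a fixed 3x3 square structuring element, whereas the real nodes expose the kernel size and iteration count as parameters:

```python
import numpy as np

def _shift_windows(mask):
    """Yield the 9 shifted copies of `mask` covering each pixel's 3x3 neighbourhood."""
    h, w = mask.shape
    p = np.pad(mask, 1, constant_values=False)
    for dy in (0, 1, 2):
        for dx in (0, 1, 2):
            yield p[dy:dy + h, dx:dx + w]

def erode(mask, iterations=1):
    """Binary erosion, 3x3 kernel: a pixel survives only if its whole neighbourhood is set."""
    m = mask.astype(bool)
    for _ in range(iterations):
        out = np.ones_like(m)
        for win in _shift_windows(m):
            out &= win
        m = out
    return m

def dilate(mask, iterations=1):
    """Binary dilation, 3x3 kernel: a pixel is set if any neighbour is set."""
    m = mask.astype(bool)
    for _ in range(iterations):
        out = np.zeros_like(m)
        for win in _shift_windows(m):
            out |= win
        m = out
    return m

# Opening keeps the solid 3x3 region but drops the lone noise pixel.
noisy = np.zeros((7, 7), dtype=bool)
noisy[2:5, 2:5] = True   # a solid 3x3 region (the real mask)
noisy[0, 6] = True       # point-like noise
opened = dilate(erode(noisy))
```

In practice the same effect is obtained with OpenCV's cv2.erode and cv2.dilate, which is what the segmentation nodes rely on.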
4.3 ROS node design
Since the segmentation tasks are well defined, the same ROS node skeleton can be used to
implement all segmentation methods. The node has two primary input topics: image and
camera info. The latter holds the camera parameters and is published by the ROS node
responsible for capturing images. The output topics of the node are: a debug topic holding
information about the inner workings (e.g. a correlation map), a masked version of the input
image, and a mask. For efficiency the node is designed so that messages on these topics are
only published when there is at least one node subscribing to them. For this reason the debug
topic is usually empty.
Figure 4.3: ROS node design of segmentation nodes
As mentioned before, the parameters of the erode and dilate operators need to be exposed.
This is solved through the dynamic reconfigure2 interface provided by ROS. For both erode
and dilate, the number of iterations and the kernel size can be set. An extra threshold
parameter was included because segmentation methods often use at least one thresholding
operator internally.
Figure 4.4: Parameters exposed through dynamic reconfigure
4.4 Stereo disparity-based segmentation
4.4.1 Computing depth information from stereo imaging
(a) Left camera image (b) Right camera image (c) Computed disparity image
(d) Computed depth map that
matches the dimensions of the left
camera image
Figure 4.5: Example of stereo vision
2 http://ros.org/wiki/dynamic_reconfigure
In Figure 4.5 the disparity image is computed by matching image patches between the images
captured by the two cameras. Subfigure (d) of Figure 4.5 shows a depth map with each pixel
colored according to its estimated depth value; black regions have unknown depth. The major
drawback of stereo cameras compared to RGB-D sensors is that while depth images acquired
from RGB-D sensors are dense, stereo systems tend to have holes in the depth map where no
depth information is available. This effect is mostly due to the fact that stereo cameras operate
using feature detection and matching, while most RGB-D cameras use light-emitting
techniques. Depth-map holes occur in regions where no features could be extracted because
of lack of texture. RGB-D techniques avoid the texturelessness problem by emitting a light
pattern onto the surface and measuring its distortion.
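The triangulation behind the depth map of subfigure (d) reduces to the standard relation depth = focal_length * baseline / disparity. A minimal sketch, with made-up focal length and baseline values for illustration:

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Triangulate metric depth from stereo disparity.
    A disparity of 0 means no match was found: a hole in the depth map."""
    if disparity_px <= 0:
        return None
    return focal_length_px * baseline_m / disparity_px

# Hypothetical camera: 500 px focal length, 12 cm stereo baseline.
z = depth_from_disparity(disparity_px=20, focal_length_px=500.0, baseline_m=0.12)
hole = depth_from_disparity(disparity_px=0, focal_length_px=500.0, baseline_m=0.12)
```

Note the inverse relation: small disparities (distant objects) make the depth estimate very sensitive to matching errors, which is one reason the depth component of stereo pose estimates is the least reliable one.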
4.4.2 Segmentation
After obtaining a depth image, it is not enough to create a mask directly from the distance
values of single pixels. Such a mask only reflects the raw result of the segmentation; further
steps can be taken to refine it.
Distance-based segmentation is useful but not sufficient on its own. Even though some parts
of the input image are usually discarded, it can still forward too much unwanted information
to a costly image-processing system. Images of experiments are shown in Figure 4.6.
Segmentation steps can be organized in a pipeline, so the obtained result is an aggregate of
masks computed using different techniques.
Figure 4.6: Masked, input and disparity images
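Aggregating such a pipeline boils down to a logical AND over the per-stage masks; a minimal sketch, assuming the convention that 1 marks a valid pixel:

```python
import numpy as np

def combine_masks(*masks):
    """Aggregate the masks of several segmentation stages: a pixel stays
    valid only if every stage kept it."""
    out = np.ones_like(masks[0], dtype=bool)
    for m in masks:
        out &= m.astype(bool)
    return out

# Hypothetical outputs of two pipeline stages:
stereo = np.array([[1, 1, 0], [1, 1, 0]])
histogram = np.array([[0, 1, 1], [1, 1, 1]])
final = combine_masks(stereo, histogram)
```

Each stage can only shrink the valid region, which is why the stereo+histogram hybrid in Table 4.1 ends up with the fewest extracted features.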
4.5 Template-based segmentation
4.5.1 Template matching
Template matching is a common way to start with object detection but rarely succeeds as a
standalone solution. It is well suited to finding a subimage inside a larger image, but the
matching often fails when the pattern comes from a different source.
The most straightforward approach to template matching is image correlation. Its output is a
correlation map, usually represented as a single-channel floating-point image of the same size
as the scanned image, where each pixel holds the result of the image-subimage correlation
centered at that position.
OpenCV has a highly optimized implementation of template matching in which several
different correlation methods can be chosen.3
(a) Debug image showing the window around the
target
(b)
Template
(c) Masked image (d) Mask
Figure 4.7: Template-based segmentation
4.5.2 Segmentation
Irrelevant regions can be masked by thresholding the correlation map at a chosen limit and
using the result as the final mask. For tuning convenience and to handle noise, erode and
dilate operations can also be applied.
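This correlation-map-plus-threshold idea can be sketched as a naive normalized cross-correlation (a simplified stand-in for OpenCV's optimized matchTemplate with TM_CCORR_NORMED, not the implementation used by the node):

```python
import numpy as np

def correlation_map(image, template):
    """Normalized cross-correlation of `template` against every window
    of `image`; each output pixel scores the window anchored there."""
    th, tw = template.shape
    t = template.astype(float)
    t_norm = np.sqrt((t * t).sum())
    out = np.zeros((image.shape[0] - th + 1, image.shape[1] - tw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            w = image[y:y + th, x:x + tw].astype(float)
            w_norm = np.sqrt((w * w).sum())
            out[y, x] = (w * t).sum() / (w_norm * t_norm + 1e-12)
    return out

img = np.zeros((6, 6))
img[2:4, 3:5] = 1.0                    # a bright 2x2 patch to find
tpl = np.ones((2, 2))
cmap = correlation_map(img, tpl)
mask = cmap >= 0.99                    # threshold the map into a mask
best = np.unravel_index(cmap.argmax(), cmap.shape)
```

The thresholded map is exactly the mask the node publishes; the argmax gives the single best-matching position when only one is wanted.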
3 OpenCV documentation: http://docs.opencv.org/modules/imgproc/doc/object_detection.html?highlight=matchtemplate
4.6 Histogram backprojection-based segmentation
4.6.1 Histogram backprojection
Calculating the histogram of an image is a fast operation and provides pixel-level statistical
information. This type of information is often used to solve pattern matching in a relatively
simple way, based on the assumption that similar images or sub-images have similar
histograms, especially when these are normalized.
OpenCV provides an implementation of histogram backprojection in which the target
histogram (that of the pattern, in this case) is backprojected onto the scanned image,
producing a "correlation" image. This result indicates how well the target and sub-image
histograms match, so a maximum search finds the best-matching region.
(a) Input camera image (b) Histogram backprojection result (c) Masked image
Figure 4.8: The process of histogram backprojection-based segmentation
Figure 4.8 shows the results of experiments using the texture information captured during the
training of BLORT. At startup the segmentation node reads the template image and computes
its histogram. Later, when an input image is received, the node backprojects the stored
histogram onto it.
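The backprojection step itself can be sketched as follows. `backproject` is a simplified stand-in for OpenCV's cv2.calcBackProject, using a single-channel intensity histogram instead of the color histograms used in practice:

```python
import numpy as np

def backproject(image, template, bins=8, vmax=256):
    """Histogram backprojection: each image pixel receives the normalized
    frequency of its intensity bin in the template's histogram."""
    hist, _ = np.histogram(template, bins=bins, range=(0, vmax))
    hist = hist / hist.max()                       # normalize to [0, 1]
    idx = (image.astype(int) * bins // vmax).clip(0, bins - 1)
    return hist[idx]                               # look up each pixel's bin

template = np.full((4, 4), 200)                    # patch of bright target pixels
scene = np.array([[10, 200, 205], [30, 199, 12]])
bp = backproject(scene, template)
mask = bp > 0.5                                    # pixels likely from the target
```

Pixels whose intensities fall into well-populated bins of the template histogram score high regardless of where they sit in the image, which is why the method tolerates lighting shifts better than pixel correlation.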
4.6.2 Segmentation
The noise level of these results is not significant, so erode steps are not necessary here, but
the dilate operator can still be used to enlarge the valid regions of the mask. The parameters
are - as before - exposed through configuration files.
Experiments have shown that histogram backprojection works considerably more precisely
and faster than the pixel-correlation-based template matching approach. Figure 4.9 shows an
experiment where the pattern was the orange ball visible in the upper-right corner; on the left
is an image masked according to the result of the histogram backprojection. Image-
correlation-based matching usually fails under light conditions different from those the pattern was
captured under. Histogram backprojection is clearly more robust to changes in light
conditions.
Figure 4.9: Histogram segmentation using a template of the target orange ball
4.7 Combined results with BLORT
By masking the input image of the BLORT nodes, the overall success rate was increased. The
key to this success was controlling the ratio of inliers to outliers fed into the RANSAC
method inside the detector module. By manipulating the features handled by RANSAC so
that the Inliers/Outliers ratio increases, both the overall success rate and the speed can be
enhanced. This ratio cannot be increased directly, but it can be influenced by decreasing the
overall number of extracted features while trying to keep the ones coming from the object. A
good indicator is the ratio of Object SIFTs - the features matching the codebook - to All
SIFTs extracted from the image.
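Why this ratio matters can be illustrated with the standard RANSAC iteration-count formula (an illustration, not BLORT's actual code; the minimal sample size of 4 correspondences is an assumption):

```python
import math

def ransac_iterations(inlier_ratio, sample_size=4, confidence=0.99):
    """Iterations needed so that, with probability `confidence`, at least
    one RANSAC sample contains only inliers (standard textbook formula)."""
    return math.ceil(math.log(1.0 - confidence) /
                     math.log(1.0 - inlier_ratio ** sample_size))

# Object SIFTs / All SIFTs ratios taken from Table 4.1:
unsegmented = ransac_iterations(52 / 4106)   # no segmentation
segmented = ransac_iterations(52 / 600)      # stereo+histogram hybrid
```

Because the inlier ratio enters the formula raised to the sample size, even a modest improvement of the Object SIFTs / All SIFTs ratio cuts the expected number of iterations by orders of magnitude, which matches the detection times in Table 4.1.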
(a) Left camera image (b) After stereo segmentation (c) After histogram segmentation
(d) Detector result (e) Tracker result
Figure 4.10: The segmentation process and BLORT
This approach proved useful when BLORT is deployed in a noisy environment. To demon-
strate this, measurements were taken on a sample of 6 scenes, 100 times each. Table 4.1
shows the effectiveness of each segmentation method averaged over all scenes. The timeout
parameter of BLORT singleshot was set to 120 seconds. It can be seen that the speed and
success rate of BLORT were dramatically increased by segmenting the input, especially when
combining different techniques.
Method used             | Extracted features | Object SIFTs / All SIFTs | Success rate | Worst detection time
Nothing                 | 4106               | 52 / 4106                | 14%          | 101 s
Stereo-based            | 2287               | 53 / 2287                | 41%          | 64 s
Matching-based          | 3406               | 32 / 3406                | 31%          | 74 s
Histogram-based         | 1220               | 50 / 1220                | 50%          | 32 s
Stereo+histogram hybrid | 600                | 52 / 600                 | 82%          | 20 s

Table 4.1: Effect of segmentation on detection
For the test scenes depicted in Figure 4.11 the pose of the object was estimated using an
Augmented Reality marker and its detector.
(a) (b) (c)
(d) (e) (f)
Figure 4.11: Test scenes
4.8 Published software
All software developed for BLORT was published open-source on the ROS wiki and can be
found at the following link:
http://www.ros.org/wiki/pal_vision_segmentation
Figure 4.12: Screenshot of the ROS wiki documentation
4.9 Hardware requirements
There are no special hardware requirements for these nodes.
4.10 Future work
Future work on this topic may include the introduction of other pattern-matching techniques
or even new sensors. Also, most of the works marked as "detectors" in Chapter 2, such as
LINE-MOD, could be used for segmentation as long as their computation time is reasonable.
Chapter 5
Tracking the hand of the robot
5.1 Hand tracking problem
The previous chapters of this thesis dealt with estimating the pose of the target objects, which
is necessary for grasping. However, when considering the grasping problem (Section 1.7) and
the visual servoing problem (Section 1.8) in full detail, it is also necessary to track the robot
manipulator - in this case, the hand.
A reasonable approach could be to use a textured CAD model to track the hand, but the
design of REEM does not have any textures on the body by default. To overcome this
problem, the marker-based approach was selected. Augmented Reality applications already
feature marker-based detectors and trackers, so it is worthwhile to test them for tracking a
robot manipulator.
5.2 AR Toolkit
As its name indicates, the Augmented Reality Toolkit [1] was designed to support
applications implementing augmented reality. It is a widely known and supported
open-source project. It provides marker designs and software to detect these markers and
estimate their pose in 3D space, or to compute the viewpoint of the user.
(a) (b)
Figure 5.1: ARToolkit markers
The functionality is implemented using edge- and corner-detection techniques. A marker is
defined by its black frame, while the inside of the frame serves as the identifier of the marker
and as the primary indicator of orientation. Detection speed exceeds that of a usual
CPU-based implementation through the use of OpenGL and the GPU. However, relying on
the GPU can also cause problems when the target platform does not have such a unit or when
it is being used exclusively by other components.
A ROS wrapper for the AR Toolkit is already available, so it is straightforward to integrate
with a robot running ROS.
(a) Using an ARToolkit marker attached to an
object in the Gazebo simulator to test against
BLORT
(b) ARToolkit marker on the hand of REEM
Figure 5.2: ARToolkit in action
AR Toolkit gives the user freedom by using template matching (Section 4.5.1) for the center
of the markers, meaning that the inside of the marker can be customized. This flexibility hurts
detection, because pre-defined patterns can be better optimized both for speed and for
detection quality (precision, success rate, ambiguity). The minimum size of a printed marker
that still worked on REEM was 7x7 cm, which does not fit into a smooth design.
ARToolkit was the first library used for tracking the hand of the REEM robot, but it soon
turned out to have major problems with light changes. For the above reasons, the need for a
different approach emerged.
5.3 ESM
ESM is a completely custom pattern-matching technique that stands on solid ground thanks to
a custom minimization method defined specifically for it. Unfortunately it can only track the
target, but under the right circumstances it achieves high precision while remaining adaptive
to light-source changes.
ESM was tested in the Gazebo simulator and also with a real webcam, following the company
logo of PAL Robotics. Figure 5.3 shows screenshots of the tracking tests and the target pattern.
(a) Target pattern logo
of PAL Robotics
(b) Used in the Gazebo simulator (c) Used with a webcamera
Figure 5.3: Tests using the ESM ROS wrapper
Unfortunately, these tests showed that the circumstances mentioned above are quite strict
regarding continuity: the tracked marker may only move a tiny amount between two
consecutive images, so a very high frame-rate camera is required. While the cameras of
REEM would support such a frame rate, it is unwise to spend a large portion of the
computation time on capturing images from them.
Another problem is that ESM only provides a homography relative to the previous pattern,
which lives in the space of image points, not in 3D.
• Web page: [3]
• Article: Benhimane and Malis
• Video: http://www.youtube.com/watch?v=oN3sVTwNCBg
5.4 Aruco
Even though the Aruco library matches most Augmented Reality libraries in functionality, it
differs in implementation and application aspects. The markers used by Aruco may look
similar to the ones seen previously, but they differ in the definition of the inner side of the
marker: it is a 5x5 grid of black and white squares. These patterns encode numbers from 0 to
1023 using a modified Hamming code, which provides error detection and a way to measure
the distance between markers: the Hamming distance. Knowing the distance between markers
makes it possible to select the markers most distant from each other, minimizing the number
of false or uncertain detections.
(a) 300 (b) 582
Figure 5.4: Example markers of Aruco
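The selection of mutually distant codes can be sketched as a brute-force search over Hamming distances (the 5-bit example codes below are made up for illustration; real ArUco codes are 25-bit 5x5 grids):

```python
from itertools import combinations

def hamming(a, b):
    """Hamming distance between two marker codes (as bit patterns)."""
    return bin(a ^ b).count("1")

def most_separated(candidates, k):
    """Pick the k-subset of marker codes whose minimum pairwise Hamming
    distance is largest (brute force; fine for a handful of markers)."""
    return max(combinations(candidates, k),
               key=lambda s: min(hamming(a, b) for a, b in combinations(s, 2)))

codes = [0b00000, 0b11111, 0b00111, 0b11000]
picked = most_separated(codes, 2)   # the two most distant codes
```

Maximizing the minimum pairwise distance is what makes confusing one marker for another under noise as unlikely as possible.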
A significant addition of Aruco compared to other Augmented Reality libraries is its support
for marker boards, which allows several markers to define the same pose, bringing
redundancy and with it more robustness and precision to the detection system. With marker
boards, the unsuccessful detection of single markers - even several of them - is not a problem.
Numerous configurable techniques are used during the detection process. The most promising
among these is the one introduced by Otsu, which provides adaptive binary thresholding,
making the detection more robust to changes in light. Otsu's thresholding is applied to the
grey-converted input image to speed up the process and to increase precision. The core of the
method is an optimization problem: given an image histogram, the two most separable classes
have to be found. Once the classes have been found, classic thresholding takes place: the
elements belonging to the lower class are turned black while the other elements are turned
white. This procedure is highly advantageous with the black and white markers used by
Aruco.
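Otsu's optimization can be sketched directly from the histogram. This is a generic numpy implementation of the method (maximizing the between-class variance), not ArUco's actual code:

```python
import numpy as np

def otsu_threshold(gray):
    """Otsu's method: choose the threshold that maximizes the
    between-class variance of the two resulting intensity classes."""
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    total = int(hist.sum())
    sum_all = float(np.dot(np.arange(256), hist))
    best_t, best_var = 0, -1.0
    w0 = 0      # pixel count of the lower class so far
    sum0 = 0.0  # intensity sum of the lower class so far
    for t in range(256):
        w0 += int(hist[t])
        sum0 += t * int(hist[t])
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        m0 = sum0 / w0                      # mean of the lower class
        m1 = (sum_all - sum0) / w1          # mean of the upper class
        var_between = w0 * w1 * (m0 - m1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

# A bimodal "image": dark background pixels and bright marker pixels.
img = np.array([20] * 50 + [30] * 50 + [200] * 40 + [220] * 20)
t = otsu_threshold(img)
binary = img > t   # lower class -> black, upper class -> white
```

Because the threshold is recomputed from each frame's histogram, the binarization adapts automatically when the lighting shifts, which is exactly the robustness the markers benefit from.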
(a) Original image (b) After thresholding (c) Histogram with the computed dividing line
Figure 5.5: Otsu thresholding example
More about ArUco can be found on the web page: [2]
Figure 5.6: Video of Aruco used for visual servoing. Markers are attached to the hand and to
the target object in the Gazebo simulator.
http://www.youtube.com/watch?v=sI2mD9zRRw4
5.5 Application examples
The implemented Aruco ROS package can now be used for various robot tasks that depend on
3D pose input. Not only the robot hand but also other parts can be equipped with Aruco
markers, so robots can locate each other by their marked regions; markers can also serve on
self-charging stations, helping the robot execute a safe approach. Since feature-based pose
estimation will always be slower than the marker-based approach, a robot-prepared kitchen could
be set up by marking all important objects with the Aruco markers corresponding to their IDs
in the robot's database. A faster system could be implemented this way.
(a) Aruco in the Gazebo simulator attached to the
hand of the REEM model
(b) Real scene with the real REEM robot and an
Aruco marker on the hand
Figure 5.7: Tests done with Aruco
5.6 Software
The Aruco ROS nodes are planned to be released after verification by the Aruco authors.
5.7 Hardware requirements
There are no special hardware requirements for these nodes.
Chapter 6
Experimental results for visual
pre-grasping
6.1 Putting it all together: visual servoing architecture
The design of all components was done with the visual servoing architecture in mind. Figure
6.1 shows the final implemented structure of the general architecture presented in Section 1.8.
All the software produced in Chapters 3, 4 and 5 was integrated into this architecture.
Figure 6.1: Putting it all together: visual servoing architecture
Grasping can now be done given a visual servoing controller. At the time of this work the
visual servoing controller was being developed by another intern; it was the one controlling
the manipulator of the robot during the experiments.
6.2 Tests on the REEM RH2 robot
At the end of my internship at PAL Robotics, several experiments were done to validate this
approach integrated with the visual servoing controller. Results showed that the system is
capable of running at a relatively fast speed given that - except for minimal ones - no deep
optimizations were done. The average runtime was between 3 and 5 seconds in a cluttered
environment with the object often partially occluded. All final experiments were repeated
with the Pringles container and the juice box.
(a) (b) (c)
(d) (e)
Figure 6.2: A perfect result with the juicebox
Some experiments failed because the controller moved the arm to a position where the marker
was no longer visible or detectable. In other cases the depth component of the pose estimate
was not precise enough, causing the manipulator to push the object instead of positioning it
inside the gripper.
(a) (b) (c)
(d) (e)
Figure 6.3: An experiment gone wrong
The different launch modes of BLORT (tracking, singleshot) allow for different use-cases. In
the experiment shown in Figure 6.4 I tested how well the tracker behaves with the visual
servoing controller. The last sub-figure shows that even after the object was moved when the
hand was almost at the goal, the controller moved the hand to the new goal pose.
(a) (b) (c)
(d) (e) (f)
Figure 6.4: An experiment where tracking was tested
6.3 Hardware requirements
The hardware requirements for integrated solution is the sum of all the requirements of previ-
ously introduced ROS nodes in Chapters 3,4,5.
Experiments were done on several different machines.
• Desktop
– Intel Xeon Quad-Core E5620 2.4GHz
– 4 GB DDR3 memory
– NVidia GeForce GTX560
– Ubuntu Linux 10.04 Lucid Lynx
• Laptop1:
– Intel Core2Duo 2.2GHz
– 4 GB DDR2 memory
– Intel Graphics Media Accelerator X4500MHD
– Ubuntu Linux 11.04 Natty Narwhal
• Laptop2:
– Intel Core i7 2.6 GHz
– 8 GB DDR2 memory
– NVidia GeForce GT 650M
– Ubuntu Linux 10.04 Lucid Lynx
• Inner computer of the REEM robot:
– Intel Core2Duo 2.2 GHz
– Ubuntu Linux 10.04 Lucid Lynx
Chapter 7
Conclusion
7.1 Key results
The goals defined at the beginning of the work were all reached by the end of my internship at
PAL Robotics.
First, I gathered information about existing object detection techniques and software and
classified them by their principal attributes. After the first survey was done I selected the best
candidates for the task and analyzed them further by running demos and tests. I integrated the
chosen software package with the REEM robot and ran experiments with it. To increase the
speed of the software, parameters had to be fine-tuned, and I also introduced a new way to
speed up the system by segmenting the images. These segmentation nodes were implemented
in a general way so that other packages relying on image processing can also benefit from
them. As a result, REEM is now able to estimate the pose of a trained object in common
kitchen or home scenes.
Given a working pose estimation node, only a reliable hand tracker was missing. Using the
information gathered during the survey and additional advice from Jordi, I tested 3 different
packages to see which one is best for tracking the hand of the REEM robot. It turned out that
the Aruco library can do this job reasonably fast and accurately. After consulting the author of
Aruco I created a ROS package for it and used it in experiments to accomplish visual
pre-grasping poses. REEM is now able to track its own hand using vision and, by using this in
a visual servoing architecture, to move its hand into a grasping position.
While some tasks, such as tracking the hand, were easier to solve with existing software, the
parts regarding object detection were much harder to deal with. The survey work was really
interesting and I learned a lot about the field in general during those weeks. Choosing
BLORT was the best choice at that time. I consulted several times with fellow MSc students
from the Universitat Politecnica de Catalunya Institute of Robotics and Industrial Informatics
working on similar tasks for the RoboCup@Home competition (http://www.ai.rug.nl/
robocupathome/ (Last accessed: 2012.12.)), only to find out that they also had a hard time
finding an approach fulfilling all of the requirements. They used different software but a
broadly similar approach to solve their task; however, they had to attach a Kinect sensor to
the head of REEM, which in my case was not possible.
7.2 Contribution
Most of the work done during this thesis was given back to the community. This section
summarizes the contributions to the field.
I kept a daily blog of my work, which can be accessed here: http://bmagyaratpal.wordpress.com/
(Last accessed: 2012.12.) It can be useful to anyone working on similar problems.
All released software can be found in the GitHub repository of PAL Robotics:
https://github.com/pal-robotics (Last accessed: 2012.12.)
BLORT links to documentation and tutorials:
• BLORT stack: http://ros.org/wiki/perception_blort
• blort package: http://ros.org/wiki/blort
• blort ros package: http://ros.org/wiki/blort_ros
• siftgpu package: http://ros.org/wiki/siftgpu
• Training tutorial: http://www.ros.org/wiki/blort_ros/Tutorials/Training
• Track and detect tutorial: http://www.ros.org/wiki/blort_ros/Tutorials/TrackAndDetect
• ”How to tune?” tutorial: http://www.ros.org/wiki/blort_ros/Tutorials/Tune
Image segmentation nodes:
• http://www.ros.org/wiki/pal_vision_segmentation
As an additional result, several bugs and suggestions were submitted during the thesis work.
These are:
• dynamic reconfigure ROS package bug report
• odufinder ROS package bug report
• BLORT library bug report and implementation details
• ROS wiki bug report
• Gazebo simulator image encoding bug fix
• research on TOD and its current state
• research on WillowGarage ECTO and its current state
• several questions asked and answered at http://answers.ros.org/ (Last accessed: 2012.12.)
7.3 Future work
As with most software developed for a thesis, this work could also be further expanded in
several directions.
• One major feature would be to provide more flexible GPU usage with OpenGL, CUDA,
or OpenCL implementations.
• Further decrease the number of features extracted by the detector module of BLORT to
gain speed, increase detection confidence, decrease ambiguity.
• A smart addition would be to block the detection of textureless object faces, such as the
bottoms of juice boxes. These caused most of the failed detections.
• The BLORT library itself could be further optimized and refactored to provide a conve-
nient way for future expansion.
• Use a database to store learned object models of BLORT. This could also be used to
interface with other object detection systems.
• Use a database designed for grasping to bring this work forward. The database entries
should have grasping points marked for each object so the robot can grasp it where it is
best.
Chapter 8
Bibliography
[1] AR Toolkit. http://www.hitl.washington.edu/artoolkit/. Accessed: 22/08/2012.
[2] Aruco. http://www.uco.es/investiga/grupos/ava/node/26. Accessed:
22/08/2012.
[3] ESM software development kit. http://esm.gforge.inria.fr/. Accessed: 22/08/2012.
[4] Julius Adorf. Object detection and segmentation in cluttered scenes through perception
and manipulation, 2011.
[5] Herbert Bay, Tinne Tuytelaars, and Luc Van Gool. Surf: Speeded up robust features. 3951:
404–417, 2006.
[6] S. Benhimane and E. Malis. Homography-based 2d visual tracking and servoing. Inter-
national Journal of Robotic Research (Special Issue on Vision and Robotics joint with the
International Journal of Computer Vision), 2007.
[7] S. Benhimane and E. Malis. Homography-based 2d visual tracking and servoing. Inter-
national Journal of Robotic Research (Special Issue on Vision and Robotics joint with the
International Journal of Computer Vision), 2007.
[8] S. Benhimane, A. Ladikos, V. Lepetit, and N. Navab. Linear and quadratic subsets for
template-based tracking. IEEE Computer Society Conference on Computer Vision and
Pattern Recognition, 2007.
[9] G. Bradski. The OpenCV Library. Dr. Dobb’s Journal of Software Tools, 2000.
[10] Dmitry Chetverikov. Basic algorithms of digital image processing, slides of course. ELTE.
[11] Paolo Cignoni, Massimiliano Corsini, and Guido Ranzuglia. Meshlab: an open-source 3d
mesh processing system. ERCIM News, 2008(73), 2008. doi: http://ercim-news.ercim.eu/
meshlab-an-open-source-3d-mesh-processing-system.
[12] S.A. Dudani. The distance-weighted k-nearest-neighbor rule. Systems, Man and Cybernetics, IEEE Transactions on, (4):325–327, 1976.
[13] Martin A. Fischler and Robert C. Bolles. Random sample consensus: A paradigm for
model fitting with applications to image analysis and automated cartography. Communi-
cations of the ACM , 24(6):381–395, 1981.
[14] Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides. Design patterns: ele-
ments of reusable object-oriented software. Addison-Wesley Professional, 1995.
[15] Corey Goldfeder, Matei Ciocarlie, Hao Dang, and Peter K. Allen. The columbia grasp
database. In IEEE Intl. Conf. on Robotics and Automation, 2009.
[16] S. Hinterstoisser, V. Lepetit, S. Ilic, P. Fua, and N. Navab. Dominant orientation templates
for real-time detection of texture-less objects. 2010.
[17] S. Hinterstoisser, S. Holzer, C. Cagniart, S. Ilic, K. Konolige, N. Navab, and V. Lepetit.
Multimodal templates for real-time detection of texture-less objects in heavily cluttered
scenes. 2011.
[18] S. Holzer, S. Hinterstoisser, S. Ilic, and N. Navab. Distance transform templates for object
detection and pose estimation. 2009.
[19] A. Ladikos, S. Benhimane, and N. Navab. A real-time tracking system combining
template-based and feature-based approaches. In International Conference on Computer
Vision Theory and Applications, 2007.
[20] A. Ladikos, S. Benhimane, M. Appel, and N. Navab. Model-free markerless tracking
for remote support in unknown environments. In International Conference on Computer
Vision Theory and Applications, 2008.
[21] A. Ladikos, S. Benhimane, and N. Navab. High performance model-based object detection
and tracking. In Computer Vision and Computer Graphics. Theory and Applications,
volume 21 of Communications in Computer and Information Science. Springer, 2008.
ISBN 978-3-540-89681-4.
[22] Robert Laganiere. OpenCV 2 Computer Vision Application Programming Cookbook. Packt Publishing, 2011. ISBN 1849513244.
[23] David G. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60:91–110, 2004. ISSN 0920-5691. URL http://dx.doi.org/10.1023/B:VISI.0000029664.99615.94.
[24] Marius Muja, Radu Bogdan Rusu, Gary Bradski, and David Lowe. REIN - a fast, robust, scalable recognition infrastructure. In ICRA, Shanghai, China, 2011.
[25] T. Morwald, M. Zillich, and M. Vincze. Edge tracking of textured objects with a recursive particle filter. In 19th International Conference on Computer Graphics and Vision (Graphicon), Moscow, pages 96–103, 2009.
[26] T. Morwald, J. Prankl, A. Richtsfeld, M. Zillich, and M. Vincze. BLORT - the Blocks World Robotic Vision Toolbox. In Best Practice in 3D Perception and Modeling for Mobile Manipulation (workshop held in conjunction with ICRA 2010), 2010.
[27] David Nister and Henrik Stewenius. Scalable recognition with a vocabulary tree. In CVPR
- Computer Vision and Pattern Recognition, pages 2161–2168. IEEE, 2006.
[28] N. Otsu. A threshold selection method from gray-level histograms. Automatica, 11(285-
296):23–27, 1975.
[29] Morgan Quigley, Ken Conley, Brian P. Gerkey, Josh Faust, Tully Foote, Jeremy Leibs, Rob Wheeler, and Andrew Y. Ng. ROS: an open-source Robot Operating System. In ICRA Workshop on Open Source Software, 2009.
[30] A. Richtsfeld and M. Vincze. Basic object shape detection and tracking using perceptual organization. In International Conference on Advanced Robotics (ICAR), pages 1–6, 2009.
[31] A. Richtsfeld, T. Morwald, M. Zillich, and M. Vincze. Taking in shape: Detection and tracking of basic 3D shapes in a robotics context. In Computer Vision Winter Workshop (CVWW), pages 91–98, 2010.
[32] Ethan Rublee, Vincent Rabaud, Kurt Konolige, and Gary Bradski. ORB: an efficient alternative to SIFT or SURF. In International Conference on Computer Vision, Barcelona, 2011.
[33] Radu Bogdan Rusu, Gary Bradski, Romain Thibaux, and John Hsu. Fast 3D recognition and pose using the Viewpoint Feature Histogram. In Proceedings of the 23rd IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Taipei, Taiwan, 2010.
[34] Alexander Shishkov and Victor Eruhimov. Textured object detection. URL http://www.ros.org/wiki/textured_object_detection.
[35] Bruno Siciliano, Lorenzo Sciavicco, Luigi Villani, and Giuseppe Oriolo. Robotics: Modelling, Planning and Control. Springer, 2009.
[36] Changchang Wu. SiftGPU: A GPU implementation of scale invariant feature transform (SIFT). http://cs.unc.edu/~ccwu/siftgpu, 2007.
[37] M. Zillich and M. Vincze. Anytimeness avoids parameters in detecting closed convex
polygons. In The Sixth IEEE Computer Society Workshop on Perceptual Grouping in
Computer Vision (POCV 2008), 2008.
[38] Oliver Zweigle, Rene van de Molengraft, Raffaello d'Andrea, and Kai Haussermann. RoboEarth: connecting robots worldwide. In Proceedings of the 2nd International Conference on Interaction Sciences: Information Technology, Culture and Human, ICIS '09, pages 184–191, New York, NY, USA, 2009. ACM. ISBN 978-1-60558-710-3. doi: 10.1145/1655925.1655958. URL http://doi.acm.org/10.1145/1655925.1655958.
[39] Eric Marchand, Fabien Spindler, and Francois Chaumette. ViSP: a generic software platform for visual servoing.
Appendix A
Appendix 1: Deep survey tables
These tables were prepared based on the tables in Appendix B; the techniques listed here were
further analyzed and evaluated.
Test 1: ViSP model-based tracker
Experiences: Good and fast, but will require some further work. Tends to get stuck on strong edges.
Detection: Yes. Pose: Yes. Implementation: ViSP ROS package. Sensor required: monocular. Speed during test: 30 Hz. Technique keywords: CAD, edge tracking.

Test 2: RoboEarth ROS packages
Experiences: The detection rate is too poor. 2D detection is as good as 3D for textured objects; for untextured objects neither works.
Detection: Yes. Pose: Yes. Implementation: ROS stack. Sensor required: Kinect (detection; seems compulsory for recording), monocular (detection). Speed during test: 10-11 Hz. Technique keywords: recognition, kinect.

Test 3: fast template detector
Experiences: none.
Detection: Yes. Pose: No. Implementation: ROS package. Sensor required: monocular. Speed during test: none. Technique keywords: DOT.

Test 4: Object of daily use finder
Experiences: False detection is too high; the code is incomplete, unoptimized, and has memory leaks. No relevant output published.
Detection: Yes. Pose: No. Implementation: ROS package. Sensor required: monocular. Speed during test: 4-6 Hz (republishing the input image topic on object found). Technique keywords: vocabulary tree, SIFT, local image regions, DOT.

Test 5: BIGG detector + VFH + ReIn
Experiences: Ferran: dropped it because of a very high false detection rate; seems almost random.
Detection: Yes. Pose: Yes. Implementation: ROS package. Sensor required: stereo (or RGB-D). Speed during test: none. Technique keywords: ReIn, BiGG (monocular), VFH (point cloud).

Test 6: Stefan Hinterstoisser, Holzer: LINEMOD
Experiences: Still waiting for it.
Detection: Yes. Pose: Yes. Implementation: OpenCV implementation (work in progress). Sensor required: monocular. Speed during test: none. Technique keywords: related to DOT, LINE-2D, LINE-3D.

Test 7: BLORT - Blocks World Robotic Vision Toolbox
Experiences: It looks promising; produces quite valuable output, unlike the others.
Detection: Yes. Pose: Yes. Implementation: standalone native C++, OpenGL, lots of legacy code. Sensor required: monocular. Speed during test: 10-20 Hz (SIFT, GPU). Technique keywords: edge tracking, SIFT, GPU, CAD.

Test 8: Object recognition, TOD, Ecto
Experiences: No relevant documentation. Summary: http://answers.ros.org/question/29357/searching-for-tod-ending-up-at-ecto
Detection: none. Pose: Yes. Implementation / Sensor required / Speed during test / Technique keywords: ...

Table A.1: Filtered deep survey part 1
Test 1
Link: http://www.irisa.fr/lagadic/visp/computer-vision.html
Short description: This tracker works using CAD models (VRML format) and provides the location and pose of the followed object. A tracker tracks one object at a time.
Type of learning: offline. Extendable how: adding different CAD models.
Related papers: http://www.irisa.fr/lagadic/publi/all/all-eng.html

Test 2
Link: http://www.ros.org/wiki/roboearth
Short description: RoboEarth aims at creating a rich database of knowledge useful for robots. They use ROS as the platform for their work and provide useful packages with which you can download models from the database, or record your own models using a printable marker template and upload them to the database. Relies on the online database.
Type of learning: online (record mode using printed templates). Extendable how: using RoboEarth's database.
Related papers: none

Test 3
Link: http://ros.org/wiki/fast_template_detector
Short description: An implementation of DOT by Holzer, without pose estimation.
Type of learning: online. Extendable how: printed template.
Related papers: same as DOT

Test 4
Link: http://ros.org/wiki/objects_of_daily_use_finder
Short description: A general framework for detecting objects; its database is pre-built with often-used kitchen objects.
Type of learning: offline. Extendable how: adding new images to the image data folder, offline training; 5 Hz at 640x480.
Related papers: http://www.vis.uky.edu/~dnister/Publications/2006/VocTree/nister_stewenius_cvpr2006.pdf

Test 5
Link: http://www.ros.org/wiki/bigg_detector
Short description: BiGG stands for Binary Gradient Grid; a faster implementation is BiGGPy, where the matching algorithm at the end is changed to a pyramid matching method. In the related paper a combination of BiGG and VFH is done using ReIn, and it yields reliable results.
Type of learning: offline (manually selected 3D bounding box or segmented point cloud); models can be stored in a database. Extendable how: training.
Related papers: http://www.willowgarage.com/sites/default/files/icra11.pdf , http://www.ais.uni-bonn.de/~holz/spme/talks/01_Bradski_SemanticPerception_2011.pdf , http://www.cs.ubc.ca/~lowe/papers/11muja.pdf

Test 6
Link: (meeting notes: look for Maria from 2011.07.) http://pr.willowgarage.com/wiki/OpenCVMeetingNotes
Short description: Currently being refactored (or reimplemented) into OpenCV by Willow Garage. This technique is capable of detecting multiple objects which had been learned before. "To detect an object under a full coverage of viewpoints (360 degree tilt rotation, 90 degree inclination rotation and in-plane rotations of +/- 80 degrees, scale changes from 1.0 to 2.0), we usually need less than 2000 templates." Weak spot: motion blur.
Type of learning: online. Extendable how: printed template.
Related papers: http://campar.cs.tum.edu/pub/hinterstoisser2011linemod/hinterstoisser2011linemod.pdf , http://campar.cs.tum.edu/pub/hinterstoisser2011pami/hinterstoisser2011pami.pdf

Test 7
Link: http://www.acin.tuwien.ac.at/?id=290
Short description: ...
Type of learning: online. Extendable how: CAD.
Related papers: go to link

Test 8: ...

Table A.2: Filtered deep survey part 2
Test 1: Data required: CAD, initial pose, 2D. Textured: Yes. Textureless: Yes. Output: pose. Visualization: Yes. Library used: OpenCV, ViSP. License: GPL, BSD. Last activity: 2011. Video: http://www.youtube.com/watch?v=UK10KMMJFCI

Test 2: Data required: 3D (record, detect), 2D (detect). Textured: Yes. Textureless: Yes. Output: model (record), matched point cloud (Kinect detect), detected object name, pose. Visualization: Qt interface for database apps, rviz for topics, PCL visualization. Library used: OpenCV, PCL; there is a Java dependency at the ontology. License: BSD, GPL, LGPL. Last activity: 2011. Video: ...

Test 3: Data required: 2D + 3D. Textured / Textureless / Output / Visualization / Library used: ... License: LGPL. Last activity: 2010. Video: ...

Test 4: Data required: 2D. Textured: Yes (trees). Textureless: No. Output: best matching template on a ROS topic. Visualization: Yes. Library used: OpenCV. License: BSD. Last activity: 2012. Video: ...

Test 5: Data required: 3D. Textured: Yes. Textureless: Yes. Output: partial. Visualization: Yes. Library used: ReIn. License: BSD. Last activity: 2011. Video: ...

Test 6: Data required: 2D, 3D. Textured: No. Textureless: Yes. Output: detection, ROI, pose. Visualization: Yes. Library used: OpenCV. License: BSD. Last activity: 2012. Video: ...

Test 7: Data required: 2D. Textured: Yes. Textureless: No. Output: pose. Visualization: Yes. Library used: OpenCV, SiftGPU. License: BSD. Last activity: 2010. Video: ...

Test 8: ...

Table A.3: Filtered deep survey part 3
Appendix B
Appendix 2: Shallow survey tables
These tables were prepared to help evaluate the available scientific works related to grasping
and tabletop manipulation. The assessments here are not final; they served as the first pass for
the deep survey.
Name: Object of daily use finder
Link: http://ros.org/wiki/objects_of_daily_use_finder , http://answers.ros.org/question/388/household_objects_database-new-entries
Type: framework for recognition.
Technique keywords: vocabulary tree, SIFT, local image regions, DOT.
Short description: A general framework for detecting objects; its database is pre-built with often-used kitchen objects.

Name: RoboEarth ROS packages
Link: http://www.ros.org/wiki/roboearth
Type: framework for recognition and object model creation.
Technique keywords: recognition, kinect.
Short description: RoboEarth aims at creating a rich database of knowledge useful for robots. They use ROS as the platform for their work and provide useful packages with which you can download models from the database, or record your own models using a printable marker template and upload them to the database. Relies on the online database.

Name: BLORT
Link: http://users.acin.tuwien.ac.at/mzillich/?site=4
Type: framework for simplified recognition.
Technique keywords: 3D SIFT, RANSAC, edges as features, GPU processing, particle filtering.
Short description: A framework for recognizing objects that can be described as types of blocks. The framework is capable of tracking objects and learning texture by fitting a simple block (cuboids, cylinders, cones, spheres) on an object. A strong constraint when using this framework is that the objects must be simple; no irregularly formed or deformable object is considered. The method seems robust against occlusion and background clutter.

Name: Willow Garage ECTO
Link: http://ros.org/wiki/object_recognition , http://ecto.willowgarage.com , https://github.com/wg-perception/object_recognition_ros_server (doesn't really seem stable, no docs)
Type: framework for perception; seems to be much like a general framework for "processing".
Technique keywords: cells, non-cyclic graph, typed edge: cell; object recognition: bag-of-features representation.
Short description: An Ecto system is made of cells with shared memory, which form a non-cyclic graph. The computation goes along the graph. An Ecto graph can be compiled into threaded code. Ecto is quite abstract and the ROS package is undocumented. Development has also been in beta since August 2011. More: http://ecto.willowgarage.com/releases/bleedingedge/ecto/motivation.html

Name: Hinterstoisser: DOT
Link: http://campar.in.tum.de/Main/StefanHinterstoisser , http://tw.myblog.yahoo.com/stevegigijoe/article?mid=275&prev=277&next=264 (how to get it to work on Linux)
Type: algorithm.
Technique keywords: similar to HoG-based representation, template matching, low-texture objects.
Short description: Needs strong gradients. Can learn 3D appearances using a printed pattern where the target object is placed. Really fast; uses bitwise operators. Has a really low false positive rate; robust to occlusion. Tracking is not continuous.

Name: ViSP model-based tracker
Link: http://www.irisa.fr/lagadic/visp/computer-vision.html
Type: algorithm.
Technique keywords: CAD, edge tracking.
Short description: This tracker works using CAD models (VRML format) and provides the location and pose of the followed object. A tracker tracks one object at a time.

Name: Hinterstoisser: vision-targeted CAD models
Link: http://campar.in.tum.de/Chair/ProjectComputerVisionCADModel
Type: algorithm.
Technique keywords: natural 3D markers (N3M).
Short description: This technique requires a CAD model of the target object and does offline training on it to choose the best N3Ms that will be used during tracking.

Table B.1: Wide shallow survey part 1
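The cell-graph idea behind Ecto can be illustrated in a few lines of Python. This is not the real Ecto API, just a hypothetical sketch: each cell consumes named inputs and produces named outputs, and a scheduler executes the non-cyclic graph in dependency order, exactly the "computation goes along the graph" behaviour described above.

```python
# Hypothetical sketch of a cell graph in the spirit of Ecto (not the real API):
# cells read named inputs from a shared blackboard and write named outputs,
# and the scheduler runs the non-cyclic graph in dependency order.

class Cell:
    def __init__(self, name, inputs, outputs, fn):
        self.name, self.inputs, self.outputs, self.fn = name, inputs, outputs, fn

def run_graph(cells):
    """Execute each cell once all of its inputs are available (topological order)."""
    blackboard, pending = {}, list(cells)
    while pending:
        ready = [c for c in pending if all(k in blackboard for k in c.inputs)]
        if not ready:
            raise ValueError("graph is cyclic or has unsatisfied inputs")
        for cell in ready:
            results = cell.fn(*[blackboard[k] for k in cell.inputs])
            blackboard.update(zip(cell.outputs, results))
            pending.remove(cell)
    return blackboard

# Toy pipeline: grab an "image", extract "features", match them.
graph = [
    Cell("matcher", ["features"], ["matches"], lambda f: ([x * 2 for x in f],)),
    Cell("source", [], ["image"], lambda: ([1, 2, 3],)),
    Cell("extractor", ["image"], ["features"], lambda img: ([x + 10 for x in img],)),
]
print(run_graph(graph)["matches"])  # [22, 24, 26]
```

Note that the cells are deliberately listed out of order: the scheduler still runs source, then extractor, then matcher, because execution is driven by data availability rather than by list position.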
Name: Object of daily use finder
Speed: 10 FPS.
Extendable: Yes (adding new images to the image-data folder, offline training; 5 Hz at 640x480).
Type of learning: offline.
Implementation: ROS package.
Related papers: http://www.vis.uky.edu/~dnister/Publications/2006/VocTree/nister_stewenius_cvpr2006.pdf
Data required: 2D.

Name: RoboEarth ROS packages
Extendable: Yes (using RoboEarth's database).
Type of learning: online (record mode using printed templates).
Implementation: ROS stack.
Data required: 3D (record, detect), 2D (detect).

Name: BLORT
Speed: 30 FPS on GPU (GeForce GTX 285).
Extendable: Yes (has learning features built in).
Type of learning: online.
Implementation: native C++, OpenGL.
Related papers: http://users.acin.tuwien.ac.at/mzillich/files/zillich2008anytimeness.pdf
Data required: 2D.

Name: Willow Garage ECTO
Speed: depends on the size of the graph and the computational cost of the cells.
Extendable: Yes (you can build the graph of cells and let it work).
Implementation: C++; (in theory) it's ready to use with ROS.
Related papers: http://ecto.willowgarage.com/releases/bleedingedge/ecto/overview/cells.html
Data required: depends.

Name: DOT for real-time detection
Speed: 12 FPS on an ordinary laptop.
Extendable: Yes (has fast online training features).
Type of learning: online (record mode using printed templates).
Implementation: native; originally Windows, but works on Linux.
Related papers: http://campar.in.tum.de/personal/hinterst/index/project/CVPR10.pdf
Data required: 2D.

Name: ViSP model-based tracker
Extendable: Yes (adding different CAD models).
Type of learning: none.
Implementation: ViSP ROS package.
Related papers: http://www.irisa.fr/lagadic/publi/all/all-eng.html
Data required: CAD, initial pose, 2D.

Name: Vision-targeted CAD models
Speed: 15 FPS (tested on a 1 GHz Centrino).
Extendable: Yes (adding different CAD models).
Type of learning: offline.
Related papers: http://wwwnavab.in.tum.de/Chair/PublicationDetail?pub=hinterstoisser2007N3M
Data required: CAD, 2D.

Table B.2: Wide shallow survey part 2
Name: Object of daily use finder
Textured: Yes (trees). Textureless: Yes (DOT). Output: best matching template. Detection: Yes. Pose: available. Library used: OpenCV. Sensor required: monocular. License: BSD. Last activity: 2012.
Additional comments: both SIFT and DOT, for textured and textureless objects.

Name: RoboEarth ROS packages
Textured: Yes. Textureless: Yes. Output: name and pose. Detection: Yes. Pose: Yes. Visualization: available. Library used: OpenCV, PCL. Sensor required: Kinect (detect, record), monocular (detect). License: BSD, GPL, LGPL. Last activity: 2011.

Name: BLORT
Textured: Yes. Textureless: Yes. Output: pose estimate. Detection: Yes. Pose: Yes. Visualization: own visualization using OpenGL. Library used: own library. Sensor required: monocular. License: modified BSD. Last activity: 2010.

Name: Willow Garage ECTO
Textured: depends. Textureless: depends. Output: output of cells. Detection: depends. Pose: depends. Visualization: highgui. Library used: OpenCV, ROS, PCL. Sensor required: any. License: BSD. Last activity: 2011.

Name: Hinterstoisser: DOT
Textured: No. Textureless: Yes. Output: location and pose (if 3D info is available). Detection: Yes. Pose: Yes, if 3D info is available. Library used: OpenCV, Intel IPP, ESM. Sensor required: monocular. License: LGPL. Last activity: 2010.

Name: ViSP model-based tracker
Textured: Yes. Textureless: Yes. Output: pose. Detection: Yes. Pose: Yes. Visualization: Yes. Library used: OpenCV, ViSP. Sensor required: monocular. License: GPL, BSD. Last activity: 2011.
Additional comments: had to install dependencies manually.

Name: Hinterstoisser: vision-targeted CAD models
Textured: Yes. Textureless: Yes. Output: Yes. Detection: Yes. Pose: Yes. Library used: OpenCV, ESM. Sensor required: monocular. Last activity: 2007.

Table B.3: Wide shallow survey part 3
Name: ReIn
Link: http://www.ros.org/wiki/rein
Type: framework for recognition.
Technique keywords: nodelet, computational graph, modular.
Short description: The framework is modular; it consists of AttentionOperator, Detector, PoseEstimator and ModelFilter modules defining a computational graph. Replaced with Ecto.

Name: Hinterstoisser: LINEMOD
Link: http://campar.in.tum.de/Main/StefanHinterstoisser
Type: algorithm.
Technique keywords: DOT.
Short description: Currently being refactored (or reimplemented) into OpenCV by Willow Garage. This technique is capable of detecting multiple objects which had been learned before.

Name: OpenCV BOWImgDescriptorExtractor
Link: http://opencv.itseez.com/modules/features2d/doc/object_categorization.html , http://pr.willowgarage.com/wiki/OpenCVMeetingNotes/Minutes2011-12-06
Type: simple framework.
Technique keywords: descriptor (feature) extraction, descriptor matching, bag of words.
Short description: An OpenCV-provided descriptor and matching framework.

Name: Hae Jong Seo: LARKS
Link: http://www.ros.org/wiki/larks
Type: algorithm.
Technique keywords: ReIn.
Short description: Using locally adaptive regression kernels (LARKs) makes it easy to recognize objects of interest from a single example. It provides similar efficiency to other state-of-the-art techniques, but requires no training at all. Emphasis is on generic object detection. It seems more like an information retrieval tool.

Name: BIGG detector
Link: http://www.ros.org/wiki/bigg_detector
Type: algorithm.
Technique keywords: ReIn, BiGG, VFH.
Short description: BiGG stands for Binary Gradient Grid; a faster implementation is BiGGPy, where the matching algorithm at the end is changed to a pyramid matching method. In the related paper a combination of BiGG and VFH is done using ReIn, and it yields reliable results.

Name: ORB
Type: algorithm.
Technique keywords: feature extraction.
Short description: No scale invariance right now. Could be used with BOWImgDescriptorExtractor.

Name: textured object detection
Link: http://ros.org/wiki/textured_object_detection
Type: algorithm.
Technique keywords: detection, training, TOD.
Short description: Training needs pre-given pictures and point cloud files of each object. Makes use of rosbag to train.

Name: stereo object recognition
Link: http://ros.org/wiki/stereo_object_recognition
Type: class framework for 3D.
Short description: No documentation. Used by the textured object detector. Seems dead; no new papers listed since 2009.

Table B.4: Wide shallow survey part 4
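The bag-of-words idea behind OpenCV's BOWImgDescriptorExtractor can be shown without OpenCV itself: each local descriptor is quantized to its nearest "visual word" (a cluster center in the vocabulary), and the image is summarized as a normalized histogram of word counts. A minimal pure-Python sketch follows; the tiny 2-D vocabulary here is hand-picked for illustration, whereas in practice it would come from k-means clustering of SIFT or ORB descriptors.

```python
# Bag-of-visual-words sketch: local descriptors are quantized against a fixed
# vocabulary of cluster centers, and the image is summarized as a normalized
# histogram of visual-word occurrences.

def nearest_word(desc, vocabulary):
    """Index of the vocabulary entry closest to `desc` (squared Euclidean)."""
    dists = [sum((d - w) ** 2 for d, w in zip(desc, word)) for word in vocabulary]
    return dists.index(min(dists))

def bow_histogram(descriptors, vocabulary):
    """Normalized visual-word histogram of an image's local descriptors."""
    counts = [0] * len(vocabulary)
    for desc in descriptors:
        counts[nearest_word(desc, vocabulary)] += 1
    total = sum(counts)
    return [c / total for c in counts]

# Hand-picked 2-D "vocabulary" of three visual words (illustration only).
vocabulary = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
descriptors = [(0.1, 0.1), (0.9, 1.0), (0.2, 0.0), (0.1, 0.9)]
print(bow_histogram(descriptors, vocabulary))  # [0.5, 0.25, 0.25]
```

The resulting fixed-length histogram is what makes variable numbers of local features comparable across images, so it can be fed directly to a classifier or a vocabulary-tree lookup.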
Name: ReIn
Speed: depends.
Extendable: Yes (implementing ReIn components and plugging them together).
Implementation: ROS package.
Related papers: http://www.willowgarage.com/sites/default/files/icra11.pdf , http://www.cs.ubc.ca/~lowe/papers/11muja.pdf
Data required: 2D, 3D (optional).

Name: Hinterstoisser: LINEMOD
Speed: 10 FPS.
Extendable: Yes (printed template).
Type of learning: online.
Implementation: OpenCV implementation (work in progress).
Related papers: http://campar.cs.tum.edu/pub/hinterstoisser2011linemod/hinterstoisser2011linemod.pdf
Data required: 2D, 3D.

Name: OpenCV BOWImgDescriptorExtractor
Speed: depends.
Extendable: Yes (implementing more extractors and matchers).
Implementation: implemented in OpenCV.
Data required: 2D, 3D.

Name: Hae Jong Seo: LARKS
Speed: no info.
Extendable: Yes (one example at a time).
Type of learning: none (training-free).
Implementation: ROS package.
Related papers: http://www.soe.ucsc.edu/~milanfar/publications/journal/TrainingFreeDetection_Final.pdf
Data required: 2D.

Name: BIGG detector
Speed: no info.
Extendable: Yes (training).
Type of learning: offline (bounding box).
Implementation: ROS package.
Related papers: http://www.willowgarage.com/sites/default/files/icra11.pdf
Data required: 3D.

Name: ORB
Speed: 30 FPS.
Implementation: OpenCV implementation available.
Related papers: https://willowgarage.com/sites/default/files/orb_final.pdf
Data required: 2D.

Name: textured object detection
Extendable: Yes (training).
Type of learning: offline; needs manual interaction.
Implementation: ROS package.
Data required: 3D.

Name: stereo object recognition
Type of learning: offline, database.
Implementation: ROS package.
Data required: 3D.

Table B.5: Wide shallow survey part 5
Name: ReIn
Textured: Yes. Textureless: Yes. Output: detections, ROIs. Detection: supposed to. Pose: supposed to. Visualization: Yes. Library used: OpenCV, PCL. Sensor required: monocular, stereo (required for pose). License: BSD. Last activity: 2010.

Name: Stefan Hinterstoisser: LINEMOD
Textured: No. Textureless: Yes. Output: detection, ROI, pose. Detection: Yes. Pose: Yes. Visualization: Yes. Library used: OpenCV. Sensor required: monocular. License: BSD. Last activity: 2012.

Name: OpenCV BOWImgDescriptorExtractor
Textured: depends. Textureless: depends. Detection: No. Pose: No. Library used: OpenCV. License: BSD. Last activity: 2011.

Name: Hae Jong Seo: LARKS
Textured: Yes. Textureless: Yes. Detection: Yes. Pose: Yes. Library used: ReIn. Sensor required: monocular. License: BSD. Last activity: 2010.

Name: BIGG detector
Textured: Yes. Textureless: Yes. Output: Yes. Detection: Yes. Pose: Yes. Library used: ReIn. Sensor required: stereo. License: BSD. Last activity: 2011.

Name: ORB
Textured: Yes. Textureless: No. Detection: Yes. Pose: Yes. Library used: OpenCV. Sensor required: monocular. License: BSD. Last activity: 2011.

Name: textured object detection
Textured: Yes. Textureless: No. Detection: Yes. Pose: Yes. Library used: OpenCV. Sensor required: stereo. License: BSD. Last activity: 2010-2011.

Name: stereo object recognition
Sensor required: stereo. License: BSD. Last activity: 2010.

Table B.6: Wide shallow survey part 6
| Name | Link | Type | Technique keywords | Short description |
|---|---|---|---|---|
| Viewpoint Feature Histogram cluster classifier | http://ros.org/wiki/vfh_cluster_classifier , http://www.pointclouds.org/documentation/tutorials/vfh_recognition.php | Algorithm | PFH, detection, pose estimation, tabletop manipulation, mobile, KNN, kd-trees to FLANN, point cloud | Designed specifically for tabletop manipulation with one robot hand. Works with clusters of points; does recognition and pose estimation by defining meta-local descriptors. No reflective or transparent objects. |
| Holzer and Hinterstoisser: Distance transform templates | | Algorithm | distance transform, edge-based templates, template matching, ferns, Lucas-Kanade | Uses the Ferns classifier on templates extracted from an object. Applies a distance transform to images and matches templates in a Lucas-Kanade-like way. Templates are normalized, circularized contour patches. Edge-based, needs closed contours. Scales well. Claims to be better than N3Ms, RANSAC and Ferns. |
| fast template detector | http://ros.org/wiki/fast_template_detector | Algorithm | DOT | An implementation of DOT by Holzer |
| deformable objects detector | http://ros.org/wiki/dpm_detector | | | |
| tabletop object perception | http://www.ros.org/wiki/fast_plane_detection , http://www.ros.org/wiki/tabletop_object_detector | Pipeline | tabletop perception, plane detection, object detection | Was used with the PR2 for tabletop manipulation |
| OpenRAVE | http://openrave.programmingvision.com/en/main/index.html | Framework | motion planning | Old sources can be misleading; OpenRAVE now concentrates only on motion planning. |
| ecto: object recognition | http://ecto.willowgarage.com/recognition/release/latest/object_recognition/index.html | | | A collection of ecto cells that can be used for object recognition tasks. |
| BOR3D | http://sourceforge.net/projects/bor3d/ | Framework | object recognition in 3D data | Work in progress |

Table B.7: Wide shallow survey part 7
| Name | Speed | Extendable | Extendable how | Type of learning | Implementation | Related papers | Data required |
|---|---|---|---|---|---|---|---|
| Viewpoint Feature Histogram cluster classifier | well below 30 FPS | Yes | | training offline | ROS-PCL package | http://www.willowgarage.com/papers/fast-3d-recognition-and-pose-using-viewpoint-fe | 3D |
| Holzer and Hinterstoisser: Distance transform templates | 6 FPS | Yes | | training offline | | http://ar.in.tum.de/pub/holzerst2009distancetemplates/holzerst2009distancetemplates.pdf | 2D |
| fast template detector | | | | | ROS package | same as DOT | |
| deformable objects detector | | | | | | | |
| tabletop object perception | | | | | ROS stack | | |
| OpenRAVE | | | | | | | |
| ecto: object recognition | | | | | Ecto package | | |
| BOR3D | | | | | | | |

Table B.8: Wide shallow survey part 8
| Name | Textureless | Output | Detection | Pose | Visualization | Library used | Sensor required | License | Last activity | Additional comments |
|---|---|---|---|---|---|---|---|---|---|---|
| Viewpoint Feature Histogram cluster classifier | Yes | | Yes | Yes | | PCL, OpenCV | stereo | BSD | 2011 | |
| Holzer and Hinterstoisser: Distance transform templates | Yes | | Yes | Yes | | | monocular | | 2009 | |
| fast template detector | | | | | | | | LGPL | 2010 | |
| deformable objects detector | | | | | | | RGB-D | BSD | 2010 | |
| tabletop object perception | | | | | | | | LGPL | 2011 | |
| OpenRAVE | | | | | | | | BSD | 2012 | |
| ecto: object recognition | | | | | | | | GPL | 2012 | |
| BOR3D | | | | | | | | | | Under work to release |

Table B.9: Wide shallow survey part 9