3D Computer Vision · 2013-07-29 · “The vSLAM algorithm for robust localization and mapping,”...
3D Computer Vision
F. Tombari, S. Salti
Introduction
3D sensors
Data representations
Differential entities and operators
Hands-on session – Point Cloud Library (PCL)
PCL installation
Load and visualize a point cloud
Compute normals
Summary – Day 1
Introduction
3D Computer Vision
Reconstruction
Recognition
Acquisition
Autonomous mobile robots – AMR (navigation)
Object recognition, grasping and manipulation (social robotics)
Applications - robotics
Applications - video surveillance
Tracking and motion detection
People counting
Retail intelligence
Crowd monitoring
Behavior analysis
Applications – semantic segmentation
Urban data classification
Trimble Code Sprint
Shape retrieval (www)
High-def 3D model acquisition (computer graphics)
Biometric systems (e.g. face recognition)
Medical imaging (MRI, CT, PET, x-ray, ultrasound, ..)
Other applications
Google 3D Warehouse
3D medical imaging
3D face recognition
Michelangelo project
Autonomous vehicle navigation (AVN)
Augmented reality
Human-computer interaction (HCI)
Videogaming, entertainment
..
And yet..
Augmented reality by Lego and Intel
Microsoft Xbox
Autonomous Vehicle Navigation
Reconstruction:
3D registration
SLAM
Meshing
Recognition:
Object recognition under clutter and occlusions
Shape retrieval/categorization
People/face/obstacle/.. detection
Tracking:
Body pose estimation
People tracking/counting
Semantic segmentation
Typical 3D tasks
Alignment of partially-overlapping 2.5D views
Useful to yield a high-def, fully-3D reconstruction of an object from views acquired from different view points
3D registration (1)
Coarse registration provides an initial guess for the set of views that need to be registered
Fine registration (Iterative Closest Points - ICP [Besl 92])
3D registration (2)
[Figure: unordered input views, then coarse registration, then fine registration]
Iterative Closest Point
Iterative method to align two free-form shapes
Input: two sets of 3D points M,S
Output: the 6DOF transformation (R,t) that best aligns the two point sets
Given an initial transform, iterate until convergence:
For each p ∈ M, find its nearest neighbor NN(p) ∈ S
Absolute orientation [Horn 87][Arun 87]: find R,t that minimize the mean square error between the set of point pairs (p, NN(p)):
\[ \min_{R,t} \sum_{p \in M} \left\| NN(p) - (R \cdot p + t) \right\|^2 \]
• t: translation that aligns the centroids of the two point sets
• R: least-squares estimate on the over-determined system represented by the set of pairs
Convergence criteria:
Threshold on minimum error
Maximum number of iterations
If the initial guess is not good, different initial transforms ought to be tested to avoid local minima
Efficient versions available (GPU-ICP, [Rusinkiewicz 01])
Generalized ICP [Segal 09]
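The iteration described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the implementation of any of the cited papers: nearest neighbors are found by brute force, the absolute-orientation step follows the unit-quaternion closed form in the spirit of [Horn 87], and the dominant eigenvector is obtained by a simple power iteration (an assumption made here to avoid external libraries). The function names, the identity initial guess and the tolerance are all hypothetical choices.

```python
import math

def centroid(pts):
    n = len(pts)
    return tuple(sum(p[k] for p in pts) / n for k in range(3))

def rotate(q, p):
    """Rotate 3D point p by the unit quaternion q = (w, x, y, z)."""
    w, x, y, z = q
    r = ((1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)),
         (2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)),
         (2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)))
    return tuple(sum(r[i][k] * p[k] for k in range(3)) for i in range(3))

def absolute_orientation(src, dst):
    """Closed-form (q, t) minimizing sum |dst_i - (R src_i + t)|^2 over paired points."""
    cs, cd = centroid(src), centroid(dst)
    # Cross-covariance of the centered pairs: S[a][b] = sum src_a * dst_b
    S = [[sum((p[a] - cs[a]) * (d[b] - cd[b]) for p, d in zip(src, dst))
          for b in range(3)] for a in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = S
    N = [[sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
         [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
         [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
         [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz]]
    # Power iteration on N + shift*I: converges to the eigenvector of the
    # largest eigenvalue of N, which is the optimal rotation quaternion.
    shift = sum(abs(v) for row in N for v in row)
    q = [1.0, 0.0, 0.0, 0.0]
    for _ in range(200):
        q = [sum(N[i][j] * q[j] for j in range(4)) + shift * q[i] for i in range(4)]
        norm = math.sqrt(sum(v * v for v in q))
        q = [v / norm for v in q]
    rc = rotate(q, cs)
    t = tuple(cd[k] - rc[k] for k in range(3))  # aligns the two centroids
    return tuple(q), t

def sq_dist(a, b):
    return sum((a[k] - b[k]) ** 2 for k in range(3))

def icp(M, S, max_iters=50, tol=1e-12):
    """Align point set M to point set S; returns (q, t, mean squared error)."""
    q, t = (1.0, 0.0, 0.0, 0.0), (0.0, 0.0, 0.0)  # initial guess: identity
    prev = float("inf")
    for _ in range(max_iters):
        moved = [tuple(a + b for a, b in zip(rotate(q, p), t)) for p in M]
        nn = [min(S, key=lambda s: sq_dist(s, m)) for m in moved]
        mse = sum(sq_dist(m, s) for m, s in zip(moved, nn)) / len(M)
        if mse < tol or abs(prev - mse) < tol:  # convergence criteria
            break
        prev = mse
        q, t = absolute_orientation(M, nn)
    return q, t, mse
```

With a poor initial guess the nearest-neighbor pairs are mostly wrong and the loop can stall in a local minimum, which is why several initial transforms are usually tried.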
MultiView Reconstruction
Results
Spacetime Stereo [Zhang 03][Davis 05]
Kinect sensor
Simultaneous Localization and Mapping (SLAM):
Incrementally build a map of the agent’s surroundings (mapping)
Localize the agent within that map (localization)
Odometry, inertial sensing:
Measurement drifts
Visual odometry [Nistér 04] [Konolige 06]
3D / photometric sensors:
Laser scanner
Sonar
Stereo [Sim 06]
Visual sensors (vSLAM) [Karlsson 05][Folkesson 05]
• Landmark initialization?
6DOF SLAM:
monoSLAM [Davison 03] [Eade 06][Clemente 07]: visual odometry + single camera
SLAM (1)
Credits: J.B.Hayet
SLAM (2)
Extended Kalman Filter
Landmark extraction
Geometric/photometric data
Odometry
Data association
Landmarks:
• Re-observable
• Distinctive
• Stationary
EKF:
• Update via odometry
• Update via landmark re-observation
Mapping update: local vs. global consistency (loop closure, bundle adjustment)
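The two filter updates can be illustrated with a deliberately simplified one-dimensional example: the state holds the robot position and the position of a single landmark, odometry shifts the robot and inflates its uncertainty, and re-observing the landmark shrinks the uncertainty of both. Because this toy model is linear, the EKF reduces to a plain Kalman filter here. The state layout, noise values and function names are assumptions made for illustration only.

```python
def predict(state, P, u, q):
    """Update via odometry: the robot moves by u and its variance grows by q."""
    x, l = state
    return (x + u, l), [[P[0][0] + q, P[0][1]], [P[1][0], P[1][1]]]

def update(state, P, z, r):
    """Update via landmark re-observation.

    z measures the offset (landmark - robot) with noise variance r,
    so the measurement Jacobian is H = [-1, 1].
    """
    x, l = state
    H = (-1.0, 1.0)
    innovation = z - (l - x)
    PHt = [P[i][0] * H[0] + P[i][1] * H[1] for i in range(2)]
    s = H[0] * PHt[0] + H[1] * PHt[1] + r        # innovation variance
    K = [v / s for v in PHt]                     # Kalman gain
    new_state = (x + K[0] * innovation, l + K[1] * innovation)
    # P - K H P, using the symmetry of P (H P equals PHt transposed)
    new_P = [[P[i][j] - K[i] * PHt[j] for j in range(2)] for i in range(2)]
    return new_state, new_P

# Robot known exactly at 0, landmark guessed at 5 with variance 4.
state, P = (0.0, 5.0), [[0.0, 0.0], [0.0, 4.0]]
state, P = update(state, P, z=6.0, r=1.0)    # first observation of the landmark
state, P = predict(state, P, u=1.0, q=0.5)   # move: robot uncertainty grows
state, P = update(state, P, z=4.8, r=1.0)    # re-observation: uncertainty shrinks
```

The off-diagonal terms of P that appear after the second update are what couple robot and landmark estimates; they are also what makes loop closure correct the whole map rather than a single landmark.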
MonoSLAM converging to Structure-from-Motion [Strasdat 10]
• PTAM [Klein 07], DTAM [Newcombe 11]
6DOF SLAM with RGB-D sensors
• Kinect Fusion [Newcombe 11b]
• RGB-D dense point cloud mapping [Henry 11]
SLAM (3)
Determine the presence of a model in a scene
Estimate its 6DOF pose
Challenges:
Clutter
Occlusions
Point density variations
Model library size (efficiency)
Multi-instance
Usually only rigid transformations (rotation, translation), possibly with a scale change, are assumed
Object recognition
Synthetic data
Spacetime Stereo
Real-time stereo
Challenges:
Intra-class variations
Invariance w.r.t. a high number of transformations, including non-rigid deformations (e.g. isometries)
Clutter and occlusions are not present
General approach:
Compute a compact representation for a query
Compare it with all objects in the library
Retrieve the most similar ones - retrieval
(Assign a label to the query - categorization)
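The general approach above can be sketched as follows. The compact representation used here, a normalized histogram of point-to-centroid distances, is only a stand-in chosen because it is invariant to rotation, translation and scale; it is not the descriptor of any cited method, and the function names and the L1 comparison are likewise assumptions.

```python
import math

def descriptor(points, bins=8):
    """Histogram of distances to the centroid, normalized by the largest distance."""
    n = len(points)
    c = [sum(p[k] for p in points) / n for k in range(3)]
    d = [math.dist(p, c) for p in points]
    dmax = max(d) or 1.0
    hist = [0.0] * bins
    for v in d:
        hist[min(int(v / dmax * bins), bins - 1)] += 1.0 / n
    return hist

def retrieve(query, library, k=3):
    """library: list of (label, points). Returns the k closest (distance, label) pairs."""
    dq = descriptor(query)
    scored = [(sum(abs(a - b) for a, b in zip(dq, descriptor(pts))), label)
              for label, pts in library]
    return sorted(scored)[:k]

def categorize(query, library):
    """Assign the label of the most similar library object."""
    return retrieve(query, library, k=1)[0][1]
```

Comparing the query against every object is linear in the library size, which is why compact descriptors (and, for large libraries, indexing structures) matter in practice.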
Shape retrieval and categorization
Vehicle
Animal
Household
Building
Furniture
Princeton Shape Benchmark (PSB) dataset («coarse2» categories)
SHREC 10 dataset
Determine 3D connected components with specific properties or belonging to a particular category
Feature extraction
Description of 3D keypoints
Description of clusters [Lloyd 82], such as size, density, eigenvalues of the scatter matrix, ..
Feature classification
SVM, Random Trees, kNN, ..
Semantic segmentation (1)
Inference on a loopy graph [Tombari 11]
An undirected graph is built over classified 3D features [Unnikrishnan 08]
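The following function is maximized over the graph:
\[ P(X) = \frac{1}{Z} \prod_{i \in S} e^{-\phi_i(x_i)} \prod_{i \in S} \prod_{j \in N(i)} e^{-\phi_{i,j}(x_i, x_j)} \]
where the evidence (unary) term is
\[ \phi_i(x_i) = \lambda \left( 1 - \nu(x_i) \right) \]
and the compatibility (pairwise) term is
\[ \phi_{i,j}(x_i, x_j) = \begin{cases} 0 & \text{if } x_i = x_j \\ e^{-\frac{\| p_i - p_j \|}{2 \sigma_c}} & \text{otherwise} \end{cases} \]
λ is a regularizer, ν the classification probability
Other approaches based on graph inference are those relying on Associative Markov Networks (AMN) [Anguelov 05][Triebel 07][Munoz 09]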
Semantic segmentation (2)
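Maximizing P(X) is equivalent to minimizing the sum of the unary and pairwise potentials, which the short sketch below evaluates for a given labeling. It is only an illustration of the energy, not an inference algorithm: the pairwise penalty is read here as exp(-‖p_i − p_j‖ / (2σ_c)) for differing labels, which is an assumption about the slide's formula, and the data layout (per-node probability dictionaries, neighbor index lists) is hypothetical.

```python
import math

def unary(nu_xi, lam):
    """Evidence term: low when the classifier is confident in the chosen label."""
    return lam * (1.0 - nu_xi)

def pairwise(xi, xj, pi, pj, sigma_c):
    """Compatibility term: penalizes different labels on nearby points."""
    if xi == xj:
        return 0.0
    return math.exp(-math.dist(pi, pj) / (2.0 * sigma_c))

def energy(labels, probs, points, neighbors, lam, sigma_c):
    """Negative log of the unnormalized P(X) for one labeling.

    labels[i]    : label assigned to node i
    probs[i]     : dict mapping label -> classification probability at node i
    points[i]    : 3D position of node i
    neighbors[i] : indices of the nodes adjacent to i in the graph
    """
    n = len(labels)
    e = sum(unary(probs[i][labels[i]], lam) for i in range(n))
    e += sum(pairwise(labels[i], labels[j], points[i], points[j], sigma_c)
             for i in range(n) for j in neighbors[i])
    return e
```

On a loopy graph the minimizing labeling cannot in general be found exactly, so approximate inference (for example loopy belief propagation) is typically used.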
3D Sensors
3D sensors
Goal: create a point cloud of (samples from) the surface of an object/scene
Collection of distance measurements from the sensor to the surface
Distances are then transformed into 3D coordinates (x,y,z) by means of calibration information
Usually, 3D sensors (or 3D scanners) acquire only a view of the object (2.5D data)
Some sensors also acquire information concerning color or light intensity (RGB-D data)
First step of every 3D reconstruction / 3D recognition pipeline
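As an illustration of how calibration turns range measurements into 3D coordinates, the sketch below back-projects a depth image into a point cloud. It assumes a pinhole camera model with depth measured along the optical axis; the intrinsic parameters fx, fy, cx, cy and the function names are assumptions, and real sensors may additionally need distortion correction.

```python
def backproject(u, v, z, fx, fy, cx, cy):
    """Pixel (u, v) with depth z -> 3D point (x, y, z) in the camera frame."""
    return ((u - cx) * z / fx, (v - cy) * z / fy, z)

def depth_to_cloud(depth, fx, fy, cx, cy):
    """depth: list of rows of depth values; 0 or None marks a missing measurement."""
    return [backproject(u, v, z, fx, fy, cx, cy)
            for v, row in enumerate(depth)
            for u, z in enumerate(row) if z]
```

Since every pixel yields at most one point, the result is a single view of the surface, which is why such data is called 2.5D.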
Contact sensors
Active sensors:
LIDAR, rangefinders
Time-of-Flight cameras
Laser Triangulation
Structured light
Medical imaging (CT, MRI)
..
Passive sensors:
Stereo
Structure-from-motion
Shape-from-shading, shape-from-silhouette, shape-from-defocus, ..
LIDAR
LIDAR: Light Detection And Ranging
A light pulse is emitted from the sensor and the round-trip time is measured
The longer the time, the further away the point is from the sensor
Usually visible or near-infrared light is used
Arrays of emitters are employed together to yield a set of simultaneous range measurements (3D slice)
Slices can be swept using a motor to yield a slice array
Pros:
High range (hundreds of meters to kilometers)
Works indoors and outdoors
Real-time (e.g. 100 Hz slices)
Cons:
Accuracy is an issue due to the speed of light (3·10^8 m/s: light travels 1 mm in about 3.3 ps)
Cost
Color/intensity information is usually not provided
Velodyne
\[ d = \frac{c \cdot t}{2} \]
SICK LMS500
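The range equation d = c·t/2 makes the timing requirement concrete. The small sketch below converts between round-trip time and range; the function names are illustrative. Because the pulse travels to the surface and back, a timing error of 3.3 ps corresponds to roughly half a millimeter of range error.

```python
C = 299_792_458.0  # speed of light in m/s

def range_from_round_trip(t_seconds):
    """Distance to the surface given the measured round-trip time."""
    return C * t_seconds / 2.0

def round_trip_time(distance_m):
    """Round-trip time of a pulse reflected at the given distance."""
    return 2.0 * distance_m / C

print(round_trip_time(100.0))           # about 6.7e-07 s for a target at 100 m
print(range_from_round_trip(3.3e-12))   # about 4.9e-04 m for 3.3 ps of timing error
```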
Time-of-Flight camera
A particular LIDAR device yielding a full 2D array of range measurements
A light pulse is emitted through an infrared illuminator; then, either:
Phase-shift measurement of the returning light pulse on each pixel (Photonic Mixer Devices, Canesta Vision (now Microsoft), SwissRanger)
«range-gated imager»: each pixel has a shutter that starts closing when the light pulse is emitted; the less light received, the further away the point from the sensor (Zcam by 3DV Systems, now Microsoft)
Pros:
No motor needs to be employed
Real-time (30-100 fps)
Cost effective
Cons:
Low resolution
Dark, non-reflective objects are hard to acquire
Hardly works under sunlight (not suited to outdoor use)
Multiple reflections can yield false measurements
Interference between different sensors
MESA SwissRanger 4000
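For the phase-shift variant, a commonly used relation (assuming continuous-wave modulation at a single frequency, which the slide does not state explicitly) is d = c·Δφ / (4π·f_mod). The sketch below applies it and also reports the largest distance measurable before the phase wraps around; the 30 MHz value is only an example.

```python
import math

C = 299_792_458.0  # speed of light in m/s

def phase_to_distance(phase_rad, f_mod_hz):
    """Distance from the phase shift between emitted and received modulated light."""
    return C * phase_rad / (4.0 * math.pi * f_mod_hz)

def unambiguous_range(f_mod_hz):
    """Beyond this distance the phase wraps past 2*pi and ranges alias."""
    return C / (2.0 * f_mod_hz)

print(unambiguous_range(30e6))  # about 5 m at 30 MHz modulation
```

This trade-off between modulation frequency and maximum range is one reason such cameras target short-range, indoor scenes.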
Laser triangulation
Laser + camera system
A laser dot or stripe is emitted on the scene
The camera locates the dot/stripe.
The distance is determined via triangulation of the position of the dot (emitting angle, receiving angle and baseline need to be known)
Pros:
Accuracy (tens of micrometers)
Cons:
Limited range
Slow scanning time (often requires static scene)
Expensive
No color information, needs to be paired with a color camera
[Figure: laser triangulation diagram (labels: LASER, O, b, a, d); sensor shown: Minolta Vivid 9i]
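The triangulation step can be written down with the law of sines. In this sketch the emitting and receiving angles are both measured between the baseline and the respective ray, and the returned value is the perpendicular distance of the laser spot from the baseline; this angle convention and the function name are assumptions, since the slide's figure is not available.

```python
import math

def triangulate_distance(baseline, emit_angle, receive_angle):
    """Perpendicular distance of the laser spot from the laser-camera baseline.

    baseline      : distance between laser and camera center
    emit_angle    : angle between baseline and laser ray, in radians
    receive_angle : angle between baseline and camera ray, in radians
    """
    apex = emit_angle + receive_angle
    if not 0.0 < apex < math.pi:
        raise ValueError("rays do not intersect in front of the baseline")
    return baseline * math.sin(emit_angle) * math.sin(receive_angle) / math.sin(apex)
```

As the spot moves away, the two rays become nearly parallel and sin(apex) changes very little with distance, which is the geometric reason for the limited range listed among the cons.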
Structured light
Camera + projector system
Similar to laser triangulation, where the laser stripe is replaced by a stripe projected by a light projector
Projecting a set of stripes (2D pattern) allows for multiple sampling, hence a full 2D range image can be acquired at once (but problem of confusing different fringes)
Using infrared projection and two cameras (one in infrared, one in the visible band) yields accurate RGB-D data
Pros:
Relatively cheap
Sub-millimeter accuracy (down to tens of micrometers)
Real-time
Cons:
Limited range
Hardly usable outdoors or in the presence of other light sources
Highly dependent on the object surface characteristics (e.g. reflective, translucent, ..)
Interference between different sensors
Microsoft Kinect
Stereo vision
Two (or more) cameras
Cameras have to be synchronized, especially in the presence of non-static scenes
Depth is retrieved via triangulation of the point projections on the two views
Correspondence problem!
Pros:
Cheap
Passive
Real-time
Color/intensity can be directly associated to range data (RGB-D)
Cons:
Low accuracy (tends to fail on low-textured regions, repetitive patterns and depth borders)
A projector can help adding texture to the scene to improve accuracy on low-textured regions
Videre Design
[Figure: stereo triangulation of scene points P and P’ from optical centers OL and OR, with image projections pL, pR and p’R]
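Once the correspondence problem is solved, triangulation is simple for a rectified pair: depth is inversely proportional to disparity, Z = f·B/d. The sketch below assumes rectified images, a focal length expressed in pixels and a shared principal point (cx, cy); these conventions and the function names are assumptions made for illustration.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth of a point whose projections differ by disparity_px along the image row."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of the rig")
    return focal_px * baseline_m / disparity_px

def triangulate(u_left, v, u_right, focal_px, baseline_m, cx, cy):
    """3D point in the left camera frame from a matched pair of pixels on the same row."""
    z = depth_from_disparity(u_left - u_right, focal_px, baseline_m)
    return ((u_left - cx) * z / focal_px, (v - cy) * z / focal_px, z)
```

Because Z varies as 1/d, a fixed matching error of one pixel costs far more depth accuracy on distant points than on near ones.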
Stereo using spatial and temporal information [Davis 05][Zhang 03]
To gather information, the appearance must change over time (but not the geometry!)
A random pattern is projected to augment each frame with a different texture (no structured light, no interference)
More accurate than standard stereo, but depth must be constant in time (static objects)
Joint spatio-temporal window
Spacetime stereo
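The joint spatio-temporal window can be illustrated with a matching cost that sums squared differences over a small spatial patch and over all frames at once. This is a simplified sketch, not the formulation of the cited papers: it assumes rectified image pairs, a disparity that stays constant over the frames, and frames stored as nested lists; the function names are hypothetical.

```python
def spacetime_ssd(left, right, x, y, d, half=1):
    """Sum of squared differences over a (2*half+1)^2 window and all frames.

    left, right : lists of frames (each frame a list of rows) of a rectified pair
    (x, y)      : pixel in the left image; d is the candidate disparity
    The caller must keep the window inside the image (x - half - d >= 0).
    """
    cost = 0.0
    for frame_l, frame_r in zip(left, right):
        for dy in range(-half, half + 1):
            for dx in range(-half, half + 1):
                diff = frame_l[y + dy][x + dx] - frame_r[y + dy][x + dx - d]
                cost += diff * diff
    return cost

def best_disparity(left, right, x, y, max_d, half=1):
    """Winner-takes-all disparity over the joint spatio-temporal window."""
    return min(range(max_d + 1),
               key=lambda d: spacetime_ssd(left, right, x, y, d, half))
```

Each additional frame with a different projected texture adds independent evidence to the same window, so a small spatial patch can be used without losing discriminative power.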
Structure-from-motion
Monocular system
The multiple views are spread over time (one moving camera) rather than over space (several cameras)
Requires either the surface or the camera to move
Tracking and matching features
Pros
Cheap and simple hardware (only one camera needed)
RGB-D data
Solving SfM also yields camera pose at each time instant
Cons:
Highly dependent on the motion that the object/camera can undergo
Sparse depth information
[Horn 87] B. Horn, “Closed-form solution of absolute orientation using unit quaternions”, J. Optical Society of America A, Vol.4, No.4, pp.629–642, 1987
[Arun 87] K.S. Arun, T.S. Huang, S.D. Blostein, “Least-squares fitting of two 3-D point sets”, IEEE Trans Pattern Anal Machine Intell 9:698–700, 1987
[Rusinkiewicz 01] S. Rusinkiewicz, M. Levoy, “Efficient variants of the ICP algorithm”, Proc. Int. Conf. on 3D Digital Imaging and Modeling (3DIM), 2001
[Segal 09] A. Segal, D. Haehnel, S. Thrun, “Generalized-ICP”, Proc. Conf. Robotics: Science and Systems (RSS), 2009
[Zhang 03] L. Zhang, B. Curless, S. Seitz, “Spacetime stereo: shape recovery for dynamic scenes”, Proc. IEEE Conf. on Computer Vision and Pattern Recognition, 2003
[Davis 05] J. Davis, D. Nehab, R. Ramamoorthi, S. Rusinkiewicz, “Spacetime stereo: a unifying framework for depth from triangulation”, IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 27(2), 2005
[Nistér 04] D. Nistér, O. Naroditsky, J. Bergen, “Visual odometry”, Proc. Conf. on Computer Vision and Pattern Recognition (CVPR), 2004
[Konolige 06] K. Konolige, M. Agrawal, R.C. Bolles, C. Cowan, M. Fischler, B. Gerkey, “Outdoor mapping and navigation using stereo vision”, Proc. Int. Symp. on Experimental Robotics (ISER), 2006
[Sim 06] R. Sim, J. J. Little, “Autonomous vision-based exploration and mapping using hybrid maps and rao-blackwellised particle filters,” Proc. Conf. on Intelligent Robots and Systems (IROS), 2006
Bibliography
[Karlsson 05] N. Karlsson, E. D. Bernardo, J. Ostrowski, L. Goncalves, P. Pirjanian, M. E. Munich, “The vSLAM algorithm for robust localization and mapping”, Proc. Int. Conf. on Robotics and Automation (ICRA), 2005
[Folkesson 05] J. Folkesson, P. Jensfelt, H. Christensen, “Vision SLAM in the Measurement Subspace,” IEEE Int. Conf. Robotics and Automation (ICRA), 2005
[Davison 03] A. J. Davison, “Real-time simultaneous localisation and mapping with a single camera”, Proc. ICCV, 2003
[Eade 06] E. Eade, T. Drummond, “Scalable monocular SLAM”, Proc. Conf. on Computer Vision and Pattern Recognition, 2006
[Clemente 07] L. Clemente, A. J. Davison, I. Reid, J. Neira, J. Tardós, “Mapping large loops with a single hand-held camera”, Proc. Conf. Robotics: Science and Systems (RSS), 2007
[Strasdat 10] H. Strasdat, J.M.M. Montiel, A. J. Davison, “Real-time Monocular SLAM: Why Filter?”, Proc. ICRA, 2010
[Klein 07] G. Klein, D. W. Murray, “Parallel tracking and mapping for small AR workspaces”, Proc. Int. Symp. on Mixed and Augmented Reality (ISMAR), 2007
[Newcombe 11] R.A. Newcombe, S.J. Lovegrove, A.J. Davison, “DTAM: Dense Tracking and Mapping in Real-Time”, IEEE International Conference on Computer Vision (ICCV), 2011
[Newcombe 11b] R.A. Newcombe, S. Izadi, O. Hilliges, D. Molyneaux, D. Kim, A.J. Davison, P. Kohli, J. Shotton, S. Hodges, A. Fitzgibbon, “KinectFusion: Real-Time Dense Surface Mapping and Tracking”, Proc. Int. Symp. on Mixed and Augmented Reality (ISMAR), 2011
[Henry 11] P. Henry, M. Krainin, E. Herbst, X. Ren, D. Fox, “RGB-D Mapping: Using Depth Cameras for Dense 3D Modeling of Indoor Environments”, Proc. Int. Symp. on Experimental Robotics, 2010
[Lloyd 82] S.P. Lloyd, “Least squares quantization in PCM”, IEEE Trans. on Information Theory, 28(2), pp. 129–137, 1982
[Tombari 11] F. Tombari, L. Di Stefano, “3D Data Segmentation by Local Classification and Markov Random Fields”, Proc. Conf. on 3D Imaging, Modeling, Processing, Visualization and Transmission (3DIMPVT), 2011
[Unnikrishnan 08] R. Unnikrishnan, M. Hebert, “Multi-scale interest regions from unorganized point clouds”, CVPR Workshop on Search in 3D, 2008.
[Anguelov 05] D. Anguelov, B. Taskar, V. Chatalbashev, D. Koller, D. Gupta, G. Heitz, A. Ng, “Discriminative learning of markov random fields for segmentation of 3-d scan data”, Proc. CVPR, 2005
[Triebel 07] R. Triebel, R. Schmidt, O. M. Mozos, W. Burgard, “Instance-based AMN classification for improved object recognition in 2d and 3d laser range data”, Proc. Int. Conf. on Art. Intelligence, 2007
[Munoz 09] D. Munoz, J. A. Bagnell, N. Vandapel, M. Hebert, “Contextual classification with functional max-margin markov networks”, Proc. CVPR, 2009.