3D Computer Vision · 2013-07-29 · “The vSLAM algorithm for robust localization and mapping,”...

3D Computer Vision

F. Tombari, S. Salti


Introduction

3D sensors

Data representations

Differential entities and operators

Hands-on session – Point Cloud Library (PCL)

PCL installation

Load and visualize a point cloud

Compute normals

Summary – Day 1

Introduction



3D Computer Vision

Reconstruction

Recognition

Acquisition


Applications – robotics

Autonomous mobile robots – AMR (navigation)

Object recognition, grasping and manipulation (social robotics)


Applications - video surveillance

Tracking and motion detection

People counting

Retail intelligence

Crowd monitoring

Behavior analysis


Applications – semantic segmentation

Urban data classification

Trimble Code Sprint


Other applications

Shape retrieval (www)

High-definition 3D model acquisition (computer graphics)

Biometric systems (e.g. face recognition)

Medical imaging (MRI, CT, PET, X-ray, ultrasound, ..)

[Figure: Google 3D Warehouse; 3D medical imaging; 3D face recognition; Michelangelo project]


And yet..

Autonomous vehicle navigation (AVN)

Augmented reality

Human-computer interaction (HCI)

Videogaming, entertainment

..

[Figure: augmented reality by Lego and Intel; Microsoft Xbox; autonomous vehicle navigation (video: VisLab VIAC)]


Typical 3D tasks

Reconstruction
• 3D registration
• SLAM
• Meshing

Recognition
• Object recognition under clutter and occlusions
• Shape retrieval/categorization
• People/face/obstacle/.. detection

Tracking
• Body pose estimation
• People tracking/counting

Semantic segmentation


3D registration (1)

Alignment of partially overlapping 2.5D views

Used to obtain a high-definition, fully 3D reconstruction of an object from views acquired from different viewpoints


3D registration (2)

Coarse registration provides an initial guess for the alignment of the views that need to be registered

Fine registration (Iterative Closest Point – ICP [Besl 92]) then refines this alignment

[Figure: unordered input views → coarse registration → fine registration]


Iterative Closest Point

Iterative method to align two free-form shapes

Input: two sets of 3D points M, S

Output: the 6DOF transformation (R, t) that best aligns the two point sets

Given an initial transform, iterate until convergence:

∀p ∈ M, find its nearest neighbor NN(p) ∈ S

Absolute orientation [Horn 87][Arun 87]: find R, t that minimize the mean squared error over the set of point pairs (p, NN(p))

$$ \min_{R,\,t} \sum_{p \in M} \left\| NN(p) - \left( R \cdot p + t \right) \right\|^2 $$

• t: offset between the centroids of the paired sets, t = c_S − R·c_M (c_M and c_S being the centroids of the points of M and of their nearest neighbors)

• R: least-squares estimate from the over-determined system formed by the centered point pairs (closed form via unit quaternions [Horn 87] or SVD [Arun 87])

Convergence criteria:

Threshold on the minimum error

Maximum number of iterations

ICP only converges to a local minimum: if the initial guess is poor, several initial transforms should be tested

Efficient variants are available (GPU-ICP, [Rusinkiewicz 01])

Generalized ICP [Segal 09]
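The ICP loop above can be sketched in a few dozen lines of plain Python. This is a minimal sketch under stated assumptions, not a reference implementation: it is a planar (2D, 3-DOF) simplification of the 6DOF case, chosen so that absolute orientation has a one-line closed form (the rotation angle comes from atan2 over the centered pairs) instead of the quaternion [Horn 87] or SVD [Arun 87] solution; nearest neighbors are found by brute force rather than a k-d tree; and all function names and tolerances are hypothetical.

```python
import math

def nearest_neighbor(p, targets):
    # Brute-force search; practical implementations use a k-d tree
    return min(targets, key=lambda q: (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)

def mean_squared_error(src, dst):
    return sum((q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2
               for p, q in zip(src, dst)) / len(src)

def absolute_orientation_2d(src, dst):
    """Closed-form least-squares (theta, t) mapping paired src points onto dst."""
    n = len(src)
    cm = (sum(p[0] for p in src) / n, sum(p[1] for p in src) / n)  # centroid c_M
    cs = (sum(q[0] for q in dst) / n, sum(q[1] for q in dst) / n)  # centroid c_S
    dot = cross = 0.0
    for p, q in zip(src, dst):
        ax, ay = p[0] - cm[0], p[1] - cm[1]
        bx, by = q[0] - cs[0], q[1] - cs[1]
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx
    theta = math.atan2(cross, dot)  # rotation that best aligns the centered pairs
    c, s = math.cos(theta), math.sin(theta)
    t = (cs[0] - (c * cm[0] - s * cm[1]), cs[1] - (s * cm[0] + c * cm[1]))  # t = c_S - R c_M
    return theta, t

def transform(points, theta, t):
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + t[0], s * x + c * y + t[1]) for x, y in points]

def icp_2d(M, S, max_iter=50, min_error=1e-12):
    """Align M onto S starting from the identity; returns (theta, t, final MSE)."""
    theta_tot, t_tot = 0.0, (0.0, 0.0)
    current = list(M)
    for _ in range(max_iter):  # criterion: maximum number of iterations
        pairs = [nearest_neighbor(p, S) for p in current]
        if mean_squared_error(current, pairs) < min_error:  # criterion: error threshold
            break
        theta, t = absolute_orientation_2d(current, pairs)
        current = transform(current, theta, t)
        # Accumulate the motion: x -> R(theta) (R_tot x + t_tot) + t
        c, s = math.cos(theta), math.sin(theta)
        t_tot = (c * t_tot[0] - s * t_tot[1] + t[0], s * t_tot[0] + c * t_tot[1] + t[1])
        theta_tot += theta
    final_err = mean_squared_error(current, [nearest_neighbor(p, S) for p in current])
    return theta_tot, t_tot, final_err

# Usage: S is M rotated by 0.1 rad and shifted; ICP recovers that motion
M = [(0, 0), (2, 0), (0, 1), (3, 2), (1, 3)]
S = transform(M, 0.1, (0.05, -0.03))
theta, t, err = icp_2d(M, S)
```

Here the displacement is small relative to the point spacing, so the very first nearest-neighbor pairing is already correct and one closed-form step suffices; with a poor initial guess the pairing is wrong and ICP may settle in a local minimum, as noted above.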


MultiView Reconstruction


Results

Spacetime Stereo [Zhang 03][Davis 05]

Kinect sensor


SLAM (1)

Simultaneous Localization and Mapping:
• incrementally build a map of the agent's surroundings (mapping)
• localize the agent within that map (localization)

Odometry, inertial sensing: measurements drift over time
• Visual odometry [Nistèr 04] [Konolige 06]

3D / photometric sensors
• Laser scanner
• Sonar
• Stereo [Sim 06]
• Visual sensors (vSLAM) [Karlsson 05][Folkesson 05]
• Landmark initialization?

6DOF SLAM
• monoSLAM [Davison 03] [Eade 06][Clemente 07]: visual odometry + single camera

Credits: J.B. Hayet


SLAM (2)

[Diagram: geometric/photometric data, odometry, landmark extraction, data association, Extended Kalman Filter]

Landmarks:
• Re-observable
• Distinctive
• Stationary

EKF:
• Update via odometry
• Update via landmark re-observation

Mapping update: local vs. global consistency (loop closure, bundle adjustment)
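The two EKF updates can be illustrated on a deliberately tiny problem. This is a toy sketch, not an EKF-SLAM implementation: the assumed state is a robot and a single landmark on a line, x = [robot, landmark], odometry adds u to the robot position, and the sensor measures z = landmark − robot. Because both models are linear, the EKF reduces to the plain Kalman filter (a real EKF-SLAM linearizes nonlinear motion and observation models through their Jacobians). Function names and noise values are hypothetical.

```python
def predict(x, P, u, q):
    """Update via odometry: the robot moves by u and its variance grows by q (drift)."""
    x = [x[0] + u, x[1]]  # the landmark is stationary
    P = [[P[0][0] + q, P[0][1]],
         [P[1][0], P[1][1]]]
    return x, P

def update(x, P, z, r):
    """Update via landmark re-observation: z measures (landmark - robot), variance r."""
    H = (-1.0, 1.0)  # Jacobian of the observation model h(x) = x[1] - x[0]
    PHt = [P[0][0] * H[0] + P[0][1] * H[1],
           P[1][0] * H[0] + P[1][1] * H[1]]
    S = H[0] * PHt[0] + H[1] * PHt[1] + r  # innovation variance
    K = [PHt[0] / S, PHt[1] / S]            # Kalman gain
    innovation = z - (x[1] - x[0])
    x = [x[0] + K[0] * innovation, x[1] + K[1] * innovation]
    HP = [H[0] * P[0][j] + H[1] * P[1][j] for j in range(2)]
    P = [[P[i][j] - K[i] * HP[j] for j in range(2)] for i in range(2)]  # (I - K H) P
    return x, P

# Usage: robot position known, landmark unknown (huge prior variance)
x, P = [0.0, 0.0], [[0.0, 0.0], [0.0, 1e6]]
x, P = update(x, P, z=5.0, r=0.01)  # first sighting initializes the landmark near 5
x, P = predict(x, P, u=1.0, q=0.1)  # odometry: robot variance grows to 0.1
x, P = update(x, P, z=4.0, r=0.01)  # re-observation: robot variance shrinks again
```

The first observation acts as landmark initialization; afterwards, odometry steps inflate the robot uncertainty and each re-observation of the stationary landmark reduces it, which is why re-observable, stationary landmarks are required.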


SLAM (3)

MonoSLAM converging to Structure-from-Motion [Strasdat 10]
• PTAM [Klein 07], DTAM [Newcombe 11]

6DOF SLAM with RGB-D sensors
• KinectFusion [Newcombe 11b]
• RGB-D dense point cloud mapping [Henry 11]


Object recognition

Determine the presence of a model in a scene

Estimate its 6DOF pose

Challenges:
• Clutter
• Occlusions
• Point density variations
• Model library size (efficiency)
• Multiple instances

Usually only rigid transformations (rotation and translation, occasionally scale) are assumed

[Figure examples: synthetic data, spacetime stereo, real-time stereo]


Shape retrieval and categorization

Challenges:
• Intra-class variations
• Invariance w.r.t. many transformations, including non-rigid deformations (e.g. isometries)

Clutter and occlusions are not present

General approach:
• Compute a compact representation of the query
• Compare it with all objects in the library
• Retrieve the most similar ones (retrieval)
• (Assign a label to the query: categorization)

[Figure: example categories (Vehicle, Animal, Household, Building, Furniture, …) from the Princeton Shape Benchmark (PSB) dataset («coarse2» categories); SHREC 10 dataset]
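The general approach can be sketched as nearest-neighbor search over descriptor vectors. In this minimal sketch the descriptors are made-up placeholders (computing a real shape descriptor is out of scope), and Euclidean distance plus a majority vote over the k most similar shapes are assumed choices; the function names are hypothetical.

```python
import math
from collections import Counter

def retrieve(query, library, k=3):
    """Retrieval: rank (name, label, descriptor) entries by distance to the query."""
    ranked = sorted(library, key=lambda item: math.dist(query, item[2]))
    return ranked[:k]

def categorize(query, library, k=3):
    """Categorization: majority label among the k most similar library shapes."""
    top = retrieve(query, library, k)
    return Counter(label for _, label, _ in top).most_common(1)[0][0]

# Usage with placeholder 3-dimensional descriptors
library = [
    ("car_01", "Vehicle", [0.9, 0.1, 0.0]),
    ("truck_02", "Vehicle", [0.8, 0.2, 0.1]),
    ("dog_01", "Animal", [0.1, 0.9, 0.2]),
    ("cat_03", "Animal", [0.2, 0.8, 0.1]),
    ("chair_05", "Furniture", [0.1, 0.2, 0.9]),
]
query = [0.85, 0.15, 0.05]
best_matches = retrieve(query, library, k=2)
label = categorize(query, library, k=3)
```

Comparing the query against every library entry is linear in the library size; large libraries typically call for indexing structures, but the ranking logic stays the same.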


Semantic segmentation (1)

Determine 3D connected components with specific properties or belonging to a particular category

Feature extraction
• Description of 3D keypoints
• Description of clusters [Lloyd 82], such as size, density, eigenvalues of the scatter matrix, ..

Feature classification
• SVM, Random Trees, kNN, ..
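As an example of cluster description, the sketch below computes two of the listed features for one cluster: its size (number of points) and the eigenvalues of its scatter matrix. Assumptions: the scatter matrix is normalized by the number of points (i.e. it is the covariance), the eigenvalues are obtained with the standard closed-form trigonometric method for symmetric 3×3 matrices, and density is omitted because the slide does not define it. Function names are hypothetical.

```python
import math

def scatter_matrix(points):
    """3x3 covariance of the cluster points around their centroid."""
    n = len(points)
    c = [sum(p[k] for p in points) / n for k in range(3)]
    return [[sum((p[i] - c[i]) * (p[j] - c[j]) for p in points) / n
             for j in range(3)] for i in range(3)]

def sym3_eigenvalues(A):
    """Eigenvalues of a symmetric 3x3 matrix in descending order (closed form)."""
    p1 = A[0][1] ** 2 + A[0][2] ** 2 + A[1][2] ** 2
    if p1 == 0.0:  # already diagonal
        return sorted((A[0][0], A[1][1], A[2][2]), reverse=True)
    q = (A[0][0] + A[1][1] + A[2][2]) / 3.0
    p2 = (A[0][0] - q) ** 2 + (A[1][1] - q) ** 2 + (A[2][2] - q) ** 2 + 2.0 * p1
    p = math.sqrt(p2 / 6.0)
    B = [[(A[i][j] - (q if i == j else 0.0)) / p for j in range(3)] for i in range(3)]
    det_b = (B[0][0] * (B[1][1] * B[2][2] - B[1][2] * B[2][1])
             - B[0][1] * (B[1][0] * B[2][2] - B[1][2] * B[2][0])
             + B[0][2] * (B[1][0] * B[2][1] - B[1][1] * B[2][0]))
    r = det_b / 2.0
    phi = math.pi / 3.0 if r <= -1 else 0.0 if r >= 1 else math.acos(r) / 3.0
    e1 = q + 2.0 * p * math.cos(phi)
    e3 = q + 2.0 * p * math.cos(phi + 2.0 * math.pi / 3.0)
    e2 = 3.0 * q - e1 - e3  # the trace is preserved
    return [e1, e2, e3]

def cluster_features(points):
    return {"size": len(points), "eigenvalues": sym3_eigenvalues(scatter_matrix(points))}

# Usage: a flat (planar) cluster yields one eigenvalue close to zero
features = cluster_features([(2, 2, 0), (-2, -2, 0), (1, -1, 0), (-1, 1, 0)])
```

The relative magnitudes of the eigenvalues indicate whether a cluster is roughly linear (one dominant), planar (two dominant) or volumetric (three comparable), which makes them useful input to the classifiers listed above.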


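Semantic segmentation (2)

Inference on a loopy graph [Tombari 11]

An undirected graph is built over classified 3D features [Unnikrishnan 08]

The following function is maximized over the graph:

$$ P(X) = \frac{1}{Z} \prod_{i \in S} e^{-\phi_i(x_i)} \prod_{i \in S} \prod_{j \in N(i)} e^{-\phi_{i,j}(x_i, x_j)} $$

where

$$ \phi_i(x_i) = \lambda \left( 1 - \nu(x_i) \right) \qquad \text{(evidence, unary)} $$

$$ \phi_{i,j}(x_i, x_j) = \begin{cases} 0 & \text{if } x_i = x_j \\ e^{-\frac{\| p_i - p_j \|}{2\sigma_c}} & \text{otherwise} \end{cases} \qquad \text{(compatibility, pairwise)} $$

λ is a regularizer, ν the classification probability, p_i and p_j the positions of features i and j

Other approaches based on graph inference are those relying on Associative Markov Networks (AMN) [Anguelov 05][Triebel 07][Munoz 09]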
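To make the model concrete, the sketch below evaluates the energy E(X) = Σ_i φ_i(x_i) + Σ_i Σ_{j∈N(i)} φ_{i,j}(x_i, x_j), so that P(X) ∝ e^(−E(X)), and finds the most probable labeling by exhaustive search. Assumptions: exhaustive search is only feasible on toy graphs and stands in for the loopy-graph inference used in practice; the pairwise term is taken as exp(−‖p_i − p_j‖ / (2σ_c)) for differing labels, which is our reading of the slide; names and values are hypothetical.

```python
import itertools
import math

def unary(nu_i, label, lam):
    """Evidence term: lambda * (1 - classification probability of the label)."""
    return lam * (1.0 - nu_i[label])

def pairwise(p_i, p_j, x_i, x_j, sigma_c):
    """Compatibility term: zero for equal labels, larger for closer points otherwise."""
    if x_i == x_j:
        return 0.0
    return math.exp(-math.dist(p_i, p_j) / (2.0 * sigma_c))

def energy(labels, points, nu, neighbors, lam, sigma_c):
    e = sum(unary(nu[i], labels[i], lam) for i in range(len(points)))
    for i, js in neighbors.items():  # mirrors the double product over i in S, j in N(i)
        for j in js:
            e += pairwise(points[i], points[j], labels[i], labels[j], sigma_c)
    return e

def map_labeling(points, nu, neighbors, n_labels, lam=1.0, sigma_c=1.0):
    """Maximizing P(X) = exp(-E(X)) / Z is the same as minimizing E(X)."""
    best = min(itertools.product(range(n_labels), repeat=len(points)),
               key=lambda X: energy(X, points, nu, neighbors, lam, sigma_c))
    return list(best)

# Usage: the middle point is weakly classified as label 1 but its close neighbors say 0
points = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.2, 0.0, 0.0)]
nu = [[0.9, 0.1], [0.45, 0.55], [0.9, 0.1]]
neighbors = {0: [1], 1: [0, 2], 2: [1]}
labels = map_labeling(points, nu, neighbors, n_labels=2)
```

Per-point classification alone would label the middle point 1; the pairwise term makes the isolated label change expensive, so the regularized labeling is uniform.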

3D Sensors



3D sensors

Goal: create a point cloud of (samples from) the surface of an object/scene
• Collection of distance measurements from the sensor to the surface
• Distances are then transformed into 3D coordinates (x, y, z) by means of calibration information

Usually, 3D sensors (or 3D scanners) acquire only one view of the object (2.5D data)

Some sensors also acquire color or light intensity information (RGB-D data)

First step of every 3D reconstruction / 3D recognition pipeline

Contact sensors

Active sensors
• LIDAR, rangefinders
• Time-of-Flight cameras
• Laser triangulation
• Structured light
• Medical imaging (CT, MRI)
• ..

Passive sensors
• Stereo
• Structure-from-motion
• Shape-from-shading, shape-from-silhouette, shape-from-defocus, ..
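For a depth camera, the conversion from distance measurements to (x, y, z) can be sketched with the usual pinhole model. This minimal sketch assumes that the calibration information consists of the intrinsics fx, fy (focal lengths in pixels) and cx, cy (principal point), that each depth value is the z coordinate along the optical axis (not the range along the ray), and that 0 marks a missing measurement; these names and conventions are assumptions, not taken from the slides.

```python
def backproject(u, v, z, fx, fy, cx, cy):
    """Pinhole back-projection of pixel (u, v) with depth z to camera coordinates."""
    return ((u - cx) * z / fx, (v - cy) * z / fy, z)

def depth_image_to_cloud(depth, fx, fy, cx, cy):
    """Turn a 2.5D depth image (list of rows) into a list of 3D points."""
    cloud = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z > 0:  # assumed convention: 0 means no measurement
                cloud.append(backproject(u, v, z, fx, fy, cx, cy))
    return cloud

# Usage with hypothetical intrinsics: the principal point maps onto the optical axis
center = backproject(320, 240, 2.0, 525.0, 525.0, 320.0, 240.0)
```

The result is a single-view (2.5D) cloud expressed in the sensor frame; merging views requires the registration step discussed earlier.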


LIDAR

LIDAR: Light Detection And Ranging

A light pulse is emitted from the sensor and its round-trip time t is measured

The longer the round-trip time, the farther the point is from the sensor:

$$ d = \frac{c\,t}{2} $$

Usually visible or near-infrared light is used

Arrays of emitters work together to yield a set of simultaneous range measurements (a 3D slice)

Slices can be swept by a motor to yield an array of slices

Pros:
High range (hundreds of meters / km)
Works indoors and outdoors
Real-time (e.g. 100 Hz slices)

Cons:
Accuracy is an issue due to the speed of light (c ≈ 3·10^8 m/s: light covers 1 mm in about 3.3 ps)
Cost
Color/intensity information is usually not provided

[Figure: Velodyne; SICK LMS500]
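The range equation d = c·t/2 is easy to check numerically. The helper names below are hypothetical; the second function shows why timing precision dominates accuracy: since the pulse covers the distance twice, a 1 mm range step corresponds to about 6.7 ps of round-trip time, twice the one-way 3.3 ps figure.

```python
C = 299_792_458.0  # speed of light in vacuum, m/s

def lidar_range(round_trip_time):
    """d = c * t / 2: the pulse travels to the surface and back."""
    return C * round_trip_time / 2.0

def timing_resolution(range_resolution):
    """Round-trip timing precision needed to resolve a given range step."""
    return 2.0 * range_resolution / C

# Usage: a 1 microsecond echo is roughly 150 m away; 1 mm needs ~6.7 ps timing
d = lidar_range(1e-6)
dt = timing_resolution(1e-3)
```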


Time-of-Flight camera

A particular kind of LIDAR device yielding a full 2D array of range measurements

A light pulse is emitted through an infrared illuminator; then either:
• Phase-shift measurement of the returning light on each pixel (Photonic Mixer Devices, Canesta Vision (now Microsoft), SwissRanger)
• «Range-gated imager»: each pixel has a shutter that starts closing when the light pulse is emitted; the less light received, the farther the point is from the sensor (Zcam by 3DV Systems, now Microsoft)

Pros:
No motor is needed
Real-time (30-100 fps)
Cost-effective

Cons:
Low resolution
Dark, non-reflective objects are hard to acquire
Works poorly under sunlight (not suited to outdoor use)
Multiple reflections can yield false measurements
Interference between different sensors

[Figure: MESA SwissRanger 4000]
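For the phase-shift variant, a common model (assumed here, not stated on the slide) is illumination that is continuously modulated at a frequency f_mod. The returning signal is delayed by 2d/c, that is, shifted in phase by Δφ = 2π·f_mod·2d/c, so d = c·Δφ / (4π·f_mod). Since Δφ is only known modulo 2π, distances alias beyond c / (2·f_mod). A sketch with hypothetical function names:

```python
import math

C = 299_792_458.0  # speed of light in vacuum, m/s

def cw_tof_distance(phase_shift, f_mod):
    """Distance from the measured phase shift (rad) at modulation frequency f_mod (Hz)."""
    return C * phase_shift / (4.0 * math.pi * f_mod)

def unambiguous_range(f_mod):
    """Beyond this distance the phase wraps around 2*pi and ranges alias."""
    return C / (2.0 * f_mod)

# Usage: at 20 MHz, half a cycle of phase shift is about 3.75 m, ambiguity at ~7.5 m
d = cw_tof_distance(math.pi, 20e6)
max_d = unambiguous_range(20e6)
```

Raising f_mod improves depth resolution for a given phase noise but shortens the unambiguous range, one reason such cameras target short indoor ranges.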


Laser triangulation

Laser + camera system

A laser dot or stripe is projected onto the scene

The camera locates the dot/stripe

The distance is determined by triangulating the position of the dot (the emitting angle, the receiving angle and the baseline must be known)

Pros:
Accuracy (tens of micrometers)

Cons:
Limited range
Slow scanning (often requires a static scene)
No color information: needs to be paired with a color camera

[Figure: triangulation geometry (laser at O, baseline b, distance d); Minolta Vivid 9i]
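The triangulation step has a closed form. Assuming α is the angle at the laser between the baseline b and the emitted ray, and β the angle at the camera between the baseline and the viewing ray (β is recovered from the dot position in the image via calibration), the law of sines gives the perpendicular distance of the dot from the baseline as b·sin α·sin β / sin(α + β). A minimal sketch with a hypothetical function name:

```python
import math

def triangulate_distance(baseline, alpha, beta):
    """Perpendicular distance of the laser dot from the baseline.

    alpha: angle at the laser between baseline and emitted ray (rad)
    beta:  angle at the camera between baseline and viewing ray (rad)
    """
    # Laser-to-dot side from the law of sines, then its component normal to the baseline
    return baseline * math.sin(alpha) * math.sin(beta) / math.sin(alpha + beta)

# Usage: with both angles at 60 degrees the triangle is equilateral
d = triangulate_distance(1.0, math.pi / 3, math.pi / 3)
```

As α + β approaches π the rays become nearly parallel and small angle errors produce large distance errors, which is consistent with the limited range listed above.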


Structured light

Camera + projector system

Similar to laser triangulation, with the laser stripe replaced by a stripe projected by a light projector

Projecting a set of stripes (2D pattern) provides multiple samples at once, so a full 2D range image can be acquired in a single shot (at the cost of having to tell the different fringes apart)

Using infrared projection and two cameras (one infrared, one in the visible band) yields accurate RGB-D data

Pros:
Relatively cheap
Sub-millimeter accuracy (down to tens of micrometers)
Real-time

Cons:
Limited range
Hardly usable outdoors or in the presence of other light sources
Highly dependent on the surface characteristics of the object (e.g. reflective, translucent, ..)
Interference between different sensors

[Figure: Microsoft Kinect]


Stereo vision

Two (or more) cameras

Cameras have to be synchronized, especially with non-static scenes

Depth is retrieved by triangulating the projections of a point on the two views

Correspondence problem!

Pros:
Cheap
Passive
Real-time
Color/intensity can be directly associated with range data (RGB-D)

Cons:
Low accuracy (tends to fail on low-textured regions, repetitive patterns and depth borders)

A projector can help by adding texture to the scene, improving accuracy on low-textured regions

[Figure: Videre Design stereo camera; triangulation of P from its projections pL, pR (optical centers OL, OR); a different match p'R yields a different point P']
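For the common special case of a rectified pair (assumed here: identical cameras with focal length f in pixels, principal point (cx, cy) and horizontal baseline B in meters), corresponding points lie on the same image row and triangulation reduces to Z = f·B / d, with disparity d = x_L − x_R. Function names are illustrative.

```python
def stereo_depth(disparity, f, baseline):
    """Depth Z = f * B / d for a rectified pair (f and d in pixels, B in meters)."""
    if disparity <= 0:
        raise ValueError("disparity must be positive (zero means a point at infinity)")
    return f * baseline / disparity

def triangulate_rectified(x_left, x_right, y, f, baseline, cx, cy):
    """3D point in the left camera frame from a correspondence on row y."""
    d = x_left - x_right  # disparity along the (horizontal) epipolar line
    Z = stereo_depth(d, f, baseline)
    return ((x_left - cx) * Z / f, (y - cy) * Z / f, Z)

# Usage: 25 px of disparity with f = 500 px and B = 10 cm puts the point at 2 m
point = triangulate_rectified(345.0, 320.0, 240.0, 500.0, 0.1, 320.0, 240.0)
```

Because Z is inversely proportional to d, the same one-pixel matching error produces larger depth errors on distant points.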


Spacetime stereo

Stereo using spatial and temporal information [Davis 05][Zhang 03]

To gather information, the appearance must change over time (but not the geometry!)

A random pattern is projected to augment each frame with a different texture (no structured light, no interference)

More accurate than standard stereo, but depth must be constant in time (static objects)

Joint spatio-temporal window
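The joint spatio-temporal window can be illustrated with a sum of squared differences (SSD) matching cost accumulated over both space and time. This sketch assumes rectified 1D scanlines (left[t] is the row in frame t), a static scene and synchronized frames; SSD with a winner-takes-all choice is an assumption for illustration, not necessarily the cost used in [Zhang 03] or [Davis 05]. With enough frames of changing texture, even a single-pixel spatial window (half_w = 0) can disambiguate the match.

```python
def spacetime_ssd(left, right, x, d, half_w):
    """SSD between left pixel x and right pixel x - d, summed over a
    (2 * half_w + 1)-pixel spatial window and over all frames."""
    cost = 0.0
    for frame_l, frame_r in zip(left, right):  # temporal extent of the window
        for k in range(-half_w, half_w + 1):   # spatial extent of the window
            diff = frame_l[x + k] - frame_r[x + k - d]
            cost += diff * diff
    return cost

def best_disparity(left, right, x, max_d, half_w=0):
    """Winner-takes-all disparity for pixel x among 0..max_d."""
    return min(range(max_d + 1), key=lambda d: spacetime_ssd(left, right, x, d, half_w))

# Usage: synthetic static scene (true disparity 3) under a texture that changes per frame
T, W, d_true = 4, 20, 3
left = [[((7 * u + 13 * t) % 11) / 10.0 for u in range(W)] for t in range(T)]
right = [[left[t][u + d_true] if u + d_true < W else 0.0 for u in range(W)] for t in range(T)]
d = best_disparity(left, right, x=10, max_d=6)
```

The formulas used to generate the synthetic frames merely stand in for the projected random pattern; any texture that varies over time while the geometry stays fixed plays the same role.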


Structure-from-motion

Monocular system

Instead of being spread over space (multiple cameras), the views are spread over time

Requires either the surface or the camera to move

Tracking and matching features

Pros:
Cheap and simple hardware (only one camera needed)
RGB-D data
Solving SfM also yields the camera pose at each time instant

Cons:
Highly dependent on the motion that the object/camera can undergo
Sparse depth information

Bibliography

[Horn 87] B. Horn, “Closed-form solution of absolute orientation using unit quaternions”, J. Optical Society of America A, Vol. 4, No. 4, pp. 629–642, 1987

[Arun 87] K. S. Arun, T. S. Huang, S. D. Blostein, “Least-squares fitting of two 3-D point sets”, IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 9, pp. 698–700, 1987

[Rusinkiewicz 01] S. Rusinkiewicz, M. Levoy, “Efficient variants of the ICP algorithm”, Proc. Int. Conf. on 3D Digital Imaging and Modeling (3DIM), 2001

[Segal 09] A. Segal, D. Haehnel, S. Thrun, “Generalized-ICP”, Proc. Conf. Robotics: Science and Systems (RSS), 2009

[Zhang 03] L. Zhang, B. Curless, S. Seitz, “Spacetime stereo: shape recovery for dynamic scenes”, Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2003

[Davis 05] J. Davis, D. Nehab, R. Ramamoorthi, S. Rusinkiewicz, “Spacetime stereo: a unifying framework for depth from triangulation”, IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 27, No. 2, 2005

[Nistèr 04] D. Nistèr, O. Naroditsky, J. Bergen, “Visual odometry”, Proc. Conf. on Computer Vision and Pattern Recognition (CVPR), 2004

[Konolige 06] K. Konolige, M. Agrawal, R. C. Bolles, C. Cowan, M. Fischler, B. Gerkey, “Outdoor mapping and navigation using stereo vision”, Proc. Int. Symp. on Experimental Robotics (ISER), 2006

[Sim 06] R. Sim, J. J. Little, “Autonomous vision-based exploration and mapping using hybrid maps and Rao-Blackwellised particle filters”, Proc. Conf. on Intelligent Robots and Systems (IROS), 2006

[Karlsson 05] N. Karlsson, E. D. Bernardo, J. Ostrowski, L. Goncalves, P. Pirjanian, M. E. Munich, “The vSLAM algorithm for robust localization and mapping”, Proc. Int. Conf. on Robotics and Automation (ICRA), 2005

[Folkesson 05] J. Folkesson, P. Jensfelt, H. Christensen, “Vision SLAM in the Measurement Subspace”, Proc. IEEE Int. Conf. on Robotics and Automation (ICRA), 2005

[Davison 03] A. J. Davison, “Real-time simultaneous localisation and mapping with a single camera”, Proc. Int. Conf. on Computer Vision (ICCV), 2003

[Eade 06] E. Eade, T. Drummond, “Scalable monocular SLAM”, Proc. Conf. on Computer Vision and Pattern Recognition (CVPR), 2006

[Clemente 07] L. Clemente, A. J. Davison, I. Reid, J. Neira, J. Tardós, “Mapping large loops with a single hand-held camera”, Proc. Conf. Robotics: Science and Systems (RSS), 2007

[Strasdat 10] H. Strasdat, J. M. M. Montiel, A. J. Davison, “Real-time Monocular SLAM: Why Filter?”, Proc. Int. Conf. on Robotics and Automation (ICRA), 2010

[Klein 07] G. Klein, D. W. Murray, “Parallel tracking and mapping for small AR workspaces”, Proc. Int. Symp. on Mixed and Augmented Reality (ISMAR), 2007

[Newcombe 11] R. A. Newcombe, S. J. Lovegrove, A. J. Davison, “DTAM: Dense Tracking and Mapping in Real-Time”, Proc. IEEE Int. Conf. on Computer Vision (ICCV), 2011

[Newcombe 11b] R. A. Newcombe, S. Izadi, O. Hilliges, D. Molyneaux, D. Kim, A. J. Davison, P. Kohli, J. Shotton, S. Hodges, A. Fitzgibbon, “KinectFusion: Real-Time Dense Surface Mapping and Tracking”, Proc. Int. Symp. on Mixed and Augmented Reality (ISMAR), 2011

[Henry 11] P. Henry, M. Krainin, E. Herbst, X. Ren, D. Fox, “RGB-D Mapping: Using Depth Cameras for Dense 3D Modeling of Indoor Environments”, Proc. Int. Symp. on Experimental Robotics (ISER), 2010

[Lloyd 82] S. P. Lloyd, “Least squares quantization in PCM”, IEEE Trans. on Information Theory, Vol. 28, No. 2, pp. 129–137, 1982

[Tombari 11] F. Tombari, L. Di Stefano, “3D Data Segmentation by Local Classification and Markov Random Fields”, Proc. Conf. on 3D Imaging, Modeling, Processing, Visualization and Transmission (3DIMPVT), 2011

[Unnikrishnan 08] R. Unnikrishnan, M. Hebert, “Multi-scale interest regions from unorganized point clouds”, CVPR Workshop on Search in 3D, 2008

[Anguelov 05] D. Anguelov, B. Taskar, V. Chatalbashev, D. Koller, D. Gupta, G. Heitz, A. Ng, “Discriminative learning of Markov random fields for segmentation of 3D scan data”, Proc. Conf. on Computer Vision and Pattern Recognition (CVPR), 2005

[Triebel 07] R. Triebel, R. Schmidt, O. M. Mozos, W. Burgard, “Instance-based AMN classification for improved object recognition in 2D and 3D laser range data”, Proc. Int. Conf. on Artificial Intelligence, 2007

[Munoz 09] D. Munoz, J. A. Bagnell, N. Vandapel, M. Hebert, “Contextual classification with functional max-margin Markov networks”, Proc. Conf. on Computer Vision and Pattern Recognition (CVPR), 2009
