Mid and high-level features for dense monocular SLAM

Mid and high-level features for dense monocular SLAM Javier Civera Qualcomm Augmented Reality Lecture Series Nov. 19 th , 2015

Page 1:

Mid and high-level features for dense monocular SLAM

Javier Civera Qualcomm Augmented Reality Lecture Series

Nov. 19th, 2015

Page 2:

Index

Introduction/motivation

Point-based monocular SLAM

Keypoint-based monocular SLAM

Dense monocular SLAM

Mid-level features

Superpixels

Data-driven primitives

High-level features

Room Layout

Objects.

Page 3:

Robotic Vision

• Robotic Vision is making a robot “see” **
• Now… what does it mean for a robot to see?
• Data input:
  • Image sequences.
  • Multi-sensor.
  • Active sensing.
• Problem constraints:
  • Real-time.
  • Hardware limits.
• Goals:
  • Self-localization.
  • 3D scene models.
  • Temporal models.
  • Local short-term accuracy.
  • Long-term models.
  • Semantics.

** Paraphrasing Olivier Faugeras in Hartley & Zisserman’s book

Page 4:

Other applications

• The robotics constraints are shared with other applications:
  • AR/VR.
  • Wearable/mobile devices.
  • Laparoscopic surgery.
  • …

Grasa et al., Visual SLAM for Hand-Held Monocular Endoscope, IEEE TMI, 2014

Page 5:

Point-based features (low-level)

• Point-based features are accurate in high-texture image regions and for high-parallax motions.

• The typical approach has been to use salient point features, discarding low-texture parts.

• SfM and Visual SLAM datasets are biased to high-parallax motions.

Page 6:

Camera Geometry

• A camera is a bearing-only sensor: it only measures angles.
• The depth of the scene is estimated by triangulation.
• The depth estimation is based on the parallax angle.
• The larger the parallax, the more accurate the depth estimation.

[Figure: cameras C1 and C2, separated by the translation t_c1c2, observe the scene point p_i; the parallax angle and the axes X, Y, Z are marked.]
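To make the parallax argument concrete, here is a minimal sketch that computes the parallax angle of a scene point seen from two camera centres. The function name `parallax_angle` and all numeric values are illustrative assumptions, not taken from the talk.

```python
import math

def parallax_angle(c1, c2, p):
    """Angle (radians) between the rays from camera centres c1 and c2 to point p."""
    r1 = [p[i] - c1[i] for i in range(3)]
    r2 = [p[i] - c2[i] for i in range(3)]
    dot = sum(a * b for a, b in zip(r1, r2))
    n1 = math.sqrt(sum(a * a for a in r1))
    n2 = math.sqrt(sum(a * a for a in r2))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, dot / (n1 * n2))))

point = (0.0, 0.0, 10.0)              # hypothetical scene point, 10 m ahead
for baseline in (0.01, 0.1, 1.0):     # hypothetical camera translations (m)
    angle = parallax_angle((0.0, 0.0, 0.0), (baseline, 0.0, 0.0), point)
    print(f"baseline {baseline:5.2f} m -> parallax {math.degrees(angle):.3f} deg")
```

For a fixed depth, the angle grows with the camera translation; for a fixed translation, it shrinks as the point moves away. Both cases match the two causes of low parallax named in the talk.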

Page 7:

Low-Parallax Points

• Low parallax is due to:
  • Distant points.
  • Small camera translation.
• Depth cannot be estimated for zero-parallax points...
• ... but they provide rich orientation information.

Page 8:

[Figure: a camera C with position $\mathbf{r}^{WC}$ and orientation $\mathbf{q}^{WC}$ in the world frame W observes scene point i along the ray $\mathbf{m}(\theta_i, \phi_i)$ at depth $d_i = 1/\rho_i$; the parallax angle is marked.]

$\mathbf{y}_i = (x_i, y_i, z_i, \theta_i, \phi_i, \rho_i)^\top$

$\mathbf{x}_i = (x_i, y_i, z_i)^\top + \frac{1}{\rho_i}\,\mathbf{m}(\theta_i, \phi_i)$
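The inverse depth point $\mathbf{y}_i = (x_i, y_i, z_i, \theta_i, \phi_i, \rho_i)$ stores the camera position at the first observation, the ray direction as two angles, and the inverse depth along that ray. A minimal sketch of the conversion to a Euclidean point follows. The axis convention inside `ray_direction` (azimuth θ, elevation φ, optical axis along z) is an assumption, since the transcript does not define $\mathbf{m}$ explicitly, and the function names are hypothetical.

```python
import math

def ray_direction(theta, phi):
    """Unit ray m(theta, phi); this axis convention is an assumption."""
    return (math.cos(phi) * math.sin(theta),
            -math.sin(phi),
            math.cos(phi) * math.cos(theta))

def inverse_depth_to_euclidean(point):
    """Map (x, y, z, theta, phi, rho) to a 3D point; rho must be non-zero."""
    x, y, z, theta, phi, rho = point
    m = ray_direction(theta, phi)
    # Anchor position plus the ray scaled by the depth d = 1 / rho.
    return tuple(c + m_k / rho for c, m_k in zip((x, y, z), m))

# Hypothetical point first seen from (1, 2, 3), along the optical axis, 5 m away.
print(inverse_depth_to_euclidean((1.0, 2.0, 3.0, 0.0, 0.0, 0.2)))
```

As ρ approaches zero the point moves towards infinity along the ray, which is why a Euclidean conversion is only meaningful once ρ is clearly non-zero.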

Page 9:

Inverse Depth Point Initialization

New points are added from their 1st observation:

1) {x, y, z, θ, φ} are initialized from the 1st observation and the state vector.

2) ρ0 and its standard deviation σρ0 are initialized so that [ρ0 − 2σρ0, ρ0 + 2σρ0] includes infinite depth (ρ = 0).

[Figure: the initial uncertainty of point $\mathbf{y}_i$ along the ray $\mathbf{m}(\theta_i, \phi_i)$, shown in INVERSE DEPTH SPACE and in EUCLIDEAN SPACE, with labels involving ρ0, σρ0 and the minimum depth $d_{min}$.]
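The initialization rule can be checked numerically. The sketch below tests whether the interval [ρ0 − 2σρ0, ρ0 + 2σρ0] contains ρ = 0, which corresponds to infinite depth, and reports the closest depth the interval covers. The function names and the values ρ0 = 0.1 and σρ0 = 0.5 are illustrative assumptions.

```python
def inverse_depth_interval(rho0, sigma_rho0):
    """Return the interval [rho0 - 2*sigma, rho0 + 2*sigma] in inverse depth."""
    return rho0 - 2.0 * sigma_rho0, rho0 + 2.0 * sigma_rho0

def includes_infinity(rho0, sigma_rho0):
    """Infinite depth corresponds to rho = 0, so test whether 0 lies in the interval."""
    low, high = inverse_depth_interval(rho0, sigma_rho0)
    return low <= 0.0 <= high

def closest_depth(rho0, sigma_rho0):
    """Smallest depth covered by the interval: d = 1 / (rho0 + 2*sigma)."""
    return 1.0 / (rho0 + 2.0 * sigma_rho0)

rho0, sigma_rho0 = 0.1, 0.5   # hypothetical initial values
print(inverse_depth_interval(rho0, sigma_rho0))
print(includes_infinity(rho0, sigma_rho0), closest_depth(rho0, sigma_rho0))
```

A single Gaussian in inverse depth can therefore cover everything from a nearby depth out to infinity, which a Gaussian in Euclidean depth cannot do.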

Page 10:

Inverse Depth Point Measurement

Projection Model

[Figure: a camera C with position $\mathbf{r}^{WC}$ and orientation $\mathbf{q}^{WC}$ in the world frame W observes scene point i along the ray $\mathbf{m}(\theta_i, \phi_i)$ at depth $d_i = 1/\rho_i$; the parallax angle is marked.]

Camera Reference Frame:

$\mathbf{h}^C = (h_x, h_y, h_z)^\top = \mathbf{R}^{CW}\left(\rho_i\left((x_i, y_i, z_i)^\top - \mathbf{r}^{WC}\right) + \mathbf{m}(\theta_i, \phi_i)\right)$

Pinhole Camera Model: the undistorted image point $(u_u, v_u)$ is computed from the principal point C, the focal length f, and the ratios $h_x/h_z$ and $h_y/h_z$.

Two Radial Distortion Parameters: the undistorted point $(u_u, v_u)$ and the distorted point $(u_d, v_d)$ are related through the radial distance r.

Page 11:

Inverse Depth Parameterization

[Figure panels: Feature 3, Feature 11]

Page 12:

Standard RANSAC: 1D example

1) RANDOM SAMPLES

[Figure: hypotheses with 10 votes, 1 vote, and 8 votes; one sampled point is marked "Outlier!!"]

$n = \frac{\log(1 - P)}{\log\left(1 - (1 - \epsilon)^m\right)}$, with $m = 2$

2) PARTIAL UPDATE

3) RESCUE INLIERS

Page 13:

1-Point RANSAC: 1D example

1) RANDOM SAMPLES

[Figure: hypotheses with 11 votes, 3 votes, and 8 votes]

$n = \frac{\log(1 - P)}{\log\left(1 - (1 - \epsilon)^m\right)}$, with $m = 1$: lower n, fewer samples!

2) PARTIAL UPDATE

3) RESCUE INLIERS

[Figure labels: High innovation, Outlier, Inlier]
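The effect of the sample size m on the number of hypotheses n can be sketched with the usual RANSAC termination formula, $n = \log(1 - P) / \log(1 - (1 - \epsilon)^m)$. Here P is the desired probability of drawing at least one outlier-free sample and ε is the outlier ratio; the function name and the values P = 0.99 and ε = 0.5 are illustrative assumptions.

```python
import math

def ransac_hypotheses(p_success, outlier_ratio, sample_size):
    """n = log(1 - P) / log(1 - (1 - eps)^m), rounded up to an integer."""
    # Probability that all points in one random sample are inliers.
    inlier_sample_prob = (1.0 - outlier_ratio) ** sample_size
    return math.ceil(math.log(1.0 - p_success) / math.log(1.0 - inlier_sample_prob))

for m in (1, 2):
    print(f"m = {m}: n = {ransac_hypotheses(0.99, 0.5, m)}")
```

Because (1 − ε)^m decays exponentially with m, moving from two-point to one-point samples cuts the number of hypotheses substantially, and the saving grows with the outlier ratio.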

Page 14:

Experimental Results for Large Trajectories

650-metre trajectory; 24180 images.

Error: ~1% of the trajectory length.

RAWSEEDS datasets: http://www.rawseeds.org

Camera + wheel odometry, 1310 metres, 54000 frames (~30 min video).

Page 15:

Feature-based stereo SLAM

• SPTAM: Stereo Parallel Tracking and Mapping.
• ~1.35% translation error.
• 10th position in KITTI (small differences from the higher-ranked entries).
• The first entry in the ranking with stereo code available.

Taihú Pire, Thomas Fischer, Javier Civera, Pablo de Cristóforis, Julio César Jacobo Berlles, Stereo Parallel Tracking and Mapping for Robot Localization, IROS 2015. Code available at https://github.com/lrse/sptam

Page 16:

How useful is a sparse map for a robot?

Page 17:

How useful is a sparse map for a robot?

Not enough for navigation

Not enough for high-level tasks. E.g., “bring me a book from Henry’s table”

At least I have an accurate robot motion…

Page 18:

Dense mapping: RGB-D sensors

Page 19:

But…

• RGB-D sensors do not work in direct sunlight.
• RGB-D sensors do not work on every surface.
• Minimum distance (~0.5 metres) and maximum distance (4-8 metres).
• Size, weight, power consumption…

Page 20:

Dense monocular mapping

• Minimize the photometric error and a regularization term.
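The structure of that cost can be illustrated with a one-dimensional toy. In the sketch below an integer disparity per pixel stands in for depth, the data term is the sum of absolute intensity differences after warping, and the regularizer is the total variation of the disparity. The total-variation choice, the function names, the data, and the weight `lam` are all assumptions made for illustration.

```python
def photometric_error(ref, other, disparity):
    """Sum of absolute intensity differences after shifting each pixel by its disparity."""
    total = 0.0
    for i, d in enumerate(disparity):
        j = min(max(i + d, 0), len(other) - 1)   # clamp the warped index to the image
        total += abs(ref[i] - other[j])
    return total

def total_variation(disparity):
    """Sum of absolute differences between neighbouring disparities."""
    return sum(abs(b - a) for a, b in zip(disparity, disparity[1:]))

def energy(ref, other, disparity, lam):
    """Photometric data term plus lam times the regularization term."""
    return photometric_error(ref, other, disparity) + lam * total_variation(disparity)

ref = [10, 20, 30, 40, 50, 60]
other = [10, 10, 20, 30, 40, 50]      # hypothetical second view: ref shifted by one pixel
smooth = [1, 1, 1, 1, 1, 1]           # constant disparity, the true shift
noisy = [1, 0, 1, 1, 3, 1]            # same shift with two wrong pixels
print(energy(ref, other, smooth, lam=2.0), energy(ref, other, noisy, lam=2.0))
```

In this data the wrong disparity of 3 at the fifth pixel has zero photometric error, so the data term alone cannot reject it; the regularization term is what penalizes it.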

Page 21:

Dense monocular mapping

[Table: Accuracy, Density, and Cost under High Texture and Low Texture; rows: Keypoint-based, Dense]

Page 22:

Dense Mapping: High Texture

[Table: Accuracy, Density, and Cost under High Texture and Low Texture; row: Dense]

Page 23:

Dense Mapping: Low Texture

[Table: Accuracy, Density, and Cost under High Texture and Low Texture; row: Dense]

Page 24:

Superpixels (mid-level)

• Image segmentation based on color and 2D distance.
• Decent features for textureless areas.
• We assume that homogeneous color regions are almost planar.

[Table: Accuracy, Density, and Cost under High Texture and Low Texture; rows: Keypoint-based, Dense, Superpixels, Dense + Sup.]

Pedro F. Felzenszwalb and Daniel P. Huttenlocher. Efficient graph-based image segmentation. International Journal of Computer Vision, 59(2):167–181, 2004.

Page 25:

Dense Mapping: Low Texture

[Table: Accuracy, Density, and Cost under High Texture and Low Texture; row: Dense]

Page 26:

Keypoint-Based Mapping: Low Texture

[Table: Accuracy, Density, and Cost under High Texture and Low Texture; row: Keypoint-based]

Page 27:

Superpixels: Low Texture

[Table: Accuracy, Density, and Cost under High Texture and Low Texture; row: Superpixels]

Pedro F. Felzenszwalb and Daniel P. Huttenlocher. Efficient graph-based image segmentation. International Journal of Computer Vision, 59(2):167–181, 2004.

Page 28:

Superpixel Initialization

Multiview model: Homography (h)

Error: Contour reprojection error (ɛ)

Monte Carlo initialization: for every superpixel we create a set of reasonable hypotheses h and rank them by their error ɛ.

[Figure: homography H between two views]

Alejo Concha and Javier Civera. Using Superpixels in Monocular SLAM. ICRA 2014
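The Monte Carlo step can be sketched as follows: sample plane hypotheses at random, build the homography each plane induces between two views, transfer the superpixel contour, and rank the hypotheses by contour error. This is a simplified illustration, not the method of the cited paper. It assumes identity intrinsics, a second camera that is only translated by `t`, known point-to-point contour correspondences, and arbitrary sampling ranges; all names are hypothetical.

```python
import random

def homography_from_plane(normal, dist, t):
    """H = I - t n^T / d for the plane n.X = d (n need not be unit length)."""
    return [[(1.0 if r == c else 0.0) - t[r] * normal[c] / dist for c in range(3)]
            for r in range(3)]

def transfer(H, pt):
    """Map a normalized image point (u, v) through H and dehomogenize."""
    u, v = pt
    x, y, w = (H[r][0] * u + H[r][1] * v + H[r][2] for r in range(3))
    return (x / w, y / w)

def contour_error(H, contour_ref, contour_obs):
    """Mean distance between transferred and observed contour points."""
    total = 0.0
    for p, q in zip(contour_ref, contour_obs):
        x, y = transfer(H, p)
        total += ((x - q[0]) ** 2 + (y - q[1]) ** 2) ** 0.5
    return total / len(contour_ref)

def monte_carlo_init(contour_ref, contour_obs, t, n_hyp=500, seed=0):
    """Sample random plane hypotheses and return them ranked by contour error."""
    rng = random.Random(seed)
    ranked = []
    for _ in range(n_hyp):
        normal = [rng.uniform(-0.5, 0.5), rng.uniform(-0.5, 0.5), 1.0]
        dist = rng.uniform(1.0, 5.0)
        H = homography_from_plane(normal, dist, t)
        ranked.append((contour_error(H, contour_ref, contour_obs), normal, dist))
    ranked.sort(key=lambda h: h[0])
    return ranked

# Hypothetical data: a square contour on the plane z = 2, camera moved 0.2 along x.
t = (0.2, 0.0, 0.0)
contour_ref = [(-0.1, -0.1), (0.1, -0.1), (0.1, 0.1), (-0.1, 0.1)]
H_true = homography_from_plane([0.0, 0.0, 1.0], 2.0, t)
contour_obs = [transfer(H_true, p) for p in contour_ref]
ranked = monte_carlo_init(contour_ref, contour_obs, t)
best_error, best_normal, best_dist = ranked[0]
print(best_error, best_normal, best_dist)
```

The best-ranked hypothesis is only a starting point; a later refinement step would then reduce the reprojection error further.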

Page 29:

Superpixel Mapping

Multiview model: Homography (h)

Error: Contour reprojection error (ɛ)

Mapping: Minimize the reprojection error.

[Figure: homography H between two views]

Alejo Concha and Javier Civera. Using Superpixels in Monocular SLAM. ICRA 2014

Page 30:

Superpixels in low-textured areas

[Table: Accuracy, Density, and Cost under High Texture and Low Texture; row: Superpixels]

Alejo Concha and Javier Civera. Using Superpixels in Monocular SLAM. ICRA 2014

Page 31:

Using Superpixels in Monocular SLAM

Alejo Concha and Javier Civera. Using Superpixels in Monocular SLAM. ICRA 2014

Page 32:

Dense + Superpixels

Alejo Concha, Wajahat Hussain, Luis Montano and Javier Civera, Manhattan and Piecewise-Planar Constraints for Dense Monocular Mapping, RSS 2014.

Page 33:

Dense + Superpixels

[Table: Accuracy, Density, and Cost under High Texture and Low Texture; row: Dense + Sup.]

Alejo Concha, Wajahat Hussain, Luis Montano and Javier Civera, Manhattan and Piecewise-Planar Constraints for Dense Monocular Mapping, RSS 2014.

Page 34:

Dense + Superpixels

[Figure panels: Video (input); PMVS (high-gradient pixels); Dense (TV-regularization); Superpixels; PMVS + Superpixels; Dense + Superpixels]

Alejo Concha and Javier Civera. Using Superpixels in Monocular SLAM. ICRA 2014

Yasutaka Furukawa and Jean Ponce. Accurate, dense, and robust multiview stereopsis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(8):1362–1376, 2010.

Richard A. Newcombe, Steven J. Lovegrove, and Andrew J. Davison. DTAM: Dense tracking and mapping in real-time. In Computer Vision (ICCV), 2011 IEEE International Conference on, pages 2320–2327. IEEE, 2011.

Alejo Concha, Wajahat Hussain, Luis Montano and Javier Civera, Manhattan and Piecewise-Planar Constraints for Dense Monocular Mapping, RSS 2014.

Page 35:

Semidense mapping + superpixels

• TV-regularization is expensive; a GPU might be needed for real-time operation.
• Semidense mapping plus superpixels is a reasonable option, cheaper than TV-regularization (CPU) and with a small loss in density.
• Given a semidense map, superpixels can be initialized via SVD more accurately and at a lower cost.

Alejo Concha, Javier Civera, DPPTAM: Dense Piecewise Planar Tracking and Mapping from a Monocular Sequence, IROS 2015. Code to be released soon! https://github.com/alejocb/dpptam

Page 36:

Semidense mapping + superpixels

• The SVD superpixels are more accurate than the triangulated ones.
• The SVD superpixels are as accurate as the semidense map.
• Large errors in dense reconstructions!!
• Superpixels reduce the error of dense reconstructions.
• A reasonable solution is to filter out low-parallax points.

Legend: [3] is Alejo Concha and Javier Civera, Using Superpixels in Monocular SLAM, ICRA 2014; (ours) is Alejo Concha and Javier Civera, DPPTAM: Dense Piecewise Planar Tracking and Mapping from a Monocular Sequence, IROS 2015.

Page 37:

Monocular – Inertial Dense SLAM

• Integrating the inertial measurements gives the real scale of the reconstruction.

ICRA 2016 submission!

Page 38:

Now, how useful is this dense map for a robot?

Good enough for navigation

Not enough for high-level tasks. E.g., “bring me a book from Henry’s table”

We are more resilient to low texture, but we still need parallax…

Page 39:

Data-driven primitives (mid-level)

David F. Fouhey, Abhinav Gupta, and Martial Hebert. Data-driven 3D primitives for single image understanding. ICCV, 2013.

Feature discovery on RGB-D training data.

Extracts patterns that are consistent in D and discriminative in RGB

At test time, from a single RGB view we can predict mid-level depth patterns.

Page 40:

Multiview Layout (high-level)

(a) Sparse/semidense reconstruction.
(b) Plane normals from 3D vanishing points (image VP, backprojection, 3D clustering).
(c) Plane distances from a sparse/semidense multiview reconstruction.
(d) Superpixel segmentation, geometric and photometric feature extraction.
(e), (f) Classification (AdaBoost).

Alejo Concha, Wajahat Hussain, Luis Montano and Javier Civera, Manhattan and Piecewise-Planar Constraints for Dense Monocular Mapping, RSS 2014.

Page 41:

Superpixels and Layout

Alejo Concha, Wajahat Hussain, Luis Montano and Javier Civera, Manhattan and Piecewise-Planar Constraints for Dense Monocular Mapping, RSS 2014.

Page 42:

Superpixels, Data-Driven Primitives and Layout

Alejo Concha, Wajahat Hussain, Luis Montano and Javier Civera, Incorporating Scene Priors to Dense Monocular Mapping, Autonomous Robots 2015.

• NYU dataset, high-parallax sequences

Page 43:

Superpixels, Data-Driven Primitives and Layout

Alejo Concha, Wajahat Hussain, Luis Montano and Javier Civera, Incorporating Scene Priors to Dense Monocular Mapping, Autonomous Robots 2015.

• NYU dataset, low-parallax sequences

Page 44:

The layout can prevent tracking loss!

Marta Salas, Wajahat Hussain, Alejo Concha, Luis Montano, Javier Civera, J. M. M. Montiel, Layout Aware Visual Tracking and Mapping, IROS 2015.

Page 45:

Object features (high-level)

Page 46:

Conclusions: vSLAM features and performance

Point-based features (low-level)

High accuracy if high texture and high parallax.

Superpixels (mid-level)

High accuracy if low texture and high parallax.

Data-driven primitives (mid-level)

Decent accuracy even for low texture and low parallax.

The patterns should be discovered in the training data.

Layout (high-level)

Decent accuracy even for low texture and low parallax.

The layout patterns should appear in the image.

Objects (high-level)

High accuracy for object instances, decent accuracy for object categories.

The object should appear in the image.

Page 47:

Acknowledgments

J. M. M. Montiel, Andrew J. Davison, Alejo Concha, Wajahat Hussain, L. Montano, L. Montesano, J. Sola, T. Vidal-Calleja, A. C. Murillo, O. G. Grasa, D. R. Bueno, A. Agudo, D. Galvez-Lopez, L. Riazuelo, Taihú Pire, Jorge Romeo, J. D. Tardos, J. Neira, J. A. Castellanos, Marta Salas, A. Argiles, Chema Fácil, Jesús Oliva, Vittorio Ferrari, Alessandro Prest, Christian Leistner, Cordelia Schmid, Ian Reid, Brian Williams, Margarita Chli, Paulo Drews Jr, Mario Campos, Martial Hebert, Javier Mínguez, María López, Roboearth Consortium (TU/e, Philips, Universität Stuttgart, ETHZ, TUM), IGLU consortium (Univ. Montreal, Inria Bordeaux, Univ. Mons, KTH, Univ. Lille)…

Funding: CICYT DPI2003-07986, DPI2006-13578, DPI2009-07130, DPI2012-32168, PCIN-2015-122, EU RAWSEEDS project FP6-045144, EU RoboEarth project FP7-248942, DGA-CAI IT12-06, DGA-CAI IT 26/10, SNSF IZK0Z2-136096.

Page 48:

Thank you!

Javier Civera (+34) 876 55 55 54 [email protected]

https://plus.google.com/+JavierCivera http://www.youtube.com/user/jciveravision

https://twitter.com/jcivera http://www.linkedin.com/in/jcivera http://webdiis.unizar.es/~jcivera/