Mid and high-level features for dense monocular SLAM

Javier Civera Qualcomm Augmented Reality Lecture Series

Nov. 19th, 2015

Introduction/motivation

Point-based monocular SLAM

Keypoint-based monocular SLAM

Dense monocular SLAM

Mid-level features

Superpixels

Data-driven primitives

High-level features

Room Layout

Objects.

• Robotic Vision is making a robot “see” **
• Now… what does it mean for a robot to see?
• Data input:
  • Image sequences.
  • Multi-sensor.
  • Active sensing.
• Problem constraints:
  • Real-time.
  • Hardware limits.
• Goals:
  • Self-localization.
  • 3D scene models.
  • Temporal models.
  • Local short-term accuracy.
  • Long-term models.
  • Semantics.

Robotic Vision

** Paraphrasing Olivier Faugeras in Hartley & Zisserman’s book

Other applications

• The robotics constraints are shared with other applications.

• AR/VR. • Wearable/mobile devices. • Laparoscopic surgery. • …

Grasa et al., Visual SLAM for Hand-Held Monocular Endoscope, IEEE TMI, 2014

Point-based features (low-level)

• Point-based features are accurate in high-texture image regions and for high-parallax motions.

• The typical approach has been to use salient point features, discarding low-texture parts.

• SfM and Visual SLAM datasets are biased to high-parallax motions.

• Camera is a bearing-only sensor: it only measures angles.

• The depth of the scene is estimated by triangulation.

• The depth estimation is based on the parallax angle.

• The larger the parallax, the more accurate the depth estimation
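A minimal sketch of the parallax angle, assuming known camera centres and a known 3D point (the function name and the numbers are illustrative, not from the slides). It shows how, for the same baseline, a distant point subtends a much smaller parallax angle than a near one.

```python
import math

def parallax_angle(c1, c2, point):
    """Angle (radians) between the rays from camera centres c1 and c2 to point."""
    r1 = [p - c for p, c in zip(point, c1)]
    r2 = [p - c for p, c in zip(point, c2)]
    dot = sum(a * b for a, b in zip(r1, r2))
    n1 = math.sqrt(sum(a * a for a in r1))
    n2 = math.sqrt(sum(a * a for a in r2))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, dot / (n1 * n2))))

# Same 0.1 m baseline, one near point (1 m) and one distant point (100 m).
c1, c2 = (0.0, 0.0, 0.0), (0.1, 0.0, 0.0)
near = parallax_angle(c1, c2, (0.0, 0.0, 1.0))
far = parallax_angle(c1, c2, (0.0, 0.0, 100.0))
```

Here `near` is roughly a hundred times larger than `far`, which is why the depth of the distant point is much harder to triangulate.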

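[Figure: the parallax angle for a scene point observed from camera C1 and from a second camera displaced by the translation $t_{c_1 c_2}$.]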

Camera Geometry

• Low parallax is due to:
  • Distant points
  • Small camera translation
• Depth cannot be estimated for zero-parallax points...
• ... but they provide rich orientation information

Low-Parallax Points

[Figure: scene point i observed with a given parallax angle from the camera pose $r^{WC}, q^{WC}$.]

New points are added from their first observation:
1) {x, y, z, θ, φ} are initialized from the first observation and the state vector.
2) ρ0 and its standard deviation σρ0 are initialized so that the interval [ρ0 − 2σρ0, ρ0 + 2σρ0] includes infinite depth (ρ = 0): $\rho_0 = 1/(2 d_{\min})$.
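The initialization above can be sketched as follows. The choice σρ0 = 1/(4 d_min) is an assumption made here so that the 2σ interval starts exactly at ρ = 0, and the ray direction m(θ, φ) follows the convention commonly used in the inverse depth literature; both function names are hypothetical.

```python
import math

def init_inverse_depth(d_min):
    """Prior on inverse depth whose 2-sigma interval spans [0, 1/d_min]."""
    rho0 = 1.0 / (2.0 * d_min)
    sigma_rho0 = 1.0 / (4.0 * d_min)  # assumption: chosen so rho0 - 2*sigma = 0
    return rho0, sigma_rho0

def inverse_depth_to_xyz(x, y, z, theta, phi, rho):
    """Euclidean point = first-observation camera centre + (1/rho) * ray direction."""
    # Unit ray direction from azimuth theta and elevation phi (assumed convention).
    m = (math.cos(phi) * math.sin(theta),
         -math.sin(phi),
         math.cos(phi) * math.cos(theta))
    return tuple(c + mi / rho for c, mi in zip((x, y, z), m))

rho0, sigma_rho0 = init_inverse_depth(d_min=1.0)
interval = (rho0 - 2 * sigma_rho0, rho0 + 2 * sigma_rho0)
point = inverse_depth_to_xyz(0.0, 0.0, 0.0, 0.0, 0.0, rho0)
```

With `d_min = 1.0` the interval covers inverse depths from 0 (infinity) to 1 (one metre), so a newly added point is compatible with any depth beyond `d_min`.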

INVERSE DEPTH SPACE

EUCLIDEAN SPACE

Inverse Depth Point Initialization

[Figure: scene point i observed with a given parallax angle from the camera pose $r^{WC}, q^{WC}$.]

Projection Model

Camera Reference Frame

Pinhole Camera Model

Two-Parameter Radial Distortion

$r_d = \sqrt{(d_x (u_d - C_x))^2 + (d_y (v_d - C_y))^2}$

Inverse Depth Point Measurement
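A rough sketch of the measurement chain, assuming a pinhole model with principal point (C_x, C_y), pixel sizes d_x and d_y, focal length f and two radial coefficients k1, k2. The sign convention and parameter names are assumptions, not taken from the slides; only the undistortion direction is shown because it has a closed form.

```python
import math

def project_pinhole(h, f, c_x, c_y, d_x, d_y):
    """Project a point h = (h_x, h_y, h_z) in the camera frame to pixel coordinates."""
    h_x, h_y, h_z = h
    return c_x + (f / d_x) * h_x / h_z, c_y + (f / d_y) * h_y / h_z

def undistort_pixel(u_d, v_d, c_x, c_y, d_x, d_y, k1, k2):
    """Map a distorted pixel to its undistorted position (two-parameter radial model)."""
    # Metric distance of the distorted pixel from the principal point.
    r_d = math.hypot(d_x * (u_d - c_x), d_y * (v_d - c_y))
    factor = 1.0 + k1 * r_d ** 2 + k2 * r_d ** 4
    return c_x + (u_d - c_x) * factor, c_y + (v_d - c_y) * factor

centre = project_pinhole((0.0, 0.0, 2.0), f=2.0, c_x=160.0, c_y=120.0, d_x=0.01, d_y=0.01)
same = undistort_pixel(200.0, 150.0, 160.0, 120.0, 0.01, 0.01, k1=0.0, k2=0.0)
moved = undistort_pixel(200.0, 150.0, 160.0, 120.0, 0.01, 0.01, k1=0.05, k2=0.0)
```

A point on the optical axis projects to the principal point, and with both coefficients at zero the pixel is left unchanged.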

Feature 3

Feature 11

Inverse Depth Parameterization

10 votes 1 votes 8 votes

Outlier!!

$n = \frac{\log(1 - p)}{\log(1 - (1 - \epsilon)^2)}$

1) RANDOM SAMPLES

2) PARTIAL UPDATE

3) RESCUE INLIERS

Standard RANSAC: 1D example
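A minimal sketch of standard RANSAC on a toy line-fitting problem, assuming the model is y = a·x + b so that the minimal sample is two points. The data, threshold and function name are illustrative only.

```python
import random

def ransac_line(points, n_hypotheses, threshold, rng):
    """Fit y = a*x + b; return (votes, a, b) of the most voted hypothesis."""
    best = (0, 0.0, 0.0)
    for _ in range(n_hypotheses):
        (x1, y1), (x2, y2) = rng.sample(points, 2)  # minimal sample: 2 points
        if x1 == x2:
            continue  # degenerate sample, no unique line
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        # Every point within the threshold of the line votes for the hypothesis.
        votes = sum(1 for x, y in points if abs(y - (a * x + b)) < threshold)
        if votes > best[0]:
            best = (votes, a, b)
    return best

# Ten points on y = 2x + 1 plus two gross outliers.
data = [(float(x), 2.0 * x + 1.0) for x in range(10)] + [(3.0, 40.0), (7.0, -25.0)]
votes, a, b = ransac_line(data, n_hypotheses=50, threshold=0.5, rng=random.Random(0))
```

Hypotheses built from an outlier collect few votes, so the ten inliers end up supporting the winning line.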

High innovation

m = 1: lower n, fewer samples! $n = \frac{\log(1 - p)}{\log(1 - (1 - \epsilon)^m)}$

1) RANDOM SAMPLES

11 votes 3 votes 8 votes

2) PARTIAL UPDATE

3) RESCUE INLIERS

1-Point RANSAC: 1D example
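The benefit of a smaller minimal sample can be checked with the usual RANSAC formula for the number of hypotheses, n = log(1 − p) / log(1 − (1 − ε)^m), where p is the desired probability of drawing at least one outlier-free sample, ε the outlier ratio and m the minimal sample size. The values of p and ε below are illustrative assumptions.

```python
import math

def ransac_hypotheses(p, eps, m):
    """n = log(1 - p) / log(1 - (1 - eps)^m), rounded up to an integer."""
    return math.ceil(math.log(1.0 - p) / math.log(1.0 - (1.0 - eps) ** m))

n_two_point = ransac_hypotheses(p=0.99, eps=0.5, m=2)
n_one_point = ransac_hypotheses(p=0.99, eps=0.5, m=1)
```

With half of the data being outliers, the one-point model needs 7 hypotheses where the two-point model needs 17, and the gap widens quickly for larger m.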

Outlier

Inlier

650 metres trajectory; 24180 images

ERROR: ~1% of the trajectory length

Experimental Results for Large Trajectories

RAWSEEDS datasets: http://www.rawseeds.org

Camera + wheel odometry, 1310 metres, 54000 frames (~30 min video)

Feature-based stereo SLAM

• SPTAM: Stereo Parallel Tracking and Mapping
• ~1.35% translation error
• 10th position in KITTI (small differences with the entries ranked above it)
• The highest-ranked one with stereo code available

Taihú Pire, Thomas Fischer, Javier Civera, Pablo de Cristóforis, Julio César Jacobo Berlles, Stereo Parallel Tracking and Mapping for Robot Localization, IROS 2015. CODE AVAILABLE AT https://github.com/lrse/sptam

How useful is a sparse map for a robot?

Not enough for navigation

Not enough for high-level tasks. E.g., “bring me a book from Henry’s table”

At least I have an accurate robot motion…

Dense mapping: RGB-D sensors

But…
• RGB-D sensors do not work in direct sunlight
• RGB-D sensors do not work on every surface
• Minimum distance (~0.5 metres) and maximum distance (4-8 metres)
• Size, weight, power consumption…

• Minimize the photometric error and a regularization term.
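A toy sketch of such an energy, assuming a heavily simplified 1D setting: two scanlines related by a per-pixel integer disparity (standing in for inverse depth), an absolute-difference photometric term and a total-variation-style smoothness term weighted by a hypothetical `lam`. A real dense mapper warps pixels with the full camera geometry.

```python
def dense_energy(ref, other, disparity, lam):
    """Photometric error plus a TV-style smoothness term on a 1D scanline."""
    photometric = 0.0
    for u, d in enumerate(disparity):
        v = u - d
        if 0 <= v < len(other):  # ignore pixels that warp outside the image
            photometric += abs(ref[u] - other[v])
    # Regularizer: sum of absolute differences between neighbouring disparities.
    regularizer = sum(abs(disparity[u + 1] - disparity[u])
                      for u in range(len(disparity) - 1))
    return photometric + lam * regularizer

ref = [0, 10, 20, 30, 40, 50, 60, 70]
other = [20, 30, 40, 50, 60, 70, 80, 90]  # ref shifted by two pixels
e_true = dense_energy(ref, other, [2] * 8, lam=1.0)
e_wrong = dense_energy(ref, other, [0] * 8, lam=1.0)
e_rough = dense_energy(ref, other, [2, 2, 2, 5, 2, 2, 2, 2], lam=1.0)
```

The correct, smooth disparity gives the lowest energy; a wrong disparity is penalised by the photometric term and a spiky one by both terms.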

Dense monocular mapping

[Table: accuracy, density and cost of each mapping approach, under high texture and under low texture.]

Keypoint-based

Dense Mapping: High Texture

High Texture Low Texture

Dense Mapping: Low Texture

Pedro F. Felzenszwalb and Daniel P. Huttenlocher. Efficient graph-based image segmentation. International Journal of Computer Vision, 59(2):167-181, 2004.

Superpixels (mid-level)

Keypoint-based

Superpixels

Dense + Sup.

• Image segmentation based on color and 2D distance.
• Decent features for textureless areas
• We assume that homogeneous color regions are almost planar.
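A simplified sketch in the spirit of the graph-based segmentation cited on these slides, assuming a small grey-level image and a 4-connected grid (the real method works on smoothed colour images). Pixels are merged in order of increasing edge weight as long as the edge is no heavier than the adaptive threshold of both regions; `k` is the usual scale parameter.

```python
def segment_graph(image, k):
    """Greedy region merging on a 4-connected grey-level grid (union-find)."""
    h, w = len(image), len(image[0])
    parent = list(range(h * w))
    size = [1] * (h * w)
    internal = [0.0] * (h * w)  # largest edge weight inside each component

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = []
    for r in range(h):
        for c in range(w):
            if c + 1 < w:
                edges.append((abs(image[r][c] - image[r][c + 1]), r * w + c, r * w + c + 1))
            if r + 1 < h:
                edges.append((abs(image[r][c] - image[r + 1][c]), r * w + c, (r + 1) * w + c))

    for weight, a, b in sorted(edges):
        ra, rb = find(a), find(b)
        if ra == rb:
            continue
        # Merge only if the edge is no heavier than both adaptive thresholds.
        if weight <= min(internal[ra] + k / size[ra], internal[rb] + k / size[rb]):
            parent[rb] = ra
            size[ra] += size[rb]
            internal[ra] = max(internal[ra], internal[rb], weight)
    return [[find(r * w + c) for c in range(w)] for r in range(h)]

image = [[10, 10, 200, 200]] * 4  # two homogeneous regions
labels = segment_graph(image, k=100)
n_segments = len({label for row in labels for label in row})
```

The two homogeneous halves of the toy image come out as two superpixels, each of which would then be treated as an approximately planar patch.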

Dense Mapping: Low Texture

Keypoint-Based Mapping: Low Texture

Keypoint-based

Superpixels: Low Texture

Superpixels

Pedro F. Felzenszwalb and Daniel P. Huttenlocher. Efficient graph-based image segmentation. International Journal of Computer Vision, 59(2):167-181, 2004.

Superpixel Initialization

Alejo Concha and Javier Civera. Using Superpixels in Monocular SLAM. ICRA 2014

Multiview model: Homography (h)

Error: Contour reprojection error (ɛ)

Monte Carlo initialization: for every superpixel we generate a set of reasonable hypotheses for h and rank them by their error ɛ.
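A heavily simplified sketch of the sample-and-rank idea, assuming a fronto-parallel plane and a sideways camera translation, so that the homography reduces to a horizontal shift f·baseline/d of the contour. Only the plane depth d is sampled; the actual method hypothesises full plane parameters. All names and numbers are illustrative.

```python
import random

def rank_depth_hypotheses(contour_1, contour_2, f, baseline, n, d_range, rng):
    """Sample plane depths and rank them by mean contour reprojection error."""
    ranked = []
    for _ in range(n):
        d = rng.uniform(*d_range)
        shift = f * baseline / d  # image shift induced by a plane at depth d
        error = sum(abs((u1 - shift) - u2)
                    for u1, u2 in zip(contour_1, contour_2)) / len(contour_1)
        ranked.append((error, d))
    return sorted(ranked)  # lowest error first

f, baseline, true_depth = 500.0, 0.1, 2.0
contour_1 = [100.0, 140.0, 180.0, 220.0]  # contour columns in the first view
contour_2 = [u - f * baseline / true_depth for u in contour_1]
ranked = rank_depth_hypotheses(contour_1, contour_2, f, baseline,
                               200, (0.5, 10.0), random.Random(1))
best_error, best_depth = ranked[0]
```

The top-ranked hypothesis lands close to the true depth and can then serve as the starting point for the reprojection-error minimisation.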

Superpixel Mapping

Multiview model: Homography (h)

Error: Contour reprojection error (ɛ)

Mapping: Minimize the reprojection error.

Superpixels in low-textured areas

Superpixels

Using Superpixels in Monocular SLAM

Dense + Superpixels

Alejo Concha, Wajahat Hussain, Luis Montano and Javier Civera, Manhattan and Piecewise-Planar Constraints for Dense Monocular Mapping, RSS 2014.

Dense + Superpixels

Dense + Sup.

PMVS (high-gradient pixels) Dense (TV-regularization)

Superpixels PMVS + Superpixels Dense + Superpixels

Video (input)

Dense + Superpixels

Yasutaka Furukawa and Jean Ponce. Accurate, dense, and robust multiview stereopsis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(8):1362-1376, 2010.

Richard A. Newcombe, Steven J. Lovegrove, and Andrew J. Davison. DTAM: Dense tracking and mapping in real-time. In Computer Vision (ICCV), 2011 IEEE International Conference on, pages 2320-2327. IEEE, 2011.

Semidense mapping + superpixels

• TV-regularization is expensive; a GPU might be needed for real-time operation.
• Semidense mapping plus superpixels is a reasonable alternative: it is cheaper than TV-regularization (it runs on a CPU), with a small loss of density.
• Given a semidense map, superpixels can be initialized via SVD more accurately and at a lower cost.
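A dependency-free sketch of fitting a plane to the semidense 3D points that fall inside one superpixel. The slides use SVD; this example swaps in an ordinary least-squares fit of z = a·x + b·y + c solved with Cramer's rule, which is enough to illustrate the idea but fails for planes parallel to the z axis.

```python
def fit_plane(points):
    """Least-squares fit of z = a*x + b*y + c to 3D points (normal equations)."""
    sxx = sum(x * x for x, _, _ in points)
    sxy = sum(x * y for x, y, _ in points)
    syy = sum(y * y for _, y, _ in points)
    sx = sum(x for x, _, _ in points)
    sy = sum(y for _, y, _ in points)
    sxz = sum(x * z for x, _, z in points)
    syz = sum(y * z for _, y, z in points)
    sz = sum(z for _, _, z in points)
    n = float(len(points))

    def det3(m):
        return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
                - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
                + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

    A = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, n]]
    rhs = [sxz, syz, sz]
    d = det3(A)
    solution = []
    for col in range(3):
        Ai = [row[:] for row in A]
        for r in range(3):
            Ai[r][col] = rhs[r]  # Cramer's rule: replace one column by the rhs
        solution.append(det3(Ai) / d)
    return tuple(solution)

# Hypothetical semidense points lying on z = 0.5x - 0.25y + 3.
points = [(x, y, 0.5 * x - 0.25 * y + 3.0) for x in range(4) for y in range(4)]
a, b, c = fit_plane(points)
```

Because the plane is fitted directly to already triangulated points, no hypothesis sampling is needed, which is where the lower cost comes from.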

Alejo Concha, Javier Civera, DPPTAM: Dense Piecewise Planar Tracking and Mapping from a Monocular Sequence, IROS 2015. Code to be released soon! https://github.com/alejocb/dpptam

Semidense mapping + superpixels

• The SVD superpixels are more accurate than the triangulated ones.

• The SVD superpixels are as accurate as the semidense map.

• Large errors in dense reconstructions!!

• Superpixels improve the error of dense reconstructions.

• A reasonable solution is to filter out low parallax points.

[3] is Alejo Concha and Javier Civera. Using Superpixels in Monocular SLAM. ICRA 2014.
(ours) is Alejo Concha, Javier Civera, DPPTAM: Dense Piecewise Planar Tracking and Mapping from a Monocular Sequence, IROS 2015.

Monocular – Inertial Dense SLAM

• Integrating the inertial measurements gives the real scale of the reconstruction.
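The slides do not say how the scale is recovered. As a generic illustration only, assuming we already have matching displacement lengths from the up-to-scale monocular trajectory and from integrated inertial data, a single scale factor can be estimated in closed form by least squares.

```python
def metric_scale(visual_steps, metric_steps):
    """Least-squares s minimising sum((s * v - m)^2) over displacement lengths."""
    num = sum(v * m for v, m in zip(visual_steps, metric_steps))
    den = sum(v * v for v in visual_steps)
    return num / den

visual = [0.5, 1.0, 1.5]  # up-to-scale displacement lengths from monocular SLAM
metric = [1.0, 2.0, 3.0]  # hypothetical metric lengths from integrated inertial data
scale = metric_scale(visual, metric)
```

Multiplying the monocular map and trajectory by `scale` expresses them in metres.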

ICRA 2016 submission!

Now, how useful is this dense map for a robot?

Good enough for navigation

Not enough for high-level tasks. E.g., “bring me a book from Henry’s table”

We are more resilient to low texture, but we still need parallax…

Data-driven primitives (mid-level)

David F. Fouhey, Abhinav Gupta, and Martial Hebert. Data-driven 3D primitives for single image understanding. ICCV, 2013.

Feature discovery on RGB-D training data.

Extracts patterns that are consistent in D and discriminative in RGB

At test time, from a single RGB view we can predict mid-level depth patterns.

Multiview Layout (high-level)

(a) Sparse/semidense reconstruction.
(b) Plane normals from 3D vanishing points (image VP, backprojection, 3D clustering).
(c) Plane distances from a sparse/semidense multiview reconstruction.
(d) Superpixel segmentation, geometric and photometric feature extraction.
(e), (f) Classification (AdaBoost).
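The backprojection in step (b) can be sketched as follows, assuming a pinhole camera with focal length f in pixels and principal point (c_x, c_y); these parameter names are illustrative. An image vanishing point backprojects to the 3D direction of the parallel lines that meet there, and under a Manhattan-world assumption that direction is also a candidate plane normal.

```python
import math

def backproject_vanishing_point(u, v, f, c_x, c_y):
    """Unit 3D direction (camera frame) of the lines meeting at image VP (u, v)."""
    d = ((u - c_x) / f, (v - c_y) / f, 1.0)
    norm = math.sqrt(sum(x * x for x in d))
    return tuple(x / norm for x in d)

direction = backproject_vanishing_point(u=820.0, v=240.0, f=500.0, c_x=320.0, c_y=240.0)
```

In this example the vanishing point lies one focal length to the right of the principal point, so the direction is at 45 degrees to the optical axis. Directions obtained from several frames can then be clustered in 3D.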

Superpixels and Layout

Superpixels, Data-Driven Primitives and Layout

Alejo Concha, Wajahat Hussain, Luis Montano and Javier Civera, Incorporating Scene Priors to Dense Monocular Mapping, Autonomous Robots 2015.

• NYU dataset, high-parallax sequences

Superpixels, Data-Driven Primitives and Layout

Alejo Concha, Wajahat Hussain, Luis Montano and Javier Civera, Incorporating Scene Priors to Dense Monocular Mapping, Autonomous Robots 2015.

• NYU dataset, low-parallax sequences

The layout can prevent tracking loss!

Marta Salas, Wajahat Hussain, Alejo Concha, Luis Montano, Javier Civera, J. M. M. Montiel, Layout Aware Visual Tracking and Mapping, IROS 2015.

Object features (high-level)

Conclusions: vSLAM features and performance

Point-based features (low-level)

High accuracy if high texture and high parallax.

Superpixels (mid-level)

High accuracy if low texture and high parallax.

Data-driven primitives (mid-level)

Decent accuracy even for low texture and low parallax.

The patterns should be discovered in the training data.

Layout (high-level)

Decent accuracy even for low texture and low parallax.

The layout patterns should appear in the image.

Objects (high-level)

High accuracy for object instances, decent accuracy for object categories.

The object should appear in the image.

Acknowledgments

J. M. M. Montiel, Andrew J. Davison, Alejo Concha, Wajahat Hussain, L. Montano, L. Montesano, J. Sola, T. Vidal-Calleja, A. C. Murillo, O. G. Grasa, D. R. Bueno, A. Agudo, D. Galvez-Lopez, L. Riazuelo, Taihú Pire, Jorge Romeo, J. D. Tardos, J. Neira, J. A. Castellanos, Marta Salas, A. Argiles, Chema Fácil, Jesús Oliva, Vittorio Ferrari, Alessandro Prest, Christian Leistner, Cordelia Schmid, Ian Reid, Brian Williams, Margarita Chli, Paulo Drews Jr, Mario Campos, Martial Hebert, Javier Mínguez, María López, Roboearth Consortium (TU/e, Philips, Universität Stuttgart, ETHZ, TUM), IGLU consortium (Univ. Montreal, Inria Bordeaux, Univ. Mons, KTH, Univ. Lille)…

Funding: CICYT DPI2003-07986, DPI2006-13578, DPI2009-07130, DPI2012-32168, PCIN-2015-122, EU RAWSEEDS project FP6-045144, EU RoboEarth project FP7-248942, DGA-CAI IT12-06, DGA-CAI IT 26/10, SNSF IZK0Z2-136096.

Thank you!

Javier Civera (+34) 876 55 55 54 jcivera@unizar.es

https://plus.google.com/+JavierCivera http://www.youtube.com/user/jciveravision

https://twitter.com/jcivera http://www.linkedin.com/in/jcivera http://webdiis.unizar.es/~jcivera/
