MONOCULAR SLAM SUPPORTED OBJECT RECOGNITION

John J. Leonard

Sudeep Pillai

Marine Robotics Group, CSAIL, MIT

• Object recognition for a robot is still a challenge

[Figure: Reconstruction with Object Labels. Labeled objects: Cap, Coffee Mug, Soda Can, Bowl, Bowl.]

• Multi-view Object Detection: Objects easily tease apart to enable better proposals (Proposals from Semi-Dense Maps)

• Robust: Reduced false positives via view correspondence from SLAM (Multi-view prediction)

• Scalable: ~1.6-1.7 s for 5-50 object categories (FLAIR encoding)

• Single RGB Camera: Monocular SLAM supports improved recognition (Semi-Dense Mapping Backend)

UW-RGBD Dataset (v2), Lai et al. (ICRA 2014)

[Figure: Pipeline overview, panels (a)-(d), shown across multiple views. Panel titles: Input RGB Video; Scale-ambiguous Reconstruction; Filtered Reconstruction; Density-based Spatial Clustering; Multi-view object proposals.]

• ORB-SLAM (Mur-Artal et al.) + Semi-Dense Depth Filtering (SVO: Forster et al., ICRA 2014) → Scale-ambiguous Reconstruction

• Low-density region pruning → Filtered Reconstruction

• Candidate proposals projection → View-Consistent Proposals

• Fusion of object proposal classification → Unified Object Prediction
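The candidate proposals projection step can be illustrated with a standard pinhole camera model. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes each keyframe pose is given as a world-to-camera rotation `R` and translation `t`, that the intrinsics `fx, fy, cx, cy` are known, and that a cluster is simply a list of 3D points. The function names are hypothetical. Because the points and the poses come from the same reconstruction, the unknown global scale cancels in the projection.

```python
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


def project_point(p_world: Vec3, R: Sequence[Sequence[float]], t: Vec3,
                  fx: float, fy: float, cx: float, cy: float
                  ) -> Optional[Tuple[float, float]]:
    """Project a world point into a view; None if it lies behind the camera.

    Assumed convention: x_cam = R * x_world + t.
    """
    xc = [sum(R[i][j] * p_world[j] for j in range(3)) + t[i] for i in range(3)]
    if xc[2] <= 0.0:
        return None
    return (fx * xc[0] / xc[2] + cx, fy * xc[1] / xc[2] + cy)


def cluster_to_box(cluster: List[Vec3], R: Sequence[Sequence[float]], t: Vec3,
                   fx: float, fy: float, cx: float, cy: float) -> Optional[Box]:
    """Tight 2D box around the visible projections of one 3D cluster."""
    pixels = [project_point(p, R, t, fx, fy, cx, cy) for p in cluster]
    pixels = [p for p in pixels if p is not None]
    if not pixels:
        return None  # cluster not visible in this view
    us = [u for u, _ in pixels]
    vs = [v for _, v in pixels]
    return (min(us), min(vs), max(us), max(vs))


# Example with an identity pose and made-up intrinsics.
IDENTITY = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
example_cluster = [(0.5, 0.25, 2.0), (-0.5, -0.25, 2.0), (0.0, 0.0, 4.0)]
example_box = cluster_to_box(example_cluster, IDENTITY, (0.0, 0.0, 0.0),
                             100.0, 100.0, 50.0, 40.0)
```

Calling `cluster_to_box` once per keyframe with that keyframe's pose yields one box per view for the same 3D cluster, which is what makes the resulting proposals consistent across views.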

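[Figure: Runtime (s), lower is better. Bar labels: Ours, Ours, DetOnly, DetOnly, HMP2D+3D. (#) - Number of categories identifiable: (5), (51), (5), (51), (9). Value labels (extraction order; pairing with bars not preserved): >4 s, 5 s, 1.8 s, 1.7 s, 1.6 s.]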

MULTI-VIEW OBJECT PROPOSALS

[Figure: Recall vs. IoU. IoU - Intersection over Union; (#) - Number of object proposals.]

• Consistent object hypotheses supported by newer semi-dense map representations

• Achieves similar Recall-vs-IoU with fewer object proposals
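The Recall-vs-IoU measure used to compare proposal methods can be made concrete with a short sketch. It assumes axis-aligned boxes in `(x_min, y_min, x_max, y_max)` form and counts a ground-truth object as recalled when at least one proposal overlaps it with IoU at or above a threshold. The function names are illustrative and are not taken from the poster.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def area(b: Box) -> float:
    """Area of a box; degenerate boxes have zero area."""
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = area(a) + area(b) - inter
    return inter / union if union > 0.0 else 0.0


def recall_at_iou(gt_boxes: List[Box], proposals: List[Box], thresh: float) -> float:
    """Fraction of ground-truth boxes matched by some proposal at IoU >= thresh."""
    if not gt_boxes:
        return 0.0
    hits = sum(1 for g in gt_boxes if any(iou(g, p) >= thresh for p in proposals))
    return hits / len(gt_boxes)


def recall_curve(gt_boxes: List[Box], proposals: List[Box],
                 thresholds: List[float]) -> List[float]:
    """Recall evaluated at each IoU threshold (one point per threshold)."""
    return [recall_at_iou(gt_boxes, proposals, t) for t in thresholds]
```

For example, `iou((0, 0, 2, 2), (1, 1, 3, 3))` is 1/7: the boxes share a 1x1 region and together cover 7 units of area. Sweeping `thresholds` from 0.5 to 1.0 gives one curve per proposal method, and a method that reaches the same curve with a shorter `proposals` list is the more efficient one.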

RUNTIME SCALABILITY

• Category-agnostic VLAD + FLAIR encoding

• Run-time nearly independent of the number of identifiable object categories

SLAM-SUPPORTED VS. FRAME-BASED RECOGNITION

• VLAD + FLAIR encoding: Run-time nearly independent of the number of identifiable object categories

For more details and videos, please visit our website: http://people.csail.mit.edu/spillai/projects/vslam-object-recognition

[Figure: mAP, higher is better. Ours (RGB): 81.5; DetOnly (RGB): 61.7.]

• RGB: Superior to frame-based methods

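SLAM-SUPPORTED

[Figure: mAP, higher is better. Methods: Ours (RGB), Det3DMRF (RGB-D), HMP2D+3D (RGB-D). Value labels (extraction order; pairing with bars not preserved): 90.9, 91.0, 89.8.]

• RGB-D: Comparable to state-of-the-art SLAM-based recognition methods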
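The difference between frame-based and SLAM-supported recognition comes down to fusing the classification of the same object across views. The poster does not state the fusion rule, so the sketch below assumes one common choice: average the per-view log-probabilities for each label and renormalize. The function name and the example probabilities are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def fuse_views(per_view_probs: List[Dict[str, float]]) -> Tuple[str, Dict[str, float]]:
    """Fuse per-view class probabilities for one object hypothesis.

    Assumed rule: mean of log-probabilities per label, then renormalize.
    Labels missing from a view are treated as having a tiny probability.
    """
    if not per_view_probs:
        raise ValueError("need at least one view")
    eps = 1e-9  # floor to avoid log(0)
    labels = sorted(set().union(*per_view_probs))
    n_views = len(per_view_probs)
    log_scores = {
        label: sum(math.log(max(view.get(label, 0.0), eps))
                   for view in per_view_probs) / n_views
        for label in labels
    }
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_scores.values())
    unnormalized = {label: math.exp(s - top) for label, s in log_scores.items()}
    total = sum(unnormalized.values())
    fused = {label: u / total for label, u in unnormalized.items()}
    best = max(fused, key=fused.get)
    return best, fused


# Three views of the same object; the third view alone would predict "cap".
views = [
    {"bowl": 0.8, "cap": 0.2},
    {"bowl": 0.7, "cap": 0.3},
    {"bowl": 0.4, "cap": 0.6},
]
fused_label, fused_probs = fuse_views(views)
```

In this example a frame-based detector looking only at the third view would report a cap, while the fused prediction follows the evidence from all three views. That is the mechanism by which view correspondence can reduce false positives.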

RESULTS

METHODOLOGY

Multi-scale Over-segmentation

• Density-based segmentation over 4 spatial scales
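Density-based segmentation over several spatial scales can be sketched with a small DBSCAN-style clustering run at four neighborhood radii. This is an illustrative stand-in, not the authors' code: the choice of DBSCAN, the radii in `scales`, and the `min_pts` value are all assumptions, and the brute-force neighbor search is only suitable for small point sets. Points that never reach the density threshold are labeled as noise, which also mirrors the idea of pruning low-density regions.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]


def region_query(points: Sequence[Point], i: int, eps: float) -> List[int]:
    """Indices of all points within distance eps of points[i] (brute force)."""
    eps2 = eps * eps
    pi = points[i]
    return [j for j, pj in enumerate(points)
            if sum((a - b) ** 2 for a, b in zip(pi, pj)) <= eps2]


def dbscan(points: Sequence[Point], eps: float, min_pts: int) -> List[int]:
    """One label per point: cluster id >= 0, or -1 for low-density noise."""
    unvisited, noise = -2, -1
    labels = [unvisited] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] != unvisited:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = noise
            continue
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == noise:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] != unvisited:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point: keep growing
        cluster += 1
    return labels


def multiscale_clusters(points: Sequence[Point], scales: Sequence[float],
                        min_pts: int) -> Dict[float, List[List[int]]]:
    """Cluster the same points at each scale; returns point indices per cluster."""
    result: Dict[float, List[List[int]]] = {}
    for eps in scales:
        groups: Dict[int, List[int]] = {}
        for idx, label in enumerate(dbscan(points, eps, min_pts)):
            if label >= 0:
                groups.setdefault(label, []).append(idx)
        result[eps] = list(groups.values())
    return result


# Two small blobs plus one isolated point, clustered at four assumed scales.
blob_a = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.1, 0.0), (0.1, 0.1, 0.0)]
blob_b = [(1.0, 0.0, 0.0), (1.1, 0.0, 0.0), (1.0, 0.1, 0.0), (1.1, 0.1, 0.0)]
cloud = blob_a + blob_b + [(5.0, 5.0, 5.0)]
segments = multiscale_clusters(cloud, scales=(0.2, 0.4, 0.8, 1.6), min_pts=3)
```

At the three smaller radii the two blobs stay separate, and at the largest radius they merge into a single cluster. Keeping the clusters from every scale produces an over-segmentation in which both the fine and the coarse groupings are available as candidate objects.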

CONTRIBUTIONS

SHIFT IN OBJECT DETECTION

Object Proposal Methods: Objectness, Selective Search, Randomized Prim, BING

• State-of-the-art object detection schemes utilize object proposal methods to localize objects

EFFICIENT FEATURE ENCODING

Fisher and VLAD with FLAIR, van de Sande et al. (CVPR 2014)

• Efficient box encoding provides object localization capabilities to existing BoVW methods
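The idea behind efficient box encoding can be sketched with a summed-area table over per-cell VLAD contributions, so that the encoding of any box costs four lookups per dimension regardless of box size. This is a simplified illustration in the spirit of FLAIR, not a reproduction of the method by van de Sande et al.: it assumes descriptors already sit on an integer grid, uses hard assignment to the nearest codeword, and applies only L2 normalization. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Keypoint = Tuple[int, int, Sequence[float]]  # (x cell, y cell, descriptor)


def nearest_word(desc: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Index of the closest codeword (squared Euclidean distance)."""
    dists = [sum((a - b) ** 2 for a, b in zip(desc, word)) for word in codebook]
    return dists.index(min(dists))


def vlad_integral(width: int, height: int, keypoints: Sequence[Keypoint],
                  codebook: Sequence[Sequence[float]]) -> List[List[List[float]]]:
    """Summed-area table of per-cell VLAD residuals, shape (H+1) x (W+1) x (K*D)."""
    k_words, dim = len(codebook), len(codebook[0])
    n = k_words * dim
    grid = [[[0.0] * n for _ in range(width)] for _ in range(height)]
    for x, y, desc in keypoints:
        k = nearest_word(desc, codebook)
        for d in range(dim):
            grid[y][x][k * dim + d] += desc[d] - codebook[k][d]  # residual
    table = [[[0.0] * n for _ in range(width + 1)] for _ in range(height + 1)]
    for y in range(height):
        for x in range(width):
            for i in range(n):
                table[y + 1][x + 1][i] = (grid[y][x][i] + table[y][x + 1][i]
                                          + table[y + 1][x][i] - table[y][x][i])
    return table


def box_vlad(table: List[List[List[float]]],
             x0: int, y0: int, x1: int, y1: int) -> List[float]:
    """L2-normalized VLAD of cells with x0 <= x < x1 and y0 <= y < y1."""
    n = len(table[0][0])
    v = [table[y1][x1][i] - table[y0][x1][i] - table[y1][x0][i] + table[y0][x0][i]
         for i in range(n)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v] if norm > 0.0 else v


# Toy example: 2 codewords in 2-D, 4 keypoints on a 4x4 grid.
toy_codebook = [(0.0, 0.0), (10.0, 10.0)]
toy_keypoints = [(0, 0, (1.0, 0.0)), (1, 0, (0.0, 2.0)),
                 (3, 2, (9.0, 10.0)), (3, 3, (10.0, 12.0))]
toy_table = vlad_integral(4, 4, toy_keypoints, toy_codebook)
top_left = box_vlad(toy_table, 0, 0, 2, 1)
```

The table is built once per image and does not depend on any object category, so every proposal box can be encoded cheaply and then scored by however many classifiers are needed.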

NEWER SLAM CAPABILITIES

LSD-SLAM: Large-Scale Direct Monocular SLAM, Engel et al. (ECCV 2014)

[Figure: LSD-SLAM title slide (Jakob Engel, Thomas Schöps, Daniel Cremers; Computer Vision Group, Technical University of Munich). Labels: Monocular Video; Camera Motion and Scene Geometry.]

SLAM++, Salas-Moreno et al. (CVPR 2013)

Detection-based Object Labeling in 3D Scenes, Lai et al. (ICRA 2012)

• Semi-dense reconstructions could potentially propose objects

• Object Detection / Recognition can be better informed with SLAM

PRIOR WORK

MOTIVATION

• Provide a fast and scalable solution for training and running object detectors

• Leverage inherent SLAM capabilities to better support the object detection and recognition task

[Diagram: SLAM, Object Detection (Proposals), and Object Recognition (Classification). Link labels: Robust View Correspondence; Proposals from Semi-dense maps; Efficient and Scalable Classification.]