Copyright © 2015 G.D. Hager
Bridging the Robot Perception Gap With Mid-Level Vision
Chi Li, Jonathan Bohren, Gregory D. Hager
Laboratory for Computational Sensing and Robotics
The Johns Hopkins University
State of the Art Vision
Convolutional architectures such as deep CNNs are designed for object classification in natural images:
- Large number of classes
- Large variations in scale, appearance, and background
But …
- Minor occlusions
- Sensitive to 3D rotations [1]
- Detection, but not pose
[1] Li, C., Reiter, A., Hager, G.D.: Beyond Spatial Pooling: Fine-Grained Representation Learning in Multiple Domains. In: CVPR, 2015.
Vision Meets Manipulation
- Textureless objects
- Objects in contact
- Need accurate pose
But …
- Small number of classes (often task-directed)
- Consistent environment
Our Prior Work on RGB-D Instance Recognition
[1] Li, C., Reiter, A., Hager, G.D.: Beyond Spatial Pooling: Fine-Grained Representation Learning in Multiple Domains. In: CVPR, 2015.
UW-RGBD Dataset
JHU Tools Dataset
This Paper: Making it Work for Robots
Semantic Segmentation 1: Extract local features (e.g., CSHOT) and encode them using a learned dictionary. Feature codes are pooled in color domains to form higher-level representations.
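The encode-then-pool step can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the slides do not say which encoder or pooling operator is used, so soft assignment, sum pooling, and equal-width hue bins are all assumptions here, and the random arrays stand in for real CSHOT descriptors, a learned dictionary, and per-point hue values. The function names `encode_soft` and `pool_in_color_domain` are hypothetical.

```python
import numpy as np

def encode_soft(features, dictionary, beta=1.0):
    """Soft-assign each local feature to the dictionary atoms (rows sum to 1)."""
    # Squared Euclidean distance between every feature and every atom: (N, K)
    d2 = ((features[:, None, :] - dictionary[None, :, :]) ** 2).sum(axis=2)
    # Subtract the row minimum before exponentiating for numerical stability
    w = np.exp(-beta * (d2 - d2.min(axis=1, keepdims=True)))
    return w / w.sum(axis=1, keepdims=True)

def pool_in_color_domain(codes, hue, n_bins=4):
    """Sum-pool codes over points whose hue falls in the same bin (assumed scheme)."""
    pooled = np.zeros((n_bins, codes.shape[1]))
    # Map hue in [0, 1) to a bin index in {0, ..., n_bins - 1}
    bins = np.minimum((hue * n_bins).astype(int), n_bins - 1)
    for b in range(n_bins):
        pooled[b] = codes[bins == b].sum(axis=0)
    return pooled.ravel()  # concatenated descriptor of length n_bins * K

rng = np.random.default_rng(0)
features = rng.normal(size=(200, 16))    # stand-in for local descriptors
dictionary = rng.normal(size=(32, 16))   # stand-in for a learned dictionary
hue = rng.uniform(size=200)              # stand-in for per-point hue in [0, 1)
descriptor = pool_in_color_domain(encode_soft(features, dictionary), hue)
```

Pooling by color bin rather than by image position means two points with similar color contribute to the same part of the descriptor wherever they lie on the object, which is one plausible reason for pooling in a color domain when objects appear under arbitrary 3D rotations.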
Semantic Segmentation 2: Build an integral image for each pooling region and compute pooled features of each sliding window for subsequent semantic classification.
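A small sketch of why integral images make sliding-window pooling cheap: once the cumulative sums are built, the pooled feature of any window costs four lookups per channel, independent of window size. The sketch assumes sum pooling over a dense H x W x K map of feature codes for a single pooling region; the map layout, the window size, and the function names are illustrative assumptions, not details from the slides.

```python
import numpy as np

def integral_image(code_map):
    """Cumulative sums with a zero border, so ii[r, c] = sum of code_map[:r, :c]."""
    h, w = code_map.shape[:2]
    ii = np.zeros((h + 1, w + 1) + code_map.shape[2:])
    ii[1:, 1:] = code_map.cumsum(axis=0).cumsum(axis=1)
    return ii

def window_sums(ii, win):
    """Sum-pooled feature of every win x win window at stride 1."""
    return ii[win:, win:] - ii[:-win, win:] - ii[win:, :-win] + ii[:-win, :-win]

rng = np.random.default_rng(0)
codes = rng.random((48, 64, 8))    # H x W x K code map for one pooling region (synthetic)
ii = integral_image(codes)
pooled = window_sums(ii, win=16)   # one K-dimensional pooled feature per window
```

Each entry `pooled[r, c]` equals `codes[r:r+16, c:c+16].sum(axis=(0, 1))`, and it is this per-window vector that would be passed to the semantic classifier.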
Filter on Pose in Each Class
Papazov, C., Burschka, D.: An Efficient RANSAC for 3D Object Recognition in Noisy and Occluded Scenes. In: ACCV, 2010.
Testing Methodology
- Developed the LN-66 dataset, which contains 66 scenes with various complex configurations of the textureless “link” and “node” objects.
- 614 testing frames in total.
- Background is subtracted in each frame by plane removal and pass-through filtering.
- Object model (SVM) is trained over the corresponding partial views in JHUIT-50.
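The background-subtraction step in the list above (pass-through filtering plus plane removal) might look roughly like the following. The slides do not say how it was implemented, so this is a plain NumPy stand-in under stated assumptions: a depth range along the camera z axis for the pass-through filter, a basic RANSAC plane fit for the tabletop, and hypothetical thresholds and function names throughout. The scene is synthetic.

```python
import numpy as np

def pass_through(points, axis=2, lo=0.3, hi=1.5):
    """Keep points whose coordinate along `axis` lies in [lo, hi]."""
    keep = (points[:, axis] >= lo) & (points[:, axis] <= hi)
    return points[keep]

def remove_dominant_plane(points, dist_thresh=0.01, iters=200, seed=0):
    """Fit the dominant plane with RANSAC and drop its inliers."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(points), dtype=bool)
    for _ in range(iters):
        p0, p1, p2 = points[rng.choice(len(points), size=3, replace=False)]
        normal = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(normal)
        if norm < 1e-12:          # degenerate (collinear) sample
            continue
        normal /= norm
        # Point-to-plane distance for every point in the cloud
        inliers = np.abs((points - p0) @ normal) < dist_thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return points[~best_inliers]

rng = np.random.default_rng(1)
# Synthetic scene: tabletop at z = 1.0, an object 5-15 cm in front of it, a far wall
table = np.column_stack([rng.uniform(-0.5, 0.5, 2000),
                         rng.uniform(-0.5, 0.5, 2000),
                         np.full(2000, 1.0)])
obj = rng.uniform([-0.05, -0.05, 0.85], [0.05, 0.05, 0.95], size=(300, 3))
wall = np.column_stack([rng.uniform(-1, 1, 500),
                        rng.uniform(-1, 1, 500),
                        np.full(500, 3.0)])
scene = np.vstack([table, obj, wall])
foreground = remove_dominant_plane(pass_through(scene))
```

In this toy scene the pass-through filter discards the far wall and the plane fit discards the tabletop, leaving only the 300 object points for segmentation and pose estimation.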
Qualitative Examples
Effect of Segmentation
Quantitative Analysis
NS: no segmentation; S: our segmentation; GS: ground-truth segmentation; B: standard ObjRecRANSAC; GB: greedy-batch variant; GO: greedy-one variant
[Figure panels: No segmentation; Our segmentation; Ground-truth segmentation]
Running Time
- Semantic segmentation runs in about 1 s
- CPU-based ObjRecRANSAC takes 1 to 10 s, whereas the CUDA-based implementation of ObjRecRANSAC runs at 4 to 5 Hz (roughly 0.2 to 0.25 s per frame)
- Full CUDA implementation would be << 1 s
Current Progress & Future Work
- Background/Foreground classification from training data
- More advanced features to distinguish objects with similar appearances
- A more effective method to collect sub-global patterns (supervoxels and their higher order sets)
- CUDA-based implementation
Questions?
This work is supported by the National Science Foundation under Grant No. NRI-1227277. Bohren is supported by a NASA graduate fellowship.