Learning Methods for Generic Object Recognition with Invariance to Pose and Lighting
by Yann LeCun, Fu Jie Huang, and Léon Bottou
in Proceedings of CVPR'04, 2004
Presentation by Hasan Doğu TAŞKIRAN
CS 550 – Machine Learning
Department of Computer Engineering
Bilkent University
April 21, 2005
Outline
About the paper…
Recognition of Generic Object Categories
The NORB Dataset
Experiments and Results
  Principal Component Analysis
  K-Nearest Neighbors
  Pairwise Support Vector Machines
  Convolutional Networks
Conclusion and Future Work
The paper is about…
Describing the largest publicly available dataset
Reporting baseline performance with standard methods on this dataset
Exploring how methods fare when the number of input variables is huge:
  the performance of methods based on global template matching
  the performance when the size of the problem is at the upper limit of their applicability
Learning invariance to 3D pose, lighting conditions, and other image variabilities
Taking advantage of binocular inputs
Recognition of Generic Object Categories
The recognition of generic object categories with invariance to pose, lighting, diverse backgrounds, and the presence of clutter is one of the major challenges of Computer Vision.
A variety of cues have been used previously: color and texture, distinctive local features, separately acquired 3D models, silhouettes and edges, and pose-invariant feature histograms
What about shape information?
Using Shape Information
Recognizing generic categories such as cars, trucks, airplanes, human figures, or four-legged animals purely from shape information is a difficult problem
A further difficulty is the lack of a dataset of sufficient size and diversity to carry out meaningful experiments
The NORB Dataset
The only useful and reliable cue in the dataset is the shape of the object
NORB is considerably larger than previous datasets, and it offers: more variability, stereo pairs, and the ability to composite the objects and their cast shadows onto diverse backgrounds
Images of 50 toys were collected with the acquisition equipment described in the paper
The NORB Dataset
The collection consists of 10 instances of each of 5 generic categories: four-legged animals, human figures, airplanes, trucks, and cars
All objects are painted a uniform green to eliminate irrelevant color and texture cues
Each object instance was placed in a different initial pose
1,944 stereo pairs were collected for each instance: 9 elevations, 36 azimuths, and 6 lighting conditions
A total of 194,400 RGB images with a resolution of 640x480 were collected (5 categories, 10 instances, 9 elevations, 36 azimuths, 6 lightings, and 2 cameras)
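The image counts on this slide can be checked with a short worked calculation. It uses only the numbers stated on the slide.

$$
\begin{aligned}
\text{stereo pairs per instance} &= 9 \times 36 \times 6 = 1944 \\
\text{stereo pairs in total} &= 5 \times 10 \times 1944 = 97{,}200 \\
\text{images in total (2 cameras)} &= 97{,}200 \times 2 = 194{,}400
\end{aligned}
$$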
The NORB Dataset
Experiments were conducted with 4 datasets generated from the normalized object images: the Normalized-Uniform, Jittered-Uniform, Jittered-Textured, and Jittered-Cluttered sets
In each dataset, 5 instances of each category are used for training and the other 5 for testing
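The important property of this protocol is that the split is made by object instance, not by image. Every test image therefore shows a toy the classifier has never seen.

Below is a minimal sketch under some assumptions:
- Each image is indexed by a hypothetical record tuple `(category, instance, image_id)`.
- The choice of which five instances go to training is arbitrary. The slides do not say which ones were used.

```python
from typing import List, Set, Tuple

# A record is (category, instance, image_id); this layout is hypothetical.
Record = Tuple[int, int, int]

def split_by_instance(records: List[Record],
                      train_instances: Set[int]) -> Tuple[List[Record], List[Record]]:
    """Send every image of a training instance to the training set and every
    image of the remaining instances to the test set."""
    train = [r for r in records if r[1] in train_instances]
    test = [r for r in records if r[1] not in train_instances]
    return train, test

# Toy index: 5 categories x 10 instances x 3 images per instance.
records = [(c, i, k) for c in range(5) for i in range(10) for k in range(3)]
# Hypothetical choice of training instances; any 5 of the 10 would do.
train, test = split_by_instance(records, train_instances={0, 1, 2, 3, 4})

# No (category, instance) object appears on both sides of the split.
assert {(c, i) for c, i, _ in train}.isdisjoint({(c, i) for c, i, _ in test})
print(len(train), len(test))  # 75 75
```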
The NORB Dataset
Experiments
On raw image pairs: linear classifier, K-nearest neighbors, pairwise Support Vector Machines with Gaussian kernels, and convolutional networks
On PCA coefficients: K-nearest neighbors and pairwise Support Vector Machines with Gaussian kernels
The Lush environment and the Torch library were used
Experiments – PCA
Computing and diagonalizing the full 18,432 x 18,432 covariance matrix is impractical, so a cheaper iterative method is used
Find the principal direction of a centered cloud of points by finding two cluster centroids that are symmetric with respect to the origin, i.e., find the u that minimizes
$$\sum_i \min\left[\lVert x_i - u \rVert^2,\ \lVert x_i + u \rVert^2\right]$$
This yields the first 100 principal components in a few CPU hours
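The objective $\sum_i \min\left[\lVert x_i - u \rVert^2, \lVert x_i + u \rVert^2\right]$ can be read as a 2-means problem whose two centroids are tied to be $u$ and $-u$. Below is a minimal pure-Python sketch, assuming the data are already centered. It alternates between two steps:
- Assign each point to the nearer of $\pm u$, which is decided by the sign of $x_i \cdot u$.
- Re-estimate $u$ as the mean of the sign-corrected points.

For a fixed direction, the optimal length of $u$ turns the objective into maximizing $\sum_i \lvert x_i \cdot u/\lVert u \rVert \rvert$. The result therefore approximates, rather than always exactly equals, the leading principal direction.

Deflation is one common way to obtain further components: subtract each point's projection on $u$, then search again. The slides do not say how the first 100 components were obtained.

```python
import math
from typing import List, Sequence

Vector = List[float]

def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def symmetric_centroid_direction(points: List[Vector], iters: int = 50) -> Vector:
    """Minimize sum_i min(|x_i - u|^2, |x_i + u|^2) over u by alternating
    assignment and averaging. Assumes `points` is centered (zero mean)."""
    # Start from the point with the largest norm (arbitrary but deterministic).
    u = list(max(points, key=lambda p: dot(p, p)))
    for _ in range(iters):
        # Assignment: x_i is at least as close to +u as to -u exactly when x_i . u >= 0.
        signs = [1.0 if dot(p, u) >= 0 else -1.0 for p in points]
        # Update: u becomes the mean of the sign-corrected points.
        new_u = [sum(s * p[j] for s, p in zip(signs, points)) / len(points)
                 for j in range(len(u))]
        if new_u == u:  # assignments stopped changing, so u is a fixed point
            break
        u = new_u
    return u

def unit(v: Vector) -> Vector:
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]

# Centered toy cloud elongated along the first axis.
pts = [[2.0, 0.1], [-2.0, -0.1], [1.0, 0.0], [-1.0, 0.0], [3.0, -0.1], [-3.0, 0.1]]
print(unit(symmetric_centroid_direction(pts)))  # [1.0, 0.0]
```

Each iteration costs one pass over the data. This is why the approach scales to inputs whose covariance matrix would be too large to form.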
Experiments – K-Nearest Neighbors
Running K-NN directly on 24,300 reference images of dimension 18,432 is prohibitively expensive
Pre-compute the distances from a few representative images A_k to all other reference images X_i
The distance from a test image X to each reference image X_i is then bounded below by:
$$d(X, X_i) \ge \max_k \left| d(X, A_k) - d(A_k, X_i) \right|$$
This bound can be used to choose which distances should be computed first
Experiments were run with K up to 18, but the best results were obtained with K = 1
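The lower bound $d(X, X_i) \ge \max_k \lvert d(X, A_k) - d(A_k, X_i) \rvert$ follows from the triangle inequality. Any reference image whose bound already exceeds the best distance found so far can therefore be skipped without computing its exact distance.

Below is a minimal 1-NN sketch in pure Python, assuming:
- Euclidean distance is used.
- A small hypothetical set of pivot vectors stands in for the representatives $A_k$. The slides do not say how the representatives were chosen.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]

def build_pivot_table(refs: List[Vector], pivots: List[Vector]) -> List[List[float]]:
    """Precompute d(A_k, X_i) for every pivot A_k and reference X_i (done once, offline)."""
    return [[math.dist(a, x) for x in refs] for a in pivots]

def nearest_neighbor(query: Vector, refs: List[Vector], pivots: List[Vector],
                     table: List[List[float]]) -> Tuple[int, int]:
    """Return (index of nearest reference, number of query-to-reference distances computed)."""
    d_qa = [math.dist(query, a) for a in pivots]  # one exact distance per pivot
    # Lower bound on d(query, X_i): max_k |d(query, A_k) - d(A_k, X_i)|.
    bounds = [max(abs(d_qa[k] - table[k][i]) for k in range(len(pivots)))
              for i in range(len(refs))]
    best_i, best_d, computed = -1, math.inf, 0
    # Visit references in order of increasing lower bound.
    for i in sorted(range(len(refs)), key=bounds.__getitem__):
        if bounds[i] >= best_d:
            break  # every remaining bound is at least this large: none can be closer
        d = math.dist(query, refs[i])
        computed += 1
        if d < best_d:
            best_i, best_d = i, d
    return best_i, computed

refs = [[float(i), 0.0] for i in range(10)]
pivots = [[0.0, 0.0], [9.0, 5.0]]  # hypothetical representatives
table = build_pivot_table(refs, pivots)
idx, n = nearest_neighbor([6.2, 0.1], refs, pivots, table)
print(idx, n)  # 6 1
```

In this toy case, only one of the ten exact distances is computed. The savings depend on how well the pivots spread over the data.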
Experiments – Pairwise SVM
Training did not converge in a manageable time on the normalized-uniform dataset, and SVMs were not trained on the jittered datasets
SVMs were therefore applied to sub-sampled and PCA-derived versions of the inputs
10 SVMs were trained independently, one for each pair of the 5 categories, and their outputs were combined by voting
The number of support vectors was between 800 and 2,000 for PCA-derived inputs
The number of support vectors was between 2,000 and 3,000 for 32x32 raw images
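With 5 categories there are C(5, 2) = 10 unordered pairs, which is where the 10 SVMs come from. The voting step is sketched below, with two assumptions:
- Each pairwise classifier is a hypothetical stand-in (nearest class mean), not a trained Gaussian-kernel SVM.
- Ties are broken toward the lower class index, an arbitrary rule not taken from the paper.

```python
from collections import Counter
from itertools import combinations
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]
# A pairwise classifier for classes (a, b) returns either a or b.
PairClassifier = Callable[[Vector], int]

def one_vs_one_predict(x: Vector, classifiers: Dict[Tuple[int, int], PairClassifier],
                       n_classes: int) -> int:
    """Each pairwise classifier casts one vote; return the class with the most votes.
    Ties go to the lowest class index (an assumed rule)."""
    votes = Counter(clf(x) for clf in classifiers.values())
    return max(range(n_classes), key=lambda c: (votes[c], -c))

def make_mean_classifier(a: int, b: int, means: List[Vector]) -> PairClassifier:
    """Hypothetical stand-in for a trained SVM: pick the closer of two class means."""
    def clf(x: Vector) -> int:
        da = sum((xi - mi) ** 2 for xi, mi in zip(x, means[a]))
        db = sum((xi - mi) ** 2 for xi, mi in zip(x, means[b]))
        return a if da <= db else b
    return clf

n_classes = 5
means = [[float(c), float(c)] for c in range(n_classes)]  # toy class centers
classifiers = {(a, b): make_mean_classifier(a, b, means)
               for a, b in combinations(range(n_classes), 2)}
print(len(classifiers))                                        # 10
print(one_vs_one_predict([2.9, 3.2], classifiers, n_classes))  # 3
```

Each binary problem only sees the training data of two categories. This keeps every individual SVM smaller than a single 5-way problem would be.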
Experiments – Convolutional Network
A succession of layers of trainable convolutions and spatial sub-sampling
Extracts features with: increasingly large receptive fields, increasing complexity, and increasing robustness to irrelevant variabilities
The network has 90,575 trainable parameters (a full propagation requires 3,896,920 multiply-adds)
Trained with the Levenberg-Marquardt algorithm, using a diagonal approximation of the Hessian, for 250,000 online updates
No over-training was observed, and no early stopping was used
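The standard size recurrences make concrete how alternating convolution and sub-sampling layers produce increasingly large receptive fields. Below is a minimal sketch. The layer list (5x5 convolutions and 2x2 sub-sampling on a 32x32 input) is hypothetical and is not the architecture of the network in the paper.

```python
from typing import List, Tuple

# Each layer is (name, kernel size, stride); square kernels, no padding.
Layer = Tuple[str, int, int]

def layer_geometry(input_size: int, layers: List[Layer]) -> List[Tuple[str, int, int]]:
    """Return (name, output map size, receptive field size) for each layer."""
    size, rf, jump = input_size, 1, 1  # jump: input pixels between adjacent units
    out = []
    for name, k, s in layers:
        size = (size - k) // s + 1   # valid convolution or non-overlapping sub-sampling
        rf = rf + (k - 1) * jump     # the field grows by (k - 1) steps of the current jump
        jump = jump * s              # sub-sampling spreads units further apart
        out.append((name, size, rf))
    return out

layers = [("C1", 5, 1), ("S2", 2, 2), ("C3", 5, 1), ("S4", 2, 2)]  # hypothetical
for name, size, rf in layer_geometry(32, layers):
    print(f"{name}: {size}x{size} maps, receptive field {rf}x{rf}")
```

The receptive field grows from 5x5 to 16x16 across the four layers while the maps shrink from 28x28 to 5x5. Each sub-sampling layer multiplies the growth rate of every later layer.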
Results
Results
Results
Discussion
These are the first systematic experiments that apply machine learning to shape-based generic object recognition with invariance to pose and lighting
The normalized-uniform dataset is unrealistically favorable to template-based methods because of its idealized imaging conditions
The jittered datasets were too large to carry out experiments with the template-based methods
The sheer size and complexity of the jittered datasets place them above the practical limits of template-based methods
The binocular convolutional network takes advantage of disparity information to locate the outline of the object
Conclusions
The system can spot and recognize animals, human figures, airplanes, cars, and trucks in natural scenes with high accuracy at several frames per second
By presenting the input image at multiple scales, the system can detect those objects over a wide range of scales
Popular template-based approaches, including SVMs, are of limited use for classification over very large datasets with complex variabilities
Convolutional Networks can be scanned over large images very efficiently
The NORB Dataset opens the door to large-scale experiments with learning-based approaches to invariant object recognition
Future work may use trainable classifiers that incorporate explicit models of image formation and geometry
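Scanning a convolutional network over a larger image amounts to evaluating the recognizer at every position on a grid. The grid step equals the network's total sub-sampling factor, and the computation for overlapping windows is shared. Repeating the scan on a pyramid of rescaled images covers a range of object sizes.

Below is a rough sketch of the bookkeeping. The window size, step, scale factor, and frame size are all hypothetical values, not figures from the paper.

```python
from typing import List, Tuple

def scan_plan(width: int, height: int, window: int, step: int,
              scale_factor: float) -> List[Tuple[float, int, int]]:
    """For each pyramid level, return (scale, output columns, output rows).
    Each output unit corresponds to one window position in the rescaled image."""
    if scale_factor <= 1.0:
        raise ValueError("scale_factor must be > 1 so the pyramid terminates")
    plan = []
    scale = 1.0
    while True:
        w, h = int(width * scale), int(height * scale)
        if w < window or h < window:
            break  # the rescaled image is now smaller than one network window
        cols = (w - window) // step + 1
        rows = (h - window) // step + 1
        plan.append((scale, cols, rows))
        scale /= scale_factor
    return plan

# Hypothetical numbers: 96-pixel window, step 12, 640x480 frame, levels 1.5x apart.
for scale, cols, rows in scan_plan(640, 480, window=96, step=12, scale_factor=1.5):
    print(f"scale {scale:.3f}: {cols} x {rows} = {cols * rows} window positions")
```

Coarser pyramid levels detect larger objects with a fixed-size window. They also cost far less than the full-resolution level, so adding levels is relatively cheap.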
Comments
The authors focused on their own problem rather than on the specific problems of the algorithms
The paper is well organized and clearly understandable
The description of the dataset preparation could be shortened
Previous work in the area, and its disadvantages, could be discussed in more detail
Questions?