Digest of Human Detection from CVPR2015
Jan. 27th, 2016, Daichi SUZUO
Features
1. Combination Features and Models for Human Detection - Y. Jiang et al.
2. Filtered Channel Features for Pedestrian Detection - S. Zhang et al.
Training
3. Learning Scene-Specific Pedestrian Detectors without Real Data - H. Hattori et al.
4. Taking a Deeper Look at Pedestrians - J. Hosang et al.
5. Pedestrian Detection aided by Deep Learning Semantic Tasks - Y. Tian et al.
Dataset / Benchmark
6. Multispectral Pedestrian Detection: Benchmark Dataset and Baseline - S. Hwang et al.
Fundamentals of Human Detection
• Machine-learning-based two-class (binary) classifier
• Sliding window search
Training: positive class / negative class → convert to image features → train classifier
Detection: crop → feature extraction → classifier → human? / not human?
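The detection half of this pipeline can be sketched as follows. This is a minimal illustration, not any particular paper's implementation: the window size, the stride, and the toy `extract_feature` / `classify` stand-ins are assumptions chosen only so that the example runs.

```python
import numpy as np

def sliding_windows(image, win_h, win_w, stride):
    """Yield (top, left, crop) for every window that fits inside the image."""
    height, width = image.shape[:2]
    for top in range(0, height - win_h + 1, stride):
        for left in range(0, width - win_w + 1, stride):
            yield top, left, image[top:top + win_h, left:left + win_w]

def detect(image, extract_feature, classify, win_h=128, win_w=64, stride=8):
    """Return the (top, left) corner of every window classified as human."""
    hits = []
    for top, left, crop in sliding_windows(image, win_h, win_w, stride):
        if classify(extract_feature(crop)):
            hits.append((top, left))
    return hits

# Toy stand-ins (hypothetical): mean intensity as the "feature",
# a fixed threshold as the "classifier".
image = np.zeros((160, 96), dtype=np.float32)
image[16:144, 16:80] = 1.0  # a bright 128x64 block plays the role of a person
hits = detect(image,
              extract_feature=lambda crop: crop.mean(),
              classify=lambda feature: feature > 0.99)
```

In a real detector the feature extractor and the trained classifier replace the two lambdas, and the search is usually repeated over several image scales.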
Image features
1. Combination Features and Models for Human Detection - Y. Jiang et al.
2. Filtered Channel Features for Pedestrian Detection - S. Zhang et al.
1. Combination Features and Models for Human Detection - Y. Jiang et al.
• Popular HOG feature [Dalal05]
Input image → edge extraction → edge image → pixel-wise gradient in each “cell” → histogram (θ vs. power)
• Popular HOG feature [Dalal05]: 1st order feature
Input image → differentiate → 1st derivative → pixel-wise gradient in each “cell” → histogram (θ vs. power)
Idea: how about extending to 0th/2nd order?
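The per-cell histogram step of HOG can be sketched as below. This is a simplified illustration, assuming unsigned orientations in 9 bins and no block normalization; the function name and the bin count are choices made for this example, not taken from the slides.

```python
import numpy as np

def cell_orientation_histogram(cell, n_bins=9):
    """Gradient-orientation histogram of one grayscale cell, weighted by power."""
    cell = cell.astype(np.float64)
    gy, gx = np.gradient(cell)                       # 1st derivatives (rows, columns)
    power = np.hypot(gx, gy)                         # gradient magnitude
    theta = np.degrees(np.arctan2(gy, gx)) % 180.0   # unsigned orientation
    hist, _ = np.histogram(theta, bins=n_bins, range=(0.0, 180.0), weights=power)
    return hist

# A cell whose intensity increases from left to right: every pixel has
# gradient (gx, gy) = (1, 0), so all the power falls into the first bin.
ramp = np.tile(np.arange(8, dtype=np.float64), (8, 1))
hist = cell_orientation_histogram(ramp)
```

Replacing the 1st derivative with a 2nd derivative, or the gradient with color, gives the higher- and lower-order variants the slide asks about.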
• 2nd order: HOB – “bar” shape
• Same as HOG, just using the 2nd derivative
• 0th order: HOC – color feature
• Convert to the HSI color space; use H as θ and S as power, ignore I
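A rough sketch of the HOC idea (hue as the histogram angle, saturation as the weight, intensity ignored) is shown below. It assumes the standard RGB-to-HSI conversion formulas and an 8-bin histogram; both function names and the bin count are assumptions for illustration and may differ from the paper.

```python
import numpy as np

def rgb_to_hue_saturation(rgb):
    """Convert an (..., 3) RGB array in [0, 1] to HSI hue (degrees) and saturation."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    num = 0.5 * ((r - g) + (r - b))
    den = np.sqrt(np.maximum((r - g) ** 2 + (r - b) * (g - b), 0.0))
    ratio = np.clip(num / np.maximum(den, 1e-12), -1.0, 1.0)
    hue = np.degrees(np.arccos(ratio))
    hue = np.where(b > g, 360.0 - hue, hue)
    total = r + g + b
    sat = 1.0 - 3.0 * np.minimum(np.minimum(r, g), b) / np.maximum(total, 1e-12)
    sat = np.where(total > 0, sat, 0.0)   # black pixels carry no color
    return hue, sat

def cell_color_histogram(cell_rgb, n_bins=8):
    """Hue histogram of one cell, weighted by saturation (intensity is ignored)."""
    hue, sat = rgb_to_hue_saturation(cell_rgb.astype(np.float64))
    hist, _ = np.histogram(hue, bins=n_bins, range=(0.0, 360.0), weights=sat)
    return hist

red_cell = np.zeros((4, 4, 3))
red_cell[..., 0] = 1.0        # pure red: hue 0 degrees, saturation 1
green_cell = np.zeros((4, 4, 3))
green_cell[..., 1] = 1.0      # pure green: hue 120 degrees
red_hist = cell_color_histogram(red_cell)
green_hist = cell_color_histogram(green_cell)
```

Gray pixels have zero saturation, so they contribute nothing to the histogram regardless of their (undefined) hue.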
• Combine them into one vector: HOG-III feature
• Train different classifiers from the same HOG-III features
• Detect with each classifier individually, then fuse into one result
Input image → HOG-III features → detection by grammar model [Girshick11] / detection by poselet model [Bourdev10] → fusion → final result
(This is one of the key processes of the method.
Please refer to the original paper for more details.)
Effect of HOG-III
Feature                 AP
HOG                     45.8%
HOC+HOG+HOB             50.1%
HOG-III                 51.3%
Effect of Fusion
Classifier              AP
Single use of Grammar   45.8%
Single use of Poselet   47.0%
Fusion                  52.3%
Combining HOG-III and Fusion performs best
2. Filtered Channel Features for Pedestrian Detection - S. Zhang et al.
• Extension of “Integral Channel Features” [Dollár09]
• ChnFtrs: Extension of “Viola-Jones method” [Viola02]
(Viola-Jones method)
Input image → extract “Haar-like” features (scalar) → learn decision trees by AdaBoost
※ Haar-like feature: difference between the sums over the white and black regions
(Integral Channel Features)
Input image → transform into “channels” → extract the sum over a rectangle (※ unlike Haar-like features) → learn decision trees by AdaBoost
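Rectangle sums over a channel are usually computed with an integral image (summed-area table), so each sum costs four lookups regardless of the rectangle size. The sketch below illustrates that trick on a single channel; the function names are chosen for this example.

```python
import numpy as np

def integral_image(channel):
    """Summed-area table with a zero row/column prepended:
    ii[y, x] equals channel[:y, :x].sum()."""
    ii = np.zeros((channel.shape[0] + 1, channel.shape[1] + 1), dtype=np.float64)
    ii[1:, 1:] = channel.cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, top, left, height, width):
    """Sum of channel[top:top+height, left:left+width] using four lookups."""
    bottom, right = top + height, left + width
    return ii[bottom, right] - ii[top, right] - ii[bottom, left] + ii[top, left]

channel = np.arange(20, dtype=np.float64).reshape(4, 5)
ii = integral_image(channel)
feature = rect_sum(ii, top=1, left=2, height=2, width=3)
```

Each such rectangle sum is one scalar feature; the boosted decision trees then pick the most discriminative rectangles and channels.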
(Filtered Channel Features)
“Channels” → apply various filters (convolution) → pick up each pixel value as a feature → learn decision trees by AdaBoost
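The filtering step can be sketched as below: every channel is filtered with every filter in a bank, and each pixel of each response map becomes one feature. This is only an illustration. The two filters are hypothetical stand-ins rather than the paper's filter bank, and the code uses plain cross-correlation (convolution without flipping the kernel).

```python
import numpy as np

def correlate_valid(channel, kernel):
    """'Valid' 2-D cross-correlation of one channel with one filter."""
    kh, kw = kernel.shape
    out_h = channel.shape[0] - kh + 1
    out_w = channel.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=np.float64)
    for dy in range(kh):
        for dx in range(kw):
            out += kernel[dy, dx] * channel[dy:dy + out_h, dx:dx + out_w]
    return out

def filtered_channel_features(channels, filter_bank):
    """Filter every channel with every filter; each response pixel is a feature."""
    responses = [correlate_valid(c, f) for c in channels for f in filter_bank]
    return np.concatenate([r.ravel() for r in responses])

# Hypothetical two-filter bank: a box filter and a horizontal difference.
filter_bank = [np.ones((2, 2)), np.array([[1.0, -1.0], [1.0, -1.0]])]
channels = [np.ones((6, 4))]
features = filtered_channel_features(channels, filter_bank)
```

The resulting vector is what the boosted decision trees are trained on; with 50 filters and several channels it becomes correspondingly longer.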
Using 50 filters performs best
Achieved the highest accuracy
Training
3. Learning Scene-Specific Pedestrian Detectors without Real Data - H. Hattori et al.
4. Taking a Deeper Look at Pedestrians - J. Hosang et al.
5. Pedestrian Detection aided by Deep Learning Semantic Tasks - Y. Tian et al.
3. Learning Scene-Specific Pedestrian Detectors without Real Data - H. Hattori et al.
• Train the detector with CG-based training datasets
Real background (static image, annotated) + CG-based human → composite → simulated scene
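The compositing step can be illustrated with simple alpha blending, as sketched below. This is an assumption-laden toy, not the authors' rendering pipeline: the `composite` function, the alpha mask input, and deriving the bounding box from that mask are all choices made for this example.

```python
import numpy as np

def composite(background, rendered, alpha, top, left):
    """Alpha-blend a rendered figure onto a copy of the background.

    Returns the simulated scene and the figure's bounding box
    (top, left, bottom, right), derived from the alpha mask.
    """
    h, w = alpha.shape
    scene = background.astype(np.float64)          # astype returns a copy
    region = scene[top:top + h, left:left + w]     # view into the scene
    a = alpha[..., None]                           # broadcast over color channels
    region[:] = a * rendered + (1.0 - a) * region
    ys, xs = np.nonzero(alpha > 0)
    bbox = (int(top + ys.min()), int(left + xs.min()),
            int(top + ys.max()) + 1, int(left + xs.max()) + 1)
    return scene, bbox

background = np.zeros((10, 10, 3))   # stands in for the real static image
rendered = np.ones((4, 2, 3))        # stands in for the CG-based human
alpha = np.ones((4, 2))
alpha[0, :] = 0.0                    # top row of the rendering is transparent
scene, bbox = composite(background, rendered, alpha, top=3, left=5)
```

Because the figure is synthetic, its position is known exactly, so positive training samples come with labels at no annotation cost.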
• Not only scene-specific, but also location-specific!
Grid with overlap (10^2~10^5 patches) → training images (~10^3 pos, ~10^3 neg for each patch) → joint classifier ensemble training → scene-specific, location-specific detectors
Patch size # detectors Avg. Precision
8x8 371 .802
16x16 102 .798
32x32 30 .764
Effect of location-specific detection
Example of the detection result
Comparison
4. Taking a Deeper Look at Pedestrians - J. Hosang et al.
“Convnets still underperform the state of the art” … Really?
Builds up know-how for convnet-based detectors:
• Small network (CifarNet) / Big network (AlexNet)
• Window size
• How to collect training images
• Fine-tuning
• Number and Type of layers
• …
Convnet with the best configuration outperforms!
Interesting points:
• The pos/neg ratio does not affect the accuracy much
• Data augmentation is effective
• Network size should be chosen according to the amount of training samples
• ...
5. Pedestrian Detection aided by Deep Learning Semantic Tasks - Y. Tian et al.
Binary classification is sometimes insufficient…
Human / Not human (hard negatives)
It is necessary to use semantic information jointly
Classify pedestrians and recognize semantics at once!
Also recognizes current scene semantics
• Pedestrian attribute (e.g. wearing backpack)
• Background attribute (e.g. road, sky, …)
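One common way to train "pedestrian or not" together with attribute recognition is a weighted sum of per-task losses. The sketch below shows that general idea only; it is not the TA-CNN objective from the paper, and `joint_loss` and `attr_weight` are hypothetical names and values.

```python
import numpy as np

def binary_cross_entropy(prob, label):
    """Mean binary cross-entropy between predicted probabilities and 0/1 labels."""
    prob = np.clip(prob, 1e-7, 1.0 - 1e-7)
    return float(-(label * np.log(prob) + (1 - label) * np.log(1 - prob)).mean())

def joint_loss(ped_prob, ped_label, attr_probs, attr_labels, attr_weight=0.1):
    """Main pedestrian loss plus a down-weighted auxiliary attribute loss."""
    main = binary_cross_entropy(ped_prob, ped_label)
    aux = binary_cross_entropy(attr_probs, attr_labels)
    return main + attr_weight * aux

ped_prob = np.array([0.9, 0.1])        # predicted P(pedestrian) for two windows
ped_label = np.array([1, 0])
attr_probs = np.full((2, 2), 0.5)      # e.g. "backpack", "road" (uninformative here)
attr_labels = np.array([[1, 0], [0, 1]])
loss = joint_loss(ped_prob, ped_label, attr_probs, attr_labels)
```

The auxiliary term pushes the shared features to encode semantics such as "backpack" or "road", which is what helps separate hard negatives from real pedestrians.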
Difficult to collect various (annotated) negatives from one dataset…
Transfer from other annotated datasets by TA-CNN
(Please refer to the original and related papers for more details about TA-CNN…)
Comparison with CNN-based methods
Example of detection results
Benchmark / Dataset
6. Multispectral Pedestrian Detection: Benchmark Dataset and Baseline - S. Hwang et al.
• Dataset of visible-light and thermal images
Contributions:
• Color and thermal images
• Both test/training data
• Temporally-corresponded tag
• Large enough
• …
Takeaways
• Human detection is still challenging
• Deep learning does not necessarily solve every problem at this moment
• There are several pieces of knowledge that might be helpful for your research/hobby/…
References / Supplemental materials
2. Filtered Channel Features for Pedestrian Detection
4. Taking a Deeper Look at Pedestrians
• Author's website: http://rodrigob.github.io/
3. Learning Scene-Specific Pedestrian Detectors without Real Data
• Project: http://vishnu.boddeti.net/projects/detection-by-synthesis.html
• YouTube: https://youtu.be/2Jf7faozHUs
5. Pedestrian Detection aided by Deep Learning Semantic Tasks
• Project: http://mmlab.ie.cuhk.edu.hk/projects/TA-CNN/
6. Multispectral Pedestrian Detection: Benchmark Dataset and Baseline
• Lab: http://rcv.kaist.ac.kr/v2/
All the papers of CVPR2015 are available at cv-foundation.org