Sketch-a-Net that Beats Humans - EECSqian/Qian's Materials/paper/BMVC_2015_84.pdf · Sketch-a-Net...
Transcript of Sketch-a-Net that Beats Humans - EECSqian/Qian's Materials/paper/BMVC_2015_84.pdf · Sketch-a-Net...
Sketch-a-Net that Beats Humans
Qian Yu
SketchLab@QMUL
Queen Mary University of London
1
Authors
Yongxin Yang
Yi-Zhe Song Tao Xiang Timothy Hospedales
2
Qian Yu
Let’s play a game!
Round 1 – Easy
3
fish face pizza
Let’s play a game!
Round 2 – Hard
4
piano
violin guitarcomb
Free-hand Sketch
Definition: drawn by non-artists on touchscreen
5
Free-hand sketchProfessional sketch
Motivation
• It has been used as a basic visual communication media
since pre-historic times
• A wide range of sketch-related applications have been
investigated
6
1500 BC AD 1970AD 1800
Why difficult?
Sketches are highly iconic and abstract
7
Why difficult?
Sketches exhibit hugely varied levels of details/abstraction
12
face
bicycle
8
Less abstract
Why difficult?
Sketches lack visual cues, like color, texture etc.
?
119
Prior Work
• Limitations
– No specific or effective features
– No sequential ordering information
1410
Prior work generally follows the conventional image
classification paradigm
Feature Extraction Feature Representation Classification
HOG
SIFT
shape context
…
…
BoW
FV
star graph
…
…
SVM
…
…
Our Method
Sketch-a-Net
(1) Specifically designed CNN for feature representation learning
(2) Multi-channel CNN architecture to embed sequential ordering
information
(3) Multi-scale network ensemble to address the variability of
levels of abstraction
1611
Our Method
Architecture of Sketch-a-Net
1512
Sketch-a-Net vs. AlexNet
1) A CNN for Sketch Recognition
• Modification of AlexNet [1]
– Larger first layer filters: 15*15
– Remove local response normalization layer
– Less filters, larger pooling size, higher dropout rate etc.
1713
Sketch-a-Net
2) Modelling sketch stroke order with multiple channel
Part 1 Whole Sketch Part 2 Part 3
Part 1 Whole Sketch Part 2 Part 3
1914
Sketch-a-Net
2) Modelling sketch stroke order with multiple channels
• 6 channels
2315
Channel 1
Channel 2
Channel 3
Channel 6
Channel 5
Channel 4
Part 2
Part 1
Part 3
Sketch-a-Net
3) A multi-scale network ensemble with Joint Bayesian fusion
2116
• How do we get multi-scale network– Original size: 256*256
– Downsample size: 256*256, 224*224, 192*192, 128*128, 64*64
– Upsample to original size different blur levels
• Joint Bayesian fusion [2]
• Testing: using the likelihood ratio as a distance for
KNN matching
Sketch-a-Net
2117
Experiments and Results
• Dataset: TU-Berlin sketch dataset [3]
– Collected on Amazon Mechanical Turk (AMT) from 1350
participants
– 250 categories, 80 sketches per category
• Data pre-processing
– Scale to 256*256
– Augmentation: reflection, rotation (in the range [-5 5] degrees),
horizontal and vertical shifts (up to 32 pixels in each direction)
– ( 20,000*0.67 )*( 32*32 *11*2 ) = 300 millions
2218
Experiments and Results
HOG-SVM [3] Ensemble [4] MKL-SVM [5] FV-SP [6]
56% 61.5% 65.8% 68.9%
AlexNet-
SVM [1]
AlexNet-
Sketch [1]LeNet [7] Sketch-a-
NetHuman [3]
67.1% 68.6% 55.2% 74.9% 73.1%
Table 1: Comparison with state of the art results on sketch recognition
2319
Cla
ssic
Deep
Experiments and Results
Full Model Multiple Channel &
Single Scale
Single Channel
& Single ScaleAlexNet-Sketch
74.9% 72.6% 72.2% 68.6%
Table 2: Evaluation on the contributions of individual components of Sketch-a-Net
2420
Success at Fine-grained Classification
Success cases
2621
Pigeon
Seagull
Human Sketch-a-
Net
Bird*
(average)
24.8% 42.5%
Seagull 2.5% 23.9%
*Bird (average) stands for the average
accuracy of “seagull”, “flying-bird”, “standing-
bird” and “pigeon”
Failure with Bad Drawings
Failure cases
2622
windmill
(Fan)
ice-cream-cone
(pizza)
lobster
(bear)
moon
(pizza)
dog
(pig)
Insufficient Training Data
Failure cases
2623
Different kinds of bridge.
ladder
piano
rifle
Mouse
(animal)
bench
References
[1] A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep con-
volutional neural networks. In NIPS, 2012.
[2] D. Chen, X. Cao, L. Wang, F. Wen, and J. Sun. Bayesian face revisited: A joint
formulation. In ECCV, 2012.
[3] M. Eitz, J. Hays, and M. Alexa. How do humans sketch objects? In SIGGRAPH, 2012.
[4] Y. Li, Y. Song, and S. Gong. Sketch recognition by ensemble matching of structured
features. In BMVC, 2013.
[5] Y. Li, T. M. Hospedales, Y. Song, and S. Gong. Free-hand sketch recognition by multi-
kernel feature learning. CVIU, 2015.
[6] R. G. Schneider and T. Tuytelaars. Sketch classification and classification-driven
analysis using fisher vectors. In SIGGRAPH Asia, 2014.
[7] Y. LeCun, L. Bottou, G. B. Orr, and K. Muller. Efficient backprop. Neural networks:
Tricks of the trade, pages 9–48, 2012.
2924
3025
Questions