Introduction to Deep Learning (ShanghaiTech: ssds2015.shanghaitech.edu.cn/slides/tutorial/Xiaogang...)

Xiaogang Wang, Department of Electronic Engineering, The Chinese University of Hong Kong

Transcript of the slides.

Page 1

Introduction to Deep Learning

Xiaogang Wang

Department of Electronic Engineering, The Chinese University of Hong Kong

Page 2

Outline

• Historical review of deep learning

• Introduction to classical deep models

• Why does deep learning work?

Page 3

Machine Learning

x → F(x) → y

y: class label (classification) or real-valued vector (estimation)

Object recognition: y ∈ {dog, cat, horse, flower, …}

Super resolution: low-resolution image → high-resolution image

Page 4

Neural network / Back propagation

1986

• Solve general learning problems

• Tied with biological system

Nature

Page 5

Neural network / Back propagation

1986

[Figure: a single neuron with inputs x1, x2, x3, weights w1, w2, w3, and output g(x) = f(net), where net is the weighted sum of the inputs]

Nature
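As a concrete reading of the figure above, here is a minimal Python sketch of a single neuron (my own illustration; the tanh activation and the numeric values are placeholders, not taken from the slides):

```python
import numpy as np

def neuron(x, w, b=0.0, f=np.tanh):
    """Single neuron: net = w . x + b, output g(x) = f(net)."""
    net = np.dot(w, x) + b
    return f(net)

# Three inputs x1, x2, x3 with weights w1, w2, w3 (placeholder values)
print(neuron(np.array([1.0, 0.5, -0.2]), np.array([0.3, -0.1, 0.8])))
```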

Page 6

Neural network / Back propagation

1986

• Solve general learning problems

• Tied with biological system

But it was given up…

• Hard to train

• Insufficient computational resources

• Small training sets

• Does not work well

Nature

Page 7

Neural network / Back propagation

1986 2006

• SVM

• Boosting

• Decision tree

• KNN

• …

• Flat structures

• Loose tie with biological systems

• Specific methods for specific tasks

– Hand crafted features (GMM-HMM, SIFT, LBP, HOG)

Kruger et al. TPAMI’13

Nature

Page 8

Neural network / Back propagation

1986 2006

Deep belief net (Science)

[Figure: stacked layers of a deep belief net]

• Unsupervised, layer-wise pre-training

• Better designs for modeling and training (normalization, nonlinearity, dropout)

• New development of computer architectures
– GPU
– Multi-core computer systems

• Large scale databases

Big Data !

Nature

Page 9

Machine Learning with Big Data

• Machine learning with small data: overfitting, reducing model complexity (capacity)

• Machine learning with big data: underfitting, increasing model complexity, optimization, computation resource

Page 10

How to increase model capacity?

Curse of dimensionality

D. Chen, X. Cao, F. Wen, and J. Sun. Blessing of dimensionality: High-dimensional feature and its efficient compression for face verification. In Proc. IEEE Int’l Conf. Computer Vision and Pattern Recognition, 2013.

Blessing of dimensionality

Learning hierarchical feature transforms (Learning features with deep structures)

Page 11

Neural network / Back propagation

1986

• Solve general learning problems

• Tied with biological system

But it was given up…

2006

Deep belief net (Science)

deep learning results

Speech

2011

Nature

Page 12

Rank  Name         Error rate  Description
1     U. Toronto   0.15315     Deep learning
2     U. Tokyo     0.26172     Hand-crafted features and learning models. Bottleneck.
3     U. Oxford    0.26979
4     Xerox/INRIA  0.27058

Object recognition over 1,000,000 images and 1,000 categories (2 GPUs)

Neural network / Back propagation

1986 2006

Deep belief net (Science)  Speech

2011 2012

Nature

A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet Classification with Deep Convolutional Neural Networks,” NIPS, 2012.

Page 13

Examples from ImageNet

Page 14

Top5 Image Classification Error on ImageNet

ILSVRC 2012: AlexNet 15.375%
ILSVRC 2013: Clarifai 11.197%
ILSVRC 2014: GoogLeNet 6.656%
Human: 5.1%
MSRA: 4.9%

Page 15

• How to label the data, or define the evaluation task, is challenging and important

Page 16

ImageNet Object Detection Task (2013)


• 200 object classes

• 40,000 test images

Page 17

Mean Average Precision (mAP)

UvA-Euvision: 22.581% (ILSVRC 2013)
RCNN: 31.4%
GoogLeNet: 43.9% (ILSVRC 2014)
DeepID-Net: 50.3%

W. Ouyang and X. Wang, et al. “DeepID-Net: Deformable Deep Convolutional Neural Networks for Object Detection,” CVPR 2015

Page 18

Neural network / Back propagation

1986 2006

Deep belief net (Science)  Speech

2011 2012

• ImageNet 2014 – object detection challenge

W. Ouyang and X. Wang et al. “DeepID-Net: deformable deep convolutional neural networks for object detection”, CVPR, 2015

Team                 Single model  Model average
RCNN (Berkeley)      31.4          n/a
Berkeley Vision      34.5          n/a
UvA-Euvision         35.4          n/a
DeepInsight          40.2          40.5
GoogLeNet (Google)   38.0          43.9
DeepID-Net (CUHK)    47.9          50.3

Wanli Ouyang

Page 19

Labeled Faces in the Wild (2007)

Best results without deep learning

Page 20

Page 21

Design Cycle start

Collect data

Preprocessing

Feature design

Choose and design model

Train classifier

Evaluation

end

Domain knowledge
Interest of people working on computer vision, speech recognition, medical image processing, …

Interest of people working on machine learning

Interest of people working on machine learning and computer vision, speech recognition, medical image processing,…

Preprocessing and feature design may lose useful information and may not be optimal, since they are not part of an end-to-end learning system

Preprocessing could be the result of another pattern recognition system

Page 22

Person re-identification pipeline

Pedestrian detection

Pose estimation

Body parts segmentation

Photometric & geometric transform

Feature extraction

Classification

Face recognition pipeline

Face alignment

Geometric rectification

Photometric rectification

Feature extraction

Classification

Page 23

Design Cycle with Deep Learning

start

Collect data

Preprocessing (optional)

Design network

Feature learning

Classifier

Train network

Evaluation

end

• Learning plays a bigger role in the design cycle

• Feature learning becomes part of the end-to-end learning system

• Preprocessing becomes optional, which means that several pattern recognition steps can be merged into one end-to-end learning system

• Feature learning makes the key difference

• We underestimated the importance of data collection and evaluation

Page 24

What makes deep learning successful in computer vision?

Deep learning

Li Fei-Fei Geoffrey Hinton

Data collection / Evaluation task

One million images with labels

Predict 1,000 image categories

CNN is not new

Design network structure

New training strategies

Feature learned from ImageNet can be well generalized to other tasks and datasets!

Page 25

Learning features and classifiers separately

• Not all the datasets and prediction tasks are suitable for learning features with deep models

Training stage A (deep learning):
Dataset A → feature transform → Classifier 1, Classifier 2, … → prediction on task 1, prediction on task 2, …

Training stage B:
Dataset B → feature transform → Classifier B → prediction on task B (our target task)
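A minimal numpy sketch of this separate-learning pattern: the feature transform learned in training stage A is kept fixed (represented here by random weights W_A standing in for a trained deep model), and only classifier B is fit on dataset B. All names, sizes, and data below are hypothetical placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stage A stand-in: a feature transform learned with a deep model on dataset A.
# Here it is just fixed random weights; in practice these come from stage-A training.
W_A = rng.normal(size=(32, 100))
feature_transform = lambda X: np.maximum(0.0, X @ W_A.T)   # frozen after stage A

# Stage B: dataset B for the target task; only classifier B is trained on top
X_B = rng.normal(size=(500, 100))                 # placeholder data for dataset B
y_B = (rng.random(500) < 0.5).astype(float)       # placeholder labels for task B

F_B = feature_transform(X_B)                      # features from the frozen transform
lam = 1e-2                                        # classifier B: ridge regression for brevity
w_B = np.linalg.solve(F_B.T @ F_B + lam * np.eye(32), F_B.T @ y_B)
predictions = (F_B @ w_B > 0.5).astype(float)     # predictions on task B
```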

Page 26

Deep learning can be treated as a language to describe the world with great flexibility

Collect data

Preprocessing 1

Feature design

Classifier

Evaluation

Preprocessing 2

Collect data

Feature transform

Feature transform

Classifier

Deep neural network

Evaluation

Connection

Page 27

Introduction to Deep Learning

• Historical review of deep learning

• Introduction to classical deep models

• Why does deep learning work?

Page 28

Introduction to Classical Deep Models

• Convolutional Neural Networks (CNN)
– Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based Learning Applied to Document Recognition,” Proceedings of the IEEE, Vol. 86, pp. 2278-2324, 1998.

• Deep Belief Net (DBN)
– G. E. Hinton, S. Osindero, and Y. Teh, “A Fast Learning Algorithm for Deep Belief Nets,” Neural Computation, Vol. 18, pp. 1527-1544, 2006.

• Auto-encoder
– G. E. Hinton and R. R. Salakhutdinov, “Reducing the Dimensionality of Data with Neural Networks,” Science, Vol. 313, pp. 504-507, July 2006.

Page 29

Classical Deep Models

• Convolutional Neural Networks (CNN)
– First proposed by Fukushima in 1980
– Improved by LeCun, Bottou, Bengio and Haffner in 1998

[Figure: convolution and pooling operations with learned filters]

Page 30

Backpropagation

W denotes the parameters of the network; J is the objective function

Output layer

Hidden layers

Input layer

Target values

Feedforward operation

Back error propagation

D. E. Rumelhart, G. E. Hinton, R. J. Williams, “Learning Representations by Back-propagating Errors,” Nature, Vol. 323, pp. 533-536, 1986.
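To make the feedforward / back-error-propagation loop on this slide concrete, here is a minimal numpy sketch of one gradient step for a tiny two-layer network with a squared-error objective J. The layer sizes, sigmoid hidden units, and learning rate are my own assumptions for illustration, not the network in the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 1))           # input layer
t = rng.normal(size=(2, 1))           # target values
W1 = rng.normal(size=(3, 4))          # hidden-layer weights (hypothetical sizes)
W2 = rng.normal(size=(2, 3))          # output-layer weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Feedforward operation
h = sigmoid(W1 @ x)                   # hidden layer
y = W2 @ h                            # output layer
J = 0.5 * np.sum((y - t) ** 2)        # objective function J(W)

# Back error propagation: the chain rule gives dJ/dW2 and dJ/dW1
delta2 = y - t
dW2 = delta2 @ h.T
delta1 = (W2.T @ delta2) * h * (1 - h)    # propagate the error through the sigmoid
dW1 = delta1 @ x.T

# Gradient-descent update of the network parameters W
lr = 0.1
W1 -= lr * dW1
W2 -= lr * dW2
```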

Page 31

Classical Deep Models

• Deep belief net

– Hinton’06

P(x, h1, h2) = P(x | h1) P(h1, h2)

P(x, h1) = e^{E(x, h1)} / Σ_{x, h1} e^{E(x, h1)},  with  E(x, h1) = b'x + c'h1 + h1'Wx

[Figure: pre-training provides an initial point for the subsequent optimization]

Pre-training:
• Good initialization point
• Make use of unlabeled data
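The cited fast learning algorithm pre-trains each pair of layers as a restricted Boltzmann machine with the energy E(x, h1) above, typically using contrastive divergence. Below is a rough numpy sketch of one CD-1 update; the layer sizes, binary units, and learning rate are hypothetical, and this is an illustration of the idea rather than the exact procedure in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Hypothetical RBM: 6 visible units x, 4 hidden units h1, energy b'x + c'h1 + h1'Wx
W = 0.01 * rng.normal(size=(4, 6))
b = np.zeros(6)                                  # visible bias b
c = np.zeros(4)                                  # hidden bias c

def cd1_step(x, lr=0.1):
    """One contrastive-divergence (CD-1) update for the RBM above."""
    ph = sigmoid(c + W @ x)                      # P(h1 = 1 | x)
    h = (rng.random(4) < ph).astype(float)       # sample the hidden units
    px = sigmoid(b + W.T @ h)                    # reconstruction P(x = 1 | h1)
    ph_recon = sigmoid(c + W @ px)
    dW = np.outer(ph, x) - np.outer(ph_recon, px)   # data term minus model term
    return W + lr * dW, b + lr * (x - px), c + lr * (ph - ph_recon)

x = (rng.random(6) < 0.5).astype(float)          # one binary (unlabeled) training vector
W, b, c = cd1_step(x)
```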

Page 32

Classical Deep Models

• Auto-encoder

– Hinton and Salakhutdinov 2006

[Figure: encoder x → h1 → h2 and decoder h2 → h̃1 → x̃, with weights W1, W2, W'2, W'1 and biases b1, b2, b3, b4]

Encoding:  h1 = σ(W1 x + b1)
           h2 = σ(W2 h1 + b2)

Decoding:  h̃1 = σ(W'2 h2 + b3)
           x̃  = σ(W'1 h̃1 + b4)
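A minimal numpy sketch of the encoder/decoder equations above for one input vector; the layer sizes, the tied-weight choice W' = W.T, and the random data are my own assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = lambda z: 1.0 / (1.0 + np.exp(-z))

# Hypothetical layer sizes: x in R^8, h1 in R^4, h2 in R^2
W1, b1 = 0.1 * rng.normal(size=(4, 8)), np.zeros(4)
W2, b2 = 0.1 * rng.normal(size=(2, 4)), np.zeros(2)
b3, b4 = np.zeros(4), np.zeros(8)

def autoencode(x):
    # Encoding
    h1 = sigma(W1 @ x + b1)
    h2 = sigma(W2 @ h1 + b2)
    # Decoding (tied weights: W'2 = W2.T, W'1 = W1.T)
    h1_tilde = sigma(W2.T @ h2 + b3)
    x_tilde = sigma(W1.T @ h1_tilde + b4)
    return x_tilde

x = rng.random(8)
reconstruction_error = np.sum((autoencode(x) - x) ** 2)   # minimized during training
```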

Page 33

Introduction to Deep Learning

• Historical review of deep learning

• Introduction to classical deep models

• Why does deep learning work?

• Properties of deep feature representations

Page 34

Feature Learning vs Feature Engineering

Page 35

Feature Engineering

• The performance of a pattern recognition system heavily depends on feature representations

• Manually designed features dominated the applications of image and video understanding in the past
– Rely on human domain knowledge much more than on data
– If handcrafted features have multiple parameters, it is hard to tune them manually
– Feature design is separate from training the classifier
– Developing effective features for new applications is slow

Page 36

Handcrafted Features for Face Recognition

1980s: Geometric features
1992: Pixel vector
1997: Gabor filters (2 parameters)
2006: Local binary patterns (3 parameters)

Page 37

Feature Learning

• Learning transformations of the data that make it easier to extract useful information when building classifiers or predictors
– Learn the values of a huge number of parameters in feature representations
– Make better use of big data
– Jointly learning feature transformations and classifiers makes their integration optimal
– Faster to get feature representations for new applications

Page 38

Deep Learning Means Feature Learning

• Deep learning is about learning hierarchical feature representations

• Good feature representations should be able to disentangle multiple factors coupled in the data

[Figure: data → a stack of trainable feature transforms → classifier. An ideal feature transform maps raw pixels (pixel 1, pixel 2, …, pixel n) into disentangled factors such as view and expression.]

Page 39

Deep Learning Means Feature Learning

• How to effectively learn features with deep models

– With challenging tasks

– Predict high-dimensional vectors

Pre-train on classifying 1,000 categories → fine-tune on classifying 201 categories → feature representation → SVM binary classifier for each category

Detect 200 object classes on ImageNet

W. Ouyang and X. Wang et al. “DeepID-Net: deformable deep convolutional neural networks for object detection”, CVPR, 2015

Page 40

Training stage A: Dataset A → feature transform → Classifier A (distinguish 1,000 categories)

Training stage B: Dataset B → feature transform → Classifier B (distinguish 201 categories)

Training stage C: Dataset C → feature transform (fixed) → SVM (distinguish one object class from all the negatives)
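A schematic numpy sketch of training stage C above: the feature transform is kept fixed and a linear one-class-vs-negatives classifier is trained on top (here a simple hinge-loss, SVM-style sub-gradient loop, which is my own simplification). The feature extractor and data are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stage A/B stand-in: an already-trained feature transform, kept fixed in stage C
W_feat = rng.normal(size=(64, 256))
features = lambda x: np.maximum(0.0, W_feat @ x)

# Stage C data: candidate windows for one object class vs all negatives (placeholders)
X = rng.normal(size=(200, 256))
y = np.where(rng.random(200) < 0.2, 1.0, -1.0)     # +1 = object class, -1 = negative
F = np.array([features(x) for x in X])

w, b, lr, C = np.zeros(64), 0.0, 1e-3, 1.0
for _ in range(100):                               # sub-gradient descent on the hinge loss
    margins = y * (F @ w + b)
    violated = margins < 1.0
    w -= lr * (w - C * (y[violated, None] * F[violated]).sum(axis=0))
    b += lr * C * y[violated].sum()
scores = F @ w + b                                 # detection scores for this class
```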

Page 41

Pedestrian Detection aided by Deep Learning Semantic Tasks

[Figure: example detections with attribute labels, e.g., female / bag / right, female / right, male / backpack / back, vehicle / horizontal, vehicle / vertical, tree / vertical]

Y. Tian, P. Luo, X. Wang, and X. Tang, “Pedestrian Detection aided by Deep Learning Semantic Tasks,” CVPR 2015

Page 42

[Figure: (a) data generation: pedestrian and background patches (with hard negatives) from P: Caltech, and scene images from Ba: CamVid, Bb: Stanford Bkg., Bc: LM+SUN; (b) TA-CNN: conv1–conv4 followed by fc5 and fc6, predicting the pedestrian classifier y, pedestrian attributes a^p, shared background attributes a^s (e.g., sky, tree, road, traffic light), and unshared background attributes a^u, together with SPV z.]

Figure 4: The proposed pipeline for pedestrian detection (best viewed in color).

... convolution and max-pooling, which are formulated as

h_n^{v(l)} = relu( b^{v(l)} + Σ_u k^{vu(l)} ∗ h_n^{u(l−1)} ),   (1)

h_n^{v(l)}(i, j) = max_{(p,q) ∈ Ω_(i,j)} { h_n^{v(l)}(p, q) }.   (2)

In Eqn. (1), relu(x) = max(0, x) is the rectified linear function [19] and ∗ denotes the convolution operator applied on every pixel of the feature map h_n^{u(l−1)}, where h_n^{u(l−1)} and h_n^{v(l)} stand for the u-th input channel at the l−1 layer and the v-th output channel at the l layer, respectively. k^{vu(l)} and b^{v(l)} denote the filters and bias. In Eqn. (2), the feature map h_n^{v(l)} is partitioned into a grid with overlapping cells, each of which is denoted as Ω_(i,j), where (i, j) indicates the cell index. The max-pooling compares the value at each location (p, q) of a cell and outputs the maximum value of each cell.

Each hidden layer in fc5 and fc6 is obtained by

h_n^{(l)} = relu( W^{(l)T} h_n^{(l−1)} + b^{(l)} ),   (3)

where the higher-level representation is transformed from the lower level with a non-linear mapping. W^{(l)} and b^{(l)} are the weight matrices and bias vector at the l-th layer.

TA-CNN can be formulated as minimizing the negative log posterior probability with respect to a set of network parameters W:

W* = arg min_W − Σ_{n=1}^{N} log p(y_n, o_n^p, o_n^s, o_n^u | x_n; W),   (4)

where E = − Σ_{n=1}^{N} log p(y_n, o_n^p, o_n^s, o_n^u | x_n) is a complete loss function regarding the entire training set. Here, we illustrate that the shared attributes o_n^s in Eqn. (4) are crucial to learning a shared representation across the multiple scene datasets B’s.

For clarity, we keep only the unshared scene attributes o_n^u in the loss function, which then becomes E = − Σ_{n=1}^{N} log p(o_n^u | x_n). Let x^a denote a sample of B^a. A shared representation can be learned if and only if all the samples share at least one target (attribute). Since the samples are independent, the loss function can be expanded as E = − Σ_{i=1}^{I} log p(o_i^{u1} | x_i^a) − Σ_{j=1}^{J} log p(o_j^{u2}, o_j^{u3} | x_j^b) − Σ_{k=1}^{K} log p(o_k^{u4} | x_k^c), where I + J + K = N, implying that each dataset is only used to optimize its corresponding unshared attribute, although all the datasets and attributes are trained in a single TA-CNN. For instance, the classification model of o^{u1} is learned by using B^a without leveraging the existence of the other datasets. In other words, the probability p(o^{u1} | x^a, x^b, x^c) = p(o^{u1} | x^a) because of missing labels. The above formulation is not sufficient to learn shared features among datasets, especially when the data have large differences. To bridge the multiple scene datasets B’s, we introduce the shared attributes o^s, the …
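A small numpy sketch of Eqns. (1)-(3) for a single example (the sample index n is dropped): a convolutional layer with relu, max-pooling over overlapping cells Ω(i,j), and one fully connected relu layer. The channel counts, kernel size, and cell/stride sizes are hypothetical, and the convolution is implemented as the usual 'valid' cross-correlation.

```python
import numpy as np

rng = np.random.default_rng(0)
relu = lambda z: np.maximum(0.0, z)

H, W, U, V, K = 12, 12, 3, 4, 3          # hypothetical sizes
x = rng.normal(size=(U, H, W))           # input channels h^{u(l-1)}
k = rng.normal(size=(V, U, K, K))        # filters k^{vu(l)}
b = np.zeros(V)                          # biases b^{v(l)}

def conv_relu(x, k, b):                  # Eqn. (1)
    U, H, W = x.shape
    V, _, K, _ = k.shape
    out = np.zeros((V, H - K + 1, W - K + 1))
    for v in range(V):
        for i in range(H - K + 1):
            for j in range(W - K + 1):
                out[v, i, j] = b[v] + np.sum(k[v] * x[:, i:i+K, j:j+K])
    return relu(out)

def max_pool(h, cell=3, stride=2):       # Eqn. (2): max over overlapping cells
    V, H, W = h.shape
    rows, cols = (H - cell) // stride + 1, (W - cell) // stride + 1
    out = np.zeros((V, rows, cols))
    for i in range(rows):
        for j in range(cols):
            out[:, i, j] = h[:, i*stride:i*stride+cell, j*stride:j*stride+cell].max(axis=(1, 2))
    return out

h1 = max_pool(conv_relu(x, k, b))        # convolution followed by max-pooling
W_fc = rng.normal(size=(h1.size, 16))    # fully connected weights W^{(l)}
b_fc = np.zeros(16)
h_fc = relu(W_fc.T @ h1.ravel() + b_fc)  # Eqn. (3)
```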

Page 43

Pedestrian Detection on Caltech (average miss detection rates)

HOG+SVM: 68%
DPM: 63%
Joint DL: 39%
DL aided by semantic tasks: 17%

W. Ouyang and X. Wang, “Joint Deep Learning for Pedestrian Detection,” ICCV 2013.

Y. Tian, P. Luo, X. Wang, and X. Tang, “Pedestrian Detection aided by Deep Learning Semantic Tasks,” CVPR 2015.

Page 44

Example 1: deep learning generic image features

• Hinton's group's groundbreaking work on ImageNet
– They did not have much experience with general image classification on ImageNet
– It took one week to train the network with 60 million parameters
– The learned feature representations are effective on other datasets (e.g., Pascal VOC) and other tasks (object detection, segmentation, tracking, and image retrieval)

Page 45

PASCAL VOC (SIFT, HOG, DPM…)

Page 46

PASCAL VOC (CNN features)

Page 47

PASCAL VOC (CNN features)

DeepID-Net 64.1%

Our current result 73.9%

Page 48

Deep learning is feature learning

Features learned on ImageNet → image classification, object detection, tracking, segmentation

Page 49

Deep Structures vs Shallow Structures (Why deep?)

Page 50

Shallow Structures

• A three-layer neural network (with one hidden layer) can approximate any classification function

• Most machine learning tools (such as SVM, boosting, and KNN) can be approximated as neural networks with one or two hidden layers

• Shallow models divide the feature space into regions and match templates in local regions. O(N) parameters are needed to represent N regions

SVM

Page 51

Deep Machines are More Efficient for Representing Certain Classes of Functions

• Theoretical results show that an architecture with insufficient depth can require many more computational elements, potentially exponentially more (with respect to input size), than architectures whose depth is matched to the task (Hastad 1986, Hastad and Goldmann 1991)

• It also means many more parameters to learn

Page 52

• Take the d-bit parity function as an example

• d-bit logical parity circuits of depth 2 have exponential size (Andrew Yao, 1985)

• There are functions computable with a polynomial-size logic gates circuits of depth k that require exponential size when restricted to depth k -1 (Hastad, 1986)

Parity: (X1, …, Xd) ↦ 1 if Σ_i Xi is even, 0 otherwise
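A tiny Python illustration of the d-bit parity function defined above; written as a chain of XORs it corresponds to a circuit whose size grows linearly with d but whose depth also grows, in contrast to the exponential-size depth-2 circuits mentioned on this slide.

```python
from functools import reduce
from operator import xor

def parity(bits):
    """Returns 1 if the number of 1s among X1..Xd is even, else 0."""
    return 1 - reduce(xor, bits, 0)

print(parity([1, 0, 1, 1]))  # three 1s -> odd  -> 0
print(parity([1, 1, 0, 0]))  # two 1s   -> even -> 1
```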

Page 53

• Architectures with multiple levels naturally provide sharing and re-use of components

Honglak Lee, NIPS’10

Page 54

Humans Understand the World through Multiple Levels of Abstractions

• We do not interpret a scene image in terms of raw pixels
– Objects (sky, cars, roads, buildings, pedestrians) -> parts (wheels, doors, heads) -> texture -> edges -> pixels
– Attributes: blue sky, red car

• It is natural for humans to decompose a complex problem into sub-problems through multiple levels of representations

Page 55

Humans Understand the World through Multiple Levels of Abstractions

• Humans learn abstract concepts on top of less abstract ones

• Humans can imagine new pictures by re-configuring these abstractions at multiple levels. Thus our brain generalizes well and can recognize things it has never seen before.
– Our brain can estimate shape, lighting and pose from a face image and generate new images under various lightings and poses. That is why we have good face recognition capability.

Page 56

Local and Global Representations

Page 57

• The way these regions carve the input space still depends on a few parameters: this huge number of regions is not placed independently of each other

• We can thus represent a function that looks complicated but actually has (global) structures

Page 58

Joint Learning vs Separate Learning

Data collection

Preprocessing step 1

Preprocessing step 2

… Feature extraction

Training or manual design

Classification

Manual design

Training or manual design

Data collection

Feature transform

Feature transform

… Feature transform

Classification

End-to-end learning


Deep learning is a framework/language but not a black-box model

Its power comes from joint optimization and increasing the capacity of the learner

Page 59

What if we treat an existing deep model as a black box in pedestrian detection?

ConvNet−U−MS

– P. Sermanet, K. Kavukcuoglu, S. Chintala, and Y. LeCun, “Pedestrian Detection with Unsupervised Multi-Stage Feature Learning,” CVPR 2013.

Page 60

Results on Caltech Test / Results on ETHZ

Page 61

• N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. CVPR, 2005. (6000 citations)

• P. Felzenszwalb, D. McAllester, and D. Ramanan. A Discriminatively Trained, Multiscale, Deformable Part Model. CVPR, 2008. (2000 citations)

• W. Ouyang and X. Wang. A Discriminative Deep Model for Pedestrian Detection with Occlusion Handling. CVPR, 2012.

Page 62

Our Joint Deep Learning Model

W. Ouyang and X. Wang, “Joint Deep Learning for Pedestrian Detection,” Proc. ICCV, 2013.

Page 63

Modeling Part Detectors

• Design the filters in the second convolutional layer with variable sizes

Part models / Learned filters at the second convolutional layer

Part models learned from HOG

Page 64

Deformation Layer
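The slide only names the deformation layer; as a rough sketch of the general idea (a DPM-style quadratic deformation penalty applied to a part detection map, followed by a global max), here is a small numpy illustration. The map size, anchor position, and deformation weights c are hypothetical, and the exact parameterization in the cited papers differs.

```python
import numpy as np

rng = np.random.default_rng(0)

H, W = 8, 8
M = rng.normal(size=(H, W))              # part detection (filter response) map
ax, ay = H // 2, W // 2                  # anchor (default) position of the part
c = np.array([0.05, 0.05])               # learned deformation weights (placeholders)

# Quadratic deformation maps: squared displacement from the anchor position
ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
D = np.stack([(ys - ax) ** 2, (xs - ay) ** 2])

# Deformation layer: detection score minus deformation penalty, then a global max
B = M - (c[0] * D[0] + c[1] * D[1])
part_score = B.max()                     # score of the best deformed placement
best_pos = np.unravel_index(B.argmax(), B.shape)
```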

Page 65

Visibility Reasoning with Deep Belief Net

Correlates with part detection score

Page 66

Experimental Results

• Caltech – Test dataset (largest, most widely used)

[Plot: average miss rate (%) on the Caltech-Test benchmark versus year, 2000–2014]

Page 67

Experimental Results

• Caltech – Test dataset (largest, most widely used)

[Plot: average miss rate (%) versus year; the earliest method is at 95%]

Page 68

Experimental Results

• Caltech – Test dataset (largest, most widely used)

[Plot: average miss rate (%) versus year: 95% → 68%]

Page 69

Experimental Results

• Caltech – Test dataset (largest, most widely used)

[Plot: average miss rate (%) versus year: 95% → 68% → 63% (state of the art)]

Page 70

Experimental Results

• Caltech – Test dataset (largest, most widely used)

[Plot: average miss rate (%) versus year: 95% → 68% → 63% (state of the art) → 53% → 39% (best performing); improve by ~20%]

W. Ouyang, X. Zeng and X. Wang, "Modeling Mutual Visibility Relationship in Pedestrian Detection," CVPR 2013.
W. Ouyang and X. Wang, "Single-Pedestrian Detection aided by Multi-pedestrian Detection," CVPR 2013.
X. Zeng, W. Ouyang and X. Wang, "A Cascaded Deep Learning Architecture for Pedestrian Detection," ICCV 2013.
W. Ouyang and X. Wang, "Joint Deep Learning for Pedestrian Detection," ICCV 2013.
W. Ouyang and X. Wang, "A Discriminative Deep Model for Pedestrian Detection with Occlusion Handling," CVPR 2012.

Page 71

[Plot legend: DN-HOG, UDN-HOG, UDN-HOGCSS, UDN-CNNFeat, UDN-DefLayer]

Page 72

Deformation layer for general object detection

Page 73

Deformation layer for repeated patterns

Pedestrian detection: assume no repeated pattern
General object detection: repeated patterns

Page 74

Deformation layer for repeated patterns

Pedestrian detection: assume no repeated pattern; only consider one object class
General object detection: repeated patterns; patterns shared across different object classes

Page 75

Deformation constrained pooling layer

Can capture multiple patterns simultaneously

Page 76

Deep model with deformation layer

Training scheme:  Cls+Det   Loc+Det   Loc+Det
Net structure:    AlexNet   Clarifai  Clarifai + Def layer
Mean AP on val2:  0.299     0.360     0.385

Patterns shared across different classes

Page 77

Large learning capacity makes high-dimensional data transforms possible, and makes better use of contextual information

Page 78

• How to make use of the large learning capacity of deep models?

– High dimensional data transform

– Hierarchical nonlinear representations

[Diagram: input → high-dimensional data transform → output, contrasted with an SVM + features approach relying on smoothness and shape priors]

Page 79

Face Parsing

• P. Luo, X. Wang and X. Tang, “Hierarchical Face Parsing via Deep Learning,” CVPR 2012

Page 80

Motivations

• Recast face segmentation as a cross-modality data transformation problem

• Cross modality autoencoder

• Data of two different modalities share the same representations in the deep model

• Deep models can be used to learn shape priors for segmentation

Page 81

Training Segmentators

Page 82

Page 83

Fully convolutional neural network

Convolution-pooling layers
Fully connected layers
“Fusion” convolutional layers implemented by a 1 × 1 kernel
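A small numpy sketch of the 1 × 1 “fusion” idea: a 1 × 1 convolution is just the same linear map across channels applied at every spatial location, so it can fuse a stack of feature maps into per-pixel predictions. Channel counts and data below are hypothetical placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

C_in, C_out, H, W = 16, 3, 32, 32        # hypothetical channel counts and map size
feature_maps = rng.normal(size=(C_in, H, W))
W_1x1 = rng.normal(size=(C_out, C_in))   # a 1x1 kernel is just a C_out x C_in matrix
b = np.zeros(C_out)

# 1x1 convolution: apply the same channel-mixing map independently at each pixel
fused = np.einsum('oc,chw->ohw', W_1x1, feature_maps) + b[:, None, None]
print(fused.shape)                       # (3, 32, 32): per-pixel scores for 3 outputs
```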

Page 84

Big data

Rich information

Challenging supervision task with rich predictions

How to make use of it?

Capacity

Go deeper

Joint optimization

Hierarchical feature learning

Go wider

Take large input

Capture contextual information

Domain knowledge

Make learning more efficient

Reduce capacity

Page 85

Deep learning = ?

Machine learning with big data

Feature learning

Joint learning

Contextual learning