Transcript of Big Data Analytics - Stanford University
infolab.stanford.edu/~echang/BigDat2015/BigDat2015...
Big Data Analytics: Architectures, Algorithms and Applications
Part #2: Intro to Deep Learning

Edward Chang 張智威, HTC (prior: Google & U. California)
Simon Wu, HTC (prior: Twitter & Microsoft)
Three Lectures
• Lecture #1: Scalable Big Data Algorithms
  – Scalability issues
  – Key algorithms with application examples
• Lecture #2: Intro to Deep Learning
  – Autoencoder & Sparse Coding
  – Graph models: CNN, MRF, & RBM
• Lecture #3: Analytics Platform [by Simon Wu]
  – Intro to LAMA platform
  – Code lab
1/27/15 Ed Chang @ BigDat 2015
Acknowledging Slide Contributors
• Geoffrey Hinton • Yoshua Bengio • Russ Salakhutdinov • Kai Yu • Yann LeCun • Andrew Ng • Steven Seitz
Lecture #2 Outline
• Data Posteriors vs. Human Priors
• Learn p(x) from Big Data
  – Use NN to construct Autoencoder
  – Sparse Coding
  – Dynamic Partial
• Graphical Models
  – CNN, MRF, & RBM
• Demo
Representation?
Knowledge or feature extraction in image processing involves using algorithms to detect and isolate desired edges or shapes.
Low-level: edge detection, corner detection, ridge detection, or more generally the Scale-Invariant Feature Transform (SIFT)
Curvature: shape information, blob detection
Hough transform: lines, circles/ellipses, arbitrary shapes (Generalized Hough Transform)
Typical Image/Video Representation Based on Domain Knowledge and Human Priors
Template matching (medical imaging)
Flexible methods for 2D, 3D, or 3D+time edge extraction, road detection, MRI, fMRI
Color and texture representations: histograms, various transformations for conducting frequency-domain analysis, e.g., wavelets
Motion: motion detection, e.g., optical flow, global or area-based
…Much Related Work on Representation
Key Design Goals for Representation
Design features x that are invariant and selective
• Good Invariance
  – The same object should have the same features
• Good Selectivity (Disentanglement)
  – Different objects should exhibit different features for telling them apart
Once x has been designed, find label y for x and then learn p(y|x)
Challenges
• Invariance affected by noise
  – Environmental factors (e.g., lighting conditions, occlusion)
  – Equipment factors (e.g., different camera brands yield different colors and gamma corrections)
  – Aliasing (e.g., cars have different models, hence different features)
• Selectivity requires good similarity functions
• Labeled data is tough to acquire
  – Learning a robust model requires big data
Remedy #1: Learn ϕ from Data, p(x|ϕ) ≈ p*(x)
• Instead of designing features, learn features ϕ from data
• Data: not just the original data, but variants added to the data
  – E.g., adding scaled, rotated, cropped, mirrored, and gamma-adjusted images
• Instead of requiring invariant features as input to a model, let the model cope with invariance
• Then learn features ϕ that predict p(x|ϕ) accurately (p(x|ϕ) ≈ p*(x)) in an unsupervised way, from data that already covers variant conditions
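The variant-adding step can be sketched in a few lines of numpy; the particular transforms and the toy 10×10 image below are illustrative choices, not the lecture's exact recipe:

```python
import numpy as np

def augment(img):
    """Return the original image plus mirrored, rotated, and
    gamma-adjusted variants (an illustrative subset of transforms)."""
    return [
        img,
        np.fliplr(img),                  # mirrored
        np.rot90(img),                   # rotated by 90 degrees
        np.clip(img, 0.0, 1.0) ** 0.8,   # gamma adjustment
    ]

img = np.random.rand(10, 10)             # a toy 10x10 "image"
augmented = augment(img)
print(len(augmented), "training samples from 1 original")
```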
Remedy #2: Deep Model
• Learn representations in a hierarchical way
• [T. Serre, T. Poggio; MIT 2005]
Lecture Outline
• Data Posteriors vs. Human Priors
• Learn p(x) from Big Data
  – Use NN to Construct Autoencoder
  – Sparse Coding
  – Dynamic Partial
• Graphical Models
  – CNN, MRF, & RBM
• Demo
Multiple-Layer Networks: Neural Network (NN) Model
An elementary neuron with R inputs is shown below. Each input is weighted with an appropriate weight w. The sum of the weighted inputs and the bias forms the input to the transfer function f. Neurons can use any differentiable transfer function f to generate their output.
NN Model: Transfer Functions (Activation Functions)
Multilayer networks often use the log-sigmoid transfer function logsig. The function logsig generates outputs between 0 and 1 as the neuron's net input goes from negative to positive infinity.
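A minimal sketch of logsig and its limiting behavior (plain numpy; the name logsig follows the slide's terminology):

```python
import numpy as np

def logsig(n):
    """Log-sigmoid transfer function: squashes any net input into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-n))

# Output goes from 0 to 1 as the net input goes from -inf to +inf.
print(logsig(-10.0))   # close to 0
print(logsig(0.0))     # exactly 0.5
print(logsig(10.0))    # close to 1
```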
NN Model: Feedforward Network
A single-layer network of S logsig neurons having R inputs is shown below in full detail on the left and with a layer diagram on the right.
Example: Four-Layer NN
[Figure: input layer → hidden layer #1 → hidden layer #2 → output layer, producing output y]
NN Model: Learning Algorithm
The following slides describe the learning process of a multi-layer neural network employing the backpropagation algorithm. To illustrate this process, a three-layer neural network with two inputs and one output, shown in the picture below, is used.
Learning Algorithm: Backpropagation
Each neuron is composed of two units. The first unit adds the products of the weight coefficients and the input signals. The second unit realizes a nonlinear function, called the neuron transfer (activation) function. Signal e is the adder's output signal, and y = f(e) is the output signal of the nonlinear element; y is also the output signal of the neuron.
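The two units can be written directly; the weights, bias, and inputs below are illustrative values:

```python
import numpy as np

def neuron(x, w, b):
    """First unit: weighted sum of inputs (adder output e).
    Second unit: nonlinear transfer function, here a sigmoid."""
    e = np.dot(w, x) + b            # adder output signal e
    y = 1.0 / (1.0 + np.exp(-e))    # y = f(e), the neuron's output
    return e, y

x = np.array([0.5, -0.2])   # two input signals
w = np.array([0.8, 0.4])    # weight coefficients
e, y = neuron(x, w, b=0.1)
print(e, y)
```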
Feed Forward
The pictures below illustrate how the signal feeds forward through the network. Symbols w(xm)n represent the weights of connections between network input xm and neuron n in the input layer. Symbol yn represents the output signal of neuron n.
Feed Forward
Propagation of signals through the hidden layer. Symbols wmn represent the weights of connections between the output of neuron m and the input of neuron n in the next layer.
Learning Algorithm: Forward Pass
Propagation of signals through the output layer.
Learning Algorithm: Backpropagation
To teach the neural network we need a training data set. The training data set consists of input signals (x1 and x2) paired with a corresponding target (desired output) z. Network training is an iterative process: in each iteration, the weight coefficients of the nodes are modified using new data from the training set, using the algorithm described below. Each teaching step starts with forcing both input signals from the training set; after this stage we can determine the output signal values for each neuron in each network layer.
Learning Algorithm: Backpropagation
In the next step, the output signal of the network y is compared with the desired output value (the target z) found in the training data set. The difference is called the error signal δ of the output-layer neuron.
Learning Algorithm: Backpropagation
The idea is to propagate the error signal δ (computed in a single teaching step) back to all neurons whose output signals were inputs to the neuron in question.
Learning Algorithm: Backpropagation
The weight coefficients wmn used to propagate errors back are the same as those used during computing the output value; only the direction of data flow is changed (signals are propagated from outputs to inputs, one layer after the other). This technique is used for all network layers. If propagated errors come from several neurons, they are added. An illustration is below:
Learning Algorithm: Backpropagation
When the error signal for each neuron has been computed, the weight coefficients of each neuron's input connections may be modified. In the formulas below, df(e)/de represents the derivative of the activation function of the neuron whose weights are modified.
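The whole procedure described above (forward pass, output error signal δ, error propagated back through the same weights, then weight updates scaled by the derivative f′(e)) can be sketched for a tiny 2-input network with one hidden layer. The layer sizes, learning rate, and training pair are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def f(e):                            # sigmoid transfer function
    return 1.0 / (1.0 + np.exp(-e))

W1 = rng.normal(0.0, 0.5, (2, 3))    # input -> hidden weights
W2 = rng.normal(0.0, 0.5, (3, 1))    # hidden -> output weights
eta = 0.5                            # learning rate

x = np.array([1.0, 0.0])             # input signals x1, x2
z = np.array([1.0])                  # target (desired output)

for step in range(2000):
    # forward pass, layer by layer
    h = f(x @ W1)                    # hidden-layer outputs
    y = f(h @ W2)                    # network output
    # error signal of the output neuron; f'(e) = f(e)(1 - f(e))
    delta_out = (z - y) * y * (1 - y)
    # propagate the error back through the same weights W2
    delta_hid = (W2 @ delta_out) * h * (1 - h)
    # weight updates
    W2 += eta * np.outer(h, delta_out)
    W1 += eta * np.outer(x, delta_hid)

print(y.item())                      # output approaches the target z = 1
```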
Sigmoid function f(e) and its derivative f′(e)

f(e) = 1 / (1 + e^(−βe)), where β is the parameter for the slope

Hence

f′(e) = df(e)/de = β e^(−βe) / (1 + e^(−βe))² = β f(e) (1 − f(e))

For simplicity, set the slope parameter β = 1:

f′(e) = f(e) (1 − f(e))

http://link.springer.com/chapter/10.1007%2F3-540-59497-3_175#page-1
http://mathworld.wolfram.com/SigmoidFunction.html
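A quick numerical check of the identity above, comparing a central finite difference against f(e)(1 − f(e)):

```python
import numpy as np

f = lambda e: 1.0 / (1.0 + np.exp(-e))      # sigmoid with beta = 1

e = 0.7
h = 1e-6
numeric = (f(e + h) - f(e - h)) / (2 * h)   # finite-difference derivative
analytic = f(e) * (1 - f(e))                # closed form derived above
print(abs(numeric - analytic))              # agreement to round-off error
```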
Autoencoder: NN for Unsupervised Compression
hw,b(x) ≈ x
Parameter Learning
• 10×10 images with 100 pixels
• ℝ^100 possible configurations
• H hidden units
  – H = 100?
  – H = 50? (PCA)
• Too computationally intensive to learn w
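As a toy illustration of learning w so that hw,b(x) ≈ x, here is a linear, tied-weight autoencoder that compresses 100-pixel inputs to H = 50 hidden units by gradient descent. The data, step size, and iteration count are illustrative, and a linear code with squared loss recovers a PCA-like subspace rather than a full nonlinear autoencoder:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 100))       # 200 "images" of 10x10 = 100 pixels
W = rng.normal(0.0, 0.01, (100, 50))  # encoder weights; decoder is W.T (tied)

def recon_error(W):
    return np.mean((X @ W @ W.T - X) ** 2)

err_before = recon_error(W)
eta = 0.01
for step in range(500):
    E = X @ W @ W.T - X               # reconstruction residual
    grad = 2.0 * (X.T @ E + E.T @ X) @ W / len(X)
    W -= eta * grad
err_after = recon_error(W)
print(err_before, "->", err_after)    # reconstruction error drops
```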
Learning Algorithm
• Suppose ϕ (or h) is a set of hidden variables
• Model image x with k independent hidden features ϕi plus additive noise v:

x = Σ_{i=1}^{k} ai ϕi + v(x)

• The goal is to find a set of ϕ such that the posterior P(x|ϕ) is as close as possible to P*(x), i.e., to minimize the KL divergence between the two
…Learning Algorithm
• Minimize the KL divergence between the two distributions:

D(P*(x) ‖ P(x|ϕ)) = ∫ P*(x) log( P*(x) / P(x|ϕ) ) dx

• Since P*(x) is constant across choices of ϕ, minimizing the KL divergence is equivalent to maximizing the log-likelihood P(x|ϕ):

ϕ* = argmax_ϕ log P(x|ϕ)

ϕ*, a* = argmin_{ϕ,a} Σ_{j=1}^{m} ‖ x^(j) − Σ_{i=1}^{k} ai^(j) ϕi ‖² + λ Σ_{i=1}^{k} S(ai^(j))   (revisit this later)
Lecture Outline
• Data Posteriors vs. Human Priors
• Learn p(x) from Big Data
  – Use NN to Construct Autoencoder
  – Sparse Coding
  – Dynamic Partial
• Graphical Models – CNN, MRF, & RBM
• Demo
General Priors in Real-World Data [Y. Bengio, et al., 2014]
• A Hierarchical Organization of Factors
• Smoothness
  – x ≈ y → f(x) ≈ f(y)
  – Nearest-neighbor assumption
• Local manifold
  – Clustered
  – Low degrees of freedom
  – E.g., PCA
• Distributed Representations
  – Feature reuse, and abstract & invariant representations
  – Dynamic and partial
Smoothness: Nearest Neighbor Model
• Every learning model is a variant of the nearest neighbor model
• Similar objects should reside in the same neighborhood of a feature subspace
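The smoothness prior is exactly what a 1-nearest-neighbor model exploits; a toy sketch with made-up 2-D points:

```python
import numpy as np

train_x = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])  # feature vectors
train_y = np.array(["cat", "cat", "dog"])                  # their labels

def predict(x):
    """Label a query with the label of its nearest training point."""
    dists = np.linalg.norm(train_x - x, axis=1)
    return train_y[np.argmin(dists)]

print(predict(np.array([0.2, 0.1])))   # lands near the "cat" cluster
print(predict(np.array([4.8, 5.1])))   # lands near the "dog" point
```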
Local Low-Dimensional Manifolds
K. Yu and A. Ng, Tutorial: Feature Learning for Image Classification, Part 3: Image Classification using Sparse Coding: Advanced Topics, ECCV 2010.
[Figure: a data manifold approximated by locally linear patches]
Smooth, Local, Sparse
K. Yu and A. Ng, Tutorial: Feature Learning for Image Classification, Part 3: Image Classification using Sparse Coding: Advanced Topics, ECCV 2010.
[Figure: data on a manifold, with basis (anchor) points and locally linear structure]
Each datum can be represented by its neighbor anchors as a sparse combination
Sparse Coding [Olshausen & Field, 1996]
• Find representations of data, unsupervised
  – Traditionally PCA (too contrived; why?)
• Find over-complete bases in an efficient way
• x ≈ Φa, where x ∈ ℝᴺ and a ∈ ℝᴹ, M > N
• The coefficients a cannot be uniquely determined
• Thus, impose sparsity on a
• k-sparsity
Sparse Coding
[Figure: x (N×1) ≈ Φ (a fixed dictionary, N×M) × a (M×1), with only K of the M coefficients active]
What is Sparse Coding

min_{a,ϕ} Σ_{i=1}^{m} ‖ xi − Σ_{j=1}^{k} ai,j ϕj ‖² + λ Σ_{i=1}^{m} Σ_{j=1}^{k} |ai,j|

Sparse coding (Olshausen & Field, 1996). Originally developed to explain early visual processing in the brain (edge detection in V1).

Training: given a set of random patches x, learn a dictionary of bases [Φ1, Φ2, …]

Coding: for a data vector x, solve the LASSO to find the sparse coefficient vector a
![Page 47: Big Data Analytics - Stanford Universityinfolab.stanford.edu/~echang/BigDat2015/BigDat2015... · Big Data Analytics Architectures, Algorithms and Applications! Part #2: Intro to deep](https://reader035.fdocuments.net/reader035/viewer/2022062602/5ec9c5f8ecfbe4080923514e/html5/thumbnails/47.jpg)
Sparse Coding: Training Time
Input: images x1, x2, …, xm (each in ℝᵈ)
Learn: dictionary of bases ϕ1, ϕ2, …, ϕk (also in ℝᵈ)

min_{a,ϕ} Σ_{i=1}^{m} ‖ xi − Σ_{j=1}^{k} ai,j ϕj ‖² + λ Σ_{i=1}^{m} Σ_{j=1}^{k} |ai,j|

Alternating optimization:
1. Fix the dictionary ϕ1, ϕ2, …, ϕk; optimize a (a standard LASSO problem)
2. Fix the activations a; optimize the dictionary ϕ1, ϕ2, …, ϕk (a convex QP problem)
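A compact numpy sketch of this alternating loop, using ISTA (iterative soft-thresholding) for the LASSO step and least squares for the dictionary step; the data sizes, λ, and iteration counts are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 20))          # m = 50 data vectors in R^20
Phi = rng.normal(size=(20, 30))        # k = 30 bases (over-complete: 30 > 20)
Phi /= np.linalg.norm(Phi, axis=0)     # unit-norm basis vectors
lam = 0.1

def soft(v, t):                        # soft-thresholding operator
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

A = np.zeros((50, 30))                 # activations a_{i,j}
for outer in range(10):
    # 1. fix the dictionary, solve the LASSO for A by ISTA
    L = np.linalg.norm(Phi, 2) ** 2    # Lipschitz constant of the gradient
    for _ in range(50):
        grad = (A @ Phi.T - X) @ Phi   # gradient of 0.5*||A Phi^T - X||^2
        A = soft(A - grad / L, lam / L)
    # 2. fix the activations, update the dictionary by least squares
    Phi = np.linalg.lstsq(A, X, rcond=None)[0].T
    Phi /= np.linalg.norm(Phi, axis=0) + 1e-12   # renormalize the bases

print("fraction of exactly-zero coefficients:", np.mean(A == 0.0))
```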
Sparse Coding: Testing Time
Input: an unseen image patch xi (in ℝᵈ) and the previously learned ϕj's
Output: the representation [ai,1, ai,2, …, ai,k] of image patch xi, obtained by solving the same LASSO objective with the dictionary held fixed

[Figure: xi ≈ 0.8 × (basis patch) + 0.3 × (basis patch) + 0.5 × (basis patch)]

Represent xi as: ai = [0, 0, …, 0, 0.8, 0, …, 0, 0.3, 0, …, 0, 0.5, …]
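Test-time coding in isolation: with a dictionary held fixed (a random stand-in below), solve the LASSO for one unseen patch x to obtain its sparse code a; λ and the sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
Phi = rng.normal(size=(16, 32))        # stand-in for learned bases (d=16, k=32)
Phi /= np.linalg.norm(Phi, axis=0)
x = rng.normal(size=16)                # an unseen image patch
lam = 0.2

L = np.linalg.norm(Phi, 2) ** 2        # step size from the Lipschitz constant
a = np.zeros(32)
for _ in range(200):                   # ISTA iterations
    grad = Phi.T @ (Phi @ a - x)       # gradient of 0.5*||Phi a - x||^2
    v = a - grad / L
    a = np.sign(v) * np.maximum(np.abs(v) - lam / L, 0.0)   # soft threshold

print(np.count_nonzero(a), "of", a.size, "coefficients are nonzero")
```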
Justifications & Examples
• Probabilistic Interpretation
• Human Visual Cortex
  – Not enforcing orthogonal bases like PCA
  – Over-complete bases preserve more features
    • Scales, orientations
Revisit Autoencoder's Probabilistic Interpretation
• Suppose ϕ (or h) is a set of hidden variables
• Model image x with k independent hidden features ϕi plus additive noise v:

x = Σ_{i=1}^{k} ai ϕi + v(x)

• The goal is to find a set of ϕ such that the posterior P(x|ϕ) is as close as possible to P*(x), i.e., to minimize the KL divergence between the two
…Probabilistic Interpretation
• Minimize the KL divergence between the two distributions:

D(P*(x) ‖ P(x|ϕ)) = ∫ P*(x) log( P*(x) / P(x|ϕ) ) dx

• Since P*(x) is constant across choices of ϕ, maximize the log-likelihood P(x|ϕ):

ϕ* = argmax_ϕ log P(x|ϕ)
…Probabilistic Interpretation
• We need the two terms P(x|a, ϕ) and P(a) because

P(x|ϕ) = ∫ P(x|a, ϕ) P(a) da

• Assume the white noise v is Gaussian with variance σ²:

P(x|a, ϕ) = (1/Z) exp( −‖ x − Σ_{i=1}^{k} ai ϕi ‖² / (2σ²) )

• To determine P(x|ϕ), we need the prior P(a). Assume the independence of the source features:

P(a) = Π_{i=1}^{k} p(ai)
…Probabilistic Interpretation
• Add the sparsity assumption: every image is a product of few features, so we would like the probability distribution of ai to be peaked at zero with high kurtosis; S(ai) controls the shape

P(ai) = (1/Z) exp( −β S(ai) )

P(a) = Π_{i=1}^{k} p(ai)

P(x|ϕ) = ∫ P(x|a, ϕ) P(a) da
…Probabilistic Interpretation
• The problem is reduced to the following over all input data:

ϕ* = argmax_ϕ log P(x|ϕ)

Max Σ_{j=1}^{m} log ∫ P(x|a, ϕ) P(a) da
= Max Σ_{j=1}^{m} log ∫ exp( −‖ x − Σ_i ai ϕi ‖² / (2σ²) ) Π_i exp( −β S(ai) )
= Max Σ_{j=1}^{m} log ∫ exp( −‖ x − Σ_i ai ϕi ‖² − Σ_i β S(ai) )
→ Min Σ_{j=1}^{m} ‖ x^(j) − Σ_{i=1}^{k} ai^(j) ϕi ‖² + λ Σ_{i=1}^{k} S(ai^(j))
…Probabilistic Interpretation
• Maximizing the log-likelihood is equivalent to minimizing the energy function:

ϕ* = argmax_ϕ log P(x|ϕ)

ϕ*, a* = argmin_{ϕ,a} Σ_{j=1}^{m} ‖ x^(j) − Σ_{i=1}^{k} ai^(j) ϕi ‖² + λ Σ_{i=1}^{k} S(ai^(j))

• The choices of S(·), the L1 or log penalty, correspond to the use of the Laplacian and the Cauchy prior, respectively:

P(ai) ∝ exp( −β |ai| )        (Laplacian)
P(ai) ∝ β / (1 + ai²)         (Cauchy)
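The prior-to-penalty correspondence can be made explicit by taking negative logs of the two priors above (normalizing constants absorbed):

```latex
P(a_i) \propto e^{-\beta |a_i|}
  \;\Rightarrow\; -\log P(a_i) = \beta\,|a_i| + \mathrm{const}
  \;\Rightarrow\; S(a_i) = |a_i| \quad \text{(L1 penalty, Laplacian prior)}

P(a_i) \propto \frac{\beta}{1 + a_i^2}
  \;\Rightarrow\; -\log P(a_i) = \log(1 + a_i^2) + \mathrm{const}
  \;\Rightarrow\; S(a_i) = \log(1 + a_i^2) \quad \text{(log penalty, Cauchy prior)}
```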
Justifications & Examples
• Probabilistic Interpretation
• Human Visual Cortex
  – Not enforcing orthogonal bases like PCA
  – Over-complete bases preserve more features
    • Scales, orientations
Feature Invariance • The human visual system works remarkably well • “Mental” model (T. Serre, T. Poggio; MIT 2005)
– Ventral visual pathway – Deep learning
![Page 58: Big Data Analytics - Stanford Universityinfolab.stanford.edu/~echang/BigDat2015/BigDat2015... · Big Data Analytics Architectures, Algorithms and Applications! Part #2: Intro to deep](https://reader035.fdocuments.net/reader035/viewer/2022062602/5ec9c5f8ecfbe4080923514e/html5/thumbnails/58.jpg)
Visual Pathway [Hubel & Wiesel, 1968]
Primary Visual Cortex (V1)
![Page 59: Big Data Analytics - Stanford Universityinfolab.stanford.edu/~echang/BigDat2015/BigDat2015... · Big Data Analytics Architectures, Algorithms and Applications! Part #2: Intro to deep](https://reader035.fdocuments.net/reader035/viewer/2022062602/5ec9c5f8ecfbe4080923514e/html5/thumbnails/59.jpg)
Extrastriate cortex
![Page 60: Big Data Analytics - Stanford Universityinfolab.stanford.edu/~echang/BigDat2015/BigDat2015... · Big Data Analytics Architectures, Algorithms and Applications! Part #2: Intro to deep](https://reader035.fdocuments.net/reader035/viewer/2022062602/5ec9c5f8ecfbe4080923514e/html5/thumbnails/60.jpg)
Extrastriate cortex
![Page 61: Big Data Analytics - Stanford Universityinfolab.stanford.edu/~echang/BigDat2015/BigDat2015... · Big Data Analytics Architectures, Algorithms and Applications! Part #2: Intro to deep](https://reader035.fdocuments.net/reader035/viewer/2022062602/5ec9c5f8ecfbe4080923514e/html5/thumbnails/61.jpg)
Feedforward Path of the Ventral Stream • Invariance (overcomplete)
– V1, starting with scale/position/orientation invariance over a restricted range
– Then invariance to viewpoints and other transformations
• Multi-layer, multi-area (deep) – V2 and V3 (shape): increasing complexity of the optimal stimulus
• Feedforward – First 150 milliseconds of perception – No color information (in V4) – Without feedback
![Page 62: Big Data Analytics - Stanford Universityinfolab.stanford.edu/~echang/BigDat2015/BigDat2015... · Big Data Analytics Architectures, Algorithms and Applications! Part #2: Intro to deep](https://reader035.fdocuments.net/reader035/viewer/2022062602/5ec9c5f8ecfbe4080923514e/html5/thumbnails/62.jpg)
Six Steps of HMAX [T. Serre, T. Poggio; MIT 2005]
![Page 63: Big Data Analytics - Stanford Universityinfolab.stanford.edu/~echang/BigDat2015/BigDat2015... · Big Data Analytics Architectures, Algorithms and Applications! Part #2: Intro to deep](https://reader035.fdocuments.net/reader035/viewer/2022062602/5ec9c5f8ecfbe4080923514e/html5/thumbnails/63.jpg)
Multi-layer Visual Pathway
• Edge detection, multi-scale, multi-direction (on/off, simple) – Using multi-scale, multi-direction Gabor filters
• Edge pooling (max, invariance) – Keep “strong” features
• Unsupervised clustering (or) – Clustering edges into patches
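The first stage above, multi-scale multi-direction edge detection, can be sketched by building a small Gabor filter bank. This is an illustrative construction, not the exact filter bank used in HMAX; all sizes and parameters below are hypothetical.

```python
import numpy as np

def gabor_kernel(size=9, sigma=2.0, theta=0.0, wavelength=4.0, gamma=0.5):
    """One Gabor filter: a Gaussian envelope modulating a sinusoid
    oriented at angle theta (radians)."""
    half = size // 2
    ys, xs = np.mgrid[-half:half + 1, -half:half + 1]
    # rotate the coordinate frame to the filter's orientation
    xr = xs * np.cos(theta) + ys * np.sin(theta)
    yr = -xs * np.sin(theta) + ys * np.cos(theta)
    envelope = np.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
    carrier = np.cos(2 * np.pi * xr / wavelength)
    return envelope * carrier

# a small multi-scale, multi-orientation bank (4 orientations x 2 scales)
bank = [gabor_kernel(theta=t, wavelength=w)
        for t in np.linspace(0, np.pi, 4, endpoint=False)
        for w in (3.0, 5.0)]
```

Convolving an image with each kernel in the bank gives one edge-response map per scale and orientation.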
![Page 64: Big Data Analytics - Stanford Universityinfolab.stanford.edu/~echang/BigDat2015/BigDat2015... · Big Data Analytics Architectures, Algorithms and Applications! Part #2: Intro to deep](https://reader035.fdocuments.net/reader035/viewer/2022062602/5ec9c5f8ecfbe4080923514e/html5/thumbnails/64.jpg)
V1 Like Bases
![Page 65: Big Data Analytics - Stanford Universityinfolab.stanford.edu/~echang/BigDat2015/BigDat2015... · Big Data Analytics Architectures, Algorithms and Applications! Part #2: Intro to deep](https://reader035.fdocuments.net/reader035/viewer/2022062602/5ec9c5f8ecfbe4080923514e/html5/thumbnails/65.jpg)
Multi-layer Visual Pathway
• Part Detection (on/off, simple) – Find matching patches in photos
• Part Pooling (max, invariance) – Identify useful patches/parts
• Supervised Learning – Object ← parts
![Page 66: Big Data Analytics - Stanford Universityinfolab.stanford.edu/~echang/BigDat2015/BigDat2015... · Big Data Analytics Architectures, Algorithms and Applications! Part #2: Intro to deep](https://reader035.fdocuments.net/reader035/viewer/2022062602/5ec9c5f8ecfbe4080923514e/html5/thumbnails/66.jpg)
Edges and Parts
![Page 67: Big Data Analytics - Stanford Universityinfolab.stanford.edu/~echang/BigDat2015/BigDat2015... · Big Data Analytics Architectures, Algorithms and Applications! Part #2: Intro to deep](https://reader035.fdocuments.net/reader035/viewer/2022062602/5ec9c5f8ecfbe4080923514e/html5/thumbnails/67.jpg)
Six Steps of HMAX [T. Serre, T. Poggio; MIT 2005]
• Edge Detection, multi-scale/direction (on/off, simple) – Using multi-scale, multi-orientation Gabor filters
• Edge Pooling (max, invariance) – Keep “strong” features
• Unsupervised Clustering (or) – Clustering edges into patches
• Part Detection (on/off, simple) – Find matching patches in photos
• Part Pooling (max, invariance) – Identify useful patches/parts
• Supervised Learning – Object ← parts
![Page 68: Big Data Analytics - Stanford Universityinfolab.stanford.edu/~echang/BigDat2015/BigDat2015... · Big Data Analytics Architectures, Algorithms and Applications! Part #2: Intro to deep](https://reader035.fdocuments.net/reader035/viewer/2022062602/5ec9c5f8ecfbe4080923514e/html5/thumbnails/68.jpg)
Revisit Challenges of Representation Learning
• Invariance affected by noise – Environmental factors (e.g., lighting conditions, occlusion) – Equipment factors (e.g., different camera brands produce different colors and gamma corrections)
– Aliasing (e.g., cars have different models, hence different features)
• Labeled data is tough to acquire – Robust models require big data
• Selectivity requires good similarity functions
![Page 69: Big Data Analytics - Stanford Universityinfolab.stanford.edu/~echang/BigDat2015/BigDat2015... · Big Data Analytics Architectures, Algorithms and Applications! Part #2: Intro to deep](https://reader035.fdocuments.net/reader035/viewer/2022062602/5ec9c5f8ecfbe4080923514e/html5/thumbnails/69.jpg)
Lecture Outline
• Data Posteriors vs. Human Priors • Learn p(x) from Big Data
– Use NN to Construct Autoencoder – Sparse Coding – Dynamic Partial
• Graphical Models – CNN, MRF, & RBM
• Demo
![Page 70: Big Data Analytics - Stanford Universityinfolab.stanford.edu/~echang/BigDat2015/BigDat2015... · Big Data Analytics Architectures, Algorithms and Applications! Part #2: Intro to deep](https://reader035.fdocuments.net/reader035/viewer/2022062602/5ec9c5f8ecfbe4080923514e/html5/thumbnails/70.jpg)
Example of Sparse Models
• Because the 2nd and 4th elements of w are non-zero, these are the two selected features in x
• globally-aligned sparse representation
x1 [ | | | | | | ]
x2 [ | | | | | | ]
xm [ | | | | | | ]
…
x3 [ | | | | | | ]
[ 0 | 0 | 0 0 ]
[ 0 | 0 | 0 0 ]
[ 0 | 0 | 0 0 ]
…
[ 0 | 0 | 0 0 ]
f(x) = <w,x>, where w=[0, 0.2, 0, 0.1, 0, 0]
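The sparse linear model f(x) = ⟨w, x⟩ above can be made concrete with a tiny sketch; only w comes from the slide, the input values are hypothetical.

```python
def sparse_dot(w, x):
    # only the non-zero weights contribute, so only those
    # dimensions of x are "selected" features
    return sum(wi * xi for wi, xi in zip(w, x) if wi != 0.0)

w = [0.0, 0.2, 0.0, 0.1, 0.0, 0.0]   # weight vector from the slide
x = [5.0, 1.0, 7.0, 2.0, 9.0, 3.0]   # hypothetical input
# f(x) = 0.2 * x[1] + 0.1 * x[3]; every other dimension is ignored
```

Changing any dimension of x other than the 2nd and 4th leaves f(x) unchanged.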
![Page 71: Big Data Analytics - Stanford Universityinfolab.stanford.edu/~echang/BigDat2015/BigDat2015... · Big Data Analytics Architectures, Algorithms and Applications! Part #2: Intro to deep](https://reader035.fdocuments.net/reader035/viewer/2022062602/5ec9c5f8ecfbe4080923514e/html5/thumbnails/71.jpg)
Example of Sparse Activations
• Different x's have different dimensions activated • Locally-shared sparse representation: similar x's tend to have
similar non-zero dimensions, but not all
a1 [ 0 | | | 0 … 0 ]
a2 [ | | | 0 0 … 0 ]
am [ 0 0 0 | | … 0 ]
…
a3 [ | 0 | | 0 … 0 ]
x1
x2 x3
xm
![Page 72: Big Data Analytics - Stanford Universityinfolab.stanford.edu/~echang/BigDat2015/BigDat2015... · Big Data Analytics Architectures, Algorithms and Applications! Part #2: Intro to deep](https://reader035.fdocuments.net/reader035/viewer/2022062602/5ec9c5f8ecfbe4080923514e/html5/thumbnails/72.jpg)
Example of Sparse Activations
• Preserving manifold structure (i.e., clusters, manifolds)
a1 [ | | | 0 0 … 0 ] a2 [ 0 | | | 0 … 0 ]
am [ 0 0 0 0 | … 0 ]
…
a3 [ 0 0 | | | … 0 ]
x1 x2 x3
xm
![Page 73: Big Data Analytics - Stanford Universityinfolab.stanford.edu/~echang/BigDat2015/BigDat2015... · Big Data Analytics Architectures, Algorithms and Applications! Part #2: Intro to deep](https://reader035.fdocuments.net/reader035/viewer/2022062602/5ec9c5f8ecfbe4080923514e/html5/thumbnails/73.jpg)
Similarity Theories
• Objects are similar in all respects (Richardson 1928)
• Objects are similar in some respects (Tversky 1977)
• Similarity is a process of determining respects, rather than using predefined respects (Goldstone 94)
![Page 74: Big Data Analytics - Stanford Universityinfolab.stanford.edu/~echang/BigDat2015/BigDat2015... · Big Data Analytics Architectures, Algorithms and Applications! Part #2: Intro to deep](https://reader035.fdocuments.net/reader035/viewer/2022062602/5ec9c5f8ecfbe4080923514e/html5/thumbnails/74.jpg)
Similarity Theories
• Objects are similar in all or some respects
• Minkowski Function – D = (Σi=1..M (pi − qi)^n)^(1/n)
• Weighted Minkowski Function – D = (Σi=1..M wi (pi − qi)^n)^(1/n)
• The same w is applied to all pairs of objects p and q
[ 0 | 0 | 0 0 ]
[ 0 | 0 | 0 0 ]
[ 0 | 0 | 0 0 ]
…
[ 0 | 0 | 0 0 ]
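A minimal sketch of the two distance functions above, in plain Python (equal-length sequences assumed):

```python
def minkowski(p, q, n=2):
    """D = (sum_i |p_i - q_i|^n)^(1/n)"""
    return sum(abs(pi - qi) ** n for pi, qi in zip(p, q)) ** (1.0 / n)

def weighted_minkowski(p, q, w, n=2):
    """Weighted variant: the SAME weight vector w is applied to
    every pair of objects (p, q)."""
    return sum(wi * abs(pi - qi) ** n
               for wi, pi, qi in zip(w, p, q)) ** (1.0 / n)
```

With n = 2 and all weights equal to 1, both reduce to the Euclidean distance.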
![Page 75: Big Data Analytics - Stanford Universityinfolab.stanford.edu/~echang/BigDat2015/BigDat2015... · Big Data Analytics Architectures, Algorithms and Applications! Part #2: Intro to deep](https://reader035.fdocuments.net/reader035/viewer/2022062602/5ec9c5f8ecfbe4080923514e/html5/thumbnails/75.jpg)
DPF: Dynamic Partial Function [B. Li, E. Chang, et al., MM Systems 2013]
• Similarity is a process of determining respects, rather than using predefined respects (Goldstone 94)
a1 [ 0 | | | 0 … 0 ]
a2 [ | | | 0 0 … 0 ]
am [ 0 0 0 | | … 0 ]
…
a3 [ | 0 | | 0 … 0 ]
a1 [ | | | 0 0 … 0 ] a2 [ 0 | | | 0 … 0 ]
am [ 0 0 0 0 | … 0 ]
…
a3 [ 0 0 | | | … 0 ]
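A minimal sketch of one common formulation of a dynamic partial function: for each pair of objects, keep only the m smallest per-dimension differences (the dynamically chosen "respects") and aggregate them Minkowski-style. The parameterization (m, r) here is an assumption for illustration, not necessarily the exact form in the cited paper.

```python
def dpf(p, q, m, r=2):
    """Dynamic Partial Function: the dimensions compared are chosen
    per pair (the m smallest differences), not fixed in advance."""
    diffs = sorted(abs(pi - qi) for pi, qi in zip(p, q))
    return sum(d ** r for d in diffs[:m]) ** (1.0 / r)
```

With m equal to the full dimensionality, this degenerates to the ordinary Minkowski distance; smaller m ignores the dimensions where the two objects disagree most.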
![Page 76: Big Data Analytics - Stanford Universityinfolab.stanford.edu/~echang/BigDat2015/BigDat2015... · Big Data Analytics Architectures, Algorithms and Applications! Part #2: Intro to deep](https://reader035.fdocuments.net/reader035/viewer/2022062602/5ec9c5f8ecfbe4080923514e/html5/thumbnails/76.jpg)
[Figure: four panels (GIF, Scale up/down, Cropping, Rotation), each plotting Average Distance against Feature Number, with the underlying per-feature distance tables. Most per-feature distances are small, with a few features dominating under each transformation.]
![Page 77: Big Data Analytics - Stanford Universityinfolab.stanford.edu/~echang/BigDat2015/BigDat2015... · Big Data Analytics Architectures, Algorithms and Applications! Part #2: Intro to deep](https://reader035.fdocuments.net/reader035/viewer/2022062602/5ec9c5f8ecfbe4080923514e/html5/thumbnails/77.jpg)
DPF: Dynamic Partial Function [B. Li, E. Chang, et al., MM Systems 2013]
• Which place is similar to Kyoto? • Partial • Dynamic • Dynamic Partial Function
![Page 78: Big Data Analytics - Stanford Universityinfolab.stanford.edu/~echang/BigDat2015/BigDat2015... · Big Data Analytics Architectures, Algorithms and Applications! Part #2: Intro to deep](https://reader035.fdocuments.net/reader035/viewer/2022062602/5ec9c5f8ecfbe4080923514e/html5/thumbnails/78.jpg)
Precision/Recall
![Page 79: Big Data Analytics - Stanford Universityinfolab.stanford.edu/~echang/BigDat2015/BigDat2015... · Big Data Analytics Architectures, Algorithms and Applications! Part #2: Intro to deep](https://reader035.fdocuments.net/reader035/viewer/2022062602/5ec9c5f8ecfbe4080923514e/html5/thumbnails/79.jpg)
Partial, Dynamic: Low-dimensional manifolds
K. Yu and A. Ng, Tutorial: Feature Learning for Image Classification, Part 3: Image Classification using Sparse Coding: Advanced Topics, ECCV 2010.
Data Manifold
Local linear
![Page 80: Big Data Analytics - Stanford Universityinfolab.stanford.edu/~echang/BigDat2015/BigDat2015... · Big Data Analytics Architectures, Algorithms and Applications! Part #2: Intro to deep](https://reader035.fdocuments.net/reader035/viewer/2022062602/5ec9c5f8ecfbe4080923514e/html5/thumbnails/80.jpg)
Part #1 Summary
• Overcomplete Representation • Sparse weighting vector a for x • Autoencoders & Sparse Coding
– Equivalent models – One with an implicit and one with an explicit f(x)
![Page 81: Big Data Analytics - Stanford Universityinfolab.stanford.edu/~echang/BigDat2015/BigDat2015... · Big Data Analytics Architectures, Algorithms and Applications! Part #2: Intro to deep](https://reader035.fdocuments.net/reader035/viewer/2022062602/5ec9c5f8ecfbe4080923514e/html5/thumbnails/81.jpg)
Autoencoders
– also involve activation and reconstruction – but have an explicit f(x), e.g., a sigmoid function – do not necessarily enforce sparsity on a – but if sparsity is put on a, often get improved results [e.g., sparse RBM, Lee et al., NIPS 08]
[Diagram: encoding a = f(x), decoding x′ = g(a)]
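A tiny numpy sketch of the encode/decode pipeline above, with an explicit sigmoid f(x); the layer sizes and random weights are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# hypothetical sizes: 6-dim input, 3-dim code
W_enc = rng.normal(scale=0.1, size=(3, 6))
b_enc = np.zeros(3)
W_dec = rng.normal(scale=0.1, size=(6, 3))
b_dec = np.zeros(6)

def encode(x):            # a = f(x): explicit nonlinear encoder
    return sigmoid(W_enc @ x + b_enc)

def decode(a):            # x' = g(a): reconstruction
    return sigmoid(W_dec @ a + b_dec)

x = rng.normal(size=6)
a = encode(x)
x_rec = decode(a)
loss = np.sum((x - x_rec) ** 2)   # reconstruction error to minimize
```

Training adjusts the four weight arrays to minimize `loss`; adding a sparsity penalty on `a` gives the sparse variants mentioned above.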
![Page 82: Big Data Analytics - Stanford Universityinfolab.stanford.edu/~echang/BigDat2015/BigDat2015... · Big Data Analytics Architectures, Algorithms and Applications! Part #2: Intro to deep](https://reader035.fdocuments.net/reader035/viewer/2022062602/5ec9c5f8ecfbe4080923514e/html5/thumbnails/82.jpg)
Sparse Coding
$$
\min_{a, \phi} \; \sum_{i=1}^{m} \Big\| x_i - \sum_{j=1}^{k} a_{i,j} \phi_j \Big\|^{2} \;+\; \lambda \sum_{i=1}^{m} \sum_{j=1}^{k} \lvert a_{i,j} \rvert
$$
– a is sparse – a is often of higher dimension than x – activation a = f(x) is a nonlinear, implicit function of x – reconstruction x′ = g(a) is linear & explicit
[Diagram: encoding a = f(x), decoding x′ = g(a)]
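The sparse coding objective can be evaluated, and the code a optimized, with an iterative shrinkage-thresholding (ISTA) step. This is a generic sketch of that standard technique, not the solver used in any particular paper.

```python
import numpy as np

def objective(X, Phi, A, lam):
    """sum_i ||x_i - sum_j a_ij phi_j||^2 + lam * sum_ij |a_ij|.
    X: (m, d) inputs, Phi: (d, k) dictionary columns, A: (m, k) codes."""
    return np.sum((X - A @ Phi.T) ** 2) + lam * np.sum(np.abs(A))

def ista_step(x, Phi, a, lam, lr=0.1):
    """One shrinkage-thresholding step on the code a: a gradient step
    on the squared error, then a soft-threshold for the L1 term."""
    a = a - lr * (-2.0 * Phi.T @ (x - Phi @ a))
    return np.sign(a) * np.maximum(np.abs(a) - lr * lam, 0.0)
```

Each step decreases the objective; the soft-threshold is what drives most entries of a exactly to zero.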
![Page 83: Big Data Analytics - Stanford Universityinfolab.stanford.edu/~echang/BigDat2015/BigDat2015... · Big Data Analytics Architectures, Algorithms and Applications! Part #2: Intro to deep](https://reader035.fdocuments.net/reader035/viewer/2022062602/5ec9c5f8ecfbe4080923514e/html5/thumbnails/83.jpg)
Hierarchical Sparse Coding
Sparse Coding Pooling Sparse Coding Pooling
Learning from unlabeled data
Yu, Lin, & Lafferty, CVPR 11; Matthew D. Zeiler, Graham W. Taylor, and Rob Fergus, ICCV 11
![Page 84: Big Data Analytics - Stanford Universityinfolab.stanford.edu/~echang/BigDat2015/BigDat2015... · Big Data Analytics Architectures, Algorithms and Applications! Part #2: Intro to deep](https://reader035.fdocuments.net/reader035/viewer/2022062602/5ec9c5f8ecfbe4080923514e/html5/thumbnails/84.jpg)
DEEP MODELS CNN, MRF & RBM
![Page 85: Big Data Analytics - Stanford Universityinfolab.stanford.edu/~echang/BigDat2015/BigDat2015... · Big Data Analytics Architectures, Algorithms and Applications! Part #2: Intro to deep](https://reader035.fdocuments.net/reader035/viewer/2022062602/5ec9c5f8ecfbe4080923514e/html5/thumbnails/85.jpg)
Recap NN
• Other network architectures – how the different neurons are connected to each other
[Diagram: Layer 1 → Layer 2 → Layer 3 → Layer 4] In a traditional NN, neurons in a layer are fully connected to all neurons in the next layer.
![Page 86: Big Data Analytics - Stanford Universityinfolab.stanford.edu/~echang/BigDat2015/BigDat2015... · Big Data Analytics Architectures, Algorithms and Applications! Part #2: Intro to deep](https://reader035.fdocuments.net/reader035/viewer/2022062602/5ec9c5f8ecfbe4080923514e/html5/thumbnails/86.jpg)
CNN: NN Considers Sparse Coding
![Page 87: Big Data Analytics - Stanford Universityinfolab.stanford.edu/~echang/BigDat2015/BigDat2015... · Big Data Analytics Architectures, Algorithms and Applications! Part #2: Intro to deep](https://reader035.fdocuments.net/reader035/viewer/2022062602/5ec9c5f8ecfbe4080923514e/html5/thumbnails/87.jpg)
The replicated feature approach (Hinton: the dominant approach for neural networks)
• Use many different copies of the same feature detector at different positions. – Could also replicate across scale and orientation (tricky and expensive)
– Replication greatly reduces the number of free parameters to be learned.
• Use several different feature types, each with its own map of replicated detectors. – Allows each patch of the image to be represented in several ways → overcomplete
The red connections all have the same weight.
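Weight replication can be sketched as one shared kernel slid over every image position; the number of free parameters is the kernel size, independent of the image size. A naive sketch (no flipping, i.e., cross-correlation), assuming 2-D arrays:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Apply ONE shared-weight feature detector at every position
    ('valid' region only): the replicated-feature idea in code."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # same kernel weights reused at every (i, j)
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out
```

A 3×3 kernel has 9 free parameters whether the image is 28×28 or 256×256; a fully connected layer would need one weight per input pixel per unit.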
![Page 88: Big Data Analytics - Stanford Universityinfolab.stanford.edu/~echang/BigDat2015/BigDat2015... · Big Data Analytics Architectures, Algorithms and Applications! Part #2: Intro to deep](https://reader035.fdocuments.net/reader035/viewer/2022062602/5ec9c5f8ecfbe4080923514e/html5/thumbnails/88.jpg)
CNN Architecture: Convolutional Layers
Spatially-local correlation – Spatial information is encoded in the network – Sparse connectivity
[Diagram: Layer 1 → Layer 2 of a partial (convolutional) layer; each unit connects only to a local patch of the layer below]
![Page 89: Big Data Analytics - Stanford Universityinfolab.stanford.edu/~echang/BigDat2015/BigDat2015... · Big Data Analytics Architectures, Algorithms and Applications! Part #2: Intro to deep](https://reader035.fdocuments.net/reader035/viewer/2022062602/5ec9c5f8ecfbe4080923514e/html5/thumbnails/89.jpg)
![Page 90: Big Data Analytics - Stanford Universityinfolab.stanford.edu/~echang/BigDat2015/BigDat2015... · Big Data Analytics Architectures, Algorithms and Applications! Part #2: Intro to deep](https://reader035.fdocuments.net/reader035/viewer/2022062602/5ec9c5f8ecfbe4080923514e/html5/thumbnails/90.jpg)
![Page 91: Big Data Analytics - Stanford Universityinfolab.stanford.edu/~echang/BigDat2015/BigDat2015... · Big Data Analytics Architectures, Algorithms and Applications! Part #2: Intro to deep](https://reader035.fdocuments.net/reader035/viewer/2022062602/5ec9c5f8ecfbe4080923514e/html5/thumbnails/91.jpg)
![Page 92: Big Data Analytics - Stanford Universityinfolab.stanford.edu/~echang/BigDat2015/BigDat2015... · Big Data Analytics Architectures, Algorithms and Applications! Part #2: Intro to deep](https://reader035.fdocuments.net/reader035/viewer/2022062602/5ec9c5f8ecfbe4080923514e/html5/thumbnails/92.jpg)
![Page 93: Big Data Analytics - Stanford Universityinfolab.stanford.edu/~echang/BigDat2015/BigDat2015... · Big Data Analytics Architectures, Algorithms and Applications! Part #2: Intro to deep](https://reader035.fdocuments.net/reader035/viewer/2022062602/5ec9c5f8ecfbe4080923514e/html5/thumbnails/93.jpg)
Pooling the Outputs of Replicated Feature Detectors
Get a small amount of translational invariance at each level by averaging four neighboring replicated detectors to give a single output to the next level.
– This reduces the number of inputs to the next layer of feature extraction, thus allowing us to have many more different feature maps.
– Taking the maximum of the four (like HMAX) works slightly better (G. Hinton).
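A sketch of both pooling variants over non-overlapping 2×2 neighborhoods of detector outputs:

```python
import numpy as np

def pool2x2(fmap, mode="max"):
    """Combine each non-overlapping 2x2 block of replicated-detector
    outputs into one value: averaging, or the max (as in HMAX)."""
    H, W = fmap.shape
    # group the map into H/2 x W/2 blocks of shape 2x2
    blocks = fmap[:H - H % 2, :W - W % 2].reshape(H // 2, 2, W // 2, 2)
    return blocks.max(axis=(1, 3)) if mode == "max" else blocks.mean(axis=(1, 3))
```

Either way the next layer sees a quarter of the inputs, which is what frees up capacity for more feature maps.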
![Page 94: Big Data Analytics - Stanford Universityinfolab.stanford.edu/~echang/BigDat2015/BigDat2015... · Big Data Analytics Architectures, Algorithms and Applications! Part #2: Intro to deep](https://reader035.fdocuments.net/reader035/viewer/2022062602/5ec9c5f8ecfbe4080923514e/html5/thumbnails/94.jpg)
Convolutional Networks [LeCun 97]
• Convolution (feature detection) • Sub-sampling (multi-scale) • Perform C & S iteratively to form a deep-learning network
• Learning weights from data
• Location information (where an object is) is lost
![Page 95: Big Data Analytics - Stanford Universityinfolab.stanford.edu/~echang/BigDat2015/BigDat2015... · Big Data Analytics Architectures, Algorithms and Applications! Part #2: Intro to deep](https://reader035.fdocuments.net/reader035/viewer/2022062602/5ec9c5f8ecfbe4080923514e/html5/thumbnails/95.jpg)
The 82 errors made by LeNet5
Notice that most of the errors are cases that people find quite easy.
The human error rate is probably 20 to 30 errors, but nobody has had the patience to measure it.
Hinton NIPS 2013
![Page 96: Big Data Analytics - Stanford Universityinfolab.stanford.edu/~echang/BigDat2015/BigDat2015... · Big Data Analytics Architectures, Algorithms and Applications! Part #2: Intro to deep](https://reader035.fdocuments.net/reader035/viewer/2022062602/5ec9c5f8ecfbe4080923514e/html5/thumbnails/96.jpg)
Ciresan’s brute-force approach • LeNet uses knowledge about the invariances to design: – the local connectivity – the weight-sharing – the pooling.
• Achieves about 80 errors – This can be reduced to about 40 errors by using many different transformations of the input and other tricks (Ranzato 2008)
• Ciresan et al. (2010) inject knowledge of invariances by creating a huge amount of carefully designed extra training data: – For each training image, they produce many new training examples by applying many different transformations.
– They can then train a large, deep, dumb net on a GPU without much overfitting.
• Improves to 35 errors
Hinton NIPS 2013
![Page 97: Big Data Analytics - Stanford Universityinfolab.stanford.edu/~echang/BigDat2015/BigDat2015... · Big Data Analytics Architectures, Algorithms and Applications! Part #2: Intro to deep](https://reader035.fdocuments.net/reader035/viewer/2022062602/5ec9c5f8ecfbe4080923514e/html5/thumbnails/97.jpg)
The errors made by the Ciresan et al. net
The top printed digit is the right answer. The bottom two printed digits are the network’s best two guesses. The right answer is almost always in the top 2 guesses. With model averaging they can now get about 25 errors.
Hinton NIPS 2013
![Page 98: Big Data Analytics - Stanford Universityinfolab.stanford.edu/~echang/BigDat2015/BigDat2015... · Big Data Analytics Architectures, Algorithms and Applications! Part #2: Intro to deep](https://reader035.fdocuments.net/reader035/viewer/2022062602/5ec9c5f8ecfbe4080923514e/html5/thumbnails/98.jpg)
From hand-written digits to 3-D objects
• Recognizing real objects in color photographs downloaded from the web is much more complicated than recognizing hand-written digits: – A hundred times as many classes (1,000 vs. 10) – A hundred times as many pixels (256 × 256 color vs. 28 × 28 gray) – Two-dimensional images of three-dimensional scenes – Cluttered scenes requiring segmentation – Multiple objects in each image.
• Will the same type of CNN work?
Hinton NIPS 2013
![Page 99: Big Data Analytics - Stanford Universityinfolab.stanford.edu/~echang/BigDat2015/BigDat2015... · Big Data Analytics Architectures, Algorithms and Applications! Part #2: Intro to deep](https://reader035.fdocuments.net/reader035/viewer/2022062602/5ec9c5f8ecfbe4080923514e/html5/thumbnails/99.jpg)
The ILSVRC-2012 competition on ImageNet
• The dataset has 1.2 million high-resolution training images.
• The classification task: – Get the “correct” class in your top 5 bets. There are 1,000 classes.
• The localization task: – For each bet, put a box around the object. Your box must have at least 50% overlap with the correct box.
Hinton NIPS 2013
![Page 100: Big Data Analytics - Stanford Universityinfolab.stanford.edu/~echang/BigDat2015/BigDat2015... · Big Data Analytics Architectures, Algorithms and Applications! Part #2: Intro to deep](https://reader035.fdocuments.net/reader035/viewer/2022062602/5ec9c5f8ecfbe4080923514e/html5/thumbnails/100.jpg)
Examples
Hinton NIPS 2013
![Page 101: Big Data Analytics - Stanford Universityinfolab.stanford.edu/~echang/BigDat2015/BigDat2015... · Big Data Analytics Architectures, Algorithms and Applications! Part #2: Intro to deep](https://reader035.fdocuments.net/reader035/viewer/2022062602/5ec9c5f8ecfbe4080923514e/html5/thumbnails/101.jpg)
Error rates on the ILSVRC-2012 competition

| Team | classification | classification & localization |
| --- | --- | --- |
| University of Tokyo | 26.1% | 53.6% |
| Oxford University Computer Vision Group | 26.9% | 50.0% |
| INRIA (French national research institute in CS) + XRCE (Xerox Research Centre Europe) | 27.0% | – |
| University of Amsterdam | 29.5% | – |
| University of Toronto (Alex Krizhevsky) | 16.4% | 34.1% |
Hinton NIPS 2013
![Page 102: Big Data Analytics - Stanford Universityinfolab.stanford.edu/~echang/BigDat2015/BigDat2015... · Big Data Analytics Architectures, Algorithms and Applications! Part #2: Intro to deep](https://reader035.fdocuments.net/reader035/viewer/2022062602/5ec9c5f8ecfbe4080923514e/html5/thumbnails/102.jpg)
A neural network for ImageNet
• Alex Krizhevsky (NIPS 2012) developed a very deep convolutional neural net of the type pioneered by Yann LeCun. Its architecture was: – 7 hidden layers, not counting some max-pooling layers.
– The early layers were convolutional.
– The last two layers were globally connected.
• The activation functions were:
– Rectified linear units in every hidden layer, f(x) = max(0, x). These train much faster and are more expressive than logistic units.
– Competitive normalization to suppress hidden activities when nearby units have stronger activities. This helps with variations in intensity.
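The rectified linear unit is just f(x) = max(0, x); the `normalize` function below is a deliberately simplified stand-in for the competitive normalization described above, not AlexNet's exact local response normalization.

```python
import numpy as np

def relu(z):
    # rectified linear unit: f(x) = max(0, x)
    return np.maximum(0.0, z)

def normalize(a, eps=1.0):
    """Simplified competitive normalization: a unit's output shrinks
    when the total activity of its neighbours is large."""
    return a / (eps + np.sum(a ** 2))
```

The normalization never increases an activation, and strongly active neighborhoods suppress each unit the most.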
Hinton NIPS 2013
![Page 103: Big Data Analytics - Stanford Universityinfolab.stanford.edu/~echang/BigDat2015/BigDat2015... · Big Data Analytics Architectures, Algorithms and Applications! Part #2: Intro to deep](https://reader035.fdocuments.net/reader035/viewer/2022062602/5ec9c5f8ecfbe4080923514e/html5/thumbnails/103.jpg)
Tricks that significantly improve generalization
• Bagging: Train on random 224×224 patches from the 256×256 images to get more data. Also use left-right reflections of the images. At test time, combine the opinions from ten different patches: the four 224×224 corner patches plus the central 224×224 patch, plus the reflections of those five patches.
• Dropout (sparsification): Use “dropout” to regularize the weights in the globally connected layers (which contain most of the parameters). Dropout means that half of the hidden units in a layer are randomly removed for each training example. This stops hidden units from relying too much on other hidden units.
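The ten test-time patches described above (four corners plus centre, and their reflections) can be sketched as follows, assuming a 2-D grayscale array for simplicity:

```python
import numpy as np

def ten_patches(image, c=224):
    """Four corner crops + centre crop, plus the left-right
    reflections of those five: ten views of one test image."""
    H, W = image.shape[:2]
    corners = [(0, 0), (0, W - c), (H - c, 0), (H - c, W - c),
               ((H - c) // 2, (W - c) // 2)]
    crops = [image[t:t + c, l:l + c] for t, l in corners]
    return crops + [np.fliplr(v) for v in crops]
```

At test time the network's predictions on the ten views are averaged; at training time, random crop positions serve the data-augmentation role.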
Hinton NIPS 2013
![Page 104: Big Data Analytics - Stanford Universityinfolab.stanford.edu/~echang/BigDat2015/BigDat2015... · Big Data Analytics Architectures, Algorithms and Applications! Part #2: Intro to deep](https://reader035.fdocuments.net/reader035/viewer/2022062602/5ec9c5f8ecfbe4080923514e/html5/thumbnails/104.jpg)
Dropout: An efficient way to average many large neural nets (http://arxiv.org/abs/1207.0580)
• Consider a neural net with one hidden layer.
• Each time we present a training example, we randomly omit each hidden unit with probability 0.5.
• So we are randomly sampling from 2^H different architectures. All architectures share weights.
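A sketch of dropout with "inverted" scaling (dividing by 1 − p at training time, a common variant, so no weight rescaling is needed at test time):

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(h, p=0.5):
    """Zero each hidden unit independently with probability p; scale
    the survivors by 1/(1-p) so the expected activation is unchanged."""
    mask = rng.random(h.shape) >= p
    return h * mask / (1.0 - p)
```

Each call samples one of the 2^H sub-architectures; because the surviving units keep the shared weights, every sampled model is trained a little on every example it sees.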
Hinton NIPS 2013
![Page 105: Big Data Analytics - Stanford Universityinfolab.stanford.edu/~echang/BigDat2015/BigDat2015... · Big Data Analytics Architectures, Algorithms and Applications! Part #2: Intro to deep](https://reader035.fdocuments.net/reader035/viewer/2022062602/5ec9c5f8ecfbe4080923514e/html5/thumbnails/105.jpg)
Dropout as a form of model averaging (Bagging)
• We sample from 2^H models, so only a few of the models ever get trained, and each gets at most one training example. – This is as extreme as bagging can get.
• The sharing of the weights means that every model is very strongly regularized. – It’s a much better regularizer than L2 or L1 penalties that pull the weights towards zero.
![Page 106: Big Data Analytics - Stanford Universityinfolab.stanford.edu/~echang/BigDat2015/BigDat2015... · Big Data Analytics Architectures, Algorithms and Applications! Part #2: Intro to deep](https://reader035.fdocuments.net/reader035/viewer/2022062602/5ec9c5f8ecfbe4080923514e/html5/thumbnails/106.jpg)
![Page 107: Big Data Analytics - Stanford Universityinfolab.stanford.edu/~echang/BigDat2015/BigDat2015... · Big Data Analytics Architectures, Algorithms and Applications! Part #2: Intro to deep](https://reader035.fdocuments.net/reader035/viewer/2022062602/5ec9c5f8ecfbe4080923514e/html5/thumbnails/107.jpg)
DEEP MODELS: CNN, MRF & RBM
![Page 108: Big Data Analytics - Stanford Universityinfolab.stanford.edu/~echang/BigDat2015/BigDat2015... · Big Data Analytics Architectures, Algorithms and Applications! Part #2: Intro to deep](https://reader035.fdocuments.net/reader035/viewer/2022062602/5ec9c5f8ecfbe4080923514e/html5/thumbnails/108.jpg)
Russ S. KDD 04 Tutorial
![Page 109: Big Data Analytics - Stanford Universityinfolab.stanford.edu/~echang/BigDat2015/BigDat2015... · Big Data Analytics Architectures, Algorithms and Applications! Part #2: Intro to deep](https://reader035.fdocuments.net/reader035/viewer/2022062602/5ec9c5f8ecfbe4080923514e/html5/thumbnails/109.jpg)
Directed Graphs: Bayesian Networks

General factorization: $p(\mathbf{x}) = \prod_k p(x_k \mid \mathrm{pa}_k)$, where $\mathrm{pa}_k$ denotes the parents of $x_k$.
![Page 110: Big Data Analytics - Stanford Universityinfolab.stanford.edu/~echang/BigDat2015/BigDat2015... · Big Data Analytics Architectures, Algorithms and Applications! Part #2: Intro to deep](https://reader035.fdocuments.net/reader035/viewer/2022062602/5ec9c5f8ecfbe4080923514e/html5/thumbnails/110.jpg)
![Page 111: Big Data Analytics - Stanford Universityinfolab.stanford.edu/~echang/BigDat2015/BigDat2015... · Big Data Analytics Architectures, Algorithms and Applications! Part #2: Intro to deep](https://reader035.fdocuments.net/reader035/viewer/2022062602/5ec9c5f8ecfbe4080923514e/html5/thumbnails/111.jpg)
"Explaining Away"
• Causal inference for directed graphs has one subtlety.
• Illustration: the pixel colour in an image. The observed image colour has two causes, the surface colour and the lighting colour (surface colour → image colour ← lighting colour).
C. Bishop, ECCV tutorial
![Page 112: Big Data Analytics - Stanford Universityinfolab.stanford.edu/~echang/BigDat2015/BigDat2015... · Big Data Analytics Architectures, Algorithms and Applications! Part #2: Intro to deep](https://reader035.fdocuments.net/reader035/viewer/2022062602/5ec9c5f8ecfbe4080923514e/html5/thumbnails/112.jpg)
Shortcomings of Back-propagation
• It requires labeled training data.
  – Almost all data is unlabeled.
• The learning time does not scale well.
  – It is very slow in networks with multiple hidden layers.
  – Backward pass: the signal dE/dy diminishes as the number of layers increases.
• It can get stuck in poor local optima.
  – These are often quite good, but for deep nets they are far from optimal.
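The diminishing backward signal can be seen numerically: a sigmoid unit scales the error signal by at most 0.25 per layer, so even in this best case ten layers shrink it by roughly a factor of a million. A small sketch:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# A sigmoid layer scales the backward signal by sigma'(z) = sigma(z) * (1 - sigma(z)),
# which is at most 0.25 (attained at z = 0). Even in this best case the
# signal shrinks geometrically with depth.
signal = 1.0
for layer in range(10):
    z = 0.0                                    # steepest point of the sigmoid
    signal *= sigmoid(z) * (1.0 - sigmoid(z))  # multiply by 0.25 per layer
print(signal)  # 0.25**10, about 9.5e-7
```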
![Page 113: Big Data Analytics - Stanford Universityinfolab.stanford.edu/~echang/BigDat2015/BigDat2015... · Big Data Analytics Architectures, Algorithms and Applications! Part #2: Intro to deep](https://reader035.fdocuments.net/reader035/viewer/2022062602/5ec9c5f8ecfbe4080923514e/html5/thumbnails/113.jpg)
MRF & RBM: Directed → Undirected Graphs
![Page 114: Big Data Analytics - Stanford Universityinfolab.stanford.edu/~echang/BigDat2015/BigDat2015... · Big Data Analytics Architectures, Algorithms and Applications! Part #2: Intro to deep](https://reader035.fdocuments.net/reader035/viewer/2022062602/5ec9c5f8ecfbe4080923514e/html5/thumbnails/114.jpg)
Markov Random Field (MRF) Components
• A set of sites or pixels P = {1, …, m}: each pixel is a site.
• Each pixel's neighborhood N = {N_p | p ∈ P}.
• A set of random variables (a random field), one for each pixel: X = {X_p | p ∈ P}, denoting the label at each pixel. Each random variable takes a value x_p from the set of labels L = {l_1, …, l_k}.
• A joint event {X_1 = x_1, …, X_m = x_m}, or configuration, abbreviated as X = x.
• The joint probability of such a configuration: p(X = x), or p(x). There are k^m possible configurations.
From slides by S. Seitz, University of Washington
![Page 115: Big Data Analytics - Stanford Universityinfolab.stanford.edu/~echang/BigDat2015/BigDat2015... · Big Data Analytics Architectures, Algorithms and Applications! Part #2: Intro to deep](https://reader035.fdocuments.net/reader035/viewer/2022062602/5ec9c5f8ecfbe4080923514e/html5/thumbnails/115.jpg)
Markov Random Field: Hammersley-Clifford Theorem
• The joint distribution p(x) is a product of non-negative functions over the cliques (neighbourhoods) of the graph:

$$p(\mathbf{x}) = \frac{1}{Z} \prod_C \psi_C(\mathbf{x}_C)$$

• where the $\psi_C(\mathbf{x}_C)$ are the clique potentials, and Z is a normalization constant.
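A toy illustration of the theorem, assuming a made-up 3-node chain MRF with binary labels and an illustrative clique potential that favours equal neighbouring labels; Z is computed by brute-force enumeration of the k^m = 2^3 configurations:

```python
import itertools
import numpy as np

# A 3-node chain MRF with binary labels and one potential per edge
# (clique): psi(a, b) favours equal neighbouring labels.
psi = np.array([[2.0, 1.0],
                [1.0, 2.0]])   # illustrative clique potential

def unnormalised(x):
    """Product of clique potentials over the two edges of the chain."""
    return psi[x[0], x[1]] * psi[x[1], x[2]]

# Normalisation constant Z sums over all k^m = 2^3 configurations.
Z = sum(unnormalised(x) for x in itertools.product([0, 1], repeat=3))

def p(x):
    return unnormalised(x) / Z

print(p((0, 0, 0)))   # an all-equal configuration: 4/18, about 0.222
```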
![Page 116: Big Data Analytics - Stanford Universityinfolab.stanford.edu/~echang/BigDat2015/BigDat2015... · Big Data Analytics Architectures, Algorithms and Applications! Part #2: Intro to deep](https://reader035.fdocuments.net/reader035/viewer/2022062602/5ec9c5f8ecfbe4080923514e/html5/thumbnails/116.jpg)
![Page 117: Big Data Analytics - Stanford Universityinfolab.stanford.edu/~echang/BigDat2015/BigDat2015... · Big Data Analytics Architectures, Algorithms and Applications! Part #2: Intro to deep](https://reader035.fdocuments.net/reader035/viewer/2022062602/5ec9c5f8ecfbe4080923514e/html5/thumbnails/117.jpg)
![Page 118: Big Data Analytics - Stanford Universityinfolab.stanford.edu/~echang/BigDat2015/BigDat2015... · Big Data Analytics Architectures, Algorithms and Applications! Part #2: Intro to deep](https://reader035.fdocuments.net/reader035/viewer/2022062602/5ec9c5f8ecfbe4080923514e/html5/thumbnails/118.jpg)
Equilibrium Interpretation

$$\frac{\partial L(\theta)}{\partial \theta_{ij}} = E_{P_{\mathrm{data}}}[x_i x_j] - E_{P_{\theta}}[x_i x_j]$$

• The first term: the expected value of the product of states at thermal equilibrium when the training data is clamped on the visible units.
• The second term: the expected value of the product of states at thermal equilibrium when nothing is clamped.
![Page 119: Big Data Analytics - Stanford Universityinfolab.stanford.edu/~echang/BigDat2015/BigDat2015... · Big Data Analytics Architectures, Algorithms and Applications! Part #2: Intro to deep](https://reader035.fdocuments.net/reader035/viewer/2022062602/5ec9c5f8ecfbe4080923514e/html5/thumbnails/119.jpg)
![Page 120: Big Data Analytics - Stanford Universityinfolab.stanford.edu/~echang/BigDat2015/BigDat2015... · Big Data Analytics Architectures, Algorithms and Applications! Part #2: Intro to deep](https://reader035.fdocuments.net/reader035/viewer/2022062602/5ec9c5f8ecfbe4080923514e/html5/thumbnails/120.jpg)
![Page 121: Big Data Analytics - Stanford Universityinfolab.stanford.edu/~echang/BigDat2015/BigDat2015... · Big Data Analytics Architectures, Algorithms and Applications! Part #2: Intro to deep](https://reader035.fdocuments.net/reader035/viewer/2022062602/5ec9c5f8ecfbe4080923514e/html5/thumbnails/121.jpg)
![Page 122: Big Data Analytics - Stanford Universityinfolab.stanford.edu/~echang/BigDat2015/BigDat2015... · Big Data Analytics Architectures, Algorithms and Applications! Part #2: Intro to deep](https://reader035.fdocuments.net/reader035/viewer/2022062602/5ec9c5f8ecfbe4080923514e/html5/thumbnails/122.jpg)
![Page 123: Big Data Analytics - Stanford Universityinfolab.stanford.edu/~echang/BigDat2015/BigDat2015... · Big Data Analytics Architectures, Algorithms and Applications! Part #2: Intro to deep](https://reader035.fdocuments.net/reader035/viewer/2022062602/5ec9c5f8ecfbe4080923514e/html5/thumbnails/123.jpg)
![Page 124: Big Data Analytics - Stanford Universityinfolab.stanford.edu/~echang/BigDat2015/BigDat2015... · Big Data Analytics Architectures, Algorithms and Applications! Part #2: Intro to deep](https://reader035.fdocuments.net/reader035/viewer/2022062602/5ec9c5f8ecfbe4080923514e/html5/thumbnails/124.jpg)
Model Learning: Similar to MRF

$$\frac{\partial L(\theta)}{\partial \theta_{ij}} = E_{P_{\mathrm{data}}}[v_i h_j] - E_{P_{\theta}}[v_i h_j]$$

• The data-dependent term $E_{P_{\mathrm{data}}}[v_i h_j]$ is simple to compute.
• The model term $E_{P_{\theta}}[v_i h_j]$ is expensive to compute, with an exponential number of configurations (over all possible images): use MCMC.
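In practice the model expectation is often approximated with a single Gibbs step (contrastive divergence, CD-1) rather than a long MCMC run. A minimal sketch for a binary RBM, with biases omitted for brevity:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cd1_update(W, v0, lr=0.1):
    """One CD-1 weight update for a binary RBM (biases omitted).
    E_Pdata[v h] comes from the data-clamped up-pass; E_Ptheta[v h]
    is approximated after a single Gibbs step."""
    h0 = sigmoid(v0 @ W)                         # p(h = 1 | v0), data clamped
    h_sample = (rng.random(h0.shape) < h0) * 1.0
    v1 = sigmoid(h_sample @ W.T)                 # one-step reconstruction
    h1 = sigmoid(v1 @ W)
    grad = np.outer(v0, h0) - np.outer(v1, h1)   # data term - model term
    return W + lr * grad

W = rng.normal(0, 0.1, size=(6, 3))              # 6 visible, 3 hidden units
v = np.array([1.0, 1, 0, 0, 1, 0])
W = cd1_update(W, v)
```

The data term uses the hidden probabilities given the clamped training vector; the model term uses the one-step reconstruction in place of the equilibrium statistics.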
![Page 125: Big Data Analytics - Stanford Universityinfolab.stanford.edu/~echang/BigDat2015/BigDat2015... · Big Data Analytics Architectures, Algorithms and Applications! Part #2: Intro to deep](https://reader035.fdocuments.net/reader035/viewer/2022062602/5ec9c5f8ecfbe4080923514e/html5/thumbnails/125.jpg)
![Page 126: Big Data Analytics - Stanford Universityinfolab.stanford.edu/~echang/BigDat2015/BigDat2015... · Big Data Analytics Architectures, Algorithms and Applications! Part #2: Intro to deep](https://reader035.fdocuments.net/reader035/viewer/2022062602/5ec9c5f8ecfbe4080923514e/html5/thumbnails/126.jpg)
![Page 127: Big Data Analytics - Stanford Universityinfolab.stanford.edu/~echang/BigDat2015/BigDat2015... · Big Data Analytics Architectures, Algorithms and Applications! Part #2: Intro to deep](https://reader035.fdocuments.net/reader035/viewer/2022062602/5ec9c5f8ecfbe4080923514e/html5/thumbnails/127.jpg)
![Page 128: Big Data Analytics - Stanford Universityinfolab.stanford.edu/~echang/BigDat2015/BigDat2015... · Big Data Analytics Architectures, Algorithms and Applications! Part #2: Intro to deep](https://reader035.fdocuments.net/reader035/viewer/2022062602/5ec9c5f8ecfbe4080923514e/html5/thumbnails/128.jpg)
![Page 129: Big Data Analytics - Stanford Universityinfolab.stanford.edu/~echang/BigDat2015/BigDat2015... · Big Data Analytics Architectures, Algorithms and Applications! Part #2: Intro to deep](https://reader035.fdocuments.net/reader035/viewer/2022062602/5ec9c5f8ecfbe4080923514e/html5/thumbnails/129.jpg)
Latest ImageNet Competition Update
![Page 130: Big Data Analytics - Stanford Universityinfolab.stanford.edu/~echang/BigDat2015/BigDat2015... · Big Data Analytics Architectures, Algorithms and Applications! Part #2: Intro to deep](https://reader035.fdocuments.net/reader035/viewer/2022062602/5ec9c5f8ecfbe4080923514e/html5/thumbnails/130.jpg)
Key References
• Deep Learning video lectures: http://videolectures.net/Top/Computer_Science/Machine_Learning/Deep_Learning/
• A Data-Driven Study on Image Feature Extraction and Fusion, Zhiyu Wang, Fangtao Li, Edward Y. Chang, and Shiqiang Yang, Google Technical Report, April 2012.
• Foundations of Large-Scale Multimedia Information Management and Retrieval, E. Y. Chang, Springer, 2011.
• Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations, Honglak Lee, Roger Grosse, Rajesh Ranganath, and Andrew Y. Ng, Proceedings of the Twenty-Sixth International Conference on Machine Learning, 2009.
• Robust Object Recognition with Cortex-like Mechanisms, T. Serre, L. Wolf, S. Bileschi, M. Riesenhuber, and T. Poggio, IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(3):411–426, 2007.
• Object Recognition from Local Scale-Invariant Features, D. G. Lowe, IEEE International Conference on Computer Vision (ICCV), 1999.
![Page 131: Big Data Analytics - Stanford Universityinfolab.stanford.edu/~echang/BigDat2015/BigDat2015... · Big Data Analytics Architectures, Algorithms and Applications! Part #2: Intro to deep](https://reader035.fdocuments.net/reader035/viewer/2022062602/5ec9c5f8ecfbe4080923514e/html5/thumbnails/131.jpg)
…Key References
• A Tutorial on Energy-Based Learning, Yann LeCun et al., Predicting Structured Data, MIT Press, 2006.
• Dropout: A Simple Way to Prevent Neural Networks from Overfitting, N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, Journal of Machine Learning Research, 2014.
• A Fast Learning Algorithm for Deep Belief Nets, G. Hinton, S. Osindero, and Y. Teh, Neural Computation, 2006.
• Representation Learning Tutorial, Yoshua Bengio, ICML 2012.
• Representation Learning: A Review and New Perspectives, Y. Bengio, A. Courville, and P. Vincent, April 2014.
• Convolutional Networks for Images, Speech, and Time Series, Y. LeCun and Y. Bengio, The Handbook of Brain Theory and Neural Networks, 3361, 310, 1995.
• Sparse Coding with an Overcomplete Basis Set: A Strategy Employed by V1?, Olshausen & Field, Vision Research, 37(23), pp. 3311–3325, 1997.
• Deep Learning Tutorial, R. Salakhutdinov, KDD 2014.
![Page 132: Big Data Analytics - Stanford Universityinfolab.stanford.edu/~echang/BigDat2015/BigDat2015... · Big Data Analytics Architectures, Algorithms and Applications! Part #2: Intro to deep](https://reader035.fdocuments.net/reader035/viewer/2022062602/5ec9c5f8ecfbe4080923514e/html5/thumbnails/132.jpg)
APPENDIX
![Page 133: Big Data Analytics - Stanford Universityinfolab.stanford.edu/~echang/BigDat2015/BigDat2015... · Big Data Analytics Architectures, Algorithms and Applications! Part #2: Intro to deep](https://reader035.fdocuments.net/reader035/viewer/2022062602/5ec9c5f8ecfbe4080923514e/html5/thumbnails/133.jpg)
![Page 134: Big Data Analytics - Stanford Universityinfolab.stanford.edu/~echang/BigDat2015/BigDat2015... · Big Data Analytics Architectures, Algorithms and Applications! Part #2: Intro to deep](https://reader035.fdocuments.net/reader035/viewer/2022062602/5ec9c5f8ecfbe4080923514e/html5/thumbnails/134.jpg)
![Page 135: Big Data Analytics - Stanford Universityinfolab.stanford.edu/~echang/BigDat2015/BigDat2015... · Big Data Analytics Architectures, Algorithms and Applications! Part #2: Intro to deep](https://reader035.fdocuments.net/reader035/viewer/2022062602/5ec9c5f8ecfbe4080923514e/html5/thumbnails/135.jpg)