Neural Networks - An Introduction · 2017-03-16 · Neural Networks - An Introduction Author:...
Neural Networks: An Introduction
Warith HARCHAOUI
MAP5, UMR 8145, Université Paris-Descartes, Sorbonne Paris Cité
& Oscaro.com, Research and Development
March 2017
Outline
Supervised Classification and Regression: Classification, Regression
One Neuron: for Regression, for Classification
Gradient Descent: Batch Gradient Descent, Stochastic Gradient Descent
Several Neurons
Convolutional Neural Networks for Images
Adversarial Networks
Conclusion
Supervised Classification: The binary case
Given a training set that consists of:
- $x_i \in \mathbb{R}^D$
- $y_i \in \{0, 1\}$
for $i = 1, \dots, n$
Find $F$ s.t. $F(x_i) \simeq y_i$
Ex:
$x_i$ is an image
$y_i = 1$ corresponds to “cat”
$y_i = 0$ corresponds to “non-cat”
Supervised Classification: More than 2 classes
Given a training set that consists of:
- $x_i \in \mathbb{R}^D$
- $y_i \in \{0, 1\}^K$ (one-hot representation)
for $i = 1, \dots, n$
Find $F$ s.t. $F(x_i) \simeq y_i$
Ex:
$x_i$ is an image
$y_i = [1, 0, 0]$ corresponds to “cat”
$y_i = [0, 1, 0]$ corresponds to “dog”
$y_i = [0, 0, 1]$ corresponds to “elephant”
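The one-hot representation above can be sketched in a few lines of plain Python. The helper name `one_hot` and the list `classes` are illustrative choices, not part of the slides; the class order is taken from the cat/dog/elephant example.

```python
def one_hot(label, classes):
    """Return the one-hot vector of `label` for the ordered list `classes`."""
    return [1 if c == label else 0 for c in classes]


# Class order follows the example on the slide.
classes = ["cat", "dog", "elephant"]
encoded = {c: one_hot(c, classes) for c in classes}

for name, vector in encoded.items():
    print(name, vector)
```

Each vector has exactly one entry equal to 1, at the position of its class.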
Regression
Given a training set that consists of:
- $x_i \in \mathbb{R}^D$
- $y_i \in \mathbb{R}^K$
for $i = 1, \dots, n$
Find $F$ s.t. $F(x_i) \simeq y_i$
Ex:
$x_i$ is a building
$y_i$ is the rent value of the building
One Neuron: An Input-Output Machine
Figure: One Neuron (inputs x1, x2, x3 with weights w1, w2, w3, activation a, output y)
$$y = a(w_1 x_1 + w_2 x_2 + w_3 x_3 + b)$$
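As a minimal sketch of the formula above, the following plain-Python function evaluates one neuron for an arbitrary activation `a`. The numeric inputs and weights are made up for illustration, and the sigmoid used as one possible activation is the one defined later in these slides.

```python
import math


def sigmoid(t):
    """Logistic function, one possible choice for the activation a."""
    return 1.0 / (1.0 + math.exp(-t))


def neuron(x, w, b, activation):
    """Compute y = a(w1*x1 + ... + wD*xD + b)."""
    pre_activation = sum(wi * xi for wi, xi in zip(w, x)) + b
    return activation(pre_activation)


# Illustrative values (not from the slides); the weighted sum is exactly 0.
x = [1.0, 2.0, 3.0]
w = [0.5, -1.0, 0.5]
b = 0.0

y_identity = neuron(x, w, b, lambda t: t)  # identity activation
y_sigmoid = neuron(x, w, b, sigmoid)       # sigmoid activation
print(y_identity, y_sigmoid)
```

With the identity activation the neuron is a linear model; with the sigmoid its output lies in (0, 1) and can be read as a probability.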
One Neuron for Regression: Least Mean Squares
Prediction:
$$F(x_i) = \hat{y}_i = W x_i + b$$
Loss:
$$L(W, b) = \frac{1}{n} \sum_{i=1}^{n} \|\hat{y}_i - y_i\|_2^2$$
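The least-mean-squares loss can be sketched as follows for the scalar-output case (K = 1), which is an assumption made here to keep the example short. The data set is invented so that y = 2x + 1 holds exactly.

```python
def predict(w, b, x):
    """Linear prediction w . x + b for one example x (a list of D features)."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def mse_loss(w, b, X, Y):
    """Mean over the training set of the squared prediction error."""
    n = len(X)
    return sum((predict(w, b, x) - y) ** 2 for x, y in zip(X, Y)) / n


# Toy data generated by y = 2x + 1 (illustrative only).
X = [[0.0], [1.0], [2.0]]
Y = [1.0, 3.0, 5.0]

loss_exact = mse_loss([2.0], 1.0, X, Y)  # true parameters: no error
loss_zero = mse_loss([0.0], 0.0, X, Y)   # all-zero parameters: (1 + 9 + 25) / 3
print(loss_exact, loss_zero)
```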
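One Neuron for Binary Classification: Logistic Function
Prediction:
$$\mathrm{score}_i = w^\top x_i + b$$
$$P(y_i = 1) = p_i = \mathrm{Sigmoid}(\mathrm{score}_i) = \frac{1}{1 + \exp(-\mathrm{score}_i)}$$
Loss:
$$\ell(w, b) = \prod_{i : y_i = 1} p_i \prod_{i : y_i = 0} (1 - p_i) = \prod_{i=1}^{n} p_i^{y_i} (1 - p_i)^{1 - y_i}$$
$$L(w, b) = -\frac{1}{n} \log(\ell(w, b)) = -\frac{1}{n} \sum_{i=1}^{n} \left[ y_i \log(p_i) + (1 - y_i) \log(1 - p_i) \right]$$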
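A minimal sketch of this loss in plain Python is given below. The toy data and parameter values are invented for illustration, and the code is not hardened against very large scores, for which `log` would receive 0.

```python
import math


def sigmoid(score):
    return 1.0 / (1.0 + math.exp(-score))


def binary_cross_entropy(w, b, X, Y):
    """L(w, b) = -(1/n) * sum_i [y_i log p_i + (1 - y_i) log(1 - p_i)]."""
    n = len(X)
    total = 0.0
    for x, y in zip(X, Y):
        score = sum(wj * xj for wj, xj in zip(w, x)) + b
        p = sigmoid(score)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / n


# Toy 1-D data: negative inputs are class 0, positive inputs are class 1.
X = [[-2.0], [-1.0], [1.0], [2.0]]
Y = [0, 0, 1, 1]

loss_uninformed = binary_cross_entropy([0.0], 0.0, X, Y)  # every p_i = 0.5
loss_better = binary_cross_entropy([2.0], 0.0, X, Y)
print(loss_uninformed, loss_better)
```

With all-zero parameters every probability is 0.5 and the loss equals log 2; parameters that separate the two classes give a smaller value.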
One Neuron for Binary Classification: Logistic Function
Figure: The Sigmoid Function
$$\mathrm{Sigmoid}(a) = \frac{1}{1 + \exp(-a)}$$
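One Neuron for Classification of K > 2 classes: Softmax Function
Prediction:
$$\mathrm{score}_{i,k} = w_k^\top x_i + b_k$$
$$p_{i,k} = \mathrm{SoftMax}(\mathrm{score}_i)_k = \frac{\exp(\mathrm{score}_{i,k})}{\sum_{k'=1}^{K} \exp(\mathrm{score}_{i,k'})}$$
$y_{i,k} = 1 \Leftrightarrow x_i$ belongs to the $k$th class
$y_{i,k} = 0 \Leftrightarrow x_i$ does not belong to the $k$th class
Loss:
$$\ell(W, b) = \prod_{i=1}^{n} \prod_{k=1}^{K} p_{i,k}^{y_{i,k}}$$
$$L(W, b) = -\frac{1}{n} \log(\ell(W, b)) = -\frac{1}{n} \sum_{i=1}^{n} \sum_{k=1}^{K} y_{i,k} \log(p_{i,k})$$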
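The softmax prediction and its loss can be sketched as below. The scores are given directly rather than computed from W and b, which is a simplification made here; subtracting the maximum score before exponentiating is a standard numerical precaution that leaves the softmax value unchanged.

```python
import math


def softmax(scores):
    """Turn K scores into K probabilities that sum to 1."""
    m = max(scores)  # shift for numerical stability; the result is unchanged
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def cross_entropy(P, Y):
    """L = -(1/n) * sum_i sum_k y_ik log(p_ik) for one-hot targets Y."""
    n = len(P)
    total = 0.0
    for p, y in zip(P, Y):
        for p_k, y_k in zip(p, y):
            if y_k:  # terms with y_ik = 0 contribute nothing
                total += y_k * math.log(p_k)
    return -total / n


# Illustrative scores for two examples and K = 3 classes.
scores = [[0.0, 0.0, 0.0], [2.0, 1.0, 0.0]]
Y = [[1, 0, 0], [0, 1, 0]]
P = [softmax(s) for s in scores]
loss = cross_entropy(P, Y)
print(P, loss)
```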
Batch Gradient Descent: The common problem
Loss function:
$$L(W, b) = \frac{1}{n} \sum_{i=1}^{n} L_i(W, b)$$
Problem:
$$\min_{W, b} L(W, b)$$
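Batch Gradient Descent: A Universal Learning Procedure
$$\min_{w} \frac{1}{n} \sum_{i=1}^{n} L_i(w)$$
1. Choose a random $w$ and a constant $\alpha > 0$
2. Iterate:
$$w_{new} = w_{old} - \alpha \nabla L(w_{old})$$
$$\nabla L(w_{old}) = \frac{1}{n} \sum_{i=1}^{n} \nabla L_i(w_{old})$$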
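The procedure above can be sketched on a one-dimensional least-squares problem, where L_i(w, b) = (w x_i + b - y_i)^2. The data, the step size `alpha` and the number of iterations are illustrative assumptions; the start point is fixed at zero instead of random so that the run is reproducible.

```python
def grad_i(w, b, x, y):
    """Gradient of L_i(w, b) = (w*x + b - y)^2 with respect to (w, b)."""
    residual = w * x + b - y
    return 2.0 * residual * x, 2.0 * residual


def batch_gradient_descent(X, Y, alpha=0.1, steps=500):
    w, b = 0.0, 0.0  # fixed start (the slides use a random one)
    n = len(X)
    for _ in range(steps):
        grad_w, grad_b = 0.0, 0.0
        # Full gradient: average of the n per-example gradients.
        for x, y in zip(X, Y):
            dw, db = grad_i(w, b, x, y)
            grad_w += dw
            grad_b += db
        w -= alpha * grad_w / n
        b -= alpha * grad_b / n
    return w, b


# Toy data generated by y = 2x + 1.
X = [0.0, 1.0, 2.0, 3.0]
Y = [1.0, 3.0, 5.0, 7.0]
w, b = batch_gradient_descent(X, Y)
print(w, b)
```

Every iteration visits all n examples, which is what makes the method costly on large training sets.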
Stochastic Gradient Descent: A Universal Learning Procedure
$$\min_{w} \frac{1}{n} \sum_{i=1}^{n} L_i(w)$$
1. Choose a random $w$ and a constant $\alpha > 0$
2. Iterate:
2.1 Choose a random subset $J \subset \{1, \dots, n\}$ (sometimes reduced to a singleton)
2.2
$$w_{new} = w_{old} - \frac{\alpha}{|J|} \sum_{j \in J} \nabla L_j(w_{old})$$
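A minimal sketch of the stochastic variant, again on a one-dimensional least-squares problem with L_j(w, b) = (w x_j + b - y_j)^2. The batch size, step size, iteration count and random seed are assumptions chosen for illustration, and the data are noise-free so that a constant step size still reaches the exact solution.

```python
import random


def sgd(X, Y, alpha=0.05, batch_size=2, steps=2000, seed=0):
    rng = random.Random(seed)  # seeded for reproducibility
    w, b = 0.0, 0.0            # fixed start (the slides use a random one)
    n = len(X)
    for _ in range(steps):
        # Step 2.1: draw a random subset J of the example indices.
        J = rng.sample(range(n), batch_size)
        grad_w, grad_b = 0.0, 0.0
        for j in J:
            residual = w * X[j] + b - Y[j]
            grad_w += 2.0 * residual * X[j]
            grad_b += 2.0 * residual
        # Step 2.2: move along the averaged mini-batch gradient.
        w -= alpha * grad_w / len(J)
        b -= alpha * grad_b / len(J)
    return w, b


# Toy data generated by y = 2x + 1.
X = [0.0, 1.0, 2.0, 3.0]
Y = [1.0, 3.0, 5.0, 7.0]
w, b = sgd(X, Y)
print(w, b)
```

Each step costs |J| gradient evaluations instead of n, at the price of a noisier update direction.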
Several Neurons: The Power of Back-Propagation
Figure: A Multi-Layer Perceptron (input layer with the four components of x_i, hidden layer, output layer producing p_i)
Several Neurons: The Power of Back-Propagation
Back-propagation is just the chain rule applied repeatedly through a composition of many functions:
$$(F \circ G)' = (F' \circ G) \times G'$$
NB: $(F \circ G)(x) = F(G(x))$
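To illustrate the chain rule at work, here is a sketch of back-propagation through a deliberately tiny network: one scalar input, one hidden sigmoid neuron and one sigmoid output trained with the binary cross-entropy loss. The architecture and parameter values are assumptions made for this example. The analytic gradient is compared with a finite-difference estimate as a sanity check.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def loss(params, x, y):
    """Forward pass: x -> hidden h -> probability p -> cross-entropy."""
    w1, b1, w2, b2 = params
    h = sigmoid(w1 * x + b1)
    p = sigmoid(w2 * h + b2)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def backprop(params, x, y):
    """Gradient of the loss, obtained by applying the chain rule layer by layer."""
    w1, b1, w2, b2 = params
    h = sigmoid(w1 * x + b1)
    p = sigmoid(w2 * h + b2)
    d_score2 = p - y                   # d loss / d (w2*h + b2)
    d_w2, d_b2 = d_score2 * h, d_score2
    d_h = d_score2 * w2                # propagate back to the hidden output
    d_score1 = d_h * h * (1.0 - h)     # sigmoid'(s) = h * (1 - h)
    d_w1, d_b1 = d_score1 * x, d_score1
    return [d_w1, d_b1, d_w2, d_b2]


def numerical_gradient(params, x, y, eps=1e-6):
    """Central finite differences, used only to check backprop."""
    grads = []
    for k in range(len(params)):
        up, down = list(params), list(params)
        up[k] += eps
        down[k] -= eps
        grads.append((loss(up, x, y) - loss(down, x, y)) / (2.0 * eps))
    return grads


params = [0.3, -0.1, 0.8, 0.2]  # illustrative values for w1, b1, w2, b2
analytic = backprop(params, x=1.5, y=1)
numeric = numerical_gradient(params, x=1.5, y=1)
print(analytic)
print(numeric)
```

The backward pass reuses the values h and p computed in the forward pass, which is why back-propagation costs about as much as one extra forward evaluation.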
Three Remarks
1. Non-linearity: Sigmoid, SoftMax, ReLU
$$\mathrm{ReLU}(x) = \max(x, 0)$$
2. Automatic Differentiation thanks to: Theano, Torch, Caffe, TensorFlow, PyTorch
3. GPU Acceleration
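For the first remark, the ReLU non-linearity and the derivative used during back-propagation can be sketched as follows. The value of the derivative at exactly 0 is a convention; 0 is assumed here.

```python
def relu(x):
    """ReLU(x) = max(x, 0)."""
    return max(x, 0.0)


def relu_grad(x):
    """Derivative of ReLU; the value at x = 0 is set to 0 by convention."""
    return 1.0 if x > 0 else 0.0


values = [-2.0, 0.0, 3.0]
print([relu(v) for v in values])
print([relu_grad(v) for v in values])
```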
Convolutional Neural Networks for Images: Convolutions
- $x$: a pixel
- $I$: an image in gray levels
- $K$: a kernel (a filter)
- $I * K$: convolution of image $I$ by filter $K$
- $\mathrm{nonLinearity}(I * K)$: element-wise non-linearity on the convolution result, producing a feature map
$$(I * K)(x) = \sum_{y \in \mathrm{Supp}(K)} I(x - y) K(y)$$
The same neuron with weights $K$ is applied many times (once per pixel of $I$), producing a new image called a feature map.
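The convolution formula can be sketched with explicit loops on nested lists. This version keeps only the output pixels x for which every I(x - y) falls inside the image (a "valid" convolution), which is an assumption about border handling that the slides do not specify. The image and kernel values are invented, and ReLU is used as the element-wise non-linearity.

```python
def convolve2d_valid(image, kernel):
    """(I * K)(x) = sum over y in Supp(K) of I(x - y) * K(y), without padding."""
    height, width = len(image), len(image[0])
    k_height, k_width = len(kernel), len(kernel[0])
    output = []
    for r in range(k_height - 1, height):
        row = []
        for c in range(k_width - 1, width):
            total = 0.0
            for i in range(k_height):
                for j in range(k_width):
                    # Index (r - i, c - j) plays the role of x - y.
                    total += image[r - i][c - j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


def relu_map(feature_map):
    """Element-wise non-linearity applied to the convolution result."""
    return [[max(v, 0.0) for v in row] for row in feature_map]


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, -1]]
feature_map = relu_map(convolve2d_valid(image, kernel))
print(feature_map)  # [[4.0, 4.0], [4.0, 4.0]]
```

The same four kernel weights are reused at every output position, which is the weight sharing described above. Note that many deep learning libraries compute the unflipped variant (cross-correlation) under the name convolution; since the kernel is learned, the distinction rarely matters in practice.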
Convolutional Neural Networks for Images: Convolutions
Figure: LeNet architecture
Adversarial Networks: A Desired Network
Figure: Scheme for a Desired Network (x or z → Generator → y)
Adversarial Networks: Binary Classification Networks
Figure: Scheme for Binary Classification Networks (two y inputs → Discriminator → p)
Adversarial Networks: The full system
Figure: Scheme for Adversarial Networks (x or z → Generator → y; y → Discriminator → p)
Adversarial Networks: A New Kind of Loss
G: Generator (e.g. of images) from random noise or a real image
D: Discriminator that distinguishes fake examples from real examples
$$\min_{w_D} \max_{w_G} L$$
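The slide does not spell out L. One common choice, assumed here rather than taken from the slides, is the discriminator's binary cross-entropy, where D(·) is the probability p that an example is real and G(z) is an example generated from noise z:

```latex
L(w_D, w_G) =
  - \mathbb{E}_{y \sim \text{data}} \left[ \log D(y) \right]
  - \mathbb{E}_{z \sim \text{noise}} \left[ \log \left( 1 - D(G(z)) \right) \right]
```

Under this assumption the discriminator minimizes L by classifying real and generated examples correctly, while the generator maximizes L by producing examples that D mistakes for real ones, which matches the min-max order written above.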
Figure: Adversarial Networks Example
Conclusion: A Great Book
Figure: The Deep Learning Book