primer
Overview and Tutorial
Jim O’Donoghue
Deep Learning Meetup @Intercom, St Stephen’s Green
7th April 2016
my background
machine learning
function elements:
    input types
    hypothesis functions (NN architectures)
    objective functions + optimisation
    output types
linear regression
multi-layer perceptron
input/output types:
    Continuous
    Discrete/Categorical:
        Nominal
        Ordinal
learning settings:
    Supervised
    Unsupervised
    Semi-Supervised
    Feature
function elements

x → hθ(x) → ŷ        y = ŷ + ε

hypothesis: h(x), the function f(x) from input to output
output: ŷ, calculated by the parameters θ — hence hθ(x)
objective: measures the error ε between ŷ and y
optimisation (+ hyper-parameters): adjusts θ to minimise the objective
Hypothesis Functions h(x)

calculate outputs via n activation functions
θ = {Weights, bias}

interim (activation) functions:
    Linear
    Tanh (sinh/cosh)     }
    Logistic Sigmoid     } non-linear
    Rectified Linear     }

one interim step:  h(x) = g(f(x))
stacked:           h(x) = g(f(g(f(x))))
loss/cost/error

ε = y − ŷ
J(θ): the objective — the error as a function of the parameters θ
gradient descent

minimise J(θ) with respect to θ:

A. get the partial derivative:   ∂J(θ)/∂θ
B. update the parameters:        θ ≔ θ − α ∂J(θ)/∂θ

α = learning rate

repeat A and B; descent may settle in a local optimum rather than the global optimum
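The two-step loop can be sketched on a toy objective — J(θ) = (θ − 3)² is my choice here, not from the slides:

```python
# Minimal gradient descent on J(theta) = (theta - 3)^2,
# whose derivative is dJ/dtheta = 2 * (theta - 3).
def gradient_descent(theta=0.0, alpha=0.1, steps=100):
    for _ in range(steps):
        grad = 2.0 * (theta - 3.0)    # A. get the partial derivative
        theta = theta - alpha * grad  # B. update the parameters
    return theta

# theta converges towards the optimum at 3.0
```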
the activation

Input Features → Connection Weights → Class

x → fθ(x) → ŷ        θ = {Weights, bias}

a line:            ŷ = f(x) = ax + b
with learned θ:    ŷ = fθ(x) = wx + b
                       = θᵀx
                       = Σᵢ₌₁ⁿ wᵢxᵢ + b

the pre-activation:  z = fθ(x) = θᵀx = wᵀx + b
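A quick NumPy check of the pre-activation; the weight and input values are made up for illustration:

```python
import numpy as np

# z = f_theta(x) = w^T x + b = sum_i w_i x_i + b, theta = {weights, bias}
w = np.array([0.5, -1.0, 2.0])   # weights (illustrative)
b = 0.25                         # bias (illustrative)
x = np.array([1.0, 2.0, 3.0])    # input features

z = w @ x + b                    # dot product plus bias
# equivalently: z = np.sum(w * x) + b
```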
loss/cost/error J(θ)

one sample:    ε = ŷ − y
all samples:   J(θ) = (1/2m) Σₛ₌₁ᵐ (ŷₛ − yₛ)²

partial derivative:
∂J(θ)/∂θ = (1/m) Σₛ₌₁ᵐ (ŷₛ − yₛ) xₛ    (per sample: δₛ xₛ, with δₛ = ŷₛ − yₛ)

update:
θ ≔ θ − α (1/m) Σₛ₌₁ᵐ (ŷₛ − yₛ) xₛ
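Putting the hypothesis, cost and update together gives batch gradient descent for linear regression. This is a sketch on synthetic data of my own choosing (y = 2x + 1, no noise), not code from the talk:

```python
import numpy as np

# J(theta)  = 1/(2m) * sum_s (yhat_s - y_s)^2
# dJ/dtheta = 1/m    * sum_s (yhat_s - y_s) * x_s
# theta    := theta - alpha * dJ/dtheta
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=50)
y = 2.0 * x + 1.0                 # illustrative target: w=2, b=1

w, b, alpha, m = 0.0, 0.0, 0.5, len(x)
for _ in range(500):
    yhat = w * x + b              # hypothesis
    err = yhat - y                # per-sample delta
    dw = (err * x).sum() / m      # dJ/dw
    db = err.sum() / m            # dJ/db
    w, b = w - alpha * dw, b - alpha * db

cost = (err ** 2).sum() / (2 * m)  # J(theta) after training
```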
the activation for logistic regression

Input Features → Connection Weights → Class

x → z = fθ(x) → a = g(z)        θ = {Weights, bias}

a = g(z) = 1 / (1 + e^(−z))     (logistic sigmoid)

a is read as a probability:  p(y = 1 | x, θ) = 1 / (1 + e^(−z))
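A single logistic unit in NumPy; the parameter values are illustrative, not from the slides:

```python
import numpy as np

# a = g(z) = 1 / (1 + e^-z), read as p(y = 1 | x, theta)
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.array([1.0, -2.0])   # illustrative weights
b = 0.0                     # illustrative bias
x = np.array([2.0, 1.0])    # input features

z = w @ x + b               # pre-activation: 1*2 + (-2)*1 + 0 = 0
a = sigmoid(z)              # a = 0.5: maximally uncertain at z = 0
```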
hypothesis (multi-layer perceptron)

x → a1 → ŷ

z1 = fθ1(x) = w1ᵀx + b
a1 = g(z1) = 1 / (1 + e^(−z1))
ŷ = h(x) = fθ2(a1)

composed:  h(x) = f(g(f(x)))
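The forward pass above can be sketched as follows; the weights are arbitrary illustrative values, not from the talk:

```python
import numpy as np

# z1 = w1^T x + b1;  a1 = g(z1) = sigmoid(z1);  yhat = f_theta2(a1) = w2^T a1 + b2
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, w1, b1, w2, b2):
    z1 = w1 @ x + b1        # linear layer 1
    a1 = sigmoid(z1)        # hidden activation
    yhat = w2 @ a1 + b2     # linear output layer
    return yhat, a1

w1 = np.array([[1.0, -1.0], [0.5, 0.5]])  # 2 hidden units, 2 inputs
b1 = np.zeros(2)
w2 = np.array([1.0, 1.0])
b2 = 0.0

yhat, a1 = forward(np.array([1.0, 1.0]), w1, b1, w2, b2)
```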
the error function

J(θ) = (1/2m) Σₛ₌₁ᵐ (ŷₛ − yₛ)²
the partial derivative

output layer (θ2):
∂J(θ2)/∂θ2 = ∂J(θ2)/∂ŷ · ∂ŷ/∂θ2
           = (1/m) Σₛ₌₁ᵐ (ŷₛ − yₛ) a1ₛ
           = δ2 a1    (per sample, with δ2 = ŷ − y)

hidden layer (θ1):
∂J(θ1)/∂θ1 = ∂J(θ1)/∂a1 · ∂a1/∂z1 · ∂z1/∂θ1
           = θ2δ2 · a1(1 − a1) · x
           = δ1 x    (with δ1 = θ2δ2 · a1(1 − a1))
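The deltas above translate directly into a training loop. This is a sketch on a toy regression task of my own making — the data, sizes, and hyper-parameters are not from the talk:

```python
import numpy as np

# Backpropagation for the one-hidden-layer network:
#   delta2     = yhat - y                   (output delta)
#   dJ/dtheta2 = delta2 * a1
#   delta1     = (w2 * delta2) * a1*(1-a1)  (delta pushed through the sigmoid)
#   dJ/dtheta1 = delta1 * x
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=(40, 2))     # m=40 samples, 2 features
y = np.sin(x[:, 0]) + x[:, 1]            # toy target

w1 = rng.normal(0, 1, size=(2, 2)); b1 = np.zeros(2)   # 2 hidden units
w2 = rng.normal(0, 1, size=2);      b2 = 0.0
alpha, m = 0.5, len(x)

def cost():
    a1 = sigmoid(x @ w1.T + b1)
    return ((w2 @ a1.T + b2 - y) ** 2).sum() / (2 * m)

first = cost()
for _ in range(2000):
    z1 = x @ w1.T + b1                       # forward pass
    a1 = sigmoid(z1)
    yhat = a1 @ w2 + b2
    delta2 = yhat - y                        # backward pass
    dw2 = delta2 @ a1 / m                    # dJ/dtheta2 = delta2 * a1
    db2 = delta2.mean()
    delta1 = np.outer(delta2, w2) * a1 * (1 - a1)
    dw1 = delta1.T @ x / m                   # dJ/dtheta1 = delta1 * x
    db1 = delta1.mean(axis=0)
    w1 -= alpha * dw1; b1 -= alpha * db1     # gradient-descent updates
    w2 -= alpha * dw2; b2 -= alpha * db2

final = cost()   # the loss falls as training proceeds
```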
Input Features → Connection Weights → Class

with depth, intermediate layers build representations:
Input Features → Learned Features → Class
Learning deep architectures for AI
https://deeplearning.net
https://github.com/jimod/deeplearning-meetup-dublin
http://colah.github.io/
https://www.coursera.org/learn/machine-learning
https://www.coursera.org/course/neuralnets