Deep Learning primer: Overview and Tutorial
Jim O’Donoghue
Deep Learning Meetup @Intercom, Stephens Green, 7th April 2016

Transcript of the slides follows.

Agenda: my background; machine learning; the elements of a learning function.

Function elements:
input types
hypothesis functions (NN architectures)
objective functions + optimisation
output types

Worked examples: linear regression and the multi-layer perceptron.

Variable types: Continuous; Discrete / Categorical (Nominal or Ordinal).

Learning settings: Supervised, Unsupervised, Semi-Supervised, Feature learning.

We observe inputs x and targets y, where y = f(x) + ε for some unknown function f and noise ε.
A hypothesis h_θ(x) approximates f and produces an output ŷ, calculated from the parameters θ.
An objective function J(θ) measures how far ŷ is from y, and optimisation (guided by hyper-parameters) adjusts θ to reduce that error.

Hypothesis Functions h(x)

Outputs are calculated via the parameters θ = {Weights, bias} and n activation functions.
The interim functions can be linear or non-linear: Linear, Tanh, Cosh, Logistic Sigmoid, Rectified Linear.
Composing a linear map f with a non-linearity g gives h(x) = g(f(x)); stacking another layer gives h(x) = g(f(g(f(x)))).
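A minimal NumPy sketch of these interim functions and the composed hypothesis (the function names are illustrative, not from the slides):

```python
import numpy as np

# The interim (activation) functions listed above
def linear(z):
    return z

def tanh(z):
    return np.tanh(z)

def cosh(z):
    return np.cosh(z)

def logistic_sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def rectified_linear(z):
    return np.maximum(0.0, z)

# One layer of the hypothesis: a linear map f followed by a non-linearity g
def layer(x, W, b, g=logistic_sigmoid):
    return g(W @ x + b)

# Stacking two layers gives h(x) = g(f(g(f(x))))
def h(x, W1, b1, W2, b2):
    return layer(layer(x, W1, b1), W2, b2)
```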

loss / cost / error

The output ŷ differs from the target y by an error ε: ŷ − y = ε. The cost function J(θ) aggregates this error over the training samples.

gradient descent

Plot the cost J(θ) against the parameters θ. To minimise it, repeat two steps:
A. get the partial derivative ∂J(θ)/∂θ at the current θ
B. update the parameters: θ := θ − α ∂J(θ)/∂θ
where α is the learning rate. Each update moves θ downhill on J(θ). Depending on where the parameters start and on the shape of J(θ), gradient descent may reach the global optimum or settle in a local optimum.
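A minimal sketch of this loop in NumPy, assuming a generic gradient function is available (the names gradient_descent and grad_J are illustrative):

```python
import numpy as np

def gradient_descent(grad_J, theta0, alpha=0.1, n_steps=100):
    """Repeat: A. get the partial derivative, B. update the parameters."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(n_steps):
        grad = grad_J(theta)          # A. dJ(theta)/dtheta at the current theta
        theta = theta - alpha * grad  # B. theta := theta - alpha * dJ/dtheta
    return theta

# Example: J(theta) = (theta - 3)^2 has gradient 2 * (theta - 3)
theta_star = gradient_descent(lambda t: 2 * (t - 3), theta0=[0.0])
print(theta_star)  # close to [3.]
```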

first... the activation

A single unit connects the input features to the output class through connection weights.
Starting from a straight line, y = f(x) = ax + b, we rename the slope and intercept as weights and bias, θ = {Weights, bias}:
ŷ = f_θ(x) = wx + b = θᵀx = Σ_{i=1}^{n} w_i x_i + b
For what follows we call this linear activation z:
z = f_θ(x) = θᵀx = wᵀx + b
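A one-line NumPy sketch of this linear activation (the variable names are illustrative):

```python
import numpy as np

def linear_activation(x, w, b):
    """z = f_theta(x) = w^T x + b, with theta = {weights, bias}."""
    return np.dot(w, x) + b

x = np.array([1.0, 2.0])
w = np.array([0.5, -0.25])
print(linear_activation(x, w, b=0.1))  # 0.5*1 + (-0.25)*2 + 0.1 = 0.1
```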

the objective for linear regression

For one sample the error is ŷ_s − y_s; over all m samples the cost is the mean squared error:
J(θ) = (1/2m) Σ_{s=1}^{m} (ŷ_s − y_s)²
Its partial derivative with respect to θ is
∂J(θ)/∂θ = (1/m) Σ_{s=1}^{m} (ŷ_s − y_s) x_s
i.e. the error δ times the input x, averaged over the samples. The update is then
θ := θ − α (1/m) Σ_{s=1}^{m} (ŷ_s − y_s) x_s
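Putting the activation, cost, gradient, and update together, a minimal NumPy sketch (assuming a bias column appended to X; names are illustrative):

```python
import numpy as np

def predict(X, theta):
    """y_hat = X @ theta; the last column of X is all ones (the bias term)."""
    return X @ theta

def cost(X, y, theta):
    """J(theta) = 1/(2m) * sum_s (y_hat_s - y_s)^2"""
    m = len(y)
    err = predict(X, theta) - y
    return (err @ err) / (2 * m)

def gradient(X, y, theta):
    """dJ/dtheta = 1/m * sum_s (y_hat_s - y_s) * x_s"""
    m = len(y)
    return X.T @ (predict(X, theta) - y) / m

# Gradient-descent updates: theta := theta - alpha * dJ/dtheta
X = np.array([[1.0, 1.0], [2.0, 1.0], [3.0, 1.0]])  # feature plus bias column
y = np.array([2.0, 4.0, 6.0])
theta, alpha = np.zeros(2), 0.1
for _ in range(1000):
    theta = theta - alpha * gradient(X, y, theta)
print(theta)  # approaches [2, 0], since y = 2x in this toy data
```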

the activation

Passing the linear activation z = f_θ(x) through the logistic sigmoid gives
a = g(z) = 1 / (1 + e^(−z))
which can be read as a probability: p(a = 1 | x, θ) = 1 / (1 + e^(−z)).
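A short NumPy sketch of the sigmoid activation (variable names are illustrative):

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid g(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

# a = g(z) for the linear activation z = w^T x + b,
# read as p(a = 1 | x, theta)
x = np.array([1.0, 2.0])
w = np.array([0.5, -0.25])
a = sigmoid(w @ x + 0.1)
print(a)  # roughly 0.52
```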

hypothesis

For a multi-layer perceptron the input x feeds a hidden activation a1, which feeds the output ŷ:
z1 = f_θ1(x) = w1ᵀx + b
a1 = g(z1) = 1 / (1 + e^(−z1))
ŷ = h(x) = f_θ2(a1)
so the whole hypothesis is the composition h(x) = f(g(f(x))).
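A minimal sketch of this forward pass in NumPy (layer sizes and names are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, b1, W2, b2):
    """Forward pass of the two-layer hypothesis h(x) = f(g(f(x)))."""
    z1 = W1 @ x + b1       # z1 = f_theta1(x) = W1 x + b1
    a1 = sigmoid(z1)       # a1 = g(z1), hidden activation
    y_hat = W2 @ a1 + b2   # y_hat = f_theta2(a1), output layer
    return y_hat, a1

# Example: 2 inputs, 3 hidden units, 1 output
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 2)), np.zeros(3)
W2, b2 = rng.normal(size=(1, 3)), np.zeros(1)
y_hat, a1 = forward(np.array([1.0, 2.0]), W1, b1, W2, b2)
```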

the error function

With x → a1 → ŷ, the cost is the same squared error as before:
J(θ) = (1/2m) Σ_{s=1}^{m} (ŷ_s − y_s)²

the partial derivative

For the output-layer parameters θ2, the chain rule gives
∂J(θ2)/∂θ2 = ∂J(θ2)/∂ŷ · ∂ŷ/∂θ2 = (1/m) Σ_{s=1}^{m} (ŷ_s − y_s) a1_s
i.e. an output error δ2 = ŷ − y multiplied by the hidden activation a1.
For the hidden-layer parameters θ1 the chain continues back through a1 and z1:
∂J(θ1)/∂θ1 = ∂J(θ1)/∂a1 · ∂a1/∂z1 · ∂z1/∂θ1
           = θ2 δ2 · a1(1 − a1) · x
           = δ1 x, where δ1 = θ2 δ2 a1(1 − a1)
(a1(1 − a1) is the derivative of the logistic sigmoid.)
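A minimal NumPy sketch of these deltas and updates for a single sample (shapes and names are illustrative, not from the slides):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(x, y, W1, b1, W2, b2, alpha=0.1):
    """One forward + backward pass for a single sample, updating all parameters."""
    # forward
    z1 = W1 @ x + b1
    a1 = sigmoid(z1)
    y_hat = W2 @ a1 + b2
    # backward: the deltas from the slides
    delta2 = y_hat - y                          # output error, y_hat - y
    delta1 = (W2.T @ delta2) * a1 * (1.0 - a1)  # theta2 * delta2 * a1(1 - a1)
    # gradients: dJ/dtheta2 = delta2 * a1, dJ/dtheta1 = delta1 * x
    W2 -= alpha * np.outer(delta2, a1)
    b2 -= alpha * delta2
    W1 -= alpha * np.outer(delta1, x)
    b1 -= alpha * delta1
    return W1, b1, W2, b2
```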

[Network diagrams: input features connect through weights directly to the output class, and, in the deeper version, through an intermediate layer of learned features.]

Further reading:
Learning Deep Architectures for AI (Bengio)
https://deeplearning.net
https://github.com/jimod/deeplearning-meetup-dublin
http://colah.github.io/
https://www.coursera.org/learn/machine-learning
https://www.coursera.org/course/neuralnets