Machine Learning: Neural Networks
1. Basics
sigmoid function:

$$\sigma(x) = \frac{1}{1 + e^{-x}}, \qquad \sigma'(x) = \sigma(x)\,[1 - \sigma(x)]$$

hyperbolic tangent function:

$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}, \qquad \tanh'(x) = 1 - \tanh^2(x)$$

softmax function:

$$y = \operatorname{softmax}(x), \qquad y_i = \frac{e^{x_i}}{\sum_{j=1}^{n} e^{x_j}}$$

$$\frac{\partial y_i}{\partial x_j} =
\begin{cases}
-y_i\,y_j, & i \neq j \\
y_i\,(1 - y_i), & i = j
\end{cases}$$
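A minimal NumPy sketch of these three functions and their derivatives (the function names, including the `softmax_jacobian` helper, are illustrative choices, not from the notes):

```python
import numpy as np

def sigmoid(x):
    # sigma(x) = 1 / (1 + e^{-x})
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_prime(x):
    # sigma'(x) = sigma(x) * (1 - sigma(x))
    s = sigmoid(x)
    return s * (1.0 - s)

def tanh_prime(x):
    # tanh'(x) = 1 - tanh(x)^2
    return 1.0 - np.tanh(x) ** 2

def softmax(x):
    # Shift by max(x) for numerical stability; the result is unchanged.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def softmax_jacobian(x):
    # J[i, j] = dy_i/dx_j = y_i * (delta_ij - y_j),
    # which reproduces the two cases above:
    # -y_i*y_j off the diagonal, y_i*(1 - y_i) on it.
    y = softmax(x)
    return np.diag(y) - np.outer(y, y)
```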
2. Model
input: $x \in \mathbb{R}^n$

layer $1$: $a^1 = x$

layer $l$: $a^l = \sigma(w^l a^{l-1} + b^l) \quad (l = 2, \ldots, L)$

layer $L$: $\hat{y} = a^L$

output: $\hat{y} \in \mathbb{R}^m$
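A minimal sketch of this forward pass, assuming `weights` and `biases` hold $w^l$ and $b^l$ for $l = 2, \ldots, L$ in order (all names here are illustrative, not from the notes):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights, biases):
    """a^1 = x; a^l = sigma(w^l a^{l-1} + b^l) for l = 2..L; returns y_hat = a^L."""
    a = x
    for w, b in zip(weights, biases):
        a = sigmoid(w @ a + b)
    return a

# Usage: a hypothetical 3-4-2 network (n = 3, m = 2, L = 3).
rng = np.random.default_rng(0)
weights = [rng.normal(size=(4, 3)), rng.normal(size=(2, 4))]
biases = [rng.normal(size=4), rng.normal(size=2)]
y_hat = forward(rng.normal(size=3), weights, biases)
```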
3. Backpropagation
cost function: $C = C(\hat{y})$
definition:

$$z^l = w^l a^{l-1} + b^l, \qquad \delta^l = \frac{\partial C}{\partial z^l} \quad (l = 2, \ldots, L)$$

output error:

$$\delta^L = \frac{\partial C}{\partial z^L} = \frac{\partial C}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial z^L} = \frac{\partial C}{\partial a^L} \cdot \frac{\partial a^L}{\partial z^L} = \frac{\partial C}{\partial a^L} \odot \sigma'(z^L) \quad (\text{need } a^L, z^L)$$

backpropagate the error:

$$\delta^l = \left( (w^{l+1})^T \delta^{l+1} \right) \odot \sigma'(z^l) \quad (\text{need } z^l;\ l = L-1, L-2, \ldots, 2)$$

output:

$$\frac{\partial C}{\partial b^l} = \delta^l, \qquad \frac{\partial C}{\partial w^l} = \delta^l \, (a^{l-1})^T \quad (\text{need } a^{l-1};\ l = L, L-1, \ldots, 2)$$
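A minimal sketch of these four equations, assuming a quadratic cost $C = \frac{1}{2}\lVert \hat{y} - t \rVert^2$ with target $t$ (the cost choice and every name below are illustrative assumptions; the notes do not fix a particular $C$):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def backprop(x, t, weights, biases):
    """Gradients dC/dw^l, dC/db^l for the assumed cost C = 0.5*||y_hat - t||^2."""
    # Forward pass, caching every z^l and a^l.
    activations, zs = [x], []
    a = x
    for w, b in zip(weights, biases):
        z = w @ a + b
        zs.append(z)
        a = sigmoid(z)
        activations.append(a)
    # Output error: delta^L = dC/da^L (.) sigma'(z^L); here dC/da^L = a^L - t.
    delta = (activations[-1] - t) * sigmoid_prime(zs[-1])
    grads_w = [None] * len(weights)
    grads_b = [None] * len(weights)
    for l in range(len(weights) - 1, -1, -1):
        grads_b[l] = delta                            # dC/db^l = delta^l
        grads_w[l] = np.outer(delta, activations[l])  # dC/dw^l = delta^l (a^{l-1})^T
        if l > 0:
            # delta^l = ((w^{l+1})^T delta^{l+1}) (.) sigma'(z^l)
            delta = (weights[l].T @ delta) * sigmoid_prime(zs[l - 1])
    return grads_w, grads_b
```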
4. The Vanishing Gradient Problem
the simplest deep neural network: a chain with a single neuron in each layer, so every $w^l$, $b^l$, $z^l$, $a^l$ is a scalar.

the expression for $\partial C / \partial b^2$: applying the backpropagation equations above to this chain gives

$$\frac{\partial C}{\partial b^2} = \delta^2 = \sigma'(z^2)\, w^3\, \sigma'(z^3)\, w^4\, \sigma'(z^4) \cdots w^L\, \sigma'(z^L)\, \frac{\partial C}{\partial a^L}$$

Since $\sigma'(z) \le 1/4$ for all $z$, each factor $w^l \sigma'(z^l)$ typically has magnitude well below $1$, so the gradient of the earliest layers shrinks exponentially with depth.
approaches to overcome the problem:

- Usage of GPUs
- Usage of better activation functions (e.g. ReLU, whose derivative does not saturate for positive inputs)
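A quick numeric illustration of the problem above (a hypothetical setup, not from the notes): with one neuron per layer, unit weights, and pre-activations at $z = 0$, where $\sigma'$ attains its maximum of $1/4$, the gradient scale still collapses exponentially with depth:

```python
import numpy as np

def sigmoid_prime(z):
    s = 1.0 / (1.0 + np.exp(-z))
    return s * (1.0 - s)

# Assumed setup: w^l = 1 and z^l = 0 in every layer, so each
# per-layer factor w^l * sigma'(z^l) equals 0.25 -- the best case.
depth = 20
factor = 1.0 * sigmoid_prime(0.0)  # 0.25 per layer
grad = factor ** depth             # scale of dC/db^2, up to the dC/da^L term
print(f"gradient scale after {depth} layers: {grad:.3e}")  # ~9.095e-13
```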
Reference
1. Michael Nielsen. Neural Networks and Deep Learning.
http://neuralnetworksanddeeplearning.com/