From neural networks to deep learning


Transcript of From neural networks to deep learning

Page 1: From neural networks to deep learning

From Artificial Neural Networks to Deep Learning

Viet-Trung Tran

1  

Page 2: From neural networks to deep learning

2  

Page 3: From neural networks to deep learning

3  

Page 4: From neural networks to deep learning

4  

Page 5: From neural networks to deep learning

5  

Page 6: From neural networks to deep learning

Perceptron

•  Rosenblatt, 1957
•  Input signals x1, x2, …
•  Bias x0 = 1
•  Net input = weighted sum = Net(w, x)
•  Activation/transfer function = f(Net(w, x))
•  Output

(Diagram: inputs feed a weighted sum, followed by a step function.)
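The perceptron above can be sketched in a few lines of Python (the AND-gate weights below are hand-picked for illustration, not taken from the slides):

```python
def perceptron(weights, bias, inputs):
    """Net input = weighted sum; output = hard-limiter (step) of the net input."""
    # Net(w, x): the bias plays the role of w0 * x0 with x0 = 1
    net = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if net >= 0 else 0  # step activation

# Hand-picked weights that realize an AND gate (illustrative only)
w, b = [1.0, 1.0], -1.5
```

With these weights the net input only reaches 0 when both inputs are 1, so the step function fires only for (1, 1).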

6  

Page 7: From neural networks to deep learning

Weighted Sum and Bias

•  Weighted sum

•  Bias

7  

Page 8: From neural networks to deep learning

8  

Page 9: From neural networks to deep learning

Hard-limiter function

•  Hard-limiter
  –  Threshold function
  –  Discontinuous function
  –  Discontinuous derivative

9  

Page 10: From neural networks to deep learning

Threshold logic function

•  Saturating linear function

•  Continuous function

•  Discontinuous derivative

10  

Page 11: From neural networks to deep learning

Sigmoid function

•  Most popular
•  Output (0, 1)
•  Continuous derivatives
•  Easy to differentiate
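As a small illustration of why the sigmoid is easy to differentiate: its derivative can be written entirely in terms of its own output, σ′(z) = σ(z)(1 − σ(z)). A sketch:

```python
import math

def sigmoid(z):
    # Output lies strictly between 0 and 1
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_prime(z):
    # The derivative reuses the forward value, which is cheap during backprop
    s = sigmoid(z)
    return s * (1.0 - s)
```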

11  

Page 12: From neural networks to deep learning

Artificial neural network – ANN structure

•  Number of input/output signals
•  Number of hidden layers
•  Number of neurons per layer
•  Neuron weights
•  Topology
•  Biases

12  

Page 13: From neural networks to deep learning

Feed-forward neural network

•  connections between the units do not form a directed cycle

13  

Page 14: From neural networks to deep learning

Recurrent neural network

•  A class of artificial neural network where connections between units form a directed cycle

14  

Page 15: From neural networks to deep learning

Why hidden layers

15  

Page 16: From neural networks to deep learning

Neural network learning

•  2 types of learning
  –  Parameter learning
    •  Learn neuron weight connections
  –  Structure learning
    •  Learn ANN structure from training data

16  

Page 17: From neural networks to deep learning

Error function

•  Consider an ANN with n neurons
•  For each learning example (x, d)
  –  Training error caused by the current weights w
•  Training error caused by w for the entire set of learning examples
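Assuming the usual half-squared-error definition (a common choice; the slide's exact formula is not visible in the transcript), the two error quantities can be sketched as:

```python
def example_error(d, o):
    # Squared error for one learning example: d = desired outputs, o = actual outputs
    return 0.5 * sum((di - oi) ** 2 for di, oi in zip(d, o))

def total_error(targets, outputs):
    # Training error caused by w over the entire set of learning examples
    return sum(example_error(d, o) for d, o in zip(targets, outputs))
```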

17  

Page 18: From neural networks to deep learning

Learning principle

18  

Page 19: From neural networks to deep learning

Neuron error gradients

19  

Page 20: From neural networks to deep learning

Parameter learning: back propagation of error

•  Calculate the total error at the top
•  Calculate contributions to the error at each step going backwards
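A minimal sketch of this idea for a hypothetical one-input, one-hidden-neuron, one-output network with sigmoid activations (biases omitted for brevity; the weight names are assumptions, not from the slides):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def backprop_step(x, d, w_h, w_o, lr=0.5):
    # Forward pass
    h = sigmoid(w_h * x)          # hidden activation
    o = sigmoid(w_o * h)          # network output
    # Backward pass: error at the top, contributions propagated backwards
    delta_o = (o - d) * o * (1 - o)        # output-layer error gradient
    delta_h = delta_o * w_o * h * (1 - h)  # propagated to the hidden layer
    # Gradient-descent weight updates
    return w_h - lr * delta_h * x, w_o - lr * delta_o * h
```

Repeating the step drives the output toward the desired value d.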

20  

Page 21: From neural networks to deep learning

Back propagation discussion

•  Initial weights
•  Learning rate
•  Number of neurons per hidden layer
•  Number of hidden layers

21  

Page 22: From neural networks to deep learning

Stochastic gradient descent (SGD)
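A sketch of SGD on a toy model y = a·x with half-squared-error loss, updating on one shuffled example at a time (all names and values here are illustrative):

```python
import random

def sgd_fit(data, lr=0.1, epochs=50, seed=0):
    """Fit y = a*x by stochastic gradient descent: one example per update."""
    rng = random.Random(seed)
    a = 0.0
    for _ in range(epochs):
        rng.shuffle(data)            # visit examples in random order
        for x, y in data:
            grad = (a * x - y) * x   # d/da of 0.5*(a*x - y)**2
            a -= lr * grad           # step against the gradient
    return a
```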

22  

Page 23: From neural networks to deep learning

23  

Page 24: From neural networks to deep learning

Deep learning

24  

Page 25: From neural networks to deep learning

Google brain

25  

Page 26: From neural networks to deep learning

GPU

26  

Page 27: From neural networks to deep learning

Learning from tagged data

•  @Andrew Ng

27  

Page 28: From neural networks to deep learning

2006 breakthrough

•  More data
•  Faster hardware: GPUs, multi-core CPUs
•  Working ideas on how to train deep architectures

28  

Page 29: From neural networks to deep learning

29  

Page 30: From neural networks to deep learning

30  

Page 31: From neural networks to deep learning

31  

Page 32: From neural networks to deep learning

Deep Learning trends

•  @Andrew Ng

32  

Page 33: From neural networks to deep learning

33  

Page 34: From neural networks to deep learning

34  

Page 35: From neural networks to deep learning

AI will transform the internet

•  @Andrew Ng
•  Technology areas with potential for a paradigm shift:
  –  Computer vision
  –  Speech recognition & speech synthesis
  –  Language understanding: machine translation; web search; dialog systems; …
  –  Advertising
  –  Personalization/recommendation systems
  –  Robotics
•  All of this is hard: scalability, algorithms.

35  

Page 36: From neural networks to deep learning

36  

Page 37: From neural networks to deep learning

37  

Page 38: From neural networks to deep learning

38  

Page 39: From neural networks to deep learning

Deep learning

39  

Page 40: From neural networks to deep learning

40  

Page 41: From neural networks to deep learning

CONVOLUTIONAL NEURAL NETWORK

http://colah.github.io/

41  

Page 42: From neural networks to deep learning

Convolution

•  Convolution is a mathematical operation on two functions f and g, producing a third function that is typically viewed as a modified version of one of the original functions.
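A direct sketch of discrete 1D convolution, (f ∗ g)[n] = Σk f[k]·g[n − k], for finite sequences:

```python
def conv1d(f, g):
    """Full discrete convolution of two finite sequences."""
    out = [0.0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            out[i + j] += fi * gj   # f[k] * g[n - k] contributes at n = i + j
    return out
```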

42  

Page 43: From neural networks to deep learning

Convolutional neural networks

•  A conv net is a kind of neural network that uses many identical copies of the same neuron
  –  Large number of neurons
  –  Large computational models
  –  Number of actual weights (parameters) to be learned stays fairly small

43  

Page 44: From neural networks to deep learning

A 2D Convolutional Neural Network

•  A convolutional neural network can learn a neuron once and use it in many places, making the model easier to learn and reducing error.

44  

Page 45: From neural networks to deep learning

Structure of Conv Nets

•  Problem – predict whether a human is speaking or not

•  Input: audio samples at different points in time

45  

Page 46: From neural networks to deep learning

Simple approach

•  just connect them all to a fully-connected layer

•  Then classify

46  

Page 47: From neural networks to deep learning

A more sophisticated approach

•  Local properties of the data
  –  Frequency of sounds (increasing/decreasing)
•  Look at a small window of the audio sample
  –  Create a group of neurons A to compute certain features
  –  The output of this convolutional layer is fed into a fully-connected layer F

47  

Page 48: From neural networks to deep learning

48  

Page 49: From neural networks to deep learning

49  

Page 50: From neural networks to deep learning

Max pooling layer
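A max pooling layer downsamples its input by keeping only the maximum value within each window. A 1D sketch with non-overlapping windows (the window size is an assumption for illustration):

```python
def max_pool1d(xs, window=2):
    """Downsample by taking the max over non-overlapping windows."""
    return [max(xs[i:i + window])
            for i in range(0, len(xs) - window + 1, window)]
```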

50  

Page 51: From neural networks to deep learning

2D convolutional neural networks

51  

Page 52: From neural networks to deep learning

52  

Page 53: From neural networks to deep learning

53  

Page 54: From neural networks to deep learning

Three-dimensional convolutional networks

54  

Page 55: From neural networks to deep learning

Group of neurons: A

•  A bunch of neurons in parallel
•  All get the same inputs and compute different features

55  

Page 56: From neural networks to deep learning

Network in Network (Lin et al., 2013)

56  

Page 57: From neural networks to deep learning

Conv Nets breakthroughs in computer vision

•  Krizhevsky et al. (2012)

57  

Page 58: From neural networks to deep learning

Different Levels of Abstraction

58  

Page 59: From neural networks to deep learning

59  

Page 60: From neural networks to deep learning

60  

Page 61: From neural networks to deep learning

RECURRENT NEURAL NETWORKS

http://colah.github.io/

61  

Page 62: From neural networks to deep learning

Recurrent Neural Networks (RNN) have loops

•  A loop allows information to be passed from one step of the network to the next.

62  

Page 63: From neural networks to deep learning

Unroll RNN

•  recurrent neural networks are intimately related to sequences and lists.
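Unrolling can be sketched as applying the same cell at every time step, with the hidden state carrying information from one step to the next (a hypothetical scalar cell with fixed, illustrative weights):

```python
import math

def rnn_unroll(inputs, w_in=1.0, w_rec=0.5, h0=0.0):
    """Unrolled simple RNN: the same cell is reused at every time step."""
    h = h0
    states = []
    for x in inputs:
        # Single-tanh repeating module; w_rec feeds the past state forward
        h = math.tanh(w_in * x + w_rec * h)
        states.append(h)
    return states
```

Even with a zero input at later steps, the hidden state is nonzero because past information flows through the recurrent connection.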

63  

Page 64: From neural networks to deep learning

Examples

•  Predict the last word in “the clouds are in the sky”
•  The gap between the relevant information and the place where it’s needed is small
•  RNNs can learn to use the past information

64  

Page 65: From neural networks to deep learning

•  “I grew up in France… I speak fluent French.”
•  As the gap grows, RNNs become unable to learn to connect the information.

65  

Page 66: From neural networks to deep learning

LONG SHORT TERM MEMORY NETWORKS

LSTM Networks

66  

Page 67: From neural networks to deep learning

LSTM networks

•  A special kind of RNN
•  Capable of learning long-term dependencies
•  Structured as a chain of repeating modules of neural network

67  

Page 68: From neural networks to deep learning

RNN

•  The repeating module has a very simple structure, such as a single tanh layer

68  

Page 69: From neural networks to deep learning

•  The tanh(z) function is a rescaled version of the sigmoid, and its output range is (−1, 1) instead of (0, 1).
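The rescaling can be checked directly: tanh(z) = 2·σ(2z) − 1, where σ is the sigmoid:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def tanh_via_sigmoid(z):
    # tanh is a rescaled, shifted sigmoid: range (-1, 1) instead of (0, 1)
    return 2.0 * sigmoid(2.0 * z) - 1.0
```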

69  

Page 70: From neural networks to deep learning

LSTM networks

•  The repeating module consists of four neurons, interacting in a very special way

70  

Page 71: From neural networks to deep learning

Core idea behind LSTMs

•  The key to LSTMs is the cell state, the horizontal line running through the top of the diagram.
•  The cell state runs straight down the entire chain, with only some minor linear interactions.
•  It is easy for information to just flow along it unchanged.

71  

Page 72: From neural networks to deep learning

Gates

•  The ability to remove or add information to the cell state is carefully regulated by structures called gates

•  Sigmoid output decides how much of each component should be let through
  –  Zero means let nothing through
  –  One means let everything through

•  An LSTM has three of these gates

72  

Page 73: From neural networks to deep learning

LSTM step 1

•  decide what information we’re going to throw away from the cell state

•  forget gate layer

73  

Page 74: From neural networks to deep learning

LSTM step 2

•  decide what new information we’re going to store in the cell state

•  input gate layer

74  

Page 75: From neural networks to deep learning

LSTMs step 3

•  update the old cell state, Ct−1, into the new cell state Ct

75  

Page 76: From neural networks to deep learning

LSTMs step 4

•  decide what we’re going to output

76  

Page 77: From neural networks to deep learning

77  

Page 78: From neural networks to deep learning

78  

Page 79: From neural networks to deep learning

79  

Page 80: From neural networks to deep learning

80  

Page 81: From neural networks to deep learning

RECURRENT NEURAL NETWORKS WITH WORD EMBEDDINGS

81  

Page 82: From neural networks to deep learning

APPENDIX

82  

Page 83: From neural networks to deep learning

83  

Page 84: From neural networks to deep learning

Perceptron 1957

84  

Page 85: From neural networks to deep learning

Perceptron 1957

85  

Page 86: From neural networks to deep learning

Perceptron 1986

86  

Page 87: From neural networks to deep learning

Perceptron

87  

Page 88: From neural networks to deep learning

Activation function

88  

Page 89: From neural networks to deep learning

Back propagation 1974/1986

89  

Page 90: From neural networks to deep learning

90  

Page 91: From neural networks to deep learning

91  

Page 92: From neural networks to deep learning

•  Inspired by the architectural depth of the brain, researchers wanted for decades to train deep multi-layer neural networks.

•  No successful attempts were reported before 2006… Exception: convolutional neural networks (LeCun, 1998)

•  SVM: Vapnik and his co-workers developed the Support Vector Machine (1993), a shallow architecture.

•  Breakthrough in 2006!

92  

Page 93: From neural networks to deep learning

2006 breakthrough

•  More data
•  Faster hardware: GPUs, multi-core CPUs
•  Working ideas on how to train deep architectures

93  

Page 94: From neural networks to deep learning

•  Beat state of the art in many areas:
  –  Language modeling (Mikolov et al., 2012)
  –  Image recognition (Krizhevsky won the 2012 ImageNet competition)
  –  Sentiment classification (Socher et al., 2011)
  –  Speech recognition (Dahl et al., 2010)
  –  MNIST hand-written digit recognition (Ciresan et al., 2010)

94  

Page 95: From neural networks to deep learning

Credits

•  Roelof Pieters, www.graph-technologies.com
•  Andrew Ng
•  http://colah.github.io/

95