From neural networks to deep learning


From Artificial Neural Networks to Deep Learning

Viet-Trung Tran


Perceptron

•  Rosenblatt, 1957
•  Input signals x1, x2, …
•  Bias x0 = 1
•  Net input = weighted sum = Net(w, x)
•  Activation/transfer function = f(Net(w, x))
•  Output

(Figure: the inputs are combined by a weighted sum, which is passed through a step function.)

Weighted Sum and Bias

•  Weighted sum: Net(w, x) = w0*x0 + w1*x1 + … + wn*xn (sketched in code below)

•  Bias: the weight w0 on the fixed input x0 = 1, which shifts the threshold of the activation function
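The formulas on this slide were images; as a hedged reconstruction, here is a minimal Python sketch of the perceptron's weighted sum with the bias folded in as the weight on x0 = 1, followed by a step activation. The AND example at the end is purely illustrative.

```python
def perceptron(weights, inputs):
    """Perceptron forward pass: weighted sum of the inputs (bias folded in
    as weights[0] * 1.0), followed by a step activation."""
    x = [1.0] + list(inputs)                        # x0 = 1 carries the bias
    net = sum(w * xi for w, xi in zip(weights, x))  # Net(w, x) = sum_i w_i * x_i
    return 1 if net >= 0 else 0                     # hard-limiter / step function

# Example: a perceptron computing logical AND of two binary inputs
w = [-1.5, 1.0, 1.0]            # bias = -1.5, so the output is 1 only when x1 + x2 >= 1.5
print(perceptron(w, [1, 1]))    # -> 1
print(perceptron(w, [1, 0]))    # -> 0
```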


Hard-limiter function

•  Hard-limiter
  –  Threshold function
  –  Discontinuous function
  –  Discontinuous derivative


Threshold logic function

•  Saturating linear function

•  Continuous function

•  Discontinuous derivative


Sigmoid function

•  Most popular
•  Output in (0, 1)
•  Continuous derivatives
•  Easy to differentiate (see the sketch below)
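For concreteness, here is a small sketch of the activation functions named on these slides (hard-limiter, saturating linear, sigmoid) together with the sigmoid's derivative; this is generic illustration code, not code from the deck.

```python
import math

def hard_limiter(net, threshold=0.0):
    """Step function: 1 at or above the threshold, 0 otherwise (discontinuous)."""
    return 1.0 if net >= threshold else 0.0

def saturating_linear(net):
    """Threshold logic: linear between 0 and 1, clipped (saturated) outside."""
    return min(1.0, max(0.0, net))

def sigmoid(net):
    """Logistic sigmoid: smooth, output in (0, 1), easy to differentiate."""
    return 1.0 / (1.0 + math.exp(-net))

def sigmoid_derivative(net):
    """d/dnet sigmoid(net) = sigmoid(net) * (1 - sigmoid(net))."""
    s = sigmoid(net)
    return s * (1.0 - s)
```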


Artificial neural network – ANN structure

•  Number of input/output signals
•  Number of hidden layers
•  Number of neurons per layer
•  Neuron weights
•  Topology
•  Biases


Feed-forward neural network

•  Connections between the units do not form a directed cycle
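A minimal sketch of what "no directed cycle" means in practice: activations flow strictly from one layer to the next. The layer sizes and weights below are made up for the example.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, biases):
    """One fully-connected layer: weighted sums plus biases, then sigmoid."""
    return [sigmoid(sum(w * x for w, x in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

def feed_forward(x, layers):
    """Feed-forward pass: activations flow strictly from layer to layer,
    never back (no directed cycle)."""
    for weights, biases in layers:
        x = layer(x, weights, biases)
    return x

# 2 inputs -> 2 hidden neurons -> 1 output neuron (illustrative weights)
net = [([[0.5, -0.4], [0.3, 0.8]], [0.1, -0.1]),   # hidden layer
       ([[1.0, -1.0]], [0.0])]                     # output layer
print(feed_forward([1.0, 0.5], net))
```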


Recurrent neural network

•  A class of artificial neural networks where connections between units form a directed cycle


Why hidden layers


Neural network learning

•  Two types of learning
  –  Parameter learning: learn the neuron connection weights
  –  Structure learning: learn the ANN structure from training data


Error function

•  Consider an ANN with n neurons
•  For each training example (x, d): the training error caused by the current weights w
•  The training error caused by w over the entire set of training examples (standard formulas reconstructed below)
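The error formulas on the slide were images; the standard squared-error definitions they presumably correspond to are (assuming d_i and o_i are the desired and actual outputs of neuron i):

```latex
% Per-example error for training example (x, d):
E_x(\mathbf{w}) = \tfrac{1}{2} \sum_{i=1}^{n} \left( d_i - o_i \right)^2
% Total training error over the whole training set D:
E(\mathbf{w}) = \sum_{(x, d) \in D} E_x(\mathbf{w})
```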


Learning principle


Neuron error gradients


Parameter learning: back propagation of error

•  Calculate the total error at the top (the output)
•  Going backwards, calculate the contributions to the error at each step (a numerical sketch follows)
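A minimal numerical sketch of this idea, assuming a network with one sigmoid hidden layer, a single sigmoid output, and squared error; all names and sizes here are illustrative, not taken from the slides.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def backprop_single_example(x, d, w_hidden, w_out, lr=0.5):
    """One backprop step for a 2-input, 2-hidden, 1-output sigmoid network
    with squared error E = 0.5 * (d - o)**2. Returns the updated weights."""
    # Forward pass
    h = [sigmoid(sum(w * xi for w, xi in zip(ws, x))) for ws in w_hidden]
    o = sigmoid(sum(w * hi for w, hi in zip(w_out, h)))

    # Error at the top, then contributions propagated backwards
    delta_out = (o - d) * o * (1 - o)                       # dE/dNet_out
    delta_hid = [delta_out * w_out[j] * h[j] * (1 - h[j])   # dE/dNet_hidden_j
                 for j in range(len(h))]

    # Gradient-descent weight updates
    w_out = [w_out[j] - lr * delta_out * h[j] for j in range(len(h))]
    w_hidden = [[w_hidden[j][i] - lr * delta_hid[j] * x[i] for i in range(len(x))]
                for j in range(len(h))]
    return w_hidden, w_out

# Illustrative call: inputs [1, 0], target 1, small hand-picked weights
print(backprop_single_example([1.0, 0.0], 1.0, [[0.2, -0.1], [0.4, 0.3]], [0.5, -0.5]))
```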


Back propagation discussion

•  Initial weights
•  Learning rate
•  Number of neurons per hidden layer
•  Number of hidden layers


Stochastic gradient descent (SGD)
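The slide itself is an image; as a hedged sketch, SGD updates the weights from the gradient of the error on one example at a time (after shuffling), instead of accumulating the gradient over the whole training set. The linear-fit example at the bottom is purely illustrative.

```python
import random

def sgd(examples, w, gradient, lr=0.1, epochs=10):
    """Stochastic gradient descent: shuffle the data, then update the weights
    from the gradient of the error on one example at a time.
    `gradient(w, x, d)` must return dE_x/dw as a list the same length as w."""
    for _ in range(epochs):
        random.shuffle(examples)
        for x, d in examples:
            g = gradient(w, x, d)
            w = [wi - lr * gi for wi, gi in zip(w, g)]   # w <- w - lr * grad
    return w

# Example: fit y = w0 + w1*x by least squares on two points
data = [([1.0, 1.0], 2.0), ([1.0, 2.0], 3.0)]            # x includes a constant 1 for the bias
grad = lambda w, x, d: [2 * (sum(wi * xi for wi, xi in zip(w, x)) - d) * xi for xi in x]
print(sgd(data, [0.0, 0.0], grad, lr=0.05, epochs=200))  # approaches [1, 1]
```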


Deep learning


Google Brain


GPU


Learning from tagged data

•  @Andrew Ng


2006 breakthrough

•  More data
•  Faster hardware: GPUs, multi-core CPUs
•  Working ideas on how to train deep architectures


Deep Learning trends

•  @Andrew Ng


AI will transform the internet

•  @Andrew Ng
•  Technology areas with potential for a paradigm shift:
  –  Computer vision
  –  Speech recognition & speech synthesis
  –  Language understanding: machine translation; web search; dialog systems; …
  –  Advertising
  –  Personalization/recommendation systems
  –  Robotics
•  All of this is hard: scalability, algorithms.


Deep learning


CONVOLUTIONAL NEURAL NETWORK

http://colah.github.io/


Convolution

•  Convolution is a mathematical operation on two functions f and g, producing a third function that is typically viewed as a modified version of one of the original functions.
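To ground the definition, here is the discrete 1D form, (f * g)[t] = Σ_k f[k]·g[t − k], with a tiny hand-rolled example (illustrative only):

```python
def convolve(f, g):
    """Discrete 1D convolution: (f * g)[t] = sum_k f[k] * g[t - k]."""
    out = [0.0] * (len(f) + len(g) - 1)
    for k, fk in enumerate(f):
        for j, gj in enumerate(g):
            out[k + j] += fk * gj
    return out

signal = [1, 2, 3, 4]
kernel = [0.5, 0.5]              # a simple moving-average kernel
print(convolve(signal, kernel))  # [0.5, 1.5, 2.5, 3.5, 2.0]
```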


Convolutional neural networks

•  A conv net is a kind of neural network that uses many identical copies of the same neuron
  –  Large number of neurons
  –  Large computational models
  –  The number of actual weights (parameters) to be learned stays fairly small


A 2D Convolutional Neural Network

•  A convolutional neural network can learn a neuron once and use it in many places, making the model easier to learn and reducing error.


Structure of Conv Nets

•  Problem: predict whether a human is speaking or not

•  Input: audio samples at different points in time


Simple approach

•  Just connect all the samples to a fully-connected layer

•  Then classify


A more sophisticated approach

•  Exploit local properties of the data
  –  e.g. the frequency of sounds (increasing/decreasing)
•  Look at a small window of the audio samples
  –  Create a group of neurons A to compute certain features
  –  The output of this convolutional layer is fed into a fully-connected layer F (sketched below)
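A minimal sketch of that idea, assuming the group of neurons A is a small set of filters slid over fixed-size windows of the audio, with the same weights reused at every position; the filter values and window size are illustrative.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def conv_layer(samples, filters, window=3):
    """Apply the same group of neurons A (`filters`) to every window of the
    input: weight sharing means only len(filters) * window weights are learned,
    no matter how long the audio is."""
    outputs = []
    for start in range(len(samples) - window + 1):
        segment = samples[start:start + window]
        outputs.append([sigmoid(sum(w * s for w, s in zip(f, segment)))
                        for f in filters])
    return outputs  # one feature vector per window, to be fed into a layer F

audio = [0.1, 0.4, 0.35, 0.8, 0.2, 0.05]   # toy audio samples
A = [[1.0, -1.0, 0.0], [0.5, 0.5, 0.5]]    # two shared filters
print(conv_layer(audio, A))
```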


Max pooling layer
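The slide is image-only; as a hedged sketch, a max pooling layer simply keeps the largest activation in each small window, summarizing the features computed by the convolutional layer and shrinking its output:

```python
def max_pool(values, window=2):
    """Non-overlapping max pooling over a 1D list of activations."""
    return [max(values[i:i + window]) for i in range(0, len(values), window)]

print(max_pool([0.1, 0.7, 0.3, 0.2, 0.9, 0.4]))  # [0.7, 0.3, 0.9]
```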


2D convolutional neural networks


Three-dimensional convolutional networks


Group of neurons: A

•  A bunch of neurons in parallel
•  All get the same inputs and compute different features


Network in Network (Lin et al., 2013)


Conv Nets breakthroughs in computer vision

•  Krizhevsky et al. (2012)


Different Levels of Abstraction


RECURRENT NEURAL NETWORKS

http://colah.github.io/


Recurrent Neural Networks (RNN) have loops

•  A loop allows information to be passed from one step of the network to the next.


Unroll RNN

•  Recurrent neural networks are intimately related to sequences and lists.
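As a minimal sketch of unrolling, the same cell (the same weights) is applied at every time step, with the hidden state carrying information forward; the tanh cell and the scalar weights below are illustrative.

```python
import math

def rnn_step(x, h, w_x, w_h, b):
    """One recurrent step: new hidden state from the current input and the previous state."""
    return math.tanh(w_x * x + w_h * h + b)

def unrolled_rnn(sequence, w_x=0.8, w_h=0.5, b=0.0):
    """Unrolling the loop: the same weights are reused at every time step."""
    h = 0.0
    states = []
    for x in sequence:
        h = rnn_step(x, h, w_x, w_h, b)
        states.append(h)
    return states

print(unrolled_rnn([1.0, 0.0, -1.0, 0.5]))
```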


Examples

•  Predict the last word in "the clouds are in the sky"
•  The gap between the relevant information and the place where it is needed is small
•  RNNs can learn to use the past information


•  “I grew up in France… I speak fluent French.”
•  As the gap grows, RNNs become unable to learn to connect the information.


LONG SHORT-TERM MEMORY NETWORKS

LSTM Networks


LSTM networks

•  A special kind of RNN
•  Capable of learning long-term dependencies
•  Structured as a chain of repeating neural network modules


RNN

•  The repeating module has a very simple structure, such as a single tanh layer


•  The tanh(z) function is a rescaled version of the sigmoid, and its output range is [−1, 1] instead of [0, 1].
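Concretely, writing σ for the sigmoid, the rescaling is:

```latex
\tanh(z) = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}} = 2\,\sigma(2z) - 1,
\qquad \sigma(z) = \frac{1}{1 + e^{-z}}
```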


LSTM networks

•  The repeating module consists of four neural network layers, interacting in a very special way


Core idea behind LSTMs

•  The key to LSTMs is the cell state, the horizontal line running through the top of the diagram.
•  The cell state runs straight down the entire chain, with only some minor linear interactions.
•  It is easy for information to just flow along it unchanged.


Gates

•  Gates give the ability to remove or add information to the cell state; they carefully regulate what passes through

•  A sigmoid layer outputs how much of each component should be let through
  –  Zero means let nothing through
  –  One means let everything through

•  An LSTM has three of these gates

LSTM step 1

•  Decide what information we're going to throw away from the cell state

•  The "forget gate" layer


LSTM step 2

•  Decide what new information we're going to store in the cell state

•  The "input gate" layer


LSTM step 3

•  Update the old cell state, Ct−1, into the new cell state Ct


LSTM step 4

•  Decide what we're going to output (a combined sketch of the four steps follows)
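Putting the four steps together, here is a hedged, scalar-sized sketch of one LSTM step (forget gate, input gate, cell-state update, output gate); real implementations use vectors and learned weight matrices, and all values below are illustrative.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, w):
    """One LSTM step with scalar state, following the four steps above.
    `w` holds (input weight, recurrent weight, bias) per gate."""
    def gate(name, activation):
        wx, wh, b = w[name]
        return activation(wx * x + wh * h_prev + b)

    f = gate("forget", sigmoid)             # step 1: what to throw away from the cell state
    i = gate("input", sigmoid)              # step 2: what new information to store
    c_tilde = gate("candidate", math.tanh)  #         candidate cell values
    c = f * c_prev + i * c_tilde            # step 3: update C_{t-1} into C_t
    o = gate("output", sigmoid)             # step 4: decide what to output
    h = o * math.tanh(c)                    #         output a filtered view of the cell state
    return h, c

weights = {"forget": (0.5, 0.5, 0.0), "input": (0.6, -0.4, 0.1),
           "candidate": (1.0, 0.8, 0.0), "output": (0.7, 0.2, -0.1)}
h, c = 0.0, 0.0
for x in [1.0, -0.5, 0.3]:                  # run the chain over a short sequence
    h, c = lstm_step(x, h, c, weights)
print(h, c)
```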


RECURRENT NEURAL NETWORKS WITH WORD EMBEDDINGS


APPENDIX


Perceptron 1957


Perceptron 1957


Perceptron 1986


Perceptron


Activation function


Back propagation 1974/1986


•  Inspired by the architectural depth of the brain, researchers wanted for decades to train deep multi-layer neural networks.

•  No successful attempts were reported before 2006… Exception: convolutional neural networks (LeCun, 1998).

•  SVM: Vapnik and his co-workers developed the Support Vector Machine (1993), a shallow architecture.

•  Breakthrough in 2006!


2006 breakthrough

•  More data
•  Faster hardware: GPUs, multi-core CPUs
•  Working ideas on how to train deep architectures


•  Beat the state of the art in many areas:
  –  Language modeling (Mikolov et al., 2012)
  –  Image recognition (Krizhevsky won the 2012 ImageNet competition)
  –  Sentiment classification (Socher et al., 2011)
  –  Speech recognition (Dahl et al., 2010)
  –  MNIST hand-written digit recognition (Ciresan et al., 2010)


Credits

•  Roelof Pieters, www.graph-technologies.com
•  Andrew Ng
•  http://colah.github.io/
