Page 1

Deep learning & tensorflow

杞坚玮

2017.3.22

Page 2

What I want to talk about

1. Let’s start with machine learning

2. Perceptron

3. Neural network

4. Why deep

5. CNN, RNN, LSTM

6. Tensorflow

7. Some demos

8. Other frameworks

9. Future work

Page 3

1. Let’s start with machine learning

When we talk about machine learning, we are actually looking for a function.

• Speech recognition: f(audio) = “Deep learning is great.”

• Image recognition: f(image) = “Cat”

• Sentiment analysis: f(“I think this car is great”) = “Positive attitude”

• Dialogue system: f(“How are you?”) = “I’m fine, and you?”

Page 4

1. Let’s start with machine learning

The framework of machine learning

Training:

1. Training data, e.g. images labeled “cat”, “dog”, “cat”.

2. A set of functions (model 1, 2, 3, …).

3. A measure of the goodness of a function f.

4. Pick the best function f*.

Testing:

Apply f* to the test data, e.g. to answer “this is a cat or not”.

Page 5

2. Perceptron

The perceptron is a discriminative model for supervised learning in machine learning.
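
As a concrete illustration, here is a minimal perceptron sketch in NumPy (not from the slides): it learns a linear decision boundary sign(w·x + b) by nudging w and b on every misclassified example.

import numpy as np

def train_perceptron(X, y, epochs=10, lr=1.0):
    """X: (m, n) inputs; y: labels in {-1, +1}."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (xi @ w + b) <= 0:  # misclassified: update
                w += lr * yi * xi
                b += lr * yi
    return w, b

# Toy usage: learn an AND-like boundary
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([-1, -1, -1, 1])
w, b = train_perceptron(X, y)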

Page 6

3. Neural network

Neural networks give a way of defining a complex, non-linear form of hypothesis $h_{W,b}(x)$, with parameters $W, b$ that we can fit to our data.

[Diagram: a single neuron with inputs $x_1, x_2, x_3$, weights $w_1, w_2, w_3$, bias $b$, activation $\sigma(z)$, and output $h_{W,b}(x)$.]

Page 7

3. Neural network

This neuron is a computational unit that takes $x_1, x_2, x_3$ as input, and its output is $h_{W,b}(x) = \sigma\left(\sum_i w_i x_i + b\right)$, where $\sigma(z)$ is called the activation function.

There are several common choices for the activation function:

Sigmoid: $\sigma(z) = \dfrac{1}{1+e^{-z}}$; Tanh: $\sigma(z) = \dfrac{e^{z}-e^{-z}}{e^{z}+e^{-z}}$; Rectified linear: $\sigma(z) = \max(0, z)$
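
For reference, the three activations as NumPy one-liners (a small sketch; the slides contain no code here):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    return (np.exp(z) - np.exp(-z)) / (np.exp(z) + np.exp(-z))  # same as np.tanh(z)

def relu(z):
    return np.maximum(0.0, z)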

Page 8

3. Neural network

We define the overall cost function to be:

$$J(W,b) = \underbrace{\frac{1}{m}\sum_{i=1}^{m}\frac{1}{2}\left\|h_{W,b}(x^{(i)}) - y^{(i)}\right\|^2}_{\text{average sum-of-squares error term}} + \underbrace{\frac{\theta}{2}\sum_{l=1}^{n_l-1}\sum_{i=1}^{s_l}\sum_{j=1}^{s_{l+1}}\left(W_{ji}^{(l)}\right)^2}_{\text{weight decay term}}$$

where $m$ is the number of training examples, $n_l$ is the number of layers, and $s_l$ is the number of nodes in layer $l$ (not counting the bias unit).
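
As a sketch of how this cost could be computed in NumPy (the array shapes and names are my assumptions, not the slides'):

import numpy as np

def cost(h, y, Ws, theta):
    """h, y: (m, k) predictions/targets; Ws: list of weight matrices W^(l)."""
    error = np.mean(0.5 * np.sum((h - y) ** 2, axis=1))    # average sum-of-squares error
    decay = 0.5 * theta * sum(np.sum(W ** 2) for W in Ws)  # weight decay term
    return error + decay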

Page 9

3. Neural network

We can update the parameters $W, b$ as follows (gradient descent):

$$W_{ij}^{(l)} = W_{ij}^{(l)} - \alpha\,\frac{\partial}{\partial W_{ij}^{(l)}} J(W,b), \qquad b_{i}^{(l)} = b_{i}^{(l)} - \alpha\,\frac{\partial}{\partial b_{i}^{(l)}} J(W,b)$$

where $\alpha$ is the learning rate. The problem lies in computing the partial derivatives above; the backpropagation algorithm gives an efficient way to compute them.

Page 10

3. Neural network

In the backpropagation algorithm, we first describe how to compute the partial derivatives of the cost function with respect to a single example. Once we can compute these, the derivative of the overall cost function can be computed as:

$$\frac{\partial}{\partial W_{ij}^{(l)}} J(W,b) = \frac{1}{m}\sum_{i=1}^{m}\frac{\partial}{\partial W_{ij}^{(l)}} J(W,b;x^{(i)},y^{(i)}) + \theta W_{ij}^{(l)}$$

$$\frac{\partial}{\partial b_{i}^{(l)}} J(W,b) = \frac{1}{m}\sum_{i=1}^{m}\frac{\partial}{\partial b_{i}^{(l)}} J(W,b;x^{(i)},y^{(i)})$$

Page 11

3. Neural network

Here is the backpropagation algorithm:

1. Perform a feedforward pass, computing the activations for layers $L_2$, $L_3$, and so on up to the output layer $L_{n_l}$, where $z_i^{(l)} = \sum_{j} W_{ij}^{(l-1)} a_j^{(l-1)} + b_i^{(l-1)}$ and $a_i^{(l)} = f(z_i^{(l)})$.

2. For each output unit $i$ in layer $L_{n_l}$ (the output layer), set

$$\delta_i^{(n_l)} = \frac{\partial}{\partial z_i^{(n_l)}}\,\frac{1}{2}\left\|y - h_{W,b}(x)\right\|^2 = -\left(y_i - a_i^{(n_l)}\right) f'(z_i^{(n_l)})$$

3. For $l = n_l-1, n_l-2, n_l-3, \dots, 2$, for each node $i$ in layer $l$, set

$$\delta_i^{(l)} = \left(\sum_{j=1}^{s_{l+1}} W_{ji}^{(l)} \delta_j^{(l+1)}\right) f'(z_i^{(l)})$$

Page 12

3. Neural network

4. Compute the desired partial derivatives, which are given as:

$$\frac{\partial}{\partial W_{ij}^{(l)}} J(W,b;x,y) = a_j^{(l)}\,\delta_i^{(l+1)}, \qquad \frac{\partial}{\partial b_{i}^{(l)}} J(W,b;x,y) = \delta_i^{(l+1)}$$

Note: in steps 2 and 3 above, we need to compute $f'(z_i^{(l)})$ for each value of $i$. Assuming we choose the sigmoid as the activation function, it can be computed as $f'(z_i) = a_i(1 - a_i)$.

Page 13

3. Neural network

Finally, we can describe the full gradient descent algorithm in the pseudo-code below.

1. Set $\Delta W^{(l)} = 0$, $\Delta b^{(l)} = 0$ for all $l$.

2. For $i = 1$ to $m$:

   1. Use backpropagation to compute $\nabla_{W^{(l)}} J(W,b;x^{(i)},y^{(i)})$ and $\nabla_{b^{(l)}} J(W,b;x^{(i)},y^{(i)})$.

   2. Set $\Delta W^{(l)} = \Delta W^{(l)} + \nabla_{W^{(l)}} J(W,b;x^{(i)},y^{(i)})$.

   3. Set $\Delta b^{(l)} = \Delta b^{(l)} + \nabla_{b^{(l)}} J(W,b;x^{(i)},y^{(i)})$.

3. Update the parameters:

$$W^{(l)} = W^{(l)} - \alpha\left(\frac{1}{m}\Delta W^{(l)} + \theta W^{(l)}\right), \qquad b^{(l)} = b^{(l)} - \alpha\,\frac{1}{m}\Delta b^{(l)}$$
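
Putting steps 1-4 and the pseudo-code together, here is a minimal NumPy sketch of one batch gradient-descent step for a network with a single hidden layer and sigmoid activations (the shapes and variable names are my assumptions, not the slides' code):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_step(X, Y, W1, b1, W2, b2, alpha, theta):
    m = X.shape[0]
    # Step 1: feedforward pass
    z2 = X @ W1 + b1
    a2 = sigmoid(z2)
    z3 = a2 @ W2 + b2
    a3 = sigmoid(z3)
    # Step 2: output-layer error, delta = -(y - a) f'(z), with f'(z) = a(1 - a)
    d3 = -(Y - a3) * a3 * (1.0 - a3)
    # Step 3: backpropagate the error to the hidden layer
    d2 = (d3 @ W2.T) * a2 * (1.0 - a2)
    # Step 4: gradients, accumulated (summed) over the m examples
    dW2 = a2.T @ d3
    db2 = d3.sum(axis=0)
    dW1 = X.T @ d2
    db1 = d2.sum(axis=0)
    # Update with learning rate alpha and weight decay theta
    W1 -= alpha * (dW1 / m + theta * W1)
    b1 -= alpha * db1 / m
    W2 -= alpha * (dW2 / m + theta * W2)
    b2 -= alpha * db2 / m
    return W1, b1, W2, b2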

Page 14

4. Why deep?

There is no doubt that the more parameters, the better performance. We can build

a deep neural network or a fat neural network, which is better?

The answer is deep neural network. Because each layer of the deep neural

network can be trained as a classifier module, this process is what we called

modularization, which need less data than one hidden layer neural network.

Page 15

5. CNN, RNN, LSTM

CNN: Convolutional Neural Network

From images to everything else, some properties hold (worked examples follow on the next pages):

• Some patterns are much smaller than the whole image.
  (For documents: smaller than the whole document.)

• The same patterns appear in different regions.
  (For documents: in different positions of the document.)

• Subsampling the pixels will not change the object.
  (For documents: subsampling the words will not change the topic of the document.)

So we can do convolution based on properties 1 and 2, and pooling based on property 3. Then we flatten the features and use them as the input to a fully connected feedforward network.

Page 16

5. CNN, RNN, LSTM

Convolution

A 6×6 binary input image:

1 0 0 0 0 1
0 1 0 0 1 0
0 0 1 1 0 0
1 0 0 0 1 0
0 1 0 0 1 0
0 0 1 0 1 0

Two 3×3 filters, moved across the input with a stride of 1:

Filter 1:          Filter 2:
 1 -1 -1           -1  1 -1
-1  1 -1           -1  1 -1
-1 -1  1           -1  1 -1

Each filter produces a 4×4 feature map (input → filter → feature map); filter 1, for example, yields 3 at the top-left position, where its diagonal pattern matches the input, and negative values elsewhere.
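
The same computation as a NumPy sketch (not the slides' code); conv2d below slides the filter over the image with the given stride:

import numpy as np

image = np.array([
    [1, 0, 0, 0, 0, 1],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 1, 1, 0, 0],
    [1, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 1, 0, 1, 0],
])

filter1 = np.array([
    [ 1, -1, -1],
    [-1,  1, -1],
    [-1, -1,  1],
])

def conv2d(img, flt, stride=1):
    """Valid convolution (really cross-correlation, as in CNNs)."""
    h = (img.shape[0] - flt.shape[0]) // stride + 1
    w = (img.shape[1] - flt.shape[1]) // stride + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            window = img[i*stride:i*stride+flt.shape[0],
                         j*stride:j*stride+flt.shape[1]]
            out[i, j] = np.sum(window * flt)
    return out

feature_map = conv2d(image, filter1)  # 4x4; the top-left entry is 3
print(feature_map)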

Page 17

5. CNN, RNN, LSTM

Max pooling

[Diagram: each 4×4 feature map from the convolution step is divided into 2×2 blocks, and the maximum of each block is kept, shrinking the map from 4×4 to 2×2.]

• Convolution uses "some patterns are much smaller than the whole image" and "the same patterns appear in different regions" (6×6 input → 4×4 feature map).

• Max pooling uses "subsampling the pixels will not change the object" (4×4 feature map → 2×2).
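
Continuing the NumPy sketch above (and reusing its 4×4 feature_map), 2×2 max pooling can be written as:

def max_pool2d(fmap, size=2):
    """Keep the maximum of each size x size block."""
    h, w = fmap.shape[0] // size, fmap.shape[1] // size
    return fmap.reshape(h, size, w, size).max(axis=(1, 3))

pooled = max_pool2d(feature_map)  # 4x4 -> 2x2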

Page 18

5. CNN, RNN, LSTM

The whole CNN

begin → image → convolution → pooling → convolution → pooling → … → flatten

(The convolution + pooling pair can repeat many times before flattening.)

Page 19

5. CNN, RNN, LSTM

RNN: Recurrent Neural Network

The outputs of the hidden layer are stored in a memory, and the memory can be considered as another input.

[Diagram: inputs $x_1, x_2$ produce hidden-layer outputs $a_1, a_2$, which are stored and fed back at the next step.]

Example

Query 1: I will leave Xiamen on March 17th.

Query 2: I will arrive in Xiamen on March 20th.

The values stored in memory differ after "leave" and "arrive", so the results for "Xiamen" in query 1 and query 2 differ: it can be a place of departure or a destination.
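
A single step of a vanilla RNN as a NumPy sketch (the weight names W_xh, W_hh, b_h are my assumptions): the previous hidden state is the "memory" fed back in as another input.

import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One time step: combine the current input with the stored memory."""
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)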

Page 20

5. CNN, RNN, LSTM

LSTM: Long Short-Term Memory

[Diagram: an LSTM cell. The cell input $z$ passes through $g$; the input gate $z_i$, forget gate $z_f$, and output gate $z_o$ each pass through the gate activation $f$; $c$ is the stored cell value.]

$$c' = g(z)\,f(z_i) + c\,f(z_f)$$

$$a = h(c')\,f(z_o)$$

The new cell value $c'$ mixes the transformed input $g(z)$, scaled by the input gate, with the old cell value $c$, scaled by the forget gate; the output $a$ is the transformed cell value $h(c')$ scaled by the output gate.
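
The two equations above as runnable code (a NumPy sketch; following common practice it assumes the gate activation $f$ is the sigmoid, $g$ and $h$ are tanh, and the pre-activations z, z_i, z_f, z_o are given):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(z, z_i, z_f, z_o, c):
    c_new = np.tanh(z) * sigmoid(z_i) + c * sigmoid(z_f)  # c' = g(z)f(z_i) + c f(z_f)
    a = np.tanh(c_new) * sigmoid(z_o)                     # a  = h(c')f(z_o)
    return a, c_new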

Page 21

6. Tensorflow

TensorFlow is an open source software library for numerical computation using

data flow graphs.

Nodes in the graph represent mathematical operations, while the graph edges

represent the multidimensional data arrays (tensors) communicated between

them.

The flexible architecture allows you to deploy computation to one or more CPUs

or GPUs in a desktop, server, or mobile device with a single API.

Page 22

6. Tensorflow

Tensor

[1, 2, 3] # a rank 1 tensor with shape [3]

[[1, 2, 3], [4, 5, 6]] # a rank 2 tensor with shape [2, 3]

[[[1, 2, 3]], [[4, 5, 6]]] # a rank 3 tensor with shape [2, 1, 3]

Node

import tensorflow as tf

node = tf.constant(3.0, tf.float32)
print(node)
=> Tensor("Const:0", shape=(), dtype=float32)

sess = tf.Session()
print(sess.run([node]))
=> [3.0]

Page 23

6. Tensorflow

import tensorflow as tf

a = tf.placeholder(tf.float32)
b = tf.placeholder(tf.float32)
adder_node = a + b  # shortcut for tf.add(a, b)

sess = tf.Session()
print(sess.run(adder_node, {a: 3, b: 4.5}))
=> 7.5
print(sess.run(adder_node, {a: [1, 3], b: [2, 4]}))
=> [ 3. 7.]

[Graph: placeholders a and b feed into adder_node.]

Page 24

6. Tensorflow

W = tf.Variable([.3], tf.float32)
b = tf.Variable([-.3], tf.float32)
x = tf.placeholder(tf.float32)
linear_model = W * x + b

sess.run(tf.global_variables_initializer())  # variables must be initialized before use

print(sess.run(linear_model, {x: [1, 2, 3, 4]}))
=> [ 0. 0.3 0.6 0.9]

Page 25

6. Tensorflow

y = tf.placeholder(tf.float32)
loss = tf.reduce_sum(tf.square(linear_model - y))  # sum-of-squares error
optimizer = tf.train.GradientDescentOptimizer(0.01)
train = optimizer.minimize(loss)

x_train = [1, 2, 3, 4]
y_train = [0, -1, -2, -3]

init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
for i in range(1000):
    sess.run(train, {x: x_train, y: y_train})

curr_W, curr_b, curr_loss = sess.run([W, b, loss], {x: x_train, y: y_train})
print("W: %s b: %s loss: %s" % (curr_W, curr_b, curr_loss))

Page 26

7. Some demos

CNN in image recognition

begin: 28×28 images of handwritten digits (1 picture of 28×28)

→ 5×5 convolution, 4 features, zero padding (4 pictures of 28×28)

→ 2×2 pooling (4 pictures of 14×14)

→ 5×5 convolution, 8 features, zero padding (8 pictures of 14×14)

→ 2×2 pooling (8 pictures of 7×7)

→ flatten (8×7×7 input nodes) → 1024 hidden nodes → 10 output nodes with softmax

The accuracy rate is 99.2%.
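
A sketch of this architecture in TensorFlow 1.x (the slides do not show the demo code; the initializers and variable names are my assumptions):

import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 28, 28, 1])  # batch of 28x28 images

# 5x5 convolution, 4 features, zero ("SAME") padding: 1x28x28 -> 4x28x28
W1 = tf.Variable(tf.truncated_normal([5, 5, 1, 4], stddev=0.1))
b1 = tf.Variable(tf.zeros([4]))
h1 = tf.nn.relu(tf.nn.conv2d(x, W1, strides=[1, 1, 1, 1], padding='SAME') + b1)
p1 = tf.nn.max_pool(h1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1],
                    padding='SAME')  # 2x2 pooling -> 4x14x14

# 5x5 convolution, 8 features: 4x14x14 -> 8x14x14, then 2x2 pooling -> 8x7x7
W2 = tf.Variable(tf.truncated_normal([5, 5, 4, 8], stddev=0.1))
b2 = tf.Variable(tf.zeros([8]))
h2 = tf.nn.relu(tf.nn.conv2d(p1, W2, strides=[1, 1, 1, 1], padding='SAME') + b2)
p2 = tf.nn.max_pool(h2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

# Flatten (8*7*7 nodes), 1024 hidden nodes, 10 softmax outputs
flat = tf.reshape(p2, [-1, 8 * 7 * 7])
W3 = tf.Variable(tf.truncated_normal([8 * 7 * 7, 1024], stddev=0.1))
b3 = tf.Variable(tf.zeros([1024]))
hidden = tf.nn.relu(tf.matmul(flat, W3) + b3)
W4 = tf.Variable(tf.truncated_normal([1024, 10], stddev=0.1))
b4 = tf.Variable(tf.zeros([10]))
y = tf.nn.softmax(tf.matmul(hidden, W4) + b4)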

Page 27

7. Some demos

RNN in segmentation

begin: PKU corpus → dictionary index → embedding → 2 RNN layers → dense → {b, m, e, s}

(Each character is tagged b, m, e, or s: the beginning, middle, or end of a word, or a single-character word.)
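
A sketch of such a model in Keras (the slides do not show the code; the vocabulary size, embedding width, and layer sizes are assumptions):

from keras.models import Sequential
from keras.layers import Embedding, SimpleRNN, TimeDistributed, Dense

VOCAB_SIZE = 5000  # assumption: size of the dictionary index
MAX_LEN = 32       # assumption: padded sentence length

model = Sequential()
model.add(Embedding(VOCAB_SIZE, 128, input_length=MAX_LEN))
model.add(SimpleRNN(128, return_sequences=True))  # RNN layer 1
model.add(SimpleRNN(128, return_sequences=True))  # RNN layer 2
model.add(TimeDistributed(Dense(4, activation='softmax')))  # {b, m, e, s}
model.compile(optimizer='adam', loss='categorical_crossentropy')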


Page 29

8. Other frameworks

Comparison of deep learning software (see reference 4):

| Software | Open source | Platform | Written in | Interface | OpenMP support | OpenCL support | CUDA support | Automatic differentiation | Has pretrained models | Recurrent nets | Convolutional nets | RBM/DBNs | Parallel execution (multi-node) |
| Apache Singa | Yes | Linux, Mac OS X, Windows | C++ | Python, C++, Java | No | Yes | Yes | ? | Yes | Yes | Yes | Yes | Yes |
| Deeplearning4j | Yes | Linux, Mac OS X, Windows, Android (cross-platform) | C, C++ | Java, Scala, Clojure, Python (Keras) | Yes | On roadmap | Yes | Computational graph | Yes | Yes | Yes | Yes | Yes |
| Dlib | Yes | Cross-platform | C++ | C++ | Yes | No | Yes | Yes | Yes | No | Yes | Yes | Yes |
| Keras | Yes | Linux, Mac OS X, Windows | Python | Python | Only with the Theano backend | Under development for the Theano backend (on roadmap for the TensorFlow backend) | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Microsoft Cognitive Toolkit (CNTK) | Yes | Windows, Linux (OS X via Docker on roadmap) | C++ | Python, C++, command line, BrainScript (.NET on roadmap) | Yes | No | Yes | Yes | Yes | Yes | Yes | No | Yes |
| MXNet | Yes | Linux, Mac OS X, Windows, AWS, Android, iOS, JavaScript | Small C++ core library | C++, Python, Julia, Matlab, JavaScript, Go, R, Scala, Perl | Yes | On roadmap | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Neural Designer | No | Linux, Mac OS X, Windows | C++ | Graphical user interface | Yes | No | No | ? | ? | No | No | No | ? |
| OpenNN | Yes | Cross-platform | C++ | C++ | Yes | No | No | ? | ? | No | No | No | ? |
| TensorFlow | Yes | Linux, Mac OS X, Windows | C++, Python | Python (C/C++ public API only for executing graphs) | No | On roadmap | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Theano | Yes | Cross-platform | Python | Python | Yes | Under development | Yes | Yes | Through Lasagne's model zoo | Yes | Yes | Yes | Yes |
| Torch | Yes | Linux, Mac OS X, Windows, Android, iOS | C, Lua | Lua, LuaJIT, C, utility library for C++/OpenCL | Yes | Third-party implementations | Yes | Through Twitter's Autograd | Yes | Yes | Yes | Yes | Yes |
| Wolfram Mathematica | No | Windows, Mac OS X, Linux, cloud computing | C++ | Command line, Java, C++ | No | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |

Page 30

9. Future work

Next move

Event extraction, and short- or long-term influence identification.

Change frameworks?

It seems that Keras is a better choice.

Page 31

References

1. UFLDL Tutorial: http://ufldl.stanford.edu/tutorial

2. Hung-yi Lee, "Deep Learning Tutorial", NTU

3. TensorFlow: https://www.tensorflow.org

4. Comparison of deep learning software: https://en.wikipedia.org/wiki/Comparison_of_deep_learning_software