>TensorFlow and deep learning_ without a PhD

Transcript of: >TensorFlow and deep learning_ without a PhD

Page 1:

>TensorFlow and deep learning_ without a PhD

#Tensorflow @martin_gorner

Page 2:

MNIST = Mixed National Institute of Standards and Technology - download the dataset at http://yann.lecun.com/exdb/mnist/

Hello World: handwritten digits classification - MNIST

Page 3:


Very simple model: softmax classification

[Diagram: a 28x28 pixel image (784 pixels) feeds 10 output neurons, one per digit 0, 1, 2, ... 9; each neuron output is a weighted sum of all pixels + a bias, and the 10 outputs go through softmax]
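The softmax at the output is the standard normalised exponential (the formula itself is not spelled out on the slide): given the 10 weighted sums L_0 ... L_9,

$$\text{softmax}(L)_i = \frac{e^{L_i}}{\sum_{j=0}^{9} e^{L_j}}$$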

Page 4:

In matrix notation, 100 images at a time:

$$\underbrace{\begin{bmatrix} L_{0,0} & L_{0,1} & \cdots & L_{0,9} \\ L_{1,0} & L_{1,1} & \cdots & L_{1,9} \\ \vdots & & & \vdots \\ L_{99,0} & L_{99,1} & \cdots & L_{99,9} \end{bmatrix}}_{100 \times 10} = \underbrace{X}_{100 \times 784} \cdot \underbrace{\begin{bmatrix} w_{0,0} & w_{0,1} & \cdots & w_{0,9} \\ w_{1,0} & w_{1,1} & \cdots & w_{1,9} \\ \vdots & & & \vdots \\ w_{783,0} & w_{783,1} & \cdots & w_{783,9} \end{bmatrix}}_{784 \times 10} + \underbrace{\begin{bmatrix} b_0 & b_1 & \cdots & b_9 \end{bmatrix}}_{10}$$

X: 100 images, one per line, flattened to 784 pixels (100 lines, 784 columns)
W: weights, 784 lines, 10 columns
b: 10 biases, broadcast (the same biases are added on all lines)

Page 5:


Softmax, on a batch of images (tensor shapes in [ ]):

Predictions Y[100, 10] = softmax( Images X[100, 784] · Weights W[784, 10] + Biases b[10] )

matrix multiply; biases broadcast on all lines; softmax applied line by line

Page 6:


Y = tf.nn.softmax(tf.matmul(X, W) + b)

tensor shapes: X[100, 784] W[784, 10] b[10]

matrix multiply broadcast on all lines

Now in TensorFlow (Python)
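To make the shapes and the broadcast concrete, here is a minimal NumPy sketch of the same computation (illustration only: the image data is made up, the variable names mirror the slide):

import numpy as np

X = np.random.rand(100, 784)       # 100 images, one per line, flattened
W = np.zeros((784, 10))            # weights
b = np.zeros(10)                   # biases, broadcast on all 100 lines

L = X.dot(W) + b                                        # [100, 10] weighted sums
Y = np.exp(L) / np.exp(L).sum(axis=1, keepdims=True)    # softmax, applied line by line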

Page 7:


Cross entropy:

$$\text{cross entropy} = -\sum_{i=0}^{9} Y'_i \cdot \log(Y_i)$$

Y: computed probabilities, e.g. 0.1 0.2 0.1 0.3 0.2 0.1 0.9 0.2 0.1 0.1 (for digits 0 1 2 3 4 5 6 7 8 9)
Y': actual probabilities, "one-hot" encoded, e.g. 0 0 0 0 0 0 1 0 0 0 (this is a "6")
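As a sanity check of the example above (the computed row is illustrative, not normalised): the one-hot product keeps only the "6" term, so the cross entropy is -log(0.9) ≈ 0.105. A two-line check:

import numpy as np
Y  = np.array([0.1, 0.2, 0.1, 0.3, 0.2, 0.1, 0.9, 0.2, 0.1, 0.1])  # computed probabilities
Y_ = np.array([0, 0, 0, 0, 0, 0, 1, 0, 0, 0])                      # one-hot "6"
print(-np.sum(Y_ * np.log(Y)))                                     # ≈ 0.105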

Success ?

Page 9:

92%

Page 10:


import tensorflow as tf

X = tf.placeholder(tf.float32, [None, 28, 28, 1])
# "None" will become the batch size, 100; 28 x 28 grayscale images
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))

init = tf.initialize_all_variables()

TensorFlow - initialisation

Training = computing variables W and b

Page 11:


# model
Y = tf.nn.softmax(tf.matmul(tf.reshape(X, [-1, 784]), W) + b)

# placeholder for correct answers
Y_ = tf.placeholder(tf.float32, [None, 10])

# loss function
cross_entropy = -tf.reduce_sum(Y_ * tf.log(Y))

# % of correct answers found in batch
is_correct = tf.equal(tf.argmax(Y, 1), tf.argmax(Y_, 1))
accuracy = tf.reduce_mean(tf.cast(is_correct, tf.float32))

“one-hot” encoded

“one-hot” decoding

TensorFlow - success metrics

flattening images

Page 12:


optimizer = tf.train.GradientDescentOptimizer(0.003)
train_step = optimizer.minimize(cross_entropy)

learning rate

TensorFlow - training

loss function
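Spelled out, what minimize() does at each step is plain gradient descent on the loss (implied by the slide rather than shown):

$$W \leftarrow W - 0.003 \cdot \frac{\partial\,\text{cross entropy}}{\partial W}, \qquad b \leftarrow b - 0.003 \cdot \frac{\partial\,\text{cross entropy}}{\partial b}$$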

Page 13:

sess = tf.Session()
sess.run(init)

for i in range(1000):
    # load batch of images and correct answers
    batch_X, batch_Y = mnist.train.next_batch(100)
    train_data = {X: batch_X, Y_: batch_Y}

    # train
    sess.run(train_step, feed_dict=train_data)

    # success ?
    a, c = sess.run([accuracy, cross_entropy], feed_dict=train_data)

    # success on test data ?
    test_data = {X: mnist.test.images, Y_: mnist.test.labels}
    a, c = sess.run([accuracy, cross_entropy], feed_dict=test_data)

running a Tensorflow computation, feeding placeholders

TensorFlow - run !

Tip: do this every 100 iterations

Page 14:


import tensorflow as tf

X = tf.placeholder(tf.float32, [None, 28, 28, 1])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
init = tf.initialize_all_variables()

# model
Y = tf.nn.softmax(tf.matmul(tf.reshape(X, [-1, 784]), W) + b)

# placeholder for correct answers
Y_ = tf.placeholder(tf.float32, [None, 10])

# loss function
cross_entropy = -tf.reduce_sum(Y_ * tf.log(Y))

# % of correct answers found in batch
is_correct = tf.equal(tf.argmax(Y, 1), tf.argmax(Y_, 1))
accuracy = tf.reduce_mean(tf.cast(is_correct, tf.float32))

optimizer = tf.train.GradientDescentOptimizer(0.003)
train_step = optimizer.minimize(cross_entropy)

sess = tf.Session()
sess.run(init)

for i in range(10000):
    # load batch of images and correct answers
    batch_X, batch_Y = mnist.train.next_batch(100)
    train_data = {X: batch_X, Y_: batch_Y}

    # train
    sess.run(train_step, feed_dict=train_data)

    # success ? add code to print it
    a, c = sess.run([accuracy, cross_entropy], feed_dict=train_data)

    # success on test data ?
    test_data = {X: mnist.test.images, Y_: mnist.test.labels}
    a, c = sess.run([accuracy, cross_entropy], feed_dict=test_data)

initialisation

model

success metrics

training step

Run

TensorFlow - full python code
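One piece the full listing leaves implicit is where the mnist object comes from. In the TensorFlow 0.x/1.x era of this talk it was typically the bundled tutorial helper; a hedged sketch (reshape=False keeps images as [28, 28, 1], matching the X placeholder):

# assumption: mnist comes from TensorFlow's bundled MNIST tutorial helper
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data", one_hot=True, reshape=False)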

Page 15:

The ML cookbook so far: Softmax, Cross-entropy, Mini-batch

Page 16:

Go deep !

Page 17:


Let's try 5 fully-connected layers ! (784 inputs - overkill ;-)

[Diagram: 784 pixels -> layers of 200, 100, 60 and 30 neurons -> 10 outputs (digits 0 ... 9); sigmoid activation function on the hidden layers, softmax on the last layer]

Page 18:


K = 200
L = 100
M = 60
N = 30

W1 = tf.Variable(tf.truncated_normal([28*28, K], stddev=0.1))
B1 = tf.Variable(tf.zeros([K]))
W2 = tf.Variable(tf.truncated_normal([K, L], stddev=0.1))
B2 = tf.Variable(tf.zeros([L]))
W3 = tf.Variable(tf.truncated_normal([L, M], stddev=0.1))
B3 = tf.Variable(tf.zeros([M]))
W4 = tf.Variable(tf.truncated_normal([M, N], stddev=0.1))
B4 = tf.Variable(tf.zeros([N]))
W5 = tf.Variable(tf.truncated_normal([N, 10], stddev=0.1))
B5 = tf.Variable(tf.zeros([10]))

weights initialised with random values

TensorFlow - initialisation

Page 19:


XX = tf.reshape(X, [-1, 28*28])   # flatten the images; keep the X placeholder intact

Y1 = tf.nn.sigmoid(tf.matmul(XX, W1) + B1)

Y2 = tf.nn.sigmoid(tf.matmul(Y1, W2) + B2)

Y3 = tf.nn.sigmoid(tf.matmul(Y2, W3) + B3)

Y4 = tf.nn.sigmoid(tf.matmul(Y3, W4) + B4)

Y = tf.nn.softmax(tf.matmul(Y4, W5) + B5)

weights and biases

TensorFlow - the model

Page 20:


Demo - slow start ?

Page 21:

Relu !

Page 22:


RELU = Rectified Linear Unit

Y = tf.nn.relu(tf.matmul(X, W) + b)
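For reference, the activation itself is just (standard definition, not spelled out on the slide):

$$\mathrm{relu}(x) = \max(0, x)$$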

Page 23:


RELU

Page 24:

Demo - noisy accuracy curve ?

yuck!

Page 25:

Slow down... learning rate decay

Page 26:


Learning rate decay

Learning rate 0.003 at start, then dropping exponentially to 0.0001
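A minimal sketch of that schedule, using the decay formula the workshop section at the end of this transcript gives (lr = lrmin + (lrmax - lrmin) * exp(-i/2000)); feeding the rate through a placeholder is an assumption, one common way to wire it in:

import math

lr = tf.placeholder(tf.float32)   # assumed: decayed learning rate fed at each step
train_step = tf.train.GradientDescentOptimizer(lr).minimize(cross_entropy)

# inside the training loop, for iteration i:
learning_rate = 0.0001 + (0.003 - 0.0001) * math.exp(-i / 2000.0)
sess.run(train_step, feed_dict={X: batch_X, Y_: batch_Y, lr: learning_rate})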

Page 27:


Demo - dying neurons

Dying...

Page 28:

Dropout

Page 29:


Dropout

pkeep = tf.placeholder(tf.float32)   # TRAINING: pkeep=0.75, EVALUATION: pkeep=1

Yf = tf.nn.relu(tf.matmul(X, W) + B)

Y = tf.nn.dropout(Yf, pkeep)
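How pkeep gets its two values is implied rather than shown; a hedged sketch following the run-loop pattern of slide 13:

# training: drop 25% of activations
sess.run(train_step, feed_dict={X: batch_X, Y_: batch_Y, pkeep: 0.75})

# evaluation: keep everything
a, c = sess.run([accuracy, cross_entropy],
                feed_dict={X: mnist.test.images, Y_: mnist.test.labels, pkeep: 1.0})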

Page 30:


Dying... Dead

with dropout

Dropout

Page 32:


98%

Page 33:


Sigmoid, learning rate = 0.003
RELU, learning rate = 0.003
RELU, decaying learning rate 0.003 -> 0.0001
RELU, decaying learning rate 0.003 -> 0.0001 and dropout 0.75

Results: 97.9%, 98.2% peak, 98.2% sustained

All the party tricks

Page 34:


Overfitting

Cross-entropy loss

Page 35:

Too many neurons

Overfitting ?!?

Not enough DATA

BAD Network

Page 36:


Convolutional layer

[Diagram: two 4x4x3 filters W1[4, 4, 3] and W2[4, 4, 3] slide over the image (+ padding), each producing one output channel; stacked as a single weights tensor W[4, 4, 3, 2]: filter size 4x4, 3 input channels, 2 output channels. Applying the filter with a stride gives convolutional subsampling, repeated over several layers.]
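In TensorFlow the diagram above maps onto tf.nn.conv2d; a shape-only sketch (the tensor names inp and W here are illustrative):

# inp: [batch, height, width, 3]; W: [4, 4, 3, 2] = filter 4x4, 3 in-channels, 2 out-channels
# stride 2 in strides=[1, 2, 2, 1] with padding='SAME' halves the spatial dimensions
out = tf.nn.conv2d(inp, W, strides=[1, 2, 2, 1], padding='SAME')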

Page 37:

Hacker’s tip: ALL Convolutional

Page 38:

Convolutional neural network:

convolutional layer, 4 channels   W1[5, 5, 1, 4]  stride 1   28x28x1 -> 28x28x4
convolutional layer, 8 channels   W2[4, 4, 4, 8]  stride 2   -> 14x14x8
convolutional layer, 12 channels  W3[4, 4, 8, 12] stride 2   -> 7x7x12
fully connected layer W4[7x7x12, 200]  -> 200
softmax readout layer W5[200, 10]      -> 10
+ biases on all layers

Page 39:

K = 4
L = 8
M = 12
N = 200

W1 = tf.Variable(tf.truncated_normal([5, 5, 1, K], stddev=0.1))
B1 = tf.Variable(tf.ones([K])/10)
W2 = tf.Variable(tf.truncated_normal([5, 5, K, L], stddev=0.1))
B2 = tf.Variable(tf.ones([L])/10)
W3 = tf.Variable(tf.truncated_normal([4, 4, L, M], stddev=0.1))
B3 = tf.Variable(tf.ones([M])/10)
W4 = tf.Variable(tf.truncated_normal([7*7*M, N], stddev=0.1))
B4 = tf.Variable(tf.ones([N])/10)
W5 = tf.Variable(tf.truncated_normal([N, 10], stddev=0.1))
B5 = tf.Variable(tf.zeros([10]))

filter size

input channels

output channels

weights initialised with random values

Tensorflow - initialisation

Page 40:


Y1 = tf.nn.relu(tf.nn.conv2d(X, W1, strides=[1, 1, 1, 1], padding='SAME') + B1)

Y2 = tf.nn.relu(tf.nn.conv2d(Y1, W2, strides=[1, 2, 2, 1], padding='SAME') + B2)

Y3 = tf.nn.relu(tf.nn.conv2d(Y2, W3, strides=[1, 2, 2, 1], padding='SAME') + B3)

YY = tf.reshape(Y3, shape=[-1, 7 * 7 * M])

Y4 = tf.nn.relu(tf.matmul(YY, W4) + B4)

Y = tf.nn.softmax(tf.matmul(Y4, W5) + B5)

weights, biases, stride

flatten all values for fully connected layer

input image batch X[100, 28, 28, 1]

Y3 [100, 7, 7, 12]

YY [100, 7x7x12]

Tensorflow - the model

Page 42:


98.9%

Page 43:


???

WTFH ???

Page 44:

convolutional layer, 6 channels   W1[6, 6, 1, 6]   stride 1   28x28x1 -> 28x28x6
convolutional layer, 12 channels  W2[5, 5, 6, 12]  stride 2   -> 14x14x12
convolutional layer, 24 channels  W3[4, 4, 12, 24] stride 2   -> 7x7x24
fully connected layer W4[7x7x24, 200]  -> 200
softmax readout layer W5[200, 10]      -> 10
+ biases on all layers
+ DROPOUT p=0.75

Bigger convolutional network + dropout

Page 46:


99.3%

Page 47:


YEAH !

with dropout

Page 48:

Too many neurons

Overfitting ?!?

Not enough DATA

BAD Network

Go deep !

ALL Convolutional

Relu !

Dropout

Learning rate decay

Softmax, Cross-entropy

Mini-batch

Cartoon images copyright: alexpokusay / 123RF stock photos

Page 49:


tensorflow.org

Have fun !

Cloud ML ALPHA

your TensorFlow models trained in Google’s cloud, fast.

Pre-trained models:

Cloud Vision API

Cloud Speech API ALPHA

Google Translate API

- cloud.google.com

Martin Görner, Google Developer Relations

@martin_gorner

plus.google.com/+MartinGorner

All code snippets are on GitHub: github.com/martin-gorner/tensorflow-mnist-tutorial

This presentation:

goo.gl/pHeXe7

That’s all folks...

Page 50:


Workshop

Keyboard shortcuts for the visualisation GUI:

1 ......... display 1st graph only
2 ......... display 2nd graph only
3 ......... display 3rd graph only
4 ......... display 4th graph only
5 ......... display 5th graph only
6 ......... display 6th graph only
7 ......... display graphs 1 and 2
8 ......... display graphs 4 and 5
9 ......... display graphs 3 and 6
ESC or 0 .. back to displaying all graphs
SPACE ..... pause/resume
O ......... box zoom mode (then use mouse)
H ......... reset all zooms
Ctrl-S .... save current image

Page 51:

Workshop. Starter code and solutions: github.com/martin-gorner/tensorflow-mnist-tutorial

1. Theory (sit back and listen)

Softmax classifier, mini-batch, cross-entropy and how to implement them in Tensorflow (slides 1-14)

2. Practice

Open file: mnist_1.0_softmax.py
Run it, play with the visualisations (see instructions on the previous slide), read and understand the code as well as the basic structure of a Tensorflow program.

3. Theory (sit back and listen)

Hidden layers, sigmoid activation function (slides 16-19)

4. Practice

Start from the file you have and add one or two hidden layers. Use cross_entropy_with_logits to avoid numerical instabilities with log(0) (a hedged sketch follows after the solution line below).

Solution in: mnist_2.0_five_layers_sigmoid.py
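A sketch of that trick, assuming the model ends in tf.matmul(Y4, W5) + B5 as on slide 19 (tf.nn.softmax_cross_entropy_with_logits is the actual TensorFlow op):

Ylogits = tf.matmul(Y4, W5) + B5   # raw scores, before softmax
Y = tf.nn.softmax(Ylogits)         # still needed for predictions / accuracy

# computes softmax + cross-entropy in one numerically stable op, avoiding log(0)
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(logits=Ylogits, labels=Y_))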

5. Theory (sit back and listen)

The neural network toolbox: RELUs, learning rate decay, dropout, overfitting (slides 20-35)

6. Practice

Replace all your sigmoids with RELUs. Test. Then add learning rate decay from 0.003 to 0.0001 using the formula lr = lrmin + (lrmax - lrmin) * exp(-i/2000).

Solution in: mnist_2.1_five_layers_relu_lrdecay.py

7. Practice (if time allows)

Add dropout on all layers using a value between 0.5 and 0.8 for pkeep.

Solution in: mnist_2.2_five_layers_relu_lrdecay_dropout.py

8. Theory (sit back and listen)

Convolutional networks (slides 36-42)

9. Practice

Replace your model with a convolutional network, without dropout.

Solution in: mnist_3.0_convolutional.py

10. Practice (if time allows)

Try a bigger neural network (good hyperparameters on slide 44) and add dropout on the last layer.

Solution in: mnist_3.0_convolutional_bigger_dropout.py