Neural Network Regularization and Activation Function
Dr. Mongkol Ekpanyapong
MNIST Dataset
• MNIST is used as the "hello world" of machine learning
• In this study, we will use an ANN for MNIST classification
One hot encoding
• A vector in which the element at the class index is set to 1 and all other elements are set to 0
• It helps the machine learning model perform better classification, with less confusion between classes
• For example, if we average five occurrences of digit 1 and five occurrences of digit 3 using the plain number representation (without doing it carefully), the average comes out as digit 2, a class that never occurred; a short sketch follows
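A minimal sketch of this, assuming ten samples (five 1s and five 3s) and ten digit classes:

import numpy as np

# Number representation: the average is digit 2, which never occurred.
digits = np.array([1]*5 + [3]*5)
print(digits.mean())             # 2.0

# One-hot representation: the average keeps both classes visible.
one_hot = np.zeros((10, 10))
one_hot[np.arange(10), digits] = 1
print(one_hot.mean(axis=0))      # 0.5 at index 1 and index 3, 0 elsewhere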
MNIST Training

import sys, numpy as np
from keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()

images, labels = (x_train[0:1000].reshape(1000, 28*28) / 255, y_train[0:1000])
one_hot_labels = np.zeros((len(labels), 10))
for i, l in enumerate(labels):
    one_hot_labels[i][l] = 1
labels = one_hot_labels

test_images = x_test.reshape(len(x_test), 28*28) / 255
test_labels = np.zeros((len(y_test), 10))
for i, l in enumerate(y_test):
    test_labels[i][l] = 1

np.random.seed(1)
relu = lambda x: (x >= 0) * x       # returns x if x >= 0, 0 otherwise
relu2deriv = lambda x: x >= 0       # returns 1 for input >= 0, 0 otherwise

alpha, iterations, hidden_size, pixels_per_image, num_labels = (0.005, 350, 40, 784, 10)
weights_0_1 = 0.2 * np.random.random((pixels_per_image, hidden_size)) - 0.1
weights_1_2 = 0.2 * np.random.random((hidden_size, num_labels)) - 0.1

for j in range(iterations):
    error, correct_cnt = (0.0, 0)
    for i in range(len(images)):
        layer_0 = images[i:i+1]
        layer_1 = relu(np.dot(layer_0, weights_0_1))
        layer_2 = np.dot(layer_1, weights_1_2)
        error += np.sum((labels[i:i+1] - layer_2) ** 2)
        correct_cnt += int(np.argmax(layer_2) == np.argmax(labels[i:i+1]))
        layer_2_delta = (labels[i:i+1] - layer_2)
        layer_1_delta = layer_2_delta.dot(weights_1_2.T) * relu2deriv(layer_1)
        weights_1_2 += alpha * layer_1.T.dot(layer_2_delta)
        weights_0_1 += alpha * layer_0.T.dot(layer_1_delta)
    if j == 0 or (j + 1) % 50 == 0:
        print("[INFO] epoch={}, loss={:.7f}".format(j + 1, error))
    sys.stdout.write("\r I:" + str(j) +
                     " Train-Err:" + str(error / float(len(images)))[0:5] +
                     " Train-Acc:" + str(correct_cnt / float(len(images))))
MNIST Training
• A three-layer ANN
• Accuracy reaches 1.00 on the training data
• What about the result on the test data?
Testing on the test data

    # Inside the epoch loop: evaluate on the test set every 10 epochs.
    if (j % 10 == 0 or j == iterations - 1):
        error, correct_cnt = (0.0, 0)
        for i in range(len(test_images)):
            layer_0 = test_images[i:i+1]
            layer_1 = relu(np.dot(layer_0, weights_0_1))
            layer_2 = np.dot(layer_1, weights_1_2)
            error += np.sum((test_labels[i:i+1] - layer_2) ** 2)
            correct_cnt += int(np.argmax(layer_2) == np.argmax(test_labels[i:i+1]))
        sys.stdout.write(" Test-Err:" + str(error / float(len(test_images)))[0:5] +
                         " Test-Acc:" + str(correct_cnt / float(len(test_images))) + "\n")
        print()
Results
(Figure: loss function over training)
Memorization vs. Generalization
• Why does the test accuracy go down?
• The network is starting to memorize the training data
• This is also known as overfitting (an ANN can get worse if we train it too much)
• How can we solve the problem?
Regularization
• Early Stopping
• Dropout
• Batch Gradient Descent
• Loss function modification
Early Stopping
• Once the network is trained too much, it will start to memorize the training data
• How do we know when to stop training?
• The only way to know is to run the model on data that isn't in your training dataset and that truly represents the real data
• This is the rationale behind the validation set
Dropout
• Randomly turn neurons off (set them to 0) during training
• Dropout makes our big network act like a small one by randomly training little subsections of the network at a time
• Small networks are harder to overfit
Example
(Code figure, annotated: need to multiply with layer 0)
Explanation
• We create a Bernoulli random mask with 50% ones and 50% zeros
• Note that we have to multiply the output of layer 1 by 2 to compensate for the 50% dropout
• Otherwise, the sum of the values flowing into layer 2 would be cut in half
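A minimal sketch of how this fits into the training loop above (dropout_mask is a name introduced here for illustration; dropout is applied only during training, not at test time):

# Inside the per-example training loop, after computing layer_1:
dropout_mask = np.random.randint(2, size=layer_1.shape)  # Bernoulli: 50% ones, 50% zeros
layer_1 *= dropout_mask * 2        # x2 compensates for dropping half the neurons
layer_2 = np.dot(layer_1, weights_1_2)

# During backpropagation, apply the same mask to the hidden delta,
# so dropped neurons receive no weight update:
layer_1_delta = layer_2_delta.dot(weights_1_2.T) * relu2deriv(layer_1)
layer_1_delta *= dropout_mask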
Output
Batch Gradient Descent
• We train on a batch of examples instead of one at a time
• It gives a smoother accuracy curve, because the noise of individual examples is averaged out
• Note that the learning rate alpha has to be increased in proportion to the batch size
• The accuracy is also better, due to the reduction of the noise
• The computation is much faster, since we operate on one big matrix per batch
Code
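The original code figure is not reproduced here; below is a minimal sketch of batched training for the same two-layer network as above (batch_size = 100 and the raised alpha are assumptions):

batch_size = 100
alpha = 0.1   # assumption: raised from 0.005 in proportion to the batch size

for j in range(iterations):
    error, correct_cnt = (0.0, 0)
    for i in range(int(len(images) / batch_size)):
        batch_start, batch_end = (i * batch_size, (i + 1) * batch_size)
        layer_0 = images[batch_start:batch_end]          # 100 examples at once
        layer_1 = relu(np.dot(layer_0, weights_0_1))
        layer_2 = np.dot(layer_1, weights_1_2)

        error += np.sum((labels[batch_start:batch_end] - layer_2) ** 2)
        for k in range(batch_size):
            correct_cnt += int(np.argmax(layer_2[k:k+1]) ==
                               np.argmax(labels[batch_start+k:batch_start+k+1]))

        # Average the output delta over the batch, then update once per batch.
        layer_2_delta = (labels[batch_start:batch_end] - layer_2) / batch_size
        layer_1_delta = layer_2_delta.dot(weights_1_2.T) * relu2deriv(layer_1)
        weights_1_2 += alpha * layer_1.T.dot(layer_2_delta)
        weights_0_1 += alpha * layer_0.T.dot(layer_1_delta)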
Output
Activation Function
• An activation function is a function applied to the neurons in a layer during prediction
• ReLU is one example of an activation function
• There are some constraints on the properties of activation functions
Activation Function properties
• The function must be continuous and
infinite in domain
• The function is monotonic
• The function is nonlinear
• The function is efficiently computable
Example of activation functions
• Sigmoid
• Tanh
• RELU (Rectified Linear Unit) and
ELU (Exponential Linear Unit)
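For reference, minimal numpy sketches of these functions (the ELU alpha = 1.0 is an assumption):

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))          # squashes to (0, 1)

def tanh(x):
    return np.tanh(x)                    # squashes to (-1, 1)

def relu(x):
    return np.maximum(0, x)              # 0 for negative input, identity otherwise

def elu(x, alpha=1.0):
    return np.where(x > 0, x, alpha * (np.exp(x) - 1))  # smooth negative tail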
Standard Recommendation
• Most recent papers tend to use ReLU as the baseline
• With a good enough configuration, ELU can give 2-3% better accuracy
Standard Output Layer Activation Functions
• Predict a raw data value (regression) – no activation function
• Predict a binary output (yes/no) – sigmoid activation function
• Predict which one (from many classes) – softmax activation function
Example
• Raw output
• Sigmoid
• Softmax
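A minimal numeric sketch of these three cases (the raw output values are made up for illustration):

import numpy as np

raw = np.array([1.0, 2.0, -1.0])          # hypothetical raw layer outputs

sigmoid = 1 / (1 + np.exp(-raw))          # each value squashed independently
print(sigmoid)                            # approx [0.731 0.881 0.269], does not sum to 1

softmax = np.exp(raw) / np.sum(np.exp(raw))
print(softmax)                            # approx [0.259 0.705 0.035], sums to 1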
Softmax Function
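The standard definition, for an input vector x with components x_i:

    softmax(x)_i = exp(x_i) / Σ_j exp(x_j)

Each output is positive and the outputs sum to 1, so they can be read as class probabilities.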
Modify Activation Function
• Feed Forward
• Back Propagation
Derivative function
• RELU derivative function
• Multiply delta by the weight
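To make the modification concrete, here is a sketch assuming the hidden activation is swapped from ReLU to tanh (tanh2deriv is a name introduced here; the tanh derivative is 1 - tanh(x)^2, expressed below in terms of the layer output):

tanh = lambda x: np.tanh(x)
tanh2deriv = lambda output: 1 - (output ** 2)   # derivative of tanh, from its output

# Feed forward: only the hidden activation changes.
layer_1 = tanh(np.dot(layer_0, weights_0_1))
layer_2 = np.dot(layer_1, weights_1_2)

# Back propagation: the delta is multiplied back through the weights,
# then by the derivative of the new activation instead of relu2deriv.
layer_1_delta = layer_2_delta.dot(weights_1_2.T) * tanh2deriv(layer_1)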
Back Propagation
Output
Weight Initialization
How do we initialize weight matrices?
• Constant initialization
• Uniform and Normal distribution
• LeCun Uniform and Normal
• Glorot/Xavier Uniform and Normal
• He et al./Kaiming/MSRA Uniform and Normal
Constant Initialization
• We can initialize the weights with constant zero, one, or any other constant value (for example, a matrix of 64 rows and 32 columns filled with one value); a sketch follows
• It is not good in practice: with a fixed value, every neuron computes the same output and receives the same update, so nothing forces the neurons to learn different features
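A minimal sketch for the 64 x 32 matrix mentioned above:

import numpy as np

W_zero = np.zeros((64, 32))        # 64 rows, 32 columns, all zeros
W_one = np.ones((64, 32))          # all ones
W_const = np.full((64, 32), 0.5)   # an arbitrary constant, here 0.5

# With any of these, every neuron in the layer starts identical,
# so they all compute the same output and get the same gradient.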
Uniform and Normal Distribution
• A uniform distribution draws a random value from the range [lower, upper] with equal probability
• A normal distribution draws values from a Gaussian distribution
• Both can be used for weight initialization, but various heuristics provide better performance
• Example: a normal distribution with standard deviation 0.05
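A numpy sketch of both options (the 0.05 standard deviation matches the slide; the [-0.05, 0.05] uniform range is an assumption):

import numpy as np

# Uniform: every value in [lower, upper] is equally likely.
W_uniform = np.random.uniform(low=-0.05, high=0.05, size=(64, 32))

# Normal: values drawn from a Gaussian with mean 0 and std 0.05.
W_normal = np.random.normal(loc=0.0, scale=0.05, size=(64, 32))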
LeCun Uniform and Normal
• The idea is to scale the distribution by the fan in (number of inputs to the layer) and fan out (number of outputs from the layer), combined with a uniform or normal distribution
• For the LeCun normal distribution, the standard formula draws weights from a Gaussian with mean 0 and standard deviation sqrt(1 / fan_in)
Glorot/Xavier Uniform and Normal
• Similar to LeCun with a minor change in the equation
• For the Glorot/Xavier normal distribution, the standard formula uses standard deviation sqrt(2 / (fan_in + fan_out))
• Provides good performance in general cases
He et al.
• Usually used for deep networks
• For the He et al. normal distribution, the standard formula uses standard deviation sqrt(2 / fan_in)
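A numpy sketch of the three normal variants for a layer with fan_in inputs and fan_out outputs, using the standard formulas:

import numpy as np

fan_in, fan_out = 64, 32

# LeCun normal: std = sqrt(1 / fan_in)
W_lecun = np.random.normal(0.0, np.sqrt(1.0 / fan_in), size=(fan_in, fan_out))

# Glorot/Xavier normal: std = sqrt(2 / (fan_in + fan_out))
W_glorot = np.random.normal(0.0, np.sqrt(2.0 / (fan_in + fan_out)), size=(fan_in, fan_out))

# He et al. normal: std = sqrt(2 / fan_in), favored with ReLU in deep networks
W_he = np.random.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))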
For deep learning, the network has to have more than two hidden layers
Keras implementation of MNIST

from sklearn.preprocessing import LabelBinarizer
from sklearn.metrics import classification_report
from keras.models import Sequential
from keras.layers.core import Dense
from keras.optimizers import SGD
import matplotlib.pyplot as plt
import numpy as np
import argparse
from keras.datasets import mnist

(trainX, trainY), (testX, testY) = mnist.load_data()
trainX, trainY = (trainX.reshape(trainX.shape[0], 28*28) / 255, trainY)
testX, testY = (testX.reshape(testX.shape[0], 28*28) / 255, testY)
lb = LabelBinarizer()
trainY = lb.fit_transform(trainY)
testY = lb.transform(testY)
model = Sequential()
model.add(Dense(256, input_shape=(784,), activation="sigmoid"))
model.add(Dense(128, activation="sigmoid"))
model.add(Dense(10, activation="softmax"))
# train the model using SGD
print("[INFO] training network...")
sgd = SGD(0.01)
model.compile(loss="categorical_crossentropy", optimizer=sgd,
metrics=["accuracy"])
H = model.fit(trainX, trainY, validation_data=(testX, testY),
epochs=100, batch_size=128)
# evaluate the network
print("[INFO] evaluating network...")
predictions = model.predict(testX, batch_size=128)
print(classification_report(testY.argmax(axis=1),
predictions.argmax(axis=1),
target_names=[str(x) for x in lb.classes_]))
# plot the training loss and accuracy
plt.style.use("ggplot")
plt.figure()
plt.plot(np.arange(0, 100), H.history["loss"], label="train_loss")
plt.plot(np.arange(0, 100), H.history["val_loss"], label="val_loss")
# note: newer Keras versions store these under "accuracy" / "val_accuracy"
plt.plot(np.arange(0, 100), H.history["acc"], label="train_acc")
plt.plot(np.arange(0, 100), H.history["val_acc"], label="val_acc")
plt.title("Training Loss and Accuracy")
plt.xlabel("Epoch #")
plt.ylabel("Loss/Accuracy")
plt.legend()
plt.show()
# plt.savefig("output_file")
Output
• The test result should reach around 92% accuracy
Four Ingredients in a NN Recipe
• Dataset
The more the data, the better the accuracy
• Loss Function
Usually use categorical cross-entropy
• Model/Architecture
Next slide
• Optimization
The default is Stochastic Gradient Descent
Model/Architecture
Which model to use depends on these questions:
• How many data points do you have?
• The number of classes
• How similar/dissimilar the classes are
• The intra-class variance
CIFAR

from sklearn.preprocessing import LabelBinarizer
from sklearn.metrics import classification_report
from keras.models import Sequential
from keras.layers.core import Dense
from keras.optimizers import SGD
from keras.datasets import cifar10
import matplotlib.pyplot as plt
import numpy as np
print("[INFO] loading CIFAR-10 data...")
((trainX, trainY), (testX, testY)) = cifar10.load_data()
trainX = trainX.astype("float") / 255.0
testX = testX.astype("float") / 255.0
trainX = trainX.reshape((trainX.shape[0], 3072))
testX = testX.reshape((testX.shape[0], 3072))
lb = LabelBinarizer()
trainY = lb.fit_transform(trainY)
testY = lb.transform(testY)
labelNames = ["airplane", "automobile", "bird", "cat", "deer",
"dog", "frog", "horse", "ship", "truck"]
model = Sequential()
model.add(Dense(1024, input_shape=(3072,), activation="relu"))
model.add(Dense(512, activation="relu"))
model.add(Dense(10, activation="softmax"))
print("[INFO] training network...")
sgd = SGD(0.01)
model.compile(loss="categorical_crossentropy", optimizer=sgd,
metrics=["accuracy"])
H = model.fit(trainX, trainY, validation_data=(testX, testY),
epochs=100, batch_size=32)
print("[INFO] evaluating network...")
predictions = model.predict(testX, batch_size=32)
print(classification_report(testY.argmax(axis=1),
predictions.argmax(axis=1), target_names=labelNames))
plt.style.use("ggplot")
plt.figure()
plt.plot(np.arange(0, 100), H.history["loss"], label="train_loss")
plt.plot(np.arange(0, 100), H.history["val_loss"], label="val_loss")
# note: newer Keras versions store these under "accuracy" / "val_accuracy"
plt.plot(np.arange(0, 100), H.history["acc"], label="train_acc")
plt.plot(np.arange(0, 100), H.history["val_acc"], label="val_acc")
plt.title("Training Loss and Accuracy")
plt.xlabel("Epoch #")
plt.ylabel("Loss/Accuracy")
plt.legend()
plt.show()
CIFAR Output
Training Loss and Accuracy
Questions?