
Tiny ImageNet Challenge - Scaling of Inception Layers for Reduced Scale Classification

CS 231N Poster Session
Emeric Stéphane Boigné, Jan Felix Heyse
{eboigne, heyse}@stanford.edu

1 – Problem Statement and Dataset

Tiny ImageNet Challenge
The Tiny ImageNet Challenge is a classification challenge within the CS 231N class, using the Tiny ImageNet dataset.

Tiny ImageNet Dataset
Tiny ImageNet is a dataset for training and testing neural networks on visual recognition problems. It consists of 64x64 pixel images covering 200 different classes, with:
• 100,000 labeled training images
• 10,000 labeled validation images
• 10,000 unlabeled test images

Pre-processing and Data Augmentation
The data is first normalized so that the training set has zero mean and unit variance. Data augmentation was used to increase the amount of available training data: for each image we added a second one, which was with equal probability either
• flipped horizontally,
• rotated clockwise (by 6, 8, or 10 degrees), or
• rotated counterclockwise (by 6, 8, or 10 degrees).
This yielded a training dataset of almost 200,000 labeled images (a sketch of this pipeline follows).
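The poster includes no code; the following is a minimal sketch of the pre-processing and augmentation scheme described above, assuming Pillow and NumPy (the library choice and function names are our own).

    # Hypothetical sketch of the described pre-processing and augmentation
    # (library choice is an assumption; the poster does not include code).
    import random
    import numpy as np
    from PIL import Image, ImageOps

    def normalize(train_images):
        """Zero-mean, unit-variance normalization using training-set statistics."""
        data = np.stack([np.asarray(im, dtype=np.float32) for im in train_images])
        return (data - data.mean()) / data.std()

    def augment(image):
        """Return one extra copy: flip, or rotate by 6, 8, or 10 degrees."""
        op = random.choice(["flip", "cw", "ccw"])   # equal probability among the three
        if op == "flip":
            return ImageOps.mirror(image)           # horizontal flip
        angle = random.choice([6, 8, 10])
        return image.rotate(angle if op == "ccw" else -angle)  # PIL: positive = counterclockwise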

2 – Our Convolutional Neural Network Architecture

Network Architecture

Layer             Filter size / stride   Output size
convolution       3x3 / 1                62x62x128
max pool          4x4 / 2                30x30x128
inception #1                             30x30x128
max pool          2x2 / 2                16x16x128
inception #2                             16x16x128
inception #3                             16x16x256
max pool          2x2 / 2                8x8x256
inception #4                             8x8x320
avg pool          8x8 / 1                1x1x320
fully connected                          1x1x200
softmax                                  1x1x200

Filter counts per Inception module (columns as in [1]: 1x1, 3x3 reduce, 3x3, 5x5 reduce, 5x5, pool projection):

#   1x1   3x3 red   3x3   5x5 red   5x5   pool
1   32    48        64    8         16    16
2   32    48        64    8         16    16
3   64    96        128   16        32    32
4   80    120       160   20        40    40

CNN Architecture
Inspired by [1], our design is a reduced version of the original GoogLeNet, which was designed for 1000 classes. We investigate how the Inception module architecture scales to a smaller classification task of 200 classes. Each convolution layer is followed by batch normalization, a ReLU activation, and, in the current architecture, by a dropout layer.

Inception Modules
Inception modules are CNN building blocks consisting of parallel convolutions whose outputs are concatenated depth-wise, as sketched below.
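As a concrete reading of the tables above, here is a minimal PyTorch sketch of one Inception module and the reduced network. The framework is an assumption; batch normalization, ReLU, and dropout after each convolution are omitted for brevity, and the padding of the second max pool is inferred from the table's output sizes.

    # Hypothetical PyTorch sketch; widths taken from the tables above.
    import torch
    import torch.nn as nn

    class Inception(nn.Module):
        """Parallel 1x1 / 3x3 / 5x5 / pool branches, concatenated depth-wise."""
        def __init__(self, in_ch, n1x1, n3x3red, n3x3, n5x5red, n5x5, npool):
            super().__init__()
            self.b1 = nn.Conv2d(in_ch, n1x1, 1)
            self.b2 = nn.Sequential(nn.Conv2d(in_ch, n3x3red, 1),         # 3x3 reduce
                                    nn.Conv2d(n3x3red, n3x3, 3, padding=1))
            self.b3 = nn.Sequential(nn.Conv2d(in_ch, n5x5red, 1),         # 5x5 reduce
                                    nn.Conv2d(n5x5red, n5x5, 5, padding=2))
            self.b4 = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                    nn.Conv2d(in_ch, npool, 1))            # pool projection

        def forward(self, x):
            return torch.cat([self.b1(x), self.b2(x), self.b3(x), self.b4(x)], dim=1)

    net = nn.Sequential(
        nn.Conv2d(3, 128, 3),                       # 64x64x3 -> 62x62x128
        nn.MaxPool2d(4, stride=2),                  # -> 30x30x128
        Inception(128, 32, 48, 64, 8, 16, 16),      # #1: 32+64+16+16 = 128 channels
        nn.MaxPool2d(2, stride=2, padding=1),       # -> 16x16 (padding inferred from the table)
        Inception(128, 32, 48, 64, 8, 16, 16),      # #2: 16x16x128
        Inception(128, 64, 96, 128, 16, 32, 32),    # #3: 64+128+32+32 = 256 channels
        nn.MaxPool2d(2, stride=2),                  # -> 8x8x256
        Inception(256, 80, 120, 160, 20, 40, 40),   # #4: 80+160+40+40 = 320 channels
        nn.AvgPool2d(8),                            # -> 1x1x320
        nn.Flatten(),
        nn.Linear(320, 200),                        # fully connected; softmax lives in the loss
    )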

3 – Training the Network

Overfitting
We tackled overfitting by introducing dropout layers and by varying the number of parameters of our architecture.

Hyperparameter fitting
On a reduced-size dataset, we compared accuracies while performing a random search over the weight decay and learning rate to obtain a range of reasonable values (see the sketch below).
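A minimal sketch of such a random search, assuming log-uniform sampling ranges and a placeholder training routine (both our own choices):

    # Hypothetical random search over learning rate and weight decay.
    import random

    def train_and_evaluate(lr, weight_decay):
        """Placeholder for a short training run on the reduced dataset;
        a random stand-in for validation accuracy keeps the sketch runnable."""
        return random.random()

    trials = []
    for _ in range(20):
        lr = 10 ** random.uniform(-5, -2)           # sample on a log scale
        wd = 10 ** random.uniform(-6, -2)
        trials.append((train_and_evaluate(lr, wd), lr, wd))

    # The best trials indicate a range of reasonable values.
    for acc, lr, wd in sorted(trials, reverse=True)[:5]:
        print(f"val acc {acc:.3f}  lr {lr:.2e}  weight decay {wd:.2e}")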

[Figures: scatter plot of the validation accuracy; prediction accuracy learning history for baseline and current CNN]

4 – Visualization

Visualization of activation features
• For each Inception module, we looked at the 6 most activated images for different neurons (bottom images, with true label) [2].
• We computed the gradients of the neuron with respect to the input images, yielding a map of the pixels the neuron is sensitive to (top images, with predicted label) [3]. A sketch of this computation follows.
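A minimal sketch of this gradient computation, assuming a PyTorch model (the hook-based approach and names are our own):

    # Hypothetical per-neuron sensitivity map, following [3].
    import torch

    def neuron_saliency(model, layer, unit, image):
        """Gradient of one neuron's activation map w.r.t. the input pixels.
        `image` is a (C, H, W) tensor; `layer` is a module inside `model`."""
        image = image.clone().requires_grad_(True)
        acts = {}
        hook = layer.register_forward_hook(lambda m, i, o: acts.update(out=o))
        model(image.unsqueeze(0))                   # forward pass records the activation
        hook.remove()
        acts["out"][0, unit].sum().backward()       # scalar for the chosen channel
        return image.grad.abs().max(dim=0).values   # (H, W) pixel-sensitivity map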

[Figures: most activated images and gradient maps for the First, Third, and Last Inception Modules]

Analysis in Feature Space
• We extracted the data entering the last layer and performed a PCA (sketched below).
• The reduced representation has 3 dimensions, accounting for 16% of the total variance. It is shown below for 3 labels.

[Figure: Principal Component Analysis – Clustering of three labels]
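A minimal sketch of this analysis, assuming scikit-learn (`features` stands in for the 320-dimensional activations entering the fully connected layer):

    # Hypothetical PCA over the features entering the last layer.
    import numpy as np
    from sklearn.decomposition import PCA

    features = np.random.randn(1000, 320)           # stand-in for extracted activations
    pca = PCA(n_components=3)
    reduced = pca.fit_transform(features)           # (N, 3) points for the scatter plot
    print(pca.explained_variance_ratio_.sum())      # variance retained (the poster reports 16%)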

5 – Future Work

Carry on the Visualization Analysis
• Develop a deconvolutional network to visualize the activated pixels instead of the ones corresponding to high fluctuations [2].
• Leverage our visualization results to guide improvement of the current architecture.

References

[1] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. E. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. CoRR, abs/1409.4842, 2014.
[2] M. D. Zeiler and R. Fergus. Visualizing and understanding convolutional networks. CoRR, abs/1311.2901, 2013.
[3] D. Erhan, Y. Bengio, A. Courville, and P. Vincent. Visualizing higher-layer features of a deep network. Technical report, University of Montreal, 2009.

