
ImageNet Winning CNN Architectures – A Review

Rajat Vikram Singh

rajats@andrew.cmu.edu

I. Introduction

Since a convolutional neural network won the ImageNet challenge in 2012, research in CNNs has proliferated in an attempt to improve them, with progress being made every year. In this report I discuss the major improvements made in CNNs since the advent of modern CNN architectures, briefly covering some of the path-breaking approaches and the value each adds over the approaches that came before it.

II. ImageNet Challenge and CNN Architectures

Since 2010, the annual ImageNet Large Scale Visual Recognition Challenge (ILSVRC) [1], commonly called the ImageNet challenge, has been a competition where research teams submit programs that classify and detect objects and scenes. Over the years, various approaches and architectures have been used to compete in the ImageNet challenge, and every year many new and exciting architectures make it to the competition. I will be covering the winners of the ImageNet challenge from 2012 to 2015 in this report.

III. AlexNet

AlexNet [2] is considered the breakthrough paper that raised interest in CNNs when it won the ImageNet challenge of 2012. AlexNet is a deep CNN trained on ImageNet that outperformed all the other entries that year. It was a major improvement, achieving a 15.3% top-5 test error rate while the next best entry achieved 26.2%. Compared to modern architectures, a relatively simple layout was used in this paper. The architecture from their paper is as follows:

Fig. 1: AlexNet architecture

The network was made up of five conv layers, max-pooling layers, dropout layers, and three fully connected layers at the end. AlexNet used ReLU for the nonlinearity functions, which the authors found to decrease training time because ReLUs are much faster than tanh functions. They also applied image translations, horizontal reflections, and patch extractions as a way of augmenting the data before using it to train the network, and they used dropout layers to prevent over-fitting to the training data. They optimized with batch stochastic gradient descent. Their implementation was trained on GPUs for five to six days.
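To make the layout concrete, here is a minimal PyTorch sketch of an AlexNet-style network. The channel counts follow the paper, but the original two-GPU split is omitted, so treat this as illustrative rather than a faithful reproduction:

```python
import torch
import torch.nn as nn

class AlexNetSketch(nn.Module):
    """Five conv layers + three fully connected layers, with ReLU and
    dropout, loosely following Krizhevsky et al. (two-GPU split omitted)."""
    def __init__(self, num_classes=1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=11, stride=4), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.classifier = nn.Sequential(
            nn.Dropout(0.5), nn.Linear(256 * 6 * 6, 4096), nn.ReLU(inplace=True),
            nn.Dropout(0.5), nn.Linear(4096, 4096), nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        x = self.features(x)   # (N, 256, 6, 6) for 227x227 inputs
        x = torch.flatten(x, 1)
        return self.classifier(x)

model = AlexNetSketch()
logits = model(torch.randn(1, 3, 227, 227))  # shape (1, 1000)
```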

IV. ZFNet

The ZFNet paper [3] describes a slightly modified AlexNet model which gives better accuracy. The paper also describes a very interesting way of visualizing feature maps (deconvnets). The architecture of ZFNet as described in their paper is as follows:

Fig. 2: ZFNet architecture

ZFNet was trained on the 1.3 million images of the ILSVRC-2012 training set (the full ImageNet database, by comparison, contains over 15 million images). One major difference in the approaches was that ZFNet used 7x7 filters in the first conv layer whereas AlexNet used 11x11 filters. The intuition behind this is that bigger filters throw away a lot of pixel information, which can be retained by having smaller filter sizes in the earlier conv layers. The number of filters increases as we go deeper. This network also used ReLUs for its activations and was trained using batch stochastic gradient descent, running on a GPU for twelve days. Another interesting contribution of the paper is the DeConvNet, which can be used to see which image pixels excite each filter and provides great intuition into how CNNs work.

Fig. 3: Visualization of a DeConvNet
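The deconvnet itself reverses the network's operations (unpooling, rectification, filtering with transposed filters). As a much simpler stand-in, not the paper's method, one can backpropagate a single filter's activation to the input: the gradient magnitude shows which pixels excite that filter. A minimal PyTorch sketch, using a pretrained torchvision AlexNet as a stand-in model:

```python
import torch
from torchvision import models

# Any trained CNN works; a torchvision AlexNet stands in for ZFNet here.
model = models.alexnet(weights=models.AlexNet_Weights.DEFAULT).eval()

image = torch.randn(1, 3, 224, 224, requires_grad=True)  # stand-in input

# Forward through the first conv layer and pick one filter's response.
first_conv = model.features[0]
activation = first_conv(image)      # (1, 64, 55, 55)
score = activation[0, 17].sum()     # filter 17, chosen arbitrarily

# Gradient of that filter's activation w.r.t. the input pixels:
# large-magnitude pixels are the ones that excite the filter.
score.backward()
saliency = image.grad.abs().max(dim=1).values  # (1, 224, 224) heat map
```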

V. VGGNet

VGGNet [4] was proposed for the ImageNet challenge of 2014. VGGNet did not win the 2014 classification task, but it is still used by many people because it is a simple architecture based on the AlexNet-type layout. The architecture is described below:

Fig. 4: VGGNet – all the configurations tried. Column D gives the best-performing architecture.

VGGNet used 3x3 filters, compared to 11x11 filters in AlexNet and 7x7 in ZFNet. The authors' intuition is that two consecutive 3x3 filters give an effective receptive field of 5x5, and three 3x3 filters give a receptive field of 7x7, but stacking the smaller filters requires far fewer parameters to be trained in the network. As you may notice from the architecture, the number of filters doubles after every max-pooling operation. Scale jittering was used as a data augmentation technique. The network was trained with batch gradient descent, used ReLUs, and ran on 4 GPUs for two to three weeks.
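The parameter saving is easy to check: with C input and C output channels, a k x k conv layer has k^2 * C^2 weights (ignoring biases), so three stacked 3x3 layers cost 27C^2 versus 49C^2 for a single 7x7 layer with the same receptive field. A quick sketch in Python:

```python
def conv_params(k, channels, layers=1):
    """Weights in a stack of `layers` k x k conv layers with
    `channels` input and output channels (biases ignored)."""
    return layers * (k * k * channels * channels)

C = 256
print(conv_params(7, C))            # one 7x7 layer:    49*C*C = 3,211,264
print(conv_params(3, C, layers=3))  # three 3x3 layers: 27*C*C = 1,769,472
```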

VI. GoogLeNet

With submissions like VGGNet, the ImageNet challenge of 2014 had many great entries, but the winner of them all was Google's GoogLeNet [5]. (The name 'GoogLeNet' is a tribute to the work of Yann LeCun on LeNet [6], widely considered to be the first use of modern CNNs.) The GoogLeNet architecture is visually represented as follows:

Fig. 5: GoogLeNet architecture

Inception module: GoogLeNet proposed something called the inception module. If you look at the architecture, you can notice a small sub-network of parallel branches whose outputs are merged back together, essentially forming a mini module that is repeated throughout the network. Google called this module an inception module. The details of the module are as follows:

Fig. 6: Inception module
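Here is a minimal PyTorch sketch of the module in Fig. 6: four parallel branches (1x1, 3x3, and 5x5 convolutions plus max-pooling, with 1x1 "bottleneck" convolutions to keep the channel counts down) concatenated along the channel dimension. The branch widths below are illustrative, not necessarily the paper's exact values:

```python
import torch
import torch.nn as nn

class InceptionSketch(nn.Module):
    """Parallel 1x1 / 3x3 / 5x5 / pool branches, concatenated on channels."""
    def __init__(self, in_ch):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, 64, kernel_size=1)
        self.b3 = nn.Sequential(
            nn.Conv2d(in_ch, 96, kernel_size=1),          # bottleneck
            nn.Conv2d(96, 128, kernel_size=3, padding=1),
        )
        self.b5 = nn.Sequential(
            nn.Conv2d(in_ch, 16, kernel_size=1),          # bottleneck
            nn.Conv2d(16, 32, kernel_size=5, padding=2),
        )
        self.bp = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_ch, 32, kernel_size=1),
        )

    def forward(self, x):
        # Every branch preserves spatial size, so channels can be concatenated.
        return torch.cat([self.b1(x), self.b3(x), self.b5(x), self.bp(x)], dim=1)

block = InceptionSketch(192)
y = block(torch.randn(1, 192, 28, 28))  # -> (1, 64+128+32+32 = 256, 28, 28)
```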

GoogLeNet uses nine inception modules and eliminates all fully connected layers, using average pooling to go from 7x7x1024 to 1x1x1024. This saves a lot of parameters. As a form of data augmentation, multiple crops of the same image were created and the network was trained on them. Training took less than a week on a few high-end GPUs.

VII. Microsoft ResNet

The last CNN architecture I'll discuss here is the Microsoft ResNet (residual network) [7], which won the 2015 ImageNet challenge. The architecture of this CNN is as follows:

Fig. 7: Microsoft ResNet visualization compared to a VGGNet

Fig. 8: Residual block in a ResNet

There are 152 layers in the Microsoft ResNet. The authors showed empirically that with residual connections you can keep on adding layers and the error rate keeps decreasing, in contrast to "plain nets", where adding a few layers resulted in higher training and test errors. It took two to three weeks to train the network on an 8-GPU machine. One intuitive reason why residual blocks improve classification is the direct identity step from one layer to the next: together, these skip connections form a gradient highway along which computed gradients can flow directly back to the early layers, making their updates more effective.
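As a minimal sketch of the block in Fig. 8, a residual block computes F(x) + x, so the gradient of the output with respect to x always contains an identity term regardless of what the conv layers do. The channel counts here are illustrative, not the paper's exact block:

```python
import torch
import torch.nn as nn

class ResidualBlockSketch(nn.Module):
    """y = F(x) + x: the identity shortcut lets gradients flow straight
    through, the 'gradient highway' described above."""
    def __init__(self, channels):
        super().__init__()
        self.f = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.f(x) + x)  # skip connection adds the input back

block = ResidualBlockSketch(64)
y = block(torch.randn(1, 64, 56, 56))  # same shape out as in: (1, 64, 56, 56)
```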

VIII. References

[1] Deng, Jia, et al. "ImageNet: A large-scale hierarchical image database." Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on. IEEE, 2009.

[2] Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "ImageNet classification with deep convolutional neural networks." Advances in Neural Information Processing Systems. 2012.

[3] Zeiler, Matthew D., and Rob Fergus. "Visualizing and understanding convolutional networks." European Conference on Computer Vision. Springer International Publishing, 2014.

[4] Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." arXiv preprint arXiv:1409.1556 (2014).

[5] Szegedy, Christian, et al. "Going deeper with convolutions." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.

[6] LeCun, Y., Bottou, L., Bengio, Y., and Haffner, P. (1998). "Gradient-based learning applied to document recognition." Proceedings of the IEEE, 86(11), 2278–2324.

[7] He, Kaiming, et al. "Deep residual learning for image recognition." arXiv preprint arXiv:1512.03385 (2015).

[8] CS231n video lectures: https://www.youtube.com/playlist?list=PLLvH2FwAQhnpj1WEB-jHmPuUeQ8mX-XXG (retrieved: Oct 25, 2016)

[9] Deep learning slides: http://www.slideshare.net/holbertonschool/deep-learning-class-2-by-louis-monier (retrieved: Oct 25, 2016)