
ImageNet Winning CNN Architectures – A Review

Rajat Vikram Singh

rajats@andrew.cmu.edu

I. Introduction

Since a convolutional neural network won the ImageNet challenge in 2012, research in CNNs has proliferated in an attempt to improve them, with progress being made every year. In this report I discuss the major improvements made in CNNs since the advent of modern CNN architectures, briefly covering some of the path-breaking approaches and the value each adds over the approaches that came before it.

II. ImageNet Challenge and CNN Architectures

Since 2010, the annual ImageNet Large Scale Visual Recognition Challenge (ILSVRC) [1], commonly called the ImageNet challenge, has been a competition where research teams submit programs that classify and detect objects and scenes. Over the years, various approaches and architectures have been used to compete in the ImageNet challenge, and every year many new and exciting architectures make it to the competition. I will be covering the winners of the ImageNet challenge from 2012 to 2015 in this report.

III. AlexNet

AlexNet [2] is considered the breakthrough paper that raised interest in CNNs when it won the ImageNet challenge of 2012. AlexNet is a deep CNN trained on ImageNet that outperformed all the other entries that year. It was a major improvement, achieving a 15.3% top-5 test error rate while the next best entry achieved 26.2%. Compared to modern architectures, a relatively simple layout was used in this paper. The architecture from their paper is as follows:

Fig. 1: AlexNet architecture

The network was made up of five conv layers, max-pooling layers, dropout layers, and three fully connected layers at the end. AlexNet used ReLU for the nonlinearity functions, which the authors found to decrease training time because ReLUs are much faster than tanh functions. They also applied image translations, horizontal reflections, and patch extractions as a way of augmenting the data before using it to train the network, and they used dropout layers to prevent over-fitting to the training data. They optimized with batch stochastic gradient descent. Their implementation was trained on GPUs for five to six days.
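To make the layout concrete, here is a minimal PyTorch sketch of an AlexNet-style network. The channel counts follow the paper, but the original two-GPU split is omitted, so treat this as illustrative rather than a faithful reproduction:

```python
import torch
import torch.nn as nn

class AlexNetSketch(nn.Module):
    """Five conv layers + three fully connected layers, with ReLU and
    dropout, loosely following Krizhevsky et al. (two-GPU split omitted)."""
    def __init__(self, num_classes=1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=11, stride=4), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.classifier = nn.Sequential(
            nn.Dropout(0.5), nn.Linear(256 * 6 * 6, 4096), nn.ReLU(inplace=True),
            nn.Dropout(0.5), nn.Linear(4096, 4096), nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        x = self.features(x)   # (N, 256, 6, 6) for 227x227 inputs
        x = torch.flatten(x, 1)
        return self.classifier(x)

model = AlexNetSketch()
logits = model(torch.randn(1, 3, 227, 227))  # shape (1, 1000)
```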

IV. ZFNet

The ZFNet paper [3] describes a slightly modified AlexNet model which gives better accuracy. The paper also describes a very interesting way of visualizing feature maps (deconvnets). The architecture of ZFNet as described in their paper is as follows:

Fig. 2: ZFNet architecture

ZFNet was trained on the 1.3 million images of the ILSVRC-2012 training set (the full ImageNet database, by comparison, contains over 15 million images). One major difference in the approaches was that ZFNet used 7x7 filters in the first conv layer whereas AlexNet used 11x11 filters. The intuition behind this is that bigger filters throw away a lot of pixel information, which can be retained by having smaller filter sizes in the earlier conv layers. The number of filters increases as we go deeper. This network also used ReLUs for its activations and was trained using batch stochastic gradient descent, running on a GPU for twelve days. Another interesting contribution of the paper is the DeConvNet, which can be used to see which image pixels excite each filter and provides great intuition into how CNNs work.

Fig. 3: Visualization of a DeConvNet
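The deconvnet itself reverses the network's operations (unpooling, rectification, filtering with transposed filters). As a much simpler stand-in, not the paper's method, one can backpropagate a single filter's activation to the input: the gradient magnitude shows which pixels excite that filter. A minimal PyTorch sketch, using a pretrained torchvision AlexNet as a stand-in model:

```python
import torch
from torchvision import models

# Any trained CNN works; a torchvision AlexNet stands in for ZFNet here.
model = models.alexnet(weights=models.AlexNet_Weights.DEFAULT).eval()

image = torch.randn(1, 3, 224, 224, requires_grad=True)  # stand-in input

# Forward through the first conv layer and pick one filter's response.
first_conv = model.features[0]
activation = first_conv(image)      # (1, 64, 55, 55)
score = activation[0, 17].sum()     # filter 17, chosen arbitrarily

# Gradient of that filter's activation w.r.t. the input pixels:
# large-magnitude pixels are the ones that excite the filter.
score.backward()
saliency = image.grad.abs().max(dim=1).values  # (1, 224, 224) heat map
```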

V. VGGNet

VGGNet [4] was proposed for the ImageNet challenge of 2014. VGGNet did not win the 2014 classification task, but it is still used by many people because it is a simple architecture based on the AlexNet-type layout. The architecture is described below:

Fig. 4: VGGNet – all the configurations tried. Column D gives the best-performing architecture.

VGGNet used 3x3 filters, compared to 11x11 filters in AlexNet and 7x7 in ZFNet. The authors' intuition is that two consecutive 3x3 filters give an effective receptive field of 5x5, and three 3x3 filters give a receptive field of 7x7, but stacking the smaller filters requires far fewer parameters to be trained in the network. As you may notice from the architecture, the number of filters doubles after every max-pooling operation. Scale jittering was used as a data augmentation technique. The network was trained with batch gradient descent, used ReLUs, and ran on 4 GPUs for two to three weeks.
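The parameter saving is easy to check: with C input and C output channels, a k x k conv layer has k^2 * C^2 weights (ignoring biases), so three stacked 3x3 layers cost 27C^2 versus 49C^2 for a single 7x7 layer with the same receptive field. A quick sketch in Python:

```python
def conv_params(k, channels, layers=1):
    """Weights in a stack of `layers` k x k conv layers with
    `channels` input and output channels (biases ignored)."""
    return layers * (k * k * channels * channels)

C = 256
print(conv_params(7, C))            # one 7x7 layer:    49*C*C = 3,211,264
print(conv_params(3, C, layers=3))  # three 3x3 layers: 27*C*C = 1,769,472
```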

VI. GoogLeNet

With submissions like VGGNet, the ImageNet challenge of 2014 had many great entries, but the winner of them all was Google's GoogLeNet [5]. (The name 'GoogLeNet' is a tribute to the work of Yann LeCun on LeNet [6], widely considered to be the first use of modern CNNs.) The GoogLeNet architecture is visually represented as follows:

Fig. 5: GoogLeNet architecture

Inception module: GoogLeNet proposed something called the inception module. If you look at the architecture, you can notice a small sub-network of parallel branches whose outputs are merged back together, essentially forming a mini module that is repeated throughout the network. Google called this module an inception module. The details of the module are as follows:

Fig. 6: Inception module
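Here is a minimal PyTorch sketch of the module in Fig. 6: four parallel branches (1x1, 3x3, and 5x5 convolutions plus max-pooling, with 1x1 "bottleneck" convolutions to keep the channel counts down) concatenated along the channel dimension. The branch widths below are illustrative, not necessarily the paper's exact values:

```python
import torch
import torch.nn as nn

class InceptionSketch(nn.Module):
    """Parallel 1x1 / 3x3 / 5x5 / pool branches, concatenated on channels."""
    def __init__(self, in_ch):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, 64, kernel_size=1)
        self.b3 = nn.Sequential(
            nn.Conv2d(in_ch, 96, kernel_size=1),          # bottleneck
            nn.Conv2d(96, 128, kernel_size=3, padding=1),
        )
        self.b5 = nn.Sequential(
            nn.Conv2d(in_ch, 16, kernel_size=1),          # bottleneck
            nn.Conv2d(16, 32, kernel_size=5, padding=2),
        )
        self.bp = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_ch, 32, kernel_size=1),
        )

    def forward(self, x):
        # Every branch preserves spatial size, so channels can be concatenated.
        return torch.cat([self.b1(x), self.b3(x), self.b5(x), self.bp(x)], dim=1)

block = InceptionSketch(192)
y = block(torch.randn(1, 192, 28, 28))  # -> (1, 64+128+32+32 = 256, 28, 28)
```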

GoogLeNet uses nine inception modules and eliminates all fully connected layers, using average pooling to go from 7x7x1024 to 1x1x1024. This saves a lot of parameters. As a form of data augmentation, multiple crops of the same image were created and the network was trained on them. Training took less than a week on a few high-end GPUs.

VII. Microsoft ResNet

The last CNN architecture I'll discuss here is the Microsoft ResNet (residual network) [7], which won the 2015 ImageNet challenge. The architecture of this CNN is as follows:

Fig. 7: Microsoft ResNet visualization compared to a VGGNet

Fig. 8: Residual block in a ResNet

There are 152 layers in the Microsoft ResNet. The authors showed empirically that with residual connections you can keep on adding layers and the error rate keeps decreasing, in contrast to "plain nets", where adding a few layers resulted in higher training and test errors. It took two to three weeks to train the network on an 8-GPU machine. One intuitive reason why residual blocks improve classification is the direct identity step from one layer to the next: together, these skip connections form a gradient highway along which computed gradients can flow directly back to the early layers, making their updates more effective.
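As a minimal sketch of the block in Fig. 8, a residual block computes F(x) + x, so the gradient of the output with respect to x always contains an identity term regardless of what the conv layers do. The channel counts here are illustrative, not the paper's exact block:

```python
import torch
import torch.nn as nn

class ResidualBlockSketch(nn.Module):
    """y = F(x) + x: the identity shortcut lets gradients flow straight
    through, the 'gradient highway' described above."""
    def __init__(self, channels):
        super().__init__()
        self.f = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.f(x) + x)  # skip connection adds the input back

block = ResidualBlockSketch(64)
y = block(torch.randn(1, 64, 56, 56))  # same shape out as in: (1, 64, 56, 56)
```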

VIII. References

[1] Deng, Jia, et al. "ImageNet: A large-scale hierarchical image database." Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on. IEEE, 2009.

[2] Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "ImageNet classification with deep convolutional neural networks." Advances in Neural Information Processing Systems. 2012.

[3] Zeiler, Matthew D., and Rob Fergus. "Visualizing and understanding convolutional networks." European Conference on Computer Vision. Springer International Publishing, 2014.

[4] Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." arXiv preprint arXiv:1409.1556 (2014).

[5] Szegedy, Christian, et al. "Going deeper with convolutions." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.

[6] LeCun, Y., Bottou, L., Bengio, Y., and Haffner, P. (1998). "Gradient-based learning applied to document recognition." Proceedings of the IEEE, 86(11), 2278–2324.

[7] He, Kaiming, et al. "Deep residual learning for image recognition." arXiv preprint arXiv:1512.03385 (2015).

[8] CS231n video lectures: https://www.youtube.com/playlist?list=PLLvH2FwAQhnpj1WEB-jHmPuUeQ8mX-XXG (retrieved: Oct 25, 2016)

[9] Deep learning slides: http://www.slideshare.net/holbertonschool/deep-learning-class-2-by-louis-monier (retrieved: Oct 25, 2016)