Learning OpenCV 3 Computer Vision with Python Second Edition
Table of Contents
Learning OpenCV 3 Computer Vision with Python Second Edition
Credits
About the Authors
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers, and more
Why subscribe?
Free access for Packt account holders
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Errata
Piracy
Questions
1. Setting Up OpenCV
Choosing and using the right setup tools
Installation on Windows
Using binary installers (no support for depth cameras)
Using CMake and compilers
Installing on OS X
Using MacPorts with ready-made packages
Using MacPorts with your own custom packages
Using Homebrew with ready-made packages (no support for depth cameras)
Using Homebrew with your own custom packages
Installation on Ubuntu and its derivatives
Using the Ubuntu repository (no support for depth cameras)
Building OpenCV from a source
Installation on other Unix-like systems
Installing the Contrib modules
Running samples
Finding documentation, help, and updates
Summary
2. Handling Files, Cameras, and GUIs
Basic I/O scripts
Reading/writing an image file
Converting between an image and raw bytes
Accessing image data with numpy.array
Reading/writing a video file
Capturing camera frames
Displaying images in a window
Displaying camera frames in a window
Project Cameo (face tracking and image manipulation)
Cameo – an object-oriented design
Abstracting a video stream with managers.CaptureManager
Abstracting a window and keyboard with managers.WindowManager
Applying everything with cameo.Cameo
Summary
3. Processing Images with OpenCV 3
Converting between different color spaces
A quick note on BGR
The Fourier Transform
High pass filter
Low pass filter
Creating modules
Edge detection
Custom kernels – getting convoluted
Modifying the application
Edge detection with Canny
Contour detection
Contours – bounding box, minimum area rectangle, and minimum enclosing circle
Contours – convex contours and the Douglas-Peucker algorithm
Line and circle detection
Line detection
Circle detection
Detecting shapes
Summary
4. Depth Estimation and Segmentation
Creating modules
Capturing frames from a depth camera
Creating a mask from a disparity map
Masking a copy operation
Depth estimation with a normal camera
Object segmentation using the Watershed and GrabCut algorithms
Example of foreground detection with GrabCut
Image segmentation with the Watershed algorithm
Summary
5. Detecting and Recognizing Faces
Conceptualizing Haar cascades
Getting Haar cascade data
Using OpenCV to perform face detection
Performing face detection on a still image
Performing face detection on a video
Performing face recognition
Generating the data for face recognition
Recognizing faces
Preparing the training data
Loading the data and recognizing faces
Performing an Eigenfaces recognition
Performing face recognition with Fisherfaces
Performing face recognition with LBPH
Discarding results with confidence score
Summary
6. Retrieving Images and Searching Using Image Descriptors
Feature detection algorithms
Defining features
Detecting features – corners
Feature extraction and description using DoG and SIFT
Anatomy of a keypoint
Feature extraction and detection using Fast Hessian and SURF
ORB feature detection and feature matching
FAST
BRIEF
Brute-Force matching
Feature matching with ORB
Using K-Nearest Neighbors matching
FLANN-based matching
FLANN matching with homography
A sample application – tattoo forensics
Saving image descriptors to file
Scanning for matches
Summary
7. Detecting and Recognizing Objects
Object detection and recognition techniques
HOG descriptors
The scale issue
The location issue
Image pyramid
Sliding windows
Non-maximum (or non-maxima) suppression
Support vector machines
People detection
Creating and training an object detector
Bag-of-words
BOW in computer vision
The k-means clustering
Detecting cars
What did we just do?
SVM and sliding windows
Example – car detection in a scene
Examining detector.py
Associating training data with classes
Dude, where's my car?
Summary
8. Tracking Objects
Detecting moving objects
Basic motion detection
Background subtractors – KNN, MOG2, and GMG
Meanshift and CAMShift
Color histograms
The calcHist function
The calcBackProject function
In summary
Back to the code
CAMShift
The Kalman filter
Predict and update
An example
A real-life example – tracking pedestrians
The application workflow
A brief digression – functional versus object-oriented programming
The Pedestrian class
The main program
Where do we go from here?
Summary
9. Neural Networks with OpenCV – an Introduction
Artificial neural networks
Neurons and perceptrons
The structure of an ANN
Network layers by example
The input layer
The output layer
The hidden layer
The learning algorithms
ANNs in OpenCV
ANN-imal classification
Training epochs
Handwritten digit recognition with ANNs
MNIST – the handwritten digit database
Customized training data
The initial parameters
The input layer
The hidden layer
The output layer
Training epochs
Other parameters
Mini-libraries
The main file
Possible improvements and potential applications
Improvements
Potential applications
Summary
To boldly go…
Index
Learning OpenCV 3 Computer Vision with Python Second Edition

Copyright © 2015 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: September 2015

Production reference: 1240915

Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.

ISBN 978-1-78528-384-0

www.packtpub.com
Credits

Authors
Joe Minichino
Joseph Howse

Reviewers
Nandan Banerjee
Tian Cao
Brandon Castellano
Haojian Jin
Adrian Rosebrock

Commissioning Editor
Akram Hussain

Acquisition Editors
Vivek Anantharaman
Prachi Bisht

Content Development Editor
Ritika Singh

Technical Editors
Novina Kewalramani
Shivani Kiran Mistry

Copy Editor
Sonia Cheema

Project Coordinator
Milton Dsouza

Proofreader
Safis Editing

Indexer
Monica Ajmera Mehta

Graphics
Disha Haria

Production Coordinator
Arvindkumar Gupta

Cover Work
Arvindkumar Gupta
About the Authors

Joe Minichino is a computer vision engineer for Hoolux Medical by day and a developer of the NoSQL database LokiJS by night. On weekends, he is a heavy metal singer/songwriter. He is a passionate programmer who is immensely curious about programming languages and technologies and constantly experiments with them. At Hoolux, Joe leads the development of an Android computer vision-based advertising platform for the medical industry.

Born and raised in Varese, Lombardy, Italy, and coming from a humanistic background in philosophy (at Milan's Università Statale), Joe has spent his last 11 years living in Cork, Ireland, which is where he became a computer science graduate at the Cork Institute of Technology.

I am immensely grateful to my partner, Rowena, for always encouraging me, and also my two little daughters for inspiring me. A big thank you to the collaborators and editors of this book, especially Joe Howse, Adrian Rosebrock, Brandon Castellano, the OpenCV community, and the people at Packt Publishing.

Joseph Howse lives in Canada. During the winters, he grows his beard, while his four cats grow their thick coats of fur. He loves combing his cats every day and sometimes, his cats also pull his beard.

He has been writing for Packt Publishing since 2012. His books include OpenCV for Secret Agents, OpenCV Blueprints, Android Application Programming with OpenCV 3, OpenCV Computer Vision with Python, and Python Game Programming by Example.

When he is not writing books or grooming his cats, he provides consulting, training, and software development services through his company, Nummist Media (http://nummist.com).
About the Reviewers

Nandan Banerjee has a bachelor's degree in computer science and a master's in robotics engineering. He started working with Samsung Electronics right after graduation. He worked for a year at its R&D centre in Bangalore. He also worked in the WPI-CMU team on the Boston Dynamics' robot, Atlas, for the DARPA Robotics Challenge. He is currently working as a robotics software engineer in the technology organization at iRobot Corporation. He is an embedded systems and robotics enthusiast with an inclination toward computer vision and motion planning. He has experience in various languages, including C, C++, Python, Java, and Delphi. He also has a substantial experience in working with ROS, OpenRAVE, OpenCV, PCL, OpenGL, CUDA, and the Android SDK.

I would like to thank the author and publisher for coming out with this wonderful book.

Tian Cao is pursuing his PhD in computer science at the University of North Carolina in Chapel Hill, USA, and working on projects related to image analysis, computer vision, and machine learning.

I dedicate this work to my parents and girlfriend.

Brandon Castellano is a student from Canada pursuing an MESc in electrical engineering at the University of Western Ontario, City of London, Canada. He received his BESc in the same subject in 2012. The focus of his research is in parallel processing and GPGPU/FPGA optimization for real-time implementations of image processing algorithms. Brandon also works for Eagle Vision Systems Inc., focusing on the use of real-time image processing for robotics applications.

While he has been using OpenCV and C++ for more than 5 years, he has also been advocating the use of Python frequently in his research, most notably, for its rapid speed of development, allowing low-level interfacing with complex systems. This is evident in his open source projects hosted on GitHub, for example, PySceneDetect, which is mostly written in Python. In addition to image/video processing, he has also worked on implementations of three-dimensional displays as well as the software tools to support the development of such displays.

In addition to posting technical articles and tutorials on his website (http://www.bcastell.com), he participates in a variety of both open and closed source projects and contributes to GitHub under the username Breakthrough (http://www.github.com/Breakthrough). He is an active member of the Super User and Stack Overflow communities (under the name Breakthrough), and can be contacted directly via his website.

I would like to thank all my friends and family for their patience during the past few years (especially my parents, Peter and Lori, and my brother, Mitchell). I could not have accomplished everything without their continued love and support. I can't ever thank everyone enough.

I would also like to extend a special thanks to all of the developers that contribute to open source software libraries, specifically OpenCV, which help bring the development of cutting-edge software technology closer to all the software developers around the world, free of cost. I would also like to thank those people who help write documentation, submit bug reports, and write tutorials/books (especially the author of this book!). Their contributions are vital to the success of any open source project, especially one that is as extensive and complex as OpenCV.

Haojian Jin is a software engineer/researcher at Yahoo! Labs, Sunnyvale, CA. He looks primarily at building new systems of what's possible on commodity mobile devices (or with minimum hardware changes). To create things that don't exist today, he spends large chunks of his time playing with signal processing, computer vision, machine learning, and natural language processing and using them in interesting ways. You can find more about him at http://shift-3.com/

Adrian Rosebrock is an author and blogger at http://www.pyimagesearch.com/. He holds a PhD in computer science from the University of Maryland, Baltimore County, USA, with a focus on computer vision and machine learning.

He has consulted for the National Cancer Institute to develop methods that automatically predict breast cancer risk factors using breast histology images. He has also authored a book, Practical Python and OpenCV (http://pyimg.co/x7ed5), on the utilization of Python and OpenCV to build real-world computer vision applications.
www.PacktPub.com
Support files, eBooks, discount offers, and more

For support files and downloads related to your book, please visit www.PacktPub.com.

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at <[email protected]> for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.

https://www2.packtpub.com/books/subscription/packtlib

Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can search, access, and read Packt's entire library of books.

Why subscribe?

Fully searchable across every book published by Packt
Copy and paste, print, and bookmark content
On demand and accessible via a web browser

Free access for Packt account holders

If you have an account with Packt at www.PacktPub.com, you can use this to access PacktLib today and view 9 entirely free books. Simply use your login credentials for immediate access.
Preface

OpenCV 3 is a state-of-the-art computer vision library that is used for a variety of image and video processing operations. Some of the more spectacular and futuristic features, such as face recognition or object tracking, are easily achievable with OpenCV 3. Learning the basic concepts behind computer vision algorithms, models, and OpenCV's API will enable the development of all sorts of real-world applications, including security and surveillance tools.

Starting with basic image processing operations, this book will take you through a journey that explores advanced computer vision concepts. Computer vision is a rapidly evolving science whose applications in the real world are exploding, so this book will appeal to computer vision novices as well as experts of the subject who want to learn about the brand new OpenCV 3.0.0.

What this book covers

Chapter 1, Setting Up OpenCV, explains how to set up OpenCV 3 with Python on different platforms. It will also troubleshoot common problems.

Chapter 2, Handling Files, Cameras, and GUIs, introduces OpenCV's I/O functionalities. It will also discuss the concept of a project and the beginnings of an object-oriented design for this project.

Chapter 3, Processing Images with OpenCV 3, presents some techniques required to alter images, such as detecting skin tone in an image, sharpening an image, marking contours of subjects, and detecting crosswalks using a line segment detector.

Chapter 4, Depth Estimation and Segmentation, shows you how to use data from a depth camera to identify foreground and background regions, such that we can limit an effect to only the foreground or background.

Chapter 5, Detecting and Recognizing Faces, introduces some of OpenCV's face detection functionalities, along with the data files that define particular types of trackable objects.

Chapter 6, Retrieving Images and Searching Using Image Descriptors, shows how to detect the features of an image with the help of OpenCV and make use of them to match and search for images.

Chapter 7, Detecting and Recognizing Objects, introduces the concept of detecting and recognizing objects, which is one of the most common challenges in computer vision.

Chapter 8, Tracking Objects, explores the vast topic of object tracking, which is the process of locating a moving object in a movie or video feed with the help of a camera.

Chapter 9, Neural Networks with OpenCV – an Introduction, introduces you to Artificial Neural Networks in OpenCV and illustrates their usage in a real-life application.

What you need for this book

You simply need a relatively recent computer, as the first chapter will guide you through the installation of all the necessary software. A webcam is highly recommended, but not necessary.

Who this book is for

This book is aimed at programmers with working knowledge of Python as well as people who want to explore the topic of computer vision using the OpenCV library. No previous experience of computer vision or OpenCV is required. Programming experience is recommended.

Conventions

In this book, you will find a number of text styles that distinguish between different kinds of information. Here are some examples of these styles and an explanation of their meaning.

Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: "We can include other contexts through the use of the include directive."

A block of code is set as follows:
import cv2
import numpy as np

img = cv2.imread('images/chess_board.png')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
gray = np.float32(gray)
dst = cv2.cornerHarris(gray, 2, 23, 0.04)
When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

img = cv2.imread('images/chess_board.png')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
gray = np.float32(gray)
dst = cv2.cornerHarris(gray, 2, 23, 0.04)
Any command-line input or output is written as follows:

mkdir build && cd build
cmake -D CMAKE_BUILD_TYPE=Release -D OPENCV_EXTRA_MODULES_PATH=<opencv_contrib>/modules -D CMAKE_INSTALL_PREFIX=/usr/local ..
make
New terms and important words are shown in bold. Words that you see on the screen, for example, in menus or dialog boxes, appear in the text like this: "On Windows Vista/Windows 7/Windows 8, click on the Start menu."

Note

Warnings or important notes appear in a box like this.

Tip

Tips and tricks appear like this.

Reader feedback

Feedback from our readers is always welcome. Let us know what you think about this book (what you liked or disliked). Reader feedback is important for us as it helps us develop titles that you will really get the most out of.

To send us general feedback, simply e-mail <[email protected]>, and mention the book's title in the subject of your message.

If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.

Customer support

Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.

Downloading the example code

You can download the example code files from your account at http://www.packtpub.com for all the Packt Publishing books you have purchased. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

Errata

Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books, maybe a mistake in the text or the code, we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title.

To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.

Piracy

Piracy of copyrighted material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works in any form on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.

Please contact us at <[email protected]> with a link to the suspected pirated material.

We appreciate your help in protecting our authors and our ability to bring you valuable content.

Questions

If you have a problem with any aspect of this book, you can contact us at <[email protected]>, and we will do our best to address the problem.
Chapter 1. Setting Up OpenCV

You picked up this book so you may already have an idea of what OpenCV is. Maybe, you heard of Sci-Fi-sounding features, such as face detection, and got intrigued. If this is the case, you've made the perfect choice. OpenCV stands for Open Source Computer Vision. It is a free computer vision library that allows you to manipulate images and videos to accomplish a variety of tasks from displaying the feed of a webcam to potentially teaching a robot to recognize real-life objects.

In this book, you will learn to leverage the immense potential of OpenCV with the Python programming language. Python is an elegant language with a relatively shallow learning curve and very powerful features. This chapter is a quick guide to setting up Python 2.7, OpenCV, and other related libraries. After setup, we also look at OpenCV's Python sample scripts and documentation.

Note

If you wish to skip the installation process and jump right into action, you can download the virtual machine (VM) I've made available at http://techfort.github.io/pycv/.

This file is compatible with VirtualBox, a free-to-use virtualization application that lets you build and run VMs. The VM I've built is based on Ubuntu Linux 14.04 and has all the necessary software installed so that you can start coding right away.

This VM requires at least 2 GB of RAM to run smoothly, so make sure that you allocate at least 2 (but, ideally, more than 4) GB of RAM to the VM, which means that your host machine will need at least 6 GB of RAM to sustain it.

The following related libraries are covered in this chapter:
NumPy: This library is a dependency of OpenCV's Python bindings. It provides numeric computing functionality, including efficient arrays.
SciPy: This library is a scientific computing library that is closely related to NumPy. It is not required by OpenCV, but it is useful for manipulating data in OpenCV images.
OpenNI: This library is an optional dependency of OpenCV. It adds the support for certain depth cameras, such as Asus XtionPRO.
SensorKinect: This library is an OpenNI plugin and optional dependency of OpenCV. It adds support for the Microsoft Kinect depth camera.
Forthisbook’spurposes,OpenNIandSensorKinectcanbeconsideredoptional.TheyareusedthroughoutChapter4,DepthEstimationandSegmentation,butarenotusedintheotherchaptersorappendices.
Note

This book focuses on OpenCV 3, the new major release of the OpenCV library. All additional information about OpenCV is available at http://opencv.org, and its documentation is available at http://docs.opencv.org/master.
Choosing and using the right setup tools

We are free to choose various setup tools, depending on our operating system and how much configuration we want to do. Let's take an overview of the tools for Windows, Mac, Ubuntu, and other Unix-like systems.

Installation on Windows

Windows does not come with Python preinstalled. However, installation wizards are available for precompiled Python, NumPy, SciPy, and OpenCV. Alternatively, we can build from a source. OpenCV's build system uses CMake for configuration and either Visual Studio or MinGW for compilation.

If we want support for depth cameras, including Kinect, we should first install OpenNI and SensorKinect, which are available as precompiled binaries with installation wizards. Then, we must build OpenCV from a source.

Note

The precompiled version of OpenCV does not offer support for depth cameras.

On Windows, OpenCV 2 offers better support for 32-bit Python than 64-bit Python; however, with the majority of computers sold today being 64-bit systems, our instructions will refer to 64-bit. All installers have 32-bit versions available from the same site as the 64-bit.

Some of the following steps refer to editing the system's PATH variable. This task can be done in the Environment Variables window of Control Panel.
1. On Windows Vista/Windows 7/Windows 8, click on the Start menu and launch Control Panel. Now, navigate to System and Security | System | Advanced system settings. Click on the Environment Variables… button.
2. On Windows XP, click on the Start menu and navigate to Control Panel | System. Select the Advanced tab. Click on the Environment Variables… button.
3. Now, under System variables, select Path and click on the Edit… button.
4. Make changes as directed.
5. To apply the changes, click on all the OK buttons (until we are back in the main window of Control Panel).
6. Then, log out and log back in (alternatively, reboot).
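After editing PATH, a quick sanity check can save a reboot cycle. This short Python snippet is an illustrative sketch of mine (not from the book): it reports whether a given folder is already an entry of the PATH variable seen by the current session:

```python
import os

def on_path(directory):
    """Return True if directory is an entry of the PATH variable.

    The comparison is case-normalized (important on Windows) and
    ignores trailing slashes or backslashes.
    """
    target = os.path.normcase(directory.rstrip("\\/"))
    entries = os.environ.get("PATH", "").split(os.pathsep)
    return any(os.path.normcase(e.rstrip("\\/")) == target for e in entries)

# Example: check the default Python 2.7 folder used later in this chapter.
print(on_path(r"C:\Python2.7"))
```

Note that a console or script only sees the PATH it inherited at launch, which is why the steps above end with logging out and back in.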
Using binary installers (no support for depth cameras)

You can choose to install Python and its related libraries separately if you prefer; however, there are Python distributions that come with installers that will set up the entire SciPy stack (which includes Python and NumPy), which make it very trivial to set up the development environment.

One such distribution is Anaconda Python (downloadable at http://09c8d0b2229f813c1b93-c95ac804525aac4b6dba79b00b39d1d3.r79.cf1.rackcdn.com/Anaconda-2.1.0-Windows-x86_64.exe). Once the installer is downloaded, run it and remember to add the path to the Anaconda installation to your PATH variable following the preceding procedure.
Here are the steps to set up Python 2.7, NumPy, SciPy, and OpenCV:

1. Download and install the 64-bit Python 2.7.9 from https://www.python.org/ftp/python/2.7.9/python-2.7.9.amd64.msi.
2. Download and install NumPy 1.6.2 from http://www.lfd.uci.edu/~gohlke/pythonlibs/#numpy or http://sourceforge.net/projects/numpy/files/NumPy/1.6.2/numpy-1.6.2-win32-superpack-python2.7.exe/download (note that installing NumPy on Windows 64-bit is a bit tricky due to the lack of a 64-bit Fortran compiler on Windows, which NumPy depends on. The binary at the preceding link is unofficial).
3. Download and install SciPy 0.11.0 from http://www.lfd.uci.edu/~gohlke/pythonlibs/#scipy or http://sourceforge.net/projects/scipy/files/scipy/0.11.0/scipy-0.11.0-win32-superpack-python2.7.exe/download (this is the same as NumPy and these are community installers).
4. Download the self-extracting ZIP of OpenCV 3.0.0 from https://github.com/Itseez/opencv. Run this ZIP, and when prompted, enter a destination folder, which we will refer to as <unzip_destination>. A subfolder, <unzip_destination>\opencv, is created.
5. Copy <unzip_destination>\opencv\build\python\2.7\cv2.pyd to C:\Python2.7\Lib\site-packages (assuming that we had installed Python 2.7 to the default location). If you installed Python 2.7 with Anaconda, use the Anaconda installation folder instead of the default Python installation. Now, the new Python installation can find OpenCV.
6. A final step is necessary if we want Python scripts to run using the new Python installation by default. Edit the system's PATH variable and append ;C:\Python2.7 (assuming that we had installed Python 2.7 to the default location) or your Anaconda installation folder. Remove any previous Python paths, such as ;C:\Python2.6. Log out and log back in (alternatively, reboot).
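Step 5 assumes the default site-packages location. If you are unsure which folder your particular Python installation (for example, an Anaconda one) actually searches for packages, this stdlib snippet (a sketch of mine, not from the book) prints the candidate folders where cv2.pyd must be placed:

```python
import site
import sysconfig

def site_package_dirs():
    """Return the folders this interpreter searches for installed packages."""
    dirs = []
    if hasattr(site, "getsitepackages"):  # absent in some virtualenvs
        dirs.extend(site.getsitepackages())
    dirs.append(sysconfig.get_paths()["purelib"])
    # De-duplicate while preserving order.
    seen = set()
    return [d for d in dirs if not (d in seen or seen.add(d))]

for d in site_package_dirs():
    print(d)
```

Copying cv2.pyd into any of the printed folders makes it importable from that interpreter.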
Using CMake and compilers

Windows does not come with any compilers or CMake. We need to install them. If we want support for depth cameras, including Kinect, we also need to install OpenNI and SensorKinect.

Let's assume that we have already installed Python 2.7, NumPy, and SciPy either from binaries (as described previously) or from a source. Now, we can proceed with installing compilers and CMake, optionally installing OpenNI and SensorKinect, and then building OpenCV from the source:

1. Download and install CMake 3.1.2 from http://www.cmake.org/files/v3.1/cmake-3.1.2-win32-x86.exe. When running the installer, select either Add CMake to the system PATH for all users or Add CMake to the system PATH for current user. Don't worry about the fact that a 64-bit version of CMake is not available; CMake is only a configuration tool and does not perform any compilations itself. Instead, on Windows, it creates project files that can be opened with Visual Studio.
2. Download and install Microsoft Visual Studio 2013 (the Desktop edition if you are working on Windows 7) from https://www.visualstudio.com/products/free-developer-offers-vs.aspx?slcid=0x409&type=web or MinGW.

Note that you will need to sign in with your Microsoft account and if you don't have one, you can create one on the spot. Install the software and reboot after installation is complete.

For MinGW, get the installer from http://sourceforge.net/projects/mingw/files/Installer/mingw-get-setup.exe/download and http://sourceforge.net/projects/mingw/files/OldFiles/mingw-get-inst/mingw-get-inst-20120426/mingw-get-inst-20120426.exe/download. When running the installer, make sure that the destination path does not contain spaces and that the optional C++ compiler is included. Edit the system's PATH variable and append ;C:\MinGW\bin (assuming that MinGW is installed to the default location). Reboot the system.
3. Optionally, download and install OpenNI 1.5.4.0 from the links provided in the GitHub homepage of OpenNI at https://github.com/OpenNI/OpenNI.
4. You can download and install SensorKinect 0.93 from https://github.com/avin2/SensorKinect/blob/unstable/Bin/SensorKinect093-Bin-Win32-v5.1.2.1.msi?raw=true (32-bit). Alternatively, for 64-bit Python, download the setup from https://github.com/avin2/SensorKinect/blob/unstable/Bin/SensorKinect093-Bin-Win64-v5.1.2.1.msi?raw=true (64-bit). Note that this repository has been inactive for more than three years.
5. Download the self-extracting ZIP of OpenCV 3.0.0 from https://github.com/Itseez/opencv. Run the self-extracting ZIP, and when prompted, enter any destination folder, which we will refer to as <unzip_destination>. A subfolder, <unzip_destination>\opencv, is then created.
6. Open Command Prompt and make another folder where our build will go using this command:

> mkdir <build_folder>

Change the directory of the build folder:

> cd <build_folder>
7. Now, we are ready to configure our build. To understand all the options, we can read the code in <unzip_destination>\opencv\CMakeLists.txt. However, for this book's purposes, we only need to use the options that will give us a release build with Python bindings, and optionally, depth camera support via OpenNI and SensorKinect.
8. Open CMake (cmake-gui) and specify the location of the source code of OpenCV and the folder where you would like to build the library. Click on Configure. Select the project to be generated. In this case, select Visual Studio 12 (which corresponds to Visual Studio 2013). After CMake has finished configuring the project, it will output a list of build options. If you see a red background, it means that your project may need to be reconfigured: CMake might report that it has failed to find some dependencies. Many of OpenCV's dependencies are optional, so do not be too concerned yet.

Note

If the build fails to complete or you run into problems later, try installing missing dependencies (often available as prebuilt binaries), and then rebuild OpenCV from this step.

You have the option of selecting/deselecting build options (according to the libraries you have installed on your machine); click on Configure again, until you get a clear background (white).

9. At the end of this process, you can click on Generate, which will create an OpenCV.sln file in the folder you've chosen for the build. You can then navigate to <build_folder>/OpenCV.sln, open the file with Visual Studio 2013, and proceed with building the project, ALL_BUILD. You will need to build both the Debug and Release versions of OpenCV, so go ahead and build the library in the Debug mode, then select Release and rebuild it (F7 is the key to launch the build).
10. At this stage, you will have a bin folder in the OpenCV build directory, which will contain all the generated .dll files that will allow you to include OpenCV in your projects.
Alternatively, for MinGW, run the following command:

> cmake -D CMAKE_BUILD_TYPE=RELEASE -D WITH_OPENNI=ON -G "MinGW Makefiles" <unzip_destination>\opencv

If OpenNI is not installed, omit -D WITH_OPENNI=ON. (In this case, depth cameras will not be supported.) If OpenNI and SensorKinect are installed to nondefault locations, modify the command to include -D OPENNI_LIB_DIR=<openni_install_destination>\Lib -D OPENNI_INCLUDE_DIR=<openni_install_destination>\Include -D OPENNI_PRIME_SENSOR_MODULE_BIN_DIR=<sensorkinect_install_destination>\Sensor\Bin.
Alternatively, for MinGW, run this command:

> mingw32-make

11. Copy <build_folder>\lib\Release\cv2.pyd (from a Visual Studio build) or <build_folder>\lib\cv2.pyd (from a MinGW build) to <python_installation_folder>\site-packages.
12. Finally, edit the system's PATH variable and append ;<build_folder>/bin/Release (for a Visual Studio build) or ;<build_folder>/bin (for a MinGW build). Reboot your system.
Installing on OS X

Some versions of Mac used to come with a version of Python 2.7 preinstalled that were customized by Apple for a system's internal needs. However, this has changed and the standard version of OS X ships with a standard installation of Python. On python.org, you can also find a universal binary that is compatible with both the new Intel systems and the legacy PowerPC.

Note

You can obtain this installer at https://www.python.org/downloads/release/python-279/ (refer to the Mac OS X 32-bit PPC or the Mac OS X 64-bit Intel links). Installing Python from the downloaded .dmg file will simply overwrite your current system installation of Python.

For Mac, there are several possible approaches for obtaining standard Python 2.7, NumPy, SciPy, and OpenCV. All approaches ultimately require OpenCV to be compiled from a source using Xcode Developer Tools. However, depending on the approach, this task is automated for us in various ways by third-party tools. We will look at these kinds of approaches using MacPorts or Homebrew. These tools can potentially do everything that CMake can, plus they help us resolve dependencies and separate our development libraries from system libraries.

Tip

I recommend MacPorts, especially if you want to compile OpenCV with depth camera support via OpenNI and SensorKinect. Relevant patches and build scripts, including some that I maintain, are ready-made for MacPorts. By contrast, Homebrew does not currently provide a ready-made solution to compile OpenCV with depth camera support.

Before proceeding, let's make sure that the Xcode Developer Tools are properly set up:
1. Download and install Xcode from the Mac App Store or https://developer.apple.com/xcode/downloads/. During installation, if there is an option to install Command Line Tools, select it.
2. Open Xcode and accept the license agreement.
3. A final step is necessary if the installer does not give us the option to install Command Line Tools. Navigate to Xcode | Preferences | Downloads, and click on the Install button next to Command Line Tools. Wait for the installation to finish and quit Xcode.

Alternatively, you can install the Xcode command-line tools by running the following command (in the terminal):

$ xcode-select --install

Now, we have the required compilers for any approach.
Using MacPorts with ready-made packages

We can use the MacPorts package manager to help us set up Python 2.7, NumPy, and OpenCV. MacPorts provides terminal commands that automate the process of downloading, compiling, and installing various pieces of open source software (OSS). MacPorts also installs dependencies as needed. For each piece of software, the dependencies and build recipes are defined in a configuration file called a Portfile. A MacPorts repository is a collection of Portfiles.

Starting from a system where Xcode and its command-line tools are already set up, the following steps will give us an OpenCV installation via MacPorts:
1. Download and install MacPorts from http://www.macports.org/install.php.
2. If you want support for the Kinect depth camera, you need to tell MacPorts where to download the custom Portfiles that I have written. To do so, edit /opt/local/etc/macports/sources.conf (assuming that MacPorts is installed to the default location). Just above the line rsync://rsync.macports.org/release/ports/ [default], add the following line:
http://nummist.com/opencv/ports.tar.gz
Save the file. Now, MacPorts knows that it has to search for Portfiles in my online repository first, and then the default online repository.
3. Open the terminal and run the following command to update MacPorts:
$ sudo port selfupdate
When prompted, enter your password.
4. Now (if we are using my repository), run the following command to install OpenCV with Python 2.7 bindings and support for depth cameras, including Kinect:
$ sudo port install opencv +python27 +openni_sensorkinect
Alternatively (with or without my repository), run the following command to install OpenCV with Python 2.7 bindings and support for depth cameras, excluding Kinect:
$ sudo port install opencv +python27 +openni
Note
Dependencies, including Python 2.7, NumPy, OpenNI, and (in the first example) SensorKinect, are automatically installed as well.
By adding +python27 to the command, we specify that we want the opencv variant (build configuration) with Python 2.7 bindings. Similarly, +openni_sensorkinect specifies the variant with the broadest possible support for depth cameras via OpenNI and SensorKinect. You may omit +openni_sensorkinect if you do not intend to use depth cameras, or you may replace it with +openni if you do intend to use OpenNI-compatible depth cameras but just not Kinect. To see the full list of the available variants before installing, we can enter the following command:
$ port variants opencv
Depending on our customization needs, we can add other variants to the install command. For even more flexibility, we can write our own variants (as described in the next section).
5. Also, run the following command to install SciPy:
$ sudo port install py27-scipy
6. The Python installation's executable is named python2.7. If we want to link the default python executable to python2.7, let's also run these commands:
$ sudo port install python_select
$ sudo port select python python27
Using MacPorts with your own custom packages
With a few extra steps, we can change the way that MacPorts compiles OpenCV or any other piece of software. As previously mentioned, MacPorts' build recipes are defined in configuration files called Portfiles. By creating or editing Portfiles, we can access highly configurable build tools, such as CMake, while also benefitting from MacPorts' features, such as dependency resolution.
Let's assume that we already have MacPorts installed. Now, we can configure MacPorts to use the custom Portfiles that we write:
1. Create a folder somewhere to hold our custom Portfiles. We will refer to this folder as <local_repository>.
2. Edit the /opt/local/etc/macports/sources.conf file (assuming that MacPorts is installed to the default location). Just above the rsync://rsync.macports.org/release/ports/ [default] line, add this line:
file://<local_repository>
For example, if <local_repository> is /Users/Joe/Portfiles, add the following line:
file:///Users/Joe/Portfiles
Note the triple slashes and save the file. Now, MacPorts knows that it has to search for Portfiles in <local_repository> first, and then its default online repository.
3. Open the terminal and update MacPorts to ensure that we have the latest Portfiles from the default repository:
$ sudo port selfupdate
4. Let's copy the default repository's opencv Portfile as an example. We should also copy the directory structure, which determines how the package is categorized by MacPorts:
$ mkdir <local_repository>/graphics/
$ cp -R /opt/local/var/macports/sources/rsync.macports.org/release/ports/graphics/opencv <local_repository>/graphics
Alternatively, for an example that includes Kinect support, we could download my online repository from http://nummist.com/opencv/ports.tar.gz, unzip it, and copy its entire graphics folder into <local_repository>:
$ cp -R <unzip_destination>/graphics <local_repository>
5. Edit <local_repository>/graphics/opencv/Portfile. Note that this file specifies the CMake configuration flags, dependencies, and variants. For details on Portfile editing, go to http://guide.macports.org/#development.
To see which CMake configuration flags are relevant to OpenCV, we need to look at its source code. Download the source code archive from https://github.com/Itseez/opencv/archive/3.0.0.zip, unzip it to any location, and read <unzip_destination>/OpenCV-3.0.0/CMakeLists.txt.
After making any edits to the Portfile, save it.
6. Now, we need to generate an index file in our local repository so that MacPorts can find the new Portfile:
$ cd <local_repository>
$ portindex
7. From now on, we can treat our custom opencv file just like any other MacPorts package. For example, we can install it as follows:
$ sudo port install opencv +python27 +openni_sensorkinect
Note that our local repository's Portfile takes precedence over the default repository's Portfile because of the order in which they are listed in /opt/local/etc/macports/sources.conf.
Using Homebrew with ready-made packages (no support for depth cameras)
Homebrew is another package manager that can help us. Normally, MacPorts and Homebrew should not be installed on the same machine.
Starting from a system where Xcode and its command-line tools are already set up, the following steps will give us an OpenCV installation via Homebrew:
1. Open the terminal and run the following command to install Homebrew:
$ ruby -e "$(curl -fsSkL raw.github.com/mxcl/homebrew/go)"
2. Unlike MacPorts, Homebrew does not automatically put its executables in PATH. To do so, create or edit the ~/.profile file and add this line at the top of the file:
export PATH=/usr/local/bin:/usr/local/sbin:$PATH
Save the file and run this command to refresh PATH:
$ source ~/.profile
Note that executables installed by Homebrew now take precedence over executables installed by the system.
3. For Homebrew's self-diagnostic report, run the following command:
$ brew doctor
Follow any troubleshooting advice it gives.
4. Now, update Homebrew:
$ brew update
5. Run the following command to install Python 2.7:
$ brew install python
6. Now, we can install NumPy. Homebrew's selection of Python library packages is limited, so we use a separate package management tool called pip, which comes with Homebrew's Python:
$ pip install numpy
7. SciPy contains some Fortran code, so we need an appropriate compiler. We can use Homebrew to install the gfortran compiler:
$ brew install gfortran
Now, we can install SciPy:
$ pip install scipy
8. To install OpenCV on a 64-bit system (all new Mac hardware since late 2006), run the following command:
$ brew install opencv
Tip
Downloading the example code
You can download the example code files for all Packt Publishing books that you have purchased from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.
Using Homebrew with your own custom packages
Homebrew makes it easy to edit existing package definitions:
$ brew edit opencv
The package definitions are actually scripts in the Ruby programming language. Tips on editing them can be found on the Homebrew Wiki page at https://github.com/mxcl/homebrew/wiki/Formula-Cookbook. A script may specify Make or CMake configuration flags, among other things.
To see which CMake configuration flags are relevant to OpenCV, we need to look at its source code. Download the source code archive from https://github.com/Itseez/opencv/archive/3.0.0.zip, unzip it to any location, and read <unzip_destination>/OpenCV-3.0.0/CMakeLists.txt.
After making edits to the Ruby script, save it.
The customized package can be treated as normal. For example, it can be installed as follows:
$ brew install opencv
Installation on Ubuntu and its derivatives
First and foremost, here is a quick note on Ubuntu's operating system versions: Ubuntu has a 6-month release cycle in which each release is either a .04 or a .10 minor version of a major version (14 at the time of writing). Every two years, however, Ubuntu releases a version classified as long-term support (LTS), which is granted five years of support by Canonical (the company behind Ubuntu). If you work in an enterprise environment, it is certainly advisable to install one of the LTS versions. The latest one available is 14.04.
Ubuntu comes with Python 2.7 preinstalled. The standard Ubuntu repository contains OpenCV 2.4.9 packages without support for depth cameras. At the time of writing this, OpenCV 3 is not yet available through the Ubuntu repositories, so we will have to build it from source. Fortunately, the vast majority of Unix-like and Linux systems come with all the necessary software to build a project from scratch already installed. When built from source, OpenCV can support depth cameras via OpenNI and SensorKinect, which are available as precompiled binaries with installation scripts.
Using the Ubuntu repository (no support for depth cameras)
We can install Python and all its necessary dependencies using the apt package manager, by running the following commands:
> sudo apt-get install build-essential
> sudo apt-get install cmake git libgtk2.0-dev pkg-config libavcodec-dev libavformat-dev libswscale-dev
> sudo apt-get install python-dev python-numpy libtbb2 libtbb-dev libjpeg-dev libpng-dev libtiff-dev libjasper-dev libdc1394-22-dev
Equivalently, we could have used the Ubuntu Software Center, which is the apt package manager's graphical frontend.
Building OpenCV from source
Now that we have the entire Python stack and cmake installed, we can build OpenCV. First, we need to download the source code from https://github.com/Itseez/opencv/archive/3.0.0-beta.zip.
Extract the archive and, in a terminal, move into the unzipped folder.
Then, run the following commands:
> mkdir build
> cd build
> cmake -D CMAKE_BUILD_TYPE=Release -D CMAKE_INSTALL_PREFIX=/usr/local ..
> make
> make install
After the installation terminates, you might want to look at OpenCV's Python samples in <opencv_folder>/opencv/samples/python and <opencv_folder>/opencv/samples/python2.
Installation on other Unix-like systems
The approaches for Ubuntu (as described previously) are likely to work on any Linux distribution derived from Ubuntu 14.04 LTS or Ubuntu 14.10, as follows:
Kubuntu 14.04 LTS or Kubuntu 14.10
Xubuntu 14.04 LTS or Xubuntu 14.10
Linux Mint 17
On Debian Linux and its derivatives, the apt package manager works the same as on Ubuntu, though the available packages may differ.
On Gentoo Linux and its derivatives, the Portage package manager is similar to MacPorts (as described previously), though the available packages may differ.
On FreeBSD derivatives, the process of installation is again similar to MacPorts; in fact, MacPorts derives from the ports installation system adopted on FreeBSD. Consult the remarkable FreeBSD Handbook at https://www.freebsd.org/doc/handbook/ for an overview of the software installation process.
On other Unix-like systems, the package manager and available packages may differ. Consult your package manager's documentation and search for packages with opencv in their names. Remember that OpenCV and its Python bindings might be split into multiple packages.
Also, look for any installation notes published by the system provider, the repository maintainer, or the community. Since OpenCV uses camera drivers and media codecs, getting all of its functionality to work can be tricky on systems with poor multimedia support. Under some circumstances, system packages might need to be reconfigured or reinstalled for compatibility.
If packages are available for OpenCV, check their version number. OpenCV 3 or higher is recommended for this book's purposes. Also, check whether the packages offer Python bindings and depth camera support via OpenNI and SensorKinect. Finally, check whether anyone in the developer community has reported success or failure in using the packages.
If, instead, we want to do a custom build of OpenCV from source, it might be helpful to refer to the installation script for Ubuntu (as discussed previously) and adapt it to the package manager and packages that are present on another system.
Installing the Contrib modules
Unlike with OpenCV 2.4, some modules are contained in a repository called opencv_contrib, which is available at https://github.com/Itseez/opencv_contrib. I highly recommend installing these modules as they contain extra functionalities that are not included in OpenCV, such as the face recognition module.
Once downloaded (either through zip or git; I recommend git so that you can keep up to date with a simple git pull command), you can rerun your cmake command to include the building of OpenCV with the opencv_contrib modules as follows:
cmake -D OPENCV_EXTRA_MODULES_PATH=<opencv_contrib>/modules <opencv_source_directory>
So, if you've followed the standard procedure and created a build directory in your OpenCV download folder, you should run the following commands:
mkdir build && cd build
cmake -D CMAKE_BUILD_TYPE=Release -D OPENCV_EXTRA_MODULES_PATH=<opencv_contrib>/modules -D CMAKE_INSTALL_PREFIX=/usr/local ..
make
Running samples
Running a few sample scripts is a good way to test whether OpenCV is correctly set up. The samples are included in OpenCV's source code archive.
On Windows, we should have already downloaded and unzipped OpenCV's self-extracting ZIP. Find the samples in <unzip_destination>/opencv/samples.
On Unix-like systems, including Mac, download the source code archive from https://github.com/Itseez/opencv/archive/3.0.0.zip and unzip it to any location (if we have not already done so). Find the samples in <unzip_destination>/OpenCV-3.0.0/samples.
Some of the sample scripts require command-line arguments. However, the following scripts (among others) should work without any arguments:
python/camera.py: This script displays a webcam feed (assuming that a webcam is plugged in).
python/drawing.py: This script draws a series of shapes, like a screensaver.
python2/hist.py: This script displays a photo. Press A, B, C, D, or E to see the variations of the photo, along with a corresponding histogram of color or grayscale values.
python2/opt_flow.py (missing from the Ubuntu package): This script displays a webcam feed with a superimposed visualization of optical flow (such as the direction of motion). For example, slowly wave your hand at the webcam to see the effect. Press 1 or 2 for alternative visualizations.
To exit a script, press Esc (not the window's close button).
If we encounter the ImportError: No module named cv2.cv message, then this means that we are running the script from a Python installation that does not know anything about OpenCV. There are two possible explanations for this:
Some steps in the OpenCV installation might have failed or been missed. Go back and review the steps.
If we have multiple Python installations on the machine, we might be using the wrong version of Python to launch the script. For example, on Mac, it might be the case that OpenCV is installed for MacPorts Python, but we are running the script with the system's Python. Go back and review the installation steps about editing the system path. Also, try launching the script manually from the command line using commands such as this:
$ python python/camera.py
You can also use the following command:
$ python2.7 python/camera.py
As another possible means of selecting a different Python installation, try editing the sample script to remove the #! lines. These lines might explicitly associate the script with the wrong Python installation (for our particular setup).
Finding documentation, help, and updates
OpenCV's documentation can be found online at http://docs.opencv.org/. The documentation includes a combined API reference for OpenCV's new C++ API, its new Python API (which is based on the C++ API), its old C API, and its old Python API (which is based on the C API). When looking up a class or function, be sure to read the section about the new Python API (the cv2 module), and not the old Python API (the cv module).
The documentation is also available as several downloadable PDF files:
API reference: This documentation can be found at http://docs.opencv.org/modules/refman.html
Tutorials: These documents can be found at http://docs.opencv.org/doc/tutorials/tutorials.html (these tutorials use C++ code; for a Python port of the tutorials' code, see the repository of Abid Rahman K. at http://goo.gl/EPsD1)
If you write code on airplanes or other places without Internet access, you will definitely want to keep offline copies of the documentation.
If the documentation does not seem to answer your questions, try talking to the OpenCV community. Here are some sites where you will find helpful people:
The OpenCV forum: http://www.answers.opencv.org/questions/
David Millán Escrivá's blog (one of this book's reviewers): http://blog.damiles.com/
Abid Rahman K.'s blog (one of this book's reviewers): http://www.opencvpython.blogspot.com/
Adrian Rosebrock's website (one of this book's reviewers): http://www.pyimagesearch.com/
Joe Minichino's website for this book (author of this book): http://techfort.github.io/pycv/
Joe Howse's website for this book (author of the first edition of this book): http://nummist.com/opencv/
Lastly, if you are an advanced user who wants to try new features, bug fixes, and sample scripts from the latest (unstable) OpenCV source code, have a look at the project's repository at https://github.com/Itseez/opencv/.
Summary
By now, we should have an OpenCV installation that can do everything we need for the project described in this book. Depending on which approach we took, we might also have a set of tools and scripts that are usable to reconfigure and rebuild OpenCV for our future needs.
We know where to find OpenCV's Python samples. These samples cover a range of functionality outside this book's scope, but they are useful as additional learning aids.
In the next chapter, we will familiarize ourselves with the most basic functions of the OpenCV API, namely, displaying images and videos, capturing video through a webcam, and handling basic keyboard and mouse inputs.
Chapter 2. Handling Files, Cameras, and GUIs
Installing OpenCV and running samples is fun, but at this stage, we want to try it out ourselves. This chapter introduces OpenCV's I/O functionality. We also discuss the concept of a project and the beginnings of an object-oriented design for this project, which we will flesh out in subsequent chapters.
By starting with a look at the I/O capabilities and design patterns, we will build our project in the same way we would make a sandwich: from the outside in. Bread slices and spread, or endpoints and glue, come before fillings or algorithms. We choose this approach because computer vision is mostly extroverted (it contemplates the real world outside our computer), and we want to apply all our subsequent algorithmic work to the real world through a common interface.
Basic I/O scripts
Most CV applications need to get images as input. Most also produce images as output. An interactive CV application might require a camera as an input source and a window as an output destination. However, other possible sources and destinations include image files, video files, and raw bytes. For example, raw bytes might be transmitted via a network connection, or they might be generated by an algorithm if we incorporate procedural graphics into our application. Let's look at each of these possibilities.
Reading/writing an image file
OpenCV provides the imread() and imwrite() functions that support various file formats for still images. The supported formats vary by system but should always include the BMP format. Typically, PNG, JPEG, and TIFF should be among the supported formats too.
Let's explore the anatomy of the representation of an image in Python and NumPy.
No matter the format, each pixel has a value, but the difference is in how the pixel is represented. For example, we can create a black square image from scratch by simply creating a 2D NumPy array:
img = numpy.zeros((3, 3), dtype=numpy.uint8)
If we print this image to a console, we obtain the following result:
array([[0, 0, 0],
       [0, 0, 0],
       [0, 0, 0]], dtype=uint8)
Each pixel is represented by a single 8-bit integer, which means that the values for each pixel are in the 0-255 range.
Let's now convert this image into blue-green-red (BGR) using cv2.cvtColor:
img = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR)
Let's observe how the image has changed:
array([[[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]],

       [[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]],

       [[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]]], dtype=uint8)
As you can see, each pixel is now represented by a three-element array, with each integer representing the B, G, and R channels, respectively. Other color spaces, such as HSV, will be represented in the same way, albeit with different value ranges (for example, the hue value of the HSV color space has a range of 0-180) and different numbers of channels.
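To make the hue range concrete, here is a small pure-Python sketch of OpenCV's 8-bit HSV convention (hue halved to fit 0-180, saturation and value scaled to 0-255). The bgr_to_hsv8() helper is our own illustration, not an OpenCV function; in practice, you would simply call cv2.cvtColor(img, cv2.COLOR_BGR2HSV):

```python
def bgr_to_hsv8(b, g, r):
    """Convert one BGR pixel to OpenCV-style 8-bit HSV (illustrative helper)."""
    mx, mn = max(b, g, r), min(b, g, r)
    v = mx
    s = 0 if mx == 0 else int(round(255.0 * (mx - mn) / mx))
    if mx == mn:
        h = 0.0                                    # gray: hue is undefined, use 0
    elif mx == r:
        h = (60.0 * (g - b) / (mx - mn)) % 360.0
    elif mx == g:
        h = 60.0 * (b - r) / (mx - mn) + 120.0
    else:                                          # mx == b
        h = 60.0 * (r - g) / (mx - mn) + 240.0
    return int(round(h / 2.0)), s, v               # hue halved: 0-360 becomes 0-180

print(bgr_to_hsv8(255, 0, 0))  # pure blue -> (120, 255, 255)
print(bgr_to_hsv8(0, 0, 255))  # pure red  -> (0, 255, 255)
```

Note how pure blue lands at hue 120 rather than the 240 degrees you might expect from a 0-360 hue wheel.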
You can check the structure of an image by inspecting the shape property, which returns rows, columns, and the number of channels (if there is more than one).
Consider this example:
>>> img = numpy.zeros((3, 3), dtype=numpy.uint8)
>>> img.shape
The preceding code will print (3, 3). If you then converted the image to BGR, the shape would be (3, 3, 3), which indicates the presence of three channels per pixel.
Images can be loaded from one file format and saved to another. For example, let's convert an image from PNG to JPEG:
import cv2

image = cv2.imread('MyPic.png')
cv2.imwrite('MyPic.jpg', image)
Note
Most of the OpenCV functionalities that we use are in the cv2 module. You might come across other OpenCV guides that instead rely on the cv or cv2.cv modules, which are legacy versions. The reason why the Python module is called cv2 is not because it is a Python binding module for OpenCV 2.x.x, but because it has introduced a better API, which leverages object-oriented programming as opposed to the previous cv module, which adhered to a more procedural style of programming.
By default, imread() returns an image in the BGR color format even if the file uses a grayscale format. BGR represents the same color space as red-green-blue (RGB), but the byte order is reversed.
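Because only the channel order differs, converting between BGR and RGB is just a matter of reversing the last axis of the array. Here is a minimal NumPy-only sketch with a hypothetical 1x2 image:

```python
import numpy

# A hypothetical 1x2 BGR image: one blue pixel and one red pixel.
bgr = numpy.array([[[255, 0, 0],    # blue, in BGR order
                    [0, 0, 255]]],  # red, in BGR order
                  dtype=numpy.uint8)

# Reversing the channel axis converts BGR to RGB (and back again).
rgb = bgr[:, :, ::-1]

print(rgb[0, 0])  # the blue pixel reads [0, 0, 255] in RGB order
```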
Optionally, we may specify the mode of imread() to be one of the following enumerators:
IMREAD_ANYCOLOR = 4
IMREAD_ANYDEPTH = 2
IMREAD_COLOR = 1
IMREAD_GRAYSCALE = 0
IMREAD_LOAD_GDAL = 8
IMREAD_UNCHANGED = -1
For example, let's load a PNG file as a grayscale image (losing any color information in the process), and then save it as a grayscale PNG image:
import cv2

grayImage = cv2.imread('MyPic.png', cv2.IMREAD_GRAYSCALE)
cv2.imwrite('MyPicGray.png', grayImage)
To avoid unnecessary headaches, use absolute paths to your images (for example, C:\Users\Joe\Pictures\MyPic.png on Windows or /home/joe/pictures/MyPic.png on Unix), at least while you're familiarizing yourself with OpenCV's API. The path of an image, unless absolute, is relative to the current working directory (normally the folder from which you run the Python script), so in the preceding example, MyPic.png would have to be in that folder or the image won't be found.
With most modes, imread() discards any alpha channel (transparency); IMREAD_UNCHANGED is the exception that preserves it. The imwrite() function requires an image to be in the BGR or grayscale format with a certain number of bits per channel that the output format can support. For example, BMP requires 8 bits per channel, while PNG allows either 8 or 16 bits per channel.
Converting between an image and raw bytes
Conceptually, a byte is an integer ranging from 0 to 255. In all real-time graphic applications today, a pixel is typically represented by one byte per channel, though other representations are also possible.
An OpenCV image is a 2D or 3D array of the numpy.array type. An 8-bit grayscale image is a 2D array containing byte values. A 24-bit BGR image is a 3D array, which also contains byte values. We may access these values by using an expression, such as image[0, 0] or image[0, 0, 0]. The first index is the pixel's y coordinate or row, 0 being the top. The second index is the pixel's x coordinate or column, 0 being the leftmost. The third index (if applicable) represents a color channel.
For example, in an 8-bit grayscale image with a white pixel in the upper-left corner, image[0, 0] is 255. For a 24-bit BGR image with a blue pixel in the upper-left corner, image[0, 0] is [255, 0, 0].
Note
As an alternative to using an expression, such as image[0, 0] or image[0, 0] = 128, we may use an expression, such as image.item((0, 0)) or image.itemset((0, 0), 128). The latter expressions are more efficient for single-pixel operations. However, as we will see in subsequent chapters, we usually want to perform operations on large slices of an image rather than on single pixels.
Provided that an image has 8 bits per channel, we can cast it to a standard Python bytearray, which is one-dimensional:
byteArray = bytearray(image)
Conversely, provided that bytearray contains bytes in an appropriate order, we can cast and then reshape it to get a numpy.array type that is an image:
grayImage = numpy.array(grayByteArray).reshape(height, width)
bgrImage = numpy.array(bgrByteArray).reshape(height, width, 3)
As a more complete example, let's convert a bytearray that contains random bytes to a grayscale image and a BGR image:
import cv2
import numpy
import os

# Make an array of 120,000 random bytes.
randomByteArray = bytearray(os.urandom(120000))
flatNumpyArray = numpy.array(randomByteArray)

# Convert the array to make a 400x300 grayscale image.
grayImage = flatNumpyArray.reshape(300, 400)
cv2.imwrite('RandomGray.png', grayImage)

# Convert the array to make a 400x100 color image.
bgrImage = flatNumpyArray.reshape(100, 400, 3)
cv2.imwrite('RandomColor.png', bgrImage)
After running this script, we should have a pair of randomly generated images, RandomGray.png and RandomColor.png, in the script's directory.
Note
Here, we use Python's standard os.urandom() function to generate random raw bytes, which we will then convert to a NumPy array. Note that it is also possible to generate a random NumPy array directly (and more efficiently) using a statement such as numpy.random.randint(0, 256, 120000).reshape(300, 400). The only reason we use os.urandom() is to help demonstrate a conversion from raw bytes.
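The bytearray conversions above can also be verified without touching the filesystem. Here is a minimal round-trip sketch with a hypothetical 2x3 grayscale image:

```python
import numpy

# A hypothetical 2x3 grayscale image.
image = numpy.array([[10, 20, 30],
                     [40, 50, 60]], dtype=numpy.uint8)

# Image -> raw bytes: one byte per pixel for an 8-bit grayscale image.
rawBytes = bytearray(image)
print(len(rawBytes))  # 6

# Raw bytes -> image: the shape must be restored manually.
restored = numpy.array(rawBytes, dtype=numpy.uint8).reshape(2, 3)
print((restored == image).all())  # True
```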
Accessing image data with numpy.array
Now that you have a better understanding of how an image is formed, we can start performing basic operations on it. We know that the easiest (and most common) way to load an image in OpenCV is to use the imread() function. We also know that this will return an image, which is really an array (either a 2D or 3D one, depending on the parameters you passed to imread()).
The numpy.array structure is well optimized for array operations, and it allows certain kinds of bulk manipulations that are not available in a plain Python list. These kinds of numpy.array type-specific operations come in handy for image manipulations in OpenCV. Let's explore image manipulations from the start and step by step though, with a basic example: say you want to manipulate a pixel at the coordinates, (0, 0), of a BGR image and turn it into a white pixel.
import cv2
import numpy as np

img = cv2.imread('MyPic.png')
img[0, 0] = [255, 255, 255]
If you then showed the image with a standard imshow() call, you would see a white dot in the top-left corner of the image. Naturally, this isn't very useful, but it shows what can be accomplished. Let's now leverage the ability of numpy.array to perform transformations on an array much faster than a plain Python array.
Let's say that you want to change the blue value of a particular pixel, for example, the pixel at coordinates (150, 120). The numpy.array type provides a very handy method, item(), which takes three parameters: the row (y) position, the column (x) position, and the index within the array at that position (remember that in a BGR image, the data at a certain position is a three-element array containing the B, G, and R values in this order), and returns the value at the index position. Another method, itemset(), sets the value of a particular channel of a particular pixel to a specified value (itemset() takes two arguments: a three-element tuple (row, column, and index) and the new value).
In this example, we will change the value of blue at (150, 120) from its current value (127) to an arbitrary 255:
import cv2
import numpy as np

img = cv2.imread('MyPic.png')
print img.item(150, 120, 0)  # prints the current value of B for that pixel
img.itemset((150, 120, 0), 255)
print img.item(150, 120, 0)  # prints 255
Remember that we do this with numpy.array for two reasons: numpy.array is extremely optimized for these kinds of operations, and we obtain more readable code through NumPy's elegant methods rather than the raw index access of the first example.
This particular code doesn't do much in itself, but it does open a world of possibilities. It is, however, advisable that you utilize built-in filters and methods to manipulate an entire image; the above approach is only suitable for small regions of interest.
Now, let's take a look at a very common operation, namely, manipulating channels. Sometimes, you'll want to zero-out all the values of a particular channel (B, G, or R).
Tip
Using loops to manipulate Python arrays is very costly in terms of runtime and should be avoided at all costs; this is especially true if you manipulate videos, where you'll otherwise find yourself with a jittery output. Instead, a feature called array indexing allows for efficient manipulation of pixels. Setting all G (green) values of an image to 0 is as simple as using this code:
import cv2
import numpy as np

img = cv2.imread('MyPic.png')
img[:, :, 1] = 0
This is a fairly impressive piece of code and easy to understand. The relevant line is the last one, which basically instructs the program to take all pixels from all rows and columns and set the value at index one of the three-element array, representing the color of the pixel, to 0. If you display this image, you will notice a complete absence of green.
There are a number of interesting things we can do by accessing raw pixels with NumPy's array indexing; one of them is defining regions of interest (ROI). Once the region is defined, we can perform a number of operations, namely, binding this region to a variable, and then even defining a second region and assigning it the value of the first one (visually copying a portion of the image over to another position in the image):
import cv2
import numpy as np

img = cv2.imread('MyPic.png')
my_roi = img[0:100, 0:100]
img[300:400, 300:400] = my_roi
It's important to make sure that the two regions correspond in terms of size. If not, NumPy will (rightly) complain that the two shapes mismatch.
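The size requirement can be checked with a small NumPy-only sketch (a hypothetical 10x10 image, so no file is needed):

```python
import numpy

# A hypothetical 10x10 single-channel image with a 5x5 white square.
img = numpy.zeros((10, 10), dtype=numpy.uint8)
img[0:5, 0:5] = 255

my_roi = img[0:5, 0:5]        # bind the region to a variable
img[5:10, 5:10] = my_roi      # shapes match (5x5 into 5x5), so this copies
print(img[7, 7])              # 255

# A mismatched assignment fails loudly instead of silently cropping.
try:
    img[5:10, 5:9] = my_roi   # a 5x5 source into a 5x4 target
    mismatch_ok = False
except ValueError:
    mismatch_ok = True
print(mismatch_ok)            # True
```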
Finally, there are a few interesting details we can obtain from numpy.array, such as the image properties, using this code:
import cv2
import numpy as np

img = cv2.imread('MyPic.png')
print img.shape
print img.size
print img.dtype
These three properties are, in this order:
Shape: NumPy returns a tuple containing the number of rows (height), the number of columns (width), and, if the image is in color, the number of channels. This is useful to debug a type of image; if the image is monochromatic or grayscale, it will not contain a channels value.
Size: This property refers to the total number of elements in the array (for a color image, this is the number of pixels multiplied by the number of channels).
Datatype: This property refers to the datatype used for an image (normally a variation of an unsigned integer type and the bits supported by this type, that is, uint8).
All in all, it is strongly advisable that you familiarize yourself with NumPy in general and numpy.array in particular when working with OpenCV, as it is the foundation of any image processing done with Python.
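These properties are easy to confirm with synthetic arrays, with no image file required; the shapes below are our own arbitrary choices:

```python
import numpy

gray = numpy.zeros((3, 4), dtype=numpy.uint8)      # 3 rows (height), 4 columns (width)
color = numpy.zeros((3, 4, 3), dtype=numpy.uint8)  # the same size, with 3 channels

print(gray.shape)   # (3, 4): no channels entry for a single-channel image
print(color.shape)  # (3, 4, 3)
print(color.size)   # 36: total elements, that is, 12 pixels times 3 channels
print(color.dtype)  # uint8
```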
Reading/writing a video file
OpenCV provides the VideoCapture and VideoWriter classes that support various video file formats. The supported formats vary by system but should always include AVI. Via its read() method, a VideoCapture class may be polled for new frames until it reaches the end of its video file. Each frame is an image in a BGR format.
Conversely, an image may be passed to the write() method of the VideoWriter class, which appends the image to the video file. Let's look at an example that reads frames from one AVI file and writes them to another with a YUV encoding:
import cv2

videoCapture = cv2.VideoCapture('MyInputVid.avi')
fps = videoCapture.get(cv2.CAP_PROP_FPS)
size = (int(videoCapture.get(cv2.CAP_PROP_FRAME_WIDTH)),
        int(videoCapture.get(cv2.CAP_PROP_FRAME_HEIGHT)))
videoWriter = cv2.VideoWriter(
    'MyOutputVid.avi', cv2.VideoWriter_fourcc('I','4','2','0'), fps, size)

success, frame = videoCapture.read()
while success:  # Loop until there are no more frames.
    videoWriter.write(frame)
    success, frame = videoCapture.read()
The arguments to the VideoWriter class constructor deserve special attention. A video's filename must be specified. Any preexisting file with this name is overwritten. A video codec must also be specified. The available codecs may vary from system to system. These are the options that are included:
cv2.VideoWriter_fourcc('I','4','2','0'): This option is an uncompressed YUV encoding, 4:2:0 chroma subsampled. This encoding is widely compatible but produces large files. The file extension should be .avi.
cv2.VideoWriter_fourcc('P','I','M','1'): This option is MPEG-1. The file extension should be .avi.
cv2.VideoWriter_fourcc('X','V','I','D'): This option is MPEG-4 and a preferred option if you want the resulting video size to be average. The file extension should be .avi.
cv2.VideoWriter_fourcc('T','H','E','O'): This option is Ogg Theora. The file extension should be .ogv.
cv2.VideoWriter_fourcc('F','L','V','1'): This option is a Flash video. The file extension should be .flv.
A frame rate and frame size must be specified too. Since we are copying video frames from another video, these properties can be read from the get() method of the VideoCapture class.
Capturing camera frames
A stream of camera frames is represented by the VideoCapture class too. However, for a camera, we construct a VideoCapture class by passing the camera's device index instead of a video's filename. Let's consider an example that captures 10 seconds of video from a camera and writes it to an AVI file:
import cv2

cameraCapture = cv2.VideoCapture(0)
fps = 30  # an assumption
size = (int(cameraCapture.get(cv2.CAP_PROP_FRAME_WIDTH)),
        int(cameraCapture.get(cv2.CAP_PROP_FRAME_HEIGHT)))
videoWriter = cv2.VideoWriter(
    'MyOutputVid.avi', cv2.VideoWriter_fourcc('I','4','2','0'), fps, size)

success, frame = cameraCapture.read()
numFramesRemaining = 10 * fps - 1
while success and numFramesRemaining > 0:
    videoWriter.write(frame)
    success, frame = cameraCapture.read()
    numFramesRemaining -= 1
cameraCapture.release()
Unfortunately, the get() method of a VideoCapture class does not return an accurate value for the camera's frame rate; it typically returns 0. The official documentation at http://docs.opencv.org/modules/highgui/doc/reading_and_writing_images_and_video.html reads:
"When querying a property that is not supported by the backend used by the VideoCapture class, value 0 is returned."
This occurs most commonly on systems where the driver only supports basic functionalities.
For the purpose of creating an appropriate VideoWriter class for the camera, we have to either make an assumption about the frame rate (as we did in the code previously) or measure it using a timer. The latter approach is better, and we will cover it later in this chapter.
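As a rough sketch of the timer idea (the estimate_fps() helper below is ours, not an OpenCV API), we can time a fixed number of reads and divide. Here, a fake frame source that sleeps for about 10 ms stands in for a capture object's read() method:

```python
import time

def estimate_fps(read_frame, num_frames=30):
    """Estimate frames per second by timing num_frames calls to read_frame.

    read_frame is any zero-argument callable, for example the read()
    method of a capture object.
    """
    read_frame()  # a warm-up read, so setup costs are not counted
    start = time.time()
    for _ in range(num_frames):
        read_frame()
    elapsed = time.time() - start
    return num_frames / elapsed

# Demonstration with a fake source that takes about 10 ms per frame.
fps = estimate_fps(lambda: time.sleep(0.01))
print(round(fps))  # roughly 100, depending on the system's timer resolution
```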
The number of cameras and their order is, of course, system-dependent. Unfortunately, OpenCV does not provide any means of querying the number of cameras or their properties. If an invalid index is used to construct a VideoCapture class, the VideoCapture class will not yield any frames; its read() method will return (False, None). A good way to prevent it from trying to retrieve frames from a VideoCapture that was not opened correctly is to use the VideoCapture.isOpened() method, which returns a Boolean.
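Here is a sketch of that guard, using a stand-in class (FakeCapture is ours, purely for illustration; a real cv2.VideoCapture exposes the same isOpened()/read() interface):

```python
class FakeCapture(object):
    """A stand-in for cv2.VideoCapture with a controllable open state."""
    def __init__(self, opened):
        self._opened = opened
    def isOpened(self):
        return self._opened
    def read(self):
        # A real capture returns (False, None) when no frame is available.
        return (True, 'frame') if self._opened else (False, None)

def read_first_frame(capture):
    """Return the first frame, or None if the device failed to open."""
    if not capture.isOpened():
        return None  # bad device index: bail out before polling read()
    success, frame = capture.read()
    return frame if success else None

print(read_first_frame(FakeCapture(False)))  # None
print(read_first_frame(FakeCapture(True)))   # frame
```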
The read() method is inappropriate when we need to synchronize a set of cameras or a multihead camera (such as a stereo camera or Kinect). Then, we use the grab() and retrieve() methods instead. For a set of cameras, we use this code:
success0 = cameraCapture0.grab()
success1 = cameraCapture1.grab()
if success0 and success1:
    _, frame0 = cameraCapture0.retrieve()
    _, frame1 = cameraCapture1.retrieve()
DisplayingimagesinawindowOneofthemostbasicoperationsinOpenCVisdisplayinganimage.Thiscanbedonewiththeimshow()function.IfyoucomefromanyotherGUIframeworkbackground,youwouldthinkitsufficienttocallimshow()todisplayanimage.Thisisonlypartiallytrue:theimagewillbedisplayed,andwilldisappearimmediately.Thisisbydesign,toenabletheconstantrefreshingofawindowframewhenworkingwithvideos.Here’saverysimpleexamplecodetodisplayanimage:
import cv2
import numpy as np

img = cv2.imread('my-image.png')
cv2.imshow('my image', img)
cv2.waitKey()
cv2.destroyAllWindows()
The imshow() function takes two parameters: the name of the frame in which we want to display the image, and the image itself. We'll talk about waitKey() in more detail when we explore the displaying of frames in a window.

The aptly named destroyAllWindows() function disposes of all the windows created by OpenCV.

Displaying camera frames in a window

OpenCV allows named windows to be created, redrawn, and destroyed using the namedWindow(), imshow(), and destroyWindow() functions. Also, any window may capture keyboard input via the waitKey() function and mouse input via the setMouseCallback() function. Let's look at an example where we show the frames of a live camera input:
import cv2

clicked = False
def onMouse(event, x, y, flags, param):
    global clicked
    if event == cv2.EVENT_LBUTTONUP:
        clicked = True

cameraCapture = cv2.VideoCapture(0)
cv2.namedWindow('MyWindow')
cv2.setMouseCallback('MyWindow', onMouse)

print('Showing camera feed. Click window or press any key to stop.')
success, frame = cameraCapture.read()
while success and cv2.waitKey(1) == -1 and not clicked:
    cv2.imshow('MyWindow', frame)
    success, frame = cameraCapture.read()

cv2.destroyWindow('MyWindow')
cameraCapture.release()
The argument for waitKey() is a number of milliseconds to wait for keyboard input. The return value is either -1 (meaning that no key has been pressed) or an ASCII keycode, such as 27 for Esc. For a list of ASCII keycodes, see http://www.asciitable.com/. Also, note that Python provides a standard function, ord(), which can convert a character to its ASCII keycode. For example, ord('a') returns 97.

Tip

On some systems, waitKey() may return a value that encodes more than just the ASCII keycode. (A bug is known to occur on Linux when OpenCV uses GTK as its backend GUI library.) On all systems, we can ensure that we extract just the ASCII keycode by reading the last byte from the return value like this:
keycode = cv2.waitKey(1)
if keycode != -1:
    keycode &= 0xFF
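Combining waitKey(), ord(), and the 0xFF mask, we might wrap the quit-key test in a small helper (the name and the choice of quit keys are ours):

```python
# Keycodes that should quit the app: Esc, plus 'q' in either case.
QUIT_KEYCODES = {27, ord('q'), ord('Q')}

def isQuitKey(keycode):
    """Return True if the keycode (possibly with extra GTK bits) means quit."""
    if keycode == -1:
        return False  # no key was pressed during waitKey()'s timeout
    return (keycode & 0xFF) in QUIT_KEYCODES
```

A main loop could then read `if isQuitKey(cv2.waitKey(1)): break`, and the helper keeps working even when GTK sets high bits above the ASCII byte.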
OpenCV's window functions and waitKey() are interdependent. OpenCV windows are only updated when waitKey() is called, and waitKey() only captures input when an OpenCV window has focus.

The mouse callback passed to setMouseCallback() should take five arguments, as seen in our code sample. The callback's param argument is set as an optional third argument to setMouseCallback(). By default, it is 0. The callback's event argument is one of the following actions:

cv2.EVENT_MOUSEMOVE: This event refers to mouse movement
cv2.EVENT_LBUTTONDOWN: This event refers to the left button down
cv2.EVENT_RBUTTONDOWN: This refers to the right button down
cv2.EVENT_MBUTTONDOWN: This refers to the middle button down
cv2.EVENT_LBUTTONUP: This refers to the left button up
cv2.EVENT_RBUTTONUP: This event refers to the right button up
cv2.EVENT_MBUTTONUP: This event refers to the middle button up
cv2.EVENT_LBUTTONDBLCLK: This event refers to the left button being double-clicked
cv2.EVENT_RBUTTONDBLCLK: This refers to the right button being double-clicked
cv2.EVENT_MBUTTONDBLCLK: This refers to the middle button being double-clicked

The mouse callback's flags argument may be some bitwise combination of the following events:

cv2.EVENT_FLAG_LBUTTON: This event refers to the left button being pressed
cv2.EVENT_FLAG_RBUTTON: This event refers to the right button being pressed
cv2.EVENT_FLAG_MBUTTON: This event refers to the middle button being pressed
cv2.EVENT_FLAG_CTRLKEY: This event refers to the Ctrl key being pressed
cv2.EVENT_FLAG_SHIFTKEY: This event refers to the Shift key being pressed
cv2.EVENT_FLAG_ALTKEY: This event refers to the Alt key being pressed

Unfortunately, OpenCV does not provide any means of handling window events. For example, we cannot stop our application when a window's close button is clicked. Due to OpenCV's limited event handling and GUI capabilities, many developers prefer to integrate it with other application frameworks. Later in this chapter, we will design an abstraction layer to help integrate OpenCV into any application framework.
Project Cameo (face tracking and image manipulation)

OpenCV is often studied through a cookbook approach that covers a lot of algorithms but nothing about high-level application development. To an extent, this approach is understandable because OpenCV's potential applications are so diverse. OpenCV is used in a wide variety of applications: photo/video editors, motion-controlled games, a robot's AI, or psychology experiments where we log participants' eye movements. Across such different use cases, can we truly study a useful set of abstractions?

I believe we can and the sooner we start creating abstractions, the better. We will structure our study of OpenCV around a single application but, at each step, we will design a component of this application to be extensible and reusable.

We will develop an interactive application that performs face tracking and image manipulations on camera input in real time. This type of application covers a broad range of OpenCV's functionality and challenges us to create an efficient, effective implementation.

Specifically, our application will perform real-time facial merging. Given two streams of camera input (or, optionally, prerecorded video input), the application will superimpose faces from one stream onto faces in the other. Filters and distortions will be applied to give this blended scene a unified look and feel. Users should have the experience of being engaged in a live performance where they enter another environment and persona. This type of user experience is popular in amusement parks such as Disneyland.

In such an application, users would immediately notice flaws, such as a low frame rate or inaccurate tracking. To get the best results, we will try several approaches using conventional imaging and depth imaging.

We will call our application Cameo. A cameo is (in jewelry) a small portrait of a person or (in film) a very brief role played by a celebrity.
Cameo – an object-oriented design

Python applications can be written in a purely procedural style. This is often done with small applications, such as our basic I/O scripts, discussed previously. However, from now on, we will use an object-oriented style because it promotes modularity and extensibility.

From our overview of OpenCV's I/O functionality, we know that all images are similar, regardless of their source or destination. No matter how we obtain a stream of images or where we send it as output, we can apply the same application-specific logic to each frame in this stream. Separation of I/O code and application code becomes especially convenient in an application, such as Cameo, which uses multiple I/O streams.

We will create classes called CaptureManager and WindowManager as high-level interfaces to I/O streams. Our application code may use CaptureManager to read new frames and, optionally, to dispatch each frame to one or more outputs, including a still image file, a video file, and a window (via a WindowManager class). A WindowManager class lets our application code handle a window and events in an object-oriented style.

Both CaptureManager and WindowManager are extensible. We could make implementations that do not rely on OpenCV for I/O. Indeed, Appendix A, Integrating with Pygame, OpenCV Computer Vision with Python, uses a WindowManager subclass.
Abstracting a video stream with managers.CaptureManager

As we have seen, OpenCV can capture, show, and record a stream of images from either a video file or a camera, but there are some special considerations in each case. Our CaptureManager class abstracts some of the differences and provides a higher-level interface to dispatch images from the capture stream to one or more outputs: a still image file, a video file, or a window.

A CaptureManager class is initialized with a VideoCapture class and has the enterFrame() and exitFrame() methods that should typically be called on every iteration of an application's main loop. Between a call to enterFrame() and exitFrame(), the application may (any number of times) set a channel property and get a frame property. The channel property is initially 0 and only multihead cameras use other values. The frame property is an image corresponding to the current channel's state when enterFrame() was called.

A CaptureManager class also has the writeImage(), startWritingVideo(), and stopWritingVideo() methods that may be called at any time. Actual file writing is postponed until exitFrame(). Also, during the exitFrame() method, the frame property may be shown in a window, depending on whether the application code provides a WindowManager class either as an argument to the constructor of CaptureManager or by setting a previewWindowManager property.

If the application code manipulates frame, the manipulations are reflected in recorded files and in the window. A CaptureManager class has a constructor argument and property called shouldMirrorPreview, which should be True if we want frame to be mirrored (horizontally flipped) in the window but not in recorded files. Typically, when facing a camera, users prefer the live camera feed to be mirrored.

Recall that a VideoWriter class needs a frame rate, but OpenCV does not provide any way to get an accurate frame rate for a camera. The CaptureManager class works around this limitation by using a frame counter and Python's standard time.time() function to estimate the frame rate if necessary. This approach is not foolproof. Depending on frame rate fluctuations and the system-dependent implementation of time.time(), the accuracy of the estimate might still be poor in some cases. However, if we deploy to unknown hardware, it is better than just assuming that the user's camera has a particular frame rate.
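Before we fold this timer-based workaround into CaptureManager, here is the idea in isolation, as a standalone sketch using Python's standard time module (the class name is ours):

```python
import time

class FpsEstimator(object):
    """Estimate frames per second from the timing of tick() calls."""

    def __init__(self):
        self._startTime = None
        self._framesElapsed = 0

    def tick(self):
        """Call once per captured frame."""
        if self._startTime is None:
            # The first frame starts the clock but is not counted.
            self._startTime = time.time()
        else:
            self._framesElapsed += 1

    @property
    def fpsEstimate(self):
        """Return the estimated FPS, or None if too few frames have elapsed."""
        if self._framesElapsed == 0:
            return None
        return self._framesElapsed / (time.time() - self._startTime)
```

The estimate becomes more stable as more frames elapse, which is why CaptureManager (below) waits for a number of frames before trusting it.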
Let's create a file called managers.py, which will contain our implementation of CaptureManager. The implementation turns out to be quite long, so we will look at it in several pieces. First, let's add imports, a constructor, and properties, as follows:
import cv2
import numpy
import time


class CaptureManager(object):

    def __init__(self, capture, previewWindowManager=None,
                 shouldMirrorPreview=False):

        self.previewWindowManager = previewWindowManager
        self.shouldMirrorPreview = shouldMirrorPreview

        self._capture = capture
        self._channel = 0
        self._enteredFrame = False
        self._frame = None
        self._imageFilename = None
        self._videoFilename = None
        self._videoEncoding = None
        self._videoWriter = None

        self._startTime = None
        self._framesElapsed = 0
        self._fpsEstimate = None

    @property
    def channel(self):
        return self._channel

    @channel.setter
    def channel(self, value):
        if self._channel != value:
            self._channel = value
            self._frame = None

    @property
    def frame(self):
        if self._enteredFrame and self._frame is None:
            _, self._frame = self._capture.retrieve()
        return self._frame

    @property
    def isWritingImage(self):
        return self._imageFilename is not None

    @property
    def isWritingVideo(self):
        return self._videoFilename is not None
Note that most of the member variables are non-public, as denoted by the underscore prefix in variable names, such as self._enteredFrame. These non-public variables relate to the state of the current frame and any file-writing operations. As discussed previously, the application code only needs to configure a few things, which are implemented as constructor arguments and settable public properties: the camera channel, the window manager, and the option to mirror the camera preview.

This book assumes a certain level of familiarity with Python; however, if you are getting confused by those @ annotations (for example, @property), refer to the Python documentation about decorators, a built-in feature of the language that allows the wrapping of a function by another function, normally used to apply a user-defined behavior in several places of an application (refer to https://docs.python.org/2/reference/compound_stmts.html#grammar-token-decorator).
Note

Python does not have the concept of private member variables and the single/double underscore prefix (_) is only a convention.

By this convention, in Python, variables that are prefixed with a single underscore should be treated as protected (accessed only within the class and its subclasses), while variables that are prefixed with a double underscore should be treated as private (accessed only within the class).
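Here is a tiny illustration of the convention (the class and attribute names are made up): the single underscore is advisory only, while the double underscore additionally triggers Python's name mangling:

```python
class Widget(object):
    def __init__(self):
        self._state = 'protected'    # convention only; still accessible
        self.__secret = 'private'    # mangled by Python to _Widget__secret

w = Widget()
# w._state is reachable as written; w.__secret is not, because outside the
# class body the attribute only exists under its mangled name.
```

Neither prefix provides real access control; both are signals to readers of the code about intended use.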
Continuing with our implementation, let's add the enterFrame() and exitFrame() methods to managers.py:
    def enterFrame(self):
        """Capture the next frame, if any."""

        # But first, check that any previous frame was exited.
        assert not self._enteredFrame, \
            'previous enterFrame() had no matching exitFrame()'

        if self._capture is not None:
            self._enteredFrame = self._capture.grab()

    def exitFrame(self):
        """Draw to the window. Write to files. Release the frame."""

        # Check whether any grabbed frame is retrievable.
        # The getter may retrieve and cache the frame.
        if self.frame is None:
            self._enteredFrame = False
            return

        # Update the FPS estimate and related variables.
        if self._framesElapsed == 0:
            self._startTime = time.time()
        else:
            timeElapsed = time.time() - self._startTime
            self._fpsEstimate = self._framesElapsed / timeElapsed
        self._framesElapsed += 1

        # Draw to the window, if any.
        if self.previewWindowManager is not None:
            if self.shouldMirrorPreview:
                mirroredFrame = numpy.fliplr(self._frame).copy()
                self.previewWindowManager.show(mirroredFrame)
            else:
                self.previewWindowManager.show(self._frame)

        # Write to the image file, if any.
        if self.isWritingImage:
            cv2.imwrite(self._imageFilename, self._frame)
            self._imageFilename = None

        # Write to the video file, if any.
        self._writeVideoFrame()

        # Release the frame.
        self._frame = None
        self._enteredFrame = False
Note that the implementation of enterFrame() only grabs (synchronizes) a frame, whereas actual retrieval from a channel is postponed to a subsequent reading of the frame variable. The implementation of exitFrame() takes the image from the current channel, estimates a frame rate, shows the image via the window manager (if any), and fulfills any pending requests to write the image to files.

Several other methods also pertain to file writing. To finish our class implementation, let's add the remaining file-writing methods to managers.py:
    def writeImage(self, filename):
        """Write the next exited frame to an image file."""
        self._imageFilename = filename

    def startWritingVideo(
            self, filename,
            encoding=cv2.VideoWriter_fourcc('I', '4', '2', '0')):
        """Start writing exited frames to a video file."""
        self._videoFilename = filename
        self._videoEncoding = encoding

    def stopWritingVideo(self):
        """Stop writing exited frames to a video file."""
        self._videoFilename = None
        self._videoEncoding = None
        self._videoWriter = None

    def _writeVideoFrame(self):

        if not self.isWritingVideo:
            return

        if self._videoWriter is None:
            fps = self._capture.get(cv2.CAP_PROP_FPS)
            if fps == 0.0:
                # The capture's FPS is unknown so use an estimate.
                if self._framesElapsed < 20:
                    # Wait until more frames elapse so that the
                    # estimate is more stable.
                    return
                else:
                    fps = self._fpsEstimate
            size = (int(self._capture.get(
                        cv2.CAP_PROP_FRAME_WIDTH)),
                    int(self._capture.get(
                        cv2.CAP_PROP_FRAME_HEIGHT)))
            self._videoWriter = cv2.VideoWriter(
                self._videoFilename, self._videoEncoding,
                fps, size)

        self._videoWriter.write(self._frame)
The writeImage(), startWritingVideo(), and stopWritingVideo() public methods simply record the parameters for file-writing operations, whereas the actual writing operations are postponed to the next call of exitFrame(). The _writeVideoFrame() non-public method creates or appends to a video file in a manner that should be familiar from our earlier scripts. (See the Reading/writing a video file section.) However, in situations where the frame rate is unknown, we skip some frames at the start of the capture session so that we have time to build up an estimate of the frame rate.

Although our current implementation of CaptureManager relies on VideoCapture, we could make other implementations that do not use OpenCV for input. For example, we could make a subclass that is instantiated with a socket connection, whose byte stream could be parsed as a stream of images. We could also make a subclass that uses a third-party camera library with different hardware support than what OpenCV provides. However, for Cameo, our current implementation is sufficient.
Abstracting a window and keyboard with managers.WindowManager

As we have seen, OpenCV provides functions to create and destroy a window, show an image in it, and process events. Rather than being methods of a window class, these functions require a window's name to be passed as an argument. Since this interface is not object-oriented, it is inconsistent with OpenCV's general style. Also, it is unlikely to be compatible with other window or event handling interfaces that we might eventually want to use instead of OpenCV's.

For the sake of object orientation and adaptability, we abstract this functionality into a WindowManager class with the createWindow(), destroyWindow(), show(), and processEvents() methods. As a property, a WindowManager class has a function object called keypressCallback, which (if not None) is called from processEvents() in response to any keypress. The keypressCallback object must take a single argument, such as an ASCII keycode.

Let's add the following implementation of WindowManager to managers.py:
class WindowManager(object):

    def __init__(self, windowName, keypressCallback=None):
        self.keypressCallback = keypressCallback
        self._windowName = windowName
        self._isWindowCreated = False

    @property
    def isWindowCreated(self):
        return self._isWindowCreated

    def createWindow(self):
        cv2.namedWindow(self._windowName)
        self._isWindowCreated = True

    def show(self, frame):
        cv2.imshow(self._windowName, frame)

    def destroyWindow(self):
        cv2.destroyWindow(self._windowName)
        self._isWindowCreated = False

    def processEvents(self):
        keycode = cv2.waitKey(1)
        if self.keypressCallback is not None and keycode != -1:
            # Discard any non-ASCII info encoded by GTK.
            keycode &= 0xFF
            self.keypressCallback(keycode)
Our current implementation only supports keyboard events, which will be sufficient for Cameo. However, we could modify WindowManager to support mouse events too. For example, the class's interface could be expanded to include a mouseCallback property (and optional constructor argument), but could otherwise remain the same. With some event framework other than OpenCV's, we could support additional event types in the same way by adding callback properties.

Appendix A, Integrating with Pygame, OpenCV Computer Vision with Python, shows a WindowManager subclass that is implemented with Pygame's window handling and event framework instead of OpenCV's. This implementation improves on the base WindowManager class by properly handling quit events, for example, when a user clicks on a window's close button. Potentially, many other event types can be handled via Pygame too.

Applying everything with cameo.Cameo

Our application is represented by a Cameo class with two methods: run() and onKeypress(). On initialization, a Cameo class creates a WindowManager class with onKeypress() as a callback, as well as a CaptureManager class using a camera and the WindowManager class. When run() is called, the application executes a main loop in which frames and events are processed. As a result of event processing, onKeypress() may be called. The spacebar causes a screenshot to be taken, Tab causes a screencast (a video recording) to start/stop, and Esc causes the application to quit.

In the same directory as managers.py, let's create a file called cameo.py containing the following implementation of Cameo:
import cv2
from managers import WindowManager, CaptureManager


class Cameo(object):

    def __init__(self):
        self._windowManager = WindowManager('Cameo',
                                            self.onKeypress)
        self._captureManager = CaptureManager(
            cv2.VideoCapture(0), self._windowManager, True)

    def run(self):
        """Run the main loop."""
        self._windowManager.createWindow()
        while self._windowManager.isWindowCreated:
            self._captureManager.enterFrame()
            frame = self._captureManager.frame

            # TODO: Filter the frame (Chapter 3).

            self._captureManager.exitFrame()
            self._windowManager.processEvents()

    def onKeypress(self, keycode):
        """Handle a keypress.

        space  -> Take a screenshot.
        tab    -> Start/stop recording a screencast.
        escape -> Quit.
        """
        if keycode == 32:  # space
            self._captureManager.writeImage('screenshot.png')
        elif keycode == 9:  # tab
            if not self._captureManager.isWritingVideo:
                self._captureManager.startWritingVideo(
                    'screencast.avi')
            else:
                self._captureManager.stopWritingVideo()
        elif keycode == 27:  # escape
            self._windowManager.destroyWindow()


if __name__ == "__main__":
    Cameo().run()
When running the application, note that the live camera feed is mirrored, while screenshots and screencasts are not. This is the intended behavior, as we pass True for shouldMirrorPreview when initializing the CaptureManager class.

So far, we do not manipulate the frames in any way except to mirror them for preview. We will start to add more interesting effects in Chapter 3, Filtering Images.
Summary

By now, we should have an application that displays a camera feed, listens for keyboard input, and (on command) records a screenshot or screencast. We are ready to extend the application by inserting some image-filtering code (Chapter 3, Filtering Images) between the start and end of each frame. Optionally, we are also ready to integrate other camera drivers or application frameworks (Appendix A, Integrating with Pygame, OpenCV Computer Vision with Python) besides the ones supported by OpenCV.

We also now have the knowledge to process images and understand the principles of image manipulation through NumPy arrays. This forms the perfect foundation for understanding the next topic: filtering images.
Chapter 3. Processing Images with OpenCV 3

Sooner or later, when working with images, you will find yourself in need of altering images: be it applying artistic filters, extrapolating certain sections, cutting, pasting, or whatever else your mind can conjure. This chapter presents some techniques to alter images, and by the end of it, you should be able to perform tasks such as detecting skin tone in an image, sharpening an image, marking the contours of subjects, and detecting crosswalks using a line segment detector.
Converting between different color spaces

There are literally hundreds of methods in OpenCV that pertain to the conversion of color spaces. In general, three color spaces are prevalent in modern day computer vision: gray, BGR, and Hue, Saturation, Value (HSV).

Gray is a color space that effectively eliminates color information, translating to shades of gray: this color space is extremely useful for intermediate processing, such as face detection. BGR is the blue-green-red color space, in which each pixel is a three-element array, each value representing the blue, green, and red colors: web developers would be familiar with a similar definition of colors, except the order of colors is RGB. In HSV, hue is a color tone, saturation is the intensity of a color, and value represents its darkness (or brightness at the opposite end of the spectrum).
A quick note on BGR

When I first started dealing with the BGR color space, something wasn't adding up: the [0, 255, 255] value (no blue, full green, and full red) produces the color yellow. If you have an artistic background, you won't even need to pick up paints and brushes to witness green and red mix into a muddy shade of brown. That is because the color model used in computing is an additive one and deals with light. Light behaves differently from paints (which follow the subtractive color model), and, as software runs on computers whose medium is a monitor that emits light, the color model of reference is the additive one.
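We can check this with a one-pixel NumPy image: in BGR order, [0, 255, 255] is no blue plus full green plus full red, and reversing the channel axis yields the RGB order familiar to web developers:

```python
import numpy as np

# A 1 x 1 BGR image: zero blue, full green, full red.
yellow_bgr = np.array([[[0, 255, 255]]], dtype=np.uint8)

# Reversing the last (channel) axis converts BGR to RGB.
yellow_rgb = yellow_bgr[..., ::-1]
```

In RGB order, the pixel reads [255, 255, 0], which is the familiar web color #FFFF00: yellow, exactly as the additive model predicts.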
The Fourier Transform

Much of the processing you apply to images and videos in OpenCV involves the concept of the Fourier Transform in some capacity. Joseph Fourier was an 18th century French mathematician who discovered and popularized many mathematical concepts, and concentrated his work on studying the laws governing heat, and in mathematics, all things waveform. In particular, he observed that all waveforms are just the sum of simple sinusoids of different frequencies.

In other words, the waveforms you observe all around you are the sum of other waveforms. This concept is incredibly useful when manipulating images, because it allows us to identify regions in images where a signal (such as image pixels) changes a lot, and regions where the change is less dramatic. We can then arbitrarily mark these regions as noise or regions of interest, background or foreground, and so on. These are the frequencies that make up the original image, and we have the power to separate them to make sense of the image and extrapolate interesting data.

Note

In an OpenCV context, there are a number of algorithms implemented that enable us to process images and make sense of the data contained in them, and these are also reimplemented in NumPy to make our life even easier. NumPy has a Fast Fourier Transform (FFT) package, which contains the fft2() method. This method allows us to compute the Discrete Fourier Transform (DFT) of the image.

Let's examine the magnitude spectrum concept of an image using the Fourier Transform. The magnitude spectrum of an image is another image, which gives a representation of the original image in terms of its changes: think of it as taking an image and dragging all the brightest pixels to the center. Then, you gradually work your way out to the border where all the darkest pixels have been pushed. Immediately, you will be able to see how many light and dark pixels are contained in your image and the percentage of their distribution.
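As a concrete illustration of this description, the magnitude spectrum can be computed in a few lines with NumPy's fft package (the log scaling and the +1 offset are common display conventions, not requirements):

```python
import numpy as np

def magnitude_spectrum(gray_img):
    """Return a log-scaled, center-shifted magnitude spectrum of a 2D image."""
    f = np.fft.fft2(gray_img)            # 2D Discrete Fourier Transform
    fshift = np.fft.fftshift(f)          # move the zero-frequency term to the center
    return 20 * np.log(np.abs(fshift) + 1)  # log scale; +1 avoids log(0)
```

The result has the same shape as the input, with the lowest frequencies (the broad, slowly changing structure of the image) pulled to the center, and can be displayed like any grayscale image.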
The concept of the Fourier Transform is the basis of many algorithms used for common image processing operations, such as edge detection or line and shape detection.

Before examining these in detail, let's take a look at two concepts that, in conjunction with the Fourier Transform, form the foundation of the aforementioned processing operations: high pass filters and low pass filters.
High pass filter

A high pass filter (HPF) is a filter that examines a region of an image and boosts the intensity of certain pixels based on the difference in the intensity with the surrounding pixels.

Take, for example, the following kernel:
[[0, -0.25, 0],
 [-0.25, 1, -0.25],
 [0, -0.25, 0]]
Note

A kernel is a set of weights that are applied to a region in a source image to generate a single pixel in the destination image. For example, a ksize of 7 implies that 49 (7 x 7) source pixels are considered in generating each destination pixel. We can think of a kernel as a piece of frosted glass moving over the source image and letting through a diffused blend of the source's light.
After calculating the sum of the differences in intensity between the central pixel and all its immediate neighbors, the intensity of the central pixel will be boosted (or not) if a high level of change is found. In other words, if a pixel stands out from the surrounding pixels, it will get boosted.

This is particularly effective in edge detection, where a common form of HPF called a high boost filter is used.

Both high pass and low pass filters use a property called radius, which extends the area of the neighbors involved in the filter calculation.

Let's go through an example of an HPF:
import cv2
import numpy as np
from scipy import ndimage

kernel_3x3 = np.array([[-1, -1, -1],
                       [-1,  8, -1],
                       [-1, -1, -1]])

kernel_5x5 = np.array([[-1, -1, -1, -1, -1],
                       [-1,  1,  2,  1, -1],
                       [-1,  2,  4,  2, -1],
                       [-1,  1,  2,  1, -1],
                       [-1, -1, -1, -1, -1]])
Note

Note that both filters sum up to 0; the reason for this is explained in detail in the Edge detection section.
img = cv2.imread("../images/color1_small.jpg", 0)

k3 = ndimage.convolve(img, kernel_3x3)
k5 = ndimage.convolve(img, kernel_5x5)

blurred = cv2.GaussianBlur(img, (11, 11), 0)
g_hpf = img - blurred

cv2.imshow("3x3", k3)
cv2.imshow("5x5", k5)
cv2.imshow("g_hpf", g_hpf)
cv2.waitKey()
cv2.destroyAllWindows()
After the initial imports, we define a 3x3 kernel and a 5x5 kernel, and then we load the image in grayscale. Normally, the majority of image processing is done with NumPy; however, in this particular case, we want to "convolve" an image with a given kernel, and NumPy's convolve() function happens to only accept one-dimensional arrays.

This does not mean that the convolution of deep arrays can't be achieved with NumPy, just that it would be a bit complex. Instead, ndimage (which is a part of SciPy, so you should have it installed as per the instructions in Chapter 1, Setting Up OpenCV) makes this trivial through its convolve() function, which supports the classic NumPy arrays that the cv2 module uses to store images.

We apply two HPFs with the two convolution kernels we defined. Lastly, we also implement a differential method of obtaining an HPF by applying a low pass filter and calculating the difference with the original image. You will notice that the third method actually yields the best result, so let's also elaborate on low pass filters.
Low pass filter

If an HPF boosts the intensity of a pixel, given its difference with its neighbors, a low pass filter (LPF) will smoothen the pixel if the difference with the surrounding pixels is lower than a certain threshold. This is used in denoising and blurring. For example, one of the most popular blurring/smoothening filters, the Gaussian blur, is a low pass filter that attenuates the intensity of high frequency signals.
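The smoothing effect of an LPF is easiest to see on a one-dimensional signal. The following sketch uses a 3-tap averaging (box) kernel, the simplest low pass filter, with plain NumPy (the sample values are made up):

```python
import numpy as np

# A signal with a single high-frequency spike.
signal = np.array([0., 0., 0., 90., 0., 0., 0.])

# A 3-tap averaging (box) kernel: a very simple low pass filter.
box_kernel = np.ones(3) / 3

# Each output sample is the average of a sample and its two neighbors.
smoothed = np.convolve(signal, box_kernel, mode='same')
```

The spike of 90 is spread into three samples of 30: the total is preserved, but the sharp transition (a high frequency) is attenuated, which is exactly what blurring does in two dimensions.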
Creating modules

As in the case of our CaptureManager and WindowManager classes, our filters should be reusable outside Cameo. Thus, we should separate the filters into their own Python module or file.

Let's create a file called filters.py in the same directory as cameo.py. We need the following import statements in filters.py:
import cv2
import numpy
import utils
Let's also create a file called utils.py in the same directory. It should contain the following import statements:
import cv2
import numpy
import scipy.interpolate
We will be adding filter functions and classes to filters.py, while more general-purpose math functions will go in utils.py.
Edge detection

Edges play a major role in both human and computer vision. We, as humans, can easily recognize many object types and their pose just by seeing a backlit silhouette or a rough sketch. Indeed, when art emphasizes edges and poses, it often seems to convey the idea of an archetype, such as Rodin's The Thinker or Joe Shuster's Superman. Software, too, can reason about edges, poses, and archetypes. We will discuss these kinds of reasoning in later chapters.

OpenCV provides many edge-finding filters, including Laplacian(), Sobel(), and Scharr(). These filters are supposed to turn non-edge regions to black while turning edge regions to white or saturated colors. However, they are prone to misidentifying noise as edges. This flaw can be mitigated by blurring an image before trying to find its edges. OpenCV also provides many blurring filters, including blur() (simple average), medianBlur(), and GaussianBlur(). The arguments for the edge-finding and blurring filters vary but always include ksize, an odd whole number that represents the width and height (in pixels) of a filter's kernel.

For blurring, let's use medianBlur(), which is effective in removing digital video noise, especially in color images. For edge-finding, let's use Laplacian(), which produces bold edge lines, especially in grayscale images. After applying medianBlur(), but before applying Laplacian(), we should convert the image from BGR to grayscale.

Once we have the result of Laplacian(), we can invert it to get black edges on a white background. Then, we can normalize it (so that its values range from 0 to 1) and multiply it with the source image to darken the edges. Let's implement this approach in filters.py:
def strokeEdges(src, dst, blurKsize=7, edgeKsize=5):
    if blurKsize >= 3:
        blurredSrc = cv2.medianBlur(src, blurKsize)
        graySrc = cv2.cvtColor(blurredSrc, cv2.COLOR_BGR2GRAY)
    else:
        graySrc = cv2.cvtColor(src, cv2.COLOR_BGR2GRAY)
    cv2.Laplacian(graySrc, cv2.CV_8U, graySrc, ksize=edgeKsize)
    normalizedInverseAlpha = (1.0 / 255) * (255 - graySrc)
    channels = cv2.split(src)
    for channel in channels:
        channel[:] = channel * normalizedInverseAlpha
    cv2.merge(channels, dst)
Note that we allow kernel sizes to be specified as arguments for strokeEdges(). The blurKsize argument is used as ksize for medianBlur(), while edgeKsize is used as ksize for Laplacian(). With my webcams, I find that a blurKsize value of 7 and an edgeKsize value of 5 look best. Unfortunately, medianBlur() is expensive with a large ksize, such as 7.

Tip

If you encounter performance problems when running strokeEdges(), try decreasing the blurKsize value. To turn off blur, set it to a value less than 3.
Custom kernels – getting convoluted

As we have just seen, many of OpenCV's predefined filters use a kernel. Remember that a kernel is a set of weights, which determine how each output pixel is calculated from a neighborhood of input pixels. Another term for a kernel is a convolution matrix. It mixes up or convolves the pixels in a region. Similarly, a kernel-based filter may be called a convolution filter.

OpenCV provides a very versatile filter2D() function, which applies any kernel or convolution matrix that we specify. To understand how to use this function, let's first learn the format of a convolution matrix. It is a 2D array with an odd number of rows and columns. The central element corresponds to a pixel of interest and the other elements correspond to the neighbors of this pixel. Each element contains an integer or floating point value, which is a weight that gets applied to an input pixel's value. Consider this example:
kernel = numpy.array([[-1, -1, -1],
                      [-1,  9, -1],
                      [-1, -1, -1]])
Here, the pixel of interest has a weight of 9 and its immediate neighbors each have a weight of -1. For the pixel of interest, the output color will be nine times its input color minus the input colors of all eight adjacent pixels. If the pixel of interest is already a bit different from its neighbors, this difference becomes intensified. The effect is that the image looks sharper as the contrast between the neighbors is increased.
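We can verify this arithmetic by hand on a single 3 x 3 neighborhood (the sample pixel values are made up):

```python
import numpy as np

kernel = np.array([[-1, -1, -1],
                   [-1,  9, -1],
                   [-1, -1, -1]])

# A neighborhood whose central pixel (20) differs from its neighbors (10).
patch = np.array([[10, 10, 10],
                  [10, 20, 10],
                  [10, 10, 10]])

# The output value for the central pixel is the weighted sum:
# 9 * 20 - (eight neighbors * 10) = 180 - 80 = 100.
center_output = int(np.sum(kernel * patch))
```

The central pixel's value jumps from 20 to 100, so its modest difference from the neighborhood is greatly intensified: precisely the sharpening effect described above.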
Continuing our example, we can apply this convolution matrix to a source and destination image, respectively, as follows:
cv2.filter2D(src, -1, kernel, dst)
The second argument specifies the per-channel depth of the destination image (such as cv2.CV_8U for 8 bits per channel). A negative value (as used here) means that the destination image has the same depth as the source image.

Note

For color images, note that filter2D() applies the kernel equally to each channel. To use different kernels on different channels, we would also have to use the split() and merge() functions.
Based on this simple example, let's add two classes to filters.py. One class, VConvolutionFilter, will represent a convolution filter in general. A subclass, SharpenFilter, will represent our sharpening filter specifically. Let's edit filters.py to implement these two new classes as follows:
class VConvolutionFilter(object):
    """A filter that applies a convolution to V (or all of BGR)."""

    def __init__(self, kernel):
        self._kernel = kernel

    def apply(self, src, dst):
        """Apply the filter with a BGR or gray source/destination."""
        cv2.filter2D(src, -1, self._kernel, dst)


class SharpenFilter(VConvolutionFilter):
    """A sharpen filter with a 1-pixel radius."""

    def __init__(self):
        kernel = numpy.array([[-1, -1, -1],
                              [-1,  9, -1],
                              [-1, -1, -1]])
        VConvolutionFilter.__init__(self, kernel)
Note that the weights sum up to 1. This should be the case whenever we want to leave the image's overall brightness unchanged. If we modify a sharpening kernel slightly so that its weights sum up to 0 instead, we have an edge detection kernel that turns edges white and non-edges black. For example, let's add the following edge detection filter to filters.py:
class FindEdgesFilter(VConvolutionFilter):
    """An edge-finding filter with a 1-pixel radius."""

    def __init__(self):
        kernel = numpy.array([[-1, -1, -1],
                              [-1,  8, -1],
                              [-1, -1, -1]])
        VConvolutionFilter.__init__(self, kernel)
Next, let's make a blur filter. Generally, for a blur effect, the weights should sum up to 1 and should be positive throughout the neighborhood. For example, we can take a simple average of the neighborhood as follows:
class BlurFilter(VConvolutionFilter):
    """A blur filter with a 2-pixel radius."""

    def __init__(self):
        kernel = numpy.array([[0.04, 0.04, 0.04, 0.04, 0.04],
                              [0.04, 0.04, 0.04, 0.04, 0.04],
                              [0.04, 0.04, 0.04, 0.04, 0.04],
                              [0.04, 0.04, 0.04, 0.04, 0.04],
                              [0.04, 0.04, 0.04, 0.04, 0.04]])
        VConvolutionFilter.__init__(self, kernel)
Our sharpening, edge detection, and blur filters use kernels that are highly symmetric. Sometimes, though, kernels with less symmetry produce an interesting effect. Let's consider a kernel that blurs on one side (with positive weights) and sharpens on the other (with negative weights). It will produce a ridged or embossed effect. Here is an implementation that we can add to filters.py:
class EmbossFilter(VConvolutionFilter):
    """An emboss filter with a 1-pixel radius."""

    def __init__(self):
        kernel = numpy.array([[-2, -1, 0],
                              [-1,  1, 1],
                              [ 0,  1, 2]])
        VConvolutionFilter.__init__(self, kernel)
This set of custom convolution filters is very basic. Indeed, it is more basic than OpenCV's ready-made set of filters. However, with a bit of experimentation, you should be able to write your own kernels that produce a unique look.
Modifying the application

Now that we have high-level functions and classes for several filters, it is trivial to apply any of them to the captured frames in Cameo. Let's edit cameo.py and add the lines that appear in bold face in the following excerpt:
import cv2
import filters
from managers import WindowManager, CaptureManager

class Cameo(object):

    def __init__(self):
        self._windowManager = WindowManager('Cameo',
                                            self.onKeypress)
        self._captureManager = CaptureManager(
            cv2.VideoCapture(0), self._windowManager, True)
        self._curveFilter = filters.BGRPortraCurveFilter()

    def run(self):
        """Run the main loop."""
        self._windowManager.createWindow()
        while self._windowManager.isWindowCreated:
            self._captureManager.enterFrame()
            frame = self._captureManager.frame
            filters.strokeEdges(frame, frame)
            self._curveFilter.apply(frame, frame)
            self._captureManager.exitFrame()
            self._windowManager.processEvents()

    # ... The rest is the same as in Chapter 2.
Here, I have chosen to apply two effects: stroking the edges and emulating Portra film colors. Feel free to modify the code to apply any filters you like.

Here is a screenshot from Cameo, with stroked edges and Portra-like colors:
Edge detection with Canny

OpenCV also offers a very handy function called Canny (after the algorithm's inventor, John F. Canny), which is very popular not only because of its effectiveness, but also because of the simplicity of its implementation in an OpenCV program, as it is a one-liner:
import cv2
import numpy as np

img = cv2.imread("../images/statue_small.jpg", 0)
cv2.imwrite("canny.jpg", cv2.Canny(img, 200, 300))
cv2.imshow("canny", cv2.imread("canny.jpg"))
cv2.waitKey()
cv2.destroyAllWindows()
The result is a very clear identification of the edges:

The Canny edge detection algorithm is quite complex but also interesting: it's a five-step process that denoises the image with a Gaussian filter, calculates gradients, applies non-maximum suppression (NMS) on edges, applies a double threshold on all the detected edges to eliminate false positives, and, lastly, analyzes all the edges and their connections to each other to keep the real edges and discard the weak ones.
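To make the double-threshold stage more concrete, here is a small NumPy sketch (my own illustration, not OpenCV's actual implementation) that classifies gradient magnitudes into strong edges, weak edges, and suppressed pixels. This is the role played by the two thresholds we passed to cv2.Canny (200 and 300 in the preceding script):

```python
import numpy as np

def double_threshold(gradient, low, high):
    # 2 = strong edge (always kept), 1 = weak edge (kept later only if
    # connected to a strong edge), 0 = suppressed.
    strong = gradient >= high
    weak = (gradient >= low) & ~strong
    return strong.astype(np.uint8) * 2 + weak.astype(np.uint8)

g = np.array([[ 10, 250,  90],
              [  5, 300,  20],
              [120,  80, 310]], dtype=np.float32)
print(double_threshold(g, low=100, high=200))
```

The final hysteresis step then keeps a weak edge only if it touches a strong one, which is how false positives get discarded.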
Contour detection

Another vital task in computer vision is contour detection, not only because of the obvious aspect of detecting contours of subjects contained in an image or video frame, but because of the derivative operations connected with identifying contours.

These operations are, namely, computing bounding polygons, approximating shapes, and generally calculating regions of interest, which considerably simplify interaction with image data, because a rectangular region with NumPy is easily defined with an array slice. We will be using this technique a lot when exploring the concept of object detection (including faces) and object tracking.

Let's go in order and familiarize ourselves with the API first with an example:
import cv2
import numpy as np

img = np.zeros((200, 200), dtype=np.uint8)
img[50:150, 50:150] = 255

ret, thresh = cv2.threshold(img, 127, 255, 0)
image, contours, hierarchy = cv2.findContours(thresh, cv2.RETR_TREE,
                                              cv2.CHAIN_APPROX_SIMPLE)
color = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR)
img = cv2.drawContours(color, contours, -1, (0, 255, 0), 2)
cv2.imshow("contours", color)
cv2.waitKey()
cv2.destroyAllWindows()
Firstly, we create an empty black image that is 200x200 pixels in size. Then, we place a white square in the center of it, utilizing ndarray's ability to assign values on a slice.

We then threshold the image, and call the findContours() function. This function takes three parameters: the input image, the hierarchy type, and the contour approximation method. There are a number of aspects that are of particular interest in this function:

The function modifies the input image, so it would be advisable to use a copy of the original image (for example, by passing img.copy()). Secondly, the hierarchy tree returned by the function is quite important: cv2.RETR_TREE will retrieve the entire hierarchy of contours in the image, enabling you to establish "relationships" between contours. If you only want to retrieve the most external contours, use cv2.RETR_EXTERNAL. This is particularly useful when you want to eliminate contours that are entirely contained in other contours (for example, in a vast majority of cases, you won't need to detect an object within another object of the same type).

The findContours function returns three elements: the modified image, the contours, and their hierarchy. We use the contours to draw on the color version of the image (so that we can draw contours in green) and eventually display it.

The result is a white square with its contour drawn in green. Spartan, but effective in demonstrating the concept! Let's move on to more meaningful examples.
Contours – bounding box, minimum area rectangle, and minimum enclosing circle

Finding the contours of a square is a simple task; irregular, skewed, and rotated shapes bring out the best in the cv2.findContours utility function of OpenCV. Let's take a look at the following image:

In a real-life application, we would be most interested in determining the bounding box of the subject, its minimum enclosing rectangle, and its enclosing circle. The cv2.findContours function, in conjunction with a few other OpenCV utilities, makes this very easy to accomplish:
import cv2
import numpy as np

img = cv2.pyrDown(cv2.imread("hammer.jpg", cv2.IMREAD_UNCHANGED))

ret, thresh = cv2.threshold(cv2.cvtColor(img.copy(), cv2.COLOR_BGR2GRAY),
                            127, 255, cv2.THRESH_BINARY)
image, contours, hier = cv2.findContours(thresh, cv2.RETR_EXTERNAL,
                                         cv2.CHAIN_APPROX_SIMPLE)

for c in contours:
    # find bounding box coordinates
    x, y, w, h = cv2.boundingRect(c)
    cv2.rectangle(img, (x, y), (x+w, y+h), (0, 255, 0), 2)

    # find minimum area
    rect = cv2.minAreaRect(c)
    # calculate coordinates of the minimum area rectangle
    box = cv2.boxPoints(rect)
    # normalize coordinates to integers
    box = np.int0(box)
    # draw contours
    cv2.drawContours(img, [box], 0, (0, 0, 255), 3)

    # calculate center and radius of minimum enclosing circle
    (x, y), radius = cv2.minEnclosingCircle(c)
    # cast to integers
    center = (int(x), int(y))
    radius = int(radius)
    # draw the circle
    img = cv2.circle(img, center, radius, (0, 255, 0), 2)

cv2.drawContours(img, contours, -1, (255, 0, 0), 1)
cv2.imshow("contours", img)
cv2.waitKey()
cv2.destroyAllWindows()
After the initial imports, we load the image, and then apply a binary threshold on a grayscale version of the original image. By doing this, we operate all find-contour calculations on a grayscale copy, but we draw on the original so that we can utilize color information.
Firstly, let's calculate a simple bounding box:

x, y, w, h = cv2.boundingRect(c)

This is a pretty straightforward conversion of contour information to the (x, y) coordinates, plus the height and width of the rectangle. Drawing this rectangle is an easy task and can be done using this code:

cv2.rectangle(img, (x, y), (x+w, y+h), (0, 255, 0), 2)
Secondly, let's calculate the minimum area rectangle enclosing the subject:

rect = cv2.minAreaRect(c)
box = cv2.boxPoints(rect)
box = np.int0(box)

The mechanism used here is particularly interesting: OpenCV does not have a function to calculate the coordinates of the minimum rectangle's vertices directly from the contour information. Instead, we calculate the minimum area rectangle, and then calculate the vertices of this rectangle. Note that the calculated vertices are floats, but pixels are accessed with integers (you can't access a "portion" of a pixel), so we need to operate this conversion. Next, we draw the box, which gives us the perfect opportunity to introduce the cv2.drawContours function:

cv2.drawContours(img, [box], 0, (0, 0, 255), 3)
Firstly, this function (like all drawing functions) modifies the original image. Secondly, it takes an array of contours in its second parameter, so you can draw a number of contours in a single operation. Therefore, if you have a single set of points representing a contour polygon, you need to wrap these points into an array, exactly like we did with our box in the preceding example. The third parameter of this function specifies the index of the contours array that we want to draw: a value of -1 will draw all the contours; otherwise, the contour at the specified index in the contours array (the second parameter) will be drawn.

Most drawing functions take the color of the drawing and its thickness as the last two parameters.
The last bounding contour we're going to examine is the minimum enclosing circle:

(x, y), radius = cv2.minEnclosingCircle(c)
center = (int(x), int(y))
radius = int(radius)
img = cv2.circle(img, center, radius, (0, 255, 0), 2)

The only peculiarity of the cv2.minEnclosingCircle function is that it returns a two-element tuple, of which the first element is a tuple itself, representing the coordinates of the circle's center, and the second element is the radius of this circle. After converting all these values to integers, drawing the circle is quite a trivial operation.
The final result on the original image looks like this:
Contours – convex contours and the Douglas-Peucker algorithm

Most of the time, when working with contours, subjects will have the most diverse shapes, including convex ones. A convex shape is one where there are no two points within the shape whose connecting line goes outside the perimeter of the shape itself.
The first facility that OpenCV offers to calculate the approximate bounding polygon of a shape is cv2.approxPolyDP. This function takes three parameters:

- A contour
- An epsilon value representing the maximum discrepancy between the original contour and the approximated polygon (the lower the value, the closer the approximated polygon will be to the original contour)
- A Boolean flag signifying whether the polygon is closed

The epsilon value is of vital importance to obtain a useful contour, so let's understand what it represents. An epsilon is the maximum difference between the approximated polygon's perimeter and the original contour's perimeter. The lower this difference is, the more similar the approximated polygon will be to the original contour.

You may ask yourself why we need an approximate polygon when we have a contour that is already a precise representation. The answer is that a polygon is a set of straight lines, and the ability to define polygons in a region for further manipulation and processing is paramount in many computer vision tasks.
Now that we know what an epsilon is, we need to obtain the contour perimeter information as a reference value. This is obtained with the cv2.arcLength function of OpenCV:

epsilon = 0.01 * cv2.arcLength(cnt, True)
approx = cv2.approxPolyDP(cnt, epsilon, True)
Effectively, we're instructing OpenCV to calculate an approximated polygon whose perimeter can only differ from the original contour's perimeter by an epsilon ratio.
OpenCV also offers a cv2.convexHull function to obtain processed contour information for convex shapes, and this is a straightforward one-line expression:

hull = cv2.convexHull(cnt)

Let's combine the original contour, the approximated polygon contour, and the convex hull in one image to observe the differences between them. To simplify things, I've applied the contours to a black image, so that the original subject is not visible but its contours are:

As you can see, the convex hull surrounds the entire subject, the approximated polygon is the innermost polygonal shape, and in between the two is the original contour, mainly composed of arcs.
Line and circle detection

Detecting edges and contours are not only common and important tasks, they also constitute the basis for other complex operations. Line and shape detection go hand in hand with edge and contour detection, so let's examine how OpenCV implements these.

The theory behind line and shape detection has its foundation in a technique called the Hough transform, invented by Richard Duda and Peter Hart, who extended (generalized) the work done by Paul Hough in the early 1960s.

Let's take a look at OpenCV's API for the Hough transforms.
Line detection

First of all, let's detect some lines, which is done with the HoughLines and HoughLinesP functions. The only difference between the two functions is that one uses the standard Hough transform, and the second uses the probabilistic Hough transform (hence the P in the name).

The probabilistic version is so called because it only analyzes a subset of the image's points and estimates the probability of these points all belonging to the same line. This implementation is an optimized version of the standard Hough transform; in this case, it's less computationally intensive and executes faster.
Let's take a look at a very simple example:

import cv2
import numpy as np

img = cv2.imread('lines.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, 50, 120)

minLineLength = 20
maxLineGap = 5
lines = cv2.HoughLinesP(edges, 1, np.pi/180, 100,
                        minLineLength=minLineLength, maxLineGap=maxLineGap)
for line in lines:
    # In OpenCV 3, each element of lines wraps one segment's endpoints.
    x1, y1, x2, y2 = line[0]
    cv2.line(img, (x1, y1), (x2, y2), (0, 255, 0), 2)

cv2.imshow("edges", edges)
cv2.imshow("lines", img)
cv2.waitKey()
cv2.destroyAllWindows()

Note that minLineLength and maxLineGap must be passed as keyword arguments; positionally, the fifth parameter of HoughLinesP is an optional output array, not minLineLength.
The crucial point of this simple script (aside from the HoughLinesP function call) is the setting of the minimum line length (shorter lines will be discarded) and the maximum line gap, which is the maximum size of a gap in a line before the two segments start being considered as separate lines.

Also note that the HoughLinesP function takes a single-channel binary image, processed through the Canny edge detection filter. Canny is not a strict requirement; however, an image that's been denoised and only represents edges is the ideal source for a Hough transform, so you will find this to be a common practice.
The parameters of HoughLinesP are as follows:

- The image we want to process.
- The geometrical representations of the lines, rho and theta, which are usually 1 and np.pi/180.
- The threshold, which represents the threshold below which a line is discarded. The Hough transform works with a system of bins and votes, with each bin representing a line, so any line with a minimum of the <threshold> votes is retained, and the rest are discarded.
- minLineLength and maxLineGap, which we mentioned previously.
Circle detection

OpenCV also has a function for detecting circles, called HoughCircles. It works in a very similar fashion to HoughLines, but where minLineLength and maxLineGap were the parameters to discard or retain lines, HoughCircles has a minimum distance between the circles' centers, and minimum and maximum radii for the circles. Here's the obligatory example:
import cv2
import numpy as np

planets = cv2.imread('planet_glow.jpg')
gray_img = cv2.cvtColor(planets, cv2.COLOR_BGR2GRAY)
img = cv2.medianBlur(gray_img, 5)
cimg = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR)

circles = cv2.HoughCircles(img, cv2.HOUGH_GRADIENT, 1, 120,
                           param1=100, param2=30, minRadius=0, maxRadius=0)
circles = np.uint16(np.around(circles))

for i in circles[0, :]:
    # draw the outer circle
    cv2.circle(planets, (i[0], i[1]), i[2], (0, 255, 0), 2)
    # draw the center of the circle
    cv2.circle(planets, (i[0], i[1]), 2, (0, 0, 255), 3)

cv2.imwrite("planets_circles.jpg", planets)
cv2.imshow("HoughCircles", planets)
cv2.waitKey()
cv2.destroyAllWindows()
Here's a visual representation of the result:
Detecting shapes

The detection of shapes with the Hough transform is limited to circles; however, we have already implicitly explored detecting shapes of any kind, specifically when we talked about approxPolyDP. This function allows the approximation of polygons, so if your image contains polygons, they will be quite accurately detected by combining the usage of cv2.findContours and cv2.approxPolyDP.
Summary

At this point, you should have gained a good understanding of color spaces, the Fourier Transform, and the several kinds of filters made available by OpenCV to process images.

You should also be proficient in detecting edges, lines, circles, and shapes in general. Additionally, you should be able to find contours and exploit the information they provide about the subjects contained in an image. These concepts will serve as the ideal background for exploring the topics in the next chapter.
Chapter 4. Depth Estimation and Segmentation

This chapter shows you how to use data from a depth camera to identify foreground and background regions, so that we can limit an effect to only the foreground or only the background. As prerequisites, we need a depth camera, such as Microsoft Kinect, and we need to build OpenCV with support for our depth camera. For build instructions, see Chapter 1, Setting Up OpenCV.

We'll deal with two main topics in this chapter: depth estimation and segmentation. We will explore depth estimation with two distinct approaches: firstly, by using a depth camera (a prerequisite of the first part of the chapter), such as Microsoft Kinect, and then by using stereo images, for which a normal camera will suffice. For instructions on how to build OpenCV with support for depth cameras, see Chapter 1, Setting Up OpenCV. The second part of the chapter is about segmentation, the technique that allows us to extract foreground objects from an image.
Creating modules

The code to capture and manipulate depth-camera data will be reusable outside Cameo.py. So, we should separate it into a new module. Let's create a file called depth.py in the same directory as Cameo.py. We need the following import statement in depth.py:

import numpy

We will also need to modify our preexisting rects.py file so that our copy operations can be limited to a nonrectangular subregion of a rectangle. To support the changes we are going to make, let's add the following import statements to rects.py:

import numpy
import utils

Finally, the new version of our application will use depth-related functionalities. So, let's add the following import statement to Cameo.py:

import depth

Now, let's go deeper into the subject of depth.
Capturing frames from a depth camera

Back in Chapter 2, Handling Files, Cameras, and GUIs, we discussed the concept that a computer can have multiple video capture devices and each device can have multiple channels. Suppose a given device is a stereo camera. Each channel might correspond to a different lens and sensor. Also, each channel might correspond to different kinds of data, such as a normal color image versus a depth map. The C++ version of OpenCV defines some constants for the identifiers of certain devices and channels. However, these constants are not defined in the Python version.
To remedy this situation, let's add the following definitions in depth.py:
# Devices.
CAP_OPENNI = 900  # OpenNI (for Microsoft Kinect)
CAP_OPENNI_ASUS = 910  # OpenNI (for Asus Xtion)

# Channels of an OpenNI-compatible depth generator.
CAP_OPENNI_DEPTH_MAP = 0  # Depth values in mm (16UC1)
CAP_OPENNI_POINT_CLOUD_MAP = 1  # XYZ in meters (32FC3)
CAP_OPENNI_DISPARITY_MAP = 2  # Disparity in pixels (8UC1)
CAP_OPENNI_DISPARITY_MAP_32F = 3  # Disparity in pixels (32FC1)
CAP_OPENNI_VALID_DEPTH_MASK = 4  # 8UC1

# Channels of an OpenNI-compatible RGB image generator.
CAP_OPENNI_BGR_IMAGE = 5
CAP_OPENNI_GRAY_IMAGE = 6
The depth-related channels require some explanation, as given in the following list:
- A depth map is a grayscale image in which each pixel value is the estimated distance from the camera to a surface. Specifically, an image from the CAP_OPENNI_DEPTH_MAP channel gives the distance as a floating-point number of millimeters.
- A point cloud map is a color image in which each color corresponds to an (x, y, or z) spatial dimension. Specifically, the CAP_OPENNI_POINT_CLOUD_MAP channel yields a BGR image, where B is x (blue is right), G is y (green is up), and R is z (red is deep), from the camera's perspective. The values are in meters.
- A disparity map is a grayscale image in which each pixel value is the stereo disparity of a surface. To conceptualize stereo disparity, let's suppose we overlay two images of a scene, shot from different viewpoints. The result would be similar to seeing double images. For points on any pair of twin objects in the scene, we can measure the distance in pixels. This measurement is the stereo disparity. Nearby objects exhibit greater stereo disparity than far-off objects. Thus, nearby objects appear brighter in a disparity map.
- A valid depth mask shows whether the depth information at a given pixel is believed to be valid (shown by a nonzero value) or invalid (shown by a value of zero). For example, if the depth camera depends on an infrared illuminator (an infrared flash), depth information is invalid in regions that are occluded (shadowed) from this light.
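Disparity and depth are inversely related: disparity is approximately the focal length (in pixels) times the stereo baseline, divided by the depth. The following NumPy sketch uses made-up focal length and baseline values (these vary per device and calibration) just to show why nearby surfaces appear brighter in a disparity map:

```python
import numpy as np

def depth_to_disparity(depth_mm, focal_px=525.0, baseline_mm=75.0):
    # focal_px and baseline_mm are illustrative placeholders, not values
    # for any specific camera. Pixels with depth 0 are treated as invalid.
    depth_mm = np.asarray(depth_mm, dtype=np.float64)
    disparity = np.zeros_like(depth_mm)
    valid = depth_mm > 0
    disparity[valid] = focal_px * baseline_mm / depth_mm[valid]
    return disparity

near, far = depth_to_disparity([500.0, 4000.0])
print(near, far)  # the nearby surface gets the much larger disparity
```

A surface at half a meter yields a disparity several times larger than one at four meters, which is exactly the bright-near, dark-far appearance described above.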
The following screenshot shows a point cloud map of a man sitting behind a sculpture of a cat:

The following screenshot shows a disparity map of a man sitting behind a sculpture of a cat:

A valid depth mask of a man sitting behind a sculpture of a cat is shown in the following screenshot:
Creating a mask from a disparity map

For the purposes of Cameo, we are interested in disparity maps and valid depth masks. They can help us refine our estimates of facial regions.

Using the FaceTracker function and a normal color image, we can obtain rectangular estimates of facial regions. By analyzing such a rectangular region in the corresponding disparity map, we can tell that some pixels within the rectangle are outliers: too near or too far to really be a part of the face. We can refine the facial region to exclude these outliers. However, we should only apply this test where the data is valid, as indicated by the valid depth mask.

Let's write a function to generate a mask whose values are 0 for the rejected regions of the facial rectangle and 1 for the accepted regions. This function should take a disparity map, a valid depth mask, and a rectangle as arguments. We can implement it in depth.py as follows:
def createMedianMask(disparityMap, validDepthMask, rect=None):
    """Return a mask selecting the median layer, plus shadows."""
    if rect is not None:
        x, y, w, h = rect
        disparityMap = disparityMap[y:y+h, x:x+w]
        validDepthMask = validDepthMask[y:y+h, x:x+w]
    median = numpy.median(disparityMap)
    return numpy.where((validDepthMask == 0) | \
                       (abs(disparityMap - median) < 12),
                       1.0, 0.0)
To identify outliers in the disparity map, we first find the median using numpy.median(), which takes an array as an argument. If the array is of an odd length, median() returns the value that would lie in the middle of the array if the array were sorted. If the array is of an even length, median() returns the average of the two values that would be sorted nearest to the middle of the array.

To generate a mask based on per-pixel Boolean operations, we use numpy.where() with three arguments. In the first argument, where() takes an array whose elements are evaluated for truth or falsity. An output array of like dimensions is returned. Wherever an element in the input array is true, the where() function's second argument is assigned to the corresponding element in the output array. Conversely, wherever an element in the input array is false, the where() function's third argument is assigned to the corresponding element in the output array.

Our implementation treats a pixel as an outlier when it has a valid disparity value that deviates from the median disparity value by 12 or more. I've chosen the value of 12 just by experimentation. Feel free to tweak this value later, based on the results you encounter when running Cameo with your particular camera setup.
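Here is a small standalone NumPy sketch (with made-up disparity values) of the same median-plus-where logic, so you can see what the function computes on concrete data:

```python
import numpy as np

disparity = np.array([[40.0, 42.0, 41.0],
                      [43.0, 90.0, 40.0],   # 90.0 deviates far from the median
                      [41.0,  5.0, 42.0]])
valid = np.array([[1, 1, 1],
                  [1, 1, 1],
                  [1, 0, 1]], dtype=np.uint8)  # the 5.0 reading is invalid

median = np.median(disparity)
mask = np.where((valid == 0) | (abs(disparity - median) < 12), 1.0, 0.0)
print(median)  # 41.0
print(mask)    # only the valid outlier (90.0) is rejected
```

Note that invalid pixels are accepted (mask value 1.0), matching the function's docstring: shadowed regions are kept rather than rejected.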
Masking a copy operation

As part of the previous chapter's work, we wrote copyRect() as a copy operation that limits itself to the given rectangles of a source and destination image. Now, we want to apply further limits to this copy operation. We want to use a given mask that has the same dimensions as the source rectangle.

We shall copy only those pixels in the source rectangle where the mask's value is not zero. Other pixels shall retain their old values from the destination image. This logic, with an array of conditions and two arrays of possible output values, can be expressed concisely with the numpy.where() function that we have recently learned about.

Let's open rects.py and edit copyRect() to add a new mask argument. This argument may be None, in which case we fall back to our old implementation of the copy operation. Otherwise, we next ensure that mask and the images have the same number of channels. We assume that mask has one channel but the images may have three channels (BGR). We can add duplicate channels to mask using the repeat() and reshape() methods of numpy.array.

Finally, we perform the copy operation using where(). The complete implementation is as follows:
def copyRect(src, dst, srcRect, dstRect, mask=None,
             interpolation=cv2.INTER_LINEAR):
    """Copy part of the source to part of the destination."""

    x0, y0, w0, h0 = srcRect
    x1, y1, w1, h1 = dstRect

    # Resize the contents of the source sub-rectangle.
    # Put the result in the destination sub-rectangle.
    if mask is None:
        dst[y1:y1+h1, x1:x1+w1] = \
            cv2.resize(src[y0:y0+h0, x0:x0+w0], (w1, h1),
                       interpolation=interpolation)
    else:
        if not utils.isGray(src):
            # Convert the mask to 3 channels, like the image.
            mask = mask.repeat(3).reshape(h0, w0, 3)
        # Perform the copy, with the mask applied.
        dst[y1:y1+h1, x1:x1+w1] = \
            numpy.where(cv2.resize(mask, (w1, h1),
                                   interpolation=cv2.INTER_NEAREST),
                        cv2.resize(src[y0:y0+h0, x0:x0+w0], (w1, h1),
                                   interpolation=interpolation),
                        dst[y1:y1+h1, x1:x1+w1])
We also need to modify our swapRects() function, which uses copyRect() to perform a circular swap of a list of rectangular regions. The modifications to swapRects() are quite simple. We just need to add a new masks argument, which is a list of masks whose elements are passed to the respective copyRect() calls. If the value of the given masks argument is None, we pass None to every copyRect() call.

The following code shows the full implementation of this:
def swapRects(src, dst, rects, masks=None,
              interpolation=cv2.INTER_LINEAR):
    """Copy the source with two or more sub-rectangles swapped."""

    if dst is not src:
        dst[:] = src

    numRects = len(rects)
    if numRects < 2:
        return

    if masks is None:
        masks = [None] * numRects

    # Copy the contents of the last rectangle into temporary storage.
    x, y, w, h = rects[numRects - 1]
    temp = src[y:y+h, x:x+w].copy()

    # Copy the contents of each rectangle into the next.
    i = numRects - 2
    while i >= 0:
        copyRect(src, dst, rects[i], rects[i+1], masks[i],
                 interpolation)
        i -= 1

    # Copy the temporarily stored content into the first rectangle.
    copyRect(temp, dst, (0, 0, w, h), rects[0], masks[numRects - 1],
             interpolation)
Note that the mask argument in copyRect() and the masks argument in swapRects() both default to None. Thus, our new versions of these functions are backward compatible with our previous versions of Cameo.
Depth estimation with a normal camera

A depth camera is a fantastic little device to capture images and estimate the distance of objects from the camera itself, but how does the depth camera retrieve depth information? Also, is it possible to reproduce the same kind of calculations with a normal camera?

A depth camera, such as Microsoft Kinect, uses a traditional camera combined with an infrared sensor that helps the camera differentiate similar objects and calculate their distance from the camera. However, not everybody has access to a depth camera or a Kinect, and especially when you're just learning OpenCV, you're probably not going to invest in an expensive piece of equipment until you feel your skills are well sharpened and your interest in the subject is confirmed.

Our setup includes a simple camera, which is most likely integrated in our machine, or a webcam attached to our computer. So, we need to resort to less fancy means of estimating the difference in distance of objects from the camera.

Geometry will come to the rescue in this case; in particular, epipolar geometry, which is the geometry of stereo vision. Stereo vision is a branch of computer vision that extracts three-dimensional information out of two different images of the same subject.
How does epipolar geometry work? Conceptually, it traces imaginary lines from the camera to each object in the image, then does the same on the second image, and calculates the distance to objects based on the intersection of the lines corresponding to the same object. Here is a representation of this concept:

Let's see how OpenCV applies epipolar geometry to calculate a so-called disparity map, which is basically a representation of the different depths detected in the images. This will enable us to extract the foreground of a picture and discard the rest.

Firstly, we need two images of the same subject taken from different points of view, paying attention to the fact that the pictures are taken at an equal distance from the object; otherwise, the calculations will fail and the disparity map will be meaningless.

So, moving on to an example:
import numpy as np
import cv2

def update(val=0):
    # disparity range is tuned for 'aloe' image pair
    stereo.setBlockSize(cv2.getTrackbarPos('window_size', 'disparity'))
    stereo.setUniquenessRatio(cv2.getTrackbarPos('uniquenessRatio',
                                                 'disparity'))
    stereo.setSpeckleWindowSize(cv2.getTrackbarPos('speckleWindowSize',
                                                   'disparity'))
    stereo.setSpeckleRange(cv2.getTrackbarPos('speckleRange', 'disparity'))
    stereo.setDisp12MaxDiff(cv2.getTrackbarPos('disp12MaxDiff',
                                               'disparity'))

    print('computing disparity...')
    disp = stereo.compute(imgL, imgR).astype(np.float32) / 16.0

    cv2.imshow('left', imgL)
    cv2.imshow('disparity', (disp - min_disp) / num_disp)

if __name__ == "__main__":
    window_size = 5
    min_disp = 16
    num_disp = 192 - min_disp
    blockSize = window_size
    uniquenessRatio = 1
    speckleRange = 3
    speckleWindowSize = 3
    disp12MaxDiff = 200
    P1 = 600
    P2 = 2400
    imgL = cv2.imread('images/color1_small.jpg')
    imgR = cv2.imread('images/color2_small.jpg')
    cv2.namedWindow('disparity')
    cv2.createTrackbar('speckleRange', 'disparity', speckleRange, 50,
                       update)
    cv2.createTrackbar('window_size', 'disparity', window_size, 21, update)
    cv2.createTrackbar('speckleWindowSize', 'disparity', speckleWindowSize,
                       200, update)
    cv2.createTrackbar('uniquenessRatio', 'disparity', uniquenessRatio, 50,
                       update)
    cv2.createTrackbar('disp12MaxDiff', 'disparity', disp12MaxDiff, 250,
                       update)
    stereo = cv2.StereoSGBM_create(
        minDisparity=min_disp,
        numDisparities=num_disp,
        blockSize=window_size,
        uniquenessRatio=uniquenessRatio,
        speckleRange=speckleRange,
        speckleWindowSize=speckleWindowSize,
        disp12MaxDiff=disp12MaxDiff,
        P1=P1,
        P2=P2
    )
    update()
    cv2.waitKey()
In this example, we take two images of the same subject and calculate a disparity map, showing in brighter colors the points in the map that are closer to the camera. The areas marked in black represent the disparities.

First of all, we import numpy and cv2 as usual.

Let's skip the definition of the update function for a second and take a look at the main code; the process is quite simple: load two images, create a StereoSGBM instance (StereoSGBM stands for semiglobal block matching, an algorithm used for computing disparity maps), create a few trackbars to play around with the parameters of the algorithm, and call the update function.

The update function applies the trackbar values to the StereoSGBM instance, and then calls the compute method, which produces a disparity map. All in all, pretty simple! Here is the first image I've used:

This is the second one:

There you go: a nice and quite easy to interpret disparity map.

The parameters used by StereoSGBM are as follows (taken from the OpenCV documentation):
minDisparity: This parameter refers to the minimum possible disparity value. Normally, it is zero, but sometimes rectification algorithms can shift images, so this parameter needs to be adjusted accordingly.

numDisparities: This parameter refers to the maximum disparity minus the minimum disparity. The resultant value is always greater than zero. In the current implementation, this parameter must be divisible by 16.

windowSize: This parameter refers to the matched block size. It must be an odd number greater than or equal to 1. Normally, it should be somewhere in the 3-11 range.

P1: This parameter refers to the first parameter controlling the disparity smoothness. See the description of P2.

P2: This parameter refers to the second parameter that controls the disparity smoothness. The larger the values are, the smoother the disparity is. P1 is the penalty on the disparity change by plus or minus 1 between neighbor pixels. P2 is the penalty on the disparity change by more than 1 between neighbor pixels. The algorithm requires P2 > P1. See the stereo_match.cpp sample, where some reasonably good P1 and P2 values are shown (such as 8*number_of_image_channels*windowSize*windowSize and 32*number_of_image_channels*windowSize*windowSize, respectively).

disp12MaxDiff: This parameter refers to the maximum allowed difference (in integer pixel units) in the left-right disparity check. Set it to a nonpositive value to disable the check.

preFilterCap: This parameter refers to the truncation value for prefiltered image pixels. The algorithm first computes the x-derivative at each pixel and clips its value to the [-preFilterCap, preFilterCap] interval. The resultant values are passed to the Birchfield-Tomasi pixel cost function.

uniquenessRatio: This parameter refers to the margin, in percentage, by which the best (minimum) computed cost function value should "win" over the second best value for the found match to be considered correct. Normally, a value within the 5-15 range is good enough.

speckleWindowSize: This parameter refers to the maximum size of smooth disparity regions whose speckles will be considered noise and invalidated. Set it to 0 to disable speckle filtering. Otherwise, set it somewhere in the 50-200 range.

speckleRange: This parameter refers to the maximum disparity variation within each connected component. If you do speckle filtering, set the parameter to a positive value; it will be implicitly multiplied by 16. Normally, 1 or 2 is good enough.
With the preceding script, you'll be able to load the images and play around with the parameters until you're happy with the disparity map generated by StereoSGBM.
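The P1=600 and P2=2400 values hardcoded in the preceding script follow the rule of thumb from OpenCV's stereo_match.cpp sample quoted in the P2 description above; a quick check:

```python
# Rule-of-thumb smoothness penalties, per OpenCV's stereo_match.cpp sample.
channels = 3      # assuming a BGR image pair
window_size = 5   # the blockSize used in the preceding script

P1 = 8 * channels * window_size ** 2    # penalty for a disparity step of 1
P2 = 32 * channels * window_size ** 2   # penalty for a larger disparity step
print(P1, P2)  # 600 2400, matching the script's hardcoded values
```

If you change the block size via the trackbar, you may want to recompute P1 and P2 accordingly, since they scale with the square of the window size.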
Object segmentation using the Watershed and GrabCut algorithms

Calculating a disparity map can be very useful to detect the foreground of an image, but StereoSGBM is not the only algorithm available to accomplish this; in fact, StereoSGBM is more about gathering 3D information from 2D pictures than anything else. GrabCut, however, is a perfect tool for this purpose. The GrabCut algorithm follows a precise sequence of steps:
1. A rectangle including the subject(s) of the picture is defined.
2. The area lying outside the rectangle is automatically defined as background.
3. The data contained in the background is used as a reference to distinguish background areas from foreground areas within the user-defined rectangle.
4. A Gaussian Mixture Model (GMM) models the foreground and background, and labels undefined pixels as probable background or probable foreground.
5. Each pixel in the image is virtually connected to the surrounding pixels through virtual edges, and each edge gets a probability of being foreground or background, based on how similar it is in color to the pixels surrounding it.
6. Each pixel (or node, as it is conceptualized in the algorithm) is connected to either a foreground or a background node, which you can picture looking like this:
7. After the nodes have been connected to either terminal (background or foreground, also called a source and sink), the edges between nodes belonging to different terminals are cut (hence the famous cut part of the algorithm's name), which enables the separation of the parts of the image. This graph adequately represents the algorithm:
Example of foreground detection with GrabCut

Let's look at an example. We start with the picture of a beautiful statue of an angel.

We want to grab our angel and discard the background. To do this, we will create a relatively short script that will instantiate GrabCut, operate the separation, and then display the resulting image side by side with the original. We will do this using matplotlib, a very useful Python library, which makes displaying charts and images a trivial task:
import numpy as np
import cv2
from matplotlib import pyplot as plt

img = cv2.imread('images/statue_small.jpg')
mask = np.zeros(img.shape[:2], np.uint8)

bgdModel = np.zeros((1, 65), np.float64)
fgdModel = np.zeros((1, 65), np.float64)

rect = (100, 50, 421, 378)
cv2.grabCut(img, mask, rect, bgdModel, fgdModel, 5, cv2.GC_INIT_WITH_RECT)

mask2 = np.where((mask == 2) | (mask == 0), 0, 1).astype('uint8')
img = img * mask2[:, :, np.newaxis]

plt.subplot(121), plt.imshow(img)
plt.title("grabcut"), plt.xticks([]), plt.yticks([])
plt.subplot(122)
plt.imshow(cv2.cvtColor(cv2.imread('images/statue_small.jpg'),
                        cv2.COLOR_BGR2RGB))
plt.title("original"), plt.xticks([]), plt.yticks([])
plt.show()
This code is actually quite straightforward. Firstly, we load the image we want to process, and then we create a mask populated with zeros, with the same shape as the image we've loaded:
import numpy as np
import cv2
from matplotlib import pyplot as plt

img = cv2.imread('images/statue_small.jpg')
mask = np.zeros(img.shape[:2], np.uint8)
We then create zero-filled foreground and background models:

bgdModel = np.zeros((1, 65), np.float64)
fgdModel = np.zeros((1, 65), np.float64)
We could have populated these models with data, but we're going to initialize the GrabCut algorithm with a rectangle identifying the subject we want to isolate. So, the background and foreground models are going to be determined based on the areas left out of the initial rectangle. This rectangle is defined in the next line:

rect = (100, 50, 421, 378)
Now to the interesting part! We run the GrabCut algorithm, specifying the empty models and mask, and the fact that we're going to use a rectangle to initialize the operation:

cv2.grabCut(img, mask, rect, bgdModel, fgdModel, 5, cv2.GC_INIT_WITH_RECT)
You'll also notice an integer after fgdModel, which is the number of iterations the algorithm is going to run on the image. You can increase this number, but there is a point at which pixel classifications will converge, and effectively, you'll just be adding iterations without obtaining any more improvements.

After this, our mask will have changed to contain values between 0 and 3. The values 0 and 2 will be converted into zeros, and the values 1 and 3 into ones, and stored in mask2, which we can then use to filter out all zero-value pixels (theoretically leaving all foreground pixels intact):
mask2 = np.where((mask == 2) | (mask == 0), 0, 1).astype('uint8')
img = img * mask2[:, :, np.newaxis]

The last part of the code displays the images side by side, and here's the result:
This is quite a satisfactory result. You'll notice that an area of background is left under the angel's arm. It is possible to apply touch-up strokes and more iterations to fix this; the technique is quite well illustrated in the grabcut.py file in samples/python2 of your OpenCV installation.
Image segmentation with the Watershed algorithm

Finally, we take a quick look at the Watershed algorithm. The algorithm is called Watershed because its conceptualization involves water. Imagine areas with low density (little to no change) in an image as valleys, and areas with high density (lots of change) as peaks. Start filling the valleys with water to the point where water from two different valleys is about to merge. To prevent the merging of water from different valleys, you build a barrier to keep them separated. The resulting barrier is the image segmentation.

As an Italian, I love food, and one of the things I love the most is a good plate of pasta with pesto sauce. So here's a picture of the most vital ingredient for a pesto, basil:

Now, we want to segment the image to separate the basil leaves from the white background.

Once more, we import numpy, cv2, and matplotlib, and then import our basil leaves' image:
import numpy as np
import cv2
from matplotlib import pyplot as plt

img = cv2.imread('images/basil.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
After converting the image to grayscale, we run a threshold on it. This operation helps divide the image in two: blacks and whites:

ret, thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
Next up, we remove noise from the image by applying the morphologyEx transformation with MORPH_OPEN, an operation that consists of eroding and then dilating an image, which removes small specks of noise:
kernel = np.ones((3, 3), np.uint8)
opening = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, kernel, iterations=2)

By dilating the result of the morphology transformation, we can obtain areas of the image that are most certainly background:

sure_bg = cv2.dilate(opening, kernel, iterations=3)
Conversely, we can obtain sure foreground areas by applying distanceTransform. In practical terms, of all the areas most likely to be foreground, the farther away from the "border" with the background a point is, the higher the chance that it is foreground. Once we've obtained the distanceTransform representation of the image, we apply a threshold to determine, with high confidence, whether the areas are foreground:

dist_transform = cv2.distanceTransform(opening, cv2.DIST_L2, 5)
ret, sure_fg = cv2.threshold(dist_transform, 0.7 * dist_transform.max(), 255, 0)
At this stage, we have some sure foregrounds and backgrounds. Now, what about the areas in between? First of all, we need to determine these regions, which can be done by subtracting the sure foreground from the sure background:

sure_fg = np.uint8(sure_fg)
unknown = cv2.subtract(sure_bg, sure_fg)
Now that we have these areas, we can build our famous "barriers" to stop the water from merging. This is done with the connectedComponents function. We took a glimpse at graph theory when we analyzed the GrabCut algorithm, and conceptualized an image as a set of nodes that are connected by edges. Given the sure foreground areas, some of these nodes will be connected together, but some won't. This means that they belong to different water valleys, and there should be a barrier between them:

ret, markers = cv2.connectedComponents(sure_fg)
Now we add 1 to all the markers, so that the sure background becomes 1 rather than 0, because watershed treats 0 as the unknown region; we then set the unknown areas, and only those, to 0:

markers = markers + 1
markers[unknown == 255] = 0
Finally, we open the gates! Let the water fall, and our barriers be drawn in red:

markers = cv2.watershed(img, markers)
img[markers == -1] = [255, 0, 0]
plt.imshow(img)
plt.show()

Now, let's show the result:

Needless to say, I am now hungry!
Summary

In this chapter, we learned about gathering three-dimensional information from two-dimensional input (a video frame or an image). Firstly, we examined depth cameras, and then epipolar geometry and stereo images, so we are now able to calculate disparity maps. Finally, we looked at image segmentation with two of the most popular methods: GrabCut and Watershed.

This chapter has introduced us to the world of interpreting information provided by images, and we are now ready to explore another important feature of OpenCV: feature descriptors and keypoint detection.
Chapter 5. Detecting and Recognizing Faces

Among the many reasons that make computer vision a fascinating subject is the fact that computer vision makes very futuristic-sounding tasks a reality. One such feature is face detection. OpenCV has a built-in facility to perform face detection, which has virtually infinite applications in the real world in all sorts of contexts, from security to entertainment.

This chapter introduces some of OpenCV's face detection functionalities, along with the data files that define particular types of trackable objects. Specifically, we look at Haar cascade classifiers, which analyze contrast between adjacent image regions to determine whether or not a given image or subimage matches a known type. We consider how to combine multiple Haar cascade classifiers in a hierarchy, such that one classifier identifies a parent region (for our purposes, a face) and other classifiers identify child regions (eyes, nose, and mouth).

We also take a detour into the humble but important subject of rectangles. By drawing, copying, and resizing rectangular image regions, we can perform simple manipulations on image regions that we are tracking.

By the end of this chapter, we will integrate face tracking and rectangle manipulations into Cameo. Finally, we'll have some face-to-face interaction!
Conceptualizing Haar cascades

When we talk about classifying objects and tracking their location, what exactly are we hoping to pinpoint? What constitutes a recognizable part of an object?

Photographic images, even from a webcam, may contain a lot of detail for our (human) viewing pleasure. However, image detail tends to be unstable with respect to variations in lighting, viewing angle, viewing distance, camera shake, and digital noise. Moreover, even real differences in physical detail might not interest us for the purposes of classification. I was taught in school that no two snowflakes look alike under a microscope. Fortunately, as a Canadian child, I had already learned how to recognize snowflakes without a microscope, as the similarities are more obvious in bulk.

Thus, some means of abstracting image detail is useful in producing stable classification and tracking results. The abstractions are called features, which are said to be extracted from the image data. There should be far fewer features than pixels, though any pixel might influence multiple features. The level of similarity between two images can be evaluated based on Euclidean distances between the images' corresponding features.

For example, distance might be defined in terms of spatial coordinates or color coordinates. Haar-like features are one type of feature that is often applied to real-time face tracking. They were first used for this purpose in the paper Robust Real-Time Face Detection, Paul Viola and Michael Jones, Kluwer Academic Publishers, 2001 (available at http://www.vision.caltech.edu/html-files/EE148-2005-Spring/pprs/viola04ijcv.pdf). Each Haar-like feature describes the pattern of contrast among adjacent image regions. For example, edges, vertices, and thin lines each generate distinctive features.

For any given image, the features may vary depending on the region's size; this may be called the window size. Two images that differ only in scale should be capable of yielding similar features, albeit for different window sizes. Thus, it is useful to generate features for multiple window sizes. Such a collection of features is called a cascade. We may say a Haar cascade is scale-invariant or, in other words, robust to changes in scale. OpenCV provides a classifier and tracker for scale-invariant Haar cascades, which it expects to be in a certain file format.

Haar cascades, as implemented in OpenCV, are not robust to changes in rotation. For example, an upside-down face is not considered similar to an upright face, and a face viewed in profile is not considered similar to a face viewed from the front. A more complex and more resource-intensive implementation could improve Haar cascades' robustness to rotation by considering multiple transformations of images as well as multiple window sizes. However, we will confine ourselves to the implementation in OpenCV.
Getting Haar cascade data

Once you have a copy of the source code of OpenCV 3, you will find a folder, data/haarcascades.

This folder contains all the XML files used by the OpenCV face detection engine to detect faces in still images, videos, and camera feeds.

Once you find haarcascades, create a directory for your project; in this folder, create a subfolder called cascades, and copy the following files from haarcascades into cascades:
haarcascade_profileface.xml
haarcascade_righteye_2splits.xml
haarcascade_russian_plate_number.xml
haarcascade_smile.xml
haarcascade_upperbody.xml
As their names suggest, these cascades are for tracking faces, eyes, noses, and mouths. They require a frontal, upright view of the subject. We will use them later when building a face detector. If you are curious about how these data sets are generated, refer to Appendix B, Generating Haar Cascades for Custom Targets, OpenCV Computer Vision with Python. With a lot of patience and a powerful computer, you can make your own cascades and train them for various types of objects.
Using OpenCV to perform face detection

Contrary to what you may think at the outset, performing face detection on a still image and on a video feed are extremely similar operations. The latter is just the sequential version of the former: face detection on videos is simply face detection applied to each frame read into the program from the camera. Naturally, a whole host of concepts applies to video face detection, such as tracking, which does not apply to still images, but it's always good to know that the underlying theory is the same.

So let's go ahead and detect some faces.
Performing face detection on a still image

The first and most basic way to perform face detection is to load an image and detect faces in it. To make the result visually meaningful, we will draw rectangles around faces on the original image.

Now that you have haarcascades included in your project, let's go ahead and create a basic script to perform face detection.
import cv2

filename = '/path/to/my/pic.jpg'

def detect(filename):
    face_cascade = cv2.CascadeClassifier('./cascades/haarcascade_frontalface_default.xml')
    img = cv2.imread(filename)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, 1.3, 5)
    for (x, y, w, h) in faces:
        img = cv2.rectangle(img, (x, y), (x + w, y + h), (255, 0, 0), 2)
    cv2.namedWindow('Vikings Detected!!')
    cv2.imshow('Vikings Detected!!', img)
    cv2.imwrite('./vikings.jpg', img)
    cv2.waitKey(0)

detect(filename)
Let's go through the code. First, we use the obligatory cv2 import (you'll find that every script in this book starts like this, or almost). Secondly, we declare the detect function:

def detect(filename):

Within this function, we declare a face_cascade variable, which is a CascadeClassifier object for faces, responsible for face detection:

face_cascade = cv2.CascadeClassifier('./cascades/haarcascade_frontalface_default.xml')
We then load our file with cv2.imread, and convert it to grayscale, because that's the color space in which the face detection happens.

The next step (face_cascade.detectMultiScale) is where we operate the actual face detection:

img = cv2.imread(filename)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
faces = face_cascade.detectMultiScale(gray, 1.3, 5)
The parameters passed are scaleFactor and minNeighbors, which determine the percentage reduction of the image at each iteration of the face detection process, and the minimum number of neighbors retained by each face rectangle at each iteration. This may all seem a little complex in the beginning, but you can check all the options out in the official documentation.

The value returned from the detection operation is an array of tuples that represent the face rectangles. The utility method cv2.rectangle allows us to draw rectangles at the specified coordinates (x and y represent the left and top coordinates, and w and h represent the width and height of the face rectangle).
We will draw blue rectangles around all the faces we find by looping through the faces variable, making sure we use the original image for drawing, not the gray version:

for (x, y, w, h) in faces:
    img = cv2.rectangle(img, (x, y), (x + w, y + h), (255, 0, 0), 2)
Lastly, we create a named window and display the resulting processed image in it. To prevent the image window from closing automatically, we insert a call to waitKey, which closes the window at the press of any key:

cv2.namedWindow('Vikings Detected!!')
cv2.imshow('Vikings Detected!!', img)
cv2.waitKey(0)

And there we go, a whole set of Vikings have been detected in our image, as shown in the following screenshot:
Performing face detection on a video

We now have a good foundation to understand how to perform face detection on a still image. As mentioned previously, we can repeat the process on the individual frames of a video (be it a camera feed or a video file) and perform face detection.

The script will perform the following tasks: it will open a camera feed, read a frame, examine that frame for faces, scan for eyes within the faces detected, and then draw blue rectangles around the faces and green rectangles around the eyes.
1. Let's create a file called face_detection.py and start by importing the necessary module:

import cv2

2. After this, we declare a method, detect(), which will perform face detection:
def detect():
    face_cascade = cv2.CascadeClassifier('./cascades/haarcascade_frontalface_default.xml')
    eye_cascade = cv2.CascadeClassifier('./cascades/haarcascade_eye.xml')
    camera = cv2.VideoCapture(0)
3. The first thing we need to do inside the detect() method is to load the Haar cascade files so that OpenCV can operate face detection. As we copied the cascade files into the local cascades/ folder, we can use a relative path. Then, we open a VideoCapture object (the camera feed). The VideoCapture constructor takes a parameter, which indicates the camera to be used; zero indicates the first camera available.
    while (True):
        ret, frame = camera.read()
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

4. Next up, we capture a frame. The read() method returns two values: a Boolean indicating the success of the frame read operation, and the frame itself. We capture the frame, and then we convert it to grayscale. This is a necessary operation, because face detection in OpenCV happens in the grayscale color space:

        faces = face_cascade.detectMultiScale(gray, 1.3, 5)
5. Much like in the single still image example, we call detectMultiScale on the grayscale version of the frame:
        for (x, y, w, h) in faces:
            img = cv2.rectangle(frame, (x, y), (x + w, y + h), (255, 0, 0), 2)
            roi_gray = gray[y:y + h, x:x + w]
            eyes = eye_cascade.detectMultiScale(roi_gray, 1.03, 5, 0, (40, 40))
Note

There are a few additional parameters in the eye detection. Why? The method signature for detectMultiScale takes a number of optional parameters: in the case of detecting a face, the default options were good enough to detect faces. However, eyes are a smaller feature of the face, and self-casting shadows in my beard or my nose, and random shadows in the frame, were triggering false positives.

By limiting the search for eyes to a minimum size of 40x40 pixels, I was able to discard all false positives. Go ahead and test these parameters until you reach a point at which your application performs as you expect (for example, you can try to specify a maximum size for the feature too, or increase the scale factor and number of neighbors).
6. Here we have a further step compared to the still image example: we create a region of interest corresponding to the face rectangle, and within this rectangle, we operate "eye detection". This makes sense, as you wouldn't want to go looking for eyes outside a face (well, for human beings at least!).
            for (ex, ey, ew, eh) in eyes:
                # the eye coordinates are relative to the face ROI, so we
                # offset them by the face rectangle's top-left corner
                cv2.rectangle(frame, (x + ex, y + ey),
                              (x + ex + ew, y + ey + eh), (0, 255, 0), 2)
7. Again, we loop through the resulting eye tuples and draw green rectangles around them:
cv2.imshow("camera",frame)
ifcv2.waitKey(1000/12)&0xff==ord("q"):
break
camera.release()
cv2.destroyAllWindows()
if__name__=="__main__":
detect()
8. Finally, we show the resulting frame in the window. All being well, if any face is within the field of view of the camera, you will see a blue rectangle around the face and a green rectangle around each eye, as shown in this screenshot:
Performing face recognition

Detecting faces is a fantastic feature of OpenCV, and one that constitutes the basis for a more advanced operation: face recognition. What is face recognition? It's the ability of a program, given an image or a video feed, to identify a person. One of the ways to achieve this (and the approach adopted by OpenCV) is to "train" the program by feeding it a set of classified pictures (a facial database), and operate the recognition against those pictures.

This is the process that OpenCV and its face recognition module follow to recognize faces.

Another important feature of the face recognition module is that each recognition has a confidence score, which allows us to set thresholds in real-life applications to limit the number of false reads.

Let's start from the very beginning; to operate face recognition, we need faces to recognize. You can obtain these in two ways: supply the images yourself or obtain freely available face databases. There are a number of face databases on the Internet:
The Yale face database (Yalefaces): http://vision.ucsd.edu/content/yale-face-database
The AT&T database: http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html
The Extended Yale or Yale B: http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html
To operate face recognition on these samples, you would then have to run face recognition on an image that contains the face of one of the sampled people. That may be an educational process, but I found it to be not as satisfying as providing images of my own. In fact, I probably had the same thought that many people had: I wondered if I could write a program that recognizes my face with a certain degree of confidence.

Generating the data for face recognition

So let's go ahead and write a script that will generate those images for us. A few images containing different expressions are all that we need, but we have to make sure the sample images adhere to certain criteria:

Images will be grayscale in the .pgm format
Square shape
All the same size (I used 200x200; most freely available sets are smaller than that)

Here's the script itself:
import cv2

def generate():
    face_cascade = cv2.CascadeClassifier('./cascades/haarcascade_frontalface_default.xml')
    eye_cascade = cv2.CascadeClassifier('./cascades/haarcascade_eye.xml')
    camera = cv2.VideoCapture(0)
    count = 0
    while (True):
        ret, frame = camera.read()
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        faces = face_cascade.detectMultiScale(gray, 1.3, 5)
        for (x, y, w, h) in faces:
            img = cv2.rectangle(frame, (x, y), (x + w, y + h), (255, 0, 0), 2)
            f = cv2.resize(gray[y:y + h, x:x + w], (200, 200))
            cv2.imwrite('./data/at/jm/%s.pgm' % str(count), f)
            count += 1
        cv2.imshow("camera", frame)
        if cv2.waitKey(1000 // 12) & 0xff == ord("q"):
            break
    camera.release()
    cv2.destroyAllWindows()

if __name__ == "__main__":
    generate()
What is quite interesting about this exercise is that we are going to generate sample images building on our newfound knowledge of how to detect a face in a video feed. Effectively, what we are doing is detecting a face, cropping that region of the grayscale frame, resizing it to 200x200 pixels, and saving it with a name in a particular folder (in my case, jm; you can use your initials) in the .pgm format.

I inserted a variable, count, because we need progressive names for the images. Run the script for a few seconds, change expressions a few times, and check the destination folder you specified in the script. You will find a number of images of your face, grayed, resized, and named with the format <count>.pgm.

Let's now move on to try and recognize our face in a video feed. This should be fun!
Recognizing faces

OpenCV 3 comes with three main methods for recognizing faces, based on three different algorithms: Eigenfaces, Fisherfaces, and Local Binary Pattern Histograms (LBPH). It is beyond the scope of this book to get into the nitty-gritty of the theoretical differences between these methods, but we can give a high-level overview of the concepts.

I will refer you to the following links for a detailed description of the algorithms:
Principal Component Analysis (PCA): A very intuitive introduction by Jonathon Shlens is available at http://arxiv.org/pdf/1404.1100v1.pdf. This algorithm was invented in 1901 by K. Pearson, and the original paper, On Lines and Planes of Closest Fit to Systems of Points in Space, is available at http://stat.smmu.edu.cn/history/pearson1901.pdf.
Eigenfaces: The paper, Eigenfaces for Recognition, M. Turk and A. Pentland, 1991, is available at http://www.cs.ucsb.edu/~mturk/Papers/jcn.pdf.
Fisherfaces: The seminal paper, The Use of Multiple Measurements in Taxonomic Problems, R. A. Fisher, 1936, is available at http://onlinelibrary.wiley.com/doi/10.1111/j.1469-1809.1936.tb02137.x/pdf.
Local Binary Pattern: The first paper describing this algorithm, Performance evaluation of texture measures with classification based on Kullback discrimination of distributions, T. Ojala, M. Pietikainen, D. Harwood, is available at http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=576366
First and foremost, all methods follow a similar process; they all take a set of classified observations (our face database, containing numerous samples per individual), get "trained" on it, perform an analysis of faces detected in an image or video, and determine two elements: whether the subject is identified, and a measure of the confidence of the subject really being identified, which is commonly known as the confidence score.
Eigenfaces performs a so-called PCA, which, of all the mathematical concepts you will hear mentioned in relation to computer vision, is possibly the most descriptive. It basically identifies the principal components of a certain set of observations (again, your face database), calculates the divergence of the current observation (the face being detected in an image or frame) compared to the dataset, and produces a value. The smaller the value, the smaller the difference between the face database and the detected face; hence, a value of 0 is an exact match.
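The idea can be sketched in a few lines of NumPy on made-up data (all names and sizes here are illustrative, not OpenCV's internals): flatten each face to a vector, find principal components with an SVD, project every sample into that subspace, and identify a probe by its nearest neighbor there:

```python
import numpy as np

# A toy sketch of the Eigenfaces idea on random stand-in data.
rng = np.random.default_rng(1)
samples = rng.normal(size=(20, 64))      # 20 "faces", 64 "pixels" each
mean = samples.mean(axis=0)

# Principal components via SVD of the mean-centered data.
U, S, Vt = np.linalg.svd(samples - mean, full_matrices=False)
components = Vt[:5]                      # keep the top 5 components

def project(x):
    return components @ (x - mean)

probe = samples[3]                       # a face from the training set
dists = [np.linalg.norm(project(probe) - project(s)) for s in samples]
best = int(np.argmin(dists))             # smallest distance = best match
```

Since the probe is sample 3 itself, its distance is exactly 0, mirroring the "0 is an exact match" behavior described above.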
Fisherfaces derives from PCA and evolves the concept, applying more complex logic. While computationally more intensive, it tends to yield more accurate results than Eigenfaces.

LBPH instead roughly (again, from a very high level) divides a detected face into small cells and compares each cell to the corresponding cell in the model, producing a histogram of matching values for each area. Because of this flexible approach, LBPH is the only face recognition algorithm that allows the model sample faces and the detected faces to be of different shapes and sizes. I personally found this to be the most accurate algorithm, generally speaking, but each algorithm has its strengths and weaknesses.
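The "local binary pattern" itself is easy to picture: for each pixel, compare the 8 neighbors to the center and read the comparison results as a byte; the histograms of these codes over a grid of cells are what LBPH compares. A toy sketch (the clockwise neighbor ordering here is one arbitrary choice; implementations differ):

```python
import numpy as np

# Compute the local binary pattern code for the center pixel of a 3x3 patch:
# each neighbor >= center contributes one bit to an 8-bit code.
def lbp_code(patch3x3):
    center = patch3x3[1, 1]
    neighbors = [patch3x3[0, 0], patch3x3[0, 1], patch3x3[0, 2],
                 patch3x3[1, 2], patch3x3[2, 2], patch3x3[2, 1],
                 patch3x3[2, 0], patch3x3[1, 0]]
    return sum((1 << i) for i, v in enumerate(neighbors) if v >= center)

patch = np.array([[5, 9, 1],
                  [3, 6, 7],
                  [2, 6, 8]])
code = lbp_code(patch)  # a value between 0 and 255
```

Because each code depends only on the ordering of intensities around a pixel, the descriptor is robust to uniform lighting changes, which is part of what makes LBPH practical for faces.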
Preparing the training data

Now that we have our data, we need to load these sample pictures into our face recognition algorithms. All face recognition algorithms take two parameters in their train() method: an array of images and an array of labels. What do these labels represent? They are the IDs of a certain individual/face, so that when face recognition is performed, we not only know that a person was recognized, but also who, among the many people available in our database, this person is.

To do that, we need to create a comma-separated values (CSV) file, which will contain the path to a sample picture followed by the ID of that person. In my case, I have 20 pictures generated with the previous script, in the subfolder jm/ of the folder data/at/, which contains all the pictures of all the individuals.

My CSV file therefore looks like this:
jm/1.pgm;0
jm/2.pgm;0
jm/3.pgm;0
…
jm/20.pgm;0
Note

The dots represent all the missing numbers. The jm/ prefix indicates the subfolder, and the 0 value at the end is the ID for my face.

OK, at this stage, we have everything we need to instruct OpenCV to recognize our face.
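Writing this CSV file by hand gets tedious as soon as you have more than one person, so here is a short sketch that walks a tree like data/at/<person>/ and assigns one ID per subfolder (the layout mirrors the one described above; the function name is my own):

```python
import os

def write_csv(base_path, out_file):
    # One label per person subfolder, e.g. data/at/jm -> 0, data/at/sb -> 1.
    label = 0
    with open(out_file, 'w') as f:
        for subdir in sorted(os.listdir(base_path)):
            subject_path = os.path.join(base_path, subdir)
            if not os.path.isdir(subject_path):
                continue  # skip stray files, including the CSV itself
            for filename in sorted(os.listdir(subject_path)):
                f.write('%s;%d\n' % (os.path.join(subdir, filename), label))
            label += 1
```

Called as write_csv('data/at', 'data/at/faces.csv'), it would emit one "path;id" line per sample, in the same format as the hand-written file above.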
Loading the data and recognizing faces

Next up, we need to load these two resources (the array of images and the CSV file) into the face recognition algorithm, so it can be trained to recognize our face. To do this, we build a function that reads the CSV file and, for each line of the file, loads the image at the corresponding path into the images array and the ID into the labels array:
import os
import sys

import cv2
import numpy as np

def read_images(path, sz=None):
    c = 0
    X, y = [], []
    for dirname, dirnames, filenames in os.walk(path):
        for subdirname in dirnames:
            subject_path = os.path.join(dirname, subdirname)
            for filename in os.listdir(subject_path):
                try:
                    if filename == ".directory":
                        continue
                    filepath = os.path.join(subject_path, filename)
                    im = cv2.imread(filepath, cv2.IMREAD_GRAYSCALE)
                    # resize to the given size (if given)
                    if sz is not None:
                        im = cv2.resize(im, sz)
                    X.append(np.asarray(im, dtype=np.uint8))
                    y.append(c)
                except IOError as e:
                    print("I/O error({0}): {1}".format(e.errno, e.strerror))
                except:
                    print("Unexpected error:", sys.exc_info()[0])
                    raise
            c = c + 1
    return [X, y]
Performing an Eigenfaces recognition

We're ready to test the face recognition algorithm. Here's the script to perform it:
def face_rec():
    names = ['Joe', 'Jane', 'Jack']
    if len(sys.argv) < 2:
        print("USAGE: facerec_demo.py </path/to/images> [</path/to/store/images/at>]")
        sys.exit()
    [X, y] = read_images(sys.argv[1])
    y = np.asarray(y, dtype=np.int32)
    if len(sys.argv) == 3:
        out_dir = sys.argv[2]
    model = cv2.face.createEigenFaceRecognizer()
    model.train(np.asarray(X), np.asarray(y))
    camera = cv2.VideoCapture(0)
    face_cascade = cv2.CascadeClassifier('./cascades/haarcascade_frontalface_default.xml')
    while (True):
        read, img = camera.read()
        faces = face_cascade.detectMultiScale(img, 1.3, 5)
        for (x, y, w, h) in faces:
            img = cv2.rectangle(img, (x, y), (x + w, y + h), (255, 0, 0), 2)
            gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
            roi = gray[y:y + h, x:x + w]
            try:
                roi = cv2.resize(roi, (200, 200),
                                 interpolation=cv2.INTER_LINEAR)
                params = model.predict(roi)
                print("Label: %s, Confidence: %.2f" % (params[0], params[1]))
                cv2.putText(img, names[params[0]], (x, y - 20),
                            cv2.FONT_HERSHEY_SIMPLEX, 1, 255, 2)
            except:
                continue
        cv2.imshow("camera", img)
        if cv2.waitKey(1000 // 12) & 0xff == ord("q"):
            break
    cv2.destroyAllWindows()
There are a few lines that may look a bit mysterious, so let's analyze the script. First of all, there's an array of names declared; those are the actual names of the individual people I stored in my database of faces. It's great to identify a person as ID 0, but printing 'Joe' on top of a face that's been correctly detected and recognized is much more dramatic.

So whenever the script recognizes an ID, we will print the corresponding name from the names array instead of an ID.
After this, we load the images as described in the previous function, create the face recognition model with cv2.face.createEigenFaceRecognizer(), and train it by passing the two arrays of images and labels (IDs). Note that the Eigenface recognizer takes two important parameters that you can specify: the first one is the number of principal components you want to keep, and the second is a float value specifying a confidence threshold.
Next up, we repeat a process similar to the face detection operation. This time, though, we extend the processing of the frames by also operating face recognition on any face that's been detected.

This happens in two steps: firstly, we resize the detected face to the expected size (in my case, samples were 200x200 pixels), and then we call the predict() function on the resized region.

Note

This is a bit of a simplified process, and it serves the purpose of enabling you to have a basic application running and to understand the process of face recognition in OpenCV 3. In reality, you will apply a few more optimizations, such as correctly aligning and rotating detected faces, so that the accuracy of the recognition is maximized.

Lastly, we obtain the results of the recognition and, just for effect, we draw them in the frame:
Performing face recognition with Fisherfaces

What about Fisherfaces? The process doesn't change much; we simply need to instantiate a different algorithm. So, the declaration of our model variable would look like this:

model = cv2.face.createFisherFaceRecognizer()

Fisherfaces takes the same two arguments as Eigenfaces: the number of Fisherfaces to keep and the confidence threshold. Faces with a confidence above this threshold will be discarded.
Performing face recognition with LBPH

Finally, let's take a quick look at the LBPH algorithm. Again, the process is very similar. However, the parameters taken by the algorithm factory are a bit more complex, as they indicate, in order: radius, neighbors, grid_x, grid_y, and the confidence threshold. If you don't specify these values, they will automatically be set to 1, 8, 8, 8, and 123.0. The model declaration will look like this:

model = cv2.face.createLBPHFaceRecognizer()

Note

Note that with LBPH, you won't need to resize images, as the division into grids allows comparing patterns identified in each cell.
Discarding results with the confidence score

The predict() method returns a two-element array: the first element is the label of the recognized individual, and the second is the confidence score. All algorithms come with the option of setting a confidence score threshold, which measures the distance of the recognized face from the original model; therefore, a score of 0 signifies an exact match.

There may be cases in which you would rather retain all recognitions and then apply further processing, so you can come up with your own algorithms to estimate the confidence score of a recognition; for example, if you are trying to identify people in a video, you may want to analyze the confidence score in subsequent frames to establish whether the recognition was successful or not. In this case, you can inspect the confidence score obtained by the algorithm and draw your own conclusions.

Note

The confidence score value is completely different in Eigenfaces/Fisherfaces and LBPH. Eigenfaces and Fisherfaces will produce values (roughly) in the range 0 to 20,000, with any score below 4,000-5,000 being quite a confident recognition.

LBPH works similarly; however, the reference value for a good recognition is below 50, and any value above 80 is considered a low confidence score.
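As a sketch, gating predictions on these rough ranges might look like this (the cut-off numbers are just the rules of thumb quoted above, not constants defined by OpenCV; tune them against your own database):

```python
def accept(label, confidence, algorithm):
    # Rough rule-of-thumb thresholds; for every recognizer, a lower
    # confidence value means a closer match.
    if algorithm in ('eigenfaces', 'fisherfaces'):
        return confidence < 5000.0
    if algorithm == 'lbph':
        return confidence < 80.0
    raise ValueError('unknown algorithm: %s' % algorithm)

# For example, a predict() result of (0, 3200.0) from Eigenfaces:
ok = accept(0, 3200.0, 'eigenfaces')
```

Dropping a check like this into the face_rec() loop, just before cv2.putText, would suppress the name overlay for low-confidence matches.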
A normal custom approach would be to hold off drawing a rectangle around a recognized face until we have a number of frames with a satisfying confidence score, but you have total freedom to use OpenCV's face recognition module to tailor your application to your needs.
Summary

By now, you should have a good understanding of how face detection and face recognition work, and how to implement them in Python and OpenCV 3.

Face detection and face recognition are constantly evolving branches of computer vision, with algorithms being developed continuously, and they will evolve even faster in the near future with the emphasis placed on robotics and the Internet of Things.

For now, the accuracy of detection and recognition heavily depends on the quality of the training data, so make sure you provide your applications with high-quality face databases, and you will be satisfied with the results.
Chapter 6. Retrieving Images and Searching Using Image Descriptors

Similar to the human eyes and brain, OpenCV can detect the main features of an image and extract these into so-called image descriptors. These features can then be used as a database, enabling image-based searches. Moreover, we can use keypoints to stitch images together and compose a bigger image (think of putting together many pictures to form a 360-degree panorama).

This chapter shows you how to detect the features of an image with OpenCV and make use of them to match and search images. Throughout the chapter, we will take sample images and detect their main features, and then try to find a sample image contained in another image using homography.
Feature detection algorithms

There are a number of algorithms that can be used to detect and extract features, and we will explore most of them. The most common algorithms used in OpenCV are as follows:

Harris: This algorithm is useful to detect corners
SIFT: This algorithm is useful to detect blobs
SURF: This algorithm is useful to detect blobs
FAST: This algorithm is useful to detect corners
BRIEF: This algorithm is useful to detect blobs
ORB: This algorithm stands for Oriented FAST and Rotated BRIEF

Matching features can be performed with the following methods:

Brute-Force matching
FLANN-based matching

Spatial verification can then be performed with homography.
Defining features

What is a feature, exactly? Why is a particular area of an image classifiable as a feature, while others are not? Broadly speaking, a feature is an area of interest in the image that is unique or easily recognizable. As you can imagine, corners and high-density areas are good features, while patterns that repeat themselves a lot and low-density areas (such as a blue sky) are not. Edges are good features, as they tend to divide two regions of an image. A blob (an area of an image that greatly differs from its surrounding areas) is also an interesting feature.

Most feature detection algorithms revolve around the identification of corners, edges, and blobs, with some also focusing on the concept of a ridge, which you can conceptualize as the axis of symmetry of an elongated object (think, for example, about identifying a road in an image).

Some algorithms are better at identifying and extracting features of a certain type, so it's important to know what your input image is, so that you can utilize the best tool in your OpenCV belt.
Detecting features – corners

Let's start by identifying corners with cornerHarris, and let's do this with an example. If you continue studying OpenCV beyond this book, you'll find that, for many reasons, chessboards are a common subject of analysis in computer vision, partly because a chequered pattern is suited to many types of feature detection, and maybe because chess is pretty popular among geeks.

Here's our sample image:

OpenCV has a very handy utility function called cornerHarris, which detects corners in an image. The code to illustrate this is incredibly simple:
import cv2
import numpy as np

img = cv2.imread('images/chess_board.png')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
gray = np.float32(gray)
dst = cv2.cornerHarris(gray, 2, 23, 0.04)
img[dst > 0.01 * dst.max()] = [0, 0, 255]
while (True):
    cv2.imshow('corners', img)
    if cv2.waitKey(1000 // 12) & 0xff == ord("q"):
        break
cv2.destroyAllWindows()
Let’sanalyzethecode:aftertheusualimports,weloadthechessboardimageandturnittograyscalesothatcornerHarriscancomputeit.Then,wecallthecornerHarrisfunction:
dst=cv2.cornerHarris(gray,2,23,0.04)
Themostimportantparameterhereisthethirdone,whichdefinestheapertureoftheSobeloperator.TheSobeloperatorperformsthechangedetectionintherowsandcolumnsofanimagetodetectedges,anditdoesthisusingakernel.TheOpenCVcornerHarrisfunctionusesaSobeloperatorwhoseapertureisdefinedbythisparameter.Inplainenglish,itdefineshowsensitivecornerdetectionis.Itmustbebetween3and31andbeanoddvalue.Atvalue3,allthosediagonallinesintheblacksquaresofthechessboardwillregisterascornerswhentheytouchtheborderofthesquare.At23,onlythecornersofeachsquarewillbedetectedascorners.
Consider the following line:

img[dst > 0.01 * dst.max()] = [0, 0, 255]

Here, we paint a red mark wherever a corner is detected. Tweaking the second parameter in cornerHarris changes this: the smaller the value, the smaller the marks indicating corners.

Here's the final result:

Great, we have corner points marked, and the result is meaningful at first glance; all the corners are marked in red.
Feature extraction and description using DoG and SIFT

The preceding technique, which uses cornerHarris, is great to detect corners and has a distinct advantage because corners are corners; they are detected even if the image is rotated.

However, if we reduce (or increase) the size of an image, some parts of the image may lose or even gain a corner quality.

For example, look at the following corner detections of the F1 Italian Grand Prix track:

Here's a smaller version of the same screenshot:

You will notice how corners are a lot more condensed; however, we didn't only gain corners, we also lost some! In particular, look at the Variante Ascari chicane that squiggles at the end of the NW/SE straight part of the track. In the larger version of the image, both the entrance and the apex of the double bend were detected as corners. In the smaller image, the apex is not detected as such. The more we reduce the image, the more likely it is that we're going to lose the entrance to that chicane too.
This loss of features raises an issue; we need an algorithm that will work regardless of the scale of the image. Enter SIFT: while Scale-Invariant Feature Transform may sound a bit mysterious, now that we know what problem we're trying to resolve, it actually makes sense. We need a function (a transform) that will detect features (a feature transform) and will not output different results depending on the scale of the image (a scale-invariant feature transform). Note that SIFT does not detect keypoints (which is done with Difference of Gaussians), but it describes the region surrounding them by means of a feature vector.

A quick introduction to Difference of Gaussians (DoG) is in order; we have already talked about low pass filters and blurring operations, specifically with the cv2.GaussianBlur() function. DoG is the result of subtracting two differently blurred Gaussian-filtered copies of the same image. In Chapter 3, Processing Images with OpenCV 3, we applied this technique to compute a very efficient edge detection, and the idea is the same. The final result of a DoG operation contains areas of interest (keypoints), which are then going to be described through SIFT.
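To make the idea concrete, here is a minimal sketch of DoG in plain NumPy (the sigma values and the test image are illustrative assumptions, not the values SIFT itself uses):

```python
import numpy as np

def gaussian_kernel(sigma):
    """1D Gaussian kernel, normalized to sum to 1."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    kernel = np.exp(-x ** 2 / (2 * sigma ** 2))
    return kernel / kernel.sum()

def blur(image, sigma):
    """Separable Gaussian blur with edge padding (pure NumPy)."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    padded = np.pad(image.astype(np.float64), radius, mode='edge')
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel, mode='valid')
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode='valid')

def difference_of_gaussians(image, sigma1=1.0, sigma2=2.0):
    """Subtract a wider blur from a narrower one: flat regions cancel
    out, while corners, blobs, and fine texture stand out."""
    return blur(image, sigma1) - blur(image, sigma2)

flat = np.full((32, 32), 128.0)
print(np.abs(difference_of_gaussians(flat)).max() < 1e-9)  # True
```

A flat patch produces a near-zero response everywhere, while a bright blob produces a strong positive peak at its center; those peaks are the "areas of interest" mentioned above.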
Let’sseehowSIFTbehavesinanimagefullofcornersandfeatures:
Now,thebeautifulpanoramaofVarese(Lombardy,Italy)alsogainsacomputervisionmeaning.Here’sthecodeusedtoobtainthisprocessedimage:
import cv2
import sys
import numpy as np

imgpath = sys.argv[1]
img = cv2.imread(imgpath)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

sift = cv2.xfeatures2d.SIFT_create()
keypoints, descriptor = sift.detectAndCompute(gray, None)

img = cv2.drawKeypoints(image=img, outImage=img, keypoints=keypoints,
    flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS, color=(51, 163, 236))

cv2.imshow('sift_keypoints', img)
while (True):
    if cv2.waitKey(1000 // 12) & 0xff == ord("q"):
        break
cv2.destroyAllWindows()
After the usual imports, we load the image that we want to process. To make this script generic, we will take the image path as a command-line argument using the sys module of Python:

imgpath = sys.argv[1]
img = cv2.imread(imgpath)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

We then turn the image into grayscale. At this stage, you may have gathered that most processing algorithms in Python need a grayscale feed in order to work.

The next step is to create a SIFT object and run it on the grayscale image:

sift = cv2.xfeatures2d.SIFT_create()
keypoints, descriptor = sift.detectAndCompute(gray, None)

This is an interesting and important process; the SIFT object uses DoG to detect keypoints and computes a feature vector for the surrounding regions of each keypoint. As the name of the method clearly gives away, there are two main operations performed: detection and computation. The return value of the operation is a tuple containing keypoint information (keypoints) and the descriptor.

Finally, we process this image by drawing the keypoints on it and displaying it with the usual imshow function.

Note that in the drawKeypoints function, we pass a flag that has a value of 4. This is actually the following cv2 module property:

cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS

This flag enables the drawing of a circle and the orientation of each keypoint.
Anatomy of a keypoint

Let's take a quick look at the definition, from the OpenCV documentation, of the KeyPoint class:

pt
size
angle
response
octave
class_id

Some properties are more self-explanatory than others, but let's not take anything for granted and go through each one:

The pt (point) property indicates the x and y coordinates of the keypoint in the image.
The size property indicates the diameter of the feature.
The angle property indicates the orientation of the feature, as shown in the preceding processed image.
The response property indicates the strength of the keypoint. Some features are classified by SIFT as stronger than others, and response is the property you would check to evaluate the strength of a feature.
The octave property indicates the layer in the pyramid where the feature was found. To fully explain this property, we would need to write an entire chapter on it, so I will only introduce the basic concept. The SIFT algorithm operates in a similar fashion to face detection algorithms, in that it processes the same image sequentially but alters the parameters of the computation.

For example, the scale of the image and neighboring pixels are parameters that change at each iteration (octave) of the algorithm. So, the octave property indicates the layer at which the keypoint was detected.

Finally, the class_id property is an ID that can be assigned to a keypoint, for example to group keypoints by the object they belong to.
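As a small illustration of how these properties are used in practice, here is a sketch that ranks keypoints by their response, as you might do to keep only the strongest ones before matching (a lightweight namedtuple stands in for cv2.KeyPoint so the snippet runs without OpenCV; the values are made up):

```python
from collections import namedtuple

# Stand-in for cv2.KeyPoint, mirroring the property names listed above.
KeyPoint = namedtuple('KeyPoint', 'pt size angle response octave class_id')

keypoints = [
    KeyPoint((10.0, 12.0), 4.0, 90.0, 0.02, 0, -1),
    KeyPoint((55.0, 40.0), 8.0, 45.0, 0.09, 1, -1),
    KeyPoint((31.0, 77.0), 6.0, 180.0, 0.05, 0, -1),
]

# Keep only the two strongest keypoints, by response.
strongest = sorted(keypoints, key=lambda kp: kp.response, reverse=True)[:2]
print([kp.pt for kp in strongest])  # [(55.0, 40.0), (31.0, 77.0)]
```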
Feature extraction and detection using Fast Hessian and SURF

Computer vision is a relatively recent branch of computer science, and many algorithms and techniques are of recent invention. SIFT is in fact only 16 years old, having been published by David Lowe in 1999.

SURF is a feature detection algorithm published in 2006 by Herbert Bay, which is several times faster than SIFT and partially inspired by it.

Note
Note that both SIFT and SURF are patented algorithms and, for this reason, are made available in the xfeatures2d module of OpenCV.

It is not particularly relevant to this book to understand how SURF works under the hood, as much as we can use it in our applications and make the best of it. What is important to understand is that SURF is an OpenCV class that operates keypoint detection with the Fast Hessian algorithm and extraction with SURF, much like the SIFT class in OpenCV operates keypoint detection with DoG and extraction with SIFT.

Also, the good news is that, as a feature detection algorithm, the API of SURF does not differ from SIFT's. Therefore, we can simply edit the previous script to dynamically choose a feature detection algorithm instead of rewriting the entire program.
As we only support two algorithms for now, there is no need to find a particularly elegant solution to the evaluation of the algorithm to be used, and we'll use simple if blocks, as shown in the following code:

import cv2
import sys
import numpy as np

imgpath = sys.argv[1]
img = cv2.imread(imgpath)
alg = sys.argv[2]

def fd(algorithm):
    if algorithm == "SIFT":
        return cv2.xfeatures2d.SIFT_create()
    if algorithm == "SURF":
        return cv2.xfeatures2d.SURF_create(float(sys.argv[3]) if len(sys.argv) == 4 else 4000)

gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
fd_alg = fd(alg)
keypoints, descriptor = fd_alg.detectAndCompute(gray, None)

img = cv2.drawKeypoints(image=img, outImage=img, keypoints=keypoints,
    flags=4, color=(51, 163, 236))

cv2.imshow('keypoints', img)
while (True):
    if cv2.waitKey(1000 // 12) & 0xff == ord("q"):
        break
cv2.destroyAllWindows()
Here’stheresultusingSURFwithathreshold:
ThisimagehasbeenobtainedbyprocessingitwithaSURFalgorithmusingaHessianthresholdof8000.Tobeprecise,Iranthefollowingcommand:
>pythonfeat_det.pyimages/varese.jpgSURF8000
Thehigherthethreshold,thelessfeaturesidentified,soplayaroundwiththevaluesuntilyoureachanoptimaldetection.Intheprecedingcase,youcanclearlyseehowindividualbuildingsaredetectedasfeatures.
InaprocesssimilartotheoneweadoptedinChapter4,DepthEstimationandSegmentation,whenwewerecalculatingdisparitymaps,try—asanexercise—tocreateatrackbartofeedthevalueoftheHessianthresholdtotheSURFinstance,andseethenumberoffeaturesincreaseanddecreaseinaninverselyproportionalfashion.
Now,let’sexaminecornerdetectionwithFAST,theBRIEFkeypointdescriptor,andORB(whichusesthetwo)andputfeaturedetectiontogooduse.
ORB feature detection and feature matching

If SIFT is young and SURF younger, ORB is in its infancy. ORB was first published in 2011 as a fast alternative to SIFT and SURF.

The algorithm was published in the paper, ORB: an efficient alternative to SIFT or SURF, and is available in the PDF format at http://www.vision.cs.chubu.ac.jp/CV-R/pdf/Rublee_iccv2011.pdf.

ORB mixes techniques used in the FAST keypoint detector and the BRIEF descriptor, so it is definitely worth taking a quick look at FAST and BRIEF first. We will then talk about Brute-Force matching, one of the algorithms used for feature matching, and show an example of feature matching.
FAST

The Features from Accelerated Segment Test (FAST) algorithm works in a clever way; it draws a circle of 16 pixels around the candidate pixel. It then marks each of those pixels as brighter or darker than the center of the circle by more than a particular threshold. A corner is detected when a sufficiently long run of contiguous pixels is all marked as brighter, or all marked as darker.

FAST also implements a high-speed test, which attempts to quickly skip the full 16-pixel test. To understand how this test works, let's take a look at this screenshot:

Only four of the 16 pixels are examined first (pixels number 1, 5, 9, and 13). For the candidate to be a corner, at least three of these four must all be brighter than the center by more than the threshold, or all darker than it by more than the threshold. If this condition does not hold (for example, if two are brighter and two are not), the pixel cannot be a corner, and the remaining twelve pixels need not be tested.

FAST is an incredibly clever algorithm, but it's not devoid of weaknesses, and to compensate for these weaknesses, developers analyzing images can implement a machine learning approach, feeding a set of images (relevant to your application) to the algorithm so that corner detection is optimized.

Despite this, FAST depends on a threshold, so the developer's input is always necessary (unlike with SIFT).
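The high-speed pre-test described above can be sketched in a few lines of plain Python (the intensities and threshold are illustrative; a real implementation works directly on image arrays):

```python
def fast_pretest(center, circle16, threshold):
    """FAST's high-speed pre-test (illustrative sketch).

    circle16: intensities of the 16 circle pixels, indexed 1..16 as in
    the usual FAST diagrams; only positions 1, 5, 9, and 13 are used.
    Returns False when the pixel can be rejected without running the
    full 16-pixel segment test.
    """
    test = [circle16[i - 1] for i in (1, 5, 9, 13)]
    brighter = sum(p > center + threshold for p in test)
    darker = sum(p < center - threshold for p in test)
    return brighter >= 3 or darker >= 3

# A corner-like neighborhood: most of the circle is much brighter
# than the center, so the candidate survives the pre-test.
circle = [200] * 10 + [50] * 6
print(fast_pretest(50, circle, threshold=20))  # True
```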
BRIEF

Binary Robust Independent Elementary Features (BRIEF), on the other hand, is not a feature detection algorithm, but a descriptor. We have not yet explored this concept, so let's explain what a descriptor is, and then look at BRIEF.

You will notice that when we previously analyzed images with SIFT and SURF, the heart of the entire process was the call to the detectAndCompute function. This function performs two different steps, detection and computation, and returns the two different results coupled in a tuple.

The result of detection is a set of keypoints; the result of the computation is the descriptor. This means that OpenCV's SIFT and SURF classes are both detectors and descriptors (although, remember that the original algorithms are not! OpenCV's SIFT is really DoG plus SIFT, and OpenCV's SURF is really Fast Hessian plus SURF).

Keypoint descriptors are a representation of the image that serves as the gateway to feature matching, because you can compare the keypoint descriptors of two images and find commonalities.

BRIEF is one of the fastest descriptors currently available. It represents each keypoint's neighborhood as a short binary string, and the theory behind it is actually quite complicated, but suffice it to say that BRIEF adopts a series of optimizations that make it a very good choice for fast feature matching.
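To give a feel for what a binary descriptor looks like, here is a toy sketch in the spirit of BRIEF (the random pixel-pair sampling pattern here is an illustrative assumption; the real algorithm uses a fixed, pre-computed pattern of typically 128 to 512 comparisons on a smoothed patch):

```python
import numpy as np

def toy_binary_descriptor(patch, pairs):
    """Build a bit string by comparing pixel pairs inside a patch."""
    bits = [1 if patch[p1] < patch[p2] else 0 for p1, p2 in pairs]
    return np.array(bits, dtype=np.uint8)

rng = np.random.default_rng(0)
patch = rng.integers(0, 256, size=(8, 8))

# Illustrative sampling pattern: 16 random pixel pairs.
pairs = [((int(a), int(b)), (int(c), int(d)))
         for a, b, c, d in rng.integers(0, 8, size=(16, 4))]

desc = toy_binary_descriptor(patch, pairs)
print(len(desc))  # 16 bits
```

Because the descriptor is just a bit string, two descriptors can be compared extremely quickly with the Hamming distance (the number of differing bits), which is exactly what matchers use for BRIEF and ORB.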
Brute-Force matching

A Brute-Force matcher is a descriptor matcher that compares two sets of descriptors and generates a list of matches. The reason why it's called Brute-Force is that there is little optimization involved in the algorithm; each feature in the first set of descriptors is compared to all the features in the second set, each comparison is given a distance value, and the closest result is considered a match.

This is why it's called Brute-Force. In computing, the term brute-force is often associated with an approach that prioritizes the exhaustion of all possible combinations (for example, all the possible combinations of characters to crack a password) over some clever and convoluted algorithmic logic. OpenCV provides a BFMatcher object that does just that.
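A minimal NumPy sketch of the idea, using the Hamming distance on binary descriptors as BFMatcher does for ORB (the tiny 4-bit descriptors below are made up for illustration):

```python
import numpy as np

def brute_force_match(des1, des2):
    """For each descriptor in des1, return (query_index, train_index,
    distance) of its closest descriptor in des2, by Hamming distance."""
    matches = []
    for i, d in enumerate(des1):
        dists = np.count_nonzero(des2 != d, axis=1)  # Hamming distances
        j = int(np.argmin(dists))
        matches.append((i, j, int(dists[j])))
    return matches

des1 = np.array([[0, 1, 1, 0], [1, 1, 1, 1]], dtype=np.uint8)
des2 = np.array([[1, 1, 1, 0], [0, 1, 1, 0]], dtype=np.uint8)
print(brute_force_match(des1, des2))  # [(0, 1, 0), (1, 0, 1)]
```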
Feature matching with ORB

Now that we have a general idea of what FAST and BRIEF are, we can understand why the team behind ORB (at the time composed of Ethan Rublee, Vincent Rabaud, Kurt Konolige, and Gary R. Bradski) chose these two algorithms as a foundation for ORB.

In their paper, the authors aim at achieving the following results:

The addition of a fast and accurate orientation component to FAST
The efficient computation of oriented BRIEF features
Analysis of variance and correlation of oriented BRIEF features
A learning method to decorrelate BRIEF features under rotational invariance, leading to better performance in nearest-neighbor applications

Aside from the very technical jargon, the main points are quite clear; ORB aims at optimizing and speeding up operations, including the very important step of utilizing BRIEF in a rotation-aware fashion so that matching is improved even in situations where a training image has a very different rotation to the query image.

At this stage, though, I bet you have had enough of the theory and want to sink your teeth into some feature matching, so let's go look at some code.

As an avid listener of music, the first example that comes to my mind is to get the logo of a band and match it to one of the band's albums:
import numpy as np
import cv2
from matplotlib import pyplot as plt

img1 = cv2.imread('images/manowar_logo.png', cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread('images/manowar_single.jpg', cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create()
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = bf.match(des1, des2)
matches = sorted(matches, key=lambda x: x.distance)

img3 = cv2.drawMatches(img1, kp1, img2, kp2, matches[:40], img2, flags=2)
plt.imshow(img3), plt.show()
Let’snowexaminethiscodestepbystep.
Aftertheusualimports,weloadtwoimages(thequeryimageandthetrainingimage).
Notethatyouhaveprobablyseentheloadingofimageswithasecondparameterwiththevalueof0beingpassed.Thisisbecausecv2.imreadtakesasecondparameterthatcanbeoneofthefollowingflags:
IMREAD_ANYCOLOR=4
IMREAD_ANYDEPTH=2
IMREAD_COLOR=1
IMREAD_GRAYSCALE=0
IMREAD_LOAD_GDAL=8
IMREAD_UNCHANGED=-1
Asyoucansee,cv2.IMREAD_GRAYSCALEisequalto0,soyoucanpasstheflagitselforitsvalue;theyarethesamething.
Thisistheimagewe’veloaded:
Thisisanotherimagethatwe’veloaded:
Now, we proceed to creating the ORB feature detector and descriptor:

orb = cv2.ORB_create()
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

In a similar fashion to what we did with SIFT and SURF, we detect and compute the keypoints and descriptors for both images.

The theory at this point is pretty simple: iterate through the descriptors and determine whether they are a match or not, then calculate the quality of this match (distance) and sort the matches, so that we can display the top n matches with a degree of confidence that they are, in fact, matching features on both images.

BFMatcher, as described in Brute-Force matching, does this for us:

bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = bf.match(des1, des2)
matches = sorted(matches, key=lambda x: x.distance)

At this stage, we already have all the information we need, but as computer vision enthusiasts, we place quite a bit of importance on visually representing data, so let's draw these matches in a matplotlib chart:

img3 = cv2.drawMatches(img1, kp1, img2, kp2, matches[:40], img2, flags=2)
plt.imshow(img3), plt.show()

The result is as follows:
Using K-Nearest Neighbors matching

There are a number of algorithms that can be used to detect matches so that we can draw them. One of them is K-Nearest Neighbors (KNN). Using different algorithms for different tasks can be really beneficial, because each algorithm has strengths and weaknesses. Some may be more accurate than others, some may be faster or less computationally expensive, so it's up to you to decide which one to use, depending on the task at hand.

For example, if you have hardware constraints, you may choose an algorithm that is less costly. If you're developing a real-time application, you may choose the fastest algorithm, regardless of how heavy it is on the processor or memory usage.

Among all the machine learning algorithms, KNN is probably the simplest, and while the theory behind it is interesting, it is well out of the scope of this book. Instead, we will simply show you how to use KNN in your application, which is not very different from the preceding example.

Crucially, the two points where the script differs, to switch to KNN, are in the way we calculate matches with the Brute-Force matcher and in the way we draw these matches. The preceding example, edited to use KNN, looks like this:
import numpy as np
import cv2
from matplotlib import pyplot as plt

img1 = cv2.imread('images/manowar_logo.png', 0)
img2 = cv2.imread('images/manowar_single.jpg', 0)

orb = cv2.ORB_create()
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = bf.knnMatch(des1, des2, k=2)

img3 = cv2.drawMatchesKnn(img1, kp1, img2, kp2, matches, img2, flags=2)
plt.imshow(img3), plt.show()
The final result is somewhat similar to the previous one, so what is the difference between match and knnMatch? The difference is that match returns only the best match for each descriptor, while knnMatch returns the k best matches, giving the developer the option to further manipulate the matches obtained with knnMatch.

For example, you could iterate through the matches and apply a ratio test so that you can filter out matches that do not satisfy a user-defined condition.
FLANN-based matching

Finally, we are going to take a look at the Fast Library for Approximate Nearest Neighbors (FLANN). The official Internet home of FLANN is at http://www.cs.ubc.ca/research/flann/.

Like ORB, FLANN has a more permissive license than SIFT or SURF, so you can freely use it in your project. Quoting the website of FLANN,

“FLANN is a library for performing fast approximate nearest neighbor searches in high dimensional spaces. It contains a collection of algorithms we found to work best for nearest neighbor search and a system for automatically choosing the best algorithm and optimum parameters depending on the dataset.

FLANN is written in C++ and contains bindings for the following languages: C, MATLAB and Python.”

In other words, FLANN possesses an internal mechanism that attempts to employ the best algorithm to process a dataset depending on the data itself. FLANN has been proven to be 10 times faster than other nearest neighbor search software.

FLANN is even available on GitHub at https://github.com/mariusmuja/flann. In my experience, I've found FLANN-based matching to be very accurate and fast, as well as friendly to use.

Let's look at an example of FLANN-based feature matching:
import numpy as np
import cv2
from matplotlib import pyplot as plt

queryImage = cv2.imread('images/bathory_album.jpg', 0)
trainingImage = cv2.imread('images/vinyls.jpg', 0)

# create SIFT and detect/compute
sift = cv2.xfeatures2d.SIFT_create()
kp1, des1 = sift.detectAndCompute(queryImage, None)
kp2, des2 = sift.detectAndCompute(trainingImage, None)

# FLANN matcher parameters
FLANN_INDEX_KDTREE = 1
indexParams = dict(algorithm=FLANN_INDEX_KDTREE, trees=5)
searchParams = dict(checks=50)  # or pass an empty dictionary

flann = cv2.FlannBasedMatcher(indexParams, searchParams)
matches = flann.knnMatch(des1, des2, k=2)

# prepare an empty mask to draw good matches
matchesMask = [[0, 0] for i in xrange(len(matches))]

# David G. Lowe's ratio test, populate the mask
for i, (m, n) in enumerate(matches):
    if m.distance < 0.7 * n.distance:
        matchesMask[i] = [1, 0]

drawParams = dict(matchColor=(0, 255, 0),
    singlePointColor=(255, 0, 0),
    matchesMask=matchesMask,
    flags=0)

resultImage = cv2.drawMatchesKnn(queryImage, kp1, trainingImage, kp2, matches, None, **drawParams)
plt.imshow(resultImage), plt.show()
Some parts of the preceding script will be familiar to you at this stage (import of modules, image loading, and creation of a SIFT object).

Note
The interesting part is the declaration of the FLANN matcher, which follows the documentation at http://www.cs.ubc.ca/~mariusm/uploads/FLANN/flann_manual-1.6.pdf.

We find that the FLANN matcher takes two parameters: an indexParams object and a searchParams object. These parameters, passed in a dictionary form in Python (and a struct in C++), determine the behavior of the index and search objects used internally by FLANN to compute the matches.

In this case, we could have chosen between LinearIndex, KDTreeIndex, KMeansIndex, CompositeIndex, and AutotunedIndex, and we chose KDTreeIndex. Why? Because it's a simple enough index to configure (it only requires the user to specify the number of kd-trees to be built; a good value is between 1 and 16) and clever enough (the kd-trees are searched in parallel). The searchParams dictionary only contains one field (checks) that specifies the number of times an index tree should be traversed. The higher the value, the longer it takes to compute the matching, but it will also be more accurate.

In reality, a lot depends on the input that you feed the program with. I've found that 5 kd-trees and 50 checks always yield a respectably accurate result, while only taking a short time to complete.
After the creation of the FLANN matcher and the matches array, matches are then filtered according to the test described by Lowe in his paper, Distinctive Image Features from Scale-Invariant Keypoints, available at https://www.cs.ubc.ca/~lowe/papers/ijcv04.pdf.

In its chapter, Application to object recognition, Lowe explains that not all matches are “good” ones, and that filtering according to an arbitrary threshold would not yield good results all the time. Instead, Dr. Lowe explains,

“The probability that a match is correct can be determined by taking the ratio of distance from the closest neighbor to the distance of the second closest.”

In the preceding example, discarding any match for which this ratio of distances is greater than 0.7 results in just a few good matches being filtered out, while getting rid of around 90 percent of false matches.
Let’sunveiltheresultofapracticalexampleofFLANN.ThisisthequeryimagethatI’vefedthescript:
Thisisthetrainingimage:
Here,youmaynoticethattheimagecontainsthequeryimageatposition(5,3)ofthisgrid.
ThisistheFLANNprocessedresult:
Aperfectmatch!!
FLANN matching with homography

First of all, what is homography? Let's read a definition from the Internet:

“A relation between two figures, such that to any point of the one corresponds one and but one point in the other, and vice versa. Thus, a tangent line rolling on a circle cuts two fixed tangents of the circle in two sets of points that are homographic.”

If you, like me, are none the wiser from the preceding definition, you will probably find this explanation a bit clearer: homography is the condition in which two figures correspond to each other when one is a perspective distortion of the other.

Unlike all the previous examples, let's first take a look at what we want to achieve so that we can fully understand what homography is. Then, we'll go through the code. Here's the final result:

As you can see from the screenshot, we took a subject on the left, correctly identified it in the image on the right-hand side, drew matching lines between keypoints, and even drew a white border showing the perspective deformation of the seed subject in the right-hand side of the image:
import numpy as np
import cv2
from matplotlib import pyplot as plt

MIN_MATCH_COUNT = 10

img1 = cv2.imread('images/bb.jpg', 0)
img2 = cv2.imread('images/color2_small.jpg', 0)

sift = cv2.xfeatures2d.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

FLANN_INDEX_KDTREE = 1
index_params = dict(algorithm=FLANN_INDEX_KDTREE, trees=5)
search_params = dict(checks=50)

flann = cv2.FlannBasedMatcher(index_params, search_params)
matches = flann.knnMatch(des1, des2, k=2)

# store all the good matches as per Lowe's ratio test.
good = []
for m, n in matches:
    if m.distance < 0.7 * n.distance:
        good.append(m)

if len(good) > MIN_MATCH_COUNT:
    src_pts = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst_pts = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)

    M, mask = cv2.findHomography(src_pts, dst_pts, cv2.RANSAC, 5.0)
    matchesMask = mask.ravel().tolist()

    h, w = img1.shape
    pts = np.float32([[0, 0], [0, h - 1], [w - 1, h - 1], [w - 1, 0]]).reshape(-1, 1, 2)
    dst = cv2.perspectiveTransform(pts, M)

    img2 = cv2.polylines(img2, [np.int32(dst)], True, 255, 3, cv2.LINE_AA)
else:
    print "Not enough matches are found - %d/%d" % (len(good), MIN_MATCH_COUNT)
    matchesMask = None

draw_params = dict(matchColor=(0, 255, 0),  # draw matches in green color
    singlePointColor=None,
    matchesMask=matchesMask,  # draw only inliers
    flags=2)

img3 = cv2.drawMatches(img1, kp1, img2, kp2, good, None, **draw_params)
plt.imshow(img3, 'gray'), plt.show()
When compared to the previous FLANN-based matching example, the only difference (and this is where all the action happens) is in the if block.

Here's what happens in this code step by step: firstly, we make sure that we have at least a certain number of good matches (the minimum required to compute a homography is four), which we will arbitrarily set at 10 (in real life, you would probably use a higher value than this):
if len(good) > MIN_MATCH_COUNT:

Then, we find the keypoints in the original image and the training image:

src_pts = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
dst_pts = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)

Now, we find the homography:

M, mask = cv2.findHomography(src_pts, dst_pts, cv2.RANSAC, 5.0)
matchesMask = mask.ravel().tolist()

Note that we create matchesMask, which will be used in the final drawing of the matches, so that only points lying within the homography will have matching lines drawn.

At this stage, we simply have to calculate the perspective distortion of the original object into the second picture so that we can draw the border:

h, w = img1.shape
pts = np.float32([[0, 0], [0, h - 1], [w - 1, h - 1], [w - 1, 0]]).reshape(-1, 1, 2)
dst = cv2.perspectiveTransform(pts, M)
img2 = cv2.polylines(img2, [np.int32(dst)], True, 255, 3, cv2.LINE_AA)

And we then proceed to draw as per all our previous examples.
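For intuition, here is a plain NumPy sketch of the point mapping that cv2.perspectiveTransform performs: each point is lifted to homogeneous coordinates, multiplied by the 3x3 matrix, and divided by its third component (the matrix below is an illustrative homography that merely translates by (10, 20), not one computed from real matches):

```python
import numpy as np

def apply_homography(H, points):
    """Map 2D points through a 3x3 homography, with perspective divide."""
    pts = np.asarray(points, dtype=np.float64)
    homogeneous = np.hstack([pts, np.ones((len(pts), 1))])  # (x, y, 1)
    mapped = homogeneous @ H.T
    return mapped[:, :2] / mapped[:, 2:3]  # divide by w

# Illustrative homography: pure translation by (10, 20).
H = np.array([[1.0, 0.0, 10.0],
              [0.0, 1.0, 20.0],
              [0.0, 0.0, 1.0]])

corners = [[0, 0], [0, 99], [99, 99], [99, 0]]
print(apply_homography(H, corners))  # each corner shifted by (10, 20)
```

With a real homography from findHomography, the third row is generally not (0, 0, 1), and the perspective divide is what produces the skewed white quadrilateral in the screenshot above.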
A sample application – tattoo forensics

Let's conclude this chapter with a real-life (or kind of) example. Imagine that you're working for the Gotham forensics department and you need to identify a tattoo. You have the original picture of the tattoo (imagine this coming from CCTV footage) belonging to a criminal, but you don't know the identity of the person. However, you possess a database of tattoos, indexed with the name of the person to whom the tattoo belongs.

So, let's divide the task in two parts: save image descriptors to files first, and then scan these for matches against the picture we are using as a query image.

Saving image descriptors to file

The first thing we will do is save image descriptors to an external file. This is so that we don't have to recreate the descriptors every time we want to scan two images for matches and homography.

In our application, we will scan a folder for images and create the corresponding descriptor files so that we have them readily available for future searches.

To create descriptors and save them to a file, we will use a process we have used a number of times in this chapter, namely load an image, create a feature detector, detect, and compute:
# generate_descriptors.py
import cv2
import numpy as np
from os import walk
from os.path import join
import sys

def create_descriptors(folder):
    files = []
    for (dirpath, dirnames, filenames) in walk(folder):
        files.extend(filenames)
    for f in files:
        save_descriptor(folder, f, cv2.xfeatures2d.SIFT_create())

def save_descriptor(folder, image_path, feature_detector):
    img = cv2.imread(join(folder, image_path), 0)
    keypoints, descriptors = feature_detector.detectAndCompute(img, None)
    descriptor_file = image_path.replace("jpg", "npy")
    np.save(join(folder, descriptor_file), descriptors)

dir = sys.argv[1]
create_descriptors(dir)

In this script, we pass the folder name where all our images are contained, and then create all the descriptor files in the same folder.

NumPy has a very handy save() utility, which dumps array data into a file in an optimized way. To generate the descriptors in the folder containing your script, run this command:

> python generate_descriptors.py <folder containing images>
Note that cPickle/pickle are more popular libraries for Python object serialization. However, in this particular context, we are trying to limit ourselves to the usage of OpenCV and Python with NumPy and SciPy.
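As a quick illustration of the save/load round trip our application relies on (the file name and the tiny stand-in descriptor array are made up for the example):

```python
import numpy as np
import os
import tempfile

# A stand-in for a descriptor array: 3 keypoints x 4-dimensional features.
descriptors = np.arange(12, dtype=np.float32).reshape(3, 4)

path = os.path.join(tempfile.mkdtemp(), "sample.npy")
np.save(path, descriptors)   # writes the array to an .npy file
restored = np.load(path)     # reads it back as an ndarray

print(np.array_equal(descriptors, restored))  # True
```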
Scanning for matches

Now that we have descriptors saved to files, all we need to do is to repeat the homography process on all the descriptors and find a potential match to our query image.

This is the process we will put in place:

Loading a query image and creating a descriptor for it (tattoo_seed.jpg)
Scanning the folder with descriptors
For each descriptor, computing a FLANN-based match
If the number of matches is beyond an arbitrary threshold, including the file in the list of potential culprits (remember, we're investigating a crime)
Of all the culprits, electing the one with the highest number of matches as the potential suspect

Let's inspect the code to achieve this:
from os.path import join
from os import walk
import numpy as np
import cv2
from sys import argv

# create an array of filenames
folder = argv[1]
query = cv2.imread(join(folder, "tattoo_seed.jpg"), 0)

# create files, images, descriptors globals
files = []
images = []
descriptors = []
for (dirpath, dirnames, filenames) in walk(folder):
    files.extend(filenames)
for f in files:
    if f.endswith("npy") and f != "tattoo_seed.npy":
        descriptors.append(f)
print descriptors

# create the sift detector
sift = cv2.xfeatures2d.SIFT_create()
query_kp, query_ds = sift.detectAndCompute(query, None)

# create FLANN matcher
FLANN_INDEX_KDTREE = 1
index_params = dict(algorithm=FLANN_INDEX_KDTREE, trees=5)
search_params = dict(checks=50)
flann = cv2.FlannBasedMatcher(index_params, search_params)

# minimum number of matches
MIN_MATCH_COUNT = 10

potential_culprits = {}

print ">> Initiating picture scan…"
for d in descriptors:
    print "---------- analyzing %s for matches ------------" % d
    matches = flann.knnMatch(query_ds, np.load(join(folder, d)), k=2)
    good = []
    for m, n in matches:
        if m.distance < 0.7 * n.distance:
            good.append(m)
    if len(good) > MIN_MATCH_COUNT:
        print "%s is a match! (%d)" % (d, len(good))
    else:
        print "%s is not a match" % d
    potential_culprits[d] = len(good)

max_matches = None
potential_suspect = None
for culprit, matches in potential_culprits.iteritems():
    if max_matches == None or matches > max_matches:
        max_matches = matches
        potential_suspect = culprit

print "potential suspect is %s" % potential_suspect.replace("npy", "").upper()
I saved this script as scan_for_matches.py. The only element of novelty in this script is the use of numpy.load(filename), which loads an npy file into an np.array.

Running the script produces the following output:

>> Initiating picture scan…
---------- analyzing posion-ivy.npy for matches ------------
posion-ivy.npy is not a match
---------- analyzing bane.npy for matches ------------
bane.npy is not a match
---------- analyzing two-face.npy for matches ------------
two-face.npy is not a match
---------- analyzing riddler.npy for matches ------------
riddler.npy is not a match
---------- analyzing penguin.npy for matches ------------
penguin.npy is not a match
---------- analyzing dr-hurt.npy for matches ------------
dr-hurt.npy is a match! (298)
---------- analyzing hush.npy for matches ------------
hush.npy is a match! (301)
potential suspect is HUSH.

If we were to represent this graphically, this is what we would see:
Summary

In this chapter, we learned about detecting features in images and extracting them into descriptors. We explored a number of algorithms available in OpenCV to accomplish this task, and then applied them to real-life scenarios to understand a real-world application of the concepts we explored.

We are now familiar with the concept of detecting features in an image (or a video frame), which is a good foundation for the next chapter.
Chapter 7. Detecting and Recognizing Objects

This chapter will introduce the concept of detecting and recognizing objects, which is one of the most common challenges in computer vision. You've come this far in the book, so at this stage you're wondering how far you are from mounting a computer in your car that will give you information about cars and people surrounding you through the use of a camera. Well, you're not too far from your goal, actually.

In this chapter, we will expand on the concept of object detection, which we initially explored when talking about recognizing faces, and adapt it to all sorts of real-life objects, not just faces.
Object detection and recognition techniques

We made a distinction in Chapter 5, Detecting and Recognizing Faces, which we'll reiterate for clarity: detecting an object is the ability of a program to determine if a certain region of an image contains an unidentified object, and recognizing is the ability of a program to identify this object. Recognizing normally only occurs in areas of interest where an object has been detected; for example, we have attempted to recognize faces in the areas of an image that contained a face in the first place.

When it comes to recognizing and detecting objects, there are a number of techniques used in computer vision, which we'll be examining:

Histogram of Oriented Gradients
Image pyramids
Sliding windows

Unlike feature detection algorithms, these are not mutually exclusive techniques; rather, they are complementary. You can compute a Histogram of Oriented Gradients (HOG) while applying the sliding windows technique.

So, let's take a look at HOG first and understand what it is.
HOG descriptors

HOG is a feature descriptor, so it belongs to the same family of algorithms as SIFT, SURF, and ORB.

It is used in image and video processing to detect objects. Its internal mechanism is really clever; an image is divided into portions, and a gradient for each portion is calculated. We observed a similar approach when we talked about face recognition through LBPH.

HOG, however, calculates histograms that are not based on color values; rather, they are based on gradients. As HOG is a feature descriptor, it is capable of delivering the type of information that is vital for feature matching and object detection/recognition.

Before diving into the technical details of how HOG works, let's first take a look at how HOG sees the world; here is an image of a truck:

This is its HOG version:

You can easily recognize the wheels and the main structure of the vehicle. So, what is HOG seeing? First of all, you can see how the image is divided into cells; these are 16x16 pixel cells. Each cell contains a visual representation of the calculated gradients of color in eight directions (N, NW, W, SW, S, SE, E, and NE).

These eight values contained in each cell are the famous histograms. Therefore, a single cell gets a unique signature, which you can mentally visualize to be somewhat like this:
Theextrapolationofhistogramsintodescriptorsisquiteacomplexprocess.First,localhistogramsforeachcellarecalculated.Thecellsaregroupedintolargerregionscalledblocks.Theseblockscanbemadeofanynumberofcells,butDalalandTriggsfoundthat2x2cellblocksyieldedthebestresultswhenperformingpeopledetection.Ablock-widevectoriscreatedsothatitcanbenormalized,accountingforvariationsinilluminationandshadowing(asinglecellistoosmallaregiontodetectsuchvariations).Thisimprovestheaccuracyofdetectionasitreducestheilluminationandshadowingdifferencebetweenthesampleandtheblockbeingexamined.
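To make the idea of per-cell gradient histograms concrete, here is a minimal NumPy sketch that bins the gradient orientations of a single 16x16 cell into eight directions, weighted by gradient magnitude. This is an illustration of the principle only, not OpenCV's actual HOG implementation (which adds the block grouping and normalization described above):

```python
import numpy as np

def cell_histogram(cell, bins=8):
    """Bin the gradient orientations of one cell into `bins` directions,
    weighting each pixel's vote by its gradient magnitude."""
    # Finite-difference gradients along y and x
    gy, gx = np.gradient(cell.astype(float))
    magnitude = np.hypot(gx, gy)
    # Orientation in [0, 2*pi)
    angle = np.arctan2(gy, gx) % (2 * np.pi)
    # Assign each pixel to one of `bins` equal angular sectors
    bin_idx = (angle / (2 * np.pi) * bins).astype(int) % bins
    hist = np.zeros(bins)
    for b in range(bins):
        hist[b] = magnitude[bin_idx == b].sum()
    return hist

# A cell with a purely horizontal intensity ramp: all of the gradient
# energy falls into the bin for the "east" direction (bin 0).
cell = np.tile(np.arange(16), (16, 1))
hist = cell_histogram(cell)
print(hist.argmax())
```

A real HOG implementation typically uses unsigned orientations and interpolates votes between neighboring bins, but the cell signature it produces is this same kind of eight-value histogram.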
Simply comparing cells in two images would not work unless the images are identical (both in terms of size and data).

There are two main problems to resolve:

Location
Scale
The scale issue

Imagine, for example, that your sample was a detail (say, a bike) extrapolated from a larger image, and you're trying to compare the two pictures. You would not obtain the same gradient signatures, and the detection would fail (even though the bike is in both pictures).

The location issue

Once we've resolved the scale problem, we have another obstacle in our path: a potentially detectable object can be anywhere in the image, so we need to scan the entire image in portions to make sure we can identify areas of interest and, within these areas, try to detect objects. Even if a sample image and the object in the image are of identical size, there needs to be a way to instruct OpenCV to locate this object. So, the rest of the image is discarded and a comparison is made on potentially matching regions.

To obviate these problems, we need to familiarize ourselves with the concepts of image pyramids and sliding windows.
Image pyramid

Many of the algorithms used in computer vision utilize a concept called a pyramid.

An image pyramid is a multiscale representation of an image. This diagram should help you understand the concept:

A multiscale representation of an image, or an image pyramid, helps you resolve the problem of detecting objects at different scales. The importance of this concept is easily explained through a real-life hard fact: it is extremely unlikely that an object will appear in an image at the exact scale at which it appeared in our sample image.

Moreover, you will learn that object classifiers (utilities that allow you to detect objects in OpenCV) need training, and this training is provided through image databases made up of positive matches and negative matches. Among the positives, it is again unlikely that the object we want to identify will appear at the same scale throughout the training dataset.
We've got it, Joe. We need to take scale out of the equation, so now let's examine how an image pyramid is built.

An image pyramid is built through the following process:

1. Take an image.
2. Resize (downscale) the image using an arbitrary scale factor.
3. Smoothen the image (using Gaussian blurring).
4. If the image is larger than an arbitrary minimum size, repeat the process from step 1.
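As a quick illustration of how these steps shrink the image, the following sketch (pure Python, no OpenCV, and omitting the Gaussian smoothing step) computes the sequence of sizes a pyramid would produce for a given scale factor and minimum size:

```python
def pyramid_sizes(width, height, scale=1.5, min_size=(200, 80)):
    """Return the (width, height) of each pyramid level, stopping
    once either dimension drops below the minimum size."""
    sizes = [(width, height)]
    while True:
        # Downscale by the scale factor, truncating to whole pixels
        width, height = int(width / scale), int(height / scale)
        if width < min_size[0] or height < min_size[1]:
            break
        sizes.append((width, height))
    return sizes

# A 1000x600 image with a 1.5 scale factor yields four usable levels.
levels = pyramid_sizes(1000, 600)
print(levels)  # [(1000, 600), (666, 400), (444, 266), (296, 177)]
```

A larger scale factor means fewer levels (and faster scanning); we will revisit this trade-off when discussing the scaleFactor parameter next.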
Despite exploring image pyramids, scale ratios, and minimum sizes only at this stage of the book, you've already dealt with them. If you recall Chapter 5, Detecting and Recognizing Faces, we used the detectMultiScale method of the CascadeClassifier object.

Straight away, detectMultiScale doesn't sound so obscure anymore; in fact, it has become self-explanatory. The cascade classifier object attempts to detect an object at different scales of an input image. The second piece of information that should become much clearer is the scaleFactor parameter of the detectMultiScale() method. This parameter represents the ratio at which the image will be resampled to a smaller size at each step of the pyramid.

The smaller the scaleFactor parameter, the more layers in the pyramid, and the slower and more computationally intensive the operation will be, although (to an extent) the more accurate the results.
So, by now, you should have an understanding of what an image pyramid is and why it is used in computer vision. Let's now move on to sliding windows.

Sliding windows
Sliding windows is a technique used in computer vision that consists of examining shifting portions of an image (sliding windows) and running detection on those portions using image pyramids. This is done so that an object can be detected at multiple scales.

Sliding windows resolves the location issue by scanning smaller regions of a larger image, and then repeating the scan at different scales of the same image.

With this technique, each image is decomposed into portions, which allows us to discard portions that are unlikely to contain objects, while the remaining portions are classified.

There is one problem that emerges with this approach, though: overlapping regions.

Let's expand a little bit on this concept to clarify the nature of the problem. Say you're running face detection on an image and are using sliding windows.

Each window slides over a few pixels at a time, which means that a sliding window can happen to be a positive match for the same face in four different positions. Naturally, we don't want to report four matches, but only one; furthermore, we're not interested in every portion of the image with a good score, but simply in the portion with the highest score.

Here's where non-maximum suppression comes into play: given a set of overlapping regions, we can suppress all the regions that are not classified with the maximum score.
Non-maximum (or non-maxima) suppression

Non-maximum (or non-maxima) suppression is a technique that suppresses all the results that relate to the same area of an image but are not the maximum score for that area. Similarly colocated windows tend to have high scores and significant overlapping areas, but we are only interested in the window with the best result, so we discard the overlapping windows with lower scores.

When examining an image with sliding windows, you want to make sure to retain the best window out of a bunch of windows that all overlap around the same subject.

To do this, you determine that all the windows with more than a threshold, x, in common will be thrown into the non-maximum suppression operation.
This is quite complex, but it's also not the end of the process. Remember the image pyramid? We're scanning the image at smaller scales iteratively to make sure we detect objects at different scales.

This means that you will obtain a series of windows at different scales; you then compute the size of a window obtained at a smaller scale as if it were detected at the original scale, and finally throw this window into the original mix.
It does sound a bit complex. Thankfully, we're not the first to come across this problem, which has been resolved in several ways. The fastest algorithm in my experience was implemented by Dr. Tomasz Malisiewicz at http://www.computervisionblog.com/2011/08/blazing-fast-nmsm-from-exemplar-svm.html. The example is in MATLAB, but in the application example, we will obviously use a Python version of it.

The general approach behind non-maximum suppression is as follows:

1. Once an image pyramid has been constructed, scan the image with the sliding window approach for object detection.
2. Collect all the current windows that have returned a positive result (beyond a certain arbitrary threshold), and take the window, W, with the highest response.
3. Eliminate all windows that overlap W significantly.
4. Move on to the next window with the highest response and repeat the process for the current scale.
When this process is complete, move up to the next scale in the image pyramid and repeat the preceding process. To make sure windows are correctly represented at the end of the entire non-maximum suppression process, be sure to compute the window size in relation to the original size of the image (for example, if you detect a window at 50 percent scale of the original size in the pyramid, the detected window will actually be four times its size in the original image).

At the end of this process, you will have a set of maximum-scored windows. Optionally, you can check for windows that are entirely contained in other windows (as we do for the people detection example in this chapter) and eliminate those.

Now, how do we determine the score of a window? We need a classification system that determines whether a certain feature is present or not, with a confidence score for this classification. This is where support vector machines (SVM) come into play.
Support vector machines

Explaining in detail what an SVM is and does is beyond the scope of this book, but suffice it to say, an SVM is an algorithm that, given labeled training data, enables the classification of this data by outputting an optimal hyperplane, which, in plain English, is the optimal plane that divides differently classified data. A visual representation will help you understand this:

Why is it so helpful in computer vision, and object detection in particular? Because finding the optimal division line between pixels that belong to an object and those that don't is a vital component of object detection.

The SVM model has been around since the early 1960s; however, the current form of its implementation originates in a 1995 paper by Corinna Cortes and Vladimir Vapnik, which is available at http://link.springer.com/article/10.1007/BF00994018.
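To make the idea of a separating hyperplane concrete, here is a tiny NumPy sketch (not OpenCV's SVM, which we'll use later) that classifies 2D points by which side of a fixed hyperplane w·x + b = 0 they fall on. An SVM's job is to learn the optimal w and b from labeled data; here we fix them by hand purely to illustrate how the resulting classifier works:

```python
import numpy as np

# A hyperplane in 2D is just a line: w . x + b = 0.
# These values are chosen by hand for illustration; an SVM would
# learn them from the labeled training data.
w = np.array([1.0, -1.0])
b = 0.0

def classify(points):
    """Return +1 for points on one side of the hyperplane, -1 for the other."""
    return np.sign(points @ w + b).astype(int)

points = np.array([[3.0, 1.0],   # below the line y = x  -> +1
                   [1.0, 3.0],   # above the line        -> -1
                   [5.0, 2.0]])  # below                 -> +1
print(classify(points))  # [ 1 -1  1]
```

Everything the SVM does at prediction time reduces to this sign test (in a possibly transformed feature space, when a kernel is used).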
Now that we have a good understanding of the concepts involved in object detection, we can start looking at a few examples. We will start from built-in functions and evolve into training our own custom object detectors.

People detection

OpenCV comes with HOGDescriptor, which performs people detection.

Here's a pretty straightforward example:
import cv2
import numpy as np

def is_inside(o, i):
    ox, oy, ow, oh = o
    ix, iy, iw, ih = i
    return ox > ix and oy > iy and ox + ow < ix + iw and oy + oh < iy + ih

def draw_person(image, person):
    x, y, w, h = person
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 255), 2)

img = cv2.imread("../images/people.jpg")
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

found, w = hog.detectMultiScale(img)

found_filtered = []
for ri, r in enumerate(found):
    for qi, q in enumerate(found):
        if ri != qi and is_inside(r, q):
            break
    else:
        found_filtered.append(r)

for person in found_filtered:
    draw_person(img, person)

cv2.imshow("people detection", img)
cv2.waitKey(0)
cv2.destroyAllWindows()
After the usual imports, we define two very simple functions, is_inside and draw_person, which perform two minimal tasks: determining whether a rectangle is fully contained in another rectangle, and drawing rectangles around detected people.

We then load the image and create a HOGDescriptor through a very simple and self-explanatory line of code:

cv2.HOGDescriptor()

After this, we specify that the HOGDescriptor will use a default people detector.

This is done through the setSVMDetector() method, which, after our introduction to SVMs, sounds less obscure than it would have otherwise.
Next, we apply detectMultiScale on the loaded image. Interestingly, unlike the face detection algorithms, we don't need to convert the original image to grayscale before applying any form of object detection.

The detection method returns an array of rectangles, which would be a good enough source of information for us to start drawing shapes on the image. If we did this, however, you would notice something strange: some of the rectangles are entirely contained in other rectangles. This clearly indicates an error in detection, and we can safely assume that a rectangle entirely inside another one can be discarded.

This is precisely the reason why we defined the is_inside function, and why we iterate through the result of the detection to discard false positives.

If you run the script yourself, you will see rectangles drawn around the people in the image.
Creating and training an object detector

Using built-in features makes it easy to come up with a quick prototype for an application, and we're all very grateful to the OpenCV developers for making great features, such as face detection and people detection, readily available (truly, we are).

However, whether you are a hobbyist or a computer vision professional, it's unlikely that you will only deal with people and faces.

Moreover, if you're like me, you wonder how the people detector was created in the first place and whether you can improve it. Furthermore, you may also wonder whether you can apply the same concepts to detect the most diverse types of objects, ranging from cars to goblins.

In an enterprise environment, you may have to deal with very specific detection targets, such as registration plates, book covers, or whatever your company may deal with.

So, the question is, how do we come up with our own classifiers?

The answer lies in SVMs and the bag-of-words technique.

We've already talked about HOG and SVMs, so let's take a closer look at bag-of-words.
Bag-of-words

Bag-of-words (BOW) is a concept that was not initially intended for computer vision; rather, we use an evolved version of this concept in the context of computer vision. So, let's first talk about its basic version, which, as you may have guessed, originally belongs to the field of language analysis and information retrieval.

BOW is the technique by which we assign a count weight to each word in a series of documents; we then represent these documents as vectors of these counts. Let's look at an example:

Document 1: I like OpenCV and I like Python
Document 2: I like C++ and Python
Document 3: I don't like artichokes

These three documents allow us to build a dictionary (or codebook) with these values:
{
    I: 4,
    like: 4,
    OpenCV: 1,
    and: 2,
    Python: 2,
    C++: 1,
    dont: 1,
    artichokes: 1
}
We have eight entries. Let's now represent the original documents using eight-entry vectors, each vector containing all the words in the dictionary, with values representing the count of each term in the document. The vector representation of the preceding three sentences is as follows:

[2, 2, 1, 1, 1, 0, 0, 0]
[1, 1, 0, 1, 1, 1, 0, 0]
[1, 1, 0, 0, 0, 0, 1, 1]
This kind of representation of documents has many effective applications in the real world, such as spam filtering.

These vectors can be conceptualized as a histogram representation of documents, or as a feature (in the same way we extracted features from images in previous chapters), which can be used to train classifiers.
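The document example above can be reproduced in a few lines of plain Python; this sketch builds the codebook and the count vectors exactly as described:

```python
docs = ["I like OpenCV and I like Python",
        "I like C++ and Python",
        "I dont like artichokes"]

# Build the codebook: every distinct word, in order of first appearance.
vocab = []
for doc in docs:
    for word in doc.split():
        if word not in vocab:
            vocab.append(word)

# Represent each document as a vector of word counts over the codebook.
vectors = [[doc.split().count(word) for word in vocab] for doc in docs]
print(vocab)
print(vectors)  # matches the three eight-entry vectors above
```

In the visual variant we are about to meet, "words" become cluster centers of image descriptors, but the counting step is exactly the same.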
Now that we have a grasp of the basic concept of BOW, or bag of visual words (BOVW), let's see how it applies to the world of computer vision.

BOW in computer vision

We are by now familiar with the concept of image features. We've used feature extractors, such as SIFT and SURF, to extract features from images so that we could match these features in another image.

We've also familiarized ourselves with the concept of a codebook, and we know about SVMs, a model that can be fed a set of features, utilizes complex algorithms to classify training data, and can predict the classification of new data.
So, the implementation of a BOW approach will involve the following steps:

1. Take a sample dataset.
2. For each image in the dataset, extract descriptors (with SIFT, SURF, and so on).
3. Add each descriptor to the BOW trainer.
4. Cluster the descriptors into k clusters (okay, this sounds obscure, but bear with me) whose centers (centroids) are our visual words.

At this point, we have a dictionary of visual words ready to be used. As you can imagine, a large dataset will help make our dictionary richer in visual words. Up to an extent, the more words, the better!

After this, we are ready to test our classifier and attempt detection. The good news is that the process is very similar to the one outlined previously: given a test image, we can extract features and quantize them based on their distance to the nearest centroid to form a histogram.

Based on this, we can attempt to recognize visual words and locate them in the image. Here's a visual representation of the BOW process:
This is the point in the chapter when you have built an appetite for a practical example and are raring to code. However, before proceeding, I feel that a quick digression into the theory of k-means clustering is necessary so that you can fully understand how visual words are created, and gain a better understanding of the process of object detection using BOW and SVMs.

The k-means clustering

k-means clustering is a method of vector quantization used to perform data analysis. Given a dataset, k represents the number of clusters into which the dataset is going to be divided. The term "means" refers to the mathematical concept of the mean, which is pretty basic; for the sake of clarity, it's what people commonly refer to as the average. When visually represented, the mean of a cluster is its centroid, or the geometric center of the points in the cluster.

Note

Clustering refers to the grouping of points in a dataset into clusters.
One of the classes we will be using to perform object detection is called BOWKMeansTrainer; by now, you should be able to deduce what the responsibility of this class is. The OpenCV documentation describes it as a:

"kmeans()-based class to train a visual vocabulary using the bag-of-words approach"

Here's a representation of a k-means clustering operation with five clusters:
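To demystify what the trainer's cluster() call will do for us later, here is a minimal NumPy implementation of the k-means iteration (Lloyd's algorithm). OpenCV's BOWKMeansTrainer does the same job on SIFT descriptors; this toy version clusters 2D points so the mechanics are easy to follow (it uses a simple deterministic initialization, whereas real implementations pick random starting centroids):

```python
import numpy as np

def kmeans(points, k, iterations=10):
    """Cluster `points` into k groups; return the k centroids."""
    centroids = points[:k].copy()  # naive deterministic initialization
    for _ in range(iterations):
        # Assign each point to its nearest centroid
        dists = np.linalg.norm(points[:, None] - centroids[None, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its assigned points
        centroids = np.array([points[labels == j].mean(axis=0)
                              for j in range(k)])
    return centroids

# Two well-separated blobs of four points each: the centroids converge
# to the blob means, (0.5, 0.5) and (10.5, 10.5).
pts = np.array([[0, 0], [10, 10], [0, 1], [10, 11],
                [1, 0], [11, 10], [1, 1], [11, 11]], dtype=float)
centers = kmeans(pts, 2)
print(centers)
```

In the BOW setting, each centroid returned by this procedure is one visual word, and each descriptor of a test image is later assigned to its nearest centroid to build the histogram.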
After this long theoretical introduction, we can look at an example and start training our object detector.
Detecting cars

There is virtually no limit to the type of objects you can detect in your images and videos. However, to obtain an acceptable level of accuracy, you need a sufficiently large dataset containing training images that are identical in size.

This would be a time-consuming operation if we were to do it all by ourselves (which is entirely possible).

We can avail of ready-made datasets; there are a number of them freely downloadable from various sources:

The University of Illinois: http://l2r.cs.uiuc.edu/~cogcomp/Data/Car/CarData.tar.gz
Stanford University: http://ai.stanford.edu/~jkrause/cars/car_dataset.html

Note

Note that training images and test images are available in separate files.

I'll be using the UIUC dataset in my example, but feel free to explore the Internet for other types of datasets.

Now, let's take a look at an example:
import cv2
import numpy as np
from os.path import join

datapath = "/home/d3athmast3r/dev/python/CarData/TrainImages/"

def path(cls, i):
    return "%s/%s%d.pgm" % (datapath, cls, i + 1)

pos, neg = "pos-", "neg-"

detect = cv2.xfeatures2d.SIFT_create()
extract = cv2.xfeatures2d.SIFT_create()

flann_params = dict(algorithm = 1, trees = 5)
flann = cv2.FlannBasedMatcher(flann_params, {})

bow_kmeans_trainer = cv2.BOWKMeansTrainer(40)
extract_bow = cv2.BOWImgDescriptorExtractor(extract, flann)

def extract_sift(fn):
    im = cv2.imread(fn, 0)
    return extract.compute(im, detect.detect(im))[1]

for i in range(8):
    bow_kmeans_trainer.add(extract_sift(path(pos, i)))
    bow_kmeans_trainer.add(extract_sift(path(neg, i)))

voc = bow_kmeans_trainer.cluster()
extract_bow.setVocabulary(voc)

def bow_features(fn):
    im = cv2.imread(fn, 0)
    return extract_bow.compute(im, detect.detect(im))

traindata, trainlabels = [], []
for i in range(20):
    traindata.extend(bow_features(path(pos, i))); trainlabels.append(1)
    traindata.extend(bow_features(path(neg, i))); trainlabels.append(-1)

svm = cv2.ml.SVM_create()
svm.train(np.array(traindata), cv2.ml.ROW_SAMPLE, np.array(trainlabels))

def predict(fn):
    f = bow_features(fn)
    p = svm.predict(f)
    print fn, "\t", p[1][0][0]
    return p

car, notcar = "/home/d3athmast3r/dev/python/study/images/car.jpg", "/home/d3athmast3r/dev/python/study/images/bb.jpg"
car_img = cv2.imread(car)
notcar_img = cv2.imread(notcar)
car_predict = predict(car)
not_car_predict = predict(notcar)

font = cv2.FONT_HERSHEY_SIMPLEX

if (car_predict[1][0][0] == 1.0):
    cv2.putText(car_img, 'Car Detected', (10, 30), font, 1, (0, 255, 0), 2, cv2.LINE_AA)

if (not_car_predict[1][0][0] == -1.0):
    cv2.putText(notcar_img, 'Car Not Detected', (10, 30), font, 1, (0, 0, 255), 2, cv2.LINE_AA)

cv2.imshow('BOW + SVM Success', car_img)
cv2.imshow('BOW + SVM Failure', notcar_img)
cv2.waitKey(0)
cv2.destroyAllWindows()
What did we just do? This is quite a lot to assimilate, so let's go through what we've done:

1. First of all, our usual imports are followed by the declaration of the base path of our training images. This will come in handy to avoid rewriting the base path every time we process an image in a particular folder on our computer.

2. After this, we declare a function, path:
def path(cls, i):
    return "%s/%s%d.pgm" % (datapath, cls, i + 1)

pos, neg = "pos-", "neg-"
Note

More on the path function

This function is a utility method: given the name of a class (in our case, we have two classes, pos and neg) and a numerical index, we return the full path to a particular training image. Our car dataset contains images named in the following way: pos-x.pgm and neg-x.pgm, where x is a number.

You will immediately find this function useful when iterating through a range of numbers (say, 20), which will allow you to load all images from pos-1.pgm to pos-20.pgm, and the same goes for the negative class.
3. Next up, we'll create two SIFT instances: one to detect keypoints, the other to extract features:

detect = cv2.xfeatures2d.SIFT_create()
extract = cv2.xfeatures2d.SIFT_create()
4. Whenever you see SIFT involved, you can be pretty sure some feature matching algorithm will be involved too. In our case, we'll create an instance of a FLANN matcher:

flann_params = dict(algorithm = 1, trees = 5)
flann = cv2.FlannBasedMatcher(flann_params, {})
Note

Note that currently, the enum values for FLANN are missing from the Python version of OpenCV 3, so the number 1, which is passed as the algorithm parameter, represents the FLANN_INDEX_KDTREE algorithm. I suspect the final version will be cv2.FLANN_INDEX_KDTREE, which is a little more helpful. Make sure to check the enum values for the correct flags.
5. Next, we'll create the BOW trainer:

bow_kmeans_trainer = cv2.BOWKMeansTrainer(40)

6. This BOW trainer utilizes 40 clusters. After this, we'll initialize the BOW extractor. This is the BOW class that will be fed a vocabulary of visual words and will try to detect them in the test image:

extract_bow = cv2.BOWImgDescriptorExtractor(extract, flann)
7. To extract SIFT features from an image, we build a utility method that takes the path to the image, reads it in grayscale, and returns the descriptors:

def extract_sift(fn):
    im = cv2.imread(fn, 0)
    return extract.compute(im, detect.detect(im))[1]
At this stage, we have everything we need to start training the BOW trainer.

1. Let's read eight images per class (eight positives and eight negatives) from our dataset:

for i in range(8):
    bow_kmeans_trainer.add(extract_sift(path(pos, i)))
    bow_kmeans_trainer.add(extract_sift(path(neg, i)))
2. To create the vocabulary of visual words, we'll call the cluster() method on the trainer, which performs the k-means clustering and returns the said vocabulary. We'll assign this vocabulary to the BOWImgDescriptorExtractor so that it can extract descriptors from test images:

vocabulary = bow_kmeans_trainer.cluster()
extract_bow.setVocabulary(vocabulary)
3. In line with the other utility functions declared in this script, we'll declare a function that takes the path to an image and returns the descriptor as computed by the BOW descriptor extractor:

def bow_features(fn):
    im = cv2.imread(fn, 0)
    return extract_bow.compute(im, detect.detect(im))
4. Let's create two arrays to accommodate the training data and labels, and populate them with the descriptors generated by the BOWImgDescriptorExtractor, associating labels with the positive and negative images we're feeding in (1 stands for a positive match, -1 for a negative one):

traindata, trainlabels = [], []
for i in range(20):
    traindata.extend(bow_features(path(pos, i))); trainlabels.append(1)
    traindata.extend(bow_features(path(neg, i))); trainlabels.append(-1)
5. Now, let's create an instance of an SVM:

svm = cv2.ml.SVM_create()

6. Then, train it by wrapping the training data and labels in NumPy arrays:

svm.train(np.array(traindata), cv2.ml.ROW_SAMPLE, np.array(trainlabels))
We're all set with a trained SVM; all that is left to do is feed the SVM a couple of sample images and see how it behaves.

1. Let's first define another utility method to print the result of our predict method and return it:

def predict(fn):
    f = bow_features(fn)
    p = svm.predict(f)
    print fn, "\t", p[1][0][0]
    return p
2. Let's define two sample image paths and read the images as NumPy arrays:

car, notcar = "/home/d3athmast3r/dev/python/study/images/car.jpg", "/home/d3athmast3r/dev/python/study/images/bb.jpg"
car_img = cv2.imread(car)
notcar_img = cv2.imread(notcar)

3. We'll pass these images to the trained SVM and get the result of the prediction:

car_predict = predict(car)
not_car_predict = predict(notcar)
Naturally, we're hoping that the car image will be detected as a car (the result of predict() should be 1.0), and that the other image will not (the result should be -1.0), so we will only add text to the images if the result is the expected one.

4. At last, we'll present the images on the screen, hoping to see the correct caption on each:
font = cv2.FONT_HERSHEY_SIMPLEX

if (car_predict[1][0][0] == 1.0):
    cv2.putText(car_img, 'Car Detected', (10, 30), font, 1, (0, 255, 0), 2, cv2.LINE_AA)

if (not_car_predict[1][0][0] == -1.0):
    cv2.putText(notcar_img, 'Car Not Detected', (10, 30), font, 1, (0, 0, 255), 2, cv2.LINE_AA)

cv2.imshow('BOW + SVM Success', car_img)
cv2.imshow('BOW + SVM Failure', notcar_img)
cv2.waitKey(0)
cv2.destroyAllWindows()
The preceding operation produces the following result:

It also results in this:
SVM and sliding windows

Having detected an object is an impressive achievement, but now we want to push this to the next level in these ways:

Detecting multiple objects of the same kind in an image
Determining the position of a detected object in an image

To accomplish this, we will use the sliding windows approach. If it's not already clear from the previous explanation of the concept of sliding windows, the rationale behind the adoption of this approach will become more apparent if we take a look at a diagram:

Observe the movement of the block:
1. We take a region of the image, classify it, and then move a predefined step size to the right-hand side. When we reach the rightmost end of the image, we'll reset the x coordinate to 0, move down a step, and repeat the entire process.
2. At each step, we'll perform a classification with the SVM that was trained with BOW.
3. Keep track of all the blocks that have passed the SVM predict test.
4. When you've finished classifying the entire image, scale the image down and repeat the entire sliding windows process.

Continue rescaling and classifying until you get to a minimum size.

This gives you the chance to detect objects in several regions of the image and at different scales.
At this stage, you will have collected important information about the content of the image; however, there's a problem: it's most likely that you will end up with a number of overlapping blocks that give you a positive score. This means that your image may contain one object that gets detected four or five times, and if you were to report the result of the detection, your report would be quite inaccurate. Here's where non-maximum suppression comes into play.
Example – car detection in a scene

We are now ready to apply all the concepts we have learned so far to a real-life example, and create a car detector application that scans an image and draws rectangles around cars.

Let's summarize the process before diving into the code:

1. Obtain a training dataset.
2. Create a BOW trainer and create a visual vocabulary.
3. Train an SVM with the vocabulary.
4. Attempt detection using sliding windows on an image pyramid of a test image.
5. Apply non-maximum suppression to overlapping boxes.
6. Output the result.

Let's also take a look at the structure of the project, as it is a bit more complex than the classic standalone script approach we've adopted until now.

The project structure is as follows:
├── car_detector
│   ├── detector.py
│   ├── __init__.py
│   ├── non_maximum.py
│   ├── pyramid.py
│   └── sliding_window.py
└── car_sliding_windows.py
The main program is in car_sliding_windows.py, and all the utilities are contained in the car_detector folder. As we're using Python 2.7, we'll need an __init__.py file in the folder for it to be detected as a module.

The four files in the car_detector module are as follows:

The SVM training model
The non-maximum suppression function
The image pyramid
The sliding windows function

Let's examine them one by one, starting with the image pyramid:
import cv2

def resize(img, scaleFactor):
    return cv2.resize(img, (int(img.shape[1] * (1 / scaleFactor)),
        int(img.shape[0] * (1 / scaleFactor))), interpolation=cv2.INTER_AREA)

def pyramid(image, scale=1.5, minSize=(200, 80)):
    yield image

    while True:
        image = resize(image, scale)
        if image.shape[0] < minSize[1] or image.shape[1] < minSize[0]:
            break

        yield image
This module contains two function definitions:

resize takes an image and resizes it by a specified factor
pyramid takes an image and returns resized versions of it until the minimum constraints of width and height are reached

Note

You will notice that the image is not returned with the return keyword but with the yield keyword. This is because this function is a so-called generator. If you are not familiar with generators, take a look at https://wiki.python.org/moin/Generators.

This will allow us to obtain resized images to process in our main program.
Next up is the sliding windows function:

def sliding_window(image, stepSize, windowSize):
    for y in xrange(0, image.shape[0], stepSize):
        for x in xrange(0, image.shape[1], stepSize):
            yield (x, y, image[y:y + windowSize[1], x:x + windowSize[0]])
Again, this is a generator. Although a bit deeply nested, this mechanism is very simple: given an image, return a window that moves by an arbitrary step size from the left margin towards the right until the entire width of the image is covered, then goes back to the left margin but down a step, covering the width of the image repeatedly until the bottom-right corner of the image is reached. You can visualize this as the same pattern used for writing on a piece of paper: start from the left margin and reach the right margin, then move on to the next line from the left margin.
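To get a feel for how the generator walks the image, this small standalone sketch (using NumPy and range so it runs on its own) counts the windows produced over a toy "image" and lists their top-left coordinates:

```python
import numpy as np

def sliding_window(image, stepSize, windowSize):
    # Walk top-to-bottom, left-to-right, yielding each window's
    # top-left corner and its pixel block.
    for y in range(0, image.shape[0], stepSize):
        for x in range(0, image.shape[1], stepSize):
            yield (x, y, image[y:y + windowSize[1], x:x + windowSize[0]])

image = np.zeros((40, 60))  # a 40-row by 60-column dummy image
windows = list(sliding_window(image, stepSize=20, windowSize=(20, 20)))
print(len(windows))  # 2 rows of steps x 3 columns = 6
print([(x, y) for x, y, w in windows])
# [(0, 0), (20, 0), (40, 0), (0, 20), (20, 20), (40, 20)]
```

Note that windows straddling the right or bottom edge are silently clipped by NumPy slicing, so a detector built on this generator should skip windows smaller than windowSize.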
The last utility is non-maximum suppression, which looks like this (Malisiewicz/Rosebrock's code):
import numpy as np

def non_max_suppression_fast(boxes, overlapThresh):
    # if there are no boxes, return an empty list
    if len(boxes) == 0:
        return []

    # if the bounding boxes are integers, convert them to floats --
    # this is important since we'll be doing a bunch of divisions
    if boxes.dtype.kind == "i":
        boxes = boxes.astype("float")

    # initialize the list of picked indexes
    pick = []

    # grab the coordinates of the bounding boxes
    x1 = boxes[:, 0]
    y1 = boxes[:, 1]
    x2 = boxes[:, 2]
    y2 = boxes[:, 3]
    scores = boxes[:, 4]

    # compute the area of the bounding boxes and sort the bounding
    # boxes by the score/probability of the bounding box, so that
    # the highest score is the last element
    area = (x2 - x1 + 1) * (y2 - y1 + 1)
    idxs = np.argsort(scores)

    # keep looping while some indexes still remain in the indexes
    # list
    while len(idxs) > 0:
        # grab the last index in the indexes list (the best
        # remaining score) and add it to the list of picked indexes
        last = len(idxs) - 1
        i = idxs[last]
        pick.append(i)

        # find the largest (x, y) coordinates for the start of
        # the bounding box and the smallest (x, y) coordinates
        # for the end of the bounding box
        xx1 = np.maximum(x1[i], x1[idxs[:last]])
        yy1 = np.maximum(y1[i], y1[idxs[:last]])
        xx2 = np.minimum(x2[i], x2[idxs[:last]])
        yy2 = np.minimum(y2[i], y2[idxs[:last]])

        # compute the width and height of the bounding box
        w = np.maximum(0, xx2 - xx1 + 1)
        h = np.maximum(0, yy2 - yy1 + 1)

        # compute the ratio of overlap
        overlap = (w * h) / area[idxs[:last]]

        # delete all indexes from the index list that have
        # an overlap beyond the threshold
        idxs = np.delete(idxs, np.concatenate(([last],
            np.where(overlap > overlapThresh)[0])))

    # return only the bounding boxes that were picked using the
    # integer data type
    return boxes[pick].astype("int")
This function takes a list of rectangles and sorts them by their score. Starting from the box with the highest score, it eliminates all boxes that overlap it beyond a certain threshold, by calculating the area of intersection and determining whether it is greater than that threshold.
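To see the behavior in action, here is a standalone toy run using a compact version of the same suppression logic: three (x1, y1, x2, y2, score) boxes, the first two overlapping heavily, so only the higher-scored of the pair survives alongside the distant third box:

```python
import numpy as np

def nms(boxes, overlapThresh):
    """Compact non-maximum suppression: keep the best-scored box
    of each group of heavily overlapping boxes."""
    boxes = boxes.astype("float")
    x1, y1, x2, y2, scores = boxes.T
    area = (x2 - x1 + 1) * (y2 - y1 + 1)
    idxs = np.argsort(scores)  # ascending; best score is last
    pick = []
    while len(idxs) > 0:
        last = len(idxs) - 1
        i = idxs[last]
        pick.append(i)
        # Intersection of the picked box with every remaining box
        xx1 = np.maximum(x1[i], x1[idxs[:last]])
        yy1 = np.maximum(y1[i], y1[idxs[:last]])
        xx2 = np.minimum(x2[i], x2[idxs[:last]])
        yy2 = np.minimum(y2[i], y2[idxs[:last]])
        w = np.maximum(0, xx2 - xx1 + 1)
        h = np.maximum(0, yy2 - yy1 + 1)
        overlap = (w * h) / area[idxs[:last]]
        # Drop the picked box and everything overlapping it too much
        idxs = np.delete(idxs, np.concatenate(
            ([last], np.where(overlap > overlapThresh)[0])))
    return boxes[pick].astype("int")

boxes = np.array([[10, 10, 50, 50, 0.8],
                  [12, 12, 52, 52, 0.9],
                  [100, 100, 140, 140, 0.7]])
kept = nms(boxes, overlapThresh=0.5)
print(kept[:, 0])  # x1 of the survivors: the 0.9 box and the far box
```

The first two boxes overlap by roughly 90 percent of their area, well beyond the 0.5 threshold, so only the 0.9-scored box is kept; the third box does not overlap anything and survives untouched.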
Examining detector.py

Now, let's examine the heart of this program, which is detector.py. It's a bit long and complex; however, everything should appear much clearer given our newfound familiarity with the concepts of BOW, SVMs, and feature detection/extraction.

Here's the code:
import cv2
import numpy as np

datapath = "/path/to/CarData/TrainImages/"
SAMPLES = 400

def path(cls, i):
    return "%s/%s%d.pgm" % (datapath, cls, i + 1)

def get_flann_matcher():
    flann_params = dict(algorithm = 1, trees = 5)
    return cv2.FlannBasedMatcher(flann_params, {})

def get_bow_extractor(extract, flann):
    return cv2.BOWImgDescriptorExtractor(extract, flann)

def get_extract_detect():
    return cv2.xfeatures2d.SIFT_create(), cv2.xfeatures2d.SIFT_create()

def extract_sift(fn, extractor, detector):
    im = cv2.imread(fn, 0)
    return extractor.compute(im, detector.detect(im))[1]

def bow_features(img, extractor_bow, detector):
    return extractor_bow.compute(img, detector.detect(img))

def car_detector():
    pos, neg = "pos-", "neg-"
    detect, extract = get_extract_detect()
    matcher = get_flann_matcher()
    print "building BOWKMeansTrainer..."
    bow_kmeans_trainer = cv2.BOWKMeansTrainer(1000)
    extract_bow = get_bow_extractor(extract, matcher)

    print "adding features to trainer"
    for i in range(SAMPLES):
        print i
        bow_kmeans_trainer.add(extract_sift(path(pos, i), extract, detect))
        bow_kmeans_trainer.add(extract_sift(path(neg, i), extract, detect))

    voc = bow_kmeans_trainer.cluster()
    extract_bow.setVocabulary(voc)

    traindata, trainlabels = [], []
    print "adding to train data"
    for i in range(SAMPLES):
        print i
        traindata.extend(bow_features(cv2.imread(path(pos, i), 0), extract_bow, detect))
        trainlabels.append(1)
        traindata.extend(bow_features(cv2.imread(path(neg, i), 0), extract_bow, detect))
        trainlabels.append(-1)

    svm = cv2.ml.SVM_create()
    svm.setType(cv2.ml.SVM_C_SVC)
    svm.setGamma(0.5)
    svm.setC(30)
    svm.setKernel(cv2.ml.SVM_RBF)

    svm.train(np.array(traindata), cv2.ml.ROW_SAMPLE, np.array(trainlabels))
    return svm, extract_bow
Let's go through it. First, we'll import our usual modules and then set a path for the training images.

Then, we'll define a number of utility functions:

def path(cls, i):
    return "%s/%s%d.pgm" % (datapath, cls, i + 1)

This function returns the path to an image given a base path and a class name. In our example, we're going to use the neg- and pos- class names, because this is what the training images are called (that is, neg-1.pgm). The last argument is an integer used to compose the final part of the image path.
Next, we'll define a utility function to obtain a FLANN matcher:

def get_flann_matcher():
    flann_params = dict(algorithm = 1, trees = 5)
    return cv2.FlannBasedMatcher(flann_params, {})
Again, note that the integer, 1, passed as the algorithm argument represents FLANN_INDEX_KDTREE.

The next two functions return the SIFT feature detector/extractor and a BOW descriptor extractor:
def get_bow_extractor(extract, flann):
    return cv2.BOWImgDescriptorExtractor(extract, flann)

def get_extract_detect():
    return cv2.xfeatures2d.SIFT_create(), cv2.xfeatures2d.SIFT_create()
The next utility is a function used to return features from an image:

def extract_sift(fn, extractor, detector):
    im = cv2.imread(fn, 0)
    return extractor.compute(im, detector.detect(im))[1]
Note

A SIFT detector detects features, while a SIFT extractor extracts and returns them.

We'll also define a similar utility function to extract the BOW features:

def bow_features(img, extractor_bow, detector):
    return extractor_bow.compute(img, detector.detect(img))
In the main car_detector function, we'll first create the necessary objects used to perform feature detection and extraction:

pos, neg = "pos-", "neg-"
detect, extract = get_extract_detect()
matcher = get_flann_matcher()
bow_kmeans_trainer = cv2.BOWKMeansTrainer(1000)
extract_bow = get_bow_extractor(extract, matcher)
Then, we'll add features taken from the training images to the trainer:

print "adding features to trainer"
for i in range(SAMPLES):
    print i
    bow_kmeans_trainer.add(extract_sift(path(pos, i), extract, detect))
For each class, we'll add a positive image and a negative image to the trainer.

After this, we'll instruct the trainer to cluster the data into k groups.

The clustered data is now our vocabulary of visual words, and we can set the BOWImgDescriptorExtractor class' vocabulary in this way:

vocabulary = bow_kmeans_trainer.cluster()
extract_bow.setVocabulary(vocabulary)
Associating training data with classes
With a visual vocabulary ready, we can now associate training data with classes. In our case, we have two classes: -1 for negative results and 1 for positive ones.
Let's populate two arrays, traindata and trainlabels, containing the extracted features and their corresponding labels. Iterating through the dataset, we can quickly set this up with the following code:
    traindata, trainlabels = [], []
    print "adding to train data"
    for i in range(SAMPLES):
        print i
        traindata.extend(bow_features(cv2.imread(path(pos, i), 0), extract_bow, detect))
        trainlabels.append(1)
        traindata.extend(bow_features(cv2.imread(path(neg, i), 0), extract_bow, detect))
        trainlabels.append(-1)
You will notice that at each cycle, we add one positive and one negative image, and then append a 1 and a -1 value to keep the data synchronized with the labels.
Should you wish to train more classes, you could do that by following this pattern:
    traindata, trainlabels = [], []
    print "adding to train data"
    for i in range(SAMPLES):
        print i
        traindata.extend(bow_features(cv2.imread(path(class1, i), 0), extract_bow, detect))
        trainlabels.append(1)
        traindata.extend(bow_features(cv2.imread(path(class2, i), 0), extract_bow, detect))
        trainlabels.append(2)
        traindata.extend(bow_features(cv2.imread(path(class3, i), 0), extract_bow, detect))
        trainlabels.append(3)
For example, you could train a detector to detect both cars and people, and then perform detection on an image containing both cars and people.
Lastly, we'll train the SVM with the following code:
    svm = cv2.ml.SVM_create()
    svm.setType(cv2.ml.SVM_C_SVC)
    svm.setGamma(0.5)
    svm.setC(30)
    svm.setKernel(cv2.ml.SVM_RBF)
    svm.train(np.array(traindata), cv2.ml.ROW_SAMPLE, np.array(trainlabels))
    return svm, extract_bow
There are two parameters in particular that I'd like to focus your attention on:
C: With this parameter, you can conceptualize the strictness or severity of the classifier. The higher the value, the heavier the penalty on misclassified training samples, so the classifier fits the training data more strictly, at the risk of over-fitting; a lower value yields a smoother decision boundary that tolerates some misclassification, but some positive results may be missed.
Kernel: This parameter determines the nature of the classifier. SVM_LINEAR indicates a linear hyperplane, which, in practical terms, works very well for binary classification (the test sample either belongs to a class or it doesn't), while SVM_RBF (radial basis function) separates data using Gaussian functions, which means that the data is split into several kernels defined by these functions. When training the SVM to classify more than two classes, you will have to use RBF.
Finally, we'll pass the traindata and trainlabels arrays into the SVM train method, and return the SVM and BOW extractor objects. This is because, in our applications, we don't want to have to recreate the vocabulary every time, so we expose it for reuse.
Dude, where's my car?
We are ready to test our car detector! Let's first create a simple program that loads an image, and then performs detection using the sliding window and image pyramid techniques:
    import cv2
    import numpy as np
    from car_detector.detector import car_detector, bow_features
    from car_detector.pyramid import pyramid
    from car_detector.non_maximum import non_max_suppression_fast as nms
    from car_detector.sliding_window import sliding_window

    def in_range(number, test, thresh=0.2):
        return abs(number - test) < thresh

    test_image = "/path/to/cars.jpg"

    svm, extractor = car_detector()
    detect = cv2.xfeatures2d.SIFT_create()

    w, h = 100, 40
    img = cv2.imread(test_image)

    rectangles = []
    counter = 1
    scaleFactor = 1.25
    scale = 1
    font = cv2.FONT_HERSHEY_PLAIN
    for resized in pyramid(img, scaleFactor):
        scale = float(img.shape[1]) / float(resized.shape[1])
        for (x, y, roi) in sliding_window(resized, 20, (w, h)):
            if roi.shape[1] != w or roi.shape[0] != h:
                continue
            try:
                bf = bow_features(roi, extractor, detect)
                _, result = svm.predict(bf)
                a, res = svm.predict(bf, flags=cv2.ml.STAT_MODEL_RAW_OUTPUT)
                print "Class: %d, Score: %f" % (result[0][0], res[0][0])
                score = res[0][0]
                if result[0][0] == 1:
                    if score < -1.0:
                        rx, ry, rx2, ry2 = int(x * scale), int(y * scale), int((x + w) * scale), int((y + h) * scale)
                        rectangles.append([rx, ry, rx2, ry2, abs(score)])
            except:
                pass
        counter += 1

    windows = np.array(rectangles)
    boxes = nms(windows, 0.25)

    for (x, y, x2, y2, score) in boxes:
        print x, y, x2, y2, score
        cv2.rectangle(img, (int(x), int(y)), (int(x2), int(y2)), (0, 255, 0), 1)
        cv2.putText(img, "%f" % score, (int(x), int(y)), font, 1, (0, 255, 0))

    cv2.imshow("img", img)
    cv2.waitKey(0)
The notable part of the program is the code within the pyramid/sliding window loop:
    bf = bow_features(roi, extractor, detect)
    _, result = svm.predict(bf)
    a, res = svm.predict(bf, flags=cv2.ml.STAT_MODEL_RAW_OUTPUT)
    print "Class: %d, Score: %f" % (result[0][0], res[0][0])
    score = res[0][0]
    if result[0][0] == 1:
        if score < -1.0:
            rx, ry, rx2, ry2 = int(x * scale), int(y * scale), int((x + w) * scale), int((y + h) * scale)
            rectangles.append([rx, ry, rx2, ry2, abs(score)])
Here, we extract the features of the region of interest (ROI), which corresponds to the current sliding window, and then we call predict on the extracted features. The predict method has an optional parameter, flags, which makes it return the raw score of the prediction (contained at the [0][0] value).
Note
A word on the score of the prediction: the lower the value, the higher the confidence that the classified element really belongs to the class.
So, we'll set an arbitrary threshold of -1.0 for classified windows, and all windows scoring less than -1.0 are going to be taken as good results. As you experiment with your SVMs, you may tweak this to your liking until you find a golden mean that assures the best results.
Finally, we add the computed coordinates of the sliding window to the array of rectangles (meaning, we multiply the current coordinates by the scale of the current layer in the image pyramid so that the window gets correctly represented in the final drawing).
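The pyramid and sliding_window helpers imported earlier live in the book's car_detector package and are not shown in this listing. A minimal sketch of the underlying idea looks like this; note that it crudely downsamples by integer striding with NumPy, whereas a real implementation would use cv2.resize with the book's scaleFactor of 1.25:

```python
import numpy as np

def pyramid(image, scale_factor=2, min_size=(80, 32)):
    # Yield successively smaller versions of the image; downsampling here
    # is plain integer striding, standing in for proper cv2.resize calls.
    yield image
    while True:
        image = image[::scale_factor, ::scale_factor]
        if image.shape[0] < min_size[1] or image.shape[1] < min_size[0]:
            break
        yield image

def sliding_window(image, step, window_size):
    # Slide a (w, h) window across the image in steps of `step` pixels,
    # yielding the top-left corner and the cropped region of interest.
    w, h = window_size
    for y in range(0, image.shape[0] - h + 1, step):
        for x in range(0, image.shape[1] - w + 1, step):
            yield (x, y, image[y:y + h, x:x + w])

img = np.zeros((120, 200), np.uint8)
levels = list(pyramid(img))
windows = list(sliding_window(img, 20, (100, 40)))
print(len(levels), len(windows))  # 2 30
```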
There's one last operation we need to perform before drawing our final result: non-maximum suppression.
We turn the rectangles array into a NumPy array (to allow certain kinds of operations that are only possible with NumPy), and then apply NMS:
    windows = np.array(rectangles)
    boxes = nms(windows, 0.25)
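The imported non_max_suppression_fast helper is defined in the book's car_detector package; an illustrative version of the standard technique (keep the best-scoring box, discard boxes that overlap it too much, repeat) can be sketched as follows. This is not the book's exact code:

```python
import numpy as np

def non_max_suppression_fast(boxes, overlap_thresh):
    # boxes: array of [x1, y1, x2, y2, score]; keep the highest-scoring
    # box in each cluster of boxes that overlap more than overlap_thresh.
    if len(boxes) == 0:
        return np.array([])
    boxes = boxes.astype("float")
    x1, y1, x2, y2, scores = boxes.T
    area = (x2 - x1 + 1) * (y2 - y1 + 1)
    idxs = np.argsort(scores)          # ascending; the best box is last
    pick = []
    while len(idxs) > 0:
        last = idxs[-1]
        pick.append(last)
        # Intersection of the best remaining box with all the others
        xx1 = np.maximum(x1[last], x1[idxs[:-1]])
        yy1 = np.maximum(y1[last], y1[idxs[:-1]])
        xx2 = np.minimum(x2[last], x2[idxs[:-1]])
        yy2 = np.minimum(y2[last], y2[idxs[:-1]])
        w = np.maximum(0, xx2 - xx1 + 1)
        h = np.maximum(0, yy2 - yy1 + 1)
        overlap = (w * h) / area[idxs[:-1]]
        # Drop the picked box and everything overlapping it too much
        idxs = idxs[:-1][overlap <= overlap_thresh]
    return boxes[pick]

windows = np.array([[10, 10, 110, 50, 2.0],
                    [12, 12, 112, 52, 1.5],   # heavily overlaps the first
                    [200, 200, 300, 240, 1.0]])
boxes = non_max_suppression_fast(windows, 0.25)
print(len(boxes))  # 2
```

The second box is suppressed because it overlaps the higher-scoring first box by far more than 25 percent.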
Finally, we proceed with displaying all our results; for the sake of convenience, I've also printed the score obtained for all the remaining windows.
This is a remarkably accurate result!
A final note on SVM: you don't need to train a detector every time you want to use it, which would be extremely impractical. You can use the following code:
    svm.save('/path/to/serialized/svmxml')
You can subsequently reload it with the load method and feed it test images or frames.
Summary
In this chapter, we talked about numerous object detection concepts, such as HOG, BOW, and SVM, and some useful techniques, such as image pyramids, sliding windows, and non-maximum suppression.
We introduced the concept of machine learning and explored the various approaches used to train a custom detector, including how to create or obtain a training dataset and classify data. Finally, we put this knowledge to good use by creating a car detector from scratch and verifying its correct functioning.
All these concepts form the foundation of the next chapter, in which we will utilize object detection and classification techniques in the context of making videos, and learn how to track objects to retain information that can potentially be used for business or application purposes.
Chapter 8. Tracking Objects
In this chapter, we will explore the vast topic of object tracking, which is the process of locating a moving object in a movie or video feed from a camera. Real-time object tracking is a critical task in many computer vision applications, such as surveillance, perceptual user interfaces, augmented reality, object-based video compression, and driver assistance.
Tracking objects can be accomplished in several ways, with the optimal technique being largely dependent on the task at hand. We will learn how to identify moving objects and track them across frames.
Detecting moving objects
The first task that needs to be accomplished for us to be able to track anything in a video is to identify those regions of a video frame that correspond to moving objects.
There are many ways to track objects in a video, all of them fulfilling a slightly different purpose. For example, you may want to track anything that moves, in which case differences between frames are going to be of help; you may want to track a hand moving in a video, in which case Meanshift based on the color of the skin is the most appropriate solution; or you may want to track a particular object whose appearance you know, in which case techniques such as template matching will be of help.
Object tracking techniques can get quite complex, so let's explore them in ascending order of difficulty, starting from the simplest technique.
Basic motion detection
The first and most intuitive solution is to calculate the differences between frames, or between a frame considered the "background" and all the other frames.
Let's look at an example of this approach:
    import cv2
    import numpy as np

    camera = cv2.VideoCapture(0)

    es = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (9, 4))
    kernel = np.ones((5, 5), np.uint8)
    background = None

    while (True):
        ret, frame = camera.read()
        if background is None:
            background = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            background = cv2.GaussianBlur(background, (21, 21), 0)
            continue

        gray_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        gray_frame = cv2.GaussianBlur(gray_frame, (21, 21), 0)

        diff = cv2.absdiff(background, gray_frame)
        diff = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)[1]
        diff = cv2.dilate(diff, es, iterations=2)

        image, cnts, hierarchy = cv2.findContours(diff.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        for c in cnts:
            if cv2.contourArea(c) < 1500:
                continue
            (x, y, w, h) = cv2.boundingRect(c)
            cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)

        cv2.imshow("contours", frame)
        cv2.imshow("dif", diff)
        if cv2.waitKey(1000 / 12) & 0xff == ord("q"):
            break

    cv2.destroyAllWindows()
    camera.release()
After the necessary imports, we open the video feed obtained from the default system camera, and we set the first frame as the background of the entire feed. Each frame read from that point onward is processed to calculate the difference between the background and the frame itself. This is a trivial operation:
    diff = cv2.absdiff(background, gray_frame)
Before we get to do that, though, we need to prepare our frame for processing. The first thing we do is convert the frame to grayscale and blur it a bit:
    gray_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    gray_frame = cv2.GaussianBlur(gray_frame, (21, 21), 0)
Note
You may wonder about the blurring: the reason we blur the image is that every video feed contains natural noise, coming from vibrations, changes in lighting, and the camera itself. We want to smooth this noise out so that it doesn't get detected as motion and consequently tracked.
Now that our frame is grayscaled and smoothed, we can calculate the difference compared to the background (which has also been grayscaled and smoothed), and obtain a map of differences. This is not the only processing step, though. We're also going to apply a threshold, so as to obtain a black and white image, and dilate the image so that holes and imperfections get normalized, like so:
    diff = cv2.absdiff(background, gray_frame)
    diff = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)[1]
    diff = cv2.dilate(diff, es, iterations=2)
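The diff-and-threshold step can be illustrated with plain NumPy standing in for cv2.absdiff and cv2.threshold, which perform the same element-wise operations; the pixel values below are made up:

```python
import numpy as np

# Two tiny 8-bit grayscale "frames": the background and the current frame,
# where one region has changed by more than the threshold of 25.
background = np.array([[10, 10, 10],
                       [10, 10, 10]], np.uint8)
frame = np.array([[12, 10, 10],
                  [10, 200, 180]], np.uint8)

# Absolute per-pixel difference (what cv2.absdiff computes); cast to a
# wider type first so the subtraction cannot wrap around.
diff = np.abs(background.astype(np.int16) - frame.astype(np.int16)).astype(np.uint8)

# Binary threshold (what cv2.threshold with THRESH_BINARY does):
# pixels that changed by more than 25 become 255, the rest become 0.
mask = np.where(diff > 25, 255, 0).astype(np.uint8)
print(mask)
```

Only the region that genuinely changed survives the threshold; the 2-level flicker in the top-left pixel is discarded as noise.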
Note that eroding and dilating can also act as a noise filter, much like the blurring we applied, and that both steps can be obtained in one function call using cv2.morphologyEx; we show the steps explicitly for transparency purposes. All that is left to do at this point is to find the contours of all the white blobs in the calculated difference map, and display them. Optionally, we only display contours with bounding rectangles greater than an arbitrary area threshold, so tiny movements are not displayed. Naturally, this is up to you and your application needs; with constant lighting and a very noiseless camera, you may wish to have no threshold on the minimum size of the contours. This is how we display the rectangles:
    image, cnts, hierarchy = cv2.findContours(diff.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    for c in cnts:
        if cv2.contourArea(c) < 1500:
            continue
        (x, y, w, h) = cv2.boundingRect(c)
        cv2.rectangle(frame, (x, y), (x + w, y + h), (255, 255, 0), 2)
    cv2.imshow("contours", frame)
    cv2.imshow("dif", diff)
OpenCV offers two very handy functions here:
cv2.findContours: This function computes the contours of subjects in an image
cv2.boundingRect: This function calculates their bounding boxes
So there you have it: a basic motion detector with rectangles around subjects. The final result is something like this:
For such a simple technique, this is quite accurate. However, there are a few drawbacks that make this approach unsuitable for all business needs, most notably the fact that you need a first "default" frame to set as a background. In situations such as, for example, outdoor cameras, with light changing quite constantly, this process results in a rather inflexible approach, so we need a bit more intelligence in our system. That's where background subtractors come into play.
Background subtractors – KNN, MOG2, and GMG
OpenCV provides a class called BackgroundSubtractor, which is a handy way to operate foreground and background segmentation.
This works similarly to the GrabCut algorithm we analyzed in Chapter 3, Processing Images with OpenCV 3; however, BackgroundSubtractor is a fully fledged class with a plethora of methods that not only perform background subtraction, but also improve background detection over time through machine learning, and it lets you save the classifier to a file.
To familiarize ourselves with BackgroundSubtractor, let's look at a basic example:
    import numpy as np
    import cv2

    cap = cv2.VideoCapture(0)
    mog = cv2.createBackgroundSubtractorMOG2()

    while (1):
        ret, frame = cap.read()
        fgmask = mog.apply(frame)
        cv2.imshow('frame', fgmask)
        if cv2.waitKey(30) & 0xff == 27:
            break

    cap.release()
    cv2.destroyAllWindows()
Let's go through this in order. First of all, let's talk about the background subtractor object. There are three background subtractors available in OpenCV 3: K-Nearest Neighbors (KNN), Mixture of Gaussians (MOG2), and GMG (named after the initials of its authors, Godbehere, Matsukawa, and Goldberg), corresponding to the algorithms used to compute the background subtraction.
You may remember that we already elaborated on the topic of foreground and background detection in Chapter 5, Depth Estimation and Segmentation, in particular when we talked about GrabCut and Watershed.
So why do we need the BackgroundSubtractor classes? The main reason is that the BackgroundSubtractor classes are specifically built with video analysis in mind, which means that the OpenCV BackgroundSubtractor classes "learn" something about the environment with every frame. For example, with GMG, you can specify the number of frames used to initialize the video analysis, with the default being 120 (roughly 5 seconds with average cameras). The constant aspect of the BackgroundSubtractor classes is that they operate a comparison between frames and store a history, which allows them to improve motion analysis results as time passes.
Another fundamental (and, frankly, quite amazing) feature of the BackgroundSubtractor classes is the ability to detect shadows. This is absolutely vital for an accurate reading of video frames; by detecting shadows, you can exclude shadow areas (by thresholding them) from the objects you detected, and concentrate on the real features. It also greatly reduces the unwanted "merging" of objects. An image comparison will give you a good idea of the concept I'm trying to illustrate. Here's a sample of background subtraction without shadow detection:
Here's an example with shadow detection (with shadows thresholded):
Note that shadow detection isn't absolutely perfect, but it helps bring the object contours back to the object's original shape. Let's take a look at a reimplemented example of motion detection utilizing BackgroundSubtractorKNN:
    import cv2
    import numpy as np

    bs = cv2.createBackgroundSubtractorKNN(detectShadows=True)
    camera = cv2.VideoCapture("/path/to/movie.flv")

    while True:
        ret, frame = camera.read()
        fgmask = bs.apply(frame)
        th = cv2.threshold(fgmask.copy(), 244, 255, cv2.THRESH_BINARY)[1]
        dilated = cv2.dilate(th, cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3)), iterations=2)
        image, contours, hier = cv2.findContours(dilated, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        for c in contours:
            if cv2.contourArea(c) > 1600:
                (x, y, w, h) = cv2.boundingRect(c)
                cv2.rectangle(frame, (x, y), (x + w, y + h), (255, 255, 0), 2)

        cv2.imshow("mog", fgmask)
        cv2.imshow("thresh", th)
        cv2.imshow("detection", frame)
        if cv2.waitKey(30) & 0xff == 27:
            break

    camera.release()
    cv2.destroyAllWindows()
As a result of the accuracy of the subtractor, and its ability to detect shadows, we obtain really precise motion detection, in which even objects that are next to each other don't get merged into one detection, as shown in the following screenshot:
That's a remarkable result for fewer than 30 lines of code!
The core of the entire program is the apply() method of the background subtractor; it computes a foreground mask, which can be used as a basis for the rest of the processing:
    fgmask = bs.apply(frame)
    th = cv2.threshold(fgmask.copy(), 244, 255, cv2.THRESH_BINARY)[1]
    dilated = cv2.dilate(th, cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3)), iterations=2)
    image, contours, hier = cv2.findContours(dilated, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    for c in contours:
        if cv2.contourArea(c) > 1600:
            (x, y, w, h) = cv2.boundingRect(c)
            cv2.rectangle(frame, (x, y), (x + w, y + h), (255, 255, 0), 2)
Once a foreground mask is obtained, we can apply a threshold: the foreground mask has white values (255) for the foreground and gray values for shadows; thus, in the thresholded image, all pixels that are not almost pure white (244-255) are binarized to 0 instead of 255.
From there, we proceed with the same approach we adopted for the basic motion detection example: identifying objects, detecting contours, and drawing them on the original frame.
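The shadow-thresholding step can be mimicked with NumPy alone (cv2.threshold performs the same per-pixel comparison); the mask values below are hypothetical, with 127 standing for the gray value the subtractor assigns to shadow pixels:

```python
import numpy as np

# A hypothetical foreground mask: 255 = foreground, 127 = shadow, 0 = background
fgmask = np.array([[0, 127, 255],
                   [127, 255, 244]], np.uint8)

# Equivalent of cv2.threshold(fgmask, 244, 255, cv2.THRESH_BINARY):
# only pixels strictly above 244 survive, so shadow pixels are discarded.
th = np.where(fgmask > 244, 255, 0).astype(np.uint8)
print(th)
```

The shadow pixels (127) and the borderline 244 are zeroed out, leaving only the confident foreground.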
Meanshift and CAMShift
Background subtraction is a really effective technique, but not the only one available to track objects in a video. Meanshift is an algorithm that tracks objects by finding the maximum density of a discrete sample of a probability function (in our case, a region of interest in an image) and recalculating it for the next frame, which gives the algorithm an indication of the direction in which the object has moved.
This calculation gets repeated until the centroid matches the original one, or remains unaltered even after consecutive iterations of the calculation. This final matching is called convergence. For reference, the algorithm was first described in the paper, The estimation of the gradient of a density function, with applications in pattern recognition, Fukunaga K. and Hostetler L., IEEE, 1975, which is available at http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=1055330&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D1055330 (note that this paper is not free for download).
Here's a visual representation of this process:
Theory aside, Meanshift is very useful when tracking a particular region of interest in a video, and this has a series of implications; for example, if you don't know a priori what the region you want to track is, you're going to have to manage this cleverly and develop programs that dynamically start tracking (and cease tracking) certain areas of the video, depending on arbitrary criteria. One example could be that you operate object detection with a trained SVM, and then start using Meanshift to track a detected object.
Let's not make our life complicated from the very beginning, though; let's first get familiar with Meanshift, and then use it in more complex scenarios.
We will start by simply marking a region of interest and keeping track of it, like so:
    import numpy as np
    import cv2

    cap = cv2.VideoCapture(0)
    ret, frame = cap.read()
    r, h, c, w = 10, 200, 10, 200
    track_window = (c, r, w, h)

    roi = frame[r:r+h, c:c+w]
    hsv_roi = cv2.cvtColor(roi, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv_roi, np.array((100., 30., 32.)), np.array((180., 120., 255.)))
    roi_hist = cv2.calcHist([hsv_roi], [0], mask, [180], [0, 180])
    cv2.normalize(roi_hist, roi_hist, 0, 255, cv2.NORM_MINMAX)

    term_crit = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)

    while True:
        ret, frame = cap.read()
        if ret == True:
            hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
            dst = cv2.calcBackProject([hsv], [0], roi_hist, [0, 180], 1)
            # apply meanshift to get the new location
            ret, track_window = cv2.meanShift(dst, track_window, term_crit)
            # Draw it on the image
            x, y, w, h = track_window
            img2 = cv2.rectangle(frame, (x, y), (x+w, y+h), 255, 2)
            cv2.imshow('img2', img2)
            k = cv2.waitKey(60) & 0xff
            if k == 27:
                break
        else:
            break

    cv2.destroyAllWindows()
    cap.release()
In the preceding code, I supplied the HSV values for tracking some shades of lilac, and here's the result:
If you ran the code on your machine, you'd notice how the Meanshift window actually looks for the specified color range; if it doesn't find it, you'll just see the window wobbling (it actually looks a bit impatient). If an object with the specified color range enters the window, the window will then start tracking it.
Let's examine the code so that we can fully understand how Meanshift performs this tracking operation.
Color histograms
Before showing the code for the preceding example, though, here is a not-so-brief digression on color histograms and two very important built-in functions of OpenCV: calcHist and calcBackProject.
The calcHist function calculates the color histograms of an image, so the next logical step is to explain the concept of color histograms. A color histogram is a representation of the color distribution of an image. On the x axis of the representation, we have the color values, and on the y axis, we have the number of pixels corresponding to each color value.
Let's look at a visual representation of this concept, hoping the adage, "a picture speaks a thousand words", will apply in this instance too:
The picture shows a representation of a color histogram with one column per value from 0 to 180 (note that OpenCV uses H values of 0-180; other systems may use 0-360 or 0-255).
Aside from Meanshift, color histograms are used for a number of different and useful image and video processing operations.
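To make the concept concrete, a single-channel hue histogram can be computed with plain NumPy; np.bincount here plays the role of cv2.calcHist on one channel, and the pixel values are made up:

```python
import numpy as np

# A made-up 1-channel "hue" image with values in OpenCV's 0-180 range
hue = np.array([[0, 0, 90],
                [90, 90, 179]], np.uint8)

# One bin per possible hue value: hist[v] = number of pixels with hue v
hist = np.bincount(hue.ravel(), minlength=180)
print(hist[0], hist[90], hist[179])  # 2 3 1
```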
The calcHist function
The calcHist() function in OpenCV has the following Python signature:
    calcHist(images, channels, mask, histSize, ranges[, hist[, accumulate]]) -> hist
The descriptions of the parameters (as taken from the official OpenCV documentation) are as follows:
images: This parameter is the source arrays. They should all have the same depth, CV_8U or CV_32F, and the same size. Each of them can have an arbitrary number of channels.
channels: This parameter is the list of the dims channels used to compute the histogram.
mask: This parameter is the optional mask. If the matrix is not empty, it must be an 8-bit array of the same size as images[i]. The nonzero mask elements mark the array elements counted in the histogram.
histSize: This parameter is the array of histogram sizes in each dimension.
ranges: This parameter is the array of the dims arrays of the histogram bin boundaries in each dimension.
hist: This parameter is the output histogram, which is a dense or sparse dims (dimensional) array.
accumulate: This parameter is the accumulation flag. If it is set, the histogram is not cleared in the beginning when it is allocated. This feature enables you to compute a single histogram from several sets of arrays, or to update the histogram over time.
In our example, we calculate the histogram of the region of interest like so:
    roi_hist = cv2.calcHist([hsv_roi], [0], mask, [180], [0, 180])
This can be interpreted as the calculation of a color histogram for an array of images containing only the region of interest in the HSV space. In this region, we compute only the image values corresponding to the mask values not equal to 0, with 180 histogram bins, and with the histogram having 0 as the lower boundary and 180 as the upper boundary.
This is rather convoluted to describe but, once you have familiarized yourself with the concept of a histogram, the pieces of the puzzle should click into place.
The calcBackProject function
The other function that plays a vital role in the Meanshift algorithm (but not only there) is calcBackProject, which is short for histogram back projection (calculation). A histogram back projection is so called because it takes a histogram and projects it back onto an image, with the result being the probability that each pixel belongs to the image that generated the histogram in the first place. Therefore, calcBackProject gives a probability estimation that a certain image is equal or similar to a model image (from which the original histogram was generated).
Again, if you thought calcHist was a bit convoluted, calcBackProject is probably even more complex!
In summary
The calcHist function extracts a color histogram from an image, giving a statistical representation of the colors in the image, and calcBackProject helps in calculating the probability of each pixel of an image belonging to the original image.
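The back projection idea can be sketched with NumPy: look up each pixel's hue in the (normalized) ROI histogram, so pixels whose colors were frequent in the ROI receive high scores. This stands in for cv2.calcBackProject on a single channel, and the values below are made up:

```python
import numpy as np

# Normalized hue histogram of a hypothetical ROI: the tracked object
# was mostly hue 90, with a little hue 0.
roi_hist = np.zeros(180, np.float32)
roi_hist[90] = 255.0
roi_hist[0] = 60.0

# A made-up hue plane of the current frame
hue = np.array([[90, 90, 0],
                [45, 90, 90]], np.uint8)

# Back projection: replace each pixel by its histogram value,
# i.e. a score of how "ROI-like" that pixel's color is.
dst = roi_hist[hue]
print(dst)
```

Pixels with hue 90 score highest, hue 0 scores moderately, and hue 45 (absent from the ROI) scores zero; Meanshift then climbs toward the densest region of this score map.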
Back to the code
Let's get back to our example. First come our usual imports, and then we mark the initial region of interest:
    cap = cv2.VideoCapture(0)
    ret, frame = cap.read()
    r, h, c, w = 10, 200, 10, 200
    track_window = (c, r, w, h)
Then, we extract the ROI and convert it to the HSV color space:
    roi = frame[r:r+h, c:c+w]
    hsv_roi = cv2.cvtColor(roi, cv2.COLOR_BGR2HSV)
Now, we create a mask to include all pixels of the ROI with HSV values between the lower and upper bounds:
    mask = cv2.inRange(hsv_roi, np.array((100., 30., 32.)), np.array((180., 120., 255.)))
Next, we calculate the histogram of the ROI:
    roi_hist = cv2.calcHist([hsv_roi], [0], mask, [180], [0, 180])
    cv2.normalize(roi_hist, roi_hist, 0, 255, cv2.NORM_MINMAX)
After the histogram is calculated, its values are normalized to fall within the range 0-255.
Meanshift performs a number of iterations before reaching convergence; however, this convergence is not assured. So, OpenCV allows us to pass so-called termination criteria, which are a way to specify the behavior of Meanshift with regard to terminating the series of calculations:
    term_crit = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)
In this particular case, we're specifying a behavior that instructs Meanshift to stop calculating the centroid shift after ten iterations, or when the centroid has moved less than 1 pixel. The first flag (EPS or CRITERIA_COUNT) indicates that we're going to use either of the two criteria (count or "epsilon", meaning the minimum movement).
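The core iteration behind cv2.meanShift can be sketched in one dimension with NumPy: repeatedly move the window to the weighted centroid of the back-projection values under it, stopping after a fixed number of iterations or when the shift drops below epsilon. This is an illustrative toy, not OpenCV's implementation:

```python
import numpy as np

def mean_shift_1d(weights, start, width, max_iter=10, eps=1.0):
    # Slide a window of `width` over a 1-D weight map (a back projection),
    # moving it to the weighted centroid until convergence.
    x = start
    for _ in range(max_iter):
        window = weights[x:x + width]
        if window.sum() == 0:
            break
        centroid = np.average(np.arange(x, x + width), weights=window)
        new_x = int(round(centroid - width / 2.0))
        if abs(new_x - x) < eps:   # epsilon criterion: the shift is tiny
            break
        x = new_x
    return x

# A back projection with all the probability mass around index 14
weights = np.zeros(30)
weights[12:17] = 1.0
print(mean_shift_1d(weights, start=8, width=10))  # 9
```

Starting at 8, the window shifts once so that it is centered on the mass at index 14, then the next shift is zero and the epsilon criterion stops the loop.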
Now that we have a histogram calculated, and termination criteria for Meanshift, we can start our usual infinite loop, grab the current frame from the camera, and start processing it. The first thing we do is switch to the HSV color space:
    if ret == True:
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
Now that we have an HSV array, we can perform the long-awaited histogram back projection:
    dst = cv2.calcBackProject([hsv], [0], roi_hist, [0, 180], 1)
The result of calcBackProject is a matrix. If you printed it to the console, it would look more or less like this:
    [[  0   0   0 ...,   0   0   0]
     [  0   0   0 ...,   0   0   0]
     [  0   0   0 ...,   0   0   0]
     ...,
     [  0   0  20 ...,   0   0   0]
     [ 78  20   0 ...,   0   0   0]
     [255 137  20 ...,   0   0   0]]
Each pixel is represented by its probability score.
This matrix can then finally be passed into Meanshift, together with the track window and the termination criteria, as outlined by the Python signature of cv2.meanShift:
    meanShift(probImage, window, criteria) -> retval, window
So here it is:
    ret, track_window = cv2.meanShift(dst, track_window, term_crit)
Finally, we calculate the new coordinates of the window, draw a rectangle to display it in the frame, and then show it:
    x, y, w, h = track_window
    img2 = cv2.rectangle(frame, (x, y), (x+w, y+h), 255, 2)
    cv2.imshow('img2', img2)
That's it. You should by now have a good idea of color histograms, back projections, and Meanshift. However, there remains one issue to be resolved with the preceding program: the size of the window does not change with the size of the object being tracked in the frames.
Gary Bradski, one of the authorities in computer vision and author of the seminal book Learning OpenCV (O'Reilly), published a paper in 1998 to improve the accuracy of Meanshift, describing a new algorithm called Continuously Adaptive Meanshift (CAMShift), which is very similar to Meanshift but also adapts the size of the track window when Meanshift reaches convergence.
CAMShift
While CAMShift adds complexity to Meanshift, the implementation of the preceding program using CAMShift is surprisingly (or not?) similar to the Meanshift example, with the main difference being that, after the call to CamShift, the rectangle is drawn with a particular rotation that follows the rotation of the object being tracked.
Here's the code reimplemented with CAMShift:
    import numpy as np
    import cv2

    cap = cv2.VideoCapture(0)
    # take the first frame of the video
    ret, frame = cap.read()
    # set up the initial location of the window
    r, h, c, w = 300, 200, 400, 300  # simply hardcoded values
    track_window = (c, r, w, h)

    roi = frame[r:r+h, c:c+w]
    hsv_roi = cv2.cvtColor(roi, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv_roi, np.array((100., 30., 32.)), np.array((180., 120., 255.)))
    roi_hist = cv2.calcHist([hsv_roi], [0], mask, [180], [0, 180])
    cv2.normalize(roi_hist, roi_hist, 0, 255, cv2.NORM_MINMAX)

    term_crit = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)

    while (1):
        ret, frame = cap.read()
        if ret == True:
            hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
            dst = cv2.calcBackProject([hsv], [0], roi_hist, [0, 180], 1)
            ret, track_window = cv2.CamShift(dst, track_window, term_crit)
            pts = cv2.boxPoints(ret)
            pts = np.int0(pts)
            img2 = cv2.polylines(frame, [pts], True, 255, 2)
            cv2.imshow('img2', img2)
            k = cv2.waitKey(60) & 0xff
            if k == 27:
                break
        else:
            break

    cv2.destroyAllWindows()
    cap.release()
The difference between the CAMShift code and the Meanshift one lies in these four lines:
    ret, track_window = cv2.CamShift(dst, track_window, term_crit)
    pts = cv2.boxPoints(ret)
    pts = np.int0(pts)
    img2 = cv2.polylines(frame, [pts], True, 255, 2)
The method signature of CamShift is identical to that of meanShift.
The boxPoints function finds the vertices of a rotated rectangle, while the polylines function draws the lines of the rectangle on the frame.
By now, you should be familiar with the three approaches we adopted for tracking objects: basic motion detection, Meanshift, and CAMShift.
Let's now explore another technique: the Kalman filter.
The Kalman filter
The Kalman filter is an algorithm mainly (but not exclusively) developed by Rudolf Kalman in the late 1950s, and it has found practical application in many fields, particularly navigation systems for all sorts of vehicles, from nuclear submarines to aircraft.
The Kalman filter operates recursively on streams of noisy input data (which, in computer vision, is normally a video feed) to produce a statistically optimal estimate of the underlying system state (the position inside the video).
Let's take a quick example to conceptualize the Kalman filter and translate the preceding (purposely broad and generic) definition into plainer English. Think of a small red ball on a table, and imagine you have a camera pointing at the scene. You identify the ball as the subject to be tracked, and flick it with your fingers. The ball will start rolling on the table, following the laws of motion we're familiar with.
If the ball is rolling at a speed of 1 meter per second (1 m/s) in a particular direction, you don't need the Kalman filter to estimate where the ball will be in 1 second's time: it will be 1 meter away. The Kalman filter applies these laws to predict an object's position in the current video frame based on observations gathered in the previous frames. Naturally, the Kalman filter cannot know about a pencil on the table deflecting the course of the ball, but it can adjust for this kind of unforeseeable event.
Predict and update
From the preceding description, we gather that the Kalman filter algorithm is divided into two phases:
Predict: In the first phase, the Kalman filter uses the covariance calculated up to the current point in time to estimate the object's new position
Update: In the second phase, it records the object's position and adjusts the covariance for the next cycle of calculations
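The two phases can be sketched for a one-dimensional, constant-position model with NumPy; the noise values are hypothetical and the state is a single scalar, whereas cv2.KalmanFilter handles full state vectors and matrices:

```python
import numpy as np

class Kalman1D(object):
    # Minimal scalar Kalman filter: state x (position) and its variance p.
    def __init__(self, q=1e-3, r=0.1):
        self.x, self.p = 0.0, 1.0
        self.q, self.r = q, r   # process and measurement noise (made up)

    def predict(self):
        # Predict phase: the model says the position stays put,
        # but our uncertainty grows by the process noise.
        self.p += self.q
        return self.x

    def correct(self, z):
        # Update phase: blend the prediction and the measurement z
        # according to the Kalman gain k, then shrink the uncertainty.
        k = self.p / (self.p + self.r)
        self.x += k * (z - self.x)
        self.p *= (1 - k)
        return self.x

kf = Kalman1D()
measurements = [1.1, 0.9, 1.05, 1.0]   # noisy readings of a true value of 1.0
for z in measurements:
    kf.predict()
    estimate = kf.correct(z)
print(estimate)
```

After a few predict/correct cycles, the estimate settles near the true value of 1.0 despite the noisy readings.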
This adjustment is, in OpenCV terms, a correction; hence, the API of the KalmanFilter class in the Python bindings of OpenCV is as follows:
    class KalmanFilter(__builtin__.object)
     |  Methods defined here:
     |
     |  __repr__(...)
     |      x.__repr__() <==> repr(x)
     |
     |  correct(...)
     |      correct(measurement) -> retval
     |
     |  predict(...)
     |      predict([, control]) -> retval
We can deduce that, in our programs, we will call predict() to estimate the position of an object, and correct() to instruct the Kalman filter to adjust its calculations.
An example
Ultimately, we will aim to use the Kalman filter in combination with CAMShift to obtain the highest degree of accuracy and performance. However, before we go into such levels of complexity, let's analyze a simple example, specifically one that seems to be very common on the Web when it comes to the Kalman filter and OpenCV: mouse tracking.
In the following example, we will draw an empty frame and two lines: one corresponding to the actual movement of the mouse, and the other corresponding to the Kalman filter's prediction. Here's the code:
    import cv2
    import numpy as np

    frame = np.zeros((800, 800, 3), np.uint8)
    last_measurement = current_measurement = np.zeros((2, 1), np.float32)
    last_prediction = current_prediction = np.zeros((2, 1), np.float32)

    def mousemove(event, x, y, s, p):
        global frame, current_measurement, last_measurement, current_prediction, last_prediction
        last_prediction = current_prediction
        last_measurement = current_measurement
        current_measurement = np.array([[np.float32(x)], [np.float32(y)]])
        kalman.correct(current_measurement)
        current_prediction = kalman.predict()
        lmx, lmy = last_measurement[0], last_measurement[1]
        cmx, cmy = current_measurement[0], current_measurement[1]
        lpx, lpy = last_prediction[0], last_prediction[1]
        cpx, cpy = current_prediction[0], current_prediction[1]
        cv2.line(frame, (int(lmx), int(lmy)), (int(cmx), int(cmy)), (0, 100, 0))
        cv2.line(frame, (int(lpx), int(lpy)), (int(cpx), int(cpy)), (0, 0, 200))

    cv2.namedWindow("kalman_tracker")
    cv2.setMouseCallback("kalman_tracker", mousemove)

    kalman = cv2.KalmanFilter(4, 2)
    kalman.measurementMatrix = np.array([[1, 0, 0, 0], [0, 1, 0, 0]], np.float32)
    kalman.transitionMatrix = np.array([[1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0], [0, 0, 0, 1]], np.float32)
    kalman.processNoiseCov = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]], np.float32) * 0.03

    while True:
        cv2.imshow("kalman_tracker", frame)
        if (cv2.waitKey(30) & 0xFF) == 27:
            break

    cv2.destroyAllWindows()
As usual, let's analyze it step by step. After the package imports, we create an empty frame of size 800x800, and then initialize the arrays that will hold the coordinates of the measurements and predictions of the mouse movements:
    frame = np.zeros((800, 800, 3), np.uint8)
    last_measurement = current_measurement = np.zeros((2, 1), np.float32)
    last_prediction = current_prediction = np.zeros((2, 1), np.float32)
Then, we declare the mousemove callback function, which is going to handle the drawing of the tracking. The mechanism is quite simple: we store the last measurement and last prediction, correct the Kalman filter with the current measurement, calculate the Kalman prediction, and finally draw two lines, one from the last measurement to the current one, and one from the last prediction to the current one:
    def mousemove(event, x, y, s, p):
        global frame, current_measurement, last_measurement, current_prediction, last_prediction
        last_prediction = current_prediction
        last_measurement = current_measurement
        current_measurement = np.array([[np.float32(x)], [np.float32(y)]])
        kalman.correct(current_measurement)
        current_prediction = kalman.predict()
        lmx, lmy = last_measurement[0], last_measurement[1]
        cmx, cmy = current_measurement[0], current_measurement[1]
        lpx, lpy = last_prediction[0], last_prediction[1]
        cpx, cpy = current_prediction[0], current_prediction[1]
        cv2.line(frame, (int(lmx), int(lmy)), (int(cmx), int(cmy)), (0, 100, 0))
        cv2.line(frame, (int(lpx), int(lpy)), (int(cpx), int(cpy)), (0, 0, 200))
The next step is to initialize the window and set the callback function. OpenCV handles mouse events with the setMouseCallback function; specific events must be handled using the callback's first parameter (event), which determines what kind of event has been triggered (click, move, and so on):
cv2.namedWindow("kalman_tracker")
cv2.setMouseCallback("kalman_tracker", mousemove)
Now we're ready to create the Kalman filter:
kalman = cv2.KalmanFilter(4, 2)
kalman.measurementMatrix = np.array([[1, 0, 0, 0], [0, 1, 0, 0]], np.float32)
kalman.transitionMatrix = np.array([[1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0],
    [0, 0, 0, 1]], np.float32)
kalman.processNoiseCov = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0],
    [0, 0, 0, 1]], np.float32) * 0.03
The KalmanFilter class takes optional parameters in its constructor (from the OpenCV documentation):
dynamParams: This parameter states the dimensionality of the state
measureParams: This parameter states the dimensionality of the measurement
controlParams: This parameter states the dimensionality of the control vector
type: This parameter states the type of the created matrices, which should be CV_32F or CV_64F
I found the preceding parameters (both for the constructor and the Kalman filter's properties) to work very well.
From this point on, the program is straightforward: every mouse movement triggers a Kalman prediction, and both the actual position of the mouse and the Kalman prediction are drawn in the frame, which is continuously displayed. If you move your mouse around, you'll notice that, if you make a sudden turn at high speed, the prediction line will follow a wider trajectory, which is consistent with the momentum of the mouse movement at the time. Here's a sample result:
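If you want to see the predict/correct cycle at work outside a GUI, the same constant-velocity filter can be sketched with NumPy alone. This is a minimal sketch, not OpenCV's exact implementation: the transition and measurement matrices mirror the ones set on cv2.KalmanFilter above, while the measurement-noise covariance R and the initial covariance P are assumed values chosen for illustration:

```python
import numpy as np

# Constant-velocity model: state = [x, y, vx, vy], measurement = [x, y].
F = np.array([[1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], np.float64)   # transitionMatrix
H = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0]], np.float64)   # measurementMatrix
Q = np.eye(4) * 0.03                       # processNoiseCov
R = np.eye(2)                              # measurement noise (assumed)

x = np.zeros(4)   # state estimate
P = np.eye(4)     # estimate covariance (assumed initial value)

def predict():
    global x, P
    x = F @ x
    P = F @ P @ F.T + Q
    return x[:2]

def correct(z):
    global x, P
    S = H @ P @ H.T + R                # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)     # Kalman gain
    x = x + K @ (np.asarray(z) - H @ x)
    P = (np.eye(4) - K @ H) @ P

# Feed a mouse-like trajectory moving steadily towards the lower right.
for t in range(1, 21):
    correct([10.0 * t, 5.0 * t])
    pred = predict()

print(pred)
```

After twenty steps of steady motion, the predicted position extrapolates slightly ahead of the last measurement of (200, 100), which is exactly the behavior that makes the prediction line overshoot on sudden turns.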
A real-life example – tracking pedestrians
Up to this point, we have familiarized ourselves with the concepts of motion detection, object detection, and object tracking, so I imagine you are anxious to put this newfound knowledge to good use in a real-life scenario. Let's do just that by examining the video feed of a surveillance camera and tracking pedestrians in it.
First of all, we need a sample video; if you download the OpenCV source, you will find the perfect video file for this purpose in <opencv_dir>/samples/data/768x576.avi.
Now that we have the perfect asset to analyze, let's start building the application.
The application workflow
The application will adhere to the following logic:
1. Examine the first frame.
2. Examine the following frames and perform background subtraction to identify the pedestrians that are in the scene at the start of the video.
3. Establish an ROI per pedestrian, and use Kalman/CAMShift to track it, giving an ID to each pedestrian.
4. Examine the next frames for new pedestrians entering the scene.
If this were a real-world application, you would probably store pedestrian information to derive statistics such as how long a pedestrian typically remains in the scene and their most likely routes. However, this is all beyond the remit of this example application.
In a real-world application, you would also make sure to identify new pedestrians entering the scene, but for now, we'll focus on tracking those objects that are in the scene at the start of the video, utilizing the CAMShift and Kalman filter algorithms.
You will find the code for this application in chapter8/surveillance_demo/ of the code repository.
A brief digression – functional versus object-oriented programming
Although most programmers are familiar with (or work constantly in) Object-oriented Programming (OOP), I have found that, the more the years pass, the more I prefer Functional Programming (FP) solutions.
For those not familiar with the terminology, FP is a programming paradigm, adopted by many languages, that treats programs as the evaluation of mathematical functions, allows functions to return functions, and permits functions as arguments to other functions. The strength of FP lies not only in what it can do, but also in what it avoids, or aims to avoid: side effects and changing state. If the topic of functional programming has sparked an interest, make sure to check out languages such as Haskell, Clojure, or ML.
Note
What is a side effect in programming terms? You can define a side effect as any change a function makes to state that does not depend on the function's input. Python, along with many other languages, is susceptible to causing side effects because—much like, for example, JavaScript—it allows access to global variables (and sometimes this access to global variables can be accidental!).
Another major issue encountered with languages that are not purely functional is the fact that a function's result will change over time, depending on the state of the variables involved. If a function takes an object as an argument—for example—and the computation relies on the internal state of that object, the function will return different results according to the changes in the object's state. This is something that very typically happens in languages such as C and C++, in functions where one or more of the arguments are references to objects.
Why this digression? Because so far I have illustrated concepts using mostly functions; I did not shy away from accessing global variables where this was the simplest and most robust approach. However, the next program we will examine contains OOP. So why do I choose to adopt OOP while advocating FP? Because OpenCV has quite an opinionated approach, which makes it hard to implement a program in a purely functional or purely object-oriented style.
For example, any drawing function, such as cv2.rectangle and cv2.circle, modifies the argument passed into it. This approach contravenes one of the cardinal rules of functional programming, which is to avoid side effects and changing state.
Out of curiosity, you could—in Python—redeclare the API of these drawing functions in a way that is more FP-friendly. For example, you could rewrite cv2.rectangle like this:
def drawRect(frame, topLeft, bottomRight, color, thickness, fill=cv2.LINE_AA):
    newframe = frame.copy()
    cv2.rectangle(newframe, topLeft, bottomRight, color, thickness, fill)
    return newframe
This approach—while computationally more expensive due to the copy() operation—allows the explicit reassignment of a frame, like so:
ret, frame = camera.read()
frame = drawRect(frame, (0, 0), (10, 10), (0, 255, 0), 1)
To conclude this digression, I will reiterate a belief very often voiced in programming forums and resources: there is no such thing as the best language or paradigm, only the best tool for the job at hand.
So let's get back to our program and explore the implementation of a surveillance application, tracking moving objects in a video.
The Pedestrian class
The main rationale behind the creation of a Pedestrian class is the nature of the Kalman filter. The Kalman filter can predict the position of an object based on historical observations and correct the prediction based on the actual data, but it can only do that for one object.
As a consequence, we need one Kalman filter per tracked object.
So the Pedestrian class will act as a holder for a Kalman filter, a color histogram (calculated on the first detection of the object and used as a reference for the subsequent frames), and information about the region of interest, which will be used by the CAMShift algorithm (the track_window parameter).
Furthermore, we store the ID of each pedestrian for some fancy real-time info.
Let's take a look at the Pedestrian class:
class Pedestrian():
    """Pedestrian class

    each pedestrian is composed of a ROI, an ID and a Kalman filter
    so we create a Pedestrian class to hold the object state
    """
    def __init__(self, id, frame, track_window):
        """init the pedestrian object with track window coordinates"""
        # set up the roi
        self.id = int(id)
        x, y, w, h = track_window
        self.track_window = track_window
        self.roi = cv2.cvtColor(frame[y:y+h, x:x+w], cv2.COLOR_BGR2HSV)
        roi_hist = cv2.calcHist([self.roi], [0], None, [16], [0, 180])
        self.roi_hist = cv2.normalize(roi_hist, roi_hist, 0, 255,
            cv2.NORM_MINMAX)

        # set up the kalman
        self.kalman = cv2.KalmanFilter(4, 2)
        self.kalman.measurementMatrix = np.array([[1, 0, 0, 0],
            [0, 1, 0, 0]], np.float32)
        self.kalman.transitionMatrix = np.array([[1, 0, 1, 0], [0, 1, 0, 1],
            [0, 0, 1, 0], [0, 0, 0, 1]], np.float32)
        self.kalman.processNoiseCov = np.array([[1, 0, 0, 0], [0, 1, 0, 0],
            [0, 0, 1, 0], [0, 0, 0, 1]], np.float32) * 0.03
        self.measurement = np.zeros((2, 1), np.float32)
        self.prediction = np.zeros((2, 1), np.float32)
        self.term_crit = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT,
            10, 1)
        self.center = None
        self.update(frame)

    def __del__(self):
        print "Pedestrian %d destroyed" % self.id

    def update(self, frame):
        # print "updating %d" % self.id
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        back_project = cv2.calcBackProject([hsv], [0], self.roi_hist,
            [0, 180], 1)

        if args.get("algorithm") == "c":
            ret, self.track_window = cv2.CamShift(back_project,
                self.track_window, self.term_crit)
            pts = cv2.boxPoints(ret)
            pts = np.int0(pts)
            self.center = center(pts)
            cv2.polylines(frame, [pts], True, 255, 1)

        if not args.get("algorithm") or args.get("algorithm") == "m":
            ret, self.track_window = cv2.meanShift(back_project,
                self.track_window, self.term_crit)
            x, y, w, h = self.track_window
            self.center = center([[x, y], [x+w, y], [x, y+h], [x+w, y+h]])
            cv2.rectangle(frame, (x, y), (x+w, y+h), (255, 255, 0), 1)

        self.kalman.correct(self.center)
        prediction = self.kalman.predict()
        cv2.circle(frame, (int(prediction[0]), int(prediction[1])), 4,
            (0, 255, 0), -1)

        # fake shadow
        cv2.putText(frame, "ID: %d -> %s" % (self.id, self.center),
            (11, (self.id + 1) * 25 + 1),
            font, 0.6,
            (0, 0, 0),
            1,
            cv2.LINE_AA)
        # actual info
        cv2.putText(frame, "ID: %d -> %s" % (self.id, self.center),
            (10, (self.id + 1) * 25),
            font, 0.6,
            (0, 255, 0),
            1,
            cv2.LINE_AA)
At the core of the program lies the background subtractor object, which lets us identify regions of interest corresponding to moving objects.
When the program starts, we take each of these regions and instantiate a Pedestrian object, passing in the ID (a simple counter), the frame, and the track window coordinates (so we can extract the Region of Interest (ROI) and, from this, the HSV histogram of the ROI).
The constructor function (__init__ in Python) is more or less an aggregation of all the previous concepts: given an ROI, we calculate its histogram, set up a Kalman filter, and assign it to a property (self.kalman) of the object.
In the update method, we pass in the current frame and convert it to HSV so that we can calculate the back projection of the pedestrian's HSV histogram.
We then use either CAMShift or Meanshift (depending on the argument passed; Meanshift is the default if no argument is passed) to track the movement of the pedestrian, and correct the Kalman filter for that pedestrian with the actual position.
We also draw both CAMShift/Meanshift (with a surrounding rectangle) and Kalman (with a dot), so you can observe that Kalman and CAMShift/Meanshift go nearly hand in hand, except for sudden movements that cause Kalman to readjust.
Lastly, we print some pedestrian information in the top-left corner of the image.
The main program
Now that we have a Pedestrian class holding all the specific information for each object, let's take a look at the main function of the program.
First, we load a video (it could be a webcam), and then we initialize a background subtractor, setting 20 frames as its history (the frames affecting the background model):
history = 20
bs = cv2.createBackgroundSubtractorKNN(detectShadows=True)
bs.setHistory(history)
We also create the main display window, and then set up a pedestrians dictionary and a firstFrame flag, which we're going to use to allow a few frames for the background subtractor to build its history, so it can better identify moving objects. To help with this, we also set up a frame counter:
cv2.namedWindow("surveillance")
pedestrians = {}
firstFrame = True
frames = 0
Now we start the loop. We read the camera frames (or video frames) one by one:
while True:
    print "-------------------- FRAME %d --------------------" % frames
    grabbed, frame = camera.read()
    if (grabbed is False):
        print "failed to grab frame."
        break
We let BackgroundSubtractorKNN build the history for the background model, so we don't actually process the first 20 frames; we only pass them into the subtractor:
    fgmask = bs.apply(frame)

    # this is just to let the background subtractor build a bit of history
    if frames < history:
        frames += 1
        continue
Then we process the frame with the approach explained earlier in the chapter, applying dilation and erosion to the foreground mask so as to obtain easily identifiable blobs and their bounding boxes. These are obviously the moving objects in the frame:
    th = cv2.threshold(fgmask.copy(), 127, 255, cv2.THRESH_BINARY)[1]
    th = cv2.erode(th, cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3)),
        iterations=2)
    dilated = cv2.dilate(th, cv2.getStructuringElement(cv2.MORPH_ELLIPSE,
        (8, 3)), iterations=2)
    image, contours, hier = cv2.findContours(dilated, cv2.RETR_EXTERNAL,
        cv2.CHAIN_APPROX_SIMPLE)
Once the contours are identified, we instantiate one pedestrian per contour for the first frame only (note that I set a minimum area for the contour to further denoise our detection):
    counter = 0
    for c in contours:
        if cv2.contourArea(c) > 500:
            (x, y, w, h) = cv2.boundingRect(c)
            cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 1)
            # only create pedestrians in the first frame, then just
            # follow the ones you have
            if firstFrame is True:
                pedestrians[counter] = Pedestrian(counter, frame,
                    (x, y, w, h))
            counter += 1
Then, for each pedestrian detected, we call the update method, passing in the current frame, which is needed in its original color space, because the pedestrian objects are responsible for drawing their own information (text, Meanshift/CAMShift rectangles, and Kalman filter tracking):
    for i, p in pedestrians.iteritems():
        p.update(frame)
We set the firstFrame flag to False, so we don't instantiate any more pedestrians; we just keep track of the ones we have:
    firstFrame = False
    frames += 1
Finally, we show the result in the display window. The program can be exited by pressing the Esc key:
    cv2.imshow("surveillance", frame)
    if cv2.waitKey(110) & 0xff == 27:
        break

if __name__ == "__main__":
    main()
There you have it: CAMShift/Meanshift working in tandem with the Kalman filter to track moving objects. All being well, you should obtain a result similar to this:
In this screenshot, the blue rectangle is the CAMShift detection and the green rectangle is the Kalman filter prediction, with its center at the blue circle.
Where do we go from here?
This program constitutes a basis for your application domain's needs. There are many improvements that can be made by building on this program to suit an application's additional requirements. Consider the following examples:
You could destroy a pedestrian object if Kalman predicts its position to be outside the frame
You could check whether each detected moving object corresponds to an existing pedestrian instance, and, if not, create an instance for it
You could train an SVM and run classification on each moving object to establish whether or not the moving object is of the nature you intend to track (for instance, a dog might enter the scene, but your application requires tracking humans only)
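The first of these suggestions only needs a bounds check on the Kalman prediction. Here is a minimal sketch (the function name and the 800x600 frame size are hypothetical, chosen for illustration only):

```python
def prediction_out_of_frame(prediction, frame_width, frame_height):
    """Return True when a predicted (x, y) position falls outside the frame,
    in which case the corresponding Pedestrian instance could be destroyed."""
    x, y = float(prediction[0]), float(prediction[1])
    return x < 0 or y < 0 or x >= frame_width or y >= frame_height

# A pedestrian predicted at x = -12 has presumably left the scene:
print(prediction_out_of_frame((-12.0, 240.0), 800, 600))  # True
# One predicted inside the bounds is kept:
print(prediction_out_of_frame((400.0, 300.0), 800, 600))  # False
```

In the main loop, you could then iterate over the pedestrians dictionary and drop the entries for which this check returns True.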
Whatever your needs, hopefully this chapter will have provided you with the necessary knowledge to build applications that satisfy your requirements.
Summary
This chapter explored the vast and complex topic of video analysis and tracking objects.
We learned about video background subtraction with a basic motion detection technique that calculates frame differences, and then moved on to more complex and efficient tools such as BackgroundSubtractor.
We then explored two very important video analysis algorithms: Meanshift and CAMShift. In the course of this, we talked in detail about color histograms and back projections. We also familiarized ourselves with the Kalman filter and its usefulness in a computer vision context. Finally, we put all our knowledge together in a sample surveillance application, which tracks moving objects in a video.
Now that our foundation in OpenCV and machine learning is solidifying, we are ready to tackle artificial neural networks and dive deeper into artificial intelligence with OpenCV and Python in the next chapter.
Chapter 9. Neural Networks with OpenCV – an Introduction
Machine learning is a branch of artificial intelligence, one that deals specifically with algorithms that enable a machine to recognize patterns and trends in data and successfully make predictions and classifications.
Many of the algorithms and techniques used by OpenCV to accomplish some of the more advanced tasks in computer vision are directly related to artificial intelligence and machine learning.
This chapter will introduce you to Machine Learning concepts in OpenCV, such as artificial neural networks. It is a gentle introduction that barely scratches the surface of the vast and continuously evolving world of Machine Learning.
Artificial neural networks
Let's start by defining Artificial Neural Networks (ANN) with a number of logical steps, rather than with a classic monolithic sentence using obscure jargon with an even more obscure meaning.
First of all, an ANN is a statistical model. What is a statistical model? A statistical model is a pair of elements, namely the space S (a set of observations) and the probability P, where P is a distribution that approximates S (in other words, a function that would generate a set of observations very similar to S).
I like to think of P in two ways: as a simplification of a complex scenario, and as the function that generated S in the first place, or at the very least a set of observations very similar to S.
So ANNs are models that take a complex reality, simplify it, and deduce a function to (approximately) represent, in mathematical form, the statistical observations one would expect from that reality.
The next step in our journey towards comprehending ANNs is to understand how an ANN improves on the concept of a simple statistical model.
What if the function that generated the dataset is likely to take a large number of (unknown) inputs?
The approach that ANNs take is to delegate work to a number of neurons, nodes, or units, each of which is capable of "approximating" the function that created the inputs. In mathematics, approximation means defining a simpler function that approximates a more complex one, which enables us to define errors (relative to the application domain). Furthermore, for accuracy's sake, a network is generally recognized to be neural if its neurons or units are capable of approximating a nonlinear function.
Let's take a closer look at neurons.
Neurons and perceptrons
The perceptron is a concept that dates back to the 1950s, and (to put it simply) a perceptron is a function that takes a number of inputs and produces a single value.
Each of the inputs has an associated weight that signifies the importance of the input in the function. The sigmoid function produces a single value:
Here, sigmoid indicates that the function produces either a 0 or a 1 value. The discriminant is a threshold value: if the weighted sum of the inputs is greater than a certain threshold, the perceptron produces a binary classification of 1, otherwise 0.
How are these weights determined, and what do they represent?
Neurons are interconnected with each other, and each neuron's set of weights (these are just numerical parameters) defines the strength of its connections to other neurons. These weights are "adaptive", meaning they change over time according to a learning algorithm.
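The description above can be condensed into a few lines of plain Python. The weights and the threshold below are arbitrary values chosen for illustration; in a real network, they would be learned:

```python
def perceptron(inputs, weights, threshold):
    """A threshold unit: output 1 if the weighted sum of the
    inputs exceeds the threshold, 0 otherwise."""
    weighted_sum = sum(i * w for i, w in zip(inputs, weights))
    return 1 if weighted_sum > threshold else 0

weights = [0.5, -0.6, 0.2]   # one weight per input, signifying its importance

print(perceptron([1.0, 0.0, 1.0], weights, 0.5))  # 0.7 > 0.5, so 1
print(perceptron([0.0, 1.0, 1.0], weights, 0.5))  # -0.4 <= 0.5, so 0
```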
The structure of an ANN
Here's a visual representation of a neural network:
As you can see from the figure, there are three distinct layers in a neural network: the Input layer, the Hidden (or middle) layer, and the Output layer.
There can be more than one hidden layer; however, one hidden layer is enough to resolve the majority of real-life problems.
Network layers by example
How do we determine the network's topology, and how many neurons to create for each layer? Let's make this determination layer by layer.
The input layer
The input layer defines the number of inputs into the network. For example, let's say you want to create an ANN that will help you determine what animal you're looking at, given a description of its attributes. Let's fix these attributes to weight, length, and teeth. That's a set of three attributes; our network will need to contain three input nodes.
The output layer
The size of the output layer is equal to the number of classes we identified. Continuing with the preceding example of an animal classification network, we will arbitrarily set the output layer to 4, because we know we're going to deal with the following animals: dog, condor, dolphin, and dragon. If we feed in data for an animal that is not in one of these categories, the network will return the class most likely to resemble this unclassified animal.
The hidden layer
The hidden layer contains the perceptrons. As mentioned, the vast majority of problems only require one hidden layer; mathematically speaking, there is no verified reason to have more than two hidden layers. We will, therefore, stick to one hidden layer and work with that.
There are a number of rules of thumb to determine the number of neurons contained in the hidden layer, but there is no hard-and-fast rule. The empirical way is your friend in this particular circumstance: test your network with different settings, and choose the one that fits best.
These are some of the most common rules used when building an ANN:
The number of hidden neurons should be between the size of the input layer and the size of the output layer. If the difference between the input layer size and the output layer size is large, it is my experience that a hidden layer size much closer to that of the output layer is preferable.
For relatively small input layers, the number of hidden neurons is two-thirds the size of the input layer plus the size of the output layer, or less than twice the size of the input layer.
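The second rule of thumb is easy to turn into arithmetic. The helper below is hypothetical, shown only to make the rule explicit; note that it suggests 6 hidden neurons for a 3-input, 4-output network, whereas the text settles on 8 after experimentation, which underlines that these are heuristics, not laws:

```python
def hidden_size_rule(input_size, output_size):
    """Second rule of thumb: two-thirds the size of the input layer plus the
    size of the output layer, capped at twice the input layer size."""
    estimate = (2 * input_size) // 3 + output_size
    return min(estimate, 2 * input_size)

# For a 3-input, 4-output network like the animal classifier in this chapter:
print(hidden_size_rule(3, 4))     # 6
# For a 784-input, 10-output network like the MNIST classifier later on:
print(hidden_size_rule(784, 10))  # 532
```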
One very important factor to keep in mind is overfitting. Overfitting occurs when the hidden layer holds such an inordinate amount of information (for example, a disproportionate number of neurons) compared to the information provided by the training data that classification is not very meaningful.
The larger the hidden layer, the more training information is required for the network to be trained properly. And, needless to say, this is going to lengthen the time required to train the network properly.
So, following the second rule of thumb illustrated earlier, our network will have a hidden layer of size 8, simply because, after a few runs of the network, I found that size to yield the best results. As a side note, the empirical approach is very much encouraged in the world of ANNs. The best network topology is related to the type of data fed to the network, so don't refrain from testing ANNs in a trial-and-error fashion.
In summary, our network has the following sizes:
Input: 3
Hidden: 8
Output: 4
The learning algorithms
There are a number of learning algorithms used by ANNs, but we can identify three major ones:
Supervised learning: With this algorithm, we want to obtain from the ANN a function that describes the data we labeled. We know, a priori, the nature of this data, and we delegate to the ANN the process of finding a function that describes it.
Unsupervised learning: This algorithm differs from supervised learning in that the data is unlabeled. This implies that we don't have to select and label data, but it also means the ANN has a lot more work to do. The classification of the data is usually obtained through techniques such as (but not only) clustering, which we explored in Chapter 7, Detecting and Recognizing Objects.
Reinforcement learning: Reinforcement learning is a little more complex. A system receives an input; a decision-making mechanism determines an action, which is performed and scored (success/failure and grades in between); and finally the input and the action are paired with their score, so the system learns to repeat or change the action to be performed for a certain input or state.
Now that we have a general idea of what ANNs are, let's see how OpenCV implements them, and how to put them to good use. Finally, we'll work our way up to a full-blown application, in which we will attempt to recognize handwritten digits.
ANNs in OpenCV
Unsurprisingly, ANNs reside in the ml module of OpenCV.
Let's examine a dummy example, as a gentle introduction to ANNs:
import cv2
import numpy as np

ann = cv2.ml.ANN_MLP_create()
ann.setLayerSizes(np.array([9, 5, 9], dtype=np.uint8))
ann.setTrainMethod(cv2.ml.ANN_MLP_BACKPROP)

ann.train(np.array([[1.2, 1.3, 1.9, 2.2, 2.3, 2.9, 3.0, 3.2, 3.3]],
    dtype=np.float32),
    cv2.ml.ROW_SAMPLE,
    np.array([[0, 0, 0, 0, 0, 1, 0, 0, 0]], dtype=np.float32))

print ann.predict(np.array([[1.4, 1.5, 1.2, 2., 2.5, 2.8, 3., 3.1, 3.8]],
    dtype=np.float32))
First, we create an ANN:
ann = cv2.ml.ANN_MLP_create()
You may wonder about the MLP acronym in the function name; it stands for multilayer perceptron. By now, you should know what a perceptron is.
After creating the network, we need to set its topology:
ann.setLayerSizes(np.array([9, 5, 9], dtype=np.uint8))
ann.setTrainMethod(cv2.ml.ANN_MLP_BACKPROP)
The layer sizes are defined by the NumPy array that is passed into the setLayerSizes method. The first element sets the size of the input layer, the last element sets the size of the output layer, and all intermediary elements define the sizes of the hidden layers.
We then set the train method to backpropagation. There are two choices here: BACKPROP and RPROP.
Both BACKPROP and RPROP are backpropagation algorithms—in simple terms, algorithms that adjust weights based on errors in classification.
These two algorithms work in the context of supervised learning, which is what we are using in the example. How can we tell this particular detail? Look at the next statement:
ann.train(np.array([[1.2, 1.3, 1.9, 2.2, 2.3, 2.9, 3.0, 3.2, 3.3]],
    dtype=np.float32),
    cv2.ml.ROW_SAMPLE,
    np.array([[0, 0, 0, 0, 0, 1, 0, 0, 0]], dtype=np.float32))
You should notice a number of details here. The method looks extremely similar to the train() method of the support vector machine. It takes three parameters: samples, layout, and responses. Only samples is required; the other two are optional.
This reveals the following information:
First, ANN, like SVM, is an OpenCV StatModel (statistical model); train and predict are methods inherited from the base StatModel class.
Second, a statistical model trained with only samples is adopting an unsupervised learning algorithm. If we provide layout and responses, we're in a supervised learning context.
As we're using ANNs, we can specify the type of backpropagation algorithm we're going to use (BACKPROP or RPROP) because—as we said—we're in a supervised learning environment.
So what is backpropagation? Backpropagation is a two-phase algorithm: a forward pass computes the predictions and their error, and a backward pass then propagates that error back through the network, from the output layer towards the input layer, updating the neuron weights accordingly.
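For intuition, here is the two-phase cycle written out for a single sigmoid neuron in plain Python. This is a bare gradient-descent sketch, not OpenCV's implementation; the inputs, initial weights, target, and learning rate are arbitrary illustrative values, and a real MLP repeats the backward phase layer by layer:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

inputs = [1.0, 0.5]
weights = [0.4, -0.2]
target = 1.0
learning_rate = 0.5

for step in range(500):
    # forward phase: compute the neuron's output and its error
    output = sigmoid(sum(i * w for i, w in zip(inputs, weights)))
    error = output - target
    # backward phase: push the error back through the sigmoid to each weight
    # (gradient of the squared error with respect to each weight)
    grad = error * output * (1.0 - output)
    weights = [w - learning_rate * grad * i for w, i in zip(weights, inputs)]

final_output = sigmoid(sum(i * w for i, w in zip(inputs, weights)))
print(final_output)  # after repeated updates, close to the target of 1.0
```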
Let's train the ANN; as we specified an input layer of size 9, we need to provide 9 inputs, and 9 outputs to reflect the size of the output layer:
ann.train(np.array([[1.2, 1.3, 1.9, 2.2, 2.3, 2.9, 3.0, 3.2, 3.3]],
    dtype=np.float32),
    cv2.ml.ROW_SAMPLE,
    np.array([[0, 0, 0, 0, 0, 1, 0, 0, 0]], dtype=np.float32))
The structure of the response is simply an array of zeros, with a 1 value in the position of the class we want to associate the input with. In our preceding example, we indicated that the specified input array corresponds to class 5 of classes 0 to 8 (classes are zero-indexed).
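Encoding and decoding this response format takes only two small helpers (the function names here are made up for illustration):

```python
def one_hot(cls, num_classes):
    """Build a response row: all zeros, with a 1 at the class index."""
    response = [0.0] * num_classes
    response[cls] = 1.0
    return response

def decode(response_row):
    """Recover the class index: the position of the largest value."""
    return max(range(len(response_row)), key=lambda i: response_row[i])

print(one_hot(5, 9))   # [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
# Decoding a row of per-class outputs picks the strongest response:
print(decode([-0.06, -0.13, -0.17, -0.19, 0.10, 0.89, 0.05, 0.18, 0.13]))  # 5
```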
Lastly, we perform classification:
print ann.predict(np.array([[1.4, 1.5, 1.2, 2., 2.5, 2.8, 3., 3.1, 3.8]],
    dtype=np.float32))
This will yield the following result:
(5.0, array([[-0.06419383, -0.13360272, -0.1681568, -0.18708915, 0.0970564,
    0.89237726, 0.05093023, 0.17537238, 0.13388439]], dtype=float32))
This means that the provided input was classified as belonging to class 5. This is only a dummy example and the classification is pretty meaningless; however, the network behaved correctly. In this code, we only provided one training record, for class 5, so the network classified a new input as belonging to class 5.
As you may have guessed, the output of a prediction is a tuple, with the first value being the class and the second being an array containing the output values for each class; the predicted class has the highest value. Let's move on to a slightly more useful example: animal classification.
ANN-imal classification
Picking up from where we left off, let's illustrate a very simple example of an ANN that attempts to classify animals based on their statistics (weight, length, and teeth). My intent is to describe a mock real-life scenario to improve our understanding of ANNs before we start applying them to computer vision and, specifically, OpenCV:
import cv2
import numpy as np
from random import randint

animals_net = cv2.ml.ANN_MLP_create()
animals_net.setTrainMethod(cv2.ml.ANN_MLP_RPROP |
    cv2.ml.ANN_MLP_UPDATE_WEIGHTS)
animals_net.setActivationFunction(cv2.ml.ANN_MLP_SIGMOID_SYM)
animals_net.setLayerSizes(np.array([3, 8, 4]))
animals_net.setTermCriteria((cv2.TERM_CRITERIA_EPS |
    cv2.TERM_CRITERIA_COUNT, 10, 1))

"""Input arrays
weight, length, teeth
"""

"""Output arrays
dog, condor, dolphin and dragon
"""

def dog_sample():
    return [randint(5, 20), 1, randint(38, 42)]

def dog_class():
    return [1, 0, 0, 0]

def condor_sample():
    return [randint(3, 13), 3, 0]

def condor_class():
    return [0, 1, 0, 0]

def dolphin_sample():
    return [randint(30, 190), randint(5, 15), randint(80, 100)]

def dolphin_class():
    return [0, 0, 1, 0]

def dragon_sample():
    return [randint(1200, 1800), randint(15, 40), randint(110, 180)]

def dragon_class():
    return [0, 0, 0, 1]

SAMPLES = 5000
for x in range(0, SAMPLES):
    print "Samples %d/%d" % (x, SAMPLES)
    animals_net.train(np.array([dog_sample()], dtype=np.float32),
        cv2.ml.ROW_SAMPLE, np.array([dog_class()], dtype=np.float32))
    animals_net.train(np.array([condor_sample()], dtype=np.float32),
        cv2.ml.ROW_SAMPLE, np.array([condor_class()], dtype=np.float32))
    animals_net.train(np.array([dolphin_sample()], dtype=np.float32),
        cv2.ml.ROW_SAMPLE, np.array([dolphin_class()], dtype=np.float32))
    animals_net.train(np.array([dragon_sample()], dtype=np.float32),
        cv2.ml.ROW_SAMPLE, np.array([dragon_class()], dtype=np.float32))

print animals_net.predict(np.array([dog_sample()], dtype=np.float32))
print animals_net.predict(np.array([condor_sample()], dtype=np.float32))
print animals_net.predict(np.array([dragon_sample()], dtype=np.float32))
There are a good few differences between this example and the dummy example, so let's examine them in order.
First, the usual imports. Then, we import randint, just because we want to generate some relatively random data:
import cv2
import numpy as np
from random import randint
Then, we create the ANN. This time, we specify the train method to be resilient backpropagation (an improved version of backpropagation) and the activation function to be a sigmoid function:
animals_net = cv2.ml.ANN_MLP_create()
animals_net.setTrainMethod(cv2.ml.ANN_MLP_RPROP |
    cv2.ml.ANN_MLP_UPDATE_WEIGHTS)
animals_net.setActivationFunction(cv2.ml.ANN_MLP_SIGMOID_SYM)
animals_net.setLayerSizes(np.array([3, 8, 4]))
Also, we specify the termination criteria, similarly to the way we did with the CAMShift algorithm in the previous chapter:
animals_net.setTermCriteria((cv2.TERM_CRITERIA_EPS |
    cv2.TERM_CRITERIA_COUNT, 10, 1))
Now we need some data. We're not really interested in representing the animals accurately so much as in producing a bunch of records to use as training data. So we define four sample-creation functions and four classification functions that will help us train the network:
"""Input arrays
weight, length, teeth
"""

"""Output arrays
dog, condor, dolphin and dragon
"""

def dog_sample():
    return [randint(5, 20), 1, randint(38, 42)]

def dog_class():
    return [1, 0, 0, 0]

def condor_sample():
    return [randint(3, 13), 3, 0]

def condor_class():
    return [0, 1, 0, 0]

def dolphin_sample():
    return [randint(30, 190), randint(5, 15), randint(80, 100)]

def dolphin_class():
    return [0, 0, 1, 0]

def dragon_sample():
    return [randint(1200, 1800), randint(15, 40), randint(110, 180)]

def dragon_class():
    return [0, 0, 0, 1]
Let's proceed with the creation of our fake animal data; we'll create 5,000 samples per class:
SAMPLES = 5000
for x in range(0, SAMPLES):
    print "Samples %d/%d" % (x, SAMPLES)
    animals_net.train(np.array([dog_sample()], dtype=np.float32),
        cv2.ml.ROW_SAMPLE, np.array([dog_class()], dtype=np.float32))
    animals_net.train(np.array([condor_sample()], dtype=np.float32),
        cv2.ml.ROW_SAMPLE, np.array([condor_class()], dtype=np.float32))
    animals_net.train(np.array([dolphin_sample()], dtype=np.float32),
        cv2.ml.ROW_SAMPLE, np.array([dolphin_class()], dtype=np.float32))
    animals_net.train(np.array([dragon_sample()], dtype=np.float32),
        cv2.ml.ROW_SAMPLE, np.array([dragon_class()], dtype=np.float32))
In the end, we print the results, which yield the following output:
(1.0, array([[1.49817729, 1.60551953, -1.56444871, -0.04313202]],
    dtype=float32))
(1.0, array([[1.49817729, 1.60551953, -1.56444871, -0.04313202]],
    dtype=float32))
(3.0, array([[-1.54576635, -1.68725526, 1.6469276, 2.23223686]],
    dtype=float32))
From these results, we deduce the following:
The network got two out of three samples correct, which is not perfect but serves as a good example to illustrate the importance of all the elements involved in building and training an ANN.
The size of the input layer is very important for creating diversification between the different classes. In our case, we only had three statistics, and there is a relative overlap in features.
The size of the hidden layer needs to be tested. You will find that increasing the number of neurons may improve accuracy up to a point, after which the network will overfit, unless you start compensating with an enormous number of training records.
Definitely avoid having too few records or feeding in a lot of identical records, as the ANN won't learn much from them.
Training epochs
Another important concept in training ANNs is the idea of epochs. A training epoch is one iteration through the training data, after which the data is tested for classification. Most ANNs train over several epochs; you'll find that some of the most common examples of ANNs, classifying handwritten digits, iterate over the training data several hundred times.
I personally suggest you spend a lot of time experimenting with ANNs and the number of epochs until you reach convergence, which means that further iterations will no longer improve (at least not noticeably) the accuracy of the results.
The preceding example can be modified as follows to leverage epochs:
def record(sample, classification):
    return (np.array([sample], dtype=np.float32), np.array([classification],
        dtype=np.float32))

records = []
RECORDS = 5000
for x in range(0, RECORDS):
    records.append(record(dog_sample(), dog_class()))
    records.append(record(condor_sample(), condor_class()))
    records.append(record(dolphin_sample(), dolphin_class()))
    records.append(record(dragon_sample(), dragon_class()))

EPOCHS = 5
for e in range(0, EPOCHS):
    print "Epoch %d:" % e
    for t, c in records:
        animals_net.train(t, cv2.ml.ROW_SAMPLE, c)
Then, we run some tests, starting with the dog class:
dog_results = 0
for x in range(0, 100):
    clas = int(animals_net.predict(np.array([dog_sample()],
        dtype=np.float32))[0])
    print "class: %d" % clas
    if clas == 0:
        dog_results += 1
We repeat this over all the classes and output the results:
print "Dog accuracy: %f" % (dog_results)
print "condor accuracy: %f" % (condor_results)
print "dolphin accuracy: %f" % (dolphin_results)
print "dragon accuracy: %f" % (dragon_results)
Finally, we obtain the following results:
Dog accuracy: 100.000000%
condor accuracy: 0.000000%
dolphin accuracy: 0.000000%
dragon accuracy: 92.000000%
Consider the fact that we're only playing with toy/fake data, and with a small amount of training data and few training iterations; even so, this teaches us quite a lot. We can diagnose the ANN as overfitting towards certain classes, so it's important to improve the quality of the data you feed into the training process.
All that said, it's time for a real-life example: handwritten digit recognition.
Handwritten digit recognition with ANNs

The world of Machine Learning is vast and mostly unexplored, and ANNs are but one of the many concepts related to Machine Learning, which is one of the many subdisciplines of Artificial Intelligence. For the purpose of this chapter, we will only be exploring the concept of ANNs in the context of OpenCV. It is by no means an exhaustive treatise on the subject of Artificial Intelligence.

Ultimately, we're interested in seeing ANNs work in the real world. So let's go ahead and make it happen.
MNIST – the handwritten digit database

One of the most popular resources on the Web for the training of classifiers dealing with OCR and handwritten character recognition is the MNIST database, publicly available at http://yann.lecun.com/exdb/mnist/.

This particular database is a freely available resource to kick-start the creation of a program that utilizes ANNs to recognize handwritten digits.
Customized training data

It is always possible to build your own training data. It will take a little bit of patience, but it's fairly easy: collect a vast number of handwritten digits and create images containing a single digit, making sure all the images are the same size and in grayscale.

After this, you will have to create a mechanism that keeps a training sample in sync with the expected classification.
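As a minimal sketch of such a mechanism (the make_record helper and the 28x28 size are illustrative assumptions, not part of the chapter's code; any consistent size works), each sample can be paired with a one-hot classification vector in the row format that OpenCV's ANN expects:

```python
import numpy as np

def make_record(image, digit, num_classes=10):
    # Flatten the image into a single-row sample and build a one-hot
    # label row, keeping sample and classification together.
    sample = np.array([image.ravel()], dtype=np.float32)
    label = np.zeros((1, num_classes), dtype=np.float32)
    label[0, digit] = 1.0
    return sample, label

# A blank 28x28 stand-in for a scanned handwritten digit labeled "3".
img = np.zeros((28, 28), dtype=np.uint8)
sample, label = make_record(img, digit=3)
print(sample.shape)          # (1, 784)
print(int(label.argmax()))   # 3
```

A list of such (sample, label) pairs is then easy to shuffle, split, and feed to the training loop.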
The initial parameters

Let's take a look at the individual layers in the network:

Input layer
Hidden layer
Output layer
The input layer

Since we're going to utilize the MNIST database, the input layer will have a size of 784 input nodes: that's because MNIST samples are 28x28 pixel images, which means 784 pixels.
The hidden layer

As we have seen, there's no hard-and-fast rule for the size of the hidden layer. Through several attempts, I've found that 50 to 60 nodes yield the best results while not necessitating an inordinate amount of training data.

You can increase the size of the hidden layer with the amount of data, but beyond a certain point there will be no advantage to that; you will also have to be prepared for your network to take hours to train (the more hidden neurons, the longer it takes to train the network).
The output layer

The output layer will have a size of 10. This should not be a surprise, as we want to classify 10 digits (0-9).
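Conceptually, classifying a sample then means reading the 10 output nodes and taking the index of the strongest response. A minimal sketch, with made-up output values:

```python
import numpy as np

# Hypothetical responses of the 10 output nodes for one sample;
# the predicted digit is the index of the strongest response.
output = np.array([0.01, 0.02, 0.05, 0.90, 0.10,
                   0.00, 0.03, 0.20, 0.01, 0.04], dtype=np.float32)
predicted_digit = int(np.argmax(output))
print(predicted_digit)  # 3
```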
Training epochs

We will initially use the entire set of train data from MNIST, which consists of over 60,000 handwritten images, half of which were written by US government employees and the other half by high-school students. That's a lot of data, so we won't need more than one epoch to achieve an acceptably high accuracy of detection.

From there on, it is up to you to train the network iteratively on the same train data; my suggestion is that you use an accuracy test and find the epoch at which the accuracy "peaks". By doing so, you will have a precise measurement of the highest possible accuracy achieved by your network given its current configuration.
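The "find the peak" procedure can be sketched in a few lines: measure accuracy on the test data after each epoch and pick the epoch where it tops out (the accuracy numbers below are made up for illustration):

```python
# Hypothetical per-epoch accuracies, measured on held-out test data
# after each training epoch.
accuracies = [0.71, 0.83, 0.88, 0.91, 0.90, 0.89]

# The "peak" epoch is the one after which further training stops helping.
best_epoch = max(range(len(accuracies)), key=lambda e: accuracies[e])
print(best_epoch)              # 3
print(accuracies[best_epoch])  # 0.91
```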
Other parameters

We will use a sigmoid activation function, Resilient Back Propagation (RPROP), and extend the termination criteria for each calculation to 20 iterations, instead of the 10 we used for every other operation in this book that involved cv2.TermCriteria.
Note: Important notes on train data and ANN libraries

Exploring the Internet for sources, I found an amazing article by Michael Nielsen at http://neuralnetworksanddeeplearning.com/chap1.html, which illustrates how to write an ANN library from scratch. The code for this library is freely available on GitHub at https://github.com/mnielsen/neural-networks-and-deep-learning; it is the source code for a book, Neural Networks and Deep Learning, by Michael Nielsen.

In the data folder, you will find a pickle file: data that has been saved to disk through the popular Python library cPickle, which makes loading and saving Python data a trivial task.

This pickle file is a cPickle-serialized version of the MNIST data and, as it is so useful and ready to work with, I strongly suggest you use it. Nothing stops you from loading the MNIST dataset yourself, but the process of deserializing the training data is quite tedious and, strictly speaking, outside the remit of this book.

Second, I would like to point out that OpenCV is not the only Python library that allows you to use ANNs, not by any stretch of the imagination. The Web is full of alternatives, which I strongly encourage you to try out: most notably PyBrain, a library called Lasagne (which, as an Italian, I find exceptionally attractive), and many custom-written implementations, such as the aforementioned one by Michael Nielsen.

Enough introductory details, though. Let's get going.
Mini-libraries

Setting up an ANN in OpenCV is not difficult, but you will almost certainly find yourself training your network countless times, in search of that elusive percentage point that boosts the accuracy of your results.

To automate this as much as possible, we will build a mini-library that wraps OpenCV's native implementation of ANNs and lets us rerun and retrain the network easily.

Here's an example of a wrapper library:
import cv2
import cPickle
import numpy as np
import gzip

def load_data():
    mnist = gzip.open('./data/mnist.pkl.gz', 'rb')
    training_data, classification_data, test_data = cPickle.load(mnist)
    mnist.close()
    return (training_data, classification_data, test_data)

def wrap_data():
    tr_d, va_d, te_d = load_data()
    training_inputs = [np.reshape(x, (784, 1)) for x in tr_d[0]]
    training_results = [vectorized_result(y) for y in tr_d[1]]
    training_data = zip(training_inputs, training_results)
    validation_inputs = [np.reshape(x, (784, 1)) for x in va_d[0]]
    validation_data = zip(validation_inputs, va_d[1])
    test_inputs = [np.reshape(x, (784, 1)) for x in te_d[0]]
    test_data = zip(test_inputs, te_d[1])
    return (training_data, validation_data, test_data)

def vectorized_result(j):
    e = np.zeros((10, 1))
    e[j] = 1.0
    return e

def create_ANN(hidden=20):
    ann = cv2.ml.ANN_MLP_create()
    ann.setLayerSizes(np.array([784, hidden, 10]))
    ann.setTrainMethod(cv2.ml.ANN_MLP_RPROP)
    ann.setActivationFunction(cv2.ml.ANN_MLP_SIGMOID_SYM)
    ann.setTermCriteria((cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT,
        20, 1))
    return ann

def train(ann, samples=10000, epochs=1):
    tr, val, test = wrap_data()
    for x in xrange(epochs):
        counter = 0
        for img in tr:
            if counter > samples:
                break
            if counter % 1000 == 0:
                print "Epoch %d: Trained %d/%d" % (x, counter, samples)
            counter += 1
            data, digit = img
            ann.train(np.array([data.ravel()], dtype=np.float32),
                cv2.ml.ROW_SAMPLE, np.array([digit.ravel()],
                dtype=np.float32))
        print "Epoch %d complete" % x
    return ann, test

def test(ann, test_data):
    sample = np.array(test_data[0][0].ravel(),
        dtype=np.float32).reshape(28, 28)
    cv2.imshow("sample", sample)
    cv2.waitKey()
    print ann.predict(np.array([test_data[0][0].ravel()],
        dtype=np.float32))

def predict(ann, sample):
    resized = sample.copy()
    rows, cols = resized.shape
    if (rows != 28 or cols != 28) and rows * cols > 0:
        resized = cv2.resize(resized, (28, 28),
            interpolation=cv2.INTER_CUBIC)
    return ann.predict(np.array([resized.ravel()], dtype=np.float32))
Let’sexamineitinorder.First,theload_data,wrap_data,andvectorized_resultfunctionsareincludedinMichaelNielsen’scodeforloadingthepicklefile.
It’sarelativelystraightforwardloadingofapicklefile.Mostnotably,though,theloadeddatahasbeensplitintothetrainandtestdata.Bothtrainandtestdataarearrayscontainingtwo-elementtuples:thefirstoneisthedataitself;thesecondoneistheexpectedclassification.SowecanusethetraindatatotraintheANNandthetestdatatoevaluateitsaccuracy.
Thevectorized_resultfunctionisaverycleverfunctionthat—givenanexpectedclassification—createsa10-elementarrayofzeros,settingasingle1fortheexpectedresult.This10-elementarray,youmayhaveguessed,willbeusedasaclassificationfortheoutputlayer.
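For instance, asking for the vector of the digit 7 yields a column of zeros with a single 1 in position 7 (the function is reproduced here so the snippet runs on its own):

```python
import numpy as np

def vectorized_result(j):
    # 10-element column of zeros with a single 1 at index j.
    e = np.zeros((10, 1))
    e[j] = 1.0
    return e

v = vectorized_result(7)
print(v.ravel())  # [0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]
```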
The first ANN-related function is create_ANN:

def create_ANN(hidden=20):
    ann = cv2.ml.ANN_MLP_create()
    ann.setLayerSizes(np.array([784, hidden, 10]))
    ann.setTrainMethod(cv2.ml.ANN_MLP_RPROP)
    ann.setActivationFunction(cv2.ml.ANN_MLP_SIGMOID_SYM)
    ann.setTermCriteria((cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT,
        20, 1))
    return ann

This function creates an ANN specifically geared towards handwritten digit recognition with MNIST, by specifying the layer sizes illustrated in The initial parameters section.
We now need a training function:

def train(ann, samples=10000, epochs=1):
    tr, val, test = wrap_data()
    for x in xrange(epochs):
        counter = 0
        for img in tr:
            if counter > samples:
                break
            if counter % 1000 == 0:
                print "Epoch %d: Trained %d/%d" % (x, counter, samples)
            counter += 1
            data, digit = img
            ann.train(np.array([data.ravel()], dtype=np.float32),
                cv2.ml.ROW_SAMPLE, np.array([digit.ravel()],
                dtype=np.float32))
        print "Epoch %d complete" % x
    return ann, test
Again, this is quite simple: given a number of samples and training epochs, we load the data, and then iterate through the samples for the specified number of epochs.

The important section of this function is the deconstruction of a single training record into the train data and an expected classification, which are then passed into the ANN.
To do so, we utilize the numpy array function ravel(), which takes an array of any shape and "flattens" it into a single-row array. So, for example, consider this array:

data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

The preceding array, once "raveled", becomes the following array:

[1, 2, 3, 4, 5, 6, 7, 8, 9]

This is the format that OpenCV's ANN expects data to look like in its train() method.
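The flattening step can be verified in a couple of lines:

```python
import numpy as np

data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
flat = data.ravel()
print(flat)  # [1 2 3 4 5 6 7 8 9]

# Wrapping the flat array in one more pair of brackets gives the
# single-row shape that train() expects for one sample.
row = np.array([flat], dtype=np.float32)
print(row.shape)  # (1, 9)
```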
Finally, we return both the network and the test data. We could have returned just the network, but having the test data at hand for accuracy checking is quite useful.
The last function we need is a predict() function to wrap the ANN's own predict() method:

def predict(ann, sample):
    resized = sample.copy()
    rows, cols = resized.shape
    if (rows != 28 or cols != 28) and rows * cols > 0:
        resized = cv2.resize(resized, (28, 28),
            interpolation=cv2.INTER_CUBIC)
    return ann.predict(np.array([resized.ravel()], dtype=np.float32))
This function takes an ANN and a sample image; it performs a minimum of "sanitization" by making sure the shape of the data is as expected, resizing it if it's not, and then raveling it for a successful prediction.

The file I created also contains a test() function, which verifies that the network works and displays the sample provided for classification.
The main file

This whole chapter has been an introductory journey leading us to this point. In fact, many of the techniques we're going to use are from previous chapters, so in a way the entire book has led us to this point. So let's put all our knowledge to good use.

Let's take an initial look at the file, and then decompose it for a better understanding:
import cv2
import numpy as np
import digits_ann as ANN

def inside(r1, r2):
    x1, y1, w1, h1 = r1
    x2, y2, w2, h2 = r2
    if (x1 > x2) and (y1 > y2) and (x1+w1 < x2+w2) and (y1+h1 < y2+h2):
        return True
    else:
        return False

def wrap_digit(rect):
    x, y, w, h = rect
    padding = 5
    hcenter = x + w/2
    vcenter = y + h/2
    if (h > w):
        w = h
        x = hcenter - (w/2)
    else:
        h = w
        y = vcenter - (h/2)
    return (x-padding, y-padding, w+padding, h+padding)

ann, test_data = ANN.train(ANN.create_ANN(56), 20000)
font = cv2.FONT_HERSHEY_SIMPLEX

path = "./images/numbers.jpg"
img = cv2.imread(path, cv2.IMREAD_UNCHANGED)
bw = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
bw = cv2.GaussianBlur(bw, (7, 7), 0)
ret, thbw = cv2.threshold(bw, 127, 255, cv2.THRESH_BINARY_INV)
thbw = cv2.erode(thbw, np.ones((2, 2), np.uint8), iterations=2)

image, cntrs, hier = cv2.findContours(thbw.copy(), cv2.RETR_TREE,
    cv2.CHAIN_APPROX_SIMPLE)

rectangles = []

for c in cntrs:
    r = x, y, w, h = cv2.boundingRect(c)
    a = cv2.contourArea(c)
    b = (img.shape[0] - 3) * (img.shape[1] - 3)
    is_inside = False
    for q in rectangles:
        if inside(r, q):
            is_inside = True
            break
    if not is_inside:
        if not a == b:
            rectangles.append(r)

for r in rectangles:
    x, y, w, h = wrap_digit(r)
    cv2.rectangle(img, (x, y), (x+w, y+h), (0, 255, 0), 2)
    roi = thbw[y:y+h, x:x+w]
    try:
        digit_class = int(ANN.predict(ann, roi.copy())[0])
    except:
        continue
    cv2.putText(img, "%d" % digit_class, (x, y-1), font, 1, (0, 255, 0))

cv2.imshow("thbw", thbw)
cv2.imshow("contours", img)
cv2.imwrite("sample.jpg", img)
cv2.waitKey()
After the usual initial imports, we import the mini-library we created, which is stored in digits_ann.py.

I find it good practice to define functions at the top of the file, so let's examine those. The inside() function determines whether a rectangle is entirely contained in another rectangle:
def inside(r1, r2):
    x1, y1, w1, h1 = r1
    x2, y2, w2, h2 = r2
    if (x1 > x2) and (y1 > y2) and (x1+w1 < x2+w2) and (y1+h1 < y2+h2):
        return True
    else:
        return False
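For example, a rectangle nested inside a larger one tests positive, while the reverse does not (the function is condensed to a single return here so the snippet runs on its own):

```python
def inside(r1, r2):
    # True only if rectangle r1 (x, y, w, h) lies entirely within r2.
    x1, y1, w1, h1 = r1
    x2, y2, w2, h2 = r2
    return (x1 > x2) and (y1 > y2) and (x1+w1 < x2+w2) and (y1+h1 < y2+h2)

print(inside((20, 20, 10, 10), (10, 10, 100, 100)))  # True
print(inside((10, 10, 100, 100), (20, 20, 10, 10)))  # False
```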
The wrap_digit() function takes a rectangle that surrounds a digit, turns it into a square, and centers it on the digit itself, with 5 points of padding to make sure the digit is entirely contained in it:

def wrap_digit(rect):
    x, y, w, h = rect
    padding = 5
    hcenter = x + w/2
    vcenter = y + h/2
    if (h > w):
        w = h
        x = hcenter - (w/2)
    else:
        h = w
        y = vcenter - (h/2)
    return (x-padding, y-padding, w+padding, h+padding)
The point of this function will become clearer later on; let's not dwell on it too much at the moment.
Now,let’screatethenetwork.Wewilluse58hiddennodes,andtrainover20,000samples:
ann,test_data=ANN.train(ANN.create_ANN(58),20000)
Thisisgoodenoughforapreliminarytesttokeepthetrainingtimedowntoaminuteortwo(dependingontheprocessingpowerofyourmachine).Theidealistousethefullsetoftrainingdata(50,000),anditeratethroughitseveraltimes,untilsomeconvergenceisreached(aswediscussedearlier,theaccuracy“peak”).Youwoulddothisbycallingthefollowingfunction:
ann,test_data=ANN.train(ANN.create_ANN(100),50000,30)
We can now prepare the data to test. To do that, we're going to load an image and clean it up a little:

path = "./images/numbers.jpg"
img = cv2.imread(path, cv2.IMREAD_UNCHANGED)
bw = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
bw = cv2.GaussianBlur(bw, (7, 7), 0)
Now that we have a smoothed grayscale image, we can apply a threshold and some morphology operations to make sure the numbers stand out properly from the background and are relatively cleaned up of irregularities, which might throw the prediction operation off:

ret, thbw = cv2.threshold(bw, 127, 255, cv2.THRESH_BINARY_INV)
thbw = cv2.erode(thbw, np.ones((2, 2), np.uint8), iterations=2)
Note: Note the threshold flag, which is for an inverse binary threshold: as the samples of the MNIST database are white on black (and not black on white), we turn the image into a black background with white numbers.
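The effect of the inverse threshold can be sketched in pure numpy, as a conceptual stand-in for the cv2.threshold call above (not a replacement for it):

```python
import numpy as np

# Pixels above the threshold become 0 (black), all others 255 (white),
# producing MNIST-style white digits on a black background.
bw = np.array([[0, 100, 200],
               [255, 50, 130]], dtype=np.uint8)
thbw = np.where(bw > 127, 0, 255).astype(np.uint8)
print(thbw)
# [[255 255   0]
#  [  0 255   0]]
```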
After the morphology operation, we need to identify and separate each number in the picture. To do this, we first identify the contours in the image:

image, cntrs, hier = cv2.findContours(thbw.copy(), cv2.RETR_TREE,
    cv2.CHAIN_APPROX_SIMPLE)
Then, we iterate through the contours and discard all the rectangles that are entirely contained in other rectangles; we append to the list of good rectangles only the ones that are not contained in other rectangles and are also not as wide as the image itself. In some of the tests, findContours yielded the entire image as a contour itself, which meant that no other rectangle passed the inside test:

rectangles = []

for c in cntrs:
    r = x, y, w, h = cv2.boundingRect(c)
    a = cv2.contourArea(c)
    b = (img.shape[0] - 3) * (img.shape[1] - 3)
    is_inside = False
    for q in rectangles:
        if inside(r, q):
            is_inside = True
            break
    if not is_inside:
        if not a == b:
            rectangles.append(r)
Now that we have a list of good rectangles, we can iterate through them and define a region of interest for each of the rectangles we identified:

for r in rectangles:
    x, y, w, h = wrap_digit(r)
This is where the wrap_digit() function we defined at the beginning of the program comes into play: we need to pass a square region of interest to the predictor function; if we simply resized a rectangle into a square, we'd ruin our test data.

You may wonder why. Think of the number one: a rectangle surrounding the number one would be very narrow, especially if it has been drawn without too much of a lean to either side. If you simply resized it to a square, you would "fatten" the number one in such a way that nearly the entire square would turn black, rendering the prediction impossible. Instead, we want to create a square around the identified number, which is exactly what wrap_digit() does.
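Applied to such a tall, narrow rectangle, wrap_digit produces a padded square region (the function is reproduced here so the snippet runs on its own; the input values are chosen so that integer and float division agree):

```python
def wrap_digit(rect):
    x, y, w, h = rect
    padding = 5
    hcenter = x + w / 2
    vcenter = y + h / 2
    if h > w:
        # Taller than wide: widen to a square centered horizontally.
        w = h
        x = hcenter - (w / 2)
    else:
        # Wider than tall: heighten to a square centered vertically.
        h = w
        y = vcenter - (h / 2)
    return (x - padding, y - padding, w + padding, h + padding)

# A narrow box around a handwritten "1": 10 wide, 40 high.
x, y, w, h = wrap_digit((100, 50, 10, 40))
print((x, y, w, h))  # the region is now square: w == h == 45
```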
This approach is quick and dirty; it allows us to draw a square around a number and simultaneously pass that square as a region of interest for the prediction. A purer approach would be to take the original rectangle and "center" it in a square numpy array with rows and columns equal to the larger of the two dimensions of the original rectangle. The reason is that you will notice that some of the squares include tiny bits of adjacent numbers, which can throw the prediction off. With a square created through the np.zeros() function, no impurities will be accidentally dragged in:
    cv2.rectangle(img, (x, y), (x+w, y+h), (0, 255, 0), 2)
    roi = thbw[y:y+h, x:x+w]
    try:
        digit_class = int(ANN.predict(ann, roi.copy())[0])
    except:
        continue
    cv2.putText(img, "%d" % digit_class, (x, y-1), font, 1, (0, 255, 0))
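As an aside, the "purer" approach mentioned above, centering the original rectangle in a square np.zeros() array, can be sketched as follows; center_in_square is a hypothetical helper, not part of the chapter's code:

```python
import numpy as np

def center_in_square(roi):
    # Place the ROI in the middle of a black square whose side equals the
    # larger of the ROI's two dimensions, so no neighboring pixels leak in.
    rows, cols = roi.shape
    side = max(rows, cols)
    square = np.zeros((side, side), dtype=roi.dtype)
    top = (side - rows) // 2
    left = (side - cols) // 2
    square[top:top + rows, left:left + cols] = roi
    return square

roi = np.full((40, 10), 255, dtype=np.uint8)  # a tall, narrow white digit
sq = center_in_square(roi)
print(sq.shape)  # (40, 40)
```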
Once the prediction for the square region is complete, we draw it on the original image:

cv2.imshow("thbw", thbw)
cv2.imshow("contours", img)
cv2.imwrite("sample.jpg", img)
cv2.waitKey()
Andthat’sit!Thefinalresultwilllooksimilartothis:
Possible improvements and potential applications

We have illustrated how to build an ANN, feed it training data, and use it for classification. There are a number of aspects we can improve, depending on the task at hand, and a number of potential applications of our new-found knowledge.
Improvements

There are a number of improvements that can be applied to this approach, some of which we have already discussed:

You could enlarge your dataset and iterate more times, until a performance peak is reached.
You could also experiment with the several activation functions (cv2.ml.ANN_MLP_SIGMOID_SYM is not the only one; there is also cv2.ml.ANN_MLP_IDENTITY and cv2.ml.ANN_MLP_GAUSSIAN).
You could utilize different training flags (cv2.ml.ANN_MLP_UPDATE_WEIGHTS, cv2.ml.ANN_MLP_NO_INPUT_SCALE, cv2.ml.ANN_MLP_NO_OUTPUT_SCALE) and training methods (backpropagation or resilient backpropagation).
Aside from that, bear in mind one of the mantras of software development: there is no single best technology, there is only the best tool for the job at hand. So, careful analysis of the application requirements will lead you to the best choices of parameters. For example, not everyone draws digits the same way. In fact, you will even find that some countries draw numbers in slightly different ways.

The MNIST database was compiled in the US, where the number seven is drawn like the character 7. In Europe, though, the number 7 is often drawn with a small horizontal line halfway through the diagonal portion of the number, which was introduced to distinguish it from the number 1.
Note: For a more detailed overview of regional handwriting variations, check the Wikipedia article on the subject, which is a good introduction, available at https://en.wikipedia.org/wiki/Regional_handwriting_variation.
This means the MNIST database has limited accuracy when applied to European handwriting; some numbers will be classified more accurately than others. So you may end up creating your own dataset. In almost all circumstances, it is preferable to utilize train data that is relevant and belongs to the current application domain.

Finally, remember that once you're happy with the accuracy of your network, you can always save it and reload it later, so it can be utilized in third-party applications without having to train the ANN every time.
Potential applications

The preceding program is only the foundation of a handwriting recognition application. Straight away, you can quickly extend this approach to videos and detect handwritten digits in real time, or you could train your ANN to recognize the entire alphabet for a full-blown OCR system.

Car registration plate detection seems like an obvious extension of the lessons learned to this point, and it should be an even easier domain to work with, as registration plates use consistent characters.

Also, for your own edification or business purposes, you may try to build a classifier with ANNs and plain SVMs (with feature detectors such as SIFT) and see how they benchmark.
Summary

In this chapter, we scratched the surface of the vast and fascinating world of ANNs, focusing on OpenCV's implementation of them. We learned about the structure of ANNs, and how to design a network topology based on application requirements.

Finally, we utilized various concepts that we explored in the previous chapters to build a handwritten digit recognition application.
To boldly go…

I hope you enjoyed the journey through the Python bindings for OpenCV 3. Although covering OpenCV 3 in full would take a series of books, we explored very fascinating and futuristic concepts, and I encourage you to get in touch and let me and the OpenCV community know what your next groundbreaking computer-vision-based project is!