Learning OpenCV 3 Computer Vision with Python Second Edition
Table of Contents
Learning OpenCV 3 Computer Vision with Python Second Edition
Credits
About the Authors
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers, and more
Why subscribe?
Free access for Packt account holders
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Errata
Piracy
Questions
1. Setting Up OpenCV
Choosing and using the right setup tools
Installation on Windows
Using binary installers (no support for depth cameras)
Using CMake and compilers
Installing on OS X
Using MacPorts with ready-made packages
Using MacPorts with your own custom packages
Using Homebrew with ready-made packages (no support for depth cameras)
Using Homebrew with your own custom packages
Installation on Ubuntu and its derivatives
Using the Ubuntu repository (no support for depth cameras)
Building OpenCV from a source
Installation on other Unix-like systems
Installing the Contrib modules
Running samples
Finding documentation, help, and updates
Summary
2. Handling Files, Cameras, and GUIs
Basic I/O scripts
Reading/writing an image file
Converting between an image and raw bytes
Accessing image data with numpy.array
Reading/writing a video file
Capturing camera frames
Displaying images in a window
Displaying camera frames in a window
Project Cameo (face tracking and image manipulation)
Cameo – an object-oriented design
Abstracting a video stream with managers.CaptureManager
Abstracting a window and keyboard with managers.WindowManager
Applying everything with cameo.Cameo
Summary
3. Processing Images with OpenCV 3
Converting between different color spaces
A quick note on BGR
The Fourier Transform
High pass filter
Low pass filter
Creating modules
Edge detection
Custom kernels – getting convoluted
Modifying the application
Edge detection with Canny
Contour detection
Contours – bounding box, minimum area rectangle, and minimum enclosing circle
Contours – convex contours and the Douglas-Peucker algorithm
Line and circle detection
Line detection
Circle detection
Detecting shapes
Summary
4. Depth Estimation and Segmentation
Creating modules
Capturing frames from a depth camera
Creating a mask from a disparity map
Masking a copy operation
Depth estimation with a normal camera
Object segmentation using the Watershed and GrabCut algorithms
Example of foreground detection with GrabCut
Image segmentation with the Watershed algorithm
Summary
5. Detecting and Recognizing Faces
Conceptualizing Haar cascades
Getting Haar cascade data
Using OpenCV to perform face detection
Performing face detection on a still image
Performing face detection on a video
Performing face recognition
Generating the data for face recognition
Recognizing faces
Preparing the training data
Loading the data and recognizing faces
Performing an Eigenfaces recognition
Performing face recognition with Fisherfaces
Performing face recognition with LBPH
Discarding results with confidence score
Summary
6. Retrieving Images and Searching Using Image Descriptors
Feature detection algorithms
Defining features
Detecting features – corners
Feature extraction and description using DoG and SIFT
Anatomy of a keypoint
Feature extraction and detection using Fast Hessian and SURF
ORB feature detection and feature matching
FAST
BRIEF
Brute-Force matching
Feature matching with ORB
Using K-Nearest Neighbors matching
FLANN-based matching
FLANN matching with homography
A sample application – tattoo forensics
Saving image descriptors to file
Scanning for matches
Summary
7. Detecting and Recognizing Objects
Object detection and recognition techniques
HOG descriptors
The scale issue
The location issue
Image pyramid
Sliding windows
Non-maximum (or non-maxima) suppression
Support vector machines
People detection
Creating and training an object detector
Bag-of-words
BOW in computer vision
The k-means clustering
Detecting cars
What did we just do?
SVM and sliding windows
Example – car detection in a scene
Examining detector.py
Associating training data with classes
Dude, where's my car?
Summary
8. Tracking Objects
Detecting moving objects
Basic motion detection
Background subtractors – KNN, MOG2, and GMG
Meanshift and CAMShift
Color histograms
The calcHist function
The calcBackProject function
In summary
Back to the code
CAMShift
The Kalman filter
Predict and update
An example
A real-life example – tracking pedestrians
The application workflow
A brief digression – functional versus object-oriented programming
The Pedestrian class
The main program
Where do we go from here?
Summary
9. Neural Networks with OpenCV – an Introduction
Artificial neural networks
Neurons and perceptrons
The structure of an ANN
Network layers by example
The input layer
The output layer
The hidden layer
The learning algorithms
ANNs in OpenCV
ANN-imal classification
Training epochs
Handwritten digit recognition with ANNs
MNIST – the handwritten digit database
Customized training data
The initial parameters
The input layer
The hidden layer
The output layer
Training epochs
Other parameters
Mini-libraries
The main file
Possible improvements and potential applications
Improvements
Potential applications
Summary
To boldly go…
Index
Learning OpenCV 3 Computer Vision with Python Second Edition

Copyright © 2015 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: September 2015

Production reference: 1240915

Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.

ISBN 978-1-78528-384-0

www.packtpub.com
Credits

Authors
Joe Minichino
Joseph Howse

Reviewers
Nandan Banerjee
Tian Cao
Brandon Castellano
Haojian Jin
Adrian Rosebrock

Commissioning Editor
Akram Hussain

Acquisition Editors
Vivek Anantharaman
Prachi Bisht

Content Development Editor
Ritika Singh

Technical Editors
Novina Kewalramani
Shivani Kiran Mistry

Copy Editor
Sonia Cheema

Project Coordinator
Milton Dsouza

Proofreader
Safis Editing

Indexer
Monica Ajmera Mehta

Graphics
Disha Haria

Production Coordinator
Arvindkumar Gupta

Cover Work
Arvindkumar Gupta
About the Authors

Joe Minichino is a computer vision engineer for Hoolux Medical by day and a developer of the NoSQL database LokiJS by night. On weekends, he is a heavy metal singer/songwriter. He is a passionate programmer who is immensely curious about programming languages and technologies and constantly experiments with them. At Hoolux, Joe leads the development of an Android computer vision-based advertising platform for the medical industry.

Born and raised in Varese, Lombardy, Italy, and coming from a humanistic background in philosophy (at Milan's Università Statale), Joe has spent his last 11 years living in Cork, Ireland, which is where he became a computer science graduate at the Cork Institute of Technology.

I am immensely grateful to my partner, Rowena, for always encouraging me, and also my two little daughters for inspiring me. A big thank you to the collaborators and editors of this book, especially Joe Howse, Adrian Rosebrock, Brandon Castellano, the OpenCV community, and the people at Packt Publishing.

Joseph Howse lives in Canada. During the winters, he grows his beard, while his four cats grow their thick coats of fur. He loves combing his cats every day and sometimes, his cats also pull his beard.

He has been writing for Packt Publishing since 2012. His books include OpenCV for Secret Agents, OpenCV Blueprints, Android Application Programming with OpenCV 3, OpenCV Computer Vision with Python, and Python Game Programming by Example.

When he is not writing books or grooming his cats, he provides consulting, training, and software development services through his company, Nummist Media (http://nummist.com).
About the Reviewers

Nandan Banerjee has a bachelor's degree in computer science and a master's in robotics engineering. He started working with Samsung Electronics right after graduation. He worked for a year at its R&D centre in Bangalore. He also worked in the WPI-CMU team on the Boston Dynamics' robot, Atlas, for the DARPA Robotics Challenge. He is currently working as a robotics software engineer in the technology organization at iRobot Corporation. He is an embedded systems and robotics enthusiast with an inclination toward computer vision and motion planning. He has experience in various languages, including C, C++, Python, Java, and Delphi. He also has a substantial experience in working with ROS, OpenRAVE, OpenCV, PCL, OpenGL, CUDA, and the Android SDK.

I would like to thank the author and publisher for coming out with this wonderful book.

Tian Cao is pursuing his PhD in computer science at the University of North Carolina in Chapel Hill, USA, and working on projects related to image analysis, computer vision, and machine learning.

I dedicate this work to my parents and girlfriend.

Brandon Castellano is a student from Canada pursuing an MESc in electrical engineering at the University of Western Ontario, City of London, Canada. He received his BESc in the same subject in 2012. The focus of his research is in parallel processing and GPGPU/FPGA optimization for real-time implementations of image processing algorithms. Brandon also works for Eagle Vision Systems Inc., focusing on the use of real-time image processing for robotics applications.

While he has been using OpenCV and C++ for more than 5 years, he has also been advocating the use of Python frequently in his research, most notably, for its rapid speed of development, allowing low-level interfacing with complex systems. This is evident in his open source projects hosted on GitHub, for example, PySceneDetect, which is mostly written in Python. In addition to image/video processing, he has also worked on implementations of three-dimensional displays as well as the software tools to support the development of such displays.

In addition to posting technical articles and tutorials on his website (http://www.bcastell.com), he participates in a variety of both open and closed source projects and contributes to GitHub under the username Breakthrough (http://www.github.com/Breakthrough). He is an active member of the Super User and Stack Overflow communities (under the name Breakthrough), and can be contacted directly via his website.

I would like to thank all my friends and family for their patience during the past few years (especially my parents, Peter and Lori, and my brother, Mitchell). I could not have accomplished everything without their continued love and support. I can't ever thank everyone enough.

I would also like to extend a special thanks to all of the developers that contribute to open source software libraries, specifically OpenCV, which help bring the development of cutting-edge software technology closer to all the software developers around the world, free of cost. I would also like to thank those people who help write documentation, submit bug reports, and write tutorials/books (especially the author of this book!). Their contributions are vital to the success of any open source project, especially one that is as extensive and complex as OpenCV.

Haojian Jin is a software engineer/researcher at Yahoo! Labs, Sunnyvale, CA. He looks primarily at building new systems of what's possible on commodity mobile devices (or with minimum hardware changes). To create things that don't exist today, he spends large chunks of his time playing with signal processing, computer vision, machine learning, and natural language processing and using them in interesting ways. You can find more about him at http://shift-3.com/

Adrian Rosebrock is an author and blogger at http://www.pyimagesearch.com/. He holds a PhD in computer science from the University of Maryland, Baltimore County, USA, with a focus on computer vision and machine learning.

He has consulted for the National Cancer Institute to develop methods that automatically predict breast cancer risk factors using breast histology images. He has also authored a book, Practical Python and OpenCV (http://pyimg.co/x7ed5), on the utilization of Python and OpenCV to build real-world computer vision applications.
www.PacktPub.com
Support files, eBooks, discount offers, and more

For support files and downloads related to your book, please visit www.PacktPub.com.

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at <[email protected]> for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.

https://www2.packtpub.com/books/subscription/packtlib

Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can search, access, and read Packt's entire library of books.

Why subscribe?

Fully searchable across every book published by Packt
Copy and paste, print, and bookmark content
On demand and accessible via a web browser

Free access for Packt account holders

If you have an account with Packt at www.PacktPub.com, you can use this to access PacktLib today and view 9 entirely free books. Simply use your login credentials for immediate access.
Preface

OpenCV 3 is a state-of-the-art computer vision library that is used for a variety of image and video processing operations. Some of the more spectacular and futuristic features, such as face recognition or object tracking, are easily achievable with OpenCV 3. Learning the basic concepts behind computer vision algorithms, models, and OpenCV's API will enable the development of all sorts of real-world applications, including security and surveillance tools.

Starting with basic image processing operations, this book will take you through a journey that explores advanced computer vision concepts. Computer vision is a rapidly evolving science whose applications in the real world are exploding, so this book will appeal to computer vision novices as well as experts of the subject who want to learn about the brand new OpenCV 3.0.0.

What this book covers

Chapter 1, Setting Up OpenCV, explains how to set up OpenCV 3 with Python on different platforms. It will also troubleshoot common problems.

Chapter 2, Handling Files, Cameras, and GUIs, introduces OpenCV's I/O functionalities. It will also discuss the concept of a project and the beginnings of an object-oriented design for this project.

Chapter 3, Processing Images with OpenCV 3, presents some techniques required to alter images, such as detecting skin tone in an image, sharpening an image, marking contours of subjects, and detecting crosswalks using a line segment detector.

Chapter 4, Depth Estimation and Segmentation, shows you how to use data from a depth camera to identify foreground and background regions, such that we can limit an effect to only the foreground or background.

Chapter 5, Detecting and Recognizing Faces, introduces some of OpenCV's face detection functionalities, along with the data files that define particular types of trackable objects.

Chapter 6, Retrieving Images and Searching Using Image Descriptors, shows how to detect the features of an image with the help of OpenCV and make use of them to match and search for images.

Chapter 7, Detecting and Recognizing Objects, introduces the concept of detecting and recognizing objects, which is one of the most common challenges in computer vision.

Chapter 8, Tracking Objects, explores the vast topic of object tracking, which is the process of locating a moving object in a movie or video feed with the help of a camera.

Chapter 9, Neural Networks with OpenCV – an Introduction, introduces you to Artificial Neural Networks in OpenCV and illustrates their usage in a real-life application.

What you need for this book

You simply need a relatively recent computer, as the first chapter will guide you through the installation of all the necessary software. A webcam is highly recommended, but not necessary.

Who this book is for

This book is aimed at programmers with working knowledge of Python as well as people who want to explore the topic of computer vision using the OpenCV library. No previous experience of computer vision or OpenCV is required. Programming experience is recommended.

Conventions

In this book, you will find a number of text styles that distinguish between different kinds of information. Here are some examples of these styles and an explanation of their meaning.

Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: "We can include other contexts through the use of the include directive."

A block of code is set as follows:
import cv2
import numpy as np

img = cv2.imread('images/chess_board.png')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
gray = np.float32(gray)
dst = cv2.cornerHarris(gray, 2, 23, 0.04)
When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

img = cv2.imread('images/chess_board.png')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
gray = np.float32(gray)
dst = cv2.cornerHarris(gray, 2, 23, 0.04)
Any command-line input or output is written as follows:

mkdir build && cd build
cmake -D CMAKE_BUILD_TYPE=Release -D OPENCV_EXTRA_MODULES_PATH=<opencv_contrib>/modules -D CMAKE_INSTALL_PREFIX=/usr/local ..
make
New terms and important words are shown in bold. Words that you see on the screen, for example, in menus or dialog boxes, appear in the text like this: "On Windows Vista/Windows 7/Windows 8, click on the Start menu."

Note

Warnings or important notes appear in a box like this.

Tip

Tips and tricks appear like this.

Reader feedback

Feedback from our readers is always welcome. Let us know what you think about this book (what you liked or disliked). Reader feedback is important for us as it helps us develop titles that you will really get the most out of.

To send us general feedback, simply e-mail <[email protected]>, and mention the book's title in the subject of your message.

If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.

Customer support

Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.

Downloading the example code

You can download the example code files from your account at http://www.packtpub.com for all the Packt Publishing books you have purchased. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

Errata

Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books, maybe a mistake in the text or the code, we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title.

To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.

Piracy

Piracy of copyrighted material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works in any form on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.

Please contact us at <[email protected]> with a link to the suspected pirated material.

We appreciate your help in protecting our authors and our ability to bring you valuable content.

Questions

If you have a problem with any aspect of this book, you can contact us at <[email protected]>, and we will do our best to address the problem.
Chapter 1. Setting Up OpenCV

You picked up this book so you may already have an idea of what OpenCV is. Maybe, you heard of Sci-Fi-sounding features, such as face detection, and got intrigued. If this is the case, you've made the perfect choice. OpenCV stands for Open Source Computer Vision. It is a free computer vision library that allows you to manipulate images and videos to accomplish a variety of tasks from displaying the feed of a webcam to potentially teaching a robot to recognize real-life objects.

In this book, you will learn to leverage the immense potential of OpenCV with the Python programming language. Python is an elegant language with a relatively shallow learning curve and very powerful features. This chapter is a quick guide to setting up Python 2.7, OpenCV, and other related libraries. After setup, we also look at OpenCV's Python sample scripts and documentation.

Note

If you wish to skip the installation process and jump right into action, you can download the virtual machine (VM) I've made available at http://techfort.github.io/pycv/.

This file is compatible with VirtualBox, a free-to-use virtualization application that lets you build and run VMs. The VM I've built is based on Ubuntu Linux 14.04 and has all the necessary software installed so that you can start coding right away.

This VM requires at least 2 GB of RAM to run smoothly, so make sure that you allocate at least 2 (but, ideally, more than 4) GB of RAM to the VM, which means that your host machine will need at least 6 GB of RAM to sustain it.

The following related libraries are covered in this chapter:
NumPy: This library is a dependency of OpenCV's Python bindings. It provides numeric computing functionality, including efficient arrays.
SciPy: This library is a scientific computing library that is closely related to NumPy. It is not required by OpenCV, but it is useful for manipulating data in OpenCV images.
OpenNI: This library is an optional dependency of OpenCV. It adds the support for certain depth cameras, such as Asus XtionPRO.
SensorKinect: This library is an OpenNI plugin and optional dependency of OpenCV. It adds support for the Microsoft Kinect depth camera.
Forthisbook’spurposes,OpenNIandSensorKinectcanbeconsideredoptional.TheyareusedthroughoutChapter4,DepthEstimationandSegmentation,butarenotusedintheotherchaptersorappendices.
Note

This book focuses on OpenCV 3, the new major release of the OpenCV library. All additional information about OpenCV is available at http://opencv.org, and its documentation is available at http://docs.opencv.org/master.
Choosing and using the right setup tools

We are free to choose various setup tools, depending on our operating system and how much configuration we want to do. Let's take an overview of the tools for Windows, Mac, Ubuntu, and other Unix-like systems.

Installation on Windows

Windows does not come with Python preinstalled. However, installation wizards are available for precompiled Python, NumPy, SciPy, and OpenCV. Alternatively, we can build from a source. OpenCV's build system uses CMake for configuration and either Visual Studio or MinGW for compilation.

If we want support for depth cameras, including Kinect, we should first install OpenNI and SensorKinect, which are available as precompiled binaries with installation wizards. Then, we must build OpenCV from a source.

Note

The precompiled version of OpenCV does not offer support for depth cameras.

On Windows, OpenCV 2 offers better support for 32-bit Python than 64-bit Python; however, with the majority of computers sold today being 64-bit systems, our instructions will refer to 64-bit. All installers have 32-bit versions available from the same site as the 64-bit.

Some of the following steps refer to editing the system's PATH variable. This task can be done in the Environment Variables window of Control Panel.
1. On Windows Vista/Windows 7/Windows 8, click on the Start menu and launch Control Panel. Now, navigate to System and Security | System | Advanced system settings. Click on the Environment Variables… button.
2. On Windows XP, click on the Start menu and navigate to Control Panel | System. Select the Advanced tab. Click on the Environment Variables… button.
3. Now, under System variables, select Path and click on the Edit… button.
4. Make changes as directed.
5. To apply the changes, click on all the OK buttons (until we are back in the main window of Control Panel).
6. Then, log out and log back in (alternatively, reboot).
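After editing PATH, a quick sanity check can save a reboot cycle. This short Python snippet is an illustrative sketch of mine (not from the book): it reports whether a given folder is already an entry of the PATH variable seen by the current session:

```python
import os

def on_path(directory):
    """Return True if directory is an entry of the PATH variable.

    The comparison is case-normalized (important on Windows) and
    ignores trailing slashes or backslashes.
    """
    target = os.path.normcase(directory.rstrip("\\/"))
    entries = os.environ.get("PATH", "").split(os.pathsep)
    return any(os.path.normcase(e.rstrip("\\/")) == target for e in entries)

# Example: check the default Python 2.7 folder used later in this chapter.
print(on_path(r"C:\Python2.7"))
```

Note that a console or script only sees the PATH it inherited at launch, which is why the steps above end with logging out and back in.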
Using binary installers (no support for depth cameras)

You can choose to install Python and its related libraries separately if you prefer; however, there are Python distributions that come with installers that will set up the entire SciPy stack (which includes Python and NumPy), which make it very trivial to set up the development environment.

One such distribution is Anaconda Python (downloadable at http://09c8d0b2229f813c1b93-c95ac804525aac4b6dba79b00b39d1d3.r79.cf1.rackcdn.com/Anaconda-2.1.0-Windows-x86_64.exe). Once the installer is downloaded, run it and remember to add the path to the Anaconda installation to your PATH variable following the preceding procedure.
Here are the steps to set up Python 2.7, NumPy, SciPy, and OpenCV:

1. Download and install the 64-bit Python 2.7.9 from https://www.python.org/ftp/python/2.7.9/python-2.7.9.amd64.msi.
2. Download and install NumPy 1.6.2 from http://www.lfd.uci.edu/~gohlke/pythonlibs/#numpy or http://sourceforge.net/projects/numpy/files/NumPy/1.6.2/numpy-1.6.2-win32-superpack-python2.7.exe/download (note that installing NumPy on Windows 64-bit is a bit tricky due to the lack of a 64-bit Fortran compiler on Windows, which NumPy depends on. The binary at the preceding link is unofficial).
3. Download and install SciPy 0.11.0 from http://www.lfd.uci.edu/~gohlke/pythonlibs/#scipy or http://sourceforge.net/projects/scipy/files/scipy/0.11.0/scipy-0.11.0-win32-superpack-python2.7.exe/download (this is the same as NumPy and these are community installers).
4. Download the self-extracting ZIP of OpenCV 3.0.0 from https://github.com/Itseez/opencv. Run this ZIP, and when prompted, enter a destination folder, which we will refer to as <unzip_destination>. A subfolder, <unzip_destination>\opencv, is created.
5. Copy <unzip_destination>\opencv\build\python\2.7\cv2.pyd to C:\Python2.7\Lib\site-packages (assuming that we had installed Python 2.7 to the default location). If you installed Python 2.7 with Anaconda, use the Anaconda installation folder instead of the default Python installation. Now, the new Python installation can find OpenCV.
6. A final step is necessary if we want Python scripts to run using the new Python installation by default. Edit the system's PATH variable and append ;C:\Python2.7 (assuming that we had installed Python 2.7 to the default location) or your Anaconda installation folder. Remove any previous Python paths, such as ;C:\Python2.6. Log out and log back in (alternatively, reboot).
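Step 5 assumes the default site-packages location. If you are unsure which folder your particular Python installation (for example, an Anaconda one) actually searches for packages, this stdlib snippet (a sketch of mine, not from the book) prints the candidate folders where cv2.pyd must be placed:

```python
import site
import sysconfig

def site_package_dirs():
    """Return the folders this interpreter searches for installed packages."""
    dirs = []
    if hasattr(site, "getsitepackages"):  # absent in some virtualenvs
        dirs.extend(site.getsitepackages())
    dirs.append(sysconfig.get_paths()["purelib"])
    # De-duplicate while preserving order.
    seen = set()
    return [d for d in dirs if not (d in seen or seen.add(d))]

for d in site_package_dirs():
    print(d)
```

Copying cv2.pyd into any of the printed folders makes it importable from that interpreter.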
Using CMake and compilers

Windows does not come with any compilers or CMake. We need to install them. If we want support for depth cameras, including Kinect, we also need to install OpenNI and SensorKinect.

Let's assume that we have already installed Python 2.7, NumPy, and SciPy either from binaries (as described previously) or from a source. Now, we can proceed with installing compilers and CMake, optionally installing OpenNI and SensorKinect, and then building OpenCV from the source:

1. Download and install CMake 3.1.2 from http://www.cmake.org/files/v3.1/cmake-3.1.2-win32-x86.exe. When running the installer, select either Add CMake to the system PATH for all users or Add CMake to the system PATH for current user. Don't worry about the fact that a 64-bit version of CMake is not available; CMake is only a configuration tool and does not perform any compilations itself. Instead, on Windows, it creates project files that can be opened with Visual Studio.
2. Download and install Microsoft Visual Studio 2013 (the Desktop edition if you are working on Windows 7) from https://www.visualstudio.com/products/free-developer-offers-vs.aspx?slcid=0x409&type=web or MinGW.

Note that you will need to sign in with your Microsoft account and if you don't have one, you can create one on the spot. Install the software and reboot after installation is complete.

For MinGW, get the installer from http://sourceforge.net/projects/mingw/files/Installer/mingw-get-setup.exe/download and http://sourceforge.net/projects/mingw/files/OldFiles/mingw-get-inst/mingw-get-inst-20120426/mingw-get-inst-20120426.exe/download. When running the installer, make sure that the destination path does not contain spaces and that the optional C++ compiler is included. Edit the system's PATH variable and append ;C:\MinGW\bin (assuming that MinGW is installed to the default location). Reboot the system.
3. Optionally, download and install OpenNI 1.5.4.0 from the links provided in the GitHub homepage of OpenNI at https://github.com/OpenNI/OpenNI.
4. You can download and install SensorKinect 0.93 from https://github.com/avin2/SensorKinect/blob/unstable/Bin/SensorKinect093-Bin-Win32-v5.1.2.1.msi?raw=true (32-bit). Alternatively, for 64-bit Python, download the setup from https://github.com/avin2/SensorKinect/blob/unstable/Bin/SensorKinect093-Bin-Win64-v5.1.2.1.msi?raw=true (64-bit). Note that this repository has been inactive for more than three years.
5. Download the self-extracting ZIP of OpenCV 3.0.0 from https://github.com/Itseez/opencv. Run the self-extracting ZIP, and when prompted, enter any destination folder, which we will refer to as <unzip_destination>. A subfolder, <unzip_destination>\opencv, is then created.
6. Open Command Prompt and make another folder where our build will go using this command:

> mkdir <build_folder>

Change the directory of the build folder:

> cd <build_folder>
7. Now, we are ready to configure our build. To understand all the options, we can read the code in <unzip_destination>\opencv\CMakeLists.txt. However, for this book's purposes, we only need to use the options that will give us a release build with Python bindings, and optionally, depth camera support via OpenNI and SensorKinect.
8. Open CMake (cmake-gui) and specify the location of the source code of OpenCV and the folder where you would like to build the library. Click on Configure. Select the project to be generated. In this case, select Visual Studio 12 (which corresponds to Visual Studio 2013). After CMake has finished configuring the project, it will output a list of build options. If you see a red background, it means that your project may need to be reconfigured: CMake might report that it has failed to find some dependencies. Many of OpenCV's dependencies are optional, so do not be too concerned yet.

Note

If the build fails to complete or you run into problems later, try installing missing dependencies (often available as prebuilt binaries), and then rebuild OpenCV from this step.

You have the option of selecting/deselecting build options (according to the libraries you have installed on your machine); click on Configure again, until you get a clear background (white).

9. At the end of this process, you can click on Generate, which will create an OpenCV.sln file in the folder you've chosen for the build. You can then navigate to <build_folder>/OpenCV.sln, open the file with Visual Studio 2013, and proceed with building the project, ALL_BUILD. You will need to build both the Debug and Release versions of OpenCV, so go ahead and build the library in the Debug mode, then select Release and rebuild it (F7 is the key to launch the build).
10. At this stage, you will have a bin folder in the OpenCV build directory, which will contain all the generated .dll files that will allow you to include OpenCV in your projects.
Alternatively, for MinGW, run the following command:

> cmake -D CMAKE_BUILD_TYPE=RELEASE -D WITH_OPENNI=ON -G "MinGW Makefiles" <unzip_destination>\opencv

If OpenNI is not installed, omit -D WITH_OPENNI=ON. (In this case, depth cameras will not be supported.) If OpenNI and SensorKinect are installed to nondefault locations, modify the command to include -D OPENNI_LIB_DIR=<openni_install_destination>\Lib -D OPENNI_INCLUDE_DIR=<openni_install_destination>\Include -D OPENNI_PRIME_SENSOR_MODULE_BIN_DIR=<sensorkinect_install_destination>\Sensor\Bin.
Alternatively, for MinGW, run this command:

> mingw32-make

11. Copy <build_folder>\lib\Release\cv2.pyd (from a Visual Studio build) or <build_folder>\lib\cv2.pyd (from a MinGW build) to <python_installation_folder>\site-packages.
12. Finally, edit the system's PATH variable and append ;<build_folder>/bin/Release (for a Visual Studio build) or ;<build_folder>/bin (for a MinGW build). Reboot your system.
Installing on OS X

Some versions of Mac used to come with a version of Python 2.7 preinstalled that were customized by Apple for a system's internal needs. However, this has changed and the standard version of OS X ships with a standard installation of Python. On python.org, you can also find a universal binary that is compatible with both the new Intel systems and the legacy PowerPC.

Note

You can obtain this installer at https://www.python.org/downloads/release/python-279/ (refer to the Mac OS X 32-bit PPC or the Mac OS X 64-bit Intel links). Installing Python from the downloaded .dmg file will simply overwrite your current system installation of Python.

For Mac, there are several possible approaches for obtaining standard Python 2.7, NumPy, SciPy, and OpenCV. All approaches ultimately require OpenCV to be compiled from a source using Xcode Developer Tools. However, depending on the approach, this task is automated for us in various ways by third-party tools. We will look at these kinds of approaches using MacPorts or Homebrew. These tools can potentially do everything that CMake can, plus they help us resolve dependencies and separate our development libraries from system libraries.

Tip

I recommend MacPorts, especially if you want to compile OpenCV with depth camera support via OpenNI and SensorKinect. Relevant patches and build scripts, including some that I maintain, are ready-made for MacPorts. By contrast, Homebrew does not currently provide a ready-made solution to compile OpenCV with depth camera support.

Before proceeding, let's make sure that the Xcode Developer Tools are properly set up:
1. Download and install Xcode from the Mac App Store or https://developer.apple.com/xcode/downloads/. During installation, if there is an option to install Command Line Tools, select it.
2. Open Xcode and accept the license agreement.
3. A final step is necessary if the installer does not give us the option to install Command Line Tools. Navigate to Xcode | Preferences | Downloads, and click on the Install button next to Command Line Tools. Wait for the installation to finish and quit Xcode.

Alternatively, you can install the Xcode command-line tools by running the following command (in the terminal):

$ xcode-select --install

Now, we have the required compilers for any approach.
Using MacPorts with ready-made packages

We can use the MacPorts package manager to help us set up Python 2.7, NumPy, and OpenCV. MacPorts provides terminal commands that automate the process of downloading, compiling, and installing various pieces of open source software (OSS). MacPorts also installs dependencies as needed. For each piece of software, the dependencies and build recipes are defined in a configuration file called a Portfile. A MacPorts repository is a collection of Portfiles.

Starting from a system where Xcode and its command-line tools are already set up, the following steps will give us an OpenCV installation via MacPorts:
1. Download and install MacPorts from http://www.macports.org/install.php.
2. If you want support for the Kinect depth camera, you need to tell MacPorts where to download the custom Portfiles that I have written. To do so, edit /opt/local/etc/macports/sources.conf (assuming that MacPorts is installed to the default location). Just above the line rsync://rsync.macports.org/release/ports/ [default], add the following line:
http://nummist.com/opencv/ports.tar.gz
Save the file. Now, MacPorts knows that it has to search for Portfiles in my online repository first, and then the default online repository.
3. Open the terminal and run the following command to update MacPorts:
$ sudo port selfupdate
When prompted, enter your password.
4. Now (if we are using my repository), run the following command to install OpenCV with Python 2.7 bindings and support for depth cameras, including Kinect:
$ sudo port install opencv +python27 +openni_sensorkinect
Alternatively (with or without my repository), run the following command to install OpenCV with Python 2.7 bindings and support for depth cameras, excluding Kinect:
$ sudo port install opencv +python27 +openni
Note
Dependencies, including Python 2.7, NumPy, OpenNI, and (in the first example) SensorKinect, are automatically installed as well.
By adding +python27 to the command, we specify that we want the opencv variant (build configuration) with Python 2.7 bindings. Similarly, +openni_sensorkinect specifies the variant with the broadest possible support for depth cameras via OpenNI and SensorKinect. You may omit +openni_sensorkinect if you do not intend to use depth cameras, or you may replace it with +openni if you do intend to use OpenNI-compatible depth cameras but just not Kinect. To see the full list of the available variants before installing, we can enter the following command:
$ port variants opencv
Depending on our customization needs, we can add other variants to the install command. For even more flexibility, we can write our own variants (as described in the next section).
5. Also, run the following command to install SciPy:
$ sudo port install py27-scipy
6. The Python installation's executable is named python2.7. If we want to link the default python executable to python2.7, let's also run these commands:
$ sudo port install python_select
$ sudo port select python python27
Using MacPorts with your own custom packages
With a few extra steps, we can change the way that MacPorts compiles OpenCV or any other piece of software. As previously mentioned, MacPorts' build recipes are defined in configuration files called Portfiles. By creating or editing Portfiles, we can access highly configurable build tools, such as CMake, while also benefitting from MacPorts' features, such as dependency resolution.
Let's assume that we already have MacPorts installed. Now, we can configure MacPorts to use the custom Portfiles that we write:
1. Create a folder somewhere to hold our custom Portfiles. We will refer to this folder as <local_repository>.
2. Edit the /opt/local/etc/macports/sources.conf file (assuming that MacPorts is installed to the default location). Just above the rsync://rsync.macports.org/release/ports/ [default] line, add this line:
file://<local_repository>
For example, if <local_repository> is /Users/Joe/Portfiles, add the following line:
file:///Users/Joe/Portfiles
Note the triple slashes and save the file. Now, MacPorts knows that it has to search for Portfiles in <local_repository> first, and then its default online repository.
3. Open the terminal and update MacPorts to ensure that we have the latest Portfiles from the default repository:
$ sudo port selfupdate
4. Let's copy the default repository's opencv Portfile as an example. We should also copy the directory structure, which determines how the package is categorized by MacPorts:
$ mkdir <local_repository>/graphics/
$ cp -R /opt/local/var/macports/sources/rsync.macports.org/release/ports/graphics/opencv <local_repository>/graphics
Alternatively, for an example that includes Kinect support, we could download my online repository from http://nummist.com/opencv/ports.tar.gz, unzip it, and copy its entire graphics folder into <local_repository>:
$ cp -R <unzip_destination>/graphics <local_repository>
5. Edit <local_repository>/graphics/opencv/Portfile. Note that this file specifies the CMake configuration flags, dependencies, and variants. For details on Portfile editing, go to http://guide.macports.org/#development.
To see which CMake configuration flags are relevant to OpenCV, we need to look at its source code. Download the source code archive from https://github.com/Itseez/opencv/archive/3.0.0.zip, unzip it to any location, and read <unzip_destination>/OpenCV-3.0.0/CMakeLists.txt.
After making any edits to the Portfile, save it.
6. Now, we need to generate an index file in our local repository so that MacPorts can find the new Portfile:
$ cd <local_repository>
$ portindex
7. From now on, we can treat our custom opencv file just like any other MacPorts package. For example, we can install it as follows:
$ sudo port install opencv +python27 +openni_sensorkinect
Note that our local repository's Portfile takes precedence over the default repository's Portfile because of the order in which they are listed in /opt/local/etc/macports/sources.conf.
Using Homebrew with ready-made packages (no support for depth cameras)
Homebrew is another package manager that can help us. Normally, MacPorts and Homebrew should not be installed on the same machine.
Starting from a system where Xcode and its command-line tools are already set up, the following steps will give us an OpenCV installation via Homebrew:
1. Open the terminal and run the following command to install Homebrew:
$ ruby -e "$(curl -fsSkL raw.github.com/mxcl/homebrew/go)"
2. Unlike MacPorts, Homebrew does not automatically put its executables in PATH. To do so, create or edit the ~/.profile file and add this line at the top of the file:
export PATH=/usr/local/bin:/usr/local/sbin:$PATH
Save the file and run this command to refresh PATH:
$ source ~/.profile
Note that executables installed by Homebrew now take precedence over executables installed by the system.
3. For Homebrew's self-diagnostic report, run the following command:
$ brew doctor
Follow any troubleshooting advice it gives.
4. Now, update Homebrew:
$ brew update
5. Run the following command to install Python 2.7:
$ brew install python
6. Now, we can install NumPy. Homebrew's selection of Python library packages is limited, so we use a separate package management tool called pip, which comes with Homebrew's Python:
$ pip install numpy
7. SciPy contains some Fortran code, so we need an appropriate compiler. We can use Homebrew to install the gfortran compiler:
$ brew install gfortran
Now, we can install SciPy:
$ pip install scipy
8. To install OpenCV on a 64-bit system (all new Mac hardware since late 2006), run the following command:
$ brew install opencv
Tip
Downloading the example code
You can download the example code files for all Packt Publishing books that you have purchased from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.
Using Homebrew with your own custom packages
Homebrew makes it easy to edit existing package definitions:
$ brew edit opencv
The package definitions are actually scripts in the Ruby programming language. Tips on editing them can be found on the Homebrew Wiki page at https://github.com/mxcl/homebrew/wiki/Formula-Cookbook. A script may specify Make or CMake configuration flags, among other things.
To see which CMake configuration flags are relevant to OpenCV, we need to look at its source code. Download the source code archive from https://github.com/Itseez/opencv/archive/3.0.0.zip, unzip it to any location, and read <unzip_destination>/OpenCV-3.0.0/CMakeLists.txt.
After making edits to the Ruby script, save it.
The customized package can be treated as normal. For example, it can be installed as follows:
$ brew install opencv
Installation on Ubuntu and its derivatives
First and foremost, here is a quick note on Ubuntu's operating system versions: Ubuntu has a 6-month release cycle in which each release is either a .04 or a .10 minor version of a major version (14 at the time of writing). Every two years, however, Ubuntu releases a version classified as long-term support (LTS), which is granted five years of support by Canonical (the company behind Ubuntu). If you work in an enterprise environment, it is certainly advisable to install one of the LTS versions. The latest one available is 14.04.
Ubuntu comes with Python 2.7 preinstalled. The standard Ubuntu repository contains OpenCV 2.4.9 packages without support for depth cameras. At the time of writing this, OpenCV 3 is not yet available through the Ubuntu repositories, so we will have to build it from source. Fortunately, the vast majority of Unix-like and Linux systems come with all the necessary software to build a project from scratch already installed. When built from source, OpenCV can support depth cameras via OpenNI and SensorKinect, which are available as precompiled binaries with installation scripts.
Using the Ubuntu repository (no support for depth cameras)
We can install Python and all its necessary dependencies using the apt package manager, by running the following commands:
> sudo apt-get install build-essential
> sudo apt-get install cmake git libgtk2.0-dev pkg-config libavcodec-dev libavformat-dev libswscale-dev
> sudo apt-get install python-dev python-numpy libtbb2 libtbb-dev libjpeg-dev libpng-dev libtiff-dev libjasper-dev libdc1394-22-dev
Equivalently, we could have used the Ubuntu Software Center, which is the apt package manager's graphical frontend.
Building OpenCV from source
Now that we have the entire Python stack and cmake installed, we can build OpenCV. First, we need to download the source code from https://github.com/Itseez/opencv/archive/3.0.0-beta.zip.
Extract the archive and, in a terminal, move into the unzipped folder.
Then, run the following commands:
> mkdir build
> cd build
> cmake -D CMAKE_BUILD_TYPE=Release -D CMAKE_INSTALL_PREFIX=/usr/local ..
> make
> make install
After the installation terminates, you might want to look at OpenCV's Python samples in <opencv_folder>/opencv/samples/python and <opencv_folder>/opencv/samples/python2.
Installation on other Unix-like systems
The approaches for Ubuntu (as described previously) are likely to work on any Linux distribution derived from Ubuntu 14.04 LTS or Ubuntu 14.10, as follows:
Kubuntu 14.04 LTS or Kubuntu 14.10
Xubuntu 14.04 LTS or Xubuntu 14.10
Linux Mint 17
On Debian Linux and its derivatives, the apt package manager works the same as on Ubuntu, though the available packages may differ.
On Gentoo Linux and its derivatives, the Portage package manager is similar to MacPorts (as described previously), though the available packages may differ.
On FreeBSD derivatives, the process of installation is again similar to MacPorts; in fact, MacPorts derives from the ports installation system adopted on FreeBSD. Consult the remarkable FreeBSD Handbook at https://www.freebsd.org/doc/handbook/ for an overview of the software installation process.
On other Unix-like systems, the package manager and available packages may differ. Consult your package manager's documentation and search for packages with opencv in their names. Remember that OpenCV and its Python bindings might be split into multiple packages.
Also, look for any installation notes published by the system provider, the repository maintainer, or the community. Since OpenCV uses camera drivers and media codecs, getting all of its functionality to work can be tricky on systems with poor multimedia support. Under some circumstances, system packages might need to be reconfigured or reinstalled for compatibility.
If packages are available for OpenCV, check their version number. OpenCV 3 or higher is recommended for this book's purposes. Also, check whether the packages offer Python bindings and depth camera support via OpenNI and SensorKinect. Finally, check whether anyone in the developer community has reported success or failure in using the packages.
If, instead, we want to do a custom build of OpenCV from source, it might be helpful to refer to the installation script for Ubuntu (as discussed previously) and adapt it to the package manager and packages that are present on another system.
Installing the Contrib modules
Unlike with OpenCV 2.4, some modules are contained in a repository called opencv_contrib, which is available at https://github.com/Itseez/opencv_contrib. I highly recommend installing these modules as they contain extra functionalities that are not included in OpenCV, such as the face recognition module.
Once downloaded (either through zip or git; I recommend git so that you can keep up to date with a simple git pull command), you can rerun your cmake command to include the building of OpenCV with the opencv_contrib modules as follows:
cmake -D OPENCV_EXTRA_MODULES_PATH=<opencv_contrib>/modules <opencv_source_directory>
So, if you've followed the standard procedure and created a build directory in your OpenCV download folder, you should run the following commands:
mkdir build && cd build
cmake -D CMAKE_BUILD_TYPE=Release -D OPENCV_EXTRA_MODULES_PATH=<opencv_contrib>/modules -D CMAKE_INSTALL_PREFIX=/usr/local ..
make
Running samples
Running a few sample scripts is a good way to test whether OpenCV is correctly set up. The samples are included in OpenCV's source code archive.
On Windows, we should have already downloaded and unzipped OpenCV's self-extracting ZIP. Find the samples in <unzip_destination>/opencv/samples.
On Unix-like systems, including Mac, download the source code archive from https://github.com/Itseez/opencv/archive/3.0.0.zip and unzip it to any location (if we have not already done so). Find the samples in <unzip_destination>/OpenCV-3.0.0/samples.
Some of the sample scripts require command-line arguments. However, the following scripts (among others) should work without any arguments:
python/camera.py: This script displays a webcam feed (assuming that a webcam is plugged in).
python/drawing.py: This script draws a series of shapes, like a screensaver.
python2/hist.py: This script displays a photo. Press A, B, C, D, or E to see the variations of the photo, along with a corresponding histogram of color or grayscale values.
python2/opt_flow.py (missing from the Ubuntu package): This script displays a webcam feed with a superimposed visualization of optical flow (such as the direction of motion). For example, slowly wave your hand at the webcam to see the effect. Press 1 or 2 for alternative visualizations.
To exit a script, press Esc (not the window's close button).
If we encounter the ImportError: No module named cv2.cv message, then this means that we are running the script from a Python installation that does not know anything about OpenCV. There are two possible explanations for this:
Some steps in the OpenCV installation might have failed or been missed. Go back and review the steps.
If we have multiple Python installations on the machine, we might be using the wrong version of Python to launch the script. For example, on Mac, it might be the case that OpenCV is installed for MacPorts Python, but we are running the script with the system's Python. Go back and review the installation steps about editing the system path. Also, try launching the script manually from the command line using commands such as this:
$ python python/camera.py
You can also use the following command:
$ python2.7 python/camera.py
As another possible means of selecting a different Python installation, try editing the sample script to remove the #! lines. These lines might explicitly associate the script with the wrong Python installation (for our particular setup).
Finding documentation, help, and updates
OpenCV's documentation can be found online at http://docs.opencv.org/. The documentation includes a combined API reference for OpenCV's new C++ API, its new Python API (which is based on the C++ API), its old C API, and its old Python API (which is based on the C API). When looking up a class or function, be sure to read the section about the new Python API (the cv2 module), and not the old Python API (the cv module).
The documentation is also available as several downloadable PDF files:
API reference: This documentation can be found at http://docs.opencv.org/modules/refman.html
Tutorials: These documents can be found at http://docs.opencv.org/doc/tutorials/tutorials.html (these tutorials use C++ code; for a Python port of the tutorials' code, see the repository of Abid Rahman K. at http://goo.gl/EPsD1)
If you write code on airplanes or other places without Internet access, you will definitely want to keep offline copies of the documentation.
If the documentation does not seem to answer your questions, try talking to the OpenCV community. Here are some sites where you will find helpful people:
The OpenCV forum: http://www.answers.opencv.org/questions/
David Millán Escrivá's blog (one of this book's reviewers): http://blog.damiles.com/
Abid Rahman K.'s blog (one of this book's reviewers): http://www.opencvpython.blogspot.com/
Adrian Rosebrock's website (one of this book's reviewers): http://www.pyimagesearch.com/
Joe Minichino's website for this book (author of this book): http://techfort.github.io/pycv/
Joe Howse's website for this book (author of the first edition of this book): http://nummist.com/opencv/
Lastly, if you are an advanced user who wants to try new features, bug fixes, and sample scripts from the latest (unstable) OpenCV source code, have a look at the project's repository at https://github.com/Itseez/opencv/.
Summary
By now, we should have an OpenCV installation that can do everything we need for the project described in this book. Depending on which approach we took, we might also have a set of tools and scripts that are usable to reconfigure and rebuild OpenCV for our future needs.
We know where to find OpenCV's Python samples. These samples cover a range of functionality outside this book's scope, but they are useful as additional learning aids.
In the next chapter, we will familiarize ourselves with the most basic functions of the OpenCV API, namely, displaying images and videos, capturing video through a webcam, and handling basic keyboard and mouse inputs.
Chapter 2. Handling Files, Cameras, and GUIs
Installing OpenCV and running samples is fun, but at this stage, we want to try it out ourselves. This chapter introduces OpenCV's I/O functionality. We also discuss the concept of a project and the beginnings of an object-oriented design for this project, which we will flesh out in subsequent chapters.
By starting with a look at the I/O capabilities and design patterns, we will build our project in the same way we would make a sandwich: from the outside in. Bread slices and spread, or endpoints and glue, come before fillings or algorithms. We choose this approach because computer vision is mostly extroverted (it contemplates the real world outside our computer), and we want to apply all our subsequent algorithmic work to the real world through a common interface.
Basic I/O scripts
Most CV applications need to get images as input. Most also produce images as output. An interactive CV application might require a camera as an input source and a window as an output destination. However, other possible sources and destinations include image files, video files, and raw bytes. For example, raw bytes might be transmitted via a network connection, or they might be generated by an algorithm if we incorporate procedural graphics into our application. Let's look at each of these possibilities.
Reading/writing an image file
OpenCV provides the imread() and imwrite() functions that support various file formats for still images. The supported formats vary by system but should always include the BMP format. Typically, PNG, JPEG, and TIFF should be among the supported formats too.
Let's explore the anatomy of the representation of an image in Python and NumPy.
No matter the format, each pixel has a value, but the difference is in how the pixel is represented. For example, we can create a black square image from scratch by simply creating a 2D NumPy array:
img = numpy.zeros((3, 3), dtype=numpy.uint8)
If we print this image to a console, we obtain the following result:
array([[0, 0, 0],
       [0, 0, 0],
       [0, 0, 0]], dtype=uint8)
Each pixel is represented by a single 8-bit integer, which means that the values for each pixel are in the 0-255 range.
Let's now convert this image into blue-green-red (BGR) using cv2.cvtColor:
img = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR)
Let's observe how the image has changed:
array([[[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]],

       [[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]],

       [[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]]], dtype=uint8)
As you can see, each pixel is now represented by a three-element array, with each integer representing the B, G, and R channels, respectively. Other color spaces, such as HSV, will be represented in the same way, albeit with different value ranges (for example, the hue value of the HSV color space has a range of 0-180) and different numbers of channels.
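To make the hue range concrete, here is a small pure-Python sketch of OpenCV's 8-bit HSV convention (hue halved to fit 0-180, saturation and value scaled to 0-255). The bgr_to_hsv8() helper is our own illustration, not an OpenCV function; in practice, you would simply call cv2.cvtColor(img, cv2.COLOR_BGR2HSV):

```python
def bgr_to_hsv8(b, g, r):
    """Convert one BGR pixel to OpenCV-style 8-bit HSV (illustrative helper)."""
    mx, mn = max(b, g, r), min(b, g, r)
    v = mx
    s = 0 if mx == 0 else int(round(255.0 * (mx - mn) / mx))
    if mx == mn:
        h = 0.0                                    # gray: hue is undefined, use 0
    elif mx == r:
        h = (60.0 * (g - b) / (mx - mn)) % 360.0
    elif mx == g:
        h = 60.0 * (b - r) / (mx - mn) + 120.0
    else:                                          # mx == b
        h = 60.0 * (r - g) / (mx - mn) + 240.0
    return int(round(h / 2.0)), s, v               # hue halved: 0-360 becomes 0-180

print(bgr_to_hsv8(255, 0, 0))  # pure blue -> (120, 255, 255)
print(bgr_to_hsv8(0, 0, 255))  # pure red  -> (0, 255, 255)
```

Note how pure blue lands at hue 120 rather than the 240 degrees you might expect from a 0-360 hue wheel.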
You can check the structure of an image by inspecting the shape property, which returns rows, columns, and the number of channels (if there is more than one).
Consider this example:
>>> img = numpy.zeros((3, 3), dtype=numpy.uint8)
>>> img.shape
The preceding code will print (3, 3). If you then converted the image to BGR, the shape would be (3, 3, 3), which indicates the presence of three channels per pixel.
Images can be loaded from one file format and saved to another. For example, let's convert an image from PNG to JPEG:
import cv2

image = cv2.imread('MyPic.png')
cv2.imwrite('MyPic.jpg', image)
Note
Most of the OpenCV functionalities that we use are in the cv2 module. You might come across other OpenCV guides that instead rely on the cv or cv2.cv modules, which are legacy versions. The reason why the Python module is called cv2 is not because it is a Python binding module for OpenCV 2.x.x, but because it has introduced a better API, which leverages object-oriented programming as opposed to the previous cv module, which adhered to a more procedural style of programming.
By default, imread() returns an image in the BGR color format even if the file uses a grayscale format. BGR represents the same color space as red-green-blue (RGB), but the byte order is reversed.
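Because only the channel order differs, converting between BGR and RGB is just a matter of reversing the last axis of the array. Here is a minimal NumPy-only sketch with a hypothetical 1x2 image:

```python
import numpy

# A hypothetical 1x2 BGR image: one blue pixel and one red pixel.
bgr = numpy.array([[[255, 0, 0],    # blue, in BGR order
                    [0, 0, 255]]],  # red, in BGR order
                  dtype=numpy.uint8)

# Reversing the channel axis converts BGR to RGB (and back again).
rgb = bgr[:, :, ::-1]

print(rgb[0, 0])  # the blue pixel reads [0, 0, 255] in RGB order
```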
Optionally, we may specify the mode of imread() to be one of the following enumerators:
IMREAD_ANYCOLOR = 4
IMREAD_ANYDEPTH = 2
IMREAD_COLOR = 1
IMREAD_GRAYSCALE = 0
IMREAD_LOAD_GDAL = 8
IMREAD_UNCHANGED = -1
For example, let's load a PNG file as a grayscale image (losing any color information in the process), and then save it as a grayscale PNG image:
import cv2

grayImage = cv2.imread('MyPic.png', cv2.IMREAD_GRAYSCALE)
cv2.imwrite('MyPicGray.png', grayImage)
To avoid unnecessary headaches, use absolute paths to your images (for example, C:\Users\Joe\Pictures\MyPic.png on Windows or /home/joe/pictures/MyPic.png on Unix), at least while you're familiarizing yourself with OpenCV's API. The path of an image, unless absolute, is relative to the current working directory (normally the folder from which you run the Python script), so in the preceding example, MyPic.png would have to be in that folder or the image won't be found.
With most modes, imread() discards any alpha channel (transparency); IMREAD_UNCHANGED is the exception that preserves it. The imwrite() function requires an image to be in the BGR or grayscale format with a certain number of bits per channel that the output format can support. For example, BMP requires 8 bits per channel, while PNG allows either 8 or 16 bits per channel.
Converting between an image and raw bytes
Conceptually, a byte is an integer ranging from 0 to 255. In all real-time graphic applications today, a pixel is typically represented by one byte per channel, though other representations are also possible.
An OpenCV image is a 2D or 3D array of the numpy.array type. An 8-bit grayscale image is a 2D array containing byte values. A 24-bit BGR image is a 3D array, which also contains byte values. We may access these values by using an expression, such as image[0, 0] or image[0, 0, 0]. The first index is the pixel's y coordinate or row, 0 being the top. The second index is the pixel's x coordinate or column, 0 being the leftmost. The third index (if applicable) represents a color channel.
For example, in an 8-bit grayscale image with a white pixel in the upper-left corner, image[0, 0] is 255. For a 24-bit BGR image with a blue pixel in the upper-left corner, image[0, 0] is [255, 0, 0].
Note
As an alternative to using an expression, such as image[0, 0] or image[0, 0] = 128, we may use an expression, such as image.item((0, 0)) or image.itemset((0, 0), 128). The latter expressions are more efficient for single-pixel operations. However, as we will see in subsequent chapters, we usually want to perform operations on large slices of an image rather than on single pixels.
Provided that an image has 8 bits per channel, we can cast it to a standard Python bytearray, which is one-dimensional:
byteArray = bytearray(image)
Conversely, provided that bytearray contains bytes in an appropriate order, we can cast and then reshape it to get a numpy.array type that is an image:
grayImage = numpy.array(grayByteArray).reshape(height, width)
bgrImage = numpy.array(bgrByteArray).reshape(height, width, 3)
As a more complete example, let's convert a bytearray that contains random bytes to a grayscale image and a BGR image:
import cv2
import numpy
import os

# Make an array of 120,000 random bytes.
randomByteArray = bytearray(os.urandom(120000))
flatNumpyArray = numpy.array(randomByteArray)

# Convert the array to make a 400x300 grayscale image.
grayImage = flatNumpyArray.reshape(300, 400)
cv2.imwrite('RandomGray.png', grayImage)

# Convert the array to make a 400x100 color image.
bgrImage = flatNumpyArray.reshape(100, 400, 3)
cv2.imwrite('RandomColor.png', bgrImage)
After running this script, we should have a pair of randomly generated images, RandomGray.png and RandomColor.png, in the script's directory.
Note
Here, we use Python's standard os.urandom() function to generate random raw bytes, which we will then convert to a NumPy array. Note that it is also possible to generate a random NumPy array directly (and more efficiently) using a statement such as numpy.random.randint(0, 256, 120000).reshape(300, 400). The only reason we use os.urandom() is to help demonstrate a conversion from raw bytes.
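The bytearray conversions above can also be verified without touching the filesystem. Here is a minimal round-trip sketch with a hypothetical 2x3 grayscale image:

```python
import numpy

# A hypothetical 2x3 grayscale image.
image = numpy.array([[10, 20, 30],
                     [40, 50, 60]], dtype=numpy.uint8)

# Image -> raw bytes: one byte per pixel for an 8-bit grayscale image.
rawBytes = bytearray(image)
print(len(rawBytes))  # 6

# Raw bytes -> image: the shape must be restored manually.
restored = numpy.array(rawBytes, dtype=numpy.uint8).reshape(2, 3)
print((restored == image).all())  # True
```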
Accessing image data with numpy.array
Now that you have a better understanding of how an image is formed, we can start performing basic operations on it. We know that the easiest (and most common) way to load an image in OpenCV is to use the imread() function. We also know that this will return an image, which is really an array (either a 2D or 3D one, depending on the parameters you passed to imread()).
The numpy.array structure is well optimized for array operations, and it allows certain kinds of bulk manipulations that are not available in a plain Python list. These kinds of numpy.array type-specific operations come in handy for image manipulations in OpenCV. Let's explore image manipulations from the start and step by step though, with a basic example: say you want to manipulate a pixel at the coordinates, (0, 0), of a BGR image and turn it into a white pixel.
import cv2
import numpy as np

img = cv2.imread('MyPic.png')
img[0, 0] = [255, 255, 255]
If you then showed the image with a standard imshow() call, you would see a white dot in the top-left corner of the image. Naturally, this isn't very useful, but it shows what can be accomplished. Let's now leverage the ability of numpy.array to perform transformations on an array much faster than a plain Python array.
Let's say that you want to change the blue value of a particular pixel, for example, the pixel at coordinates (150, 120). The numpy.array type provides a very handy method, item(), which takes three parameters: the row (y) position, the column (x) position, and the index within the array at that position (remember that in a BGR image, the data at a certain position is a three-element array containing the B, G, and R values in this order), and returns the value at the index position. Another method, itemset(), sets the value of a particular channel of a particular pixel to a specified value (itemset() takes two arguments: a three-element tuple (row, column, and index) and the new value).
In this example, we will change the value of blue at (150, 120) from its current value (127) to an arbitrary 255:
import cv2
import numpy as np

img = cv2.imread('MyPic.png')
print img.item(150, 120, 0)  # prints the current value of B for that pixel
img.itemset((150, 120, 0), 255)
print img.item(150, 120, 0)  # prints 255
Remember that we do this with numpy.array for two reasons: numpy.array is extremely optimized for these kinds of operations, and we obtain more readable code through NumPy's elegant methods rather than the raw index access of the first example.
This particular code doesn't do much in itself, but it does open a world of possibilities. It is, however, advisable that you utilize built-in filters and methods to manipulate an entire image; the above approach is only suitable for small regions of interest.
Now, let's take a look at a very common operation, namely, manipulating channels. Sometimes, you'll want to zero-out all the values of a particular channel (B, G, or R).
Tip
Using loops to manipulate Python arrays is very costly in terms of runtime and should be avoided at all costs; this is especially true if you manipulate videos, where you'll otherwise find yourself with a jittery output. Instead, a feature called array indexing allows for efficient manipulation of pixels. Setting all G (green) values of an image to 0 is as simple as using this code:
import cv2
import numpy as np

img = cv2.imread('MyPic.png')
img[:, :, 1] = 0
This is a fairly impressive piece of code and easy to understand. The relevant line is the last one, which basically instructs the program to take all pixels from all rows and columns and set the value at index one of the three-element array, representing the color of the pixel, to 0. If you display this image, you will notice a complete absence of green.
There are a number of interesting things we can do by accessing raw pixels with NumPy's array indexing; one of them is defining regions of interest (ROI). Once the region is defined, we can perform a number of operations, namely, binding this region to a variable, and then even defining a second region and assigning it the value of the first one (visually copying a portion of the image over to another position in the image):
import cv2
import numpy as np

img = cv2.imread('MyPic.png')
my_roi = img[0:100, 0:100]
img[300:400, 300:400] = my_roi
It's important to make sure that the two regions correspond in terms of size. If not, NumPy will (rightly) complain that the two shapes mismatch.
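The size requirement can be checked with a small NumPy-only sketch (a hypothetical 10x10 image, so no file is needed):

```python
import numpy

# A hypothetical 10x10 single-channel image with a 5x5 white square.
img = numpy.zeros((10, 10), dtype=numpy.uint8)
img[0:5, 0:5] = 255

my_roi = img[0:5, 0:5]        # bind the region to a variable
img[5:10, 5:10] = my_roi      # shapes match (5x5 into 5x5), so this copies
print(img[7, 7])              # 255

# A mismatched assignment fails loudly instead of silently cropping.
try:
    img[5:10, 5:9] = my_roi   # a 5x5 source into a 5x4 target
    mismatch_ok = False
except ValueError:
    mismatch_ok = True
print(mismatch_ok)            # True
```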
Finally, there are a few interesting details we can obtain from numpy.array, such as the image properties, using this code:
import cv2
import numpy as np

img = cv2.imread('MyPic.png')
print img.shape
print img.size
print img.dtype
These three properties are, in this order:
Shape: NumPy returns a tuple containing the number of rows (height), the number of columns (width), and, if the image is in color, the number of channels. This is useful to debug a type of image; if the image is monochromatic or grayscale, it will not contain a channels value.
Size: This property refers to the total number of elements in the array (for a color image, this is the number of pixels multiplied by the number of channels).
Datatype: This property refers to the datatype used for an image (normally a variation of an unsigned integer type and the bits supported by this type, that is, uint8).
All in all, it is strongly advisable that you familiarize yourself with NumPy in general and numpy.array in particular when working with OpenCV, as it is the foundation of any image processing done with Python.
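These properties are easy to confirm with synthetic arrays, with no image file required; the shapes below are our own arbitrary choices:

```python
import numpy

gray = numpy.zeros((3, 4), dtype=numpy.uint8)      # 3 rows (height), 4 columns (width)
color = numpy.zeros((3, 4, 3), dtype=numpy.uint8)  # the same size, with 3 channels

print(gray.shape)   # (3, 4): no channels entry for a single-channel image
print(color.shape)  # (3, 4, 3)
print(color.size)   # 36: total elements, that is, 12 pixels times 3 channels
print(color.dtype)  # uint8
```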
Reading/writing a video file
OpenCV provides the VideoCapture and VideoWriter classes that support various video file formats. The supported formats vary by system but should always include AVI. Via its read() method, a VideoCapture class may be polled for new frames until it reaches the end of its video file. Each frame is an image in a BGR format.
Conversely, an image may be passed to the write() method of the VideoWriter class, which appends the image to the video file. Let's look at an example that reads frames from one AVI file and writes them to another with a YUV encoding:
import cv2

videoCapture = cv2.VideoCapture('MyInputVid.avi')
fps = videoCapture.get(cv2.CAP_PROP_FPS)
size = (int(videoCapture.get(cv2.CAP_PROP_FRAME_WIDTH)),
        int(videoCapture.get(cv2.CAP_PROP_FRAME_HEIGHT)))
videoWriter = cv2.VideoWriter(
    'MyOutputVid.avi', cv2.VideoWriter_fourcc('I','4','2','0'), fps, size)

success, frame = videoCapture.read()
while success:  # Loop until there are no more frames.
    videoWriter.write(frame)
    success, frame = videoCapture.read()
The arguments to the VideoWriter class constructor deserve special attention. A video's filename must be specified. Any preexisting file with this name is overwritten. A video codec must also be specified. The available codecs may vary from system to system. These are the options that are included:
cv2.VideoWriter_fourcc('I','4','2','0'): This option is an uncompressed YUV encoding, 4:2:0 chroma subsampled. This encoding is widely compatible but produces large files. The file extension should be .avi.
cv2.VideoWriter_fourcc('P','I','M','1'): This option is MPEG-1. The file extension should be .avi.
cv2.VideoWriter_fourcc('X','V','I','D'): This option is MPEG-4 and a preferred option if you want the resulting video size to be average. The file extension should be .avi.
cv2.VideoWriter_fourcc('T','H','E','O'): This option is Ogg Theora. The file extension should be .ogv.
cv2.VideoWriter_fourcc('F','L','V','1'): This option is a Flash video. The file extension should be .flv.
A frame rate and frame size must be specified too. Since we are copying video frames from another video, these properties can be read from the get() method of the VideoCapture class.
Capturing camera frames
A stream of camera frames is represented by the VideoCapture class too. However, for a camera, we construct a VideoCapture class by passing the camera's device index instead of a video's filename. Let's consider an example that captures 10 seconds of video from a camera and writes it to an AVI file:
import cv2

cameraCapture = cv2.VideoCapture(0)
fps = 30  # an assumption
size = (int(cameraCapture.get(cv2.CAP_PROP_FRAME_WIDTH)),
        int(cameraCapture.get(cv2.CAP_PROP_FRAME_HEIGHT)))
videoWriter = cv2.VideoWriter(
    'MyOutputVid.avi', cv2.VideoWriter_fourcc('I','4','2','0'), fps, size)

success, frame = cameraCapture.read()
numFramesRemaining = 10 * fps - 1
while success and numFramesRemaining > 0:
    videoWriter.write(frame)
    success, frame = cameraCapture.read()
    numFramesRemaining -= 1
cameraCapture.release()
Unfortunately, the get() method of a VideoCapture class does not return an accurate value for the camera's frame rate; it typically returns 0. The official documentation at http://docs.opencv.org/modules/highgui/doc/reading_and_writing_images_and_video.html reads:
"When querying a property that is not supported by the backend used by the VideoCapture class, value 0 is returned."
This occurs most commonly on systems where the driver only supports basic functionalities.
For the purpose of creating an appropriate VideoWriter class for the camera, we have to either make an assumption about the frame rate (as we did in the code previously) or measure it using a timer. The latter approach is better, and we will cover it later in this chapter.
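As a rough sketch of the timer idea (the estimate_fps() helper below is ours, not an OpenCV API), we can time a fixed number of reads and divide. Here, a fake frame source that sleeps for about 10 ms stands in for a capture object's read() method:

```python
import time

def estimate_fps(read_frame, num_frames=30):
    """Estimate frames per second by timing num_frames calls to read_frame.

    read_frame is any zero-argument callable, for example the read()
    method of a capture object.
    """
    read_frame()  # a warm-up read, so setup costs are not counted
    start = time.time()
    for _ in range(num_frames):
        read_frame()
    elapsed = time.time() - start
    return num_frames / elapsed

# Demonstration with a fake source that takes about 10 ms per frame.
fps = estimate_fps(lambda: time.sleep(0.01))
print(round(fps))  # roughly 100, depending on the system's timer resolution
```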
The number of cameras and their order is, of course, system-dependent. Unfortunately, OpenCV does not provide any means of querying the number of cameras or their properties. If an invalid index is used to construct a VideoCapture class, the VideoCapture class will not yield any frames; its read() method will return (False, None). A good way to prevent it from trying to retrieve frames from a VideoCapture that was not opened correctly is to use the VideoCapture.isOpened() method, which returns a Boolean.
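Here is a sketch of that guard, using a stand-in class (FakeCapture is ours, purely for illustration; a real cv2.VideoCapture exposes the same isOpened()/read() interface):

```python
class FakeCapture(object):
    """A stand-in for cv2.VideoCapture with a controllable open state."""
    def __init__(self, opened):
        self._opened = opened
    def isOpened(self):
        return self._opened
    def read(self):
        # A real capture returns (False, None) when no frame is available.
        return (True, 'frame') if self._opened else (False, None)

def read_first_frame(capture):
    """Return the first frame, or None if the device failed to open."""
    if not capture.isOpened():
        return None  # bad device index: bail out before polling read()
    success, frame = capture.read()
    return frame if success else None

print(read_first_frame(FakeCapture(False)))  # None
print(read_first_frame(FakeCapture(True)))   # frame
```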
The read() method is inappropriate when we need to synchronize a set of cameras or a multihead camera (such as a stereo camera or Kinect). Then, we use the grab() and retrieve() methods instead. For a set of cameras, we use this code:
success0 = cameraCapture0.grab()
success1 = cameraCapture1.grab()
if success0 and success1:
    _, frame0 = cameraCapture0.retrieve()
    _, frame1 = cameraCapture1.retrieve()
DisplayingimagesinawindowOneofthemostbasicoperationsinOpenCVisdisplayinganimage.Thiscanbedonewiththeimshow()function.IfyoucomefromanyotherGUIframeworkbackground,youwouldthinkitsufficienttocallimshow()todisplayanimage.Thisisonlypartiallytrue:theimagewillbedisplayed,andwilldisappearimmediately.Thisisbydesign,toenabletheconstantrefreshingofawindowframewhenworkingwithvideos.Here’saverysimpleexamplecodetodisplayanimage:
import cv2
import numpy as np

img = cv2.imread('my-image.png')
cv2.imshow('my image', img)
cv2.waitKey()
cv2.destroyAllWindows()
The imshow() function takes two parameters: the name of the frame in which we want to display the image, and the image itself. We'll talk about waitKey() in more detail when we explore the displaying of frames in a window.

The aptly named destroyAllWindows() function disposes of all the windows created by OpenCV.

Displaying camera frames in a window

OpenCV allows named windows to be created, redrawn, and destroyed using the namedWindow(), imshow(), and destroyWindow() functions. Also, any window may capture keyboard input via the waitKey() function and mouse input via the setMouseCallback() function. Let's look at an example where we show the frames of a live camera input:
import cv2

clicked = False
def onMouse(event, x, y, flags, param):
    global clicked
    if event == cv2.EVENT_LBUTTONUP:
        clicked = True

cameraCapture = cv2.VideoCapture(0)
cv2.namedWindow('MyWindow')
cv2.setMouseCallback('MyWindow', onMouse)

print('Showing camera feed. Click window or press any key to stop.')
success, frame = cameraCapture.read()
while success and cv2.waitKey(1) == -1 and not clicked:
    cv2.imshow('MyWindow', frame)
    success, frame = cameraCapture.read()

cv2.destroyWindow('MyWindow')
cameraCapture.release()
The argument for waitKey() is a number of milliseconds to wait for keyboard input. The return value is either -1 (meaning that no key has been pressed) or an ASCII keycode, such as 27 for Esc. For a list of ASCII keycodes, see http://www.asciitable.com/. Also, note that Python provides a standard function, ord(), which can convert a character to its ASCII keycode. For example, ord('a') returns 97.

Tip

On some systems, waitKey() may return a value that encodes more than just the ASCII keycode. (A bug is known to occur on Linux when OpenCV uses GTK as its backend GUI library.) On all systems, we can ensure that we extract just the ASCII keycode by reading the last byte from the return value like this:
keycode = cv2.waitKey(1)
if keycode != -1:
    keycode &= 0xFF
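Combining waitKey(), ord(), and the 0xFF mask, we might wrap the quit-key test in a small helper (the name and the choice of quit keys are ours):

```python
# Keycodes that should quit the app: Esc, plus 'q' in either case.
QUIT_KEYCODES = {27, ord('q'), ord('Q')}

def isQuitKey(keycode):
    """Return True if the keycode (possibly with extra GTK bits) means quit."""
    if keycode == -1:
        return False  # no key was pressed during waitKey()'s timeout
    return (keycode & 0xFF) in QUIT_KEYCODES
```

A main loop could then read `if isQuitKey(cv2.waitKey(1)): break`, and the helper keeps working even when GTK sets high bits above the ASCII byte.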
OpenCV's window functions and waitKey() are interdependent. OpenCV windows are only updated when waitKey() is called, and waitKey() only captures input when an OpenCV window has focus.

The mouse callback passed to setMouseCallback() should take five arguments, as seen in our code sample. The callback's param argument is set as an optional third argument to setMouseCallback(). By default, it is 0. The callback's event argument is one of the following actions:

cv2.EVENT_MOUSEMOVE: This event refers to mouse movement
cv2.EVENT_LBUTTONDOWN: This event refers to the left button down
cv2.EVENT_RBUTTONDOWN: This refers to the right button down
cv2.EVENT_MBUTTONDOWN: This refers to the middle button down
cv2.EVENT_LBUTTONUP: This refers to the left button up
cv2.EVENT_RBUTTONUP: This event refers to the right button up
cv2.EVENT_MBUTTONUP: This event refers to the middle button up
cv2.EVENT_LBUTTONDBLCLK: This event refers to the left button being double-clicked
cv2.EVENT_RBUTTONDBLCLK: This refers to the right button being double-clicked
cv2.EVENT_MBUTTONDBLCLK: This refers to the middle button being double-clicked

The mouse callback's flags argument may be some bitwise combination of the following events:

cv2.EVENT_FLAG_LBUTTON: This event refers to the left button being pressed
cv2.EVENT_FLAG_RBUTTON: This event refers to the right button being pressed
cv2.EVENT_FLAG_MBUTTON: This event refers to the middle button being pressed
cv2.EVENT_FLAG_CTRLKEY: This event refers to the Ctrl key being pressed
cv2.EVENT_FLAG_SHIFTKEY: This event refers to the Shift key being pressed
cv2.EVENT_FLAG_ALTKEY: This event refers to the Alt key being pressed

Unfortunately, OpenCV does not provide any means of handling window events. For example, we cannot stop our application when a window's close button is clicked. Due to OpenCV's limited event handling and GUI capabilities, many developers prefer to integrate it with other application frameworks. Later in this chapter, we will design an abstraction layer to help integrate OpenCV into any application framework.
Project Cameo (face tracking and image manipulation)

OpenCV is often studied through a cookbook approach that covers a lot of algorithms but nothing about high-level application development. To an extent, this approach is understandable because OpenCV's potential applications are so diverse. OpenCV is used in a wide variety of applications: photo/video editors, motion-controlled games, a robot's AI, or psychology experiments where we log participants' eye movements. Across such different use cases, can we truly study a useful set of abstractions?

I believe we can and the sooner we start creating abstractions, the better. We will structure our study of OpenCV around a single application but, at each step, we will design a component of this application to be extensible and reusable.

We will develop an interactive application that performs face tracking and image manipulations on camera input in real time. This type of application covers a broad range of OpenCV's functionality and challenges us to create an efficient, effective implementation.

Specifically, our application will perform real-time facial merging. Given two streams of camera input (or, optionally, prerecorded video input), the application will superimpose faces from one stream onto faces in the other. Filters and distortions will be applied to give this blended scene a unified look and feel. Users should have the experience of being engaged in a live performance where they enter another environment and persona. This type of user experience is popular in amusement parks such as Disneyland.

In such an application, users would immediately notice flaws, such as a low frame rate or inaccurate tracking. To get the best results, we will try several approaches using conventional imaging and depth imaging.

We will call our application Cameo. A cameo is (in jewelry) a small portrait of a person or (in film) a very brief role played by a celebrity.
Cameo – an object-oriented design

Python applications can be written in a purely procedural style. This is often done with small applications, such as our basic I/O scripts, discussed previously. However, from now on, we will use an object-oriented style because it promotes modularity and extensibility.

From our overview of OpenCV's I/O functionality, we know that all images are similar, regardless of their source or destination. No matter how we obtain a stream of images or where we send it as output, we can apply the same application-specific logic to each frame in this stream. Separation of I/O code and application code becomes especially convenient in an application, such as Cameo, which uses multiple I/O streams.

We will create classes called CaptureManager and WindowManager as high-level interfaces to I/O streams. Our application code may use CaptureManager to read new frames and, optionally, to dispatch each frame to one or more outputs, including a still image file, a video file, and a window (via a WindowManager class). A WindowManager class lets our application code handle a window and events in an object-oriented style.

Both CaptureManager and WindowManager are extensible. We could make implementations that do not rely on OpenCV for I/O. Indeed, Appendix A, Integrating with Pygame, OpenCV Computer Vision with Python, uses a WindowManager subclass.
Abstracting a video stream with managers.CaptureManager

As we have seen, OpenCV can capture, show, and record a stream of images from either a video file or a camera, but there are some special considerations in each case. Our CaptureManager class abstracts some of the differences and provides a higher-level interface to dispatch images from the capture stream to one or more outputs: a still image file, a video file, or a window.

A CaptureManager class is initialized with a VideoCapture class and has the enterFrame() and exitFrame() methods that should typically be called on every iteration of an application's main loop. Between a call to enterFrame() and exitFrame(), the application may (any number of times) set a channel property and get a frame property. The channel property is initially 0 and only multihead cameras use other values. The frame property is an image corresponding to the current channel's state when enterFrame() was called.

A CaptureManager class also has the writeImage(), startWritingVideo(), and stopWritingVideo() methods that may be called at any time. Actual file writing is postponed until exitFrame(). Also, during the exitFrame() method, the frame property may be shown in a window, depending on whether the application code provides a WindowManager class either as an argument to the constructor of CaptureManager or by setting a previewWindowManager property.

If the application code manipulates frame, the manipulations are reflected in recorded files and in the window. A CaptureManager class has a constructor argument and property called shouldMirrorPreview, which should be True if we want frame to be mirrored (horizontally flipped) in the window but not in recorded files. Typically, when facing a camera, users prefer the live camera feed to be mirrored.

Recall that a VideoWriter class needs a frame rate, but OpenCV does not provide any way to get an accurate frame rate for a camera. The CaptureManager class works around this limitation by using a frame counter and Python's standard time.time() function to estimate the frame rate if necessary. This approach is not foolproof. Depending on frame rate fluctuations and the system-dependent implementation of time.time(), the accuracy of the estimate might still be poor in some cases. However, if we deploy to unknown hardware, it is better than just assuming that the user's camera has a particular frame rate.
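Before we fold this timer-based workaround into CaptureManager, here is the idea in isolation, as a standalone sketch using Python's standard time module (the class name is ours):

```python
import time

class FpsEstimator(object):
    """Estimate frames per second from the timing of tick() calls."""

    def __init__(self):
        self._startTime = None
        self._framesElapsed = 0

    def tick(self):
        """Call once per captured frame."""
        if self._startTime is None:
            # The first frame starts the clock but is not counted.
            self._startTime = time.time()
        else:
            self._framesElapsed += 1

    @property
    def fpsEstimate(self):
        """Return the estimated FPS, or None if too few frames have elapsed."""
        if self._framesElapsed == 0:
            return None
        return self._framesElapsed / (time.time() - self._startTime)
```

The estimate becomes more stable as more frames elapse, which is why CaptureManager (below) waits for a number of frames before trusting it.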
Let's create a file called managers.py, which will contain our implementation of CaptureManager. The implementation turns out to be quite long, so we will look at it in several pieces. First, let's add imports, a constructor, and properties, as follows:
import cv2
import numpy
import time


class CaptureManager(object):

    def __init__(self, capture, previewWindowManager=None,
                 shouldMirrorPreview=False):

        self.previewWindowManager = previewWindowManager
        self.shouldMirrorPreview = shouldMirrorPreview

        self._capture = capture
        self._channel = 0
        self._enteredFrame = False
        self._frame = None
        self._imageFilename = None
        self._videoFilename = None
        self._videoEncoding = None
        self._videoWriter = None

        self._startTime = None
        self._framesElapsed = 0
        self._fpsEstimate = None

    @property
    def channel(self):
        return self._channel

    @channel.setter
    def channel(self, value):
        if self._channel != value:
            self._channel = value
            self._frame = None

    @property
    def frame(self):
        if self._enteredFrame and self._frame is None:
            _, self._frame = self._capture.retrieve()
        return self._frame

    @property
    def isWritingImage(self):
        return self._imageFilename is not None

    @property
    def isWritingVideo(self):
        return self._videoFilename is not None
Note that most of the member variables are non-public, as denoted by the underscore prefix in variable names, such as self._enteredFrame. These non-public variables relate to the state of the current frame and any file-writing operations. As discussed previously, the application code only needs to configure a few things, which are implemented as constructor arguments and settable public properties: the camera channel, the window manager, and the option to mirror the camera preview.

This book assumes a certain level of familiarity with Python; however, if you are getting confused by those @ annotations (for example, @property), refer to the Python documentation about decorators, a built-in feature of the language that allows the wrapping of a function by another function, normally used to apply a user-defined behavior in several places of an application (refer to https://docs.python.org/2/reference/compound_stmts.html#grammar-token-decorator).
Note

Python does not have the concept of private member variables and the single/double underscore prefix (_) is only a convention.

By this convention, in Python, variables that are prefixed with a single underscore should be treated as protected (accessed only within the class and its subclasses), while variables that are prefixed with a double underscore should be treated as private (accessed only within the class).
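Here is a tiny illustration of the convention (the class and attribute names are made up): the single underscore is advisory only, while the double underscore additionally triggers Python's name mangling:

```python
class Widget(object):
    def __init__(self):
        self._state = 'protected'    # convention only; still accessible
        self.__secret = 'private'    # mangled by Python to _Widget__secret

w = Widget()
# w._state is reachable as written; w.__secret is not, because outside the
# class body the attribute only exists under its mangled name.
```

Neither prefix provides real access control; both are signals to readers of the code about intended use.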
Continuing with our implementation, let's add the enterFrame() and exitFrame() methods to managers.py:
    def enterFrame(self):
        """Capture the next frame, if any."""

        # But first, check that any previous frame was exited.
        assert not self._enteredFrame, \
            'previous enterFrame() had no matching exitFrame()'

        if self._capture is not None:
            self._enteredFrame = self._capture.grab()

    def exitFrame(self):
        """Draw to the window. Write to files. Release the frame."""

        # Check whether any grabbed frame is retrievable.
        # The getter may retrieve and cache the frame.
        if self.frame is None:
            self._enteredFrame = False
            return

        # Update the FPS estimate and related variables.
        if self._framesElapsed == 0:
            self._startTime = time.time()
        else:
            timeElapsed = time.time() - self._startTime
            self._fpsEstimate = self._framesElapsed / timeElapsed
        self._framesElapsed += 1

        # Draw to the window, if any.
        if self.previewWindowManager is not None:
            if self.shouldMirrorPreview:
                mirroredFrame = numpy.fliplr(self._frame).copy()
                self.previewWindowManager.show(mirroredFrame)
            else:
                self.previewWindowManager.show(self._frame)

        # Write to the image file, if any.
        if self.isWritingImage:
            cv2.imwrite(self._imageFilename, self._frame)
            self._imageFilename = None

        # Write to the video file, if any.
        self._writeVideoFrame()

        # Release the frame.
        self._frame = None
        self._enteredFrame = False
Note that the implementation of enterFrame() only grabs (synchronizes) a frame, whereas actual retrieval from a channel is postponed to a subsequent reading of the frame variable. The implementation of exitFrame() takes the image from the current channel, estimates a frame rate, shows the image via the window manager (if any), and fulfills any pending requests to write the image to files.

Several other methods also pertain to file writing. To finish our class implementation, let's add the remaining file-writing methods to managers.py:
    def writeImage(self, filename):
        """Write the next exited frame to an image file."""
        self._imageFilename = filename

    def startWritingVideo(
            self, filename,
            encoding=cv2.VideoWriter_fourcc('I', '4', '2', '0')):
        """Start writing exited frames to a video file."""
        self._videoFilename = filename
        self._videoEncoding = encoding

    def stopWritingVideo(self):
        """Stop writing exited frames to a video file."""
        self._videoFilename = None
        self._videoEncoding = None
        self._videoWriter = None

    def _writeVideoFrame(self):

        if not self.isWritingVideo:
            return

        if self._videoWriter is None:
            fps = self._capture.get(cv2.CAP_PROP_FPS)
            if fps == 0.0:
                # The capture's FPS is unknown so use an estimate.
                if self._framesElapsed < 20:
                    # Wait until more frames elapse so that the
                    # estimate is more stable.
                    return
                else:
                    fps = self._fpsEstimate
            size = (int(self._capture.get(
                        cv2.CAP_PROP_FRAME_WIDTH)),
                    int(self._capture.get(
                        cv2.CAP_PROP_FRAME_HEIGHT)))
            self._videoWriter = cv2.VideoWriter(
                self._videoFilename, self._videoEncoding,
                fps, size)

        self._videoWriter.write(self._frame)
The writeImage(), startWritingVideo(), and stopWritingVideo() public methods simply record the parameters for file-writing operations, whereas the actual writing operations are postponed to the next call of exitFrame(). The _writeVideoFrame() non-public method creates or appends to a video file in a manner that should be familiar from our earlier scripts. (See the Reading/writing a video file section.) However, in situations where the frame rate is unknown, we skip some frames at the start of the capture session so that we have time to build up an estimate of the frame rate.

Although our current implementation of CaptureManager relies on VideoCapture, we could make other implementations that do not use OpenCV for input. For example, we could make a subclass that is instantiated with a socket connection, whose byte stream could be parsed as a stream of images. We could also make a subclass that uses a third-party camera library with different hardware support than what OpenCV provides. However, for Cameo, our current implementation is sufficient.
Abstracting a window and keyboard with managers.WindowManager

As we have seen, OpenCV provides functions to create and destroy a window, show an image in it, and process events. Rather than being methods of a window class, these functions require a window's name to be passed as an argument. Since this interface is not object-oriented, it is inconsistent with OpenCV's general style. Also, it is unlikely to be compatible with other window or event handling interfaces that we might eventually want to use instead of OpenCV's.

For the sake of object orientation and adaptability, we abstract this functionality into a WindowManager class with the createWindow(), destroyWindow(), show(), and processEvents() methods. As a property, a WindowManager class has a function object called keypressCallback, which (if not None) is called from processEvents() in response to any keypress. The keypressCallback object must take a single argument, such as an ASCII keycode.

Let's add the following implementation of WindowManager to managers.py:
class WindowManager(object):

    def __init__(self, windowName, keypressCallback=None):
        self.keypressCallback = keypressCallback
        self._windowName = windowName
        self._isWindowCreated = False

    @property
    def isWindowCreated(self):
        return self._isWindowCreated

    def createWindow(self):
        cv2.namedWindow(self._windowName)
        self._isWindowCreated = True

    def show(self, frame):
        cv2.imshow(self._windowName, frame)

    def destroyWindow(self):
        cv2.destroyWindow(self._windowName)
        self._isWindowCreated = False

    def processEvents(self):
        keycode = cv2.waitKey(1)
        if self.keypressCallback is not None and keycode != -1:
            # Discard any non-ASCII info encoded by GTK.
            keycode &= 0xFF
            self.keypressCallback(keycode)
Our current implementation only supports keyboard events, which will be sufficient for Cameo. However, we could modify WindowManager to support mouse events too. For example, the class's interface could be expanded to include a mouseCallback property (and optional constructor argument), but could otherwise remain the same. With some event framework other than OpenCV's, we could support additional event types in the same way by adding callback properties.

Appendix A, Integrating with Pygame, OpenCV Computer Vision with Python, shows a WindowManager subclass that is implemented with Pygame's window handling and event framework instead of OpenCV's. This implementation improves on the base WindowManager class by properly handling quit events, for example, when a user clicks on a window's close button. Potentially, many other event types can be handled via Pygame too.

Applying everything with cameo.Cameo

Our application is represented by a Cameo class with two methods: run() and onKeypress(). On initialization, a Cameo class creates a WindowManager class with onKeypress() as a callback, as well as a CaptureManager class using a camera and the WindowManager class. When run() is called, the application executes a main loop in which frames and events are processed. As a result of event processing, onKeypress() may be called. The spacebar causes a screenshot to be taken, Tab causes a screencast (a video recording) to start/stop, and Esc causes the application to quit.

In the same directory as managers.py, let's create a file called cameo.py containing the following implementation of Cameo:
import cv2
from managers import WindowManager, CaptureManager


class Cameo(object):

    def __init__(self):
        self._windowManager = WindowManager('Cameo',
                                            self.onKeypress)
        self._captureManager = CaptureManager(
            cv2.VideoCapture(0), self._windowManager, True)

    def run(self):
        """Run the main loop."""
        self._windowManager.createWindow()
        while self._windowManager.isWindowCreated:
            self._captureManager.enterFrame()
            frame = self._captureManager.frame

            # TODO: Filter the frame (Chapter 3).

            self._captureManager.exitFrame()
            self._windowManager.processEvents()

    def onKeypress(self, keycode):
        """Handle a keypress.

        space  -> Take a screenshot.
        tab    -> Start/stop recording a screencast.
        escape -> Quit.
        """
        if keycode == 32:  # space
            self._captureManager.writeImage('screenshot.png')
        elif keycode == 9:  # tab
            if not self._captureManager.isWritingVideo:
                self._captureManager.startWritingVideo(
                    'screencast.avi')
            else:
                self._captureManager.stopWritingVideo()
        elif keycode == 27:  # escape
            self._windowManager.destroyWindow()


if __name__ == "__main__":
    Cameo().run()
When running the application, note that the live camera feed is mirrored, while screenshots and screencasts are not. This is the intended behavior, as we pass True for shouldMirrorPreview when initializing the CaptureManager class.

So far, we do not manipulate the frames in any way except to mirror them for preview. We will start to add more interesting effects in Chapter 3, Filtering Images.
Summary

By now, we should have an application that displays a camera feed, listens for keyboard input, and (on command) records a screenshot or screencast. We are ready to extend the application by inserting some image-filtering code (Chapter 3, Filtering Images) between the start and end of each frame. Optionally, we are also ready to integrate other camera drivers or application frameworks (Appendix A, Integrating with Pygame, OpenCV Computer Vision with Python) besides the ones supported by OpenCV.

We also now have the knowledge to process images and understand the principles of image manipulation through NumPy arrays. This forms the perfect foundation for understanding the next topic: filtering images.
Chapter 3. Processing Images with OpenCV 3

Sooner or later, when working with images, you will find yourself in need of altering images: be it applying artistic filters, extrapolating certain sections, cutting, pasting, or whatever else your mind can conjure. This chapter presents some techniques to alter images, and by the end of it, you should be able to perform tasks such as detecting skin tone in an image, sharpening an image, marking the contours of subjects, and detecting crosswalks using a line segment detector.
Converting between different color spaces

There are literally hundreds of methods in OpenCV that pertain to the conversion of color spaces. In general, three color spaces are prevalent in modern day computer vision: gray, BGR, and Hue, Saturation, Value (HSV).

Gray is a color space that effectively eliminates color information, translating to shades of gray: this color space is extremely useful for intermediate processing, such as face detection. BGR is the blue-green-red color space, in which each pixel is a three-element array, each value representing the blue, green, and red colors: web developers would be familiar with a similar definition of colors, except the order of colors is RGB. In HSV, hue is a color tone, saturation is the intensity of a color, and value represents its darkness (or brightness at the opposite end of the spectrum).
A quick note on BGR

When I first started dealing with the BGR color space, something wasn't adding up: the [0, 255, 255] value (no blue, full green, and full red) produces the color yellow. If you have an artistic background, you won't even need to pick up paints and brushes to witness green and red mix into a muddy shade of brown. That is because the color model used in computing is an additive one and deals with light. Light behaves differently from paints (which follow the subtractive color model), and, as software runs on computers whose medium is a monitor that emits light, the color model of reference is the additive one.
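We can check this with a one-pixel NumPy image: in BGR order, [0, 255, 255] is no blue plus full green plus full red, and reversing the channel axis yields the RGB order familiar to web developers:

```python
import numpy as np

# A 1 x 1 BGR image: zero blue, full green, full red.
yellow_bgr = np.array([[[0, 255, 255]]], dtype=np.uint8)

# Reversing the last (channel) axis converts BGR to RGB.
yellow_rgb = yellow_bgr[..., ::-1]
```

In RGB order, the pixel reads [255, 255, 0], which is the familiar web color #FFFF00: yellow, exactly as the additive model predicts.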
The Fourier Transform

Much of the processing you apply to images and videos in OpenCV involves the concept of the Fourier Transform in some capacity. Joseph Fourier was an 18th century French mathematician who discovered and popularized many mathematical concepts, and concentrated his work on studying the laws governing heat, and in mathematics, all things waveform. In particular, he observed that all waveforms are just the sum of simple sinusoids of different frequencies.

In other words, the waveforms you observe all around you are the sum of other waveforms. This concept is incredibly useful when manipulating images, because it allows us to identify regions in images where a signal (such as image pixels) changes a lot, and regions where the change is less dramatic. We can then arbitrarily mark these regions as noise or regions of interest, background or foreground, and so on. These are the frequencies that make up the original image, and we have the power to separate them to make sense of the image and extrapolate interesting data.

Note

In an OpenCV context, there are a number of algorithms implemented that enable us to process images and make sense of the data contained in them, and these are also reimplemented in NumPy to make our life even easier. NumPy has a Fast Fourier Transform (FFT) package, which contains the fft2() method. This method allows us to compute the Discrete Fourier Transform (DFT) of the image.

Let's examine the magnitude spectrum concept of an image using the Fourier Transform. The magnitude spectrum of an image is another image, which gives a representation of the original image in terms of its changes: think of it as taking an image and dragging all the brightest pixels to the center. Then, you gradually work your way out to the border where all the darkest pixels have been pushed. Immediately, you will be able to see how many light and dark pixels are contained in your image and the percentage of their distribution.
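As a concrete illustration of this description, the magnitude spectrum can be computed in a few lines with NumPy's fft package (the log scaling and the +1 offset are common display conventions, not requirements):

```python
import numpy as np

def magnitude_spectrum(gray_img):
    """Return a log-scaled, center-shifted magnitude spectrum of a 2D image."""
    f = np.fft.fft2(gray_img)            # 2D Discrete Fourier Transform
    fshift = np.fft.fftshift(f)          # move the zero-frequency term to the center
    return 20 * np.log(np.abs(fshift) + 1)  # log scale; +1 avoids log(0)
```

The result has the same shape as the input, with the lowest frequencies (the broad, slowly changing structure of the image) pulled to the center, and can be displayed like any grayscale image.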
The concept of the Fourier Transform is the basis of many algorithms used for common image processing operations, such as edge detection or line and shape detection.

Before examining these in detail, let's take a look at two concepts that, in conjunction with the Fourier Transform, form the foundation of the aforementioned processing operations: high pass filters and low pass filters.
High pass filter

A high pass filter (HPF) is a filter that examines a region of an image and boosts the intensity of certain pixels based on the difference in the intensity with the surrounding pixels.

Take, for example, the following kernel:
[[0, -0.25, 0],
 [-0.25, 1, -0.25],
 [0, -0.25, 0]]
Note

A kernel is a set of weights that are applied to a region in a source image to generate a single pixel in the destination image. For example, a ksize of 7 implies that 49 (7 x 7) source pixels are considered in generating each destination pixel. We can think of a kernel as a piece of frosted glass moving over the source image and letting through a diffused blend of the source's light.
After calculating the sum of the differences in intensity between the central pixel and all its immediate neighbors, the intensity of the central pixel will be boosted (or not) if a high level of change is found. In other words, if a pixel stands out from the surrounding pixels, it will get boosted.

This is particularly effective in edge detection, where a common form of HPF called a high boost filter is used.

Both high pass and low pass filters use a property called radius, which extends the area of the neighbors involved in the filter calculation.

Let's go through an example of an HPF:
import cv2
import numpy as np
from scipy import ndimage

kernel_3x3 = np.array([[-1, -1, -1],
                       [-1,  8, -1],
                       [-1, -1, -1]])

kernel_5x5 = np.array([[-1, -1, -1, -1, -1],
                       [-1,  1,  2,  1, -1],
                       [-1,  2,  4,  2, -1],
                       [-1,  1,  2,  1, -1],
                       [-1, -1, -1, -1, -1]])
Note

Note that both filters sum up to 0; the reason for this is explained in detail in the Edge detection section.
img = cv2.imread("../images/color1_small.jpg", 0)

k3 = ndimage.convolve(img, kernel_3x3)
k5 = ndimage.convolve(img, kernel_5x5)

blurred = cv2.GaussianBlur(img, (11, 11), 0)
g_hpf = img - blurred

cv2.imshow("3x3", k3)
cv2.imshow("5x5", k5)
cv2.imshow("g_hpf", g_hpf)
cv2.waitKey()
cv2.destroyAllWindows()
After the initial imports, we define a 3x3 kernel and a 5x5 kernel, and then we load the image in grayscale. Normally, the majority of image processing is done with NumPy; however, in this particular case, we want to "convolve" an image with a given kernel, and NumPy's convolve() function happens to only accept one-dimensional arrays.

This does not mean that the convolution of deep arrays can't be achieved with NumPy, just that it would be a bit complex. Instead, ndimage (which is a part of SciPy, so you should have it installed as per the instructions in Chapter 1, Setting Up OpenCV) makes this trivial through its convolve() function, which supports the classic NumPy arrays that the cv2 module uses to store images.

We apply two HPFs with the two convolution kernels we defined. Lastly, we also implement a differential method of obtaining an HPF by applying a low pass filter and calculating the difference with the original image. You will notice that the third method actually yields the best result, so let's also elaborate on low pass filters.
Low pass filter

If an HPF boosts the intensity of a pixel, given its difference with its neighbors, a low pass filter (LPF) will smoothen the pixel if the difference with the surrounding pixels is lower than a certain threshold. This is used in denoising and blurring. For example, one of the most popular blurring/smoothening filters, the Gaussian blur, is a low pass filter that attenuates the intensity of high frequency signals.
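The smoothing effect of an LPF is easiest to see on a one-dimensional signal. The following sketch uses a 3-tap averaging (box) kernel, the simplest low pass filter, with plain NumPy (the sample values are made up):

```python
import numpy as np

# A signal with a single high-frequency spike.
signal = np.array([0., 0., 0., 90., 0., 0., 0.])

# A 3-tap averaging (box) kernel: a very simple low pass filter.
box_kernel = np.ones(3) / 3

# Each output sample is the average of a sample and its two neighbors.
smoothed = np.convolve(signal, box_kernel, mode='same')
```

The spike of 90 is spread into three samples of 30: the total is preserved, but the sharp transition (a high frequency) is attenuated, which is exactly what blurring does in two dimensions.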
Creating modules

As in the case of our CaptureManager and WindowManager classes, our filters should be reusable outside Cameo. Thus, we should separate the filters into their own Python module or file.

Let's create a file called filters.py in the same directory as cameo.py. We need the following import statements in filters.py:
import cv2
import numpy
import utils
Let's also create a file called utils.py in the same directory. It should contain the following import statements:
import cv2
import numpy
import scipy.interpolate
We will be adding filter functions and classes to filters.py, while more general-purpose math functions will go in utils.py.
Edge detection

Edges play a major role in both human and computer vision. We, as humans, can easily recognize many object types and their pose just by seeing a backlit silhouette or a rough sketch. Indeed, when art emphasizes edges and poses, it often seems to convey the idea of an archetype, such as Rodin's The Thinker or Joe Shuster's Superman. Software, too, can reason about edges, poses, and archetypes. We will discuss these kinds of reasoning in later chapters.

OpenCV provides many edge-finding filters, including Laplacian(), Sobel(), and Scharr(). These filters are supposed to turn non-edge regions to black while turning edge regions to white or saturated colors. However, they are prone to misidentifying noise as edges. This flaw can be mitigated by blurring an image before trying to find its edges. OpenCV also provides many blurring filters, including blur() (simple average), medianBlur(), and GaussianBlur(). The arguments for the edge-finding and blurring filters vary but always include ksize, an odd whole number that represents the width and height (in pixels) of a filter's kernel.

For blurring, let's use medianBlur(), which is effective in removing digital video noise, especially in color images. For edge-finding, let's use Laplacian(), which produces bold edge lines, especially in grayscale images. After applying medianBlur(), but before applying Laplacian(), we should convert the image from BGR to grayscale.

Once we have the result of Laplacian(), we can invert it to get black edges on a white background. Then, we can normalize it (so that its values range from 0 to 1) and multiply it with the source image to darken the edges. Let's implement this approach in filters.py:
def strokeEdges(src, dst, blurKsize=7, edgeKsize=5):
    if blurKsize >= 3:
        blurredSrc = cv2.medianBlur(src, blurKsize)
        graySrc = cv2.cvtColor(blurredSrc, cv2.COLOR_BGR2GRAY)
    else:
        graySrc = cv2.cvtColor(src, cv2.COLOR_BGR2GRAY)
    cv2.Laplacian(graySrc, cv2.CV_8U, graySrc, ksize=edgeKsize)
    normalizedInverseAlpha = (1.0 / 255) * (255 - graySrc)
    channels = cv2.split(src)
    for channel in channels:
        channel[:] = channel * normalizedInverseAlpha
    cv2.merge(channels, dst)
Note that we allow kernel sizes to be specified as arguments for strokeEdges(). The blurKsize argument is used as ksize for medianBlur(), while edgeKsize is used as ksize for Laplacian(). With my webcams, I find that a blurKsize value of 7 and an edgeKsize value of 5 look best. Unfortunately, medianBlur() is expensive with a large ksize, such as 7.

Tip

If you encounter performance problems when running strokeEdges(), try decreasing the blurKsize value. To turn off blur, set it to a value less than 3.
Custom kernels – getting convoluted

As we have just seen, many of OpenCV's predefined filters use a kernel. Remember that a kernel is a set of weights, which determine how each output pixel is calculated from a neighborhood of input pixels. Another term for a kernel is a convolution matrix. It mixes up or convolves the pixels in a region. Similarly, a kernel-based filter may be called a convolution filter.

OpenCV provides a very versatile filter2D() function, which applies any kernel or convolution matrix that we specify. To understand how to use this function, let's first learn the format of a convolution matrix. It is a 2D array with an odd number of rows and columns. The central element corresponds to a pixel of interest and the other elements correspond to the neighbors of this pixel. Each element contains an integer or floating point value, which is a weight that gets applied to an input pixel's value. Consider this example:
kernel = numpy.array([[-1, -1, -1],
                      [-1,  9, -1],
                      [-1, -1, -1]])
Here, the pixel of interest has a weight of 9 and its immediate neighbors each have a weight of -1. For the pixel of interest, the output color will be nine times its input color minus the input colors of all eight adjacent pixels. If the pixel of interest is already a bit different from its neighbors, this difference becomes intensified. The effect is that the image looks sharper as the contrast between the neighbors is increased.
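We can verify this arithmetic by hand on a single 3 x 3 neighborhood (the sample pixel values are made up):

```python
import numpy as np

kernel = np.array([[-1, -1, -1],
                   [-1,  9, -1],
                   [-1, -1, -1]])

# A neighborhood whose central pixel (20) differs from its neighbors (10).
patch = np.array([[10, 10, 10],
                  [10, 20, 10],
                  [10, 10, 10]])

# The output value for the central pixel is the weighted sum:
# 9 * 20 - (eight neighbors * 10) = 180 - 80 = 100.
center_output = int(np.sum(kernel * patch))
```

The central pixel's value jumps from 20 to 100, so its modest difference from the neighborhood is greatly intensified: precisely the sharpening effect described above.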
Continuing our example, we can apply this convolution matrix to a source and destination image, respectively, as follows:
cv2.filter2D(src, -1, kernel, dst)
The second argument specifies the per-channel depth of the destination image (such as cv2.CV_8U for 8 bits per channel). A negative value (as used here) means that the destination image has the same depth as the source image.

Note

For color images, note that filter2D() applies the kernel equally to each channel. To use different kernels on different channels, we would also have to use the split() and merge() functions.
Based on this simple example, let's add two classes to filters.py. One class, VConvolutionFilter, will represent a convolution filter in general. A subclass, SharpenFilter, will represent our sharpening filter specifically. Let's edit filters.py to implement these two new classes as follows:
class VConvolutionFilter(object):
    """A filter that applies a convolution to V (or all of BGR)."""

    def __init__(self, kernel):
        self._kernel = kernel

    def apply(self, src, dst):
        """Apply the filter with a BGR or gray source/destination."""
        cv2.filter2D(src, -1, self._kernel, dst)


class SharpenFilter(VConvolutionFilter):
    """A sharpen filter with a 1-pixel radius."""

    def __init__(self):
        kernel = numpy.array([[-1, -1, -1],
                              [-1,  9, -1],
                              [-1, -1, -1]])
        VConvolutionFilter.__init__(self, kernel)
Note that the weights sum up to 1. This should be the case whenever we want to leave the image's overall brightness unchanged. If we modify a sharpening kernel slightly so that its weights sum up to 0 instead, we have an edge detection kernel that turns edges white and non-edges black. For example, let's add the following edge detection filter to filters.py:
class FindEdgesFilter(VConvolutionFilter):
    """An edge-finding filter with a 1-pixel radius."""

    def __init__(self):
        kernel = numpy.array([[-1, -1, -1],
                              [-1,  8, -1],
                              [-1, -1, -1]])
        VConvolutionFilter.__init__(self, kernel)
Next, let's make a blur filter. Generally, for a blur effect, the weights should sum up to 1 and should be positive throughout the neighborhood. For example, we can take a simple average of the neighborhood as follows:
class BlurFilter(VConvolutionFilter):
    """A blur filter with a 2-pixel radius."""

    def __init__(self):
        kernel = numpy.array([[0.04, 0.04, 0.04, 0.04, 0.04],
                              [0.04, 0.04, 0.04, 0.04, 0.04],
                              [0.04, 0.04, 0.04, 0.04, 0.04],
                              [0.04, 0.04, 0.04, 0.04, 0.04],
                              [0.04, 0.04, 0.04, 0.04, 0.04]])
        VConvolutionFilter.__init__(self, kernel)
Our sharpening, edge detection, and blur filters use kernels that are highly symmetric. Sometimes, though, kernels with less symmetry produce an interesting effect. Let's consider a kernel that blurs on one side (with positive weights) and sharpens on the other (with negative weights). It will produce a ridged or embossed effect. Here is an implementation that we can add to filters.py:
class EmbossFilter(VConvolutionFilter):
    """An emboss filter with a 1-pixel radius."""

    def __init__(self):
        kernel = numpy.array([[-2, -1, 0],
                              [-1,  1, 1],
                              [ 0,  1, 2]])
        VConvolutionFilter.__init__(self, kernel)
This set of custom convolution filters is very basic. Indeed, it is more basic than OpenCV's ready-made set of filters. However, with a bit of experimentation, you should be able to write your own kernels that produce a unique look.
Modifying the application

Now that we have high-level functions and classes for several filters, it is trivial to apply any of them to the captured frames in Cameo. Let's edit cameo.py and add the lines that appear in bold face in the following excerpt:
import cv2
import filters
from managers import WindowManager, CaptureManager

class Cameo(object):

    def __init__(self):
        self._windowManager = WindowManager('Cameo',
                                            self.onKeypress)
        self._captureManager = CaptureManager(
            cv2.VideoCapture(0), self._windowManager, True)
        self._curveFilter = filters.BGRPortraCurveFilter()

    def run(self):
        """Run the main loop."""
        self._windowManager.createWindow()
        while self._windowManager.isWindowCreated:
            self._captureManager.enterFrame()
            frame = self._captureManager.frame
            filters.strokeEdges(frame, frame)
            self._curveFilter.apply(frame, frame)
            self._captureManager.exitFrame()
            self._windowManager.processEvents()

    # ... The rest is the same as in Chapter 2.
Here, I have chosen to apply two effects: stroking the edges and emulating Portra film colors. Feel free to modify the code to apply any filters you like.

Here is a screenshot from Cameo, with stroked edges and Portra-like colors:
Edge detection with Canny

OpenCV also offers a very handy function called Canny (after the algorithm's inventor, John F. Canny), which is very popular not only because of its effectiveness, but also because of the simplicity of its implementation in an OpenCV program, as it is a one-liner:
import cv2
import numpy as np

img = cv2.imread("../images/statue_small.jpg", 0)
cv2.imwrite("canny.jpg", cv2.Canny(img, 200, 300))
cv2.imshow("canny", cv2.imread("canny.jpg"))
cv2.waitKey()
cv2.destroyAllWindows()
The result is a very clear identification of the edges:

The Canny edge detection algorithm is quite complex but also interesting: it's a five-step process that denoises the image with a Gaussian filter, calculates gradients, applies non-maximum suppression (NMS) on edges, applies a double threshold on all the detected edges to eliminate false positives, and, lastly, analyzes all the edges and their connections to each other to keep the real edges and discard the weak ones.
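To make the double-threshold stage more concrete, here is a small NumPy sketch (my own illustration, not OpenCV's actual implementation) that classifies gradient magnitudes into strong edges, weak edges, and suppressed pixels. This is the role played by the two thresholds we passed to cv2.Canny (200 and 300 in the preceding script):

```python
import numpy as np

def double_threshold(gradient, low, high):
    # 2 = strong edge (always kept), 1 = weak edge (kept later only if
    # connected to a strong edge), 0 = suppressed.
    strong = gradient >= high
    weak = (gradient >= low) & ~strong
    return strong.astype(np.uint8) * 2 + weak.astype(np.uint8)

g = np.array([[ 10, 250,  90],
              [  5, 300,  20],
              [120,  80, 310]], dtype=np.float32)
print(double_threshold(g, low=100, high=200))
```

The final hysteresis step then keeps a weak edge only if it touches a strong one, which is how false positives get discarded.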
Contour detection

Another vital task in computer vision is contour detection, not only because of the obvious aspect of detecting contours of subjects contained in an image or video frame, but because of the derivative operations connected with identifying contours.

These operations are, namely, computing bounding polygons, approximating shapes, and generally calculating regions of interest, which considerably simplify interaction with image data, because a rectangular region with NumPy is easily defined with an array slice. We will be using this technique a lot when exploring the concept of object detection (including faces) and object tracking.

Let's go in order and familiarize ourselves with the API first with an example:
import cv2
import numpy as np

img = np.zeros((200, 200), dtype=np.uint8)
img[50:150, 50:150] = 255

ret, thresh = cv2.threshold(img, 127, 255, 0)
image, contours, hierarchy = cv2.findContours(thresh, cv2.RETR_TREE,
                                              cv2.CHAIN_APPROX_SIMPLE)
color = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR)
img = cv2.drawContours(color, contours, -1, (0, 255, 0), 2)
cv2.imshow("contours", color)
cv2.waitKey()
cv2.destroyAllWindows()
Firstly, we create an empty black image that is 200x200 pixels in size. Then, we place a white square in the center of it, utilizing ndarray's ability to assign values on a slice.

We then threshold the image, and call the findContours() function. This function takes three parameters: the input image, the hierarchy type, and the contour approximation method. There are a number of aspects that are of particular interest in this function:

The function modifies the input image, so it would be advisable to use a copy of the original image (for example, by passing img.copy()). Secondly, the hierarchy tree returned by the function is quite important: cv2.RETR_TREE will retrieve the entire hierarchy of contours in the image, enabling you to establish "relationships" between contours. If you only want to retrieve the most external contours, use cv2.RETR_EXTERNAL. This is particularly useful when you want to eliminate contours that are entirely contained in other contours (for example, in a vast majority of cases, you won't need to detect an object within another object of the same type).

The findContours function returns three elements: the modified image, the contours, and their hierarchy. We use the contours to draw on the color version of the image (so that we can draw contours in green) and eventually display it.

The result is a white square with its contour drawn in green. Spartan, but effective in demonstrating the concept! Let's move on to more meaningful examples.
Contours – bounding box, minimum area rectangle, and minimum enclosing circle

Finding the contours of a square is a simple task; irregular, skewed, and rotated shapes bring out the best in the cv2.findContours utility function of OpenCV. Let's take a look at the following image:

In a real-life application, we would be most interested in determining the bounding box of the subject, its minimum enclosing rectangle, and its enclosing circle. The cv2.findContours function, in conjunction with a few other OpenCV utilities, makes this very easy to accomplish:
import cv2
import numpy as np

img = cv2.pyrDown(cv2.imread("hammer.jpg", cv2.IMREAD_UNCHANGED))

ret, thresh = cv2.threshold(cv2.cvtColor(img.copy(), cv2.COLOR_BGR2GRAY),
                            127, 255, cv2.THRESH_BINARY)
image, contours, hier = cv2.findContours(thresh, cv2.RETR_EXTERNAL,
                                         cv2.CHAIN_APPROX_SIMPLE)

for c in contours:
    # find bounding box coordinates
    x, y, w, h = cv2.boundingRect(c)
    cv2.rectangle(img, (x, y), (x+w, y+h), (0, 255, 0), 2)

    # find minimum area
    rect = cv2.minAreaRect(c)
    # calculate coordinates of the minimum area rectangle
    box = cv2.boxPoints(rect)
    # normalize coordinates to integers
    box = np.int0(box)
    # draw contours
    cv2.drawContours(img, [box], 0, (0, 0, 255), 3)

    # calculate center and radius of minimum enclosing circle
    (x, y), radius = cv2.minEnclosingCircle(c)
    # cast to integers
    center = (int(x), int(y))
    radius = int(radius)
    # draw the circle
    img = cv2.circle(img, center, radius, (0, 255, 0), 2)

cv2.drawContours(img, contours, -1, (255, 0, 0), 1)
cv2.imshow("contours", img)
cv2.waitKey()
cv2.destroyAllWindows()
After the initial imports, we load the image, and then apply a binary threshold on a grayscale version of the original image. By doing this, we operate all find-contour calculations on a grayscale copy, but we draw on the original so that we can utilize color information.
Firstly, let's calculate a simple bounding box:

x, y, w, h = cv2.boundingRect(c)

This is a pretty straightforward conversion of contour information to the (x, y) coordinates, plus the height and width of the rectangle. Drawing this rectangle is an easy task and can be done using this code:

cv2.rectangle(img, (x, y), (x+w, y+h), (0, 255, 0), 2)
Secondly, let's calculate the minimum area rectangle enclosing the subject:

rect = cv2.minAreaRect(c)
box = cv2.boxPoints(rect)
box = np.int0(box)

The mechanism used here is particularly interesting: OpenCV does not have a function to calculate the coordinates of the minimum rectangle's vertices directly from the contour information. Instead, we calculate the minimum area rectangle, and then calculate the vertices of this rectangle. Note that the calculated vertices are floats, but pixels are accessed with integers (you can't access a "portion" of a pixel), so we need to operate this conversion. Next, we draw the box, which gives us the perfect opportunity to introduce the cv2.drawContours function:

cv2.drawContours(img, [box], 0, (0, 0, 255), 3)
Firstly, this function (like all drawing functions) modifies the original image. Secondly, it takes an array of contours in its second parameter, so you can draw a number of contours in a single operation. Therefore, if you have a single set of points representing a contour polygon, you need to wrap these points into an array, exactly like we did with our box in the preceding example. The third parameter of this function specifies the index of the contours array that we want to draw: a value of -1 will draw all the contours; otherwise, the contour at the specified index in the contours array (the second parameter) will be drawn.

Most drawing functions take the color of the drawing and its thickness as the last two parameters.
The last bounding contour we're going to examine is the minimum enclosing circle:

(x, y), radius = cv2.minEnclosingCircle(c)
center = (int(x), int(y))
radius = int(radius)
img = cv2.circle(img, center, radius, (0, 255, 0), 2)

The only peculiarity of the cv2.minEnclosingCircle function is that it returns a two-element tuple, of which the first element is a tuple itself, representing the coordinates of the circle's center, and the second element is the radius of this circle. After converting all these values to integers, drawing the circle is quite a trivial operation.
The final result on the original image looks like this:
Contours – convex contours and the Douglas-Peucker algorithm

Most of the time, when working with contours, subjects will have the most diverse shapes, including convex ones. A convex shape is one where there are no two points within the shape whose connecting line goes outside the perimeter of the shape itself.
The first facility that OpenCV offers to calculate the approximate bounding polygon of a shape is cv2.approxPolyDP. This function takes three parameters:

- A contour
- An epsilon value representing the maximum discrepancy between the original contour and the approximated polygon (the lower the value, the closer the approximated polygon will be to the original contour)
- A Boolean flag signifying whether the polygon is closed

The epsilon value is of vital importance to obtain a useful contour, so let's understand what it represents. An epsilon is the maximum difference between the approximated polygon's perimeter and the original contour's perimeter. The lower this difference is, the more similar the approximated polygon will be to the original contour.

You may ask yourself why we need an approximate polygon when we have a contour that is already a precise representation. The answer is that a polygon is a set of straight lines, and the ability to define polygons in a region for further manipulation and processing is paramount in many computer vision tasks.
Now that we know what an epsilon is, we need to obtain the contour perimeter information as a reference value. This is obtained with the cv2.arcLength function of OpenCV:

epsilon = 0.01 * cv2.arcLength(cnt, True)
approx = cv2.approxPolyDP(cnt, epsilon, True)
Effectively, we're instructing OpenCV to calculate an approximated polygon whose perimeter can only differ from the original contour's perimeter by an epsilon ratio.
OpenCV also offers a cv2.convexHull function to obtain processed contour information for convex shapes, and this is a straightforward one-line expression:

hull = cv2.convexHull(cnt)

Let's combine the original contour, the approximated polygon contour, and the convex hull in one image to observe the differences between them. To simplify things, I've applied the contours to a black image, so that the original subject is not visible but its contours are:

As you can see, the convex hull surrounds the entire subject, the approximated polygon is the innermost polygonal shape, and in between the two is the original contour, mainly composed of arcs.
Line and circle detection

Detecting edges and contours are not only common and important tasks, they also constitute the basis for other complex operations. Line and shape detection go hand in hand with edge and contour detection, so let's examine how OpenCV implements these.

The theory behind line and shape detection has its foundation in a technique called the Hough transform, invented by Richard Duda and Peter Hart, who extended (generalized) the work done by Paul Hough in the early 1960s.

Let's take a look at OpenCV's API for the Hough transforms.
Line detection

First of all, let's detect some lines, which is done with the HoughLines and HoughLinesP functions. The only difference between the two functions is that one uses the standard Hough transform, and the second uses the probabilistic Hough transform (hence the P in the name).

The probabilistic version is so called because it only analyzes a subset of the image's points and estimates the probability of these points all belonging to the same line. This implementation is an optimized version of the standard Hough transform; in this case, it's less computationally intensive and executes faster.
Let's take a look at a very simple example:

import cv2
import numpy as np

img = cv2.imread('lines.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, 50, 120)

minLineLength = 20
maxLineGap = 5
lines = cv2.HoughLinesP(edges, 1, np.pi/180, 100,
                        minLineLength=minLineLength, maxLineGap=maxLineGap)
for line in lines:
    # In OpenCV 3, each element of lines wraps one segment's endpoints.
    x1, y1, x2, y2 = line[0]
    cv2.line(img, (x1, y1), (x2, y2), (0, 255, 0), 2)

cv2.imshow("edges", edges)
cv2.imshow("lines", img)
cv2.waitKey()
cv2.destroyAllWindows()

Note that minLineLength and maxLineGap must be passed as keyword arguments; positionally, the fifth parameter of HoughLinesP is an optional output array, not minLineLength.
The crucial point of this simple script (aside from the HoughLinesP function call) is the setting of the minimum line length (shorter lines will be discarded) and the maximum line gap, which is the maximum size of a gap in a line before the two segments start being considered as separate lines.

Also note that the HoughLinesP function takes a single-channel binary image, processed through the Canny edge detection filter. Canny is not a strict requirement; however, an image that's been denoised and only represents edges is the ideal source for a Hough transform, so you will find this to be a common practice.
The parameters of HoughLinesP are as follows:

- The image we want to process.
- The geometrical representations of the lines, rho and theta, which are usually 1 and np.pi/180.
- The threshold, which represents the threshold below which a line is discarded. The Hough transform works with a system of bins and votes, with each bin representing a line, so any line with a minimum of the <threshold> votes is retained, and the rest are discarded.
- minLineLength and maxLineGap, which we mentioned previously.
Circle detection

OpenCV also has a function for detecting circles, called HoughCircles. It works in a very similar fashion to HoughLines, but where minLineLength and maxLineGap were the parameters to discard or retain lines, HoughCircles has a minimum distance between the circles' centers, and minimum and maximum radii for the circles. Here's the obligatory example:
import cv2
import numpy as np

planets = cv2.imread('planet_glow.jpg')
gray_img = cv2.cvtColor(planets, cv2.COLOR_BGR2GRAY)
img = cv2.medianBlur(gray_img, 5)
cimg = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR)

circles = cv2.HoughCircles(img, cv2.HOUGH_GRADIENT, 1, 120,
                           param1=100, param2=30, minRadius=0, maxRadius=0)
circles = np.uint16(np.around(circles))

for i in circles[0, :]:
    # draw the outer circle
    cv2.circle(planets, (i[0], i[1]), i[2], (0, 255, 0), 2)
    # draw the center of the circle
    cv2.circle(planets, (i[0], i[1]), 2, (0, 0, 255), 3)

cv2.imwrite("planets_circles.jpg", planets)
cv2.imshow("HoughCircles", planets)
cv2.waitKey()
cv2.destroyAllWindows()
Here's a visual representation of the result:
Detecting shapes

The detection of shapes with the Hough transform is limited to circles; however, we have already implicitly explored detecting shapes of any kind, specifically when we talked about approxPolyDP. This function allows the approximation of polygons, so if your image contains polygons, they will be quite accurately detected by combining the usage of cv2.findContours and cv2.approxPolyDP.
Summary

At this point, you should have gained a good understanding of color spaces, the Fourier Transform, and the several kinds of filters made available by OpenCV to process images.

You should also be proficient in detecting edges, lines, circles, and shapes in general. Additionally, you should be able to find contours and exploit the information they provide about the subjects contained in an image. These concepts will serve as the ideal background for exploring the topics in the next chapter.
Chapter 4. Depth Estimation and Segmentation

This chapter shows you how to use data from a depth camera to identify foreground and background regions, so that we can limit an effect to only the foreground or only the background. As prerequisites, we need a depth camera, such as Microsoft Kinect, and we need to build OpenCV with support for our depth camera. For build instructions, see Chapter 1, Setting Up OpenCV.

We'll deal with two main topics in this chapter: depth estimation and segmentation. We will explore depth estimation with two distinct approaches: firstly, by using a depth camera (a prerequisite of the first part of the chapter), such as Microsoft Kinect, and then by using stereo images, for which a normal camera will suffice. For instructions on how to build OpenCV with support for depth cameras, see Chapter 1, Setting Up OpenCV. The second part of the chapter is about segmentation, the technique that allows us to extract foreground objects from an image.
Creating modules

The code to capture and manipulate depth-camera data will be reusable outside Cameo.py. So, we should separate it into a new module. Let's create a file called depth.py in the same directory as Cameo.py. We need the following import statement in depth.py:

import numpy

We will also need to modify our preexisting rects.py file so that our copy operations can be limited to a nonrectangular subregion of a rectangle. To support the changes we are going to make, let's add the following import statements to rects.py:

import numpy
import utils

Finally, the new version of our application will use depth-related functionalities. So, let's add the following import statement to Cameo.py:

import depth

Now, let's go deeper into the subject of depth.
Capturing frames from a depth camera

Back in Chapter 2, Handling Files, Cameras, and GUIs, we discussed the concept that a computer can have multiple video capture devices and each device can have multiple channels. Suppose a given device is a stereo camera. Each channel might correspond to a different lens and sensor. Also, each channel might correspond to different kinds of data, such as a normal color image versus a depth map. The C++ version of OpenCV defines some constants for the identifiers of certain devices and channels. However, these constants are not defined in the Python version.
To remedy this situation, let's add the following definitions in depth.py:
# Devices.
CAP_OPENNI = 900  # OpenNI (for Microsoft Kinect)
CAP_OPENNI_ASUS = 910  # OpenNI (for Asus Xtion)

# Channels of an OpenNI-compatible depth generator.
CAP_OPENNI_DEPTH_MAP = 0  # Depth values in mm (16UC1)
CAP_OPENNI_POINT_CLOUD_MAP = 1  # XYZ in meters (32FC3)
CAP_OPENNI_DISPARITY_MAP = 2  # Disparity in pixels (8UC1)
CAP_OPENNI_DISPARITY_MAP_32F = 3  # Disparity in pixels (32FC1)
CAP_OPENNI_VALID_DEPTH_MASK = 4  # 8UC1

# Channels of an OpenNI-compatible RGB image generator.
CAP_OPENNI_BGR_IMAGE = 5
CAP_OPENNI_GRAY_IMAGE = 6
The depth-related channels require some explanation, as given in the following list:
- A depth map is a grayscale image in which each pixel value is the estimated distance from the camera to a surface. Specifically, an image from the CAP_OPENNI_DEPTH_MAP channel gives the distance as a floating-point number of millimeters.
- A point cloud map is a color image in which each color corresponds to an (x, y, or z) spatial dimension. Specifically, the CAP_OPENNI_POINT_CLOUD_MAP channel yields a BGR image, where B is x (blue is right), G is y (green is up), and R is z (red is deep), from the camera's perspective. The values are in meters.
- A disparity map is a grayscale image in which each pixel value is the stereo disparity of a surface. To conceptualize stereo disparity, let's suppose we overlay two images of a scene, shot from different viewpoints. The result would be similar to seeing double images. For points on any pair of twin objects in the scene, we can measure the distance in pixels. This measurement is the stereo disparity. Nearby objects exhibit greater stereo disparity than far-off objects. Thus, nearby objects appear brighter in a disparity map.
- A valid depth mask shows whether the depth information at a given pixel is believed to be valid (shown by a nonzero value) or invalid (shown by a value of zero). For example, if the depth camera depends on an infrared illuminator (an infrared flash), depth information is invalid in regions that are occluded (shadowed) from this light.
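Disparity and depth are inversely related: disparity is approximately the focal length (in pixels) times the stereo baseline, divided by the depth. The following NumPy sketch uses made-up focal length and baseline values (these vary per device and calibration) just to show why nearby surfaces appear brighter in a disparity map:

```python
import numpy as np

def depth_to_disparity(depth_mm, focal_px=525.0, baseline_mm=75.0):
    # focal_px and baseline_mm are illustrative placeholders, not values
    # for any specific camera. Pixels with depth 0 are treated as invalid.
    depth_mm = np.asarray(depth_mm, dtype=np.float64)
    disparity = np.zeros_like(depth_mm)
    valid = depth_mm > 0
    disparity[valid] = focal_px * baseline_mm / depth_mm[valid]
    return disparity

near, far = depth_to_disparity([500.0, 4000.0])
print(near, far)  # the nearby surface gets the much larger disparity
```

A surface at half a meter yields a disparity several times larger than one at four meters, which is exactly the bright-near, dark-far appearance described above.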
The following screenshot shows a point cloud map of a man sitting behind a sculpture of a cat:

The following screenshot shows a disparity map of a man sitting behind a sculpture of a cat:

A valid depth mask of a man sitting behind a sculpture of a cat is shown in the following screenshot:
Creating a mask from a disparity map

For the purposes of Cameo, we are interested in disparity maps and valid depth masks. They can help us refine our estimates of facial regions.

Using the FaceTracker function and a normal color image, we can obtain rectangular estimates of facial regions. By analyzing such a rectangular region in the corresponding disparity map, we can tell that some pixels within the rectangle are outliers: too near or too far to really be a part of the face. We can refine the facial region to exclude these outliers. However, we should only apply this test where the data is valid, as indicated by the valid depth mask.

Let's write a function to generate a mask whose values are 0 for the rejected regions of the facial rectangle and 1 for the accepted regions. This function should take a disparity map, a valid depth mask, and a rectangle as arguments. We can implement it in depth.py as follows:
def createMedianMask(disparityMap, validDepthMask, rect=None):
    """Return a mask selecting the median layer, plus shadows."""
    if rect is not None:
        x, y, w, h = rect
        disparityMap = disparityMap[y:y+h, x:x+w]
        validDepthMask = validDepthMask[y:y+h, x:x+w]
    median = numpy.median(disparityMap)
    return numpy.where((validDepthMask == 0) | \
                       (abs(disparityMap - median) < 12),
                       1.0, 0.0)
To identify outliers in the disparity map, we first find the median using numpy.median(), which takes an array as an argument. If the array is of an odd length, median() returns the value that would lie in the middle of the array if the array were sorted. If the array is of an even length, median() returns the average of the two values that would be sorted nearest to the middle of the array.

To generate a mask based on per-pixel Boolean operations, we use numpy.where() with three arguments. In the first argument, where() takes an array whose elements are evaluated for truth or falsity. An output array of like dimensions is returned. Wherever an element in the input array is true, the where() function's second argument is assigned to the corresponding element in the output array. Conversely, wherever an element in the input array is false, the where() function's third argument is assigned to the corresponding element in the output array.

Our implementation treats a pixel as an outlier when it has a valid disparity value that deviates from the median disparity value by 12 or more. I've chosen the value of 12 just by experimentation. Feel free to tweak this value later, based on the results you encounter when running Cameo with your particular camera setup.
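Here is a small standalone NumPy sketch (with made-up disparity values) of the same median-plus-where logic, so you can see what the function computes on concrete data:

```python
import numpy as np

disparity = np.array([[40.0, 42.0, 41.0],
                      [43.0, 90.0, 40.0],   # 90.0 deviates far from the median
                      [41.0,  5.0, 42.0]])
valid = np.array([[1, 1, 1],
                  [1, 1, 1],
                  [1, 0, 1]], dtype=np.uint8)  # the 5.0 reading is invalid

median = np.median(disparity)
mask = np.where((valid == 0) | (abs(disparity - median) < 12), 1.0, 0.0)
print(median)  # 41.0
print(mask)    # only the valid outlier (90.0) is rejected
```

Note that invalid pixels are accepted (mask value 1.0), matching the function's docstring: shadowed regions are kept rather than rejected.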
Masking a copy operation

As part of the previous chapter's work, we wrote copyRect() as a copy operation that limits itself to the given rectangles of a source and destination image. Now, we want to apply further limits to this copy operation. We want to use a given mask that has the same dimensions as the source rectangle.

We shall copy only those pixels in the source rectangle where the mask's value is not zero. Other pixels shall retain their old values from the destination image. This logic, with an array of conditions and two arrays of possible output values, can be expressed concisely with the numpy.where() function that we have recently learned about.

Let's open rects.py and edit copyRect() to add a new mask argument. This argument may be None, in which case we fall back to our old implementation of the copy operation. Otherwise, we next ensure that mask and the images have the same number of channels. We assume that mask has one channel but the images may have three channels (BGR). We can add duplicate channels to mask using the repeat() and reshape() methods of numpy.array.

Finally, we perform the copy operation using where(). The complete implementation is as follows:
def copyRect(src, dst, srcRect, dstRect, mask=None,
             interpolation=cv2.INTER_LINEAR):
    """Copy part of the source to part of the destination."""

    x0, y0, w0, h0 = srcRect
    x1, y1, w1, h1 = dstRect

    # Resize the contents of the source sub-rectangle.
    # Put the result in the destination sub-rectangle.
    if mask is None:
        dst[y1:y1+h1, x1:x1+w1] = \
            cv2.resize(src[y0:y0+h0, x0:x0+w0], (w1, h1),
                       interpolation=interpolation)
    else:
        if not utils.isGray(src):
            # Convert the mask to 3 channels, like the image.
            mask = mask.repeat(3).reshape(h0, w0, 3)
        # Perform the copy, with the mask applied.
        dst[y1:y1+h1, x1:x1+w1] = \
            numpy.where(cv2.resize(mask, (w1, h1),
                                   interpolation=cv2.INTER_NEAREST),
                        cv2.resize(src[y0:y0+h0, x0:x0+w0], (w1, h1),
                                   interpolation=interpolation),
                        dst[y1:y1+h1, x1:x1+w1])
We also need to modify our swapRects() function, which uses copyRect() to perform a circular swap of a list of rectangular regions. The modifications to swapRects() are quite simple. We just need to add a new masks argument, which is a list of masks whose elements are passed to the respective copyRect() calls. If the value of the given masks argument is None, we pass None to every copyRect() call.

The following code shows the full implementation of this:
def swapRects(src, dst, rects, masks=None,
              interpolation=cv2.INTER_LINEAR):
    """Copy the source with two or more sub-rectangles swapped."""

    if dst is not src:
        dst[:] = src

    numRects = len(rects)
    if numRects < 2:
        return

    if masks is None:
        masks = [None] * numRects

    # Copy the contents of the last rectangle into temporary storage.
    x, y, w, h = rects[numRects - 1]
    temp = src[y:y+h, x:x+w].copy()

    # Copy the contents of each rectangle into the next.
    i = numRects - 2
    while i >= 0:
        copyRect(src, dst, rects[i], rects[i+1], masks[i],
                 interpolation)
        i -= 1

    # Copy the temporarily stored content into the first rectangle.
    copyRect(temp, dst, (0, 0, w, h), rects[0], masks[numRects - 1],
             interpolation)
Note that the mask argument in copyRect() and the masks argument in swapRects() both default to None. Thus, our new versions of these functions are backward compatible with our previous versions of Cameo.
Depth estimation with a normal camera

A depth camera is a fantastic little device to capture images and estimate the distance of objects from the camera itself, but how does the depth camera retrieve depth information? Also, is it possible to reproduce the same kind of calculations with a normal camera?

A depth camera, such as Microsoft Kinect, uses a traditional camera combined with an infrared sensor that helps the camera differentiate similar objects and calculate their distance from the camera. However, not everybody has access to a depth camera or a Kinect, and especially when you're just learning OpenCV, you're probably not going to invest in an expensive piece of equipment until you feel your skills are well sharpened and your interest in the subject is confirmed.

Our setup includes a simple camera, which is most likely integrated in our machine, or a webcam attached to our computer. So, we need to resort to less fancy means of estimating the difference in distance of objects from the camera.

Geometry will come to the rescue in this case; in particular, epipolar geometry, which is the geometry of stereo vision. Stereo vision is a branch of computer vision that extracts three-dimensional information out of two different images of the same subject.
How does epipolar geometry work? Conceptually, it traces imaginary lines from the camera to each object in the image, then does the same on the second image, and calculates the distance to objects based on the intersection of the lines corresponding to the same object. Here is a representation of this concept:

Let's see how OpenCV applies epipolar geometry to calculate a so-called disparity map, which is basically a representation of the different depths detected in the images. This will enable us to extract the foreground of a picture and discard the rest.

Firstly, we need two images of the same subject taken from different points of view, paying attention to the fact that the pictures are taken at an equal distance from the object; otherwise, the calculations will fail and the disparity map will be meaningless.

So, moving on to an example:
import numpy as np
import cv2

def update(val=0):
    # disparity range is tuned for 'aloe' image pair
    stereo.setBlockSize(cv2.getTrackbarPos('window_size', 'disparity'))
    stereo.setUniquenessRatio(cv2.getTrackbarPos('uniquenessRatio',
                                                 'disparity'))
    stereo.setSpeckleWindowSize(cv2.getTrackbarPos('speckleWindowSize',
                                                   'disparity'))
    stereo.setSpeckleRange(cv2.getTrackbarPos('speckleRange', 'disparity'))
    stereo.setDisp12MaxDiff(cv2.getTrackbarPos('disp12MaxDiff',
                                               'disparity'))

    print('computing disparity...')
    disp = stereo.compute(imgL, imgR).astype(np.float32) / 16.0

    cv2.imshow('left', imgL)
    cv2.imshow('disparity', (disp - min_disp) / num_disp)

if __name__ == "__main__":
    window_size = 5
    min_disp = 16
    num_disp = 192 - min_disp
    blockSize = window_size
    uniquenessRatio = 1
    speckleRange = 3
    speckleWindowSize = 3
    disp12MaxDiff = 200
    P1 = 600
    P2 = 2400
    imgL = cv2.imread('images/color1_small.jpg')
    imgR = cv2.imread('images/color2_small.jpg')
    cv2.namedWindow('disparity')
    cv2.createTrackbar('speckleRange', 'disparity', speckleRange, 50,
                       update)
    cv2.createTrackbar('window_size', 'disparity', window_size, 21, update)
    cv2.createTrackbar('speckleWindowSize', 'disparity', speckleWindowSize,
                       200, update)
    cv2.createTrackbar('uniquenessRatio', 'disparity', uniquenessRatio, 50,
                       update)
    cv2.createTrackbar('disp12MaxDiff', 'disparity', disp12MaxDiff, 250,
                       update)
    stereo = cv2.StereoSGBM_create(
        minDisparity=min_disp,
        numDisparities=num_disp,
        blockSize=window_size,
        uniquenessRatio=uniquenessRatio,
        speckleRange=speckleRange,
        speckleWindowSize=speckleWindowSize,
        disp12MaxDiff=disp12MaxDiff,
        P1=P1,
        P2=P2
    )
    update()
    cv2.waitKey()
In this example, we take two images of the same subject and calculate a disparity map, showing in brighter colors the points in the map that are closer to the camera. The areas marked in black represent the disparities.

First of all, we import numpy and cv2 as usual.

Let's skip the definition of the update function for a second and take a look at the main code; the process is quite simple: load two images, create a StereoSGBM instance (StereoSGBM stands for semiglobal block matching, an algorithm used for computing disparity maps), create a few trackbars to play around with the parameters of the algorithm, and call the update function.

The update function applies the trackbar values to the StereoSGBM instance, and then calls the compute method, which produces a disparity map. All in all, pretty simple! Here is the first image I've used:

This is the second one:

There you go: a nice and quite easy to interpret disparity map.

The parameters used by StereoSGBM are as follows (taken from the OpenCV documentation):
minDisparity: This parameter refers to the minimum possible disparity value. Normally, it is zero, but sometimes rectification algorithms can shift images, so this parameter needs to be adjusted accordingly.

numDisparities: This parameter refers to the maximum disparity minus the minimum disparity. The resultant value is always greater than zero. In the current implementation, this parameter must be divisible by 16.

windowSize: This parameter refers to the matched block size. It must be an odd number greater than or equal to 1. Normally, it should be somewhere in the 3-11 range.

P1: This parameter refers to the first parameter controlling the disparity smoothness. See the description of P2.

P2: This parameter refers to the second parameter that controls the disparity smoothness. The larger the values are, the smoother the disparity is. P1 is the penalty on the disparity change by plus or minus 1 between neighbor pixels. P2 is the penalty on the disparity change by more than 1 between neighbor pixels. The algorithm requires P2 > P1. See the stereo_match.cpp sample, where some reasonably good P1 and P2 values are shown (such as 8*number_of_image_channels*windowSize*windowSize and 32*number_of_image_channels*windowSize*windowSize, respectively).

disp12MaxDiff: This parameter refers to the maximum allowed difference (in integer pixel units) in the left-right disparity check. Set it to a nonpositive value to disable the check.

preFilterCap: This parameter refers to the truncation value for prefiltered image pixels. The algorithm first computes the x-derivative at each pixel and clips its value to the [-preFilterCap, preFilterCap] interval. The resultant values are passed to the Birchfield-Tomasi pixel cost function.

uniquenessRatio: This parameter refers to the margin, in percentage, by which the best (minimum) computed cost function value should "win" over the second best value for the found match to be considered correct. Normally, a value within the 5-15 range is good enough.

speckleWindowSize: This parameter refers to the maximum size of smooth disparity regions whose speckles will be considered noise and invalidated. Set it to 0 to disable speckle filtering. Otherwise, set it somewhere in the 50-200 range.

speckleRange: This parameter refers to the maximum disparity variation within each connected component. If you do speckle filtering, set the parameter to a positive value; it will be implicitly multiplied by 16. Normally, 1 or 2 is good enough.
With the preceding script, you'll be able to load the images and play around with the parameters until you're happy with the disparity map generated by StereoSGBM.
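The P1=600 and P2=2400 values hardcoded in the preceding script follow the rule of thumb from OpenCV's stereo_match.cpp sample quoted in the P2 description above; a quick check:

```python
# Rule-of-thumb smoothness penalties, per OpenCV's stereo_match.cpp sample.
channels = 3      # assuming a BGR image pair
window_size = 5   # the blockSize used in the preceding script

P1 = 8 * channels * window_size ** 2    # penalty for a disparity step of 1
P2 = 32 * channels * window_size ** 2   # penalty for a larger disparity step
print(P1, P2)  # 600 2400, matching the script's hardcoded values
```

If you change the block size via the trackbar, you may want to recompute P1 and P2 accordingly, since they scale with the square of the window size.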
Object segmentation using the Watershed and GrabCut algorithms

Calculating a disparity map can be very useful to detect the foreground of an image, but StereoSGBM is not the only algorithm available to accomplish this; in fact, StereoSGBM is more about gathering 3D information from 2D pictures than anything else. GrabCut, however, is a perfect tool for this purpose. The GrabCut algorithm follows a precise sequence of steps:
1. A rectangle including the subject(s) of the picture is defined.
2. The area lying outside the rectangle is automatically defined as background.
3. The data contained in the background is used as a reference to distinguish background areas from foreground areas within the user-defined rectangle.
4. A Gaussian Mixture Model (GMM) models the foreground and background, and labels undefined pixels as probable background or probable foreground.
5. Each pixel in the image is virtually connected to the surrounding pixels through virtual edges, and each edge gets a probability of being foreground or background, based on how similar it is in color to the pixels surrounding it.
6. Each pixel (or node, as it is conceptualized in the algorithm) is connected to either a foreground or a background node, which you can picture looking like this:
7. After the nodes have been connected to either terminal (background or foreground, also called a source and sink), the edges between nodes belonging to different terminals are cut (hence the famous cut part of the algorithm's name), which enables the separation of the parts of the image. This graph adequately represents the algorithm:
Example of foreground detection with GrabCut

Let's look at an example. We start with the picture of a beautiful statue of an angel.

We want to grab our angel and discard the background. To do this, we will create a relatively short script that will instantiate GrabCut, operate the separation, and then display the resulting image side by side with the original. We will do this using matplotlib, a very useful Python library, which makes displaying charts and images a trivial task:
import numpy as np
import cv2
from matplotlib import pyplot as plt

img = cv2.imread('images/statue_small.jpg')
mask = np.zeros(img.shape[:2], np.uint8)

bgdModel = np.zeros((1, 65), np.float64)
fgdModel = np.zeros((1, 65), np.float64)

rect = (100, 50, 421, 378)
cv2.grabCut(img, mask, rect, bgdModel, fgdModel, 5, cv2.GC_INIT_WITH_RECT)

mask2 = np.where((mask == 2) | (mask == 0), 0, 1).astype('uint8')
img = img * mask2[:, :, np.newaxis]

plt.subplot(121), plt.imshow(img)
plt.title("grabcut"), plt.xticks([]), plt.yticks([])
plt.subplot(122)
plt.imshow(cv2.cvtColor(cv2.imread('images/statue_small.jpg'),
                        cv2.COLOR_BGR2RGB))
plt.title("original"), plt.xticks([]), plt.yticks([])
plt.show()
This code is actually quite straightforward. Firstly, we load the image we want to process, and then we create a mask populated with zeros, with the same shape as the image we've loaded:
import numpy as np
import cv2
from matplotlib import pyplot as plt

img = cv2.imread('images/statue_small.jpg')
mask = np.zeros(img.shape[:2], np.uint8)
We then create zero-filled foreground and background models:

bgdModel = np.zeros((1, 65), np.float64)
fgdModel = np.zeros((1, 65), np.float64)
We could have populated these models with data, but we're going to initialize the GrabCut algorithm with a rectangle identifying the subject we want to isolate. So, the background and foreground models are going to be determined based on the areas left out of the initial rectangle. This rectangle is defined in the next line:

rect = (100, 50, 421, 378)
Now to the interesting part! We run the GrabCut algorithm, specifying the empty models and mask, and the fact that we're going to use a rectangle to initialize the operation:

cv2.grabCut(img, mask, rect, bgdModel, fgdModel, 5, cv2.GC_INIT_WITH_RECT)
You'll also notice an integer after fgdModel, which is the number of iterations the algorithm is going to run on the image. You can increase this number, but there is a point at which pixel classifications will converge, and effectively, you'll just be adding iterations without obtaining any more improvements.

After this, our mask will have changed to contain values between 0 and 3. The values 0 and 2 will be converted into zeros, and the values 1 and 3 into ones, and stored in mask2, which we can then use to filter out all zero-value pixels (theoretically leaving all foreground pixels intact):
mask2 = np.where((mask == 2) | (mask == 0), 0, 1).astype('uint8')
img = img * mask2[:, :, np.newaxis]

The last part of the code displays the images side by side, and here's the result:
This is quite a satisfactory result. You'll notice that an area of background is left under the angel's arm. It is possible to apply touch-up strokes and more iterations to fix this; the technique is quite well illustrated in the grabcut.py file in samples/python2 of your OpenCV installation.
Image segmentation with the Watershed algorithm

Finally, we take a quick look at the Watershed algorithm. The algorithm is called Watershed because its conceptualization involves water. Imagine areas with low density (little to no change) in an image as valleys, and areas with high density (lots of change) as peaks. Start filling the valleys with water to the point where water from two different valleys is about to merge. To prevent the merging of water from different valleys, you build a barrier to keep them separated. The resulting barrier is the image segmentation.

As an Italian, I love food, and one of the things I love the most is a good plate of pasta with pesto sauce. So here's a picture of the most vital ingredient for a pesto, basil:

Now, we want to segment the image to separate the basil leaves from the white background.

Once more, we import numpy, cv2, and matplotlib, and then import our basil leaves' image:
import numpy as np
import cv2
from matplotlib import pyplot as plt

img = cv2.imread('images/basil.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
After converting the image to grayscale, we run a threshold on it. This operation helps divide the image in two: blacks and whites:

ret, thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
Next up, we remove noise from the image by applying the morphologyEx transformation with MORPH_OPEN, an operation that consists of eroding and then dilating an image, which removes small specks of noise:
kernel = np.ones((3, 3), np.uint8)
opening = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, kernel, iterations=2)

By dilating the result of the morphology transformation, we can obtain areas of the image that are most certainly background:

sure_bg = cv2.dilate(opening, kernel, iterations=3)
Conversely, we can obtain sure foreground areas by applying distanceTransform. In practical terms, of all the areas most likely to be foreground, the farther away from the "border" with the background a point is, the higher the chance that it is foreground. Once we've obtained the distanceTransform representation of the image, we apply a threshold to determine, with high confidence, whether the areas are foreground:

dist_transform = cv2.distanceTransform(opening, cv2.DIST_L2, 5)
ret, sure_fg = cv2.threshold(dist_transform, 0.7 * dist_transform.max(), 255, 0)
At this stage, we have some sure foregrounds and backgrounds. Now, what about the areas in between? First of all, we need to determine these regions, which can be done by subtracting the sure foreground from the sure background:

sure_fg = np.uint8(sure_fg)
unknown = cv2.subtract(sure_bg, sure_fg)
Now that we have these areas, we can build our famous "barriers" to stop the water from merging. This is done with the connectedComponents function. We took a glimpse at graph theory when we analyzed the GrabCut algorithm, and conceptualized an image as a set of nodes that are connected by edges. Given the sure foreground areas, some of these nodes will be connected together, but some won't. This means that they belong to different water valleys, and there should be a barrier between them:

ret, markers = cv2.connectedComponents(sure_fg)
Now we add 1 to all the markers, so that the sure background becomes 1 rather than 0, because watershed treats 0 as the unknown region; we then set the unknown areas, and only those, to 0:

markers = markers + 1
markers[unknown == 255] = 0
Finally, we open the gates! Let the water fall, and our barriers be drawn in red:

markers = cv2.watershed(img, markers)
img[markers == -1] = [255, 0, 0]
plt.imshow(img)
plt.show()

Now, let's show the result:

Needless to say, I am now hungry!
Summary

In this chapter, we learned about gathering three-dimensional information from two-dimensional input (a video frame or an image). Firstly, we examined depth cameras, and then epipolar geometry and stereo images, so we are now able to calculate disparity maps. Finally, we looked at image segmentation with two of the most popular methods: GrabCut and Watershed.

This chapter has introduced us to the world of interpreting information provided by images, and we are now ready to explore another important feature of OpenCV: feature descriptors and keypoint detection.
Chapter 5. Detecting and Recognizing Faces

Among the many reasons that make computer vision a fascinating subject is the fact that computer vision makes very futuristic-sounding tasks a reality. One such feature is face detection. OpenCV has a built-in facility to perform face detection, which has virtually infinite applications in the real world in all sorts of contexts, from security to entertainment.

This chapter introduces some of OpenCV's face detection functionalities, along with the data files that define particular types of trackable objects. Specifically, we look at Haar cascade classifiers, which analyze contrast between adjacent image regions to determine whether or not a given image or subimage matches a known type. We consider how to combine multiple Haar cascade classifiers in a hierarchy, such that one classifier identifies a parent region (for our purposes, a face) and other classifiers identify child regions (eyes, nose, and mouth).

We also take a detour into the humble but important subject of rectangles. By drawing, copying, and resizing rectangular image regions, we can perform simple manipulations on image regions that we are tracking.

By the end of this chapter, we will integrate face tracking and rectangle manipulations into Cameo. Finally, we'll have some face-to-face interaction!
Conceptualizing Haar cascades

When we talk about classifying objects and tracking their location, what exactly are we hoping to pinpoint? What constitutes a recognizable part of an object?

Photographic images, even from a webcam, may contain a lot of detail for our (human) viewing pleasure. However, image detail tends to be unstable with respect to variations in lighting, viewing angle, viewing distance, camera shake, and digital noise. Moreover, even real differences in physical detail might not interest us for the purposes of classification. I was taught in school that no two snowflakes look alike under a microscope. Fortunately, as a Canadian child, I had already learned how to recognize snowflakes without a microscope, as the similarities are more obvious in bulk.

Thus, some means of abstracting image detail is useful in producing stable classification and tracking results. The abstractions are called features, which are said to be extracted from the image data. There should be far fewer features than pixels, though any pixel might influence multiple features. The level of similarity between two images can be evaluated based on Euclidean distances between the images' corresponding features.

For example, distance might be defined in terms of spatial coordinates or color coordinates. Haar-like features are one type of feature that is often applied to real-time face tracking. They were first used for this purpose in the paper Robust Real-Time Face Detection, Paul Viola and Michael Jones, Kluwer Academic Publishers, 2001 (available at http://www.vision.caltech.edu/html-files/EE148-2005-Spring/pprs/viola04ijcv.pdf). Each Haar-like feature describes the pattern of contrast among adjacent image regions. For example, edges, vertices, and thin lines each generate distinctive features.

For any given image, the features may vary depending on the region's size; this may be called the window size. Two images that differ only in scale should be capable of yielding similar features, albeit for different window sizes. Thus, it is useful to generate features for multiple window sizes. Such a collection of features is called a cascade. We may say a Haar cascade is scale-invariant or, in other words, robust to changes in scale. OpenCV provides a classifier and tracker for scale-invariant Haar cascades, which it expects to be in a certain file format.

Haar cascades, as implemented in OpenCV, are not robust to changes in rotation. For example, an upside-down face is not considered similar to an upright face, and a face viewed in profile is not considered similar to a face viewed from the front. A more complex and more resource-intensive implementation could improve Haar cascades' robustness to rotation by considering multiple transformations of images as well as multiple window sizes. However, we will confine ourselves to the implementation in OpenCV.
Getting Haar cascade data

Once you have a copy of the source code of OpenCV 3, you will find a folder, data/haarcascades.

This folder contains all the XML files used by the OpenCV face detection engine to detect faces in still images, videos, and camera feeds.

Once you find haarcascades, create a directory for your project; in this folder, create a subfolder called cascades, and copy the following files from haarcascades into cascades:
haarcascade_profileface.xml
haarcascade_righteye_2splits.xml
haarcascade_russian_plate_number.xml
haarcascade_smile.xml
haarcascade_upperbody.xml
As their names suggest, these cascades are for tracking faces, eyes, noses, and mouths. They require a frontal, upright view of the subject. We will use them later when building a face detector. If you are curious about how these data sets are generated, refer to Appendix B, Generating Haar Cascades for Custom Targets, OpenCV Computer Vision with Python. With a lot of patience and a powerful computer, you can make your own cascades and train them for various types of objects.
Using OpenCV to perform face detection

Contrary to what you may think at the outset, performing face detection on a still image and on a video feed are extremely similar operations. The latter is just the sequential version of the former: face detection on videos is simply face detection applied to each frame read into the program from the camera. Naturally, a whole host of concepts applies to video face detection, such as tracking, which does not apply to still images, but it's always good to know that the underlying theory is the same.

So let's go ahead and detect some faces.
Performing face detection on a still image

The first and most basic way to perform face detection is to load an image and detect faces in it. To make the result visually meaningful, we will draw rectangles around faces on the original image.

Now that you have haarcascades included in your project, let's go ahead and create a basic script to perform face detection.
import cv2

filename = '/path/to/my/pic.jpg'

def detect(filename):
    face_cascade = cv2.CascadeClassifier('./cascades/haarcascade_frontalface_default.xml')
    img = cv2.imread(filename)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, 1.3, 5)
    for (x, y, w, h) in faces:
        img = cv2.rectangle(img, (x, y), (x + w, y + h), (255, 0, 0), 2)
    cv2.namedWindow('Vikings Detected!!')
    cv2.imshow('Vikings Detected!!', img)
    cv2.imwrite('./vikings.jpg', img)
    cv2.waitKey(0)

detect(filename)
Let's go through the code. First, we use the obligatory cv2 import (you'll find that every script in this book starts like this, or almost). Secondly, we declare the detect function:

def detect(filename):

Within this function, we declare a face_cascade variable, which is a CascadeClassifier object for faces, responsible for face detection:

face_cascade = cv2.CascadeClassifier('./cascades/haarcascade_frontalface_default.xml')
We then load our file with cv2.imread, and convert it to grayscale, because that's the color space in which the face detection happens.

The next step (face_cascade.detectMultiScale) is where we operate the actual face detection:

img = cv2.imread(filename)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
faces = face_cascade.detectMultiScale(gray, 1.3, 5)
The parameters passed are scaleFactor and minNeighbors, which determine the percentage reduction of the image at each iteration of the face detection process, and the minimum number of neighbors retained by each face rectangle at each iteration. This may all seem a little complex in the beginning, but you can check all the options out in the official documentation.

The value returned from the detection operation is an array of tuples that represent the face rectangles. The utility method cv2.rectangle allows us to draw rectangles at the specified coordinates (x and y represent the left and top coordinates, and w and h represent the width and height of the face rectangle).
We will draw blue rectangles around all the faces we find by looping through the faces variable, making sure we use the original image for drawing, not the gray version:

for (x, y, w, h) in faces:
    img = cv2.rectangle(img, (x, y), (x + w, y + h), (255, 0, 0), 2)
Lastly, we create a named window and display the resulting processed image in it. To prevent the image window from closing automatically, we insert a call to waitKey, which closes the window at the press of any key:

cv2.namedWindow('Vikings Detected!!')
cv2.imshow('Vikings Detected!!', img)
cv2.waitKey(0)

And there we go, a whole set of Vikings have been detected in our image, as shown in the following screenshot:
Performing face detection on a video

We now have a good foundation to understand how to perform face detection on a still image. As mentioned previously, we can repeat the process on the individual frames of a video (be it a camera feed or a video file) and perform face detection.

The script will perform the following tasks: it will open a camera feed, read a frame, examine that frame for faces, scan for eyes within the faces detected, and then draw blue rectangles around the faces and green rectangles around the eyes.
1. Let's create a file called face_detection.py and start by importing the necessary module:

import cv2

2. After this, we declare a method, detect(), which will perform face detection:
def detect():
    face_cascade = cv2.CascadeClassifier('./cascades/haarcascade_frontalface_default.xml')
    eye_cascade = cv2.CascadeClassifier('./cascades/haarcascade_eye.xml')
    camera = cv2.VideoCapture(0)
3. The first thing we need to do inside the detect() method is to load the Haar cascade files so that OpenCV can operate face detection. As we copied the cascade files into the local cascades/ folder, we can use a relative path. Then, we open a VideoCapture object (the camera feed). The VideoCapture constructor takes a parameter, which indicates the camera to be used; zero indicates the first camera available.
    while (True):
        ret, frame = camera.read()
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

4. Next up, we capture a frame. The read() method returns two values: a Boolean indicating the success of the frame read operation, and the frame itself. We capture the frame, and then we convert it to grayscale. This is a necessary operation, because face detection in OpenCV happens in the grayscale color space:

        faces = face_cascade.detectMultiScale(gray, 1.3, 5)
5. Much like in the single still image example, we call detectMultiScale on the grayscale version of the frame:
        for (x, y, w, h) in faces:
            img = cv2.rectangle(frame, (x, y), (x + w, y + h), (255, 0, 0), 2)
            roi_gray = gray[y:y + h, x:x + w]
            eyes = eye_cascade.detectMultiScale(roi_gray, 1.03, 5, 0, (40, 40))
Note

There are a few additional parameters in the eye detection. Why? The method signature for detectMultiScale takes a number of optional parameters: in the case of detecting a face, the default options were good enough to detect faces. However, eyes are a smaller feature of the face, and self-casting shadows in my beard or my nose, and random shadows in the frame, were triggering false positives.

By limiting the search for eyes to a minimum size of 40x40 pixels, I was able to discard all false positives. Go ahead and test these parameters until you reach a point at which your application performs as you expect (for example, you can try to specify a maximum size for the feature too, or increase the scale factor and number of neighbors).
6. Here we have a further step compared to the still image example: we create a region of interest corresponding to the face rectangle, and within this rectangle, we operate "eye detection". This makes sense, as you wouldn't want to go looking for eyes outside a face (well, for human beings at least!).
            for (ex, ey, ew, eh) in eyes:
                # the eye coordinates are relative to the face ROI, so we
                # offset them by the face rectangle's top-left corner
                cv2.rectangle(frame, (x + ex, y + ey),
                              (x + ex + ew, y + ey + eh), (0, 255, 0), 2)
7. Again, we loop through the resulting eye tuples and draw green rectangles around them:
cv2.imshow("camera",frame)
ifcv2.waitKey(1000/12)&0xff==ord("q"):
break
camera.release()
cv2.destroyAllWindows()
if__name__=="__main__":
detect()
8. Finally, we show the resulting frame in the window. All being well, if any face is within the field of view of the camera, you will see a blue rectangle around the face and a green rectangle around each eye, as shown in this screenshot:
Performing face recognition

Detecting faces is a fantastic feature of OpenCV, and one that constitutes the basis for a more advanced operation: face recognition. What is face recognition? It's the ability of a program, given an image or a video feed, to identify a person. One of the ways to achieve this (and the approach adopted by OpenCV) is to "train" the program by feeding it a set of classified pictures (a facial database), and operate the recognition against those pictures.

This is the process that OpenCV and its face recognition module follow to recognize faces.

Another important feature of the face recognition module is that each recognition has a confidence score, which allows us to set thresholds in real-life applications to limit the number of false reads.

Let's start from the very beginning; to operate face recognition, we need faces to recognize. You can obtain these in two ways: supply the images yourself or obtain freely available face databases. There are a number of face databases on the Internet:
The Yale face database (Yalefaces): http://vision.ucsd.edu/content/yale-face-database
The AT&T database: http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html
The Extended Yale or Yale B: http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html
To operate face recognition on these samples, you would then have to run face recognition on an image that contains the face of one of the sampled people. That may be an educational process, but I found it to be not as satisfying as providing images of my own. In fact, I probably had the same thought that many people had: I wondered if I could write a program that recognizes my face with a certain degree of confidence.

Generating the data for face recognition

So let's go ahead and write a script that will generate those images for us. A few images containing different expressions are all that we need, but we have to make sure the sample images adhere to certain criteria:

Images will be grayscale in the .pgm format
Square shape
All the same size (I used 200x200; most freely available sets are smaller than that)

Here's the script itself:
import cv2

def generate():
    face_cascade = cv2.CascadeClassifier('./cascades/haarcascade_frontalface_default.xml')
    eye_cascade = cv2.CascadeClassifier('./cascades/haarcascade_eye.xml')
    camera = cv2.VideoCapture(0)
    count = 0
    while (True):
        ret, frame = camera.read()
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        faces = face_cascade.detectMultiScale(gray, 1.3, 5)
        for (x, y, w, h) in faces:
            img = cv2.rectangle(frame, (x, y), (x + w, y + h), (255, 0, 0), 2)
            f = cv2.resize(gray[y:y + h, x:x + w], (200, 200))
            cv2.imwrite('./data/at/jm/%s.pgm' % str(count), f)
            count += 1
        cv2.imshow("camera", frame)
        if cv2.waitKey(1000 // 12) & 0xff == ord("q"):
            break
    camera.release()
    cv2.destroyAllWindows()

if __name__ == "__main__":
    generate()
What is quite interesting about this exercise is that we are going to generate sample images building on our newfound knowledge of how to detect a face in a video feed. Effectively, what we are doing is detecting a face, cropping that region of the grayscale frame, resizing it to 200x200 pixels, and saving it with a name in a particular folder (in my case, jm; you can use your initials) in the .pgm format.

I inserted a variable, count, because we need progressive names for the images. Run the script for a few seconds, change expressions a few times, and check the destination folder you specified in the script. You will find a number of images of your face, grayed, resized, and named with the format <count>.pgm.

Let's now move on to try and recognize our face in a video feed. This should be fun!
Recognizing faces

OpenCV 3 comes with three main methods for recognizing faces, based on three different algorithms: Eigenfaces, Fisherfaces, and Local Binary Pattern Histograms (LBPH). It is beyond the scope of this book to get into the nitty-gritty of the theoretical differences between these methods, but we can give a high-level overview of the concepts.

I will refer you to the following links for a detailed description of the algorithms:
Principal Component Analysis (PCA): A very intuitive introduction by Jonathon Shlens is available at http://arxiv.org/pdf/1404.1100v1.pdf. This algorithm was invented in 1901 by K. Pearson, and the original paper, On Lines and Planes of Closest Fit to Systems of Points in Space, is available at http://stat.smmu.edu.cn/history/pearson1901.pdf.
Eigenfaces: The paper, Eigenfaces for Recognition, M. Turk and A. Pentland, 1991, is available at http://www.cs.ucsb.edu/~mturk/Papers/jcn.pdf.
Fisherfaces: The seminal paper, The Use of Multiple Measurements in Taxonomic Problems, R. A. Fisher, 1936, is available at http://onlinelibrary.wiley.com/doi/10.1111/j.1469-1809.1936.tb02137.x/pdf.
Local Binary Pattern: The first paper describing this algorithm, Performance evaluation of texture measures with classification based on Kullback discrimination of distributions, T. Ojala, M. Pietikainen, D. Harwood, is available at http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=576366
First and foremost, all methods follow a similar process; they all take a set of classified observations (our face database, containing numerous samples per individual), get "trained" on it, perform an analysis of faces detected in an image or video, and determine two elements: whether the subject is identified, and a measure of the confidence of the subject really being identified, which is commonly known as the confidence score.
Eigenfaces performs a so-called PCA, which, of all the mathematical concepts you will hear mentioned in relation to computer vision, is possibly the most descriptive. It basically identifies the principal components of a certain set of observations (again, your face database), calculates the divergence of the current observation (the face being detected in an image or frame) compared to the dataset, and produces a value. The smaller the value, the smaller the difference between the face database and the detected face; hence, a value of 0 is an exact match.
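The idea can be sketched in a few lines of NumPy on made-up data (all names and sizes here are illustrative, not OpenCV's internals): flatten each face to a vector, find principal components with an SVD, project every sample into that subspace, and identify a probe by its nearest neighbor there:

```python
import numpy as np

# A toy sketch of the Eigenfaces idea on random stand-in data.
rng = np.random.default_rng(1)
samples = rng.normal(size=(20, 64))      # 20 "faces", 64 "pixels" each
mean = samples.mean(axis=0)

# Principal components via SVD of the mean-centered data.
U, S, Vt = np.linalg.svd(samples - mean, full_matrices=False)
components = Vt[:5]                      # keep the top 5 components

def project(x):
    return components @ (x - mean)

probe = samples[3]                       # a face from the training set
dists = [np.linalg.norm(project(probe) - project(s)) for s in samples]
best = int(np.argmin(dists))             # smallest distance = best match
```

Since the probe is sample 3 itself, its distance is exactly 0, mirroring the "0 is an exact match" behavior described above.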
Fisherfaces derives from PCA and evolves the concept, applying more complex logic. While computationally more intensive, it tends to yield more accurate results than Eigenfaces.

LBPH instead roughly (again, from a very high level) divides a detected face into small cells and compares each cell to the corresponding cell in the model, producing a histogram of matching values for each area. Because of this flexible approach, LBPH is the only face recognition algorithm that allows the model sample faces and the detected faces to be of different shapes and sizes. I personally found this to be the most accurate algorithm, generally speaking, but each algorithm has its strengths and weaknesses.
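The "local binary pattern" itself is easy to picture: for each pixel, compare the 8 neighbors to the center and read the comparison results as a byte; the histograms of these codes over a grid of cells are what LBPH compares. A toy sketch (the clockwise neighbor ordering here is one arbitrary choice; implementations differ):

```python
import numpy as np

# Compute the local binary pattern code for the center pixel of a 3x3 patch:
# each neighbor >= center contributes one bit to an 8-bit code.
def lbp_code(patch3x3):
    center = patch3x3[1, 1]
    neighbors = [patch3x3[0, 0], patch3x3[0, 1], patch3x3[0, 2],
                 patch3x3[1, 2], patch3x3[2, 2], patch3x3[2, 1],
                 patch3x3[2, 0], patch3x3[1, 0]]
    return sum((1 << i) for i, v in enumerate(neighbors) if v >= center)

patch = np.array([[5, 9, 1],
                  [3, 6, 7],
                  [2, 6, 8]])
code = lbp_code(patch)  # a value between 0 and 255
```

Because each code depends only on the ordering of intensities around a pixel, the descriptor is robust to uniform lighting changes, which is part of what makes LBPH practical for faces.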
Preparing the training data

Now that we have our data, we need to load these sample pictures into our face recognition algorithms. All face recognition algorithms take two parameters in their train() method: an array of images and an array of labels. What do these labels represent? They are the IDs of a certain individual/face, so that when face recognition is performed, we not only know that a person was recognized, but also who, among the many people available in our database, this person is.

To do that, we need to create a comma-separated values (CSV) file, which will contain the path to a sample picture followed by the ID of that person. In my case, I have 20 pictures generated with the previous script, in the subfolder jm/ of the folder data/at/, which contains all the pictures of all the individuals.

My CSV file therefore looks like this:
jm/1.pgm;0
jm/2.pgm;0
jm/3.pgm;0
…
jm/20.pgm;0
Note

The dots represent all the missing numbers. The jm/ prefix indicates the subfolder, and the 0 value at the end is the ID for my face.

OK, at this stage, we have everything we need to instruct OpenCV to recognize our face.
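Writing this CSV file by hand gets tedious as soon as you have more than one person, so here is a short sketch that walks a tree like data/at/<person>/ and assigns one ID per subfolder (the layout mirrors the one described above; the function name is my own):

```python
import os

def write_csv(base_path, out_file):
    # One label per person subfolder, e.g. data/at/jm -> 0, data/at/sb -> 1.
    label = 0
    with open(out_file, 'w') as f:
        for subdir in sorted(os.listdir(base_path)):
            subject_path = os.path.join(base_path, subdir)
            if not os.path.isdir(subject_path):
                continue  # skip stray files, including the CSV itself
            for filename in sorted(os.listdir(subject_path)):
                f.write('%s;%d\n' % (os.path.join(subdir, filename), label))
            label += 1
```

Called as write_csv('data/at', 'data/at/faces.csv'), it would emit one "path;id" line per sample, in the same format as the hand-written file above.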
Loading the data and recognizing faces

Next up, we need to load these two resources (the array of images and the CSV file) into the face recognition algorithm, so it can be trained to recognize our face. To do this, we build a function that reads the CSV file and, for each line of the file, loads the image at the corresponding path into the images array and the ID into the labels array:
import os
import sys

import cv2
import numpy as np

def read_images(path, sz=None):
    c = 0
    X, y = [], []
    for dirname, dirnames, filenames in os.walk(path):
        for subdirname in dirnames:
            subject_path = os.path.join(dirname, subdirname)
            for filename in os.listdir(subject_path):
                try:
                    if filename == ".directory":
                        continue
                    filepath = os.path.join(subject_path, filename)
                    im = cv2.imread(filepath, cv2.IMREAD_GRAYSCALE)
                    # resize to the given size (if given)
                    if sz is not None:
                        im = cv2.resize(im, sz)
                    X.append(np.asarray(im, dtype=np.uint8))
                    y.append(c)
                except IOError as e:
                    print("I/O error({0}): {1}".format(e.errno, e.strerror))
                except:
                    print("Unexpected error:", sys.exc_info()[0])
                    raise
            c = c + 1
    return [X, y]
Performing an Eigenfaces recognition

We're ready to test the face recognition algorithm. Here's the script to perform it:
def face_rec():
    names = ['Joe', 'Jane', 'Jack']
    if len(sys.argv) < 2:
        print("USAGE: facerec_demo.py </path/to/images> [</path/to/store/images/at>]")
        sys.exit()
    [X, y] = read_images(sys.argv[1])
    y = np.asarray(y, dtype=np.int32)
    if len(sys.argv) == 3:
        out_dir = sys.argv[2]
    model = cv2.face.createEigenFaceRecognizer()
    model.train(np.asarray(X), np.asarray(y))
    camera = cv2.VideoCapture(0)
    face_cascade = cv2.CascadeClassifier('./cascades/haarcascade_frontalface_default.xml')
    while (True):
        read, img = camera.read()
        faces = face_cascade.detectMultiScale(img, 1.3, 5)
        for (x, y, w, h) in faces:
            img = cv2.rectangle(img, (x, y), (x + w, y + h), (255, 0, 0), 2)
            gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
            roi = gray[y:y + h, x:x + w]
            try:
                roi = cv2.resize(roi, (200, 200),
                                 interpolation=cv2.INTER_LINEAR)
                params = model.predict(roi)
                print("Label: %s, Confidence: %.2f" % (params[0], params[1]))
                cv2.putText(img, names[params[0]], (x, y - 20),
                            cv2.FONT_HERSHEY_SIMPLEX, 1, 255, 2)
            except:
                continue
        cv2.imshow("camera", img)
        if cv2.waitKey(1000 // 12) & 0xff == ord("q"):
            break
    cv2.destroyAllWindows()
There are a few lines that may look a bit mysterious, so let's analyze the script. First of all, there's an array of names declared; those are the actual names of the individual people I stored in my database of faces. It's great to identify a person as ID 0, but printing 'Joe' on top of a face that's been correctly detected and recognized is much more dramatic.

So whenever the script recognizes an ID, we will print the corresponding name from the names array instead of an ID.
After this, we load the images as described in the previous function, create the face recognition model with cv2.face.createEigenFaceRecognizer(), and train it by passing the two arrays of images and labels (IDs). Note that the Eigenface recognizer takes two important parameters that you can specify: the first one is the number of principal components you want to keep, and the second is a float value specifying a confidence threshold.
Next up, we repeat a process similar to the face detection operation. This time, though, we extend the processing of the frames by also operating face recognition on any face that's been detected.

This happens in two steps: firstly, we resize the detected face to the expected size (in my case, samples were 200x200 pixels), and then we call the predict() function on the resized region.

Note

This is a bit of a simplified process, and it serves the purpose of enabling you to have a basic application running and to understand the process of face recognition in OpenCV 3. In reality, you will apply a few more optimizations, such as correctly aligning and rotating detected faces, so that the accuracy of the recognition is maximized.

Lastly, we obtain the results of the recognition and, just for effect, we draw them in the frame:
Performing face recognition with Fisherfaces

What about Fisherfaces? The process doesn't change much; we simply need to instantiate a different algorithm. So, the declaration of our model variable would look like this:

model = cv2.face.createFisherFaceRecognizer()

Fisherfaces takes the same two arguments as Eigenfaces: the number of Fisherfaces to keep and the confidence threshold. Faces with a confidence above this threshold will be discarded.
Performing face recognition with LBPH

Finally, let's take a quick look at the LBPH algorithm. Again, the process is very similar. However, the parameters taken by the algorithm factory are a bit more complex, as they indicate, in order: radius, neighbors, grid_x, grid_y, and the confidence threshold. If you don't specify these values, they will automatically be set to 1, 8, 8, 8, and 123.0. The model declaration will look like this:

model = cv2.face.createLBPHFaceRecognizer()

Note

Note that with LBPH, you won't need to resize images, as the division into grids allows comparing patterns identified in each cell.
Discarding results with the confidence score

The predict() method returns a two-element array: the first element is the label of the recognized individual, and the second is the confidence score. All algorithms come with the option of setting a confidence score threshold, which measures the distance of the recognized face from the original model; therefore, a score of 0 signifies an exact match.

There may be cases in which you would rather retain all recognitions and then apply further processing, so you can come up with your own algorithms to estimate the confidence score of a recognition; for example, if you are trying to identify people in a video, you may want to analyze the confidence score in subsequent frames to establish whether the recognition was successful or not. In this case, you can inspect the confidence score obtained by the algorithm and draw your own conclusions.

Note

The confidence score value is completely different in Eigenfaces/Fisherfaces and LBPH. Eigenfaces and Fisherfaces will produce values (roughly) in the range 0 to 20,000, with any score below 4,000-5,000 being quite a confident recognition.

LBPH works similarly; however, the reference value for a good recognition is below 50, and any value above 80 is considered a low confidence score.
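As a sketch, gating predictions on these rough ranges might look like this (the cut-off numbers are just the rules of thumb quoted above, not constants defined by OpenCV; tune them against your own database):

```python
def accept(label, confidence, algorithm):
    # Rough rule-of-thumb thresholds; for every recognizer, a lower
    # confidence value means a closer match.
    if algorithm in ('eigenfaces', 'fisherfaces'):
        return confidence < 5000.0
    if algorithm == 'lbph':
        return confidence < 80.0
    raise ValueError('unknown algorithm: %s' % algorithm)

# For example, a predict() result of (0, 3200.0) from Eigenfaces:
ok = accept(0, 3200.0, 'eigenfaces')
```

Dropping a check like this into the face_rec() loop, just before cv2.putText, would suppress the name overlay for low-confidence matches.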
A normal custom approach would be to hold off drawing a rectangle around a recognized face until we have a number of frames with a satisfying confidence score, but you have total freedom to use OpenCV's face recognition module to tailor your application to your needs.
Summary

By now, you should have a good understanding of how face detection and face recognition work, and how to implement them in Python and OpenCV 3.

Face detection and face recognition are constantly evolving branches of computer vision, with algorithms being developed continuously, and they will evolve even faster in the near future with the emphasis placed on robotics and the Internet of Things.

For now, the accuracy of detection and recognition heavily depends on the quality of the training data, so make sure you provide your applications with high-quality face databases, and you will be satisfied with the results.
Chapter 6. Retrieving Images and Searching Using Image Descriptors

Similar to the human eyes and brain, OpenCV can detect the main features of an image and extract these into so-called image descriptors. These features can then be used as a database, enabling image-based searches. Moreover, we can use keypoints to stitch images together and compose a bigger image (think of putting together many pictures to form a 360-degree panorama).

This chapter shows you how to detect the features of an image with OpenCV and make use of them to match and search images. Throughout the chapter, we will take sample images and detect their main features, and then try to find a sample image contained in another image using homography.
Feature detection algorithms

There are a number of algorithms that can be used to detect and extract features, and we will explore most of them. The most common algorithms used in OpenCV are as follows:

Harris: This algorithm is useful to detect corners
SIFT: This algorithm is useful to detect blobs
SURF: This algorithm is useful to detect blobs
FAST: This algorithm is useful to detect corners
BRIEF: This algorithm is useful to detect blobs
ORB: This algorithm stands for Oriented FAST and Rotated BRIEF

Matching features can be performed with the following methods:

Brute-Force matching
FLANN-based matching

Spatial verification can then be performed with homography.
Defining features

What is a feature, exactly? Why is a particular area of an image classifiable as a feature, while others are not? Broadly speaking, a feature is an area of interest in the image that is unique or easily recognizable. As you can imagine, corners and high-density areas are good features, while patterns that repeat themselves a lot and low-density areas (such as a blue sky) are not. Edges are good features, as they tend to divide two regions of an image. A blob (an area of an image that greatly differs from its surrounding areas) is also an interesting feature.

Most feature detection algorithms revolve around the identification of corners, edges, and blobs, with some also focusing on the concept of a ridge, which you can conceptualize as the axis of symmetry of an elongated object (think, for example, about identifying a road in an image).

Some algorithms are better at identifying and extracting features of a certain type, so it's important to know what your input image is, so that you can utilize the best tool in your OpenCV belt.
Detecting features – corners

Let's start by identifying corners with cornerHarris, and let's do this with an example. If you continue studying OpenCV beyond this book, you'll find that, for many reasons, chessboards are a common subject of analysis in computer vision, partly because a chequered pattern is suited to many types of feature detection, and maybe because chess is pretty popular among geeks.

Here's our sample image:

OpenCV has a very handy utility function called cornerHarris, which detects corners in an image. The code to illustrate this is incredibly simple:
import cv2
import numpy as np

img = cv2.imread('images/chess_board.png')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
gray = np.float32(gray)
dst = cv2.cornerHarris(gray, 2, 23, 0.04)
img[dst > 0.01 * dst.max()] = [0, 0, 255]
while (True):
    cv2.imshow('corners', img)
    if cv2.waitKey(1000 // 12) & 0xff == ord("q"):
        break
cv2.destroyAllWindows()
Let’sanalyzethecode:aftertheusualimports,weloadthechessboardimageandturnittograyscalesothatcornerHarriscancomputeit.Then,wecallthecornerHarrisfunction:
dst=cv2.cornerHarris(gray,2,23,0.04)
Themostimportantparameterhereisthethirdone,whichdefinestheapertureoftheSobeloperator.TheSobeloperatorperformsthechangedetectionintherowsandcolumnsofanimagetodetectedges,anditdoesthisusingakernel.TheOpenCVcornerHarrisfunctionusesaSobeloperatorwhoseapertureisdefinedbythisparameter.Inplainenglish,itdefineshowsensitivecornerdetectionis.Itmustbebetween3and31andbeanoddvalue.Atvalue3,allthosediagonallinesintheblacksquaresofthechessboardwillregisterascornerswhentheytouchtheborderofthesquare.At23,onlythecornersofeachsquarewillbedetectedascorners.
Consider the following line:

img[dst > 0.01 * dst.max()] = [0, 0, 255]

Here, we paint a red mark wherever a corner is detected. Tweaking the second parameter in cornerHarris changes this: the smaller the value, the smaller the marks indicating corners.

Here's the final result:

Great, we have corner points marked, and the result is meaningful at first glance; all the corners are marked in red.
Feature extraction and description using DoG and SIFT

The preceding technique, which uses cornerHarris, is great to detect corners and has a distinct advantage because corners are corners; they are detected even if the image is rotated.

However, if we reduce (or increase) the size of an image, some parts of the image may lose or even gain a corner quality.

For example, look at the following corner detections of the F1 Italian Grand Prix track:

Here's a smaller version of the same screenshot:

You will notice how corners are a lot more condensed; however, we didn't only gain corners, we also lost some! In particular, look at the Variante Ascari chicane that squiggles at the end of the NW/SE straight part of the track. In the larger version of the image, both the entrance and the apex of the double bend were detected as corners. In the smaller image, the apex is not detected as such. The more we reduce the image, the more likely it is that we're going to lose the entrance to that chicane too.
This loss of features raises an issue; we need an algorithm that will work regardless of the scale of the image. Enter SIFT: while Scale-Invariant Feature Transform may sound a bit mysterious, now that we know what problem we're trying to resolve, it actually makes sense. We need a function (a transform) that will detect features (a feature transform) and will not output different results depending on the scale of the image (a scale-invariant feature transform). Note that SIFT does not detect keypoints (which is done with Difference of Gaussians), but it describes the region surrounding them by means of a feature vector.

A quick introduction to Difference of Gaussians (DoG) is in order; we have already talked about low pass filters and blurring operations, specifically with the cv2.GaussianBlur() function. DoG is the result of subtracting two differently blurred Gaussian-filtered copies of the same image. In Chapter 3, Processing Images with OpenCV 3, we applied this technique to compute a very efficient edge detection, and the idea is the same. The final result of a DoG operation contains areas of interest (keypoints), which are then going to be described through SIFT.
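To make the idea concrete, here is a minimal sketch of DoG in plain NumPy (the sigma values and the test image are illustrative assumptions, not the values SIFT itself uses):

```python
import numpy as np

def gaussian_kernel(sigma):
    """1D Gaussian kernel, normalized to sum to 1."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    kernel = np.exp(-x ** 2 / (2 * sigma ** 2))
    return kernel / kernel.sum()

def blur(image, sigma):
    """Separable Gaussian blur with edge padding (pure NumPy)."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    padded = np.pad(image.astype(np.float64), radius, mode='edge')
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel, mode='valid')
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode='valid')

def difference_of_gaussians(image, sigma1=1.0, sigma2=2.0):
    """Subtract a wider blur from a narrower one: flat regions cancel
    out, while corners, blobs, and fine texture stand out."""
    return blur(image, sigma1) - blur(image, sigma2)

flat = np.full((32, 32), 128.0)
print(np.abs(difference_of_gaussians(flat)).max() < 1e-9)  # True
```

A flat patch produces a near-zero response everywhere, while a bright blob produces a strong positive peak at its center; those peaks are the "areas of interest" mentioned above.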
Let’sseehowSIFTbehavesinanimagefullofcornersandfeatures:
Now,thebeautifulpanoramaofVarese(Lombardy,Italy)alsogainsacomputervisionmeaning.Here’sthecodeusedtoobtainthisprocessedimage:
import cv2
import sys
import numpy as np

imgpath = sys.argv[1]
img = cv2.imread(imgpath)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

sift = cv2.xfeatures2d.SIFT_create()
keypoints, descriptor = sift.detectAndCompute(gray, None)

img = cv2.drawKeypoints(image=img, outImage=img, keypoints=keypoints,
    flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS, color=(51, 163, 236))

cv2.imshow('sift_keypoints', img)
while (True):
    if cv2.waitKey(1000 // 12) & 0xff == ord("q"):
        break
cv2.destroyAllWindows()
After the usual imports, we load the image that we want to process. To make this script generic, we will take the image path as a command-line argument using the sys module of Python:

imgpath = sys.argv[1]
img = cv2.imread(imgpath)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

We then turn the image into grayscale. At this stage, you may have gathered that most processing algorithms in Python need a grayscale feed in order to work.

The next step is to create a SIFT object and run it on the grayscale image:

sift = cv2.xfeatures2d.SIFT_create()
keypoints, descriptor = sift.detectAndCompute(gray, None)

This is an interesting and important process; the SIFT object uses DoG to detect keypoints and computes a feature vector for the surrounding regions of each keypoint. As the name of the method clearly gives away, there are two main operations performed: detection and computation. The return value of the operation is a tuple containing keypoint information (keypoints) and the descriptor.

Finally, we process this image by drawing the keypoints on it and displaying it with the usual imshow function.

Note that in the drawKeypoints function, we pass a flag that has a value of 4. This is actually the following cv2 module property:

cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS

This flag enables the drawing of a circle and the orientation of each keypoint.
Anatomy of a keypoint

Let's take a quick look at the definition, from the OpenCV documentation, of the KeyPoint class:

pt
size
angle
response
octave
class_id

Some properties are more self-explanatory than others, but let's not take anything for granted and go through each one:

The pt (point) property indicates the x and y coordinates of the keypoint in the image.
The size property indicates the diameter of the feature.
The angle property indicates the orientation of the feature, as shown in the preceding processed image.
The response property indicates the strength of the keypoint. Some features are classified by SIFT as stronger than others, and response is the property you would check to evaluate the strength of a feature.
The octave property indicates the layer in the pyramid where the feature was found. To fully explain this property, we would need to write an entire chapter on it, so I will only introduce the basic concept. The SIFT algorithm operates in a similar fashion to face detection algorithms, in that it processes the same image sequentially but alters the parameters of the computation.

For example, the scale of the image and neighboring pixels are parameters that change at each iteration (octave) of the algorithm. So, the octave property indicates the layer at which the keypoint was detected.

Finally, the class_id property is an ID that can be assigned to a keypoint, for example to group keypoints by the object they belong to.
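As a small illustration of how these properties are used in practice, here is a sketch that ranks keypoints by their response, as you might do to keep only the strongest ones before matching (a lightweight namedtuple stands in for cv2.KeyPoint so the snippet runs without OpenCV; the values are made up):

```python
from collections import namedtuple

# Stand-in for cv2.KeyPoint, mirroring the property names listed above.
KeyPoint = namedtuple('KeyPoint', 'pt size angle response octave class_id')

keypoints = [
    KeyPoint((10.0, 12.0), 4.0, 90.0, 0.02, 0, -1),
    KeyPoint((55.0, 40.0), 8.0, 45.0, 0.09, 1, -1),
    KeyPoint((31.0, 77.0), 6.0, 180.0, 0.05, 0, -1),
]

# Keep only the two strongest keypoints, by response.
strongest = sorted(keypoints, key=lambda kp: kp.response, reverse=True)[:2]
print([kp.pt for kp in strongest])  # [(55.0, 40.0), (31.0, 77.0)]
```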
Feature extraction and detection using Fast Hessian and SURF

Computer vision is a relatively recent branch of computer science, and many algorithms and techniques are of recent invention. SIFT is in fact only 16 years old, having been published by David Lowe in 1999.

SURF is a feature detection algorithm published in 2006 by Herbert Bay, which is several times faster than SIFT and partially inspired by it.

Note
Note that both SIFT and SURF are patented algorithms and, for this reason, are made available in the xfeatures2d module of OpenCV.

It is not particularly relevant to this book to understand how SURF works under the hood, as much as we can use it in our applications and make the best of it. What is important to understand is that SURF is an OpenCV class that operates keypoint detection with the Fast Hessian algorithm and extraction with SURF, much like the SIFT class in OpenCV operates keypoint detection with DoG and extraction with SIFT.

Also, the good news is that, as a feature detection algorithm, the API of SURF does not differ from SIFT's. Therefore, we can simply edit the previous script to dynamically choose a feature detection algorithm instead of rewriting the entire program.
As we only support two algorithms for now, there is no need to find a particularly elegant solution to the evaluation of the algorithm to be used, and we'll use simple if blocks, as shown in the following code:

import cv2
import sys
import numpy as np

imgpath = sys.argv[1]
img = cv2.imread(imgpath)
alg = sys.argv[2]

def fd(algorithm):
    if algorithm == "SIFT":
        return cv2.xfeatures2d.SIFT_create()
    if algorithm == "SURF":
        return cv2.xfeatures2d.SURF_create(float(sys.argv[3]) if len(sys.argv) == 4 else 4000)

gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
fd_alg = fd(alg)
keypoints, descriptor = fd_alg.detectAndCompute(gray, None)

img = cv2.drawKeypoints(image=img, outImage=img, keypoints=keypoints,
    flags=4, color=(51, 163, 236))

cv2.imshow('keypoints', img)
while (True):
    if cv2.waitKey(1000 // 12) & 0xff == ord("q"):
        break
cv2.destroyAllWindows()
Here’stheresultusingSURFwithathreshold:
ThisimagehasbeenobtainedbyprocessingitwithaSURFalgorithmusingaHessianthresholdof8000.Tobeprecise,Iranthefollowingcommand:
>pythonfeat_det.pyimages/varese.jpgSURF8000
Thehigherthethreshold,thelessfeaturesidentified,soplayaroundwiththevaluesuntilyoureachanoptimaldetection.Intheprecedingcase,youcanclearlyseehowindividualbuildingsaredetectedasfeatures.
InaprocesssimilartotheoneweadoptedinChapter4,DepthEstimationandSegmentation,whenwewerecalculatingdisparitymaps,try—asanexercise—tocreateatrackbartofeedthevalueoftheHessianthresholdtotheSURFinstance,andseethenumberoffeaturesincreaseanddecreaseinaninverselyproportionalfashion.
Now,let’sexaminecornerdetectionwithFAST,theBRIEFkeypointdescriptor,andORB(whichusesthetwo)andputfeaturedetectiontogooduse.
ORB feature detection and feature matching

If SIFT is young and SURF younger, ORB is in its infancy. ORB was first published in 2011 as a fast alternative to SIFT and SURF.

The algorithm was published in the paper, ORB: an efficient alternative to SIFT or SURF, and is available in the PDF format at http://www.vision.cs.chubu.ac.jp/CV-R/pdf/Rublee_iccv2011.pdf.

ORB mixes techniques used in the FAST keypoint detector and the BRIEF descriptor, so it is definitely worth taking a quick look at FAST and BRIEF first. We will then talk about Brute-Force matching, one of the algorithms used for feature matching, and show an example of feature matching.
FAST

The Features from Accelerated Segment Test (FAST) algorithm works in a clever way; it draws a circle of 16 pixels around the candidate pixel. It then marks each of those pixels as brighter or darker than the center of the circle by more than a particular threshold. A corner is detected when a sufficiently long run of contiguous pixels is all marked as brighter, or all marked as darker.

FAST also implements a high-speed test, which attempts to quickly skip the full 16-pixel test. To understand how this test works, let's take a look at this screenshot:

Only four of the 16 pixels are examined first (pixels number 1, 5, 9, and 13). For the candidate to be a corner, at least three of these four must all be brighter than the center by more than the threshold, or all darker than it by more than the threshold. If this condition does not hold (for example, if two are brighter and two are not), the pixel cannot be a corner, and the remaining twelve pixels need not be tested.

FAST is an incredibly clever algorithm, but it's not devoid of weaknesses, and to compensate for these weaknesses, developers analyzing images can implement a machine learning approach, feeding a set of images (relevant to your application) to the algorithm so that corner detection is optimized.

Despite this, FAST depends on a threshold, so the developer's input is always necessary (unlike with SIFT).
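The high-speed pre-test described above can be sketched in a few lines of plain Python (the intensities and threshold are illustrative; a real implementation works directly on image arrays):

```python
def fast_pretest(center, circle16, threshold):
    """FAST's high-speed pre-test (illustrative sketch).

    circle16: intensities of the 16 circle pixels, indexed 1..16 as in
    the usual FAST diagrams; only positions 1, 5, 9, and 13 are used.
    Returns False when the pixel can be rejected without running the
    full 16-pixel segment test.
    """
    test = [circle16[i - 1] for i in (1, 5, 9, 13)]
    brighter = sum(p > center + threshold for p in test)
    darker = sum(p < center - threshold for p in test)
    return brighter >= 3 or darker >= 3

# A corner-like neighborhood: most of the circle is much brighter
# than the center, so the candidate survives the pre-test.
circle = [200] * 10 + [50] * 6
print(fast_pretest(50, circle, threshold=20))  # True
```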
BRIEF

Binary Robust Independent Elementary Features (BRIEF), on the other hand, is not a feature detection algorithm, but a descriptor. We have not yet explored this concept, so let's explain what a descriptor is, and then look at BRIEF.

You will notice that when we previously analyzed images with SIFT and SURF, the heart of the entire process was the call to the detectAndCompute function. This function performs two different steps, detection and computation, and returns the two different results coupled in a tuple.

The result of detection is a set of keypoints; the result of the computation is the descriptor. This means that OpenCV's SIFT and SURF classes are both detectors and descriptors (although, remember that the original algorithms are not! OpenCV's SIFT is really DoG plus SIFT, and OpenCV's SURF is really Fast Hessian plus SURF).

Keypoint descriptors are a representation of the image that serves as the gateway to feature matching, because you can compare the keypoint descriptors of two images and find commonalities.

BRIEF is one of the fastest descriptors currently available. It represents each keypoint's neighborhood as a short binary string, and the theory behind it is actually quite complicated, but suffice it to say that BRIEF adopts a series of optimizations that make it a very good choice for fast feature matching.
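To give a feel for what a binary descriptor looks like, here is a toy sketch in the spirit of BRIEF (the random pixel-pair sampling pattern here is an illustrative assumption; the real algorithm uses a fixed, pre-computed pattern of typically 128 to 512 comparisons on a smoothed patch):

```python
import numpy as np

def toy_binary_descriptor(patch, pairs):
    """Build a bit string by comparing pixel pairs inside a patch."""
    bits = [1 if patch[p1] < patch[p2] else 0 for p1, p2 in pairs]
    return np.array(bits, dtype=np.uint8)

rng = np.random.default_rng(0)
patch = rng.integers(0, 256, size=(8, 8))

# Illustrative sampling pattern: 16 random pixel pairs.
pairs = [((int(a), int(b)), (int(c), int(d)))
         for a, b, c, d in rng.integers(0, 8, size=(16, 4))]

desc = toy_binary_descriptor(patch, pairs)
print(len(desc))  # 16 bits
```

Because the descriptor is just a bit string, two descriptors can be compared extremely quickly with the Hamming distance (the number of differing bits), which is exactly what matchers use for BRIEF and ORB.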
Brute-Force matching

A Brute-Force matcher is a descriptor matcher that compares two sets of descriptors and generates a list of matches. The reason why it's called Brute-Force is that there is little optimization involved in the algorithm; each feature in the first set of descriptors is compared to all the features in the second set, each comparison is given a distance value, and the closest result is considered a match.

This is why it's called Brute-Force. In computing, the term brute-force is often associated with an approach that prioritizes the exhaustion of all possible combinations (for example, all the possible combinations of characters to crack a password) over some clever and convoluted algorithmic logic. OpenCV provides a BFMatcher object that does just that.
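A minimal NumPy sketch of the idea, using the Hamming distance on binary descriptors as BFMatcher does for ORB (the tiny 4-bit descriptors below are made up for illustration):

```python
import numpy as np

def brute_force_match(des1, des2):
    """For each descriptor in des1, return (query_index, train_index,
    distance) of its closest descriptor in des2, by Hamming distance."""
    matches = []
    for i, d in enumerate(des1):
        dists = np.count_nonzero(des2 != d, axis=1)  # Hamming distances
        j = int(np.argmin(dists))
        matches.append((i, j, int(dists[j])))
    return matches

des1 = np.array([[0, 1, 1, 0], [1, 1, 1, 1]], dtype=np.uint8)
des2 = np.array([[1, 1, 1, 0], [0, 1, 1, 0]], dtype=np.uint8)
print(brute_force_match(des1, des2))  # [(0, 1, 0), (1, 0, 1)]
```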
Feature matching with ORB

Now that we have a general idea of what FAST and BRIEF are, we can understand why the team behind ORB (at the time composed of Ethan Rublee, Vincent Rabaud, Kurt Konolige, and Gary R. Bradski) chose these two algorithms as a foundation for ORB.

In their paper, the authors aim at achieving the following results:

The addition of a fast and accurate orientation component to FAST
The efficient computation of oriented BRIEF features
Analysis of variance and correlation of oriented BRIEF features
A learning method to decorrelate BRIEF features under rotational invariance, leading to better performance in nearest-neighbor applications

Aside from the very technical jargon, the main points are quite clear; ORB aims at optimizing and speeding up operations, including the very important step of utilizing BRIEF in a rotation-aware fashion so that matching is improved even in situations where a training image has a very different rotation to the query image.

At this stage, though, I bet you have had enough of the theory and want to sink your teeth into some feature matching, so let's go look at some code.

As an avid listener of music, the first example that comes to my mind is to get the logo of a band and match it to one of the band's albums:
import numpy as np
import cv2
from matplotlib import pyplot as plt

img1 = cv2.imread('images/manowar_logo.png', cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread('images/manowar_single.jpg', cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create()
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = bf.match(des1, des2)
matches = sorted(matches, key=lambda x: x.distance)

img3 = cv2.drawMatches(img1, kp1, img2, kp2, matches[:40], img2, flags=2)
plt.imshow(img3), plt.show()
Let’snowexaminethiscodestepbystep.
Aftertheusualimports,weloadtwoimages(thequeryimageandthetrainingimage).
Notethatyouhaveprobablyseentheloadingofimageswithasecondparameterwiththevalueof0beingpassed.Thisisbecausecv2.imreadtakesasecondparameterthatcanbeoneofthefollowingflags:
IMREAD_ANYCOLOR=4
IMREAD_ANYDEPTH=2
IMREAD_COLOR=1
IMREAD_GRAYSCALE=0
IMREAD_LOAD_GDAL=8
IMREAD_UNCHANGED=-1
Asyoucansee,cv2.IMREAD_GRAYSCALEisequalto0,soyoucanpasstheflagitselforitsvalue;theyarethesamething.
Thisistheimagewe’veloaded:
Thisisanotherimagethatwe’veloaded:
Now, we proceed to creating the ORB feature detector and descriptor:

orb = cv2.ORB_create()
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

In a similar fashion to what we did with SIFT and SURF, we detect and compute the keypoints and descriptors for both images.

The theory at this point is pretty simple: iterate through the descriptors and determine whether they are a match or not, then calculate the quality of this match (distance) and sort the matches, so that we can display the top n matches with a degree of confidence that they are, in fact, matching features on both images.

BFMatcher, as described in Brute-Force matching, does this for us:

bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = bf.match(des1, des2)
matches = sorted(matches, key=lambda x: x.distance)

At this stage, we already have all the information we need, but as computer vision enthusiasts, we place quite a bit of importance on visually representing data, so let's draw these matches in a matplotlib chart:

img3 = cv2.drawMatches(img1, kp1, img2, kp2, matches[:40], img2, flags=2)
plt.imshow(img3), plt.show()

The result is as follows:
Using K-Nearest Neighbors matching

There are a number of algorithms that can be used to detect matches so that we can draw them. One of them is K-Nearest Neighbors (KNN). Using different algorithms for different tasks can be really beneficial, because each algorithm has strengths and weaknesses. Some may be more accurate than others, some may be faster or less computationally expensive, so it's up to you to decide which one to use, depending on the task at hand.

For example, if you have hardware constraints, you may choose an algorithm that is less costly. If you're developing a real-time application, you may choose the fastest algorithm, regardless of how heavy it is on the processor or memory usage.

Among all the machine learning algorithms, KNN is probably the simplest, and while the theory behind it is interesting, it is well out of the scope of this book. Instead, we will simply show you how to use KNN in your application, which is not very different from the preceding example.

Crucially, the two points where the script differs, to switch to KNN, are in the way we calculate matches with the Brute-Force matcher and in the way we draw these matches. The preceding example, edited to use KNN, looks like this:
import numpy as np
import cv2
from matplotlib import pyplot as plt

img1 = cv2.imread('images/manowar_logo.png', 0)
img2 = cv2.imread('images/manowar_single.jpg', 0)

orb = cv2.ORB_create()
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = bf.knnMatch(des1, des2, k=2)

img3 = cv2.drawMatchesKnn(img1, kp1, img2, kp2, matches, img2, flags=2)
plt.imshow(img3), plt.show()
The final result is somewhat similar to the previous one, so what is the difference between match and knnMatch? The difference is that match returns only the best match for each descriptor, while knnMatch returns the k best matches, giving the developer the option to further manipulate the matches obtained with knnMatch.

For example, you could iterate through the matches and apply a ratio test so that you can filter out matches that do not satisfy a user-defined condition.
FLANN-based matching

Finally, we are going to take a look at the Fast Library for Approximate Nearest Neighbors (FLANN). The official Internet home of FLANN is at http://www.cs.ubc.ca/research/flann/.

Like ORB, FLANN has a more permissive license than SIFT or SURF, so you can freely use it in your project. Quoting the website of FLANN,

“FLANN is a library for performing fast approximate nearest neighbor searches in high dimensional spaces. It contains a collection of algorithms we found to work best for nearest neighbor search and a system for automatically choosing the best algorithm and optimum parameters depending on the dataset.

FLANN is written in C++ and contains bindings for the following languages: C, MATLAB and Python.”

In other words, FLANN possesses an internal mechanism that attempts to employ the best algorithm to process a dataset depending on the data itself. FLANN has been proven to be 10 times faster than other nearest neighbor search software.

FLANN is even available on GitHub at https://github.com/mariusmuja/flann. In my experience, I've found FLANN-based matching to be very accurate and fast, as well as friendly to use.

Let's look at an example of FLANN-based feature matching:
import numpy as np
import cv2
from matplotlib import pyplot as plt

queryImage = cv2.imread('images/bathory_album.jpg', 0)
trainingImage = cv2.imread('images/vinyls.jpg', 0)

# create SIFT and detect/compute
sift = cv2.xfeatures2d.SIFT_create()
kp1, des1 = sift.detectAndCompute(queryImage, None)
kp2, des2 = sift.detectAndCompute(trainingImage, None)

# FLANN matcher parameters
FLANN_INDEX_KDTREE = 1
indexParams = dict(algorithm=FLANN_INDEX_KDTREE, trees=5)
searchParams = dict(checks=50)  # or pass an empty dictionary

flann = cv2.FlannBasedMatcher(indexParams, searchParams)
matches = flann.knnMatch(des1, des2, k=2)

# prepare an empty mask to draw good matches
matchesMask = [[0, 0] for i in xrange(len(matches))]

# David G. Lowe's ratio test, populate the mask
for i, (m, n) in enumerate(matches):
    if m.distance < 0.7 * n.distance:
        matchesMask[i] = [1, 0]

drawParams = dict(matchColor=(0, 255, 0),
    singlePointColor=(255, 0, 0),
    matchesMask=matchesMask,
    flags=0)

resultImage = cv2.drawMatchesKnn(queryImage, kp1, trainingImage, kp2, matches, None, **drawParams)
plt.imshow(resultImage), plt.show()
Some parts of the preceding script will be familiar to you at this stage (import of modules, image loading, and creation of a SIFT object).

Note
The interesting part is the declaration of the FLANN matcher, which follows the documentation at http://www.cs.ubc.ca/~mariusm/uploads/FLANN/flann_manual-1.6.pdf.

We find that the FLANN matcher takes two parameters: an indexParams object and a searchParams object. These parameters, passed in a dictionary form in Python (and a struct in C++), determine the behavior of the index and search objects used internally by FLANN to compute the matches.

In this case, we could have chosen between LinearIndex, KDTreeIndex, KMeansIndex, CompositeIndex, and AutotunedIndex, and we chose KDTreeIndex. Why? Because it's a simple enough index to configure (it only requires the user to specify the number of kd-trees to be built; a good value is between 1 and 16) and clever enough (the kd-trees are searched in parallel). The searchParams dictionary only contains one field (checks) that specifies the number of times an index tree should be traversed. The higher the value, the longer it takes to compute the matching, but it will also be more accurate.

In reality, a lot depends on the input that you feed the program with. I've found that 5 kd-trees and 50 checks always yield a respectably accurate result, while only taking a short time to complete.
After the creation of the FLANN matcher and the matches array, matches are then filtered according to the test described by Lowe in his paper, Distinctive Image Features from Scale-Invariant Keypoints, available at https://www.cs.ubc.ca/~lowe/papers/ijcv04.pdf.

In its chapter, Application to object recognition, Lowe explains that not all matches are “good” ones, and that filtering according to an arbitrary threshold would not yield good results all the time. Instead, Dr. Lowe explains,

“The probability that a match is correct can be determined by taking the ratio of distance from the closest neighbor to the distance of the second closest.”

In the preceding example, discarding any match for which this ratio of distances is greater than 0.7 results in just a few good matches being filtered out, while getting rid of around 90 percent of false matches.
Let’sunveiltheresultofapracticalexampleofFLANN.ThisisthequeryimagethatI’vefedthescript:
Thisisthetrainingimage:
Here,youmaynoticethattheimagecontainsthequeryimageatposition(5,3)ofthisgrid.
ThisistheFLANNprocessedresult:
Aperfectmatch!!
FLANN matching with homography

First of all, what is homography? Let's read a definition from the Internet:

“A relation between two figures, such that to any point of the one corresponds one and but one point in the other, and vice versa. Thus, a tangent line rolling on a circle cuts two fixed tangents of the circle in two sets of points that are homographic.”

If you, like me, are none the wiser from the preceding definition, you will probably find this explanation a bit clearer: homography is the condition in which two figures correspond to each other when one is a perspective distortion of the other.

Unlike all the previous examples, let's first take a look at what we want to achieve so that we can fully understand what homography is. Then, we'll go through the code. Here's the final result:

As you can see from the screenshot, we took a subject on the left, correctly identified it in the image on the right-hand side, drew matching lines between keypoints, and even drew a white border showing the perspective deformation of the seed subject in the right-hand side of the image:
import numpy as np
import cv2
from matplotlib import pyplot as plt

MIN_MATCH_COUNT = 10

img1 = cv2.imread('images/bb.jpg', 0)
img2 = cv2.imread('images/color2_small.jpg', 0)

sift = cv2.xfeatures2d.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

FLANN_INDEX_KDTREE = 1
index_params = dict(algorithm=FLANN_INDEX_KDTREE, trees=5)
search_params = dict(checks=50)

flann = cv2.FlannBasedMatcher(index_params, search_params)
matches = flann.knnMatch(des1, des2, k=2)

# store all the good matches as per Lowe's ratio test.
good = []
for m, n in matches:
    if m.distance < 0.7 * n.distance:
        good.append(m)

if len(good) > MIN_MATCH_COUNT:
    src_pts = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst_pts = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)

    M, mask = cv2.findHomography(src_pts, dst_pts, cv2.RANSAC, 5.0)
    matchesMask = mask.ravel().tolist()

    h, w = img1.shape
    pts = np.float32([[0, 0], [0, h - 1], [w - 1, h - 1], [w - 1, 0]]).reshape(-1, 1, 2)
    dst = cv2.perspectiveTransform(pts, M)

    img2 = cv2.polylines(img2, [np.int32(dst)], True, 255, 3, cv2.LINE_AA)
else:
    print "Not enough matches are found - %d/%d" % (len(good), MIN_MATCH_COUNT)
    matchesMask = None

draw_params = dict(matchColor=(0, 255, 0),  # draw matches in green color
    singlePointColor=None,
    matchesMask=matchesMask,  # draw only inliers
    flags=2)

img3 = cv2.drawMatches(img1, kp1, img2, kp2, good, None, **draw_params)
plt.imshow(img3, 'gray'), plt.show()
When compared to the previous FLANN-based matching example, the only difference (and this is where all the action happens) is in the if block.

Here's what happens in this code step by step: firstly, we make sure that we have at least a certain number of good matches (the minimum required to compute a homography is four), which we will arbitrarily set at 10 (in real life, you would probably use a higher value than this):
if len(good) > MIN_MATCH_COUNT:

Then, we find the keypoints in the original image and the training image:

src_pts = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
dst_pts = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)

Now, we find the homography:

M, mask = cv2.findHomography(src_pts, dst_pts, cv2.RANSAC, 5.0)
matchesMask = mask.ravel().tolist()

Note that we create matchesMask, which will be used in the final drawing of the matches, so that only points lying within the homography will have matching lines drawn.

At this stage, we simply have to calculate the perspective distortion of the original object into the second picture so that we can draw the border:

h, w = img1.shape
pts = np.float32([[0, 0], [0, h - 1], [w - 1, h - 1], [w - 1, 0]]).reshape(-1, 1, 2)
dst = cv2.perspectiveTransform(pts, M)
img2 = cv2.polylines(img2, [np.int32(dst)], True, 255, 3, cv2.LINE_AA)

And we then proceed to draw as per all our previous examples.
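For intuition, here is a plain NumPy sketch of the point mapping that cv2.perspectiveTransform performs: each point is lifted to homogeneous coordinates, multiplied by the 3x3 matrix, and divided by its third component (the matrix below is an illustrative homography that merely translates by (10, 20), not one computed from real matches):

```python
import numpy as np

def apply_homography(H, points):
    """Map 2D points through a 3x3 homography, with perspective divide."""
    pts = np.asarray(points, dtype=np.float64)
    homogeneous = np.hstack([pts, np.ones((len(pts), 1))])  # (x, y, 1)
    mapped = homogeneous @ H.T
    return mapped[:, :2] / mapped[:, 2:3]  # divide by w

# Illustrative homography: pure translation by (10, 20).
H = np.array([[1.0, 0.0, 10.0],
              [0.0, 1.0, 20.0],
              [0.0, 0.0, 1.0]])

corners = [[0, 0], [0, 99], [99, 99], [99, 0]]
print(apply_homography(H, corners))  # each corner shifted by (10, 20)
```

With a real homography from findHomography, the third row is generally not (0, 0, 1), and the perspective divide is what produces the skewed white quadrilateral in the screenshot above.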
A sample application – tattoo forensics

Let's conclude this chapter with a real-life (or kind of) example. Imagine that you're working for the Gotham forensics department and you need to identify a tattoo. You have the original picture of the tattoo (imagine this coming from CCTV footage) belonging to a criminal, but you don't know the identity of the person. However, you possess a database of tattoos, indexed with the name of the person to whom the tattoo belongs.

So, let's divide the task in two parts: save image descriptors to files first, and then scan these for matches against the picture we are using as a query image.

Saving image descriptors to file

The first thing we will do is save image descriptors to an external file. This is so that we don't have to recreate the descriptors every time we want to scan two images for matches and homography.

In our application, we will scan a folder for images and create the corresponding descriptor files so that we have them readily available for future searches.

To create descriptors and save them to a file, we will use a process we have used a number of times in this chapter, namely load an image, create a feature detector, detect, and compute:
# generate_descriptors.py
import cv2
import numpy as np
from os import walk
from os.path import join
import sys

def create_descriptors(folder):
    files = []
    for (dirpath, dirnames, filenames) in walk(folder):
        files.extend(filenames)
    for f in files:
        save_descriptor(folder, f, cv2.xfeatures2d.SIFT_create())

def save_descriptor(folder, image_path, feature_detector):
    img = cv2.imread(join(folder, image_path), 0)
    keypoints, descriptors = feature_detector.detectAndCompute(img, None)
    descriptor_file = image_path.replace("jpg", "npy")
    np.save(join(folder, descriptor_file), descriptors)

dir = sys.argv[1]
create_descriptors(dir)

In this script, we pass the folder name where all our images are contained, and then create all the descriptor files in the same folder.

NumPy has a very handy save() utility, which dumps array data into a file in an optimized way. To generate the descriptors in the folder containing your script, run this command:

> python generate_descriptors.py <folder containing images>
Note that cPickle/pickle are more popular libraries for Python object serialization. However, in this particular context, we are trying to limit ourselves to the usage of OpenCV and Python with NumPy and SciPy.
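As a quick illustration of the save/load round trip our application relies on (the file name and the tiny stand-in descriptor array are made up for the example):

```python
import numpy as np
import os
import tempfile

# A stand-in for a descriptor array: 3 keypoints x 4-dimensional features.
descriptors = np.arange(12, dtype=np.float32).reshape(3, 4)

path = os.path.join(tempfile.mkdtemp(), "sample.npy")
np.save(path, descriptors)   # writes the array to an .npy file
restored = np.load(path)     # reads it back as an ndarray

print(np.array_equal(descriptors, restored))  # True
```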
Scanning for matches

Now that we have descriptors saved to files, all we need to do is to repeat the homography process on all the descriptors and find a potential match to our query image.

This is the process we will put in place:

Loading a query image and creating a descriptor for it (tattoo_seed.jpg)
Scanning the folder with descriptors
For each descriptor, computing a FLANN-based match
If the number of matches is beyond an arbitrary threshold, including the file in the list of potential culprits (remember, we're investigating a crime)
Of all the culprits, electing the one with the highest number of matches as the potential suspect

Let's inspect the code to achieve this:
from os.path import join
from os import walk
import numpy as np
import cv2
from sys import argv

# create an array of filenames
folder = argv[1]
query = cv2.imread(join(folder, "tattoo_seed.jpg"), 0)

# create files, images, descriptors globals
files = []
images = []
descriptors = []
for (dirpath, dirnames, filenames) in walk(folder):
    files.extend(filenames)
for f in files:
    if f.endswith("npy") and f != "tattoo_seed.npy":
        descriptors.append(f)
print descriptors

# create the sift detector
sift = cv2.xfeatures2d.SIFT_create()
query_kp, query_ds = sift.detectAndCompute(query, None)

# create FLANN matcher
FLANN_INDEX_KDTREE = 1
index_params = dict(algorithm=FLANN_INDEX_KDTREE, trees=5)
search_params = dict(checks=50)
flann = cv2.FlannBasedMatcher(index_params, search_params)

# minimum number of matches
MIN_MATCH_COUNT = 10

potential_culprits = {}

print ">> Initiating picture scan…"
for d in descriptors:
    print "---------- analyzing %s for matches ------------" % d
    matches = flann.knnMatch(query_ds, np.load(join(folder, d)), k=2)
    good = []
    for m, n in matches:
        if m.distance < 0.7 * n.distance:
            good.append(m)
    if len(good) > MIN_MATCH_COUNT:
        print "%s is a match! (%d)" % (d, len(good))
    else:
        print "%s is not a match" % d
    potential_culprits[d] = len(good)

max_matches = None
potential_suspect = None
for culprit, matches in potential_culprits.iteritems():
    if max_matches == None or matches > max_matches:
        max_matches = matches
        potential_suspect = culprit

print "potential suspect is %s" % potential_suspect.replace("npy", "").upper()
I saved this script as scan_for_matches.py. The only element of novelty in this script is the use of numpy.load(filename), which loads an npy file into an np.array.

Running the script produces the following output:

>> Initiating picture scan…
---------- analyzing posion-ivy.npy for matches ------------
posion-ivy.npy is not a match
---------- analyzing bane.npy for matches ------------
bane.npy is not a match
---------- analyzing two-face.npy for matches ------------
two-face.npy is not a match
---------- analyzing riddler.npy for matches ------------
riddler.npy is not a match
---------- analyzing penguin.npy for matches ------------
penguin.npy is not a match
---------- analyzing dr-hurt.npy for matches ------------
dr-hurt.npy is a match! (298)
---------- analyzing hush.npy for matches ------------
hush.npy is a match! (301)
potential suspect is HUSH.

If we were to represent this graphically, this is what we would see:
Summary

In this chapter, we learned about detecting features in images and extracting them into descriptors. We explored a number of algorithms available in OpenCV to accomplish this task, and then applied them to real-life scenarios to understand a real-world application of the concepts we explored.

We are now familiar with the concept of detecting features in an image (or a video frame), which is a good foundation for the next chapter.
Chapter 7. Detecting and Recognizing Objects

This chapter will introduce the concept of detecting and recognizing objects, which is one of the most common challenges in computer vision. You've come this far in the book, so at this stage you're wondering how far you are from mounting a computer in your car that will give you information about cars and people surrounding you through the use of a camera. Well, you're not too far from your goal, actually.

In this chapter, we will expand on the concept of object detection, which we initially explored when talking about recognizing faces, and adapt it to all sorts of real-life objects, not just faces.
Object detection and recognition techniques

We made a distinction in Chapter 5, Detecting and Recognizing Faces, which we'll reiterate for clarity: detecting an object is the ability of a program to determine if a certain region of an image contains an unidentified object, and recognizing is the ability of a program to identify this object. Recognizing normally only occurs in areas of interest where an object has been detected; for example, we have attempted to recognize faces in the areas of an image that contained a face in the first place.

When it comes to recognizing and detecting objects, there are a number of techniques used in computer vision, which we'll be examining:

Histogram of Oriented Gradients
Image pyramids
Sliding windows

Unlike feature detection algorithms, these are not mutually exclusive techniques; rather, they are complementary. You can compute a Histogram of Oriented Gradients (HOG) while applying the sliding windows technique.

So, let's take a look at HOG first and understand what it is.
HOG descriptors

HOG is a feature descriptor, so it belongs to the same family of algorithms as SIFT, SURF, and ORB.

It is used in image and video processing to detect objects. Its internal mechanism is really clever; an image is divided into portions, and a gradient for each portion is calculated. We observed a similar approach when we talked about face recognition through LBPH.

HOG, however, calculates histograms that are not based on color values; rather, they are based on gradients. As HOG is a feature descriptor, it is capable of delivering the type of information that is vital for feature matching and object detection/recognition.

Before diving into the technical details of how HOG works, let's first take a look at how HOG sees the world; here is an image of a truck:

This is its HOG version:

You can easily recognize the wheels and the main structure of the vehicle. So, what is HOG seeing? First of all, you can see how the image is divided into cells; these are 16x16 pixel cells. Each cell contains a visual representation of the calculated gradients of color in eight directions (N, NW, W, SW, S, SE, E, and NE).

These eight values contained in each cell are the famous histograms. Therefore, a single cell gets a unique signature, which you can mentally visualize to be somewhat like this:
Theextrapolationofhistogramsintodescriptorsisquiteacomplexprocess.First,localhistogramsforeachcellarecalculated.Thecellsaregroupedintolargerregionscalledblocks.Theseblockscanbemadeofanynumberofcells,butDalalandTriggsfoundthat2x2cellblocksyieldedthebestresultswhenperformingpeopledetection.Ablock-widevectoriscreatedsothatitcanbenormalized,accountingforvariationsinilluminationandshadowing(asinglecellistoosmallaregiontodetectsuchvariations).Thisimprovestheaccuracyofdetectionasitreducestheilluminationandshadowingdifferencebetweenthesampleandtheblockbeingexamined.
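To make the idea of per-cell gradient histograms concrete, here is a minimal NumPy sketch that bins the gradient orientations of a single 16x16 cell into eight directions, weighted by gradient magnitude. This is an illustration of the principle only, not OpenCV's actual HOG implementation (which adds the block grouping and normalization described above):

```python
import numpy as np

def cell_histogram(cell, bins=8):
    """Bin the gradient orientations of one cell into `bins` directions,
    weighting each pixel's vote by its gradient magnitude."""
    # Finite-difference gradients along y and x
    gy, gx = np.gradient(cell.astype(float))
    magnitude = np.hypot(gx, gy)
    # Orientation in [0, 2*pi)
    angle = np.arctan2(gy, gx) % (2 * np.pi)
    # Assign each pixel to one of `bins` equal angular sectors
    bin_idx = (angle / (2 * np.pi) * bins).astype(int) % bins
    hist = np.zeros(bins)
    for b in range(bins):
        hist[b] = magnitude[bin_idx == b].sum()
    return hist

# A cell with a purely horizontal intensity ramp: all of the gradient
# energy falls into the bin for the "east" direction (bin 0).
cell = np.tile(np.arange(16), (16, 1))
hist = cell_histogram(cell)
print(hist.argmax())
```

A real HOG implementation typically uses unsigned orientations and interpolates votes between neighboring bins, but the cell signature it produces is this same kind of eight-value histogram.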
Simply comparing cells in two images would not work unless the images are identical (both in terms of size and data).

There are two main problems to resolve:

Location
Scale
The scale issue

Imagine, for example, that your sample was a detail (say, a bike) extrapolated from a larger image, and you're trying to compare the two pictures. You would not obtain the same gradient signatures, and the detection would fail (even though the bike is in both pictures).

The location issue

Once we've resolved the scale problem, we have another obstacle in our path: a potentially detectable object can be anywhere in the image, so we need to scan the entire image in portions to make sure we can identify areas of interest and, within these areas, try to detect objects. Even if a sample image and the object in the image are of identical size, there needs to be a way to instruct OpenCV to locate this object. So, the rest of the image is discarded and a comparison is made on potentially matching regions.

To obviate these problems, we need to familiarize ourselves with the concepts of image pyramids and sliding windows.
Image pyramid

Many of the algorithms used in computer vision utilize a concept called a pyramid.

An image pyramid is a multiscale representation of an image. This diagram should help you understand the concept:

A multiscale representation of an image, or an image pyramid, helps you resolve the problem of detecting objects at different scales. The importance of this concept is easily explained through a real-life hard fact: it is extremely unlikely that an object will appear in an image at the exact scale at which it appeared in our sample image.

Moreover, you will learn that object classifiers (utilities that allow you to detect objects in OpenCV) need training, and this training is provided through image databases made up of positive matches and negative matches. Among the positives, it is again unlikely that the object we want to identify will appear at the same scale throughout the training dataset.
We've got it, Joe. We need to take scale out of the equation, so now let's examine how an image pyramid is built.

An image pyramid is built through the following process:

1. Take an image.
2. Resize (downscale) the image using an arbitrary scale factor.
3. Smoothen the image (using Gaussian blurring).
4. If the image is larger than an arbitrary minimum size, repeat the process from step 1.
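As a quick illustration of how these steps shrink the image, the following sketch (pure Python, no OpenCV, and omitting the Gaussian smoothing step) computes the sequence of sizes a pyramid would produce for a given scale factor and minimum size:

```python
def pyramid_sizes(width, height, scale=1.5, min_size=(200, 80)):
    """Return the (width, height) of each pyramid level, stopping
    once either dimension drops below the minimum size."""
    sizes = [(width, height)]
    while True:
        # Downscale by the scale factor, truncating to whole pixels
        width, height = int(width / scale), int(height / scale)
        if width < min_size[0] or height < min_size[1]:
            break
        sizes.append((width, height))
    return sizes

# A 1000x600 image with a 1.5 scale factor yields four usable levels.
levels = pyramid_sizes(1000, 600)
print(levels)  # [(1000, 600), (666, 400), (444, 266), (296, 177)]
```

A larger scale factor means fewer levels (and faster scanning); we will revisit this trade-off when discussing the scaleFactor parameter next.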
Despite exploring image pyramids, scale ratios, and minimum sizes only at this stage of the book, you've already dealt with them. If you recall Chapter 5, Detecting and Recognizing Faces, we used the detectMultiScale method of the CascadeClassifier object.

Straight away, detectMultiScale doesn't sound so obscure anymore; in fact, it has become self-explanatory. The cascade classifier object attempts to detect an object at different scales of an input image. The second piece of information that should become much clearer is the scaleFactor parameter of the detectMultiScale() method. This parameter represents the ratio at which the image will be resampled to a smaller size at each step of the pyramid.

The smaller the scaleFactor parameter, the more layers in the pyramid, and the slower and more computationally intensive the operation will be, although (to an extent) the more accurate the results.
So, by now, you should have an understanding of what an image pyramid is and why it is used in computer vision. Let's now move on to sliding windows.

Sliding windows
Sliding windows is a technique used in computer vision that consists of examining shifting portions of an image (sliding windows) and running detection on those portions using image pyramids. This is done so that an object can be detected at multiple scales.

Sliding windows resolves the location issue by scanning smaller regions of a larger image, and then repeating the scan at different scales of the same image.

With this technique, each image is decomposed into portions, which allows us to discard portions that are unlikely to contain objects, while the remaining portions are classified.

There is one problem that emerges with this approach, though: overlapping regions.

Let's expand a little bit on this concept to clarify the nature of the problem. Say you're running face detection on an image and are using sliding windows.

Each window slides over a few pixels at a time, which means that a sliding window can happen to be a positive match for the same face in four different positions. Naturally, we don't want to report four matches, but only one; furthermore, we're not interested in every portion of the image with a good score, but simply in the portion with the highest score.

Here's where non-maximum suppression comes into play: given a set of overlapping regions, we can suppress all the regions that are not classified with the maximum score.
Non-maximum (or non-maxima) suppression

Non-maximum (or non-maxima) suppression is a technique that suppresses all the results that relate to the same area of an image but are not the maximum score for that area. Similarly colocated windows tend to have high scores and significant overlapping areas, but we are only interested in the window with the best result, so we discard the overlapping windows with lower scores.

When examining an image with sliding windows, you want to make sure to retain the best window out of a bunch of windows that all overlap around the same subject.

To do this, you determine that all the windows with more than a threshold, x, in common will be thrown into the non-maximum suppression operation.
This is quite complex, but it's also not the end of the process. Remember the image pyramid? We're scanning the image at smaller scales iteratively to make sure we detect objects at different scales.

This means that you will obtain a series of windows at different scales; you then compute the size of a window obtained at a smaller scale as if it were detected at the original scale, and finally throw this window into the original mix.
It does sound a bit complex. Thankfully, we're not the first to come across this problem, which has been resolved in several ways. The fastest algorithm in my experience was implemented by Dr. Tomasz Malisiewicz at http://www.computervisionblog.com/2011/08/blazing-fast-nmsm-from-exemplar-svm.html. The example is in MATLAB, but in the application example, we will obviously use a Python version of it.

The general approach behind non-maximum suppression is as follows:

1. Once an image pyramid has been constructed, scan the image with the sliding window approach for object detection.
2. Collect all the current windows that have returned a positive result (beyond a certain arbitrary threshold), and take the window, W, with the highest response.
3. Eliminate all windows that overlap W significantly.
4. Move on to the next window with the highest response and repeat the process for the current scale.
When this process is complete, move up to the next scale in the image pyramid and repeat the preceding process. To make sure windows are correctly represented at the end of the entire non-maximum suppression process, be sure to compute the window size in relation to the original size of the image (for example, if you detect a window at 50 percent scale of the original size in the pyramid, the detected window will actually be four times its size in the original image).

At the end of this process, you will have a set of maximum-scored windows. Optionally, you can check for windows that are entirely contained in other windows (as we do for the people detection example in this chapter) and eliminate those.

Now, how do we determine the score of a window? We need a classification system that determines whether a certain feature is present or not, with a confidence score for this classification. This is where support vector machines (SVM) come into play.
Support vector machines

Explaining in detail what an SVM is and does is beyond the scope of this book, but suffice it to say, an SVM is an algorithm that, given labeled training data, enables the classification of this data by outputting an optimal hyperplane, which, in plain English, is the optimal plane that divides differently classified data. A visual representation will help you understand this:

Why is it so helpful in computer vision, and object detection in particular? Because finding the optimal division line between pixels that belong to an object and those that don't is a vital component of object detection.

The SVM model has been around since the early 1960s; however, the current form of its implementation originates in a 1995 paper by Corinna Cortes and Vladimir Vapnik, which is available at http://link.springer.com/article/10.1007/BF00994018.
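To make the idea of a separating hyperplane concrete, here is a tiny NumPy sketch (not OpenCV's SVM, which we'll use later) that classifies 2D points by which side of a fixed hyperplane w·x + b = 0 they fall on. An SVM's job is to learn the optimal w and b from labeled data; here we fix them by hand purely to illustrate how the resulting classifier works:

```python
import numpy as np

# A hyperplane in 2D is just a line: w . x + b = 0.
# These values are chosen by hand for illustration; an SVM would
# learn them from the labeled training data.
w = np.array([1.0, -1.0])
b = 0.0

def classify(points):
    """Return +1 for points on one side of the hyperplane, -1 for the other."""
    return np.sign(points @ w + b).astype(int)

points = np.array([[3.0, 1.0],   # below the line y = x  -> +1
                   [1.0, 3.0],   # above the line        -> -1
                   [5.0, 2.0]])  # below                 -> +1
print(classify(points))  # [ 1 -1  1]
```

Everything the SVM does at prediction time reduces to this sign test (in a possibly transformed feature space, when a kernel is used).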
Now that we have a good understanding of the concepts involved in object detection, we can start looking at a few examples. We will start from built-in functions and evolve into training our own custom object detectors.

People detection

OpenCV comes with HOGDescriptor, which performs people detection.

Here's a pretty straightforward example:
import cv2
import numpy as np

def is_inside(o, i):
    ox, oy, ow, oh = o
    ix, iy, iw, ih = i
    return ox > ix and oy > iy and ox + ow < ix + iw and oy + oh < iy + ih

def draw_person(image, person):
    x, y, w, h = person
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 255), 2)

img = cv2.imread("../images/people.jpg")
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

found, w = hog.detectMultiScale(img)

found_filtered = []
for ri, r in enumerate(found):
    for qi, q in enumerate(found):
        if ri != qi and is_inside(r, q):
            break
    else:
        found_filtered.append(r)

for person in found_filtered:
    draw_person(img, person)

cv2.imshow("people detection", img)
cv2.waitKey(0)
cv2.destroyAllWindows()
After the usual imports, we define two very simple functions, is_inside and draw_person, which perform two minimal tasks: determining whether a rectangle is fully contained in another rectangle, and drawing rectangles around detected people.

We then load the image and create a HOGDescriptor through a very simple and self-explanatory line of code:

cv2.HOGDescriptor()

After this, we specify that the HOGDescriptor will use a default people detector.

This is done through the setSVMDetector() method, which, after our introduction to SVMs, sounds less obscure than it would have otherwise.
Next, we apply detectMultiScale on the loaded image. Interestingly, unlike the face detection algorithms, we don't need to convert the original image to grayscale before applying any form of object detection.

The detection method returns an array of rectangles, which would be a good enough source of information for us to start drawing shapes on the image. If we did this, however, you would notice something strange: some of the rectangles are entirely contained in other rectangles. This clearly indicates an error in detection, and we can safely assume that a rectangle entirely inside another one can be discarded.

This is precisely the reason why we defined the is_inside function, and why we iterate through the result of the detection to discard false positives.

If you run the script yourself, you will see rectangles drawn around the people in the image.
Creating and training an object detector

Using built-in features makes it easy to come up with a quick prototype for an application, and we're all very grateful to the OpenCV developers for making great features, such as face detection and people detection, readily available (truly, we are).

However, whether you are a hobbyist or a computer vision professional, it's unlikely that you will only deal with people and faces.

Moreover, if you're like me, you wonder how the people detector was created in the first place and whether you can improve it. Furthermore, you may also wonder whether you can apply the same concepts to detect the most diverse types of objects, ranging from cars to goblins.

In an enterprise environment, you may have to deal with very specific detection targets, such as registration plates, book covers, or whatever your company may deal with.

So, the question is, how do we come up with our own classifiers?

The answer lies in SVMs and the bag-of-words technique.

We've already talked about HOG and SVMs, so let's take a closer look at bag-of-words.
Bag-of-words

Bag-of-words (BOW) is a concept that was not initially intended for computer vision; rather, we use an evolved version of this concept in the context of computer vision. So, let's first talk about its basic version, which, as you may have guessed, originally belongs to the field of language analysis and information retrieval.

BOW is the technique by which we assign a count weight to each word in a series of documents; we then represent these documents as vectors of these counts. Let's look at an example:

Document 1: I like OpenCV and I like Python
Document 2: I like C++ and Python
Document 3: I don't like artichokes

These three documents allow us to build a dictionary (or codebook) with these values:
{
    I: 4,
    like: 4,
    OpenCV: 1,
    and: 2,
    Python: 2,
    C++: 1,
    dont: 1,
    artichokes: 1
}
We have eight entries. Let's now represent the original documents using eight-entry vectors, each vector containing all the words in the dictionary, with values representing the count of each term in the document. The vector representation of the preceding three sentences is as follows:

[2, 2, 1, 1, 1, 0, 0, 0]
[1, 1, 0, 1, 1, 1, 0, 0]
[1, 1, 0, 0, 0, 0, 1, 1]
This kind of representation of documents has many effective applications in the real world, such as spam filtering.

These vectors can be conceptualized as a histogram representation of documents, or as a feature (in the same way we extracted features from images in previous chapters), which can be used to train classifiers.
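The document example above can be reproduced in a few lines of plain Python; this sketch builds the codebook and the count vectors exactly as described:

```python
docs = ["I like OpenCV and I like Python",
        "I like C++ and Python",
        "I dont like artichokes"]

# Build the codebook: every distinct word, in order of first appearance.
vocab = []
for doc in docs:
    for word in doc.split():
        if word not in vocab:
            vocab.append(word)

# Represent each document as a vector of word counts over the codebook.
vectors = [[doc.split().count(word) for word in vocab] for doc in docs]
print(vocab)
print(vectors)  # matches the three eight-entry vectors above
```

In the visual variant we are about to meet, "words" become cluster centers of image descriptors, but the counting step is exactly the same.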
Now that we have a grasp of the basic concept of BOW, or bag of visual words (BOVW), let's see how it applies to the world of computer vision.

BOW in computer vision

We are by now familiar with the concept of image features. We've used feature extractors, such as SIFT and SURF, to extract features from images so that we could match these features in another image.

We've also familiarized ourselves with the concept of a codebook, and we know about SVMs, a model that can be fed a set of features, utilizes complex algorithms to classify training data, and can predict the classification of new data.
So, the implementation of a BOW approach will involve the following steps:

1. Take a sample dataset.
2. For each image in the dataset, extract descriptors (with SIFT, SURF, and so on).
3. Add each descriptor to the BOW trainer.
4. Cluster the descriptors into k clusters (okay, this sounds obscure, but bear with me) whose centers (centroids) are our visual words.

At this point, we have a dictionary of visual words ready to be used. As you can imagine, a large dataset will help make our dictionary richer in visual words. Up to an extent, the more words, the better!

After this, we are ready to test our classifier and attempt detection. The good news is that the process is very similar to the one outlined previously: given a test image, we can extract features and quantize them based on their distance to the nearest centroid to form a histogram.

Based on this, we can attempt to recognize visual words and locate them in the image. Here's a visual representation of the BOW process:
This is the point in the chapter when you have built an appetite for a practical example and are raring to code. However, before proceeding, I feel that a quick digression into the theory of k-means clustering is necessary so that you can fully understand how visual words are created, and gain a better understanding of the process of object detection using BOW and SVMs.

The k-means clustering

k-means clustering is a method of vector quantization used to perform data analysis. Given a dataset, k represents the number of clusters into which the dataset is going to be divided. The term "means" refers to the mathematical concept of the mean, which is pretty basic; for the sake of clarity, it's what people commonly refer to as the average. When visually represented, the mean of a cluster is its centroid, or the geometric center of the points in the cluster.

Note

Clustering refers to the grouping of points in a dataset into clusters.
One of the classes we will be using to perform object detection is called BOWKMeansTrainer; by now, you should be able to deduce what the responsibility of this class is. The OpenCV documentation describes it as a:

"kmeans()-based class to train a visual vocabulary using the bag-of-words approach"

Here's a representation of a k-means clustering operation with five clusters:
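To demystify what the trainer's cluster() call will do for us later, here is a minimal NumPy implementation of the k-means iteration (Lloyd's algorithm). OpenCV's BOWKMeansTrainer does the same job on SIFT descriptors; this toy version clusters 2D points so the mechanics are easy to follow (it uses a simple deterministic initialization, whereas real implementations pick random starting centroids):

```python
import numpy as np

def kmeans(points, k, iterations=10):
    """Cluster `points` into k groups; return the k centroids."""
    centroids = points[:k].copy()  # naive deterministic initialization
    for _ in range(iterations):
        # Assign each point to its nearest centroid
        dists = np.linalg.norm(points[:, None] - centroids[None, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its assigned points
        centroids = np.array([points[labels == j].mean(axis=0)
                              for j in range(k)])
    return centroids

# Two well-separated blobs of four points each: the centroids converge
# to the blob means, (0.5, 0.5) and (10.5, 10.5).
pts = np.array([[0, 0], [10, 10], [0, 1], [10, 11],
                [1, 0], [11, 10], [1, 1], [11, 11]], dtype=float)
centers = kmeans(pts, 2)
print(centers)
```

In the BOW setting, each centroid returned by this procedure is one visual word, and each descriptor of a test image is later assigned to its nearest centroid to build the histogram.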
After this long theoretical introduction, we can look at an example and start training our object detector.
Detecting cars

There is virtually no limit to the type of objects you can detect in your images and videos. However, to obtain an acceptable level of accuracy, you need a sufficiently large dataset containing training images that are identical in size.

This would be a time-consuming operation if we were to do it all by ourselves (which is entirely possible).

We can avail of ready-made datasets; there are a number of them freely downloadable from various sources:

The University of Illinois: http://l2r.cs.uiuc.edu/~cogcomp/Data/Car/CarData.tar.gz
Stanford University: http://ai.stanford.edu/~jkrause/cars/car_dataset.html

Note

Note that training images and test images are available in separate files.

I'll be using the UIUC dataset in my example, but feel free to explore the Internet for other types of datasets.

Now, let's take a look at an example:
import cv2
import numpy as np
from os.path import join

datapath = "/home/d3athmast3r/dev/python/CarData/TrainImages/"

def path(cls, i):
    return "%s/%s%d.pgm" % (datapath, cls, i + 1)

pos, neg = "pos-", "neg-"

detect = cv2.xfeatures2d.SIFT_create()
extract = cv2.xfeatures2d.SIFT_create()

flann_params = dict(algorithm = 1, trees = 5)
flann = cv2.FlannBasedMatcher(flann_params, {})

bow_kmeans_trainer = cv2.BOWKMeansTrainer(40)
extract_bow = cv2.BOWImgDescriptorExtractor(extract, flann)

def extract_sift(fn):
    im = cv2.imread(fn, 0)
    return extract.compute(im, detect.detect(im))[1]

for i in range(8):
    bow_kmeans_trainer.add(extract_sift(path(pos, i)))
    bow_kmeans_trainer.add(extract_sift(path(neg, i)))

voc = bow_kmeans_trainer.cluster()
extract_bow.setVocabulary(voc)

def bow_features(fn):
    im = cv2.imread(fn, 0)
    return extract_bow.compute(im, detect.detect(im))

traindata, trainlabels = [], []
for i in range(20):
    traindata.extend(bow_features(path(pos, i))); trainlabels.append(1)
    traindata.extend(bow_features(path(neg, i))); trainlabels.append(-1)

svm = cv2.ml.SVM_create()
svm.train(np.array(traindata), cv2.ml.ROW_SAMPLE, np.array(trainlabels))

def predict(fn):
    f = bow_features(fn)
    p = svm.predict(f)
    print fn, "\t", p[1][0][0]
    return p

car, notcar = "/home/d3athmast3r/dev/python/study/images/car.jpg", "/home/d3athmast3r/dev/python/study/images/bb.jpg"
car_img = cv2.imread(car)
notcar_img = cv2.imread(notcar)
car_predict = predict(car)
not_car_predict = predict(notcar)

font = cv2.FONT_HERSHEY_SIMPLEX

if (car_predict[1][0][0] == 1.0):
    cv2.putText(car_img, 'Car Detected', (10, 30), font, 1, (0, 255, 0), 2, cv2.LINE_AA)

if (not_car_predict[1][0][0] == -1.0):
    cv2.putText(notcar_img, 'Car Not Detected', (10, 30), font, 1, (0, 0, 255), 2, cv2.LINE_AA)

cv2.imshow('BOW + SVM Success', car_img)
cv2.imshow('BOW + SVM Failure', notcar_img)
cv2.waitKey(0)
cv2.destroyAllWindows()
What did we just do? This is quite a lot to assimilate, so let's go through what we've done:

1. First of all, our usual imports are followed by the declaration of the base path of our training images. This will come in handy to avoid rewriting the base path every time we process an image in a particular folder on our computer.

2. After this, we declare a function, path:
def path(cls, i):
    return "%s/%s%d.pgm" % (datapath, cls, i + 1)

pos, neg = "pos-", "neg-"
Note

More on the path function

This function is a utility method: given the name of a class (in our case, we have two classes, pos and neg) and a numerical index, we return the full path to a particular training image. Our car dataset contains images named in the following way: pos-x.pgm and neg-x.pgm, where x is a number.

You will immediately find this function useful when iterating through a range of numbers (say, 20), which will allow you to load all images from pos-1.pgm to pos-20.pgm, and the same goes for the negative class.
3. Next up, we'll create two SIFT instances: one to detect keypoints, the other to extract features:

detect = cv2.xfeatures2d.SIFT_create()
extract = cv2.xfeatures2d.SIFT_create()
4. Whenever you see SIFT involved, you can be pretty sure some feature matching algorithm will be involved too. In our case, we'll create an instance of a FLANN matcher:

flann_params = dict(algorithm = 1, trees = 5)
flann = cv2.FlannBasedMatcher(flann_params, {})
Note

Note that currently, the enum values for FLANN are missing from the Python version of OpenCV 3, so the number 1, which is passed as the algorithm parameter, represents the FLANN_INDEX_KDTREE algorithm. I suspect the final version will be cv2.FLANN_INDEX_KDTREE, which is a little more helpful. Make sure to check the enum values for the correct flags.
5. Next, we'll create the BOW trainer:

bow_kmeans_trainer = cv2.BOWKMeansTrainer(40)

6. This BOW trainer utilizes 40 clusters. After this, we'll initialize the BOW extractor. This is the BOW class that will be fed a vocabulary of visual words and will try to detect them in the test image:

extract_bow = cv2.BOWImgDescriptorExtractor(extract, flann)
7. To extract SIFT features from an image, we build a utility method that takes the path to the image, reads it in grayscale, and returns the descriptors:

def extract_sift(fn):
    im = cv2.imread(fn, 0)
    return extract.compute(im, detect.detect(im))[1]
At this stage, we have everything we need to start training the BOW trainer.

1. Let's read eight images per class (eight positives and eight negatives) from our dataset:

for i in range(8):
    bow_kmeans_trainer.add(extract_sift(path(pos, i)))
    bow_kmeans_trainer.add(extract_sift(path(neg, i)))
2. To create the vocabulary of visual words, we'll call the cluster() method on the trainer, which performs the k-means clustering and returns the said vocabulary. We'll assign this vocabulary to the BOWImgDescriptorExtractor so that it can extract descriptors from test images:

vocabulary = bow_kmeans_trainer.cluster()
extract_bow.setVocabulary(vocabulary)
3. In line with the other utility functions declared in this script, we'll declare a function that takes the path to an image and returns the descriptor as computed by the BOW descriptor extractor:

def bow_features(fn):
    im = cv2.imread(fn, 0)
    return extract_bow.compute(im, detect.detect(im))
4. Let's create two arrays to accommodate the training data and labels, and populate them with the descriptors generated by the BOWImgDescriptorExtractor, associating labels with the positive and negative images we're feeding in (1 stands for a positive match, -1 for a negative one):

traindata, trainlabels = [], []
for i in range(20):
    traindata.extend(bow_features(path(pos, i))); trainlabels.append(1)
    traindata.extend(bow_features(path(neg, i))); trainlabels.append(-1)
5. Now, let's create an instance of an SVM:

svm = cv2.ml.SVM_create()

6. Then, train it by wrapping the training data and labels in NumPy arrays:

svm.train(np.array(traindata), cv2.ml.ROW_SAMPLE, np.array(trainlabels))
We're all set with a trained SVM; all that is left to do is feed the SVM a couple of sample images and see how it behaves.

1. Let's first define another utility method to print the result of our predict method and return it:

def predict(fn):
    f = bow_features(fn)
    p = svm.predict(f)
    print fn, "\t", p[1][0][0]
    return p
2. Let's define two sample image paths and read the images as NumPy arrays:

car, notcar = "/home/d3athmast3r/dev/python/study/images/car.jpg", "/home/d3athmast3r/dev/python/study/images/bb.jpg"
car_img = cv2.imread(car)
notcar_img = cv2.imread(notcar)

3. We'll pass these images to the trained SVM and get the result of the prediction:

car_predict = predict(car)
not_car_predict = predict(notcar)
Naturally, we're hoping that the car image will be detected as a car (the result of predict() should be 1.0), and that the other image will not (the result should be -1.0), so we will only add text to the images if the result is the expected one.

4. At last, we'll present the images on the screen, hoping to see the correct caption on each:
font = cv2.FONT_HERSHEY_SIMPLEX

if (car_predict[1][0][0] == 1.0):
    cv2.putText(car_img, 'Car Detected', (10, 30), font, 1, (0, 255, 0), 2, cv2.LINE_AA)

if (not_car_predict[1][0][0] == -1.0):
    cv2.putText(notcar_img, 'Car Not Detected', (10, 30), font, 1, (0, 0, 255), 2, cv2.LINE_AA)

cv2.imshow('BOW + SVM Success', car_img)
cv2.imshow('BOW + SVM Failure', notcar_img)
cv2.waitKey(0)
cv2.destroyAllWindows()
The preceding operation produces the following result:

It also results in this:
SVM and sliding windows

Having detected an object is an impressive achievement, but now we want to push this to the next level in these ways:

Detecting multiple objects of the same kind in an image
Determining the position of a detected object in an image

To accomplish this, we will use the sliding windows approach. If it's not already clear from the previous explanation of the concept of sliding windows, the rationale behind the adoption of this approach will become more apparent if we take a look at a diagram:

Observe the movement of the block:
1. We take a region of the image, classify it, and then move a predefined step size to the right-hand side. When we reach the rightmost end of the image, we'll reset the x coordinate to 0, move down a step, and repeat the entire process.
2. At each step, we'll perform a classification with the SVM that was trained with BOW.
3. Keep track of all the blocks that have passed the SVM predict test.
4. When you've finished classifying the entire image, scale the image down and repeat the entire sliding windows process.

Continue rescaling and classifying until you get to a minimum size.

This gives you the chance to detect objects in several regions of the image and at different scales.
At this stage, you will have collected important information about the content of the image; however, there's a problem: it's most likely that you will end up with a number of overlapping blocks that give you a positive score. This means that your image may contain one object that gets detected four or five times, and if you were to report the result of the detection, your report would be quite inaccurate. Here's where non-maximum suppression comes into play.
Example – car detection in a scene

We are now ready to apply all the concepts we have learned so far to a real-life example, and create a car detector application that scans an image and draws rectangles around cars.

Let's summarize the process before diving into the code:

1. Obtain a training dataset.
2. Create a BOW trainer and create a visual vocabulary.
3. Train an SVM with the vocabulary.
4. Attempt detection using sliding windows on an image pyramid of a test image.
5. Apply non-maximum suppression to overlapping boxes.
6. Output the result.

Let's also take a look at the structure of the project, as it is a bit more complex than the classic standalone script approach we've adopted until now.

The project structure is as follows:
├── car_detector
│   ├── detector.py
│   ├── __init__.py
│   ├── non_maximum.py
│   ├── pyramid.py
│   └── sliding_window.py
└── car_sliding_windows.py
The main program is in car_sliding_windows.py, and all the utilities are contained in the car_detector folder. As we're using Python 2.7, we'll need an __init__.py file in the folder for it to be detected as a module.

The four files in the car_detector module are as follows:

The SVM training model
The non-maximum suppression function
The image pyramid
The sliding windows function

Let's examine them one by one, starting with the image pyramid:
import cv2

def resize(img, scaleFactor):
    return cv2.resize(img, (int(img.shape[1] * (1 / scaleFactor)),
        int(img.shape[0] * (1 / scaleFactor))), interpolation=cv2.INTER_AREA)

def pyramid(image, scale=1.5, minSize=(200, 80)):
    yield image

    while True:
        image = resize(image, scale)
        if image.shape[0] < minSize[1] or image.shape[1] < minSize[0]:
            break

        yield image
This module contains two function definitions:

resize takes an image and resizes it by a specified factor
pyramid takes an image and returns resized versions of it until the minimum constraints of width and height are reached

Note

You will notice that the image is not returned with the return keyword but with the yield keyword. This is because this function is a so-called generator. If you are not familiar with generators, take a look at https://wiki.python.org/moin/Generators.

This will allow us to obtain resized images to process in our main program.
Next up is the sliding windows function:

def sliding_window(image, stepSize, windowSize):
    for y in xrange(0, image.shape[0], stepSize):
        for x in xrange(0, image.shape[1], stepSize):
            yield (x, y, image[y:y + windowSize[1], x:x + windowSize[0]])
Again, this is a generator. Although a bit deeply nested, this mechanism is very simple: given an image, return a window that moves by an arbitrary step size from the left margin towards the right until the entire width of the image is covered, then goes back to the left margin but down a step, covering the width of the image repeatedly until the bottom-right corner of the image is reached. You can visualize this as the same pattern used for writing on a piece of paper: start from the left margin and reach the right margin, then move on to the next line from the left margin.
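To get a feel for how the generator walks the image, this small standalone sketch (using NumPy and range so it runs on its own) counts the windows produced over a toy "image" and lists their top-left coordinates:

```python
import numpy as np

def sliding_window(image, stepSize, windowSize):
    # Walk top-to-bottom, left-to-right, yielding each window's
    # top-left corner and its pixel block.
    for y in range(0, image.shape[0], stepSize):
        for x in range(0, image.shape[1], stepSize):
            yield (x, y, image[y:y + windowSize[1], x:x + windowSize[0]])

image = np.zeros((40, 60))  # a 40-row by 60-column dummy image
windows = list(sliding_window(image, stepSize=20, windowSize=(20, 20)))
print(len(windows))  # 2 rows of steps x 3 columns = 6
print([(x, y) for x, y, w in windows])
# [(0, 0), (20, 0), (40, 0), (0, 20), (20, 20), (40, 20)]
```

Note that windows straddling the right or bottom edge are silently clipped by NumPy slicing, so a detector built on this generator should skip windows smaller than windowSize.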
The last utility is non-maximum suppression, which looks like this (Malisiewicz/Rosebrock's code):
import numpy as np

def non_max_suppression_fast(boxes, overlapThresh):
    # if there are no boxes, return an empty list
    if len(boxes) == 0:
        return []

    # if the bounding boxes are integers, convert them to floats --
    # this is important since we'll be doing a bunch of divisions
    if boxes.dtype.kind == "i":
        boxes = boxes.astype("float")

    # initialize the list of picked indexes
    pick = []

    # grab the coordinates of the bounding boxes
    x1 = boxes[:, 0]
    y1 = boxes[:, 1]
    x2 = boxes[:, 2]
    y2 = boxes[:, 3]
    scores = boxes[:, 4]

    # compute the area of the bounding boxes and sort the bounding
    # boxes by the score/probability of the bounding box, so that
    # the highest score is the last element
    area = (x2 - x1 + 1) * (y2 - y1 + 1)
    idxs = np.argsort(scores)

    # keep looping while some indexes still remain in the indexes
    # list
    while len(idxs) > 0:
        # grab the last index in the indexes list (the best
        # remaining score) and add it to the list of picked indexes
        last = len(idxs) - 1
        i = idxs[last]
        pick.append(i)

        # find the largest (x, y) coordinates for the start of
        # the bounding box and the smallest (x, y) coordinates
        # for the end of the bounding box
        xx1 = np.maximum(x1[i], x1[idxs[:last]])
        yy1 = np.maximum(y1[i], y1[idxs[:last]])
        xx2 = np.minimum(x2[i], x2[idxs[:last]])
        yy2 = np.minimum(y2[i], y2[idxs[:last]])

        # compute the width and height of the bounding box
        w = np.maximum(0, xx2 - xx1 + 1)
        h = np.maximum(0, yy2 - yy1 + 1)

        # compute the ratio of overlap
        overlap = (w * h) / area[idxs[:last]]

        # delete all indexes from the index list that have
        # an overlap beyond the threshold
        idxs = np.delete(idxs, np.concatenate(([last],
            np.where(overlap > overlapThresh)[0])))

    # return only the bounding boxes that were picked using the
    # integer data type
    return boxes[pick].astype("int")
This function takes a list of rectangles and sorts them by their score. Starting from the box with the highest score, it eliminates all boxes that overlap it beyond a certain threshold, by calculating the area of intersection and determining whether it is greater than that threshold.
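To see the behavior in action, here is a standalone toy run using a compact version of the same suppression logic: three (x1, y1, x2, y2, score) boxes, the first two overlapping heavily, so only the higher-scored of the pair survives alongside the distant third box:

```python
import numpy as np

def nms(boxes, overlapThresh):
    """Compact non-maximum suppression: keep the best-scored box
    of each group of heavily overlapping boxes."""
    boxes = boxes.astype("float")
    x1, y1, x2, y2, scores = boxes.T
    area = (x2 - x1 + 1) * (y2 - y1 + 1)
    idxs = np.argsort(scores)  # ascending; best score is last
    pick = []
    while len(idxs) > 0:
        last = len(idxs) - 1
        i = idxs[last]
        pick.append(i)
        # Intersection of the picked box with every remaining box
        xx1 = np.maximum(x1[i], x1[idxs[:last]])
        yy1 = np.maximum(y1[i], y1[idxs[:last]])
        xx2 = np.minimum(x2[i], x2[idxs[:last]])
        yy2 = np.minimum(y2[i], y2[idxs[:last]])
        w = np.maximum(0, xx2 - xx1 + 1)
        h = np.maximum(0, yy2 - yy1 + 1)
        overlap = (w * h) / area[idxs[:last]]
        # Drop the picked box and everything overlapping it too much
        idxs = np.delete(idxs, np.concatenate(
            ([last], np.where(overlap > overlapThresh)[0])))
    return boxes[pick].astype("int")

boxes = np.array([[10, 10, 50, 50, 0.8],
                  [12, 12, 52, 52, 0.9],
                  [100, 100, 140, 140, 0.7]])
kept = nms(boxes, overlapThresh=0.5)
print(kept[:, 0])  # x1 of the survivors: the 0.9 box and the far box
```

The first two boxes overlap by roughly 90 percent of their area, well beyond the 0.5 threshold, so only the 0.9-scored box is kept; the third box does not overlap anything and survives untouched.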
Examining detector.py

Now, let's examine the heart of this program, which is detector.py. It's a bit long and complex; however, everything should appear much clearer given our newfound familiarity with the concepts of BOW, SVMs, and feature detection/extraction.

Here's the code:
import cv2
import numpy as np

datapath = "/path/to/CarData/TrainImages/"
SAMPLES = 400

def path(cls, i):
    return "%s/%s%d.pgm" % (datapath, cls, i + 1)

def get_flann_matcher():
    flann_params = dict(algorithm = 1, trees = 5)
    return cv2.FlannBasedMatcher(flann_params, {})

def get_bow_extractor(extract, flann):
    return cv2.BOWImgDescriptorExtractor(extract, flann)

def get_extract_detect():
    return cv2.xfeatures2d.SIFT_create(), cv2.xfeatures2d.SIFT_create()

def extract_sift(fn, extractor, detector):
    im = cv2.imread(fn, 0)
    return extractor.compute(im, detector.detect(im))[1]

def bow_features(img, extractor_bow, detector):
    return extractor_bow.compute(img, detector.detect(img))

def car_detector():
    pos, neg = "pos-", "neg-"
    detect, extract = get_extract_detect()
    matcher = get_flann_matcher()
    print "building BOWKMeansTrainer..."
    bow_kmeans_trainer = cv2.BOWKMeansTrainer(1000)
    extract_bow = get_bow_extractor(extract, matcher)

    print "adding features to trainer"
    for i in range(SAMPLES):
        print i
        bow_kmeans_trainer.add(extract_sift(path(pos, i), extract, detect))
        bow_kmeans_trainer.add(extract_sift(path(neg, i), extract, detect))

    voc = bow_kmeans_trainer.cluster()
    extract_bow.setVocabulary(voc)

    traindata, trainlabels = [], []
    print "adding to train data"
    for i in range(SAMPLES):
        print i
        traindata.extend(bow_features(cv2.imread(path(pos, i), 0), extract_bow, detect))
        trainlabels.append(1)
        traindata.extend(bow_features(cv2.imread(path(neg, i), 0), extract_bow, detect))
        trainlabels.append(-1)

    svm = cv2.ml.SVM_create()
    svm.setType(cv2.ml.SVM_C_SVC)
    svm.setGamma(0.5)
    svm.setC(30)
    svm.setKernel(cv2.ml.SVM_RBF)

    svm.train(np.array(traindata), cv2.ml.ROW_SAMPLE, np.array(trainlabels))
    return svm, extract_bow
Let's go through it. First, we'll import our usual modules and then set a path for the training images.

Then, we'll define a number of utility functions:

def path(cls, i):
    return "%s/%s%d.pgm" % (datapath, cls, i + 1)

This function returns the path to an image given a base path and a class name. In our example, we're going to use the neg- and pos- class names, because this is what the training images are called (that is, neg-1.pgm). The last argument is an integer used to compose the final part of the image path.
Next, we'll define a utility function to obtain a FLANN matcher:

def get_flann_matcher():
    flann_params = dict(algorithm = 1, trees = 5)
    return cv2.FlannBasedMatcher(flann_params, {})
Again, note that the integer, 1, passed as the algorithm argument represents FLANN_INDEX_KDTREE.

The next two functions return the SIFT feature detector/extractor and a BOW descriptor extractor:
def get_bow_extractor(extract, flann):
    return cv2.BOWImgDescriptorExtractor(extract, flann)

def get_extract_detect():
    return cv2.xfeatures2d.SIFT_create(), cv2.xfeatures2d.SIFT_create()
The next utility is a function used to return features from an image:

def extract_sift(fn, extractor, detector):
    im = cv2.imread(fn, 0)
    return extractor.compute(im, detector.detect(im))[1]
Note

A SIFT detector detects features, while a SIFT extractor extracts and returns them.

We'll also define a similar utility function to extract the BOW features:

def bow_features(img, extractor_bow, detector):
    return extractor_bow.compute(img, detector.detect(img))
In the main car_detector function, we'll first create the necessary objects used to perform feature detection and extraction:

pos, neg = "pos-", "neg-"
detect, extract = get_extract_detect()
matcher = get_flann_matcher()
bow_kmeans_trainer = cv2.BOWKMeansTrainer(1000)
extract_bow = get_bow_extractor(extract, matcher)
Then, we'll add features taken from the training images to the trainer:

print "adding features to trainer"
for i in range(SAMPLES):
    print i
    bow_kmeans_trainer.add(extract_sift(path(pos, i), extract, detect))
For each class, we'll add a positive image and a negative image to the trainer.

After this, we'll instruct the trainer to cluster the data into k groups.

The clustered data is now our vocabulary of visual words, and we can set the BOWImgDescriptorExtractor class' vocabulary in this way:

vocabulary = bow_kmeans_trainer.cluster()
extract_bow.setVocabulary(vocabulary)
Associating training data with classes
With a visual vocabulary ready, we can now associate training data with classes. In our case, we have two classes: -1 for negative results and 1 for positive ones.
Let's populate two arrays, traindata and trainlabels, containing the extracted features and their corresponding labels. Iterating through the dataset, we can quickly set this up with the following code:
    traindata, trainlabels = [], []
    print "adding to train data"
    for i in range(SAMPLES):
        print i
        traindata.extend(bow_features(cv2.imread(path(pos, i), 0), extract_bow, detect))
        trainlabels.append(1)
        traindata.extend(bow_features(cv2.imread(path(neg, i), 0), extract_bow, detect))
        trainlabels.append(-1)
You will notice that at each cycle, we add one positive and one negative image, and then append a 1 and a -1 value to keep the data synchronized with the labels.
Should you wish to train more classes, you could do that by following this pattern:
    traindata, trainlabels = [], []
    print "adding to train data"
    for i in range(SAMPLES):
        print i
        traindata.extend(bow_features(cv2.imread(path(class1, i), 0), extract_bow, detect))
        trainlabels.append(1)
        traindata.extend(bow_features(cv2.imread(path(class2, i), 0), extract_bow, detect))
        trainlabels.append(2)
        traindata.extend(bow_features(cv2.imread(path(class3, i), 0), extract_bow, detect))
        trainlabels.append(3)
For example, you could train a detector to detect both cars and people, and then perform detection on an image containing both cars and people.
Lastly, we'll train the SVM with the following code:
    svm = cv2.ml.SVM_create()
    svm.setType(cv2.ml.SVM_C_SVC)
    svm.setGamma(0.5)
    svm.setC(30)
    svm.setKernel(cv2.ml.SVM_RBF)
    svm.train(np.array(traindata), cv2.ml.ROW_SAMPLE, np.array(trainlabels))
    return svm, extract_bow
There are two parameters in particular that I'd like to focus your attention on:
C: With this parameter, you can conceptualize the strictness or severity of the classifier. The higher the value, the heavier the penalty on misclassified training samples, so the classifier fits the training data more strictly, at the risk of over-fitting; a lower value yields a smoother decision boundary that tolerates some misclassification, but some positive results may be missed.
Kernel: This parameter determines the nature of the classifier. SVM_LINEAR indicates a linear hyperplane, which, in practical terms, works very well for binary classification (the test sample either belongs to a class or it doesn't), while SVM_RBF (radial basis function) separates data using Gaussian functions, which means that the data is split into several kernels defined by these functions. When training the SVM to classify more than two classes, you will have to use RBF.
Finally, we'll pass the traindata and trainlabels arrays into the SVM train method, and return the SVM and BOW extractor objects. This is because, in our applications, we don't want to have to recreate the vocabulary every time, so we expose it for reuse.
Dude, where's my car?
We are ready to test our car detector! Let's first create a simple program that loads an image, and then performs detection using the sliding window and image pyramid techniques:
    import cv2
    import numpy as np
    from car_detector.detector import car_detector, bow_features
    from car_detector.pyramid import pyramid
    from car_detector.non_maximum import non_max_suppression_fast as nms
    from car_detector.sliding_window import sliding_window

    def in_range(number, test, thresh=0.2):
        return abs(number - test) < thresh

    test_image = "/path/to/cars.jpg"

    svm, extractor = car_detector()
    detect = cv2.xfeatures2d.SIFT_create()

    w, h = 100, 40
    img = cv2.imread(test_image)

    rectangles = []
    counter = 1
    scaleFactor = 1.25
    scale = 1
    font = cv2.FONT_HERSHEY_PLAIN
    for resized in pyramid(img, scaleFactor):
        scale = float(img.shape[1]) / float(resized.shape[1])
        for (x, y, roi) in sliding_window(resized, 20, (w, h)):
            if roi.shape[1] != w or roi.shape[0] != h:
                continue
            try:
                bf = bow_features(roi, extractor, detect)
                _, result = svm.predict(bf)
                a, res = svm.predict(bf, flags=cv2.ml.STAT_MODEL_RAW_OUTPUT)
                print "Class: %d, Score: %f" % (result[0][0], res[0][0])
                score = res[0][0]
                if result[0][0] == 1:
                    if score < -1.0:
                        rx, ry, rx2, ry2 = int(x * scale), int(y * scale), int((x + w) * scale), int((y + h) * scale)
                        rectangles.append([rx, ry, rx2, ry2, abs(score)])
            except:
                pass
        counter += 1

    windows = np.array(rectangles)
    boxes = nms(windows, 0.25)

    for (x, y, x2, y2, score) in boxes:
        print x, y, x2, y2, score
        cv2.rectangle(img, (int(x), int(y)), (int(x2), int(y2)), (0, 255, 0), 1)
        cv2.putText(img, "%f" % score, (int(x), int(y)), font, 1, (0, 255, 0))

    cv2.imshow("img", img)
    cv2.waitKey(0)
The notable part of the program is the code within the pyramid/sliding window loop:
    bf = bow_features(roi, extractor, detect)
    _, result = svm.predict(bf)
    a, res = svm.predict(bf, flags=cv2.ml.STAT_MODEL_RAW_OUTPUT)
    print "Class: %d, Score: %f" % (result[0][0], res[0][0])
    score = res[0][0]
    if result[0][0] == 1:
        if score < -1.0:
            rx, ry, rx2, ry2 = int(x * scale), int(y * scale), int((x + w) * scale), int((y + h) * scale)
            rectangles.append([rx, ry, rx2, ry2, abs(score)])
Here, we extract the features of the region of interest (ROI), which corresponds to the current sliding window, and then we call predict on the extracted features. The predict method has an optional parameter, flags, which makes it return the raw score of the prediction (contained at the [0][0] value).
Note
A word on the score of the prediction: the lower the value, the higher the confidence that the classified element really belongs to the class.
So, we'll set an arbitrary threshold of -1.0 for classified windows, and all windows scoring less than -1.0 are going to be taken as good results. As you experiment with your SVMs, you may tweak this to your liking until you find a golden mean that assures the best results.
Finally, we add the computed coordinates of the sliding window to the array of rectangles (meaning, we multiply the current coordinates by the scale of the current layer in the image pyramid so that the window gets correctly represented in the final drawing).
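The pyramid and sliding_window helpers imported earlier live in the book's car_detector package and are not shown in this listing. A minimal sketch of the underlying idea looks like this; note that it crudely downsamples by integer striding with NumPy, whereas a real implementation would use cv2.resize with the book's scaleFactor of 1.25:

```python
import numpy as np

def pyramid(image, scale_factor=2, min_size=(80, 32)):
    # Yield successively smaller versions of the image; downsampling here
    # is plain integer striding, standing in for proper cv2.resize calls.
    yield image
    while True:
        image = image[::scale_factor, ::scale_factor]
        if image.shape[0] < min_size[1] or image.shape[1] < min_size[0]:
            break
        yield image

def sliding_window(image, step, window_size):
    # Slide a (w, h) window across the image in steps of `step` pixels,
    # yielding the top-left corner and the cropped region of interest.
    w, h = window_size
    for y in range(0, image.shape[0] - h + 1, step):
        for x in range(0, image.shape[1] - w + 1, step):
            yield (x, y, image[y:y + h, x:x + w])

img = np.zeros((120, 200), np.uint8)
levels = list(pyramid(img))
windows = list(sliding_window(img, 20, (100, 40)))
print(len(levels), len(windows))  # 2 30
```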
There's one last operation we need to perform before drawing our final result: non-maximum suppression.
We turn the rectangles array into a NumPy array (to allow certain kinds of operations that are only possible with NumPy), and then apply NMS:
    windows = np.array(rectangles)
    boxes = nms(windows, 0.25)
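The imported non_max_suppression_fast helper is defined in the book's car_detector package; an illustrative version of the standard technique (keep the best-scoring box, discard boxes that overlap it too much, repeat) can be sketched as follows. This is not the book's exact code:

```python
import numpy as np

def non_max_suppression_fast(boxes, overlap_thresh):
    # boxes: array of [x1, y1, x2, y2, score]; keep the highest-scoring
    # box in each cluster of boxes that overlap more than overlap_thresh.
    if len(boxes) == 0:
        return np.array([])
    boxes = boxes.astype("float")
    x1, y1, x2, y2, scores = boxes.T
    area = (x2 - x1 + 1) * (y2 - y1 + 1)
    idxs = np.argsort(scores)          # ascending; the best box is last
    pick = []
    while len(idxs) > 0:
        last = idxs[-1]
        pick.append(last)
        # Intersection of the best remaining box with all the others
        xx1 = np.maximum(x1[last], x1[idxs[:-1]])
        yy1 = np.maximum(y1[last], y1[idxs[:-1]])
        xx2 = np.minimum(x2[last], x2[idxs[:-1]])
        yy2 = np.minimum(y2[last], y2[idxs[:-1]])
        w = np.maximum(0, xx2 - xx1 + 1)
        h = np.maximum(0, yy2 - yy1 + 1)
        overlap = (w * h) / area[idxs[:-1]]
        # Drop the picked box and everything overlapping it too much
        idxs = idxs[:-1][overlap <= overlap_thresh]
    return boxes[pick]

windows = np.array([[10, 10, 110, 50, 2.0],
                    [12, 12, 112, 52, 1.5],   # heavily overlaps the first
                    [200, 200, 300, 240, 1.0]])
boxes = non_max_suppression_fast(windows, 0.25)
print(len(boxes))  # 2
```

The second box is suppressed because it overlaps the higher-scoring first box by far more than 25 percent.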
Finally, we proceed with displaying all our results; for the sake of convenience, I've also printed the score obtained for all the remaining windows.
This is a remarkably accurate result!
A final note on SVM: you don't need to train a detector every time you want to use it, which would be extremely impractical. You can use the following code:
    svm.save('/path/to/serialized/svmxml')
You can subsequently reload it with the load method and feed it test images or frames.
Summary
In this chapter, we talked about numerous object detection concepts, such as HOG, BOW, and SVM, and some useful techniques, such as image pyramids, sliding windows, and non-maximum suppression.
We introduced the concept of machine learning and explored the various approaches used to train a custom detector, including how to create or obtain a training dataset and classify data. Finally, we put this knowledge to good use by creating a car detector from scratch and verifying its correct functioning.
All these concepts form the foundation of the next chapter, in which we will utilize object detection and classification techniques in the context of making videos, and learn how to track objects to retain information that can potentially be used for business or application purposes.
Chapter 8. Tracking Objects
In this chapter, we will explore the vast topic of object tracking, which is the process of locating a moving object in a movie or video feed from a camera. Real-time object tracking is a critical task in many computer vision applications, such as surveillance, perceptual user interfaces, augmented reality, object-based video compression, and driver assistance.
Tracking objects can be accomplished in several ways, with the optimal technique being largely dependent on the task at hand. We will learn how to identify moving objects and track them across frames.
Detecting moving objects
The first task that needs to be accomplished for us to be able to track anything in a video is to identify those regions of a video frame that correspond to moving objects.
There are many ways to track objects in a video, all of them fulfilling a slightly different purpose. For example, you may want to track anything that moves, in which case differences between frames are going to be of help; you may want to track a hand moving in a video, in which case Meanshift based on the color of the skin is the most appropriate solution; or you may want to track a particular object whose appearance you know, in which case techniques such as template matching will be of help.
Object tracking techniques can get quite complex, so let's explore them in ascending order of difficulty, starting from the simplest technique.
Basic motion detection
The first and most intuitive solution is to calculate the differences between frames, or between a frame considered the "background" and all the other frames.
Let's look at an example of this approach:
    import cv2
    import numpy as np

    camera = cv2.VideoCapture(0)

    es = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (9, 4))
    kernel = np.ones((5, 5), np.uint8)
    background = None

    while (True):
        ret, frame = camera.read()
        if background is None:
            background = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            background = cv2.GaussianBlur(background, (21, 21), 0)
            continue

        gray_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        gray_frame = cv2.GaussianBlur(gray_frame, (21, 21), 0)

        diff = cv2.absdiff(background, gray_frame)
        diff = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)[1]
        diff = cv2.dilate(diff, es, iterations=2)

        image, cnts, hierarchy = cv2.findContours(diff.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        for c in cnts:
            if cv2.contourArea(c) < 1500:
                continue
            (x, y, w, h) = cv2.boundingRect(c)
            cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)

        cv2.imshow("contours", frame)
        cv2.imshow("dif", diff)
        if cv2.waitKey(1000 / 12) & 0xff == ord("q"):
            break

    cv2.destroyAllWindows()
    camera.release()
After the necessary imports, we open the video feed obtained from the default system camera, and we set the first frame as the background of the entire feed. Each frame read from that point onward is processed to calculate the difference between the background and the frame itself. This is a trivial operation:
    diff = cv2.absdiff(background, gray_frame)
Before we get to do that, though, we need to prepare our frame for processing. The first thing we do is convert the frame to grayscale and blur it a bit:
    gray_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    gray_frame = cv2.GaussianBlur(gray_frame, (21, 21), 0)
Note
You may wonder about the blurring: the reason we blur the image is that every video feed contains natural noise, coming from vibrations, changes in lighting, and the camera itself. We want to smooth this noise out so that it doesn't get detected as motion and consequently tracked.
Now that our frame is grayscaled and smoothed, we can calculate the difference compared to the background (which has also been grayscaled and smoothed), and obtain a map of differences. This is not the only processing step, though. We're also going to apply a threshold, so as to obtain a black and white image, and dilate the image so that holes and imperfections get normalized, like so:
    diff = cv2.absdiff(background, gray_frame)
    diff = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)[1]
    diff = cv2.dilate(diff, es, iterations=2)
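The diff-and-threshold step can be illustrated with plain NumPy standing in for cv2.absdiff and cv2.threshold, which perform the same element-wise operations; the pixel values below are made up:

```python
import numpy as np

# Two tiny 8-bit grayscale "frames": the background and the current frame,
# where one region has changed by more than the threshold of 25.
background = np.array([[10, 10, 10],
                       [10, 10, 10]], np.uint8)
frame = np.array([[12, 10, 10],
                  [10, 200, 180]], np.uint8)

# Absolute per-pixel difference (what cv2.absdiff computes); cast to a
# wider type first so the subtraction cannot wrap around.
diff = np.abs(background.astype(np.int16) - frame.astype(np.int16)).astype(np.uint8)

# Binary threshold (what cv2.threshold with THRESH_BINARY does):
# pixels that changed by more than 25 become 255, the rest become 0.
mask = np.where(diff > 25, 255, 0).astype(np.uint8)
print(mask)
```

Only the region that genuinely changed survives the threshold; the 2-level flicker in the top-left pixel is discarded as noise.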
Note that eroding and dilating can also act as a noise filter, much like the blurring we applied, and that both steps can be obtained in one function call using cv2.morphologyEx; we show the steps explicitly for transparency purposes. All that is left to do at this point is to find the contours of all the white blobs in the calculated difference map, and display them. Optionally, we only display contours with bounding rectangles greater than an arbitrary area threshold, so tiny movements are not displayed. Naturally, this is up to you and your application needs; with constant lighting and a very noiseless camera, you may wish to have no threshold on the minimum size of the contours. This is how we display the rectangles:
    image, cnts, hierarchy = cv2.findContours(diff.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    for c in cnts:
        if cv2.contourArea(c) < 1500:
            continue
        (x, y, w, h) = cv2.boundingRect(c)
        cv2.rectangle(frame, (x, y), (x + w, y + h), (255, 255, 0), 2)
    cv2.imshow("contours", frame)
    cv2.imshow("dif", diff)
OpenCV offers two very handy functions here:
cv2.findContours: This function computes the contours of subjects in an image
cv2.boundingRect: This function calculates their bounding boxes
So there you have it: a basic motion detector with rectangles around subjects. The final result is something like this:
For such a simple technique, this is quite accurate. However, there are a few drawbacks that make this approach unsuitable for all business needs, most notably the fact that you need a first "default" frame to set as a background. In situations such as, for example, outdoor cameras, with light changing quite constantly, this process results in a rather inflexible approach, so we need a bit more intelligence in our system. That's where background subtractors come into play.
Background subtractors – KNN, MOG2, and GMG
OpenCV provides a class called BackgroundSubtractor, which is a handy way to operate foreground and background segmentation.
This works similarly to the GrabCut algorithm we analyzed in Chapter 3, Processing Images with OpenCV 3; however, BackgroundSubtractor is a fully fledged class with a plethora of methods that not only perform background subtraction, but also improve background detection over time through machine learning, and it lets you save the classifier to a file.
To familiarize ourselves with BackgroundSubtractor, let's look at a basic example:
    import numpy as np
    import cv2

    cap = cv2.VideoCapture(0)
    mog = cv2.createBackgroundSubtractorMOG2()

    while (1):
        ret, frame = cap.read()
        fgmask = mog.apply(frame)
        cv2.imshow('frame', fgmask)
        if cv2.waitKey(30) & 0xff == 27:
            break

    cap.release()
    cv2.destroyAllWindows()
Let's go through this in order. First of all, let's talk about the background subtractor object. There are three background subtractors available in OpenCV 3: K-Nearest Neighbors (KNN), Mixture of Gaussians (MOG2), and GMG (named after the initials of its authors, Godbehere, Matsukawa, and Goldberg), corresponding to the algorithms used to compute the background subtraction.
You may remember that we already elaborated on the topic of foreground and background detection in Chapter 5, Depth Estimation and Segmentation, in particular when we talked about GrabCut and Watershed.
So why do we need the BackgroundSubtractor classes? The main reason is that the BackgroundSubtractor classes are specifically built with video analysis in mind, which means that the OpenCV BackgroundSubtractor classes "learn" something about the environment with every frame. For example, with GMG, you can specify the number of frames used to initialize the video analysis, with the default being 120 (roughly 5 seconds with average cameras). The constant aspect of the BackgroundSubtractor classes is that they operate a comparison between frames and store a history, which allows them to improve motion analysis results as time passes.
Another fundamental (and, frankly, quite amazing) feature of the BackgroundSubtractor classes is the ability to detect shadows. This is absolutely vital for an accurate reading of video frames; by detecting shadows, you can exclude shadow areas (by thresholding them) from the objects you detected, and concentrate on the real features. It also greatly reduces the unwanted "merging" of objects. An image comparison will give you a good idea of the concept I'm trying to illustrate. Here's a sample of background subtraction without shadow detection:
Here's an example with shadow detection (with shadows thresholded):
Note that shadow detection isn't absolutely perfect, but it helps bring the object contours back to the object's original shape. Let's take a look at a reimplemented example of motion detection utilizing BackgroundSubtractorKNN:
    import cv2
    import numpy as np

    bs = cv2.createBackgroundSubtractorKNN(detectShadows=True)
    camera = cv2.VideoCapture("/path/to/movie.flv")

    while True:
        ret, frame = camera.read()
        fgmask = bs.apply(frame)
        th = cv2.threshold(fgmask.copy(), 244, 255, cv2.THRESH_BINARY)[1]
        dilated = cv2.dilate(th, cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3)), iterations=2)
        image, contours, hier = cv2.findContours(dilated, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        for c in contours:
            if cv2.contourArea(c) > 1600:
                (x, y, w, h) = cv2.boundingRect(c)
                cv2.rectangle(frame, (x, y), (x + w, y + h), (255, 255, 0), 2)

        cv2.imshow("mog", fgmask)
        cv2.imshow("thresh", th)
        cv2.imshow("detection", frame)
        if cv2.waitKey(30) & 0xff == 27:
            break

    camera.release()
    cv2.destroyAllWindows()
As a result of the accuracy of the subtractor, and its ability to detect shadows, we obtain really precise motion detection, in which even objects that are next to each other don't get merged into one detection, as shown in the following screenshot:
That's a remarkable result for fewer than 30 lines of code!
The core of the entire program is the apply() method of the background subtractor; it computes a foreground mask, which can be used as a basis for the rest of the processing:
    fgmask = bs.apply(frame)
    th = cv2.threshold(fgmask.copy(), 244, 255, cv2.THRESH_BINARY)[1]
    dilated = cv2.dilate(th, cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3)), iterations=2)
    image, contours, hier = cv2.findContours(dilated, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    for c in contours:
        if cv2.contourArea(c) > 1600:
            (x, y, w, h) = cv2.boundingRect(c)
            cv2.rectangle(frame, (x, y), (x + w, y + h), (255, 255, 0), 2)
Once a foreground mask is obtained, we can apply a threshold: the foreground mask has white values (255) for the foreground and gray values for shadows; thus, in the thresholded image, all pixels that are not almost pure white (244-255) are binarized to 0 instead of 255.
From there, we proceed with the same approach we adopted for the basic motion detection example: identifying objects, detecting contours, and drawing them on the original frame.
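The shadow-thresholding step can be mimicked with NumPy alone (cv2.threshold performs the same per-pixel comparison); the mask values below are hypothetical, with 127 standing for the gray value the subtractor assigns to shadow pixels:

```python
import numpy as np

# A hypothetical foreground mask: 255 = foreground, 127 = shadow, 0 = background
fgmask = np.array([[0, 127, 255],
                   [127, 255, 244]], np.uint8)

# Equivalent of cv2.threshold(fgmask, 244, 255, cv2.THRESH_BINARY):
# only pixels strictly above 244 survive, so shadow pixels are discarded.
th = np.where(fgmask > 244, 255, 0).astype(np.uint8)
print(th)
```

The shadow pixels (127) and the borderline 244 are zeroed out, leaving only the confident foreground.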
Meanshift and CAMShift
Background subtraction is a really effective technique, but not the only one available to track objects in a video. Meanshift is an algorithm that tracks objects by finding the maximum density of a discrete sample of a probability function (in our case, a region of interest in an image) and recalculating it for the next frame, which gives the algorithm an indication of the direction in which the object has moved.
This calculation gets repeated until the centroid matches the original one, or remains unaltered even after consecutive iterations of the calculation. This final matching is called convergence. For reference, the algorithm was first described in the paper, The estimation of the gradient of a density function, with applications in pattern recognition, Fukunaga K. and Hostetler L., IEEE, 1975, which is available at http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=1055330&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D1055330 (note that this paper is not free for download).
Here's a visual representation of this process:
Theory aside, Meanshift is very useful when tracking a particular region of interest in a video, and this has a series of implications; for example, if you don't know a priori what the region you want to track is, you're going to have to manage this cleverly and develop programs that dynamically start tracking (and cease tracking) certain areas of the video, depending on arbitrary criteria. One example could be that you operate object detection with a trained SVM, and then start using Meanshift to track a detected object.
Let's not make our life complicated from the very beginning, though; let's first get familiar with Meanshift, and then use it in more complex scenarios.
We will start by simply marking a region of interest and keeping track of it, like so:
    import numpy as np
    import cv2

    cap = cv2.VideoCapture(0)
    ret, frame = cap.read()
    r, h, c, w = 10, 200, 10, 200
    track_window = (c, r, w, h)

    roi = frame[r:r+h, c:c+w]
    hsv_roi = cv2.cvtColor(roi, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv_roi, np.array((100., 30., 32.)), np.array((180., 120., 255.)))
    roi_hist = cv2.calcHist([hsv_roi], [0], mask, [180], [0, 180])
    cv2.normalize(roi_hist, roi_hist, 0, 255, cv2.NORM_MINMAX)

    term_crit = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)

    while True:
        ret, frame = cap.read()
        if ret == True:
            hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
            dst = cv2.calcBackProject([hsv], [0], roi_hist, [0, 180], 1)
            # apply meanshift to get the new location
            ret, track_window = cv2.meanShift(dst, track_window, term_crit)
            # Draw it on the image
            x, y, w, h = track_window
            img2 = cv2.rectangle(frame, (x, y), (x+w, y+h), 255, 2)
            cv2.imshow('img2', img2)
            k = cv2.waitKey(60) & 0xff
            if k == 27:
                break
        else:
            break

    cv2.destroyAllWindows()
    cap.release()
In the preceding code, I supplied the HSV values for tracking some shades of lilac, and here's the result:
If you ran the code on your machine, you'd notice how the Meanshift window actually looks for the specified color range; if it doesn't find it, you'll just see the window wobbling (it actually looks a bit impatient). If an object with the specified color range enters the window, the window will then start tracking it.
Let's examine the code so that we can fully understand how Meanshift performs this tracking operation.
Color histograms
Before showing the code for the preceding example, though, here is a not-so-brief digression on color histograms and two very important built-in functions of OpenCV: calcHist and calcBackProject.
The calcHist function calculates the color histograms of an image, so the next logical step is to explain the concept of color histograms. A color histogram is a representation of the color distribution of an image. On the x axis of the representation, we have the color values, and on the y axis, we have the number of pixels corresponding to each color value.
Let's look at a visual representation of this concept, hoping the adage, "a picture speaks a thousand words", will apply in this instance too:
The picture shows a representation of a color histogram with one column per value from 0 to 180 (note that OpenCV uses H values of 0-180; other systems may use 0-360 or 0-255).
Aside from Meanshift, color histograms are used for a number of different and useful image and video processing operations.
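To make the concept concrete, a single-channel hue histogram can be computed with plain NumPy; np.bincount here plays the role of cv2.calcHist on one channel, and the pixel values are made up:

```python
import numpy as np

# A made-up 1-channel "hue" image with values in OpenCV's 0-180 range
hue = np.array([[0, 0, 90],
                [90, 90, 179]], np.uint8)

# One bin per possible hue value: hist[v] = number of pixels with hue v
hist = np.bincount(hue.ravel(), minlength=180)
print(hist[0], hist[90], hist[179])  # 2 3 1
```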
The calcHist function
The calcHist() function in OpenCV has the following Python signature:
    calcHist(images, channels, mask, histSize, ranges[, hist[, accumulate]]) -> hist
The descriptions of the parameters (as taken from the official OpenCV documentation) are as follows:
images: This parameter is the source arrays. They should all have the same depth, CV_8U or CV_32F, and the same size. Each of them can have an arbitrary number of channels.
channels: This parameter is the list of the dims channels used to compute the histogram.
mask: This parameter is the optional mask. If the matrix is not empty, it must be an 8-bit array of the same size as images[i]. The nonzero mask elements mark the array elements counted in the histogram.
histSize: This parameter is the array of histogram sizes in each dimension.
ranges: This parameter is the array of the dims arrays of the histogram bin boundaries in each dimension.
hist: This parameter is the output histogram, which is a dense or sparse dims (dimensional) array.
accumulate: This parameter is the accumulation flag. If it is set, the histogram is not cleared in the beginning when it is allocated. This feature enables you to compute a single histogram from several sets of arrays, or to update the histogram over time.
In our example, we calculate the histogram of the region of interest like so:
    roi_hist = cv2.calcHist([hsv_roi], [0], mask, [180], [0, 180])
This can be interpreted as the calculation of a color histogram for an array of images containing only the region of interest in the HSV space. In this region, we compute only the image values corresponding to the mask values not equal to 0, with 180 histogram bins, and with the histogram having 0 as the lower boundary and 180 as the upper boundary.
This is rather convoluted to describe but, once you have familiarized yourself with the concept of a histogram, the pieces of the puzzle should click into place.
The calcBackProject function
The other function that plays a vital role in the Meanshift algorithm (but not only there) is calcBackProject, which is short for histogram back projection (calculation). A histogram back projection is so called because it takes a histogram and projects it back onto an image, with the result being the probability that each pixel belongs to the image that generated the histogram in the first place. Therefore, calcBackProject gives a probability estimation that a certain image is equal or similar to a model image (from which the original histogram was generated).
Again, if you thought calcHist was a bit convoluted, calcBackProject is probably even more complex!
In summary
The calcHist function extracts a color histogram from an image, giving a statistical representation of the colors in the image, and calcBackProject helps in calculating the probability of each pixel of an image belonging to the original image.
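The back projection idea can be sketched with NumPy: look up each pixel's hue in the (normalized) ROI histogram, so pixels whose colors were frequent in the ROI receive high scores. This stands in for cv2.calcBackProject on a single channel, and the values below are made up:

```python
import numpy as np

# Normalized hue histogram of a hypothetical ROI: the tracked object
# was mostly hue 90, with a little hue 0.
roi_hist = np.zeros(180, np.float32)
roi_hist[90] = 255.0
roi_hist[0] = 60.0

# A made-up hue plane of the current frame
hue = np.array([[90, 90, 0],
                [45, 90, 90]], np.uint8)

# Back projection: replace each pixel by its histogram value,
# i.e. a score of how "ROI-like" that pixel's color is.
dst = roi_hist[hue]
print(dst)
```

Pixels with hue 90 score highest, hue 0 scores moderately, and hue 45 (absent from the ROI) scores zero; Meanshift then climbs toward the densest region of this score map.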
Back to the code
Let's get back to our example. First come our usual imports, and then we mark the initial region of interest:
    cap = cv2.VideoCapture(0)
    ret, frame = cap.read()
    r, h, c, w = 10, 200, 10, 200
    track_window = (c, r, w, h)
Then, we extract the ROI and convert it to the HSV color space:
    roi = frame[r:r+h, c:c+w]
    hsv_roi = cv2.cvtColor(roi, cv2.COLOR_BGR2HSV)
Now, we create a mask to include all pixels of the ROI with HSV values between the lower and upper bounds:
    mask = cv2.inRange(hsv_roi, np.array((100., 30., 32.)), np.array((180., 120., 255.)))
Next, we calculate the histogram of the ROI:
    roi_hist = cv2.calcHist([hsv_roi], [0], mask, [180], [0, 180])
    cv2.normalize(roi_hist, roi_hist, 0, 255, cv2.NORM_MINMAX)
After the histogram is calculated, its values are normalized to fall within the range 0-255.
Meanshift performs a number of iterations before reaching convergence; however, this convergence is not assured. So, OpenCV allows us to pass so-called termination criteria, which are a way to specify the behavior of Meanshift with regard to terminating the series of calculations:
    term_crit = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)
In this particular case, we're specifying a behavior that instructs Meanshift to stop calculating the centroid shift after ten iterations, or when the centroid has moved less than 1 pixel. The first flag (EPS or CRITERIA_COUNT) indicates that we're going to use either of the two criteria (count or "epsilon", meaning the minimum movement).
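The core iteration behind cv2.meanShift can be sketched in one dimension with NumPy: repeatedly move the window to the weighted centroid of the back-projection values under it, stopping after a fixed number of iterations or when the shift drops below epsilon. This is an illustrative toy, not OpenCV's implementation:

```python
import numpy as np

def mean_shift_1d(weights, start, width, max_iter=10, eps=1.0):
    # Slide a window of `width` over a 1-D weight map (a back projection),
    # moving it to the weighted centroid until convergence.
    x = start
    for _ in range(max_iter):
        window = weights[x:x + width]
        if window.sum() == 0:
            break
        centroid = np.average(np.arange(x, x + width), weights=window)
        new_x = int(round(centroid - width / 2.0))
        if abs(new_x - x) < eps:   # epsilon criterion: the shift is tiny
            break
        x = new_x
    return x

# A back projection with all the probability mass around index 14
weights = np.zeros(30)
weights[12:17] = 1.0
print(mean_shift_1d(weights, start=8, width=10))  # 9
```

Starting at 8, the window shifts once so that it is centered on the mass at index 14, then the next shift is zero and the epsilon criterion stops the loop.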
Now that we have a histogram calculated, and termination criteria for Meanshift, we can start our usual infinite loop, grab the current frame from the camera, and start processing it. The first thing we do is switch to the HSV color space:
    if ret == True:
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
Now that we have an HSV array, we can perform the long-awaited histogram back projection:
    dst = cv2.calcBackProject([hsv], [0], roi_hist, [0, 180], 1)
The result of calcBackProject is a matrix. If you printed it to the console, it would look more or less like this:
    [[  0   0   0 ...,   0   0   0]
     [  0   0   0 ...,   0   0   0]
     [  0   0   0 ...,   0   0   0]
     ...,
     [  0   0  20 ...,   0   0   0]
     [ 78  20   0 ...,   0   0   0]
     [255 137  20 ...,   0   0   0]]
Each pixel is represented by its probability score.
This matrix can then finally be passed into Meanshift, together with the track window and the termination criteria, as outlined by the Python signature of cv2.meanShift:
    meanShift(probImage, window, criteria) -> retval, window
So here it is:
    ret, track_window = cv2.meanShift(dst, track_window, term_crit)
Finally, we calculate the new coordinates of the window, draw a rectangle to display it in the frame, and then show it:
    x, y, w, h = track_window
    img2 = cv2.rectangle(frame, (x, y), (x+w, y+h), 255, 2)
    cv2.imshow('img2', img2)
That's it. You should by now have a good idea of color histograms, back projections, and Meanshift. However, there remains one issue to be resolved with the preceding program: the size of the window does not change with the size of the object being tracked in the frames.
Gary Bradski, one of the authorities in computer vision and author of the seminal book Learning OpenCV (O'Reilly), published a paper in 1998 to improve the accuracy of Meanshift, describing a new algorithm called Continuously Adaptive Meanshift (CAMShift), which is very similar to Meanshift but also adapts the size of the track window when Meanshift reaches convergence.
CAMShift
While CAMShift adds complexity to Meanshift, the implementation of the preceding program using CAMShift is surprisingly (or not?) similar to the Meanshift example, with the main difference being that, after the call to CamShift, the rectangle is drawn with a particular rotation that follows the rotation of the object being tracked.
Here's the code reimplemented with CAMShift:
    import numpy as np
    import cv2

    cap = cv2.VideoCapture(0)
    # take the first frame of the video
    ret, frame = cap.read()
    # set up the initial location of the window
    r, h, c, w = 300, 200, 400, 300  # simply hardcoded values
    track_window = (c, r, w, h)

    roi = frame[r:r+h, c:c+w]
    hsv_roi = cv2.cvtColor(roi, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv_roi, np.array((100., 30., 32.)), np.array((180., 120., 255.)))
    roi_hist = cv2.calcHist([hsv_roi], [0], mask, [180], [0, 180])
    cv2.normalize(roi_hist, roi_hist, 0, 255, cv2.NORM_MINMAX)

    term_crit = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)

    while (1):
        ret, frame = cap.read()
        if ret == True:
            hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
            dst = cv2.calcBackProject([hsv], [0], roi_hist, [0, 180], 1)
            ret, track_window = cv2.CamShift(dst, track_window, term_crit)
            pts = cv2.boxPoints(ret)
            pts = np.int0(pts)
            img2 = cv2.polylines(frame, [pts], True, 255, 2)
            cv2.imshow('img2', img2)
            k = cv2.waitKey(60) & 0xff
            if k == 27:
                break
        else:
            break

    cv2.destroyAllWindows()
    cap.release()
The difference between the CAMShift code and the Meanshift one lies in these four lines:
    ret, track_window = cv2.CamShift(dst, track_window, term_crit)
    pts = cv2.boxPoints(ret)
    pts = np.int0(pts)
    img2 = cv2.polylines(frame, [pts], True, 255, 2)
The method signature of CamShift is identical to that of meanShift.
The boxPoints function finds the vertices of a rotated rectangle, while the polylines function draws the lines of the rectangle on the frame.
By now, you should be familiar with the three approaches we adopted for tracking objects: basic motion detection, Meanshift, and CAMShift.
Let's now explore another technique: the Kalman filter.
The Kalman filter
The Kalman filter is an algorithm mainly (but not exclusively) developed by Rudolf Kalman in the late 1950s, and it has found practical application in many fields, particularly navigation systems for all sorts of vehicles, from nuclear submarines to aircraft.
The Kalman filter operates recursively on streams of noisy input data (which, in computer vision, is normally a video feed) to produce a statistically optimal estimate of the underlying system state (the position inside the video).
Let's take a quick example to conceptualize the Kalman filter and translate the preceding (purposely broad and generic) definition into plainer English. Think of a small red ball on a table, and imagine you have a camera pointing at the scene. You identify the ball as the subject to be tracked, and flick it with your fingers. The ball will start rolling on the table, following the laws of motion we're familiar with.
If the ball is rolling at a speed of 1 meter per second (1 m/s) in a particular direction, you don't need the Kalman filter to estimate where the ball will be in 1 second's time: it will be 1 meter away. The Kalman filter applies these laws to predict an object's position in the current video frame based on observations gathered in the previous frames. Naturally, the Kalman filter cannot know about a pencil on the table deflecting the course of the ball, but it can adjust for this kind of unforeseeable event.
Predict and update
From the preceding description, we gather that the Kalman filter algorithm is divided into two phases:
Predict: In the first phase, the Kalman filter uses the covariance calculated up to the current point in time to estimate the object's new position
Update: In the second phase, it records the object's position and adjusts the covariance for the next cycle of calculations
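The two phases can be sketched for a one-dimensional, constant-position model with NumPy; the noise values are hypothetical and the state is a single scalar, whereas cv2.KalmanFilter handles full state vectors and matrices:

```python
import numpy as np

class Kalman1D(object):
    # Minimal scalar Kalman filter: state x (position) and its variance p.
    def __init__(self, q=1e-3, r=0.1):
        self.x, self.p = 0.0, 1.0
        self.q, self.r = q, r   # process and measurement noise (made up)

    def predict(self):
        # Predict phase: the model says the position stays put,
        # but our uncertainty grows by the process noise.
        self.p += self.q
        return self.x

    def correct(self, z):
        # Update phase: blend the prediction and the measurement z
        # according to the Kalman gain k, then shrink the uncertainty.
        k = self.p / (self.p + self.r)
        self.x += k * (z - self.x)
        self.p *= (1 - k)
        return self.x

kf = Kalman1D()
measurements = [1.1, 0.9, 1.05, 1.0]   # noisy readings of a true value of 1.0
for z in measurements:
    kf.predict()
    estimate = kf.correct(z)
print(estimate)
```

After a few predict/correct cycles, the estimate settles near the true value of 1.0 despite the noisy readings.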
This adjustment is, in OpenCV terms, a correction; hence, the API of the KalmanFilter class in the Python bindings of OpenCV is as follows:
    class KalmanFilter(__builtin__.object)
     |  Methods defined here:
     |
     |  __repr__(...)
     |      x.__repr__() <==> repr(x)
     |
     |  correct(...)
     |      correct(measurement) -> retval
     |
     |  predict(...)
     |      predict([, control]) -> retval
We can deduce that, in our programs, we will call predict() to estimate the position of an object, and correct() to instruct the Kalman filter to adjust its calculations.
An example
Ultimately, we will aim to use the Kalman filter in combination with CAMShift to obtain the highest degree of accuracy and performance. However, before we go into such levels of complexity, let's analyze a simple example, specifically one that seems to be very common on the Web when it comes to the Kalman filter and OpenCV: mouse tracking.
In the following example, we will draw an empty frame and two lines: one corresponding to the actual movement of the mouse, and the other corresponding to the Kalman filter's prediction. Here's the code:
    import cv2
    import numpy as np

    frame = np.zeros((800, 800, 3), np.uint8)
    last_measurement = current_measurement = np.zeros((2, 1), np.float32)
    last_prediction = current_prediction = np.zeros((2, 1), np.float32)

    def mousemove(event, x, y, s, p):
        global frame, current_measurement, last_measurement, current_prediction, last_prediction
        last_prediction = current_prediction
        last_measurement = current_measurement
        current_measurement = np.array([[np.float32(x)], [np.float32(y)]])
        kalman.correct(current_measurement)
        current_prediction = kalman.predict()
        lmx, lmy = last_measurement[0], last_measurement[1]
        cmx, cmy = current_measurement[0], current_measurement[1]
        lpx, lpy = last_prediction[0], last_prediction[1]
        cpx, cpy = current_prediction[0], current_prediction[1]
        cv2.line(frame, (int(lmx), int(lmy)), (int(cmx), int(cmy)), (0, 100, 0))
        cv2.line(frame, (int(lpx), int(lpy)), (int(cpx), int(cpy)), (0, 0, 200))

    cv2.namedWindow("kalman_tracker")
    cv2.setMouseCallback("kalman_tracker", mousemove)

    kalman = cv2.KalmanFilter(4, 2)
    kalman.measurementMatrix = np.array([[1, 0, 0, 0], [0, 1, 0, 0]], np.float32)
    kalman.transitionMatrix = np.array([[1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0], [0, 0, 0, 1]], np.float32)
    kalman.processNoiseCov = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]], np.float32) * 0.03

    while True:
        cv2.imshow("kalman_tracker", frame)
        if (cv2.waitKey(30) & 0xFF) == 27:
            break

    cv2.destroyAllWindows()
As usual, let's analyze it step by step. After the package imports, we create an empty frame of size 800x800, and then initialize the arrays that will hold the coordinates of the measurements and predictions of the mouse movements:
    frame = np.zeros((800, 800, 3), np.uint8)
    last_measurement = current_measurement = np.zeros((2, 1), np.float32)
    last_prediction = current_prediction = np.zeros((2, 1), np.float32)
Then, we declare the mousemove callback function, which is going to handle the drawing of the tracking. The mechanism is quite simple: we store the last measurement and last prediction, correct the Kalman filter with the current measurement, calculate the Kalman prediction, and finally draw two lines, one from the last measurement to the current one, and one from the last prediction to the current one:
    def mousemove(event, x, y, s, p):
        global frame, current_measurement, last_measurement, current_prediction, last_prediction
        last_prediction = current_prediction
        last_measurement = current_measurement
        current_measurement = np.array([[np.float32(x)], [np.float32(y)]])
        kalman.correct(current_measurement)
        current_prediction = kalman.predict()
        lmx, lmy = last_measurement[0], last_measurement[1]
        cmx, cmy = current_measurement[0], current_measurement[1]
        lpx, lpy = last_prediction[0], last_prediction[1]
        cpx, cpy = current_prediction[0], current_prediction[1]
        cv2.line(frame, (int(lmx), int(lmy)), (int(cmx), int(cmy)), (0, 100, 0))
        cv2.line(frame, (int(lpx), int(lpy)), (int(cpx), int(cpy)), (0, 0, 200))
The next step is to initialize the window and set the callback function. OpenCV handles mouse events with the setMouseCallback function; specific events must be handled using the callback's first parameter (event), which determines what kind of event has been triggered (click, move, and so on):
cv2.namedWindow("kalman_tracker")
cv2.setMouseCallback("kalman_tracker", mousemove)
Now we're ready to create the Kalman filter:
kalman = cv2.KalmanFilter(4, 2)
kalman.measurementMatrix = np.array([[1, 0, 0, 0], [0, 1, 0, 0]], np.float32)
kalman.transitionMatrix = np.array([[1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0],
    [0, 0, 0, 1]], np.float32)
kalman.processNoiseCov = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0],
    [0, 0, 0, 1]], np.float32) * 0.03
The KalmanFilter class takes optional parameters in its constructor (from the OpenCV documentation):
dynamParams: This parameter states the dimensionality of the state
measureParams: This parameter states the dimensionality of the measurement
controlParams: This parameter states the dimensionality of the control vector
type: This parameter states the type of the created matrices, which should be CV_32F or CV_64F
I found the preceding parameters (both for the constructor and the Kalman filter's properties) to work very well.
From this point on, the program is straightforward: every mouse movement triggers a Kalman prediction, and both the actual position of the mouse and the Kalman prediction are drawn in the frame, which is continuously displayed. If you move your mouse around, you'll notice that, if you make a sudden turn at high speed, the prediction line will follow a wider trajectory, which is consistent with the momentum of the mouse movement at the time. Here's a sample result:
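If you want to see the predict/correct cycle at work outside a GUI, the same constant-velocity filter can be sketched with NumPy alone. This is a minimal sketch, not OpenCV's exact implementation: the transition and measurement matrices mirror the ones set on cv2.KalmanFilter above, while the measurement-noise covariance R and the initial covariance P are assumed values chosen for illustration:

```python
import numpy as np

# Constant-velocity model: state = [x, y, vx, vy], measurement = [x, y].
F = np.array([[1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], np.float64)   # transitionMatrix
H = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0]], np.float64)   # measurementMatrix
Q = np.eye(4) * 0.03                       # processNoiseCov
R = np.eye(2)                              # measurement noise (assumed)

x = np.zeros(4)   # state estimate
P = np.eye(4)     # estimate covariance (assumed initial value)

def predict():
    global x, P
    x = F @ x
    P = F @ P @ F.T + Q
    return x[:2]

def correct(z):
    global x, P
    S = H @ P @ H.T + R                # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)     # Kalman gain
    x = x + K @ (np.asarray(z) - H @ x)
    P = (np.eye(4) - K @ H) @ P

# Feed a mouse-like trajectory moving steadily towards the lower right.
for t in range(1, 21):
    correct([10.0 * t, 5.0 * t])
    pred = predict()

print(pred)
```

After twenty steps of steady motion, the predicted position extrapolates slightly ahead of the last measurement of (200, 100), which is exactly the behavior that makes the prediction line overshoot on sudden turns.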
A real-life example – tracking pedestrians
Up to this point, we have familiarized ourselves with the concepts of motion detection, object detection, and object tracking, so I imagine you are anxious to put this newfound knowledge to good use in a real-life scenario. Let's do just that by examining the video feed of a surveillance camera and tracking pedestrians in it.
First of all, we need a sample video; if you download the OpenCV source, you will find the perfect video file for this purpose in <opencv_dir>/samples/data/768x576.avi.
Now that we have the perfect asset to analyze, let's start building the application.
The application workflow
The application will adhere to the following logic:
1. Examine the first frame.
2. Examine the following frames and perform background subtraction to identify the pedestrians that are in the scene at the start of the video.
3. Establish an ROI per pedestrian, and use Kalman/CAMShift to track it, giving an ID to each pedestrian.
4. Examine the next frames for new pedestrians entering the scene.
If this were a real-world application, you would probably store pedestrian information to derive statistics such as how long a pedestrian typically remains in the scene and their most likely routes. However, this is all beyond the remit of this example application.
In a real-world application, you would also make sure to identify new pedestrians entering the scene, but for now, we'll focus on tracking those objects that are in the scene at the start of the video, utilizing the CAMShift and Kalman filter algorithms.
You will find the code for this application in chapter8/surveillance_demo/ of the code repository.
A brief digression – functional versus object-oriented programming
Although most programmers are familiar with (or work constantly in) Object-oriented Programming (OOP), I have found that, the more the years pass, the more I prefer Functional Programming (FP) solutions.
For those not familiar with the terminology, FP is a programming paradigm, adopted by many languages, that treats programs as the evaluation of mathematical functions, allows functions to return functions, and permits functions as arguments to other functions. The strength of FP lies not only in what it can do, but also in what it avoids, or aims to avoid: side effects and changing state. If the topic of functional programming has sparked an interest, make sure to check out languages such as Haskell, Clojure, or ML.
Note
What is a side effect in programming terms? You can define a side effect as any change a function makes to state that does not depend on the function's input. Python, along with many other languages, is susceptible to causing side effects because—much like, for example, JavaScript—it allows access to global variables (and sometimes this access to global variables can be accidental!).
Another major issue encountered with languages that are not purely functional is the fact that a function's result will change over time, depending on the state of the variables involved. If a function takes an object as an argument—for example—and the computation relies on the internal state of that object, the function will return different results according to the changes in the object's state. This is something that very typically happens in languages such as C and C++, in functions where one or more of the arguments are references to objects.
Why this digression? Because so far I have illustrated concepts using mostly functions; I did not shy away from accessing global variables where this was the simplest and most robust approach. However, the next program we will examine contains OOP. So why do I choose to adopt OOP while advocating FP? Because OpenCV has quite an opinionated approach, which makes it hard to implement a program in a purely functional or purely object-oriented style.
For example, any drawing function, such as cv2.rectangle and cv2.circle, modifies the argument passed into it. This approach contravenes one of the cardinal rules of functional programming, which is to avoid side effects and changing state.
Out of curiosity, you could—in Python—redeclare the API of these drawing functions in a way that is more FP-friendly. For example, you could rewrite cv2.rectangle like this:
def drawRect(frame, topLeft, bottomRight, color, thickness, fill=cv2.LINE_AA):
    newframe = frame.copy()
    cv2.rectangle(newframe, topLeft, bottomRight, color, thickness, fill)
    return newframe
This approach—while computationally more expensive due to the copy() operation—allows the explicit reassignment of a frame, like so:
ret, frame = camera.read()
frame = drawRect(frame, (0, 0), (10, 10), (0, 255, 0), 1)
To conclude this digression, I will reiterate a belief very often voiced in programming forums and resources: there is no such thing as the best language or paradigm, only the best tool for the job at hand.
So let's get back to our program and explore the implementation of a surveillance application, tracking moving objects in a video.
The Pedestrian class
The main rationale behind the creation of a Pedestrian class is the nature of the Kalman filter. The Kalman filter can predict the position of an object based on historical observations and correct the prediction based on the actual data, but it can only do that for one object.
As a consequence, we need one Kalman filter per tracked object.
So the Pedestrian class will act as a holder for a Kalman filter, a color histogram (calculated on the first detection of the object and used as a reference for the subsequent frames), and information about the region of interest, which will be used by the CAMShift algorithm (the track_window parameter).
Furthermore, we store the ID of each pedestrian for some fancy real-time info.
Let's take a look at the Pedestrian class:
class Pedestrian():
    """Pedestrian class

    each pedestrian is composed of a ROI, an ID and a Kalman filter
    so we create a Pedestrian class to hold the object state
    """
    def __init__(self, id, frame, track_window):
        """init the pedestrian object with track window coordinates"""
        # set up the roi
        self.id = int(id)
        x, y, w, h = track_window
        self.track_window = track_window
        self.roi = cv2.cvtColor(frame[y:y+h, x:x+w], cv2.COLOR_BGR2HSV)
        roi_hist = cv2.calcHist([self.roi], [0], None, [16], [0, 180])
        self.roi_hist = cv2.normalize(roi_hist, roi_hist, 0, 255,
            cv2.NORM_MINMAX)

        # set up the kalman
        self.kalman = cv2.KalmanFilter(4, 2)
        self.kalman.measurementMatrix = np.array([[1, 0, 0, 0],
            [0, 1, 0, 0]], np.float32)
        self.kalman.transitionMatrix = np.array([[1, 0, 1, 0], [0, 1, 0, 1],
            [0, 0, 1, 0], [0, 0, 0, 1]], np.float32)
        self.kalman.processNoiseCov = np.array([[1, 0, 0, 0], [0, 1, 0, 0],
            [0, 0, 1, 0], [0, 0, 0, 1]], np.float32) * 0.03
        self.measurement = np.zeros((2, 1), np.float32)
        self.prediction = np.zeros((2, 1), np.float32)
        self.term_crit = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT,
            10, 1)
        self.center = None
        self.update(frame)

    def __del__(self):
        print "Pedestrian %d destroyed" % self.id

    def update(self, frame):
        # print "updating %d" % self.id
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        back_project = cv2.calcBackProject([hsv], [0], self.roi_hist,
            [0, 180], 1)

        if args.get("algorithm") == "c":
            ret, self.track_window = cv2.CamShift(back_project,
                self.track_window, self.term_crit)
            pts = cv2.boxPoints(ret)
            pts = np.int0(pts)
            self.center = center(pts)
            cv2.polylines(frame, [pts], True, 255, 1)

        if not args.get("algorithm") or args.get("algorithm") == "m":
            ret, self.track_window = cv2.meanShift(back_project,
                self.track_window, self.term_crit)
            x, y, w, h = self.track_window
            self.center = center([[x, y], [x+w, y], [x, y+h], [x+w, y+h]])
            cv2.rectangle(frame, (x, y), (x+w, y+h), (255, 255, 0), 1)

        self.kalman.correct(self.center)
        prediction = self.kalman.predict()
        cv2.circle(frame, (int(prediction[0]), int(prediction[1])), 4,
            (0, 255, 0), -1)

        # fake shadow
        cv2.putText(frame, "ID: %d -> %s" % (self.id, self.center),
            (11, (self.id + 1) * 25 + 1),
            font, 0.6,
            (0, 0, 0),
            1,
            cv2.LINE_AA)
        # actual info
        cv2.putText(frame, "ID: %d -> %s" % (self.id, self.center),
            (10, (self.id + 1) * 25),
            font, 0.6,
            (0, 255, 0),
            1,
            cv2.LINE_AA)
At the core of the program lies the background subtractor object, which lets us identify regions of interest corresponding to moving objects.
When the program starts, we take each of these regions and instantiate a Pedestrian object, passing in the ID (a simple counter), the frame, and the track window coordinates (so we can extract the Region of Interest (ROI) and, from this, the HSV histogram of the ROI).
The constructor function (__init__ in Python) is more or less an aggregation of all the previous concepts: given an ROI, we calculate its histogram, set up a Kalman filter, and assign it to a property (self.kalman) of the object.
In the update method, we pass in the current frame and convert it to HSV so that we can calculate the back projection of the pedestrian's HSV histogram.
We then use either CAMShift or Meanshift (depending on the argument passed; Meanshift is the default if no argument is passed) to track the movement of the pedestrian, and correct the Kalman filter for that pedestrian with the actual position.
We also draw both CAMShift/Meanshift (with a surrounding rectangle) and Kalman (with a dot), so you can observe that Kalman and CAMShift/Meanshift go nearly hand in hand, except for sudden movements that cause Kalman to readjust.
Lastly, we print some pedestrian information in the top-left corner of the image.
The main program
Now that we have a Pedestrian class holding all the specific information for each object, let's take a look at the main function of the program.
First, we load a video (it could be a webcam), and then we initialize a background subtractor, setting 20 frames as its history (the frames affecting the background model):
history = 20
bs = cv2.createBackgroundSubtractorKNN(detectShadows=True)
bs.setHistory(history)
We also create the main display window, and then set up a pedestrians dictionary and a firstFrame flag, which we're going to use to allow a few frames for the background subtractor to build its history, so it can better identify moving objects. To help with this, we also set up a frame counter:
cv2.namedWindow("surveillance")
pedestrians = {}
firstFrame = True
frames = 0
Now we start the loop. We read the camera frames (or video frames) one by one:
while True:
    print "-------------------- FRAME %d --------------------" % frames
    grabbed, frame = camera.read()
    if (grabbed is False):
        print "failed to grab frame."
        break
We let BackgroundSubtractorKNN build the history for the background model, so we don't actually process the first 20 frames; we only pass them into the subtractor:
    fgmask = bs.apply(frame)

    # this is just to let the background subtractor build a bit of history
    if frames < history:
        frames += 1
        continue
Then we process the frame with the approach explained earlier in the chapter, applying dilation and erosion to the foreground mask so as to obtain easily identifiable blobs and their bounding boxes. These are obviously the moving objects in the frame:
    th = cv2.threshold(fgmask.copy(), 127, 255, cv2.THRESH_BINARY)[1]
    th = cv2.erode(th, cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3)),
        iterations=2)
    dilated = cv2.dilate(th, cv2.getStructuringElement(cv2.MORPH_ELLIPSE,
        (8, 3)), iterations=2)
    image, contours, hier = cv2.findContours(dilated, cv2.RETR_EXTERNAL,
        cv2.CHAIN_APPROX_SIMPLE)
Once the contours are identified, we instantiate one pedestrian per contour for the first frame only (note that I set a minimum area for the contour to further denoise our detection):
    counter = 0
    for c in contours:
        if cv2.contourArea(c) > 500:
            (x, y, w, h) = cv2.boundingRect(c)
            cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 1)
            # only create pedestrians in the first frame, then just
            # follow the ones you have
            if firstFrame is True:
                pedestrians[counter] = Pedestrian(counter, frame,
                    (x, y, w, h))
            counter += 1
Then, for each pedestrian detected, we call the update method, passing in the current frame, which is needed in its original color space, because the pedestrian objects are responsible for drawing their own information (text, Meanshift/CAMShift rectangles, and Kalman filter tracking):
    for i, p in pedestrians.iteritems():
        p.update(frame)
We set the firstFrame flag to False, so we don't instantiate any more pedestrians; we just keep track of the ones we have:
    firstFrame = False
    frames += 1
Finally, we show the result in the display window. The program can be exited by pressing the Esc key:
    cv2.imshow("surveillance", frame)
    if cv2.waitKey(110) & 0xff == 27:
        break

if __name__ == "__main__":
    main()
There you have it: CAMShift/Meanshift working in tandem with the Kalman filter to track moving objects. All being well, you should obtain a result similar to this:
In this screenshot, the blue rectangle is the CAMShift detection and the green rectangle is the Kalman filter prediction, with its center at the blue circle.
Where do we go from here?
This program constitutes a basis for your application domain's needs. There are many improvements that can be made by building on this program to suit an application's additional requirements. Consider the following examples:
You could destroy a pedestrian object if Kalman predicts its position to be outside the frame
You could check whether each detected moving object corresponds to an existing pedestrian instance, and, if not, create an instance for it
You could train an SVM and run classification on each moving object to establish whether or not the moving object is of the nature you intend to track (for instance, a dog might enter the scene, but your application requires tracking humans only)
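The first of these suggestions only needs a bounds check on the Kalman prediction. Here is a minimal sketch (the function name and the 800x600 frame size are hypothetical, chosen for illustration only):

```python
def prediction_out_of_frame(prediction, frame_width, frame_height):
    """Return True when a predicted (x, y) position falls outside the frame,
    in which case the corresponding Pedestrian instance could be destroyed."""
    x, y = float(prediction[0]), float(prediction[1])
    return x < 0 or y < 0 or x >= frame_width or y >= frame_height

# A pedestrian predicted at x = -12 has presumably left the scene:
print(prediction_out_of_frame((-12.0, 240.0), 800, 600))  # True
# One predicted inside the bounds is kept:
print(prediction_out_of_frame((400.0, 300.0), 800, 600))  # False
```

In the main loop, you could then iterate over the pedestrians dictionary and drop the entries for which this check returns True.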
Whatever your needs, hopefully this chapter will have provided you with the necessary knowledge to build applications that satisfy your requirements.
Summary
This chapter explored the vast and complex topic of video analysis and tracking objects.
We learned about video background subtraction with a basic motion detection technique that calculates frame differences, and then moved on to more complex and efficient tools such as BackgroundSubtractor.
We then explored two very important video analysis algorithms: Meanshift and CAMShift. In the course of this, we talked in detail about color histograms and back projections. We also familiarized ourselves with the Kalman filter and its usefulness in a computer vision context. Finally, we put all our knowledge together in a sample surveillance application, which tracks moving objects in a video.
Now that our foundation in OpenCV and machine learning is solidifying, we are ready to tackle artificial neural networks and dive deeper into artificial intelligence with OpenCV and Python in the next chapter.
Chapter 9. Neural Networks with OpenCV – an Introduction
Machine learning is a branch of artificial intelligence, one that deals specifically with algorithms that enable a machine to recognize patterns and trends in data and successfully make predictions and classifications.
Many of the algorithms and techniques used by OpenCV to accomplish some of the more advanced tasks in computer vision are directly related to artificial intelligence and machine learning.
This chapter will introduce you to Machine Learning concepts in OpenCV, such as artificial neural networks. It is a gentle introduction that barely scratches the surface of the vast and continuously evolving world of Machine Learning.
Artificial neural networks
Let's start by defining Artificial Neural Networks (ANN) with a number of logical steps, rather than with a classic monolithic sentence using obscure jargon with an even more obscure meaning.
First of all, an ANN is a statistical model. What is a statistical model? A statistical model is a pair of elements, namely the space S (a set of observations) and the probability P, where P is a distribution that approximates S (in other words, a function that would generate a set of observations very similar to S).
I like to think of P in two ways: as a simplification of a complex scenario, and as the function that generated S in the first place, or at the very least a set of observations very similar to S.
So ANNs are models that take a complex reality, simplify it, and deduce a function to (approximately) represent, in mathematical form, the statistical observations one would expect from that reality.
The next step in our journey towards comprehending ANNs is to understand how an ANN improves on the concept of a simple statistical model.
What if the function that generated the dataset is likely to take a large number of (unknown) inputs?
The approach that ANNs take is to delegate work to a number of neurons, nodes, or units, each of which is capable of "approximating" the function that created the inputs. In mathematics, approximation means defining a simpler function that approximates a more complex one, which enables us to define errors (relative to the application domain). Furthermore, for accuracy's sake, a network is generally recognized to be neural if its neurons or units are capable of approximating a nonlinear function.
Let's take a closer look at neurons.
Neurons and perceptrons
The perceptron is a concept that dates back to the 1950s, and (to put it simply) a perceptron is a function that takes a number of inputs and produces a single value.
Each of the inputs has an associated weight that signifies the importance of the input in the function. The sigmoid function produces a single value:
Here, sigmoid indicates that the function produces either a 0 or a 1 value. The discriminant is a threshold value: if the weighted sum of the inputs is greater than a certain threshold, the perceptron produces a binary classification of 1, otherwise 0.
How are these weights determined, and what do they represent?
Neurons are interconnected with each other, and each neuron's set of weights (these are just numerical parameters) defines the strength of its connections to other neurons. These weights are "adaptive", meaning they change over time according to a learning algorithm.
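The description above can be condensed into a few lines of plain Python. The weights and the threshold below are arbitrary values chosen for illustration; in a real network, they would be learned:

```python
def perceptron(inputs, weights, threshold):
    """A threshold unit: output 1 if the weighted sum of the
    inputs exceeds the threshold, 0 otherwise."""
    weighted_sum = sum(i * w for i, w in zip(inputs, weights))
    return 1 if weighted_sum > threshold else 0

weights = [0.5, -0.6, 0.2]   # one weight per input, signifying its importance

print(perceptron([1.0, 0.0, 1.0], weights, 0.5))  # 0.7 > 0.5, so 1
print(perceptron([0.0, 1.0, 1.0], weights, 0.5))  # -0.4 <= 0.5, so 0
```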
The structure of an ANN
Here's a visual representation of a neural network:
As you can see from the figure, there are three distinct layers in a neural network: the Input layer, the Hidden (or middle) layer, and the Output layer.
There can be more than one hidden layer; however, one hidden layer is enough to resolve the majority of real-life problems.
Network layers by example
How do we determine the network's topology, and how many neurons to create for each layer? Let's make this determination layer by layer.
The input layer
The input layer defines the number of inputs into the network. For example, let's say you want to create an ANN that will help you determine what animal you're looking at, given a description of its attributes. Let's fix these attributes to weight, length, and teeth. That's a set of three attributes; our network will need to contain three input nodes.
The output layer
The size of the output layer is equal to the number of classes we identified. Continuing with the preceding example of an animal classification network, we will arbitrarily set the output layer to 4, because we know we're going to deal with the following animals: dog, condor, dolphin, and dragon. If we feed in data for an animal that is not in one of these categories, the network will return the class most likely to resemble this unclassified animal.
The hidden layer
The hidden layer contains the perceptrons. As mentioned, the vast majority of problems only require one hidden layer; mathematically speaking, there is no verified reason to have more than two hidden layers. We will, therefore, stick to one hidden layer and work with that.
There are a number of rules of thumb to determine the number of neurons contained in the hidden layer, but there is no hard-and-fast rule. The empirical way is your friend in this particular circumstance: test your network with different settings, and choose the one that fits best.
These are some of the most common rules used when building an ANN:
The number of hidden neurons should be between the size of the input layer and the size of the output layer. If the difference between the input layer size and the output layer size is large, it is my experience that a hidden layer size much closer to that of the output layer is preferable.
For relatively small input layers, the number of hidden neurons is two-thirds the size of the input layer plus the size of the output layer, or less than twice the size of the input layer.
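The second rule of thumb is easy to turn into arithmetic. The helper below is hypothetical, shown only to make the rule explicit; note that it suggests 6 hidden neurons for a 3-input, 4-output network, whereas the text settles on 8 after experimentation, which underlines that these are heuristics, not laws:

```python
def hidden_size_rule(input_size, output_size):
    """Second rule of thumb: two-thirds the size of the input layer plus the
    size of the output layer, capped at twice the input layer size."""
    estimate = (2 * input_size) // 3 + output_size
    return min(estimate, 2 * input_size)

# For a 3-input, 4-output network like the animal classifier in this chapter:
print(hidden_size_rule(3, 4))     # 6
# For a 784-input, 10-output network like the MNIST classifier later on:
print(hidden_size_rule(784, 10))  # 532
```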
One very important factor to keep in mind is overfitting. Overfitting occurs when the hidden layer holds such an inordinate amount of information (for example, a disproportionate number of neurons) compared to the information provided by the training data that classification is not very meaningful.
The larger the hidden layer, the more training information is required for the network to be trained properly. And, needless to say, this is going to lengthen the time required to train the network properly.
So, following the second rule of thumb illustrated earlier, our network will have a hidden layer of size 8, simply because, after a few runs of the network, I found that size to yield the best results. As a side note, the empirical approach is very much encouraged in the world of ANNs. The best network topology is related to the type of data fed to the network, so don't refrain from testing ANNs in a trial-and-error fashion.
In summary, our network has the following sizes:
Input: 3
Hidden: 8
Output: 4
The learning algorithms
There are a number of learning algorithms used by ANNs, but we can identify three major ones:
Supervised learning: With this algorithm, we want to obtain from the ANN a function that describes the data we labeled. We know, a priori, the nature of this data, and we delegate to the ANN the process of finding a function that describes it.
Unsupervised learning: This algorithm differs from supervised learning in that the data is unlabeled. This implies that we don't have to select and label data, but it also means the ANN has a lot more work to do. The classification of the data is usually obtained through techniques such as (but not only) clustering, which we explored in Chapter 7, Detecting and Recognizing Objects.
Reinforcement learning: Reinforcement learning is a little more complex. A system receives an input; a decision-making mechanism determines an action, which is performed and scored (success/failure and grades in between); and finally the input and the action are paired with their score, so the system learns to repeat or change the action to be performed for a certain input or state.
Now that we have a general idea of what ANNs are, let's see how OpenCV implements them, and how to put them to good use. Finally, we'll work our way up to a full-blown application, in which we will attempt to recognize handwritten digits.
ANNs in OpenCV
Unsurprisingly, ANNs reside in the ml module of OpenCV.
Let's examine a dummy example, as a gentle introduction to ANNs:
import cv2
import numpy as np

ann = cv2.ml.ANN_MLP_create()
ann.setLayerSizes(np.array([9, 5, 9], dtype=np.uint8))
ann.setTrainMethod(cv2.ml.ANN_MLP_BACKPROP)

ann.train(np.array([[1.2, 1.3, 1.9, 2.2, 2.3, 2.9, 3.0, 3.2, 3.3]],
    dtype=np.float32),
    cv2.ml.ROW_SAMPLE,
    np.array([[0, 0, 0, 0, 0, 1, 0, 0, 0]], dtype=np.float32))

print ann.predict(np.array([[1.4, 1.5, 1.2, 2., 2.5, 2.8, 3., 3.1, 3.8]],
    dtype=np.float32))
First, we create an ANN:
ann = cv2.ml.ANN_MLP_create()
You may wonder about the MLP acronym in the function name; it stands for multilayer perceptron. By now, you should know what a perceptron is.
After creating the network, we need to set its topology:
ann.setLayerSizes(np.array([9, 5, 9], dtype=np.uint8))
ann.setTrainMethod(cv2.ml.ANN_MLP_BACKPROP)
The layer sizes are defined by the NumPy array that is passed into the setLayerSizes method. The first element sets the size of the input layer, the last element sets the size of the output layer, and all intermediary elements define the sizes of the hidden layers.
We then set the train method to backpropagation. There are two choices here: BACKPROP and RPROP.
Both BACKPROP and RPROP are backpropagation algorithms—in simple terms, algorithms that adjust weights based on errors in classification.
These two algorithms work in the context of supervised learning, which is what we are using in the example. How can we tell this particular detail? Look at the next statement:
ann.train(np.array([[1.2, 1.3, 1.9, 2.2, 2.3, 2.9, 3.0, 3.2, 3.3]],
    dtype=np.float32),
    cv2.ml.ROW_SAMPLE,
    np.array([[0, 0, 0, 0, 0, 1, 0, 0, 0]], dtype=np.float32))
You should notice a number of details here. The method looks extremely similar to the train() method of the support vector machine. It takes three parameters: samples, layout, and responses. Only samples is required; the other two are optional.
This reveals the following information:
First, ANN, like SVM, is an OpenCV StatModel (statistical model); train and predict are methods inherited from the base StatModel class.
Second, a statistical model trained with only samples is adopting an unsupervised learning algorithm. If we provide layout and responses, we're in a supervised learning context.
As we're using ANNs, we can specify the type of backpropagation algorithm we're going to use (BACKPROP or RPROP) because—as we said—we're in a supervised learning environment.
So what is backpropagation? Backpropagation is a two-phase algorithm: a forward pass computes the predictions and their error, and a backward pass then propagates that error back through the network, from the output layer towards the input layer, updating the neuron weights accordingly.
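For intuition, here is the two-phase cycle written out for a single sigmoid neuron in plain Python. This is a bare gradient-descent sketch, not OpenCV's implementation; the inputs, initial weights, target, and learning rate are arbitrary illustrative values, and a real MLP repeats the backward phase layer by layer:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

inputs = [1.0, 0.5]
weights = [0.4, -0.2]
target = 1.0
learning_rate = 0.5

for step in range(500):
    # forward phase: compute the neuron's output and its error
    output = sigmoid(sum(i * w for i, w in zip(inputs, weights)))
    error = output - target
    # backward phase: push the error back through the sigmoid to each weight
    # (gradient of the squared error with respect to each weight)
    grad = error * output * (1.0 - output)
    weights = [w - learning_rate * grad * i for w, i in zip(weights, inputs)]

final_output = sigmoid(sum(i * w for i, w in zip(inputs, weights)))
print(final_output)  # after repeated updates, close to the target of 1.0
```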
Let's train the ANN; as we specified an input layer of size 9, we need to provide 9 inputs, and 9 outputs to reflect the size of the output layer:
ann.train(np.array([[1.2, 1.3, 1.9, 2.2, 2.3, 2.9, 3.0, 3.2, 3.3]],
    dtype=np.float32),
    cv2.ml.ROW_SAMPLE,
    np.array([[0, 0, 0, 0, 0, 1, 0, 0, 0]], dtype=np.float32))
The structure of the response is simply an array of zeros, with a 1 value in the position of the class we want to associate the input with. In our preceding example, we indicated that the specified input array corresponds to class 5 of classes 0 to 8 (classes are zero-indexed).
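Encoding and decoding this response format takes only two small helpers (the function names here are made up for illustration):

```python
def one_hot(cls, num_classes):
    """Build a response row: all zeros, with a 1 at the class index."""
    response = [0.0] * num_classes
    response[cls] = 1.0
    return response

def decode(response_row):
    """Recover the class index: the position of the largest value."""
    return max(range(len(response_row)), key=lambda i: response_row[i])

print(one_hot(5, 9))   # [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
# Decoding a row of per-class outputs picks the strongest response:
print(decode([-0.06, -0.13, -0.17, -0.19, 0.10, 0.89, 0.05, 0.18, 0.13]))  # 5
```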
Lastly, we perform classification:
print ann.predict(np.array([[1.4, 1.5, 1.2, 2., 2.5, 2.8, 3., 3.1, 3.8]],
    dtype=np.float32))
This will yield the following result:
(5.0, array([[-0.06419383, -0.13360272, -0.1681568, -0.18708915, 0.0970564,
    0.89237726, 0.05093023, 0.17537238, 0.13388439]], dtype=float32))
This means that the provided input was classified as belonging to class 5. This is only a dummy example and the classification is pretty meaningless; however, the network behaved correctly. In this code, we only provided one training record, for class 5, so the network classified a new input as belonging to class 5.
As you may have guessed, the output of a prediction is a tuple, with the first value being the class and the second being an array containing the output values for each class; the predicted class has the highest value. Let's move on to a slightly more useful example: animal classification.
ANN-imal classification
Picking up from where we left off, let's illustrate a very simple example of an ANN that attempts to classify animals based on their statistics (weight, length, and teeth). My intent is to describe a mock real-life scenario to improve our understanding of ANNs before we start applying them to computer vision and, specifically, OpenCV:
import cv2
import numpy as np
from random import randint

animals_net = cv2.ml.ANN_MLP_create()
animals_net.setTrainMethod(cv2.ml.ANN_MLP_RPROP |
    cv2.ml.ANN_MLP_UPDATE_WEIGHTS)
animals_net.setActivationFunction(cv2.ml.ANN_MLP_SIGMOID_SYM)
animals_net.setLayerSizes(np.array([3, 8, 4]))
animals_net.setTermCriteria((cv2.TERM_CRITERIA_EPS |
    cv2.TERM_CRITERIA_COUNT, 10, 1))

"""Input arrays
weight, length, teeth
"""

"""Output arrays
dog, condor, dolphin and dragon
"""

def dog_sample():
    return [randint(5, 20), 1, randint(38, 42)]

def dog_class():
    return [1, 0, 0, 0]

def condor_sample():
    return [randint(3, 13), 3, 0]

def condor_class():
    return [0, 1, 0, 0]

def dolphin_sample():
    return [randint(30, 190), randint(5, 15), randint(80, 100)]

def dolphin_class():
    return [0, 0, 1, 0]

def dragon_sample():
    return [randint(1200, 1800), randint(15, 40), randint(110, 180)]

def dragon_class():
    return [0, 0, 0, 1]

SAMPLES = 5000
for x in range(0, SAMPLES):
    print "Samples %d/%d" % (x, SAMPLES)
    animals_net.train(np.array([dog_sample()], dtype=np.float32),
        cv2.ml.ROW_SAMPLE, np.array([dog_class()], dtype=np.float32))
    animals_net.train(np.array([condor_sample()], dtype=np.float32),
        cv2.ml.ROW_SAMPLE, np.array([condor_class()], dtype=np.float32))
    animals_net.train(np.array([dolphin_sample()], dtype=np.float32),
        cv2.ml.ROW_SAMPLE, np.array([dolphin_class()], dtype=np.float32))
    animals_net.train(np.array([dragon_sample()], dtype=np.float32),
        cv2.ml.ROW_SAMPLE, np.array([dragon_class()], dtype=np.float32))

print animals_net.predict(np.array([dog_sample()], dtype=np.float32))
print animals_net.predict(np.array([condor_sample()], dtype=np.float32))
print animals_net.predict(np.array([dragon_sample()], dtype=np.float32))
There are a good few differences between this example and the dummy example, so let's examine them in order.
First, the usual imports. Then, we import randint, just because we want to generate some relatively random data:
import cv2
import numpy as np
from random import randint
Then, we create the ANN. This time, we specify the train method to be resilient backpropagation (an improved version of backpropagation) and the activation function to be a sigmoid function:
animals_net = cv2.ml.ANN_MLP_create()
animals_net.setTrainMethod(cv2.ml.ANN_MLP_RPROP |
    cv2.ml.ANN_MLP_UPDATE_WEIGHTS)
animals_net.setActivationFunction(cv2.ml.ANN_MLP_SIGMOID_SYM)
animals_net.setLayerSizes(np.array([3, 8, 4]))
Also, we specify the termination criteria, similarly to the way we did with the CAMShift algorithm in the previous chapter:
animals_net.setTermCriteria((cv2.TERM_CRITERIA_EPS |
    cv2.TERM_CRITERIA_COUNT, 10, 1))
Now we need some data. We're not really interested in representing the animals accurately so much as in producing a bunch of records to use as training data. So we define four sample-creation functions and four classification functions that will help us train the network:
"""Input arrays
weight, length, teeth
"""

"""Output arrays
dog, condor, dolphin and dragon
"""

def dog_sample():
    return [randint(5, 20), 1, randint(38, 42)]

def dog_class():
    return [1, 0, 0, 0]

def condor_sample():
    return [randint(3, 13), 3, 0]

def condor_class():
    return [0, 1, 0, 0]

def dolphin_sample():
    return [randint(30, 190), randint(5, 15), randint(80, 100)]

def dolphin_class():
    return [0, 0, 1, 0]

def dragon_sample():
    return [randint(1200, 1800), randint(15, 40), randint(110, 180)]

def dragon_class():
    return [0, 0, 0, 1]
Let's proceed with the creation of our fake animal data; we'll create 5,000 samples per class:
SAMPLES = 5000
for x in range(0, SAMPLES):
    print "Samples %d/%d" % (x, SAMPLES)
    animals_net.train(np.array([dog_sample()], dtype=np.float32),
        cv2.ml.ROW_SAMPLE, np.array([dog_class()], dtype=np.float32))
    animals_net.train(np.array([condor_sample()], dtype=np.float32),
        cv2.ml.ROW_SAMPLE, np.array([condor_class()], dtype=np.float32))
    animals_net.train(np.array([dolphin_sample()], dtype=np.float32),
        cv2.ml.ROW_SAMPLE, np.array([dolphin_class()], dtype=np.float32))
    animals_net.train(np.array([dragon_sample()], dtype=np.float32),
        cv2.ml.ROW_SAMPLE, np.array([dragon_class()], dtype=np.float32))
In the end, we print the results, which yield the following output:
(1.0, array([[1.49817729, 1.60551953, -1.56444871, -0.04313202]],
    dtype=float32))
(1.0, array([[1.49817729, 1.60551953, -1.56444871, -0.04313202]],
    dtype=float32))
(3.0, array([[-1.54576635, -1.68725526, 1.6469276, 2.23223686]],
    dtype=float32))
From these results, we deduce the following:
The network got two out of three samples correct, which is not perfect but serves as a good example to illustrate the importance of all the elements involved in building and training an ANN.
The size of the input layer is very important for creating diversification between the different classes. In our case, we only had three statistics, and there is a relative overlap in features.
The size of the hidden layer needs to be tested. You will find that increasing the number of neurons may improve accuracy up to a point, after which the network will overfit, unless you start compensating with an enormous number of training records.
Definitely avoid having too few records or feeding in a lot of identical records, as the ANN won't learn much from them.
Training epochs
Another important concept in training ANNs is the idea of epochs. A training epoch is one iteration through the training data, after which the data is tested for classification. Most ANNs train over several epochs; you'll find that some of the most common examples of ANNs, classifying handwritten digits, iterate over the training data several hundred times.
I personally suggest you spend a lot of time experimenting with ANNs and the number of epochs until you reach convergence, which means that further iterations will no longer improve (at least not noticeably) the accuracy of the results.
The preceding example can be modified as follows to leverage epochs:
def record(sample, classification):
    return (np.array([sample], dtype=np.float32), np.array([classification],
        dtype=np.float32))

records = []
RECORDS = 5000
for x in range(0, RECORDS):
    records.append(record(dog_sample(), dog_class()))
    records.append(record(condor_sample(), condor_class()))
    records.append(record(dolphin_sample(), dolphin_class()))
    records.append(record(dragon_sample(), dragon_class()))

EPOCHS = 5
for e in range(0, EPOCHS):
    print "Epoch %d:" % e
    for t, c in records:
        animals_net.train(t, cv2.ml.ROW_SAMPLE, c)
Then, we run some tests, starting with the dog class:
dog_results = 0
for x in range(0, 100):
    clas = int(animals_net.predict(np.array([dog_sample()],
        dtype=np.float32))[0])
    print "class: %d" % clas
    if clas == 0:
        dog_results += 1
We repeat this over all the classes and output the results:
print "Dog accuracy: %f" % (dog_results)
print "condor accuracy: %f" % (condor_results)
print "dolphin accuracy: %f" % (dolphin_results)
print "dragon accuracy: %f" % (dragon_results)
Finally, we obtain the following results:
Dog accuracy: 100.000000%
condor accuracy: 0.000000%
dolphin accuracy: 0.000000%
dragon accuracy: 92.000000%
Consider the fact that we're only playing with toy/fake data, and with a small amount of training data and few training iterations; even so, this teaches us quite a lot. We can diagnose the ANN as overfitting towards certain classes, so it's important to improve the quality of the data you feed into the training process.
All that said, it's time for a real-life example: handwritten digit recognition.
Handwritten digit recognition with ANNs

The world of Machine Learning is vast and mostly unexplored, and ANNs are but one of the many concepts related to Machine Learning, which is one of the many subdisciplines of Artificial Intelligence. For the purpose of this chapter, we will only be exploring the concept of ANNs in the context of OpenCV. It is by no means an exhaustive treatise on the subject of Artificial Intelligence.

Ultimately, we're interested in seeing ANNs work in the real world. So let's go ahead and make it happen.
MNIST – the handwritten digit database

One of the most popular resources on the Web for the training of classifiers dealing with OCR and handwritten character recognition is the MNIST database, publicly available at http://yann.lecun.com/exdb/mnist/.

This particular database is a freely available resource to kick-start the creation of a program that utilizes ANNs to recognize handwritten digits.
Customized training data

It is always possible to build your own training data. It will take a little bit of patience, but it's fairly easy: collect a vast number of handwritten digits and create images containing a single digit, making sure all the images are the same size and in grayscale.

After this, you will have to create a mechanism that keeps a training sample in sync with the expected classification.
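As a minimal sketch of such a mechanism (the make_record helper and the 28x28 size are illustrative assumptions, not part of the chapter's code; any consistent size works), each sample can be paired with a one-hot classification vector in the row format that OpenCV's ANN expects:

```python
import numpy as np

def make_record(image, digit, num_classes=10):
    # Flatten the image into a single-row sample and build a one-hot
    # label row, keeping sample and classification together.
    sample = np.array([image.ravel()], dtype=np.float32)
    label = np.zeros((1, num_classes), dtype=np.float32)
    label[0, digit] = 1.0
    return sample, label

# A blank 28x28 stand-in for a scanned handwritten digit labeled "3".
img = np.zeros((28, 28), dtype=np.uint8)
sample, label = make_record(img, digit=3)
print(sample.shape)          # (1, 784)
print(int(label.argmax()))   # 3
```

A list of such (sample, label) pairs is then easy to shuffle, split, and feed to the training loop.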
The initial parameters

Let's take a look at the individual layers in the network:

Input layer
Hidden layer
Output layer
The input layer

Since we're going to utilize the MNIST database, the input layer will have a size of 784 input nodes: that's because MNIST samples are 28x28 pixel images, which means 784 pixels.
The hidden layer

As we have seen, there's no hard-and-fast rule for the size of the hidden layer. Through several attempts, I've found that 50 to 60 nodes yield the best results while not necessitating an inordinate amount of training data.

You can increase the size of the hidden layer with the amount of data, but beyond a certain point there will be no advantage to that; you will also have to be prepared for your network to take hours to train (the more hidden neurons, the longer it takes to train the network).
The output layer

The output layer will have a size of 10. This should not be a surprise, as we want to classify 10 digits (0-9).
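Conceptually, classifying a sample then means reading the 10 output nodes and taking the index of the strongest response. A minimal sketch, with made-up output values:

```python
import numpy as np

# Hypothetical responses of the 10 output nodes for one sample;
# the predicted digit is the index of the strongest response.
output = np.array([0.01, 0.02, 0.05, 0.90, 0.10,
                   0.00, 0.03, 0.20, 0.01, 0.04], dtype=np.float32)
predicted_digit = int(np.argmax(output))
print(predicted_digit)  # 3
```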
Training epochs

We will initially use the entire set of train data from MNIST, which consists of over 60,000 handwritten images, half of which were written by US government employees and the other half by high-school students. That's a lot of data, so we won't need more than one epoch to achieve an acceptably high accuracy of detection.

From there on, it is up to you to train the network iteratively on the same train data; my suggestion is that you use an accuracy test and find the epoch at which the accuracy "peaks". By doing so, you will have a precise measurement of the highest possible accuracy achieved by your network given its current configuration.
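The "find the peak" procedure can be sketched in a few lines: measure accuracy on the test data after each epoch and pick the epoch where it tops out (the accuracy numbers below are made up for illustration):

```python
# Hypothetical per-epoch accuracies, measured on held-out test data
# after each training epoch.
accuracies = [0.71, 0.83, 0.88, 0.91, 0.90, 0.89]

# The "peak" epoch is the one after which further training stops helping.
best_epoch = max(range(len(accuracies)), key=lambda e: accuracies[e])
print(best_epoch)              # 3
print(accuracies[best_epoch])  # 0.91
```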
Other parameters

We will use a sigmoid activation function, Resilient Back Propagation (RPROP), and extend the termination criteria for each calculation to 20 iterations, instead of the 10 we used for every other operation in this book that involved cv2.TermCriteria.
Note: Important notes on train data and ANN libraries

Exploring the Internet for sources, I found an amazing article by Michael Nielsen at http://neuralnetworksanddeeplearning.com/chap1.html, which illustrates how to write an ANN library from scratch. The code for this library is freely available on GitHub at https://github.com/mnielsen/neural-networks-and-deep-learning; it is the source code for a book, Neural Networks and Deep Learning, by Michael Nielsen.

In the data folder, you will find a pickle file: data that has been saved to disk through the popular Python library cPickle, which makes loading and saving Python data a trivial task.

This pickle file is a cPickle-serialized version of the MNIST data and, as it is so useful and ready to work with, I strongly suggest you use it. Nothing stops you from loading the MNIST dataset yourself, but the process of deserializing the training data is quite tedious and, strictly speaking, outside the remit of this book.

Second, I would like to point out that OpenCV is not the only Python library that allows you to use ANNs, not by any stretch of the imagination. The Web is full of alternatives, which I strongly encourage you to try out: most notably PyBrain, a library called Lasagne (which, as an Italian, I find exceptionally attractive), and many custom-written implementations, such as the aforementioned one by Michael Nielsen.

Enough introductory details, though. Let's get going.
Mini-libraries

Setting up an ANN in OpenCV is not difficult, but you will almost certainly find yourself training your network countless times, in search of that elusive percentage point that boosts the accuracy of your results.

To automate this as much as possible, we will build a mini-library that wraps OpenCV's native implementation of ANNs and lets us rerun and retrain the network easily.

Here's an example of a wrapper library:
import cv2
import cPickle
import numpy as np
import gzip

def load_data():
    mnist = gzip.open('./data/mnist.pkl.gz', 'rb')
    training_data, classification_data, test_data = cPickle.load(mnist)
    mnist.close()
    return (training_data, classification_data, test_data)

def wrap_data():
    tr_d, va_d, te_d = load_data()
    training_inputs = [np.reshape(x, (784, 1)) for x in tr_d[0]]
    training_results = [vectorized_result(y) for y in tr_d[1]]
    training_data = zip(training_inputs, training_results)
    validation_inputs = [np.reshape(x, (784, 1)) for x in va_d[0]]
    validation_data = zip(validation_inputs, va_d[1])
    test_inputs = [np.reshape(x, (784, 1)) for x in te_d[0]]
    test_data = zip(test_inputs, te_d[1])
    return (training_data, validation_data, test_data)

def vectorized_result(j):
    e = np.zeros((10, 1))
    e[j] = 1.0
    return e

def create_ANN(hidden=20):
    ann = cv2.ml.ANN_MLP_create()
    ann.setLayerSizes(np.array([784, hidden, 10]))
    ann.setTrainMethod(cv2.ml.ANN_MLP_RPROP)
    ann.setActivationFunction(cv2.ml.ANN_MLP_SIGMOID_SYM)
    ann.setTermCriteria((cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT,
        20, 1))
    return ann

def train(ann, samples=10000, epochs=1):
    tr, val, test = wrap_data()
    for x in xrange(epochs):
        counter = 0
        for img in tr:
            if counter > samples:
                break
            if counter % 1000 == 0:
                print "Epoch %d: Trained %d/%d" % (x, counter, samples)
            counter += 1
            data, digit = img
            ann.train(np.array([data.ravel()], dtype=np.float32),
                cv2.ml.ROW_SAMPLE, np.array([digit.ravel()],
                dtype=np.float32))
        print "Epoch %d complete" % x
    return ann, test

def test(ann, test_data):
    sample = np.array(test_data[0][0].ravel(),
        dtype=np.float32).reshape(28, 28)
    cv2.imshow("sample", sample)
    cv2.waitKey()
    print ann.predict(np.array([test_data[0][0].ravel()],
        dtype=np.float32))

def predict(ann, sample):
    resized = sample.copy()
    rows, cols = resized.shape
    if (rows != 28 or cols != 28) and rows * cols > 0:
        resized = cv2.resize(resized, (28, 28),
            interpolation=cv2.INTER_CUBIC)
    return ann.predict(np.array([resized.ravel()], dtype=np.float32))
Let’sexamineitinorder.First,theload_data,wrap_data,andvectorized_resultfunctionsareincludedinMichaelNielsen’scodeforloadingthepicklefile.
It’sarelativelystraightforwardloadingofapicklefile.Mostnotably,though,theloadeddatahasbeensplitintothetrainandtestdata.Bothtrainandtestdataarearrayscontainingtwo-elementtuples:thefirstoneisthedataitself;thesecondoneistheexpectedclassification.SowecanusethetraindatatotraintheANNandthetestdatatoevaluateitsaccuracy.
Thevectorized_resultfunctionisaverycleverfunctionthat—givenanexpectedclassification—createsa10-elementarrayofzeros,settingasingle1fortheexpectedresult.This10-elementarray,youmayhaveguessed,willbeusedasaclassificationfortheoutputlayer.
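For instance, asking for the vector of the digit 7 yields a column of zeros with a single 1 in position 7 (the function is reproduced here so the snippet runs on its own):

```python
import numpy as np

def vectorized_result(j):
    # 10-element column of zeros with a single 1 at index j.
    e = np.zeros((10, 1))
    e[j] = 1.0
    return e

v = vectorized_result(7)
print(v.ravel())  # [0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]
```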
The first ANN-related function is create_ANN:

def create_ANN(hidden=20):
    ann = cv2.ml.ANN_MLP_create()
    ann.setLayerSizes(np.array([784, hidden, 10]))
    ann.setTrainMethod(cv2.ml.ANN_MLP_RPROP)
    ann.setActivationFunction(cv2.ml.ANN_MLP_SIGMOID_SYM)
    ann.setTermCriteria((cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT,
        20, 1))
    return ann

This function creates an ANN specifically geared towards handwritten digit recognition with MNIST, by specifying the layer sizes illustrated in The initial parameters section.
We now need a training function:

def train(ann, samples=10000, epochs=1):
    tr, val, test = wrap_data()
    for x in xrange(epochs):
        counter = 0
        for img in tr:
            if counter > samples:
                break
            if counter % 1000 == 0:
                print "Epoch %d: Trained %d/%d" % (x, counter, samples)
            counter += 1
            data, digit = img
            ann.train(np.array([data.ravel()], dtype=np.float32),
                cv2.ml.ROW_SAMPLE, np.array([digit.ravel()],
                dtype=np.float32))
        print "Epoch %d complete" % x
    return ann, test
Again, this is quite simple: given a number of samples and training epochs, we load the data, and then iterate through the samples for the specified number of epochs.

The important section of this function is the deconstruction of a single training record into the train data and an expected classification, which are then passed into the ANN.
To do so, we utilize the numpy array function ravel(), which takes an array of any shape and "flattens" it into a single-row array. So, for example, consider this array:

data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

The preceding array, once "raveled", becomes the following array:

[1, 2, 3, 4, 5, 6, 7, 8, 9]

This is the format that OpenCV's ANN expects data to look like in its train() method.
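The flattening step can be verified in a couple of lines:

```python
import numpy as np

data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
flat = data.ravel()
print(flat)  # [1 2 3 4 5 6 7 8 9]

# Wrapping the flat array in one more pair of brackets gives the
# single-row shape that train() expects for one sample.
row = np.array([flat], dtype=np.float32)
print(row.shape)  # (1, 9)
```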
Finally, we return both the network and the test data. We could have returned just the network, but having the test data at hand for accuracy checking is quite useful.
The last function we need is a predict() function to wrap the ANN's own predict() method:

def predict(ann, sample):
    resized = sample.copy()
    rows, cols = resized.shape
    if (rows != 28 or cols != 28) and rows * cols > 0:
        resized = cv2.resize(resized, (28, 28),
            interpolation=cv2.INTER_CUBIC)
    return ann.predict(np.array([resized.ravel()], dtype=np.float32))
This function takes an ANN and a sample image; it performs a minimum of "sanitization" by making sure the shape of the data is as expected, resizing it if it's not, and then raveling it for a successful prediction.

The file I created also contains a test() function, which verifies that the network works and displays the sample provided for classification.
The main file

This whole chapter has been an introductory journey leading us to this point. In fact, many of the techniques we're going to use are from previous chapters, so in a way the entire book has led us to this point. So let's put all our knowledge to good use.

Let's take an initial look at the file, and then decompose it for a better understanding:
import cv2
import numpy as np
import digits_ann as ANN

def inside(r1, r2):
    x1, y1, w1, h1 = r1
    x2, y2, w2, h2 = r2
    if (x1 > x2) and (y1 > y2) and (x1+w1 < x2+w2) and (y1+h1 < y2+h2):
        return True
    else:
        return False

def wrap_digit(rect):
    x, y, w, h = rect
    padding = 5
    hcenter = x + w/2
    vcenter = y + h/2
    if (h > w):
        w = h
        x = hcenter - (w/2)
    else:
        h = w
        y = vcenter - (h/2)
    return (x-padding, y-padding, w+padding, h+padding)

ann, test_data = ANN.train(ANN.create_ANN(56), 20000)
font = cv2.FONT_HERSHEY_SIMPLEX

path = "./images/numbers.jpg"
img = cv2.imread(path, cv2.IMREAD_UNCHANGED)
bw = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
bw = cv2.GaussianBlur(bw, (7, 7), 0)
ret, thbw = cv2.threshold(bw, 127, 255, cv2.THRESH_BINARY_INV)
thbw = cv2.erode(thbw, np.ones((2, 2), np.uint8), iterations=2)

image, cntrs, hier = cv2.findContours(thbw.copy(), cv2.RETR_TREE,
    cv2.CHAIN_APPROX_SIMPLE)

rectangles = []

for c in cntrs:
    r = x, y, w, h = cv2.boundingRect(c)
    a = cv2.contourArea(c)
    b = (img.shape[0] - 3) * (img.shape[1] - 3)
    is_inside = False
    for q in rectangles:
        if inside(r, q):
            is_inside = True
            break
    if not is_inside:
        if not a == b:
            rectangles.append(r)

for r in rectangles:
    x, y, w, h = wrap_digit(r)
    cv2.rectangle(img, (x, y), (x+w, y+h), (0, 255, 0), 2)
    roi = thbw[y:y+h, x:x+w]
    try:
        digit_class = int(ANN.predict(ann, roi.copy())[0])
    except:
        continue
    cv2.putText(img, "%d" % digit_class, (x, y-1), font, 1, (0, 255, 0))

cv2.imshow("thbw", thbw)
cv2.imshow("contours", img)
cv2.imwrite("sample.jpg", img)
cv2.waitKey()
After the usual initial imports, we import the mini-library we created, which is stored in digits_ann.py.

I find it good practice to define functions at the top of the file, so let's examine those. The inside() function determines whether a rectangle is entirely contained in another rectangle:
def inside(r1, r2):
    x1, y1, w1, h1 = r1
    x2, y2, w2, h2 = r2
    if (x1 > x2) and (y1 > y2) and (x1+w1 < x2+w2) and (y1+h1 < y2+h2):
        return True
    else:
        return False
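For example, a rectangle nested inside a larger one tests positive, while the reverse does not (the function is condensed to a single return here so the snippet runs on its own):

```python
def inside(r1, r2):
    # True only if rectangle r1 (x, y, w, h) lies entirely within r2.
    x1, y1, w1, h1 = r1
    x2, y2, w2, h2 = r2
    return (x1 > x2) and (y1 > y2) and (x1+w1 < x2+w2) and (y1+h1 < y2+h2)

print(inside((20, 20, 10, 10), (10, 10, 100, 100)))  # True
print(inside((10, 10, 100, 100), (20, 20, 10, 10)))  # False
```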
The wrap_digit() function takes a rectangle that surrounds a digit, turns it into a square, and centers it on the digit itself, with 5 points of padding to make sure the digit is entirely contained in it:

def wrap_digit(rect):
    x, y, w, h = rect
    padding = 5
    hcenter = x + w/2
    vcenter = y + h/2
    if (h > w):
        w = h
        x = hcenter - (w/2)
    else:
        h = w
        y = vcenter - (h/2)
    return (x-padding, y-padding, w+padding, h+padding)
The point of this function will become clearer later on; let's not dwell on it too much at the moment.
Now,let’screatethenetwork.Wewilluse58hiddennodes,andtrainover20,000samples:
ann,test_data=ANN.train(ANN.create_ANN(58),20000)
Thisisgoodenoughforapreliminarytesttokeepthetrainingtimedowntoaminuteortwo(dependingontheprocessingpowerofyourmachine).Theidealistousethefullsetoftrainingdata(50,000),anditeratethroughitseveraltimes,untilsomeconvergenceisreached(aswediscussedearlier,theaccuracy“peak”).Youwoulddothisbycallingthefollowingfunction:
ann,test_data=ANN.train(ANN.create_ANN(100),50000,30)
We can now prepare the data to test. To do that, we're going to load an image and clean it up a little:

path = "./images/numbers.jpg"
img = cv2.imread(path, cv2.IMREAD_UNCHANGED)
bw = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
bw = cv2.GaussianBlur(bw, (7, 7), 0)
Now that we have a smoothed grayscale image, we can apply a threshold and some morphology operations to make sure the numbers stand out properly from the background and are relatively cleaned up of irregularities, which might throw the prediction operation off:

ret, thbw = cv2.threshold(bw, 127, 255, cv2.THRESH_BINARY_INV)
thbw = cv2.erode(thbw, np.ones((2, 2), np.uint8), iterations=2)
Note: Note the threshold flag, which is for an inverse binary threshold: as the samples of the MNIST database are white on black (and not black on white), we turn the image into a black background with white numbers.
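The effect of the inverse threshold can be sketched in pure numpy, as a conceptual stand-in for the cv2.threshold call above (not a replacement for it):

```python
import numpy as np

# Pixels above the threshold become 0 (black), all others 255 (white),
# producing MNIST-style white digits on a black background.
bw = np.array([[0, 100, 200],
               [255, 50, 130]], dtype=np.uint8)
thbw = np.where(bw > 127, 0, 255).astype(np.uint8)
print(thbw)
# [[255 255   0]
#  [  0 255   0]]
```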
After the morphology operation, we need to identify and separate each number in the picture. To do this, we first identify the contours in the image:

image, cntrs, hier = cv2.findContours(thbw.copy(), cv2.RETR_TREE,
    cv2.CHAIN_APPROX_SIMPLE)
Then, we iterate through the contours and discard all the rectangles that are entirely contained in other rectangles; we append to the list of good rectangles only the ones that are not contained in other rectangles and are also not as wide as the image itself. In some of the tests, findContours yielded the entire image as a contour itself, which meant that no other rectangle passed the inside test:

rectangles = []

for c in cntrs:
    r = x, y, w, h = cv2.boundingRect(c)
    a = cv2.contourArea(c)
    b = (img.shape[0] - 3) * (img.shape[1] - 3)
    is_inside = False
    for q in rectangles:
        if inside(r, q):
            is_inside = True
            break
    if not is_inside:
        if not a == b:
            rectangles.append(r)
Now that we have a list of good rectangles, we can iterate through them and define a region of interest for each of the rectangles we identified:

for r in rectangles:
    x, y, w, h = wrap_digit(r)
This is where the wrap_digit() function we defined at the beginning of the program comes into play: we need to pass a square region of interest to the predictor function; if we simply resized a rectangle into a square, we'd ruin our test data.

You may wonder why. Think of the number one: a rectangle surrounding the number one would be very narrow, especially if it has been drawn without too much of a lean to either side. If you simply resized it to a square, you would "fatten" the number one in such a way that nearly the entire square would turn black, rendering the prediction impossible. Instead, we want to create a square around the identified number, which is exactly what wrap_digit() does.
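Applied to such a tall, narrow rectangle, wrap_digit produces a padded square region (the function is reproduced here so the snippet runs on its own; the input values are chosen so that integer and float division agree):

```python
def wrap_digit(rect):
    x, y, w, h = rect
    padding = 5
    hcenter = x + w / 2
    vcenter = y + h / 2
    if h > w:
        # Taller than wide: widen to a square centered horizontally.
        w = h
        x = hcenter - (w / 2)
    else:
        # Wider than tall: heighten to a square centered vertically.
        h = w
        y = vcenter - (h / 2)
    return (x - padding, y - padding, w + padding, h + padding)

# A narrow box around a handwritten "1": 10 wide, 40 high.
x, y, w, h = wrap_digit((100, 50, 10, 40))
print((x, y, w, h))  # the region is now square: w == h == 45
```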
This approach is quick and dirty; it allows us to draw a square around a number and simultaneously pass that square as a region of interest for the prediction. A purer approach would be to take the original rectangle and "center" it in a square numpy array with rows and columns equal to the larger of the two dimensions of the original rectangle. The reason is that you will notice that some of the squares include tiny bits of adjacent numbers, which can throw the prediction off. With a square created through the np.zeros() function, no impurities will be accidentally dragged in:
    cv2.rectangle(img, (x, y), (x+w, y+h), (0, 255, 0), 2)
    roi = thbw[y:y+h, x:x+w]
    try:
        digit_class = int(ANN.predict(ann, roi.copy())[0])
    except:
        continue
    cv2.putText(img, "%d" % digit_class, (x, y-1), font, 1, (0, 255, 0))
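As an aside, the "purer" approach mentioned above, centering the original rectangle in a square np.zeros() array, can be sketched as follows; center_in_square is a hypothetical helper, not part of the chapter's code:

```python
import numpy as np

def center_in_square(roi):
    # Place the ROI in the middle of a black square whose side equals the
    # larger of the ROI's two dimensions, so no neighboring pixels leak in.
    rows, cols = roi.shape
    side = max(rows, cols)
    square = np.zeros((side, side), dtype=roi.dtype)
    top = (side - rows) // 2
    left = (side - cols) // 2
    square[top:top + rows, left:left + cols] = roi
    return square

roi = np.full((40, 10), 255, dtype=np.uint8)  # a tall, narrow white digit
sq = center_in_square(roi)
print(sq.shape)  # (40, 40)
```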
Once the prediction for the square region is complete, we draw it on the original image:

cv2.imshow("thbw", thbw)
cv2.imshow("contours", img)
cv2.imwrite("sample.jpg", img)
cv2.waitKey()
Andthat’sit!Thefinalresultwilllooksimilartothis:
Possible improvements and potential applications

We have illustrated how to build an ANN, feed it training data, and use it for classification. There are a number of aspects we can improve, depending on the task at hand, and a number of potential applications of our new-found knowledge.
Improvements

There are a number of improvements that can be applied to this approach, some of which we have already discussed:

You could enlarge your dataset and iterate more times, until a performance peak is reached.
You could also experiment with the several activation functions (cv2.ml.ANN_MLP_SIGMOID_SYM is not the only one; there is also cv2.ml.ANN_MLP_IDENTITY and cv2.ml.ANN_MLP_GAUSSIAN).
You could utilize different training flags (cv2.ml.ANN_MLP_UPDATE_WEIGHTS, cv2.ml.ANN_MLP_NO_INPUT_SCALE, cv2.ml.ANN_MLP_NO_OUTPUT_SCALE) and training methods (backpropagation or resilient backpropagation).
Aside from that, bear in mind one of the mantras of software development: there is no single best technology, there is only the best tool for the job at hand. So, careful analysis of the application requirements will lead you to the best choices of parameters. For example, not everyone draws digits the same way. In fact, you will even find that some countries draw numbers in slightly different ways.

The MNIST database was compiled in the US, where the number seven is drawn like the character 7. In Europe, though, the number 7 is often drawn with a small horizontal line halfway through the diagonal portion of the number, which was introduced to distinguish it from the number 1.
Note: For a more detailed overview of regional handwriting variations, check the Wikipedia article on the subject, which is a good introduction, available at https://en.wikipedia.org/wiki/Regional_handwriting_variation.
This means the MNIST database has limited accuracy when applied to European handwriting; some numbers will be classified more accurately than others. So you may end up creating your own dataset. In almost all circumstances, it is preferable to utilize train data that is relevant and belongs to the current application domain.

Finally, remember that once you're happy with the accuracy of your network, you can always save it and reload it later, so it can be utilized in third-party applications without having to train the ANN every time.
Potential applications

The preceding program is only the foundation of a handwriting recognition application. Straight away, you can quickly extend this approach to videos and detect handwritten digits in real time, or you could train your ANN to recognize the entire alphabet for a full-blown OCR system.

Car registration plate detection seems like an obvious extension of the lessons learned to this point, and it should be an even easier domain to work with, as registration plates use consistent characters.

Also, for your own edification or business purposes, you may try to build a classifier with ANNs and plain SVMs (with feature detectors such as SIFT) and see how they benchmark.
Summary

In this chapter, we scratched the surface of the vast and fascinating world of ANNs, focusing on OpenCV's implementation of them. We learned about the structure of ANNs, and how to design a network topology based on application requirements.

Finally, we utilized various concepts that we explored in the previous chapters to build a handwritten digit recognition application.
To boldly go…

I hope you enjoyed the journey through the Python bindings for OpenCV 3. Although covering OpenCV 3 in full would take a series of books, we explored very fascinating and futuristic concepts, and I encourage you to get in touch and let me and the OpenCV community know what your next groundbreaking computer-vision-based project is!