Computer Vision Introduction One Lecture ( Short )

Computer vision, in one lecture. Bill Freeman, Electrical Engineering and Computer Science Dept., Massachusetts Institute of Technology. April 21, 2010


Page 1: Computer Vision Introduction One Lecture ( Short )

Computer vision, in one lecture

Bill Freeman, Electrical Engineering and Computer Science Dept.

Massachusetts Institute of Technology, April 21, 2010

Page 2: Computer Vision Introduction One Lecture ( Short )

The Taiyuan University of Technology Computer Center staff, and me (1987)

Page 3: Computer Vision Introduction One Lecture ( Short )

Me and my wife, riding from the Foreigners' Cafeteria

Page 4: Computer Vision Introduction One Lecture ( Short )

Inside the computer center, with the image processing equipment

Page 5: Computer Vision Introduction One Lecture ( Short )

While in China, I read this book (to be re-issued by MIT Press this year), and got very excited about computer vision. Studied for PhD at MIT.

Page 6: Computer Vision Introduction One Lecture ( Short )

Goal of computer vision

Marr: “To tell what is where by looking”.

Want to:
– Estimate the shapes and properties of things.
– Recognize objects
– Find and recognize people
– Find road lanes and other cars
– Help a robot walk, navigate, or fly.
– Inspect for manufacturing

Page 7: Computer Vision Introduction One Lecture ( Short )

Some particular goals of computer vision

•  Wave a camera around, get a 3-d model out.
•  Capture body pose of actor dancing.
•  Detect and recognize faces.
•  Recognize objects.
•  Track people or objects

Page 8: Computer Vision Introduction One Lecture ( Short )

Let's go back in time, to the mid-1980's

Page 9: Computer Vision Introduction One Lecture ( Short )

What everyone looked like back then

Page 10: Computer Vision Introduction One Lecture ( Short )

Features

•  Points

but also,
•  Lines
•  Conics
•  Other fitted curves

Page 11: Computer Vision Introduction One Lecture ( Short )

Features

“Blocks world”: a toy world in which to study image interpretation. All we have to do is to convert real world images to their blocks world equivalents and we're all set.

Yvan Leclerc and Martin Fischler, An optimization-based approach to the interpretation of single line drawings as 3-d wire frames.

Objects

Page 12: Computer Vision Introduction One Lecture ( Short )

Huttenlocher and Ullman, Object recognition using alignment, ICCV, 1986

Computer vision research results, 1986

Page 13: Computer Vision Introduction One Lecture ( Short )

From Rothwell et al, Efficient model library access by projectively invariant indexing functions, CVPR 1992.

6 years later: Recognizing planar objects using invariants.

[Figure panels: Input image; Edge points fitted with lines or conics; Objects that have been recognized and verified.]

Computer vision research results, 1992

Page 14: Computer Vision Introduction One Lecture ( Short )

Back to the present…

Page 15: Computer Vision Introduction One Lecture ( Short )

Companies and applications

•  Cognex
•  Poseidon
•  Mobileye
•  Eyetoy
•  Identix
•  Google
•  Microsoft
•  Face recognition in cameras

Page 16: Computer Vision Introduction One Lecture ( Short )
Page 17: Computer Vision Introduction One Lecture ( Short )
Page 18: Computer Vision Introduction One Lecture ( Short )
Page 19: Computer Vision Introduction One Lecture ( Short )
Page 20: Computer Vision Introduction One Lecture ( Short )

MobilEye

Page 21: Computer Vision Introduction One Lecture ( Short )
Page 22: Computer Vision Introduction One Lecture ( Short )
Page 23: Computer Vision Introduction One Lecture ( Short )
Page 24: Computer Vision Introduction One Lecture ( Short )

Google

Page 25: Computer Vision Introduction One Lecture ( Short )

Microsoft

Page 26: Computer Vision Introduction One Lecture ( Short )

Microsoft

Page 27: Computer Vision Introduction One Lecture ( Short )

Some particular goals of computer vision (status report)

•  Wave a camera around, get a 3-d model out (almost)
•  Capture body pose of actor dancing. Using multiple cameras (pretty well), using a single camera (not yet)
•  Detect and recognize faces. (frontal, yes)
•  Recognize objects. (working on it, lots of progress)
•  Track people or objects (over short times)

Page 28: Computer Vision Introduction One Lecture ( Short )

What has allowed us to make progress?

•  SIFT features
•  Discriminative classifiers
•  Bayesian methods
•  Large databases

Page 29: Computer Vision Introduction One Lecture ( Short )

What has allowed us to make progress?

•  SIFT features
•  Discriminative classifiers
•  Bayesian methods
•  Large databases

Page 30: Computer Vision Introduction One Lecture ( Short )

Building a Panorama

M. Brown and D. G. Lowe. Recognising Panoramas. ICCV 2003

Page 31: Computer Vision Introduction One Lecture ( Short )

How do we build a panorama?

•  We need to match (align) images

http://www.wisdom.weizmann.ac.il/~deniss/vision_spring04/files/InvariantFeatures.ppt

Page 32: Computer Vision Introduction One Lecture ( Short )

Matching with Features

•  Detect feature points in both images

http://www.wisdom.weizmann.ac.il/~deniss/vision_spring04/files/InvariantFeatures.ppt

Page 33: Computer Vision Introduction One Lecture ( Short )

Matching with Features

•  Detect feature points in both images
•  Find corresponding pairs

http://www.wisdom.weizmann.ac.il/~deniss/vision_spring04/files/InvariantFeatures.ppt

Page 34: Computer Vision Introduction One Lecture ( Short )

Matching with Features

•  Detect feature points in both images
•  Find corresponding pairs
•  Use these pairs to align images (we know this)

http://www.wisdom.weizmann.ac.il/~deniss/vision_spring04/files/InvariantFeatures.ppt

Page 35: Computer Vision Introduction One Lecture ( Short )

Matching with Features

•  Problem 1:
– Detect the same point independently in both images

Counter-example: no chance to match!

We need a repeatable detector

http://www.wisdom.weizmann.ac.il/~deniss/vision_spring04/files/InvariantFeatures.ppt

Page 36: Computer Vision Introduction One Lecture ( Short )

Matching with Features

•  Problem 2:
– For each point correctly recognize the corresponding one

We need a reliable and distinctive descriptor

http://www.wisdom.weizmann.ac.il/~deniss/vision_spring04/files/InvariantFeatures.ppt

Page 37: Computer Vision Introduction One Lecture ( Short )

Overview of feature detection for (instance) object recognition

[Figure labels: detector location; descriptor]

Note: here viewpoint is different, not panorama (they show off)

•  Detector: detect same scene points independently in both images
•  Descriptor: encode local neighboring window
– Note how scale & rotation of window are the same in both images (but computed independently)
•  Correspondence: find most similar descriptor in other image
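The correspondence step above can be sketched as a brute-force nearest-neighbor search over descriptors. This is a minimal illustration, assuming descriptors are plain lists of floats and that Euclidean distance is the similarity measure; the function name `nearest_neighbor_matches` and the toy 3-dimensional descriptors are hypothetical, not taken from the lecture.

```python
import math

def nearest_neighbor_matches(desc_a, desc_b):
    """For each descriptor in desc_a, find the closest descriptor in desc_b.

    Returns a list of (index_in_a, index_in_b, distance) tuples.
    """
    matches = []
    for i, da in enumerate(desc_a):
        best_j, best_d = None, math.inf
        for j, db in enumerate(desc_b):
            d = math.dist(da, db)  # Euclidean distance between descriptors
            if d < best_d:
                best_j, best_d = j, d
        matches.append((i, best_j, best_d))
    return matches

# Toy descriptors (real SIFT descriptors have 128 dimensions).
desc_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
desc_b = [[0.0, 0.9, 0.1], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0]]
matches = nearest_neighbor_matches(desc_a, desc_b)
```

Brute force is quadratic in the number of features; large databases would need an approximate search structure instead.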

Page 38: Computer Vision Introduction One Lecture ( Short )

CVPR 2003 Tutorial

Recognition and Matching Based on Local Invariant Features

David Lowe, Computer Science Department

University of British Columbia

http://www.cs.ubc.ca/~lowe/papers/ijcv04.pdf

Page 39: Computer Vision Introduction One Lecture ( Short )

Invariant Local Features

•  Image content is transformed into local feature coordinates that are invariant to translation, rotation, scale, and other imaging parameters

SIFT Features

Page 40: Computer Vision Introduction One Lecture ( Short )

Freeman et al, 1998, http://people.csail.mit.edu/billf/papers/cga1.pdf

Page 41: Computer Vision Introduction One Lecture ( Short )

Advantages of invariant local features

•  Locality: features are local, so robust to occlusion and clutter (no prior segmentation)
•  Distinctiveness: individual features can be matched to a large database of objects
•  Quantity: many features can be generated for even small objects
•  Efficiency: close to real-time performance
•  Extensibility: can easily be extended to wide range of differing feature types, with each adding robustness

Page 42: Computer Vision Introduction One Lecture ( Short )

SIFT vector formation

•  Computed on rotated and scaled version of window according to computed orientation & scale
– resample a 16x16 version of the window
•  Based on gradients weighted by a Gaussian with standard deviation half the window width (for smooth falloff)

Page 43: Computer Vision Introduction One Lecture ( Short )

SIFT vector formation

•  4x4 array of gradient orientation histograms
– not really a histogram, weighted by magnitude
•  8 orientations x 4x4 array = 128 dimensions
•  Motivation: some sensitivity to spatial layout, but not too much.

(Figure shows only 2x2 here, but the array is 4x4.)

Page 44: Computer Vision Introduction One Lecture ( Short )

SIFT vector formation

•  Thresholded image gradients are sampled over 16x16 array of locations in scale space
•  Create array of orientation histograms
•  8 orientations x 4x4 histogram array = 128 dimensions

(Figure shows only 2x2 here, but the array is 4x4.)
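The 16x16 to 4x4x8 layout described above can be sketched as follows. This is a simplified illustration, not Lowe's implementation: it assumes the window has already been rotated and rescaled, takes precomputed gradient components as 16x16 nested lists, and omits the Gaussian weighting and trilinear interpolation. The function name `sift_like_descriptor` is hypothetical.

```python
import math

def sift_like_descriptor(gx, gy):
    """Build a 128-dim vector: a 4x4 grid of cells, 8 orientation bins each.

    gx, gy: 16x16 nested lists holding the x and y gradient at each sample.
    Each sample votes into its cell's orientation bin, weighted by magnitude.
    """
    hist = [0.0] * (4 * 4 * 8)
    for r in range(16):
        for c in range(16):
            magnitude = math.hypot(gx[r][c], gy[r][c])
            angle = math.atan2(gy[r][c], gx[r][c]) % (2 * math.pi)
            orientation_bin = int(angle / (2 * math.pi) * 8) % 8
            cell = (r // 4) * 4 + (c // 4)  # which of the 4x4 spatial cells
            hist[cell * 8 + orientation_bin] += magnitude
    return hist

# A window whose gradient points along +x everywhere with unit magnitude.
gx = [[1.0] * 16 for _ in range(16)]
gy = [[0.0] * 16 for _ in range(16)]
desc = sift_like_descriptor(gx, gy)
```

With hard bin assignment, as here, a small shift of the window can move a sample across a cell boundary and change the vector abruptly. Smoothing the votes across neighboring bins reduces that effect.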

Page 45: Computer Vision Introduction One Lecture ( Short )

Ensure smoothness

•  Gaussian weight
•  Trilinear interpolation
– a given gradient contributes to 8 bins: 4 in space times 2 in orientation

Page 46: Computer Vision Introduction One Lecture ( Short )

Reduce effect of illumination

•  128-dim vector normalized to 1
•  Threshold gradient magnitudes to avoid excessive influence of high gradients
– after normalization, clamp entries > 0.2 to 0.2
– renormalize
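The normalize, clamp, renormalize sequence can be written in a few lines. This is a minimal sketch assuming the descriptor is a list of non-negative floats; the function names are hypothetical, and the 0.2 clamp value is the one quoted on the slide.

```python
import math

def normalize(v):
    """Scale a vector to unit Euclidean length (left unchanged if all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def illumination_normalize(desc, clamp=0.2):
    """Normalize, clamp large entries, then renormalize."""
    v = normalize(desc)              # cancels a uniform contrast change
    v = [min(x, clamp) for x in v]   # limits the influence of very large gradients
    return normalize(v)

# One dominant entry: clamping reduces its share of the final vector.
out = illumination_normalize([10.0, 1.0, 1.0, 1.0])
```

In this toy vector the dominant entry drops from about 0.99 of the unit vector before clamping to about 0.76 afterwards, so the smaller entries carry more of the comparison.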

Page 47: Computer Vision Introduction One Lecture ( Short )

Feature stability to noise

•  Match features after random change in image scale & orientation, with differing levels of image noise
•  Find nearest neighbor in database of 30,000 features

Page 48: Computer Vision Introduction One Lecture ( Short )

Feature stability to affine change

•  Match features after random change in image scale & orientation, with 2% image noise, and affine distortion
•  Find nearest neighbor in database of 30,000 features

Page 49: Computer Vision Introduction One Lecture ( Short )

Distinctiveness of features

•  Vary size of database of features, with 30 degree affine change, 2% image noise
•  Measure % correct for single nearest neighbor match

Page 50: Computer Vision Introduction One Lecture ( Short )
Page 51: Computer Vision Introduction One Lecture ( Short )
Page 52: Computer Vision Introduction One Lecture ( Short )

These feature point detectors and descriptors are the most important recent advance in computer vision and graphics.

•  Feature points are used also for:
– Image alignment (homography, fundamental matrix)
– 3D reconstruction
– Motion tracking
– Object recognition
– Indexing and database retrieval
– Robot navigation
– … other

Page 53: Computer Vision Introduction One Lecture ( Short )

More uses for SIFT features

SIFT features have also been applied to (categorical) object recognition

First, let's present some of the issues in object recognition.

Page 54: Computer Vision Introduction One Lecture ( Short )

Intra-class variation

Slide from: Li Fei-Fei, Rob Fergus and Antonio Torralba, short course on object recognition, http://people.csail.mit.edu/torralba/shortCourseRLOC/

Page 55: Computer Vision Introduction One Lecture ( Short )

Object recognition issues

– Generative / discriminative / hybrid

Slide from: Li Fei-Fei, Rob Fergus and Antonio Torralba, short course on object recognition, http://people.csail.mit.edu/torralba/shortCourseRLOC/

Page 56: Computer Vision Introduction One Lecture ( Short )

Object recognition issues

– Generative / discriminative / hybrid
– Appearance only or location and appearance

Slide from: Li Fei-Fei, Rob Fergus and Antonio Torralba, short course on object recognition, http://people.csail.mit.edu/torralba/shortCourseRLOC/

Page 57: Computer Vision Introduction One Lecture ( Short )

Object recognition issues

– Generative / discriminative / hybrid
– Appearance only or location and appearance
– Invariances
•  Viewpoint
•  Illumination
•  Occlusion
•  Scale
•  Deformation
•  Clutter
•  etc.

Slide from: Li Fei-Fei, Rob Fergus and Antonio Torralba, short course on object recognition, http://people.csail.mit.edu/torralba/shortCourseRLOC/

Page 58: Computer Vision Introduction One Lecture ( Short )

Object recognition issues

– Generative / discriminative / hybrid
– Appearance only or location and appearance
– Invariances
– Parts or global w/ sub-window
– Use set of features or each pixel in image

Slide from: Li Fei-Fei, Rob Fergus and Antonio Torralba, short course on object recognition, http://people.csail.mit.edu/torralba/shortCourseRLOC/

Page 59: Computer Vision Introduction One Lecture ( Short )

Current approaches in object recognition

•  Bag of words
•  Boosting
•  Label transfer

Page 60: Computer Vision Introduction One Lecture ( Short )

Visual words

•  Vector quantize SIFT descriptors to a vocabulary of 2 or 3 thousand “visual words”.
•  Heuristic design of descriptors makes these words somewhat invariant to:
– Lighting
– 2-d Orientation
– 3-d Viewpoint

Page 61: Computer Vision Introduction One Lecture ( Short )

Object recognition using visual words

[Figure steps: Find words; Form histograms; Compare with object class database]
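The three steps (find words, form histograms, compare with the database) can be sketched as below. This is an illustration under assumptions: the vocabulary is taken as already learned (for example by clustering training descriptors), and histogram intersection is used for the comparison, which is one common choice rather than something the slide specifies. All names and the tiny 2-dimensional vocabulary are hypothetical.

```python
import math

def quantize(descriptor, vocabulary):
    """Return the index of the nearest visual word (vector quantization)."""
    return min(range(len(vocabulary)),
               key=lambda k: math.dist(descriptor, vocabulary[k]))

def word_histogram(descriptors, vocabulary):
    """Count how often each visual word occurs, normalized to sum to 1."""
    hist = [0] * len(vocabulary)
    for d in descriptors:
        hist[quantize(d, vocabulary)] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist

def histogram_intersection(h1, h2):
    """Similarity in [0, 1] for normalized histograms (1 means identical)."""
    return sum(min(a, b) for a, b in zip(h1, h2))

# Toy vocabulary of 3 words; a real one would hold thousands of 128-dim words.
vocabulary = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
descriptors = [[0.1, 0.0], [0.9, 0.1], [1.0, 0.0], [0.0, 0.8]]
hist = word_histogram(descriptors, vocabulary)
```

Note that the histogram discards where each word occurred in the image, which is the "appearance only" end of the trade-off listed under object recognition issues.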

Page 62: Computer Vision Introduction One Lecture ( Short )

Many combinatorial matching problems to be solved for object recognition.

Instance recognition: with features allowed to appear or not in both the test and training examples.

Deformable object recognition: some feature clusters maintain spatial coherence, others can vary.

Category recognition: each class defined by many different training set exemplars. Find the class that best explains the observed feature set.

Semi-supervised object recognition: observed training set features include many background object features.

http://www-cvr.ai.uiuc.edu/ponce_grp/publication/paper/cvpr06b.pdf

http://www.cs.utexas.edu/~grauman/research/projects/pmk/pmk_projectpage.htm

Page 63: Computer Vision Introduction One Lecture ( Short )

Caltech 101

Page 64: Computer Vision Introduction One Lecture ( Short )

Caltech 101 results over time

Page 65: Computer Vision Introduction One Lecture ( Short )

Problem: Category level recognition using visual words representation.

Applications: Object recognition.

References: Lazebnik, Schmid, and Ponce, Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories, Computer Vision and Pattern Recognition (CVPR 2006), http://www-cvr.ai.uiuc.edu/ponce_grp/publication/paper/cvpr06b.pdf

K. Grauman and T. Darrell. Unsupervised Learning of Categories from Sets of Partially Matching Image Features. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), New York City, NY, June 2006, http://www.cs.utexas.edu/~grauman/papers/grauman_darrell_cvpr2006.pdf

Page 66: Computer Vision Introduction One Lecture ( Short )

What has allowed us to make progress?

•  SIFT features
•  Discriminative classifiers: SVMs and boosting
•  Bayesian methods
•  Large databases

Page 67: Computer Vision Introduction One Lecture ( Short )

Rapid Object Detection Using a Boosted Cascade of Simple Features

Paul Viola, Michael J. Jones, Mitsubishi Electric Research Laboratories (MERL), Cambridge, MA

Most of this work was done at Compaq CRL before the authors moved to MERL

Manuscript available on web: http://citeseer.ist.psu.edu/cache/papers/cs/23183/http:zSzzSzwww.ai.mit.eduzSzpeoplezSzviolazSzresearchzSzpublicationszSzICCV01-Viola-Jones.pdf/viola01robust.pdf

Page 68: Computer Vision Introduction One Lecture ( Short )

Viola-Jones approach

•  Large feature set (… is huge, about 16,000,000 features)
•  Efficient feature selection using AdaBoost
•  Cascaded Classifier for rapid detection
– Hierarchy of Attentional Filters

The combination of these ideas yields the fastest known face detector for gray scale images.

Viola and Jones, Robust object detection using a boosted cascade of simple features, CVPR 2001

Page 69: Computer Vision Introduction One Lecture ( Short )

Image Features

“Rectangle filters”

Similar to Haar wavelets

Differences between sums of pixels in adjacent rectangles

$$h_t(x) = \begin{cases} +1 & \text{if } f_t(x) > \theta_t \\ -1 & \text{otherwise} \end{cases}$$

Unique Features

Viola and Jones, Robust object detection using a boosted cascade of simple features, CVPR 2001

Page 70: Computer Vision Introduction One Lecture ( Short )

Huge “Library” of Filters

Viola and Jones, Robust object detection using a boosted cascade of simple features, CVPR 2001

Page 71: Computer Vision Introduction One Lecture ( Short )

Integral Image

•  Define the Integral Image
•  Any rectangular sum can be computed in constant time
•  Rectangle features can be computed as differences between rectangles

Viola and Jones, Robust object detection using a boosted cascade of simple features, CVPR 2001
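A small sketch of the integral image idea: each entry stores the sum of all pixels above and to the left, so any rectangle sum needs only four lookups. The padding convention (an extra zero row and column) and the function names are choices made for this illustration, and the two-rectangle feature shown is just one example of the rectangle filters.

```python
def integral_image(img):
    """ii[r][c] = sum of img over rows < r and columns < c (zero-padded)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for r in range(h):
        row_sum = 0
        for c in range(w):
            row_sum += img[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii

def rect_sum(ii, top, left, height, width):
    """Sum of the rectangle using four lookups, independent of its size."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]

def two_rect_feature(ii, top, left, height, width):
    """Left half minus right half of a rectangle (width assumed even)."""
    half = width // 2
    return (rect_sum(ii, top, left, height, half)
            - rect_sum(ii, top, left + half, height, half))

img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)
```

Building the table costs one pass over the image; after that, every one of the many rectangle features costs the same few additions regardless of scale.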

Page 72: Computer Vision Introduction One Lecture ( Short )

Constructing classifiers by combining filter outputs

•  Perceptron yields a sufficiently powerful classifier
•  Use AdaBoost to efficiently choose best features

Viola and Jones, Robust object detection using a boosted cascade of simple features, CVPR 2001

Page 73: Computer Vision Introduction One Lecture ( Short )

AdaBoost

Initial uniform weight on training examples

weak classifier 1

Incorrect classifications re-weighted more heavily

weak classifier 2

weak classifier 3

Final classifier is weighted combination of weak classifiers

(Freund & Schapire '95)

Viola and Jones, Robust object detection using a boosted cascade of simple features, CVPR 2001

Page 74: Computer Vision Introduction One Lecture ( Short )

AdaBoost Tutorial

•  Given a weak learning algorithm
– Learner takes a training set and returns the best classifier from a weak concept space
•  required to have error < 50%
•  Starting with a training set (initial weights 1/n)
– Weak learning algorithm returns a classifier
– Reweight the examples
•  Weight on correct examples is decreased
•  Weight on errors is increased (relative to the correct examples, after renormalization)
•  Final classifier is a weighted majority of weak classifiers
– Weak classifiers with low error get larger weight

Viola and Jones, Robust object detection using a boosted cascade of simple features, CVPR 2001

Page 75: Computer Vision Introduction One Lecture ( Short )

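Review of AdaBoost (Freund & Schapire 95)

•  Given examples $(x_1, y_1), \ldots, (x_N, y_N)$ where $y_i = 0, 1$ for negative and positive examples respectively.
•  Initialize weights $w_{1,i} = 1/N$
•  For $t = 1, \ldots, T$
– Normalize the weights, $w_{t,i} = w_{t,i} / \sum_{j=1}^{N} w_{t,j}$
– Find a weak learner, i.e. a hypothesis, $h_t(x)$ with weighted error less than 0.5
– Calculate the error of $h_t$: $e_t = \sum_i w_{t,i} \, |h_t(x_i) - y_i|$
– Update the weights: $w_{t+1,i} = w_{t,i} \, \beta_t^{(1 - d_i)}$ where $\beta_t = e_t / (1 - e_t)$ and $d_i = 0$ if example $x_i$ is classified correctly, $d_i = 1$ otherwise.
•  The final strong classifier is

$$h(x) = \begin{cases} 1 & \text{if } \sum_{t=1}^{T} \alpha_t h_t(x) > 0.5 \sum_{t=1}^{T} \alpha_t \\ 0 & \text{otherwise} \end{cases}$$

where $\alpha_t = \log(1/\beta_t)$

Viola and Jones, Robust object detection using a boosted cascade of simple features, CVPR 2001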
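The loop above can be sketched in plain Python. The assumptions here are mine: the weak learners are threshold "stumps" on a single scalar feature (standing in for one rectangle-filter response), candidate thresholds are the observed feature values, and the error is clipped away from 0 and 1 to keep the logarithm finite. The names `train_adaboost` and `predict` are hypothetical.

```python
import math

def train_adaboost(xs, ys, rounds):
    """xs: scalar feature values, ys: labels in {0, 1}.

    Returns a list of (threshold, polarity, alpha) weak classifiers.
    """
    n = len(xs)
    w = [1.0 / n] * n                       # initial uniform weights
    thresholds = sorted(set(xs))
    classifiers = []
    for _ in range(rounds):
        total = sum(w)
        w = [wi / total for wi in w]        # normalize the weights
        best = None
        for th in thresholds:               # pick the stump with lowest weighted error
            for pol in (1, -1):
                preds = [1 if pol * x > pol * th else 0 for x in xs]
                err = sum(wi * abs(p - y) for wi, p, y in zip(w, preds, ys))
                if best is None or err < best[0]:
                    best = (err, th, pol, preds)
        err, th, pol, preds = best
        err = min(max(err, 1e-10), 1 - 1e-10)   # assumption: avoid log(0)
        beta = err / (1.0 - err)
        # Correctly classified examples are down-weighted by beta.
        w = [wi * (beta if p == y else 1.0) for wi, p, y in zip(w, preds, ys)]
        classifiers.append((th, pol, math.log(1.0 / beta)))
    return classifiers

def predict(classifiers, x):
    """Weighted vote of the weak classifiers against half the total weight."""
    score = sum(a for th, pol, a in classifiers if pol * x > pol * th)
    return 1 if score > 0.5 * sum(a for _, _, a in classifiers) else 0

xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1, 1, 0, 0, 0, 1, 1, 1]   # not separable by any single threshold
model = train_adaboost(xs, ys, rounds=2)
```

On this toy data the first round picks the stump "x > 5" and the second round, after the reweighting, picks "x < 3", which covers the two examples the first stump got wrong.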

Page 76: Computer Vision Introduction One Lecture ( Short )

Example Classifier for Face Detection

ROC curve for 200 feature classifier

One stage: a classifier with 200 rectangle features was learned using AdaBoost

95% correct detection on test set with 1 in 14084 false positives.

Viola and Jones, Robust object detection using a boosted cascade of simple features, CVPR 2001

Page 77: Computer Vision Introduction One Lecture ( Short )

Develop fast, accurate classifier using a cascade

•  Given a nested set of classifier hypothesis classes
•  Computational Risk Minimization

[Figure: plot of % Detection against % False Pos, annotated “vs false neg determined by”]

[Figure: cascade diagram. IMAGE SUB-WINDOW → Classifier 1 → (T) → Classifier 2 → (T) → Classifier 3 → (T) → FACE; an F output at any classifier → NON-FACE]

Viola and Jones, Robust object detection using a boosted cascade of simple features, CVPR 2001

Page 78: Computer Vision Introduction One Lecture ( Short )

Experiment: Simple Cascaded Classifier

Viola and Jones, Robust object detection using a boosted cascade of simple features, CVPR 2001

Page 79: Computer Vision Introduction One Lecture ( Short )

Cascaded Classifier

[Figure: cascade diagram. IMAGE SUB-WINDOW → 1 Feature → (50%) → 5 Features → (20%) → 20 Features → (2%) → FACE; an F output at any stage → NON-FACE]

•  A 1 feature classifier achieves 100% detection rate and about 50% false positive rate.
•  A 5 feature classifier achieves 100% detection rate and 40% false positive rate (20% cumulative)
– using data from previous stage.
•  A 20 feature classifier achieves 100% detection rate with 10% false positive rate (2% cumulative)

Viola and Jones, Robust object detection using a boosted cascade of simple features, CVPR 2001
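The cumulative figures follow from multiplying the per-stage false positive rates, and the speed comes from rejecting most windows at the first stages. The sketch below shows both; the stage functions are hypothetical stand-ins (simple threshold tests on a number), not trained classifiers.

```python
def cumulative_rates(stage_rates):
    """Running product of per-stage false positive rates."""
    out, running = [], 1.0
    for rate in stage_rates:
        running *= rate
        out.append(running)
    return out

def cascade_classify(window, stages):
    """Run stages in order and stop at the first rejection.

    Returns (is_face, number_of_stages_evaluated).
    """
    for n_evaluated, stage in enumerate(stages, start=1):
        if not stage(window):
            return False, n_evaluated
    return True, len(stages)

# Per-stage false positive rates from the slide: 50%, 40%, 10%.
rates = cumulative_rates([0.5, 0.4, 0.1])   # approximately [0.5, 0.2, 0.02]

# Hypothetical stages: progressively stricter tests on a single score.
stages = [lambda s: s > 1, lambda s: s > 5, lambda s: s > 10]
rejected_early = cascade_classify(3, stages)
accepted = cascade_classify(12, stages)
```

Because each stage keeps a 100% detection rate in this example, the cascade's overall detection rate stays at 100% while the false positive rate falls to 2%.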

Page 80: Computer Vision Introduction One Lecture ( Short )

A Real-time Face Detection System

Training faces: 4916 face images (24x24 pixels) plus vertical flips for a total of 9832 faces

Training non-faces: 350 million sub-windows from 9500 non-face images

Final detector: 38 layer cascaded classifier. The number of features per layer was 1, 10, 25, 25, 50, 50, 50, 75, 100, …, 200, …

Final classifier contains 6061 features.

Viola and Jones, Robust object detection using a boosted cascade of simple features, CVPR 2001

Page 81: Computer Vision Introduction One Lecture ( Short )

Accuracy of Face Detector

Performance on MIT+CMU test set containing 130 images with 507 faces and about 75 million sub-windows.

Viola and Jones, Robust object detection using a boosted cascade of simple features, CVPR 2001

Page 82: Computer Vision Introduction One Lecture ( Short )

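Comparison to Other Systems

Detection rate (%) at each number of false detections:

Detector                False detections
                        10    31    50    65    78    95    110   167
Viola-Jones             76.1  88.4  91.4  92.0  92.1  92.9  93.1  93.9
Viola-Jones (voting)    81.1  89.7  92.1  93.1  93.1  93.2  93.7  93.7

Rowley-Baluja-Kanade: 83.2, 86.0, 89.2, 90.1 (reported for only some of the false-detection counts)

Schneiderman-Kanade: 94.4 (reported for a single false-detection count)

Viola and Jones, Robust object detection using a boosted cascade of simple features, CVPR 2001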

Page 83: Computer Vision Introduction One Lecture ( Short )

Speed of Face Detector

Speed is proportional to the average number of features computed per sub-window.

On the MIT+CMU test set, an average of 9 features out of a total of 6061 are computed per sub-window.

On a 700 MHz Pentium III, a 384x288 pixel image takes about 0.067 seconds to process (15 fps).

Roughly 15 times faster than Rowley-Baluja-Kanade and 600 times faster than Schneiderman-Kanade.

Viola and Jones, Robust object detection using a boosted cascade of simple features, CVPR 2001

Page 84: Computer Vision Introduction One Lecture ( Short )

Output of Face Detector on Test Images

Viola and Jones, Robust object detection using a boosted cascade of simple features, CVPR 2001

Page 85: Computer Vision Introduction One Lecture ( Short )

More Examples

Viola and Jones, Robust object detection using a boosted cascade of simple features, CVPR 2001

Page 86: Computer Vision Introduction One Lecture ( Short )

Conclusions

•  We [they] have developed the fastest known face detector for gray scale images
•  Three contributions with broad applicability
– Cascaded classifier yields rapid classification
– AdaBoost as an extremely efficient feature selector
– Rectangle Features + Integral Image can be used for rapid image analysis

Viola and Jones, Robust object detection using a boosted cascade of simple features, CVPR 2001

Page 87: Computer Vision Introduction One Lecture ( Short )

What has allowed us to make progress?

•  SIFT features
•  Discriminative classifiers
•  Bayesian methods
•  Large databases

Page 88: Computer Vision Introduction One Lecture ( Short )

Tracking a human in 3D

Page 89: Computer Vision Introduction One Lecture ( Short )

The appearance of people can vary dramatically.

Page 90: Computer Vision Introduction One Lecture ( Short )

People can appear in arbitrary poses.

Structure is unobservable—inference from visible parts.

Page 91: Computer Vision Introduction One Lecture ( Short )

Geometrically under-constrained.

Page 92: Computer Vision Introduction One Lecture ( Short )

But this requires that we use markers, which we don't want, and also requires multiple cameras.

http://www.vicon.com/animation/

Page 93: Computer Vision Introduction One Lecture ( Short )

State of the Art.

•  Brightness constancy cue
– Insensitive to appearance
•  Full-body required multiple cameras
•  Single hypothesis

Page 94: Computer Vision Introduction One Lecture ( Short )

State of the Art.

I(x, t) = I(x+u, 0) + η

•  Single camera, multiple hypotheses
•  2D templates (no drift but view dependent)

Page 95: Computer Vision Introduction One Lecture ( Short )

State of the Art.

•  Multiple hypotheses
•  Multiple cameras
•  Simplified clothing, lighting and background

Page 96: Computer Vision Introduction One Lecture ( Short )

* No special clothing
* Monocular, grayscale, sequences (archival data)
* Unknown, cluttered, environment

Task: Infer 3D human motion from 2D image

Page 97: Computer Vision Introduction One Lecture ( Short )

$$p(\text{model} \mid \text{cues}) = \frac{p(\text{cues} \mid \text{model}) \, p(\text{model})}{p(\text{cues})}$$

1. Need a constraining likelihood model that is also invariant to variations in human appearance.

2. Need a prior model of how people move.

3. Posterior probability: Need an effective way to explore the model space (very high dimensional) and represent ambiguities.
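As a worked illustration of Bayes' rule over a small discrete set of hypotheses: the posterior is the prior times the likelihood, divided by their sum over all hypotheses. The three "models" and all the numbers below are invented for the example. The tracker in the lecture works over a continuous, high-dimensional pose space instead.

```python
def posterior(prior, likelihood):
    """Bayes' rule over a discrete set of models.

    prior[m] = p(model m), likelihood[m] = p(cues | model m).
    """
    unnormalized = {m: prior[m] * likelihood[m] for m in prior}
    evidence = sum(unnormalized.values())      # p(cues)
    return {m: v / evidence for m, v in unnormalized.items()}

# Hypothetical numbers: walking is likely a priori, but the cues favor running.
prior = {"walking": 0.7, "running": 0.2, "standing": 0.1}
likelihood = {"walking": 0.2, "running": 0.5, "standing": 0.1}
post = posterior(prior, likelihood)   # walking 0.56, running 0.40, standing 0.04
```

Even though the cues favor "running", the strong prior keeps "walking" the most probable hypothesis, which is the role the motion prior plays in the tracker.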

Page 98: Computer Vision Introduction One Lecture ( Short )

System components for human body tracking

•  Representation for probabilistic analysis.
•  Models for human motion (prior term).
•  Models for human appearance (likelihood term).

Page 99: Computer Vision Introduction One Lecture ( Short )

•  Representation for probabilistic analysis.
•  Models for human motion (prior term).
•  Models for human appearance (likelihood term).

Page 100: Computer Vision Introduction One Lecture ( Short )

* Limbs are truncated cones
* Parameter vector of joint angles and angular velocities = φ

Page 101: Computer Vision Introduction One Lecture ( Short )

•  Posterior distribution over model parameters often multi-modal (due to ambiguities)
•  Represent whole distribution:
– sampled representation
– each sample is a pose
– predict over time using a particle filtering approach
•  Isard and Blake, 1998, “Condensation Algorithm”

Page 102: Computer Vision Introduction One Lecture ( Short )

Posterior: Given the data so far, what do I think is the set of possible states the body could be in?

Temporal dynamics: What could each of those states become at the next time step? (Uses prior model for human motion.)

Likelihood: How much is each of those possible states supported by the visual data at the next time step?

Posterior: Update estimate of possible states, given the visual data.
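One cycle of this posterior, dynamics, likelihood, posterior loop can be sketched as a particle filter on a toy problem. Everything specific here is an assumption for illustration: the state is a single scalar position rather than a body pose, the motion prior is constant velocity plus Gaussian noise, and the likelihood is a Gaussian around a direct observation of the position.

```python
import math
import random

def condensation_step(particles, weights, observation, rng,
                      velocity=1.0, process_noise=0.5, obs_noise=1.0):
    """One resample / predict / reweight cycle on a 1-D state."""
    n = len(particles)
    # Posterior so far: draw samples in proportion to their weights.
    resampled = rng.choices(particles, weights=weights, k=n)
    # Temporal dynamics: move each sample with the motion prior plus noise.
    predicted = [p + velocity + rng.gauss(0.0, process_noise) for p in resampled]
    # Likelihood: how well does each predicted state explain the new data?
    likelihood = [math.exp(-0.5 * ((observation - p) / obs_noise) ** 2)
                  for p in predicted]
    total = sum(likelihood)
    if total == 0.0:                       # all samples implausible: fall back to uniform
        new_weights = [1.0 / n] * n
    else:
        new_weights = [l / total for l in likelihood]
    return predicted, new_weights

rng = random.Random(0)
particles = [rng.uniform(-5.0, 5.0) for _ in range(500)]
weights = [1.0 / len(particles)] * len(particles)
for t in range(1, 11):                     # the true position at time t is t
    particles, weights = condensation_step(particles, weights, float(t), rng)
estimate = sum(p * w for p, w in zip(particles, weights))
```

Keeping the whole weighted sample set, rather than a single best guess, is what lets the representation carry several competing hypotheses from frame to frame.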

Page 103: Computer Vision Introduction One Lecture ( Short )

•  Representation for probabilistic analysis.
•  Models for human motion (prior term).
•  Models for human appearance (likelihood term).

Page 104: Computer Vision Introduction One Lecture ( Short )

•  Only handles people walking.
•  Very powerful constraint on human motion.

Page 105: Computer Vision Introduction One Lecture ( Short )

•  Action-specific model: Walking
– Training data: 3D motion capture data
– From training set, learn mean cycle and common modes of deviation (PCA)

[Figure panels: Mean cycle; Small noise; Large noise]

Page 106: Computer Vision Introduction One Lecture ( Short )

Initialize to figure, then let go…

Page 107: Computer Vision Introduction One Lecture ( Short )

•  Representation for probabilistic analysis.
•  Models for human motion (prior term).
•  Models for human appearance (likelihood term).

Page 108: Computer Vision Introduction One Lecture ( Short )

Changing background

Low contrast limb boundaries

Occlusion

Varying shadows

Deforming clothing

What do non-people look like?

What do people look like?

Page 109: Computer Vision Introduction One Lecture ( Short )

(5000 samples in each example)

Page 110: Computer Vision Introduction One Lecture ( Short )

Edge cues

Page 111: Computer Vision Introduction One Lecture ( Short )

Ridge cues

Page 112: Computer Vision Introduction One Lecture ( Short )

Flow cues

Page 113: Computer Vision Introduction One Lecture ( Short )

Edge cues

Ridge cues

Flow cues

Page 114: Computer Vision Introduction One Lecture ( Short )

Walking model

2500 samples ~10 min/frame

Page 115: Computer Vision Introduction One Lecture ( Short )

What has allowed us to make progress?

•  SIFT features
•  Discriminative classifiers
•  Bayesian methods
•  Large datasets
•  Miscellaneous advances: exploiting context

Page 116: Computer Vision Introduction One Lecture ( Short )

Images by Antonio Torralba

Use of context for object detection

car pedestrian

Identical local image features!

Page 117: Computer Vision Introduction One Lecture ( Short )

Context speeds object detection: this is what the world looks like to a face detector that doesn't take advantage of context. Can you find the face?

Antonio Torralba

Page 118: Computer Vision Introduction One Lecture ( Short )
Page 119: Computer Vision Introduction One Lecture ( Short )
Page 120: Computer Vision Introduction One Lecture ( Short )

The best object detection algorithms combine top-down (context) with bottom-up (local features) cues.

The top-down information can help suppress false detections caused by ambiguous local information.

Page 121: Computer Vision Introduction One Lecture ( Short )

Feature vector for an image: the “gist” of the scene

– Compute 12 x 30 = 360 dim. feature vector
– Or use steerable filter bank, 6 orientations, 4 scales, averaged over 4x4 regions = 384 dim. feature vector
– Reduce to ~ 80 dimensions using PCA

Oliva & Torralba, IJCV 2001

Page 122: Computer Vision Introduction One Lecture ( Short )

Low-dimensional representation for image context

[Figure panels: Images; Random noise filtered to have the same 80-dimensional representation as the images above.]

Page 123: Computer Vision Introduction One Lecture ( Short )

“gist” useful for object priming

Page 124: Computer Vision Introduction One Lecture ( Short )

Examples of learned features for bottom-up detection: apply the filter shown in the top rows and average the squared output over the regions shown in the bottom rows.

Page 125: Computer Vision Introduction One Lecture ( Short )

The advantage of context in object detection

For each type of object, we plot the single most probable detection if it is above a threshold (set to give 80% detection rate)

If we know we are in a street, we can prune false positives such as chair and coffee-machine (which are hard to detect, and hence must have low thresholds to get 80% hit rate)

[Figure panels: Object detections without context (note false alarms); Object detections after suppression of false detections using context]

Page 126: Computer Vision Introduction One Lecture ( Short )

What has allowed us to make progress?

•  SIFT features
•  Discriminative classifiers
•  Bayesian methods
•  Large, labeled datasets.

Page 127: Computer Vision Introduction One Lecture ( Short )

A correspondence-based approach to scene parsing

Given an image

– Find another annotated image with similar scene
– Find correspondence between these two images
– Warp the annotation according to the correspondence
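The last step, warping the annotation, can be sketched as a per-pixel lookup through a flow field. This is a toy illustration, not the SIFT Flow system: it assumes the correspondence is already available as an integer offset per query pixel pointing into the annotated image, and that pixels mapping outside the image stay unlabeled. The function name `warp_annotation` is hypothetical.

```python
def warp_annotation(annotation, flow, unlabeled="unlabeled"):
    """Transfer labels from an annotated image to the query image.

    annotation: 2-D grid of labels for the matched (annotated) image.
    flow[r][c] = (dr, dc): query pixel (r, c) corresponds to pixel
    (r + dr, c + dc) in the annotated image (convention assumed here).
    """
    h, w = len(annotation), len(annotation[0])
    warped = [[unlabeled] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            dr, dc = flow[r][c]
            src_r, src_c = r + dr, c + dc
            if 0 <= src_r < h and 0 <= src_c < w:
                warped[r][c] = annotation[src_r][src_c]
    return warped

annotation = [["sky", "sky", "tree"],
              ["road", "road", "car"]]
# Every query pixel matches the pixel one column to its right.
flow = [[(0, 1)] * 3 for _ in range(2)]
parsed = warp_annotation(annotation, flow)
```

The quality of the parse then depends almost entirely on how good the retrieved match and the correspondence are, which is why a large labeled database matters for this approach.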

[Figure panels: Input; Support; User annotation; Warped annotation. Label legend: tree, sky, road, field, car, building, window, unlabeled]

Dense scene alignment using SIFT Flow for object recognition. C. Liu, J. Yuen, A. Torralba. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009.

Page 128: Computer Vision Introduction One Lecture ( Short )

System overview

[Figure: Query (RGB, SIFT); Nearest neighbors (RGB, SIFT, SIFT flow, Annotation); flow visualization code. Label legend: tree, sky, road, field, car, unlabeled]

Dense scene alignment using SIFT Flow for object recognition. C. Liu, J. Yuen, A. Torralba. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009.

Page 129: Computer Vision Introduction One Lecture ( Short )

System overview

[Figure: Query (RGB, SIFT, Parsing, Ground truth); Warped nearest neighbors (SIFT flow, RGB, SIFT, Annotation); flow visualization code. Label legend: tree, sky, road, field, car, unlabeled]

Dense scene alignment using SIFT Flow for object recognition. C. Liu, J. Yuen, A. Torralba. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009.

Page 130: Computer Vision Introduction One Lecture ( Short )

Scene parsing results (1)

[Figure panels: Query; Best match; Annotation of best match; Warped best match to query; Parsing result; Ground truth]

Page 131: Computer Vision Introduction One Lecture ( Short )

Scene parsing results (2)

[Figure panels: Query; Best match; Annotation of best match; Warped best match to query; Parsing result; Ground truth]

Page 132: Computer Vision Introduction One Lecture ( Short )

Pixel-wise performance

Our system, optimized parameters: per-pixel rate 74.75%

[Figure: Pixel-wise frequency count of each class]

Dense scene alignment using SIFT Flow for object recognition. C. Liu, J. Yuen, A. Torralba. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009.

Page 133: Computer Vision Introduction One Lecture ( Short )

Comparison

J. Shotton et al. Textonboost: Joint appearance, shape and context modeling for multi-class object recognition and segmentation. ECCV, 2006

(a) Our system, optimized parameters: 74.75%
(b) Our system, no Markov random field: 66.24%
(c) Shotton et al., no Markov random field: 51.67%
(d) Our system, matching color instead of SIFT: 49.68%

Page 134: Computer Vision Introduction One Lecture ( Short )

Comparison for each class

•  We convert our system to a binary detector for each class and compare it with [Dalal & Triggs. CVPR 2005]
•  In ROC, our system (red) outperforms theirs (blue) for most of the classes

Dense scene alignment using SIFT Flow for object recognition. C. Liu, J. Yuen, A. Torralba. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009.

Page 135: Computer Vision Introduction One Lecture ( Short )

What has allowed us to make progress?

•  SIFT features
•  Discriminative classifiers
•  Bayesian methods
•  Non-parametric methods

Page 136: Computer Vision Introduction One Lecture ( Short )
Page 137: Computer Vision Introduction One Lecture ( Short )

Algorithm

– Pick size of block and size of overlap
– Synthesize blocks in raster order
– Search input texture for block that satisfies overlap constraints (above and left)
•  Easy to optimize using NN search [Liang et al., '01]
– Paste new block into resulting texture
•  use dynamic programming to compute minimal error boundary cut
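The dynamic-programming step at the end of the algorithm can be sketched for a vertical overlap strip. This is an illustration under assumptions: the input is a small grid of per-pixel errors (for example squared differences between the two overlapping blocks), the cut runs top to bottom, and it may shift by at most one column per row. The function name is hypothetical.

```python
def min_error_boundary_cut(err):
    """Find the cheapest top-to-bottom path through an overlap error grid.

    err: rows x cols grid of overlap errors.
    Returns the column of the cut in each row.
    """
    rows, cols = len(err), len(err[0])
    cost = [row[:] for row in err]
    # Forward pass: cheapest cumulative cost of reaching each cell from the top.
    for r in range(1, rows):
        for c in range(cols):
            lo, hi = max(0, c - 1), min(cols, c + 2)
            cost[r][c] += min(cost[r - 1][lo:hi])
    # Backward pass: start at the cheapest bottom cell and trace upward.
    c = min(range(cols), key=lambda k: cost[rows - 1][k])
    path = [c]
    for r in range(rows - 1, 0, -1):
        lo, hi = max(0, c - 1), min(cols, c + 2)
        c = min(range(lo, hi), key=lambda k: cost[r - 1][k])
        path.append(c)
    path.reverse()
    return path

# Low-error cells (value 1) form a zig-zag the cut should follow.
err = [[5, 1, 5],
       [5, 5, 1],
       [5, 1, 5]]
cut = min_error_boundary_cut(err)
```

Pixels on one side of the cut are taken from the existing texture and pixels on the other side from the new block, so the seam passes where the two blocks already agree best.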

Page 138: Computer Vision Introduction One Lecture ( Short )
Page 139: Computer Vision Introduction One Lecture ( Short )
Page 140: Computer Vision Introduction One Lecture ( Short )
Page 141: Computer Vision Introduction One Lecture ( Short )

Problem: How to construct and manage a non-parametric signal prior? How to select the exemplars to use, and how to quickly find nearest neighbor matches?

Applications: Low-level vision: noise removal, super-resolution, filling-in, texture synthesis.

References: W. T. Freeman, E. C. Pasztor, O. T. Carmichael, Learning Low-Level Vision, International Journal of Computer Vision, 40(1), pp. 25-47, 2000. http://www.merl.com/reports/docs/TR2000-05.pdf

Alexei A. Efros and Thomas K. Leung, Texture Synthesis by Non-parametric Sampling, IEEE International Conference on Computer Vision (ICCV'99), Corfu, Greece, September 1999, http://graphics.cs.cmu.edu/people/efros/research/NPS/efros-iccv99.pdf

Page 142: Computer Vision Introduction One Lecture ( Short )

2009 BIRS Workshop on Computer Vision and the Internet

Page 143: Computer Vision Introduction One Lecture ( Short )

Rob Fergus

Rick Szeliski

Lana Lazebnik

Page 144: Computer Vision Introduction One Lecture ( Short )

Nearest neighbor search in high dimensions

Nearest neighbors in high dimensions, for category recognition. For instance recognition, NN for individual features works fine. But for category recognition, many times the local features are not, by themselves, a close match, due to within-class variations.

Nearest neighbor search, but taking into account our particular data. Or, tell us what questions we should be asking about our data in order to do nearest neighbor search well.

On the large database side: how to store memories, concepts, objects in very large databases? Large database issues. Multidimensional: kd-tree (but only up to 20 dims). Finding similar things in very high dimensions.

Parallelism: where can we exploit it? kd-tree high-d search. Does LSH work as advertised? In practice, not as well.
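To make the LSH question concrete, here is a minimal sketch of one locality-sensitive hashing scheme, random-hyperplane hashing, next to exact brute-force search. Assumptions: vectors are plain Python lists, a single hash table is used, and the class and parameter names are hypothetical. Random-hyperplane hashing is tuned to angular similarity, and only the query's own bucket is searched, so the true nearest neighbor can be missed.

```python
import random


def brute_force_nn(query, points):
    """Index of the exact nearest neighbor under squared Euclidean distance."""
    def dist2(p):
        return sum((a - b) ** 2 for a, b in zip(p, query))
    return min(range(len(points)), key=lambda i: dist2(points[i]))


class HyperplaneLSH:
    """Single-table LSH: one bit per random hyperplane (sign of the dot product)."""

    def __init__(self, dim, n_bits=8, seed=0):
        rng = random.Random(seed)
        self.planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]
        self.buckets = {}
        self.points = []

    def _key(self, v):
        return tuple(sum(a * b for a, b in zip(plane, v)) >= 0
                     for plane in self.planes)

    def add(self, v):
        self.points.append(v)
        self.buckets.setdefault(self._key(v), []).append(len(self.points) - 1)

    def query(self, v):
        """Approximate NN: exact search restricted to the query's bucket."""
        candidates = self.buckets.get(self._key(v), [])
        if not candidates:
            return None  # empty bucket: the approximate search found nothing
        best = brute_force_nn(v, [self.points[i] for i in candidates])
        return candidates[best]


rng = random.Random(1)
data = [[rng.gauss(0, 1) for _ in range(32)] for _ in range(50)]
index = HyperplaneLSH(dim=32)
for vector in data:
    index.add(vector)
print(index.query(data[7]), brute_force_nn(data[7], data))
```

Practical systems typically use several hash tables and probe neighboring buckets to raise recall, at the cost of more memory and more candidates to check.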

Page 145: Computer Vision Introduction One Lecture ( Short )

Problem: Nearest neighbor search in high dimensions.

Applications: Non-parametric texture synthesis and super-resolution. Image filling-in. Object recognition. Scene recognition.

References: (Many in CS literature, LSH, etc.)

PatchMatch: A Randomized Correspondence Algorithm for Structural Image Editing, ACM Transactions on Graphics (Proc. SIGGRAPH), August 2009. Connelly Barnes, Eli Shechtman, Adam Finkelstein, Dan B Goldman, http://www.cs.princeton.edu/gfx/pubs/Barnes_2009_PAR/patchmatch.pdf

Page 146: Computer Vision Introduction One Lecture ( Short )

Shai Avidan

Page 147: Computer Vision Introduction One Lecture ( Short )

Blind vision

Page 148: Computer Vision Introduction One Lecture ( Short )

Problem: Develop secure multi-party techniques for vision algorithms.

Applications: Secure, distributed image analysis.

References:

S. Avidan and M. Butman, Blind Vision, European Conference on Computer Vision (ECCV), Graz, Austria, 2006. http://www.merl.com/reports/docs/TR2006-006.pdf

Paper abstract: Alice would like to detect faces in a collection of sensitive surveillance images she owns. Bob has a face detection algorithm that he is willing to let Alice use, for a fee, as long as she learns nothing about his detector. Alice is willing to use Bob's detector provided that he will learn nothing about her images, not even the result of the face detection operation. Blind vision is about applying secure multi-party techniques to vision algorithms so that Bob will learn nothing about the images he operates on, not even the result of his own operation, and Alice will learn nothing about the detector. The proliferation of surveillance cameras raises privacy concerns that can be addressed by secure multi-party techniques and their adaptation to vision algorithms.

Page 149: Computer Vision Introduction One Lecture ( Short )

Deva Ramanan

Page 150: Computer Vision Introduction One Lecture ( Short )

Evaluate easily over a power set of all segmentations.

Deva Ramanan: wants a fast and efficient way to search over all possible segmentations of an image, scoring each one against some model.

http://www.di.ens.fr/~russell/papers/Russell06.pdf

Page 151: Computer Vision Introduction One Lecture ( Short )

Problem: Evaluate some segmentation-dependent function over (some approximation to) all possible segmentations. Note: different than bottom-up segmentation, which I would not recommend as a research project.

Applications: Image understanding.

References: Deva's home page: http://www.ics.uci.edu/~dramanan/

Using Multiple Segmentations to Discover Objects and their Extent in Image Collections, Bryan Russell, Alexei A. Efros, Josef Sivic, Bill Freeman, Andrew Zisserman, in CVPR 2006, http://people.csail.mit.edu/brussell/research/proj/mult_seg_discovery/index.html

Page 152: Computer Vision Introduction One Lecture ( Short )

Alyosha Efros

Page 153: Computer Vision Introduction One Lecture ( Short )

Efros comments

Alyosha: non-boolean retrieval from a large dataset. I.e., it's not logical operations we want to retrieve, but real-valued numbers.

Alyosha: the needle-in-the-haystack problem. Find signal clusters/characteristics when there's lots of noise. Find the patterns, ignore the noise. See the picture of the 4 of us with hats and determine that hats are what's in common.

Alyosha: we need to find something new to generalize from graphical models. Those were good for toy problems where there were lots of conditional independencies. But now we don't have that. Want some other model. Something that provides the abstraction, maybe, that only a few of these conditional independencies are active at any one time (like sparse coding). Sort of similar to higher order cliques.

Page 154: Computer Vision Introduction One Lecture ( Short )

David Lowe

Page 155: Computer Vision Introduction One Lecture ( Short )

David Lowe

Need better features. An artist can draw the end of an elephant's trunk, and you know immediately what it is. But our features don't capture that similarity at all.

Learning of features from images. What is a natural encoding of images? As a warning for what approach not to take: don't bother learning translation invariance, or rotation invariance. So a little bit of supervision is ok.

Page 156: Computer Vision Introduction One Lecture ( Short )

Computer vision academic culture

No more “if only” papers

End-to-end empirical orientation
There is a certain overhead in coming up to speed on the filters and representations.
Need dataset validation
The competitive conferences have 20-25% acceptance rate. Other conferences have little impact.
The competitive conferences: CVPR, ICCV, ECCV, NIPS.

Thus: best to collaborate.

Page 157: Computer Vision Introduction One Lecture ( Short )

People at MIT to work with

Edward Adelson: Brain and Cognitive Sciences, material perception in humans and machines; multi-resolution image representations.
Fredo Durand: EECS, computational photography, computer graphics.
Bill Freeman: EECS, computational photography, computer vision.
John Fisher: CSAIL, machine learning, computer vision.
Polina Golland: EECS, medical applications.
Eric Grimson: EECS, surveillance, medical applications.
Berthold Horn: EECS, computed imaging.
Tommy Poggio: Brain and Cognitive Sciences, machine learning, computer vision, inspired by and modeling human vision.
Ramesh Raskar: Media Lab, computational photography.
Antonio Torralba: EECS, object recognition, scene interpretation.

Page 158: Computer Vision Introduction One Lecture ( Short )

A computer graphics application of nearest-neighbor finding in high dimensions

Page 159: Computer Vision Introduction One Lecture ( Short )

The image database

•  We have collected ~6 million images from Flickr based on keyword and group searches

– typical image size is 500x375 pixels
– 720 GB of disk space (jpeg compressed)

Page 160: Computer Vision Introduction One Lecture ( Short )

Image representation

Color layout

GIST [Oliva and Torralba’01]

Original image

Page 161: Computer Vision Introduction One Lecture ( Short )

Obtaining semantically coherent themes

We further break up the collection into themes of semantically coherent scenes:

Train SVM-based classifiers from 1-2k training images [Oliva and Torralba, 2001]

Page 162: Computer Vision Introduction One Lecture ( Short )

Basic camera motions

Starting from a single image, find a sequence of images to simulate a camera motion:

Forward motion; Camera rotation; Camera pan

Page 163: Computer Vision Introduction One Lecture ( Short )

Scene matching with camera view transformations: Translation

1. Move camera

2. View from the virtual camera

3. Find a match to fill the missing pixels

4. Locally align images

5. Find a seam

6. Blend in the gradient domain

Page 164: Computer Vision Introduction One Lecture ( Short )

Scene matching with camera view transformations: Camera rotation

1. Rotate camera

2. View from the virtual camera

3. Find a match to fill in the missing pixels

4. Stitched rotation

5. Display on a cylinder

Page 165: Computer Vision Introduction One Lecture ( Short )

More “infinite” images – camera translation

Page 166: Computer Vision Introduction One Lecture ( Short )
Page 167: Computer Vision Introduction One Lecture ( Short )
Page 168: Computer Vision Introduction One Lecture ( Short )
Page 169: Computer Vision Introduction One Lecture ( Short )

Virtual space as an image graph

•  Nodes represent images

•  Edges represent particular motions: forward, rotate (left/right), pan (left/right)

•  Edge cost is given by the cost of the image match under the particular transformation

Image graph

Kaneva, Sivic, Torralba, Avidan, and Freeman, Infinite Images, to appear in Proceedings of IEEE.
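One way to picture the image graph is as a weighted directed graph in which each edge carries a motion label and a matching cost. The sketch below is an illustration only, not the method from the paper: it assumes a hypothetical adjacency-list format and uses Dijkstra's algorithm to find the cheapest sequence of camera motions between two images.

```python
import heapq


def cheapest_tour(graph, start, goal):
    """Dijkstra over an image graph.

    graph maps an image id to a list of (neighbor, motion, cost) edges.
    Returns (total_cost, [(motion, image), ...]) or None if goal is unreachable.
    """
    frontier = [(0.0, start, [])]
    best = {start: 0.0}
    while frontier:
        cost, node, moves = heapq.heappop(frontier)
        if node == goal:
            return cost, moves
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry, a cheaper route was already found
        for neighbor, motion, edge_cost in graph.get(node, []):
            new_cost = cost + edge_cost
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                heapq.heappush(
                    frontier, (new_cost, neighbor, moves + [(motion, neighbor)]))
    return None


# Hypothetical three-image graph; costs stand in for image-match costs.
graph = {
    "a": [("b", "forward", 1.0), ("c", "pan_left", 5.0)],
    "b": [("c", "rotate_right", 1.5)],
    "c": [],
}
print(cheapest_tour(graph, "a", "c"))
```

A low total cost means every transition along the route is a good image match, so the simulated camera motion should look smooth.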

Page 170: Computer Vision Introduction One Lecture ( Short )

Virtual image space laid out in 3D

Kaneva, Sivic, Torralba, Avidan, and Freeman, Infinite Images, to appear in Proceedings of IEEE.

Page 171: Computer Vision Introduction One Lecture ( Short )

Outline

•  About me

•  Computer vision applications

•  Computer vision techniques and problems:
– Low-level vision: underdetermined problems
– High-level vision: combinatorial problems
– Miscellaneous problems

Page 172: Computer Vision Introduction One Lecture ( Short )

Problem: Inference in Markov Random Fields. Want to handle higher order clique potentials, high-dimensional state variables, and real-valued state variables.

Applications: Low-level vision: noise removal, super-resolution, filling-in, texture synthesis.

References: Pushmeet Kohli, Lubor Ladicky, Philip Torr, Robust Higher Order Potentials for Enforcing Label Consistency. In: International Journal of Computer Vision, 2009. http://research.microsoft.com/en-us/um/people/pkohli/papers/klt_IJCV09.pdf