A Subset Approach to Contour Tracking in Clutter

6
In Proceedings of the IEEE International Conference on Computer Vision, 1999. 1 A Subset Approach to Contour Tracking in Clutter Daniel Freedman and Michael S. Brandstein Harvard University Division of Engineering and Applied Sciences Cambridge, MA, USA 02138 freedman, msb @hrl.harvard.edu Abstract A new method for tracking contours of moving objects in clutter is presented. For a given object, a model of its con- tours is learned from training data in the form of a subset of contour space. Greater complexity is added to the contour model by analyzing rigid and non-rigid transformations of contours separately. In the course of tracking, multiple con- tours may be observed due to the presence of extraneous edges in the form of clutter; the learned model guides the algorithm in picking out the correct one. The algorithm, which is posed as a solution to a minimization problem, is made efficient by the use of several iterative schemes. Re- sults applying the proposed algorithm to the tracking of a flexing finger and to a conversing individual’s lips are pre- sented. 1. Problem Motivation and Review This work addresses the tracking of moving contours in a video-stream. Specifically, given a sequence of images in which a known object of interest is in motion, the goal is to track the object’s silhouette across time-varying im- ages. Applications abound in medicine (automated analy- sis of human organ performance, [1]), surveillance [9], and audio-visual speech recognition for noisy environments [5]. In this paper, a new contour tracker, the “subset-tracker,” is proposed, based on a rather different philosophy than those of the existing trackers. Before presenting the new algo- rithm, it is worth reviewing the existing approaches. The deformable template approach [11] minimizes, for each frame, an energy function which is specific to the ge- ometry of the tracked object. Elastic snakes [8] minimize a more general energy function, which has terms represent- ing elastic and tensile energy to ensure that the snake is smooth, and an image-dependent term that pushes the snake towards the feature of interest. The Kalman tracker [3] re- quires a learned linear stochastic dynamical model which describes the evolution of the contour to be tracked. Assum- ing that the observation of the contour has been corrupted by Gaussian noise, the conditional density of the contour given all past observations may be found, and then used to estimate the contour position. The condensation tracker [4] also assumes a dynamical model describing contour motion is known and that imprecise observations are made. How- ever, both the dynamical system and the observation process may be completely general and the conditional density may be propagated forward in time using the numerical tech- nique known as the “condensation” method. This density may then be used for estimating the current contour. A principal problem of the deformable template ap- proach is that for any given application, hand-crafting is re- quired; that is, if it is desired to track the motion of lips, a specific energy function that is appropriate for lips must be designed. The snake method does not suffer from this prob- lem; snakes will in principal find the edges of any object whose outlines are a continuous closed curve. However, the method is too general in the sense that its modes of detec- tion are not tuned to the motion of any particular object, so that it is not particularly efficient. Furthermore, both de- formable templates and elastic snakes suffer from problems of slowness in tracking contours through multiples images. The Kalman tracker has the right level of generality in that it need not be hand-crafted for any specific application and a dynamical system for the relevant class of contours can be learned. Furthermore, the process of finding observations is fast, so that the whole algorithm is much quicker than ei- ther of the above techniques. However, the Kalman tracker ignores a basic problem in tracking: the presence of clutter. Clutter can come from detected edges which are either ex- traneous to the object, or intrinsic to it. In either case, there are often multiple edges, and there is no one observation; rather, there are a multiplicity of observed contours. This problem is addressed by the condensation tracker, which is designed specifically to deal with clutter. However, this tracker has two shortcomings. First, it requires a dynami- cal model of the object being tracked. This is problematic,

Transcript of A Subset Approach to Contour Tracking in Clutter

In Proceedingsof theIEEE InternationalConferenceon ComputerVision,1999. 1

A SubsetApproachto Contour Tracking in Clutter

DanielFreedmanandMichaelS.BrandsteinHarvardUniversity

Divisionof EngineeringandAppliedSciencesCambridge,MA, USA 02138�

freedman,msb� @hrl.harvard.edu

Abstract

A new methodfor trackingcontoursof movingobjectsinclutter is presented.For a givenobject,a modelof its con-toursis learnedfromtrainingdatain theformof a subsetofcontourspace. Greatercomplexity is addedto thecontourmodelby analyzingrigid andnon-rigid transformationsofcontoursseparately. In thecourseof tracking, multiplecon-tours may be observeddue to the presenceof extraneousedgesin the form of clutter; the learnedmodelguidesthealgorithm in picking out the correct one. The algorithm,which is posedasa solutionto a minimizationproblem,ismadeefficient by the useof several iterative schemes.Re-sultsapplyingthe proposedalgorithm to the tracking of aflexing finger andto a conversingindividual’s lips are pre-sented.

1. Problem Moti vation and Review

This work addressesthetrackingof moving contoursina video-stream.Specifically, given a sequenceof imagesin which a known objectof interestis in motion, the goalis to track the object’s silhouetteacrosstime-varying im-ages.Applicationsaboundin medicine(automatedanaly-sisof humanorganperformance,[1]), surveillance[9], andaudio-visualspeechrecognitionfor noisyenvironments[5].In this paper, a new contourtracker, the“subset-tracker,” isproposed,basedon a ratherdifferentphilosophythanthoseof the existing trackers. Before presentingthe new algo-rithm, it is worth reviewing theexisting approaches.

The deformabletemplateapproach[11] minimizes,foreachframe,anenergy functionwhich is specificto thege-ometryof thetrackedobject.Elasticsnakes[8] minimizeamoregeneralenergy function, which hastermsrepresent-ing elasticand tensileenergy to ensurethat the snake issmooth,andanimage-dependenttermthatpushesthesnaketowardsthe featureof interest.TheKalmantracker [3] re-quiresa learnedlinear stochasticdynamicalmodelwhich

describestheevolutionof thecontourto betracked.Assum-ing that the observation of the contourhasbeencorruptedby Gaussiannoise,the conditionaldensityof the contourgivenall pastobservationsmaybefound,andthenusedtoestimatethecontourposition.Thecondensationtracker [4]alsoassumesadynamicalmodeldescribingcontourmotionis known andthat impreciseobservationsaremade.How-ever, boththedynamicalsystemandtheobservationprocessmaybecompletelygeneralandtheconditionaldensitymaybe propagatedforward in time using the numericaltech-niqueknown as the “condensation”method. This densitymaythenbeusedfor estimatingthecurrentcontour.

A principal problem of the deformabletemplateap-proachis thatfor any givenapplication,hand-craftingis re-quired;that is, if it is desiredto track themotionof lips, aspecificenergy functionthat is appropriatefor lips mustbedesigned.Thesnakemethoddoesnotsuffer from thisprob-lem; snakeswill in principal find the edgesof any objectwhoseoutlinesareacontinuousclosedcurve. However, themethodis too generalin thesensethat its modesof detec-tion arenot tunedto themotionof any particularobject,sothat it is not particularly efficient. Furthermore,both de-formabletemplatesandelasticsnakessuffer from problemsof slownessin trackingcontoursthroughmultiplesimages.TheKalmantracker hastheright level of generalityin thatit neednot behand-craftedfor any specificapplicationandadynamicalsystemfor therelevantclassof contourscanbelearned.Furthermore,theprocessof findingobservationsisfast,so that the whole algorithmis muchquicker thanei-therof theabove techniques.However, theKalmantrackerignoresabasicproblemin tracking:thepresenceof clutter.Cluttercancomefrom detectededgeswhich areeitherex-traneousto theobject,or intrinsic to it. In eithercase,thereareoften multiple edges,andthereis no oneobservation;rather, therearea multiplicity of observed contours. Thisproblemis addressedby the condensationtracker, whichis designedspecificallyto dealwith clutter. However, thistracker hastwo shortcomings.First, it requiresa dynami-cal modelof theobjectbeingtracked. This is problematic,

asin many casesmultiple motionsmaybepossiblefor thesameobject (for example,a finger may flex, translate,orrotate),and in practicethis multiplicity canbe difficult tomodel. Second,the condensationtracker postulatesa par-ticular form for theclutter, asembodiedin theobservationdensity. This is unreasonable,asthe point of allowing forclutter is that any type of interferingobjectsmight be ex-pected.

Theproposedtrackermaintainstheadvantagesof thelat-ter two trackers,while circumventingtheirshortcomings.Itmaintainstheright level of generalityby relying on knowl-edgeof theparticularobjectit is tracking,knowledgewhichis learnedrather than constructedby hand. However, itdoessowithoutdynamicalmodels,only usingthemoreba-sic information concernedwith the shapeor geometryoftheobject’s silhouette.Furthermore,no restrictionson thetypesof clutterto beencounteredaremadein theproposedtracker. Theseissueswill beexploredmorefully in therestof thepaper. Section2 presentsanoverview of thetracker,discussesissuesof terminology, andexplainsthemannerinwhich contourmodelsfor particularobjectsmay be built.Section3 presentsthe detailsof the proposedtrackingal-gorithm. Section4 presentsexperimentalresultsandcon-cludes.

2. Notation and Contour Models

2.1. Overview and EdgeNotation

Thebasicapproachto contourtrackingis asfollows. Ina given frame, the previous frame’s contouris taken asastartingpointfor searchingfor theedgesof thenew contour.Many algorithmsuseedgesearchwhichis normalto theoldcontour, in order to avoid the “apertureproblem” [3], [4].However, normalsearchalongtheselinesfrequentlyresultsin missededges,particularly in regionsof high curvature,or if theframerateis relatively low. In this work, theedgesearchis conductedin circular regionsaround � equally-spacedpointsof the old contour, referredto assites. Thesizeof thesearchregionvariesateachsite,andis discussedmore fully in [6]. To find edges,��� � is performed,andthenthreshholdedusinga relatively low threshhold,yield-ing asoutputa binary image.This image,which will haveedgeswhich are too thick (due to the low threshhold),isthenthinnedusinga morphologicalthinningoperator. Theresultis theedge-map.

At eachof thesites,severaledgesmaybedetected;thesetof edgesdetectedat the �� site is denoted� � (andagenericedgeat this site is denoted� � ). Themultiplicity ofedgesis dueto the fact that theobjectbeingtracked is nottheonly objectpresentin thescene;thereis clutter, andthisis whatmakestheproblemdifficult. For if therewereonlyoneedgedetectedper site, thenthe contourin the current

frameis effectively detectedwithout any further computa-tion. Evenif thebackgroundis fairly pristine,andtheonlyobjectpresentis theonebeingtracked,spuriousedgesmaystill bedetecteddueto theinaccuracy of theedge-detectionalgorithms,thenoisepresentin theimage(dueto quantiza-tion,blooming,andsoon),andbecausetheobjectmaypos-sessmany interioredgesif is notmonochrome.Thegoalofthe algorithmis to decidewhich of the edgesdetectedarethe correctones,that is which correspondto the contourandwhicharedueto clutter. An edge-vectoris an � -vectorof pointsonefrom eachsite ����� � � � � � �� � ��� (or a � � -vectorif eachelementis takento bea realnumber).Givenan edge-vector, a contourmay be interpolatedthroughitsconstituentpoints;thiscontouris referredto asanobservedcurveanddenoted� . Thesetof suchobjectsis denoted� .Thenthegoal is to find theobservedcurve in � which bestdescribesthecontourof theobjectbeingtracked.

2.2. Contour Notation

A contour � is a continuousfunction ��� � !#" � " $&%(' � ,which will often be written �)��� � * � � + � . (The reasonforthecuriouschoiceof domainwill becomeapparentshortly.)Thespaceof all contoursis denoted, . , - is definedto bethespaceof contourswhichcanbeexpressedasapairof ex-pansionsin thefirst . Legendrepolynomials.Theproper-tiesof Legendrepolynomialsaregivenin detail in [10]; forthecurrentpurposes,it is sufficientto notethattheirdomainis � !#" � " $ andthey maybewritten / 0 � 1 �2�43 -5 6 � �#0 5 1 5 7 � ,8 �9" � � � �� . where � is a known .;:�. matrix. (Note:therelationshipbetween1 andthearc-length<1 is takento be1��4�=<1 > ?@!A" , where? is thetotal arc-length.)Of course,, -CB9, , andas . increases,any element� of , canbeincreasinglybetterrepresentedby anelementof , - . In par-ticular, the valueof . is fixed at somevaluehigh enoughto captureall of the detail in the contoursof the objectofinterest.ThereasonthatLegendrepolynomialsareusedisthattheirorthogonalitymakesseveralof thealgorithmspre-sentedin section3 muchsimpler. Further, they have beenchosenoverasetof . complex exponentialssincethelatteris a setof periodicfunctions,which arethusunsuitableformodellingopencontours:capturingdiscontinuitiesrequiresveryhigh . , andGibbs’ phenomenastill arise.

A contourin , - may be identifiedentirely by � . realnumbers;namely, if � * � 1 ���D3 -0 6 �FE * G 0 /=0 � 1 � and � + � 1 ���3 -0 6 �FE + G 0 / 0 � 1 � then � � 1 ���H� � * � 1 � � � + � 1 � � is completelyspecifiedby thetwo . -vectorsE * and E + . A compactnota-tion is E �I� E * � E + � , in which E sospecifiedis takento bea � . columnvector. Therelationshipbetween� and E maybewrittenmoreconciselyas � � 1 �2�J� EK* ��L�� 1 � � EK+ ��L�� 1 � �whereL#� 1 �2�J� 1 M � 1 � � � � �� 1 - 7 � $ K .

2.3. A Contour Model

The modellingof the contoursof a specificobjectmaynow beaddressed.A typical objectwill give riseto a classof contours: most objectsmay both transformin a rigidfashion,anddeformin a non-rigid fashion.The goal is tofind a modelwhich will captureall of thesecontours.Theideahereproposedis that themostconcisemodelis a sub-set N of O , the contourspace.This subsetcapturesall ofthe possiblecontourswhich canarisefrom differentnon-euclideandeformationsof the objectunderconsideration;euclideantransformationswill beconsideredin section2.4.A relatedapproach[2] hasbeentaken to finding suitablemanifoldsfor parametricfeaturedetection.

Sinceit hasbeenassumedthatany contourof theobjectof interestmay be capturedby a contourin O P , andsinceO P hasbeenshown to be isomorphicto QSR P , the problemof findinganappropriatesubsetof thespaceof contourshasbeenreducedto finding a subsetof QSR P . Thus: given theT

contoursspecfiedby the U V -vectors W X Y Z [Y \=] , find thesubsetof QSR P which bestcapturesthe training contours.A tool which enablesoneto do this is theKarhunen-Loevetransform(principalcomponentanalysis),whosedetailsarewell known, see[7]. Theoutputsof this transformaretheaveragevector ^X andthesetof orthogonalvectorsin QSR P ,labelledW _F` Z a` \=] , whereb#cAU V , andis oftenmuchsmallerthan U V . Thenthe subsetof QSR P may be taken to be the“shifted linear” spacedIefW XDgSXheCi�jD^X=klinmDo�ZwhereoDe spanW _] k p p p=k _ a Z andb hasbeenchosenappro-priately (i.e., to capturemostof the variation). The subsetof contoursN , which is inducedby d , is thenN4e4W q mrO Png q s t u2e4s Xvw&x�y s t u k Xvz2x#y s t u u k X{m{d Z

Theproblemwith thisapproachis thatthespaceN is notcompact.Compactnessis a desirablepropertyfor thecon-toursubset,asthesubsetshouldbebounded;otherwise,thesubsetwill containcontourshapeswhichdonotcorrespondto theobjectof interest.(Recall:only non-euclideandefor-mationsarebeingconsidered.)As a result,the Karhunen-Loeveproceduremaybemodifiedslightly to getacompactsubset.Specifically, thesubsetd is emendedto bed{e4W X{g Xre a| ` \=]} ` _F`Fjn^X k�~ } ` ~ c�� ` Zwhich is itself compactandinducesa compactN . Thevec-tors_F` andthevalueof b arefoundasbefore,andtheboundsW � ` Z a` \=] arefoundby � `=eA�)� � ] �Y � [ ~ X vY _F` ~ .

To summarize,then,thefollowing formalismwill proveconvenient:N4e4W q mrO Png q s t u2e4s X vw&x�y s t u k X vz2x#y s t u u k X{m{d Zd{e4W X{m{Q R P g Xre@_ } jn^X k } m{�#Z�ne4W } mrQ a gF~ } ` ~ c�� ` k��&eJ� k p p pk b Z

where _ is the U VC�@b matrix whosecolumnsare the _F` .Thekey insight is thatany contourof theobjectof interestis specifiedby a b -dimensionalvector} which is amemberof thecompactset � .

2.4. Affine Transformations

It is preferableif the training contoursusedfor findingN simply capturenon-rigid transformations,andeuclideansimilarity transformationsare ignored. The reasonis thatthere is a standardmathematicaltheory of the euclideansimilarity transformationswhich can be implementedinstraightforward fashion. In particular, any euclideansimi-larity transformationmay be representedas XJ���Ds Xu �where �Ds Xu2eC� ��� X w�� X z��� X z X w��where � e;� � k � k p p p � � v , � is the first columnof s x v u � ] ,and �4mhQ2� . (For a derivation, see[6].) � ] correspondsto translationin the � -direction, � R to translationin the � -direction,and � � and� � togethercoverrotationandscaling.

At any given frame � , �&  , the setof possibleeuclideansimilarity transformations,is Q2� . However, supposethatthe camerais fixed, and it is known that the object itselfmayonly translate(including translationtowardsthe cam-era)androtateat a certainrate. Then �   may be approxi-matedas ¡�   enW �)g �   ` c�� ` c �   ` k �&eJ� k p p p k ¢ Zwherethe valuesof bounds,in termsof both �   � ] andthemore basicparameterso#£ w , o�£ z , o#¤ , and o�¥ , may befoundin [6].

3. The Tracking Algorithm

The essenceof the tracking algorithm is as follows.Given the contour from the previous frame, the currentframe’s imageis searchedfor edgesat ¦ equally-spacedpointsalongthe old contour. Searchis in circular regionscenteredat eachsite. At eachsite, several edgesare de-tected;thegoalof thetrackingalgorithmis to sortoutwhichof the edgescorrespondto the true contour. Framingtheproblemin this mannermakes it particularlyamenabletotrackingin clutter.

3.1. BasicSetupand ObjectiveFunction

Following the edgesearch,dataconsistsof the setsofedgesdetectedateachsite, W § ¨ Z ©¨ \=] , andthroughthecom-poundingof these,thesetof observedcurves,ª . Thetrack-ing algorithmis posedasthe solutionto the minimizationproblem

Begin with « ¬ ­ ® (asgivenby thecoarse-to-finealgorithm,seesection3.3). Let ¯ ° ±�² .STEP 1:Given « ¬ ­ ¬ ³ ´ ° µ ¶ · ¬ ­ ¬ ³ µ ¸ ¬ ­ ¬ ³ ¹=± argminº ­ »H¼ · ½)¸ « ¬ ­ ¬ ³ ´ ° ¼STEP 2:Given · ¬ ­ ¬ ³ and ¸ ¬ ­ ¬ ³ µ « ¬ ­ ¬ ³ ± argmin¾ ¼ · ¬ ­ ¬ ³ ½)¸ ¬ ­ ¬ ³ « ¼¯ ° ¿I¯ °FÀ@² . Go to STEP 1.

Figure 1. The local minimization algorithm.

Á� ÃÄ Å Æ Ç È Å ÉÇ ÊFÅ Ë�Ì Í2Î�Ï ÐFÌ (1)

whereÑ is thesetof allowedeuclideansimilarity transfor-mations,Ï is onesuchtransformation,and Ì�ÒFÌ is the Ó2Ônorm. The idea is to find the edge-vector, that is, the listof observededges,(at most)onefrom eachsite,whosein-terpolatedcontouris closestto the learnedcontoursubsetÕ

. (The mannerin which the thesecontoursare interpo-lated is discussedin [6].) Of course,aswasdiscussedinsection2.3,

Õis learnedfrom trainingcontoursin a single

orientation;thus,euclideansimilarity transformationsof theobservedcurvesmustbe taken into account,which resultsin thepresenceof Ï . Theparticularpartof theoptimizationsolutionwhich is of interestis theminimizing Í ; in partic-ular, Ï2Ö=×Ø2Ù Ú Í ØSÙ Ú is taken to be the contourfor the currentframe.

This optimizationproblemis peculiarin several ways.First, it hasboth continuousanddiscreteelements.Whiletheset Û is fundamentallydiscrete,dueto thefact thata fi-nitenumberof edgesareobserved,both Í and Ï arecontin-uous.Furthermore,Û is generatedby searchingat Ü sites;sincetypically a few edgeswill be foundat eachsite, Û isthus Ý�Þ ßrà�á in size. As a result, exhaustive searchoverthediscretepartof theproblemis ruledout for any reason-ablesized Ü . Thegoalof this section,therefore,will betodescribeanefficientway to solvetheoptimizationproblem.Thebasicmethodwill beasfollows.First,analgorithmwillbedetailedwhich enablesa local minimumof thefunctionto befound;thisalgorithmis iterativein nature,andhastwodistinctsteps.Second,analgorithmfor finding a goodini-tial conditionis explained;this initial conditionis thenfedinto the local minimizationroutine,which thenmay resultin the global minimum (andif not, somethingcloseto theglobalminimum).This latterprocedureis referredto asthecoarse-to-fine(CTF)minimizationalgorithm.

3.2. The Local Minimization Algorithm

An iterative procedureis proposedwhich is guaranteedto convergeto the local minimum. In theprocedureshownin Figure1, â × is thetime-stepof theiterations,asopposed

Given: « ® , ã ® , ä ® .Let:

å ± å ¶ « ® ¹æ ® ±{ç è ¶ ã ® ½@éã ¹ê ±rç è ç (êë

denotesthe ì ¬ í row ofê

)î ± å è å (î2ë

denotesthe ì ¬ í rowofî

)

Notation: ï ð ñ òóSôHõö÷ öøð if ù#ú�ð�ú{ûù if ð�ü�ùû if ð�ý�û

¯ þ&±�²do

for i = 1 to qÿ ± å ä ¬ � ´ °=½@éãæ ¬ �ë ±nï ¶ ÿ ë ½ êë æ ¬ � ´ ° À êë ë æ ¬ � ´ °ë ¹ � êë ë ñ ò �´ ò �¯ þ&¿I¯ þ=À�²endfor j = 1 to 4� ± å ¶ ç æ ¬ � ´ ° À éã ¹ä ¬ �� ±nï ¶ � � ½ î � ä ¬ � ´ ° À î � � ä ¬ � ´ °� ¹ � î � � ñ �� ��� ��¯ þ&¿I¯ þ=À�²end

untilæ

and ä haveconverged

Figure 2. The step 1 algorithm.

to â which is the overall time (i.e., which framethe algo-rithm hasreached).Proof that this iterative proceduredoesindeedconvergeto thelocal minimumis containedin [6].

As thealgorithmstands,notmuchhasbeengained,sinceit is not known how to actuallyimplementstep1 or step2.However, theproblemin (1) maybewrittenÁ� ÃÄ Å Æ � Ç Å Ç � Å �� � Ì �)Î�� Þ Ð á � Ìusingtheresultsof section2.4,wherenow thenorm Ì&Ò Ì istheusualeuclideannormin � Ô � . With this in mind,step1maybesolveditself via aniterative procedure.This proce-durewill have its own time-scaleâ Ô , which is subordinateto both â and â × . However, for easeof notation,bothsuper-scriptsâ and â × will besuppressed.Thealgorithmis shownin Figure2; the proof that this proceduredoesindeedleadto therequiredminimumis detailedin [6].

Step 2 seemsto require an exhaustive search,in thatall interpolatededge-vectorsmustbeevaluated.This is anÝ�Þ ß{à�á operation,andis thusunreasonablein practice.In-stead,notethat if Ü is large,so that Í doesnot vary muchtoo muchover intervalsof length � � Ü , thenthe followingapproachmaybeused:

Ì Í2Î{Ï&Ð Ì Ô���� ×Ö=× Ì Í Þ � á Î Þ Ï Ð á Þ � á Ì Ô � �� �Ü à�Ú � × Ì Í Þ � Ú á Î�Ï! Ú Ì Ô

frame23 frame40 frame57 edge-mapfor frame57 frame75

Figure 3. Tracking a flexing fing er.

frame17 frame18 frame75 edge-mapfor frame75 frame130

Figure 4. Tracking a speaker’ s lips.

where " #%$'&)(�*,+ - .�&�( / + 0 / 1 and 2 # is the .43 5 pointof the edge-vector 2�$6- 2 7 8 9 9 948 2 :�0 from which ; wasinterpolated. If the approximationis valid, the problem<>= ? @ A B)C D &�E ; C is equivalentto<>= ?F G A H G I J J J I F K A H K +1 :L# MN7 C D - " # 0 &�E!2 # C O$ +1 :L# M 7�P <>= ?F Q A H Q C D - " # 0 &�E!2 # C O Rin thatbothproblemswill giveriseto thesameedge-vector2 andhencethesameobservedcurve ; . Thelatterproblemis an ST- 1U0 problem: it is a simplematterto determine,ateachsite, which of the edges(properly orientedby E ) istheclosestto thepoint D - " # 0 . Of courseall edgesmustbetransformedby E beforehand,but this is alsojust an ST- 1U0operation.Thus,it hasbeendemonstratedthatbothstepsofthe iterative procedurefor finding a local minimumcanbeimplementedefficiently.

3.3. The CTF Minimization Algorithm

Local minimization of (1) is useful, but what is reallydesiredis theglobalminimum. What is suggestedis not awayto reachthisglobalminimumwith certainty, but rather,a way to cleverly pick an initial condition ; 3 I V for the localminimization routine: the coarse-to-fine(CTF) minimiza-tion algorithm.

SupposethereareintegersW and X suchthat 1Y$,W Z[*( . Thereare X consecutivestagesof decisions.Let \ ]^ _>$` W Z>a4]b*T- cd&b( 0 W Z>a4]fe 7 *T( . In stage1,asingledecisionismade,namely, which edgesat thesites g \ 7^ 7 h i^ M V shouldbeselected.Theedgesareselectedas jk- \ 7V 7 8 9 9 9 8 \ 7i 7 0 , wherej is definedby

jf- l47 8 9 9 9 8 ldm 0!$argminn F o p A H o p q rp s Gkt <>= ?5 A ufv w G I J J J I w r x C y 2 w G z 9 9 9 z 2 w r { &�| C }

In the above definition, lN7 8 9 9 9 8 ldm are any ~ sites, and� - l47 8 9 9 9 8 l4m 0 is the set of all possibleconfigurationsofpointsat suchsites,ascanbederivedfrom thelearnedsub-set � (see[6] for more details). That is, j is the set ofpoints 2 w G 8 9 9 9 8 2 w r at sites l47 8 9 9 9 8 l4m which, out of all oftheobservedpointsat therelevantsites,areclosestto thosewhich havebeenlearned.Usingknowledgeof

�, theinner

minimizationproblemcanbesolved[6].In the � 3 5 stage,��$�+ 8 9 9 9 8 X , thereare W ]�a 7 deci-

sionsmade.The c 3 5 suchdecisioninvolvesselectingedgesat the sites g \ ]^ _ h i a 7^ MN7 , as �jf- \ ]7 _ 8 9 9 9 8 \ ]i a 7 I _ z \ ]V _ 8 \ ]i _ 0 ,where �j is definedby�jf- l O 8 9 9 9 8 ldm a 7 z l47 8 l4m 0k$

argminn F o p A H o p q r � Gp s � t <>= ?5 A ufv w G I J J J I w r x C y 2 w G z 9 9 9 z 2 w r { &�| C }Thatis, 2 w G and 2 w r arefixed,asthey havebeenselectedinapreviousstage.Thusedgesareselectedin acoarse-to-finemanner, andthereareatotalof � Z�a 7^ M VYW ^ $�- W Z�&�( 0 / - W!&( 0k�,1>/ - W�&�( 0 decisions.Sincethetimefor eachdecisionis ST- ( 0 , thealgorithm’soverall runningtime is S>- 1U0 .4. Resultsand Conclusions

Two setsof resultsarepresentedto illustratethe effec-tivenessof the proposedtracker: a flexing finger and aspeaker’s lips. A summaryis given in Table1. In the lat-ter case,lipstick wasusedto helphighlight contrastin both

Experiment � � VideoRate Resolution TrainingSequence RunningSequence AccuracyFinger 80 20 13 Hz 320by 240 120frames= 9.2s 166frames= 12.8s 100%Lips 80 20 13 Hz 320by 240 200frames= 15.4s 130frames= 10.0s 94%

Table 1. Summar y of the experiments.

the training andrunningsequences.The edge-mapin thecaseof thefingerwasgeneratedfrom thegray-scaleinten-sity; clutteris clearlyvisible in theedge-mapshown in Fig-ure 3, and is in the form of both the backgroundobjects(keys, pens)aswell asthe self-clutterof the doubledoverfinger. A sequenceof tracked framesis shown in Figure3; in this instance,the tracker got all 166 tracked framescorrect. In the caseof the speaker’s lips, the edge-mapwasgeneratedfrom the greenportion of the RGB image,which hasslightly bettercontrastthantheintensity. Clutteris clearly visible in the edge-mapshown in Figure4, dueto the detectionof many extraneousedges,aswell as thefact that over the searchrangethe lips interferewith eachother. A sequenceof tracked framesis shown in Figure4,andthetrackergot94%of thetrackedframescorrect;how-ever, equallyimportantasthishighsuccessrateis theabilityto recover from theoccasionalerror, asshown in Figure5.Full video sequencesof both trackingexperimentscanbeviewed at http://himmel.deas.harvard.edu/(underprojects:audio/visualbasedspeechenhancement).

frame48 frame49

Figure 5. Recovering from mistakes.

In the light of thesehighly successfulexperimentalre-sults, it is worth restatingsomeof the advantagesthat arepresentedby this algorithmover othercontourtrackingap-proaches.As opposedto thedeformabletemplateapproach,thereis noneedfor hand-constructedmodelsof theobject’sgeometry;rather this is learned. Whereaselasticsnakesusenospecialinformationabouttheobjectunderstudy, thelearnedinformation usedby the subsettracker allows formoreaccuratetracking. Furthermore,the subsettracker iscomputationallylessburdensomethan thesetwo typesoftrackers. The subsettracker dealswell with clutter, whichis a main failing of the Kalmantracker. Finally, thereareseveraladvantagesoverthecondensationtracker. (Morede-tail is presentedhere,asthecondensationtrackerrepresentsthe state-of-the-artin termsof contourtrackers.) First, nodynamicalmodelsarerequired;only shape/geometryinfor-mation,asembodiedin � , is needed.This representsanadvantageasmultiple typesof complex motion arepossi-ble for many moving objects,and this may be difficult to

modelaccurately. By contrast,theshapesetof anobject’ssilhouettesis independentof thatobject’smotion.Second,amethodis presentedfor learningthecomplex geometricin-formationof thecontoursubsetfrom trainingdata,whereasin [4], this informationmustsometimesbe constructedbyhand(for example, in the caseof the shapespaceof themoving hand).Third, thecondensationapproachmakesas-sumptionsaboutthe typesof clutter likely to be encoun-tered;no suchassumptionsaremadehere.As a result,thesubsettracker is very generalin termsof the typesof ob-jectsit cantrack,aswell asthebackgroundsagainstwhichtheseobjectsmove. Neitherlow frame-ratenor poor light-ing presentparticularproblems.

References

[1] N. Ayache,I. Cohen,andI. Herlin. Medicalimagetracking.In A. BlakeandA. Yuille, editors,ActiveVision, pages285–302.MIT Press,Cambridge,MA, 1992.

[2] S. Baker, S. Nayar, andH. Murase. Parametricfeaturede-tection. Int. J. Comp.Vis., 27(1):27–50,1998.

[3] A. Blake, R. Curwen,andA. Zisserman.A framework forspatio-temporalcontrol in the tracking of visual contours.Int. J. Comp.Vis., 11(2):127–145,1993.

[4] A. Blake andM. Isard. Condensation- conditionaldensitypropagationfor visualtracking. Int. J. Comp.Vis., 29(1):5–28,1998.

[5] B. Dalton,R. Kaucic,andA. Blake. Automaticspeechread-ingusingdynamiccontours.In ProceedingsNATOASICon-ferenceon Speechreadingby Man and Machine: Models,Systems,and Applications. NATO Scientific Affairs Divi-sion,September1995.

[6] D. Freedman.Contourtrackingin clutter:asubsetapproach.Technicalreport,Division of EngineeringandApplied Sci-ences,HarvardUniversity, 1999.

[7] S. Haykin. AdaptiveFilter Theory. Prentice-Hall,UpperSaddleRiver, N.J.,3rdedition,1996.

[8] M. Kass,A. Witkin, andD. Terzopoulos. Snakes: activecontourmodels. In Proc. 1st Intern. Conf. Comput.Vis.,London,June1987.

[9] G. Sullivan. Visual interpretationof known objectsin con-strainedscenes.Phil. Trans.Roy. Soc.LondonB, 337:109–118,1992.

[10] G.Szego.Orthogonalpolynomials. AmericanMathematicalSociety, Providence,1975.

[11] A. Yuille, P. Hallinan, and D. Cohen. Featureextractionfrom facesusingdeformabletemplates.Int. J. Comp.Vis.,8(2):99–112,1992.