Recovering Human Body Configurations: Combining Segmentation and Recognition
Greg Mori, Xiaofeng Ren, and Jitendra Malik (UC Berkeley)
Alexei A. Efros (Oxford)
The goal
  Given an image:
    Detect a human figure
    Localize joints and limbs
    Create a skeleton of their pose
    Create a segmentation mask of the person
Other approaches: Simple features
  Model people as generalized cylinders (1980s)
  Easily implemented bottom up
  Often use a tree to express relations
  Problems:
    Cylinders are common
    Often dependencies between body parts
    Really need context
Other approaches: Probable pose
  Often use probable pose
  Template matching
  Top-down constraints on pose
  But even highly improbable poses are still possible
Other approaches: Frequent simplifications
  Nude models
  Limited poses
  Background subtraction or limited clutter
“Arguably the most difficult recognition problem in computer vision”
  Variation in clothing
  Variation in limbs
  Variation in pose
Solution: “Islands of Saliency”
  Use low-level features that are informative independent of context
  Based on these islands, one is able to fill in gaps with context
Algorithm: Segmenting into regions and superpixels
Segmentation
  Combine boundary finder (Martin et al., 2002) with Normalized Cuts (Malik, Belongie, et al., 2001)
  Groups similar pixels into regions
Segmentation: Regions
  40 regions
  Most salient parts of the body become regions
  Limbs usually appear as two “half-limbs”
Segmentation: Superpixels
  200 regions (oversegmentation)
  Retains virtually all structures in the original
  Still reduces complexity from 400,000 pixels to 200 superpixels
Finding limbs
  Candidates: all 40 regions
  Four cues for half-limb detection
  Contour: Probability of the boundary
    Average probability of the region's boundary, as measured by Martin's boundary finder
  Shape: How close to a rectangle
    Area of overlap with reconstructed rectangle
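The contour and shape cues can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: a region is assumed to be a set of (row, col) pixels, `pb` is a hypothetical dict of per-pixel boundary probabilities, and the "reconstructed rectangle" is simplified to the axis-aligned bounding box (the slides do not say how the rectangle is fitted).

```python
def boundary_pixels(region):
    """Pixels of the region with at least one 4-neighbour outside it."""
    steps = ((1, 0), (-1, 0), (0, 1), (0, -1))
    return {(r, c) for (r, c) in region
            if any((r + dr, c + dc) not in region for dr, dc in steps)}


def contour_cue(region, pb):
    """Average boundary probability over the region's boundary pixels."""
    edge = boundary_pixels(region)
    return sum(pb.get(p, 0.0) for p in edge) / len(edge)


def shape_cue(region):
    """Fraction of the bounding box covered by the region (assumed rectangle fit)."""
    rows = [r for r, _ in region]
    cols = [c for _, c in region]
    box_area = (max(rows) - min(rows) + 1) * (max(cols) - min(cols) + 1)
    return len(region) / box_area


# Toy regions: a perfect 2x5 rectangle and the same shape with a corner missing.
rect = {(r, c) for r in range(2) for c in range(5)}
notched = rect - {(0, 3), (0, 4)}
```

A perfectly rectangular region scores 1.0 on this shape cue, and the score drops as the region departs from its bounding box.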
Finding limbs
  Shading
    Limbs are roughly cylindrical, so should have 3D pop-out due to shading
    Compare Ix-, Ix+, Iy-, Iy+ for the region to the mean of Ix-, Ix+, Iy-, Iy+ for the training set
  Focus cue
    Background is often not in focus
    C_focus = E_high / (a E_low + b)
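The focus formula translates directly into code. The slides do not define E_high, E_low, a, or b, so this sketch assumes E_high and E_low are a region's high- and low-frequency energies and treats `a` and `b` as tunable constants with placeholder defaults.

```python
def focus_cue(e_high, e_low, a=1.0, b=1.0):
    """C_focus = E_high / (a * E_low + b).

    a and b are placeholder constants (assumed values); a positive b keeps
    the ratio finite when the low-frequency energy is zero.
    """
    return e_high / (a * e_low + b)


sharp = focus_cue(e_high=4.0, e_low=1.0)    # relatively more high-frequency energy
blurry = focus_cue(e_high=0.5, e_low=1.0)   # relatively less high-frequency energy
```

Under these assumptions an in-focus region scores higher than a blurred background region with the same low-frequency energy.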
Finding limbs
  Cues are combined by a weighted sum
  Use logistic regression to learn the weights (training set of hand-labeled half-limbs)
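One way to picture the cue combination is a small logistic-regression model over the four cue values. The sketch below uses plain batch gradient descent and invented toy data; the learning rate, epoch count, and training samples are all assumptions, and the slides do not give the learned weights.

```python
import math


def limb_score(cues, weights, bias):
    """Logistic function of the weighted sum of cue values."""
    z = bias + sum(w * c for w, c in zip(weights, cues))
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(samples, labels, lr=0.5, epochs=2000):
    """Learn cue weights by batch gradient descent on the logistic loss."""
    n = len(samples[0])
    weights, bias = [0.0] * n, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n, 0.0
        for x, y in zip(samples, labels):
            err = limb_score(x, weights, bias) - y
            for i in range(n):
                grad_w[i] += err * x[i]
            grad_b += err
        weights = [w - lr * g / len(samples) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(samples)
    return weights, bias


# Toy cue vectors (contour, shape, shading, focus); 1 = hand-labeled half-limb.
samples = [(0.9, 0.8, 0.7, 0.9), (0.8, 0.9, 0.6, 0.7),
           (0.2, 0.1, 0.3, 0.2), (0.1, 0.3, 0.2, 0.1)]
labels = [1, 1, 0, 0]
weights, bias = fit_logistic(samples, labels)
```

Candidate regions can then be ranked by `limb_score`, which is what makes a "top 8 candidates" evaluation possible.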
Evaluation summary
  Not very good detectors
  Strength of boundary is the best cue
  Combining cues yields better performance
  On average 4.08 of the top 8 candidates produced were hits
  89% have at least 3 hits among the top 8
  Motivates search for 3 half-limbs combined with head and torso
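The two statistics quoted above (mean hits in the top 8, and the share of images with at least 3 hits) could be computed along these lines. The hit flags below are invented toy data, not the reported results.

```python
def top_k_hit_stats(ranked_hits, k=8, min_hits=3):
    """Return (mean hits in the top k, fraction of images with >= min_hits hits).

    ranked_hits holds one list per image: 1/0 flags for candidates in ranked order.
    """
    counts = [sum(flags[:k]) for flags in ranked_hits]
    mean_hits = sum(counts) / len(counts)
    fraction = sum(c >= min_hits for c in counts) / len(counts)
    return mean_hits, fraction


# Toy flags for three images (hypothetical).
toy = [
    [1, 1, 0, 1, 0, 0, 1, 0, 1],  # 4 hits in the top 8 (the 9th is ignored)
    [0, 1, 0, 0, 1, 0, 0, 0],     # 2 hits
    [1, 1, 1, 1, 1, 0, 0, 0],     # 5 hits
]
mean_hits, fraction = top_k_hit_stats(toy)
```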
Finding torsos
  Unlike half-limbs, typically several regions
  Consider all sets of adjacent regions within some range of total sizes
  Set of cues:
    Contour
    Shape
    Focus
    (No shading)
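Enumerating torso candidates as sets of adjacent regions can be sketched as a search over connected subsets of the region adjacency graph. The graph, areas, and size range below are hypothetical, and the slides do not say how the enumeration is actually carried out.

```python
def connected_region_sets(adjacency, areas, min_area, max_area):
    """Enumerate connected sets of regions with total area in [min_area, max_area]."""
    seen, found = set(), set()

    def grow(current, total):
        key = frozenset(current)
        if key in seen:
            return
        seen.add(key)
        if total >= min_area:
            found.add(key)
        # Extend the set by any neighbouring region that still fits the size limit.
        for region in current:
            for nb in adjacency[region]:
                if nb not in current and total + areas[nb] <= max_area:
                    grow(current | {nb}, total + areas[nb])

    for region in adjacency:
        if areas[region] <= max_area:
            grow({region}, areas[region])
    return found


# Toy chain of three regions a - b - c (hypothetical areas in pixels).
adjacency = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
areas = {"a": 10, "b": 20, "c": 30}
candidates = connected_region_sets(adjacency, areas, min_area=30, max_area=50)
```

Each candidate set would then be scored with the contour, shape, and focus cues.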
Finding torsos
  Find orientation of torso
  Find best matching head
    Again contour, shape, and focus cues, with the shape a disk
  Score for torso, score for head, and score for relative position of head to torso are multiplied to create the score for an oriented torso
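The oriented-torso score is a product of three terms, which a few lines make concrete. Choosing the "best matching head" by maximizing that product is an assumption of this sketch, and all scores below are made up.

```python
def oriented_torso_score(torso_score, head_score, relative_position_score):
    """Product of the three scores, as described on the slide."""
    return torso_score * head_score * relative_position_score


def best_head(torso_score, head_candidates):
    """Pick the head maximizing the product (selection rule is assumed).

    head_candidates: list of (head_id, head_score, relative_position_score).
    """
    return max(head_candidates,
               key=lambda h: oriented_torso_score(torso_score, h[1], h[2]))


heads = [("h1", 0.9, 0.2), ("h2", 0.6, 0.7)]  # hypothetical candidates
chosen = best_head(0.8, heads)
score = oriented_torso_score(0.8, chosen[1], chosen[2])
```

Because the terms multiply, a head with a strong appearance score but an implausible position relative to the torso is penalized heavily.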
Evaluation
  Success if all four torso points are within 60 pixels of ground truth
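The success criterion can be written as a short check. This sketch assumes Euclidean distance and that predicted and ground-truth points are listed in the same order; the coordinates are invented.

```python
import math


def torso_is_correct(predicted, ground_truth, max_dist=60.0):
    """True if every predicted torso point lies within max_dist pixels of its match."""
    return all(math.dist(p, g) <= max_dist
               for p, g in zip(predicted, ground_truth))


truth = [(0, 0), (100, 0), (100, 200), (0, 200)]
good = [(30, 40), (100, 10), (90, 200), (0, 190)]   # worst error: 50 pixels
bad = [(60, 60), (100, 10), (90, 200), (0, 190)]    # first point is about 85 pixels off
```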
Body building
  From 5-7 half-limbs and ~50 candidate oriented torsos, form partial configurations consisting of:
    One oriented torso
    Three half-limbs, each assigned to:
      One of 8 half-limb body parts
      One of two polarities
  2-3 million partial configurations!
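A rough count shows where a figure in the millions comes from. The counting rule here is an assumption (three distinct half-limbs, each given a distinct body-part label and an independent polarity); the slides do not spell out how the total was obtained, so this only checks the order of magnitude.

```python
from math import comb, perm


def count_partial_configurations(n_torsos, n_half_limbs, n_parts=8,
                                 n_polarities=2, limbs_per_config=3):
    """Count torso + 3 half-limb combinations under the assumed counting rule."""
    limb_choices = comb(n_half_limbs, limbs_per_config)   # which half-limbs are used
    label_choices = perm(n_parts, limbs_per_config)       # distinct body-part labels
    polarity_choices = n_polarities ** limbs_per_config   # one polarity per half-limb
    return n_torsos * limb_choices * label_choices * polarity_choices


counts = {n: count_partial_configurations(50, n) for n in (5, 6, 7)}
# counts == {5: 1344000, 6: 2688000, 7: 4704000}
```

With 50 torsos and 5 to 7 half-limbs this gives roughly 1.3 to 4.7 million combinations, the same order of magnitude as the figure on the slide.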
Enforce constraints
  Relative widths
    Foreshortening doesn't affect width of limbs much
    Use anthropometric data to rule out limbs more than 4 standard deviations wider than expected
  Length of limbs relative to torso
    Assume torso not too foreshortened
      No more than +/- 40% angle with image plane
    Again, prune limbs more than 4 standard deviations away from mean length, relative to torso
    Seems to be making some assumptions of probable pose
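Both pruning rules are standard-deviation tests, one-sided for width ("wider than expected") and two-sided for length ("away from mean"). The means, deviations, and measurements below are hypothetical placeholders, not anthropometric data.

```python
def within_k_sigma(value, mean, std, k=4.0, one_sided=False):
    """Keep a measurement unless it is more than k standard deviations off.

    one_sided=True rejects only values that are too large (the width rule);
    one_sided=False rejects deviations in either direction (the length rule).
    """
    deviation = (value - mean) / std
    if one_sided:
        return deviation <= k
    return abs(deviation) <= k


# Hypothetical limb measurements relative to the torso (mean 10, std 2).
keep_width = within_k_sigma(17.0, mean=10.0, std=2.0, one_sided=True)   # 3.5 sigma
drop_width = within_k_sigma(19.0, mean=10.0, std=2.0, one_sided=True)   # 4.5 sigma
drop_length = within_k_sigma(1.0, mean=10.0, std=2.0)                   # -4.5 sigma
```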
Enforce constraints
  Adjacency
    Upper limbs must be adjacent to torso
    Lower limbs must be adjacent to upper limbs
  Symmetry in clothing: color histograms must not be overly dissimilar for corresponding segments
    E.g. right and left upper arms should be similar
    Makes some small assumptions about variations in clothing
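The adjacency and symmetry checks might look like the sketch below. The chi-squared histogram distance and the 0.3 threshold are assumptions chosen for illustration; the slides only say the histograms must not be "overly dissimilar".

```python
def chi_squared_distance(h1, h2):
    """Chi-squared distance between two normalized histograms (assumed measure)."""
    return 0.5 * sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)


def passes_symmetry(hist_left, hist_right, max_distance=0.3):
    """Corresponding segments must have similar colors (threshold is hypothetical)."""
    return chi_squared_distance(hist_left, hist_right) <= max_distance


def passes_adjacency(upper, lower, torso_regions, adjacency):
    """Upper limb must touch the torso; a lower limb, if present, must touch the upper limb."""
    upper_ok = any(t in adjacency[upper] for t in torso_regions)
    lower_ok = lower is None or lower in adjacency[upper]
    return upper_ok and lower_ok


# Toy region graph (hypothetical names).
adjacency = {"torso": {"upper_arm"}, "upper_arm": {"torso", "lower_arm"},
             "lower_arm": {"upper_arm"}, "stray": set()}
```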
Body building: slimming down
  Reduces to ~1000 partial configurations
  Sorted by a linear combination of the torso and the three half-limb scores
    (This score can be used to improve torso detection)
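Ranking the surviving configurations by a linear combination of scores can be sketched as below. The equal weights and the example scores are placeholders; the slides do not give the actual coefficients.

```python
def configuration_score(torso_score, limb_scores, torso_weight=1.0, limb_weight=1.0):
    """Linear combination of the torso score and the half-limb scores (weights assumed)."""
    return torso_weight * torso_score + limb_weight * sum(limb_scores)


def rank_configurations(configs):
    """Sort partial configurations from best to worst combined score."""
    return sorted(configs,
                  key=lambda c: configuration_score(c["torso"], c["limbs"]),
                  reverse=True)


# Hypothetical partial configurations: one torso score, three half-limb scores.
configs = [
    {"name": "A", "torso": 0.4, "limbs": [0.5, 0.5, 0.5]},
    {"name": "B", "torso": 0.9, "limbs": [0.8, 0.7, 0.6]},
    {"name": "C", "torso": 0.7, "limbs": [0.2, 0.3, 0.1]},
]
ranked = [c["name"] for c in rank_configurations(configs)]
```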
Extending to full limbs
  Add additional rectangles, evaluated on adjacent superpixels, at empty limb joints
  Want high internal similarity and high dissimilarity to surroundings
Summary
  “Arguably the most difficult problem in computer vision”
    Not solved here
  Method here is appealing:
    Don't need to store exemplars
    Island of saliency approach seems useful in many contexts
    Use some configural knowledge to make reasonable guesses
    Good illustration of integrating recognition and segmentation