Statistical Models of Appearance for Computer Vision
Presenter: 虞台文
Content
• Overview
• Statistical Shape Models
• Statistical Appearance Models
• Active Shape Models
• Active Appearance Models
• Comparison : ASM v/s AAM
• Conclusion
Overview
Computer Vision
• Goal
– Image understanding
• Challenge
– Deformability of objects
• Statistical model-based approach
– Shape model
– Appearance model
– Model matching
Image interpretation
Deformable Models
• General
– capable of generating any plausible example of the class they represent
• Specific
– only capable of generating legal examples
Statistical Shape Models
Shapes
• The shape of an object is represented by a set of n points in any dimension
• Invariance under some transformations
– in 2 or 3 dimensions: translation, rotation, scaling
– together called a similarity transformation
Hand-Annotated Training Set
The training set typically comes from hand annotation of a set of training images
Suitable Landmarks
• Object boundary
• High curvature
• Equally spaced intermediate points
• T junction
Shape Vector
• A d-dimensional shape with n landmarks is represented by a vector with nd elements
• In a 2D image with n landmark points, a shape vector $\mathbf{x}$ is then
$\mathbf{x} = (x_1, y_1, \ldots, x_n, y_n)^T$
Training Set of Shape Vectors
• The shape vectors in the training set should be in the same coordinate frame
– Alignment of the training set is required
Aligning the Training Set
• Procrustes analysis
• Minimize $D = \sum_i |\mathbf{x}_i - \bar{\mathbf{x}}|^2$
Alignment : Iterative Approach
[Figure: iterative alignment of the training shapes to the current estimate of the mean $\bar{\mathbf{x}}$]
Possible Alignment Approaches:
• Center -> scale ($|\mathbf{x}| = 1$) -> rotation
• Center -> (scale + rotation)
• Center -> (scale + rotation) -> projection onto tangent space of the mean
Tangent space to $\mathbf{x}_t$
– all vectors $\mathbf{x}$ s.t. $(\mathbf{x}_t - \mathbf{x}) \cdot \mathbf{x}_t = 0$, or $\mathbf{x} \cdot \mathbf{x}_t = 1$ if $|\mathbf{x}_t| = 1$
– obtained by aligning the shapes with the mean, allowing scaling and rotation, then projecting into the tangent space by scaling $\mathbf{x}$ by $1 / (\mathbf{x} \cdot \bar{\mathbf{x}})$
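The "Center -> (scale + rotation)" approach can be sketched in NumPy as below. This is a minimal sketch under stated assumptions, not the presenter's implementation: shapes are stored as (n, 2) arrays, reflections are not handled, and the function names `center`, `align_to` and `align_training_set` are hypothetical.

```python
import numpy as np

def center(shape):
    """Translate an (n, 2) array of landmarks so its centroid is the origin."""
    return shape - shape.mean(axis=0)

def align_to(shape, ref):
    """Scale and rotate a centred shape onto a centred reference (least squares)."""
    norm2 = np.sum(shape ** 2)
    a = np.sum(shape * ref) / norm2  # s * cos(theta)
    b = np.sum(shape[:, 0] * ref[:, 1] - shape[:, 1] * ref[:, 0]) / norm2  # s * sin(theta)
    return shape @ np.array([[a, -b], [b, a]]).T

def align_training_set(shapes, n_iter=50, tol=1e-10):
    """Iteratively align all shapes to a running estimate of the mean shape."""
    shapes = [center(np.asarray(s, dtype=float)) for s in shapes]
    # Assumption: the first shape, scaled to unit norm, fixes the reference frame.
    reference = shapes[0] / np.linalg.norm(shapes[0])
    mean = reference
    for _ in range(n_iter):
        aligned = [align_to(s, mean) for s in shapes]
        # Re-align the new mean to the reference so the frame does not drift.
        new_mean = align_to(np.mean(aligned, axis=0), reference)
        new_mean = new_mean / np.linalg.norm(new_mean)  # keep |mean| = 1
        converged = np.linalg.norm(new_mean - mean) < tol
        mean = new_mean
        if converged:
            break
    return np.array(aligned), mean
```

If the training shapes are exact similarity transforms of one another, every aligned shape coincides with the mean after the first pass; real annotations leave a residual, which is the variation the shape model then captures.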
Modeling Shape Variation
Parametric Shape Model
$\mathbf{x} = M(\mathbf{b})$, where $\mathbf{b}$ is the parameter vector
• Estimate the distribution of $\mathbf{b}$
– Generating new shapes
– Examining new shapes (plausibility)
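PCA
1. Compute the mean: $\bar{\mathbf{x}} = \frac{1}{s} \sum_{i=1}^{s} \mathbf{x}_i$
2. Compute the covariance matrix: $\mathbf{S} = \frac{1}{s-1} \sum_{i=1}^{s} (\mathbf{x}_i - \bar{\mathbf{x}})(\mathbf{x}_i - \bar{\mathbf{x}})^T$
3. Compute the eigenvectors $\phi_i$ and corresponding eigenvalues $\lambda_i$ of $\mathbf{S}$, sorted s.t. $\lambda_1 \ge \lambda_2 \ge \cdots$
$\Phi = (\phi_1 | \phi_2 | \cdots | \phi_t)$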
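Shape Approximation
$\mathbf{x} \approx \bar{\mathbf{x}} + \Phi \mathbf{b}$, with $\Phi = (\phi_1 | \phi_2 | \cdots | \phi_t)$
• $\mathbf{x}$: a plausible shape
• $\bar{\mathbf{x}}$: mean shape
• $\mathbf{b} = (b_1, b_2, \ldots, b_t)^T$: parameter vector of the deformable model, $\mathbf{b} = \Phi^T (\mathbf{x} - \bar{\mathbf{x}})$
• With $-3\sqrt{\lambda_i} \le b_i \le 3\sqrt{\lambda_i}$, the generated shape is similar to those in the training set (plausible).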
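The PCA steps and the shape approximation above can be sketched as follows. This is an illustrative sketch, assuming the aligned shape vectors are stacked as the rows of an array `X`; the function names are hypothetical.

```python
import numpy as np

def build_shape_model(X, t):
    """X: (s, nd) array of aligned shape vectors, one per row.

    Returns the mean shape, Phi (nd, t) and the t largest eigenvalues.
    """
    mean = X.mean(axis=0)
    S = np.cov(X, rowvar=False)           # covariance with the 1/(s-1) factor
    evals, evecs = np.linalg.eigh(S)      # eigh returns ascending eigenvalues
    order = np.argsort(evals)[::-1][:t]   # keep the t largest
    return mean, evecs[:, order], evals[order]

def project(x, mean, Phi):
    """b = Phi^T (x - mean)."""
    return Phi.T @ (x - mean)

def reconstruct(b, mean, Phi):
    """x ~ mean + Phi b."""
    return mean + Phi @ b
```

When the number of landmarks is much larger than the number of training shapes, the eigenvectors are often computed from the smaller s-by-s matrix instead; that variant is not shown here.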
Choice of Number of Modes (t = ?)
• Let $f_v$ be the proportion of the total variation one wishes to explain (e.g., 98%)
• The total variance $V_T = \sum_i \lambda_i$ is the sum of all eigenvalues, assuming $\lambda_1 \ge \lambda_2 \ge \cdots$
• Then we can choose the smallest t s.t. $\sum_{i=1}^{t} \lambda_i \ge f_v V_T$
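The rule for choosing t can be written directly as a running sum. A minimal sketch in plain Python; the function name `choose_num_modes` is hypothetical.

```python
def choose_num_modes(eigenvalues, f_v=0.98):
    """Smallest t whose leading eigenvalues explain a fraction f_v of the variance."""
    lam = sorted(eigenvalues, reverse=True)  # lambda_1 >= lambda_2 >= ...
    total = sum(lam)                         # V_T
    running = 0.0
    for t, value in enumerate(lam, start=1):
        running += value
        if running >= f_v * total:
            return t
    return len(lam)
```

For eigenvalues 6, 3 and 1 (total 10), asking for 85% of the variance keeps two modes, since 6 + 3 = 9 is the first running sum that reaches 8.5.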
Examples of Shape Models
• Some members of the training set (18 hands)
• Each is represented by 72 landmark points
[Figure: first three modes of the hand model, each $b_i$ varied over $-3\sqrt{\lambda_i}$, 0, $+3\sqrt{\lambda_i}$ for i = 1, 2, 3]
• Some members of the training set (300 faces)
• Each is represented by 133 landmark points
[Figure: first three modes of the face model, each $b_i$ varied over $-3\sqrt{\lambda_i}$, 0, $+3\sqrt{\lambda_i}$ for i = 1, 2, 3]
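Generating Plausible Shapes
$\mathbf{x} = \bar{\mathbf{x}} + \Phi \mathbf{b}$, where $\mathbf{b} = (b_1, b_2, \ldots, b_t)^T$ is the parameter vector of the deformable model
• Assumption: the $b_i$'s are independent and Gaussian
• Options
– Hard limits on independent $b_i$'s
– Constrain $\mathbf{b}$ to lie in a hyperellipsoid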
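Both options can be sketched in a few lines. In this sketch the limit of 3 standard deviations and the ellipsoid radius `d_max = 3.0` are assumptions chosen for illustration, and the function names are hypothetical.

```python
import numpy as np

def clamp_hard(b, eigenvalues, k=3.0):
    """Option 1: clip each b_i independently to [-k sqrt(lambda_i), k sqrt(lambda_i)]."""
    limit = k * np.sqrt(eigenvalues)
    return np.clip(b, -limit, limit)

def clamp_ellipsoid(b, eigenvalues, d_max=3.0):
    """Option 2: rescale b so that sum(b_i^2 / lambda_i) <= d_max^2."""
    d = np.sqrt(np.sum(b ** 2 / eigenvalues))  # Mahalanobis distance from the mean
    return b if d <= d_max else b * (d_max / d)
```

The hard limits bound each mode separately (a box), while the hyperellipsoid bounds the overall Mahalanobis distance, so it rejects shapes in which several modes are simultaneously near their extremes.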
• Variation is modeled as a linear combination of eigenvectors: $\mathbf{x} = \bar{\mathbf{x}} + \Phi \mathbf{b}$, with $\Phi = (\phi_1 | \phi_2 | \cdots | \phi_t)$
• How about nonlinear cases?
Nonlinear Shape Variation
[Figure: example in which $b_1$ and $b_2$ are not independent]
• Nonlinear shape variations are often caused by
– Rotating parts of objects
– Viewpoint change
– Other special cases, e.g. only 2 valid positions ($\mathbf{x} = f(\mathbf{b})$ fails)
• Plausible shapes cannot be generated using the aforementioned method.
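Non-Linear Models for PDF
$\mathbf{x} = \bar{\mathbf{x}} + \Phi \mathbf{b}$, $\mathbf{b} = (b_1, b_2, \ldots, b_t)^T$
• Polar coordinates (Heap and Hogg)
• Modeling $p(\mathbf{b})$ by a mixture of Gaussians
• Drawbacks
– Figuring out the number of Gaussians to be used
– Finding the nearest plausible shape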
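Similarity Transformation
$\mathbf{X} = T_{X_t, Y_t, s, \theta}(\bar{\mathbf{x}} + \Phi \mathbf{b})$
• $\bar{\mathbf{x}} + \Phi \mathbf{b}$: a shape of the model
• $\mathbf{X}$: a shape in an image
• $T$: similarity transformation (translation, scale, rotation)
$T_{X_t, Y_t, s, \theta} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} X_t \\ Y_t \end{pmatrix} + \begin{pmatrix} s\cos\theta & -s\sin\theta \\ s\sin\theta & s\cos\theta \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}$
• Pose parameters: $X_t, Y_t, s, \theta$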
Fitting a Model to New Points in an Image
$\mathbf{X} = T_{X_t, Y_t, s, \theta}(\bar{\mathbf{x}} + \Phi \mathbf{b})$
• Given a set of new image points $\mathbf{Y}$, find $\mathbf{b}$ and the pose parameters $(X_t, Y_t, s, \theta)$ so as to minimize
$|\mathbf{Y} - T_{X_t, Y_t, s, \theta}(\bar{\mathbf{x}} + \Phi \mathbf{b})|^2$
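The transcript does not preserve the procedure used for this minimization. One common way to approach it, assumed here, is to alternate between solving for the pose with b fixed and solving for b with the pose fixed. The sketch below assumes 2D points, shape vectors ordered $(x_1, y_1, \ldots, x_n, y_n)$, and orthonormal columns in `Phi`; the function names are hypothetical.

```python
import numpy as np

def fit_pose(x, Y):
    """Least-squares similarity transform (matrix M, translation t) taking x onto Y."""
    mx, mY = x.mean(axis=0), Y.mean(axis=0)
    xc, Yc = x - mx, Y - mY
    norm2 = np.sum(xc ** 2)
    a = np.sum(xc * Yc) / norm2  # s * cos(theta)
    b = np.sum(xc[:, 0] * Yc[:, 1] - xc[:, 1] * Yc[:, 0]) / norm2  # s * sin(theta)
    M = np.array([[a, -b], [b, a]])
    return M, mY - M @ mx

def fit_model(Y, x_bar, Phi, n_iter=100, tol=1e-12):
    """Alternate pose and shape-parameter updates to fit the model to points Y (n, 2)."""
    b = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        x = (x_bar + Phi @ b).reshape(-1, 2)   # current model shape
        M, t = fit_pose(x, Y)                  # best pose for this shape
        y = (Y - t) @ np.linalg.inv(M).T       # image points mapped into the model frame
        b_new = Phi.T @ (y.ravel() - x_bar)    # best b for this pose
        done = np.linalg.norm(b_new - b) < tol
        b = b_new
        if done:
            break
    M, t = fit_pose((x_bar + Phi @ b).reshape(-1, 2), Y)  # pose for the final b
    return b, M, t
```

In practice the updated b would also be constrained to the plausible range (for example $|b_i| \le 3\sqrt{\lambda_i}$) inside the loop; that step is omitted here to keep the sketch short.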
Statistical Models of Appearance
Appearance
• Shape
• Texture
– Pattern of intensities
Appearance Model
• Shape variance
• Texture variance
• Shape normalization
Shape Normalization
• Remove spurious texture variations due to shape differences
• Warp each image so that its control points match the mean shape
– triangulation algorithm
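Intensity Normalization
$\mathbf{g} = (\mathbf{g}_{im} - \beta \mathbf{1}) / \alpha$
• $\mathbf{g}_{im}$: gray-level vector of the shape-normalized image
• $\alpha = \mathbf{g}_{im} \cdot \bar{\mathbf{g}}$, $\beta = (\mathbf{g}_{im} \cdot \mathbf{1}) / n$
• $\bar{\mathbf{g}}$: mean of the normalized data, scaled and offset s.t. the sum of its elements is 0 and their variance is 1
– obtained using a recursive process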
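The normalization and the recursive estimate of $\bar{\mathbf{g}}$ can be sketched as follows. This is a sketch under assumptions: the first sample seeds the estimate, a fixed number of passes stands in for a convergence test, and the function names are hypothetical.

```python
import numpy as np

def standardise(g):
    """Offset and scale g so its elements sum to 0 and have variance 1."""
    g = g - g.mean()
    return g / g.std()

def normalise(g_im, g_bar):
    """g = (g_im - beta * 1) / alpha, with alpha = g_im . g_bar and beta = mean(g_im)."""
    beta = g_im.mean()
    alpha = float(np.dot(g_im, g_bar))
    return (g_im - beta) / alpha, alpha, beta

def estimate_mean(samples, n_iter=10):
    """Recursive estimate: normalise all samples to the current mean, then re-average."""
    g_bar = standardise(samples[0])  # assumption: the first sample seeds the estimate
    for _ in range(n_iter):
        normalised = np.array([normalise(g, g_bar)[0] for g in samples])
        g_bar = standardise(normalised.mean(axis=0))
    return g_bar
```

The values `alpha` and `beta` are returned because they are needed later to undo the normalization via $\mathbf{g}_{im} = \alpha \mathbf{g} + \beta \mathbf{1}$.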
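PCA
$\mathbf{g} = \bar{\mathbf{g}} + \mathbf{P}_g \mathbf{b}_g$
• $\bar{\mathbf{g}}$: mean normalized grey-level vector
• $\mathbf{P}_g$: a set of orthogonal modes of variation
• $\mathbf{b}_g$: a set of grey-level parameters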
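Texture Reconstruction
$\mathbf{g} = \bar{\mathbf{g}} + \mathbf{P}_g \mathbf{b}_g$
$\mathbf{g} = (\mathbf{g}_{im} - \beta \mathbf{1}) / \alpha$, so $\mathbf{g}_{im} = \alpha \mathbf{g} + \beta \mathbf{1}$
$\mathbf{g}_{im} = \alpha (\bar{\mathbf{g}} + \mathbf{P}_g \mathbf{b}_g) + \beta \mathbf{1}$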
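Combined Appearance Model
• Shape parameters: $\mathbf{b}_s = \mathbf{P}_s^T (\mathbf{x} - \bar{\mathbf{x}})$
• Grey-level parameters: $\mathbf{b}_g = \mathbf{P}_g^T (\mathbf{g} - \bar{\mathbf{g}})$
$\mathbf{b} = \begin{pmatrix} \mathbf{W}_s \mathbf{b}_s \\ \mathbf{b}_g \end{pmatrix} = \begin{pmatrix} \mathbf{W}_s \mathbf{P}_s^T (\mathbf{x} - \bar{\mathbf{x}}) \\ \mathbf{P}_g^T (\mathbf{g} - \bar{\mathbf{g}}) \end{pmatrix}$
• $\mathbf{W}_s$: a diagonal matrix of weights to accommodate the difference in units between the shape and grey models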
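PCA for Combined Vectors
$\mathbf{b} = \begin{pmatrix} \mathbf{W}_s \mathbf{b}_s \\ \mathbf{b}_g \end{pmatrix} = \begin{pmatrix} \mathbf{W}_s \mathbf{P}_s^T (\mathbf{x} - \bar{\mathbf{x}}) \\ \mathbf{P}_g^T (\mathbf{g} - \bar{\mathbf{g}}) \end{pmatrix}$
$\mathbf{b} = \mathbf{P}_c \mathbf{c}$
• $\mathbf{P}_c$: eigenvectors from applying PCA to the $\mathbf{b}$'s
• $\mathbf{c}$: appearance parameters controlling both the shape and grey-levels of the model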
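Shape & Grey-Level Reconstruction
$\mathbf{b} = \begin{pmatrix} \mathbf{W}_s \mathbf{b}_s \\ \mathbf{b}_g \end{pmatrix} = \mathbf{P}_c \mathbf{c}$, with $\mathbf{P}_c = \begin{pmatrix} \mathbf{P}_{cs} \\ \mathbf{P}_{cg} \end{pmatrix}$
$\mathbf{x} = \bar{\mathbf{x}} + \mathbf{P}_s \mathbf{W}_s^{-1} \mathbf{P}_{cs} \mathbf{c}$
$\mathbf{g} = \bar{\mathbf{g}} + \mathbf{P}_g \mathbf{P}_{cg} \mathbf{c}$
i.e. $\mathbf{x} = \bar{\mathbf{x}} + \mathbf{Q}_s \mathbf{c}$ and $\mathbf{g} = \bar{\mathbf{g}} + \mathbf{Q}_g \mathbf{c}$, where $\mathbf{Q}_s = \mathbf{P}_s \mathbf{W}_s^{-1} \mathbf{P}_{cs}$ and $\mathbf{Q}_g = \mathbf{P}_g \mathbf{P}_{cg}$
Appearance Reconstruction
• Given the appearance parameters $\mathbf{c}$
– Generate the shape-free gray-level image $\mathbf{g}$
– Warp $\mathbf{g}$ to the shape described by $\mathbf{x}$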
Review: Combined Appearance Models
$\mathbf{b} = \begin{pmatrix} \mathbf{W}_s \mathbf{b}_s \\ \mathbf{b}_g \end{pmatrix} = \begin{pmatrix} \mathbf{W}_s \mathbf{P}_s^T (\mathbf{x} - \bar{\mathbf{x}}) \\ \mathbf{P}_g^T (\mathbf{g} - \bar{\mathbf{g}}) \end{pmatrix}$
• $\mathbf{W}_s$: a diagonal matrix of weights to accommodate the difference in units between the shape and grey models
• How is $\mathbf{W}_s$ obtained?
Choice of Shape Parameter Weights
• Method 1: Displace each element of $\mathbf{b}_s$ from its optimum value and observe the change in $\mathbf{g}$ for each training example
– The RMS change gives the elements of $\mathbf{W}_s$
• Method 2: $\mathbf{W}_s = r\mathbf{I}$, where $r^2$ is the ratio of the total intensity variation to the total shape variation
• The results are relatively insensitive to the choice of $\mathbf{W}_s$
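The combined model with the Method 2 weights ($\mathbf{W}_s = r\mathbf{I}$) can be sketched as below. This is a minimal sketch, assuming aligned shape vectors in the rows of `X` and normalized grey-level vectors in the rows of `G`; "total variation" is taken to be the summed per-element sample variance, and all function names and dictionary keys are hypothetical.

```python
import numpy as np

def pca(data, t):
    """Mean and the t leading eigenvectors of the sample covariance of the rows."""
    mean = data.mean(axis=0)
    evals, evecs = np.linalg.eigh(np.cov(data, rowvar=False))
    order = np.argsort(evals)[::-1][:t]
    return mean, evecs[:, order]

def build_combined_model(X, G, t_s, t_g, t_c):
    x_bar, P_s = pca(X, t_s)
    g_bar, P_g = pca(G, t_g)
    b_s = (X - x_bar) @ P_s                      # rows are b_s^T
    b_g = (G - g_bar) @ P_g                      # rows are b_g^T
    # Method 2: W_s = r I, with r^2 = total intensity variance / total shape variance.
    r = np.sqrt(G.var(axis=0, ddof=1).sum() / X.var(axis=0, ddof=1).sum())
    b = np.hstack([r * b_s, b_g])                # rows are (W_s b_s, b_g); zero mean
    _, P_c = pca(b, t_c)
    P_cs, P_cg = P_c[:t_s], P_c[t_s:]
    return {"x_bar": x_bar, "g_bar": g_bar, "P_s": P_s, "P_g": P_g, "P_c": P_c,
            "r": r, "Q_s": P_s @ P_cs / r, "Q_g": P_g @ P_cg}

def appearance_params(x, g, m):
    """c = P_c^T b for one labelled example."""
    b = np.concatenate([m["r"] * (m["P_s"].T @ (x - m["x_bar"])),
                        m["P_g"].T @ (g - m["g_bar"])])
    return m["P_c"].T @ b

def reconstruct(c, m):
    """x = x_bar + Q_s c, g = g_bar + Q_g c."""
    return m["x_bar"] + m["Q_s"] @ c, m["g_bar"] + m["Q_g"] @ c
```

With `t_c` smaller than `t_s + t_g` the reconstruction becomes approximate, which is the usual case, since shape and grey-level parameters tend to be correlated.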
Example: Facial Appearance Model
[Figure: first two modes of shape variation (±3 sd)]
[Figure: first two modes of grey-level variation (±3 sd)]
[Figure: first four modes of appearance variation (±3 sd)]
Approximating a New Example
• Given a new image, labelled with a set of landmarks, generate an approximation with the model.
1. Obtain $\mathbf{b}_s$ and $\mathbf{b}_g$
2. Obtain $\mathbf{b}$
3. Obtain $\mathbf{c}$
4. Apply $\mathbf{x} = \bar{\mathbf{x}} + \mathbf{P}_s \mathbf{W}_s^{-1} \mathbf{P}_{cs} \mathbf{c}$ and $\mathbf{g} = \bar{\mathbf{g}} + \mathbf{P}_g \mathbf{P}_{cg} \mathbf{c}$
5. Invert the gray-level normalization by $\mathbf{g}_{im} = \alpha \mathbf{g} + \beta \mathbf{1}$
6. Apply the pose to the points
7. Project the gray-level vector into the image
Active Shape Models
Goal
• Given a rough starting approximation, fit an instance of a model to the image
Iterative Approach
• Iteratively improving the fit of the instance, $\mathbf{X}$, to an image proceeds as follows:
1. Examine a region of the image around each point $X_i$ to find the best nearby match for the point
2. Update the parameters $(X_t, Y_t, s, \theta, \mathbf{b})$ to best fit the newly found points
3. Repeat until convergence
• The above method is applicable if model points are edges
• The best approach is to examine the local structures of model points (to be discussed)
Modeling Local Structure
• Sample the derivative along a profile, k pixels on either side of a model point, to get a vector $\mathbf{g}_i$ of 2k+1 points
• Normalize: $\mathbf{g}_i \to \frac{1}{\sum_j |g_{ij}|} \mathbf{g}_i$
• Repeat for each training image for the same model point to get $\{\mathbf{g}_i\}$
• Estimate the mean $\bar{\mathbf{g}}$ and covariance $\mathbf{S}_g$
[Figure: profile $\mathbf{g}_i$ through a model point, k points on either side]
Fitness
• $f(\mathbf{g}_s) = (\mathbf{g}_s - \bar{\mathbf{g}})^T \mathbf{S}_g^{-1} (\mathbf{g}_s - \bar{\mathbf{g}})$
– the Mahalanobis distance of a new sample $\mathbf{g}_s$ from the model mean
Using Local Structure Model
• Sample a profile m pixels either side of the current point (m > k)
• Test the quality of fit at 2(m − k) + 1 positions
• Choose the one which gives the best match
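The profile model and the search along a longer profile can be sketched as follows. Two assumptions are labeled in the code: the profiles are passed in as already-sampled 1D arrays (sampling the image derivative along the normal is not shown), and a small ridge term `reg` is added to the covariance so that it can be inverted. The function names are hypothetical.

```python
import numpy as np

def normalise_profile(g):
    """Divide the profile by the sum of its absolute element values."""
    g = np.asarray(g, dtype=float)
    return g / np.sum(np.abs(g))

def build_profile_model(profiles, reg=1e-6):
    """profiles: (num_images, 2k+1) samples for one model point."""
    G = np.array([normalise_profile(p) for p in profiles])
    g_bar = G.mean(axis=0)
    # Assumption: ridge term, because normalised profiles give a near-singular covariance.
    S_g = np.cov(G, rowvar=False) + reg * np.eye(G.shape[1])
    return g_bar, np.linalg.inv(S_g)

def fitness(g_s, g_bar, S_inv):
    """Mahalanobis distance of a candidate profile from the model mean."""
    d = normalise_profile(g_s) - g_bar
    return float(d @ S_inv @ d)

def best_offset(search_profile, g_bar, S_inv):
    """Try all 2(m-k)+1 windows of a (2m+1)-sample profile; return the best offset."""
    k = (len(g_bar) - 1) // 2
    m = (len(search_profile) - 1) // 2
    costs = [fitness(search_profile[i:i + 2 * k + 1], g_bar, S_inv)
             for i in range(2 * (m - k) + 1)]
    return int(np.argmin(costs)) - (m - k)  # 0 means "stay at the current point"
```

The returned offset is measured in samples along the profile, so the suggested new landmark position is the current point moved by that many steps along the profile direction.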
Statistical Models of Grey-Level Profiles
The same number of pixels is used at each level
MRASM Algorithm (multi-resolution ASM)
1. Start from the coarsest level
2. Set up the initial position before the search
3. Search for the best match position for each model point
4. Ensure plausibility
5. Repeat until convergence at the current level
6. Change to a finer level
7. Report the result
Summary of Parameters
Examples of Search
[Figure: examples of search]
[Figure: search from a poor starting point]
[Figure: search using an ASM of cartilage on an MR image of the knee]
Active Appearance Models
Disadvantages of ASM
• Only uses shape constraints (together with some information about the image structure near the landmarks) for search
• Incapable of generating a synthetic image
Goal of AAM
• Given a rough starting approximation of an appearance model, fit it within an image
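Review: Appearance Model
$\mathbf{x} = \bar{\mathbf{x}} + \mathbf{Q}_s \mathbf{c}$
$\mathbf{g} = \bar{\mathbf{g}} + \mathbf{Q}_g \mathbf{c}$
• $\mathbf{c}$: model parameters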
Overview of AAM Search
$\mathbf{x} = \bar{\mathbf{x}} + \mathbf{Q}_s \mathbf{c}$, $\mathbf{g} = \bar{\mathbf{g}} + \mathbf{Q}_g \mathbf{c}$
$\delta\mathbf{g} = \mathbf{g}_i - \mathbf{g}_m$
• $\delta\mathbf{g}$: difference vector
• $\mathbf{g}_i$: grey-level vector in the image
• $\mathbf{g}_m$: grey-level vector generated by the model
• $\Delta = |\delta\mathbf{g}|^2$
AAM Search
• Minimize $\Delta$ by varying $\mathbf{c}$ effectively
• Given $\delta\mathbf{g}$, how to obtain $\delta\mathbf{c}$ deterministically?
Learning to Correct Model Parameters
• Linear model: $\delta\mathbf{c} = \mathbf{A}\,\delta\mathbf{g}$
– Multivariate regression on a sample of known model displacements, $\delta\mathbf{c}$, and the corresponding $\delta\mathbf{g}$
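Extra Parameters for Pose
• Pose parameters $(s_x, s_y, t_x, t_y)$, with $s_x = s\cos\theta$ and $s_y = s\sin\theta$
• Linear model $\delta\mathbf{c} = \mathbf{A}\,\delta\mathbf{g}$, with $\delta\mathbf{c}$ including $(s_x, s_y, t_x, t_y)$
– Multivariate regression on a sample of known model displacements, $\delta\mathbf{c}$, and the corresponding $\delta\mathbf{g}$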
Training
• Linear model $\delta\mathbf{c} = \mathbf{A}\,\delta\mathbf{g}$, including $(s_x, s_y, t_x, t_y)$
• $\mathbf{c}_0$: a known appearance model in the current image
• Perturb by $\delta\mathbf{c}$ to get new parameters, i.e., $\mathbf{c} = \mathbf{c}_0 + \delta\mathbf{c}$
– including small displacements in position, scale, and orientation
• Compute $\delta\mathbf{g} = \mathbf{g}_i - \mathbf{g}_m$
– use the shape-normalized representation (warping)
• Record enough perturbations and image differences for regression
• Ideally, we want a model that holds over a large error range $\delta\mathbf{g}$, and so also over a large parameter range $\delta\mathbf{c}$
• Experimentally, the optimal perturbation is around 0.5 standard deviations for each parameter
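The training procedure can be sketched as an ordinary least-squares regression. In this sketch `residual_fn` is a hypothetical stand-in for "sample the image under parameters c, warp to the shape-normalized frame and return $\delta\mathbf{g} = \mathbf{g}_i - \mathbf{g}_m$"; drawing Gaussian perturbations with a spread of 0.5 standard deviations is an assumption about how the perturbations are generated.

```python
import numpy as np

def learn_update_matrix(residual_fn, c0, sigmas, n_samples=200, scale=0.5, seed=0):
    """Estimate A in dc = A dg from known perturbations of the parameters c0."""
    rng = np.random.default_rng(seed)
    c0 = np.asarray(c0, dtype=float)
    # Known displacements dc, one per row (assumed Gaussian, spread scale * sigma).
    dC = rng.normal(size=(n_samples, c0.size)) * (scale * np.asarray(sigmas))
    # Corresponding image differences dg, one per row.
    dG = np.array([residual_fn(c0 + dc) for dc in dC])
    # Solve dG @ A^T ~ dC in the least-squares sense.
    A_T, *_ = np.linalg.lstsq(dG, dC, rcond=None)
    return A_T.T
```

With real images the perturbations would be pooled over many training examples before solving, because the grey-level vector usually has far more elements than any single example provides perturbations.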
Results For The Face Model
$\delta\mathbf{c} = \mathbf{A}\,\delta\mathbf{g}$, i.e. $\delta c_i = \mathbf{a}_i \cdot \delta\mathbf{g}$, where $\mathbf{a}_i$ is the i-th row of $\mathbf{A}$
• $\mathbf{a}_i$ gives the weight attached to different areas of the sampled patch when estimating the displacement
[Figure: pose-parameter displacement weights for $s_x$, $s_y$, $t_x$, $t_y$]
[Figure: first mode and displacement weights]
[Figure: second mode and displacement weights]
Performance of the Prediction
• The linear relation holds within 4 pixels
• As long as the prediction has the same sign as the actual error, and does not over-predict by much, the search converges
• Extend the range by building a multi-resolution model
Multi-Resolution Model
• L0: 10000 pixels
• L1: 2500 pixels
• L2: 600 pixels
AAM Search: Iterative Model Refinement
1. Evaluate the error vector $\delta\mathbf{g}_0 = \mathbf{g}_s - \mathbf{g}_m$
2. Evaluate the current error $E_0 = |\delta\mathbf{g}_0|^2$
3. Compute the predicted displacement, $\delta\mathbf{c} = \mathbf{A}\,\delta\mathbf{g}_0$
4. Set k = 1
5. Let $\mathbf{c}_1 = \mathbf{c}_0 - k\,\delta\mathbf{c}$
6. Sample the image at this new prediction, and calculate a new error vector, $\delta\mathbf{g}_1$
7. If $|\delta\mathbf{g}_1|^2 < E_0$ then accept the new estimate, $\mathbf{c}_1$
8. Otherwise try at k = 1.5, k = 0.5, k = 0.25 etc.
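The refinement loop can be sketched as below. As an assumption, `residual_fn` again stands in for sampling the image and computing the error vector for given parameters, and the update subtracts the predicted displacement, which matches the training convention $\mathbf{c} = \mathbf{c}_0 + \delta\mathbf{c}$. The function name and the iteration cap are hypothetical.

```python
import numpy as np

def aam_search(c0, residual_fn, A, max_iter=30, steps=(1.0, 1.5, 0.5, 0.25)):
    """Iteratively refine parameters c using the learned linear predictor A."""
    c = np.asarray(c0, dtype=float)
    dg = residual_fn(c)          # error vector
    E = float(dg @ dg)           # current error |dg|^2
    for _ in range(max_iter):
        dc = A @ dg              # predicted displacement
        for k in steps:          # try k = 1 first, then 1.5, 0.5, 0.25
            c_new = c - k * dc
            dg_new = residual_fn(c_new)
            E_new = float(dg_new @ dg_new)
            if E_new < E:        # accept the first step that reduces the error
                c, dg, E = c_new, dg_new, E_new
                break
        else:
            break                # no step improved the error: declare convergence
    return c, E
```

Trying several values of k makes the loop tolerant of a predictor that over- or under-shoots, as long as the predicted direction is roughly right.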
Examples of AAM Search
[Figure: reconstruction (left) and original (right), given the original landmark points]
[Figure: multi-resolution search from a displaced position]
[Figure: first two modes of appearance variation of the knee model]
[Figure: best fit of the knee model to a new image, given landmarks]
[Figure: multi-resolution search from a displaced position]
Comparison : ASM v/s AAM
Key Differences
• ASM
– only uses models of the image texture in the small regions around each landmark point
– searches around the current position
– seeks to minimize the distance between model points and the corresponding image points
• AAM
– uses a model of the appearance of the whole region
– only samples the image under the current position
– seeks to minimize the difference between the synthesized image and the target image
Experiment Data
• Two data sets
– 400 face images, 133 landmarks
– 72 brain slices, 133 landmark points
• Training data set
– Faces: 200, tested on the remaining 200
– Brain: 400, leave-one-brain-out experiments
Capture Range
[Figure: capture range results]
Point Location Accuracy
[Figure: point location accuracy results]
• ASM runs significantly faster for both models, and locates the points more accurately
Texture Matching
Conclusion
• ASM searches around the current location, along profiles, so one would expect it to have a larger capture range
• ASM takes only the shape into account and is thus less reliable
• AAM can work well with a much smaller number of landmarks than ASM