Statistical Models of Appearance for Computer Vision

Speaker: 虞台文

Content
– Overview
– Statistical Shape Models
– Statistical Appearance Models
– Active Shape Models
– Active Appearance Models
– Comparison : ASM v/s AAM
– Conclusion


Overview

Computer Vision

Goal
– Image understanding

Challenge
– Deformability of objects

Statistical model-based approach
– Shape model
– Appearance model
– Model matching

Image interpretation

Deformable Models

General
– capable of generating any plausible example of the class they represent

Specific
– only capable of generating legal examples


Statistical Shape Models

Shapes
– The shape of an object is represented by a set of n points in any dimension

Invariance under some transformations
– in 2-3 dimensions: translation, rotation, scaling
– called similarity transformation

Hand-Annotated Training Set

The training set typically comes from hand annotation of a set of training images

Suitable Landmarks
– Object boundary
– High curvature
– T junction
– Equally spaced intermediate points

Shape Vector

A d-dimensional shape with n landmarks is represented by a vector with nd elements.

In a 2D image with n landmark points, a shape vector x is then

$\mathbf{x} = (x_1, y_1, \ldots, x_n, y_n)^T$

Training Set of Shape Vectors

The shape vectors in the training set should be in the same coordinate frame
– Alignment of the training set is required

Aligning the Training Set

Procrustes Analysis
– Minimize $D = \sum_i |\mathbf{x}_i - \bar{\mathbf{x}}|^2$

Alignment : Iterative Approach

[Figure: flowchart of the iterative alignment; recoverable labels: $|\bar{\mathbf{x}}| = 1$, $\bar{\mathbf{x}}_0$, $\bar{\mathbf{x}}$]

Possible Alignment Approaches:
• Center -> scale (|x| = 1) -> rotation
• Center -> (scale + rotation)
• Center -> (scale + rotation) -> projection onto tangent space of the mean

Tangent space to $\mathbf{x}_t$
– All vectors $\mathbf{x}$ s.t. $(\mathbf{x}_t - \mathbf{x}) \cdot \mathbf{x}_t = 0$, or $\mathbf{x} \cdot \mathbf{x}_t = 1$ if $|\mathbf{x}_t| = 1$
– Obtained by aligning the shapes with the mean, allowing scaling and rotation, then projecting into the tangent space by scaling $\mathbf{x}$ by $1/(\mathbf{x} \cdot \bar{\mathbf{x}})$
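The iterative alignment can be sketched as follows. This is a minimal illustration, assuming 2D shapes stored as (n, 2) arrays and the "Center -> (scale + rotation)" approach; the function names are hypothetical, the tangent-space projection is left out, and the loop follows the commonly described scheme (re-estimate the mean, re-align it to the first estimate, normalize to unit length), which the slides may or may not match step for step.

```python
import numpy as np

def center(shape):
    """Translate an (n, 2) landmark array so its centroid is at the origin."""
    return shape - shape.mean(axis=0)

def align_to(shape, ref):
    """Scale and rotate a centered shape to best match a centered reference
    (least squares; reflections are not handled)."""
    norm = np.sum(shape ** 2)
    a = np.sum(shape * ref) / norm  # s * cos(theta)
    b = np.sum(shape[:, 0] * ref[:, 1] - shape[:, 1] * ref[:, 0]) / norm  # s * sin(theta)
    rot = np.array([[a, -b], [b, a]])
    return shape @ rot.T

def align_training_set(shapes, n_iter=100, tol=1e-10):
    """Iteratively align all shapes to the evolving mean shape."""
    shapes = [center(np.asarray(s, dtype=float)) for s in shapes]
    mean = shapes[0] / np.linalg.norm(shapes[0])  # first estimate, |mean| = 1
    ref0 = mean.copy()                            # fixes the reference frame
    for _ in range(n_iter):
        shapes = [align_to(s, mean) for s in shapes]
        new_mean = align_to(np.mean(shapes, axis=0), ref0)
        new_mean /= np.linalg.norm(new_mean)
        converged = np.linalg.norm(new_mean - mean) < tol
        mean = new_mean
        if converged:
            break
    return np.array(shapes), mean
```

Re-aligning the mean to the first estimate and rescaling it to unit length at every pass keeps the mean from drifting or shrinking as the iterations proceed.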

Modeling Shape Variation

Parametric Shape Model

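$\mathbf{x} = M(\mathbf{b})$, where $\mathbf{b}$ is the parameter vector

Estimate the distribution of b
– Generating new shapes
– Examining new shapes (plausibility)

PCA
1. Compute the mean: $\bar{\mathbf{x}} = \frac{1}{s}\sum_{i=1}^{s}\mathbf{x}_i$
2. Compute the covariance matrix: $\mathbf{S} = \frac{1}{s-1}\sum_{i=1}^{s}(\mathbf{x}_i - \bar{\mathbf{x}})(\mathbf{x}_i - \bar{\mathbf{x}})^T$
3. Compute the eigenvectors $\phi_i$ and corresponding eigenvalues $\lambda_i$ of $\mathbf{S}$ s.t. $\lambda_1 \ge \lambda_2 \ge \cdots$, and collect the first t eigenvectors as $\Phi = (\phi_1 | \phi_2 | \cdots | \phi_t)$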
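The three PCA steps can be sketched in NumPy. This is a minimal illustration, assuming the aligned shape vectors are stacked one per row in an (s, nd) array; `build_shape_model` is a hypothetical helper name, not something taken from the slides.

```python
import numpy as np

def build_shape_model(X):
    """X: (s, nd) array, one aligned shape vector per row.
    Returns the mean, the eigenvectors as columns, and the eigenvalues,
    sorted so that the eigenvalues are in descending order."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)                   # step 1: mean shape
    S = np.cov(X, rowvar=False)             # step 2: covariance, 1/(s-1) normalization
    eigvals, eigvecs = np.linalg.eigh(S)    # step 3: eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals = np.clip(eigvals[order], 0.0, None)  # remove tiny negative round-off
    return mean, eigvecs[:, order], eigvals
```

Keeping only the first t columns of the returned eigenvector matrix gives the matrix Φ used in the shape approximation.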

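Shape Approximation

$\mathbf{x} \approx \bar{\mathbf{x}} + \Phi\mathbf{b}$, with $\Phi = (\phi_1 | \phi_2 | \cdots | \phi_t)$
– $\mathbf{x}$: a plausible shape
– $\bar{\mathbf{x}}$: mean shape
– $\mathbf{b} = (b_1, b_2, \ldots, b_t)^T$: parameter vector of the deformable model, $\mathbf{b} = \Phi^T(\mathbf{x} - \bar{\mathbf{x}})$

With $-3\sqrt{\lambda_i} \le b_i \le 3\sqrt{\lambda_i}$, the generated shape is similar to those in the training set (plausible).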

Choice of Number of Modes (t = ?)

Let $f_v$ be the proportion of the total variation one wishes to explain (e.g., 98%)

The total variance $V_T = \sum_i \lambda_i$ is the sum of all eigenvalues, assuming $\lambda_1 \ge \lambda_2 \ge \cdots$

Then we can choose the smallest t s.t.

$\sum_{i=1}^{t} \lambda_i \ge f_v V_T$
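The rule for choosing t can be written in a few lines. This is a small sketch, assuming the eigenvalues are available as a 1D array; `choose_num_modes` is a hypothetical name.

```python
import numpy as np

def choose_num_modes(eigvals, f_v=0.98):
    """Smallest t such that the first t eigenvalues explain at least
    a fraction f_v of the total variance V_T."""
    eigvals = np.sort(np.asarray(eigvals, dtype=float))[::-1]  # descending
    cumulative = np.cumsum(eigvals)
    target = f_v * cumulative[-1]            # f_v * V_T
    t = int(np.searchsorted(cumulative, target)) + 1
    return min(t, len(eigvals))
```

For eigenvalues (6, 3, 1) the total variance is 10, so f_v = 0.85 requires a cumulative sum of at least 8.5, which the first two modes (6 + 3 = 9) provide.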

Examples of Shape Models

Some members in the training set (18 hands)

Each is represented by 72 landmark points


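[Figure: effect of varying each of the first three modes of the hand model; columns show $b_i = -3\sqrt{\lambda_i}$, $0$, $+3\sqrt{\lambda_i}$ for $i = 1, 2, 3$]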


Some members in the training set (300 faces)

Each is represented by 133 landmark points


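[Figure: effect of varying each of the first three modes of the face model; columns show $b_i = -3\sqrt{\lambda_i}$, $0$, $+3\sqrt{\lambda_i}$ for $i = 1, 2, 3$]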

Generating Plausible Shapes

$\mathbf{x} \approx \bar{\mathbf{x}} + \Phi\mathbf{b}$, with $\Phi = (\phi_1 | \phi_2 | \cdots | \phi_t)$ and $\mathbf{b} = (b_1, b_2, \ldots, b_t)^T$

Assumption: the $b_i$'s are independent and Gaussian

Options
– Hard limits on independent $b_i$'s
– Constrain b in a hyperellipsoid

Variation is modeled as a linear combination of eigenvectors

How about nonlinear cases?
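The two options for keeping b plausible can be sketched as follows. Both function names are hypothetical, and the thresholds are assumptions: the factor 3 for the hard limits follows the ±3√λ range used earlier, while the hyperellipsoid radius `d_max` is chosen here purely for illustration.

```python
import numpy as np

def clamp_hard_limits(b, eigvals, k=3.0):
    """Option 1: clip each b_i independently to [-k*sqrt(lambda_i), k*sqrt(lambda_i)]."""
    limit = k * np.sqrt(eigvals)
    return np.clip(b, -limit, limit)

def clamp_hyperellipsoid(b, eigvals, d_max=3.0):
    """Option 2: if the Mahalanobis distance sqrt(sum(b_i^2 / lambda_i)) exceeds
    d_max, rescale b so that it lies on the hyperellipsoid surface."""
    d = np.sqrt(np.sum(b ** 2 / eigvals))
    return b if d <= d_max else b * (d_max / d)
```

The hard limits bound each mode separately, so a vector at the limit in every mode is still accepted; the hyperellipsoid bounds the joint distance and therefore rejects such corner cases.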

Nonlinear Shape Variation


b1 and b2 are not independent.


Nonlinear shape variations are often caused by
– Rotating parts of objects
– View point change
– Other special cases, e.g., only 2 valid positions (x = f(b) fails)

Plausible shapes cannot be generated using the aforementioned method.

Non-Linear Models for PDF

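$\mathbf{x} \approx \bar{\mathbf{x}} + \Phi\mathbf{b}$, with $\mathbf{b} = (b_1, b_2, \ldots, b_t)^T$

– Polar coordinates (Heap and Hogg)
– Modeling p(b) by a mixture of Gaussians

Drawbacks
– Figuring out the number of Gaussians to be used
– Finding the nearest plausible shape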

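Similarity Transformation

$\mathbf{x} = T_{X_t, Y_t, s, \theta}(\bar{\mathbf{x}} + \Phi\mathbf{b})$
– $\mathbf{x}$: a shape in an image
– $\bar{\mathbf{x}} + \Phi\mathbf{b}$: a shape of the model
– $T_{X_t, Y_t, s, \theta}$: similarity transformation (translation, scale, rotation)

$T_{X_t, Y_t, s, \theta}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} X_t \\ Y_t \end{pmatrix} + \begin{pmatrix} s\cos\theta & -s\sin\theta \\ s\sin\theta & s\cos\theta \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix}$

Pose parameters: $(X_t, Y_t, s, \theta)$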

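Fitting a Model to New Points in an Image

$\mathbf{x} = T_{X_t, Y_t, s, \theta}(\bar{\mathbf{x}} + \Phi\mathbf{b})$, where

$T_{X_t, Y_t, s, \theta}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} X_t \\ Y_t \end{pmatrix} + \begin{pmatrix} s\cos\theta & -s\sin\theta \\ s\sin\theta & s\cos\theta \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix}$

Given a set of new image points Y, find b and the pose parameters $(X_t, Y_t, s, \theta)$ so as to minimize

$|\mathbf{Y} - T_{X_t, Y_t, s, \theta}(\bar{\mathbf{x}} + \Phi\mathbf{b})|^2$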
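One common way to minimize this expression is to alternate between solving for the pose with b fixed and solving for b with the pose fixed. The sketch below illustrates that idea under stated assumptions: shape vectors are interleaved as (x_1, y_1, ..., x_n, y_n), b is clipped to ±3√λ, the tangent-space projection is omitted, and the names `fit_pose` and `fit_model` are hypothetical.

```python
import numpy as np

def fit_pose(x, Y):
    """Least-squares similarity transform mapping model points x onto image
    points Y (both (n, 2)). Returns the 2x2 matrix M and the translation t."""
    cx, cY = x.mean(axis=0), Y.mean(axis=0)
    x0, Y0 = x - cx, Y - cY
    norm = np.sum(x0 ** 2)
    a = np.sum(x0 * Y0) / norm                                      # s * cos(theta)
    b = np.sum(x0[:, 0] * Y0[:, 1] - x0[:, 1] * Y0[:, 0]) / norm    # s * sin(theta)
    M = np.array([[a, -b], [b, a]])
    return M, cY - M @ cx

def fit_model(Y, mean, Phi, eigvals, n_iter=50):
    """Alternate pose and shape-parameter updates until b stops changing."""
    b = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        x = (mean + Phi @ b).reshape(-1, 2)       # current model shape
        M, t = fit_pose(x, Y)                     # best pose for this shape
        y = (Y - t) @ np.linalg.inv(M).T          # image points in the model frame
        b_new = Phi.T @ (y.ravel() - mean)        # project onto the modes
        limit = 3.0 * np.sqrt(eigvals)
        b_new = np.clip(b_new, -limit, limit)     # keep the shape plausible
        if np.allclose(b_new, b, atol=1e-10):
            break
        b = b_new
    return b, M, t
```

The returned matrix M encodes scale and rotation together (M = s·R(θ)), so s and θ can be recovered from its first column if they are needed separately.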


Statistical Models of Appearance

Appearance
– Shape
– Texture: pattern of intensities

Appearance Model
– shape variance
– texture variance
– shape normalization

Shape Normalization
– Remove spurious texture variations due to shape differences
– Warp each image to match control points with the mean image (triangulation algorithm)

Intensity Normalization

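$\mathbf{g} = (\mathbf{g}_{im} - \beta\mathbf{1})/\alpha$
– $\mathbf{g}_{im}$: gray-level vector of the shape-normalized image
– $\alpha = \mathbf{g}_{im} \cdot \bar{\mathbf{g}}$, $\beta = (\mathbf{g}_{im} \cdot \mathbf{1})/n$
– $\bar{\mathbf{g}}$: mean of the normalized data, scaled and offset s.t. the sum of its elements is 0 and their variance is 1; obtained using a recursive process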
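The recursive normalization can be sketched as a fixed-point loop: normalize every texture against the current mean, re-estimate the mean, and repeat. This is an illustrative sketch only; the function names are hypothetical, and starting from the first training texture is an assumption.

```python
import numpy as np

def standardize(v):
    """Offset and scale v so its elements sum to zero and have unit variance."""
    v = v - v.mean()
    return v / v.std()

def normalize_texture(g_im, g_bar):
    """g = (g_im - beta * 1) / alpha with alpha = g_im . g_bar, beta = (g_im . 1) / n."""
    beta = g_im.mean()
    alpha = np.dot(g_im, g_bar)
    return (g_im - beta) / alpha

def normalize_set(G, n_iter=20, tol=1e-10):
    """Alternate between normalizing all textures and re-estimating the mean."""
    G = np.asarray(G, dtype=float)
    g_bar = standardize(G[0])                  # first estimate of the mean
    for _ in range(n_iter):
        normalized = np.array([normalize_texture(g, g_bar) for g in G])
        new_bar = standardize(normalized.mean(axis=0))
        converged = np.linalg.norm(new_bar - g_bar) < tol
        g_bar = new_bar
        if converged:
            break
    return normalized, g_bar
```

Textures that differ only by a positive gain and an offset map to the same normalized vector, which is the purpose of the normalization.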

PCA

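$\mathbf{g} = \bar{\mathbf{g}} + \mathbf{P}_g\mathbf{b}_g$
– $\bar{\mathbf{g}}$: mean normalized grey-level vector
– $\mathbf{P}_g$: a set of orthogonal modes of variation
– $\mathbf{b}_g$: a set of grey-level parameters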

Texture Reconstruction

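$\mathbf{g} = \bar{\mathbf{g}} + \mathbf{P}_g\mathbf{b}_g$

$\mathbf{g} = (\mathbf{g}_{im} - \beta\mathbf{1})/\alpha$

$\mathbf{g}_{im} = \alpha\mathbf{g} + \beta\mathbf{1} = \alpha(\bar{\mathbf{g}} + \mathbf{P}_g\mathbf{b}_g) + \beta\mathbf{1}$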

Combined Appearance Model

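Shape parameters: $\mathbf{b}_s = \mathbf{P}_s^T(\mathbf{x} - \bar{\mathbf{x}})$, where $\mathbf{P}_s$ is the matrix of shape eigenvectors (written $\Phi$ in the shape model)

Appearance parameters: $\mathbf{b}_g = \mathbf{P}_g^T(\mathbf{g} - \bar{\mathbf{g}})$

$\mathbf{b} = \begin{pmatrix} \mathbf{W}_s\mathbf{b}_s \\ \mathbf{b}_g \end{pmatrix} = \begin{pmatrix} \mathbf{W}_s\mathbf{P}_s^T(\mathbf{x} - \bar{\mathbf{x}}) \\ \mathbf{P}_g^T(\mathbf{g} - \bar{\mathbf{g}}) \end{pmatrix}$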

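Combined Appearance Models

$\mathbf{b} = \begin{pmatrix} \mathbf{W}_s\mathbf{b}_s \\ \mathbf{b}_g \end{pmatrix} = \begin{pmatrix} \mathbf{W}_s\mathbf{P}_s^T(\mathbf{x} - \bar{\mathbf{x}}) \\ \mathbf{P}_g^T(\mathbf{g} - \bar{\mathbf{g}}) \end{pmatrix}$

$\mathbf{W}_s$: a diagonal matrix of weights to accommodate the difference in units between the shape and grey models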

PCA for Combined Vectors

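$\mathbf{b} = \begin{pmatrix} \mathbf{W}_s\mathbf{b}_s \\ \mathbf{b}_g \end{pmatrix} = \begin{pmatrix} \mathbf{W}_s\mathbf{P}_s^T(\mathbf{x} - \bar{\mathbf{x}}) \\ \mathbf{P}_g^T(\mathbf{g} - \bar{\mathbf{g}}) \end{pmatrix}$

$\mathbf{b} = \mathbf{P}_c\mathbf{c}$
– $\mathbf{P}_c$: eigenvectors from applying PCA on the b's
– $\mathbf{c}$: appearance parameters controlling both the shape and grey-levels of the model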

Shape & Grey-Level Reconstruction

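$\mathbf{b} = \begin{pmatrix} \mathbf{W}_s\mathbf{b}_s \\ \mathbf{b}_g \end{pmatrix} = \begin{pmatrix} \mathbf{W}_s\mathbf{P}_s^T(\mathbf{x} - \bar{\mathbf{x}}) \\ \mathbf{P}_g^T(\mathbf{g} - \bar{\mathbf{g}}) \end{pmatrix}$, $\mathbf{b} = \mathbf{P}_c\mathbf{c}$

$\mathbf{P}_c = \begin{pmatrix} \mathbf{P}_{cs} \\ \mathbf{P}_{cg} \end{pmatrix}$

$\mathbf{x} = \bar{\mathbf{x}} + \mathbf{P}_s\mathbf{W}_s^{-1}\mathbf{P}_{cs}\mathbf{c}$
$\mathbf{g} = \bar{\mathbf{g}} + \mathbf{P}_g\mathbf{P}_{cg}\mathbf{c}$

With $\mathbf{Q}_s = \mathbf{P}_s\mathbf{W}_s^{-1}\mathbf{P}_{cs}$ and $\mathbf{Q}_g = \mathbf{P}_g\mathbf{P}_{cg}$:

$\mathbf{x} = \bar{\mathbf{x}} + \mathbf{Q}_s\mathbf{c}$
$\mathbf{g} = \bar{\mathbf{g}} + \mathbf{Q}_g\mathbf{c}$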
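The construction of the combined model can be sketched in NumPy. This is a minimal illustration, assuming the per-example parameters b_s and b_g are already computed and stacked row-wise, and that W_s is given; `build_combined_model` and its argument names are hypothetical.

```python
import numpy as np

def build_combined_model(Bs, Bg, Ps, Pg, Ws):
    """Bs: (N, ts) shape parameters, Bg: (N, tg) grey-level parameters,
    Ps, Pg: eigenvector matrices (columns are modes), Ws: (ts, ts) diagonal weights.
    Returns Pc and the matrices Qs, Qg with x = x_bar + Qs c, g = g_bar + Qg c."""
    # Each row is b = (Ws b_s ; b_g); Ws is diagonal, so Bs @ Ws weights each column.
    B = np.hstack([Bs @ Ws, Bg])
    cov = np.cov(B, rowvar=False)
    eigvals, Pc = np.linalg.eigh(cov)            # PCA on the combined vectors
    Pc = Pc[:, np.argsort(eigvals)[::-1]]        # most significant modes first
    ts = Bs.shape[1]
    Pcs, Pcg = Pc[:ts], Pc[ts:]                  # split rows into shape / grey parts
    Qs = Ps @ np.linalg.inv(Ws) @ Pcs
    Qg = Pg @ Pcg
    return Pc, Qs, Qg
```

Because b_s and b_g are computed relative to the means, the combined vectors have zero mean and the appearance parameters of an example are simply c = P_c^T b. In practice only the leading columns of P_c would be kept.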

Appearance Reconstruction

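$\mathbf{x} = \bar{\mathbf{x}} + \mathbf{P}_s\mathbf{W}_s^{-1}\mathbf{P}_{cs}\mathbf{c} = \bar{\mathbf{x}} + \mathbf{Q}_s\mathbf{c}$
$\mathbf{g} = \bar{\mathbf{g}} + \mathbf{P}_g\mathbf{P}_{cg}\mathbf{c} = \bar{\mathbf{g}} + \mathbf{Q}_g\mathbf{c}$

Given appearance parameter c
– Generate the shape-free gray-level image g
– Warp g to the shape described by x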

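Review: Combined Appearance Models

$\mathbf{b} = \begin{pmatrix} \mathbf{W}_s\mathbf{b}_s \\ \mathbf{b}_g \end{pmatrix} = \begin{pmatrix} \mathbf{W}_s\mathbf{P}_s^T(\mathbf{x} - \bar{\mathbf{x}}) \\ \mathbf{P}_g^T(\mathbf{g} - \bar{\mathbf{g}}) \end{pmatrix}$

$\mathbf{W}_s$: a diagonal matrix of weights to accommodate the difference in units between the shape and grey models

How to obtain $\mathbf{W}_s$?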

Choice of Shape Parameter Weights

$\mathbf{b} = \begin{pmatrix} \mathbf{W}_s\mathbf{b}_s \\ \mathbf{b}_g \end{pmatrix} = \begin{pmatrix} \mathbf{W}_s\mathbf{P}_s^T(\mathbf{x} - \bar{\mathbf{x}}) \\ \mathbf{P}_g^T(\mathbf{g} - \bar{\mathbf{g}}) \end{pmatrix}$

Method 1
– Displace each element of $\mathbf{b}_s$ from its optimum value and observe the change in g for each training example
– The RMS change gives the elements in $\mathbf{W}_s$

Method 2
– $\mathbf{W}_s = r\mathbf{I}$, where $r^2$ is the ratio of the total intensity variation to the total shape variation

The results are relatively insensitive to the choice of $\mathbf{W}_s$
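Method 2 is short enough to write out directly. The sketch below assumes that "total variation" means the sum of the eigenvalues of the respective PCA model; `shape_weight_matrix` is a hypothetical name.

```python
import numpy as np

def shape_weight_matrix(shape_eigvals, grey_eigvals):
    """Ws = r * I, with r^2 = (total grey-level variance) / (total shape variance)."""
    r = np.sqrt(np.sum(grey_eigvals) / np.sum(shape_eigvals))
    return r * np.eye(len(shape_eigvals))
```

With shape eigenvalues (1, 1) and grey-level eigenvalues (4, 4, 8), r² = 16 / 2 = 8, so every diagonal entry of W_s is √8.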

Example: Facial Appearance Model
– First two modes of shape variation (±3 sd)
– First two modes of grey-level variation (±3 sd)
– First four modes of appearance variation (±3 sd)

Approximating a New Example

Given a new image, labelled with a set of landmarks, generate an approximation with the model.

1. Obtain $\mathbf{b}_s$ and $\mathbf{b}_g$
2. Obtain $\mathbf{b}$
3. Obtain $\mathbf{c}$
4. Apply $\mathbf{x} = \bar{\mathbf{x}} + \mathbf{P}_s\mathbf{W}_s^{-1}\mathbf{P}_{cs}\mathbf{c}$ and $\mathbf{g} = \bar{\mathbf{g}} + \mathbf{P}_g\mathbf{P}_{cg}\mathbf{c}$
5. Invert the gray-level normalization by $\mathbf{g}_{im} = \alpha\mathbf{g} + \beta\mathbf{1}$
6. Apply the pose to the points
7. Project the gray-level vector into the image


Active Shape Models

Goal
– Given a rough starting approximation, to fit an instance of a model to the image

http://personalpages.manchester.ac.uk/staff/timothy.f.cootes/Models/aam.html


Iterative Approach

Iteratively improving the fit of the instance, X, to an image proceeds as follows:
1. Examine a region of the image around each point $X_i$ to find the best nearby match for the point
2. Update the parameters $(X_t, Y_t, s, \theta, \mathbf{b})$ to best fit the newly found points X
3. Repeat until convergence

• The above method is applicable if the model points lie on edges
• The best approach is to examine the local structures of model points (to be discussed)

Modeling Local Structure

– Sample the derivative along a profile, k pixels on either side of a model point, to get a vector $\mathbf{g}_i$ of the 2k+1 points
– Normalize $\mathbf{g}_i$ by $\mathbf{g}_i \to \frac{1}{\sum_j |g_{ij}|}\mathbf{g}_i$
– Repeat for each training image for the same model point to get $\{\mathbf{g}_i\}$
– Estimate the mean $\bar{\mathbf{g}}_i$ and covariance $\mathbf{S}_{g_i}$

Fitness

$f(\mathbf{g}_s) = (\mathbf{g}_s - \bar{\mathbf{g}}_i)^T \mathbf{S}_{g_i}^{-1} (\mathbf{g}_s - \bar{\mathbf{g}}_i)$ (Mahalanobis distance)

Using Local Structure Model

– Sample a profile m pixels either side of the current point (m > k)
– Test the quality of fit at 2(m − k) + 1 positions
– Choose the one which gives the best match
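The profile model and the search along a longer profile can be sketched together. This is a minimal illustration, assuming the derivative profiles are already sampled as 1D arrays; the function names are hypothetical, and the small regularization term `reg` is an assumption added so that the covariance can be inverted when there are few training profiles.

```python
import numpy as np

def normalize_profile(g):
    """Divide by the sum of absolute element values (left unchanged if that sum is 0)."""
    g = np.asarray(g, dtype=float)
    total = np.sum(np.abs(g))
    return g / total if total > 0 else g

def build_profile_model(profiles, reg=1e-6):
    """Mean and inverse covariance of the normalized (2k+1)-point training profiles."""
    P = np.array([normalize_profile(p) for p in profiles])
    mean = P.mean(axis=0)
    cov = np.cov(P, rowvar=False) + reg * np.eye(P.shape[1])
    return mean, np.linalg.inv(cov)

def best_offset(search_profile, mean, cov_inv):
    """Slide a (2k+1)-point window over a (2m+1)-point profile and return the
    offset in [-(m-k), m-k] with the smallest Mahalanobis distance."""
    k = (len(mean) - 1) // 2
    m = (len(search_profile) - 1) // 2
    best, best_cost = 0, np.inf
    for offset in range(-(m - k), m - k + 1):   # 2(m-k)+1 candidate positions
        centre = m + offset
        g_s = normalize_profile(search_profile[centre - k:centre + k + 1])
        d = g_s - mean
        cost = float(d @ cov_inv @ d)
        if cost < best_cost:
            best, best_cost = offset, cost
    return best, best_cost
```

The returned offset says how far along the profile the model point should move; repeating this for every model point yields the new target points for the pose and shape update.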

Statistical Models of Grey-Level Profiles

The same number of pixels is used at each level

MRASM Algorithm

1. Start from the coarsest level
2. Set up the initial position before search
3. Search for the best match position for each model point
4. Ensure plausibility
5. Repeat until convergence at the level
6. Change to a finer level
7. Report the result

Summary of Parameters

Examples of Search

– Poor starting point
– Search using ASM of cartilage on an MR image of the knee


Active Appearance Models

Disadvantages of ASM

Only uses shape constraints (together with some information about the image structure near the landmarks) for search

Incapable of generating synthetic image

Goal of AAM
– Given a rough starting approximation of an appearance model, to fit it within an image

http://www2.imm.dtu.dk/~aam/faces/opt_c.avi


Review: Appearance Model

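$\mathbf{x} = \bar{\mathbf{x}} + \mathbf{Q}_s\mathbf{c}$
$\mathbf{g} = \bar{\mathbf{g}} + \mathbf{Q}_g\mathbf{c}$

$\mathbf{c}$: model parameter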

Overview of AAM Search

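$\delta\mathbf{g} = \mathbf{g}_i - \mathbf{g}_m$
– $\delta\mathbf{g}$: difference vector
– $\mathbf{g}_i$: grey-level vector in the image
– $\mathbf{g}_m$: grey-level vector by the model

$\mathbf{x} = \bar{\mathbf{x}} + \mathbf{Q}_s\mathbf{c}$, $\mathbf{g} = \bar{\mathbf{g}} + \mathbf{Q}_g\mathbf{c}$

$\Delta = |\delta\mathbf{g}|^2$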

AAM Search

Minimize $\Delta = |\delta\mathbf{g}|^2$, with $\delta\mathbf{g} = \mathbf{g}_i - \mathbf{g}_m$, by varying c effectively

Given $\delta\mathbf{g}$, how to obtain $\delta\mathbf{c}$ deterministically?

Learning to Correct Model Parameters

Given $\delta\mathbf{g}$, how to obtain $\delta\mathbf{c}$ deterministically?

Linear Model: $\delta\mathbf{c} = \mathbf{A}\,\delta\mathbf{g}$
– Multivariate regression on a sample of known model displacements, $\delta\mathbf{c}$, and the corresponding $\delta\mathbf{g}$

Extra Parameters for Pose

Linear Model: $\delta\mathbf{c} = \mathbf{A}\,\delta\mathbf{g}$
– Including pose parameters $(s_x, s_y, t_x, t_y)$, where $s_x = s\cos\theta$ and $s_y = s\sin\theta$

Training

Linear Model: $\delta\mathbf{c} = \mathbf{A}\,\delta\mathbf{g}$, including $(s_x, s_y, t_x, t_y)$

– $\mathbf{c}_0$: a known appearance model in the current image
– Perturb by $\delta\mathbf{c}$ to get new parameters, i.e., $\mathbf{c} = \mathbf{c}_0 + \delta\mathbf{c}$, including small displacements in position, scale, and orientation
– Compute $\delta\mathbf{g} = \mathbf{g}_i - \mathbf{g}_m$, using the shape-normalized representation (warping)
– Record enough perturbations and image differences for regression
– Ideally, we want a model that holds over a large error range, $\delta\mathbf{g}$, and so also over a large parameter range, $\delta\mathbf{c}$
– Experimentally, the optimal perturbation is around 0.5 standard deviations for each parameter
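The regression step can be sketched with ordinary least squares. This is an illustrative sketch only: the function names are hypothetical, drawing perturbations uniformly within ±0.5 standard deviations is just one possible reading of the 0.5 sd guideline, and the image residuals δg are assumed to be computed elsewhere (by sampling and warping the training images).

```python
import numpy as np

def sample_perturbations(param_sd, n_samples, scale=0.5, seed=0):
    """Draw known displacements dc, uniform within +/- scale * sd per parameter."""
    rng = np.random.default_rng(seed)
    param_sd = np.asarray(param_sd, dtype=float)
    return rng.uniform(-1.0, 1.0, size=(n_samples, len(param_sd))) * scale * param_sd

def learn_update_matrix(dG, dC):
    """Multivariate linear regression: find A such that dc ≈ A @ dg.
    dG: (N, m) residual vectors, dC: (N, p) known parameter displacements."""
    A_T, *_ = np.linalg.lstsq(dG, dC, rcond=None)   # solves dG @ A.T ≈ dC
    return A_T.T
```

Each row of the learned matrix A holds the weights that map the residual vector to one parameter displacement.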

Results For The Face Model

$\delta\mathbf{c} = \mathbf{A}\,\delta\mathbf{g}$, i.e., $\delta c_i = \mathbf{a}_i \cdot \delta\mathbf{g}$
– $\mathbf{a}_i$: the weights attached to different areas of the sampled patch when estimating the displacement

[Figure: pose-parameter displacement weights for $s_x$, $s_y$, $t_x$, $t_y$]

[Figure: first mode and displacement weights]

[Figure: second mode and displacement weights]

Performance of the Prediction

– Linear relation holds within 4 pixels
– As long as the prediction has the same sign as the actual error, and does not over-predict by much, it converges
– Extend the range by building a multi-resolution model

Multi-Resolution Model
– L0: 10000 pixels
– L1: 2500 pixels
– L2: 600 pixels

AAM Search: Iterative Model Refinement

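1. Evaluate the error vector $\delta\mathbf{g}_0 = \mathbf{g}_s - \mathbf{g}_m$
2. Evaluate the current error $E_0 = |\delta\mathbf{g}_0|^2$
3. Compute the predicted displacement, $\delta\mathbf{c} = \mathbf{A}\,\delta\mathbf{g}_0$
4. Set k = 1
5. Let $\mathbf{c}_1 = \mathbf{c}_0 - k\,\delta\mathbf{c}$
6. Sample the image at this new prediction, and calculate a new error vector, $\delta\mathbf{g}_1$
7. If $|\delta\mathbf{g}_1|^2 < E_0$ then accept the new estimate, $\mathbf{c}_1$
8. Otherwise try at k = 1.5, k = 0.5, k = 0.25 etc.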
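The refinement loop can be sketched as follows. This is a minimal illustration, assuming a caller-supplied function `residual_fn(c)` that samples the image for parameters c and returns δg = g_s − g_m; that callback and the name `aam_search` are hypothetical, and image sampling and warping are outside the scope of the sketch.

```python
import numpy as np

def aam_search(c0, residual_fn, A, steps=(1.0, 1.5, 0.5, 0.25), max_iter=30):
    """Iteratively correct c using the learned linear predictor dc = A @ dg."""
    c = np.asarray(c0, dtype=float)
    dg = residual_fn(c)
    E = float(dg @ dg)                      # current error |dg|^2
    for _ in range(max_iter):
        dc = A @ dg                         # predicted displacement
        for k in steps:                     # try k = 1 first, then other step sizes
            c_new = c - k * dc
            dg_new = residual_fn(c_new)
            E_new = float(dg_new @ dg_new)
            if E_new < E:                   # accept only if the error decreases
                c, dg, E = c_new, dg_new, E_new
                break
        else:
            break                           # no step size helped: declare convergence
    return c, E
```

On a toy problem where the residual is exactly linear in the parameter error and A is the corresponding pseudo-inverse, the first step with k = 1 already recovers the true parameters.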

Examples of AAM Search

– Reconstruction (left) and original (right) given original landmark points
– Multi-resolution search from displaced position
– First two modes of appearance variation of knee model
– Best fit of knee model to new image given landmarks
– Multi-resolution search from displaced position

Comparison : ASM v/s AAM

Key Differences

ASM
– only uses models of the image texture in the small regions around each landmark point
– searches around the current position
– seeks to minimize the distance between model points and corresponding image points

AAM
– uses a model of appearance of the whole region
– only samples the image under the current position
– seeks to minimize the difference between the synthesized image and the target image

Experiment Data

Two data sets:
– 400 face images, 133 landmarks
– 72 brain slices, 133 landmark points

Training data set
– Faces: 200, tested on remaining 200
– Brain: 400, leave-one-brain-out experiments

Capture Range

Point Location Accuracy


ASM runs significantly faster for both models, and locates the points more accurately

Texture Matching


Conclusion
– ASM searches around the current location, along profiles, so one would expect it to have a larger capture range
– ASM takes only the shape into account and is thus less reliable
– AAM can work well with a much smaller number of landmarks than ASM
