Segmentation via Generative Models
CSCI 5561: Computer Vision, Prof. Paul Schrater, Spring 2005
Overview
• In the last lecture we introduced classifier-based segmentation models, where
– class → image generative models were used
– P(pixel | class) is learned from training data
• Here we show how to use class → image generative models without labeled data.
• The key idea is to simultaneously estimate the class density parameters and the class membership of each piece of image data.
Missing variable problems
• In many vision problems, if some variables were known, the maximum likelihood inference problem would be easy:
– fitting: if we knew which line each token came from, it would be easy to determine the line parameters
– segmentation: if we knew the segment each pixel came from, it would be easy to determine the segment parameters
– fundamental matrix estimation: if we knew which feature corresponded to which, it would be easy to determine the fundamental matrix
– etc.
• This sort of thing happens in statistics, too
For independent data samples, the likelihood factors: P(x_1, …, x_N | θ) = Π_i P(x_i | θ), so the log-likelihood is a sum of per-sample terms.
Missing variables - strategy
• We have a problem with parameters and missing variables
• This suggests: iterate until convergence
– replace the missing variables with their expected values, given fixed values of the parameters
– fix the missing variables, and choose the parameters to maximise the likelihood given those fixed values of the missing variables
• E.g., iterate till convergence (a sketch follows this slide):
– allocate each point to a line with a weight, which is the probability of the point given the line
– refit the lines to the weighted set of points
• Converges to a local extremum
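A minimal sketch of that alternation, with hypothetical callables e_step and m_step standing in for the problem-specific pieces (concrete versions for line fitting appear later in the lecture); params is assumed to be a NumPy array:

```python
import numpy as np

def em_style_fit(data, params, e_step, m_step, n_iters=100, tol=1e-6):
    """Alternate until the parameters stop moving.

    e_step(data, params) -> expected values of the missing variables
    m_step(data, expectations) -> parameters maximising the likelihood
    """
    for _ in range(n_iters):
        expectations = e_step(data, params)      # missing variables <- expected values
        new_params = m_step(data, expectations)  # refit with missing variables held fixed
        if np.max(np.abs(new_params - params)) < tol:
            return new_params                    # converged (to a local extremum)
        params = new_params
    return params
```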
Density Estimation: Mixture of Densities
e.g. θ = { µ, σ }
Motion Estimation
Mixture Model Applications
[Figure: velocity vs. image position]
But only if we are given the distributions and prior
http://www.ncrg.aston.ac.uk/netlab/
* PCA
* Mixtures of probabilistic PCA
* Gaussian mixture model with EM training
* Linear and logistic regression with IRLS
* Multi-layer perceptron with linear, logistic and softmax outputs and error functions
* Radial basis function (RBF) networks with both Gaussian and non-local basis functions
* Optimisers, including quasi-Newton methods, conjugate gradients and scaled conjugate gradients
* Multi-layer perceptron with Gaussian mixture outputs (mixture density networks)
* Gaussian prior distributions over parameters for the MLP, RBF and GLM, including multiple hyper-parameters
* Laplace approximation framework for Bayesian inference (evidence procedure)
* Automatic Relevance Determination for input selection
* Markov chain Monte Carlo, including simple Metropolis and hybrid Monte Carlo
* K-nearest-neighbour classifier
* K-means clustering
* Generative Topographic Map
* Neuroscale topographic projection
* Gaussian Processes
* Hinton diagrams for network weights
* Self-organising map
[Figure: data sampled from a mixture of 3 Gaussians; spectral clustering]
[Figure: original data; Gaussian mixture model classification]
Finite Mixtures
P(x) = Σ_{i=1}^{3} a_i g_i(x; θ_i)
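As a concrete reading of this formula, a small sketch (parameter values are made up for illustration) that evaluates a three-component Gaussian mixture in one dimension:

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Hypothetical mixture parameters; the mixing weights a_i sum to 1.
a = np.array([0.5, 0.3, 0.2])
mu = np.array([-2.0, 0.0, 3.0])
sigma = np.array([0.5, 1.0, 0.8])

def mixture_pdf(x):
    # P(x) = sum_i a_i g_i(x; theta_i)
    return sum(a_i * gaussian_pdf(x, m, s) for a_i, m, s in zip(a, mu, sigma))

print(mixture_pdf(0.0))
```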
Expectation on indicator variables
Figure from "Color- and Texture-Based Image Segmentation Using EM and Its Application to Content-Based Image Retrieval", S.J. Belongie et al., Proc. Int. Conf. Computer Vision, 1998, © 1998 IEEE
Segmentation with EM
[Figure: scale estimate map; 6 texture features; EM components; segmentation into 'blobs']
Fitting
• Choose a parametric object (or some objects) to represent a set of tokens
• The most interesting case is when the criterion is not local
– you can't tell whether a set of points lies on a line by looking only at each point and the next
• Three main questions:
– what object represents this set of tokens best?
– which of several objects gets which token?
– how many objects are there?
(You could read "line" for "object" here, or circle, or ellipse, or...)
Fitting and the Hough Transform
• Purports to answer all three questions
– in practice, the answer isn't usually all that much help
• We do it for lines only
• A line is the set of points (x, y) such that (sin θ)x + (cos θ)y + d = 0
• Different choices of θ, d > 0 give different lines
• For any point (x0, y0) there is a one-parameter family of lines through it, given by r = −x0 sin θ − y0 cos θ
• Plot these curves in a discretised (r, θ) space; each point (r, θ) is a bucket
• Each point gets to vote for each line in its family; if some line has lots of votes, that should be the line passing through the points
• The voting is done by adding 1 to every (r, θ) bucket that a curve passes through, accumulating across the set of (x0, y0) points
[Figure: tokens (left); votes in the Hough array (right)]
Mechanics of the Hough transform
• Construct an array representing (θ, d)
• For each point, render the curve (θ, d) into this array, adding one at each cell
• Difficulties
– how big should the cells be? (too big, and we cannot distinguish between quite different lines; too small, and noise causes lines to be missed)
• How many lines?
– count the peaks in the Hough array
• Who belongs to which line?
– tag the votes
• Hardly ever satisfactory in practice, because problems with noise and cell size defeat it (a sketch of the voting loop follows)
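A minimal sketch of the accumulation, using the r = −x0 sin θ − y0 cos θ parametrisation from the previous slide; the array sizes n_theta and n_r are illustrative stand-ins for the cell-size choice:

```python
import numpy as np

def hough_votes(points, n_theta=180, n_r=200):
    """Accumulate votes in a discretised (theta, r) array."""
    points = np.asarray(points, dtype=float)
    r_max = np.abs(points).sum(axis=1).max() + 1e-9   # bound on |r| for binning
    thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    votes = np.zeros((n_theta, n_r), dtype=int)
    for x0, y0 in points:
        # the one-parameter family of lines through (x0, y0)
        r = -x0 * np.sin(thetas) - y0 * np.cos(thetas)
        r_idx = np.round((r + r_max) / (2 * r_max) * (n_r - 1)).astype(int)
        votes[np.arange(n_theta), r_idx] += 1          # one vote per bucket crossed
    return votes, thetas

# Peaks of `votes` answer "how many lines"; tagging which points voted into
# a peak answers "who belongs to which line".
```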
[Figure: tokens (left); votes (right)]
Line fitting can be maximum likelihood - but the choice of model is important
Who came from which line?
• Assume we know how many lines there are - but which lines are they?
– easy, if we know who came from which line
• Three strategies
– incremental line fitting
– K-means
– probabilistic (later!)
Robustness
• As we have seen, squared error can be a source of bias in the presence of noise points
– one fix is EM - we'll do this shortly
– another is an M-estimator: square nearby, threshold far away
– a third is RANSAC: search for good points
Err = Σ_{i=1..N} ρ_θ(y_i − f(x_i))

Example: ρ_θ(r) = r², with residual r = y_i − f(x_i) (ordinary squared error)
Influence function: ∂ρ_θ(r)/∂r = 2r

Example: ρ_θ(r) = r² / (r² + θ²)
Influence: ∂ρ_θ(r)/∂r = 2rθ² / (r² + θ²)²
Modified error metrics
Euclidean: d²
Robust: d² / (d² + s²)
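A small comparison of the two metrics on some illustrative distances, showing how the robust version caps the influence of far-away points:

```python
import numpy as np

def euclidean_cost(d):
    return d ** 2

def robust_cost(d, s=1.0):
    # d^2 / (d^2 + s^2): behaves like d^2 near zero but saturates at 1 far
    # away, so distant outliers stop dominating the fit; s sets the crossover.
    return d ** 2 / (d ** 2 + s ** 2)

d = np.array([0.1, 1.0, 10.0])
print(euclidean_cost(d))      # [1.e-02 1.e+00 1.e+02]
print(robust_cost(d, s=1.0))  # [0.0099  0.5  0.9901] (approximately)
```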
[Figure: fit with the scale parameter s too small]
[Figure: fit with the scale parameter s too large]
RANSAC
• Choose a small subset uniformly at random
• Fit to that
• Anything that is close to the result is signal; all other points are noise
• Refit
• Do this many times and choose the best
• Issues (a sketch follows)
– how many times? often enough that we are likely to have drawn a good line
– how big a subset? smallest possible
– what does "close" mean? depends on the problem
– what is a good line? one where the number of nearby points is so big it is unlikely to be all outliers
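A minimal sketch for lines; n_iters and tol are the "how many times" and "what does close mean" choices above, set here to arbitrary illustrative values:

```python
import numpy as np

def ransac_line(points, n_iters=200, tol=0.05, seed=0):
    """Fit a line to 2-D points by random sampling, then refit to inliers."""
    rng = np.random.default_rng(seed)
    pts = np.asarray(points, dtype=float)
    best = None
    for _ in range(n_iters):
        p, q = pts[rng.choice(len(pts), size=2, replace=False)]  # smallest subset
        n = np.array([p[1] - q[1], q[0] - p[0]])                 # normal to line p-q
        if np.linalg.norm(n) == 0:
            continue                                             # degenerate sample
        n /= np.linalg.norm(n)
        inliers = np.abs((pts - p) @ n) < tol                    # "close" = signal
        if best is None or inliers.sum() > best.sum():
            best = inliers
    # Refit: total least squares on the winning inlier set.
    P = pts[best]
    mean = P.mean(axis=0)
    _, _, vt = np.linalg.svd(P - mean)
    return mean, vt[0], best   # a point on the line, its direction, the inlier mask
```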
Fitting curves other than lines
• In principle, an easy generalisation
– the probability of obtaining a point, given a curve, is given by a negative exponential of distance squared
• In practice, rather hard
– it is generally difficult to compute the distance between a point and a curve
Lines and robustness
• We have one line, and n points
• Some come from the line, some from "noise"
• This is a mixture model:
P(point | line and noise params) = P(point | line) P(comes from line) + P(point | noise) P(comes from noise)
= P(point | line) λ + P(point | noise)(1 − λ)
• We wish to determine
– the line parameters
– P(comes from line)
Estimating the mixture model
• Introduce a set of hidden variables, δ, one for each point; δ_i is one when the point is on the line, and zero when off
• If these are known, the negative log-likelihood becomes (the line's parameters are φ, c):
L_c({x_i, y_i}; θ) = Σ_i [ δ_i (x_i cos φ + y_i sin φ + c)² / (2σ²) + (1 − δ_i) k_n ] + K
• Here K is a normalising constant, and k_n is the noise intensity (we'll choose this later)
Substituting for delta
• We shall substitute the expected value of each δ_i, for a given θ
• Recall θ = (φ, c, λ)
• E[δ_i] = 1 · P(δ_i = 1 | θ, x_i) + 0 · P(δ_i = 0 | θ, x_i), where
P(δ_i = 1 | θ, x_i) = P(x_i | δ_i = 1, θ) P(δ_i = 1) / [ P(x_i | δ_i = 1, θ) P(δ_i = 1) + P(x_i | δ_i = 0, θ) P(δ_i = 0) ]
= λ exp(−(x_i cos φ + y_i sin φ + c)² / 2σ²) / [ λ exp(−(x_i cos φ + y_i sin φ + c)² / 2σ²) + (1 − λ) exp(−k_n) ]
• Notice that if k_n is small and positive, this value is close to 1 when the distance is small, and close to zero when it is large
Algorithm for line fitting
• Obtain some start point θ^(0) = (φ^(0), c^(0), λ^(0))
• Compute the δ's using the formula above
• Compute the maximum likelihood estimate θ^(1):
– φ, c come from fitting to the weighted points
– λ comes by counting
• Iterate to convergence (a minimal sketch follows)
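A minimal sketch of this loop, with σ and k_n fixed by hand (their choice is discussed on the next slide) and a deliberately crude start point; the weighted fit recovers the line normal as the eigenvector of the weighted covariance with the smallest eigenvalue:

```python
import numpy as np

def em_line(points, sigma=0.1, k_n=2.0, n_iters=50):
    """EM for one line (x cos(phi) + y sin(phi) + c = 0) plus noise."""
    x, y = np.asarray(points, dtype=float).T
    phi, c, lam = 0.0, 0.0, 0.5                 # crude start point theta^(0)
    for _ in range(n_iters):
        # E-step: expected deltas from the posterior on the previous slide.
        d = x * np.cos(phi) + y * np.sin(phi) + c
        on = lam * np.exp(-d ** 2 / (2 * sigma ** 2))
        delta = on / (on + (1 - lam) * np.exp(-k_n))
        # M-step: phi, c from a weighted total-least-squares fit ...
        mx = np.average(x, weights=delta)
        my = np.average(y, weights=delta)
        C = np.cov(np.vstack([x, y]), aweights=delta, bias=True)
        _, evecs = np.linalg.eigh(C)            # eigenvalues in ascending order
        nx, ny = evecs[:, 0]                    # line normal = smallest eigenvector
        phi = np.arctan2(ny, nx)
        c = -(mx * nx + my * ny)                # line passes the weighted centroid
        # ... and lambda "by counting" (averaging the weights).
        lam = delta.mean()
    return phi, c, lam, delta
```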
The expected values of the deltas at the maximum (notice the one value close to zero).
Closeup of the fit
Choosing parameters
• What about the noise parameter k_n, and the σ for the line?
– several methods
• from first-principles knowledge of the problem (seldom really possible)
• play around with a few examples and choose (usually quite effective, as the precise choice doesn't matter much)
– notice that if k_n is large, this says that points very seldom come from noise, however far from the line they lie
• this usually biases the fit, by pushing outliers into the line
• rule of thumb: it's better to fit to the better-fitting points, within reason; if this is hard to do, then the model could be a problem
Other examples
• Segmentation
– a segment is a Gaussian that emits feature vectors (which could contain colour; or colour and position; or colour, texture and position)
– the segment parameters are a mean and (perhaps) a covariance
– if we knew which segment each point belonged to, estimating these parameters would be easy
– the rest is on the same lines as fitting a line
• Fitting multiple lines
– rather like fitting one line, except there are more hidden variables
– easiest is to encode them as an array of hidden variables, which represent a table with a one where the i'th point comes from the j'th line, zeros otherwise
– the rest is on the same lines as above
Issues with EM
• Local maxima
– can be a serious nuisance in some problems
– no guarantee that we have reached the "right" maximum
• Starting
– using k-means to cluster the points is often a good idea
Local maximum
which is an excellent fit to some points
and the deltas for this maximum
A dataset that is well fitted by four lines
Result of EM fitting, with one line (or at least, one available local maximum).
Result of EM fitting, with two lines (or at least, one available local maximum).
Seven lines can produce a rather logical answer
Motion segmentation with EM
• Model an image pair (or video sequence) as consisting of regions of parametric motion
– affine motion is popular:
(v_x, v_y)ᵀ = [a b; c d] (x, y)ᵀ + (t_x, t_y)ᵀ
• Now we need to
– determine which pixels belong to which region
– estimate the parameters
• Likelihood
– assume I(x, y, t) = I(x + v_x, y + v_y, t + 1) + noise
• A straightforward missing variable problem; the rest is calculation (a sketch of the motion model follows)
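As a concrete reading of the affine motion model, a small sketch that evaluates the flow field of one region on a pixel grid (the parameter values are made up for illustration):

```python
import numpy as np

def affine_flow(A, t, xs, ys):
    """Velocity (vx, vy) = A @ (x, y) + t at each pixel of a grid.

    A is the 2x2 matrix [[a, b], [c, d]] and t = (tx, ty): the six affine
    motion parameters of a single region.
    """
    X, Y = np.meshgrid(xs, ys)
    coords = np.stack([X.ravel(), Y.ravel()])        # 2 x (H*W) pixel positions
    v = np.asarray(A) @ coords + np.asarray(t).reshape(2, 1)
    return v[0].reshape(X.shape), v[1].reshape(X.shape)

# E.g. a gentle rotation plus translation (hypothetical parameters):
vx, vy = affine_flow([[0.0, -0.05], [0.05, 0.0]], (1.0, 0.5),
                     np.arange(8), np.arange(8))
```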
Three frames from the MPEG “flower garden” sequence
Figure from "Representing Images with Layers", by J. Wang and E.H. Adelson, IEEE Transactions on Image Processing, 1994, © 1994 IEEE
Grey level shows region no. with highest probability
Segments and the motion fields associated with them.
Figure from "Representing Images with Layers", by J. Wang and E.H. Adelson, IEEE Transactions on Image Processing, 1994, © 1994 IEEE
If we use multiple frames to estimate the appearance of a segment, we can fill in occlusions; so we can re-render the sequence with some segments removed.
Figure from "Representing Images with Layers", by J. Wang and E.H. Adelson, IEEE Transactions on Image Processing, 1994, © 1994 IEEE
Some generalities
• Many, but not all, problems that can be attacked with EM can also be attacked with RANSAC
– you need to be able to get a parameter estimate from a manageably small number of random choices
– RANSAC is usually better
• We didn't present EM in its most general form
– in the general form, the likelihood may not be a linear function of the missing variables
– in that case, one takes the expectation of the likelihood, rather than substituting expected values of the missing variables
Model Selection
• We wish to choose a model to fit to data
– e.g. is it a line or a circle?
– e.g. is this a perspective or orthographic camera?
– e.g. is there an aeroplane there, or is it noise?
• Issue
– in general, models with more parameters will fit a dataset better, but are poorer at prediction
– this means we can't simply look at the negative log-likelihood (or fitting error)
Top is not necessarily a better fit than bottom (actually, it is almost always worse)
We can discount the fitting error with some term in the number of parameters in the model.
Discounts
• AIC (an information criterion)
– choose the model with the smallest value of −2L(D; θ*) + 2p
– p is the number of parameters
• BIC (Bayes information criterion)
– choose the model with the smallest value of −2L(D; θ*) + p log N
– N is the number of data points
• Minimum description length
– same criterion as BIC, but derived in a completely different way
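Both discounts are one-liners; a small sketch with made-up log-likelihoods showing how a better-fitting but bigger model can still lose once the penalty is applied:

```python
import numpy as np

def aic(log_lik, p):
    # AIC: -2 L(D; theta*) + 2p
    return -2 * log_lik + 2 * p

def bic(log_lik, p, n):
    # BIC: -2 L(D; theta*) + p log N
    return -2 * log_lik + p * np.log(n)

# Hypothetical numbers: the 5-parameter model fits slightly better than the
# 3-parameter one, but both criteria prefer the smaller model.
print(aic(-120.0, 3), aic(-118.5, 5))            # 246.0 vs 247.0
print(bic(-120.0, 3, 100), bic(-118.5, 5, 100))  # ~253.8 vs ~260.0
```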
Cross-validation
• Split the data set into two pieces, fit to one, and compute the negative log-likelihood on the other
• Average over multiple different splits
• Choose the model with the smallest value of this average
• The difference in averages for two different models is an estimate of the difference in KL divergence of the models from the source of the data (a sketch follows)
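A minimal sketch of the procedure; fit and neg_log_lik are hypothetical callables (plug in any model family to compare), and data is assumed to be a NumPy array:

```python
import numpy as np

def cv_score(data, fit, neg_log_lik, n_splits=10, seed=0):
    """Average held-out negative log-likelihood over random half-splits.

    fit(train) -> a fitted model; neg_log_lik(model, test) -> a scalar.
    Smaller is better; differences between models estimate differences
    in KL divergence from the data source.
    """
    rng = np.random.default_rng(seed)
    n = len(data)
    scores = []
    for _ in range(n_splits):
        perm = rng.permutation(n)
        train, test = data[perm[: n // 2]], data[perm[n // 2:]]
        scores.append(neg_log_lik(fit(train), test))
    return float(np.mean(scores))
```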
Model averaging
• Very often, it is smarter to use multiple models for prediction than just one
• E.g. motion capture data
– there are a small number of schemes that are used to put markers on the body
– given we know the scheme S and the measurements D, we can estimate the configuration of the body X
• We want
P(X | D) = P(X | S1, D) P(S1 | D) + P(X | S2, D) P(S2 | D) + P(X | S3, D) P(S3 | D)
• If it is obvious what the scheme is from the data, then averaging makes little difference
• If it isn't, then not averaging underestimates the variance of X: we think we have a more precise estimate than we do
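With hypothetical posteriors and per-scheme estimates, the averaging is a weighted combination; note this toy version mixes point estimates only, while the full P(X | D) also carries the extra variance the slide warns about:

```python
import numpy as np

# posterior[i] = P(S_i | D); estimates[i] = E[X | S_i, D] (made-up values).
posterior = np.array([0.7, 0.2, 0.1])
estimates = np.array([1.00, 1.10, 0.80])

x_avg = posterior @ estimates   # E[X | D] = sum_i E[X | S_i, D] P(S_i | D)
print(x_avg)                    # 0.7*1.00 + 0.2*1.10 + 0.1*0.80 = 1.00
```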