Inf5300 v2013 Lecture2 Random 2pp


    INF 5300 Advanced Topic: Video Content Analysis

Asbjørn Berge

    Random algorithms in Computer Vision


    Outline

Robust fitting by random sampling: motivation / examples; intuitive approach; RAndom SAmple Consensus; algorithm specificities

Randomized classifiers: recap of boosting classifiers; tree classifiers / decision stumps; randomness in training; algorithm details


    Structure from motion

Obtain 3D scene structure from multiple images taken with the same camera in different locations and poses.

Typically, camera location & pose are treated as unknowns.

Track points across frames, infer camera pose & scene structure from the correspondences.

Simultaneous Localization And Mapping (SLAM)

Localize a robot and map its surroundings with a single camera.

Inferring 3D

    3D Reconstruction

Internet Photos (Colosseum). Reconstructed 3D cameras and points. http://photosynth.net/default.aspx

    http://phototour.cs.washington.edu/applet/index.html


    Why extract features? Motivation: panorama stitching

We have two images: how do we combine them?

Why extract features?

Motivation: panorama stitching

We have two images: how do we combine them?

Step 1: extract features

Step 2: match features


    Why extract features? Motivation: panorama stitching

We have two images: how do we combine them?

Step 1: extract features

Step 2: match features

Step 3: align images

    Local invariant features: outline

1) Detection: identify the interest points.

2) Description: extract a vector feature descriptor surrounding each interest point.

3) Matching: determine correspondence between descriptors in two views.

x_1 = [x_1^(1), ..., x_d^(1)],   x_2 = [x_1^(2), ..., x_d^(2)]


    Computing transformations

    Given a set of matches between images A and B

    How can we compute the transform T from A to B?

    Find transform T that best agrees with the matches


    Evaluating the results

    How can we measure the performance of a feature matcher?

[Figure: histogram of feature distances for matches, with example distances 50, 75 and 200]


    True/false positives

The distance threshold affects performance.

True positives = # of detected matches that are correct. Suppose we want to maximize these: how should we choose the threshold?

False positives = # of detected matches that are incorrect. Suppose we want to minimize these: how should we choose the threshold?

[Figure: feature-distance histogram with a true match and a false match marked at distances 50, 75 and 200]

    Technology for a better society

Robustness: outliers


Robustness

Let's consider a simpler example. Problem: fit a line to these data points. A least-squares fit is dragged toward the outliers; how can we fix this?


    Idea

    Given a hypothesized line

    Count the number of points that agree with the line

    Agree = within a small distance of the line

    I.e., the inliers to that line

    For all possible lines, select the one with the largest number of inliers
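The inlier count for a hypothesized line can be computed directly from point-to-line distances. A minimal sketch in Python (the array layout, function name and threshold value are illustrative, not from the slides):

    import numpy as np

    def count_inliers(points, a, b, c, threshold=2.0):
        # Perpendicular distance of every point (x, y) to the line ax + by + c = 0.
        dist = np.abs(a * points[:, 0] + b * points[:, 1] + c) / np.hypot(a, b)
        inliers = dist < threshold
        return inliers.sum(), inliers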


    How do we find the best line?

    Unlike least-squares, no simple closed-form solution

Hypothesize-and-test: try out many lines, keep the best one.

    Which lines?


    Translations


    RAndom SAmple Consensus

Select one match at random, count inliers.

    RAndom SAmple Consensus

Select another match at random, count inliers.


    RAndom SAmple Consensus

Output the translation with the highest number of inliers.

    RANSAC

Idea: all the inliers will agree with each other on the translation vector; the (hopefully small) number of outliers will (hopefully) disagree with each other.

RANSAC only has guarantees if there are < 50% outliers.

"All good matches are alike; every bad match is bad in its own way."

    Tolstoy via Alyosha Efros


RANSAC  [Fischler & Bolles, 1981]

(RANdom SAmple Consensus): learning technique to estimate the parameters Θ of a model f(P, Θ) by random sampling of the observed data P.


    Algorithm:

1. Sample (randomly) the number of points required to fit the model
2. Solve for model parameters using the samples
3. Score by the fraction of inliers within a preset threshold of the model

Repeat 1-3 until the best model is found with high confidence

RANSAC

[Fischler & Bolles, 1981]

(RANdom SAmple Consensus):


    RANSAC

    Algorithm:

1. Sample (randomly) the number of points required to fit the model (# = 2)
2. Solve for model parameters using the samples
3. Score by the fraction of inliers within a preset threshold of the model

    Repeat 1-3 until the best model is found with high confidence

    Line fitting example


    RANSAC

    Algorithm:

1. Sample (randomly) the number of points required to fit the model (# = 2)
2. Solve for model parameters using the samples
3. Score by the fraction of inliers within a preset threshold of the model

    Repeat 1-3 until the best model is found with high confidence

    Line fitting example


    RANSAC

(6 inliers)

    Algorithm:

1. Sample (randomly) the number of points required to fit the model (# = 2)
2. Solve for model parameters using the samples
3. Score by the fraction of inliers within a preset threshold of the model

    Repeat 1-3 until the best model is found with high confidence

    Line fitting example


    RANSAC

(14 inliers)

Algorithm:

1. Sample (randomly) the number of points required to fit the model (# = 2)
2. Solve for model parameters using the samples
3. Score by the fraction of inliers within a preset threshold of the model

    Repeat 1-3 until the best model is found with high confidence


    RANSAC

Inlier threshold: related to the amount of noise we expect in inliers. Often model the noise as Gaussian with some standard deviation (e.g., 3 pixels).

Number of rounds: related to the percentage of outliers we expect, and the probability of success we'd like to guarantee. Suppose there are 20% outliers, and we want to find the correct answer with 99% probability.

How many rounds do we need?


    RANSAC

[Figure: inlier distribution in x-translation / y-translation space; set the threshold so that, e.g., 95% of the Gaussian lies inside that radius]


    RANSAC

    Back to linear regression

    How do we generate a hypothesis?


    RANSAC


    Back to linear regression

    How do we generate a hypothesis?

  • 7/29/2019 Inf5300 v2013 Lecture2 Random 2pp

    16/50

    16

    Technology for a better society

    RANSAC

General version:

1. Randomly choose s samples. Typically s = the minimum sample size that lets you fit a model.

2. Fit a model (e.g., a line) to those samples

3. Count the number of inliers that approximately fit the model

4. Repeat N times

    5. Choose the model that has the largest set of inliers
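A minimal Python sketch of this hypothesize-and-test loop; the callback names (fit_model, point_error) and the final refit over the consensus set are illustrative choices, not prescribed by the slides:

    import numpy as np

    def ransac(data, fit_model, point_error, s, N, threshold, seed=0):
        rng = np.random.default_rng(seed)
        best_inliers = np.zeros(len(data), dtype=bool)
        for _ in range(N):
            sample = data[rng.choice(len(data), size=s, replace=False)]   # step 1
            model = fit_model(sample)                                     # step 2
            inliers = point_error(model, data) < threshold                # step 3
            if inliers.sum() > best_inliers.sum():                        # step 5
                best_inliers = inliers
        # final refit of the model to the whole consensus set
        return fit_model(data[best_inliers]), best_inliers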


    How big is s?

    For alignment, depends on the motion model

    Here, each sample is a correspondence (pair of matching points)


    Final step: least squares fit

Find the average translation vector over all inliers.
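For a pure translation this least-squares refit reduces to the mean displacement over the inlier matches; a small sketch (array names are illustrative):

    import numpy as np

    def refit_translation(pts_a, pts_b, inliers):
        # Least-squares translation = average displacement over the inlier matches.
        return (pts_b[inliers] - pts_a[inliers]).mean(axis=0)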


    Choosing the parameters

Initial number of points s: typically the minimum number needed to fit the model.

Distance threshold t: choose t so that the probability of an inlier is p (e.g. 0.95). For zero-mean Gaussian noise with standard deviation σ: t² = 3.84 σ².

Number of samples N: choose N so that, with probability p, at least one random sample is free from outliers (e.g. p = 0.99), given the outlier ratio e:

(1 − (1 − e)^s)^N = 1 − p,   so   N = log(1 − p) / log(1 − (1 − e)^s)

Proportion of outliers e (values shown for p = 0.99):

s     5%    10%   20%   25%   30%   40%   50%
2      2     3     5     6     7    11    17
3      3     4     7     9    11    19    35
4      3     5     9    13    17    34    72
5      4     6    12    17    26    57   146
6      4     7    16    24    37    97   293
7      4     8    20    33    54   163   588
8      5     9    26    44    78   272  1177
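The table follows from the formula above; a quick check in Python (parameter names are illustrative):

    import math

    def ransac_rounds(p, e, s):
        # N = log(1 - p) / log(1 - (1 - e)^s), rounded up.
        return math.ceil(math.log(1 - p) / math.log(1 - (1 - e) ** s))

    print(ransac_rounds(p=0.99, e=0.20, s=2))   # 5, matching the table entry for s=2, 20% outliers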


    Algorithmic specificities

Terminate when the inlier ratio reaches the expected ratio of inliers: T = (1 − e) · n, where n is the total number of points.

e is often unknown a priori, so pick the worst case, e.g. 50%, and adapt if more inliers are found; e.g. 80% inliers would yield e = 0.2.

N = ∞, sample_count = 0

While N > sample_count repeat:

Choose a sample and count the number of inliers

Set e = 1 − (number of inliers) / (total number of points)

Recompute N from e:  N = log(1 − p) / log(1 − (1 − e)^s)

Increment sample_count by 1

Terminate
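A sketch of this adaptive loop in Python; the model-fitting callbacks and the guard for an all-inlier sample are illustrative additions:

    import math, random

    def adaptive_ransac(data, fit_model, point_error, s, threshold, p=0.99):
        N, sample_count = float("inf"), 0
        best_model, best_count = None, 0
        while sample_count < N:
            model = fit_model(random.sample(data, s))
            count = sum(point_error(model, d) < threshold for d in data)
            if count > best_count:
                best_model, best_count = model, count
                e = 1.0 - count / len(data)            # re-estimate the outlier ratio
                denom = 1.0 - (1.0 - e) ** s
                N = 0 if denom <= 0.0 else math.log(1 - p) / math.log(denom)  # recompute N
            sample_count += 1
        return best_model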

* From Marc Pollefeys, COMP 256, 2003


RANSAC conclusions

Good:
Robust to outliers
Applicable to a larger number of parameters than the Hough transform
Parameters are easier to choose than for the Hough transform

Bad:
Computational time grows quickly with the fraction of outliers and the number of parameters
Not good for getting multiple fits

Common applications:
Robust linear regression (and similar)

    Computing a homography (e.g., image stitching)

    Estimating fundamental matrix (relating two views)


    Sounds familiar?


    VLFeat demo of Ransac Homography fit

335 tentative matches

209 (62.39%) inlier matches out of 335

    Mosaic


    Reading materials and tools

    R. Szeliski: Computer Vision: Algorithms and Applications

    Chapters 4.1, 6.1 http://szeliski.org/Book/

M. Zuliani: RANSAC for Dummies. http://vision.ece.ucsb.edu/~zuliani/Research/RANSAC/docs/RANSAC4Dummies.pdf

Tools:
RANSAC toolbox: https://github.com/RANSAC/RANSAC-Toolbox
VLFeat toolbox: http://www.vlfeat.org
OpenCV 3D reconstruction: http://opencv.itseez.com/modules/calib3d/doc/camera_calibration_and_3d_reconstruction.html


    Outline

Robust fitting by random sampling: motivation / examples; intuitive approach; RAndom SAmple Consensus; algorithm specificities

Randomized classifiers: recap of boosting classifiers; tree classifiers / decision stumps; randomness in training; algorithm details


Recap: AdaBoost (Adaptive Boosting)

    Instead of resampling, reweight misclassified training examples. Increase the chance of being selected in a sampled training set.

    Or increase the misclassification cost when training on the full set.

Components: a weak or base classifier. Condition: each weak classifier must do better than chance on the weighted training set.


    Recap: AdaBoost Intuition

Consider a 2D feature space with positive and negative examples. Each weak classifier splits the training examples with at least 50% accuracy. Examples misclassified by a previous weak learner are given more emphasis at future rounds.

Slide credit: Kristen Grauman. Figure adapted from Freund & Schapire. (B. Leibe)


    Recap: AdaBoost Intuition

Slide credit: Kristen Grauman. Figure adapted from Freund & Schapire. (B. Leibe)


    Recap: AdaBoost Intuition

The final classifier is a combination of the weak classifiers.

Slide credit: Kristen Grauman

Recap: AdaBoost Algorithm  [Freund & Schapire, 1995]

Start with uniform weights on the training examples {x1, ..., xn}.

For T rounds:
Evaluate the weighted error for each feature (weak classifier) and pick the best one.
Re-weight the examples: incorrectly classified -> more weight; correctly classified -> less weight.

The final classifier is a combination of the weak ones, weighted according to the error they had.
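A compact sketch of this reweighting loop (binary labels in {-1, +1}; the weak-learner interface and the small epsilon guard are illustrative assumptions, not from the slides):

    import numpy as np

    def adaboost(X, y, weak_learners, T):
        n = len(y)
        w = np.full(n, 1.0 / n)                       # start with uniform weights
        ensemble = []
        for _ in range(T):
            # pick the weak learner with the smallest weighted error
            errors = [np.sum(w * (h(X) != y)) for h in weak_learners]
            best = int(np.argmin(errors))
            h, err = weak_learners[best], max(errors[best], 1e-10)
            alpha = 0.5 * np.log((1 - err) / err)     # weight of this weak classifier
            w *= np.exp(-alpha * y * h(X))            # misclassified examples gain weight
            w /= w.sum()
            ensemble.append((alpha, h))
        # final classifier: weighted vote of the weak classifiers
        return lambda Xq: np.sign(sum(a * h(Xq) for a, h in ensemble))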


Randomized Decision Forests

Very fast tools for:

    classification

    clustering

    regression

    Good generalization through randomized training

Inherently multi-class; automatic feature sharing [Torralba et al. 07]

    Simple training / testing algorithms

Randomized Decision Forests = Randomized Forests = Random Forests™


    A brief history of forests

    [ L. Breiman, J. Friedman, C.J. Stone, and R.A. Olshen. Classification and Regression Trees (CART). 1984 ]

[ Y. Amit and D. Geman. Randomized enquiries about shape; an application to handwritten digit recognition. Technical report, 1994 ]

[ Y. Amit and D. Geman. Shape quantization and recognition with randomized trees. 1997 ]

[ L. Breiman. Random forests. 1999, 2001 ]

[ V. Lepetit and P. Fua. Keypoint recognition using randomized trees. 2005, 2006 ]

[ F. Moosman, B. Triggs, F. Jurie. Fast discriminative visual codebooks using randomized clustering forests. 2006 ]

    [ G. Rogez, J. Rihan, S. Ramalingam, P. Orrite, C. Torr. Randomized trees for human pose detection. 2008 ]

    [ C. Leistner, A. Saffari, J. Santner, H. Bischoff. Semi-supervised random forests. 2009 ]

    [ A. Saffari, C. Leistner, J. Santner, M. Godec, H. Bischoff. On-line random forests. 2009 ]

    [ S. Nowozin, C. Rother, S. Bagon, T. Sharp, B. Yao, and P. Kohli. Decision tree fields. 2011 ]


    What can decision forests do? tasks

    Regression forests

    Classification forests

    Semi-supervised forests


    What can decision forests do? applications

Regression forests: e.g. object localization

Classification forests: e.g. semantic segmentation

Semi-supervised forests: e.g. semi-supervised semantic segmentation


    Decision trees and decision forests

    A forest is an ensemble of trees. The trees are all slightly different from one another.

[ Y. Amit and D. Geman. Shape quantization and recognition with randomized trees. Neural Computation, 9:1545-1588, 1997 ]

[ L. Breiman. Random forests. Machine Learning, 45(1):5-32, 2001 ]

Is the top part blue?

Is the bottom part green?

Is the bottom part blue?

A decision tree / a general tree structure

[Figure: a tree with root node 0, internal (split) nodes and terminal (leaf) nodes, numbered 0-14 in breadth-first order]


Input test point; split the data at each node.

    Decision tree testing (runtime)

    Input data in feature space

    Prediction at leaf


    The decision forest model

    Basic notation

Output/label space: categorical or continuous?

Input data point: a collection of feature responses.

Feature response selector: features can be e.g. wavelets, pixel intensities, context.

Forest model (per tree):

Node weak learner: the test function for splitting data at a node j.

Node objective function (training): the energy to be minimized when training the j-th split node.

Stopping criteria (training): when to stop growing a tree during training, e.g. max tree depth.

The ensemble model: how to compute the forest output from that of the individual trees.

Forest size: total number of trees in the forest.

Leaf predictor model: point estimate or full distribution?

Randomness model (training): how is randomness injected during training, and how much? E.g. 1. bagging, 2. randomized node optimization.

Node test parameters: parameters related to each split node: i) which features, ii) what geometric primitive, iii) thresholds.


    Decision forest model: the randomness model

    1) Bagging (randomizing the training set)

    The full training set

The randomly sampled subset of training data made available to tree t

    Forest training

    Efficient training


    Decision forest model: the randomness model

The full set of all possible node test parameters.

For each node, the set of randomly sampled features.

Randomness control parameter: at its maximum value there is no randomness and maximum tree correlation; at its minimum value there is maximum randomness and minimum tree correlation.

2) Randomized node optimization (RNO)

Small value: little tree correlation. Large value: large tree correlation.

The effect of the randomness control parameter:

    Node weak learner

    Node test params

    Node training


Decision forest model: the ensemble model

An example forest to predict continuous variables.


    Decision forest model: training and information gain

Node training: maximize the information gain of a split, using Shannon's entropy (for categorical, non-parametric distributions).

[Figure: class histograms before the split and after two candidate splits, Split 1 and Split 2]
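The transcript drops the formulas, but the quantities named here are the standard ones. For the training set S_j at node j, split into left/right child sets S_j^L and S_j^R:

    H(S) = -\sum_{c} p(c)\,\log p(c),
    \qquad
    I_j = H(S_j) - \sum_{i \in \{L,R\}} \frac{|S_j^{i}|}{|S_j|}\, H(S_j^{i})

Training a split node means choosing the test parameters that maximize the information gain I_j.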


    Decision forest model: training and information gain

Node training: information gain, using the differential entropy of a Gaussian (for continuous, parametric densities).

[Figure: data before the split and after two candidate splits, Split 1 and Split 2]


    Background: overfitting and underfitting


    Classification forests

    Efficient, supervised multi-class classification

[ V. Lepetit and P. Fua. Keypoint Recognition Using Randomized Trees. IEEE Trans. PAMI, 2006 ]


Decision Tree Pseudo-Code

double[] ClassifyDT(node, v)
  if node.IsSplitNode then
    // split node: compare the feature response f(v) against the threshold t
    if node.f(v) >= node.t then
      return ClassifyDT(node.right, v)
    else
      return ClassifyDT(node.left, v)
    end
  else
    // leaf node: return the stored class distribution P
    return node.P
  end
end


Toy Example

Feature vectors are x, y coordinates: v = (x, y).
Split functions are lines with parameters a, b; the threshold determines the intercept.
Four classes: purple, blue, red, green.

Try several lines, chosen at random.
Keep the line that best separates the data (information gain).
Recurse.


Toy Example (continued): same text as above; the figure shows the next randomly chosen split.


Toy Example (continued): same text as above; the figure shows a further recursive split.


Toy Example (continued): same text as above; the figure shows another recursive split.


Randomized Learning

Recursively split the examples at node n; the set I_n indexes the labeled training examples (v_i, l_i) that reach that node.

At each node, a histogram of the example labels is maintained.

An example is sent to the left or right split by comparing a feature response (a function of its feature vector) against a threshold.


Randomized Learning Pseudo Code

TreeNode LearnDT(I)
  repeat featureTests times
    let f = RndFeature()
    let r = EvaluateFeatureResponses(I, f)
    repeat threshTests times
      let t = RndThreshold(r)
      let (I_l, I_r) = Split(I, r, t)
      let gain = InfoGain(I_l, I_r)
      if gain is best then remember f, t, I_l, I_r
    end
  end
  if best gain is sufficient
    return SplitNode(f, t, LearnDT(I_l), LearnDT(I_r))
  else
    return LeafNode(HistogramExamples(I))
  end
end


A forest is an ensemble of several decision trees.

Classification averages the per-tree leaf distributions over the categories c.

A Forest of Trees

[Figure: trees 1..T, each routing the input v through split nodes to a leaf node storing a distribution over category c]

[Amit & Geman 97] [Breiman 01] [Lepetit et al. 06]


Decision Forests Pseudo-Code

double[] ClassifyDF(forest, v)
  // allocate memory
  let P = double[forest.CountClasses]
  // loop over trees in forest
  for t = 1 to forest.CountTrees
    let P_t = ClassifyDT(forest.Tree[t], v)
    P = P + P_t   // sum distributions
  end
  // normalise
  P = P / forest.CountTrees
  return P
end


    Learning a Forest

Divide the training examples into subsets: improves generalization, reduces memory requirements & training time.

Train each decision tree on its subset: same decision tree learning as before. Multi-core friendly.

Subsets can be chosen at random or hand-picked.
Subsets can have overlap (and usually do).
Can enforce subsets of images (not just examples).
Could also divide the feature pool into subsets.


Learning a Forest Pseudo Code

Forest LearnDF(countTrees, I)
  // allocate memory
  let forest = Forest(countTrees)
  // loop over trees in forest
  for t = 1 to countTrees
    let I_t = RandomSplit(I)
    forest[t] = LearnDT(I_t)
  end
  // return forest object
  return forest
end


    Toy Forest Classification Demo

6 classes in a 2-dimensional feature space. Split functions are lines in this space.


    Toy Forest Classification Demo

    With a depth 2 tree, you cannot separate all six classes.


    Toy Forest Classification Demo

    With a depth 3 tree, you are doing better, but still cannot separate all six classes.


    Toy Forest Classification Demo

    With a depth 4 tree, you now have at least as many leaf nodes as classes,and so are able to classify most examples correctly.


    Toy Forest Classification Demo

Different trees within a forest can give rise to very different decision boundaries, none of which is particularly good on its own.


    Toy Forest Classification Demo

But averaging together many trees in a forest can result in decision boundaries that look very sensible, and are even quite close to the max-margin classifier. (Shading represents entropy; darker is higher entropy.)


Classification forest

Training data in feature space, with unlabelled test points to classify.

Model specialization for classification:
Output is categorical (a discrete set of classes).
Input data point.
Node weak learner (based on a feature response).
Predictor model: class posterior.
Objective function for node j: information gain, using the entropy of a discrete distribution.
Classification tree training: train node j by maximizing this objective.


    Classification forest: the weak learner model

    Node weak learner

    Node test params

    Splitting data at node j

Examples of weak learners:

Weak learner: axis aligned. The feature response for the 2D example compares a single coordinate against the threshold.
Weak learner: oriented line. The feature response for the 2D example is a generic line in homogeneous coordinates.
Weak learner: conic section. The feature response for the 2D example uses a matrix representing a conic.

In general the selector may pick only a very small subset of the features.

See Appendix C for the relation with the kernel trick.


    Classification forest: the prediction model

    What do we do at the leaf?


    Prediction model: probabilistic


    Classification forest: the ensemble model

    Tree t=1 t=2 t=3

    Forest output probability

    The ensemble model
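The formula itself is lost in the transcript; the averaging rule that the ClassifyDF pseudocode above implements is

    p(c \mid \mathbf{v}) = \frac{1}{T} \sum_{t=1}^{T} p_t(c \mid \mathbf{v})

where p_t(c|v) is the class distribution stored at the leaf reached by v in tree t.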


    Training different trees in the forest

    Testing different trees in the forest

    (2 videos in this page)

    Classification forest: effect of the weak learner model

    Parameters: T=200, D=2, weak learner = aligned, leaf model = probabilistic

Three concepts to keep in mind: accuracy of prediction, quality of confidence, generalization.

    Training points


    Training different trees in the forest

    Testing different trees in the forest

    Classification forest: effect of the weak learner model

Parameters: T=200, D=2, weak learner = linear, leaf model = probabilistic. (2 videos in this page)

    Training points


Classification forest: effect of the weak learner model

Training different trees in the forest

    Testing different trees in the forest

Parameters: T=200, D=2, weak learner = conic, leaf model = probabilistic. (2 videos in this page)

    Training points


    Classification forest: with >2 classes

    Training different trees in the forest

    Testing different trees in the forest

Parameters: T=200, D=3, weak learner = conic, leaf model = probabilistic. (2 videos in this page)

    Training points


    Classification forest: effect of tree depth

max tree depth D: from underfitting (too shallow) to overfitting (too deep)

T=200, D=3, w. l. = conic; T=200, D=6, w. l. = conic; T=200, D=15, w. l. = conic

Predictor model = probabilistic. (3 videos in this page)

    Training points: 4-class mixed


    Classification forest: analysing generalization

    Parameters: T=200, D=3, leaf model = probabilistic

    Weak learner: axis aligned

    Weak learner: oriented line

    Weak learner: conic section

    Training points

    (3 videos in this page. Increasing T)


    Classification forest: analysing generalization

Parameters: T=200, D=13, w. l. = conic, predictor = probabilistic. (3 videos in this page)

Training points: 4-class spiral; 4-class spiral, large gaps; 4-class spiral, larger gaps

Testing posteriors


    Classification forest: comparison with boosting

    [Boosting code in http://graphics.cs.msu.ru/ru/science/research/machinelearning/adaboosttoolbox]

Boosting parameters: 200 weak learners. Weak learners = axis aligned.

Forest parameters: T=200, D=13, w. l. = axis aligned, l. m. = probabilistic.

Classification forest | ModestBoost | ModestBoost (soft output)

Example 1

Example 2


Increased uncertainty away from training points.

Classification forest: comparison with SVM

Max-margin-like behaviour for multi-class problems.

Increased uncertainty in mixed regions.

Max-margin-like behaviour for multi-class problems.


Increased uncertainty away from training points.

Classification forest: comparison with SVM

Max-margin-like behaviour for multi-class problems.

Increased uncertainty in mixed regions.

Max-margin-like behaviour for multi-class problems.

Parameters: T=200, D=6, weak learner = conic, leaf model = probabilistic. (4 videos in this page)


    Classification forest: comparison with SVM

Note: overfitting + overly confident.

Same high confidence away from the training data.

Lack of symmetry.

The SVM produces a nice separation but no confidence information.

    [SVM code in http://asi.insa-rouen.fr/enseignants/~arakotom/toolbox/index.html]


    Classification forest: max-margin for multiple classes

    Training points

weak learner: oriented line; weak learner: conic section


    Summary: Random Forests

Properties:
Very simple algorithm.
Resistant to overfitting; generalizes well to new data.
Faster training.
Extensions available for clustering, distance learning, etc.

Limitations:
Memory consumption:

    Decision tree construction uses much more memory.

    Well-suited for problems with little training data

    Little performance gain when training data is really large.

B. Leibe


Why do they work?

Suppose there are 25 base classifiers, each with error rate ε = 0.35, and assume independence among the classifiers. The probability that the ensemble (majority vote) makes a wrong prediction is the probability that 13 or more of the 25 are wrong:

P(error) = Σ_{i=13}^{25} C(25, i) ε^i (1 − ε)^{25−i} ≈ 0.06
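A short check of that number in Python (assumes Python 3.8+ for math.comb):

    from math import comb

    eps, n = 0.35, 25
    # Majority vote is wrong when 13 or more of the 25 independent classifiers err.
    p_wrong = sum(comb(n, i) * eps**i * (1 - eps)**(n - i) for i in range(13, n + 1))
    print(round(p_wrong, 2))   # ~0.06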


    Relation to Cascades [Viola & Jones 04]

Boosted cascades: very unbalanced tree.

Good for unbalanced binary problems, e.g. sliding-window object detection.

Hard to learn.

Randomized forests: less deep, fairly balanced.

    ensemble of trees gives robustness

    good for multi-class problems


    Credits, reading materials and tools

    Many decision tree slides from [A. Criminisi and J. Shotton, 2013]

Tree software (C#/C++): Sherwood, downloadable from http://research.microsoft.com/projects/decisionforests/

    Random Forests in Matlab: https://github.com/karpathy/Random-Forest-Matlab

    Random Forests : http://www.stat.berkeley.edu/~breiman/RandomForests/

Hastie et al., "The Elements of Statistical Learning", Chapters 9.1 and 9.2