Inf5300 v2013 Lecture2 Random 2pp


    INF 5300 Advanced Topic: Video Content Analysis

Asbjørn Berge

    Random algorithms in Computer Vision


    Outline

Robust fitting by random sampling: motivation / examples; intuitive approach; RAndom SAmple Consensus; algorithm specificities

Randomized classifiers: recap of boosting classifiers; tree classifiers / decision stumps; randomness in training; algorithm details


    Structure from motion

Obtain 3D scene structure from multiple images taken with the same camera in different locations and poses.

Typically, camera location & pose are treated as unknowns.

Track points across frames, infer camera pose & scene structure from the correspondences.

Simultaneous Localization And Mapping (SLAM)

Localize a robot and map its surroundings with a single camera.

Inferring 3D

    3D Reconstruction

Internet Photos (Colosseum). Reconstructed 3D cameras and points. http://photosynth.net/default.aspx

    http://phototour.cs.washington.edu/applet/index.html


    Why extract features? Motivation: panorama stitching

We have two images: how do we combine them?

Why extract features?

Motivation: panorama stitching

We have two images: how do we combine them?

Step 1: extract features

Step 2: match features


    Why extract features? Motivation: panorama stitching

We have two images: how do we combine them?

Step 1: extract features

Step 2: match features

Step 3: align images

    Local invariant features: outline

1) Detection: identify the interest points.

2) Description: extract a vector feature descriptor surrounding each interest point.

3) Matching: determine correspondence between descriptors in two views.

x_1 = [x_1^(1), ..., x_d^(1)],   x_2 = [x_1^(2), ..., x_d^(2)]


    Computing transformations

    Given a set of matches between images A and B

    How can we compute the transform T from A to B?

    Find transform T that best agrees with the matches


    Evaluating the results

    How can we measure the performance of a feature matcher?

[Figure: histogram of feature distances for matches, with example distances 50, 75 and 200]


    True/false positives

The distance threshold affects performance.

True positives = # of detected matches that are correct. Suppose we want to maximize these: how should we choose the threshold?

False positives = # of detected matches that are incorrect. Suppose we want to minimize these: how should we choose the threshold?

[Figure: feature-distance histogram with a true match and a false match marked at distances 50, 75 and 200]

    Technology for a better society

Robustness: outliers


Robustness

Let's consider a simpler example. Problem: fit a line to these data points. A least-squares fit is dragged toward the outliers; how can we fix this?


    Idea

    Given a hypothesized line

    Count the number of points that agree with the line

    Agree = within a small distance of the line

    I.e., the inliers to that line

    For all possible lines, select the one with the largest number of inliers
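The inlier count for a hypothesized line can be computed directly from point-to-line distances. A minimal sketch in Python (the array layout, function name and threshold value are illustrative, not from the slides):

    import numpy as np

    def count_inliers(points, a, b, c, threshold=2.0):
        # Perpendicular distance of every point (x, y) to the line ax + by + c = 0.
        dist = np.abs(a * points[:, 0] + b * points[:, 1] + c) / np.hypot(a, b)
        inliers = dist < threshold
        return inliers.sum(), inliers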


    How do we find the best line?

    Unlike least-squares, no simple closed-form solution

Hypothesize-and-test: try out many lines, keep the best one.

    Which lines?


    Translations


    RAndom SAmple Consensus

Select one match at random, count inliers.

    RAndom SAmple Consensus

Select another match at random, count inliers.


    RAndom SAmple Consensus

Output the translation with the highest number of inliers.

    RANSAC

Idea: all the inliers will agree with each other on the translation vector; the (hopefully small) number of outliers will (hopefully) disagree with each other.

RANSAC only has guarantees if there are < 50% outliers.

"All good matches are alike; every bad match is bad in its own way."

    Tolstoy via Alyosha Efros


RANSAC  [Fischler & Bolles, 1981]

(RANdom SAmple Consensus): learning technique to estimate the parameters Θ of a model f(P, Θ) by random sampling of the observed data P.


    Algorithm:

1. Sample (randomly) the number of points required to fit the model
2. Solve for model parameters using the samples
3. Score by the fraction of inliers within a preset threshold of the model

Repeat 1-3 until the best model is found with high confidence

RANSAC

[Fischler & Bolles, 1981]

(RANdom SAmple Consensus):


    RANSAC

    Algorithm:

1. Sample (randomly) the number of points required to fit the model (# = 2)
2. Solve for model parameters using the samples
3. Score by the fraction of inliers within a preset threshold of the model

    Repeat 1-3 until the best model is found with high confidence

    Line fitting example


    RANSAC

    Algorithm:

1. Sample (randomly) the number of points required to fit the model (# = 2)
2. Solve for model parameters using the samples
3. Score by the fraction of inliers within a preset threshold of the model

    Repeat 1-3 until the best model is found with high confidence

    Line fitting example


    RANSAC

(6 inliers)

    Algorithm:

1. Sample (randomly) the number of points required to fit the model (# = 2)
2. Solve for model parameters using the samples
3. Score by the fraction of inliers within a preset threshold of the model

    Repeat 1-3 until the best model is found with high confidence

    Line fitting example


    RANSAC

(14 inliers)

Algorithm:

1. Sample (randomly) the number of points required to fit the model (# = 2)
2. Solve for model parameters using the samples
3. Score by the fraction of inliers within a preset threshold of the model

    Repeat 1-3 until the best model is found with high confidence


    RANSAC

Inlier threshold: related to the amount of noise we expect in inliers. Often model the noise as Gaussian with some standard deviation (e.g., 3 pixels).

Number of rounds: related to the percentage of outliers we expect, and the probability of success we'd like to guarantee. Suppose there are 20% outliers, and we want to find the correct answer with 99% probability.

How many rounds do we need?


    RANSAC

[Figure: inlier distribution in x-translation / y-translation space; set the threshold so that, e.g., 95% of the Gaussian lies inside that radius]


    RANSAC

    Back to linear regression

    How do we generate a hypothesis?


    RANSAC


    Back to linear regression

    How do we generate a hypothesis?

  • 7/29/2019 Inf5300 v2013 Lecture2 Random 2pp

    16/50

    16

    Technology for a better society

    RANSAC

General version:

1. Randomly choose s samples. Typically s = the minimum sample size that lets you fit a model.

2. Fit a model (e.g., a line) to those samples

3. Count the number of inliers that approximately fit the model

4. Repeat N times

    5. Choose the model that has the largest set of inliers
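A minimal Python sketch of this hypothesize-and-test loop; the callback names (fit_model, point_error) and the final refit over the consensus set are illustrative choices, not prescribed by the slides:

    import numpy as np

    def ransac(data, fit_model, point_error, s, N, threshold, seed=0):
        rng = np.random.default_rng(seed)
        best_inliers = np.zeros(len(data), dtype=bool)
        for _ in range(N):
            sample = data[rng.choice(len(data), size=s, replace=False)]   # step 1
            model = fit_model(sample)                                     # step 2
            inliers = point_error(model, data) < threshold                # step 3
            if inliers.sum() > best_inliers.sum():                        # step 5
                best_inliers = inliers
        # final refit of the model to the whole consensus set
        return fit_model(data[best_inliers]), best_inliers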


    How big is s?

    For alignment, depends on the motion model

    Here, each sample is a correspondence (pair of matching points)


    Final step: least squares fit

Find the average translation vector over all inliers.
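For a pure translation this least-squares refit reduces to the mean displacement over the inlier matches; a small sketch (array names are illustrative):

    import numpy as np

    def refit_translation(pts_a, pts_b, inliers):
        # Least-squares translation = average displacement over the inlier matches.
        return (pts_b[inliers] - pts_a[inliers]).mean(axis=0)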


    Choosing the parameters

Initial number of points s: typically the minimum number needed to fit the model.

Distance threshold t: choose t so that the probability of an inlier is p (e.g. 0.95). For zero-mean Gaussian noise with standard deviation σ: t² = 3.84 σ².

Number of samples N: choose N so that, with probability p, at least one random sample is free from outliers (e.g. p = 0.99), given the outlier ratio e:

(1 − (1 − e)^s)^N = 1 − p,   so   N = log(1 − p) / log(1 − (1 − e)^s)

Proportion of outliers e (values shown for p = 0.99):

s     5%    10%   20%   25%   30%   40%   50%
2      2     3     5     6     7    11    17
3      3     4     7     9    11    19    35
4      3     5     9    13    17    34    72
5      4     6    12    17    26    57   146
6      4     7    16    24    37    97   293
7      4     8    20    33    54   163   588
8      5     9    26    44    78   272  1177
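The table follows from the formula above; a quick check in Python (parameter names are illustrative):

    import math

    def ransac_rounds(p, e, s):
        # N = log(1 - p) / log(1 - (1 - e)^s), rounded up.
        return math.ceil(math.log(1 - p) / math.log(1 - (1 - e) ** s))

    print(ransac_rounds(p=0.99, e=0.20, s=2))   # 5, matching the table entry for s=2, 20% outliers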


    Algorithmic specificities

Terminate when the inlier ratio reaches the expected ratio of inliers: T = (1 − e) · n, where n is the total number of points.

e is often unknown a priori, so pick the worst case, e.g. 50%, and adapt if more inliers are found; e.g. 80% inliers would yield e = 0.2.

N = ∞, sample_count = 0

While N > sample_count repeat:

Choose a sample and count the number of inliers

Set e = 1 − (number of inliers) / (total number of points)

Recompute N from e:  N = log(1 − p) / log(1 − (1 − e)^s)

Increment sample_count by 1

Terminate
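A sketch of this adaptive loop in Python; the model-fitting callbacks and the guard for an all-inlier sample are illustrative additions:

    import math, random

    def adaptive_ransac(data, fit_model, point_error, s, threshold, p=0.99):
        N, sample_count = float("inf"), 0
        best_model, best_count = None, 0
        while sample_count < N:
            model = fit_model(random.sample(data, s))
            count = sum(point_error(model, d) < threshold for d in data)
            if count > best_count:
                best_model, best_count = model, count
                e = 1.0 - count / len(data)            # re-estimate the outlier ratio
                denom = 1.0 - (1.0 - e) ** s
                N = 0 if denom <= 0.0 else math.log(1 - p) / math.log(denom)  # recompute N
            sample_count += 1
        return best_model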

* From Marc Pollefeys, COMP 256, 2003


RANSAC conclusions

Good:
Robust to outliers
Applicable to a larger number of parameters than the Hough transform
Parameters are easier to choose than for the Hough transform

Bad:
Computational time grows quickly with the fraction of outliers and the number of parameters
Not good for getting multiple fits

Common applications:
Robust linear regression (and similar)

    Computing a homography (e.g., image stitching)

    Estimating fundamental matrix (relating two views)


    Sounds familiar?


    VLFeat demo of Ransac Homography fit

335 tentative matches

209 (62.39%) inlier matches out of 335

    Mosaic


    Reading materials and tools

    R. Szeliski: Computer Vision: Algorithms and Applications

    Chapters 4.1, 6.1 http://szeliski.org/Book/

M. Zuliani: RANSAC for Dummies. http://vision.ece.ucsb.edu/~zuliani/Research/RANSAC/docs/RANSAC4Dummies.pdf

Tools:
RANSAC toolbox: https://github.com/RANSAC/RANSAC-Toolbox
VLFeat toolbox: http://www.vlfeat.org
OpenCV 3D reconstruction: http://opencv.itseez.com/modules/calib3d/doc/camera_calibration_and_3d_reconstruction.html


    Outline

Robust fitting by random sampling: motivation / examples; intuitive approach; RAndom SAmple Consensus; algorithm specificities

Randomized classifiers: recap of boosting classifiers; tree classifiers / decision stumps; randomness in training; algorithm details


Recap: AdaBoost (Adaptive Boosting)

    Instead of resampling, reweight misclassified training examples. Increase the chance of being selected in a sampled training set.

    Or increase the misclassification cost when training on the full set.

Components: a weak or base classifier. Condition: each weak classifier must do better than chance on the weighted training set.


    Recap: AdaBoost Intuition

Consider a 2D feature space with positive and negative examples. Each weak classifier splits the training examples with at least 50% accuracy. Examples misclassified by a previous weak learner are given more emphasis at future rounds.

Slide credit: Kristen Grauman. Figure adapted from Freund & Schapire. (B. Leibe)


    Recap: AdaBoost Intuition

Slide credit: Kristen Grauman. Figure adapted from Freund & Schapire. (B. Leibe)


    Recap: AdaBoost Intuition

The final classifier is a combination of the weak classifiers.

Slide credit: Kristen Grauman

Recap: AdaBoost Algorithm  [Freund & Schapire, 1995]

Start with uniform weights on the training examples {x1, ..., xn}.

For T rounds:
Evaluate the weighted error for each feature (weak classifier) and pick the best one.
Re-weight the examples: incorrectly classified -> more weight; correctly classified -> less weight.

The final classifier is a combination of the weak ones, weighted according to the error they had.
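A compact sketch of this reweighting loop (binary labels in {-1, +1}; the weak-learner interface and the small epsilon guard are illustrative assumptions, not from the slides):

    import numpy as np

    def adaboost(X, y, weak_learners, T):
        n = len(y)
        w = np.full(n, 1.0 / n)                       # start with uniform weights
        ensemble = []
        for _ in range(T):
            # pick the weak learner with the smallest weighted error
            errors = [np.sum(w * (h(X) != y)) for h in weak_learners]
            best = int(np.argmin(errors))
            h, err = weak_learners[best], max(errors[best], 1e-10)
            alpha = 0.5 * np.log((1 - err) / err)     # weight of this weak classifier
            w *= np.exp(-alpha * y * h(X))            # misclassified examples gain weight
            w /= w.sum()
            ensemble.append((alpha, h))
        # final classifier: weighted vote of the weak classifiers
        return lambda Xq: np.sign(sum(a * h(Xq) for a, h in ensemble))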


Randomized Decision Forests

Very fast tools for:

    classification

    clustering

    regression

    Good generalization through randomized training

Inherently multi-class; automatic feature sharing [Torralba et al. 07]

    Simple training / testing algorithms

Randomized Decision Forests = Randomized Forests = Random Forests™


    A brief history of forests

    [ L. Breiman, J. Friedman, C.J. Stone, and R.A. Olshen. Classification and Regression Trees (CART). 1984 ]

[ Y. Amit and D. Geman. Randomized enquiries about shape; an application to handwritten digit recognition. Technical report, 1994 ]

[ Y. Amit and D. Geman. Shape quantization and recognition with randomized trees. 1997 ]

[ L. Breiman. Random forests. 1999, 2001 ]

[ V. Lepetit and P. Fua. Keypoint recognition using randomized trees. 2005, 2006 ]

[ F. Moosman, B. Triggs, F. Jurie. Fast discriminative visual codebooks using randomized clustering forests. 2006 ]

    [ G. Rogez, J. Rihan, S. Ramalingam, P. Orrite, C. Torr. Randomized trees for human pose detection. 2008 ]

    [ C. Leistner, A. Saffari, J. Santner, H. Bischoff. Semi-supervised random forests. 2009 ]

    [ A. Saffari, C. Leistner, J. Santner, M. Godec, H. Bischoff. On-line random forests. 2009 ]

    [ S. Nowozin, C. Rother, S. Bagon, T. Sharp, B. Yao, and P. Kohli. Decision tree fields. 2011 ]


    What can decision forests do? tasks

    Regression forests

    Classification forests

    Semi-supervised forests


    What can decision forests do? applications

Regression forests: e.g. object localization

Classification forests: e.g. semantic segmentation

Semi-supervised forests: e.g. semi-supervised semantic segmentation


    Decision trees and decision forests

    A forest is an ensemble of trees. The trees are all slightly different from one another.

[ Y. Amit and D. Geman. Shape quantization and recognition with randomized trees. Neural Computation, 9:1545-1588, 1997 ]

[ L. Breiman. Random forests. Machine Learning, 45(1):5-32, 2001 ]

Is the top part blue?

Is the bottom part green?

Is the bottom part blue?

A decision tree / a general tree structure

[Figure: a tree with root node 0, internal (split) nodes and terminal (leaf) nodes, numbered 0-14 in breadth-first order]


Input test point; split the data at each node.

    Decision tree testing (runtime)

    Input data in feature space

    Prediction at leaf


    The decision forest model

    Basic notation

Output/label space: categorical or continuous?

Input data point: a collection of feature responses.

Feature response selector: features can be e.g. wavelets, pixel intensities, context.

Forest model (per tree):

Node weak learner: the test function for splitting data at a node j.

Node objective function (training): the energy to be minimized when training the j-th split node.

Stopping criteria (training): when to stop growing a tree during training, e.g. max tree depth.

The ensemble model: how to compute the forest output from that of the individual trees.

Forest size: total number of trees in the forest.

Leaf predictor model: point estimate or full distribution?

Randomness model (training): how is randomness injected during training, and how much? E.g. 1. bagging, 2. randomized node optimization.

Node test parameters: parameters related to each split node: i) which features, ii) what geometric primitive, iii) thresholds.


    Decision forest model: the randomness model

    1) Bagging (randomizing the training set)

    The full training set

The randomly sampled subset of training data made available to tree t

    Forest training

    Efficient training


    Decision forest model: the randomness model

The full set of all possible node test parameters.

For each node, the set of randomly sampled features.

Randomness control parameter: at its maximum value there is no randomness and maximum tree correlation; at its minimum value there is maximum randomness and minimum tree correlation.

2) Randomized node optimization (RNO)

Small value: little tree correlation. Large value: large tree correlation.

The effect of the randomness control parameter:

    Node weak learner

    Node test params

    Node training


Decision forest model: the ensemble model

An example forest to predict continuous variables.


    Decision forest model: training and information gain

Node training: maximize the information gain of a split, using Shannon's entropy (for categorical, non-parametric distributions).

[Figure: class histograms before the split and after two candidate splits, Split 1 and Split 2]
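The transcript drops the formulas, but the quantities named here are the standard ones. For the training set S_j at node j, split into left/right child sets S_j^L and S_j^R:

    H(S) = -\sum_{c} p(c)\,\log p(c),
    \qquad
    I_j = H(S_j) - \sum_{i \in \{L,R\}} \frac{|S_j^{i}|}{|S_j|}\, H(S_j^{i})

Training a split node means choosing the test parameters that maximize the information gain I_j.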


    Decision forest model: training and information gain

Node training: information gain, using the differential entropy of a Gaussian (for continuous, parametric densities).

[Figure: data before the split and after two candidate splits, Split 1 and Split 2]


    Background: overfitting and underfitting


    Classification forests

    Efficient, supervised multi-class classification

[ V. Lepetit and P. Fua. Keypoint Recognition Using Randomized Trees. IEEE Trans. PAMI, 2006 ]


Decision Tree Pseudo-Code

double[] ClassifyDT(node, v)
  if node.IsSplitNode then
    // split node: compare the feature response f(v) against the threshold t
    if node.f(v) >= node.t then
      return ClassifyDT(node.right, v)
    else
      return ClassifyDT(node.left, v)
    end
  else
    // leaf node: return the stored class distribution P
    return node.P
  end
end


Toy Example

Feature vectors are x, y coordinates: v = (x, y).
Split functions are lines with parameters a, b; the threshold determines the intercept.
Four classes: purple, blue, red, green.

Try several lines, chosen at random.
Keep the line that best separates the data (information gain).
Recurse.


Toy Example (continued): same text as above; the figure shows the next randomly chosen split.


Toy Example (continued): same text as above; the figure shows a further recursive split.


Toy Example (continued): same text as above; the figure shows another recursive split.


Randomized Learning

Recursively split the examples at node n; the set I_n indexes the labeled training examples (v_i, l_i) that reach that node.

At each node, a histogram of the example labels is maintained.

An example is sent to the left or right split by comparing a feature response (a function of its feature vector) against a threshold.


Randomized Learning Pseudo Code

TreeNode LearnDT(I)
  repeat featureTests times
    let f = RndFeature()
    let r = EvaluateFeatureResponses(I, f)
    repeat threshTests times
      let t = RndThreshold(r)
      let (I_l, I_r) = Split(I, r, t)
      let gain = InfoGain(I_l, I_r)
      if gain is best then remember f, t, I_l, I_r
    end
  end
  if best gain is sufficient
    return SplitNode(f, t, LearnDT(I_l), LearnDT(I_r))
  else
    return LeafNode(HistogramExamples(I))
  end
end


A forest is an ensemble of several decision trees.

Classification averages the per-tree leaf distributions over the categories c.

A Forest of Trees

[Figure: trees 1..T, each routing the input v through split nodes to a leaf node storing a distribution over category c]

[Amit & Geman 97] [Breiman 01] [Lepetit et al. 06]


Decision Forests Pseudo-Code

double[] ClassifyDF(forest, v)
  // allocate memory
  let P = double[forest.CountClasses]
  // loop over trees in forest
  for t = 1 to forest.CountTrees
    let P_t = ClassifyDT(forest.Tree[t], v)
    P = P + P_t   // sum distributions
  end
  // normalise
  P = P / forest.CountTrees
  return P
end


    Learning a Forest

Divide the training examples into subsets: improves generalization, reduces memory requirements & training time.

Train each decision tree on its subset: same decision tree learning as before. Multi-core friendly.

Subsets can be chosen at random or hand-picked.
Subsets can have overlap (and usually do).
Can enforce subsets of images (not just examples).
Could also divide the feature pool into subsets.


Learning a Forest Pseudo Code

Forest LearnDF(countTrees, I)
  // allocate memory
  let forest = Forest(countTrees)
  // loop over trees in forest
  for t = 1 to countTrees
    let I_t = RandomSplit(I)
    forest[t] = LearnDT(I_t)
  end
  // return forest object
  return forest
end


    Toy Forest Classification Demo

6 classes in a 2-dimensional feature space. Split functions are lines in this space.


    Toy Forest Classification Demo

    With a depth 2 tree, you cannot separate all six classes.


    Toy Forest Classification Demo

    With a depth 3 tree, you are doing better, but still cannot separate all six classes.


    Toy Forest Classification Demo

    With a depth 4 tree, you now have at least as many leaf nodes as classes,and so are able to classify most examples correctly.


    Toy Forest Classification Demo

Different trees within a forest can give rise to very different decision boundaries, none of which is particularly good on its own.


    Toy Forest Classification Demo

But averaging together many trees in a forest can result in decision boundaries that look very sensible, and are even quite close to the max-margin classifier. (Shading represents entropy; darker is higher entropy.)


Classification forest

Training data in feature space, with unlabelled test points to classify.

Model specialization for classification:
Output is categorical (a discrete set of classes).
Input data point.
Node weak learner (based on a feature response).
Predictor model: class posterior.
Objective function for node j: information gain, using the entropy of a discrete distribution.
Classification tree training: train node j by maximizing this objective.


    Classification forest: the weak learner model

    Node weak learner

    Node test params

    Splitting data at node j

Examples of weak learners:

Weak learner: axis aligned. The feature response for the 2D example compares a single coordinate against the threshold.
Weak learner: oriented line. The feature response for the 2D example is a generic line in homogeneous coordinates.
Weak learner: conic section. The feature response for the 2D example uses a matrix representing a conic.

In general the selector may pick only a very small subset of the features.

See Appendix C for the relation with the kernel trick.


    Classification forest: the prediction model

    What do we do at the leaf?


    Prediction model: probabilistic


    Classification forest: the ensemble model

    Tree t=1 t=2 t=3

    Forest output probability

    The ensemble model
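The formula itself is lost in the transcript; the averaging rule that the ClassifyDF pseudocode above implements is

    p(c \mid \mathbf{v}) = \frac{1}{T} \sum_{t=1}^{T} p_t(c \mid \mathbf{v})

where p_t(c|v) is the class distribution stored at the leaf reached by v in tree t.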


    Training different trees in the forest

    Testing different trees in the forest

    (2 videos in this page)

    Classification forest: effect of the weak learner model

    Parameters: T=200, D=2, weak learner = aligned, leaf model = probabilistic

Three concepts to keep in mind: accuracy of prediction, quality of confidence, generalization.

    Training points


    Training different trees in the forest

    Testing different trees in the forest

    Classification forest: effect of the weak learner model

Parameters: T=200, D=2, weak learner = linear, leaf model = probabilistic. (2 videos in this page)

    Training points


Classification forest: effect of the weak learner model

Training different trees in the forest

    Testing different trees in the forest

Parameters: T=200, D=2, weak learner = conic, leaf model = probabilistic. (2 videos in this page)

    Training points


    Classification forest: with >2 classes

    Training different trees in the forest

    Testing different trees in the forest

Parameters: T=200, D=3, weak learner = conic, leaf model = probabilistic. (2 videos in this page)

    Training points


    Classification forest: effect of tree depth

max tree depth D: from underfitting (too shallow) to overfitting (too deep)

T=200, D=3, w. l. = conic; T=200, D=6, w. l. = conic; T=200, D=15, w. l. = conic

Predictor model = probabilistic. (3 videos in this page)

    Training points: 4-class mixed


    Classification forest: analysing generalization

    Parameters: T=200, D=3, leaf model = probabilistic

    Weak learner: axis aligned

    Weak learner: oriented line

    Weak learner: conic section

    Training points

    (3 videos in this page. Increasing T)


    Classification forest: analysing generalization

Parameters: T=200, D=13, w. l. = conic, predictor = probabilistic. (3 videos in this page)

Training points: 4-class spiral; 4-class spiral, large gaps; 4-class spiral, larger gaps

Testing posteriors


    Classification forest: comparison with boosting

    [Boosting code in http://graphics.cs.msu.ru/ru/science/research/machinelearning/adaboosttoolbox]

Boosting parameters: 200 weak learners. Weak learners = axis aligned.

Forest parameters: T=200, D=13, w. l. = axis aligned, l. m. = probabilistic.

Classification forest | ModestBoost | ModestBoost (soft output)

Example 1

Example 2


Increased uncertainty away from training points.

Classification forest: comparison with SVM

Max-margin-like behaviour for multi-class problems.

Increased uncertainty in mixed regions.

Max-margin-like behaviour for multi-class problems.


Increased uncertainty away from training points.

Classification forest: comparison with SVM

Max-margin-like behaviour for multi-class problems.

Increased uncertainty in mixed regions.

Max-margin-like behaviour for multi-class problems.

Parameters: T=200, D=6, weak learner = conic, leaf model = probabilistic. (4 videos in this page)


    Classification forest: comparison with SVM

Note: overfitting + overly confident.

Same high confidence away from the training data.

Lack of symmetry.

The SVM produces a nice separation but no confidence information.

    [SVM code in http://asi.insa-rouen.fr/enseignants/~arakotom/toolbox/index.html]


    Classification forest: max-margin for multiple classes

    Training points

weak learner: oriented line; weak learner: conic section


    Summary: Random Forests

Properties:
Very simple algorithm.
Resistant to overfitting; generalizes well to new data.
Faster training.
Extensions available for clustering, distance learning, etc.

Limitations:
Memory consumption:

    Decision tree construction uses much more memory.

    Well-suited for problems with little training data

    Little performance gain when training data is really large.

B. Leibe


Why do they work?

Suppose there are 25 base classifiers, each with error rate ε = 0.35, and assume independence among the classifiers. The probability that the ensemble (majority vote) makes a wrong prediction is the probability that 13 or more of the 25 are wrong:

P(error) = Σ_{i=13}^{25} C(25, i) ε^i (1 − ε)^{25−i} ≈ 0.06
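A short check of that number in Python (assumes Python 3.8+ for math.comb):

    from math import comb

    eps, n = 0.35, 25
    # Majority vote is wrong when 13 or more of the 25 independent classifiers err.
    p_wrong = sum(comb(n, i) * eps**i * (1 - eps)**(n - i) for i in range(13, n + 1))
    print(round(p_wrong, 2))   # ~0.06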


    Relation to Cascades [Viola & Jones 04]

Boosted cascades: very unbalanced tree.

Good for unbalanced binary problems, e.g. sliding-window object detection.

Hard to learn.

Randomized forests: less deep, fairly balanced.

    ensemble of trees gives robustness

    good for multi-class problems


    Credits, reading materials and tools

    Many decision tree slides from [A. Criminisi and J. Shotton, 2013]

Tree software (C#/C++): Sherwood, downloadable from http://research.microsoft.com/projects/decisionforests/

    Random Forests in Matlab: https://github.com/karpathy/Random-Forest-Matlab

    Random Forests : http://www.stat.berkeley.edu/~breiman/RandomForests/

Hastie et al., "The Elements of Statistical Learning", Chapters 9.1 and 9.2