
Classification: Discriminant Analysis

Jeff Howbert, Introduction to Machine Learning, Winter 2014

slides thanks to Greg Shakhnarovich (CS195-5, Brown Univ., 2006)


We want to minimize “overlap” between the projections of the two classes. One approach: make the class projections (a) compact and (b) far apart. An obvious idea: maximize the separation between the projected means.
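In symbols (standard Fisher LDA notation; the slide’s own formulas did not survive extraction, so this is a reconstruction, not a quote):

% Project x onto a direction w; the projected class means and scatters are
%   m_k = w^T mu_k,   s_k^2 = sum over n in C_k of (w^T x_n - m_k)^2.
% Maximizing only |m_1 - m_2| ignores spread; Fisher's criterion trades
% off mean separation against within-class spread:
\[
  J(w) = \frac{(m_1 - m_2)^2}{s_1^2 + s_2^2},
  \qquad
  w^* \propto S_W^{-1} (\mu_1 - \mu_2),
\]
% where S_W is the within-class scatter matrix.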


Example of applying Fisher’s LDA

[Figure: two projections compared: one maximizing separation of means, one maximizing Fisher’s LDA criterion → better class separation.]


Using LDA for classification in one dimension

Fisher’s LDA gives an optimal choice of w, the vector for projection down to one dimension. For classification, we still need to select a threshold to compare the projected values against. Two possibilities:
– No explicit probabilistic assumptions: find the threshold which minimizes empirical classification error (see the sketch after this list).
– Make assumptions about the data distributions of the classes, and derive the theoretically optimal decision boundary.

The usual choice for the class distributions is multivariate Gaussian. We will also need a bit of decision theory.
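A minimal MATLAB sketch of the first option, assuming an N-by-d data matrix X and labels y in {1,2} (variable names are illustrative, not from the slides):

% Fisher's direction, then the training threshold with lowest error.
X1 = X(y == 1, :);   X2 = X(y == 2, :);
mu1 = mean(X1)';     mu2 = mean(X2)';
Sw  = cov(X1)*(size(X1,1)-1) + cov(X2)*(size(X2,1)-1);  % within-class scatter
w   = Sw \ (mu1 - mu2);          % w proportional to inv(Sw)*(mu1 - mu2)
z   = X * w;                     % 1-D projected values
cand = sort(z);                  % candidate thresholds
% For each candidate, take the better of the two labelings of the halves.
errs = arrayfun(@(t) min(mean(((z > t) + 1) ~= y), ...
                         mean(((z <= t) + 1) ~= y)), cand);
[~, i] = min(errs);
t_best = cand(i);                % empirical-error-minimizing threshold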


Decision theory

To minimize classification error:

\[ \hat{y} = \arg\max_{C} \; p(C \mid x) \]

At a given point x in feature space, choose as the predicted class the class that has the greatest probability at x.


[Figure: probability densities for classes C1 and C2; relative probabilities for classes C1 and C2.]
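The link between the densities and the relative (posterior) probabilities in the figure is Bayes’ rule, which the slide uses implicitly:

% Posterior class probabilities from class densities and priors:
\[
  p(C_k \mid x) = \frac{p(x \mid C_k)\, p(C_k)}{\sum_j p(x \mid C_j)\, p(C_j)}
\]
% Predicting the class with the largest posterior at each x minimizes
% the probability of misclassification.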


MATLAB interlude

Classification via discriminant analysis, using the classify() function. Data for each class is modeled as multivariate Gaussian. See matlab_demo_06.m.

class = classify( sample, training, group, 'type' )

– class: predicted test labels
– sample: test data
– training: training data
– group: training labels
– 'type': model for class covariances
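A runnable usage sketch (this mirrors what matlab_demo_06.m presumably does, but is not that file; it assumes the Statistics Toolbox and its bundled Fisher iris data):

% Train on part of the iris data, classify the rest.
load fisheriris                      % meas: 150x4 features, species: labels
trn      = [1:40, 51:90, 101:140];   % training rows (40 per class)
tst      = setdiff(1:150, trn);      % held-out rows
training = meas(trn, 1:2);           % first two features, for easy plotting
group    = species(trn);             % training labels
sample   = meas(tst, 1:2);           % test data
class    = classify(sample, training, group, 'quadratic');  % predicted labels
testErr  = mean(~strcmp(class, species(tst)))  % test error rate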


MATLAB classify() function: models for class covariances

– 'linear': all classes have the same covariance matrix → linear decision boundary
– 'diaglinear': all classes have the same diagonal covariance matrix → linear decision boundary
– 'quadratic': classes have different covariance matrices → quadratic decision boundary
– 'diagquadratic': classes have different diagonal covariance matrices → quadratic decision boundary

(A sketch of why a shared covariance yields a linear boundary follows the list.)
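Why a shared covariance gives a linear boundary (standard derivation, not spelled out on the slide): with Gaussian class densities, the log-odds between two classes are

\[
  \log \frac{p(C_1 \mid x)}{p(C_2 \mid x)}
  = -\tfrac{1}{2}\, x^\top \left( \Sigma_1^{-1} - \Sigma_2^{-1} \right) x
    + (\text{terms linear in } x) + \text{const.}
\]

If \(\Sigma_1 = \Sigma_2\), the quadratic term vanishes and the decision boundary (log-odds = 0) is a hyperplane; otherwise it is a quadric.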


MATLAB classify() function: example with 'quadratic' model of class covariances


Relative class probabilities for LDA

'linear': all classes have the same covariance matrix → linear decision boundary. The relative class probabilities have exactly the same sigmoidal form as in logistic regression.
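In formulas (a standard result, stated here for completeness rather than taken from the slide): with shared covariance Σ, the posterior for class C1 is a logistic sigmoid of a linear function of x,

\[
  p(C_1 \mid x) = \sigma\!\left( w^\top x + b \right),
  \qquad \sigma(a) = \frac{1}{1 + e^{-a}},
\]
\[
  w = \Sigma^{-1} (\mu_1 - \mu_2), \qquad
  b = -\tfrac{1}{2}\, \mu_1^\top \Sigma^{-1} \mu_1
      + \tfrac{1}{2}\, \mu_2^\top \Sigma^{-1} \mu_2
      + \log \frac{p(C_1)}{p(C_2)} .
\]

Logistic regression fits w and b directly from the data; LDA derives them from the fitted Gaussians.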