Pattern recognition (4) - Electrical and Computer …aalbu/computer vision 2009/Lecture...

Pattern recognition (4)

Things we have discussed so far

- Statistical pattern recognition
  - Building simple classifiers
- Supervised classification
  - Minimum distance classifier
  - Bayesian classifier (1D and multidimensional)
  - Building discriminant functions
- Unsupervised classification
  - K-means algorithm

Equivalence between classifiers

Pattern recognition using multivariate normal distributions with equal priors and a covariance matrix shared by all classes is simply a minimum Mahalanobis distance classifier.
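
As a sketch of this equivalence, the Python below assumes two hypothetical 2-D classes with equal priors and one covariance matrix shared by both (the means and covariance are illustrative values, not taken from the lecture). The Bayesian rule picks the class with the larger Gaussian log-likelihood; because the normalizing terms are identical for both classes, that is the same as picking the smaller squared Mahalanobis distance.

```python
import math

# Hypothetical 2-D classes sharing one covariance matrix (illustrative values only).
MEANS = {"w1": (0.0, 0.0), "w2": (3.0, 1.0)}
COV = ((2.0, 0.5), (0.5, 1.0))  # shared covariance Sigma


def inverse_2x2(m):
    """Return (inverse, determinant) of a 2x2 matrix given as nested tuples."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return ((d / det, -b / det), (-c / det, a / det)), det


COV_INV, COV_DET = inverse_2x2(COV)


def mahalanobis_sq(x, mu):
    """Squared Mahalanobis distance (x - mu)^T Sigma^-1 (x - mu)."""
    dx = (x[0] - mu[0], x[1] - mu[1])
    return sum(dx[i] * COV_INV[i][j] * dx[j] for i in range(2) for j in range(2))


def log_gaussian(x, mu):
    """Log of the 2-D normal density N(x; mu, Sigma)."""
    return -0.5 * mahalanobis_sq(x, mu) - math.log(2 * math.pi) - 0.5 * math.log(COV_DET)


def bayes_equal_priors(x):
    # Equal priors: maximizing the posterior is the same as maximizing the likelihood.
    return max(MEANS, key=lambda c: log_gaussian(x, MEANS[c]))


def min_mahalanobis(x):
    return min(MEANS, key=lambda c: mahalanobis_sq(x, MEANS[c]))


for x in [(0.5, 0.2), (2.5, 1.5), (1.6, 0.4), (-1.0, 3.0), (4.0, -1.0)]:
    print(x, bayes_equal_priors(x), min_mahalanobis(x))
```

With these particular values the decision boundary is the line 5x + y = 8, and both functions assign the same label on either side of it.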

Today

- Classifier design: errors and risk in the classification process
- Performance evaluation of classification systems

Reading: slides and blackboard derivations only

Error

How often will we be wrong? Consider first the two-class case.
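
For reference, the standard two-class expression is given below, assuming R1 and R2 denote the regions of feature space assigned to ω1 and ω2 respectively (conditioning is written with "/" as elsewhere in these slides). An error occurs when a sample of ω1 falls in R2 or a sample of ω2 falls in R1:

$$P(\text{error}) = \int_{R_2} p(X / \omega_1)\, P(\omega_1)\, dX + \int_{R_1} p(X / \omega_2)\, P(\omega_2)\, dX$$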

Global error

Suppose that, using our training samples, we have partitioned the feature space into regions Ri, one per class.

Ri corresponds to class ωi if all samples falling in this region of the feature space are classified into class ωi.

We can then compute the overall error of the classification process by integrating each class-specific term p(X/ωi)P(ωi) over the regions assigned to the other classes (where samples of ωi are misclassified) and summing over all classes.
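
A minimal numeric sketch in Python, assuming two hypothetical 1-D Gaussian classes with a common standard deviation and a single threshold t that splits the line into R1 = {x < t} (assigned to ω1) and R2 = {x ≥ t} (assigned to ω2). Instead of numerical integration it uses the normal CDF, built from `math.erf`, to evaluate each class's mass in the wrong region.

```python
import math


def normal_cdf(x, mu, sigma):
    """Phi((x - mu) / sigma) computed with the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def global_error(t, mu1, mu2, sigma, p1, p2):
    """P(error) for R1 = {x < t} -> w1 and R2 = {x >= t} -> w2 (assumes mu1 < mu2).

    Class w1 contributes the part of its weighted density lying in R2;
    class w2 contributes the part of its weighted density lying in R1.
    """
    err1 = p1 * (1.0 - normal_cdf(t, mu1, sigma))  # w1 samples falling in R2
    err2 = p2 * normal_cdf(t, mu2, sigma)          # w2 samples falling in R1
    return err1 + err2


# Hypothetical classes: w1 ~ N(0, 1), w2 ~ N(2, 1), equal priors.
for t in (0.0, 0.5, 1.0, 1.5, 2.0):
    print(f"t = {t:.1f}  P(error) = {global_error(t, 0.0, 2.0, 1.0, 0.5, 0.5):.4f}")
```

With equal priors and equal variances the error is smallest at the midpoint t = 1, where the two weighted densities cross.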

Risk

Not every mistake has the same cost.

Classification strategy: minimize the risk (the expected loss) instead of the probability of error.

Example

Computer-assisted diagnosis of suspicious lesions in a CT scan. There are two ways to go wrong:

- Alpha risk: label a lesion as malignant when in fact it is benign (consequence: unnecessary biopsy)
- Beta risk: label a lesion as benign when in fact it is malignant (consequence: progression of cancer)

Another example

Quality control for manufactured parts: accept or reject a part. There are two ways to go wrong:

- Alpha risk: reject a good part (consequence: wastes money)
- Beta risk: accept a bad part (consequence: loses customers)

Loss tables

Suppose that the cost of classifying a pattern into class ωj when its actual class is ωi is Lij.

We can summarize these costs in a loss table.

Example (cont’d)

Let ω1 be the class of good parts and ω2 the class of bad parts.

Risk for a specific pattern X

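Loss tables are useful for computing the risk R(αi/X) of making a specific choice αi (assigning X to class ωi) given pattern X.

We can compute this risk by adding up the cost associated with each possible true class, weighted by its posterior probability:

$$R(\alpha_i / X) = \sum_{j} L_{ji}\, P(\omega_j / X)$$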

Computing risks for our example

Suppose that for our earlier example we have computed the posterior probabilities

$$P(\omega_1 / X) = 0.8, \qquad P(\omega_2 / X) = 0.2$$

We will compute the risks of classifying X into ω1 and into ω2, respectively.
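
The loss values for this example are not given in the text, so the Python sketch below uses a hypothetical loss table: correct decisions cost 0, rejecting a good part costs 1 and accepting a bad part costs 5. It applies R(αi/X) = Σj Lji P(ωj/X) with the slide's posteriors.

```python
# Hypothetical loss table: LOSS[i][j] = cost of deciding class j when the true class is i.
# Index 0 = w1 (good part), index 1 = w2 (bad part). Values are illustrative only.
LOSS = [
    [0.0, 1.0],  # true w1: accept (no cost), reject a good part (cost 1)
    [5.0, 0.0],  # true w2: accept a bad part (cost 5), reject (no cost)
]
posterior = [0.8, 0.2]  # P(w1/X), P(w2/X) from the slide


def risk(i, loss, post):
    """R(alpha_i/X) = sum_j L_ji * P(w_j/X): expected cost of deciding class i."""
    return sum(loss[j][i] * post[j] for j in range(len(post)))


r1 = risk(0, LOSS, posterior)  # 0 * 0.8 + 5 * 0.2
r2 = risk(1, LOSS, posterior)  # 1 * 0.8 + 0 * 0.2
print(f"R(a1/X) = {r1:.2f}, R(a2/X) = {r2:.2f}")  # R(a1/X) = 1.00, R(a2/X) = 0.80
```

Under these assumed costs the minimum-risk choice is ω2 (reject the part), even though ω1 has the larger posterior probability.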

Bayesian classifiers revisited

Instead of maximizing the posterior probability P(ωj/X), we minimize the risk R(αj/X).

Given N classes:
- we compute N risk values rj = R(αj/X);
- we assign X to the class corresponding to the minimum risk.

Derivation: on the blackboard.
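
The decision rule above can be sketched in Python for any number of classes. This is a minimal illustration, assuming the loss table is given as a nested list with loss[i][j] = Lij (cost of deciding ωj when the true class is ωi) and the posteriors are already computed; the function name is hypothetical.

```python
from typing import Sequence


def min_risk_decision(loss: Sequence[Sequence[float]], posterior: Sequence[float]) -> int:
    """Return the index j that minimizes r_j = R(alpha_j/X) = sum_i L_ij * P(w_i/X).

    loss[i][j] is the cost of deciding class j when the true class is i.
    """
    n = len(posterior)
    risks = [sum(loss[i][j] * posterior[i] for i in range(n)) for j in range(n)]
    return min(range(n), key=risks.__getitem__)  # class with the smallest risk


# Usage with a hypothetical 3-class 0-1 loss table: picks the most probable class.
zero_one = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
print(min_risk_decision(zero_one, [0.2, 0.5, 0.3]))  # 1
```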

The 0-1 Loss rule

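Under this rule (a correct decision costs 0, any wrong decision costs 1), the Bayesian classifier maximizes the posterior probability, as we learned in the previous lecture. For multivariate normal classes with equal priors and a shared covariance matrix, it can be expressed as a minimum distance classifier.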
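
A short derivation of why the 0-1 loss turns risk minimization into posterior maximization, using the risk formula R(αi/X) = Σj Lji P(ωj/X) and the fact that the posteriors sum to 1:

$$L_{ij} = \begin{cases} 0 & i = j \\ 1 & i \neq j \end{cases} \quad\Rightarrow\quad R(\alpha_i / X) = \sum_{j} L_{ji}\, P(\omega_j / X) = \sum_{j \neq i} P(\omega_j / X) = 1 - P(\omega_i / X)$$

Minimizing 1 - P(ωi/X) over i is the same as maximizing P(ωi/X).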

This is the most common assumption in computer vision classifiers!

Performance evaluation paradigms

- Against ground truth (manually generated segmentation/classification)
  - The preferred method in medical image segmentation
- Benchmarking: for mature/maturing subfields in computer vision
  - Example 1: “The gait identification challenge problem: datasets and baseline algorithm”, in International Conference on Pattern Recognition 2002
  - Example 2: “Benchmark Studies on Face Recognition”, in International Workshop on Automatic Face- and Gesture-Recognition 1995

Evaluation of classifiers

- ROC analysis
- Precision and recall
- Confusion matrices

ROC analysis

- ROC stands for receiver operating characteristic; ROC analysis was initially used to analyze and compare the performance of human radar operators.
- A ROC curve is a plot of the true positive rate against the false positive rate as some parameter (typically a decision threshold) is varied.
- 1970: ROC curves were used in medical studies, where they are useful for bringing out the sensitivity (true positive rate) versus the specificity (true negative rate, i.e. 1 - false positive rate) of diagnostic trials.
- In computer vision, ROC analysis is performed for algorithms; it also lets us compare different algorithms designed for the same task.
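
A minimal Python sketch of how ROC points are obtained by varying a parameter, assuming each sample has a real-valued detector score and is called “yes” when its score is at or above the threshold. The scores and labels in the usage line are hypothetical.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """(FPR, TPR) pairs obtained by sweeping a threshold over the observed scores.

    labels: 1 = positive, 0 = negative. A sample is called "yes" when score >= threshold.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing is called "yes"
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 0)
        points.append((fp / neg, tp / pos))
    return points


print(roc_points([0.9, 0.8, 0.7, 0.6, 0.55, 0.4], [1, 1, 0, 1, 0, 0]))
```

Recounting at every threshold is quadratic in the number of samples; that keeps the sketch close to the definition, while a production version would sort once and accumulate counts.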

ROC terminology

Four kinds of outcomes (two of which are errors):
- TP: “yes” and are right (true positives), a “hit”
- TN: “no” and are right (true negatives), a “correct rejection”
- FP: “yes” and are wrong (false positives), a “false alarm”
- FN: “no” and are wrong (false negatives), a “miss”

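We don’t actually need all four rates, because each rate is normalized by the size of the actual class (TP and FN rates over the actual positives, TN and FP rates over the actual negatives):

$$\text{FN rate} = 1 - \text{TP rate}, \qquad \text{TN rate} = 1 - \text{FP rate}$$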
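
The identities hold for rates, not for raw counts. A small Python check, assuming hypothetical counts for a single detector:

```python
from typing import Dict


def rates(tp: int, fn: int, fp: int, tn: int) -> Dict[str, float]:
    """Rates from raw counts; positives = TP + FN, negatives = FP + TN."""
    tpr = tp / (tp + fn)  # hit rate (sensitivity)
    fnr = fn / (tp + fn)  # miss rate
    fpr = fp / (fp + tn)  # false-alarm rate
    tnr = tn / (fp + tn)  # correct-rejection rate (specificity)
    return {"TPR": tpr, "FNR": fnr, "FPR": fpr, "TNR": tnr}


r = rates(tp=40, fn=10, fp=5, tn=45)  # hypothetical counts
print(r)
```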

False positives, false negatives

ROC curves

There is a trade-off between the true positive rate and the false positive rate: an increase in the true positive rate is accompanied by an increase in the false positive rate.

The area under each curve (AUC) gives a measure of accuracy.
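
The area under a ROC curve can be approximated with the trapezoidal rule. This Python sketch assumes the curve is given as (FPR, TPR) points from (0, 0) to (1, 1); the example curves are hypothetical. A curve through the top left-hand corner gives an area of 1, and the 45° diagonal gives 0.5.

```python
from typing import Sequence, Tuple


def auc_trapezoid(points: Sequence[Tuple[float, float]]) -> float:
    """Area under a ROC curve given as (FPR, TPR) points, via the trapezoidal rule."""
    pts = sorted(points)  # order by FPR (then TPR) along the curve
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


perfect = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0)]  # passes through the top-left corner
chance = [(0.0, 0.0), (1.0, 1.0)]               # the 45-degree diagonal
print(auc_trapezoid(perfect), auc_trapezoid(chance))  # 1.0 0.5
```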

ROC curve

- The closer the curve approaches the top left-hand corner of the plot, the more accurate the classifier.
- The closer the curve is to the 45° diagonal (the performance of random guessing), the worse the classifier.

Where are ROC curves helpful?

Detection-type problems:
- Face detection in images/video data
- Event detection in video data
- Lesion detection in medical images
- Etc.

Precision and recall

Also used mostly for detection-type problems.

In the multiple-class case, precision and recall can be measured for each class.

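$$\text{recall} = \frac{\text{no. of correct detections}}{\text{total number of C1 samples in database}} = \frac{\text{true C1}}{\text{true C1} + \text{missed C1 detections}}$$

$$\text{precision} = \frac{\text{no. of correct detections}}{\text{total number of detections}} = \frac{\text{true C1}}{\text{true C1} + \text{C1 false alarms}}$$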
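
The two formulas can be sketched in Python by comparing the set of items a detector labels as C1 with the ground-truth set of C1 items. This is an illustration under stated assumptions: items are hashable IDs, and returning 0.0 when a denominator is empty is a convention chosen here, not part of the definitions.

```python
from typing import Hashable, Set, Tuple


def precision_recall(detected: Set[Hashable], relevant: Set[Hashable]) -> Tuple[float, float]:
    """Precision and recall of C1 detections against ground truth.

    detected: items the system labeled as C1; relevant: items that truly belong to C1.
    """
    true_c1 = len(detected & relevant)       # correct detections
    false_alarms = len(detected - relevant)  # labeled C1 but not actually C1
    missed = len(relevant - detected)        # C1 items the system failed to detect
    # Convention assumed here: 0.0 when there is nothing to divide by.
    precision = true_c1 / (true_c1 + false_alarms) if detected else 0.0
    recall = true_c1 / (true_c1 + missed) if relevant else 0.0
    return precision, recall


print(precision_recall({1, 2, 3, 7}, {1, 2, 3, 4, 5}))  # (0.75, 0.6)
```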

Trade-off between precision and recall

Example: content-based image retrieval
- Suppose we aim at detecting all sunset images in an image database.
- The image database contains 200 sunset images.
- The classifier retrieves 150 of the 200 relevant images, plus 100 images of no interest to the user.
- Precision = 150/250 = 60%
- Recall = 150/200 = 75%

The system could obtain 100 percent recall if it returned all images in the database, but its precision would be terrible. If we aim at a low false alarm rate, precision would be high but recall would be low.
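
A quick Python check of the sunset arithmetic, plus the “return everything” extreme. The database size of 1000 images is a hypothetical value chosen for illustration; the slide does not give it.

```python
def precision_recall_counts(true_pos: int, false_alarms: int, missed: int):
    """(precision, recall) from detection counts."""
    return true_pos / (true_pos + false_alarms), true_pos / (true_pos + missed)


# Slide example: 150 of the 200 sunsets retrieved, plus 100 irrelevant images.
p, r = precision_recall_counts(150, 100, 50)
print(f"precision = {p:.0%}, recall = {r:.0%}")  # precision = 60%, recall = 75%

# Hypothetical 1000-image database: returning every image finds all 200 sunsets.
p_all, r_all = precision_recall_counts(200, 800, 0)
print(f"precision = {p_all:.0%}, recall = {r_all:.0%}")  # precision = 20%, recall = 100%
```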

Confusion matrix

Used for visualizing/reporting results of a classification system
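
A minimal Python sketch of building a confusion matrix from paired actual/predicted labels. It assumes rows index the actual class and columns the predicted class, which is one common convention; the slides do not fix the orientation, and the labels in the usage line are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def confusion_matrix(actual: Sequence[str], predicted: Sequence[str],
                     classes: Sequence[str]) -> List[List[int]]:
    """Entry [r][c] counts samples of actual class classes[r] predicted as classes[c]."""
    counts = Counter(zip(actual, predicted))  # missing pairs count as 0
    return [[counts[(a, p)] for p in classes] for a in classes]


print(confusion_matrix(["a", "a", "b", "c", "b", "a"],
                       ["a", "b", "b", "c", "b", "a"],
                       ["a", "b", "c"]))  # [[2, 1, 0], [0, 2, 0], [0, 0, 1]]
```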

The binary confusion matrix

We can construct a binary confusion matrix for one class (that class versus all the others).
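
One way to obtain that binary matrix is to collapse a multi-class matrix into “class k” versus “everything else”. The Python sketch assumes rows are actual classes and columns are predicted classes; the function name and the matrix in the usage line are hypothetical.

```python
from typing import Sequence, Tuple


def binary_confusion(matrix: Sequence[Sequence[int]], k: int) -> Tuple[Tuple[int, int], Tuple[int, int]]:
    """Collapse a multi-class matrix (rows actual, cols predicted) to class k vs. rest.

    Returns ((TP, FN), (FP, TN)).
    """
    n = len(matrix)
    tp = matrix[k][k]
    fn = sum(matrix[k][j] for j in range(n) if j != k)  # actual k, predicted other
    fp = sum(matrix[i][k] for i in range(n) if i != k)  # actual other, predicted k
    total = sum(sum(row) for row in matrix)
    tn = total - tp - fn - fp
    return (tp, fn), (fp, tn)


print(binary_confusion([[45, 3, 2], [5, 40, 5], [10, 0, 40]], 0))  # ((45, 5), (15, 85))
```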

Calculating the precision and recall from the confusion matrix

Example. Consider the following confusion matrix, produced by an OCR system over a test document set.

Calculate the precision and recall for class a.
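
The method can be sketched in Python: precision for a class is its diagonal entry divided by its column sum (everything the system output as that class), and recall is the diagonal entry divided by its row sum (every true instance). The OCR counts below are hypothetical, since the slide's matrix is not reproduced here, and rows are assumed to be actual characters.

```python
from typing import Sequence, Tuple


def class_precision_recall(matrix: Sequence[Sequence[int]], k: int) -> Tuple[float, float]:
    """Precision and recall of class k; assumes rows = actual, columns = recognized."""
    correct = matrix[k][k]
    recognized_as_k = sum(row[k] for row in matrix)  # column sum
    actually_k = sum(matrix[k])                      # row sum
    return correct / recognized_as_k, correct / actually_k


# Hypothetical OCR counts for characters a, b, c (not the slide's matrix).
OCR = [
    [45, 3, 2],   # actual a
    [5, 40, 5],   # actual b
    [10, 0, 40],  # actual c
]
precision_a, recall_a = class_precision_recall(OCR, 0)
print(f"precision(a) = {precision_a:.2f}, recall(a) = {recall_a:.2f}")  # precision(a) = 0.75, recall(a) = 0.90
```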