
1

Pattern recognition (4)

2

Things we have discussed until now

Statistical pattern recognition: building simple classifiers

Supervised classification: minimum distance classifier, Bayesian classifier (1D and multi-dimensional), building discriminant functions

Unsupervised classification: the K-means algorithm

3

Equivalence between classifiers

Pattern recognition using multivariate normal class-conditional densities, equal priors, and a common covariance matrix is simply a minimum Mahalanobis distance classifier.

4

Today

Classifier design: errors and risk in the classification process

Performance evaluation of classification systems

Reading: slides and blackboard derivations only

5

Error

How often will we be wrong? In the two-class case:
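With decision regions $R_1$ and $R_2$, a standard formulation of the error probability is:

$$P(\text{error}) = P(x \in R_2, \omega_1) + P(x \in R_1, \omega_2) = \int_{R_2} p(x \mid \omega_1)\,P(\omega_1)\,dx + \int_{R_1} p(x \mid \omega_2)\,P(\omega_2)\,dx$$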

6

Global error

Suppose that, using our training samples, we have partitioned the feature space into regions R1, …, RN.

Region Ri corresponds to class ωi if every sample falling in this region of the feature space is classified as ωi.

We can then compute the overall error of the classification process by integrating the class-specific errors over their corresponding regions.
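In symbols, summing each class's error over the regions where it is misclassified:

$$P(\text{error}) = \sum_{i=1}^{N} \sum_{j \neq i} \int_{R_j} p(x \mid \omega_i)\,P(\omega_i)\,dx$$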

7

Risk

Not every mistake has the same cost.
Classification strategy: minimize the expected loss (the risk) instead of the probability of error.

8

Example

Computer-assisted diagnosis of suspicious lesions in a CT scan: Two ways to go wrong

Alpha risk: labeling a lesion as malignant when in fact it is benign (consequence: unnecessary biopsy).

Beta risk: labeling a lesion as benign when in fact it is malignant (consequence: progression of cancer).

9

Another example

Quality control for manufactured parts: accept or reject a part. Two ways to go wrong:

Alpha risk: accepting a bad part (consequence: loses customers).

Beta risk: rejecting a good part (consequence: wastes money).

10

Loss tables

Suppose that the cost of classifying into class ωj when the actual class is ωi is Lij

We can summarize this in a loss table
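For two classes the table takes the standard form (row i = actual class ωi, column j = assigned class ωj):

$$L = \begin{pmatrix} L_{11} & L_{12} \\ L_{21} & L_{22} \end{pmatrix}$$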

11

Example (cont’d)

Let ω1 be the class for good parts and ω2 be the class for bad parts
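As an illustration, assume hypothetical costs (these values are assumed, not taken from the slide): correct decisions cost nothing, rejecting a good part wastes money (cost 1), and accepting a bad part loses customers (cost 5):

$$L = \begin{pmatrix} 0 & 1 \\ 5 & 0 \end{pmatrix}$$

Here row/column 1 corresponds to ω1 (good) and row/column 2 to ω2 (bad).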

12

Risk for a specific pattern X

Loss tables are useful for computing the risk of making a specific choice αi (assigning class ωi) given pattern X:

$$R(\alpha_i \mid X) = \sum_{j} L_{ji}\,P(\omega_j \mid X)$$

We compute this risk by adding up the cost of each possible true class, weighted by its posterior probability.

13

Computing risks for our example

Suppose that for our earlier example we have computed the posterior probabilities

$$P(\omega_1 \mid X) = 0.8, \qquad P(\omega_2 \mid X) = 0.2$$

We will compute the risks of classifying X into ω1 and into ω2 respectively.
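Using the hypothetical loss table above (L11 = L22 = 0, L12 = 1, L21 = 5), the two risks work out to:

$$R(\alpha_1 \mid X) = L_{11}\,P(\omega_1 \mid X) + L_{21}\,P(\omega_2 \mid X) = 0 \cdot 0.8 + 5 \cdot 0.2 = 1.0$$

$$R(\alpha_2 \mid X) = L_{12}\,P(\omega_1 \mid X) + L_{22}\,P(\omega_2 \mid X) = 1 \cdot 0.8 + 0 \cdot 0.2 = 0.8$$

The minimum-risk decision is α2 (reject the part) even though ω1 has the higher posterior probability.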

14

Bayesian classifiers revisited

Instead of maximizing the posterior probability P(ωj | X), minimize the risk R(αj | X).

Given N classes, we compute N risk values rj = R(αj | X).
We assign X to the class corresponding to the minimum risk (see the sketch below).
Derivation…
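A minimal sketch of this decision rule in Python (NumPy), assuming the loss-table convention above (loss[i, j] = L_ij, the cost of assigning class j when the true class is i):

```python
import numpy as np

def min_risk_class(posteriors, loss):
    """Minimum-risk Bayesian decision.

    posteriors : shape (N,), P(omega_j | X) for each of the N classes
    loss       : shape (N, N), loss[i, j] = cost of assigning class j
                 when the true class is i
    Returns the index of the class with minimum conditional risk.
    """
    # r_j = R(alpha_j | X) = sum_i L_ij * P(omega_i | X)
    risks = loss.T @ posteriors
    return int(np.argmin(risks))

# The quality-control example with the hypothetical loss table above:
print(min_risk_class(np.array([0.8, 0.2]),
                     np.array([[0.0, 1.0],
                               [5.0, 0.0]])))   # -> 1, i.e. omega_2
```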

15

The 0-1 Loss rule

Under this rule, the Bayesian classifier maximizes the posterior probability (as we have learned in the previous lecture) and can be expressed as a minimum distance classifier.

The most common assumption in Computer Vision classifiers!
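Concretely, the 0-1 rule sets $L_{ij} = 1 - \delta_{ij}$ (correct decisions cost nothing, every error costs 1), so the conditional risk reduces to

$$R(\alpha_i \mid X) = \sum_{j \neq i} P(\omega_j \mid X) = 1 - P(\omega_i \mid X)$$

and minimizing the risk is equivalent to maximizing the posterior probability.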

16

Performance evaluation paradigms

Against ground truth (manually generated segmentation/classification)

The preferred method in medical image segmentation

Benchmarking: for mature/maturing subfields in computer vision

Example 1: “The gait identification challenge problem: datasets and baseline algorithm”, International Conference on Pattern Recognition, 2002.

Example 2: “Benchmark Studies on Face Recognition”, International Workshop on Automatic Face- and Gesture-Recognition, 1995.

17

Evaluation of classifiers

ROC analysis
Precision and recall
Confusion matrices

18

ROC analysis

ROC stands for receiver operating characteristic; ROC analysis was initially used to analyze and compare the performance of human radar operators.

A ROC curve is a plot of the true positive rate against the false positive rate as some parameter (e.g., a decision threshold) is varied.

From around 1970, ROC curves were used in medical studies; they are useful in bringing out the sensitivity (true positive rate) versus the specificity (true negative rate, i.e., 1 − false positive rate) of diagnostic trials.

Computer vision performs ROC analysis on algorithms; we can also compare different algorithms designed for the same task. A sketch of the computation follows below.
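A minimal sketch for a score-thresholding detector, assuming NumPy arrays of per-sample scores and binary labels (the names here are hypothetical):

```python
import numpy as np

def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs as the
    decision threshold sweeps over every observed score value."""
    pts = []
    for t in np.unique(scores):
        pred = scores >= t                # detector says "yes" at/above t
        tpr = np.mean(pred[labels == 1])  # hit rate on the true positives
        fpr = np.mean(pred[labels == 0])  # false alarm rate on the negatives
        pts.append((fpr, tpr))
    return sorted(pts)
```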

19

ROC terminology

Four kinds of outcomes:
TP: we say “yes” and are right (true positive, a “hit”)
TN: we say “no” and are right (true negative, a “correct rejection”)
FP: we say “yes” and are wrong (false positive, a “false alarm”)
FN: we say “no” and are wrong (false negative, a “miss”)

We do not actually need all four rates, because

FN rate = 1 − TP rate
TN rate = 1 − FP rate
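In terms of raw counts, the two independent rates are

$$\text{TP rate} = \frac{TP}{TP + FN}, \qquad \text{FP rate} = \frac{FP}{FP + TN}$$

and the other two follow as the complements above.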

20

False positives, false negatives

21

ROC curves

There is a trade-off between the true positive rate and the false positive rate: an increase in the true positive rate is accompanied by an increase in the false positive rate.

The area under each curve (AUC) gives a measure of accuracy.
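A sketch of the area computation with the trapezoidal rule, using (fpr, tpr) pairs such as those from the roc_points sketch earlier (the curve samples here are hypothetical):

```python
import numpy as np

# Hypothetical ROC curve samples, sorted by increasing false positive rate
fpr = np.array([0.0, 0.1, 0.3, 0.6, 1.0])
tpr = np.array([0.0, 0.5, 0.8, 0.95, 1.0])

# Trapezoidal rule: sum of trapezoid areas between consecutive points
auc = np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2)
print(f"AUC = {auc:.3f}")   # 1.0 = perfect detector, 0.5 = chance
```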

22

ROC curve

- the closer the curve approaches the top left-hand corner of the plot, the more accurate the classifier;
- the closer the curve is to the 45° diagonal, the worse the classifier.

23

Where are ROC curves helpful?

Detection-type problems:
Face detection in images/video data
Event detection in video data
Lesion detection in medical images
Etc.

24

Precision and recall

Also used mostly for detection-type problems.
In a multiple-class case, they can be measured for each class.

$$\text{recall} = \frac{\text{no. of correct detections}}{\text{total number of } C_1 \text{ samples in the database}} = \frac{\text{true } C_1}{\text{true } C_1 + \text{missed detections}}$$

$$\text{precision} = \frac{\text{no. of correct detections}}{\text{total number of detections}} = \frac{\text{true } C_1}{\text{true } C_1 + \text{false alarms}}$$

25

Trade-off between precision and recall

Example: content-based image retrieval.
Suppose we aim to detect all sunset images in an image database.
The database contains 200 sunset images.
The classifier retrieves 150 of the 200 relevant images, plus 100 images of no interest to the user.
Precision = 150/250 = 60%
Recall = 150/200 = 75%

The system could obtain 100 percent recall if it returned all images in the database, but its precision would be terrible.
If we aim at a low false-alarm rate, precision will be high but recall will be low.
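A quick check of these numbers in Python, using the counts from the slide:

```python
relevant_total = 200   # sunset images in the database
hits = 150             # relevant images actually retrieved
false_alarms = 100     # retrieved images of no interest to the user

precision = hits / (hits + false_alarms)   # 150/250 = 0.60
recall = hits / relevant_total             # 150/200 = 0.75
```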

26

Confusion matrix

Used for visualizing/reporting results of a classification system

27

The binary confusion matrix

We can construct a binary confusion matrix for one class
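For a single class, the standard 2×2 layout is (rows = actual, columns = predicted):

              predicted “yes”   predicted “no”
actual “yes”        TP                FN
actual “no”         FP                TN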

28

Calculating the precision and recall from the confusion matrix

Example: consider the confusion matrix of an OCR system that produces the following output over a test document set.

Calculate the precision and recall for class a.
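As a sketch, assume a hypothetical three-class confusion matrix (these counts are illustrative, not the slide's); precision for a class comes from its column and recall from its row:

```python
import numpy as np

# Hypothetical OCR confusion matrix (illustrative values only):
# rows = actual class, columns = class assigned by the OCR.
classes = ["a", "b", "c"]
conf = np.array([[90,  5,  5],   # actual 'a'
                 [ 8, 85,  7],   # actual 'b'
                 [ 2, 10, 88]])  # actual 'c'

i = classes.index("a")
precision = conf[i, i] / conf[:, i].sum()  # correct 'a' / everything labelled 'a'
recall    = conf[i, i] / conf[i, :].sum()  # correct 'a' / everything actually 'a'
print(f"class a: precision = {precision:.2f}, recall = {recall:.2f}")
```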