Pattern recognition (4) - Electrical and Computer …aalbu/computer vision 2009/Lecture...
1
Pattern recognition (4)
2
Things we have discussed until now
Statistical pattern recognition: building simple classifiers
Supervised classification
- Minimum distance classifier
- Bayesian classifier (1D and multiple dimensions)
- Building discriminant functions
Unsupervised classification
- K-means algorithm
3
Equivalence between classifiers
Pattern recognition using multivariate normal distributions with equal priors (and a common covariance matrix) is simply a minimum Mahalanobis distance classifier.
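A minimal sketch of this classifier, assuming Gaussian classes sharing one pooled covariance matrix and equal priors; the function names `fit` and `predict` are illustrative, not from the lecture:

```python
import numpy as np

def fit(X, y):
    """Estimate per-class means and the inverse of a pooled covariance."""
    classes = np.unique(y)
    means = {c: X[y == c].mean(axis=0) for c in classes}
    # Pool the class-centered samples to estimate one shared covariance
    centered = np.vstack([X[y == c] - means[c] for c in classes])
    cov = np.cov(centered, rowvar=False)
    return means, np.linalg.inv(cov)

def predict(x, means, cov_inv):
    """Assign x to the class with the smallest Mahalanobis distance."""
    def d2(c):
        diff = x - means[c]
        return diff @ cov_inv @ diff  # squared Mahalanobis distance
    return min(means, key=d2)
```

With a shared covariance and equal priors, choosing the smallest Mahalanobis distance yields the same decision as the Bayesian classifier of the previous lecture.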
4
Today
Classifier design: errors and risk in the classification process
Performance evaluation of classification systems
Reading: slides and blackboard derivations only
5
Error
How often will we be wrong? In the two-class case:
6
Global error
Suppose that, using our training samples, we have partitioned the feature space into regions.
Region Ri corresponds to class ωi if all samples belonging to this region of the feature space are classified into class ωi.
We can then compute the overall error of the classification process by integrating the class-specific errors over their corresponding regions.
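For the two-class case, this overall error takes a standard form (written here with the class-conditional densities p(x/ωi), priors P(ωi), and Ri the region assigned to ωi, following the slide's notation):

```latex
P(\text{error}) = \int_{R_2} p(x/\omega_1)\,P(\omega_1)\,dx \;+\; \int_{R_1} p(x/\omega_2)\,P(\omega_2)\,dx
```

Each integral is the probability that a sample drawn from one class falls into the region assigned to the other class.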
7
Risk
Not every mistake has the same cost.
Classification strategy: minimize the probability of loss (risk) instead of the probability of error.
8
Example
Computer-assisted diagnosis of suspicious lesions in a CT scan: two ways to go wrong
Alpha risk: label a lesion as malignant when in fact it is benign (consequence: an unnecessary biopsy).
Beta risk: label a lesion as benign when in fact it is malignant (consequence: progression of the cancer).
9
Another example
Quality control for manufactured parts: accept or reject a part. Two ways to go wrong
Alpha risk: accept a bad part (loses customers).
Beta risk: reject a good part (wastes money).
10
Loss tables
Suppose that the cost of classifying into class ωj when the actual class is ωi is Lij
We can summarize this in a loss table
11
Example (cont’d)
Let ω1 be the class for good parts and ω2 be the class for bad parts
12
Risk for a specific pattern X
Loss tables are useful for computing the risk of making a specific decision αi (i.e., classifying into class ωi) given pattern X.
We compute this risk by adding up the losses for each possible true class, weighted by its posterior probability:
R(αi/X) = Σj Lji P(ωj/X)
13
Computing risks for our example
Suppose that for our earlier example we have computed the posterior probabilities
We will compute the risks for classifying X in ω1 and in ω2 respectively.
P(ω1/X) = 0.8; P(ω2/X) = 0.2
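The computation can be sketched directly from the risk formula. The posteriors come from the slide; the loss-table values below are purely illustrative assumptions (the actual table is on the slide, not in this transcript):

```python
# loss[i][j] = L_ij = cost of classifying into class j when the actual class is i
loss = [
    [0.0, 1.0],   # true class w1 (good part): wrongly rejecting costs 1  (assumed)
    [10.0, 0.0],  # true class w2 (bad part): wrongly accepting costs 10 (assumed)
]
posterior = [0.8, 0.2]  # P(w1/X), P(w2/X), from the slide

# R(alpha_i / X) = sum_j L_ji * P(w_j / X)
risks = [sum(loss[j][i] * posterior[j] for j in range(2)) for i in range(2)]
decision = min(range(2), key=lambda i: risks[i])  # minimum-risk class index
```

Note that even though P(ω1/X) = 0.8 favors ω1, under these assumed losses the large cost of accepting a bad part pushes the minimum-risk decision to ω2 (risks of 2.0 versus 0.8); with the slide's actual loss table the numbers will differ.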
14
Bayesian classifiers revisited
Instead of maximizing posterior probability P(ωj/X), minimize risk R(αj/X)
Given N classes, we compute N risk values rj = R(αj/X).
We assign X to the class corresponding to the minimum risk.
Derivation..
15
The 0-1 Loss rule
Under the 0-1 loss rule (Lij = 0 when i = j and 1 otherwise), minimizing the risk is equivalent to maximizing the posterior probability (as we learned in the previous lecture), and the Bayesian classifier can again be expressed as a minimum distance classifier.
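The equivalence follows in one line from the risk formula: with zero cost for a correct decision and unit cost for any error,

```latex
R(\alpha_i/X) = \sum_j L_{ji}\,P(\omega_j/X) = \sum_{j \ne i} P(\omega_j/X) = 1 - P(\omega_i/X),
```

so minimizing the risk over i is the same as maximizing the posterior P(ωi/X).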
The most common assumption in Computer Vision classifiers!
16
Paradigms for evaluating classification performance
Against ground truth (manually generated segmentation/classification)
The method of preference in medical image segmentation
Benchmarking: for mature/maturing subfields in computer vision
Example 1: “The gait identification challenge problem: datasets and baseline algorithm”, in International Conference on Pattern Recognition, 2002.
Example 2: “Benchmark Studies on Face Recognition”, in International Workshop on Automatic Face- and Gesture-Recognition, 1995.
17
Evaluation of classifiers
ROC analysis
Precision and recall
Confusion matrices
18
ROC analysis
ROC stands for receiver operating characteristic; ROC analysis was initially used to analyze and compare the performance of human radar operators.
A ROC curve is a plot of the false positive rate against the true positive rate as some parameter (typically a decision threshold) is varied.
From 1970 on, ROC curves were used in medical studies; they are useful in bringing out the sensitivity (true positive rate) versus the specificity (true negative rate, i.e., one minus the false positive rate) of diagnostic trials.
Computer vision performs ROC analysis for algorithms; we can also compare different algorithms that are designed for the same task.
19
ROC terminology
Four kinds of outcomes:
TP: we say “yes” and are right (true positives, “hit”)
TN: we say “no” and are right (true negatives, “correct rejection”)
FP: we say “yes” and are wrong (false positives, “false alarm”)
FN: we say “no” and are wrong (false negatives, “miss”)
We do not actually need all four rates, because (as rates)
FN = 1 − TP
TN = 1 − FP
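The four counts and the two rates can be sketched for a binary detector, given ground-truth and predicted labels (function name is illustrative):

```python
def roc_rates(truth, pred):
    """Return (true positive rate, false positive rate) for binary labels."""
    tp = sum(t and p for t, p in zip(truth, pred))          # hits
    tn = sum(not t and not p for t, p in zip(truth, pred))  # correct rejections
    fp = sum(not t and p for t, p in zip(truth, pred))      # false alarms
    fn = sum(t and not p for t, p in zip(truth, pred))      # misses
    tpr = tp / (tp + fn)  # sensitivity
    fpr = fp / (fp + tn)  # 1 - specificity
    return tpr, fpr
```

The false negative rate is then 1 − TPR and the true negative rate is 1 − FPR, which is why two rates suffice.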
20
False positives, false negatives
21
ROC curves
There is a trade-off between the true positive rate and the false positive rate: an increase in the true positive rate is accompanied by an increase in the false positive rate.
the area under each curve gives a measure of accuracy
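A sketch of how such a curve and its area are traced by sweeping a decision threshold over classifier scores (higher score meaning "more likely positive"); this assumes distinct scores, since tied scores would need to be grouped into one step:

```python
def roc_curve(scores, labels):
    """Return (fpr, tpr) points as the threshold sweeps over the scores."""
    pos = sum(labels)
    neg = len(labels) - pos
    # Sort by decreasing score; each step accepts one more sample as positive
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    points, tp, fp = [(0.0, 0.0)], 0, 0
    for i in order:
        if labels[i]:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))
```

A perfect classifier (all positives scored above all negatives) gives an area of 1; a random one hugs the diagonal with an area near 0.5.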
22
ROC curve
- the closer the curve approaches the top left-hand corner of the plot, the more accurate the classifier;
- the closer the curve is to the 45° diagonal (which corresponds to random guessing), the worse the classifier.
23
Where are ROC curves helpful?
Detection-type problems:
- Face detection in images/video data
- Event detection in video data
- Lesion detection in medical images
- Etc.
24
Precision and recall
Also used mostly for detection-type problems.
In a multi-class case, precision and recall can be measured for each class.
recall = no. of correct detections / total number of C1 samples in the database = true C1 / (true C1 + missed detections)
precision = no. of correct detections / total number of detections = true C1 / (true C1 + false alarms)
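The two formulas can be sketched directly in code (function and argument names are illustrative):

```python
def precision_recall(true_c1, false_alarms, missed):
    """true_c1: correct detections; false_alarms: spurious detections;
    missed: C1 samples the detector failed to find."""
    precision = true_c1 / (true_c1 + false_alarms)
    recall = true_c1 / (true_c1 + missed)
    return precision, recall
```

For the sunset-retrieval example on the next slide, `precision_recall(150, 100, 50)` gives a precision of 60% and a recall of 75%.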
25
Trade-off between precision and recall
Example: content-based image retrieval. Suppose we aim at detecting all sunset images in an image database. The database contains 200 sunset images; the classifier retrieves 150 of the 200 relevant images, plus 100 images of no interest to the user.
Precision = 150/250 = 60%
Recall = 150/200 = 75%
The system could obtain 100 percent recall if it returned all images in the database, but its precision would be terrible. If we aim at a low false alarm rate, precision will be high but recall will be low.
26
Confusion matrix
Used for visualizing/reporting results of a classification system
27
The binary confusion matrix
We can construct a binary confusion matrix for one class
28
Calculating the precision and recall from the confusion matrix
Example: consider the confusion matrix of an OCR system that produces the following output over a test document set (the matrix is shown on the slide).
Calculate the precision and recall for class a.
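The per-class computation can be sketched as follows. The slide's actual OCR matrix is not reproduced in this transcript, so the numbers below are purely illustrative; rows are taken as the true class and columns as the predicted class, a common convention:

```python
confusion = {
    #           predicted: a   b   c   (illustrative counts only)
    "a": {"a": 50, "b": 3, "c": 2},
    "b": {"a": 4, "b": 40, "c": 6},
    "c": {"a": 1, "b": 5, "c": 44},
}

def class_precision_recall(conf, cls):
    tp = conf[cls][cls]
    # recall: TP over everything that truly was `cls` (row sum)
    recall = tp / sum(conf[cls].values())
    # precision: TP over everything predicted as `cls` (column sum)
    predicted = sum(row[cls] for row in conf.values())
    precision = tp / predicted
    return precision, recall
```

For class a here, both the row and column sums happen to be 55, so precision = recall = 50/55 ≈ 91%; with the slide's matrix, substitute its counts.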