CSE190a, Winter 2010
Biometrics, Lecture 3: Pattern Classification
Biometrics: A Pattern Recognition System
• False accept rate (FAR): proportion of imposters accepted
• False reject rate (FRR): proportion of genuine users rejected
• Failure to enroll rate (FTE): proportion of the population that cannot be enrolled
• Failure to acquire rate (FTA): proportion of the population that cannot be verified
[Diagram: enrollment and authentication pipelines of a biometric system; authentication outputs a Yes/No decision]
Error Rates
• False match (false accept): mistaking biometric measurements from two different persons to be from the same person
• False non-match (false reject): mistaking two biometric measurements from the same person to be from two different persons
(c) Jain 2004
Error vs Threshold
(c) Jain 2004
FAR: False accept rate FRR: False reject rate
ROC Curve
Accuracy requirements of a biometric system are application dependent (c) Jain 2004
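The error-vs-threshold and ROC figures above can be sketched numerically. A minimal illustration, not from the slides: the scores, the threshold grid, and the accept-if-score-at-least-threshold convention are all assumptions made for the example.

```python
# Sketch: FAR and FRR at a sweep of thresholds, as in the error-vs-threshold
# and ROC plots. Score values and thresholds below are invented placeholders.

def far_frr(genuine, imposter, threshold):
    """Accept when score >= threshold.
    FAR = fraction of imposter scores accepted.
    FRR = fraction of genuine scores rejected."""
    far = sum(s >= threshold for s in imposter) / len(imposter)
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr

genuine = [0.9, 0.8, 0.75, 0.6, 0.4]   # scores from true matches (placeholder)
imposter = [0.5, 0.3, 0.2, 0.1]        # scores from impostor attempts (placeholder)

# One ROC point per threshold: raising the threshold lowers FAR and raises FRR.
roc = [far_frr(genuine, imposter, t) for t in (0.2, 0.5, 0.7)]
```

Sweeping the threshold traces out the ROC curve, which is why the operating point is application dependent: a high-security application picks a threshold with low FAR, a convenience application one with low FRR.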
Pattern Classification
Pattern Classification, Chapter 1
An Example
• “Sorting incoming Fish on a conveyor according to species using optical sensing”
[Figure: fish on a conveyor, sorted by species: sea bass vs. salmon]
• Adopt two features: the lightness and the width of the fish
Fish feature vector: xT = [x1, x2], where x1 = lightness and x2 = width
• However, our satisfaction is premature because the central aim of designing a classifier is to correctly classify novel input
Issue of generalization!
Bayesian Decision Theory: Continuous Features
(Sections 2.1-2.2)
Introduction
• The sea bass/salmon example
• State of nature, prior
• State of nature is a random variable
• The catch of salmon and sea bass is equiprobable
• P(ω1), P(ω2): prior probabilities
• P(ω1) = P(ω2) (uniform priors)
• P(ω1) + P(ω2) = 1 (exclusivity and exhaustivity)
• Decision rule with only the prior information:
Decide ω1 if P(ω1) > P(ω2); otherwise decide ω2
• Use of the class–conditional information
• P(x | ω1) and P(x | ω2) describe the difference in lightness between the populations of sea bass and salmon
• Posterior, likelihood, evidence
• P(ωj | x) = (P(x | ωj) * P (ωj)) / P(x) (BAYES RULE)
• In words, this can be said as: Posterior = (Likelihood * Prior) / Evidence
• Where, in the case of two categories, the evidence is P(x) = P(x | ω1) P(ω1) + P(x | ω2) P(ω2)
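Bayes' rule above can be sketched directly. A minimal illustration: the likelihood values are invented placeholders; in the fish example they would be P(x | ω1) and P(x | ω2) evaluated at the observed lightness x.

```python
# Sketch of Bayes' rule for the two-category case:
# posterior = likelihood * prior / evidence.

def posteriors(likelihoods, priors):
    """Return P(omega_j | x) for each class via Bayes' rule.
    Evidence P(x) = sum_j P(x | omega_j) P(omega_j)."""
    evidence = sum(l * p for l, p in zip(likelihoods, priors))
    return [l * p / evidence for l, p in zip(likelihoods, priors)]

# Placeholder class-conditional values at some observed x, uniform priors.
post = posteriors(likelihoods=[0.6, 0.2], priors=[0.5, 0.5])
```

Note that the evidence P(x) is only a normalizing factor: it does not affect which posterior is larger, so it plays no role in the decision itself.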
• Intuitive decision rule given the posterior probabilities. Given x:
if P(ω1 | x) > P(ω2 | x), decide true state of nature = ω1
if P(ω1 | x) < P(ω2 | x), decide true state of nature = ω2
Why do this? Whenever we observe a particular x, the probability of error is:
P(error | x) = P(ω1 | x) if we decide ω2
P(error | x) = P(ω2 | x) if we decide ω1
• Since the decision rule is optimal for each feature value x, there is no better rule over all x.
Bayesian Decision Theory – Continuous Features
Generalization of the preceding ideas
• Use of more than one feature
• Use more than two states of nature
• Allowing actions and not only decide on the state of nature
• Introducing a loss function (more general than the probability of error)
• Allowing actions other than classification primarily allows the possibility of rejection
• Refusing to make a decision in close or bad cases!
• Letting loss function state how costly each action taken is
• Let x be a vector of features.
• Let {ω1, ω2,…, ωc} be the set of c states of nature (or “classes”)
• Let {α1, α2,…, αa} be the set of possible actions
• Let λ(αi | ωj) be the loss for action αi when the state of nature is ωj
What is the expected loss for action αi?
For any given x, the conditional risk R(αi | x) (or expected loss) is
R(αi | x) = Σj λ(αi | ωj) P(ωj | x), summed over j = 1,…,c
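The conditional-risk sum above is a direct dot product of a loss-matrix row with the posterior vector. A minimal sketch: the loss matrix and posterior values are placeholders, not from the slides.

```python
# Sketch: conditional risk R(alpha_i | x) = sum_j lambda(alpha_i | omega_j) P(omega_j | x).

def conditional_risk(loss_row, posteriors):
    """Expected loss of one action, given the posteriors at x."""
    return sum(l * p for l, p in zip(loss_row, posteriors))

# Loss matrix: row i = action alpha_i, column j = true class omega_j (placeholders).
loss = [[0.0, 2.0],   # alpha_1: free if omega_1 is true, costly if omega_2
        [1.0, 0.0]]   # alpha_2: the mirror case
post = [0.7, 0.3]     # P(omega_1 | x), P(omega_2 | x) -- placeholder values

risks = [conditional_risk(row, post) for row in loss]
best_action = min(range(len(risks)), key=lambda i: risks[i]) + 1  # 1-indexed
```

Choosing the action with the smallest conditional risk at every x is exactly the Bayes decision rule of the next slides.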
Overall risk: R = ∫ R(α(x) | x) p(x) dx, the conditional risk of the chosen action α(x) averaged over all x
Minimizing R ⇔ minimizing the conditional risk R(αi | x) for every x, choosing among i = 1,…,a
Given a measured feature vector x, which action should we take?
Select the action αi for which R(αi | x) is minimum
The resulting overall risk R is then minimal; this minimum is called the Bayes risk: the best performance that can be achieved!
Two-Category Classification
α1 : deciding ω1
α2 : deciding ω2
λij = λ(αi | ωj)
loss incurred for deciding ωi when the true state of nature is ωj
Conditional risk:
R(α1 | x) = λ11 P(ω1 | x) + λ12 P(ω2 | x)
R(α2 | x) = λ21 P(ω1 | x) + λ22 P(ω2 | x)
Our rule is the following: if R(α1 | x) < R(α2 | x), i.e.
λ11 P(ω1 | x) + λ12 P(ω2 | x) < λ21 P(ω1 | x) + λ22 P(ω2 | x),
then action α1: “decide ω1” is taken
This results in the equivalent rule : decide ω1 if:
(λ21- λ11) P(x | ω1) P(ω1) > (λ12- λ22) P(x | ω2) P(ω2)
and decide ω2 otherwise
Two-Category Decision Theory: Chopping Machine
α1 = chop
α2 = DO NOT chop
ω1 = NO hand in machine
ω2 = hand in machine
λ11 = λ(α1 | ω1) = $0.00
λ12 = λ(α1 | ω2) = $100.00
λ21 = λ(α2 | ω1) = $0.01
λ22 = λ(α2 | ω2) = $0.01
Therefore our rule becomes
(λ21- λ11) P(x | ω1) P(ω1) > (λ12- λ22) P(x | ω2) P(ω2)
0.01 P(x | ω1) P(ω1) > 99.99 P(x | ω2) P(ω2)
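The chopping rule can be evaluated numerically. The loss differences (0.01 and 99.99) come from the λ values on this slide; the class-conditional probabilities and the uniform priors in the sketch are invented placeholders.

```python
# Sketch of the chopping-machine decision: chop (alpha_1) iff
# (l21 - l11) P(x|w1) P(w1) > (l12 - l22) P(x|w2) P(w2),
# i.e. 0.01 * P(x|w1) P(w1) > 99.99 * P(x|w2) P(w2).

def decide_chop(p_x_no_hand, p_x_hand, prior_no_hand=0.5, prior_hand=0.5):
    """Apply the two-category minimum-risk rule with the slide's losses.
    Inputs are placeholder class-conditional probabilities and priors."""
    lhs = 0.01 * p_x_no_hand * prior_no_hand    # cost advantage of chopping
    rhs = 99.99 * p_x_hand * prior_hand         # cost risk of chopping a hand
    return "chop" if lhs > rhs else "do not chop"
```

Even a tiny probability of a hand being present vetoes chopping, because the loss of chopping a hand is 10,000 times the loss of waiting one cycle.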
Exercise to do at home!!
Select the optimal decision where: Ω = {ω1, ω2}
P(x | ω1) ~ N(2, 0.5) (normal distribution)
P(x | ω2) ~ N(1.5, 0.2)
P(ω1) = 2/3
P(ω2) = 1/3
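One way to check your answer numerically (this does not derive the closed-form boundary, so it does not spoil the exercise): evaluate P(x | ωi) P(ωi) for both classes and decide by the larger product. N(2, 0.5) is read here with the second argument as the standard deviation; if the slide meant variance, substitute its square root.

```python
# Sketch: numeric check of the home exercise, assuming N(mean, std).
from math import exp, pi, sqrt

def normal_pdf(x, mean, std):
    """Density of a univariate normal distribution."""
    return exp(-0.5 * ((x - mean) / std) ** 2) / (std * sqrt(2 * pi))

def decide(x):
    """Return 1 or 2: the class with the larger P(x | w_i) P(w_i)."""
    g1 = normal_pdf(x, 2.0, 0.5) * (2 / 3)   # P(x | w1) P(w1)
    g2 = normal_pdf(x, 1.5, 0.2) * (1 / 3)   # P(x | w2) P(w2)
    return 1 if g1 > g2 else 2
```

Evaluating `decide` on a grid of x values reveals where the decision switches between the two classes.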
Minimum-Error-Rate Classification revisited
• Actions are decisions on classes. If action αi is taken and the true state of nature is ωj, then the decision is correct if i = j and in error if i ≠ j
• Seek a decision rule that minimizes the probability of error, which is the error rate
• Introduction of the zero-one loss function:
λ(αi | ωj) = 0 if i = j; 1 if i ≠ j (i, j = 1,…,c)
Therefore, the conditional risk for each action is:
R(αi | x) = Σ(j ≠ i) P(ωj | x) = 1 − P(ωi | x)
“The risk corresponding to this loss function is the average (or expected) probability of error,” since the overall risk averages 1 − P(ωi | x), the probability of error, over x.
• Minimizing the risk requires maximizing P(ωi | x) (since R(αi | x) = 1 − P(ωi | x))
• For Minimum error rate
• Decide ωi if P (ωi | x) > P(ωj | x) ∀j ≠ i
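Under zero-one loss the Bayes rule therefore reduces to picking the class with the largest posterior. A minimal sketch; the posterior values in the usage line are placeholders.

```python
# Sketch: minimum-error-rate decision = argmax of the posteriors.

def decide_min_error(posteriors):
    """Decide the omega_i with maximum P(omega_i | x); returns i (1-indexed)."""
    return max(range(len(posteriors)), key=lambda j: posteriors[j]) + 1

choice = decide_min_error([0.2, 0.5, 0.3])   # placeholder posteriors
```

This is the multi-class version of the two-category rule on the earlier slides: compare posteriors, ignore the evidence term.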
Likelihood ratio: the preceding rule is equivalent to the following:
If P(x | ω1) / P(x | ω2) > ((λ12 − λ22) P(ω2)) / ((λ21 − λ11) P(ω1)),
then take action α1 (decide ω1); otherwise take action α2 (decide ω2)
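The likelihood-ratio form folds the losses and priors into a single fixed threshold, so only the ratio of class-conditional densities has to be compared against it. A minimal sketch; all input values are placeholders, and the default losses are the zero-one case.

```python
# Sketch: two-category likelihood-ratio test. Decide omega_1 iff
# P(x|w1)/P(x|w2) > ((l12 - l22) P(w2)) / ((l21 - l11) P(w1)).

def likelihood_ratio_decides_w1(p_x_w1, p_x_w2, prior1, prior2,
                                l11=0.0, l12=1.0, l21=1.0, l22=0.0):
    """True iff the rule selects action alpha_1 (decide omega_1).
    Defaults are zero-one losses; all arguments are placeholders."""
    threshold = ((l12 - l22) * prior2) / ((l21 - l11) * prior1)
    return p_x_w1 / p_x_w2 > threshold
```

With equal priors and zero-one loss the threshold is 1, recovering the maximum-likelihood decision; asymmetric losses, as in the chopping machine, simply shift the threshold.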