L5: Quadratic classifiers
• Bayes classifiers for Normally distributed classes
– Case 1: $\Sigma_i = \sigma^2 I$
– Case 2: $\Sigma_i = \Sigma$ ($\Sigma$ diagonal)
– Case 3: $\Sigma_i = \Sigma$ ($\Sigma$ non-diagonal)
– Case 4: $\Sigma_i = \sigma_i^2 I$
– Case 5: $\Sigma_i \neq \Sigma_j$ (general case)
• Numerical example
• Linear and quadratic classifiers: conclusions
Bayes classifiers for Gaussian classes

• Recap
– In L4 we showed that the decision rule that minimizes $P[error]$ can be formulated in terms of a family of discriminant functions

• For normally distributed classes, these DFs reduce to simple expressions
– The multivariate Normal pdf is

$$f_X(x) = (2\pi)^{-n/2} |\Sigma|^{-1/2} e^{-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)}$$

– Using Bayes rule, the DFs become

$$g_i(x) = P(\omega_i|x) = \frac{P(x|\omega_i)\,P(\omega_i)}{P(x)} = (2\pi)^{-n/2} |\Sigma_i|^{-1/2} e^{-\frac{1}{2}(x-\mu_i)^T \Sigma_i^{-1} (x-\mu_i)}\,\frac{P(\omega_i)}{P(x)}$$

– Eliminating constant terms

$$g_i(x) = |\Sigma_i|^{-1/2} e^{-\frac{1}{2}(x-\mu_i)^T \Sigma_i^{-1} (x-\mu_i)}\,P(\omega_i)$$

– And taking natural logs

$$g_i(x) = -\frac{1}{2}(x-\mu_i)^T \Sigma_i^{-1} (x-\mu_i) - \frac{1}{2}\log|\Sigma_i| + \log P(\omega_i)$$

– This expression is called a quadratic discriminant function
[Figure: a bank of discriminant functions $g_1(x), g_2(x), \ldots, g_C(x)$ computed from the features $x_1, x_2, \ldots, x_d$; a "select max" stage, optionally combined with costs, produces the class assignment]
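To make the recipe concrete, here is a minimal numpy sketch (not from the slides; names are illustrative) that evaluates the quadratic discriminant $g_i(x)$ derived above and selects the class with the maximum value:

```python
import numpy as np

def g(x, mu, Sigma, prior):
    """Quadratic discriminant:
    g_i(x) = -1/2 (x-mu)^T Sigma^{-1} (x-mu) - 1/2 log|Sigma| + log P(omega_i)."""
    d = x - mu
    quad = d @ np.linalg.solve(Sigma, d)               # (x-mu)^T Sigma^{-1} (x-mu)
    return -0.5 * quad - 0.5 * np.log(np.linalg.det(Sigma)) + np.log(prior)

def classify(x, mus, Sigmas, priors):
    """Assign x to the class with the largest discriminant value."""
    return int(np.argmax([g(x, m, S, p) for m, S, p in zip(mus, Sigmas, priors)]))
```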
Case 1: $\Sigma_i = \sigma^2 I$

• This situation occurs when the features are statistically independent, with equal variance for all classes
– In this case, the quadratic DFs become

$$g_i(x) = -\frac{1}{2}(x-\mu_i)^T (\sigma^2 I)^{-1} (x-\mu_i) - \frac{1}{2}\log|\sigma^2 I| + \log P(\omega_i) \equiv -\frac{1}{2\sigma^2}(x-\mu_i)^T (x-\mu_i) + \log P(\omega_i)$$

– Expanding this expression

$$g_i(x) = -\frac{1}{2\sigma^2}\left(x^T x - 2\mu_i^T x + \mu_i^T \mu_i\right) + \log P(\omega_i)$$

– Eliminating the term $x^T x$, which is constant for all classes

$$g_i(x) = -\frac{1}{2\sigma^2}\left(-2\mu_i^T x + \mu_i^T \mu_i\right) + \log P(\omega_i) = w_i^T x + w_0$$

– So the DFs are linear, and the boundaries $g_i(x) = g_j(x)$ are hyper-planes
– If we assume equal priors

$$g_i(x) = -\frac{1}{2\sigma^2}(x-\mu_i)^T (x-\mu_i)$$

• This is called a minimum-distance or nearest-mean classifier (see the code sketch after the figure below)
• The equiprobable contours are hyper-spheres
• For unit variance ($\sigma^2 = 1$), $g_i(x)$ is the Euclidean distance
[Figure: minimum-distance classifier: the Euclidean distances from $x$ to each class mean (Distance 1 … Distance C) feed a minimum selector, which outputs the class; after Schalkoff, 1992]
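Under the Case-1 assumptions, the rule collapses to comparing Euclidean distances to the class means. A minimal sketch (illustrative names, not from the slides):

```python
import numpy as np

def nearest_mean(x, means):
    """Minimum-distance (nearest-mean) classifier: Bayes-optimal for
    Gaussian classes with Sigma_i = sigma^2 I and equal priors."""
    return int(np.argmin([np.linalg.norm(x - mu) for mu in means]))

# Toy usage with the three class means of the example below
means = [np.array([3.0, 2.0]), np.array([7.0, 4.0]), np.array([2.0, 5.0])]
print(nearest_mean(np.array([4.0, 3.0]), means))       # -> 0, i.e. class omega_1
```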
• Example
– Three-class 2D problem with equal priors

$$\mu_1 = [3\ 2]^T \quad \mu_2 = [7\ 4]^T \quad \mu_3 = [2\ 5]^T$$

$$\Sigma_1 = \Sigma_2 = \Sigma_3 = \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix}$$
Case 2: $\Sigma_i = \Sigma$ ($\Sigma$ diagonal)

• Classes still have the same covariance matrix, but features are allowed to have different variances
– In this case, the quadratic DFs become

$$g_i(x) = -\frac{1}{2}(x-\mu_i)^T \Sigma^{-1} (x-\mu_i) - \frac{1}{2}\log|\Sigma| + \log P(\omega_i) = -\frac{1}{2}\sum_{k=1}^{n}\frac{\left(x_k-\mu_{i,k}\right)^2}{\sigma_k^2} - \frac{1}{2}\log\prod_{k=1}^{n}\sigma_k^2 + \log P(\omega_i)$$

– Eliminating the term $x_k^2$, which is constant for all classes

$$g_i(x) = -\frac{1}{2}\sum_{k=1}^{n}\frac{-2x_k\mu_{i,k}+\mu_{i,k}^2}{\sigma_k^2} - \frac{1}{2}\log\prod_{k=1}^{n}\sigma_k^2 + \log P(\omega_i)$$

– This discriminant is also linear, so the decision boundaries $g_i(x) = g_j(x)$ will also be hyper-planes
– The equiprobable contours are hyper-ellipses aligned with the reference frame
– Note that the only difference from the previous classifier is that the distance along each axis is normalized by its variance (see the sketch below)
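The per-axis normalization can be sketched directly; here `var` is assumed to hold the shared diagonal of $\Sigma$ (illustrative names, equal priors):

```python
import numpy as np

def normalized_nearest_mean(x, means, var):
    """Nearest mean under the variance-normalized distance
    sum_k (x_k - mu_{i,k})^2 / sigma_k^2."""
    return int(np.argmin([np.sum((x - mu) ** 2 / var) for mu in means]))
```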
• Example
– Three-class 2D problem with equal priors

$$\mu_1 = [3\ 2]^T \quad \mu_2 = [5\ 4]^T \quad \mu_3 = [2\ 5]^T$$

$$\Sigma_1 = \Sigma_2 = \Sigma_3 = \begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix}$$
Case 3: $\Sigma_i = \Sigma$ ($\Sigma$ non-diagonal)

• Classes have an equal covariance matrix, but it is no longer diagonal
– The quadratic discriminant becomes

$$g_i(x) = -\frac{1}{2}(x-\mu_i)^T \Sigma^{-1} (x-\mu_i) - \frac{1}{2}\log|\Sigma| + \log P(\omega_i)$$

– Eliminating the term $\log|\Sigma|$, which is constant for all classes, and assuming equal priors

$$g_i(x) = -\frac{1}{2}(x-\mu_i)^T \Sigma^{-1} (x-\mu_i)$$

– The quadratic term is called the Mahalanobis distance, a very important concept in statistical pattern recognition
– The Mahalanobis distance is a vector distance that uses a $\Sigma^{-1}$ norm (a code sketch follows the figure below)
– $\Sigma^{-1}$ acts as a stretching factor on the space
– Note that when $\Sigma = I$, the Mahalanobis distance becomes the familiar Euclidean distance
[Figure: equiprobable contours in the $(x_1, x_2)$ plane: a circle of constant Euclidean distance $\|x-\mu\|^2 = K$ and an ellipse of constant Mahalanobis distance $\|x-\mu\|_{\Sigma^{-1}}^2 = K$]
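A short numpy sketch of the squared Mahalanobis distance defined above (the function name is illustrative); with $\Sigma = I$ it reduces to the squared Euclidean distance:

```python
import numpy as np

def mahalanobis_sq(x, mu, Sigma):
    """Squared Mahalanobis distance (x-mu)^T Sigma^{-1} (x-mu)."""
    d = x - mu
    return float(d @ np.linalg.solve(Sigma, d))        # solve() avoids forming Sigma^{-1}

# Sanity check: with Sigma = I this equals the squared Euclidean distance
x, mu = np.array([1.0, 2.0]), np.array([0.0, 0.0])
assert np.isclose(mahalanobis_sq(x, mu, np.eye(2)), np.sum((x - mu) ** 2))
```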
– Expanding the quadratic term

$$g_i(x) = -\frac{1}{2}\left(x^T \Sigma^{-1} x - 2\mu_i^T \Sigma^{-1} x + \mu_i^T \Sigma^{-1} \mu_i\right)$$

– Removing the term $x^T \Sigma^{-1} x$, which is constant for all classes

$$g_i(x) = -\frac{1}{2}\left(-2\mu_i^T \Sigma^{-1} x + \mu_i^T \Sigma^{-1} \mu_i\right) = w_i^T x + w_0$$

– So the DFs are still linear, and the decision boundaries will also be hyper-planes
– The equiprobable contours are hyper-ellipses aligned with the eigenvectors of $\Sigma$
– This is known as a minimum (Mahalanobis) distance classifier
[Figure: minimum (Mahalanobis) distance classifier: the Mahalanobis distances from $x$ to each class mean, computed with the shared $\Sigma$, feed a minimum selector, which outputs the class]
• Example
– Three-class 2D problem with equal priors

$$\mu_1 = [3\ 2]^T \quad \mu_2 = [5\ 4]^T \quad \mu_3 = [2\ 5]^T$$

$$\Sigma_1 = \Sigma_2 = \Sigma_3 = \begin{bmatrix} 1 & 0.7 \\ 0.7 & 2 \end{bmatrix}$$
Case 4: $\Sigma_i = \sigma_i^2 I$

• In this case, each class has a different covariance matrix, which is proportional to the identity matrix
– The quadratic discriminant becomes

$$g_i(x) = -\frac{1}{2}(x-\mu_i)^T \sigma_i^{-2} (x-\mu_i) - \frac{1}{2}n\log\sigma_i^2 + \log P(\omega_i)$$

– This expression cannot be reduced further (a code sketch follows below)
– The decision boundaries are quadratic: hyper-ellipses
– The equiprobable contours are hyper-spheres aligned with the feature axes
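A sketch of the Case-4 discriminant, assuming each class is described by its mean, scalar variance $\sigma_i^2$, and prior (illustrative names, not from the slides):

```python
import numpy as np

def g_case4(x, mu_i, sigma2_i, prior_i):
    """Case 4 discriminant (Sigma_i = sigma_i^2 I):
    g_i(x) = -||x - mu_i||^2 / (2 sigma_i^2) - (n/2) log sigma_i^2 + log P(omega_i)."""
    n = x.size
    return (-np.sum((x - mu_i) ** 2) / (2.0 * sigma2_i)
            - 0.5 * n * np.log(sigma2_i)
            + np.log(prior_i))
```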
• Example
– Three-class 2D problem with equal priors

$$\mu_1 = [3\ 2]^T \quad \mu_2 = [5\ 4]^T \quad \mu_3 = [2\ 5]^T$$

$$\Sigma_1 = \begin{bmatrix} 0.5 & 0 \\ 0 & 0.5 \end{bmatrix} \quad \Sigma_2 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \quad \Sigma_3 = \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix}$$
Case 5: $\Sigma_i \neq \Sigma_j$ (general case)

• We already derived the expression for the general case

$$g_i(x) = -\frac{1}{2}(x-\mu_i)^T \Sigma_i^{-1} (x-\mu_i) - \frac{1}{2}\log|\Sigma_i| + \log P(\omega_i)$$

– Reorganizing terms in a quadratic form yields

$$g_i(x) = x^T W_{2,i}\,x + w_{1,i}^T x + w_{0,i}$$

where

$$W_{2,i} = -\frac{1}{2}\Sigma_i^{-1} \qquad w_{1,i} = \Sigma_i^{-1}\mu_i \qquad w_{0,i} = -\frac{1}{2}\mu_i^T \Sigma_i^{-1} \mu_i - \frac{1}{2}\log|\Sigma_i| + \log P(\omega_i)$$

– The equiprobable contours are hyper-ellipses, oriented with the eigenvectors of $\Sigma_i$ for that class
– The decision boundaries are again quadratic: hyper-ellipses or hyper-paraboloids
– Notice that the quadratic expression in the discriminant is proportional to the Mahalanobis distance for covariance $\Sigma_i$ (the coefficients above are sketched in code below)
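Since $W_{2,i}$, $w_{1,i}$, and $w_{0,i}$ depend only on the class parameters, they can be precomputed once per class. A minimal sketch (illustrative names):

```python
import numpy as np

def quadratic_form_params(mu_i, Sigma_i, prior_i):
    """Coefficients of g_i(x) = x^T W2 x + w1^T x + w0 for one class."""
    Si = np.linalg.inv(Sigma_i)
    W2 = -0.5 * Si
    w1 = Si @ mu_i
    w0 = (-0.5 * mu_i @ Si @ mu_i
          - 0.5 * np.log(np.linalg.det(Sigma_i))
          + np.log(prior_i))
    return W2, w1, w0

def g_quadratic(x, W2, w1, w0):
    """Evaluate the precomputed quadratic discriminant at x."""
    return x @ W2 @ x + w1 @ x + w0
```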
• Example
– Three-class 2D problem with equal priors

$$\mu_1 = [3\ 2]^T \quad \mu_2 = [5\ 4]^T \quad \mu_3 = [3\ 4]^T$$

$$\Sigma_1 = \begin{bmatrix} 1 & -1 \\ -1 & 2 \end{bmatrix} \quad \Sigma_2 = \begin{bmatrix} 1 & -1 \\ -1 & 7 \end{bmatrix} \quad \Sigma_3 = \begin{bmatrix} 0.5 & 0.5 \\ 0.5 & 3 \end{bmatrix}$$
Numerical example

• Derive a linear DF for the following 2-class 3D problem

$$\mu_1 = [0\ 0\ 0]^T; \quad \mu_2 = [1\ 1\ 1]^T; \quad \Sigma_1 = \Sigma_2 = \begin{bmatrix} 0.25 & 0 & 0 \\ 0 & 0.25 & 0 \\ 0 & 0 & 0.25 \end{bmatrix}; \quad P(\omega_2) = 2P(\omega_1)$$

– Solution

$$g_1(x) = -\frac{1}{2\sigma^2}(x-\mu_1)^T (x-\mu_1) + \log P(\omega_1) = -\frac{1}{2}\begin{bmatrix} x-0 & y-0 & z-0 \end{bmatrix}\begin{bmatrix} 4 & 0 & 0 \\ 0 & 4 & 0 \\ 0 & 0 & 4 \end{bmatrix}\begin{bmatrix} x-0 \\ y-0 \\ z-0 \end{bmatrix} + \log\frac{1}{3}$$

$$g_2(x) = -\frac{1}{2}\begin{bmatrix} x-1 & y-1 & z-1 \end{bmatrix}\begin{bmatrix} 4 & 0 & 0 \\ 0 & 4 & 0 \\ 0 & 0 & 4 \end{bmatrix}\begin{bmatrix} x-1 \\ y-1 \\ z-1 \end{bmatrix} + \log\frac{2}{3}$$

$$g_1(x) \underset{\omega_2}{\overset{\omega_1}{\gtrless}} g_2(x) \;\Rightarrow\; -2\left(x^2+y^2+z^2\right) + \log\frac{1}{3} \;\underset{\omega_2}{\overset{\omega_1}{\gtrless}}\; -2\left[(x-1)^2+(y-1)^2+(z-1)^2\right] + \log\frac{2}{3}$$

$$x+y+z \;\underset{\omega_1}{\overset{\omega_2}{\gtrless}}\; \frac{6-\log 2}{4} = 1.32$$

– Classify the test example $x_u = [0.1\ 0.7\ 0.8]^T$

$$0.1+0.7+0.8 = 1.6 > 1.32 \;\Rightarrow\; x_u \in \omega_2$$
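The worked example can be checked numerically. The sketch below (illustrative code, not from the slides) evaluates both discriminants at $x_u$ and reproduces the threshold $(6-\log 2)/4 \approx 1.327$ on $x+y+z$:

```python
import numpy as np

mu1, mu2 = np.zeros(3), np.ones(3)
Sigma_inv = 4.0 * np.eye(3)                  # Sigma = 0.25 I  =>  Sigma^{-1} = 4 I
P1, P2 = 1.0 / 3.0, 2.0 / 3.0                # P(omega_2) = 2 P(omega_1)

def g(x, mu, P):
    d = x - mu
    return -0.5 * d @ Sigma_inv @ d + np.log(P)

x_u = np.array([0.1, 0.7, 0.8])
print(g(x_u, mu1, P1), g(x_u, mu2, P2))      # g2 > g1  =>  choose omega_2
print((6 - np.log(2)) / 4)                   # threshold on x + y + z, ~1.327
```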
Conclusions

• The examples in this lecture illustrate the following points
– The Bayes classifier for Gaussian classes (general case) is quadratic
– The Bayes classifier for Gaussian classes with equal covariance is linear
– The Mahalanobis distance classifier is Bayes-optimal for
• normally distributed classes and
• equal covariance matrices and
• equal priors
– The Euclidean distance classifier is Bayes-optimal for
• normally distributed classes and
• equal covariance matrices proportional to the identity matrix and
• equal priors
– Both the Euclidean and the Mahalanobis distance classifiers are linear classifiers
• Thus, some of the simplest and most popular classifiers can be derived from decision-theoretic principles
– Using a specific (Euclidean or Mahalanobis) minimum-distance classifier implicitly corresponds to certain statistical assumptions
– Whether these assumptions hold can rarely be answered in practice; in most cases, we can only determine whether the classifier solves our problem