The Quadratic Formula and the Discriminant. The Discriminant.
Flexible Discriminant and Mixture Modelshastie/TALKS/mfpda.pdfStanford University—April 28, 1997...
Transcript of Flexible Discriminant and Mixture Modelshastie/TALKS/mfpda.pdfStanford University—April 28, 1997...
Stanford University—April 28, 1997 Flexible Discriminant and Mixture Models 1'
&
$
%
Flexible Discriminantand Mixture Models
Trevor Hastie
Statistics Department
and Division of Biostatistics
Stanford University
joint with Andreas Buja and Rob Tibshirani
April 28, 1997
Papers fda.ps.Z, pda.ps.Z and mda.ps.Z are available from:
ftp://playfair.stanford/edu/pub/hastie
Stanford University—April 28, 1997 Flexible Discriminant and Mixture Models 2'
&
$
%
Theme: Modular Extensions of Standard Tools
Linear Discriminant Analysis or LDA is a classic technique
for discrimination and classification
Virtues of LDA:
+ Simple prototype method for multiple class classification
+ Can produce optimal low dimensional views of the data
+ Sometimes produces the best results; e.g. LDA featured
in top 3 classifiers for 11/22 of the STATLOG datasets,
overall winner in 3/22.
Limitations of LDA:
- Lots of data, many predictors: LDA underfits (restricts to
linear boundaries)
- Many correlated predictors: LDA (noisy/wiggly
coefficients)
- Dimension reduction limited by the number of classes
Stanford University—April 28, 1997 Flexible Discriminant and Mixture Models 3'
&
$
%
Example of extension: FDA
� Y = SX(Y ) where Y is an indicator response matrix
and SX a regression procedure (Linear regression,
Polynomial Regression, Additive Models, MARS, Neural
Network, � � �)� eigenY T Y = eigenY TSXY ) LDA, flexible
extensions of LDA.
Typically this amounts to expanding/selecting the predictors
via basis transformations chosen by regression, and then
(penalized) LDA in the new space.
Stanford University—April 28, 1997 Flexible Discriminant and Mixture Models 4'
&
$
%
Example: Vowel Recognition
Vowel Word Vowel Word
i heed o hod
I hid c: hoard
E head U hood
A had u: who’d
a: hard 3: heard
y hud
11 symbols, 8 speakers (train), 7 speakers (test), 6
replications each. 10 inputs features based on digitized
utterances.
Source: Tony Robinson, via Scott Falman, CMU
GOAL: Predict Vowels
Stanford University—April 28, 1997 Flexible Discriminant and Mixture Models 5'
&
$
%
Some Results
Technique Error rates
Training Test
(1) LDA 0.32 0.56
Softmax 0.48 0.67
(2) QDA 0.01 0.53
(3) CART 0.05 0.56
(4) CART (linear combination splits) 0.05 0.54
(5) Single-layer Perceptron 0.67
(6) Multi-layer Perceptron (88 hidden units) 0.49
(7) Gaussian Node Network ( 528 hidden units) 0.45
(8) Nearest Neighbor 0.44
(9) FDA/BRUTO 0.06 0.44
Softmax 0.11 0.50
(10) FDA/MARS (degree = 1) 0.09 0.45
Best reduced dimension (=2) 0.18 0.42
Softmax 0.14 0.48
(11) FDA/MARS (degree = 2) 0.02 0.42
Best reduced dimension (=6) 0.13 0.39
Softmax 0.10 0.50
Stanford University—April 28, 1997 Flexible Discriminant and Mixture Models 6'
&
$
%Coordinate 1 for Training Data
Coo
rdin
ate
2 fo
r T
rain
ing
Dat
a
-4 -2 0 2 4
-6-4
-20
24
oooooo
ooo
o
oo
ooo
o oo
ooo
ooo
o o oo oo
ooo o oo
oo
o
o
o o
o oo
ooo
oooooo
o
o
oo
o
o
oooo
o
o
oooooo
o oooo
o
o
o
oooo
oooooo
o
oo
oo
o
oooooooo
ooo
o
o
ooo
o
o
o ooooo
ooo o oo
oooooo
oo o ooo
o ooo
oo
oooooo
oooooo
oo
oooo
oooooo
oooooo
oooooo
oooooo
oo
oooo
ooo
oo
o
o ooooo
ooooo
o
oooooooooooo
oooooo
oooooo
oo
o
ooo
ooooooo ooo o o
o ooo
ooo
oooo
o
ooo
o
oo
oooo
o
o
oooo
o
o oo
o
ooo
oooooo
oooooo
oo
oo oo o
oo
oo o
o o o
oo
ooooooo
oooo
o
o
oo
o
ooo
ooo
ooo
o o ooo o
ooooo
o
ooooo
o
o o
oo
oo
ooo ooo
ooo
oo o
o
oo
o o o
ooooo
o
oooo
o
o
o
oo
ooo
ooooo
o
oooo
oo
o oooo
o
oooooo
ooo o
oo
ooooooo oo
o o o
oooooo
oo
ooo o
oo
o oo o
ooo
ooo
o
o
oo
oo
o oo
oooooo
ooo
oooooo
ooo
o ooo oo
oo
oooo
ooo
ooo
ooo
oooooo
oo
o
oo o
••••
•••• •••• •• ••
•• ••••
Linear Discriminant Analysis
Stanford University—April 28, 1997 Flexible Discriminant and Mixture Models 7'
&
$
%Coordinate 1 for Training Data
Coo
rdin
ate
2 fo
r T
rain
ing
Dat
a
-10 -5 0 5
-6-4
-20
24
6
oooooo
oo
ooo
o
oooooo ooooo
ooooooo o
o o oooooo
oo
o
o
ooo
o o
oo
oo o oo
oooo
o
ooooo
o
oooooooooo
o
o
oo
oooo
oo
o ooo
oo
o o oo
ooooo
o
ooooo
oo ooooo
ooooooo
oo
o
oo
ooo
oo
o
ooo
oo
o
oooo
oo
oo
oooo
oooooo
oooooooooooo
oooooo
ooo
ooo
oooo oo
oooo
oo
ooo
ooo
ooooooooooo
o
oo
o
ooooooo
oooo
oooo
oooooo
oo
oo
o
o oooooo
o
o
oo
o
o
ooooo
o
oo
ooo
o o
o
oooo
oooooo
oooo
ooo
oo
ooo
oooo o
o
oo
ooooo
o
ooo
o
oo
o
o
o
o
o
oooo
o
oooo
oo
ooo oo
o
oo
oooo
ooo
o
oo o
oo
oo
ooooo o
oo
oooo
o
o
ooo
o
oo
o
o
oo
o
ooooo o
oo
o
oo o
ooo oo
o
oo
oo
oo
o
o
o
oo o o
o
ooo
o
oo oooo
oo
oo
o
o
oo
o
o
o
o
oooo
o
o
ooooo
oo
oo ooo
ooooo
o
ooooo
o
oooo
oo
oo
oooo
oooooo
o
oo
o
oo
o ooo
ooo
oo
o o
o
oo
ooo o
o ooooooo oo
o
o
oooooo
o
ooooo
oo
ooo
o
•• •••• ••
••••
••••
••••
••
Flexible Discriminant Analysis
Stanford University—April 28, 1997 Flexible Discriminant and Mixture Models 8'
&
$
%
List of Extensions
� (Reduced Rank) LDA! (reduced rank) FDA via flexible
regression: Y = SXY� (Reduced rank) LDA! (reduced rank) PDA (Penalized
Discriminant Analysis) via penalized regressionY = SY = [X(XTX + �)�1XT ]Y , e.g. for
image and signal classification.� (Reduced rank) Mixture models. Each class a mixture of
Gaussians. Each iteration of EM is a special form of
FDA/PDA: Z = SZ where Z is a random response
matrix.� In the mixture model above the classes can share
centers (radial basis functions) or own separate ones.
Separate Centers per Class
Z = Z =
Common Centers
Stanford University—April 28, 1997 Flexible Discriminant and Mixture Models 9'
&
$
%
� In the above we use full likelihood for training:P (X;G) = P (GjX)P (X); what if we use the
conditional likelihood P (GjX)?[+ LDA becomes multinomial regression, and the
non-parametric versions likewise.
[+ The mixture problems again result in an E-M
algorithm, where each M-step is a multinomial
regression with random response Z .
References:Breiman and Ihaka (1984) Unpublished manuscript
Campbell (1980) Applied Statistics
Kiiveri (1982) Technometrics
Ripley and Hjort (1995) Monograph in preparation
Ahmad and Tresp (1994) papers on RBFs
Bishop (1995) Monograph preprint
Stanford University—April 28, 1997 Flexible Discriminant and Mixture Models 10'
&
$
%
Details of LDA
LDA is Bayes rule for P (XjG) Gaussian with density�(X;�j ;�) = 1(2�pj�j) 12 e� 12 (X��j)T��1(X��j)P (jjx) = �(x;�j ;�)�jP` �(x;�`;�)�`� exp�xT��1�j � 12�Tj ��1�j + log�j�= exp(xT�j + �j)
Note: maxrankf�jg=K;� JXj=1Xgi=j [log�(xi;�j ;�) + log�j ]
is equivalent to Fisher’s rank-K LDA:max vTk Bvk subject to vTkWvk = 1; k = 1; : : : ; Kwhere B and W are the sample Between and Within covariance
matrices. The latter is the usual formulation of Fisher’s LDA.
Stanford University—April 28, 1997 Flexible Discriminant and Mixture Models 11'
&
$
%
LDA, FDA () Optimal ScoringnXi=1[�(gi)� �(xi)]2 + J(�) = minwith normalization
Pi �2(gi) = 1, and “roughness”
penalty functional J .
�(xi) =8>>>>>>>><>>>>>>>>:
xTi � LDAf1(xi1) + f2(xi2) + : : :+ fp(xip)MARS(xi)NN(xi)...
Often �(x) =Pm hm(x)�m and J(�) = �T�.
There is a 1-1 correspondence between optimal scoring and
Fishers LDA, as well as the flexible extensions.
Stanford University—April 28, 1997 Flexible Discriminant and Mixture Models 12'
&
$
%
Optimal scoring algorithm for LDA/FDA
Indicator Response Matrix Y0BBBBBBBBBBB@
C1 C2 C3 C4 C5g1 = 2 0 1 0 0 0g2 = 1 1 0 0 0 0g3 = 1 1 0 0 0 0g4 = 5 0 0 0 0 1g5 = 4 0 0 0 1 0...
...gn = 3 0 0 1 0 0
1CCCCCCCCCCCAY = SYY T Y = �D�T
where S = linear regression, additive regression, MARS,
NN,. . . , each giving a different version of FDA.
Stanford University—April 28, 1997 Flexible Discriminant and Mixture Models 13'
&
$
%
FDA and Penalized Discriminant Analysis
The steps in FDA are� Enlarge the set of predictors X via a basis expansionh(X), and hence inject us into a higher dimensional
space.� Use (penalized) LDA in the enlarged space, where the
penalized Mahalanobis distance is given byD(x; �) = (h(x)�h(�))T (�W+)�1(h(x)�h(�))�W is defined in terms of bases functions h(xi).� Decompose the classification subspace using a
penalized metric:max tr(UT�BetU) subject to UT (�W +)U = I
Stanford University—April 28, 1997 Flexible Discriminant and Mixture Models 14'
&
$
%
Skin of the Orange
x[,1]
x[,2
]
-4 -2 0 2 4
-4-2
02
46
00
00 00
000
0
0
00
00
00 0
00
0
00
0
0
000 0
0
0
0
00
0
000
0 0000
0
0
0
00
0
00
0
0
00
0
0
000
00
0
00 0
0
0
0
000
0
000
00 0
0 0
0 0
0
000
00 0
000
0
0
00 00
0
00
0000 0
0
000
0
0
0
0
0
0
00
0
00
0
0 000
000
00
00 0
0
0
000 00
0
00
0 0 0
00
00 0
0
00 0
0
00 0
0
00
0
00
0
0
0
0
0
00 0
0
0
000
0
0 00
00
0
00
00
00
00
0
0
00
0
0
0
0
0
0
0
0
0
0
0
000
0
0
0
000
0
0
0
0
0 0
0
0
0 0
0
0
0
000
0
0
0
0
0
0
0
0
0
00
0
0
0
0
00
0
0
0
00
00
0
00
0
000
0
0
0
0
0
0
0
0
0
00
00
0
0
0
0 0
0
0
0
00
0
0
0
0
0
00 00
0
0
0
0
00
000
00
0
0
00
0
0
0 00 00
0
0
00
0
0
0
00
0
0
0
0
00
0
0
00
0
0
0
00
0
0
0
0
0
0
0
00
00
0
0
0
00
0
0
0
0 000
0
0
0
0
0
0
0 0
0
0
0
0
00
0
0
0
00
0
00
0
00
00
0
0
00
00
Training Data
x[,1]
x[,2
]
-4 -2 0 2 4
-4-2
02
46
00
00 00
000
0
0
0
00
00 0
00
0
00
0
000
0
00
0
000
0 0000
0
0
00
00
0
00
0000
00
0
00 0
0
0
0
000
0
000
00 0
0 0
0 0000
00 0
000
0
00 00
0
00
0000 0
000
0
0
0
0
0
0
00 000 000
00
00
00 0
0
0
00 00
00
0 0 0
00 0
0
00 0
00 0
0
00
0
00
0
0
0
0
0
00 0
0
000
0
0 00
00
00
00
0
000
00
0
0
0
000 00 0
00
0
0
00 0
0
00
000
00
0 0
0
0
0
0
0 000
0
0
00
0
0
000
0
0
0
0
0
00
00
00
00
00 000
0
0
0
00
0
000
0
00
0 00
0
0
0
0
000
0
000
0
00
0
0
0
00
0
0
00
0 0
00
00
0
00
0
0
0
00
0
0
0
0
0 0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
00
00
0
00
0
000
0 0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
000
0
0
0
0
0 00 00
0
00
0
0
0
0
0
00
0
0
00
0
0
0
0
0
0
0
000
0
0
0
000
0
0
0
0
0
0
0
0
00
0
00
0
00
Predicted Classes
Stanford University—April 28, 1997 Flexible Discriminant and Mixture Models 15'
&
$
%
FDA vs Regression
In FDA algorithm, we decomposeY TS(�)Y = Y T YY fitted values for dummy response matrix
Why not stop at Y � E(Y jX)?i.e. for new x, computey1(x); y2(x); : : : ; yJ(x)and assign x to the class j with the largest yj(x).
Stanford University—April 28, 1997 Flexible Discriminant and Mixture Models 16'
&
$
%
Softmax after Linear Regression
1
1
1
1
1
1
11
1
1
1
11
1
1
11
1
11
1
1
11
1
1
1
1
1
11
1
1
11
1
11
1
1 1
1
11 1
1
1
1
1
1
1
11
11
1
1
1
1
1
11
1
1
1
11
111
1
1
111
1
1
1
1
11
11
1
1
1
1
1
11
1
11
1
11
1
1
1
11
1
1
11
1
1
1
1
1
1
1
1
1
11
1
11
1
1
1
1
1
1 11
11
1
1
1
1
1
1
1
1
1
1
1
11 1
1 1
1
111
1
1
1
1
1
1
1
1 1
1
1
1
1
1
11
1 1
11
1
1
1 1
11 1
1
1
1
1
11
1
1
1
1
1
1 1
11
1
1
1
1
111
1
1
1
11
111
1 1
1
1
1
1
1
1
11 11
1
1
11
1
11
1
1
1
1
11
1
1
1
1
1 1
1
1
1
1
11
1
11
11
11
1
1
1 1
1
11
1
11
1
1
11
1
1
1
1
1
1
1
1
11
1
1
1
1
1
1
1
1
11
11
1 1
1
1
1
1
1
1
1 1
1
1
1
1
11
1
1
1
1
1
1
1
1
11
1
1
1
1
1
1
1
1
1
1
11 11
1
11
1
1
1
1
1
1
1
111
1
1
1
1
1
11
1
1
1
1
11
1
1
11
1
1
1
1
1
1
1
11
1
1
1
1
111
1
1
11
1
11
1
1
1
1
1
1
1
1
1
1
11
11
1
1
1
1
11
1
1 1
1
11
11
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
111
1
1
1
1
1 11 1
11
1
1
1
1
1
1
1
1
1
11
1
1
1
1
1
11
1
1
1
1
1
1
1
11
1
11
11
1
11
1
1
1
1
1
1 1
1
1
1
1
11
11
1
1
1
1
1111
1
1
1
11
1
2 2
22
2
2
2
2
2
22
2
2
2
22
22
2
22
2
2
2
22
2
2
2
2
22
2
2
2
2
22
2
2
2
2
2
2
2
2
222
222
2
2
2
2
2
2
2
2
2
2
2
2
2
22
2
22
2
2
22
2
2
2
2
2
2
2
2
22
2
2
2
222
2
2
2
2
2
22
2
2
2
2
2
2
2
2
2 22
2
222
2
2
22
2
2
2
2
2
2
2
2
2
2
2
2
2 2
2
2
2
2 22 2
2
2
2
22
2
22
2
2
2
22
2
22
22
2
2
2
2
2
2 222 2
22
22
2
2
2
2
2 2
2
2
2
2
22 22
2
2
2
2
2
2
2
2
22
2
2
2
2
2
22
2
2
2
22
2
2
2
2
2
2
22
22
2
2
2
2
2
2
2
2
2
22
2
22
22
2
2
2
2
2
22
2 2
2
2
222
2
22
2
2
2
2
2
22
2
2
22
22
2
2
22
2 2
2
22
2
2
2
2
2
2
2
2
22
2 2
2
22
2
22
2
2
2
2
2
2
2
2
2
2
2
2
22
2
22
2
2
2
2
2
2
2 2
2
2
2
2
2
22
22
2
2 22
22
2
2
2
2
22
2
2
2
2
22
2
22
2
2
2
2
22
22
2
22
2 22
2
2
22
2
2
2
2
2
2
2
22
2
22
2
2
22 22
22
2
2
2
2
2
2
22
2
2
2
2
2
2
22 2
2
22
2
2
2
222 2
22
2
2
22
2
22
2
2
22 22
2
2
2
2
2
22
22
22
2
22
2
2
2
2
2
2
2
222
22
22
22
2
2
2
2
2
2
2
2 22
2
2
2
2
2
22
2
2
22
2
2
2
2
2
2
2
22
2
2
22 22
2
2
2
2
22
22
22
2
3 3
3
3
33
33 3
333
3
3 3
33
33
3
3
33 3
3
3
3
3
3
3
3
3
3
3
3
3 3
33
3
3
3
33
3
33
3
3
3
3
3
3
3 3
3
3
33
3
3
3
3
3
3
33
3
3
3
3
3
3
33
3
3
33
3
3
3 3
3
3
3
3
33
3
3
33
3
33
3
3
3
3
3
3 3
3
3
3
3
3
3
3
3
3
33
3
3
3
3
3
3
33
3
3
3
3
3
33
333
3
3
3
3
333 3
3
3 3
3
3 3
3
3
3
33
3
3
3
3
3
3
3
3
3
3
3
3 33
3
3
3
3
3
3
33
3
3
3
3
3
3
3
3
3
33 33
33
33
3
3
33
3
3
33
33
3
3
3
3
33
33
3
3
3
3
33
33
3
3
3
3
3
3
3
3
33
3
3
3
33
33 3
3
33
3
3
33
3
3
3
3
33 3
3
3
3
3
3
3
3
3
3
3 3
3
3
3
3
3
33
3
3
3
3
3
3
3
3
3
33
3
3
3
3
3
33
3
333
3
3
3
3
3 3 33
3
33
3
3
3
3
3 3
3
33
33
3
3
3
3
3
3
33
3
3
3 3
3
3
3
3
3
3
3
33
33
3
3
33
3
3
3
3
3 3
3
3
3
33
3
3
3
3
33
3
33
3
3
3
3
3
3
3
3
33
33
3
3
33
3
33
3
3
3
3
3
33
33
3
3
3
33
33
3
3
33
3
3
3 3
3
3
3
33
3
3
33
3
33
3
33 3
3
33
33
3
3
3
3
3
3
3
3
3
3
3
3
33 3
3
3
3
333
33
3 3
33 3
3 33
3
33
33
3
3
33
3
3
3 3
3 3
33 3
33
3
3
3
33
3
3
33
3
3
3
33
33
33
3
33 3 3
3
3
33
3
X1
X2
error: 0.3333LDA
X1
X2
1
1
1
1
1
1
11
1
1
1
11
1
1
11
1
11
1
1
11
1
1
1
1
1
11
1
1
11
1
11
1
1 1
1
11 1
1
1
1
1
1
1
11
11
1
1
1
1
1
11
1
1
1
11
111
1
1
111
1
1
1
1
11
11
1
1
1
1
1
11
1
11
1
11
1
1
1
11
1
1
11
1
1
1
1
1
1
1
1
1
11
1
11
1
1
1
1
1
1 11
11
1
1
1
1
1
1
1
1
1
1
1
11 1
1 1
1
111
1
1
1
1
1
1
1
1 1
1
1
1
1
1
11
1 1
11
1
1
1 1
11 1
1
1
1
1
11
1
1
1
1
1
1 1
11
1
1
1
1
111
1
1
1
11
111
1 1
1
1
1
1
1
1
11 11
1
1
11
1
11
1
1
1
1
11
1
1
1
1
1 1
1
1
1
1
11
1
11
11
11
1
1
1 1
1
11
1
11
1
1
11
1
1
1
1
1
1
1
1
11
1
1
1
1
1
1
1
1
11
11
1 1
1
1
1
1
1
1
1 1
1
1
1
1
11
1
1
1
1
1
1
1
1
11
1
1
1
1
1
1
1
1
1
1
11 11
1
11
1
1
1
1
1
1
1
111
1
1
1
1
1
11
1
1
1
1
11
1
1
11
1
1
1
1
1
1
1
11
1
1
1
1
111
1
1
11
1
11
1
1
1
1
1
1
1
1
1
1
11
11
1
1
1
1
11
1
1 1
1
11
11
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
111
1
1
1
1
1 11 1
11
1
1
1
1
1
1
1
1
1
11
1
1
1
1
1
11
1
1
1
1
1
1
1
11
1
11
11
1
11
1
1
1
1
1
1 1
1
1
1
1
11
11
1
1
1
1
1111
1
1
1
11
1
2 2
22
2
2
2
2
2
22
2
2
2
22
22
2
22
2
2
2
22
2
2
2
2
22
2
2
2
2
22
2
2
2
2
2
2
2
2
222
222
2
2
2
2
2
2
2
2
2
2
2
2
2
22
2
22
2
2
22
2
2
2
2
2
2
2
2
22
2
2
2
222
2
2
2
2
2
22
2
2
2
2
2
2
2
2
2 22
2
222
2
2
22
2
2
2
2
2
2
2
2
2
2
2
2
2 2
2
2
2
2 22 2
2
2
2
22
2
22
2
2
2
22
2
22
22
2
2
2
2
2
2 222 2
22
22
2
2
2
2
2 2
2
2
2
2
22 22
2
2
2
2
2
2
2
2
22
2
2
2
2
2
22
2
2
2
22
2
2
2
2
2
2
22
22
2
2
2
2
2
2
2
2
2
22
2
22
22
2
2
2
2
2
22
2 2
2
2
222
2
22
2
2
2
2
2
22
2
2
22
22
2
2
22
2 2
2
22
2
2
2
2
2
2
2
2
22
2 2
2
22
2
22
2
2
2
2
2
2
2
2
2
2
2
2
22
2
22
2
2
2
2
2
2
2 2
2
2
2
2
2
22
22
2
2 22
22
2
2
2
2
22
2
2
2
2
22
2
22
2
2
2
2
22
22
2
22
2 22
2
2
22
2
2
2
2
2
2
2
22
2
22
2
2
22 22
22
2
2
2
2
2
2
22
2
2
2
2
2
2
22 2
2
22
2
2
2
222 2
22
2
2
22
2
22
2
2
22 22
2
2
2
2
2
22
22
22
2
22
2
2
2
2
2
2
2
222
22
22
22
2
2
2
2
2
2
2
2 22
2
2
2
2
2
22
2
2
22
2
2
2
2
2
2
2
22
2
2
22 22
2
2
2
2
22
22
22
2
3 3
3
3
33
33 3
333
3
3 3
33
33
3
3
33 3
3
3
3
3
3
3
3
3
3
3
3
3 3
33
3
3
3
33
3
33
3
3
3
3
3
3
3 3
3
3
33
3
3
3
3
3
3
33
3
3
3
3
3
3
33
3
3
33
3
3
3 3
3
3
3
3
33
3
3
33
3
33
3
3
3
3
3
3 3
3
3
3
3
3
3
3
3
3
33
3
3
3
3
3
3
33
3
3
3
3
3
33
333
3
3
3
3
333 3
3
3 3
3
3 3
3
3
3
33
3
3
3
3
3
3
3
3
3
3
3
3 33
3
3
3
3
3
3
33
3
3
3
3
3
3
3
3
3
33 33
33
33
3
3
33
3
3
33
33
3
3
3
3
33
33
3
3
3
3
33
33
3
3
3
3
3
3
3
3
33
3
3
3
33
33 3
3
33
3
3
33
3
3
3
3
33 3
3
3
3
3
3
3
3
3
3
3 3
3
3
3
3
3
33
3
3
3
3
3
3
3
3
3
33
3
3
3
3
3
33
3
333
3
3
3
3
3 3 33
3
33
3
3
3
3
3 3
3
33
33
3
3
3
3
3
3
33
3
3
3 3
3
3
3
3
3
3
3
33
33
3
3
33
3
3
3
3
3 3
3
3
3
33
3
3
3
3
33
3
33
3
3
3
3
3
3
3
3
33
33
3
3
33
3
33
3
3
3
3
3
33
33
3
3
3
33
33
3
3
33
3
3
3 3
3
3
3
33
3
3
33
3
33
3
33 3
3
33
33
3
3
3
3
3
3
3
3
3
3
3
3
33 3
3
3
3
333
33
3 3
33 3
3 33
3
33
33
3
3
33
3
3
3 3
3 3
33 3
33
3
3
3
33
3
3
33
3
3
3
33
33
33
3
33 3 3
3
3
33
3
error: 0.0047
X
Y
0
1 | || || ||| || | ||| | | | || || || || ||| ||| | |||||| | || | | || | ||| | || ||| || || | || || || || ||| || ||| || | || | || || | | || ||| || ||| || || | ||||| | || || || ||| | ||| | | || ||||| | ||| || ||| ||| || || || ||| | || | || | | || || || |||| ||| ||| |||||| | || || || | ||||| |||| | | || | | || | ||| |||| | || | || || || ||| || | |||| | || | || || ||| | ||| || || || || |||| | || |||| ||| | || || | || |||| | ||| | || || ||| || | |
With many classes, high order (polynomial) interaction terms
might be needed to avoid masking.
Stanford University—April 28, 1997 Flexible Discriminant and Mixture Models 17'
&
$
%
FDA vs PDA
There are two complimentary situations:
FDA h(x) is an expansion of x into m� p basis functions.�T� makes �(x) = �Th(x) smooth in x.
PDA h(x) = x, e.g. digitization of an analog
signal—x = (x1; : : : ; xp), xj = x(tj).�T� makes � smooth in t
Stanford University—April 28, 1997 Flexible Discriminant and Mixture Models 18'
&
$
%
Handwritten Digit Identification
A sample of 5 handwritten digits within each digit class.
2000 training and 2000 test images.
Stanford University—April 28, 1997 Flexible Discriminant and Mixture Models 19'
&
$
%
Speech Recognition
aa
Frequency
Log-
Per
iodo
gram
0 50 100 150 200 250
05
1015
2025
ao
Frequency
Log-
Per
iodo
gram
0 50 100 150 200 250
05
1015
2025
dcl
Frequency
Log-
Per
iodo
gram
0 50 100 150 200 250
-50
510
15
iy
Frequency
Log-
Per
iodo
gram
0 50 100 150 200 250
05
1015
2025
sh
Frequency
Log-
Per
iodo
gram
0 50 100 150 200 250
05
1015
20
A sample of 10 log-periodograms within each phoneme class. 1633
training (11 speakers) and 1661 test (12 different speakers) frames.
Stanford University—April 28, 1997 Flexible Discriminant and Mixture Models 20'
&
$
%
Reasons for Regularization
There are two different reasons why regularization is
necessary:
Estimation: With digitized analog signals, the
variables-to-observation ratio can be small.
Regularization is needed for consistency in estimating a
population LDA model [Leurgans, Moyeed & Silverman,
JRSSB, 1993]
Interpretation: The LDA coefficients are given by� = ��1�j . If the eigenstructure of �W concentrates
on low-frequency signals, then � will tend to be noisy,
and hence hard to interpret.
Stanford University—April 28, 1997 Flexible Discriminant and Mixture Models 21'
&
$
%
PDA coefficients: Handwritten Digits
LDA: Coefficient 1 PDA: Coefficient 1 LDA: Coefficient 2 PDA: Coefficient 2 LDA: Coefficient 3 PDA: Coefficient 3
LDA: Coefficient 4 PDA: Coefficient 4 LDA: Coefficient 5 PDA: Coefficient 5 LDA: Coefficient 6 PDA: Coefficient 6
LDA: Coefficient 7 PDA: Coefficient 7 LDA: Coefficient 8 PDA: Coefficient 8 LDA: Coefficient 9 PDA: Coefficient 9
Left images: LDA coefficient image
Right images: PDA coefficient image
Stanford University—April 28, 1997 Flexible Discriminant and Mixture Models 22'
&
$
%PDA: Coordinate 1
PD
A: C
oord
inat
e 2
0
0
0
0
0 0
0
00
0
00
0 0
000
0
0
0
00
0
0
00
00
0 00
0 0
00
0
0
000
0
00
0
0
0000
00
0
000
0 0
0
00
0
0
0
0
0
0
00
0
0
000
0
0
00
0
0 00
0
00
0 0000
00 0
00
0
0
00
00
0 000
0
0
0
0
0
0 00
00
0
0
0 0
00
0
0
0
0
0
00
0
0
0 0
0
0
0
00
0
0
0
0
0
0
0
0
0
000
0
00
000
0
0
0 0
0 0
0
00 00
0
0
0 0
0
0
0
0
0 0
00
0
0
0
0
0
0
0
00 00
0 0
0
0
0
000
0 00
00
0 0
00 000
000
0
0
0
0
0
0
0
0
00
0
0
0
00
000
00
0
0
0
0
0
00
0
0
0
0
0
00
0
0 0
0
000
00
00
0
0
0
0
0
00
0
00 0
00
0
0
0
0
00
0
0
0
0
0
0
0
0
0
0
0
0
0
00 0
000
1
1
1111
11
1
1
11
1
1
111
1
1
11 111
1
1
1
1
1
1
1
11
11
111
1
1111 1
1
1
11
11
1 111
11
11
1
11
1111 11
1 1
1
1 1111
11
1
11
11
1
1
1
1
11
1
11111
1
11 111
11
1
1111
11
1
111
1
1111
11
1
1
1
1 111
1
1
1
1
1
1
11
1
11
1
111
1
1111 1
11
11
11111
1
111
1
11
1
1
11 11
1
1 11111
1
11
11
1
1
11111
1
11 1
11
11 1111
1 1
11
1 11
1
1
1111
1
1
11
1
111
111
111 1
111 11
111
1111 1
11
111
111
1
1 1
1
1
1
1111
1
1
111
3
3
33
333
3
333
33
3
3
3
3
3
333
3
3
3333 3 3
33
33
3
33
3
3
3
3
333
3
3
3
3 33
3
3
333
33
3 3
3
3 3333
3 33
3
3
3
3
3
3 333
3
3
33
3
3
3
3
3 3
3
3
3
3
3
3
3
3
33
3
333
3 33
3
33
3
3
3
3
3
3 3
3
3
3
3
33
33
3
33
3
3
3 3
3
33
3
3
3
3
3
3
333
33
3
33
33
3
3
3
33
3
33
3
3
33
3
3
3
3
3
3
3
3
3
3
33
3
3
33
3333
3
3
3
3
3
33
3
3
3
3
33
333
3
3
3
3
3
3
3
3
3
3
3
3
3
3
33
3
33
3
3
3
3
3
3
333
33
3
3
3
3
3
3
3
3
3
33
33
3
3
333
3
3
33
3
3
333
3
9
999
9
9
9
9
9
9
9
9
9
9
9
9
99
99
99
99
9
9
99 9 9
9
9
9
9
9
9
9
9
9
99
99 99
9
9
9
999
9
9
9 9
9
9
9
9
9
9
9
9
99
9
9
9
9
9
9
9
9
9
9
9
9
9
9
9
9
999
9
99
9
9
9
99
9
9
9
9
9
99
9
9
9
9
9 9
9
9
9
9
9
99
9
99
9
9
9
9
9 999
99
99 9
9
99
999
9
9
9
99
9
9
9
9
99
9
9
9
9
999
99
9 9
9
9
9
8
8
8
8
88
8
8
88
8
8
8
8
8 8
8
88
8
88
8
8
8 8
8
8
8
8
8
8
8
8
88
8
8
8
8
8
88
8
88
8
88
88
8
8
8
8
8
8
88
8
8
8888 8
8
88
8
8
8
88
8
8
88
8
8
8
8
88 8
8
8
88
8
8
888
8
88 8
8
8
88 8
8 8
8
8
8 88
88
8
88
8
8
88
88
8
8
8
8 8
8
8
8
88
8
8
8
8
88
88
8
8
88
8
8
88
88 8
8
8
8
88
8
8
8
88
8
8
8
8
8
8
8 8
4
4
4
4
4
44
4
4
4
4
4
444
4
4
4
4
4
44
44
44
4
4
4
4
4
4
44
4
44
444
4
44
4
4
4
4 4
4
44
44
4
4
4 44
44
4
4 4
4
44
4
4
444
44
44
4
444
4
4 4
4
4
4
44
44
4
4
44
4 44
4
4 4
4
4
4 4 44
4
4
44
4
4
4
44
4
4
44 44
4
4
4
44
44
4 44
4
4
4
4
44
4
4 44
4
4
4
4
4
4
4
4
4
44
4 44
4
4
6
6
66
6
666
66
66
66
6
666
6
66
66
6
6
666
6
6
6
6
6
6
6
6
6
66
6
66
66
6
6
66
6
6
66
6
6
6
66
6
66
66
6 6
66
6
6
6
6
66
6
6
6
6
6 6
6
6
6
6
6
66
6
66
66
666 6
6
6
6
66
6
6
6
6
6
6 6
66
66
6
66
6
6
66
6
6
6
6
666
6
66 6
66
6
6
6
6
6
6
6 66
6
66
66
6
6
66
6
6
6
6
6
6 6
6
6
66
6
6
66
6 6
6
6
6
666
666
66
6
6
66
6
66
6
66
6
6
6
6
6
6
6
6
6
6
6
6666 6
6
6
6
6
6
66
6
6
6
66
6
6
6
6
6
6
7
77
77
7
7
77
7
7
7
77
77 77
77
7
77 7
7
77
7
7
7
77
7
7
7
77
7
7
77
7
7
7
7
77
7
77
7 77
77
7
7
7
7 77
7
77
77
77 7
7
7
7
77
7
7
7
7
7
7
7
7
7
7
7
7
7
7
7
7
7
7
7
7
77
7 7
777
77
7
77
77
7
777
7
7
7
7
77
77
777
777
7
7
77
77
77 7
7
7
7
7
7777 7
777
77777 7
7 7
7
77
77
77
77
7
7
7
7
7
7
7
7
7
77
7
7
7
7
7
7777
7 77
7
7
7 77
5
5
5
5
5
5
5
55
5
55
5
5
5
5
5 5
5
5
5
5
5
5
5
5
5
5
5
5
5
5
5
5
5
5
5
5
5
5
5
5
555 55
5
5
5
5
5
5 55
5
55
5
55
5
55
5
55
5
5
5
55
5
55
55
5
55
5
5
5
55
5
5
5
5
55
5
5
55
5
5
5
5555
5
55
5
5 5
5
55
5
5
5
5
5 5
5
5
5
5
5
5
5
55
5
5
5
5
5 5
5
5
5
5
5
5
5
5
5
5
5
5
5
5
5
5
5
5
5
5
5
55
55
55
5
5
55
5
5
5
5
55
5 55
52
22
2
22
22
22
2
2
2
2
2
2
22
22 2
2
2
2
2
2
22
2
2
2
2
2
2
22
22
2
22
2
2
2
2
22
22
2
2
2
2
2
2
22
2
2
2 2
22
22
2
2
22
22 2
22 2
2
2
2
22
22
22
2
2
2 2
22
22 22 2
22
2
2
2
2
222
2
2
22
2 2
222
222
22
2
2
Canonical Variate Plot --- Digit Test Data
Stanford University—April 28, 1997 Flexible Discriminant and Mixture Models 23'
&
$
%
PDA coefficients: Phoneme Data
Frequency
Can
onic
al F
unct
ion
0 50 100 150 200 250
-0.2
-0.1
0.0
0.1
0.2
order 1
Frequency
Can
onic
al F
unct
ion
0 50 100 150 200 250
-0.2
-0.1
0.0
0.1
0.2
order 2
Frequency
Can
onic
al F
unct
ion
0 50 100 150 200 250
-0.2
-0.1
0.0
0.1
0.2
order 3
Frequency
Can
onic
al F
unct
ion
0 50 100 150 200 250
-0.2
-0.1
0.0
0.1
0.2
order 4
Ordinary LDA coefficient functions for the phoneme data,
and regularized versions.
Stanford University—April 28, 1997 Flexible Discriminant and Mixture Models 24'
&
$
%
Mixture Discriminant Analysis: MDA
1
1
1 1
1
11
1 1
11
1
1
1
1
1
1
1
11
1
1
1
111
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1 1
1
1
1
1
1
1
1
1
1
11
1
1
1
1
1
1
111
1
1
1
1
1
11 1
1
1
1
1
1
1
11
1
1
1
1
1
1 1
1
1 11
1 111
1
1 11
1
1
11 1
1
1
1
1
1
11
11 1
11
1
11 11
1
1
1
1
11
1
1
1
1
1
11
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1 1
1
1
1
1
1
1
1
1
1
1
1
11
1
1
11
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1 1
11
1
1
1
1
1
1
1
1
11
11
1
1
1
11
1
1
1
1
1
1
1
1
1
1
1
1
1 11
11
1
1
1 1
1
1
1
1
1
1
1
1
1
11
1
1
1
1
1
11
1
1
11 1
11 111
1
11
1
1
1
1
1
1
1
1
1
1
1
1
1
11
1
1
1 1 1 11
1
1
1
1
11
1
1
1
111
1
1
11
1
1
11
2
2
2
2
2
2
2
2
2
2
2
2
2
22
2
222
2
2
2
2 2
2
2
22
222
2
2
2
2
2
2
2
2
2
2
2
2
2
2
22
2
2
2
2
22
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
22
2 2
22
2 22
2 22
2
2 2
2
2
2
2
2
2
2
2
2
2
2
22
2
2 2
2
2
22
2
2
22
2 2
2
22
2
22
22
2
2
2
2
2
2 2
2
2
22
2
2
2
2
2
22
2
2
2
2
2
2
2
2
2 2
2
222
2
2
2 2
2
22
2
222
22 2
2
2
2
2 2
2
2
2
2
2
2
2
2
2
2
2
2
2
2222
2 2
2
2
2
Linear Discriminant Analysis
1
1
1 1
1
11
1 1
11
1
1
1
1
1
1
1
11
1
1
1
111
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1 1
1
1
1
1
1
1
1
1
1
11
1
1
1
1
1
1
111
1
1
1
1
1
11 1
1
1
1
1
1
1
11
1
1
1
1
1
1 1
1
1 11
1 111
1
1 11
1
1
11 1
1
1
1
1
1
11
11 1
11
1
11 11
1
1
1
1
11
1
1
1
1
1
11
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1 1
1
1
1
1
1
1
1
1
1
1
1
11
1
1
11
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1 1
11
1
1
1
1
1
1
1
1
11
11
1
1
1
11
1
1
1
1
1
1
1
1
1
1
1
1
1 11
11
1
1
1 1
1
1
1
1
1
1
1
1
1
11
1
1
1
1
1
11
1
1
11 1
11 111
1
11
1
1
1
1
1
1
1
1
1
1
1
1
1
11
1
1
1 1 1 11
1
1
1
1
11
1
1
1
111
1
1
11
1
1
11
2
2
2
2
2
2
2
2
2
2
2
2
2
22
2
222
2
2
2
2 2
2
2
22
222
2
2
2
2
2
2
2
2
2
2
2
2
2
2
22
2
2
2
2
22
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
22
2 2
22
2 22
2 22
2
2 2
2
2
2
2
2
2
2
2
2
2
2
22
2
2 2
2
2
22
2
2
22
2 2
2
22
2
22
22
2
2
2
2
2
2 2
2
2
22
2
2
2
2
2
22
2
2
2
2
2
2
2
2
2 2
2
222
2
2
2 2
2
22
2
222
22 2
2
2
2
2 2
2
2
2
2
2
2
2
2
2
2
2
2
2
2222
2 2
2
2
2
Mixture Discriminant Analysis
••
•
1
1
1 1
1
11
1 1
11
1
1
1
1
1
1
1
11
1
1
1
111
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1 1
1
1
1
1
1
1
1
1
1
11
1
1
1
1
1
1
111
1
1
1
1
1
11 1
1
1
1
1
1
1
11
1
1
1
1
1
1 1
1
1 11
111
11
1 1
11
1
11 1
1
1
1
1
1
11
11 1
11
1
11 11
1
1
1
1
11
1
1
1
1
1
11
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1 1
1
1
1
1
1
1
1
1
1
1
1
11
1
1
11
1
1
1
1
1
1
1
1
1
11
1
1
1
1
1
1 1
11
1
1
1
1
1
1
1
1
11
11
1
1
1
11
1
1
1
1
1
1
1
1
1
1
1
1
1 11
11
1
1
1 11
1
1
1
1
1
1
1
1
11
1
1
1
1
1
11
1
1
11 1
11 111
1
11
1
1
1
1
1
1
1
1
1
1
1
1
1
11
1
1
1 1 1 11
1
1
1
1
11
1
1
1
111
1
1
11
1
1
11
2
2
2
2
2
2
2
2
2
2
2
2
2
22
2
222
2
2
2
2 2
2
2
22
222
2
2
2
2
2
2
2
2
2
2
2
2
2
2
22
2
2
2
2
22
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
22
22
22
2 22
2 22
2
2 2
2
2
2
2
2
2
2
2
2
2
2
22
2
2 2
2
2
22
2
2
22
2 2
2
22
2
22
22
2
2
2
2
2
2 2
2
2
22
2
2
2
2
2
22
2
2
2
2
2
2
2
2
2 2
2
222
2
2
2 2
2
22
2
222
22 2
2
2
2
2 2
2
2
2
2
2
2
2
2
2
2
2
2
2
2222
22
2
2
2
Learning Vector Quantization
••
••
•
•
•
1
1
1 1
1
11
1 1
11
1
1
1
1
1
1
1
11
1
1
1
111
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1 1
1
1
1
1
1
1
1
1
1
11
1
1
1
1
1
1
111
1
1
1
1
1
11 1
1
1
1
1
1
1
11
1
1
1
1
1
1 1
1
1 11
111
11
1 1
11
1
11 1
1
1
1
1
1
11
11 1
11
1
11 11
1
1
1
1
11
1
1
1
1
1
11
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1 1
1
1
1
1
1
1
1
1
1
1
1
11
1
1
11
1
1
1
1
1
1
1
1
1
11
1
1
1
1
1
1 1
11
1
1
1
1
1
1
1
1
11
11
1
1
1
11
1
1
1
1
1
1
1
1
1
1
1
1
1 11
11
1
1
1 11
1
1
1
1
1
1
1
1
11
1
1
1
1
1
11
1
1
11 1
11 111
1
11
1
1
1
1
1
1
1
1
1
1
1
1
1
11
1
1
1 1 1 11
1
1
1
1
11
1
1
1
111
1
1
11
1
1
11
2
2
2
2
2
2
2
2
2
2
2
2
2
22
2
222
2
2
2
2 2
2
2
22
222
2
2
2
2
2
2
2
2
2
2
2
2
2
2
22
2
2
2
2
22
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
22
22
22
2 22
2 22
2
2 2
2
2
2
2
2
2
2
2
2
2
2
22
2
2 2
2
2
22
2
2
22
2 2
2
22
2
22
22
2
2
2
2
2
2 2
2
2
22
2
2
2
2
2
22
2
2
2
2
2
2
2
2
2 2
2
222
2
2
2 2
2
22
2
222
22 2
2
2
2
2 2
2
2
2
2
2
2
2
2
2
2
2
2
2
2222
22
2
2
2
Flexible Discriminant Analysis
1
1
1 1
1
11
1 1
11
1
1
1
1
1
1
1
11
1
1
1
111
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1 1
1
1
1
1
1
1
1
1
1
11
1
1
1
1
1
1
111
1
1
1
1
1
11 1
1
1
1
1
1
1
11
1
1
1
1
1
1 1
1
1 11
111
11
1 11
1
1
11 1
1
1
1
1
1
11
11 1
11
1
11 11
1
1
1
1
11
1
1
1
1
1
11
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1 1
1
1
1
1
1
1
1
1
1
1
1
11
1
1
11
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
11
11
1
1
1
1
1
1
1
1
11
11
1
1
1
11
1
1
1
1
1
1
1
1
1
1
1
1
1 11
11
1
1
1 1
1
1
1
1
1
1
1
1
1
11
1
1
1
1
1
11
1
1
11 1
11 111
1
11
1
1
1
1
1
1
1
1
1
1
1
1
1
11
1
1
1 1 1 11
1
1
1
1
11
1
1
1
111
1
1
11
1
1
11
2
2
2
2
2
2
2
2
2
2
2
2
2
22
2
222
2
2
2
2 2
2
2
22
222
2
2
2
2
2
2
2
2
2
2
2
2
2
2
22
2
2
2
2
22
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
22
2 2
22
2 22
2 22
2
2 2
2
2
2
2
2
2
2
2
2
2
2
22
2
2 2
2
2
22
2
2
22
2 2
2
22
2
22
22
2
2
2
2
2
2 2
2
2
22
2
2
2
2
2
22
2
2
2
2
2
2
2
2
2 2
2
222
2
2
2 2
2
22
2
222
22 2
2
2
2
2 2
2
2
2
2
2
2
2
2
2
2
2
2
2
2222
2 2
2
2
2
Mixture Discriminant Analysis
Rank 1 Model
••
•1
1
1 1
1
11
1 1
11
1
1
1
1
1
1
1
11
1
1
1
111
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1 1
1
1
1
1
1
1
1
1
1
11
1
1
1
1
1
1
111
1
1
1
1
1
11 1
1
1
1
1
1
1
11
1
1
1
1
1
1 1
1
1 11
111
11
1 11
1
1
11 1
1
1
1
1
1
11
11 1
11
1
11 11
1
1
1
1
11
1
1
1
1
1
11
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1 1
1
1
1
1
1
1
1
1
1
1
1
11
1
1
11
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
11
11
1
1
1
1
1
1
1
1
11
11
1
1
1
11
1
1
1
1
1
1
1
1
1
1
1
1
1 11
11
1
1
1 1
1
1
1
1
1
1
1
1
1
11
1
1
1
1
1
11
1
1
11 1
11 111
1
11
1
1
1
1
1
1
1
1
1
1
1
1
1
11
1
1
1 1 1 11
1
1
1
1
11
1
1
1
111
1
1
11
1
1
11
2
2
2
2
2
2
2
2
2
2
2
2
2
22
2
222
2
2
2
2 2
2
2
22
222
2
2
2
2
2
2
2
2
2
2
2
2
2
2
22
2
2
2
2
22
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
22
2 2
22
2 22
2 22
2
2 2
2
2
2
2
2
2
2
2
2
2
2
22
2
2 2
2
2
22
2
2
22
2 2
2
22
2
22
22
2
2
2
2
2
2 2
2
2
22
2
2
2
2
2
22
2
2
2
2
2
2
2
2
2 2
2
222
2
2
2 2
2
22
2
222
22 2
2
2
2
2 2
2
2
2
2
2
2
2
2
2
2
2
2
2
2222
2 2
2
2
2
MDA/FDA
Stanford University—April 28, 1997 Flexible Discriminant and Mixture Models 25'
&
$
%
Gaussian Mixture Model
P (XjG = j) = RjXr=1 �jr�(X;�jr;�); Mixture of Gaussians
ThenP (G = jjX = x) = PRjr=1 �jr�(X;�jr;�)�jPJ=1PR`r=1 �`r�(X;�`r;�)�`Estimate parameters by maximum likelihood of P (X;G) (possibly
subject to rank constraints!)maxrankf�jrg=K;� JXj=1Xgi=j log( RjXr=1 �jr�(xi;�jr;�)�j)
Note: reduced rank amounts to dimension reduction in predictor
space!
Stanford University—April 28, 1997 Flexible Discriminant and Mixture Models 26'
&
$
%
EM and Optimal Scoring
E-Step:
Compute memberships Prob(obs 2 rth subclass of class j jx; j)W (crjx; j) = �r�(x;�jr;�)PRjk=1 �jk�(x;�jk;�)M-Step:Construct Random Response Matrix Z with elementsW (cjrjx; j):0BBBBBBBBBB@
c11 c12 c13 c21 c22 c23 c31 c32 c33g1 = 2 0 0 0 :3 :5 :2 0 0 0g2 = 1 :9 :1 :0 0 0 0 0 0 0g3 = 1 :1 :8 :1 0 0 0 0 0 0g4 = 3 0 0 0 0 0 0 :5 :4 :1g5 = 2 0 0 0 :7 :1 :2 0 0 0...
...gn = 3 0 0 0 0 0 0 :1 :1 :8
1CCCCCCCCCCAZ = SZZT Z = �D�TUpdate �s and �s
9>>=>>;, M-step of MDA
Stanford University—April 28, 1997 Flexible Discriminant and Mixture Models 27'
&
$
%
Waveform Example
xi = uh1(i) + (1� u)h2(i) + �i Class 1xi = uh1(i) + (1� u)h3(i) + �i Class 2xi = uh2(i) + (1� u)h3(i) + �i Class 3
1 1 1 1 1 1 1 1 1
1
1
1
1
1
1
1
1
1
1
1
12 2 2 2 2 2 2 2 2
2
2
2
2
2
2
2
2
2
2
2
23 3 3 3 3
3
3
3
3
3
33
33
3
3
33
33
34 4 4 4 4
4
4
4
4
4
4
4
4
4
4
4
44
44
45 5 5 5 5
5
5
5
5
5
5 5 5 5 5
5
55
55
5
Class 1
11
11
1
1
1
1
1
1
1
1
1
1
1
1
1 1 1 1 12 2 2 2 2
2
2
2
2
2
2
2
2
2
2
2
2 2 2 2 23 3 3 3 3
3
3
3
3
3
3
3
3
3
3
3
3 3 3 3 34
4
4
4
4
4
44
44
4
4
44
44
4 4 4 4 45
5
5
5
5
5
55
55
5
5
55
55
5 5 5 5 5
Class 2
1
1
1
1
1
1
1
1
1
1
1
1
11
11
11
11
12
2
2
2
2
2
2
2
2
2
2
2
22
22
22
22
23
3
3
3
3
3
3
3
3 3 3 3 33
33
33
33
34
4
4
4
4
4
4
4
4
4
4
4
4 4 4 4 4 4 4 4 45
5
5
5
5
5
5
5
5
5
5
5
5 5 5 5 5 5 5 5 5
Class 3
Stanford University—April 28, 1997 Flexible Discriminant and Mixture Models 28'
&
$
%
Discriminant Var 1
Dis
crim
inan
t Var
2
-6 -4 -2 0 2 4 6
-6-4
-20
24
1
1
1
11
1
1
11
1
11
1
1
1
1
1
1
1
1
111
1
1
1
11
1
11
1
1
1 1
1
1
11
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
11
1
1
11
1
1
1
1
1
1
11
11 1
1
11
1
1 1
1
1
1
1
1
1
1
1
1
1
1 1
1
1
11
1
11
11 1
1
1
1
1
11
1
1
1
1
1
11
1
1
1
11
1
11
1
1
1
1
1
1
1
1
1
1
1
1
1
1
11
1
1
1
1
11
1
1
1
1
11
1
1
1
1
1
1
1
1
11
1
1
1
1
1
1
11
1
1
1
1
11
11
1
1
1
2
2
2
2
2
2
2
2
22
2 2
22
22
2
2
22
2
2
2
2
2
2
2
2
2
222
2
2
2
2
22
2 22
22
2
2
2
2
2
2
2
22
2
2
2
2
2
22
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
22
2
22
2
2
2
2
2
2
2
2
2
2
222
2
2
2
2
2
22
2
2
2
2
2
2
2
2
2
2
2
2
2
22
2
2
2
2
2
22
2
2
2
2
2
2
2
2
22
2
2
2
23
33 3
33
33
3 333
3
3
3
33
3
33
3
3
3
3
3
33
33
33
33
3
33
3
33
333
3
333
3 33
3
3
3
3
3
3
3
3 3
333
3
3
3
3
333
3
3
3
33
33
3
3
3
3
33
3
3
33 3
3
3
3
3
33
3
3
33
3
3
3
3
3
3
3 3 3
3 33 3
3
333
3
3
3
3
3
33
33
3
33
33
33
3
33
3
3
33
3
3 3
3
33
3
3
33
3
33
3
3
33
3
33 33
3
3
33 3
3
33
3
3
3
33
3 33
33
3 subclasses, penalized 4df
Discriminant Var 1
Dis
crim
inan
t Var
3
-6 -4 -2 0 2 4 6
-2-1
01
2
1
1
11
1
1
11
1
1
1
1
1
1
1
1
1
1
11
1
1
11
1
1
1
1
111
1
1
1
1
11
1
1
1
1
1
1
1
1 1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
11
1
1
11
1
1
1
1
1
1
111 1
11
11
1
1
1
1
11
1
1
1
1
1
1
1
1
1
1
1
111
1
1
11
1 1
1
1
1
1 11
1
11
11
11
1
1
1
1
1
1
11
1
1
1
11
1
1
11
1
1
1
1
11
1
11
1
1
1
1
1
1
11
1
1 11 22
2
222
2
2 2
2
2
2
22
2
2
22
2
2
2
2
2
2
2
2
2
22
22
2
2
2
22
222
22
2
2
2
2
2
22
2
2
2
2
2
2
2
2
2
2
2
2
22
2
2
2
2
2
2 2
2
2
2
2 2
2
2
22
2
2
2
2
2
2
2
2
222
2
222
2
2
2
2
222
2
2
2
2
2
2
2
2
2 2
2
2
2
2
22
2
2
2
2
2
2
22
2
2
22
2
22
2
2
2
22
2
22
2
2
2
2
22
2
2
2
2
22
2
22 2
3
3 3 3
3
3
3
3
3
3
3
3
3
3
3
33
3
3
3
33
3
3
3
3
3
3
33
3
3
3
3
3
3 3
33 3
3
3
3
3
33
3333
3
33 3
3
3
3
3
3
3
3
3
3
3
3
3
3
3 3
33
3
33
3
33
3
3
3
3
33
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
33
3
3
333
333
3 3
3
3
3
3
3
33
3
3
3
33
3
3
33
3
3
33
3
3
3
3
3
3
3
33
3 3
3
33
33
33
333
3
3
3 3
3
3
33
3
3
3
3
3
3
33 3
3 subclasses, penalized 4df
Stanford University—April 28, 1997 Flexible Discriminant and Mixture Models 29'
&
$
%
Penalized MDA coefficients
Class 1
Dis
crim
inan
t Fun
ctio
ns
Class 2
Dis
crim
inan
t Fun
ctio
ns
Class 3
Dis
crim
inan
t Fun
ctio
ns
Stanford
University—
April28,1997
Flexible
Discrim
inantandM
ixtureM
odels30
'&
$%
Simulation Results for Waveform Example
Technique Error Rates
Training Test
LDA 0.121(.006) 0.191(.006)
QDA 0.039(.004) 0.205(.006)
CART 0.072(.003) 0.289(.004)
FDA/MARS (degree = 1) 0.100(.006) 0.191(.006)
FDA/MARS (degree = 2) 0.068(.004) 0.215(.002)
MDA (3 subclasses) 0.087(.005) 0.169(.006)
MDA (3 subclasses, penalized 4df) 0.137(.006) 0.157(.005)
PDA (penalized 4df) 0.150(.005) 0.171(.005)
Stanford University—April 28, 1997 Flexible Discriminant and Mixture Models 31'
&
$
%
Gaussian Mixtures and RBFs
P (X;G = j) = RXr=1 �r�(X;�r;�)Pr(j)This is a mixture of joint-densities Pr(X;G) with R shared
mixture components. ThenP (G = jjX = x) = PRr=1 �r�(x;�r;�)Pr(j)PRr=1 �r�(x;�r;�)This closely resembles (renormalized) RBFs.
Estimate parameters by maximum likelihood of P (X;G) (possibly
subject to rank constraints!)maxrankf�rg=K;� JXj=1Xgi=j log( RXr=1 �r�(xi;�r;�)Pr(j)
E-M algorithm again yields optimal scoring Z = SZ , where nowZ is the full random response matrix described earlier.
Stanford University—April 28, 1997 Flexible Discriminant and Mixture Models 32'
&
$
%
EM and Optimal Scoring
E-Step:
Compute memberships Prob(obs 2 rth subclass jx; j)W (crjx; j) = �r�(x;�r;�)Pr(j)PRk=1 �k�(x;�k;�)Pk(j)i.e. membership depends on jjx� �kjj2� � 2 logPk(j).M-Step:Construct Random Response Matrix Z with elementsW (crjx; j): 0BBBBBBB@
c1 c2 c3 c4 c5 c6 c7 c8 c9g1 = 2 0 :1 0 :2 :4 :2 0 :1 0g2 = 1 :8 :1 :0 0 :1 0 0 0 0g3 = 1 :1 :6 :1 0 0 0 0 :1 :1g4 = 3 0 0 0 0 0 0 :5 :4 :1...
...gn = 3 0 0 0 :1 0 0 0 :1 :8
1CCCCCCCAZ = SZZT Z = �D�TUpdate �s and �s
9>>=>>;, M-step of MDA