Flexible Discriminant and Mixture Models - Stanford University

32
Stanford University—April 28, 1997 Flexible Discriminant and Mixture Models 1 Flexible Discriminant and Mixture Models Trevor Hastie [email protected] Statistics Department and Division of Biostatistics Stanford University joint with Andreas Buja and Rob Tibshirani April 28, 1997 Papers fda.ps.Z, pda.ps.Z and mda.ps.Z are available from: ftp://playfair.stanford/edu/pub/hastie

Transcript of Flexible Discriminant and Mixture Models - Stanford University

Stanford University—April 28, 1997 Flexible Discriminant and Mixture Models 1'

&

$

%

Flexible Discriminantand Mixture Models

Trevor Hastie

[email protected]

Statistics Department

and Division of Biostatistics

Stanford University

joint with Andreas Buja and Rob Tibshirani

April 28, 1997

Papers fda.ps.Z, pda.ps.Z and mda.ps.Z are available from:

ftp://playfair.stanford/edu/pub/hastie

Stanford University—April 28, 1997 Flexible Discriminant and Mixture Models 2'

&

$

%

Theme: Modular Extensions of Standard Tools

Linear Discriminant Analysis or LDA is a classic technique

for discrimination and classification

Virtues of LDA:

+ Simple prototype method for multiple class classification

+ Can produce optimal low dimensional views of the data

+ Sometimes produces the best results; e.g. LDA featured

in top 3 classifiers for 11/22 of the STATLOG datasets,

overall winner in 3/22.

Limitations of LDA:

- Lots of data, many predictors: LDA underfits (restricts to

linear boundaries)

- Many correlated predictors: LDA (noisy/wiggly

coefficients)

- Dimension reduction limited by the number of classes

Stanford University—April 28, 1997 Flexible Discriminant and Mixture Models 3'

&

$

%

Example of extension: FDA

� Y = SX(Y ) where Y is an indicator response matrix

and SX a regression procedure (Linear regression,

Polynomial Regression, Additive Models, MARS, Neural

Network, � � �)� eigenY T Y = eigenY TSXY ) LDA, flexible

extensions of LDA.

Typically this amounts to expanding/selecting the predictors

via basis transformations chosen by regression, and then

(penalized) LDA in the new space.

Stanford University—April 28, 1997 Flexible Discriminant and Mixture Models 4'

&

$

%

Example: Vowel Recognition

Vowel Word Vowel Word

i heed o hod

I hid c: hoard

E head U hood

A had u: who’d

a: hard 3: heard

y hud

11 symbols, 8 speakers (train), 7 speakers (test), 6

replications each. 10 inputs features based on digitized

utterances.

Source: Tony Robinson, via Scott Falman, CMU

GOAL: Predict Vowels

Stanford University—April 28, 1997 Flexible Discriminant and Mixture Models 5'

&

$

%

Some Results

Technique Error rates

Training Test

(1) LDA 0.32 0.56

Softmax 0.48 0.67

(2) QDA 0.01 0.53

(3) CART 0.05 0.56

(4) CART (linear combination splits) 0.05 0.54

(5) Single-layer Perceptron 0.67

(6) Multi-layer Perceptron (88 hidden units) 0.49

(7) Gaussian Node Network ( 528 hidden units) 0.45

(8) Nearest Neighbor 0.44

(9) FDA/BRUTO 0.06 0.44

Softmax 0.11 0.50

(10) FDA/MARS (degree = 1) 0.09 0.45

Best reduced dimension (=2) 0.18 0.42

Softmax 0.14 0.48

(11) FDA/MARS (degree = 2) 0.02 0.42

Best reduced dimension (=6) 0.13 0.39

Softmax 0.10 0.50

Stanford University—April 28, 1997 Flexible Discriminant and Mixture Models 6'

&

$

%Coordinate 1 for Training Data

Coo

rdin

ate

2 fo

r T

rain

ing

Dat

a

-4 -2 0 2 4

-6-4

-20

24

oooooo

ooo

o

oo

ooo

o oo

ooo

ooo

o o oo oo

ooo o oo

oo

o

o

o o

o oo

ooo

oooooo

o

o

oo

o

o

oooo

o

o

oooooo

o oooo

o

o

o

oooo

oooooo

o

oo

oo

o

oooooooo

ooo

o

o

ooo

o

o

o ooooo

ooo o oo

oooooo

oo o ooo

o ooo

oo

oooooo

oooooo

oo

oooo

oooooo

oooooo

oooooo

oooooo

oo

oooo

ooo

oo

o

o ooooo

ooooo

o

oooooooooooo

oooooo

oooooo

oo

o

ooo

ooooooo ooo o o

o ooo

ooo

oooo

o

ooo

o

oo

oooo

o

o

oooo

o

o oo

o

ooo

oooooo

oooooo

oo

oo oo o

oo

oo o

o o o

oo

ooooooo

oooo

o

o

oo

o

ooo

ooo

ooo

o o ooo o

ooooo

o

ooooo

o

o o

oo

oo

ooo ooo

ooo

oo o

o

oo

o o o

ooooo

o

oooo

o

o

o

oo

ooo

ooooo

o

oooo

oo

o oooo

o

oooooo

ooo o

oo

ooooooo oo

o o o

oooooo

oo

ooo o

oo

o oo o

ooo

ooo

o

o

oo

oo

o oo

oooooo

ooo

oooooo

ooo

o ooo oo

oo

oooo

ooo

ooo

ooo

oooooo

oo

o

oo o

••••

•••• •••• •• ••

•• ••••

Linear Discriminant Analysis

Stanford University—April 28, 1997 Flexible Discriminant and Mixture Models 7'

&

$

%Coordinate 1 for Training Data

Coo

rdin

ate

2 fo

r T

rain

ing

Dat

a

-10 -5 0 5

-6-4

-20

24

6

oooooo

oo

ooo

o

oooooo ooooo

ooooooo o

o o oooooo

oo

o

o

ooo

o o

oo

oo o oo

oooo

o

ooooo

o

oooooooooo

o

o

oo

oooo

oo

o ooo

oo

o o oo

ooooo

o

ooooo

oo ooooo

ooooooo

oo

o

oo

ooo

oo

o

ooo

oo

o

oooo

oo

oo

oooo

oooooo

oooooooooooo

oooooo

ooo

ooo

oooo oo

oooo

oo

ooo

ooo

ooooooooooo

o

oo

o

ooooooo

oooo

oooo

oooooo

oo

oo

o

o oooooo

o

o

oo

o

o

ooooo

o

oo

ooo

o o

o

oooo

oooooo

oooo

ooo

oo

ooo

oooo o

o

oo

ooooo

o

ooo

o

oo

o

o

o

o

o

oooo

o

oooo

oo

ooo oo

o

oo

oooo

ooo

o

oo o

oo

oo

ooooo o

oo

oooo

o

o

ooo

o

oo

o

o

oo

o

ooooo o

oo

o

oo o

ooo oo

o

oo

oo

oo

o

o

o

oo o o

o

ooo

o

oo oooo

oo

oo

o

o

oo

o

o

o

o

oooo

o

o

ooooo

oo

oo ooo

ooooo

o

ooooo

o

oooo

oo

oo

oooo

oooooo

o

oo

o

oo

o ooo

ooo

oo

o o

o

oo

ooo o

o ooooooo oo

o

o

oooooo

o

ooooo

oo

ooo

o

•• •••• ••

••••

••••

••••

••

Flexible Discriminant Analysis

Stanford University—April 28, 1997 Flexible Discriminant and Mixture Models 8'

&

$

%

List of Extensions

� (Reduced Rank) LDA! (reduced rank) FDA via flexible

regression: Y = SXY� (Reduced rank) LDA! (reduced rank) PDA (Penalized

Discriminant Analysis) via penalized regressionY = SY = [X(XTX + �)�1XT ]Y , e.g. for

image and signal classification.� (Reduced rank) Mixture models. Each class a mixture of

Gaussians. Each iteration of EM is a special form of

FDA/PDA: Z = SZ where Z is a random response

matrix.� In the mixture model above the classes can share

centers (radial basis functions) or own separate ones.

Separate Centers per Class

Z = Z =

Common Centers

Stanford University—April 28, 1997 Flexible Discriminant and Mixture Models 9'

&

$

%

� In the above we use full likelihood for training:P (X;G) = P (GjX)P (X); what if we use the

conditional likelihood P (GjX)?[+ LDA becomes multinomial regression, and the

non-parametric versions likewise.

[+ The mixture problems again result in an E-M

algorithm, where each M-step is a multinomial

regression with random response Z .

References:Breiman and Ihaka (1984) Unpublished manuscript

Campbell (1980) Applied Statistics

Kiiveri (1982) Technometrics

Ripley and Hjort (1995) Monograph in preparation

Ahmad and Tresp (1994) papers on RBFs

Bishop (1995) Monograph preprint

Stanford University—April 28, 1997 Flexible Discriminant and Mixture Models 10'

&

$

%

Details of LDA

LDA is Bayes rule for P (XjG) Gaussian with density�(X;�j ;�) = 1(2�pj�j) 12 e� 12 (X��j)T��1(X��j)P (jjx) = �(x;�j ;�)�jP` �(x;�`;�)�`� exp�xT��1�j � 12�Tj ��1�j + log�j�= exp(xT�j + �j)

Note: maxrankf�jg=K;� JXj=1Xgi=j [log�(xi;�j ;�) + log�j ]

is equivalent to Fisher’s rank-K LDA:max vTk Bvk subject to vTkWvk = 1; k = 1; : : : ; Kwhere B and W are the sample Between and Within covariance

matrices. The latter is the usual formulation of Fisher’s LDA.

Stanford University—April 28, 1997 Flexible Discriminant and Mixture Models 11'

&

$

%

LDA, FDA () Optimal ScoringnXi=1[�(gi)� �(xi)]2 + J(�) = minwith normalization

Pi �2(gi) = 1, and “roughness”

penalty functional J .

�(xi) =8>>>>>>>><>>>>>>>>:

xTi � LDAf1(xi1) + f2(xi2) + : : :+ fp(xip)MARS(xi)NN(xi)...

Often �(x) =Pm hm(x)�m and J(�) = �T�.

There is a 1-1 correspondence between optimal scoring and

Fishers LDA, as well as the flexible extensions.

Stanford University—April 28, 1997 Flexible Discriminant and Mixture Models 12'

&

$

%

Optimal scoring algorithm for LDA/FDA

Indicator Response Matrix Y0BBBBBBBBBBB@

C1 C2 C3 C4 C5g1 = 2 0 1 0 0 0g2 = 1 1 0 0 0 0g3 = 1 1 0 0 0 0g4 = 5 0 0 0 0 1g5 = 4 0 0 0 1 0...

...gn = 3 0 0 1 0 0

1CCCCCCCCCCCAY = SYY T Y = �D�T

where S = linear regression, additive regression, MARS,

NN,. . . , each giving a different version of FDA.

Stanford University—April 28, 1997 Flexible Discriminant and Mixture Models 13'

&

$

%

FDA and Penalized Discriminant Analysis

The steps in FDA are� Enlarge the set of predictors X via a basis expansionh(X), and hence inject us into a higher dimensional

space.� Use (penalized) LDA in the enlarged space, where the

penalized Mahalanobis distance is given byD(x; �) = (h(x)�h(�))T (�W+)�1(h(x)�h(�))�W is defined in terms of bases functions h(xi).� Decompose the classification subspace using a

penalized metric:max tr(UT�BetU) subject to UT (�W +)U = I

Stanford University—April 28, 1997 Flexible Discriminant and Mixture Models 14'

&

$

%

Skin of the Orange

x[,1]

x[,2

]

-4 -2 0 2 4

-4-2

02

46

00

00 00

000

0

0

00

00

00 0

00

0

00

0

0

000 0

0

0

0

00

0

000

0 0000

0

0

0

00

0

00

0

0

00

0

0

000

00

0

00 0

0

0

0

000

0

000

00 0

0 0

0 0

0

000

00 0

000

0

0

00 00

0

00

0000 0

0

000

0

0

0

0

0

0

00

0

00

0

0 000

000

00

00 0

0

0

000 00

0

00

0 0 0

00

00 0

0

00 0

0

00 0

0

00

0

00

0

0

0

0

0

00 0

0

0

000

0

0 00

00

0

00

00

00

00

0

0

00

0

0

0

0

0

0

0

0

0

0

0

000

0

0

0

000

0

0

0

0

0 0

0

0

0 0

0

0

0

000

0

0

0

0

0

0

0

0

0

00

0

0

0

0

00

0

0

0

00

00

0

00

0

000

0

0

0

0

0

0

0

0

0

00

00

0

0

0

0 0

0

0

0

00

0

0

0

0

0

00 00

0

0

0

0

00

000

00

0

0

00

0

0

0 00 00

0

0

00

0

0

0

00

0

0

0

0

00

0

0

00

0

0

0

00

0

0

0

0

0

0

0

00

00

0

0

0

00

0

0

0

0 000

0

0

0

0

0

0

0 0

0

0

0

0

00

0

0

0

00

0

00

0

00

00

0

0

00

00

Training Data

x[,1]

x[,2

]

-4 -2 0 2 4

-4-2

02

46

00

00 00

000

0

0

0

00

00 0

00

0

00

0

000

0

00

0

000

0 0000

0

0

00

00

0

00

0000

00

0

00 0

0

0

0

000

0

000

00 0

0 0

0 0000

00 0

000

0

00 00

0

00

0000 0

000

0

0

0

0

0

0

00 000 000

00

00

00 0

0

0

00 00

00

0 0 0

00 0

0

00 0

00 0

0

00

0

00

0

0

0

0

0

00 0

0

000

0

0 00

00

00

00

0

000

00

0

0

0

000 00 0

00

0

0

00 0

0

00

000

00

0 0

0

0

0

0

0 000

0

0

00

0

0

000

0

0

0

0

0

00

00

00

00

00 000

0

0

0

00

0

000

0

00

0 00

0

0

0

0

000

0

000

0

00

0

0

0

00

0

0

00

0 0

00

00

0

00

0

0

0

00

0

0

0

0

0 0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

00

00

0

00

0

000

0 0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

000

0

0

0

0

0 00 00

0

00

0

0

0

0

0

00

0

0

00

0

0

0

0

0

0

0

000

0

0

0

000

0

0

0

0

0

0

0

0

00

0

00

0

00

Predicted Classes

Stanford University—April 28, 1997 Flexible Discriminant and Mixture Models 15'

&

$

%

FDA vs Regression

In FDA algorithm, we decomposeY TS(�)Y = Y T YY fitted values for dummy response matrix

Why not stop at Y � E(Y jX)?i.e. for new x, computey1(x); y2(x); : : : ; yJ(x)and assign x to the class j with the largest yj(x).

Stanford University—April 28, 1997 Flexible Discriminant and Mixture Models 16'

&

$

%

Softmax after Linear Regression

1

1

1

1

1

1

11

1

1

1

11

1

1

11

1

11

1

1

11

1

1

1

1

1

11

1

1

11

1

11

1

1 1

1

11 1

1

1

1

1

1

1

11

11

1

1

1

1

1

11

1

1

1

11

111

1

1

111

1

1

1

1

11

11

1

1

1

1

1

11

1

11

1

11

1

1

1

11

1

1

11

1

1

1

1

1

1

1

1

1

11

1

11

1

1

1

1

1

1 11

11

1

1

1

1

1

1

1

1

1

1

1

11 1

1 1

1

111

1

1

1

1

1

1

1

1 1

1

1

1

1

1

11

1 1

11

1

1

1 1

11 1

1

1

1

1

11

1

1

1

1

1

1 1

11

1

1

1

1

111

1

1

1

11

111

1 1

1

1

1

1

1

1

11 11

1

1

11

1

11

1

1

1

1

11

1

1

1

1

1 1

1

1

1

1

11

1

11

11

11

1

1

1 1

1

11

1

11

1

1

11

1

1

1

1

1

1

1

1

11

1

1

1

1

1

1

1

1

11

11

1 1

1

1

1

1

1

1

1 1

1

1

1

1

11

1

1

1

1

1

1

1

1

11

1

1

1

1

1

1

1

1

1

1

11 11

1

11

1

1

1

1

1

1

1

111

1

1

1

1

1

11

1

1

1

1

11

1

1

11

1

1

1

1

1

1

1

11

1

1

1

1

111

1

1

11

1

11

1

1

1

1

1

1

1

1

1

1

11

11

1

1

1

1

11

1

1 1

1

11

11

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

111

1

1

1

1

1 11 1

11

1

1

1

1

1

1

1

1

1

11

1

1

1

1

1

11

1

1

1

1

1

1

1

11

1

11

11

1

11

1

1

1

1

1

1 1

1

1

1

1

11

11

1

1

1

1

1111

1

1

1

11

1

2 2

22

2

2

2

2

2

22

2

2

2

22

22

2

22

2

2

2

22

2

2

2

2

22

2

2

2

2

22

2

2

2

2

2

2

2

2

222

222

2

2

2

2

2

2

2

2

2

2

2

2

2

22

2

22

2

2

22

2

2

2

2

2

2

2

2

22

2

2

2

222

2

2

2

2

2

22

2

2

2

2

2

2

2

2

2 22

2

222

2

2

22

2

2

2

2

2

2

2

2

2

2

2

2

2 2

2

2

2

2 22 2

2

2

2

22

2

22

2

2

2

22

2

22

22

2

2

2

2

2

2 222 2

22

22

2

2

2

2

2 2

2

2

2

2

22 22

2

2

2

2

2

2

2

2

22

2

2

2

2

2

22

2

2

2

22

2

2

2

2

2

2

22

22

2

2

2

2

2

2

2

2

2

22

2

22

22

2

2

2

2

2

22

2 2

2

2

222

2

22

2

2

2

2

2

22

2

2

22

22

2

2

22

2 2

2

22

2

2

2

2

2

2

2

2

22

2 2

2

22

2

22

2

2

2

2

2

2

2

2

2

2

2

2

22

2

22

2

2

2

2

2

2

2 2

2

2

2

2

2

22

22

2

2 22

22

2

2

2

2

22

2

2

2

2

22

2

22

2

2

2

2

22

22

2

22

2 22

2

2

22

2

2

2

2

2

2

2

22

2

22

2

2

22 22

22

2

2

2

2

2

2

22

2

2

2

2

2

2

22 2

2

22

2

2

2

222 2

22

2

2

22

2

22

2

2

22 22

2

2

2

2

2

22

22

22

2

22

2

2

2

2

2

2

2

222

22

22

22

2

2

2

2

2

2

2

2 22

2

2

2

2

2

22

2

2

22

2

2

2

2

2

2

2

22

2

2

22 22

2

2

2

2

22

22

22

2

3 3

3

3

33

33 3

333

3

3 3

33

33

3

3

33 3

3

3

3

3

3

3

3

3

3

3

3

3 3

33

3

3

3

33

3

33

3

3

3

3

3

3

3 3

3

3

33

3

3

3

3

3

3

33

3

3

3

3

3

3

33

3

3

33

3

3

3 3

3

3

3

3

33

3

3

33

3

33

3

3

3

3

3

3 3

3

3

3

3

3

3

3

3

3

33

3

3

3

3

3

3

33

3

3

3

3

3

33

333

3

3

3

3

333 3

3

3 3

3

3 3

3

3

3

33

3

3

3

3

3

3

3

3

3

3

3

3 33

3

3

3

3

3

3

33

3

3

3

3

3

3

3

3

3

33 33

33

33

3

3

33

3

3

33

33

3

3

3

3

33

33

3

3

3

3

33

33

3

3

3

3

3

3

3

3

33

3

3

3

33

33 3

3

33

3

3

33

3

3

3

3

33 3

3

3

3

3

3

3

3

3

3

3 3

3

3

3

3

3

33

3

3

3

3

3

3

3

3

3

33

3

3

3

3

3

33

3

333

3

3

3

3

3 3 33

3

33

3

3

3

3

3 3

3

33

33

3

3

3

3

3

3

33

3

3

3 3

3

3

3

3

3

3

3

33

33

3

3

33

3

3

3

3

3 3

3

3

3

33

3

3

3

3

33

3

33

3

3

3

3

3

3

3

3

33

33

3

3

33

3

33

3

3

3

3

3

33

33

3

3

3

33

33

3

3

33

3

3

3 3

3

3

3

33

3

3

33

3

33

3

33 3

3

33

33

3

3

3

3

3

3

3

3

3

3

3

3

33 3

3

3

3

333

33

3 3

33 3

3 33

3

33

33

3

3

33

3

3

3 3

3 3

33 3

33

3

3

3

33

3

3

33

3

3

3

33

33

33

3

33 3 3

3

3

33

3

X1

X2

error: 0.3333LDA

X1

X2

1

1

1

1

1

1

11

1

1

1

11

1

1

11

1

11

1

1

11

1

1

1

1

1

11

1

1

11

1

11

1

1 1

1

11 1

1

1

1

1

1

1

11

11

1

1

1

1

1

11

1

1

1

11

111

1

1

111

1

1

1

1

11

11

1

1

1

1

1

11

1

11

1

11

1

1

1

11

1

1

11

1

1

1

1

1

1

1

1

1

11

1

11

1

1

1

1

1

1 11

11

1

1

1

1

1

1

1

1

1

1

1

11 1

1 1

1

111

1

1

1

1

1

1

1

1 1

1

1

1

1

1

11

1 1

11

1

1

1 1

11 1

1

1

1

1

11

1

1

1

1

1

1 1

11

1

1

1

1

111

1

1

1

11

111

1 1

1

1

1

1

1

1

11 11

1

1

11

1

11

1

1

1

1

11

1

1

1

1

1 1

1

1

1

1

11

1

11

11

11

1

1

1 1

1

11

1

11

1

1

11

1

1

1

1

1

1

1

1

11

1

1

1

1

1

1

1

1

11

11

1 1

1

1

1

1

1

1

1 1

1

1

1

1

11

1

1

1

1

1

1

1

1

11

1

1

1

1

1

1

1

1

1

1

11 11

1

11

1

1

1

1

1

1

1

111

1

1

1

1

1

11

1

1

1

1

11

1

1

11

1

1

1

1

1

1

1

11

1

1

1

1

111

1

1

11

1

11

1

1

1

1

1

1

1

1

1

1

11

11

1

1

1

1

11

1

1 1

1

11

11

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

111

1

1

1

1

1 11 1

11

1

1

1

1

1

1

1

1

1

11

1

1

1

1

1

11

1

1

1

1

1

1

1

11

1

11

11

1

11

1

1

1

1

1

1 1

1

1

1

1

11

11

1

1

1

1

1111

1

1

1

11

1

2 2

22

2

2

2

2

2

22

2

2

2

22

22

2

22

2

2

2

22

2

2

2

2

22

2

2

2

2

22

2

2

2

2

2

2

2

2

222

222

2

2

2

2

2

2

2

2

2

2

2

2

2

22

2

22

2

2

22

2

2

2

2

2

2

2

2

22

2

2

2

222

2

2

2

2

2

22

2

2

2

2

2

2

2

2

2 22

2

222

2

2

22

2

2

2

2

2

2

2

2

2

2

2

2

2 2

2

2

2

2 22 2

2

2

2

22

2

22

2

2

2

22

2

22

22

2

2

2

2

2

2 222 2

22

22

2

2

2

2

2 2

2

2

2

2

22 22

2

2

2

2

2

2

2

2

22

2

2

2

2

2

22

2

2

2

22

2

2

2

2

2

2

22

22

2

2

2

2

2

2

2

2

2

22

2

22

22

2

2

2

2

2

22

2 2

2

2

222

2

22

2

2

2

2

2

22

2

2

22

22

2

2

22

2 2

2

22

2

2

2

2

2

2

2

2

22

2 2

2

22

2

22

2

2

2

2

2

2

2

2

2

2

2

2

22

2

22

2

2

2

2

2

2

2 2

2

2

2

2

2

22

22

2

2 22

22

2

2

2

2

22

2

2

2

2

22

2

22

2

2

2

2

22

22

2

22

2 22

2

2

22

2

2

2

2

2

2

2

22

2

22

2

2

22 22

22

2

2

2

2

2

2

22

2

2

2

2

2

2

22 2

2

22

2

2

2

222 2

22

2

2

22

2

22

2

2

22 22

2

2

2

2

2

22

22

22

2

22

2

2

2

2

2

2

2

222

22

22

22

2

2

2

2

2

2

2

2 22

2

2

2

2

2

22

2

2

22

2

2

2

2

2

2

2

22

2

2

22 22

2

2

2

2

22

22

22

2

3 3

3

3

33

33 3

333

3

3 3

33

33

3

3

33 3

3

3

3

3

3

3

3

3

3

3

3

3 3

33

3

3

3

33

3

33

3

3

3

3

3

3

3 3

3

3

33

3

3

3

3

3

3

33

3

3

3

3

3

3

33

3

3

33

3

3

3 3

3

3

3

3

33

3

3

33

3

33

3

3

3

3

3

3 3

3

3

3

3

3

3

3

3

3

33

3

3

3

3

3

3

33

3

3

3

3

3

33

333

3

3

3

3

333 3

3

3 3

3

3 3

3

3

3

33

3

3

3

3

3

3

3

3

3

3

3

3 33

3

3

3

3

3

3

33

3

3

3

3

3

3

3

3

3

33 33

33

33

3

3

33

3

3

33

33

3

3

3

3

33

33

3

3

3

3

33

33

3

3

3

3

3

3

3

3

33

3

3

3

33

33 3

3

33

3

3

33

3

3

3

3

33 3

3

3

3

3

3

3

3

3

3

3 3

3

3

3

3

3

33

3

3

3

3

3

3

3

3

3

33

3

3

3

3

3

33

3

333

3

3

3

3

3 3 33

3

33

3

3

3

3

3 3

3

33

33

3

3

3

3

3

3

33

3

3

3 3

3

3

3

3

3

3

3

33

33

3

3

33

3

3

3

3

3 3

3

3

3

33

3

3

3

3

33

3

33

3

3

3

3

3

3

3

3

33

33

3

3

33

3

33

3

3

3

3

3

33

33

3

3

3

33

33

3

3

33

3

3

3 3

3

3

3

33

3

3

33

3

33

3

33 3

3

33

33

3

3

3

3

3

3

3

3

3

3

3

3

33 3

3

3

3

333

33

3 3

33 3

3 33

3

33

33

3

3

33

3

3

3 3

3 3

33 3

33

3

3

3

33

3

3

33

3

3

3

33

33

33

3

33 3 3

3

3

33

3

error: 0.0047

X

Y

0

1 | || || ||| || | ||| | | | || || || || ||| ||| | |||||| | || | | || | ||| | || ||| || || | || || || || ||| || ||| || | || | || || | | || ||| || ||| || || | ||||| | || || || ||| | ||| | | || ||||| | ||| || ||| ||| || || || ||| | || | || | | || || || |||| ||| ||| |||||| | || || || | ||||| |||| | | || | | || | ||| |||| | || | || || || ||| || | |||| | || | || || ||| | ||| || || || || |||| | || |||| ||| | || || | || |||| | ||| | || || ||| || | |

With many classes, high order (polynomial) interaction terms

might be needed to avoid masking.

Stanford University—April 28, 1997 Flexible Discriminant and Mixture Models 17'

&

$

%

FDA vs PDA

There are two complimentary situations:

FDA h(x) is an expansion of x into m� p basis functions.�T� makes �(x) = �Th(x) smooth in x.

PDA h(x) = x, e.g. digitization of an analog

signal—x = (x1; : : : ; xp), xj = x(tj).�T� makes � smooth in t

Stanford University—April 28, 1997 Flexible Discriminant and Mixture Models 18'

&

$

%

Handwritten Digit Identification

A sample of 5 handwritten digits within each digit class.

2000 training and 2000 test images.

Stanford University—April 28, 1997 Flexible Discriminant and Mixture Models 19'

&

$

%

Speech Recognition

aa

Frequency

Log-

Per

iodo

gram

0 50 100 150 200 250

05

1015

2025

ao

Frequency

Log-

Per

iodo

gram

0 50 100 150 200 250

05

1015

2025

dcl

Frequency

Log-

Per

iodo

gram

0 50 100 150 200 250

-50

510

15

iy

Frequency

Log-

Per

iodo

gram

0 50 100 150 200 250

05

1015

2025

sh

Frequency

Log-

Per

iodo

gram

0 50 100 150 200 250

05

1015

20

A sample of 10 log-periodograms within each phoneme class. 1633

training (11 speakers) and 1661 test (12 different speakers) frames.

Stanford University—April 28, 1997 Flexible Discriminant and Mixture Models 20'

&

$

%

Reasons for Regularization

There are two different reasons why regularization is

necessary:

Estimation: With digitized analog signals, the

variables-to-observation ratio can be small.

Regularization is needed for consistency in estimating a

population LDA model [Leurgans, Moyeed & Silverman,

JRSSB, 1993]

Interpretation: The LDA coefficients are given by� = ��1�j . If the eigenstructure of �W concentrates

on low-frequency signals, then � will tend to be noisy,

and hence hard to interpret.

Stanford University—April 28, 1997 Flexible Discriminant and Mixture Models 21'

&

$

%

PDA coefficients: Handwritten Digits

LDA: Coefficient 1 PDA: Coefficient 1 LDA: Coefficient 2 PDA: Coefficient 2 LDA: Coefficient 3 PDA: Coefficient 3

LDA: Coefficient 4 PDA: Coefficient 4 LDA: Coefficient 5 PDA: Coefficient 5 LDA: Coefficient 6 PDA: Coefficient 6

LDA: Coefficient 7 PDA: Coefficient 7 LDA: Coefficient 8 PDA: Coefficient 8 LDA: Coefficient 9 PDA: Coefficient 9

Left images: LDA coefficient image

Right images: PDA coefficient image

Stanford University—April 28, 1997 Flexible Discriminant and Mixture Models 22'

&

$

%PDA: Coordinate 1

PD

A: C

oord

inat

e 2

0

0

0

0

0 0

0

00

0

00

0 0

000

0

0

0

00

0

0

00

00

0 00

0 0

00

0

0

000

0

00

0

0

0000

00

0

000

0 0

0

00

0

0

0

0

0

0

00

0

0

000

0

0

00

0

0 00

0

00

0 0000

00 0

00

0

0

00

00

0 000

0

0

0

0

0

0 00

00

0

0

0 0

00

0

0

0

0

0

00

0

0

0 0

0

0

0

00

0

0

0

0

0

0

0

0

0

000

0

00

000

0

0

0 0

0 0

0

00 00

0

0

0 0

0

0

0

0

0 0

00

0

0

0

0

0

0

0

00 00

0 0

0

0

0

000

0 00

00

0 0

00 000

000

0

0

0

0

0

0

0

0

00

0

0

0

00

000

00

0

0

0

0

0

00

0

0

0

0

0

00

0

0 0

0

000

00

00

0

0

0

0

0

00

0

00 0

00

0

0

0

0

00

0

0

0

0

0

0

0

0

0

0

0

0

0

00 0

000

1

1

1111

11

1

1

11

1

1

111

1

1

11 111

1

1

1

1

1

1

1

11

11

111

1

1111 1

1

1

11

11

1 111

11

11

1

11

1111 11

1 1

1

1 1111

11

1

11

11

1

1

1

1

11

1

11111

1

11 111

11

1

1111

11

1

111

1

1111

11

1

1

1

1 111

1

1

1

1

1

1

11

1

11

1

111

1

1111 1

11

11

11111

1

111

1

11

1

1

11 11

1

1 11111

1

11

11

1

1

11111

1

11 1

11

11 1111

1 1

11

1 11

1

1

1111

1

1

11

1

111

111

111 1

111 11

111

1111 1

11

111

111

1

1 1

1

1

1

1111

1

1

111

3

3

33

333

3

333

33

3

3

3

3

3

333

3

3

3333 3 3

33

33

3

33

3

3

3

3

333

3

3

3

3 33

3

3

333

33

3 3

3

3 3333

3 33

3

3

3

3

3

3 333

3

3

33

3

3

3

3

3 3

3

3

3

3

3

3

3

3

33

3

333

3 33

3

33

3

3

3

3

3

3 3

3

3

3

3

33

33

3

33

3

3

3 3

3

33

3

3

3

3

3

3

333

33

3

33

33

3

3

3

33

3

33

3

3

33

3

3

3

3

3

3

3

3

3

3

33

3

3

33

3333

3

3

3

3

3

33

3

3

3

3

33

333

3

3

3

3

3

3

3

3

3

3

3

3

3

3

33

3

33

3

3

3

3

3

3

333

33

3

3

3

3

3

3

3

3

3

33

33

3

3

333

3

3

33

3

3

333

3

9

999

9

9

9

9

9

9

9

9

9

9

9

9

99

99

99

99

9

9

99 9 9

9

9

9

9

9

9

9

9

9

99

99 99

9

9

9

999

9

9

9 9

9

9

9

9

9

9

9

9

99

9

9

9

9

9

9

9

9

9

9

9

9

9

9

9

9

999

9

99

9

9

9

99

9

9

9

9

9

99

9

9

9

9

9 9

9

9

9

9

9

99

9

99

9

9

9

9

9 999

99

99 9

9

99

999

9

9

9

99

9

9

9

9

99

9

9

9

9

999

99

9 9

9

9

9

8

8

8

8

88

8

8

88

8

8

8

8

8 8

8

88

8

88

8

8

8 8

8

8

8

8

8

8

8

8

88

8

8

8

8

8

88

8

88

8

88

88

8

8

8

8

8

8

88

8

8

8888 8

8

88

8

8

8

88

8

8

88

8

8

8

8

88 8

8

8

88

8

8

888

8

88 8

8

8

88 8

8 8

8

8

8 88

88

8

88

8

8

88

88

8

8

8

8 8

8

8

8

88

8

8

8

8

88

88

8

8

88

8

8

88

88 8

8

8

8

88

8

8

8

88

8

8

8

8

8

8

8 8

4

4

4

4

4

44

4

4

4

4

4

444

4

4

4

4

4

44

44

44

4

4

4

4

4

4

44

4

44

444

4

44

4

4

4

4 4

4

44

44

4

4

4 44

44

4

4 4

4

44

4

4

444

44

44

4

444

4

4 4

4

4

4

44

44

4

4

44

4 44

4

4 4

4

4

4 4 44

4

4

44

4

4

4

44

4

4

44 44

4

4

4

44

44

4 44

4

4

4

4

44

4

4 44

4

4

4

4

4

4

4

4

4

44

4 44

4

4

6

6

66

6

666

66

66

66

6

666

6

66

66

6

6

666

6

6

6

6

6

6

6

6

6

66

6

66

66

6

6

66

6

6

66

6

6

6

66

6

66

66

6 6

66

6

6

6

6

66

6

6

6

6

6 6

6

6

6

6

6

66

6

66

66

666 6

6

6

6

66

6

6

6

6

6

6 6

66

66

6

66

6

6

66

6

6

6

6

666

6

66 6

66

6

6

6

6

6

6

6 66

6

66

66

6

6

66

6

6

6

6

6

6 6

6

6

66

6

6

66

6 6

6

6

6

666

666

66

6

6

66

6

66

6

66

6

6

6

6

6

6

6

6

6

6

6

6666 6

6

6

6

6

6

66

6

6

6

66

6

6

6

6

6

6

7

77

77

7

7

77

7

7

7

77

77 77

77

7

77 7

7

77

7

7

7

77

7

7

7

77

7

7

77

7

7

7

7

77

7

77

7 77

77

7

7

7

7 77

7

77

77

77 7

7

7

7

77

7

7

7

7

7

7

7

7

7

7

7

7

7

7

7

7

7

7

7

7

77

7 7

777

77

7

77

77

7

777

7

7

7

7

77

77

777

777

7

7

77

77

77 7

7

7

7

7

7777 7

777

77777 7

7 7

7

77

77

77

77

7

7

7

7

7

7

7

7

7

77

7

7

7

7

7

7777

7 77

7

7

7 77

5

5

5

5

5

5

5

55

5

55

5

5

5

5

5 5

5

5

5

5

5

5

5

5

5

5

5

5

5

5

5

5

5

5

5

5

5

5

5

5

555 55

5

5

5

5

5

5 55

5

55

5

55

5

55

5

55

5

5

5

55

5

55

55

5

55

5

5

5

55

5

5

5

5

55

5

5

55

5

5

5

5555

5

55

5

5 5

5

55

5

5

5

5

5 5

5

5

5

5

5

5

5

55

5

5

5

5

5 5

5

5

5

5

5

5

5

5

5

5

5

5

5

5

5

5

5

5

5

5

5

55

55

55

5

5

55

5

5

5

5

55

5 55

52

22

2

22

22

22

2

2

2

2

2

2

22

22 2

2

2

2

2

2

22

2

2

2

2

2

2

22

22

2

22

2

2

2

2

22

22

2

2

2

2

2

2

22

2

2

2 2

22

22

2

2

22

22 2

22 2

2

2

2

22

22

22

2

2

2 2

22

22 22 2

22

2

2

2

2

222

2

2

22

2 2

222

222

22

2

2

Canonical Variate Plot --- Digit Test Data

Stanford University—April 28, 1997 Flexible Discriminant and Mixture Models 23'

&

$

%

PDA coefficients: Phoneme Data

Frequency

Can

onic

al F

unct

ion

0 50 100 150 200 250

-0.2

-0.1

0.0

0.1

0.2

order 1

Frequency

Can

onic

al F

unct

ion

0 50 100 150 200 250

-0.2

-0.1

0.0

0.1

0.2

order 2

Frequency

Can

onic

al F

unct

ion

0 50 100 150 200 250

-0.2

-0.1

0.0

0.1

0.2

order 3

Frequency

Can

onic

al F

unct

ion

0 50 100 150 200 250

-0.2

-0.1

0.0

0.1

0.2

order 4

Ordinary LDA coefficient functions for the phoneme data,

and regularized versions.

Stanford University—April 28, 1997 Flexible Discriminant and Mixture Models 24'

&

$

%

Mixture Discriminant Analysis: MDA

1

1

1 1

1

11

1 1

11

1

1

1

1

1

1

1

11

1

1

1

111

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1 1

1

1

1

1

1

1

1

1

1

11

1

1

1

1

1

1

111

1

1

1

1

1

11 1

1

1

1

1

1

1

11

1

1

1

1

1

1 1

1

1 11

1 111

1

1 11

1

1

11 1

1

1

1

1

1

11

11 1

11

1

11 11

1

1

1

1

11

1

1

1

1

1

11

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1 1

1

1

1

1

1

1

1

1

1

1

1

11

1

1

11

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1 1

11

1

1

1

1

1

1

1

1

11

11

1

1

1

11

1

1

1

1

1

1

1

1

1

1

1

1

1 11

11

1

1

1 1

1

1

1

1

1

1

1

1

1

11

1

1

1

1

1

11

1

1

11 1

11 111

1

11

1

1

1

1

1

1

1

1

1

1

1

1

1

11

1

1

1 1 1 11

1

1

1

1

11

1

1

1

111

1

1

11

1

1

11

2

2

2

2

2

2

2

2

2

2

2

2

2

22

2

222

2

2

2

2 2

2

2

22

222

2

2

2

2

2

2

2

2

2

2

2

2

2

2

22

2

2

2

2

22

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

22

2 2

22

2 22

2 22

2

2 2

2

2

2

2

2

2

2

2

2

2

2

22

2

2 2

2

2

22

2

2

22

2 2

2

22

2

22

22

2

2

2

2

2

2 2

2

2

22

2

2

2

2

2

22

2

2

2

2

2

2

2

2

2 2

2

222

2

2

2 2

2

22

2

222

22 2

2

2

2

2 2

2

2

2

2

2

2

2

2

2

2

2

2

2

2222

2 2

2

2

2

Linear Discriminant Analysis

1

1

1 1

1

11

1 1

11

1

1

1

1

1

1

1

11

1

1

1

111

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1 1

1

1

1

1

1

1

1

1

1

11

1

1

1

1

1

1

111

1

1

1

1

1

11 1

1

1

1

1

1

1

11

1

1

1

1

1

1 1

1

1 11

1 111

1

1 11

1

1

11 1

1

1

1

1

1

11

11 1

11

1

11 11

1

1

1

1

11

1

1

1

1

1

11

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1 1

1

1

1

1

1

1

1

1

1

1

1

11

1

1

11

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1 1

11

1

1

1

1

1

1

1

1

11

11

1

1

1

11

1

1

1

1

1

1

1

1

1

1

1

1

1 11

11

1

1

1 1

1

1

1

1

1

1

1

1

1

11

1

1

1

1

1

11

1

1

11 1

11 111

1

11

1

1

1

1

1

1

1

1

1

1

1

1

1

11

1

1

1 1 1 11

1

1

1

1

11

1

1

1

111

1

1

11

1

1

11

2

2

2

2

2

2

2

2

2

2

2

2

2

22

2

222

2

2

2

2 2

2

2

22

222

2

2

2

2

2

2

2

2

2

2

2

2

2

2

22

2

2

2

2

22

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

22

2 2

22

2 22

2 22

2

2 2

2

2

2

2

2

2

2

2

2

2

2

22

2

2 2

2

2

22

2

2

22

2 2

2

22

2

22

22

2

2

2

2

2

2 2

2

2

22

2

2

2

2

2

22

2

2

2

2

2

2

2

2

2 2

2

222

2

2

2 2

2

22

2

222

22 2

2

2

2

2 2

2

2

2

2

2

2

2

2

2

2

2

2

2

2222

2 2

2

2

2

Mixture Discriminant Analysis

••

1

1

1 1

1

11

1 1

11

1

1

1

1

1

1

1

11

1

1

1

111

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1 1

1

1

1

1

1

1

1

1

1

11

1

1

1

1

1

1

111

1

1

1

1

1

11 1

1

1

1

1

1

1

11

1

1

1

1

1

1 1

1

1 11

111

11

1 1

11

1

11 1

1

1

1

1

1

11

11 1

11

1

11 11

1

1

1

1

11

1

1

1

1

1

11

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1 1

1

1

1

1

1

1

1

1

1

1

1

11

1

1

11

1

1

1

1

1

1

1

1

1

11

1

1

1

1

1

1 1

11

1

1

1

1

1

1

1

1

11

11

1

1

1

11

1

1

1

1

1

1

1

1

1

1

1

1

1 11

11

1

1

1 11

1

1

1

1

1

1

1

1

11

1

1

1

1

1

11

1

1

11 1

11 111

1

11

1

1

1

1

1

1

1

1

1

1

1

1

1

11

1

1

1 1 1 11

1

1

1

1

11

1

1

1

111

1

1

11

1

1

11

2

2

2

2

2

2

2

2

2

2

2

2

2

22

2

222

2

2

2

2 2

2

2

22

222

2

2

2

2

2

2

2

2

2

2

2

2

2

2

22

2

2

2

2

22

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

22

22

22

2 22

2 22

2

2 2

2

2

2

2

2

2

2

2

2

2

2

22

2

2 2

2

2

22

2

2

22

2 2

2

22

2

22

22

2

2

2

2

2

2 2

2

2

22

2

2

2

2

2

22

2

2

2

2

2

2

2

2

2 2

2

222

2

2

2 2

2

22

2

222

22 2

2

2

2

2 2

2

2

2

2

2

2

2

2

2

2

2

2

2

2222

22

2

2

2

Learning Vector Quantization

••

••

1

1

1 1

1

11

1 1

11

1

1

1

1

1

1

1

11

1

1

1

111

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1 1

1

1

1

1

1

1

1

1

1

11

1

1

1

1

1

1

111

1

1

1

1

1

11 1

1

1

1

1

1

1

11

1

1

1

1

1

1 1

1

1 11

111

11

1 1

11

1

11 1

1

1

1

1

1

11

11 1

11

1

11 11

1

1

1

1

11

1

1

1

1

1

11

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1 1

1

1

1

1

1

1

1

1

1

1

1

11

1

1

11

1

1

1

1

1

1

1

1

1

11

1

1

1

1

1

1 1

11

1

1

1

1

1

1

1

1

11

11

1

1

1

11

1

1

1

1

1

1

1

1

1

1

1

1

1 11

11

1

1

1 11

1

1

1

1

1

1

1

1

11

1

1

1

1

1

11

1

1

11 1

11 111

1

11

1

1

1

1

1

1

1

1

1

1

1

1

1

11

1

1

1 1 1 11

1

1

1

1

11

1

1

1

111

1

1

11

1

1

11

2

2

2

2

2

2

2

2

2

2

2

2

2

22

2

222

2

2

2

2 2

2

2

22

222

2

2

2

2

2

2

2

2

2

2

2

2

2

2

22

2

2

2

2

22

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

22

22

22

2 22

2 22

2

2 2

2

2

2

2

2

2

2

2

2

2

2

22

2

2 2

2

2

22

2

2

22

2 2

2

22

2

22

22

2

2

2

2

2

2 2

2

2

22

2

2

2

2

2

22

2

2

2

2

2

2

2

2

2 2

2

222

2

2

2 2

2

22

2

222

22 2

2

2

2

2 2

2

2

2

2

2

2

2

2

2

2

2

2

2

2222

22

2

2

2

Flexible Discriminant Analysis

1

1

1 1

1

11

1 1

11

1

1

1

1

1

1

1

11

1

1

1

111

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1 1

1

1

1

1

1

1

1

1

1

11

1

1

1

1

1

1

111

1

1

1

1

1

11 1

1

1

1

1

1

1

11

1

1

1

1

1

1 1

1

1 11

111

11

1 11

1

1

11 1

1

1

1

1

1

11

11 1

11

1

11 11

1

1

1

1

11

1

1

1

1

1

11

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1 1

1

1

1

1

1

1

1

1

1

1

1

11

1

1

11

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

11

11

1

1

1

1

1

1

1

1

11

11

1

1

1

11

1

1

1

1

1

1

1

1

1

1

1

1

1 11

11

1

1

1 1

1

1

1

1

1

1

1

1

1

11

1

1

1

1

1

11

1

1

11 1

11 111

1

11

1

1

1

1

1

1

1

1

1

1

1

1

1

11

1

1

1 1 1 11

1

1

1

1

11

1

1

1

111

1

1

11

1

1

11

2

2

2

2

2

2

2

2

2

2

2

2

2

22

2

222

2

2

2

2 2

2

2

22

222

2

2

2

2

2

2

2

2

2

2

2

2

2

2

22

2

2

2

2

22

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

22

2 2

22

2 22

2 22

2

2 2

2

2

2

2

2

2

2

2

2

2

2

22

2

2 2

2

2

22

2

2

22

2 2

2

22

2

22

22

2

2

2

2

2

2 2

2

2

22

2

2

2

2

2

22

2

2

2

2

2

2

2

2

2 2

2

222

2

2

2 2

2

22

2

222

22 2

2

2

2

2 2

2

2

2

2

2

2

2

2

2

2

2

2

2

2222

2 2

2

2

2

Mixture Discriminant Analysis

Rank 1 Model

••

•1

1

1 1

1

11

1 1

11

1

1

1

1

1

1

1

11

1

1

1

111

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1 1

1

1

1

1

1

1

1

1

1

11

1

1

1

1

1

1

111

1

1

1

1

1

11 1

1

1

1

1

1

1

11

1

1

1

1

1

1 1

1

1 11

111

11

1 11

1

1

11 1

1

1

1

1

1

11

11 1

11

1

11 11

1

1

1

1

11

1

1

1

1

1

11

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1 1

1

1

1

1

1

1

1

1

1

1

1

11

1

1

11

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

11

11

1

1

1

1

1

1

1

1

11

11

1

1

1

11

1

1

1

1

1

1

1

1

1

1

1

1

1 11

11

1

1

1 1

1

1

1

1

1

1

1

1

1

11

1

1

1

1

1

11

1

1

11 1

11 111

1

11

1

1

1

1

1

1

1

1

1

1

1

1

1

11

1

1

1 1 1 11

1

1

1

1

11

1

1

1

111

1

1

11

1

1

11

2

2

2

2

2

2

2

2

2

2

2

2

2

22

2

222

2

2

2

2 2

2

2

22

222

2

2

2

2

2

2

2

2

2

2

2

2

2

2

22

2

2

2

2

22

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

22

2 2

22

2 22

2 22

2

2 2

2

2

2

2

2

2

2

2

2

2

2

22

2

2 2

2

2

22

2

2

22

2 2

2

22

2

22

22

2

2

2

2

2

2 2

2

2

22

2

2

2

2

2

22

2

2

2

2

2

2

2

2

2 2

2

222

2

2

2 2

2

22

2

222

22 2

2

2

2

2 2

2

2

2

2

2

2

2

2

2

2

2

2

2

2222

2 2

2

2

2

MDA/FDA

Stanford University—April 28, 1997 Flexible Discriminant and Mixture Models 25'

&

$

%

Gaussian Mixture Model

P (XjG = j) = RjXr=1 �jr�(X;�jr;�); Mixture of Gaussians

ThenP (G = jjX = x) = PRjr=1 �jr�(X;�jr;�)�jPJ=1PR`r=1 �`r�(X;�`r;�)�`Estimate parameters by maximum likelihood of P (X;G) (possibly

subject to rank constraints!)maxrankf�jrg=K;� JXj=1Xgi=j log( RjXr=1 �jr�(xi;�jr;�)�j)

Note: reduced rank amounts to dimension reduction in predictor

space!

Stanford University—April 28, 1997 Flexible Discriminant and Mixture Models 26'

&

$

%

EM and Optimal Scoring

E-Step:

Compute memberships Prob(obs 2 rth subclass of class j jx; j)W (crjx; j) = �r�(x;�jr;�)PRjk=1 �jk�(x;�jk;�)M-Step:Construct Random Response Matrix Z with elementsW (cjrjx; j):0BBBBBBBBBB@

c11 c12 c13 c21 c22 c23 c31 c32 c33g1 = 2 0 0 0 :3 :5 :2 0 0 0g2 = 1 :9 :1 :0 0 0 0 0 0 0g3 = 1 :1 :8 :1 0 0 0 0 0 0g4 = 3 0 0 0 0 0 0 :5 :4 :1g5 = 2 0 0 0 :7 :1 :2 0 0 0...

...gn = 3 0 0 0 0 0 0 :1 :1 :8

1CCCCCCCCCCAZ = SZZT Z = �D�TUpdate �s and �s

9>>=>>;, M-step of MDA

Stanford University—April 28, 1997 Flexible Discriminant and Mixture Models 27'

&

$

%

Waveform Example

xi = uh1(i) + (1� u)h2(i) + �i Class 1xi = uh1(i) + (1� u)h3(i) + �i Class 2xi = uh2(i) + (1� u)h3(i) + �i Class 3

1 1 1 1 1 1 1 1 1

1

1

1

1

1

1

1

1

1

1

1

12 2 2 2 2 2 2 2 2

2

2

2

2

2

2

2

2

2

2

2

23 3 3 3 3

3

3

3

3

3

33

33

3

3

33

33

34 4 4 4 4

4

4

4

4

4

4

4

4

4

4

4

44

44

45 5 5 5 5

5

5

5

5

5

5 5 5 5 5

5

55

55

5

Class 1

11

11

1

1

1

1

1

1

1

1

1

1

1

1

1 1 1 1 12 2 2 2 2

2

2

2

2

2

2

2

2

2

2

2

2 2 2 2 23 3 3 3 3

3

3

3

3

3

3

3

3

3

3

3

3 3 3 3 34

4

4

4

4

4

44

44

4

4

44

44

4 4 4 4 45

5

5

5

5

5

55

55

5

5

55

55

5 5 5 5 5

Class 2

1

1

1

1

1

1

1

1

1

1

1

1

11

11

11

11

12

2

2

2

2

2

2

2

2

2

2

2

22

22

22

22

23

3

3

3

3

3

3

3

3 3 3 3 33

33

33

33

34

4

4

4

4

4

4

4

4

4

4

4

4 4 4 4 4 4 4 4 45

5

5

5

5

5

5

5

5

5

5

5

5 5 5 5 5 5 5 5 5

Class 3

Stanford University—April 28, 1997 Flexible Discriminant and Mixture Models 28'

&

$

%

Discriminant Var 1

Dis

crim

inan

t Var

2

-6 -4 -2 0 2 4 6

-6-4

-20

24

1

1

1

11

1

1

11

1

11

1

1

1

1

1

1

1

1

111

1

1

1

11

1

11

1

1

1 1

1

1

11

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

11

1

1

11

1

1

1

1

1

1

11

11 1

1

11

1

1 1

1

1

1

1

1

1

1

1

1

1

1 1

1

1

11

1

11

11 1

1

1

1

1

11

1

1

1

1

1

11

1

1

1

11

1

11

1

1

1

1

1

1

1

1

1

1

1

1

1

1

11

1

1

1

1

11

1

1

1

1

11

1

1

1

1

1

1

1

1

11

1

1

1

1

1

1

11

1

1

1

1

11

11

1

1

1

2

2

2

2

2

2

2

2

22

2 2

22

22

2

2

22

2

2

2

2

2

2

2

2

2

222

2

2

2

2

22

2 22

22

2

2

2

2

2

2

2

22

2

2

2

2

2

22

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

22

2

22

2

2

2

2

2

2

2

2

2

2

222

2

2

2

2

2

22

2

2

2

2

2

2

2

2

2

2

2

2

2

22

2

2

2

2

2

22

2

2

2

2

2

2

2

2

22

2

2

2

23

33 3

33

33

3 333

3

3

3

33

3

33

3

3

3

3

3

33

33

33

33

3

33

3

33

333

3

333

3 33

3

3

3

3

3

3

3

3 3

333

3

3

3

3

333

3

3

3

33

33

3

3

3

3

33

3

3

33 3

3

3

3

3

33

3

3

33

3

3

3

3

3

3

3 3 3

3 33 3

3

333

3

3

3

3

3

33

33

3

33

33

33

3

33

3

3

33

3

3 3

3

33

3

3

33

3

33

3

3

33

3

33 33

3

3

33 3

3

33

3

3

3

33

3 33

33

3 subclasses, penalized 4df

Discriminant Var 1

Dis

crim

inan

t Var

3

-6 -4 -2 0 2 4 6

-2-1

01

2

1

1

11

1

1

11

1

1

1

1

1

1

1

1

1

1

11

1

1

11

1

1

1

1

111

1

1

1

1

11

1

1

1

1

1

1

1

1 1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

11

1

1

11

1

1

1

1

1

1

111 1

11

11

1

1

1

1

11

1

1

1

1

1

1

1

1

1

1

1

111

1

1

11

1 1

1

1

1

1 11

1

11

11

11

1

1

1

1

1

1

11

1

1

1

11

1

1

11

1

1

1

1

11

1

11

1

1

1

1

1

1

11

1

1 11 22

2

222

2

2 2

2

2

2

22

2

2

22

2

2

2

2

2

2

2

2

2

22

22

2

2

2

22

222

22

2

2

2

2

2

22

2

2

2

2

2

2

2

2

2

2

2

2

22

2

2

2

2

2

2 2

2

2

2

2 2

2

2

22

2

2

2

2

2

2

2

2

222

2

222

2

2

2

2

222

2

2

2

2

2

2

2

2

2 2

2

2

2

2

22

2

2

2

2

2

2

22

2

2

22

2

22

2

2

2

22

2

22

2

2

2

2

22

2

2

2

2

22

2

22 2

3

3 3 3

3

3

3

3

3

3

3

3

3

3

3

33

3

3

3

33

3

3

3

3

3

3

33

3

3

3

3

3

3 3

33 3

3

3

3

3

33

3333

3

33 3

3

3

3

3

3

3

3

3

3

3

3

3

3

3 3

33

3

33

3

33

3

3

3

3

33

3

3

3

3

3

3

3

3

3

3

3

3

3

3

3

3

3

3

3

3

3

3

3

33

3

3

333

333

3 3

3

3

3

3

3

33

3

3

3

33

3

3

33

3

3

33

3

3

3

3

3

3

3

33

3 3

3

33

33

33

333

3

3

3 3

3

3

33

3

3

3

3

3

3

33 3

3 subclasses, penalized 4df

Stanford University—April 28, 1997 Flexible Discriminant and Mixture Models 29'

&

$

%

Penalized MDA coefficients

Class 1

Dis

crim

inan

t Fun

ctio

ns

Class 2

Dis

crim

inan

t Fun

ctio

ns

Class 3

Dis

crim

inan

t Fun

ctio

ns

Stanford

University—

April28,1997

Flexible

Discrim

inantandM

ixtureM

odels30

'&

$%

Simulation Results for Waveform Example

Technique Error Rates

Training Test

LDA 0.121(.006) 0.191(.006)

QDA 0.039(.004) 0.205(.006)

CART 0.072(.003) 0.289(.004)

FDA/MARS (degree = 1) 0.100(.006) 0.191(.006)

FDA/MARS (degree = 2) 0.068(.004) 0.215(.002)

MDA (3 subclasses) 0.087(.005) 0.169(.006)

MDA (3 subclasses, penalized 4df) 0.137(.006) 0.157(.005)

PDA (penalized 4df) 0.150(.005) 0.171(.005)

Stanford University—April 28, 1997 Flexible Discriminant and Mixture Models 31'

&

$

%

Gaussian Mixtures and RBFs

P (X;G = j) = RXr=1 �r�(X;�r;�)Pr(j)This is a mixture of joint-densities Pr(X;G) with R shared

mixture components. ThenP (G = jjX = x) = PRr=1 �r�(x;�r;�)Pr(j)PRr=1 �r�(x;�r;�)This closely resembles (renormalized) RBFs.

Estimate parameters by maximum likelihood of P (X;G) (possibly

subject to rank constraints!)maxrankf�rg=K;� JXj=1Xgi=j log( RXr=1 �r�(xi;�r;�)Pr(j)

E-M algorithm again yields optimal scoring Z = SZ , where nowZ is the full random response matrix described earlier.

Stanford University—April 28, 1997 Flexible Discriminant and Mixture Models 32'

&

$

%

EM and Optimal Scoring

E-Step:

Compute memberships Prob(obs 2 rth subclass jx; j)W (crjx; j) = �r�(x;�r;�)Pr(j)PRk=1 �k�(x;�k;�)Pk(j)i.e. membership depends on jjx� �kjj2� � 2 logPk(j).M-Step:Construct Random Response Matrix Z with elementsW (crjx; j): 0BBBBBBB@

c1 c2 c3 c4 c5 c6 c7 c8 c9g1 = 2 0 :1 0 :2 :4 :2 0 :1 0g2 = 1 :8 :1 :0 0 :1 0 0 0 0g3 = 1 :1 :6 :1 0 0 0 0 :1 :1g4 = 3 0 0 0 0 0 0 :5 :4 :1...

...gn = 3 0 0 0 :1 0 0 0 :1 :8

1CCCCCCCAZ = SZZT Z = �D�TUpdate �s and �s

9>>=>>;, M-step of MDA