
Data Analysis and Dimension Reduction - PCA, LDA and LSA

Berlin Chen
Department of Computer Science & Information Engineering
National Taiwan Normal University

References:
1. Introduction to Machine Learning, Chapter 6
2. Data Mining: Concepts, Models, Methods and Algorithms, Chapter 3


Introduction (1/3)

• Goal: discover significant patterns or features from the input data
  – Salient feature selection or dimensionality reduction

[Figure: a network maps an input vector x (input space) to a feature vector y (feature space) through a transform W]

  – Compute an input-output mapping based on some desirable properties


Introduction (2/3)

• Principal Component Analysis (PCA)

• Linear Discriminant Analysis (LDA)

• Latent Semantic Analysis (LSA)


Bivariate Random Variables

• If the random variables X and Y have a certain joint distribution, they describe a bivariate random variable

$$\mathbf{Z} = \begin{bmatrix} X \\ Y \end{bmatrix} \qquad \text{bivariate random variable}$$

$$\Rightarrow\; \boldsymbol{\mu}_Z = \begin{bmatrix} \mu_X \\ \mu_Y \end{bmatrix} \qquad \text{mean vector}$$

$$\Rightarrow\; \boldsymbol{\Sigma}_Z = \begin{bmatrix} \sigma_{X,X} & \sigma_{X,Y} \\ \sigma_{Y,X} & \sigma_{Y,Y} \end{bmatrix} \qquad \text{covariance matrix}$$

variance: $\sigma_{X,X} = \sigma_X^2 = E\left[(X - E[X])^2\right] = E[X^2] - \left(E[X]\right)^2$

covariance: $\sigma_{X,Y} = \sigma_{Y,X} = E\left[(X - E[X])(Y - E[Y])\right] = E[XY] - E[X]\,E[Y]$


Multivariate Random Variables

• If the random variables X1, X2, …, Xn have a certain joint distribution, they describe a multivariate random variable

$$\mathbf{X} = \begin{bmatrix} X_1 \\ \vdots \\ X_n \end{bmatrix} \qquad \text{multivariate random variable}$$

$$\Rightarrow\; \boldsymbol{\mu}_X = \begin{bmatrix} \mu_{X_1} \\ \vdots \\ \mu_{X_n} \end{bmatrix} \qquad \text{mean vector}$$

$$\Rightarrow\; \boldsymbol{\Sigma}_X = \begin{bmatrix} \sigma_{X_1,X_1} & \cdots & \sigma_{X_1,X_n} \\ \vdots & \ddots & \vdots \\ \sigma_{X_n,X_1} & \cdots & \sigma_{X_n,X_n} \end{bmatrix} \qquad \text{covariance matrix}$$

variance: $\sigma_{X_i,X_i} = \sigma_{X_i}^2 = E\left[(X_i - E[X_i])^2\right] = E[X_i^2] - \left(E[X_i]\right)^2$

covariance: $\sigma_{X_i,X_j} = E\left[(X_i - E[X_i])(X_j - E[X_j])\right] = E[X_i X_j] - E[X_i]\,E[X_j]$
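As a quick numerical check of these definitions, here is a minimal MATLAB/Octave sketch (not part of the original slides; the data matrix is synthetic) estimating the sample mean vector and covariance matrix:

    % N samples of an n-dimensional random vector, one sample per row (synthetic)
    X = randn(1000, 3) * [2 0 0; 0.5 1 0; 0 0 0.3];

    mu    = mean(X);                       % 1 x n sample mean vector
    Xc    = X - repmat(mu, size(X,1), 1);  % centered (zero-mean) data
    Sigma = (Xc' * Xc) / (size(X,1) - 1);  % n x n sample covariance matrix

    disp(norm(Sigma - cov(X)));            % cov(X) gives the same estimate; prints ~0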


Introduction (3/3)

• Formulation for feature extraction and dimension reduction
  – Model-free (nonparametric)
    • Without prior information: e.g., PCA
    • With prior information: e.g., LDA
  – Model-dependent (parametric), e.g.,
    • HLDA with Gaussian cluster distributions
    • PLSA with multinomial latent cluster distributions


Principal Component Analysis (PCA) (1/2)

• First introduced by Pearson (1901)
• Known as the Karhunen-Loève Transform (1947, 1963)
  – Or the Hotelling Transform (1933)

• A standard technique commonly used for data reduction in statistical pattern recognition and signal processing

• A transform by which the data set can be represented by a reduced number of effective features and still retain the most intrinsic information content
  – A small set of features is found to represent the data samples accurately

• Also called "Subspace Decomposition", "Factor Analysis", etc.


Principal Component Analysis (PCA) (2/2)

The patterns show a significant difference from each other in one of the transformed axes


PCA Derivations (1/13)

• Suppose x is an n-dimensional zero-mean random vector, $\boldsymbol{\mu} = E\{\mathbf{x}\} = \mathbf{0}$
  – If x is not zero-mean, we can subtract the mean before the following analysis

  – x can be represented without error by the summation of n linearly independent vectors:

$$\mathbf{x} = \sum_{i=1}^{n} y_i \boldsymbol{\varphi}_i = \boldsymbol{\Phi}\mathbf{y}, \qquad
\mathbf{y} = [y_1\; y_2\; \ldots\; y_n]^T, \qquad
\boldsymbol{\Phi} = [\boldsymbol{\varphi}_1\; \boldsymbol{\varphi}_2\; \ldots\; \boldsymbol{\varphi}_n]$$

    • $y_i$: the i-th component in the feature (mapped) space
    • $\boldsymbol{\varphi}_i$: the basis vectors


PCA Derivations (2/13)

Subspace Decomposition: the same vector (2,3) expressed in two different orthogonal basis sets

$$\begin{bmatrix}2\\3\end{bmatrix}
= 2\begin{bmatrix}1\\0\end{bmatrix} + 3\begin{bmatrix}0\\1\end{bmatrix}
= \frac{5}{\sqrt{2}}\cdot\frac{1}{\sqrt{2}}\begin{bmatrix}1\\1\end{bmatrix}
+ \frac{1}{\sqrt{2}}\cdot\frac{1}{\sqrt{2}}\begin{bmatrix}-1\\1\end{bmatrix}$$

  – Coordinates: (2, 3) in the standard basis {(1,0), (0,1)}; (5/√2, 1/√2) in the rotated orthonormal basis {(1/√2)(1,1), (1/√2)(-1,1)}


PCA Derivations (3/13)

  – Further assume the column (basis) vectors of the matrix $\boldsymbol{\Phi}$ form an orthonormal set:

$$\boldsymbol{\varphi}_i^T \boldsymbol{\varphi}_j = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j \end{cases}$$

    • Such that $y_i$ is equal to the projection of $\mathbf{x}$ on $\boldsymbol{\varphi}_i$:

$$\forall i,\quad y_i = \boldsymbol{\varphi}_i^T \mathbf{x}$$

    • For example, $y_1 = \boldsymbol{\varphi}_1^T\mathbf{x} = \lVert\mathbf{x}\rVert\cos\theta$, where $\lVert\boldsymbol{\varphi}_1\rVert = 1$ and $\theta$ is the angle between $\mathbf{x}$ and $\boldsymbol{\varphi}_1$


PCA Derivations (4/13)

  – Note that the covariance and (auto-)correlation matrices of the zero-mean $\mathbf{x}$ are

$$\boldsymbol{\Sigma} = E\{(\mathbf{x}-\boldsymbol{\mu})(\mathbf{x}-\boldsymbol{\mu})^T\} \approx \frac{1}{N}\sum_i \mathbf{x}_i\mathbf{x}_i^T - \boldsymbol{\mu}\boldsymbol{\mu}^T,
\qquad
\mathbf{R} = E\{\mathbf{x}\mathbf{x}^T\} \approx \frac{1}{N}\sum_i \mathbf{x}_i\mathbf{x}_i^T$$

  – $y_i$ also has the following properties
    • Its mean is zero, too: $E\{y_i\} = E\{\boldsymbol{\varphi}_i^T\mathbf{x}\} = \boldsymbol{\varphi}_i^T E\{\mathbf{x}\} = \boldsymbol{\varphi}_i^T\mathbf{0} = 0$
    • Its variance is

$$\sigma_i^2 = E\{y_i^2\} - \left(E\{y_i\}\right)^2 = E\{y_i^2\}
= E\{\boldsymbol{\varphi}_i^T\mathbf{x}\mathbf{x}^T\boldsymbol{\varphi}_i\}
= \boldsymbol{\varphi}_i^T E\{\mathbf{x}\mathbf{x}^T\}\boldsymbol{\varphi}_i
= \boldsymbol{\varphi}_i^T \mathbf{R}\,\boldsymbol{\varphi}_i$$

      where $\mathbf{R} = E\{\mathbf{x}\mathbf{x}^T\}$ is the (auto-)correlation matrix of $\mathbf{x}$
    • The correlation between two projections $y_i$ and $y_j$ is

$$E\{y_i y_j\} = E\{(\boldsymbol{\varphi}_i^T\mathbf{x})(\mathbf{x}^T\boldsymbol{\varphi}_j)\}
= \boldsymbol{\varphi}_i^T E\{\mathbf{x}\mathbf{x}^T\}\boldsymbol{\varphi}_j
= \boldsymbol{\varphi}_i^T \mathbf{R}\,\boldsymbol{\varphi}_j$$


PCA Derivations (5/13)

• Minimum Mean-Squared Error Criterion
  – We want to choose only m of the $\boldsymbol{\varphi}_i$'s such that we can still approximate $\mathbf{x}$ well in the mean-squared error sense

$$\mathbf{x} = \sum_{i=1}^{n} y_i\boldsymbol{\varphi}_i
= \sum_{i=1}^{m} y_i\boldsymbol{\varphi}_i + \sum_{j=m+1}^{n} y_j\boldsymbol{\varphi}_j,
\qquad
\hat{\mathbf{x}}(m) = \sum_{i=1}^{m} y_i\boldsymbol{\varphi}_i \quad \text{(reconstructed vector; } \mathbf{x} \text{ is the original vector)}$$

  – The resulting mean-squared error is

$$\bar{\varepsilon}^2(m) = E\left\{\lVert \mathbf{x} - \hat{\mathbf{x}}(m)\rVert^2\right\}
= E\left\{\left(\sum_{j=m+1}^{n} y_j\boldsymbol{\varphi}_j\right)^{T}\left(\sum_{k=m+1}^{n} y_k\boldsymbol{\varphi}_k\right)\right\}
= \sum_{j=m+1}^{n}\sum_{k=m+1}^{n} E\{y_j y_k\}\,\boldsymbol{\varphi}_j^T\boldsymbol{\varphi}_k
= \sum_{j=m+1}^{n} E\{y_j^2\}
= \sum_{j=m+1}^{n} \sigma_j^2
= \sum_{j=m+1}^{n} \boldsymbol{\varphi}_j^T\mathbf{R}\,\boldsymbol{\varphi}_j$$

    (using $\boldsymbol{\varphi}_j^T\boldsymbol{\varphi}_k = 1$ if $j = k$ and $0$ otherwise, and $\sigma_j^2 = E\{y_j^2\} - (E\{y_j\})^2 = E\{y_j^2\}$ since $E\{y_j\} = 0$)

  – We should therefore discard the bases onto which the projections have lower variances


PCA Derivations (6/13)

• Minimum Mean-Squared Error Criterion
  – If the orthonormal (basis) set is selected to be the eigenvectors of the correlation matrix $\mathbf{R}$, associated with eigenvalues $\lambda_i$
    • They will have the property that $\mathbf{R}\boldsymbol{\varphi}_j = \lambda_j\boldsymbol{\varphi}_j$
    • $\mathbf{R}$ is real and symmetric, therefore its eigenvectors form an orthonormal set
    • $\mathbf{R}$ is positive definite ($\mathbf{x}^T\mathbf{R}\mathbf{x} > 0$) => all eigenvalues are positive
  – Such that the mean-squared error mentioned above becomes

$$\bar{\varepsilon}^2(m) = \sum_{j=m+1}^{n} \sigma_j^2
= \sum_{j=m+1}^{n} \boldsymbol{\varphi}_j^T\mathbf{R}\,\boldsymbol{\varphi}_j
= \sum_{j=m+1}^{n} \lambda_j\,\boldsymbol{\varphi}_j^T\boldsymbol{\varphi}_j
= \sum_{j=m+1}^{n} \lambda_j$$


PCA Derivations (7/13)

• Minimum Mean-Squared Error Criterion
  – If the retained eigenvectors are those associated with the m largest eigenvalues, the mean-squared error will be

$$\bar{\varepsilon}^2_{\text{eigen}}(m) = \sum_{j=m+1}^{n} \lambda_j,
\qquad \text{where } \lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_m \geq \cdots \geq \lambda_n \geq 0$$

  – Any two projections $y_i$ and $y_j$ ($i \neq j$) will be mutually uncorrelated:

$$E\{y_i y_j\} = E\{(\boldsymbol{\varphi}_i^T\mathbf{x})(\mathbf{x}^T\boldsymbol{\varphi}_j)\}
= \boldsymbol{\varphi}_i^T\mathbf{R}\,\boldsymbol{\varphi}_j
= \lambda_j\,\boldsymbol{\varphi}_i^T\boldsymbol{\varphi}_j = 0$$

• Good news for most statistical modeling approaches
  – Gaussians and diagonal covariance matrices: the full covariance matrix $[\sigma_{ij}]_{n\times n}$ of the original features becomes diagonal for the projected features
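To make the eigen-decomposition view concrete, the following MATLAB/Octave sketch (illustrative only, with a synthetic data set; not part of the original slides) keeps the m leading eigenvectors of the correlation matrix and verifies that the empirical mean-squared error equals the sum of the discarded eigenvalues:

    N = 5000; n = 5; m = 2;                 % assumed example sizes
    X = randn(N, n) * diag([3 2 1 0.5 0.2]);
    X = X - repmat(mean(X), N, 1);          % enforce zero mean

    R = (X' * X) / N;                       % correlation matrix R = E{x x^T}
    [Phi, Lambda] = eig(R);                 % R * phi_j = lambda_j * phi_j
    [lambda, idx] = sort(diag(Lambda), 'descend');
    Phi = Phi(:, idx);                      % columns ordered by decreasing eigenvalue

    Y    = X * Phi(:, 1:m);                 % projections y_i = phi_i' * x
    Xhat = Y * Phi(:, 1:m)';                % reconstruction x_hat(m)

    mse     = mean(sum((X - Xhat).^2, 2));  % empirical mean-squared error
    mse_eig = sum(lambda(m+1:end));         % sum of the discarded eigenvalues
    fprintf('MSE = %.4f, discarded eigenvalue sum = %.4f\n', mse, mse_eig);

The two printed numbers should agree up to floating-point error, illustrating that the mean-squared error is the sum of the discarded eigenvalues.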


PCA Derivations (8/13)

• A Two-dimensional Example of Principal Component Analysis


PCA Derivations (9/13)

• Minimum Mean-Squared Error Criterion
  – It can be proved that $\bar{\varepsilon}^2_{\text{eigen}}(m)$ is the optimal solution under the mean-squared error criterion

  – Objective function to be minimized, with the orthonormality constraints $\boldsymbol{\varphi}_j^T\boldsymbol{\varphi}_k = \delta_{jk}$ ($\delta_{jk} = 1$ if $j = k$, $0$ if $j \neq k$) enforced by multipliers $u_{jk}$:

$$J = \sum_{j=m+1}^{n} \boldsymbol{\varphi}_j^T\mathbf{R}\,\boldsymbol{\varphi}_j
- \sum_{j=m+1}^{n}\sum_{k=m+1}^{n} u_{jk}\left(\boldsymbol{\varphi}_j^T\boldsymbol{\varphi}_k - \delta_{jk}\right)$$

  – Partial differentiation, using $\frac{\partial}{\partial\boldsymbol{\varphi}}\left(\boldsymbol{\varphi}^T\mathbf{R}\boldsymbol{\varphi}\right) = 2\mathbf{R}\boldsymbol{\varphi}$:

$$\frac{\partial J}{\partial \boldsymbol{\varphi}_j} = 2\mathbf{R}\boldsymbol{\varphi}_j - 2\sum_{k=m+1}^{n} u_{jk}\boldsymbol{\varphi}_k = \mathbf{0},
\qquad \forall\, j,\; m+1 \leq j \leq n$$

  – Collecting the columns $\boldsymbol{\Phi}_{n-m} = [\boldsymbol{\varphi}_{m+1}\;\ldots\;\boldsymbol{\varphi}_n]$ and the multipliers $\mathbf{U}_{n-m} = [u_{jk}]$:

$$\mathbf{R}\,\boldsymbol{\Phi}_{n-m} = \boldsymbol{\Phi}_{n-m}\,\mathbf{U}_{n-m}$$

  – This has a particular solution if $\mathbf{U}_{n-m}$ is a diagonal matrix whose diagonal elements are the eigenvalues $\lambda_{m+1}, \ldots, \lambda_n$ of $\mathbf{R}$, with $\boldsymbol{\varphi}_{m+1}, \ldots, \boldsymbol{\varphi}_n$ their corresponding eigenvectors


PCA Derivations (10/13)

• Given an input vector x with dimension n
  – Try to construct a linear transform Φ′ (Φ′ is an n×m matrix, m < n) such that the truncation result, Φ′ᵀx, is optimal in the mean-squared error sense

  – Encoder:

$$\mathbf{y} = \boldsymbol{\Phi}'^T\mathbf{x}, \qquad
\mathbf{x} = [x_1\; x_2\; \ldots\; x_n]^T,\quad
\mathbf{y} = [y_1\; y_2\; \ldots\; y_m]^T,\quad
\boldsymbol{\Phi}' = [\boldsymbol{\varphi}_1\; \boldsymbol{\varphi}_2\; \ldots\; \boldsymbol{\varphi}_m]$$

  – Decoder:

$$\hat{\mathbf{x}} = \boldsymbol{\Phi}'\mathbf{y}, \qquad \hat{\mathbf{x}} = [\hat{x}_1\; \hat{x}_2\; \ldots\; \hat{x}_n]^T$$

  – Minimize $E\left\{(\mathbf{x}-\hat{\mathbf{x}})^T(\mathbf{x}-\hat{\mathbf{x}})\right\}$
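A minimal sketch of this encoder/decoder pair (assuming X and the sorted eigenvector matrix Phi from the sketch above; again not part of the original slides):

    PhiP = Phi(:, 1:m);       % Phi' : n x m truncated basis (m < n)

    y    = PhiP' * X(1,:)';   % encoder: m x 1 code for the first sample
    xhat = PhiP  * y;         % decoder: n x 1 reconstruction of that sample

    err  = X(1,:)' - xhat;
    fprintf('squared reconstruction error of sample 1: %.4f\n', err' * err);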


PCA Derivations (11/13)

• Data compression in communication

  – PCA is an optimal transform for signal representation and dimensionality reduction, but not necessarily for classification tasks such as speech recognition (to be discussed later on)

  – PCA needs no prior information (e.g., class distributions of output information) about the sample patterns


PCA Derivations (12/13)

• Scree Graph
  – The plot of variance as a function of the number of eigenvectors kept
    • Select m such that

$$\frac{\lambda_1 + \lambda_2 + \cdots + \lambda_m}{\lambda_1 + \lambda_2 + \cdots + \lambda_m + \cdots + \lambda_n} \geq \text{Threshold}$$

    • Or select those eigenvectors with eigenvalues larger than the average input variance (the average eigenvalue):

$$\lambda_m \geq \frac{1}{n}\sum_{i=1}^{n}\lambda_i$$
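Both selection rules are easy to apply in code; here is a small MATLAB/Octave sketch (assuming the sorted eigenvalue vector lambda from the earlier sketch, with an arbitrary 0.9 threshold as an example; not part of the original slides):

    threshold = 0.9;                               % assumed example threshold

    ratio = cumsum(lambda) / sum(lambda);          % cumulative proportion of variance
    m1    = find(ratio >= threshold, 1, 'first');  % smallest m reaching the threshold

    m2    = sum(lambda >= mean(lambda));           % count of above-average eigenvalues

    fprintf('keep %d dims (>= %.0f%% variance) or %d dims (above-average eigenvalues)\n', ...
            m1, 100*threshold, m2);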


PCA Derivations (13/13)

• PCA finds a linear transform W such that the sum of average between-class variation and average within-class variation is maximal:

$$J(\mathbf{W}) = \left|\mathbf{W}^T\mathbf{S}\mathbf{W}\right| = \left|\mathbf{W}^T(\mathbf{S}_w + \mathbf{S}_b)\mathbf{W}\right| = \left|\tilde{\mathbf{S}}_w + \tilde{\mathbf{S}}_b\right| \;?$$

$$\mathbf{S} = \frac{1}{N}\sum_i (\mathbf{x}_i - \bar{\mathbf{x}})(\mathbf{x}_i - \bar{\mathbf{x}})^T \quad (i:\ \text{sample index}),
\qquad
\mathbf{S}_w = \frac{1}{N}\sum_j N_j\boldsymbol{\Sigma}_j,
\qquad
\mathbf{S}_b = \frac{1}{N}\sum_j N_j(\bar{\mathbf{x}}_j - \bar{\mathbf{x}})(\bar{\mathbf{x}}_j - \bar{\mathbf{x}})^T \quad (j:\ \text{class index})$$

$$\tilde{\mathbf{S}}_w = \mathbf{W}^T\mathbf{S}_w\mathbf{W},
\qquad
\tilde{\mathbf{S}}_b = \mathbf{W}^T\mathbf{S}_b\mathbf{W}$$

Try to show that: $\mathbf{S} = \mathbf{S}_w + \mathbf{S}_b$


PCA Examples: Data Analysis

• Example 1: principal components of some data points


PCA Examples: Feature Transformation

• Example 2: feature transformation and selection

Correlation matrix for the old feature dimensions

New feature dimensions

threshold for information content preserved


PCA Examples: Image Coding (1/2)

• Example 3: Image Coding

[Figure: a 256×256 image divided into 8×8 blocks]


PCA Examples: Image Coding (2/2)

• Example 3: Image Coding (cont.): feature reduction and value reduction


PCA Examples: Eigenface (1/4)

• Example 4: Eigenface in face recognition (Turk and Pentland, 1991)

  – Consider an individual image to be a linear combination of a small number of face components or "eigenfaces" derived from a set of reference images:

$$\mathbf{x}_1 = \begin{bmatrix} x_{1,1} \\ x_{1,2} \\ \vdots \\ x_{1,n} \end{bmatrix},\quad
\mathbf{x}_2 = \begin{bmatrix} x_{2,1} \\ x_{2,2} \\ \vdots \\ x_{2,n} \end{bmatrix},\quad
\ldots,\quad
\mathbf{x}_L = \begin{bmatrix} x_{L,1} \\ x_{L,2} \\ \vdots \\ x_{L,n} \end{bmatrix}$$

  – Steps (see the sketch after this list)
    • Convert each of the L reference images into a vector of floating-point numbers representing the light intensity in each pixel
    • Calculate the covariance/correlation matrix between these reference vectors
    • Apply Principal Component Analysis (PCA) to find the eigenvectors of the matrix: the eigenfaces
    • Besides, the vector obtained by averaging all images is called "eigenface 0"; the other eigenfaces from "eigenface 1" onwards model the variations from this average face
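A compact MATLAB/Octave sketch of these steps (illustrative only, not from the original slides: the image size, the number of reference images, and the random pixel data are all assumed; the small L×L eigen-problem is the usual Turk-and-Pentland shortcut):

    n = 64*64; L = 20;                 % assumed image size and number of reference images
    X = rand(n, L);                    % columns = vectorized reference images (fake data)

    x0 = mean(X, 2);                   % "eigenface 0": the average face
    Xc = X - repmat(x0, 1, L);         % variations from the average face

    [V, D]   = eig(Xc' * Xc);          % small L x L eigen-problem instead of n x n
    [~, idx] = sort(diag(D), 'descend');
    K = 7;                             % number of eigenfaces to keep (assumed)
    E = Xc * V(:, idx(1:K));           % n x K eigenfaces (unnormalized)
    E = E ./ repmat(sqrt(sum(E.^2)), n, 1);   % normalize each eigenface to unit length

    w = E' * (X(:,1) - x0);            % weights representing reference image 1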


PCA Examples: Eigenface (2/4)

• Example 4: Eigenface in face recognition (cont.)
  – Steps (cont.)
    • Then each face is represented as eigenface 0 plus a linear combination of the remaining K (K ≤ L) eigenfaces:

$$\hat{\mathbf{x}}_i = \bar{\mathbf{x}} + w_{i,1}\mathbf{e}_1 + w_{i,2}\mathbf{e}_2 + \cdots + w_{i,K}\mathbf{e}_K
\;\Rightarrow\;
\mathbf{y}_i = \left[1,\, w_{i,1},\, w_{i,2},\, \ldots,\, w_{i,K}\right] \quad \text{(feature vector of person } i\text{)}$$

  – The eigenface approach follows the minimum mean-squared error criterion
  – Incidentally, the eigenfaces are not only themselves usually plausible faces, but also directions of variation between faces


PCA Examples: Eigenface (3/4)

Face images as the training set

The averaged face


PCA Examples: Eigenface (4/4)

Seven eigenfaces derived from the training set (they indicate directions of variation between faces)

A projected face image


PCA Examples: Eigenvoice (1/3)

• Example 5: Eigenvoice in speaker adaptation (PSTL, 2000)

  – Steps
    • Concatenate the regarded parameters for each speaker r to form a huge vector a(r) (a "supervector")
      – e.g., the SD HMM model mean parameters (μ), giving a supervector of dimension D = (M·n)×1
    • Apply Principal Component Analysis to the R speaker supervectors to construct the eigenvoice space
    • Each new speaker S is represented by a point P in K-space:

$$\mathbf{P} = \mathbf{e}(0) + w_{i,1}\mathbf{e}(1) + w_{i,2}\mathbf{e}(2) + \cdots + w_{i,K}\mathbf{e}(K)$$

[Figure: speaker 1 … speaker R data are used to train speaker-dependent HMMs (starting from an SI HMM model); their supervectors feed PCA to construct the eigenvoice space]


PCA Examples: Eigenvoice (2/3)

• Example 5: Eigenvoice in speaker adaptation (cont.)


PCA Examples: Eigenvoice (3/3)

• Example 5: Eigenvoice in speaker adaptation (cont.)
  – Dimension 1 (eigenvoice 1): correlates with pitch or sex
  – Dimension 2 (eigenvoice 2): correlates with amplitude
  – Dimension 3 (eigenvoice 3): correlates with second-formant movement

Note that eigenface operates on the feature space while eigenvoice operates on the model space


Linear Discriminant Analysis (LDA) (1/2)

• Also called
  – Fisher's Linear Discriminant Analysis, Fisher-Rao Linear Discriminant Analysis

• Fisher (1936): introduced it for two-class classification

• Rao (1965): extended it to handle multiple-class classification


Linear Discriminant Analysis (LDA) (2/2)

• Given a set of sample vectors with labeled (class) information, try to find a linear transform W such that the ratio of average between-class variation over average within-class variation is maximal

Within-class distributions are assumed here to be Gaussians with equal variance in the two-dimensional sample space


LDA Derivations (1/4)

• Suppose there are N sample vectors $\mathbf{x}_i$ with dimensionality n, each belonging to one of the J classes, with class index $g(\mathbf{x}_i) \in \{1, 2, \ldots, J\}$
  – The sample mean is

$$\bar{\mathbf{x}} = \frac{1}{N}\sum_i \mathbf{x}_i$$

  – The class sample means are

$$\bar{\mathbf{x}}_j = \frac{1}{N_j}\sum_{g(\mathbf{x}_i)=j} \mathbf{x}_i$$

  – The class sample covariances are

$$\boldsymbol{\Sigma}_j = \frac{1}{N_j}\sum_{g(\mathbf{x}_i)=j} (\mathbf{x}_i - \bar{\mathbf{x}}_j)(\mathbf{x}_i - \bar{\mathbf{x}}_j)^T$$

  – The average within-class variation before the transform is

$$\mathbf{S}_w = \frac{1}{N}\sum_j N_j\boldsymbol{\Sigma}_j$$

  – The average between-class variation before the transform is

$$\mathbf{S}_b = \frac{1}{N}\sum_j N_j(\bar{\mathbf{x}}_j - \bar{\mathbf{x}})(\bar{\mathbf{x}}_j - \bar{\mathbf{x}})^T$$


LDA Derivations (2/4)

• If the transform $\mathbf{W} = [\mathbf{w}_1\; \mathbf{w}_2\; \ldots\; \mathbf{w}_m]$ is applied, $\mathbf{y} = \mathbf{W}^T\mathbf{x}$
  – The sample vectors will be $\mathbf{y}_i = \mathbf{W}^T\mathbf{x}_i$
  – The sample mean will be

$$\bar{\mathbf{y}} = \frac{1}{N}\sum_{i=1}^{N}\mathbf{W}^T\mathbf{x}_i = \mathbf{W}^T\left(\frac{1}{N}\sum_{i=1}^{N}\mathbf{x}_i\right) = \mathbf{W}^T\bar{\mathbf{x}}$$

  – The class sample means will be

$$\bar{\mathbf{y}}_j = \frac{1}{N_j}\sum_{g(\mathbf{x}_i)=j}\mathbf{W}^T\mathbf{x}_i = \mathbf{W}^T\bar{\mathbf{x}}_j$$

  – The average within-class variation will be

$$\tilde{\mathbf{S}}_w
= \frac{1}{N}\sum_j N_j\cdot\frac{1}{N_j}\sum_{g(\mathbf{x}_i)=j}\left(\mathbf{W}^T\mathbf{x}_i - \mathbf{W}^T\bar{\mathbf{x}}_j\right)\left(\mathbf{W}^T\mathbf{x}_i - \mathbf{W}^T\bar{\mathbf{x}}_j\right)^T
= \mathbf{W}^T\left(\frac{1}{N}\sum_j N_j\boldsymbol{\Sigma}_j\right)\mathbf{W}
= \mathbf{W}^T\mathbf{S}_w\mathbf{W}$$

  – In short, the transform maps $\mathbf{S}_w \rightarrow \tilde{\mathbf{S}}_w$ and $\mathbf{S}_b \rightarrow \tilde{\mathbf{S}}_b$ as $\mathbf{x} \rightarrow \mathbf{y}$


LDA Derivations (3/4)

• If the transform $\mathbf{W} = [\mathbf{w}_1\; \mathbf{w}_2\; \ldots\; \mathbf{w}_m]$ is applied
  – Similarly, the average between-class variation will be $\tilde{\mathbf{S}}_b = \mathbf{W}^T\mathbf{S}_b\mathbf{W}$
  – Try to find the optimal $\mathbf{W}$ such that the following objective function is maximized:

$$J(\mathbf{W}) = \frac{\left|\tilde{\mathbf{S}}_b\right|}{\left|\tilde{\mathbf{S}}_w\right|} = \frac{\left|\mathbf{W}^T\mathbf{S}_b\mathbf{W}\right|}{\left|\mathbf{W}^T\mathbf{S}_w\mathbf{W}\right|}$$

    • A closed-form solution: the column vectors of an optimal matrix $\mathbf{W}$ are the generalized eigenvectors corresponding to the largest eigenvalues in

$$\mathbf{S}_b\mathbf{w}_i = \lambda_i\mathbf{S}_w\mathbf{w}_i$$

    • That is, the $\mathbf{w}_i$'s are the eigenvectors corresponding to the largest eigenvalues of

$$\mathbf{S}_w^{-1}\mathbf{S}_b\mathbf{w}_i = \lambda_i\mathbf{w}_i$$
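The closed-form recipe above is straightforward to prototype; the MATLAB/Octave sketch below (illustrative only, with a synthetic two-class data set; not part of the original slides) builds Sw and Sb as defined earlier and keeps the leading eigenvectors of inv(Sw)*Sb:

    % Synthetic labeled data: rows of X are samples, class labels in g (assumed example)
    X = [randn(100,3) + repmat([ 2 0 0], 100, 1);
         randn(150,3) + repmat([-1 1 0], 150, 1)];
    g = [ones(100,1); 2*ones(150,1)];
    N = size(X,1); xbar = mean(X);

    Sw = zeros(3); Sb = zeros(3);
    for j = unique(g)'
        Xj  = X(g == j, :);  Nj = size(Xj,1);  xbj = mean(Xj);
        Sw  = Sw + (Nj/N) * cov(Xj, 1);                  % class covariance (1/Nj normalization)
        Sb  = Sb + (Nj/N) * (xbj - xbar)' * (xbj - xbar);
    end

    [W, D]   = eig(Sw \ Sb);                             % eigenvectors of inv(Sw)*Sb
    [~, idx] = sort(real(diag(D)), 'descend');
    m = 1;                                               % number of discriminant directions kept
    W = real(W(:, idx(1:m)));

    Y = X * W;                                           % projected (discriminant) features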


LDA Derivations (4/4)

• Proof:

$$\hat{\mathbf{W}} = \arg\max_{\mathbf{W}} J(\mathbf{W})
= \arg\max_{\mathbf{W}} \frac{\left|\tilde{\mathbf{S}}_b\right|}{\left|\tilde{\mathbf{S}}_w\right|}
= \arg\max_{\mathbf{W}} \frac{\left|\mathbf{W}^T\mathbf{S}_b\mathbf{W}\right|}{\left|\mathbf{W}^T\mathbf{S}_w\mathbf{W}\right|}
\qquad (|\cdot|:\ \text{determinant})$$

Or equivalently, for each column vector $\mathbf{w}_i$ of $\mathbf{W}$, we want to maximize

$$\lambda_i = \frac{\mathbf{w}_i^T\mathbf{S}_b\mathbf{w}_i}{\mathbf{w}_i^T\mathbf{S}_w\mathbf{w}_i}$$

Using the quotient rule $\left(\frac{F}{G}\right)' = \frac{F'G - FG'}{G^2}$ and $\frac{d}{d\mathbf{x}}\left(\mathbf{x}^T\mathbf{C}\mathbf{x}\right) = (\mathbf{C} + \mathbf{C}^T)\mathbf{x}$ (with $\mathbf{S}_b$ and $\mathbf{S}_w$ symmetric):

$$\frac{\partial\lambda_i}{\partial\mathbf{w}_i}
= \frac{2\mathbf{S}_b\mathbf{w}_i\left(\mathbf{w}_i^T\mathbf{S}_w\mathbf{w}_i\right) - \left(\mathbf{w}_i^T\mathbf{S}_b\mathbf{w}_i\right)2\mathbf{S}_w\mathbf{w}_i}{\left(\mathbf{w}_i^T\mathbf{S}_w\mathbf{w}_i\right)^2} = \mathbf{0}$$

$$\Rightarrow\; \mathbf{S}_b\mathbf{w}_i - \left(\frac{\mathbf{w}_i^T\mathbf{S}_b\mathbf{w}_i}{\mathbf{w}_i^T\mathbf{S}_w\mathbf{w}_i}\right)\mathbf{S}_w\mathbf{w}_i = \mathbf{0}
\;\Rightarrow\; \mathbf{S}_b\mathbf{w}_i - \lambda_i\mathbf{S}_w\mathbf{w}_i = \mathbf{0}
\;\Rightarrow\; \mathbf{S}_b\mathbf{w}_i = \lambda_i\mathbf{S}_w\mathbf{w}_i$$

$$\Rightarrow\; \mathbf{S}_w^{-1}\mathbf{S}_b\mathbf{w}_i = \lambda_i\mathbf{w}_i$$


LDA Examples: Feature Transformation (1/2)

• Example 1: Experiments on Speech Signal Processing

[Figure: covariance matrix of the 18-Mel-filter-bank vectors vs. covariance matrix of the 18-cepstral vectors (after the cosine transform); both calculated using Year-99's 5471 files]

$$\boldsymbol{\Sigma} = \frac{1}{N}\sum_i (\mathbf{x}_i - \bar{\mathbf{x}})(\mathbf{x}_i - \bar{\mathbf{x}})^T,
\qquad
\boldsymbol{\Sigma}' = \frac{1}{N}\sum_i (\mathbf{y}_i - \bar{\mathbf{y}})(\mathbf{y}_i - \bar{\mathbf{y}})^T$$


LDA Examples: Feature Transformation (2/2)

• Example 1: Experiments on Speech Signal Processing (cont.)

[Figure: covariance matrix of the 18-PCA-cepstral vectors (after the PCA transform) vs. covariance matrix of the 18-LDA-cepstral vectors (after the LDA transform); both calculated using Year-99's 5471 files]

Character Error Rate       TC      WG
  MFCC                   26.32   22.71
  LDA-1                  23.12   20.17
  LDA-2                  23.11   20.11


PCA vs. LDA (1/2)

[Figure: PCA vs. LDA]


Heteroscedastic Discriminant Analysis (HDA)

• HDA: Heteroscedastic Discriminant Analysis
  – The difference in the projections obtained from LDA and HDA for the 2-class case

• Clearly, HDA provides a much lower classification error than LDA theoretically
  – However, most statistical modeling approaches assume data samples are Gaussian and have diagonal covariance matrices


HW: Feature Transformation (1/4)

• Given two data sets (MaleData, FemaleData) in which each row is a sample with 39 features, please perform the following operations:
  1. Merge these two data sets and find/plot the covariance matrix for the merged data set.
  2. Apply PCA and LDA transformations to the merged data set, respectively. Also, find/plot the covariance matrices for the transformed data, respectively. Describe the phenomena that you have observed.
  3. Use the first two principal components of PCA as well as the first two eigenvectors of LDA to represent the merged data set. Selectively plot portions of samples from MaleData and FemaleData, respectively. Describe the phenomena that you have observed.


http://berlin.csie.ntnu.edu.tw/PastCourses/2004S-MachineLearningandDataMining/Homework/HW-1/MaleData.txt


HW: Feature Transformation (2/4)


HW: Feature Transformation (3/4)

• Plot Covariance Matrix

    CoVar = [
      3.0 0.5 0.4;
      0.9 6.3 0.2;
      0.4 0.4 4.2;
    ];
    colormap('default');
    surf(CoVar);

• Eigen Decomposition

    BE = [
      3.0 3.5 1.4;
      1.9 6.3 2.2;
      2.4 0.4 4.2;
    ];
    WI = [
      4.0 4.1 2.1;
      2.9 8.7 3.5;
      4.4 3.2 4.3;
    ];

    % LDA
    IWI = inv(WI);
    A = IWI * BE;

    % PCA
    A = BE + WI;   % why ?? ( Prove it! )

    [V, D] = eig(A);
    [V, D] = eigs(A, 3);

    fid = fopen('Basis', 'w');
    for i = 1:3        % feature vector length
      for j = 1:3      % basis number
        fprintf(fid, '%10.10f ', V(i, j));
      end
      fprintf(fid, '\n');
    end
    fclose(fid);


HW: Feature Transformation (4/4)

• Examples

[Figure: scatter plots of 2000 original data samples after the PCA transform (left) and after the LDA transform (right); axes are feature 1 vs. feature 2]


Latent Semantic Analysis (LSA) (1/7)

• Also called Latent Semantic Indexing (LSI), Latent Semantic Mapping (LSM)

• A technique originally proposed for Information Retrieval (IR), which projects queries and docs into a space with "latent" semantic dimensions
  – Co-occurring terms are projected onto the same dimensions
  – In the latent semantic space (with fewer dimensions), a query and a doc can have high cosine similarity even if they do not share any terms
  – Dimensions of the reduced space correspond to the axes of greatest variation

• Closely related to Principal Component Analysis (PCA)


LSA (2/7)

• Dimension Reduction and Feature Extraction
  – PCA (in an n-dimensional feature space, with orthonormal basis $\{\boldsymbol{\varphi}_i\}$):

$$y_i = \boldsymbol{\varphi}_i^T\mathbf{x}, \qquad
\hat{\mathbf{x}} = \sum_{i=1}^{k} y_i\boldsymbol{\varphi}_i, \qquad
\min \lVert \mathbf{x} - \hat{\mathbf{x}} \rVert^2 \ \text{for a given } k$$

  – SVD (in LSA): the m×n matrix A is approximated in the k-dimensional latent semantic space by

$$\mathbf{A}_{m\times n} \approx \mathbf{A}' = \mathbf{U}'_{m\times k}\,\boldsymbol{\Sigma}'_{k\times k}\,\mathbf{V}'^T_{k\times n},
\qquad r \leq \min(m,n),
\qquad \min \lVert \mathbf{A} - \mathbf{A}' \rVert_F^2 \ \text{for a given } k$$


LSA (3/7)

  – Singular Value Decomposition (SVD) used for the word-document matrix
    • A least-squares method for dimension reduction

  – Projection of a vector $\mathbf{x}$ onto a unit basis vector $\boldsymbol{\varphi}_1$:

$$y_1 = \boldsymbol{\varphi}_1^T\mathbf{x} = \lVert\mathbf{x}\rVert\cos\theta, \qquad \lVert\boldsymbol{\varphi}_1\rVert = 1$$


LSA (4/7)

• Frameworks to circumvent vocabulary mismatch

[Diagram: both the doc and the query are mapped from their terms to a structure model; retrieval can proceed by literal term matching (with doc expansion or query expansion) or by latent semantic structure retrieval]


LSA (5/7)


LSA (6/7)

Query: “human computer interaction”

An OOV word


LSA (7/7)

• Singular Value Decomposition (SVD) of the m×n word-document matrix A (rows: words w1 … wm; columns: docs d1 … dn)

$$\mathbf{A}_{m\times n} = \mathbf{U}_{m\times r}\,\boldsymbol{\Sigma}_{r\times r}\,\mathbf{V}^T_{r\times n},
\qquad r \leq \min(m,n)$$

  – Row A ∈ Rⁿ, Col A ∈ Rᵐ
  – Both U and V have orthonormal column vectors: $\mathbf{U}^T\mathbf{U} = \mathbf{I}_{r\times r}$, $\mathbf{V}^T\mathbf{V} = \mathbf{I}_{r\times r}$

• Rank-k approximation (k ≤ r):

$$\mathbf{A}' = \mathbf{U}'_{m\times k}\,\boldsymbol{\Sigma}_{k\,(k\times k)}\,\mathbf{V}'^T_{k\times n},
\qquad \lVert\mathbf{A}\rVert_F^2 \geq \lVert\mathbf{A}'\rVert_F^2,
\qquad \lVert\mathbf{A}\rVert_F^2 = \sum_{i=1}^{m}\sum_{j=1}^{n} a_{ij}^2$$

  – Docs and queries are represented in a k-dimensional space; the quantities of the axes can be properly weighted according to the associated diagonal values of $\boldsymbol{\Sigma}_k$
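A minimal MATLAB/Octave sketch of this rank-k truncation (illustrative only, not part of the original slides; the tiny word-document matrix below is made up):

    A = [2 0 1 0;                       % toy word-document matrix: rows = words, cols = docs
         1 1 0 0;
         0 2 3 1;
         0 0 1 2];
    k = 2;                              % number of latent semantic dimensions

    [U, S, V] = svd(A);                 % full SVD: A = U * S * V'
    Uk = U(:, 1:k);  Sk = S(1:k, 1:k);  Vk = V(:, 1:k);
    Ak = Uk * Sk * Vk';                 % best rank-k approximation in the Frobenius norm

    fprintf('||A||_F^2 = %.2f  >=  ||A_k||_F^2 = %.2f\n', ...
            norm(A, 'fro')^2, norm(Ak, 'fro')^2);

    docvecs  = Vk * Sk;                 % one row per document in the k-dim latent space
    termvecs = Uk * Sk;                 % one row per term in the k-dim latent space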

mxn mxk according to the associated diagonalvalues of Σk


LSA Derivations (1/7)

• Singular Value Decomposition (SVD)
  – $\mathbf{A}^T\mathbf{A}$ is a symmetric n×n matrix
    • All eigenvalues $\lambda_j$ are nonnegative real numbers: $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n \geq 0$
    • All eigenvectors $\mathbf{v}_j$ are orthonormal ($\in \mathbb{R}^n$): $\mathbf{V}_{n\times n} = [\mathbf{v}_1\;\mathbf{v}_2\;\ldots\;\mathbf{v}_n]$, $\mathbf{v}_j^T\mathbf{v}_j = 1$, $\mathbf{V}^T\mathbf{V} = \mathbf{I}_{n\times n}$
    • Define $\boldsymbol{\Sigma} = \mathrm{diag}(\sigma_1, \ldots, \sigma_n)$ with the singular values $\sigma_j = \sqrt{\lambda_j}$, $j = 1, \ldots, n$
      – As the square roots of the eigenvalues of $\mathbf{A}^T\mathbf{A}$
      – As the lengths of the vectors $\mathbf{A}\mathbf{v}_1, \mathbf{A}\mathbf{v}_2, \ldots, \mathbf{A}\mathbf{v}_n$:

$$\lVert\mathbf{A}\mathbf{v}_i\rVert^2 = (\mathbf{A}\mathbf{v}_i)^T(\mathbf{A}\mathbf{v}_i)
= \mathbf{v}_i^T\mathbf{A}^T\mathbf{A}\mathbf{v}_i = \lambda_i\,\mathbf{v}_i^T\mathbf{v}_i = \lambda_i
\;\Rightarrow\; \lVert\mathbf{A}\mathbf{v}_i\rVert = \sigma_i$$

  – For $\lambda_i \neq 0$, $i = 1, \ldots, r$, $\{\mathbf{A}\mathbf{v}_1, \mathbf{A}\mathbf{v}_2, \ldots, \mathbf{A}\mathbf{v}_r\}$ is an orthogonal basis of Col A


LSA Derivations (2/7)

• $\{\mathbf{A}\mathbf{v}_1, \ldots, \mathbf{A}\mathbf{v}_r\}$ is an orthogonal basis of Col A:

$$\mathbf{A}\mathbf{v}_i \cdot \mathbf{A}\mathbf{v}_j = (\mathbf{A}\mathbf{v}_i)^T(\mathbf{A}\mathbf{v}_j)
= \mathbf{v}_i^T\mathbf{A}^T\mathbf{A}\mathbf{v}_j = \lambda_j\,\mathbf{v}_i^T\mathbf{v}_j = 0 \qquad (i \neq j)$$

  – Suppose that A (or $\mathbf{A}^T\mathbf{A}$) has rank r ≤ n:

$$\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_r > 0, \qquad \lambda_{r+1} = \lambda_{r+2} = \cdots = \lambda_n = 0$$

  – Define an orthonormal basis $\{\mathbf{u}_1, \ldots, \mathbf{u}_r\}$ for Col A:

$$\mathbf{u}_i = \frac{1}{\lVert\mathbf{A}\mathbf{v}_i\rVert}\mathbf{A}\mathbf{v}_i = \frac{1}{\sigma_i}\mathbf{A}\mathbf{v}_i
\;\Rightarrow\; \sigma_i\mathbf{u}_i = \mathbf{A}\mathbf{v}_i
\;\Rightarrow\; [\mathbf{u}_1\;\ldots\;\mathbf{u}_r]\,\boldsymbol{\Sigma} = \mathbf{A}\,[\mathbf{v}_1\;\ldots\;\mathbf{v}_r]$$

    (V: an orthonormal n×r matrix, known in advance; U: also an orthonormal matrix, m×r)

  – Extend to an orthonormal basis $\{\mathbf{u}_1, \ldots, \mathbf{u}_m\}$ of $\mathbb{R}^m$:

$$[\mathbf{u}_1\;\ldots\;\mathbf{u}_m]\,\boldsymbol{\Sigma} = \mathbf{A}\,[\mathbf{v}_1\;\ldots\;\mathbf{v}_n]
\;\Rightarrow\; \mathbf{U}\boldsymbol{\Sigma} = \mathbf{A}\mathbf{V}
\;\Rightarrow\; \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^T = \mathbf{A}\mathbf{V}\mathbf{V}^T
\;\Rightarrow\; \mathbf{A} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^T$$

    where now

$$\boldsymbol{\Sigma}_{m\times n} = \begin{pmatrix} \boldsymbol{\Sigma}_{r\times r} & \mathbf{0}_{r\times(n-r)} \\ \mathbf{0}_{(m-r)\times r} & \mathbf{0}_{(m-r)\times(n-r)} \end{pmatrix}$$

  – Questions posed on the slide: is $\mathbf{V}\mathbf{V}^T = \mathbf{I}_{n\times n}$? Is $\lVert\mathbf{A}\rVert_F^2 = \sum_{i=1}^m\sum_{j=1}^n a_{ij}^2 = \sigma_1^2 + \sigma_2^2 + \cdots + \sigma_r^2$?


LSA Derivations (3/7)

[Diagram: the geometry of the SVD. The $\mathbf{v}_i$ span the row space of A (in $\mathbb{R}^n$) and the $\mathbf{u}_i$ span the column space of A (in $\mathbb{R}^m$); partitioning $\mathbf{V} = [\mathbf{V}_1\;\mathbf{V}_2]$ and $\mathbf{U} = [\mathbf{U}_1\;\mathbf{U}_2]$, the block form

$$\mathbf{A} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^T
= \begin{pmatrix}\mathbf{U}_1 & \mathbf{U}_2\end{pmatrix}
\begin{pmatrix}\boldsymbol{\Sigma}_1 & \mathbf{0}\\ \mathbf{0} & \mathbf{0}\end{pmatrix}
\begin{pmatrix}\mathbf{V}_1^T\\ \mathbf{V}_2^T\end{pmatrix}
= \mathbf{U}_1\boldsymbol{\Sigma}_1\mathbf{V}_1^T$$

shows that $\mathbf{V}_2$ spans the null space of A ($\mathbf{A}\mathbf{x} = \mathbf{0}$) and that $\mathbf{U}\boldsymbol{\Sigma} = \mathbf{A}\mathbf{V}$]


LSA Derivations (4/7)

• Additional Explanations
  – Each row of U is related to the projection of a corresponding row of A onto the basis formed by the columns of V:

$$\mathbf{A} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^T
\;\Rightarrow\; \mathbf{A}\mathbf{V} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^T\mathbf{V} = \mathbf{U}\boldsymbol{\Sigma}$$

    • The i-th entry of a row of U is related to the projection of a corresponding row of A onto the i-th column of V

  – Each row of V is related to the projection of a corresponding row of $\mathbf{A}^T$ onto the basis formed by the columns of U:

$$\mathbf{A} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^T
\;\Rightarrow\; \mathbf{A}^T\mathbf{U} = \mathbf{V}\boldsymbol{\Sigma}^T\mathbf{U}^T\mathbf{U} = \mathbf{V}\boldsymbol{\Sigma}$$

    • The i-th entry of a row of V is related to the projection of a corresponding row of $\mathbf{A}^T$ onto the i-th column of U


LSA Derivations (5/7)

• Fundamental comparisons based on SVD
  – The original word-document matrix (A, m×n; rows: words, columns: docs)
    • compare two terms → dot product of two rows of A, or an entry in $\mathbf{A}\mathbf{A}^T$
    • compare two docs → dot product of two columns of A, or an entry in $\mathbf{A}^T\mathbf{A}$
    • compare a term and a doc → each individual entry of A

  – The new word-document matrix (A'), with $\mathbf{U}' = \mathbf{U}_{m\times k}$, $\boldsymbol{\Sigma}' = \boldsymbol{\Sigma}_k$, $\mathbf{V}' = \mathbf{V}_{n\times k}$ ($\boldsymbol{\Sigma}'$ stretches or shrinks the axes)
    • compare two terms → dot product of two rows of $\mathbf{U}'\boldsymbol{\Sigma}'$:

$$\mathbf{A}'\mathbf{A}'^T = (\mathbf{U}'\boldsymbol{\Sigma}'\mathbf{V}'^T)(\mathbf{U}'\boldsymbol{\Sigma}'\mathbf{V}'^T)^T
= \mathbf{U}'\boldsymbol{\Sigma}'\mathbf{V}'^T\mathbf{V}'\boldsymbol{\Sigma}'^T\mathbf{U}'^T
= (\mathbf{U}'\boldsymbol{\Sigma}')(\mathbf{U}'\boldsymbol{\Sigma}')^T$$

    • compare two docs → dot product of two rows of $\mathbf{V}'\boldsymbol{\Sigma}'$:

$$\mathbf{A}'^T\mathbf{A}' = (\mathbf{U}'\boldsymbol{\Sigma}'\mathbf{V}'^T)^T(\mathbf{U}'\boldsymbol{\Sigma}'\mathbf{V}'^T)
= \mathbf{V}'\boldsymbol{\Sigma}'^T\mathbf{U}'^T\mathbf{U}'\boldsymbol{\Sigma}'\mathbf{V}'^T
= (\mathbf{V}'\boldsymbol{\Sigma}')(\mathbf{V}'\boldsymbol{\Sigma}')^T$$

    • compare a query word and a doc → each individual entry of A'


LSA Derivations (6/7)

• Fold-in: find representations for pseudo-docs q
  – For objects (new queries or docs) that did not appear in the original analysis
    • Fold-in a new m×1 query (or doc) vector:

$$\hat{\mathbf{q}}_{1\times k} = \mathbf{q}^T_{1\times m}\,\mathbf{U}_{m\times k}\,\boldsymbol{\Sigma}^{-1}_{k\times k}$$

      (the query is represented by the weighted sum of its constituent term vectors; the separate dimensions are differentially weighted by $\boldsymbol{\Sigma}^{-1}$; the result is just like a row of V)

  – Cosine measure between the query and doc row vectors in the latent semantic space:

$$\mathrm{sim}(\hat{\mathbf{q}}, \hat{\mathbf{d}})
= \mathrm{cosine}(\hat{\mathbf{q}}\boldsymbol{\Sigma}, \hat{\mathbf{d}}\boldsymbol{\Sigma})
= \frac{\hat{\mathbf{q}}\boldsymbol{\Sigma}^2\hat{\mathbf{d}}^T}{\lVert\hat{\mathbf{q}}\boldsymbol{\Sigma}\rVert\;\lVert\hat{\mathbf{d}}\boldsymbol{\Sigma}\rVert}$$
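Continuing the toy SVD sketch above (still illustrative; Uk, Sk, Vk are assumed to be available from that sketch), folding in a new query and ranking the documents with the cosine measure could look like:

    q = [1; 0; 1; 0];                       % m x 1 query vector of term weights (assumed example)

    qhat = q' * Uk / Sk;                    % fold-in: 1 x k, i.e. q' * Uk * inv(Sk)

    qs  = qhat * Sk;                        % axes weighted by the singular values
    sim = zeros(1, size(Vk, 1));
    for j = 1:size(Vk, 1)
        ds     = Vk(j, :) * Sk;             % j-th doc vector, weighted the same way
        sim(j) = (qs * ds') / (norm(qs) * norm(ds));    % cosine measure
    end
    disp(sim);                              % similarity of the query to each document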


LSA Derivations (7/7)

• Fold-in a new 1×n term vector:

$$\hat{\mathbf{t}}_{1\times k} = \mathbf{t}_{1\times n}\,\mathbf{V}_{n\times k}\,\boldsymbol{\Sigma}^{-1}_{k\times k}$$


LSA Example

• Experimental results
  – HMM is consistently better than VSM at all recall levels
  – LSA is better than VSM at higher recall levels

[Figure: Recall-Precision curve at 11 standard recall levels evaluated on the TDT-3 SD collection (using word-level indexing terms)]


LSA: Conclusions

• Advantages
  – A clean formal framework and a clearly defined optimization criterion (least-squares)
    • Conceptual simplicity and clarity
  – Handles synonymy problems ("heterogeneous vocabulary")
  – Good results for high-recall search
    • Takes term co-occurrence into account

• Disadvantages
  – High computational complexity
  – LSA offers only a partial solution to polysemy
    • E.g., bank, bass, …


LSA Toolkit: SVDLIBC (1/5)

• Doug Rohde's SVD C Library version 1.3 is based on the SVDPACKC library

• Download it at http://tedlab.mit.edu/~dr/


LSA Toolkit: SVDLIBC (2/5)

• Given a sparse term-doc matrix
  – E.g., 4 terms and 3 docs (dense form):

        2.3  0.0  4.2
        0.0  1.3  2.2
        3.8  0.0  0.5
        0.0  0.0  0.0

  – Sparse text representation: a header line "4 3 6" (number of terms/rows, number of docs/columns, number of nonzero entries), followed by one block per column giving the number of nonzero entries in that column and then its (row, value) pairs:

        4 3 6
        2
        0 2.3
        2 3.8
        1
        1 1.3
        3
        0 4.2
        1 2.2
        2 0.5

  – Each entry is weighted by a TF×IDF score

• Perform SVD to obtain the corresponding term and doc vectors represented in the latent semantic space

• Evaluate the information retrieval capability of the LSA approach by using varying sizes (e.g., 100, 200, …, 600, etc.) of the LSA dimensionality
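For convenience, here is a small MATLAB/Octave sketch that writes a matrix in the column-by-column sparse text layout illustrated above (a sketch based only on the example shown on this slide; the authoritative format description is in the SVDLIBC documentation):

    A = [2.3 0.0 4.2;                     % toy TFxIDF-weighted term-doc matrix
         0.0 1.3 2.2;
         3.8 0.0 0.5;
         0.0 0.0 0.0];

    fid = fopen('Term-Doc-Matrix', 'w');
    fprintf(fid, '%d %d %d\n', size(A,1), size(A,2), nnz(A));   % header: terms docs nonzeros
    for c = 1:size(A, 2)
        rows = find(A(:, c));                                   % nonzero rows of this column
        fprintf(fid, '%d\n', numel(rows));                      % entry count for this column
        for r = rows'
            fprintf(fid, '%d %g\n', r - 1, A(r, c));            % 0-based row index and value
        end
    end
    fclose(fid);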


LSA Toolkit: SVDLIBC (3/5)

• Example: term-doc matrix
  – Indexing: 51253 terms, 2265 docs, 21885277 nonzero entries
  – First entries of the sparse matrix file:

        51253 2265 21885277
        508 7.725771
        596 16.213399
        612 13.080868
        709 7.725771
        713 7.725771
        744 7.725771
        1190 7.725771
        1200 16.213399
        1259 7.725771
        ...

• SVD command (IR_svd.bat)

        svd -r st -o LSA100 -d 100 Term-Doc-Matrix

  – "-r st": sparse matrix input
  – "-o LSA100": prefix of the output files
  – "-d 100": number of reserved eigenvectors (LSA dimensions)
  – "Term-Doc-Matrix": name of the sparse matrix input file
  – Outputs: LSA100-Ut, LSA100-S, LSA100-Vt


LSA Toolkit: SVDLIBC (4/5)

• LSA100-Ut (51253 words)

        100 51253
        0.003 0.001 ……
        0.002 0.002 ……
        ...

  – each word vector (uᵀ) is 1×100

• LSA100-S (100 eigenvalues)

        100
        2686.18
        829.941
        559.59
        ...

• LSA100-Vt (2265 docs)

        100 2265
        0.021 0.035 ……
        0.012 0.022 ……
        ...

  – each doc vector (vᵀ) is 1×100


LSA Toolkit: SVDLIBC (5/5)

• Fold-in a new m×1 query vector (TF×IDF weighted beforehand):

$$\hat{\mathbf{q}}_{1\times k} = \mathbf{q}^T_{1\times m}\,\mathbf{U}_{m\times k}\,\boldsymbol{\Sigma}^{-1}_{k\times k}$$

  – The query is represented by the weighted sum of its constituent term vectors; the separate dimensions are differentially weighted; the result is just like a row of V

• Cosine measure between the query and doc vectors in the latent semantic space:

$$\mathrm{sim}(\hat{\mathbf{q}}, \hat{\mathbf{d}})
= \mathrm{cosine}(\hat{\mathbf{q}}\boldsymbol{\Sigma}, \hat{\mathbf{d}}\boldsymbol{\Sigma})
= \frac{\hat{\mathbf{q}}\boldsymbol{\Sigma}^2\hat{\mathbf{d}}^T}{\lVert\hat{\mathbf{q}}\boldsymbol{\Sigma}\rVert\;\lVert\hat{\mathbf{d}}\boldsymbol{\Sigma}\rVert}$$