SVD, SVD applications to LSA, non-negative matrix
factorizations
Presented By: Sumedha Singla
Singular Value Decomposition (SVD)
SVD of a matrix X:

X_{n×d} = U_{n×n} Σ_{n×d} V_{d×d}^T,  or  X_{n×d} = U_{n×k} Σ_{k×k} V_{k×d}^T

• X: a set of n points in ℝ^d with rank k
• U: left singular vectors of X
• V: right singular vectors of X
• Σ: rectangular diagonal matrix with positive real entries

X = [u_1 … u_k … u_n] diag(σ_1, …, σ_k) [v_1 … v_k … v_d]^T

X = U Σ V^T = u_1 σ_1 v_1^T + … + u_k σ_k v_k^T = Σ_{i=1}^{k} u_i σ_i v_i^T
Singular Value Decomposition (SVD)

SVD of a matrix X:  X v_i = σ_i u_i
• Finds an orthogonal basis for the row space that gets transformed into an orthogonal basis for the column space.
• The columns of V and U are bases for the row and column spaces, respectively.
• U and V are orthonormal square matrices, i.e. U U^T = U^T U = I and V V^T = V^T V = I.
• Usually, U ≠ V.
Motivation

• Goal: find the best k-dimensional subspace w.r.t. X (project X to ℝ^k, where k < d)
  • Minimize the sum of the squares of the perpendicular distances of the points to the subspace.
• Consider a set of 2-d points: X_{n×2}, x_i ∈ ℝ^2, 1 ≤ i ≤ n
  • Goal: find the best-fitting line through the origin w.r.t. X. Here k = 1.
  • Best least-squares fit:
    • Minimize Σ_i α_i^2 (the perpendicular distances), or
    • Maximize Σ_i β_i^2, i.e. the projections of the x_i onto the subspace.

Singular Vectors (X v_i = σ_i u_i)

• v: a unit vector in the direction of the best-fitting line through the origin w.r.t. X
• β_i = |x_i · v|
• Best least-squares fit: maximize Σ_i β_i^2 = ‖X v‖^2
• First singular vector: v_1 = argmax_{‖v‖=1} ‖X v‖
• First singular value: σ_1 = ‖X v_1‖
• Greedy approach for subsequent singular vectors: the best-fit line perpendicular to v_1 gives
  v_2 = argmax_{v ⊥ v_1, ‖v‖=1} ‖X v‖
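A quick numerical sketch of these definitions, assuming numpy (the data is illustrative): it checks that X v_i = σ_i u_i and that no unit vector beats v_1.

```python
import numpy as np

# Illustrative data: n = 100 points in R^2.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 2))

U, s, Vt = np.linalg.svd(X, full_matrices=False)

# X v_i = sigma_i u_i for every singular triplet.
for i in range(len(s)):
    assert np.allclose(X @ Vt[i], s[i] * U[:, i])

# v_1 maximizes ||X v|| over unit vectors: random candidates never beat it.
for _ in range(1000):
    v = rng.standard_normal(2)
    v /= np.linalg.norm(v)
    assert np.linalg.norm(X @ v) <= s[0] + 1e-9
```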
Intuitive Interpretation

A composition of three geometrical transformations: a rotation or reflection, a scaling, and another rotation or reflection.

X = U Σ V^T

• Consider a unit circle: points x′ with x′ · x′ = 1.
• Stretching and rotating it produces an ellipse of any size and orientation.
• Conversely, consider 2-d points and fit an ellipse with major axis a and minor axis b to them.
• Consider
  M = [a 0; 0 b],  R = [cos θ  sin θ; −sin θ  cos θ]
• Any point can be transformed as x′ = x R M^{−1} (mapping the ellipse back to the unit circle).
• The equation of the unit circle then reads (x R M^{−1}) · (x R M^{−1}) = 1.
Intuitive Interpretation (X = U Σ V^T)

• Resulting matrix equation: M^{−1} R^T X^T X R M^{−1} = I
• If we regard X as a collection of points, then:
  • The singular values give the lengths of the axes of a least-squares fitted ellipsoid.
  • V is the orientation of the ellipsoid.
  • The matrix U is the projection of each of the points in X onto the axes.
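A small sketch of this rotation, scaling, rotation picture, assuming numpy (the 2×2 matrix is illustrative):

```python
import numpy as np

# The SVD factors act on the unit circle in three stages:
# rotate/reflect (V^T), scale (Sigma), rotate/reflect (U).
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
U, s, Vt = np.linalg.svd(A)

theta = np.linspace(0, 2 * np.pi, 200)
circle = np.stack([np.cos(theta), np.sin(theta)])   # unit circle, 2 x 200

ellipse = A @ circle                          # image of the circle under A
staged = U @ (np.diag(s) @ (Vt @ circle))     # same image, built in stages
assert np.allclose(ellipse, staged)

print("ellipse semi-axis lengths:", s)        # the singular values
print("axis directions (columns of U):\n", U)
```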
SVD Example (X_{n×d} = U_{n×k} Σ_{k×k} V_{k×d}^T)

• Natural Language Processing: documents with 2 concepts:
  • Computer Science (CS)
  • Medical Documents (MD)
• X: term-document matrix (one row per document, one column per term)
• U: document-concept similarity matrix (one row per document, one column per concept)
• Σ: concept strength matrix (one row per concept)
• V^T: term-concept matrix (one row per concept, one column per term)
Eigenvector

• An eigenvector of a square matrix A is a nonzero vector v such that multiplication by A alters only the scale of v:
  A v = λ v
  • λ: eigenvalue
  • v: unit eigenvector
Eigenvalue Decomposition

A = V diag(λ) V^{−1}, where
• eigenvector matrix V = [v_1, …, v_n]
• diagonal matrix of eigenvalues λ = [λ_1, …, λ_n]
More general form: A = Q Λ Q^T
When are singular values the same as eigenvalues? (X = U Σ V^T)

• Eigenvalue decomposition: A = Q Λ Q^T
• A needs:
  • orthonormal eigenvectors, to allow Q = U = V;
  • eigenvalues λ ≥ 0, so that Λ = Σ.
• Hence, A must be a positive semi-definite (or definite) symmetric matrix.
• Eigenvalue decomposition is then a special case of SVD.
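A quick numerical check of this claim, assuming numpy (the matrix is illustrative): for a symmetric positive semi-definite matrix, the eigendecomposition and the SVD agree.

```python
import numpy as np

# Build a symmetric PSD matrix, then compare its eigendecomposition
# and its SVD: the spectra coincide and both factorizations reproduce A.
rng = np.random.default_rng(1)
B = rng.standard_normal((4, 4))
A = B @ B.T                        # symmetric PSD by construction

eigvals, Q = np.linalg.eigh(A)     # eigenvalues in ascending order
U, s, Vt = np.linalg.svd(A)        # singular values in descending order

assert np.allclose(np.sort(s), np.sort(eigvals))   # sigma_i = lambda_i >= 0
assert np.allclose(U @ np.diag(s) @ Vt, A)
```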
Calculating SVD using eigenvalue decomposition (X = U Σ V^T)

Rather than solving for U, V, and Σ simultaneously, we multiply X on the left by X^T = V Σ^T U^T:

X^T X = (U Σ V^T)^T (U Σ V^T)
      = V Σ^T U^T U Σ V^T
      = V Σ^T Σ V^T
      = V Σ^2 V^T

This has the form of an eigenvalue decomposition, A = Q Λ Q^T:
• V: the eigenvectors of X^T X.
• Σ^T Σ: the eigenvalue matrix of X^T X, i.e. σ_i^2 = λ_i.
• U: the eigenvectors of X X^T.
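A sketch of this route in numpy (illustrative data; in practice np.linalg.svd is the numerically preferred tool): eigendecompose X^T X to get V and σ_i = √λ_i, then recover u_i = X v_i / σ_i.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((6, 3))

lam, V = np.linalg.eigh(X.T @ X)   # eigenvalues in ascending order
order = np.argsort(lam)[::-1]      # SVD convention: descending
lam, V = lam[order], V[:, order]

sigma = np.sqrt(lam)               # sigma_i^2 = lambda_i
U = X @ V / sigma                  # columns u_i = X v_i / sigma_i

assert np.allclose(U @ np.diag(sigma) @ V.T, X)
```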
SVD and eigenvalue decomposition (X v_i = σ_i u_i)

We know that

u_i^T u_j = (X v_i / σ_i)^T (X v_j / σ_j) = v_i^T X^T X v_j / (σ_i σ_j)

Since X^T X v_j = σ_j^2 v_j and the v_i are orthonormal,

u_i^T u_j = (σ_j^2 / (σ_i σ_j)) v_i^T v_j = 0  for i ≠ j.

So U contains the orthonormal eigenvectors of X X^T, and we can thus write

X X^T U = U Σ^2
Example SVD

• Consider X = [4 4; −3 3]
• Compute X^T X = [4 −3; 4 3] [4 4; −3 3] = [25 7; 7 25]
• Orthogonal eigenvectors of X^T X:
  v_1 = [1/√2, 1/√2]^T and v_2 = [1/√2, −1/√2]^T
• Eigenvalues of X^T X: σ_1^2 = 32 and σ_2^2 = 18
• We have
  [4 4; −3 3] = U [4√2 0; 0 3√2] [1/√2 1/√2; 1/√2 −1/√2]
Example SVD (continued)

• Consider X = [4 4; −3 3]
• Compute X X^T = [4 4; −3 3] [4 −3; 4 3] = [32 0; 0 18]
• Orthogonal eigenvectors of X X^T:
  u_1 = [1, 0]^T and u_2 = [0, −1]^T
• We have
  [4 4; −3 3] = [1 0; 0 −1] [4√2 0; 0 3√2] [1/√2 1/√2; 1/√2 −1/√2]
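Assuming numpy, the worked example can be checked directly; the singular values 4√2 ≈ 5.66 and 3√2 ≈ 4.24 also drop out of np.linalg.svd.

```python
import numpy as np

# Verify the factorization X = U Sigma V^T from the example.
X = np.array([[4.0, 4.0],
              [-3.0, 3.0]])
U = np.array([[1.0, 0.0],
              [0.0, -1.0]])
S = np.diag([4 * np.sqrt(2), 3 * np.sqrt(2)])
Vt = np.array([[1.0, 1.0],
               [1.0, -1.0]]) / np.sqrt(2)

assert np.allclose(U @ S @ Vt, X)

_, s, _ = np.linalg.svd(X)
print(s)   # [5.6568... 4.2426...] = [4*sqrt(2), 3*sqrt(2)]
```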
SVD vs Eigendecomposition

• An eigendecomposition is valid only for (certain) square matrices; any matrix, even a rectangular one, has an SVD.
• In the eigendecomposition A = Q Λ Q^{−1}, the eigenbasis Q is not always orthogonal; the basis of singular vectors is always orthogonal.
• In the SVD we have two singular spaces (right and left).
• Computing the SVD of a matrix is more numerically stable.
SVD and PCA (X = U Σ V^T)

• The covariance matrix of X (with column-centered data) is given by
  Cov = X^T X / (n − 1)
• The eigenvalue decomposition of the Cov matrix:
  Cov = W Λ W^T
  where
  • W is a matrix of eigenvectors of Cov, i.e. the principal axes of X;
  • Λ is a diagonal matrix with the eigenvalues λ_i in decreasing order on the diagonal.
SVD and PCA (X = U Σ V^T)

• We can rewrite the covariance matrix of X as
  Cov = X^T X / (n − 1) = V Σ U^T U Σ V^T / (n − 1) = V (Σ^2 / (n − 1)) V^T
• The right singular vectors V are the principal axes.
• λ_i = σ_i^2 / (n − 1)
• X V = U Σ V^T V = U Σ
• The columns of U Σ are the principal components.
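A sketch of these identities in numpy (illustrative data): PCA via the SVD of the centered matrix, cross-checked against the covariance eigendecomposition.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((50, 4))
Xc = X - X.mean(axis=0)              # PCA assumes centered columns

U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
pcs = U * s                          # principal components U Sigma

cov = Xc.T @ Xc / (len(Xc) - 1)
lam = np.sort(np.linalg.eigvalsh(cov))[::-1]

assert np.allclose(lam, s**2 / (len(Xc) - 1))    # lambda_i = sigma_i^2/(n-1)
assert np.allclose(pcs, Xc @ Vt.T)               # X V = U Sigma
```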
SVD for dimensionality reduction (X = U Σ V^T)

• Input data: X_{n×d}
• Goal: reduce the dimensionality to k, where k < d
• Select the first k columns of U, and the k×k upper-left block of Σ
• Construct T = U_k Σ_{k×k}
• T is the required n×k matrix containing the first k PCs.
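A minimal helper for this reduction step, assuming numpy (the function name and shapes are illustrative):

```python
import numpy as np

def reduce_dim(X: np.ndarray, k: int) -> np.ndarray:
    """Project X onto its top-k singular directions: T = U_k Sigma_k."""
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] * s[:k]          # shape (n, k)

X = np.random.default_rng(4).standard_normal((100, 10))
T = reduce_dim(X, k=3)
print(T.shape)                       # (100, 3)
```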
Rank-k approximation in the spectral norm

The best approximation to X by a rank-deficient matrix is obtained from the top singular values and vectors of X. With

X = Σ_{i=1}^{r} u_i σ_i v_i^T  and  X_k = Σ_{i=1}^{k} u_i σ_i v_i^T,

we have

min_{B ∈ ℝ^{n×d}, rank(B) ≤ k} ‖X − B‖_2 = ‖X − X_k‖_2 = σ_{k+1}

• σ_{k+1} is the largest singular value of X − X_k.
• X_k is the best rank-k 2-norm approximation of X.
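A numerical check of this statement, assuming numpy (the shapes and k are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.standard_normal((8, 6))
U, s, Vt = np.linalg.svd(X, full_matrices=False)

k = 2
Xk = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]        # truncated SVD

# Spectral-norm error of the best rank-k approximation is sigma_{k+1}.
assert np.isclose(np.linalg.norm(X - Xk, ord=2), s[k])
```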
Applications of SVD

• Determining range, null space, and rank (also numerical rank).
• Matrix approximation.
• Inverse and pseudo-inverse:
  • If X = U Σ V^T and Σ is full rank, then X^{−1} = V Σ^{−1} U^T.
  • If Σ is singular, then its pseudo-inverse is given by X^+ = V Σ^+ U^T, where Σ^+ is formed by replacing every nonzero entry by its reciprocal.
• Least squares: if we need to solve X x = b in the least-squares sense, then x_LS = V Σ^+ U^T b.
• Denoising: small singular values typically correspond to noise.
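A sketch of the least-squares recipe x_LS = V Σ^+ U^T b in numpy (illustrative data), checked against np.linalg.lstsq:

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.standard_normal((10, 3))
b = rng.standard_normal(10)

U, s, Vt = np.linalg.svd(X, full_matrices=False)
s_pinv = np.where(s > 1e-12, 1.0 / s, 0.0)   # reciprocal of nonzero entries
x_ls = Vt.T @ (s_pinv * (U.T @ b))           # V Sigma^+ U^T b

assert np.allclose(x_ls, np.linalg.lstsq(X, b, rcond=None)[0])
```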
Latent Semantic Analysis using SVD

• Input matrix: term-document matrix
  • Rows represent words.
  • Columns represent documents.
  • Value: the count of the word in the document.
• Example:

X = [1 1 1 0 0 0;
     0 0 0 0 1 0;
     0 1 0 0 0 0;
     1 1 1 1 1 1;
     0 1 0 0 1 0]
Latent Semantic Indexing (LSI): X_{n×d} = U_{n×k} Σ_{k×k} V_{k×d}^T

• Consider X, the term-document matrix.
• Then:
  • U is the SVD term matrix
  • V is the SVD document matrix
• SVD provides a low-rank approximation for X.
• Constrained optimization problem
  • Goal: represent X as X_k with a low Frobenius norm for the error X − X_k.
Latent Semantic Indexing (LSI): X_{n×d} = U_{n×k} Σ_{k×k} V_{k×d}^T

With k = 2:
• We can get rid of zero-valued rows and columns and have a 2×2 concept strength matrix.
• We can get rid of zero-valued columns and have a 5×2 term-to-concept similarity matrix.
• We can get rid of zero-valued columns and have a 2×6 concept-to-document similarity matrix.
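These shapes can be reproduced with a rank-2 SVD of the example matrix above; a sketch assuming numpy:

```python
import numpy as np

X = np.array([[1, 1, 1, 0, 0, 0],
              [0, 0, 0, 0, 1, 0],
              [0, 1, 0, 0, 0, 0],
              [1, 1, 1, 1, 1, 1],
              [0, 1, 0, 0, 1, 0]], dtype=float)

U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
term_to_concept = U[:, :k]          # 5 x 2
concept_strength = np.diag(s[:k])   # 2 x 2
concept_to_doc = Vt[:k]             # 2 x 6

Xk = term_to_concept @ concept_strength @ concept_to_doc
print(np.round(Xk, 2))              # rank-2 approximation of X
```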
Latent Semantic Indexing (LSI)

[Figure: a query Q over the terms (truck, car, moon, astronaut, cosmonaut), compared against document d2 in the original space and in the reduced latent semantic space: cos(Q, d2) = 0 in the original space, while cos(Q, d2) ≈ 0.88 in the reduced space.]

We see that the query is not related to document 2 in the original space, but in the latent semantic space they become highly related.
Latent Semantic Indexing (LSI)

• SVD allows words and documents to be mapped into the same "latent semantic space".
• LSI projects queries and documents into a space with latent semantic dimensions.
  • Co-occurring words are projected onto the same dimensions.
  • Non-co-occurring words are projected onto different dimensions.
• LSI captures similarities between words. For example, we want to project "car" and "automobile" onto the same dimension.
• Dimensions of the reduced semantic space correspond to the axes of greatest variation in the original space.
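One common way to realize this projection is LSI "folding in": map a query's term vector q to q̂ = Σ_k^{−1} U_k^T q and compare it to the documents in concept space. A sketch assuming numpy, reusing the example matrix from the earlier slides (the query vector is illustrative):

```python
import numpy as np

X = np.array([[1, 1, 1, 0, 0, 0],
              [0, 0, 0, 0, 1, 0],
              [0, 1, 0, 0, 0, 0],
              [1, 1, 1, 1, 1, 1],
              [0, 1, 0, 0, 1, 0]], dtype=float)
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2

q = np.array([1, 0, 1, 0, 0], dtype=float)    # query over the 5 terms
q_hat = np.diag(1 / s[:k]) @ U[:, :k].T @ q   # query in concept space
docs = Vt[:k]                                 # documents in concept space

sims = docs.T @ q_hat / (np.linalg.norm(docs, axis=0) * np.linalg.norm(q_hat))
print(np.round(sims, 2))   # cosine similarity of the query to each document
```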
Kleinberg's Algorithm: Hyperlink-Induced Topic Search (HITS), aka "hubs and authorities"

• Extracts information from the link structure of a hyperlinked environment to rank pages relevant to a topic.
• Essentials:
  • Authorities
  • Hubs
• Goal: identify good authorities and hubs for a topic.
• Each page receives two scores:
  • Authority score A(p): estimates the value of the content on the page.
  • Hub score H(p): estimates the value of its links to other pages.

Authorities and Hubs

• For a topic, authorities are relevant nodes which are referred to by many hubs (high in-degree).
• For a topic, hubs are nodes which connect to many related authorities for that topic (high out-degree).
HITS (cont.)

• Three steps:
  1. Create a focused base set of the Web.
     • Start with a root set.
     • Add to it any page pointed to by a page in the root set.
     • Add to it any page that points to a page in the root set (at most d per root-set page).
     • The extended root set becomes our base set.
  2. Iteratively compute hub and authority scores.
     • A(p): sum of H(q) over all q pointing to p.
     • H(q): sum of A(p) over all p that q points to.
     • Start with all scores at 1, and iterate until convergence.
  3. Filter out the top hubs and authorities.
Matrix Notation

• G (the base set) is a directed graph with web pages as nodes and their links as edges.
• G can be represented as a connectivity matrix A:
  • A(i, j) = 1 only if the i-th page points to the j-th page.
• Authority weights can be represented as a unit vector a:
  • a_i: the authority weight of the i-th page.
• Hub weights can be represented as a unit vector h:
  • h_i: the hub weight of the i-th page.
Algorithm

• Updating authority weights: a = A^T h
• Updating hub weights: h = A a
• After k iterations:
  a_1 = A^T h_0, h_1 = A a_1 ⇒ h_1 = A A^T h_0 ⇒ h_k = (A A^T)^k h_0
• Convergence:
  • a_k converges to the principal eigenvector of A^T A.
  • h_k converges to the principal eigenvector of A A^T.
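A compact sketch of this iteration, assuming numpy (the 4-page adjacency matrix is illustrative); normalizing each step keeps the unit-vector convention above:

```python
import numpy as np

# A[i, j] = 1 iff page i links to page j.
A = np.array([[0, 1, 1, 0],
              [0, 0, 1, 0],
              [1, 0, 0, 0],
              [0, 0, 1, 0]], dtype=float)

h = np.ones(A.shape[0])
for _ in range(100):
    a = A.T @ h                 # authority update
    a /= np.linalg.norm(a)
    h = A @ a                   # hub update
    h /= np.linalg.norm(h)

print("authorities:", np.round(a, 3))
print("hubs:       ", np.round(h, 3))
# a and h converge to the principal eigenvectors of A^T A and A A^T,
# i.e. the top right/left singular vectors of A.
```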
Nonnegative Matrix Factorization (NMF)

Given A ∈ ℝ₊^{n×d} and a desired rank k ≪ min(n, d), find W ∈ ℝ₊^{n×k} and H ∈ ℝ₊^{k×d} s.t. A ≈ W H.

• min_{W ≥ 0, H ≥ 0} ‖A − W H‖_F
• Nonconvex.
• W and H are not unique (e.g., for a suitable D, W′ = W D ≥ 0 and H′ = D^{−1} H ≥ 0 give the same product).
• Notation: ℝ₊ denotes the nonnegative real numbers.
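A minimal sketch of this optimization using Lee-Seung multiplicative updates for the Frobenius objective (one standard approach among several; a library routine such as sklearn.decomposition.NMF would be the usual choice in practice):

```python
import numpy as np

def nmf(A: np.ndarray, k: int, iters: int = 500, eps: float = 1e-9):
    """Factor A >= 0 into W H with W, H >= 0 via multiplicative updates."""
    rng = np.random.default_rng(0)
    n, d = A.shape
    W = rng.random((n, k))
    H = rng.random((k, d))
    for _ in range(iters):
        H *= (W.T @ A) / (W.T @ W @ H + eps)   # update keeps H >= 0
        W *= (A @ H.T) / (W @ H @ H.T + eps)   # update keeps W >= 0
    return W, H

A = np.abs(np.random.default_rng(7).standard_normal((6, 5)))
W, H = nmf(A, k=2)
print(np.linalg.norm(A - W @ H))               # Frobenius reconstruction error
```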
Nonnegative Matrix Factorization (NMF)

• SVD gives: A = U Σ V^T
• Then ‖A − U_k Σ_k V_k^T‖_F ≤ min_{W ≥ 0, H ≥ 0} ‖A − W H‖_F
• Then WHY NMF?
• NMF's non-negativity constraints match naturally nonnegative data and give more interpretable factors, e.g. in text mining (A is represented as counts, so it is nonnegative).
References

• https://ocw.mit.edu/courses/mathematics/18-06sc-linear-algebra-fall-2011/positive-definite-matrices-and-applications/singular-value-decomposition/MIT18_06SCF11_Ses3.5sum.pdf
• https://archive.siam.org/meetings/sdm11/park.pdf