Text Analytics (Text Mining) -...
![Page 1: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014 · Text Analytics (Text Mining) LSI (uses](https://reader030.fdocuments.net/reader030/viewer/2022041102/5edbbc0dad6a402d66661918/html5/thumbnails/1.jpg)
Text Analytics (Text Mining) LSI (uses SVD), Visualization
CSE 6242 / CX 4242 Apr 3, 2014
Duen Horng (Polo) Chau, Georgia Tech
Some lectures are partly based on materials by Professors Guy Lebanon, Jeffrey Heer, John Stasko, Christos Faloutsos, Le Song
![Page 2: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014 · Text Analytics (Text Mining) LSI (uses](https://reader030.fdocuments.net/reader030/viewer/2022041102/5edbbc0dad6a402d66661918/html5/thumbnails/2.jpg)
Singular Value Decomposition (SVD): Motivation
Problem #1: Text - LSI uses SVD to find “concepts”
Problem #2: Compression / dimensionality reduction
![Page 3: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014 · Text Analytics (Text Mining) LSI (uses](https://reader030.fdocuments.net/reader030/viewer/2022041102/5edbbc0dad6a402d66661918/html5/thumbnails/3.jpg)
SVD - Motivation
Problem #1: text - LSI: find “concepts”
![Page 4: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014 · Text Analytics (Text Mining) LSI (uses](https://reader030.fdocuments.net/reader030/viewer/2022041102/5edbbc0dad6a402d66661918/html5/thumbnails/4.jpg)
SVD - Motivation
Customer-product, for recommendation system:
[Figure: customer-product matrix; customer groups: vegetarians, meat eaters; products: bread, lettuce, tomatoes, beef, chicken]
![Page 5: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014 · Text Analytics (Text Mining) LSI (uses](https://reader030.fdocuments.net/reader030/viewer/2022041102/5edbbc0dad6a402d66661918/html5/thumbnails/5.jpg)
SVD - Motivation
• Problem #2: compress / reduce dimensionality
![Page 6: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014 · Text Analytics (Text Mining) LSI (uses](https://reader030.fdocuments.net/reader030/viewer/2022041102/5edbbc0dad6a402d66661918/html5/thumbnails/6.jpg)
Problem - Specification
• ~10^6 rows; ~10^3 columns; no updates
• Random access to any cell(s); small error: OK
![Page 7: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014 · Text Analytics (Text Mining) LSI (uses](https://reader030.fdocuments.net/reader030/viewer/2022041102/5edbbc0dad6a402d66661918/html5/thumbnails/7.jpg)
SVD - Motivation
![Page 8: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014 · Text Analytics (Text Mining) LSI (uses](https://reader030.fdocuments.net/reader030/viewer/2022041102/5edbbc0dad6a402d66661918/html5/thumbnails/8.jpg)
SVD - Motivation
![Page 9: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014 · Text Analytics (Text Mining) LSI (uses](https://reader030.fdocuments.net/reader030/viewer/2022041102/5edbbc0dad6a402d66661918/html5/thumbnails/9.jpg)
SVD - Definition
(reminder: matrix multiplication)
[3 x 2] x [2 x 1] = [3 x 1]
![Page 14: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014 · Text Analytics (Text Mining) LSI (uses](https://reader030.fdocuments.net/reader030/viewer/2022041102/5edbbc0dad6a402d66661918/html5/thumbnails/14.jpg)
SVD - Definition
A[n x m] = U[n x r] Λ[r x r] (V[m x r])^T
• A: n x m matrix (e.g., n documents, m terms)
• U: n x r matrix (e.g., n documents, r concepts)
• Λ: r x r diagonal matrix (r: rank of the matrix; diagonal entries: strength of each ‘concept’)
• V: m x r matrix (e.g., m terms, r concepts)
![Page 15: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014 · Text Analytics (Text Mining) LSI (uses](https://reader030.fdocuments.net/reader030/viewer/2022041102/5edbbc0dad6a402d66661918/html5/thumbnails/15.jpg)
SVD - Definition
A[n x m] = U[n x r] Λ[r x r] (V[m x r])^T
[Figure: A (n x m; n documents, m terms) = U (n x r; n documents, r concepts) x Λ (r x r; diagonal entries: concept strengths) x V^T (r x m; m terms, r concepts)]
![Page 16: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014 · Text Analytics (Text Mining) LSI (uses](https://reader030.fdocuments.net/reader030/viewer/2022041102/5edbbc0dad6a402d66661918/html5/thumbnails/16.jpg)
SVD - Properties
THEOREM [Press+92]:
• It is always possible to decompose matrix A into A = U Λ V^T
• U, Λ, V: unique, most of the time
• U, V: column orthonormal, i.e., columns are unit vectors, orthogonal to each other
  – U^T U = I
  – V^T V = I
  (I: identity matrix)
• Λ: diagonal matrix with non-negative diagonal entries, sorted in decreasing order
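A quick way to see these properties is to check them numerically. The sketch below assumes NumPy and a small made-up matrix (not from the lecture). Note that `np.linalg.svd` with `full_matrices=False` returns min(n, m) singular values, which can be more than the rank r used on the slide; any extra ones are (numerically) zero.

```python
import numpy as np

# Hypothetical 4 x 3 matrix (n = 4 documents, m = 3 terms); values are made up.
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 3.0],
              [2.0, 0.0, 1.0],
              [1.0, 1.0, 1.0]])

# Thin SVD: U is n x k, s holds k singular values, Vt is k x m, with k = min(n, m).
U, s, Vt = np.linalg.svd(A, full_matrices=False)
Lam = np.diag(s)
k = len(s)

reconstructs = np.allclose(U @ Lam @ Vt, A)         # A = U Λ V^T
u_orthonormal = np.allclose(U.T @ U, np.eye(k))     # U^T U = I
v_orthonormal = np.allclose(Vt @ Vt.T, np.eye(k))   # V^T V = I
# Singular values are non-negative and sorted in decreasing order.
sorted_nonneg = bool(np.all(s >= 0) and np.all(np.diff(s) <= 0))

print(reconstructs, u_orthonormal, v_orthonormal, sorted_nonneg)
```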
![Page 17: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014 · Text Analytics (Text Mining) LSI (uses](https://reader030.fdocuments.net/reader030/viewer/2022041102/5edbbc0dad6a402d66661918/html5/thumbnails/17.jpg)
SVD - Example
• A = U Λ V^T - example:
[Figure: document-term matrix A (term columns: data, inf., retrieval, brain, lung; document row groups: CS, MD) shown as A = U x Λ x V^T]
Annotations on the figure:
• U: doc-to-concept similarity matrix, with a CS-concept column and an MD-concept column
• Λ: a diagonal entry is the ‘strength’ of the CS-concept
• V^T: term-to-concept similarity matrix, with the CS-concept highlighted
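The numbers in the example figure did not survive in this transcript, so the sketch below uses an assumed document-term matrix with the same block structure (four CS documents using only data / inf. / retrieval, three MD documents using only brain / lung). The counts are illustrative, not taken from the slide. It assumes NumPy.

```python
import numpy as np

terms = ["data", "inf.", "retrieval", "brain", "lung"]
# Assumed counts: rows 0-3 are CS documents, rows 4-6 are MD documents.
A = np.array([[1, 1, 1, 0, 0],
              [2, 2, 2, 0, 0],
              [1, 1, 1, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 0, 0, 2, 2],
              [0, 0, 0, 3, 3],
              [0, 0, 0, 1, 1]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Numerical rank r: number of non-negligible singular values.
r = int(np.sum(s > 1e-10))
U, s, Vt = U[:, :r], s[:r], Vt[:r, :]

# Singular vectors are only defined up to sign, so absolute values are printed.
print("rank r:", r)
print("concept strengths (diagonal of Λ):", np.round(s, 2))
print("doc-to-concept |U|:\n", np.round(np.abs(U), 2))
print("term-to-concept |V^T|:\n", np.round(np.abs(Vt), 2))
```

For this assumed matrix the rank is 2 and the two strengths are sqrt(93) and sqrt(28); the first row of V^T loads only on data / inf. / retrieval (the CS-concept), the second only on brain / lung (the MD-concept).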
![Page 23: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014 · Text Analytics (Text Mining) LSI (uses](https://reader030.fdocuments.net/reader030/viewer/2022041102/5edbbc0dad6a402d66661918/html5/thumbnails/23.jpg)
SVD - Interpretation #1
‘documents’, ‘terms’ and ‘concepts’:
• U: document-to-concept similarity matrix
• V: term-to-concept similarity matrix
• Λ: its diagonal elements give the ‘strength’ of each concept
![Page 24: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014 · Text Analytics (Text Mining) LSI (uses](https://reader030.fdocuments.net/reader030/viewer/2022041102/5edbbc0dad6a402d66661918/html5/thumbnails/24.jpg)
SVD - Interpretation #1
‘documents’, ‘terms’ and ‘concepts’:
Q: if A is the document-to-term matrix, what is A^T A?
A: term-to-term ([m x m]) similarity matrix
Q: A A^T?
A: document-to-document ([n x n]) similarity matrix
![Page 26: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014 · Text Analytics (Text Mining) LSI (uses](https://reader030.fdocuments.net/reader030/viewer/2022041102/5edbbc0dad6a402d66661918/html5/thumbnails/26.jpg)
SVD properties
• The columns of V are the eigenvectors of A^T A (the covariance matrix, up to a scaling factor, when the columns of A are mean-centered)
• The columns of U are the eigenvectors of the Gram (inner-product) matrix A A^T
Thus, SVD is closely related to PCA, and computing it directly on A can be numerically more stable than forming A^T A explicitly. For more info, see:
• http://math.stackexchange.com/questions/3869/what-is-the-intuitive-relationship-between-svd-and-pca
• Ian T. Jolliffe, Principal Component Analysis (2nd ed), Springer, 2002.
• Gilbert Strang, Linear Algebra and Its Applications (4th ed), Brooks Cole, 2005.
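The eigenvector relationship can be checked numerically. This is a minimal sketch assuming NumPy and random data (not from the lecture). The eigenvalues of A^T A and A A^T equal the squared singular values, and the eigenvectors agree with the singular vectors up to sign.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(6, 3))   # hypothetical data: n = 6 rows, m = 3 columns

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# eigh is for symmetric matrices; it returns eigenvalues in ascending order.
evals, evecs = np.linalg.eigh(A.T @ A)
evals, evecs = evals[::-1], evecs[:, ::-1]   # reorder to descending, like s

# Eigenvalues of A^T A are the squared singular values of A.
eig_match = np.allclose(evals, s**2)
# Eigenvectors of A^T A match the columns of V up to sign.
vec_match = np.allclose(np.abs(evecs.T @ Vt.T), np.eye(3), atol=1e-8)

# The nonzero eigenvalues of the Gram matrix A A^T are the same squared values.
gram_top = np.linalg.eigvalsh(A @ A.T)[::-1][:3]
gram_match = np.allclose(gram_top, s**2)

print(eig_match, vec_match, gram_match)
```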
![Page 27: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014 · Text Analytics (Text Mining) LSI (uses](https://reader030.fdocuments.net/reader030/viewer/2022041102/5edbbc0dad6a402d66661918/html5/thumbnails/27.jpg)
SVD - Interpretation #2
Best axis to project on (‘best’ = min sum of squares of projection errors, i.e., min RMS error)
[Figure: scatter plot of points with the best projection axis v1, the first singular vector]
![Page 28: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014 · Text Analytics (Text Mining) LSI (uses](https://reader030.fdocuments.net/reader030/viewer/2022041102/5edbbc0dad6a402d66661918/html5/thumbnails/28.jpg)
SVD - Interpretation #2
• A = U Λ V^T - example:
[Figure: A = U x Λ x V^T; annotations: v1; variance (‘spread’) on the v1 axis]
![Page 29: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014 · Text Analytics (Text Mining) LSI (uses](https://reader030.fdocuments.net/reader030/viewer/2022041102/5edbbc0dad6a402d66661918/html5/thumbnails/29.jpg)
SVD - Interpretation #2
• A = U Λ V^T - example:
  – U Λ gives the coordinates of the points in the projection axis
[Figure: A = U x Λ x V^T]
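Why U Λ holds the coordinates: since V^T V = I, multiplying A = U Λ V^T on the right by V gives A V = U Λ, and A V is exactly the projection of each row of A onto the singular vectors. A small sketch with made-up 2-D points (assuming NumPy; the data is hypothetical, and no mean-centering is applied, so the axis passes through the origin):

```python
import numpy as np

# Hypothetical 2-D points lying roughly along one direction.
A = np.array([[1.0, 1.1],
              [2.0, 1.9],
              [3.0, 3.2],
              [4.0, 3.9]])

U, s, Vt = np.linalg.svd(A, full_matrices=False)
v1 = Vt[0]                  # first singular vector: the best projection axis

coords = U * s              # same as U @ np.diag(s): column j of U scaled by s[j]
proj = A @ Vt.T             # project every point onto v1, v2

same = np.allclose(coords, proj)
print("v1:", np.round(v1, 3))
print("coordinates along v1:", np.round(coords[:, 0], 3))
print("U Λ equals A V:", same)
```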
![Page 30: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014 · Text Analytics (Text Mining) LSI (uses](https://reader030.fdocuments.net/reader030/viewer/2022041102/5edbbc0dad6a402d66661918/html5/thumbnails/30.jpg)
SVD - Interpretation #2
• More details
• Q: how exactly is dim. reduction done?
• A: set the smallest singular values to zero:
[Figure: A = U x Λ x V^T; after zeroing the smallest singular values, A ~ U x Λ x V^T (an approximation)]
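The zeroing step can be sketched as follows, assuming NumPy and an illustrative CS/MD matrix (the counts are assumed, since the figure's numbers are not in this transcript). The Frobenius norm of the approximation error equals the square root of the sum of the squared singular values that were dropped.

```python
import numpy as np

# Assumed document-term matrix: rows 0-3 CS documents, rows 4-6 MD documents.
A = np.array([[1, 1, 1, 0, 0],
              [2, 2, 2, 0, 0],
              [1, 1, 1, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 0, 0, 2, 2],
              [0, 0, 0, 3, 3],
              [0, 0, 0, 1, 1]], dtype=float)

def truncate(A, k):
    """Rank-k approximation of A: keep the k largest singular values, zero the rest."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_k = s.copy()
    s_k[k:] = 0.0                  # set the smallest singular values to zero
    return (U * s_k) @ Vt, s

A_1, s = truncate(A, 1)
err = np.linalg.norm(A - A_1)      # Frobenius norm of the approximation error
dropped = np.sqrt(np.sum(s[1:] ** 2))

print(np.round(A_1, 2))
print("error:", round(err, 4), "dropped strength:", round(dropped, 4))
```

With k = 1 only the stronger (CS) block survives in this example; the MD rows of the approximation become zero.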
![Page 36: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014 · Text Analytics (Text Mining) LSI (uses](https://reader030.fdocuments.net/reader030/viewer/2022041102/5edbbc0dad6a402d66661918/html5/thumbnails/36.jpg)
SVD - Interpretation #3
• finds non-zero ‘blobs’ in a data matrix = ‘communities’ (bi-partite cores, here)
[Figure: A = U x Λ x V^T; bipartite graph linking row nodes (Row 1, Row 4, Row 5, Row 7) and column nodes (Col 1, Col 3, Col 4)]
![Page 39: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014 · Text Analytics (Text Mining) LSI (uses](https://reader030.fdocuments.net/reader030/viewer/2022041102/5edbbc0dad6a402d66661918/html5/thumbnails/39.jpg)
SVD algorithm
• Numerical Recipes in C (free)
![Page 40: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014 · Text Analytics (Text Mining) LSI (uses](https://reader030.fdocuments.net/reader030/viewer/2022041102/5edbbc0dad6a402d66661918/html5/thumbnails/40.jpg)
SVD - Interpretation #3
• Drill: find the SVD, ‘by inspection’!
• Q: rank = ??
• A: rank = 2 (2 linearly independent rows/cols)
• Are the candidate column vectors orthogonal?
• The column vectors are orthogonal, but not unit vectors, so they are normalized:
[Figure: A = U x Λ x V^T, with U as follows]
    1/sqrt(3)   0
    1/sqrt(3)   0
    1/sqrt(3)   0
    0           1/sqrt(2)
    0           1/sqrt(2)
• and the singular values are: [shown in figure]
• Q: How to check we are correct?
![Page 46: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014 · Text Analytics (Text Mining) LSI (uses](https://reader030.fdocuments.net/reader030/viewer/2022041102/5edbbc0dad6a402d66661918/html5/thumbnails/46.jpg)
SVD - Interpretation #3
• A: SVD properties:
  – matrix product should give back matrix A
  – matrix U should be column-orthonormal, i.e., columns should be unit vectors, orthogonal to each other
  – ditto for matrix V
  – matrix Λ should be diagonal, with non-negative values
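These checks are easy to automate. In the sketch below (assuming NumPy), U is the matrix shown in the drill; the matrix A, the singular values (3, 2) and V are assumptions chosen to be consistent with that U, because the drill's figure is not reproduced in this transcript. `check_svd` is a hypothetical helper name.

```python
import numpy as np

def check_svd(A, U, lam, V, tol=1e-10):
    """Run the checks from the slide on a candidate SVD (lam = diagonal of Λ)."""
    r = len(lam)
    return {
        "product_gives_A": np.allclose(U @ np.diag(lam) @ V.T, A, atol=tol),
        "U_column_orthonormal": np.allclose(U.T @ U, np.eye(r), atol=tol),
        "V_column_orthonormal": np.allclose(V.T @ V, np.eye(r), atol=tol),
        "Lam_nonnegative": bool(np.all(np.asarray(lam) >= 0)),
    }

# Assumed drill matrix: a 3 x 3 block of ones and a 2 x 2 block of ones.
A = np.array([[1, 1, 1, 0, 0],
              [1, 1, 1, 0, 0],
              [1, 1, 1, 0, 0],
              [0, 0, 0, 1, 1],
              [0, 0, 0, 1, 1]], dtype=float)

a, b = 1 / np.sqrt(3), 1 / np.sqrt(2)
U = np.array([[a, 0], [a, 0], [a, 0], [0, b], [0, b]])   # as shown in the drill
V = U.copy()                                             # assumed: A is symmetric here
lam = np.array([3.0, 2.0])                               # assumed singular values

checks = check_svd(A, U, lam, V)
print(checks)
```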
![Page 47: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014 · Text Analytics (Text Mining) LSI (uses](https://reader030.fdocuments.net/reader030/viewer/2022041102/5edbbc0dad6a402d66661918/html5/thumbnails/47.jpg)
SVD - Complexity
• O(n*m*m) or O(n*n*m) (whichever is less)
• Faster version:
  – if we just want the singular values
  – or if we want the first k singular vectors
  – or if the matrix is sparse [Berry]
• No need to write your own! Available in most linear algebra packages (LINPACK, matlab, Splus/R, mathematica ...)
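As one example of using a package rather than writing your own, NumPy (not named on the slide; an assumption for illustration) exposes the "singular values only" shortcut through the `compute_uv` flag. The data below is random and hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(200, 20))    # hypothetical data matrix

# Singular values only: U and V are not computed.
s_only = np.linalg.svd(A, compute_uv=False)

# Full thin SVD for comparison.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

print(s_only.shape, np.allclose(s_only, s))
```

For the other two cases (only the first k singular vectors, or a sparse matrix), SciPy's `scipy.sparse.linalg.svds` is one commonly used routine; it is not shown here.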
![Page 48: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014 · Text Analytics (Text Mining) LSI (uses](https://reader030.fdocuments.net/reader030/viewer/2022041102/5edbbc0dad6a402d66661918/html5/thumbnails/48.jpg)
References
• Berry, Michael: http://www.cs.utk.edu/~lsi/
• Fukunaga, K. (1990). Introduction to Statistical Pattern Recognition, Academic Press.
• Press, W. H., S. A. Teukolsky, et al. (1992). Numerical Recipes in C, Cambridge University Press.
![Page 49: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014 · Text Analytics (Text Mining) LSI (uses](https://reader030.fdocuments.net/reader030/viewer/2022041102/5edbbc0dad6a402d66661918/html5/thumbnails/49.jpg)
Case study - LSI
Q1: How to do queries with LSI?
Q2: multi-lingual IR (English query, on Spanish text?)
![Page 50: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014 · Text Analytics (Text Mining) LSI (uses](https://reader030.fdocuments.net/reader030/viewer/2022041102/5edbbc0dad6a402d66661918/html5/thumbnails/50.jpg)
Case study - LSI
Q1: How to do queries with LSI?
Problem: e.g., find documents with ‘data’
[Figure: document-term matrix (terms: data, inf., retrieval, brain, lung; document groups: CS, MD) = U x Λ x V^T]
A: map query vectors into ‘concept space’. How?
[Figure: query vector q over the terms data, inf., retrieval, brain, lung, drawn in term space (axes term1, term2) together with the concept vectors v1 and v2]
A: inner product with each ‘concept’ vector vi, e.g., q o v1. Because each vi is a unit vector, this is the cosine similarity scaled by the length of q.
![Page 55: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014 · Text Analytics (Text Mining) LSI (uses](https://reader030.fdocuments.net/reader030/viewer/2022041102/5edbbc0dad6a402d66661918/html5/thumbnails/55.jpg)
Case study - LSI
Compactly, we have:
  q_concept = q V
E.g.:
[Figure: q (over the terms data, inf., retrieval, brain, lung) x V (term-to-concept similarities) = q_concept, with the CS-concept entry highlighted]
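A minimal sketch of this mapping, assuming NumPy and an illustrative CS/MD document-term matrix (the counts are assumed, not the figure's numbers). The query for ‘data’ is a term vector with a single 1; multiplying by V gives its coordinates in concept space.

```python
import numpy as np

terms = ["data", "inf.", "retrieval", "brain", "lung"]
# Assumed document-term matrix: rows 0-3 CS documents, rows 4-6 MD documents.
A = np.array([[1, 1, 1, 0, 0],
              [2, 2, 2, 0, 0],
              [1, 1, 1, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 0, 0, 2, 2],
              [0, 0, 0, 3, 3],
              [0, 0, 0, 1, 1]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
V = Vt[:2].T                      # m x r term-to-concept matrix, r = 2 concepts

q = np.zeros(len(terms))
q[terms.index("data")] = 1.0      # query containing only the term 'data'

q_concept = q @ V                 # q_concept = q V
# Signs of singular vectors are arbitrary, so magnitudes are printed.
print(np.round(np.abs(q_concept), 3))
```

In this example the query loads only on the first (CS) concept and has a zero MD-concept coordinate.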
![Page 56: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014 · Text Analytics (Text Mining) LSI (uses](https://reader030.fdocuments.net/reader030/viewer/2022041102/5edbbc0dad6a402d66661918/html5/thumbnails/56.jpg)
Case study - LSI
Drill: how would the document (‘information’, ‘retrieval’) be handled by LSI?
A: SAME: d_concept = d V
E.g.:
[Figure: d (over the terms data, inf., retrieval, brain, lung) x V (term-to-concept similarities) = d_concept, with the CS-concept entry highlighted]
![Page 58: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014 · Text Analytics (Text Mining) LSI (uses](https://reader030.fdocuments.net/reader030/viewer/2022041102/5edbbc0dad6a402d66661918/html5/thumbnails/58.jpg)
Case study - LSI
Observation: document (‘information’, ‘retrieval’) will be retrieved by query (‘data’), although it does not contain ‘data’!
[Figure: d and q, each over the terms data, inf., retrieval, brain, lung, both mapped onto the CS-concept]
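The observation can be reproduced with a short sketch, assuming NumPy and an illustrative CS/MD matrix (assumed counts). The query and the document share no terms, yet their concept-space vectors point in the same direction.

```python
import numpy as np

terms = ["data", "inf.", "retrieval", "brain", "lung"]
# Assumed document-term matrix: rows 0-3 CS documents, rows 4-6 MD documents.
A = np.array([[1, 1, 1, 0, 0],
              [2, 2, 2, 0, 0],
              [1, 1, 1, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 0, 0, 2, 2],
              [0, 0, 0, 3, 3],
              [0, 0, 0, 1, 1]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
V = Vt[:2].T                                  # term-to-concept matrix (2 concepts)

q = np.array([1.0, 0.0, 0.0, 0.0, 0.0])       # query: 'data'
d = np.array([0.0, 1.0, 1.0, 0.0, 0.0])       # document: 'information', 'retrieval'

term_overlap = q @ d                          # inner product in term space
q_c, d_c = q @ V, d @ V                       # both mapped into concept space
cos = (q_c @ d_c) / (np.linalg.norm(q_c) * np.linalg.norm(d_c))

print("term-space overlap:", term_overlap)
print("concept-space cosine similarity:", round(float(cos), 3))
```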
![Page 59: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014 · Text Analytics (Text Mining) LSI (uses](https://reader030.fdocuments.net/reader030/viewer/2022041102/5edbbc0dad6a402d66661918/html5/thumbnails/59.jpg)
Case study - LSI
Q1: How to do queries with LSI?
Q2: multi-lingual IR (English query, on Spanish text?)
![Page 60: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014 · Text Analytics (Text Mining) LSI (uses](https://reader030.fdocuments.net/reader030/viewer/2022041102/5edbbc0dad6a402d66661918/html5/thumbnails/60.jpg)
Case study - LSI
• Problem:
  – given many documents, translated to both languages (e.g., English and Spanish)
  – answer queries across languages
![Page 61: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014 · Text Analytics (Text Mining) LSI (uses](https://reader030.fdocuments.net/reader030/viewer/2022041102/5edbbc0dad6a402d66661918/html5/thumbnails/61.jpg)
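Case study - LSI
• Solution: ~ LSI
[Figure: document-term matrix whose columns hold the English terms (data, inf., retrieval, brain, lung) followed by Spanish terms (datos, informacion); document groups: CS, MD]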
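One way to read the figure, sketched below under stated assumptions: each document row concatenates its English term counts with the counts from its Spanish translation, and LSI is run on the combined matrix. The counts, the Spanish term names beyond ‘datos’ and ‘informacion’, and the assumption that a translation has identical counts are all illustrative. It assumes NumPy.

```python
import numpy as np

en_terms = ["data", "inf.", "retrieval", "brain", "lung"]
# 'datos' and 'informacion' appear on the slide; the other three are hypothetical.
es_terms = ["datos", "informacion", "recuperacion", "cerebro", "pulmon"]

# Assumed English counts: rows 0-3 CS documents, rows 4-6 MD documents.
A_en = np.array([[1, 1, 1, 0, 0],
                 [2, 2, 2, 0, 0],
                 [1, 1, 1, 0, 0],
                 [5, 5, 5, 0, 0],
                 [0, 0, 0, 2, 2],
                 [0, 0, 0, 3, 3],
                 [0, 0, 0, 1, 1]], dtype=float)
A_es = A_en.copy()                 # assumption: translations have the same counts
A = np.hstack([A_en, A_es])        # 7 documents x 10 terms (English then Spanish)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
V = Vt[:2].T                       # 10 x 2 term-to-concept matrix

q = np.zeros(10)
q[en_terms.index("data")] = 1.0                       # English-only query
d = np.zeros(10)
d[len(en_terms) + es_terms.index("informacion")] = 1.0  # Spanish-only document

q_c, d_c = q @ V, d @ V
cos = (q_c @ d_c) / (np.linalg.norm(q_c) * np.linalg.norm(d_c))
print("shared terms:", q @ d, "concept-space cosine:", round(float(cos), 3))
```

Because English and Spanish terms that co-occur in the same translated documents load on the same concept, the English query matches the Spanish-only document in concept space.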
![Page 62: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014 · Text Analytics (Text Mining) LSI (uses](https://reader030.fdocuments.net/reader030/viewer/2022041102/5edbbc0dad6a402d66661918/html5/thumbnails/62.jpg)
Switch Gears to Text Visualization
What comes to your mind?
What visualizations have you seen before?
![Page 63: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014 · Text Analytics (Text Mining) LSI (uses](https://reader030.fdocuments.net/reader030/viewer/2022041102/5edbbc0dad6a402d66661918/html5/thumbnails/63.jpg)
Word/Tag Cloud (still popular?)
http://www.wordle.net
![Page 64: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014 · Text Analytics (Text Mining) LSI (uses](https://reader030.fdocuments.net/reader030/viewer/2022041102/5edbbc0dad6a402d66661918/html5/thumbnails/64.jpg)
Word Counts (words as bubbles)
http://www.infocaptor.com/bubble-my-page
![Page 65: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014 · Text Analytics (Text Mining) LSI (uses](https://reader030.fdocuments.net/reader030/viewer/2022041102/5edbbc0dad6a402d66661918/html5/thumbnails/65.jpg)
Word Tree
http://www.jasondavies.com/wordtree/
![Page 66: Text Analytics (Text Mining) - Visualizationpoloclub.gatech.edu/cse6242/2014spring/lectures/CSE6242-20140403-TextLsiVisApp.pdfApr 03, 2014 · Text Analytics (Text Mining) LSI (uses](https://reader030.fdocuments.net/reader030/viewer/2022041102/5edbbc0dad6a402d66661918/html5/thumbnails/66.jpg)
Phrase Net
http://www-958.ibm.com/software/data/cognos/manyeyes/page/Phrase_Net.html
Visualize pairs of words that satisfy a particular pattern, e.g., X and Y