Nonnegative Matrix Factorization: Algorithms and Applications
Lecture 3: Nonnegative Matrix Factorization: Algorithms and Applications
Haesun Park
School of Computational Science and Engineering, Georgia Institute of Technology
Atlanta GA, U.S.A.
SIAM Gene Golub Summer School, Aussois, France, June 18, 2019
This work was supported in part by
Park NMF 1 / 53
Outline
Overview of NMF
Fast algorithms with Frobenius norm
Theoretical results on convergence
Multiplicative updating
Alternating nonnegativity-constrained least squares: active-set-type methods, ...
Hierarchical alternating least squares
Variations/extensions of NMF: sparse NMF, regularized NMF, nonnegative PARAFAC
Efficient adaptive NMF algorithms
Applications of NMF, NMF for clustering
Computational results
Discussions
Park NMF 2 / 53
Nonnegative Matrix Factorization (NMF) (Lee & Seung 99, Paatero & Tapper 94)
Given A ∈ R_+^{m×n} and a desired rank k ≪ min(m, n), find W ∈ R_+^{m×k} and H ∈ R_+^{k×n} s.t. A ≈ WH:
    min_{W≥0, H≥0} ‖A − WH‖_F
Nonconvex; W and H are not unique (e.g., Ŵ = WD ≥ 0 and Ĥ = D⁻¹H ≥ 0 give ŴĤ = WH).
Notation: R_+: nonnegative real numbers
Park NMF 3 / 53
Nonnegative Matrix Factorization (NMF) (Lee & Seung 99, Paatero & Tapper 94)
Given A ∈ R_+^{m×n} and a desired rank k ≪ min(m, n), find W ∈ R_+^{m×k} and H ∈ R_+^{k×n} s.t. A ≈ WH:
    min_{W≥0, H≥0} ‖A − WH‖_F
NMF improves the approximation as k increases: if rank_+(A) > k, then
    min_{W_{k+1}≥0, H_{k+1}≥0} ‖A − W_{k+1} H_{k+1}‖_F < min_{W_k≥0, H_k≥0} ‖A − W_k H_k‖_F,
where W_i ∈ R_+^{m×i} and H_i ∈ R_+^{i×n}.
But the SVD does better: if A = UΣV^T, then
    ‖A − U_k Σ_k V_k^T‖_F ≤ min ‖A − WH‖_F over W ∈ R_+^{m×k} and H ∈ R_+^{k×n}.
So why NMF? Dimension reduction with better interpretation / lower-dimensional representation for nonnegative data.
Park NMF 4 / 53
Nonnegative Rank of A ∈ R_+^{m×n}
(J. Cohen and U. Rothblum, LAA, 93)
rank_+(A) is the smallest integer k for which there exist V ∈ R_+^{m×k} and U ∈ R_+^{k×n} such that A = VU.
Note: rank(A) ≤ rank_+(A) ≤ min(m, n)
If rank(A) ≤ 2, then rank_+(A) = rank(A).
If either m ∈ {1, 2, 3} or n ∈ {1, 2, 3}, then rank_+(A) = rank(A).
(Perron-Frobenius) There are nonnegative left and right singular vectors u₁ and v₁ of A associated with the largest singular value σ₁.
Rank-1 SVD of A = best rank-one NMF of A.
Park NMF 5 / 53
Applications of NMF
Text mining: topic model, NMF as an alternative to PLSI (Gaussier et al., 05; Ding et al., 08); document clustering (Xu et al., 03; Shahnaz et al., 06); topic detection and trend tracking, email analysis (Berry et al., 05; Keila et al., 05; Cao et al., 07)
Image analysis and computer vision: feature representation, sparse coding (Lee et al., 99; Guillamet et al., 01; Hoyer et al., 02; Li et al., 01); video tracking (Bucak et al., 07)
Social networks: community structure and trend detection (Chi et al., 07; Wang et al., 08); recommender systems (Zhang et al., 06)
Bioinformatics: microarray data analysis (Brunet et al., 04; H. Kim and Park, 07)
Acoustic signal processing, blind source separation (Cichocki et al., 04)
Financial data (Drakakis et al., 08)
Chemometrics (Andersson and Bro, 00)
and SO MANY MORE...
Park NMF 6 / 53
Algorithms for NMF
Multiplicative update rules: Lee and Seung, 99
Alternating least squares (ALS): Berry et al., 06
Alternating nonnegative least squares (ANLS):
- Lin, 07: projected gradient descent
- D. Kim et al., 07: quasi-Newton
- H. Kim and Park, 08: active set
- J. Kim and Park, 08: block principal pivoting
Other algorithms and variants:
- Cichocki et al., 07: Hierarchical ALS (HALS)
- Ho, 08: Rank-one Residue Iteration (RRI)
- Zdunek, Cichocki, Amari, 06: quasi-Newton
- Chu and Lin, 07: low-dimensional polytope approximation
- Other rank-1 downdating based algorithms (Vavasis, ...)
- C. Ding, T. Li: tri-factor NMF, orthogonal NMF, ...
- Cichocki, Zdunek, Phan, Amari: NMF and NTF: Applications to Exploratory Multi-way Data Analysis and Blind Source Separation, Wiley, 09
- Andersson and Bro: Nonnegative Tensor Factorization, 00
And SO MANY MORE...
Park NMF 7 / 53
Block Coordinate Descent (BCD) Method
A constrained nonlinear problem:
    min f(x)   (e.g., f(W, H) = ‖A − WH‖_F)
    subject to x ∈ X = X₁ × X₂ × ⋯ × X_p,
where x = (x₁, x₂, …, x_p), x_i ∈ X_i ⊆ R^{n_i}, i = 1, …, p.
The block coordinate descent method generates x^{(k+1)} = (x₁^{(k+1)}, …, x_p^{(k+1)}) by
    x_i^{(k+1)} = argmin_{ξ ∈ X_i} f(x₁^{(k+1)}, …, x_{i−1}^{(k+1)}, ξ, x_{i+1}^{(k)}, …, x_p^{(k)}).
Theorem (Bertsekas, 99): Suppose f is continuously differentiable over the Cartesian product of closed, convex sets X₁, X₂, …, X_p, and suppose that for each i and x ∈ X the minimum of
    min_{ξ ∈ X_i} f(x₁^{(k+1)}, …, x_{i−1}^{(k+1)}, ξ, x_{i+1}^{(k)}, …, x_p^{(k)})
is uniquely attained. Then every limit point of the sequence {x^{(k)}} generated by the BCD method is a stationary point.
NOTE: Uniqueness is not required when p = 2 (Grippo and Sciandrone, 00).
Park NMF 8 / 53
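As a toy illustration of the BCD iteration above, here is a two-block instance with scalar blocks (a hypothetical objective chosen for illustration, not one from the lecture). Each block subproblem is a 1-D quadratic over the nonnegative reals, so the exact block minimizer is the unconstrained minimizer clamped at zero:

```python
def bcd_toy(iters=50):
    """BCD on f(x1, x2) = (x1 + x2 - 3)^2 + (x1 - 1)^2 with X1 = X2 = [0, inf)."""
    x1, x2 = 0.0, 0.0
    for _ in range(iters):
        # each subproblem is a scalar quadratic: solve exactly, clamp at zero
        x1 = max((4.0 - x2) / 2.0, 0.0)   # argmin over x1 >= 0 with x2 fixed
        x2 = max(3.0 - x1, 0.0)           # argmin over x2 >= 0 with x1 fixed
    return x1, x2
```

The iterates converge to the stationary point (1, 2), consistent with the theorem since each 1-D minimum is uniquely attained.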
BCD with k(m + n) Scalar Blocks
[Figure: A ≈ WH block diagram]
Minimize functions of w_ij or h_ij while all other components in W and H are fixed:
    w_ij ← argmin_{w_ij ≥ 0} ‖(r_i^T − Σ_{ℓ≠j} w_{iℓ} h_ℓ^T) − w_ij h_j^T‖²₂
    h_ij ← argmin_{h_ij ≥ 0} ‖(a_j − Σ_{ℓ≠i} w_ℓ h_{ℓj}) − w_i h_ij‖²₂
where W = (w₁ ⋯ w_k), H = [h₁^T; …; h_k^T], and A = (a₁ ⋯ a_n) = [r₁^T; …; r_m^T].
Each subproblem is a scalar quadratic function with a closed-form solution.
Park NMF 9 / 53
BCD with k(m + n) Scalar Blocks
Lee and Seung (01)'s multiplicative updating (MU) rule:
    w_ij ← w_ij (AH^T)_ij / (WHH^T)_ij,    h_ij ← h_ij (W^T A)_ij / (W^T WH)_ij
Derivation based on a gradient-descent form:
    w_ij ← w_ij + (w_ij / (WHH^T)_ij) [(AH^T)_ij − (WHH^T)_ij]
    h_ij ← h_ij + (h_ij / (W^T WH)_ij) [(W^T A)_ij − (W^T WH)_ij]
Rewriting of the solution of coordinate descent:
    w_ij ← [w_ij + ((AH^T)_ij − (WHH^T)_ij) / (HH^T)_jj]_+
    h_ij ← [h_ij + ((W^T A)_ij − (W^T WH)_ij) / (W^T W)_ii]_+
In MU, conservative steps are taken to ensure nonnegativity.
Bertsekas' theorem on convergence is not applicable to MU.
Park NMF 10 / 53
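The MU rule above can be sketched in a few lines of NumPy (the small epsilon added to the denominators is an implementation detail to avoid division by zero, not part of the slide's formula):

```python
import numpy as np

# Multiplicative updating (Lee and Seung 01) on a small random problem.
rng = np.random.default_rng(0)
A = rng.random((30, 20))
k = 5
W = rng.random((30, k)) + 0.1
H = rng.random((k, 20)) + 0.1
eps = 1e-9                                  # guard against zero denominators

errs = [np.linalg.norm(A - W @ H, "fro")]
for _ in range(200):
    H *= (W.T @ A) / (W.T @ W @ H + eps)    # h_ij <- h_ij (W^T A)_ij / (W^T W H)_ij
    W *= (A @ H.T) / (W @ H @ H.T + eps)    # w_ij <- w_ij (A H^T)_ij / (W H H^T)_ij
    errs.append(np.linalg.norm(A - W @ H, "fro"))
```

Since the steps are multiplicative, W and H stay elementwise nonnegative throughout, and the residual is nonincreasing.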
BCD with 2k Vector Blocks
[Figure: A ≈ WH block diagram]
Minimize functions of w_i or h_i while all other components in W and H are fixed:
    ‖A − Σ_{j=1}^k w_j h_j^T‖_F = ‖(A − Σ_{j≠i} w_j h_j^T) − w_i h_i^T‖_F = ‖R^{(i)} − w_i h_i^T‖_F
    w_i ← argmin_{w_i ≥ 0} ‖R^{(i)} − w_i h_i^T‖_F
    h_i ← argmin_{h_i ≥ 0} ‖R^{(i)} − w_i h_i^T‖_F
Each subproblem has the form min_{x≥0} ‖c x^T − G‖_F and has the closed-form solution x = [G^T c / (c^T c)]_+ !
Hierarchical Alternating Least Squares (HALS) (Cichocki et al., 07, 09) (actually HA-NLS)
Rank-one Residue Iteration (RRI) (Ho, 08)
Park NMF 11 / 53
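A minimal HALS sketch, cycling through the 2k vector blocks and applying the closed-form update x = [G^T c]_+ / (c^T c) to each column of W and row of H (the epsilon guard is an implementation detail, not part of the closed form):

```python
import numpy as np

# HALS: exact minimization over each vector block w_i, h_i in turn.
rng = np.random.default_rng(0)
A = rng.random((30, 20))
k = 5
W = rng.random((30, k))
H = rng.random((k, 20))
eps = 1e-12

norm_A = np.linalg.norm(A, "fro")
err0 = np.linalg.norm(A - W @ H, "fro") / norm_A
for _ in range(100):
    R = A - W @ H
    for i in range(k):
        Ri = R + np.outer(W[:, i], H[i, :])          # residual excluding term i
        W[:, i] = np.maximum(Ri @ H[i, :], 0) / (H[i, :] @ H[i, :] + eps)
        H[i, :] = np.maximum(Ri.T @ W[:, i], 0) / (W[:, i] @ W[:, i] + eps)
        R = Ri - np.outer(W[:, i], H[i, :])          # restore full residual
err = np.linalg.norm(A - W @ H, "fro") / norm_A
```

Each block update is an exact nonnegative least squares solve, so the objective never increases.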
BCD with Scalar Blocks vs. 2k Vector Blocks
[Figure: A ≈ WH block diagrams for the two partitionings]
In scalar BCD, w_{1j}, w_{2j}, …, w_{mj} can be computed independently. Also, h_{i1}, h_{i2}, …, h_{in} can be computed independently.
⇒ scalar BCD ≡ 2k vector BCD in NMF
Park NMF 12 / 53
Successive Rank-1 Deflation in SVD and NMF
Successive rank-1 deflation works for SVD but not for NMF:
    A − σ₁u₁v₁^T − σ₂u₂v₂^T − ⋯ ?    vs.    A − w₁h₁^T − w₂h₂^T − ⋯ ?
Example:
    [4 6 0]   [1/√2  −1/√2  0] [10 0 0] [1/√2   1/√2  0]
    [6 4 0] = [1/√2   1/√2  0] [ 0 2 0] [1/√2  −1/√2  0]
    [0 0 1]   [  0      0   1] [ 0 0 1] [  0      0   1]
The sum of the two successive best rank-1 nonnegative approximations is
    [4 6 0]   [5 5 0]   [0 0 0]
    [6 4 0] ≈ [5 5 0] + [0 0 0]
    [0 0 1]   [0 0 0]   [0 0 1]
but the best rank-2 nonnegative approximation is
         [4 6 0]   [4 6]
    WH = [6 4 0] = [6 4] [1 0 0]
         [0 0 0]   [0 0] [0 1 0]
NOTE: 2k vector BCD ≠ successive rank-1 deflation for NMF
Park NMF 13 / 53
BCD with 2 Matrix Blocks
[Figure: A ≈ WH block diagram]
Minimize functions of W or H while the other is fixed:
    W^T ← argmin_{W≥0} ‖H^T W^T − A^T‖_F
    H ← argmin_{H≥0} ‖WH − A‖_F
Alternating Nonnegativity-constrained Least Squares (ANLS). No closed-form solution.
- Projected gradient method (Lin, 07)
- Projected quasi-Newton method (D. Kim et al., 07)
- Active-set method (H. Kim and Park, 08)
- Block principal pivoting method (J. Kim and Park, 08)
- ALS (M. Berry et al., 06) ??
Park NMF 14 / 53
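A rough ANLS sketch: each NLS subproblem is solved approximately by projected gradient with a fixed step 1/L (Lin's method uses a back-tracking line search; the fixed step and iteration counts here are simplifications for illustration):

```python
import numpy as np

def nls_pg(C, B, iters=200):
    """Approximately solve min_{X >= 0} ||C X - B||_F^2 by projected gradient."""
    G, F = C.T @ C, C.T @ B
    L = np.linalg.norm(G, 2)                 # spectral norm: gradient Lipschitz bound
    X = np.zeros((C.shape[1], B.shape[1]))
    for _ in range(iters):
        X = np.maximum(X - (G @ X - F) / L, 0.0)   # gradient step, then project
    return X

rng = np.random.default_rng(1)
A = rng.random((25, 18))
k = 4
W = rng.random((25, k))
H = nls_pg(W, A)
errs = [np.linalg.norm(A - W @ H, "fro")]
for _ in range(20):                          # alternate the two matrix blocks
    W = nls_pg(H.T, A.T).T
    H = nls_pg(W, A)
    errs.append(np.linalg.norm(A - W @ H, "fro"))
```

Any exact NLS solver (active set, block principal pivoting) can be dropped in for `nls_pg` without changing the outer ANLS loop.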
NLS: min_{X≥0} ‖CX − B‖²_F = Σ_i min_{x_i ≥ 0} ‖C x_i − b_i‖²₂
The nonnegativity-constrained least squares (NLS) problem:
- Projected gradient method (Lin, 07): x^{(k+1)} ← P₊(x^{(k)} − α_k ∇f(x^{(k)}))
  * P₊(·): projection onto the nonnegative orthant
  * back-tracking selection of step α_k
- Projected quasi-Newton method (D. Kim et al., 07):
    x^{(k+1)} ← [P₊[y^{(k)} − α D^{(k)} ∇f(y^{(k)})]; 0]
  * gradient scaling only for nonzero variables
These do not fully exploit the structure of the NLS problems in NMF.
- Active-set method (H. Kim and Park, 08)
  (Lawson and Hanson (74), Bro and De Jong (97), Van Benthem and Keenan (04))
- Block principal pivoting method (J. Kim and Park, 08)
  linear complementarity problems (LCP) (Judice and Pires, 94)
Park NMF 15 / 53
Active-set-type Algorithms for min_{x≥0} ‖Cx − b‖₂, C : m × k
KKT conditions: y = C^T C x − C^T b, y ≥ 0, x ≥ 0, x_i y_i = 0, i = 1, …, k
If we knew P = {i | x_i > 0} in the solution in advance, then we would only need to solve min ‖C_P x_P − b‖₂ and set the remaining x_i = 0, where C_P denotes the columns of C with indices in P.
[Figure: sign pattern of x in C x ≈ b; positive entries form the passive set and zeros the active set]
Park NMF 16 / 53
Active-set-type Algorithms for min_{x≥0} ‖Cx − b‖₂, C : m × k
KKT conditions: y = C^T C x − C^T b, y ≥ 0, x ≥ 0, x_i y_i = 0, i = 1, …, k
Active-set method (Lawson and Hanson 74):
- E = {1, …, k} (i.e., x = 0 initially), P = ∅
- Repeat while E ≠ ∅ and y_i < 0 for some i: exchange indices between E and P while keeping feasibility and reducing the objective function value.
Block principal pivoting method (Portugal et al. 94, Math. Comp.): lacks monotonicity and feasibility but finds a correct active-passive set partitioning.
- Guess two index sets P and E that partition {1, …, k}.
- Repeat:
    Let x_E = 0 and x_P = argmin_{x_P} ‖C_P x_P − b‖²₂.
    Then y_E = C_E^T (C_P x_P − b) and y_P = 0.
    If x_P ≥ 0 and y_E ≥ 0, the optimal values are found; otherwise, update P and E.
Park NMF 17 / 53
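A sketch of the block principal pivoting iteration for a single right-hand side, including the backup rule discussed later (exchange only the largest-index infeasible variable when full exchanges stop making progress). C is assumed to have full column rank so the normal-equation solves are well posed:

```python
import numpy as np

def bpp_nnls(C, b, max_iter=100):
    """Block principal pivoting sketch for min_{x >= 0} ||C x - b||_2."""
    k = C.shape[1]
    G, f = C.T @ C, C.T @ b
    P = np.zeros(k, dtype=bool)            # passive set; start with x = 0
    x, y = np.zeros(k), -f                 # x_E = 0 and y = G x - f
    backup, n_best = 3, k + 1
    for _ in range(max_iter):
        infeas = (P & (x < 0)) | (~P & (y < 0))
        n_inf = int(infeas.sum())
        if n_inf == 0:
            break                          # KKT conditions satisfied
        if n_inf < n_best:                 # progress: allow full exchanges
            n_best, backup = n_inf, 3
        elif backup > 0:
            backup -= 1
        else:                              # backup rule: flip only the
            i = np.nonzero(infeas)[0][-1]  # infeasible index of largest value
            infeas = np.zeros(k, dtype=bool)
            infeas[i] = True
        P ^= infeas                        # exchange between P and E
        x, y = np.zeros(k), np.zeros(k)
        if P.any():
            x[P] = np.linalg.solve(G[np.ix_(P, P)], f[P])
        y[~P] = G[~P][:, P] @ x[P] - f[~P]
    return x

rng = np.random.default_rng(0)
C, b = rng.random((20, 6)), rng.random(20)
x = bpp_nnls(C, b)
```

Unlike the active-set method, many indices can change sets per iteration, and intermediate iterates need not be feasible.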
How block principal pivoting works
k = 10; initially P = {1, 2, 3, 4, 5}, E = {6, 7, 8, 9, 10}.
Each update solves C_P^T C_P x_P = C_P^T b for x_P, sets x_E = 0, and computes y_E = C_E^T (C_P x_P − b), y_P = 0.
[Figure sequence, slides 18-24: sign patterns of x and y after each update. Indices i ∈ P with x_i < 0 and indices i ∈ E with y_i < 0 are infeasible and are exchanged between P and E; the iteration stops when x_P ≥ 0 and y_E ≥ 0. Solved!]
Park NMF 18-24 / 53
Refined Exchange Rules
The active-set algorithm is a special instance of a single principal pivoting algorithm (H. Kim and Park, SIMAX 08).
The block exchange rule without modification does not always work:
- The residual is not guaranteed to decrease monotonically.
- The block exchange rule may cycle (although rarely).
Modification: if the block exchange rule fails to decrease the number of infeasible variables, use a backup exchange rule.
With this modification, the block principal pivoting algorithm finds the solution of NLS in a finite number of iterations.
Park NMF 25 / 53
Structure of NLS problems in NMF
The matrix is long and thin, the solution vectors are short, and there are many right-hand-side vectors:
    min_{H≥0} ‖WH − A‖²_F
    min_{W≥0} ‖H^T W^T − A^T‖²_F
Park NMF 26 / 53
Efficient Algorithm for min_{X≥0} ‖CX − B‖²_F
[Bro and De Jong, 97; Van Benthem and Keenan, 04]
Precompute C^T C and C^T B.
Update x_P and y_E by C_P^T C_P x_P = C_P^T b and y_E = C_E^T C_P x_P − C_E^T b.
All coefficients can be retrieved from C^T C and C^T B.
C^T C and C^T B are small; storage is not a problem.
⇒ Exploit common P and E sets among the columns of B in each iteration. X is flat and wide ⇒ more common cases of P and E sets.
Proposed algorithm for NMF (ANLS/BPP): ANLS framework + block principal pivoting algorithm for NLS with improvements for multiple right-hand sides (J. Kim and H. Park, ICDM 08).
Park NMF 27 / 53
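The precomputation and column-grouping idea can be sketched as follows. The passive sets below are made up for illustration (in ANLS/BPP they come out of the pivoting iteration); all columns of B sharing a passive set P are solved with a single factorization of (C^T C)_PP:

```python
import numpy as np

rng = np.random.default_rng(2)
C = rng.random((100, 5))
B = rng.random((100, 8))
CtC, CtB = C.T @ C, C.T @ B        # precomputed once; later work is k x k only

# Hypothetical passive sets for the 8 columns; columns that share a passive
# set are grouped and solved together with one linear solve.
cols_by_set = {(0, 2): [0, 1, 4], (1, 3, 4): [2, 3, 5, 6, 7]}
X = np.zeros((5, 8))
for pset, cols in cols_by_set.items():
    P = list(pset)
    X[np.ix_(P, cols)] = np.linalg.solve(CtC[np.ix_(P, P)], CtB[np.ix_(P, cols)])
```

When X is flat and wide, many columns share the same (P, E) partition, so the grouping amortizes the factorization cost across right-hand sides.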
Rank-Deficient NLS
What happens when C is rank deficient?
- Active-set method: if the first matrix built from the passive set is of full rank, the method never runs into rank-deficient subproblems [Drake et al., Info. Fusion 10].
- Block principal pivoting: subproblems may become rank deficient.
Park NMF 28 / 53
Sparse NMF and Regularized NMF
Sparse NMF (for sparse H) (H. Kim and Park, Bioinformatics, 07):
    min_{W,H} ‖A − WH‖²_F + η‖W‖²_F + β Σ_{j=1}^n ‖H(:, j)‖²₁,    W_ij ≥ 0, H_ij ≥ 0 for all i, j
ANLS reformulation (H. Kim and Park, 07): alternate the following:
    min_{H≥0} ‖ [W; √β e_{1×k}] H − [A; 0_{1×n}] ‖²_F
    min_{W≥0} ‖ [H^T; √η I_k] W^T − [A^T; 0_{k×m}] ‖²_F
Regularized NMF (Pauca et al., 06):
    min_{W,H} ‖A − WH‖²_F + η‖W‖²_F + β‖H‖²_F,    W_ij ≥ 0, H_ij ≥ 0 for all i, j.
ANLS reformulation: alternate the following:
    min_{H≥0} ‖ [W; √β I_k] H − [A; 0_{k×n}] ‖²_F
    min_{W≥0} ‖ [H^T; √η I_k] W^T − [A^T; 0_{k×m}] ‖²_F
(Here [X; Y] denotes vertical stacking and e_{1×k} is the row vector of all ones.)
Park NMF 29 / 53
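The H-subproblem reformulation can be verified numerically: appending the row √β e_{1×k} to W and a zero row to A turns the penalized objective into a plain NLS objective, since for H ≥ 0 we have ‖H(:, j)‖₁ = Σ_i H_ij:

```python
import numpy as np

# Numerical check of the sparse-NMF augmented-matrix identity.
rng = np.random.default_rng(3)
m, n, k, beta = 12, 9, 3, 0.7
A = rng.random((m, n))
W = rng.random((m, k))
H = rng.random((k, n))                      # H >= 0, so ||H(:,j)||_1 = column sum

W_aug = np.vstack([W, np.sqrt(beta) * np.ones((1, k))])
A_aug = np.vstack([A, np.zeros((1, n))])

lhs = np.linalg.norm(W_aug @ H - A_aug, "fro") ** 2
rhs = (np.linalg.norm(A - W @ H, "fro") ** 2
       + beta * np.sum(H.sum(axis=0) ** 2))  # beta * sum_j ||H(:,j)||_1^2
```

The extra row contributes exactly β (Σ_i H_ij)² per column j, so the two objectives match term by term.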
Nonnegative PARAFAC
Consider a 3-way nonnegative tensor T ∈ R_+^{m×n×p} and its PARAFAC decomposition
    min_{A,B,C≥0} ‖T − [[A, B, C]]‖²_F
where A ∈ R_+^{m×k}, B ∈ R_+^{n×k}, C ∈ R_+^{p×k}.
The loading matrices A, B, and C can be iteratively estimated by an NLS algorithm such as the block principal pivoting method.
Park NMF 30 / 53
Nonnegative PARAFAC
Iterate until a stopping criterion is satisfied:
    min_{A≥0} ‖Y_{BC} A^T − T_{(1)}‖_F
    min_{B≥0} ‖Y_{AC} B^T − T_{(2)}‖_F
    min_{C≥0} ‖Y_{AB} C^T − T_{(3)}‖_F
where
    Y_{BC} = B ⊙ C ∈ R^{(np)×k}, T_{(1)} ∈ R^{(np)×m},
    Y_{AC} = A ⊙ C ∈ R^{(mp)×k}, T_{(2)} ∈ R^{(mp)×n},
    Y_{AB} = A ⊙ B ∈ R^{(mn)×k}, T_{(3)} ∈ R^{(mn)×p} are unfolded matrices,
and F ⊙ G = [f₁ ⊗ g₁  f₂ ⊗ g₂  ⋯  f_k ⊗ g_k] ∈ R^{(mn)×k} is the Khatri-Rao product of F ∈ R^{m×k} and G ∈ R^{n×k}.
The matrices are longer and thinner, ideal for ANLS/BPP.
Can be similarly extended to higher-order tensors.
Park NMF 31 / 53
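The unfolding identity behind the A-subproblem can be checked on a small exact PARAFAC tensor (the Khatri-Rao helper and the (j, l) row ordering of T_(1) are our conventions; other orderings work as long as Y_BC and T_(1) agree):

```python
import numpy as np

def khatri_rao(F, G):
    """Column-wise Kronecker product [f1⊗g1 ... fk⊗gk] of F (m x k) and G (n x k)."""
    m, k = F.shape
    n = G.shape[0]
    return (F[:, None, :] * G[None, :, :]).reshape(m * n, k)

rng = np.random.default_rng(4)
m, n, p, k = 4, 3, 5, 2
A = rng.random((m, k))
B = rng.random((n, k))
C = rng.random((p, k))

# Exact PARAFAC tensor: T[i,j,l] = sum_r A[i,r] B[j,r] C[l,r]
T = np.einsum("ir,jr,lr->ijl", A, B, C)
T1 = T.transpose(1, 2, 0).reshape(n * p, m)   # mode-1 unfolding, (np) x m
Y_BC = khatri_rao(B, C)                       # (np) x k
```

For an exact PARAFAC tensor, Y_BC A^T = T_(1) holds exactly, which is why each factor can be updated by an NLS solve against the corresponding unfolding.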
Experimental Results (NMF)
NMF Algorithms Compared
Name        | Description                      | Author
ANLS-BPP    | ANLS / block principal pivoting  | J. Kim and HP 08
ANLS-AS     | ANLS / active set                | H. Kim and HP 08
ANLS-PGRAD  | ANLS / projected gradient        | Lin 07
ANLS-PQN    | ANLS / projected quasi-Newton    | D. Kim et al. 07
HALS        | Hierarchical ALS                 | Cichocki et al. 07
MU          | Multiplicative updating          | Lee and Seung 01
ALS         | Alternating least squares        | Berry et al. 06
Park NMF 32 / 53
Residual vs. Execution time
[Figure: relative objective value vs. time (sec) for HALS, MU, ALS, ANLS-PGRAD, ANLS-PQN, and ANLS-BPP; left panel TDT2, k = 10; right panel TDT2, k = 160]
TDT2 text data: 19,009 × 3,087; k = 10 and k = 160
Park NMF 33 / 53
Residual vs. Execution time
[Figure: relative objective value vs. time (sec) for HALS, MU, ALS, ANLS-PGRAD, ANLS-PQN, and ANLS-BPP; left panel ATNT, k = 10; right panel 20 Newsgroups, k = 160]
ATNT image data: 10,304 × 400, k = 10, and 20 Newsgroups text data: 26,214 × 11,314, k = 160
Park NMF 34 / 53
Residual vs. Execution time
[Figure: relative objective value vs. time (sec) for HALS, MU, ALS, ANLS-PGRAD, ANLS-PQN, and ANLS-BPP; left panel PIE 64, k = 80; right panel PIE 64, k = 160 (ALS omitted)]
PIE 64 image data: 4,096 × 11,554; k = 80 and k = 160
Park NMF 35 / 53
Adaptive NMF for Varying Reduced Rank k → k̃
Given (W, H) with rank k, how can (W̃, H̃) with rank k̃ be computed fast? E.g., model selection for NMF clustering.
AdaNMF: initialize W̃ and H̃ using W and H:
- If k̃ > k, compute an NMF of the residual, A − WH ≈ ΔW ΔH, and set W̃ = [W ΔW] and H̃ = [H; ΔH].
- If k̃ < k, initialize W̃ and H̃ with the k̃ pairs (w_i, h_i) with largest ‖w_i h_i^T‖_F = ‖w_i‖₂ ‖h_i‖₂.
Then update W̃ and H̃ using the HALS algorithm.
Park NMF 36 / 53
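A sketch of the AdaNMF initialization step (the function name is ours; for k̃ > k the slides factor the residual A − WH with an NMF, which is replaced here with random placeholder factors to keep the sketch self-contained):

```python
import numpy as np

def adanmf_init(W, H, k_new, rng):
    """AdaNMF-style initialization for a new rank k_new (sketch)."""
    k = W.shape[1]
    if k_new <= k:
        # keep the k_new pairs with largest ||w_i h_i^T||_F = ||w_i||_2 ||h_i||_2
        scores = np.linalg.norm(W, axis=0) * np.linalg.norm(H, axis=1)
        keep = np.sort(np.argsort(scores)[::-1][:k_new])
        return W[:, keep], H[keep, :]
    d = k_new - k
    # In AdaNMF the extra pairs come from an NMF of the residual A - W H;
    # random placeholders are used here instead.
    dW, dH = rng.random((W.shape[0], d)), rng.random((d, H.shape[1]))
    return np.hstack([W, dW]), np.vstack([H, dH])

rng = np.random.default_rng(5)
W, H = rng.random((10, 4)), rng.random((4, 8))
W_dn, H_dn = adanmf_init(W, H, 2, rng)   # rank decrease 4 -> 2
W_up, H_up = adanmf_init(W, H, 6, rng)   # rank increase 4 -> 6
```

Either way, the returned factors serve only as a warm start for HALS.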
Model Selection in NMF Clustering
Consensus matrix based on A ≈ WH:
    C^t_ij = 0 if argmax H(:, i) = argmax H(:, j), and 1 otherwise,    t = 1, …, l
Dispersion coefficient:
    ρ(k) = (1/n²) Σ_{i=1}^n Σ_{j=1}^n 4 (C̄_ij − 1/2)²,    where C̄ = (1/l) Σ_t C^t
[Figures: reordered consensus matrices for k = 3, 4, 5, 6; dispersion coefficient vs. approximation rank k; execution time (seconds) of AdaNMF, Recompute, and Warm-restart]
Clustering results on MNIST digit images (784 × 2000) by AdaNMF with k = 3, 4, 5, and 6: averaged consensus matrices, dispersion coefficient, execution time.
Park NMF 37 / 53
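The consensus matrix and dispersion coefficient can be sketched as follows. Here C^t_ij = 1 when items i and j share a cluster (the common convention; the slide's 0/1 labeling is the complement, and ρ is unchanged either way because (c − 1/2)² is symmetric about 1/2):

```python
import numpy as np

def consensus(H):
    """0/1 co-clustering matrix from the hard assignment argmax of each column of H."""
    labels = np.argmax(H, axis=0)
    return (labels[:, None] == labels[None, :]).astype(float)

def dispersion(C_bar):
    """Dispersion coefficient rho = (1/n^2) sum_ij 4 (C_bar_ij - 1/2)^2."""
    n = C_bar.shape[0]
    return np.sum(4.0 * (C_bar - 0.5) ** 2) / n**2

# Average the consensus matrices over l runs (random H here for illustration).
rng = np.random.default_rng(6)
runs = [consensus(rng.random((3, 40))) for _ in range(10)]
C_bar = sum(runs) / len(runs)
rho = dispersion(C_bar)
```

ρ = 1 exactly when every run produces the same partition (all entries of C̄ are 0 or 1), and smaller values indicate less stable clusterings.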
Adaptive NMF for Varying Reduced Rank
[Figure: relative error vs. execution time for Recompute, Warm-restart, and AdaNMF, two panels]
Relative error vs. execution time of AdaNMF and "recompute". Given an NMF of a 600 × 600 synthetic matrix with k = 60, compute the NMF with k = 50 and k = 80.
Park NMF 38 / 53
Adaptive NMF for Varying Reduced Rank
Theorem: For A ∈ R_+^{m×n}, if rank_+(A) > k, then
    min ‖A − W^{(k+1)} H^{(k+1)}‖_F < min ‖A − W^{(k)} H^{(k)}‖_F,
where W^{(i)} ∈ R_+^{m×i} and H^{(i)} ∈ R_+^{i×n}.
[Figures: relative objective function value of NMF vs. approximation rank k for matrices of rank 20, 40, 60, 80; classification error (training and testing) vs. reduced rank k]
Rank path on a synthetic data set: relative residual vs. k.
ORL face image (10,304 × 400) classification errors (by LMNN) on training and testing sets vs. k.
The k-dimensional representation H_T of training data T is computed by BPP: min_{H_T ≥ 0} ‖W H_T − T‖_F.
Park NMF 39 / 53
NMF for Dynamic Data (DynNMF)
Given an NMF (W, H) of A = [ΔA  Â], how can an NMF (W̃, H̃) of Ã = [Â  ΔÃ] be computed fast? (Updating and downdating: the obsolete columns ΔA are dropped and the new columns ΔÃ are appended.)
DynNMF (sliding-window NMF):
Initialize H̃ as follows:
- Let Ĥ be the columns of H that remain after dropping those corresponding to ΔA.
- Solve min_{ΔH ≥ 0} ‖W ΔH − ΔÃ‖²_F using block principal pivoting.
- Set H̃ = [Ĥ  ΔH].
Run HALS on Ã with initial factors W̃ = W and H̃.
Park NMF 40 / 53
DynNMF for Dynamic Data
PET2001 data with 3064 images from a surveillance video.DynNMF on 110,592Ă 400 data matrix each time, with 100 newcolumns and 100 obsolete columns. The residual images track themoving vehicle in the video.
Park NMF 41 / 53
NMF for Clustering and Topic Modeling
Find k clusters (topics) in a text corpus represented in X.
[Figure: X ≈ WH; a document column x_i ≈ w₁h_{1i} + w₂h_{2i} + w₃h_{3i}. E.g., over the keywords (computer, music, Turing, Mozart, ..., vegetable), x_i = (0.8, 0.05, 0.1, 0.05, ..., 0) ≈ 0.8·(0.9, 0, 0.1, 0, ..., 0) + 0.1·(0, 0.5, 0, 0.3, ..., 0) + ...]
Nonnegative w₁, w₂, w₃: cluster representatives or topics, i.e., distributions over keywords.
Nonnegative h_{1i}, h_{2i}, h_{3i} → soft clustering assignment of x_i.
In NMF, the w_i's have equal roles among them, unlike in SVD. Successive rank-1 deflation does not work in NMF.
Paatero & Tapper 94; Lee & Seung, Nature 99; Kim & Park, SIMAX 08; Xu et al., SIGIR 03; Kim & Park, SISC 11; Kim, He, Park, JOGO 14; ...
Park NMF 42 / 53
NMF and K-means
Clustering and lower-rank approximation are related.
NMF for clustering: document (Xu et al., SIGIR 03), image (Cai et al., ICDM 08), microarray (Kim & Park, Bioinformatics 07), etc.
The objective functions for K-means and NMF may look the same:
    Σ_i ‖x_i − w_{π_i}‖²₂ = ‖X − WH‖²_F    (Ding et al., SDM 05; Kim & Park, TR 08)
where π_i = j when the i-th point is assigned to the j-th cluster (j ∈ {1, …, k}). However, the constraints are different:
    K-means: H ∈ {0, 1}^{k×n}, 1_k^T H = 1_n^T
    NMF: W ≥ 0, H ≥ 0
Paths to solution:
- K-means: expectation-maximization
- NMF: relax the condition on H to H ≥ 0 with orthogonal rows, or H ≥ 0 with sparse columns (soft clustering)
Park NMF 43 / 53
NMF vs. K-means
K-means: W holds the k cluster centroids; h_i is the cluster membership indicator.
NMF: W holds the basis vectors for a rank-k approximation; H is the k-dimensional representation of X.
Sparse NMF (SNMF) (H. Kim & H. Park, Bioinformatics, 07)
Clustering accuracy on TDT2 text data (averaged over 100 runs):
# clusters  | 2      | 6      | 10     | 14     | 18
K-means     | 0.8099 | 0.7295 | 0.7015 | 0.6675 | 0.6675
NMF/ANLS    | 0.9990 | 0.8717 | 0.7436 | 0.7021 | 0.7160
SNMF/ANLS   | 0.9991 | 0.8770 | 0.7512 | 0.7269 | 0.7278
The sparsity constraint improves the clustering result (J. Kim and Park, 08):
    min_{W≥0, H≥0} ‖A − WH‖²_F + η‖W‖²_F + β Σ_{j=1}^n ‖H(:, j)‖²₁
Number of times the optimal assignment is achieved (a synthetic data set with a clear cluster structure):
k     | 3   | 6   | 9   | 12  | 15
NMF   | 69  | 65  | 74  | 68  | 44
SNMF  | 100 | 100 | 100 | 100 | 97
NMF and SNMF are much better than K-means in general.
Park NMF 44 / 53
NMF and Spherical K-means
Equivalence of the objective functions is not enough to explain the clustering capability of NMF.
Ex. k = 2:
[Figure: K-means vs. NMF cluster assignments on a 2-D example]
NMF is more related to spherical k-means than to k-means.
→ NMF shown to work well in text data clustering.
Park NMF 45 / 53
Symmetric NMF and Spectral Clustering
Symmetric NMF: min_{S≥0} ‖A − SS^T‖_F, where A ∈ R_+^{n×n} is an affinity matrix.
Spectral clustering → eigenvectors (Ng et al., 01); A normalized if needed, Laplacian, ...
Symmetric NMF (Ding et al.) → can handle nonlinear structure, and S ≥ 0 naturally captures a cluster structure in S (a new PGD-based algorithm).
Ex. Well-separated clusters, one loose and the other tight, with affinity exp(−‖x_i − x_j‖²₂ / (2σ²)).
[Figures: eigenvalues of the similarity matrix; spectral clustering embedding; eigenvectors vs. S from SymNMF]
Park NMF 46 / 53
Targeted Topic Modeling
Large text collections such as Wikipedia and Twitter data contain many topics of very wide range.
Only a subset of the data items is related to a specific question or interest, e.g., documents related to a particular event or to a subject such as sustainability, a brand, or a product.
Direct keyword match or keyword filtering does not work well.
We want high recall rather than high precision: retrieving as many relevant documents as possible vs. a small number of the most relevant documents.
How can we find the relevant parts of the documents?
Park NMF 47 / 53
Targeted Topic Modeling
Iterative application of NMF with topic refinement at each step:
1. Apply an initial NMF to find k clusters.
2. Check the top keywords of each cluster and determine the relevant topics. Assume the first k₁ topics are relevant and denote the first k₁ columns of W as W₁.
3. Refine W₁ by setting its small components to zero.
4. Solve the following NMF:
    min_{W₂≥0, H≥0} f(W₂, H) = ‖[W₁ W₂] H − X‖_F
5. Find more relevant topics and attach their corresponding columns to W₁.
6. Repeat from Step 3 until the results are satisfactory.
7. Delete unrelated topics and documents, and redo the topic modeling.
Advantage: it is easy to interpret the results and to adjust intermediate results for interactive computing.
Park NMF 48 / 53
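Step 4 can be sketched with MU-style updates in which W₁ stays fixed and only the W₂ columns and H are updated (the partial multiplicative update is our simplification; any NLS solver restricted to the free columns would do):

```python
import numpy as np

# Solve min_{W2>=0, H>=0} ||[W1 W2] H - X||_F with W1 held fixed.
rng = np.random.default_rng(7)
m, n, k1, k2 = 20, 15, 2, 3
X = rng.random((m, n))
W1 = rng.random((m, k1))                  # refined relevant topics, held fixed
W = np.hstack([W1, rng.random((m, k2))])  # [W1 W2]
H = rng.random((k1 + k2, n))
eps = 1e-9

err0 = np.linalg.norm(W @ H - X, "fro")
for _ in range(100):
    H *= (W.T @ X) / (W.T @ W @ H + eps)       # full MU update of H
    num, den = X @ H.T, W @ (H @ H.T) + eps
    W[:, k1:] *= num[:, k1:] / den[:, k1:]     # update only the W2 columns
err1 = np.linalg.norm(W @ H - X, "fro")
```

Freezing W₁ keeps the already-identified relevant topics intact while the remaining factors reorganize around them.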
Targeted Topic Modeling for Event Detection in Document Data
Park NMF 49 / 53
Targeted Topic Modeling: Ex. Yemen Ceasefire Violation Detection in Arabic Text
Interface for selecting clusters to remove from the dataset
Key documents identified within the clusters (using a threshold parameter)
Reapplication of SmallK to produce refined clustering results
Park NMF 50 / 53
Case Study for Sustainability in Twitter: KeywordExpansion and Retrieval of Relevant Data Items
Direct match on Twitter with 50 specialized keywords related to sustainability, such as new urbanism, rain water harvesting, green roof, walkable community, and decentralized energy, retrieved only a very small number of tweets.
Discover the casual terms that are used to express sustainability-related topics in social networks, in addition to the specialized keywords.
Extract the data items that have relevance to sustainability; these data items may not contain technical or specialized keywords.
Park NMF 51 / 53
Result on Wikipedia
Applying targeted topic modeling to the 60K Wikipedia pages obtained by keyword refinement, we obtained 5,387 pages that are clearly relevant to sustainability. Applying HierNMF2, we discovered 20 topics such as solar energy, water energy, efficient electricity usage, sustainable development, renewable energy, and sustainable manufacturing. A subset of the results is illustrated:
[Figure: HierNMF2 topic tree over the 5,387 Wikipedia pages, each node labeled with its article count and top keywords. Highlighted leaf topics:
- Solar Energy (975 articles): "solar", "power", "plant", "electricity", "rooftop", "panels"
- Low Carbon Energy (199 articles): "emissions", "nuclear", "waste", "renewable", "carbon", "electricity", "gas", "oil"
- Sustainable Manufacturing (319 articles): "sustainable", "environmental", "technology", "performance", "products", "efficiency", "quality"
- Sustainable Development (628 articles): "international", "development", "sustainable", "global", "countries", "policy", "economic"]
Park NMF 52 / 53
DoDIIS 2016
DARPA XDATA Challenge Problem: Tracking IED Drops. Visualizing our analysis results.
Visualization of our analytics, fused with GPS and communications data, can "guide" an analyst to important "events" and "actors":
- Track messages of common topics occurring within a time window.
- Find areas of interest where perpetrators within the same cluster often converge.
- Flag persons of interest by detecting messages that occur within a topic of interest.
NMF Application: Radar + SMS
Associate device movement with classes of communication activity
Topic Modeling Results (NMF)
Topics 2, 9, 10: the majority of the conversations were about planning, meeting, observing, and exchanging.
Topic 3: suspicious meetings mixed with civilian chatter.
Topic 7: interesting; could be code for organizing the simulation?
Other topics: separate out the civilian conversations.
[Table: top keywords for Topics 1-10 from the NMF topic model of the SMS data]
- Pictures taken for planning and observing the IED bomb site and of the aftermath.
- Videos taken of checkpoints and convoy areas.
- Package exchange and drop-off confirmations.
NMF Application: Radar + SMS (Georgia Tech / Kitware visualization)
Scene 1 of 4: 1. Package dropped
Scene 2 of 4: 2. Left area
Scene 3 of 4: 3. Nice fireworks
Scene 4 of 4: 4. Leader identified
Summary/Discussions
Overview of NMF with the Frobenius norm and its algorithms
Fast algorithms and convergence via the BCD framework
Adaptive NMF algorithms
Variations/extensions of NMF: nonnegative PARAFAC and sparse NMF
NMF for clustering
Computational comparisons
NMF Matlab codes and papers are available at http://www.cc.gatech.edu/~hpark and
Park NMF 53 / 53