Advances in Data Analysis using PARAFAC2 and ... - cid.csic.es · NN-ASALSAN 0.083 (47) 1.0 (74)...
Transcript of Advances in Data Analysis using PARAFAC2 and ... - cid.csic.es · NN-ASALSAN 0.083 (47) 1.0 (74)...
![Page 1: Advances in Data Analysis using PARAFAC2 and ... - cid.csic.es · NN-ASALSAN 0.083 (47) 1.0 (74) Kiers [23] 0.022 (67) 22.3 (400+) 4 Experimental results We consider two applications:](https://reader035.fdocuments.net/reader035/viewer/2022070614/5c44d52c93f3c34c5a37703d/html5/thumbnails/1.jpg)
Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company,for the United States Department of Energyʼs National Nuclear Security Administration
under contract DE-AC04-94AL85000.
Brett W. BaderSandia National Laboratories
TRICAP 2009June 15, 2009
Advances in Data Analysis using PARAFAC2 and Three-way
DEDICOM
![Page 2: Advances in Data Analysis using PARAFAC2 and ... - cid.csic.es · NN-ASALSAN 0.083 (47) 1.0 (74) Kiers [23] 0.022 (67) 22.3 (400+) 4 Experimental results We consider two applications:](https://reader035.fdocuments.net/reader035/viewer/2022070614/5c44d52c93f3c34c5a37703d/html5/thumbnails/2.jpg)
Acknowledgements
• Peter Chew, Sandia
• Tammy Kolda, Sandia
• Daniel Dunlavy, Sandia
• Evrim Acar, Sandia
• Ahmed Abdelali, New Mexico State University
• Alla Rozovskaya, University of Illinois, Urbana-Champaign
![Page 3: Advances in Data Analysis using PARAFAC2 and ... - cid.csic.es · NN-ASALSAN 0.083 (47) 1.0 (74) Kiers [23] 0.022 (67) 22.3 (400+) 4 Experimental results We consider two applications:](https://reader035.fdocuments.net/reader035/viewer/2022070614/5c44d52c93f3c34c5a37703d/html5/thumbnails/3.jpg)
Multi-way Analysis
3-way DEDICOM
Tensor or N-way array
PARAFAC2
Interested in algorithms for analysis of large data sets
![Page 4: Advances in Data Analysis using PARAFAC2 and ... - cid.csic.es · NN-ASALSAN 0.083 (47) 1.0 (74) Kiers [23] 0.022 (67) 22.3 (400+) 4 Experimental results We consider two applications:](https://reader035.fdocuments.net/reader035/viewer/2022070614/5c44d52c93f3c34c5a37703d/html5/thumbnails/4.jpg)
DEDICOM
• DEcomposition into DIrectional COMponents
• Introduced in 1978 by Harshman
• Past applications- Study asymmetries in telephone calls among cities- Marketing research• car switching: car owners and what they buy next • free associations of words for advertising- Asymmetric measures of world trade (import/export)
• Variations- Two-way DEDICOM- Three-way DEDICOM
![Page 5: Advances in Data Analysis using PARAFAC2 and ... - cid.csic.es · NN-ASALSAN 0.083 (47) 1.0 (74) Kiers [23] 0.022 (67) 22.3 (400+) 4 Experimental results We consider two applications:](https://reader035.fdocuments.net/reader035/viewer/2022070614/5c44d52c93f3c34c5a37703d/html5/thumbnails/5.jpg)
Two-way DEDICOM
minA,R
!
!
!
X ! ARAT
!
!
!
2
F
• A (N x P) is an orthogonal matrix of loadings or weights• R (P x P) is a dense matrix that captures asymmetric relationships
• Decomposition is not unique- A can be transformed with no loss of fit to the data- Nonsingular transformation Q:
- Usually “fix” A with some standard rotation (e.g., VARIMAX)
X ! ARAT =X A
R AT
s.t. A orthogonal
ARAT = (AQ)(Q!1RQ!T )(AQ)T
N
N P
N
Single domain model
![Page 6: Advances in Data Analysis using PARAFAC2 and ... - cid.csic.es · NN-ASALSAN 0.083 (47) 1.0 (74) Kiers [23] 0.022 (67) 22.3 (400+) 4 Experimental results We consider two applications:](https://reader035.fdocuments.net/reader035/viewer/2022070614/5c44d52c93f3c34c5a37703d/html5/thumbnails/6.jpg)
Three-way DEDICOM
=
• A (N x P) is a matrix of loadings or weights (not necessarily orthogonal)• R (P x P) is a dense matrix that captures asymmetric relationships• D (P x P x K) is a tensor with diagonal frontal slices giving the weights
of the columns of A for each slice in third mode
• *Unique* solution with enough slices of X with sufficient variation- i.e., no rotation of A possible- greater confidence in interpretation of results
N
NK
A
ATRD D
N
P
Xk ! ADkRDkAT for k = 1, . . . ,K
minA,R,D
!
k
""Xk !ADkRDkAT""2
F
![Page 7: Advances in Data Analysis using PARAFAC2 and ... - cid.csic.es · NN-ASALSAN 0.083 (47) 1.0 (74) Kiers [23] 0.022 (67) 22.3 (400+) 4 Experimental results We consider two applications:](https://reader035.fdocuments.net/reader035/viewer/2022070614/5c44d52c93f3c34c5a37703d/html5/thumbnails/7.jpg)
PARAFAC2
=
• A (N x P) is a matrix of loadings or weights (not necessarily orthogonal)• H (P x P) is a dense, symmetric matrix (usually positive definite)• D (P x P x K) is a tensor with diagonal frontal slices giving the weights
of the columns of A for each slice in third mode
• *Unique* solution with enough slices of X with sufficient variation- i.e., no rotation of A possible- greater confidence in interpretation of results
N
NK
A
ATHD D
N
P
Xk ! ADkHDkAT for k = 1, . . . ,K
minA,H,D
!
k
""Xk !ADkHDkAT""2
F
![Page 8: Advances in Data Analysis using PARAFAC2 and ... - cid.csic.es · NN-ASALSAN 0.083 (47) 1.0 (74) Kiers [23] 0.022 (67) 22.3 (400+) 4 Experimental results We consider two applications:](https://reader035.fdocuments.net/reader035/viewer/2022070614/5c44d52c93f3c34c5a37703d/html5/thumbnails/8.jpg)
DEDICOM Models & Algorithms
=XR AT
=
• Generalized Takane method• ASALSAN variant• All-at-once optimization
• Kiersʼ method• ASALSAN• All-at-once optimization
X AR
A
AT
(Takane, 1985; Kiers et al., 1990)
(Kiers, 1993)
2-wayDEDICOM
3-wayDEDICOM
(Bader, Harshman, Kolda, 2007)
(Bader, Harshman, Kolda, 2007)
![Page 9: Advances in Data Analysis using PARAFAC2 and ... - cid.csic.es · NN-ASALSAN 0.083 (47) 1.0 (74) Kiers [23] 0.022 (67) 22.3 (400+) 4 Experimental results We consider two applications:](https://reader035.fdocuments.net/reader035/viewer/2022070614/5c44d52c93f3c34c5a37703d/html5/thumbnails/9.jpg)
Kiersʼ Algorithm
f(R) =
!
!
!
!
!
!
!
"
#
$
Vec(X1)...
Vec(Xm)
%
&
'!
"
#
$
AD1 " AD1
...ADm " A Dm
%
&
'Vec(R)
!
!
!
!
!
!
!
Vec(R) =
!
m"
i=1
(DiATADi) ! (DiA
TADi)
#
!1 m"
i=1
Vec(DiATXiADi)
1) ALS over columns in A
3) ALS over elements in D
minimize:
2) Least-squares problem for R
minA,R,D
m!
i=1
"
"Xi ! ADiRDiAT
"
"
2
F
(involves EVD of dense n x n matrix)
(Kiers, 1993)
Alternating Least SquaresAccurate algorithm
but not suitablefor large-scale data
O(pn3)
![Page 10: Advances in Data Analysis using PARAFAC2 and ... - cid.csic.es · NN-ASALSAN 0.083 (47) 1.0 (74) Kiers [23] 0.022 (67) 22.3 (400+) 4 Experimental results We consider two applications:](https://reader035.fdocuments.net/reader035/viewer/2022070614/5c44d52c93f3c34c5a37703d/html5/thumbnails/10.jpg)
ASALSAN
!
X1 XT1 · · · Xm XT
m
"
= A!
D1RD1 D1RTD1 · · · DmRDm DmRTDm
" !
I2m ! AT"
A =
!
m"
i=1
#
XiADiRTDi + XT
i ADiRDi
$
% !
m"
i=1
(Bi + Ci)
%
!1
Bi ! DiRDi(ATA)DiR
TDi,
Ci ! DiRTDi(A
TA)DiRDi.
Solving for A:
where
= AY
minA,R,D
m!
i=1
"
"Xi ! ADiRDiAT
"
"
2
F
ZT
A = YZ(ZTZ)!1
“Alternating Simultaneous Approximation, Least Squares, and Newton”
(Bader, Harshman, Kolda, 2007)
![Page 11: Advances in Data Analysis using PARAFAC2 and ... - cid.csic.es · NN-ASALSAN 0.083 (47) 1.0 (74) Kiers [23] 0.022 (67) 22.3 (400+) 4 Experimental results We consider two applications:](https://reader035.fdocuments.net/reader035/viewer/2022070614/5c44d52c93f3c34c5a37703d/html5/thumbnails/11.jpg)
ASALSAN
Solving for R:
f(R) =
!
!
!
!
!
!
!
"
#
$
Vec(X1)...
Vec(Xm)
%
&
'!
"
#
$
AD1 " AD1
...ADm " A Dm
%
&
'Vec(R)
!
!
!
!
!
!
!
Vec(R) =
!
m"
i=1
(DiATADi) ! (DiA
TADi)
#
!1 m"
i=1
Vec(DiATXiADi)
minimize:
Use the approach in (Kiers, 1993)
minR
m!
i=1
"
"Xi ! ADiR DiAT
"
"
2
F
![Page 12: Advances in Data Analysis using PARAFAC2 and ... - cid.csic.es · NN-ASALSAN 0.083 (47) 1.0 (74) Kiers [23] 0.022 (67) 22.3 (400+) 4 Experimental results We consider two applications:](https://reader035.fdocuments.net/reader035/viewer/2022070614/5c44d52c93f3c34c5a37703d/html5/thumbnails/12.jpg)
ASALSAN
minDi
!
!Xi ! ADiRDiAT
!
!
2
F
A = QA,
minDi
!
!
!
QTXiQ ! ADiRDiAT
!
!
!
2
F
gk = !
!
i,j
"
2(X ! ADRDAT ) " (ADrka
Tk + akrk,:DA
T )#
i,j
Solving for D:
Use compressionQR factorization:
Use Newtonʼs method to solve the optimization problem for
Smaller problem (p x p)
hst = !2!
i,j
"
(X ! ADRDAT ) " (asrsta
Tt + atrtsa
Ts )
! (ADrsaTs + asrs:DA
T ) " (ADrtaTt + atrt:DA
T )#
i,j
dnew = d ! H!1g
d = diag(Di)
Gradient:
Hessian:
![Page 13: Advances in Data Analysis using PARAFAC2 and ... - cid.csic.es · NN-ASALSAN 0.083 (47) 1.0 (74) Kiers [23] 0.022 (67) 22.3 (400+) 4 Experimental results We consider two applications:](https://reader035.fdocuments.net/reader035/viewer/2022070614/5c44d52c93f3c34c5a37703d/html5/thumbnails/13.jpg)
Algorithm Costs
ATA
XiART
XT
i AR
QTXiQ
QR factorization of AO(p2
n)
Xi
Dominant costs:
For small p, updating A is most expensive part
linear in nnz of
ASALSAN is capable of handling large data sets
![Page 14: Advances in Data Analysis using PARAFAC2 and ... - cid.csic.es · NN-ASALSAN 0.083 (47) 1.0 (74) Kiers [23] 0.022 (67) 22.3 (400+) 4 Experimental results We consider two applications:](https://reader035.fdocuments.net/reader035/viewer/2022070614/5c44d52c93f3c34c5a37703d/html5/thumbnails/14.jpg)
Timings
working in this space. Specifically, we find an orthonormalbasis Q ! Rn!p of matrix A using, e.g., a compact QRdecomposition,
A = QA, (11)
where A is upper triangular. Then we use Q to project Xonto the basis of A. By the orthogonality of Q, the mini-mization problem of (10) is the same as
minDk
!!!QTXkQ " ADkRDkAT!!!
2
F, (12)
except that QTXkQ and A are both of size p # p. We usethese smaller matrices in place of Xk and A, respectively,in the updates of both R and D in (9) and (10) above.
The dominant costs of ASALSAN per iteration are lin-ear in the number of nonzeros of Xk and/or O(p2n) andcome from the following steps: ATA, QR factorization ofA, XkART, XT
kAR, and QTXkQ. In contrast, the dom-inant costs in Kiers’ ALS algorithm [23] come from up-dating A with p diagonalizations of a dense n # n matrix,costing O(pn3).
3.2 Nonnegative ASALSAN
Because we often deal with nonnegative data in X, itsometimes helps to examine decompositions that retain thenonnegative characteristics of the original data. So we havemodified ASALSAN to compute a three-way DEDICOMmodel with non-negativity constraints on A, R, and D.We call this algorithm NN-ASALSAN, for “nonnegative”ASALSAN. Modifications to the updates of both A and Rare made as follows: we replace the least squares solutionwith the multiplicative update introduced in [29] as imple-mented in [2]. Specifically, we modify the step to solve forthe A appearing on the left in (5):
aic $ aic
"#mk=1
$XkADkRTDk + XT
kADkRDk
%&ic
[A#m
k=1(Bk + Ck)]ic + !.
where Bk and Ck are the same as in (7)-(8) above and ! isa small number like 10"9. The solution for R is given by:
Vec(R)i $Vec(R)i
"#mk=1 Vec(DkATXkADk)
&i
[#m
k=1(DkATADk) % (DkATADk)Vec(R)]i + !.
We used the procedure for updating D as above, using thesame non-negativity constraints.
An algorithm for a nonnegative two-way DEDICOMmodel follows directly from NN-ASALSAN when one con-siders a matrix X as an array X having a single slice(m = 1) and the D array is just the identity matrix.
Table 1. Time in seconds per iteration (aver-age number of iterations) on both data sets.
Algorithm World trade EnronASALSAN 0.069 (50) 0.85 (184)NN-ASALSAN 0.083 (47) 1.0 (74)Kiers [23] 0.022 (67) 22.3 (400+)
4 Experimental results
We consider two applications: a small example usingthe international trade data used previously in [20] and thelarger email graph of the Enron corporation that was madepublic during the federal investigation.
ASALSAN was written in MATLAB, using the TensorToolbox [5, 6, 7], and Kiers’ algorithm [23] was compiledPascal code obtained from the author. All tests were per-formed on a dual 3GHz Pentium Xeon desktop computerwith 2GB of RAM.
Table 1 shows the timings per iteration and average num-ber of iterations to satisfy a tolerance of 10"5 (World trade)or 10"7 (Enron) in the change of fit for the three algorithms(using the same stopping criteria). We suspect the perfor-mance gap on the world trade example is due to more over-head in our MATLAB code relative to Kiers’ compiled ex-ecutable. Due to the poor asymptotic scalability of Kiers’algorithm on the larger Enron data, its running time is muchslower than ASALSAN. Processing larger data sets, the dis-crepancy will grow even larger.
A practice in some applications of DEDICOM is to ig-nore the diagonal entries of each Xk in the minimizationof (3). For both applications, this makes sense becausewe wish to ignore self-loops (i.e., no self-trade or sendingemail to yourself). We use an imputation technique of esti-mating the diagonal values from the current approximationADkRDkAT at each iteration and including them in Xk.
4.1 World trade
For a simple algorithmic comparison, we tested the in-ternational trade data of [20]. The data consists of im-port/export data among 18 nations in Europe, North Amer-ica, and the Pacific Rim from 1981 to 1990. A semanticgraph of this data corresponds to a dense adjacency array Xof size 18 # 18 # 10.
We computed a three component (p = 3) model usingASALSAN and Kiers’ algorithm, and the same minimizersmay be found among the results of both algorithms. We alsoused NN-ASALSAN to compute a new fully-nonnegativeversion of the DEDICOM model (A,R,D & 0). Becausethe nonnegative results are more easily interpreted and arenew, we report just these results.
Time in seconds per iteration (avg iterations)
(Bader, Harshman, Kolda, 2007)
184x184x449838 nonzeros
18x18x103060 nonzeros
![Page 15: Advances in Data Analysis using PARAFAC2 and ... - cid.csic.es · NN-ASALSAN 0.083 (47) 1.0 (74) Kiers [23] 0.022 (67) 22.3 (400+) 4 Experimental results We consider two applications:](https://reader035.fdocuments.net/reader035/viewer/2022070614/5c44d52c93f3c34c5a37703d/html5/thumbnails/15.jpg)
New Approach: All-at-once Optimization
minx
f(x)Minimization problem:
• Need a flexible method to handle- constraints (e.g., non-negativity)- regularization (e.g., sparsity)- missing data- large-scale data sets
• Want- speed comparable to ASALSAN- improved accuracy, if possible
• Follow what has been done with CANDECOMP/PARAFAC (see Evrim Acarʼs presentation)
• Use Poblano Optimization Toolbox in MATLAB (Dunlavy, et al. 2009)- Vectorize factor variables and derivatives
![Page 16: Advances in Data Analysis using PARAFAC2 and ... - cid.csic.es · NN-ASALSAN 0.083 (47) 1.0 (74) Kiers [23] 0.022 (67) 22.3 (400+) 4 Experimental results We consider two applications:](https://reader035.fdocuments.net/reader035/viewer/2022070614/5c44d52c93f3c34c5a37703d/html5/thumbnails/16.jpg)
Optimization Theory
!2f(x)
!f(x)T! +
s
sT
s
+
f(x)
1
2f(x + s)
minx
f(x)
Taylor series approx.
Minimization problem:
!f(x) =
!
"#
!f!x1...
!f!xn
$
%& !2f(x) =
!
"""""#
!2f!x2
1
!2f!x1!x2
· · · !2f!x1!xn
!2f!x2!x1
!2f!x2
2· · · !2f
!x2!xn
......
...!2f
!xn!x1
!2f!xn!x2
· · · !2f!x2
n
$
%%%%%&
Gradient Hessian
![Page 17: Advances in Data Analysis using PARAFAC2 and ... - cid.csic.es · NN-ASALSAN 0.083 (47) 1.0 (74) Kiers [23] 0.022 (67) 22.3 (400+) 4 Experimental results We consider two applications:](https://reader035.fdocuments.net/reader035/viewer/2022070614/5c44d52c93f3c34c5a37703d/html5/thumbnails/17.jpg)
Newtonʼs Method
xk
sk
x1
x2
!2f(x)
!f(x)T! +
s
sT
s
+
f(x)
1
2f(x + s)
Taylor series approx.
m(xk + s) = f(xk) +!f(xk)T s +12sT!2f(xk)s
!2f(xk)sk = "!f(xk),
xk+1 = xk + sk
Newtonʼs method:
Local model
Solve for sk
![Page 18: Advances in Data Analysis using PARAFAC2 and ... - cid.csic.es · NN-ASALSAN 0.083 (47) 1.0 (74) Kiers [23] 0.022 (67) 22.3 (400+) 4 Experimental results We consider two applications:](https://reader035.fdocuments.net/reader035/viewer/2022070614/5c44d52c93f3c34c5a37703d/html5/thumbnails/18.jpg)
Quasi-Newton Methods
Bk = Bk!1 +sk!1sT
k!1
sTk!1yk
! Bk!1ykyTk Bk!1
yTk Bk!1yk
.
minBk
!Bk "Bk!1 !F subject to Bk = BTk , Bkyk = sk!1,
Hksk = !"f(xk)
Hk ! "2f(xk)Approximation to Hessian:
New linear system:
Bk = H!1k
Secant method: use gradients at past points to approximate Hessian
BFGS update
sk = !Bk"f(xk)
yk = !f(xk)"!f(xk!1)
![Page 19: Advances in Data Analysis using PARAFAC2 and ... - cid.csic.es · NN-ASALSAN 0.083 (47) 1.0 (74) Kiers [23] 0.022 (67) 22.3 (400+) 4 Experimental results We consider two applications:](https://reader035.fdocuments.net/reader035/viewer/2022070614/5c44d52c93f3c34c5a37703d/html5/thumbnails/19.jpg)
Large-scale Methods
xk
!"f(xk)
xCPk
x1
x2
!H!1
k "f(xk)
Bk = Bk!1 +sk!1sT
k!1
sTk!1yk
! Bk!1ykyTk Bk!1
yTk Bk!1yk
.
BFGS update
Limited memory BFGS (L-BFGS) keeps only m past update vectors sk and yk
Nonlinear Conjugate Gradient (NCG)
Steepest descent then move in subsequent conjugate directions
These methods are implemented in the Poblano Optimization Toolbox for MATLAB (Dunlavy et al. 2009)
![Page 20: Advances in Data Analysis using PARAFAC2 and ... - cid.csic.es · NN-ASALSAN 0.083 (47) 1.0 (74) Kiers [23] 0.022 (67) 22.3 (400+) 4 Experimental results We consider two applications:](https://reader035.fdocuments.net/reader035/viewer/2022070614/5c44d52c93f3c34c5a37703d/html5/thumbnails/20.jpg)
Derivatives for 2-way DEDICOM
!f
!Aij= [!ET AR! EART ]ij
!f
!Rij= [!AT EA]ij
E ! Z "ARAT
Objective function: f(A, R) ! 12 "E "2F = 1
2
!!Z #ARAT!!2
F
Gradient:
AR = A*R;ARt = A*R';AtA = A'*A;G{1} = -Z'*AR - Z*ARt + ARt*(AtA*R) + AR*(AtA*R');G{2} = -A'*Z*A + AtA*R*AtA;
MATLAB:
Error:
![Page 21: Advances in Data Analysis using PARAFAC2 and ... - cid.csic.es · NN-ASALSAN 0.083 (47) 1.0 (74) Kiers [23] 0.022 (67) 22.3 (400+) 4 Experimental results We consider two applications:](https://reader035.fdocuments.net/reader035/viewer/2022070614/5c44d52c93f3c34c5a37703d/html5/thumbnails/21.jpg)
Derivatives for 3-way DEDICOM
Objective function:
Gradient:
Error: Ek ! Zk "ADkRDkAT
f(A, R, D) ! 12
K!
k=1
"Ek "2F = 12
K!
k=1
""Zk #ADkRDkAT""2
F
!f
!Aij=
!!
"
k
ET ADkRDk + EADkRT Dk
#
ij
!f
!Rij=
!!
"
k
AT DkEDkA
#
ij
!f
!Dkj= ADkrjEaj + aT
j Erj,:DkAT
![Page 22: Advances in Data Analysis using PARAFAC2 and ... - cid.csic.es · NN-ASALSAN 0.083 (47) 1.0 (74) Kiers [23] 0.022 (67) 22.3 (400+) 4 Experimental results We consider two applications:](https://reader035.fdocuments.net/reader035/viewer/2022070614/5c44d52c93f3c34c5a37703d/html5/thumbnails/22.jpg)
Derivatives for 3-way DEDICOM
for k = 1:size(D,1) DktDk = D(k,:)'*D(k,:); Rk = R.*DktDk; AR = A*Rk; ARt = A*Rk'; E = AR*A' - Z(:,:,k); G{1} = G{1} + E'*AR + E*ARt; G{2} = G{2} + (A'*E*A).*DktDk;
ADR = A*(diag(D(k,:))*R); RDA = (R*diag(D(k,:)))*A'; for j = 1:size(D,2) G{3}(k,j) = ADR(:,j)'*E*A(:,j) + A(:,j)'*E*RDA(j,:)'; endend
MATLAB:
![Page 23: Advances in Data Analysis using PARAFAC2 and ... - cid.csic.es · NN-ASALSAN 0.083 (47) 1.0 (74) Kiers [23] 0.022 (67) 22.3 (400+) 4 Experimental results We consider two applications:](https://reader035.fdocuments.net/reader035/viewer/2022070614/5c44d52c93f3c34c5a37703d/html5/thumbnails/23.jpg)
Results
• Synthetic data• 20x20x9• Random initialization
(5% error)• No noise in X• p = 2
0 2 4 6 8 10 12 1410
!12
10!10
10!8
10!6
10!4
10!2
100
Time (sec)
Re
l. R
esid
ua
l E
rro
r
ASALSAN
NCG
L!BFGS
![Page 24: Advances in Data Analysis using PARAFAC2 and ... - cid.csic.es · NN-ASALSAN 0.083 (47) 1.0 (74) Kiers [23] 0.022 (67) 22.3 (400+) 4 Experimental results We consider two applications:](https://reader035.fdocuments.net/reader035/viewer/2022070614/5c44d52c93f3c34c5a37703d/html5/thumbnails/24.jpg)
Results
• Synthetic data• 20x20x9• Random initialization
(5% error)• 0.1% noise in X• p = 2
0 1 2 3 4 5 610
!3
10!2
10!1
100
Time (sec)
Rel. R
esid
ual E
rror
ASALSAN
NCG
L!BFGS
![Page 25: Advances in Data Analysis using PARAFAC2 and ... - cid.csic.es · NN-ASALSAN 0.083 (47) 1.0 (74) Kiers [23] 0.022 (67) 22.3 (400+) 4 Experimental results We consider two applications:](https://reader035.fdocuments.net/reader035/viewer/2022070614/5c44d52c93f3c34c5a37703d/html5/thumbnails/25.jpg)
Results
• Synthetic data• 50x50x35• Random initialization
(5% error)• 0.1% noise in X• Oblique factors A• p = 2
0 10 20 30 40 50 60 70 80 90 100
100
101
102
Time (sec)
Rel. R
esid
ual E
rror
ASALSAN
NCG
L!BFGS
![Page 26: Advances in Data Analysis using PARAFAC2 and ... - cid.csic.es · NN-ASALSAN 0.083 (47) 1.0 (74) Kiers [23] 0.022 (67) 22.3 (400+) 4 Experimental results We consider two applications:](https://reader035.fdocuments.net/reader035/viewer/2022070614/5c44d52c93f3c34c5a37703d/html5/thumbnails/26.jpg)
Results
• Enron email data• 184x184x44• 9838 nnz• p = 4• Random
initialization
0 10 20 30 40 50 60 70 800.9
0.91
0.92
0.93
0.94
0.95
0.96
0.97
0.98
0.99
Time (sec)
Rel. R
esid
ual E
rror
ASALSAN
NCG
L!BFGS
![Page 27: Advances in Data Analysis using PARAFAC2 and ... - cid.csic.es · NN-ASALSAN 0.083 (47) 1.0 (74) Kiers [23] 0.022 (67) 22.3 (400+) 4 Experimental results We consider two applications:](https://reader035.fdocuments.net/reader035/viewer/2022070614/5c44d52c93f3c34c5a37703d/html5/thumbnails/27.jpg)
Derivatives for PARAFAC2
Objective function:
Gradient:
Error: Ek ! Zk "ADkHDkAT
f(A, H, D) ! 12
K!
k=1
"Ek "2F = 12
K!
k=1
""Zk #ADkHDkAT""2
F
!f
!Hij=
!!
"
k
AT DkEDkA
#
ij
!f
!Dkj= 2ADkhjEaj
!f
!Aij=
!!2
"
k
ET ADkHDk
#
ij
![Page 28: Advances in Data Analysis using PARAFAC2 and ... - cid.csic.es · NN-ASALSAN 0.083 (47) 1.0 (74) Kiers [23] 0.022 (67) 22.3 (400+) 4 Experimental results We consider two applications:](https://reader035.fdocuments.net/reader035/viewer/2022070614/5c44d52c93f3c34c5a37703d/html5/thumbnails/28.jpg)
Discussion
• All-at-once optimization is competitive but not the best. ASALSAN is fastest on real data.- DEDICOM and PARAFAC2 are difficult nonlinear
optimization problems with multiple minima- ALSALSAN includes some second order information
• Optimization approach convenient for:- Different loss functions (other than least squares)- General constraints (e.g., non-negativity)- Regularization (e.g., sparsity)- Missing data
• Future research will explore these benefits and aim for better performance
![Page 29: Advances in Data Analysis using PARAFAC2 and ... - cid.csic.es · NN-ASALSAN 0.083 (47) 1.0 (74) Kiers [23] 0.022 (67) 22.3 (400+) 4 Experimental results We consider two applications:](https://reader035.fdocuments.net/reader035/viewer/2022070614/5c44d52c93f3c34c5a37703d/html5/thumbnails/29.jpg)
Applications
DEDICOM
Tensor or N-way array
PARAFAC2• Temporal social network analysis (Enron)• Community finding• Part-of-speech tagging
• Multilingual document analysis
![Page 30: Advances in Data Analysis using PARAFAC2 and ... - cid.csic.es · NN-ASALSAN 0.083 (47) 1.0 (74) Kiers [23] 0.022 (67) 22.3 (400+) 4 Experimental results We consider two applications:](https://reader035.fdocuments.net/reader035/viewer/2022070614/5c44d52c93f3c34c5a37703d/html5/thumbnails/30.jpg)
Using PARAFAC2 for Multilingual Document
Clustering
Joint work with Peter Chew
![Page 31: Advances in Data Analysis using PARAFAC2 and ... - cid.csic.es · NN-ASALSAN 0.083 (47) 1.0 (74) Kiers [23] 0.022 (67) 22.3 (400+) 4 Experimental results We consider two applications:](https://reader035.fdocuments.net/reader035/viewer/2022070614/5c44d52c93f3c34c5a37703d/html5/thumbnails/31.jpg)
Basics: Graphs and Matrices
Term-by-doc (adjacency) matrix
Documents
MATRICES, VECTOR SPACES, AND INFORMATION RETRIEVAL 341
The t = 6 terms:
T1: bak(e,ing)T2: recipesT3: breadT4: cakeT5: pastr(y,ies)T6: pie
The d = 5 document titles:
D1: How to Bake Bread Without RecipesD2: The Classic Art of Viennese PastryD3: Numerical Recipes: The Art of Scientific ComputingD4: Breads, Pastries, Pies and Cakes: Quantity Baking RecipesD5: Pastry: A Book of Best French Recipes
The 6 ! 5 term-by-document matrix before normalization, where theelement aij is the number of times term i appears in document title j:
A =
!
"""""#
1 0 0 1 01 0 1 1 11 0 0 1 00 0 0 1 00 1 0 1 10 0 0 1 0
$
%%%%%&
The 6 ! 5 term-by-document matrix with unit columns:
A =
!
"""""#
0.5774 0 0 0.4082 00.5774 0 1.0000 0.4082 0.70710.5774 0 0 0.4082 0
0 0 0 0.4082 00 1.0000 0 0.4082 0.70710 0 0 0.4082 0
$
%%%%%&
Fig. 2.2 The construction of a term-by-document matrix A.
are returned as relevant. The second, third, and fifth documents, which concernneither of these topics, are correctly ignored.
If the user had simply requested books about baking, however, the results wouldhave been markedly di!erent. In this case, the query vector is given by
q(2) = ( 1 0 0 0 0 0 )T ,
and the cosines of the angles between the query and five document vectors are, in
MATRICES, VECTOR SPACES, AND INFORMATION RETRIEVAL 341
The t = 6 terms:
T1: bak(e,ing)T2: recipesT3: breadT4: cakeT5: pastr(y,ies)T6: pie
The d = 5 document titles:
D1: How to Bake Bread Without RecipesD2: The Classic Art of Viennese PastryD3: Numerical Recipes: The Art of Scientific ComputingD4: Breads, Pastries, Pies and Cakes: Quantity Baking RecipesD5: Pastry: A Book of Best French Recipes
The 6 ! 5 term-by-document matrix before normalization, where theelement aij is the number of times term i appears in document title j:
A =
!
"""""#
1 0 0 1 01 0 1 1 11 0 0 1 00 0 0 1 00 1 0 1 10 0 0 1 0
$
%%%%%&
The 6 ! 5 term-by-document matrix with unit columns:
A =
!
"""""#
0.5774 0 0 0.4082 00.5774 0 1.0000 0.4082 0.70710.5774 0 0 0.4082 0
0 0 0 0.4082 00 1.0000 0 0.4082 0.70710 0 0 0.4082 0
$
%%%%%&
Fig. 2.2 The construction of a term-by-document matrix A.
are returned as relevant. The second, third, and fifth documents, which concernneither of these topics, are correctly ignored.
If the user had simply requested books about baking, however, the results wouldhave been markedly di!erent. In this case, the query vector is given by
q(2) = ( 1 0 0 0 0 0 )T ,
and the cosines of the angles between the query and five document vectors are, in
MATRICES, VECTOR SPACES, AND INFORMATION RETRIEVAL 341
The t = 6 terms:
T1: bak(e,ing)T2: recipesT3: breadT4: cakeT5: pastr(y,ies)T6: pie
The d = 5 document titles:
D1: How to Bake Bread Without RecipesD2: The Classic Art of Viennese PastryD3: Numerical Recipes: The Art of Scientific ComputingD4: Breads, Pastries, Pies and Cakes: Quantity Baking RecipesD5: Pastry: A Book of Best French Recipes
The 6 ! 5 term-by-document matrix before normalization, where theelement aij is the number of times term i appears in document title j:
A =
!
"""""#
1 0 0 1 01 0 1 1 11 0 0 1 00 0 0 1 00 1 0 1 10 0 0 1 0
$
%%%%%&
The 6 ! 5 term-by-document matrix with unit columns:
A =
!
"""""#
0.5774 0 0 0.4082 00.5774 0 1.0000 0.4082 0.70710.5774 0 0 0.4082 0
0 0 0 0.4082 00 1.0000 0 0.4082 0.70710 0 0 0.4082 0
$
%%%%%&
Fig. 2.2 The construction of a term-by-document matrix A.
are returned as relevant. The second, third, and fifth documents, which concernneither of these topics, are correctly ignored.
If the user had simply requested books about baking, however, the results wouldhave been markedly di!erent. In this case, the query vector is given by
q(2) = ( 1 0 0 0 0 0 )T ,
and the cosines of the angles between the query and five document vectors are, in
example from (Berry, Drmac, Jessup, 1999)
Bipartite graph
Terms
T2T1
T4T3
T6T5
D2D1
D4D3
D5
T2T1
T4T3
T6T5
D1 D2 D3 D4 D5
Concepts
• Bag of words• Vector space model• Stemming• Stoplists• Scaling for information content
![Page 32: Advances in Data Analysis using PARAFAC2 and ... - cid.csic.es · NN-ASALSAN 0.083 (47) 1.0 (74) Kiers [23] 0.022 (67) 22.3 (400+) 4 Experimental results We consider two applications:](https://reader035.fdocuments.net/reader035/viewer/2022070614/5c44d52c93f3c34c5a37703d/html5/thumbnails/32.jpg)
Cross-language Information Retrieval (CLIR)
Web documents could be in any language
English
French
Arabic
Spanish
EnglishGermanJapaneseFrenchChinese SimplifiedSpanishRussianDutchKoreanPolishPortugueseChinese TraditionalSwedishCzechNorwegianItalianDanishHungarianFinnishHebrewArabicTurkishSlovakIndonesianBulgarianCroatianCatalanSlovenianGreekRomanianSerbianEstonianIcelandicLithuanianLatvian
Languages on the web
Goal: Cluster documents by topic regardless of
language
(P. Chew, B. Bader, A. Abdelali (NMSU), T. Kolda, P. Kegelmeyer)
![Page 33: Advances in Data Analysis using PARAFAC2 and ... - cid.csic.es · NN-ASALSAN 0.083 (47) 1.0 (74) Kiers [23] 0.022 (67) 22.3 (400+) 4 Experimental results We consider two applications:](https://reader035.fdocuments.net/reader035/viewer/2022070614/5c44d52c93f3c34c5a37703d/html5/thumbnails/33.jpg)
Bible as a ʻRosetta Stoneʼ
• The Bible has been translated carefully and widely - 451 complete & 2479 partial translations• Verse aligned
Afrikaans Estonian Norwegian
Albanian Finnish Persian (Farsi)
Amharic French Polish
Arabic German Portuguese
Aramaic Greek (New Testament) Romani
Armenian Eastern Greek (Modern) Romanian
Armenian Western Hebrew (Old Testament) Russian
Basque Hebrew (Modern) Scots Gaelic
Breton Hungarian Spanish
Chamorro Indonesian Swahili
Chinese (Simplified) Italian Swedish
Chinese (Traditional) Japanese Tagalog
Croatian Korean Thai
Czech Latin Turkish
Danish Latvian Ukrainian
Dutch Lithuanian Vietnamese
English Manx Gaelic Wolof
Esperanto Maori Xhosa
Sandiaʼs database: 54 languages: 99.76 % coverage of web
![Page 34: Advances in Data Analysis using PARAFAC2 and ... - cid.csic.es · NN-ASALSAN 0.083 (47) 1.0 (74) Kiers [23] 0.022 (67) 22.3 (400+) 4 Experimental results We consider two applications:](https://reader035.fdocuments.net/reader035/viewer/2022070614/5c44d52c93f3c34c5a37703d/html5/thumbnails/34.jpg)
Bible as Parallel Corpus
5 languages for training and testing
Translation Terms Total Words
English (King James) 12,335 789,744
Spanish (Reina Valera 1909) 28,456 704,004
Russian (Synodal 1876) 47,226 560,524
Arabic (Smith Van Dyke) 55,300 440,435
French (Darby) 20,428 812,947
• Languages convey information in different number of words
Isolating language Synthetic language
![Page 35: Advances in Data Analysis using PARAFAC2 and ... - cid.csic.es · NN-ASALSAN 0.083 (47) 1.0 (74) Kiers [23] 0.022 (67) 22.3 (400+) 4 Experimental results We consider two applications:](https://reader035.fdocuments.net/reader035/viewer/2022070614/5c44d52c93f3c34c5a37703d/html5/thumbnails/35.jpg)
Example of Statistical Differences
!""#$%&'()*+)()*,()-+&".+)%/&"(/0()*,(1%2.,(%&(/#-(3+-+..,.(4/-3#"(
+-,(+44#-+),(+&5(4/$3.,),6()*%"()+2.,(7/#.5(+33,+-()/("#'',")()*+)(
!-+2%4( )+8,"( 9#")( /:,-( *+.0( )*,( &#$2,-( /0( ),-$"( )/( ,;3-,""( )*,(
"+$,(+$/#&)(/0( %&0/-$+)%/&(+"(<&'.%"*6( )*+)(<&'.%"*(+&5(=-,&4*(
)+8,("%$%.+-(&#$2,-"(/0( ),-$"6(+&5("/(/&>( ?&)#%)%:,.@6( )*%"(",,$"(
-%'*)( '%:,&( )*+)( !-+2%4( +&5( A#""%+&( -,.@( $#4*( $/-,( )*+&( )*+&(
<&'.%"*6(=-,&4*(+&5(B3+&%"*(/&()*,(#",(/0($/-3*/./'@(C,&5%&'"6(
+&5("/(/&D()/(+55()/(/-($/5%0@()*,($,+&%&'"(/0(7/-5">(E*,("+$,(
3*,&/$,&/&( 4+&( +."/( 2,( %..#")-+),5( 7,..( /&( +( "$+..( "4+.,( 2@(
4/&"%5,-%&'( )*,( 0%-")( F5/4#$,&)G( %&( )*,( 3+-+..,.( 4/-3#"( C)*,( 0%-")(
:,-",D6("*/7&(%&(E+2.,(H>(?)(%"(.%8,.@()/(2,(&/(4/%&4%5,&4,()*+)()*,(
2,")(4-/""I.+&'#+',(3-,5%4)%/&(-,"#.)"(7,(+4*%,:,5(7,-,(0/-(3+%-"(
/0( .+&'#+',"( 7%)*( "%$%.+-( ")+)%")%4"( C0/-( ,;+$3.,6( <&'.%"*( +&5(
=-,&4*D6( +&5( )*,( 7/-")( -,"#.)"( 7,-,( 0/-( )*/",( 7%)*( 5%""%$%.+-(
")+)%")%4"(C"#4*(+"(!-+2%4(+&5(<&'.%"*D(C",,(E+2.,(J(+&5(E+2.,(KD>(
!"#$%&'(&)$$*+,-",./0&/1&+,",.+,.2"$&3.11%-%02%+&#%,4%%0&
$"05*"5%+&&
& !%6,& 4/-3&
2/*0,&
7&/1&
,/,"$&
!A( !"#$%(&$'()*$(+$(,-.(/01*$(23>((
L( MK(
<N( ?&( )*,( 2,'%&&%&'( O/5( 4-,+),5( )*,(
*,+:,&"(+&5()*,(,+-)*>(
MP( QK(
=A( !#( 4/$$,&4,$,&)( R%,#( 4-S+( .,"(
4%,#;(,)(.+(),--,>(
T( QM(
AU( 4( 567689( :;<=;>?8( @;A( 59B;( ?(C9D8E>(
V( MV(
<B( <&(,.(3-%&4%3%/(4-%W(R%/"(./"(4%,./"(@(
.+()%,--+>(
MP( QK(
&
!8!9:&
&
;<&
&
=>>&
(
E*,( ")+)%")%4+.( 5%00,-,&4,"( 4+&6( %&( 0+4)6( 2,( "*/7&( )/( *+:,( +(
5,)-%$,&)+.(,00,4)(/&(XB!(Y(&/)(9#")(,$3%-%4+..@6(2#)()*,/-,)%4+..@(
+"( 7,..>( U&5,-( )*,( ")+&5+-5( ./'I,&)-/3@( 7,%'*)%&'( "4*,$,6( 7,(
4+&(:,-%0@(7*,)*,-6(+44/-5%&'()/()*%"("4*,$,6()*,(4/&)-%2#)%/&(/0(
,+4*(/0()*,(Z(.+&'#+',"(%&(/#-($#.)%.%&'#+.(+.%'&,5(3+-+..,.(4/-3#"(
%"(,[#+.(Y(7*%4*(%)("*/#.5(2,6(%0()*,()-+&".+)%/&"(+-,(4/$3.,),(+&5(
+44#-+),>( E*,( 4/$3#),5( ,&)-/3@( %"( +( $,+"#-,( /0( %&0/-$+)%/&(
4/&),&)\( 2,4+#",( )*,( "+$,( %&0/-$+)%/&( %"( 2,%&'( 4/&:,@,5( %&( )*,(
)-+&".+)%/&"(/0(+&@('%:,&(5/4#$,&)(%&()*,()-+%&%&'(4/-3#"6()*,()/)+.(
,&)-/3@(3,-( .+&'#+',(C)*,("#$(/0( ),-$(,&)-/3%,"(/0( ),-$"( %&()*+)(
.+&'#+',D("*/#.5(2,(4/&")+&)(0/-(+&@('%:,&(5/4#$,&)>(
U3/&( ,;+$%&+)%/&6(7,( 0/#&5( )*+)(7%)*( )*,( ")+&5+-5( ./'I,&)-/3@(
7,%'*)%&'( )*,( 4/$3#),5( %&0/-$+)%/&( 4/&),&)( :+-%,"( [#%),(7%5,.@(
2@( .+&'#+',6( 7*%4*( %"( 3,-*+3"( #&"#-3-%"%&'>( ?0( %)( )+8,"( ZLP6ZQK(
A#""%+&( 7/-5"( )/( ,;3-,""( 7*+)( <&'.%"*( "+@"( %&( VHT6VKK( 7/-5"6(
)*,&(/&(+:,-+',(A#""%+&(7/-5"($#")(4/&)+%&($/-,(%&0/-$+)%/&(C/-(
$,+&%&'D()*+&(<&'.%"*(7/-5"(C+'+%&6(+(&/)%/&(7*%4*(%"(4/&"%"),&)(
7%)*(7*+)(7,(8&/7(+2/#)( )*,(7+@(7/-5"(+-,( 0/-$,5( %&(A#""%+&(
+&5(<&'.%"*D>(]/7,:,-6("%&4,(,&)-/3@(C/-(%&0/-$+)%/&(4/&),&)D(%&(
)*,(./'I,&)-/3@("4*,$,(%"("%$3.@()*,(,&)-/3@(/0(+(3+-)%4#.+-(),-$(
+4-/""( +..( 5/4#$,&)"6( +&5( "%&4,( )*,( "4*,$,( )+8,"( &/( +44/#&)( /0(
)*,( "3,4%0%4( 3-/3,-)%,"( /0( 5%00,-,&)( .+&'#+',"6( )*,-,( %"( &/(
'#+-+&),,( )*+)( )*,( 4/&)-%2#)%/&"( /0( 5%00,-,&)( .+&'#+',"( %&( )*,(
3+-+..,.( 4/-3#"( 7%..( 2,( ,[#+.( +"( )*,@( "*/#.5( 2,>( ?&( 0+4)6( %&( /#-(
3+-+..,.( 4/-3#"6( 7*,-,( #&5,-( XB!( +..( .+&'#+',"( +-,( F$%;,5(
)/',)*,-G( %&( )*,( 2+'I/0I7/-5"( +33-/+4*6( .+&'#+',"( 7*%4*( *+:,(
$/-,( ),-$"( /:,-+..( C"#4*( +"( <&'.%"*( +&5( =-,&4*D( ',&,-+..@(
+44/#&)( 0/-( +( *%'*,-( 3,-4,&)+',( /0( )*,( F%&0/-$+)%/&G( %&( ,+4*(
5/4#$,&)>(E*%"(3/%&)"()/(+(0.+7(%&(")+&5+-5($#.)%.%&'#+.(XB!6(/-(
+)(.,+")(%&()*,(./'I,&)-/3@(7,%'*)%&'("4*,$,(+"(+33.%,5(7%)*%&()*+)(
+33-/+4*>(
^&,( /)*,-( 3/%&)( )/( &/),( %"( )*,( 5%00,-,&4,( 2,)7,,&( )*,( )/)+.( /0(
MLJ6VKZ( "*/7&( %&( E+2.,( V( +2/:,6( +&5( )*,( 0%'#-,( /0( MLP6JTL(
$,&)%/&,5(%&(",4)%/&(K>M>(E*,(5%00,-,&4,(/0(J6JKT(-,3-,",&)"()*/",(
),-$"()*+)(/44#-(%&($/-,()*+&(/&,(.+&'#+',6("#4*(+"(F5,G(CF/0G(%&(
=-,&4*( +&5( B3+&%"*D6( <&'.%"*( F4/%&G( :,-"#"( =-,&4*( F4/%&G(
CF4/-&,-GD>(E*,(-,.+)%:,.@("$+..(&#$2,-(/0("#4*(),-$"(%"(#&.%8,.@(
)/(+00,4)()*,(4-/""I.+&'#+',(3-,4%"%/&(-,"#.)"("%'&%0%4+&).@6(2#)(%)(%"(
7/-)*( 3/%&)%&'( /#)( )*+)( ")+&5+-5(XB!(*+"(&/(7+@( )/(5%")%&'#%"*(
2,)7,,&(*/$/'-+3*"(0-/$(5%00,-,&)(.+&'#+',"6(+&5(%&("/$,(4+","(
)*%"(4/#.5(2,(3-/2.,$+)%46(,"3,4%+..@(7*,&()*,(*/$/'-+3*"(*+:,(
:,-@(5%00,-,&)($,+&%&'"(%&()*,(5%00,-,&)(.+&'#+',">J(
O%:,&(+..()*%"6()*,(")+)%")%4+.(,;3.+&+)%/&(",,$"()/(2,(+(-,+"/&+2.,(
/&,( 0/-( 7*@6( 7*,&( 7,( +)),$3),5( )/( $+3( )*,( 5/4#$,&)"(
'-+3*%4+..@( "#4*( )*+)( "%$%.+-( 5/4#$,&)"( 7,-,( 4./",( )/( /&,(
+&/)*,-6( )*,( 5/4#$,&)"( 4.#"),-,5( 2@( .+&'#+',( -+)*,-( )*+&( 2@(
)/3%4>( _-,4%",.@( )*,( "+$,( %""#,( *+"( 2,,&( %5,&)%0%,5( ,.",7*,-,( %&(
)*,( .%),-+)#-,\( `+)*%,#( ,)( +.( aMHb( -,3/-)( )*+)( F,:,&( %0( )*,( 4-/""I
.%&'#+.( "%$%.+-%)@($,+"#-,( %"( 5,"%'&,5( )/( 2,*+:,( )*,( "+$,(7*,&(
4/$3+-%&'( 5/4#$,&)"( 7-%)),&( %&( )*,( "+$,( .+&'#+',( +&5(
5/4#$,&)"(7-%)),&( %&(5%00,-,&)(/&,"6(/#-(,:+.#+)%/&("*/7"()*+)( %)(
")%..(),&5"()/('+)*,-(%&(+(4.#"),-(5/4#$,&)"(/0("+$,(.+&'#+',(3-%/-(
)/(5%00,-,&)(.+&'#+',(/&,"G>(
(
?(! 9@&9:!AB@9!)CA&9DDB89EF&!"( 5%"4#"",5( %&( )*,( 3-,:%/#"( ",4)%/&6( %)( %"( +( 5-+72+48( /0( )*,(
")+&5+-5( +33-/+4*( )/( XB!( )*+)( )*,-,( %"( &/( 5,.%&,+)%/&( 2,)7,,&(
5%00,-,&)( .+&'#+',"( %&( )*,( )-+%&%&'( 5+)+>( !..( .+&'#+',"( +-,(
4/&4+),&+),5( )/',)*,-( %&( )-+%&%&'6( "/( )*+)( ,+4*( F5/4#$,&)G( %"(
$#.)%.%&'#+.>( c%)*%&( )*,( XB!( 0-+$,7/-86( */7,:,-6( )*%"( %"(
#&+:/%5+2.,6( "%&4,( 7%)*/#)( )*,( 4/&4+),&+)%/&6( XB!( %"( #&+2.,( )/(
$+8,()*,(+""/4%+)%/&"(2,)7,,&(7/-5"(%&(5%00,-,&)(.+&'#+',"(7*,&(
)*,@(4/I/44#->(
E/( /:,-4/$,( )*%"( 3-/2.,$6( )*,-,0/-,6( 7,( 3-/3/",( +( &/:,.(
+33.%4+)%/&( /0( _!A!=!dQ( aMJb( +"( +&( +.),-&+)%:,( )/( XB!>(
_!A!=!dQ( %"( +( :+-%+&)( /0( _!A!=!d( aMQb6( +( $#.)%I7+@(
',&,-+.%e+)%/&(/0( )*,(BfR>(E*,(_!A!=!d($/5,.( %"(2+",5(/&(+(
F3+-+..,.( 3-/3/-)%/&+.( 3-/0%.,( 3-%&4%3.,G( )*+)( +33.%,"( )*,( "+$,(
0+4)/-"( +4-/""( +( 3+-+..,.( ",)( /0( $+)-%4,"( )/( $%&%$%e,( +( .,+")I
"[#+-,"(/29,4)%:,>(X,)()*,(`(!(N($+)-%;(g86(8(h(M6(i6(j6(5,&/),(
)*,(!)*(".%4,(/0(+()*-,,I7+@(5+)+(+--+@(g6(+&5(.,)(A(2,()*,(&#$2,-(
/0( 5%$,&"%/&"( /0( )*,( XB!( 4/&4,3)#+.( "3+4,>( E*,&( )*,( ")+&5+-5(
_!A!=!d($/5,.(%"(
( g8(h(U(B8(fE( ( ( CMD(
(((((((((((((((((((((((((((((((((((((((( (((((((((((((((((((((((((
J(E*,(5%00,-,&4,(2,)7,,&()*,()/)+.(/0(J6JPV6LZK(+&5()*,(Q6LHK6TJH(
&/&e,-/"( $,&)%/&,5( %&( ",4)%/&( K>M( 4+&( +."/( 2,( ,;3.+%&,5\( )*,(
0%'#-,(%&(E+2.,(L(%"(*%'*,-(2,4+#",("/$,(),-$"(/44#-($/-,()*+&(
/&4,(%&()*,("+$,(:,-",>(=/-(,;+$3.,6(%0(F)*,G(/44#-"()*-,,()%$,"(
%&( +( 3+-)%4#.+-( :,-",6( )*%"( 7/#.5( +44/#&)( 0/-( )*-,,( )/8,&"6( 2#)(
/&.@(/&,(&/&e,-/(,&)-@(%&()*,(),-$I2@I5/4#$,&)($+)-%;>(
Research Track Paper
148
![Page 36: Advances in Data Analysis using PARAFAC2 and ... - cid.csic.es · NN-ASALSAN 0.083 (47) 1.0 (74) Kiers [23] 0.022 (67) 22.3 (400+) 4 Experimental results We consider two applications:](https://reader035.fdocuments.net/reader035/viewer/2022070614/5c44d52c93f3c34c5a37703d/html5/thumbnails/36.jpg)
Language Morphology
• Isolating language: One morpheme per word- e.g., "He travelled by hovercraft on the sea." Largely isolating, but travelled and hovercraft each
have two morphemes per word. (Wikipedia)
• Synthetic language: High morpheme-per-word ratio- German: Aufsichtsratsmitgliederversammlung => "On-view-council-with-limbs-gathering" meaning
"meeting of members of the supervisory board". (Wikipedia)- Chulym: Aalychtypiskem => "I went out moose hunting" - Yupʼik Eskimo: tuntusssuatarniksaitengqiggtuq => “He had not yet said again that he was going to
hunt reindeer.” (Payne, 1997)
Translation Terms Total Words
English (King James) 12,335 789,744
Arabic (Smith Van Dyke) 55,300 440,435
Isolating language Synthetic language
Chinese Quechua, Inuit (Eskimo)
Languages convey information in different number of words
![Page 37: Advances in Data Analysis using PARAFAC2 and ... - cid.csic.es · NN-ASALSAN 0.083 (47) 1.0 (74) Kiers [23] 0.022 (67) 22.3 (400+) 4 Experimental results We consider two applications:](https://reader035.fdocuments.net/reader035/viewer/2022070614/5c44d52c93f3c34c5a37703d/html5/thumbnails/37.jpg)
Term-Doc Matrix
Term-by-verse matrix for all languages
terms
Bible verses
English
Spanish
Russian
Arabic
French
163,745 x 31,230
Look for co-occurrence of terms in the same verses and across languages to capture latent concepts
• Approach is not new- pairs of languages in LSA suggested by (Berry et al., 1994)- multi-parallel corpus is new
![Page 38: Advances in Data Analysis using PARAFAC2 and ... - cid.csic.es · NN-ASALSAN 0.083 (47) 1.0 (74) Kiers [23] 0.022 (67) 22.3 (400+) 4 Experimental results We consider two applications:](https://reader035.fdocuments.net/reader035/viewer/2022070614/5c44d52c93f3c34c5a37703d/html5/thumbnails/38.jpg)
Latent Semantic Indexing
Term-by-verse matrix for all languages
terms
Bible verses
English
Spanish
Russian
Arabic
French
UVΣ T
Truncated SVD
Project new documents of interest into subspace of UΣ-1
• cosine similarities for clustering• machine learning applications
term x concept
dimension 1 0.1375dimension 2 0.1052dimension 3 0.0341dimension 4 0.0441dimension 5 -0.0087dimension 6 0.0410dimension 7 0.1011dimension 8 0.0020dimension 9 0.0518dimension 10 0.0822dimension 11 -0.0101dimension 12 -0.1154dimension 13 -0.0990dimension 14 0.0228dimension 15 -0.0520dimension 16 0.1096dimension 17 0.0294dimension 18 0.0495dimension 19 0.0553dimension 20 0.1598
Projectnew document
Document feature vector
Ak = Uk!kV Tk =
k!
i=1
!iuivTi
![Page 39: Advances in Data Analysis using PARAFAC2 and ... - cid.csic.es · NN-ASALSAN 0.083 (47) 1.0 (74) Kiers [23] 0.022 (67) 22.3 (400+) 4 Experimental results We consider two applications:](https://reader035.fdocuments.net/reader035/viewer/2022070614/5c44d52c93f3c34c5a37703d/html5/thumbnails/39.jpg)
Quran as Test Set
• Quran is translated into many languages, just like the Bible
• 114 suras (or chapters)
• More variation across translations => harder IR task
![Page 40: Advances in Data Analysis using PARAFAC2 and ... - cid.csic.es · NN-ASALSAN 0.083 (47) 1.0 (74) Kiers [23] 0.022 (67) 22.3 (400+) 4 Experimental results We consider two applications:](https://reader035.fdocuments.net/reader035/viewer/2022070614/5c44d52c93f3c34c5a37703d/html5/thumbnails/40.jpg)
Performance Metrics
• Precision at 1 document (P1)- Equals 100% if the translation of the query ranked
highest, 0% otherwise- Calculated as an average over all queries for each
language pair or as a total average- Essentially, P1 measures success in retrieving
documents when the source and target languages are specified
• Average multilingual precision at 5 (or n) documents (MP5)- The average percentage of the top 5 documents that
are translations of the query document- Calculated as an average for all queries & all languages- Essentially, MP5 measures success in multilingual
clustering
• MP5 is a stricter measure than P1 because the retrieval task is harder
?
?
?
Lang 1
Lang 1Lang 2
query
query
![Page 41: Advances in Data Analysis using PARAFAC2 and ... - cid.csic.es · NN-ASALSAN 0.083 (47) 1.0 (74) Kiers [23] 0.022 (67) 22.3 (400+) 4 Experimental results We consider two applications:](https://reader035.fdocuments.net/reader035/viewer/2022070614/5c44d52c93f3c34c5a37703d/html5/thumbnails/41.jpg)
LSA Results
Documents tend to cluster more by language than by topic
5 languages, 240 latent dimensions
(Chew,Bader,Kolda,Abdelali, 2007)
Method Average P1 Average MP5
SVD/LSA 76.0% 26.1%
![Page 42: Advances in Data Analysis using PARAFAC2 and ... - cid.csic.es · NN-ASALSAN 0.083 (47) 1.0 (74) Kiers [23] 0.022 (67) 22.3 (400+) 4 Experimental results We consider two applications:](https://reader035.fdocuments.net/reader035/viewer/2022070614/5c44d52c93f3c34c5a37703d/html5/thumbnails/42.jpg)
LSA Results
Method Average P1 Average MP5
SVD/LSA (α=1) 76.0% 26.1%
SVD/LSA (α=1.8) 87.6% 65.5%
5 languages, 240 latent dimensions
![Page 43: Advances in Data Analysis using PARAFAC2 and ... - cid.csic.es · NN-ASALSAN 0.083 (47) 1.0 (74) Kiers [23] 0.022 (67) 22.3 (400+) 4 Experimental results We consider two applications:](https://reader035.fdocuments.net/reader035/viewer/2022070614/5c44d52c93f3c34c5a37703d/html5/thumbnails/43.jpg)
New Approach: Multi-matrix Array
X5
X4
X3X
2English
X1
Spanish
Russian
Arabic
French
Term-by-verse matrix for each language
(Chew, Bader, Kolda, Abdelali, 2007)
Array size: 55,300 x 31,230 x 5 with 2,765,719 nonzeros
![Page 44: Advances in Data Analysis using PARAFAC2 and ... - cid.csic.es · NN-ASALSAN 0.083 (47) 1.0 (74) Kiers [23] 0.022 (67) 22.3 (400+) 4 Experimental results We consider two applications:](https://reader035.fdocuments.net/reader035/viewer/2022070614/5c44d52c93f3c34c5a37703d/html5/thumbnails/44.jpg)
Tucker1
VT
S1
=U1
X1
X2
X3
U2
U3
!
!Tucker
Tucker1
![Page 45: Advances in Data Analysis using PARAFAC2 and ... - cid.csic.es · NN-ASALSAN 0.083 (47) 1.0 (74) Kiers [23] 0.022 (67) 22.3 (400+) 4 Experimental results We consider two applications:](https://reader035.fdocuments.net/reader035/viewer/2022070614/5c44d52c93f3c34c5a37703d/html5/thumbnails/45.jpg)
Tucker1 Results
Only minor improvement because each Uk is not orthogonal
5 languages, 240 latent dimensions
Method Average P1 Average MP5
SVD/LSA (α=1) 76.0% 26.1%
SVD/LSA 87.6% 65.5%
Tucker1 89.5% 71.3%
![Page 46: Advances in Data Analysis using PARAFAC2 and ... - cid.csic.es · NN-ASALSAN 0.083 (47) 1.0 (74) Kiers [23] 0.022 (67) 22.3 (400+) 4 Experimental results We consider two applications:](https://reader035.fdocuments.net/reader035/viewer/2022070614/5c44d52c93f3c34c5a37703d/html5/thumbnails/46.jpg)
PARAFAC2
Where each Uk is orthonormal and Sk is diagonal
Xk ! UkHSkV T
(Harshman, 1972)
![Page 47: Advances in Data Analysis using PARAFAC2 and ... - cid.csic.es · NN-ASALSAN 0.083 (47) 1.0 (74) Kiers [23] 0.022 (67) 22.3 (400+) 4 Experimental results We consider two applications:](https://reader035.fdocuments.net/reader035/viewer/2022070614/5c44d52c93f3c34c5a37703d/html5/thumbnails/47.jpg)
PARAFAC2 Results
Modest improvement over LSA
5 languages, 240 latent dimensions
Method Average P1 Average MP5
SVD/LSA (α=1) 76.0% 26.1%
SVD/LSA 87.6% 65.5%
Tucker1 89.5% 71.3%
PARAFAC2 89.8% 78.5%
![Page 48: Advances in Data Analysis using PARAFAC2 and ... - cid.csic.es · NN-ASALSAN 0.083 (47) 1.0 (74) Kiers [23] 0.022 (67) 22.3 (400+) 4 Experimental results We consider two applications:](https://reader035.fdocuments.net/reader035/viewer/2022070614/5c44d52c93f3c34c5a37703d/html5/thumbnails/48.jpg)
Bible, color-coded according to language, are represented in the vector space.18
Note that
the books cluster first to their counterparts in other languages, and then into larger
clusters containing related books; this kind of visualization is possible only when MP6 is
satisfactorily high. In summary, these techniques have effectively allowed us to factor
language out, focusing only on topic, just as we had hoped.
Figure 6. Visualization of multilingual Bible books in vector space
18 To create this visualization, we used Tamale 1.2. The graph layout is created using VxOrd. Both are
software created in-house at Sandia National Laboratories. For further details on VxOrd, see Boyack et al.
(2005).
Bible Clustering with LMSATA
• Books of the Bible color coded by language• MP5 about 90%• Books cluster first with
their counterparts in other languages, then in larger clusters by topic
• Visualization software: Tamale 1.2• Graph layout: VxOrd (Boyack et
al., 2005)
![Page 49: Advances in Data Analysis using PARAFAC2 and ... - cid.csic.es · NN-ASALSAN 0.083 (47) 1.0 (74) Kiers [23] 0.022 (67) 22.3 (400+) 4 Experimental results We consider two applications:](https://reader035.fdocuments.net/reader035/viewer/2022070614/5c44d52c93f3c34c5a37703d/html5/thumbnails/49.jpg)
Graph Layout (close-up)
Figure 7. Partial visualization of multilingual Bible books in vector space
7 Discussion
In this paper, we have offered what we believe to be clear evidence that computational
linguistics provides fertile ground for rethinking many of the standard techniques of
information retrieval. This is true because of the strong basis CL has in information
theory. All of our proposed modifications (weighting term-document pairs according to
mutual information; redefining the ‘term’ as the morpheme, or minimal unit of meaning;
not excluding high-entropy terms; and including term alignments of the sort used in
SMT) have obvious connections to IT. What is more, if included in an IR framework,
they clarify the relationship between IT and IR, theoretically strengthening the latter
without necessarily getting in the way of speed and efficiency.
We have examined four basic types of modification, but we believe that there are further
opportunities for improvement. An obvious example is that instead of limiting the
morphological component to stem-suffix tokenization, we could also take prefixes and
word-internal suffixes into account. Similarly, the performance of our CLIR model with
respect to Arabic and other Semitic languages could be improved if the morphological
component were able to identify non-concatenative roots and patterns.
Following this line of reasoning further, one could conceive of building a phonological
and/or morphophonological component into the framework without compromising its
basis in unsupervised learning. If morphological regularities can be learned in an
unsupervised manner, there is no reason in principle why morphophonological
regularities of the sort described in Chomsky and Halle (1968) cannot also be learned in
the same way. Where morphophonological or phonological alternations are reflected in
orthography, as in English divid+e/divis+ion or Russian !"#$%&+'/!"#$%(+#), one can
• John and Acts have tight clusters• Some mixing with Matthew, Mark,
Luke (synoptic gospels - share a similar perspective)
![Page 50: Advances in Data Analysis using PARAFAC2 and ... - cid.csic.es · NN-ASALSAN 0.083 (47) 1.0 (74) Kiers [23] 0.022 (67) 22.3 (400+) 4 Experimental results We consider two applications:](https://reader035.fdocuments.net/reader035/viewer/2022070614/5c44d52c93f3c34c5a37703d/html5/thumbnails/50.jpg)
Using DEDICOM for Completely Unsupervised Part-of-Speech Tagging
Joint work with Peter Chew
![Page 51: Advances in Data Analysis using PARAFAC2 and ... - cid.csic.es · NN-ASALSAN 0.083 (47) 1.0 (74) Kiers [23] 0.022 (67) 22.3 (400+) 4 Experimental results We consider two applications:](https://reader035.fdocuments.net/reader035/viewer/2022070614/5c44d52c93f3c34c5a37703d/html5/thumbnails/51.jpg)
Approaches to POS tagging (1)
•Supervised
–Rule-based (e.g. Harris 1962)• Dictionary + manually developed rules• Brittle – approach doesn’t port to new domains
–Stochastic (e.g. Stolz et al. 1965, Church 1988)• Examples: HMMs, CRFs• Relies on estimation of emission and transition probabilities from a tagged training corpus
• Again, difficulty in porting to new domains
![Page 52: Advances in Data Analysis using PARAFAC2 and ... - cid.csic.es · NN-ASALSAN 0.083 (47) 1.0 (74) Kiers [23] 0.022 (67) 22.3 (400+) 4 Experimental results We consider two applications:](https://reader035.fdocuments.net/reader035/viewer/2022070614/5c44d52c93f3c34c5a37703d/html5/thumbnails/52.jpg)
Approaches to POS tagging (2)
•Unsupervised–All approaches exploit distributional patterns–Singular Value Decomposition (SVD) of term-
adjacency matrix (Schütze 1993, 1995)–Graph clustering (Biemann 2006)
–Our approach: DEDICOM of term-adjacency matrix
–Most similar to Schütze (1993, 1995)Advantages:
• can be reconciled to stochastic approaches• completely unsupervised, like SVD and graph clustering• initial results appear promising
![Page 53: Advances in Data Analysis using PARAFAC2 and ... - cid.csic.es · NN-ASALSAN 0.083 (47) 1.0 (74) Kiers [23] 0.022 (67) 22.3 (400+) 4 Experimental results We consider two applications:](https://reader035.fdocuments.net/reader035/viewer/2022070614/5c44d52c93f3c34c5a37703d/html5/thumbnails/53.jpg)
DEDICOM – application to POS tagging
• The assumption that terms are a “single set of objects”, whether they precede or follow, sets DEDICOM apart from SVD and other unsupervised approaches
• This assumption models the fact that tokens play the same syntactic role whether we view them as the first or second element in a bigram
Term adjacency matrix ‘R’ matrix
‘A’ matrix
![Page 54: Advances in Data Analysis using PARAFAC2 and ... - cid.csic.es · NN-ASALSAN 0.083 (47) 1.0 (74) Kiers [23] 0.022 (67) 22.3 (400+) 4 Experimental results We consider two applications:](https://reader035.fdocuments.net/reader035/viewer/2022070614/5c44d52c93f3c34c5a37703d/html5/thumbnails/54.jpg)
Comparing DEDICOM output to HMM input
• The output of DEDICOM is essentially a transition and emission probability matrix
• DEDICOM offers the possibility of getting the familiar transition and emission probabilities without tagged training data
‘R’ matrix
‘A’ matrix
Output of DEDICOM
Transition prob. matrix
Emission prob. matrix
Input to HMM(after normalization of counts)
![Page 55: Advances in Data Analysis using PARAFAC2 and ... - cid.csic.es · NN-ASALSAN 0.083 (47) 1.0 (74) Kiers [23] 0.022 (67) 22.3 (400+) 4 Experimental results We consider two applications:](https://reader035.fdocuments.net/reader035/viewer/2022070614/5c44d52c93f3c34c5a37703d/html5/thumbnails/55.jpg)
Validation: method 1 (theoretical)
• Hypothetical example - suppose tagged training corpus exists
The man walked the big dog
DT NN VBD DT JJ NN
Corpus:
X:
A*: term-tag counts R*: tag-adjacency counts
• By definition (subject to difference of 1 for final token):– row sums of X = col sums of X = row sums of A*– col sums of A* = row sums of R* = col sums of R*
sparse matrix of bigram counts
![Page 56: Advances in Data Analysis using PARAFAC2 and ... - cid.csic.es · NN-ASALSAN 0.083 (47) 1.0 (74) Kiers [23] 0.022 (67) 22.3 (400+) 4 Experimental results We consider two applications:](https://reader035.fdocuments.net/reader035/viewer/2022070614/5c44d52c93f3c34c5a37703d/html5/thumbnails/56.jpg)
Validation: method 1 (theoretical)
• To turn A* and R* into transition and emission probability matrices, we simply multiply each by a diagonal matrix D where the entries are the inverses of the row-sum vector
• If the DEDICOM model is a good one, we should be able to multiply A*DR*D(A*)T to approximate the original matrix X
• In our example, A*DR*D(A*)T =
• This approximates X, and it also captures some syntactic regularities which aren’t instantiated in the corpus (this is one reason HMM-based POS tagging is successful)
![Page 57: Advances in Data Analysis using PARAFAC2 and ... - cid.csic.es · NN-ASALSAN 0.083 (47) 1.0 (74) Kiers [23] 0.022 (67) 22.3 (400+) 4 Experimental results We consider two applications:](https://reader035.fdocuments.net/reader035/viewer/2022070614/5c44d52c93f3c34c5a37703d/html5/thumbnails/57.jpg)
Validation: method 2 (empirical)
• Use a tagged corpus (CONLL 2000)– CONLL 2000 has 19,440 distinct terms– There are 44 distinct tags in the tagset (our “gold standard”)
• Tabulate X matrix (solely from bigram frequencies, blind to tags)
• Fit DEDICOM model to X (using k = 44) to ‘learn’ emission and transition probability matrices
• Use these as input to a HMM; tag each token with a numerical index (one of the DEDICOM ‘dimensions’)
• Evaluate by looking at correlation of induced tags with the 44 “gold standard” tags in a confusion matrix
![Page 58: Advances in Data Analysis using PARAFAC2 and ... - cid.csic.es · NN-ASALSAN 0.083 (47) 1.0 (74) Kiers [23] 0.022 (67) 22.3 (400+) 4 Experimental results We consider two applications:](https://reader035.fdocuments.net/reader035/viewer/2022070614/5c44d52c93f3c34c5a37703d/html5/thumbnails/58.jpg)
Validation: method 2 (empirical)
Confusion matrix: correlation with ‘ideal’ diagonal matrix = 0.494
Assuming the ‘gold standard’ is the optimal tagging scheme, the ideal confusion matrix would have one DEDICOM class per ‘gold standard’ tag
(either a diagonal matrix or some permutation thereof).
DEDICOM tags
Gold standard
tag
![Page 59: Advances in Data Analysis using PARAFAC2 and ... - cid.csic.es · NN-ASALSAN 0.083 (47) 1.0 (74) Kiers [23] 0.022 (67) 22.3 (400+) 4 Experimental results We consider two applications:](https://reader035.fdocuments.net/reader035/viewer/2022070614/5c44d52c93f3c34c5a37703d/html5/thumbnails/59.jpg)
Validation: method 2 (empirical)
• Examples of DEDICOM dimensions or clusters:
Soft clustering: U.S. appears under tag 2 (nouns) and tag 8 (adjectives)
“NN”
“NN”
“DT”
Tag 3: Grouping of ‘new’ with ‘the’ and ‘a’ explained by distributional similarities (all precede nouns). This is also in accordance with a traditional English grammar, which held
that determiners are a type of adjective.
![Page 60: Advances in Data Analysis using PARAFAC2 and ... - cid.csic.es · NN-ASALSAN 0.083 (47) 1.0 (74) Kiers [23] 0.022 (67) 22.3 (400+) 4 Experimental results We consider two applications:](https://reader035.fdocuments.net/reader035/viewer/2022070614/5c44d52c93f3c34c5a37703d/html5/thumbnails/60.jpg)
Future Work in POS Tagging
• Constrained DEDICOM- Tags for common terms (e.g., determiners, prepositions)- Semi-supervised learning
• Analyze multiple corpora via 3-way DEDICOM
• Constraints on R matrix in DEDICOM
![Page 61: Advances in Data Analysis using PARAFAC2 and ... - cid.csic.es · NN-ASALSAN 0.083 (47) 1.0 (74) Kiers [23] 0.022 (67) 22.3 (400+) 4 Experimental results We consider two applications:](https://reader035.fdocuments.net/reader035/viewer/2022070614/5c44d52c93f3c34c5a37703d/html5/thumbnails/61.jpg)
Discussion and Questions
• Challenges with global optimization- Ways reduce search space and find global minimizer - Deterministic algorithm that returns a good approximation
to global solution
• Fast algorithms for large-scale problems
• Other data analysis models
![Page 62: Advances in Data Analysis using PARAFAC2 and ... - cid.csic.es · NN-ASALSAN 0.083 (47) 1.0 (74) Kiers [23] 0.022 (67) 22.3 (400+) 4 Experimental results We consider two applications:](https://reader035.fdocuments.net/reader035/viewer/2022070614/5c44d52c93f3c34c5a37703d/html5/thumbnails/62.jpg)
Selected Publications
• Bader, Harshman, and Kolda. 2007. Temporal analysis of semantic graphs using ASALSAN. Proceedings of IEEE International Conference on Data Mining (ICDM), 2007.• Chew, Bader, Kolda and Abdelali. 2007. Cross-language
information retrieval using PARAFAC2. Proceedings of KDD 2007.• Chew, Kegelmeyer, Bader and Abdelali. 2008. The Knowledge of
Good and Evil: Multilingual Ideology Classification with PARAFAC2 and Machine Learning. Language Forum (34), 37-52.• Chew, Bader, and Rozovskaya. 2009. Using DEDICOM for
Completely Unsupervised Part-of-Speech Tagging. Proceedings of NAACL-HLT, Workshop on Unsupervised and Minimally Supervised Learning of Lexical Semantics.• B. W. Bader. 2009. Constrained and Unconstrained Optimization.
Brown, Tauler, Walczak (eds.) Comprehensive Chemometrics, volume 1, pp. 507-545 Oxford: Elsevier.
Brett Bader ([email protected])