Extensions of Non-Negative Matrix Factorization (NMF) to Higher Order Data

HONMF (Higher Order Non-negative Matrix Factorization)

NTF2D/SNTF2D ((Sparse) Non-negative Tensor Factor 2D Deconvolution)

Data results: The algorithms were used on a dataset containing the inter-trial phase coherence (ITPC) of wavelet-transformed EEG data. Briefly stated, the data consist of 14 subjects recorded during a proprioceptive stimulus consisting of a weight change of the left hand during odd trials and of the right hand during even trials, giving a total of 14·2=28 trials. Consequently, the data has the form X^{Channel × Time-Frequency × Trials} (Mørup et al. 2006a).

Informatics and Mathematical Modeling

Increasing attention has lately been given to Non-negative Matrix Factorization due to its parts-based representation and ease of algorithmic implementation (Lee & Seung, 1999 & 2001).

However, NMF is not in general unique; it is unique only when the data adequately spans the positive orthant (Donoho and Stodden, 2004). Consequently, constraints in the form of sparsity are useful to achieve unique decompositions (Hoyer 2002, 2004; Eggert & Körner 2004). As a result, algorithms for sparse coding using multiplicative updates have been derived (Eggert & Körner 2004, Mørup & Schmidt 2006b).

Sparse Coding NMF: Sparse Coding NMF regularizes H while keeping W normalized, so that regularization is not simply achieved by letting H go to zero while W goes to infinity (Eggert and Körner, 2004; Mørup & Schmidt 2006b). C_sparse(H) can be any function with positive derivative; a frequently used function is the 1-norm.
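A minimal sketch of sparse coding NMF under stated assumptions: the cost is taken to be least squares plus `lam` times the 1-norm of H, and W is kept normalized by rescaling its columns to unit Euclidean length after every update. That renormalization is a simplified stand-in, not the normalization-aware W update of Eggert & Körner (2004). The function name `sparse_nmf` and the parameters `lam`, `n_iter`, `eps` and `seed` are hypothetical.

```python
import numpy as np

def sparse_nmf(X, d, lam=0.1, n_iter=200, eps=1e-9, seed=0):
    """Sketch: LS cost + lam * sum(H), with unit-norm columns in W (assumed setup)."""
    rng = np.random.default_rng(seed)
    I, J = X.shape
    W = rng.random((I, d)) + eps
    H = rng.random((d, J)) + eps
    W /= np.linalg.norm(W, axis=0, keepdims=True)
    for _ in range(n_iter):
        # The derivative of the 1-norm penalty is the constant lam; it is
        # positive, so it joins the positive part (the denominator).
        H *= (W.T @ X) / (W.T @ W @ H + lam + eps)
        # Plain LS update for W, followed by renormalization so that the
        # penalty cannot be escaped by shrinking H and growing W.
        W *= (X @ H.T) / (W @ (H @ H.T) + eps)
        W /= np.linalg.norm(W, axis=0, keepdims=True) + eps
    return W, H
```

Increasing `lam` trades reconstruction accuracy for sparser H; with `lam=0` the sketch reduces to LS NMF with normalized W.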

Title of Nature article on NMF from 1999

NMF is based on gradient descent: Each component is updated by a step in the negative gradient direction.

NMF uses the concept of multiplicative updates: For each element W_{i,d}, the derivative of the cost function can be split into a positive part and a negative part. Choosing the step size as the ratio of W_{i,d} to the positive part of the derivative yields multiplicative updates, since the gradient step then cancels the W_{i,d} term in the gradient-based update.
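The argument can be written out as follows. The symbols are assumptions introduced for illustration, since the poster's own notation did not survive extraction: C is the cost function, \eta_{i,d} the step size, and [\cdot]^{+}, [\cdot]^{-} the positive and negative parts of the derivative.

```latex
\begin{align}
W_{i,d} &\leftarrow W_{i,d} - \eta_{i,d}\,\frac{\partial C}{\partial W_{i,d}},
\qquad
\frac{\partial C}{\partial W_{i,d}}
  = \left[\frac{\partial C}{\partial W_{i,d}}\right]^{+}
  - \left[\frac{\partial C}{\partial W_{i,d}}\right]^{-}, \\
\eta_{i,d} &= \frac{W_{i,d}}{\left[\partial C / \partial W_{i,d}\right]^{+}}
\;\Longrightarrow\;
W_{i,d} \leftarrow W_{i,d} - W_{i,d}
  + W_{i,d}\,\frac{\left[\partial C / \partial W_{i,d}\right]^{-}}{\left[\partial C / \partial W_{i,d}\right]^{+}}
  = W_{i,d}\,\frac{\left[\partial C / \partial W_{i,d}\right]^{-}}{\left[\partial C / \partial W_{i,d}\right]^{+}}.
\end{align}
```

Because both parts are non-negative, an element that starts non-negative stays non-negative under this update.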

The resulting NMF updates: The least squares (LS) and Kullback-Leibler (KL) divergence updates derived from the multiplicative update approach (Lee & Seung, 2001).

NMF not in general unique: If the data does not adequately span the positive orthant, no unique solution can be obtained. Here the red and green vectors both perfectly span the data points. However, the green vectors represent the sparsest solution.
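The LS and KL multiplicative updates of Lee & Seung mentioned above can be sketched as follows, assuming the factorization X ≈ WH with X of size I × J. The function name `nmf` and its parameters are hypothetical; `eps` is a small constant added only to avoid division by zero.

```python
import numpy as np

def nmf(X, d, cost="ls", n_iter=200, eps=1e-9, seed=0):
    """Sketch of Lee & Seung multiplicative updates for X ~ W @ H."""
    rng = np.random.default_rng(seed)
    I, J = X.shape
    W = rng.random((I, d)) + eps
    H = rng.random((d, J)) + eps
    for _ in range(n_iter):
        if cost == "ls":
            # Ratio of negative to positive part of the LS gradient.
            H *= (W.T @ X) / (W.T @ W @ H + eps)
            W *= (X @ H.T) / (W @ H @ H.T + eps)
        elif cost == "kl":
            # KL divergence: ratios X / (WH) weighted by the other factor.
            H *= (W.T @ (X / (W @ H + eps))) / (W.sum(axis=0)[:, None] + eps)
            W *= ((X / (W @ H + eps)) @ H.T) / (H.sum(axis=1)[None, :] + eps)
        else:
            raise ValueError("cost must be 'ls' or 'kl'")
    return W, H
```

Different seeds can return different factor pairs with a similar fit, which is the non-uniqueness discussed above.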

NTF (Non-negative Tensor Factorization)

Model: NTF is based on the PARAFAC model (Harshman 1970, Carroll & Chang 1970, FitzGerald et al., 2005).
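A minimal sketch of NTF for a three-way array, assuming a least squares cost and the same multiplicative-update construction as for NMF. The poster's own update table is not reproduced here. The names `ntf`, `parafac_reconstruct`, `A`, `B` and `C` are hypothetical.

```python
import numpy as np

def parafac_reconstruct(A, B, C):
    """Sum over components d of the outer products A[:, d] o B[:, d] o C[:, d]."""
    return np.einsum('id,jd,kd->ijk', A, B, C)

def ntf(X, d, n_iter=200, eps=1e-9, seed=0):
    """Sketch: non-negative PARAFAC of a 3-way array via LS multiplicative updates."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A = rng.random((I, d)) + eps
    B = rng.random((J, d)) + eps
    C = rng.random((K, d)) + eps
    for _ in range(n_iter):
        # Numerator: data contracted with the other two factors.
        # Denominator: current factor times the element-wise product of Gram matrices.
        A *= np.einsum('ijk,jd,kd->id', X, B, C) / (A @ ((B.T @ B) * (C.T @ C)) + eps)
        B *= np.einsum('ijk,id,kd->jd', X, A, C) / (B @ ((A.T @ A) * (C.T @ C)) + eps)
        C *= np.einsum('ijk,id,jd->kd', X, A, B) / (C @ ((A.T @ A) * (B.T @ B)) + eps)
    return A, B, C
```

For the EEG data described on the poster, the three factor matrices would correspond to channel, time-frequency and trial loadings.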

Model: The HONMF is based on the Tucker model (Tucker, 1966), where non-negativity is imposed on all modalities (Mørup et al. 2006e).

Model: The NTF2D is a PARAFAC model convolutive in 2 dimensions (Mørup & Schmidt 2006c):
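The model equation itself did not survive extraction. As a hedged sketch, the forward model below assumes the form X[c, f, t] ≈ Σ_d Σ_τ Σ_φ D[c, d] · W^τ[f − φ, d] · H^φ[d, t − τ], i.e. a channel factor D combined with a frequency factor W and a time factor H that are shifted against each other. The array layout and the names `ntf2d_reconstruct`, `D`, `W` and `H` are assumptions for illustration.

```python
import numpy as np

def ntf2d_reconstruct(D, W, H):
    """Sketch of an NTF2D forward model (assumed form, see lead-in).

    D: (C, d)        channel loadings
    W: (n_tau, F, d) frequency profiles, one slab per time shift tau
    H: (n_phi, d, N) time profiles, one slab per frequency shift phi
    """
    C, d = D.shape
    n_tau, F, _ = W.shape
    n_phi, _, N = H.shape
    X = np.zeros((C, F, N))
    for tau in range(n_tau):
        for phi in range(n_phi):
            # Shift W down by phi frequency bins, zero-filling the top.
            W_shift = np.zeros((F, d))
            W_shift[phi:, :] = W[tau, :F - phi, :]
            # Shift H right by tau time frames, zero-filling the start.
            H_shift = np.zeros((d, N))
            H_shift[:, tau:] = H[phi, :, :N - tau]
            X += np.einsum('cd,fd,dt->cft', D, W_shift, H_shift)
    return X
```

On a log-frequency axis, a shift by φ bins corresponds to a change of pitch, so one frequency profile per instrument can in principle cover all of its notes.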

Algorithms


Data results: The algorithms were used to analyze the absolute value of the log spectrogram of stereo recordings of music, i.e. the data had the form X^{Channel × Log-Frequency × Time}.

Data results: The algorithms were tested on a dataset of flow injection analysis (Nørgaard, 1994; Smilde, 1999), i.e. X^{Spectrum × Time × Batch number}.

Result obtained by the SNTF2D algorithms (bottom panel) when decomposing the log-spectrogram of synthetically generated stereo music (middle panel) generated from the true components given in the top panel.

Decomposition result of a real stereo recording of music consisting of a flute and a harp playing “The Fog is Lifting” by Carl Nielsen. Scores are given at the top. The SNTF2D clearly separates the log-spectrogram into two components pertaining to the harp and the flute, respectively. By spectral masking of the log-spectrograms, the two components are reconstructed, revealing that one component indeed pertains to the harp whereas the other pertains to the flute.
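The poster does not say how the spectral masks are formed. A minimal sketch, assuming a ratio (soft) mask in which each component receives the fraction of every time-frequency bin that its model estimate contributes to the sum of all estimates; the name `soft_mask_separate` and its arguments are hypothetical.

```python
import numpy as np

def soft_mask_separate(V, components, eps=1e-12):
    """Sketch: split a non-negative spectrogram V using ratio masks.

    V:          (F, T) mixture spectrogram
    components: (n_components, F, T) non-negative per-component model estimates
    Returns an array of shape (n_components, F, T) whose slices sum to V
    wherever the estimates are non-zero.
    """
    total = components.sum(axis=0, keepdims=True)
    masks = components / (total + eps)   # each mask lies in [0, 1]
    return masks * V[None, :, :]
```

The masked spectrograms can then be inverted to audio with the phase of the mixture; that step is outside this sketch.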

Morten Mørup, Department of Signal Processing, Informatics and Mathematical Modeling, Technical University of Denmark. Webpage: www.imm.dtu.dk/~mm

Parts of the above work done in collaboration with (see also references):

Lars Kai Hansen, Professor, Department of Signal Processing, Informatics and Mathematical Modeling, Technical University of Denmark

Mikkel N. Schmidt, Stud. PhD, Department of Signal Processing, Informatics and Mathematical Modeling, Technical University of Denmark

Sidse M. Arnfred, Dr. Med. PhD, Cognitive Research Unit, Hvidovre Hospital, University Hospital of Copenhagen

Mathematical notation:

References:
Carroll, J. D. and Chang, J. J. Analysis of individual differences in multidimensional scaling via an N-way generalization of "Eckart-Young" decomposition. Psychometrika 35, 1970, 283-319
Eggert, J. and Körner, E. Sparse coding and NMF. In Neural Networks, volume 4, pages 2529-2533, 2004
Eggert, J. et al. Transformation-invariant representation and NMF. In Neural Networks, volume 4, pages 2535-2539, 2004
FitzGerald, D. et al. Non-negative tensor factorization for sound source separation. In Proceedings of the Irish Signals and Systems Conference, 2005
FitzGerald, D. and Coyle, E. C. Sound source separation using shifted non-negative tensor factorization. In ICASSP2006, 2006
FitzGerald, D. et al. Shifted non-negative matrix factorization for sound source separation. In Proceedings of the IEEE Conference on Statistics in Signal Processing, 2005
Harshman, R. A. Foundations of the PARAFAC procedure: Models and conditions for an "explanatory" multi-modal factor analysis. UCLA Working Papers in Phonetics 16, 1970, 1-84
Lathauwer, L. De and Moor, B. De and Vandewalle, J. Multilinear singular value decomposition. SIAM J. Matrix Anal. Appl. 21, 2000, 1253-1278
Lee, D. D. and Seung, H. S. Algorithms for non-negative matrix factorization. In NIPS, pages 556-562, 2000
Lee, D. D. and Seung, H. S. Learning the parts of objects by non-negative matrix factorization. Nature, 1999
Mørup, M. and Hansen, L. K. and Arnfred, S. M. Decomposing the time-frequency representation of EEG using nonnegative matrix and multi-way factorization. Technical report, Institute for Mathematical Modeling, Technical University of Denmark, 2006a
Mørup, M. and Schmidt, M. N. Sparse non-negative matrix factor 2-D deconvolution. Technical report, Institute for Mathematical Modeling, Technical University of Denmark, 2006b
Mørup, M. and Schmidt, M. N. Non-negative Tensor Factor 2D Deconvolution for multi-channel time-frequency analysis. Technical report, Institute for Mathematical Modeling, Technical University of Denmark, 2006c
Schmidt, M. N. and Mørup, M. Non-negative matrix factor 2D deconvolution for blind single channel source separation. In ICA2006, pages 700-707, 2006d
Mørup, M. and Hansen, L. K. and Arnfred, S. M. Algorithms for Sparse Higher Order Non-negative Matrix Factorization (HONMF). Technical report, Institute for Mathematical Modeling, Technical University of Denmark, 2006e
Nørgaard, L. and Ridder, C. Rank annihilation factor analysis applied to flow injection analysis with photodiode-array detection. Chemometrics and Intelligent Laboratory Systems 23, 1994, 107-114
Schmidt, M. N. and Mørup, M. Sparse Non-negative Matrix Factor 2-D Deconvolution for Automatic Transcription of Polyphonic Music. Technical report, Institute for Mathematical Modelling, Technical University of Denmark, 2005
Smaragdis, P. Non-negative Matrix Factor deconvolution; Extraction of multiple sound sources from monophonic inputs. International Symposium on Independent Component Analysis and Blind Source Separation (ICA)
Smilde, A. K. and Tauler, R. and Saurina, J. and Bro, R. Calibration methods for complex second-order data. Analytica Chimica Acta, 1999, 237-251
Kolda, T. G. Multilinear operators for higher-order decompositions. Technical report SAND2006-2081, Sandia National Laboratories, 2006
Tucker, L. R. Some mathematical notes on three-mode factor analysis. Psychometrika 31, 1966, 279-311
Welling, M. and Weber, M. Positive tensor factorization. Pattern Recogn. Lett., 2001

The algorithms were also tested on the inter-trial phase coherence (ITPC) of EEG data (see the section on NTF for dataset details).

The NTF decomposition reveals a right parietal activity mainly present during odd trials, corresponding to left hand stimuli, as well as a more frontal activity and a higher-frequency central parietal activity.

While the HONMF is not unique when no sparseness is imposed, it becomes unique when sparseness is imposed on the core. Here, this reveals that the appropriate model for the data is a PARAFAC model (Mørup et al., 2006e). Furthermore, the HONMF decomposition gives a more parts-based representation that is easier to interpret than the solution found by HOSVD (Lathauwer et al., 2000).

The HONMF with sparseness imposed on the core and the third modality resulted in a very consistent decomposition of the flow injection data, capturing without supervision the true concentrations present in each batch (given by modality 3).

The PARAFAC model is a generalization of factor analysis to higher orders, where the data is explained by a sum of outer products of factor effects pertaining to each modality. To the right is given the general expression of the PARAFAC model for N-order tensors.
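The expression referred to was an image on the poster and is not part of this transcript. In standard notation, with assumed symbols (\mathcal{X} for the data tensor, A^{(n)} for the factor matrix of modality n, and D for the number of components), the PARAFAC model for an N-order tensor reads:

```latex
\mathcal{X}_{i_1, i_2, \dots, i_N} \approx \sum_{d=1}^{D} A^{(1)}_{i_1, d}\, A^{(2)}_{i_2, d} \cdots A^{(N)}_{i_N, d}
```

For N = 2 this reduces to the matrix factorization X ≈ A^{(1)} (A^{(2)})^T, i.e. the NMF model when all factors are non-negative.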

Synthetic data

True stereo music

Three equivalent ways of stating the Tucker model. The Tucker model accounts for all possible linear interactions between the factor effects pertaining to each modality.
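The three formulations on the poster were images and are not in this transcript. As a sketch, three standard and equivalent statements of a three-way Tucker model are checked against each other below: an element-wise sum, successive mode products, and a matricized form. The variable names and sizes are hypothetical, and the matricization follows NumPy's row-major reshape, which is why the Kronecker product appears as `np.kron(B, C)`.

```python
import numpy as np

rng = np.random.default_rng(0)
I, J, K = 4, 5, 6          # data dimensions (illustrative)
P, Q, R = 2, 3, 2          # core dimensions (illustrative)
G = rng.random((P, Q, R))  # non-negative core
A = rng.random((I, P))
B = rng.random((J, Q))
C = rng.random((K, R))

# 1) Element-wise: X[i,j,k] = sum_pqr G[p,q,r] A[i,p] B[j,q] C[k,r]
X_sum = np.einsum('pqr,ip,jq,kr->ijk', G, A, B, C)

# 2) Mode products: multiply the core by one factor per modality in turn.
T = np.einsum('pqr,ip->iqr', G, A)
T = np.einsum('iqr,jq->ijr', T, B)
X_modes = np.einsum('ijr,kr->ijk', T, C)

# 3) Matricized: unfold along the first modality and use a Kronecker product.
X_mat = (A @ G.reshape(P, Q * R) @ np.kron(B, C).T).reshape(I, J, K)

agree = np.allclose(X_sum, X_modes) and np.allclose(X_sum, X_mat)
```

A core that is zero except on its superdiagonal (with P = Q = R) turns the Tucker model into a PARAFAC model, which is the structure a sparse core can reveal.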

Table giving how to update when imposing sparseness/normalizing the various modalities of the model

Figure: log-spectrograms of stereo channel 1 and stereo channel 2, together with the estimated harp and estimated flute components. Frequency axis from 50 Hz to 22 kHz; time axis marked at 25.9 ms. Numeric annotations in the figure: 6850, 0.7286, 0.4209, 0.9071.

Updates for the NTF2D. By including the updates marked in gray, sparseness is imposed on H, forming the SNTF2D.