Music genre classification with multilinear and sparse techniques
Constantine Kotropoulos∗†, Yannis Panagakis∗, and Gonzalo R. Arce†
∗ Department of Informatics, Aristotle University of Thessaloniki, Thessaloniki 54124, GREECE
† Department of Electrical & Computer Engineering, University of Delaware, Newark, DE 19716, USA
Greek Signal Processing Jam, Athens, October 17th, 2009
Music genre classification with multilinear and sparse techniques 1/79
Outline
1 Introduction
2 Auditory Spectro-temporal Modulations
3 Ensemble Discriminant Sparse Projections
    Overcomplete Dictionaries for Sparse Representations
    Dual Linear Discriminant Analysis of Sparse Representations
    Experimental Assessment
4 Sparse Representation-based Classification (SRC)
    Experimental Assessment
5 Locality Preserving Non-negative Tensor Factorization within SRC
    Locality Preserving Non-negative Tensor Factorization
    Experimental Assessment
6 Outlook
    Comparison with the State of the Art
    Conclusions-Future Work
Introduction
Music Genre
  The most popular description of music content despite the lack of a commonly agreed definition.
  Depends on cultural, artistic, or market factors, etc.
Problem Definition
  To classify music recordings into distinguishable genres using information extracted from the audio signal.
Music Genre Classification Algorithms
  Model the music signals by the long-term statistics of short-time features, such as timbral texture, rhythmic, pitch content-related, or their combinations.
Introduction
Motivation
  The appealing properties of slow temporal and spectro-temporal modulations from the human perceptual point of view [a].
  The strong theoretical foundations of sparse representations [b, c].
[a] K. Wang and S. A. Shamma, “Spectral shape analysis in the central auditory system,” IEEE Trans. Speech and Audio Processing, vol. 3, no. 5, pp. 382-396, 1995.
[b] E. J. Candes, J. Romberg, and T. Tao, “Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information,” IEEE Trans. Information Theory, vol. 52, no. 2, pp. 489-509, February 2006.
[c] D. L. Donoho, “Compressed sensing,” IEEE Trans. Information Theory, vol. 52, no. 4, pp. 1289-1306, April 2006.
Introduction
First approach: Ensemble Discriminant Sparse Projections (1)
  Each music recording is represented by its slow temporal modulations, the so-called auditory temporal modulation representation.
  Given a training set of auditory temporal modulations, the dictionary that best represents each member of the training set under sparsity constraints is extracted by means of the K-SVD algorithm [a].
[a] M. Aharon, M. Elad, and A. Bruckstein, “K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation,” IEEE Trans. Signal Processing, vol. 54, no. 11, pp. 4311-4322, Nov. 2006.
Introduction
First approach: Ensemble Discriminant Sparse Projections (2)
  Discriminant Sparse Projections: The most discriminating features (MDF) [a] are extracted by applying dual linear discriminant analysis (LDA) [b] to the two principal subspaces of the within-class and between-class covariance matrices of the sparse coefficient vectors.
  Classifier Ensemble: Majority voting is applied to the decisions taken by multiple individual dual LDA classifiers.
[a] D. L. Swets and J. Weng, “Using discriminant eigenfeatures for image retrieval,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 18, no. 8, pp. 831-836, August 1996.
[b] X. Wang and X. Tang, “Dual space linear discriminant analysis for face recognition,” in Proc. IEEE Computer Society Conf. CVPR, 2004, vol. 2, pp. 564-569.
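The ensemble step can be sketched as follows. This is only an illustration of majority voting: the function name `majority_vote`, the tie-breaking rule, and the toy decisions are assumptions, and the individual dual LDA classifiers are not implemented here.

```python
import numpy as np

def majority_vote(decisions):
    """Combine per-classifier labels by majority voting.

    decisions: integer array of shape (n_classifiers, n_samples).
    Returns the most frequent label per sample. Ties go to the
    smallest label, which is an arbitrary choice made for this sketch.
    """
    decisions = np.asarray(decisions)
    n_labels = int(decisions.max()) + 1
    # One column of vote counts per sample.
    counts = np.stack(
        [np.bincount(col, minlength=n_labels) for col in decisions.T], axis=1
    )
    return counts.argmax(axis=0)

# Hypothetical decisions of three classifiers on four recordings.
votes = np.array([[0, 1, 2, 2],
                  [0, 1, 1, 2],
                  [1, 1, 2, 0]])
labels = majority_vote(votes)
```

Each column of `votes` holds the labels proposed for one recording, so the result has one label per recording.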
Introduction
Second approach: Sparse Representation-based Classifier
  Each music recording is again represented by its slow auditory temporal modulations.
  The vectorized training auditory temporal modulations form a dictionary of basis signals for music genres.
  Any test representation is expressed as a compact linear combination of the dictionary atoms of the genre to which it belongs.
  Classification is performed by the sparse representation-based classifier (SRC) [a].
[a] J. Wright, A. Yang, A. Ganesh, S. Sastry, and Y. Ma, “Robust face recognition via sparse representation,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 31, no. 2, pp. 210-227, Feb. 2009.
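A minimal sketch of the SRC decision rule is given below. It assumes the usual formulation: code the test vector sparsely over the whole training dictionary, then assign the class whose atoms alone give the smallest reconstruction residual. The l1 solver is replaced here by a plain iterative soft-thresholding loop (ISTA), which is a stand-in chosen for brevity and not necessarily the solver used in the talk. The names `ista`, `src_classify`, the value of `lam`, and the toy dictionary are all hypothetical.

```python
import numpy as np

def ista(A, y, lam=0.01, n_iter=500):
    """Approximately solve min_x 0.5*||y - A x||_2^2 + lam*||x||_1."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1 / Lipschitz constant
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = x + step * A.T @ (y - A @ x)      # gradient step
        x = np.sign(z) * np.maximum(np.abs(z) - lam * step, 0.0)  # shrinkage
    return x

def src_classify(A, labels, y, lam=0.01):
    """Return the class whose atoms best reconstruct y from its sparse code."""
    x = ista(A, y, lam)
    classes = np.unique(labels)
    residuals = [
        np.linalg.norm(y - A[:, labels == c] @ x[labels == c]) for c in classes
    ]
    return classes[int(np.argmin(residuals))]

# Toy dictionary: two atoms per class, columns normalized to unit norm.
A = np.array([[1.0, 1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0],
              [0.0, 0.0, 0.0, 1.0]])
A /= np.linalg.norm(A, axis=0)
labels = np.array([0, 0, 1, 1])
y = 0.7 * A[:, 2] + 0.3 * A[:, 3]   # test vector built from class-1 atoms
predicted = src_classify(A, labels, y)
```

In the toy example the test vector lies in the span of the class-1 atoms, so the class-1 residual is the smaller one.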
Introduction
Third approach: Locality preserving non-negative tensor factorization within a sparse representation-based classifier (1)
  Cortical representation: A given music recording is mapped to a three-dimensional (3D) representation of its slow spectral and temporal modulations [a].
  Each cortical representation is modeled as a sparse weighted sum of the basis elements (atoms) of an overcomplete dictionary, which stems from the cortical representations associated with training music recordings whose genre is known.
[a] I. Panagakis, E. Benetos, and C. Kotropoulos, “Music genre classification: A multilinear approach,” in Proc. 7th Int. Symp. Music Information Retrieval, Philadelphia, USA, 2008.
Introduction
Third approach: Locality preserving non-negative tensor factorization within a sparse representation-based classifier (2)
  By vectorizing a typical 3D cortical representation of 6 scales, 10 rates, and 128 frequency bands, one obtains a vector of 7680 dimensions.
  Multilinear dimensionality reduction techniques do not guarantee that two data points, which are close in the intrinsic geometry of the original space, are also close in the data space after multilinear dimensionality reduction.
  A novel algorithm is proposed, where the geometrical information of the original data space is incorporated into the objective function optimized by non-negative tensor factorization (NTF).
Auditory Spectro-temporal Modulations
Computational Auditory Model
  The computational auditory model is inspired by psychoacoustical and neurophysiological investigations in the early and central stages of the human auditory system.
[Figure: block diagram with the stages "Early auditory model" and "Central auditory model" and the outputs "Auditory Spectrogram", "Auditory Temporal Modulations", and "Auditory Spectro-Temporal Modulations (Cortical Representation)".]
Auditory Spectro-temporal Modulations
Early Auditory System
  Auditory Spectrogram: time-frequency distribution of energy along a tonotopic (logarithmic frequency) axis.
[Figure: panels labeled "Early auditory model" and "Auditory Spectrogram".]
Auditory Spectro-temporal Modulations
Central Auditory System - Temporal Modulations
[Figure: panels labeled "Auditory Spectrogram" and "Auditory Temporal Modulations"; modulation axis ω (Hz).]
Auditory Spectro-temporal Modulations
Temporal Modulation Parameters
  Temporal modulations: ω ∈ {2, 4, 8, 16, 32, 64, 128, 256} (Hz).
  96 frequency channels covering 4 octaves.
Auditory Spectro-temporal Modulations
Auditory Temporal Modulations across 10 Music Genres
[Figure: one panel per genre: Blues, Classical, Country, Disco, Hiphop, Jazz, Metal, Rock, Pop, Reggae.]
Auditory Spectro-temporal Modulations
Central Auditory System - Spectro-temporal Modulations
[Figure: panels labeled "Auditory Spectrogram" and "Auditory Spectro-Temporal Modulations"; axes ω (Hz) and Ω (c/o).]
Auditory Spectro-temporal Modulations
Cortical Representation
  A bank of 2D spectrotemporal filters is applied to the auditory spectrogram. The filters are selective to different spectrotemporal modulation parameters, ranging from slow to fast rates temporally (in Hz) and from narrow to broad scales spectrally (in Cycles/Octave).
  Each point in the auditory spectrogram has a 2D (hidden) rate-scale representation, which indicates the modulation strength for all rates and scales for that channel and time instant.
Auditory Spectro-temporal Modulations
[Figure: cortical representation with axes: temporal modulations ω (Hz), frequency channels f, spectral modulations Ω (c/o).]
Cortical Representation Parameters
  Spectral modulations: Ω ∈ {0.25, 0.5, 1, 2, 4, 8} (Cycles/Octave).
  Temporal modulations: positive and negative ω ∈ {2, 4, 8, 16, 32} (Hz).
  128 frequency channels covering 5 1/3 octaves.
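The parameter grid above fixes the size of one cortical representation. The short sketch below only does that bookkeeping; the variable names and the (scale, rate, frequency) axis order are assumptions, and no auditory filtering is performed.

```python
import numpy as np

# Parameter values as listed on the slide.
scales = np.array([0.25, 0.5, 1, 2, 4, 8])            # Omega, cycles/octave
rates_pos = np.array([2, 4, 8, 16, 32])               # omega, Hz
rates = np.concatenate([-rates_pos[::-1], rates_pos])  # negative and positive rates
n_channels = 128

# Placeholder tensor with the assumed (scale, rate, frequency) axis order.
cortical = np.zeros((scales.size, rates.size, n_channels))
n_features = cortical.reshape(-1).size       # length after vectorization

# 128 channels over 5 1/3 octaves gives the channel density per octave.
channels_per_octave = n_channels / (5 + 1 / 3)
```

The vectorized length equals 6 x 10 x 128 = 7680, the dimensionality quoted in the introduction, and the channel density matches the 96 channels over 4 octaves used for the temporal modulations.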
Overcomplete Dictionaries for Sparse Representations
Mathematical Modeling of Auditory Temporal Modulations
  The auditory temporal modulations of a set of music recordings (i.e. a dataset) are represented by a 3rd-order nonnegative real-valued tensor $\mathcal{Y} \in \mathbb{R}_+^{N_\omega \times N_f \times N_s}$, where $N_\omega = 8$, $N_f = 96$, and $N_s$ denotes the number of music recordings.
  Let $\mathbf{Y}_{(3)} \in \mathbb{R}_+^{N_s \times (N_f \cdot N_\omega)}$ be the 3rd mode matrix unfolding of $\mathcal{Y}$.
  Data matrix: $\mathbf{Y} = \mathbf{Y}_{(3)}^T = [\mathbf{y}_1 | \mathbf{y}_2 | \cdots | \mathbf{y}_{N_s}]$, where $T$ denotes matrix transposition.
  $\mathbf{y}_j \in \mathbb{R}_+^{768}$, $j = 1, 2, \ldots, N_s$, is downsampled to yield a vector of size $M \in \{12, 48, 85, 192\}$.
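The unfolding and the data matrix can be sketched in NumPy as follows. The toy value of `N_s`, the random tensor, and the column ordering of the unfolding are assumptions (several unfolding conventions exist). The slides do not say how the downsampling is done, so plain decimation by a factor of 4 is used here purely as a placeholder for the case M = 192.

```python
import numpy as np

N_omega, N_f, N_s = 8, 96, 5                 # N_s = 5 is a toy value
rng = np.random.default_rng(0)
Y_tensor = rng.random((N_omega, N_f, N_s))   # nonnegative entries

# Mode-3 unfolding: one row per recording, N_omega * N_f columns.
Y3 = np.moveaxis(Y_tensor, 2, 0).reshape(N_s, -1)

# Data matrix: one 768-dimensional column per recording.
Y = Y3.T

# Placeholder downsampling (an assumption): keep every 4th entry -> M = 192.
M = 192
Y_down = Y[:: Y.shape[0] // M, :]
```

Column `j` of `Y` is the vectorized 8 x 96 modulation matrix of recording `j`.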
Overcomplete Dictionaries for Sparse Representations
Sparse Approximation
  A downsampled representation of auditory temporal modulations $\mathbf{y}_j \in \mathbb{R}_+^{M}$, $j = 1, 2, \ldots, N_s$, admits a sparse approximation over a dictionary $\mathbf{D} \in \mathbb{R}^{M \times K}$, when $\mathbf{y}_j = \mathbf{D}\mathbf{x}_j$ or $\|\mathbf{y}_j - \mathbf{D}\mathbf{x}_j\|_p \leq \gamma$, where $\|\cdot\|_p$ denotes the $\ell_p$ vector norm for $p = 1, 2$, and $\infty$.
  To learn $\mathbf{D}$ with a fixed number of atoms $K$, the K-SVD is used.
  K-SVD iteratively alternates between sparse coding of the training samples based on the current dictionary and dictionary updating to better fit the training set.
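The approximation criterion can be checked numerically as in the sketch below. The helper name `approximation_errors`, the identity dictionary, and the tolerance `gamma = 1.2` are illustrative assumptions and do not come from the slides.

```python
import numpy as np

def approximation_errors(y, D, x):
    """Return ||y - D x||_p for p = 1, 2 and infinity."""
    r = y - D @ x
    return {p: float(np.linalg.norm(r, ord=p)) for p in (1, 2, np.inf)}

D = np.eye(3)                      # toy dictionary
y = np.array([1.0, 2.0, 3.0])
x = np.array([0.0, 1.5, 3.0])      # two nonzero coefficients

errors = approximation_errors(y, D, x)
gamma = 1.2
admits = {p: err <= gamma for p, err in errors.items()}
n_nonzero = int(np.count_nonzero(x))
```

The residual here is (1, 0.5, 0), so whether the tolerance is met depends on the chosen norm: the criterion holds for p = 2 and p = ∞ but not for p = 1.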
Overcomplete Dictionaries for Sparse Representations
K-SVD (1)
  The following problem is solved: $\min_{\mathbf{x}_j, \mathbf{D}} \sum_{j=1}^{N_{st}} \|\mathbf{y}_j - \mathbf{D}\mathbf{x}_j\|_2^2$ subject to $\|\mathbf{x}_j\|_0 \leq L$, where $N_{st} < N_s$ is the number of training samples and $\|\cdot\|_0$ counts the number of nonzero vector elements.
  For a given $\mathbf{D}$ with $K$ atoms, where $M \leq K \leq N_{st}$, the optimal sparse coefficient vector of the $j$th training sample is found by solving $\mathbf{x}_j^{*} \triangleq \arg\min_{\mathbf{x}} \|\mathbf{y}_j - \mathbf{D}\mathbf{x}\|_2^2$ subject to $\|\mathbf{x}\|_0 \leq L$ with any pursuit algorithm, e.g. OMP [a] or FOCUSS [b].
  Let $\mathbf{Y} = [\mathbf{y}_1 | \mathbf{y}_2 | \cdots | \mathbf{y}_{N_{st}}]$, where $\mathbf{y}_j$ is a compact notation for $\mathbf{y}_{:j}$.
[a] G. Davis, S. Mallat, and Z. Zhang, “Adaptive time-frequency decompositions,” Optical Engineering, vol. 33, no. 7, pp. 2183-2191, July 1997.
[b] I. F. Gorodnitsky and B. D. Rao, “Sparse signal reconstruction from limited data using FOCUSS: A re-weighted norm minimization algorithm,” IEEE Trans. Signal Processing, vol. 45, no. 3, pp. 600-616, March 1997.
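The sparse coding stage can be illustrated with a small orthogonal matching pursuit (OMP) routine. This is a textbook-style sketch that assumes unit-norm atoms; the function name `omp`, the stopping tolerance, and the toy dictionary are assumptions, and a greedy pursuit returns an approximate rather than a guaranteed optimal solution of the l0-constrained problem.

```python
import numpy as np

def omp(D, y, L):
    """Greedy OMP: approximate y with at most L atoms (columns) of D."""
    y = np.asarray(y, dtype=float)
    residual = y.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(L):
        # Atom most correlated with the current residual.
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k in support:          # no new atom can improve the fit
            break
        support.append(k)
        # Least-squares fit of y on all atoms selected so far.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
        if np.linalg.norm(residual) < 1e-12:
            break
    x = np.zeros(D.shape[1])
    x[support] = coef
    return x

# Toy overcomplete dictionary: 4 canonical atoms plus one mixed atom.
D = np.hstack([np.eye(4), np.array([[1.0], [1.0], [0.0], [0.0]]) / np.sqrt(2)])
y = 2.0 * D[:, 0] - 3.0 * D[:, 2]
x = omp(D, y, L=2)
```

In the toy example the pursuit first selects the atom with coefficient -3, then the atom with coefficient 2, and stops with a zero residual.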
Overcomplete Dictionaries for Sparse Representations
K-SVD (2)
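  Let also $\mathbf{x}_{k:}^T = [x_{k1}, x_{k2}, \ldots, x_{kN_{st}}]$ be the $k$th row of $\mathbf{X} \in \mathbb{R}^{K \times N_{st}}$.
  If $\|\cdot\|_F^2$ denotes the squared Frobenius norm of a matrix,
  $$\|\mathbf{Y} - \mathbf{D}\mathbf{X}\|_F^2 = \Big\| \underbrace{\Big( \mathbf{Y} - \sum_{\kappa=1,\, \kappa \neq k}^{K} \mathbf{d}_\kappa \mathbf{x}_{\kappa:}^T \Big)}_{\mathbf{E}_k} - \mathbf{d}_k \mathbf{x}_{k:}^T \Big\|_F^2 = \|\mathbf{E}_k - \mathbf{d}_k \mathbf{x}_{k:}^T\|_F^2.$$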
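The decomposition of the approximation error into the contribution of the k-th atom and everything else can be verified numerically. The matrix sizes, the random data, and the chosen atom index are arbitrary toy values; indices are zero-based in the code.

```python
import numpy as np

rng = np.random.default_rng(0)
M, K, N_st = 6, 8, 10
Y = rng.standard_normal((M, N_st))
D = rng.standard_normal((M, K))
X = rng.standard_normal((K, N_st))

k = 3
rank_one = np.outer(D[:, k], X[k, :])   # d_k x_{k:}^T

# E_k: residual when every atom except the k-th is used.
E_k = Y - (D @ X - rank_one)

lhs = np.linalg.norm(Y - D @ X, "fro") ** 2
rhs = np.linalg.norm(E_k - rank_one, "fro") ** 2
```

Both sides agree up to floating-point error, which is why updating the pair (d_k, x_{k:}) alone reduces to a rank-one approximation of E_k.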
Overcomplete Dictionaries for Sparse Representations
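K-SVD (3)
  Let $J_k$ be the set of training recordings that include the $k$th atom in their approximation, i.e. $J_k = \{ j \mid x_{kj} \neq 0,\ j = 1, 2, \ldots, N_{st} \}$.
  Let $|J_k|$ denote the cardinality of the set $J_k$.
  Let $\boldsymbol{\Omega}_k$ be the $N_{st} \times |J_k|$ indicator matrix with ones at the entries $(j, \xi(j))$, where $j \in J_k$ and $\xi(j) \in [1, |J_k|]$ is the position of $j$ in $J_k$.
  $$\Big\| \underbrace{\mathbf{E}_k \boldsymbol{\Omega}_k}_{\mathbf{E}_k^R} - \mathbf{d}_k \underbrace{\mathbf{x}_{k:}^T \boldsymbol{\Omega}_k}_{[\mathbf{x}_{k:}^R]^T} \Big\|_F^2 = \|\mathbf{E}_k^R - \mathbf{d}_k [\mathbf{x}_{k:}^R]^T\|_F^2.$$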
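Multiplying by the indicator matrix simply keeps the columns of the training samples that use the k-th atom. The sketch below builds the indicator matrix explicitly for a toy coefficient row and compares it with plain column selection; all values are made up, and indices are zero-based in the code.

```python
import numpy as np

x_row = np.array([0.0, 1.5, 0.0, -2.0, 0.7])   # toy k-th row of X
N_st = x_row.size

# J_k: samples whose approximation uses atom k.
J_k = np.flatnonzero(x_row)

# Indicator matrix with a one at (j, position of j in J_k).
Omega_k = np.zeros((N_st, J_k.size))
Omega_k[J_k, np.arange(J_k.size)] = 1.0

E_k = np.arange(15, dtype=float).reshape(3, N_st)   # toy error matrix
E_R = E_k @ Omega_k          # restricted error matrix
x_R = x_row @ Omega_k        # restricted coefficient row
```

In practice the product is never formed; indexing the columns with `J_k` gives the same result at lower cost.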
Overcomplete Dictionaries for Sparse Representations
K-SVD (4)
  Let $\mathbf{E}_k^R = \boldsymbol{\Upsilon} \boldsymbol{\Delta} \mathbf{V}^T$ be the Singular Value Decomposition (SVD) of $\mathbf{E}_k^R$.
  The updated $k$th dictionary atom is the first column of matrix $\boldsymbol{\Upsilon}$.
  The updated coefficient vector $\mathbf{x}_{k:}^R$ corresponds to the first column of matrix $\mathbf{V}$ multiplied by $\Delta_{11}$.
  The projections to the principal component analysis subspace that precede LDA (e.g. in face recognition [a]) are replaced by the sparse approximations over the overcomplete dictionary $\mathbf{D}^{*}$.
[a] P. N. Belhumeur, J. Hespanha, and D. Kriegman, “Eigenfaces vs. fisherfaces: Recognition using class specific linear projection,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 711-720, July 1997.
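The update of a single atom can be sketched as below. The function name `update_atom`, the in-place update, the handling of unused atoms, and the random toy data are assumptions made for illustration; a complete K-SVD would loop this over all atoms and alternate with a sparse coding stage.

```python
import numpy as np

def update_atom(Y, D, X, k):
    """Update atom k of D and its nonzero coefficients in X, in place."""
    J_k = np.flatnonzero(X[k, :])
    if J_k.size == 0:
        return                      # unused atom left unchanged (assumption)
    # Restricted error matrix: residual without atom k, columns in J_k only.
    E_R = Y[:, J_k] - D @ X[:, J_k] + np.outer(D[:, k], X[k, J_k])
    U, s, Vt = np.linalg.svd(E_R, full_matrices=False)
    D[:, k] = U[:, 0]               # first left singular vector
    X[k, J_k] = s[0] * Vt[0, :]     # first right singular vector times Delta_11

rng = np.random.default_rng(0)
M, K, N_st = 6, 8, 12
Y = rng.standard_normal((M, N_st))
D = rng.standard_normal((M, K))
D /= np.linalg.norm(D, axis=0)
X = rng.standard_normal((K, N_st)) * (rng.random((K, N_st)) < 0.3)

k = 2
X[k, 0] = 1.0                       # make sure atom k is used at least once
support_before = X[k, :] != 0
err_before = np.linalg.norm(Y - D @ X) ** 2
update_atom(Y, D, X, k)
err_after = np.linalg.norm(Y - D @ X) ** 2
```

Because the best rank-one approximation replaces the previous rank-one term, the total error cannot increase, and the zero pattern of the k-th coefficient row is preserved.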
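Dual Linear Discriminant Analysis of Sparse Representations
Definitions
Let the training set contain $N_g$ genres and each genre class $\mathcal{Y}_i$ have $n_i$ samples, whose sample mean vector is denoted by $\mathbf{m}_i$, $i = 1, 2, \ldots, N_g$.
The within-class sample covariance matrix is defined as
$$ \mathbf{S}_w = \frac{1}{N_{st}} \sum_{i=1}^{N_g} \sum_{\mathbf{y}_j \in \mathcal{Y}_i} (\mathbf{y}_j - \mathbf{m}_i)(\mathbf{y}_j - \mathbf{m}_i)^T . $$
The between-class sample covariance matrix is given by
$$ \mathbf{S}_b = \frac{1}{N_{st}} \sum_{i=1}^{N_g} n_i (\mathbf{m}_i - \mathbf{m})(\mathbf{m}_i - \mathbf{m})^T , $$
where $\mathbf{m}$ is the gross sample mean vector of the whole training set.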
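The two definitions translate directly into NumPy. The sketch below assumes the samples are stored as the columns of a matrix and that class membership is given by an integer label per column; the function name `scatter_matrices` is hypothetical.

```python
import numpy as np

def scatter_matrices(Y, labels):
    """Y: (dim, N_st) samples as columns; labels: length-N_st class ids."""
    labels = np.asarray(labels)
    n_total = Y.shape[1]
    m = Y.mean(axis=1, keepdims=True)            # gross sample mean
    S_w = np.zeros((Y.shape[0], Y.shape[0]))
    S_b = np.zeros_like(S_w)
    for g in np.unique(labels):
        Y_g = Y[:, labels == g]
        m_g = Y_g.mean(axis=1, keepdims=True)    # class mean m_i
        C = Y_g - m_g
        S_w += C @ C.T                           # within-class contribution
        S_b += Y_g.shape[1] * (m_g - m) @ (m_g - m).T
    return S_w / n_total, S_b / n_total
```

A quick sanity check is that the two matrices add up to the total sample covariance matrix computed around the gross mean.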
Discriminant Sparse Projections (1)
The most discriminant features (MDFs) are obtained by projecting the training samples onto the columns of the matrix
$$ \mathbf{W}^* = \arg\max_{\mathbf{W}} \frac{|\mathbf{W}^T \mathbf{S}_b \mathbf{W}|}{|\mathbf{W}^T \mathbf{S}_w \mathbf{W}|} . $$
We propose to apply LDA in the space of the sparse representations defined by the matrix $\mathbf{D}^*$.
Let $\tilde{\mathbf{m}}_i = \frac{1}{n_i} \sum_{j:\, \mathbf{y}_j \in \mathcal{Y}_i} \mathbf{x}_j$ be the sample mean vector of the sparse coefficients associated with the training samples that belong to the $i$th class.
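Discriminant Sparse Projections (2)
$$ \mathbf{S}_w \approx \mathbf{D}^* \left[ \frac{1}{N_{st}} \sum_{i=1}^{N_g} \sum_{\mathbf{y}_j \in \mathcal{Y}_i} (\mathbf{x}_j - \tilde{\mathbf{m}}_i)(\mathbf{x}_j - \tilde{\mathbf{m}}_i)^T \right] [\mathbf{D}^*]^T = \mathbf{D}^* \tilde{\mathbf{S}}_w [\mathbf{D}^*]^T , $$
where $\tilde{\mathbf{S}}_w$ is the within-class sample covariance matrix of the sparse coefficients and $\tilde{\mathbf{m}}_i$ is the sample mean vector of the sparse coefficients of the $i$th class.
$\mathbf{S}_b \approx \mathbf{D}^* \tilde{\mathbf{S}}_b [\mathbf{D}^*]^T$, where $\tilde{\mathbf{S}}_b$ is the between-class sample covariance matrix of the sparse coefficients.
Let $\tilde{\mathbf{W}} \triangleq [\mathbf{D}^*]^T \mathbf{W}$. The optimization problem can be recast as
$$ \max_{\tilde{\mathbf{W}}} \frac{|\tilde{\mathbf{W}}^T \tilde{\mathbf{S}}_b \tilde{\mathbf{W}}|}{|\tilde{\mathbf{W}}^T \tilde{\mathbf{S}}_w \tilde{\mathbf{W}}|} . $$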
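The approximations on this slide follow in one step once each training sample is replaced by its sparse approximation. A short derivation, assuming $\mathbf{y}_j \approx \mathbf{D}^* \mathbf{x}_j$ for every training sample and writing tilded symbols for quantities computed on the sparse coefficients (the tilde is a notational assumption made here):

$$ \mathbf{m}_i = \frac{1}{n_i} \sum_{j:\, \mathbf{y}_j \in \mathcal{Y}_i} \mathbf{y}_j \approx \mathbf{D}^* \tilde{\mathbf{m}}_i , \qquad \mathbf{y}_j - \mathbf{m}_i \approx \mathbf{D}^* (\mathbf{x}_j - \tilde{\mathbf{m}}_i) , $$

$$ \mathbf{W}^T \mathbf{S}_w \mathbf{W} \approx \mathbf{W}^T \mathbf{D}^* \tilde{\mathbf{S}}_w [\mathbf{D}^*]^T \mathbf{W} = \tilde{\mathbf{W}}^T \tilde{\mathbf{S}}_w \tilde{\mathbf{W}} , \qquad \tilde{\mathbf{W}} = [\mathbf{D}^*]^T \mathbf{W} . $$

The same substitution applied to the class means and the gross mean gives the corresponding expression for the between-class matrix.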
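Discriminant Sparse Projections (3)
If $\tilde{\mathbf{W}}^*$ is the solution of the recast optimization problem, the solution of the original LDA problem is then
$$ \mathbf{W}^* = \left[ [\mathbf{D}^*]^\dagger \right]^T \tilde{\mathbf{W}}^* , $$
which suggests that
$$ \mathbf{z}_j = [\mathbf{W}^*]^T \mathbf{y}_j = [\tilde{\mathbf{W}}^*]^T [\mathbf{D}^*]^\dagger \mathbf{y}_j = [\tilde{\mathbf{W}}^*]^T \mathbf{x}_j , $$
i.e. LDA is applied to the coefficients of the sparse representation.
Most expressive features (MEFs): $\mathbf{x}_j$.
MDFs: $\mathbf{z}_j$.
Discriminant Sparse Projection: cascade of sparse representation and LDA.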
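The identity between projecting the raw sample and projecting its coefficients can be checked numerically. The sketch below uses random stand-ins for the dictionary and for the solution of the recast problem, with hypothetical sizes; the coefficient vector is taken to be the pseudo-inverse solution, as in the chain of equalities on the slide, whereas the talk obtains the sparse codes with a pursuit algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
M, K, n_feat = 12, 30, 3                     # hypothetical sizes
D = rng.standard_normal((M, K))              # stand-in for the learned dictionary
W_tilde = rng.standard_normal((K, n_feat))   # stand-in for the recast LDA solution
y = rng.standard_normal(M)                   # one raw feature vector

D_pinv = np.linalg.pinv(D)                   # pseudo-inverse of D, shape (K, M)
W = D_pinv.T @ W_tilde                       # solution of the original LDA problem
x = D_pinv @ y                               # coefficients via the pseudo-inverse (assumption)

z_from_y = W.T @ y                           # MDFs computed from the raw sample
z_from_x = W_tilde.T @ x                     # MDFs computed from the coefficients
```

Both routes give the same discriminant features, which is why the cascade only ever needs the coefficient vectors.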
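Dual LDA: Principal subspace of $\tilde{\mathbf{S}}_w$
Here $\tilde{\mathbf{S}}_w$ and $\tilde{\mathbf{S}}_b$ denote the within-class and between-class sample covariance matrices of the sparse coefficients.
$\tilde{\mathbf{S}}_w$ is a real symmetric matrix decomposable as $\tilde{\mathbf{S}}_w = \boldsymbol{\Phi}_w \mathbf{G}_w \boldsymbol{\Phi}_w^T$. It has rank $\rho_w \leq \min(K, N_{st} - N_g) \leq K$. It is assumed that $\mathbf{G}_w$ is the diagonal matrix confined to the $\rho_w$ largest eigenvalues of $\tilde{\mathbf{S}}_w$. The principal subspace of $\tilde{\mathbf{S}}_w$ is defined by the eigenvectors associated with its $\rho_w$ largest eigenvalues, $\boldsymbol{\Phi}_w = [\boldsymbol{\phi}_1 | \boldsymbol{\phi}_2 | \cdots | \boldsymbol{\phi}_{\rho_w}]$.
$\tilde{\mathbf{S}}_b$ is transformed to $\mathbf{Q}_b = \mathbf{G}_w^{-1/2} \boldsymbol{\Phi}_w^T \tilde{\mathbf{S}}_b \boldsymbol{\Phi}_w \mathbf{G}_w^{-1/2}$.
$\mathbf{Q}_b$ is a real symmetric matrix of size $\rho_w \times \rho_w$ having rank $N_g - 1$, decomposable as $\mathbf{Q}_b = \boldsymbol{\Psi}_b \mathbf{H}_b \boldsymbol{\Psi}_b^T$. Let $\boldsymbol{\Psi}_b$ have as columns the $N_g - 1$ eigenvectors associated with the non-zero eigenvalues of $\mathbf{Q}_b$.
The $N_g - 1$ discriminative vectors in the principal subspace of $\tilde{\mathbf{S}}_w$ are the columns of $\mathbf{U}_F^* = \boldsymbol{\Phi}_w \mathbf{G}_w^{-1/2} \boldsymbol{\Psi}_b$.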
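The construction on this slide amounts to whitening the within-class matrix and then diagonalizing the between-class matrix in the whitened coordinates. A minimal NumPy sketch follows; the function name and the relative tolerance used to decide which eigenvalues count as non-zero are assumptions made for illustration.

```python
import numpy as np

def discriminants_in_Sw_subspace(S_w, S_b, n_classes, tol=1e-10):
    """Discriminant vectors in the principal subspace of the within-class matrix."""
    # Eigendecomposition of the symmetric within-class matrix.
    g, Phi = np.linalg.eigh(S_w)
    keep = g > tol * g.max()              # the rho_w non-negligible eigenvalues
    Phi_w, g_w = Phi[:, keep], g[keep]
    whiten = Phi_w / np.sqrt(g_w)         # Phi_w G_w^{-1/2}
    # Between-class matrix expressed in the whitened coordinates.
    Q_b = whiten.T @ S_b @ whiten
    h, Psi = np.linalg.eigh(Q_b)
    order = np.argsort(h)[::-1][: n_classes - 1]   # N_g - 1 leading eigenvectors
    Psi_b = Psi[:, order]
    return whiten @ Psi_b                 # Phi_w G_w^{-1/2} Psi_b
```

By construction the returned vectors whiten the within-class matrix, so the within-class scatter of the projected data is the identity.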
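Dual LDA: Principal subspace of $\tilde{\mathbf{S}}_b$
Here $\tilde{\mathbf{S}}_w$ and $\tilde{\mathbf{S}}_b$ denote the within-class and between-class sample covariance matrices of the sparse coefficients.
$\tilde{\mathbf{S}}_b$ is a real symmetric matrix decomposable as $\tilde{\mathbf{S}}_b = \boldsymbol{\Phi}_b \mathbf{G}_b \boldsymbol{\Phi}_b^T$. It has rank $\rho_b \leq \min(K, N_g - 1) = N_g - 1$. Let $\mathbf{G}_b$ be the diagonal matrix of the $\rho_b$ non-zero eigenvalues of $\tilde{\mathbf{S}}_b$. The principal subspace of $\tilde{\mathbf{S}}_b$ is defined by the eigenvectors associated with the $\rho_b$ non-zero eigenvalues of $\tilde{\mathbf{S}}_b$, i.e. $\boldsymbol{\Phi}_b = [\boldsymbol{\phi}_1 | \boldsymbol{\phi}_2 | \cdots | \boldsymbol{\phi}_{\rho_b}]$.
$\tilde{\mathbf{S}}_w$ is transformed to $\mathbf{Q}_w = \mathbf{G}_b^{-1/2} \boldsymbol{\Phi}_b^T \tilde{\mathbf{S}}_w \boldsymbol{\Phi}_b \mathbf{G}_b^{-1/2}$.
$\mathbf{Q}_w$ is a real symmetric matrix of size $\rho_b \times \rho_b$ decomposable as $\mathbf{Q}_w = \boldsymbol{\Psi}_w \mathbf{H}_w \boldsymbol{\Psi}_w^T$.
The $N_g - 1$ discriminative vectors in the principal subspace of $\tilde{\mathbf{S}}_b$ are the columns of $\mathbf{U}_{\bar{F}}^* = \boldsymbol{\Phi}_b \mathbf{G}_b^{-1/2} \boldsymbol{\Psi}_w \mathbf{H}_w^{-1/2}$.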
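The complementary construction starts from the between-class matrix instead. The sketch below mirrors the steps of the slide in NumPy; the function name and the eigenvalue tolerance are assumptions, and the final scaling requires the transformed within-class matrix to have strictly positive eigenvalues.

```python
import numpy as np

def discriminants_in_Sb_subspace(S_w, S_b, tol=1e-10):
    """Discriminant vectors in the principal subspace of the between-class matrix."""
    g, Phi = np.linalg.eigh(S_b)
    keep = g > tol * g.max()              # the rho_b non-zero eigenvalues
    Phi_b, g_b = Phi[:, keep], g[keep]
    whiten = Phi_b / np.sqrt(g_b)         # Phi_b G_b^{-1/2}
    # Within-class matrix expressed in the whitened between-class coordinates.
    Q_w = whiten.T @ S_w @ whiten
    h_w, Psi_w = np.linalg.eigh(Q_w)      # assumed strictly positive eigenvalues
    return (whiten @ Psi_w) / np.sqrt(h_w)   # Phi_b G_b^{-1/2} Psi_w H_w^{-1/2}
```

The trailing factor $\mathbf{H}_w^{-1/2}$ normalizes the within-class scatter along each discriminant vector to one.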
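Dual LDA Classifier
At the test stage, the sparse coefficient vector $\mathbf{x}$ of any test sample $\mathbf{y}$ and the class centers are projected onto the discriminant vectors in the two principal subspaces:
$$ D(\mathbf{y}, \tilde{\mathbf{m}}_i) = \| [\mathbf{U}_F^*]^T (\mathbf{x} - \tilde{\mathbf{m}}_i) \|^2 + \varrho \, \| [\mathbf{U}_{\bar{F}}^*]^T (\mathbf{x} - \tilde{\mathbf{m}}_i) \|^2 , $$
where
$$ \varrho = \frac{\mathrm{tr}\left[ \mathbf{U}_F^* [\mathbf{U}_F^*]^T \right]}{\mathrm{tr}\left[ \mathbf{U}_{\bar{F}}^* [\mathbf{U}_{\bar{F}}^*]^T \right]} $$
and $\mathrm{tr}[\,\cdot\,]$ stands for the trace of the matrix enclosed in brackets.
The test sample $\mathbf{y}$ is classified to genre $i^* = \arg\min_i D(\mathbf{y}, \tilde{\mathbf{m}}_i)$.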
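The decision rule can be sketched as a nearest-class-center search under the combined distance. In the sketch below the function name, the argument layout (class centers stored as columns), and the returned tuple are illustrative assumptions.

```python
import numpy as np

def dual_lda_classify(x, class_means, U_F, U_Fbar):
    """x: (K,) sparse code; class_means: (K, N_g). Returns (winning index, distances)."""
    # Weight balancing the two subspaces: ratio of the traces of U U^T.
    rho = np.trace(U_F @ U_F.T) / np.trace(U_Fbar @ U_Fbar.T)
    diff = x[:, None] - class_means                    # (K, N_g) differences to each center
    d = (np.sum((U_F.T @ diff) ** 2, axis=0)
         + rho * np.sum((U_Fbar.T @ diff) ** 2, axis=0))
    return int(np.argmin(d)), d
```

A sample whose code coincides with a class center has zero distance to that center and is therefore assigned to it.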
Classifier Ensemble
Classifier combination has been an active research topic in Machine Learning and Pattern Recognition [a].
The training dataset is split into 10 folds by stratified 10-fold cross-validation. For each training dataset fold $\tau = 1, 2, \ldots, 10$, the overcomplete dictionary $[\mathbf{D}^*]_\tau$ and the projection matrices $[\mathbf{U}_F^*]_\tau$ and $[\mathbf{U}_{\bar{F}}^*]_\tau$ are learned.
For each test sample, voting is performed among the classification labels assigned to it by the aforementioned 10 discriminant sparse projections.
The test sample is assigned to the class that receives the most votes.
[a] J. J. Rodríguez, L. I. Kuncheva, and C. J. Alonso, “Rotation forest: A new classifier ensemble method,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 28, no. 10, pp. 1619-1630, October 2006.
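The voting step is a plain majority vote over the ten per-fold decisions. A minimal sketch follows; the example labels are made up, and the tie-breaking rule (the label encountered first wins) is an assumption, since the slide does not say how ties are resolved.

```python
from collections import Counter

def majority_vote(labels):
    """Return the most frequent label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]

# Hypothetical decisions of the 10 per-fold classifiers for one test sample.
votes = ["Rock", "Metal", "Rock", "Rock", "Blues",
         "Rock", "Metal", "Rock", "Pop", "Rock"]
winner = majority_vote(votes)
```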
Experimental Assessment
GTZAN dataset
1000 audio recordings, each 30 seconds long [a];
10 genre classes: Blues, Classical, Country, Disco, HipHop, Jazz, Metal, Pop, Reggae, and Rock;
Each genre class contains 100 audio recordings.
The recordings are converted to monaural wave format at a 16 kHz sampling rate with 16 bits and normalized, so that they have zero mean amplitude and unit variance.
[a] G. Tzanetakis and P. Cook, “Musical genre classification of audio signals,” IEEE Trans. Speech and Audio Processing, vol. 10, no. 5, pp. 293-302, July 2002.
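The amplitude normalization mentioned above is a standardization of each recording. A minimal sketch, assuming the recording is already available as a one-dimensional array of samples (file reading and resampling are left out, and the function name is hypothetical):

```python
import numpy as np

def normalize_recording(signal):
    """Return a zero-mean, unit-variance copy of a 1-D audio signal."""
    signal = np.asarray(signal, dtype=np.float64)
    centered = signal - signal.mean()
    std = centered.std()
    # A constant (silent) signal has zero variance and is only centered.
    return centered / std if std > 0 else centered
```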
Feature Extraction
The auditory temporal modulations representation is computed over a segment of 30 sec duration, vectorized, and normalized to unit length.
Each fold delivers a raw training pattern matrix of size $768 \times 900$ and a raw test pattern matrix of size $768 \times 100$, which undergo downsampling with ratios 1/8, 1/4, 1/3, 1/2 in the rate-frequency domain.
Downsampled training pattern matrix $\mathbf{Y} \in \mathbb{R}^{M \times N_{st}}$, $M \in \{12, 48, 85, 192\}$ and $N_{st} = 900$.
Multidimensional scaling (MDS)
MDS with locality preserving indexing [a]
[a] D. Cai, X. He, and J. Han, “Document clustering using locality preserving indexing,” IEEE Trans. Knowledge and Data Engineering, vol. 17, no. 12, pp. 1624-1637, December 2005.
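The listed values of $M$ are consistent with the downsampling ratio being applied along both the rate axis and the frequency axis, so that the 768 raw dimensions shrink by the square of the ratio. The check below rests on that reading and on rounding down, both of which are assumptions rather than statements from the slides.

```python
from fractions import Fraction
from math import floor

RAW_DIM = 768
ratios = [Fraction(1, 8), Fraction(1, 4), Fraction(1, 3), Fraction(1, 2)]

# Assumption: the ratio acts on both axes, so the dimension scales with ratio squared.
M_values = [floor(RAW_DIM * r * r) for r in ratios]
```

Under this reading the four ratios give 12, 48, 85, and 192 dimensions, matching the set quoted for $M$.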
Auditory temporal modulations for the 1st test fold for M = 192
[Figure: scatter plot of the 1st MDS coordinate (horizontal axis) against the 2nd MDS coordinate (vertical axis). Legend: Test, Blues, Classical, Country, Disco, Hiphop, Jazz, Metal, Pop, Reggae, Rock.]
Sparse coefficients for K = 400 and L = 20
[Figure: scatter plot of the 1st MDS coordinate (horizontal axis) against the 2nd MDS coordinate (vertical axis). Legend: Test, Blues, Classical, Country, Disco, Hiphop, Jazz, Metal, Pop, Reggae, Rock.]
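Statistics of the number of non-zero coefficients, when $[\mathbf{D}^*]_\tau$, $\tau = 1, 2, \ldots, 10$ are employed in OMP
[Figure: $\min \|\mathbf{x}_j\|_0$, $E\|\mathbf{x}_j\|_0$, and $\max \|\mathbf{x}_j\|_0$ (vertical axis, 0 to 12) against the test sample index $j$ (horizontal axis, 0 to 100).]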
Projections of the sparse coefficient vectors to the principal subspaces of $[\tilde{\mathbf{S}}_w]_1$ and $[\tilde{\mathbf{S}}_b]_1$
[Figure: two scatter plots, one per principal subspace, each showing the 1st MDS coordinate (horizontal axis) against the 2nd MDS coordinate (vertical axis). Legend: Test, Blues, Classical, Country, Disco, Hiphop, Jazz, Metal, Pop, Reggae, Rock.]
Classifier ensemble decisions
[Figure: classifier ensemble decision (vertical axis, one level per genre from Blues to Rock) against the test sample index $j$ (horizontal axis, 10 to 100). Legend: Blues, Classical, Country, Disco, Hiphop, Jazz, Metal, Pop, Reggae, Rock.]
Average classification accuracy (in %)
Classifier                                  Ratio 1/8   Ratio 1/4   Ratio 1/3   Ratio 1/2
dual LDA                                    34.2        42.3        49.3        54.3
ensemble dual LDA                           34.8        44.1        52.4        57.5
DKL                                         34.4        42.3        49.3        54.4
ensemble dual DKL                           34.9        44.1        52.4        57.5
discriminant sparse projections             42.2        43.03       55.03       57.64
ensemble discriminant sparse projections    44.64       59.9        75.33       84.96
95% confidence interval of the average classification accuracy (in %)
Classifier                                  Ratio 1/8   Ratio 1/4   Ratio 1/3   Ratio 1/2
dual LDA/DKL                                2.95        3.07        3.11        3.10
ensemble dual LDA/DKL                       2.96        3.09        3.11        3.07
discriminant sparse projections             3.07        3.08        3.10        3.07
ensemble discriminant sparse projections    3.09        3.05        2.68        2.22
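The slides do not state how the intervals were obtained. The tabulated values appear to agree, to within roughly 0.02 percentage points, with the half-width of a normal-approximation binomial interval over 1000 test decisions (10 folds of 100 test recordings). The sketch below computes that half-width; the formula, the sample count, and the function name are all assumptions.

```python
from math import sqrt

def ci_half_width(accuracy_percent, n_samples=1000, z=1.96):
    """Half-width (in %) of a normal-approximation 95% interval for a proportion."""
    p = accuracy_percent / 100.0
    return 100.0 * z * sqrt(p * (1.0 - p) / n_samples)

# Accuracy of the ensemble discriminant sparse projections at ratio 1/2.
half_width = ci_half_width(84.96)
```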
Cumulative Confusion Matrix (in %)
Genre       Blues  Classical  Country  Disco  Hiphop  Jazz  Metal  Pop  Reggae  Rock
Blues       91     1          2        0      5       0     0      0    1       0
Classical   0      96         0        1      0       0     1      0    0       2
Country     2      0          88       1      0       0     0      1    3       5
Disco       0      0          2        89     2       0     0      4    0       3
Hiphop      0      0          0        9      78      0     3      9    0       1
Jazz        3      0          2        1      0       92    0      0    0       2
Metal       0      3          3        0      0       0     88     0    0       6
Pop         0      0          1        6      2       0     0      86   1       4
Reggae      4      1          0        5      11      1     0      2    70      6
Rock        6      0          1        5      3       2     8      3    2       70
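Per-genre and overall accuracy can be read off the matrix with a few lines of NumPy. Every row sums to 100, which suggests that rows index the true genre and columns the predicted one; that reading is an assumption, since the slide does not label the axes.

```python
import numpy as np

genres = ["Blues", "Classical", "Country", "Disco", "Hiphop",
          "Jazz", "Metal", "Pop", "Reggae", "Rock"]
confusion = np.array([
    [91,  1,  2,  0,  5,  0,  0,  0,  1,  0],
    [ 0, 96,  0,  1,  0,  0,  1,  0,  0,  2],
    [ 2,  0, 88,  1,  0,  0,  0,  1,  3,  5],
    [ 0,  0,  2, 89,  2,  0,  0,  4,  0,  3],
    [ 0,  0,  0,  9, 78,  0,  3,  9,  0,  1],
    [ 3,  0,  2,  1,  0, 92,  0,  0,  0,  2],
    [ 0,  3,  3,  0,  0,  0, 88,  0,  0,  6],
    [ 0,  0,  1,  6,  2,  0,  0, 86,  1,  4],
    [ 4,  1,  0,  5, 11,  1,  0,  2, 70,  6],
    [ 6,  0,  1,  5,  3,  2,  8,  3,  2, 70],
])

row_sums = confusion.sum(axis=1)
# Diagonal over row sum: share of each (assumed true) genre that is recognized.
per_genre = dict(zip(genres, np.diag(confusion) / row_sums * 100))
overall = np.trace(confusion) / confusion.sum() * 100
```

Reggae and Rock have the lowest diagonal entries (70 each), with Reggae most often confused with Hiphop.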
Sparse Representation-based Classification (SRC)
Main Idea (1)
Let us denote by $\mathbf{A}_i = [\mathbf{a}_{i,1} | \mathbf{a}_{i,2} | \ldots | \mathbf{a}_{i,n_i}] \in \mathbb{R}_+^{768 \times n_i}$ the (sub)dictionary that has as columns the $n_i$ auditory modulation representations stemming from the $i$th genre (i.e., atoms).
Given a test auditory representation $\mathbf{y} \in \mathbb{R}_+^{768}$ that belongs to the $i$th genre, it can be expressed as $\mathbf{y} = \mathbf{A}_i \mathbf{c}_i$, where $\mathbf{c}_i = [c_{i,1}, c_{i,2}, \ldots, c_{i,n_i}]^T \in \mathbb{R}^{n_i}$.
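Main Idea (2)
Let $\mathbf{D} = [\mathbf{A}_1 | \mathbf{A}_2 | \ldots | \mathbf{A}_N] \in \mathbb{R}_+^{768 \times n}$ be formed by concatenating the $n$ auditory modulation representations distributed across $N$ genres.
The test auditory representation $\mathbf{y}$ can be equivalently rewritten as $\mathbf{y} = \mathbf{D} \mathbf{c}$, where $\mathbf{c} = [\mathbf{0}^T | \ldots | \mathbf{0}^T | \mathbf{c}_i^T | \mathbf{0}^T | \ldots | \mathbf{0}^T]^T$.
$\mathbf{c}$ contains information about the genre the test auditory representation $\mathbf{y}$ belongs to.
We can find such a $\mathbf{c}$ by seeking the sparsest solution to the linear system of equations $\mathbf{y} = \mathbf{D} \mathbf{c}$.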
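The block structure of the coefficient vector is easy to see on a toy example. The sketch below uses random non-negative stand-ins for the auditory representations and hypothetical sizes (3 genres with 4 atoms each); only the concatenation and the zero padding mirror the slide.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_per_genre, n_genres = 768, 4, 3     # hypothetical toy sizes

# Non-negative sub-dictionaries, one per genre (random stand-ins for real features).
A = [rng.random((dim, n_per_genre)) for _ in range(n_genres)]
D = np.hstack(A)                            # concatenation of the sub-dictionaries

i = 1                                       # genre of the test sample (0-based)
c_i = rng.random(n_per_genre)
y = A[i] @ c_i                              # test sample spanned by the atoms of genre i

# Coefficient vector that is zero outside the block belonging to genre i.
c = np.zeros(D.shape[1])
c[i * n_per_genre:(i + 1) * n_per_genre] = c_i

# The genre can be read off from the position of the non-zero entries.
active_genres = np.unique(np.flatnonzero(c) // n_per_genre)
```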
Problem formulation
Given $D$ and $y$, solve for
$c^* = \arg\min_c \|c\|_0$ subject to $D c = y$.
The aforementioned problem is NP-hard due to the nature of the underlying combinatorial optimization.
An approximate solution can be obtained by replacing the $\ell_0$ norm with the $\ell_1$ norm: $c^* = \arg\min_c \|c\|_1$ subject to $D c = y$.
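The $\ell_1$ problem can be posed as a linear program by splitting $c$ into non-negative parts. The sketch below does this with `scipy.optimize.linprog`; the helper name `l1_min`, the toy sizes, and the Gaussian dictionary are assumptions for illustration, and the solver is not necessarily the one used by the authors.

```python
import numpy as np
from scipy.optimize import linprog

def l1_min(D, y):
    """Basis pursuit: minimize ||c||_1 subject to D c = y, posed as an LP."""
    n = D.shape[1]
    # Split c = u - v with u, v >= 0, so that ||c||_1 = sum(u) + sum(v) at the optimum.
    cost = np.ones(2 * n)
    A_eq = np.hstack([D, -D])
    res = linprog(cost, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    u, v = res.x[:n], res.x[n:]
    return u - v

rng = np.random.default_rng(0)
D = rng.standard_normal((20, 50))        # toy underdetermined system
c_true = np.zeros(50)
c_true[[3, 17, 41]] = [1.0, -2.0, 0.5]   # a 3-sparse coefficient vector
y = D @ c_true

c_hat = l1_min(D, y)
print(np.linalg.norm(D @ c_hat - y))                   # constraint residual
print(np.sum(np.abs(c_hat)), np.sum(np.abs(c_true)))   # l1 norms
```

Because `c_true` is itself feasible, the $\ell_1$ norm of `c_hat` cannot exceed that of `c_true`; whether the two vectors coincide depends on the dictionary and the sparsity level.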
Sparse Representation-based Classification (SRC)
Dimensionality Reduction
For overcomplete dictionaries derived from the auditory temporal modulation representations, the dimensionality of atoms must be much smaller than $\min(768, n)$.
Thus, we reformulate the optimization problem under study as:
$c^* = \arg\min_c \|c\|_1$ subject to $W^T D c = W^T y$
where $W \in \mathbb{R}^{768 \times k}$ with $k \ll \min(768, n)$ is a projection matrix.
$W$ is obtained by
  non-negative matrix factorization (NMF);
  principal component analysis (PCA);
  independently sampling from a zero-mean normal distribution, and normalizing each column to unit length.
Downsampling the entries of $D$ is another option.
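The random projection option can be sketched as follows. The function name `random_projection`, the choice $k = 48$ for the demonstration, and the random stand-ins for $D$ and $y$ are assumptions for illustration.

```python
import numpy as np

def random_projection(dim, k, seed=0):
    """W in R^{dim x k}: i.i.d. zero-mean Gaussian entries, unit-norm columns."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((dim, k))
    W /= np.linalg.norm(W, axis=0, keepdims=True)
    return W

rng = np.random.default_rng(1)
D = rng.random((768, 900))   # stand-in for a 768 x n non-negative dictionary
y = rng.random(768)          # stand-in for a test representation

W = random_projection(768, k=48)
D_red = W.T @ D              # reduced dictionary, k x n
y_red = W.T @ y              # reduced test vector, length k
print(D_red.shape, y_red.shape)
```

The reduced pair `(D_red, y_red)` then replaces $(D, y)$ in the equality constraint of the $\ell_1$ problem.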
Sparse Representation-based Classification (SRC)
Reasons for Dimensionality Reduction
The computational cost of linear programming solvers is reduced.
The creation of a redundant dictionary from the training auditory temporal modulation representations is facilitated.
Why a Redundant Dictionary?
Enables treatment of
  missing data
  outliers
  noise.
Sparse Representation-based Classification (SRC)
Classification
A test auditory modulation representation is classified as follows.
1 $y$ is projected onto the reduced-dimension space: $\tilde{y} = W^T y$.
2 $c^* = \arg\min_c \|c\|_1$ subject to $W^T D c = \tilde{y}$.
3 Ideally, $c^*$ contains non-zero entries associated only with the columns of $W^T D$ stemming from a single genre.
4 Due to modeling errors, there are small non-zero entries in $c^*$ that are associated with multiple genres.
5 Each auditory modulation representation is classified to the genre that minimizes the $\ell_2$ norm residual between $\tilde{y}$ and $\hat{y}_i = W^T D \vartheta_i(c^*)$, where $\vartheta_i(c^*) \in \mathbb{R}^n$ is a new vector whose nonzero entries are those in $c^*$ associated with the $i$th genre.
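Step 5 can be sketched in NumPy as below. The function name `classify_by_residual` and the toy data are assumptions for illustration; here the coefficient vector is written by hand with a small leakage term, whereas in the actual pipeline it would come from the $\ell_1$ solver of step 2.

```python
import numpy as np

def classify_by_residual(D_red, y_red, c, labels):
    """Return the genre whose coefficients best reconstruct y_red, plus all residuals."""
    genres = np.unique(labels)
    residuals = np.empty(len(genres))
    for idx, g in enumerate(genres):
        # theta_g(c): keep the entries of c tied to genre g, zero out the rest.
        c_g = np.where(labels == g, c, 0.0)
        residuals[idx] = np.linalg.norm(y_red - D_red @ c_g)
    return genres[np.argmin(residuals)], residuals

rng = np.random.default_rng(0)
labels = np.repeat(np.arange(3), 4)      # 3 toy genres, 4 atoms each
D_red = rng.standard_normal((48, 12))    # stand-in for W^T D
c = np.zeros(12)
c[labels == 2] = [0.9, 0.4, 0.7, 0.5]    # dominant genre
c[1] = 0.01                              # small leakage into genre 0
y_red = D_red @ c

genre, residuals = classify_by_residual(D_red, y_red, c, labels)
print(genre, residuals)
```

Using residuals rather than the largest coefficient makes the decision depend on how well each genre's atoms jointly explain the test vector.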
Sparse Representation-based Classification (SRC)
[Figure: sparse coefficients and residuals for a test auditory temporal modulations representation of the blues genre.]
Experimental Assessment
Datasets
1 GTZAN dataset
2 ISMIR 2004 Genre Dataset
  1458 full audio recordings;
  6 genre classes: Classical (640), Electronic (229), JazzBlues (52), MetalPunk (90), RockPop (203), World (244).
Protocol
GTZAN dataset: stratified 10-fold cross-validation. Each training set consists of 900 audio recordings, yielding a training matrix $A_{GTZAN} \in \mathbb{R}_+^{768 \times 900}$.
ISMIR 2004 Genre dataset: the ISMIR2004 Audio Description Contest protocol defines training and evaluation sets, which consist of 729 audio files each.
Experimental Assessment
Parameters
$W \in \mathbb{R}^{768 \times k}$ is derived from $A_{GTZAN}$ and $A_{ISMIR}$ by employing
  NMF or PCA with $k \in \{12, 48, 85, 192\}$;
  a random projection matrix for the same $k$.
Downsampling the auditory temporal modulations with ratios 1/8, 1/4, 1/3, and 1/2.
Classifiers
  SRC
  linear SVMs
  Nearest Neighbor (NN) with cosine similarity measure (CSM)
Experimental Assessment
SRC accuracy on the GTZAN and ISMIR2004 datasets
[Figure: two plots, one per dataset, of classification accuracy (%) versus feature dimension (0 to 200); curves: NMF, PCA, Random, Downsample.]
Experimental Assessment
Linear SVM accuracy on the GTZAN and ISMIR2004 datasets
[Figure: two plots, one per dataset, of classification accuracy (%) versus feature dimension (0 to 200); curves: NMF, PCA, Random, Downsample.]
Experimental Assessment
NN accuracy on the GTZAN and ISMIR2004 datasets
[Figure: two plots, one per dataset, of classification accuracy (%) versus feature dimension (0 to 200); curves: NMF, PCA, Random, Downsample.]
Locality Preserving Non-negative Tensor Factorization
Multilinear Dimensionality Reduction Techniques
Unsupervised ones: Non-Negative Tensor Factorization (NTF) [a], Multilinear Principal Component Analysis (MPCA) [b], Non-negative MPCA (NMPCA) [c];
Supervised ones: General Tensor Discriminant Analysis (GTDA) [d], Discriminant Non-Negative Tensor Factorization (DNTF) [e].
[a] E. Benetos and C. Kotropoulos, "A tensor-based approach for automatic music genre classification," in Proc. XVI European Signal Processing Conf., Lausanne, Switzerland, 2008.
[b] H. Lu, K. N. Plataniotis, and A. N. Venetsanopoulos, "MPCA: Multilinear principal component analysis of tensor objects," IEEE Trans. Neural Networks, vol. 19, no. 1, pp. 18-39, 2008.
[c] Y. Panagakis, C. Kotropoulos, and G. R. Arce, "Non-negative multilinear principal component analysis of auditory temporal modulations for music genre classification," IEEE Trans. Audio, Speech, and Language Processing, to appear.
[d] D. Tao, X. Li, X. Wu, and S. J. Maybank, "General tensor discriminant analysis and Gabor features for gait recognition," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 29, no. 10, pp. 1700-1715, 2007.
[e] S. Zafeiriou, "Discriminant nonnegative tensor factorization algorithms," IEEE Trans. Neural Networks, vol. 20, no. 2, pp. 217-235, 2009.
Locality Preserving Non-negative Tensor Factorization
Local structure in a nonlinear manifold
Let $\{A_q\}_{q=1}^{Q}$ be a set of $Q$ non-negative tensors of order $N$, which lie in a nonlinear manifold $\mathcal{A}$ embedded into the tensor space. The set can be represented as an $(N+1)$-order tensor $A \in \mathbb{R}_+^{I_1 \times I_2 \times \ldots \times I_N \times I_{N+1}}$ with $I_{N+1} = Q$.
The local structure of $\mathcal{A}$ can be modeled by the nearest neighbor graph $G$ whose weight matrix $S$ has elements
$s_{qp} = \begin{cases} e^{-\|A_q - A_p\|^2 / \tau} & \text{if } A_q \text{ and } A_p \text{ belong to the same class} \\ 0 & \text{otherwise} \end{cases}$
with $\|\cdot\|$ denoting the tensor norm [a].
The Laplacian matrix is $L = \Gamma - S$, where $\Gamma$ is a diagonal matrix with elements $\gamma_{qq} = \sum_p s_{qp}$, i.e., the column sums of $S$.
[a] T. Kolda and B. W. Bader, "Tensor decompositions and applications," SIAM Review, vol. 51, no. 3, to appear.
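A minimal NumPy sketch of the weight matrix and the Laplacian follows. It assumes the tensor norm is the square root of the sum of squared entries and that the exponent uses its square; the function name `heat_kernel_laplacian` and the toy tensors are hypothetical.

```python
import numpy as np

def heat_kernel_laplacian(samples, labels, tau=1.0):
    """samples: array of shape (Q, I_1, ..., I_N); labels: length-Q class labels."""
    Q = samples.shape[0]
    flat = samples.reshape(Q, -1)
    # Squared norm of every pairwise difference A_q - A_p.
    sq_norms = np.sum(flat**2, axis=1)
    sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * flat @ flat.T
    sq_dist = np.maximum(sq_dist, 0.0)   # guard against tiny negative round-off
    # Heat-kernel weights within a class, zero across classes.
    same_class = labels[:, None] == labels[None, :]
    S = np.where(same_class, np.exp(-sq_dist / tau), 0.0)
    Gamma = np.diag(S.sum(axis=1))
    return S, Gamma, Gamma - S

rng = np.random.default_rng(0)
samples = rng.random((6, 2, 3, 4))       # six toy third-order tensors
labels = np.array([0, 0, 0, 1, 1, 1])
S, Gamma, L = heat_kernel_laplacian(samples, labels, tau=1.0)
print(np.allclose(L.sum(axis=1), 0.0))   # rows of a graph Laplacian sum to zero
```

Since $S$ is symmetric, its row sums and column sums coincide, so either can be used to fill $\Gamma$.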
Locality Preserving Non-negative Tensor Factorization
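Problem statement
Let $Z^{(i)} = U^{(N+1)} \odot \ldots \odot U^{(i+1)} \odot U^{(i-1)} \odot \ldots \odot U^{(1)}$, where $\odot$ denotes the Khatri-Rao product.
Subject to $U^{(i)} \geq 0$, $i = 1, 2, \ldots, N+1$, minimize
$f_{LPNTF}\left(U^{(i)}|_{i=1}^{N+1}\right) = \|A_{(i)} - U^{(i)} [Z^{(i)}]^T\|^2 + \lambda \, \mathrm{tr}\left([U^{(N+1)}]^T L \, U^{(N+1)}\right)$
where $U^{(i)} \in \mathbb{R}_+^{I_i \times k}$, $k$ is the desired number of rank-1 tensors approximating $A$ when linearly combined, and $\lambda > 0$.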
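To make the notation concrete, the sketch below builds $Z^{(i)}$ with a Khatri-Rao product, forms the mode-$i$ unfolding $A_{(i)}$, and evaluates the objective. It assumes the norm is the Frobenius norm and that the unfolding lets the lowest remaining index vary fastest; all function names, the toy sizes, and the zero placeholder Laplacian are hypothetical.

```python
import numpy as np
from functools import reduce

def khatri_rao(B, C):
    """Column-wise Kronecker product of B (I x k) and C (J x k), giving (I*J) x k."""
    return np.einsum("ir,jr->ijr", B, C).reshape(-1, B.shape[1])

def unfold(A, mode):
    """Mode-n unfolding with the lowest remaining index varying fastest."""
    return np.reshape(np.moveaxis(A, mode, 0), (A.shape[mode], -1), order="F")

def z_matrix(factors, mode):
    """Z^(i): Khatri-Rao product of all factors except `mode`, highest mode first."""
    others = [U for m, U in enumerate(factors) if m != mode]
    return reduce(khatri_rao, reversed(others))

def lpntf_objective(A, factors, L, lam, mode=0):
    """Squared Frobenius fit error in the given mode plus the locality penalty."""
    fit = unfold(A, mode) - factors[mode] @ z_matrix(factors, mode).T
    U_last = factors[-1]
    return np.sum(fit**2) + lam * np.trace(U_last.T @ L @ U_last)

rng = np.random.default_rng(0)
k = 3
factors = [rng.random((I, k)) for I in (6, 10, 8, 5)]   # toy scale, rate, frequency, sample modes
A = np.einsum("ar,br,cr,dr->abcd", *factors)            # tensor that the factors fit exactly
L = np.zeros((5, 5))                                     # placeholder Laplacian, no penalty
print(lpntf_objective(A, factors, L, lam=0.5))           # fit term vanishes up to round-off
```

With an exact rank-$k$ tensor the fit term is zero in every mode, which is a convenient sanity check on the unfolding and product conventions.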
Locality Preserving Non-negative Tensor Factorization
Gradients
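For $i = 1, 2, \ldots, N$:
$\nabla_{U^{(i)}} f_{LPNTF} = \underbrace{U^{(i)} [Z^{(i)}]^T Z^{(i)}}_{\nabla^{+}_{U^{(i)}} f_{LPNTF}} - \underbrace{A_{(i)} Z^{(i)}}_{\nabla^{-}_{U^{(i)}} f_{LPNTF}}$
For $i = N + 1$:
$\nabla_{U^{(N+1)}} f_{LPNTF} = \underbrace{U^{(N+1)} [Z^{(N+1)}]^T Z^{(N+1)} + \lambda \, \Gamma \, U^{(N+1)}}_{\nabla^{+}_{U^{(N+1)}} f_{LPNTF}} - \underbrace{\left(A_{(N+1)} Z^{(N+1)} + \lambda \, S \, U^{(N+1)}\right)}_{\nabla^{-}_{U^{(N+1)}} f_{LPNTF}}$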
Locality Preserving Non-negative Tensor Factorization
Robust multiplicative update rules
Extending (Lin, 2007) [a], it is proven:
$U^{(i)}_{[t+1]} = U^{(i)}_{[t]} - \dfrac{\bar{U}^{(i)}_{[t]}}{\nabla^{+}_{U^{(i)}_{[t]}} f_{LPNTF} + \delta} * \nabla_{U^{(i)}_{[t]}} f_{LPNTF}$
$\bar{U}^{(i)}_{[t]} = \begin{cases} U^{(i)}_{[t]} & \text{if } \nabla_{U^{(i)}_{[t]}} f_{LPNTF} \geq 0 \\ \sigma & \text{otherwise} \end{cases}$
for $\sigma$, $\delta$ small positive numbers, typically $10^{-8}$. The division is elementwise and $t$ denotes the iteration index.
[a] C.-J. Lin, "On the convergence of multiplicative update algorithms for nonnegative matrix factorization," IEEE Trans. Neural Networks, vol. 18, no. 6, pp. 1589-1596, 2007.
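One update of the sample-mode factor $U^{(N+1)}$ can be sketched as follows, reading $*$ as the elementwise product and applying the case distinction entry by entry. The function name `robust_update_last_mode` and the random toy inputs are assumptions for illustration; the sketch covers a single step for one mode, not the full alternating algorithm.

```python
import numpy as np

def robust_update_last_mode(A_unf, Z, U, S, Gamma, lam, sigma=1e-8, delta=1e-8):
    """One robust multiplicative update of the sample-mode factor U^(N+1)."""
    # Positive and negative parts of the gradient for i = N + 1.
    grad_plus = U @ (Z.T @ Z) + lam * (Gamma @ U)
    grad_minus = A_unf @ Z + lam * (S @ U)
    grad = grad_plus - grad_minus
    # U_bar equals U where the gradient is non-negative and sigma elsewhere.
    U_bar = np.where(grad >= 0, U, sigma)
    return U - U_bar / (grad_plus + delta) * grad

rng = np.random.default_rng(0)
Q, M, k = 5, 12, 3                       # toy sizes: samples, unfolded columns, rank
A_unf = rng.random((Q, M))               # stand-in for the unfolding A_(N+1)
Z = rng.random((M, k))                   # stand-in for Z^(N+1)
U = rng.random((Q, k))
B = rng.random((Q, Q))
S = (B + B.T) / 2.0                      # symmetric non-negative weights
Gamma = np.diag(S.sum(axis=1))

U_new = robust_update_last_mode(A_unf, Z, U, S, Gamma, lam=0.5)
print(U_new.min() >= 0.0)                # the update keeps the factor non-negative
```

Where the gradient is non-negative the step reduces to the usual multiplicative ratio of the negative to the positive gradient part; elsewhere it takes a small step scaled by $\sigma$.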
Locality Preserving Non-negative Tensor Factorization
Impact on the SRC
Data tensor $A \in \mathbb{R}_+^{I_1 \times I_2 \times I_3 \times I_4}$, where $I_1 = I_{scales} = 6$, $I_2 = I_{rates} = 10$, $I_3 = I_{frequencies} = 128$, and $I_4 = I_{samples}$.
SRC problem: $c^* = \arg\min_c \|c\|_1$ subject to $W D c = W y$, where $W \in \mathbb{R}^{k \times 7680}$ with $k \ll \min(7680, I_{samples})$ is a projection matrix.
When LPNTF, NTF, or DNTF is applied to $A$, four factor matrices $U^{(i)} \in \mathbb{R}_+^{I_i \times k}$, $i = 1, 2, 3, 4$, are obtained, which are associated with the scale, rate, frequency, and sample modes, respectively.
$W = (U^{(3)} \odot U^{(2)} \odot U^{(1)})^T$ or $W = (U^{(3)} \odot U^{(2)} \odot U^{(1)})^{\dagger}$.
$D = A_{(4)}^T = W^T [U^{(4)}]^T$.
For MPCA or GTDA, three factor matrices $U^{(i)} \in \mathbb{R}^{I_i \times J_i}$, with $J_i < I_i$, $i = 1, 2, 3$, are obtained. $W = (U^{(3)} \otimes U^{(2)} \otimes U^{(1)})^T$ or $W = (U^{(3)} \otimes U^{(2)} \otimes U^{(1)})^{\dagger}$. The columns of $D$ are obtained by applying $W$ to vectorized training tensors $\mathrm{vec}(A_q)$.
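The two choices of $W$ for the factorization-based methods can be sketched in NumPy. The rank $k = 20$, the random factor matrices, and the variable names are assumptions for illustration; the vectorization uses column-major order so that the scale index varies fastest, matching the row order of the Khatri-Rao product.

```python
import numpy as np

def khatri_rao(B, C):
    """Column-wise Kronecker product of B (I x k) and C (J x k), giving (I*J) x k."""
    return np.einsum("ir,jr->ijr", B, C).reshape(-1, B.shape[1])

rng = np.random.default_rng(0)
k = 20                                    # hypothetical number of rank-1 components
U1 = rng.random((6, k))                   # scale-mode factor
U2 = rng.random((10, k))                  # rate-mode factor
U3 = rng.random((128, k))                 # frequency-mode factor

KR = khatri_rao(khatri_rao(U3, U2), U1)   # 7680 x k
W_transpose = KR.T                        # first option: transpose
W_pinv = np.linalg.pinv(KR)               # second option: Moore-Penrose pseudo-inverse

# A toy test tensor that lies exactly in the span of the rank-1 components.
w = rng.random(k)
Y = np.einsum("ar,br,cr,r->abc", U1, U2, U3, w)
y = Y.reshape(-1, order="F")              # vec(Y), scale index varying fastest

y_red = W_pinv @ y                        # k-dimensional representation
print(W_transpose.shape, W_pinv.shape, np.allclose(y_red, w, atol=1e-6))
```

With the pseudo-inverse, a tensor that lies in the span of the components is mapped back to its mixing weights; the transpose is cheaper but does not have this property in general.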
Experimental Assessment
Experimental setup
The training tensor is constructed by stacking the cortical representations: $\mathcal{A}_{\text{GTZAN}} \in \mathbb{R}_+^{6 \times 10 \times 128 \times 900}$; $\mathcal{A}_{\text{ISMIR}} \in \mathbb{R}_+^{6 \times 10 \times 128 \times 729}$.
$\lambda = 0.5$ and $\tau = 1$ (heat kernel) are empirically set for LPNTF; need for cross-validation.
To determine the dimensionality of the factor matrices, the ratio of the sum of eigenvalues retained over the sum of all eigenvalues for each mode-n tensor unfolding is employed.
The same $J_1 = J_{\text{scales}}$, $J_2 = J_{\text{rates}}$, and $J_3 = J_{\text{frequencies}}$ are used in MPCA and GTDA; $k = J_1 J_2 J_3$ for LPNTF, NTF, DNTF, and random projection.
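The eigenvalue-ratio rule for choosing the dimensionality can be sketched as follows. This is an illustrative reading of the rule, not the authors' code. It assumes that the tensor is centered over the sample mode before unfolding and that the eigenvalues are those of the mode-n scatter matrix. The function names `unfold` and `components_for_ratio`, the toy tensor, and the 90% target are hypothetical.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding: rows indexed by `mode`, all other modes flattened."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def components_for_ratio(tensor, mode, ratio):
    """Smallest J whose top-J eigenvalues hold at least `ratio` of the total.

    Assumption: the last axis indexes samples and is used for centering.
    """
    centered = tensor - tensor.mean(axis=-1, keepdims=True)
    X = unfold(centered, mode)
    # Eigenvalues of the mode-n scatter matrix, sorted in descending order.
    eigvals = np.clip(np.linalg.eigvalsh(X @ X.T)[::-1], 0.0, None)
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    # First position where the cumulative ratio reaches the target.
    J = int(np.searchsorted(cumulative, ratio)) + 1
    return min(J, eigvals.size)

# Toy tensor with the slide's first three mode sizes and 30 hypothetical samples.
rng = np.random.default_rng(0)
toy = rng.random((6, 10, 128, 30))

J1, J2, J3 = (components_for_ratio(toy, mode, 0.90) for mode in range(3))
k = J1 * J2 * J3   # feature dimension used for LPNTF, NTF, DNTF, random projection
```

Sweeping `ratio` over the range 0.78 to 0.94 and recording `J1`, `J2`, `J3`, and `k` would produce curves of the kind plotted on the following slides, although the values obtained from random toy data carry no meaning.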
Experimental Assessment
Total number of retained principal components in each mode for the GTZAN and ISMIR2004 datasets
[Figure: two panels, one per dataset. x-axis: portion of total scatter retained (%), 78 to 94. y-axis: number of principal components, 2 to 12. Curves: rate subspace, scale subspace, frequency subspace.]
Experimental Assessment
Feature dimension for the GTZAN and ISMIR2004 datasets
[Figure: two panels, one per dataset. x-axis: portion of total scatter retained (%), 78 to 94. y-axis: feature dimension, up to 220.]
Experimental Assessment
SRC accuracy on the GTZAN and ISMIR2004 datasets
[Figure: two panels, one per dataset. x-axis: portion of total scatter retained (%), 78 to 94. y-axis: classification accuracy (%), 30 to 100. Curves: LPNTF, NTF, DNTF, MPCA, GTDA, Random.]
Experimental Evaluation of SRC on Cortical Representations
Linear SVM accuracy on the GTZAN and ISMIR2004 datasets
[Figure: two panels, one per dataset. x-axis: portion of total scatter retained (%), 78 to 94. y-axis: classification accuracy (%), 30 to 100. Curves: LPNTF, NTF, DNTF, MPCA, GTDA, Random.]
Comparison with the State of the Art
GTZAN dataset
Method                                                  Accuracy (in %)
Topology preserving NTF + SRC                           93.7
LPNTF + SRC [a]                                         92.4 ± 2
NMF + SRC [b]                                           91 ± 1.76
Ensemble discriminant sparse projections                84.96 ± 2.22
NMPCA + SVM-RBF [c]                                     84.3
AdaBoost [d]                                            82.5
Daubechies wavelet coefficient histograms + SVM [e]     78.5
Daubechies wavelet coefficient histograms + LDA         71.3
[a] Y. Panagakis, C. Kotropoulos, and G. R. Arce, “Music genre classification using locality preserving non-negative tensor factorization and sparse representations,” in Proc. 2009 Int. Conf. Music Information Retrieval, Kobe, Japan, October 2009.
[b] Y. Panagakis, C. Kotropoulos, and G. R. Arce, “Music genre classification via sparse representations of auditory temporal modulations,” in Proc. 17th European Signal Processing Conf., Glasgow, August 2009.
[c] Y. Panagakis, C. Kotropoulos, and G. R. Arce, “Non-negative multilinear principal component analysis of auditory temporal modulations for music genre classification,” IEEE Trans. Audio, Speech, and Language Processing, to appear.
[d] J. Bergstra, N. Casagrande, D. Erhan, D. Eck, and B. Kegl, “Aggregate features and AdaBoost for music classification,” Machine Learning, vol. 65, no. 2-3, pp. 473-484, 2006.
[e] T. Li, M. Ogihara, and Q. Li, “A comparative study on content-based music genre classification,” in Proc. 26th Int. ACM SIGIR Conf. Research and Development in Information Retrieval, Toronto, Canada, 2003, pp. 282-289.
Comparison with the State of the Art
ISMIR2004 dataset
Method                                  Accuracy (in %)
Topology preserving NTF + SRC           94.93
NTF + SRC (ISMIR 2009)                  94.38 ± 1.68
LPNTF + SRC (ISMIR 2009)                94.25 ± 1.70
PCA + SRC (EUSIPCO 2009)                93.56 ± 1.79
NMF + GMM [a]                           83.5
NTF + SVM-RBF (IEEE TSLP 2009)          83.15
AdaBoost                                82.3
NMPCA + SVM-RBF (IEEE TSLP 2009)        82.19
[a] A. Holzapfel and Y. Stylianou, “Musical genre classification using nonnegative matrix factorization-based features,” IEEE Trans. Audio, Speech, and Language Processing, vol. 16, no. 2, pp. 424-434, 2008.
Conclusions-Future Work
Summary
A robust music genre classification framework has been proposed by taking into account the properties of human auditory perception.
2D auditory temporal modulations and 3D cortical representations yield rich tensors for feature extraction, while sparse concepts have been employed for feature selection and classification.
The best classification accuracies reported here exceed any accuracy previously obtained by state-of-the-art music genre classification algorithms on either the GTZAN or the ISMIR2004 Genre dataset.
Conclusions-Future Work
Future Work
The dependence of the SRC on the dimensionality reduction technique deserves further research.
The design of discriminant overcomplete dictionaries within the classifier ensemble and/or the substitution of the dual LDA by sparse LDA, in order to enforce sparsity on the columns of W, could be pursued.
Efficient implementations using incremental update rules are needed.
In many commercial and private applications, the number of available audio recordings per genre is limited. Thus, it is desirable for the music genre classification algorithm to perform well on such small sample sets.
Thank You!
Questions?