Research Article
Music Emotion Detection Using Hierarchical Sparse Kernel Machines
Yu-Hao Chin, Chang-Hong Lin, Ernestasia Siahaan, and Jia-Ching Wang
Department of Computer Science and Information Engineering, National Central University, Taoyuan 32001, Taiwan
Correspondence should be addressed to Jia-Ching Wang: jcw@csie.ncu.edu.tw
Received 30 August 2013; Accepted 17 October 2013; Published 3 March 2014
Academic Editors: B.-W. Chen, S. Liou, and C.-H. Wu
Copyright © 2014 Yu-Hao Chin et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
For music emotion detection, this paper presents a music emotion verification system based on hierarchical sparse kernel machines. With the proposed system, we intend to verify whether a music clip possesses happiness emotion or not. There are two levels in the hierarchical sparse kernel machines. In the first level, a set of acoustical features is extracted, and principal component analysis (PCA) is implemented to reduce the dimension. The acoustical features are utilized to generate the first-level decision vector, which is a vector with each element being a significant value of an emotion. The significant values of eight main emotional classes are utilized in this paper. To calculate the significant value of an emotion, we construct its 2-class SVM with calm emotion as the global (nontarget) side of the SVM. The probability distributions of the adopted acoustical features are calculated, and the probability product kernel is applied in the first-level SVMs to obtain the first-level decision vector feature. In the second level of the hierarchical system, we construct a single 2-class relevance vector machine (RVM) with happiness as the target side and the other emotions as the background side of the RVM. The first-level decision vector is used as the feature, with a conventional radial basis function kernel. The happiness verification threshold is built on the probability value. In the experimental results, the detection error tradeoff (DET) curve shows that the proposed system performs well in verifying whether a music clip reveals happiness emotion.
1. Introduction
Listening to music plays an important role in people's daily lives, and people usually gain much benefit from it. Beyond leisure, music listening has other application areas such as education, inspiration, production, therapy, and marketing [1]. Sometimes people try to reach a particular emotional state by listening to music. In such a situation, however, people need to choose music that evokes particular feelings; they would have to listen to each song at least once to learn its emotion, and the whole process takes much time. If a computer can detect the emotional content of music, this problem is solved. Beyond this application, music emotion detection technology can also be applied to other areas, such as music research, music recommendation, and music retrieval. Because of the vast potential of music emotion detection technology, many researchers focus on detecting emotion in music.
Many studies on music emotion detection have been proposed [2]. Existing research methods can be divided into two main categories: the dimensional approach and the categorical approach. The dimensional approach defines an emotion plane and views it as a continuous emotion state space; each position on the plane corresponds to an emotion state [3]. The acoustical features can be mapped to a point in the emotion plane [4]. The categorical approach works by categorizing emotions into a number of emotion classes, each of which represents an area in the emotion plane [3]. Different from the dimensional approach, each emotion class is defined clearly. In the training phase, acoustical features are directly used to train classifiers to recognize the corresponding emotion classes [5]. The method proposed in this paper belongs to the second type.
In previous music emotion detection studies, many machine learning algorithms have been applied. In [5], features were mapped into emotion categories on the emotion plane, and two support vector regressors were trained to predict the
Hindawi Publishing Corporation, The Scientific World Journal, Volume 2014, Article ID 270378, 7 pages, http://dx.doi.org/10.1155/2014/270378
Figure 1: Block diagram of the proposed system. An emotional music database feeds audio preprocessing and time-frequency processing; feature extraction produces RMS energy, tempo, chromagram, MFCC, spectrum centroid, spectrum spread, and RSS; principal component analysis and support vector machines with a probability product kernel yield the first-level decision vector, which a relevance vector machine maps to the verified emotion (training and testing paths).
arousal and valence values. In [6], a hierarchical framework was adopted to detect emotion from acoustic music data; the method has the advantage of emphasizing the proper features in different detection tasks. In [7], a support vector machine was applied to detect emotional content in music. In [8], kernel-based class separability was used to weight features; after feature selection, principal component analysis and linear discriminant analysis were applied, and a k-nearest neighbor (KNN) classifier was then implemented. In this paper, a music emotion detection system is proposed. The system establishes a hierarchical sparse kernel machine. In the first level, eight 2-class SVM models are trained, with the eight emotion classes as the target sides, respectively. It is noted that emotion perception is usually based not on a single acoustical feature but on a combination of acoustical features [4, 9]. This paper adopts an acoustical feature set comprising root mean square energy (RMS energy), tempo, chromagram, MFCCs, spectrum centroid, spectrum spread, and the ratio of a spectral flatness measure to a spectral center (RSS). Each of them is normalized. In the second level of the hierarchical sparse kernel machines, a 2-class relevance vector machine (RVM) model with happiness as the target side and the other emotions as the background side is trained. Besides, the first-level decision vector is used as the feature in this level.
The rest of this paper is organized as follows. The system overview is described in Section 2. The features and first-level decision vector extraction are described in Section 3. Principal component analysis is described in Section 4. The SVM and RVM are introduced in Section 5. Section 6 shows our experimental results. The conclusion is given in Section 7.
2. System Overview
The block diagram of the proposed system is presented in Figure 1. The system mainly comprises two levels of sparse kernel machines. For the first-level SVMs, we use a set of acoustical features that includes RMS energy, tempo, chromagram, MFCCs, spectrum centroid, spectrum spread, and RSS. In Table 1, the used acoustical features are classified into four main types, that is, dynamic, rhythm, timbre, and tonality.
Table 1: The proposed acoustical feature set.

Feature class | Feature name (dimension of feature)
Dynamic | RMS energy (1)
Rhythm | Tempo (1)
Timbre | MFCCs (13), spectrum centroid (1), spectrum spread (1), RSS (1)
Tonality | Chromagram (12)
Because each feature's scale is different, normalization of the whole feature set is performed [10]. After normalization, eight SVM models are trained to transform acoustical features into emotion profile features. Each of the eight SVM models is trained and tested using the probability product kernel. We use the first-level decision vectors generated from the angry, happy, sad, relaxed, pleased, bored, nervous, and peaceful emotion classes. For each emotion, to calculate the corresponding value in the emotion profile features, we construct its 2-class SVM with calm emotion as the background side of the SVM. For the RVM, a conventional radial basis function kernel is used, and the first-level decision vector extracted in the first level is utilized as the feature. To verify happiness emotion, a 2-class RVM with happiness as the target side and the other emotions as the background side is constructed. For a tested music clip, the probability value obtained from this 2-class RVM is used to judge whether the clip expresses happiness emotion or not.
3. Extraction of Acoustical Features and First-Level Decision Vector Feature
In the 2-level hierarchical sparse kernel machines, the first-level SVMs use acoustical features, while the second-level RVM adopts the first-level decision vector. For acoustical features, the proposed system extracts RMS energy, tempo, chromagram, MFCCs, spectrum centroid, spectrum spread, and RSS. The extraction of these acoustical features, as well as the first-level decision vectors, is described in the following.
3.1. Extraction of Acoustical Features
3.1.1. RMS Energy. RMS energy, or root mean square energy, computes the global energy of an input signal x [11]. The operation is defined as follows:
$$X_{\text{RMS}} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} x_i^2} \qquad (1)$$
where n denotes the signal's length, in hundredths of a second by default.
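As a quick sanity check, the RMS computation of Eq. (1) can be sketched in Python/NumPy; the function name is illustrative, not from the paper:

```python
import numpy as np

def rms_energy(x):
    """Root mean square energy of a signal, as in Eq. (1)."""
    x = np.asarray(x, dtype=float)
    return np.sqrt(np.mean(x ** 2))
```

For example, the frame [3, 4] yields sqrt((9 + 16) / 2) = sqrt(12.5).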
3.1.2. Tempo. Many tempo estimation methods have been proposed. The estimation of tempo is based on detecting periodicities in a range of BPMs [12]. First, significant onset events are detected in the frequency domain [11]. Then, the events that best represent the tempo of the song are found, which means choosing the maximum periodicity score for each frame separately.
3.1.3. Chromagram. Chroma, also called the harmonic pitch class profile, has a strong relationship with the structure of music [13]. A chromagram is a joint distribution of signal strength over the variables of time and chroma. Chroma is a frame-based representation of audio, similar to the short-time Fourier transform. In music clips, frequency components belonging to the same pitch class are extracted by the chromagram and transformed into a 12-dimensional representation covering C, C#, D, D#, E, F, F#, G, G#, A, A#, and B. The chromagram can present the distribution of energy along the pitches or pitch classes [11, 14]. In [14], the chromagram is defined as a remapping of the time-frequency distribution. The chromagram is extracted by
$$V(t, k) = \sum_{n \in S_k} \frac{X_t(n)}{Q_k}, \qquad k \in \{0, 1, 2, \ldots, 11\} \qquad (2)$$

where X_t(n) denotes the logarithmic magnitude of the discrete Fourier transform of the t-th frame and Q_k is the number of elements in S_k, the subset of the discrete frequency space assigned to each pitch class [15].
In Figure 2, the chromagram from a piece of music is exemplified.
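A toy illustration of Eq. (2) for a single frame can be sketched in Python/NumPy. The function name, reference frequency, and frequency limits are assumptions made for this sketch; the paper itself relies on the MIRtoolbox implementation:

```python
import numpy as np

def chromagram_frame(frame, sr, fmin=55.0, fmax=4000.0):
    """Toy 12-bin chroma for one frame, following Eq. (2): average the
    log-magnitude DFT bins that fall in each pitch class (0 = C)."""
    logmag = np.log1p(np.abs(np.fft.rfft(frame)))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    chroma = np.zeros(12)
    counts = np.zeros(12)
    for f, m in zip(freqs, logmag):
        if fmin <= f <= fmax:
            # map frequency to a pitch class; 261.626 Hz is middle C
            pc = int(round(12 * np.log2(f / 261.626))) % 12
            chroma[pc] += m
            counts[pc] += 1
    counts[counts == 0] = 1
    return chroma / counts  # division by Q_k, as in Eq. (2)
```

A pure tone near A4 should light up pitch class 9 (A) most strongly.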
3.1.4. Mel-Frequency Cepstral Coefficients (MFCCs). After a signal is digitized, a large amount of the raw information is not needed and costs plenty of storage space; the power spectrum is often adopted to encode the signal and solve this problem [16]. It is noted that MFCCs behave similarly to the human auditory perception system. The feature is adopted in various research topics, including speaker recognition, speech recognition, and music emotion recognition. For example, Cooper and Foote extracted MFCCs from music signals and found that MFCCs are similar to music timbre expression [17]. In [18], MFCCs were also shown to have good performance in music recommendation.
Figure 2: Example of a chromagram from a piece of music (chroma class vs. time in seconds).
MFCC extraction is based on the spectrum, which can be obtained using the discrete Fourier transform:

$$X_w(f) = \sum_{n=0}^{N-1} x_w(n) \exp\left(-j\,\frac{2\pi f n}{N}\right) \qquad (3)$$
After the power spectrum is extracted, subband energies can be computed using Mel filter banks, and the logarithm of each energy is then evaluated as follows:

$$S_i = \log\left(\sum_{f=F_l}^{F_h} L(i, f)\,\left|X_w(f)\right|^{2}\right) \qquad (4)$$
where F_h is the discrete frequency index corresponding to the high cutoff frequency, F_l is the discrete frequency index corresponding to the low cutoff frequency, and L(i, f) is the amplitude of the f-th discrete frequency index of the i-th Mel window. The number of Mel windows often ranges from 20 to 24. Finally, MFCCs are obtained by performing the discrete cosine transform (DCT) [19]. In Figure 3, the average MFCC values from a piece of music are exemplified.
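The chain of Eqs. (3)-(4) followed by a DCT can be sketched in Python/NumPy. This is a hedged illustration: the triangular filter-bank layout, frequency limits, and all names are assumptions for the sketch, not the paper's exact configuration:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc_frame(frame, sr, n_mels=20, n_coeffs=13):
    """Sketch of the MFCC steps: DFT power spectrum -> Mel-filtered
    log subband energies S_i (Eq. (4)) -> DCT of the log energies."""
    power = np.abs(np.fft.rfft(frame)) ** 2          # |X_w(f)|^2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    # triangular Mel filter bank L(i, f) spanning 0 Hz .. sr/2
    hz_pts = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2))
    S = np.empty(n_mels)
    for i in range(n_mels):
        lo, mid, hi = hz_pts[i], hz_pts[i + 1], hz_pts[i + 2]
        up = np.clip((freqs - lo) / (mid - lo), 0, 1)
        down = np.clip((hi - freqs) / (hi - mid), 0, 1)
        L = np.minimum(up, down)                     # triangular window
        S[i] = np.log(np.sum(L * power) + 1e-10)     # Eq. (4), guarded
    # DCT-II of the log energies gives the cepstral coefficients
    n = np.arange(n_mels)
    dct = np.cos(np.pi / n_mels * (n[None, :] + 0.5) * n[:, None])
    return (dct @ S)[:n_coeffs]
```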
3.1.5. Spectrum Centroid. The spectrum centroid is an economical description of the shape of the power spectrum [20-22]. Additionally, it is correlated with a major perceptual dimension of timbre, that is, sharpness. Figure 4 gives an example of a spectrum and its spectrum centroid obtained from a frame in a piece of music; the spectrum centroid value is 2638 Hz in this example.
3.1.6. Spectrum Spread. The spectrum spread is an economical descriptor of the shape of the power spectrum that indicates whether it is concentrated in the vicinity of its centroid or else spread out over the spectrum [20-22]. It allows differentiating between tone-like and noise-like sounds. In Figure 5, an example of spectrum spread from a piece of music is provided.
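Both descriptors can be sketched together: the centroid as the power-weighted mean frequency and the spread as the power-weighted standard deviation around it. This is a generic sketch with illustrative names, not the MPEG-7 reference implementation:

```python
import numpy as np

def spectral_centroid_spread(frame, sr):
    """Spectral centroid (power-weighted mean frequency) and spread
    (power-weighted std. deviation around the centroid) of one frame."""
    power = np.abs(np.fft.rfft(frame)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    total = np.sum(power) + 1e-12
    centroid = np.sum(freqs * power) / total
    spread = np.sqrt(np.sum(((freqs - centroid) ** 2) * power) / total)
    return centroid, spread
```

A pure tone gives a centroid at its frequency and a spread near zero, matching the tone-like vs. noise-like distinction above.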
Figure 3: Example of average MFCC values from a piece of music (magnitude vs. coefficient rank, 1-13).
Figure 4: Example of a spectrum and its spectrum centroid from a frame in a piece of music (magnitude vs. frequency, 0-8000 Hz).
3.1.7. Ratio of a Spectral Flatness Measure to a Spectral Center (RSS). RSS was proposed for speaker-independent emotional speech recognition [23]. RSS is the ratio of spectrum flatness to spectrum centroid and is calculated by
$$\text{RSS} = \frac{1000 \times \text{SF}}{\text{SC}} \qquad (5)$$

where SF denotes the spectrum flatness and SC the spectrum centroid.
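Eq. (5) can be sketched as follows, assuming the common definition of spectral flatness as the ratio of the geometric mean to the arithmetic mean of the power spectrum (the paper does not spell out its flatness formula, so this is an assumption; all names are illustrative):

```python
import numpy as np

def rss(frame, sr, eps=1e-12):
    """RSS of Eq. (5): 1000 * spectral flatness / spectral centroid.
    Flatness = geometric mean / arithmetic mean of the power spectrum;
    centroid = power-weighted mean frequency."""
    power = np.abs(np.fft.rfft(frame)) ** 2 + eps
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    flatness = np.exp(np.mean(np.log(power))) / np.mean(power)
    centroid = np.sum(freqs * power) / np.sum(power)
    return 1000.0 * flatness / (centroid + eps)
```

White noise (flat spectrum) yields a much larger RSS than a pure tone, whose flatness is nearly zero.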
3.2. Extraction of First-Level Decision Vector. The acoustical feature set is utilized to generate the first-level decision vector, with each element being a significant value of an emotion. This approach is able to interpret the emotional content by providing multiple probabilistic class labels rather than a single hard label [24]. For example, happiness emotion not
Figure 5: Example of spectrum spread from a piece of music (frequency vs. frame index).
only contains happiness content but also other properties that are similar to the content of peacefulness. This similarity to peacefulness may cause a music clip to be recognized as an incorrect emotion class. In this example, the advantage of the first-level decision vector representation is its ability to convey the evidence of both the happiness and peaceful emotions. This paper uses the significant values of eight emotions (angry, happy, sad, relaxed, pleased, bored, nervous, and peaceful) to construct an emotion profile feature vector. To calculate the significant value of an emotion, we construct its 2-class SVM with calm emotion as the background side of the SVM.
4. Principal Component Analysis
PCA is an important mathematical technique in feature extraction. In this paper, PCA is implemented to reduce the dimensions of the extracted features. The first step of PCA is to calculate the d-dimensional mean vector u and the d × d covariance matrix Σ of the samples [25]. After that, the eigenvectors and eigenvalues are computed. Finally, the k eigenvectors with the largest eigenvalues are selected to form a d × k matrix M whose columns consist of those k eigenvectors; the remaining dimensions are, in effect, noise. The PCA-transformed data can be written in the form

$$\mathbf{x}' = \mathbf{M}^{T}(\mathbf{x} - \mathbf{u}) \qquad (6)$$
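The steps above can be sketched in a few lines of NumPy; the function name and return convention are illustrative assumptions:

```python
import numpy as np

def pca_fit_transform(X, k):
    """PCA as in Eq. (6): x' = M^T (x - u), with M the d x k matrix of
    the top-k eigenvectors of the sample covariance. X has shape (n, d)."""
    u = X.mean(axis=0)
    Xc = X - u
    cov = np.cov(Xc, rowvar=False)                 # d x d covariance matrix
    vals, vecs = np.linalg.eigh(cov)               # ascending eigenvalues
    M = vecs[:, np.argsort(vals)[::-1][:k]]        # top-k eigenvectors
    return Xc @ M, M, u                            # projected data (n, k)
```

By construction, the projected components are uncorrelated, since M diagonalizes the covariance matrix.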
5. Emotion Classifier
The emotion classifier used in the proposed system adopts a 2-level hierarchical structure of sparse kernel machines. The first-level SVMs use the probability product kernel, while the second-level RVM adopts a traditional radial basis function kernel with the first-level decision vector as its feature.
5.1. Support Vector Machine. SVM theory is an effective statistical technique and has drawn much attention in audio classification tasks [7]. An SVM is a binary classifier that creates an optimal hyperplane to classify input samples; this optimal hyperplane linearly divides the two classes with the largest margin [23]. Denote T = {(x_i, y_i), i = 1, 2, ..., N} as a training set for the SVM; each pair (x_i, y_i) means that training sample x_i belongs to class y_i, where y_i ∈ {+1, -1}. The fundamental concept is to choose a hyperplane that classifies T accurately while maximizing the distance between the two classes. This means finding a pair (w, b) such that

$$y_i\left(\mathbf{w} \cdot \mathbf{x}_i + b\right) > 0, \qquad i = 1, \ldots, N \qquad (7)$$
where w ∈ R^N is normalized by itself and b ∈ R. The pair (w, b) defines a separating hyperplane of equation

$$\mathbf{w} \cdot \mathbf{x} + b = 0 \qquad (8)$$
If there exists a hyperplane satisfying (7), the set T is said to be linearly separable, and we can rescale w and b so that

$$y_i\left(\mathbf{w} \cdot \mathbf{x}_i + b\right) \geq 1, \qquad i = 1, \ldots, N \qquad (9)$$
According to (9), we can derive an objective function under constraints:

$$\min \|\mathbf{w}\|^2 \quad \text{subject to } y_i\left(\mathbf{w} \cdot \mathbf{x}_i + b\right) \geq 1, \quad i = 1, \ldots, N \qquad (10)$$
Since ‖w‖² is convex, we can solve (10) by applying the classical method of Lagrange multipliers:

$$\min\; \|\mathbf{w}\|^2 - \sum_{i=1}^{N} \mu_i \left[y_i\left(\mathbf{w} \cdot \mathbf{x}_i + b\right) - 1\right] \qquad (11)$$
We denote U = (μ_1, μ_2, ..., μ_N) as the N nonnegative Lagrange multipliers associated with (10). After solving (11), the optimal hyperplane has the following expansion:

$$\mathbf{w} = \sum_{i=1}^{N} \mu_i y_i \mathbf{x}_i \qquad (12)$$

b can be determined from U and from the Kuhn-Tucker conditions:

$$\mu_i\left(y_i\left(\mathbf{w} \cdot \mathbf{x}_i + b\right) - 1\right) = 0, \qquad i = 1, 2, \ldots, N \qquad (13)$$
According to (12), the expected hyperplane is a linear combination of training samples. The training samples (x_i, y_i) with nonzero Lagrange multipliers are called support vectors. Finally, the decision value for a new data point x can be written as

$$\text{dec}(\mathbf{x}) = \sum_{i=1}^{N} \mu_i y_i\, \mathbf{x}_i \cdot \mathbf{x} + b \qquad (14)$$
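The decision function of Eq. (14) for the linear case can be sketched directly (names are illustrative; the paper's system uses LIBSVM with a probability product kernel rather than this dot product):

```python
import numpy as np

def svm_decision(x, support_x, support_y, mu, b):
    """Linear SVM decision value of Eq. (14):
    dec(x) = sum_i mu_i * y_i * (x_i . x) + b."""
    return sum(m * y * np.dot(xi, x)
               for m, y, xi in zip(mu, support_y, support_x)) + b
```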
Functions that satisfy Mercer's theorem can be used as kernels. In this paper, the probability product kernel is adopted.
5.2. Probability Product Support Vector Machine. A function can be used as a kernel function if it satisfies Mercer's theorem. Using Mercer's theorem, we can introduce a mapping function φ(x) such that k(x_j, x_i) = φ(x_j) · φ(x_i). This provides the ability to handle nonlinear data by mapping the original input space R^d into some other space.

In this paper, the probability product kernel is utilized. The probability product kernel is a method of measuring similarity between distributions, and it has the property of being a simple and intuitively compelling concept [26]. The probability product kernel computes a generalized inner product between two probability distributions in a Hilbert space. A positive definite kernel k : O × O → R on an input space O and examples o_1, o_2, ..., o_m ∈ O are defined. First, the input data are mapped to probability distributions by fitting separate probabilistic models p_1(x), p_2(x), ..., p_m(x) to o_1, o_2, ..., o_m. After that, a kernel k^prob(p_i, p_j) between probability distributions on O is defined. Finally, the kernel between examples is defined to be equal to k^prob between the corresponding distributions:

$$k\left(\mathbf{o}_i, \mathbf{o}_j\right) = k^{\text{prob}}\left(p_i, p_j\right) \qquad (15)$$
Finally, this kernel is applied to the SVM, which then proceeds as usual. The probability product kernel between distributions p_i and p_j is defined as

$$k\left(p_i, p_j\right) = \int_{\mathbf{x}} p_i^{\rho}(\mathbf{x})\, p_j^{\rho}(\mathbf{x})\, d\mathbf{x} = \left\langle p_i^{\rho}, p_j^{\rho} \right\rangle_{L_2} \qquad (16)$$

where p_i and p_j are probability distributions on a space O; it is assumed that p_i^ρ, p_j^ρ ∈ L_2(O), where L_2 is a Hilbert space and ρ is a positive constant. The probability product kernel allows us to introduce prior knowledge of the data. In this paper, we assume a d-dimensional Gaussian distribution for our data.
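For intuition, Eq. (16) can be evaluated numerically for two 1-D Gaussians (the paper assumes d-dimensional Gaussians; this sketch uses d = 1 and simple quadrature, with all names illustrative). With ρ = 1/2 this is the Bhattacharyya kernel, which equals 1 for identical distributions:

```python
import numpy as np

def gaussian_pdf(x, mean, var):
    return np.exp(-(x - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def prob_product_kernel(mean_i, var_i, mean_j, var_j, rho=0.5):
    """Eq. (16) for two 1-D Gaussians, via Riemann-sum quadrature:
    k(p_i, p_j) = integral of p_i(x)^rho * p_j(x)^rho dx."""
    s = 10 * np.sqrt(max(var_i, var_j))
    grid = np.linspace(min(mean_i, mean_j) - s, max(mean_i, mean_j) + s, 20001)
    vals = gaussian_pdf(grid, mean_i, var_i) ** rho * gaussian_pdf(grid, mean_j, var_j) ** rho
    return np.sum(vals) * (grid[1] - grid[0])
```

The kernel value shrinks as the two distributions move apart, which is exactly the similarity behavior the first-level SVMs exploit.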
5.3. First-Level Decision Vector Extraction. The first-level decision vector presents the perception probability of each of the eight emotion-specific decisions; it is extracted from the input data by collecting the decision value from each model. The decision value of an SVM represents the degree of similarity between the model and the testing data, and this similarity measure can be used to find out which model fits the data most accurately [24]. Using the first-level decision vector, the most probably perceived emotion in music can be detected.
5.4. Relevance Vector Machine. The RVM is a development of the SVM. Different from the SVM, the RVM tries to find a set of weights with the highest sparsity [27]. The model defines a conditional distribution for a target class y ∈ {0, 1} given an input set x_1, ..., x_n [28]. Assume that a training datum can be expressed as a linear combination of weighted nonlinear basis functions φ_i(x), which is then transformed by a logistic sigmoid function:

$$f(\mathbf{x}; \mathbf{w}) = \mathbf{w}^{T}\boldsymbol{\phi}(\mathbf{x}) \qquad (17)$$

where w = (w_1, w_2, ..., w_G)^T denotes the weights and φ(x) = (φ_1(x), φ_2(x), ..., φ_G(x))^T. In order to make the weights sparse, a Bayesian probabilistic framework is implemented to find the distribution over the weights instead of using a pointwise estimation; therefore, a separate hyperparameter a for each
Figure 6: DET curve of the proposed system (miss probability vs. false alarm probability, in percent).
of the weight parameters w_i is introduced. According to Bayes' rule, the posterior probability of w is

$$p(\mathbf{w} \mid \mathbf{y}, \mathbf{a}) = \frac{p(\mathbf{y} \mid \mathbf{w}, \mathbf{a})\, p(\mathbf{w} \mid \mathbf{a})}{p(\mathbf{y} \mid \mathbf{a})} \qquad (18)$$

where p(y | w, a) is the likelihood, p(w | a) is the prior conditioned on the weights, a = [a_1, ..., a_n]^T, and p(y | a) denotes the evidence. Because y is a binary variable, the likelihood function can be given by
$$p(\mathbf{y} \mid \mathbf{w}, \mathbf{a}) = \prod_{i=1}^{n} \left[\sigma\left(f(\mathbf{x}_i; \mathbf{w})\right)\right]^{y_i} \left[1 - \sigma\left(f(\mathbf{x}_i; \mathbf{w})\right)\right]^{1 - y_i} \qquad (19)$$

where σ(f) = 1/(1 + e^{-f}) is the logistic sigmoid link function. From (18), it can be found that a significant proportion of the hyperparameters tend to infinity, and the corresponding posterior distributions of the weight parameters are concentrated at zero. Therefore, the basis functions multiplied by these parameters are not taken into account when training the model. As a result, the model is sparse.
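The Bernoulli likelihood of Eq. (19), in log form for numerical stability, can be sketched as follows (a minimal illustration with assumed names, not the PRT toolbox implementation):

```python
import numpy as np

def sigmoid(f):
    return 1.0 / (1.0 + np.exp(-f))

def rvm_log_likelihood(Phi, w, y):
    """Log of the Bernoulli likelihood in Eq. (19) for the model
    f(x; w) = w^T phi(x). Phi is (n, G): one row of basis values per sample."""
    p = sigmoid(Phi @ w)                      # sigma(f(x_i; w)) for each sample
    eps = 1e-12                               # guard against log(0)
    return np.sum(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
```

With all weights at zero, every sample has probability 0.5, so the log-likelihood is n·log(0.5).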
6. Experimental Results
In the experiments, we collected one hundred songs from two websites, All Music Guide [29] and Last.fm [30], to construct a music emotion database. As mentioned before, music may contain multiple emotions; if we know which emotion class a song most likely belongs to, we know the main emotion of the song. Songs on Last.fm are tagged by many people on the Internet, and we choose the emotion tagged by the most people as the ground truth of each datum.
The database consists of nine classes of emotions, including happy, angry, sad, bored, nervous, relaxed, pleased, calm, and peaceful. Calm is taken as a model's opposite side when training the models. Each emotion class contains twenty songs. Each song is thirty seconds long and is divided into five-second clips. Half of the songs are used as training data, and the others are used as testing data; in this paper, 240 music clips are tested. All of the songs are Western music and are encoded in 16 kHz WAV format. The used acoustical features are listed in Table 1; the whole feature set dimension is 30. The SVMs are based on the LIBSVM library [31], and the RVM is based on the PRT toolbox [32]. The system performance is evaluated in terms of the DET curve. Figure 6 depicts the DET curve of the proposed happiness verification system. The proposed system achieves a 13.33% equal error rate (EER). From these results, we see that the system performs well on happiness emotion verification in music.
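The EER reported from the DET curve can be computed from verification scores by sweeping a threshold and finding where the miss rate crosses the false-alarm rate, the two axes of a DET plot. This is a generic sketch (names and the sweep strategy are assumptions), not the paper's evaluation code:

```python
import numpy as np

def equal_error_rate(target_scores, nontarget_scores):
    """EER sketch: sweep a threshold over all observed scores and return
    the operating point where miss rate (targets rejected) and
    false-alarm rate (non-targets accepted) are closest."""
    thresholds = np.sort(np.concatenate([target_scores, nontarget_scores]))
    best_gap, eer = 1.0, None
    for t in thresholds:
        miss = np.mean(target_scores < t)
        fa = np.mean(nontarget_scores >= t)
        if abs(miss - fa) < best_gap:
            best_gap, eer = abs(miss - fa), (miss + fa) / 2.0
    return eer
```

Perfectly separated scores give an EER of 0; fully overlapping scores push it toward 0.5.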
7. Conclusion
Detecting emotion in music has become the concern of many researchers in recent years. In this paper, we proposed a first-level decision-vector-based music happiness detection system. The proposed system adopts a hierarchical structure of sparse kernel machines. First, eight SVM models are trained on acoustical features with the probability product kernel. Then, eight decision values are extracted to construct the first-level decision vector feature. After that, these eight decision values are used as a new feature to train and test a 2-class RVM with happiness as the target side. The probability value of the RVM is used to verify happiness content in music. Experimental results reveal that the proposed system can achieve a 13.33% equal error rate (EER).
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
References
[1] R. E. Milliman, "Using background music to affect the behavior of supermarket shoppers," Journal of Marketing, vol. 46, no. 3, pp. 86-91, 1982.
[2] C.-H. Yeh, H.-H. Lin, and H.-T. Chang, "An efficient emotion detection scheme for popular music," in Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS '09), pp. 1799-1802, Taipei City, Taiwan, May 2009.
[3] Y.-H. Yang, Y.-C. Lin, Y.-F. Su, and H. H. Chen, "A regression approach to music emotion recognition," IEEE Transactions on Audio, Speech, and Language Processing, vol. 16, no. 2, pp. 448-457, 2008.
[4] Y. H. Yang and H. H. Chen, "Prediction of the distribution of perceived music emotions using discrete samples," IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 7, pp. 2184-2196, 2011.
[5] B. Han, S. Rho, R. B. Dannenberg, and E. Hwang, "SMERS: music emotion recognition using support vector regression," in Proceedings of the International Conference on Music Information Retrieval, Kobe, Japan, 2009.
[6] L. Lu, D. Liu, and H.-J. Zhang, "Automatic mood detection and tracking of music audio signals," IEEE Transactions on Audio, Speech, and Language Processing, vol. 14, no. 1, pp. 5-18, 2006.
[7] C.-Y. Chang, C.-Y. Lo, C.-J. Wang, and P.-C. Chung, "A music recommendation system with consideration of personal emotion," in Proceedings of the International Computer Symposium (ICS '10), pp. 18-23, Tainan City, Taiwan, December 2010.
[8] F. C. Hwang, J. S. Wang, P. C. Chung, and C. F. Yang, "Detecting emotional expression of music with feature selection approach," in Proceedings of the International Conference on Orange Technologies (ICOT '13), pp. 282-286, March 2013.
[9] K. Hevner, "Expression in music: a discussion of experimental studies and theories," Psychological Review, vol. 42, no. 2, pp. 186-204, 1935.
[10] M. Chouchane, S. Paris, F. Le Gland, C. Musso, and D.-T. Pham, "On the probability distribution of a moving target: asymptotic and non-asymptotic results," in Proceedings of the 14th International Conference on Information Fusion (FUSION '11), pp. 1-8, July 2011.
[11] O. Lartillot and P. Toiviainen, "MIR in Matlab (II): a toolbox for musical feature extraction from audio," in Proceedings of the International Conference on Music Information Retrieval, pp. 127-130, 2007, https://www.jyu.fi/hum/laitokset/musiikki/en/research/coe/materials/mirtoolbox.
[12] C.-W. Chen, K. Lee, and H.-H. Wu, "Towards a class-based representation of perceptual tempo for music retrieval," in Proceedings of the 8th International Conference on Machine Learning and Applications (ICMLA '09), pp. 602-607, December 2009.
[13] W. Chai, "Semantic segmentation and summarization of music," IEEE Signal Processing Magazine, vol. 23, no. 2, pp. 124-132, 2006.
[14] M. A. Bartsch and G. H. Wakefield, "Audio thumbnailing of popular music using chroma-based representations," IEEE Transactions on Multimedia, vol. 7, no. 1, pp. 96-104, 2005.
[15] X. Yu, J. Zhang, J. Liu, W. Wan, and W. Yang, "An audio retrieval method based on chromagram and distance metrics," in Proceedings of the International Conference on Audio, Language and Image Processing (ICALIP '10), pp. 425-428, Shanghai, China, November 2010.
[16] J. O. Garcia and C. A. R. Garcia, "Mel-frequency cepstrum coefficients extraction from infant cry for classification of normal and pathological cry with feed-forward neural networks," in Proceedings of the International Joint Conference on Neural Networks, pp. 3140-3145, July 2003.
[17] C. Y. Lin and S. Cheng, "Multi-theme analysis of music emotion similarity for jukebox application," in Proceedings of the International Conference on Audio, Language and Image Processing (ICALIP '12), pp. 241-246, July 2012.
[18] B. Shao, M. Ogihara, D. Wang, and T. Li, "Music recommendation based on acoustic features and user access patterns," IEEE Transactions on Audio, Speech, and Language Processing, vol. 17, pp. 1602-1611, 2009.
[19] W.-Q. Zhang, D. Yang, J. Liu, and X. Bao, "Perturbation analysis of mel-frequency cepstrum coefficients," in Proceedings of the International Conference on Audio, Language and Image Processing (ICALIP '10), pp. 715-718, Shanghai, China, November 2010.
[20] H. G. Kim, N. Moreau, and T. Sikora, MPEG-7 Audio and Beyond: Audio Content Indexing and Retrieval, Wiley, New York, NY, USA, 2005.
[21] ISO/IEC JTC1 SC29 WG11, Moving Pictures Experts Group, "Information technology - multimedia content description interface - part 4: Audio," Committee Draft 15938-4, ISO/IEC, 2000.
[22] M. Casey, "MPEG-7 sound-recognition tools," IEEE Transactions on Circuits and Systems for Video Technology, vol. 11, no. 6, pp. 737-747, 2001.
[23] V. Vapnik, Statistical Learning Theory, Wiley, New York, NY, USA, 1998.
[24] E. Mower, M. J. Mataric, and S. Narayanan, "A framework for automatic human emotion classification using emotion profiles," IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 5, pp. 1057-1070, 2011.
[25] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, Wiley, New York, NY, USA, 2nd edition, 2001.
[26] T. Jebara, R. Kondor, and A. Howard, "Probability product kernels," Journal of Machine Learning Research, vol. 5, pp. 819-844, 2004.
[27] F. A. Mianji and Y. Zhang, "Robust hyperspectral classification using relevance vector machine," IEEE Transactions on Geoscience and Remote Sensing, vol. 49, no. 6, pp. 2100-2112, 2011.
[28] C. M. Bishop, Pattern Recognition and Machine Learning (Information Science and Statistics), Springer, New York, NY, USA, 2nd edition, 2007.
[29] "The All Music Guide," http://www.allmusic.com.
[30] "Last.fm," http://cn.last.fm/home.
[31] C. C. Chang and C. J. Lin, "LIBSVM: a library for support vector machines," 2001, http://www.csie.ntu.edu.tw/~cjlin/libsvm.
[32] "Pattern Recognition Toolbox," http://www.newfolderconsulting.com/prt.
2 The Scientific World Journal
Figure 1: Block diagram of the proposed system. (The diagram shows the emotional music database passing through audio preprocessing and time-frequency processing; feature extraction of spectrum, tempo estimation, chromagram, RMS energy, MFCC, spectrum centroid, spectrum spread, and RSS; principal component analysis; first-level decision vector extraction by SVMs with the probability product kernel; and a relevance vector machine that outputs the verified emotion.)
arousal and valence values. In [6], a hierarchical framework was adopted to detect emotion from acoustic music data. The method has the advantage of emphasizing the proper features in different detection tasks. In [7], a support vector machine was applied to detect emotional content in music. In [8], kernel-based class separability was used to weight features. After feature selection, principal component analysis and linear discriminant analysis were applied, and a k-nearest neighbor (KNN) classifier was then implemented. In this paper, a music emotion detection system is proposed. The system establishes a hierarchical sparse kernel machine. In the first level, eight 2-class SVM models are trained, with the eight emotion classes as the target sides, respectively. It is noted that emotion perception is usually based not on a single acoustical feature but on a combination of acoustical features [4, 9]. This paper adopts an acoustical feature set comprising root mean square energy (RMS energy), tempo, chromagram, MFCCs, spectrum centroid, spectrum spread, and the ratio of a spectral flatness measure to a spectral center (RSS). Each of these features is normalized. In the second level of the hierarchical sparse kernel machines, a 2-class relevance vector machine (RVM) model with happiness as the target side and the other emotions as the background side is trained. The first-level decision vector is used as the feature in this level.
The rest of this paper is organized as follows. The system overview is described in Section 2. The features and the first-level decision vector extraction are described in Section 3. Principal component analysis is described in Section 4. SVM and RVM are introduced in Section 5. Section 6 shows our experimental results. The conclusion is given in Section 7.
2. System Overview

The block diagram of the proposed system is presented in Figure 1. The system mainly comprises two levels of sparse kernel machines. For the first-level SVMs, we use a set of acoustical features, which includes RMS energy, tempo, chromagram, MFCCs, spectrum centroid, spectrum spread, and RSS. In Table 1, the used acoustical features are classified into four main types, that is, dynamic, rhythm, timbre, and tonality.
Table 1: The proposed acoustical feature set.

Feature class | Feature name (dimension of feature)
------------- | -----------------------------------
Dynamic       | RMS energy (1)
Rhythm        | Tempo (1)
Timbre        | MFCCs (13), spectrum centroid (1), spectrum spread (1), RSS (1)
Tonality      | Chromagram (12)
Because each feature's scale is different, normalization of the whole feature set is performed [10]. After normalization, eight SVM models are trained to transform the acoustical features into emotion profile features. Each of the eight SVM models is trained and tested using the probability product kernel. We use the first-level decision vectors generated from the angry, happy, sad, relaxed, pleased, bored, nervous, and peaceful emotion classes. To calculate the value in the emotion profile features corresponding to an emotion, we construct its 2-class SVM with calm emotion as the background side of the SVM. For the RVM, a conventional radial basis function kernel is used, and the first-level decision vector extracted in the first level is utilized as the feature. To verify happiness emotion, a 2-class RVM with happiness as the target side and the other emotions as the background side is constructed. For a tested music clip, the probability value obtained from this 2-class RVM is used to judge whether the music clip belongs to the happiness emotion or not.
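The paper does not state which normalization scheme [10] refers to; the sketch below assumes a per-feature z-score normalization (zero mean, unit variance per column), a common choice when feature scales differ. The function name `normalize_features` is illustrative, not from the paper.

```python
import numpy as np

def normalize_features(X, eps=1e-12):
    """Z-score normalize each column (feature) of X: zero mean, unit variance.

    X: array of shape (n_clips, n_features); eps guards against constant columns.
    """
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    return (X - mu) / (sigma + eps)
```

The same per-feature statistics computed on the training set would be reused to normalize test clips, so that training and testing data share one scale.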
3. Extraction of Acoustical Features and First-Level Decision Vector Feature
In the 2-level hierarchical sparse kernel machines, the first-level SVMs use acoustical features, while the second-level RVM adopts the first-level decision vector. For the acoustical features, the proposed system extracts RMS energy, tempo, chromagram, MFCCs, spectrum centroid, spectrum spread, and RSS. The extraction of these acoustical features, as well as of the first-level decision vectors, is described in the following.
3.1. Extraction of Acoustical Features

3.1.1. RMS Energy. RMS energy is also called root mean square energy. It computes the global energy of an input signal $x$ [11]. The operation is defined as follows:

$$X_{\mathrm{RMS}} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} x_i^2}, \qquad (1)$$

where $n$ is the signal's length, in hundredths of a second by default.
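Equation (1) can be sketched in a few lines of numpy; `rms_energy` is an illustrative helper, not code from the paper.

```python
import numpy as np

def rms_energy(x):
    """Root mean square energy of a signal, as in Eq. (1)."""
    x = np.asarray(x, dtype=float)
    return np.sqrt(np.mean(x ** 2))
```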
3.1.2. Tempo. Many tempo estimation methods have been proposed. The estimation of tempo is based on detecting periodicities in a range of BPMs [12]. First, significant onset events are detected in the frequency domain [11]. Then, the events that best represent the tempo of the song are found, which means choosing the maximum periodicity score for each frame separately.
3.1.3. Chromagram. Chroma, which is also called the harmonic pitch class profile, has a strong relationship with the structure of music [13]. A chromagram is a joint distribution of signal strength over the variables of time and chroma. Chroma is a frame-based representation of audio and is similar to the short-time Fourier transform. In music clips, frequency components belonging to the same pitch class are extracted by the chromagram and transformed into a 12-dimensional representation, comprising C, C#, D, D#, E, F, F#, G, G#, A, A#, and B. The chromagram can present the distribution of energy along the pitches or pitch classes [11, 14]. In [14], the chromagram is defined as a remapping of the time-frequency distribution. The chromagram is extracted by

$$v(t, k) = \sum_{n \in S_k} \frac{X_t(n)}{Q_k}, \qquad k \in \{0, 1, 2, \ldots, 11\}, \qquad (2)$$

where $X_t(n)$ is the logarithmic magnitude of the discrete Fourier transform of the $t$th frame, $S_k$ is the subset of the discrete frequency space assigned to pitch class $k$, and $Q_k$ is the number of elements in that subset [15].

In Figure 2, the chromagram of a piece of music is exemplified.
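A minimal single-frame sketch of the bin-to-pitch-class mapping behind (2). It assumes A4 = 440 Hz and assigns each DFT bin to the pitch class of its nearest semitone; the paper's exact subsets $S_k$ may be chosen differently, so treat this as one plausible realization.

```python
import numpy as np

def chroma_frame(frame, sr):
    """One chroma vector per Eq. (2): log-magnitude DFT bins are grouped into
    12 pitch classes (C = 0, ..., B = 11) and averaged (division by Q_k)."""
    spec = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    logmag = np.log1p(spec)          # log magnitude; log1p avoids log(0)
    chroma = np.zeros(12)
    counts = np.zeros(12)
    for f, m in zip(freqs, logmag):
        if f < 27.5:                 # skip bins below the audible pitch range
            continue
        midi = 69 + 12 * np.log2(f / 440.0)   # nearest MIDI note number
        k = int(round(midi)) % 12             # pitch class index 0..11
        chroma[k] += m
        counts[k] += 1
    counts[counts == 0] = 1
    return chroma / counts           # average per class, i.e., divide by Q_k
```

Applied frame by frame, this yields the time-chroma distribution pictured in Figure 2.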
3.1.4. Mel-Frequency Cepstral Coefficients (MFCCs). After a signal is digitized, a large amount of the information is not needed and costs plenty of storage space. The power spectrum is often adopted to encode the signal and solve this problem [16]. It is noted that MFCCs perform similarly to the human auditory perception system. The feature is adopted in various research topics, including speaker recognition, speech recognition, and music emotion recognition. For example, Cooper and Foote extracted MFCCs from music signals and found that MFCCs are close to music timbre expression [17]. In [18], MFCCs were also shown to perform well in music recommendation.
Figure 2: Example of a chromagram from a piece of music (chroma class C through B versus time in seconds).
MFCC extraction is based on the spectrum. The spectrum can be extracted by using the discrete Fourier transform:

$$x_w(f) = \sum_{n=0}^{N} x_w(n) \exp\left(-j \frac{2\pi f n}{N}\right). \qquad (3)$$
After the power spectrum is extracted, the subband energies can be extracted by using Mel filter banks, and the logarithm of the energies is then evaluated as follows:

$$S_i = \log \sum_{f=F_l}^{F_h} L(i, f) \left| X_w(f) \right|^2, \qquad (4)$$

where $F_h$ is the discrete frequency index corresponding to the high cutoff frequency, $F_l$ is the discrete frequency index corresponding to the low cutoff frequency, and $L(i, f)$ is the amplitude of the $f$th discrete frequency index of the $i$th Mel window. The number of Mel windows often ranges from 20 to 24. Finally, the MFCCs are obtained by performing a discrete cosine transform (DCT) [19]. In Figure 3, the average MFCC values of a piece of music are exemplified.
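The chain in (3) and (4) can be sketched as below: power spectrum, triangular Mel filter bank, log subband energies, then a DCT-II. The filter count and placement are common defaults, not taken from the paper, so this is an illustrative approximation rather than the authors' exact extractor.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc_frame(frame, sr, n_filters=20, n_coeffs=13):
    """MFCCs of one frame per Eqs. (3)-(4): power spectrum -> triangular Mel
    filters L(i, f) -> log energies S_i -> DCT-II."""
    spec = np.abs(np.fft.rfft(frame)) ** 2               # power spectrum
    freqs = np.fft.rfftfreq(len(frame), 1.0 / sr)
    # Triangular filters evenly spaced on the Mel scale between 0 and sr/2.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    hz_pts = mel_to_hz(mel_pts)
    log_e = np.empty(n_filters)
    for i in range(n_filters):
        lo, mid, hi = hz_pts[i], hz_pts[i + 1], hz_pts[i + 2]
        up = np.clip((freqs - lo) / (mid - lo), 0.0, 1.0)
        down = np.clip((hi - freqs) / (hi - mid), 0.0, 1.0)
        tri = np.minimum(up, down)                       # L(i, f)
        log_e[i] = np.log(np.sum(tri * spec) + 1e-12)    # S_i of Eq. (4)
    # DCT-II of the log filter-bank energies.
    k = np.arange(n_coeffs)[:, None]
    n = np.arange(n_filters)[None, :]
    dct = np.cos(np.pi * k * (2 * n + 1) / (2 * n_filters))
    return dct @ log_e
```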
3.1.5. Spectrum Centroid. The spectrum centroid is an economical description of the shape of the power spectrum [20-22]. Additionally, it is correlated with a major perceptual dimension of timbre, that is, sharpness. Figure 4 gives an example of a spectrum and its spectrum centroid obtained from a frame in a piece of music. The spectrum centroid value is 2638 Hz in this example.
3.1.6. Spectrum Spread. The spectrum spread is an economical descriptor of the shape of the power spectrum that indicates whether the spectrum is concentrated in the vicinity of its centroid or spread out over the spectrum [20-22]. It allows differentiating between tone-like and noise-like sounds. In Figure 5, an example of the spectrum spread of a piece of music is provided.
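The paper cites [20-22] for both descriptors without giving formulas; the sketch below assumes the standard definitions, that is, the magnitude-weighted mean frequency for the centroid and the weighted standard deviation around it for the spread.

```python
import numpy as np

def centroid_and_spread(frame, sr):
    """Spectral centroid (weighted mean frequency) and spread (weighted
    standard deviation around the centroid) of one frame; standard
    definitions, assumed here since the paper does not state them."""
    mag = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), 1.0 / sr)
    w = mag / (mag.sum() + 1e-12)                 # normalized spectral weights
    centroid = np.sum(w * freqs)
    spread = np.sqrt(np.sum(w * (freqs - centroid) ** 2))
    return centroid, spread
```

For a pure tone the centroid sits at the tone's frequency and the spread is near zero, matching the tone-like versus noise-like discrimination described above.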
Figure 3: Example of average MFCC values from a piece of music (magnitude versus coefficient rank, 1 to 13).
Figure 4: Example of a spectrum and its spectrum centroid from a frame in a piece of music (magnitude versus frequency, 0 to 8000 Hz).
3.1.7. Ratio of a Spectral Flatness Measure to a Spectral Center (RSS). RSS was proposed for speaker-independent emotional speech recognition [23]. RSS is the ratio of the spectrum flatness to the spectrum centroid and is calculated by

$$\mathrm{RSS} = \frac{1000 \times \mathrm{SF}}{\mathrm{SC}}, \qquad (5)$$

where SF denotes the spectrum flatness and SC represents the spectrum centroid.
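Equation (5) can be sketched directly once SF and SC are in hand. The paper does not define its flatness measure, so the sketch assumes the usual geometric-mean-over-arithmetic-mean definition on the power spectrum.

```python
import numpy as np

def rss(frame, sr):
    """RSS per Eq. (5). Spectral flatness is taken as the geometric mean over
    the arithmetic mean of the power spectrum (a common definition, assumed
    here); the centroid is the power-weighted mean frequency."""
    power = np.abs(np.fft.rfft(frame)) ** 2 + 1e-12   # floor avoids log(0)
    freqs = np.fft.rfftfreq(len(frame), 1.0 / sr)
    flatness = np.exp(np.mean(np.log(power))) / np.mean(power)
    centroid = np.sum(freqs * power) / np.sum(power)
    return 1000.0 * flatness / centroid
```

Noise-like frames (flat spectra) score higher than tone-like frames, which is the contrast RSS is meant to capture.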
3.2. Extraction of First-Level Decision Vector. The acoustical feature set is utilized to generate the first-level decision vector, with each element being the significant value of an emotion. This approach is able to interpret the emotional content by providing multiple probabilistic class labels rather than a single hard label [24]. For example, happiness emotion not only contains happiness content but also other properties that are similar to the content of peace. The similarity to peacefulness may cause a music clip to be recognized as an incorrect emotion class. In this example, the advantage of the first-level decision vector representation is its ability to convey the evidence of both the happiness and peaceful emotions. This paper uses the significant values of eight emotions (angry, happy, sad, relaxed, pleased, bored, nervous, and peaceful) to construct an emotion profile feature vector. To calculate the significant value of an emotion, we construct its 2-class SVM with calm emotion as the background side of the SVM.

Figure 5: Example of the spectrum spread of a piece of music (spread frequency in Hz versus frame index).
4. Principal Component Analysis

PCA is an important mathematical technique in feature extraction. In this paper, PCA is implemented to reduce the dimension of the extracted features. The first step of PCA is to calculate the $d$-dimensional mean vector $\mathbf{u}$ and the $d \times d$ covariance matrix $\Sigma$ of the samples [25]. After that, the eigenvectors and eigenvalues of $\Sigma$ are computed. Finally, the $k$ eigenvectors with the largest eigenvalues are selected to form a $d \times k$ matrix $\mathbf{M}$ whose columns consist of those $k$ eigenvectors; the discarded dimensions are assumed to mainly contain noise. The PCA-transformed data take the form

$$\mathbf{x}' = \mathbf{M}^{T} (\mathbf{x} - \mathbf{u}). \qquad (6)$$
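The steps above, through the projection of Eq. (6), can be sketched in a few lines of numpy:

```python
import numpy as np

def pca_transform(X, k):
    """Project samples (rows of X) onto the k leading principal components.

    Computes the mean vector u and covariance matrix of X, takes the k
    eigenvectors with the largest eigenvalues as the columns of M, and
    returns (X - u) M, i.e., Eq. (6) applied to every sample.
    """
    u = X.mean(axis=0)
    cov = np.cov(X - u, rowvar=False)            # d x d covariance matrix
    vals, vecs = np.linalg.eigh(cov)             # eigenvalues in ascending order
    M = vecs[:, np.argsort(vals)[::-1][:k]]      # d x k, largest-eigenvalue vectors
    return (X - u) @ M
```

A useful sanity check is that the projected components are mutually uncorrelated, since M diagonalizes the sample covariance.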
5. Emotion Classifier

The emotion classifier used in the proposed system adopts a 2-level hierarchical structure of sparse kernel machines. The first-level SVMs use the probability product kernel, while the second-level RVM adopts a traditional radial basis function kernel with the first-level decision vector feature.
5.1. Support Vector Machine. The SVM is an effective statistical technique and has drawn much attention in audio classification tasks [7]. An SVM is a binary classifier that creates an optimal hyperplane to classify input samples. This optimal hyperplane linearly divides the two classes with the largest margin [23]. Denote $T = \{(\mathbf{x}_i, y_i),\ i = 1, 2, \ldots, N\}$ as a training set for the SVM; each pair $(\mathbf{x}_i, y_i)$ means that training sample $\mathbf{x}_i$ belongs to class $y_i$, where $y_i \in \{+1, -1\}$. The fundamental concept is to choose a hyperplane that classifies $T$ accurately while maximizing the distance between the two classes. This means finding a pair $(\mathbf{w}, b)$ such that

$$y_i (\mathbf{w} \cdot \mathbf{x}_i + b) > 0, \quad i = 1, \ldots, N, \qquad (7)$$

where $\mathbf{w}$ is normalized and $b \in \mathbb{R}$. The pair $(\mathbf{w}, b)$ defines a separating hyperplane of equation

$$\mathbf{w} \cdot \mathbf{x} + b = 0. \qquad (8)$$

If there exists a hyperplane satisfying (7), the set $T$ is said to be linearly separable, and we can rescale $\mathbf{w}$ and $b$ so that

$$y_i (\mathbf{w} \cdot \mathbf{x}_i + b) \geq 1, \quad i = 1, \ldots, N. \qquad (9)$$

According to (9), we can derive an objective function under constraints:

$$\min \|\mathbf{w}\|^2 \quad \text{subject to } y_i (\mathbf{w} \cdot \mathbf{x}_i + b) \geq 1, \quad i = 1, \ldots, N. \qquad (10)$$

Since $\|\mathbf{w}\|^2$ is convex, we can solve (10) by applying the classical method of Lagrange multipliers:

$$\min \|\mathbf{w}\|^2 - \sum_{i=1}^{N} \mu_i \left[ y_i (\mathbf{w} \cdot \mathbf{x}_i + b) - 1 \right]. \qquad (11)$$

We denote $U = (\mu_1, \mu_2, \ldots, \mu_N)$ as the $N$ nonnegative Lagrange multipliers associated with (10). After solving (11), the optimal hyperplane has the following expansion:

$$\mathbf{w} = \sum_{i=1}^{N} \mu_i y_i \mathbf{x}_i. \qquad (12)$$

$b$ can be determined from $U$ and from the Kuhn-Tucker conditions:

$$\mu_i \left( y_i (\mathbf{w} \cdot \mathbf{x}_i + b) - 1 \right) = 0, \quad i = 1, 2, \ldots, N. \qquad (13)$$

According to (12), the expected hyperplane is a linear combination of the training samples. The training samples $(\mathbf{x}_i, y_i)$ with nonzero Lagrange multipliers are called support vectors. Finally, the decision value for a new data point $\mathbf{x}$ can be written as

$$\mathrm{dec}(\mathbf{x}) = \sum_{i=1}^{N} \mu_i y_i \mathbf{x}_i \cdot \mathbf{x} + b. \qquad (14)$$

Functions that satisfy Mercer's theorem can be used as kernels. In this paper, the probability product kernel is adopted.
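Equation (14) can be sketched as below, given multipliers and a bias that a trainer such as LIBSVM would supply; solving (10) itself is left to the library, so the inputs here are placeholders for an already-trained model.

```python
import numpy as np

def svm_decision(x, sv_x, sv_y, mu, b):
    """Decision value of Eq. (14): a weighted sum of inner products between
    the test point x and the support vectors sv_x, with labels sv_y,
    nonnegative Lagrange multipliers mu, and bias b."""
    return float(sum(m * y * np.dot(xi, x)
                     for m, y, xi in zip(mu, sv_y, sv_x)) + b)
```

The sign of the decision value gives the predicted class; its magnitude is the similarity measure that Section 5.3 collects into the first-level decision vector.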
5.2. Probability Product Support Vector Machine. A function can be used as a kernel function if it satisfies Mercer's theorem. Using Mercer's theorem, we can introduce a mapping function $\phi(\mathbf{x})$ such that $k(\mathbf{x}_j, \mathbf{x}_i) = \phi(\mathbf{x}_j) \cdot \phi(\mathbf{x}_i)$. This provides the ability to handle nonlinear data by mapping the original input space $\mathbb{R}^d$ into some other space.

In this paper, the probability product kernel is utilized. The probability product kernel is a method of measuring the similarity between distributions, and it is a simple and intuitively compelling concept [26]. It computes a generalized inner product between two probability distributions in a Hilbert space. A positive definite kernel $k: O \times O \to \mathbb{R}$ on an input space $O$ with examples $\mathbf{o}_1, \mathbf{o}_2, \ldots, \mathbf{o}_m \in O$ is defined as follows. First, the input data are mapped to probability distributions by fitting separate probabilistic models $p_1(x), p_2(x), \ldots, p_m(x)$ to $\mathbf{o}_1, \mathbf{o}_2, \ldots, \mathbf{o}_m$. After that, a kernel $k^{\mathrm{prob}}(p_i, p_j)$ between probability distributions on $O$ is defined. At last, the kernel between examples is defined to be equal to $k^{\mathrm{prob}}$ between the corresponding distributions:

$$k(\mathbf{o}_i, \mathbf{o}_j) = k^{\mathrm{prob}}(p_i, p_j). \qquad (15)$$

Finally, this kernel is applied to the SVM, and training proceeds as usual. The probability product kernel between distributions $p_i$ and $p_j$ is defined as

$$k(p_i, p_j) = \int_{x} p_i^{\rho}(\mathbf{x}) \, p_j^{\rho}(\mathbf{x}) \, dx = \left\langle p_i^{\rho}, p_j^{\rho} \right\rangle_{L_2}, \qquad (16)$$

where $p_i$ and $p_j$ are probability distributions on a space $O$. It is assumed that $p_i^{\rho}, p_j^{\rho} \in L_2(O)$, where $L_2$ is a Hilbert space and $\rho$ is a positive constant. The probability product kernel allows us to introduce prior knowledge of the data. In this paper, we assume a $d$-dimensional Gaussian distribution for our data.
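For Gaussian models, the integral in (16) has a closed form. The sketch below takes $\rho = 1$ (the "expected likelihood" special case from [26]); the paper does not state which $\rho$ it uses, so this is one concrete instance. With $\rho = 1$, the integral of the product of two Gaussian densities equals a Gaussian density evaluated at the difference of the means.

```python
import numpy as np

def gaussian_product_kernel(mu1, cov1, mu2, cov2):
    """Probability product kernel (rho = 1) between N(mu1, cov1) and
    N(mu2, cov2): the integral of the product of the two densities,
    which equals N(mu1 - mu2; 0, cov1 + cov2)."""
    d = len(mu1)
    S = np.asarray(cov1, float) + np.asarray(cov2, float)
    diff = np.asarray(mu1, float) - np.asarray(mu2, float)
    norm = np.sqrt((2.0 * np.pi) ** d * np.linalg.det(S))
    return float(np.exp(-0.5 * diff @ np.linalg.solve(S, diff)) / norm)
```

Such a closed form means the kernel matrix for the first-level SVMs can be filled in directly from the fitted Gaussian parameters of each clip, without numerical integration.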
5.3. First-Level Decision Vector Extraction. The first-level decision vector presents the perception probability of each of the eight emotion-specific decisions; it is extracted from the input data by collecting the decision values from each model. The decision value of an SVM represents the degree of similarity between the model and the testing data. This similarity measure can be used to find out which model fits the data most accurately [24]. Using the first-level decision vector, the most probably perceived emotion in music can be detected.
5.4. Relevance Vector Machine. The RVM is a development of the SVM. Different from the SVM, the RVM seeks a weight vector with the highest sparsity [27]. The model defines a conditional distribution for a target class $y \in \{0, 1\}$ given an input set $\{\mathbf{x}_1, \ldots, \mathbf{x}_n\}$ [28]. Assume that the training data can be modeled by a linear combination of weighted nonlinear basis functions $\phi_i(\mathbf{x})$, transformed by a logistic sigmoid function:

$$f(\mathbf{x}; \mathbf{w}) = \mathbf{w}^{T} \boldsymbol{\phi}(\mathbf{x}), \qquad (17)$$

where $\mathbf{w} = (w_1, w_2, \ldots, w_G)^{T}$ denotes the weights and $\boldsymbol{\phi}(\mathbf{x}) = (\phi_1(\mathbf{x}), \phi_2(\mathbf{x}), \ldots, \phi_G(\mathbf{x}))^{T}$. To make the weights sparse, a Bayesian probabilistic framework is implemented to find the distribution over the weights instead of using a pointwise estimate; a separate hyperparameter $a_i$ is therefore introduced for each weight parameter $w_i$. According to Bayes' rule, the posterior probability of $\mathbf{w}$ is

$$p(\mathbf{w} \mid \mathbf{y}, \mathbf{a}) = \frac{p(\mathbf{y} \mid \mathbf{w}, \mathbf{a}) \, p(\mathbf{w} \mid \mathbf{a})}{p(\mathbf{y} \mid \mathbf{a})}, \qquad (18)$$

where $p(\mathbf{y} \mid \mathbf{w}, \mathbf{a})$ is the likelihood, $p(\mathbf{w} \mid \mathbf{a})$ is the prior conditioned on the weights, $\mathbf{a} = [a_1, \ldots, a_n]^{T}$, and $p(\mathbf{y} \mid \mathbf{a})$ denotes the evidence. Because $y$ is a binary variable, the likelihood function is given by

$$p(\mathbf{y} \mid \mathbf{w}, \mathbf{a}) = \prod_{i=1}^{n} \left[ \sigma(f(\mathbf{x}_i; \mathbf{w})) \right]^{y_i} \left[ 1 - \sigma(f(\mathbf{x}_i; \mathbf{w})) \right]^{1 - y_i}, \qquad (19)$$

where $\sigma(f) = 1/(1 + e^{-f})$ is the logistic sigmoid link function. From (18), it can be found that a significant proportion of the hyperparameters tend to infinity, and the posterior distributions of the corresponding weight parameters are concentrated at zero. The basis functions multiplied by these parameters are therefore pruned when training the model; as a result, the model is sparse.

Figure 6: DET curve of the proposed system.
6. Experimental Results

In the experiments, we collected one hundred songs from two websites to construct a music emotion database. These websites are the All Music Guide [29] and Last.fm [30]. As mentioned before, music may contain multiple emotions. If we know which emotion class a song most likely belongs to, we may know the main emotion of the song. Songs on Last.fm are tagged by many people on the Internet. We choose the emotion tagged by the most people as the ground truth of the data.

The database consists of nine classes of emotions, including happy, angry, sad, bored, nervous, relaxed, pleased, calm, and peaceful. Calm is taken as a model's opposite side when training the models. Each emotion class contains twenty songs. Each song is thirty seconds long and is divided into five-second clips. Half of the songs are used as training data, and the others are used as testing data. In this paper, 240 music clips are tested. All of the songs are western music and are encoded in 16 kHz WAV format. The used acoustical features are listed in Table 1. The whole feature set dimension is 30. The used SVM is based on the LIBSVM library [31], and the used RVM is based on the PRT toolbox [32]. The system performance is evaluated in terms of the DET curve. Figure 6 depicts the DET curve of the proposed happiness verification system. The proposed system achieves a 13.33% equal error rate (EER). From our results, we see that the system performs well on happiness emotion verification in music.
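The EER reported above is the operating point on the DET curve where the miss rate equals the false alarm rate. A minimal sketch of how it can be read off from verification scores, assuming higher scores mean "more likely happy" (the scoring convention is not stated in the paper):

```python
import numpy as np

def equal_error_rate(target_scores, nontarget_scores):
    """Sweep every observed score as a threshold; return the average of the
    false alarm rate and miss rate at the point where they are closest."""
    target = np.asarray(target_scores, float)
    nontarget = np.asarray(nontarget_scores, float)
    thresholds = np.unique(np.concatenate([target, nontarget]))
    far = np.array([(nontarget >= t).mean() for t in thresholds])  # false alarms
    frr = np.array([(target < t).mean() for t in thresholds])      # misses
    i = int(np.argmin(np.abs(far - frr)))
    return (far[i] + frr[i]) / 2.0
```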
7. Conclusion

Detecting emotion in music has become the concern of many researchers in recent years. In this paper, we proposed a first-level decision-vector-based music happiness emotion detection system. The proposed system adopts a hierarchical structure of sparse kernel machines. First, eight SVM models are trained based on acoustical features with the probability product kernel. Then, eight decision values are extracted to construct the first-level decision vector feature. After that, these eight decision values are treated as a new feature to train and test a 2-class RVM with happiness as the target side. The probability value of the RVM is used to verify the happiness content in music. Experimental results reveal that the proposed system achieves a 13.33% equal error rate (EER).
Conflict of Interests
The authors declare that there is no conflict of interestsregarding the publication of this paper
References
[1] R. E. Milliman, "Using background music to affect the behavior of supermarket shoppers," Journal of Marketing, vol. 46, no. 3, pp. 86–91, 1982.
[2] C.-H. Yeh, H.-H. Lin, and H.-T. Chang, "An efficient emotion detection scheme for popular music," in Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS '09), pp. 1799–1802, Taipei City, Taiwan, May 2009.
[3] Y.-H. Yang, Y.-C. Lin, Y.-F. Su, and H. H. Chen, "A regression approach to music emotion recognition," IEEE Transactions on Audio, Speech, and Language Processing, vol. 16, no. 2, pp. 448–457, 2008.
[4] Y. H. Yang and H. H. Chen, "Prediction of the distribution of perceived music emotions using discrete samples," IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 7, pp. 2184–2196, 2011.
[5] B. Han, S. Rho, R. B. Dannenberg, and E. Hwang, "SMERS: music emotion recognition using support vector regression," in Proceedings of the International Conference on Music Information Retrieval, Kobe, Japan, 2009.
[6] L. Lu, D. Liu, and H.-J. Zhang, "Automatic mood detection and tracking of music audio signals," IEEE Transactions on Audio, Speech, and Language Processing, vol. 14, no. 1, pp. 5–18, 2006.
[7] C.-Y. Chang, C.-Y. Lo, C.-J. Wang, and P.-C. Chung, "A music recommendation system with consideration of personal emotion," in Proceedings of the International Computer Symposium (ICS '10), pp. 18–23, Tainan City, Taiwan, December 2010.
[8] F. C. Hwang, J. S. Wang, P. C. Chung, and C. F. Yang, "Detecting emotional expression of music with feature selection approach," in Proceedings of the International Conference on Orange Technologies (ICOT '13), pp. 282–286, March 2013.
[9] K. Hevner, "Expression in music: a discussion of experimental studies and theories," Psychological Review, vol. 42, no. 2, pp. 186–204, 1935.
[10] M. Chouchane, S. Paris, F. Le Gland, C. Musso, and D.-T. Pham, "On the probability distribution of a moving target: asymptotic and non-asymptotic results," in Proceedings of the 14th International Conference on Information Fusion (Fusion '11), pp. 1–8, July 2011.
[11] O. Lartillot and P. Toiviainen, "MIR in Matlab (II): a toolbox for musical feature extraction from audio," in Proceedings of the International Conference on Music Information Retrieval, pp. 127–130, 2007, https://www.jyu.fi/hum/laitokset/musiikki/en/research/coe/materials/mirtoolbox.
[12] C.-W. Chen, K. Lee, and H.-H. Wu, "Towards a class-based representation of perceptual tempo for music retrieval," in Proceedings of the 8th International Conference on Machine Learning and Applications (ICMLA '09), pp. 602–607, December 2009.
[13] W. Chai, "Semantic segmentation and summarization of music," IEEE Signal Processing Magazine, vol. 23, no. 2, pp. 124–132, 2006.
[14] M. A. Bartsch and G. H. Wakefield, "Audio thumbnailing of popular music using chroma-based representations," IEEE Transactions on Multimedia, vol. 7, no. 1, pp. 96–104, 2005.
[15] X. Yu, J. Zhang, J. Liu, W. Wan, and W. Yang, "An audio retrieval method based on chromagram and distance metrics," in Proceedings of the International Conference on Audio, Language and Image Processing (ICALIP '10), pp. 425–428, Shanghai, China, November 2010.
[16] J. O. Garcia and C. A. R. Garcia, "Mel-frequency cepstrum coefficients extraction from infant cry for classification of normal and pathological cry with feed-forward neural networks," in Proceedings of the International Joint Conference on Neural Networks, pp. 3140–3145, July 2003.
[17] C. Y. Lin and S. Cheng, "Multi-theme analysis of music emotion similarity for jukebox application," in Proceedings of the International Conference on Audio, Language and Image Processing (ICALIP '12), pp. 241–246, July 2012.
[18] B. Shao, M. Ogihara, D. Wang, and T. Li, "Music recommendation based on acoustic features and user access patterns," IEEE Transactions on Audio, Speech, and Language Processing, vol. 17, pp. 1602–1611, 2009.
[19] W.-Q. Zhang, D. Yang, J. Liu, and X. Bao, "Perturbation analysis of mel-frequency cepstrum coefficients," in Proceedings of the International Conference on Audio, Language and Image Processing (ICALIP '10), pp. 715–718, Shanghai, China, November 2010.
[20] H. G. Kim, N. Moreau, and T. Sikora, MPEG-7 Audio and Beyond: Audio Content Indexing and Retrieval, Wiley, New York, NY, USA, 2005.
[21] ISO/IEC JTC1/SC29/WG11 Moving Pictures Expert Group, "Information technology - multimedia content description interface - part 4: audio," Committee Draft 15938-4, ISO/IEC, 2000.
[22] M. Casey, "MPEG-7 sound-recognition tools," IEEE Transactions on Circuits and Systems for Video Technology, vol. 11, no. 6, pp. 737–747, 2001.
[23] V. Vapnik, Statistical Learning Theory, Wiley, New York, NY, USA, 1998.
[24] E. Mower, M. J. Mataric, and S. Narayanan, "A framework for automatic human emotion classification using emotion profiles," IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 5, pp. 1057–1070, 2011.
[25] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, Wiley-Interscience, New York, NY, USA, 2nd edition, 2001.
[26] T. Jebara, R. Kondor, and A. Howard, "Probability product kernels," Journal of Machine Learning Research, vol. 5, pp. 819–844, 2004.
[27] F. A. Mianji and Y. Zhang, "Robust hyperspectral classification using relevance vector machine," IEEE Transactions on Geoscience and Remote Sensing, vol. 49, no. 6, pp. 2100–2112, 2011.
[28] C. M. Bishop, Pattern Recognition and Machine Learning (Information Science and Statistics), Springer, New York, NY, USA, 2nd edition, 2007.
[29] "The All Music Guide," http://www.allmusic.com/.
[30] "Last.fm," http://cn.last.fm/home.
[31] C.-C. Chang and C.-J. Lin, "LIBSVM: a library for support vector machines," 2001, http://www.csie.ntu.edu.tw/~cjlin/libsvm/.
[32] "Pattern Recognition Toolbox," http://www.newfolderconsulting.com/prt.
International Journal of
AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Active and Passive Electronic Components
Control Scienceand Engineering
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
RotatingMachinery
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation httpwwwhindawicom
Journal ofEngineeringVolume 2014
Submit your manuscripts athttpwwwhindawicom
VLSI Design
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Shock and Vibration
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Civil EngineeringAdvances in
Acoustics and VibrationAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Advances inOptoElectronics
Hindawi Publishing Corporation httpwwwhindawicom
Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
SensorsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Chemical EngineeringInternational Journal of Antennas and
Propagation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Navigation and Observation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
DistributedSensor Networks
International Journal of
The Scientific World Journal 3
31 Extraction of Acoustical Feature
311 RMS Energy RMS energy is also called root meansquare energy It computes the global energy of input signal119909 [11] The operation is defined as follows
119883RMS = radic1
119899
119899
sum
119894=1
1199092119894 (1)
where 119899 means signalrsquos length in hundredth of a second bydefault
312 Tempo Many tempo estimation methods have beenproposed The estimation of tempo is based on detectingperiodicities in a range of BPMs [12] Firstly significant onsetevents are detected in the frequency domain [11] Then findthe events that best represents the tempo of the song whichmeans to choose the maximum periodicity score for eachframe separately
313 Chromagram Chroma which is also called harmonicpitch class profile has a strong relationship with the structureof music [13] Chromagram is a joint distribution of signalstrength over the variables of time and chroma Chromais a frame-based representation of audio and is similar toshort time Fourier transform In music clips frequencycomponents belonging to the same pitch class are extractedby chromagram and transformed to a 12-dimensional repre-sentation including C C D D E F F G G A A andB The chromagram can present the distribution of energyalong the pitches or pitch classes [11 14] In [14] chromagramis defined as the remapping of time-frequency distributionThe chromagram is extracted by
V (119905 119896) = sum
119899isin119878119896
119883119905(119899)
119876119896
119896 isin 0 1 2 11 (2)
where 119883119905(119899) means the logarithmic magnitude of discrete
Fourier transform of the 119905th frame and 119876119896is the number of
elements in a subset of the discrete frequency space for eachpitch class [15]
In Figure 2 the chromagram from a piece of music isexemplified
314 Mel-Frequency Cepstral Coefficients (MFCCs) Aftersignal is digitized a large amount of information is notneeded and cost plenty of storage space Power spectrum isoften adopted to encode the signal to solve the problem [16]It is noted that MFCCs performs similar to human auditoryperception systemThe feature is adopted in various researchtopics including speaker recognition speech recognitionand music emotion recognition For example Cooper andFoote extracted MFCCs from music signal and they foundthat MFCCs are similar to music timbre expression [17] In[18]MFCCswere also proven to be having goodperformancein music recommendation
0 1 2 3 4 5C
CD
DEF
FG
GA
AB
Chromagram
Time axis (s)
Chro
ma c
lass
Figure 2 Example of chromagram from a piece of music
MFCC extraction is based on the spectrum, which can be obtained with the discrete Fourier transform:

X_w(f) = Σ_{n=0}^{N−1} x_w(n) exp(−j 2π f n / N).  (3)
After the power spectrum is extracted, subband energies are computed with Mel filter banks, and the logarithms of these energies are then evaluated:

S_i = log Σ_{f=F_l}^{F_h} L(i, f) |X_w(f)|²,  (4)
where F_h is the discrete frequency index corresponding to the high cutoff frequency, F_l is the discrete frequency index corresponding to the low cutoff frequency, and L(i, f) is the amplitude at the f-th discrete frequency index of the i-th Mel window. The number of Mel windows typically ranges from 20 to 24. Finally, the MFCCs are obtained by applying the discrete cosine transform (DCT) [19]. Figure 3 exemplifies the average MFCC values of a piece of music.
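The MFCC pipeline described above (DFT, Mel filter bank, log subband energies as in (4), then DCT) can be sketched as follows. The triangular filter shape, 20 Mel windows, and 13 coefficients are common illustrative choices, not necessarily the paper's exact configuration:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc_frame(frame, sr, n_mels=20, n_ceps=13):
    """Sketch of Section 3.1.4: power spectrum -> Mel filter bank ->
    log energies (eq. (4)) -> DCT-II of the log energies."""
    N = len(frame)
    power = np.abs(np.fft.rfft(frame * np.hamming(N))) ** 2  # |X_w(f)|^2
    freqs = np.fft.rfftfreq(N, d=1.0 / sr)

    # triangular Mel windows L(i, f) between the low and high cutoffs
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2),
                                  n_mels + 2))
    S = np.zeros(n_mels)
    for i in range(n_mels):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        up = np.clip((freqs - lo) / (mid - lo), 0.0, None)
        down = np.clip((hi - freqs) / (hi - mid), 0.0, None)
        L = np.minimum(up, down)                 # triangular window
        S[i] = np.log(np.sum(L * power) + 1e-10)  # eq. (4)

    # DCT-II of the log filter-bank energies gives the cepstral coefficients
    i = np.arange(n_mels)
    basis = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * i + 1)
                   / (2 * n_mels))
    return basis @ S

sr = 16000
t = np.arange(512) / sr
coeffs = mfcc_frame(np.sin(2 * np.pi * 440 * t), sr)
print(coeffs.shape)
```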
3.1.5. Spectrum Centroid. The spectrum centroid is an economical description of the shape of the power spectrum [20–22]. Additionally, it is correlated with a major perceptual dimension of timbre, namely sharpness. Figure 4 gives an example of a spectrum and its spectrum centroid obtained from one frame of a piece of music; the centroid value is 2638 Hz in this example.
3.1.6. Spectrum Spread. The spectrum spread is an economical descriptor of the shape of the power spectrum that indicates whether the power is concentrated in the vicinity of the centroid or spread out over the spectrum [20–22]. It allows differentiating between tone-like and noise-like sounds. Figure 5 provides an example of the spectrum spread of a piece of music.
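Both descriptors can be sketched as power-weighted moments of a frame's spectrum. This simplified version uses linear frequency, whereas the MPEG-7 definitions in [20–22] operate on a log-frequency scale:

```python
import numpy as np

def centroid_and_spread(frame, sr):
    """Spectrum centroid (power-weighted mean frequency) and spectrum
    spread (power-weighted standard deviation around the centroid),
    per Sections 3.1.5-3.1.6; linear-frequency simplification."""
    power = np.abs(np.fft.rfft(frame)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    total = np.sum(power) + 1e-12
    centroid = np.sum(freqs * power) / total
    spread = np.sqrt(np.sum(((freqs - centroid) ** 2) * power) / total)
    return centroid, spread

sr = 8000
t = np.arange(4096) / sr
c_tone, s_tone = centroid_and_spread(np.sin(2 * np.pi * 1000 * t), sr)
c_noise, s_noise = centroid_and_spread(
    np.random.default_rng(0).standard_normal(4096), sr)
print(round(c_tone), round(s_tone) < round(s_noise))
```

As the section notes, a pure tone yields a small spread while noise yields a large one, which is exactly what makes the spread useful for separating tone-like and noise-like sounds.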
4 The Scientific World Journal
[Figure 3: Example of average MFCC values from a piece of music (horizontal axis: coefficient rank 1–13; vertical axis: magnitude).]
[Figure 4: Example of a spectrum and its spectrum centroid from a frame in a piece of music (horizontal axis: frequency in Hz; vertical axis: magnitude).]
3.1.7. Ratio of a Spectral Flatness Measure to a Spectral Center (RSS). RSS was proposed for speaker-independent emotional speech recognition [23]. RSS is the ratio of the spectral flatness to the spectral centroid and is calculated by

RSS = 1000 × SF / SC,  (5)

where SF denotes the spectral flatness and SC represents the spectral centroid.
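A minimal sketch of (5), assuming the common definition of spectral flatness as the ratio of the geometric to the arithmetic mean of the power spectrum (the paper does not spell out its SF definition):

```python
import numpy as np

def rss(frame, sr):
    """RSS = 1000 * SF / SC (eq. (5)). SF is assumed to be the
    geometric/arithmetic mean ratio of the power spectrum, in [0, 1];
    SC is the power-weighted mean frequency in Hz."""
    power = np.abs(np.fft.rfft(frame)) ** 2 + 1e-12
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    sf = np.exp(np.mean(np.log(power))) / np.mean(power)  # flatness
    sc = np.sum(freqs * power) / np.sum(power)            # centroid (Hz)
    return 1000.0 * sf / sc

# noise is much flatter than a pure tone, so its RSS is larger
sr = 8000
t = np.arange(4096) / sr
tone = np.sin(2 * np.pi * 1000 * t)
noise = np.random.default_rng(0).standard_normal(4096)
print(rss(tone, sr) < rss(noise, sr))
```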
3.2. Extraction of the First-Level Decision Vector. The acoustical feature set is utilized to generate the first-level decision vector, in which each element is the significant value of an emotion. This approach can interpret the emotional content by providing multiple probabilistic class labels rather than a single hard label [24]. For example, a happy music clip not
[Figure 5: Example of spectrum spread from a piece of music (horizontal axis: frame index; vertical axis: frequency).]
only contains happiness content but also other properties that are similar to the content of peacefulness. This similarity may cause a music clip to be recognized as an incorrect emotion class. In this example, the advantage of the first-level decision vector representation is its ability to convey the evidence of both the happy and peaceful emotions. This paper uses the significant values of eight emotions (angry, happy, sad, relaxed, pleased, bored, nervous, and peaceful) to construct an emotion-profile feature vector. To calculate the significant value of an emotion, we construct its 2-class SVM with the calm emotion as the background side of the SVM.
4. Principal Component Analysis

PCA is an important mathematical technique for feature extraction. In this paper, PCA is implemented to reduce the dimensionality of the extracted features. The first step of PCA is to calculate the d-dimensional mean vector u and the d × d covariance matrix Σ of the samples [25]. After that, the eigenvectors and eigenvalues of Σ are computed. Finally, the eigenvectors with the k largest eigenvalues are selected to form a d × k matrix M whose columns are these k eigenvectors; the discarded dimensions are regarded as noise. The PCA-transformed data take the form

x′ = M^T (x − u).  (6)
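The three steps (mean and covariance, eigendecomposition, projection by the top-k eigenvectors as in (6)) can be sketched as:

```python
import numpy as np

def pca_transform(X, k):
    """PCA as in Section 4: mean vector u, covariance Sigma, top-k
    eigenvectors stacked into the d x k matrix M, then x' = M^T (x - u)
    applied row-wise."""
    u = X.mean(axis=0)                       # d-dimensional mean vector
    Sigma = np.cov(X, rowvar=False)          # d x d covariance matrix
    vals, vecs = np.linalg.eigh(Sigma)       # ascending eigenvalues
    M = vecs[:, np.argsort(vals)[::-1][:k]]  # d x k largest-k eigenvectors
    return (X - u) @ M, M, u                 # eq. (6)

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 30))           # e.g. the 30-dim feature set
Xp, M, u = pca_transform(X, 10)
print(Xp.shape)
```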
5. Emotion Classifier
The emotion classifier used in the proposed system adopts a 2-level hierarchical structure of sparse kernel machines. The first-level SVMs use the probability product kernel, while the second-level RVM adopts the traditional radial basis function kernel on the first-level decision vector feature.
5.1. Support Vector Machine. SVM theory is an effective statistical technique that has drawn much attention in audio classification tasks [7]. An SVM is a binary classifier that creates an optimal hyperplane to classify input samples; this optimal hyperplane linearly divides the two classes with the largest margin [23]. Denote by T = {(x_i, y_i), i = 1, 2, ..., N} a training set for the SVM, where each pair (x_i, y_i) means that training sample x_i belongs to class y_i, with y_i ∈ {+1, −1}. The fundamental concept is to choose a hyperplane that classifies T accurately while maximizing the distance between the two classes, that is, to find a pair (w, b) such that

y_i (w · x_i + b) > 0,  i = 1, ..., N,  (7)

where w ∈ R^d and b ∈ R. The pair (w, b) defines a separating hyperplane of equation

w · x + b = 0.  (8)

If there exists a hyperplane satisfying (7), the set T is said to be linearly separable, and w and b can be rescaled so that

y_i (w · x_i + b) ≥ 1,  i = 1, ..., N.  (9)

From (9), we can derive a constrained objective function:

min (1/2)‖w‖²  subject to  y_i (w · x_i + b) ≥ 1,  i = 1, ..., N.  (10)

Since ‖w‖² is convex, (10) can be solved by the classical method of Lagrange multipliers, with Lagrangian

L(w, b, U) = (1/2)‖w‖² − Σ_{i=1}^{N} μ_i [y_i (w · x_i + b) − 1].  (11)

We denote by U = (μ_1, μ_2, ..., μ_N) the N nonnegative Lagrange multipliers associated with the constraints in (10). After solving (11), the optimal hyperplane has the following expansion:

w = Σ_{i=1}^{N} μ_i y_i x_i.  (12)

The bias b can be determined from U through the Kuhn–Tucker conditions:

μ_i (y_i (w · x_i + b) − 1) = 0,  i = 1, 2, ..., N.  (13)

According to (12), the expected hyperplane is a linear combination of training samples. The training samples (x_i, y_i) with nonzero Lagrange multipliers are called support vectors. Finally, the decision value for a new data point x can be written as

dec(x) = Σ_{i=1}^{N} μ_i y_i (x_i · x) + b.  (14)

Functions that satisfy Mercer's theorem can be used as kernels; in this paper, the probability product kernel is adopted.
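A toy illustration of (12)–(14): for two symmetric support vectors the multipliers can be solved from the Kuhn–Tucker conditions (13) by hand, and the decision value follows directly. The data points and multiplier values below are hypothetical:

```python
import numpy as np

# Toy linearly separable set with two support vectors; mu = 0.25 for both
# is obtained by hand from the KKT conditions (13) of this symmetric case.
X = np.array([[1.0, 1.0], [-1.0, -1.0]])
y = np.array([1.0, -1.0])
mu = np.array([0.25, 0.25])        # nonnegative Lagrange multipliers

w = (mu * y) @ X                   # eq. (12): w = sum_i mu_i y_i x_i
b = y[0] - w @ X[0]                # from KKT: y_i (w . x_i + b) = 1

def dec(x):
    """Decision value of eq. (14): sum_i mu_i y_i (x_i . x) + b."""
    return float(np.sum(mu * y * (X @ x)) + b)

print(dec(np.array([2.0, 2.0])), dec(np.array([-3.0, 0.0])))
```

The sign of the decision value gives the class; its magnitude is the distance-like score that Section 5.3 collects into the first-level decision vector.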
5.2. Probability Product Support Vector Machine. A function can serve as a kernel function if it satisfies Mercer's theorem. Using Mercer's theory, we can introduce a mapping function φ(x) such that k(x_j, x_i) = φ(x_j) · φ(x_i). This provides the ability to handle nonlinear data by mapping the original input space R^d into some other space.

In this paper, the probability product kernel is utilized. The probability product kernel is a method of measuring similarity between distributions, and it has a simple and intuitively compelling conception [26]: it computes a generalized inner product between two probability distributions in a Hilbert space. Consider a positive definite kernel k : O × O → R on an input space O with examples o_1, o_2, ..., o_m ∈ O. First, each input example is mapped to a probability distribution, fitting separate probabilistic models p_1(x), p_2(x), ..., p_m(x) to o_1, o_2, ..., o_m. Next, a kernel k^prob(p_i, p_j) between probability distributions on O is defined. Finally, the kernel between examples is defined to be equal to k^prob between the corresponding distributions:

k(o_i, o_j) = k^prob(p_i, p_j).  (15)

This kernel is then applied in the SVM, which proceeds as usual. The probability product kernel between distributions p_i and p_j is defined as

k(p_i, p_j) = ∫ p_i^ρ(x) p_j^ρ(x) dx = ⟨p_i^ρ, p_j^ρ⟩_{L²},  (16)

where p_i and p_j are probability distributions on the space O, p_i^ρ, p_j^ρ ∈ L²(O), L² is a Hilbert space, and ρ is a positive constant. The probability product kernel allows us to introduce prior knowledge of the data; in this paper, we assume a d-dimensional Gaussian distribution for our data.
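For ρ = 1 and Gaussian models, the integral in (16) has a closed form: the product of two Gaussian densities integrates to a Gaussian density evaluated at the mean difference (the expected-likelihood kernel of [26]). A sketch under that assumption (the paper does not state its ρ):

```python
import numpy as np

def ppk_gaussian(mu1, S1, mu2, S2):
    """Probability product kernel (eq. (16)) with rho = 1 between two
    d-dimensional Gaussians N(mu1, S1) and N(mu2, S2):
    integral N(x; mu1, S1) N(x; mu2, S2) dx = N(mu1 - mu2; 0, S1 + S2)."""
    d = len(mu1)
    S = S1 + S2
    diff = mu1 - mu2
    norm = 1.0 / np.sqrt(((2 * np.pi) ** d) * np.linalg.det(S))
    return float(norm * np.exp(-0.5 * diff @ np.linalg.solve(S, diff)))

# kernel value shrinks as the two feature distributions move apart
k_near = ppk_gaussian(np.zeros(2), np.eye(2), np.zeros(2), np.eye(2))
k_far = ppk_gaussian(np.zeros(2), np.eye(2), np.ones(2) * 3, np.eye(2))
print(k_near > k_far)
```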
5.3. First-Level Decision Vector Extraction. The first-level decision vector presents the perceived probability of each of the eight emotion-specific decisions; it is extracted from the input data by collecting the decision value of each model. The decision value of an SVM represents the degree of similarity between the model and the test data, and this similarity measure can be used to find out which model fits the data most accurately [24]. Using the first-level decision vector, the most probably perceived emotion in music can be detected.
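The extraction step can be sketched as stacking one decision value per emotion-versus-calm model. The scorer functions below are random stand-ins for the trained probability-product-kernel SVMs:

```python
import numpy as np

EMOTIONS = ["angry", "happy", "sad", "relaxed",
            "pleased", "bored", "nervous", "peaceful"]

def first_level_decision_vector(models, x):
    """Stack the decision value of each emotion-vs-calm model into the
    8-dimensional first-level decision vector (Section 5.3). `models`
    maps an emotion name to a scoring function; placeholders here."""
    return np.array([models[e](x) for e in EMOTIONS])

# toy stand-in scorers: one random linear scorer per emotion
rng = np.random.default_rng(0)
weights = {e: rng.standard_normal(30) for e in EMOTIONS}
models = {e: (lambda x, w=w: float(w @ x)) for e, w in weights.items()}

vec = first_level_decision_vector(models, rng.standard_normal(30))
print(vec.shape, EMOTIONS[int(np.argmax(vec))])
```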
5.4. Relevance Vector Machine. The RVM is a development of the SVM. Unlike the SVM, the RVM tries to find the set of weights with the highest sparsity [27]. The model defines a conditional distribution for the target class y ∈ {0, 1} given an input set x_1, ..., x_n [28]. The model output is a linear combination of weighted nonlinear basis functions φ_i(x), transformed by a logistic sigmoid function:

f(x; w) = w^T φ(x),  (17)

where w = (w_1, w_2, ..., w_G)^T denotes the weights and φ(x) = (φ_1(x), φ_2(x), ..., φ_G(x))^T. To make the weights sparse, a Bayesian probabilistic framework is implemented to find the distribution over the weights instead of using a pointwise estimate; therefore, a separate hyperparameter a_i for each
[Figure 6: DET curve of the proposed system (horizontal axis: false alarm probability in %; vertical axis: miss probability in %).]
of the weight parameters w_i is introduced. According to Bayes' rule, the posterior probability of w is

p(w | y, a) = p(y | w, a) p(w | a) / p(y | a),  (18)

where p(y | w, a) is the likelihood, p(w | a) is the prior conditioned on the weights, a = [a_1, ..., a_n]^T, and p(y | a) denotes the evidence. Because y is a binary variable, the likelihood function is given by

p(y | w, a) = Π_{i=1}^{n} [σ(f(x_i; w))]^{y_i} [1 − σ(f(x_i; w))]^{1−y_i},  (19)

where σ(f) = 1/(1 + e^{−f}) is the logistic sigmoid link function. During training, a significant proportion of the hyperparameters tend to infinity, and the posterior distributions of the corresponding weight parameters concentrate at zero. The basis functions multiplied by these weights are therefore ignored when training the model; as a result, the model is sparse.
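The Bernoulli likelihood of (19) can be written down directly; the toy check below only illustrates that weights agreeing with the labels score higher, not the full hyperparameter optimization of the RVM:

```python
import numpy as np

def sigmoid(f):
    """Logistic sigmoid link function sigma(f) = 1 / (1 + e^-f)."""
    return 1.0 / (1.0 + np.exp(-f))

def rvm_log_likelihood(w, Phi, y):
    """Log of the Bernoulli likelihood in eq. (19) for binary targets y,
    with f(x_i; w) = w^T phi(x_i) stacked into the design matrix Phi."""
    p = sigmoid(Phi @ w)
    return float(np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)))

# toy check: weights aligned with the labels give a higher likelihood
Phi = np.array([[1.0, 2.0], [1.0, -2.0]])   # rows are phi(x_i)^T
y = np.array([1.0, 0.0])
good = rvm_log_likelihood(np.array([0.0, 2.0]), Phi, y)
bad = rvm_log_likelihood(np.array([0.0, -2.0]), Phi, y)
print(good > bad)
```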
6. Experimental Results

In the experiments, we collected one hundred songs from two websites, All Music Guide [29] and Last.fm [30], to construct a music emotion database. As mentioned before, music may contain multiple emotions; knowing which emotion class a song most likely belongs to tells us the main emotion of the song. Songs on Last.fm are tagged by many people on the Internet, and we choose the emotion tagged by the most people as the ground truth of the data.

The database consists of nine emotion classes: happy, angry, sad, bored, nervous, relaxed, pleased, calm, and peaceful. Calm is taken as a model's opposite side when training the models. Each emotion class contains twenty songs. Each song is thirty seconds long and is divided into five-second clips. Half of the songs are used as training data and the others as testing data; in this paper, 240 music clips are tested. All songs are Western music encoded in 16 kHz WAV format. The adopted acoustical feature set is listed in Table 1; the whole feature set has dimension 30. The SVMs are based on the LIBSVM library [31], and the RVM is based on the PRT toolbox [32]. The system performance is evaluated in terms of the DET curve. Figure 6 depicts the DET curve of the proposed happiness verification system, which achieves a 13.33% equal error rate (EER). From these results, we see that the system performs well on happiness emotion verification in music.
7. Conclusion

Detecting emotion in music has drawn the attention of many researchers in recent years. In this paper, we proposed a first-level decision-vector-based music happiness detection system that adopts a hierarchical structure of sparse kernel machines. First, eight SVM models are trained on acoustical features with the probability product kernel. The eight resulting decision values form the first-level decision vector, which is then used as the feature to train and test a 2-class RVM with happiness as the target side. The probability value output by the RVM is used to verify the happiness content of music. Experimental results reveal that the proposed system achieves a 13.33% equal error rate (EER).
Conflict of Interests
The authors declare that there is no conflict of interestsregarding the publication of this paper
References
[1] R. E. Milliman, "Using background music to affect the behavior of supermarket shoppers," Journal of Marketing, vol. 46, no. 3, pp. 86–91, 1982.
[2] C.-H. Yeh, H.-H. Lin, and H.-T. Chang, "An efficient emotion detection scheme for popular music," in Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS '09), pp. 1799–1802, Taipei City, Taiwan, May 2009.
[3] Y.-H. Yang, Y.-C. Lin, Y.-F. Su, and H. H. Chen, "A regression approach to music emotion recognition," IEEE Transactions on Audio, Speech, and Language Processing, vol. 16, no. 2, pp. 448–457, 2008.
[4] Y. H. Yang and H. H. Chen, "Prediction of the distribution of perceived music emotions using discrete samples," IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 7, pp. 2184–2196, 2011.
[5] B. Han, S. Rho, R. B. Dannenberg, and E. Hwang, "SMERS: music emotion recognition using support vector regression," in Proceedings of the International Conference on Music Information Retrieval, Kobe, Japan, 2009.
[6] L. Lu, D. Liu, and H.-J. Zhang, "Automatic mood detection and tracking of music audio signals," IEEE Transactions on Audio, Speech, and Language Processing, vol. 14, no. 1, pp. 5–18, 2006.
[7] C.-Y. Chang, C.-Y. Lo, C.-J. Wang, and P.-C. Chung, "A music recommendation system with consideration of personal emotion," in Proceedings of the International Computer Symposium (ICS '10), pp. 18–23, Tainan City, Taiwan, December 2010.
[8] F. C. Hwang, J. S. Wang, P. C. Chung, and C. F. Yang, "Detecting emotional expression of music with feature selection approach," in Proceedings of the International Conference on Orange Technologies (ICOT '13), pp. 282–286, March 2013.
[9] K. Hevner, "Expression in music: a discussion of experimental studies and theories," Psychological Review, vol. 42, no. 2, pp. 186–204, 1935.
[10] M. Chouchane, S. Paris, F. Le Gland, C. Musso, and D.-T. Pham, "On the probability distribution of a moving target: asymptotic and non-asymptotic results," in Proceedings of the 14th International Conference on Information Fusion (Fusion '11), pp. 1–8, July 2011.
[11] O. Lartillot and P. Toiviainen, "MIR in Matlab (II): a toolbox for musical feature extraction from audio," in Proceedings of the International Conference on Music Information Retrieval, pp. 127–130, 2007, https://www.jyu.fi/hum/laitokset/musiikki/en/research/coe/materials/mirtoolbox.
[12] C.-W. Chen, K. Lee, and H.-H. Wu, "Towards a class-based representation of perceptual tempo for music retrieval," in Proceedings of the 8th International Conference on Machine Learning and Applications (ICMLA '09), pp. 602–607, December 2009.
[13] W. Chai, "Semantic segmentation and summarization of music," IEEE Signal Processing Magazine, vol. 23, no. 2, pp. 124–132, 2006.
[14] M. A. Bartsch and G. H. Wakefield, "Audio thumbnailing of popular music using chroma-based representations," IEEE Transactions on Multimedia, vol. 7, no. 1, pp. 96–104, 2005.
[15] X. Yu, J. Zhang, J. Liu, W. Wan, and W. Yang, "An audio retrieval method based on chromagram and distance metrics," in Proceedings of the International Conference on Audio, Language and Image Processing (ICALIP '10), pp. 425–428, Shanghai, China, November 2010.
[16] J. O. García and C. A. R. García, "Mel-frequency cepstrum coefficients extraction from infant cry for classification of normal and pathological cry with feed-forward neural networks," in Proceedings of the International Joint Conference on Neural Networks, pp. 3140–3145, July 2003.
[17] C. Y. Lin and S. Cheng, "Multi-theme analysis of music emotion similarity for jukebox application," in Proceedings of the International Conference on Audio, Language and Image Processing (ICALIP '12), pp. 241–246, July 2012.
[18] B. Shao, M. Ogihara, D. Wang, and T. Li, "Music recommendation based on acoustic features and user access patterns," IEEE Transactions on Audio, Speech, and Language Processing, vol. 17, pp. 1602–1611, 2009.
[19] W.-Q. Zhang, D. Yang, J. Liu, and X. Bao, "Perturbation analysis of mel-frequency cepstrum coefficients," in Proceedings of the International Conference on Audio, Language and Image Processing (ICALIP '10), pp. 715–718, Shanghai, China, November 2010.
[20] H. G. Kim, N. Moreau, and T. Sikora, MPEG-7 Audio and Beyond: Audio Content Indexing and Retrieval, Wiley, New York, NY, USA, 2005.
[21] ISO/IEC JTC1 SC29 WG11 Moving Pictures Experts Group, "Information technology—multimedia content description interface—part 4: Audio," Committee Draft 15938-4, ISO/IEC, 2000.
[22] M. Casey, "MPEG-7 sound-recognition tools," IEEE Transactions on Circuits and Systems for Video Technology, vol. 11, no. 6, pp. 737–747, 2001.
[23] V. Vapnik, Statistical Learning Theory, Wiley, New York, NY, USA, 1998.
[24] E. Mower, M. J. Matarić, and S. Narayanan, "A framework for automatic human emotion classification using emotion profiles," IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 5, pp. 1057–1070, 2011.
[25] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, Wiley, New York, NY, USA, 2nd edition, 2001.
[26] T. Jebara, R. Kondor, and A. Howard, "Probability product kernels," Journal of Machine Learning Research, vol. 5, pp. 819–844, 2004.
[27] F. A. Mianji and Y. Zhang, "Robust hyperspectral classification using relevance vector machine," IEEE Transactions on Geoscience and Remote Sensing, vol. 49, no. 6, pp. 2100–2112, 2011.
[28] C. M. Bishop, Pattern Recognition and Machine Learning (Information Science and Statistics), Springer, New York, NY, USA, 2007.
[29] "The All Music Guide," http://www.allmusic.com.
[30] "Last.fm," http://cn.last.fm/home.
[31] C. C. Chang and C. J. Lin, "LIBSVM: a library for support vector machines," 2001, http://www.csie.ntu.edu.tw/~cjlin/libsvm.
[32] "Pattern Recognition Toolbox," http://www.newfolderconsulting.com/prt.
[7] C-Y Chang C-Y Lo C-J Wang and P-C Chung ldquoA musicrecommendation system with consideration of personal emo-tionrdquo in Proceedings of the International Computer Symposium(ICS rsquo10) pp 18ndash23 Tainan City Taiwan December 2010
[8] F C Hwang J S Wang P C Chung and C F YangldquoDetecting emotional expression ofmusic with feature selectionapproachrdquo in Proceedings of the International Conference onOrange Technologies (ICOT rsquo13) pp 282ndash286 March 2013
[9] K Hevner ldquoExpression in music a discussion of experimentalstudies and theoriesrdquo Psychological Review vol 42 no 2 pp186ndash204 1935
[10] M Chouchane S Paris F Le Gland C Musso and D-TPham ldquoOn the probability distribution of a moving targetAsymptotic and non-asymptotic resultsrdquo in Proceedings of the14th International Conference on Information Fusion (Fusion rsquo11)pp 1ndash8 July 2011
[11] O Lartillot and P Toiviainen ldquoMIR in Matlab (II) a toolboxfor musical feature extraction from audiordquo in Proceedings ofthe International Conference Music Information Retrieval pp127ndash130 2007 httpswwwjyufihumlaitoksetmusiikkienresearchcoematerialsmirtoolbox
[12] C-W Chen K Lee and H-H Wu ldquoTowards a class-basedrepresentation of perceptual tempo for music retrievalrdquo inProceedings of the 8th International Conference on MachineLearning andApplications (ICMLA rsquo09) pp 602ndash607December2009
[13] WChai ldquoSemantic segmentation and summarization ofmusicrdquoIEEE Signal Processing Magazine vol 23 no 2 pp 124ndash1322006
[14] M A Bartsch and G H Wakefield ldquoAudio thumbnailingof popular music using chroma-based representationsrdquo IEEETransactions on Multimedia vol 7 no 1 pp 96ndash104 2005
[15] X Yu J Zhang J LiuWWan andW Yang ldquoAn audio retrievalmethod based on chromagram and distance metricsrdquo in Pro-ceedings of the International Conference on Audio Language andImage Processing (ICALIP rsquo10) pp 425ndash428 Shanghai ChinaNovember 2010
[16] J O Garcıa and C A R Garcia ldquoMel-frequency cepstrumcoefficients extraction from infant cry for classification of nor-mal and pathological cry with feed-forward neural networksrdquoin Proceedings of the International Joint Conference on NeuralNetworks pp 3140ndash3145 July 2003
[17] C Y Lin and S Cheng ldquoMulti-theme analysis of musicemotion similarity for jukebox applicationrdquo in Proceedings ofthe International Conference on Audio Language and ImageProcessing (ICALIP rsquo12) pp 241ndash246 July 2012
[18] B Shao M Ogihara D Wang and T Li ldquoMusic recommenda-tion based on acoustic features and user access patternsrdquo IEEETransactions on Audio Speech and Language Processing vol 17pp 1602ndash1611 2009
[19] W-Q Zhang D Yang J Liu and X Bao ldquoPerturbation analysisof mel-frequency cepstrum coefficientsrdquo in Proceedings of theInternational Conference onAudio Language and Image Process-ing (ICALIP rsquo10) pp 715ndash718 Shanghai China November 2010
[20] H G Kim N Moreau and T Sikora MPEG-7 Audio andBeyond AudioContent Indexing andRetrievalWileyNewYorkNY USA 2005
[21] ISO-IECJTCI SC29 WGII Moving Pictures Expeti GroupldquoInformation technologymdashmultimedia content description
interfacemdashpart 4 Audiordquo Comittee Draft 15938-4 ISOIEC2000
[22] M Casey ldquoMPEG-7 sound-recognition toolsrdquo IEEE Transac-tions on Circuits and Systems for Video Technology vol 11 no6 pp 737ndash747 2001
[23] V Vapnik Statistical Learning Theory Wiley New York NYUSA 1998
[24] E Mower M J Mataric and S Narayanan ldquoA frameworkfor automatic human emotion classification using emotionprofilesrdquo IEEE Transactions on Audio Speech and LanguageProcessing vol 19 no 5 pp 1057ndash1070 2011
[25] R O Duda P E Hart and D G Stork Pattern ClassificationNew York NY USA 2nd edition 2001
[26] T Jebara R Kondor and A Howard ldquoProbability productkernelsrdquo Journal of Machine Learning Research vol 5 pp 819ndash844 2004
[27] F A Mianji and Y Zhang ldquoRobust hyperspectral classificationusing relevance vector machinerdquo IEEE Transactions on Geo-science and Remote Sensing vol 49 no 6 pp 2100ndash2112 2011
[28] C M Bishop Pattern Recognition andMachine Learning (Infor-mation Science and Statistics) Springer New York NY USA2nd edition 2007
[29] ldquoThe All Music Guiderdquo httpwwwallmusiccom[30] ldquoLastfmrdquo httpcnlastfmhome[31] C C Chang andC J Lin ldquoLIBSVM a library for support vector
machinesrdquo 2001 httpwwwcsientuedutwsimcjlinlibsvm[32] ldquoPattern Recognition Toolboxrdquo httpwwwnewfolderconsult-
ingcomprt
sample \(\mathbf{x}_i\) belongs to a class \(y_i\), where \(y_i \in \{+1, -1\}\). The fundamental concept is to choose a hyperplane which can classify \(T\) accurately while maximizing the distance between the two classes. This means finding a pair \((\mathbf{w}, b)\) such that

\[ y_i (\mathbf{w} \cdot \mathbf{x}_i + b) > 0, \quad i = 1, \ldots, N \quad (7) \]

where \(\mathbf{w} \in \mathbb{R}^N\) is normalized and \(b \in \mathbb{R}\). The pair \((\mathbf{w}, b)\) defines a separating hyperplane of equation

\[ \mathbf{w} \cdot \mathbf{x} + b = 0 \quad (8) \]
If there exists a hyperplane satisfying (7), the set \(T\) is said to be linearly separable, and we can rescale \(\mathbf{w}\) and \(b\) so that

\[ y_i (\mathbf{w} \cdot \mathbf{x}_i + b) \geq 1, \quad i = 1, \ldots, N \quad (9) \]

According to (9), we can derive an objective function under constraint:

\[ \min \|\mathbf{w}\|^2 \quad \text{subject to } y_i (\mathbf{w} \cdot \mathbf{x}_i + b) \geq 1, \; i = 1, \ldots, N \quad (10) \]

Since \(\|\mathbf{w}\|^2\) is convex, we can solve (10) by applying the classical method of Lagrange multipliers:

\[ \min \|\mathbf{w}\|^2 - \sum_{i=1}^{N} \mu_i \left[ y_i (\mathbf{w} \cdot \mathbf{x}_i + b) - 1 \right] \quad (11) \]
We denote \(U = (\mu_1, \mu_2, \ldots, \mu_N)\) as the \(N\) nonnegative Lagrange multipliers associated with (10). After solving (11), the optimal hyperplane has the following expansion:

\[ \mathbf{w} = \sum_{i=1}^{N} \mu_i y_i \mathbf{x}_i \quad (12) \]

\(b\) can be determined from \(U\) and from the Kuhn-Tucker conditions. Consider

\[ \mu_i \left( y_i (\mathbf{w} \cdot \mathbf{x}_i + b) - 1 \right) = 0, \quad i = 1, 2, \ldots, N \quad (13) \]
According to (12), the expected hyperplane is a linear combination of training samples. The corresponding training samples \((\mathbf{x}_i, y_i)\) with nonzero Lagrange multipliers are called support vectors. Finally, the decision value for a new data point \(\mathbf{x}\) can be written as

\[ \operatorname{dec}(\mathbf{x}) = \sum_{i=1}^{N} \mu_i y_i \mathbf{x}_i \cdot \mathbf{x} + b \quad (14) \]

Functions that satisfy Mercer's theorem can be used as kernels. In this paper, the probability product kernel is adopted.
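As a plain illustration of (14), the following sketch computes the decision value from a handful of support vectors; all numbers and names are made up for illustration, not taken from the paper.

```python
import numpy as np

# Hypothetical support vectors, labels, multipliers, and bias; these
# values are made up for illustration, not taken from the paper.
support_vectors = np.array([[1.0, 2.0], [2.0, 0.5], [-1.0, -1.5]])
labels = np.array([1.0, 1.0, -1.0])   # y_i in {+1, -1}
mu = np.array([0.3, 0.2, 0.5])        # nonzero Lagrange multipliers mu_i
b = -0.1                              # bias from the Kuhn-Tucker conditions

def decision_value(x):
    """Eq. (14): dec(x) = sum_i mu_i * y_i * (x_i . x) + b."""
    return float(np.sum(mu * labels * (support_vectors @ x)) + b)

# The sign of the decision value gives the predicted class.
print(decision_value(np.array([0.5, 1.0])))  # 1.95 -> class +1
```

Only the support vectors (nonzero \(\mu_i\)) enter the sum, which is what makes the trained machine sparse.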
5.2. Probability Product Support Vector Machine. A function can serve as a kernel function if it satisfies Mercer's theorem. Using Mercer's theorem, we can introduce a mapping function \(\phi(\mathbf{x})\) such that \(k(\mathbf{x}_j, \mathbf{x}_i) = \phi(\mathbf{x}_j) \cdot \phi(\mathbf{x}_i)\). This provides the ability to handle nonlinear data by mapping the original input space \(\mathbb{R}^d\) into some other space.
In this paper, the probability product kernel is utilized. The probability product kernel is a method of measuring similarity between distributions, and its conception is simple and intuitively compelling [26]. It computes a generalized inner product between two probability distributions in a Hilbert space. A positive definite kernel \(k : O \times O \rightarrow \mathbb{R}\) on an input space \(O\) and examples \(\mathbf{o}_1, \mathbf{o}_2, \ldots, \mathbf{o}_m \in O\) are defined. First, the input data are mapped to probability distributions by fitting separate probabilistic models \(p_1(x), p_2(x), \ldots, p_m(x)\) to \(\mathbf{o}_1, \mathbf{o}_2, \ldots, \mathbf{o}_m\). After that, a kernel \(k^{\mathrm{prob}}(p_i, p_j)\) between probability distributions on \(O\) is defined. At last, a kernel between examples is defined, equal to \(k^{\mathrm{prob}}\) between the corresponding distributions. Consider

\[ k(\mathbf{o}_i, \mathbf{o}_j) = k^{\mathrm{prob}}(p_i, p_j) \quad (15) \]
Finally, this kernel is applied to the SVM, and training proceeds as usual. The probability product kernel between distributions \(p_i\) and \(p_j\) is defined as

\[ k(p_i, p_j) = \int_x p_i^{\rho}(\mathbf{x}) \, p_j^{\rho}(\mathbf{x}) \, dx = \langle p_i^{\rho}, p_j^{\rho} \rangle_{L_2} \quad (16) \]

where \(p_i\) and \(p_j\) are probability distributions on a space \(O\). Assume that \(p_i^{\rho}, p_j^{\rho} \in L_2(O)\), where \(L_2\) is a Hilbert space and \(\rho\) is a positive constant. The probability product kernel allows us to introduce prior knowledge of the data. In this paper, we assume a \(d\)-dimensional Gaussian distribution for our data.
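For Gaussian models with \(\rho = 1\), the integral in (16) has a well-known closed form (the "expected likelihood" special case in [26]): it reduces to a Gaussian density evaluated at the difference of the means. A minimal sketch, with made-up example values:

```python
import numpy as np

def gaussian_ppk(mu_i, cov_i, mu_j, cov_j):
    """Probability product kernel of Eq. (16) with rho = 1 for two
    d-dimensional Gaussians: the integral reduces to the Gaussian
    density N(mu_i; mu_j, cov_i + cov_j)."""
    d = mu_i.shape[0]
    cov = cov_i + cov_j
    diff = mu_i - mu_j
    norm = 1.0 / np.sqrt((2.0 * np.pi) ** d * np.linalg.det(cov))
    return float(norm * np.exp(-0.5 * diff @ np.linalg.solve(cov, diff)))

# Illustrative 2-D feature distributions (all values are made up).
k = gaussian_ppk(np.array([0.0, 0.0]), np.eye(2),
                 np.array([1.0, 0.0]), np.eye(2))
```

The kernel is largest when the two fitted distributions coincide and decays as their means separate, which is the similarity behavior the first-level SVMs rely on.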
5.3. First-Level Decision Vector Extraction. The first-level decision vector represents the perception probability of each of the eight emotion-specific decisions; it is extracted from the input data by collecting the decision values from each model. The decision value of an SVM represents the degree of similarity between the model and the testing data. This similarity measure can be used to find out which model fits the data most accurately [24]. Using the first-level decision vector, the most probably perceived emotion in music can be detected.
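The collection step can be sketched as follows; the model interface (a scikit-learn-style `decision_function`) and the emotion ordering are assumptions for illustration, not details from the paper.

```python
import numpy as np

# The eight target emotions; calm serves only as the non-target side.
EMOTIONS = ["happy", "angry", "sad", "bored",
            "nervous", "relaxed", "pleased", "peaceful"]

def first_level_decision_vector(clip_features, models):
    """Collect one decision value per emotion-specific 2-class SVM into
    an 8-dimensional first-level decision vector.  `models` maps an
    emotion name to an object exposing a scikit-learn-style
    decision_function; that interface is an assumption for illustration."""
    return np.array([models[e].decision_function([clip_features])[0]
                     for e in EMOTIONS])
```

The resulting 8-dimensional vector is what the second-level RVM consumes as its feature.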
5.4. Relevance Vector Machine. The RVM is a development of the SVM. Different from the SVM, the RVM tries to find a set of weights with the highest sparsity [27]. The model defines a conditional distribution for a target class \(y \in \{0, 1\}\) given an input set \(\{\mathbf{x}_1, \ldots, \mathbf{x}_n\}\) [28]. Assume that a training datum can be expressed as a linear combination of weighted nonlinear basis functions \(\phi_i(\mathbf{x})\), which is transformed by a logistic sigmoid function, as follows:

\[ f(\mathbf{x}, \mathbf{w}) = \mathbf{w}^T \boldsymbol{\phi}(\mathbf{x}) \quad (17) \]

where \(\mathbf{w} = (w_1, w_2, \ldots, w_G)\) denotes the weights and \(\boldsymbol{\phi}(\mathbf{x}) = (\phi_1(\mathbf{x}), \phi_2(\mathbf{x}), \ldots, \phi_G(\mathbf{x}))^T\). In order to make the weights sparse, a Bayesian probabilistic framework is implemented to find the distribution over the weights instead of using a pointwise estimate; therefore, a separate hyperparameter \(a_i\) for each
[Figure 6: DET curve of the proposed system. Axes: false alarm probability (%) versus miss probability (%), each ranging from 0.1 to 40.]
of the weight parameters \(w_i\) is introduced. According to Bayes' rule, the posterior probability of \(\mathbf{w}\) is

\[ p(\mathbf{w} \mid \mathbf{y}, \mathbf{a}) = \frac{p(\mathbf{y} \mid \mathbf{w}, \mathbf{a}) \, p(\mathbf{w} \mid \mathbf{a})}{p(\mathbf{y} \mid \mathbf{a})} \quad (18) \]

where \(p(\mathbf{y} \mid \mathbf{w}, \mathbf{a})\) is the likelihood, \(p(\mathbf{w} \mid \mathbf{a})\) is the prior conditioned on the weights, \(\mathbf{a} = [a_1, \ldots, a_n]^T\), and \(p(\mathbf{y} \mid \mathbf{a})\) denotes the evidence. Because \(y\) is a binary variable, the likelihood function can be given by
\[ p(\mathbf{y} \mid \mathbf{w}, \mathbf{a}) = \prod_{i=1}^{n} \left[ \sigma(f(\mathbf{x}_i, \mathbf{w})) \right]^{y_i} \left[ 1 - \sigma(f(\mathbf{x}_i, \mathbf{w})) \right]^{1 - y_i} \quad (19) \]

where \(\sigma(f) = 1/(1 + e^{-f})\) is the logistic sigmoid link function. From (18), it can be found that a significant proportion of the hyperparameters tend to infinity, and the corresponding posterior distributions of the weight parameters are concentrated at zero. Therefore, the basis functions multiplied by these parameters are not taken into account when training the model. As a result, the model is sparse.
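The sparsity mechanism described above can be sketched in a few lines; the threshold-based pruning below is a simplified illustration of the effect, not the paper's exact training procedure, and all values are made up.

```python
import numpy as np

def sigmoid(f):
    """Logistic sigmoid link of Eq. (19)."""
    return 1.0 / (1.0 + np.exp(-f))

def prune_basis(weights, hyperparams, threshold=1e6):
    """Simplified illustration of RVM sparsity: hyperparameters a_i that
    diverge toward infinity pin the posterior of w_i at zero, so the
    corresponding basis functions are dropped from the model."""
    keep = hyperparams < threshold
    return weights[keep], keep

# Made-up weights; two hyperparameters have effectively diverged.
w = np.array([0.8, -0.2, 1.5, 0.05])
a = np.array([3.0, 1e9, 7.5, 1e12])
w_sparse, keep = prune_basis(w, a)   # keeps only w[0] and w[2]
```

Only the surviving basis functions (the "relevance vectors") contribute to the final decision, which is what makes the second-level machine sparse.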
6. Experimental Results

In the experiments, we collected one hundred songs from two websites, All Music Guide [29] and Last.fm [30], to construct a music emotion database. As mentioned before, music may contain multiple emotions; if we know which emotion class a song most likely belongs to, we know the main emotion of the song. Songs on Last.fm are tagged by many people on the Internet. We choose the emotion tagged by the most people as the ground truth of the data.
The database consists of nine classes of emotions: happy, angry, sad, bored, nervous, relaxed, pleased, calm, and peaceful. Calm is taken as a model's opposite side when training models. Each emotion class contains twenty songs. Each song is thirty seconds long and is divided into five-second clips. Half of the songs are used as training data and the others as testing data. In this paper, 240 music clips are tested. All songs are Western music encoded in 16 kHz WAV format. The acoustical feature set used is listed in Table 1; the whole feature set has dimension 30. The SVM is based on the LIBSVM library [31], and the RVM is based on the PRT toolbox [32]. System performance is evaluated in terms of the DET curve. Figure 6 depicts the DET curve of the proposed happiness verification system. The proposed system achieves a 13.33% equal error rate (EER). From our results, we see that the system performs well on happiness emotion verification in music.
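The EER reported above is the operating point on the DET curve where the miss rate equals the false-alarm rate. A minimal sketch of how it could be computed from verification scores (the scores below are made up, not the paper's outputs):

```python
import numpy as np

def equal_error_rate(target_scores, nontarget_scores):
    """Sweep a decision threshold over verification scores and return the
    error rate at the point where miss and false-alarm rates are closest."""
    thresholds = np.sort(np.concatenate([target_scores, nontarget_scores]))
    gaps = []
    for t in thresholds:
        miss = np.mean(target_scores < t)       # happy clips rejected
        fa = np.mean(nontarget_scores >= t)     # non-happy clips accepted
        gaps.append((abs(miss - fa), (miss + fa) / 2.0))
    return min(gaps)[1]

# Made-up RVM probability outputs for happy vs. non-happy clips.
eer = equal_error_rate(np.array([0.9, 0.8, 0.7, 0.3]),
                       np.array([0.6, 0.4, 0.2, 0.1]))  # 0.25 here
```

Each point of the DET curve in Figure 6 corresponds to one threshold; the EER is simply the curve's intersection with the diagonal.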
7. Conclusion

Detecting emotion in music has become the concern of many researchers in recent years. In this paper, we proposed a first-level decision-vector-based music happiness emotion detection system. The proposed system adopts a hierarchical structure of sparse kernel machines. First, eight SVM models are trained on acoustical features with the probability product kernel. Eight decision values are then extracted to construct the first-level decision vector feature. After that, these eight decision values are used as a new feature to train and test a 2-class RVM with happiness as the target side. The probability value of the RVM is used to verify happiness content in music. Experimental results reveal that the proposed system achieves a 13.33% equal error rate (EER).
Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.
References

[1] R. E. Milliman, "Using background music to affect the behavior of supermarket shoppers," Journal of Marketing, vol. 46, no. 3, pp. 86–91, 1982.
[2] C.-H. Yeh, H.-H. Lin, and H.-T. Chang, "An efficient emotion detection scheme for popular music," in Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS '09), pp. 1799–1802, Taipei City, Taiwan, May 2009.
[3] Y.-H. Yang, Y.-C. Lin, Y.-F. Su, and H. H. Chen, "A regression approach to music emotion recognition," IEEE Transactions on Audio, Speech, and Language Processing, vol. 16, no. 2, pp. 448–457, 2008.
[4] Y. H. Yang and H. H. Chen, "Prediction of the distribution of perceived music emotions using discrete samples," IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 7, pp. 2184–2196, 2011.
[5] B. Han, S. Rho, R. B. Dannenberg, and E. Hwang, "SMERS: music emotion recognition using support vector regression," in Proceedings of the International Conference on Music Information Retrieval, Kobe, Japan, 2009.
[6] L. Lu, D. Liu, and H.-J. Zhang, "Automatic mood detection and tracking of music audio signals," IEEE Transactions on Audio, Speech, and Language Processing, vol. 14, no. 1, pp. 5–18, 2006.
[7] C.-Y. Chang, C.-Y. Lo, C.-J. Wang, and P.-C. Chung, "A music recommendation system with consideration of personal emotion," in Proceedings of the International Computer Symposium (ICS '10), pp. 18–23, Tainan City, Taiwan, December 2010.
[8] F. C. Hwang, J. S. Wang, P. C. Chung, and C. F. Yang, "Detecting emotional expression of music with feature selection approach," in Proceedings of the International Conference on Orange Technologies (ICOT '13), pp. 282–286, March 2013.
[9] K. Hevner, "Expression in music: a discussion of experimental studies and theories," Psychological Review, vol. 42, no. 2, pp. 186–204, 1935.
[10] M. Chouchane, S. Paris, F. Le Gland, C. Musso, and D.-T. Pham, "On the probability distribution of a moving target: asymptotic and non-asymptotic results," in Proceedings of the 14th International Conference on Information Fusion (FUSION '11), pp. 1–8, July 2011.
[11] O. Lartillot and P. Toiviainen, "MIR in Matlab (II): a toolbox for musical feature extraction from audio," in Proceedings of the International Conference on Music Information Retrieval, pp. 127–130, 2007.
[12] C.-W. Chen, K. Lee, and H.-H. Wu, "Towards a class-based representation of perceptual tempo for music retrieval," in Proceedings of the 8th International Conference on Machine Learning and Applications (ICMLA '09), pp. 602–607, December 2009.
[13] W. Chai, "Semantic segmentation and summarization of music," IEEE Signal Processing Magazine, vol. 23, no. 2, pp. 124–132, 2006.
[14] M. A. Bartsch and G. H. Wakefield, "Audio thumbnailing of popular music using chroma-based representations," IEEE Transactions on Multimedia, vol. 7, no. 1, pp. 96–104, 2005.
[15] X. Yu, J. Zhang, J. Liu, W. Wan, and W. Yang, "An audio retrieval method based on chromagram and distance metrics," in Proceedings of the International Conference on Audio, Language and Image Processing (ICALIP '10), pp. 425–428, Shanghai, China, November 2010.
[16] J. O. Garcia and C. A. R. Garcia, "Mel-frequency cepstrum coefficients extraction from infant cry for classification of normal and pathological cry with feed-forward neural networks," in Proceedings of the International Joint Conference on Neural Networks, pp. 3140–3145, July 2003.
[17] C. Y. Lin and S. Cheng, "Multi-theme analysis of music emotion similarity for jukebox application," in Proceedings of the International Conference on Audio, Language and Image Processing (ICALIP '12), pp. 241–246, July 2012.
[18] B. Shao, M. Ogihara, D. Wang, and T. Li, "Music recommendation based on acoustic features and user access patterns," IEEE Transactions on Audio, Speech, and Language Processing, vol. 17, pp. 1602–1611, 2009.
[19] W.-Q. Zhang, D. Yang, J. Liu, and X. Bao, "Perturbation analysis of mel-frequency cepstrum coefficients," in Proceedings of the International Conference on Audio, Language and Image Processing (ICALIP '10), pp. 715–718, Shanghai, China, November 2010.
[20] H. G. Kim, N. Moreau, and T. Sikora, MPEG-7 Audio and Beyond: Audio Content Indexing and Retrieval, Wiley, New York, NY, USA, 2005.
[21] ISO/IEC JTC1 SC29 WG11 Moving Pictures Expert Group, "Information technology—multimedia content description interface—part 4: Audio," Committee Draft 15938-4, ISO/IEC, 2000.
[22] M. Casey, "MPEG-7 sound-recognition tools," IEEE Transactions on Circuits and Systems for Video Technology, vol. 11, no. 6, pp. 737–747, 2001.
[23] V. Vapnik, Statistical Learning Theory, Wiley, New York, NY, USA, 1998.
[24] E. Mower, M. J. Mataric, and S. Narayanan, "A framework for automatic human emotion classification using emotion profiles," IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 5, pp. 1057–1070, 2011.
[25] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, Wiley, New York, NY, USA, 2nd edition, 2001.
[26] T. Jebara, R. Kondor, and A. Howard, "Probability product kernels," Journal of Machine Learning Research, vol. 5, pp. 819–844, 2004.
[27] F. A. Mianji and Y. Zhang, "Robust hyperspectral classification using relevance vector machine," IEEE Transactions on Geoscience and Remote Sensing, vol. 49, no. 6, pp. 2100–2112, 2011.
[28] C. M. Bishop, Pattern Recognition and Machine Learning (Information Science and Statistics), Springer, New York, NY, USA, 2nd edition, 2007.
[29] "The All Music Guide," http://www.allmusic.com.
[30] "Last.fm," http://cn.last.fm/home.
[31] C. C. Chang and C. J. Lin, "LIBSVM: a library for support vector machines," 2001, http://www.csie.ntu.edu.tw/~cjlin/libsvm.
[32] "Pattern Recognition Toolbox," http://www.newfolderconsulting.com/prt.
ingcomprt