Dimensional Music Emotion Recognition


Yi-Hsuan Yang
Assistant Research Fellow
Music & Audio Computing (MAC) Lab
Research Center for IT Innovation
Academia Sinica

Dec. 2011 @ MTG, UPF

Music & Emotion

Music conveys emotion and modulates our mood

Music emotion recognition (MER)
• Understand how humans perceive/feel emotion when listening to music
• Develop systems for emotion-based music retrieval


Why Do We Listen to Music?

Motive | Ratio
“to express, release, and influence emotions” | 47%
“to relax and settle down” | 33%
“for enjoyment, fun, and pleasure” | 22%
“as company and background sound” | 16%
“because it makes me feel good” | 13%
“because it’s a basic need, I can’t live without it” | 12%
“because I like/love music” | 11%
“to get energized” | 9%
“to evoke memories” | 4%


“Expression, Perception, and Induction of Musical Emotions: A Review and a Questionnaire Study of Everyday Listening,” Patrik N. Juslin and Petri Laukka, Journal of New Music Research, 2004

Categories of Emotion

Expressed (intended) emotion
• What a performer tries to express

Perceived emotion
• What a listener perceives as being expressed in music
• Usually the same as the expressed emotion

Felt (induced) emotion
• What a listener actually feels
• Strongly influenced by the context of music listening (environment, mood)


Emotion Description w/ Mood Labels

Courtesy of Ching-Wei Chen @ Gracenote

Description w/ Latent Dimensions


Categorical Approach

Hevner’s model (1936)

Audio spectrum


Dimensional Approach

Emotion plane (Russell 1980, Thayer 1989)

Audio spectrum

Categorical vs. Dimensional

Approach | Pros | Cons
Categorical | Intuitive; natural language; atomic description | Lack a unifying model; ambiguous; subjective; difficult to offer fine-grained differentiation
Dimensional | Focus on a few dimensions; good user interface | Less intuitive; semantic loss in projection; difficult to obtain ground truth


Q: No Consensus on Mood Taxonomy


Work | # | Emotion description
Katayose et al [icpr98] | 4 | Gloomy, urbane, pathetic, serious
Feng et al [sigir03] | 4 | Happy, angry, fear, sad
Li et al [ismir03], Wieczorkowska et al [imtci04] | 13 | Happy, light, graceful, dreamy, longing, dark, sacred, dramatic, agitated, frustrated, mysterious, passionate, bluesy
Wang et al [icsp04] | 6 | Joyous, robust, restless, lyrical, sober, gloomy
Tolos et al [ccnc05] | 3 | Happy, aggressive, melancholic+calm
Lu et al [taslp06] | 4 | Exuberant, anxious/frantic, depressed, content
Yang et al [mm06] | 4 | Happy, angry, sad, relaxed
Skowronek et al [ismir07] | 12 | Arousing, angry, calming, carefree, cheerful, emotional, loving, peaceful, powerful, sad, restless, tender
Wu et al [mmm08] | 8 | Happy, light, easy, touching, sad, sublime, grand, exciting
Hu et al [ismir08] | 5 | Passionate, cheerful, bittersweet, witty, aggressive
Trohidis et al [ismir08] | 6 | Surprised, happy, relaxed, quiet, sad, angry

Fuzzy Boundary b/w Mood Classes

Subjective usage of affective terms
• Cheerful, happy, joyous, party/celebratory
• Melancholy, gloomy, sad, sorrowful

Semantic overlap (#2 and #4) and acoustic overlap (#1 and #5) [mirex07.cyril&perfe]


MIREX AMC Taxonomy
Cluster 1: Passionate, rowdy, rousing, confident, boisterous
Cluster 2: Amiable/good-natured, sweet, fun, rollicking, cheerful
Cluster 3: Literate, wistful, bittersweet, autumnal, brooding, poignant
Cluster 4: Witty, humorous, whimsical, wry, campy, quirky, silly
Cluster 5: Aggressive, volatile, fiery, visceral, tense/anxious, intense

Granularity of Emotion Description

Small set of emotion classes
• Insufficient compared to the richness of our perception

Large set of emotion classes
• Difficult to obtain reliable ground truth data


Acerbic, Aggressive, Ambitious, Amiable, Angry, Bittersweet, Bright, Brittle, Calm/, Carefree, Cathartic, Cerebral, Cheerful, Circular, Clinical, Cold, Confident, Delicate, Dramatic, Dreamy, Druggy, Earnest, Eccentric, Elegant, Energetic, Enigmatic, Epic, Exciting, Exuberant, Fierce, Fiery, Fun, Gentle, Gloomy, Greasy, Happy, …

□ Happy  □ Sad  □ Angry  □ Relaxed

Sol: Describing Emotions in Emotion Space


Arousal
￮ Activation, activity
￮ Energy and stimulation level

Valence
￮ Pleasantness
￮ Positive and negative affective states

[psp80]

The Dimensional Approach

Strength
• No need to consider which and how many emotions
• Generalize MER from categorical domain to real-valued domain
• Easy to compare different computational models

[Figure: emotion plane with arousal and valence axes]


The Dimensional Approach

Weakness
• Semantic loss due to projection
• Blurs important psychological distinctions

3rd dimension: potency [psy07]
• Angry ↔ afraid
• Proud ↔ shameful
• Interested ↔ disappointed

4th dimension: unpredictability
• Surprised
• Tense ↔ afraid
• Contempt ↔ disgust


Music Retrieval in VA Space

Provide a simple means for 2D user interface

• Pick a point
• Draw a trajectory

Useful for mobile devices with small display space
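
Picking a point or drawing a trajectory can be read as nearest-neighbor lookup in the VA plane. The sketch below is only an illustration under assumptions: the catalogue, the song names, and the functions `nearest_songs` and `playlist_along` are hypothetical, and the VA values are assumed to lie in [-1, 1].

```python
import math

# Hypothetical catalogue: song title -> (valence, arousal), each assumed in [-1, 1].
CATALOGUE = {
    "song_a": (0.8, 0.7),
    "song_b": (-0.6, 0.8),
    "song_c": (-0.7, -0.5),
    "song_d": (0.6, -0.6),
    "song_e": (0.1, 0.0),
}


def nearest_songs(point, catalogue, k=1):
    """Return the k songs closest (Euclidean distance) to a picked VA point."""
    ranked = sorted(catalogue, key=lambda title: math.dist(point, catalogue[title]))
    return ranked[:k]


def playlist_along(trajectory, catalogue):
    """Map each point of a drawn trajectory to its nearest song,
    skipping consecutive repeats of the same song."""
    playlist = []
    for point in trajectory:
        title = nearest_songs(point, catalogue, k=1)[0]
        if not playlist or playlist[-1] != title:
            playlist.append(title)
    return playlist


PICKED = nearest_songs((0.9, 0.9), CATALOGUE)
PLAYLIST = playlist_along([(-0.8, -0.6), (-0.1, 0.0), (0.7, 0.8)], CATALOGUE)
```

Euclidean distance is used here for simplicity; whether equal distances in the plane correspond to equal perceptual differences is an open question.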


Demo

[Figure: demo interface; axes: arousal, valence]

Q: How to Predict Emotion Values?

Transformation-based approach [mm06]

• Consider the four quadrants
• Perform 4-class mood classification
• Apply the following transformation

Arousal = u1 + u2 – u3 – u4

Valence = u1 + u4 – u2 – u3

(u denotes likelihood)

Not rigorous
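
The transformation can be written out as a small function. This is a sketch only: the function name is hypothetical, and the quadrant numbering (1 = positive valence/high arousal, 2 = negative valence/high arousal, 3 = negative valence/low arousal, 4 = positive valence/low arousal) is an assumption inferred from the signs in the two formulas.

```python
def quadrant_likelihoods_to_va(u1, u2, u3, u4):
    """Map the likelihoods of the four quadrant classes to (valence, arousal).

    Assumed numbering: quadrants 1 and 2 are high arousal,
    quadrants 1 and 4 are positive valence.
    """
    arousal = u1 + u2 - u3 - u4
    valence = u1 + u4 - u2 - u3
    return valence, arousal


# A song classified mostly into quadrant 1 lands in the upper-right region.
CONFIDENT = quadrant_likelihoods_to_va(0.7, 0.1, 0.1, 0.1)

# Two very different likelihood profiles collapse to the same point (the origin).
UNIFORM = quadrant_likelihoods_to_va(0.25, 0.25, 0.25, 0.25)
SPLIT = quadrant_likelihoods_to_va(0.5, 0.0, 0.5, 0.0)
```

The last two calls show one way the mapping loses information: a uniform output and an output split between opposite quadrants both map to (0, 0).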


Sol: Perform Regression

Given features, predict a numerical value

Given N inputs (x_i, y_i), 1 ≤ i ≤ N, where x_i is the feature vector and y_i is the numerical value to be predicted, train a regression model f(·) such that the following mean squared error (MSE) is minimized

\[ \min_{f} \; \frac{1}{N} \sum_{i=1}^{N} \bigl( y_i - f(\mathbf{x}_i) \bigr)^2 \]

y_i: numerical emotion value
x_i: feature (input)
f(x_i): prediction result (output)

e.g. linear regression
\[ f(\mathbf{x}_i) = \mathbf{w}^T \mathbf{x}_i + b = \sum_j w_j x_{ij} + b \]
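
As a minimal sketch of the MSE objective with the linear model f(x_i) = w^T x_i + b, the code below fits w and b by plain gradient descent. The toy data, the learning rate, the number of epochs, and all function names are assumptions for illustration; any least-squares solver would serve equally well.

```python
def predict(w, b, x):
    """Linear model: f(x) = w^T x + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def mse(w, b, xs, ys):
    """Mean squared error between targets y_i and predictions f(x_i)."""
    return sum((y - predict(w, b, x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit_linear(xs, ys, lr=0.1, epochs=5000):
    """Minimize the MSE by batch gradient descent (a simple stand-in solver)."""
    n, d = len(xs), len(xs[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        errors = [predict(w, b, x) - y for x, y in zip(xs, ys)]
        # Gradient of the MSE with respect to each weight and the bias.
        grad_w = [2.0 / n * sum(e * x[j] for e, x in zip(errors, xs)) for j in range(d)]
        grad_b = 2.0 / n * sum(errors)
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Toy data generated from y = 2*x1 - x2 + 0.5 (hypothetical features and targets).
XS = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.2]]
YS = [2.0 * x1 - 1.0 * x2 + 0.5 for x1, x2 in XS]
W, B = fit_linear(XS, YS)
```

In the VA setting, the same fit would be run twice on the same features: once with valence ratings as y and once with arousal ratings.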

Computational Framework [taslp08]

Predict the VA values
• Trains a regression model f(·) that minimizes the mean squared error (MSE)
• One for valence; one for arousal


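[Diagram: training phase: training data → manual annotation → emotion value, and training data → feature extraction → feature; both feed regressor training. Test phase: test data → feature extraction → feature → regressor → automatic prediction of the emotion value.]

\[ \min_{f} \; \frac{1}{N} \sum_{i=1}^{N} \bigl( y_i - f(\mathbf{x}_i) \bigr)^2 \]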

Obtain Music Emotion Rating

Manual annotation
• Rates the VA values of each song

Ordinal rating scale
Scroll bar



Evaluation of Emotion Rating

User study
• 1240 Chinese pop songs; each 30 sec
• 666 subjects; each rates 8 random songs

Subjective evaluation
• Easiness of annotating emotion
• Within-subject reliability: compare to one month later
• Between-subject reliability: compare to other subjects

Method | Easiness | Within-subject reliability | Between-subject reliability
Emotion rating | 2.82 | 2.92 | 2.81

From 1 to 5 (strongly disagree to strongly agree)

AnnoEmo: GUI for Emotion Rating [hcm07]

Encourages differentiation


Click to listen again

Drag & drop to modify annotation

Demo

Cognitive Load is Still High

Determining VA values is not that easy
• Difficult to ensure consistency

Does dist(0.5, 0.8) = dist(–0.2, 0.1) in terms of our emotion perception?
Does 0.7 mean the same for two subjects?

[Figure: emotion plane with axes from –1 to 1; marked values 0.5, 0.8, –0.2, 0.1]

Sol: Ranking Instead of Rating [taslp11a]

Determines the position of a song
• By the relative ranking with respect to other songs
• Rather than by the exact emotion values


[Figure: songs ordered from positive valence (valence = 1) to negative valence (valence = –1); relative ranking vs. exact rating]

Oh Happy Day
I Want to Hold Your Hand by Beatles
I Feel Good by James Brown
What a Wonderful World by Louis Armstrong
Into the Woods by My Morning Jacket
The Christmas Song
C'est La Vie
Labita by Lisa One
Just the Way You Are by Billy Joel
Perfect Day by Lou Reed
When a Man Loves a Woman by Michael Bolton
Smells Like Teen Spirit by Nirvana

Ranking-Based Emotion Annotation

Emotion tournament
• Requires only n–1 pairwise comparisons
• The global ordering can later be approximated by a greedy algorithm [jair99]


[Figure: 8 × 8 pairwise comparison matrix over songs a to h]

Win counts: a = 0, b = 3, c = 1, d = 0, e = 0, f = 7, g = 0, h = 1

f > b > c = h > a = d = e = g

Which song is more positive?
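
The tournament idea can be sketched as a single-elimination bracket. Everything here is illustrative: the latent valence scores stand in for a human listener answering "which song is more positive?", the function names are hypothetical, and ordering by transitive win count is a simple stand-in, not the greedy algorithm of [jair99]. With these assumed scores, the sketch reproduces the win counts 0, 3, 1, 0, 0, 7, 0, 1 and the order f > b > c = h > a = d = e = g.

```python
def knockout(songs, prefers):
    """Single-elimination tournament: n-1 pairwise comparisons for n songs.

    Returns (beats, n_comparisons), where beats[s] is the set of songs
    that s defeated directly.
    """
    beats = {song: set() for song in songs}
    current = list(songs)
    n_comparisons = 0
    while len(current) > 1:
        survivors = []
        for i in range(0, len(current) - 1, 2):
            x, y = current[i], current[i + 1]
            winner, loser = (x, y) if prefers(x, y) else (y, x)
            beats[winner].add(loser)
            n_comparisons += 1
            survivors.append(winner)
        if len(current) % 2 == 1:
            survivors.append(current[-1])  # odd one out gets a bye
        current = survivors
    return beats, n_comparisons


def transitive_wins(beats):
    """Count, for each song, how many songs it beats directly or transitively."""
    def reachable(song, seen):
        for loser in beats[song]:
            if loser not in seen:
                seen.add(loser)
                reachable(loser, seen)
        return seen
    return {song: len(reachable(song, set())) for song in beats}


# Hypothetical latent valence used only to simulate the listener's answers.
VALENCE = {"a": 0.1, "b": 0.8, "c": 0.5, "d": 0.2,
           "e": 0.3, "f": 0.9, "g": 0.0, "h": 0.4}
SONGS = list("abcdefgh")

BEATS, N_COMPARISONS = knockout(SONGS, lambda x, y: VALENCE[x] > VALENCE[y])
WINS = transitive_wins(BEATS)
# Stable sort: songs with equal win counts keep their original order (ties).
ORDER = sorted(SONGS, key=lambda song: -WINS[song])
```

Ties such as c = h remain unresolved because those songs were never compared, directly or transitively; that is the price of using only n–1 comparisons.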

Online Interface


Simplify Emotion Annotation

Subjective evaluation
• Both rate and rank
• The ordering of rate and rank does not matter

Result

[Figure: result chart; labels: Strong, Weak]

Q: Which Features are Relevant? [psy07]


• Sound intensity
• Tempo
• Rhythm
• Pitch range
• Mode (major)
• Consonance

Feature Extraction

Melody/harmony [MIR toolbox]
• Pitch estimate, key clarity, harmonic change, musical mode

Spectral [Marsyas]
• Spectral flatness measures, spectral crest factors, MFCCs

Temporal [Sound description toolbox]
• Zero-crossing rate, temporal centroid, log-attack time

Rhythmic [Rhythm pattern extractor]
• Beat histogram and average tempo

Psychoacoustically motivated features [PsySound]
• Loudness, sharpness, timbral width, volume, spectral dissonance, tonal dissonance, pure tonal, complex tonal, multiplicity, tonality, chord


Data Collection


Q: Subjective Issue


Each circle represents the emotion annotation for a music piece by a subject

Sol: Probabilistic MER [taslp11b]

Predicts the probabilistic distribution P(e|d) of the perceived emotions of a music piece
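
To make the target P(e|d) concrete, the sketch below turns the annotations of one piece into a discrete distribution over a grid on the VA plane. This is only a histogram illustration under assumptions (VA values in [-1, 1], a 4 × 4 grid, hypothetical annotations and function name); it is not the estimation or prediction method of [taslp11b].

```python
from collections import Counter


def emotion_distribution(annotations, bins=4):
    """Estimate P(e|d) for one piece on a bins x bins grid over [-1, 1]^2.

    annotations: list of (valence, arousal) pairs, one per subject.
    Returns a dict mapping (valence_bin, arousal_bin) -> probability.
    """
    def cell(value):
        # Map a value in [-1, 1] to a bin index in 0..bins-1.
        return min(int((value + 1.0) / 2.0 * bins), bins - 1)

    counts = Counter((cell(v), cell(a)) for v, a in annotations)
    total = len(annotations)
    return {grid_cell: n / total for grid_cell, n in counts.items()}


# Hypothetical annotations of one piece by four subjects.
ANNOTATIONS = [(0.6, 0.7), (0.8, 0.6), (0.7, 0.9), (-0.2, 0.4)]
DISTRIBUTION = emotion_distribution(ANNOTATIONS)
```

Three of the four hypothetical subjects fall into the same high-valence, high-arousal cell, so that cell receives probability 0.75; the dissenting annotation is kept rather than averaged away.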


Sol: Personalized MER [sigir09]

From P(e|d) to P(e|d,u)
• General regressor → personal regressor
• Utilize user feedback
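
One very simple way to picture the step from a general to a personal regressor is to correct the general prediction by the user's average disagreement with it. The sketch below is an assumption made for illustration (a constant per-user offset, with hypothetical names and data); it is not claimed to be the method of [sigir09].

```python
def personalize(general_predict, feedback):
    """Build a personal predictor from a general one plus user feedback.

    feedback: list of (features, user_rating) pairs from one user.
    The personal model adds the user's mean residual to the general prediction.
    """
    residuals = [rating - general_predict(x) for x, rating in feedback]
    offset = sum(residuals) / len(residuals) if residuals else 0.0
    return lambda x: general_predict(x) + offset


# Hypothetical general regressor and feedback from a user who rates
# every song about 0.2 higher than the general model predicts.
def general_model(x):
    return 0.5 * x[0]


FEEDBACK = [([1.0], 0.7), ([0.0], 0.2)]
personal_model = personalize(general_model, FEEDBACK)
PERSONAL_PREDICTION = personal_model([2.0])
```

With no feedback the personal model falls back to the general one, which matches the idea that personalization is driven by user feedback.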


[Diagram: MER pipeline (training data, manual annotation, feature extraction, regressor training; test data, feature extraction, regressor, emotion value) extended with emotion-based retrieval, personalization, and user feedback]

Evaluation Setup

Training data
• 195 Western/Japanese/Chinese pop songs
• 25-sec segment that is representative of the song
  Too long: the emotion may not be homogeneous
  Too short: the listener may not hear enough

Manual annotation
• 253 subjects; each rates 12 songs
• Rate the VA values in 11 ordinal levels

￮ 0 ￮ 1 ￮ 2 ￮ 3 ￮ 4 ￮ 5 ￮ 6 ￮ 7 ￮ 8 ￮ 9 ￮ 10

• Each song is annotated by 10+ subjects
• Ground truth obtained by averaging
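
Averaging the per-subject ratings into a ground truth can be sketched as follows. The record layout, the function name, and the rescaling of the 0 to 10 levels onto [-1, 1] are assumptions for illustration; the slides state only that the ground truth is obtained by averaging.

```python
from collections import defaultdict


def ground_truth(ratings):
    """Average the ordinal ratings of each song over its subjects.

    ratings: iterable of (song_id, subject_id, valence_level, arousal_level),
    with levels in 0..10. Returns song_id -> (valence, arousal), where the
    mean level is rescaled to [-1, 1] (the rescaling is an assumption).
    """
    per_song = defaultdict(list)
    for song, _subject, valence_level, arousal_level in ratings:
        per_song[song].append((valence_level, arousal_level))

    def rescale(level):
        return (level - 5) / 5.0

    truth = {}
    for song, levels in per_song.items():
        mean_valence = sum(v for v, _ in levels) / len(levels)
        mean_arousal = sum(a for _, a in levels) / len(levels)
        truth[song] = (rescale(mean_valence), rescale(mean_arousal))
    return truth


# Hypothetical ratings: two subjects for song s1, one for song s2.
RATINGS = [("s1", "u1", 10, 5), ("s1", "u2", 8, 5), ("s2", "u1", 0, 10)]
TRUTH = ground_truth(RATINGS)
```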


Quantitative Result

Result
• R²: squared correlation between y and f(x)
• Valence prediction is challenging

Valence: 0.25 ~ 0.35
Arousal: 0.60 ~ 0.85


Method | R² of valence | R² of arousal
Multiple linear regression | 0.109 | 0.568
AdaBoost.RT [ijcnn04] | 0.117 | 0.553
SVR (support vector regression) [sc04] | 0.222 | 0.570
SVR + RReliefF (feature selection) [ml03] | 0.254 | 0.609
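
The R² statistic, read as the squared correlation between y and f(x), can be computed as in the sketch below. The function name and the toy vectors are hypothetical, and the code follows the squared Pearson correlation definition given on the slide.

```python
def squared_correlation(y_true, y_pred):
    """Squared Pearson correlation between targets y and predictions f(x)."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    mean_pred = sum(y_pred) / n
    covariance = sum((t - mean_true) * (p - mean_pred) for t, p in zip(y_true, y_pred))
    var_true = sum((t - mean_true) ** 2 for t in y_true)
    var_pred = sum((p - mean_pred) ** 2 for p in y_pred)
    return covariance * covariance / (var_true * var_pred)


# Predictions that are a scaled copy of the targets are perfectly correlated.
R2_SCALED = squared_correlation([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])

# Predictions with no linear relation to the targets.
R2_UNRELATED = squared_correlation([1.0, 2.0, 3.0], [1.0, 0.0, 1.0])
```

Note that the squared correlation ignores scale and offset: the scaled predictions above score 1.0 even though their MSE is not zero.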

Qualitative Result


No No No Part 2 - Beyonce
All Of Me - 50 Cent
New York Giants - Big Pun
Why Do I Have To Choose - Willie Nelson
The Last Resort - The Eagles
Mammas Don't Let Your Babies Grow Up To Be Cowboys - Willie Nelson
Live For The One I Love - Celine Dion
If Only In The Heaven's Eyes - NSYNC
I've Got To See You Again - Norah Jones
Bodies - Sex Pistols
You're Crazy - Guns N' Roses
Out Ta Get Me - Guns N' Roses

Missing 1: Temporal Context of Music

“Sweet anticipation” by David Huron

Music’s most expressive qualities probably relate to structural changes across time

Music emotion can also vary within an excerpt [tsmc06]


Missing 2: Context of Music Listening


• Listening mood/context
• Familiarity/associated memory
• Preference of the singer/performer/song
• Social relationship

Conclusion

A computational framework for predicting numerical emotion values

• Generalizes MER from categorical to dimensional
• Resolves some issues of emotion description
• Rank instead of rate
• 2D user interface for music retrieval

• Valence & subjectivity
• Content & context

Acknowledgement
Prof. Homer Chen, National Taiwan University


Reference

Music Emotion Recognition, CRC Press, 2011

“A regression approach to music emotion recognition,” IEEE TASLP, 2008. (cited by 76)

“Ranking-based emotion recognition for music organization and retrieval,” IEEE TASLP, 2011

“Prediction of the distribution of perceived music emotions using discrete samples,” IEEE TASLP, 2011

“Exploiting online tags for music emotion classification,” ACM TOMCCAP, 2011

“Machine recognition of music emotion: A review,” ACM TIST, 2012
