Dimensional Music Emotion Recognition

Yi-Hsuan Yang, Assistant Research Fellow, Music & Audio Computing (MAC) Lab, Research Center for IT Innovation, Academia Sinica
Dec. 2011 @ MTG, UPF


Page 1: Dimensional Music Emotion Recognition


Page 2: Dimensional Music Emotion Recognition

Music & Emotion

Music conveys emotion and modulates our mood

Music emotion recognition (MER) aims to:
• Understand how humans perceive/feel emotion when listening to music
• Develop systems for emotion-based music retrieval


Page 3: Dimensional Music Emotion Recognition

Why Do We Listen to Music?

Motive: ratio
“to express, release, and influence emotions”: 47%
“to relax and settle down”: 33%
“for enjoyment, fun, and pleasure”: 22%
“as company and background sound”: 16%
“because it makes me feel good”: 13%
“because it’s a basic need, I can’t live without it”: 12%
“because I like/love music”: 11%
“to get energized”: 9%
“to evoke memories”: 4%


“Expression, Perception, and Induction of Musical Emotions: A Review and a Questionnaire Study of Everyday Listening,” Patrik N. Juslin and Petri Laukka, Journal of New Music Research, 2004

Page 4: Dimensional Music Emotion Recognition

Categories of Emotion

Expressed (intended) emotion: what a performer tries to express

Perceived emotion: what a listener perceives as being expressed in the music; usually the same as the expressed emotion

Felt (induced) emotion: what a listener actually feels; strongly influenced by the context of music listening (environment, mood)


Page 5: Dimensional Music Emotion Recognition

Emotion Description w/ Mood Labels

Courtesy of Ching-Wei Chen @ Gracenote

Page 6: Dimensional Music Emotion Recognition

Description w/ Latent Dimensions


Page 7: Dimensional Music Emotion Recognition


Categorical Approach

Hevner’s model (1936)

Audio spectrum

Page 8: Dimensional Music Emotion Recognition


Dimensional Approach

Emotion plane (Russell 1980, Thayer 1989)

Audio spectrum

Page 9: Dimensional Music Emotion Recognition

Categorical vs. Dimensional

Approach | Pros | Cons
Categorical | Intuitive; natural language; atomic description | Lack a unifying model; ambiguous; subjective; difficult to offer fine-grained differentiation
Dimensional | Focus on a few dimensions; good user interface | Less intuitive; semantic loss in projection; difficult to obtain ground truth


Page 10: Dimensional Music Emotion Recognition

Q: No Consensus on Mood Taxonomy


Work | # classes | Emotion description
Katayose et al. [icpr98] | 4 | Gloomy, urbane, pathetic, serious
Feng et al. [sigir03] | 4 | Happy, angry, fear, sad
Li et al. [ismir03], Wieczorkowska et al. [imtci04] | 13 | Happy, light, graceful, dreamy, longing, dark, sacred, dramatic, agitated, frustrated, mysterious, passionate, bluesy
Wang et al. [icsp04] | 6 | Joyous, robust, restless, lyrical, sober, gloomy
Tolos et al. [ccnc05] | 3 | Happy, aggressive, melancholic+calm
Lu et al. [taslp06] | 4 | Exuberant, anxious/frantic, depressed, content
Yang et al. [mm06] | 4 | Happy, angry, sad, relaxed
Skowronek et al. [ismir07] | 12 | Arousing, angry, calming, carefree, cheerful, emotional, loving, peaceful, powerful, sad, restless, tender
Wu et al. [mmm08] | 8 | Happy, light, easy, touching, sad, sublime, grand, exciting
Hu et al. [ismir08] | 5 | Passionate, cheerful, bittersweet, witty, aggressive
Trohidis et al. [ismir08] | 6 | Surprised, happy, relaxed, quiet, sad, angry

Page 11: Dimensional Music Emotion Recognition

Fuzzy Boundary b/w Mood Classes

Subjective usage of affective terms, e.g.:
• Cheerful, happy, joyous, party/celebratory
• Melancholy, gloomy, sad, sorrowful

Semantic overlap (clusters 2 and 4) and acoustic overlap (clusters 1 and 5) in the MIREX AMC taxonomy [mirex07.cyril&perfe]

MIREX AMC (Audio Mood Classification) taxonomy:
Cluster 1: Passionate, rowdy, rousing, confident, boisterous
Cluster 2: Amiable/good-natured, sweet, fun, rollicking, cheerful
Cluster 3: Literate, wistful, bittersweet, autumnal, brooding, poignant
Cluster 4: Witty, humorous, whimsical, wry, campy, quirky, silly
Cluster 5: Aggressive, volatile, fiery, visceral, tense/anxious, intense

Page 12: Dimensional Music Emotion Recognition

Granularity of Emotion Description

Small set of emotion classes: insufficient compared to the richness of our perception

Large set of emotion classes: difficult to obtain reliable ground truth data


Acerbic, Aggressive, Ambitious, Amiable, Angry, Bittersweet, Bright, Brittle, Calm/, Carefree, Cathartic, Cerebral, Cheerful, Circular, Clinical, Cold, Confident, Delicate, Dramatic, Dreamy, Druggy, Earnest, Eccentric, Elegant, Energetic, Enigmatic, Epic, Exciting, Exuberant, Fierce, Fiery, Fun, Gentle, Gloomy, Greasy, Happy, …

□ Happy □ Sad □ Angry □ Relaxed

Page 13: Dimensional Music Emotion Recognition

Sol: Describing Emotions in Emotion Space


Arousal
○ Activation, activity
○ Energy and stimulation level

Valence
○ Pleasantness
○ Positive and negative affective states

[psp80]

Page 14: Dimensional Music Emotion Recognition

The Dimensional Approach

Strengths
• No need to decide which emotion classes to use, or how many
• Generalizes MER from the categorical domain to the real-valued domain
• Easy to compare different computational models

[Figure: the valence-arousal plane]


Page 15: Dimensional Music Emotion Recognition

The Dimensional Approach

Weaknesses
• Semantic loss due to projection
• Blurs important psychological distinctions

3rd dimension: potency [psy07], e.g., angry ↔ afraid, proud ↔ shameful, interested ↔ disappointed

4th dimension: unpredictability, e.g., surprised; tense ↔ afraid, contempt ↔ disgust


Page 16: Dimensional Music Emotion Recognition

Music Retrieval in VA Space

Provides a simple means for a 2D user interface:
• Pick a point
• Draw a trajectory

Useful for mobile devices with small display space


Demo


Page 17: Dimensional Music Emotion Recognition

Q: How to Predict Emotion Values?

Transformation-based approach [mm06]

• Consider the four quadrants of the VA plane
• Perform 4-class mood classification
• Apply the following transformation:

Arousal = u1 + u2 – u3 – u4

Valence = u1 + u4 – u2 – u3

(u1, ..., u4 denote the likelihoods of the four quadrants)

Not rigorous
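The transformation can be written as a small Python function. This is a sketch, not the implementation from [mm06]: the function name is hypothetical, and the quadrant numbering (1 = positive valence/high arousal, then counter-clockwise) is an assumption chosen to match the signs in the two formulas.

```python
def quadrant_to_va(u1, u2, u3, u4):
    """Transformation-based VA estimate from 4-class mood likelihoods.

    Assumed quadrant numbering (hypothetical, consistent with the formulas):
    1 = +valence/+arousal, 2 = -valence/+arousal,
    3 = -valence/-arousal, 4 = +valence/-arousal.
    If the likelihoods are non-negative and sum to 1, both outputs lie in [-1, 1].
    """
    arousal = u1 + u2 - u3 - u4
    valence = u1 + u4 - u2 - u3
    return valence, arousal


# A song classified mostly into quadrant 1 gets positive valence and arousal.
valence, arousal = quadrant_to_va(0.6, 0.2, 0.1, 0.1)
```

The output is only a weighted vote of class likelihoods, which is why the slide calls the approach not rigorous.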


Page 18: Dimensional Music Emotion Recognition


Sol: Perform Regression

Given features, predict a numerical value

Given N training pairs (x_i, y_i), 1 ≤ i ≤ N, where x_i is the feature vector and y_i is the numerical value to be predicted, train a regression model f(·) such that the following mean squared error (MSE) is minimized:

$$\min_{f} \frac{1}{N} \sum_{i=1}^{N} \left( y_i - f(\mathbf{x}_i) \right)^2$$

Page 19: Dimensional Music Emotion Recognition

y_i: numerical emotion value
x_i: feature (input)
f(x_i): prediction result (output)

e.g., linear regression:
$$f(\mathbf{x}_i) = \mathbf{w}^\top \mathbf{x}_i + b = \sum_j w_j x_{ij} + b$$

Computational Framework [taslp08]

• Predict the VA values
• Train a regression model f(·) that minimizes the mean squared error (MSE)
• One regressor for valence; one for arousal
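As a sketch of this framework, the Python code below fits one linear regressor for valence and one for arousal by ordinary least squares, which minimizes the MSE. The function names and the synthetic training data are hypothetical; the deck also evaluates other regressors (e.g., SVR) that are not shown here.

```python
import numpy as np

def fit_linear_regressor(X, y):
    """Fit f(x) = w^T x + b by ordinary least squares (i.e., minimize the MSE)."""
    X_aug = np.hstack([X, np.ones((X.shape[0], 1))])  # extra column of ones for the bias b
    coef, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
    return coef[:-1], coef[-1]                         # (w, b)

def predict(w, b, X):
    return X @ w + b

def mse(y, y_pred):
    return float(np.mean((y - y_pred) ** 2))

# Hypothetical training data: N = 100 songs, 5 features per song.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 5))
valence = X_train @ np.array([0.3, -0.1, 0.0, 0.2, 0.05]) + 0.1
arousal = X_train @ np.array([0.0, 0.6, -0.2, 0.1, 0.0]) - 0.05

# One regressor for valence, one for arousal.
w_v, b_v = fit_linear_regressor(X_train, valence)
w_a, b_a = fit_linear_regressor(X_train, arousal)

# Automatic prediction for three unseen songs: one (valence, arousal) row each.
X_test = rng.normal(size=(3, 5))
va_pred = np.column_stack([predict(w_v, b_v, X_test), predict(w_a, b_a, X_test)])
```

Training the two dimensions independently keeps each model a plain scalar regressor, matching the "one for valence; one for arousal" design on the slide.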


[Figure: Training data → manual annotation (emotion values) and feature extraction (features) → regressor training. Test data → feature extraction (features) → regressor → automatic prediction of emotion values.]


Page 20: Dimensional Music Emotion Recognition

Obtain Music Emotion Rating

Manual annotation: rate the VA values of each song
• Ordinal rating scale
• Scroll bar



Page 21: Dimensional Music Emotion Recognition

User study: 1240 Chinese pop songs, each 30 sec long; 666 subjects, each rating 8 random songs

Subjective evaluation:
• Easiness of annotating emotion
• Within-subject reliability: comparison with the same subject's ratings one month later
• Between-subject reliability: comparison with other subjects' ratings


[Figure: rating scroll bar from 0 to 100]

Method | Easiness | Within-subject reliability | Between-subject reliability
Emotion rating | 2.82 | 2.92 | 2.81

Scores range from 1 to 5 (strongly disagree to strongly agree)

Evaluation of Emotion Rating

Page 22: Dimensional Music Emotion Recognition

AnnoEmo: GUI for Emotion Rating [hcm07]

Encourages differentiation


• Click to listen again
• Drag & drop to modify annotation

Demo

Page 23: Dimensional Music Emotion Recognition

Determining VA values is not easy: it is difficult to ensure consistency

Does dist(0.5, 0.8) = dist(–0.2, 0.1) in terms of our emotion perception? Does 0.7 mean the same thing for two subjects?


Cognitive Load is Still High

[Figure: axes from –1 to 1 with the values 0.8, 0.5, 0.1, and –0.2 marked]

Page 24: Dimensional Music Emotion Recognition

Sol: Ranking Instead of Rating [taslp11a]

Determines the position of a song by its relative ranking with respect to other songs, rather than by its exact emotion values


[Figure: relative ranking vs. exact rating] Songs ordered from positive valence (valence = 1) to negative valence (valence = –1):
Oh Happy Day
I Want to Hold Your Hand by Beatles
I Feel Good by James Brown
What a Wonderful World by Louis Armstrong
Into the Woods by My Morning Jacket
The Christmas Song
C'est La Vie
Labita by Lisa One
Just the Way You Are by Billy Joel
Perfect Day by Lou Reed
When a Man Loves a Woman by Michael Bolton
Smells Like Teen Spirit by Nirvana

Page 25: Dimensional Music Emotion Recognition

Ranking-Based Emotion Annotation

Emotion tournament
• Requires only n–1 pairwise comparisons
• The global ordering can later be approximated by a greedy algorithm [jair99]


[Figure: a tournament over eight songs a to h and the matrix of pairwise outcomes]

Question: "Which song is more positive?"

Resulting ordering: f > b > c = h > a = d = e = g
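A Python sketch of the tournament and of a simplified greedy ordering in the spirit of [jair99]: the preference weight between two songs is just the tournament outcome, and at each step the song with the largest potential (wins minus losses against the remaining songs) is placed next. Function names, the simulated annotator, and its valence scores are hypothetical; the code assumes the number of songs is a power of two.

```python
def run_tournament(songs, more_positive):
    """Single-elimination emotion tournament.

    more_positive(a, b) stands in for the annotator's answer to
    "Which song is more positive?" (True if a is judged more positive).
    Returns (winner, loser) pairs; n songs need exactly n - 1 comparisons.
    """
    results = []
    current = list(songs)
    while len(current) > 1:
        next_round = []
        for a, b in zip(current[0::2], current[1::2]):
            winner, loser = (a, b) if more_positive(a, b) else (b, a)
            results.append((winner, loser))
            next_round.append(winner)
        current = next_round
    return results


def greedy_order(songs, results):
    """Approximate a global ordering from the pairwise outcomes.

    Repeatedly pick the remaining song with the largest potential
    (wins minus losses against other remaining songs) and remove it.
    Ties are broken by the input order of `songs`.
    """
    remaining = list(songs)
    order = []
    while remaining:
        def potential(v):
            wins = sum(1 for w, l in results if w == v and l in remaining)
            losses = sum(1 for w, l in results if l == v and w in remaining)
            return wins - losses
        best = max(remaining, key=potential)
        order.append(best)
        remaining.remove(best)
    return order


# Hypothetical annotator simulated with made-up valence scores.
valence = {"a": 0.1, "b": 0.6, "c": 0.3, "d": -0.2,
           "e": 0.2, "f": 0.9, "g": -0.5, "h": 0.4}
songs = list("abcdefgh")
results = run_tournament(songs, lambda x, y: valence[x] > valence[y])
order = greedy_order(songs, results)
```

With these scores the 7 comparisons reproduce the slide's outcome f > b > c = h > a = d = e = g, and greedy_order returns ['f', 'b', 'c', 'h', 'a', 'd', 'e', 'g']: songs the tournament cannot separate (c and h; a, d, e, g) are simply kept in input order.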

Page 26: Dimensional Music Emotion Recognition

Online Interface


Page 27: Dimensional Music Emotion Recognition

Simplify Emotion Annotation

Subjective evaluation
• Subjects both rate and rank
• The order in which they rate and rank does not matter

Result: [Figure: evaluation result on a scale from weak to strong]

Page 28: Dimensional Music Emotion Recognition

Q: Which Features are Relevant? [psy07]


Sound intensity, tempo, rhythm, pitch range, mode (e.g., major), consonance

Page 29: Dimensional Music Emotion Recognition

Feature Extraction

Melody/harmony [MIR toolbox]: pitch estimate, key clarity, harmonic change, musical mode

Spectral [Marsyas]: spectral flatness measures, spectral crest factors, MFCCs

Temporal [Sound description toolbox]: zero-crossing rate, temporal centroid, log-attack time

Rhythmic [Rhythm pattern extractor]: beat histogram and average tempo

Psychoacoustically motivated features [PsySound]: loudness, sharpness, timbral width, volume, spectral dissonance, tonal dissonance, pure tonal, complex tonal, multiplicity, tonality, chord
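To make two of the listed features concrete, here is a minimal NumPy sketch of the zero-crossing rate and spectral flatness of a single frame, using their textbook definitions. It is not the Marsyas or Sound description toolbox implementation; the Hann window, frame length, eps, and function names are assumptions.

```python
import numpy as np

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (a temporal feature)."""
    signs = np.signbit(frame).astype(int)
    return float(np.mean(np.abs(np.diff(signs))))

def spectral_flatness(frame, eps=1e-12):
    """Geometric mean / arithmetic mean of the power spectrum of one frame.

    Close to 0 for tonal (peaky) spectra, larger for noise-like spectra.
    The Hann window and eps (which avoids log(0)) are assumptions.
    """
    power = np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) ** 2 + eps
    return float(np.exp(np.mean(np.log(power))) / np.mean(power))

# Hypothetical 2048-sample frames at 22.05 kHz: a 440 Hz tone and white noise.
sr = 22050
t = np.arange(2048) / sr
tone = np.sin(2 * np.pi * 440 * t)
noise = np.random.default_rng(0).normal(size=2048)
features = {name: (zero_crossing_rate(x), spectral_flatness(x))
            for name, x in [("tone", tone), ("noise", noise)]}
```

A pure tone crosses zero twice per period, so its zero-crossing rate is about 2 × 440 / 22050 per sample, while its flatness stays near 0; white noise scores high on both.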


Page 30: Dimensional Music Emotion Recognition

Data Collection


Page 31: Dimensional Music Emotion Recognition

Q: Subjective Issue


Each circle represents the emotion annotation for a music piece by a subject

Page 32: Dimensional Music Emotion Recognition

Sol: Probabilistic MER [taslp11b]

Predicts the probability distribution P(e|d) of the perceived emotion e of a music piece d
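The slide does not detail how P(e|d) is represented. One simple way to turn the individual VA annotations of a piece into a distribution over the emotion plane is a Gaussian kernel density estimate evaluated on a grid, sketched below with NumPy. The grid, bandwidth, and function name are assumptions, and the sketch only shows how a target distribution might be formed, not how [taslp11b] predicts it from audio features.

```python
import numpy as np

def emotion_distribution(annotations, grid_size=21, bandwidth=0.2):
    """Estimate P(e|d) for one piece d on a grid over the VA plane [-1, 1]^2.

    annotations: (valence, arousal) pairs given by different subjects.
    Returns the grid axis and a (grid_size x grid_size) probability matrix,
    indexed [arousal, valence], that sums to 1.
    """
    ann = np.asarray(annotations, dtype=float)             # shape (K, 2)
    axis = np.linspace(-1.0, 1.0, grid_size)
    v, a = np.meshgrid(axis, axis)                         # v[i, j] = axis[j], a[i, j] = axis[i]
    points = np.column_stack([v.ravel(), a.ravel()])       # grid points as (valence, arousal)
    sq_dist = ((points[:, None, :] - ann[None, :, :]) ** 2).sum(axis=2)
    density = np.exp(-sq_dist / (2 * bandwidth ** 2)).sum(axis=1)
    return axis, (density / density.sum()).reshape(grid_size, grid_size)

# Hypothetical annotations of one song by four subjects; one subject disagrees.
axis, prob = emotion_distribution([(0.4, 0.6), (0.6, 0.5), (0.5, 0.4), (-0.3, 0.7)])
i, j = np.unravel_index(np.argmax(prob), prob.shape)
mode_valence, mode_arousal = axis[j], axis[i]
```

Unlike averaging into a single ground-truth point, the distribution keeps the minority opinion at (-0.3, 0.7) as a second, smaller bump.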


Page 33: Dimensional Music Emotion Recognition

Sol: Personalized MER [sigir09]

From P(e|d) to P(e|d,u), where u is the user:
• General regressor → personal regressor
• Utilize user feedback


[Figure: training/prediction framework (manual annotation, feature extraction, regressor training, automatic prediction) extended with emotion-based retrieval, user feedback, and personalization]
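The slides do not specify how a general regressor becomes a personal one. As one hypothetical illustration, the sketch below fits a per-user linear correction of the general prediction from the user's feedback, regularized toward the identity mapping so that a user with little feedback falls back to the general regressor. The function names and the weight lam are assumptions, not the method of [sigir09].

```python
import numpy as np

def fit_personal_correction(general_pred, user_ratings, lam=1.0):
    """Fit user_rating ≈ a * general_pred + c for one user.

    Minimizes sum((r - a*g - c)^2) + lam * ((a - 1)^2 + c^2), i.e. ridge
    regression pulled toward the identity mapping a = 1, c = 0.
    """
    g = np.asarray(general_pred, dtype=float)
    r = np.asarray(user_ratings, dtype=float)
    A = np.column_stack([g, np.ones_like(g)])
    prior = np.array([1.0, 0.0])
    theta = np.linalg.solve(A.T @ A + lam * np.eye(2), A.T @ r + lam * prior)
    return theta  # array([a, c])

def personal_predict(theta, general_pred):
    a, c = theta
    return a * np.asarray(general_pred, dtype=float) + c

# Hypothetical feedback: this user rates valence about 0.3 higher than the general model.
g_fb = np.linspace(-0.8, 0.8, 12)
theta = fit_personal_correction(g_fb, g_fb + 0.3, lam=1e-6)
```

With no feedback at all, the solution is exactly the prior (a = 1, c = 0), so the personal prediction equals the general one.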

Page 34: Dimensional Music Emotion Recognition

Evaluation Setup

Training data: 195 Western/Japanese/Chinese pop songs; for each, a 25-sec segment that is representative of the song
• Too long: the emotion may not be homogeneous
• Too short: the listener may not hear enough

Manual annotation: 253 subjects, each rating 12 songs; VA values rated on 11 ordinal levels

○ 0 ○ 1 ○ 2 ○ 3 ○ 4 ○ 5 ○ 6 ○ 7 ○ 8 ○ 9 ○ 10

Each song is annotated by 10+ subjects; the ground truth is obtained by averaging
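A small sketch of the averaging step: each song's ordinal ratings (levels 0 to 10) are averaged and then mapped to [-1, 1]. The linear rescaling (mean - 5) / 5 and the function name are assumptions for illustration; the slide only states that the ground truth is the average.

```python
import numpy as np

def ground_truth(ratings_by_song):
    """Average each song's 11-level ordinal ratings (0..10) and rescale to [-1, 1].

    The linear mapping (mean - 5) / 5 is a hypothetical choice.
    """
    return {song: (float(np.mean(levels)) - 5.0) / 5.0
            for song, levels in ratings_by_song.items()}

# Hypothetical valence ratings from ten subjects per song.
gt = ground_truth({
    "song_a": [5] * 10,
    "song_b": [8, 9, 10, 9, 8, 9, 10, 9, 8, 10],
})
```

Averaging collapses disagreement between subjects into one point, which is the subjectivity issue raised later in the deck.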


Page 35: Dimensional Music Emotion Recognition

Quantitative Result

Result
• R²: squared correlation between y and f(x)
• Valence prediction is challenging

Valence: 0.25 ~ 0.35; Arousal: 0.60 ~ 0.85

Method | R² of valence | R² of arousal
Multiple linear regression | 0.109 | 0.568
AdaBoost.RT [ijcnn04] | 0.117 | 0.553
SVR (support vector regression) [sc04] | 0.222 | 0.570
SVR + RReliefF (feature selection) [ml03] | 0.254 | 0.609
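The R² used here is the squared Pearson correlation between the ground truth y and the prediction f(x). A minimal NumPy version follows; the function name and the data are hypothetical.

```python
import numpy as np

def r2(y, y_pred):
    """R^2 as defined on the slide: squared Pearson correlation between y and f(x)."""
    return float(np.corrcoef(y, y_pred)[0, 1] ** 2)

y = np.array([-0.8, -0.2, 0.1, 0.5, 0.9])        # hypothetical ground-truth values
y_pred = np.array([-0.5, 0.0, 0.3, 0.2, 0.6])    # hypothetical predictions f(x)
score = r2(y, y_pred)
```

This definition is invariant to scaling and shifting the predictions (2·f(x) + 1, or even -f(x), scores the same), so it measures linear association rather than absolute error. It coincides with the coefficient of determination 1 - SS_res/SS_tot only for a least-squares linear fit with an intercept, evaluated on its own training data.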

Page 36: Dimensional Music Emotion Recognition

Qualitative Result


No No No Part 2 - Beyonce

All Of Me - 50 Cent

New York Giants - Big Pun

Why Do I Have To Choose - Willie Nelson

The Last Resort - The Eagles

Mammas Don't Let Your Babies Grow Up To Be Cowboys - Willie Nelson

Live For The One I Love - Celine Dion

If Only In The Heaven's Eyes - NSYNC

I've Got To See You Again - Norah Jones

Bodies - Sex Pistols

You're Crazy - Guns N' Roses

Out Ta Get Me - Guns N' Roses

Page 37: Dimensional Music Emotion Recognition

Missing 1: Temporal Context of Music

“Sweet Anticipation” by David Huron

Music’s most expressive qualities probably relate to structural changes across time

Music emotion can also vary within an excerpt [tsmc06]


Page 38: Dimensional Music Emotion Recognition

Missing 2: Context of Music Listening


• Listening mood/context
• Familiarity/associated memory
• Preference of the singer/performer/song
• Social relationship

Page 39: Dimensional Music Emotion Recognition

Conclusion

A computational framework for predicting numerical emotion values

• Generalizes MER from categorical to dimensional
• Resolves some issues of emotion description
• Rank instead of rate
• 2D user interface for music retrieval

• Valence & subjectivity
• Content & context

Acknowledgement: Prof. Homer Chen, National Taiwan University


Page 40: Dimensional Music Emotion Recognition

References

Music Emotion Recognition, CRC Press, 2011

“A regression approach to music emotion recognition,” IEEE TASLP, 2008. (cited by 76)

“Ranking-based emotion recognition for music organization and retrieval,” IEEE TASLP, 2011

“Prediction of the distribution of perceived music emotions using discrete samples,” IEEE TASLP, 2011

“Exploiting online tags for music emotion classification,” ACM TOMCCAP, 2011

“Machine recognition of music emotion: A review,” ACM TIST, 2012
