ELEN E4896 MUSIC SIGNAL PROCESSING Lecture 11: Chroma …dpwe/e4896/lectures/E4896-L11.pdf ·...

ELEN E4896 MUSIC SIGNAL PROCESSING
Lecture 11: Chroma and Chords

Dan Ellis
Dept. Electrical Engineering, Columbia University
[email protected] http://www.ee.columbia.edu/~dpwe/e4896/

E4896 Music Signal Processing (Dan Ellis) 2013-04-08

1. Features for Music Audio
2. Chroma Features
3. Chord Recognition


1. Features for Music Audio

• Challenges of large music databases
   how to find “what we want”...

• Euclidean metaphor
   music tracks as points in space

• What are the dimensions?
   “sound” - timbre, instruments → MFCC
   melody, chords → Chroma
   rhythm, tempo → Rhythmic bases


MFCCs

• The standard feature for speech recognition

[Figure: MFCC calculation chain. Sound → FFT X[k] → Mel scale freq. warp → log |X[k]| → IFFT → Truncate → MFCCs, with intermediate signals labeled spectra, audspec, cepstra. Panels: waveform vs. time / s; spectrum vs. freq / Hz; auditory spectrum and its log vs. freq / Mel; cepstrum vs. quefrency.]

Logan 2000
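The chain on this slide (FFT, Mel warp, log, inverse transform, truncate) can be sketched for a single frame as below. This is a minimal illustration, not the course's own code: the Mel formula, the triangular filter shapes, the Hann window, the filter count `n_mels`, and the use of a DCT-II in place of the IFFT step are all assumptions, and the function names are hypothetical.

```python
import numpy as np

def hz_to_mel(f_hz):
    # One common Mel formula (an assumption; the slides do not specify one).
    return 2595.0 * np.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    """Triangular filters spaced evenly in Mel; shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2.0), n_mels + 2))
    bank = np.zeros((n_mels, fft_freqs.size))
    for m in range(n_mels):
        lo, mid, hi = edges[m], edges[m + 1], edges[m + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        bank[m] = np.maximum(0.0, np.minimum(rising, falling))
    return bank

def mfcc_frame(frame, sr, n_mels=40, n_ceps=20):
    """MFCCs for one even-length frame of audio samples."""
    n_fft = frame.size
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(n_fft)))  # FFT -> |X[k]|
    audspec = mel_filterbank(n_mels, n_fft, sr) @ spectrum     # Mel freq. warp
    log_audspec = np.log(audspec + 1e-10)                      # log, floored
    # DCT-II used here as a real-valued stand-in for the IFFT step.
    k = np.arange(n_ceps)[:, None]
    n = np.arange(n_mels)[None, :]
    basis = np.cos(np.pi * k * (n + 0.5) / n_mels)
    return basis @ log_audspec                                 # keep n_ceps terms
```

Keeping only the first `n_ceps` coefficients is the "Truncate" step: it retains the broad spectral envelope and discards fine harmonic detail.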


MFCC Example

• Resynthesize by imposing spectrum on noise
   MFCCs capture instruments, not notes

[Figure: three panels vs. time / sec (0-25): "Let It Be - log-freq specgram (LIB-1)" (freq / Hz); "MFCCs" (coefficient); "Noise excited MFCC resynthesis (LIB-2)" (freq / Hz).]


MFCC Artist Classification

• 20 Artists x 6 albums each
   train models on 5 albums, classify tracks from last

• Model as MFCC mean + covariance per artist
   “single Gaussian” model
   20 (mean) + 10 x 19 (covariance) parameters
   55% correct (guessing ~5%)

[Figure: "Confusion: MFCCs (acc 55.13%)". Confusion matrix with the true artist on one axis, over aerosmith, beatles, creedence_c_r, cure, dave_matthews_b, depeche_mode, fleetwood_mac, garth_brooks, green_day, led_zeppelin, madonna, metallica, prince, queen, radiohead, roxette, steely_dan, suzanne_vega, tori_amos, u2.]

Ellis 2007
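As a rough illustration of the “single Gaussian” idea, the sketch below fits one mean and full covariance per artist and assigns a track to the artist whose Gaussian gives its frames the highest average log-likelihood. The function names, the small diagonal regularizer, and the average-log-likelihood decision rule are assumptions made for this example, not details taken from Ellis 2007.

```python
import numpy as np

def fit_gaussian(frames):
    """frames: (n_frames, n_dims) feature matrix for one artist."""
    mean = frames.mean(axis=0)
    # Small diagonal term keeps the covariance invertible (an assumption).
    cov = np.cov(frames, rowvar=False) + 1e-6 * np.eye(frames.shape[1])
    return mean, cov

def avg_log_likelihood(frames, mean, cov):
    """Mean per-frame log-density of frames under N(mean, cov)."""
    d = frames.shape[1]
    diff = frames - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)
    return float(np.mean(-0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)))

def classify(frames, models):
    """models: dict mapping artist name -> (mean, cov). Returns best name."""
    return max(models, key=lambda name: avg_log_likelihood(frames, *models[name]))
```

In use, `models` would be built from the training albums and `classify` called once per held-out track.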


2. Chroma Features

• What about modeling tonal content (notes)?
   melody spotting
   chord recognition
   cover songs...

• MFCCs exclude tonal content

• Polyphonic transcription is too hard
   e.g. sinusoidal tracking: confused by harmonics

• Chroma features as solution...

[Figure: two piano-roll panels, MIDI note number (40-75) vs. time / s (22-36); legend: Recognized, True.]


Chroma Features

• Idea: Project all energy onto 12 semitones regardless of octave
   maintains main “musical” distinction
   invariant to musical equivalence
   no need to worry about harmonics?

$$C(b) = \sum_{k=0}^{N_M} B\left(12 \log_2(k/k_0) - b\right) W(k)\, |X[k]|$$

   W(k) is weighting, B(b) selects every ~ mod 12

[Figure: spectrogram (freq / kHz, 0-4) and chroma (A-G) panels, plotted vs. fft bin, time / sec, and time / frame.]

Fujishima 1999
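One way to read the formula for C(b) is sketched below for a single magnitude spectrum. The choices here are assumptions for illustration: the reference frequency `f_ref` (playing the role of k0) is 440 Hz so that bin 0 is A, B(.) is implemented by rounding to the nearest semitone and taking it mod 12, and W(k) is a simple rectangular band between `fmin` and `fmax`.

```python
import numpy as np

def chroma_from_spectrum(mag, sr, n_fft, f_ref=440.0, fmin=50.0, fmax=4000.0):
    """Fold an FFT magnitude spectrum |X[k]| into 12 chroma bins (bin 0 = A).

    mag: magnitudes for bins 0 .. n_fft // 2 (e.g. from np.fft.rfft).
    """
    freqs = np.arange(mag.size) * sr / n_fft
    in_band = (freqs >= fmin) & (freqs <= fmax)          # W(k): 1 in band, else 0
    semitone = np.round(12.0 * np.log2(freqs[in_band] / f_ref)).astype(int)
    chroma = np.zeros(12)
    np.add.at(chroma, semitone % 12, mag[in_band])       # B(.): same pitch class
    return chroma
```

Because every octave of a pitch class lands in the same bin, a 440 Hz tone and an 880 Hz tone both peak at bin 0 under these assumptions.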


Better Chroma

• Problems:
   blurring of bins close to edges
   limitation of FFT bin resolution

• Solutions:
   peak picking - only keep energy at center of peaks
   Instantaneous Frequency - high-resolution estimates
   adapt tuning center based on histogram of pitches

[Figure: spectrogram (freq / kHz, 0-4) and two chroma panels (A-G) vs. time / sec and time / frame; inset spectrum vs. freq / Hz (0-2000).]
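The peak-picking idea can be illustrated with a very small sketch that zeroes every spectral bin that is not a local maximum, so only the energy at the center of each peak is passed on to the chroma mapping. The function name and the tie-breaking rule are assumptions; the instantaneous-frequency and tuning-histogram refinements listed on the slide are not shown.

```python
import numpy as np

def pick_peaks(mag):
    """Return a copy of mag with everything except local maxima set to zero."""
    peaks = np.zeros_like(mag)
    inner = mag[1:-1]
    # A bin is a peak if it exceeds its left neighbor and is not below its right.
    is_peak = (inner > mag[:-2]) & (inner >= mag[2:])
    peaks[1:-1][is_peak] = inner[is_peak]
    return peaks
```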


Chroma Resynthesis

• Chroma describes the notes in an octave
   ... but not the octave

• Can resynthesize by presenting all octaves
   ... with a smooth envelope
   “Shepard tones” - octave is ambiguous
   endless sequence illusion

$$y_b(t) = \sum_{o=1}^{M} W\left(o + \frac{b}{12}\right) \cos\left(2^{\,o + \frac{b}{12}}\, \omega_0 t\right)$$

[Figure: "12 Shepard tone spectra" (level / dB, -60 to 0, vs. freq / Hz, 0-2500); "Shepard tone resynth" spectrogram (freq / kHz, 0-4, vs. time / sec).]

Ellis & Poliner 2007
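The Shepard tone sum for y_b(t) can be sketched as follows. The base frequency `f0` (standing in for ω0 / 2π), the number of octaves, and the Gaussian shape of the envelope W are assumptions chosen for illustration; the slide only requires that W be smooth.

```python
import numpy as np

def shepard_tone(b, dur=0.5, sr=16000, f0=27.5, n_octaves=8,
                 center_octave=4.5, width_octaves=1.0):
    """Synthesize chroma bin b (0..11) as octave-spaced cosines under a smooth envelope."""
    t = np.arange(int(dur * sr)) / sr
    y = np.zeros_like(t)
    for o in range(1, n_octaves + 1):
        pos = o + b / 12.0                     # position in octaves above f0
        freq = f0 * 2.0 ** pos
        if freq >= sr / 2.0:                   # skip components above Nyquist
            continue
        weight = np.exp(-0.5 * ((pos - center_octave) / width_octaves) ** 2)
        y += weight * np.cos(2.0 * np.pi * freq * t)
    return y / np.max(np.abs(y))               # normalize to unit peak
```

A chroma frame could then be resynthesized as a weighted sum of the 12 tones, one per bin, scaled by the chroma values.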


Chroma Example

• Simple Shepard tone resynthesis
   can also reimpose broad spectrum from MFCCs

[Figure: four panels vs. time / sec (0-25): "Let It Be - log-freq specgram (LIB-1)" (freq / Hz); "Chroma features" (chroma bin, C-B); "Shepard tone resynthesis of chroma (LIB-3)" (freq / Hz); "MFCC-filtered shepard tones (LIB-4)" (freq / Hz).]


Beat-Synchronous Chroma

• Drastically reduce data size
   by recording one chroma frame per beat

[Figure: four panels vs. time / sec (0-25): "Let It Be - log-freq specgram (LIB-1)" (freq / Hz); "Onset envelope + beat times"; "Beat-synchronous chroma" (chroma bin, C-B); "Beat-synchronous chroma + Shepard resynthesis (LIB-6)" (freq / Hz).]

Bartsch & Wakefield 2001
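A minimal sketch of the reduction to one chroma frame per beat is given below. It assumes beat times are already available from a beat tracker and that frames within a beat are combined by averaging; both the averaging choice and the function name are assumptions for this example.

```python
import numpy as np

def beat_sync_chroma(chroma, frame_times, beat_times):
    """Average chroma frames between consecutive beats.

    chroma: (12, n_frames); frame_times: (n_frames,) in seconds;
    beat_times: sorted beat instants in seconds.
    Returns an array of shape (12, len(beat_times) - 1).
    """
    frame_times = np.asarray(frame_times)
    out = np.zeros((chroma.shape[0], len(beat_times) - 1))
    for i, (start, end) in enumerate(zip(beat_times[:-1], beat_times[1:])):
        in_beat = (frame_times >= start) & (frame_times < end)
        if in_beat.any():                      # leave zeros if a beat has no frames
            out[:, i] = chroma[:, in_beat].mean(axis=1)
    return out
```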


3. Chord Recognition

• Beat synchronous chroma look like chords
   can we transcribe them?

• Two approaches
   manual templates (prior knowledge)
   learned models (from training data)

[Figure: beat-synchronous chroma (chroma bin, A-G) vs. time / sec (0-20), annotated with the note sets C-E-G, B-D-G, A-C-E, A-C-D-F ...]
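The "manual templates" approach can be sketched as matching each beat's chroma vector against binary triad patterns. The restriction to 24 major and minor triads, the cosine-style scoring, the row ordering (row 0 = C), and the upper-case major / lower-case minor labels are assumptions for this illustration.

```python
import numpy as np

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def chord_templates():
    """Return (24, 12) unit-norm triad templates and their labels."""
    templates, labels = [], []
    for root in range(12):
        for third, is_minor in ((4, False), (3, True)):
            t = np.zeros(12)
            t[[root, (root + third) % 12, (root + 7) % 12]] = 1.0
            templates.append(t / np.linalg.norm(t))
            name = NOTE_NAMES[root]
            labels.append(name.lower() if is_minor else name)
    return np.array(templates), labels

def match_chords(chroma):
    """chroma: (12, n_beats) with row 0 = C. Returns one label per beat."""
    templates, labels = chord_templates()
    norms = np.linalg.norm(chroma, axis=0, keepdims=True)
    scores = templates @ (chroma / np.maximum(norms, 1e-12))
    return [labels[i] for i in scores.argmax(axis=0)]
```

With idealized inputs, the note sets C-E-G, B-D-G, and A-C-E from the figure map to C major, G major, and A minor respectively.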


Chord Recognition System

• Analogous to speech recognition
   Gaussian models of features for each chord
   Hidden Markov Models for chord transitions

[Figure: system block diagram. Blocks: Audio; Beat track; Chroma (100-1600 Hz BPF); Chroma (25-400 Hz BPF); Resample; beat-synchronous chroma features; Labels; Root normalize; Gaussian; Unnormalize; 24 Gauss models; Count transitions; 24x24 transition matrix; HMM Viterbi; chord labels; separate train and test paths. Insets: "C maj" and "c min" chroma models (C-B vs. C-B); transition matrix over C D E F G A B c d e f g a b.]

Sheh & Ellis 2003
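The "Count transitions" block can be sketched as tallying how often each chord label follows each other label in the training sequences and normalizing each row. The integer encoding of chords (0..23) and the add-one smoothing are assumptions made for this example.

```python
import numpy as np

def count_transitions(label_seqs, n_states=24, smoothing=1.0):
    """Estimate p(q_n | q_{n-1}) from sequences of integer chord labels.

    label_seqs: iterable of sequences with values in 0 .. n_states - 1.
    smoothing: pseudo-count added to every cell so no row is all zeros.
    """
    counts = np.full((n_states, n_states), smoothing, dtype=float)
    for seq in label_seqs:
        for prev, nxt in zip(seq[:-1], seq[1:]):
            counts[prev, nxt] += 1.0
    return counts / counts.sum(axis=1, keepdims=True)   # rows sum to 1
```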


HMMs

• Hidden Markov Models are good for inferring hidden states
   underlying Markov “generative model”
   each state has emission distribution
   observations tell us something about state...
   infer smoothed state sequence

Transition probabilities p(q_{n+1} | q_n) (rows: q_n, columns: q_{n+1}):

        S    A    B    C    E
   S    0    1    0    0    0
   A    0   .8   .1   .1    0
   B    0   .1   .8   .1    0
   C    0   .1   .1   .7   .1
   E    0    0    0    0    1

State sequence: S A A A A A A A A B B B B B B B B B C C C C B B B B B B C E

[Figure: state diagram over S, A, B, C, E; "Emission distributions" p(x|q) for q = A, q = B, q = C vs. observation x; "State sequence" and "Observation sequence" x_n vs. time step n (0-30), with the label string AAAAAAAABBBBBBBBBBBCCCCBBBBBBBC.]
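The generative view can be made concrete by sampling from the five-state chain above. The transition matrix is the one on the slide; the Gaussian emission means and standard deviation are hypothetical values picked only so the example runs, since the slide gives the emission distributions graphically.

```python
import numpy as np

STATES = ["S", "A", "B", "C", "E"]
# Rows: current state q_n; columns: next state q_{n+1} (values from the slide).
TRANS = np.array([
    [0.0, 1.0, 0.0, 0.0, 0.0],   # S
    [0.0, 0.8, 0.1, 0.1, 0.0],   # A
    [0.0, 0.1, 0.8, 0.1, 0.0],   # B
    [0.0, 0.1, 0.1, 0.7, 0.1],   # C
    [0.0, 0.0, 0.0, 0.0, 1.0],   # E
])
EMIT_MEAN = {"A": 1.0, "B": 2.0, "C": 3.0}   # hypothetical emission means
EMIT_STD = 0.5                               # hypothetical emission spread

def sample_hmm(rng, max_steps=1000):
    """Draw a state path and observations, starting at S and stopping at E."""
    state = 0
    path, obs = [], []
    for _ in range(max_steps):
        state = rng.choice(len(STATES), p=TRANS[state])
        if STATES[state] == "E":
            break
        path.append(STATES[state])
        obs.append(rng.normal(EMIT_MEAN[STATES[state]], EMIT_STD))
    return path, np.array(obs)
```

Calling `sample_hmm(np.random.default_rng(0))` yields a run of A, B, and C labels similar in character to the state sequence shown on the slide, together with one noisy observation per step.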


HMM Inference

• HMM defines emission distribution p(x|q)
   and transition probabilities p(q_n|q_{n-1})

• Likelihood of observed given state sequence:

$$p(\{x_n\} \mid \{q_n\}) = \prod_n p(x_n \mid q_n)\, p(q_n \mid q_{n-1})$$

Model M1 transition probabilities (rows: from, columns: to; • marks an empty entry):

        S     A     B     E
   S    •    0.9   0.1    •
   A    •    0.7   0.2   0.1
   B    •     •    0.8   0.2
   E    •     •     •     1

Observation likelihoods p(x|q) for observations x1, x2, x3:

        x1    x2    x3
   A   2.5   0.2   0.1
   B   0.1   2.2   2.3

All possible 3-emission paths Qk from S to E:

   q0 q1 q2 q3 q4   p(Q | M) = Πn p(qn|qn-1)      p(X | Q,M) = Πn p(xn|qn)     p(X,Q | M)
   S  A  A  A  E    .9 x .7 x .7 x .1 = 0.0441    2.5 x 0.2 x 0.1 = 0.05       0.0022
   S  A  A  B  E    .9 x .7 x .2 x .2 = 0.0252    2.5 x 0.2 x 2.3 = 1.15       0.0290
   S  A  B  B  E    .9 x .2 x .8 x .2 = 0.0288    2.5 x 2.2 x 2.3 = 12.65      0.3643
   S  B  B  B  E    .1 x .8 x .8 x .2 = 0.0128    0.1 x 2.2 x 2.3 = 0.506      0.0065
                    Σ = 0.1109                                                 Σ = p(X | M) = 0.4020

[Figure: Model M1 state diagram (S, A, B, E); emission densities p(x|A) and p(x|B) with observations x1, x2, x3 vs. n; trellis of states S, A, B, E vs. time n (0-4) showing the paths.]

• By dynamic programming, we can also identify the best state sequence given just the observations
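The worked example for Model M1 can be checked numerically. The sketch below uses the transition and observation values from the slide; the split into `p_start`, `p_trans`, and `p_end` arrays and the function names are choices made for this illustration. `forward` sums over all paths and `viterbi` finds the single best one by dynamic programming.

```python
import numpy as np

EMITTING = ["A", "B"]
p_start = np.array([0.9, 0.1])        # S -> A, S -> B
p_trans = np.array([[0.7, 0.2],       # A -> A, A -> B
                    [0.0, 0.8]])      # B -> A, B -> B
p_end = np.array([0.1, 0.2])          # A -> E, B -> E
p_obs = np.array([[2.5, 0.2, 0.1],    # p(x_n | A) for x1, x2, x3
                  [0.1, 2.2, 2.3]])   # p(x_n | B) for x1, x2, x3

def forward(p_start, p_trans, p_end, p_obs):
    """Total likelihood p(X | M), summed over all state paths."""
    alpha = p_start * p_obs[:, 0]
    for n in range(1, p_obs.shape[1]):
        alpha = (alpha @ p_trans) * p_obs[:, n]
    return float(alpha @ p_end)

def viterbi(p_start, p_trans, p_end, p_obs):
    """Best state path and its joint probability p(X, Q | M)."""
    delta = p_start * p_obs[:, 0]
    backptrs = []
    for n in range(1, p_obs.shape[1]):
        scores = delta[:, None] * p_trans          # scores[i, j]: from i to j
        backptrs.append(scores.argmax(axis=0))
        delta = scores.max(axis=0) * p_obs[:, n]
    final = delta * p_end
    q = int(final.argmax())
    path = [q]
    for bp in reversed(backptrs):                  # trace back from the end
        q = int(bp[q])
        path.append(q)
    path.reverse()
    return [EMITTING[i] for i in path], float(final.max())
```

With these numbers, `forward` gives about 0.4020, matching Σ = p(X | M) in the table, and `viterbi` returns the path A, B, B with joint probability about 0.3643, the largest entry in the p(X,Q | M) column.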


Key Normalization

• Chord transitions depend on key of piece
   dominant, relative minor, etc...

[Figure: aligned chroma vs. aligned chroma (A-G on both axes) panels for Taxman, Eleanor Rigby, I'm Only Sleeping, She Said She Said, Good Day Sunshine, And Your Bird Can Sing, Love You To, Yellow Submarine, and the "Aligned Global model".]

• Chord transition probabilities should be key-relative
   estimate main key of piece
   rotate all chroma features
   learn models
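The rotation step can be sketched with `np.roll`. Estimating the key by finding the circular shift of a song's average chroma that best matches a reference profile is an assumption made for this example (the slide does not say how the main key is estimated), and the function names are hypothetical.

```python
import numpy as np

def best_rotation(chroma, reference):
    """Shift (0..11) that best aligns the song's mean chroma to the reference."""
    profile = chroma.mean(axis=1)
    scores = [np.dot(np.roll(profile, -r), reference) for r in range(12)]
    return int(np.argmax(scores))

def key_normalize(chroma, reference):
    """Rotate all chroma frames of one song into the reference key.

    chroma: (12, n_frames); reference: (12,) global chroma profile.
    Returns the rotated chroma and the shift that was removed.
    """
    shift = best_rotation(chroma, reference)
    return np.roll(chroma, -shift, axis=0), shift
```

Chord labels would need the same shift applied before transitions are counted, so that the learned model is key-relative.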


Chord Recognition

• Often works:

[Figure: "Let It Be/06-Let It Be", time axis 0-18. Panels: Audio (freq / Hz); Ground truth chord; Beat-synchronous chroma (A-G); Recognized.]

   Ground truth chord: C G A:min A:min/b7 F:maj7 F:maj6 C G F C C G A:min A:min/b7 F:maj7
   Recognized:         C G a F C G F C G a

• But only about 60% of the time


Summary

• Music Audio Features
   capture information useful for classification

• Chroma Features
   12 bins to robustly summarize notes

• Chord Recognition
   Sometimes easy, sometimes subtle


References

• B. Logan, “Mel frequency cepstral coefficients for music modeling,” in Proc. Int. Symp. Music Inf. Retrieval ISMIR, Plymouth, September 2000.

• D. Ellis, “Classifying Music Audio with Timbral and Chroma Features,” in Proc. Int. Symp. Music Inf. Retrieval ISMIR-07, pp. 339-340, Vienna, October 2007.

• T. Fujishima, “Realtime chord recognition of musical sound: A system using common lisp music,” in Proc. Int. Comp. Music Conf., pp. 464-467, Beijing, 1999.

• D. Ellis and G. Poliner, “Identifying Cover Songs With Chroma Features and Dynamic Programming Beat Tracking,” in Proc. ICASSP-07, pp. IV-1429-1432, Hawai'i, April 2007.

• M. A. Bartsch and G. H. Wakefield, “To catch a chorus: Using chroma-based representations for audio thumbnailing,” in Proc. IEEE WASPAA, Mohonk, October 2001.

• A. Sheh and D. Ellis, “Chord Segmentation and Recognition using EM-Trained Hidden Markov Models,” in Proc. Int. Symp. Music Inf. Retrieval ISMIR-03, pp. 185-191, Baltimore, October 2003.