Extracting Melody Lines from Complex Audio

Jana Eggink

Supervisor: Guy J. Brown

University of Sheffield

{j.eggink g.brown}@dcs.shef.ac.uk

Melody Extraction from Complex Audio 2 / 16 Jana Eggink, Sheffield, UK

Task

• Extract the melody line from an audio recording

[Example: flute]

• Useful for: automatic music indexing and analysis, detection of copyright infringement, ‘query-by-humming’ systems...

• No clear definition of what humans perceive as a melody

• Working definition: F0s played by the solo instrument in accompanied sonatas and concertos

• The solo instrument does not necessarily produce the loudest F0 at all times

• Therefore: include information about which instrument produced a specific F0

Task I: Identify Solo Instrument

• Instrument sounds are harmonic; energy is concentrated in partials, which are least likely to be masked by other sounds

[Diagram: audio signal → features (F0 and partials) → recogniser → flute, clarinet, oboe, violin or cello]

• Features based only on frequency position and power of lowest 15 partials

• Statistical recogniser (GMMs) trained on monophonic music
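
As a rough illustration of the recogniser step, the sketch below scores a sequence of feature frames against one Gaussian mixture model per instrument and picks the instrument with the highest average log-likelihood. The diagonal covariances, the two-dimensional toy features and all model parameters are assumptions made for this example; the slides do not give the actual model configuration.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_log_likelihood(x, gmm):
    """Log-likelihood of one frame under a GMM given as (weight, mean, var) triples."""
    terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    top = max(terms)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms))

def classify(frames, models):
    """Return the instrument whose GMM gives the highest mean frame log-likelihood."""
    scores = {
        name: sum(gmm_log_likelihood(f, gmm) for f in frames) / len(frames)
        for name, gmm in models.items()
    }
    return max(scores, key=scores.get), scores

# Toy models with made-up parameters (not trained on real data).
models = {
    "flute": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])],
    "cello": [(0.5, [5.0, 5.0], [1.0, 1.0]), (0.5, [6.0, 6.0], [1.0, 1.0])],
}
frames = [[0.2, 0.1], [0.9, 1.2], [0.4, 0.6]]
best, scores = classify(frames, models)
print(best)
```

Averaging over all frames mirrors the file-level decision reported in the results; per-frame decisions would simply take the maximum for each frame separately.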

Identify Solo Instrument: Features

• Exact frequency position and normalised log-compressed power of first 15 partials

[Table: example frequency (Hz) and power (dB) values for the first partials (e.g. partials near 220, 442 and 658 Hz), together with their frame-to-frame differences]

• Frame to frame differences (deltas and delta-deltas) within tones of continuous F0
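
The feature construction described above can be sketched as follows. This is a minimal illustration under stated assumptions: power is log-compressed to dB and normalised relative to the strongest partial (the slides do not say how the normalisation is done), only three partials are used instead of 15, and the function names `partial_features` and `deltas` are hypothetical.

```python
import math

def partial_features(freqs_hz, powers):
    """Frequency positions followed by log-compressed power relative to the strongest partial."""
    log_p = [10.0 * math.log10(p) for p in powers]
    ref = max(log_p)  # assumed normalisation: strongest partial becomes 0 dB
    return list(freqs_hz) + [lp - ref for lp in log_p]

def deltas(frames):
    """Frame-to-frame differences within one tone; the first frame gets zeros."""
    out = [[0.0] * len(frames[0])]
    for prev, cur in zip(frames, frames[1:]):
        out.append([c - p for p, c in zip(prev, cur)])
    return out

# One tone of continuous F0: three frames, three partials, toy linear powers.
tone = [
    partial_features([220.0, 442.0, 658.0], [1.0, 0.1, 0.01]),
    partial_features([221.0, 441.0, 660.0], [1.0, 0.2, 0.01]),
    partial_features([222.0, 441.0, 661.0], [1.0, 0.2, 0.02]),
]
d = deltas(tone)   # deltas
dd = deltas(d)     # delta-deltas
```

Restricting the differences to frames within one tone avoids large, meaningless jumps at note boundaries.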

Results I: Instrument Identification

stimulus \ response   cello   violin   oboe   clarinet   flute
cello                  94%      6%      0%       0%        0%
violin                 12%     88%      0%       0%        0%
oboe                    0%     18%     82%       0%        0%
clarinet                0%      6%      0%      88%        6%
flute                   0%     25%      0%       0%       75%

• Test material: solo instrument with accompaniment (piano or orchestra) from commercially available CDs; 90 examples of 2-3 min. each

• Instrument identification 86% correct
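
The per-instrument rates can be read off the diagonal of the confusion matrix. The short sketch below does this and also computes the unweighted mean over the five instruments, which comes out at 85.4%. The reported 86% is presumably computed over all 90 examples, and the number of examples per instrument is not given on the slide, so the small difference is expected.

```python
labels = ["cello", "violin", "oboe", "clarinet", "flute"]

# Rows: stimulus, columns: response, in percent (values taken from the slide).
confusion = [
    [94, 6, 0, 0, 0],
    [12, 88, 0, 0, 0],
    [0, 18, 82, 0, 0],
    [0, 6, 0, 88, 6],
    [0, 25, 0, 0, 75],
]

# Correct rate per instrument is the diagonal entry of its row.
per_class = {name: confusion[i][i] for i, name in enumerate(labels)}

# Unweighted mean over instruments (ignores how many examples each class has).
unweighted_mean = sum(per_class.values()) / len(per_class)

# Share of wrong responses per response class, summed over stimuli.
wrong_as = {
    name: sum(confusion[r][c] for r in range(len(labels)) if r != c)
    for c, name in enumerate(labels)
}
most_common_wrong_response = max(wrong_as, key=wrong_as.get)
print(per_class, unweighted_mean, most_common_wrong_response)
```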

But...

• Estimated F0s not very accurate (as judged by manual inspection)

• Overall instrument classification is very good, but only when averaged over a whole sound file; results are not very accurate on a note-by-note or frame-by-frame basis

• More information is needed to find the melody!

Task II: Find Melody (assuming the solo instrument is known)

• Extract multiple F0 candidates

[Diagram: AUDIO → F0 candidates → find most likely ‘path’ through time-frequency space of F0 candidates → silence estimation (only accompaniment?) → MELODY.
LOCAL KNOWLEDGE: F0 strength (~loudness); F0 likelihood (absolute frequency | instrument range); instrument likelihood (recogniser output).
TEMPORAL KNOWLEDGE: tone length; interval transitions.]

• Include additional knowledge about instrument range, tone duration and likely interval transitions to pick the correct candidate

Knowledge Integration (Path Finding)

[Figure: axes time vs. frequency]

• Possible melody paths restricted by longer tones of continuous F0

• All knowledge sources are normalised to equal mean and standard deviation

• Knowledge sources are summed along the current path

• N-best search for most likely path
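
The integration steps above can be sketched in a few lines. This is a simplified illustration, not the method used in the talk: it uses a single-best Viterbi-style search instead of an N-best search, treats each time step as one set of candidate tones, and uses a hypothetical interval score with an arbitrary `transition_weight` in place of a normalised transition likelihood. All data values are made up.

```python
import math
from statistics import mean, pstdev

def znorm(values):
    """Scale one knowledge source to zero mean and unit standard deviation."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd if sd > 0 else 0.0 for v in values]

def interval_score(f_prev, f_next):
    """Hypothetical transition knowledge: smaller intervals (in semitones) score higher."""
    return -abs(12.0 * math.log2(f_next / f_prev))

def best_path(steps, keys, transition_weight=0.1):
    """Find the candidate sequence with the highest summed score (1-best, not N-best)."""
    flat = [c for step in steps for c in step]
    for key in keys:  # normalise every local source over all candidates
        for cand, z in zip(flat, znorm([c[key] for c in flat])):
            cand["z_" + key] = z

    def local(c):
        return sum(c["z_" + k] for k in keys)

    scores = [local(c) for c in steps[0]]
    back = []
    for prev, cur in zip(steps, steps[1:]):
        new_scores, pointers = [], []
        for c in cur:
            options = [
                s + transition_weight * interval_score(p["f0"], c["f0"])
                for s, p in zip(scores, prev)
            ]
            j = max(range(len(options)), key=options.__getitem__)
            new_scores.append(options[j] + local(c))
            pointers.append(j)
        scores = new_scores
        back.append(pointers)

    i = max(range(len(scores)), key=scores.__getitem__)
    path = [i]
    for pointers in reversed(back):  # trace the best path backwards
        i = pointers[i]
        path.append(i)
    path.reverse()
    return [steps[t][k]["f0"] for t, k in enumerate(path)]

# Toy candidates: the low accompaniment notes are strongest but unlikely for the instrument.
steps = [
    [{"f0": 440.0, "strength": 0.9, "instrument": 0.8},
     {"f0": 110.0, "strength": 1.0, "instrument": 0.1}],
    [{"f0": 494.0, "strength": 0.7, "instrument": 0.9},
     {"f0": 110.0, "strength": 1.0, "instrument": 0.2}],
    [{"f0": 523.0, "strength": 0.8, "instrument": 0.7},
     {"f0": 165.0, "strength": 0.9, "instrument": 0.1}],
]
melody = best_path(steps, ["strength", "instrument"])
print(melody)
```

In this toy case, picking the strongest candidate at each step would follow the accompaniment (110, 110, 165 Hz), while the summed, normalised scores select 440, 494, 523 Hz.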

‘Silence’ Estimation

• Solo instrument is not always continuously playing

• Use likelihoods for solo instrument along the estimated path

• Present threshold: median of likelihood values for solo instrument (assuming the solo instrument is present at least 50% of the time)

• Silent threshold: mean of likelihood values over all instruments

• Assign whole tones according to proximity to present/silent threshold and the state of their neighbours

• Impose minimum length on ‘present’ sections
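
A minimal sketch of the thresholding and minimum-length steps is given below. It makes several assumptions that are not in the slides: each tone is summarised by a single solo-instrument log-likelihood, the minimum length is counted in tones (`min_present`), and the neighbour-dependent assignment is left out because its exact rule is not specified. The numbers are made up.

```python
from statistics import mean, median

def estimate_presence(tone_solo_ll, all_instrument_ll, min_present=2):
    """Label each tone as solo-present (True) or silent (False)."""
    present_thr = median(tone_solo_ll)    # solo assumed present at least 50% of the time
    silent_thr = mean(all_instrument_ll)  # mean likelihood over all instruments

    # Assign each tone to the closer of the two thresholds.
    labels = [abs(ll - present_thr) <= abs(ll - silent_thr) for ll in tone_solo_ll]

    # Impose a minimum length on 'present' sections: short runs become silent.
    out = labels[:]
    start = None
    for i, flag in enumerate(labels + [False]):  # sentinel closes a trailing run
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start < min_present:
                out[start:i] = [False] * (i - start)
            start = None
    return out

tone_solo_ll = [-2.0, -2.5, -9.0, -3.0, -8.5, -2.2, -2.4]
all_instrument_ll = [-2.0, -6.0, -9.0, -7.0, -8.0, -10.0]
presence = estimate_presence(tone_solo_ll, all_instrument_ll)
print(presence)
```

Here the isolated tone at index 3 is close to the present threshold but is relabelled as silent because its run is shorter than the assumed minimum of two tones.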

Evaluation: Test Material

• Realistic recordings do not provide information about ‘true’ F0s; even scores are only an approximation

• Use MIDI generated audio

• Real instrument samples, but only 3-4 per octave, provided by the sampler software

• 10 examples: for every solo instrument, one piece with piano accompaniment and one with orchestra

• Solo instrument and accompaniment mixed at 0 dB SNR

• Whole movements (or first 3 minutes) to ensure sufficient presence of the solo instrument; mixture of different styles and tempi
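
Mixing at a given SNR can be sketched as below, treating the solo instrument as the signal and the accompaniment as the noise (an assumption about how the SNR is defined here). The function name `mix_at_snr` and the sine-wave test signals are illustrative only.

```python
import math

def mix_at_snr(solo, accompaniment, snr_db=0.0):
    """Scale the accompaniment so the solo-to-accompaniment power ratio equals snr_db, then add."""
    p_solo = sum(x * x for x in solo) / len(solo)
    p_acc = sum(x * x for x in accompaniment) / len(accompaniment)
    gain = math.sqrt(p_solo / (p_acc * 10.0 ** (snr_db / 10.0)))
    mixture = [s + gain * a for s, a in zip(solo, accompaniment)]
    return mixture, gain

sr = 8000  # samples per second, one second of audio
solo = [math.sin(2.0 * math.pi * 440.0 * n / sr) for n in range(sr)]
acc = [0.5 * math.sin(2.0 * math.pi * 220.0 * n / sr) for n in range(sr)]
mixture, gain = mix_at_snr(solo, acc, snr_db=0.0)
print(round(gain, 3))
```

At 0 dB both parts have equal average power over the whole excerpt, so the local SNR can still be much worse wherever the solo instrument plays quietly.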

Results: F0 Estimation

• Comparing F0s estimated using harmonic sieves (which search for prominent harmonic series) with simply picking the highest spectral peak shows no advantage of the former

F0 candidates    flute   clarinet   oboe   violin   cello   average
1 (strongest)     70%      78%      48%     28%      38%      52%
3                 82%      94%      78%     61%      64%      76%
15                98%      99%      98%     94%      84%      95%

(based solely on sections where the solo instrument is present)

• Very unexpected, but might be caused by the very rich mixture of harmonically related tones; initial results show that other algorithms that search for harmonic series, e.g. YIN (autocorrelation based), do not do well either
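
To make the two approaches concrete, the sketch below contrasts picking the highest spectral peak with a very simple harmonic sieve that sums the power of all peaks lying close to a harmonic of each candidate F0. The tolerance, the restriction of candidates to observed peak frequencies, and the toy peak list are assumptions for illustration; the sieve used in the talk is not specified in the slides. The toy case is constructed so that the sieve helps, whereas the results above show no such advantage on the actual test material.

```python
def strongest_peak(peaks):
    """Baseline: frequency of the highest spectral peak."""
    return max(peaks, key=lambda p: p[1])[0]

def sieve_score(f0, peaks, n_harmonics=15, tol=0.01):
    """Sum the power of peaks within a relative tolerance of a harmonic of f0."""
    score = 0.0
    for freq, power in peaks:
        h = round(freq / f0)  # nearest harmonic number
        if 1 <= h <= n_harmonics and abs(freq - h * f0) <= tol * h * f0:
            score += power
    return score

def harmonic_sieve(peaks):
    """Pick the peak frequency whose harmonic series collects the most power."""
    candidates = [freq for freq, _ in peaks]
    return max(candidates, key=lambda f0: sieve_score(f0, peaks))

# (frequency in Hz, power) pairs: a 440 Hz tone with harmonics plus a louder 300 Hz tone.
peaks = [
    (300.0, 1.2), (440.0, 1.0), (600.0, 0.3),
    (880.0, 0.8), (1320.0, 0.6), (1760.0, 0.4),
]
f0_peak = strongest_peak(peaks)
f0_sieve = harmonic_sieve(peaks)
print(f0_peak, f0_sieve)
```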

Results: Instrument Identification

• Solo instrument without accompaniment: all examples correct, except one oboe mistaken for a flute

• Solo instrument with accompaniment: violin and cello still correct, but performance for woodwinds approaching random, even with true F0s provided

Possible reasons:

• Sample-based music might be harder to identify, as it provides less instrument-specific variation, e.g. vibrato

• Mixing level might be unfavourable, with worse SNR than in realistic recordings

• Frequency regions that are dominated by the accompaniment might differ between realistic recordings and MIDI-based audio

Results: Melody Extraction

• Baseline performance: only strongest F0, no other knowledge

                 correct frames   tones found   spurious tones
strongest F0          40%             78%            321%
path                  51%             76%            135%
path+silence          54%             72%            117%

• Number of correct frames improved by 14 percentage points (from 40% to 54%), with the number of spurious tones reduced to nearly a third (from 321% to 117%), leading to significantly smoother melody lines

• Path finding and especially silence estimation likely to suffer from poor instrument identification performance with MIDI based audio
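
The summary figures follow directly from the table values; the short check below makes the arithmetic explicit. The dictionary keys are just labels chosen for this example.

```python
# Values in percent, taken from the results table.
results = {
    "strongest F0": {"correct_frames": 40, "tones_found": 78, "spurious_tones": 321},
    "path": {"correct_frames": 51, "tones_found": 76, "spurious_tones": 135},
    "path+silence": {"correct_frames": 54, "tones_found": 72, "spurious_tones": 117},
}

baseline = results["strongest F0"]
final = results["path+silence"]

# Improvement in correct frames, in absolute percentage points.
gain_points = final["correct_frames"] - baseline["correct_frames"]

# Remaining spurious tones as a fraction of the baseline ("nearly a third").
spurious_ratio = final["spurious_tones"] / baseline["spurious_tones"]

# Cost of the improvement: fewer of the true tones are found.
tones_lost_points = baseline["tones_found"] - final["tones_found"]
print(gain_points, round(spurious_ratio, 3), tones_lost_points)
```

The 14% improvement is therefore an absolute difference in percentage points, bought with a 6-point drop in tones found.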

Realistic Example

[Figure, two panels: ‘Melody based on strongest F0’ and ‘Melody based on knowledge integrating path finding’; x-axis: time (frames), 0 to 300; y-axis: F0 (Hz), 400 to 1600]

• Beginning of Mozart’s Clarinet Concerto, taken from a CD recording, manually annotated F0s (gray) and estimated melody (black)

Conclusions and Future Work

• Audio generated from MIDI not necessarily good test material!

• Two short, manually annotated realistic examples gave 10%-15% more correct frames than equivalent MIDI-based examples

• Further work will concentrate on realistic examples, which requires manual labelling, or

• Automatic alignment of MIDI data to real recordings?!

The End

Any Questions?