Extracting Melody Lines from Complex Audio

Jana Eggink

Supervisor: Guy J. Brown

University of Sheffield

{j.eggink g.brown}@dcs.shef.ac.uk

Melody Extraction from Complex Audio 2 / 16 Jana Eggink, Sheffield, UK

Task

• Extract the melody line from an audio recording


• Useful for: automatic music indexing and analysis, detection of copyright infringement, ‘query-by-humming’ systems...

• No clear definition of what is perceived as a melody by humans

• Working definition: F0s played by the solo instrument in accompanied sonatas and concertos

• The solo instrument does not always produce the loudest F0

• Therefore: include information about which instrument produced a specific F0


Task I: Identify Solo Instrument

• Instrument sounds are harmonic; their energy is concentrated in the partials ...

[Figure: audio signal → F0 and partials → features → recogniser → flute / clarinet / oboe / violin / cello]

• ... which are least likely to be masked by other sounds

• Features based only on the frequency position and power of the lowest 15 partials

• Statistical recogniser (Gaussian mixture models, GMMs) trained on monophonic music
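The slides only say that the recogniser uses GMMs trained on monophonic music. The following numpy sketch fits one diagonal-covariance GMM per instrument with plain EM and makes a whole-file decision by summing frame log-likelihoods. The number of components, the diagonal covariances, the initialisation and all function names (`fit_diag_gmm`, `frame_loglik`, `classify_file`) are assumptions for illustration, not the author's implementation.

```python
import numpy as np


def _component_log_pdf(X, means, variances):
    """Log density of every frame under every diagonal Gaussian: shape (n, K)."""
    diff = X[:, None, :] - means[None, :, :]
    return -0.5 * (np.sum(diff ** 2 / variances[None, :, :], axis=2)
                   + np.sum(np.log(2 * np.pi * variances), axis=1)[None, :])


def fit_diag_gmm(X, n_components=4, n_iter=50, seed=0, reg=1e-6):
    """Fit a diagonal-covariance GMM to feature frames X of shape (n, d) with EM."""
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    means = X[rng.choice(n, n_components, replace=False)]
    variances = np.tile(X.var(axis=0) + reg, (n_components, 1))
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each frame
        log_p = _component_log_pdf(X, means, variances) + np.log(weights)
        resp = np.exp(log_p - np.logaddexp.reduce(log_p, axis=1, keepdims=True))
        # M-step: re-estimate mixture weights, means and variances
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        variances = np.maximum((resp.T @ X ** 2) / nk[:, None] - means ** 2, 0.0) + reg
    return weights, means, variances


def frame_loglik(X, model):
    """Log-likelihood of each frame under one instrument model."""
    weights, means, variances = model
    log_p = _component_log_pdf(X, means, variances) + np.log(weights)
    return np.logaddexp.reduce(log_p, axis=1)


def classify_file(X, models):
    """Whole-file decision: the instrument with the largest summed frame log-likelihood."""
    totals = {name: frame_loglik(X, m).sum() for name, m in models.items()}
    return max(totals, key=totals.get)
```

Taking the per-frame argmax of `frame_loglik` instead of the summed total would give the frame-by-frame decisions that the slides describe as much less reliable than whole-file averages.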


Identify Solo Instrument: Features

• Exact frequency position and normalised, log-compressed power of the first 15 partials

[Figure: example partials with their frequency (Hz) and power (dB) values]

• Frame-to-frame differences (deltas and delta-deltas) within tones of continuous F0
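A minimal numpy sketch of these features, assuming the F0 of each frame is already known. Each partial is taken as the strongest FFT bin within a ±3% window around k·F0, refined by parabolic interpolation; powers are expressed in dB relative to the strongest partial. The Hann window, the search tolerance and this normalisation are assumptions, since the slides do not specify them, and the function names are hypothetical.

```python
import numpy as np

N_PARTIALS = 15


def partial_features(frame, sr, f0, n_partials=N_PARTIALS, tol=0.03):
    """Frequencies (Hz) and normalised log powers (dB) of the first partials of one frame."""
    n = len(frame)
    log_pow = 10 * np.log10(np.abs(np.fft.rfft(frame * np.hanning(n))) ** 2 + 1e-12)
    freqs = np.fft.rfftfreq(n, d=1.0 / sr)
    part_f = f0 * np.arange(1, n_partials + 1)    # default: ideal harmonic positions
    part_p = np.full(n_partials, log_pow.min())   # default: spectral floor
    for k in range(1, n_partials + 1):
        target = k * f0
        # assumed search window of +/- tol (relative) around the k-th harmonic
        idx = np.flatnonzero(np.abs(freqs - target) <= tol * target)
        if idx.size == 0:
            continue
        i = idx[np.argmax(log_pow[idx])]
        offset = 0.0
        if 0 < i < len(log_pow) - 1:
            # parabolic interpolation on the dB spectrum for a sub-bin frequency estimate
            a, b, c = log_pow[i - 1], log_pow[i], log_pow[i + 1]
            denom = a - 2 * b + c
            if denom != 0:
                offset = float(np.clip(0.5 * (a - c) / denom, -0.5, 0.5))
        part_f[k - 1] = freqs[i] + offset * sr / n
        part_p[k - 1] = log_pow[i]
    # normalise powers relative to the strongest partial (an assumption)
    return np.concatenate([part_f, part_p - part_p.max()])


def tone_features(frames, sr, f0):
    """Static features plus deltas and delta-deltas across the frames of one tone."""
    static = np.array([partial_features(fr, sr, f0) for fr in frames])
    delta = np.diff(static, axis=0, prepend=static[:1])
    delta2 = np.diff(delta, axis=0, prepend=delta[:1])
    return np.hstack([static, delta, delta2])
```

Each frame yields 30 static values (15 frequencies, 15 powers); stacking deltas and delta-deltas gives 90 values per frame.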


Results I: Instrument Identification

Confusion matrix (rows: stimulus, columns: response)

stimulus \ response   cello   violin   oboe   clarinet   flute
cello                  94%      6%      0%       0%        0%
violin                 12%     88%      0%       0%        0%
oboe                    0%     18%     82%       0%        0%
clarinet                0%      6%      0%      88%        6%
flute                   0%     25%      0%       0%       75%

• Test material: solo instrument with accompaniment (piano or orchestra) from commercially available CDs; 90 examples, 2-3 min each

• Instrument identification: 86% correct


But...

• Estimated F0s not very accurate (as judged by manual inspection)

• Overall instrument classification is very good, but only when averaged over a whole sound file; results are not very accurate on a note-by-note or frame-by-frame basis

• More information is needed to find the melody!


Task II: Find Melody (assuming the solo instrument is known)

• Extract multiple F0 candidates

[Figure: AUDIO → F0 candidates → find most likely ‘path’ through the time-frequency space of F0 candidates → MELODY. Local knowledge: F0 strength (~loudness), F0 likelihood (absolute frequency | instrument range), instrument likelihood (recogniser output). Temporal knowledge: tone length, interval transitions. Silence estimation (only accompaniment?)]

• Include additional knowledge about instrument range, tone duration and likely interval transitions to pick the correct candidate
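The slides name these knowledge sources but not their functional form. As an illustration only, each can be written as a log-domain score. The Gaussian range model, the Laplacian interval model, the duration penalty and all their parameters below are hypothetical choices; F0 strength and the recogniser's instrument likelihood would come directly from the candidate extraction and the GMMs.

```python
import numpy as np


def semitones(f_hz, ref_hz=440.0):
    """Convert frequency in Hz to semitones relative to ref_hz."""
    return 12.0 * np.log2(np.asarray(f_hz, dtype=float) / ref_hz)


def range_score(f0_hz, low_hz, high_hz):
    """Hypothetical F0 likelihood given the instrument range: a Gaussian in
    semitones centred on the middle of the range (log-domain score)."""
    lo, hi = semitones(low_hz), semitones(high_hz)
    centre, spread = (lo + hi) / 2.0, (hi - lo) / 4.0
    return -0.5 * ((semitones(f0_hz) - centre) / spread) ** 2


def interval_score(f_prev_hz, f_next_hz, scale=3.0):
    """Hypothetical transition score: small steps are likelier than large leaps."""
    return -np.abs(semitones(f_next_hz) - semitones(f_prev_hz)) / scale


def duration_score(n_frames, min_frames=5):
    """Hypothetical tone-length score: penalise tones shorter than min_frames."""
    return 0.0 if n_frames >= min_frames else float(n_frames - min_frames)
```

Working in semitones makes the range and interval scores independent of absolute pitch, which matches how melodic intervals are usually described.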


[Figure: time-frequency plane (axes: time, frequency)]

Knowledge Integration (Path Finding)

• Possible melody paths are restricted by longer tones of continuous F0

• All knowledge sources are normalised to equal mean and standard deviation

• Knowledge sources are summed along the current path

• N-best search for the most likely path
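The integration step can be sketched under a few assumptions: each tone slot offers some F0 candidates, every knowledge source (here a local strength score and a hypothetical interval-jump score) is z-score normalised over all its values, the normalised sources are summed along each partial path, and the N-best search is implemented as a beam that keeps the N highest-scoring partial paths. The slot structure, the jump score and the function names are illustrative, not the author's implementation.

```python
import numpy as np


def zscore(values):
    """Normalise one knowledge source to zero mean and unit standard deviation."""
    v = np.asarray(values, dtype=float)
    sd = v.std()
    return (v - v.mean()) / sd if sd > 0 else v - v.mean()


def semitone_jump_score(f_prev, f_next):
    """Hypothetical transition source: penalise large intervals (in semitones)."""
    return -abs(12.0 * np.log2(f_next / f_prev)) / 3.0


def n_best_path(cand_f0, local_sources, transition=semitone_jump_score, n_best=5):
    """cand_f0[t]: candidate F0s (Hz) for tone slot t.
    local_sources: list of sources; each gives one score array per slot.
    Returns the candidate index chosen in each slot for the best path found."""
    T = len(cand_f0)
    sizes = [len(c) for c in cand_f0]
    bounds = np.cumsum([0] + sizes)
    # normalise each local source over all candidates, then sum the sources
    local = sum(zscore(np.concatenate(src)) for src in local_sources)
    local_t = [local[bounds[t]:bounds[t + 1]] for t in range(T)]
    # transition scores between consecutive slots, normalised the same way
    raw = [np.array([[transition(a, b) for b in cand_f0[t]] for a in cand_f0[t - 1]])
           for t in range(1, T)]
    trans = []
    if raw:
        flat = zscore(np.concatenate([m.ravel() for m in raw]))
        pos = 0
        for m in raw:
            trans.append(flat[pos:pos + m.size].reshape(m.shape))
            pos += m.size
    # N-best (beam) search: keep the n_best highest-scoring partial paths
    beam = sorted(((local_t[0][i], [i]) for i in range(sizes[0])),
                  key=lambda p: p[0], reverse=True)[:n_best]
    for t in range(1, T):
        expanded = [(score + trans[t - 1][path[-1], i] + local_t[t][i], path + [i])
                    for score, path in beam for i in range(sizes[t])]
        beam = sorted(expanded, key=lambda p: p[0], reverse=True)[:n_best]
    return beam[0][1]


cands = [[440.0, 880.0], [494.0, 988.0], [523.0, 1046.0]]
strength = [np.array([1.0, 0.2]), np.array([0.3, 1.0]), np.array([1.0, 0.2])]
print(n_best_path(cands, [strength]))
```

In this toy example, picking the strongest candidate per slot gives an octave jump (indices 0, 1, 0), whereas the summed, normalised sources prefer the smooth path 440 → 494 → 523 Hz (indices 0, 0, 0).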


‘Silence’ Estimation

• The solo instrument is not always playing

• Use the likelihoods for the solo instrument along the estimated path

• Present threshold: median of likelihood values for solo instrument (assuming the solo instrument is present at least 50% of the time)

• Silent threshold: mean of likelihood values over all instruments

• Assign whole tones to ‘present’ or ‘silent’ according to their proximity to the present/silent thresholds and the state of their neighbouring tones

• Impose minimum length on ‘present’ sections
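A sketch of this silence estimation, assuming frame-level log-likelihoods along the estimated path and known tone boundaries. The two thresholds follow the slide (median of the solo-instrument values, mean over all instruments); the ambiguity margin, the neighbour rule (an ambiguous tone copies its neighbours when they agree) and the default minimum length are assumptions, and `estimate_presence` is a hypothetical name.

```python
import numpy as np


def estimate_presence(solo_ll, all_ll, tones, margin=0.1, min_present=20):
    """solo_ll: frame log-likelihoods of the solo instrument along the path, shape (n,)
    all_ll: frame log-likelihoods of all instruments along the path, shape (n, k)
    tones: list of (start, end) frame indices of the tones on the path
    Returns one boolean per tone: True if the solo instrument is judged present."""
    solo_ll = np.asarray(solo_ll, dtype=float)
    present_thr = np.median(solo_ll)   # assumes the solo plays >= 50% of the time
    silent_thr = np.mean(all_ll)       # mean over all instruments
    span = present_thr - silent_thr
    if abs(span) < 1e-12:
        span = 1e-12
    # relative position of each tone: 1.0 at the present threshold, 0.0 at the silent one
    pos = np.array([(solo_ll[s:e].mean() - silent_thr) / span for s, e in tones])
    state = pos >= 0.5
    # hypothetical neighbour rule: an ambiguous tone copies agreeing neighbours
    for i in np.flatnonzero(np.abs(pos - 0.5) < margin):
        if 0 < i < len(state) - 1 and state[i - 1] == state[i + 1]:
            state[i] = state[i - 1]
    # impose a minimum length (in frames) on every run of 'present' tones
    i = 0
    while i < len(state):
        if not state[i]:
            i += 1
            continue
        j = i
        while j < len(state) and state[j]:
            j += 1
        if sum(e - s for s, e in tones[i:j]) < min_present:
            state[i:j] = False
        i = j
    return state
```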


Evaluation: Test Material

• Realistic recordings do not provide information about the ‘true’ F0s; even scores are only an approximation

• Use MIDI-generated audio

• Real instrument samples, but only 3-4 per octave, provided by the sampler software

• 10 examples: for every solo instrument, one piece with piano accompaniment and one with orchestra

• Solo instrument and accompaniment mixed at 0dB SNR

• Whole movements (or first 3 minutes) to ensure sufficient presence of the solo instrument; mixture of different styles and tempi
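Mixing at 0 dB SNR means that the solo and the accompaniment have equal average power. A minimal sketch, assuming the SNR is measured over the whole signal (the slide does not say whether passages where the solo is silent are excluded); `mix_at_snr` is a hypothetical name.

```python
import numpy as np


def mix_at_snr(solo, accomp, snr_db=0.0):
    """Scale the accompaniment so that the solo-to-accompaniment power ratio
    equals snr_db, then add the two signals. Returns (mixture, gain)."""
    solo = np.asarray(solo, dtype=float)
    accomp = np.asarray(accomp, dtype=float)
    p_solo = np.mean(solo ** 2)
    p_acc = np.mean(accomp ** 2)
    # after scaling, p_solo / (gain**2 * p_acc) == 10 ** (snr_db / 10)
    gain = np.sqrt(p_solo / (p_acc * 10.0 ** (snr_db / 10.0)))
    return solo + gain * accomp, gain
```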


Results: F0 Estimation

• Comparing F0s estimated with harmonic sieves, which search for prominent harmonic series, against simply picking the highest spectral peak shows no advantage for the former

F0 candidates    flute   clarinet   oboe   violin   cello   average
1 (strongest)     70%      78%      48%     28%     38%      52%
3                 82%      94%      78%     61%     64%      76%
15                98%      99%      98%     94%     84%      95%

(based solely on sections where the solo instrument is present)

• Very unexpected, but this might be caused by the very rich mixture of harmonically related tones; initial results show that other algorithms that search for harmonic series, e.g. YIN (autocorrelation-based), do not do well either
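To make the comparison concrete, here is a minimal sketch of the two single-F0 estimators on one frame. The "sieve" is implemented as plain harmonic summation (spectral power sampled at the bins nearest to k·F0 over a 1 Hz grid of candidate F0s); the sieve actually used, its search range and its number of harmonics are not given on the slide, so these are assumptions, as are the function names.

```python
import numpy as np


def magnitude_spectrum(frame, sr):
    """Hann-windowed magnitude spectrum and the bin centre frequencies."""
    mag = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    return freqs, mag


def highest_peak_f0(freqs, mag):
    """Baseline: take the frequency of the strongest spectral bin as the F0."""
    return freqs[np.argmax(mag)]


def sieve_f0(freqs, mag, f_min=100.0, f_max=1000.0, n_harm=10, step=1.0):
    """Harmonic summation: pick the candidate F0 whose harmonics collect the most power."""
    bin_hz = freqs[1] - freqs[0]
    best_f0, best_score = f_min, -np.inf
    for f0 in np.arange(f_min, f_max, step):
        harmonics = f0 * np.arange(1, n_harm + 1)
        harmonics = harmonics[harmonics < freqs[-1]]
        idx = np.rint(harmonics / bin_hz).astype(int)
        score = np.sum(mag[idx] ** 2)
        if score > best_score:
            best_f0, best_score = f0, score
    return best_f0
```

For an isolated tone with a weak fundamental, the highest peak lands on the second harmonic while harmonic summation recovers the fundamental; according to the slide, no such advantage appeared in the dense mixtures of the test material.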


Results: Instrument Identification

• Solo instrument without accompaniment: all examples correct, except one oboe mistaken for a flute

• Solo instrument with accompaniment: violin and cello still correct, but performance for woodwinds approaches random, even when the true F0s are provided

Possible reasons:

• Sample-based music might be harder to identify, as it provides less instrument-specific variation such as vibrato

• The mixing level might be unfavourable, with a worse SNR than in realistic recordings

• The frequency regions dominated by the accompaniment might differ between realistic recordings and MIDI-based audio


Results: Melody Extraction

• Baseline performance: strongest F0 only, no other knowledge

method           correct frames   tones found   spurious tones
strongest F0          40%             78%            321%
path                  51%             76%            135%
path+silence          54%             72%            117%

• Number of correct frames improved by 14 percentage points (from 40% to 54%), with the number of spurious tones reduced to nearly a third (from 321% to 117%), leading to clearly smoother melody lines

• Path finding, and especially silence estimation, are likely to suffer from the poor instrument identification performance on MIDI-based audio
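The slides do not define when a frame counts as correct. One plausible definition, given purely as an assumption, scores a frame as correct when the estimated F0 lies within half a semitone of the annotated F0, counted over the frames where the solo instrument is annotated as playing. A minimal sketch (the function name and tolerance are hypothetical):

```python
import numpy as np


def percent_correct_frames(ref_f0, est_f0, tol_semitones=0.5):
    """Share (%) of frames with a reference F0 whose estimate lies within
    tol_semitones of it. Frames without an F0 (silence) are marked by 0."""
    ref = np.asarray(ref_f0, dtype=float)
    est = np.asarray(est_f0, dtype=float)
    voiced = ref > 0
    if not voiced.any():
        return float("nan")
    ok = np.zeros(ref.shape, dtype=bool)
    both = voiced & (est > 0)
    # octave errors (12 semitones) and other large deviations count as wrong
    ok[both] = np.abs(12.0 * np.log2(est[both] / ref[both])) <= tol_semitones
    return 100.0 * ok[voiced].mean()
```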


Realistic Example

[Figure, two panels: "Melody based on strongest F0" and "Melody based on knowledge integrating path finding"; x-axis: time (frames), 0 to 300; y-axis: F0 (Hz), ticks at 400, 800 and 1600]

• Beginning of Mozart’s Clarinet Concerto, taken from a CD recording; manually annotated F0s (gray) and estimated melody (black)


Conclusions and Future Work

• Audio generated from MIDI is not necessarily good test material!

• Two short, manually annotated realistic examples gave 10%-15% more correct frames than equivalent MIDI-based examples

• Further work to concentrate on realistic examples, which requires manual labeling, or

• Automatic alignment of MIDI data to real recordings?!


The End

Any Questions?
