Extracting Melody Lines from Complex Audio
Jana Eggink
Supervisor: Guy J. Brown
University of Sheffield
{j.eggink, g.brown}@dcs.shef.ac.uk
Melody Extraction from Complex Audio 2 / 16 Jana Eggink, Sheffield, UK
Task
• Extract the melody line from an audio recording
[Figure label: flute]
• Useful for: automatic music indexing and analysis, detection of copyright infringement, ‘query-by-humming’ systems...
• No clear definition of what is perceived as a melody by humans
• Working definition: F0s played by the solo instrument in accompanied sonatas and concertos
• The solo instrument does not necessarily always produce the loudest F0
• Therefore: include information about the instrument by which a specific F0 was produced
Task I: Identify Solo Instrument
• Instrument sounds are harmonic: energy is concentrated in partials, which are least likely to be masked by other sounds
[Diagram: audio signal, F0 and partials, features, recogniser; output classes: flute, clarinet, oboe, violin, cello]
• Features based only on frequency position and power of lowest 15 partials
• Statistical recogniser (GMMs) trained on monophonic music
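A minimal sketch of how a GMM-based recogniser of this kind could score a recording. The slides do not give model details, so diagonal covariances, the toy parameters and all function names below are assumptions made for illustration. Frame log-likelihoods are averaged over the file and the best-scoring instrument is chosen.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_log_likelihood(x, gmm):
    """Log-likelihood of x under a mixture given as (weight, mean, var) tuples."""
    terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    top = max(terms)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms))

def classify(frames, models):
    """Return the instrument whose GMM gives the highest mean frame log-likelihood."""
    scores = {
        name: sum(gmm_log_likelihood(f, gmm) for f in frames) / len(frames)
        for name, gmm in models.items()
    }
    return max(scores, key=scores.get), scores

# Toy 2-D models (hypothetical values, for illustration only).
MODELS = {
    "flute": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])],
    "cello": [(0.5, [5.0, 5.0], [1.0, 1.0]), (0.5, [6.0, 4.0], [1.0, 1.0])],
}
FRAMES = [[0.2, -0.1], [0.9, 1.2], [0.4, 0.5]]
best, scores = classify(FRAMES, MODELS)
```

Averaging over all frames of a file is one simple way to obtain a single decision per recording; a real system would train the mixtures on monophonic recordings rather than set them by hand.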
Identify Solo Instrument: Features
• Exact frequency position and normalised log-compressed power of first 15 partials
[Table: example feature values per partial, with frequency (Hz) 220, 442, 658, ... and power (dB) 60, 50, 44, ...; two further rows of small signed values are not recoverable]
• Frame to frame differences (deltas and delta-deltas) within tones of continuous F0
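The power and delta features described above could be computed roughly as follows. This is a sketch under stated assumptions: the slides only say "normalised", so normalising each frame to its strongest partial is a guess, and the function names and input values are made up.

```python
import math

def log_power_features(partial_powers):
    """Log-compress linear partial powers to dB, normalised so the strongest partial is 0 dB.

    The normalisation choice is an assumption; the slides only say 'normalised'.
    """
    db = [10.0 * math.log10(max(p, 1e-12)) for p in partial_powers]
    peak = max(db)
    return [d - peak for d in db]

def deltas(frames):
    """Frame-to-frame differences of a list of equal-length feature vectors."""
    return [
        [b - a for a, b in zip(prev, cur)]
        for prev, cur in zip(frames, frames[1:])
    ]

# Three frames of one tone of continuous F0, two partials each (made-up linear powers).
tone = [[1.0, 0.1], [1.0, 0.01], [1.0, 0.1]]
static = [log_power_features(f) for f in tone]
delta = deltas(static)         # first-order differences
delta_delta = deltas(delta)    # second-order differences
```

Differences are taken only between frames of the same tone, so a tone of n frames yields n-1 deltas and n-2 delta-deltas.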
Results I: Instrument Identification
Confusion matrix (rows: stimulus, columns: response)
stimulus    flute  clarinet  oboe  violin  cello
flute         75%        0%    0%     25%     0%
clarinet       6%       88%    0%      6%     0%
oboe           0%        0%   82%     18%     0%
violin         0%        0%    0%     88%    12%
cello          0%        0%    0%      6%    94%
• Solo instrument with accompaniment (piano or orchestra), commercially available CDs, 90 examples, 2-3 min. each
• Instrument identification: 86% correct
But...
• Estimated F0s not very accurate (as judged by manual inspection)
• Overall instrument classification is very good, but only when averaged over a whole sound file; results are not very accurate on a note-by-note or frame-by-frame basis
• More information is needed to find the melody!
Task II: Find Melody (assuming the solo instrument is known)
• Extract multiple F0 candidates
[Diagram: AUDIO, F0 candidates, find most likely ‘path’ through time-frequency space of F0 candidates, silence estimation (only accompaniment?), MELODY.
LOCAL KNOWLEDGE: F0 strength (~loudness); F0 likelihood (absolute frequency | instrument range); instrument likelihood (recogniser output).
TEMPORAL KNOWLEDGE: tone length; interval transitions.]
• Include additional knowledge about instrument range, tone duration, likely interval transitions to pick correct candidate
Knowledge Integration (Path Finding)
[Figure: F0 candidates in the time-frequency plane; axes: time, frequency]
• Possible melody paths restricted by longer tones of continuous F0
• All knowledge sources are normalised to equal mean and standard deviation
• Knowledge sources are summed along the current path
• N-best search for most likely path
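The integration step could be sketched as below. Note the simplifications: the slides describe an N-best search over whole tones, whereas this example uses a plain 1-best Viterbi-style dynamic programme over per-frame candidates. The semitone-distance transition penalty, the function names and the toy data are assumptions, not the author's method.

```python
import math
from statistics import mean, pstdev

def zscore(values):
    """Normalise a knowledge source to zero mean and unit standard deviation."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd if sd > 0 else 0.0 for v in values]

def best_path(candidates, local_scores, transition_weight=1.0):
    """Viterbi-style search for the highest-scoring path of F0 candidates.

    candidates[t]   : list of F0s (Hz) available in frame t
    local_scores[t] : summed, normalised local knowledge for each candidate
    The transition score (hypothetical) penalises large intervals in semitones.
    """
    def transition(f_prev, f_cur):
        return -transition_weight * abs(12.0 * math.log2(f_cur / f_prev))

    score = list(local_scores[0])
    back = []
    for t in range(1, len(candidates)):
        new_score, pointers = [], []
        for j, f_cur in enumerate(candidates[t]):
            options = [
                score[i] + transition(f_prev, f_cur)
                for i, f_prev in enumerate(candidates[t - 1])
            ]
            i_best = max(range(len(options)), key=options.__getitem__)
            new_score.append(options[i_best] + local_scores[t][j])
            pointers.append(i_best)
        score = new_score
        back.append(pointers)

    # Trace back from the best final candidate.
    j = max(range(len(score)), key=score.__getitem__)
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return [candidates[t][j] for t, j in enumerate(path)]

# Toy example: the 440 Hz line is slightly weaker in the middle frame but far smoother.
cands = [[440.0, 220.0], [440.0, 880.0], [440.0, 220.0]]
strength = zscore([0.9, 0.2, 0.5, 0.6, 0.8, 0.3])  # one source, flattened over all candidates
local = [strength[0:2], strength[2:4], strength[4:6]]
melody = best_path(cands, local)
```

With several knowledge sources, each would be normalised with `zscore` and the results added element-wise before the search. In the toy data, picking the strongest candidate per frame would jump to 880 Hz in the middle frame, while the path search stays on 440 Hz.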
‘Silence’ Estimation
• Solo instrument is not always continuously playing
• Use likelihoods for solo instrument along the estimated path
• Present threshold: median of likelihood values for solo instrument (assuming the solo instrument is present at least 50% of the time)
• Silent threshold: mean of likelihood values over all instruments
• Assign whole tones according to proximity to present/silent threshold and the state of their neighbours
• Impose minimum length on ‘present’ sections
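A minimal sketch of the two-threshold idea above, assuming one log-likelihood value per tone. The neighbour-state rule is left out for brevity, and the function name, the `min_present` parameter and the example values are hypothetical.

```python
from statistics import mean, median

def label_tones(tone_solo_ll, all_instrument_ll, min_present=2):
    """Label each tone True (solo present) or False (only accompaniment).

    tone_solo_ll      : solo-instrument log-likelihood per tone on the path
    all_instrument_ll : log-likelihoods pooled over all instrument models
    min_present       : minimum run length (in tones) kept as 'present'
    """
    present_thr = median(tone_solo_ll)   # solo assumed present at least 50% of the time
    silent_thr = mean(all_instrument_ll)
    # Assign each tone to the closer of the two thresholds.
    labels = [
        abs(ll - present_thr) <= abs(ll - silent_thr) for ll in tone_solo_ll
    ]
    # Remove 'present' runs shorter than the minimum length.
    start = None
    for i, flag in enumerate(labels + [False]):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start < min_present:
                labels[start:i] = [False] * (i - start)
            start = None
    return labels

solo = [-2.0, -2.5, -9.0, -8.5, -2.2, -9.5, -2.1, -2.3]
pooled = solo + [-10.0, -11.0, -9.0, -12.0, -10.5, -9.5, -11.5, -10.0]
labels = label_tones(solo, pooled)
```

In this example the isolated fifth tone is first labelled present and then removed by the minimum-length rule.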
Evaluation: Test Material
• Realistic recordings do not provide information about ‘true’ F0s; even scores are only an approximation
• Use MIDI generated audio
• Real instrument samples, but only 3-4 per octave, provided by the sampler software
• 10 examples, for every solo instrument one piece with piano accompaniment, one with orchestra
• Solo instrument and accompaniment mixed at 0dB SNR
• Whole movements (or first 3 minutes) to ensure sufficient presence of the solo instrument; mixture of different styles and tempi
Results: F0 Estimation
• Estimating F0s with harmonic sieves, which search for prominent harmonic series, shows no advantage over simply picking the highest spectral peak
F0 candidates    flute  clarinet  oboe  violin  cello  average
1 (strongest)      70%       78%   48%     28%    38%      52%
3                  82%       94%   78%     61%    64%      76%
15                 98%       99%   98%     94%    84%      95%
(based solely on sections where the solo instrument is present)
• Very unexpected, but might be caused by the very rich mixture of harmonically related tones; initial results show that other algorithms that search for harmonic series, e.g. YIN (autocorrelation based), do not do well either
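For readers unfamiliar with the two estimators being compared, here is a rough sketch of each. The slides do not describe the sieve used, so the tolerance, the scoring by summed peak power and the input peaks are all assumptions.

```python
def highest_peak_f0(peaks):
    """Baseline: take the frequency of the strongest spectral peak as the F0."""
    return max(peaks, key=lambda p: p[1])[0]

def sieve_score(f0, peaks, n_harmonics=15, tolerance=0.03):
    """Sum the power of peaks lying within a relative tolerance of a harmonic of f0."""
    total = 0.0
    for freq, power in peaks:
        h = round(freq / f0)  # nearest harmonic number
        if 1 <= h <= n_harmonics and abs(freq - h * f0) <= tolerance * h * f0:
            total += power
    return total

def harmonic_sieve_f0(peaks, candidates):
    """Return the candidate F0 whose sieve collects the most peak power."""
    return max(candidates, key=lambda f0: sieve_score(f0, peaks))

# Made-up peaks (Hz, linear power): a 220 Hz tone with a weak fundamental
# plus one unrelated peak at 310 Hz.
peaks = [(220.0, 0.3), (440.0, 1.0), (660.0, 0.6), (310.0, 0.8)]
candidates = [p[0] for p in peaks]
sieve_f0 = harmonic_sieve_f0(peaks, candidates)
peak_f0 = highest_peak_f0(peaks)
```

This toy case is constructed so that the two methods disagree (the sieve picks 220 Hz, the peak picker 440 Hz). In a dense mixture of harmonically related tones, partials of the accompaniment can fall into the sieve as well, which is consistent with the lack of advantage reported above.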
Results: Instrument Identification
• Solo instrument without accompaniment: all examples correct, except one oboe mistaken for a flute
• Solo instrument with accompaniment: violin and cello still correct, but performance for woodwinds approaching random, even with true F0s provided
Possible reasons:
• Sample-based music might be harder to identify, as it provides less instrument-specific variation, e.g. vibrato
• Mixing level might be unfavourable, with worse SNR than in realistic recordings
• Frequency regions that are dominated by the accompaniment might differ between realistic recordings and MIDI-based audio
Results: Melody Extraction
• Baseline performance: strongest F0 only, no other knowledge
method          correct frames  tones found  spurious tones
strongest F0               40%          78%            321%
path                       51%          76%            135%
path+silence               54%          72%            117%
• Number of correct frames improved by 14 percentage points (40% to 54%), with the number of spurious tones reduced to nearly a third (321% to 117%), leading to significantly smoother melody lines
• Path finding and especially silence estimation likely to suffer from poor instrument identification performance with MIDI based audio
Realistic Example
[Figure, two panels: "Melody based on strongest F0" and "Melody based on knowledge integrating path finding"; x-axis: time (frames), 0 to 300; y-axis: F0 (Hz), 400 to 1600]
• Beginning of Mozart’s Clarinet Concerto, taken from a CD recording, manually annotated F0s (gray) and estimated melody (black)
Conclusions and Future Work
• Audio generated from MIDI not necessarily good test material!
• Two short, manually annotated realistic examples gave 10%-15% more correct frames than equivalent MIDI-based examples
• Further work will concentrate on realistic examples; this requires manual labeling, or
• Automatic alignment of MIDI data to real recordings?!