E4896 Music Signal Processing (Dan Ellis) 2016-01-20 - /23
Lecture 1: Course Introduction
Dan Ellis Dept. Electrical Engineering, Columbia University
[email protected] http://www.ee.columbia.edu/~dpwe/e4896/1
1. Course Structure
2. DSP: The Short-Time Fourier Transform
ELEN E4896 MUSIC SIGNAL PROCESSING
Signal Processing and Music
• Music is a very rich signal
• Signal Processing can expose some of it
• We want to get inside it
[Figure: four spectrogram panels (freq / kHz 0–4) of contrasting music examples, time / sec 0–20]
1. Course Goals
• Survey of applications of signal processing to music audio:
  - music synthesis
  - music/audio processing (modification)
  - music audio analysis
• Connect basic DSP theory to sound phenomena and effects
• Hands-on, live investigations of audio processing algorithms using Matlab, Pd, ...
Course Structure
• Two weekly sessions:
  - Monday: presentations, practical exposition, discussion
  - Wednesday: presentations, practical, sharing
• Grade structure:
  - 20% practicals participation
  - 10% one presentation
  - 30% three mini-project assignments
  - 40% one final project
Flipped Classroom
• Video Lectures are required viewing before Monday session
  - 30-50 mins each
  - recorded in 2013
  - bring one question (at least) to class
  - pop quizzes
Hands-on Practicals
• Music Signal Processing involves connecting algorithms with perceptions
• Our goal is to be able to ‘play’ with algorithms to feel how they work
  - in class, on laptops, in small groups
• We will use several platforms: Matlab, PureData (Pd), Python
Projects
• There will be 3 “mini projects” assigned
  - approx. week 3, week 5, week 7
  - provide basic pieces, but open-ended
  - involve implementation & experimentation
• Also, Final project
  - on a topic of your choice
  - implement some kind of music signal processing
  - due at end of semester
• Good projects ...
  - ... start with clear goals
  - ... apply ideas from class
  - ... evaluate performance & modify accordingly
• Could continue into summer...
Project Examples
• Music Transcription
Figure 1: Screenshot of the Pd patch implementing the Scheirer algorithm. Much of the logic is hidden inside the “combfilterbank” object at the bottom.
2.2 Tempo Selection
Each of the six onset pulse signals (one for each band) is fed into its own bank of 150 comb filters – you can think of these filters as mapping to 150 possible tempos that we want to detect. A comb filter is essentially just a delay implemented with a circular buffer. The delay time on each filter corresponds to the tempo that that filter is meant to detect – so, for example, the filter that is detecting a tempo of 90 bpm has a repetition rate of 90/60 = 1.5 Hz, which yields a buffer length of 44100/1.5 = 29400 samples. The tempo selector then, for each bpm, sums the energy across all 6 subbands at that tempo. The tempo with the highest total energy is chosen as the tempo of the piece.

Because of the sheer number of filters (150 × 6 = 900), it was impossible to implement this using standard PD objects. Therefore I created a PD External written in C [3] and used this within my PD patch. The external has six inputs, one for each subband, and has two float outlets – one indicating the detected tempo, and the other indicating the phase. I added a signal outlet as well for
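The buffer-sizing arithmetic in the excerpt above can be sketched as follows; the function name is illustrative, not from the report.

```python
# Sketch of the comb-filter sizing described in the report: each candidate
# tempo (bpm) maps to a repetition rate in Hz and a delay-line length in
# samples at the given sample rate.
def comb_delay_samples(bpm, fs=44100):
    rate_hz = bpm / 60.0          # e.g. 90 bpm -> 1.5 Hz
    return round(fs / rate_hz)    # circular-buffer length in samples

# Matches the report's worked example: 90 bpm -> 29400 samples at 44.1 kHz
assert comb_delay_samples(90) == 29400
```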
• Beat Tracking
• Mood Classification
Fig. 4. Comparison of effectiveness of different combination of features.
the most effective combination of features. The result is shown in Figure 4. Although some features are especially useful, I also observe that the number of features enhances the overall classification accuracy. Therefore, the best combination is to use all the features.
Fig. 5. Composition of hits in each class.
VI. EVALUATION AND RESULTS
In the evaluation, I use 100 test samples (20 samples for each mood class) to evaluate the effectiveness of my classifier. There are two tasks I would like to investigate. First, I use the best combination of features found in the feature-selection part, which is to use all of the features. The average hit rate is 0.64. However, I noticed that the accuracies of the mood classes are dramatically different. The hit rates for angry and happy are almost 1, while the hit rate for calm is 0. Thus I further investigate the composition of hits in each class. The result is shown in Figure 5. It is quite intuitive that happy songs will be mistaken for angry songs, since they both have fast tempo. By the same logic, it is also intuitive that calm songs, sad songs and scary songs are easily confused. However, it is hard to envision that calm songs would be largely classified as angry. This phenomenon is still left to be investigated. Second, I would like to investigate which features are capable of telling the difference between any two different classes. Different combinations of features are applied in the above two tasks to find the most effective combination. The result is shown in Figure 6. We previously saw that timbre features are very useful. In this investigation, we further found that harmony features are useful when the two songs are melodically different. For example, harmony features are effective for classifying calm songs versus happy songs, while they are not useful for distinguishing calm songs from sad songs.

Fig. 6. Comparison of effectiveness of different features for distinguishing two classes
VII. CONCLUSION
In this project, I implemented and evaluated a music mood classifier. Specifically, I focused on investigating the effectiveness of each feature. Compared with other feature-selection works, we found that the tempo feature does not always work well when classifying music moods, while timbre features play a key role in recognizing music moods. We also found that the number of features affects classification accuracy. When further analyzing the classification results, we found that some classes are easily misclassified as another, and we then identified the features that are suitable for distinguishing two mood classes. For example, harmony features are useful when the two songs are melodically different. However, there are still some unexplained cases where two classes are confused. In future work, I plan to find the explanation for this
Presentations
• Everyone will make at least one in-class presentation
  - e.g. presentation of a relevant paper
  - ... or your latest mini-project
  - ... or just something you’re curious about
• We will assign time slots first-come-first-served
  - slots each class
  - check course calendar to choose a time
  - email TA
• Encourage discussion
  - informal presentations of practical findings
Web Site
• Information distributed via:
http://www.ee.columbia.edu/~dpwe/e4896/
Course Outline
• Anything missing?
Jan 20: Intro
Jan 25: Acoustics
Feb 01: Perception
Feb 08: Analog synthesis
Feb 15: Sinusoidal models
Feb 22: LPC models
Feb 29: Filtering & Reverb
Mar 07: Pitch tracking
Mar 21: Time & pitch scaling
Mar 28: Beat tracking
Apr 04: Chroma & Chords
Apr 11: Audio alignment
Apr 18: Music fingerprinting
Apr 25: Source separation
2. Digital Signals
• Sampling interval $T$
• Sampling frequency $\Omega_0 = 2\pi/T$
• Quantizer $Q(x) = \Delta \cdot \mathrm{round}(x/\Delta)$

$x_d[n] = Q(x_c(nT))$

Discrete-time sampling limits bandwidth; discrete-level quantization limits dynamic range.

[Figure: sampled, quantized waveform vs. time]
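The sampling-plus-quantization chain above can be sketched in a few lines; the sample rate and step size here are illustrative choices, not values from the slide.

```python
import numpy as np

# Sketch of x_d[n] = Q(x_c(nT)) with a uniform quantizer
# Q(x) = delta * round(x / delta). fs and delta are assumptions.
fs = 8000                  # sampling frequency, Hz (T = 1/fs)
T = 1.0 / fs
delta = 2.0 / 256          # step size: 8-bit range over [-1, 1)

def quantize(x, delta):
    """Uniform mid-tread quantizer: Q(x) = delta * round(x / delta)."""
    return delta * np.round(x / delta)

n = np.arange(16)                        # sample indices
xc = np.sin(2 * np.pi * 440 * n * T)     # "continuous" signal at t = nT
xd = quantize(xc, delta)                 # discrete-time, discrete-level

# Quantization error is bounded by half a step
assert np.max(np.abs(xd - xc)) <= delta / 2
```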
Fourier Series
• Observation: any periodic signal can be constructed from sinusoids at integer multiples of the fundamental frequency:

$x(t) = x(t + T), \quad x(t) \approx \sum_{k=0}^{M} a_k \cos\!\left(\frac{2\pi k}{T} t + \phi_k\right)$

• E.g., square wave:

$a_k = (-1)^{(k-1)/2} \cdot \frac{1}{k}$ for $k = 1, 3, 5, \ldots$; $a_k = 0$ otherwise

[Figure: partial-sum square wave $x(t)$, $t$ from $-1.5$ to $1.5$, and line spectrum $|a_k|$ for $k = 1 \ldots 7$ with $|a_1| = 1.0$]
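The square-wave example can be reconstructed numerically from the coefficients above; the period and number of harmonics are arbitrary choices for illustration.

```python
import numpy as np

# Sketch: sum the slide's square-wave series
# a_k = (-1)**((k-1)/2) / k over odd k up to M harmonics.
T = 1.0                                   # period (assumed)
t = np.linspace(0, 2 * T, 1000, endpoint=False)

def square_partial(t, T, M):
    x = np.zeros_like(t)
    for k in range(1, M + 1, 2):          # odd harmonics only
        a_k = (-1) ** ((k - 1) // 2) / k
        x += a_k * np.cos(2 * np.pi * k * t / T)
    return x

x = square_partial(t, T, 101)
# With these (unnormalized) coefficients the sum converges to a square
# wave of amplitude pi/4, so the plateau should average close to pi/4.
assert abs(np.mean(x[30:80]) - np.pi / 4) < 0.02
```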
Fourier Transform
• Complex form of Fourier Series + analysis:

$x(t) \approx \sum_{k=-M}^{M} c_k\, e^{j\frac{2\pi k}{T}t}, \quad c_k = \frac{1}{T}\int_{-T/2}^{T/2} x(t)\, e^{-j\frac{2\pi k}{T}t}\, dt$

• Let harmonics become infinitely close:

$x(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty} X(j\Omega)\, e^{j\Omega t}\, d\Omega, \quad X(j\Omega) = \int_{-\infty}^{\infty} x(t)\, e^{-j\Omega t}\, dt$

• $x(t)$ and $X(j\Omega)$ are equivalent, dual descriptions

[Figure: waveform $x(t)$ vs. time / sec (0–0.008) and magnitude spectrum $|X(j\Omega)|$ in dB (−80 to −20) vs. freq / Hz (0–8000)]
Spectrum of Periodic Sounds
• If a sound is (nearly) periodic (i.e., pitched), $x(t) \approx x(t + T)$, the Fourier Transform approaches the Fourier Series:

$X(j\Omega) \approx \sum_{k} c_k\, \delta\!\left(\Omega - k\frac{2\pi}{T}\right)$

• n.b.: $|X(j\Omega)|$ plotted in log units: $\mathrm{dB}(x) = 20 \log_{10}(x)$

[Figure: amplitude vs. time / s (1.52–1.58) and level / dB (−100 to −40) vs. freq / Hz (0–4000)]
Discrete-Time Fourier Transf. (DTFT)
• Sampled $x[n] = x(nT)$ has same FT form:

$x[n] = \frac{1}{2\pi}\int_{-\pi}^{\pi} X(e^{j\omega})\, e^{j\omega n}\, d\omega, \quad X(e^{j\omega}) = \sum_{n=-\infty}^{\infty} x[n]\, e^{-j\omega n}$

• ... but results in a spectrum with period $2\pi$

[Figure: $|X(e^{j\omega})|$ and $\arg X(e^{j\omega})$ vs. $\omega$ from $0$ to $5\pi$, repeating every $2\pi$]
Discrete Fourier Transform (DFT)
• N-point finite-length sampled signal: the kind you can have on a computer!

$x[n] = \frac{1}{N}\sum_{k=0}^{N-1} X[k]\, W_N^{-nk}, \quad X[k] = \sum_{n=0}^{N-1} x[n]\, W_N^{nk}, \quad W_N = e^{-j\frac{2\pi}{N}}$

• Just a matrix-multiply between vectors:

$\begin{bmatrix} X[0] \\ X[1] \\ X[2] \\ \vdots \\ X[N-1] \end{bmatrix} = \begin{bmatrix} 1 & 1 & 1 & \cdots & 1 \\ 1 & W_N^{1} & W_N^{2} & \cdots & W_N^{N-1} \\ 1 & W_N^{2} & W_N^{4} & \cdots & W_N^{2(N-1)} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & W_N^{N-1} & W_N^{2(N-1)} & \cdots & W_N^{(N-1)^2} \end{bmatrix} \begin{bmatrix} x[0] \\ x[1] \\ x[2] \\ \vdots \\ x[N-1] \end{bmatrix}$
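The matrix-multiply view of the DFT is easy to check directly against NumPy's FFT; N is kept small only so the N × N matrix stays cheap to build.

```python
import numpy as np

# Sketch: build the DFT matrix F[k, n] = W_N^{nk} with W_N = e^{-j 2 pi / N}
# and verify that F @ x matches the FFT, and that conj(F)/N inverts it.
N = 8
n = np.arange(N)
W = np.exp(-2j * np.pi / N)
F = W ** np.outer(n, n)                # F[k, n] = W_N^{nk}

x = np.random.default_rng(0).standard_normal(N)
X = F @ x                              # X[k] = sum_n x[n] W_N^{nk}

assert np.allclose(X, np.fft.fft(x))   # same result as the FFT
x_back = (np.conj(F) @ X) / N          # x[n] = (1/N) sum_k X[k] W_N^{-nk}
assert np.allclose(x_back, x)
```

The O(N²) matrix product and the O(N log N) FFT compute exactly the same transform; the FFT is just a fast factorization of this matrix.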
Short-Time Fourier Transform
• To localize energy in time and frequency:
  - chop signal into N-point windows every L points
  - take DFT of each one

$X[k, m] = \sum_{n=0}^{N-1} x[n + mL]\, w[n]\, e^{-j\frac{2\pi k n}{N}}$

[Figure: waveform with short-time windows $m = 0, 1, 2, 3$ starting at $0, L, 2L, 3L$; each is windowed and DFT'd to form one column ($k \to$) of the spectrogram, freq / Hz 0–4000 vs. time / s 2.35–2.6]
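The chop-window-DFT recipe above maps directly to a few lines of NumPy; the Hann window and the particular N, L, and test tone are assumptions for illustration, not values from the slide.

```python
import numpy as np

# Sketch of X[k, m] = sum_n x[n + m L] w[n] e^{-j 2 pi k n / N}:
# N-point windows every L points, DFT of each.
def stft(x, N=256, L=128):
    w = np.hanning(N)                   # window choice is an assumption
    n_frames = 1 + (len(x) - N) // L
    return np.stack([np.fft.fft(x[m * L : m * L + N] * w)
                     for m in range(n_frames)], axis=1)   # shape (N, frames)

fs = 8000
t = np.arange(fs) / fs                  # 1 s of a 1 kHz sinusoid
x = np.sin(2 * np.pi * 1000 * t)
X = stft(x)
S = np.abs(X[: 256 // 2 + 1, :])        # keep non-negative frequencies

# Energy should concentrate in the bin nearest 1 kHz: 1000 / (fs/N) = 32
assert np.argmax(S.mean(axis=1)) == 32
```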
The Spectrogram
• Plot STFT $|X[k, m]|$ as a grayscale image

[Figure: waveform (time / s 0–2.6) and its spectrogram (freq / Hz 0–4000, intensity / dB −50 to 10)]
Time-Frequency Tradeoff
• Shorter window:
  - better resolution of time detail
  - worse resolution of frequency detail

[Figure: two spectrograms of the same excerpt (time / s 1.4–2.6, freq / Hz 0–4000, level / dB −50 to 10): Window = 256 pt (narrowband) vs. Window = 48 pt (wideband)]
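The tradeoff can be quantified directly: DFT bin spacing is fs/N, while one window spans N/fs seconds, so improving one resolution by some factor worsens the other by the same factor. A minimal sketch using the slide's two window sizes (the 44.1 kHz sample rate is an assumption):

```python
# Frequency resolution vs. time span for the slide's two window lengths.
fs = 44100                              # assumed sample rate
for N, label in [(256, "narrowband"), (48, "wideband")]:
    freq_res = fs / N                   # Hz between DFT bins
    time_span = N / fs                  # seconds covered by one window
    print(f"{label}: N={N}, {freq_res:.1f} Hz/bin, {time_span * 1e3:.2f} ms/window")

# The products are equal: fineness in frequency trades 1:1 against time.
assert (44100 / 48) / (44100 / 256) == 256 / 48
```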
Spectrogram in Matlab
• https://www.ee.columbia.edu/~dpwe/e4810/matlab/M14-fft.diary
Live Matlab Spectrogram
• http://www.wakayama-u.ac.jp/~kawahara/MatlabRealtimeSpeechTools/
Summary + Assignment
• Music Signal Processing
  - synthesis, modification, analysis
  - hands-on investigation
• Course
  - participation, practicals, projects, presentations
• Digital Signal Processing
  - signals on computers
  - Fourier analysis & spectrogram
• Assignment
  - Watch Acoustics video & prepare question
  - Do Pd tutorial http://en.flossmanuals.net/pure-data/ through to “Amplitude Modulation”