Time-frequency Analysis of Musical Rhythm
Xiaowen Cheng, Jarod V. Hart, and James S. Walker
We shall use the mathematical techniques of Gabor transforms and continuous wavelet transforms to analyze the rhythmic structure of music and its interaction with melodic structure. This analysis reveals the hierarchical structure of rhythm. Hierarchical structure is common to rhythmic performances throughout the world's music. The work described here is interdisciplinary and experimental. We use mathematics to aid in the understanding of the structure of music, and have developed mathematical tools that (while not completely finished) have shown themselves to be useful for this musical analysis. We aim to explore ideas with this paper, to provoke thought, not to present completely finished work.
The paper is organized as follows. We first summarize the mathematical method of Gabor transforms (also known as short-time Fourier transforms, or spectrograms). Spectrograms provide a tool for visualizing the patterns of time-frequency structures within a musical passage. We then review the method of percussion scalograms, a new technique for analyzing rhythm introduced in [34]. After that, we show how percussion scalograms are used to analyze percussion passages and rhythm. We carry out four analyses of percussion passages from a variety of music styles (rock drumming, African drumming, and jazz drumming). We also explore three examples of the connection between rhythm and melody (a jazz piano piece, a Bach piano transcription, and a jazz orchestration). These examples provide empirical justification of our method. Finally, we explain how the parameters for percussion scalograms are chosen in order to provide a satisfactory display of the pulse trains that characterize a percussion passage (a key component of our method). A brief concluding section provides some ideas for future research.

Xiaowen Cheng is a student of mathematics at the University of Minnesota-Twin Cities. Her email address is [email protected].
Jarod V. Hart is a student of mathematics at the University of Kansas, Lawrence. His email address is [email protected].
James S. Walker is professor of mathematics at the University of Wisconsin-Eau Claire. His email address is [email protected].
Gabor Transforms and Music
We briefly review the widely employed method of Gabor transforms [17], also known as short-time Fourier transforms, or spectrograms, or sonograms. The first comprehensive effort in employing spectrograms in musical analysis was Robert Cogan's masterpiece, New Images of Musical Sound [9], a book that still deserves close study. In [12, 13], Dörfler describes the fundamental mathematical aspects of using Gabor transforms for musical analysis. Two other sources for applications of short-time Fourier transforms are [31, 25]. There is also considerable mathematical background in [15, 16, 19], with musical applications in [14]. Using sonograms or spectrograms for analyzing the music of bird song is described in [21, 30, 26]. The theory of Gabor transforms is discussed in complete detail in [15, 16, 19], with focus on its discrete aspects in [1, 34]. However, to fix our notations for subsequent work, we briefly describe this theory.
Figure 1. (a) Signal. (b) Succession of shifted window functions. (c) Signal multiplied by middle window in (b); an FFT can now be applied to this windowed signal.

356 Notices of the AMS Volume 56, Number 3

The sound signals that we analyze are all digital, hence discrete, so we assume that a sound signal has the form $\{f(t_k)\}$, for uniformly spaced values $t_k = k\Delta t$ in a finite interval $[0, T]$. A Gabor transform of $f$, with window function $w$, is defined as follows. First, multiply $\{f(t_k)\}$ by a sequence of shifted window functions $\{w(t_k - \tau_\ell)\}_{\ell=0}^{M}$, producing time-localized subsignals, $\{f(t_k)\,w(t_k - \tau_\ell)\}_{\ell=0}^{M}$. Uniformly spaced time values, $\{\tau_\ell = t_{j\ell}\}_{\ell=0}^{M}$, are used for the shifts ($j$ being a positive integer greater than 1). The windows $\{w(t_k - \tau_\ell)\}_{\ell=0}^{M}$ are all compactly supported and overlap each other. See Figure 1. The value of $M$ is determined by the minimum number of windows needed to cover $[0, T]$, as illustrated in Figure 1(b).
Second, because $w$ is compactly supported, we treat each subsignal $\{f(t_k)\,w(t_k - \tau_\ell)\}$ as a finite sequence and apply an FFT $\mathcal{F}$ to it. (A good, brief explanation of how FFTs are used for frequency analysis can be found in [1].) This yields the Gabor transform of $\{f(t_k)\}$:

$$\{\mathcal{F}\{f(t_k)\,w(t_k - \tau_\ell)\}\}_{\ell=0}^{M}. \tag{1}$$

Note that because the values $t_k$ belong to the finite interval $[0, T]$, we always extend our signal values beyond the interval's endpoints by appending zeroes, hence the full supports of all windows are included.
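The two steps above can be sketched numerically. What follows is a minimal illustration under stated assumptions, not the authors' implementation: the function name `gabor_transform`, the sampling rate of 8192 Hz, the shift of 64 samples, and the 440 Hz test tone are all hypothetical choices, and NumPy's `np.blackman` stands in for the window $w$.

```python
import numpy as np

def gabor_transform(f, window, hop):
    """Row l holds the FFT of f multiplied by the window centred at sample l * hop."""
    f = np.asarray(f, dtype=float)
    n = len(window)
    half = n // 2
    # M is the smallest integer for which the shifts 0, hop, ..., M * hop reach the last sample.
    M = int(np.ceil((len(f) - 1) / hop))
    # Append zeros at both ends so that every shifted window keeps its full support.
    padded = np.concatenate([np.zeros(half), f, np.zeros(n + hop)])
    rows = []
    for l in range(M + 1):
        start = l * hop  # padded index where the window centred at sample l * hop begins
        # rfft returns only the non-negative frequencies of a real signal.
        rows.append(np.fft.rfft(padded[start:start + n] * window))
    return np.array(rows)

fs = 8192                            # assumed sampling rate in Hz
t = np.arange(fs) / fs               # one second of samples t_k = k / fs
tone = np.cos(2 * np.pi * 440 * t)   # hypothetical test signal: a 440 Hz tone
G = gabor_transform(tone, np.blackman(1024), hop=64)

# Frequency (in Hz) of the largest FFT magnitude in the middle window.
peak_hz = np.argmax(np.abs(G[G.shape[0] // 2])) * fs / 1024
print(G.shape, peak_hz)
```

Each row of `G` corresponds to one shift $\tau_\ell$ and each column to one FFT frequency; plotting the magnitudes of `G` with time on the horizontal axis gives a spectrogram.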
The Gabor transform that we employ uses a Blackman window defined by

$$w(t) = \begin{cases} 0.42 + 0.5\cos(2\pi t/\lambda) + 0.08\cos(4\pi t/\lambda) & \text{for } |t| \le \lambda/2, \\ 0 & \text{for } |t| > \lambda/2, \end{cases}$$

for a positive parameter $\lambda$ equaling the width of the window where the FFT is performed. The Fourier transform of the Blackman window is very nearly positive (negative values less than $10^{-4}$ in size), thus providing an effective substitute for a Gaussian function (which is well known to have minimum time-frequency support). See Figure 2. Further evidence of the advantages of Blackman-windowing is provided in [3, Table II]. In Figure 2(b) we illustrate that for each windowing by $w(t_k - \tau_m)$ we finely partition the frequency axis into thin rectangular bands lying above the support of the window. This provides a thin rectangular partition of the (slightly smeared) spectrum of $f$ over the support of $w(t_k - \tau_m)$ for each $m$. The efficacy of these Gabor transforms is shown by how well they produce time-frequency portraits that accord well with our auditory perception, which is described in the vast literature on Gabor transforms that we briefly summarized above.
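As a small check on the window formula, the sketch below evaluates $w(t)$ directly and compares it with NumPy's built-in `np.blackman`. The function name `blackman`, the choice $\lambda = 1$, and the grid of 1025 sample points are assumptions made only for illustration.

```python
import numpy as np

def blackman(t, lam):
    """Blackman window of width lam centred at t = 0; zero for |t| > lam / 2."""
    t = np.asarray(t, dtype=float)
    w = (0.42 + 0.5 * np.cos(2 * np.pi * t / lam)
         + 0.08 * np.cos(4 * np.pi * t / lam))
    return np.where(np.abs(t) <= lam / 2, w, 0.0)

lam = 1.0
t = np.linspace(-lam / 2, lam / 2, 1025)   # sample points spanning the support
inside = blackman(t, lam)                  # window values on the support
outside = blackman(np.array([-0.75, 0.6, 2.0]), lam)   # points beyond the support

# np.blackman uses 0.42 - 0.5 cos(2 pi n / (M - 1)) + 0.08 cos(4 pi n / (M - 1)) with n
# counted from the left edge; shifting the origin to the centre flips the middle sign.
matches_numpy = np.allclose(inside, np.blackman(1025), atol=1e-12)
print(inside[512], outside, matches_numpy)
```

The centred form used in the text and the left-edge form used by NumPy describe the same curve, so either can be used to build the shifted windows.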
Figure 2. (a) Blackman window, $\lambda = 1$. Notice that it closely resembles the classic Gabor window (a bell curve described by a Gaussian exponential), but it has the advantage of compact support. (b) Time-frequency representation (the units along the horizontal are in seconds, along the vertical are in Hz) of three Blackman windows multiplied by the real part of the kernel $e^{i2\pi nk/N}$ of the FFT used in a Gabor transform, for three different frequency values $n$. Each horizontal bar accounts for 99.99% of the energy of the cosine-modulated Blackman window (Gabor atom) graphed below it.
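The Gabor atoms of Figure 2(b) can be imitated numerically. The sketch below is only illustrative: the sampling rate of 8192 Hz, the three pitches, the zero-padding length, and the band of half-width $3/\lambda$ Hz (the main lobe of the Blackman spectrum) are assumptions of this example, not values taken from the figure.

```python
import numpy as np

fs = 8192                          # assumed sampling rate in Hz
lam = 1 / 8                        # window width in seconds
n = int(fs * lam)                  # 1024 samples per atom
t = (np.arange(n) - n / 2) / fs    # time axis centred on the window

# Blackman window sampled on its support |t| <= lam / 2.
window = (0.42 + 0.5 * np.cos(2 * np.pi * t / lam)
          + 0.08 * np.cos(4 * np.pi * t / lam))

def atom_summary(freq_hz, pad_to=8192):
    """Return (peak frequency in Hz, fraction of energy within 3 / lam Hz of freq_hz)."""
    atom = window * np.cos(2 * np.pi * freq_hz * t)    # cosine-modulated Blackman window
    power = np.abs(np.fft.rfft(atom, n=pad_to)) ** 2   # zero-padded for a finer frequency grid
    freqs = np.fft.rfftfreq(pad_to, d=1 / fs)
    band = np.abs(freqs - freq_hz) <= 3 / lam          # assumed band: the spectral main lobe
    return freqs[np.argmax(power)], power[band].sum() / power.sum()

# Three hypothetical pitches on an ascending scale.
summaries = [atom_summary(f) for f in (400.0, 800.0, 1200.0)]
for peak, fraction in summaries:
    print(peak, fraction)
```

Each atom's spectrum peaks at its modulating frequency, and nearly all of its energy lies in a narrow band around that frequency, which is why each atom appears as a thin horizontal bar in the time-frequency plane.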
It is interesting to listen to the sound created by the three Gabor atoms in Figure 2(b). You can watch a video of the spectrogram being traced out while the sound is played by going to the following webpage:
(2) http://www.uwec.edu/walkerjs/TFAMRVideos/
and selecting the video for Gabor Atoms. The sound of the atoms is of three successive pure tones, on an ascending scale. The sound occurs precisely when the cursor crosses the thin dark bands in the spectrogram, and our aural perception of a constant pitch matches perfectly with the constant darkness of the thin bands. These Gabor atoms are, in fact, good examples of individual notes: much better examples than the infinitely extending (both in past and future) sines and cosines used in classical Fourier analysis. Because they are good examples of pure tone notes, these Gabor atoms are excellent building blocks for music.

Figure 3. Three spectrograms. (a) Spectrogram of a drum solo from a rock song. (b) Notes along a piano scale. (c) Spectrogram of a piano solo from a Bach melody.

March 2009 Notices of the AMS 357
We shall provide some new examples that further illustrate the effectiveness of these Gabor transforms. For all of our examples, we used 1024-point FFTs, based on windows of support 1/8 sec with a shift of 0.008 sec. These time-values are usually short enough to capture the essential features of musical frequency change.
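As a rough worked example of what these settings imply, the arithmetic below assumes a sampling rate of 8192 Hz. That rate is hypothetical: it is the rate at which a 1/8 sec window holds exactly 1024 samples, but the text does not state the rate that was actually used.

```python
fs = 8192                  # assumed sampling rate in Hz (not stated in the text)
fft_points = 1024          # FFT length given in the text
window_sec = 1 / 8         # window support given in the text
shift_sec = 0.008          # window shift given in the text

window_samples = round(fs * window_sec)   # samples under one window
bin_spacing_hz = fs / fft_points          # spacing between FFT frequency values
shift_samples = fs * shift_sec            # shift expressed in samples
overlap = 1 - shift_sec / window_sec      # fraction of support shared by adjacent windows

print(window_samples, bin_spacing_hz, shift_samples, overlap)
```

Under this assumption the FFT frequencies are spaced 8 Hz apart and adjacent windows share roughly 94% of their support. The shift works out to about 65.5 samples, so an integer shift close to that value would be needed in practice.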
In Figure 3 we show three basic examples of spectrograms of music. Part (a) of the figure shows a spectrogram of a clip from a rock drum solo. Notice that the spectrogram consists of dark vertical swatches; these swatches correspond to the striking of the drum, which can be verified by watching a video of the spectrogram (go to the website in (2) and select the video Rock Drum Solo). As the cursor traces over the spectrogram in the video, you will hear the sound of the drum strikes during the times when the cursor is crossing a vertical swatch. The reason why the spectrogram consists of these vertical swatches will be explained in the next section.
Part (b) of Figure 3 shows a spectrogram of a recording of four notes played on a piano scale. Here the spectrogram shows two features. Its main feature is a set of four sections consisting of groups of horizontal line segments placed vertically above each other. These vertical series of short horizontal segments are the fundamentals and overtones of the piano notes. There are also thin vertical swatches located at the beginning of each note. They are the percussive attacks of the notes (the piano is, in fact, classed as a percussive instrument).
Part (c) of Figure 3 shows a spectrogram of a clip from a piano version of a famous Bach melody. This spectrogram is much more complex, rhythmically and melodically, than the first two passages. Its melodic complexity consists in its polyphonic nature: the vertical series of horizontal segments are due to three-note chords being played on the treble scale and also individual notes played as counterpoint on the bass scale.$^{1}$ (This contrasts with the single notes in the monophonic passage in (b).) We will analyze the rhythm of this Bach melody in Example 5 below.
Scalograms, Percussion Scalograms, and Rhythm
In this section we briefly review the method of scalograms (continuous wavelet transforms) and then
