Analysis and assessment of musical sounds · C. ANALYSIS AND ASSESSMENT OF MUSICAL SOUNDS H.F....
Transcript of Analysis and assessment of musical sounds · C. ANALYSIS AND ASSESSMENT OF MUSICAL SOUNDS H.F....
Dept. for Speech, Music and Hearing
Quarterly Progress andStatus Report
Analysis and assessment ofmusical sounds
Pollard, H. F. and Jansson, E. V.
journal: STL-QPSRvolume: 21number: 4year: 1980pages: 062-098
http://www.speech.kth.se/qpsr
C. ANALYSIS AND ASSESSMENT OF MUSICAL SOUNDS
H . F . Po l l a rd* and E.V. Jansson
Abs t r ac t
A ccsnplete description of musical sounds must include the changes in spectrum w i t h time to describe the four major characteristics: dura- tion, pitch, loudness and timbre as we11 as the microstructure within notes. To understand the perception of the musical sounds, the operat- ing characteristics of the hearing process should be considered, such as the c r i t i ca l bands of hearing, masking effects, the t5-w resolu- tion needed, tim constants for integration of signal paramters, and likely memory functions.
Iong-time-average-spectra of organ ranks are shown t o predict steady state paramters independent of starting transients, frequency range of a rank and its roughness. Sampled f i l t e r methods show that transients of different part ials dartinate during different portions of the total starting transient and that very high rate of changes may occur. By replacing the natural starting transients with expnen- t ia l ones, the averall effect is not to make the sound ccgnpletely dif- ferent, it is a change in one quality factor of the tone, while lack of reverberation makes the tone sound unnatural.
The tri-stimulus method, employed in explaining the colour vision, is introduced as a method to analyse musical sounds. In this investiga- tion the j3rdmental was used as the f i r s t coordinate, the folluwing three part ials resolved by the ear as the second, and the higher par- t i a l s as the tkird. The tri-stimulus graphs display differences behem ranks and between pipes of the same rank, which seems perceptually im- portant.
In the following three appendices analysis methods are reviewd, results of previous analysis presented, and a synthesis prescription given to test the relevance of an analysis.
- . , 7. , ' 1 . I n t r o d u c t i o n ' .
Musical sounds form a special class that di f fer markedly in
structure f r m either pure tones or noise, the two extremes that are
usually considered by acousticians. Methods of analysis that have
been devised for pure tones or noise are often unsuited to the analysis
of a musical sound. In particular, changes that occur in the spectrum
of the sound with time must be described in same way.
In the time damain, a musical sound consists essentially of three
parts: (1) a starting transient in which camplicated interference
* guest researcher a t the Dept. of Speech Commmication, Febr.-June, 1 979. Address : School of Physics, University of New South Wales, Kensington, Australia
STL-QPSR 4 / 1 980 63.
processes occur between the free vibrations of the modes ccanprising
the instrument and the forced mtion imposed by the source of energy,
(2) a m r e or less steady region corresponding t o the forced mtion
of the instrument. (This 'steady s ta te ' sound usually consists of a
s e t of part ial tones that may or may not be related hanmnically, to-
gether with same background instrumntal noise. ) ( 3 ) The decay of
free vibrations of the mdes of the instrument when the forcing function
has been remved. The decay pattern is usually ccanplicated by inter-
ference between the instrument modes and roam modes. Under normal
music-making conditions, the roam modes predominate during the decay.
A major characteristic of musical sounds are the changes of state that
occur with time. It is rare t o encounter a sound in music whose descrip-
t ive p a r a t e r s are not varying in scm way with time.
A measurement system suited for analysis of musical tones must be
capable then of providing not only a time description of the sound but
also a detailed frequency-time analysis i n terms of a l l the important
spectral cmpnents. Furthennore, even the simplest piece of music
consists of a sequence of notes. Therefore, the analysis procedure
should be capable of dealing w i t h a t least a simple sequence, such as
a diatonic scale. ,
The present investigation aims towards such a physical analysis
procedure. But it is equally important that the analysis procedure is
adequate for the perception by our hearing. Therefore properties of
the hearing process are discussed in sarne detai l . Musical sounds have
certain signal properties and their perception determines which of
these properties are important. ..
The report is the work by one of the authors (HFP, on sabbatical
leave t o the Departmmt of Speech Carrarmnication, KTH) together with
the second author. The report contains tm main parts: one part re-
viewing previously used methods and investigations, and a second part
w i t h an analysis of organ tones done a t the Department of Speech Cam-
m ica t ion . F 'urthmre, simple gating and f i l t e r experiments are
described and a new method, "the mi-stimulus Method", is introduced
as a possible description for the timbre of musical instrurrrents.
2. Characteristics of a musical sound . - . y '-'7 I
1 There are four major characteristics that are used to describe
I a musical sound: (1) duration, which involves the relative imprtance I of the starting transient, the 'steady1 sound and the decay; (2) pitch,
" - Mch involves an assessment on a one-dimensional scale related to
' frequency; (3) loudness, which is a masure of the electric p e r
' transmitted by the auditory nerve (Howes, 1979); (4) timbre, which .ci!::'
is a multi-dinuansional entity. - . ,,. -
I + < ;, , ', - - . In addition to the major characteristics of the sound, there are
*?-'< 1 the dynamic fluctuations within each note (soarretimes called the micro-
structure) , examples of which include: (a) variations in the level of
the time envelope, (b) variations in both the level and frequency of
'" spectral canpnents, (c) the temprary appearance of hhammnic spectral r ' j;
canponents, and (d) the presence of noise caponents. All of these . + x i -
factors can be vital for a musical sound. . .. I
3. Perception of musical sounds I 1 ' When it is desired to study loudness relationships and soarre as-
pects of timbre, the operating characteristics of the hearing process
,. . should be considered. While there is detailed physiological and I psycho-physical information available concerning processes occurring
-i in the region of the cochlea, there is much less known concerning the
processing of neural impulses after they leave that region.
The basilar mgnbrane contains about 30,000 receptor hair cells
and behaves as a set of 24 band pass filters (critical bands) each con- -, taining approximately the same n&r of hair cells (about 1300 per 1
/ ' ' a - band) . The bandwidths of the critical bands are approximately 100 Hz r s ';. for frequencies below about 450 Hz and approximately one-third octave
1 ,' E for higher frequencies (Zwicker & Feldtkeller, 1967). The bandwidths
i- - " ' measured as pulsation threshold have, however, approximately half the
": above bandwidths '(~lcmlp, 1967, Ch. 1) . The maximum firing rate of ' neurons energised by the sound wave is 1000 per sec; the rate being
I
dependent on the intensity of the sound. Individual neurons respond
to an intensity range of 40 dB. The recovery time after each firing
is greater than 1 ms so that the minimum time required t o detect a
change in intensity is probably i n the range 2-10 ms. There is sane
agreement that a time constant of 100 ms is involved i n the aural as-
sessment of the loudness of a steady sound (Zwicker & Feldtkeller,
1967). Details concerning the subsequent auditory coding of neural
impulses arising fran the ear are still somewhat speculative
(Ibederer, 1975) . Thus, for loudness determination, the analyser should preferably
be the equivalent of a set of f i l t e r s with bandwidths approximately
the same as the c r i t i ca l bands of the ear, and with the added capa-
b i l i ty of providing an estimate of the loudness i n each band af ter
low pass f i l ter ing with a t ine constant of 100 ms. The f ir ing rate
implies that the f i l t e r s should be sampled a t least every 10 ms.
One characteristic of a steady sound is roughness (Terhardt, 1974).
A musical sound w i t h t m or m r e part ials w i t h i n the same cr i t i ca l
band w i l l give rise to a sensation of roughness. The partials within
the sarue band are not resolved but produce a beating effect which can
be unpleasant. With only one part ial in each occupied band, the music-
a l effect is one of m t h n e s s . The perceived roughness corresponds
to a lm-pass RC-filtering w i t h a 13 ms t ine constant.
A strong sound canpnent may make another catpnent impossible
to perceive. This phenmon is called masking. In the frequency do-
main a low tone may mask a high tone, while the opposite effect is
snall. Therefore, simultaneous masking can be modelled as a masking
threshold fal l ing fran the level of each band a t a rate of approximately
10 dB/Bark towads higher pitches (c.f . Zwicker & Feldtkeller, 1967,
Fig. 76.2). Masking p h e n m are also present i n the time dcanain.
A suitable labelling pre-masking and post-masking were introduced
in Zwicker (1977). The p r e l ~ s k i n g for loudness is short relative * ' > A
to that of post-masking.
The relative importance of the starting transient and the steady
state parts of a musical note have been studied by a n m h r of mrkers
(sunanary in Grey, 1 975) . The starting transient plays an important
the note together with an assessmnt routine for dynamic changes,
(2) a long-term m r y which muld enable the identification of the
instmment being played or the relationship of one note with another
in a s q k n c e . ;> . ' 1 1 .s?43t,i1 c y,. t l C , I
- . J
A further r e f i n m t that may be- possible a t a la ter date
,, is to use a system of analysis that follows more closely the audi-
tory coding processes used by the brain. These processes have yet
to be delineated with respect t o hearing but advantage may possibly .l
be taken of the progress that has been made in relation to optical - t I
coding. I
I + , . SUE i n i t i a l studies for this report are sumoarized in Appen-
A Q- dices. Thus the Appendix A presents methods previously used for I , c.
analysis of musical sounds, Appendix B same results of previous anal-
yses, and Appendix C methods of synthesis for finding out the im- -
t . portant sound parameters for the perception. I 4 . LTApmeasurements and f i l t e r i n g experiments
I " - Recordings were made of a diatonic scale (C4 t o Cg) played on
three different organ pipe ranks at Sphga Church, Stockholm, using
four different note durations (very short, 0.5 s, 2 s, continuous
diatonic scale). For each of the ranks of pipes recorded and each
note set, an LTAS was ca tp t ed (see Appendix A) . Fig. IV-C-1 a shows
the curves derived for a Rincipal 8' rank (short notes and conti-
nuous scale) , while similar curves are shown in Figs. IV-C-lb and 1c
for Gedackt 8' and Vox hum^ 8' ranks, respectively. There is l i t t l e
difference i n the shapes of the curves obtained for each rank, ex-
cept for the shortest notes played on the Principal 8'. In the lat-
ter case, the curve for the shortest tirrae shows a reduced level a t
low frequencies likely because the fundamental is slaw to develop. q^'"6-b: It is concluded that where the LTAS is not very sensitive to the char- . " : acter of the start?ing transients of the notes, the LTAS is a repre-
sentative average for the quasi-steady s ta te portion of the tones - i .e. , the physical parameter defining steady state timbre. I
STL-QPSR 4/1980 69.
I I I 1 I I I 1 I 1 I I
PITCH ( B A R K )
0 10 *;
20
PITCH ( B A R K )
0 10 20
PITCH ( B A R K )
Fig. IV-C-1. LTAS-curves for a diatonic scale (C4 to C5) on (a) Principal 8 ' rank of organ pipes, (b) a Geciackt 86 rank, (c) a Vox Humans 8 ' rank. The organ was a Marcusson in Sphga church, S ~ . o l m . The curves shown are for a set of very short duration notes (---) and for a continuous diatonic scale (-!. A pre-exphasis curve was used in the analysis.
I
Frm the LTAS:es i n Fig. IV-C-1 m can predict p s s i b l e rough-
ness sensations. The LTAS:es describe approximately the spectrum
envelope for different ccanplex tones and thus m can estimate the
upper frequency limits for the different musical tones. I
For the three organ pipes having pitch C4 (262 H z ) , the high
frequency limits are approximately: Principal C4, 4 kHz; Gedackt Cq,
3 kHz; Vox Humana C4, 8 kHz.
Principal and reed pipes as Vox Humana produce a f u l l set of
harmonic par t ia l tones and for a fundamental frequency of 262 Hz,
which is also the frequency difference betmen partials, t m or mre
partials w i l l f a l l within the s m cr i t i ca l band above 1.5 kHz. This
w i l l give rise to possible roughness, which is especially so for the
Vox H-, with strong part ials above 1.5 kHz.
Closed pipes produce only odd-nuhered partials. For instance,
for a Gedackt C4 pipe, only part ials above 3 kHz can contribute t o
roughness. A s noted above, there is little radiated sound above
th i s frequency. Listening to recorded tones reveals that the Ge-
dackt Cq sounds smooth, the Principal Cq sounds slightly rough,
and the Vox Humans Cq very rough. Low-pass f i l ter ing a t 1.5 kHz I rem3ves the roughness sensation, which c0nfin-w the previous con-
clusions regarding roughness. ? f
5 . Growth curve measurements and q a t i n g experiment I Steady signals are frequently analysed with the aid of a se t
of analog f i l t e r s (either 1/3 octave bandwidth or narrcw band) whose
outputs are scanned in series or parallel. For the analysis of
transient signals, as i n speech and music, a parallel bank of f i l t e r s
is required whose outputs can be sampled a t short time intervals, a t
least every 10 m s . The f i l t e r bandwidths may be variable or set t o
1/3 octave values. Digital f i l t e r s muld be an alternative but to
date they do not appear t o have been used in the analysis of musical
sounds. A pre-emphasis netmrk may be desirable i f the magnitudes
of important high frequency components are lm. The mighting net-
mrk can also be designed to be an approximation t o loudness might-
ing functions (Stevens, 1972; Zwicker & Feldtkeller, 1967). The
sampled f i l t e r outputs are stored in a ccarrputer for subsequent anal-
ysis and display.
- --- - - --
STL-QPSR 4 /I 980 71.
5.1 Sampled F i l t e r Methcd.
A set of 51 adjustable f i l t e r s was available for t h i s analysis.
The f i l t e r s e r e set to have a centre frequency every 100 Hz star t ing
£ram 100 Hz, the bandwidth was set to 125 H Z , smoothing f i l t e r time
constant 13 ms, sampling ra te 100 Hz. In addition, a pre-esrqhasis
ne-k was used (Fig. IV-C-2) that permitted the recording of great-
er amplitudes a t higher frequencies and also gave a weighting tha t was
an approximation to loudness weiqhting functions (Stevens, 1972;
Zwicker & Feldtkeller, 1967). I f required, unweighted f i l t e r respons-
es could easily be derived f r m which loudness values in sones could
be c q u t e d .
Figs. IV-C-3, IV-C-4, IV-C-5 show results obtained for a Gedackt
C4, Principal Cq, and Vox Humana C4 pipe, respectively (Sphga church
organ). In each case the weighted t o t a l level is given together w i t h
curves for scatre of the par t ia l tones (only the f i r s t six are shown
for the Principal and Vox Humana) . For loudness carputations the
f i r s t s ix par t ia ls are treated separately, a f t e r which the par t ia ls
are grouped since'mre than one of the upper par t ia ls w i l l f a l l into
each c r i t i c a l band.
One of the major effects produced by the s tar t ing transients of
sounds is to d r a w the attention of the l is tener t o the sound. It is
of interest therefore to plot the derivative of the sound level versus
t h e a s in Fig. IV-C-6, for the Gedackt C4. Fram inspection of such
graphs it is possible to ascertain which pa r t i a l tones are likely to
be dcaninant in gaining the attention of the ear. Such overall features
may be shown mre clearly by plotting only the dominant slopes in each
case, as shcwn i n Fig. IV-C-7, For instance, in the case of the Ge- I
dackt Cq pipe, the fundamental tone has its fas tes t r a te of change
between 0 and 11 ms, the f i f t h par t ia l betwen 11 and 18 ms, foil-
by the fundamental again fran 18 t o 42 ms.
For the Principal C4 pipe, the second par t i a l is d m h a n t be-
twen 0 and 23 ms, the 11th and 12th par t ia ls betwen 23 and 31 m s ,
the th i rd pa r t i a l betwen 31 and 48 ms, followed by the fundammtal.
C ..-.. ,.
. - : , *. ..a0 . - , ,.,I ' r F .T;;;.: - ... " * \ . . . . < , t F : ' " r , 5 . 'f. . . A
,u&c;' j f i $** L FREWENCY ( HZ 1
1 t' -* J,
Fig. IVX-2. (a) Pre-emphasis filter characteristic for sampled filter set.
. : i L - !b) mudness weighting function (Stevens, 1 972) , , 7-
: (c) Loudness weighting function (Zwicker , 1 967) . ' r- : * ->
* 1 . . F? i 50 1 1
GEDACKT C, +
. ,;.' a r . '-tx; 40 - )'-L , . . -> f :b (,f . , = - s f e i r
. \ F.: 5, ; -. 3, - , x ,.. : ..?. 3
, , Relative , . * ' Loudness I . -
. :; Level
\
\ '.A j 7
k . , . /--
- i t , $ j -.- . . . + TIME (rns ) . -. -- 1 .-.,A .. ,
- Fig. IV-C-3. Analysis of startina transient for a Gedackt Cd organ pipe. Partials 1-9.
TlME (ms)
,Fig. IVq-6. D e r i v a t i ~ of loudness level versus centre t i n e of 10 m s band for a Gedackt C4 organ pipe. The numbers indicate partial tones.
TIME (ms) r . .-,,-. : ' ,,-;y,: ..: -1 .-, . 7-,. . . . . .
Fig. IV-C-7. Derivative of loudness level versus centre time of 10 m s band for (a) Gedackt Cq, (b) Principal Cq, (c) Vox Humans Cq organ pipes. The par t ia l nuher is marked against each graph segmnt.
STL-QPSR 4/1980
In the case of the Vox Humana, the fundamental is daninant
£ram 0 to 17 ms, the third part ial from 17 t o 32 ms, the 11-13th
partials £ram 32 to 43 ms, etc. While such graphs do not constitute
the whole picture regarding the aural assessment of transients, they
are a t least useful in revealing the partials which have the great-
est rate of change of loudness a t each interval of time. It is in-
teresting to note the high rates of change involved, in saw cases
w e r 2000 dB/s.
The above analysis implies that there is an analog of the I
effect which is applicable t o rates of change of signal level.
It muld be of interest to conduct further psycho-acoustic tests re-
lating to this supposition.
5.2 Gating Experiments $
Effects of three kinds of gatings were tried. In the f i r s t one
modifications of the starting transients were investigated. The
transients were cut w i t h an exponential fas t gate to avoid click
sounds. The in i t i a l portions of tones were cut off i n steps froan 50
to 300 ms. The effects of the cuttings varied fran small t o con-
siderable (especially for the Vox Humana) but did i n no case make
the sound ccanpletely different. The overall effect is to change one , I
quality factor of the tone. -
In the second series the effect of cutting away the roam re-
verberation was tried. W i t h the previously used gating system, tones
w i t h and without reverberation were carpared. Lack of reverberation
makes the tone sound unnatural. In the third series t m tones were
added w i t h the start of one delayed relatively t o the second.* The
delays t r ied were 0 , i 25, i 50 ms. The effect of this delay was
fai r ly small even w i t h the 50 ms delays although the difference in
starting transients were perceived in these cases. A 50 ms time de-
lay corresponds t o 16 m difference in distance t o the pipes, i .e. a
fa i r ly large distance difference :seldom encountered by a listener.
5.3 Fourier Transform Method
Similar information to that provided by the sampled f i l t e r mthod
m y be obtained by digitising and storing the signal and then using a
*. . m e additions were i m p l m t e d by Ake Olofsson on the ccsnputer of the Department of Technical Audiology, Karolinska Institutet. 1
fas t Fourier transform t o cosnpute the spectrum of selected segmnts
of the signal. I f a sampling rate of 25 kHz is used, then each 10
I m s slice of the signal w i l l contain 256 points and a corresponding
I frequency spectrum is obtained w i t h a frequency resolution of 100 1 Hz. Problems that arise w i t h this mthd include the definition of
the start ing point of the analysis and the choice of a suitable - i winduw function. Sane werlap of time sl ices is desirable - a t least
50% is often used. This method needs further dwelopmnt before it
can cclmpete w i t h the sampled f i l t e r mthcd for convenience of opera-
, tion.
6 . Tri - s t imulus Method
A basic problem in devising a satisfactory system of measure-
ment for transient sounds is to relate the acoustic procedures t o
those used by the ear and brain. Unfortunately, there is l i t t l e
detailed information available concerning the processing of neural
, impulses af ter they leave the region of the cochlea. This situation
is in contrast w i t h optical research where there is considerable in-
. . formtion available concerning the coding of optical signals. Des-
: pi te the enomus n&r of receptors in the retina, colour vision
depends on only three different types of cone receptors. Scune of the
possible coding operations that occur betmen the retina and the vi-
sual cortex have been discussed by Land (1977). - : I F .. , Sarrre clues as t o possible auditory coding procedures cclme froan
psychoacci s t i c experiments relating to the timbre of musical sounds.
3, - Grey (1977) describes experimnts on a set of 16 music instrumnt
tones i n which perceptual similarities were determined for a l l pairs
of stimuli. Application of multidimensional analysis sh- that
three dimensions were sufficient to give a satisfactory representa-
tion of the data (see also Section 3). Plcanp and S-eken (1973)
describe an experiment in which the relation between timbre and
sound spectrum for selected steady sounds was studied a t various
pints in a diffuse sound field. A dissimilarity matrix was con-
structed which showed that the ten test stimuli could be represented
satisfactorily by a three-dimensional space. Thus present tentative
information, such as that provided by the above experiments, suggests
that the f inal assessment of auditory siqnals is in terms of a
-11 nunher of derived parameters. Whether the number of such
parameters can be as small as three w i l l depend on the outcame of
m r e extensive experimentation. On a related topic, Mller (1979)
finds that six nvtasurement parameters are needed t o obtain reason-
able correlation beheen objective and subjective descriptions of
high f ideli ty audio systems.
Time variations of amplitude and frequency are important in
the perception of sounds. Associated with these effects are me-
mory processes, both short-term and long-term. A t present there
is no satisfactory method for incorporating such processes into
acoustical measurement procedures.
A s a starting point for the develo-t of a simplified mew
for representing musical sounds, it was decided to use a 3-coordi-
nate methcd based on the spectral information derived f m the
sampled f i l t e r set. This method produces the level of each part ial
as a function of t i m e . The partials are groupd i f m r e than one
f a l l s within the sarrre cr i t i ca l band. A choice must nuw be made of
the three spectral regions that are t o represent the three coordi-
nates. It was decided to use the fun-ntal as one coordinate. I I
A choice must also be made of the unit in which to express the
f i l t e r outputs. Since a nmber of arithmetic operations are to be
made, it was decided to convert the f i l t e r outputs into loudness units
in sones. I f the unweighted output levels are used, it is convenient
to ccsnpute loudness values using Stevens (1972) Mark VII method.
When wei'*ted values were used, it was assumed that these values
were the equivalent of PLdB values f r m which sones were easily ob-
tained using the table given in Stevens ( 1 972) . A s a f i r s t approximation, the output of each band may be con-
sidered to be independent, so that the total. loudness of the sound
may be expressed as
. ?- - r " . I c . . . : . . I f an analogy is drawn w i t h the optical tri-stimulus method, each
texm on the right-hand side of Eq. (1) may be regarded as a tri-stim-
ulus value f r m which a se t of noml ised coordinates may be found:
" . . i a ., tsi; !. Frcm these coordinates, it is now only necessary t o select
- L : w . Ism in order t o d r a w a graph since x + y + z = 1. Fig. IV-C-8
shows a graph of x versus y with the significance of the main areas
, indicated. I
- : . I . < Fig. IV-C-9 shm a tri-stimulus graph for the behaviour of 3,
the start ing transients of the following organ pipe sounds: .-
I
1 (a) Gedackt 8 ' Cq, (b) Principal 8' C4, and (c) Vox Humans 8' Cq.
I -2
Tirrres measured f m the onset of the sound are shown beside each , i ) z:;
curve. The f inal pint in each case corresponds w i t h the steady <. 3 i sound. A nurmber of interesting features may be observed. It is
M i a t e l y apparent how the spectral character of the sound changes .; '1 3: . ]> .'
betmen cr~set and steady state. The Gedackt C4 has stronq high
part ials in i t i a l ly and then progresses t o a mre fun-tal type -*31 of tone. The Principal C4 starts with strong high partials and
a then progresses t o an evenly balanced sound. The Vox Humana Cq I starts with a predominance of lower part ials and then mes tawards
. a predminance of higher partials. I t 3 ; 4 ; ; - : > -
This methcd may be used to investigate the transient beha-
vioux of different pipes within the same rank of pipes. For in- I
stance, Fig. I17-C-10 shows noml ised tri-stimulus plots for three I
notes of the Gedackt 8' rank, while Fig. N-C-11 and 12 show plots
for the same three notes of the Principal 8' rank and Vox Humans 8'
rank, respectively. While the Gedackt notes show a similarity of
behaviour, the Principal aFti! YOx H u m n a notes show marked differ-
ences i n transient behaviour, although the steady state values are
A d
Fig. IV-C-8. Acoustic tri-stimulus diagram.
Fig. IV-C-9. Acoustic t r i - s t imlus diagram shaving transient behaviour of a Gedackt C4, Principal C4 and Vox Humana C4 organ pipe. The nunhers indicate I
t i m e in milliseconds a f t e r onset of sound. The larger circles correspond to the steady state. '
Gedackt 8'
F . 1 0 Tri-stimulus diagram s ~ u the transient behaviour of three Gedackt 8 organ pips.
0.5 1.0 x-
.>
'\
Fia. IV-C-11. Tri-stimulus diagram show in^ the transient behaviour of three
. Principal 8' organ pipes.
\ \
T -- - I '3 -.. -
0.5 1.0 x-
i . - 1 2. Tri-stimulus diagram showing the transient behaviour of three Vox H i m a n a 8 ' organ pipes.
. .
STL-QPSR 4/1980
in approximately the same area of the diagram. In listening t o the
recorded sounds, the Principal E4 pipe had a slowly developing funda-
mental while the Principal Gq pipe had a prominent high-frequency
'chiff ' i n the starting transient. The Vox Humans E4 and Q4 notes
have a similar starting sound whereas the C4 note is markedly dif-
ferent.
7 . Discussion
&thods are now being evolved for the analysis of ccanplex sounds,
such as the starting transients of musical sounds, that take account
of the known characteristics of the hearing process. Factors that
are relevant t o such a system of analysis include (a) an i n i t i a l fre-
quency analysis related to the c r i t i ca l bandwidths of the ear, (b)
a knowledge of the response times of the hearing mechanism, (c) mask-
ing effects, (dl the possible operation of the precedence effect, and
(e) m r y processes.
In our experiments w have shown that long-time-average-spectra
give a representation of the "steady-state" properties of tones f m
an instrument. W e have shown that the properties of starting tran-
sients are suitable to analysis in sampled parallel f i l t e r bands and
a new method "the tri-stimulus method" has been introduced as a I
simple way t o s m i s e these propr t ies .
Not a l l of the factors (a) t o (e) above are sufficiently well
defined t o be incorporated in a system of masurement (see Appendices
A and B ) . For instance, there is a vagueness in the literature con-
cerning the minimum time required t o register a change in loudness
or of timbre. A knowledge of such times muld be important in defin-
ing the frequency of sampling and averaging of time-dependent tran-
sients. There have been few a t t e r s t o incorporate memory functions
into analysis procedures despite their importance in subjective eva-
luation of transient sounds.
Detailed information concerning auditory coding of signals is
not yet available. Eventually th i s type of information could lead
to marked changes in analysis procedures. A satisfactory method of
analysis should be applicable to sequences of musical notes since the
I analysis of isolated notes is of little interest outside the labo-
ratory. A critical test of any analysis procedm is to use the
analysis data as the basis for canparison by synthesis (c.f. Appen-
! . . Z i , , ' > ; ;,:,y,:
! . . " ' , j , > : ;; ?.; 7 < ~. . . ,- ...-..,, . L . . ! ' 8 ., ' : I",". ., ,<..: ..a ':Y%Il f !-T.k3:33ri y::; ?I-)
APPENDIX A
METHODS OF ANALYSIS FOR TRANSIENT AND STEADY STATES
Available methcds of analysis are nearly a l l forms of frequen-
cy analysis. L i t t l e progress has been made with analysis procedures
in the time damah.
* A. 1 . Modified Fourier Series ~ ~ 1 o d
The Fourier series methcd a s s m s that the signal is periodic
and infini te in extent. One period of the signal may then be digi-
tised and a line spectrum c q u t e d consisting of hanmnics of the as-
sumed period. I f the signal is mildly aperiodic or is varying slawly
with time, as in a slowly growing or decaying transient, th i s method
rnay be approximated by assuming that each period of the transient may
be isolated and treated as i f it were part of an infini te train of
such periods.
Thus, it is assumed that the sound pressure as a function of
time can be represented by I , I
where &(t) is the amplitude of the rmth part ial a t tim t,
em(t) is the phase angle of the m t h part ial a t time t.
w 1 is the fundamntal frequency
M is the order of the highest part ial q t e d
The values of amplitude and phase so computed are assumed t o
exist a t the centre point of the chosen period. Sources of error
include the definition of the period boundaries, the assmption that
the time function is quasi-periodic and limitations of numerical in-
tegration. The method does not detect the presence of inhanmnic
tones or give a measure of backgrad noise.
This methcd has been used by Keeler (1 972a, b) for the deter-
mination of the growth in amplitude of the part ial tones of organ
pipes. As pointed out by Moorer (1975), when this method is used
For extracting phase and frequency information, the values may vary
APPENDIX A
L l ' ,
dismntinuously £ran one period to another, which gives rise to
problems i f the data is t o be used la ter for synthesis. In addi-
tion, Keeler made use of Sinrpson's rule rather than use direct sum-
mation which could introduce aliasing problems for sounds having
greater harrnonic developwnt than organ pipes.
A.2. Discrete Fourier Transform Method F r , a I Fr- (1967) used the follcwing function to represent a
musical note:
i t 2i, The function \(t) is a simple cascading of exponential attack functions
and W,(t) represents a frequency which changes by a discrete step wh,
at time t h,. represents q l i t u d e , t, onset time, a an at- . -? tack t i n e constant, @ , i n i t i a l phase angle of the m t h w n e n t , and
u ( t - t ,) is a step function dmse value changes £ran zero t o one a t
5:. In order to evaluate the parameters, tsm transforms are used. The
- > '. 1 .a D transform, defined by - 1
;" !.-, F , t
STL-QPSR 4/1980
Appendix A
{ } : - I
X(n) = A (n) sin nT [mu + 2nFm(n)] m (A71 I i . m= 1 * < , I ', -, ;
I where X(n) is the signal a t time nT, n is a tinre index, T is the time
between successive samples, w is the fundamental radian frequency of
the note, m is the partial number, Am(n) is the amplitude of partial
m a t time nT, M is the number of partials, and F,(n) is the frequency
I deviation of partial m a t t h nT. I i .
For the method t o function properly, it is necessary to estimate
the fundamental frequency of the waveform within a 2% deviation.
, Each hammnic is extracted in turn, and plots made of amplitude and
. r frequency as functions of tinre. The method is thus equivalent to a
- - - ' bank of f i l terp , -, T -- .*. d - \ -....-< - s
A.5 Sampled Analog Filter Wthod I
I Steady signals are frequently analysed with the aid of a set of
k t : : analog f i l t e r s (either 1/3 octave bandwidth or narrow band) whose
il- i 8 s outputs are scanned in series or in parallel. . i * ,J
1 , . For the analysis of transient signals, as in speech and music,
L A L * .
a parallel bank of f i l te rs is required whose outputs can be sampled -
c. .. - -Ti",:, a t short time intervals, a t least every 10 m s . Preferably, the band-
. *- .- .I:#- 2 width of the f i l t e r s should be variable. For certain analyses a pre-
+ -. '.* -' . emphasis netmrk may be required i f the mgitudes of important high - 3 ): f -
frequency components are low. I
A mdified form of this method is to record the output of a set ,.- , of f i l t e r s in turn on a transient waveform recorder. Although this
. -=A' procedure is slow, it is feasible to record a t least 256 sample points
I . 3,
for each 10 m s of the waveform. Subsequent signal processing is per- 'nc
fo- by an interfaced ccarrputer.
A. 6 Iong-Time-Average-Spectra (LTAS)
Jansson and Sundberg (1975) and Jansson (1976) describe a tech-
nique for deriving an average spectrum of sounds or sequences of 1
. . sounds that have a total duration of approximately 20 s. A bank of
i 6
23 f i l t e r s is used having bandwidths similar to those of the cr i t ical
i bapds of the ear. - - . -. ,. ..c. I
Appendix A
The output fram each filter is rectified, smoothed with a time
constant set to 13 ms, and converted to a logarithmic scale. The
logarithmic output £ram each filter is sampled at a frequency of 160
Hz, which means that several sampling points are obtained even of
tones as short as 50 ms duration. Thereafter, the sampled output
measures are quantized in steps of 1 dB, transfont& into p e r
measures, and added in storage cells in a cmputer, one cell for each
filter. Finally, the contents of every cell are normalized with r e s
pect to the analysing time, transformed into dB measures, and plotted
as function of frequency giving an LTAS. Thus an LTAS provides a
record of sound-poier band-level as function of frequmcy. The pro-
cedure allaws the results to be displayed w i t h essentially no delay
after the sampling is caplet&.
As a consequence of the procedure just described, the LTAS is
depndent on the fun-tal frequencies, the spectrum envelopes, and
the intensity levels contained in the sound analyzed. It is note-
mrthy, that tones of high sound pressure level will contribute mre
than tones of law, as the LTAS averages the sound-pressure amplitude
M e d times the duration of each frequency cmpnent.
The ccmputer program allows canparison to be made of any of the
stored LTAS records. For instance, a dissimilarity measure may be
ccanputed between t m given LTAS :es .
STL-QPSR 4/1980
APPENDIX B 4 - \
I .. ' ., , PREVIOUS ANALYSES OF TRANSIENTS A , - a . . I \.' \ There are relatively few publications available with detailed
analyses of musical notes. With the advert of d igi ta lmthods of . J'. *.?: analysis an increasing n&r of papers are appearing. In the fol-
lowing notes, investigations have been grouped according to tradi- , f i f t ional classes of instrument.
. ., . d l ; 1, b. ,Y !i ' t
I B. 1 Pipe Organ
-jr b a One of the ear l ies t investigations was that of Trendelenburg
'
(1 936) who masured the starting transients of a nwrber of organ I pipes by means of oscillograms of the build-up of sound in successive
one-octave bands. Nolle and Boner (1941) also examined starting tran-
-- i is ients, without using f i l t e r s , by photographing an oscilloscope dis-
play with a m i e camera. Deductions were made £ram the visual ap-
pearance of the oscillograms. For instance, it was observed that ,
with open diapason pipes, the second partial was in i t i a l ly dcaninant
during the starting transient. With bourdon pipes, the f i f t h par t ia l
daninated in i t ia l ly . Reed pipes start p q t l y with the fundamntal
prominent fram the outset. They also found that the time required
" \ & - , to reach steady speech required approximately the same nmber of .Z u
I cycles regardless of the pitch of the pipe. They confirrcred Trende-
lenburg's observation that inharmonic part ials may be present durinu
the starting transient state. I
Richardson (1 954) recognised the irrcportance of the starting tran-
sient with respect to i n s w t tone and conducted an investigation
of the i n i t i a l build-up of sound in organ pipes. The microphone out-
put was displayed on an oscilloscope, together with a timing signal,
which was *tographed by a w i n g film camera. A l s o photographed
was a record of pressure changes in the pipe foot. Portions of the
film, a t approximately 10 m s intervals, =re enlarged and subjected
to harmonic analysis (amplitude and relative phase). I
Caddy and Pollard (1957) investigated the effect of the playing
action on the starting transient of a principal pipe. The factors
controlling the starting transient and steady state *re delineated.
STL-QPSR 4/1980 89.
Sundberg (1966) made a thorough study of the physical behaviour
of organ pipes including an examination of starting transients. Oscil-
lograms are shown for principal and f lu te pipes in which the qrowth
of the fun-tal and of the upper part ials have been separated by
filtering.
Keeler (1979b) applied a mdified Fourier series method to the
cmputer analysis of growth curves for a nLrmber of f lute, principal
and reed pipes. W i t h t h i s method there is the need to identify each
cycle i n the sound emitted by the pipe, which is dif f icul t to do in
the i n i t i a l stages of the transient. In his published curves Keeler
shows plots of pressure amplitude versus time. When these are con-
verted into loudness plots, the relative roles of some of the partials
are altered. Fig. IV-C-B.l shows loudness plots for the f i r s t few
partials of a gedackt, principal and trumrpet pipe.
Fletcher (1976) has developed a theory for the onset of sound in
an or9an pipe which involves non-linear couplin9 between the a i r je t
system and the set of pipe n o m l modes. A sensitive factor i n devel-
oping a solution is the rate of r i se of pressure in the pipe foot.
In m y cases, the second part ial of the pipe is found to develop
mre rapidly than the fundamental. The frequencies of the upper par-
t i a l s are not in i t ia l ly harmonic being p r t l y determined by the reso-
nances of the pipe alone. In the steady s ta te a l l m d e s becm locked
into a hamronic relationship.
B.2 Piano and Harpsichord
Fletcher e t a1 (1962) examined the decay of sound in a grand
piano using a wave analyser t o isolate the part ial tones. The rapid
attack precluded any detailed analysis of the starting transients.
Weyer (1976, 1976/77) made a thoroug!~ examination of both the
time and frequency behaviour of the starting transients and decay of
various noted fram harpsichords, upright and grand pianos.
Alf redson and Steinke ( 1 978) used the Fourier transform mthod
to examine decay of a whole piano note but did not extend the
mthod to examine temporal dependence.
STL-QPSR 4/1980
Appendix B
B.3 Strings
Beauchamp (1974) describes the analysis of violin tones that
were recorded in an anechoic room and analysed by computer using a
d i f i e d Fourier series technique. The analysis provides fundament-
a l frequency, hanmnic amplitudes and h m n i c phases as a function
of time throughout the duration of each sound.
Grey (1975) describes the application of the heterodyne f i l t e r
method t o analyse variations in amplitude and frequency of each part ial
tone of a musical sound as a function of time. Examples are given for
a number of different sounds including that of a cello note. Fran the
original analyses, straight line segnaents are derived to show in as
simple a m e r as p s s i b l e the time history of each part ial tone. It
is found that each of the time functions can be satisfactorily re-
presented by betmen 4 and 8 line segments, as shown in Fiq. IV-C-B.2.
This technique significantly reduces the m u n t of data storaqe neces-
sary for la ter use i n synthesis of sounds.
B.4 Woodwinds
Grey (1975) also gives m examples of modwind tones reduced
to line segmnts. Fig. IT7<-R.3 shows notes of the sanae pitch fran
a clarinet, f lute and oboe.
Using the method described in section A.3 of t h i s report, Bariaux
e t a1 (1975) give an application of their method to measmement of the
time histories of the f i r s t and third part ials of a clarinet tone.
. B.5 Brass
Rissett and Mathews (1969) used a d i f i e d Fourier series mthcd
to analyse trumpet tones. Since they were interested in camparing
the original with synthesised tones, line segrmt functions were de-
rived for the time histories of each part ial tone. Factors arising
from the analysis that =re found t o be important were :
( 1 ) the spectrum envelope,
(2) the attack transient which las ts for approximately 20 m s wit!! faster build-up of lower order harmonics than higher-order ones.
(3) a high-rate, quasi-random frequency fluctuation.
It was also found that low-order h m n i c s have a longer decay.
( c l t
Fig. n"C-8.2. Heterodyne f i l t e r analysis of a Amplitude * ?J, ce l l ono t e ( ~ 2 , frequency 314 Hz). , !
d !. - * - Each par t i a l curve has been ap-
proximated by straight l ine seg- Time -, m t s (Grey, 1975) .
1 J r! k 0 .
l l
Frequency ' , , ,
Fig. IV+-E3.3. Heterodyne f i l t e r analysis of notes £ran (a) an aboe, (b) a f lu te , (c) a clar inet . Each note has the same pitch,
- (Grey, 1975) .
B.6 Percussion
There are few examples of the study of prcussion instruments,
probably because of the ccanplexity of the spectra which are often
mre noise-like than tone-like. Moorer (1977) aives an example of
the overall analysis of a bass drum note. Fletcher and Bassett (1 978) ' describe the analysis of sounds frcnn the bass drum using computer-
modelled band-pass f i l t e r s with 3 Hz bandwidths. The sounds were
analysed in 1 Hz steps f m 30-1000 Hz and frequency as a function
of time, peak sound pressure level and decay rate determined for each
major cmponent. -I *,
STL-QPSR 4/1980
Pressure - 3 __L___----------- -
TIME ( m s l r .
-.? . . i t
- ' . Fig. IVX-C.1. Line segment representation . .
of s tar t ing transient of a Gedackt C4 organ pipe. . .
Jansson, E.V. & Sundberg, J. (1975) : "Long-Time-Average-Spectra applied to analysis of music. Part I: Methcd and general applica- tions", Acustica - 34, pp. 15-19.
Keeler. J.S. (1972a): "Piecewise-periodic analysis of ahnst-periodic . . , . sounds & musical transients", IEEE Trans. Audio & Electroacoustics
AU-20, PE>. 338-344. , * - ) a - 7 1 , ..$ . A I 1 ~ 1 , ! >, , .
. ,. Keeler. J.S. (1972b): "The attack transients of safe organ pipes", IEEE Trans. Audio & ElectrQacoustics AU-20, pp. 3787,3$1. Land, E.H. (1977): "The retinex theory of color vision", Sci.Amer. 237, No. 6, pp. 108-128. -- Land, E.H. & McCann, J.J. (1971): "Lightness and retinex theory", J. @t. I30c.Am. 61, pp. 1-11. - - # " . ", -,
Lindbl~~t~, B. & Sundberg, J. (1970): "Towards a generative theory of melody", Swed. J. of Musicology 52, pp. 71-88. -
. . Luce, D. & Clark. M. (1965) : "Durations of attack transients on non- percussive orchestral instruments", J. Aaio Ehg. Soc. 13, pp. 194-199. -- e r r H. (1979): "Multidimnsional audio: Parts I and 2", J. Audio Ehg. Soc. 27, pp. 386-393; 27, pp. 496-502. - -- Mrer, J.A. (I 973) : "The heteroch'rtle m e t h d . of analysis of transient wavefom", Artificial Intelligence Lab., Stanford University.
Mrer, J.A. (1975): "On the secpentation and analysis of continuous musical sound by digital computer", Report No. STAN-M-3, Dept. of Music, Stanford University.
Mrer, J.A. (1977): "Signalprocessingaspctsof ccanputermusic: A SUTE~", Proc. IEEE 65, pp. 1108-1137. L .Ir j * u.t. I. -
' - Nolle, A.W. & Boner, C.F. (1941) : "The initial transients of organ pipes", J.Acoust.Soc.Am. 13, pp. 149-155. - Plcanp, R. (1976): Aspects of Tone Sensation, Academic Press, London, pp. 111-142.
Plomp, R. & Steeneken. H.J.M. (1973): "Place dependence of timbre in reverberant sound fields", Acustica 28, pp. 50-59. Rash, R.A. (1978): "The perception of si.multaneous notes such as in polyphonic music", Acustica 40, pp. 21-33. - Rash, R.A. ( 1 979) : "Synchronization in performed ensemble music", Acustica 43, pp. 121-131. - Richardson, E.G. (1 954) : "The transient tones of wind instruments" , J.Acoust.Soc.Am. 26, pp. 960-962. - Rissett, J.C. & Mathews, M.V. (1969) : "Analysis of musical instrurraent tones", Physics Today 22, pp. 23-30. -
Roedierer, J.G. (1975) : Introduction to the Physics and Psychophysics of Music, Springer-Verlag , New York, 2nd edition. ,, , Russel, I.J. & Sellick, P. (1978): "Intracellular studies of hair ,
, .- cells in the m l i a n cochlea", J. Physiol. 284, pp. 261-290. - ' Stevens, S.S. (1972): "Perceived level of noise by Mxrk VII ard
decibels (E) ", J.Acoust.Soc.Am. 51, pp. 575-601. - Sundberg , J. ( 1 96.6) : Mensurens betydelse i 6ppm labialpipor, Almqvist & Wiksell, Uppsala.
' Swdberg, J. (1 978) : "Synthesis of singing", Swed.J. of Musicology 60, - pp. 107-112.
"-' Terhardt , E . ( 1 974) : "On the wrception of periodic sound fluctuations (roughness)", Acustica 30, pp. 202-213. - Trendelenburg, F. (1 936) : "Beginning of sound in organ pipes", Z . f . tech.Physik 16, p. 513. - i 3
Wedin, L. ( 1 970) : "Dimensionsanalys av perception av instrumentklancf"' Inst. of Ps~c~o~o~Y, Stockholm University.
Weyer, R-D. (1976): "Time-frequency-structures in the attack transients of piano and harpsichord sounds - I", Acustica 35, pp. 232-252. - I
, . Weyer , R-D . ( 1 976/77) : " Time-varying amplitude-frequency-structures in the attack transients of piano and harpsichord sounds - 11", Acustica 36, pp. 241-258. - Zwicker, E. (1960): "Ein Verfahren zur Berechnung der LautslSrke", Acustica 10, pp. 304-308. - Zwicker , E. (1 977) : "Procedures for calculating loudness of temporally variable sounds", J.Acoust.Soc.Am. 62, pp. 675-682. - Zwicker, E. & Feldtkeller, R. (1967): Das Ohr als Nachrichtenenpf3ngerI S.Hirzel Verlag, Stuttcmk, 2nd ed.