Analysis and assessment of musical sounds · C. ANALYSIS AND ASSESSMENT OF MUSICAL SOUNDS H.F....

Dept. for Speech, Music and Hearing

Quarterly Progress andStatus Report

Analysis and assessment ofmusical sounds

Pollard, H. F. and Jansson, E. V.

journal: STL-QPSRvolume: 21number: 4year: 1980pages: 062-098

http://www.speech.kth.se/qpsr

http://www.speech.kth.se

http://www.speech.kth.se/qpsr

C. ANALYSIS AND ASSESSMENT OF MUSICAL SOUNDS

H . F . Po l l a rd* and E.V. Jansson

Abs t r ac t

A ccsnplete description of musical sounds must include the changes in spectrum w i t h time to describe the four major characteristics: duration, pitch, loudness and timbre as we11 as the microstructure within notes. To understand the perception of the musical sounds, the operating characteristics of the hearing process should be considered, such as the c r i t i ca l bands of hearing, masking effects, the t5-w resolution needed, tim constants for integration of signal paramters, and likely memory functions.

Iong-time-average-spectra of organ ranks are shown t o predict steady state paramters independent of starting transients, frequency range of a rank and its roughness. Sampled f i l t e r methods show that transients of different part ials dartinate during different portions of the total starting transient and that very high rate of changes may occur. By replacing the natural starting transients with expnen- t ia l ones, the averall effect is not to make the sound ccgnpletely different, it is a change in one quality factor of the tone, while lack of reverberation makes the tone sound unnatural.

The tri-stimulus method, employed in explaining the colour vision, is introduced as a method to analyse musical sounds. In this investigation the j3rdmental was used as the f i r s t coordinate, the folluwing three part ials resolved by the ear as the second, and the higher par- t i a l s as the tkird. The tri-stimulus graphs display differences behem ranks and between pipes of the same rank, which seems perceptually important.

In the following three appendices analysis methods are reviewd, results of previous analysis presented, and a synthesis prescription given to test the relevance of an analysis.

- . , 7. , ' 1 . I n t r o d u c t i o n ' .

Musical sounds form a special class that di f fer markedly in

structure f r m either pure tones or noise, the two extremes that are

usually considered by acousticians. Methods of analysis that have

been devised for pure tones or noise are often unsuited to the analysis

of a musical sound. In particular, changes that occur in the spectrum

of the sound with time must be described in same way.

In the time damain, a musical sound consists essentially of three

parts: (1) a starting transient in which camplicated interference

* guest researcher a t the Dept. of Speech Commmication, Febr.-June, 1 979. Address : School of Physics, University of New South Wales, Kensington, Australia

STL-QPSR 4 / 1 980 63.

processes occur between the free vibrations of the modes ccanprising

the instrument and the forced mtion imposed by the source of energy,

(2) a m r e or less steady region corresponding t o the forced mtion

of the instrument. (This 'steady s ta te ' sound usually consists of a

s e t of part ial tones that may or may not be related hanmnically, to-

gether with same background instrumntal noise. ) ( 3 ) The decay of

free vibrations of the mdes of the instrument when the forcing function

has been remved. The decay pattern is usually ccanplicated by inter-

ference between the instrument modes and roam modes. Under normal

music-making conditions, the roam modes predominate during the decay.

A major characteristic of musical sounds are the changes of state that

occur with time. It is rare t o encounter a sound in music whose descrip-

t ive p a r a t e r s are not varying in scm way with time.

A measurement system suited for analysis of musical tones must be

capable then of providing not only a time description of the sound but

also a detailed frequency-time analysis i n terms of a l l the important

spectral cmpnents. Furthennore, even the simplest piece of music

consists of a sequence of notes. Therefore, the analysis procedure

should be capable of dealing w i t h a t least a simple sequence, such as

a diatonic scale. ,

The present investigation aims towards such a physical analysis

procedure. But it is equally important that the analysis procedure is

adequate for the perception by our hearing. Therefore properties of

the hearing process are discussed in sarne detai l . Musical sounds have

certain signal properties and their perception determines which of

these properties are important. ..

The report is the work by one of the authors (HFP, on sabbatical

leave t o the Departmmt of Speech Carrarmnication, KTH) together with

the second author. The report contains tm main parts: one part re-

viewing previously used methods and investigations, and a second part

w i t h an analysis of organ tones done a t the Department of Speech Cam-

m ica t ion . F 'urthmre, simple gating and f i l t e r experiments are

described and a new method, "the mi-stimulus Method", is introduced

as a possible description for the timbre of musical instrurrrents.

2. Characteristics of a musical sound . - . y '-'7 I

1 There are four major characteristics that are used to describe

I a musical sound: (1) duration, which involves the relative imprtance I of the starting transient, the 'steady1 sound and the decay; (2) pitch,

" - Mch involves an assessment on a one-dimensional scale related to

' frequency; (3) loudness, which is a masure of the electric p e r

' transmitted by the auditory nerve (Howes, 1979); (4) timbre, which .ci!::'

is a multi-dinuansional entity. - . ,,. -

I + < ;, , ', - - . In addition to the major characteristics of the sound, there are

*?-'< 1 the dynamic fluctuations within each note (soarretimes called the micro-

structure) , examples of which include: (a) variations in the level of

the time envelope, (b) variations in both the level and frequency of

'" spectral canpnents, (c) the temprary appearance of hhammnic spectral r ' j;

canponents, and (d) the presence of noise caponents. All of these . + x i -

factors can be vital for a musical sound. . .. I

3. Perception of musical sounds I 1 ' When it is desired to study loudness relationships and soarre as-

pects of timbre, the operating characteristics of the hearing process

,. . should be considered. While there is detailed physiological and I psycho-physical information available concerning processes occurring

-i in the region of the cochlea, there is much less known concerning the

processing of neural impulses after they leave that region.

The basilar mgnbrane contains about 30,000 receptor hair cells

and behaves as a set of 24 band pass filters (critical bands) each con- -, taining approximately the same n&r of hair cells (about 1300 per 1

/ ' ' a - band) . The bandwidths of the critical bands are approximately 100 Hz r s ';. for frequencies below about 450 Hz and approximately one-third octave

1 ,' E for higher frequencies (Zwicker & Feldtkeller, 1967). The bandwidths

i- - " ' measured as pulsation threshold have, however, approximately half the

": above bandwidths '(~lcmlp, 1967, Ch. 1) . The maximum firing rate of ' neurons energised by the sound wave is 1000 per sec; the rate being

I

dependent on the intensity of the sound. Individual neurons respond

to an intensity range of 40 dB. The recovery time after each firing

is greater than 1 ms so that the minimum time required t o detect a

change in intensity is probably i n the range 2-10 ms. There is sane

agreement that a time constant of 100 ms is involved i n the aural as-

sessment of the loudness of a steady sound (Zwicker & Feldtkeller,

1967). Details concerning the subsequent auditory coding of neural

impulses arising fran the ear are still somewhat speculative

(Ibederer, 1975) . Thus, for loudness determination, the analyser should preferably

be the equivalent of a set of f i l t e r s with bandwidths approximately

the same as the c r i t i ca l bands of the ear, and with the added capa-

b i l i ty of providing an estimate of the loudness i n each band af ter

low pass f i l ter ing with a t ine constant of 100 ms. The f ir ing rate

implies that the f i l t e r s should be sampled a t least every 10 ms.

One characteristic of a steady sound is roughness (Terhardt, 1974).

A musical sound w i t h t m or m r e part ials w i t h i n the same cr i t i ca l

band w i l l give rise to a sensation of roughness. The partials within

the sarue band are not resolved but produce a beating effect which can

be unpleasant. With only one part ial in each occupied band, the music-

a l effect is one of m t h n e s s . The perceived roughness corresponds

to a lm-pass RC-filtering w i t h a 13 ms t ine constant.

A strong sound canpnent may make another catpnent impossible

to perceive. This phenmon is called masking. In the frequency do-

main a low tone may mask a high tone, while the opposite effect is

snall. Therefore, simultaneous masking can be modelled as a masking

threshold fal l ing fran the level of each band a t a rate of approximately

10 dB/Bark towads higher pitches (c.f . Zwicker & Feldtkeller, 1967,

Fig. 76.2). Masking p h e n m are also present i n the time dcanain.

A suitable labelling pre-masking and post-masking were introduced

in Zwicker (1977). The p r e l ~ s k i n g for loudness is short relative * ' > A

to that of post-masking.

The relative importance of the starting transient and the steady

state parts of a musical note have been studied by a n m h r of mrkers

(sunanary in Grey, 1 975) . The starting transient plays an important

the note together with an assessmnt routine for dynamic changes,

(2) a long-term m r y which muld enable the identification of the

instmment being played or the relationship of one note with another

in a s q k n c e . ;> . ' 1 1 .s?43t,i1 c y,. t l C , I

- . J

A further r e f i n m t that may be- possible a t a la ter date

,, is to use a system of analysis that follows more closely the audi-

tory coding processes used by the brain. These processes have yet

to be delineated with respect t o hearing but advantage may possibly .l

be taken of the progress that has been made in relation to optical - t I

coding. I

I + , . SUE i n i t i a l studies for this report are sumoarized in Appen-

A Q- dices. Thus the Appendix A presents methods previously used for I , c.

analysis of musical sounds, Appendix B same results of previous anal-

yses, and Appendix C methods of synthesis for finding out the im- -

t . portant sound parameters for the perception. I 4 . LTApmeasurements and f i l t e r i n g experiments

I " - Recordings were made of a diatonic scale (C4 t o Cg) played on

three different organ pipe ranks at Sphga Church, Stockholm, using

four different note durations (very short, 0.5 s, 2 s, continuous

diatonic scale). For each of the ranks of pipes recorded and each

note set, an LTAS was ca tp t ed (see Appendix A) . Fig. IV-C-1 a shows

the curves derived for a Rincipal 8' rank (short notes and conti-

nuous scale) , while similar curves are shown in Figs. IV-C-lb and 1c

for Gedackt 8' and Vox hum^ 8' ranks, respectively. There is l i t t l e

difference i n the shapes of the curves obtained for each rank, ex-

cept for the shortest notes played on the Principal 8'. In the lat-

ter case, the curve for the shortest tirrae shows a reduced level a t

low frequencies likely because the fundamental is slaw to develop. q^'"6-b: It is concluded that where the LTAS is not very sensitive to the char- . " : acter of the start?ing transients of the notes, the LTAS is a repre-

sentative average for the quasi-steady s ta te portion of the tones - i .e. , the physical parameter defining steady state timbre. I

STL-QPSR 4/1980 69.

I I I 1 I I I 1 I 1 I I

PITCH ( B A R K )

0 10 *;

20

PITCH ( B A R K )

0 10 20

PITCH ( B A R K )

Fig. IV-C-1. LTAS-curves for a diatonic scale (C4 to C5) on (a) Principal 8 ' rank of organ pipes, (b) a Geciackt 86 rank, (c) a Vox Humans 8 ' rank. The organ was a Marcusson in Sphga church, S ~ . o l m . The curves shown are for a set of very short duration notes (---) and for a continuous diatonic scale (-!. A pre-exphasis curve was used in the analysis.

I

Frm the LTAS:es i n Fig. IV-C-1 m can predict p s s i b l e rough-

ness sensations. The LTAS:es describe approximately the spectrum

envelope for different ccanplex tones and thus m can estimate the

upper frequency limits for the different musical tones. I

For the three organ pipes having pitch C4 (262 H z ) , the high

frequency limits are approximately: Principal C4, 4 kHz; Gedackt Cq,

3 kHz; Vox Humana C4, 8 kHz.

Principal and reed pipes as Vox Humana produce a f u l l set of

harmonic par t ia l tones and for a fundamental frequency of 262 Hz,

which is also the frequency difference betmen partials, t m or mre

partials w i l l f a l l within the s m cr i t i ca l band above 1.5 kHz. This

w i l l give rise to possible roughness, which is especially so for the

Vox H-, with strong part ials above 1.5 kHz.

Closed pipes produce only odd-nuhered partials. For instance,

for a Gedackt C4 pipe, only part ials above 3 kHz can contribute t o

roughness. A s noted above, there is little radiated sound above

th i s frequency. Listening to recorded tones reveals that the Ge-

dackt Cq sounds smooth, the Principal Cq sounds slightly rough,

and the Vox Humans Cq very rough. Low-pass f i l ter ing a t 1.5 kHz I rem3ves the roughness sensation, which c0nfin-w the previous con-

clusions regarding roughness. ? f

5 . Growth curve measurements and q a t i n g experiment I Steady signals are frequently analysed with the aid of a se t

of analog f i l t e r s (either 1/3 octave bandwidth or narrcw band) whose

outputs are scanned in series or parallel. For the analysis of

transient signals, as i n speech and music, a parallel bank of f i l t e r s

is required whose outputs can be sampled a t short time intervals, a t

least every 10 m s . The f i l t e r bandwidths may be variable or set t o

1/3 octave values. Digital f i l t e r s muld be an alternative but to

date they do not appear t o have been used in the analysis of musical

sounds. A pre-emphasis netmrk may be desirable i f the magnitudes

of important high frequency components are lm. The mighting net-

mrk can also be designed to be an approximation t o loudness might-

ing functions (Stevens, 1972; Zwicker & Feldtkeller, 1967). The

sampled f i l t e r outputs are stored in a ccarrputer for subsequent anal-

ysis and display.

- --- - - --

STL-QPSR 4 /I 980 71.

5.1 Sampled F i l t e r Methcd.

A set of 51 adjustable f i l t e r s was available for t h i s analysis.

The f i l t e r s e r e set to have a centre frequency every 100 Hz star t ing

£ram 100 Hz, the bandwidth was set to 125 H Z , smoothing f i l t e r time

constant 13 ms, sampling ra te 100 Hz. In addition, a pre-esrqhasis

ne-k was used (Fig. IV-C-2) that permitted the recording of great-

er amplitudes a t higher frequencies and also gave a weighting tha t was

an approximation to loudness weiqhting functions (Stevens, 1972;

Zwicker & Feldtkeller, 1967). I f required, unweighted f i l t e r respons-

es could easily be derived f r m which loudness values in sones could

be c q u t e d .

Figs. IV-C-3, IV-C-4, IV-C-5 show results obtained for a Gedackt

C4, Principal Cq, and Vox Humana C4 pipe, respectively (Sphga church

organ). In each case the weighted t o t a l level is given together w i t h

curves for scatre of the par t ia l tones (only the f i r s t six are shown

for the Principal and Vox Humana) . For loudness carputations the

f i r s t s ix par t ia ls are treated separately, a f t e r which the par t ia ls

are grouped since'mre than one of the upper par t ia ls w i l l f a l l into

each c r i t i c a l band.

One of the major effects produced by the s tar t ing transients of

sounds is to d r a w the attention of the l is tener t o the sound. It is

of interest therefore to plot the derivative of the sound level versus

t h e a s in Fig. IV-C-6, for the Gedackt C4. Fram inspection of such

graphs it is possible to ascertain which pa r t i a l tones are likely to

be dcaninant in gaining the attention of the ear. Such overall features

may be shown mre clearly by plotting only the dominant slopes in each

case, as shcwn i n Fig. IV-C-7, For instance, in the case of the Ge- I

dackt Cq pipe, the fundamental tone has its fas tes t r a te of change

between 0 and 11 ms, the f i f t h par t ia l betwen 11 and 18 ms, foil-

by the fundamental again fran 18 t o 42 ms.

For the Principal C4 pipe, the second par t i a l is d m h a n t be-

twen 0 and 23 ms, the 11th and 12th par t ia ls betwen 23 and 31 m s ,

the th i rd pa r t i a l betwen 31 and 48 ms, followed by the fundammtal.

C ..-.. ,.

. - : , *. ..a0 . - , ,.,I ' r F .T;;;.: - ... " * \ . . . . < , t F : ' " r , 5 . 'f. . . A

,u&c;' j f i $** L FREWENCY ( HZ 1

1 t' -* J,

Fig. IVX-2. (a) Pre-emphasis filter characteristic for sampled filter set.

. : i L - !b) mudness weighting function (Stevens, 1 972) , , 7-

: (c) Loudness weighting function (Zwicker , 1 967) . ' r- : * ->

* 1 . . F? i 50 1 1

GEDACKT C, +

. ,;.' a r . '-tx; 40 - )'-L , . . -> f :b (,f . , = - s f e i r

. \ F.: 5, ; -. 3, - , x ,.. : ..?. 3

, , Relative , . * ' Loudness I . -

. :; Level

\

\ '.A j 7

k . , . /--

- i t , $ j -.- . . . + TIME (rns ) . -. -- 1 .-.,A .. ,

- Fig. IV-C-3. Analysis of startina transient for a Gedackt Cd organ pipe. Partials 1-9.

TlME (ms)

,Fig. IVq-6. D e r i v a t i ~ of loudness level versus centre t i n e of 10 m s band for a Gedackt C4 organ pipe. The numbers indicate partial tones.

TIME (ms) r . .-,,-. : ' ,,-;y,: ..: -1 .-, . 7-,. . . . . .

Fig. IV-C-7. Derivative of loudness level versus centre time of 10 m s band for (a) Gedackt Cq, (b) Principal Cq, (c) Vox Humans Cq organ pipes. The par t ia l nuher is marked against each graph segmnt.

STL-QPSR 4/1980

In the case of the Vox Humana, the fundamental is daninant

£ram 0 to 17 ms, the third part ial from 17 t o 32 ms, the 11-13th

partials £ram 32 to 43 ms, etc. While such graphs do not constitute

the whole picture regarding the aural assessment of transients, they

are a t least useful in revealing the partials which have the great-

est rate of change of loudness a t each interval of time. It is in-

teresting to note the high rates of change involved, in saw cases

w e r 2000 dB/s.

The above analysis implies that there is an analog of the I

effect which is applicable t o rates of change of signal level.

It muld be of interest to conduct further psycho-acoustic tests re-

lating to this supposition.

5.2 Gating Experiments $

Effects of three kinds of gatings were tried. In the f i r s t one

modifications of the starting transients were investigated. The

transients were cut w i t h an exponential fas t gate to avoid click

sounds. The in i t i a l portions of tones were cut off i n steps froan 50

to 300 ms. The effects of the cuttings varied fran small t o con-

siderable (especially for the Vox Humana) but did i n no case make

the sound ccanpletely different. The overall effect is to change one , I

quality factor of the tone. -

In the second series the effect of cutting away the roam re-

verberation was tried. W i t h the previously used gating system, tones

w i t h and without reverberation were carpared. Lack of reverberation

makes the tone sound unnatural. In the third series t m tones were

added w i t h the start of one delayed relatively t o the second.* The

delays t r ied were 0 , i 25, i 50 ms. The effect of this delay was

fai r ly small even w i t h the 50 ms delays although the difference in

starting transients were perceived in these cases. A 50 ms time de-

lay corresponds t o 16 m difference in distance t o the pipes, i .e. a

fa i r ly large distance difference :seldom encountered by a listener.

5.3 Fourier Transform Method

Similar information to that provided by the sampled f i l t e r mthod

m y be obtained by digitising and storing the signal and then using a

*. . m e additions were i m p l m t e d by Ake Olofsson on the ccsnputer of the Department of Technical Audiology, Karolinska Institutet. 1

fas t Fourier transform t o cosnpute the spectrum of selected segmnts

of the signal. I f a sampling rate of 25 kHz is used, then each 10

I m s slice of the signal w i l l contain 256 points and a corresponding

I frequency spectrum is obtained w i t h a frequency resolution of 100 1 Hz. Problems that arise w i t h this mthd include the definition of

the start ing point of the analysis and the choice of a suitable - i winduw function. Sane werlap of time sl ices is desirable - a t least

50% is often used. This method needs further dwelopmnt before it

can cclmpete w i t h the sampled f i l t e r mthcd for convenience of opera-

, tion.

6 . Tri - s t imulus Method

A basic problem in devising a satisfactory system of measure-

ment for transient sounds is to relate the acoustic procedures t o

those used by the ear and brain. Unfortunately, there is l i t t l e

detailed information available concerning the processing of neural

, impulses af ter they leave the region of the cochlea. This situation

is in contrast w i t h optical research where there is considerable in-

. . formtion available concerning the coding of optical signals. Des-

: pi te the enomus n&r of receptors in the retina, colour vision

depends on only three different types of cone receptors. Scune of the

possible coding operations that occur betmen the retina and the vi-

sual cortex have been discussed by Land (1977). - : I F .. , Sarrre clues as t o possible auditory coding procedures cclme froan

psychoacci s t i c experiments relating to the timbre of musical sounds.

3, - Grey (1977) describes experimnts on a set of 16 music instrumnt

tones i n which perceptual similarities were determined for a l l pairs

of stimuli. Application of multidimensional analysis sh- that

three dimensions were sufficient to give a satisfactory representa-

tion of the data (see also Section 3). Plcanp and S-eken (1973)

describe an experiment in which the relation between timbre and

sound spectrum for selected steady sounds was studied a t various

pints in a diffuse sound field. A dissimilarity matrix was con-

structed which showed that the ten test stimuli could be represented

satisfactorily by a three-dimensional space. Thus present tentative

information, such as that provided by the above experiments, suggests

that the f inal assessment of auditory siqnals is in terms of a

-11 nunher of derived parameters. Whether the number of such

parameters can be as small as three w i l l depend on the outcame of

m r e extensive experimentation. On a related topic, Mller (1979)

finds that six nvtasurement parameters are needed t o obtain reason-

able correlation beheen objective and subjective descriptions of

high f ideli ty audio systems.

Time variations of amplitude and frequency are important in

the perception of sounds. Associated with these effects are me-

mory processes, both short-term and long-term. A t present there

is no satisfactory method for incorporating such processes into

acoustical measurement procedures.

A s a starting point for the develo-t of a simplified mew

for representing musical sounds, it was decided to use a 3-coordi-

nate methcd based on the spectral information derived f m the

sampled f i l t e r set. This method produces the level of each part ial

as a function of t i m e . The partials are groupd i f m r e than one

f a l l s within the sarrre cr i t i ca l band. A choice must nuw be made of

the three spectral regions that are t o represent the three coordi-

nates. It was decided to use the fun-ntal as one coordinate. I I

A choice must also be made of the unit in which to express the

f i l t e r outputs. Since a nmber of arithmetic operations are to be

made, it was decided to convert the f i l t e r outputs into loudness units

in sones. I f the unweighted output levels are used, it is convenient

to ccsnpute loudness values using Stevens (1972) Mark VII method.

When wei'*ted values were used, it was assumed that these values

were the equivalent of PLdB values f r m which sones were easily ob-

tained using the table given in Stevens ( 1 972) . A s a f i r s t approximation, the output of each band may be con-

sidered to be independent, so that the total. loudness of the sound

may be expressed as

. ?- - r " . I c . . . : . . I f an analogy is drawn w i t h the optical tri-stimulus method, each

texm on the right-hand side of Eq. (1) may be regarded as a tri-stim-

ulus value f r m which a se t of noml ised coordinates may be found:

" . . i a ., tsi; !. Frcm these coordinates, it is now only necessary t o select

- L : w . Ism in order t o d r a w a graph since x + y + z = 1. Fig. IV-C-8

shows a graph of x versus y with the significance of the main areas

, indicated. I

- : . I . < Fig. IV-C-9 shm a tri-stimulus graph for the behaviour of 3,

the start ing transients of the following organ pipe sounds: .-

I

1 (a) Gedackt 8 ' Cq, (b) Principal 8' C4, and (c) Vox Humans 8' Cq.

I -2

Tirrres measured f m the onset of the sound are shown beside each , i ) z:;

curve. The f inal pint in each case corresponds w i t h the steady <. 3 i sound. A nurmber of interesting features may be observed. It is

M i a t e l y apparent how the spectral character of the sound changes .; '1 3: . ]> .'

betmen cr~set and steady state. The Gedackt C4 has stronq high

part ials in i t i a l ly and then progresses t o a mre fun-tal type -*31 of tone. The Principal C4 starts with strong high partials and

a then progresses t o an evenly balanced sound. The Vox Humana Cq I starts with a predominance of lower part ials and then mes tawards

. a predminance of higher partials. I t 3 ; 4 ; ; - : > -

This methcd may be used to investigate the transient beha-

vioux of different pipes within the same rank of pipes. For in- I

stance, Fig. I17-C-10 shows noml ised tri-stimulus plots for three I

notes of the Gedackt 8' rank, while Fig. N-C-11 and 12 show plots

for the same three notes of the Principal 8' rank and Vox Humans 8'

rank, respectively. While the Gedackt notes show a similarity of

behaviour, the Principal aFti! YOx H u m n a notes show marked differ-

ences i n transient behaviour, although the steady state values are

A d

Fig. IV-C-8. Acoustic tri-stimulus diagram.

Fig. IV-C-9. Acoustic t r i - s t imlus diagram shaving transient behaviour of a Gedackt C4, Principal C4 and Vox Humana C4 organ pipe. The nunhers indicate I

t i m e in milliseconds a f t e r onset of sound. The larger circles correspond to the steady state. '

Gedackt 8'

F . 1 0 Tri-stimulus diagram s ~ u the transient behaviour of three Gedackt 8 organ pips.

0.5 1.0 x-

.>

'\

Fia. IV-C-11. Tri-stimulus diagram show in^ the transient behaviour of three

. Principal 8' organ pipes.

\ \

T -- - I '3 -.. -

0.5 1.0 x-

i . - 1 2. Tri-stimulus diagram showing the transient behaviour of three Vox H i m a n a 8 ' organ pipes.

. .

STL-QPSR 4/1980

in approximately the same area of the diagram. In listening t o the

recorded sounds, the Principal E4 pipe had a slowly developing funda-

mental while the Principal Gq pipe had a prominent high-frequency

'chiff ' i n the starting transient. The Vox Humans E4 and Q4 notes

have a similar starting sound whereas the C4 note is markedly dif-

ferent.

7 . Discussion

&thods are now being evolved for the analysis of ccanplex sounds,

such as the starting transients of musical sounds, that take account

of the known characteristics of the hearing process. Factors that

are relevant t o such a system of analysis include (a) an i n i t i a l fre-

quency analysis related to the c r i t i ca l bandwidths of the ear, (b)

a knowledge of the response times of the hearing mechanism, (c) mask-

ing effects, (dl the possible operation of the precedence effect, and

(e) m r y processes.

In our experiments w have shown that long-time-average-spectra

give a representation of the "steady-state" properties of tones f m

an instrument. W e have shown that the properties of starting tran-

sients are suitable to analysis in sampled parallel f i l t e r bands and

a new method "the tri-stimulus method" has been introduced as a I

simple way t o s m i s e these propr t ies .

Not a l l of the factors (a) t o (e) above are sufficiently well

defined t o be incorporated in a system of masurement (see Appendices

A and B ) . For instance, there is a vagueness in the literature con-

cerning the minimum time required t o register a change in loudness

or of timbre. A knowledge of such times muld be important in defin-

ing the frequency of sampling and averaging of time-dependent tran-

sients. There have been few a t t e r s t o incorporate memory functions

into analysis procedures despite their importance in subjective eva-

luation of transient sounds.

Detailed information concerning auditory coding of signals is

not yet available. Eventually th i s type of information could lead

to marked changes in analysis procedures. A satisfactory method of

analysis should be applicable to sequences of musical notes since the

I analysis of isolated notes is of little interest outside the labo-

ratory. A critical test of any analysis procedm is to use the

analysis data as the basis for canparison by synthesis (c.f. Appen-

! . . Z i , , ' > ; ;,:,y,:

! . . " ' , j , > : ;; ?.; 7 < ~. . . ,- ...-..,, . L . . ! ' 8 ., ' : I",". ., ,<..: ..a ':Y%Il f !-T.k3:33ri y::; ?I-)

APPENDIX A

METHODS OF ANALYSIS FOR TRANSIENT AND STEADY STATES

Available methcds of analysis are nearly a l l forms of frequen-

cy analysis. L i t t l e progress has been made with analysis procedures

in the time damah.

* A. 1 . Modified Fourier Series ~ ~ 1 o d

The Fourier series methcd a s s m s that the signal is periodic

and infini te in extent. One period of the signal may then be digi-

tised and a line spectrum c q u t e d consisting of hanmnics of the as-

sumed period. I f the signal is mildly aperiodic or is varying slawly

with time, as in a slowly growing or decaying transient, th i s method

rnay be approximated by assuming that each period of the transient may

be isolated and treated as i f it were part of an infini te train of

such periods.

Thus, it is assumed that the sound pressure as a function of

time can be represented by I , I

where &(t) is the amplitude of the rmth part ial a t tim t,

em(t) is the phase angle of the m t h part ial a t time t.

w 1 is the fundamntal frequency

M is the order of the highest part ial q t e d

The values of amplitude and phase so computed are assumed t o

exist a t the centre point of the chosen period. Sources of error

include the definition of the period boundaries, the assmption that

the time function is quasi-periodic and limitations of numerical in-

tegration. The method does not detect the presence of inhanmnic

tones or give a measure of backgrad noise.

This methcd has been used by Keeler (1 972a, b) for the deter-

mination of the growth in amplitude of the part ial tones of organ

pipes. As pointed out by Moorer (1975), when this method is used

For extracting phase and frequency information, the values may vary

APPENDIX A

L l ' ,

dismntinuously £ran one period to another, which gives rise to

problems i f the data is t o be used la ter for synthesis. In addi-

tion, Keeler made use of Sinrpson's rule rather than use direct sum-

mation which could introduce aliasing problems for sounds having

greater harrnonic developwnt than organ pipes.

A.2. Discrete Fourier Transform Method F r , a I Fr- (1967) used the follcwing function to represent a

musical note:

i t 2i, The function \(t) is a simple cascading of exponential attack functions

and W,(t) represents a frequency which changes by a discrete step wh,

at time t h,. represents q l i t u d e , t, onset time, a an at- . -? tack t i n e constant, @ , i n i t i a l phase angle of the m t h w n e n t , and

u ( t - t ,) is a step function dmse value changes £ran zero t o one a t

5:. In order to evaluate the parameters, tsm transforms are used. The

- > '. 1 .a D transform, defined by - 1

;" !.-, F , t

STL-QPSR 4/1980

Appendix A

{ } : - I

X(n) = A (n) sin nT [mu + 2nFm(n)] m (A71 I i . m= 1 * < , I ', -, ;

I where X(n) is the signal a t time nT, n is a tinre index, T is the time

between successive samples, w is the fundamental radian frequency of

the note, m is the partial number, Am(n) is the amplitude of partial

m a t time nT, M is the number of partials, and F,(n) is the frequency

I deviation of partial m a t t h nT. I i .

For the method t o function properly, it is necessary to estimate

the fundamental frequency of the waveform within a 2% deviation.

, Each hammnic is extracted in turn, and plots made of amplitude and

. r frequency as functions of tinre. The method is thus equivalent to a

- - - ' bank of f i l terp , -, T -- .*. d - \ -....-< - s

A.5 Sampled Analog Filter Wthod I

I Steady signals are frequently analysed with the aid of a set of

k t : : analog f i l t e r s (either 1/3 octave bandwidth or narrow band) whose

il- i 8 s outputs are scanned in series or in parallel. . i * ,J

1 , . For the analysis of transient signals, as in speech and music,

L A L * .

a parallel bank of f i l te rs is required whose outputs can be sampled -

c. .. - -Ti",:, a t short time intervals, a t least every 10 m s . Preferably, the band-

. *- .- .I:#- 2 width of the f i l t e r s should be variable. For certain analyses a pre-

+ -. '.* -' . emphasis netmrk may be required i f the mgitudes of important high - 3 ): f -

frequency components are low. I

A mdified form of this method is to record the output of a set ,.- , of f i l t e r s in turn on a transient waveform recorder. Although this

. -=A' procedure is slow, it is feasible to record a t least 256 sample points

I . 3,

for each 10 m s of the waveform. Subsequent signal processing is per- 'nc

fo- by an interfaced ccarrputer.

A. 6 Iong-Time-Average-Spectra (LTAS)

Jansson and Sundberg (1975) and Jansson (1976) describe a tech-

nique for deriving an average spectrum of sounds or sequences of 1

. . sounds that have a total duration of approximately 20 s. A bank of

i 6

23 f i l t e r s is used having bandwidths similar to those of the cr i t ical

i bapds of the ear. - - . -. ,. ..c. I

Appendix A

The output fram each filter is rectified, smoothed with a time

constant set to 13 ms, and converted to a logarithmic scale. The

logarithmic output £ram each filter is sampled at a frequency of 160

Hz, which means that several sampling points are obtained even of

tones as short as 50 ms duration. Thereafter, the sampled output

measures are quantized in steps of 1 dB, transfont& into p e r

measures, and added in storage cells in a cmputer, one cell for each

filter. Finally, the contents of every cell are normalized with r e s

pect to the analysing time, transformed into dB measures, and plotted

as function of frequency giving an LTAS. Thus an LTAS provides a

record of sound-poier band-level as function of frequmcy. The pro-

cedure allaws the results to be displayed w i t h essentially no delay

after the sampling is caplet&.

As a consequence of the procedure just described, the LTAS is

depndent on the fun-tal frequencies, the spectrum envelopes, and

the intensity levels contained in the sound analyzed. It is note-

mrthy, that tones of high sound pressure level will contribute mre

than tones of law, as the LTAS averages the sound-pressure amplitude

M e d times the duration of each frequency cmpnent.

The ccmputer program allows canparison to be made of any of the

stored LTAS records. For instance, a dissimilarity measure may be

ccanputed between t m given LTAS :es .

STL-QPSR 4/1980

APPENDIX B 4 - \

I .. ' ., , PREVIOUS ANALYSES OF TRANSIENTS A , - a . . I \.' \ There are relatively few publications available with detailed

analyses of musical notes. With the advert of d igi ta lmthods of . J'. *.?: analysis an increasing n&r of papers are appearing. In the fol-

lowing notes, investigations have been grouped according to tradi- , f i f t ional classes of instrument.

. ., . d l ; 1, b. ,Y !i ' t

I B. 1 Pipe Organ

-jr b a One of the ear l ies t investigations was that of Trendelenburg

'

(1 936) who masured the starting transients of a nwrber of organ I pipes by means of oscillograms of the build-up of sound in successive

one-octave bands. Nolle and Boner (1941) also examined starting tran-

-- i is ients, without using f i l t e r s , by photographing an oscilloscope dis-

play with a m i e camera. Deductions were made £ram the visual ap-

pearance of the oscillograms. For instance, it was observed that ,

with open diapason pipes, the second partial was in i t i a l ly dcaninant

during the starting transient. With bourdon pipes, the f i f t h par t ia l

daninated in i t ia l ly . Reed pipes start p q t l y with the fundamntal

prominent fram the outset. They also found that the time required

" \ & - , to reach steady speech required approximately the same nmber of .Z u

I cycles regardless of the pitch of the pipe. They confirrcred Trende-

lenburg's observation that inharmonic part ials may be present durinu

the starting transient state. I

Richardson (1 954) recognised the irrcportance of the starting tran-

sient with respect to i n s w t tone and conducted an investigation

of the i n i t i a l build-up of sound in organ pipes. The microphone out-

put was displayed on an oscilloscope, together with a timing signal,

which was *tographed by a w i n g film camera. A l s o photographed

was a record of pressure changes in the pipe foot. Portions of the

film, a t approximately 10 m s intervals, =re enlarged and subjected

to harmonic analysis (amplitude and relative phase). I

Caddy and Pollard (1957) investigated the effect of the playing

action on the starting transient of a principal pipe. The factors

controlling the starting transient and steady state *re delineated.

STL-QPSR 4/1980 89.

Sundberg (1966) made a thorough study of the physical behaviour

of organ pipes including an examination of starting transients. Oscil-

lograms are shown for principal and f lu te pipes in which the qrowth

of the fun-tal and of the upper part ials have been separated by

filtering.

Keeler (1979b) applied a mdified Fourier series method to the

cmputer analysis of growth curves for a nLrmber of f lute, principal

and reed pipes. W i t h t h i s method there is the need to identify each

cycle i n the sound emitted by the pipe, which is dif f icul t to do in

the i n i t i a l stages of the transient. In his published curves Keeler

shows plots of pressure amplitude versus time. When these are con-

verted into loudness plots, the relative roles of some of the partials

are altered. Fig. IV-C-B.l shows loudness plots for the f i r s t few

partials of a gedackt, principal and trumrpet pipe.

Fletcher (1976) has developed a theory for the onset of sound in

an or9an pipe which involves non-linear couplin9 between the a i r je t

system and the set of pipe n o m l modes. A sensitive factor i n devel-

oping a solution is the rate of r i se of pressure in the pipe foot.

In m y cases, the second part ial of the pipe is found to develop

mre rapidly than the fundamental. The frequencies of the upper par-

t i a l s are not in i t ia l ly harmonic being p r t l y determined by the reso-

nances of the pipe alone. In the steady s ta te a l l m d e s becm locked

into a hamronic relationship.

B.2 Piano and Harpsichord

Fletcher e t a1 (1962) examined the decay of sound in a grand

piano using a wave analyser t o isolate the part ial tones. The rapid

attack precluded any detailed analysis of the starting transients.

Weyer (1976, 1976/77) made a thoroug!~ examination of both the

time and frequency behaviour of the starting transients and decay of

various noted fram harpsichords, upright and grand pianos.

Alf redson and Steinke ( 1 978) used the Fourier transform mthod

to examine decay of a whole piano note but did not extend the

mthod to examine temporal dependence.

STL-QPSR 4/1980

Appendix B

B.3 Strings

Beauchamp (1974) describes the analysis of violin tones that

were recorded in an anechoic room and analysed by computer using a

d i f i e d Fourier series technique. The analysis provides fundament-

a l frequency, hanmnic amplitudes and h m n i c phases as a function

of time throughout the duration of each sound.

Grey (1975) describes the application of the heterodyne f i l t e r

method t o analyse variations in amplitude and frequency of each part ial

tone of a musical sound as a function of time. Examples are given for

a number of different sounds including that of a cello note. Fran the

original analyses, straight line segnaents are derived to show in as

simple a m e r as p s s i b l e the time history of each part ial tone. It

is found that each of the time functions can be satisfactorily re-

presented by betmen 4 and 8 line segments, as shown in Fiq. IV-C-B.2.

This technique significantly reduces the m u n t of data storaqe neces-

sary for la ter use i n synthesis of sounds.

B.4 Woodwinds

Grey (1975) also gives m examples of modwind tones reduced

to line segmnts. Fig. IT7<-R.3 shows notes of the sanae pitch fran

a clarinet, f lute and oboe.

Using the method described in section A.3 of t h i s report, Bariaux

e t a1 (1975) give an application of their method to measmement of the

time histories of the f i r s t and third part ials of a clarinet tone.

. B.5 Brass

Rissett and Mathews (1969) used a d i f i e d Fourier series mthcd

to analyse trumpet tones. Since they were interested in camparing

the original with synthesised tones, line segrmt functions were de-

rived for the time histories of each part ial tone. Factors arising

from the analysis that =re found t o be important were :

( 1 ) the spectrum envelope,

(2) the attack transient which las ts for approximately 20 m s wit!! faster build-up of lower order harmonics than higher-order ones.

(3) a high-rate, quasi-random frequency fluctuation.

It was also found that low-order h m n i c s have a longer decay.

( c l t

Fig. n"C-8.2. Heterodyne f i l t e r analysis of a Amplitude * ?J, ce l l ono t e ( ~ 2 , frequency 314 Hz). , !

d !. - * - Each par t i a l curve has been ap-

proximated by straight l ine seg- Time -, m t s (Grey, 1975) .

1 J r! k 0 .

l l

Frequency ' , , ,

Fig. IV+-E3.3. Heterodyne f i l t e r analysis of notes £ran (a) an aboe, (b) a f lu te , (c) a clar inet . Each note has the same pitch,

- (Grey, 1975) .

B.6 Percussion

There are few examples of the study of prcussion instruments,

probably because of the ccanplexity of the spectra which are often

mre noise-like than tone-like. Moorer (1977) aives an example of

the overall analysis of a bass drum note. Fletcher and Bassett (1 978) ' describe the analysis of sounds frcnn the bass drum using computer-

modelled band-pass f i l t e r s with 3 Hz bandwidths. The sounds were

analysed in 1 Hz steps f m 30-1000 Hz and frequency as a function

of time, peak sound pressure level and decay rate determined for each

major cmponent. -I *,

STL-QPSR 4/1980

Pressure - 3 __L___----------- -

TIME ( m s l r .

-.? . . i t

- ' . Fig. IVX-C.1. Line segment representation . .

of s tar t ing transient of a Gedackt C4 organ pipe. . .

Jansson, E.V. & Sundberg, J. (1975) : "Long-Time-Average-Spectra applied to analysis of music. Part I: Methcd and general applica- tions", Acustica - 34, pp. 15-19.

Keeler. J.S. (1972a): "Piecewise-periodic analysis of ahnst-periodic . . , . sounds & musical transients", IEEE Trans. Audio & Electroacoustics

AU-20, PE>. 338-344. , * - ) a - 7 1 , ..$ . A I 1 ~ 1 , ! >, , .

. ,. Keeler. J.S. (1972b): "The attack transients of safe organ pipes", IEEE Trans. Audio & ElectrQacoustics AU-20, pp. 3787,3$1. Land, E.H. (1977): "The retinex theory of color vision", Sci.Amer. 237, No. 6, pp. 108-128. -- Land, E.H. & McCann, J.J. (1971): "Lightness and retinex theory", J. @t. I30c.Am. 61, pp. 1-11. - - # " . ", -,

Lindbl~~t~, B. & Sundberg, J. (1970): "Towards a generative theory of melody", Swed. J. of Musicology 52, pp. 71-88. -

. . Luce, D. & Clark. M. (1965) : "Durations of attack transients on non- percussive orchestral instruments", J. Aaio Ehg. Soc. 13, pp. 194-199. -- e r r H. (1979): "Multidimnsional audio: Parts I and 2", J. Audio Ehg. Soc. 27, pp. 386-393; 27, pp. 496-502. - -- Mrer, J.A. (I 973) : "The heteroch'rtle m e t h d . of analysis of transient wavefom", Artificial Intelligence Lab., Stanford University.

Mrer, J.A. (1975): "On the secpentation and analysis of continuous musical sound by digital computer", Report No. STAN-M-3, Dept. of Music, Stanford University.

Mrer, J.A. (1977): "Signalprocessingaspctsof ccanputermusic: A SUTE~", Proc. IEEE 65, pp. 1108-1137. L .Ir j * u.t. I. -

' - Nolle, A.W. & Boner, C.F. (1941) : "The initial transients of organ pipes", J.Acoust.Soc.Am. 13, pp. 149-155. - Plcanp, R. (1976): Aspects of Tone Sensation, Academic Press, London, pp. 111-142.

Plomp, R. & Steeneken. H.J.M. (1973): "Place dependence of timbre in reverberant sound fields", Acustica 28, pp. 50-59. Rash, R.A. (1978): "The perception of si.multaneous notes such as in polyphonic music", Acustica 40, pp. 21-33. - Rash, R.A. ( 1 979) : "Synchronization in performed ensemble music", Acustica 43, pp. 121-131. - Richardson, E.G. (1 954) : "The transient tones of wind instruments" , J.Acoust.Soc.Am. 26, pp. 960-962. - Rissett, J.C. & Mathews, M.V. (1969) : "Analysis of musical instrurraent tones", Physics Today 22, pp. 23-30. -

Roedierer, J.G. (1975) : Introduction to the Physics and Psychophysics of Music, Springer-Verlag , New York, 2nd edition. ,, , Russel, I.J. & Sellick, P. (1978): "Intracellular studies of hair ,

, .- cells in the m l i a n cochlea", J. Physiol. 284, pp. 261-290. - ' Stevens, S.S. (1972): "Perceived level of noise by Mxrk VII ard

decibels (E) ", J.Acoust.Soc.Am. 51, pp. 575-601. - Sundberg , J. ( 1 96.6) : Mensurens betydelse i 6ppm labialpipor, Almqvist & Wiksell, Uppsala.

' Swdberg, J. (1 978) : "Synthesis of singing", Swed.J. of Musicology 60, - pp. 107-112.

"-' Terhardt , E . ( 1 974) : "On the wrception of periodic sound fluctuations (roughness)", Acustica 30, pp. 202-213. - Trendelenburg, F. (1 936) : "Beginning of sound in organ pipes", Z . f . tech.Physik 16, p. 513. - i 3

Wedin, L. ( 1 970) : "Dimensionsanalys av perception av instrumentklancf"' Inst. of Ps~c~o~o~Y, Stockholm University.

Weyer, R-D. (1976): "Time-frequency-structures in the attack transients of piano and harpsichord sounds - I", Acustica 35, pp. 232-252. - I

, . Weyer , R-D . ( 1 976/77) : " Time-varying amplitude-frequency-structures in the attack transients of piano and harpsichord sounds - 11", Acustica 36, pp. 241-258. - Zwicker, E. (1960): "Ein Verfahren zur Berechnung der LautslSrke", Acustica 10, pp. 304-308. - Zwicker , E. (1 977) : "Procedures for calculating loudness of temporally variable sounds", J.Acoust.Soc.Am. 62, pp. 675-682. - Zwicker, E. & Feldtkeller, R. (1967): Das Ohr als Nachrichtenenpf3ngerI S.Hirzel Verlag, Stuttcmk, 2nd ed.

Analysis and assessment of musical sounds · C. ANALYSIS AND ASSESSMENT OF MUSICAL SOUNDS H.F....

Documents

Transcript of Analysis and assessment of musical sounds · C. ANALYSIS AND ASSESSMENT OF MUSICAL SOUNDS H.F....