Vibrato and vowel identification - Royal Institute of ... STL-QPSR 2-3/1975 Introduction Vibrato

download Vibrato and vowel identification - Royal Institute of ... STL-QPSR 2-3/1975 Introduction Vibrato

of 18

  • date post

    19-Aug-2018
  • Category

    Documents

  • view

    215
  • download

    0

Embed Size (px)

Transcript of Vibrato and vowel identification - Royal Institute of ... STL-QPSR 2-3/1975 Introduction Vibrato

  • Dept. for Speech, Music and Hearing

    Quarterly Progress andStatus Report

    Vibrato and vowelidentification

    Sundberg, J.

    journal: STL-QPSRvolume: 16number: 2-3year: 1975pages: 049-060

    http://www.speech.kth.se/qpsr

    http://www.speech.kth.sehttp://www.speech.kth.se/qpsr

  • STL-QPSR 2-3/1975

    111. MUSICAL ACOUSTICS

    A. VIBRATO AND VOWEL IDENTIFICATION

    J. Sundberg

    Abstract

    The influence of vibrato on vowel identification i s studied in synthesized vowel sounds with fundamental frequencies between 300 and 1000 Hz. Phonetically trained subjects were asked to identify these stimuli presented with and without vibrato a s any of 12 Swedish long vowels. Small and occasional effects a r e ob- served on the mean formant frequencies and the scatter between the responses.

    However, the effects a r e greater and more frequent$haai would be expected by chance. In a majority of the cases, where the vibrato affected the responses, the vowel identification became somewhat harder when the stimulus was presented with vibrato.

  • STL-QPSR 2-3/1975

    Introduction

    Vibrato occurs in music performed on many types of instruments,

    such a s s t r ing and wind instruments and i n singing. Physically it cor - responds to an almost sinusoidal modulatio~t of the fundamental frequency.

    Provided that the r a t e and extent of the vibrato i s kept within certain l im-

    i t s , the vibrato i s generally considered a useful means of musical ex-

    pression in a variety of musical styles. I

    Acoustic character is t ics typical of music sounds can be assumed to

    serve some purpose in musical communication. One commonly heard

    hypothesis i s that the vibrato covers up small e r r o r s in the fundamental

    frequency. In a previous investigation no support was found for this hypo-

    thesis a s fa r a s the pitch of single tones - and thus not chords - i s con- cerned (Sundberg 197 2). In the present investigation another commonly

    heard hypothesis will be tested, namely that the vibrato facil i tates vowel

    identification. This hypothesis seems plausible, because as the part ia ls

    move in frequency (e . g. owing to a vibrato) their amplitudes scan small

    p a r t s of the vocal-tract t ransfer function, and m o r e information on the , formant frequencies i s communicated. The effect would be particularly

    strong in the soprano range where the fundamental frequency may be even

    higher than 1000 Hz and the information on the formant frequencies would

    be ha rd to decode f r o m the acoustic spectrum.

    Experiment

    A se r i e s of high-pitched sounds were synthesized with and without

    vibrato using a terminal analogue. Six formant-frequency combinations

    were used, each of which corresponds to a Swedish long vowel: [u , o,

    a , e , i , y ] . The vowel formant frequencies ( s e e Table 111-A-I) were typical of soprano singing according to previous measurements on

    a singer singing a t a fundamental of 262 Hz (Sundberg 1975). Contrary to

    what was found in this soprano, each of these s ix combinations of formant

    frequencies were used without changes for four different pitches within

    the soprano range: 300, 450, 675, and 1000 Hz. In this way the mater ia l

    included sounds ranging f rom maximum ease to maximum difficulty a s

    regards vowel identification. The vibrato consisted of a sinusoidal modu-

    lation of the fundamental frequency. The r a t e was six ondulations per

    second, and the extent was 50 cents. These values a r e found in normal

    singing, even though the extent i s normally slightly narrower.

  • Table 111-A-I.

    F o r m a n t f requency va lues used for syn thes i s of s t imul i ( lef t)

    and a s c r i b e d t o r e s p o n s e vowels ( r ight ) .

    I STIMULI RESPONSE VOWELS Vowel

    u

    o

    a a

    a3

    E

    e

    1

    Y u

    d ce

    1 Hz

    325

    430

    640

    400

    270

    250

    =2 H z

    700

    650

    1090

    2000

    1900

    1950

    F3 H z

    F4 H z

    = 1 H z

    I

    2700 1 4000 320 415

    700

    855

    815

    625

    375

    275

    275

    300

    390

    565

    2820

    2800

    2700

    1950

    2510

    F2 H z

    4000

    4000

    4000

    4000

    4000

    Hz F3

    565

    725

    1065

    1200

    2030

    2305

    2790

    2675

    2555

    1920

    2040

    1290

    2775

    2775

    3000

    2820

    3000

    3040

    3360

    3655

    3210

    2745

    2725

    2690 J

  • STL-QFSR 2-3/1975 52.

    Four samples of each vowel stimulus were tape recorded a t a constant

    overall amplitude. The t ime and the shape of the onsets and decays were

    a l l s imilar . The stimuli samples were arranged in random order avoid-

    ing m o r e than one repetition of a given stimulus in sequence. The ent i re

    tape contained a total of 192 vowel samples: six formant-frequency com-

    binations x four fundamental frequencies x four presentations x two cases

    (with and without vibrato). Each stimulus had a duration of I sec and was

    followed by a silent interval of 4 sec.

    The stimuli were presented in head-phones (Sennheiser HD 414) a t an

    overall SPL of 70 dB, approximately, to ten phonetically trained subjects.

    Their task was to decide which one of 12 given Swedish long vowels [ u, o, a, a, ;e, E, e, i , y, u, 4 , c e ] sounded most s imilar to the stimulus they heard and to write that vowel in phonetic symbols on a tes t chart. These

    vowels suggested by the subjects a s interpretations of the stimuli will be

    r e fe r red to a s response vowels. The test , including three short breaks,

    took, i n all , about 30 min.

    Evaluation of r e snons e s

    I t seems reasonable to hypothesize that the process of identifying an I

    isolated stimulus a s a specific vowel involves three major steps. The

    f i r s t is a purely sensory one in which the acoustic stimulus i s converted

    into some kind of sensory information. The second step involves an ex-

    traction of an ensemble of sensory character is t ics f rom this sensory

    information. In the third step a comparison i s made between these sen-

    sory character is t ics and corresponding character is t ics of vowels s tored

    in the subject ' s internal reference. The difficulty of identifying a sound

    a s a specific vowel would depend on the degree of s imilar i ty between the

    ensemble of sensory character is t ics and the character is t ics in the internal

    reference.

    The vibrato may affect the sensory information in various ways. Let

    us assume that i t affects the extraction of sensory character is t ics . If so,

    the responses for a given vibrato stimulus should differ f rom those ob-

    tained when the same stimulus was presented without vibrato. In the tes t

    responses the effect would then be that the mean formant frequencies of

    the response vowels for a given stimulus would differ between the cases

    with and without vibrato. In that case the identification difficulties may

  • STL-QPSR 2- 3/1975 53.

    be affected i n various ways, depending on whether the vibrato makes the

    sensory character is t ics (1) more s imi lar , (2) l e s s s imi lar , o r (3) equally

    s imilar to the subject' s reference features. The cases (1) and (2) would

    affect the scatter of the responses.

    The above considerations show the need for measures of the average and

    scat ter of the response vowels. In order to obtain such measures a set of

    formant frequencies was ascr ibed to each of the 12 response vowels in ac-

    cordance with data on female speakers , given by Fant (197 3). These values

    a r e l is ted in Table 111-A-I. I t should be noted that the same formant f re -

    quencies were ascr ibed to a given response vowel regard less of the funda-

    mental fr equency of the corresponding stimulus. The formant frequencies

    were expressed in .Mels and the mean formant frequencies of the response

    vowels were computed for each stimulus.

    Resul ts

    The responses a r e l is ted in Table 111-A-11. One of the subjects refused

    to give an answer to three of the stimuli, a l l of which happened to lack vib-

    rato. The inter- subject differences rkgarding the internal reference were

    ra ther small: three o r four unique votes on a specific response vowel were

    given by the same subject only occasionally, a s can be seen in Table 111-A-II.

    The mean formant frequencies of the response vowels a r e shown in Fig.

    111-A- 1 and the corresponding standard deviations a r e given in Table 111-A-111.

    The relationships between the mean formant frequencies of the r espon-

    s e s and the stimulus spectra can be studied in Fig. 111-A-1, where, in

    par t icular , the case of the stimuli synthesized with [I?]-formants offer an

    i l lustrative example (Fig. 111-A- lc). The mean formant frequencies of the

    responses drop with r is ing fundamental frequency up to 675 Hz. The r ea -

    son for this is probably that, in o rde r to retain the vowel quality when the

    fundamental is ra i sed , i t i s neces