
Dept. for Speech, Music and Hearing

Quarterly Progress and Status Report

The formant patterns of fricative consonants

Jassem, W.

journal: STL-QPSR
volume: 3
number: 3
year: 1962
pages: 006-015

http://www.speech.kth.se/qpsr

http://www.speech.kth.se


B. THE FORMANT PATTERNS OF FRICATIVE CONSONANTS

The relations between the production of fricative consonants and the resulting acoustic wave have been theoretically treated by G. Fant (1), W. Meyer-Eppler (9) and J.M. Heinz and K.N. Stevens (4). G.W. Hughes and M. Halle have analyzed American English fricatives and considered the general distribution of energy in their spectra so as to arrive at a system of identification in terms of binary features (5). Perceptual experiments performed by K.S. Harris and J. Mártony (7, 8) have shown that whilst some fricatives are mainly recognized by the spectrum of the fricative segment (postdental, alveolar and palatal types), other cues, viz. overall level, the aspirative segment and vowel transitions, are important for the identification of other types, cf. also *). Independently of a linguistic context the noise spectrum of fricatives has been analyzed by P. Strevens (10) from the point of view of the low and high limits of the noise along the frequency scale, formant-like structure and overall level.

Apart from the intrinsic interest for the acoustic theory of speech production, there are practical objects in the measurement of F-patterns of fricative consonants: some compressed speech transmission systems depend on formant-tracking, and more information than has hitherto been available is needed on the frequencies and relative levels of those peaks in the spectrum of fricatives which, when considered as time-varying parameters, correspond to vowel formants. Data on higher-frequency peaks may be important for the synthesis of fricatives with pole and zero circuits, cf. (4).

One native speaker has been chosen for each of the three languages - Stockholm Swedish, American English and Non-Regional Polish. Their speech has informally been judged as typical. The materials consist primarily of CVC and CV syllables in sequences which are phonologically natural in each language. Artificial sequences have been avoided. American English /ʒ/ has exceptionally been treated in VC syllables because this phoneme does not occur initially in this language. Swedish /?/ and /x/ occur in CV syllables only since they are not used in final positions, and all Polish voiced fricatives, for similar reasons of linguistic constraint, only appear in CV syllables. The respective varieties of a fully open ([a]-like), a front-close ([i]-like) and a back-close ([u]-like) vowel have been used in each language, these being acoustically and articulatorily the "extreme" vocalic syllabics. In the case of Polish /;/ an additional, fourth vowel /i/ has been used because an initial //i/ only occurs in not fully assimilated foreign words. Three series of recordings were made, at intervals of approximately one month. Within one language, the three series were not identical (the first one consisted of CV syllables only). Each fricative phoneme in the materials is thus represented by between 3 (as is American English /iʒ aʒ uʒ/) and 8 (in Polish / j i j j a j , ju; , /) positional variants, and each variant is represented by 3 or 2 specimens. The phonemic status of Swedish /x/, which is mostly a labio-velar fricative [ 5 ], is not entirely clear. For reasons which would lie outside the scope of this article, the usual assumption that [ 5 ] is a variant of the /// phoneme may, in the author's opinion, be doubted. Our Swedish speaker ordinarily uses [ 6 ] in word-initial positions, but apparently pronounces the other sound, which is an alveolar, slightly retroflex [ $ ], quite naturally as an alternative.

The materials were analyzed by means of a spectrum section analyzer called RASSLAN, which was developed in the Speech Transmission Laboratory, Royal Institute of Technology, Stockholm, and described by J. Liljencrants (6). The frequency range covered for this analysis is 0-9 kcps x). The spacing between the center frequencies of the analyzing filters of RASSLAN was 50 cps. Most analyses were performed with two bandwidths of the filters, viz. 125 and 250 cps. The effective integration time τ was 80 msec - a low-pass smoothing filter of 5 cps bandwidth being used. The integration circuitry was triggered 10-20 msec before the end of the fricative segment so that the maximum of the memory curve occurred at an instant of time at which, on the whole, the distribution of energy in frequency tended to be relatively steady. Conventional sonagrams were made of the entire material. Together with ink-written oscillograms these were used to determine the boundaries of phonic segments. The sonagrams were also helpful in deciding which of the peaks in the fricative spectra were to be regarded as F2, F3 and F4 xx) in some cases in which the appearance of additional peaks made the decision difficult. Apart from Swedish and Polish /x/, F2, F3 and F4 could be found in the fricative spectra in approximately 95 % of the cases x). Above these formants, further peaks could be seen in the spectra along the frequency scale, some of which were regular in the sense that they tended to cluster round definite frequencies in at least a majority of the specimens of a given phoneme. In some cases there was a peak below F2, especially in [f]-sounds. The Swedish and Polish /x/ phonemes are different from the rest in that there F1 and F2 appear quite regularly, together with F3 for the Polish /x/, whilst the other formants are so low in level that they rarely appear in our analysis, which covered a range of approximately 35 dB.

x) Owing to a carrier current leak in the heterodyning system the analysis of frequencies up to approximately 300 cps is difficult in unvoiced sounds, and in the present investigation this low frequency range has been disregarded. In voiced fricatives, on the other hand, strong harmonic components below approximately 900 cps sometimes make it difficult to obtain useful data on higher frequencies within the available dynamic range. Most of the voiced fricatives have therefore been high-pass filtered with a very sharp cutoff at 275, 500 or 750 cps. Unfiltered spectra of voiced fricatives show a gradual drop-off of the spectral envelope between the first harmonic and approximately 900 cps which may be up to 20 dB. The higher values have been found in [ð] and [v].
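The selection of peaks from a spectrum section, as described above, can be illustrated with a minimal Python sketch. It assumes a section sampled at the 50 cps filter spacing over 0-9 kcps and keeps only local maxima lying within the approximately 35 dB range covered by the analysis. The function name `pick_peaks`, the strict local-maximum test and the synthetic level values are illustrative assumptions, not part of the original procedure, which relied on the analyzer output and on sonagrams.

```python
def pick_peaks(freqs_cps, levels_db, dynamic_range_db=35.0):
    """Return (frequency, level) pairs for local maxima that lie
    within dynamic_range_db of the strongest level in the section."""
    top = max(levels_db)
    floor = top - dynamic_range_db
    peaks = []
    for i in range(1, len(levels_db) - 1):
        # A point counts as a peak if it rises above its left neighbour
        # and does not fall below its right neighbour.
        is_local_max = levels_db[i - 1] < levels_db[i] >= levels_db[i + 1]
        if is_local_max and levels_db[i] >= floor:
            peaks.append((freqs_cps[i], levels_db[i]))
    return peaks


# Center frequencies spaced 50 cps apart over the 0-9 kcps range.
centers = list(range(0, 9001, 50))

# Synthetic section (hypothetical values): a flat -60 dB floor with three bumps.
levels = [-60.0] * len(centers)
for freq, level in [(1500, -10.0), (2450, 0.0), (3650, -40.0)]:
    levels[centers.index(freq)] = level

peaks = pick_peaks(centers, levels)
print(len(centers))  # number of filter positions
print(peaks)         # the -40 dB bump falls outside the 35 dB range
```

In this sketch the bump at 3650 cps is discarded because it lies 40 dB below the strongest peak, which mirrors the remark that formants very low in level rarely appear in the analysis.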

Fig. I-2 shows the frequencies of the formants and the regular higher-frequency peaks. Each column includes data from all the specimens of a given phoneme. The black areas refer to F2, F3 and F4 in all phonemes except Swedish and

x) In some cases a zero of the vocal tract transfer function may have cancelled out the actual F4 and what was measured as F4 may not then correspond to an F4 pole of the transfer function. Spectrographic evidence was sometimes inconclusive in locating an F4 pole.

xx) According to Fant:

Formant number N is labelled FN
The frequency of formant N is labelled F_N
The level of formant N is labelled L_N
N = 1, 2, 3, 4

to the relation between L3 and L2. Thus in sounds of the (a-A) type F4 is higher in level than F2. This is the case in [s, z]. In type (a-B) F4 is lower in level than F2. Here belong [f, v; θ, ð] and also [x]. All (b) type sounds belong to group (A). Thus F3 is higher in level than F2 in [ , , 4 ; , , j ]. All the specimens in the materials which contained the relevant information in the spectrum were examined as to the applicability of the above rules. In the results which follow the per cent of specimens of each phoneme is given which are in agreement with the above rules.

Rule I

F4 - F2 ≥ 1.8 kcps : [ f v θ ð s z ]
F4 - F2 < 1.8 kcps : [ 3 - 1 7 j ]

Swedish

phoneme: f v s 1 - 4 0 3
% 'correct': 93 73 82 100 100 66 100

American English

phoneme: f v θ ð s J - 3
% 'correct': 100 100 100 100 100 100 100 100

Polish

phoneme: f v s J - 3 B 7
% 'correct': 100 100 100 100 95 100 100 100

Rule II

F2 + F3 + F4 < 8 kcps : [ $ 3 * ]
F2 + F3 + F4 ≥ 8 kcps : [ q + j ]

Swedish

phoneme: I d j
% 'correct': 100 100 86 63

Polish

phoneme: J 3 G O
% 'correct': 100 100 100 100

Rule III

group (a): L4 > L2 : [ s z ]
           L4 < L2 : [ f v θ ð ]; also: [x]
group (b): L3 > L2 : [ 5 3 ~ ~ $ 3 ]

Swedish

phoneme: f v 8 5 4 $ 3 , x
% 'correct': 93 100 100 100 100 100 100 100

Am. English

phoneme: f v θ ð 8 5 3 $
% 'correct': 100 100 93 100 100 86 100 100

Polish

phoneme: f v s $ 3 G ' b x
% 'correct': 100 100 100 100 100 100 92 89 10

According, then, to the F-pattern, the fricatives in the three languages here investigated can be described in binary features, as follows:

Swedish: spread formants; raised formants

Am. English: spread formants; higher-formant emphasis

Polish: spread formants; raised formants; higher-formant emphasis

spread formants: F4 - F2 ≥ 1.8 kcps
raised formants: F2 + F3 + F4 ≥ 8 kcps
higher-formant emphasis: L4 > L2 in spread-formant group; L3 > L2 in non-spread formant group and [x]
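As an illustration of how Rules I-III translate into binary features, the following Python sketch evaluates the three criteria for a single F-pattern. The class `FPattern`, the function `binary_features` and the two sets of input values are hypothetical and serve only as an example. Note also that the paper tabulates Rule II only for a subset of the phonemes, so computing the "raised" feature for every input is a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass
class FPattern:
    f2: float  # formant frequencies in kcps
    f3: float
    f4: float
    l2: float  # formant levels in dB relative to the highest peak
    l3: float
    l4: float


def binary_features(p):
    """Evaluate the three rules for one F-pattern."""
    spread = (p.f4 - p.f2) >= 1.8          # Rule I: spread formants
    raised = (p.f2 + p.f3 + p.f4) >= 8.0   # Rule II: raised formants
    if spread:
        emphasis = p.l4 > p.l2             # Rule III, group (a)
    else:
        emphasis = p.l3 > p.l2             # Rule III, group (b)
    return {"spread": spread, "raised": raised, "higher_emphasis": emphasis}


# Hypothetical F-patterns chosen only to exercise both branches.
wide = FPattern(f2=1.80, f3=2.65, f4=3.82, l2=-14.0, l3=-13.0, l4=-2.0)
narrow = FPattern(f2=1.67, f3=2.31, f4=2.94, l2=-13.0, l3=0.0, l4=-2.0)

wide_features = binary_features(wide)
narrow_features = binary_features(narrow)
print(wide_features)
print(narrow_features)
```

The choice of which level comparison to apply depends on the outcome of Rule I, which is why the spread test is evaluated first.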

Although we are here primarily concerned with the F-patterns of fricative consonants, it should be noted that there are other spectral features that clearly differentiate various types of such sounds; at least in some cases they may be more powerful cues for human or automatic recognition than the F-patterns. One of these is the general distribution of energy in the spectrum. All the spectra here analyzed have been quantized in 10 dB steps relative to the highest peak in each individual spectrum. Although such a quantization does not readily lend itself to a practical, mathematically rigid or instrumentally effective treatment, it has been found helpful in a somewhat informal description of the spectral properties of the consonants in question. When the noise spectrum by itself is considered (i.e. disregarding spectral properties due to glottal excitation) [ f v θ ð ] may be described as having an essentially flat spectrum and [ s z ] can be characterized as having almost all energy contained in a region above 4 kcps. The noise of [ j 3 4 ] is contained between approximately 1.5 kcps and 8 kcps, and that of [ y 9 j ] between approximately 2 kcps and 9 kcps. Group (b) fricatives (i.e. those with relatively compressed formants, see above) have two regions of energy concentration in the middle frequencies. In [x] almost all energy is contained below 3 kcps. The various shapes of noise spectra can schematically be represented as in Fig. I-3 in which no attempt is made to obtain numerical accuracy.
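The quantization in 10 dB steps relative to the highest peak can be sketched as follows. This is a minimal Python illustration under one assumed convention: step 0 covers levels less than 10 dB below the highest peak, step 1 covers levels from 10 dB to just under 20 dB below it, and so on. The paper does not state where the step boundaries were placed, and the function name `quantize_10db` and the sample levels are hypothetical.

```python
import math


def quantize_10db(levels_db, step_db=10.0):
    """Map each level to the index of its step below the highest peak."""
    top = max(levels_db)
    return [int(math.floor((top - level) / step_db)) for level in levels_db]


# Hypothetical levels in dB along the frequency scale.
section = [-2.0, 0.0, -9.0, -10.0, -27.0]
steps = quantize_10db(section)
print(steps)
```

Because the reference is the highest peak of each individual spectrum, the result is unaffected by the overall level of the specimen.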

Fig. I-2a. The formants and higher-frequency peaks of fricative consonants.

Fig. I-2b. The formants and higher-frequency peaks of fricative consonants.

Fig. I-3. Schematic spectra of voiceless fricatives. (Axes: frequency in kcps, level in dB.)

in only 1 specimen. The pre-/i/ Polish /x/ is a palatal sound, similar to /?/ from which it differs by showing a clear low-frequency F1 (about 0.45 kcps). The concentration of energy in the higher-middle frequencies typical for /F/ is absent in the pre-/i/ Polish /x/, though the gradual drop-off of the overall spectral envelope occurs much higher up along the frequency scale in this than in the other variants.

Acknowledgment

The author wishes gratefully to acknowledge the technical assistance of Mr. J. Liljencrants of the Royal Institute of Technology, Speech Transmission Laboratory.

References

(1) Fant, G.: Acoustic Theory of Speech Production, 's-Gravenhage (1960).

(2) Forgie, J.W. and Forgie, C.D.: "A Computer Program for Recognizing the English Fricatives /f/ and /θ/", IV International Congress on Acoustics, Copenhagen, G I 1 (1962).

(3) Harris, K.S.: "Cues for the Discrimination of American English Fricatives in Spoken Syllables", Language and Speech 1 (1958) pp. 1-7.

(4) Heinz, J.M. and Stevens, K.N.: "On the Properties of Voiceless Fricative Consonants", J.Acoust.Soc.Am. 33 (1961) pp. 589-596.

(5) Hughes, G.W. and Halle, M.: "Spectral Properties of Fricative Consonants", J.Acoust.Soc.Am. 28 (1956) pp. 303-310.

(6) Liljencrants, J.: "RASSLAN - a 6-Channel Loop Sectioning Device", STL-QPSR 2/1960, pp. 1-3.

(7) Mártony, J.: "On the Synthesis and Perception of Voiceless Swedish Fricatives", STL-QPSR 1/1962, pp. 17-22.

(8) Mártony, J.: "On the Perception of Swedish Voiceless Fricatives", STL-QPSR 2/1962, pp. 25-28.

(9) Meyer-Eppler, W.: "Untersuchung zur Schallstruktur der stimmhaften und stimmlosen Geräuschlaute", Z.Ph. 1 (1953) pp. 89-104.

(10) Strevens, P.: "Spectra of Fricative Noise in Human Speech", Language and Speech 3/1 (1960) pp. 32-49.

W. Jassem

Swedish

(Rows in source order; phoneme labels and some entries are not legible in the source.)

        F2                 F3                 F4           higher-freq. peaks
  cps    dB    %     cps    dB    %     cps    dB    %     cps    dB    cps    dB
 1520          ±1   2450     0   ±2    3630    -6   ±2    7800    -2   8570     0
 1370     0    ±3   2340    -6    9    3480   -12   ±3      --    --   8590    -6
 1800   -14    ±4   2650   -13   ±2    3820    -2   ±4    6610    -2   8370     0
 1670   -13    ±3   2310     0   ±3    2940    -2   ±2    3650    -7   5460    -7
 1640    -5    ±2   2380     0   ±2    2850    -4   ±2    3300    -2     --    --
 2030    -7    ±9   2860     0   ±3    3560    -1   ±2    4400    -7   5880    -6
 1990   -12    ±1   28..     0   ±1    3450    -2   ±1    4420    -9     --    --

F1, F2 (cps, dB, %) and higher-freq. peaks (cps, dB): 1 -- -0 -- 3680 -20 -- --

Table I-1a.

Am. English

F4

higher-freq. peaks

Table I-1b.

Polish

F2 F3 F4

cps dB % cps dB % cps dB %

higher-freq. peaks

  cps    dB    cps    dB
 7350    -9   8520    -7
 7340   -11   8410    -9
 7680     0   8550     0
 7700    -2   8580     0
 4070    -6   6500    -6
 4050    -2   4380     0

higher-freq. peaks

cps dB cps dB

Table I-1c.

Fig. I-4b. Spectra of fricative consonants. American English

Fig. I-4c. Spectra of fricative consonants. Polish
