Dept. for Speech, Music and Hearing
Quarterly Progress and Status Report
The formant patterns of fricative consonants
Jassem, W.
journal: STL-QPSR, volume: 3, number: 3, year: 1962, pages: 006-015
http://www.speech.kth.se/qpsr
B. THE FORMANT PATTERNS OF FRICATIVE CONSONANTS
The relations between the production of fricative consonants and the resulting acoustic wave have been theoretically treated by G. Fant (1), W. Meyer-Eppler (9) and J.M. Heinz and K.N. Stevens (4). G.W. Hughes and M. Halle have analyzed American English fricatives and considered the general distribution of energy in their spectra so as to arrive at a system of identification in terms of binary features (5). Perceptual experiments performed by K.S. Harris and J. Mártony (7, 8) have shown that whilst some fricatives are mainly recognized by the spectrum of the fricative segment (postdental, alveolar and palatal types), other cues, viz. overall level, the aspirative segment and vowel transitions, are important for the identification of other types, cf. also *). Independently of a linguistic context the noise spectrum of fricatives has been analyzed by P. Strevens (10) from the point of view of the low and high limits of the noise along the frequency scale, formant-like structure and overall level.
Apart from the intrinsic interest for the acoustic theory of speech production, there are practical objects in the measurement of F-patterns of fricative consonants: some compressed speech transmission systems depend on formant-tracking, and more information than has hitherto been available is needed on the frequencies and relative levels of those peaks in the spectrum of fricatives which, when considered as time-varying parameters, correspond to vowel formants. Data on higher-frequency peaks may be important for the synthesis of fricatives with pole and zero circuits, cf. (4).
One native speaker has been chosen for each of the three languages: Stockholm Swedish, American English and Non-Regional Polish. Their speech has informally been judged as typical. The materials consist primarily of CVC and CV syllables in sequences which are phonologically natural in each language. Artificial sequences have been avoided. American English /ʒ/ has exceptionally been treated in VC syllables because this phoneme does not occur initially in this language. Swedish /?/ and /x/ occur in CV syllables only since they are not used in final positions, and all Polish voiced fricatives, for similar reasons of linguistic constraint, only appear in CV syllables. The respective varieties of a fully open ([a]-like), a front-close ([i]-like) and a back-close ([u]-like) vowel have been used in each language, these being acoustically and articulatorily the "extreme" vocalic syllabics. In the case of Polish /;/ an additional, fourth vowel /i/ has been used because an initial //i/ only occurs in not fully assimilated foreign words. Three series of recordings were made, at intervals of approximately one month. Within one language, the three series were not identical (the first one consisted of CV syllables only). Each fricative phoneme in the materials is thus represented by between 3 (as is American English /i 7 4 a3 UJ /) and 8 (in Polish / j i j j a j , ju; , /) positional variants, and each variant is represented by 3 or 2 specimens. The phonemic status of Swedish /x/, which is mostly a labio-velar fricative [ 5 ], is not entirely clear. For reasons which would lie outside the scope of this article, the usual assumption that [ 5 ] is a variant of the /// phoneme may, in the author's opinion, be doubted. Our Swedish speaker ordinarily uses [ 6 ] in word-initial positions, but apparently pronounces the other sound, which is an alveolar, slightly retroflex [ $ ], quite naturally as an alternative.
The materials were analyzed by means of a spectrum section analyzer called RASSLAN, which was developed in the Speech Transmission Laboratory, Royal Institute of Technology, Stockholm, and described by J. Liljencrants (6). The frequency range covered for this analysis is 0-9 kcps x). The spacing between the center frequencies of the analyzing filters of RASSLAN was 50 cps. Most analyses were performed with two bandwidths of the filters, viz. 725 and 250 cps. The effective integration time τ was 80 msec, a low-pass smoothing filter of 5 cps bandwidth being used. The integration circuitry was triggered 10-20 msec before the end of the fricative segment so that the maximum of the memory curve occurred at an instant of time at which, on the whole, the distribution of energy in frequency tended to be relatively steady. Conventional sonagrams were made of the entire material. Together with ink-written oscillograms these were used to determine the boundaries of phonic segments. The sonagrams were also helpful in deciding which of the peaks in the fricative spectra were to be regarded as F2, F3 and F4 in some cases in which the appearance of additional peaks made the decision difficult.
x) Owing to a carrier current leak in the heterodyning system the analysis of frequencies up to approximately 300 cps is difficult in unvoiced sounds, and in the present investigation this low frequency range has been disregarded. In voiced fricatives, on the other hand, strong harmonic components below approximately 900 cps sometimes make it difficult to obtain useful data on higher frequencies within the available dynamic range. Most of the voiced fricatives have therefore been high-pass filtered with a very sharp cutoff at 275, 500 or 750 cps. Unfiltered spectra of voiced fricatives show a gradual drop-off of the spectral envelope between the first harmonic and approximately 900 cps which may be up to 20 dB. The higher values have been found in [ð] and [v].
Apart from Swedish and Polish /x/, F2, F3 and F4 could be found in the fricative spectra in approximately 95 % of the cases x). Above these formants, further peaks could be seen in the spectra along the frequency scale, some of which were regular in the sense that they tended to cluster round definite frequencies in at least a majority of the specimens of a given phoneme. In some cases there was a peak below F2, especially in [f]-sounds. The Swedish and Polish /x/ phonemes are different from the rest in that there F1 and F2 appear quite regularly, together with F3 for the Polish /x/, whilst the other formants are so low in level that they rarely appear in our analysis, which covered a range of approximately 35 dB.
Fig. I-2 shows the frequencies of the formants and the regular higher-frequency peaks. Each column includes data from all the specimens of a given phoneme. The black areas refer to F2, F3 and F4 in all phonemes except Swedish and
x) In some cases a zero of the vocal tract transfer function may have cancelled out the actual F4, and what was measured as F4 may not then correspond to an F4 pole of the transfer function. Spectrographic evidence was sometimes inconclusive in locating an F4 pole.
xx) According to Fant: formant number N is labelled FN; the frequency of formant N is labelled F with subscript N; the level of formant N is labelled L with subscript N; N = 1, 2, 3, 4.
to the relation between L3 and L2. Thus in sounds of the (a-A) type F4 is higher in level than F2. This is the case in [s, z]. In type (a-B) F4 is lower in level than F2. Here belong [f, v, θ, ð] and also [x]. All (b) type sounds belong to group (A). Thus F3 is higher in level than F2 in [ , , 4 ; , , j ]. All the specimens in the materials which contained the relevant information in the spectrum were examined as to the applicability of the above rules. In the results which follow, the percentage of specimens of each phoneme that agree with the above rules is given.
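The percentage figures reported for each rule can be computed along the following lines. This is a minimal sketch, not the author's procedure: the function name `percent_in_agreement`, the specimen values and the rounding to the nearest integer are all assumptions made for illustration.

```python
def percent_in_agreement(specimens, rule):
    """Percentage of specimens for which rule(specimen) holds.

    Rounding to the nearest integer is an assumption; the source
    does not say how its percentages were rounded.
    """
    if not specimens:
        raise ValueError("at least one specimen is required")
    hits = sum(1 for s in specimens if rule(s))
    return round(100.0 * hits / len(specimens))


# Hypothetical specimens of one phoneme, each given as (F2, F4) in cps.
specimens = [(1520, 3630), (1370, 3480), (1800, 3500)]

# Rule I test for the spread-formant group: F4 - F2 >= 1.8 kcps.
def spread(specimen):
    f2, f4 = specimen
    return f4 - f2 >= 1800

print(percent_in_agreement(specimens, spread))
```

Two of the three hypothetical specimens satisfy the inequality, so the call reports the share of agreeing specimens for that phoneme.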
Rule I
F4 - F2 ≥ 1.8 kcps: [f v θ ð s z]
F4 - F2 < 1.8 kcps: [3 - 1 7 j]
Swedish
phoneme:      f v s 1 - 4 0 3
% 'correct':  93 73 82 100 100 66 100
American English
phoneme:      f v θ ð s J - 3
% 'correct':  100 100 100 100 100 100 100 100
Polish
phoneme:      f v s J - 3 B 7
% 'correct':  100 100 100 100 95 100 100 100
Rule II
F2 + F3 + F4 < 8 kcps: [$ 3 *]
F2 + F3 + F4 ≥ 8 kcps: [q+j]
Swedish
phoneme:      I d j
% 'correct':  100 100 86 63
Polish
phoneme:      J 3 G O
% 'correct':  100 100 100 100
Rule III
group (a): L4 > L2: [s z]
           L4 < L2: [f v θ ð]; also: [x]
group (b): L3 > L2: [5 3 ~ ~ $ 3]
Swedish
phoneme:      f v 8 5 4 $ 3 , x
% 'correct':  93 100 100 100 100 100 100 100
Am. English
phoneme:      f v θ ð 8 5 3
% 'correct':  100 100 93 100 100 86 100 100
Polish
phoneme:      f v s $ 3 G ' b x
% 'correct':  100 100 100 100 100 100 92 89 1 0
According, then, to the F-pattern, the fricatives in the three languages here investigated can be described in binary features, as follows:
Swedish
spread formants: F4 - F2 ≥ 1.8 kcps
raised formants: F2 + F3 + F4 ≥ 8 kcps
higher-formant emphasis: L4 > L2 in the spread-formant group and [x]; L3 > L2 in the non-spread-formant group
Am. English
spread formants, higher-formant emphasis
Polish
spread formants, raised formants, higher-formant emphasis
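The three binary features follow directly from the inequalities of Rules I to III, and can be sketched as below. This is an illustrative sketch under stated assumptions rather than the author's procedure: the names `FPattern` and `binary_features` are hypothetical, the input values are illustrative only, and leaving "raised formants" undefined (`None`) for spread-formant sounds is an assumption based on Rule II listing only non-spread sounds.

```python
from dataclasses import dataclass
from typing import Optional

SPREAD_MIN_CPS = 1800.0  # Rule I threshold: F4 - F2 >= 1.8 kcps
RAISED_MIN_CPS = 8000.0  # Rule II threshold: F2 + F3 + F4 >= 8 kcps


@dataclass
class FPattern:
    """Formant frequencies in cps and formant levels in dB (hypothetical container)."""
    f2: float
    f3: float
    f4: float
    l2: float
    l3: float
    l4: float


def binary_features(p: FPattern) -> dict:
    """Derive the three binary features from one F-pattern."""
    spread = (p.f4 - p.f2) >= SPREAD_MIN_CPS
    # Assumption: Rule II is only evaluated in the non-spread group.
    raised: Optional[bool] = None
    if not spread:
        raised = (p.f2 + p.f3 + p.f4) >= RAISED_MIN_CPS
    # Rule III: compare L4 with L2 in the spread group, L3 with L2 otherwise.
    emphasis = (p.l4 > p.l2) if spread else (p.l3 > p.l2)
    return {
        "spread": spread,
        "raised": raised,
        "higher_formant_emphasis": emphasis,
    }


# Illustrative values only, not attributed to any particular phoneme.
wide = FPattern(f2=1800, f3=2650, f4=3820, l2=-14, l3=-13, l4=-2)
narrow = FPattern(f2=1670, f3=2310, f4=2940, l2=-13, l3=0, l4=-2)
print(binary_features(wide))
print(binary_features(narrow))
```

Keeping the thresholds as named constants makes it easy to test how sensitive the grouping is to the 1.8 kcps and 8 kcps boundaries.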
Although we are here primarily concerned with the F-patterns of fricative consonants, it should be noted that there are other spectral features that clearly differentiate various types of such sounds; at least in some cases they may be more powerful cues for human or automatic recognition than the F-patterns. One of these is the general distribution of energy in the spectrum. All the spectra here analyzed have been quantized in 10 dB steps relative to the highest peak in each individual spectrum. Although such a quantization does not readily lend itself to a practical, mathematically rigid or instrumentally effective treatment, it has been found helpful in a somewhat informal description of the spectral properties of the consonants in question. When the noise spectrum by itself is considered (i.e. disregarding spectral properties due to glottal excitation), [f v θ ð] may be described as having an essentially flat spectrum and [s z] can be characterized as having almost all energy contained in a region above 4 kcps. The noise of [ j 3 4 ] is contained between approximately 1.5 kcps and 8 kcps, and that of [ y 9 j ] between approximately 2 kcps and 9 kcps. Group (b) fricatives (i.e. those with relatively compressed formants, see above) have two regions of energy concentration in the middle frequencies. In [x] almost all energy is contained below 3 kcps. The various shapes of noise spectra can schematically be represented as in Fig. I-3, in which no attempt is made to obtain numerical accuracy.
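The 10 dB quantization relative to the highest peak can be illustrated as follows. This is a sketch under assumptions: the function name `quantize_steps` is hypothetical, the input levels are invented, and assigning each level to the step whose upper edge it does not exceed (a floor operation) is one possible reading of "quantized in 10 dB steps".

```python
import math


def quantize_steps(levels_db, step_db=10.0):
    """Return, for each level, the number of whole steps it lies below the peak.

    Step 0 covers levels less than step_db below the highest peak,
    step 1 the next step_db, and so on (the floor convention is an assumption).
    """
    if not levels_db:
        raise ValueError("at least one level is required")
    peak = max(levels_db)
    return [math.floor((peak - level) / step_db) for level in levels_db]


# Hypothetical spectrum section: levels in dB at successive analysis channels.
section = [42.0, 33.0, 20.0, 45.0, 35.0]
print(quantize_steps(section))
```

Because the steps are counted from the highest peak of each individual spectrum, the result does not depend on the absolute recording level.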
Swedish
Fig. I-2a. The formants and higher-frequency peaks of fricative consonants.
Fig. I-2b. The formants and higher-frequency peaks of fricative consonants.
Fig. I-3. Schematic spectra of voiceless fricatives. Axes: level in dB, frequency in kcps.
in only 1 specimen. The pre-/i/ Polish /x/ is a palatal sound, similar to /?/, from which it differs by showing a clear low-frequency F1 (about 0.45 kcps). The concentration of energy in the higher-middle frequencies typical for /F/ is absent in the pre-/i/ Polish /x/, though the gradual drop-off of the overall spectral envelope occurs much higher up along the frequency scale in this than in the other variants.
Acknowledgment
The author wishes gratefully to acknowledge the technical assistance of Mr. J. Liljencrants of the Royal Institute of Technology, Speech Transmission Laboratory.
References
(1) Fant, G.: Acoustic Theory of Speech Production, 's-Gravenhage (1960).
(2) Forgie, J.W. and Forgie, C.D.: "A Computer Program for Recognizing the English Fricatives /f/ and /θ/", IV International Congress on Acoustics, Copenhagen, G 11 (1962).
(3) Harris, K.S.: "Cues for the Discrimination of American English Fricatives in Spoken Syllables", Language and Speech 1 (1958) pp. 1-7.
(4) Heinz, J.M. and Stevens, K.N.: "On the Properties of Voiceless Fricative Consonants", J.Acoust.Soc.Am. 33 (1961) pp. 589-596.
(5) Hughes, G.W. and Halle, M.: "Spectral Properties of Fricative Consonants", J.Acoust.Soc.Am. 28 (1956) pp. 303-310.
(6) Liljencrants, J.: "RASSLAN - a 6-Channel Loop Sectioning Device", STL-QPSR 2/1960, pp. 1-3.
(7) Mártony, J.: "On the Synthesis and Perception of Voiceless Swedish Fricatives", STL-QPSR 1/1962, pp. 17-22.
(8) Mártony, J.: "On the Perception of Swedish Voiceless Fricatives", STL-QPSR 2/1962, pp. 25-28.
(9) Meyer-Eppler, W.: "Untersuchung zur Schallstruktur der stimmhaften und stimmlosen Geräuschlaute", Z.Ph. 7 (1953) pp. 89-104.
(10) Strevens, P.: "Spectra of Fricative Noise in Human Speech", Language and Speech 3/1 (1960) pp. 32-49.
W. Jassem
Swedish
F2                   F3
cps   dB   %         cps   dB   %
1520  2 1            2450  0    22
1370  0    +3        2340  -6   9
1800  -14  24        2650  -13  22
1670  -13  +3        2310  0    2 3
1640  -5   +2        2380  0    ~2
2030  -7   2 9       2860  0    23
1990  -12  21        28%   0    - +I
F1                   F2
cps   dB   %         cps   dB   %
F4
cps   dB   %
3630  -6   22
3480  -12  +3
3820  -2   24
2940  -2   2 2
2850  -4   2 2
3560  -1   2 2
3450  -2   -+1
higher-freq. peaks
cps   dB   cps   dB
7800  -2   8570  0
--    --   8590  -6
6610  -2   8370  0
3650  -7   5460  -7
3300  -2   --    --
4400  -7   5880  -6
4420  -9   --    --
higher-freq. peaks
cps   dB   %   cps   dB   cps   dB
1 -- -0 -- 3680 -20 -- --
Table I-1a.
Am. English
F4
higher-freq. peaks
Table I-1b.
Polish
F2                F3                F4
cps   dB   %      cps   dB   %      cps   dB   %
higher-freq. peaks
cps   dB   cps   dB
7350  -9   8520  -7
7340  -11  8410  -9
7680  0    8550  0
7700  -2   8580  0
4070  -6   6500  -6
4050  -2   4380  0
higher-freq. peaks
cps   dB   cps   dB
Table I-1c.
Spectra of fricative consonants
Fig. I-4b. American English
Spectra of fricative consonants
Fig. I-4c. Polish