Moment weighting techniques for segmentation · Moment weighting techniques for segmentation...

12
Dept. for Speech, Music and Hearing Quarterly Progress and Status Report Moment weighting techniques for segmentation Liljencrants, J. journal: STL-QPSR volume: 3 number: 4 year: 1962 pages: 015-021 http://www.speech.kth.se/qpsr

Transcript of Moment weighting techniques for segmentation · Moment weighting techniques for segmentation...

Dept. for Speech, Music and Hearing

Quarterly Progress andStatus Report

Moment weightingtechniques for segmentation

Liljencrants, J.

journal: STL-QPSRvolume: 3number: 4year: 1962pages: 015-021

http://www.speech.kth.se/qpsr

C. MOlyIENT WEIGHTING TECHNIQUES FOR SEGEEXTATION

The automat ic segmenta t ion of d i f f e r e n t speech sounds

deponds on a s u i t a b l e choica of parameters , some o f which can be

e x t r a c t e d from t h e s i g n a l by means o f moment weight ing . The moments

may be eva lua ted from a s p e c t r a l a n a l y s i s c a r r i e d ou t f o r i n s t a n c e

w i t h a f i l t e r bank. From t h e t e c h n i c a l p o i n t of view i t is ,however,

l e s s involved t o u s e d i r e c t methods f o r moment measuring.

I n t h i s s t u d y we a r e concerned wi th t h e concepts o f

c e n t e r of g r a v i t y and spread i n t h e s p e c t r a of some unvoiced f r i c a t i v e

sounds. The methods a r e of cource a p p l i c a b l e t o o t h e r sound c l a s s e s

a s w e l l .

Consider ing a spectrum w i t h t h e power d e n s i t y ~ ( f ) w e

have t h e moment o f o r d e r n around t h e o r i g i n as

n (1 ) 0

The c e n t e r of g r a v i t y of t h e spectrum i s

The s p e c t r a l sp read i s t h e mean d e v i a t i o n from T given by

where MZ0 i s t h e c e n t r a l moment of o r d e r two found from

0

The t o t a l l e v e l of t h e s i g n a l i s M . If t h e s i g n a l i s passed 0

through an emphasis network w i t h t h e s l o p e +3 d ~ / o c t t h e power

d e n s i t y w i l l be m u l t i p l i e d w i t h a f a c t o r t h a t i s p r o p o r t i o n a l t c

frequency. The t o t a l l e v e l of t h e processed s i g n a l w i l l t h u s be

p r o p o r t i o n a l t o M When t h e s i g n a l i s processed i n an i n t e g r a t i n g 1. ' network wi th n e g a t i v e s l o p e t h e r e s u l t i n g t o t a l l e v e l w i l l be propor-

t i o n a l t o a moment of nega t ive o r d e r .

Moments around a f requency f might be approximated i f X

t h e s i g n a l i s f i r s t modulated wi th fx i n a ba lanced modulator. The

r e s u l t i n g low frequency components cou ld then be t r e a t e d a s above.

Moments around f w i t h nega t ive o r d e r s a r c of c o u r s e out of q u e s t i o n X

if t h e s i gna l con ta ins energy a t fx i n t he frequency domain.

I n t h e following experiment var ious moments have

been approximated by measuring t h e mean average va lue of t h e pro-

cessed s igna l i n s t ead of t h e RNS value . The e r r o r s introduced by

t h i s should be neg l ig ib l e .

It soon was found t h a t t h e low order moments were too

i n t e n s i t i v e t o s p e c t r a l changes f o r p r a c t i c a l purposes when a

reasonable accuracy i n t he maasuring equipment i s assumed. This

s e n s i t i v i t y may be increased by t he use of h igher order moments.

Then we cannot u t i l i z e t h e conventional d e f i n i t i o n s of cen t , r of

g r a v i t y and spycad. These q u a n t i t i c a ;;lay i n s t ead be i n t e rp re t ed

by means of a simple model:

Assume a rec tangula r ( no i s e ) spectrum covering t h e band

from f , = fc/% t o f 2 = f 3? and with constant power dens i t y S . C 0

f w i l l then be t h e cen tdr of g r a v i t y i f a logar i thmic frequency C

s c a l e i s assumed. The r o l s t i v a spread i s represented by t he f a c t o r a. The moments of t h i s s i gna l a r e given by

It i s now d.esirable t c f ind two simple combinations of moments

g iv ing measures of f and -.ttl r e spec t i v s ly . f i r the rmore these combi- C

na t ions should be l e v e l normalized (independent of S ) . 0

f i s most e a s i l y found from express ions of t h e type C

where ~ ( 2 ) i s a funct ion of -1.1. The spec i a l case n = 0 , v = 1

i s equal t o formula ( 2 ) . Some of these expressions a r e independant

of 6c indeed

For p r a c t i c a l purposes it seems t o be j u s t i f i e d t o consider moments

of orders not h igher than + 4. Wc a r e then l e f t with t h r e e non- -

1 I I I I

0 1 2 3 4 -[x '1

octaves

Fig. I - 7. The sensitivity of some moment combinations to X .

wi th in ind iv idua l sound segments. I n add i t ion t h e output l e v e l

v a r i a t i o n s must be ve ry small . The use of a l e s s soph i s t i c a t ed

compressor w i l l however aase t h e requirements on dynamis range i n

t h e following c i r c u i t r y .

The normalization does not e x i s t with t h e zero c ross ing

method. Great c a r e has however t o be exercised i n t h e design of

t he c l i pp ing s t ages t o insure high s e n s i t i v i t y and freedom from

blocking phenomena,

Measurements

Measurements have been msde on a small mate r ia l from a

phone t ica l ly t r a ined spea.korii . The f r i c a t i v e s s, f , , 7, and 5 " appeared i n i n i t i a l pos i t i on i n tho Swedish words of tha type / ~ e : n a / .

Four u t t e rances of each f r i c a t i v e were used. The g ross s p e c t r a l

f e a t u r e s of t h e speech mater ia l a r e displayed i n Fig. I - 8.

Tho s igna l was band l im i t ed t o t h e range frcm 0.5 t o

1 2 kc/s wi th moderately s h ~ r p f i l t e r s (1 80 d ~ / o c t a v e ) . The s i g c a l

was then fcd i n t o f i v e networks g iv ing -12, -6, -0, +6, and +12

d ~ / o c t s lope. Tlie s i g n s l s from t h e emphasis networks were r e c t i f i e d ,

smoothed ( i n t e g r a t i o n tlma constant 25 m s ) , and synchronously sampled

z t 23 m s i n t e r v a l s . Tlic s u b s ~ , q ~ : n ; p r o e x s r i ~ q c t ~ ~ g , e p ~ o v i d i n g cnaiog-

d i g i t a l convc r s i o n 'on a 1 ogaritIhrcic a c p - l ~ h z v ~ b ~ < , n descr ibed i n r o f . (2 )

Resu l t s of t he measurements a r e p l o t t e d i n Fig. I - 9.

Po in t s belonging t o t h e same u t t e r ance of a s p e c i f i c sound a r e joined

wi th arrows po in t ing i n t h e d i r e c t i o n of time.

The f i gu re d i sp lays a c e r t a i n correspondence t o tha

d i s t i n c t i v e f e a t u r e s ~ 'corn~ac t/dii 'fusell , The spectrum model could

be expected t o makh "compact'' sounds b e t t e r than o thers . This

seems t o be v e r i f i e d by tho l i s t e n i n g t e s t repor ted on i n t h e

appendix.

?t (3 ) Pa r t of t he mate r ia l used by MQrtony .

Fig. I - 8. C m u a p e t r a of f h e Swedish fMlllv* a.d.. B u b 0.1m repescnta four ult.rme.a. Bach uttermce h.s !na smpled In It M average five pohls 40 ma .put. Each sample Is mrmlllred with r e s p d to Its mil Iwcl whlcb 1s u s d M

rdermee. 26 p r c.nt of tha s.mpl*s fall batwen neb p l r 01 Ilnu. A ~ I y s l a w u prformed by eonllgwus lhlrd order B8tterwrtL ft1t.r~. The dfeetlve averqlr@ lime mnslnnt of Lbc SmoalhIng fil ters Is of the s u n e order M the sunpllng latewll. Only sample. wilh a total level wltbln 10 W (for /f/: 15 dB) fmm UI. mulmum for each utterance have been oaaaldcred.

octaves

Fig. I - 9. Moment weighting results for four utterances of each of five sounds. Sloping dashed lines indicate loci of fc. (M4/M-4) i s not independent of X .

0 2 > P o , and P-2 w a s experimentally determined f o r t h e same

speech mator ia l . f and 3Lc were determined from the se and conform C

i n q u a l i t y t o a reasonable ex ten t wi th t h e previous r e s u l t s .

However, i t appcarad t h a t t h e zero c ross ing method was apt t o g ive

a l e s s d i s t i n c t separa t ion between t h e diffecrent eounds. This seems

t o be due t o h y s t e r e s i s phenomena i n t h e c l i pp ing process.

I t should be observed t h a t some s i g n a l s may g ive r i s e

t o va lues of t h a t a r c g r e a t e r than t h e r a t i o between t h e l i m i t i n g

f requencies of t h e e n t i r e processed band. This w i l l occur when t h e

s i g n a l spectrum has two widely separated peaks of comparable magnitude.

The / 6 / scuni? t snds somewhat t o t h i s . Such a s igna l w i l l of co-;rse

match t h e rec tangula r spectrum model extremely bad. Thc e f f e c t may

a l s o be d i s t u rb ing i n case of excess ive background no ise o r hum and

g r e a t c a r€ should be takon i n case mul t ip le i n t e g r a t i o n o r d i f f e r e n t i -

a t i o n i s used.

Appendix

In o rder t o p a r t i a l l y j u s t i f y t h e model a s e t of sounds

were fabr ica ted by f i l t e r i n g out c e r t a i n bands of a random whits

no i se s i gna l according t o the s p e c i f i c a t i o n below. 180 d ~ / o c t

f i l t e r s were used f o r t h i s purpose. The d a t a approximately corre-

spond t o mean va lues obtained from t h e measurements above.

I i Total i f m j lev01 t kc/s

[ $1

octaves I

0.5 i 8.0 I -a5 S I 4.0 1 8.0 -10

I F 2.0 t 6.0 I -15

i l 0 5 i 4.0 J ?

2 .o 5 86

3 95 2 05

4.0

1.0

197

1.4

5 0.75 j 3.0 105

back- i ground1 0 1 1 6

! i I -35 j !

I

2 .O

Samples wi th t h e dura t ion 180 m s were sp l i c ed i n l i e u

of t h e /f/ i n i d e n t i c a l copies of t h e na tu r a l u t t e r ance /fe:na/.

The m a x i m u m l e v e l of t h i s word i s t h e refcrenco i n t h e t a b l c above.

There was apparent ly no formant t r a n s i t i o n a t t h e beginning of t h e

e Tho onset and decay of t h e syn the t i c sounds l a s t e d some 6 m s

con t ro l l ed by diagonal tape sp l i c ings . The samples were assembled

i n random order i n t o two l is ts of 4x5 items and background noise

was added. The s t imu l i were presented over headphones t o nine

members of t h e l abora tory s t a f f f o r fcrced choice i d e n t i f i c a t i o n .

Three sub jec t s were phone t ica l ly t r a i n e d and responded 82 $ "correct f1 ,

somewhat higher than t h e avera,ge 74 $. The confusion matrices bclow

a r e pooled fcr a l l sub j ec t s . The f i r s t and t h e second l i s t s a r e

kept apa r t t o show the l ea rn ing . Separat ion between /j/ and / b / w a s requested al though the r e i s no semantical d i f f e r ence between

them i n Swedish.

Stim.

F i r s t reading

Resp . Second read ing

Resp,

The good separa t ion between /?/ and /j/ i s somewhat s u r p r i s i n g

having t h e s tudy of MBrtony ( 3 ) ~n - mind but could be explained

by t h a t these p a r t i c u l a r s t imu l i had a g r e a t e r d i f fe rence i n f c

than found i n t h e l i v e speech. The / h / sound was very unnatural 4

p a r t l y due t o t h e l a c k of formant t r a n s i t i o n s i n t h e following

vowel. It resembled a tltolephone qua l i t y " /f/. Some sub j ec t s

o f fe red q u a l i t y judgements which e s s e n t i a l l y cons i s ted with t h e

ranking i n t h e matrix diagonals.

J, Li l j enc ran t s

References

(1 ) Ricc, S.0.r "Mathematical Analysis o f Random Noise", B e l l System Tech. J., J u l y 1 944 and January 1945.

( 2 ) L i l jencrant s , J . , Rengman, U . , Garpendahl , G , : "51 -Channel Analyzer f o r Spectrum Samplingtt, STL-QPSR-J/I 962, PP* 1-5,

( 3 ) Mgrtony, J, s "On the Percept i o n of :$wedish Voiceless F r i c a t i ve s " , STL-QPSR-2/1 962, pp. 25-28.