Speech Processing Laboratory, Temple University

May 5, 2004

Structure-Based Speech Classification Using Nonlinear Embedding Techniques

Uchechukwu Ofoegbu

Advisor: Dr. Robert E. Yantorno

Committee: Dr. Saroj K. Biswas

Dr. Henry M. Sendaula

Acknowledgment

Dr. Robert Yantorno

Dr. Saroj Biswas

Dr. Henry Sendaula

Speech Lab Members

Air Force Research Laboratory, Rome, NY

Overview

Voiced and Unvoiced Speech

Usable and Unusable Speech

Nonlinearities in Speech

Nonlinear Embedding

Research Goal

Proposed Research

Voiced and Unvoiced Speech

Voiced/Unvoiced Characteristics

Voiced

Quasi-periodic excitation

Modulation by vocal tract

Production of vowels, voiced fricatives & plosives

Unvoiced

No periodic vibration of the vocal cords

Noise-like nature

Production of unvoiced fricatives and plosives

Usable Speech

Portions of co-channel speech that remain usable for applications such as speaker identification and speech recognition

Occur where low-energy (unvoiced/silence) segments of the interfering speaker overlap high-energy (voiced) segments of the target speaker

Target-to-Interferer Ratio (TIR) > 20 dB
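The 20 dB criterion can be made concrete with a short sketch. It assumes, as in simulated co-channel experiments, that the target and interferer signals are available separately before mixing; the function names, the non-overlapping framing and the 320-sample frame length are hypothetical choices, not taken from this work.

```python
import numpy as np


def frame_tir_db(target: np.ndarray, interferer: np.ndarray) -> float:
    """Target-to-interferer ratio of one frame in dB, computed from frame energies."""
    eps = 1e-12  # avoids division by zero / log of zero on silent frames
    e_target = np.sum(np.asarray(target, dtype=float) ** 2)
    e_interf = np.sum(np.asarray(interferer, dtype=float) ** 2)
    return float(10.0 * np.log10((e_target + eps) / (e_interf + eps)))


def usable_frames(target, interferer, frame_len=320, threshold_db=20.0):
    """Label each non-overlapping frame as usable (True) when its TIR exceeds threshold_db.

    frame_len=320 (20 ms at an assumed 16 kHz sampling rate) is a hypothetical choice.
    """
    n_frames = min(len(target), len(interferer)) // frame_len
    labels = []
    for k in range(n_frames):
        seg = slice(k * frame_len, (k + 1) * frame_len)
        labels.append(frame_tir_db(target[seg], interferer[seg]) > threshold_db)
    return np.array(labels, dtype=bool)


if __name__ == "__main__":
    t = np.ones(640)
    i = np.concatenate([0.01 * np.ones(320), np.ones(320)])
    print(usable_frames(t, i))  # first frame TIR = 40 dB, second = 0 dB
```

In the usage example the first frame has an interferer 40 dB below the target and is labeled usable; the second has equal energies (0 dB) and is not.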

Nonlinearities in Speech

Glottal waveform changes

Shape varies with amplitude

Physical observations

Flow in the vocal tract is non-laminar

Coupling between the vocal tract and the vocal folds

When the glottis is open, prominent changes are observed in formant characteristics

Nonlinear Embedding

Nonlinear Systems

The system state is a point moving along a trajectory in an abstract state space

The coordinates of the point are the independent degrees of freedom of the system

The state space can be reconstructed from a scalar signal

Nonlinear Embedding (cont’d)

Takens’ Method of Delays

A state space representation topologically equivalent to the original state space of a system can be reconstructed from a single observable dimension

Vectors in m-dimensional state space are formed from time-delayed values of a signal

Nonlinear Embedding (cont’d)

$\mathbf{x}(i) = [\, s(i),\; s(i-d),\; s(i-2d),\; \ldots,\; s(i-(m-1)d) \,]$

s(i) = i-th sample of the speech signal

m = embedding dimension

d = delay value
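The delay-vector construction above maps directly onto array indexing. This is a minimal sketch, not the author's code: the function name and the default values m = 3, d = 12 are hypothetical, and vectors are formed only for indices i where all m delayed samples exist.

```python
import numpy as np


def delay_embed(s, m=3, d=12):
    """Return the matrix whose rows are x(i) = [s(i), s(i-d), ..., s(i-(m-1)d)]."""
    s = np.asarray(s, dtype=float)
    first = (m - 1) * d  # earliest index i with all m delayed samples available
    if len(s) <= first:
        raise ValueError("signal too short for this m and d")
    rows = np.arange(first, len(s))  # time indices i
    cols = np.arange(m) * d          # offsets 0, d, 2d, ..., (m-1)d
    return s[rows[:, None] - cols[None, :]]


if __name__ == "__main__":
    X = delay_embed(np.arange(10), m=3, d=2)
    print(X.shape)  # (6, 3)
    print(X[0])     # [4. 2. 0.]
```

Each row is one point of the reconstructed trajectory; plotting the three columns against each other gives 3-D embeddings of the kind shown for voiced and unvoiced speech.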

Nonlinear Embedding (cont’d)

Delay value, d:

Dependent on sampling rate and signal properties

Large enough that the reconstructed trajectory captures the nonlinearities

Small enough to retain reasonable time resolution
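The slides do not say how d was chosen. One widely used heuristic (named here as a swapped-in technique, not necessarily the one used in this work) is the first zero crossing of the signal's autocorrelation, which balances the two constraints above. The function name is hypothetical. Because the result is in samples, the same physical delay gives a larger d at a higher sampling rate.

```python
import numpy as np


def delay_first_zero_acf(s, max_lag=None):
    """Return the first lag at which the (mean-removed) autocorrelation of s reaches zero.

    Heuristic only: an assumption, not the procedure stated in the slides.
    """
    s = np.asarray(s, dtype=float)
    s = s - s.mean()
    if max_lag is None:
        max_lag = len(s) // 2
    for lag in range(1, max_lag + 1):
        # unnormalized autocorrelation at this lag
        if np.dot(s[:-lag], s[lag:]) <= 0.0:
            return lag
    return max_lag  # no zero crossing found within max_lag


if __name__ == "__main__":
    n = np.arange(400)
    s = np.sin(2 * np.pi * n / 40)  # period of 40 samples
    print(delay_first_zero_acf(s))  # about a quarter period, i.e. roughly 10
```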

Nonlinear Embedding (cont’d)

Dimension, m:

Generation of voiced speech constitutes a low-dimensional system

Generation of unvoiced speech constitutes a relatively high-dimensional system

A low dimension (such as m = 3) is sufficient to reconstruct voiced speech but not unvoiced speech
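The low- versus high-dimensional contrast can be illustrated with a toy example: a pure sinusoid stands in for quasi-periodic voiced speech and white noise for unvoiced speech (both are simplifying assumptions). The singular values of the centred m = 3 embedding matrix show how many directions the trajectory really occupies: two for the sinusoid, all three for the noise. The names and d = 10 are hypothetical.

```python
import numpy as np


def delay_embed(s, m, d):
    """Rows are x(i) = [s(i), s(i-d), ..., s(i-(m-1)d)]."""
    s = np.asarray(s, dtype=float)
    rows = np.arange((m - 1) * d, len(s))
    return s[rows[:, None] - np.arange(m)[None, :] * d]


def normalized_singular_values(s, m=3, d=10):
    """Singular values of the centred embedding matrix, divided by the largest one."""
    X = delay_embed(s, m, d)
    X = X - X.mean(axis=0)
    sv = np.linalg.svd(X, compute_uv=False)
    return sv / sv[0]


if __name__ == "__main__":
    n = np.arange(4000)
    periodic = np.sin(2 * np.pi * n / 40)                   # stand-in for voiced speech
    noise = np.random.default_rng(0).standard_normal(4000)  # stand-in for unvoiced speech
    print(normalized_singular_values(periodic))  # third value is essentially zero
    print(normalized_singular_values(noise))     # all three values are comparable
```

The sinusoid fills only a plane (a closed loop), so three dimensions are more than enough; the noise spreads into every available dimension, so m = 3 cannot capture its structure.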

Embedded Voiced and Unvoiced Speech

[Figure: 3-D delay embeddings, panels "Embedded Voiced Speech" and "Embedded Unvoiced Speech"]

Embedded Usable and Unusable Speech

[Figure: 3-D delay embeddings, panels "Embedded Co-channel Speech of 30dB TIR" and "Embedded Co-channel Speech of 10dB TIR"]

Research Goal

Feature Extraction

Difference-Mean Comparison (DMC) Measure

– Voiced/unvoiced classification

Nodal Density Measure

– Voiced/unvoiced classification

– Usable/unusable classification

Difference-Mean Comparison (DMC) Measure

Voiced/Unvoiced Classification

Introduction

The 3rd-order difference is computed along the first non-singleton dimension

The 1st-order difference of an N×N matrix X is given by

$$
\Delta X =
\begin{bmatrix}
X(2,1)-X(1,1) & X(2,2)-X(1,2) & \cdots & X(2,N)-X(1,N) \\
X(3,1)-X(2,1) & X(3,2)-X(2,2) & \cdots & X(3,N)-X(2,N) \\
\vdots & \vdots & \ddots & \vdots \\
X(N,1)-X(N-1,1) & X(N,2)-X(N-1,2) & \cdots & X(N,N)-X(N-1,N)
\end{bmatrix}
$$

The number of 3rd-order difference values greater than their mean, Length(3rd-order diff. > mean), is recorded as the DMC measure
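A minimal sketch of the DMC computation for one embedded frame, under two stated assumptions: the frame is stored with time along the rows (so the first non-singleton dimension is axis 0, as with MATLAB's `diff` on such a matrix), and the 3rd-order difference is compared against a single mean over all of its elements (a per-column mean is an equally plausible reading of the slide). The function name is hypothetical.

```python
import numpy as np


def dmc_measure(X):
    """Difference-Mean Comparison value of one embedded frame X (rows = time, cols = delays).

    Assumes a 3rd-order difference along axis 0, compared with one mean over all elements.
    """
    D = np.diff(np.asarray(X, dtype=float), n=3, axis=0)
    return int(np.count_nonzero(D > D.mean()))


if __name__ == "__main__":
    impulse = np.array([0, 0, 0, 1, 0, 0, 0], dtype=float)[:, None]
    # 3rd-order difference of the impulse is [1, -3, 3, -1]; its mean is 0
    print(dmc_measure(impulse))  # 2
```

Applying `np.diff` three times in one call (`n=3`) is equivalent to repeating the 1st-order difference above three times along the rows.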

Embedded Voiced and Unvoiced Speech

[Figure: 3-D delay embeddings, panels "Embedded Voiced Speech" and "Embedded Unvoiced Speech"]

Difference-Mean Comparison Distribution

[Figure: probability distributions of the DMC measure for voiced and unvoiced speech, clean speech (x-axis: Difference-Mean Comparison, y-axis: Probability)]

Difference-Mean Comparison Distribution

[Figure: probability distributions of the DMC measure for voiced and unvoiced speech, speech + 15 dB pink noise (x-axis: Difference-Mean Comparison, y-axis: Probability)]

Difference-Mean Comparison Distribution

[Figure: probability distributions of the DMC measure for voiced and unvoiced speech, speech + 15 dB white noise (x-axis: Difference-Mean Comparison, y-axis: Probability)]

DMC-Based Decisions

[Figure: clean speech; panels show the waveform (Amplitude) and two decision tracks (Decision: 1 = V, 0 = don't care, -1 = UV) against Sample Number]
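The decision tracks use three levels (1: V, 0: don't care, -1: UV). The slides do not state the decision rule, so the sketch below only shows the general shape of a two-threshold rule: `t_low` and `t_high` are hypothetical thresholds on the per-frame DMC value, and whether voiced frames score high or low is left as a parameter because the slides do not say.

```python
import numpy as np


def three_way_decision(scores, t_low, t_high, voiced_when_high=True):
    """Map per-frame scores to 1 (V), -1 (UV) or 0 (don't care) using two thresholds."""
    scores = np.asarray(scores, dtype=float)
    if t_low > t_high:
        raise ValueError("t_low must not exceed t_high")
    out = np.zeros(scores.shape, dtype=int)  # default: don't care
    out[scores > t_high] = 1 if voiced_when_high else -1
    out[scores < t_low] = -1 if voiced_when_high else 1
    return out


if __name__ == "__main__":
    print(three_way_decision([10, 50, 90], t_low=30, t_high=70))  # [-1  0  1]
```

Scores falling between the two thresholds stay at 0, which is what lets a classifier abstain on ambiguous frames instead of forcing a V/UV label.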

DMC-Based Decisions

[Figure: speech + 15 dB pink noise; panels show the waveform (Amplitude) and two decision tracks (Decision: 1 = V, 0 = don't care, -1 = UV) against Sample Number]

DMC-Based Decisions

[Figure: speech + 15 dB white noise; panels show the waveform (Amplitude) and two decision tracks (Decision: 1 = V, 0 = don't care, -1 = UV) against Sample Number]

DMC-Based Decisions

[Figure: clean speech; panels show the waveform (Amplitude) and two decision tracks (Decision: 1 = V, 0 = don't care, -1 = UV) against Sample Number]

DMC-Based Decisions

[Figure: speech + 15 dB pink noise; panels show the waveform (Amplitude) and two decision tracks (Decision: 1 = V, 0 = don't care, -1 = UV) against Sample Number]

DMC-Based Decisions

[Figure: speech + 15 dB white noise; panels show the waveform (Amplitude) and two decision tracks (Decision: 1 = V, 0 = don't care, -1 = UV) against Sample Number]

Results

[Figure: bar chart of hits minus false alarms for voiced speech (0 to 100) under clean, 15 dB pink noise and 15 dB white noise conditions, comparing FR/RE, E/ZC and DMC]
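The slides do not define how hits and false alarms are counted. A minimal sketch under one common reading: for a target class (V or UV), the hit rate is the percentage of reference target frames decided as that class, the false-alarm rate is the percentage of reference non-target frames decided as that class, and "don't care" decisions (0) never count as the target. The function name is hypothetical.

```python
import numpy as np


def hits_minus_false_alarms(truth, decision, target=1):
    """Hit rate minus false-alarm rate, in percent, for one class.

    truth:    reference labels per frame, e.g. 1 (V) and -1 (UV)
    decision: classifier output in {1, 0, -1}; 0 ("don't care") never counts as target
    """
    truth = np.asarray(truth)
    decision = np.asarray(decision)
    is_target = truth == target
    said_target = decision == target
    if is_target.all() or not is_target.any():
        raise ValueError("need both target and non-target reference labels")
    hits = 100.0 * np.mean(said_target[is_target])
    false_alarms = 100.0 * np.mean(said_target[~is_target])
    return float(hits - false_alarms)


if __name__ == "__main__":
    truth = [1, 1, 1, 1, -1, -1, -1, -1]
    decision = [1, 1, 1, 0, -1, 1, -1, 0]
    print(hits_minus_false_alarms(truth, decision, target=1))   # 75 - 25 = 50.0
    print(hits_minus_false_alarms(truth, decision, target=-1))  # 50 - 0 = 50.0
```

Subtracting false alarms penalizes a classifier that simply labels everything as the target class, which would otherwise score 100 % hits.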

Results (cont’d)

[Figure: bar chart of hits minus false alarms for unvoiced speech (y-axis: Percent, 0 to 100) under clean, 15 dB pink noise and 15 dB white noise conditions, comparing FR/RE, E/ZC and DMC]

Nodal Density Measure

Voiced/Unvoiced Classification

Usable/Unusable Classification

Introduction

The smallest cube that encloses the embedded signal is determined

This cube is divided into N smaller cubes

The edges of the smaller cubes are defined as nodes

The number of nodes spanned by the signal is determined

The ratio of the number of nodes spanned to the total number of nodes is defined as the nodal density
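A minimal sketch of the nodal density steps above. Several details are not fixed by the slides and are assumptions here: nodes are taken as the corner points of the small cubes, the cube is split into k parts per axis (so N = k**m small cubes and (k+1)**m nodes), and a node counts as "spanned" when it is the nearest grid node to at least one trajectory point. The function name and k = 10 are hypothetical.

```python
import numpy as np


def nodal_density(X, k=10):
    """Fraction of grid nodes spanned by an embedded frame X (shape n x m).

    Assumptions (not fixed by the slides):
      * the enclosing cube is axis-aligned, anchored at the per-axis minimum, with side
        equal to the largest per-axis range;
      * it is split into k parts per axis, giving N = k**m small cubes and (k+1)**m nodes;
      * a node counts as spanned when it is the nearest grid node to at least one point.
    """
    X = np.asarray(X, dtype=float)
    m = X.shape[1]
    lo = X.min(axis=0)
    side = (X.max(axis=0) - lo).max()
    total_nodes = (k + 1) ** m
    if side == 0.0:
        return 1.0 / total_nodes  # all points coincide, so they occupy a single node
    idx = np.rint((X - lo) / side * k).astype(int)  # nearest node index on each axis
    spanned = len(np.unique(idx, axis=0))
    return spanned / total_nodes


if __name__ == "__main__":
    X = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])
    print(nodal_density(X, k=1))  # 2 of 8 nodes -> 0.25
```

A smooth, low-dimensional voiced trajectory tends to pass near relatively few nodes, while a noise-like unvoiced trajectory scatters across many, which is what makes the ratio usable as a feature.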

Voiced/Unvoiced Classification

Embedded Voiced and Unvoiced Speech Frames with Grids

[Figure: 3-D embedded speech frames overlaid with the node grid, panels "Voiced" and "Unvoiced"]

Nodes Spanned by Embedded Voiced and Unvoiced Speech Frames

[Figure: grid nodes spanned by the embedded frames, panels "Voiced" and "Unvoiced"]

Nodal-Density Distribution

[Figure: probability distributions of nodal density for voiced and unvoiced speech, clean speech (x-axis: Nodal-Density, y-axis: Probability)]

Nodal-Density Distribution

[Figure: probability distributions of nodal density for voiced and unvoiced speech, speech + 15 dB pink noise (x-axis: Nodal-Density, y-axis: Probability)]

Nodal-Density Distribution

[Figure: probability distributions of nodal density for voiced and unvoiced speech, speech + 15 dB white noise (x-axis: Nodal-Density, y-axis: Probability)]

Filtering

Moving Average Filter

Order, M = 10
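A minimal sketch of a moving-average filter with M = 10. Two points are assumptions: "order M" is read as an M-point average (some texts use order for M + 1 taps), and the slides do not say whether the speech signal or the per-frame nodal-density track is filtered, so the function simply smooths any sequence. The function name is hypothetical.

```python
import numpy as np


def moving_average(x, M=10):
    """Causal M-point moving average, y[n] = (1/M) * sum_{k=0}^{M-1} x[n-k], zero initial state."""
    x = np.asarray(x, dtype=float)
    # full convolution has len(x) + M - 1 samples; keep the first len(x) (causal output)
    return np.convolve(x, np.ones(M) / M)[: len(x)]


if __name__ == "__main__":
    y = moving_average(np.ones(20), M=10)
    print(y[:3])  # ramps up 0.1, 0.2, 0.3 while the window fills
    print(y[9:])  # all ones once the window is full
```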

Nodal-Density Distributions after Filtering

[Figure: two panels of nodal-density distributions for voiced and unvoiced speech, clean speech (x-axis: Nodal Density, y-axis: Probability)]

Nodal-Density Distributions after Filtering

[Figure: two panels of nodal-density distributions for voiced and unvoiced speech, speech + 15 dB pink noise (x-axis: Nodal Density, y-axis: Probability)]

Nodal-Density Distributions after Filtering

[Figure: two panels of nodal-density distributions for voiced and unvoiced speech, speech + 15 dB white noise (x-axis: Nodal Density, y-axis: Probability)]

Results

[Figure: bar chart of hits minus false alarms for voiced speech (0 to 70) under clean, 15 dB pink noise and 15 dB white noise conditions, comparing ND and ND_Filt]

Results (cont’d)

[Figure: bar chart of hits minus false alarms for unvoiced speech (y-axis: Percent, 0 to 70) under clean, 15 dB pink noise and 15 dB white noise conditions, comparing ND and ND_Filt]

Proposed Research

Usable/Unusable Classification

Embedded Usable and Unusable Speech Frames with Grids

[Figure: embedded co-channel speech with node grids, panels "Embedded Co-channel Speech of 10dB TIR with Grids" and "Embedded Co-channel Speech of 30dB TIR with Grids"]

Nodes Spanned by Embedded Usable and Unusable Speech Frames

[Figure: grid nodes spanned by embedded co-channel speech frames; each panel is titled "Nodes Spanned by Embedded Co-channel Speech of 30dB TIR"]

Preliminary Results

[Figure: ROC Curve for Usable Speech Detection Using the Nodal Density Measure (x-axis: False Alarms, y-axis: Hits, both 0 to 1)]
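An ROC curve of this kind is traced by sweeping a decision threshold over the per-frame scores. A minimal sketch, assuming frames are declared usable when their score is at or above the threshold; whether usable frames have higher or lower nodal density is not stated in the slides, so negate the scores for the opposite direction. The function name is hypothetical.

```python
import numpy as np


def roc_points(scores, labels):
    """False-alarm and hit rates for every threshold, deciding 'usable' when score >= threshold."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)  # True = reference usable frame
    if labels.all() or not labels.any():
        raise ValueError("need both usable and unusable reference frames")
    thresholds = np.unique(scores)[::-1]  # descending, so rates grow along the curve
    hits = np.array([np.mean(scores[labels] >= t) for t in thresholds])
    fas = np.array([np.mean(scores[~labels] >= t) for t in thresholds])
    # prepend the (0, 0) point that corresponds to a threshold above every score
    return np.concatenate([[0.0], fas]), np.concatenate([[0.0], hits])


if __name__ == "__main__":
    fa, hit = roc_points([0.9, 0.8, 0.3, 0.1], [True, True, False, False])
    print(fa)   # [0.  0.  0.  0.5 1. ]
    print(hit)  # [0.  0.5 1.  1.  1. ]
```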

Summary

Speech → Nonlinear Embedding → Difference-Mean Comparison → V/UV Classification

Speech → Nonlinear Embedding → Nodal Density → V/UV Classification and Usable/Unusable Classification

Future Proposed Research

Determine the optimum filter for nodal density-based voiced/unvoiced classification

Develop the nodal density measure for usable/unusable classification

Investigate whether the two features (DMC and nodal density) carry complementary information for voiced/unvoiced classification

Perform decision-level fusion of both features

If you understood this presentation …

please ask QUESTIONS!!!