What does speech “look” like

to an Automatic Speech Recognition System?

Jiang Wujiang.wu@binghamton.eduElectrical Engineering Department

Say if you are only allowed to use 39 values to represent a speech seg of 1 sec long…

What are “speech features?” What are good features?

◦ Discriminative

◦ “Curse of Dimensionality”

How do we extract features from speech?

From both time and frequency domain…

For each frame of pre-recorded speech, we try to extract the feature as to compress its spectrum.

Frequency Domain Features : “Static” Features

0 2 4 6 8-0.2

Frequency [kHz]

BV1BV2

-60 -40 -20 0 20 40 60-0.25

Time [ms]

BV1BV2

Time Domain Features : Trajectories of Static Spectral Features over Time Most recent research has

shown that spectral trajectories, over time, also play an important role in ASR.

Thus, we also want to let computers see what happens over time, about the center of each static feature.

So finally to the ASR system, the speech features look like…

Time (Sec)

Original Spectrogram

0.5 0.6 0.7 0.8 0.9 1 1.1 1.2 1.3 1.4 1.50

Time (Sec)

Rebuilt Spectrogram

0.5 0.6 0.7 0.8 0.9 1 1.1 1.2 1.3 1.4 1.50

Other Features for Future Study… Pitch Contour :

◦ Strongly related to “tones”◦ Very popular feature type

for tonal languages: Mandarin, Cantonese, Some of Korean dialects, etc.

“Perceptual Features:◦ Analyze speech signal as to

how human’s auditory system “perceptually” process sound

◦ Frequency resolution and time resolution both depend on frequency and time..

0 0.5 1 1.5 2 2.5 3-4000

4000Speech waveform

Time (seconds)

0 0.5 1 1.5 2 2.50

300Pitch

Time (seconds)

0.5 1 1.5 2 2.5 30

8000Spectrogram

Our Speech Lab Dr. Montri Karnjanadecha (Force Alignment) Chandra Vootkuri (Ph.D.) (Landmark Theory) Brian Wong (M.S.) (Freq. Non-linearity) Andrew Hwang(M.S.) (Feature Transform)

Our Current Projects Project 1: To create an open source multi- language

audio database for spoken language processing applications.

Project 2: To understand tonal languages.

◦Signal processing (A/D)◦Probability theory, pattern recognition and

machine learning◦Understanding of human auditory system/

linguistic/musicality will be a bonus!

Background you need..

Speech Research is interesting!

◦ Pronunciation therapy

◦ Singing voice processing

◦ Hearing aids

Tons of interesting applications!!

Talk to a public computer in your real life…

◦ Not just the Microsoft speech-text software on your PC..

Thank you!Questions?

What does speech “look” like

Documents

Transcript of What does speech “look” like

What Does Failure Look Like?

WHOSE COUNT? What does Edmond Dantes look like? What does Edmond Dantes look like? Does the Count of Monte Cristo look the same? Does the Count of Monte.

What does plagiarism look like?

Questions: What do you look like? What does your friend look like? What do they look like? What does your father look like? How about your mother?

What does he look like ?. Unit 7 What does he look like?

Look Board what does drowning look like

What Does Green Look Like

What Does HUNGER Look Like?

“Equilibrium” What does it mean? What word does it look like? What does it mean? What word does it look like?

What does success look like?

HOW DOES A HONEYBEE LOOK?

What Does It LOOK Like Online?

What does wellness look like??

Does it look difficult?

WHAT DOES MUSIC LOOK LIKE?

What does she look like

SharePoint: what does good look like?

What Does ‘Beautiful’ Look Like?

What does speech “look” like

What does he look like?