Model-Based Fusion of Bone and Air Sensors for Speech Enhancement and Robust Speech Recognition John...

Model-Based Fusion of Bone and Air Sensors for Speech Enhancement and

Robust Speech Recognition

John Hershey, Trausti Kristjansson, Zhengyou Zhang, Alex Acero: Microsoft Research

With help from Zicheng Liu and Asela Gunawardana

Bone Sensor for Robust Speech Recognition

• Small amounts of interfering noise are disasterous for ASR.

• Standard air microphones are susceptible to noise.

• A bone sensor is more resistant to noise, but produces distortion.

• There is not enough data yet to train speech recognition using the bone sensor.

• Enhancement that fuses bone and air sensors allows us to use existing speech recognizers

Sensor Platform

Noise Suppression in Bone Sensor

AudioAudio

Bone signalBone signal

speechspeech

non-speechnon-speechspeechspeech

Relationship between air and bone signals

Articulation-Dependent Relationship Between Air and Bone Signals

High Resolution Witty Enhancement

High-resolution log spectrum takes advantage of harmonics.

Speech Model

Adaptive noise model: (handful of states)

Pre-trained speech model:

(hundreds of states)

Noisy Speech Model

Speech State

Air Signal

Bone Signal

Noisy Air Mic

Noisy Bone Mic

Noise States

Speech Model

Trained on small speech database

Noisy Speech Model

Adapted at test time

Sensor Model

Frequency domain combination of signal and noise:

Relationship between magnitudes:

Relationship between log spectra:

is treated as probabilistic error

Model distribution over sensors is thus:

Inference

Speech posterior:

This is intractable due to nonlinearity. Algonquin: Iterate Laplace method. Gaussian estimate of posterior of speech and noise for each combination of states,

with mean and variance

and state posterior, (see paper for details)

Compute posterior speech mean:

Then resynthesize using lapped transform and noisy phases.

Adaptation

Since noise is unpredictable, we adapt the noise model at inference time, keeping the speech model constant.

The generalized EM algorithm yields the following update equations (see paper for details). The averages over time <>t can be done either in batch or online.

Here, is the posterior state probability, , and are the posterior mean and covariance of the noise given the state.

Air Signal

Bone Signal

Enhanced Signal

Evaluation

• Four subjects reading 41 sentences each of the Wall Street Journal.

• Speaker-dependent models of clean speech training set

• Test set: Same sentences read in same environment with interfering male speaker

• Recognizer: 500K vocabulary (Microsoft)

Parameters

• Audio was 16-bit, 16kHz

• Frames were 50ms with 20ms frame shift.

• Frequency resolution was 256 bins.

• Speech models had 512 states

• Noise models had 2 states

• Noisy log spectra were temporally smoothed prior to processing to reduce jitter.

Test Cases

• Sensor conditions: – Audio only speech model (A)– Audio plus Bone speech model (AB)– AB with state posteriors inferred from bone

signal only (AB+)

• Adaptation conditions– Use speech detection on bone signal to train

noise model (Detect)– Use EM algorithm to adapt noise model (Adapt)

Results

Adaptation Helps

Inferring states from bone mic only

Inferring states from both bone and air mics

Inference only using air mic

Air 57.2%

Bone 28.4%

Original Noisy Signals

Clean: 92.7%

Future Directions

• We have shown a strong potential for model-based noise adaptation in concert with a bone sensor.

• Improve the speed of such systems by employing sequential approximations.

• Incorporate state-conditional correlations between air and bone signals.

• Deal with phase dependencies.

Model-Based Fusion of Bone and Air Sensors for Speech Enhancement and Robust Speech Recognition John...

Documents

Transcript of Model-Based Fusion of Bone and Air Sensors for Speech Enhancement and Robust Speech Recognition John...

DOLAN SPEECH · Title: DOLAN SPEECH Subject: DOLAN SPEECH Keywords

1 6-Text To Speech (TTS) Speech Synthesis Speech Synthesis Concept Speech Naturalness Phone Sequence To Speech –Articulatory Approaches –Concatenative.

Tropospheric aerosol size distributions simulated by three ... · The modal approach assumes the aerosol population can be described by a number of ... Precipitation Rasch and Kristjansson

Section 3 Introduction-1 Freedom of Speech Key Terms pure speech, symbolic speech, seditious speech, defamatory speech, slander, libel Find Out What speech.

Risk assessment of natural hazards at the Icelandic Meteorological Office – an overview Sigrún Karlsdóttir Eiríkur Gíslason Trausti Jónsson Evgenia Ilyinskaya.

Update on CAM and the AMWG. Recent activities and near ...€¦ · Update on CAM and the AMWG. Recent activities and near -term priorities. ... Rasch-Kristjansson (1998) ... -Fix

REPORTED SPEECH INDIRECT SPEECH

The two different parts of speech Speech Production Speech Perception.

WordPress.com · 2016-01-29 · ž8. Maiden speech means- @ First speech @ Middle speech O Maid servant's speech 771 @ Final speech Maiden speech (First speech) I Botany is to plants

Unit 1 Friendship Grammar Direct speech & Indirect speech Grammar Direct speech & Indirect speech.

The Trouble With Ambivalent Emotions Kristjansson Philosophy Journal

1 Speech User Interfaces 2 Outline Motivation for speech UIs Speech recognition UI problems with speech UIs SpeechActs: Guidelines for speech UIs Speech.

Supplementary feeding for improving the health of ... · Systematic Review Summary 5 Elizabeth Kristjansson Damian Francis Vivian Welch et al. Health Services Supplementary feeding

Trausti Baldursson, Ásrún Elmarsdóttir, Kristján Jónasson ... · Mat á verndargildi 18 háhitasvæða Trausti Baldursson, Ásrún Elmarsdóttir, Kristján Jónasson , Olga Kolbrún

The Very Hungry Caterpillar Story Props - Speech-Fun.com · The Very Hungry Caterpillar Story Props page 1 speech-fun.com speech-fun.com speech-fun.com speech-fun.com speech-fun.com

Reported Speech - Direct Speech

speech motion - Speech and Motion Therapy - Speech and ...

Cache Modeling and Optimization using Miniature …...Cache Modeling and Optimization using Miniature Simulations Carl A. Waldspurger CachePhysics, Inc. carl@cachephysics.com Trausti

Leo Kristjansson, Institute of Earth Sciences, University of Iceland Agust Gudmundsson,

Living with volcanos Sigurleifur Kristjansson Isavia Iceland.