HIWIRE meeting
ITC-irst
Activity report
Marco Matassoni, Piergiorgio Svaizer
March 9-10, 2006, Torino
Outline
• Beamforming and Adaptive Noise Cancellation
• Environmental Acoustics Estimation
• Audio-Video data collection
• Multi-channel pitch estimation
• Fixed-platform prototype acquisition module
Beamforming: D&S
The availability of multi-channel signals makes it possible to selectively capture the desired source:
$$\tilde{s}(t) = \frac{1}{M} \sum_{i=1}^{M} s_i(t - \tau_i)$$
Issues:
• reliable estimation of the TDOAs (time differences of arrival);
Method:
• CSP (Crosspower Spectrum Phase) analysis over multiple frames
Advantages:
• robustness
• reduced computational load
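The TDOA estimation and delay-and-sum steps can be illustrated with a minimal sketch. It is not the implementation used in the project: the function names, the frame length, the lag search range and the toy signals are all assumptions made for illustration, delays are restricted to whole samples, and a naive DFT stands in for an FFT so that the example has no dependencies.

```python
import cmath
import math
import random


def dft(x, inverse=False):
    """Naive O(N^2) DFT; an FFT routine would replace this in practice."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        out.append(acc / n if inverse else acc)
    return out


def csp(ref, sig):
    """Phase-only circular cross-correlation of one pair of frames."""
    cross = [s * r.conjugate() for r, s in zip(dft(ref), dft(sig))]
    # Keep only the phase of the cross-spectrum (small floor avoids 0/0).
    white = [c / max(abs(c), 1e-12) for c in cross]
    return [v.real for v in dft(white, inverse=True)]


def estimate_lag(ref, sig, frame_len=64, max_lag=8):
    """Accumulate the CSP over all full frames; return the lag of sig behind ref."""
    total = [0.0] * frame_len
    for start in range(0, len(ref) - frame_len + 1, frame_len):
        c = csp(ref[start:start + frame_len], sig[start:start + frame_len])
        total = [a + b for a, b in zip(total, c)]
    # Negative lags sit at the end of the circular correlation.
    return max(range(-max_lag, max_lag + 1), key=lambda lag: total[lag % frame_len])


def delay_and_sum(channels, taus):
    """Average the channels after delaying channel i by taus[i] samples."""
    n = len(channels[0])
    out = []
    for t in range(n):
        # Samples requested from outside the record count as zero.
        acc = sum(ch[t - tau] for ch, tau in zip(channels, taus) if 0 <= t - tau < n)
        out.append(acc / len(channels))
    return out


random.seed(0)
source = [random.gauss(0.0, 1.0) for _ in range(512)]
true_lags = [0, 2, 5, 3]                       # hypothetical arrival lags in samples
mics = [[0.0] * lag + source[:len(source) - lag] for lag in true_lags]

est_lags = [estimate_lag(mics[0], m) for m in mics]
latest = max(est_lags)
taus = [latest - lag for lag in est_lags]      # delay every channel to the latest arrival
enhanced = delay_and_sum(mics, taus)
```

Accumulating the CSP over several frames before picking the peak is what gives the lag estimate its robustness in this sketch; the beamformed output is the source delayed by the largest arrival lag.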
D&S with MarkIII
Test set:
• set N1_SNR0 of MC-TIDIGITS (cockpit noise), MarkIII channels
• clean models, trained on original TIDIGITS
Results (WRR [%]):
C_1 38.5
C_32 50.8
DS_C8 79.9
DS_C16 83.0
DS_C32 85.3
DS_C64 85.4
Adaptive Noise Cancellation
A remote microphone can be used as reference for noise estimation:
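[Block diagram: the (cockpit) noise reaches the primary channel through an equivalent noise path filter and adds to the (beamformed) speech, giving the noisy speech. The adaptive filter turns the noise reference into filtered noise, which is subtracted from the noisy speech to yield the denoised speech.]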
NLMS
The tested algorithm is Normalized Least Mean Squares (NLMS): it iteratively estimates a FIR filter that minimizes the difference between the primary channel and the filtered reference.
We implemented two algorithms:
• time domain
• frequency domain (subband)
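As an illustration of the time-domain variant only, the following is a minimal NLMS noise canceller. The filter length, step size, toy "speech" sinusoid and noise-path coefficients are assumptions chosen for the example, not values from the experiments, and the subband variant is not shown.

```python
import math
import random


def nlms_cancel(primary, reference, taps=8, mu=0.1, eps=1e-8):
    """Time-domain NLMS canceller.

    Returns the error signal e[n] = primary[n] - w . x[n], which serves as the
    denoised output, together with the final filter weights.
    """
    w = [0.0] * taps
    buf = [0.0] * taps                 # most recent reference samples, newest first
    out = []
    for d, x in zip(primary, reference):
        buf = [x] + buf[:-1]
        y = sum(wi * xi for wi, xi in zip(w, buf))        # filtered noise
        e = d - y                                          # denoised sample
        norm = eps + sum(xi * xi for xi in buf)            # reference power in the buffer
        w = [wi + mu * e * xi / norm for wi, xi in zip(w, buf)]
        out.append(e)
    return out, w


random.seed(1)
n = 4000
speech = [0.5 * math.sin(2 * math.pi * 0.01 * t) for t in range(n)]   # toy stand-in for speech
noise = [random.gauss(0.0, 1.0) for _ in range(n)]                    # reference microphone
path = [0.6, -0.3, 0.1]                                               # hypothetical noise path
leak = [sum(h * noise[t - k] for k, h in enumerate(path) if t - k >= 0) for t in range(n)]
primary = [s + v for s, v in zip(speech, leak)]

denoised, weights = nlms_cancel(primary, noise)
```

Dividing the update by the reference power makes the step size insensitive to the noise level, which is the "normalized" part of the algorithm; once the filter has converged, its first weights approximate the noise path.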
D&S + ANC
Test set:
• set N1_SNR0 of MC-TIDIGITS (cockpit noise), MarkIII channels
• clean models, trained on original TIDIGITS
Results (WRR [%]; T = time-domain ANC, F = frequency-domain ANC):
C_32 (T) 64.7
C_32 (F) 72.4
DS_C64 (T) 81.8
DS_C64 (F) 88.4
Acoustics estimation
Idea:
Realistically simulate an environment (and its noise)
Method:
• Measure several impulse responses in an environment with multi-channel equipment (by reproducing chirp signals), preserving relative amplitudes and mutual delays;
• Generate appropriate noisy signals starting from clean data.
The acoustic models derived in this way perform better in the given environment, also on real data.
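The generation of noisy signals from clean data can be sketched as follows: each simulated channel is the clean signal convolved with that channel's impulse response, plus recorded noise scaled to a target SNR. The impulse responses, noise signals and the 0 dB target below are placeholders invented for the example, and setting one common noise gain from the first channel is an assumption made here to keep relative levels between channels intact.

```python
import math
import random


def convolve(x, h):
    """Full linear convolution of signal x with impulse response h."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n, xn in enumerate(x):
        for k, hk in enumerate(h):
            y[n + k] += xn * hk
    return y


def power(x):
    """Mean squared value of a signal."""
    return sum(v * v for v in x) / len(x)


def contaminate(clean, impulse_responses, noises, snr_db):
    """Simulate each microphone channel: reverberate the clean signal, then add noise.

    A single gain is applied to all noise channels (set from the first channel),
    so the relative levels between channels are preserved.
    """
    reverbed = [convolve(clean, h)[:len(clean)] for h in impulse_responses]
    target = power(reverbed[0]) / (10.0 ** (snr_db / 10.0))
    gain = math.sqrt(target / power(noises[0]))
    return [[r + gain * v for r, v in zip(rev, noi)]
            for rev, noi in zip(reverbed, noises)]


random.seed(2)
clean = [random.gauss(0.0, 1.0) for _ in range(1000)]        # stand-in for a clean utterance
irs = [[1.0, 0.0, 0.3], [0.0, 0.0, 0.8, 0.0, 0.2]]           # hypothetical measured responses
noises = [[random.gauss(0.0, 1.0) for _ in range(1000)] for _ in irs]
noisy = contaminate(clean, irs, noises, snr_db=0.0)
```

Because the second response starts later and is weaker than the first, the simulated channels keep the mutual delay and relative amplitude that the measurement procedure is meant to preserve.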
Audio–Video Data Collection
Idea:
In a noisy environment, exploit additional features from video data
(collaboration with NTUA and TUC)
Design of AV corpus:
• Task: English connected digits, HIWIRE commands/keywords
• Channels: 4 audio, 3 video
• Environment: acoustically-treated room + noise diffusion
Audio–Video Setup
[Figure: recording setup, with cockpit noise diffused by loudspeakers and a marked distance of 70-80 cm]
Audio
4 omnidirectional PZM Shure microphones, 16 kHz/16 bits
background noise diffused by 2 loudspeakers
Video
Webcam: 640x480, 30 fps (color), Unix timestamps
Stereoscopic camera pair: 640x480, 30 fps (black and white) or 15 fps (color), perfectly synchronous
Current data sets
• 8 speakers / connected digits
• 2 speakers / HIWIRE keyword lists
Fixed prototype acquisition device
Hardware platform:
8 Shure microphones + RME Hammerfall
Software environment:
Linux, ALSA driver
Acquisition module:
• synchronously acquires multiple channels (8);
• writes (to its standard output or to a file) the enhanced signal + additional information/features (start/end-of-speech hypotheses, voiced/unvoiced, pitch, …)
Multi-channel pitch analysis
The basic principle is that we can exploit many observations of the same speech process. Once the speaker has been located, we can take into account the different propagation times to the microphones and perform a time alignment.
Pitch analysis can then be performed using adjacent time intervals extracted from different microphone signals.
A Multichannel WAUTOC Method
Basic correlation techniques: AMDF, AUTOC, WAUTOC
WAUTOC is computed for each channel and summed over the M channels.
For a given frame:
$$f(k) = \sum_{i=1}^{M} w_i \, \mathrm{wautoc}_i(k)$$
where k is the lag; the pitch estimate is taken at the maximum of f(k).
Issues:
• Weights w_i may represent the channel reliability;
• Possible intraframe smoothing of the resulting fundamental frequency contour, which could improve the overall accuracy.
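A minimal sketch of the weighted multichannel sum follows. It assumes that WAUTOC is the autocorrelation divided by the AMDF plus a small constant, that the frames are already time-aligned, and that all channels get equal weights; the function names, the constant, the search range and the synthetic voiced frame are illustrative choices, not taken from the slides.

```python
import math
import random


def autoc(frame, lag):
    """Biased autocorrelation (fewer terms at long lags, hence a mild taper)."""
    return sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag)) / len(frame)


def amdf(frame, lag):
    """Average magnitude difference function at the given lag."""
    m = len(frame) - lag
    return sum(abs(frame[n] - frame[n + lag]) for n in range(m)) / m


def wautoc(frame, lag, eps=1e-2):
    """Autocorrelation weighted by the inverse AMDF (assumed WAUTOC definition)."""
    return autoc(frame, lag) / (amdf(frame, lag) + eps)


def multichannel_pitch(frames, weights, fs, f_min=60.0, f_max=400.0):
    """Return the F0 (Hz) whose lag maximizes the weighted sum of per-channel WAUTOCs."""
    lag_lo = int(fs / f_max)
    lag_hi = int(fs / f_min)

    def f(lag):
        return sum(w * wautoc(fr, lag) for w, fr in zip(weights, frames))

    best = max(range(lag_lo, lag_hi + 1), key=f)
    return fs / best


random.seed(3)
fs, f0, n = 16000, 200.0, 640           # 40 ms frame at 16 kHz


def voiced(t):
    """Synthetic voiced sample: fundamental plus two harmonics."""
    return sum(a * math.sin(2 * math.pi * k * f0 * t / fs)
               for k, a in ((1, 1.0), (2, 0.5), (3, 0.25)))


# Four aligned channels observing the same signal with independent noise.
frames = [[voiced(t) + random.gauss(0.0, 0.05) for t in range(n)] for _ in range(4)]
estimate = multichannel_pitch(frames, [0.25] * 4, fs)
```

Using the biased autocorrelation gives shorter lags a slight advantage, which in this sketch helps the true period win over its multiples; the weights could instead be set from a per-channel reliability measure.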
Video example: distant-talking speech recognition
Video example: multi-channel pitch estimation
Forthcoming activities
• more effective combination of beamforming and ANC;
• also test ANC before D&S beamforming;
• test post-filtering after D&S;
• audio-video collection: improved audio/video synchronization would be advisable;
• audio-video collection: select the best balance between quality and frame rate;
• acoustically characterize the target environment (prototype);
• integrate the selected features in the multi-channel front-end.