Computational Audition at AFRL/HE: Past, Present, and Future

Dr. Timothy R. Anderson, Human Effectiveness Directorate, Air Force Research Laboratory


Transcript of Computational Audition at AFRL/HE: Past, Present, and Future

Page 1: Computational Audition at AFRL/HE: Past, Present, and Future

Computational Audition at AFRL/HE:

Past, Present, and Future

Dr. Timothy R. Anderson

Human Effectiveness Directorate

Air Force Research Laboratory

Page 2: Computational Audition at AFRL/HE: Past, Present, and Future

Biologically Based Signal Processing

• Research, development and applications of:
  – Biologically based algorithms
  – Perceptually relevant features
  – Human-centered metrics and models
  – to improve robustness of speech processing systems

[Figure: Speech Technologies at the center, surrounded by application areas: JAOC; Sensor-decision maker-shooter; Future JAOC Command & Control; Combat Plans; AWACS; Chem-bio Defense Environment]

Page 3: Computational Audition at AFRL/HE: Past, Present, and Future

Why Is This Area Important?

• Present signal processing systems (e.g., speech and speaker recognition, speech coding) are not robust in adverse military environments.

• Biological principles offer potential to provide improved performance in military environments.

Page 4: Computational Audition at AFRL/HE: Past, Present, and Future

Biologically Based Signal Processing

Technical Challenges
• Identification and modeling of features and processes used by biological systems
• Incorporation of those key features and processes into computationally efficient algorithms and structures

Approach
• Develop psychoacoustic testing procedures
• Characterize key features and processes
• Develop human-centered models and metrics
• Implement computationally efficient algorithms
• Provide support to operational test and warfighting exercises to evaluate system utility

[Figure: grid with rows Dominant, Strong, Favorable, Tenable, Weak and columns Embryonic, Growth, Mature, Aging; the plotted position is not recoverable]

Page 5: Computational Audition at AFRL/HE: Past, Present, and Future

Research Areas

• Cockpit Speech Recognition
• Robust Speech Recognition
  – Monaural Speech Recognition
  – Binaural Speech Recognition
  – Auditory Model Front-ends
• Speaker Recognition/Verification
  – Biologically Based Speaker ID
  – Channel Robustness
  – Speaker Recognizability Test

Page 6: Computational Audition at AFRL/HE: Past, Present, and Future

Phoneme Classification

• Kohonen Self-Organizing Feature Map
  – 16 × 16
• 10-speaker database (TIMIT)
• 10 sentences/speaker
• Leave-one-out method (per speaker)
• Features calculated with
  – 16 ms window
  – 5 ms frame step
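The window and step sizes above fix how many feature vectors each utterance yields. A minimal sketch of that arithmetic follows, assuming the 16 kHz sampling rate at which TIMIT is distributed and assuming that trailing partial frames are dropped; the slide states neither. The function name `frame_bounds` is hypothetical.

```python
def frame_bounds(num_samples, sample_rate_hz=16000, window_ms=16.0, step_ms=5.0):
    """Return (start, end) sample indices of every full analysis frame."""
    # Assumed 16 kHz audio: a 16 ms window is 256 samples, a 5 ms step is 80.
    window = int(round(sample_rate_hz * window_ms / 1000.0))
    step = int(round(sample_rate_hz * step_ms / 1000.0))
    if num_samples < window:
        return []
    # Assumption: frames that would run past the end of the signal are dropped.
    count = 1 + (num_samples - window) // step
    return [(i * step, i * step + window) for i in range(count)]


frames = frame_bounds(16000)  # one second of audio
print(len(frames), frames[0], frames[-1])  # 197 (0, 256) (15680, 15936)
```

With these settings consecutive frames overlap by 11 ms, so one second of speech produces roughly 200 feature vectors for the map.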

Page 7: Computational Audition at AFRL/HE: Past, Present, and Future

TRADITIONAL VS. AUDITORY MONAURAL

[Figure: phoneme recognition rate (%, axis 0 to 50) versus signal-to-noise ratio (1, 5, 10, 15, 20, 32 dB and clean); curves: AIM, MFCC. Plot data not recoverable.]

Page 8: Computational Audition at AFRL/HE: Past, Present, and Future

Binaural Speech Recognition

• Past
• Present
• Future

Page 9: Computational Audition at AFRL/HE: Past, Present, and Future

Binaural Speech Recognition

• Stereausis
• Cocktail Party Processor
• BAIM
• BINAP

Page 10: Computational Audition at AFRL/HE: Past, Present, and Future

EXPERIMENT SETUP

[Figure: experiment setup showing the positions of the sound source and the noise source; layout not recoverable]

Page 11: Computational Audition at AFRL/HE: Past, Present, and Future

MONAURAL VS. BINAURAL COCKTAIL PARTY PROCESSOR

[Figure: phoneme recognition rate (%, axis 0 to 50) versus signal-to-noise ratio (1, 5, 10, 15, 20, 32 dB and clean); curves: CPP, MONO. Plot data not recoverable.]

Page 12: Computational Audition at AFRL/HE: Past, Present, and Future

MONAURAL VS. BINAURAL AUDITORY IMAGE MODEL

[Figure: phoneme recognition rate (%, axis 0 to 50) versus signal-to-noise ratio (1, 5, 10, 15, 20, 32 dB and clean); curves: BAIM, AIM. Plot data not recoverable.]

Page 13: Computational Audition at AFRL/HE: Past, Present, and Future

BINAURAL

[Figure: phoneme recognition rate (%, axis 0 to 50) versus signal-to-noise ratio (1, 5, 10, 15, 20, 32 dB and clean); curves: CPP, BAIM. Plot data not recoverable.]

Page 14: Computational Audition at AFRL/HE: Past, Present, and Future

MONAURAL

[Figure: phoneme recognition rate (%, axis 0 to 50) versus signal-to-noise ratio (1, 5, 10, 15, 20, 32 dB and clean); curves: AIM, MONO. Plot data not recoverable.]

Page 15: Computational Audition at AFRL/HE: Past, Present, and Future

BAIM VS. CPP-AIM

[Figure: phoneme recognition rate (%, axis 0 to 50) versus signal-to-noise ratio (1, 5, 10, 15, 20, 32 dB and clean); curves: BAIM, AIM, CPP-AIM. Plot data not recoverable.]

Page 16: Computational Audition at AFRL/HE: Past, Present, and Future

COINCIDENCE

[Figure: phoneme recognition rate (%, axis 0 to 50) versus signal-to-noise ratio (1, 5, 10, 15, 20, 32 dB and clean); curves: BAIM, BINAP. Plot data not recoverable.]

Page 17: Computational Audition at AFRL/HE: Past, Present, and Future

MONAURAL, BINAURAL AND TRADITIONAL

[Figure: phoneme recognition rate (%, axis 0 to 50) versus signal-to-noise ratio (1, 5, 10, 15, 20, 32 dB and clean); curves: CPP, BAIM, AIM, MONO, MFCC, BINAP, CPP-AIM. Plot data not recoverable.]

Page 18: Computational Audition at AFRL/HE: Past, Present, and Future

Binaural Speech Recognition

RESULTS
BINAURAL AUDITORY MODEL PROVIDES BETTER REPRESENTATION THAN TRADITIONAL TECHNIQUES:

TASK: PHONEME RECOGNITION
SPEECH: LOW TO HIGH SNR
RESULTS: 7-12 dB BINAURAL ADVANTAGE

Page 19: Computational Audition at AFRL/HE: Past, Present, and Future

Binaural Speech Recognition

• Past
• Present
  – No Current Work
• Future

Page 20: Computational Audition at AFRL/HE: Past, Present, and Future

Binaural Speech Recognition

• Past
• Present
• Future
  – Implement binaural ASR system
  – Investigate further binaural fusion mechanisms
  – Meeting room data
  – Implement binaural system using AIM chips

Page 21: Computational Audition at AFRL/HE: Past, Present, and Future

Auditory Model Front Ends

• Past
• Present
• Future

Page 22: Computational Audition at AFRL/HE: Past, Present, and Future

Auditory Model Front Ends

• Tanner Research “Analog Speech Recognition”
  – Implementation of AIM
  – 56-channel analog filter bank
  – Single SBUS board
  – 1.5 × real-time

Page 23: Computational Audition at AFRL/HE: Past, Present, and Future

Auditory Model Front Ends

• AFIT
  – Designed digital implementation
    • Middle ear, BMM, adaptive thresholding
  – 32 channels per chip
  – 300 Hz to 7 kHz
  – 44.1 kHz sampling rate
  – 2 chips provide 64 channels in real-time
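The channel count and frequency range above are enough to sketch where the 64 filter center frequencies might fall. The slide does not say how the channels are spaced. The sketch below assumes equal spacing on the ERB-rate scale of Glasberg and Moore, a common choice for auditory filter banks, and should not be read as the AFIT design. All function names are hypothetical.

```python
import math


def hz_to_erb_rate(f_hz):
    """ERB-rate (Glasberg and Moore, 1990) for a frequency in Hz."""
    return 21.4 * math.log10(1.0 + 0.00437 * f_hz)


def erb_rate_to_hz(erb):
    """Inverse of hz_to_erb_rate."""
    return (10.0 ** (erb / 21.4) - 1.0) / 0.00437


def center_frequencies(num_channels=64, low_hz=300.0, high_hz=7000.0):
    """Center frequencies equally spaced on the ERB-rate scale (assumed spacing)."""
    lo, hi = hz_to_erb_rate(low_hz), hz_to_erb_rate(high_hz)
    step = (hi - lo) / (num_channels - 1)
    return [erb_rate_to_hz(lo + k * step) for k in range(num_channels)]


cfs = center_frequencies()            # 2 chips x 32 channels
per_chip = [cfs[:32], cfs[32:]]       # one possible split across the two chips
nyquist_hz = 44100 / 2.0
print(round(cfs[0]), round(cfs[-1]), max(cfs) < nyquist_hz)
```

Under this assumption the channels are packed more densely at low frequencies than at high ones, and the 7 kHz top channel sits well below the 22.05 kHz Nyquist limit implied by the 44.1 kHz sampling rate.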

Page 24: Computational Audition at AFRL/HE: Past, Present, and Future

Auditory Model Front Ends

• Past
• Present
  – Single board system designed and prototyped (USB)
  – Current chip design undergoing debug
  – Second fabrication run this fall
• Future

Page 25: Computational Audition at AFRL/HE: Past, Present, and Future

Auditory Model Front Ends

• Past
• Present
• Future
  – Debug and verify chip fabrication
  – Debug PC-based real-time auditory model front end
  – Implement complete end-to-end auditory ASR
  – Investigate feedback mechanisms in auditory model for ASR

Page 26: Computational Audition at AFRL/HE: Past, Present, and Future

Biologically Based SID

• Past
• Present
• Future

Page 27: Computational Audition at AFRL/HE: Past, Present, and Future

Biologically Based SID

• Auditory Models Investigated
  – Payton’s Auditory Model (PAM)
  – Auditory Image Model (AIM)
• VQ Codebook used to model speaker
• 37 Speakers from TIMIT (dr1, dr2; 12F, 25M)
  – MFCC 94%
  – PAM 67%
  – AIM 91%
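The VQ codebook approach named above can be sketched as follows: each speaker is represented by a small set of codewords, and a test utterance is assigned to the speaker whose codebook quantizes its frames with the least average distortion. This is a generic illustration, not the system used for the results on this slide. The codebooks, the two-dimensional feature vectors, and the speaker labels are all hypothetical; real codebooks would be trained (for example with k-means) on MFCC, PAM, or AIM features.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def average_distortion(frames, codebook):
    """Mean distance from each frame to its nearest codeword."""
    return sum(min(sq_dist(f, c) for c in codebook) for f in frames) / len(frames)


def identify(frames, codebooks):
    """Return the speaker whose codebook gives the lowest average distortion."""
    return min(codebooks, key=lambda spk: average_distortion(frames, codebooks[spk]))


# Hypothetical toy codebooks with two codewords per speaker.
codebooks = {
    "spk_a": [(0.0, 0.0), (1.0, 1.0)],
    "spk_b": [(5.0, 5.0), (6.0, 4.0)],
}
test_frames = [(0.9, 1.2), (0.1, -0.2), (1.1, 0.8)]
print(identify(test_frames, codebooks))  # spk_a
```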

Page 28: Computational Audition at AFRL/HE: Past, Present, and Future

Biologically Based SID

• Past
• Present
• Future

Page 29: Computational Audition at AFRL/HE: Past, Present, and Future

Biologically Based SID

• Using perceptual features
  – Formants, formant bandwidths, and pitch
• Voiced frames
• Using GMM classifier
• Conducting experiments on larger databases
  – Switchboard
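A GMM classifier of the kind listed above scores a test utterance against one mixture model per speaker and picks the speaker with the highest likelihood. The sketch below assumes diagonal covariances and already-trained models, neither of which the slide states. The two-dimensional features (labeled here as pitch and first formant in Hz), the model parameters, and the speaker labels are hypothetical.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at x."""
    return -0.5 * sum(math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def gmm_log_likelihood(frames, gmm):
    """Average per-frame log-likelihood; gmm is a list of (weight, mean, var)."""
    total = 0.0
    for x in frames:
        logs = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
        peak = max(logs)  # log-sum-exp trick for numerical stability
        total += peak + math.log(sum(math.exp(l - peak) for l in logs))
    return total / len(frames)


def identify(frames, speaker_gmms):
    """Return the speaker whose GMM gives the highest average log-likelihood."""
    return max(speaker_gmms,
               key=lambda s: gmm_log_likelihood(frames, speaker_gmms[s]))


# Hypothetical single-component models over (pitch Hz, first formant Hz).
speaker_gmms = {
    "spk_a": [(1.0, (120.0, 500.0), (100.0, 2500.0))],
    "spk_b": [(1.0, (210.0, 650.0), (100.0, 2500.0))],
}
voiced_frames = [(118.0, 520.0), (125.0, 480.0)]
print(identify(voiced_frames, speaker_gmms))  # spk_a
```

Only voiced frames are scored here because pitch and formants are defined only where voicing is present, which matches the "Voiced frames" item on the slide.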

Page 30: Computational Audition at AFRL/HE: Past, Present, and Future

Biologically Based SID

[Figure: plot with curves labeled "MFCCs, no Deltas, no CMS", "F0 Base", and "MFCCs, no CMS". Plot data not recoverable.]

Page 31: Computational Audition at AFRL/HE: Past, Present, and Future

Biologically Based SID

[Figure: plot with curves labeled "MFCCs, no Deltas, no CMS", "F0 Base", and "MFCCs, no CMS". Plot data not recoverable.]

Page 32: Computational Audition at AFRL/HE: Past, Present, and Future

Biologically Based SID

[Figure: plot with curves labeled "F0 Base", "MFCCs, no Deltas, no CMS", and "MFCCs, no CMS". Plot data not recoverable.]

Page 33: Computational Audition at AFRL/HE: Past, Present, and Future

Biologically Based SID

[Figure: plot with curves labeled "MFCCs, no Deltas, no CMS", "F0 Base", and "MFCCs, no CMS". Plot data not recoverable.]

Page 34: Computational Audition at AFRL/HE: Past, Present, and Future

Biologically Based SID

• Performance isn’t the best, but this feature set…
  – Uses only 9 features versus 19–38 for MFCCs
  – Hasn’t been as heavily researched as MFCCs

Page 35: Computational Audition at AFRL/HE: Past, Present, and Future

Biologically Based SID

• Determine reasons for performance differences between various databases
• Channel & score normalizations
• Pitch-synchronous features
• Closed-phase analysis
• Glottal model features
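The result plots on the preceding slides are labeled "no CMS", so cepstral mean subtraction is one plausible candidate for the channel normalization listed above; the slide does not say which normalizations are planned. As a minimal sketch, CMS subtracts the per-utterance mean of each cepstral coefficient, which cancels a fixed channel offset in the cepstral domain. The function name and the toy values are hypothetical.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-utterance mean of each coefficient from every frame."""
    dims = len(frames[0])
    means = [sum(f[d] for f in frames) / len(frames) for d in range(dims)]
    return [[f[d] - means[d] for d in range(dims)] for f in frames]


# Toy utterance with two coefficients per frame.
clean = [[1.0, 2.0], [3.0, 6.0]]
# The same utterance through a channel that adds a constant cepstral offset.
offset = [10.0, -5.0]
shifted = [[c + o for c, o in zip(f, offset)] for f in clean]

print(cepstral_mean_subtraction(clean))    # [[-1.0, -2.0], [1.0, 2.0]]
print(cepstral_mean_subtraction(shifted))  # [[-1.0, -2.0], [1.0, 2.0]]
```

Both calls give the same output, which is the point of the technique: a stationary channel that shifts every frame by the same amount leaves the normalized features unchanged.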

Page 36: Computational Audition at AFRL/HE: Past, Present, and Future

Biologically Based SID

Page 37: Computational Audition at AFRL/HE: Past, Present, and Future

Biologically Based SID

• Past
• Present
• Future

Page 38: Computational Audition at AFRL/HE: Past, Present, and Future

Biologically Based SID

• Investigate other auditory based features
  – Vocal agitation
  – Formants, formant bandwidths, and pitch calculated from the auditory model
  – Auditory model features
• Conduct experiments on other databases
  – Broadcast news
  – Military training exercises

Page 39: Computational Audition at AFRL/HE: Past, Present, and Future

Speaker Recognizability Test

• Past
• Present
• Future

Page 40: Computational Audition at AFRL/HE: Past, Present, and Future

Speaker Recognizability Test

• Dynastat “The Development of a Method for Evaluating and Predicting Speaker Recognizability in Voice Communication Systems”
  – Determined perceptually relevant features
    • Perceptual voice traits (PVT)
    • 21 traits currently identified
  – Developed methodology to measure these traits
    • Human listeners
  – Developed measure to determine loss due to channel
    • Diagnostic Speaker Recognizability Test (DSRT)

Page 41: Computational Audition at AFRL/HE: Past, Present, and Future

Speaker Recognizability Test

• Past
• Present
• Future

Page 42: Computational Audition at AFRL/HE: Past, Present, and Future

Speaker Recognizability Test

• Use perceptual voice traits to identify groups of similar and distinctive speakers
• Determine if current SID systems have difficulty with these similar speakers
• Implementing in-house
  – Web-based listening test for
    • PVT rating
    • DSRT

Page 43: Computational Audition at AFRL/HE: Past, Present, and Future

Speaker Recognizability Test

• Past
• Present
• Future

Page 44: Computational Audition at AFRL/HE: Past, Present, and Future

Speaker Recognizability Test

• Obtain PVT ratings for larger database
  – Switchboard
• Determine acoustic correlates of perceptually relevant features
• Use as features for speaker recognition
• Utilize DSRT for communication system testing

Page 45: Computational Audition at AFRL/HE: Past, Present, and Future

Summary

• Computational Audition offers potential for improved performance in adverse military environments
• Much research still needs to be accomplished
  – Fidelity of model
  – Model feedback pathways
• Computational issues are no longer a limiting factor in performing meaningful experiments

Page 46: Computational Audition at AFRL/HE: Past, Present, and Future

Questions?