Springer Handbook of Speech Processing

Springer Handbook of Speech Processing. Edited by Jacob Benesty, M. M. Sondhi, Yiteng Huang. 1st edition 2007. Book. xxxvi, 1176 pp. ISBN 978 3 540 49128 6. Format (W x L): 19.3 x 24.2 cm. Further subject areas > Computing, Computer Science > Information Processing > Speech Recognition, Speech Processing.

Contents

List of Abbreviations

1 Introduction to Speech Processing
J. Benesty, M. M. Sondhi, Y. Huang
1.1 A Brief History of Speech Processing
1.2 Applications of Speech Processing
1.3 Organization of the Handbook
References

Part A Production, Perception, and Modeling of Speech

2 Physiological Processes of Speech Production
K. Honda
2.1 Overview of Speech Apparatus
2.2 Voice Production Mechanisms
2.3 Articulatory Mechanisms
2.4 Summary
References

3 Nonlinear Cochlear Signal Processing and Masking in Speech Perception
J. B. Allen
3.1 Basics
3.2 The Nonlinear Cochlea
3.3 Neural Masking
3.4 Discussion and Summary
References

4 Perception of Speech and Sound
B. Kollmeier, T. Brand, B. Meyer
4.1 Basic Psychoacoustic Quantities
4.2 Acoustical Information Required for Speech Perception
4.3 Speech Feature Perception
References

5 Speech Quality Assessment
V. Grancharov, W. B. Kleijn
5.1 Degradation Factors Affecting Speech Quality
5.2 Subjective Tests
5.3 Objective Measures
5.4 Conclusions
References

Part B Signal Processing for Speech

6 Wiener and Adaptive Filters
J. Benesty, Y. Huang, J. Chen
6.1 Overview
6.2 Signal Models
6.3 Derivation of the Wiener Filter
6.4 Impulse Response Tail Effect
6.5 Condition Number
6.6 Adaptive Algorithms
6.7 MIMO Wiener Filter
6.8 Conclusions
References

7 Linear Prediction
J. Benesty, J. Chen, Y. Huang
7.1 Fundamentals
7.2 Forward Linear Prediction
7.3 Backward Linear Prediction
7.4 Levinson–Durbin Algorithm
7.5 Lattice Predictor
7.6 Spectral Representation
7.7 Linear Interpolation
7.8 Line Spectrum Pair Representation
7.9 Multichannel Linear Prediction
7.10 Conclusions
References

8 The Kalman Filter
S. Gannot, A. Yeredor
8.1 Derivation of the Kalman Filter
8.2 Examples: Estimation of Parametric Stochastic Process from Noisy Observations
8.3 Extensions of the Kalman Filter
8.4 The Application of the Kalman Filter to Speech Processing
8.5 Summary
References

9 Homomorphic Systems and Cepstrum Analysis of Speech
R. W. Schafer
9.1 Definitions
9.2 Z-Transform Analysis
9.3 Discrete-Time Model for Speech Production
9.4 The Cepstrum of Speech
9.5 Relation to LPC
9.6 Application to Pitch Detection
9.7 Applications to Analysis/Synthesis Coding
9.8 Applications to Speech Pattern Recognition
9.9 Summary
References

10 Pitch and Voicing Determination of Speech with an Extension Toward Music Signals
W. J. Hess
10.1 Pitch in Time-Variant Quasiperiodic Acoustic Signals
10.2 Short-Term Analysis PDAs
10.3 Selected Time-Domain Methods
10.4 A Short Look into Voicing Determination
10.5 Evaluation and Postprocessing
10.6 Applications in Speech and Music
10.7 Some New Challenges and Developments
10.8 Concluding Remarks
References

11 Formant Estimation and Tracking
D. O’Shaughnessy
11.1 Historical
11.2 Vocal Tract Resonances
11.3 Speech Production
11.4 Acoustics of the Vocal Tract
11.5 Short-Time Speech Analysis
11.6 Formant Estimation
11.7 Summary
References

12 The STFT, Sinusoidal Models, and Speech Modification
M. M. Goodwin
12.1 The Short-Time Fourier Transform
12.2 Sinusoidal Models
12.3 Speech Modification
References

13 Adaptive Blind Multichannel Identification
Y. Huang, J. Benesty, J. Chen
13.1 Overview
13.2 Signal Model and Problem Formulation
13.3 Identifiability and Principle
13.4 Constrained Time-Domain Multichannel LMS and Newton Algorithms
13.5 Unconstrained Multichannel LMS Algorithm with Optimal Step-Size Control
13.6 Frequency-Domain Blind Multichannel Identification Algorithms
13.7 Adaptive Multichannel Exponentiated Gradient Algorithm
13.8 Summary
References

Part C Speech Coding

14 Principles of Speech Coding
W. B. Kleijn
14.1 The Objective of Speech Coding
14.2 Speech Coder Attributes
14.3 A Universal Coder for Speech
14.4 Coding with Autoregressive Models
14.5 Distortion Measures and Coding Architecture
14.6 Summary
References

15 Voice over IP: Speech Transmission over Packet Networks
J. Skoglund, E. Kozica, J. Linden, R. Hagen, W. B. Kleijn
15.1 Voice Communication
15.2 Properties of the Network
15.3 Outline of a VoIP System
15.4 Robust Encoding
15.5 Packet Loss Concealment
15.6 Conclusion
References

16 Low-Bit-Rate Speech Coding
A. V. McCree
16.1 Speech Coding
16.2 Fundamentals: Parametric Modeling of Speech Signals
16.3 Flexible Parametric Models
16.4 Efficient Quantization of Model Parameters
16.5 Low-Rate Speech Coding Standards
16.6 Summary
References

17 Analysis-by-Synthesis Speech Coding
J.-H. Chen, J. Thyssen
17.1 Overview
17.2 Basic Concepts of Analysis-by-Synthesis Coding
17.3 Overview of Prominent Analysis-by-Synthesis Speech Coders
17.4 Multipulse Linear Predictive Coding (MPLPC)
17.5 Regular-Pulse Excitation with Long-Term Prediction (RPE-LTP)
17.6 The Original Code Excited Linear Prediction (CELP) Coder
17.7 US Federal Standard FS1016 CELP
17.8 Vector Sum Excited Linear Prediction (VSELP)
17.9 Low-Delay CELP (LD-CELP)
17.10 Pitch Synchronous Innovation CELP (PSI-CELP)
17.11 Algebraic CELP (ACELP)
17.12 Conjugate Structure CELP (CS-CELP) and CS-ACELP
17.13 Relaxed CELP (RCELP) – Generalized Analysis by Synthesis
17.14 eX-CELP
17.15 iLBC
17.16 TSNFC
17.17 Embedded CELP
17.18 Summary of Analysis-by-Synthesis Speech Coders
17.19 Conclusion
References

18 Perceptual Audio Coding of Speech Signals
J. Herre, M. Lutzky
18.1 History of Audio Coding
18.2 Fundamentals of Perceptual Audio Coding
18.3 Some Successful Standardized Audio Coders
18.4 Perceptual Audio Coding for Real-Time Communication
18.5 Hybrid/Crossover Coders
18.6 Summary
References

Part D Text-to-Speech Synthesis

19 Basic Principles of Speech Synthesis
J. Schroeter
19.1 The Basic Components of a TTS System
19.2 Speech Representations and Signal Processing for Concatenative Synthesis
19.3 Speech Signal Transformation Principles
19.4 Speech Synthesis Evaluation
19.5 Conclusions
References

20 Rule-Based Speech Synthesis
R. Carlson, B. Granström
20.1 Background
20.2 Terminal Analog
20.3 Controlling the Synthesizer
20.4 Special Applications of Rule-Based Parametric Synthesis
20.5 Concluding Remarks
References

21 Corpus-Based Speech Synthesis
T. Dutoit
21.1 Basics
21.2 Concatenative Synthesis with a Fixed Inventory
21.3 Unit-Selection-Based Synthesis
21.4 Statistical Parametric Synthesis
21.5 Conclusion
References

22 Linguistic Processing for Speech Synthesis
R. Sproat
22.1 Why Linguistic Processing is Hard
22.2 Fundamentals: Writing Systems and the Graphical Representation of Language
22.3 Problems to be Solved and Methods to Solve Them
22.4 Architectures for Multilingual Linguistic Processing
22.5 Document-Level Processing
22.6 Future Prospects
References

23 Prosodic Processing
J. van Santen, T. Mishra, E. Klabbers
23.1 Overview
23.2 Historical Overview
23.3 Fundamental Challenges
23.4 A Survey of Current Approaches
23.5 Future Approaches
23.6 Conclusions
References

24 Voice Transformation
Y. Stylianou
24.1 Background
24.2 Source–Filter Theory and Harmonic Models
24.3 Definitions
24.4 Source Modifications
24.5 Filter Modifications
24.6 Conversion Functions
24.7 Voice Conversion
24.8 Quality Issues in Voice Transformations
24.9 Summary
References

25 Expressive/Affective Speech Synthesis
N. Campbell
25.1 Overview
25.2 Characteristics of Affective Speech
25.3 The Communicative Functionality of Speech
25.4 Approaches to Synthesizing Expressive Speech
25.5 Modeling Human Speech
25.6 Conclusion
References

Part E Speech Recognition

26 Historical Perspective of the Field of ASR/NLU
L. Rabiner, B.-H. Juang
26.1 ASR Methodologies
26.2 Important Milestones in Speech Recognition History
26.3 Generation 1 – The Early History of Speech Recognition
26.4 Generation 2 – The First Working Systems for Speech Recognition
26.5 Generation 3 – The Pattern Recognition Approach to Speech Recognition
26.6 Generation 4 – The Era of the Statistical Model
26.7 Generation 5 – The Future
26.8 Summary
References

27 HMMs and Related Speech Recognition Technologies
S. Young
27.1 Basic Framework
27.2 Architecture of an HMM-Based Recognizer
27.3 HMM-Based Acoustic Modeling
27.4 Normalization
27.5 Adaptation
27.6 Multipass Recognition Architectures
27.7 Conclusions
References

28 Speech Recognition with Weighted Finite-State Transducers
M. Mohri, F. Pereira, M. Riley
28.1 Definitions
28.2 Overview
28.3 Algorithms
28.4 Applications to Speech Recognition
28.5 Conclusion
References

29 A Machine Learning Framework for Spoken-Dialog Classification
C. Cortes, P. Haffner, M. Mohri
29.1 Motivation
29.2 Introduction to Kernel Methods
29.3 Rational Kernels
29.4 Algorithms
29.5 Experiments
29.6 Theoretical Results for Rational Kernels
29.7 Conclusion
References

30 Towards Superhuman Speech Recognition
M. Picheny, D. Nahamoo
30.1 Current Status
30.2 A Multidomain Conversational Test Set
30.3 Listening Experiments
30.4 Recognition Experiments
30.5 Speculation
References

31 Natural Language Understanding
S. Roukos
31.1 Overview of NLU Applications
31.2 Natural Language Parsing
31.3 Practical Implementation
31.4 Speech Mining
31.5 Conclusion
References

32 Transcription and Distillation of Spontaneous Speech
S. Furui, T. Kawahara
32.1 Background
32.2 Overview of Research Activities on Spontaneous Speech
32.3 Analysis for Spontaneous Speech Recognition
32.4 Approaches to Spontaneous Speech Recognition
32.5 Metadata and Structure Extraction of Spontaneous Speech
32.6 Speech Summarization
32.7 Conclusions
References

33 Environmental Robustness
J. Droppo, A. Acero
33.1 Noise Robust Speech Recognition
33.2 Model Retraining and Adaptation
33.3 Feature Transformation and Normalization
33.4 A Model of the Environment
33.5 Structured Model Adaptation
33.6 Structured Feature Enhancement
33.7 Unifying Model and Feature Techniques
33.8 Conclusion
References

34 The Business of Speech Technologies
J. Wilpon, M. E. Gilbert, J. Cohen ..... 681
34.1 Introduction ..... 682
34.2 Network-Based Speech Services ..... 686
34.3 Device-Based Speech Applications ..... 692
34.4 Vision/Predications of Future Services – Fueling the Trends ..... 697
34.5 Conclusion ..... 701
References ..... 702

35 Spoken Dialogue Systems
V. Zue, S. Seneff ..... 705
35.1 Technology Components and System Development ..... 707
35.2 Development Issues ..... 712
35.3 Historical Perspectives ..... 714
35.4 New Directions ..... 715
35.5 Concluding Remarks ..... 718
References ..... 718

Part F Speaker Recognition

36 Overview of Speaker Recognition
A. E. Rosenberg, F. Bimbot, S. Parthasarathy ..... 725
36.1 Speaker Recognition ..... 725
36.2 Measuring Speaker Features ..... 729
36.3 Constructing Speaker Models ..... 731
36.4 Adaptation ..... 735
36.5 Decision and Performance ..... 735
36.6 Selected Applications for Automatic Speaker Recognition ..... 737
36.7 Summary ..... 739
References ..... 739

37 Text-Dependent Speaker Recognition
M. Hébert ..... 743
37.1 Brief Overview ..... 743
37.2 Text-Dependent Challenges ..... 747
37.3 Selected Results ..... 750
37.4 Concluding Remarks ..... 760
References ..... 760

38 Text-Independent Speaker Recognition
D. A. Reynolds, W. M. Campbell ..... 763
38.1 Introduction ..... 763
38.2 Likelihood Ratio Detector ..... 764
38.3 Features ..... 766
38.4 Classifiers ..... 767
38.5 Performance Assessment ..... 776
38.6 Summary ..... 778
References ..... 779


Part G Language Recognition

39 Principles of Spoken Language Recognition
C.-H. Lee ..... 785
39.1 Spoken Language ..... 785
39.2 Language Recognition Principles ..... 786
39.3 Phone Recognition Followed by Language Modeling (PRLM) ..... 788
39.4 Vector-Space Characterization (VSC) ..... 789
39.5 Spoken Language Verification ..... 790
39.6 Discriminative Classifier Design ..... 791
39.7 Summary ..... 793
References ..... 793

40 Spoken Language Characterization
M. P. Harper, M. Maxwell ..... 797
40.1 Language versus Dialect ..... 798
40.2 Spoken Language Collections ..... 800
40.3 Spoken Language Characteristics ..... 800
40.4 Human Language Identification ..... 804
40.5 Text as a Source of Information on Spoken Languages ..... 806
40.6 Summary ..... 807
References ..... 807

41 Automatic Language Recognition Via Spectral and Token Based Approaches
D. A. Reynolds, W. M. Campbell, W. Shen, E. Singer ..... 811
41.1 Automatic Language Recognition ..... 811
41.2 Spectral Based Methods ..... 812
41.3 Token-Based Methods ..... 815
41.4 System Fusion ..... 818
41.5 Performance Assessment ..... 820
41.6 Summary ..... 823
References ..... 823

42 Vector-Based Spoken Language Classification
H. Li, B. Ma, C.-H. Lee ..... 825
42.1 Vector Space Characterization ..... 826
42.2 Unit Selection and Modeling ..... 827
42.3 Front-End: Voice Tokenization and Spoken Document Vectorization ..... 830
42.4 Back-End: Vector-Based Classifier Design ..... 831
42.5 Language Classification Experiments and Discussion ..... 835
42.6 Summary ..... 838
References ..... 839


Part H Speech Enhancement

43 Fundamentals of Noise Reduction
J. Chen, J. Benesty, Y. Huang, E. J. Diethorn ..... 843
43.1 Noise ..... 843
43.2 Signal Model and Problem Formulation ..... 845
43.3 Evaluation of Noise Reduction ..... 846
43.4 Noise Reduction via Filtering Techniques ..... 847
43.5 Noise Reduction via Spectral Restoration ..... 857
43.6 Speech-Model-Based Noise Reduction ..... 863
43.7 Summary ..... 868
References ..... 869

44 Spectral Enhancement Methods
I. Cohen, S. Gannot ..... 873
44.1 Spectral Enhancement ..... 874
44.2 Problem Formulation ..... 875
44.3 Statistical Models ..... 876
44.4 Signal Estimation ..... 879
44.5 Signal Presence Probability Estimation ..... 881
44.6 A Priori SNR Estimation ..... 882
44.7 Noise Spectrum Estimation ..... 888
44.8 Summary of a Spectral Enhancement Algorithm ..... 891
44.9 Selection of Spectral Enhancement Algorithms ..... 896
44.10 Conclusions ..... 898
References ..... 899

45 Adaptive Echo Cancelation for Voice Signals
M. M. Sondhi ..... 903
45.1 Network Echoes ..... 904
45.2 Single-Channel Acoustic Echo Cancelation ..... 915
45.3 Multichannel Acoustic Echo Cancelation ..... 921
45.4 Summary ..... 925
References ..... 926

46 Dereverberation
Y. Huang, J. Benesty, J. Chen ..... 929
46.1 Background and Overview ..... 929
46.2 Signal Model and Problem Formulation ..... 931
46.3 Source Model-Based Speech Dereverberation ..... 932
46.4 Separation of Speech and Reverberation via Homomorphic Transformation ..... 936
46.5 Channel Inversion and Equalization ..... 937
46.6 Summary ..... 941
References ..... 942


47 Adaptive Beamforming and Postfiltering
S. Gannot, I. Cohen ..... 945
47.1 Problem Formulation ..... 947
47.2 Adaptive Beamforming ..... 948
47.3 Fixed Beamformer and Blocking Matrix ..... 953
47.4 Identification of the Acoustical Transfer Function ..... 955
47.5 Robustness and Distortion Weighting ..... 960
47.6 Multichannel Postfiltering ..... 962
47.7 Performance Analysis ..... 967
47.8 Experimental Results ..... 972
47.9 Summary ..... 972
47.A Appendix: Derivation of the Expected Noise Reduction for a Coherent Noise Field ..... 973
47.B Appendix: Equivalence Between Maximum SNR and LCMV Beamformers ..... 974
References ..... 975

48 Feedback Control in Hearing Aids
A. Spriet, S. Doclo, M. Moonen, J. Wouters ..... 979
48.1 Problem Statement ..... 980
48.2 Standard Adaptive Feedback Canceller ..... 982
48.3 Feedback Cancellation Based on Prior Knowledge of the Acoustic Feedback Path ..... 986
48.4 Feedback Cancellation Based on Closed-Loop System Identification ..... 990
48.5 Comparison ..... 995
48.6 Conclusions ..... 997
References ..... 997

49 Active Noise Control
S. M. Kuo, D. R. Morgan ..... 1001
49.1 Broadband Feedforward Active Noise Control ..... 1002
49.2 Narrowband Feedforward Active Noise Control ..... 1006
49.3 Feedback Active Noise Control ..... 1010
49.4 Multichannel ANC ..... 1011
49.5 Summary ..... 1015
References ..... 1015

Part I Multichannel Speech Processing

50 Microphone Arrays
G. W. Elko, J. Meyer ..... 1021
50.1 Microphone Array Beamforming ..... 1021
50.2 Constant-Beamwidth Microphone Array System ..... 1029
50.3 Constrained Optimization of the Directional Gain ..... 1030
50.4 Differential Microphone Arrays ..... 1031
50.5 Eigenbeamforming Arrays ..... 1034
50.6 Adaptive Array Systems ..... 1037
50.7 Conclusions ..... 1040
References ..... 1040

51 Time Delay Estimation and Source Localization
Y. Huang, J. Benesty, J. Chen ..... 1043
51.1 Technology Taxonomy ..... 1043
51.2 Time Delay Estimation ..... 1044
51.3 Source Localization ..... 1054
51.4 Summary ..... 1061
References ..... 1062

52 Convolutive Blind Source Separation Methods
M. S. Pedersen, J. Larsen, U. Kjems, L. C. Parra ..... 1065
52.1 The Mixing Model ..... 1066
52.2 The Separation Model ..... 1068
52.3 Identification ..... 1071
52.4 Separation Principle ..... 1071
52.5 Time Versus Frequency Domain ..... 1076
52.6 The Permutation Ambiguity ..... 1078
52.7 Results ..... 1084
52.8 Conclusion ..... 1084
References ..... 1084

53 Sound Field Reproduction
R. Rabenstein, S. Spors ..... 1095
53.1 Sound Field Synthesis ..... 1095
53.2 Mathematical Representation of Sound Fields ..... 1096
53.3 Stereophony ..... 1100
53.4 Vector-Based Amplitude Panning ..... 1103
53.5 Ambisonics ..... 1104
53.6 Wave Field Synthesis ..... 1109
References ..... 1113

Acknowledgements ..... 1115
About the Authors ..... 1117
Detailed Contents ..... 1133
Subject Index ..... 1161