{"product_id":"techniques-for-noise-robustness-in-automatic-speech-recognition-isbn-9781119970880","title":"Techniques for Noise Robustness in Automatic Speech Recognition","description":"\u003cp\u003eAutomatic speech recognition (ASR) systems are finding increasing use in everyday life. Many of the commonplace environments where the systems are used are noisy, for example, users calling up a voice search system from a busy cafeteria or a street. This can result in degraded speech recordings and adversely affect the performance of speech recognition systems. As the use of ASR systems increases, knowledge of the state of the art in techniques to deal with such problems becomes critical to system and application engineers and researchers who work with or on ASR technologies. This book presents a comprehensive survey of the state of the art in techniques used to improve the robustness of speech recognition systems to these degrading external influences.\u003c\/p\u003e \u003cp\u003eKey features:\u003c\/p\u003e \u003cul\u003e \u003cli\u003eReviews all the main noise robust ASR approaches, including signal separation, voice activity detection, robust feature extraction, model compensation and adaptation, missing data techniques and recognition of reverberant speech.\u003c\/li\u003e \u003cli\u003eActs as a timely exposition of the topic in light of the expected wider future use of ASR technology in challenging environments.\u003c\/li\u003e \u003cli\u003eAddresses robustness issues and signal degradation, which are both key requirements for practitioners of ASR.\u003c\/li\u003e \u003cli\u003eIncludes contributions from top ASR researchers from leading research units in the field.\u003c\/li\u003e \u003c\/ul\u003e  \u003cp\u003eList of Contributors xv\u003c\/p\u003e \u003cp\u003eAcknowledgments xvii\u003c\/p\u003e \u003cp\u003e\u003cb\u003e1 Introduction 1\u003cbr\u003e \u003c\/b\u003e\u003ci\u003eTuomas Virtanen, Rita Singh, Bhiksha Raj\u003c\/i\u003e\u003c\/p\u003e 
\u003cp\u003e1.1 Scope of the Book 1\u003c\/p\u003e \u003cp\u003e1.2 Outline 2\u003c\/p\u003e \u003cp\u003e1.3 Notation 4\u003c\/p\u003e \u003cp\u003e\u003cb\u003ePart One FOUNDATIONS\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e\u003cb\u003e2 The Basics of Automatic Speech Recognition 9\u003cbr\u003e \u003c\/b\u003e\u003ci\u003eRita Singh, Bhiksha Raj, Tuomas Virtanen\u003c\/i\u003e\u003c\/p\u003e \u003cp\u003e2.1 Introduction 9\u003c\/p\u003e \u003cp\u003e2.2 Speech Recognition Viewed as Bayes Classification 10\u003c\/p\u003e \u003cp\u003e2.3 Hidden Markov Models 11\u003c\/p\u003e \u003cp\u003e2.3.1 Computing Probabilities with HMMs 12\u003c\/p\u003e \u003cp\u003e2.3.2 Determining the State Sequence 17\u003c\/p\u003e \u003cp\u003e2.3.3 Learning HMM Parameters 19\u003c\/p\u003e \u003cp\u003e2.3.4 Additional Issues Relating to Speech Recognition Systems 20\u003c\/p\u003e \u003cp\u003e2.4 HMM-Based Speech Recognition 24\u003c\/p\u003e \u003cp\u003e2.4.1 Representing the Signal 24\u003c\/p\u003e \u003cp\u003e2.4.2 The HMM for a Word Sequence 25\u003c\/p\u003e \u003cp\u003e2.4.3 Searching through all Word Sequences 26\u003c\/p\u003e \u003cp\u003eReferences 29\u003c\/p\u003e \u003cp\u003e\u003cb\u003e3 The Problem of Robustness in Automatic Speech Recognition 31\u003cbr\u003e \u003c\/b\u003e\u003ci\u003eBhiksha Raj, Tuomas Virtanen, Rita Singh\u003c\/i\u003e\u003c\/p\u003e \u003cp\u003e3.1 Errors in Bayes Classification 31\u003c\/p\u003e \u003cp\u003e3.1.1 Type 1 Condition: Mismatch Error 33\u003c\/p\u003e \u003cp\u003e3.1.2 Type 2 Condition: Increased Bayes Error 34\u003c\/p\u003e \u003cp\u003e3.2 Bayes Classification and ASR 35\u003c\/p\u003e \u003cp\u003e3.2.1 All We Have is a Model: A Type 1 Condition 35\u003c\/p\u003e \u003cp\u003e3.2.2 Intrinsic Interferences—Signal Components that are Unrelated to the Message: A Type 2 Condition 36\u003c\/p\u003e \u003cp\u003e3.2.3 External Interferences—The Data are Noisy: Type 1 and Type 2 Conditions 36\u003c\/p\u003e 
\u003cp\u003e3.3 External Influences on Speech Recordings 36\u003c\/p\u003e \u003cp\u003e3.3.1 Signal Capture 37\u003c\/p\u003e \u003cp\u003e3.3.2 Additive Corruptions 41\u003c\/p\u003e \u003cp\u003e3.3.3 Reverberation 42\u003c\/p\u003e \u003cp\u003e3.3.4 A Simplified Model of Signal Capture 43\u003c\/p\u003e \u003cp\u003e3.4 The Effect of External Influences on Recognition 44\u003c\/p\u003e \u003cp\u003e3.5 Improving Recognition under Adverse Conditions 46\u003c\/p\u003e \u003cp\u003e3.5.1 Handling the Model Mismatch Error 46\u003c\/p\u003e \u003cp\u003e3.5.2 Dealing with Intrinsic Variations in the Data 47\u003c\/p\u003e \u003cp\u003e3.5.3 Dealing with Extrinsic Variations 47\u003c\/p\u003e \u003cp\u003eReferences 50\u003c\/p\u003e \u003cp\u003e\u003cb\u003ePart Two SIGNAL ENHANCEMENT\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e\u003cb\u003e4 Voice Activity Detection, Noise Estimation, and Adaptive Filters for Acoustic Signal Enhancement \u003ci\u003e53\u003cbr\u003e \u003c\/i\u003e\u003c\/b\u003e\u003ci\u003eRainer Martin, Dorothea Kolossa\u003c\/i\u003e\u003c\/p\u003e \u003cp\u003e4.1 Introduction 53\u003c\/p\u003e \u003cp\u003e4.2 Signal Analysis and Synthesis 55\u003c\/p\u003e \u003cp\u003e4.2.1 DFT-Based Analysis Synthesis with Perfect Reconstruction 55\u003c\/p\u003e \u003cp\u003e4.2.2 Probability Distributions for Speech and Noise DFT Coefficients 57\u003c\/p\u003e \u003cp\u003e4.3 Voice Activity Detection 58\u003c\/p\u003e \u003cp\u003e4.3.1 VAD Design Principles 58\u003c\/p\u003e \u003cp\u003e4.3.2 Evaluation of VAD Performance 62\u003c\/p\u003e \u003cp\u003e4.3.3 Evaluation in the Context of ASR 62\u003c\/p\u003e \u003cp\u003e4.4 Noise Power Spectrum Estimation 65\u003c\/p\u003e \u003cp\u003e4.4.1 Smoothing Techniques 65\u003c\/p\u003e \u003cp\u003e4.4.2 Histogram and GMM Noise Estimation Methods 67\u003c\/p\u003e \u003cp\u003e4.4.3 Minimum Statistics Noise Power Estimation 67\u003c\/p\u003e \u003cp\u003e4.4.4 MMSE Noise Power Estimation 
68\u003c\/p\u003e \u003cp\u003e4.4.5 Estimation of the A Priori Signal-to-Noise Ratio 69\u003c\/p\u003e \u003cp\u003e4.5 Adaptive Filters for Signal Enhancement 71\u003c\/p\u003e \u003cp\u003e4.5.1 Spectral Subtraction 71\u003c\/p\u003e \u003cp\u003e4.5.2 Nonlinear Spectral Subtraction 73\u003c\/p\u003e \u003cp\u003e4.5.3 Wiener Filtering 74\u003c\/p\u003e \u003cp\u003e4.5.4 The ETSI Advanced Front End 75\u003c\/p\u003e \u003cp\u003e4.5.5 Nonlinear MMSE Estimators 75\u003c\/p\u003e \u003cp\u003e4.6 ASR Performance 80\u003c\/p\u003e \u003cp\u003e4.7 Conclusions 81\u003c\/p\u003e \u003cp\u003eReferences 82\u003c\/p\u003e \u003cp\u003e\u003cb\u003e5 Extraction of Speech from Mixture Signals 87\u003cbr\u003e \u003c\/b\u003e\u003ci\u003eParis Smaragdis\u003c\/i\u003e\u003c\/p\u003e \u003cp\u003e5.1 The Problem with Mixtures 87\u003c\/p\u003e \u003cp\u003e5.2 Multichannel Mixtures 88\u003c\/p\u003e \u003cp\u003e5.2.1 Basic Problem Formulation 88\u003c\/p\u003e \u003cp\u003e5.2.2 Convolutive Mixtures 92\u003c\/p\u003e \u003cp\u003e5.3 Single-Channel Mixtures 98\u003c\/p\u003e \u003cp\u003e5.3.1 Problem Formulation 98\u003c\/p\u003e \u003cp\u003e5.3.2 Learning Sound Models 100\u003c\/p\u003e \u003cp\u003e5.3.3 Separation by Spectrogram Factorization 101\u003c\/p\u003e \u003cp\u003e5.3.4 Dealing with Unknown Sounds 105\u003c\/p\u003e \u003cp\u003e5.4 Variations and Extensions 107\u003c\/p\u003e \u003cp\u003e5.5 Conclusions 107\u003c\/p\u003e \u003cp\u003eReferences 107\u003c\/p\u003e \u003cp\u003e\u003cb\u003e6 Microphone Arrays 109\u003cbr\u003e \u003c\/b\u003e\u003ci\u003eJohn McDonough, Kenichi Kumatani\u003c\/i\u003e\u003c\/p\u003e \u003cp\u003e6.1 Speaker Tracking 110\u003c\/p\u003e \u003cp\u003e6.2 Conventional Microphone Arrays 113\u003c\/p\u003e \u003cp\u003e6.3 Conventional Adaptive Beamforming Algorithms 120\u003c\/p\u003e \u003cp\u003e6.3.1 Minimum Variance Distortionless Response Beamformer 120\u003c\/p\u003e \u003cp\u003e6.3.2 Noise Field Models 
122\u003c\/p\u003e \u003cp\u003e6.3.3 Subband Analysis and Synthesis 123\u003c\/p\u003e \u003cp\u003e6.3.4 Beamforming Performance Criteria 126\u003c\/p\u003e \u003cp\u003e6.3.5 Generalized Sidelobe Canceller Implementation 129\u003c\/p\u003e \u003cp\u003e6.3.6 Recursive Implementation of the GSC 130\u003c\/p\u003e \u003cp\u003e6.3.7 Other Conventional GSC Beamformers 131\u003c\/p\u003e \u003cp\u003e6.3.8 Beamforming based on Higher Order Statistics 132\u003c\/p\u003e \u003cp\u003e6.3.9 Online Implementation 136\u003c\/p\u003e \u003cp\u003e6.3.10 Speech-Recognition Experiments 140\u003c\/p\u003e \u003cp\u003e6.4 Spherical Microphone Arrays 142\u003c\/p\u003e \u003cp\u003e6.5 Spherical Adaptive Algorithms 148\u003c\/p\u003e \u003cp\u003e6.6 Comparative Studies 149\u003c\/p\u003e \u003cp\u003e6.7 Comparison of Linear and Spherical Arrays for DSR 152\u003c\/p\u003e \u003cp\u003e6.8 Conclusions and Further Reading 154\u003c\/p\u003e \u003cp\u003eReferences 155\u003c\/p\u003e \u003cp\u003e\u003cb\u003ePart Three FEATURE ENHANCEMENT\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e\u003cb\u003e7 From Signals to Speech Features by Digital Signal Processing 161\u003cbr\u003e \u003c\/b\u003e\u003ci\u003eMatthias Wölfel\u003c\/i\u003e\u003c\/p\u003e \u003cp\u003e7.1 Introduction 161\u003c\/p\u003e \u003cp\u003e7.1.1 About this Chapter 162\u003c\/p\u003e \u003cp\u003e7.2 The Speech Signal 162\u003c\/p\u003e \u003cp\u003e7.3 Spectral Processing 163\u003c\/p\u003e \u003cp\u003e7.3.1 Windowing 163\u003c\/p\u003e \u003cp\u003e7.3.2 Power Spectrum 165\u003c\/p\u003e \u003cp\u003e7.3.3 Spectral Envelopes 166\u003c\/p\u003e \u003cp\u003e7.3.4 LP Envelope 166\u003c\/p\u003e \u003cp\u003e7.3.5 MVDR Envelope 169\u003c\/p\u003e \u003cp\u003e7.3.6 Warping the Frequency Axis 171\u003c\/p\u003e \u003cp\u003e7.3.7 Warped LP Envelope 175\u003c\/p\u003e \u003cp\u003e7.3.8 Warped MVDR Envelope 176\u003c\/p\u003e \u003cp\u003e7.3.9 Comparison of Spectral Estimates 177\u003c\/p\u003e 
\u003cp\u003e7.3.10 The Spectrogram 179\u003c\/p\u003e \u003cp\u003e7.4 Cepstral Processing 179\u003c\/p\u003e \u003cp\u003e7.4.1 Definition and Calculation of Cepstral Coefficients 180\u003c\/p\u003e \u003cp\u003e7.4.2 Characteristics of Cepstral Sequences 181\u003c\/p\u003e \u003cp\u003e7.5 Influence of Distortions on Different Speech Features 182\u003c\/p\u003e \u003cp\u003e7.5.1 Objective Functions 182\u003c\/p\u003e \u003cp\u003e7.5.2 Robustness against Noise 185\u003c\/p\u003e \u003cp\u003e7.5.3 Robustness against Echo and Reverberation 187\u003c\/p\u003e \u003cp\u003e7.5.4 Robustness against Changes in Fundamental Frequency 189\u003c\/p\u003e \u003cp\u003e7.6 Summary and Further Reading 191\u003c\/p\u003e \u003cp\u003eReferences 191\u003c\/p\u003e \u003cp\u003e\u003cb\u003e8 Features Based on Auditory Physiology and Perception 193\u003cbr\u003e \u003c\/b\u003e\u003ci\u003eRichard M. Stern, Nelson Morgan\u003c\/i\u003e\u003c\/p\u003e \u003cp\u003e8.1 Introduction 193\u003c\/p\u003e \u003cp\u003e8.2 Some Attributes of Auditory Physiology and Perception 194\u003c\/p\u003e \u003cp\u003e8.2.1 Peripheral Processing 194\u003c\/p\u003e \u003cp\u003e8.2.2 Processing at more Central Levels 200\u003c\/p\u003e \u003cp\u003e8.2.3 Psychoacoustical Correlates of Physiological Observations 202\u003c\/p\u003e \u003cp\u003e8.2.4 The Impact of Auditory Processing on Conventional Feature Extraction 206\u003c\/p\u003e \u003cp\u003e8.2.5 Summary 208\u003c\/p\u003e \u003cp\u003e8.3 “Classic” Auditory Representations 208\u003c\/p\u003e \u003cp\u003e8.4 Current Trends in Auditory Feature Analysis 213\u003c\/p\u003e \u003cp\u003e8.5 Summary 221\u003c\/p\u003e \u003cp\u003eAcknowledgments 222\u003c\/p\u003e \u003cp\u003eReferences 222\u003c\/p\u003e \u003cp\u003e\u003cb\u003e9 Feature Compensation 229\u003cbr\u003e \u003c\/b\u003e\u003ci\u003eJasha Droppo\u003c\/i\u003e\u003c\/p\u003e \u003cp\u003e9.1 Life in an Ideal World 229\u003c\/p\u003e \u003cp\u003e9.1.1 Noise Robustness Tasks 
229\u003c\/p\u003e \u003cp\u003e9.1.2 Probabilistic Feature Enhancement 230\u003c\/p\u003e \u003cp\u003e9.1.3 Gaussian Mixture Models 231\u003c\/p\u003e \u003cp\u003e9.2 MMSE-SPLICE 232\u003c\/p\u003e \u003cp\u003e9.2.1 Parameter Estimation 233\u003c\/p\u003e \u003cp\u003e9.2.2 Results 236\u003c\/p\u003e \u003cp\u003e9.3 Discriminative SPLICE 237\u003c\/p\u003e \u003cp\u003e9.3.1 The MMI Objective Function 238\u003c\/p\u003e \u003cp\u003e9.3.2 Training the Front-End Parameters 239\u003c\/p\u003e \u003cp\u003e9.3.3 The Rprop Algorithm 240\u003c\/p\u003e \u003cp\u003e9.3.4 Results 241\u003c\/p\u003e \u003cp\u003e9.4 Model-Based Feature Enhancement 242\u003c\/p\u003e \u003cp\u003e9.4.1 The Additive Noise-Mixing Equation 243\u003c\/p\u003e \u003cp\u003e9.4.2 The Joint Probability Model 244\u003c\/p\u003e \u003cp\u003e9.4.3 Vector Taylor Series Approximation 246\u003c\/p\u003e \u003cp\u003e9.4.4 Estimating Clean Speech 247\u003c\/p\u003e \u003cp\u003e9.4.5 Results 247\u003c\/p\u003e \u003cp\u003e9.5 Switching Linear Dynamic System 248\u003c\/p\u003e \u003cp\u003e9.6 Conclusion 249\u003c\/p\u003e \u003cp\u003eReferences 249\u003c\/p\u003e \u003cp\u003e\u003cb\u003e10 Reverberant Speech Recognition 251\u003cbr\u003e \u003c\/b\u003e\u003ci\u003eReinhold Haeb-Umbach, Alexander Krueger\u003c\/i\u003e\u003c\/p\u003e \u003cp\u003e10.1 Introduction 251\u003c\/p\u003e \u003cp\u003e10.2 The Effect of Reverberation 252\u003c\/p\u003e \u003cp\u003e10.2.1 What is Reverberation? 
252\u003c\/p\u003e \u003cp\u003e10.2.2 The Relationship between Clean and Reverberant Speech Features 254\u003c\/p\u003e \u003cp\u003e10.2.3 The Effect of Reverberation on ASR Performance 258\u003c\/p\u003e \u003cp\u003e10.3 Approaches to Reverberant Speech Recognition 258\u003c\/p\u003e \u003cp\u003e10.3.1 Signal-Based Techniques 259\u003c\/p\u003e \u003cp\u003e10.3.2 Front-End Techniques 260\u003c\/p\u003e \u003cp\u003e10.3.3 Back-End Techniques 262\u003c\/p\u003e \u003cp\u003e10.3.4 Concluding Remarks 265\u003c\/p\u003e \u003cp\u003e10.4 Feature Domain Model of the Acoustic Impulse Response 265\u003c\/p\u003e \u003cp\u003e10.5 Bayesian Feature Enhancement 267\u003c\/p\u003e \u003cp\u003e10.5.1 Basic Approach 268\u003c\/p\u003e \u003cp\u003e10.5.2 Measurement Update 269\u003c\/p\u003e \u003cp\u003e10.5.3 Time Update 270\u003c\/p\u003e \u003cp\u003e10.5.4 Inference 271\u003c\/p\u003e \u003cp\u003e10.6 Experimental Results 272\u003c\/p\u003e \u003cp\u003e10.6.1 Databases 272\u003c\/p\u003e \u003cp\u003e10.6.2 Overview of the Tested Methods 273\u003c\/p\u003e \u003cp\u003e10.6.3 Recognition Results on Reverberant Speech 274\u003c\/p\u003e \u003cp\u003e10.6.4 Recognition Results on Noisy Reverberant Speech 276\u003c\/p\u003e \u003cp\u003e10.7 Conclusions 277\u003c\/p\u003e \u003cp\u003eAcknowledgment 278\u003c\/p\u003e \u003cp\u003eReferences 278\u003c\/p\u003e \u003cp\u003e\u003cb\u003ePart Four MODEL ENHANCEMENT\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e\u003cb\u003e11 Adaptation and Discriminative Training of Acoustic Models 285\u003cbr\u003e \u003c\/b\u003e\u003ci\u003eYannick Estève, Paul Deléglise\u003c\/i\u003e\u003c\/p\u003e \u003cp\u003e11.1 Introduction 285\u003c\/p\u003e \u003cp\u003e11.1.1 Acoustic Models 286\u003c\/p\u003e \u003cp\u003e11.1.2 Maximum Likelihood Estimation 287\u003c\/p\u003e \u003cp\u003e11.2 Acoustic Model Adaptation and Noise Robustness 288\u003c\/p\u003e \u003cp\u003e11.2.1 Static (or Offline) Adaptation 289\u003c\/p\u003e 
\u003cp\u003e11.2.2 Dynamic (or Online) Adaptation 289\u003c\/p\u003e \u003cp\u003e11.3 Maximum A Posteriori Reestimation 290\u003c\/p\u003e \u003cp\u003e11.4 Maximum Likelihood Linear Regression 293\u003c\/p\u003e \u003cp\u003e11.4.1 Class Regression Tree 294\u003c\/p\u003e \u003cp\u003e11.4.2 Constrained Maximum Likelihood Linear Regression 297\u003c\/p\u003e \u003cp\u003e11.4.3 CMLLR Implementation 297\u003c\/p\u003e \u003cp\u003e11.4.4 Speaker Adaptive Training 298\u003c\/p\u003e \u003cp\u003e11.5 Discriminative Training 299\u003c\/p\u003e \u003cp\u003e11.5.1 MMI Discriminative Training Criterion 301\u003c\/p\u003e \u003cp\u003e11.5.2 MPE Discriminative Training Criterion 302\u003c\/p\u003e \u003cp\u003e11.5.3 I-smoothing 303\u003c\/p\u003e \u003cp\u003e11.5.4 MPE Implementation 304\u003c\/p\u003e \u003cp\u003e11.6 Conclusion 307\u003c\/p\u003e \u003cp\u003eReferences 308\u003c\/p\u003e \u003cp\u003e\u003cb\u003e12 Factorial Models for Noise Robust Speech Recognition 311\u003cbr\u003e \u003c\/b\u003e\u003ci\u003eJohn R. Hershey, Steven J. 
Rennie, Jonathan Le Roux\u003c\/i\u003e\u003c\/p\u003e \u003cp\u003e12.1 Introduction 311\u003c\/p\u003e \u003cp\u003e12.2 The Model-Based Approach 313\u003c\/p\u003e \u003cp\u003e12.3 Signal Feature Domains 314\u003c\/p\u003e \u003cp\u003e12.4 Interaction Models 317\u003c\/p\u003e \u003cp\u003e12.4.1 Exact Interaction Model 318\u003c\/p\u003e \u003cp\u003e12.4.2 Max Model 320\u003c\/p\u003e \u003cp\u003e12.4.3 Log-Sum Model 321\u003c\/p\u003e \u003cp\u003e12.4.4 Mel Interaction Model 321\u003c\/p\u003e \u003cp\u003e12.5 Inference Methods 322\u003c\/p\u003e \u003cp\u003e12.5.1 Max Model Inference 322\u003c\/p\u003e \u003cp\u003e12.5.2 Parallel Model Combination 324\u003c\/p\u003e \u003cp\u003e12.5.3 Vector Taylor Series Approaches 326\u003c\/p\u003e \u003cp\u003e12.5.4 SNR-Dependent Approaches 331\u003c\/p\u003e \u003cp\u003e12.6 Efficient Likelihood Evaluation in Factorial Models 332\u003c\/p\u003e \u003cp\u003e12.6.1 Efficient Inference using the Max Model 332\u003c\/p\u003e \u003cp\u003e12.6.2 Efficient Vector-Taylor Series Approaches 334\u003c\/p\u003e \u003cp\u003e12.6.3 Band Quantization 335\u003c\/p\u003e \u003cp\u003e12.7 Current Directions 337\u003c\/p\u003e \u003cp\u003e12.7.1 Dynamic Noise Models for Robust ASR 338\u003c\/p\u003e \u003cp\u003e12.7.2 Multi-Talker Speech Recognition using Graphical Models 339\u003c\/p\u003e \u003cp\u003e12.7.3 Noise Robust ASR using Non-Negative Basis Representations 340\u003c\/p\u003e \u003cp\u003eReferences 341\u003c\/p\u003e \u003cp\u003e\u003cb\u003e13 Acoustic Model Training for Robust Speech Recognition 347\u003cbr\u003e \u003c\/b\u003e\u003ci\u003eMichael L. 
Seltzer\u003c\/i\u003e\u003c\/p\u003e \u003cp\u003e13.1 Introduction 347\u003c\/p\u003e \u003cp\u003e13.2 Traditional Training Methods for Robust Speech Recognition 348\u003c\/p\u003e \u003cp\u003e13.3 A Brief Overview of Speaker Adaptive Training 349\u003c\/p\u003e \u003cp\u003e13.4 Feature-Space Noise Adaptive Training 351\u003c\/p\u003e \u003cp\u003e13.4.1 Experiments using fNAT 352\u003c\/p\u003e \u003cp\u003e13.5 Model-Space Noise Adaptive Training 353\u003c\/p\u003e \u003cp\u003e13.6 Noise Adaptive Training using VTS Adaptation 355\u003c\/p\u003e \u003cp\u003e13.6.1 Vector Taylor Series HMM Adaptation 355\u003c\/p\u003e \u003cp\u003e13.6.2 Updating the Acoustic Model Parameters 357\u003c\/p\u003e \u003cp\u003e13.6.3 Updating the Environmental Parameters 360\u003c\/p\u003e \u003cp\u003e13.6.4 Implementation Details 360\u003c\/p\u003e \u003cp\u003e13.6.5 Experiments using NAT 361\u003c\/p\u003e \u003cp\u003e13.7 Discussion 364\u003c\/p\u003e \u003cp\u003e13.7.1 Comparison of Training Algorithms 364\u003c\/p\u003e \u003cp\u003e13.7.2 Comparison to Speaker Adaptive Training 364\u003c\/p\u003e \u003cp\u003e13.7.3 Related Adaptive Training Methods 365\u003c\/p\u003e \u003cp\u003e13.8 Conclusion 366\u003c\/p\u003e \u003cp\u003eReferences 366\u003c\/p\u003e \u003cp\u003e\u003cb\u003ePart Five COMPENSATION FOR INFORMATION LOSS\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e\u003cb\u003e14 Missing-Data Techniques: Recognition with Incomplete Spectrograms 371\u003cbr\u003e \u003c\/b\u003e\u003ci\u003eJon Barker\u003c\/i\u003e\u003c\/p\u003e \u003cp\u003e14.1 Introduction 371\u003c\/p\u003e \u003cp\u003e14.2 Classification with Incomplete Data 373\u003c\/p\u003e \u003cp\u003e14.2.1 A Simple Missing Data Scenario 374\u003c\/p\u003e \u003cp\u003e14.2.2 Missing Data Theory 376\u003c\/p\u003e \u003cp\u003e14.2.3 Validity of the MAR Assumption 378\u003c\/p\u003e \u003cp\u003e14.2.4 Marginalising Acoustic Models 379\u003c\/p\u003e \u003cp\u003e14.3 Energetic Masking 
381\u003c\/p\u003e \u003cp\u003e14.3.1 The Max Approximation 381\u003c\/p\u003e \u003cp\u003e14.3.2 Bounded Marginalisation 382\u003c\/p\u003e \u003cp\u003e14.3.3 Missing Data ASR in the Cepstral Domain 384\u003c\/p\u003e \u003cp\u003e14.3.4 Missing Data ASR with Dynamic Features 386\u003c\/p\u003e \u003cp\u003e14.4 Meta-Missing Data: Dealing with Mask Uncertainty 388\u003c\/p\u003e \u003cp\u003e14.4.1 Missing Data with Soft Masks 388\u003c\/p\u003e \u003cp\u003e14.4.2 Sub-band Combination Approaches 391\u003c\/p\u003e \u003cp\u003e14.4.3 Speech Fragment Decoding 393\u003c\/p\u003e \u003cp\u003e14.5 Some Perspectives on Performance 395\u003c\/p\u003e \u003cp\u003eReferences 396\u003c\/p\u003e \u003cp\u003e\u003cb\u003e15 Missing-Data Techniques: Feature Reconstruction 399\u003cbr\u003e \u003c\/b\u003e\u003ci\u003eJort Florent Gemmeke, Ulpu Remes\u003c\/i\u003e\u003c\/p\u003e \u003cp\u003e15.1 Introduction 399\u003c\/p\u003e \u003cp\u003e15.2 Missing-Data Techniques 401\u003c\/p\u003e \u003cp\u003e15.3 Correlation-Based Imputation 402\u003c\/p\u003e \u003cp\u003e15.3.1 Fundamentals 402\u003c\/p\u003e \u003cp\u003e15.3.2 Implementation 404\u003c\/p\u003e \u003cp\u003e15.4 Cluster-Based Imputation 406\u003c\/p\u003e \u003cp\u003e15.4.1 Fundamentals 406\u003c\/p\u003e \u003cp\u003e15.4.2 Implementation 408\u003c\/p\u003e \u003cp\u003e15.4.3 Advances 409\u003c\/p\u003e \u003cp\u003e15.5 Class-Conditioned Imputation 411\u003c\/p\u003e \u003cp\u003e15.5.1 Fundamentals 411\u003c\/p\u003e \u003cp\u003e15.5.2 Implementation 412\u003c\/p\u003e \u003cp\u003e15.5.3 Advances 413\u003c\/p\u003e \u003cp\u003e15.6 Sparse Imputation 414\u003c\/p\u003e \u003cp\u003e15.6.1 Fundamentals 414\u003c\/p\u003e \u003cp\u003e15.6.2 Implementation 416\u003c\/p\u003e \u003cp\u003e15.6.3 Advances 418\u003c\/p\u003e \u003cp\u003e15.7 Other Feature-Reconstruction Methods 420\u003c\/p\u003e \u003cp\u003e15.7.1 Parametric Approaches 420\u003c\/p\u003e \u003cp\u003e15.7.2 Nonparametric Approaches 
421\u003c\/p\u003e \u003cp\u003e15.8 Experimental Results 421\u003c\/p\u003e \u003cp\u003e15.8.1 Feature-Reconstruction Methods 422\u003c\/p\u003e \u003cp\u003e15.8.2 Comparison with Other Methods 424\u003c\/p\u003e \u003cp\u003e15.8.3 Advances 426\u003c\/p\u003e \u003cp\u003e15.8.4 Combination with Other Methods 427\u003c\/p\u003e \u003cp\u003e15.9 Discussion and Conclusion 428\u003c\/p\u003e \u003cp\u003eAcknowledgments 429\u003c\/p\u003e \u003cp\u003eReferences 430\u003c\/p\u003e \u003cp\u003e\u003cb\u003e16 Computational Auditory Scene Analysis and Automatic Speech Recognition 433\u003cbr\u003e \u003c\/b\u003e\u003ci\u003eArun Narayanan, DeLiang Wang\u003c\/i\u003e\u003c\/p\u003e \u003cp\u003e16.1 Introduction 433\u003c\/p\u003e \u003cp\u003e16.2 Auditory Scene Analysis 434\u003c\/p\u003e \u003cp\u003e16.3 Computational Auditory Scene Analysis 435\u003c\/p\u003e \u003cp\u003e16.3.1 Ideal Binary Mask 435\u003c\/p\u003e \u003cp\u003e16.3.2 Typical CASA Architecture 438\u003c\/p\u003e \u003cp\u003e16.4 CASA Strategies 440\u003c\/p\u003e \u003cp\u003e16.4.1 IBM Estimation Based on Local SNR Estimates 440\u003c\/p\u003e \u003cp\u003e16.4.2 IBM Estimation using ASA Cues 442\u003c\/p\u003e \u003cp\u003e16.4.3 IBM Estimation as Binary Classification 448\u003c\/p\u003e \u003cp\u003e16.4.4 Binaural Mask Estimation Strategies 451\u003c\/p\u003e \u003cp\u003e16.5 Integrating CASA with ASR 452\u003c\/p\u003e \u003cp\u003e16.5.1 Uncertainty Transform Model 454\u003c\/p\u003e \u003cp\u003e16.6 Concluding Remarks 458\u003c\/p\u003e \u003cp\u003eAcknowledgment 458\u003c\/p\u003e \u003cp\u003eReferences 458\u003c\/p\u003e \u003cp\u003e\u003cb\u003e17 Uncertainty Decoding 463\u003cbr\u003e \u003c\/b\u003e\u003ci\u003eHank Liao\u003c\/i\u003e\u003c\/p\u003e \u003cp\u003e17.1 Introduction 463\u003c\/p\u003e \u003cp\u003e17.2 Observation Uncertainty 465\u003c\/p\u003e \u003cp\u003e17.3 Uncertainty Decoding 466\u003c\/p\u003e \u003cp\u003e17.4 Feature-Based Uncertainty Decoding 
468\u003c\/p\u003e \u003cp\u003e17.4.1 SPLICE with Uncertainty 470\u003c\/p\u003e \u003cp\u003e17.4.2 Front-End Joint Uncertainty Decoding 471\u003c\/p\u003e \u003cp\u003e17.4.3 Issues with Feature-Based Uncertainty Decoding 472\u003c\/p\u003e \u003cp\u003e17.5 Model-Based Joint Uncertainty Decoding 473\u003c\/p\u003e \u003cp\u003e17.5.1 Parameter Estimation 475\u003c\/p\u003e \u003cp\u003e17.5.2 Comparisons with Other Methods 476\u003c\/p\u003e \u003cp\u003e17.6 Noisy CMLLR 477\u003c\/p\u003e \u003cp\u003e17.7 Uncertainty and Adaptive Training 480\u003c\/p\u003e \u003cp\u003e17.7.1 Gradient-Based Methods 481\u003c\/p\u003e \u003cp\u003e17.7.2 Factor Analysis Approaches 482\u003c\/p\u003e \u003cp\u003e17.8 In Combination with Other Techniques 483\u003c\/p\u003e \u003cp\u003e17.9 Conclusions 484\u003c\/p\u003e \u003cp\u003eReferences 485\u003c\/p\u003e \u003cp\u003eIndex 487\u003c\/p\u003e  \u003cp\u003e\u003cstrong\u003eTuomas Virtanen, Tampere University of Technology, Finland\u003c\/strong\u003e\u003cbr\u003eDr. Virtanen is a senior researcher at Tampere University of Technology. Previously, he worked at Cambridge University, UK, as a research associate. His main research contributions are in sound source separation and its application to robust speech recognition, audio content analysis, and music information retrieval. He is well known for his work on non-negative matrix factorization based source separation, which is currently widely used in the field. He has published numerous journal and conference articles related to the above topics. \u003c\/p\u003e\u003cp\u003e\u003cstrong\u003eRita Singh, Carnegie Mellon University, USA\u003c\/strong\u003e\u003cbr\u003eDr. Singh is the CEO of a speech-technology startup but remains an adjunct faculty member of the Language Technologies Institute at Carnegie Mellon University. 
She has been a major contributor to the open-source CMU Sphinx and is one of the main architects of the popular Sphinx4 Java-based open-source speech recognition system. In addition to her work on core speech recognition technology, she has also developed several algorithms for noise compensation, and was the prime architect of CMU's award-winning submission to the 2001 Naval Research Lab's challenge on automatic recognition of speech in noisy environments (SPINE). \u003c\/p\u003e\u003cp\u003e\u003cstrong\u003eBhiksha Raj, Carnegie Mellon University, USA\u003c\/strong\u003e\u003cbr\u003eDr. Raj is an associate professor in the Language Technologies Institute and in Electrical and Computer Engineering at Carnegie Mellon University. He has worked extensively on robustness algorithms for speech recognition, and is very well known for his contributions to the highly popular VTS approach for noise compensation, as well as his contributions to missing-feature-based techniques for noise compensation. He has published extensively on and holds patents for algorithms for microphone array processing and signal separation.\u003c\/p\u003e","brand":"Wiley","offers":[{"title":"Default Title","offer_id":47990139584741,"sku":"NP9781119970880","price":135.95,"currency_code":"USD","in_stock":false}],"thumbnail_url":"\/\/cdn.shopify.com\/s\/files\/1\/1842\/7735\/files\/9781119970880.jpg?v=1761786656","url":"https:\/\/k12savings.com\/products\/techniques-for-noise-robustness-in-automatic-speech-recognition-isbn-9781119970880","provider":"K12savings","version":"1.0","type":"link"}