{"product_id":"audio-source-separation-and-speech-enhancement-isbn-9781119279891","title":"Audio Source Separation and Speech Enhancement","description":"\u003cp\u003e\u003cb\u003e\u003ci\u003eLearn the technology behind hearing aids, Siri, and Echo\u003c\/i\u003e\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003eAudio source separation and speech enhancement aim to extract one or more source signals of interest from an audio recording involving several sound sources. These technologies are among the most studied in audio signal processing today and play a critical role in the success of hearing aids, hands-free phones, voice command and other noise-robust audio analysis systems, and music post-production software.\u003c\/p\u003e \u003cp\u003eResearch on this topic has followed three convergent paths, starting with sensor array processing, computational auditory scene analysis, and machine learning based approaches such as independent component analysis, respectively. This book is the first one to provide a comprehensive overview by presenting the common foundations and the differences between these techniques in a unified setting.\u003c\/p\u003e \u003cp\u003eKey features:\u003c\/p\u003e \u003cul\u003e \u003cli\u003eConsolidated perspective on audio source separation and speech enhancement.\u003c\/li\u003e \u003cli\u003eBoth historical perspective and latest advances in the field, e.g. deep neural networks.\u003c\/li\u003e \u003cli\u003eDiverse disciplines: array processing, machine learning, and statistical signal processing.\u003c\/li\u003e \u003cli\u003eCovers the most important techniques for both single-channel and multichannel processing.\u003c\/li\u003e \u003c\/ul\u003e \u003cp\u003eThis book provides both introductory and advanced material suitable for people with basic knowledge of signal processing and machine learning. 
Thanks to its comprehensiveness, it will help students select a promising research track, researchers leverage the acquired cross-domain knowledge to design improved techniques, and engineers and developers choose the right technology for their target application scenario. It will also be useful for practitioners from other fields (e.g., acoustics, multimedia, phonetics, and musicology) willing to exploit audio source separation or speech enhancement as pre-processing tools for their own needs.\u003c\/p\u003e \u003cp\u003eList of Authors xvii\u003c\/p\u003e \u003cp\u003ePreface xxi\u003c\/p\u003e \u003cp\u003eAcknowledgment xxiii\u003c\/p\u003e \u003cp\u003eNotations xxv\u003c\/p\u003e \u003cp\u003eAcronyms xxix\u003c\/p\u003e \u003cp\u003eAbout the Companion Website xxxi\u003c\/p\u003e \u003cp\u003e\u003cb\u003ePart I Prerequisites 1\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e\u003cb\u003e1 Introduction 3\u003cbr\u003e\u003c\/b\u003e\u003ci\u003eEmmanuel Vincent, Sharon Gannot, and Tuomas Virtanen\u003c\/i\u003e\u003c\/p\u003e \u003cp\u003e1.1 Why are Source Separation and Speech Enhancement Needed? 3\u003c\/p\u003e \u003cp\u003e1.2 What are the Goals of Source Separation and Speech Enhancement? 4\u003c\/p\u003e \u003cp\u003e1.3 How can Source Separation and Speech Enhancement be Addressed? 
9\u003c\/p\u003e \u003cp\u003e1.4 Outline 11\u003c\/p\u003e \u003cp\u003eBibliography 12\u003c\/p\u003e \u003cp\u003e\u003cb\u003e2 Time-Frequency Processing: Spectral Properties 15\u003cbr\u003e\u003c\/b\u003e\u003ci\u003eTuomas Virtanen, Emmanuel Vincent, and Sharon Gannot\u003c\/i\u003e\u003c\/p\u003e \u003cp\u003e2.1 Time-Frequency Analysis and Synthesis 15\u003c\/p\u003e \u003cp\u003e2.2 Source Properties in the Time-Frequency Domain 23\u003c\/p\u003e \u003cp\u003e2.3 Filtering in the Time-Frequency Domain 25\u003c\/p\u003e \u003cp\u003e2.4 Summary 28\u003c\/p\u003e \u003cp\u003eBibliography 28\u003c\/p\u003e \u003cp\u003e\u003cb\u003e3 Acoustics: Spatial Properties 31\u003cbr\u003e\u003c\/b\u003e\u003ci\u003eEmmanuel Vincent, Sharon Gannot, and Tuomas Virtanen\u003c\/i\u003e\u003c\/p\u003e \u003cp\u003e3.1 Formalization of the Mixing Process 31\u003c\/p\u003e \u003cp\u003e3.2 Microphone Recordings 32\u003c\/p\u003e \u003cp\u003e3.3 Artificial Mixtures 36\u003c\/p\u003e \u003cp\u003e3.4 Impulse Response Models 37\u003c\/p\u003e \u003cp\u003e3.5 Summary 43\u003c\/p\u003e \u003cp\u003eBibliography 43\u003c\/p\u003e \u003cp\u003e\u003cb\u003e4 Multichannel Source Activity Detection, Localization, and Tracking 47\u003cbr\u003e\u003c\/b\u003e\u003ci\u003ePasi Pertilä, Alessio Brutti, Piergiorgio Svaizer, and Maurizio Omologo\u003c\/i\u003e\u003c\/p\u003e \u003cp\u003e4.1 Basic Notions in Multichannel Spatial Audio 47\u003c\/p\u003e \u003cp\u003e4.2 Multi-Microphone Source Activity Detection 52\u003c\/p\u003e \u003cp\u003e4.3 Source Localization 54\u003c\/p\u003e \u003cp\u003e4.4 Summary 60\u003c\/p\u003e \u003cp\u003eBibliography 60\u003c\/p\u003e \u003cp\u003e\u003cb\u003ePart II Single-Channel Separation and Enhancement 65\u003cbr\u003e\u003cbr\u003e5 Spectral Masking and Filtering 67\u003cbr\u003e\u003c\/b\u003e\u003ci\u003eTimo Gerkmann and Emmanuel Vincent\u003c\/i\u003e\u003c\/p\u003e \u003cp\u003e5.1 Time-Frequency Masking 67\u003c\/p\u003e \u003cp\u003e5.2 
Mask Estimation Given the Signal Statistics 70\u003c\/p\u003e \u003cp\u003e5.3 Perceptual Improvements 81\u003c\/p\u003e \u003cp\u003e5.4 Summary 82\u003c\/p\u003e \u003cp\u003eBibliography 83\u003c\/p\u003e \u003cp\u003e\u003cb\u003e6 Single-Channel Speech Presence Probability Estimation and Noise Tracking 87\u003cbr\u003e\u003c\/b\u003e\u003ci\u003eRainer Martin and Israel Cohen\u003c\/i\u003e\u003c\/p\u003e \u003cp\u003e6.1 Speech Presence Probability and its Estimation 87\u003c\/p\u003e \u003cp\u003e6.2 Noise Power Spectrum Tracking 93\u003c\/p\u003e \u003cp\u003e6.3 Evaluation Measures 102\u003c\/p\u003e \u003cp\u003e6.4 Summary 104\u003c\/p\u003e \u003cp\u003eBibliography 104\u003c\/p\u003e \u003cp\u003e\u003cb\u003e7 Single-Channel Classification and Clustering Approaches 107\u003cbr\u003e\u003c\/b\u003e\u003ci\u003eFelix Weninger, Jun Du, Erik Marchi, and Tian Gao\u003c\/i\u003e\u003c\/p\u003e \u003cp\u003e7.1 Source Separation by Computational Auditory Scene Analysis 108\u003c\/p\u003e \u003cp\u003e7.2 Source Separation by Factorial HMMs 111\u003c\/p\u003e \u003cp\u003e7.3 Separation Based Training 113\u003c\/p\u003e \u003cp\u003e7.4 Summary 125\u003c\/p\u003e \u003cp\u003eBibliography 125\u003c\/p\u003e \u003cp\u003e\u003cb\u003e8 Nonnegative Matrix Factorization 131\u003cbr\u003e\u003c\/b\u003e\u003ci\u003eRoland Badeau and Tuomas Virtanen\u003c\/i\u003e\u003c\/p\u003e \u003cp\u003e8.1 NMF and Source Separation 131\u003c\/p\u003e \u003cp\u003e8.2 NMF Theory and Algorithms 137\u003c\/p\u003e \u003cp\u003e8.3 NMF Dictionary Learning Methods 145\u003c\/p\u003e \u003cp\u003e8.4 Advanced NMF Models 148\u003c\/p\u003e \u003cp\u003e8.5 Summary 156\u003c\/p\u003e \u003cp\u003eBibliography 156\u003c\/p\u003e \u003cp\u003e\u003cb\u003e9 Temporal Extensions of Nonnegative Matrix Factorization 161\u003cbr\u003e\u003c\/b\u003e\u003ci\u003eCédric Févotte, Paris Smaragdis, Nasser Mohammadiha, and Gautham J. Mysore\u003c\/i\u003e\u003c\/p\u003e \u003cp\u003e9.1 Convolutive 
NMF 161\u003c\/p\u003e \u003cp\u003e9.2 Overview of Dynamical Models 169\u003c\/p\u003e \u003cp\u003e9.3 Smooth NMF 170\u003c\/p\u003e \u003cp\u003e9.4 Nonnegative State-Space Models 174\u003c\/p\u003e \u003cp\u003e9.5 Discrete Dynamical Models 178\u003c\/p\u003e \u003cp\u003e9.6 The Use of Dynamic Models in Source Separation 182\u003c\/p\u003e \u003cp\u003e9.7 Which Model to Use? 183\u003c\/p\u003e \u003cp\u003e9.8 Summary 184\u003c\/p\u003e \u003cp\u003e9.9 Standard Distributions 184\u003c\/p\u003e \u003cp\u003eBibliography 185\u003c\/p\u003e \u003cp\u003e\u003cb\u003ePart III Multichannel Separation and Enhancement 189\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e\u003cb\u003e10 Spatial Filtering 191\u003cbr\u003e\u003c\/b\u003e\u003ci\u003eShmulik Markovich-Golan, Walter Kellermann, and Sharon Gannot\u003c\/i\u003e\u003c\/p\u003e \u003cp\u003e10.1 Fundamentals of Array Processing 192\u003c\/p\u003e \u003cp\u003e10.2 Array Topologies 197\u003c\/p\u003e \u003cp\u003e10.3 Data-Independent Beamforming 199\u003c\/p\u003e \u003cp\u003e10.4 Data-Dependent Spatial Filters: Design Criteria 202\u003c\/p\u003e \u003cp\u003e10.5 Generalized Sidelobe Canceler Implementation 209\u003c\/p\u003e \u003cp\u003e10.6 Postfilters 210\u003c\/p\u003e \u003cp\u003e10.7 Summary 211\u003c\/p\u003e \u003cp\u003eBibliography 212\u003c\/p\u003e \u003cp\u003e\u003cb\u003e11 Multichannel Parameter Estimation 219\u003cbr\u003e\u003c\/b\u003e\u003ci\u003eShmulik Markovich-Golan, Walter Kellermann, and Sharon Gannot\u003c\/i\u003e\u003c\/p\u003e \u003cp\u003e11.1 Multichannel Speech Presence Probability Estimators 219\u003c\/p\u003e \u003cp\u003e11.2 Covariance Matrix Estimators Exploiting SPP 227\u003c\/p\u003e \u003cp\u003e11.3 Methods for Weakly Guided and Strongly Guided RTF Estimation 228\u003c\/p\u003e \u003cp\u003e11.4 Summary 231\u003c\/p\u003e \u003cp\u003eBibliography 231\u003c\/p\u003e \u003cp\u003e\u003cb\u003e12 Multichannel Clustering and Classification Approaches 
235\u003cbr\u003e\u003c\/b\u003e\u003ci\u003eMichael I. Mandel, Shoko Araki, and Tomohiro Nakatani\u003c\/i\u003e\u003c\/p\u003e \u003cp\u003e12.1 Two-Channel Clustering 236\u003c\/p\u003e \u003cp\u003e12.2 Multichannel Clustering 244\u003c\/p\u003e \u003cp\u003e12.3 Multichannel Classification 251\u003c\/p\u003e \u003cp\u003e12.4 Spatial Filtering Based on Masks 255\u003c\/p\u003e \u003cp\u003e12.5 Summary 257\u003c\/p\u003e \u003cp\u003eBibliography 258\u003c\/p\u003e \u003cp\u003e\u003cb\u003e13 Independent Component and Vector Analysis 263\u003cbr\u003e\u003c\/b\u003e\u003ci\u003eHiroshi Sawada and Zbyněk Koldovský\u003c\/i\u003e\u003c\/p\u003e \u003cp\u003e13.1 Convolutive Mixtures and their Time-Frequency Representations 264\u003c\/p\u003e \u003cp\u003e13.2 Frequency-Domain Independent Component Analysis 265\u003c\/p\u003e \u003cp\u003e13.3 Independent Vector Analysis 279\u003c\/p\u003e \u003cp\u003e13.4 Example 280\u003c\/p\u003e \u003cp\u003e13.5 Summary 284\u003c\/p\u003e \u003cp\u003eBibliography 284\u003c\/p\u003e \u003cp\u003e\u003cb\u003e14 Gaussian Model Based Multichannel Separation 289\u003cbr\u003e\u003c\/b\u003e\u003ci\u003eAlexey Ozerov and Hirokazu Kameoka\u003c\/i\u003e\u003c\/p\u003e \u003cp\u003e14.1 Gaussian Modeling 289\u003c\/p\u003e \u003cp\u003e14.2 Library of Spectral and Spatial Models 295\u003c\/p\u003e \u003cp\u003e14.3 Parameter Estimation Criteria and Algorithms 300\u003c\/p\u003e \u003cp\u003e14.4 Detailed Presentation of Some Methods 305\u003c\/p\u003e \u003cp\u003e14.5 Summary 312\u003c\/p\u003e \u003cp\u003eAcknowledgment 312\u003c\/p\u003e \u003cp\u003eBibliography 312\u003c\/p\u003e \u003cp\u003e\u003cb\u003e15 Dereverberation 317\u003cbr\u003e\u003c\/b\u003e\u003ci\u003eEmanuël A.P. Habets and Patrick A. 
Naylor\u003c\/i\u003e\u003c\/p\u003e \u003cp\u003e15.1 Introduction to Dereverberation 317\u003c\/p\u003e \u003cp\u003e15.2 Reverberation Cancellation Approaches 319\u003c\/p\u003e \u003cp\u003e15.3 Reverberation Suppression Approaches 329\u003c\/p\u003e \u003cp\u003e15.4 Direct Estimation 335\u003c\/p\u003e \u003cp\u003e15.5 Evaluation of Dereverberation 336\u003c\/p\u003e \u003cp\u003e15.6 Summary 337\u003c\/p\u003e \u003cp\u003eBibliography 337\u003c\/p\u003e \u003cp\u003e\u003cb\u003ePart IV Application Scenarios and Perspectives 345\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e\u003cb\u003e16 Applying Source Separation to Music 347\u003cbr\u003e\u003c\/b\u003e\u003ci\u003eBryan Pardo, Antoine Liutkus, Zhiyao Duan, and Gaël Richard\u003c\/i\u003e\u003c\/p\u003e \u003cp\u003e16.1 Challenges and Opportunities 348\u003c\/p\u003e \u003cp\u003e16.2 Nonnegative Matrix Factorization in the Case of Music 349\u003c\/p\u003e \u003cp\u003e16.3 Taking Advantage of the Harmonic Structure of Music 354\u003c\/p\u003e \u003cp\u003e16.4 Nonparametric Local Models: Taking Advantage of Redundancies in Music 358\u003c\/p\u003e \u003cp\u003e16.5 Taking Advantage of Multiple Instances 363\u003c\/p\u003e \u003cp\u003e16.6 Interactive Source Separation 367\u003c\/p\u003e \u003cp\u003e16.7 Crowd-Based Evaluation 367\u003c\/p\u003e \u003cp\u003e16.8 Some Examples of Applications 368\u003c\/p\u003e \u003cp\u003e16.9 Summary 370\u003c\/p\u003e \u003cp\u003eBibliography 370\u003c\/p\u003e \u003cp\u003e\u003cb\u003e17 Application of Source Separation to Robust Speech Analysis and Recognition 377\u003cbr\u003e\u003c\/b\u003e\u003ci\u003eShinji Watanabe, Tuomas Virtanen, and Dorothea Kolossa\u003c\/i\u003e\u003c\/p\u003e \u003cp\u003e17.1 Challenges and Opportunities 377\u003c\/p\u003e \u003cp\u003e17.2 Applications 380\u003c\/p\u003e \u003cp\u003e17.3 Robust Speech Analysis and Recognition 390\u003c\/p\u003e \u003cp\u003e17.4 Integration of Front-End and Back-End 397\u003c\/p\u003e 
\u003cp\u003e17.5 Use of Multimodal Information with Source Separation 403\u003c\/p\u003e \u003cp\u003e17.6 Summary 404\u003c\/p\u003e \u003cp\u003eBibliography 405\u003c\/p\u003e \u003cp\u003e\u003cb\u003e18 Binaural Speech Processing with Application to Hearing Devices 413\u003cbr\u003e\u003c\/b\u003e\u003ci\u003eSimon Doclo, Sharon Gannot, Daniel Marquardt, and Elior Hadad\u003c\/i\u003e\u003c\/p\u003e \u003cp\u003e18.1 Introduction to Binaural Processing 413\u003c\/p\u003e \u003cp\u003e18.2 Binaural Hearing 415\u003c\/p\u003e \u003cp\u003e18.3 Binaural Noise Reduction Paradigms 416\u003c\/p\u003e \u003cp\u003e18.4 The Binaural Noise Reduction Problem 420\u003c\/p\u003e \u003cp\u003e18.5 Extensions for Diffuse Noise 425\u003c\/p\u003e \u003cp\u003e18.6 Extensions for Interfering Sources 431\u003c\/p\u003e \u003cp\u003e18.7 Summary 437\u003c\/p\u003e \u003cp\u003eBibliography 437\u003c\/p\u003e \u003cp\u003e\u003cb\u003e19 Perspectives 443\u003cbr\u003e\u003c\/b\u003e\u003ci\u003eEmmanuel Vincent, Tuomas Virtanen, and Sharon Gannot\u003c\/i\u003e\u003c\/p\u003e \u003cp\u003e19.1 Advancing Deep Learning 443\u003c\/p\u003e \u003cp\u003e19.2 Exploiting Phase Relationships 447\u003c\/p\u003e \u003cp\u003e19.3 Advancing Multichannel Processing 450\u003c\/p\u003e \u003cp\u003e19.4 Addressing Multiple-Device Scenarios 453\u003c\/p\u003e \u003cp\u003e19.5 Towards Widespread Commercial Use 455\u003c\/p\u003e \u003cp\u003eAcknowledgment 457\u003c\/p\u003e \u003cp\u003eBibliography 457\u003c\/p\u003e \u003cp\u003eIndex 465\u003c\/p\u003e \u003cp\u003e\u003cb\u003eEMMANUEL VINCENT\u003c\/b\u003e is a Senior Research Scientist with Inria, Nancy, France. His research focuses on machine learning for speech and audio signal processing. He has been working on audio source separation for 15 years and co-authored over 180 publications in this field. 
His contributions include harmonic nonnegative matrix factorization, full-rank spatial covariance modeling, joint spatial\/spectral estimation, deep learning based multichannel source separation, and objective performance metrics. He has given several keynotes, tutorials and summer school lectures, including at Interspeech 2012 and 2016, WASPAA 2015 and LVA\/ICA 2015. He is a founding chair of the series of Signal Separation Evaluation Campaigns (SiSEC) and CHiME Speech Separation and Recognition Challenges and the chair of ISCA's special interest group on Robust Speech Processing.\u003c\/p\u003e\u003cp\u003e\u003cb\u003eTUOMAS VIRTANEN\u003c\/b\u003e is a Professor with the Laboratory of Signal Processing, Tampere University of Technology, Finland, where he is leading the Audio Research Group. He is known for his pioneering work on single-channel sound source separation using nonnegative matrix factorization, and its application to noise-robust speech recognition, music content analysis, and sound event detection. His research interests also include content analysis and processing of audio signals in general. He has authored more than 170 publications and received four best paper awards. He is an IEEE Senior Member, a member of the Audio and Acoustic Signal Processing Technical Committee of IEEE Signal Processing Society, Associate Editor of IEEE\/ACM Transactions on Audio, Speech, and Language Processing, and recipient of the ERC 2014 Starting Grant.\u003c\/p\u003e\u003cp\u003e\u003cb\u003eSHARON GANNOT\u003c\/b\u003e is a Full Professor at the Faculty of Engineering, Bar-Ilan University, Israel, where he is heading the Speech and Signal Processing laboratory and the Signal Processing Track. His research interests include multi-microphone speech processing; distributed algorithms for noise reduction and speaker separation; array processing on manifold; dereverberation; single-microphone speech enhancement; and speaker localization and tracking. 
He received the Bar-Ilan University's Outstanding Lecturer Award for 2010 and 2014 and the Bar-Ilan Rector Innovation in Research Award in 2018. He has co-authored over 200 publications and lectured tutorials at ICASSP 2012, EUSIPCO 2012, ICASSP 2013, and EUSIPCO 2013 and a keynote address at IWAENC 2012. He was a co-editor of the book Speech Processing in Modern Communication: Challenges and Perspectives (Springer, 2012). He also served as an Associate Editor and a Senior Area Chair of the IEEE Transactions on Speech, Audio and Language Processing. He currently serves as the Chair of the IEEE Audio and Acoustic Signal Processing (AASP) Technical Committee.\u003c\/p\u003e","brand":"Wiley","offers":[{"title":"Default Title","offer_id":47988777124069,"sku":"NP9781119279891","price":147.95,"currency_code":"USD","in_stock":false}],"thumbnail_url":"\/\/cdn.shopify.com\/s\/files\/1\/1842\/7735\/files\/9781119279891.jpg?v=1761781550","url":"https:\/\/k12savings.com\/products\/audio-source-separation-and-speech-enhancement-isbn-9781119279891","provider":"K12savings","version":"1.0","type":"link"}