
08 Dual-channel speech enhancement algorithm based on deep convolutional recurrent network

Date: 2023-04-10 19:29:41


1. China Electronics Standardization Institute, Beijing 100176

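Abstract: To improve dual-channel speech enhancement in mobile phone call scenarios, a dual-channel speech enhancement algorithm is implemented with a deep convolutional recurrent network. The model is improved by incorporating inter-channel spatial information features, and the phase-sensitive mask is then used as the training target to compensate for phase loss. Experimental results show that the improved method strengthens the network's ability to process dual-channel signals and yields dual-channel enhanced speech with higher speech quality and intelligibility.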
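The two ingredients named in the abstract can be sketched briefly. The phase-sensitive mask follows its usual definition in the literature, |S|/|Y| · cos(θ_S − θ_Y), for a clean spectrum S and a noisy spectrum Y. The abstract does not say which spatial features are used, so the inter-channel level difference (ILD) and inter-channel phase difference (IPD) below are assumptions chosen for illustration, as are the function names, the `eps` constant, and the clipping range. This is a minimal NumPy sketch, not the authors' implementation.

```python
import numpy as np


def phase_sensitive_mask(clean_stft, noisy_stft, eps=1e-8, clip=(0.0, 1.0)):
    """Phase-sensitive mask per time-frequency bin.

    Both inputs are complex STFT arrays of the same shape.
    """
    # Re(S * conj(Y)) equals |S||Y|cos(theta_S - theta_Y), so dividing by
    # |Y|^2 gives |S|/|Y| * cos(theta_S - theta_Y).
    mask = np.real(clean_stft * np.conj(noisy_stft)) / (np.abs(noisy_stft) ** 2 + eps)
    # Clipping to [0, 1] is an assumed training convenience, not taken from the paper.
    return np.clip(mask, clip[0], clip[1])


def interchannel_features(ch1_stft, ch2_stft, eps=1e-8):
    """Assumed spatial features: level difference (dB) and phase difference (rad)."""
    ild = 20.0 * np.log10((np.abs(ch1_stft) + eps) / (np.abs(ch2_stft) + eps))
    # The angle of ch1 * conj(ch2) is the phase of channel 1 relative to channel 2.
    ipd = np.angle(ch1_stft * np.conj(ch2_stft))
    return ild, ipd
```

At inference time, a mask estimated by the network would be multiplied with the noisy magnitude spectrum and combined with the noisy phase before the inverse STFT. The cosine term is what lets the mask account for the mismatch between clean and noisy phase.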

Keywords: deep convolutional recurrent network; dual-channel speech enhancement; spatial information features; phase-sensitive mask

DOI: 10.16559/j.cnki.2095-2295.2022.03.008

Funding: National Natural Science Foundation of China (62071039); Natural Science Foundation of Inner Mongolia Autonomous Region (2017MS(LH)0602)




Address: No. 7 Aerding Street, Kundulun District, Baotou, Inner Mongolia  Postal code: 014010  Tel: 0472-5951610 or 0472-5953910  Email: cky@imust.edu.cn nkdxb@imust.edu.cn
