
08 Dual-channel speech enhancement algorithm based on deep convolutional recurrent network

Date: 2023-04-10 19:29:41


Dual-channel speech enhancement algorithm based on deep convolutional recurrent network

Dong Guiguan (董桂官), Yan Zhaoyu (闫昭宇), Zeng Jiangjiao (曾江蛟), Zhang Dandan (张丹丹)

1. China Electronics Standardization Institute, Beijing 100176;

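Abstract: To improve dual-channel speech enhancement in mobile phone call scenarios, a dual-channel speech enhancement algorithm is implemented with a deep convolutional recurrent network. The model is improved by incorporating inter-channel spatial information features, and the phase-sensitive mask is then used as the training target to compensate for phase loss. Experimental results show that the improved method strengthens the network's ability to process dual-channel signals and yields enhanced dual-channel speech with higher quality and intelligibility.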

Keywords: deep convolutional recurrent network; dual-channel speech enhancement; spatial information features; phase-sensitive mask

DOI: 10.16559/j.cnki.2095-2295.2022.03.008

Funding: National Natural Science Foundation of China (62071039); Natural Science Foundation of Inner Mongolia Autonomous Region (2017MS(LH)0602)
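The abstract names the phase-sensitive mask as the training target but does not give its formula. As a rough illustration only, the sketch below uses the commonly cited definition, the clean-to-noisy magnitude ratio scaled by the cosine of the phase difference between the clean and noisy time-frequency bins. The function names `phase_sensitive_mask` and `apply_mask`, the `eps` stabiliser, and the truncation to [0, 1] are assumptions made for this example and are not taken from the paper.

```python
import cmath
import math


def phase_sensitive_mask(clean, noisy, eps=1e-8, clip=(0.0, 1.0)):
    """Compute a phase-sensitive mask for paired complex T-F bins.

    clean, noisy: equal-length sequences of complex STFT coefficients.
    eps:  small constant guarding against division by zero (assumption).
    clip: (low, high) truncation range, or None to keep the raw mask
          (truncation is a common choice, assumed here).
    """
    if len(clean) != len(noisy):
        raise ValueError("clean and noisy must have the same length")
    mask = []
    for s, y in zip(clean, noisy):
        # Magnitude ratio |S| / |Y|
        ratio = abs(s) / (abs(y) + eps)
        # Phase difference between the clean and noisy bins
        phase_diff = cmath.phase(s) - cmath.phase(y)
        m = ratio * math.cos(phase_diff)
        if clip is not None:
            m = min(max(m, clip[0]), clip[1])
        mask.append(m)
    return mask


def apply_mask(mask, noisy):
    """Scale each noisy bin by its real-valued mask; the noisy phase is kept."""
    return [m * y for m, y in zip(mask, noisy)]
```

In a training setup of this kind, the network would be asked to predict such a mask from the noisy input features, and the estimate would be multiplied onto the noisy spectrum before the inverse transform. Because the mask shrinks wherever the noisy phase departs from the clean phase, it penalises phase mismatch that a magnitude-only ratio mask would ignore.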

