Summary of SAD (Speech Activity Detection) Papers

| Category | Venue, Year | Authors | Title | Features | Classifier / Decision rule | Dataset | Noise condition | Comparisons | Performance metric |
| Statistical model | IEEE SPL 1999 | Jongseo Sohn | A Statistical Model-Based Voice Activity Detection | DFT coefficients, Gaussian modeling | Log-likelihood ratio test (LRT), HMM-based hang-over scheme | - | NOISEX-92: vehicle, white, babble; 5 dB, 15 dB, 25 dB | G.729 | ROC curves |
| Statistical model | IEEE SPL 2005 | Javier Ramírez | Statistical Voice Activity Detection Using a Multiple Observation Likelihood Ratio Test | DFT coefficients, Gaussian modeling | Multiple observation LRT | AURORA-3 Spanish SpeechDat-Car (SDC) | distant and close-talking in car environments; 5 dB | Sohn’s VAD, G.729, AMR1/2, AFE | ROC curves |
| Statistical model | IEEE TASLP 2011 | Dongwen Ying | Voice Activity Detection Based on an Unsupervised Learning Framework | log-mel energies | GMM | TIMIT | NOISEX-92 | Sohn, G.729, AMR | ROC curves |
| Statistical model | 2015 | Tomi Kinnunen | HAPPY Team Entry to NIST OpenSAD Challenge: A Fusion of Short-Term Unsupervised and Segment i-Vector Based Speech Activity Detectors | - | Fusion of 6 VADs | NIST 2015 OpenSAD | - | Sohn, G.729, GMM, rSAD, i-vectors | DCF |
| Deep learning | Interspeech 2016 | Ruben Zazo, Google | Feature Learning with Raw-Waveform CLDNNs for Voice Activity Detection | Raw waveform | CLDNN | synthetic, 3800 h, balanced | daily life noises; 5 to 30 dB | 40-dim. log-mel energies + DNN, LSTM, CLDNN | ROC curves, FAR, MAR |
| Deep learning | IEEE TASLP 2016 | Xiao-Lei Zhang | Boosting Contextual Information for Deep Neural Network Based Voice Activity Detection | MRCG | Multi-resolution stacking (MRS) + bDNN | AURORA-2 (8 kHz, noisy), AURORA-4 (16 kHz, clean) | NOISEX-92; -5 to 20 dB | SVM, Zhang13, DNN, bDNN | AUC |
| Deep learning | Interspeech 2014 | Xiao-Lei Zhang | Boosted Deep Neural Networks and Multi-resolution Cochleagram Features for Voice Activity Detection | MRCG | bDNN | AURORA-4 | NOISEX-92; -5 to 5 dB | Sohn, Ramirez05, Ying, SVM, Zhang13 | AUC |
| Deep learning | IEEE TASLP 2013 | Xiao-Lei Zhang | Deep Belief Networks Based Voice Activity Detection | Pitch, DFT, MFCC, LPC, PLP, AMS | Deep belief network (DBN) | AURORA-2 | -5 to 10 dB | G.729, ETSI Wiener filtering, Sohn, Ramirez05/07, Yu, Shin, Ying, SVM | AUC |
| Deep learning | IEEE SPL 2018 | Juntae Kim | Voice Activity Detection Using an Adaptive Context Attention Model | MRCG | Attention model + LSTM | TIMIT, self-recorded dataset, HAVIC | NOISEX-92; -5 to 10 dB | HFCL, MSFI, DNN, bDNN, LSTM | AUC |
| Deep learning | IEEE ISSPIT 2019 | Guan-Bo Wang | A Fusion Model for Robust Voice Activity Detection | Fbank | Fusion of BUT, CRNN, RNN, SOV | OpenSAT19 (16 kHz), 160 h | various background noise | BUT, CRNN, RNN, SOV | DCF |
| Deep learning | APSIPA 2019 | Guan-Bo Wang | An RNN and CRNN Based Approach to Robust Voice Activity Detection | Fbank | Fusion of RNN, CRNN | OpenSAT19, OpenSAT17 | - | BUT, RNN, CRNN | DCF |
| Deep learning | Interspeech 2019 | Ruixi Lin | Optimizing Voice Activity Detection for Noisy Conditions | MFCC, Fbank | DAE, CNN | AISHELL (16 kHz, manually labeled), AURORA-2 | self-collected noises; 0 to 20 dB | G.729, SVM, DBN, DDNN | Accuracy |
| Deep learning | Interspeech 2015 | Qing Wang | A Universal VAD Based on Jointly Trained Deep Neural Networks | MRCG | Jointly learning DNN with speech enhancement | AURORA-4 | 115 noise types including NOISEX-92; -5 to 20 dB | DNN | AUC |
| Deep learning | IEEE ICASSP 2016 | Sibo Tong | A Comparative Study of Robustness of Deep Learning Approaches for VAD | log-mel energies | Noise-aware training, DNN, LSTM, CNN | AURORA-4, WSJ0 | 6 noises; 5 to 20 dB | DNN, LSTM, CNN | AUC, EER |
| Deep learning | ICMSCE 2018 | Jaeseok Kim | Voice Activity Detection based on Multi-Dilated Convolutional Neural Network | MRCG | CNN with multi-dilated convolution | TIMIT | NOISEX-92, sound effect library; -12 to 10 dB | bDNN, RNN, CNN | AUC |
| Deep learning | Interspeech 2013 | Neville Ryant | Speech Activity Detection on YouTube Using Deep Neural Networks | MFCC | DNN | HAVIC, 65 h, web videos | - | GMM | EER |
| Deep learning | IEEE ICASSP 2019 | Rajat Hebbar | Robust Speech Activity Detection in Movie Audio: Data Resources and Experimental Evaluation | log-mel energies | CNN-TD | Movies | MUSAN, Audioset | CNN, CLDNN | F1, TPR+FPR |
| Deep learning | Interspeech 2016 | Yuya Fujita | Robust DNN-based VAD augmented with phone entropy based rejection of background speech | Fbank | Acoustic model based DNN, entropy criterion | self-collected, mobile voice search, 1200 h | - | DNN | ER |
| Deep learning | IEEE ICASSP 2013 | Thad Hughes | Recurrent Neural Networks for Voice Activity Detection | PLP | RNN | - | - | GMM+SM | ROC curves |
| Deep learning | IEEE ICASSP 2013 | Florian Eyben | Real-Life Voice Activity Detection with LSTM Recurrent Neural Networks and an Application to Hollywood Movies | RASTA-PLP | LSTM | Buckeye (26 h), TIMIT | 4 noises; -6 to 25 dB | Sohn, Ram05, ARG | AUC, FNR+FPR |
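
Most entries in the table above report ROC curves, AUC, or EER. As a rough illustration of how such numbers can be derived from frame-level VAD scores, here is a minimal pure-Python sketch. The function names (`roc_points`, `auc`, `eer`) and the toy scores are made up for illustration and are not taken from any of the listed papers; individual papers may compute these metrics differently, for example by interpolating between operating points to obtain the EER.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) pairs obtained by sweeping a threshold over the scores.

    scores: per-frame speech scores (higher means more speech-like).
    labels: per-frame ground truth, 1 for speech and 0 for non-speech.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one speech and one non-speech frame")
    # Sort frame indices from the most to the least speech-like score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    for rank, i in enumerate(order):
        if labels[i] == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last frame in a group of tied scores.
        is_last = rank + 1 == len(order)
        if is_last or scores[order[rank + 1]] != scores[i]:
            points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve, computed with the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


def eer(points):
    """Approximate equal error rate: the operating point where the
    false-alarm rate (FPR) and the miss rate (1 - TPR) are closest."""
    fpr, tpr = min(points, key=lambda p: abs(p[0] - (1.0 - p[1])))
    return (fpr + (1.0 - tpr)) / 2.0


if __name__ == "__main__":
    # Toy data (hypothetical): eight frames, four speech and four non-speech.
    toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
    toy_labels = [1, 1, 0, 1, 1, 0, 0, 0]
    pts = roc_points(toy_scores, toy_labels)
    print("AUC:", auc(pts))
    print("EER:", eer(pts))
```

The EER here is taken at the nearest measured operating point rather than interpolated, which keeps the sketch short but can be coarse when only a few frames are scored.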