原創: [email protected]
時間: 2020/05/19
My earlier way of reading papers was flawed, yet I was quite pleased with myself. Let this be a lesson: unless a paper is a classic or a must-read, do not read it cover to cover.
Paper reading handbook: is your paper-reading ability up to standard?
1. LVCSR
- LVCSR-based systems need to generate rich lattices and require substantial computational resources.
2. HMM-GMM
3. Speech-to-Text
3.1 DNN
- 2014 | Small-footprint keyword spotting using deep neural networks
- arXiv:1709.03665 | attention | DNN + CTC
- Needs no frame-level alignment, but requires full-sized encoders
- arXiv:1812.02802 | Streaming + End-to-End
- Singular value decomposition (SVD) to shrink the model
- 2019 | Multitask Learning of Deep Neural Network Based Keyword Spotting for IoT Devices | DNN + HMM
- 2019 | Time-Delayed Bottleneck Highway Networks Using a DFT Feature for Keyword Spotting
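The SVD compression mentioned above replaces one large dense weight matrix with two low-rank factors, so a layer stores far fewer parameters. A minimal numpy sketch; the matrix size and rank are illustrative, not taken from any of the papers listed:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 512x512 hidden-layer weight matrix.
W = rng.standard_normal((512, 512))
rank = 64  # number of singular values kept; chosen for illustration

# Truncated SVD: W ~= A @ B with A = U_k * s_k, B = V_k^T.
U, s, Vt = np.linalg.svd(W, full_matrices=False)
A = U[:, :rank] * s[:rank]   # 512 x 64
B = Vt[:rank, :]             # 64 x 512

# One big linear layer y = x @ W becomes two small ones: y ~= (x @ A) @ B.
params_before = W.size            # 512 * 512 = 262144
params_after = A.size + B.size    # 2 * 512 * 64 = 65536
print(params_before, params_after)
```

In a real model the two factors are fine-tuned after the decomposition to recover the accuracy lost to truncation.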
3.2 CNN | models correlations across time and frequency
- 2015 | Convolutional neural networks for small-footprint keyword spotting
- arXiv:1907.01448 | Sub-band CNN
- 2019 | A Small-Footprint End-to-End KWS System in Low Resources | CTC + End-to-End
- arXiv:1811.07684 | Dilated Convolutions | end-to-end
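The dilated-convolution entry above relies on a simple piece of arithmetic: with dilation doubling per layer, the receptive field grows exponentially with depth instead of linearly, so a small stack can see a whole keyword. A quick sketch (kernel size and layer count are illustrative):

```python
def receptive_field(kernel_size, dilations):
    """Receptive field of stacked 1-D convolutions with stride 1."""
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf

# Five layers of kernel 3: plain stack vs. doubling dilation per layer.
plain = receptive_field(3, [1, 1, 1, 1, 1])     # 11 frames
dilated = receptive_field(3, [1, 2, 4, 8, 16])  # 63 frames
print(plain, dilated)
```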
3.3 CRNN
- arXiv:1703.05390
- 1.5-second decoding window, so not truly real-time
- arXiv:1911.01803 | CRNN + temporal feedback connections
3.4 RNN
- arXiv:1512.08903 | LSTM + CTC
- arXiv:1611.09405 | RNN + CTC (End-to-End)
- arXiv:1705.02411 | LSTM + Max-Pooling loss
  - Needs no phoneme-level alignment, but is limited by the decoding
- arXiv:1803.10916 | Attention | Encoder-Decoder | End-to-End
- 2019 | Adversarial examples for improving end-to-end attention-based small-footprint keyword spotting (End-to-End)
- 2019 | DenseNet-BiLSTM
- arXiv:1912.07575 | Encoder-Decoder
- arXiv:2002.10851 | Quantized LSTM + CTC
- ResNet | larger receptive field
  - arXiv:1710.10361
  - arXiv:1904.03814 | TCNet | hyperconnect/TC-ResNet
  - arXiv:1912.05124 | CENet-GCN
  - arXiv:2004.08531 | MatchboxNet (end-to-end)
3.5 Other
- TDNN
  - 2017 | Compressed time delay neural network for small-footprint keyword spotting | SVD
  - 2019 | A Time Delay Neural Network with Shared Weight Self-Attention for Small-Footprint Keyword Spotting
- DSConv (Depthwise Separable Convolutions)
  - arXiv:1911.02086 | SincConv + DSConv | Raw audio
  - arXiv:1808.00158 | ASR + SincConv | github
  - arXiv:2004.12200 | DSConv + ResNet
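Depthwise separable convolutions keep appearing in the small-footprint entries above because they split a standard convolution into a per-channel spatial filter plus a 1x1 pointwise mix, cutting the parameter count sharply. A rough count (channel and kernel sizes are illustrative):

```python
def conv_params(c_in, c_out, k):
    """Standard 2-D convolution: one k x k filter per (in, out) channel pair."""
    return c_in * c_out * k * k

def dsconv_params(c_in, c_out, k):
    """Depthwise (one k x k filter per input channel) + pointwise 1x1 mix."""
    return c_in * k * k + c_in * c_out

standard = conv_params(64, 64, 3)     # 36864 weights
separable = dsconv_params(64, 64, 3)  # 576 + 4096 = 4672 weights
print(standard, separable)
```

The same split reduces multiply-accumulate operations by roughly the same factor, which is what matters on always-on DSP hardware.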
4. Query-by-Example
- LSTM
  - 2015 | Query-by-example keyword spotting using long short-term memory networks
- arXiv:1811.10736 | DONUT | CTC | posterior probabilities
- RNN-T with attention
  - arXiv:1710.09617 | Sequence-to-sequence | End-to-End | keyword/filler
    - Needs no frame alignment, but requires full-sized encoders
- arXiv:1910.05171 | query enrollment and testing | user-specific queries
5. Other
5.1 Model optimization
- Compression methods
- arXiv:1412.6115 | widely used on the computer-vision side
- 2016 | Model compression applied to small-footprint keyword spotting
- arXiv:1711.07128
- arXiv:1712.05877
- arXiv:1902.05026
- Another optimization approach: converting a non-streaming model into a streaming model
- Quantized Distillation
- Low rank
- 2016 | Model compression applied to small-footprint keyword spotting
- 2017| Compressed time delay neural network for small-footprint keyword
spotting
5.2 Loss
- CTC loss
  - 2006 | Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks
- Max-pooling loss
  - arXiv:1705.02411
  - arXiv:2001.09246 | A smoothed max-pooling loss (end-to-end)
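Max-pooling loss sidesteps frame-level alignment: for a keyword utterance, only the single frame with the highest keyword posterior gets the cross-entropy gradient, while filler utterances train every frame toward the background class. A toy numpy sketch; the two-class logits below are made up for illustration:

```python
import numpy as np

def max_pooling_loss(frame_logits, is_keyword):
    """frame_logits: (T, 2) per-frame logits for [filler, keyword]."""
    # Per-frame softmax over the two classes.
    e = np.exp(frame_logits - frame_logits.max(axis=1, keepdims=True))
    post = e / e.sum(axis=1, keepdims=True)
    if is_keyword:
        # Train only the frame most confident in the keyword.
        return -np.log(post[:, 1].max())
    # Otherwise push every frame toward the filler class.
    return -np.log(post[:, 0]).mean()

logits = np.array([[2.0, -1.0], [0.5, 3.0], [1.0, 0.0]])
loss = max_pooling_loss(logits, is_keyword=True)
```

Because the max picks the best-scoring frame, the network is free to fire anywhere inside the keyword, which is what removes the need for forced alignment.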
5.3 Dataset
5.3.1 Enhance
- 2019 | Adversarial Examples for Improving End-to-end Attention-based Small-footprint Keyword Spotting (End-to-End)
5.3.2 Small dataset
- 2018 | Fast ASR-free and almost zero-resource keyword spotting using DTW and CNNs for humanitarian monitoring
  - uses DTW to augment the data
- 2019 | Meta learning for few-shot keyword spotting
  - suggests a few-shot meta-learning approach
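The DTW-based augmentation above hinges on the classic dynamic-programming alignment between two variable-length sequences. A compact sketch of DTW itself on 1-D features (real systems run it over MFCC frames with a vector distance):

```python
import numpy as np

def dtw_cost(a, b):
    """Dynamic-time-warping distance between two 1-D sequences."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Step pattern: diagonal match, or stretch either sequence.
            D[i, j] = cost + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    return D[n, m]

# Identical sequences align at zero cost, and a time-stretched copy
# still aligns at zero cost, which is exactly what DTW is for.
print(dtw_cost([1, 2, 3], [1, 2, 3]))     # 0.0
print(dtw_cost([1, 2, 3], [1, 2, 2, 3]))  # 0.0
```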
5.4 Other
- arXiv:2005.03633 | Far-field
- 2019 | Improving keyword spotting and language identification via neural architecture search at scale
- 2017 | Hey Siri: An On-device DNN-powered Voice Trigger for Apple's Personal Assistant