原創: [email protected]
時間: 2020/05/19
My earlier way of reading papers was flawed, yet I was quite pleased with myself. Let this be a lesson: unless a paper is a classic or a must-read, do not read it cover to cover.
Paper reading handbook: is your paper-reading ability up to standard?
1. LVCSR
- LVCSR-based systems need to generate rich lattices and require substantial computational resources.
2. HMM-GMM
3. Speech-to-Text
3.1 DNN
- 2014 | Small-footprint keyword spotting using deep neural networks
- arXiv:1709.03665 | attention | DNN + CTC
- Needs no frame-level alignment, but requires full-sized encoders
- arXiv:1812.02802 | Streaming + End-to-End
- Singular value decomposition (SVD) to shrink the model
- 2019 | Multitask Learning of Deep Neural Network Based Keyword Spotting for IoT Devices | DNN + HMM
- 2019 | Time-Delayed Bottleneck Highway Networks Using a DFT Feature for Keyword Spotting
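The SVD compression mentioned above replaces one large dense weight matrix with two low-rank factors, so a layer stores far fewer parameters. A minimal numpy sketch; the matrix size and rank are illustrative, not taken from any of the papers listed:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 512x512 hidden-layer weight matrix.
W = rng.standard_normal((512, 512))
rank = 64  # number of singular values kept; chosen for illustration

# Truncated SVD: W ~= A @ B with A = U_k * s_k, B = V_k^T.
U, s, Vt = np.linalg.svd(W, full_matrices=False)
A = U[:, :rank] * s[:rank]   # 512 x 64
B = Vt[:rank, :]             # 64 x 512

# One big linear layer y = x @ W becomes two small ones: y ~= (x @ A) @ B.
params_before = W.size            # 512 * 512 = 262144
params_after = A.size + B.size    # 2 * 512 * 64 = 65536
print(params_before, params_after)
```

In a real model the two factors are fine-tuned after the decomposition to recover the accuracy lost to truncation.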
3.2 CNN | models correlations across time and frequency
- 2015 | Convolutional neural networks for small-footprint keyword spotting
- arXiv:1907.01448 | Sub-band CNN
- 2019 | A Small-Footprint End-to-End KWS System in Low Resources | CTC + End-to-End
- arXiv:1811.07684 | Dilated Convolutions | end-to-end
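The dilated-convolution entry above relies on a simple piece of arithmetic: with dilation doubling per layer, the receptive field grows exponentially with depth instead of linearly, so a small stack can see a whole keyword. A quick sketch (kernel size and layer count are illustrative):

```python
def receptive_field(kernel_size, dilations):
    """Receptive field of stacked 1-D convolutions with stride 1."""
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf

# Five layers of kernel 3: plain stack vs. doubling dilation per layer.
plain = receptive_field(3, [1, 1, 1, 1, 1])     # 11 frames
dilated = receptive_field(3, [1, 2, 4, 8, 16])  # 63 frames
print(plain, dilated)
```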
3.3 CRNN
- arXiv:1703.05390
- 1.5-second decoding window, so not truly real-time
- arXiv:1911.01803 | CRNN + temporal feedback connections
3.4 RNN
- arXiv:1512.08903 | LSTM + CTC
- arXiv:1611.09405 | RNN + CTC (End-to-End)
- arXiv:1705.02411 | LSTM + Max-Pooling loss
  - Needs no phoneme-level alignment, but is limited by the decoding
- arXiv:1803.10916 | Attention | Encoder-Decoder | End-to-End
- 2019 | Adversarial examples for improving end-to-end attention-based small-footprint keyword spotting (End-to-End)
- 2019 | DenseNet-BiLSTM
- arXiv:1912.07575 | Encoder-Decoder
- arXiv:2002.10851 | Quantized LSTM + CTC
- ResNet | larger receptive field
  - arXiv:1710.10361
  - arXiv:1904.03814 | TCNet | hyperconnect/TC-ResNet
  - arXiv:1912.05124 | CENet-GCN
  - arXiv:2004.08531 | MatchboxNet (end-to-end)
3.5 Other
- TDNN
  - 2017 | Compressed time delay neural network for small-footprint keyword spotting | SVD
  - 2019 | A Time Delay Neural Network with Shared Weight Self-Attention for Small-Footprint Keyword Spotting
- DSConv (Depthwise Separable Convolutions)
  - arXiv:1911.02086 | SincConv + DSConv | Raw audio
  - arXiv:1808.00158 | ASR + SincConv | github
  - arXiv:2004.12200 | DSConv + ResNet
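Depthwise separable convolutions keep appearing in the small-footprint entries above because they split a standard convolution into a per-channel spatial filter plus a 1x1 pointwise mix, cutting the parameter count sharply. A rough count (channel and kernel sizes are illustrative):

```python
def conv_params(c_in, c_out, k):
    """Standard 2-D convolution: one k x k filter per (in, out) channel pair."""
    return c_in * c_out * k * k

def dsconv_params(c_in, c_out, k):
    """Depthwise (one k x k filter per input channel) + pointwise 1x1 mix."""
    return c_in * k * k + c_in * c_out

standard = conv_params(64, 64, 3)     # 36864 weights
separable = dsconv_params(64, 64, 3)  # 576 + 4096 = 4672 weights
print(standard, separable)
```

The same split reduces multiply-accumulate operations by roughly the same factor, which is what matters on always-on DSP hardware.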
4. Query-by-Example
- LSTM
  - 2015 | Query-by-example keyword spotting using long short-term memory networks
- arXiv:1811.10736 | DONUT | CTC | posterior probabilities
- RNN-T with attention
  - arXiv:1710.09617 | Sequence-to-sequence | End-to-End | keyword/filler
    - Needs no frame alignment, but requires full-sized encoders
- arXiv:1910.05171 | query enrollment and testing | user-specific queries
5. Other
5.1 Model optimization
- Compression methods
- arXiv:1412.6115 | widely used on the computer-vision side
- 2016 | Model compression applied to small-footprint keyword spotting
- arXiv:1711.07128
- arXiv:1712.05877
- arXiv:1902.05026
- Another optimization approach: converting a non-streaming model into a streaming model
- Quantized Distillation
- Low rank
- 2016 | Model compression applied to small-footprint keyword spotting
- 2017| Compressed time delay neural network for small-footprint keyword
spotting
5.2 Loss
- CTC loss
  - 2006 | Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks
- Max-pooling loss
  - arXiv:1705.02411
  - arXiv:2001.09246 | A smoothed max-pooling loss (end-to-end)
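Max-pooling loss sidesteps frame-level alignment: for a keyword utterance, only the single frame with the highest keyword posterior gets the cross-entropy gradient, while filler utterances train every frame toward the background class. A toy numpy sketch; the two-class logits below are made up for illustration:

```python
import numpy as np

def max_pooling_loss(frame_logits, is_keyword):
    """frame_logits: (T, 2) per-frame logits for [filler, keyword]."""
    # Per-frame softmax over the two classes.
    e = np.exp(frame_logits - frame_logits.max(axis=1, keepdims=True))
    post = e / e.sum(axis=1, keepdims=True)
    if is_keyword:
        # Train only the frame most confident in the keyword.
        return -np.log(post[:, 1].max())
    # Otherwise push every frame toward the filler class.
    return -np.log(post[:, 0]).mean()

logits = np.array([[2.0, -1.0], [0.5, 3.0], [1.0, 0.0]])
loss = max_pooling_loss(logits, is_keyword=True)
```

Because the max picks the best-scoring frame, the network is free to fire anywhere inside the keyword, which is what removes the need for forced alignment.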
5.3 Dataset
5.3.1 Enhance
- 2019 | Adversarial Examples for Improving End-to-end Attention-based Small-footprint Keyword Spotting (End-to-End)
5.3.2 Small dataset
- 2018 | Fast ASR-free and almost zero-resource keyword spotting using DTW and CNNs for humanitarian monitoring
  - uses DTW to augment the data
- 2019 | Meta learning for few-shot keyword spotting
  - suggests a few-shot meta-learning approach
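The DTW-based augmentation above hinges on the classic dynamic-programming alignment between two variable-length sequences. A compact sketch of DTW itself on 1-D features (real systems run it over MFCC frames with a vector distance):

```python
import numpy as np

def dtw_cost(a, b):
    """Dynamic-time-warping distance between two 1-D sequences."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Step pattern: diagonal match, or stretch either sequence.
            D[i, j] = cost + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    return D[n, m]

# Identical sequences align at zero cost, and a time-stretched copy
# still aligns at zero cost, which is exactly what DTW is for.
print(dtw_cost([1, 2, 3], [1, 2, 3]))     # 0.0
print(dtw_cost([1, 2, 3], [1, 2, 2, 3]))  # 0.0
```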
5.4 Other
- arXiv:2005.03633 | Far-field
- 2019 | Improving keyword spotting and language identification via neural architecture search at scale
- 2017 | Hey Siri: An On-device DNN-powered Voice Trigger for Apple's Personal Assistant