A Summary of Distillation Algorithms for Model Compression
Table of Contents
- A Summary of Distillation Algorithms for Model Compression
- Output Alignment
- Distilling the Knowledge in a Neural Network (NIPS 2014)
- Deep Mutual Learning (CVPR 2018)
- Born Again Neural Networks (ICML 2018)
- Direct Alignment
- Fitting Attention Maps
- Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer (ICLR 2017)
- Learning Lightweight Lane Detection CNNs by Self Attention Distillation (ICCV 2019)
- Fitting Features
- FitNets: Hints for Thin Deep Nets (ICLR 2015)
- Relation Alignment
- Fitting Pairwise Relations Between Features
- A Gift from Knowledge Distillation: Fast Optimization, Network Minimization and Transfer Learning (CVPR 2017)
- Graph-based Knowledge Distillation by Multi-head Attention Network (BMVC 2019)
- Fitting Relations Contained in Outputs
- Similarity-Preserving Knowledge Distillation (ICCV 2019)
- Relational Knowledge Distillation (CVPR 2019)
- Data Distillation: Towards Omni-Supervised Learning (CVPR 2018)
- Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results (NIPS 2017)
- Fitting Relations Within a Feature Itself
- Knowledge Adaptation for Efficient Semantic Segmentation (CVPR 2019)
- Structured Knowledge Distillation for Semantic Segmentation (CVPR 2019)
Original document: https://www.yuque.com/lart/gw5mta/scisva
Google Slide: https://docs.google.com/presentation/d/e/2PACX-1vSsa5X_zfuJUPgxUL7vu8MHbkj3JnUzIlKbf-eXkYivhwiFZRVx_NqhSxBbYDu-1c2D7ucBX_Rlf9kD/pub?start=false&loop=false&delayms=3000
Created on September 7, 2019
Original mind-map document: http://naotu.baidu.com/file/f60fea22a9ed0ea7236ca9a70ff1b667?token=dab31b70fffa034a(kdxj)
Output Alignment
Distilling the Knowledge in a Neural Network (NIPS 2014)
- Trains the student on the teacher model's soft targets (temperature-softened output probabilities)
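The soft targets are the teacher's class probabilities softened by a temperature T; the student minimizes a weighted sum of a KL term against them (scaled by T² to keep gradient magnitudes comparable) and the usual cross-entropy on hard labels. A minimal NumPy sketch of this loss; the function names and the `T`/`alpha` defaults are illustrative, not prescribed by the paper:

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-scaled softmax; higher T gives softer distributions."""
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Distillation loss sketch:
    alpha * T^2 * KL(teacher_soft || student_soft) + (1 - alpha) * CE(labels, student)."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kl = np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)), axis=-1)
    p_hard = softmax(student_logits, 1.0)
    ce = -np.log(p_hard[np.arange(student_logits.shape[0]), labels] + 1e-12)
    return np.mean(alpha * (T ** 2) * kl + (1 - alpha) * ce)
```

When the student's logits match the teacher's, the KL term vanishes and only the hard-label cross-entropy remains.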
Deep Mutual Learning (CVPR 2018)
- Trains multiple student networks alternately so that they teach and improve each other
Born Again Neural Networks (ICML 2018)
- Trains student 1 from the teacher, then trains student i+1 from student i, and finally ensembles all the student models
Direct Alignment
Fitting Attention Maps
Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer (ICLR 2017)
- Aligns the single-channel attention maps obtained by fusing each stage's features across channels
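One common form of this channel fusion sums the squared activations over channels and L2-normalizes the resulting map, so student and teacher maps are comparable even when their channel counts differ. A minimal NumPy sketch under that assumption (function names are illustrative):

```python
import numpy as np

def attention_map(feat):
    """Collapse a (C, H, W) feature tensor into a single-channel attention
    map: sum of squared activations over channels, then L2-normalize the
    flattened map so maps from different networks are comparable."""
    amap = (feat ** 2).sum(axis=0).ravel()
    return amap / (np.linalg.norm(amap) + 1e-12)

def at_loss(student_feat, teacher_feat):
    """Attention-transfer term: L2 distance between normalized attention maps.
    Assumes matching spatial sizes; channel counts may differ."""
    return np.linalg.norm(attention_map(student_feat) - attention_map(teacher_feat))
```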
Learning Lightweight Lane Detection CNNs by Self Attention Distillation (ICCV 2019)
- Computes attention maps from each stage's features via channel fusion, and aligns the attention maps output by earlier stages with those of later stages
Fitting Features
FitNets: Hints for Thin Deep Nets (ICLR 2015)
- Stage one uses a regressor module to align the output features of part of the student network with those of part of the teacher network; stage two trains with soft targets
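The stage-one "hint" loss can be sketched as follows; here a simple channel-mixing matrix `W` stands in for the paper's convolutional regressor, which is an assumption made to keep the example framework-free:

```python
import numpy as np

def hint_loss(student_feat, teacher_feat, W):
    """FitNets stage-1 hint loss (sketch): a regressor W of shape (C_t, C_s)
    (a plain channel-mixing matrix here, standing in for the paper's conv
    regressor) projects the student's (C_s, H, W) hint features to the
    teacher's (C_t, H, W) guided-layer shape; an L2 loss aligns the two."""
    c_s, h, w = student_feat.shape
    projected = (W @ student_feat.reshape(c_s, -1)).reshape(-1, h, w)
    return 0.5 * np.sum((projected - teacher_feat) ** 2)
```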
Relation Alignment
Fitting Pairwise Relations Between Features
A Gift from Knowledge Distillation: Fast Optimization, Network Minimization and Transfer Learning (CVPR 2017)
- Computes the relations between the channels of adjacent stages' features (the FSP matrix) and aligns them between teacher and student
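The FSP ("flow of solution procedure") matrix is the inner product between every channel of one stage's features and every channel of the next stage's, averaged over spatial positions; teacher and student FSP matrices are then matched with an L2 loss. A NumPy sketch (function names are illustrative):

```python
import numpy as np

def fsp_matrix(feat_a, feat_b):
    """FSP matrix between two stages' features: channel-wise inner products
    averaged over spatial positions. (C1, H, W), (C2, H, W) -> (C1, C2)."""
    f1 = feat_a.reshape(feat_a.shape[0], -1)
    f2 = feat_b.reshape(feat_b.shape[0], -1)
    return (f1 @ f2.T) / f1.shape[1]

def fsp_loss(s_a, s_b, t_a, t_b):
    """Align the student's and teacher's FSP matrices with a mean squared error."""
    return np.mean((fsp_matrix(s_a, s_b) - fsp_matrix(t_a, t_b)) ** 2)
```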
Graph-based Knowledge Distillation by Multi-head Attention Network (BMVC 2019)
- Uses non-local operations to mine the relations between SVD-processed features of adjacent stages
Fitting Relations Contained in Outputs
Similarity-Preserving Knowledge Distillation (ICCV 2019)
- Aligns the pairwise relations between the output features of all samples within a batch
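Concretely, each network's batch of features yields a B×B similarity matrix (flattened features times their transpose, rows L2-normalized), and the student's matrix is pushed toward the teacher's. Because only sample-to-sample similarities are compared, the two networks' feature widths may differ. A NumPy sketch under these assumptions:

```python
import numpy as np

def batch_similarity(feats):
    """B x B similarity between samples in a batch: flatten each sample's
    features, take the Gram matrix, then L2-normalize each row."""
    f = feats.reshape(feats.shape[0], -1)
    g = f @ f.T
    norms = np.linalg.norm(g, axis=1, keepdims=True)
    return g / (norms + 1e-12)

def sp_loss(student_feats, teacher_feats):
    """Frobenius distance between student and teacher similarity matrices,
    scaled by batch size squared; feature dimensions may differ."""
    b = student_feats.shape[0]
    gs = batch_similarity(student_feats)
    gt = batch_similarity(teacher_feats)
    return np.sum((gs - gt) ** 2) / (b * b)
```

Row normalization makes the loss invariant to a global rescaling of either network's features.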
Relational Knowledge Distillation (CVPR 2019)
- Aligns the distance relations of arbitrary output pairs and the angle relations of output triplets within a batch
Data Distillation: Towards Omni-Supervised Learning (CVPR 2018)
- The teacher and student architectures may be the same or different; the teacher's outputs on differently transformed versions of each sample are ensembled
Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results (NIPS 2017)
- A semi-supervised method: the teacher's weights are an exponential moving average of the current student weights and the teacher's weights from the previous step, with a consistency constraint between the two models' predictions
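The weight-averaging step itself is a one-liner per parameter; a minimal NumPy sketch (the function name and decay value are illustrative):

```python
import numpy as np

def ema_update(teacher_params, student_params, decay=0.99):
    """Mean-teacher weight update: each teacher parameter becomes an
    exponential moving average of the corresponding student parameter,
    teacher <- decay * teacher + (1 - decay) * student."""
    return [decay * t + (1 - decay) * s
            for t, s in zip(teacher_params, student_params)]
```

The consistency constraint then penalizes disagreement between the averaged teacher's and the student's predictions on (typically perturbed) unlabeled inputs.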
Fitting Relations Within a Feature Itself
Knowledge Adaptation for Efficient Semantic Segmentation (CVPR 2019)
- Uses an autoencoder to transform the teacher model's features, and an adaptation unit on the student side to adapt to the teacher's features
Structured Knowledge Distillation for Semantic Segmentation (CVPR 2019)
- Combines soft targets with the fitting of higher-level information implemented via a GAN