Knowledge Distillation Notes

Inter-Region Affinity Distillation for Road Marking Segmentation (2020.04)

Yuenan Hou1, Zheng Ma2, Chunxiao Liu2, Tak-Wai Hui1, and Chen Change Loy3
1The Chinese University of Hong Kong 2SenseTime Group Limited 3Nanyang Technological University
 

Uses an inter-region affinity graph to describe the structural relationships among road-marking regions.
Each node represents the areas of interest (AOI) of one class (one node per class, or one per instance?); edges represent the affinity between nodes.


Generation of AOI: smooth the label map with an average kernel φ; the AOI map is obtained by binarizing the smoothed map (non-zero responses mark each class's AOI).
AOI-grounded moment pooling: pools the feature map within each AOI into its mean, variance and skewness, respectively.

Inter-region affinity: the similarity (cosine similarity) between the moment vectors of every pair of AOI nodes forms the edges of the affinity graph.

Distillation: the student is trained to reproduce the teacher's inter-region affinity graph via an L2 loss between the two graphs; a minimal sketch follows.
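A minimal sketch of the pipeline above, assuming a 9×9 average kernel, cosine similarity between concatenated moment vectors, and an L2 loss between the teacher's and student's graphs (the paper treats the moments separately; concatenating them here is a simplification):

```python
# Sketch of inter-region affinity distillation (PyTorch); shapes and the kernel size are assumptions.
import torch
import torch.nn.functional as F

def aoi_maps(label_map, num_classes, kernel_size=9):
    """label_map: (B, H, W) long tensor -> binary AOI masks (B, C, H, W)."""
    one_hot = F.one_hot(label_map, num_classes).permute(0, 3, 1, 2).float()
    smoothed = F.avg_pool2d(one_hot, kernel_size, stride=1, padding=kernel_size // 2)
    return (smoothed > 0).float()

def moment_vectors(feat, aoi):
    """AOI-grounded moment pooling: mean, variance, skewness per class region.
    feat: (B, D, H, W), aoi: (B, C, H, W) -> (B, C, D, 3)."""
    aoi = F.interpolate(aoi, size=feat.shape[-2:], mode="nearest")
    area = aoi.sum(dim=(-2, -1)).clamp(min=1.0)                      # (B, C)
    f = feat.unsqueeze(1)                                            # (B, 1, D, H, W)
    m = aoi.unsqueeze(2)                                             # (B, C, 1, H, W)
    mean = (f * m).sum(dim=(-2, -1)) / area.unsqueeze(-1)            # (B, C, D)
    diff = (f - mean[..., None, None]) * m
    var = diff.pow(2).sum(dim=(-2, -1)) / area.unsqueeze(-1)
    skew = diff.pow(3).sum(dim=(-2, -1)) / area.unsqueeze(-1) / var.clamp(min=1e-6).pow(1.5)
    return torch.stack([mean, var, skew], dim=-1)

def affinity_graph(moments):
    """Cosine similarity between the moment vectors of every pair of AOI nodes."""
    v = F.normalize(moments.flatten(2), dim=-1)                      # (B, C, D*3)
    return v @ v.transpose(1, 2)                                     # (B, C, C)

def intra_kd_loss(feat_s, feat_t, label_map, num_classes):
    aoi = aoi_maps(label_map, num_classes)
    g_s = affinity_graph(moment_vectors(feat_s, aoi))
    g_t = affinity_graph(moment_vectors(feat_t, aoi)).detach()       # teacher graph is the target
    return F.mse_loss(g_s, g_t)
```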

     

Experiments

Learning Lightweight Lane Detection CNNs by Self Attention Distillation (2019.08)

Yuenan Hou1, Zheng Ma2, Chunxiao Liu2, and Chen Change Loy3
1The Chinese University of Hong Kong 2SenseTime Group Limited 3Nanyang Technological University

Self attention distillation (SAD) for lane detection.
Backbone: ENet, ResNet-18/34.
The feature map output by each block is converted into an attention map, and the attention map of a later block supervises (guides) the attention map of the preceding block.
Attention map generation: channel-wise aggregation of squared activations (p = 2) -> bilinear upsampling B(·) -> spatial softmax operation Φ(·)

Loss: sum the distillation term L_d between each block's attention map and that of the following block over all adjacent block pairs, where m is the number of blocks and L_d is an L2 loss (see the sketch below).
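A minimal sketch of SAD following the notes above: p = 2 channel aggregation, bilinear upsampling, spatial softmax, and an L2 loss between successive blocks' attention maps (the common target size is an assumption):

```python
# Sketch of Self Attention Distillation (SAD); the earlier block mimics the later block's map.
import torch
import torch.nn.functional as F

def attention_map(feat, out_size):
    """feat: (B, C, H, W) -> (B, out_H * out_W) normalized spatial attention."""
    att = feat.pow(2).sum(dim=1, keepdim=True)          # G^2_sum: sum of squared activations
    att = F.interpolate(att, size=out_size, mode="bilinear", align_corners=False)
    return F.softmax(att.flatten(1), dim=1)             # spatial softmax Phi(.)

def sad_loss(block_feats, out_size=(36, 100)):
    """block_feats: list of feature maps from successive blocks."""
    loss = 0.0
    for prev, nxt in zip(block_feats[:-1], block_feats[1:]):
        loss = loss + F.mse_loss(attention_map(prev, out_size),
                                 attention_map(nxt, out_size).detach())
    return loss
```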

Ablation Study
Distillation paths of SAD: applying SAD to block 1 hurts performance (does it destroy low-level detail information?).
Backward distillation: treating a later block as the student and an earlier block as the teacher does not work.
SAD vs. deep supervision: SAD provides soft targets and a feedback connection between blocks.
When to add SAD: adding SAD at a later training stage is beneficial.

Knowledge Adaptation for Efficient Semantic Segmentation(CVPR 2019)

Tong He1 Chunhua Shen1 Zhi Tian1 Dong Gong1 Changming Sun2 Youliang Yan3
1The University of Adelaide 2Data61, CSIRO 3Noah’s Ark Lab, Huawei Technologies

Motivation: the structural difference between the teacher and the student leads to different abilities to capture context and long-range dependencies, which makes direct distillation difficult. The knowledge should have its redundancy and noise removed before being used for distillation.

Knowledge Translation
Compress the teacher's features with an auto-encoder.

Feature Adaptation (reminiscent of FitNets)
Solves the feature-mismatch problem and reduces the effect of the inherent structural difference between the two networks.

The adapter C_f uses a 3 × 3 kernel with stride 1 and padding 1, followed by a BN layer and ReLU (see the sketch below).
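A minimal sketch of that adapter; the channel counts are placeholders for the student / compressed-teacher feature dimensions:

```python
# Sketch of the feature adapter C_f: 3x3 conv (stride 1, padding 1) + BN + ReLU.
import torch.nn as nn

def feature_adapter(in_channels, out_channels):
    return nn.Sequential(
        nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=1, padding=1, bias=False),
        nn.BatchNorm2d(out_channels),
        nn.ReLU(inplace=True),
    )
```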

Affinity Distillation
Cosine distance is used to express the similarity between features (pairwise affinities between spatial positions, sketched below).
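A minimal sketch of an affinity-distillation term based on the note above: cosine similarities between all pairs of spatial positions, matched between teacher and student with an L2 loss (the exact normalization in the paper may differ):

```python
# Sketch of affinity distillation over spatial positions; best applied to a small feature map.
import torch
import torch.nn.functional as F

def spatial_affinity(feat):
    """feat: (B, C, H, W) -> (B, H*W, H*W) cosine-similarity matrix."""
    v = F.normalize(feat.flatten(2), dim=1)        # unit-norm feature vector per position
    return v.transpose(1, 2) @ v                   # pairwise cosine similarities

def affinity_distillation_loss(feat_s, feat_t):
    if feat_s.shape[-2:] != feat_t.shape[-2:]:     # align spatial sizes if they differ
        feat_s = F.interpolate(feat_s, size=feat_t.shape[-2:], mode="bilinear",
                               align_corners=False)
    return F.mse_loss(spatial_affinity(feat_s), spatial_affinity(feat_t).detach())
```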

Backbone: teacher ResNet-50, student MobileNetV2.

Knowledge Distillation via Instance Relationship Graph (CVPR 2019)

Yufan Liu∗a, Jiajiong Cao*b, Bing Li†a, Chunfeng Yuan†a, Weiming Hua, Yangxi Lic and Yunqiang Duanc
aNLPR, Institute of Automation, Chinese Academy of Sciences  bAnt Financial   cNational Computer Network Emergency Response Technical Team/Coordination Center of China

Classification task; instance (sample) relationships are introduced into the distillation. A graph is built with instance features as vertices and an adjacency matrix A representing the relationships. Since A depends on the batch size, the loss weight λ has to be tuned for each batch size.
Backbone: ResNet. Datasets: CIFAR, ImageNet.

The loss is an L2 term on the vertex (instance) features plus an L2 term on the graph edges (see the sketch below).
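A minimal sketch of that loss, assuming edges are built from pairwise Euclidean distances between instance features and that the teacher and student feature dimensions already match (e.g., via a linear adapter):

```python
# Sketch of an instance-relationship-graph loss: L2 on vertices + L2 on the edge matrix.
import torch
import torch.nn.functional as F

def edge_matrix(feats):
    """feats: (B, D) instance features -> (B, B) pairwise Euclidean distances."""
    return torch.cdist(feats, feats, p=2)

def irg_loss(feat_s, feat_t, lam=1.0):
    """lam weights the edge term and depends on the batch size, as noted above."""
    vertex_loss = F.mse_loss(feat_s, feat_t.detach())
    edge_loss = F.mse_loss(edge_matrix(feat_s), edge_matrix(feat_t).detach())
    return vertex_loss + lam * edge_loss
```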

Relational Knowledge Distillation (CVPR2019)

Adds the relations between samples within a batch to the distilled information. Designed for metric learning, but it also brings gains on classification.
Backbone: ResNet.

Embedding vectors work better without L2 normalization (by exploiting a larger embedding space).
Distance-wise distillation loss: the pairwise distances between embeddings (normalized by their batch mean) are matched between teacher and student with a Huber loss.
Angle-wise distillation loss: the angles formed by triplets of embeddings are matched in the same way; a minimal sketch of both terms follows.
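A minimal sketch of the two RKD terms as described above; the loss weights are placeholders:

```python
# Sketch of RKD: distance-wise + angle-wise relational losses with a Huber (smooth L1) penalty.
import torch
import torch.nn.functional as F

def pairwise_distances(e):
    """e: (B, D) embeddings -> (B, B) distances normalized by their batch mean."""
    d = torch.cdist(e, e, p=2)
    return d / d[d > 0].mean().clamp(min=1e-12)

def triplet_angles(e):
    """Cosine of the angle at e_j for every triplet (e_i, e_j, e_k) -> (B, B, B)."""
    diff = e.unsqueeze(0) - e.unsqueeze(1)            # diff[j, i] = e_i - e_j
    diff = F.normalize(diff, dim=-1)
    return torch.einsum("jid,jkd->ijk", diff, diff)

def rkd_loss(emb_s, emb_t, w_dist=1.0, w_angle=2.0):
    loss_d = F.smooth_l1_loss(pairwise_distances(emb_s), pairwise_distances(emb_t).detach())
    loss_a = F.smooth_l1_loss(triplet_angles(emb_s), triplet_angles(emb_t).detach())
    return w_dist * loss_d + w_angle * loss_a
```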
  

A Gift from Knowledge Distillation: Fast Optimization, Network Minimization and Transfer Learning (CVPR 2017)

Junho Yim1 Donggyu Joo1 Jihoon Bae2 Junmo Kim1
1School of Electrical Engineering, KAIST, South Korea   2Electronics and Telecommunications Research Institute

The similarity between features from different layers of the same network is used as the distilled knowledge (a Gramian-style matrix, the FSP matrix).
The knowledge transfer performance is very sensitive to how the distilled knowledge is defined; the authors argue that demonstrating the solution process of a problem provides better generalization than teaching the intermediate result.
The paper demonstrates three benefits of distillation experimentally:
1. Fast optimization
2. Performance improvement for the small DNN
3. Transfer learning

G_ij = the element-wise product of channel i of Feature1 and channel j of Feature2, summed over spatial positions (divided by the spatial size h × w in the paper); a minimal sketch follows.
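A minimal sketch of the FSP matrix and its L2 matching loss, following the definition above; the two features of a pair come from the ends of a network stage and must share the same spatial size:

```python
# Sketch of the FSP (flow of solution procedure) matrix and matching loss.
import torch
import torch.nn.functional as F

def fsp_matrix(feat1, feat2):
    """feat1: (B, C1, H, W), feat2: (B, C2, H, W) -> (B, C1, C2) FSP matrix."""
    b, c1, h, w = feat1.shape
    f1 = feat1.flatten(2)                      # (B, C1, H*W)
    f2 = feat2.flatten(2)                      # (B, C2, H*W)
    return f1 @ f2.transpose(1, 2) / (h * w)   # G_ij = sum over positions / (h*w)

def fsp_loss(pairs_s, pairs_t):
    """pairs_*: lists of (feat1, feat2) tuples from the student / teacher stages."""
    loss = 0.0
    for (s1, s2), (t1, t2) in zip(pairs_s, pairs_t):
        loss = loss + F.mse_loss(fsp_matrix(s1, s2), fsp_matrix(t1, t2).detach())
    return loss
```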

Training procedure: stage-wise — first train the student to mimic the teacher's FSP matrices, then fine-tune on the original task.

FITNETS: HINTS FOR THIN DEEP NETS (2015.03)

Teacher: wide and shallow; student: thin and deep. Perhaps because ResNet had not yet appeared, deep nets were still hard to train at the time.
First distill an intermediate layer (a conv regressor is added on the student's intermediate layer to match the teacher's feature size), then distill the output distribution.



λ (the weight on the soft-target term) is gradually decayed during training; a minimal sketch of the two stages follows.
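A minimal sketch of the two FitNets stages summarized above; layer sizes, the temperature and the λ schedule are placeholders, and the soft-target term is written in the standard KD (KL-divergence) form:

```python
# Sketch of FitNets: stage 1 hint training, stage 2 soft-target distillation with decaying lambda.
import torch
import torch.nn as nn
import torch.nn.functional as F

regressor = nn.Conv2d(64, 256, kernel_size=1)     # maps the student hint to the teacher's guided-layer size

def hint_loss(student_hint, teacher_guided):
    """Stage 1: L2 between the regressed student hint and the teacher's guided layer."""
    return F.mse_loss(regressor(student_hint), teacher_guided.detach())

def kd_loss(student_logits, teacher_logits, labels, lam, T=4.0):
    """Stage 2: cross-entropy + lam * soft-target term; lam decays over training."""
    ce = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * T * T
    return ce + lam * soft
```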

PAYING MORE ATTENTION TO ATTENTION: IMPROVING THE PERFORMANCE OF CONVOLUTIONAL NEURAL NETWORKS VIA ATTENTION TRANSFER (2017.02)

Attention maps are used as the distilled information.



Activation-based attention transfer: aggregate the absolute activations across channels raised to the power p (p = 2), L2-normalize the flattened map, and match it between teacher and student (see the sketch below).
Gradient-based attention transfer: the gradient of the loss with respect to the input serves as the spatial attention map to be matched.
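A minimal sketch of activation-based attention transfer as described above (p = 2 aggregation, L2-normalized flattened maps, L2 matching loss):

```python
# Sketch of activation-based attention transfer between matched teacher/student layers.
import torch
import torch.nn.functional as F

def at_map(feat, p=2):
    """feat: (B, C, H, W) -> (B, H*W) L2-normalized spatial attention vector."""
    q = feat.abs().pow(p).sum(dim=1).flatten(1)   # sum over channels of |A_c|^p
    return F.normalize(q, dim=1)

def at_loss(feats_s, feats_t, p=2):
    """Sum of L2 distances between student and teacher attention maps."""
    loss = 0.0
    for fs, ft in zip(feats_s, feats_t):
        loss = loss + (at_map(fs, p) - at_map(ft, p).detach()).pow(2).mean()
    return loss
```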



Low-resolution Face Recognition in the Wild via Selective Knowledge Distillation (2019.03 TIP2018)

Face recognition task; only the useful part of the teacher's information is selected for distillation.
Teacher: large network, high-resolution input.
Student: small network, low-resolution input.
Initialization of the Two-stream CNNs
Teacher: may be pre-trained on other datasets.
Student: randomly initialized.
Selective Knowledge Distillation from the Teacher Stream
 
A feature centroid u_c is introduced for each class to reduce the number of graph edges: edges between features f_i of the same class are kept, while across classes only edges between f_i and the centroids u_c remain.


Cosine distance d(·) is the distance measure; λ is a negative weight (the first term favors selecting fewer nodes, the second favors selecting more). A sketch of the sparsified relation graph follows.
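A minimal sketch of the sparsified relation graph described above: full edges within each class, and cross-class edges only between features and class centroids. The cosine-distance form and the returned layout are assumptions.

```python
# Sketch of the centroid-based sparse graph: intra-class edges + feature-to-centroid edges.
import torch
import torch.nn.functional as F

def cosine_distance(a, b):
    """1 - cosine similarity between rows of a (N, D) and rows of b (M, D)."""
    return 1.0 - F.normalize(a, dim=1) @ F.normalize(b, dim=1).t()

def sparse_graph_edges(features, labels):
    """features: (N, D) teacher features, labels: (N,) class ids.
    Returns per-class intra-class distance matrices and feature-to-centroid distances."""
    classes = labels.unique()
    centroids = torch.stack([features[labels == c].mean(dim=0) for c in classes])  # u_c
    intra = [cosine_distance(features[labels == c], features[labels == c]) for c in classes]
    cross = cosine_distance(features, centroids)       # every f_i connected to each centroid u_c
    return intra, cross
```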
Teacher-supervised Student Stream Fine-tuning


 
