Contents
What Can Be Transferred: Unsupervised Domain Adaptation for Endoscopic Lesions Segmentation (CVPR 2020)
DEPARA: Deep Attribution Graph for Deep Knowledge Transferability (CVPR 2020 oral)
Steering Self-Supervised Feature Learning Beyond Local Pixel Statistics
Self-supervised Learning of Interpretable Keypoints from Unlabelled Videos
Distilling Cross-Task Knowledge via Relationship Matching
Revisiting Knowledge Distillation via Label Smoothing Regularization
Unsupervised Domain Adaptation via Structurally Regularized Deep Clustering
What Can Be Transferred: Unsupervised Domain Adaptation for Endoscopic Lesions Segmentation (CVPR 2020)
Authors: Jiahua Dong, ..., Xiaowei Xu
Problem: during transfer, all information is treated equally, yet some of it harms the target network. The question is how to automatically capture transferable visual characterizations and semantic representations while neglecting irrelevant knowledge across domains.
- Alternately determine where and how to explore transferable domain-invariant knowledge.
- Module 1: a residual transferability-aware bottleneck is developed for TD to highlight where to translate transferable visual information while preventing irrelevant translation. TD alone cannot guarantee that the features of the two domains are aligned; personally, I find this similar to style transfer.
- Module 2: a Residual Attention on Attention Block (RA2B) is proposed to encode domain-invariant knowledge with high transferability scores, which assists TF in exploring how to augment transferable semantic features and boosts the translation performance of module TD in return. An attention mechanism is used to strengthen certain translated features.
- Loop structure with alternating parameter updates: our model could be regarded as a closed loop that alternately updates the parameters of TD and TF.
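As a rough illustration of the attention-on-attention gating idea behind RA2B, here is a minimal numpy sketch under my own assumptions; the weight names (W_info, W_gate) and the exact gating and residual placement are hypothetical, not the paper's architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # standard scaled dot-product attention
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    return softmax(scores, axis=-1) @ V

def attention_on_attention(Q, K, V, W_info, W_gate):
    A = attention(Q, K, V)
    # concatenate the attended result with the query, then gate it:
    # a sigmoid gate decides how much of the candidate information passes
    cat = np.concatenate([A, Q], axis=-1)
    info = cat @ W_info                      # candidate information
    gate = 1.0 / (1.0 + np.exp(-cat @ W_gate))  # sigmoid gate
    return Q + info * gate                   # residual connection

rng = np.random.default_rng(0)
d = 4
Q = rng.normal(size=(3, d))
K = rng.normal(size=(5, d))
V = rng.normal(size=(5, d))
W_info = rng.normal(size=(2 * d, d))
W_gate = rng.normal(size=(2 * d, d))
out = attention_on_attention(Q, K, V, W_info, W_gate)
print(out.shape)  # (3, 4)
```

The gate suppresses attended features that carry little usable information, which is the rough sense in which "certain translated features are strengthened."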
DEPARA: Deep Attribution Graph for Deep Knowledge Transferability (CVPR 2020 oral)
- Probe data: unlabeled target data.
- Nodes: the attribution of each probe data point.
- The attribution is computed as Gradient × Input.
- Edges: the cosine similarity between each pair of nodes.
- This yields one graph per task, from which the similarity between tasks can be computed.
- Since the features F come from different models and different layers, the same comparison can also be used to choose which layer to transfer from.
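The graph construction can be sketched with toy linear "models", where the input-gradient is known in closed form. Everything below (the linear probes, the edge-only graph comparison) is my simplification for illustration, not DEPARA's full node-plus-edge similarity.

```python
import numpy as np

def attributions(W, X):
    # for a linear map f(x) = W x, the gradient of the summed output w.r.t.
    # the input is W.sum(axis=0); attribution = gradient * input
    grad = W.sum(axis=0)
    return X * grad                 # one attribution vector per probe point

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

def attribution_graph(W, X):
    nodes = attributions(W, X)      # nodes: per-instance attributions
    n = len(nodes)
    edges = np.array([[cosine(nodes[i], nodes[j]) for j in range(n)]
                      for i in range(n)])
    return nodes, edges

def graph_similarity(gA, gB):
    # crude similarity: cosine between the flattened edge matrices
    (_, eA), (_, eB) = gA, gB
    return cosine(eA.ravel(), eB.ravel())

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 8))                   # unlabeled probe data
W1 = rng.normal(size=(3, 8))                  # task model 1
W2 = W1 + 0.01 * rng.normal(size=(3, 8))      # near-duplicate of task 1
W3 = rng.normal(size=(3, 8))                  # unrelated task
g1, g2, g3 = (attribution_graph(W, X) for W in (W1, W2, W3))
print(graph_similarity(g1, g2), graph_similarity(g1, g3))
```

The nearly identical pair of models should produce nearly identical graphs and hence a higher similarity than the unrelated pair, which is the signal used to pick a transfer source.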
Steering Self-Supervised Feature Learning Beyond Local Pixel Statistics
Motivation: discriminate global image statistics.
Highlights:
- Local statistics are largely unchanged, while global statistics are clearly altered.
- The final loss function is dominated by the inpainter, and the self-supervised training does not conflict with the classifier:
- This has the following benefits:
- 1) A separate tuning of training parameters is possible, 2) GAN tricks can be applied without affecting the classifier C, 3) GAN training can be stable even when the classifier wins.
Self-supervised framework: an image restored by a border-based inpainting method differs substantially from the original image.
The classifier module:
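The local-vs-global claim can be checked with a toy patch-shuffling experiment (my own illustration, not the paper's inpainting transformation): permuting patches leaves the multiset of local patch statistics intact while destroying global structure.

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.random((8, 8))                      # toy "image"

# split into 2x2 patches, shuffle their order, reassemble
ps = 2
patches = [img[i:i+ps, j:j+ps] for i in range(0, 8, ps)
           for j in range(0, 8, ps)]
order = rng.permutation(len(patches))
shuffled = np.block([[patches[order[r * 4 + c]] for c in range(4)]
                     for r in range(4)])

# local statistics: the multiset of per-patch means is exactly preserved
local_orig = sorted(p.mean() for p in patches)
local_shuf = sorted(shuffled[i:i+ps, j:j+ps].mean() for i in range(0, 8, ps)
                    for j in range(0, 8, ps))
print(np.allclose(local_orig, local_shuf))    # True: local stats unchanged

# global statistics: the overall arrangement of the image has changed
print(np.array_equal(img, shuffled))
```

A network that only looks at local pixel statistics cannot tell the two images apart, which is why the classifier must learn global statistics to solve the pretext task.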
Self-supervised Learning of Interpretable Keypoints from Unlabelled Videos
Highlight: unsupervised keypoint recognition.
Method: use reference poses from a source unrelated to the target dataset to fit a distribution.
Distilling Cross-Task Knowledge via Relationship Matching
Problem: in current distillation methods, the dependence on the instance-label relationship restricts both teacher and student to the same label space.
A general-purpose distillation method:
Approach: emphasize the instance-instance relationship to bridge the knowledge transfer across different tasks.
- Use triplets to transfer the teacher's feature layers to the student. (xi, xj, xk) denotes a triplet, and P denotes the metric spaces induced by T and S respectively.
- Transferring the classification layer: within each mini-batch, the student's output for the instance with index 1 is pulled toward the teacher's output for the instance with the same index, and likewise for every index.
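The triplet idea can be sketched as follows. This is my own toy objective in numpy, assuming a simple "which of j, k is nearer to anchor i" comparison distribution; the paper's actual losses differ, but the point carries: the student mimics instance-instance relations rather than labels, so the two label spaces need not match.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def triplet_relation(emb, i, j, k):
    # soft comparison: for anchor i, how much closer is j than k
    dij = np.linalg.norm(emb[i] - emb[j])
    dik = np.linalg.norm(emb[i] - emb[k])
    return softmax(np.array([-dij, -dik]))

def relation_matching_loss(teacher_emb, student_emb, triplets):
    # cross-entropy between the teacher's and the student's
    # triplet comparison distributions, averaged over triplets
    loss = 0.0
    for (i, j, k) in triplets:
        pT = triplet_relation(teacher_emb, i, j, k)
        pS = triplet_relation(student_emb, i, j, k)
        loss += -(pT * np.log(pS + 1e-12)).sum()
    return loss / len(triplets)

rng = np.random.default_rng(0)
T = rng.normal(size=(10, 16))                 # teacher embeddings
S_match = T.copy()                            # student preserving all relations
S_rand = rng.normal(size=(10, 16))            # unrelated student
triplets = [tuple(rng.choice(10, size=3, replace=False)) for _ in range(50)]
print(relation_matching_loss(T, S_match, triplets),
      relation_matching_loss(T, S_rand, triplets))
```

A student whose embedding reproduces the teacher's relative distances incurs the minimum loss (the teacher's own comparison entropy), while an unrelated embedding is penalized.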
Revisiting Knowledge Distillation via Label Smoothing Regularization
Problem: a re-examination of conventional knowledge distillation (a large Teacher training a small Student).
Findings:
- A small model can likewise help a large model learn (Reversed KD).
- Poorly-trained teacher models with worse performance can also boost students (Defective KD).
- Knowledge distillation can be interpreted as a learned label smoothing regularization (LSR).
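The KD-equals-LSR connection can be checked numerically: label smoothing is exactly KD against a uniform "virtual teacher" at temperature 1. This is a sketch of that special case only; the paper works with KL divergence, which differs from cross-entropy by the teacher's entropy, a constant w.r.t. the student.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def cross_entropy(q, p):
    return -(q * np.log(p)).sum()

K, eps = 5, 0.1
logits = np.array([2.0, 0.5, 0.1, -1.0, 0.3])  # student logits
p = softmax(logits)
onehot = np.eye(K)[0]                          # ground-truth class 0
uniform = np.full(K, 1.0 / K)

# label smoothing: cross-entropy against the smoothed target
smoothed = (1 - eps) * onehot + eps * uniform
lsr_loss = cross_entropy(smoothed, p)

# KD with a uniform virtual teacher (temperature 1, mixing weight eps)
kd_loss = (1 - eps) * cross_entropy(onehot, p) + eps * cross_entropy(uniform, p)

print(np.isclose(lsr_loss, kd_loss))  # True
```

Because cross-entropy is linear in the target distribution, the two losses coincide exactly; a real teacher simply replaces the uniform distribution with a learned, instance-dependent one.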