Table of Contents
DEPARA: Deep Attribution Graph for Deep Knowledge Transferability (CVPR 2020 Oral)
Steering Self-Supervised Feature Learning Beyond Local Pixel Statistics
Self-supervised Learning of Interpretable Keypoints from Unlabelled Videos
Distilling Cross-Task Knowledge via Relationship Matching
Revisiting Knowledge Distillation via Label Smoothing Regularization
Unsupervised Domain Adaptation via Structurally Regularized Deep Clustering
What Can Be Transferred: Unsupervised Domain Adaptation for Endoscopic Lesions Segmentation (CVPR 2020)
Authors: Jiahua Dong, ..., Xiaowei Xu
Problem: during transfer all information is treated equally, yet some of it actually harms the target network. How to automatically capture transferable visual characterizations and semantic representations while neglecting irrelevant knowledge across domains?
- Alternately determine where and how to explore transferable domain-invariant knowledge.
- Module 1: a residual transferability-aware bottleneck is developed for TD to highlight where to translate transferable visual information while preventing irrelevant translation. TD alone cannot guarantee that the features of the two domains are aligned; personally this feels similar to style transfer.
- Module 2: a Residual Attention on Attention Block (RA2B) is proposed to encode domain-invariant knowledge with high transferability scores, which assists TF in exploring how to augment transferable semantic features and in return boosts the translation performance of TD. An attention mechanism is used to strengthen selected translated features.
- Closed-loop structure with alternating parameter updates: the model can be regarded as a closed loop that alternately updates the parameters of TD and TF (see the sketch below).
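A minimal PyTorch-style sketch of such a closed loop, assuming TD is an image-translation module and TF a segmentation module. All module names, losses, and shapes below are illustrative placeholders rather than the paper's actual architecture or objectives:

```python
import torch
import torch.nn as nn

# Toy stand-ins for the paper's TD (translation) and TF (segmentation) modules.
class TD(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Conv2d(3, 3, 3, padding=1)
    def forward(self, x):
        return self.net(x)

class TF(nn.Module):
    def __init__(self, n_classes=2):
        super().__init__()
        self.net = nn.Conv2d(3, n_classes, 3, padding=1)
    def forward(self, x):
        return self.net(x)

td, tf_seg = TD(), TF()
opt_td = torch.optim.Adam(td.parameters(), lr=1e-4)
opt_tf = torch.optim.Adam(tf_seg.parameters(), lr=1e-4)
seg_loss = nn.CrossEntropyLoss()

for step in range(2):                               # toy loop on random data
    src = torch.randn(2, 3, 64, 64)                 # labelled source images
    src_mask = torch.randint(0, 2, (2, 64, 64))     # source segmentation masks
    tgt = torch.randn(2, 3, 64, 64)                 # unlabelled target images

    # Step 1: update TD (translation) while TF is kept fixed.
    opt_td.zero_grad()
    translated = td(tgt)
    # Placeholder reconstruction term standing in for the transferability-aware objective.
    loss_td = (translated - tgt).abs().mean()
    loss_td.backward()
    opt_td.step()

    # Step 2: update TF (segmentation) on source images while TD is kept fixed.
    opt_tf.zero_grad()
    loss_tf = seg_loss(tf_seg(src), src_mask)
    loss_tf.backward()
    opt_tf.step()
```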
DEPARA: Deep Attribution Graph for Deep Knowledge Transferability (CVPR 2020 Oral)
- probe data: unlabelled target data
- Nodes: the attribution of each probe sample
- Attribution is computed as Gradient × Input (see the sketch after this list)
- Edges: the cosine similarity between every pair of nodes
- This yields one graph per task, from which the similarity between tasks can be computed
- Since the features F can come from different models and different layers, the same comparison can also be used to select which layer to transfer from
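A minimal sketch of building such an attribution graph on unlabelled probe data and comparing two of them. The toy models and the way node and edge similarities are combined into a single score are simplifying assumptions, not the paper's exact formulation:

```python
import torch
import torch.nn.functional as F

def attribution(model, x):
    """Node: Gradient * Input attribution of one probe sample."""
    x = x.clone().requires_grad_(True)
    out = model(x).sum()              # scalar summary of the model output
    out.backward()
    return (x.grad * x).flatten()

def attribution_graph(model, probe_data):
    """Nodes = attributions, edges = pairwise cosine similarities."""
    nodes = torch.stack([attribution(model, x) for x in probe_data])
    nodes = F.normalize(nodes, dim=1)
    edges = nodes @ nodes.t()
    return nodes, edges

def graph_similarity(g1, g2):
    """Compare two graphs built on the same probe data (simplified)."""
    n1, e1 = g1
    n2, e2 = g2
    node_sim = F.cosine_similarity(n1, n2, dim=1).mean()
    edge_sim = F.cosine_similarity(e1.flatten(), e2.flatten(), dim=0)
    return 0.5 * (node_sim + edge_sim)

# Toy usage: two "task" models probed with the same unlabelled data.
probe = [torch.randn(1, 8) for _ in range(5)]
model_a = torch.nn.Linear(8, 4)
model_b = torch.nn.Linear(8, 4)
print(graph_similarity(attribution_graph(model_a, probe),
                       attribution_graph(model_b, probe)))
```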
Steering Self-Supervised Feature Learning Beyond Local Pixel Statistics
Motivation: learn features that discriminate global image statistics.
Highlights:
- Local statistics are largely unchanged, while global statistics are clearly altered
- In the final loss function the inpainter term dominates, so the self-supervised training does not conflict with the classifier (see the sketch below)
- This brings the following benefits: 1) a separate tuning of training parameters is possible, 2) GAN tricks can be applied without affecting the classifier C, 3) GAN training can be stable even when the classifier wins
Self-supervised framework: an image restored from only the border of a patch (limited-context inpainting) differs greatly from the original image
The classifier module:
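A minimal sketch of why this separation matters: the inpainter and its discriminator are updated under their own GAN objective and optimizers, while the classifier C is trained with its own loss on detached inpainted images, so its gradients never reach the GAN. All networks, shapes, and losses here are toy placeholders:

```python
import torch
import torch.nn as nn

# Toy stand-ins: an inpainter G, a GAN discriminator D, and the transformation classifier C.
G = nn.Conv2d(3, 3, 3, padding=1)
D = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 1))
C = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 2))   # original vs. transformed

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
opt_c = torch.optim.Adam(C.parameters(), lr=1e-3)            # tuned independently of the GAN
bce = nn.BCEWithLogitsLoss()
ce = nn.CrossEntropyLoss()

img = torch.randn(4, 3, 32, 32)

# (1) Discriminator step: real images vs. inpainted images.
opt_d.zero_grad()
fake = G(img).detach()
d_loss = bce(D(img), torch.ones(4, 1)) + bce(D(fake), torch.zeros(4, 1))
d_loss.backward()
opt_d.step()

# (2) Inpainter step: fool the discriminator; GAN tricks go here without touching C.
opt_g.zero_grad()
g_loss = bce(D(G(img)), torch.ones(4, 1))
g_loss.backward()
opt_g.step()

# (3) Classifier step: tell original from inpainted images; gradients never reach G or D.
opt_c.zero_grad()
x = torch.cat([img, G(img).detach()])
y = torch.cat([torch.zeros(4), torch.ones(4)]).long()
c_loss = ce(C(x), y)
c_loss.backward()
opt_c.step()
```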
Self-supervised Learning of Interpretable Keypoints from Unlabelled Videos
Highlight: unsupervised keypoint detection
Method: use reference poses from a dataset unrelated to the target data to fit a pose distribution (see the sketch below)
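One way to read "fit a distribution" is an adversarial loss that pushes the predicted keypoints towards the distribution of unpaired reference poses. The sketch below follows that reading; the networks and the flat pose representation are illustrative assumptions, not the paper's full pipeline:

```python
import torch
import torch.nn as nn

K = 10                                     # assumed number of keypoints
predictor = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 2 * K), nn.Tanh())
discriminator = nn.Linear(2 * K, 1)        # real reference poses vs. predicted keypoints
opt_p = torch.optim.Adam(predictor.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

frames = torch.randn(8, 3, 64, 64)         # unlabelled video frames
ref_poses = torch.rand(8, 2 * K) * 2 - 1   # reference poses from an unrelated dataset

# Discriminator step: reference poses are "real", predicted keypoints are "fake".
opt_d.zero_grad()
pred = predictor(frames).detach()
d_loss = bce(discriminator(ref_poses), torch.ones(8, 1)) + \
         bce(discriminator(pred), torch.zeros(8, 1))
d_loss.backward()
opt_d.step()

# Predictor step: push its keypoint distribution towards the reference prior.
opt_p.zero_grad()
p_loss = bce(discriminator(predictor(frames)), torch.ones(8, 1))
p_loss.backward()
opt_p.step()
```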
Distilling Cross-Task Knowledge via Relationship Matching
Problem: current distillation methods depend on the instance-label relationship, which restricts both teacher and student to the same label space.
Towards a general distillation method:
Method: emphasize the instance-instance relationship to bridge knowledge transfer across different tasks
- Transfer the feature layers of T and S using triplets: (x_i, x_j, x_k) denotes a triplet, and P denotes the metric space produced by T and S respectively (see the sketch below)
- Transfer the classification layer: within each mini-batch, pull the student's instance with a given index/label towards the teacher's instance with the same index/label
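A minimal sketch of the instance-instance idea, using a pairwise-similarity matrix in place of the paper's triplet formulation; the shapes and the MSE form of the loss are simplifying assumptions:

```python
import torch
import torch.nn.functional as F

def relation_matrix(feats):
    """Pairwise cosine similarities within a mini-batch (the 'metric space' P)."""
    feats = F.normalize(feats, dim=1)
    return feats @ feats.t()

def relationship_matching_loss(student_feats, teacher_feats):
    """Match the student's instance-instance relations to the (frozen) teacher's."""
    return F.mse_loss(relation_matrix(student_feats),
                      relation_matrix(teacher_feats).detach())

# Toy usage: teacher and student embed the same mini-batch into different dimensions,
# so no shared label space is needed to compare them.
batch = torch.randn(16, 3 * 32 * 32)
teacher = torch.nn.Linear(3 * 32 * 32, 512)
student = torch.nn.Linear(3 * 32 * 32, 128)
loss = relationship_matching_loss(student(batch), teacher(batch))
loss.backward()
```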
Revisiting Knowledge Distillation via Label Smoothing Regularization
Question: a re-examination of conventional knowledge distillation (a large Teacher training a small Student)
Findings:
- A small model can likewise help a large model learn (Reversed KD)
- Poorly-trained teacher models with worse performance can also boost students (Defective KD)
- Knowledge distillation is a learned label smoothing regularization (LSR); see the LSR view sketched below
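The LSR view can be summarized in two formulas (notation: q is the one-hot label, u the uniform distribution over K classes, p the student's prediction, and p^t_tau the teacher's temperature-scaled prediction):

```latex
% Label smoothing regularization: cross-entropy against a smoothed target.
\mathcal{L}_{LS} = (1-\epsilon)\, H(q, p) + \epsilon\, H(u, p), \qquad u(k) = 1/K
% Knowledge distillation: since D_KL(p^t_\tau \| p_\tau) = H(p^t_\tau, p_\tau) - H(p^t_\tau)
% and H(p^t_\tau) is constant w.r.t. the student,
\mathcal{L}_{KD} = (1-\alpha)\, H(q, p) + \alpha\, D_{KL}\!\left(p^t_\tau \,\|\, p_\tau\right)
                \;\simeq\; (1-\alpha)\, H(q, p) + \alpha\, H\!\left(p^t_\tau, p_\tau\right)
```

Comparing the two, KD is LSR with the uniform distribution u replaced by the teacher's temperature-scaled output distribution, i.e. a learned, instance-dependent smoothing; this is why even a weak or reversed teacher can still regularize the student.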