【KD】【ReID】Distilled Person Re-identification: Towards a More Scalable System

Comment: the log distance is the part most relevant for us; the other multi-teacher studies are aimed at unsupervised learning.

 

Motivation:

With real applications in mind, the paper starts from the following requirements:

  1. Low annotation cost: a scalable Re-ID system should be able to learn from unlabelled and semi-labelled data.
  2. Low scene-extension cost: when extending to a new scene, the cross-domain problem should be solved cheaply.
  3. Low testing computation (inference) cost: to match the trend toward on-chip computation, a more lightweight model is needed.

 

Contribution:

(1) Re-ID is an open-set recognition task; to distill it, a Log-Euclidean Similarity Distillation Loss is proposed.

(2) An adaptive knowledge gate is proposed to aggregate multiple teacher models for learning a lightweight student network.

(3) These are further combined into a Multi-teacher Adaptive Similarity Distillation Framework, which lowers the annotation cost, the scene-extension cost, and the testing computation cost. (In practice the work is cross-domain, reaching SOTA among unsupervised and semi-supervised methods.)

 

Paper framework

Teacher models: five teacher models T1, T2, T3, T4, T5 were trained with labelled data from the training sets of MSMT17 [53], CUHK03 [28], VIPeR [18], DukeMTMC [70] and Market-1501.

 

3. Similarity Knowledge Distillation

3.1. Construction of Similarity Matrices

Goal: minimize the distance between the student similarity matrix A_S and the teacher similarity matrix A_T.

Constraints on the similarity matrices:

Before distillation, the student and teacher feature vectors are each processed so that:

1. Normalization: ReLU makes the features x_s non-negative, so after normalization the pairwise similarities fall in [0, 1].

2. A_S is symmetric positive definite.

Rationale: restricting the eigenvalues to be positive permits convex optimization and guarantees that the eigenvalues (needed below for the matrix logarithm) can be computed. The range of similarities in A_S is [0, 1]. A minimal construction sketch follows.
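A minimal PyTorch sketch of this construction, assuming an (n, d) batch of features; the εI diagonal jitter that enforces strict positive definiteness is an assumption, since the note does not record the paper's exact regularizer:

```python
import torch
import torch.nn.functional as F

def similarity_matrix(features: torch.Tensor, eps: float = 1e-4) -> torch.Tensor:
    """Build a symmetric positive definite similarity matrix from features."""
    # ReLU makes entries non-negative; L2 normalization then bounds every
    # pairwise inner product to [0, 1].
    x = F.normalize(F.relu(features), p=2, dim=1)
    A = x @ x.t()  # (n, n) Gram matrix, positive semi-definite by construction
    # Small diagonal jitter turns PSD into strictly positive definite
    # (assumption: the paper's exact regularization may differ).
    return A + eps * torch.eye(A.size(0), device=A.device)
```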

 

3.2. Log-Euclidean Similarity Distillation

The distance is measured in a log-Euclidean Riemannian framework [4] instead of with the Euclidean metric:

d_LE(A_S, A_T) = || log(A_S) − log(A_T) ||_F

where the matrix logarithm log(A) is well defined for any symmetric positive definite matrix: given the eigen-decomposition A = U diag(λ_1, …, λ_n) U^T,

log(A) = U diag(log λ_1, …, log λ_n) U^T

Purpose: the log map flattens the curved manifold of SPD matrices into a Euclidean space, where the distance reduces to a simple matrix norm.

 

The final log distance:

The knowledge embedded in the similarities is distilled from teacher to student by minimizing the Log-Euclidean distance d_LE(A_S, A_T) defined above, as sketched below.
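A sketch of the matrix logarithm and the resulting distillation loss, following the eigen-decomposition above; the eigenvalue clamp is a numerical-stability assumption, not part of the paper:

```python
import torch

def spd_log(A: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Matrix logarithm of a symmetric positive definite matrix."""
    evals, evecs = torch.linalg.eigh(A)   # A = U diag(evals) U^T
    evals = evals.clamp_min(eps)          # guard against numerical round-off
    return evecs @ torch.diag(torch.log(evals)) @ evecs.t()

def log_euclidean_distillation_loss(A_s: torch.Tensor, A_t: torch.Tensor) -> torch.Tensor:
    """Log-Euclidean distance between student and teacher similarity matrices."""
    return torch.linalg.norm(spd_log(A_s) - spd_log(A_t), ord='fro')
```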

 

4. Learning to learn from Multiple Teachers

4.1. Multi-teacher Adaptive Aggregated Distillation

The weight coefficients are learned: the aim is to learn the α_i dynamically so that the aggregated loss L_TA (a weighted sum of the per-teacher distillation losses, L_TA = Σ_i α_i · L_i) adapts to the target domain. L_TA is called the Adaptive Aggregated Distillation Loss; a sketch follows.
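A sketch of the aggregation, assuming a softmax over learnable logits to keep the teacher weights α_i positive and summing to one (the paper's exact parameterization of α_i is not recorded in this note):

```python
import torch
import torch.nn as nn

class AdaptiveAggregatedDistillation(nn.Module):
    """Weighted sum of per-teacher distillation losses with learnable weights."""

    def __init__(self, num_teachers: int):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(num_teachers))

    def forward(self, per_teacher_losses):
        alphas = torch.softmax(self.logits, dim=0)   # alpha_i > 0, sum to 1
        return (alphas * torch.stack(per_teacher_losses)).sum()
```

In training, each per-teacher loss would be the log-Euclidean distillation loss against that teacher's similarity matrix, while the logits are updated on the small labelled set described in 4.2.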

 

4.2. Adaptive Knowledge Aggregation

Goal: enable training with only a small amount of labelled identities in the target domain.

Let x^U_{S,k} and x^L_{S,i} denote the student features of unlabelled sample I^U_k and labelled sample I^L_i.

Since there is no identity overlap between D^L and D^U (the labelled and unlabelled identity sets are disjoint), every labelled-unlabelled pair is a negative pair.

The optimization objective of the loss is then to make the similarities of positive (same-identity labelled) pairs large and those of the (unlabelled) negative pairs small.

This objective is evaluated as a validation empirical risk, which is minimized to learn the weights α_i; a hypothetical sketch follows.
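A hypothetical sketch of this objective; the names and the margin-free form are assumptions, since the note only records "positive pairs large, negative pairs small":

```python
import torch

def validation_empirical_risk(sim_ll: torch.Tensor,
                              same_id_mask: torch.Tensor,
                              sim_lu: torch.Tensor) -> torch.Tensor:
    """Risk that rewards high similarity for labelled positive pairs and low
    similarity for labelled-unlabelled pairs.

    sim_ll: (m, m) similarities among labelled samples; same_id_mask marks
            pairs sharing an identity (positive pairs).
    sim_lu: (m, n) similarities between labelled and unlabelled samples;
            since the identity sets are disjoint, every such pair is negative.
    """
    pos = sim_ll[same_id_mask].mean()   # pull same-identity pairs together
    neg = sim_lu.mean()                 # push cross-set pairs apart
    return neg - pos                    # minimized when pos is high, neg is low
```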

 

Experimental results

Comparison between the log-Euclidean distance and the Euclidean distance.

 

Setting:

Implementation Details. For the teacher models of the source scenes, the advanced Re-ID model PCB [47] was adopted. For the student model of the target scene, the lightweight MobileNetV2 [46] was adopted, with a convolution layer applied to reduce the channel number of the last feature map to 256. A sketch of such a student follows.
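A sketch of the student under these settings, assuming the torchvision MobileNetV2 backbone (1280-channel final feature map) with a 1x1 convolution reducing it to 256 channels; the pooling head is an assumption to produce the feature vector used in the similarity matrices:

```python
import torch.nn as nn
from torchvision.models import mobilenet_v2

backbone = mobilenet_v2(weights=None).features      # final feature map: 1280 channels
student = nn.Sequential(
    backbone,
    nn.Conv2d(1280, 256, kernel_size=1),            # reduce channels to 256
    nn.AdaptiveAvgPool2d(1),                        # pool to a 256-d embedding
    nn.Flatten(),                                   # (assumed head; paper may differ)
)
```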
