[KD][ReID] Distilled Person Re-identification: Towards a More Scalable System

Comment: the log distance is of reference value for us; the other multi-teacher studies are aimed at unsupervised learning.

 

Motivation:

Targeting practical deployment, the paper starts from the following requirements:

  1. Low annotation cost. A scalable Re-ID system should be able to learn from unlabelled and semi-labelled data.
  2. Low scene-extension cost. When extending to a new scene, the cross-domain problem should be solved at low cost.
  3. Low testing computation (inference) cost. To match the trend of on-chip computation, a more lightweight model is needed.

 

Contribution:

(1) Re-ID is an open-set recognition task; for KD in this setting, the paper proposes the Log-Euclidean Similarity Distillation Loss.

(2) An adaptive knowledge gate is proposed to aggregate multiple teacher models when learning a lightweight student network.
(3) These components are further combined into a Multi-teacher Adaptive Similarity Distillation Framework, which reduces annotation cost, scene-extension cost, and testing computation cost. (In practice this is cross-domain work, reaching SOTA among unsupervised and semi-supervised methods.)

 

Paper framework

Teacher models: five teacher models T1, T2, T3, T4, T5 were trained with labelled data from the training sets of MSMT17 [53], CUHK03 [28], VIPeR [18], DukeMTMC [70] and Market-1501, respectively.

 

3. Similarity Knowledge Distillation

3.1. Construction of Similarity Matrices

Goal: minimize the distance between the student similarity matrix A_S and the teacher similarity matrix A_T.

Constraints on the similarity matrices:

Before distillation, the student and teacher similarity matrices are constructed from the respective feature vectors as follows:

1. Normalization: ReLU keeps the features non-negative, and feature normalization bounds the similarities, so the range of similarities in A_S is [0, 1].

2. A_S is made symmetric positive definite (SPD).

Rationale: restricting the eigenvalues to be positive enables convex optimization and makes the eigendecomposition, and hence the matrix logarithm, well defined.
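The construction above can be sketched in NumPy. This is a minimal reading-notes sketch, not the paper's exact recipe: the function name, the L2 row normalization, and the small `eps * I` shift used to guarantee positive definiteness are my assumptions.

```python
import numpy as np

def similarity_matrix(feats, eps=1e-5):
    """Hypothetical sketch: build an SPD similarity matrix from features.

    ReLU keeps entries non-negative, L2 row normalization bounds the
    cosine similarities in [0, 1], and eps * I enforces positive
    definiteness (eps is an assumed implementation detail).
    """
    f = np.maximum(feats, 0.0)                                   # ReLU
    f = f / (np.linalg.norm(f, axis=1, keepdims=True) + 1e-12)   # unit rows
    A = f @ f.T                                                  # similarities in [0, 1]
    return A + eps * np.eye(A.shape[0])                          # make SPD
```

With unit rows the Gram matrix f f^T is symmetric PSD with entries in [0, 1]; the eps shift lifts its eigenvalues strictly above zero so that log(A_S) exists.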

 

3.2. Log-Euclidean Similarity Distillation

The distance is measured in a log-Euclidean Riemannian framework [4] instead of with the Euclidean metric:

d(A_T, A_S) = || log(A_T) - log(A_S) ||_F

where log(A) is well defined for any symmetric positive definite matrix through its eigendecomposition A = U Λ U^T:

log(A) = U log(Λ) U^T

Purpose: the log map flattens the SPD manifold into a Euclidean space, so that distances between similarity matrices can be computed with ordinary linear operations.

 

The final Log-Euclidean distillation loss:

We distill the knowledge embedded in the similarities from teacher to student by minimizing the Log-Euclidean distance:

L_LE = || log(A_T) - log(A_S) ||_F^2
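The matrix logarithm and the distillation distance can be sketched as follows (NumPy; the eigendecomposition-based log is standard, but the 1/n^2 normalization of the loss is my assumption, not taken from the paper):

```python
import numpy as np

def matrix_log(A):
    """Matrix logarithm of an SPD matrix via eigendecomposition:
    A = U diag(lam) U^T  =>  log(A) = U diag(log(lam)) U^T."""
    lam, U = np.linalg.eigh(A)          # lam > 0 for an SPD input
    return (U * np.log(lam)) @ U.T      # U diag(log lam) U^T

def log_euclidean_loss(A_s, A_t):
    """Sketch of the Log-Euclidean Similarity Distillation Loss:
    squared Frobenius distance between matrix logs (the 1/n^2
    scaling is an assumption)."""
    d = matrix_log(A_s) - matrix_log(A_t)
    return float(np.sum(d * d)) / A_s.shape[0] ** 2
```

Because the log of the identity is the zero matrix and the loss vanishes when student and teacher similarities coincide, minimizing it pulls A_S toward A_T along the SPD manifold rather than in raw matrix space.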

 

4. Learning to learn from Multiple Teachers

4.1. Multi-teacher Adaptive Aggregated Distillation

The weight coefficients are learned: the α_i are learned dynamically so that the aggregated loss L_TA = Σ_i α_i L_i adapts to the target domain. L_TA is called the Adaptive Aggregated Distillation Loss.
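A hedged sketch of the aggregation step. The softmax parametrization of the α_i (to keep them positive and summing to one) is my assumption; the paper learns the weights by minimizing a validation risk on the target domain (Sec. 4.2).

```python
import numpy as np

def adaptive_aggregated_loss(teacher_losses, alpha_logits):
    """Sketch of L_TA = sum_i alpha_i * L_i, with alpha = softmax(logits)
    so the teacher weights form a convex combination (the softmax
    parametrization is an assumption, not the paper's exact scheme)."""
    z = np.exp(alpha_logits - np.max(alpha_logits))  # numerically stable softmax
    alpha = z / z.sum()
    return float(np.dot(alpha, teacher_losses)), alpha
```

With equal logits every teacher contributes equally; training the logits shifts weight toward the teachers whose similarity structure transfers best to the target domain.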

 

4.2. Adaptive Knowledge Aggregation

Goal: be able to train while labelling only a small number of identities in the target domain.

Let x^U_{S,k} and x^L_{S,i} denote the student features of unlabelled sample I^U_k and labelled sample I^L_i.

As there is no identity overlap between the labelled set D_L and the unlabelled set D_U, any labelled/unlabelled pair can be treated as a negative pair.

The optimization objective then makes positive-pair similarities large and negative-pair (labelled vs. unlabelled) similarities small; this objective serves as a validation empirical risk, which is minimized to learn the teacher weights α_i.
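A sketch of such a validation risk. The exact loss form in the paper is not reproduced here; this hypothetical version simply rewards high similarity on same-identity labelled pairs and low similarity on labelled-vs-unlabelled pairs.

```python
import numpy as np

def validation_risk(sim, pos_pairs, neg_pairs):
    """Hedged sketch of the validation empirical risk used to learn the
    teacher weights. sim: full similarity matrix over the validation
    samples; pos_pairs: same-identity labelled index pairs; neg_pairs:
    labelled-vs-unlabelled index pairs (identities never overlap).
    Lower is better: positives pulled up, negatives pushed down."""
    pos = np.mean([sim[i, j] for i, j in pos_pairs])
    neg = np.mean([sim[i, j] for i, j in neg_pairs])
    return float(neg - pos)
```

Evaluating this risk under each candidate weighting α lets the framework pick teacher combinations whose aggregated similarity structure actually separates identities in the target domain.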

 

Experimental results

Comparison of the Log-Euclidean distance against the Euclidean distance.

 

Setting:

Implementation Details. For the teacher models of the source scenes, the strong Re-ID model PCB [47] was adopted. For the student model of the target scene, the lightweight MobileNetV2 [46] was adopted, and a convolution layer was applied to reduce the channel number of the last feature map to 256.
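The channel-reduction layer is just a 1x1 convolution, i.e. a per-pixel linear map. A toy NumPy sketch (random weights stand in for the learned ones; 1280 input channels matches MobileNetV2's last feature map, but the function name and shapes here are illustrative assumptions):

```python
import numpy as np

def reduce_channels(feature_map, out_channels=256, seed=0):
    """Sketch of a 1x1 convolution reducing a (C, H, W) feature map to
    out_channels. Weights are random placeholders; in the paper this
    layer is learned jointly with the student."""
    c, h, w = feature_map.shape
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((out_channels, c)) / np.sqrt(c)  # 1x1 kernel = linear map per pixel
    return np.einsum('oc,chw->ohw', W, feature_map)          # apply at every spatial position
```

Since the kernel has spatial size 1, the same linear map is applied independently at each (h, w) location, which is why it only changes the channel dimension.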
