Re-ID: Learning Deep Feature Representations with Domain Guided Dropout for Person Re-identification

I just finished reading this paper and organized my thoughts here. Building on the observation that neurons show different levels of activity when learning different features, the paper proposes the DGD method, which is quite impressive.

論文:https://arxiv.org/abs/1604.07528
代碼:https://github.com/Cysu/dgd_person_reid


Paper Analysis

  • At the beginning of the paper, the authors explain why they train with multiple datasets:
    • Learning generic and robust feature representations with data from multiple domains for the same problem is of great value, especially for the problems that have multiple datasets but none of them are large enough to provide abundant data variations.
    • In other words, when no single dataset for a problem provides enough information on its own, one can consider training on several datasets together, so that they "complement" each other.
  • They also supply some background:
    • In computer vision, a domain often refers to a dataset where samples follow the same underlying data distribution.
    • A domain usually refers to a dataset whose samples follow the same underlying data distribution.
    • It’s common that multiple datasets with different data distributions are proposed to target the same or similar problems.
    • Datasets with different data distributions are often proposed to solve the same or similar problems.
    • Multiple-domain learning aims to solve the problem with datasets across different domains simultaneously by using all the data they provide.
    • Multi-domain learning aims to use datasets from multiple domains together to solve a single problem.
    • The success of deep learning is driven by the emergence of large-scale learning. Many studies have shown that fine-tuning a deep model pretrained on a large-scale dataset is an effective way to adapt it to a specific task.
    • The development of deep learning has been driven by large-scale learning. Many studies first pretrain a model on a large dataset and then fine-tune it on a task-specific dataset to obtain the final model.
    • However, in many specific areas, there is no such large-scale dataset for learning robust and generic feature representation.
    • However, not every field has such a large-scale dataset for learning robust features, which is why many research groups have instead released a number of smaller datasets.
    • The authors therefore argue:
      • It is necessary to develop an effective algorithm that jointly utilizes all of them to learn generic feature representations.
    • Besides learning robust features, multi-domain learning has another aspect:
      • Another interesting aspect of multi-domain learning is that it enriches the data variety because of the domain discrepancies.
      • The discrepancies between domains are also important.
      • Limited by various conditions, data collected by a research group might only include certain types of variations.
      • Each of such datasets is biased and contains only a subset of possible data variations, which is not sufficient for learning generic feature representation. Combining them together can diversify the training data, thus makes the learned features more robust.
    • The above further explains why multiple training sets are used.
  • The authors then build on a phenomenon observed in their experiments:
    • When training a CNN with data from all the domains, some neurons learn representations shared across several domains, while some others are effective only for a specific one.
    • Neurons that are effective for one domain could be useless for another domain because of the presence of domain biases.
    • In other words, they found that during CNN training, neurons show different levels of activity when learning different features. As a toy example, split the neurons into two groups (reality is of course not this simple; this is just for illustration): when learning feature A, group 1 is active while group 2 stays quiet; when learning feature B, group 2 is active while group 1 stays quiet.
    • The evidence behind this observation:
    • [Figure from the paper illustrating this observation]
  • They then propose their method:
    • Based on this important observation, we propose a Domain Guided Dropout algorithm to improve the feature learning procedure.
    • Domain Guided Dropout — a simple yet effective method of muting non-related neurons for each domain.
    • As one would expect, the method suppresses the neurons that are inactive for a given feature and encourages the neurons that are active for it; to some extent this reduces the number of parameters being trained and improves performance.
    • Dropout is one of the most widely used regularization methods in training deep neural networks, which significantly improves the performance of the deep model.
    • Dropout is a regularization method that can improve the performance of deep models.
    • Moreover, their method differs from standard Dropout:
      • Different from the standard Dropout, which treats all the neurons equally, our method assigns each neuron a specific dropout rate for each domain according to its effectiveness on that domain.
      • Standard Dropout treats every neuron the same, whereas the authors' method adapts each neuron's dropout rate to its actual effectiveness.
      • Their method has two schemes (contrasted with standard Dropout in the sketch after this list):
        • A deterministic scheme
        • A stochastic scheme
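To make the contrast concrete, here is a minimal NumPy sketch of the difference between standard Dropout (one shared rate for every neuron) and the per-neuron, per-domain rates of Domain Guided Dropout. The array size and variable names are illustrative, not taken from the paper's code.

```python
import numpy as np

rng = np.random.default_rng(0)
features = rng.standard_normal(256)  # responses of one FC layer (illustrative)

# Standard Dropout: every neuron shares the same drop probability p.
p = 0.5
keep = rng.random(256) >= p
out_standard = features * keep / (1.0 - p)  # inverted-dropout scaling

# Domain Guided Dropout (idea): each neuron gets its own keep probability
# for the current domain, derived from its impact score on that domain.
keep_prob = rng.random(256)  # stand-in for domain-specific keep rates
out_dgd = features * (rng.random(256) < keep_prob)
```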
  • Here is the overall pipeline of their method:
    • We first mix the data and labels from all the domains together, and train a carefully designed CNN from scratch on the joint dataset with a single softmax loss.
      • Our goal is to learn a generic feature extractor g(·) that has similar outputs for images of the same person and dissimilar outputs for different people.
      • softmax loss, in its standard form: $L = -\frac{1}{N}\sum_{i=1}^{N} \log \frac{e^{w_{y_i}^{\top} f_i}}{\sum_{k} e^{w_k^{\top} f_i}}$, where $f_i = g(x_i)$ is the feature of image $x_i$ and $y_i$ its identity label
      • During the test phase, given a probe pedestrian image and a set of gallery images, we use g(·) to extract features from all of them, and rank the gallery images according to their Euclidean distances to the probe image in the feature space.
        • My first reaction to the use of Euclidean distance was confusion. In general, Euclidean distance as the metric does not give very good results, but the authors later explain why they use it:
          • … and use the Euclidean distance directly as the metric, which stresses the quality of the learned features representation rather than metrics.
          • That is, the authors use plain Euclidean distance to stress the quality of the learned feature representation rather than the metric itself (see the ranking sketch right after this step).
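A minimal sketch of this test-phase ranking, assuming the features have already been extracted by the learned g(·); the function name and array shapes are mine, not the authors' code.

```python
import numpy as np

def rank_gallery(probe_feat: np.ndarray, gallery_feats: np.ndarray) -> np.ndarray:
    """Return gallery indices sorted by Euclidean distance to the probe.

    probe_feat:    (d,) feature g(probe) of the probe image
    gallery_feats: (n, d) features g(gallery_i) of the gallery images
    """
    dists = np.linalg.norm(gallery_feats - probe_feat, axis=1)
    return np.argsort(dists)  # closest match (most likely same person) first
```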
    • Next, for each domain, we perform the forward pass on all its samples and compute for each neuron its average impact on the objective function. Then we replace the standard Dropout layer with the proposed Domain Guided Dropout layer, and continue to train the CNN model for several more epochs.
      • the impact of a particular neuron: the gain of the loss function when we remove the neuron
        • In the paper's notation, $s_i = \mathbb{E}_{x \in \mathcal{D}}\left[\ell(g(x)_{\setminus i}) - \ell(g(x))\right]$, where $g(x)_{\setminus i}$ denotes the feature vector with the $i$-th neuron's response set to zero (a brute-force sketch follows this step)
      • With the guidance of which neurons being effective for each domain, the CNN learns more discriminative features for all of them.
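A brute-force sketch of estimating these per-neuron impacts for one domain. Here `loss_fn` is a hypothetical helper that evaluates the softmax loss on top of a single feature vector; a real implementation would avoid the d full loss evaluations per sample.

```python
import numpy as np

def neuron_impacts(feats: np.ndarray, loss_fn) -> np.ndarray:
    """Estimate s_i: the average gain of the loss when neuron i is removed.

    feats:   (n, d) feature vectors g(x) for all samples of one domain
    loss_fn: maps one feature vector to its scalar loss value (hypothetical)
    """
    n, d = feats.shape
    base = np.array([loss_fn(f) for f in feats])  # loss with every neuron on
    impacts = np.zeros(d)
    for i in range(d):
        ablated = feats.copy()
        ablated[:, i] = 0.0                        # "remove" neuron i
        drop = np.array([loss_fn(f) for f in ablated])
        impacts[i] = np.mean(drop - base)          # positive => neuron helps
    return impacts
```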
    • At last, if we want to obtain feature representations for a specific domain, the CNN could be further fine-tuned on it, again with the Domain Guided Dropout to improve the performance.
    • Note that the second and third steps above use the two schemes of Domain Guided Dropout respectively: the deterministic scheme and the stochastic scheme.
      • After the baseline model is trained jointly with datasets of all the domains, we replace the standard Dropout with the deterministic Domain Guided Dropout and resume the training for several epochs.
      • That is, training is resumed with the deterministic scheme of Domain Guided Dropout.
      • We further fine-tune the net with stochastic Domain Guided Dropout on each domain separately to obtain the best possible results.
      • Finally, the model is fine-tuned with the stochastic scheme of Domain Guided Dropout.
    • The two schemes derive the domain guidance differently (both are sketched after this list):
      • For the deterministic scheme:
        • $m_i = \mathbb{1}(s_i > 0)$
      • When a neuron's impact score is > 0, the neuron is kept active;
      • when its impact score is <= 0, the neuron is made inactive.
      • For the stochastic scheme:
        • $m_i \sim \mathrm{Bernoulli}(p_i)$, with $p_i = \frac{1}{1 + e^{-s_i / T}}$
        • The probability that a neuron stays active is computed from its impact score and the temperature T.
        • T controls how significantly the score s would affect the probabilities.
        • In other words, T determines how strongly the impact score influences p.
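Putting the two schemes together, here is a minimal sketch of how the per-domain masks could be derived from the impact scores s; the sigmoid-with-temperature form follows the description above, and the function names are mine. The resulting mask multiplies the neuron responses in place of a standard Dropout mask.

```python
import numpy as np

rng = np.random.default_rng(0)

def deterministic_mask(s: np.ndarray) -> np.ndarray:
    # Keep neuron i exactly when its impact score for this domain is positive.
    return (s > 0).astype(float)

def stochastic_mask(s: np.ndarray, T: float) -> np.ndarray:
    # Keep neuron i with probability p_i = 1 / (1 + exp(-s_i / T)).
    # A larger T flattens the probabilities toward 0.5; a smaller T
    # pushes them toward the hard 0/1 decisions of the deterministic mask.
    p = 1.0 / (1.0 + np.exp(-s / T))
    return (rng.random(s.shape) < p).astype(float)
```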
  • Experiments:
    • First, the characteristics of each dataset are introduced.
    • Then the method is compared with state-of-the-art methods.
    • Finally, the effectiveness of DGD is verified.
  • The authors close with a conclusion.

  • To recap, the authors see their contributions as threefold:

    • First, we present a pipeline for learning generic feature representations from multiple domains that perform well on all of them.
    • Second, we propose Domain Guided Dropout to discard useless neurons for each domain, which improves the performance of the CNN.
    • At last, our method outperforms the state of the art on multiple person re-identification datasets by large margins.
  • That concludes my analysis of the paper.

All of the above is my own understanding; criticism and guidance are very welcome, and I'd be glad to discuss!

