CosFace[2018-CVPR]

Original

2020-06-16 16:09

Motivation

Novelty

L2-normalizing both features and weight vectors to remove radial variation.
Cosine Loss
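
For reference, the two points above combine into the large margin cosine loss (LMCL). The block below writes it in the notation of the CosFace paper [1], where N is the number of training samples, y_i is the label of sample i, s is the fixed feature norm and m is the cosine margin; W* and x* denote the un-normalized class vectors and features.

```latex
\[
L_{lmc} = \frac{1}{N} \sum_{i} -\log
  \frac{e^{s(\cos(\theta_{y_i, i}) - m)}}
       {e^{s(\cos(\theta_{y_i, i}) - m)} + \sum_{j \neq y_i} e^{s \cos(\theta_{j, i})}}
\]
\[
\text{subject to} \quad
W = \frac{W^*}{\|W^*\|}, \qquad
x = \frac{x^*}{\|x^*\|}, \qquad
\cos(\theta_{j, i}) = W_j^{T} x_i
\]
```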

Details

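Softmax loss: ||w_i||cos(θ_i) = ||w_j||cos(θ_j). The decision boundary depends on both ||w|| and cos(θ), so the regions of the two classes can overlap; margin <= 0.
Normalized-SM: cos(θ_i) = cos(θ_j). Normalization removes the influence of ||w|| on the decision boundary; margin = 0.
Angular-SM: cos(mθ1) ≥ cos(θ2) for C1, cos(mθ2) ≥ cos(θ1) for C2; margin >= 0, and it varies monotonically in the same direction as θ (a smaller θ gives a smaller margin).
LMCL-SM: cos(θ1) ≥ cos(θ2) + m for C1, cos(θ2) ≥ cos(θ1) + m for C2; margin = sqrt(2)m.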
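
The LMCL-SM criterion can be turned into a loss by subtracting m from the target-class cosine before the softmax. The following is a minimal NumPy sketch of that idea, not the authors' implementation; the function name `lmcl_loss` and the default values of `s` and `m` are illustrative assumptions.

```python
import numpy as np

def lmcl_loss(features, weights, labels, s=30.0, m=0.35):
    """Large margin cosine loss for one batch (illustrative sketch).

    features: (N, D) raw embeddings
    weights:  (K, D) raw class vectors
    labels:   (N,) integer class ids
    s, m:     feature scale and cosine margin (assumed defaults)
    """
    # L2-normalize both sides so every logit is a pure cosine.
    x = features / np.linalg.norm(features, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    cos = x @ w.T  # shape (N, K)

    # Subtract the margin from the target-class cosine only, then rescale.
    rows = np.arange(len(labels))
    logits = cos.copy()
    logits[rows, labels] -= m
    logits *= s

    # Numerically stable log-softmax, then mean negative log-likelihood.
    logits -= logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[rows, labels].mean()
```

Because both inputs are normalized inside the function, rescaling `features` or `weights` leaves the loss unchanged; only the angles matter. Setting `m=0` reduces the sketch to the Normalized-SM case.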

Note: the first four sub-figures are drawn from the loss-boundary perspective; the last two are drawn from the feature-boundary perspective.


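(1) For an easy sample, cos(θ_{yn,n}) >> cos(θ_{i,n}) for i != yn, so the exp(cos(θ_{yn,n})) term clearly dominates the softmax. To reduce the loss of such a sample, the model tends to learn an embedding feature with a larger ||x_n||_2.
(2) For a hard sample, cos(θ_{yn,n}) is comparable to cos(θ_{i,n}) for i != yn, or even smaller than max(cos(θ_{i,n})) over i != yn, so the target class is at a disadvantage in the softmax. To reduce the loss of such a sample, the model tends to learn an embedding feature with a smaller ||x_n||_2.
(3) Combining (1) and (2): if we force the L2 norm of every embedding feature to the same scale, the only way left for the model to lower a sample's loss during training is to keep shrinking the angle between the class vector w_yn and the sample x_n, which improves the discriminative power of the embedding features.
(4) From the optimization point of view, the values cos(θ_i) [i = 1, 2, ..., K] are all comparable when the model is initialized. If the norm is left free, the model tends to reduce the loss by lowering ||x_n||_2 rather than by shrinking the angle between the class vector w_yn and the sample x_n, which weakens the discriminative power of the features.
(5) s = ||x||_2 is the fixed feature norm. If s is too small, the model may converge too slowly or not converge at all; if s is too large, the model becomes harder to train and may get stuck in a local optimum too early.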
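
The effect of the feature norm described above can be checked numerically. The sketch below assumes unit-norm class vectors, so each logit is norm * cos(θ_k); the helper name `softmax_loss` and the cosine values are made up for illustration.

```python
import numpy as np

def softmax_loss(cosines, norm, target=0):
    """Cross-entropy of one sample when logit_k = norm * cos(theta_k)."""
    logits = norm * np.asarray(cosines, dtype=float)
    logits = logits - logits.max()  # numerical stability
    return -(logits[target] - np.log(np.exp(logits).sum()))

easy = [0.9, 0.1, 0.0]     # target cosine (index 0) dominates
hard = [0.2, 0.5, 0.1]     # a wrong class has the largest cosine
perfect = [1.0, 0.0, 0.0]  # best case for a fixed norm s

# Growing the norm lowers the loss of the easy sample
# and raises the loss of the hard sample.
for name, cosines in [("easy", easy), ("hard", hard)]:
    losses = [softmax_loss(cosines, norm) for norm in (1.0, 4.0, 16.0)]
    print(name, [round(float(v), 4) for v in losses])

# With a fixed norm s, a small s keeps the loss well above zero
# even for a perfectly aligned sample.
for s in (1.0, 30.0):
    print("s =", s, "loss =", float(softmax_loss(perfect, s)))
```

With the norm free, the cheapest way to reduce the loss is to scale ||x_n||_2 up for easy samples and down for hard ones, without changing any angle. Fixing the norm removes that shortcut, and the last loop hints at why the fixed value s should not be too small.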

Experiment

Visualize

Reference

[1]. CosFace: Large Margin Cosine Loss for Deep Face Recognition
