[Paper Reproduction] ArcFace: Additive Angular Margin Loss for Deep Face Recognition

Paper title: "ArcFace: Additive Angular Margin Loss for Deep Face Recognition"
Paper link: https://arxiv.org/pdf/1801.07698v1.pdf

I. Core Idea

This paper proposes a new loss function with a clear geometric interpretation: ArcFace. On top of L2-normalised weights and features, it introduces \cos(\theta+m), which maximises the decision boundary between classes in angle space, as shown in the figure below:


The figure above gives the geometric interpretation of ArcFace: (a) the blue and green points are feature vectors of two different classes, e.g. blue for images of cats and green for images of dogs; ArcFace directly enlarges the gap between the two classes. (b) The right panel gives a more intuitive view of the angle and the angular margin: ArcFace's angular margin corresponds to the geodesic margin between different classes on the hypersphere.

II. Background

  • Deep convolutional networks map face images (typically after a pose normalisation step) into embedding feature vectors.

  • As a result, features of the same person lie close together, while features of different individuals lie far apart.

  • Deep-CNN face recognition methods differ mainly in three respects:
    (1) Training data:

    • datasets vary widely in size
    • datasets carry annotation noise
    • The authors found several hundred overlapping face images between the MegaFace and FaceScrub datasets; they cleaned MS-Celeb-1M, MegaFace and FaceScrub and released the refined datasets.
    • Orders-of-magnitude differences in training-data scale are why industrial face recognition models are far better than academic ones.
    • These differences in training data also make some deep-network face recognition results impossible to reproduce exactly.

    (2) Network architecture and settings:

    • ResNet, Inception-ResNet, VGG and Google Inception V1
    • the trade-off between training speed and model accuracy

    (3) Loss function:

    • Euclidean-margin-based loss functions
    • angular/cosine-margin-based loss functions

III. Evolution of the ArcFace Loss

In this section we trace the evolution from Softmax to ArcFace.

1. Softmax

L_{1}=-\frac{1}{m}\sum_{i=1}^m\log\frac{e^{W_{y_i}^Tx_i+b_{y_{i}}}}{\sum_{j=1}^ne^{W_{j}^Tx_i+b_{j}}}

  • where m is the batch size and n is the number of classes (note that the same symbol m is reused later for the margin)
  • The softmax loss does not explicitly optimise the embeddings so that positive pairs become more similar and negative pairs less similar; in other words, it does not enlarge the decision margin between classes.
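
To make L_{1} concrete, here is a minimal PyTorch sketch (illustrative variable names, not the paper's code) that builds the logits W_{j}^Tx_{i}+b_{j} explicitly:

import torch
import torch.nn.functional as F

batch, feat_dim, num_classes = 4, 3, 10            # m = 4 samples, n = 10 classes
x = torch.randn(batch, feat_dim)                   # features x_i
W = torch.randn(num_classes, feat_dim)             # class weights W_j
b = torch.zeros(num_classes)                       # biases b_j

logits = F.linear(x, W, b)                         # W_j^T x_i + b_j, shape (m, n)
targets = torch.randint(0, num_classes, (batch,))  # labels y_i
loss = F.cross_entropy(logits, targets)            # cross_entropy = softmax + NLL, i.e. L_1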

2. Weights Normalisation

Setting the bias b_{j}=0, the logit in the softmax numerator can be written as:
W_{j}^Tx_{i}=\left\|W_{j}\right\|\left\|x_{i}\right\|\cos\theta_{j}

Fixing \left\|W_{j}\right\|=1 by L2 normalisation, we obtain:
L_{2}=-\frac{1}{m}\sum_{i=1}^m\log\frac{e^{\left\|x_{i}\right\|\cos(\theta_{y_i})}}{e^{\left\|x_{i}\right\|\cos(\theta_{y_i})}+\sum_{j=1,j\neq y_{i}}^ne^{\left\|x_{i}\right\|\cos(\theta_{j})}}

After weight normalisation, the loss depends only on the angle between the feature vector and the class weight. The SphereFace paper reports that weight normalisation alone yields a small improvement.
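
In code, weight normalisation is a one-line change to the sketch above (reusing x, W and targets from it):

# zero bias + L2-normalised rows of W: the logit becomes ||x_i|| * cos(theta_j)
logits = F.linear(x, F.normalize(W, dim=1))
loss = F.cross_entropy(logits, targets)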

3. Multiplicative Angular Margin

In SphereFace, the target-class angle is multiplied by the angular margin m:
L_{3}=-\frac{1}{m}\sum_{i=1}^m\log\frac{e^{\left\|x_{i}\right\|\cos(m\theta_{y_i})}}{e^{\left\|x_{i}\right\|\cos(m\theta_{y_i})}+\sum_{j=1,j\neq y_{i}}^ne^{\left\|x_{i}\right\|\cos(\theta_{j})}}
where \theta_{y_i}\in[0,\pi/m].

Since \cos(m\theta) is not monotonic in \theta over [0,\pi], a piecewise function \psi(\theta_{y_i}) is used to make it monotonic:
L_{4}=-\frac{1}{m}\sum_{i=1}^m\log\frac{e^{\left\|x_{i}\right\|\psi(\theta_{y_i})}}{e^{\left\|x_{i}\right\|\psi(\theta_{y_i})}+\sum_{j=1,j\neq y_{i}}^ne^{\left\|x_{i}\right\|\cos(\theta_{j})}}

where \psi(\theta_{y_i})=(-1)^k\cos(m\theta_{y_i})-2k,\quad \theta_{y_i}\in\left[\frac{k\pi}{m},\frac{(k+1)\pi}{m}\right],\ k\in[0,m-1],\ m\geq 1

In practice, training with \psi(\theta_{y_i}) also incorporates the plain softmax term to aid convergence, its weight controlled by a dynamic hyper-parameter \lambda:
\psi(\theta_{y_i})=\frac{(-1)^k\cos(m\theta_{y_i})-2k+\lambda\cos(\theta_{y_i})}{1+\lambda}
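
A possible implementation of the annealed \psi (a sketch; the indexing of k follows the piecewise definition above):

import math
import torch

def sphereface_psi(theta, m = 4, lam = 0.0):
    # k indexes the interval [k*pi/m, (k+1)*pi/m] containing theta
    k = torch.floor(theta * m / math.pi).clamp(0, m - 1)
    sign = 1.0 - 2.0 * (k % 2)                     # (-1)^k
    psi = sign * torch.cos(m * theta) - 2.0 * k
    # blend with plain cos(theta); lam is annealed from large to small during training
    return (psi + lam * torch.cos(theta)) / (1.0 + lam)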

4. Feature Normalisation

Normalising both the features and the weights removes radial variation and places every feature on a hypersphere. The paper sets the hypersphere radius to s=64, i.e. \left\|x_{i}\right\| is rescaled to s, and the SphereFace loss becomes:

L_{5}=-\frac{1}{m}\sum_{i=1}^m\log\frac{e^{s\psi(\theta_{y_i})}}{e^{s\psi(\theta_{y_i})}+\sum_{j=1,j\neq y_{i}}^ne^{s\cos(\theta_{j})}}
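
In code, feature normalisation plus the scale s is again a small change (sketch, reusing x, W and targets from the earlier sketch):

# both sides normalised; s restores a logit range large enough for softmax
s = 64.0
logits = s * F.linear(F.normalize(x), F.normalize(W, dim=1))
loss = F.cross_entropy(logits, targets)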

5. Additive Cosine Margin

The paper sets m to 0.35. Compared with SphereFace, the cosine margin (CosineFace) has three advantages: (1) it is extremely easy to implement, with no tricky hyper-parameters; (2) it is clearer, and converges without auxiliary softmax supervision; (3) it brings an obvious performance gain.

L_{6}=-\frac{1}{m}\sum_{i=1}^m\log\frac{e^{s(\cos(\theta_{y_i})-m)}}{e^{s(\cos(\theta_{y_i})-m)}+\sum_{j=1,j\neq y_{i}}^ne^{s\cos(\theta_{j})}}
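
A minimal sketch of this additive cosine margin (a hypothetical helper, not the paper's code):

import torch
import torch.nn.functional as F

def cosface_logits(features, weights, targets, s = 64.0, m = 0.35):
    # cos(theta_j) from L2-normalised features and class weights
    cos_theta = F.linear(F.normalize(features), F.normalize(weights))
    one_hot = F.one_hot(targets, num_classes = weights.size(0)).float()
    # subtract m from the target-class cosine only, then scale by s
    return s * (cos_theta - one_hot * m)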

6. Additive Angular Margin

L_{7}=-\frac{1}{m}\sum_{i=1}^m\log\frac{e^{s\cos(\theta_{y_i}+m)}}{e^{s\cos(\theta_{y_i}+m)}+\sum_{j=1,j\neq y_{i}}^ne^{s\cos(\theta_{j})}}
subject to:
W_{j}=\frac{W_{j}}{\left\|W_{j}\right\|},\ x_{i}=\frac{x_{i}}{\left\|x_{i}\right\|},\ \cos(\theta_{j})=W_{j}^Tx_{i}
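
For intuition on the margin's effect: with s=64 and m=0.5, a target angle \theta_{y_i}=0.9 rad gives an unmargined logit of s\cos(0.9)\approx 64\times 0.622\approx 39.8, whereas the margined logit is s\cos(0.9+0.5)\approx 64\times 0.170\approx 10.9. To recover a high target logit the network must shrink \theta_{y_i}, which is exactly what pushes same-class features closer together on the hypersphere.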

IV. Comparison of Loss Functions

The paper contrasts the binary-classification decision boundaries of Softmax, SphereFace, CosineFace and ArcFace; among them, ArcFace is the only one whose angular margin is constant and linear over the entire angle interval.

V. Results

In the paper's experiments, ArcFace achieves state-of-the-art verification performance on benchmarks including LFW and MegaFace.

VI. PyTorch Implementation of ArcFace on the MNIST Dataset

1. Imports

import torch 
import torch.nn.functional as F

from torch import nn, optim 
from torch.utils.data import DataLoader
from torchvision import transforms as T, datasets

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt 
import plotly.express as px

from tqdm.notebook import tqdm
from sklearn.metrics import accuracy_score

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

2. Data Preprocessing

transform = T.Compose([
    T.ToTensor(),
    T.Normalize((0.5,), (0.5,))
])
trainset = datasets.MNIST('../input/mnist-dataset-pytorch', train = True, transform = transform)
testset = datasets.MNIST('../input/mnist-dataset-pytorch', train = False, transform = transform)
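
A quick optional sanity check on the transformed data, assuming the dataset path above resolves:

image, label = trainset[0]
print(image.shape, label)   # torch.Size([1, 28, 28]) and an int class label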

3. ArcFace CNN Model

class ArcFace(nn.Module):
    
    def __init__(self, in_features, out_features, margin = 0.7, scale = 64):
        super().__init__()
        
        self.in_features = in_features
        self.out_features = out_features
        self.scale = scale      # s: hypersphere radius
        self.margin = margin    # m: additive angular margin (the paper recommends 0.5)
        
        # one weight vector per class; L2-normalised in forward()
        self.weights = nn.Parameter(torch.FloatTensor(out_features, in_features))
        nn.init.xavier_normal_(self.weights)
        
    def forward(self, features, targets):
        # features are already L2-normalised, so this is cos(theta_j)
        cos_theta = F.linear(features, F.normalize(self.weights), bias=None)
        # clip away from +-1 so acos stays numerically stable
        cos_theta = cos_theta.clip(-1+1e-7, 1-1e-7)
        
        # add the margin m to the target-class angle only
        arc_cos = torch.acos(cos_theta)
        M = F.one_hot(targets, num_classes = self.out_features) * self.margin
        arc_cos = arc_cos + M
        
        # back to cosine space and scale: logit = s * cos(theta + m)
        cos_theta_2 = torch.cos(arc_cos)
        logits = cos_theta_2 * self.scale
        return logits
    
    
class MNIST_Model(nn.Module):
    
    def __init__(self):
        super(MNIST_Model, self).__init__()

        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.conv2_drop = nn.Dropout2d()
        self.fc1 = nn.Linear(320, 50)
        # 3-D embedding so the learned features can be plotted directly
        self.fc2 = nn.Linear(50, 3)
        self.arc_face = ArcFace(in_features = 3, out_features = 10)
        
    def forward(self, features, targets = None):
        
        x = F.relu(F.max_pool2d(self.conv1(features), 2))
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        _, c, h, w = x.shape
        x = x.view(-1, c*h*w)
        x = F.relu(self.fc1(x))
        x = F.normalize(self.fc2(x))    # L2-normalise the embedding
        
        # with targets: ArcFace logits for training; without: raw embeddings
        if targets is not None:
            logits = self.arc_face(x, targets)
            return logits
        return x

model = MNIST_Model()
model.to(device)

MNIST_Model(
  (conv1): Conv2d(1, 10, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(10, 20, kernel_size=(5, 5), stride=(1, 1))
  (conv2_drop): Dropout2d(p=0.5, inplace=False)
  (fc1): Linear(in_features=320, out_features=50, bias=True)
  (fc2): Linear(in_features=50, out_features=3, bias=True)
  (arc_face): ArcFace()
)

4. Model Training

class TrainModel():
    
    def __init__(self, criterion = None, optimizer = None, scheduler = None, device = None):
        self.criterion = criterion
        self.optimizer = optimizer
        self.scheduler = scheduler   # accepted for extensibility; unused below
        self.device = device
        
    def accuracy(self, logits, labels):
        # fraction of samples whose arg-max logit matches the label
        ps = torch.argmax(logits, dim = 1).detach().cpu().numpy()
        acc = accuracy_score(labels.detach().cpu().numpy(), ps)
        return acc

    def get_dataloader(self, trainset, validset):
        # shuffle training batches each epoch; keep validation order fixed
        trainloader = DataLoader(trainset, batch_size = 64, shuffle = True, num_workers = 4, pin_memory = True)
        validloader = DataLoader(validset, batch_size = 64, num_workers = 4, pin_memory = True)
        return trainloader, validloader
        
    def train_batch_loop(self,model,trainloader,i):
        
        epoch_loss = 0.0
        epoch_acc = 0.0
        pbar_train = tqdm(trainloader, desc = "Epoch" + " [TRAIN] " + str(i+1))
        
        for t,data in enumerate(pbar_train):
            
            images,labels = data
            images = images.to(device)
            labels = labels.to(device)
            
            logits = model(images,labels)
            loss = self.criterion(logits,labels)
            
            self.optimizer.zero_grad()
            loss.backward()
            self.optimizer.step()
            
            epoch_loss += loss.item()
            epoch_acc += self.accuracy(logits,labels)
            
            pbar_train.set_postfix({'loss' : '%.6f' %float(epoch_loss/(t+1)), 'acc' : '%.6f' %float(epoch_acc/(t+1))})
            
        return epoch_loss / len(trainloader), epoch_acc / len(trainloader)
            
    
    def valid_batch_loop(self,model,validloader,i):
        
        epoch_loss = 0.0
        epoch_acc = 0.0
        pbar_valid = tqdm(validloader, desc = "Epoch" + " [VALID] " + str(i+1))
        
        # gradients are not needed for validation
        with torch.no_grad():
            for v,data in enumerate(pbar_valid):
                
                images,labels = data
                images = images.to(device)
                labels = labels.to(device)
                
                logits = model(images,labels)
                loss = self.criterion(logits,labels)
                
                epoch_loss += loss.item()
                epoch_acc += self.accuracy(logits,labels)
                
                pbar_valid.set_postfix({'loss' : '%.6f' %float(epoch_loss/(v+1)), 'acc' : '%.6f' %float(epoch_acc/(v+1))})
            
        return epoch_loss / len(validloader), epoch_acc / len(validloader)
            
    
    def run(self,model,trainset,validset,epochs):
    
        trainloader,validloader = self.get_dataloader(trainset,validset)
        
        for i in range(epochs):
            
            model.train()
            avg_train_loss, avg_train_acc = self.train_batch_loop(model,trainloader,i)
            
            model.eval()
            avg_valid_loss, avg_valid_acc = self.valid_batch_loop(model,validloader,i)
            
        return model 

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr = 0.0001)

# pass device by keyword so it is not consumed by the scheduler parameter
model = TrainModel(criterion = criterion, optimizer = optimizer, device = device).run(model, trainset, testset, 20)

5. Extracting Image Embeddings

emb = []
y = []

testloader = DataLoader(testset, batch_size = 64)

model.eval()   # inference mode: dropout disabled, forward() returns embeddings
with torch.no_grad():
    for images,labels in tqdm(testloader):
        
        images = images.to(device)
        embeddings = model(images)   # no targets -> 3-D embeddings
        
        emb += [embeddings.detach().cpu()]
        y += [labels]
        
    embs = torch.cat(emb).cpu().numpy()
    y = torch.cat(y).cpu().numpy()

# the embedding is already 3-D, so it is plotted directly (no t-SNE involved)
emb_df = pd.DataFrame(
    np.column_stack((embs, y)),
    columns = ["x","y","z","targets"]
)

fig = px.scatter_3d(emb_df, x='x', y='y', z='z',
              color='targets')
fig.show()


References

[1] Simple ArcFace implementation on the MNIST dataset (Kaggle): https://www.kaggle.com/parthdhameliya77/simple-arcface-implementation-on-mnist-dataset
