ArcFace: notes on understanding the PyTorch code

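    def train(self, conf, epochs):
        self.model.train()
        running_loss = 0.
        for e in range(epochs):
            print('epoch {} started'.format(e))
            # lower the learning rate at each of the three milestone epochs
            if e == self.milestones[0]:
                self.schedule_lr()
            if e == self.milestones[1]:
                self.schedule_lr()
            if e == self.milestones[2]:
                self.schedule_lr()
            s = time.time()

            # the loop relies on prefetcher.next() returning None for input once the loader is exhausted
            prefetcher = data_prefetcher(self.loader)
            input, target = prefetcher.next()
            i = 0

            # the length check skips an incomplete final batch
            while input is not None and len(input) == conf.batch_size:
                i += 1
                imgs = input.to(conf.device)
                labels = target.to(conf.device)

                self.optimizer.zero_grad()
                embeddings = self.model(imgs)            # (N, 512) embeddings from the backbone
                thetas = self.head(embeddings, labels)   # (N, classnum) scaled logits from the Arcface head
                loss = conf.ce_loss(thetas, labels)      # cross-entropy on the margin-adjusted logits
                loss.backward()
                running_loss += loss.item()
                self.optimizer.step()

                # fetch the next batch; without this call the while loop never advances
                input, target = prefetcher.next()
            print('epoch {} finished in {:.1f}s'.format(e, time.time() - s))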
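The post does not show data_prefetcher. The loop condition suggests that it returns None for the input once the loader is exhausted. If so, the loop only advances, and only ends, when next() is called again at the end of every iteration. Below is a minimal CPU-only sketch of that contract. ListPrefetcher and count_full_batches are hypothetical names, and plain lists stand in for tensors:

```python
class ListPrefetcher:
    """Hypothetical stand-in for data_prefetcher: next() returns (input, target),
    or (None, None) once the underlying loader is exhausted."""

    def __init__(self, loader):
        self._it = iter(loader)

    def next(self):
        try:
            return next(self._it)
        except StopIteration:
            return None, None


def count_full_batches(loader, batch_size):
    """Mirror the while-loop condition of train(): stop on None or on a short batch."""
    prefetcher = ListPrefetcher(loader)
    input, target = prefetcher.next()
    steps = 0
    while input is not None and len(input) == batch_size:
        steps += 1                          # a training step would run here
        input, target = prefetcher.next()   # advance; omitting this loops forever
    return steps


# batches of size 4, 4 and 2, like a DataLoader whose dataset size is not a multiple of 4
batches = [([0] * 4, [0] * 4), ([1] * 4, [1] * 4), ([2] * 2, [2] * 2)]
print(count_full_batches(batches, batch_size=4))
```

The length check ends the epoch at the first short batch. That is harmless here only because a DataLoader yields at most one short batch, and always last.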

import math
import torch
from torch.nn import Module, Parameter

def l2_norm(input, axis=1):
    # helper not shown in the original snippet (assumed): divide by the L2 norm along `axis`
    norm = torch.norm(input, 2, axis, True)
    return torch.div(input, norm)

class Arcface(Module):
    # implementation of the additive angular margin (ArcFace) loss, https://arxiv.org/abs/1801.07698
    def __init__(self, embedding_size=512, classnum=51332,  s=64., m = 0.5):
        super(Arcface, self).__init__()
        self.classnum = classnum
        self.kernel = Parameter(torch.Tensor(embedding_size, classnum))
        # initial kernel: uniform_(-1, 1) samples from a uniform distribution; renorm_ scales each
        # column (one class weight) to L2 norm 1e-5, and mul_ (element-wise) brings it back to norm 1
        self.kernel.data.uniform_(-1, 1).renorm_(2, 1, 1e-5).mul_(1e5)
        self.m = m # the margin value, default is 0.5
        self.s = s # scalar value default is 64, see normface https://arxiv.org/abs/1704.06369
        self.cos_m = math.cos(m)
        self.sin_m = math.sin(m)
        self.mm = self.sin_m * m  # issue 1
        self.threshold = math.cos(math.pi - m)

    def forward(self, embbedings, label):
        # embbedings are assumed to be L2-normalized by the backbone already
        nB = len(embbedings)
        # weights norm: normalize each class column
        kernel_norm = l2_norm(self.kernel, axis=0)
        # cos(theta): matrix product of unit vectors, shape (nB, classnum)
        cos_theta = torch.mm(embbedings, kernel_norm)
        cos_theta = cos_theta.clamp(-1,1) # for numerical stability
        cos_theta_2 = torch.pow(cos_theta, 2)
        sin_theta_2 = 1 - cos_theta_2
        sin_theta = torch.sqrt(sin_theta_2)
        # cos(theta + m) = cos(theta)cos(m) - sin(theta)sin(m)
        cos_theta_m = (cos_theta * self.cos_m - sin_theta * self.sin_m)
        # this condition keeps theta + m within [0, pi]:
        #      0 <= theta + m <= pi
        #     -m <= theta <= pi - m
        cond_v = cos_theta - self.threshold
        cond_mask = cond_v <= 0
        keep_val = (cos_theta - self.mm) # when theta + m is not in [0, pi], use a CosFace-style penalty instead
        cos_theta_m[cond_mask] = keep_val[cond_mask]
        output = cos_theta * 1.0 # a little bit hacky way to prevent in_place operation on cos_theta
        # row index created on the same device as the logits
        idx_ = torch.arange(0, nB, dtype=torch.long, device=cos_theta.device)
        output[idx_, label] = cos_theta_m[idx_, label]  # apply the margin to the target class only
        output *= self.s # scale up in order to make softmax work, first introduced in normface
        return output
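This sketch re-expresses the head's forward computation as a plain function so you can check which entries it changes. arcface_logits is a hypothetical helper and is not part of the original code. Unlike the original head, it normalizes the embeddings itself so that the example is self-contained. In each row, only the target-class logit should differ from s*cos(theta), and it should be smaller:

```python
import math

import torch
import torch.nn.functional as F


def arcface_logits(emb, weight, labels, s=64.0, m=0.5):
    """Hypothetical functional sketch of the Arcface head.
    emb: (N, D) features, weight: (D, C) class weights, labels: (N,) class indices."""
    emb = F.normalize(emb, dim=1)                   # unit-length embeddings
    w = F.normalize(weight, dim=0)                  # unit-length class columns
    cos = (emb @ w).clamp(-1, 1)                    # (N, C) cosines
    sin = torch.sqrt(1 - cos ** 2)
    cos_m = cos * math.cos(m) - sin * math.sin(m)   # cos(theta + m)
    # fall back to cos(theta) - m*sin(m) where theta + m would exceed pi
    fallback = cos - math.sin(m) * m
    cos_m = torch.where(cos <= math.cos(math.pi - m), fallback, cos_m)
    out = cos.clone()
    rows = torch.arange(len(labels))
    out[rows, labels] = cos_m[rows, labels]         # margin only on the target class
    return out * s


torch.manual_seed(0)
emb, weight = torch.randn(4, 8), torch.randn(8, 10)
labels = torch.tensor([1, 3, 5, 7])
logits = arcface_logits(emb, weight, labels)
plain = 64.0 * (F.normalize(emb, dim=1) @ F.normalize(weight, dim=0))

mask = torch.zeros_like(logits, dtype=torch.bool)
mask[torch.arange(4), labels] = True
print(torch.allclose(logits[~mask], plain[~mask], atol=1e-4))  # non-target logits unchanged
print(bool((logits[mask] < plain[mask]).all()))                # target logits pushed down
```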

Before looking at the ArcFace loss, first recall how the softmax loss is computed.

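The softmax loss is computed in two steps. First, softmax turns the model's output (computed from the features, usually the result after the final linear + BN layer) into class probabilities. Second, the loss is computed from the probability of the true class.

Softmax computes probabilities. Its input is the model's output of shape (N, C), where C is the number of classes. With the ArcFace loss, the model instead outputs a 512-dimensional feature, and the ArcFace head, whose weight has shape (512, C), turns it into the (N, C) input. Without the ArcFace structure, this input is usually the output of the last linear classification layer.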

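The softmax loss is $L = -\sum_{j=1}^{C} y_j \log p_j$ with $p_j = e^{z_j} / \sum_{k=1}^{C} e^{z_k}$. Each $y_j$ is 0 or 1, and it is 1 only for the labeled class, so the loss reduces to $-\log p_y$: only the probability of the sample's own class matters.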
(https://blog.csdn.net/luoxuexiong/article/details/90062937)
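You can compute the formula by hand for one sample. Below is a minimal pure-Python sketch; softmax and softmax_loss are illustrative names. With a one-hot y, the sum collapses to -log p of the labeled class:

```python
import math


def softmax(z):
    # subtract the max before exponentiating to avoid overflow
    mz = max(z)
    exps = [math.exp(v - mz) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_loss(z, label):
    p = softmax(z)
    y = [1.0 if j == label else 0.0 for j in range(len(z))]   # one-hot target
    return -sum(yj * math.log(pj) for yj, pj in zip(y, p))     # L = -sum_j y_j log p_j


z = [2.0, 1.0, 0.1]
print(softmax_loss(z, 0), -math.log(softmax(z)[0]))  # the two values agree
```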
In PyTorch, however, what is actually used is
loss = nn.CrossEntropyLoss()
whose documentation says: "This criterion combines nn.LogSoftmax and nn.NLLLoss in one single class."

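LogSoftmax is a numerically more stable way to compute the (log) softmax probabilities.
CrossEntropyLoss therefore covers both steps, the probability computation and the loss computation. When the probabilities come from softmax, softmax loss == CrossEntropyLoss.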
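Here is a quick check of that equivalence, assuming a recent PyTorch. Three computations should give the same mean loss: nn.CrossEntropyLoss on raw logits, nn.NLLLoss on nn.LogSoftmax output, and a manual -log softmax of the true class:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(5, 3)                 # (N, C) scores for N=5 samples, C=3 classes
labels = torch.tensor([0, 2, 1, 1, 0])

ce = nn.CrossEntropyLoss()(logits, labels)
nll = nn.NLLLoss()(nn.LogSoftmax(dim=1)(logits), labels)
# manual version: mean over samples of -log(softmax probability of the true class)
manual = -torch.log(torch.softmax(logits, dim=1)[torch.arange(5), labels]).mean()

print(ce.item(), nll.item(), manual.item())  # three (nearly) identical values
```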

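Now for ArcFace. The ArcFace head has learnable parameters: it is a modification of the final linear classification layer. It still outputs one value per class, but each value is the cosine of the angle between the class weight w and the input vector x, not the usual product w*x.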
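This small sketch shows the difference, using random inputs shaped like the head's: 512-dimensional features and one weight column per class, as in self.kernel. Normalizing both sides before the matrix product gives exactly w·x / (|w||x|), which stays in [-1, 1]. The plain product is unbounded:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(4, 512)        # N=4 embeddings
W = torch.randn(512, 10)       # one weight column per class, C=10 (same layout as self.kernel)

plain = x @ W                                               # usual w*x logits, unbounded
cosine = F.normalize(x, dim=1) @ F.normalize(W, dim=0)      # cos(theta_j), in [-1, 1]

# the same cosine from the definition w.x / (|w||x|)
ref = (x @ W) / (x.norm(dim=1, keepdim=True) * W.norm(dim=0, keepdim=True))
print(torch.allclose(cosine, ref, atol=1e-5))
print(cosine.abs().max().item(), plain.abs().max().item())  # cosines stay within 1, raw products do not
```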

In MXNet the line is
fc7 = mx.sym.FullyConnected(data=nembedding, weight = _weight, no_bias = True, num_hidden=args.num_classes, name='fc7')

In PyTorch it is
        self.kernel = Parameter(torch.Tensor(embedding_size, classnum))
        self.kernel.data.uniform_(-1, 1).renorm_(2, 1, 1e-5).mul_(1e5) # uniform_(-1, 1) samples from a uniform distribution; mul_ multiplies element-wise
Source: https://blog.csdn.net/jacke121/article/details/104790999/

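Wrapping the tensor in Parameter registers it with the module: it appears in net.parameters(), so the optimizer updates it, and it becomes part of the model. Why not simply use torch.nn.Linear()? The post at https://blog.csdn.net/qq_36955294/article/details/88117170 says the learning works poorly that way, but I don't know why. A likely reason is that the head has to L2-normalize every class weight before the product (kernel_norm = l2_norm(self.kernel, axis=0)), which a plain nn.Linear forward does not do.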
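Here is what registration means in practice. A tensor wrapped in Parameter and assigned as a module attribute shows up in module.parameters(), and therefore in the optimizer. A plain tensor attribute does not. The two class names below are made up for the illustration:

```python
import torch
from torch import nn
from torch.nn import Parameter


class WithParameter(nn.Module):
    def __init__(self):
        super().__init__()
        self.kernel = Parameter(torch.randn(512, 10))   # registered: the optimizer sees it


class WithPlainTensor(nn.Module):
    def __init__(self):
        super().__init__()
        self.kernel = torch.randn(512, 10)              # plain attribute: NOT registered


print(len(list(WithParameter().parameters())))    # 1
print(len(list(WithPlainTensor().parameters())))  # 0
```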

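w.renorm(2, 0, 1e-5).mul(1e5) normalizes w. In renorm(p, dim, maxnorm), p=2 selects the L2 norm and dim selects the sub-tensors. With dim=0 these are the rows; with dim=1, as in the kernel initialization above, they are the columns, i.e. the class weights. Every sub-tensor whose L2 norm exceeds maxnorm=1e-5 is scaled down to norm 1e-5, and multiplying by 1e5 brings it back to norm 1. Practically every slice of a uniform(-1, 1) matrix has a norm far above 1e-5, so every slice ends up with unit L2 norm. The line does not map the values into [0, 1]; the exact definition is in the official documentation.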

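For the matrix [[1, 2, 3], [4, 5, 6]], the L2 computation goes roughly like this.
With dim=0 (each row divided by its own L2 norm):
1/sqrt(1+4+9) = 0.2673
2/sqrt(1+4+9) = 0.5345
4/sqrt(16+25+36) = 0.4558

With dim=1 (each column divided by its own L2 norm):
4/sqrt(1+16) = 0.9701, where the denominator uses 1 and 4, all the elements of the first column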
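You can check the numbers above directly with Tensor.renorm on the matrix [[1, 2, 3], [4, 5, 6]] they come from. Each sub-tensor along dim is scaled down to norm maxnorm=1e-5, and mul(1e5) brings it back to unit norm. The results match the hand-computed values up to floating-point rounding:

```python
import torch

x = torch.tensor([[1., 2., 3.],
                  [4., 5., 6.]])

# dim=0: every row is rescaled to L2 norm 1e-5, then multiplied back up to norm 1
rows = x.renorm(2, 0, 1e-5).mul(1e5)
# dim=1: every column is rescaled instead (this is what the Arcface kernel init uses)
cols = x.renorm(2, 1, 1e-5).mul(1e5)

print(rows)   # ~[[0.2673, 0.5345, 0.8018], [0.4558, 0.5698, 0.6838]]
print(cols)   # ~[[0.2425, 0.3714, 0.4472], [0.9701, 0.9285, 0.8944]]
print(rows.norm(dim=1), cols.norm(dim=0))   # all ~1.0
```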

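Figure: how ArcFace modifies the model's last classification layer.

(ArcFace, explained simply: https://zhuanlan.zhihu.com/p/101059838?from_voters_page=true)

Face recognition: a detailed walkthrough of the ArcFace paper, with a fairly thorough explanation of the changes to the loss layer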

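My intuition is that this resembles normalization. Previously the classifier input was w*x; now it is the cosine of the angle, a value in [-1, 1], and the classification is done on that.
(Cosine values and angles correspond one-to-one, but they are not the same. Maximizing the classification margin in angle space has a clearer geometric interpretation than doing it in cosine space, because a margin in angle space corresponds to an arc distance on the hypersphere.)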
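Here is the angle-space point made concrete. ArcFace always adds the same angle m to the target angle theta, which is an arc of fixed length m on the unit hypersphere. The resulting drop in the cosine, however, depends on theta. Below is a pure-Python sketch of the head's target-class branch. target_cos is a hypothetical name, and m = 0.5 matches the head's default:

```python
import math

M = 0.5  # additive angular margin in radians, same default as the Arcface head


def target_cos(theta, m=M):
    """Value the Arcface head puts in the target-class logit (before scaling by s).
    Follows the head's branches: cos(theta + m) while theta + m <= pi, otherwise
    the fallback cos(theta) - m*sin(m) (the head's self.mm)."""
    if math.cos(theta) <= math.cos(math.pi - m):   # same test as cond_mask
        return math.cos(theta) - m * math.sin(m)
    return math.cos(theta + m)


# in angle space the margin is always m, but the gap it opens in cosine space varies with theta
for deg in (10, 45, 90, 120):
    theta = math.radians(deg)
    gap = math.cos(theta) - target_cos(theta)
    print(f"theta={deg:3d} deg  cosine gap={gap:.4f}")
```

The fallback branch keeps the target value decreasing in theta even past pi - m. Beyond that point, cos(theta + m) would start increasing again.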
