Exploring Triplet Loss and Coupled Cluster Loss

Preface

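I needed to tell similar images apart, so I looked into Triplet Loss and a paper from this year's CVPR, "Deep Relative Distance Learning: Tell the Difference Between Similar Vehicles", which proposes the Coupled Cluster Loss.

The main content of the paper is covered in my earlier reading notes; this post concentrates on experiments with these two loss functions.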

Triplet Loss

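Torch Implementation of Triplet Loss

Someone has already written a Torch implementation of Triplet Loss, so all I had to do was read it and understand how it works, in preparation for implementing Coupled Cluster Loss next.
See the linked GitHub repository for details. Triplet Loss comes from Google's paper "FaceNet: A Unified Embedding for Face Recognition and Clustering", and OpenFace, an open-source project based on that paper, uses this implementation. Its code is at https://github.com/cmusatyalab/openface

The main part of the implementation is pasted below: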

--------------------------------------------------------------------------------
-- TripletEmbeddingCriterion
--------------------------------------------------------------------------------
-- Alfredo Canziani, Apr/May 15
-- Xinpeng.Chen --
--------------------------------------------------------------------------------

local TripletEmbeddingCriterion, parent = torch.class('nn.TripletEmbeddingCriterion', 'nn.Criterion')

function TripletEmbeddingCriterion:__init(alpha)
   parent.__init(self)
   self.alpha = alpha or 0.2
   self.Li = torch.Tensor()
   self.gradInput = {}
end

function TripletEmbeddingCriterion:updateOutput(input)
    local a = input[1] -- anchor
    local p = input[2] -- positive
    local n = input[3] -- negative
    local N = a:size(1) -- N is batchSize, represent the N in the formula
    self.Li:resize(N)
    for i = 1, N do
        self.Li[i] = math.max(0, (a[i] - p[i]) * (a[i] - p[i]) + self.alpha - (a[i] - n[i]) * (a[i] - n[i]))
    end
    self.output = self.Li:sum() / N

    return self.output
end

function TripletEmbeddingCriterion:updateGradInput(input)
   local a = input[1] -- anchor
   local p = input[2] -- positive
   local n = input[3] -- negative
   local N = a:size(1) -- N is batchSize, represent the N in the formula
   if torch.type(a) == 'torch.CudaTensor' then -- if buggy CUDA API
      self.gradInput[1] = (n - p):cmul(self.Li:gt(0):repeatTensor(a:size(2),1):t():type(a:type()) * 2/N)
      self.gradInput[2] = (p - a):cmul(self.Li:gt(0):repeatTensor(a:size(2),1):t():type(a:type()) * 2/N)
      self.gradInput[3] = (a - n):cmul(self.Li:gt(0):repeatTensor(a:size(2),1):t():type(a:type()) * 2/N)
   else -- otherwise
      self.gradInput[1] = self.Li:gt(0):diag():type(a:type()) * (n - p) * 2/N
      self.gradInput[2] = self.Li:gt(0):diag():type(a:type()) * (p - a) * 2/N
      self.gradInput[3] = self.Li:gt(0):diag():type(a:type()) * (a - n) * 2/N
   end

   return self.gradInput
end
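
The gradients in updateGradInput can be sanity-checked numerically. The sketch below is plain Python rather than Torch, uses a single triplet (so N = 1), and compares the closed-form gradients 2(n - p), 2(p - a), 2(a - n) against central finite differences. The function names and the sample vectors are hypothetical, chosen only for illustration.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(u, v))


def hinge(a, p, n, alpha):
    """Triplet loss for one (anchor, positive, negative) triple."""
    return max(0.0, sq_dist(a, p) + alpha - sq_dist(a, n))


def analytic_grads(a, p, n, alpha):
    """Closed-form gradients, zeroed when the hinge is inactive (loss == 0)."""
    active = 1.0 if hinge(a, p, n, alpha) > 0 else 0.0
    grad_a = [active * 2 * (ni - pi) for pi, ni in zip(p, n)]
    grad_p = [active * 2 * (pi - ai) for ai, pi in zip(a, p)]
    grad_n = [active * 2 * (ai - ni) for ai, ni in zip(a, n)]
    return grad_a, grad_p, grad_n


def numeric_grad(f, x, eps=1e-6):
    """Central finite-difference gradient of f at x."""
    grad = []
    for k in range(len(x)):
        hi = list(x)
        lo = list(x)
        hi[k] += eps
        lo[k] -= eps
        grad.append((f(hi) - f(lo)) / (2 * eps))
    return grad


# Hypothetical 2-D embeddings for which the hinge is active.
a, p, n, alpha = [0.2, -0.1], [0.5, 0.3], [0.4, 0.1], 0.2

ga, gp, gn = analytic_grads(a, p, n, alpha)
num_ga = numeric_grad(lambda x: hinge(x, p, n, alpha), a)
num_gp = numeric_grad(lambda x: hinge(a, x, n, alpha), p)
num_gn = numeric_grad(lambda x: hinge(a, p, x, alpha), n)

max_err = max(abs(x - y) for x, y in zip(ga + gp + gn, num_ga + num_gp + num_gn))
print("largest analytic vs. numeric difference:", max_err)
```

For a batch, the criterion above additionally divides by N, which matches averaging the per-triplet losses.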

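Triplet Loss: Diagram and Loss Function

The diagram of Triplet Loss and its loss function are as follows:

The loss function is:

L = \sum^{N} \max\left\{ \left\| f(x^a) - f(x^p) \right\|_2^2 + \alpha - \left\| f(x^a) - f(x^n) \right\|_2^2,\ 0 \right\}

Here x^a, x^p, and x^n are the Anchor, Positive, and Negative inputs, f is the embedding network, \alpha is the margin, and the sum runs over the N triplets in the batch.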
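
To make the formula concrete, here is a minimal sketch in plain Python (not Torch) that evaluates it on a tiny batch. The embeddings are made-up 2-D vectors and the function names are hypothetical. Like the Torch criterion above, it averages over the N triplets instead of only summing.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ui - vi) ** 2 for ui, vi in zip(u, v))


def triplet_loss(anchors, positives, negatives, alpha=0.2):
    """Return (mean loss, per-triplet losses) for a batch of embeddings."""
    per_triplet = [
        max(0.0, squared_distance(a, p) + alpha - squared_distance(a, n))
        for a, p, n in zip(anchors, positives, negatives)
    ]
    return sum(per_triplet) / len(per_triplet), per_triplet


# Hypothetical embeddings: the first triplet is already well separated,
# the second one violates the margin.
anchors = [[0.0, 0.0], [1.0, 0.0]]
positives = [[0.1, 0.0], [1.0, 0.5]]
negatives = [[1.0, 0.0], [1.0, 0.6]]

loss, per_triplet = triplet_loss(anchors, positives, negatives, alpha=0.2)
print("per-triplet losses:", per_triplet)
print("mean loss:", loss)
```

Only the second triplet contributes: its Negative is farther away than its Positive, but not by the full margin.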

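Analysis of the Margin Value in Triplet Loss

The goal is to drive the loss as low as possible during training, that is, to pull the Anchor and Positive as close together as possible and push the Anchor and Negative as far apart as possible. With that in mind, consider how the choice of margin matters.

When the margin is small, the loss approaches 0 easily: the Anchor and Positive need not be pulled very close, and the Anchor and Negative need not be pushed very far apart, before the loss reaches 0. A model trained this way cannot distinguish similar images well.

When the margin is large, the network has to work hard to shrink the Anchor-Positive distance and enlarge the Anchor-Negative distance. If the margin is set too large, the loss will likely stay at a large value and struggle to approach 0.

So choosing a reasonable margin is critical; it is the key yardstick for similarity. In short, the smaller the margin, the more easily the loss approaches 0, but the harder it is to distinguish similar images. The larger the margin, the harder it is for the loss to approach 0, possibly to the point that the network fails to converge, but the more confidently fairly similar images can be told apart.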
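
The trade-off can be illustrated with a few fixed distances. The sketch below is plain Python with made-up squared distances (d_ap for Anchor-Positive, d_an for Anchor-Negative). It counts how many triplets already have zero loss at each margin; the names and numbers are hypothetical.

```python
def triplet_hinge(d_ap, d_an, alpha):
    """Per-triplet loss given the two squared distances and the margin."""
    return max(0.0, d_ap + alpha - d_an)


# Hypothetical (d_ap, d_an) pairs: the same positive distance, with the
# negative progressively farther away.
pairs = [(0.10, 0.20), (0.10, 0.40), (0.10, 0.80)]
margins = (0.05, 0.2, 0.5)

satisfied_counts = []
for alpha in margins:
    losses = [triplet_hinge(d_ap, d_an, alpha) for d_ap, d_an in pairs]
    satisfied = sum(1 for value in losses if value == 0.0)
    satisfied_counts.append(satisfied)
    print(f"margin={alpha}: {satisfied} of {len(pairs)} triplets have zero loss")
```

With the small margin every triplet is already "good enough" and produces no gradient, while the large margin keeps asking for more separation on most of them.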

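Triplet Loss Experiment 1

With the analysis done, it needs to be verified by experiment.
I set the model for Triplet Loss to VGG Net, with margin = 0.2. After only a few iterations the loss curve bizarrely shot up and then abruptly dropped to 0. I don't know why. The curve is shown below:

My guess is that VGG Net is a very deep network, and with such depth the feature vectors extracted at the end are very abstract: the finer details that separate similar images get filtered out layer by layer in the convolutional stack. By the end, the features of two similar images differ very little, so even a fairly small margin such as 0.2 does not work.

Triplet Loss Experiment 2

Same data, margin = 0.2, but a different network, namely the following (modified) AlexNet: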

-----------------------------------------------------------
-- Network definition --
-----------------------------------------------------------
require 'nn'
require 'cunn' -- provides :cuda(), used on the ParallelTable below

backend = nn

convNet = nn.Sequential()
convNet:add(backend.SpatialConvolution(3,64, 3,3, 1,1, 1,1))
convNet:add(backend.ReLU(true)):add(nn.Dropout(0.3))
convNet:add(backend.SpatialMaxPooling(2, 2, 2, 2):ceil())

convNet:add(backend.SpatialConvolution(64, 128, 3,3, 1,1, 1,1))
convNet:add(backend.ReLU(true)):add(nn.Dropout(0.4))
convNet:add(backend.SpatialMaxPooling(2, 2, 2, 2):ceil())

convNet:add(backend.SpatialConvolution(128, 256, 3,3, 1,1, 1,1))
convNet:add(backend.ReLU(true)):add(nn.Dropout(0.4))
convNet:add(backend.SpatialMaxPooling(2, 2, 2, 2):ceil())

convNet:add(backend.SpatialConvolution(256, 512, 3,3, 1,1, 1,1))
convNet:add(backend.ReLU(true)):add(nn.Dropout(0.5))
convNet:add(backend.SpatialMaxPooling(2, 2, 2, 2):ceil())

convNet:add(backend.SpatialConvolution(512, 512, 3,3, 1,1, 1,1))
convNet:add(backend.ReLU(true)):add(nn.Dropout(0.5))
convNet:add(backend.SpatialMaxPooling(2, 2, 2, 2):ceil())

convNet:add(nn.View(512*4*4))

convNet:add(nn.Linear(512*4*4, 4096*4))
convNet:add(nn.ReLU(true))
convNet:add(nn.Dropout(0.5))
convNet:add(nn.Linear(4096*4, 4096))
convNet:add(nn.ReLU(true))
convNet:add(nn.Dropout(0.5))
convNet:add(nn.Linear(4096, 1024))

-- initialization from MSR
local function MSRinit(net)
    local function init(name)
        for k, v in pairs(net:findModules(name)) do
            local n = v.kW * v.kH * v.nOutputPlane
            v.weight:normal(0, math.sqrt(2/n))
            v.bias:zero()
        end
    end
    -- have to do for both backends
    init'cudnn.SpatialConvolution'
    init'nn.SpatialConvolution'
end
MSRinit(convNet)

convNetPos = convNet:clone('weight', 'bias', 'gradWeight', 'gradBias')
convNetNeg = convNet:clone('weight', 'bias', 'gradWeight', 'gradBias')

-- Parallel container
parallel = nn.ParallelTable()
parallel:add(convNet)
parallel:add(convNetPos)
parallel:add(convNetNeg)

parallel = parallel:cuda()

parameters, gradParameters = parallel:getParameters()

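-- `b` is not defined in this snippet (presumably a color helper from the full script), so print the label directly
print('Fresh-embeddings-computation network:'); print(parallel)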
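
The nn.View(512*4*4) size in the network above can be checked with a little shape arithmetic. The sketch below is plain Python and assumes a 128 x 128 input, which is the size the test images are scaled to later in this post. It also evaluates the MSR-style standard deviation sqrt(2/n) for the first convolution. The helper names are hypothetical.

```python
import math


def conv_out(size, kernel=3, stride=1, pad=1):
    """Output width of a convolution (3x3, stride 1, pad 1 keeps the size)."""
    return (size + 2 * pad - kernel) // stride + 1


def pool_out_ceil(size, kernel=2, stride=2):
    """Output width of max pooling in ceil mode, without padding."""
    return math.ceil((size - kernel) / stride) + 1


size = 128  # assumed input width and height
for _ in range(5):  # five conv + pool stages
    size = pool_out_ceil(conv_out(size))

flat = 512 * size * size  # 512 feature maps after the last stage
print("spatial size after 5 stages:", size)
print("flattened feature length:", flat)

# MSR init for the first conv layer: n = kW * kH * nOutputPlane = 3 * 3 * 64
msr_std_first_conv = math.sqrt(2.0 / (3 * 3 * 64))
print("std for first conv weights:", round(msr_std_first_conv, 4))
```

For any other input size the flattened length changes, and the View and first Linear layer would have to be adjusted to match.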

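This shallower network converged quickly. At epoch = 1, the loss curve is as follows:

At epoch = 2, the loss curve is as follows:

At epoch = 3, the loss curve is as follows:

Finally, at epoch = 100, the loss curve is as follows:

As can be seen, the loss ends up between 0.01 and 0.03, which is actually still somewhat high.

Triplet Loss Experiment 3

Next, with the margin set to 0.5, the loss curve at epoch = 1 is as follows:

At epoch = 2, the loss curve is as follows:

At epoch = 16, the loss curve is as follows:

At epoch = 17, the loss curve is as follows:

At epoch = 28, the loss curve is as follows:

As can be seen, the loss keeps decreasing.
With margin = 0.2, the loss quickly drops to around 0.1; after running for a further day and night it falls to around 0.01 ~ 0.03.
With margin = 0.5, the loss quickly drops to around 0.2 ~ 0.25. With further training it keeps falling slowly; after 17 epochs it had reached around 0.1 ~ 0.15, and it would decrease further.

Testing Prediction with the Trained Model

I struggled with this for a whole evening, and it was a bit... weird. After loading the trained model and feeding in data, the model's predict output was the same no matter what Tensor I passed in, which left me frustrated for quite a while.

Below is the code I used to test the model on test images (5 Anchor images, 5 Positive images, 5 Negative images):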

require 'nn'
require 'image'
require 'paths'
require 'cunn'
require 'cudnn'

-------------------------------------------------------------
-- load images
-- aImgs: Anchor, pImgs: Positive, nImgs: Negative
-------------------------------------------------------------
aImgs = torch.Tensor(5, 3, 128, 128)
local aImgs_i = 1
for f in paths.iterfiles('tripletTestImgs/aImgs/') do
    local img = image.load('tripletTestImgs/aImgs/' .. f)
    aImgs[aImgs_i] = image.scale(img, 128, 128)
    aImgs_i = aImgs_i + 1
end

pImgs = torch.Tensor(5, 3, 128, 128)
local pImgs_i = 1
for f in paths.iterfiles('tripletTestImgs/pImgs/') do
    local img = image.load('tripletTestImgs/pImgs/' .. f)
    pImgs[pImgs_i] = image.scale(img, 128, 128)
    pImgs_i = pImgs_i + 1
end

nImgs = torch.Tensor(5, 3, 128, 128)
local nImgs_i = 1
for f in paths.iterfiles('tripletTestImgs/nImgs/') do
    local img = image.load('tripletTestImgs/nImgs/' .. f)
    nImgs[nImgs_i] = image.scale(img, 128, 128)
    nImgs_i = nImgs_i + 1
end

-------------------------------------------------------------
-- load trained model
-- margin  = 0.2, batchSize = 50, Net = AlexNet
-------------------------------------------------------------
model = torch.load('model_AlexNet.t7')
print(model.modules[1])
print(model.modules[2])
print(model.modules[3])

predict1 = model.modules[1]:forward(aImgs:cuda())
predict2 = model.modules[1]:forward(pImgs:cuda())
predict3 = model.modules[1]:forward(nImgs:cuda())

print('\n---------------------------------------------- \n')
dist1 = torch.sum(torch.cmul(predict1 - predict2, predict1 - predict2), 2)
print('dist1: ');print(dist1)

print('\n----------------------------------------------')
dist2 = torch.sum(torch.cmul(predict1 - predict3, predict1 - predict3), 2) -- Anchor vs. Negative
print('dist2: ');print(dist2)
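
One plausible explanation for the identical outputs, which this post did not verify: in Torch's nn package, forward returns the module's own output tensor, and the next forward call writes into that same tensor. If so, predict1, predict2, and predict3 above would all refer to one buffer holding the last result, and calling :clone() on each result would keep them apart. The toy Python class below only imitates that buffer-reuse pattern; ToyLayer is hypothetical and is not Torch code.

```python
class ToyLayer:
    """Hypothetical layer that reuses a single output buffer across calls."""

    def __init__(self, scale):
        self.scale = scale
        self.output = []

    def forward(self, xs):
        # Overwrite the existing buffer in place and return that same object.
        self.output[:] = [self.scale * x for x in xs]
        return self.output


layer = ToyLayer(2.0)

# Without copying, both names point at the same buffer.
out_a = layer.forward([1.0, 2.0])
out_b = layer.forward([5.0, 6.0])
aliased = out_a is out_b
print("same object:", aliased, "| out_a is now", out_a)

# Copying each result right away keeps the values separate.
safe_a = list(layer.forward([1.0, 2.0]))
safe_b = list(layer.forward([5.0, 6.0]))
print("copied results:", safe_a, safe_b)
```

Under this reading, any distance computed between the aliased results is 0 regardless of the inputs.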

When I changed the prediction part of the code to the following:

predict = model:forward({aImgs:cuda(), pImgs:cuda(), nImgs:cuda()})

print('\n---------------------------------------------- \n')
dist1 = torch.sum(torch.cmul(predict[1] - predict[2], predict[1] - predict[2]), 2)
print('dist1: ');print(dist1)

print('\n----------------------------------------------')
dist2 = torch.sum(torch.cmul(predict[1] - predict[3], predict[1] - predict[3]), 2)
print('dist2: ');print(dist2)

print('\n----------------------------------------------')
-- Squared Euclidean distance from the first Anchor embedding to every other embedding:
-- d1..d4 for Anchors 2..5, d5..d9 for Positives 1..5, d10..d14 for Negatives 1..5
local ref = predict[1][1]
local k = 1
for set = 1, 3 do
    for idx = 1, 5 do
        if not (set == 1 and idx == 1) then
            local diff = ref - predict[set][idx]
            print('d' .. k .. ': ' .. torch.sum(torch.cmul(diff, diff)))
            k = k + 1
        end
    end
end

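I got real values:

Strangely, though, the output values differ on every run. If I run it again, they change to the following:

After some Googling I finally found why the same input gives different outputs each time: I had added Dropout layers to the network. The Torch documentation explains it: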

In this example, we demonstrate how the call to forward samples different outputs to dropout (the zeros) given the same input:

module = nn.Dropout()

> x = torch.Tensor{{1, 2, 3, 4}, {5, 6, 7, 8}}

> module:forward(x)
  2   0   0   8
 10   0  14   0
[torch.DoubleTensor of dimension 2x4]

> module:forward(x)
  0   0   6   0
 10   0   0   0
[torch.DoubleTensor of dimension 2x4]
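
The values in the quoted output are scaled by 2 because surviving units are divided by the keep probability (1 - p with p = 0.5). Dropout is meant to be random only during training; in Torch, calling model:evaluate() before inference turns it into a pass-through, and model:training() switches it back. The sketch below imitates that behaviour in plain Python. The dropout function and its parameters are hypothetical stand-ins, not the nn.Dropout implementation.

```python
import random


def dropout(xs, p=0.5, training=True, rng=random):
    """Drop each value with probability p and rescale survivors; identity when not training."""
    if not training:
        return list(xs)
    keep = 1.0 - p
    return [x / keep if rng.random() < keep else 0.0 for x in xs]


x = [1.0, 2.0, 3.0, 4.0]
rng = random.Random(0)  # seeded only so the demo is repeatable

train_1 = dropout(x, rng=rng)
train_2 = dropout(x, rng=rng)
eval_1 = dropout(x, training=False)
eval_2 = dropout(x, training=False)

print("training mode:", train_1, train_2)  # each entry is 0 or 2 * input
print("evaluate mode:", eval_1, eval_2)    # always equal to the input
```

With evaluation mode enabled, the trained network can keep its Dropout layers and still produce repeatable embeddings at test time.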

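Triplet Loss Experiment 4

It looks like I should read the 2014 JMLR Dropout paper, "Dropout: A Simple Way to Prevent Neural Networks from Overfitting".

So I simply removed the Dropout layers from the network above to see the effect. Parameter settings: margin = 0.5, batchSize = 50. The network after removal is: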

-----------------------------------------------------------
-- Network definition --
-- Cut out the Dropout Layer --
-----------------------------------------------------------
require 'nn'
require 'cunn' -- provides :cuda(), used on the ParallelTable below

backend = nn

convNet = nn.Sequential()
convNet:add(backend.SpatialConvolution(3,64, 3,3, 1,1, 1,1))
convNet:add(backend.ReLU(true))
convNet:add(backend.SpatialMaxPooling(2, 2, 2, 2):ceil())

convNet:add(backend.SpatialConvolution(64, 128, 3,3, 1,1, 1,1))
convNet:add(backend.ReLU(true))
convNet:add(backend.SpatialMaxPooling(2, 2, 2, 2):ceil())

convNet:add(backend.SpatialConvolution(128, 256, 3,3, 1,1, 1,1))
convNet:add(backend.ReLU(true))
convNet:add(backend.SpatialMaxPooling(2, 2, 2, 2):ceil())

convNet:add(backend.SpatialConvolution(256, 512, 3,3, 1,1, 1,1))
convNet:add(backend.ReLU(true))
convNet:add(backend.SpatialMaxPooling(2, 2, 2, 2):ceil())

convNet:add(backend.SpatialConvolution(512, 512, 3,3, 1,1, 1,1))
convNet:add(backend.ReLU(true))
convNet:add(backend.SpatialMaxPooling(2, 2, 2, 2):ceil())

convNet:add(nn.View(512*4*4))

convNet:add(nn.Linear(512*4*4, 4096*4))
convNet:add(nn.ReLU(true))
convNet:add(nn.Linear(4096*4, 4096))
convNet:add(nn.ReLU(true))
convNet:add(nn.Linear(4096, 1024))

-- initialization from MSR
local function MSRinit(net)
    local function init(name)
        for k, v in pairs(net:findModules(name)) do
            local n = v.kW * v.kH * v.nOutputPlane
            v.weight:normal(0, math.sqrt(2/n))
            v.bias:zero()
        end
    end
    -- have to do for both backends
    init'cudnn.SpatialConvolution'
    init'nn.SpatialConvolution'
end
MSRinit(convNet)

convNetPos = convNet:clone('weight', 'bias', 'gradWeight', 'gradBias')
convNetNeg = convNet:clone('weight', 'bias', 'gradWeight', 'gradBias')

-- Parallel container
parallel = nn.ParallelTable()
parallel:add(convNet)
parallel:add(convNetPos)
parallel:add(convNetNeg)

parallel = parallel:cuda()
parameters, gradParameters = parallel:getParameters()
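-- `b` is not defined in this snippet (presumably a color helper from the full script), so print the label directly
print('Fresh-embeddings-computation network:'); print(parallel)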

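At epoch = 1, the loss curve is:

At epoch = 2, the loss curve is:

At epoch = (unspecified), the loss curve is:

I noticed that without the Dropout layers the training loss converges far faster than with them. At the same time, however, each epoch also takes longer, 3 ~ 4 times as long as with the Dropout layers:

For comparison, the training time with the Dropout layers:

Coupled Cluster Loss

Torch Implementation of Coupled Cluster Loss

I then rewrote Triplet Loss into Coupled Cluster Loss. The code for the loss module is below. The rewrite is not hard, since it just follows the same pattern: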

------------------------------------------
-- Coupled Cluster Loss --
-- Xinpeng.Chen --
------------------------------------------

local CoupledClusterLoss, parent = torch.class('nn.CoupledClusterLoss', 'nn.Criterion')

function CoupledClusterLoss:__init(alpha)
    parent.__init(self)
    self.alpha = alpha or 0.2 -- margin
    self.Li = torch.Tensor()
    self.gradInput = {}
end

function CoupledClusterLoss:updateOutput(input)
    local p = input[1] -- p is the 5 * 1024 vector
    local n = input[2] -- n is the 5 * 1024 vector
    local N = p:size(1)

    -- find the center of the positive points
    local centerP = torch.sum(p, 1) / N

    -- find the closest negative vector to the centerP
    local negToCP = torch.Tensor(N)

    for i = 1, N do
        negToCP[i] = torch.sum( torch.cmul(centerP - n[i], centerP - n[i]) )
    end
    local minNegToCP, minIndex = torch.min(negToCP, 1)

    -- Calculate the loss
    self.Li:resize(N)
    for j = 1, N do
        self.Li[j] = 0.5 * math.max( 0, (p[j] - centerP) * (p[j] - centerP) + self.alpha - (n[minIndex[1]] - centerP) * (n[minIndex[1]] - centerP) )
    end

    self.output = self.Li:sum()

    return self.output
end

function CoupledClusterLoss:updateGradInput(input)
    local p = input[1] -- p is the 5 * 1024 vector
    local n = input[2] -- n is the 5 * 1024 vector
    local N = p:size(1)

    -- find the center of the positive points
    local centerP = torch.sum(p, 1) / N

    -- find the closest negative vector to the centerP
    local negToCP = torch.Tensor(N)

    for i = 1, N do
        negToCP[i] = torch.sum( torch.cmul(centerP - n[i], centerP - n[i]) )
    end
    local minNegToCP, minIndex = torch.min(negToCP, 1)

    -- Calculate the gradient of the input
    if torch.type(p) == 'torch.CudaTensor' then -- if buggy CUDA API
        -- :t() turns the repeated mask into N x D so it lines up row by row with p,
        -- as in TripletEmbeddingCriterion above
        self.gradInput[1] = (p - centerP:repeatTensor(N, 1)):cmul(self.Li:gt(0):repeatTensor(p:size(2), 1):t():type(p:type()) )
        self.gradInput[2] = (centerP:repeatTensor(N, 1) - n[minIndex[1]]:repeatTensor(N, 1)):cmul(self.Li:gt(0):repeatTensor(p:size(2), 1):t():type(p:type()) )
    else
        self.gradInput[1] = self.Li:gt(0):diag():type(p:type()) * (p - centerP:repeatTensor(N, 1)) 
        self.gradInput[2] = self.Li:gt(0):diag():type(p:type()) * (centerP:repeatTensor(N, 1) - n[minIndex[1]]:repeatTensor(N, 1))
    end

    return self.gradInput
end

Coupled Cluster Loss: Diagram and Loss Function

L(W, X^p, X^n) = \sum_{i}^{N^p} \frac{1}{2} \max\left\{ 0,\ \left\| f(x_i^p) - c^p \right\|_2^2 + \alpha - \left\| f(x_*^n) - c^p \right\|_2^2 \right\}
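
As a worked instance of this formula, the sketch below computes the loss in plain Python for a handful of made-up 2-D embeddings: it takes the mean of the positives as the center c^p, picks the negative nearest to that center as x_*^n, and sums the hinge terms. The function names and data are hypothetical, and the sketch covers the forward pass only.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(u, v))


def center(points):
    """Coordinate-wise mean of a list of vectors."""
    count = len(points)
    return [sum(column) / count for column in zip(*points)]


def coupled_cluster_loss(positives, negatives, alpha=0.2):
    """Return (total loss, per-positive terms, nearest negative)."""
    c = center(positives)
    nearest_neg = min(negatives, key=lambda x: sq_dist(x, c))
    d_neg = sq_dist(nearest_neg, c)
    per_positive = [
        0.5 * max(0.0, sq_dist(x, c) + alpha - d_neg) for x in positives
    ]
    return sum(per_positive), per_positive, nearest_neg


# Hypothetical embeddings; the center of the positives is (0.1, 0.1).
positives = [[0.0, 0.0], [0.2, 0.0], [0.1, 0.3]]
negatives = [[1.0, 1.0], [0.1, 0.6]]

loss, per_positive, nearest_neg = coupled_cluster_loss(positives, negatives, alpha=0.25)
print("nearest negative:", nearest_neg)
print("per-positive terms:", per_positive)
print("total loss:", loss)
```

Unlike Triplet Loss, every positive is compared against the same center and the same single negative, so one forward pass uses the whole positive set at once.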

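Analysis of the Margin Value in Coupled Cluster Loss

At first I was puzzled about how to set this margin. Here is my analysis:

The goal is to bring the loss down to 0 as far as possible. During the gradient updates, the first term, \| f(x_i^p) - c^p \|_2^2, should become as small as possible, that is, in the diagram the similar images should be as close to their center as possible, while the distance from the center to its nearest negative sample should be as large as possible.

The larger the margin (\alpha in the formula), the closer the positive samples must be to the center and the farther the nearest negative sample must be from the center before the loss drops to 0.

When the margin is small, for example 0.05 or simply 0, the positive samples do not need to "huddle together" as desperately as with a large margin for the loss to be 0, and the negative samples do not need to be "pushed away" from the center as far either.

So, from the analysis above, the margin is critical. If it is too small, say 0, the loss falls to 0 very easily, because the distance from a positive sample to the center should already be smaller than the distance from the nearest negative sample to the center. If it is too large, convergence becomes very difficult: however the gradients update the network, even if the positive samples coincide with the center, the nearest negative sample still has to be "pulled" very far from the center before the loss can drop to 0.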
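
The last point can be put in numbers. In the best case for the positives, where they all sit exactly on the center, the first term vanishes and each positive contributes 1/2 max(0, \alpha - d_neg), with d_neg the squared distance from the center to the nearest negative. The plain-Python sketch below evaluates that special case for a few margins; d_neg, the number of positives, and the function name are hypothetical.

```python
def ccl_with_collapsed_positives(d_neg, alpha, num_pos):
    """Coupled Cluster Loss when every positive sits on the center (first term = 0)."""
    return num_pos * 0.5 * max(0.0, alpha - d_neg)


d_neg = 0.3  # hypothetical squared distance from the center to the nearest negative
num_pos = 5  # hypothetical number of positives

losses = {
    alpha: ccl_with_collapsed_positives(d_neg, alpha, num_pos)
    for alpha in (0.0, 0.2, 0.5, 1.0)
}
for alpha, loss in losses.items():
    print(f"alpha={alpha}: loss={loss:.3f}")
```

The loss is 0 only once d_neg is at least \alpha, so a larger margin demands a proportionally larger gap between the center and its nearest negative.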

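Coupled Cluster Loss Experiment 1

Net = AlexNet, margin = 0.5, batchSize = 10.
In the first epoch (epoch = 1), the loss curve is as follows (since trainSize = 49000, one epoch contains 4900 iterations; the plot shows only part of them):

As can be seen, the loss quickly drops to 0.25 and then hardly goes any lower. I printed the loss values:

The plot below is after running overnight, at epoch = 18:

Isn't this even stranger than the Triplet Loss above? It only gets to 0.25 and then does not go down at all.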

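Coupled Cluster Loss Experiment 2

I started looking for the cause. I noticed that with margin = 0.5, given the loss function:

L(W, X^p, X^n) = \sum_{i}^{N^p} \frac{1}{2} \max\left\{ 0,\ \left\| f(x_i^p) - c^p \right\|_2^2 + \alpha - \left\| f(x_*^n) - c^p \right\|_2^2 \right\}

0.25 is exactly 1/2 × margin for margin = 0.5. So I began to doubt whether the Coupled Cluster Loss proposed in the paper is really appropriate and correct.

I then rewrote the loss function above by dropping the comparison with 0:

L(W, X^p, X^n) = \sum_{i}^{N^p} \frac{1}{2} \left( \left\| f(x_i^p) - c^p \right\|_2^2 + \alpha - \left\| f(x_*^n) - c^p \right\|_2^2 \right)

But the result was the same; the loss still dropped to 0.25 and went no lower:

The printed loss values: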

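Coupled Cluster Loss Experiment 3

With margin = 1, the loss again dropped only to half the margin and stopped:

Coupled Cluster Loss Experiment 4

At this point I stopped the training and tried a different margin. With margin = 0.05, the loss trend is as follows:

The printed values:

As shown above, with margin = 0.05 the loss drops to 0.025. So a pattern emerges: the floor of the loss is half the margin.

Coupled Cluster Loss Experiment 5

With margin = 0.01, the loss only drops to 0.05. The loss curves are as follows:
At epoch = 3:

At epoch = 4:

The terminal output makes it even clearer: the loss reaches 0.05 and refuses to go any lower: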

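Coupled Cluster Loss: Findings, Conjecture, and Analysis

Based on the experiments above, I conjecture the following relationship between the margin and the loss: the loss settles at about half the margin value.

Why would this be? My thinking is that it may have to do with the data.

The positive samples are very close to each other, so the first term of the loss is almost 0. The second term is the margin \alpha, and the third term is the distance from the center to its nearest negative sample. I suspect the third term is also close to 0, because after passing through the network the feature vectors of the negative samples differ little from those of the positive samples: they are all vehicles, differing only in details that are hard to distinguish. With the first and third terms both near 0, each term of the sum reduces to 1/2 × \alpha, which matches the observed floor.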
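
The arithmetic behind this conjecture can be sketched in a few lines of plain Python. If both distance terms are 0 (all embeddings effectively collapsed onto one point), each positive contributes 1/2 × \alpha, with or without the comparison against 0. The function name is hypothetical. The value is per positive sample, so treating the reported loss as a per-sample average is an assumption, since the criterion above sums over the positives.

```python
def ccl_term(d_pos, d_neg, alpha, use_hinge=True):
    """One positive's contribution: 1/2 * max(0, d_pos + alpha - d_neg)."""
    value = d_pos + alpha - d_neg
    if use_hinge:
        value = max(0.0, value)
    return 0.5 * value


margins = (0.5, 1.0, 0.05)  # the margins used in Experiments 1 to 4

# Collapsed case: distance to the center is 0 for positives and the nearest negative.
floors = {alpha: ccl_term(0.0, 0.0, alpha) for alpha in margins}
floors_no_hinge = {
    alpha: ccl_term(0.0, 0.0, alpha, use_hinge=False) for alpha in margins
}

for alpha in margins:
    print(f"margin={alpha}: floor={floors[alpha]}, without max: {floors_no_hinge[alpha]}")
```

These values line up with the 0.25, 0.5, and 0.025 plateaus reported above, and they also show why removing the max in Experiment 2 changed nothing: at the collapsed point the hinge is active anyway.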
