《Loss Function》
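This post summarizes the loss functions available in PyTorch.
The loss function is one of the most important modules in training a deep learning model. It measures the error between the network output and the ground-truth target, and training updates the network parameters according to this error so that it keeps shrinking; a loss function that matches the task therefore yields a better model.
This post does not discuss which loss function suits which task; it only summarizes the loss functions in PyTorch.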

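0 Blog index

Pytorch Model Training (0) - CPN Source Code Walkthrough
Pytorch Model Training (1) - Model Definition
Pytorch Model Training (2) - Model Initialization
Pytorch Model Training (3) - Saving and Loading Models
Pytorch Model Training (4) - Loss Function
Pytorch Model Training (5) - Optimizer
Pytorch Model Training (6) - Data Loading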

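1 Loss base classes

PyTorch implements its loss functions by first wrapping the shared arguments in base classes; each concrete loss function then inherits from one of the two base classes below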

class _Loss(Module):
    def __init__(self, size_average=None, reduce=None, reduction='mean'):
        super(_Loss, self).__init__()
        if size_average is not None or reduce is not None:
            self.reduction = _Reduction.legacy_get_string(size_average, reduce)
        else:
            self.reduction = reduction

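_Loss base class: inherits from Module and takes 3 arguments

1) size_average: bool, optional. Whether to average the loss. Defaults to True, meaning the losses are averaged over the current batch; if False, the losses in the batch are summed instead. Ignored when reduce=False. Deprecated.
2) reduce: bool, optional. Whether to reduce the loss at all. Defaults to True, meaning the loss is averaged or summed according to size_average; if False, one loss per batch element is returned and size_average is ignored. Deprecated.
3) reduction: string, optional. Specifies the reduction applied to the loss. 'mean' (default): average the losses; 'none': no reduction; 'sum': sum the losses.
Note that although the first two arguments are on their way out, if either of them is not None it overrides reduction; see the if-else statement in the code above.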
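To make the override rule concrete, here is a minimal plain-Python sketch of how the three arguments interact. The helper names resolve_reduction and apply_reduction are hypothetical and are not part of PyTorch; the sketch only mirrors the behavior described above and assumes that an unset legacy flag takes its documented default of True.

```python
def resolve_reduction(size_average=None, reduce=None, reduction="mean"):
    """Hypothetical helper: which reduction ends up in effect."""
    if size_average is None and reduce is None:
        return reduction  # no legacy flag set, so `reduction` is used as given
    # An unset legacy flag falls back to its documented default of True.
    size_average = True if size_average is None else size_average
    reduce = True if reduce is None else reduce
    if not reduce:
        return "none"  # size_average is ignored when reduce is False
    return "mean" if size_average else "sum"


def apply_reduction(losses, reduction):
    """Apply a reduction string to a flat list of per-element losses."""
    if reduction == "none":
        return list(losses)
    if reduction == "sum":
        return sum(losses)
    if reduction == "mean":
        return sum(losses) / len(losses)
    raise ValueError("unknown reduction: " + reduction)


# A legacy flag wins over the explicit reduction argument.
print(resolve_reduction(size_average=False, reduction="mean"))  # sum
print(resolve_reduction(reduce=False, reduction="sum"))         # none
print(apply_reduction([1.0, 2.0, 3.0], "mean"))                 # 2.0
```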

class _WeightedLoss(_Loss):
    def __init__(self, weight=None, size_average=None, reduce=None, reduction='mean'):
        super(_WeightedLoss, self).__init__(size_average, reduce, reduction)
        self.register_buffer('weight', weight)

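_WeightedLoss base class: inherits from _Loss and adds a weight argument

4) weight: Tensor, optional. Rescales the contribution of individual loss terms. The expected shape depends on the criterion, for example one weight per class for NLLLoss and CrossEntropyLoss, or one weight per batch element for BCELoss; if it is None, all elements are weighted equally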

2 Pytorch Loss

2.1 L1Loss

torch.nn.L1Loss(size_average=None, reduce=None, reduction='mean')

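criterion: computes the mean absolute error (MAE) between input x and target y

In other words, with a reduction ('mean' or 'sum') the result is a scalar; with reduction='none' it is a tensor with the same shape as the input and the target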
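The formula itself is simple enough to restate without PyTorch. The following is a dependency-free sketch on flat Python lists; l1_loss is a hypothetical name for illustration, and the real nn.L1Loss works on tensors of any shape.

```python
def l1_loss(inputs, targets, reduction="mean"):
    """Hypothetical reference for the L1 criterion on flat lists."""
    losses = [abs(x - y) for x, y in zip(inputs, targets)]
    if reduction == "none":
        return losses  # same length as the input, one loss per element
    total = sum(losses)
    return total / len(losses) if reduction == "mean" else total


x = [0.5, -1.0, 2.0]
y = [1.0, 1.0, 2.0]
print(l1_loss(x, y, "none"))  # [0.5, 2.0, 0.0]
print(l1_loss(x, y, "sum"))   # 2.5
print(l1_loss(x, y))          # mean of the three element losses
```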

Examples:

import torch
import torch.nn as nn

loss = nn.L1Loss()
input = torch.randn(3, 5, requires_grad=True)
target = torch.randn(3, 5)
output = loss(input, target)
output.backward()

2.2 MSELoss

torch.nn.MSELoss(size_average=None, reduce=None, reduction='mean')

criterion: computes the mean squared error (squared L2 norm) between input x and target y
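As a small worked comparison, the sketch below evaluates the mean squared error and the mean absolute error on the same made-up data, in which one target is an outlier. Both function names are hypothetical and the numbers are chosen only for illustration.

```python
def mean_absolute_error(inputs, targets):
    return sum(abs(x - y) for x, y in zip(inputs, targets)) / len(inputs)


def mean_squared_error(inputs, targets):
    return sum((x - y) ** 2 for x, y in zip(inputs, targets)) / len(inputs)


predictions = [1.0, 2.0, 3.0, 4.0]
targets = [1.0, 2.0, 3.0, 14.0]  # the last target is an outlier (error of 10)

print(mean_absolute_error(predictions, targets))  # 2.5
print(mean_squared_error(predictions, targets))   # 25.0
```

Squaring makes the single large error dominate the result, which is why the squared loss reacts much more strongly to outliers.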

Examples:

loss = nn.MSELoss()
input = torch.randn(3, 5, requires_grad=True)
target = torch.randn(3, 5)
output = loss(input, target)
output.backward()

2.3 NLLLoss

torch.nn.NLLLoss(weight=None, size_average=None, ignore_index=-100, reduce=None, reduction='mean')

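criterion: the negative log likelihood loss, commonly used for training classification problems
An optional 1D Tensor weight can be passed to assign a weight to each class, which is particularly useful when the training set is unbalanced

The ignore_index argument specifies a target value that is ignored; such targets contribute nothing to the loss and are excluded from the average
The log-probabilities it expects can be obtained by adding a LogSoftmax layer at the end of the network; to avoid the extra layer, use CrossEntropyLoss instead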
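The effect of weight and ignore_index under the 'mean' reduction can be sketched in plain Python. This is an illustrative reference on nested lists, with a hypothetical function name; it assumes the weighted mean is normalized by the sum of the weights of the targets that are not ignored.

```python
import math


def nll_loss(log_probs, targets, weight=None, ignore_index=-100):
    """log_probs: N rows of C log-probabilities; targets: N class indices."""
    num_classes = len(log_probs[0])
    weight = weight or [1.0] * num_classes
    total, norm = 0.0, 0.0
    for row, t in zip(log_probs, targets):
        if t == ignore_index:
            continue  # ignored targets add nothing to the sum or the average
        total += -weight[t] * row[t]
        norm += weight[t]
    # Note: this divides by zero if every target is ignored.
    return total / norm


log_probs = [
    [math.log(0.7), math.log(0.2), math.log(0.1)],
    [math.log(0.1), math.log(0.8), math.log(0.1)],
]
print(nll_loss(log_probs, [0, 1]))                          # plain mean
print(nll_loss(log_probs, [0, -100]))                       # second sample ignored
print(nll_loss(log_probs, [0, 1], weight=[2.0, 1.0, 1.0]))  # class 0 counts double
```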

Examples:

m = nn.LogSoftmax(dim=1)  # normalize over the class dimension
loss = nn.NLLLoss()
# input is of size N x C = 3 x 5
input = torch.randn(3, 5, requires_grad=True)
# each element in target has to have 0 <= value < C
target = torch.tensor([1, 0, 4])
output = loss(m(input), target)
output.backward()


# 2D loss example (used, for example, with image inputs)
N, C = 5, 4
loss = nn.NLLLoss()
# input is of size N x C x height x width
data = torch.randn(N, 16, 10, 10)
conv = nn.Conv2d(16, C, (3, 3))
m = nn.LogSoftmax(dim=1)  # normalize over the channel (class) dimension
# each element in target has to have 0 <= value < C
target = torch.empty(N, 8, 8, dtype=torch.long).random_(0, C)
output = loss(m(conv(data)), target)
output.backward()

2.4 CrossEntropyLoss

torch.nn.CrossEntropyLoss(weight=None, size_average=None, ignore_index=-100, reduce=None, reduction='mean')

criterion: cross-entropy loss, commonly used for classification tasks; it combines nn.LogSoftmax and nn.NLLLoss in a single criterion
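The "combination" can be spelled out as a two-step computation: a log-softmax over the raw scores, followed by picking the negative log-probability of the target class. The sketch below is plain Python on nested lists with hypothetical function names, not the library implementation.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax of one row of raw scores."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_sum for v in logits]


def cross_entropy(logits_batch, targets):
    """Mean over the batch of -log_softmax(row)[target]."""
    losses = [-log_softmax(row)[t] for row, t in zip(logits_batch, targets)]
    return sum(losses) / len(losses)


# With equal scores every class has probability 1/3, so the loss is log(3).
print(cross_entropy([[0.0, 0.0, 0.0]], [0]))
print(math.log(3))
```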

Examples:

loss = nn.CrossEntropyLoss()
input = torch.randn(3, 5, requires_grad=True)
target = torch.empty(3, dtype=torch.long).random_(5)
output = loss(input, target)
output.backward()

2.5 PoissonNLLLoss

torch.nn.PoissonNLLLoss(log_input=True, full=False, size_average=None, eps=1e-08, reduce=None, reduction='mean')

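criterion: negative log likelihood loss for a target that follows a Poisson distribution

target ∼ Poisson(input)
loss(input, target) = input − target∗log(input) + log(target!)

The last term can be omitted or approximated with the Stirling formula. The approximation is used for target values greater than 1; for targets less than or equal to 1, zeros are added to the loss.

Argument log_input:
if True: loss = exp(input) − target∗input
if False: loss = input − target∗log(input+eps)
Argument full: whether to compute the full loss, i.e. to add the Stirling approximation term
target∗log(target) − target + 0.5∗log(2π∗target)
Argument eps: a small value that avoids evaluating log(0) when log_input=False; defaults to 1e-8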
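The per-element formulas above translate directly into a few lines of plain Python. This is only a sketch of the arithmetic for a single element, with hypothetical function names; it also compares the Stirling term against the exact value of log(target!) obtained from math.lgamma.

```python
import math


def stirling_term(target):
    """Stirling approximation of log(target!), added only when target > 1."""
    if target <= 1:
        return 0.0
    return target * math.log(target) - target + 0.5 * math.log(2 * math.pi * target)


def poisson_nll(x, target, log_input=True, full=False, eps=1e-8):
    """Per-element Poisson negative log likelihood as described above."""
    if log_input:
        loss = math.exp(x) - target * x
    else:
        loss = x - target * math.log(x + eps)
    if full:
        loss += stirling_term(target)
    return loss


print(poisson_nll(0.0, 1.0))                  # exp(0) - 1 * 0
print(stirling_term(10.0), math.lgamma(11.0))  # approximation vs exact log(10!)
```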

Examples:

loss = nn.PoissonNLLLoss()
log_input = torch.randn(5, 2, requires_grad=True)
target = torch.randn(5, 2)
output = loss(log_input, target)
output.backward()

2.6 KLDivLoss

torch.nn.KLDivLoss(size_average=None, reduce=None, reduction='mean')

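criterion: the Kullback-Leibler divergence loss
KL divergence is a useful distance measure for continuous distributions and is often useful when performing direct regression over the space of (discretely sampled) continuous output distributions.
As with NLLLoss, the input is expected to contain log-probabilities; unlike NLLLoss, however, the input is not restricted to a 2D Tensor, and the targets are given as probabilities.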
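The asymmetry between the two arguments (log-probabilities for the input, probabilities for the target) is easy to get wrong, so here is a plain-Python sketch of the element-wise term target∗(log(target) − input), summed over one distribution. The function name is hypothetical, and terms with a zero target are taken to contribute zero.

```python
import math


def kl_div_sum(log_q, p):
    """Sum of p * (log p - log q); log_q holds log-probabilities, p probabilities."""
    return sum(
        pi * (math.log(pi) - lqi) if pi > 0 else 0.0
        for lqi, pi in zip(log_q, p)
    )


p = [0.5, 0.5]                          # target distribution (probabilities)
log_q = [math.log(0.25), math.log(0.75)]  # model output (log-probabilities)

print(kl_div_sum(log_q, p))                         # positive, the distributions differ
print(kl_div_sum([math.log(v) for v in p], p))      # 0.0, identical distributions
```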

2.7 BCELoss

torch.nn.BCELoss(weight=None, size_average=None, reduce=None, reduction='mean')

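criterion: binary cross-entropy loss, commonly used for classification tasks

It can be seen as a special case of nn.CrossEntropyLoss restricted to two classes, with targets y between 0 and 1 (usually the labels 0 and 1). Note that the input must be a probability, which is what the cross-entropy formula assumes, so the input to BCELoss is usually the output of a sigmoid layer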

Examples:

m = nn.Sigmoid()
loss = nn.BCELoss()
input = torch.randn(3, requires_grad=True)
target = torch.empty(3).random_(2)
output = loss(m(input), target)
output.backward()

2.8 BCEWithLogitsLoss

torch.nn.BCEWithLogitsLoss(weight=None, size_average=None, reduce=None, reduction='mean', pos_weight=None)

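criterion: BCELoss combined with a sigmoid layer, in the same way that CrossEntropyLoss combines LogSoftmax with NLLLoss; it is numerically more stable than a separate sigmoid followed by BCELoss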
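The stability claim can be illustrated with a scalar sketch. The rewritten form max(x, 0) − x∗y + log(1 + exp(−|x|)) is a standard algebraic identity for the binary cross-entropy of sigmoid(x); whether PyTorch uses exactly this expression internally is an assumption here, and all function names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def bce(prob, target):
    """Binary cross-entropy on a probability (what BCELoss expects)."""
    return -(target * math.log(prob) + (1.0 - target) * math.log(1.0 - prob))


def bce_with_logits(logit, target):
    """Same quantity computed from the raw score without forming sigmoid(x)."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


# For moderate scores the two routes agree.
print(bce(sigmoid(0.3), 1.0), bce_with_logits(0.3, 1.0))

# sigmoid(100.0) rounds to exactly 1.0 in float64, so bce(sigmoid(100.0), 0.0)
# would evaluate log(0); the logit form stays finite.
print(bce_with_logits(100.0, 0.0))
```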

2.9 MarginRankingLoss

torch.nn.MarginRankingLoss(margin=0.0, size_average=None, reduce=None, reduction='mean')

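criterion: computes the loss for inputs x1 and x2 (two 1D tensors) and a label y (1 or -1)
y = 1 means x1 should be ranked higher (have a larger value) than x2, and y = -1 the opposite; the loss is max(0, −y∗(x1−x2) + margin), so it is 0 once the pair is ordered correctly by at least margin and positive otherwise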
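A scalar sketch of the ranking formula max(0, −y∗(x1−x2) + margin) shows the three typical cases. The function name is hypothetical and the values are made up.

```python
def margin_ranking_loss(x1, x2, y, margin=0.0):
    """Per-pair ranking loss; y = 1 asks for x1 > x2, y = -1 for x2 > x1."""
    return max(0.0, -y * (x1 - x2) + margin)


print(margin_ranking_loss(2.0, 1.0, 1))              # 0.0, correct order
print(margin_ranking_loss(1.0, 2.0, 1))              # 1.0, wrong order
print(margin_ranking_loss(2.0, 1.0, 1, margin=1.5))  # 0.5, right order but gap below margin
```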

2.10 HingeEmbeddingLoss

torch.nn.HingeEmbeddingLoss(margin=1.0, size_average=None, reduce=None, reduction='mean')

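criterion: measures the loss given an input tensor x and a label tensor y (1 or -1)
It is typically used to measure whether two inputs are similar or dissimilar, for example with the L1 pairwise distance as x, and is commonly used for learning nonlinear embeddings or for semi-supervised learning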
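Treating x as a distance, the per-element rule is: for a similar pair (y = 1) the loss is the distance itself, and for a dissimilar pair (y = -1) the loss is max(0, margin − x). A scalar sketch with a hypothetical function name:

```python
def hinge_embedding_loss(x, y, margin=1.0):
    """x is a distance between two inputs, y is 1 (similar) or -1 (dissimilar)."""
    if y == 1:
        return x  # similar pairs are penalized by their distance
    return max(0.0, margin - x)  # dissimilar pairs only if closer than margin


print(hinge_embedding_loss(0.3, 1))    # 0.3
print(hinge_embedding_loss(0.3, -1))   # about 0.7, too close for a dissimilar pair
print(hinge_embedding_loss(1.5, -1))   # 0.0, already farther apart than margin
```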

2.11 MultiLabelMarginLoss

torch.nn.MultiLabelMarginLoss(size_average=None, reduce=None, reduction='mean')

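criterion: used for classification tasks in which one sample can belong to several classes
For example, in a four-class task, sample x belongs to class 0 and class 1 but not to class 2 or class 3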
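For a single sample, the criterion sums a hinge term over every (target class, non-target class) pair and divides by the number of classes. The plain-Python sketch below assumes the PyTorch convention that the target row lists the target class indices first and is padded with -1; the function name is hypothetical.

```python
def multilabel_margin_loss(x, y):
    """x: scores for C classes; y: target class indices, terminated by -1."""
    targets = []
    for idx in y:
        if idx == -1:
            break  # everything after the first -1 is padding
        targets.append(idx)
    others = [i for i in range(len(x)) if i not in targets]
    total = sum(max(0.0, 1.0 - (x[j] - x[i])) for j in targets for i in others)
    return total / len(x)


scores = [0.1, 0.2, 0.4, 0.8]
labels = [3, 0, -1, 1]  # the sample belongs to classes 3 and 0 only
# 0.25 * ((1-(0.1-0.2)) + (1-(0.1-0.4)) + (1-(0.8-0.2)) + (1-(0.8-0.4)))
print(multilabel_margin_loss(scores, labels))
```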

2.12 SmoothL1Loss

torch.nn.SmoothL1Loss(size_average=None, reduce=None, reduction='mean')

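criterion: uses a squared term where the element-wise absolute error is less than 1 and an L1 term otherwise

It is less sensitive to outliers than MSELoss and in some cases prevents exploding gradients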
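The piecewise rule can be written out per element as 0.5∗d² when |d| < 1 and |d| − 0.5 otherwise, where d is the error. This sketch assumes the fixed threshold of 1 described above and uses a hypothetical function name.

```python
def smooth_l1(x, y):
    """Per-element smooth L1 with the switch point fixed at 1."""
    d = abs(x - y)
    if d < 1.0:
        return 0.5 * d * d  # quadratic near zero, like MSE
    return d - 0.5          # linear for large errors, like L1


print(smooth_l1(0.5, 0.0))   # 0.125
print(smooth_l1(3.0, 0.0))   # 2.5
print(smooth_l1(10.0, 0.0))  # 9.5, grows linearly instead of quadratically
```

The two branches meet at |d| = 1 with the same value (0.5), so the loss is continuous, and for large errors the gradient magnitude stays at 1 instead of growing with the error.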

2.13 SoftMarginLoss

torch.nn.SoftMarginLoss(size_average=None, reduce=None, reduction='mean')

criterion: optimizes a two-class classification logistic loss between input tensor x and target tensor y (containing 1 or -1)
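The logistic loss in question is log(1 + exp(−y∗x)) per element, averaged over all elements. A plain-Python sketch on flat lists, with a hypothetical function name:

```python
import math


def soft_margin_loss(xs, ys):
    """Mean of log(1 + exp(-y * x)) with labels y in {1, -1}."""
    losses = [math.log1p(math.exp(-y * x)) for x, y in zip(xs, ys)]
    return sum(losses) / len(losses)


print(soft_margin_loss([0.0], [1]))          # log(2), an undecided score
print(soft_margin_loss([4.0, -4.0], [1, -1]))  # small, both signs agree with the labels
print(soft_margin_loss([4.0, -4.0], [-1, 1]))  # large, both signs disagree
```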

2.14 MultiLabelSoftMarginLoss

torch.nn.MultiLabelSoftMarginLoss(weight=None, size_average=None, reduce=None, reduction='mean')

criterion: optimizes a multi-label one-versus-all loss based on max-entropy, between input tensor x and target tensor y of size (N, C)

2.15 CosineEmbeddingLoss

torch.nn.CosineEmbeddingLoss(margin=0.0, size_average=None, reduce=None, reduction='mean')

criterion: uses a target tensor y (1 or -1) and the cosine distance to measure whether input tensors x1 and x2 are similar or dissimilar
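For one pair of vectors the rule is 1 − cos(x1, x2) when y = 1 and max(0, cos(x1, x2) − margin) when y = -1. A plain-Python sketch with a hypothetical function name, assuming neither vector is all zeros:

```python
import math


def cosine_embedding_loss(x1, x2, y, margin=0.0):
    """Per-pair loss based on the cosine similarity of x1 and x2."""
    dot = sum(a * b for a, b in zip(x1, x2))
    norm1 = math.sqrt(sum(a * a for a in x1))
    norm2 = math.sqrt(sum(b * b for b in x2))
    cos = dot / (norm1 * norm2)
    if y == 1:
        return 1.0 - cos          # similar pairs should point the same way
    return max(0.0, cos - margin)  # dissimilar pairs should not


print(cosine_embedding_loss([1.0, 0.0], [1.0, 0.0], 1))   # 0.0
print(cosine_embedding_loss([1.0, 0.0], [0.0, 1.0], 1))   # 1.0
print(cosine_embedding_loss([1.0, 0.0], [1.0, 0.0], -1))  # 1.0
```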

2.16 MultiMarginLoss

torch.nn.MultiMarginLoss(p=1, margin=1.0, weight=None, size_average=None, reduce=None, reduction='mean')

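criterion: multi-class hinge loss (margin-based loss). For a sample with scores x and target class y, loss(x, y) = Σ_{i≠y} max(0, margin − x[y] + x[i])^p / C, where C is the number of classes

If weight is given, each term is additionally scaled by the weight of the sample's target class, w[y]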
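A per-sample sketch of the multi-class hinge loss follows. The function name is hypothetical, and the placement of the weight inside the max before the power is an assumption; for the default p=1 the placement makes no difference.

```python
def multi_margin_loss(x, y, p=1, margin=1.0, weight=None):
    """x: scores for C classes; y: index of the target class."""
    w = 1.0 if weight is None else weight[y]  # weight of the target class
    total = 0.0
    for i, score in enumerate(x):
        if i == y:
            continue  # the target class is not compared with itself
        total += max(0.0, w * (margin - x[y] + score)) ** p
    return total / len(x)


scores = [0.1, 0.2, 0.4, 0.8]
# 0.25 * ((1 - 0.8 + 0.1) + (1 - 0.8 + 0.2) + (1 - 0.8 + 0.4))
print(multi_margin_loss(scores, 3))
print(multi_margin_loss(scores, 3, weight=[1.0, 1.0, 1.0, 2.0]))  # doubled
```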

2.17 TripletMarginLoss

torch.nn.TripletMarginLoss(margin=1.0, p=2.0, eps=1e-06, swap=False, size_average=None, reduce=None, reduction='mean')

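criterion: triplet loss, which measures the relative similarity between inputs x1, x2 and x3

triplet: a (anchor), p (positive), n (negative)
It is widely used in face verification: the goal is to make p as similar as possible to a (different samples of the same person) and n as dissimilar as possible from a (samples of different people)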
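For one triplet the loss is max(d(a, p) − d(a, n) + margin, 0), with d the distance between two embeddings. The sketch below uses a plain Euclidean distance and hypothetical function names; it leaves out the small eps term and the swap option that appear in the signature above.

```python
import math


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Zero once the negative is at least `margin` farther away than the positive."""
    return max(euclidean(anchor, positive) - euclidean(anchor, negative) + margin, 0.0)


a = [0.0, 0.0]
print(triplet_margin_loss(a, [0.0, 1.0], [3.0, 4.0]))  # 0.0, positive is much closer
print(triplet_margin_loss(a, [3.0, 4.0], [0.0, 1.0]))  # 5.0, positive is farther away
```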

Examples:

triplet_loss = nn.TripletMarginLoss(margin=1.0, p=2)
input1 = torch.randn(100, 128, requires_grad=True)
input2 = torch.randn(100, 128, requires_grad=True)
input3 = torch.randn(100, 128, requires_grad=True)
output = triplet_loss(input1, input2, input3)
output.backward()

2.18 CTCLoss

torch.nn.CTCLoss(blank=0, reduction='mean')

criterion：The Connectionist Temporal Classification loss

3 CPN Loss

The Global loss and the Refine loss used in CPN

# define loss function (criterion) 
criterion1 = torch.nn.MSELoss().cuda() # for Global loss
criterion2 = torch.nn.MSELoss(reduction='none').cuda() # for Refine loss; reduction='none' replaces the deprecated reduce=False

3.1 Global loss

It uses the mean squared error directly, exactly as in the MSELoss example above

for global_output, label in zip(global_outputs, targets):
    num_points = global_output.size()[1]
    # keep only the labels whose valid flag exceeds 1.1; the rest are zeroed
    global_label = label * (valid > 1.1).type(torch.FloatTensor).view(-1, num_points, 1, 1)
    # async=True is a syntax error since Python 3.7; non_blocking=True replaces it
    global_loss = criterion1(global_output, global_label.cuda(non_blocking=True)) / 2.0
    loss += global_loss
    global_loss_record += global_loss.item()

3.2 Refine loss

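By design, it back-propagates only through the few channels with the largest loss, selected dynamically. For that reason criterion2 is instantiated with the averaging turned off (reduce=False in the original code, equivalent to reduction='none'), and the following steps are added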

    refine_loss = criterion2(refine_output, refine_target_var)
    refine_loss = refine_loss.mean(dim=3).mean(dim=2)
    refine_loss *= (valid_var > 0.1).type(torch.cuda.FloatTensor)
    refine_loss = ohkm(refine_loss, 8)
    loss += refine_loss
    refine_loss_record = refine_loss.data.item()

ohkm function (averages the loss over the top_k targets with the largest loss)

    def ohkm(loss, top_k):
        ohkm_loss = 0.
        for i in range(loss.size()[0]):
            sub_loss = loss[i]
            topk_val, topk_idx = torch.topk(sub_loss, k=top_k, dim=0, sorted=False)
            tmp_loss = torch.gather(sub_loss, 0, topk_idx)
            ohkm_loss += torch.sum(tmp_loss) / top_k
        ohkm_loss /= loss.size()[0]
        return ohkm_loss
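To see the selection on concrete numbers, here is a plain-Python restatement of the same top-k averaging on nested lists (one row of per-keypoint losses per sample). It is a sketch for checking the arithmetic, not a replacement for the tensor version above; ohkm_reference is a hypothetical name and the loss values are made up.

```python
def ohkm_reference(per_keypoint_loss, top_k):
    """Average, over samples, of the mean of each sample's top_k largest losses."""
    total = 0.0
    for sample in per_keypoint_loss:
        hardest = sorted(sample, reverse=True)[:top_k]  # the top_k hardest keypoints
        total += sum(hardest) / top_k
    return total / len(per_keypoint_loss)


losses = [
    [1.0, 5.0, 3.0, 0.0],  # hardest two: 5.0 and 3.0, mean 4.0
    [2.0, 2.0, 8.0, 4.0],  # hardest two: 8.0 and 4.0, mean 6.0
]
print(ohkm_reference(losses, top_k=2))  # 5.0
```

Keypoints outside the top_k contribute nothing, so in the tensor version no gradient flows through them for that sample.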
