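5. nn.L1Loss

Function: computes the absolute value of the difference between inputs and target.
Main parameter: reduction, the reduction mode, one of none/sum/mean. none returns the loss for each element; sum adds up all elements and returns a scalar; mean averages over all elements and returns a scalar.
Formula: $l_{n}=\left|x_{n}-y_{n}\right|$

nn.L1Loss(size_average=None,reduce=None,reduction='mean')

The following code shows how it behaves: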

import torch
import torch.nn as nn
import torch.nn.functional as F
import matplotlib.pyplot as plt
import numpy as np
from toolss.common_tools import set_seed  # the author's local helper module, not part of PyTorch

set_seed(1)  # set the random seed

inputs = torch.ones((2, 2))
target = torch.ones((2, 2)) * 3  # [[3,3],[3,3]]

loss_f = nn.L1Loss(reduction='none')
loss = loss_f(inputs, target)

print("input:{}\ntarget:{}\nL1 loss:{}".format(inputs, target, loss))

The output is:

input:tensor([[1., 1.],
        [1., 1.]])
target:tensor([[3., 3.],
        [3., 3.]])
L1 loss:tensor([[2., 2.],
        [2., 2.]])
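
The example above only uses reduction='none'. As a minimal sketch on the same tensors, the three reduction modes can be compared side by side; the variable names are chosen here for illustration.

```python
import torch
import torch.nn as nn

inputs = torch.ones((2, 2))
target = torch.ones((2, 2)) * 3

# 'none' keeps one loss value per element: |1 - 3| = 2 everywhere
loss_none = nn.L1Loss(reduction='none')(inputs, target)
# 'sum' adds the four element losses: 2 + 2 + 2 + 2 = 8
loss_sum = nn.L1Loss(reduction='sum')(inputs, target)
# 'mean' divides that sum by the number of elements: 8 / 4 = 2
loss_mean = nn.L1Loss(reduction='mean')(inputs, target)

print(loss_none.shape, loss_sum.item(), loss_mean.item())
```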

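6. nn.MSELoss

Function: computes the square of the difference between inputs and target.
Main parameter: reduction, the reduction mode, one of none/sum/mean. none returns the loss for each element; sum adds up all elements and returns a scalar; mean averages over all elements and returns a scalar.
Formula: $l_{n}=\left(x_{n}-y_{n}\right)^{2}$

nn.MSELoss(size_average=None,reduce=None,reduction='mean')

The following code shows how it behaves: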

import torch
import torch.nn as nn
import torch.nn.functional as F
import matplotlib.pyplot as plt
import numpy as np
from toolss.common_tools import set_seed  # the author's local helper module, not part of PyTorch

set_seed(1)  # set the random seed

inputs = torch.ones((2, 2))
target = torch.ones((2, 2)) * 3  # [[3,3],[3,3]]

loss_f_mse = nn.MSELoss(reduction='none')
loss_mse = loss_f_mse(inputs, target)

print("MSE loss:{}".format(loss_mse))

The output is:

MSE loss:tensor([[4., 4.],
        [4., 4.]])
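
To connect the formula to the module, the sketch below recomputes the squared difference by hand and compares it with nn.MSELoss. The input values are made up for illustration.

```python
import torch
import torch.nn as nn

inputs = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
target = torch.tensor([[3.0, 3.0], [3.0, 3.0]])

loss_none = nn.MSELoss(reduction='none')(inputs, target)
loss_mean = nn.MSELoss(reduction='mean')(inputs, target)

# l_n = (x_n - y_n)^2, written out directly
manual = (inputs - target) ** 2

print(torch.allclose(loss_none, manual))         # element-wise values agree
print(torch.allclose(loss_mean, manual.mean()))  # 'mean' is the plain average
```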

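7. nn.SmoothL1Loss

Function: a smoothed version of L1Loss.
Main parameter: reduction, the reduction mode, one of none/sum/mean. none returns the loss for each element; sum adds up all elements and returns a scalar; mean averages over all elements and returns a scalar.
Formula: $\operatorname{loss}(x, y)=\frac{1}{n} \sum_{i} z_{i}$, where $z_{i}=\left\{\begin{array}{ll} 0.5\left(x_{i}-y_{i}\right)^{2}, & \text { if }\left|x_{i}-y_{i}\right|<1 \\ \left|x_{i}-y_{i}\right|-0.5, & \text { otherwise } \end{array}\right.$
Here $x_i$ is the model output and $y_i$ is the label. The figure compares the smoothed L1Loss with L1Loss:

As the figure shows, the X axis is $x_i-y_i$. When $-1<x_i-y_i<1$ the loss takes the L2 form, replacing the absolute value with a square so that the curve is smooth around zero. Outside that range it stays linear, which reduces the influence of outliers compared with a purely squared loss.

nn.SmoothL1Loss(size_average=None,reduce=None,reduction='mean')

The code below evaluates SmoothL1Loss and plots it together with L1Loss for comparison: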

    inputs = torch.linspace(-3, 3, steps=500)  # 500 evenly spaced points from -3 to 3
    target = torch.zeros_like(inputs)  # same shape as inputs, all zeros

    loss_f = nn.SmoothL1Loss(reduction='none')  # smooth L1 loss

    loss_smooth = loss_f(inputs, target)

    loss_l1 = np.abs(inputs.numpy())  # L1 loss

    plt.plot(inputs.numpy(), loss_smooth.numpy(), label='Smooth L1 Loss')
    plt.plot(inputs.numpy(), loss_l1, label='L1 loss')
    plt.xlabel('x_i - y_i')
    plt.ylabel('loss value')
    plt.legend()
    plt.grid()
    plt.show()

The code above produces a plot:

The plot matches the figure described earlier.
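
For a check that does not depend on a plot, the piecewise formula can be written out by hand and compared with nn.SmoothL1Loss. This is a minimal sketch that assumes the default threshold of 1; the sample points are chosen for illustration.

```python
import torch
import torch.nn as nn

diff = torch.tensor([-2.0, -0.5, 0.0, 0.5, 2.0])  # plays the role of x_i - y_i
target = torch.zeros_like(diff)

loss = nn.SmoothL1Loss(reduction='none')(diff, target)

# piecewise definition: quadratic inside (-1, 1), linear outside
abs_diff = diff.abs()
manual = torch.where(abs_diff < 1, 0.5 * abs_diff ** 2, abs_diff - 0.5)

print(loss)    # values: 1.5, 0.125, 0, 0.125, 1.5
print(torch.allclose(loss, manual))
```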

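8. nn.PoissonNLLLoss

Function: negative log-likelihood loss for a Poisson distribution.
Main parameters:

log_input: whether the input is in log form; this determines which formula is used.
full: whether to compute the full loss, that is, to add the Stirling approximation term; defaults to False.
eps: a small correction term that avoids evaluating log(0) when log_input = False.

nn.PoissonNLLLoss(log_input=True,full=False,size_average=None,eps=1e-08,reduce=None,reduction='mean')

log_input = True: loss(input, target) = exp(input) - target * input
log_input = False: loss(input, target) = input - target * log(input + eps)

It is used as follows: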

inputs = torch.randn((2, 2))
target = torch.randn((2, 2))

loss_f = nn.PoissonNLLLoss(log_input=True, full=False, reduction='none')
loss = loss_f(inputs, target)
print("input:{}\ntarget:{}\nPoisson NLL loss:{}".format(inputs, target, loss))

The output is:

input:tensor([[0.6614, 0.2669],
        [0.0617, 0.6213]])
target:tensor([[-0.4519, -0.1661],
        [-1.5228,  0.3817]])
Poisson NLL loss:tensor([[2.2363, 1.3503],
        [1.1575, 1.6242]])
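
The two formulas given for log_input can be checked directly. The sketch below uses made-up non-negative inputs and targets (not the random values above) and compares each mode with its hand-written formula.

```python
import torch
import torch.nn as nn

inputs = torch.tensor([[0.5, 1.0], [1.5, 2.0]])
target = torch.tensor([[1.0, 0.0], [2.0, 3.0]])
eps = 1e-8

# log_input=True: the input is treated as log(rate)
loss_log = nn.PoissonNLLLoss(log_input=True, full=False, reduction='none')(inputs, target)
manual_log = torch.exp(inputs) - target * inputs

# log_input=False: the input is treated as the rate itself
loss_raw = nn.PoissonNLLLoss(log_input=False, full=False, eps=eps, reduction='none')(inputs, target)
manual_raw = inputs - target * torch.log(inputs + eps)

print(torch.allclose(loss_log, manual_log))
print(torch.allclose(loss_raw, manual_raw))
```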

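9. nn.KLDivLoss

Function: computes the KL divergence (Kullback-Leibler divergence, also called relative entropy).
Note: the input must already be log-probabilities, for example the output of nn.LogSoftmax.
Main parameter: reduction, one of none/sum/mean/batchmean; batchmean averages over the batch dimension.
Formula: $D_{KL}(P \| Q)=E_{x \sim P}\left[\log \frac{P(x)}{Q(x)}\right]=E_{x \sim P}[\log P(x)-\log Q(x)]=\sum_{i=1}^{N} P\left(x_{i}\right)\left(\log P\left(x_{i}\right)-\log Q\left(x_{i}\right)\right)$
Here P is the true distribution and Q is the distribution output by the model; the goal is to make Q approach P. The loss for each element is
$l_{n}=y_{n} \cdot\left(\log y_{n}-x_{n}\right)$
Because this formula gives the loss for a single element, it has no summation sign. $P(x_i)$ in the KL divergence corresponds to $y_n$ in the loss. The loss subtracts $x_n$ where the KL divergence subtracts $\log Q(x_i)$, so the input has to be converted to log-probabilities beforehand. In PyTorch this can be done with nn.LogSoftmax, after which $x_n$ can be subtracted directly.

nn.KLDivLoss has one special reduction option, batchmean, which averages over the batch dimension: the divisor is the batch size rather than the number of elements.

nn.KLDivLoss(size_average=None, reduce=None,reduction='mean')

The code below illustrates nn.KLDivLoss: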

inputs = torch.tensor([[0.5, 0.3, 0.2], [0.2, 0.3, 0.5]])
inputs_log = torch.log(inputs)
target = torch.tensor([[0.9, 0.05, 0.05], [0.1, 0.7, 0.2]], dtype=torch.float)

loss_f_none = nn.KLDivLoss(reduction='none')
loss_f_mean = nn.KLDivLoss(reduction='mean')
loss_f_bs_mean = nn.KLDivLoss(reduction='batchmean')

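# Note: the raw probabilities `inputs` are passed here, not `inputs_log`, and the
# output below reflects that; a true KL divergence requires passing `inputs_log`.
loss_none = loss_f_none(inputs, target)
loss_mean = loss_f_mean(inputs, target)
loss_bs_mean = loss_f_bs_mean(inputs, target)

print("loss_none:\n{}\nloss_mean:\n{}\nloss_bs_mean:\n{}".format(loss_none, loss_mean, loss_bs_mean))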

The output is:

loss_none:
tensor([[-0.5448, -0.1648, -0.1598],
        [-0.2503, -0.4597, -0.4219]])
loss_mean:
-0.3335360586643219
loss_bs_mean:
-1.000608205795288
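
The note at the top of this section says the input should be log-probabilities. As a sketch of that usage on the same two distributions, the version below passes the log of the model probabilities and checks the result against $y_n(\log y_n - x_n)$; the names `probs` and `log_probs` are chosen here for illustration.

```python
import torch
import torch.nn as nn

probs = torch.tensor([[0.5, 0.3, 0.2], [0.2, 0.3, 0.5]])     # model distribution Q
target = torch.tensor([[0.9, 0.05, 0.05], [0.1, 0.7, 0.2]])  # true distribution P
log_probs = torch.log(probs)                                  # log-probabilities, as the loss expects

loss_none = nn.KLDivLoss(reduction='none')(log_probs, target)
loss_bs_mean = nn.KLDivLoss(reduction='batchmean')(log_probs, target)

# l_n = y_n * (log y_n - x_n), with x_n already a log-probability
manual = target * (torch.log(target) - log_probs)

print(torch.allclose(loss_none, manual))
# batchmean divides the total by the batch size (2 rows), not by the 6 elements
print(torch.allclose(loss_bs_mean, manual.sum() / probs.shape[0]))
# summed over each row, the KL divergence is non-negative
print(manual.sum(dim=1))
```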

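10. nn.MarginRankingLoss

Function: measures how two vectors compare with each other; used for ranking tasks.
Special note: the loss is computed from the difference between the two inputs. With x1, x2, and y of the same shape it is elementwise; the n*n loss matrix in the example below comes from broadcasting inputs of shape [n,1] against a target of shape [n].
Main parameters:

margin: the margin, the required gap between x1 and x2.
reduction: the reduction mode, one of none/sum/mean.
Loss formula: $\operatorname{loss}(x, y)=\max (0,-y \cdot(x_1-x_2)+\operatorname{margin})$
Here y is the label and can only be 1 or -1; x1 and x2 are the two sets of data, and the difference is taken between their corresponding elements.

Consider the role of -y in the loss formula:

When y=1, x1 is expected to be larger than x2; when x1 exceeds x2 by at least margin, no loss is produced.
When y=-1, x2 is expected to be larger than x1; when x2 exceeds x1 by at least margin, no loss is produced.

In the example below, x1 and x2 have shape [3,1] and target has shape [3], so each difference x1-x2 is broadcast against every target value and the result is a 3*3 loss matrix: row i holds the loss of the i-th pair (x1[i], x2[i]) under each of the three targets.

nn.MarginRankingLoss(margin=0.0,size_average=None,reduce=None,reduction='mean')

The code below shows how nn.MarginRankingLoss is used: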

x1 = torch.tensor([[1], [2], [3]], dtype=torch.float)
x2 = torch.tensor([[2], [2], [2]], dtype=torch.float)

target = torch.tensor([1, 1, -1], dtype=torch.float)

loss_f_none = nn.MarginRankingLoss(margin=0, reduction='none')

loss = loss_f_none(x1, x2, target)

print(loss)

The output is:

tensor([[1., 1., 0.],
        [0., 0., 0.],
        [0., 0., 1.]])
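
For comparison, here is a minimal sketch with x1, x2, and target all given as 1-D tensors of the same length, so that no broadcasting takes place and one loss value is produced per pair. Same-shape inputs are also the safer choice because some PyTorch releases may reject inputs whose numbers of dimensions differ.

```python
import torch
import torch.nn as nn

x1 = torch.tensor([1.0, 2.0, 3.0])
x2 = torch.tensor([2.0, 2.0, 2.0])
target = torch.tensor([1.0, 1.0, -1.0])
margin = 0.0

loss = nn.MarginRankingLoss(margin=margin, reduction='none')(x1, x2, target)

# max(0, -y * (x1 - x2) + margin), element by element
manual = torch.clamp(-target * (x1 - x2) + margin, min=0)

print(loss)  # values: 1, 0, 1
print(torch.allclose(loss, manual))
```

The first pair is penalized because y=1 asks for x1 > x2 but 1 < 2; the third pair is penalized because y=-1 asks for x2 > x1 but 2 < 3.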

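11. nn.MultiLabelMarginLoss

Function: multi-label margin loss.
Main parameters:

reduction: the reduction mode, one of none/sum/mean.
Example: in a four-class task where sample x belongs to classes 0 and 3, the label is written as [0,3,-1,-1] rather than [1,0,0,1].
Loss formula: $\operatorname{loss}(x, y)=\sum_{i j} \frac{\max (0,1-(x[y[j]]-x[i]))}{x \cdot \operatorname{size}(0)}$
In the formula, the denominator x.size(0) is the number of output neurons. In the numerator, i ranges from 0 to x.size(0)-1 and j ranges from 0 to y.size(0)-1, with y[j] greater than or equal to 0 and i not equal to any y[j].

Suppose the four-class label is [0,3,-1,-1]. Then x[y[j]] in the numerator can only be x[0] or x[3], and x[i] can only be x[1] or x[2].

nn.MultiLabelMarginLoss(size_average=None,reduce=None,reduction='mean')

The code below shows MultiLabelMarginLoss in practice: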

x = torch.tensor([[0.1, 0.2, 0.4, 0.8]])
y = torch.tensor([[0, 3, -1, -1]], dtype=torch.long)

loss_f = nn.MultiLabelMarginLoss(reduction='none')

loss = loss_f(x, y)

print(loss)

The output is:

tensor([0.8500])

The MultiLabelMarginLoss value can also be computed by hand:

x = x[0]
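# max(0, .) is left out because every term is positive for these values
item_1 = (1-(x[0] - x[1])) + (1 - (x[0] - x[2]))    # terms for target class 0
item_2 = (1-(x[3] - x[1])) + (1 - (x[3] - x[2]))    # terms for target class 3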

loss_h = (item_1 + item_2) / x.shape[0]

print(loss_h)

The result is the same as the one produced by PyTorch's MultiLabelMarginLoss.
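
The hand calculation above is specific to this one label. A more general sketch of the same formula, including the max(0, .) clamp, might look like the following; `multilabel_margin_manual` is a hypothetical helper written for this illustration, not a PyTorch function.

```python
import torch
import torch.nn as nn

def multilabel_margin_manual(x, y):
    """Hypothetical helper. x: 1-D scores; y: 1-D class indices padded with -1."""
    targets = []
    for idx in y.tolist():
        if idx < 0:          # the first -1 ends the list of target classes
            break
        targets.append(idx)
    total = 0.0
    for j in targets:                      # each target class y[j]
        for i in range(x.shape[0]):        # each non-target class i
            if i in targets:
                continue
            total += max(0.0, 1.0 - (x[j].item() - x[i].item()))
    return total / x.shape[0]

x = torch.tensor([[0.1, 0.2, 0.4, 0.8]])
y = torch.tensor([[0, 3, -1, -1]], dtype=torch.long)

loss = nn.MultiLabelMarginLoss(reduction='none')(x, y)
manual = multilabel_margin_manual(x[0], y[0])

print(loss.item(), manual)  # both are approximately 0.85
```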

12. nn.SoftMarginLoss

Function: computes the logistic loss for binary classification.
Main parameter: reduction, the reduction mode, one of none/sum/mean.
Loss formula: $\operatorname{loss}(x, y)=\sum_{i} \frac{\log (1+\exp (-y[i] \cdot x[i]))}{\text{x.nelement}()}$

nn.SoftMarginLoss(size_average=None,reduce=None,reduction='mean')

The code below shows nn.SoftMarginLoss in action:

inputs = torch.tensor([[0.3, 0.7], [0.5, 0.5]])
target = torch.tensor([[-1, 1], [1, -1]], dtype=torch.float)

loss_f = nn.SoftMarginLoss(reduction='none')

loss = loss_f(inputs, target)

print("SoftMargin: ", loss)

The output is:

SoftMargin:  tensor([[0.8544, 0.4032],
        [0.4741, 0.9741]])
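
As a quick check of the formula, the sketch below recomputes the per-element term $\log(1+\exp(-y \cdot x))$ by hand on the same tensors and compares it with the module output.

```python
import torch
import torch.nn as nn

inputs = torch.tensor([[0.3, 0.7], [0.5, 0.5]])
target = torch.tensor([[-1.0, 1.0], [1.0, -1.0]])

loss = nn.SoftMarginLoss(reduction='none')(inputs, target)

# log(1 + exp(-y * x)) for every element; with reduction='none' there is no division
manual = torch.log(1 + torch.exp(-target * inputs))

print(torch.allclose(loss, manual))
```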

13. nn.MultiLabelSoftMarginLoss

Function: the multi-label version of SoftMarginLoss.
Main parameters:

weight: per-class weights applied to the loss.
reduction: the reduction mode, one of none/sum/mean.

Loss formula: $\operatorname{loss}(x, y)=-\frac{1}{C} \sum_{i}\left[y[i] \cdot \log \left((1+\exp (-x[i]))^{-1}\right)+(1-y[i]) \cdot \log \left(\frac{\exp (-x[i])}{1+\exp (-x[i])}\right)\right]$

nn.MultiLabelSoftMarginLoss(weight=None,size_average=None,reduce=None,reduction='mean')

The code below shows how it behaves:

inputs = torch.tensor([[0.3, 0.7, 0.8]])
target = torch.tensor([[0, 1, 1]], dtype=torch.float)

loss_f = nn.MultiLabelSoftMarginLoss(reduction='none')

loss = loss_f(inputs, target)

print("MultiLabel SoftMargin: ", loss)

The output is:

MultiLabel SoftMargin:  tensor([0.5429])
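
The two log terms in the formula are log(sigmoid(x)) and log(sigmoid(-x)), so the loss can be rebuilt from F.logsigmoid. The sketch below does this for the same sample and averages over the C = 3 classes.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

inputs = torch.tensor([[0.3, 0.7, 0.8]])
target = torch.tensor([[0.0, 1.0, 1.0]])

loss = nn.MultiLabelSoftMarginLoss(reduction='none')(inputs, target)

# y * log(sigmoid(x)) + (1 - y) * log(sigmoid(-x)), negated, per class
per_class = -(target * F.logsigmoid(inputs) + (1 - target) * F.logsigmoid(-inputs))
manual = per_class.mean(dim=1)  # the 1/C factor: average over classes

print(torch.allclose(loss, manual))
```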

14. nn.MultiMarginLoss

Function: computes the multi-class hinge loss.
Main parameters:

p: either 1 or 2.
weight: per-class weights applied to the loss.
margin: the margin.
reduction: the reduction mode, one of none/sum/mean.
Loss formula: $\operatorname{loss}(x, y)=\frac{\sum_{i} \max (0, \operatorname{margin}-x[y]+x[i])^{p}}{\text{x.size}(0)}$
Here $i \in\{0, \cdots, \text{x.size}(0)-1\}$, $0 \leq y \leq \text{x.size}(0)-1$, and the sum runs over all $i \neq y$.

nn.MultiMarginLoss(p=1,margin=1.0,weight=None,size_average=None,reduce=None,reduction='mean')

A code example:

x = torch.tensor([[0.1, 0.2, 0.7], [0.2, 0.5, 0.3]])
y = torch.tensor([1, 2], dtype=torch.long)

loss_f = nn.MultiMarginLoss(reduction='none')

loss = loss_f(x, y)

print("Multi Margin Loss: ", loss)

The output is:

Multi Margin Loss:  tensor([0.8000, 0.7000])
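
To see where 0.8 and 0.7 come from, the sketch below evaluates the formula by hand for each sample with p=1 and margin=1, skipping the term where i equals the label.

```python
import torch
import torch.nn as nn

x = torch.tensor([[0.1, 0.2, 0.7], [0.2, 0.5, 0.3]])
y = torch.tensor([1, 2], dtype=torch.long)
margin = 1.0

loss = nn.MultiMarginLoss(p=1, margin=margin, reduction='none')(x, y)

manual = []
for scores, label in zip(x, y):
    label = label.item()
    # sum of max(0, margin - x[y] + x[i]) over all classes i other than the label
    terms = [max(0.0, margin - scores[label].item() + scores[i].item())
             for i in range(scores.shape[0]) if i != label]
    manual.append(sum(terms) / scores.shape[0])  # divide by the number of classes
manual = torch.tensor(manual)

print(manual)  # approximately 0.8 and 0.7
print(torch.allclose(loss, manual))
```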

15. nn.TripletMarginLoss

Function: computes the triplet loss, commonly used in face verification.
Main parameters:

p: the order of the norm; defaults to 2.
margin: the margin.
reduction: the reduction mode, one of none/sum/mean.
Loss formula: $L(a, p, n)=\max \left\{d\left(a_{i}, p_{i}\right)-d\left(a_{i}, n_{i}\right)+\text{margin}, 0\right\}$, where $d\left(x_{i}, y_{i}\right)=\left\|\mathbf{x}_{i}-\mathbf{y}_{i}\right\|_{p}$

nn.TripletMarginLoss(margin=1.0,p=2.0,eps=1e-06,swap=False,size_average=None,reduce=None,reduction='mean')

The code below shows how it is used:

anchor = torch.tensor([[1.]])
pos = torch.tensor([[2.]])
neg = torch.tensor([[0.5]])

loss_f = nn.TripletMarginLoss(margin=1.0, p=1)

loss = loss_f(anchor, pos, neg)

print("Triplet Margin Loss", loss)

The output is:

Triplet Margin Loss tensor(1.5000)
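
The value 1.5 follows from the formula: d(anchor, pos) = 1, d(anchor, neg) = 0.5, so 1 - 0.5 + 1 = 1.5. The sketch below writes this out with p=1 distances; the comparison uses a small tolerance because the module adds eps inside its distance computation.

```python
import torch
import torch.nn as nn

anchor = torch.tensor([[1.0]])
pos = torch.tensor([[2.0]])
neg = torch.tensor([[0.5]])
margin = 1.0

loss = nn.TripletMarginLoss(margin=margin, p=1)(anchor, pos, neg)

# L1 distances between anchor/positive and anchor/negative
d_ap = (anchor - pos).abs().sum(dim=1)
d_an = (anchor - neg).abs().sum(dim=1)
manual = torch.clamp(d_ap - d_an + margin, min=0).mean()

print(loss.item(), manual.item())
```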

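16. nn.HingeEmbeddingLoss

Function: measures the similarity of two inputs; commonly used for nonlinear embeddings and semi-supervised learning.
Note: the input x should be the absolute value of the difference between the two inputs.
Main parameters:

margin: the margin.
reduction: the reduction mode, one of none/sum/mean.
Loss function: $l_{n}=\left\{\begin{array}{ll} x_{n}, & \text { if } y_{n}=1 \\ \max \left\{0, \Delta-x_{n}\right\}, & \text { if } y_{n}=-1 \end{array}\right.$, where $\Delta$ is margin.

nn.HingeEmbeddingLoss(margin=1.0,size_average=None,reduce=None,reduction='mean')

The code below shows how it behaves: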

inputs = torch.tensor([[1., 0.8, 0.5]])
target = torch.tensor([[1, 1, -1]])

loss_f = nn.HingeEmbeddingLoss(margin=1, reduction='none')

loss = loss_f(inputs, target)

print("Hinge Embedding Loss", loss)

The output is:

Hinge Embedding Loss tensor([[1.0000, 0.8000, 0.5000]])
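
The two branches of the loss can be reproduced with torch.where. This sketch uses the same values as above, with a float target for simplicity.

```python
import torch
import torch.nn as nn

inputs = torch.tensor([[1.0, 0.8, 0.5]])
target = torch.tensor([[1.0, 1.0, -1.0]])
margin = 1.0

loss = nn.HingeEmbeddingLoss(margin=margin, reduction='none')(inputs, target)

# y = 1: the loss is x itself; y = -1: the loss is max(0, margin - x)
manual = torch.where(target == 1, inputs, torch.clamp(margin - inputs, min=0))

print(torch.allclose(loss, manual))
```

For the third element, y=-1 and x=0.5, so the loss is max(0, 1 - 0.5) = 0.5, which happens to equal the input.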

17. nn.CosineEmbeddingLoss

Function: measures the similarity of two inputs using cosine similarity.
Main parameters:

margin: may take values in [-1,1]; values in [0,0.5] are recommended.
reduction: the reduction mode, one of none/sum/mean.
Loss formula: $\operatorname{loss}(x, y)=\left\{\begin{array}{ll} 1-\cos \left(x_{1}, x_{2}\right), & \text { if } y=1 \\ \max \left(0, \cos \left(x_{1}, x_{2}\right)-\operatorname{margin}\right), & \text { if } y=-1 \end{array}\right.$
where $\cos (\theta)=\frac{A \cdot B}{\|A\|\|B\|}=\frac{\sum_{i=1}^{n} A_{i} \times B_{i}}{\sqrt{\sum_{i=1}^{n}\left(A_{i}\right)^{2}} \times \sqrt{\sum_{i=1}^{n}\left(B_{i}\right)^{2}}}$

nn.CosineEmbeddingLoss(margin=0.0,size_average=None,reduce=None,reduction='mean')

The code below shows what nn.CosineEmbeddingLoss does:

x1 = torch.tensor([[0.3, 0.5, 0.7], [0.3, 0.5, 0.7]])
x2 = torch.tensor([[0.1, 0.3, 0.5], [0.1, 0.3, 0.5]])

target = torch.tensor([[1, -1]], dtype=torch.float)

loss_f = nn.CosineEmbeddingLoss(margin=0., reduction='none')

loss = loss_f(x1, x2, target)

print("Cosine Embedding Loss", loss)

The output is:

Cosine Embedding Loss tensor([[0.0167, 0.9833]])
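
The same numbers can be rebuilt from F.cosine_similarity. In this sketch the target is a 1-D tensor with one label per row of x1 and x2, which is an assumption made here to keep the shapes unambiguous.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x1 = torch.tensor([[0.3, 0.5, 0.7], [0.3, 0.5, 0.7]])
x2 = torch.tensor([[0.1, 0.3, 0.5], [0.1, 0.3, 0.5]])
target = torch.tensor([1.0, -1.0])  # one label per row
margin = 0.0

loss = nn.CosineEmbeddingLoss(margin=margin, reduction='none')(x1, x2, target)

cos = F.cosine_similarity(x1, x2, dim=1)  # about 0.9833 for both rows
# y = 1: 1 - cos; y = -1: max(0, cos - margin)
manual = torch.where(target == 1, 1 - cos, torch.clamp(cos - margin, min=0))

print(torch.allclose(loss, manual, atol=1e-5))
```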

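18. nn.CTCLoss

Function: computes the CTC (Connectionist Temporal Classification) loss, used for classifying sequence data.
Main parameters:

blank: the index of the blank label.
zero_infinity: whether to set infinite losses and their gradients to zero.
reduction: the reduction mode, one of none/sum/mean.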

torch.nn.CTCLoss(blank=0,reduction='mean',zero_infinity=False)
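
This section has no usage example, so here is a minimal sketch of a forward call on random data. The sizes T, N, C, and S are made up for illustration; log_probs has shape (T, N, C) and must contain log-probabilities, and class 0 is reserved for the blank label.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
T, N, C, S = 50, 4, 20, 10  # input length, batch size, classes (incl. blank), target length

# log-probabilities over classes for every time step and batch item
log_probs = torch.randn(T, N, C).log_softmax(dim=2)
# target labels in 1..C-1, so that index 0 stays free for the blank
targets = torch.randint(low=1, high=C, size=(N, S), dtype=torch.long)
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), S, dtype=torch.long)

loss_f = nn.CTCLoss(blank=0, reduction='mean', zero_infinity=False)
loss = loss_f(log_probs, targets, input_lengths, target_lengths)

print("CTC loss:", loss)
```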

PyTorch: Loss Functions (Part 2)
