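Loss Function
This post summarizes the loss functions available in PyTorch.
The loss function is one of the most important components in training a deep learning model. It measures the error between the network output and the ground-truth target, and training updates the network parameters based on that error so that it keeps shrinking. A good loss function that matches the task therefore tends to yield a better model.
This post does not discuss which loss suits which task; it only summarizes the loss functions PyTorch provides.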

0 Blog index

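PyTorch Model Training (0) - CPN Source Code Walkthrough
PyTorch Model Training (1) - Model Definition
PyTorch Model Training (2) - Model Initialization
PyTorch Model Training (3) - Saving and Loading Models
PyTorch Model Training (4) - Loss Function
PyTorch Model Training (5) - Optimizer
PyTorch Model Training (6) - Data Loading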

1 Loss base classes

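When implementing its loss functions, PyTorch first wraps the shared parameters in base classes; each individual loss function then inherits from one of the two base classes below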

class _Loss(Module):
    def __init__(self, size_average=None, reduce=None, reduction='mean'):
        super(_Loss, self).__init__()
        if size_average is not None or reduce is not None:
            self.reduction = _Reduction.legacy_get_string(size_average, reduce)
        else:
            self.reduction = reduction

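_Loss base class: inherits from Module and takes three arguments

1) size_average: bool, optional. Whether to average the loss. Defaults to True, which averages the losses over the elements of the current batch; if False, the losses are summed instead. Ignored when reduce=False. Deprecated.
2) reduce: bool, optional. Whether to reduce the loss at all. Defaults to True, which averages or sums according to size_average; if False, the loss of each element is returned and size_average is ignored. Deprecated.
3) reduction: string, optional. Specifies the reduction applied to the loss. 'mean' (the default) averages the loss; 'none' applies no reduction; 'sum' sums it.
Note that although the first two arguments are being phased out, reduction is overridden whenever either of them is not None; see the if-else statement in the code above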
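To make the three reduction modes and the legacy override concrete, here is a minimal sketch. The tensors are made-up values, and the override check assumes a PyTorch version that still accepts the deprecated size_average argument.

```python
import warnings
import torch
import torch.nn as nn

x = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
y = torch.zeros(2, 2)

# The three supported reduction modes.
per_element = nn.L1Loss(reduction='none')(x, y)  # same shape as x
mean_loss = nn.L1Loss(reduction='mean')(x, y)    # scalar: mean of |x - y|
sum_loss = nn.L1Loss(reduction='sum')(x, y)      # scalar: sum of |x - y|

# A legacy argument that is not None overrides `reduction`
# (PyTorch also emits a deprecation warning, silenced here).
with warnings.catch_warnings():
    warnings.simplefilter('ignore')
    legacy = nn.L1Loss(size_average=False, reduction='mean')
print(legacy.reduction)  # the stored reduction is 'sum', not 'mean'
```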

class _WeightedLoss(_Loss):
    def __init__(self, weight=None, size_average=None, reduce=None, reduction='mean'):
        super(_WeightedLoss, self).__init__(size_average, reduce, reduction)
        self.register_buffer('weight', weight)

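_WeightedLoss base class: inherits from _Loss and adds a weight argument

4) weight: Tensor, optional. Rescales the contribution of individual loss terms. Its expected size depends on the subclass: the classification losses (for example NLLLoss and CrossEntropyLoss) take a 1D Tensor with one weight per class, while BCELoss takes a weight that rescales the loss of each batch element. If it is None, every term is weighted equally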
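As an illustration of how weight enters the computation, the sketch below uses CrossEntropyLoss with made-up per-class weights (class_weight is a hypothetical name) and checks the result against a hand-written weighted mean.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.randn(4, 3)                    # N=4 samples, C=3 classes
target = torch.tensor([0, 2, 1, 2])
class_weight = torch.tensor([1.0, 2.0, 0.5])  # hypothetical per-class weights

weighted = nn.CrossEntropyLoss(weight=class_weight)(logits, target)

# Manual check: with reduction='mean' the result is a weighted mean,
# i.e. it is normalized by the sum of the weights that were used.
per_sample = F.cross_entropy(logits, target, reduction='none')
w = class_weight[target]
manual = (w * per_sample).sum() / w.sum()
print(weighted.item(), manual.item())
```

Note that the mean is divided by the sum of the applied weights rather than by the batch size.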

2 PyTorch Loss

2.1 L1Loss

torch.nn.L1Loss(size_average=None, reduce=None, reduction='mean')

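criterion: computes the mean absolute error (MAE) between the input x and the target y

In other words, with reduction='mean' or 'sum' the result is a scalar; with reduction='none' it is a tensor with the same shape as the input and the target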

Examples:

import torch
import torch.nn as nn  # the examples in this post assume these two imports

loss = nn.L1Loss()
input = torch.randn(3, 5, requires_grad=True)
target = torch.randn(3, 5)
output = loss(input, target)
output.backward()

2.2 MSELoss

torch.nn.MSELoss(size_average=None, reduce=None, reduction='mean')

criterion: computes the mean squared error (squared L2 norm) between the input x and the target y

Examples:

loss = nn.MSELoss()
input = torch.randn(3, 5, requires_grad=True)
target = torch.randn(3, 5)
output = loss(input, target)
output.backward()

2.3 NLLLoss

torch.nn.NLLLoss(weight=None, size_average=None, ignore_index=-100, reduce=None, reduction='mean')

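criterion: the negative log likelihood loss, commonly used to train classification models
The optional weight argument, a 1D Tensor, assigns a weight to each class, which is particularly useful when the training set is unbalanced

The ignore_index argument specifies a target value that is ignored; it contributes nothing to the gradient and is excluded when the mean is computed
The input is expected to contain log-probabilities, which can be obtained by adding a LogSoftmax layer at the end of the network; to avoid the extra layer, use CrossEntropyLoss instead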

Examples:

m = nn.LogSoftmax(dim=1)  # normalize over the class dimension explicitly
loss = nn.NLLLoss()
# input is of size N x C = 3 x 5
input = torch.randn(3, 5, requires_grad=True)
# each element in target has to have 0 <= value < C
target = torch.tensor([1, 0, 4])
output = loss(m(input), target)
output.backward()


# 2D loss example (used, for example, with image inputs)
N, C = 5, 4
loss = nn.NLLLoss()
# input is of size N x C x height x width
data = torch.randn(N, 16, 10, 10)
conv = nn.Conv2d(16, C, (3, 3))
m = nn.LogSoftmax(dim=1)  # classes live in dimension 1 of N x C x H x W
# each element in target has to have 0 <= value < C
target = torch.empty(N, 8, 8, dtype=torch.long).random_(0, C)
output = loss(m(conv(data)), target)
output.backward()
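The effect of ignore_index can be checked by hand. This is a small sketch with made-up targets in which two of the four samples carry the default ignore value of -100.

```python
import torch
import torch.nn as nn

log_probs = torch.log_softmax(torch.randn(4, 5), dim=1)
target = torch.tensor([1, -100, 4, -100])  # -100 is the default ignore_index

loss = nn.NLLLoss()(log_probs, target)

# Only samples 0 and 2 contribute, and the mean is taken over those two.
manual = -(log_probs[0, 1] + log_probs[2, 4]) / 2
print(loss.item(), manual.item())
```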

2.4 CrossEntropyLoss

torch.nn.CrossEntropyLoss(weight=None, size_average=None, ignore_index=-100, reduce=None, reduction='mean')

criterion: the cross-entropy loss, commonly used for classification; it combines nn.LogSoftmax and nn.NLLLoss in a single class

Examples:

loss = nn.CrossEntropyLoss()
input = torch.randn(3, 5, requires_grad=True)
target = torch.empty(3, dtype=torch.long).random_(5)
output = loss(input, target)
output.backward()
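The relationship to LogSoftmax and NLLLoss can be verified numerically. A minimal sketch with random logits:

```python
import torch
import torch.nn as nn

logits = torch.randn(3, 5)
target = torch.tensor([1, 0, 4])

ce = nn.CrossEntropyLoss()(logits, target)
# Same computation done in two explicit steps.
nll = nn.NLLLoss()(nn.LogSoftmax(dim=1)(logits), target)
print(ce.item(), nll.item())
```

Because CrossEntropyLoss applies the log-softmax itself, it should be fed raw logits, not probabilities.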

2.5 PoissonNLLLoss

torch.nn.PoissonNLLLoss(log_input=True, full=False, size_average=None, eps=1e-08, reduce=None, reduction='mean')

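criterion: the negative log likelihood loss for a target that follows a Poisson distribution

With target ∼ Poisson(input), the loss is input−target∗log(input)+log(target!). The last term can be omitted or approximated with Stirling's formula. The approximation is used for target values greater than 1; for targets less than or equal to 1, zeros are added to the loss.

log_input argument:
If True: loss = exp(input)−target∗input
If False: loss = input−target∗log(input+eps)
full argument: whether to compute the full loss, that is, to add the Stirling approximation term
target∗log(target)−target+0.5∗log(2π∗target)
eps argument: a small value that avoids evaluating log(0) when log_input=False; defaults to 1e-8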

Examples:

loss = nn.PoissonNLLLoss()
log_input = torch.randn(5, 2, requires_grad=True)
target = torch.randn(5, 2)
output = loss(log_input, target)
output.backward()
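The two log_input formulas can be reproduced by hand. The sketch below assumes full=False (the default) and uses randomly generated counts as targets; log_rate and rate are illustrative names.

```python
import torch
import torch.nn as nn

log_rate = torch.randn(5, 2)
target = torch.poisson(torch.rand(5, 2) * 3)  # non-negative counts

# log_input=True: the input is interpreted as log(rate).
loss_log = nn.PoissonNLLLoss(log_input=True)(log_rate, target)
manual_log = (torch.exp(log_rate) - target * log_rate).mean()

# log_input=False: the input is the rate itself; eps guards log(0).
rate = torch.exp(log_rate)
eps = 1e-8
loss_raw = nn.PoissonNLLLoss(log_input=False, eps=eps)(rate, target)
manual_raw = (rate - target * torch.log(rate + eps)).mean()
print(loss_log.item(), loss_raw.item())
```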

2.6 KLDivLoss

torch.nn.KLDivLoss(size_average=None, reduce=None, reduction='mean')

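criterion: the Kullback-Leibler divergence loss
KL divergence is a useful distance measure for continuous distributions, and is often useful when performing direct regression over the space of (discretely sampled) continuous output distributions.
As with NLLLoss, the input is expected to contain log-probabilities; unlike NLLLoss, however, the input is not restricted to a 2D Tensor, and the target is given as probabilities.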
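Since this section has no example, here is a hedged sketch of typical usage. It uses reduction='batchmean', which divides the summed divergence by the batch size; 'mean' would instead average over every element. The tensors are random and the variable names are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

input_log_probs = F.log_softmax(torch.randn(4, 6), dim=1)  # log-probabilities
target_probs = F.softmax(torch.randn(4, 6), dim=1)         # probabilities

kl = nn.KLDivLoss(reduction='batchmean')(input_log_probs, target_probs)

# Manual check: sum of target * (log(target) - input), divided by the batch size.
manual = (target_probs * (target_probs.log() - input_log_probs)).sum() / 4
print(kl.item(), manual.item())
```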

2.7 BCELoss

torch.nn.BCELoss(weight=None, size_average=None, reduce=None, reduction='mean')

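criterion: the binary cross-entropy loss, commonly used for binary classification

It can be regarded as a special case of nn.CrossEntropyLoss restricted to two classes. The target y is normally a label in {0,1} (PyTorch accepts any value between 0 and 1), and the input must be a probability, which is what the cross-entropy formula expects. For this reason the input to BCELoss is usually the output of a sigmoid layer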

Examples:

m = nn.Sigmoid()
loss = nn.BCELoss()
input = torch.randn(3, requires_grad=True)
target = torch.empty(3).random_(2)
output = loss(m(input), target)
output.backward()
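For reference, the value BCELoss returns can be reproduced from the binary cross-entropy formula. A minimal sketch with made-up labels:

```python
import torch
import torch.nn as nn

probs = torch.sigmoid(torch.randn(6))
target = torch.tensor([0., 1., 1., 0., 1., 0.])

bce = nn.BCELoss()(probs, target)
# -[y * log(p) + (1 - y) * log(1 - p)], averaged over the elements.
manual = -(target * probs.log() + (1 - target) * (1 - probs).log()).mean()
print(bce.item(), manual.item())
```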

2.8 BCEWithLogitsLoss

torch.nn.BCEWithLogitsLoss(weight=None, size_average=None, reduce=None, reduction='mean', pos_weight=None)

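criterion: combines a sigmoid layer and BCELoss in a single class, much as CrossEntropyLoss combines LogSoftmax and NLLLoss. It is more numerically stable than a separate sigmoid followed by BCELoss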
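The stability difference shows up with large logits. In the sketch below the two paths agree for ordinary inputs, but for a confidently wrong logit of 30 the sigmoid saturates in float32, so the two-step loss is no longer the true value (BCELoss clamps its log terms), while the fused version still returns roughly 30. The example values are made up.

```python
import torch
import torch.nn as nn

logits = torch.randn(6)
target = torch.tensor([0., 1., 1., 0., 1., 0.])
fused = nn.BCEWithLogitsLoss()(logits, target)
two_step = nn.BCELoss()(torch.sigmoid(logits), target)

# A confidently wrong prediction: logit 30 for a label of 0.
big, zero = torch.tensor([30.0]), torch.tensor([0.0])
fused_big = nn.BCEWithLogitsLoss()(big, zero)             # about 30
two_step_big = nn.BCELoss()(torch.sigmoid(big), zero)     # saturated, clamped
print(fused_big.item(), two_step_big.item())
```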

2.9 MarginRankingLoss

torch.nn.MarginRankingLoss(margin=0.0, size_average=None, reduce=None, reduction='mean')

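criterion: computes the loss for inputs x1 and x2 (two 1D tensors) and a label y (1 or -1)
It is a ranking loss rather than a similarity measure: y=1 means x1 should be ranked higher than x2, and y=-1 the opposite. The loss is max(0, −y∗(x1−x2)+margin), so it is 0 when the pair is ordered correctly by at least margin and positive otherwise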
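A small sketch of the ranking behaviour, using made-up scores and reduction='none' so each pair can be inspected:

```python
import torch
import torch.nn as nn

x1 = torch.tensor([0.8, 0.2, 0.5])
x2 = torch.tensor([0.3, 0.6, 0.5])
y = torch.tensor([1.0, 1.0, -1.0])  # 1: x1 should rank higher; -1: x2 should
margin = 0.1

loss = nn.MarginRankingLoss(margin=margin, reduction='none')(x1, x2, y)
manual = torch.clamp(-y * (x1 - x2) + margin, min=0)
print(loss)  # first pair is ordered correctly, the other two are penalized
```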

2.10 HingeEmbeddingLoss

torch.nn.HingeEmbeddingLoss(margin=1.0, size_average=None, reduce=None, reduction='mean')

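criterion: computes the loss for an input tensor x and a label tensor y (1 or -1)
It is typically used to measure whether two inputs are similar or dissimilar, for example with the L1 pairwise distance as x, and is commonly used for learning nonlinear embeddings or for semi-supervised learning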
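A minimal sketch of the per-element rule, assuming x holds precomputed distances (the values are made up): similar pairs (y=1) are penalized by their distance, dissimilar pairs (y=-1) only when they are closer than margin.

```python
import torch
import torch.nn as nn

dist = torch.tensor([0.2, 1.5, 0.4, 2.0])   # e.g. pairwise distances
y = torch.tensor([1.0, 1.0, -1.0, -1.0])

loss = nn.HingeEmbeddingLoss(margin=1.0, reduction='none')(dist, y)
# y = 1  -> dist;  y = -1 -> max(0, margin - dist)
manual = torch.where(y == 1, dist, torch.clamp(1.0 - dist, min=0))
print(loss)
```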

2.11 MultiLabelMarginLoss

torch.nn.MultiLabelMarginLoss(size_average=None, reduce=None, reduction='mean')

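criterion: used for classification tasks in which one sample can belong to several classes
For example, in a four-class task a sample x may belong to class 0 and class 1 but not to class 2 or class 3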
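The example above can be encoded as follows. This is a sketch with made-up scores: the target lists the positive class indices and is padded with -1, and the manual check sums the hinge term over every (positive, negative) class pair.

```python
import torch
import torch.nn as nn

x = torch.tensor([[0.1, 0.2, 0.4, 0.8]])  # scores for 4 classes
y = torch.tensor([[0, 1, -1, -1]])        # classes 0 and 1; -1 ends the list

loss = nn.MultiLabelMarginLoss()(x, y)

# Manual check: sum of max(0, 1 - (x[pos] - x[neg])) over all pairs, divided by C.
pos, neg = [0, 1], [2, 3]
manual = sum(max(0.0, 1 - (x[0, j] - x[0, i]).item())
             for j in pos for i in neg) / x.size(1)
print(loss.item(), manual)
```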

2.12 SmoothL1Loss

torch.nn.SmoothL1Loss(size_average=None, reduce=None, reduction='mean')

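criterion: uses a squared term, 0.5∗(x−y)^2, where the element-wise absolute error is below 1, and an L1 term, |x−y|−0.5, otherwise

It is less sensitive to outliers than MSELoss and in some cases prevents exploding gradients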
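The piecewise rule can be written out directly. A sketch with made-up values, assuming the default threshold of 1:

```python
import torch
import torch.nn as nn

pred = torch.tensor([0.0, 0.5, 3.0])
target = torch.zeros(3)

loss = nn.SmoothL1Loss(reduction='none')(pred, target)

d = (pred - target).abs()
# Quadratic below 1, linear (shifted by 0.5) from 1 upward.
manual = torch.where(d < 1, 0.5 * d ** 2, d - 0.5)
print(loss)
```

The large error of 3.0 costs 2.5 here, against 9.0 under a squared error, which is why outliers pull less on the fit.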

2.13 SoftMarginLoss

torch.nn.SoftMarginLoss(size_average=None, reduce=None, reduction='mean')

criterion: a two-class logistic loss between the input tensor x and the target tensor y (1 or -1)
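Written out, the logistic loss is log(1+exp(−y∗x)) averaged over all elements. A minimal sketch with random inputs and made-up labels:

```python
import torch
import torch.nn as nn

x = torch.randn(4, 3)
y = torch.tensor([[1., -1., 1.]]).repeat(4, 1)  # labels in {1, -1}

loss = nn.SoftMarginLoss()(x, y)
manual = torch.log1p(torch.exp(-y * x)).mean()
print(loss.item(), manual.item())
```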

2.14 MultiLabelSoftMarginLoss

torch.nn.MultiLabelSoftMarginLoss(weight=None, size_average=None, reduce=None, reduction='mean')

criterion: a max-entropy-based one-versus-all loss between the input tensor x and a multi-label target tensor y, both of size (N, C)
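In practice this amounts to a per-class binary cross-entropy on logits, averaged over the C classes and then over the batch. The sketch below checks that reading against the built-in loss with made-up multi-hot targets.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(3, 4)  # raw scores, N=3, C=4
y = torch.tensor([[1., 0., 0., 1.],
                  [0., 1., 0., 0.],
                  [1., 1., 1., 0.]])  # multi-hot targets

loss = nn.MultiLabelSoftMarginLoss()(x, y)

# Per sample: mean over classes of the binary logistic loss.
per_sample = -(y * F.logsigmoid(x) + (1 - y) * F.logsigmoid(-x)).mean(dim=1)
manual = per_sample.mean()
print(loss.item(), manual.item())
```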

2.15 CosineEmbeddingLoss

torch.nn.CosineEmbeddingLoss(margin=0.0, size_average=None, reduce=None, reduction='mean')

criterion: based on cosine similarity; given a target tensor y (1 or -1), measures whether the input tensors x1 and x2 are similar or dissimilar
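A short sketch of the per-pair rule with random embeddings: for y=1 the loss is 1−cos(x1, x2), and for y=-1 it is max(0, cos(x1, x2)−margin).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x1 = torch.randn(4, 8)
x2 = torch.randn(4, 8)
y = torch.tensor([1., -1., 1., -1.])
margin = 0.2

loss = nn.CosineEmbeddingLoss(margin=margin, reduction='none')(x1, x2, y)

cos = F.cosine_similarity(x1, x2, dim=1)
manual = torch.where(y == 1, 1 - cos, torch.clamp(cos - margin, min=0))
print(loss)
```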

2.16 MultiMarginLoss

torch.nn.MultiMarginLoss(p=1, margin=1.0, weight=None, size_average=None, reduce=None, reduction='mean')

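criterion: a multi-class hinge loss (margin-based loss). For an input x and a target class index y:

loss(x, y) = Σ_{i≠y} max(0, margin − x[y] + x[i])^p / x.size(0)

If a weight is given, the loss becomes

loss(x, y) = Σ_{i≠y} max(0, w[y] ∗ (margin − x[y] + x[i]))^p / x.size(0)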
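As a worked check of the unweighted case with p=1 and margin=1, the sketch below uses one made-up sample whose true class is 3 and sums max(0, margin − x[y] + x[i]) over the other classes before dividing by the number of classes.

```python
import torch
import torch.nn as nn

x = torch.tensor([[0.1, 0.2, 0.4, 0.8]])  # scores for 4 classes
y = torch.tensor([3])                     # true class index

loss = nn.MultiMarginLoss(p=1, margin=1.0)(x, y)

# Sum over the wrong classes i != 3, then divide by C = 4.
manual = sum(max(0.0, 1.0 - 0.8 + v) for v in [0.1, 0.2, 0.4]) / 4
print(loss.item(), manual)
```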

2.17 TripletMarginLoss

torch.nn.TripletMarginLoss(margin=1.0, p=2.0, eps=1e-06, swap=False, size_average=None, reduce=None, reduction='mean')

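criterion: the triplet loss, which measures the relative similarity between the inputs x1, x2 and x3

triplet: a (anchor), p (positive), n (negative)
It is widely used in face verification. The goal is to make p as similar as possible to a (different samples of the same person) and n as dissimilar as possible from a (samples of different people)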

Examples:

triplet_loss = nn.TripletMarginLoss(margin=1.0, p=2)
input1 = torch.randn(100, 128, requires_grad=True)
input2 = torch.randn(100, 128, requires_grad=True)
input3 = torch.randn(100, 128, requires_grad=True)
output = triplet_loss(input1, input2, input3)
output.backward()
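The quantity being minimized is max(d(a, p) − d(a, n) + margin, 0), where d is the pairwise p-norm distance. A sketch that reproduces it with F.pairwise_distance on random embeddings:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

anchor = torch.randn(10, 16)
positive = torch.randn(10, 16)
negative = torch.randn(10, 16)

loss = nn.TripletMarginLoss(margin=1.0, p=2)(anchor, positive, negative)

d_ap = F.pairwise_distance(anchor, positive, p=2)  # anchor-positive distance
d_an = F.pairwise_distance(anchor, negative, p=2)  # anchor-negative distance
manual = torch.clamp(d_ap - d_an + 1.0, min=0).mean()
print(loss.item(), manual.item())
```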

2.18 CTCLoss

torch.nn.CTCLoss(blank=0, reduction='mean')

criterion: the Connectionist Temporal Classification (CTC) loss, computed between an unsegmented input time series and a target sequence
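Since CTCLoss takes more arguments than the other losses, a usage sketch may help. All sizes below are made-up: T input steps, N sequences, C classes including the blank at index 0, and targets of at most S labels. The input must be log-probabilities of shape (T, N, C).

```python
import torch
import torch.nn as nn

T, N, C, S = 50, 4, 20, 10  # assumed sizes for illustration

log_probs = torch.randn(T, N, C).log_softmax(2).detach().requires_grad_()
targets = torch.randint(1, C, (N, S), dtype=torch.long)  # labels avoid blank=0
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.randint(5, S + 1, (N,), dtype=torch.long)

ctc_loss = nn.CTCLoss(blank=0)
loss = ctc_loss(log_probs, targets, input_lengths, target_lengths)
loss.backward()
print(loss.item())
```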

3 CPN Loss

CPN uses two losses: a Global loss and a Refine loss

# define loss function (criterion) 
criterion1 = torch.nn.MSELoss().cuda() # for Global loss
criterion2 = torch.nn.MSELoss(reduce=False).cuda() # for Refine loss

3.1 Global loss

It uses the mean squared error, exactly like the MSELoss example

for global_output, label in zip(global_outputs, targets):
    num_points = global_output.size()[1]
    global_label = label * (valid > 1.1).type(torch.FloatTensor).view(-1, num_points, 1, 1)
    # `async` is a reserved word since Python 3.7; non_blocking=True replaces cuda(async=True)
    global_loss = criterion1(global_output, torch.autograd.Variable(global_label.cuda(non_blocking=True))) / 2.0
    loss += global_loss
    global_loss_record += global_loss.data.item()

3.2 Refine loss

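By design, the Refine loss back-propagates only through the channels (keypoints) with the largest loss values, selected dynamically. criterion2 is therefore instantiated with reduce=False (the deprecated spelling of reduction='none') to disable the averaging, and the following steps are added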

    refine_loss = criterion2(refine_output, refine_target_var)
    refine_loss = refine_loss.mean(dim=3).mean(dim=2)
    refine_loss *= (valid_var > 0.1).type(torch.cuda.FloatTensor)
    refine_loss = ohkm(refine_loss, 8)
    loss += refine_loss
    refine_loss_record = refine_loss.data.item()

The ohkm function (online hard keypoint mining: averages the loss over the top_k keypoints with the largest loss)

    def ohkm(loss, top_k):
        ohkm_loss = 0.
        for i in range(loss.size()[0]):
            sub_loss = loss[i]
            topk_val, topk_idx = torch.topk(sub_loss, k=top_k, dim=0, sorted=False)
            tmp_loss = torch.gather(sub_loss, 0, topk_idx)
            ohkm_loss += torch.sum(tmp_loss) / top_k
        ohkm_loss /= loss.size()[0]
        return ohkm_loss
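To see the Refine loss pipeline end to end, the sketch below runs it on CPU with dummy tensors. The shapes, the random valid flags and the ohkm_vectorized helper are assumptions for illustration, not part of the CPN source; the helper is a batched equivalent of the loop that may be easier to read.

```python
import torch
import torch.nn as nn

def ohkm(loss, top_k):
    # Loop version from the post: per sample, average the top_k largest keypoint losses.
    ohkm_loss = 0.
    for i in range(loss.size()[0]):
        sub_loss = loss[i]
        topk_val, topk_idx = torch.topk(sub_loss, k=top_k, dim=0, sorted=False)
        tmp_loss = torch.gather(sub_loss, 0, topk_idx)
        ohkm_loss += torch.sum(tmp_loss) / top_k
    ohkm_loss /= loss.size()[0]
    return ohkm_loss

def ohkm_vectorized(loss, top_k):
    # Hypothetical batched equivalent: top-k along the keypoint dimension.
    topk_val, _ = torch.topk(loss, k=top_k, dim=1, sorted=False)
    return (topk_val.sum(dim=1) / top_k).mean()

# Dummy data (assumed shapes): N=2 samples, K=17 keypoints, 64x48 heatmaps.
refine_output = torch.randn(2, 17, 64, 48)
refine_target = torch.randn(2, 17, 64, 48)
valid = torch.randint(0, 3, (2, 17)).float()  # stand-in visibility flags

per_pixel = nn.MSELoss(reduction='none')(refine_output, refine_target)
per_keypoint = per_pixel.mean(dim=3).mean(dim=2)     # shape (N, K)
per_keypoint = per_keypoint * (valid > 0.1).float()  # zero out invalid keypoints
print(ohkm(per_keypoint, 8), ohkm_vectorized(per_keypoint, 8))
```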
