[NAS Toolbox] Drop Path Introduction + Dropout Review

[Preface] Drop Path is a regularization method commonly used in NAS. Since the network being trained often changes dynamically during the search, Drop Path makes a good regularization tool; it is widely used in FractalNet, NASNet, and elsewhere.

Dropout

Dropout is the earliest method for combating overfitting and the forebear of all drop-style methods. It was proposed by Hinton in 2012 and used in AlexNet, in the paper ImageNet Classification with Deep Convolutional Neural Networks.

Principle: during the forward pass, each neuron's activation is shut off with probability 1 - keep_prob (where 0 < keep_prob < 1).

Function: this makes the model generalize better, because it cannot rely too heavily on any particular local nodes. During training, each neuron is kept with probability keep_prob and shut off with probability 1 - keep_prob; at test time no neurons are shut off, but the outputs of neurons that had dropout applied during training must be multiplied by keep_prob.

Concretely:

Suppose a neuron's output activation is a. Without dropout, its expected output is a. With dropout, the neuron has two possible states, kept or dropped; treating this as a discrete random variable, it follows the Bernoulli (0-1) distribution from probability theory, with keep probability p. Its expected activation then becomes p*a + (1-p)*0 = pa. To keep the expectation equal to the no-dropout case, the output must be divided by p.
(Source: 種子_fe, https://www.imooc.com/article/30129)
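
As a quick sanity check of the expectation argument above, here is a minimal Monte Carlo sketch (the variable names and values are ours, purely for illustration):

import torch

p = 0.6            # keep probability
a = 2.0            # the neuron's activation value
n = 1_000_000      # number of Monte Carlo samples

# Bernoulli mask: 1 with probability p (kept), 0 with probability 1 - p (dropped)
mask = torch.bernoulli(torch.full((n,), p))
print((mask * a).mean())      # ~= p * a = 1.2
print((mask * a / p).mean())  # ~= a = 2.0 after rescaling by 1 / p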

Implementation: the PyTorch implementation is as follows.

# Excerpt from torch/nn/modules/dropout.py
from torch import Tensor
from torch.nn import Module
from torch.nn import functional as F

class _DropoutNd(Module):
    __constants__ = ['p', 'inplace']
    p: float
    inplace: bool

    def __init__(self, p: float = 0.5, inplace: bool = False) -> None:
        super(_DropoutNd, self).__init__()
        if p < 0 or p > 1:
            raise ValueError("dropout probability has to be between 0 and 1, "
                             "but got {}".format(p))
        self.p = p
        self.inplace = inplace

    def extra_repr(self) -> str:
        return 'p={}, inplace={}'.format(self.p, self.inplace)
    
class Dropout(_DropoutNd):
    def forward(self, input: Tensor) -> Tensor:
        return F.dropout(input, self.p, self.training, self.inplace)
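
The zeroing is active only in training mode. A short usage sketch (which entries get zeroed is random):

import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = torch.ones(2, 4)

drop.train()
print(drop(x))  # roughly half the entries zeroed; survivors become 2.0 (= 1 / keep_prob)

drop.eval()
print(drop(x))  # at inference dropout is a no-op; the output equals x

Note that PyTorch implements "inverted dropout": the rescaling by 1/keep_prob happens during training, so no adjustment is needed at inference time.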

The dropout implementation in functional.py:

def dropout(input: Tensor, p: float = 0.5, training: bool = True, inplace: bool = False) -> Tensor:
    r"""
    During training, randomly zeroes some of the elements of the input
    tensor with probability :attr:`p` using samples from a Bernoulli
    distribution.
    See :class:`~torch.nn.Dropout` for details.
    Args:
        p: probability of an element to be zeroed. Default: 0.5
        training: apply dropout if is ``True``. Default: ``True``
        inplace: If set to ``True``, will do this operation in-place. Default: ``False``
    """
    if has_torch_function_unary(input):
        return handle_torch_function(dropout, (input,), input, p=p, training=training, inplace=inplace)
    if p < 0.0 or p > 1.0:
        raise ValueError("dropout probability has to be between 0 and 1, " "but got {}".format(p))
    return _VF.dropout_(input, p, training) if inplace else _VF.dropout(input, p, training)

The concrete implementation is ultimately found in Dropout.cpp:

template<bool feature_dropout, bool alpha_dropout, bool inplace, typename T>
Ctype<inplace> _dropout_impl(T& input, double p, bool train) {
  TORCH_CHECK(p >= 0 && p <= 1, "dropout probability has to be between 0 and 1, but got ", p);
  if (p == 0 || !train || input.numel() == 0) {
    return input;
  }

  if (p == 1) {
    return multiply<inplace>(input, at::zeros({}, input.options()));
  }

  at::Tensor b; // used for alpha_dropout only
  auto noise = feature_dropout ? make_feature_noise(input) : at::empty_like(input, LEGACY_CONTIGUOUS_MEMORY_FORMAT);
  noise.bernoulli_(1 - p);
  if (alpha_dropout) {
    constexpr double alpha = 1.7580993408473766;
    double a = 1. / std::sqrt((alpha * alpha * p + 1) * (1 - p));
    b = noise.add(-1).mul_(alpha * a).add_(alpha * a * p);
    noise.mul_(a);
  } else {
    noise.div_(1 - p);
  }  

  if (!alpha_dropout) {
    return multiply<inplace>(input, noise);
  } else {
    return multiply<inplace>(input, noise).add_(b);
  }
}
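
The alpha_dropout branch implements Alpha Dropout for SELU networks; 1.7580993408473766 is alpha * scale from the SELU paper. For reference, a rough Python transcription of that branch (our sketch, not the actual PyTorch code path):

import math
import torch

def alpha_dropout_sketch(input: torch.Tensor, p: float) -> torch.Tensor:
    alpha = 1.7580993408473766                          # alpha * scale from the SELU paper
    a = 1.0 / math.sqrt((alpha * alpha * p + 1) * (1 - p))
    noise = torch.empty_like(input).bernoulli_(1 - p)   # 1 = keep, 0 = drop
    b = (noise - 1) * (alpha * a) + alpha * a * p       # affine shift restoring the mean
    return input * (noise * a) + b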

Flow (reproduced as a minimal sketch after this list):

  • Check that p is in range and whether we are in training mode
  • Draw a Bernoulli (0-1) mask with keep probability 1 - p
  • Compute (input / (1 - p)) * mask
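
The sketch (manual_dropout is our name, for illustration only):

import torch

def manual_dropout(x: torch.Tensor, p: float = 0.5, training: bool = True) -> torch.Tensor:
    if p < 0.0 or p > 1.0:
        raise ValueError("dropout probability has to be between 0 and 1")
    if p == 0.0 or not training:
        return x
    if p == 1.0:
        return torch.zeros_like(x)
    # Bernoulli mask: each element is kept with probability 1 - p
    mask = torch.bernoulli(torch.full_like(x, 1 - p))
    # Inverted dropout: rescale by 1 / (1 - p) so the expectation matches the input
    return x / (1 - p) * mask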

Drop Path

Principle: as the name suggests, Drop Path randomly drops branches (entire paths) in a multi-branch network structure.

Function: it can generally be added to a network as a regularizer, but it makes the network harder to train. In NAS in particular, if drop_prob is set too high, the model may even fail to converge.

Implementation

import torch
import torch.nn as nn

def drop_path(x, drop_prob: float = 0., training: bool = False):
    if drop_prob == 0. or not training:
        return x
    keep_prob = 1 - drop_prob
    # One Bernoulli draw per sample in the batch, broadcast over all remaining dims
    shape = (x.shape[0],) + (1,) * (x.ndim - 1)  # work with diff dim tensors, not just 2D ConvNets
    random_tensor = keep_prob + torch.rand(shape, dtype=x.dtype, device=x.device)
    random_tensor.floor_()  # binarize: floor(keep_prob + U[0, 1)) is 1 with probability keep_prob
    output = x.div(keep_prob) * random_tensor  # rescale survivors by 1 / keep_prob
    return output


class DropPath(nn.Module):
    """Drop paths (Stochastic Depth) per sample  (when applied in main path of residual blocks).
    """
    def __init__(self, drop_prob=None):
        super(DropPath, self).__init__()
        self.drop_prob = drop_prob

    def forward(self, x):
        return drop_path(x, self.drop_prob, self.training)

With the theoretical groundwork from Dropout, this implementation is fairly self-explanatory. In practice it is typically used like this:

x = x + self.drop_path(self.conv(x))

Drop Path must not be used directly like this, because it zeroes a dropped sample's entire tensor; without the identity shortcut x +, that sample loses all of its information:

x = self.drop_path(x)
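
To see what goes wrong, a small demonstration using the drop_path function above (shapes and values are ours, for illustration):

import torch

x = torch.randn(8, 3, 4, 4)
out = drop_path(x, drop_prob=0.5, training=True)
# Each sample is either zeroed out entirely or rescaled by 1 / keep_prob,
# so without the residual "x +" some samples carry no signal at all:
print(out.view(8, -1).abs().sum(dim=1))  # several rows are exactly 0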

Reference

https://www.cnblogs.com/dan-baishucaizi/p/14703263.html

https://www.imooc.com/article/30129

https://www.github.com/pytorch/pytorch
