VoVNet：一種實時高效的目標檢測Backbone網絡【pytorch代碼詳解】

2.Factors of Efficient Network Design

在設計輕量級網絡時，FLOPs和模型參數是主要考慮因素，但是減少模型大小和FLOPs不等同於減少推理時間和降低能耗。比如ShuffleNetv2與MobileNetv2在相同的FLOPs下，前者在GPU上速度更快。所以除了FLOPs和模型大小外，還需要考慮其他因素對能耗和模型推理速度的影響。這裏考慮兩個重要的因素：內存訪問成本（Memory Access Cost，MAC）和GPU計算效率。

2.1. Memory Access Cost

對於CNN，能耗在內存訪問而不是計算上。影響MAC的主要是是內存佔用（intermediate activation memory footprint），它主要受卷積核和feature map大小的影響。c爲輸入輸出通道。

2.2. GPUComputation

通過減少FLOP是來加速的前提是，每個flop point的計算效率是一致的。
GPU特性：
（1）擅長parallel computation，tensor越大，GPU使用效率越高。
（2）把大的卷積操作拆分成碎片的小操作將不利於GPU計算。
因此，設計layer數量少的網絡是更好的選擇。mobileNet使用1x1卷積來減少計算量，不過這不利於GPU計算。爲了衡量GPU使用效率，我們使用Flops/s指標。

3. Proposed Method

3.1. Rethinking Dense Connection

當固定卷積層參數量B=k^2hwcico的時候，MAC可以表示如下，根據均值不等式，可以知道當輸入和輸出的channel數相同時MAC才取下界，此時的設計是最高效的。

使用Dense connect模塊，輸出的channel size不變，但是輸入的channel size一直在線性增加，因此DenseNet有很高的MAC。bottleneck connection同樣不利於GPU計算。在模型很大的時候，計算量隨着深度指數二階增長。bottleneck 把一個3x3的卷積分成了兩個計算，相當於增加了一次序列運算。

3.2. One Shot Aggregation

對於DenseNet來說，其核心模塊就是Dense
Block，如下圖1a所示，這種密集連接會聚合前面所有的layer，這導致每個layer的輸入channel數線性增長。受限於FLOPs和模型參數，每層layer的輸出channel數是固定大小，這帶來的問題就是輸入和輸出channel數不一致，如前面所述，此時的MAC不是最優的。另外，由於輸入channel數較大，DenseNet採用了1x1卷積層先壓縮特徵，這個額外層的引入對GPU高效計算不利。所以，雖然DenseNet的FLOPs和模型參數都不大，但是推理卻並不高效，當輸入較大時往往需要更多的顯存和推理時間。

DenseNet的一大問題就是密集連接太重了，而且每個layer都會聚合前面層的特徵，其實造成的是特徵冗餘，而且從模型weights的L1範數會發現中間層對最後的分類層貢獻較少，這不難理解，因爲後面的特徵其實已經學習到了這些中間層的核心信息。這種信息冗餘反而是可以優化的方向，據此這裏提出了OSA（One-Shot Aggregation）模塊，如圖1b所示，簡單來說，就是隻在最後一次性聚合前面所有的layer。這一改動將會解決DenseNet前面所述的問題，因爲每個layer的輸入channel數是固定的，這裏可以讓輸出channel數和輸入一致而取得最小的MAC，而且也不再需要1x1卷積層來壓縮特徵，所以OSA模塊是GPU計算高效的。

那麼OSA模塊效果如何，論文中拿DenseNet-40來做對比，DenseBlock層數是12，OSA模塊也設計爲12層，但是保持和Dense
Block類似的參數大小和計算量，此時OSA模塊的輸出將更大。最終發現在CIFAR-10數據集上acc僅比DenseNet下降了1.2%。但是如果將OSA模塊的層數降至5，而提升layer的通道數爲43，會發現與DenseNet-40模型效果相當。這說明DenseNet中很多中間特徵可能是冗餘的。儘管OSA模塊性能沒有提升，但是MAC低且計算更高效，這對於目標檢測非常重要，因爲檢測模型一般的輸入都是較大的。

3.3. Configuration of VoVNet

VoVNet由OSA模塊構成，主要有三種不同的配置，如下表所示。VoVNet首先是一個由3個3x3卷積層構成的stem block，然後4個階段的OSA模塊，每個stage的最後會採用一個stride爲2的3x3 max pooling層進行降採樣，模型最終的output stride是32。與其他網絡類似，每次降採樣後都會提升特徵的channel數。VoVNet-27-slim是一個輕量級模型，而VoVNet-39/57在stage4和stage5包含更多的OSA模塊，所以模型更大。

4. Experiments

相比於DenseNet-67，PeleeNet減少了Flops，但是推斷速度沒有提升，與之相反，VoVNet-27-slim稍微增加了Flops，而推斷速度提升了一倍。同時，VoVNet-27-slim的精度比其他模型都高。VoVNet-27-slim的內存佔用、能耗、GPU使用效率都是最好的。相比其他模型，VoVNet做到了準確率和效率的均衡，提升了目標檢測的整體性能。

import torch
import torch.nn as nn
import torch.nn.functional as F
from collections import OrderedDict


__all__ = ['VoVNet', 'vovnet27_slim', 'vovnet39', 'vovnet57']


model_urls = {
    'vovnet39': 'https://dl.dropbox.com/s/1lnzsgnixd8gjra/vovnet39_torchvision.pth?dl=1',
    'vovnet57': 'https://dl.dropbox.com/s/6bfu9gstbwfw31m/vovnet57_torchvision.pth?dl=1'
}


def conv3x3(in_channels, out_channels, module_name, postfix,
            stride=1, groups=1, kernel_size=3, padding=1):
    """3x3 convolution with padding"""
    return [
        ('{}_{}/conv'.format(module_name, postfix),
            nn.Conv2d(in_channels, out_channels,
                      kernel_size=kernel_size,
                      stride=stride,
                      padding=padding,
                      groups=groups,
                      bias=False)),
        ('{}_{}/norm'.format(module_name, postfix),
            nn.BatchNorm2d(out_channels)),
        ('{}_{}/relu'.format(module_name, postfix),
            nn.ReLU(inplace=True)),
    ]


def conv1x1(in_channels, out_channels, module_name, postfix,
            stride=1, groups=1, kernel_size=1, padding=0):
    """1x1 convolution"""
    return [
        ('{}_{}/conv'.format(module_name, postfix),
            nn.Conv2d(in_channels, out_channels,
                      kernel_size=kernel_size,
                      stride=stride,
                      padding=padding,
                      groups=groups,
                      bias=False)),
        ('{}_{}/norm'.format(module_name, postfix),
            nn.BatchNorm2d(out_channels)),
        ('{}_{}/relu'.format(module_name, postfix),
            nn.ReLU(inplace=True)),
    ]


class _OSA_module(nn.Module):
    def __init__(self,
                 in_ch,
                 stage_ch,
                 concat_ch,
                 layer_per_block,
                 module_name,
                 identity=False):
        super(_OSA_module, self).__init__()

        self.identity = identity
        self.layers = nn.ModuleList()
        in_channel = in_ch
        for i in range(layer_per_block):
            self.layers.append(nn.Sequential(
                OrderedDict(conv3x3(in_channel, stage_ch, module_name, i))))
            in_channel = stage_ch

        # feature aggregation
        in_channel = in_ch + layer_per_block * stage_ch
        print("inch:",in_ch,"layer_per_block:",layer_per_block,"stage:",stage_ch)
        self.concat = nn.Sequential(
            OrderedDict(conv1x1(in_channel, concat_ch, module_name, 'concat')))

    def forward(self, x):
        identity_feat = x
        output = []
        output.append(x)
        for layer in self.layers:
            x = layer(x)
            output.append(x)
        print("output:",output)
        x = torch.cat(output, dim=1)  #按列拼
        xt = self.concat(x)

        if self.identity:
            xt = xt + identity_feat

        return xt


class _OSA_stage(nn.Sequential):
    def __init__(self,
                 in_ch,
                 stage_ch,
                 concat_ch,
                 block_per_stage,
                 layer_per_block,
                 stage_num):
        super(_OSA_stage, self).__init__()

        if not stage_num == 2:
            self.add_module('Pooling',
                nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=True))  #向上取整，默認padding爲0

        module_name = f'OSA{stage_num}_1'
        self.add_module(module_name,
            _OSA_module(in_ch,
                        stage_ch,
                        concat_ch,
                        layer_per_block,
                        module_name))
        for i in range(block_per_stage-1):
            module_name = f'OSA{stage_num}_{i+2}'
            self.add_module(module_name,
                _OSA_module(concat_ch,
                            stage_ch,
                            concat_ch,
                            layer_per_block,
                            module_name,
                            identity=True))


class VoVNet(nn.Module):
    def __init__(self,
                 config_stage_ch,
                 config_concat_ch,
                 block_per_stage,
                 layer_per_block,
                 num_classes=1000):
        super(VoVNet, self).__init__()

        # Stem module
        stem = conv3x3(3,   64, 'stem', '1', 2)
        stem += conv3x3(64,  64, 'stem', '2', 1)
        stem += conv3x3(64, 128, 'stem', '3', 2)
        self.add_module('stem', nn.Sequential(OrderedDict(stem)))

        stem_out_ch = [128]
        in_ch_list = stem_out_ch + config_concat_ch[:-1]
        self.stage_names = []
        for i in range(4): #num_stages
            name = 'stage%d' % (i+2)
            self.stage_names.append(name)
            self.add_module(name,
                            _OSA_stage(in_ch_list[i],
                                       config_stage_ch[i],
                                       config_concat_ch[i],
                                       block_per_stage[i],
                                       layer_per_block,
                                       i+2))

        self.classifier = nn.Linear(config_concat_ch[-1], num_classes)

        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight)
            elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.Linear):
                nn.init.constant_(m.bias, 0)

    def forward(self, x):
        x = self.stem(x)
        for name in self.stage_names:
            x = getattr(self, name)(x)
        x = F.adaptive_avg_pool2d(x, (1, 1)).view(x.size(0), -1)
        x = self.classifier(x)
        return x


def _vovnet(arch,
            config_stage_ch,
            config_concat_ch,
            block_per_stage,
            layer_per_block,
            pretrained,
            progress,
            **kwargs):
    model = VoVNet(config_stage_ch, config_concat_ch,
                   block_per_stage, layer_per_block,
                   **kwargs)
    if pretrained:
        state_dict = torch.hub.load_state_dict_from_url(model_urls[arch],
                                              progress=progress)
        model.load_state_dict(state_dict)
    return model


def vovnet57(pretrained=False, progress=True, **kwargs):
    r"""Constructs a VoVNet-57 model as described in
    `"An Energy and GPU-Computation Efficient Backbone Networks"
    <https://arxiv.org/abs/1904.09730>`_.
    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
        progress (bool): If True, displays a progress bar of the download to stderr
    """
    return _vovnet('vovnet57', [128, 160, 192, 224], [256, 512, 768, 1024],
                    [1,1,4,3], 5, pretrained, progress, **kwargs)


def vovnet39(pretrained=False, progress=True, **kwargs):
    r"""Constructs a VoVNet-39 model as described in
    `"An Energy and GPU-Computation Efficient Backbone Networks"
    <https://arxiv.org/abs/1904.09730>`_.
    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
        progress (bool): If True, displays a progress bar of the download to stderr
    """
    return _vovnet('vovnet39', [128, 160, 192, 224], [256, 512, 768, 1024],
                    [1,1,2,2], 5, pretrained, progress, **kwargs)


def vovnet27_slim(pretrained=False, progress=True, **kwargs):
    r"""Constructs a VoVNet-39 model as described in
    `"An Energy and GPU-Computation Efficient Backbone Networks"
    <https://arxiv.org/abs/1904.09730>`_.
    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
        progress (bool): If True, displays a progress bar of the download to stderr
    """
    return _vovnet('vovnet27_slim', [64, 80, 96, 112], [128, 256, 384, 512],
                    [1,1,1,1], 5, pretrained, progress, **kwargs)


if __name__ == '__main__':
    model = vovnet39()
    print(model)

VoVNet：一種實時高效的目標檢測Backbone網絡【pytorch代碼詳解】

2.Factors of Efficient Network Design

2.1. Memory Access Cost

2.2. GPUComputation

3. Proposed Method

3.1. Rethinking Dense Connection

3.2. One Shot Aggregation

3.3. Configuration of VoVNet

4. Experiments

【面試準備】又一次失敗的面試經歷，題目離譜～資深軟件測試工程師

dotnet 8 版本與銀河麒麟V10和UOS系統的 glibc 兼容性

Pytorch實現FPN及FCOS，附有詳細註釋！

CVPR2020論文閱讀——超強通道注意力模塊ECANet！

【LeetCode刷題】8 字符串轉換整數 || 10 正則表達式匹配

【目標檢測】FCOS：Fully Convolutional One-Stage Object Detection【附pytorch實現】

VoVNet：一種實時高效的目標檢測Backbone網絡【pytorch代碼詳解】

https://yachay.unat.edu.pe/blog/index.php?comment_area=format_blog&comment_component=blog&comment_co

linux以太網驅動總結