CVPR-2019

caffe code： https://github.com/implus/SKNet/blob/master/models/sknet50.prototxt

Caffe code visualization tool: http://ethereon.github.io/netscope/#/editor


1 Background and Motivation

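(Figure source: cs231n course slides)

The receptive fields of neurons in the human visual cortex change with the stimulus, whereas in CNN design the receptive field is fixed!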

It is well-known in the neuroscience community that the receptive field size of visual cortical neurons are modulated by the stimulus, which has been rarely considered in constructing CNNs.

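The Inception family also made an attempt in this direction: multiple receptive fields within the same stage, combined by linear aggregation (the RF sizes of neurons in the same area (e.g., V1 region) are different), but it still cannot achieve adaptive changing of RF size.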

【Inception-v1】《Going Deeper with Convolutions》（CVPR-2015）
【Inception-v2】《Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift》（ICML-2015）
【Inception-v3】《Rethinking the Inception Architecture for Computer Vision》（CVPR-2016）
【Inception-v4、Inception-Resnet-v1、v2】《Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning》（AAAI-2017）

The authors want to borrow this property of visual cortical neurons and, following the Inception family's line of development, achieve adaptive changing of receptive field (RF) size in CNNs.

2 Advantages / Contributions

Proposes SKNet, which lets a CNN adaptively change its receptive field size, and it outperforms SENet!

3 Method

Split, Fuse and Select

adaptively change during inference

A clever softmax design, very neat, and it extends easily to multi-branch!

3.1 Selective Kernel Convolution

1）Split

The feature map $X \in \mathbb{R}^{H' \times W' \times C' }$ goes through a 3x3 and a 5x5 convolution, producing feature maps $\widetilde{U} \in \mathbb{R}^{H \times W \times C}$ and $\widehat{U} \in \mathbb{R}^{H \times W \times C }$

To reduce computation, the convs are grouped / depth-wise convolutions, and the 5x5 conv is replaced by a 3x3 conv with dilation!
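The dilation trick can be checked with a little arithmetic. The sketch below assumes the standard relation for dilated convolution, where a kernel of size k with dilation d spans k + (k - 1)(d - 1) positions. The function names and the channel/group numbers are illustrative only and are not taken from the SKNet code.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Spatial extent spanned by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def weights_per_output_channel(kernel_size: int, in_channels: int, groups: int) -> int:
    """Weight count of one output filter in a grouped convolution (bias ignored)."""
    return kernel_size * kernel_size * (in_channels // groups)


# A 3x3 kernel with dilation 2 spans the same 5x5 window as a plain 5x5 kernel,
# while keeping the weight count of a 3x3 kernel.
# in_channels=128 and groups=32 are made-up numbers for illustration.
span_dilated = effective_kernel_size(3, 2)
cost_dilated = weights_per_output_channel(3, in_channels=128, groups=32)
cost_plain_5x5 = weights_per_output_channel(5, in_channels=128, groups=32)
print(span_dilated, cost_dilated, cost_plain_5x5)
```

Under these assumptions the dilated 3x3 branch covers the 5x5 window with 9 weights per input channel instead of 25, which is where the saving comes from.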

2）Fuse

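To get an adaptive kernel, a gate is used.

First, an element-wise add: $U = \widetilde{U} + \widehat{U}$

Then global average pooling, to obtain channel-wise statistics for the attention: $s_c = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} U_c(i, j)$

Followed by an fc layer that compresses the vector to save computation: $z = \delta(B(Ws))$

This compresses the vector $s \in \mathbb{R}^{C \times 1}$ into $z \in \mathbb{R}^{d \times 1}$, where $B$ denotes batch normalization, $\delta$ is ReLU, and $W \in \mathbb{R}^{d \times C}$ is the weight!

The dimension of the compressed vector is $d = \max(C / r, L)$

$L = 32$, and $r$ is the reduction ratio! $d$ is never allowed to drop below 32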
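The Fuse step can be sketched in plain Python as follows. This is a minimal illustration, not the authors' implementation: feature maps are nested lists indexed `[C][H][W]`, batch normalization is left out of the fc layer, integer division is assumed for $C / r$, and the function names and toy weights are made up.

```python
def reduced_dim(channels: int, reduction: int = 16, min_dim: int = 32) -> int:
    """d = max(C / r, L); integer division is an assumption of this sketch."""
    return max(channels // reduction, min_dim)


def fuse(u_tilde, u_hat):
    """Element-wise add two [C][H][W] maps, then global-average-pool to s (length C)."""
    s = []
    for ch_a, ch_b in zip(u_tilde, u_hat):
        total = 0.0
        count = 0
        for row_a, row_b in zip(ch_a, ch_b):
            for a, b in zip(row_a, row_b):
                total += a + b
                count += 1
        s.append(total / count)
    return s


def compress(s, weight):
    """z = relu(W s) with W of shape [d][C]; batch normalization is omitted here."""
    return [max(0.0, sum(w * x for w, x in zip(row, s))) for row in weight]


# Toy input: C = 2 channels, 2x2 spatial size, two branches.
u_tilde = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 0.0]]]
u_hat = [[[1.0, 1.0], [1.0, 1.0]], [[2.0, 2.0], [2.0, 2.0]]]
s = fuse(u_tilde, u_hat)
z = compress(s, [[1.0, -1.0], [-1.0, 0.0]])  # hypothetical 2x2 weight
print(s, z, reduced_dim(256), reduced_dim(1024))
```

With $r = 16$ and $L = 32$, a 256-channel layer would give 16 and is therefore clamped to 32, while a 1024-channel layer keeps 64.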

3）Select

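$z$ goes through two fc layers (one per branch) that restore the vector to the channel dimension, followed by a softmax across the branches: $a_c = \frac{e^{A_c z}}{e^{A_c z} + e^{B_c z}}$, $b_c = \frac{e^{B_c z}}{e^{A_c z} + e^{B_c z}}$

$z \in \mathbb{R}^{d \times 1}$, $A_c, B_c \in \mathbb{R}^{1 \times d}$

$a, b \in \mathbb{R}^{C \times 1}$,

$a_c, b_c \in \mathbb{R}^{1 \times 1}$, where the subscript $c$ denotes the $c$-th element

Finally, the learned channel-wise attention is applied to the original feature maps: $V_c = a_c \cdot \widetilde{U}_c + b_c \cdot \widehat{U}_c$, with $a_c + b_c = 1$

Because the two branches go through a softmax, their weights sum to 1, so this really does act as a gate that controls the relative share of the two branches

$V \in \mathbb{R}^{H \times W \times C}$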
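The Select step can be sketched for any number of branches, since the softmax is simply taken across branches for each channel. The code below is an illustrative sketch in plain Python, not the authors' code: `branch_mats[m]` is a hypothetical `[C][d]` weight matrix for branch `m` (the rows play the role of $A_c$ and $B_c$ when there are two branches), and feature maps are nested lists indexed `[C][H][W]`.

```python
import math


def select_weights(z, branch_mats):
    """Per-channel softmax across branches; returns weights[m][c]."""
    logits = [[sum(w * x for w, x in zip(row, z)) for row in mat] for mat in branch_mats]
    num_branches = len(branch_mats)
    num_channels = len(logits[0])
    weights = [[0.0] * num_channels for _ in range(num_branches)]
    for c in range(num_channels):
        column = [logits[m][c] for m in range(num_branches)]
        peak = max(column)  # subtract the max for numerical stability
        exps = [math.exp(v - peak) for v in column]
        total = sum(exps)
        for m, e in enumerate(exps):
            weights[m][c] = e / total
    return weights


def select(u_branches, weights):
    """V_c = sum over branches m of weights[m][c] * U_m[c], for [C][H][W] maps."""
    num_branches = len(u_branches)
    first = u_branches[0]
    return [
        [
            [
                sum(weights[m][c] * u_branches[m][c][i][j] for m in range(num_branches))
                for j in range(len(first[c][i]))
            ]
            for i in range(len(first[c]))
        ]
        for c in range(len(first))
    ]


# Toy example: d = 2, C = 2, two branches, 1x1 spatial size (all values made up).
z = [1.0, 2.0]
A = [[0.0, 0.0], [1.0, 0.0]]
B = [[0.0, 0.0], [0.0, 0.0]]
w = select_weights(z, [A, B])
V = select([[[[2.0]], [[2.0]]], [[[4.0]], [[4.0]]]], w)
print(w, V)
```

In the toy example, channel 0 has equal logits, so both branches get weight 0.5 and the output is the plain average; channel 1 favours the first branch. Passing three matrices instead of two gives the multi-branch case without any other change.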

3.2 Network Architecture

M is the number of paths, e.g. M = 2 in Fig. 1

The second column attaches an SK attention module after each bottleneck; the third column uses it inside the bottleneck, presumably after the 3x3 conv

4 Experiments

4.1 Datasets

ImageNet 2012 dataset
CIFAR-10
CIFAR-100

4.2 ImageNet Classification

1）Comparisons with state-of-the-art models

As can be seen, SKNet-50 already performs on par with other networks of around 100 layers

2）Selective Kernel vs. Depth/Width/Cardinality

A head-to-head comparison with ResNeXt, with the parameter counts tuned to a comparable level

3）Performance with respect to the number of parameters

The lower-left corner is best. As can be seen, SKNet utilizes parameters more efficiently than the other models

4）Lightweight models

Adding it to ShuffleNet brings an improvement, and it does better than SE's channel attention

4.3 CIFAR Classification

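Looking at the ResNeXt settings, the number of groups is small and the width of each group is large: 32, 64

Compared with ResNeXt, SKNet has fewer parameters and higher accuracy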

4.4 Ablation Studies

1）The dilation D and group number G

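D denotes dilation and G the group number! No wonder both branches were 3x3 when I read the code, which confused me for a long while: the different kernel sizes are realized through dilated convolution! The table shows the hyperparameter settings tried for the second branch

The first branch uses the default setting: 3x3 conv, D = 1, G = 32

The authors find that, at the same RF, a small kernel with dilation is very slightly better than a large kernel (20.79 vs 20.78)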

2）Combination of different kernels

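K3 is a 3x3 conv, K5 is a 3x3 dilated conv with d = 2, and K7 is a 3x3 dilated conv with d = 3

With or without SK corresponds to using U or V in Fig. 1, i.e. whether the branches are combined linearly (U, element-wise addition) or nonlinearly (V)

As can be seen, a larger M gives better results, and the nonlinear combination of multiple branches (V) beats the linear combination (U)

M = 2 gives the best "value for money"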

4.5 Analysis and Interpretation

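Both figures follow the same layout:

Top left: the original image with the target object enlarged to three scales
Bottom left: the attention of the 5x5 conv in SK_3_4. The x-axis is the channel index, the y-axis is the channel-wise weight, and the three curves correspond to the three input images! As the target is enlarged (1.0x to 2.0x), the weight of the 5x5 conv tends to grow
Right: within the SK units, the channel weight of the 5x5 conv minus the channel weight of the 3x3 conv. As the target is enlarged (1.0x to 2.0x), the 5x5 conv becomes more and more important and the difference trends upward!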
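The statistic plotted on the right can be sketched as a mean over channels of the 5x5 weight minus the 3x3 weight. The snippet below is only an illustration: the attention values are made-up numbers, not measurements from the paper, and the function name is hypothetical. It relies on the two-branch softmax, so the 3x3 weight is one minus the 5x5 weight.

```python
def mean_attention_difference(weights_5x5, weights_3x3):
    """Average over channels of (5x5 attention weight - 3x3 attention weight)."""
    diffs = [a - b for a, b in zip(weights_5x5, weights_3x3)]
    return sum(diffs) / len(diffs)


# Made-up per-channel 5x5 attention weights for the same image at two scales.
a_small_object = [0.4, 0.5]  # hypothetical values at 1.0x
a_large_object = [0.6, 0.7]  # hypothetical values at 2.0x

# With a two-branch softmax the 3x3 weight is 1 - (5x5 weight).
diff_small = mean_attention_difference(a_small_object, [1.0 - a for a in a_small_object])
diff_large = mean_attention_difference(a_large_object, [1.0 - a for a in a_large_object])
print(diff_small, diff_large)
```

A positive value means the 5x5 branch dominates on average, a negative value means the 3x3 branch does.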

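This figure gives statistics over all examples (all image instances in the ImageNet validation set). The overall conclusion matches the two examples above: the larger the target, the larger the attention weight of the 5x5 conv, but this pattern disappears as the stages get deeper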

The larger the target object is, the more attention will be assigned to larger kernels by the Selective Kernel mechanism in low and middle level stages (e.g., SK_2_3, SK_3_4).

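In the high-level stages, this phenomenon disappears

The authors also look at the effect per class. The x-axis is the class index (50 samples per class), and the y-axis is the difference between the channel-wise weights of the 5x5 and 3x3 branches

As can be seen, the gap is small in the late stages and clear in the early and middle stages, with the middle stages showing the largest gap!!

The authors explain the small gap in the late stages as follows: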

since for the high-level representation, “scale” is partially encoded in the feature vector, and the kernel size matters less compared to the situation in lower layers.

In other words, high-level features already encode scale information themselves, so they depend less on different kernels to extract it!

5 Conclusion（own）

A clever combination of Inception and SENet
Inception linearly aggregates several preset receptive fields, while SKNet achieves nonlinear, adaptive receptive field sizes

【Inception-v1】《Going Deeper with Convolutions》（CVPR-2015）
【SENet】《Squeeze-and-Excitation Networks》（CVPR-2018）

The motivation of the Inception family: the RF sizes of neurons in the same area (e.g., V1 region) are different, which enables the neurons to collect multi-scale spatial information in the same processing stage.
The related work section on multi-branch convolutional networks covers quite a few papers (Highway Network, ResNet)
Multiple kernel sizes can be realized with dilation, which is neat and does not increase the parameter count! Because softmax is used, the method also extends easily to multi-branch
Note the authors' explanation of why the attention weights of the two kernels differ so little at the high level!!!

【SKNet】《Selective Kernel Networks》

