DenseNet: Densely Connected CNN

Sources

arXiv paper
Torch code
Caffe models

Key contribution

Schematic of a DenseNet block (figure)

In this paper, we propose an architecture that distills this insight into a simple connectivity pattern: to ensure maximum information flow between layers in the network, we connect all layers (with matching feature-map sizes) directly with each other. To preserve the feed-forward nature, each layer obtains additional inputs from all preceding layers and passes on its own feature-maps to all subsequent layers. Crucially, in contrast to ResNets, we never combine features through summation before they are passed into a layer; instead, we combine features by concatenating them.

Model

Overall structure of DenseNet (figure)

The 121-layer model is visualized below. To fit more into the figure, only the head and tail of the second dense block are shown, and the third block is truncated in the same way.

Structure of DenseNet-121 (figure)
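
For reference, the same 121-layer configuration is also available in torchvision; a minimal sketch for inspecting it, assuming torchvision is installed:

```python
import torchvision.models as models

# DenseNet-121: four dense blocks with (6, 12, 24, 16) layers and growth rate k = 32.
model = models.densenet121()
print(model.features.denseblock1)  # inspect the layers of the first dense block
```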

Key idea of residual networks

The key is understanding the residual block. In a traditional convolutional network, the output of layer l is fed forward as the input of layer l+1. ResNet adds a skip connection between layer l and layer l+1, as in the formula below:

x_l = H_l(x_{l-1}) + x_{l-1}
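
A minimal PyTorch sketch of this residual rule (my own illustration, not code from the paper); `H` stands for any shape-preserving transformation H_l:

```python
import torch
import torch.nn as nn

def residual_step(H, x_prev):
    # ResNet combines by element-wise summation: x_l = H_l(x_{l-1}) + x_{l-1}
    return H(x_prev) + x_prev

H = nn.Conv2d(16, 16, kernel_size=3, padding=1)   # any shape-preserving H_l
x = torch.randn(1, 16, 32, 32)
y = residual_step(H, x)                           # same shape as x
```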

Key idea of dense networks

Here features are combined by concatenation, which requires the feature maps x_0, x_1, ..., x_{l-1} to have the same spatial size:

x_l = H_l([x_0, x_1, ..., x_{l-1}])

H_l(·) is a composite function of three consecutive operations:

BN->ReLU->Conv(3x3)
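
A hedged PyTorch sketch of one such layer and the concatenation rule x_l = H_l([x_0, ..., x_{l-1}]); the class and variable names are my own, not taken from the official Torch/Caffe code:

```python
import torch
import torch.nn as nn

class DenseLayer(nn.Module):
    """H_l: BN -> ReLU -> Conv(3x3), applied to the concatenation of all earlier outputs."""
    def __init__(self, in_channels, growth_rate):
        super().__init__()
        self.norm = nn.BatchNorm2d(in_channels)
        self.relu = nn.ReLU(inplace=True)
        self.conv = nn.Conv2d(in_channels, growth_rate, kernel_size=3, padding=1, bias=False)

    def forward(self, features):
        # features: list of tensors [x_0, x_1, ..., x_{l-1}] with identical spatial size
        x = torch.cat(features, dim=1)            # x_l = H_l([x_0, x_1, ..., x_{l-1}])
        return self.conv(self.relu(self.norm(x)))

# Inside a dense block, layer l sees k_0 + (l-1)*k input channels (k = growth rate).
k0, k = 16, 12
layers = nn.ModuleList(DenseLayer(k0 + i * k, k) for i in range(4))
features = [torch.randn(1, k0, 32, 32)]
for layer in layers:
    features.append(layer(features))
```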

Pooling, however, changes the spatial size of the feature maps, so the network borrows the stacked-convolution idea from VGG: the network is divided into blocks, each called a DenseBlock, and the layers between blocks are called transition layers:

dense block output -> BN -> Conv(1x1) -> AvgPool(2x2)
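
A corresponding sketch of a transition layer, again my own PyTorch illustration of the sequence above:

```python
import torch.nn as nn

def transition(in_channels, out_channels):
    # Between two dense blocks: BN -> Conv(1x1) -> AvgPool(2x2);
    # the 2x2 average pooling halves the spatial resolution.
    return nn.Sequential(
        nn.BatchNorm2d(in_channels),
        nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False),
        nn.AvgPool2d(kernel_size=2, stride=2),
    )
```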

Growth rate

Because each layer's input is the concatenation of all preceding layers' outputs, each layer does not need to produce as many feature maps as in a traditional network. Within a dense block, if every layer produces k feature maps, then layer l receives k × (l-1) + k_0 input feature maps, where k_0 is the number of channels of the block's input. Although each layer produces only k outputs, the input to later layers still grows large, so bottleneck layers are introduced: a 1×1 convolution is added to reduce the number of inputs. H_l then becomes:

BN -> ReLU -> Conv(1x1) -> BN -> ReLU -> Conv(3x3)

The paper calls the architecture with bottleneck layers DenseNet-B. Besides reducing the number of feature maps inside a dense block, further compression can be applied in the transition layers: if a dense block outputs m feature maps, the transition layer produces ⌊θm⌋ outputs, where 0 < θ ≤ 1. The architecture with this compression is called DenseNet-C.

The architecture with both bottleneck layers and compression is called DenseNet-BC; a sketch of both pieces follows below.
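
A hedged PyTorch sketch of the two modifications; the 4k bottleneck width and the default θ = 0.5 follow the paper, while the function names are my own:

```python
import math
import torch.nn as nn

def bottleneck_layer(in_channels, growth_rate):
    # DenseNet-B: BN -> ReLU -> Conv(1x1) -> BN -> ReLU -> Conv(3x3);
    # the paper lets the 1x1 convolution produce 4*k feature maps.
    inter = 4 * growth_rate
    return nn.Sequential(
        nn.BatchNorm2d(in_channels),
        nn.ReLU(inplace=True),
        nn.Conv2d(in_channels, inter, kernel_size=1, bias=False),
        nn.BatchNorm2d(inter),
        nn.ReLU(inplace=True),
        nn.Conv2d(inter, growth_rate, kernel_size=3, padding=1, bias=False),
    )

def compressed_transition(in_channels, theta=0.5):
    # DenseNet-C: the transition layer emits floor(theta * m) feature maps, 0 < theta <= 1.
    out_channels = int(math.floor(theta * in_channels))
    return nn.Sequential(
        nn.BatchNorm2d(in_channels),
        nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False),
        nn.AvgPool2d(kernel_size=2, stride=2),
    )
```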

Results

Classification error rates

Classification error rates on CIFAR and SVHN (table)

L denotes the network depth and k the growth rate. Blue entries mark the best results, and '+' indicates that data augmentation was applied. DenseNet achieves lower error rates than ResNet while using fewer parameters.

Datasets

  1. CIFAR. C10 refers to CIFAR-10 and C100 to CIFAR-100.
  2. SVHN. The Street View House Numbers (SVHN) dataset contains 32×32 colored digit images coming from Google Street View. The task is to classify the central digit into the correct one of the 10 digit classes. There are 73,257 images in the training set, 26,032 images in the test set, and 531,131 images for additional training.
  3. ImageNet. The ILSVRC 2012 classification dataset consists of 1.2 million training images and 50,000 validation images, each associated with a label from 1,000 predefined classes. The data augmentation used for the 32×32 CIFAR images: the images are first zero-padded with 4 pixels on each side, then randomly cropped back to 32×32; half of the images are then horizontally mirrored (see the sketch after this list).
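
The augmentation described above maps directly onto standard torchvision transforms; a minimal sketch, assuming torchvision is available:

```python
import torchvision.transforms as T

# Standard augmentation for the 32x32 CIFAR images: zero-pad 4 pixels per side,
# randomly crop back to 32x32, then horizontally flip with probability 0.5.
train_transform = T.Compose([
    T.RandomCrop(32, padding=4),
    T.RandomHorizontalFlip(p=0.5),
    T.ToTensor(),
])
```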

Parameter count and convergence efficiency

Parameter efficiency comparison (figure)

The first two plots compare classification error against the number of parameters; the second shows that, at the same accuracy, DenseNet-BC needs roughly one third of the parameters of ResNet. The third plot shows the training curves of a 1001-layer ResNet with 10M parameters and a 100-layer DenseNet with only 0.8M parameters: the ResNet converges to a lower training loss, but the final test errors are nearly identical, again demonstrating DenseNet's high parameter efficiency.

Computational cost

ImageNet classification error vs. computational cost (figure)

The right plot uses FLOPs to measure computational cost. Comparing ResNet-50, DenseNet-201, and ResNet-101 shows that DenseNet achieves better accuracy for the same amount of computation.

Analysis of information flow in DenseNet

Average absolute filter weights within the three dense blocks (figure)

For each convolutional layer l within a block, we compute the average (absolute) weight assigned to connections with layer s. The figure above shows a heatmap for all three dense blocks.

The average absolute weight serves as a surrogate for the dependency of a convolutional layer on its preceding layers.
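
As a rough illustration of what one heatmap entry could look like, the sketch below averages the absolute 3×3 filter weights over the input-channel slice contributed by source layer s; the slicing convention and names are my assumptions, not the paper's released code:

```python
import torch

def avg_abs_weight(conv_weight, s, k0, k):
    """Average absolute weight that a target layer's 3x3 convolution assigns to
    the channels contributed by source layer s (s = 0 is the block input)."""
    # conv_weight: (out_channels, in_channels, 3, 3); input channels are ordered
    # as the concatenation [x_0 | x_1 | ... | x_{l-1}].
    start = 0 if s == 0 else k0 + (s - 1) * k
    width = k0 if s == 0 else k
    return conv_weight[:, start:start + width].abs().mean().item()

# Example: target layer l = 3 in a block with k0 = 16 input channels, growth rate k = 12
w = torch.randn(12, 16 + 2 * 12, 3, 3)
print([round(avg_abs_weight(w, s, 16, 12), 4) for s in range(3)])
```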

1. All layers spread their weights over many inputs within the same block. This indicates that features extracted by very early layers are, indeed, directly used by deep layers throughout the same dense block.
2. The weights of the transition layers also spread their weight across all layers within the preceding dense block, indicating information flow from the first to the last layers of the DenseNet through few indirections.
3. The layers within the second and third dense block consistently assign the least weight to the outputs of the transition layer (the top row of the triangles), indicating that the transition layer outputs many redundant features (with low weight on average). This is in keeping with the strong results of DenseNet-BC where exactly these outputs are compressed.
4. Although the final classification layer, shown on the very right, also uses weights across the entire dense block, there seems to be a concentration towards final feature-maps, suggesting that there may be some more high-level features produced late in the network.

In short:

  1. Within a dense block, features extracted by earlier layers are used directly by later layers.
  2. The features used by a transition layer come from all layers of the preceding dense block.
  3. The top row of the second and third blocks shows that the previous block's output contains a large amount of redundancy, which is exactly what motivates DenseNet-BC.

Summary

DenseNet has the following advantages:

  • Alleviates the vanishing-gradient problem
  • Strengthens feature propagation
  • Encourages feature reuse
  • Substantially reduces the number of parameters

Thoughts

  1. Whether it is ResNet or DenseNet, the core idea comes from Highway Networks: skip connections that pass certain inputs unchanged into later layers, which integrates the information flow, avoids the loss of information between layers and the vanishing-gradient problem (and also suppresses some noise).

  2. Dense blocks push deep networks toward shallower but wider architectures.
