VGGNet Paper Notes

Title: Very Deep Convolutional Networks for Large-Scale Image Recognition (2014)

Link: paper

Contents

3 Classification Framework

4 Classification Experiments

5 Conclusion

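Abstract

Task: large-scale image recognition

Main contribution: by comparing networks of different depths, the paper evaluates how the depth of a convolutional network (with very small 3 × 3 convolution filters) affects its accuracy.

The paper is based on the team's ImageNet Challenge 2014 submission, which took first and second place in the localisation and classification tracks respectively.

The team is named VGG, hence the model is called VGGNet.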

1 Introduction

Convolutional networks (ConvNets) have recently enjoyed a great success in large-scale image and video recognition

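CNNs have achieved great success in large-scale image and video recognition, most visibly in the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC).

Earlier work also modified the CNN architecture in pursuit of better accuracy.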

In this paper, we address another important aspect of ConvNet architecture design – its depth.

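This paper focuses on network depth. With the other parameters held fixed, depth is increased by adding more convolutional layers, which is feasible because the convolution filters are very small (3 × 3).

The result is a more accurate ConvNet, which not only achieves state-of-the-art accuracy on the ILSVRC classification and localisation tasks but also performs well on other image recognition datasets.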

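The paper is organised as follows:

Section 2: network configurations
Section 3: details of training and evaluation for image classification
Section 4: comparison of the configurations on the ILSVRC classification task
Section 5: conclusion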

2 ConvNet Configurations

2.1 Architecture

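convolution layers
  input: 224 × 224 RGB image
  filter size: 3 × 3 (receptive field)
  stride: 1 pixel
pooling layers: five max-pooling layers; only some of the conv layers are followed by one (2 × 2 pixel window, with stride 2)
Fully-Connected (FC) layers: identical in all networks
  first two layers: 4096 channels each
  third layer: 1000 channels (one per class)
  final layer: soft-max layer
hidden layers
  All hidden layers use the ReLU activation function.
  All networks but one omit Local Response Normalisation (LRN): this normalisation does not improve performance on the ILSVRC dataset, and it costs extra memory and computation time.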
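
As a quick check on the layer sizes above, the following sketch traces the feature-map side length through the five pooling layers. It assumes the 3 × 3 convolutions are padded by 1 pixel so that they preserve spatial size (a detail these notes do not state), and the function name is made up for illustration.

```python
def spatial_sizes_after_pools(input_size=224, num_pools=5, window=2, stride=2):
    """Side length of the feature map before and after each max-pooling layer.

    Assumption: the 3x3, stride-1 convolutions use 1-pixel padding, so only
    the pooling layers change the spatial size.
    """
    sizes = [input_size]
    for _ in range(num_pools):
        sizes.append((sizes[-1] - window) // stride + 1)
    return sizes


sizes = spatial_sizes_after_pools()
# Inputs seen by the first FC layer: 512 channels times the final spatial area.
fc_inputs = 512 * sizes[-1] * sizes[-1]
print(sizes)
print(fc_inputs)
```

Under that padding assumption, each pooling layer halves the side length, so the first FC layer sees a small 512-channel map rather than the full-resolution image.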

2.2 Configurations

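The paper's approach is to compare CNNs of different depths. It designs the following networks, one per column, named A to E.

All configurations follow the design in Section 2.1 and differ only in depth: from 11 weight layers in A (8 conv. and 3 FC layers) to 19 weight layers in E (16 conv. and 3 FC layers).

The number of channels in the conv layers is fairly small: it starts at 64 and doubles after each max-pooling layer until it reaches 512.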
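
The depth and width figures above can be checked with a small sketch. The two lists below are my transcription of configurations A and E as they are usually written down (the table itself is not reproduced in these notes, so treat the exact layout as an assumption); the "M" marker and the helper names are illustrative conventions, not the paper's notation.

```python
# Integers are the output channels of a 3x3 conv layer; "M" is a 2x2 max-pool.
CONFIG_A = [64, "M", 128, "M", 256, 256, "M", 512, 512, "M", 512, 512, "M"]
CONFIG_E = [64, 64, "M", 128, 128, "M", 256, 256, 256, 256, "M",
            512, 512, 512, 512, "M", 512, 512, 512, 512, "M"]
NUM_FC_LAYERS = 3  # the same three FC layers in every configuration


def weight_layers(config):
    """Number of layers with weights: conv layers plus the FC layers."""
    conv_layers = sum(1 for item in config if item != "M")
    return conv_layers + NUM_FC_LAYERS


def channel_widths(config):
    """Distinct conv widths in order of first appearance."""
    widths = []
    for item in config:
        if item != "M" and item not in widths:
            widths.append(item)
    return widths


print(weight_layers(CONFIG_A), weight_layers(CONFIG_E))
print(channel_widths(CONFIG_E))
```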

2.3 Discussion

small receptive field:
Compared with the top-performing networks of the ILSVRC-2012 and ILSVRC-2013 competitions, the networks in this paper use very small 3 × 3 receptive fields with stride 1.
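
One way to see why small filters are attractive is to compare a stack of 3 × 3 layers with a single larger filter covering the same area. The sketch below is only illustrative arithmetic: the helper names are made up, biases are ignored, and every layer is assumed to have the same number C of input and output channels.

```python
def stacked_receptive_field(num_layers, kernel=3):
    """Effective receptive field of stacked stride-1 convs with no pooling between them."""
    return 1 + num_layers * (kernel - 1)


def conv_weights(kernel, channels, num_layers=1):
    """Weight count (biases ignored) when each layer maps `channels` to `channels`."""
    return num_layers * kernel * kernel * channels * channels


C = 256  # illustrative channel count
three_small = conv_weights(3, C, num_layers=3)  # 27 * C^2
one_large = conv_weights(7, C)                  # 49 * C^2
print(stacked_receptive_field(2), stacked_receptive_field(3))
print(three_small, one_large)
```

Three stacked 3 × 3 layers see the same 7 × 7 window as one 7 × 7 layer, with fewer weights and two extra non-linearities in between.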

3 Classification Framework

3.1 Training

the training is carried out by optimising the multinomial logistic regression objective using mini-batch gradient descent with momentum.

learning strategy

SGD
batch size = 256
momentum = 0.9
weight decay = 0.0005
dropout ratio = 0.5
The learning rate was initially set to 0.01, and then decreased by a factor of 10 when the validation set accuracy stopped improving.
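
The hyperparameters above can be tied together in a minimal sketch of one momentum update and of the learning-rate rule. This is not the authors' implementation: it updates a single scalar weight, folds the weight decay into the gradient as an L2 penalty, and uses a hypothetical `patience` parameter to decide when validation accuracy has "stopped improving".

```python
def sgd_momentum_step(w, grad, velocity, lr, momentum=0.9, weight_decay=5e-4):
    """One SGD-with-momentum update for a scalar weight.

    Weight decay is applied as an L2 penalty added to the gradient.
    """
    velocity = momentum * velocity - lr * (grad + weight_decay * w)
    return w + velocity, velocity


def decay_on_plateau(lr, val_acc_history, patience=1, factor=10.0):
    """Divide lr by `factor` when the last `patience` accuracies fail to beat the earlier best."""
    if len(val_acc_history) <= patience:
        return lr
    best_before = max(val_acc_history[:-patience])
    recent_best = max(val_acc_history[-patience:])
    return lr / factor if recent_best <= best_before else lr


lr = 0.01
w, v = sgd_momentum_step(w=1.0, grad=0.5, velocity=0.0, lr=lr)
lr_after_plateau = decay_on_plateau(lr, [0.60, 0.65, 0.65])
lr_still_improving = decay_on_plateau(lr, [0.60, 0.65, 0.70])
```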

initialisation

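The networks are configured as in Table 1. The shallow network (A) can be trained from random initialisation. When training the deeper networks, the first four convolutional layers and the last three fully-connected layers are initialised with the weights of net A, and the intermediate layers are initialised randomly.

Random initialisation samples the weights from a normal distribution with mean 0 and standard deviation 0.01.

The biases are initialised to 0.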
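
The initialisation scheme can be sketched as follows, using the mean and standard deviation quoted in these notes. The function names and the flat list representation of a layer are illustrative assumptions, not part of the paper.

```python
import random


def init_layer(num_weights, num_biases, std=0.01, seed=0):
    """Random init: weights drawn from N(0, std^2), biases set to zero."""
    rng = random.Random(seed)
    weights = [rng.gauss(0.0, std) for _ in range(num_weights)]
    biases = [0.0] * num_biases
    return weights, biases


def uses_net_a_weights(kind, index):
    """Whether a layer of a deeper net starts from net A's trained weights.

    `kind` is "conv" or "fc"; `index` counts from 0 within that kind.
    The first four conv layers and all three FC layers are copied from net A;
    every other layer is initialised randomly.
    """
    return kind == "fc" or (kind == "conv" and index < 4)


weights, biases = init_layer(num_weights=10000, num_biases=64)
```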

training image size

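Let S denote the smallest side of the rescaled training image, from which the network input is cropped. The crop size is fixed at 224 × 224. When S = 224, the crop spans the whole smallest side of the image; when S >> 224, the crop covers only a small part of the image.

There are two ways to set the training scale S.

The first is to fix S, which corresponds to single-scale training. The second approach to setting S is multi-scale training.

The first approach fixes S and corresponds to single-scale training. In the second approach, S is sampled at random from a fixed range for each training image, which corresponds to multi-scale training and can be seen as a form of data augmentation.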
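
The two ways of setting S can be sketched with plain geometry, without touching pixel data. The default range [256, 512] is the one I recall the paper using, but since the notes only say "a fixed range", treat `s_min` and `s_max` as assumptions; the function names are also made up.

```python
import random

CROP = 224  # fixed network input size


def rescaled_size(width, height, s):
    """Image size after isotropic rescaling so that the smaller side equals s."""
    scale = s / min(width, height)
    return round(width * scale), round(height * scale)


def sample_training_crop(width, height, rng, s_min=256, s_max=512, fixed_s=None):
    """Pick S (fixed, or jittered within [s_min, s_max]) and a random 224x224 crop box."""
    s = fixed_s if fixed_s is not None else rng.randint(s_min, s_max)
    new_w, new_h = rescaled_size(width, height, s)
    left = rng.randint(0, new_w - CROP)
    top = rng.randint(0, new_h - CROP)
    return s, (left, top, left + CROP, top + CROP)


rng = random.Random(0)
s_multi, box_multi = sample_training_crop(500, 375, rng)                 # multi-scale
s_single, box_single = sample_training_crop(500, 375, rng, fixed_s=224)  # single-scale
```

With S = 224 the crop has no vertical freedom on a landscape image, while a large S leaves the crop covering only a fraction of the picture.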

3.2 Testing

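Given a trained network and an input image, classification proceeds as follows:

First, the image is rescaled so that its smallest side equals a pre-defined size, denoted Q;
Next, the network is applied to the rescaled test image. The fully-connected layers are converted to convolutional layers (the first FC layer to a 7 × 7 conv layer, the last two to 1 × 1 conv layers), so the whole network becomes a fully-convolutional net;
Finally, the resulting class score map is spatially averaged to give a fixed-size vector of class scores, with as many channels as there are classes.

The test set can also be augmented by horizontal flipping: the scores of the original and flipped images are averaged to give the final prediction.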
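
The last two steps can be sketched on nested lists. Here `model` is a hypothetical callable that maps a channel-first image to a class score map indexed as `[class][y][x]`; it stands in for the fully-convolutional network and is not a real API.

```python
def spatial_average(score_map):
    """Collapse a [class][y][x] score map into one score per class."""
    return [
        sum(sum(row) for row in channel) / sum(len(row) for row in channel)
        for channel in score_map
    ]


def hflip(image):
    """Horizontally flip a channel-first image given as nested lists."""
    return [[row[::-1] for row in channel] for channel in image]


def predict_with_flip(model, image):
    """Average the class scores of the original and the flipped image."""
    original = spatial_average(model(image))
    flipped = spatial_average(model(hflip(image)))
    return [(a + b) / 2 for a, b in zip(original, flipped)]


def toy_model(image):
    """Stand-in model: class 0 scores the left column, class 1 the remaining columns."""
    return [[row[:1] for row in image[0]], [row[1:] for row in image[0]]]


toy_image = [[[1.0, 2.0], [3.0, 4.0]]]
scores = predict_with_flip(toy_model, toy_image)
```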

3.3 Implementation Details

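The implementation is based on Caffe, with a number of significant modifications that allow training and evaluation on multiple GPUs (data parallelism, which speeds up training) as well as on uncropped images at multiple scales.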

4 Classification Experiments

The ILSVRC-2012 dataset is split into three sets: training, validation, and testing.

Classification performance is measured by the top-1 and top-5 error.
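
For reference, the two measures can be computed as in the sketch below: a prediction counts as a top-k error when the true class is not among the k highest-scoring classes. The function name and the toy scores are illustrative.

```python
def top_k_error(scores, labels, k):
    """Fraction of samples whose true label is not among the k highest scores."""
    errors = 0
    for row, label in zip(scores, labels):
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label not in ranked[:k]:
            errors += 1
    return errors / len(labels)


toy_scores = [
    [0.1, 0.7, 0.2],
    [0.5, 0.3, 0.2],
    [0.2, 0.3, 0.5],
]
toy_labels = [1, 1, 0]
top1 = top_k_error(toy_scores, toy_labels, k=1)
top2 = top_k_error(toy_scores, toy_labels, k=2)
```

On ILSVRC there are 1000 classes and k is 1 or 5; the toy example uses 3 classes and k = 2 only to keep the numbers readable.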

4.1 Single scale evaluation

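Evaluation at a single test scale:

Using local response normalisation (the A-LRN network) brings no noticeable benefit, so the later networks do not use LRN;
The classification error decreases as the network depth increases;
Training with multi-scale images (S not fixed) helps.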

4.2 Multi-scale evaluation

Evaluation at multiple test scales:

Scale jittering at test time improves performance; see the entries in bold.

4.3 Multi-crop evaluation

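Comparing dense ConvNet evaluation with multi-crop evaluation:

Multiple crops perform slightly better than dense evaluation; the two approaches are complementary, and combining them works better than either alone.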

4.4 ConvNet fusion

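Combining multiple models:

Combining the outputs of several models lowers the error rate once more. (The post-competition submission performs even better.)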
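
A minimal sketch of this kind of fusion, assuming the outputs are combined by averaging the models' soft-max class posteriors; the logits below are invented numbers for illustration, not results.

```python
import math


def softmax(logits):
    """Numerically stable soft-max over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(posteriors_per_model):
    """Average class posteriors across models, class by class."""
    n = len(posteriors_per_model)
    return [sum(column) / n for column in zip(*posteriors_per_model)]


model_logits = [[2.0, 1.0, 0.1], [0.5, 2.5, 0.3]]  # two hypothetical models
posteriors = [softmax(logits) for logits in model_logits]
fused = fuse(posteriors)
predicted_class = max(range(len(fused)), key=lambda c: fused[c])
```

Averaging helps when the models make different mistakes, since a confident correct model can outvote an uncertain wrong one.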

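4.5 Comparison with the state of the art

As the bold entries show, the network outperforms earlier models, and its 6.8% top-5 error comes close to the 6.7% of GoogLeNet, the winner of the classification task.

In terms of single-network performance, a single VGG network outperforms a single GoogLeNet.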

5 Conclusion

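The paper demonstrates the effect of network depth on classification accuracy, and the deep convolutional networks it designs achieve very good classification performance on the ImageNet challenge dataset.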
