Inception Architecture for Computer Vision

Purpose

After 2014, deep CNNs became mainstream. With the appearance of Inception, networks grew from around a dozen layers to the 22-layer GoogLeNet[^2]. Since Inception became an important building block of later deep networks, these notes study its principles and effects.

Papers

Network in Network[^1]

Motivation

A Generalized Linear Model assumes that the latent concepts are linearly separable. In practice this assumption rarely holds: data belonging to the same concept usually lies on a nonlinear manifold, so representing it requires a nonlinear function of the input X.
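This is what motivates the MLPConv layer: instead of a linear filter, a small shared MLP is slid over every spatial position, which is equivalent to stacking 1×1 convolutions. A minimal NumPy sketch (the weight shapes and names here are illustrative, not from the paper):

```python
import numpy as np

def mlpconv(x, w1, w2):
    """MLPConv sketch: a shared two-layer MLP applied at every spatial
    position, i.e. two stacked 1x1 convolutions with ReLU."""
    # x: (H, W, C_in); w1: (C_in, C_hidden); w2: (C_hidden, C_out)
    h = np.maximum(x @ w1, 0)      # first 1x1 conv + ReLU
    return np.maximum(h @ w2, 0)   # second 1x1 conv + ReLU

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 16))
w1 = rng.standard_normal((16, 32))
w2 = rng.standard_normal((32, 10))
y = mlpconv(x, w1, w2)
print(y.shape)  # (8, 8, 10): spatial layout preserved, channels remapped nonlinearly
```

Because the same weights are shared at every position, this keeps the translation invariance of convolution while adding a nonlinear cross-channel mapping.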

Structure

NIN is built by stacking MLPConv layers, and the final prediction layer uses Global Average Pooling in place of a fully connected layer. Why? Because fully connected layers are prone to overfitting. Each of the final feature maps is averaged, and the pooled vector is fed directly into the softmax. The last feature maps can then be interpreted as confidence maps for each class. Moreover, average pooling has no parameters to optimize, and since it aggregates global information, it is more robust to spatial variation.
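The pooling-plus-softmax step above can be sketched in a few lines; the feature-map shape used here is an assumption for illustration:

```python
import numpy as np

def global_average_pool(feature_maps):
    # feature_maps: (H, W, num_classes) -- one confidence map per class, as in NIN
    return feature_maps.mean(axis=(0, 1))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# hypothetical final layer: 7x7 spatial grid, 10 classes
fm = np.random.default_rng(1).standard_normal((7, 7, 10))
probs = softmax(global_average_pool(fm))
print(probs.shape)  # (10,) -- a class distribution with no FC parameters to overfit
```

Note that the pooling stage contributes zero trainable parameters, which is exactly the regularization argument made above.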

The NIN paper visualizes the resulting final feature maps (figure not reproduced here).

Design principles followed by Inception[^3]

Avoid representational bottlenecks

Avoid representational bottlenecks, especially early in the network. Feed-forward networks can be represented by an acyclic graph from the input layer(s) to the classifier or regressor. This defines a clear direction for the information flow. For any cut separating the inputs from the outputs, one can access the amount of information passing through the cut. One should avoid bottlenecks with extreme compression. In general the representation size should gently decrease from the inputs to the outputs before reaching the final representation used for the task at hand. Theoretically, information content can not be assessed merely by the dimensionality of the representation as it discards important factors like correlation structure; the dimensionality merely provides a rough estimate of information content.
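The "gentle decrease" can be made quantitative by comparing per-step compression ratios of the representation size H × W × C. The concrete shapes below are assumptions loosely inspired by Inception-style reduction stages, not values from the paper:

```python
# Shapes are (H, W, C); representation size is H * W * C.
def sizes(shapes):
    return [h * w * c for h, w, c in shapes]

def step_ratios(shapes):
    s = sizes(shapes)
    return [a / b for a, b in zip(s, s[1:])]

gentle = [(35, 35, 288), (17, 17, 768), (8, 8, 1280)]  # gradual shrinking
extreme = [(35, 35, 288), (8, 8, 64)]                  # one harsh bottleneck

print(step_ratios(gentle))   # each step compresses by less than 3x
print(step_ratios(extreme))  # a single ~86x shrink -- the kind of cut to avoid
```

The principle says to prefer the first schedule: no single cut through the network should discard an extreme fraction of the representation at once.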

Higher-dimensional representations are easier to process locally

Higher dimensional representations are easier to process locally within a network. Increasing the activations per tile in a convolutional network allows for more disentangled features. The resulting networks will train faster.

Spatial aggregation over lower-dimensional embeddings loses little representational power

Spatial aggregation can be done over lower dimensional embeddings without much or any loss in representational power. For example, before performing a more spread out (e.g. 3 × 3) convolution, one can reduce the dimension of the input representation before the spatial aggregation without expecting serious adverse effects. We hypothesize that the reason for that is the strong correlation between adjacent unit results in much less loss of information during dimension reduction, if the outputs are used in a spatial aggregation context. Given that these signals should be easily compressible, the dimension reduction even promotes faster learning.
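The saving from reducing dimensions before spatial aggregation can be estimated by counting multiplies. The grid and channel sizes below are hypothetical, chosen only to illustrate the ratio:

```python
def conv_multiplies(h, w, c_in, c_out, k):
    """Multiply count of a k x k convolution at full resolution (stride 1)."""
    return h * w * c_in * c_out * k * k

# Direct 3x3 convolution: 256 -> 256 channels on a 28x28 grid
direct = conv_multiplies(28, 28, 256, 256, 3)

# 1x1 reduction to 64 channels, then 3x3 back up to 256 channels
reduced = (conv_multiplies(28, 28, 256, 64, 1)
           + conv_multiplies(28, 28, 64, 256, 3))

print(direct / reduced)  # 3.6 -- the reduced path needs 3.6x fewer multiplies
```

This is the arithmetic behind the 1×1 "reduce" layers placed before the 3×3 and 5×5 branches in Inception modules: because adjacent activations are strongly correlated, the compression costs little accuracy while cutting computation severalfold.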

Balance the width and depth of the network

Balance the width and depth of the network. Optimal performance of the network can be reached by balancing the number of filters per stage and the depth of the network. Increasing both the width and the depth of the network can contribute to higher quality networks. However, the optimal improvement for a constant amount of computation can be reached if both are increased in parallel. The computational budget should therefore be distributed in a balanced way between the depth and width of the network.

Application in GoogLeNet[^2]
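As a rough illustration of how GoogLeNet applies these ideas, here is a simplified Inception block in NumPy: parallel 1×1, 1×1→3×3, and pool→1×1 branches concatenated along the channel axis. The 5×5 branch is omitted for brevity, and all channel sizes are made up:

```python
import numpy as np

def conv1x1(x, w):
    # pointwise convolution with ReLU: (H, W, C_in) @ (C_in, C_out)
    return np.maximum(x @ w, 0)

def conv3x3_same(x, w):
    # naive 3x3 'same' convolution with ReLU; w: (3, 3, C_in, C_out)
    h, wd, _ = x.shape
    pad = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros((h, wd, w.shape[-1]))
    for i in range(3):
        for j in range(3):
            out += pad[i:i + h, j:j + wd] @ w[i, j]
    return np.maximum(out, 0)

def maxpool3x3_same(x):
    # 3x3 max pooling, stride 1, 'same' padding
    h, wd, _ = x.shape
    pad = np.pad(x, ((1, 1), (1, 1), (0, 0)), constant_values=-np.inf)
    return np.max([pad[i:i + h, j:j + wd]
                   for i in range(3) for j in range(3)], axis=0)

def inception_module(x, ws):
    """Simplified Inception block: parallel branches, channel concatenation.
    Note the 1x1 'reduce' before the 3x3 branch, per the principle above."""
    b1 = conv1x1(x, ws['b1'])
    b2 = conv3x3_same(conv1x1(x, ws['b2_reduce']), ws['b2'])
    b3 = conv1x1(maxpool3x3_same(x), ws['b3'])
    return np.concatenate([b1, b2, b3], axis=-1)

rng = np.random.default_rng(0)
x = rng.standard_normal((14, 14, 32))
ws = {'b1': rng.standard_normal((32, 16)),
      'b2_reduce': rng.standard_normal((32, 8)),
      'b2': rng.standard_normal((3, 3, 8, 24)),
      'b3': rng.standard_normal((32, 8))}
y = inception_module(x, ws)
print(y.shape)  # (14, 14, 48): spatial size preserved, branch outputs concatenated
```

Because every branch preserves the spatial resolution, the module lets the network choose between multiple receptive-field sizes at each stage, while the 1×1 reductions keep the computational budget balanced.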

References

[^1]: [Network in Network](https://arxiv.org/abs/1312.4400)

[^2]: [Going Deeper with Convolutions](https://arxiv.org/abs/1409.4842)

[^3]: [Inception v3](https://www.arxiv.org/abs/1512.00567)

[^4]: [Inception v4](https://arxiv.org/abs/1602.07261)
