Convolutional Networks
Convolutional networks, also known as convolutional neural networks (CNNs), are a specialized kind of neural network for processing data that has a known, grid-like topology. The name "convolutional neural network" indicates that the network employs a mathematical operation called convolution. Convolution is a specialized kind of linear operation.
The convolution operation
In convolutional network terminology, the convolution of two functions is written

s(t) = (f ∗ g)(t) = ∫ f(a) g(t − a) da

where f is often referred to as the input, g as the kernel, and the output is sometimes referred to as the feature map.
Discrete convolution
When t is restricted to integer values, the integral becomes a sum:

s(t) = (f ∗ g)(t) = Σ_a f(a) g(t − a)

Apart from the finite set of points for which we store the values, these functions are considered to be zero, so the infinite sum reduces to a finite one.
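The sum above can be sketched directly in NumPy. `conv1d` is a hypothetical helper name, and returning the "full" output of length len(f) + len(g) − 1 is an implementation choice:

```python
import numpy as np

def conv1d(f, g):
    """Naive discrete convolution: s(t) = sum_a f(a) * g(t - a).

    f and g are finite 1-D arrays; values outside them are treated as
    zero, so the output has len(f) + len(g) - 1 entries.
    """
    n = len(f) + len(g) - 1
    s = np.zeros(n)
    for t in range(n):
        for a in range(len(f)):
            if 0 <= t - a < len(g):   # g is zero outside its stored range
                s[t] += f[a] * g[t - a]
    return s

f = np.array([1.0, 2.0, 3.0])
g = np.array([0.5, 0.5])          # a simple averaging kernel
print(conv1d(f, g))               # matches np.convolve(f, g, mode="full")
```

In practice one would call `np.convolve` directly; the loop form is only meant to mirror the formula term by term.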
2-d discrete convolution
If we use a 2-D image I as the input, we likely also want to use a 2-D kernel K:

S(i, j) = (I ∗ K)(i, j) = Σ_m Σ_n I(m, n) K(i − m, j − n)
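A minimal sketch of the 2-D formula, assuming a "valid" output region (no zero-padding) and flipping the kernel so the sliding window implements true convolution rather than cross-correlation; `conv2d_valid` is a hypothetical name:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """2-D discrete convolution, "valid" mode: the kernel must fit
    entirely inside the image. Flipping the kernel makes the sliding
    dot product equal to S(i, j) = sum_m sum_n I(m, n) K(i - m, j - n).
    """
    kh, kw = kernel.shape
    flipped = kernel[::-1, ::-1]      # flip turns correlation into convolution
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * flipped)
    return out

image = np.array([[1.0, 2.0], [3.0, 4.0]])
kernel = np.array([[1.0, 0.0], [0.0, 1.0]])
print(conv2d_valid(image, kernel))    # [[5.]]
```

Note that many deep-learning libraries actually implement cross-correlation (no kernel flip) and still call it convolution; for learned kernels the distinction does not matter.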
Motivation
Convolution brings three important ideas that can help improve a machine learning system: sparse interactions, parameter sharing, and equivariant representations.
Sparse interactions
If there are m inputs and n outputs, then ordinary matrix multiplication requires m × n separate parameters, and every output interacts with every input. If we instead limit each output to interact with only k inputs, the layer needs only k × n parameters, and in practice k can be several orders of magnitude smaller than m while still performing well.
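A back-of-the-envelope comparison of the parameter counts (the layer sizes are arbitrary assumptions, not from the text):

```python
# Illustrative layer sizes (assumptions): m inputs, n outputs, kernel width k.
m, n, k = 10_000, 10_000, 3

dense_params  = m * n   # fully connected: every output sees every input
sparse_params = k * n   # sparse interactions: each output sees only k inputs
shared_params = k       # sparse + parameter sharing: one kernel reused everywhere

print(dense_params, sparse_params, shared_params)   # 100000000 30000 3
```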
Parameter sharing
using the same parameter for more than one function in a model
For example, in the figure, a single kernel (α, β, γ) of width 3 slides over the input:

s2 = αx1 + βx2 + γx3
s3 = αx2 + βx3 + γx4

As we can see, the same three parameters α, β, γ are reused at every output position, rather than being learned separately for each location.
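The two equations above can be checked numerically; the input values and kernel weights below are arbitrary assumptions:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # x1..x5 (example values)
alpha, beta, gamma = 0.2, 0.5, 0.3        # the single shared kernel

kernel = np.array([alpha, beta, gamma])

# Every output reuses the same three parameters, just shifted along
# the input: s_i = alpha*x_{i} + beta*x_{i+1} + gamma*x_{i+2}.
s = np.array([kernel @ x[i:i+3] for i in range(len(x) - 2)])

s2 = alpha*x[0] + beta*x[1] + gamma*x[2]  # written out by hand, matches s[0]
print(np.isclose(s[0], s2))               # True
```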
Equivariance
To say a function is equivariant means that if the input changes, the output changes in the same way. Specifically, a function f(x) is equivariant to a function g if f(g(x)) = g(f(x)). For convolution, if g is any function that translates the input, then the convolution function is equivariant to g: shifting the input and then convolving gives the same result as convolving and then shifting the output.
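A quick numerical check of this property, using circular convolution so the translation is exact at the boundaries (`circ_conv` is a hypothetical helper):

```python
import numpy as np

def circ_conv(x, k):
    """Circular 1-D convolution: s(t) = sum_a x(a) * k((t - a) mod n)."""
    n = len(x)
    return np.array([sum(x[a] * k[(t - a) % n] for a in range(n))
                     for t in range(n)])

x = np.array([1.0, 2.0, 3.0, 4.0])
k = np.array([1.0, 0.5, 0.0, 0.0])   # kernel zero-padded to the input length

shift = lambda v: np.roll(v, 1)      # translate by one position (circularly)

left = circ_conv(shift(x), k)        # shift first, then convolve
right = shift(circ_conv(x, k))       # convolve first, then shift
print(np.allclose(left, right))      # True: convolution commutes with translation
```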
Pooling
A pooling function replaces the output of the net at a certain location with a summary statistic of the nearby outputs. Pooling helps to make the representation approximately invariant to small translations of the input. Invariance to local translation can be a very useful property if we care more about whether some feature is present than exactly where it is. For example, in face recognition we care that a facial feature is present, not its exact pixel location.
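A small sketch of this invariance with non-overlapping max pooling (`max_pool1d` is a hypothetical helper; the window size is an assumption):

```python
import numpy as np

def max_pool1d(x, width=3, stride=3):
    """Non-overlapping 1-D max pooling over windows of the given width."""
    return np.array([x[i:i+width].max()
                     for i in range(0, len(x) - width + 1, stride)])

x       = np.array([0.0, 1.0, 0.0, 0.0, 0.0, 0.0])  # a feature detected at position 1
shifted = np.array([0.0, 0.0, 1.0, 0.0, 0.0, 0.0])  # the same feature shifted by one

print(max_pool1d(x))        # [1. 0.]
print(max_pool1d(shifted))  # [1. 0.] -- the pooled representation is unchanged
```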
Besides, pooling is essential for handling inputs of varying size. Varying the size of, and offset between, pooling regions makes it possible for the classification layer to always receive the same number of summary statistics regardless of the input size.
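One way to sketch this, assuming we pool a 1-D feature vector into a fixed number of regions (the helper name and region scheme are assumptions, loosely modeled on the adaptive pooling layers found in deep-learning frameworks):

```python
import numpy as np

def adaptive_max_pool1d(x, out_size=4):
    """Pool a 1-D input of any length into exactly out_size summary
    values by varying the pooling-region boundaries with the input size.
    """
    edges = np.linspace(0, len(x), out_size + 1).astype(int)
    return np.array([x[edges[i]:edges[i+1]].max() for i in range(out_size)])

short = np.random.rand(10)
long_ = np.random.rand(37)

print(adaptive_max_pool1d(short).shape)  # (4,)
print(adaptive_max_pool1d(long_).shape)  # (4,): same size for any input length
```

A layer like this is what lets the same classifier head sit on top of feature maps produced from differently sized inputs.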