Chapter 11: CNNs (1)

Convolutional Networks


Convolutional networks, also known as convolutional neural networks (CNNs), are a special kind of neural network for processing data that has a known, grid-like topology. These networks use an operation called convolution, which is a special kind of linear operation.


The convolution operation

(f \ast g)(x) = \int f(\alpha) g(x - \alpha) \, d\alpha

f is the input and g is the kernel.

Discrete convolution

s[t] = (f \ast g)(t) = \sum_{a=-\infty}^{\infty} f[a] g[t - a]

Except for the finite set of points at which values are stored, the kernel is taken to be zero everywhere else.
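
A minimal NumPy sketch of this sum, using a made-up input f and kernel g (the function name conv1d and the values are only for illustration): because the kernel is zero outside a finite set of points, the infinite sum reduces to a short loop over the positions where the kernel fully overlaps the input.

```python
import numpy as np

# Sketch of the discrete convolution above, restricted to "valid" positions.
def conv1d(f, g):
    n, k = len(f), len(g)
    out = np.zeros(n - k + 1)
    for t in range(n - k + 1):
        # s[t] = sum_a f[a] g[t - a]: a dot product of a slice of f
        # with the flipped kernel.
        out[t] = np.sum(f[t:t + k] * g[::-1])
    return out

f = np.array([1.0, 2.0, 3.0, 4.0, 5.0])    # toy input signal
g = np.array([0.5, 0.25])                  # toy kernel
print(conv1d(f, g))                        # [1.25 2.   2.75 3.5 ]
print(np.convolve(f, g, mode="valid"))     # NumPy gives the same result
```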

2-d discrete convolution

If a 2-d image I is used as the input, then

s[i, j] = (I \ast K)[i, j] = \sum_m \sum_n I[m, n] K[i - m, j - n]

= \sum_m \sum_n I[i - m, j - n] K[m, n]
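
The formula maps directly onto two nested loops. A small sketch with a made-up 4×4 image and 2×2 kernel, computing only the "valid" positions where the kernel lies entirely inside the image (flipping the kernel is what distinguishes true convolution from cross-correlation):

```python
import numpy as np

# Sketch of the 2-d convolution formula: flip the kernel in both axes,
# slide it over the image, and take elementwise products and sums.
def conv2d(I, K):
    Kf = K[::-1, ::-1]                     # flipped kernel
    H, W = I.shape
    kh, kw = Kf.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(I[i:i + kh, j:j + kw] * Kf)
    return out

I = np.arange(16, dtype=float).reshape(4, 4)   # toy 4x4 "image"
K = np.array([[1.0, 0.0],
              [0.0, -1.0]])                    # toy 2x2 kernel
print(conv2d(I, K))                            # 3x3 feature map
```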

Why convolution?

Convolution brings three very important properties:

  • sparse interactions
  • parameter sharing
  • equivariant representations

sparse interactions

If there are m inputs and n outputs, a fully connected weight matrix requires m × n parameters and O(m × n) runtime. If we limit the number of connections per output to k, then only k × n parameters are needed and the runtime is O(k × n).

Parameter sharing

Using the same set of parameters for more than one function in the model.

For example,

s_2 = \alpha x_1 + \beta x_2 + \gamma x_3
s_3 = \alpha x_2 + \beta x_3 + \gamma x_4

Here α, β, γ are the shared parameters. In this way, convolution reduces the memory requirements and improves the statistical efficiency of the model.
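
A tiny sketch of this sharing, with made-up values for α, β, γ and the inputs: every output is computed by sliding the same three weights over x, rather than each output having its own weights.

```python
import numpy as np

# The same (alpha, beta, gamma) produce every output by sliding over the
# input, so the model stores 3 weights no matter how long x is.
alpha, beta, gamma = 0.2, 0.5, 0.3
w = np.array([alpha, beta, gamma])

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])        # x_1 ... x_5

# s_t = alpha * x_{t-1} + beta * x_t + gamma * x_{t+1}
s = np.array([w @ x[t - 1:t + 2] for t in range(1, len(x) - 1)])
print(s)                                       # s_2, s_3, s_4
```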

equivariance

To say a function is equivariant means that when the input changes, the output changes in the same way. Specifically, a function f(x) is equivariant to a function g if f(g(x)) = g(f(x)).


Pooling

A pooling function replaces the output of the network at a given location with a summary statistic of the nearby outputs. Pooling makes the network approximately invariant to small translations of the input. This invariance is especially useful when we care more about whether some feature is present than about exactly where it is; for example, in face recognition we want to detect facial features without caring about their precise positions.
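
As a rough illustration of that invariance, a max-pooling sketch over a made-up 1-D detector output: shifting the strong response by one position leaves most of the pooled values unchanged.

```python
import numpy as np

# Max pooling: each output is the maximum over a small window of nearby
# detector outputs, so a one-step shift of the input barely changes it.
def max_pool1d(x, width=3, stride=1):
    return np.array([x[i:i + width].max()
                     for i in range(0, len(x) - width + 1, stride)])

x = np.array([0.1, 0.2, 1.0, 0.3, 0.1, 0.2])   # one strong detector response
x_shifted = np.roll(x, 1)                      # the response moves one step

print(max_pool1d(x))          # [1.  1.  1.  0.3]
print(max_pool1d(x_shifted))  # [0.2 1.  1.  1. ] -- mostly the same values
```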

In addition, pooling is essential for handling inputs of varying size: by varying the size of the pooling regions, the next layer always receives data of a fixed size, regardless of the input.


Convolutional Networks


Convolutional networks, also known as convolutional neural networks (CNNs), are a specialized kind of neural network for processing data that has a known, grid-like topology. The name “convolutional neural network” indicates that the network employs a mathematical operation called convolution. Convolution is a specialized kind of linear operation.


The convolution operation

(f \ast g)(x) = \int f(\alpha) g(x - \alpha) \, d\alpha

In convolutional network terminology, f is often referred to as the input and g as the kernel; the output is sometimes referred to as the feature map.

Discrete convolution

s[t] = (f \ast g)(t) = \sum_{a=-\infty}^{\infty} f[a] g[t - a]

Except for the finite set of points for which we store the values, these functions are considered to be zero.

2-d discrete convolution

If we use a 2d image I as input, then

s[i, j] = (I \ast K)[i, j] = \sum_m \sum_n I[m, n] K[i - m, j - n]

= \sum_m \sum_n I[i - m, j - n] K[m, n]

Motivation

Convolution brings three important ideas that can help improve a machine learning system:

  • sparse interactions
  • parameter sharing
  • equivariant representations

sparse interactions

If there are m inputs and n outputs, then matrix multiplication requires m × n parameters and the algorithms used in practice have O(m × n) runtime (per example). If we limit the number of connections each output may have to k, then the sparsely connected approach requires only k × n parameters and O(k × n) runtime.
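
For a feel of the scale of the saving, a back-of-the-envelope comparison with illustrative sizes (the values of m, n, and k below are made up, not from the text):

```python
# Parameter counts for dense vs. sparse connectivity, with made-up sizes.
m, n, k = 10_000, 10_000, 3      # m inputs, n outputs, k connections per output

dense_params = m * n             # fully connected: one weight per (input, output) pair
sparse_params = k * n            # each output connects to only k inputs

print(dense_params)              # 100_000_000 weights, O(m*n) multiply-adds per example
print(sparse_params)             # 30_000 weights, O(k*n) multiply-adds per example
```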

Parameter sharing

Using the same parameter for more than one function in a model.

For example,

s_2 = \alpha x_1 + \beta x_2 + \gamma x_3
s_3 = \alpha x_2 + \beta x_3 + \gamma x_4

As we can see, α, β, γ are the parameters that are shared. In this way, convolution reduces the model's memory requirements and improves its statistical efficiency.

equivariance

To say a function is equivariant means that if the input changes, the output changes in the same way. Specifically, a function f(x) is equivariant to a function g if f(g(x)) = g(f(x)).
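
A quick numerical check of this property when f is convolution with a fixed kernel and g is a one-step shift to the right (the kernel, signal length, and the shift helper below are made up for the example): shifting the input and then convolving matches convolving and then shifting, away from the boundary that the shift discards.

```python
import numpy as np

# f = convolution with a fixed kernel, g = shift right by one (zero-padded).
# Equivariance: f(g(x)) == g(f(x)), except at the discarded right boundary.
def shift(x):
    return np.concatenate(([0.0], x[:-1]))

kernel = np.array([0.25, 0.5, 0.25])
x = np.random.rand(16)

conv_of_shifted = np.convolve(shift(x), kernel, mode="full")
shifted_conv = shift(np.convolve(x, kernel, mode="full"))

print(np.allclose(conv_of_shifted[:-2], shifted_conv[:-2]))   # True
```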


Pooling

A pooling function replaces the output of the net at a certain location with a summary statistic of the nearby outputs. Pooling helps to make the representation approximately invariant to small translations of the input. Invariance to local translation can be a very useful property if we care more about whether some feature is present than exactly where it is.

Besides, pooling is essential for handling inputs of varying size. Varying the size of and offset between pooling regions makes it possible for the classification layer to always receive the same number of summary statistics regardless of the input size.
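
A minimal sketch of that idea, assuming we simply split whatever length of feature vector arrives into a fixed number of pooling regions and take the max of each (the region count, helper name, and inputs below are made up):

```python
import numpy as np

# Pool a feature vector of arbitrary length into a fixed number of regions,
# so the next layer always receives the same number of summary statistics.
def pool_to_fixed_size(features, n_regions=4):
    regions = np.array_split(features, n_regions)   # near-equal regions
    return np.array([r.max() for r in regions])

print(pool_to_fixed_size(np.random.rand(10)).shape)   # (4,)
print(pool_to_fixed_size(np.random.rand(37)).shape)   # (4,) regardless of input length
```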
