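I recently started studying an important branch of neural networks: convolutional neural networks (CNNs). What a CNN is, what it is used for, and where it outperforms an ordinary neural network are all questions that the classic blogs and tutorials below answer well. This post only summarizes convolutional networks as I currently understand them, as a way to consolidate and extend what I have learned so far.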
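Classic blogs/tutorials
Image classification
You cannot talk about CNNs without talking about image classification. A convolutional neural network is a neural network designed for data with a grid-like structure, such as time series and images. Why can a CNN also be applied to time series? Because the time points in such data can be seen as a one-dimensional grid, sampled along the time axis at regular intervals. That said, the area where CNNs are applied most is probably image classification.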
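To make the "one-dimensional grid" view concrete, here is a minimal sketch that slides a small filter along a regularly sampled series. The signal and kernel values are made up for illustration. It uses `np.correlate`, because what CNN layers call "convolution" slides the filter without flipping it.

```python
import numpy as np

# Hypothetical time series sampled at regular intervals (illustrative values only).
signal = np.array([1.0, 2.0, 4.0, 7.0, 11.0, 16.0])
# A two-tap difference filter: responds to the change between neighbouring samples.
kernel = np.array([-1.0, 1.0])

# mode="valid" keeps only positions where the kernel fully overlaps the signal,
# so the output has len(signal) - len(kernel) + 1 entries.
response = np.correlate(signal, kernel, mode="valid")
print(response)
```

The same sliding-window idea carries over to images, where the grid simply has two spatial axes instead of one.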
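Image classification touches on many related directions, such as object detection, image segmentation, object tracking, and background modeling. The field also faces many difficulties and challenges. You may have seen them listed at the start of some video courses (CS231n, for example):
- Viewpoint variation: the same object can be photographed from different angles
- Scale variation: the visible size of an object usually varies
- Deformation: many objects change shape
- Occlusion: the target object may be partly hidden
- Illumination conditions: lighting has a large effect at the pixel level
- Background clutter: the background interferes with the object
- Intra-class variation: objects of the same class can look very different
These are only some of the challenges. When designing algorithms for image processing we have to keep these inherent problems in mind: the classification result should stay stable under such variations while remaining sensitive enough to the differences between classes.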
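Implementing a CNN by hand
Components of a CNN
The convolution layer of a convolutional neural network includes:
- Zero padding
- Convolve window: applying the filter
- Convolution forward: forward propagation
- Convolution backward: backward propagation
The pooling layer of a convolutional neural network includes:
- Pooling forward: forward propagation
- Create mask
- Distribute value
- Pooling backward: backward propagation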
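1. Zero-padding
Every convolution or pooling step shrinks the height and width of the data matrix. As the network gets deeper, the height and width keep shrinking until nothing is left. Zero-padding lets us preserve the height and width.
Padding also keeps more of the information at the image border, so the edges of each layer's input contribute to the output instead of being largely ignored.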
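The shrinking effect can be checked with a little arithmetic. The sketch below uses the standard output-size formula for a convolution (the same one used in `conv_forward` later in this post). The helper name `conv_output_size` and the 9x9 input are assumptions chosen for illustration.

```python
def conv_output_size(n_prev, f, pad=0, stride=1):
    """Spatial size after one convolution: floor((n_prev - f + 2*pad) / stride) + 1."""
    return (n_prev - f + 2 * pad) // stride + 1

n_no_pad = n_padded = 9  # hypothetical 9x9 input
for layer in range(1, 5):
    n_no_pad = conv_output_size(n_no_pad, f=3, pad=0)  # 3x3 filter, no padding
    n_padded = conv_output_size(n_padded, f=3, pad=1)  # 3x3 filter, pad = (f - 1) / 2
    print(f"layer {layer}: no padding -> {n_no_pad}, pad=1 -> {n_padded}")
```

Without padding, four 3x3 convolutions reduce the 9x9 input to a single pixel, while pad=1 keeps it at 9x9 throughout.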
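Implementation: use np.pad. Suppose a is a five-dimensional array (for example a.shape = (5,5,5,5,5)), and you want pad=1 on the second dimension, pad=3 on the third dimension, and pad=0 on all the others. You can write:
a = np.pad(a, ((0,0), (1,1), (3,3), (0,0), (0,0)), 'constant', constant_values = (..,..))
Implementing the zero_pad function: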
import numpy as np

def zero_pad(X, pad):
    """
    Pad with zeros all images of the dataset X. The padding is applied to the height and width of an image, as illustrated in Figure 1.
    Argument:
    X -- python numpy array of shape (m, n_H, n_W, n_C) representing a batch of m images
    pad -- integer, amount of padding around each image on vertical and horizontal dimensions
    Returns:
    X_pad -- padded image of shape (m, n_H + 2*pad, n_W + 2*pad, n_C)
    """
    # Pad only the height and width axes; leave the batch and channel axes untouched.
    X_pad = np.pad(X, ((0, 0), (pad, pad), (pad, pad), (0, 0)), 'constant', constant_values=0)
    return X_pad
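A quick way to see what zero_pad does is to run it on a small random batch and inspect the shapes and the border. The sketch below repeats the function so that it runs on its own; the batch size and image size are arbitrary choices for illustration.

```python
import numpy as np

def zero_pad(X, pad):
    # Same call as in the function above: pad only the height and width axes.
    return np.pad(X, ((0, 0), (pad, pad), (pad, pad), (0, 0)),
                  mode="constant", constant_values=0)

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 3, 3, 2))  # hypothetical batch: 4 images, 3x3 pixels, 2 channels
X_pad = zero_pad(X, 2)

print(X.shape, "->", X_pad.shape)
# The original pixels sit in the centre of the padded array ...
print(np.array_equal(X_pad[:, 2:-2, 2:-2, :], X))
# ... and the added border rows contain only zeros.
print(X_pad[0, :2, :, 0])
```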
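2. Convolution layer
We first build a single-step convolution function, which performs just one of the computations that make up a full convolution. The function has three parts: take the input, apply the filter, and produce the output.
(1) Implementing the conv_single_step function: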
def conv_single_step(a_slice_prev, W, b):
    """
    Apply one filter defined by parameters W on a single slice (a_slice_prev) of the output activation of the previous layer.
    Arguments:
    a_slice_prev -- slice of input data of shape (f, f, n_C_prev)
    W -- Weight parameters contained in a window - matrix of shape (f, f, n_C_prev)
    b -- Bias parameters contained in a window - matrix of shape (1, 1, 1)
    Returns:
    Z -- a scalar value, result of convolving the sliding window (W, b) on a slice x of the input data
    """
    # Element-wise product between the input slice and the filter weights.
    s = np.multiply(a_slice_prev, W)
    # Sum over all entries of the window.
    Z = np.sum(s)
    # Squeeze b to a 0-d value before converting it to a float, so that Z is a scalar
    # (calling float() directly on a (1, 1, 1) array is deprecated in recent NumPy versions).
    Z = Z + float(np.squeeze(b))
    return Z
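As a sanity check, the single step can be worked through with numbers small enough to verify by hand. The sketch below restates the function compactly so that it runs on its own; the input values, the all-ones filter, and the bias of 0.5 are made up for illustration.

```python
import numpy as np

def conv_single_step(a_slice_prev, W, b):
    # Element-wise product, sum over the whole window, then add the bias.
    return float(np.sum(a_slice_prev * W) + np.squeeze(b))

a_slice_prev = np.arange(8, dtype=float).reshape(2, 2, 2)  # values 0..7, so f = 2, n_C_prev = 2
W = np.ones((2, 2, 2))                                     # an all-ones filter just sums the window
b = np.array([[[0.5]]])                                    # bias of shape (1, 1, 1)

Z = conv_single_step(a_slice_prev, W, b)
print(Z)  # 0 + 1 + ... + 7 = 28, plus the bias 0.5
```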
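Building on the single-step function above, we now implement one full convolution. The inputs are A_prev (the activations of the previous layer), the weights of the filters (denoted W), a bias b that differs for each filter, and a hyperparameter dictionary containing the stride and the padding.
(2) Implementing the conv_forward function:
Suppose our filter has size 2×2×n. We use vert_start, vert_end, horiz_start, and horiz_end to pin down the position of each 2×2 slice, as shown in the figure below:
For the output of each convolution layer, the shape is computed with the following formulas:
n_H = floor((n_H_prev - f + 2*pad) / stride) + 1
n_W = floor((n_W_prev - f + 2*pad) / stride) + 1
n_C = number of filters used in the convolution
We do not need to worry about vectorization here; plain for loops are enough to implement everything.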
def conv_forward(A_prev, W, b, hparameters):
    """
    Implements the forward propagation for a convolution function
    Arguments:
    A_prev -- output activations of the previous layer, numpy array of shape (m, n_H_prev, n_W_prev, n_C_prev)
    W -- Weights, numpy array of shape (f, f, n_C_prev, n_C)
    b -- Biases, numpy array of shape (1, 1, 1, n_C)
    hparameters -- python dictionary containing "stride" and "pad"
    Returns:
    Z -- conv output, numpy array of shape (m, n_H, n_W, n_C)
    cache -- cache of values needed for the conv_backward() function
    """
    # Retrieve dimensions from A_prev's shape
    (m, n_H_prev, n_W_prev, n_C_prev) = A_prev.shape
    # Retrieve dimensions from W's shape
    (f, f, n_C_prev, n_C) = W.shape
    # Retrieve information from "hparameters"
    stride = hparameters['stride']
    pad = hparameters['pad']
    # Compute the dimensions of the CONV output volume using the formula given above.
    n_H = int((n_H_prev - f + 2 * pad) / stride) + 1
    n_W = int((n_W_prev - f + 2 * pad) / stride) + 1
    # Initialize the output volume Z with zeros.
    Z = np.zeros((m, n_H, n_W, n_C))
    # Create A_prev_pad by padding A_prev
    A_prev_pad = zero_pad(A_prev, pad)
    for i in range(m):                      # loop over the batch of training examples
        a_prev_pad = A_prev_pad[i]          # select the ith training example's padded activation
        for h in range(n_H):                # loop over vertical axis of the output volume
            for w in range(n_W):            # loop over horizontal axis of the output volume
                for c in range(n_C):        # loop over channels (= #filters) of the output volume
                    # Find the corners of the current "slice"
                    vert_start = h * stride
                    vert_end = vert_start + f
                    horiz_start = w * stride
                    horiz_end = horiz_start + f
                    # Use the corners to define the (3D) slice of a_prev_pad.
                    a_slice_prev = a_prev_pad[vert_start:vert_end, horiz_start:horiz_end, :]
                    # Convolve the (3D) slice with the correct filter W and bias b, to get back one output neuron.
                    Z[i, h, w, c] = conv_single_step(a_slice_prev, W[..., c], b[..., c])
    # Making sure your output shape is correct
    assert Z.shape == (m, n_H, n_W, n_C)
    # Save information in "cache" for the backprop
    cache = (A_prev, W, b, hparameters)
    return Z, cache
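To check the forward pass by hand, it helps to feed it inputs where the answer is easy to predict. The sketch below is a compact restatement of the same loops (without the cache) so that it runs on its own. The name `conv_forward_loops` and the all-ones test data are assumptions for illustration, not part of the original code.

```python
import numpy as np

def conv_forward_loops(A_prev, W, b, stride=1, pad=0):
    """Compact loop-based restatement of conv_forward (no cache), for illustration only."""
    m, n_H_prev, n_W_prev, _ = A_prev.shape
    f, _, _, n_C = W.shape
    n_H = (n_H_prev - f + 2 * pad) // stride + 1
    n_W = (n_W_prev - f + 2 * pad) // stride + 1
    # Zero-pad the height and width axes only.
    A_pad = np.pad(A_prev, ((0, 0), (pad, pad), (pad, pad), (0, 0)), mode="constant")
    Z = np.zeros((m, n_H, n_W, n_C))
    for i in range(m):
        for h in range(n_H):
            for w in range(n_W):
                window = A_pad[i, h * stride:h * stride + f, w * stride:w * stride + f, :]
                for c in range(n_C):
                    Z[i, h, w, c] = np.sum(window * W[..., c]) + b[0, 0, 0, c]
    return Z

A_prev = np.ones((2, 5, 5, 3))   # 2 images, 5x5 pixels, 3 channels, all ones
W = np.ones((3, 3, 3, 4))        # four 3x3 filters, all ones
b = np.zeros((1, 1, 1, 4))
Z = conv_forward_loops(A_prev, W, b, stride=2, pad=1)

print(Z.shape)
# With all-ones input and weights, each output counts the non-padded entries in its window.
print(Z[0, :, :, 0])
```

The corner outputs are smaller than the centre one because part of their window falls on the zero padding.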
A convolution layer should also include an activation function, applied as follows:
# Convolve the window to get back one output neuron
Z[i, h, w, c] = ...
# Apply activation
A[i, h, w, c] = activation(Z[i, h, w, c])
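The snippet above leaves `activation` unspecified. One common choice is ReLU, and the sketch below assumes it purely for illustration. Because ReLU acts on each element independently, it can be applied to the whole output volume at once instead of inside the loop.

```python
import numpy as np

def relu(Z):
    # Element-wise max(z, 0): negative pre-activations are clipped to zero.
    return np.maximum(Z, 0)

Z = np.array([[-1.5, 0.0], [2.0, -0.3]])  # hypothetical pre-activations
A = relu(Z)
print(A)
```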
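3. Pooling layer
One role of the pooling layer is to achieve a degree of "viewpoint invariance" through max pooling. Invariance means that if we adjust the input slightly, the output stays the same. In other words, when the object we want to detect shifts a little in the input image, max pooling keeps the network's activity (the neurons' outputs) unchanged, so the network can still detect the object.
Seen from another angle, though, this mechanism is not ideal: max pooling throws away valuable information and does not encode the relative spatial relationships between features.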
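Both points can be seen on a tiny example. The sketch below uses a hypothetical helper, `max_pool_2x2`, on made-up 4x4 inputs: moving an activation by one pixel inside the same 2x2 window leaves the pooled output unchanged, which also means the exact position inside the window is lost.

```python
import numpy as np

def max_pool_2x2(x):
    """Non-overlapping 2x2 max pooling on a 2-D array with even height and width."""
    h, w = x.shape
    # Split each axis into (block index, position inside block) and reduce inside the blocks.
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

original = np.zeros((4, 4))
original[0, 0] = 1.0      # a single strong activation
shifted = np.zeros((4, 4))
shifted[1, 1] = 1.0       # moved by one pixel, still inside the same 2x2 window
moved_out = np.zeros((4, 4))
moved_out[2, 2] = 1.0     # moved into a different window

print(max_pool_2x2(original))
print(np.array_equal(max_pool_2x2(original), max_pool_2x2(shifted)))    # same pooled output
print(np.array_equal(max_pool_2x2(original), max_pool_2x2(moved_out)))  # output changes
```

The invariance therefore only holds for shifts that stay within one pooling window.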
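Two pooling methods are in general use, Max Pool and Average Pool, with Max Pool being the more common. The output dimensions n_H, n_W, and n_C are given by:
n_H = floor((n_H_prev - f) / stride) + 1
n_W = floor((n_W_prev - f) / stride) + 1
n_C = n_C_prev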
def pool_forward(A_prev, hparameters, mode = "max"):
    """
    Implements the forward pass of the pooling layer
    Arguments:
    A_prev -- Input data, numpy array of shape (m, n_H_prev, n_W_prev, n_C_prev)
    hparameters -- python dictionary containing "f" and "stride"
    mode -- the pooling mode you would like to use, defined as a string ("max" or "average")
    Returns:
    A -- output of the pool layer, a numpy array of shape (m, n_H, n_W, n_C)
    cache -- cache used in the backward pass of the pooling layer, contains the input and hparameters
    """
    # Retrieve dimensions from the input shape
    (m, n_H_prev, n_W_prev, n_C_prev) = A_prev.shape
    # Retrieve hyperparameters from "hparameters"
    f = hparameters["f"]
    stride = hparameters["stride"]
    # Define the dimensions of the output
    n_H = int(1 + (n_H_prev - f) / stride)
    n_W = int(1 + (n_W_prev - f) / stride)
    n_C = n_C_prev
    # Initialize output matrix A
    A = np.zeros((m, n_H, n_W, n_C))
    for i in range(m):                      # loop over the training examples
        for h in range(n_H):                # loop on the vertical axis of the output volume
            for w in range(n_W):            # loop on the horizontal axis of the output volume
                for c in range(n_C):        # loop over the channels of the output volume
                    # Find the corners of the current "slice"
                    vert_start = h * stride
                    vert_end = vert_start + f
                    horiz_start = w * stride
                    horiz_end = horiz_start + f
                    # Use the corners to define the current slice on the ith training example of A_prev, channel c.
                    a_prev_slice = A_prev[i, vert_start:vert_end, horiz_start:horiz_end, c]
                    # Compute the pooling operation on the slice. Use an if statement to differentiate the modes.
                    if mode == "max":
                        A[i, h, w, c] = np.max(a_prev_slice)
                    elif mode == "average":
                        A[i, h, w, c] = np.mean(a_prev_slice)
                    else:
                        # Fail loudly instead of silently returning zeros for an unknown mode.
                        raise ValueError("mode must be 'max' or 'average'")
    # Store the input and hparameters in "cache" for pool_backward()
    cache = (A_prev, hparameters)
    # Making sure your output shape is correct
    assert A.shape == (m, n_H, n_W, n_C)
    return A, cache
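The difference between the two modes is easiest to see on a single small channel. The sketch below uses a hypothetical 2-D helper, `pool2d`, rather than the full four-dimensional function, and a made-up 4x4 input with values 0 to 15.

```python
import numpy as np

def pool2d(x, f, stride, mode="max"):
    """Pool one 2-D channel with an f x f window; a simplified stand-in for pool_forward."""
    n_H = (x.shape[0] - f) // stride + 1
    n_W = (x.shape[1] - f) // stride + 1
    reduce_fn = {"max": np.max, "average": np.mean}[mode]  # KeyError on an unknown mode
    out = np.zeros((n_H, n_W))
    for h in range(n_H):
        for w in range(n_W):
            out[h, w] = reduce_fn(x[h * stride:h * stride + f, w * stride:w * stride + f])
    return out

x = np.arange(16, dtype=float).reshape(4, 4)  # rows: 0..3, 4..7, 8..11, 12..15
max_out = pool2d(x, f=2, stride=2, mode="max")
avg_out = pool2d(x, f=2, stride=2, mode="average")
print(max_out)
print(avg_out)
```

Max pooling keeps only the largest value of each window, while average pooling blends all four values.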
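4. Backpropagation
In modern deep learning frameworks we only need to implement the forward pass; the framework takes care of the backward pass, so most deep learning engineers never have to deal with its details. The backward pass of a convolutional network is complicated. If you do want to implement it, the next section shows what backprop looks like in a convolutional network.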
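As a small preview, the two pooling helpers listed earlier ("Create mask" and "Distribute value") can be sketched in a few lines. This is only a sketch under my own assumptions: the function names, the example window, and the upstream gradient value are illustrative. For max pooling the gradient flows only to the position that held the maximum; for average pooling it is shared equally across the window.

```python
import numpy as np

def create_mask_from_window(x):
    """Boolean mask that is True where x attains its maximum."""
    return x == np.max(x)

def distribute_value(dz, shape):
    """Spread a scalar gradient dz evenly over a window of the given shape."""
    n_H, n_W = shape
    return np.full(shape, dz / (n_H * n_W))

window = np.array([[1.0, 3.0], [2.0, 0.5]])  # one 2x2 pooling window from the forward pass
dA = 4.0                                     # hypothetical upstream gradient for its output

d_max = create_mask_from_window(window) * dA   # max pooling: all gradient goes to the max entry
d_avg = distribute_value(dA, window.shape)     # average pooling: equal share for every entry
print(d_max)
print(d_avg)
```

Note that if several entries tie for the maximum, this simple mask sends the full gradient to each of them.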