Hands-On | Building a Convolutional Neural Network (CNN) Step by Step

Abstract: Existing toolkits such as Caffe and TensorFlow already provide solid CNN implementations, but they demand substantial hardware resources, which makes them a poor fit for beginners who want to practice and understand the internals. This article therefore shows how to build a convolutional neural network (CNN) model using nothing but NumPy, implementing a convolution layer, a ReLU activation layer, and a max-pooling layer, with simple code and detailed explanations.

There are many precompiled machine-learning and deep-learning toolkits available, and in many situations directly calling a ready-made model from a toolkit such as Caffe or TensorFlow is convenient and effective. However, these toolkits require considerable hardware resources and are not ideal for beginners trying to practice and understand the material. To truly grasp the underlying concepts, it is best to implement them yourself. This article shows how to build a convolutional neural network (CNN) using only NumPy.
The CNN is a neural network architecture that was proposed early on but only became popular in recent years; it is arguably the most widely used network in computer vision. Toolkits implement CNN models well: the relevant library functions are fully compiled, and developers only need to call existing modules to assemble a model, avoiding the complexity of implementation. The downside is that developers remain unaware of the concrete implementation details. Sometimes a data scientist has to adjust such details to improve a model's performance, and the toolkit does not expose them. In that case the only solution is to implement a similar model yourself, which gives you the highest level of control over the implementation and a better understanding of what happens at every step.
This article implements a CNN using only NumPy, building three layer modules: a convolution layer (conv), a ReLU activation function, and max pooling.

1. Reading the Input Image

The following code reads a sample image bundled with the skimage Python library and converts it to grayscale:

import skimage.data
import skimage.color
# Reading the image
img = skimage.data.chelsea()
# Converting the image into gray.
img = skimage.color.rgb2gray(img)

Reading the image is the first step; what follows depends on the size of the input image. Converted to grayscale, the image looks like this:

[Figure: the input image after grayscale conversion]
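If skimage is unavailable, the conversion can be reproduced in plain NumPy. The sketch below is illustrative only: a small 6x8 random array stands in for the chelsea image, and the luma weights (0.2125, 0.7154, 0.0721) are those skimage's rgb2gray is documented to use.

```python
import numpy

# A small synthetic RGB image standing in for skimage.data.chelsea().
img = numpy.random.rand(6, 8, 3)

# rgb2gray computes a weighted sum of the R, G, B channels.
weights = numpy.array([0.2125, 0.7154, 0.0721])
gray = img @ weights  # Dot product over the last (channel) axis.

print(gray.shape)  # The depth axis is gone: a 2-D matrix remains.
```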

2. Preparing the Filters

The following code prepares the filter bank for the first convolution layer (Layer 1, abbreviated l1; the same convention is used below):

l1_filter = numpy.zeros((2,3,3))

A zero array is created according to the number of filters and the size of each filter. The code above creates two filters of size 3x3; in (2,3,3), the 2 is the number of filters (num_filters) and the two 3s are the number of rows and columns of each filter. Because the input image is grayscale and is read as a 2-D matrix, each filter is a 2-D array with no depth axis. If the image were a color image (three channels, RGB), each filter would have to be of size (3,3,3), where the last 3 is the depth, and the code above would change to (2,3,3,3).
The size of the filter bank is chosen by you, but no concrete filter values are given; in practice they are usually initialized randomly. The following set of values can be used to detect vertical and horizontal edges:

l1_filter[0, :, :] = numpy.array([[[-1, 0, 1],
                                   [-1, 0, 1],
                                   [-1, 0, 1]]])
l1_filter[1, :, :] = numpy.array([[[1,   1,  1],
                                   [0,   0,  0],
                                   [-1, -1, -1]]])
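The random initialization mentioned above can be sketched as follows; the shapes follow the text, while the values themselves are arbitrary:

```python
import numpy

# Random initialization of a filter bank, as typically done in practice.
# For a grayscale input: 2 filters of size 3x3.
l1_filter = numpy.random.rand(2, 3, 3)

# For an RGB input the bank needs a depth axis as well: (2, 3, 3, 3).
l1_filter_rgb = numpy.random.rand(2, 3, 3, 3)

print(l1_filter.shape, l1_filter_rgb.shape)
```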

3. Convolution Layer (Conv Layer)

With the filters in place, the next step is to convolve them with the input image. The following line convolves the image with the filter bank using the conv function:

l1_feature_map = conv(img, l1_filter)

The conv function accepts only two parameters: the input image and the filter bank:

import numpy
import sys

def conv(img, conv_filter):
    if len(img.shape) > 2 or len(conv_filter.shape) > 3: # Check if number of image channels matches the filter depth.
        if img.shape[-1] != conv_filter.shape[-1]:
            print("Error: Number of channels in both image and filter must match.")
            sys.exit()
    if conv_filter.shape[1] != conv_filter.shape[2]: # Check if filter dimensions are equal.
        print('Error: Filter must be a square matrix. I.e. number of rows and columns must match.')
        sys.exit()
    if conv_filter.shape[1] % 2 == 0: # Check if filter dimensions are odd.
        print('Error: Filter must have an odd size. I.e. number of rows and columns must be odd.')
        sys.exit()

    # An empty feature map to hold the output of convolving the filter(s) with the image.
    feature_maps = numpy.zeros((img.shape[0]-conv_filter.shape[1]+1,
                                img.shape[1]-conv_filter.shape[1]+1,
                                conv_filter.shape[0]))

    # Convolving the image by the filter(s).
    for filter_num in range(conv_filter.shape[0]):
        print("Filter ", filter_num + 1)
        curr_filter = conv_filter[filter_num, :] # Getting a filter from the bank.
        """
        Checking if there are multiple channels for the single filter.
        If so, then each channel will convolve the image.
        The results of all convolutions are summed to return a single feature map.
        """
        if len(curr_filter.shape) > 2:
            conv_map = conv_(img[:, :, 0], curr_filter[:, :, 0]) # Array holding the sum of all feature maps.
            for ch_num in range(1, curr_filter.shape[-1]): # Convolving each channel with the image and summing the results.
                conv_map = conv_map + conv_(img[:, :, ch_num],
                                            curr_filter[:, :, ch_num])
        else: # There is just a single channel in the filter.
            conv_map = conv_(img, curr_filter)
        feature_maps[:, :, filter_num] = conv_map # Holding feature map with the current filter.
    return feature_maps # Returning all feature maps.

The function first makes sure that the depth of each filter equals the number of image channels, as shown below. The if statement checks whether the image and the filter each have a depth axis; if so, their channel counts must match, and an error is raised otherwise.

if len(img.shape) > 2 or len(conv_filter.shape) > 3: # Check if number of image channels matches the filter depth.
    if img.shape[-1] != conv_filter.shape[-1]:
        print("Error: Number of channels in both image and filter must match.")

In addition, every filter must be square and its size must be odd. This is checked by the following two if blocks; if either condition fails, the program prints an error and exits.

if conv_filter.shape[1] != conv_filter.shape[2]: # Check if filter dimensions are equal.
    print('Error: Filter must be a square matrix. I.e. number of rows and columns must match.')
    sys.exit()
if conv_filter.shape[1] % 2 == 0: # Check if filter dimensions are odd.
    print('Error: Filter must have an odd size. I.e. number of rows and columns must be odd.')
    sys.exit()

Once all of these conditions are satisfied, an empty array is initialized to hold the output feature maps:

# An empty feature map to hold the output of convolving the filter(s) with the image.
feature_maps = numpy.zeros((img.shape[0]-conv_filter.shape[1]+1,
                            img.shape[1]-conv_filter.shape[1]+1,
                            conv_filter.shape[0]))

Because no stride or padding is specified, the stride defaults to 1 and no padding is applied. The feature maps produced by the convolution therefore have size (img_rows-filter_rows+1, image_columns-filter_columns+1, num_filters): each spatial dimension of the input shrinks by the filter size minus 1. Note that each filter produces one feature map.
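This output-size rule can be checked with a minimal helper; the 100x100 dimensions below are made-up example values:

```python
def conv_output_shape(img_rows, img_cols, filter_size, num_filters):
    # Valid convolution, stride 1, no padding:
    # each spatial dimension shrinks by (filter_size - 1).
    return (img_rows - filter_size + 1,
            img_cols - filter_size + 1,
            num_filters)

print(conv_output_shape(100, 100, 3, 2))  # (98, 98, 2)
print(conv_output_shape(100, 100, 5, 3))  # (96, 96, 3)
```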

# Convolving the image by the filter(s).
for filter_num in range(conv_filter.shape[0]):
    print("Filter ", filter_num + 1)
    curr_filter = conv_filter[filter_num, :] # Getting a filter from the bank.
    """
    Checking if there are multiple channels for the single filter.
    If so, then each channel will convolve the image.
    The results of all convolutions are summed to return a single feature map.
    """
    if len(curr_filter.shape) > 2:
        conv_map = conv_(img[:, :, 0], curr_filter[:, :, 0]) # Array holding the sum of all feature maps.
        for ch_num in range(1, curr_filter.shape[-1]): # Convolving each channel with the image and summing the results.
            conv_map = conv_map + conv_(img[:, :, ch_num],
                                        curr_filter[:, :, ch_num])
    else: # There is just a single channel in the filter.
        conv_map = conv_(img, curr_filter)
    feature_maps[:, :, filter_num] = conv_map # Holding feature map with the current filter.

The outer loop iterates over each filter in the bank and selects the current filter with the following line:

curr_filter = conv_filter[filter_num, :] # Getting a filter from the bank.

If the input image has more than one channel, the filter must have the same number of channels; only then can the convolution proceed, with each channel convolved separately and the results summed into a single output feature map. The code below checks the number of channels; if the image has only one channel, a single convolution completes the whole process:

if len(curr_filter.shape) > 2:
    conv_map = conv_(img[:, :, 0], curr_filter[:, :, 0]) # Array holding the sum of all feature maps.
    for ch_num in range(1, curr_filter.shape[-1]): # Convolving each channel with the image and summing the results.
        conv_map = conv_map + conv_(img[:, :, ch_num],
                                    curr_filter[:, :, ch_num])
else: # There is just a single channel in the filter.
    conv_map = conv_(img, curr_filter)

The conv_ function used above differs from the earlier conv function: conv accepts only the input image and the filter bank and does not perform the convolution itself; it merely hands each filter (or filter channel) to conv_, which carries out the actual convolution. The implementation of conv_ is:

def conv_(img, conv_filter):
    filter_size = conv_filter.shape[0]
    result = numpy.zeros(img.shape)
    # Looping through the image to apply the convolution operation.
    for r in numpy.uint16(numpy.arange(0, img.shape[0]-filter_size+1)):
        for c in numpy.uint16(numpy.arange(0, img.shape[1]-filter_size+1)):
            # Getting the current region to get multiplied with the filter.
            curr_region = img[r:r+filter_size, c:c+filter_size]
            # Element-wise multiplication between the current region and the filter.
            curr_result = curr_region * conv_filter
            conv_sum = numpy.sum(curr_result) # Summing the result of multiplication.
            result[r, c] = conv_sum # Saving the summation in the convolution layer feature map.

    # Trimming the unused borders of the result matrix.
    final_result = result[0:img.shape[0]-filter_size+1,
                          0:img.shape[1]-filter_size+1]
    return final_result

Each filter slides over the image, and at every position a region of the same size as the filter is extracted:

curr_region = img[r:r+filter_size, c:c+filter_size]

The region and the filter are then multiplied element-wise, and the products are summed to produce a single output value:

# Element-wise multiplication between the current region and the filter.
curr_result = curr_region * conv_filter
conv_sum = numpy.sum(curr_result) # Summing the result of multiplication.
result[r, c] = conv_sum # Saving the summation in the convolution layer feature map.
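Here is a tiny worked example of this multiply-and-sum step, reusing the vertical-edge filter from Section 2 on a made-up 3x3 region:

```python
import numpy

# A made-up 3x3 image region.
curr_region = numpy.array([[1, 2, 3],
                           [4, 5, 6],
                           [7, 8, 9]])

# The vertical-edge filter from the l1 filter bank.
conv_filter = numpy.array([[-1, 0, 1],
                           [-1, 0, 1],
                           [-1, 0, 1]])

curr_result = curr_region * conv_filter  # Element-wise product.
conv_sum = numpy.sum(curr_result)        # Single output value.
print(conv_sum)  # 6
```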

After the input image has been convolved with every filter, conv returns the feature maps. The figure below shows the feature maps returned by the conv layer (since the l1 filter bank has shape (2,3,3), i.e. two 3x3 kernels, two feature maps are produced):

[Figure: the feature maps after convolution]


A convolution layer is typically followed by an activation layer; this article uses the ReLU activation function.

4. ReLU Activation Layer

The ReLU layer applies the ReLU activation function to every element of the feature maps output by the conv layer. It is invoked with the following line:

l1_feature_map_relu = relu(l1_feature_map)

The ReLU activation function is implemented as follows:

def relu(feature_map):
    # Preparing the output of the ReLU activation function.
    relu_out = numpy.zeros(feature_map.shape)
    for map_num in range(feature_map.shape[-1]):
        for r in numpy.arange(0, feature_map.shape[0]):
            for c in numpy.arange(0, feature_map.shape[1]):
                relu_out[r, c, map_num] = numpy.max([feature_map[r, c, map_num], 0])
    return relu_out

The idea behind ReLU is simple: compare each element of the feature map with 0; if it is greater than 0, keep the original value, otherwise set it to 0. The output of the ReLU layer is shown below:
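This element-wise comparison can also be vectorized. The sketch below checks, on a small made-up map, that numpy.maximum(feature_map, 0) agrees with the looped version:

```python
import numpy

feature_map = numpy.array([[[-1.5], [2.0]],
                           [[0.0], [-3.0]]])  # Shape (2, 2, 1), made-up values.

# Looped version, mirroring the relu function above.
relu_loop = numpy.zeros(feature_map.shape)
for map_num in range(feature_map.shape[-1]):
    for r in range(feature_map.shape[0]):
        for c in range(feature_map.shape[1]):
            relu_loop[r, c, map_num] = max(feature_map[r, c, map_num], 0)

# Vectorized equivalent: element-wise max against 0.
relu_vec = numpy.maximum(feature_map, 0)
print(numpy.array_equal(relu_loop, relu_vec))  # True
```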

[Figure: the feature maps after the ReLU layer]


An activation layer is typically followed by a pooling layer; this article uses max pooling.

5. Max Pooling Layer

The output of the ReLU layer is the input to the max-pooling layer, which is invoked with the following line:

l1_feature_map_relu_pool = pooling(l1_feature_map_relu, 2, 2)

The max-pooling function is implemented as follows:

def pooling(feature_map, size=2, stride=2):
    # Preparing the output of the pooling operation.
    pool_out = numpy.zeros((numpy.uint16((feature_map.shape[0]-size)/stride+1),
                            numpy.uint16((feature_map.shape[1]-size)/stride+1),
                            feature_map.shape[-1]))
    for map_num in range(feature_map.shape[-1]):
        r2 = 0
        for r in numpy.arange(0, feature_map.shape[0]-size+1, stride):
            c2 = 0
            for c in numpy.arange(0, feature_map.shape[1]-size+1, stride):
                pool_out[r2, c2, map_num] = numpy.max(feature_map[r:r+size, c:c+size, map_num])
                c2 = c2 + 1
            r2 = r2 + 1
    return pool_out

The function accepts three parameters: the output of the ReLU layer, the size of the pooling mask, and the stride. As before, it first creates an empty array to hold the output; its size is determined by the input feature map dimensions, the mask size, and the stride.

pool_out = numpy.zeros((numpy.uint16((feature_map.shape[0]-size)/stride+1),
                        numpy.uint16((feature_map.shape[1]-size)/stride+1),
                        feature_map.shape[-1]))

Max pooling is applied to each channel of the input feature maps, returning the maximum value within each region:

pool_out[r2, c2, map_num] = numpy.max(feature_map[r:r+size, c:c+size, map_num])
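As an independent sanity check of the pooling logic, here is a made-up 4x4 single-channel map pooled with size 2 and stride 2, using the standard output-size formula floor((H - size) / stride) + 1:

```python
import numpy

feature_map = numpy.array([[ 1,  2,  5,  6],
                           [ 3,  4,  7,  8],
                           [ 9, 10, 13, 14],
                           [11, 12, 15, 16]], dtype=float)[:, :, numpy.newaxis]

size, stride = 2, 2
out_h = (feature_map.shape[0] - size) // stride + 1
out_w = (feature_map.shape[1] - size) // stride + 1
pool_out = numpy.zeros((out_h, out_w, feature_map.shape[-1]))

for map_num in range(feature_map.shape[-1]):
    for r2, r in enumerate(range(0, feature_map.shape[0] - size + 1, stride)):
        for c2, c in enumerate(range(0, feature_map.shape[1] - size + 1, stride)):
            # Maximum of each non-overlapping 2x2 block.
            pool_out[r2, c2, map_num] = numpy.max(feature_map[r:r+size, c:c+size, map_num])

print(pool_out[:, :, 0])  # [[ 4.  8.] [12. 16.]]
```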

The output of the pooling layer is shown below. The images are rendered at the same display size for readability, but the pooled maps are actually much smaller than their inputs.

[Figure: the feature maps after the pooling layer]

6. Stacking Layers

The basic CNN layers, conv, ReLU, and max pooling, have now been implemented; the next step is to stack them, as in the following code:

# Second conv layer
l2_filter = numpy.random.rand(3, 5, 5, l1_feature_map_relu_pool.shape[-1])
print("\n**Working with conv layer 2**")
l2_feature_map = conv(l1_feature_map_relu_pool, l2_filter)
print("\n**ReLU**")
l2_feature_map_relu = relu(l2_feature_map)
print("\n**Pooling**")
l2_feature_map_relu_pool = pooling(l2_feature_map_relu, 2, 2)
print("**End of conv layer 2**\n")

In this code, l2 denotes the second convolution layer, whose filter bank uses three 5x5 kernels convolved with the output of the first layer, yielding three feature maps, followed by ReLU activation and max pooling. The result of each operation is visualized below:

[Figure: visualization of the l2 layer processing steps]

# Third conv layer
l3_filter = numpy.random.rand(1, 7, 7, l2_feature_map_relu_pool.shape[-1])
print("\n**Working with conv layer 3**")
l3_feature_map = conv(l2_feature_map_relu_pool, l3_filter)
print("\n**ReLU**")
l3_feature_map_relu = relu(l3_feature_map)
print("\n**Pooling**")
l3_feature_map_relu_pool = pooling(l3_feature_map_relu, 2, 2)
print("**End of conv layer 3**\n")

Similarly, l3 denotes the third convolution layer, which uses a single 7x7 kernel convolved with the output of the second layer to produce one feature map, again followed by ReLU activation and max pooling. The result of each operation is visualized below:

[Figure: visualization of the l3 layer processing steps]


The basic structure of a neural network is that each layer's output becomes the next layer's input: l2 receives l1's output and l3 receives l2's output, as in the following code:

l2_feature_map = conv(l1_feature_map_relu_pool, l2_filter)
l3_feature_map = conv(l2_feature_map_relu_pool, l3_filter)
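To see how the shapes evolve through the stack, the following sketch traces them symbolically. It assumes the grayscale chelsea image is 300x451 and uses valid convolution plus the pooling size formula floor((H - size) / stride) + 1; the resulting numbers are estimates under those assumptions:

```python
def conv_shape(shape, filter_size, num_filters):
    # Valid convolution, stride 1: each spatial dim shrinks by filter_size - 1.
    h, w = shape[0], shape[1]
    return (h - filter_size + 1, w - filter_size + 1, num_filters)

def pool_shape(shape, size=2, stride=2):
    # Standard pooling output size: floor((H - size) / stride) + 1.
    h, w, d = shape
    return ((h - size) // stride + 1, (w - size) // stride + 1, d)

shape = (300, 451)                            # Grayscale input (assumed size).
shape = pool_shape(conv_shape(shape, 3, 2))   # Layer 1: 2 filters of 3x3.
print("after layer 1:", shape)
shape = pool_shape(conv_shape(shape, 5, 3))   # Layer 2: 3 filters of 5x5.
print("after layer 2:", shape)
shape = pool_shape(conv_shape(shape, 7, 1))   # Layer 3: 1 filter of 7x7.
print("after layer 3:", shape)
```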

7. Complete Code

The complete code has been uploaded to GitHub; the per-layer visualizations are produced with the Matplotlib library.

About the Author

Ahmed Gad; research interests include deep learning, artificial intelligence, and computer vision.
Profile: https://www.linkedin.com/in/ahmedfgad/
