Deep Learning: Convolutional Neural Networks (CNN) Explained, with Code (Part 1)

                                Convolutional Neural Networks (CNN) Explained, with Code

                                           This article is the author's original work. Please credit the source when reposting: https://www.cnblogs.com/further-further-further/p/10430073.html

Contents

1. Application scenarios

2. CNN architecture

 2.1 Convolution

 2.2 ReLU activation function

 2.3 Pooling (pool)

 2.4 Fully connected layer (full connection)

 2.5 Loss function (softmax_loss)

 2.6 Forward propagation

 2.7 Back propagation

 2.8 Stochastic gradient descent with momentum (sgd_momentum)

3. Code flow chart and walkthrough

4. Code implementation (Python 3.6)

5. Results and analysis

6. References

 

1. Application scenarios

Convolutional neural networks are used very widely, mainly in two broad categories: data prediction and image processing. Data prediction needs little elaboration; image processing mainly covers classification, detection, recognition, and segmentation.

Image classification: scene classification, object classification

Image detection: saliency detection, object detection, semantic detection, etc.

Image recognition: face recognition, character recognition, license plate recognition, action recognition, gait recognition, etc.

Image segmentation: foreground segmentation, semantic segmentation

2. CNN architecture

A convolutional neural network consists mainly of an input layer, convolutional layers, activation functions, pooling layers, fully connected layers, and a loss function. It looks complicated on the surface, but in essence it is feature extraction followed by decision inference.

To make feature extraction as accurate as possible, these layers are combined into a network, as in the classic AlexNet model: 5 convolutional layers + 3 pooling layers + 3 fully connected layers.

2.1 Convolution

Convolution extracts features. A single convolution tends to capture only rather coarse features, so convolutions are applied many times and stacked layer upon layer, extracting features at every depth (note that stacking layers is not the same thing as applying several convolutions within one layer; each layer already contains many convolution kernels).

A natural question here: why convolve layer after layer, and several times within each layer?

Think of it like this: substance A has several features of its own (tall, short, fat, thin, ...), so it must be probed several times to obtain those different features; the features then combine and "react" to form substance B,

which in turn has new features of its own, and therefore needs further convolution. That is my personal intuition; if it is off, or you have a more vivid analogy, please do share.

 

Within the convolutional layers, the kernels differ from layer to layer. Take AlexNet:

Layer 1: 96*11*11 (96 is the number of kernels; 11*11 is the kernel width * height), stride = 4, pad (zero padding) = 0

Layer 2: 256*5*5, stride = 1, pad = 2

Layers 3 and 4: 384*3*3, stride = 1, pad = 1

Layer 5: 256*3*3, stride = 1, pad = 1

That is a lot of talk about convolution, but how is it actually computed? Explanations of the convolution operation are everywhere online, yet they often feel less than thorough,

so hopefully the figure below (the work of some expert) will make it genuinely clear.

 

 

The example here uses a 5*5*3 input image, two 3*3*3 kernels (Filter W0 and W1), two biases (Bias b0 and b1), an output volume of 3*3*2, and stride = 2.

The input is drawn as 7*7*3 because pad = 1 (one row and one column of zeros are added along every border of the image).

(For a color image there are normally three colors, RGB, i.e. 3 channels; 7*7 is the image height h * width w.)

Zero padding makes it possible to extract features right at the image border.

Why is the kernel depth set to 3? Because the input has 3 channels, the kernel depth must match the input depth. The kernel width w and height h may vary, but they are kept equal (square kernels).

The output o[0,0,0] = 3 (the light-green cell of the Output Volume). How is it obtained? The key is element-wise multiplication followed by summation (be careful not to confuse this with matrix multiplication):

=> w0[:,:,0] * x[:,:,0] blue-box region (R channel) + w0[:,:,1] * x[:,:,1] blue-box region (G channel) + w0[:,:,2] * x[:,:,2] blue-box region (B channel) + b0 (never drop the bias, since y = w * x + b)

Term 1 => 0 * 1 + 0 * 1 + 0 * 1 + 0 * (-1) + 1 * (-1) + 1 * 0 + 0 * (-1) + 1 * 1 + 1 * 0 = 0

Term 2 => 0 * (-1) + 0 * (-1) + 0 * 1 + 0 * (-1) + 0 * 1 + 1 * 0 + 0 * (-1) + 2 * 1 + 2 * 0 = 2

Term 3 => 0 * 1 + 0 * 0 + 0 * (-1) + 0 * 0 + 2 * 0 + 2 * 0 + 0 * 1 + 0 * (-1) + 0 * (-1) = 0

Output o[0,0,0] => Term 1 + Term 2 + Term 3 + b0 = 0 + 2 + 0 + 1 = 3

How is o[0,0,1] = -5 obtained?

Since stride = 2, the input window slides over by two positions (the red-box region), and the computation is exactly the same as before.

Term 1 => 0 * 1 + 0 * 1 + 0 * 1 + 1 * (-1) + 2 * (-1) + 2 * 0 + 1 * (-1) + 1 * 1 + 2 * 0 = -3

Term 2 => 0 * (-1) + 0 * (-1) + 0 * 1 + 1 * (-1) + 2 * 1 + 0 * 0 + 2 * (-1) + 1 * 1 + 1 * 0 = 0

Term 3 => 0 * 1 + 0 * 0 + 0 * (-1) + 2 * 0 + 0 * 0 + 1 * 0 + 0 * 1 + 2 * (-1) + 1 * (-1) = -3

Output o[0,0,1] => Term 1 + Term 2 + Term 3 + b0 = (-3) + 0 + (-3) + 1 = -5

The kernel window then keeps sliding over the input image in the same way, computing the convolution at each position; since there are two kernels, there are two output maps.

You may wonder: how is the size of the output window determined?

There is a formula for it: output width w = (input width w - kernel width w + 2 * pad) / stride + 1, and the output height h is computed the same way (equal to the width here).

For the example above, output width w = (5 - 3 + 2 * 1)/2 + 1 = 3, so the output window is 3*3; with 2 kernels the output volume is 3*3*2.
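As an aside from me (not part of the original post), the sliding-window computation above can be written out in a few lines of NumPy. This is only an illustrative sketch: the shapes follow the example (5*5*3 input, two 3*3*3 kernels, pad = 1, stride = 2), but the values are random rather than the figure's numbers.

import numpy as np

np.random.seed(0)
x = np.random.randint(0, 3, size=(5, 5, 3))        # input image, H*W*C
w = np.random.randint(-1, 2, size=(2, 3, 3, 3))    # 2 kernels, each H*W*C = 3*3*3
b = np.array([1, 0])                               # one bias per kernel
pad, stride = 1, 2

x_pad = np.pad(x, ((pad, pad), (pad, pad), (0, 0)), mode='constant')   # 7*7*3
out_size = (x.shape[0] - w.shape[1] + 2 * pad) // stride + 1           # (5 - 3 + 2)/2 + 1 = 3
out = np.zeros((out_size, out_size, w.shape[0]))

for f in range(w.shape[0]):            # each kernel produces one output map
    for i in range(out_size):
        for j in range(out_size):
            window = x_pad[i*stride:i*stride+3, j*stride:j*stride+3, :]
            # element-wise multiply over all 3 channels, sum, then add the bias
            out[i, j, f] = np.sum(window * w[f]) + b[f]

print(out.shape)   # (3, 3, 2)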

2.2 ReLU activation function

Anyone who has looked at a CNN architecture knows that activation functions are everywhere; in a CNN in particular, ReLU shows up right after the convolutional layers and after the fully connected (FC) layers.

That naturally raises two questions:

Question 1: why use an activation function at all? What does it do?

Question 2: why use ReLU in a CNN? What advantage does it have over sigmoid and tanh?

For question 1: from y = w * x + b we can see that without an activation function every layer produces a purely linear output, whereas the real-world scenarios we model are mostly nonlinear.

So the role of the activation function is to turn a linear mapping into a nonlinear one, which lets the network approximate the real situation much more closely (see the small sketch below).
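To see why, here is a tiny check of my own (purely illustrative): two stacked layers of the form y = w * x + b with no activation collapse into a single linear layer, while putting a ReLU between them breaks that equivalence.

import numpy as np

np.random.seed(0)
x = np.random.randn(4, 3)
W1, b1 = np.random.randn(3, 5), np.random.randn(5)
W2, b2 = np.random.randn(5, 2), np.random.randn(2)

# two stacked linear layers ...
h = x.dot(W1) + b1
out_linear = h.dot(W2) + b2
# ... are exactly one linear layer with W = W1*W2 and b = b1*W2 + b2
W, b = W1.dot(W2), b1.dot(W2) + b2
print(np.allclose(out_linear, x.dot(W) + b))    # True: still a linear model

# inserting a ReLU makes the composition nonlinear
out_nonlinear = np.maximum(0, h).dot(W2) + b2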

For question 2: first look at the sigmoid and tanh curves.

As x → ±∞ their outputs saturate to constant values. Computing gradients requires the first derivative of the function, and in those saturated regions the derivatives of both sigmoid and tanh go to 0,

which is the so-called vanishing gradient problem; in the end the weights w and b simply stop updating. ReLU has no such problem, and moreover for x > 0

its derivative is exactly 1, which greatly simplifies computing dw and db during back propagation.

Sigmoid can also run into exploding values: when the forward and backward passes are iterated a very large number of times, because sigmoid is built on the exponential function,

certain values can accumulate across iterations and grow exponentially, eventually producing NaN and overflowing.
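A quick numerical comparison of the two derivatives (again just an illustrative sketch of mine): the sigmoid gradient is at most 0.25 and shrinks toward 0 for large |x|, while the ReLU gradient stays exactly 1 wherever x > 0.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1 - s)              # at most 0.25, close to 0 for large |x|

def relu_grad(x):
    return (x > 0).astype(float)    # exactly 1 wherever x > 0

x = np.array([-10.0, -1.0, 0.5, 10.0])
print(sigmoid_grad(x))   # [~0.00005, 0.197, 0.235, ~0.00005]  -> vanishing at both ends
print(relu_grad(x))      # [0., 0., 1., 1.]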

2.3 Pooling

A pooling layer usually follows convolution + ReLU. Its purposes are:

1. Shrink the input (width and height only, not depth) and keep the dominant features. (Admittedly some feature information is lost after pooling, which is why some classic models drop the pooling layer altogether.)

The goal is obvious: to reduce the amount of computation in the layers that follow.

2. The usual variants are mean_pooling (average pooling) and max_pooling (max pooling); they give the features a degree of invariance to translation and rotation of the input.

mean_pooling takes the average over each pooling region of the input. Note that the pooling window slides over the input with the given stride, typically stride = 2. (The figure is borrowed; thanks to its original author.)

The rightmost 7/4 in the figure comes from (1 + 1 + 2 + 3)/4.

 

 

max_pooling takes the maximum of each pooling region and places it at the corresponding position of the output, as in the sketch below.
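Here is a minimal sketch of both pooling variants on a small 4*4 input with a 2*2 window and stride 2 (my own toy numbers, not the figure's data):

import numpy as np

x = np.array([[1., 1., 2., 4.],
              [5., 6., 7., 8.],
              [3., 2., 1., 0.],
              [1., 2., 3., 4.]])
size, stride = 2, 2
H_out = (x.shape[0] - size) // stride + 1
W_out = (x.shape[1] - size) // stride + 1
max_out = np.zeros((H_out, W_out))
mean_out = np.zeros((H_out, W_out))

for i in range(H_out):
    for j in range(W_out):
        window = x[i*stride:i*stride+size, j*stride:j*stride+size]
        max_out[i, j] = window.max()      # max pooling
        mean_out[i, j] = window.mean()    # mean pooling

print(max_out)    # [[6. 8.] [3. 4.]]
print(mean_out)   # [[3.25 5.25] [2.   2.  ]]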

2.4 Fully connected layer (full connection)

Role: it acts as the classifier, mapping the extracted features onto the label space; in essence it is an affine transformation (a matrix multiply plus bias).

For how the transformation is implemented, see the code flow chart later on, or better yet step through the code; a small shape-level sketch follows.
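As a shape-level illustration (the variable names here are just for this sketch; the real implementation is affine_forward in layers.py below), the fully connected step flattens each example and multiplies it by a weight matrix:

import numpy as np

N = 2                                    # batch of 2 examples
x = np.random.randn(N, 32, 16, 16)       # pooled feature maps, as later in this post
w = np.random.randn(32 * 16 * 16, 100)   # weight matrix of shape (D, M) = (8192, 100)
b = np.random.randn(100)

x_row = x.reshape(N, -1)                 # flatten each example: (2, 8192)
out = x_row.dot(w) + b                   # affine transform: (2, 100)
print(out.shape)                         # (2, 100)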

2.5 Loss function (softmax_loss)

Role: compute the loss, from which the gradient grad is derived.

Commonly used losses include MSE (mean squared error), the SVM hinge loss, and the Cross Entropy loss.

It is hard to say in general which of these is best; that can only be settled in a concrete application. For an introduction to them,

see 《常用損失函數小結》 https://blog.csdn.net/zhangjunp3/article/details/80467350, which covers them in good detail.

The code example later in this post uses softmax_loss, which is a form of Cross Entropy loss.

The softmax formula:

S_j = exp(z_j) / Σ_k exp(z_k)

where z_j is the network output (score) for the class j being evaluated, the denominator sums the exponentiated outputs over all C classes, and S_j is the probability of class j.

Cross-entropy loss:

L = -(1/N) Σ_i Σ_j y_ij * ln(S_ij)

where y_ij is the true (one-hot) label of sample i for class j, S_ij is the predicted probability of class j, N is the number of samples, and C is the number of classes.

Gradient with respect to the score z_j:

dL_i/dz_j = S_j        when j != y_i

dL_i/dz_j = S_j - 1    when j = y_i

where y_i is the index of the true class of sample i and j is the class index.

This part is a bit of a grind; for the full explanation and derivation see this blog post: http://www.cnblogs.com/zongfa/p/8971213.html
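Putting the formulas above into code gives a compact, numerically stable softmax loss. This sketch mirrors the softmax_loss function that appears later in layers.py:

import numpy as np

def softmax_loss(x, y):
    # x: scores of shape (N, C); y: integer labels of shape (N,)
    probs = np.exp(x - np.max(x, axis=1, keepdims=True))   # subtract the max for stability
    probs /= np.sum(probs, axis=1, keepdims=True)          # S_j for every sample
    N = x.shape[0]
    loss = -np.sum(np.log(probs[np.arange(N), y])) / N     # cross-entropy
    dx = probs.copy()
    dx[np.arange(N), y] -= 1                               # S_j - 1 at the true class
    dx /= N
    return loss, dx

scores = np.array([[2.0, 1.0, 0.1]])
loss, dx = softmax_loss(scores, np.array([0]))
print(loss)   # ~0.417
print(dx)     # [[-0.341  0.242  0.099]]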

2.6 Forward propagation

Forward propagation covers the convolution, ReLU activation, pooling (pool), and fully connected (fc) steps described above; in fact, everything that happens before the loss function belongs to the forward pass.

Together with the initialization of the weights w and b and their iterative updates, this is what produces the classifier model.

2.7 Back propagation

Back propagation starts from the loss function: the gradients are used to compute dw and db, and the signal is propagated back through the ReLU, the pooling layer (unpooling), and the fully connected layer.
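A common way to sanity-check a backward pass (this check is my addition, not part of the original code) is to compare the analytic gradient against a numerical finite-difference gradient:

import numpy as np

def numerical_grad(f, x, h=1e-5):
    """Centered finite differences, one element of x at a time."""
    grad = np.zeros_like(x)
    it = np.nditer(x, flags=['multi_index'])
    while not it.finished:
        idx = it.multi_index
        old = x[idx]
        x[idx] = old + h; fp = f(x)
        x[idx] = old - h; fm = f(x)
        x[idx] = old
        grad[idx] = (fp - fm) / (2 * h)
        it.iternext()
    return grad

# example: gradient of sum(relu(x)) with respect to x
x = np.random.randn(3, 4)
analytic = (x > 0).astype(float)
numeric = numerical_grad(lambda t: np.maximum(0, t).sum(), x)
print(np.max(np.abs(analytic - numeric)))   # should be ~0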

2.8 Stochastic gradient descent with momentum (sgd_momentum)

Role: compute the new weight matrix w from the gradient grad.

The plain SGD update rule:

x_{t+1} = x_t - η * g_t

where η is the learning rate and g_t is the gradient of x at step t.

Training normally runs for n epochs, and within each epoch the dataset is split into m batches; every update uses a single batch of data rather than the whole training set.

Advantage: working in batches eases the load on the machine and tends to converge faster.

Disadvantage: the update direction depends entirely on the current batch, so the updates can be quite unstable.

Momentum was introduced to address exactly this; for a detailed treatment see 下路派出所's blog: http://www.cnblogs.com/callyblog/p/8299074.html

Momentum mimics the inertia of a moving object: each update partly keeps the previous update direction and uses the current batch's gradient to fine-tune the final direction.

This adds a degree of stability, makes learning faster, and gives some ability to escape local optima:

v_t = ρ * v_{t-1} - η * g_t,    x_{t+1} = x_t + v_t

where ρ is the momentum coefficient, i.e. how much of the previous update direction is kept; it lies between 0 and 1. At the start of training the gradients can be large, so ρ is usually initialized to 0.5,

and raised to 0.9 once the gradients calm down. η is the learning rate, i.e. how strongly the current batch's gradient influences the final update direction, with the same meaning as in plain SGD. ρ and η need not sum to 1.
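A minimal sketch of the update (it matches the sgd_momentum rule implemented in optim.py below; the toy objective here is my own choice for illustration):

import numpy as np

def sgd_momentum_step(w, dw, v, learning_rate=1e-2, momentum=0.9):
    # keep part of the previous direction, then adjust with the current gradient
    v = momentum * v - learning_rate * dw
    w = w + v
    return w, v

w = np.array([1.0, -2.0])
v = np.zeros_like(w)
for step in range(3):
    dw = 2 * w                      # gradient of f(w) = sum(w**2), just for illustration
    w, v = sgd_momentum_step(w, dw, v)
    print(step, w)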

3. Code flow chart and walkthrough

The code flow chart took quite some effort to put together; hopefully it helps. It is best to read the flow chart and step through the code side by side, to deepen your grasp of the principles.

Pay particular attention to how the tensor dimensions change during forward and back propagation.

 

4. Code implementation

The implementation itself is the work of another author; I have only made small modifications and added comments to the key functions, so please point out anything that looks off.

The raw image dataset is too large to upload here. Download the CIFAR-10 python version (about 163 MB) from http://www.cs.toronto.edu/~kriz/cifar.html

and place it in the same directory as the code files.
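If you prefer to fetch it from a script, something along these lines works; note that the exact archive name cifar-10-python.tar.gz is my assumption based on the download page, so adjust it if the page differs.

import os
import tarfile
import urllib.request

URL = 'http://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz'   # assumed archive name

def fetch_cifar10(dest='.'):
    archive = os.path.join(dest, 'cifar-10-python.tar.gz')
    if not os.path.exists(os.path.join(dest, 'cifar-10-batches-py')):
        urllib.request.urlretrieve(URL, archive)      # ~163 MB download
        with tarfile.open(archive) as tar:
            tar.extractall(dest)                      # unpacks to cifar-10-batches-py/
    return os.path.join(dest, 'cifar-10-batches-py')

if __name__ == '__main__':
    print(fetch_cifar10())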

 start.py

# -*- coding: utf-8 -*-
import matplotlib.pyplot as plt
'''import sibling .py modules from the same directory'''

try:
    from . import data_utils
    from . import solver
    from . import cnn
except Exception:
    import data_utils
    import solver
    import cnn

import numpy as np
# load the sample data
data = data_utils.get_CIFAR10_data()
# initialize the model (weights and corresponding biases w1,b1, w2,b2, w3,b3; how many depends on the number of layers)
model = cnn.ThreeLayerConvNet(reg=0.9)
solver = solver.Solver(model, data,
                lr_decay=0.95,
                print_every=10, num_epochs=5, batch_size=2,
                update_rule='sgd_momentum',
                optim_config={'learning_rate': 5e-4, 'momentum': 0.9})
# train and keep the best model
solver.train()

plt.subplot(2, 1, 1)
plt.title('Training loss')
plt.plot(solver.loss_history, 'o')
plt.xlabel('Iteration')

plt.subplot(2, 1, 2)
plt.title('Accuracy')
plt.plot(solver.train_acc_history, '-o', label='train')
plt.plot(solver.val_acc_history, '-o', label='val')
plt.plot([0.5] * len(solver.val_acc_history), 'k--')
plt.xlabel('Epoch')
plt.legend(loc='lower right')
plt.gcf().set_size_inches(15, 12)
plt.show()


best_model = model
y_test_pred = np.argmax(best_model.loss(data['X_test']), axis=1)
y_val_pred = np.argmax(best_model.loss(data['X_val']), axis=1)
print ('Validation set accuracy: ',(y_val_pred == data['y_val']).mean())
print ('Test set accuracy: ', (y_test_pred == data['y_test']).mean())
# Validation set accuracy:  about 52.9%
# Test set accuracy:  about 54.7%


# Visualize the weights of the best network
"""
from vis_utils import visualize_grid

def show_net_weights(net):
    W1 = net.params['W1']
    W1 = W1.reshape(3, 32, 32, -1).transpose(3, 1, 2, 0)
    plt.imshow(visualize_grid(W1, padding=3).astype('uint8'))
    plt.gca().axis('off')
show_net_weights(best_model)
plt.show()
"""

 

 cnn.py

# -*- coding: utf-8 -*-
try:
    from . import layer_utils
    from . import layers
except Exception:
    import layer_utils
    import layers
import numpy as np

class ThreeLayerConvNet(object):
    """
    A three-layer convolutional network with the following architecture:
       conv - relu - 2x2 max pool - affine - relu - affine - softmax
    """

    def __init__(self, input_dim=(3, 32, 32), num_filters=32, filter_size=7,
                 hidden_dim=100, num_classes=10, weight_scale=1e-3, reg=0.0,
                 dtype=np.float32):
        self.params = {}
        self.reg = reg
        self.dtype = dtype

        # Initialize weights and biases
        C, H, W = input_dim
        self.params['W1'] = weight_scale * np.random.randn(num_filters, C, filter_size, filter_size)
        self.params['b1'] = np.zeros(num_filters)
        self.params['W2'] = weight_scale * np.random.randn(num_filters*H*W//4, hidden_dim)
        self.params['b2'] = np.zeros(hidden_dim)
        self.params['W3'] = weight_scale * np.random.randn(hidden_dim, num_classes)
        self.params['b3'] = np.zeros(num_classes)

        for k, v in self.params.items():
            self.params[k] = v.astype(dtype)


    def loss(self, X, y=None):
        W1, b1 = self.params['W1'], self.params['b1']
        W2, b2 = self.params['W2'], self.params['b2']
        W3, b3 = self.params['W3'], self.params['b3']

        # pass conv_param to the forward pass for the convolutional layer
        filter_size = W1.shape[2]
        conv_param = {'stride': 1, 'pad': (filter_size - 1) // 2}

        # pass pool_param to the forward pass for the max-pooling layer
        pool_param = {'pool_height': 2, 'pool_width': 2, 'stride': 2}

        # compute the forward pass
        a1, cache1 = layer_utils.conv_relu_pool_forward(X, W1, b1, conv_param, pool_param)
        a2, cache2 = layer_utils.affine_relu_forward(a1, W2, b2)
        scores, cache3 = layers.affine_forward(a2, W3, b3)

        if y is None:
            return scores

        # compute the backward pass
        data_loss, dscores = layers.softmax_loss(scores, y)
        da2, dW3, db3 = layers.affine_backward(dscores, cache3)
        da1, dW2, db2 = layer_utils.affine_relu_backward(da2, cache2)
        dX, dW1, db1 = layer_utils.conv_relu_pool_backward(da1, cache1)

        # Add regularization: adjust the gradients and add the regularization term to the loss
        dW1 += self.reg * W1
        dW2 += self.reg * W2
        dW3 += self.reg * W3
        reg_loss = 0.5 * self.reg * sum(np.sum(W * W) for W in [W1, W2, W3])

        loss = data_loss + reg_loss
        grads = {'W1': dW1, 'b1': db1, 'W2': dW2, 'b2': db2, 'W3': dW3, 'b3': db3}

        return loss, grads

 

data_utils.py

  1 # -*- coding: utf-8 -*-
  2 import pickle 
  3 import numpy as np
  4 import os
  5 
  6 #from scipy.misc import imread
  7 
  8 def load_CIFAR_batch(filename):
  9   """ load single batch of cifar """
 10   with open(filename, 'rb') as f:
 11     datadict = pickle.load(f, encoding='bytes')
 12     X = datadict[b'data']
 13     Y = datadict[b'labels']
 14     X = X.reshape(10000, 3, 32, 32).transpose(0,2,3,1).astype("float")
 15     Y = np.array(Y)
 16     return X, Y
 17 
 18 def load_CIFAR10(ROOT):
 19   """ load all of cifar """
 20   xs = []
 21   ys = []
 22   for b in range(1,2):
 23     f = os.path.join(ROOT, 'data_batch_%d' % (b, ))
 24     X, Y = load_CIFAR_batch(f)
 25     xs.append(X)
 26     ys.append(Y)    
 27   Xtr = np.concatenate(xs)
 28   Ytr = np.concatenate(ys)
 29   del X, Y
 30   Xte, Yte = load_CIFAR_batch(os.path.join(ROOT, 'test_batch'))
 31   return Xtr, Ytr, Xte, Yte
 32 
 33 
 34 def get_CIFAR10_data(num_training=500, num_validation=50, num_test=50):
 35 
 36     """
 37     Load the CIFAR-10 dataset from disk and perform preprocessing to prepare
 38     it for classifiers. These are the same steps as we used for the SVM, but
 39     condensed to a single function.
 40     """
 41     # Load the raw CIFAR-10 data
 42 
 43     #cifar10_dir = 'C://download//cifar-10-python//cifar-10-batches-py//'
 44     cifar10_dir = '.\\cifar-10-batches-py\\'
 45     X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir)
 46     print (X_train.shape)
 47     # Subsample the data
 48     mask = range(num_training, num_training + num_validation)
 49     X_val = X_train[mask]
 50     y_val = y_train[mask]
 51     mask = range(num_training)
 52     X_train = X_train[mask]
 53     y_train = y_train[mask]
 54     mask = range(num_test)
 55     X_test = X_test[mask]
 56     y_test = y_test[mask]
 57 
 58     # 標準化數據,求樣本均值,然後 樣本 - 樣本均值,作用:使樣本數據更收斂一些,便於後續處理
 59     # Normalize the data: subtract the mean image
 60     # 如果2維空間 m*n np.mean()後 => 1*n
 61     # 對於4維空間 m*n*k*j np.mean()後 => 1*n*k*j
 62     mean_image = np.mean(X_train, axis=0)
 63     X_train -= mean_image
 64     X_val -= mean_image
 65     X_test -= mean_image
 66 
 67     # 把通道channel 提前
 68     # Transpose so that channels come first
 69     X_train = X_train.transpose(0, 3, 1, 2).copy()
 70     X_val = X_val.transpose(0, 3, 1, 2).copy()
 71     X_test = X_test.transpose(0, 3, 1, 2).copy()
 72 
 73     # Package data into a dictionary
 74     return {
 75       'X_train': X_train, 'y_train': y_train,
 76       'X_val': X_val, 'y_val': y_val,
 77       'X_test': X_test, 'y_test': y_test,
 78     }
 79     
 80 """
 81 def load_tiny_imagenet(path, dtype=np.float32):
 82   
 83   Load TinyImageNet. Each of TinyImageNet-100-A, TinyImageNet-100-B, and
 84   TinyImageNet-200 have the same directory structure, so this can be used
 85   to load any of them.
 86 
 87   Inputs:
 88   - path: String giving path to the directory to load.
 89   - dtype: numpy datatype used to load the data.
 90 
 91   Returns: A tuple of
 92   - class_names: A list where class_names[i] is a list of strings giving the
 93     WordNet names for class i in the loaded dataset.
 94   - X_train: (N_tr, 3, 64, 64) array of training images
 95   - y_train: (N_tr,) array of training labels
 96   - X_val: (N_val, 3, 64, 64) array of validation images
 97   - y_val: (N_val,) array of validation labels
 98   - X_test: (N_test, 3, 64, 64) array of testing images.
 99   - y_test: (N_test,) array of test labels; if test labels are not available
100     (such as in student code) then y_test will be None.
101   
102   # First load wnids
103   with open(os.path.join(path, 'wnids.txt'), 'r') as f:
104     wnids = [x.strip() for x in f]
105 
106   # Map wnids to integer labels
107   wnid_to_label = {wnid: i for i, wnid in enumerate(wnids)}
108 
109   # Use words.txt to get names for each class
110   with open(os.path.join(path, 'words.txt'), 'r') as f:
111     wnid_to_words = dict(line.split('\t') for line in f)
112     for wnid, words in wnid_to_words.iteritems():
113       wnid_to_words[wnid] = [w.strip() for w in words.split(',')]
114   class_names = [wnid_to_words[wnid] for wnid in wnids]
115 
116   # Next load training data.
117   X_train = []
118   y_train = []
119   for i, wnid in enumerate(wnids):
120     if (i + 1) % 20 == 0:
121       print 'loading training data for synset %d / %d' % (i + 1, len(wnids))
122     # To figure out the filenames we need to open the boxes file
123     boxes_file = os.path.join(path, 'train', wnid, '%s_boxes.txt' % wnid)
124     with open(boxes_file, 'r') as f:
125       filenames = [x.split('\t')[0] for x in f]
126     num_images = len(filenames)
127     
128     X_train_block = np.zeros((num_images, 3, 64, 64), dtype=dtype)
129     y_train_block = wnid_to_label[wnid] * np.ones(num_images, dtype=np.int64)
130     for j, img_file in enumerate(filenames):
131       img_file = os.path.join(path, 'train', wnid, 'images', img_file)
132       img = imread(img_file)
133       if img.ndim == 2:
134         ## grayscale file
135         img.shape = (64, 64, 1)
136       X_train_block[j] = img.transpose(2, 0, 1)
137     X_train.append(X_train_block)
138     y_train.append(y_train_block)
139       
140   # We need to concatenate all training data
141   X_train = np.concatenate(X_train, axis=0)
142   y_train = np.concatenate(y_train, axis=0)
143   
144   # Next load validation data
145   with open(os.path.join(path, 'val', 'val_annotations.txt'), 'r') as f:
146     img_files = []
147     val_wnids = []
148     for line in f:
149       img_file, wnid = line.split('\t')[:2]
150       img_files.append(img_file)
151       val_wnids.append(wnid)
152     num_val = len(img_files)
153     y_val = np.array([wnid_to_label[wnid] for wnid in val_wnids])
154     X_val = np.zeros((num_val, 3, 64, 64), dtype=dtype)
155     for i, img_file in enumerate(img_files):
156       img_file = os.path.join(path, 'val', 'images', img_file)
157       img = imread(img_file)
158       if img.ndim == 2:
159         img.shape = (64, 64, 1)
160       X_val[i] = img.transpose(2, 0, 1)
161 
162   # Next load test images
163   # Students won't have test labels, so we need to iterate over files in the
164   # images directory.
165   img_files = os.listdir(os.path.join(path, 'test', 'images'))
166   X_test = np.zeros((len(img_files), 3, 64, 64), dtype=dtype)
167   for i, img_file in enumerate(img_files):
168     img_file = os.path.join(path, 'test', 'images', img_file)
169     img = imread(img_file)
170     if img.ndim == 2:
171       img.shape = (64, 64, 1)
172     X_test[i] = img.transpose(2, 0, 1)
173 
174   y_test = None
175   y_test_file = os.path.join(path, 'test', 'test_annotations.txt')
176   if os.path.isfile(y_test_file):
177     with open(y_test_file, 'r') as f:
178       img_file_to_wnid = {}
179       for line in f:
180         line = line.split('\t')
181         img_file_to_wnid[line[0]] = line[1]
182     y_test = [wnid_to_label[img_file_to_wnid[img_file]] for img_file in img_files]
183     y_test = np.array(y_test)
184   
185   return class_names, X_train, y_train, X_val, y_val, X_test, y_test
186 
187 """
188 def load_models(models_dir):
189   """
190   Load saved models from disk. This will attempt to unpickle all files in a
191   directory; any files that give errors on unpickling (such as README.txt) will
192   be skipped.
193 
194   Inputs:
195   - models_dir: String giving the path to a directory containing model files.
196     Each model file is a pickled dictionary with a 'model' field.
197 
198   Returns:
199   A dictionary mapping model file names to models.
200   """
201   models = {}
202   for model_file in os.listdir(models_dir):
203     with open(os.path.join(models_dir, model_file), 'rb') as f:
204       try:
205         models[model_file] = pickle.load(f)['model']
206       except pickle.UnpicklingError:
207         continue
208   return models

layer_utils.py

 1 # -*- coding: utf-8 -*-
 2 try:
 3   from . import layers
 4 except Exception:
 5   import layers
 6 
 7 
 8 
 9 
10 def affine_relu_forward(x, w, b):
11   """
12   Convenience layer that perorms an affine transform followed by a ReLU
13 
14   Inputs:
15   - x: Input to the affine layer
16   - w, b: Weights for the affine layer
17 
18   Returns a tuple of:
19   - out: Output from the ReLU
20   - cache: Object to give to the backward pass
21   """
22   a, fc_cache = layers.affine_forward(x, w, b)
23   out, relu_cache = layers.relu_forward(a)
24   cache = (fc_cache, relu_cache)
25   return out, cache
26 
27 
28 def affine_relu_backward(dout, cache):
29   """
30   Backward pass for the affine-relu convenience layer
31   """
32   fc_cache, relu_cache = cache
33   da = layers.relu_backward(dout, relu_cache)
34   dx, dw, db = layers.affine_backward(da, fc_cache)
35   return dx, dw, db
36 
37 
38 pass
39 
40 
41 def conv_relu_forward(x, w, b, conv_param):
42   """
43   A convenience layer that performs a convolution followed by a ReLU.
44 
45   Inputs:
46   - x: Input to the convolutional layer
47   - w, b, conv_param: Weights and parameters for the convolutional layer
48   
49   Returns a tuple of:
50   - out: Output from the ReLU
51   - cache: Object to give to the backward pass
52   """
53   a, conv_cache = layers.conv_forward_fast(x, w, b, conv_param)
54   out, relu_cache = layers.relu_forward(a)
55   cache = (conv_cache, relu_cache)
56   return out, cache
57 
58 
59 def conv_relu_backward(dout, cache):
60   """
61   Backward pass for the conv-relu convenience layer.
62   """
63   conv_cache, relu_cache = cache
64   da = layers.relu_backward(dout, relu_cache)
65   dx, dw, db = layers.conv_backward_fast(da, conv_cache)
66   return dx, dw, db
67 
68 
69 def conv_relu_pool_forward(x, w, b, conv_param, pool_param):
70   """
71   Convenience layer that performs a convolution, a ReLU, and a pool.
72 
73   Inputs:
74   - x: Input to the convolutional layer
75   - w, b, conv_param: Weights and parameters for the convolutional layer
76   - pool_param: Parameters for the pooling layer
77 
78   Returns a tuple of:
79   - out: Output from the pooling layer
80   - cache: Object to give to the backward pass
81   """
82   a, conv_cache = layers.conv_forward_naive(x, w, b, conv_param)
83   s, relu_cache = layers.relu_forward(a)
84   out, pool_cache = layers.max_pool_forward_naive(s, pool_param)
85   cache = (conv_cache, relu_cache, pool_cache)
86   return out, cache
87 
88 
89 def conv_relu_pool_backward(dout, cache):
90   """
91   Backward pass for the conv-relu-pool convenience layer
92   """
93   conv_cache, relu_cache, pool_cache = cache
94   ds = layers.max_pool_backward_naive(dout, pool_cache)
95   da = layers.relu_backward(ds, relu_cache)
96   dx, dw, db = layers.conv_backward_naive(da, conv_cache)
97   return dx, dw, db

layers.py

  1 import numpy as np
  2 
  3 '''
  4 全連接層:矩陣變換,獲取對應目標相同的行與列
  5 輸入x: 2*32*16*16 
  6 輸入x_row: 2*8192
  7 超參w:8192*100
  8 輸出:矩陣乘法 2*8192 ->8192*100 =>2*100
  9 '''
 10 def affine_forward(x, w, b):   
 11     """    
 12     Computes the forward pass for an affine (fully-connected) layer. 
 13     The input x has shape (N, d_1, ..., d_k) and contains a minibatch of N   
 14     examples, where each example x[i] has shape (d_1, ..., d_k). We will    
 15     reshape each input into a vector of dimension D = d_1 * ... * d_k, and    
 16     then transform it to an output vector of dimension M.    
 17     Inputs:    
 18     - x: A numpy array containing input data, of shape (N, d_1, ..., d_k)    
 19     - w: A numpy array of weights, of shape (D, M)    
 20     - b: A numpy array of biases, of shape (M,)   
 21     Returns a tuple of:    
 22     - out: output, of shape (N, M)    
 23     - cache: (x, w, b)   
 24     """
 25     out = None
 26     # Reshape x into rows
 27     N = x.shape[0]
 28     x_row = x.reshape(N, -1)         # (N,D) -1表示不知道多少列,指定行,就能算出列 = 2 * 32 * 16 * 16/2 = 8192
 29     out = np.dot(x_row, w) + b       # (N,M) 2*8192 8192*100 =>2 * 100
 30     cache = (x, w, b)
 31 
 32     return out, cache
 33 '''
 34 反向傳播之affine矩陣變換
 35 根據dout求出dx,dw,db
 36 由 out = w * x =>
 37 dx = dout * w
 38 dw = dout * x
 39 db = dout * 1
 40 因爲dx 與 x,dw 與 w,db 與 b 大小(維度)必須相同
 41 dx = dout * wT  矩陣乘法
 42 dw = dxT * dout 矩陣乘法
 43 db = dout 按列求和
 44 '''
 45 def affine_backward(dout, cache):   
 46     """    
 47     Computes the backward pass for an affine layer.    
 48     Inputs:    
 49     - dout: Upstream derivative, of shape (N, M)    
 50     - cache: Tuple of: 
 51     - x: Input data, of shape (N, d_1, ... d_k)    
 52     - w: Weights, of shape (D, M)    
 53     Returns a tuple of:   
 54     - dx: Gradient with respect to x, of shape (N, d1, ..., d_k)
 55       dx = dout * w
 56     - dw: Gradient with respect to w, of shape (D, M)
 57       dw = dout * x
 58     - db: Gradient with respect to b, of shape (M,)
 59       db = dout * 1
 60     """
 61 
 62     x, w, b = cache    
 63     dx, dw, db = None, None, None
 64     dx = np.dot(dout, w.T)                       # (N,D)
 65     # dx維度必須跟x維度相同
 66     dx = np.reshape(dx, x.shape)                 # (N,d1,...,d_k)
 67     # 轉換成二維矩陣
 68     x_row = x.reshape(x.shape[0], -1)            # (N,D)
 69     dw = np.dot(x_row.T, dout)                   # (D,M)
 70 
 71     db = np.sum(dout, axis=0, keepdims=True)     # (1,M)    
 72 
 73     return dx, dw, db
 74 
 75 def relu_forward(x):   
 76     """ 激活函數,解決sigmoid梯度消失問題,網絡性能比sigmoid更好
 77     Computes the forward pass for a layer of rectified linear units (ReLUs).    
 78     Input:    
 79     - x: Inputs, of any shape    
 80     Returns a tuple of:    
 81     - out: Output, of the same shape as x    
 82     - cache: x    
 83     """   
 84     out = None    
 85     out = ReLU(x)    
 86     cache = x    
 87 
 88     return out, cache
 89 
 90 def relu_backward(dout, cache):   
 91     """  
 92     Computes the backward pass for a layer of rectified linear units (ReLUs).   
 93     Input:    
 94     - dout: Upstream derivatives, of any shape    
 95     - cache: Input x, of same shape as dout    
 96     Returns:    
 97     - dx: Gradient with respect to x    
 98     """    
 99     dx, x = None, cache    
100     dx = dout    
101     dx[x <= 0] = 0    
102 
103     return dx
104 
105 def svm_loss(x, y):   
106     """    
107     Computes the loss and gradient using for multiclass SVM classification.    
108     Inputs:    
109     - x: Input data, of shape (N, C) where x[i, j] is the score for the jth class         
110          for the ith input.    
111     - y: Vector of labels, of shape (N,) where y[i] is the label for x[i] and         
112          0 <= y[i] < C   
113     Returns a tuple of:    
114     - loss: Scalar giving the loss   
115     - dx: Gradient of the loss with respect to x    
116     """    
117     N = x.shape[0]   
118     correct_class_scores = x[np.arange(N), y]    
119     margins = np.maximum(0, x - correct_class_scores[:, np.newaxis] + 1.0)    
120     margins[np.arange(N), y] = 0   
121     loss = np.sum(margins) / N   
122     num_pos = np.sum(margins > 0, axis=1)    
123     dx = np.zeros_like(x)   
124     dx[margins > 0] = 1    
125     dx[np.arange(N), y] -= num_pos    
126     dx /= N    
127 
128     return loss, dx
129 '''
130 softmax_loss 求梯度優點: 求梯度運算簡單,方便 
131 softmax: softmax用於多分類過程中,它將多個神經元的輸出,映射到(0,1)區間內,
132 可以看成概率來理解,從而來進行多分類。
133 Si = exp(i)/[exp(j)求和]
134 softmax_loss:損失函數,求梯度dx必須用到損失函數,通過梯度下降更新超參
135 Loss = -[Ypred*ln(Sj真實類別位置的概率值)]求和
136 梯度dx : 對損失函數求一階偏導
137 如果 j = i =>dx = Sj - 1
138 如果 j != i => dx = Sj
139 '''
140 def softmax_loss(x, y):    
141     """    
142     Computes the loss and gradient for softmax classification.    Inputs:    
143     - x: Input data, of shape (N, C) where x[i, j] is the score for the jth class         
144     for the ith input.    
145     - y: Vector of labels, of shape (N,) where y[i] is the label for x[i] and         
146          0 <= y[i] < C   
147     Returns a tuple of:    
148     - loss: Scalar giving the loss    
149     - dx: Gradient of the loss with respect to x   
150     """
151     '''
152      x - np.max(x, axis=1, keepdims=True) 對數據進行預處理,
153      防止np.exp(x - np.max(x, axis=1, keepdims=True))得到結果太分散;
154      np.max(x, axis=1, keepdims=True)保證所得結果維度不變;
155     '''
156     probs = np.exp(x - np.max(x, axis=1, keepdims=True))
157     # 計算softmax,準確的說應該是soft,因爲還沒有選取概率最大值的操作
158     probs /= np.sum(probs, axis=1, keepdims=True)
159     # 樣本圖片個數
160     N = x.shape[0]
161     # 計算圖片損失
162     loss = -np.sum(np.log(probs[np.arange(N), y])) / N
163     # 複製概率
164     dx = probs.copy()
165     # 針對 i = j 求梯度
166     dx[np.arange(N), y] -= 1
167     # 計算每張樣本圖片梯度
168     dx /= N    
169 
170     return loss, dx
171 
172 def ReLU(x):    
173     """ReLU non-linearity."""    
174     return np.maximum(0, x)
175 '''
176 功能:獲取圖片特徵
177 前向卷積:每次用一個3維的卷積核與圖片RGB各個通道分別卷積(卷積核1與R進行點積,卷積核2與G點積,卷積核3與B點積),
178 然後將3個結果求和(也就是 w*x ),再加上 b,就是新結果某一位置輸出,這是卷積核在圖片某一固定小範圍內(卷積核大小)的卷積,
179 要想獲得整個圖片的卷積結果,需要在圖片上滑動卷積核(先右後下),直至遍歷整個圖片。
180 x: 2*3*32*32  每次選取2張圖片,圖片大小32*32,彩色(3通道)
181 w: 32*3*7*7   卷積核每個大小是7*7;對應輸入x的3通道,所以是3維,有32個卷積核
182 pad = 3(圖片邊緣行列補0),stride = 1(卷積核移動步長)
183 輸出寬*高結果:(32-7+2*3)/1 + 1 = 32
184 輸出大小:2*32*32*32
185 '''
186 def conv_forward_naive(x, w, b, conv_param):
187     stride, pad = conv_param['stride'], conv_param['pad']
188     N, C, H, W = x.shape
189     F, C, HH, WW = w.shape
190     x_padded = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)), mode='constant')
191     '''// : 求整型'''
192     H_new = 1 + (H + 2 * pad - HH) // stride
193     W_new = 1 + (W + 2 * pad - WW) // stride
194     s = stride
195     out = np.zeros((N, F, H_new, W_new))
196 
197     for i in range(N):       # ith image    
198         for f in range(F):   # fth filter        
199             for j in range(H_new):            
200                 for k in range(W_new):   
201                     #print x_padded[i, :, j*s:HH+j*s, k*s:WW+k*s].shape
202                     #print w[f].shape  
203                     #print b.shape  
204                     #print np.sum((x_padded[i, :, j*s:HH+j*s, k*s:WW+k*s] * w[f]))         
205                     out[i, f, j, k] = np.sum(x_padded[i, :, j*s:HH+j*s, k*s:WW+k*s] * w[f]) + b[f]
206 
207     cache = (x, w, b, conv_param)
208 
209     return out, cache
210 
211 '''
212 反向傳播之卷積:卷積核3*7*7
213 輸入dout:2*32*32*32
214 輸出dx:2*3*32*32
215 '''
216 def conv_backward_naive(dout, cache):
217 
218     x, w, b, conv_param = cache
219     # 邊界補0
220     pad = conv_param['pad']
221     # 步長
222     stride = conv_param['stride']
223     F, C, HH, WW = w.shape
224     N, C, H, W = x.shape
225     H_new = 1 + (H + 2 * pad - HH) // stride
226     W_new = 1 + (W + 2 * pad - WW) // stride
227 
228     dx = np.zeros_like(x)
229     dw = np.zeros_like(w)
230     db = np.zeros_like(b)
231 
232     s = stride
233     x_padded = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)), 'constant')
234     dx_padded = np.pad(dx, ((0, 0), (0, 0), (pad, pad), (pad, pad)), 'constant')
235     # 圖片個數
236     for i in range(N):       # ith image
237         # 卷積核濾波個數
238         for f in range(F):   # fth filter        
239             for j in range(H_new):            
240                 for k in range(W_new):
241                     # 3*7*7
242                     window = x_padded[i, :, j*s:HH+j*s, k*s:WW+k*s]
243                     db[f] += dout[i, f, j, k]
244                     # 3*7*7
245                     dw[f] += window * dout[i, f, j, k]
246                     # 3*7*7 => 2*3*38*38
247                     dx_padded[i, :, j*s:HH+j*s, k*s:WW+k*s] += w[f] * dout[i, f, j, k]
248 
249     # Unpad
250     dx = dx_padded[:, :, pad:pad+H, pad:pad+W]
251 
252     return dx, dw, db
253 '''
254 功能:減少特徵尺寸大小
255 前向最大池化:在特徵矩陣中選取指定大小窗口,獲取窗口內元素最大值作爲輸出窗口映射值,
256 先有後下遍歷,直至獲取整個特徵矩陣對應的新映射特徵矩陣。
257 輸入x:2*32*32*32
258 池化參數:窗口:2*2,步長:2
259 輸出窗口寬,高:(32-2)/2 + 1 = 16
260 輸出大小:2*32*16*16
261 '''
262 def max_pool_forward_naive(x, pool_param):
263     HH, WW = pool_param['pool_height'], pool_param['pool_width']
264     s = pool_param['stride']
265     N, C, H, W = x.shape
266     H_new = 1 + (H - HH) // s
267     W_new = 1 + (W - WW) // s
268     out = np.zeros((N, C, H_new, W_new))
269     for i in range(N):    
270         for j in range(C):        
271             for k in range(H_new):            
272                 for l in range(W_new):                
273                     window = x[i, j, k*s:HH+k*s, l*s:WW+l*s] 
274                     out[i, j, k, l] = np.max(window)
275 
276     cache = (x, pool_param)
277 
278     return out, cache
279 
280 '''
281 反向傳播之池化:增大特徵尺寸大小
282 在緩存中取出前向池化時輸入特徵,選取某一範圍矩陣窗口,
283 找出最大值所在的位置,根據這個位置將dout值映射到新的矩陣對應位置上,
284 而新矩陣其他位置都初始化爲0.
285 輸入dout:2*32*16*16
286 輸出dx:2*32*32*32
287 '''
288 def max_pool_backward_naive(dout, cache):
289     x, pool_param = cache
290     HH, WW = pool_param['pool_height'], pool_param['pool_width']
291     s = pool_param['stride']
292     N, C, H, W = x.shape
293     H_new = 1 + (H - HH) // s
294     W_new = 1 + (W - WW) // s
295     dx = np.zeros_like(x)
296     for i in range(N):    
297         for j in range(C):        
298             for k in range(H_new):            
299                 for l in range(W_new):
300                     # 取前向傳播時輸入的某一池化窗口
301                     window = x[i, j, k*s:HH+k*s, l*s:WW+l*s]
302                     # 計算窗口最大值
303                     m = np.max(window)
304                     # 根據最大值所在位置以及dout對應值=>新矩陣窗口數值
305                     # [false,false
306                     #  true, false]  * 1 => [0,0
307                     #                        1,0]
308                     dx[i, j, k*s:HH+k*s, l*s:WW+l*s] = (window == m) * dout[i, j, k, l]
309 
310     return dx

optim.py

  1 import numpy as np
  2 
  3 def sgd(w, dw, config=None):    
  4     """    
  5     Performs vanilla stochastic gradient descent.    
  6     config format:    
  7     - learning_rate: Scalar learning rate.    
  8     """    
  9     if config is None: config = {}    
 10     config.setdefault('learning_rate', 1e-2)    
 11     w -= config['learning_rate'] * dw    
 12 
 13     return w, config
 14 '''
 15 SGD:隨機梯度下降:由梯度計算新的權重矩陣w
 16 sgd_momentum 是sgd的改進版,解決sgd更新不穩定,陷入局部最優的問題。
 17 增加一個動量因子momentum,可以在一定程度上增加穩定性,
 18 從而學習地更快,並且還有一定擺脫局部最優的能力。
 19 
 20 '''
 21 def sgd_momentum(w, dw, config=None):    
 22     """    
 23     Performs stochastic gradient descent with momentum.    
 24     config format:    
 25     - learning_rate: Scalar learning rate.    
 26     - momentum: Scalar between 0 and 1 giving the momentum value.                
 27     Setting momentum = 0 reduces to sgd.    
 28     - velocity(速度): A numpy array of the same shape as w and dw used to store a moving
 29     average of the gradients.   
 30     """   
 31     if config is None: config = {}    
 32     config.setdefault('learning_rate', 1e-2)   
 33     config.setdefault('momentum', 0.9)
 34     # config 如果存在屬性velocity,則獲取config['velocity'],否則獲取np.zeros_like(w)
 35     v = config.get('velocity', np.zeros_like(w))    
 36     next_w = None    
 37     v = config['momentum'] * v - config['learning_rate'] * dw    
 38     next_w = w + v    
 39     config['velocity'] = v    
 40 
 41     return next_w, config
 42 
 43 def rmsprop(x, dx, config=None):    
 44     """    
 45     Uses the RMSProp update rule, which uses a moving average of squared gradient    
 46     values to set adaptive per-parameter learning rates.    
 47     config format:    
 48     - learning_rate: Scalar learning rate.    
 49     - decay_rate: Scalar between 0 and 1 giving the decay rate for the squared                  
 50     gradient cache.    
 51     - epsilon: Small scalar used for smoothing to avoid dividing by zero.    
 52     - cache: Moving average of second moments of gradients.   
 53     """    
 54     if config is None: config = {}    
 55     config.setdefault('learning_rate', 1e-2)  
 56     config.setdefault('decay_rate', 0.99)    
 57     config.setdefault('epsilon', 1e-8)    
 58     config.setdefault('cache', np.zeros_like(x))    
 59     next_x = None    
 60     cache = config['cache']    
 61     decay_rate = config['decay_rate']    
 62     learning_rate = config['learning_rate']    
 63     epsilon = config['epsilon']    
 64     cache = decay_rate * cache + (1 - decay_rate) * (dx**2)    
 65     x += - learning_rate * dx / (np.sqrt(cache) + epsilon)  
 66     config['cache'] = cache    
 67     next_x = x    
 68 
 69     return next_x, config
 70 
 71 def adam(x, dx, config=None):    
 72     """    
 73     Uses the Adam update rule, which incorporates moving averages of both the  
 74     gradient and its square and a bias correction term.    
 75     config format:    
 76     - learning_rate: Scalar learning rate.    
 77     - beta1: Decay rate for moving average of first moment of gradient.    
 78     - beta2: Decay rate for moving average of second moment of gradient.   
 79     - epsilon: Small scalar used for smoothing to avoid dividing by zero.    
 80     - m: Moving average of gradient.    
 81     - v: Moving average of squared gradient.    
 82     - t: Iteration number.   
 83     """    
 84     if config is None: config = {}    
 85     config.setdefault('learning_rate', 1e-3)    
 86     config.setdefault('beta1', 0.9)    
 87     config.setdefault('beta2', 0.999)    
 88     config.setdefault('epsilon', 1e-8)    
 89     config.setdefault('m', np.zeros_like(x))    
 90     config.setdefault('v', np.zeros_like(x))    
 91     config.setdefault('t', 0)   
 92     next_x = None    
 93     m = config['m']    
 94     v = config['v']    
 95     beta1 = config['beta1']    
 96     beta2 = config['beta2']    
 97     learning_rate = config['learning_rate']    
 98     epsilon = config['epsilon']   
 99     t = config['t']    
100     t += 1    
101     m = beta1 * m + (1 - beta1) * dx    
102     v = beta2 * v + (1 - beta2) * (dx**2)    
103     m_bias = m / (1 - beta1**t)    
104     v_bias = v / (1 - beta2**t)    
105     x += - learning_rate * m_bias / (np.sqrt(v_bias) + epsilon)    
106     next_x = x    
107     config['m'] = m    
108     config['v'] = v    
109     config['t'] = t    
110 
111     return next_x, config

solver.py

  1 import numpy as np
  2 try:
  3   from . import optim
  4 except Exception:
  5   import optim
  6 
  7 class Solver(object):
  8   """
  9   A Solver encapsulates all the logic necessary for training classification
 10   models. The Solver performs stochastic gradient descent using different
 11   update rules defined in optim.py.
 12 
 13   The solver accepts both training and validataion data and labels so it can
 14   periodically check classification accuracy on both training and validation
 15   data to watch out for overfitting.
 16 
 17   To train a model, you will first construct a Solver instance, passing the
 18   model, dataset, and various optoins (learning rate, batch size, etc) to the
 19   constructor. You will then call the train() method to run the optimization
 20   procedure and train the model.
 21   
 22   After the train() method returns, model.params will contain the parameters
 23   that performed best on the validation set over the course of training.
 24   In addition, the instance variable solver.loss_history will contain a list
 25   of all losses encountered during training and the instance variables
 26   solver.train_acc_history and solver.val_acc_history will be lists containing
 27   the accuracies of the model on the training and validation set at each epoch.
 28   
 29   Example usage might look something like this:
 30   
 31   data = {
 32     'X_train': # training data
 33     'y_train': # training labels
 34     'X_val': # validation data
 35     'X_train': # validation labels
 36   }
 37   model = MyAwesomeModel(hidden_size=100, reg=10)
 38   solver = Solver(model, data,
 39                   update_rule='sgd',
 40                   optim_config={
 41                     'learning_rate': 1e-3,
 42                   },
 43                   lr_decay=0.95,
 44                   num_epochs=10, batch_size=100,
 45                   print_every=100)
 46   solver.train()
 47 
 48 
 49   A Solver works on a model object that must conform to the following API:
 50 
 51   - model.params must be a dictionary mapping string parameter names to numpy
 52     arrays containing parameter values.
 53 
 54   - model.loss(X, y) must be a function that computes training-time loss and
 55     gradients, and test-time classification scores, with the following inputs
 56     and outputs:
 57 
 58     Inputs:
 59     - X: Array giving a minibatch of input data of shape (N, d_1, ..., d_k)
 60     - y: Array of labels, of shape (N,) giving labels for X where y[i] is the
 61       label for X[i].
 62 
 63     Returns:
 64     If y is None, run a test-time forward pass and return:
 65     - scores: Array of shape (N, C) giving classification scores for X where
 66       scores[i, c] gives the score of class c for X[i].
 67 
 68     If y is not None, run a training time forward and backward pass and return
 69     a tuple of:
 70     - loss: Scalar giving the loss
 71     - grads: Dictionary with the same keys as self.params mapping parameter
 72       names to gradients of the loss with respect to those parameters.
 73   """
 74 
 75   def __init__(self, model, data, **kwargs):
 76     """
 77     Construct a new Solver instance.
 78     
 79     Required arguments:
 80     - model: A model object conforming to the API described above
 81     - data: A dictionary of training and validation data with the following:
 82       'X_train': Array of shape (N_train, d_1, ..., d_k) giving training images
 83       'X_val': Array of shape (N_val, d_1, ..., d_k) giving validation images
 84       'y_train': Array of shape (N_train,) giving labels for training images
 85       'y_val': Array of shape (N_val,) giving labels for validation images
 86       
 87     Optional arguments:
 88     - update_rule: A string giving the name of an update rule in optim.py.
 89       Default is 'sgd'.
 90     - optim_config: A dictionary containing hyperparameters that will be
 91       passed to the chosen update rule. Each update rule requires different
 92       hyperparameters (see optim.py) but all update rules require a
 93       'learning_rate' parameter so that should always be present.
 94     - lr_decay: A scalar for learning rate decay; after each epoch the learning
 95       rate is multiplied by this value.
 96     - batch_size: Size of minibatches used to compute loss and gradient during
 97       training.
 98     - num_epochs: The number of epochs to run for during training.
 99     - print_every: Integer; training losses will be printed every print_every
100       iterations.
101     - verbose: Boolean; if set to false then no output will be printed during
102       training.
103     """
104     self.model = model
105     self.X_train = data['X_train']
106     self.y_train = data['y_train']
107     self.X_val = data['X_val']
108     self.y_val = data['y_val']
109     
110     # Unpack keyword arguments
111     # pop(key, default):刪除kwargs對象中key,如果存在該key,返回該key對應的value,否則,返回default值。
112     self.update_rule = kwargs.pop('update_rule', 'sgd')
113     self.optim_config = kwargs.pop('optim_config', {})
114     self.lr_decay = kwargs.pop('lr_decay', 1.0)
115     self.batch_size = kwargs.pop('batch_size', 2)
116     self.num_epochs = kwargs.pop('num_epochs', 10)
117 
118     self.print_every = kwargs.pop('print_every', 10)
119     self.verbose = kwargs.pop('verbose', True)
120 
121     # Throw an error if there are extra keyword arguments
122     # 刪除kwargs中參數後,校驗是否還有多餘參數
123     if len(kwargs) > 0:
124       extra = ', '.join('"%s"' % k for k in kwargs.keys())
125       raise ValueError('Unrecognized arguments %s' % extra)
126 
127     # Make sure the update rule exists, then replace the string
128     # name with the actual function
129     # 檢查optim對象中是否有屬性或方法名爲self.update_rule
130     if not hasattr(optim, self.update_rule):
131       raise ValueError('Invalid update_rule "%s"' % self.update_rule)
132     self.update_rule = getattr(optim, self.update_rule)
133 
134     self._reset()
135 
136 
137   def _reset(self):
138     """
139     Set up some book-keeping variables for optimization. Don't call this
140     manually.
141     """
142     # Set up some variables for book-keeping
143     self.epoch = 0
144     self.best_val_acc = 0
145     self.best_params = {}
146     self.loss_history = []
147     self.train_acc_history = []
148     self.val_acc_history = []
149 
150     # Make a deep copy of the optim_config for each parameter
151     self.optim_configs = {}
152     for p in self.model.params:
153       d = {k: v for k, v in self.optim_config.items()}
154       self.optim_configs[p] = d
155 
156 
157   def _step(self):
158     """
159     Make a single gradient update. This is called by train() and should not
160     be called manually.
161     """
162     # Make a minibatch of training data
163     # 500 張圖片
164     num_train = self.X_train.shape[0]
165     # 隨機選出batch_size:2 張
166     batch_mask = np.random.choice(num_train, self.batch_size)
167 
168    # batch_mask = [t%(num_train//2), num_train//2 + t%(num_train//2)]
169 
170 
171 
172     # 訓練樣本矩陣[2,3,32,32]
173     X_batch = self.X_train[batch_mask]
174     # 標籤矩陣[2,] 圖片類型
175     y_batch = self.y_train[batch_mask]
176 
177     # Compute loss and gradient
178     loss, grads = self.model.loss(X_batch, y_batch)
179     self.loss_history.append(loss)
180 
181     # 更新模型超參(w1,b1),(w2,b2),(w3,b3),以及保存更新超參時對應參數因子
182     # Perform a parameter update
183     for p, w in self.model.params.items():
184       dw = grads[p]
185       config = self.optim_configs[p]
186       next_w, next_config = self.update_rule(w, dw, config)
187       self.model.params[p] = next_w
188       # 保存參數因子,learning_rate(學習率),velocity(速度)
189       self.optim_configs[p] = next_config
190 
191 
192   def check_accuracy(self, X, y, num_samples=None, batch_size=2):
193     """
194     Check accuracy of the model on the provided data.
195     
196     Inputs:
197     - X: Array of data, of shape (N, d_1, ..., d_k)
198     - y: Array of labels, of shape (N,)
199     - num_samples: If not None, subsample the data and only test the model
200       on num_samples datapoints.
201     - batch_size: Split X and y into batches of this size to avoid using too
202       much memory.
203       
204     Returns:
205     - acc: Scalar giving the fraction of instances that were correctly
206       classified by the model.
207     """
208     
209     # Maybe subsample the data
210     N = X.shape[0]
211     if num_samples is not None and N > num_samples:
212       # 隨機選取num_samples張圖片,返回選取圖片索引
213       mask = np.random.choice(N, num_samples)
214       N = num_samples
215       X = X[mask]
216       y = y[mask]
217 
218     # Compute predictions in batches
219     num_batches = N // batch_size
220     if N % batch_size != 0:
221       num_batches += 1
222     y_pred = []
223     for i in range(num_batches):
224       start = i * batch_size
225       end = (i + 1) * batch_size
226       scores = self.model.loss(X[start:end])
227       y_pred.append(np.argmax(scores, axis=1))
228     y_pred = np.hstack(y_pred)
229     acc = np.mean(y_pred == y)
230 
231     return acc
232 
233   '''
234    訓練模型:核心方法
235    epoch > batch_size > iteration >= 1
236    訓練總的次數 = num_epochs * iterations_per_epoch
237   '''
238   def train(self):
239     """
240     Run optimization to train the model.
241     """
242     num_train = self.X_train.shape[0]
243     iterations_per_epoch = max(num_train // self.batch_size, 1)
244     num_iterations = self.num_epochs * iterations_per_epoch
245     # 迭代總的次數
246     for t in range(num_iterations):
247       # 某次iteration訓練
248       self._step()
249 
250       # Maybe print training loss
251       # verbose:是否顯示詳細信息
252       if self.verbose and t % self.print_every == 0:
253         print ('(Iteration %d / %d) loss: %f' % (
254                t + 1, num_iterations, self.loss_history[-1]))
255 
256       # At the end of every epoch, increment the epoch counter and decay the
257       # learning rate.
258       # 每迭代完一次epoch後,更新學習率learning_rate,加快運算效率。
259       epoch_end = (t + 1) % iterations_per_epoch == 0
260       if epoch_end:
261         self.epoch += 1
262         for k in self.optim_configs:
263           self.optim_configs[k]['learning_rate'] *= self.lr_decay
264 
265       # Check train and val accuracy on the first iteration, the last
266       # iteration, and at the end of each epoch.
267       # 在第1次迭代,最後1次迭代,或者運行完一個epoch後,校驗訓練結果。
268       first_it = (t == 0)
269       last_it = (t == num_iterations + 1)
270       if first_it or last_it or epoch_end:
271         train_acc = self.check_accuracy(self.X_train, self.y_train,
272                                         num_samples=4)
273         val_acc = self.check_accuracy(self.X_val, self.y_val,num_samples=4)
274         self.train_acc_history.append(train_acc)
275         self.val_acc_history.append(val_acc)
276 
277         if self.verbose:
278           print ('(Epoch %d / %d) train acc: %f; val_acc: %f' % (
279                  self.epoch, self.num_epochs, train_acc, val_acc))
280 
281         # Keep track of the best model
282         if val_acc > self.best_val_acc:
283           self.best_val_acc = val_acc
284           self.best_params = {}
285           for k, v in self.model.params.items():
286             self.best_params[k] = v.copy()
287 
288     # At the end of training swap the best params into the model
289     self.model.params = self.best_params

 

5. Results and analysis

Here 500 images are used as training samples, with num_epochs = 5 and batch_size = 2: 2 images are drawn at random for each update, giving 5 * 500/2 = 1250 iterations in total; 50 images are used as the test set.

The run output shows that the loss decreases steadily.

 

 

 

The test accuracy is only around 12%, for the following reasons:

1. The model is quite simple, and a single convolution cannot extract features that reflect the images' true structure;

2. Overfitting occurs;

3. The original training images have complex textures and vary a lot within each class, which drives the classification accuracy down;

(airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck)

 

A follow-up post will implement the CNN with TensorFlow, where the test accuracy reaches about 71.95%.

 

6. References

視覺一隻白's blog post 《常用損失函數小結》: https://blog.csdn.net/zhangjunp3/article/details/80467350

理想萬歲's blog post 《Softmax函數詳解與推導》: http://www.cnblogs.com/zongfa/p/8971213.html

下路派出所's blog post 《深度學習(九) 深度學習最全優化方法總結比較(SGD,Momentum,Nesterov Momentum,Adagrad,Adadelta,RMSprop,Adam)》: http://www.cnblogs.com/callyblog/p/8299074.html

 

 

 

Don't let laziness take over your mind, and don't let compromise drag down your life. Youth is a ticket; whether you catch the express train of the times depends on the pace of your own feet.
