Deep Learning Series: cs231n assignment1 two_layer_net (Part 5)

Preface: this post works through the fourth exercise of assignment1, a shallow two-layer neural network, to get a feel for how a neural network computes scores in the forward pass and propagates gradients backwards to update its parameters.

Overview

Today's task is to build a network with two fully connected layers, insert a ReLU between them, use the softmax loss to drive the gradient updates, and finally make predictions. Compared with the previous softmax exercise, what changes here is constructing the network and passing data through the fully connected layers; the loss itself is the same softmax loss. As before, formulas and explanations are given wherever they are needed.

Working through the tasks

1. Build the network and load the test data
First, load the required packages along with the function we will use to measure relative error:

# A bit of setup

import numpy as np
import matplotlib.pyplot as plt

from cs231n.classifiers.neural_net import TwoLayerNet


%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

# for auto-reloading external modules
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2

def rel_error(x, y):
    """ returns relative error """
    return np.max(np.abs(x - y) / (np.maximum(1e-8, np.abs(x) + np.abs(y))))

Then, to make it easy to check that the code runs correctly, we test on a small toy dataset, and we initialize the network's input size, hidden layer size, and number of output classes:

# Create a small net and some toy data to check your implementations.
# Note that we set the random seed for repeatable experiments.

input_size = 4
hidden_size = 10
num_classes = 3
num_inputs = 5

def init_toy_model():
    np.random.seed(0)
    return TwoLayerNet(input_size, hidden_size, num_classes, std=1e-1)

def init_toy_data():
    np.random.seed(1)
    X = 10 * np.random.randn(num_inputs, input_size)
    y = np.array([0, 1, 2, 2, 1])
    return X, y

net = init_toy_model()
X, y = init_toy_data()

2. Implement the score, loss, and gradient functions
As usual, open neural_net.py and fill in the required pieces. In earlier exercises we could implement the scores, the loss, and the gradients in one go, but here we will work block by block, mixing explanation with code.
The shallow neural network
First, let's explain what the two-layer neural network at the heart of today's exercise actually is. Here is a diagram:
[Figure: a two-layer fully connected neural network]
This is the shallow two-layer network we will study. Let's start with the first question: how does data pass through it?
Q1: How is data propagated through the network?
How does the data get from the input layer to hidden layer 1? Through a fully connected layer: exactly as when we computed the SVM and softmax scores earlier, the pixels are multiplied by a weight matrix to produce a score for each class.
[Figure: score computation for a cat image, from the cs231n course slides]
The cs231n course slides illustrate this prediction pipeline for a cat image, and it is the same as in the previous sections: flatten the image pixels into a vector, multiply by the weight matrix, and add the bias term to get a score for each class. If there are more layers, the result is simply multiplied by the next weight matrix to produce the class scores.
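To make the forward computation concrete, here is a minimal NumPy sketch; the numbers and shapes below are made up purely for illustration and are not part of the assignment:

import numpy as np

# Made-up toy example: 2 samples with 4 "pixel" values each,
# a weight matrix mapping 4 inputs to 3 classes, and one bias per class.
X_toy = np.array([[ 0.2, -1.0,  0.5,  0.3],
                  [ 1.1,  0.4, -0.2,  0.8]])   # shape (2, 4)
W_toy = 0.01 * np.arange(12).reshape(4, 3)     # shape (4, 3)
b_toy = np.array([0.1, 0.0, -0.1])             # shape (3,)

scores_toy = X_toy.dot(W_toy) + b_toy          # shape (2, 3): one score per class per sample
print(scores_toy)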
Q2: What is ReLU?
The assignment asks us to insert a ReLU between the first and second layers. So what is the ReLU function? Its graph looks like this:
[Figure: the ReLU function]
The ReLU function is
$$f(x)=\max(0,x)$$
After the first layer is computed, we apply ReLU to its output, turning every negative value into 0; in other words, activations below 0 are simply discarded. Some readers will ask: why do we need this ReLU step at all? Can't we skip it, or use some other nonlinearity? Mainly, ReLU resembles a biological activation pattern, and it introduces sparsity, which speeds up computation; for more detail you can refer to the blog post linked here. I will analyse and discuss this further once I have read the relevant papers.
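As a quick illustration (a minimal sketch with made-up numbers), ReLU is applied element-wise and simply zeroes out the negative activations:

import numpy as np

x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
relu_x = np.maximum(0, x)   # -> [0.  0.  0.  1.5 3. ]
print(relu_x)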
Task 1: implement the score function
Feeding the ReLU output into the next layer then produces the class scores. Here is the first piece of code we need to write: it computes each sample's score for every class. When writing it, pay attention to how the matrix dimensions line up:

"""
X: (N, D). X[i]就是一個訓練樣本,共有N個訓練樣本.
W1:第一層權重;(D, H)
b1: 第一層偏移項;(H,)
W2: 第二層權重;(H, C)
b2: 第二層偏移項;(C,)
"""
layer1 = np.dot(X, W1) + b1  #輸出(N,H)
reluLayer = np.maximum(0, layer1) #輸出(N,H)
scores = np.dot(reluLayer, W2) + b2 #輸出(N,C)

That completes the first task; the notebook has a corresponding check, which we will run later. With the scores in hand, we next need to compute the loss. We again use the softmax loss from the previous section; see that post for its exact form.
Task 2: implement the softmax loss
The loss is obtained by transforming the scores we just computed, so the code is as follows:

scores = scores - np.max(scores, axis=1).reshape(-1,1)
softmaxFucntion = np.exp(scores)/np.sum(np.exp(scores), axis=1).reshape(-1,1)
loss = np.sum(-np.log(softmaxFucntion[range(N), list(y)]))
loss /= N
loss += 0.5*reg*np.sum(W1*W1) + 0.5*reg*np.sum(W2*W2)

Two points deserve attention here. First, the scores are shifted by subtracting their row-wise maximum, which makes the computation numerically stable. Second, when adding regularization we regularize both weight matrices, each with a factor of 0.5. I am not yet entirely comfortable with how regularization is used; I will study it more systematically and update this later. As with the scores, there is a corresponding test cell for the loss, shown further below.
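To see why subtracting the row-wise maximum matters, here is a small sketch with made-up values: the naive softmax overflows and produces NaNs, while the shifted version is stable, even though the two are mathematically identical:

import numpy as np

s = np.array([1000.0, 1001.0, 1002.0])              # made-up, very large scores
naive = np.exp(s) / np.sum(np.exp(s))               # overflow: inf/inf -> nan
shifted = np.exp(s - np.max(s)) / np.sum(np.exp(s - np.max(s)))
print(naive)    # [nan nan nan] (with overflow warnings)
print(shifted)  # [0.09003057 0.24472847 0.66524096]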
Task 3: implement the gradients for the shallow network
After computing the scores and the loss we need the gradients in order to update the parameters. The full derivation is shown here; it works backwards through the network using the chain rule for partial derivatives.
The forward pass that produces the scores is

$$layer1 = X\times W_1+b_1$$
$$reluLayer = \max(0, layer1)$$
$$scores = layer2 = reluLayer\times W_2+b_2$$
$$loss = softmax(layer2)$$

Then we work backwards through the chain rule:

$$\frac{\partial loss}{\partial W_2}=\frac{\partial loss}{\partial scores}\times\frac{\partial scores}{\partial W_2}=reluLayer^{T}\times\frac{\partial loss}{\partial scores}$$
$$\frac{\partial loss}{\partial b_2}=\frac{\partial loss}{\partial scores}\times\frac{\partial scores}{\partial b_2}=(1,\dots,1)_{N}\times\frac{\partial loss}{\partial scores}$$
$$\frac{\partial loss}{\partial reluLayer}=\frac{\partial loss}{\partial scores}\times\frac{\partial scores}{\partial reluLayer}=\frac{\partial loss}{\partial scores}\times W_2^{T}$$
$$\frac{\partial loss}{\partial layer1}=\frac{\partial loss}{\partial reluLayer}\odot\mathbb{1}[layer1>0]$$
$$\frac{\partial loss}{\partial W_1}=\frac{\partial loss}{\partial layer1}\times\frac{\partial layer1}{\partial W_1}=X^{T}\times\frac{\partial loss}{\partial layer1}$$
$$\frac{\partial loss}{\partial b_1}=\frac{\partial loss}{\partial layer1}\times\frac{\partial layer1}{\partial b_1}=(1,\dots,1)_{N}\times\frac{\partial loss}{\partial layer1}$$

The ReLU contributes the indicator term: its derivative is 1 where layer1 is positive and 0 elsewhere, which is why the code zeroes out drelu wherever the ReLU output was 0. With the gradients of all the parameters in hand we can code them up, update the weights, and iterate. One more thing worth noting: after computing this round's weight gradients we also add reg*W1 and reg*W2; this is simply the derivative of the L2 penalty 0.5*reg*sum(W*W) that appears in the loss, not a switch to L1 regularization.
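The only quantity the chain above leaves symbolic is the gradient of the loss with respect to the scores. For the softmax loss from the previous section, the gradient for sample i is the predicted probability vector minus the one-hot encoding of the true label, averaged over the N samples:

$$p_{i,c}=\frac{e^{scores_{i,c}}}{\sum_{k}e^{scores_{i,k}}},\qquad \frac{\partial loss}{\partial scores_{i,c}}=\frac{1}{N}\left(p_{i,c}-\mathbb{1}[c=y_i]\right)$$

This is exactly what the first three lines of the code below compute: copy the probabilities, subtract 1 at the true-label positions, and divide by N.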

dscores = softmaxFucntion.copy()
dscores[range(N), list(y)] -= 1
dscores /= N
dW2 = np.dot(reluLayer.T, dscores)
db2 = np.sum(dscores, axis=0)

drelu = np.dot(dscores, W2.T)
drelu[reluLayer <= 0] = 0

dW1 = np.dot(X.T, drelu)
db1 = np.sum(drelu, axis=0)

dW2 += reg*W2
dW1 += reg*W1

grads['W1'] = dW1
grads['b1'] = db1
grads['W2'] = dW2
grads['b2'] = db2

3. Implement the training and prediction functions
After computing the gradients we need to update the weights to train the model, so we implement the training function. Here the assignment only asks us to fill in the random minibatch sampling for stochastic gradient descent and the parameter update step:

idx_batch = np.random.choice(num_train, batch_size, replace=True)
X_batch = X[idx_batch]
y_batch = y[idx_batch]
self.params['W1'] -= learning_rate * grads['W1']
self.params['b1'] -= learning_rate * grads['b1']
self.params['W2'] -= learning_rate * grads['W2']
self.params['b2'] -= learning_rate * grads['b2']

Finally, since the goal is to predict the class of each sample, we also need a prediction function. As before, each sample is assigned to the class with the highest score:

W1 = self.params['W1']
b1 = self.params['b1']
W2 = self.params['W2']
b2 = self.params['b2']
layer1 = np.dot(X, W1)+b1
reluLayer = np.maximum(0, layer1)
scores = np.dot(reluLayer, W2) + b2
y_pred = np.argmax(scores, axis=1)

That completes all the tasks in neural_net.py. The full code is as follows:

from __future__ import print_function

import numpy as np
import matplotlib.pyplot as plt
from past.builtins import xrange

class TwoLayerNet(object):
  """
  A two-layer fully-connected neural network. The net has an input dimension of
  N, a hidden layer dimension of H, and performs classification over C classes.
  We train the network with a softmax loss function and L2 regularization on the
  weight matrices. The network uses a ReLU nonlinearity after the first fully
  connected layer.

  In other words, the network has the following architecture:

  input - fully connected layer - ReLU - fully connected layer - softmax

  The outputs of the second fully-connected layer are the scores for each class.
  """

  def __init__(self, input_size, hidden_size, output_size, std=1e-4):
    """
    Initialize the model. Weights are initialized to small random values and
    biases are initialized to zero. Weights and biases are stored in the
    variable self.params, which is a dictionary with the following keys:

    W1: First layer weights; has shape (D, H)
    b1: First layer biases; has shape (H,)
    W2: Second layer weights; has shape (H, C)
    b2: Second layer biases; has shape (C,)

    Inputs:
    - input_size: The dimension D of the input data.
    - hidden_size: The number of neurons H in the hidden layer.
    - output_size: The number of classes C.
    """
    self.params = {}
    self.params['W1'] = std * np.random.randn(input_size, hidden_size)
    self.params['b1'] = np.zeros(hidden_size)
    self.params['W2'] = std * np.random.randn(hidden_size, output_size)
    self.params['b2'] = np.zeros(output_size)

  def loss(self, X, y=None, reg=0.0):
    """
    Compute the loss and gradients for a two layer fully connected neural
    network.

    Inputs:
    - X: Input data of shape (N, D). Each X[i] is a training sample.
    - y: Vector of training labels. y[i] is the label for X[i], and each y[i] is
      an integer in the range 0 <= y[i] < C. This parameter is optional; if it
      is not passed then we only return scores, and if it is passed then we
      instead return the loss and gradients.
    - reg: Regularization strength.

    Returns:
    If y is None, return a matrix scores of shape (N, C) where scores[i, c] is
    the score for class c on input X[i].

    If y is not None, instead return a tuple of:
    - loss: Loss (data loss and regularization loss) for this batch of training
      samples.
    - grads: Dictionary mapping parameter names to gradients of those parameters
      with respect to the loss function; has the same keys as self.params.
    """
    # Unpack variables from the params dictionary
    W1, b1 = self.params['W1'], self.params['b1']
    W2, b2 = self.params['W2'], self.params['b2']
    N, D = X.shape

    # Compute the forward pass
    layer1 = np.dot(X, W1) + b1
    reluLayer = np.maximum(0, layer1)
    scores = np.dot(reluLayer, W2) + b2
    #############################################################################

    # TODO: Perform the forward pass, computing the class scores for the input. #
    # Store the result in the scores variable, which should be an array of      #
    # shape (N, C).                                                             #
    #############################################################################
    pass
    #############################################################################
    #                              END OF YOUR CODE                             #
    #############################################################################

    # If the targets are not given then jump out, we're done
    if y is None:
      return scores

    # Compute the loss
    scores = scores - np.max(scores, axis=1).reshape(-1,1)
    softmaxFucntion = np.exp(scores)/np.sum(np.exp(scores), axis=1).reshape(-1,1)
    loss = np.sum(-np.log(softmaxFucntion[range(N), list(y)]))
    loss /= N
    loss += 0.5*reg*np.sum(W1*W1) + 0.5*reg*np.sum(W2*W2)


    #############################################################################
    # TODO: Finish the forward pass, and compute the loss. This should include  #
    # both the data loss and L2 regularization for W1 and W2. Store the result  #
    # in the variable loss, which should be a scalar. Use the Softmax           #
    # classifier loss.                                                          #
    #############################################################################
    pass
    #############################################################################
    #                              END OF YOUR CODE                             #
    #############################################################################

    # Backward pass: compute gradients
    grads = {}

    dscores = softmaxFucntion.copy()
    dscores[range(N), list(y)] -= 1
    dscores /= N
    dW2 = np.dot(reluLayer.T, dscores)
    db2 = np.sum(dscores, axis=0)

    drelu = np.dot(dscores, W2.T)
    drelu[reluLayer <= 0] = 0

    dW1 = np.dot(X.T, drelu)
    db1 = np.sum(drelu, axis=0)

    dW2 += reg*W2
    dW1 += reg*W1

    grads['W1'] = dW1
    grads['b1'] = db1
    grads['W2'] = dW2
    grads['b2'] = db2

    #############################################################################
    # TODO: Compute the backward pass, computing the derivatives of the weights #
    # and biases. Store the results in the grads dictionary. For example,       #
    # grads['W1'] should store the gradient on W1, and be a matrix of same size #
    #############################################################################
    pass
    #############################################################################
    #                              END OF YOUR CODE                             #
    #############################################################################

    return loss, grads

  def train(self, X, y, X_val, y_val,
            learning_rate=1e-3, learning_rate_decay=0.95,
            reg=5e-6, num_iters=100,
            batch_size=200, verbose=False):
    """
    Train this neural network using stochastic gradient descent.

    Inputs:
    - X: A numpy array of shape (N, D) giving training data.
    - y: A numpy array of shape (N,) giving training labels; y[i] = c means that
      X[i] has label c, where 0 <= c < C.
    - X_val: A numpy array of shape (N_val, D) giving validation data.
    - y_val: A numpy array of shape (N_val,) giving validation labels.
    - learning_rate: Scalar giving learning rate for optimization.
    - learning_rate_decay: Scalar giving factor used to decay the learning rate
      after each epoch.
    - reg: Scalar giving regularization strength.
    - num_iters: Number of steps to take when optimizing.
    - batch_size: Number of training examples to use per step.
    - verbose: boolean; if true print progress during optimization.
    """
    num_train = X.shape[0]
    iterations_per_epoch = max(num_train / batch_size, 1)

    # Use SGD to optimize the parameters in self.model
    loss_history = []
    train_acc_history = []
    val_acc_history = []

    for it in xrange(num_iters):
      idx_batch = np.random.choice(num_train, batch_size, replace=True)
      X_batch = X[idx_batch]
      y_batch = y[idx_batch]



      #########################################################################
      # TODO: Create a random minibatch of training data and labels, storing  #
      # them in X_batch and y_batch respectively.                             #
      #########################################################################
      pass
      #########################################################################
      #                             END OF YOUR CODE                          #
      #########################################################################

      # Compute loss and gradients using the current minibatch
      loss, grads = self.loss(X_batch, y=y_batch, reg=reg)
      loss_history.append(loss)
      self.params['W1'] -= learning_rate * grads['W1']
      self.params['b1'] -= learning_rate * grads['b1']
      self.params['W2'] -= learning_rate * grads['W2']
      self.params['b2'] -= learning_rate * grads['b2']


      #########################################################################
      # TODO: Use the gradients in the grads dictionary to update the         #
      # parameters of the network (stored in the dictionary self.params)      #
      # using stochastic gradient descent. You'll need to use the gradients   #
      # stored in the grads dictionary defined above.                         #
      #########################################################################
      pass
      #########################################################################
      #                             END OF YOUR CODE                          #
      #########################################################################

      if verbose and it % 100 == 0:
        print('iteration %d / %d: loss %f' % (it, num_iters, loss))

      # Every epoch, check train and val accuracy and decay learning rate.
      if it % iterations_per_epoch == 0:
        # Check accuracy
        train_acc = (self.predict(X_batch) == y_batch).mean()
        val_acc = (self.predict(X_val) == y_val).mean()
        train_acc_history.append(train_acc)
        val_acc_history.append(val_acc)

        # Decay learning rate
        learning_rate *= learning_rate_decay

    return {
      'loss_history': loss_history,
      'train_acc_history': train_acc_history,
      'val_acc_history': val_acc_history,
    }

  def predict(self, X):
    """
    Use the trained weights of this two-layer network to predict labels for
    data points. For each data point we predict scores for each of the C
    classes, and assign each data point to the class with the highest score.

    Inputs:
    - X: A numpy array of shape (N, D) giving N D-dimensional data points to
      classify.

    Returns:
    - y_pred: A numpy array of shape (N,) giving predicted labels for each of
      the elements of X. For all i, y_pred[i] = c means that X[i] is predicted
      to have class c, where 0 <= c < C.
    """
    y_pred = None

    W1 = self.params['W1']
    b1 = self.params['b1']
    W2 = self.params['W2']
    b2 = self.params['b2']
    layer1 = np.dot(X, W1)+b1
    reluLayer = np.maximum(0, layer1)
    scores = np.dot(reluLayer, W2) + b2
    y_pred = np.argmax(scores, axis=1)
    ###########################################################################
    # TODO: Implement this function; it should be VERY simple!                #
    ###########################################################################
    pass
    ###########################################################################
    #                              END OF YOUR CODE                           #
    ###########################################################################

    return y_pred

4. Run the code in the notebook
4.1 Verify the score function
Now let's continue with the notebook code. With the score function implemented, we check whether its output is correct by comparing it against precomputed reference scores; if the difference is below 1e-7, the score function is considered correct.

scores = net.loss(X)
print('Your scores:')
print(scores)
print()
print('correct scores:')
correct_scores = np.asarray([
  [-0.81233741, -1.27654624, -0.70335995],
  [-0.17129677, -1.18803311, -0.47310444],
  [-0.51590475, -1.01354314, -0.8504215 ],
  [-0.15419291, -0.48629638, -0.52901952],
  [-0.00618733, -0.12435261, -0.15226949]])
print(correct_scores)
print()

# The difference should be very small. We get < 1e-7
print('Difference between your scores and correct scores:')
print(np.sum(np.abs(scores - correct_scores)))
Your scores:
[[-0.81233741 -1.27654624 -0.70335995]
 [-0.17129677 -1.18803311 -0.47310444]
 [-0.51590475 -1.01354314 -0.8504215 ]
 [-0.15419291 -0.48629638 -0.52901952]
 [-0.00618733 -0.12435261 -0.15226949]]

correct scores:
[[-0.81233741 -1.27654624 -0.70335995]
 [-0.17129677 -1.18803311 -0.47310444]
 [-0.51590475 -1.01354314 -0.8504215 ]
 [-0.15419291 -0.48629638 -0.52901952]
 [-0.00618733 -0.12435261 -0.15226949]]

Difference between your scores and correct scores:
3.6802720496109664e-08

The difference is below the given 1e-7 threshold, so the score function is correct. Next we verify the loss function.
4.2 Verify the loss function
Note that here reg is set to 0.1, and the error should come out below 1e-12:

loss, _ = net.loss(X, y, reg=0.1)
correct_loss = 1.30378789133

# should be very small, we get < 1e-12
print('Difference between your loss and correct loss:')

print(np.sum(np.abs(loss - correct_loss)))
Difference between your loss and correct loss:
1.7985612998927536e-13

4.3 Verify the gradients

from cs231n.gradient_check import eval_numerical_gradient

# Use numeric gradient checking to check your implementation of the backward pass.
# If your implementation is correct, the difference between the numeric and
# analytic gradients should be less than 1e-8 for each of W1, W2, b1, and b2.

loss, grads = net.loss(X, y, reg=0.05)

# these should all be less than 1e-8 or so
for param_name in grads:
    f = lambda W: net.loss(X, y, reg=0.05)[0]
    param_grad_num = eval_numerical_gradient(f, net.params[param_name], verbose=False)
    print('%s max relative error: %e' % (param_name, rel_error(param_grad_num, grads[param_name])))
W1 max relative error: 3.561318e-09
b1 max relative error: 1.555470e-09
W2 max relative error: 3.440708e-09
b2 max relative error: 3.865091e-11

Likewise, since the gradient errors are below roughly 1e-8, the gradient code is considered correct.
4.4 Visualize the loss on the toy data
Now that the scores, loss, and gradients all check out, we can train the model, update its parameters, and watch how the loss evolves:

net = init_toy_model()
stats = net.train(X, y, X, y,
            learning_rate=1e-1, reg=5e-6,
            num_iters=100, verbose=False)

print('Final training loss: ', stats['loss_history'][-1])

# plot the loss history
plt.plot(stats['loss_history'])
plt.xlabel('iteration')
plt.ylabel('training loss')
plt.title('Training Loss history')
plt.show()

[Figure: training loss on the toy data]
The plot shows the loss dropping rapidly as training proceeds and settling just above 0 after roughly 15 iterations, so the training behavior on the toy data is acceptable. With the toy data done, it is time to run the real CIFAR-10 image data.
4.5 Load and train on CIFAR-10

from cs231n.data_utils import load_CIFAR10

def get_CIFAR10_data(num_training=49000, num_validation=1000, num_test=1000):
    """
    Load the CIFAR-10 dataset from disk and perform preprocessing to prepare
    it for the two-layer neural net classifier. These are the same steps as
    we used for the SVM, but condensed to a single function.  
    """
    # Load the raw CIFAR-10 data
    cifar10_dir = 'cs231n/datasets/cifar-10-batches-py'
    X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir)
        
    # Subsample the data
    mask = list(range(num_training, num_training + num_validation))
    X_val = X_train[mask]
    y_val = y_train[mask]
    mask = list(range(num_training))
    X_train = X_train[mask]
    y_train = y_train[mask]
    mask = list(range(num_test))
    X_test = X_test[mask]
    y_test = y_test[mask]

    # Normalize the data: subtract the mean image
    mean_image = np.mean(X_train, axis=0)
    X_train -= mean_image
    X_val -= mean_image
    X_test -= mean_image

    # Reshape data to rows
    X_train = X_train.reshape(num_training, -1)
    X_val = X_val.reshape(num_validation, -1)
    X_test = X_test.reshape(num_test, -1)

    return X_train, y_train, X_val, y_val, X_test, y_test


# Invoke the above function to get our data.
X_train, y_train, X_val, y_val, X_test, y_test = get_CIFAR10_data()
print('Train data shape: ', X_train.shape)
print('Train labels shape: ', y_train.shape)
print('Validation data shape: ', X_val.shape)
print('Validation labels shape: ', y_val.shape)
print('Test data shape: ', X_test.shape)
print('Test labels shape: ', y_test.shape)
Train data shape:  (49000, 3072)
Train labels shape:  (49000,)
Validation data shape:  (1000, 3072)
Validation labels shape:  (1000,)
Test data shape:  (1000, 3072)
Test labels shape:  (1000,)

As before, we split the data into training, validation, and test sets, and use the validation set to evaluate the model trained on the training set:

input_size = 32 * 32 * 3
hidden_size = 50
num_classes = 10
net = TwoLayerNet(input_size, hidden_size, num_classes)

# Train the network
stats = net.train(X_train, y_train, X_val, y_val,
            num_iters=1000, batch_size=200,
            learning_rate=1e-4, learning_rate_decay=0.95,
            reg=0.25, verbose=True)

# Predict on the validation set
val_acc = (net.predict(X_val) == y_val).mean()
print('Validation accuracy: ', val_acc)
iteration 0 / 1000: loss 2.302777
iteration 100 / 1000: loss 2.302281
iteration 200 / 1000: loss 2.296825
iteration 300 / 1000: loss 2.256667
iteration 400 / 1000: loss 2.229428
iteration 500 / 1000: loss 2.149038
iteration 600 / 1000: loss 2.078593
iteration 700 / 1000: loss 2.052749
iteration 800 / 1000: loss 1.976308
iteration 900 / 1000: loss 2.036181
Validation accuracy:  0.286

The validation accuracy comes out at only 0.286, which is not great. Some readers might say this is even worse than the plain softmax classifier, so did we go to all the trouble of building a neural network only to do worse than softmax alone? Not quite.
4.6 Debugging on the training data
When results are poor like this, we usually plot the loss or accuracy curves to inspect the training process, or visualize W1 directly as images to see what the weights have learned, and use that to diagnose the model's hyperparameters:

# Plot the loss function and train / validation accuracies
plt.subplot(2, 1, 1)
plt.plot(stats['loss_history'])
plt.title('Loss history')
plt.xlabel('Iteration')
plt.ylabel('Loss')

plt.subplot(2, 1, 2)
plt.plot(stats['train_acc_history'], label='train')
plt.plot(stats['val_acc_history'], label='val')
plt.title('Classification accuracy history')
plt.xlabel('Epoch')
plt.ylabel('Classification accuracy')
plt.show()

[Figure: loss history and train/validation accuracy curves]
From the plot, the loss barely changes before roughly iteration 200, unlike the loss curves we saw earlier; a likely cause is that the learning rate is too small, so the optimization moves too slowly. The accuracy curves also flatten out around 0.29, so we could increase the number of hidden units (the hidden dimension) to make fuller use of the information, somewhat like adding more convolution filters in a CNN.
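As a quick illustration of these suggested fixes, one might retrain with a larger learning rate and a wider hidden layer; the specific values here are just an assumption for demonstration, and the systematic search is done in section 4.7:

# Illustrative only: larger learning rate and more hidden units than the first attempt.
bigger_net = TwoLayerNet(input_size, 100, num_classes)
bigger_stats = bigger_net.train(X_train, y_train, X_val, y_val,
                                num_iters=1000, batch_size=200,
                                learning_rate=1e-3, learning_rate_decay=0.95,
                                reg=0.25, verbose=True)
print('Validation accuracy: ', (bigger_net.predict(X_val) == y_val).mean())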
Next, let's visualize W1 to see what the trained weights look like:

from cs231n.vis_utils import visualize_grid

# Visualize the weights of the network

def show_net_weights(net):
    W1 = net.params['W1']
    W1 = W1.reshape(32, 32, 3, -1).transpose(3, 0, 1, 2)
    plt.imshow(visualize_grid(W1, padding=3).astype('uint8'))
    plt.gca().axis('off')
    plt.show()

show_net_weights(net)

[Figure: visualization of the first-layer weights W1]
The W1 templates look blurry and quite similar to one another, which is not a good result. This is why, with neural networks, we often spend a lot of time tuning hyperparameters: searching many settings for the one that performs best on the validation set and using it for prediction. This process is hyperparameter tuning.
4.7 Task: hyperparameter tuning on the validation set
Now we train with different hyperparameter settings, validate each one, and keep the setting that performs best on the validation set. This code we write ourselves. We first list the candidate values to search over, starting from the defaults, and we also vary the architecture by trying several hidden layer sizes. The code is as follows:

best_net = None

learning_rate = [1e-4, 4e-4, 8e-4, 16e-4, 32e-4]
learning_rate_decay = 0.9
regList = [0.25, 0.5, 0.75, 1.0]
num_iters = 4000
batch_size = 200

input_size = 32 * 32 * 3
hidden_size = [50, 100, 150]
num_classes = 10

best_net = None
best_lr = None
best_reg = None
best_hidden_size = None
best_val = -1
results = {}
for i in range(len(hidden_size)):
    for lr in learning_rate:
        for reg in regList:
            net = TwoLayerNet(input_size, hidden_size[i], num_classes)
            stat = net.train(X_train, y_train, X_val, y_val,
                            learning_rate = lr, learning_rate_decay=learning_rate_decay,
                            reg = reg, num_iters = num_iters,
                            batch_size=batch_size)
            train_acc = stat['train_acc_history'][-1]
            val_acc = stat['val_acc_history'][-1]
            if val_acc > best_val:
                best_net = net
                best_lr = lr
                best_reg = reg
                best_hidden_size = hidden_size[i]
                best_val = val_acc
            results[(hidden_size[i], lr, reg)] = (train_acc, val_acc)
            print('hidden_size:%d, lr %e reg %e train accuracy: %f val accuracy: %f' % (
                hidden_size[i], lr, reg, train_acc, val_acc))

print('best_hidden_size:%d, best_lr %e best_reg %e train accuracy: %f val accuracy: %f' % (
        best_hidden_size, best_lr, best_reg,
        results[(best_hidden_size, best_lr, best_reg)][0],
        results[(best_hidden_size, best_lr, best_reg)][1]))

Overall this is very similar to the earlier hyperparameter searches, except for the extra loop over the hidden layer size. Part of the output is shown below:

hidden_size:50, lr 1.000000e-04 reg 2.500000e-01 train accuracy: 0.385000 val accuracy: 0.367000
hidden_size:50, lr 1.000000e-04 reg 5.000000e-01 train accuracy: 0.390000 val accuracy: 0.373000
hidden_size:50, lr 1.000000e-04 reg 7.500000e-01 train accuracy: 0.365000 val accuracy: 0.368000
hidden_size:100, lr 1.600000e-03 reg 5.000000e-01 train accuracy: 0.710000 val accuracy: 0.526000
hidden_size:100, lr 1.600000e-03 reg 7.500000e-01 train accuracy: 0.570000 val accuracy: 0.532000
hidden_size:100, lr 1.600000e-03 reg 1.000000e+00 train accuracy: 0.640000 val accuracy: 0.513000
hidden_size:150, lr 3.200000e-03 reg 2.500000e-01 train accuracy: 0.770000 val accuracy: 0.535000
hidden_size:150, lr 3.200000e-03 reg 5.000000e-01 train accuracy: 0.635000 val accuracy: 0.528000
hidden_size:150, lr 3.200000e-03 reg 7.500000e-01 train accuracy: 0.705000 val accuracy: 0.534000
best_hidden_size:150, best_lr 3.200000e-03 best_reg 2.500000e-01 train accuracy: 0.770000 val accuracy: 0.535000

The best configuration uses 150 hidden units, a learning rate of 3.2e-3, and reg=0.25, which backs up the earlier observation that a wider hidden layer and a larger learning rate help raise the accuracy and drive the loss down faster.
Let's take a look at the best network's weights:

# visualize the weights of the best network
show_net_weights(best_net)

[Figure: visualization of the best network's W1]
4.8 Run the test set
Finally, after all this work, it is time to run the test set. The assignment asks for an accuracy above 48%, so let's run the code:

test_acc = (best_net.predict(X_test) == y_test).mean()
print('Test accuracy: ', test_acc)
Test accuracy:  0.546

Not bad! The test accuracy reaches 54.6%.


Closing remarks
That wraps up our look at the shallow neural network. We have walked through a network's forward and backward computation and experienced the joys of hyperparameter tuning. Some points still give me trouble, such as taking derivatives of matrices with respect to vectors, finding better ranges for the hyperparameters, the details of how regularization is used, and the precise reasons for choosing ReLU. These will all require further study to refine my understanding, and I hope to fill the gaps soon.
Thanks for reading.
