cs231n assignment1 two-layer-net

two-layer-net

First, complete the network's computation of the scores and the loss. The activation function is ReLU, i.e. max(0, x).
The loss() function in neural_net.py

# *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
# Hidden layer with ReLU activation, shape (N, H)
h1 = np.maximum(0, X.dot(W1) + b1)
# Class scores, shape (N, C)
scores = h1.dot(W2) + b2
# *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
# *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
# Exponentiate the scores, shape (N, C). Subtracting the row maximum first
# leaves the softmax unchanged and keeps np.exp from overflowing.
exp_scores = np.exp(scores - np.max(scores, axis=1, keepdims=True))
# Sum over the classes, shape (N, 1)
row_sum = np.sum(exp_scores, axis=1).reshape(N, 1)
norm_scores = exp_scores / row_sum
data_loss = -np.sum(np.log(norm_scores[np.arange(N), y])) / N
reg_loss = 0.5 * reg * (np.sum(W1 * W1) + np.sum(W2 * W2))
loss = data_loss + reg_loss
# *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
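A quick way to sanity-check the forward pass is to run it on a toy problem with tiny weights: the scores are then close to zero, so the softmax loss should be close to log(C). The sketch below is a standalone version for illustration only; the function name `two_layer_loss`, the toy sizes, and the seed are my own assumptions, not part of the assignment code.

```python
import numpy as np

def two_layer_loss(X, y, W1, b1, W2, b2, reg):
    """Forward pass and softmax loss of a two-layer ReLU network (illustrative helper)."""
    N = X.shape[0]
    h1 = np.maximum(0, X.dot(W1) + b1)           # hidden activations, (N, H)
    scores = h1.dot(W2) + b2                      # class scores, (N, C)
    # Shift by the row maximum so np.exp cannot overflow.
    shifted = scores - np.max(scores, axis=1, keepdims=True)
    log_probs = shifted - np.log(np.sum(np.exp(shifted), axis=1, keepdims=True))
    data_loss = -np.mean(log_probs[np.arange(N), y])
    reg_loss = 0.5 * reg * (np.sum(W1 * W1) + np.sum(W2 * W2))
    return scores, data_loss + reg_loss

# Toy problem: sizes and seed are arbitrary choices for this sketch.
rng = np.random.default_rng(0)
N, D, H, C = 5, 4, 10, 3
X = rng.standard_normal((N, D))
y = rng.integers(0, C, size=N)
W1 = 1e-4 * rng.standard_normal((D, H))
b1 = np.zeros(H)
W2 = 1e-4 * rng.standard_normal((H, C))
b2 = np.zeros(C)

scores, loss = two_layer_loss(X, y, W1, b1, W2, b2, reg=0.0)
print('loss:', loss, 'log(C):', np.log(C))

# Very large weights would overflow a naive np.exp(scores); the shifted version stays finite.
_, loss_big = two_layer_loss(X, y, 1e7 * W1, b1, 1e7 * W2, b2, reg=0.0)
print('loss with large weights is finite:', np.isfinite(loss_big))
```

If the printed loss is far from log(C) with near-zero weights, the loss computation is the first place to look.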

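Next comes backpropagation to compute the gradients. This part is somewhat difficult, so I record my own understanding below.
The overall approach is the chain rule. The gradient of every variable has the same shape as the variable itself.
Below is the computational graph. Left to right is the forward pass of the network, with the computed values above the lines; right to left is backpropagation, with the gradients below the lines.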

[Figure: computational graph of the two-layer network]
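Convention: every derivative is the derivative of Loss with respect to the variable, i.e. $\frac{d_{Loss}}{d_{variable}}$. In the code, $d_{Loss}$ is therefore omitted and only the denominator is written.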

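  • Compute $d_{scores}$, i.e. $\frac{d_{Loss}}{d_{scores}}$. For the softmax loss, this derivative is the normalized probabilities with 1 subtracted at the correct class of each sample (data_loss contains the negated correct-class score), and the whole matrix is divided by $N$ because data_loss averages over the batch.
  • $d_{b_2}$ is $d_{scores}$ summed over the batch dimension, because $b_2$ is added to every row of scores.
  • At the multiply gate, $\frac{d_{Loss}}{d_{h_1}} = \frac{d_{Loss}}{d_{scores}}\frac{d_{scores}}{d_{h_1}}$. The local derivative is $\frac{d_{scores}}{d_{h_1}} = W_2$, so $d_{h_1} = d_{scores} W_2^T$; the transpose makes the shapes match, $(N,C)(C,H) = (N,H)$.
  • Likewise, $d_{W_2} = h_1^T d_{scores}$.
  • At the max function, the local derivative is 0 where the input is not greater than 0 and 1 where it is greater than 0, so the upstream gradient passes through unchanged only at the positive entries: $d_{Relu} = (h_1 > 0) * d_{h_1}$.
  • Then $d_{b_1}$ is $d_{Relu}$ summed over the batch dimension.
  • Last comes another multiply gate, handled like the one above: $d_{W_1} = X^T d_{Relu}$.
  • Finally, differentiate the regularization term with respect to $W_1$ and $W_2$: $\frac{d_{0.5 * reg * W_1 * W_1}}{d_{W_1}} = reg * W_1$. The regularization term of $d_{W_2}$ follows in the same way.
    The code is as follows: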
dscores = norm_scores.copy()
dscores[range(N),y] -= 1
dscores /= N                        #(N,C)
db2 = np.sum(dscores,axis=0)        #(C,)
dh1 = dscores.dot(W2.T)             #(N,H)
dW2 = h1.T.dot(dscores) + reg * W2  #(H,C)
dRelu = (h1 > 0) * dh1              #(N,H)
dW1 = X.T.dot(dRelu) + reg * W1     #(D,H)
db1 = np.sum(dRelu,axis=0)          #(H,)
grads['b2'] = db2
grads['W2'] = dW2
grads['W1'] = dW1
grads['b1'] = db1
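One way to gain confidence in these gradients is to compare them with centered finite differences on a toy problem. The sketch below restates the forward and backward pass from this post in a standalone function so that it runs on its own; the names `loss_and_grads` and `numerical_grad`, the toy sizes, the seed, and the step size `eps` are my own assumptions rather than the assignment's helpers.

```python
import numpy as np

def loss_and_grads(params, X, y, reg):
    """Loss and analytic gradients of the two-layer net, following the steps in this post."""
    W1, b1, W2, b2 = params['W1'], params['b1'], params['W2'], params['b2']
    N = X.shape[0]
    h1 = np.maximum(0, X.dot(W1) + b1)
    scores = h1.dot(W2) + b2
    exp_scores = np.exp(scores - np.max(scores, axis=1, keepdims=True))
    norm_scores = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)
    loss = -np.mean(np.log(norm_scores[np.arange(N), y]))
    loss += 0.5 * reg * (np.sum(W1 * W1) + np.sum(W2 * W2))

    dscores = norm_scores.copy()
    dscores[np.arange(N), y] -= 1
    dscores /= N
    dh1 = dscores.dot(W2.T)
    dRelu = (h1 > 0) * dh1
    grads = {
        'W2': h1.T.dot(dscores) + reg * W2,
        'b2': np.sum(dscores, axis=0),
        'W1': X.T.dot(dRelu) + reg * W1,
        'b1': np.sum(dRelu, axis=0),
    }
    return loss, grads

def numerical_grad(params, name, X, y, reg, eps=1e-5):
    """Centered finite-difference gradient of the loss w.r.t. params[name]."""
    p = params[name]
    grad = np.zeros_like(p)
    for idx in np.ndindex(*p.shape):
        old = p[idx]
        p[idx] = old + eps
        loss_plus, _ = loss_and_grads(params, X, y, reg)
        p[idx] = old - eps
        loss_minus, _ = loss_and_grads(params, X, y, reg)
        p[idx] = old                              # restore the original value
        grad[idx] = (loss_plus - loss_minus) / (2 * eps)
    return grad

# Toy problem: sizes and seed are arbitrary choices for this sketch.
rng = np.random.default_rng(0)
N, D, H, C = 5, 4, 10, 3
X = rng.standard_normal((N, D))
y = rng.integers(0, C, size=N)
params = {
    'W1': 0.1 * rng.standard_normal((D, H)), 'b1': 0.1 * rng.standard_normal(H),
    'W2': 0.1 * rng.standard_normal((H, C)), 'b2': 0.1 * rng.standard_normal(C),
}

_, grads = loss_and_grads(params, X, y, reg=0.05)
max_diffs = {}
for name in params:
    num = numerical_grad(params, name, X, y, reg=0.05)
    max_diffs[name] = np.max(np.abs(num - grads[name]))
    print(name, 'max abs difference:', max_diffs[name])
```

The differences should be tiny for every parameter. A large difference in only one entry usually points at the corresponding line of the backward pass, for example a missing sum over the batch dimension in db1 or db2.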

Training data
The main work here is to sample a random minibatch and to update the parameters.
train() in neural_net.py

idx = np.random.choice(num_train,batch_size,replace=True)
X_batch = X[idx,:]
y_batch = y[idx]
self.params['W2'] += -learning_rate * grads['W2']
self.params['b2'] += -learning_rate * grads['b2']
self.params['W1'] += -learning_rate * grads['W1']
self.params['b1'] += -learning_rate * grads['b1']
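The two steps above can also be written as small standalone helpers, which makes them easy to test in isolation. This is only a sketch: `sample_batch` and `sgd_step` are hypothetical names of my own, and the toy arrays are chosen so that each label can be matched back to its row.

```python
import numpy as np

def sample_batch(X, y, batch_size, rng):
    """Draw a minibatch with replacement, keeping rows of X and labels in y aligned."""
    idx = rng.choice(X.shape[0], batch_size, replace=True)
    return X[idx], y[idx]

def sgd_step(params, grads, learning_rate):
    """Vanilla SGD: update every parameter in place, against its gradient."""
    for name in params:
        params[name] -= learning_rate * grads[name]

# Toy data: row i of X starts with 3*i and has label i, so alignment is easy to verify.
rng = np.random.default_rng(0)
X = np.arange(30, dtype=float).reshape(10, 3)
y = np.arange(10)
X_batch, y_batch = sample_batch(X, y, batch_size=4, rng=rng)

# Toy parameters and gradients for a single update step.
params = {'W': np.ones((2, 2)), 'b': np.zeros(2)}
grads = {'W': np.full((2, 2), 2.0), 'b': np.ones(2)}
sgd_step(params, grads, learning_rate=0.1)
```

Looping over the dictionary is equivalent to the four explicit update lines, as long as `grads` has an entry for every key in `params`.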

predict()

scores = self.loss(X)
y_pred = np.argmax(scores, axis=1)
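predict() takes the argmax of the raw scores without applying softmax first. That is enough because softmax is monotonic within each row, so the largest score is also the largest probability. A small check with made-up scores:

```python
import numpy as np

# Made-up scores for two samples and three classes.
scores = np.array([[1.0, 3.0, 0.5],
                   [2.0, -1.0, 0.0]])

# Softmax probabilities, computed with the usual max shift for stability.
shifted = scores - np.max(scores, axis=1, keepdims=True)
probs = np.exp(shifted) / np.sum(np.exp(shifted), axis=1, keepdims=True)

pred_from_scores = np.argmax(scores, axis=1)
pred_from_probs = np.argmax(probs, axis=1)
print(pred_from_scores, pred_from_probs)
```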

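Next, I choose the hyperparameters by validating them myself. The steps:

  1. The learning curves show that the model is underfitting, so first adjust the hidden layer size, using a loop to test several sizes.
  2. Judging from the loss curve of gradient descent, the learning rate is too low, so increase it and loop over candidate values.
  3. Increase the number of iterations.
  4. Decrease the regularization strength.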

Result: with hidden_size=150, reg=0.09, learning_rate=1e-3, the accuracy reaches 53.7%.
Code:

# Assumes the notebook has already imported TwoLayerNet and matplotlib.pyplot
# as plt, and has loaded X_train, y_train, X_val and y_val.
input_size = 32 * 32 * 3
hidden_sizes = [100, 125, 150]
num_classes = 10
regs = [0.03, 0.05, 0.09]
learning_rates = [1e-3]

best_acc = 0.40
best_net = None
for hs in hidden_sizes:
    for r in regs:
        for lr in learning_rates:
            # Start from a freshly initialized network for every combination,
            # so that a run does not continue training from the previous one
            # and best_net is not overwritten by later training.
            net = TwoLayerNet(input_size, hs, num_classes)

            # Train the network
            stats = net.train(X_train, y_train, X_val, y_val,
                        num_iters=2000, batch_size=200,
                        learning_rate=lr, learning_rate_decay=0.95,
                        reg=r, verbose=False)
            # Predict on the validation set
            val_acc = (net.predict(X_val) == y_val).mean()
            print('hidden_size:%d,reg:%f,learning_rate:%f' % (hs, r, lr))
            print('Validation accuracy: ', val_acc)
            if val_acc > best_acc:
                best_acc = val_acc
                best_net = net

            # Plot the loss and accuracy curves of this run
            plt.subplot(2, 1, 1)
            plt.plot(stats['loss_history'])
            plt.title('Loss history')
            plt.xlabel('Iteration')
            plt.ylabel('Loss')

            plt.subplot(2, 1, 2)
            plt.plot(stats['train_acc_history'], label='train')
            plt.plot(stats['val_acc_history'], label='val')
            plt.title('Classification accuracy history')
            plt.xlabel('Epoch')
            plt.ylabel('Classification accuracy')
            plt.legend()
            plt.show()

Visualizing W1 of the network:
[Figure: visualization of the learned W1 weights]
Final accuracy on the test set: Test accuracy: 0.545

Inline Question

Now that you have trained a Neural Network classifier, you may find that your testing accuracy is much lower than the training accuracy. In what ways can we decrease this gap? Select all that apply.

  1. Train on a larger dataset.
  2. Add more hidden units.
  3. Increase the regularization strength.
  4. None of the above.

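$\color{blue}{\textit Your Answer:}$
1, 3
$\color{blue}{\textit Your Explanation:}$
A large gap usually means the model is overfitting. Training on a larger dataset and increasing the regularization strength both reduce overfitting and therefore narrow the gap. Adding more hidden units increases the capacity of the model, which tends to make overfitting worse rather than better, so option 2 does not apply.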


Reference article:
cs231n的第一次作業2層神經網絡 (cs231n assignment 1: two-layer neural network)
