two-layer-net
First, implement the network's forward computation of the scores and the loss. The activation function is ReLU, i.e. max(0, x).
The loss() function in neural_net.py:
# *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
h1 = np.maximum(0, X.dot(W1) + b1)  # ReLU hidden activations
scores = h1.dot(W2) + b2            # class scores
# *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
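Here X has shape (N, D), W1 is (D, H), b1 is (H,), W2 is (H, C), and b2 is (C,), so h1 is (N, H) and scores is (N, C) — the same shapes annotated in the gradient code below.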
# *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
# Exponentiate the scores, shape (N, C)
exp_scores = np.exp(scores)
# Sum each row, reshaped to (N, 1) so it broadcasts
row_sum = np.sum(exp_scores, axis=1).reshape(N, 1)
norm_scores = exp_scores / row_sum  # softmax probabilities
data_loss = -1 / N * np.sum(np.log(norm_scores[np.arange(N), y]))
reg_loss = 0.5 * reg * (np.sum(W1 * W1) + np.sum(W2 * W2))
loss = data_loss + reg_loss
# *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
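One caveat worth noting (not handled in the code above): np.exp can overflow when the scores are large. A numerically stabler variant shifts each row by its maximum first, which leaves the softmax probabilities unchanged; a minimal sketch:

# Shift by the per-row max before exponentiating; softmax is invariant
# to a constant shift, but this prevents overflow in np.exp.
shifted = scores - np.max(scores, axis=1, keepdims=True)
exp_scores = np.exp(shifted)
norm_scores = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)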
Next comes backpropagation to compute the gradients. This part is somewhat tricky, so I record my own understanding below.
The overall procedure applies the chain rule step by step. The gradient of every variable keeps the same shape as the variable itself.
Below is the computation graph: read left to right it is the forward pass, with computed values above each line; read right to left it is backpropagation, with gradients below each line.
Convention: every derivative is the derivative of the loss with respect to that variable, i.e. $\frac{\partial Loss}{\partial x}$, so the code omits the numerator and writes only the denominator (dscores, dh1, dW2, ...).
- First compute $\frac{\partial Loss}{\partial scores}$, i.e. dscores. Mathematically $\frac{\partial L_i}{\partial scores_j} = p_j - \mathbb{1}(j = y_i)$, where $p$ are the softmax probabilities (norm_scores). Because data_loss subtracts the correct-class score ($-\log p_{y_i} = -scores_{y_i} + \log \sum_k e^{scores_k}$), the derivative at the correct class gets an extra $-1$; see the worked derivation after this list.
- $db_2 = \sum_i dscores_i$, summed over the batch; this is immediate, since $scores = h_1 W_2 + b_2$ has local derivative 1 with respect to $b_2$.
- Next comes a multiply gate: from $scores = h_1 W_2 + b_2$, the local derivative with respect to $h_1$ is $W_2$, hence $dh_1 = dscores \cdot W_2^T$.
- Similarly for $W_2$: $dW_2 = h_1^T \cdot dscores$.
- At the max (ReLU) gate, the local derivative is 0 where the input is not greater than 0 and 1 where it is greater than 0, so the upstream gradient passes through only at positive entries: $dRelu = \mathbb{1}(h_1 > 0) \odot dh_1$.
- Then $db_1 = \sum_i dRelu_i$, the same way as $db_2$.
- Last comes another multiply gate, analogous to the one above: $dW_1 = X^T \cdot dRelu$.
- Finally, differentiate the regularization term with respect to each weight matrix: $\frac{\partial}{\partial W_2}\left(0.5 \cdot reg \cdot (\lVert W_1 \rVert^2 + \lVert W_2 \rVert^2)\right) = reg \cdot W_2$, and the $W_1$ term $reg \cdot W_1$ follows in the same way.
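For reference, here is the standard softmax derivation behind the first bullet, written per example ($s = scores_i$, $p = norm\_scores_i$):

$$L_i = -\log p_{y_i}, \qquad p_j = \frac{e^{s_j}}{\sum_k e^{s_k}}$$

$$L_i = -s_{y_i} + \log \sum_k e^{s_k} \quad\Rightarrow\quad \frac{\partial L_i}{\partial s_j} = -\mathbb{1}(j = y_i) + \frac{e^{s_j}}{\sum_k e^{s_k}} = p_j - \mathbb{1}(j = y_i)$$

Averaging over the $N$ examples contributes the extra $1/N$ factor, which is the dscores /= N line in the code below.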
The code is as follows:
dscores = norm_scores.copy()
dscores[range(N), y] -= 1
dscores /= N                        # (N, C)
db2 = np.sum(dscores, axis=0)       # (C,)
dh1 = dscores.dot(W2.T)             # (N, H)
dW2 = h1.T.dot(dscores) + reg * W2  # (H, C)
dRelu = (h1 > 0) * dh1              # (N, H)
dW1 = X.T.dot(dRelu) + reg * W1     # (D, H)
db1 = np.sum(dRelu, axis=0)         # (H,)
grads['b2'] = db2
grads['W2'] = dW2
grads['W1'] = dW1
grads['b1'] = db1
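A quick way to verify these chain-rule results is a centered-difference numerical gradient check, which is what the assignment notebook does with eval_numerical_gradient from cs231n.gradient_check. A self-contained sketch of the same idea (the function name and interface here are mine, not the scaffold's):

import numpy as np

def numerical_gradient(f, w, h=1e-5):
    # Centered difference: df/dw_i ~ (f(w + h*e_i) - f(w - h*e_i)) / (2h)
    grad = np.zeros_like(w)
    it = np.nditer(w, flags=['multi_index'])
    while not it.finished:
        ix = it.multi_index
        old = w[ix]
        w[ix] = old + h
        fxph = f(w)      # f(w + h)
        w[ix] = old - h
        fxmh = f(w)      # f(w - h)
        w[ix] = old      # restore the entry
        grad[ix] = (fxph - fxmh) / (2 * h)
        it.iternext()
    return grad

Calling numerical_gradient(lambda W: net.loss(X, y, reg=0.05)[0], net.params['W1']) should match the analytic grads['W1'] up to a relative error around 1e-8; the lambda can ignore its argument because the parameter array is mutated in place (reg=0.05 is just an example value).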
Training data
This mainly involves randomly sampling a mini-batch and performing the parameter update.
train() in neural_net.py:
# Sample a mini-batch with replacement
idx = np.random.choice(num_train, batch_size, replace=True)
X_batch = X[idx, :]
y_batch = y[idx]
self.params['W2'] -= learning_rate * grads['W2']
self.params['b2'] -= learning_rate * grads['b2']
self.params['W1'] -= learning_rate * grads['W1']
self.params['b1'] -= learning_rate * grads['b1']
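Note that this update is plain SGD. In the scaffold's train() loop the learning rate is also multiplied by learning_rate_decay once per epoch (this is my reading of the provided code), which is what the learning_rate_decay argument in the tuning code below controls.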
predict()
scores = self.loss(X)  # with y omitted, loss() returns the raw scores instead of the loss
y_pred = np.argmax(scores, axis=1)
Next, pick the hyperparameters by running my own validation. The tuning steps:
- The learning curves show the model is underfitting, so first enlarge the hidden layer: write a loop that tries several hidden sizes.
- Judging from the loss curve, the learning rate is too low, so increase it and loop over candidate values.
- Increase the number of iterations.
- Decrease the regularization strength.
Result: with hidden_size=150, reg=0.09, and learning_rate=1e-3, the validation accuracy reaches 53.7%.
Code:
input_size = 32 * 32 * 3
hidden_size = [100, 125, 150]
num_classes = 10
reg = [0.03, 0.05, 0.09]
learning_rate = [1e-3]
best_acc = 0.40
for hs in hidden_size:
    for r in reg:
        for lr in learning_rate:
            # Re-initialize the network for every combination so each
            # run starts from fresh weights.
            net = TwoLayerNet(input_size, hs, num_classes)
            # Train the network
            stats = net.train(X_train, y_train, X_val, y_val,
                              num_iters=2000, batch_size=200,
                              learning_rate=lr, learning_rate_decay=0.95,
                              reg=r, verbose=False)
            # Predict on the validation set
            val_acc = (net.predict(X_val) == y_val).mean()
            print('hidden_size:%d, reg:%f, learning_rate:%f' % (hs, r, lr))
            print('Validation accuracy: ', val_acc)
            if val_acc > best_acc:
                best_acc = val_acc
                best_net = net
            plt.subplot(2, 1, 1)
            plt.plot(stats['loss_history'])
            plt.title('Loss history')
            plt.xlabel('Iteration')
            plt.ylabel('Loss')
            plt.subplot(2, 1, 2)
            plt.plot(stats['train_acc_history'], label='train')
            plt.plot(stats['val_acc_history'], label='val')
            plt.title('Classification accuracy history')
            plt.xlabel('Epoch')
            plt.ylabel('Classification accuracy')
            plt.legend()
            plt.show()
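As an alternative to a fixed grid, random search over log-uniform ranges often covers the space better for the same budget. A sketch under the same setup, reusing the variables defined above (the ranges and trial count are my own illustrative choices, not from the original run):

import numpy as np

# Randomly sample the learning rate and regularization on a log scale.
best_acc, best_net = 0.0, None
for _ in range(10):
    lr = 10 ** np.random.uniform(-4, -2.5)  # learning rate in roughly [1e-4, 3e-3]
    r = 10 ** np.random.uniform(-2, -0.5)   # reg in roughly [0.01, 0.3]
    net = TwoLayerNet(input_size, 150, num_classes)
    net.train(X_train, y_train, X_val, y_val,
              num_iters=2000, batch_size=200,
              learning_rate=lr, learning_rate_decay=0.95,
              reg=r, verbose=False)
    val_acc = (net.predict(X_val) == y_val).mean()
    print('lr: %e, reg: %e, val_acc: %f' % (lr, r, val_acc))
    if val_acc > best_acc:
        best_acc, best_net = val_acc, net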
Visualizing the learned W1 of the network:
Finally, the accuracy on the test set: Test accuracy: 0.545
Inline Question
Now that you have trained a Neural Network classifier, you may find that your testing accuracy is much lower than the training accuracy. In what ways can we decrease this gap? Select all that apply.
- Train on a larger dataset.
- Add more hidden units.
- Increase the regularization strength.
- None of the above.
Answer: 1 and 3.
When the gap is large, the model is most likely overfitting. Training on a larger dataset and increasing the regularization strength both reduce overfitting and therefore shrink the gap. Adding more hidden units, by contrast, increases model capacity and tends to widen the gap rather than close it.