two-layer-net
First, implement the network's forward computation of the scores and the loss. The activation function is ReLU, i.e. max(0, x).
The loss() function in neural_net.py:
# *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
h1 = np.maximum(0, X.dot(W1) + b1)  # ReLU hidden activations
scores = h1.dot(W2) + b2            # class scores
# *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
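Here X has shape (N, D), W1 is (D, H), b1 is (H,), W2 is (H, C), and b2 is (C,), so h1 is (N, H) and scores is (N, C) — the same shapes annotated in the gradient code below.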
# *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
# Exponentiate the scores, shape (N, C)
exp_scores = np.exp(scores)
# Sum each row, reshaped to (N, 1) so it broadcasts
row_sum = np.sum(exp_scores, axis=1).reshape(N, 1)
norm_scores = exp_scores / row_sum  # softmax probabilities
data_loss = -1 / N * np.sum(np.log(norm_scores[np.arange(N), y]))
reg_loss = 0.5 * reg * (np.sum(W1 * W1) + np.sum(W2 * W2))
loss = data_loss + reg_loss
# *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
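One caveat worth noting (not handled in the code above): np.exp can overflow when the scores are large. A numerically stabler variant shifts each row by its maximum first, which leaves the softmax probabilities unchanged; a minimal sketch:

# Shift by the per-row max before exponentiating; softmax is invariant
# to a constant shift, but this prevents overflow in np.exp.
shifted = scores - np.max(scores, axis=1, keepdims=True)
exp_scores = np.exp(shifted)
norm_scores = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)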
Next comes backpropagation to compute the gradients. This part is somewhat tricky, so I record my own understanding below.
The overall procedure applies the chain rule step by step. The gradient of every variable keeps the same shape as the variable itself.
Below is the computation graph: read left to right it is the forward pass, with computed values above each line; read right to left it is backpropagation, with gradients below each line.
Convention: every derivative is the derivative of the loss with respect to that variable, i.e. $\frac{\partial Loss}{\partial x}$, so the code omits the numerator and writes only the denominator (dscores, dh1, dW2, ...).
- First compute $\frac{\partial Loss}{\partial scores}$, i.e. dscores. Mathematically $\frac{\partial L_i}{\partial scores_j} = p_j - \mathbb{1}(j = y_i)$, where $p$ are the softmax probabilities (norm_scores). Because data_loss subtracts the correct-class score ($-\log p_{y_i} = -scores_{y_i} + \log \sum_k e^{scores_k}$), the derivative at the correct class gets an extra $-1$; see the worked derivation after this list.
- $db_2 = \sum_i dscores_i$, summed over the batch; this is immediate, since $scores = h_1 W_2 + b_2$ has local derivative 1 with respect to $b_2$.
- Next comes a multiply gate: from $scores = h_1 W_2 + b_2$, the local derivative with respect to $h_1$ is $W_2$, hence $dh_1 = dscores \cdot W_2^T$.
- Similarly for $W_2$: $dW_2 = h_1^T \cdot dscores$.
- At the max (ReLU) gate, the local derivative is 0 where the input is not greater than 0 and 1 where it is greater than 0, so the upstream gradient passes through only at positive entries: $dRelu = \mathbb{1}(h_1 > 0) \odot dh_1$.
- Then $db_1 = \sum_i dRelu_i$, the same way as $db_2$.
- Last comes another multiply gate, analogous to the one above: $dW_1 = X^T \cdot dRelu$.
- Finally, differentiate the regularization term with respect to each weight matrix: $\frac{\partial}{\partial W_2}\left(0.5 \cdot reg \cdot (\lVert W_1 \rVert^2 + \lVert W_2 \rVert^2)\right) = reg \cdot W_2$, and the $W_1$ term $reg \cdot W_1$ follows in the same way.
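For reference, here is the standard softmax derivation behind the first bullet, written per example ($s = scores_i$, $p = norm\_scores_i$):

$$L_i = -\log p_{y_i}, \qquad p_j = \frac{e^{s_j}}{\sum_k e^{s_k}}$$

$$L_i = -s_{y_i} + \log \sum_k e^{s_k} \quad\Rightarrow\quad \frac{\partial L_i}{\partial s_j} = -\mathbb{1}(j = y_i) + \frac{e^{s_j}}{\sum_k e^{s_k}} = p_j - \mathbb{1}(j = y_i)$$

Averaging over the $N$ examples contributes the extra $1/N$ factor, which is the dscores /= N line in the code below.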
The code is as follows:
dscores = norm_scores.copy()
dscores[range(N), y] -= 1
dscores /= N                        # (N, C)
db2 = np.sum(dscores, axis=0)       # (C,)
dh1 = dscores.dot(W2.T)             # (N, H)
dW2 = h1.T.dot(dscores) + reg * W2  # (H, C)
dRelu = (h1 > 0) * dh1              # (N, H)
dW1 = X.T.dot(dRelu) + reg * W1     # (D, H)
db1 = np.sum(dRelu, axis=0)         # (H,)
grads['b2'] = db2
grads['W2'] = dW2
grads['W1'] = dW1
grads['b1'] = db1
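A quick way to verify these chain-rule results is a centered-difference numerical gradient check, which is what the assignment notebook does with eval_numerical_gradient from cs231n.gradient_check. A self-contained sketch of the same idea (the function name and interface here are mine, not the scaffold's):

import numpy as np

def numerical_gradient(f, w, h=1e-5):
    # Centered difference: df/dw_i ~ (f(w + h*e_i) - f(w - h*e_i)) / (2h)
    grad = np.zeros_like(w)
    it = np.nditer(w, flags=['multi_index'])
    while not it.finished:
        ix = it.multi_index
        old = w[ix]
        w[ix] = old + h
        fxph = f(w)      # f(w + h)
        w[ix] = old - h
        fxmh = f(w)      # f(w - h)
        w[ix] = old      # restore the entry
        grad[ix] = (fxph - fxmh) / (2 * h)
        it.iternext()
    return grad

Calling numerical_gradient(lambda W: net.loss(X, y, reg=0.05)[0], net.params['W1']) should match the analytic grads['W1'] up to a relative error around 1e-8; the lambda can ignore its argument because the parameter array is mutated in place (reg=0.05 is just an example value).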
Training data
This mainly involves randomly sampling a mini-batch and performing the parameter update.
train() in neural_net.py:
# Sample a mini-batch with replacement
idx = np.random.choice(num_train, batch_size, replace=True)
X_batch = X[idx, :]
y_batch = y[idx]
self.params['W2'] -= learning_rate * grads['W2']
self.params['b2'] -= learning_rate * grads['b2']
self.params['W1'] -= learning_rate * grads['W1']
self.params['b1'] -= learning_rate * grads['b1']
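Note that this update is plain SGD. In the scaffold's train() loop the learning rate is also multiplied by learning_rate_decay once per epoch (this is my reading of the provided code), which is what the learning_rate_decay argument in the tuning code below controls.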
predict()
scores = self.loss(X)  # with y omitted, loss() returns the raw scores instead of the loss
y_pred = np.argmax(scores, axis=1)
Next, pick the hyperparameters by running my own validation. The tuning steps:
- The learning curves show the model is underfitting, so first enlarge the hidden layer: write a loop that tries several hidden sizes.
- Judging from the loss curve, the learning rate is too low, so increase it and loop over candidate values.
- Increase the number of iterations.
- Decrease the regularization strength.
Result: with hidden_size=150, reg=0.09, and learning_rate=1e-3, the validation accuracy reaches 53.7%.
Code:
input_size = 32 * 32 * 3
hidden_size = [100, 125, 150]
num_classes = 10
reg = [0.03, 0.05, 0.09]
learning_rate = [1e-3]
best_acc = 0.40
for hs in hidden_size:
    for r in reg:
        for lr in learning_rate:
            # Re-initialize the network for every combination so each
            # run starts from fresh weights.
            net = TwoLayerNet(input_size, hs, num_classes)
            # Train the network
            stats = net.train(X_train, y_train, X_val, y_val,
                              num_iters=2000, batch_size=200,
                              learning_rate=lr, learning_rate_decay=0.95,
                              reg=r, verbose=False)
            # Predict on the validation set
            val_acc = (net.predict(X_val) == y_val).mean()
            print('hidden_size:%d, reg:%f, learning_rate:%f' % (hs, r, lr))
            print('Validation accuracy: ', val_acc)
            if val_acc > best_acc:
                best_acc = val_acc
                best_net = net
            plt.subplot(2, 1, 1)
            plt.plot(stats['loss_history'])
            plt.title('Loss history')
            plt.xlabel('Iteration')
            plt.ylabel('Loss')
            plt.subplot(2, 1, 2)
            plt.plot(stats['train_acc_history'], label='train')
            plt.plot(stats['val_acc_history'], label='val')
            plt.title('Classification accuracy history')
            plt.xlabel('Epoch')
            plt.ylabel('Classification accuracy')
            plt.legend()
            plt.show()
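As an alternative to a fixed grid, random search over log-uniform ranges often covers the space better for the same budget. A sketch under the same setup, reusing the variables defined above (the ranges and trial count are my own illustrative choices, not from the original run):

import numpy as np

# Randomly sample the learning rate and regularization on a log scale.
best_acc, best_net = 0.0, None
for _ in range(10):
    lr = 10 ** np.random.uniform(-4, -2.5)  # learning rate in roughly [1e-4, 3e-3]
    r = 10 ** np.random.uniform(-2, -0.5)   # reg in roughly [0.01, 0.3]
    net = TwoLayerNet(input_size, 150, num_classes)
    net.train(X_train, y_train, X_val, y_val,
              num_iters=2000, batch_size=200,
              learning_rate=lr, learning_rate_decay=0.95,
              reg=r, verbose=False)
    val_acc = (net.predict(X_val) == y_val).mean()
    print('lr: %e, reg: %e, val_acc: %f' % (lr, r, val_acc))
    if val_acc > best_acc:
        best_acc, best_net = val_acc, net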
Visualizing the learned W1 of the network:
Finally, the accuracy on the test set: Test accuracy: 0.545
Inline Question
Now that you have trained a Neural Network classifier, you may find that your testing accuracy is much lower than the training accuracy. In what ways can we decrease this gap? Select all that apply.
- Train on a larger dataset.
- Add more hidden units.
- Increase the regularization strength.
- None of the above.
Answer: 1 and 3.
When the gap is large, the model is most likely overfitting. Training on a larger dataset and increasing the regularization strength both reduce overfitting and therefore shrink the gap. Adding more hidden units, by contrast, increases model capacity and tends to widen the gap rather than close it.