Algorithm Reinforcement: Backpropagation

Backpropagation

Backpropagation is used to avoid recomputing the same derivative paths over and over: by applying the chain rule from the output layer backwards, each intermediate gradient is computed once and then reused by every earlier layer.
For convenience, the forward-propagation steps from the previous section are copied here:
$$Z_1 = W_1 X + b_1$$
$$H_1 = \mathrm{ReLU}(Z_1)$$
$$Z_2 = W_2 H_1 + b_2$$
$$H_2 = \mathrm{ReLU}(Z_2)$$
$$Z_3 = W_3 H_2 + b_3$$
$$\hat{y} = \mathrm{sigmoid}(Z_3)$$
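As a reference for the code later in this section, here is a minimal numpy sketch of this forward pass. The function name forward_propagation and the dict layout (Weight['W1'], H['H1'], ...) are assumptions chosen to match the backward-pass code below, not a fixed API.

import numpy as np

# Minimal forward-pass sketch matching the equations above.
# Weight = {'W1': ..., 'W2': ..., 'W3': ...}, bias = {'b1': ..., 'b2': ..., 'b3': ...};
# returns the activation dict H with H['H3'] = y_hat.
def forward_propagation(X, Weight, bias):
    H = {}
    Z1 = np.dot(Weight['W1'], X) + bias['b1']
    H['H1'] = np.maximum(0, Z1)                       # ReLU
    Z2 = np.dot(Weight['W2'], H['H1']) + bias['b2']
    H['H2'] = np.maximum(0, Z2)                       # ReLU
    Z3 = np.dot(Weight['W3'], H['H2']) + bias['b3']
    H['H3'] = 1. / (1. + np.exp(-Z3))                 # sigmoid, i.e. y_hat
    return H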
The loss function is also copied here:
$$J(w, b)=\frac{1}{m} \sum_{i=1}^{m} L\left(\hat{y}^{(i)}, y^{(i)}\right)=-\frac{1}{m} \sum_{i=1}^{m}\left[y^{(i)} \log \left(\hat{y}^{(i)}\right)+\left(1-y^{(i)}\right) \log \left(1-\hat{y}^{(i)}\right)\right]+\frac{\lambda}{2 m}\|w\|_{F}^{2}$$
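A sketch of evaluating this cost in numpy, assuming y_hat and Y have shape (1, m) and Weight is a dict of weight matrices as used in the code below; lambd stands for the regularization strength $\lambda$.

import numpy as np

# Cross-entropy cost with L2 (Frobenius-norm) regularization, as in J(w, b) above.
def compute_cost(y_hat, Y, Weight, lambd):
    m = Y.shape[1]
    cross_entropy = -1. / m * np.sum(Y * np.log(y_hat) + (1 - Y) * np.log(1 - y_hat))
    l2 = lambd / (2. * m) * sum(np.sum(np.square(W)) for W in Weight.values())
    return cross_entropy + l2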
Note: to keep things readable, the transposes that appear when differentiating with respect to matrices are not written out.

The first step is to differentiate with respect to $z_3$:
$$\frac{\partial J}{\partial z_{3}}=\frac{\partial J}{\partial \hat{y}} \frac{\partial \hat{y}}{\partial z_{3}}=\hat{y}-y=\delta_{3}$$
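To see where $\hat{y}-y$ comes from, take a single example and drop the regularization term (which does not depend on $z_3$); the derivative of the cross-entropy with respect to $\hat{y}$ and the sigmoid derivative cancel:

$$\frac{\partial L}{\partial \hat{y}} = -\frac{y}{\hat{y}} + \frac{1-y}{1-\hat{y}}, \qquad \frac{\partial \hat{y}}{\partial z_3} = \hat{y}\,(1-\hat{y})$$
$$\frac{\partial L}{\partial z_3} = \left(-\frac{y}{\hat{y}} + \frac{1-y}{1-\hat{y}}\right)\hat{y}\,(1-\hat{y}) = -y\,(1-\hat{y}) + (1-y)\,\hat{y} = \hat{y} - y$$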
Then we differentiate with respect to the parameters $w$ and $b$:
$$\begin{aligned} \frac{\partial J}{\partial w_{3}} &= \frac{\partial J}{\partial z_{3}} \frac{\partial z_{3}}{\partial w_{3}} = \delta_{3} H_{2}+\frac{1}{m} \lambda w_{3} \\ \frac{\partial J}{\partial b_{3}} &= \frac{\partial J}{\partial z_{3}} \frac{\partial z_{3}}{\partial b_{3}} = \delta_{3} \end{aligned}$$
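In code, the omitted transposes and the averaging over the m examples reappear. A minimal sketch of these two gradients, assuming delta3 = y_hat - y has shape (1, m), H2 has shape (n2, m), and lambd is $\lambda$:

import numpy as np

# Sketch of the w3 / b3 gradients; the transpose on H2 and the 1/m average
# over examples are the details the formulas above leave implicit.
def layer3_gradients(delta3, H2, W3, lambd):
    m = delta3.shape[1]
    dW3 = 1. / m * np.dot(delta3, H2.T) + 1. / m * lambd * W3
    db3 = 1. / m * np.sum(delta3, axis=1, keepdims=True)
    return dW3, db3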

That completes the derivatives for $w_3$ and $b_3$. The remaining layers are essentially the same: apply the chain rule and work backwards one layer at a time.
$$\begin{aligned} \frac{\partial J}{\partial z_{2}} &= \frac{\partial J}{\partial z_{3}} \frac{\partial z_{3}}{\partial H_{2}} \frac{\partial H_{2}}{\partial z_{2}} = \delta_{3} w_{3}\, \mathrm{relu}'\left(z_{2}\right)=\delta_{2} \\ \frac{\partial J}{\partial w_{2}} &= \frac{\partial J}{\partial z_{2}} \frac{\partial z_{2}}{\partial w_{2}} = \delta_{2} H_{1}+\frac{1}{m} \lambda w_{2} \\ \frac{\partial J}{\partial b_{2}} &= \frac{\partial J}{\partial z_{2}} \frac{\partial z_{2}}{\partial b_{2}} = \delta_{2} \end{aligned}$$
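The only new ingredient is $\mathrm{relu}'(z)$, which is 1 where $z > 0$ and 0 elsewhere; the same pattern is reused for layer 1 below. A small sketch, assuming W3 has shape (1, n2), delta3 shape (1, m), and Z2 shape (n2, m):

import numpy as np

# Sketch of the delta_2 step: propagate delta_3 back through W3,
# then mask it with the ReLU derivative (1 where Z2 > 0, else 0).
def backprop_through_relu(delta3, W3, Z2):
    dH2 = np.dot(W3.T, delta3)           # dJ/dH2
    delta2 = dH2 * np.int64(Z2 > 0)      # elementwise product with relu'(Z2)
    return delta2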
The same applies to $W_1$ and $b_1$:
$$\begin{aligned} \frac{\partial J}{\partial z_{1}} &= \frac{\partial J}{\partial z_{2}} \frac{\partial z_{2}}{\partial H_{1}} \frac{\partial H_{1}}{\partial z_{1}} = \delta_{2} w_{2}\, \mathrm{relu}'\left(z_{1}\right)=\delta_{1} \\ \frac{\partial J}{\partial w_{1}} &= \frac{\partial J}{\partial z_{1}} \frac{\partial z_{1}}{\partial w_{1}} = \delta_{1} x+\frac{1}{m} \lambda w_{1} \\ \frac{\partial J}{\partial b_{1}} &= \frac{\partial J}{\partial z_{1}} \frac{\partial z_{1}}{\partial b_{1}} = \delta_{1} \end{aligned}$$

One point worth noting first: the derivative of a scalar with respect to a matrix has the same dimensions as that matrix. Take, for example:
$$\frac{\partial J}{\partial w_{3}}=\frac{\partial J}{\partial z_{3}} \frac{\partial z_{3}}{\partial w_{3}}=\delta_{3} H_{2}$$
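This makes a handy sanity check on the implementation below: $\partial J / \partial w_3$ must have the same shape as $w_3$, which is exactly why the code computes $\delta_3 H_2^{T}$. A quick check with hypothetical sizes (n2 hidden units, m examples):

import numpy as np

# Hypothetical sizes: n2 units in the second hidden layer, 1 output unit, m examples.
n2, m = 4, 5
delta3 = np.random.randn(1, m)      # shape of dJ/dZ3
H2 = np.random.randn(n2, m)         # second-layer activations
W3 = np.random.randn(1, n2)

dW3 = np.dot(delta3, H2.T)          # (1, m) x (m, n2) -> (1, n2)
assert dW3.shape == W3.shape        # gradient shape matches the parameter shape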

import numpy as np


def backward_propagation(X, Y, Weight, bias, H, activation, lambd=0.):
    # X: inputs of shape (n_x, m); Y: labels of shape (1, m).
    # Weight / bias: parameter dicts {'W1': ..., 'b1': ..., ...}; H: activation dict {'H1': ..., ...}.
    # activation: list of hidden-layer activation names ('relu' or 'tanh'); lambd: L2 strength.
    m = X.shape[1]
    gradients = {}
    L = len(Weight)
    # Output layer: sigmoid + cross-entropy gives dZ_L = y_hat - y (the delta_3 above).
    gradients['dZ' + str(L)] = H['H' + str(L)] - Y
    gradients['dW' + str(L)] = 1. / m * np.dot(gradients['dZ' + str(L)], H['H' + str(L - 1)].T) \
                               + 1. / m * lambd * Weight['W' + str(L)]
    gradients['db' + str(L)] = 1. / m * np.sum(gradients['dZ' + str(L)], axis=1, keepdims=True)
    for l in range(L - 1, 0, -1):
        # Propagate the delta back through the next layer's weights.
        gradients['dH' + str(l)] = np.dot(Weight['W' + str(l + 1)].T, gradients['dZ' + str(l + 1)])
        if activation[l - 1] == 'relu':
            # relu'(z) is 1 where the activation is positive, 0 elsewhere.
            gradients['dZ' + str(l)] = np.multiply(gradients['dH' + str(l)], np.int64(H['H' + str(l)] > 0))
        elif activation[l - 1] == 'tanh':
            # tanh'(z) = 1 - tanh(z)^2 = 1 - H^2.
            gradients['dZ' + str(l)] = np.multiply(gradients['dH' + str(l)], 1 - np.power(H['H' + str(l)], 2))

        # For the first hidden layer, the "previous activation" is the input X.
        A_prev = X if l == 1 else H['H' + str(l - 1)]
        gradients['dW' + str(l)] = 1. / m * np.dot(gradients['dZ' + str(l)], A_prev.T) \
                                   + 1. / m * lambd * Weight['W' + str(l)]
        gradients['db' + str(l)] = 1. / m * np.sum(gradients['dZ' + str(l)], axis=1, keepdims=True)

    return gradients


def update_parameters(Weight, bias, gradients, lr=0.1):
    # Gradient-descent update; lr is the learning rate.
    # Too small and the network converges very slowly; too large and it
    # keeps oscillating around the minimum and never converges.
    for i in range(1, len(Weight) + 1):
        Weight['W' + str(i)] -= lr * gradients['dW' + str(i)]
        bias['b' + str(i)] -= lr * gradients['db' + str(i)]
    return Weight, bias
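
A sketch of one complete training step, tying the pieces together; it reuses the forward_propagation sketch from earlier in this section, and the layer sizes and hyperparameters here are arbitrary placeholders.

import numpy as np

# One hypothetical gradient-descent step on random data.
np.random.seed(0)
X = np.random.randn(2, 5)                              # 2 features, 5 examples
Y = (np.random.rand(1, 5) > 0.5).astype(float)
Weight = {'W1': np.random.randn(3, 2) * 0.1,
          'W2': np.random.randn(3, 3) * 0.1,
          'W3': np.random.randn(1, 3) * 0.1}
bias = {'b1': np.zeros((3, 1)), 'b2': np.zeros((3, 1)), 'b3': np.zeros((1, 1))}

H = forward_propagation(X, Weight, bias)
grads = backward_propagation(X, Y, Weight, bias, H, activation=['relu', 'relu'], lambd=0.1)
Weight, bias = update_parameters(Weight, bias, grads, lr=0.1)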
