Backpropagation
Backpropagation is used to avoid recomputing the same derivative paths over and over.
For convenience, here is the forward-propagation process copied from earlier:
$$Z_1 = W_1 X + b_1$$
$$H_1 = \mathrm{ReLU}(Z_1)$$
$$Z_2 = W_2 H_1 + b_2$$
$$H_2 = \mathrm{ReLU}(Z_2)$$
$$Z_3 = W_3 H_2 + b_3$$
$$\hat{y} = \mathrm{sigmoid}(Z_3)$$
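The earlier forward-propagation code is not reproduced here, so the six equations above can be sketched as follows; the layer sizes and the one-example-per-column convention are assumptions for illustration:

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_propagation(X, Weight, bias):
    """Three-layer forward pass Z_l = W_l H_{l-1} + b_l.
    X holds one example per column; H caches every activation
    (H['H0'] is X itself, H['H3'] is y_hat)."""
    H = {'H0': X}
    Z1 = Weight['W1'] @ X + bias['b1']
    H['H1'] = relu(Z1)
    Z2 = Weight['W2'] @ H['H1'] + bias['b2']
    H['H2'] = relu(Z2)
    Z3 = Weight['W3'] @ H['H2'] + bias['b3']
    H['H3'] = sigmoid(Z3)   # the prediction y_hat
    return H

# toy shapes (assumed): 4 inputs, hidden sizes 5 and 3, one output, m = 8
rng = np.random.default_rng(0)
Weight = {'W1': rng.standard_normal((5, 4)),
          'W2': rng.standard_normal((3, 5)),
          'W3': rng.standard_normal((1, 3))}
bias = {'b1': np.zeros((5, 1)), 'b2': np.zeros((3, 1)), 'b3': np.zeros((1, 1))}
X = rng.standard_normal((4, 8))
H = forward_propagation(X, Weight, bias)
print(H['H3'].shape)  # (1, 8): one prediction per example
```

Caching every `H` during the forward pass is what lets the backward pass below reuse them instead of recomputing.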
We also copy over the loss function:
$$J(w,b) = \frac{1}{m}\sum_{i=1}^{m} L(\hat{y}^{(i)}, y^{(i)}) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log \hat{y}^{(i)} + (1-y^{(i)})\log(1-\hat{y}^{(i)})\right] + \frac{\lambda}{2m}\lVert w\rVert_F^2$$
Note: to keep the derivation easy to follow, the transposes required when differentiating with respect to matrices are not written out.
The first step is to differentiate with respect to $z_3$:
$$\frac{\partial J}{\partial z_3} = \frac{\partial J}{\partial \hat{y}}\,\frac{\partial \hat{y}}{\partial z_3} = \hat{y} - y = \delta_3$$
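The neat $\hat{y} - y$ result comes from the sigmoid and cross-entropy derivatives cancelling. It can be verified numerically with a central finite difference on a scalar example (the particular values of `z` and `y` are arbitrary):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce(z, y):
    """Cross-entropy loss viewed as a function of the pre-activation z."""
    y_hat = sigmoid(z)
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

z, y, eps = 0.7, 1.0, 1e-6
numeric = (bce(z + eps, y) - bce(z - eps, y)) / (2 * eps)
analytic = sigmoid(z) - y   # the delta_3 = y_hat - y formula
print(abs(numeric - analytic) < 1e-6)  # True: the two derivatives agree
```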
Next we differentiate with respect to the parameters $w$ and $b$:
$$\frac{\partial J}{\partial w_3} = \frac{\partial J}{\partial z_3}\,\frac{\partial z_3}{\partial w_3} = \delta_3 H_2 + \frac{\lambda}{m} w_3$$
$$\frac{\partial J}{\partial b_3} = \frac{\partial J}{\partial z_3}\,\frac{\partial z_3}{\partial b_3} = \delta_3$$
That takes care of $w_3$ and $b_3$. The remaining layers work essentially the same way: apply the chain rule layer by layer, moving toward the input.
$$\frac{\partial J}{\partial z_2} = \frac{\partial J}{\partial z_3}\,\frac{\partial z_3}{\partial H_2}\,\frac{\partial H_2}{\partial z_2} = \delta_3 w_3\,\mathrm{ReLU}'(z_2) = \delta_2$$
$$\frac{\partial J}{\partial w_2} = \frac{\partial J}{\partial z_2}\,\frac{\partial z_2}{\partial w_2} = \delta_2 H_1 + \frac{\lambda}{m} w_2$$
$$\frac{\partial J}{\partial b_2} = \frac{\partial J}{\partial z_2}\,\frac{\partial z_2}{\partial b_2} = \delta_2$$
The same applies to $w_1$ and $b_1$:
$$\frac{\partial J}{\partial z_1} = \frac{\partial J}{\partial z_2}\,\frac{\partial z_2}{\partial H_1}\,\frac{\partial H_1}{\partial z_1} = \delta_2 w_2\,\mathrm{ReLU}'(z_1) = \delta_1$$
$$\frac{\partial J}{\partial w_1} = \frac{\partial J}{\partial z_1}\,\frac{\partial z_1}{\partial w_1} = \delta_1 x + \frac{\lambda}{m} w_1$$
$$\frac{\partial J}{\partial b_1} = \frac{\partial J}{\partial z_1}\,\frac{\partial z_1}{\partial b_1} = \delta_1$$
One important point: when a scalar is differentiated with respect to a matrix, the result has the same dimensions as that matrix.
$$\frac{\partial J}{\partial w_3} = \frac{\partial J}{\partial z_3}\,\frac{\partial z_3}{\partial w_3} = \delta_3 H_2$$
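This shape rule is easy to check directly: with the omitted transpose restored, $\delta_3 H_2^\top$ has exactly the shape of $w_3$. A small sketch with assumed sizes (3 hidden units, 8 examples):

```python
import numpy as np

rng = np.random.default_rng(1)
m = 8                                # number of examples
W3 = rng.standard_normal((1, 3))     # maps H2 (3 units) to a single output
H2 = rng.standard_normal((3, m))     # layer-2 activations, one column per example
y_hat = rng.uniform(0.1, 0.9, (1, m))
y = rng.integers(0, 2, (1, m)).astype(float)

delta3 = y_hat - y                   # dJ/dZ3, shape (1, m)
dW3 = (1.0 / m) * delta3 @ H2.T      # the transpose restores the shapes
print(dW3.shape == W3.shape)         # True: the gradient matches the matrix it updates
```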
import numpy as np

def backward_propagation(X, Y, Weight, bias, H, activation, lambd=0.0):
    """Backward pass for an L-layer network.

    Assumes H['H0'] is X, H['Hl'] is layer l's activation, and the
    output layer is sigmoid with cross-entropy loss, so dZ_L = y_hat - Y.
    lambd is the L2-regularization strength from the loss function."""
    m = X.shape[1]
    gradients = {}
    L = len(Weight)
    # output layer: delta_L = y_hat - y
    gradients['dZ' + str(L)] = H['H' + str(L)] - Y
    gradients['dW' + str(L)] = 1. / m * np.dot(gradients['dZ' + str(L)], H['H' + str(L - 1)].T) \
                               + lambd / m * Weight['W' + str(L)]
    gradients['db' + str(L)] = 1. / m * np.sum(gradients['dZ' + str(L)], axis=1, keepdims=True)
    # hidden layers: propagate delta backwards with the chain rule
    for l in range(L - 1, 0, -1):
        gradients['dH' + str(l)] = np.dot(Weight['W' + str(l + 1)].T, gradients['dZ' + str(l + 1)])
        if activation[l - 1] == 'relu':
            # ReLU'(z) is 1 where the activation is positive, 0 elsewhere
            gradients['dZ' + str(l)] = np.multiply(gradients['dH' + str(l)],
                                                   (H['H' + str(l)] > 0).astype(float))
        elif activation[l - 1] == 'tanh':
            # tanh'(z) = 1 - tanh(z)^2 = 1 - H_l^2
            gradients['dZ' + str(l)] = np.multiply(gradients['dH' + str(l)],
                                                   1 - np.power(H['H' + str(l)], 2))
        gradients['dW' + str(l)] = 1. / m * np.dot(gradients['dZ' + str(l)], H['H' + str(l - 1)].T) \
                                   + lambd / m * Weight['W' + str(l)]
        gradients['db' + str(l)] = 1. / m * np.sum(gradients['dZ' + str(l)], axis=1, keepdims=True)
    return gradients
def update_parameters(Weight, bias, gradients, lr=0.1):
    """One gradient-descent step: theta <- theta - lr * dtheta."""
    for i in range(1, len(Weight) + 1):
        Weight['W' + str(i)] -= lr * gradients['dW' + str(i)]
        bias['b' + str(i)] -= lr * gradients['db' + str(i)]
    return Weight, bias
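Putting the backward pass and the update step together, here is a minimal end-to-end sketch. To stay self-contained it uses a simplified network (one hidden layer, no regularization) and a toy dataset, all of which are assumptions rather than the setup from earlier sections:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(42)
m = 200
X = rng.standard_normal((2, m))
Y = (X[0:1] * X[1:2] > 0).astype(float)   # XOR-like labels, shape (1, m)

W1 = rng.standard_normal((8, 2)) * 0.5
b1 = np.zeros((8, 1))
W2 = rng.standard_normal((1, 8)) * 0.5
b2 = np.zeros((1, 1))
lr = 0.3

losses = []
for _ in range(2000):
    # forward pass
    Z1 = W1 @ X + b1
    H1 = np.maximum(0, Z1)                 # ReLU
    Y_hat = sigmoid(W2 @ H1 + b2)
    losses.append(-np.mean(Y * np.log(Y_hat + 1e-12)
                           + (1 - Y) * np.log(1 - Y_hat + 1e-12)))
    # backward pass: delta at the output is y_hat - y, then chain through ReLU
    dZ2 = Y_hat - Y
    dW2 = (1 / m) * dZ2 @ H1.T
    db2 = (1 / m) * np.sum(dZ2, axis=1, keepdims=True)
    dZ1 = (W2.T @ dZ2) * (Z1 > 0)
    dW1 = (1 / m) * dZ1 @ X.T
    db1 = (1 / m) * np.sum(dZ1, axis=1, keepdims=True)
    # gradient-descent update
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(losses[0] > losses[-1])  # True: the loss decreased during training
```

Each iteration is one forward pass, one backward pass, and one parameter update; a falling loss over the iterations is the usual sanity check that the gradients are wired up correctly.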