Implementing a two-layer neural network from scratch in NumPy, together with the mathematical derivation. The main ingredients are matrix multiplication, chain-rule differentiation, and a few common activation functions.
1. Logistic regression
A logistic unit can be viewed as the simplest neural network; it is not covered in detail here, see this article.
2. Neural network
The network here has three layers: an input layer, a hidden layer, and an output layer. Taking the network in the figure as an example, we implement forward propagation, backpropagation, and the gradient update in NumPy.
3. Activation function
Activation functions are essential for an artificial neural network to learn and represent highly complex, non-linear functions. They are what introduce non-linearity into the network. Without an activation function, each layer is just a matrix multiplication, and no matter how many such layers you stack, the result is still nothing more than a single matrix multiplication.
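The "stacking linear layers is still linear" claim can be checked numerically: two weight matrices applied in sequence with no activation in between are equivalent to a single layer using the combined matrix W2·W1 (the sizes below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)
W1 = rng.standard_normal((4, 3))   # first "layer"
W2 = rng.standard_normal((2, 4))   # second "layer"
x = rng.standard_normal((3, 1))

# two layers with no activation in between ...
two_layers = W2 @ (W1 @ x)
# ... collapse to one layer with the combined matrix
one_layer = (W2 @ W1) @ x
```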
(1) sigmoid
σ(x) = 1 / (1 + e^(-x))
NumPy implementation:
def sigmoid(x):
    return 1 / (1 + np.exp(-x))
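Backpropagation needs the derivative of each activation. For sigmoid it is σ'(x) = σ(x)(1 - σ(x)), which can be verified against a centered finite difference (the helper name d_sigmoid is mine):

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def d_sigmoid(x):
    # the derivative reuses the forward value: s * (1 - s)
    s = sigmoid(x)
    return s * (1 - s)

# centered finite-difference approximation at a few points
x = np.array([-2.0, 0.0, 3.0])
eps = 1e-6
numeric = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)
```

The derivative peaks at 0.25 at x = 0, which is one reason gradients shrink in deep sigmoid networks.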
(2) tanh
tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))
NumPy implementation:
def tanh(x):
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))
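tanh's derivative is 1 - tanh²(x); this is exactly the `1 - np.power(A1, 2)` factor that shows up later in the backward pass. A finite-difference check (d_tanh is my name):

```python
import numpy as np

def tanh(x):
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))

def d_tanh(x):
    # derivative expressed through the forward value
    return 1 - tanh(x) ** 2

x = np.array([-1.0, 0.0, 2.0])
eps = 1e-6
numeric = (tanh(x + eps) - tanh(x - eps)) / (2 * eps)
```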
4. Loss function
A loss function maps the outcome of a random event, or the value of an associated random variable, to a non-negative real number representing the "risk" or "loss" of that event. In practice, the loss function serves as the learning criterion of an optimization problem: the model is fitted and evaluated by minimizing the loss.
Cross Entropy
Cross-entropy loss is a common loss function for classification tasks. It applies to both binary and multi-class classification, with slightly different forms.
In binary classification: L = -[y·log(ŷ) + (1 - y)·log(1 - ŷ)]
In multi-class classification, with one-hot labels over C classes: L = -Σ_{c=1}^{C} y_c·log(ŷ_c)
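A small worked example of the binary form, with made-up labels and predicted probabilities:

```python
import numpy as np

y = np.array([1.0, 0.0, 1.0])        # true labels
y_hat = np.array([0.9, 0.2, 0.7])    # predicted probabilities

# average binary cross-entropy over the three samples
loss = -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
```

Confident correct predictions (0.9 for a label of 1) contribute little loss; the average here works out to about 0.228.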
5. Forward propagation
(1) Parameter initialization
Besides the input layer, the network has two layers, so we need to initialize the hidden layer's weights W and bias b and the output layer's weights W and bias b. Each weight matrix is initialized to small random values and each bias to zero.
NumPy implementation:
def init_parameters(input_size, hidden_size, output_size):
    # small random weights break symmetry; scaling by 0.01 keeps
    # activations near the linear region of tanh early in training
    w1 = np.random.randn(hidden_size, input_size) * 0.01
    b1 = np.zeros((hidden_size, 1))
    w2 = np.random.randn(output_size, hidden_size) * 0.01
    b2 = np.zeros((output_size, 1))
    parameters = {
        "w1": w1,
        "b1": b1,
        "w2": w2,
        "b2": b2
    }
    return parameters
(2) Forward propagation
Computation:
Z1 = W1·X + b1
A1 = tanh(Z1)
Z2 = W2·A1 + b2
A2 = σ(Z2)
NumPy implementation:
def forward_propagation(X, parameters):
    w1 = parameters["w1"]
    b1 = parameters["b1"]
    w2 = parameters["w2"]
    b2 = parameters["b2"]
    # hidden layer: linear transform followed by tanh
    Z1 = np.dot(w1, X) + b1
    A1 = tanh(Z1)
    # output layer: linear transform followed by sigmoid
    Z2 = np.dot(w2, A1) + b2
    A2 = sigmoid(Z2)
    cache = {
        "Z1": Z1,
        "A1": A1,
        "Z2": Z2,
        "A2": A2,
    }
    return A2, cache
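Whatever the weights are, the sigmoid output layer guarantees that A2 has shape (1, m) with every entry strictly in (0, 1). A quick check of the forward pass with arbitrary sizes (2 features, 10 examples, 4 hidden units):

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

rng = np.random.default_rng(0)
X = rng.standard_normal((2, 10))                  # 2 features, 10 examples
w1, b1 = rng.standard_normal((4, 2)) * 0.01, np.zeros((4, 1))
w2, b2 = rng.standard_normal((1, 4)) * 0.01, np.zeros((1, 1))

A1 = np.tanh(w1 @ X + b1)     # hidden activations, shape (4, 10)
A2 = sigmoid(w2 @ A1 + b2)    # output probabilities, shape (1, 10)
```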
6. Backward propagation
(1) Computing the loss
NumPy implementation:
def compute_cost(A2, Y):
    m = Y.shape[1]
    # element-wise binary cross-entropy, averaged over the m examples
    loss = -(np.multiply(Y, np.log(A2)) + np.multiply(1 - Y, np.log(1 - A2)))
    cost = np.sum(loss) / m
    cost = np.squeeze(cost)
    return cost
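A quick sanity check: an uninformative prediction of 0.5 for every example should cost exactly ln 2 ≈ 0.693, which matches the step-0 value printed during training below.

```python
import numpy as np

def compute_cost(A2, Y):
    m = Y.shape[1]
    loss = -(np.multiply(Y, np.log(A2)) + np.multiply(1 - Y, np.log(1 - A2)))
    return float(np.squeeze(np.sum(loss) / m))

Y = np.array([[1.0, 0.0, 1.0, 0.0]])
A2 = np.full((1, 4), 0.5)   # predict 0.5 everywhere
cost = compute_cost(A2, Y)
```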
(2) Backpropagation
With a sigmoid output and cross-entropy loss, the chain rule gives (⊙ denotes element-wise multiplication, and 1 - A1² is the derivative of tanh):
dZ2 = A2 - Y
dW2 = dZ2·A1ᵀ / m
db2 = Σ dZ2 / m  (summed over examples)
dZ1 = (W2ᵀ·dZ2) ⊙ (1 - A1²)
dW1 = dZ1·Xᵀ / m
db1 = Σ dZ1 / m  (summed over examples)
NumPy implementation:
def back_propagation(parameters, cache, X, Y):
    m = X.shape[1]
    w2 = parameters["w2"]
    A1 = cache["A1"]
    A2 = cache["A2"]
    # output layer: derivative of cross-entropy through the sigmoid
    dZ2 = A2 - Y
    dw2 = np.dot(dZ2, A1.T) / m
    db2 = np.sum(dZ2, axis=1, keepdims=True) / m
    # hidden layer: 1 - A1**2 is the derivative of tanh
    dZ1 = np.multiply(np.dot(w2.T, dZ2), 1 - np.power(A1, 2))
    dw1 = np.dot(dZ1, X.T) / m
    db1 = np.sum(dZ1, axis=1, keepdims=True) / m
    grads = {"dw1": dw1,
             "db1": db1,
             "dw2": dw2,
             "db2": db2}
    return grads
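One way to verify a backward pass is a numerical gradient check: perturb a single weight by ±ε and compare the resulting cost difference quotient against the analytic gradient. A sketch for one entry of dW2 on a tiny network (all sizes and data below are made up):

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def forward(params, X):
    A1 = np.tanh(params["w1"] @ X + params["b1"])
    A2 = sigmoid(params["w2"] @ A1 + params["b2"])
    return A1, A2

def cost(params, X, Y):
    _, A2 = forward(params, X)
    m = Y.shape[1]
    return float(np.sum(-(Y * np.log(A2) + (1 - Y) * np.log(1 - A2))) / m)

rng = np.random.default_rng(0)
X = rng.standard_normal((2, 5))
Y = (rng.random((1, 5)) > 0.5).astype(float)
params = {"w1": rng.standard_normal((3, 2)) * 0.01, "b1": np.zeros((3, 1)),
          "w2": rng.standard_normal((1, 3)) * 0.01, "b2": np.zeros((1, 1))}

# analytic gradient for w2 (same formulas as back_propagation)
A1, A2 = forward(params, X)
m = Y.shape[1]
dZ2 = A2 - Y
dw2 = dZ2 @ A1.T / m

# numerical gradient for entry (0, 1) of w2, via central differences
eps = 1e-5
params["w2"][0, 1] += eps
c_plus = cost(params, X, Y)
params["w2"][0, 1] -= 2 * eps
c_minus = cost(params, X, Y)
params["w2"][0, 1] += eps   # restore
numeric = (c_plus - c_minus) / (2 * eps)
```

If the analytic and numerical values disagree by more than roundoff, the backward pass has a bug.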
7. Gradient descent
Update rule: each parameter takes a step against its gradient, scaled by the learning rate α.
Update formulas:
W1 := W1 - α·dW1,  b1 := b1 - α·db1
W2 := W2 - α·dW2,  b2 := b2 - α·db2
NumPy implementation:
def gradient_descent(parameters, grads, learning_rate):
    w1 = parameters["w1"]
    b1 = parameters["b1"]
    w2 = parameters["w2"]
    b2 = parameters["b2"]
    dw1 = grads["dw1"]
    db1 = grads["db1"]
    dw2 = grads["dw2"]
    db2 = grads["db2"]
    # step each parameter against its gradient
    w1 = w1 - learning_rate * dw1
    b1 = b1 - learning_rate * db1
    w2 = w2 - learning_rate * dw2
    b2 = b2 - learning_rate * db2
    parameters = {
        "w1": w1,
        "b1": b1,
        "w2": w2,
        "b2": b2
    }
    return parameters
8. Training the network
def model(X, Y, hidden_size, num_iterations):
    input_size = X.shape[0]
    output_size = Y.shape[0]
    parameters = init_parameters(input_size, hidden_size, output_size)
    for i in range(num_iterations):
        A2, cache = forward_propagation(X, parameters)
        cost = compute_cost(A2, Y)
        grads = back_propagation(parameters, cache, X, Y)
        parameters = gradient_descent(parameters, grads, 0.5)  # learning rate 0.5
        if i % 1000 == 0:
            print("step:%i %f" % (i, cost))
    return parameters
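The trained parameters can then be used for prediction by running the forward pass and thresholding A2 at 0.5. A minimal sketch of such a helper (the name predict and the toy parameters below are mine, not from the original post):

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def predict(parameters, X):
    # forward pass, then threshold the output probability at 0.5
    A1 = np.tanh(parameters["w1"] @ X + parameters["b1"])
    A2 = sigmoid(parameters["w2"] @ A1 + parameters["b2"])
    return (A2 > 0.5).astype(int)

# hypothetical parameters for a 2-4-1 network, chosen so the
# decision is easy to trace by hand
params = {"w1": np.ones((4, 2)), "b1": np.zeros((4, 1)),
          "w2": np.ones((1, 4)), "b2": np.zeros((1, 1))}
preds = predict(params, np.array([[1.0, -1.0], [1.0, -1.0]]))
```

Here the all-positive input column maps to class 1 and the all-negative column to class 0.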
X_assess, Y_assess = nn_model_test_case()  # load the training data
parameters = model(X_assess, Y_assess, 4, 10000)
Output:
step:0 0.693230
step:1000 0.642704
step:2000 0.451822
step:3000 0.128380
step:4000 0.055486
step:5000 0.033372
step:6000 0.023442
step:7000 0.017932
step:8000 0.014464
step:9000 0.012093