Implementing a Simple Neural Network from Scratch with Python and NumPy

Implementing a two-layer neural network from scratch with NumPy, together with the mathematical derivation. It mainly involves matrix multiplication, chain-rule differentiation, and a few common activation functions.

1.logistic regression

A logistic regression unit can be seen as the simplest neural network. It is not covered in detail here; see this article instead.
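As a minimal sketch of that idea (not taken from the referenced article; the weights and inputs below are made-up illustrative values), a logistic unit is just a weighted sum of the inputs pushed through a sigmoid:

import numpy as np

def logistic_unit(w, b, x):
    # A single "neuron": linear combination of the inputs followed by a sigmoid.
    return 1 / (1 + np.exp(-(np.dot(w, x) + b)))

print(logistic_unit(np.array([0.5, -0.3]), 0.1, np.array([1.0, 2.0])))  # 0.5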

2.neural network

[Figure: a three-layer network — input layer, hidden layer, output layer]

The network here has three layers: an input layer, a hidden layer, and an output layer. Using the network in the figure as the example, we implement forward propagation, backpropagation, and the gradient update in NumPy.

3.activation function

Activation functions are essential for an artificial neural network to learn and represent very complex, nonlinear functions. They introduce nonlinearity into the network: without an activation function, every layer is just a matrix multiplication, and no matter how many such layers you stack, the result is still nothing more than a matrix multiplication.
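A quick numerical check of that claim (a throwaway sketch with arbitrary random matrices): two linear layers with no activation collapse into a single matrix multiplication.

import numpy as np

W1 = np.random.rand(4, 3)
W2 = np.random.rand(2, 4)
x = np.random.rand(3, 1)

two_linear_layers = np.dot(W2, np.dot(W1, x))  # "two layers" without any activation
single_matmul = np.dot(np.dot(W2, W1), x)      # one equivalent matrix multiplication
print(np.allclose(two_linear_layers, single_matmul))  # True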

(1) sigmoid

$$g(x) = \frac{1}{1+e^{-x}}, \qquad g'(x) = g(x)\bigl(1-g(x)\bigr)$$

NumPy implementation:

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))  # element-wise logistic function
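The derivative given above can be implemented the same way; sigmoid_derivative is a helper added here for completeness, not part of the original code.

def sigmoid_derivative(x):
    # g'(x) = g(x) * (1 - g(x))
    s = sigmoid(x)
    return s * (1 - s)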

(2) tanh

$$g(x) = \frac{e^{x}-e^{-x}}{e^{x}+e^{-x}}, \qquad g'(x) = 1 - g(x)^2$$

NumPy implementation:

def tanh(x):
    # Equivalent to NumPy's built-in np.tanh.
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))
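The derivative follows the formula above, and a quick check (with illustrative values) confirms the hand-written version agrees with NumPy's built-in np.tanh:

def tanh_derivative(x):
    # g'(x) = 1 - g(x)^2
    return 1 - tanh(x) ** 2

x = np.linspace(-3, 3, 7)
print(np.allclose(tanh(x), np.tanh(x)))  # True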

4.loss function

A loss function maps a random event (or the value of a related random variable) to a non-negative real number that represents the "risk" or "loss" of that event. In practice, the loss function serves as the learning criterion of an optimization problem: the model is fitted and evaluated by minimizing it.

Cross Entropy

Cross-entropy loss is the standard loss function for classification tasks. It can be used for both binary and multi-class classification, with slightly different forms.

Binary classification:
$$L(\hat{y}, y) = -\bigl(y\log(\hat{y}) + (1-y)\log(1-\hat{y})\bigr)$$
Multi-class classification (summed over the classes $i$):
$$L(\hat{y}, y) = -\sum_i y_i \log(\hat{y}_i)$$
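To make the binary form concrete, it can be evaluated directly for a single prediction (the numbers below are made up for illustration):

import numpy as np

def binary_cross_entropy(y_hat, y):
    # L(y_hat, y) = -(y*log(y_hat) + (1-y)*log(1-y_hat))
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

print(binary_cross_entropy(0.9, 1))  # ~0.105: confident and correct, small loss
print(binary_cross_entropy(0.9, 0))  # ~2.303: confident but wrong, large loss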

5.forward propagation

(1) Initialize the parameters

Besides the input layer, the network has two layers, so we need to initialize the weights W and bias b of the hidden layer and of the output layer. The weights of each layer are initialized to small random numbers and the biases are set to 0.
$$W1 \text{ -- } (n_h, n_x), \quad b1 \text{ -- } (n_h, 1)$$
$$W2 \text{ -- } (n_y, n_h), \quad b2 \text{ -- } (n_y, 1)$$
where $n_x$, $n_h$ and $n_y$ are the sizes of the input, hidden and output layers.
NumPy implementation:

def init_parameters(input_size, hidden_size, output_size):
    # Small random weights break symmetry; biases can safely start at zero.
    w1 = np.random.rand(hidden_size, input_size) * 0.01
    b1 = np.zeros((hidden_size, 1))
    w2 = np.random.rand(output_size, hidden_size) * 0.01
    b2 = np.zeros((output_size, 1))
    
    parameters = {
        "w1":w1,
        "b1":b1,
        "w2":w2,
        "b2":b2
    }
    return parameters
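A quick shape check (the layer sizes here are arbitrary, chosen only for illustration):

params = init_parameters(input_size=2, hidden_size=4, output_size=1)
for name, value in params.items():
    print(name, value.shape)  # w1 (4, 2), b1 (4, 1), w2 (1, 4), b2 (1, 1)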

(2) Forward propagation

Computation:
$$z^{[1](i)} = W^{[1]} x^{(i)} + b^{[1]}$$
$$a^{[1](i)} = \tanh\bigl(z^{[1](i)}\bigr)$$
$$z^{[2](i)} = W^{[2]} a^{[1](i)} + b^{[2]}$$
$$\hat{y}^{(i)} = a^{[2](i)} = \mathrm{sigmoid}\bigl(z^{[2](i)}\bigr)$$
NumPy implementation:

def forward_propagation(X, parameters):
    w1 = parameters["w1"]
    b1 = parameters["b1"]
    w2 = parameters["w2"]
    b2 = parameters["b2"]
    
    # Hidden layer: linear transform followed by tanh.
    Z1 = np.dot(w1, X) + b1
    A1 = tanh(Z1)
    
    # Output layer: linear transform followed by sigmoid.
    Z2 = np.dot(w2, A1) + b2
    A2 = sigmoid(Z2)
    
    cache = {
        "Z1":Z1,
        "A1":A1,
        "Z2":Z2,
        "A2":A2,
    }
    return A2,cache
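Usage sketch with random inputs (X holds one column per example; the sizes are assumed for illustration):

X = np.random.rand(2, 5)    # 2 features, 5 examples
params = init_parameters(2, 4, 1)
A2, cache = forward_propagation(X, params)
print(A2.shape)             # (1, 5): one prediction per example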

6.backward propagation

(1) Compute the cost

$$J = -\frac{1}{m}\sum_{i=1}^{m}\Bigl(y^{(i)}\log\bigl(a^{[2](i)}\bigr) + \bigl(1-y^{(i)}\bigr)\log\bigl(1-a^{[2](i)}\bigr)\Bigr)$$

NumPy implementation:

def compute_cost(A2, Y):
    m = Y.shape[1]  # number of training examples
    # Element-wise binary cross-entropy, averaged over the examples.
    loss = -(np.multiply(Y, np.log(A2)) + np.multiply(1 - Y, np.log(1 - A2)))
    cost = np.sum(loss) / m
    cost = np.squeeze(cost)
    return cost
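As a sanity check (with made-up labels), an uninformative prediction of 0.5 for every example gives a cost of log 2 ≈ 0.693, which is what step 0 of the training output below shows:

Y = np.array([[1, 0, 1, 1]])             # hypothetical labels
A2 = np.full_like(Y, 0.5, dtype=float)   # predict 0.5 for every example
print(compute_cost(A2, Y))               # ~0.6931 = log(2)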

(2) Backpropagation

$$dW1 = \frac{\partial \mathcal{J}}{\partial W_1}, \quad db1 = \frac{\partial \mathcal{J}}{\partial b_1}, \quad dW2 = \frac{\partial \mathcal{J}}{\partial W_2}, \quad db2 = \frac{\partial \mathcal{J}}{\partial b_2}$$

Partial derivatives for the output layer:
$$\frac{\partial \mathcal{J}}{\partial z_2^{(i)}} = \frac{1}{m}\bigl(a^{[2](i)} - y^{(i)}\bigr), \quad \frac{\partial \mathcal{J}}{\partial W_2} = \frac{\partial \mathcal{J}}{\partial z_2^{(i)}}\, a^{[1](i)T}, \quad \frac{\partial \mathcal{J}}{\partial b_2} = \sum_i \frac{\partial \mathcal{J}}{\partial z_2^{(i)}}$$
Partial derivatives for the hidden layer:
$$\frac{\partial \mathcal{J}}{\partial z_1^{(i)}} = W_2^T \frac{\partial \mathcal{J}}{\partial z_2^{(i)}} * \bigl(1 - a^{[1](i)\,2}\bigr), \quad \frac{\partial \mathcal{J}}{\partial W_1} = \frac{\partial \mathcal{J}}{\partial z_1^{(i)}}\, X^T, \quad \frac{\partial \mathcal{J}}{\partial b_1} = \sum_i \frac{\partial \mathcal{J}}{\partial z_1^{(i)}}$$

NumPy implementation:

def back_propagation(parameters, cache, X, Y):
    m = X.shape[1]  # number of training examples
    
    w2 = parameters["w2"]
    A1 = cache["A1"]
    A2 = cache["A2"]
    
    # Gradients of the output layer.
    dZ2 = A2 - Y
    dw2 = np.dot(dZ2, A1.T) / m
    db2 = np.sum(dZ2, axis=1, keepdims=True) / m
    
    # Gradients of the hidden layer (tanh'(z) = 1 - a^2).
    dZ1 = np.multiply(np.dot(w2.T, dZ2), 1 - np.power(A1, 2))
    dw1 = np.dot(dZ1, X.T) / m
    db1 = np.sum(dZ1, axis=1, keepdims=True) / m
    
    grads = {"dw1": dw1,
             "db1": db1,
             "dw2": dw2,
             "db2": db2}
    
    return grads
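One way to verify these gradients (an added check, not part of the original post) is a numerical gradient check: nudge one weight, recompute the cost, and compare the finite-difference slope with the analytic gradient returned by back_propagation.

def numerical_grad_check(parameters, X, Y, name="w1", i=0, j=0, eps=1e-6):
    # Finite-difference estimate of dJ/d(parameters[name][i, j]).
    plus = {k: v.copy() for k, v in parameters.items()}
    minus = {k: v.copy() for k, v in parameters.items()}
    plus[name][i, j] += eps
    minus[name][i, j] -= eps
    cost_plus = compute_cost(forward_propagation(X, plus)[0], Y)
    cost_minus = compute_cost(forward_propagation(X, minus)[0], Y)
    # Should be close to grads["d" + name][i, j] from back_propagation.
    return (cost_plus - cost_minus) / (2 * eps)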

7.gradient descent

Gradient update rule:
$$\theta = \theta - \alpha \frac{\partial J}{\partial \theta}$$
where $\alpha$ is the learning rate and $\theta$ denotes a parameter. Update formulas:
$$W1 := W1 - \alpha \frac{\partial \mathcal{J}}{\partial W_1}, \quad b1 := b1 - \alpha \frac{\partial \mathcal{J}}{\partial b_1}$$
$$W2 := W2 - \alpha \frac{\partial \mathcal{J}}{\partial W_2}, \quad b2 := b2 - \alpha \frac{\partial \mathcal{J}}{\partial b_2}$$
NumPy implementation:

def gradient_descent(parameters, grads, learning_rate):
    w1 = parameters["w1"]
    b1 = parameters["b1"]
    w2 = parameters["w2"]
    b2 = parameters["b2"]
    
    dw1 = grads["dw1"]
    db1 = grads["db1"]
    dw2 = grads["dw2"]
    db2 = grads["db2"]
    
    # Take one gradient step on each parameter.
    w1 = w1 - learning_rate * dw1
    b1 = b1 - learning_rate * db1
    w2 = w2 - learning_rate * dw2
    b2 = b2 - learning_rate * db2
    
    parameters = {
        "w1":w1,
        "b1":b1,
        "w2":w2,
        "b2":b2
    }
    return parameters

8.training the network

def model(X,Y,hidden_size,num_iterations):
    input_size = X.shape[0]
    output_size = Y.shape[0]
    parameters = init_parameters(input_size,hidden_size,output_size)
    
    for i in range(num_iterations):
        # One iteration: forward pass, cost, backward pass, parameter update (learning rate 0.5).
        A2,cache = forward_propagation(X,parameters)
        cost = compute_cost(A2,Y)
        grads = back_propagation(parameters,cache,X,Y)
        parameters = gradient_descent(parameters,grads,0.5)
        
        if i % 1000 == 0:
            print("step:%i   %f"%(i,cost))
    return parameters
X_assess, Y_assess = nn_model_test_case()  # load the training data
parameters = model(X_assess, Y_assess, 4, 10000)

Output:

step:0   0.693230
step:1000   0.642704
step:2000   0.451822
step:3000   0.128380
step:4000   0.055486
step:5000   0.033372
step:6000   0.023442
step:7000   0.017932
step:8000   0.014464
step:9000   0.012093
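
With the trained parameters, predictions can be made by thresholding the network output at 0.5; predict is a small helper sketched here, not part of the original post.

def predict(X, parameters):
    # Forward pass, then classify as 1 whenever the sigmoid output exceeds 0.5.
    A2, _ = forward_propagation(X, parameters)
    return (A2 > 0.5).astype(int)

predictions = predict(X_assess, parameters)
print("train accuracy:", np.mean(predictions == Y_assess))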
