Gradient Checking

        Sometimes, after implementing backward propagation, we have no way of knowing whether our implementation is actually correct (this post is only for networks you write by hand; if you simply use an existing framework, you don't need to worry about this). So we usually run gradient checking to verify the backprop implementation. Gradient checking simply means applying the definition of the derivative ourselves to compute the gradients of the W's and b's numerically, then comparing them with the gradients produced by backprop; if the difference is within a very small range, we can consider our backprop implementation correct.
        First, let's recall the definition of the derivative:

$$f'(x_0) = \lim_{\Delta x \to 0} \frac{f(x_0 + \Delta x) - f(x_0)}{\Delta x}$$

When applying this to gradient checking for a neural network, we cannot actually take Δx → 0; we can only use a fairly small number. So, to make the result more accurate, we rewrite the formula above slightly as a two-sided difference:

$$f'(x_0) \approx \frac{f(x_0 + \varepsilon) - f(x_0 - \varepsilon)}{2\varepsilon}$$

Setting ε = 1e-7 is usually sufficient.
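As a quick sanity check of the two-sided formula itself (this snippet is not from the post's code; the function f and the point x are just illustrative), here is a minimal sketch comparing the one-sided and two-sided approximations on f(x) = x³:

import numpy as np

def f(x):
    return x ** 3          # true derivative: 3 * x**2

x, eps = 2.0, 1e-7
true_grad = 3 * x ** 2                               # 12.0
one_sided = (f(x + eps) - f(x)) / eps                # truncation error ~ O(eps)
two_sided = (f(x + eps) - f(x - eps)) / (2 * eps)    # truncation error ~ O(eps**2)

print(abs(one_sided - true_grad))   # noticeably larger error
print(abs(two_sided - true_grad))   # much closer to zero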
For a neural network, the gradient-checking steps are usually:

  1. Reshape and concatenate W[1], b[1], ..., W[L], b[L] into a single vector θ.
  2. Likewise, reshape and concatenate dW[1], db[1], ..., dW[L], db[L] into a vector dθ.
  3. Apply the two-sided definition of the derivative to every component i of θ:
     $$d\theta_{approx}[i] = \frac{J(\theta_1, \ldots, \theta_i + \varepsilon, \ldots) - J(\theta_1, \ldots, \theta_i - \varepsilon, \ldots)}{2\varepsilon}$$
  4. Compare whether dθ_approx and dθ are approximately equal, mainly by computing the normalized Euclidean distance between the two vectors:
     $$difference = \frac{\lVert d\theta_{approx} - d\theta \rVert_2}{\lVert d\theta_{approx} \rVert_2 + \lVert d\theta \rVert_2}$$

The threshold is usually set on the order of 1e-7 (the code below uses 2e-7). If difference is below the threshold, the backprop implementation is considered correct.

Now let's look at a code implementation of these steps.

1. Convert W[1], b[1], ..., W[L], b[L] into the vector θ

import numpy as np

# Convert the parameter dictionary into a single column vector
def dictionary_to_vector(parameters):
    """
    Roll all our parameters dictionary into a single vector satisfying our specific required shape.
    """
    # Note: this relies on the parameters dict being stored in the order W1, b1, ..., WL, bL
    count = 0
    for key in parameters:
        # flatten the parameter matrix into a column vector
        new_vector = np.reshape(parameters[key], (-1, 1))
        if count == 0:
            # first parameter: start a new vector
            theta = new_vector
        else:
            # append to the vector built so far
            theta = np.concatenate((theta, new_vector), axis=0)
        count = count + 1

    return theta
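As a quick illustration of what dictionary_to_vector produces (the tiny shapes below are made up purely for illustration, not the post's network):

import numpy as np

params = {
    "W1": np.array([[1., 2.], [3., 4.]]),   # shape (2, 2)
    "b1": np.array([[5.], [6.]]),            # shape (2, 1)
}
theta = dictionary_to_vector(params)
print(theta.shape)      # (6, 1)
print(theta.ravel())    # -> [1. 2. 3. 4. 5. 6.]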

2. Convert dW[1], db[1], ..., dW[L], db[L] into the vector dθ
Note: be careful here. The gradients dictionary returned by backprop is stored in the order {dWL, dbL, ..., dW2, db2, dW1, db1}. Since we later compute a Euclidean distance between corresponding elements, the order must be converted to [dW1, db1, ..., dWL, dbL]. I stepped into this trap and spent a long time tracking down the bug.

# Convert the gradients dictionary into a single column vector
def gradients_to_vector(gradients):
    """
    Roll all our gradients dictionary into a single vector satisfying our specific required shape.
    """
    # The gradients are stored in the order {dWL, dbL, ..., dW2, db2, dW1, db1},
    # so rebuild the key list as [dW1, db1, ..., dWL, dbL] so that the elements line up
    # with dictionary_to_vector when we later compute the Euclidean distance.
    L = len(gradients) // 2
    keys = []
    for l in range(L):
        keys.append("dW" + str(l + 1))
        keys.append("db" + str(l + 1))
    count = 0
    for key in keys:
        # flatten the gradient matrix into a column vector
        new_vector = np.reshape(gradients[key], (-1, 1))
        if count == 0:
            # first gradient: start a new vector
            theta = new_vector
        else:
            # append to the vector built so far
            theta = np.concatenate((theta, new_vector), axis=0)
        count = count + 1

    return theta
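To make the ordering issue concrete, here is a tiny sketch (with made-up 1×1 "matrices", not the real network's gradients) showing that gradients_to_vector reorders a backprop-style {dW2, db2, dW1, db1} dictionary into [dW1, db1, dW2, db2]:

import numpy as np

# hypothetical gradients in the order backprop produces them
grads = {
    "dW2": np.array([[4.0]]), "db2": np.array([[3.0]]),
    "dW1": np.array([[2.0]]), "db1": np.array([[1.0]]),
}
print(gradients_to_vector(grads).ravel())   # -> [2. 1. 4. 3.], i.e. [dW1, db1, dW2, db2]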

The code for steps 3 and 4 is as follows:

def gradient_check(parameters, gradients, X, Y, layer_dims, epsilon=1e-7):
    """
    Checks whether the gradients computed by backward propagation match the
    numerical gradient of the cost computed by forward propagation.

    Arguments:
    parameters -- python dictionary containing the parameters "W1", "b1", ..., "WL", "bL"
    gradients -- output of backward propagation, contains gradients of the cost with respect to the parameters
    X -- input data, of shape (input size, number of examples)
    Y -- true "label"
    layer_dims -- the layer dimensions of the network
    epsilon -- tiny shift of the parameters used to compute the two-sided approximated gradient

    Returns:
    difference -- normalized difference between the approximated gradient and the backward propagation gradient
    """

    parameters_vector = dictionary_to_vector(parameters)  # parameter values as a single vector
    grad = gradients_to_vector(gradients)
    num_parameters = parameters_vector.shape[0]
    J_plus = np.zeros((num_parameters, 1))
    J_minus = np.zeros((num_parameters, 1))
    gradapprox = np.zeros((num_parameters, 1))

    # Compute gradapprox: nudge each parameter up and down by epsilon,
    # re-run forward propagation and take the two-sided difference of the cost
    for i in range(num_parameters):
        thetaplus = np.copy(parameters_vector)
        thetaplus[i] = thetaplus[i] + epsilon
        AL, _ = forward_propagation(X, vector_to_dictionary(thetaplus, layer_dims))
        J_plus[i] = compute_cost(AL, Y)

        thetaminus = np.copy(parameters_vector)
        thetaminus[i] = thetaminus[i] - epsilon
        AL, _ = forward_propagation(X, vector_to_dictionary(thetaminus, layer_dims))
        J_minus[i] = compute_cost(AL, Y)

        gradapprox[i] = (J_plus[i] - J_minus[i]) / (2 * epsilon)

    # Normalized Euclidean distance between the two gradient vectors
    numerator = np.linalg.norm(grad - gradapprox)
    denominator = np.linalg.norm(grad) + np.linalg.norm(gradapprox)
    difference = numerator / denominator

    if difference > 2e-7:
        print(
            "\033[93m" + "There is a mistake in the backward propagation! difference = " + str(difference) + "\033[0m")
    else:
        print(
            "\033[92m" + "Your backward propagation works perfectly fine! difference = " + str(difference) + "\033[0m")

    return difference

Here, every time we recompute the cost via forward propagation, the vector has to be converted back into the parameter matrices. The implementation is as follows:

# Convert a vector back into the parameter dictionary
def vector_to_dictionary(theta, layer_dims):
    """
    Unroll all our parameters dictionary from a single vector satisfying our specific required shape.
    """
    parameters = {}
    L = len(layer_dims)  # the number of layers in the network
    start = 0
    end = 0
    for l in range(1, L):
        # W[l] occupies the next layer_dims[l] * layer_dims[l-1] entries of theta
        end += layer_dims[l] * layer_dims[l - 1]
        parameters["W" + str(l)] = theta[start:end].reshape((layer_dims[l], layer_dims[l - 1]))
        start = end
        # b[l] occupies the next layer_dims[l] entries
        end += layer_dims[l] * 1
        parameters["b" + str(l)] = theta[start:end].reshape((layer_dims[l], 1))
        start = end
    return parameters
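A quick way to convince yourself that these helpers are consistent with each other (the 2-layer layer_dims and random parameters below are just an illustration, not the post's actual network) is a round-trip check:

import numpy as np

layer_dims = [3, 4, 1]   # hypothetical: 3 inputs, one hidden layer of 4, one output unit
params = {
    "W1": np.random.randn(4, 3), "b1": np.zeros((4, 1)),
    "W2": np.random.randn(1, 4), "b2": np.zeros((1, 1)),
}
theta = dictionary_to_vector(params)
restored = vector_to_dictionary(theta, layer_dims)
assert all(np.allclose(params[k], restored[k]) for k in params)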


Let's again use the breast_cancer dataset that ships with sklearn to test whether our backprop implementation is correct, and at the same time whether the gradient-checking code itself works. The test result is:

Your backward propagation works perfectly fine! difference = 5.649104934345307e-11

As we can see, the gap between the gradients from our backprop and the gradients obtained from the definition of the derivative is on the order of 1e-11, so our backprop implementation is correct.
The complete code has been put on GitHub: gradient_checking.py
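For reference, a driver along these lines would look roughly like the sketch below. The helpers initialize_parameters, forward_propagation, backward_propagation and compute_cost are assumed to come from the author's own implementation in the linked repository; their exact names and signatures are assumptions here, as is the illustrative architecture.

from sklearn.datasets import load_breast_cancer

# hypothetical driver; the four helpers below are not defined in this post
data = load_breast_cancer()
X = data.data.T                      # shape (n_features, n_examples)
Y = data.target.reshape(1, -1)       # shape (1, n_examples)

layer_dims = [X.shape[0], 5, 1]      # illustrative network architecture
parameters = initialize_parameters(layer_dims)
AL, caches = forward_propagation(X, parameters)
gradients = backward_propagation(AL, Y, caches)
difference = gradient_check(parameters, gradients, X, Y, layer_dims)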

CS231n also has material on gradient checking. Its definition of the relative error differs slightly from the one Ng uses, but that is not a big deal; it is just a different definition, and you only need to adjust the threshold accordingly. See: CS231n Convolutional Neural Networks for Visual Recognition



References
- Andrew Ng, Coursera, "Improving Deep Neural Networks: Hyperparameter Tuning, Regularization and Optimization"


