Gradient Checking

kyang624823

2020-02-22 02:07

0. Introduction

There are generally two ways to compute the gradient of a function:

Numerical gradient;
Analytic gradient.

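The numerical gradient is easy to implement and can be computed even when no closed-form derivative is available, since it only needs function evaluations. Its drawbacks are clear, though: it is only an approximation, and it is slow, because every parameter requires its own extra function evaluations. For this reason, machine learning objective functions are usually designed to be differentiable, so that the analytic gradient can be computed quickly and is exact up to floating-point rounding.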

1. Why Check Gradients?

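Neural network training uses backpropagation to compute the gradient of the objective function with respect to every parameter; this can be regarded as an analytic gradient. Because the computation involves many parameters, a backpropagation implementation can easily contain mistakes, and a wrong gradient leads the iterations to poor parameter values.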

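To confirm that the gradient computed by the backpropagation code is correct, use a gradient check. Compute the numerical gradient to obtain an approximation of the gradient, then compare it with the gradient from backpropagation. If the two differ only slightly, that is strong evidence that the backpropagation code is correct.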

2. Computing the Numerical Gradient

For a differentiable function J(θ), the derivative at the point θ can be written as:

$$\frac{d}{d\theta} J(\theta) = \lim_{\epsilon \to 0} \frac{J(\theta + \epsilon) - J(\theta - \epsilon)}{2\epsilon}$$

Its approximation (the numerical gradient) is:

$$\frac{d}{d\theta} J(\theta) \approx \frac{J(\theta + \epsilon) - J(\theta - \epsilon)}{2\epsilon}$$
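As a side note on why the two-sided form is a reasonable choice, the following sketch expands J in a Taylor series around θ. It assumes J is sufficiently smooth near θ, which the article does not state explicitly. The odd-order terms survive the subtraction and the even-order terms cancel, so the two-sided estimate has an error of order ε², while a one-sided difference has an error of order ε.

```latex
\begin{aligned}
J(\theta \pm \epsilon) &= J(\theta) \pm \epsilon J'(\theta) + \frac{\epsilon^2}{2} J''(\theta) \pm \frac{\epsilon^3}{6} J'''(\theta) + O(\epsilon^4) \\
\frac{J(\theta + \epsilon) - J(\theta - \epsilon)}{2\epsilon} &= J'(\theta) + \frac{\epsilon^2}{6} J'''(\theta) + O(\epsilon^4) \\
\frac{J(\theta + \epsilon) - J(\theta)}{\epsilon} &= J'(\theta) + \frac{\epsilon}{2} J''(\theta) + O(\epsilon^2)
\end{aligned}
```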

ε (the variable epsilon in the code) is usually set to a small constant, for example $10^{-4}$.
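To illustrate why a moderately small ε is a sensible default, here is a minimal sketch that measures the error of the two-sided difference for several step sizes. The test function exp(t) and the particular step sizes are assumptions chosen for illustration; they do not come from the article.

```matlab
% Hypothetical test function (not from the article): J(t) = exp(t),
% whose exact derivative is also exp(t).
J = @(t) exp(t);
theta = 1;
exact = exp(theta);

epsilons = [1e-1, 1e-4, 1e-12];   % assumed step sizes to compare
errs = zeros(size(epsilons));
for k = 1:numel(epsilons)
    epsilon = epsilons(k);
    % Two-sided (central) difference estimate of the derivative.
    approx = (J(theta + epsilon) - J(theta - epsilon)) / (2 * epsilon);
    errs(k) = abs(approx - exact);
    fprintf('epsilon = %g, abs error = %g\n', epsilon, errs(k));
end
```

A large step suffers from truncation error, while a very small step tends to amplify floating-point rounding in the subtraction, so in double precision a value around 1e-4 is usually a workable compromise.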

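Next, compute the gradient g(θ) of J(θ) at θ analytically. If g(θ) and the numerical estimate of dJ(θ)/dθ are very close, the analytic gradient g(θ) can be considered verified.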
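The comparison described here can be sketched as follows. The objective sum(theta.^3), the deliberately wrong gradient, and the tolerance of 1e-6 are all assumptions made for illustration; a suitable tolerance depends on the problem and on ε.

```matlab
% Hypothetical objective (not from the article): J(theta) = sum(theta.^3).
J = @(theta) sum(theta.^3);
gradGood = @(theta) 3 * theta.^2;   % correct analytic gradient
gradBad  = @(theta) 2 * theta.^2;   % deliberately wrong coefficient

theta = [1; -2; 0.5];
epsilon = 1e-4;

% Central-difference estimate, one coordinate at a time.
numgrad = zeros(size(theta));
for i = 1:numel(theta)
    step = zeros(size(theta));
    step(i) = epsilon;
    numgrad(i) = (J(theta + step) - J(theta - step)) / (2 * epsilon);
end

% Relative difference between two gradient vectors.
relErr = @(a, b) norm(a - b) / norm(a + b);

errGood = relErr(numgrad, gradGood(theta));
errBad  = relErr(numgrad, gradBad(theta));
tol = 1e-6;   % assumed threshold, tune per problem
fprintf('correct gradient: %g (pass = %d)\n', errGood, errGood < tol);
fprintf('buggy gradient:   %g (pass = %d)\n', errBad,  errBad  < tol);
```

Dividing by the norm of the sum makes the measure independent of the gradient's overall scale, but it is undefined when the two vectors cancel exactly, so a guard may be needed in that case.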

3. A Simple Example

Consider the quadratic function:

$$J(x_1, x_2) = x_1^2 + 3 x_1 x_2$$
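Differentiating term by term gives the partial derivatives that the analytic-gradient code in this section computes:

```latex
\begin{aligned}
\frac{\partial J}{\partial x_1} &= 2 x_1 + 3 x_2 \\
\frac{\partial J}{\partial x_2} &= 3 x_1 \\
\nabla J(x_1, x_2) &= \begin{pmatrix} 2 x_1 + 3 x_2 \\ 3 x_1 \end{pmatrix}
\end{aligned}
```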

The code that computes its value and analytic gradient is:

function [value,grad] = simpleQuadraticFunction(x)
% This function accepts a 2x1 vector x as input.
% Its outputs are:
%   value: J(x1, x2) = x1^2 + 3*x1*x2
%   grad: a 2x1 vector of the partial derivatives of J with respect to x1 and x2
value = x(1)^2 + 3*x(1)*x(2);

grad = zeros(2, 1);
grad(1)  = 2*x(1) + 3*x(2);
grad(2)  = 3*x(1);
end

The code that computes the numerical gradient is:

function numgrad = computeNumericalGradient(J, theta)
% numgrad = computeNumericalGradient(J, theta)
% theta: a vector of parameters
% J: a function that outputs a real-number. Calling y = J(theta) will return the
% function value at theta. 

% Initialize numgrad with zeros
numgrad = zeros(size(theta));

% Estimate each partial derivative with a two-sided (central) difference.

epsilon = 1e-4;
n = numel(theta);   % works for both row and column vectors
for i=1:n
    theta1 = theta;
    theta1(i) = theta1(i) + epsilon;
    theta2 = theta;
    theta2(i) = theta2(i) - epsilon;
    J1 = J(theta1);   % only the function value is needed here
    J2 = J(theta2);
    numgrad(i) = (J1-J2)/(2*epsilon);
end
end

Call both implementations and compare the results:

x = [4; 10];
[value, grad] = simpleQuadraticFunction(x);

numgrad = computeNumericalGradient(@simpleQuadraticFunction, x);

% Visually examine the two gradient computations.   
disp([numgrad grad]);

% Relative difference between the two gradients: norm of the difference
% divided by the norm of the sum. (Named relDiff to avoid shadowing diff.)
relDiff = norm(numgrad-grad)/norm(numgrad+grad);
disp(relDiff);

Running the code produces:

   38.0000   38.0000
   12.0000   12.0000

   2.1452e-12

The relative difference between the two gradients is 2.1452e-12, a very small value, which indicates that the analytic gradient is correct.
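The printed values can also be checked by hand at x = (4, 10). Because J is quadratic, the ε terms cancel exactly in the two-sided difference, so for this particular function the numerical gradient has no truncation error; the remaining discrepancy is most plausibly floating-point rounding.

```latex
\begin{aligned}
\frac{J(x_1+\epsilon, x_2) - J(x_1-\epsilon, x_2)}{2\epsilon}
  &= \frac{(x_1+\epsilon)^2 - (x_1-\epsilon)^2 + 3 x_2 \cdot 2\epsilon}{2\epsilon}
   = 2 x_1 + 3 x_2 = 2 \cdot 4 + 3 \cdot 10 = 38 \\
\frac{J(x_1, x_2+\epsilon) - J(x_1, x_2-\epsilon)}{2\epsilon}
  &= \frac{3 x_1 \cdot 2\epsilon}{2\epsilon} = 3 x_1 = 3 \cdot 4 = 12
\end{aligned}
```

For objectives that are not quadratic, such as neural network losses, the difference between the two gradients is generally larger than in this example, since the truncation error no longer vanishes.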
