Implementing Linear Regression and Logistic Regression in Matlab

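This article supplements the Machine Learning column and summarizes the single-variable linear regression, multivariate linear regression, and logistic regression covered in the previous chapters. To help readers understand regression better, I implement each of them in Matlab and present them here one by one, from easiest to hardest.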

In this lecture:

Implementing several regression functions in Matlab

=========================

Basic models

Y=θ₀+θ₁X₁ --- linear regression (straight-line fitting)

Solving overfitting --- Regularization

Y=1/(1+e^(-X)) --- logistic regression (sigmoid function fitting)

=========================

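Part 1: Basic models

Before tackling the fitting problems, let us first recall the basic models of linear regression and logistic regression.

Let the parameters to be fitted be θ (n×1) and the input data be [x (m×n), y (m×1)].

For every kind of fit, gradient descent requires us to supply two pieces:

① cost function (measures the distance between the true value y and the fitted value h, the hypothesis): it returns the value of the cost function, which should decrease at every iteration, and the gradient, i.e. the partial derivative of the cost function with respect to each parameter θ.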

function [ jVal,gradient ] = costFunction ( theta )

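② Gradient_descent (the main function): runs the optimization, repeatedly calling the cost function above until the maximum number of iterations is reached or the value returned by the cost function stops decreasing. In the code in this article the iteration itself is delegated to Matlab's fminunc, which uses the gradient we supply.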

function [optTheta,functionVal,exitFlag]=Gradient_descent( )
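As a rough sketch of what such a main loop does when written by hand, the script below runs plain batch gradient descent on the four data points used in Part 2. The learning rate alpha, the tolerance tol, and the cap maxIter are assumed values chosen for illustration, and the cost uses the common 1/(2m) scaling. The article's own Gradient_descent calls fminunc rather than looping explicitly.

```matlab
% Data from the linear-regression example in this article
x = [1; 2; 3; 4];
y = [1.1; 2.2; 2.7; 3.8];
m = numel(y);
X = [ones(m,1), x];          % prepend x0 = 1 for the intercept term

alpha   = 0.1;               % learning rate (assumed value)
tol     = 1e-12;             % stop when the cost drops by less than this (assumed)
maxIter = 10000;             % iteration cap (assumed)

theta = zeros(2,1);
prevJ = Inf;
for iter = 1:maxIter
    delta = X*theta - y;                 % residuals h - y
    J     = sum(delta.^2) / (2*m);       % cost at the current theta
    if prevJ - J < tol                   % cost no longer decreasing
        break;
    end
    grad  = (X' * delta) / m;            % gradient of J with respect to theta
    theta = theta - alpha * grad;        % gradient descent update
    prevJ = J;
end
disp(theta');                            % should approach the fitted line's [intercept slope]
```

With these settings the loop needs several hundred iterations, which is one practical reason to hand the problem to a solver such as fminunc.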

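Linear regression: the fitted equation is h_θ(x)=θ₀x₀+θ₁x₁+…+θ_nx_n. Powers of x_n may also serve as regression terms, so "linear" here does not mean a straight line in x; it is closer to the idea of a polynomial, with the model still linear in the parameters θ.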

Its cost function is:

$J(\theta)=\frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)^2$

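Logistic regression: the fitted equation is h_θ(x)=1/(1+e^(-θ^Tx)), and its cost function is:

$J(\theta)=-\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\ln h_\theta(x^{(i)})+(1-y^{(i)})\ln\left(1-h_\theta(x^{(i)})\right)\right]$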

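Please work out the partial derivatives of the cost function with respect to each θj yourself; see the last figure of Chapter 3, or the code later in this article.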
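The logistic model and its cost can be sketched in vectorized form as follows. This is only an illustration: the anonymous-function names (sigmoid, hyp, cost, grad) are my own, the data are the seven points used in Part 4, and unlike the Part 4 code no 0.01 offset is added inside the logarithms.

```matlab
% Data from the logistic-regression example later in this article
x = [-3; -2; -1; 0; 1; 2; 3];
y = [0.01; 0.05; 0.3; 0.45; 0.8; 1.1; 0.99];
m = numel(y);
X = [ones(m,1), x];                      % column of ones for the intercept

sigmoid = @(t) 1 ./ (1 + exp(-t));       % note the minus sign in the exponent
hyp     = @(theta) sigmoid(X*theta);     % h_theta(x) for all samples, m*1

% Cross-entropy cost and its gradient
cost = @(theta) -sum(y.*log(hyp(theta)) + (1-y).*log(1-hyp(theta))) / m;
grad = @(theta) X' * (hyp(theta) - y) / m;

J0 = cost([0;0]);    % every h equals 0.5 at theta = 0, so the cost is log(2)
g0 = grad([0;0]);    % gradient at the starting point
```

At θ = 0 every prediction is 0.5, so the cost equals ln 2 whatever the labels are, which makes a convenient sanity check before running an optimizer.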

In what follows we fit each of these model equations in turn, give the code, and verify the results with Matlab's fit function.

Part 2: Y=θ₀+θ₁X₁ --- linear regression (straight-line fitting)

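In "Matlab Linear Fitting & Nonlinear Fitting" we already covered how to fit lines and curves with Matlab's built-in fit function, which is very practical. Since this is an ML course, however, here we study how to do the fit with the gradient descent method introduced earlier.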

cost function:

function [ jVal,gradient ] = costFunction2( theta )  

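%COSTFUNCTION2 Cost and gradient for single-variable linear regression

%   linear regression -> y=theta0 + theta1*x

%   parameter: x:m*1  theta:2*1   y:m*1   (m=4)

%Data

x=[1;2;3;4];

y=[1.1;2.2;2.7;3.8];

hypothesis = h_func(x,theta);

delta = hypothesis - y;

%sum of squared errors (no 1/(2m) scaling; the minimizer is unchanged)

jVal=sum(delta.^2);

%gradient of jVal as defined above, returned as a column like theta

gradient=zeros(2,1);

gradient(1)=2*sum(delta);

gradient(2)=2*sum(delta.*x);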

end

Here h_func returns the value of the hypothesis:

function [res] = h_func(inputx,theta)  

%H_FUNC Hypothesis for single-variable linear regression

%   returns theta(1)+theta(2)*inputx for every element of inputx

%cost function 2

res= theta(1)+theta(2)*inputx;

end

Gradient_descent:

function [optTheta,functionVal,exitFlag]=Gradient_descent( )  

%GRADIENT_DESCENT Minimize costFunction2 with fminunc

%   'GradObj','on' tells fminunc that the cost function also returns its gradient

  options = optimset('GradObj','on','MaxIter',100);  

  initialTheta = zeros(2,1);  

  [optTheta,functionVal,exitFlag] = fminunc(@costFunction2,initialTheta,options);  

end

Result:

>> [optTheta,functionVal,exitFlag] = Gradient_descent()  

Local minimum found.  

Optimization completed because the size of the gradient is less than  

the default value of the function tolerance.  

<stopping criteria details>  

optTheta =  

    0.3000  

    0.8600  

functionVal =  

    0.0720  

exitFlag =  

     1

That is, the fitted line is y=0.3+0.86x.

Verification:

function [ parameter ] = checkcostfunc(  )  

%CHECKCOSTFUNC Fit the same data with Matlab's fit function for comparison

%   check if the cost function works well  

%   check with the matlab fit function as standard  

%check cost function 2  

x=[1;2;3;4];  

y=[1.1;2.2;2.7;3.8];  

EXPR= {'x','1'};  

p=fittype(EXPR);  

parameter=fit(x,y,p);  

end

Output:

>> checkcostfunc()  

ans =   

     Linear model:  

     ans(x) = a*x + b  

     Coefficients (with 95% confidence bounds):  

       a =        0.86  (0.4949, 1.225)  

       b =         0.3  (-0.6998, 1.3)

This matches our result. Now we plot it:

function PlotFunc( xstart,xend )  

%PLOTFUNC Plot the original data and the fitted line

%   draws the original data and the fitted line over [xstart,xend]

%===================cost function 2====linear regression  

%original data  

x1=[1;2;3;4];  

y1=[1.1;2.2;2.7;3.8];  

%plot(x1,y1,'ro-','MarkerSize',10);  

plot(x1,y1,'rx','MarkerSize',10);  

hold on;  

%fitted line

x_co=xstart:0.1:xend;  

y_co=0.3+0.86*x_co;  

%plot(x_co,y_co,'g');  

plot(x_co,y_co);  

hold off;  

end
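As a further cross-check that does not need the Curve Fitting Toolbox, the same line can be obtained in closed form. The sketch below is my own addition rather than part of the original code: it solves the least-squares problem with the backslash operator and, independently, with polyfit.

```matlab
% Same four data points as above
x = [1; 2; 3; 4];
y = [1.1; 2.2; 2.7; 3.8];

X = [ones(size(x)), x];   % design matrix: intercept column plus x
theta = X \ y;            % least-squares solution, [intercept; slope]

p = polyfit(x, y, 1);     % degree-1 polynomial fit, returned as [slope, intercept]

disp(theta');
disp(p);
```

Both calls minimize the same sum of squared errors as costFunction2, so they should agree with the fminunc result up to rounding.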

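Part 3: Solving overfitting --- Regularization

We covered how to deal with overfitting in Chapter 3: regularization adds a term in θ to the cost function so that some of the θ values are kept small, which gives a better fit.

For example, define the cost function J(θ): jVal=(theta(1)-5)^2+(theta(2)-5)^2;

In each iteration the parameters θ are updated according to gradient descent: θ(i) := θ(i) - α*gradient(i), where α is the learning rate and gradient(i) is the derivative of J(θ) with respect to θi. In this example gradient(1)=2*(theta(1)-5) and gradient(2)=2*(theta(2)-5).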
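The update rule can be sketched as a plain loop on this toy cost. The learning rate alpha = 0.1 and the fixed count of 100 iterations are assumed values for illustration; the article's own code leaves the stepping to fminunc.

```matlab
alpha = 0.1;              % learning rate (assumed value)
theta = zeros(2,1);       % start from (0,0), as in the fminunc example

for iter = 1:100
    gradient = 2 * (theta - 5);          % dJ/dtheta for J = sum((theta-5).^2)
    theta = theta - alpha * gradient;    % gradient descent update
end

jVal = sum((theta - 5).^2);              % cost at the final theta
disp(theta');
```

Each step shrinks the distance to (5,5) by a factor of 1 - 2*alpha = 0.8, so the iterates converge geometrically.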

The function costFunction defines jVal=J(θ) and the gradient with respect to the two θ values:

function [ jVal,gradient ] = costFunction( theta )  

%COSTFUNCTION Cost and gradient of the toy function (theta1-5)^2+(theta2-5)^2

%   returns jVal and the 2*1 gradient at theta

jVal= (theta(1)-5)^2+(theta(2)-5)^2;  

gradient = zeros(2,1);  

%code to compute derivative to theta  

gradient(1) = 2 * (theta(1)-5);  

gradient(2) = 2 * (theta(2)-5);  

end

Gradient_descent, which optimizes the parameters:

function [optTheta,functionVal,exitFlag]=Gradient_descent( )  

%GRADIENT_DESCENT Minimize costFunction with fminunc

%   starts from theta=[0;0]; initialTheta is echoed because its line has no semicolon

 options = optimset('GradObj','on','MaxIter',100);  

 initialTheta = zeros(2,1)  

 [optTheta,functionVal,exitFlag] = fminunc(@costFunction,initialTheta,options);  

end

Calling it from the Matlab command window gives the optimized parameters (θ1,θ2)=(5,5):

 [optTheta,functionVal,exitFlag] = Gradient_descent()  

initialTheta =  

     0  

     0  

Local minimum found.  

Optimization completed because the size of the gradient is less than  

the default value of the function tolerance.  

<stopping criteria details>  

optTheta =  

     5  

     5  

functionVal =  

     0  

exitFlag =  

     1
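The toy cost above has no data term, so as a sketch of what "adding a term in θ" could look like for the straight-line fit of Part 2, the script below adds an L2 penalty on the slope. The weight lambda = 1, the choice not to penalize the intercept, and the names costReg and gradReg are all assumptions made for illustration.

```matlab
% Data from the linear-regression example in Part 2
x = [1; 2; 3; 4];
y = [1.1; 2.2; 2.7; 3.8];
m = numel(y);
X = [ones(m,1), x];

lambda = 1;               % regularization weight (assumed value)
mask   = [0; 1];          % penalize the slope only, not the intercept (assumed choice)

% Regularized cost and its gradient
costReg = @(theta) (sum((X*theta - y).^2) + lambda*sum((mask.*theta).^2)) / (2*m);
gradReg = @(theta) (X'*(X*theta - y) + lambda*(mask.*theta)) / m;

% Setting the gradient to zero gives a linear system for the minimizer
thetaReg = (X'*X + lambda*diag(mask)) \ (X'*y);
thetaOls = X \ y;         % unregularized fit, for comparison

disp([thetaOls, thetaReg]);
```

The penalized slope comes out smaller in magnitude than the unregularized one, which is the shrinking effect described at the start of this part. A pair such as costReg and gradReg could equally be wrapped in a function and passed to fminunc as in the other examples.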

Part 4: Y=1/(1+e^(-X)) --- logistic regression (sigmoid function fitting)

hypothesis function:

function [res] = h_func(inputx,theta)  

%cost function 3  

tmp=theta(1)+theta(2)*inputx;%m*1  

res=1./(1+exp(-tmp));%m*1  

end

cost function:

function [ jVal,gradient ] = costFunction3( theta )  

%COSTFUNCTION3 Cost and gradient for single-variable logistic regression

%   Logistic Regression  

x=[-3;      -2;     -1;     0;      1;      2;     3];  

y=[0.01;    0.05;   0.3;    0.45;   0.8;    1.1;    0.99];  

m=size(x,1);  

%hypothesis  data  

hypothesis = h_func(x,theta);  

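%jVal-cost function  &  gradient updating

%the 0.01 offset keeps log() away from log(0); the gradient below is that of the cost without the offset

jVal=-sum(log(hypothesis+0.01).*y + (1-y).*log(1-hypothesis+0.01))/m;

gradient=zeros(2,1);

gradient(1)=sum(hypothesis-y)/m;   %derivative with respect to theta(1)

gradient(2)=sum((hypothesis-y).*x)/m;    %derivative with respect to theta(2)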

end

Gradient_descent:

function [optTheta,functionVal,exitFlag]=Gradient_descent( )  

 options = optimset('GradObj','on','MaxIter',100);  

 initialTheta = [0;0];  

 [optTheta,functionVal,exitFlag] = fminunc(@costFunction3,initialTheta,options);  

end

Output:

 [optTheta,functionVal,exitFlag] = Gradient_descent()  

Local minimum found.  

Optimization completed because the size of the gradient is less than  

the default value of the function tolerance.  

<stopping criteria details>  

optTheta =  

    0.3526  

    1.7573  

functionVal =  

    0.2498  

exitFlag =  

     1

Verify by plotting:

function PlotFunc( xstart,xend )  

%PLOTFUNC Plot the original data and the fitted sigmoid curve

%   draws the original data and the fitted curve over [xstart,xend]

%===================cost function 3=====logistic regression  

%original data  

x=[-3;      -2;     -1;     0;      1;      2;     3];  

y=[0.01;    0.05;   0.3;    0.45;   0.8;    1.1;    0.99];  

plot(x,y,'rx','MarkerSize',10);  

hold on  

%fitted line  

x_co=xstart:0.1:xend;  

theta = [0.3526,1.7573];  

y_co=h_func(x_co,theta);  

plot(x_co,y_co);  

hold off  

end

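A reader asked about this, so here is the derivation of the gradient for logistic regression:

Let

$z = \frac{1}{1+e^{-\theta x}}$

Then

$z'_{\theta}=-\frac{e^{-\theta x}}{(1+e^{-\theta x})^2} \cdot (-x) = z(z-1)(-x) = z(1-z)x$

Since the cost function for a single sample is

$J(\theta) = -\left[y\ln z+(1-y)\ln(1-z)\right]$

we get

$J'_{\theta} = -\left[y\frac{1}{z}z'_{\theta}+(1-y)\frac{-z'_\theta}{1-z}\right]$

$J'_{\theta}=-z'_\theta\left(\frac{y}{z}-\frac{1-y}{1-z}\right) = -z(1-z)x\,\frac{y-yz-z+yz}{z(1-z)} = (z-y)x$

Averaged over the m samples, this is the (hypothesis-y).*x term used for the gradient in costFunction3.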
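A derived gradient can be checked numerically with central finite differences. The sketch below does this for the logistic cost on the Part 4 data; the test point theta = [0;1], the step size, and the helper names are assumptions made for illustration, and the cost is written without the 0.01 offset used in costFunction3.

```matlab
% Data from the logistic-regression example
x = [-3; -2; -1; 0; 1; 2; 3];
y = [0.01; 0.05; 0.3; 0.45; 0.8; 1.1; 0.99];
m = numel(y);
X = [ones(m,1), x];

hyp  = @(theta) 1 ./ (1 + exp(-X*theta));
cost = @(theta) -sum(y.*log(hyp(theta)) + (1-y).*log(1-hyp(theta))) / m;
grad = @(theta) X' * (hyp(theta) - y) / m;     % analytic gradient, (z - y)x averaged over samples

theta   = [0; 1];         % arbitrary test point (assumed)
epsStep = 1e-6;           % finite-difference step (assumed)

numGrad = zeros(2,1);
for j = 1:2
    e = zeros(2,1);
    e(j) = epsStep;
    % central difference approximation of dJ/dtheta(j)
    numGrad(j) = (cost(theta + e) - cost(theta - e)) / (2*epsStep);
end

maxDiff = max(abs(numGrad - grad(theta)));     % should be tiny if the formula is right
disp(maxDiff);
```

If the sign of the analytic gradient were flipped, maxDiff would be about twice the gradient magnitude instead of near zero, so this check catches exactly the kind of sign slip that is easy to make in the derivation.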