Machine Learning Experiment 5: Regularization, Explained with Code

  1. Why introduce regularization?

When training linear or logistic regression models we often run into overfitting: the error on the training set is small, but the error on the test set is large. We therefore add a regularization term to prevent overfitting, so that the model performs about as well on the test set as it does on the training set.

For example, when we fit linear regression with polynomial hypotheses of different degrees, the trained models show a clear pattern: low-degree polynomials tend to underfit, while high-degree polynomials tend to overfit.

We therefore add a regularization term to the cost function. With the squared (L2) penalty, the linear-regression cost becomes

J(θ) = (1/2m) [ Σ_{i=1..m} (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾)² + λ Σ_{j=1..n} θⱼ² ]

Other regularization terms can be used as well, for example the L1 penalty λ Σ_{j=1..n} |θⱼ| (lasso).

  2. Regularized linear regression

(1) First, plot the data:

There are only seven data points, so the model overfits very easily (the larger the training set, the harder it is to overfit).

(2) We fit a fifth-degree polynomial by linear regression:

This is still linear regression because the hypothesis is a linear combination of the different powers of x. The original x is a one-dimensional feature; we expand it into a six-dimensional vector:

m=length(y);

x=[ones(m,1),x,x.^2,x.^3,x.^4,x.^5];

With this construction, the columns of x are linearly independent features and h(x) is their linear combination, so the problem becomes an ordinary multivariate linear regression.
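The same expansion can be sketched in Python/NumPy (illustrative only; `poly_design_matrix` is a hypothetical helper name, not part of the original code):

```python
import numpy as np

def poly_design_matrix(x, degree=5):
    """Hypothetical helper: columns are x.^0 .. x.^degree, like the MATLAB expansion."""
    x = np.asarray(x, dtype=float).ravel()
    return np.column_stack([x ** p for p in range(degree + 1)])

X = poly_design_matrix([1.0, 2.0], degree=5)
# X has shape (2, 6); the first column is all ones (the bias feature)
```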

(3) The cost function, where λ is the regularization parameter:

J(θ) = (1/2m) [ Σ_{i=1..m} (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾)² + λ Σ_{j=1..n} θⱼ² ]

(4) Solving with the normal equation:

θ = (XᵀX + λL)⁻¹ Xᵀy, where L is the identity matrix with its (1,1) entry set to 0.

Note: θ₀ does not participate in the penalty, i.e. θ₀ is not regularized.

If λ is chosen too large, all the remaining parameters are shrunk toward zero and the model degenerates into the constant θ₀, which causes underfitting.
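As a sketch, the same normal equation with an unpenalized intercept in Python/NumPy (illustrative; the function name and toy data are ours):

```python
import numpy as np

def ridge_normal_equation(X, y, lam):
    """Solve (X'X + lam*L) theta = X'y, with L = identity except L[0,0] = 0,
    so the intercept theta_0 is not penalized."""
    n = X.shape[1]
    L = np.eye(n)
    L[0, 0] = 0.0          # do not penalize the bias term
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)

# Toy data on the line y = x: intercept 0, slope 1
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.0, 1.0, 2.0])
theta0 = ridge_normal_equation(X, y, 0.0)    # plain least squares
theta10 = ridge_normal_equation(X, y, 10.0)  # slope shrunk toward zero
```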

(5) Implementation (the case λ = 1):

lambda=1;
Lambda=lambda.*eye(6);
Lambda(1)=0;                 % do not penalize theta_0
theta=(x'*x+Lambda)\(x'*y)
figure;
x_=(minx:0.01:maxx)';
x_1=[ones(size(x_)),x_,x_.^2,x_.^3,x_.^4,x_.^5];
hold on
plot(x0, y0, 'o', 'MarkerFaceColor', 'r', 'MarkerSize', 8);
plot(x_,x_1*theta,'--b','LineWidth',2);
legend({'data','5th-degree fit'})
title('\lambda=1')
xlabel('x')
ylabel('y')
hold off


We run this for λ = 0, 1, 10; the resulting fits are shown in the figures. The computed coefficients are:

Theta(λ=0) = 6×1
    0.4725
    0.6814
   -1.3801
   -5.9777
    2.4417
    4.7371
Theta(λ=1) = 6×1
    0.3976
   -0.4207
    0.1296
   -0.3975
    0.1753
   -0.3394
Theta(λ=10) = 6×1
    0.5205
   -0.1825
    0.0606
   -0.1482
    0.0743
   -0.1280

 

We can see that when λ = 0 the curve passes very close to every data point but clearly overfits; when λ = 1 the points are spread fairly evenly on both sides of the curve; and when λ = 10 the underfitting is obvious.

 

  3. Regularized logistic regression

(1) Plot the raw data

In the plot, '+' marks positive examples and 'o' marks negative examples.

The plotting code:

pos = find(y); neg = find(y == 0);

plot (x(pos,1),x(pos,2),'+')

hold on

plot (x(neg,1),x(neg,2),'o')

(2) The hypothesis and the transformation of x

Note: x is a two-dimensional feature vector; here we map it to a high-dimensional vector containing all monomials of the two features up to total degree 6. The feature-mapping function:

function out = map_feature(feat1, feat2)
    degree = 6;
    out = ones(size(feat1(:,1)));
    for i = 1:degree
        for j = 0:i
            out(:, end+1) = (feat1.^(i-j)).*(feat2.^j);
        end
    end
end
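A NumPy sketch of the same mapping (illustrative; for degree 6 it produces 1 + Σ_{i=1..6}(i+1) = 28 features):

```python
import numpy as np

def map_feature(f1, f2, degree=6):
    """All monomials f1^(i-j) * f2^j for i = 1..degree, j = 0..i, plus a bias column."""
    f1 = np.atleast_1d(np.asarray(f1, dtype=float))
    f2 = np.atleast_1d(np.asarray(f2, dtype=float))
    cols = [np.ones_like(f1)]
    for i in range(1, degree + 1):
        for j in range(i + 1):
            cols.append(f1 ** (i - j) * f2 ** j)
    return np.column_stack(cols)

F = map_feature(0.5, -0.5)   # a single point mapped to 28 features
```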

 

The regularized cost function is

J(θ) = −(1/m) Σ_{i=1..m} [ y⁽ⁱ⁾ log h_θ(x⁽ⁱ⁾) + (1 − y⁽ⁱ⁾) log(1 − h_θ(x⁽ⁱ⁾)) ] + (λ/2m) Σ_{j=1..n} θⱼ²

and θ is updated by Newton's method, θ := θ − H⁻¹∇J, where H is the Hessian of J and ∇J is its gradient.

(3) Iterative solution

[m, n] = size(x);
theta = zeros(n, 1);
g = @(z)(1.0 ./ (1.0 + exp(-z)));
lambda = 0
iteration = 20
J = zeros(iteration, 1);
for i = 1:iteration
    z = x*theta;   % x: 117x28, theta: 28x1
    h = g(z);      % sigmoid
    % Compute J (to check convergence); theta(1) is excluded from the penalty
    J(i) = -(1/m)*sum(y.*log(h)+(1-y).*log(1-h)) + ...
        (lambda/(2*m))*norm(theta(2:end))^2;   % norm is the Euclidean norm
    % Compute the gradient and the Hessian
    G = (lambda/m).*theta; G(1) = 0;     % regularization part of the gradient
    L = (lambda/m).*eye(n); L(1) = 0;    % regularization part of the Hessian
    grad = ((1/m).*x' * (h-y)) + G;
    H = ((1/m).*x'*diag(h)*diag(1-h)*x) + L;
    % Newton update
    theta = theta - H\grad;
end

After computing θ, we draw the decision boundary to visualize the result.
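The Newton loop above can be sketched in Python/NumPy (illustrative; the tiny data set below is made up for the sketch and is not the ex5 data):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def newton_logreg(X, y, lam, iterations=20):
    """Regularized logistic regression by Newton's method; theta[0] is not penalized."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iterations):
        h = sigmoid(X @ theta)
        G = (lam / m) * theta
        G[0] = 0.0                       # bias excluded from the penalty
        grad = X.T @ (h - y) / m + G
        L = (lam / m) * np.eye(n)
        L[0, 0] = 0.0
        # Hessian: (1/m) X' diag(h .* (1-h)) X + L
        H = X.T @ (X * (h * (1.0 - h))[:, None]) / m + L
        theta = theta - np.linalg.solve(H, grad)
    return theta

# Toy 1-feature data with a bias column; class flips at x2 = 0
X = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
theta = newton_logreg(X, y, lam=1.0)
```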

(4) Results

λ again takes the values 0, 1, 10.

Note: we use MATLAB's contour function to draw the decision boundary as a level curve. When evaluating the hypothesis over the grid, u and v must be indexed consistently, as follows:

% Define the ranges of the grid
u = linspace(-1, 1.5, 200);
v = linspace(-1, 1.5, 200);

% Initialize space for the values to be plotted
z = zeros(length(u), length(v));

% Evaluate z = theta*x over the grid
for i = 1:length(u)
    for j = 1:length(v)
        % Notice the order of j, i here!
        z(j,i) = map_feature(u(i), v(j))*theta;
    end
end
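The indexing convention above (rows follow v, columns follow u) can be sketched with a toy model in Python/NumPy (the degree-2 feature map and θ below are made up for illustration; they trace the unit circle u² + v² = 1):

```python
import numpy as np

def feat(u, v):
    """Hypothetical degree-2 feature map, just for this sketch."""
    return np.array([1.0, u, v, u * u, u * v, v * v])

theta = np.array([-1.0, 0.0, 0.0, 1.0, 0.0, 1.0])  # z = u^2 + v^2 - 1

u = np.linspace(-1, 1.5, 200)
v = np.linspace(-1, 1.5, 200)
z = np.zeros((len(v), len(u)))
for i in range(len(u)):
    for j in range(len(v)):
        z[j, i] = feat(u[i], v[j]) @ theta   # rows index v, columns index u
# contour(u, v, z, levels=[0]) would then trace the boundary z = 0
```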

The resulting plots follow.

As before, we can see that λ = 0 overfits and λ = 10 underfits.

 

 

Appendix: full source code
1. Linear regression with regularization

clc,clear
x=load("ex5Linx.dat");
y=load("ex5Liny.dat");
x0=x; y0=y;
figure;
plot(x, y, 'o', 'MarkerFaceColor', 'r', 'MarkerSize', 8);
title('training data')
xlabel('x')
ylabel('y')
minx=min(x);
maxx=max(x);
m=length(y);
x=[ones(m,1),x,x.^2,x.^3,x.^4,x.^5];
disp(size(x(1,:)))   % 1x6
theta=zeros(size(x(1,:)));
lambda=0;
Lambda=lambda.*eye(6);
Lambda(1)=0;                 % do not penalize theta_0
theta=(x'*x+Lambda)\(x'*y)
figure;
x_=(minx:0.01:maxx)';
x_1=[ones(size(x_)),x_,x_.^2,x_.^3,x_.^4,x_.^5];
hold on
plot(x0, y0, 'o', 'MarkerFaceColor', 'r', 'MarkerSize', 8);
plot(x_,x_1*theta,'--b','LineWidth',2);
legend({'data','5th-degree fit'})
title('\lambda=0')
xlabel('x')
ylabel('y')
hold off
lambda=1;
Lambda=lambda.*eye(6);
Lambda(1)=0;
theta=(x'*x+Lambda)\(x'*y)
figure;
x_=(minx:0.01:maxx)';
x_1=[ones(size(x_)),x_,x_.^2,x_.^3,x_.^4,x_.^5];
hold on
plot(x0, y0, 'o', 'MarkerFaceColor', 'r', 'MarkerSize', 8);
plot(x_,x_1*theta,'--b','LineWidth',2);
legend({'data','5th-degree fit'})
title('\lambda=1')
xlabel('x')
ylabel('y')
hold off
lambda=10;
Lambda=lambda.*eye(6);
Lambda(1)=0;
theta=(x'*x+Lambda)\(x'*y)
figure;
x_=(minx:0.01:maxx)';
x_1=[ones(size(x_)),x_,x_.^2,x_.^3,x_.^4,x_.^5];
hold on
plot(x0, y0, 'o', 'MarkerFaceColor', 'r', 'MarkerSize', 8);
plot(x_,x_1*theta,'--b','LineWidth',2);
legend({'data','5th-degree fit'})
title('\lambda=10')
xlabel('x')
ylabel('y')
hold off

2. Logistic regression with regularization

clc,clear;
x = load('ex5Logx.dat');
y = load('ex5Logy.dat');
x0=x;
y0=y;
figure
% Find the indices of the 2 classes
pos = find(y); neg = find(y == 0);
plot(x(pos,1),x(pos,2),'+')
hold on
plot(x(neg,1),x(neg,2),'o')
u=x(:,1);
v=x(:,2);
x = map_feature(u,v);
[m, n] = size(x);
theta = zeros(n, 1);
g = @(z)(1.0 ./ (1.0 + exp(-z)));
lambda=0
iteration=20
J = zeros(iteration, 1);
for i=1:iteration
    z = x*theta;   % x: 117x28, theta: 28x1
    h = g(z);      % sigmoid
    % Compute J (to check convergence); theta(1) is excluded from the penalty
    J(i) = -(1/m)*sum(y.*log(h)+(1-y).*log(1-h)) + ...
        (lambda/(2*m))*norm(theta(2:end))^2;   % norm is the Euclidean norm
    % Gradient and Hessian
    G = (lambda/m).*theta; G(1) = 0;
    L = (lambda/m).*eye(n); L(1) = 0;
    grad = ((1/m).*x' * (h-y)) + G;
    H = ((1/m).*x'*diag(h)*diag(1-h)*x) + L;
    % Newton update
    theta = theta - H\grad;
end
% Define the ranges of the grid
u = linspace(-1, 1.5, 200);
v = linspace(-1, 1.5, 200);
% Initialize space for the values to be plotted
z = zeros(length(u), length(v));
% Evaluate z = theta*x over the grid
for i = 1:length(u)
    for j = 1:length(v)
        % Notice the order of j, i here!
        z(j,i) = map_feature(u(i), v(j))*theta;
    end
end
% Because of the way contour plotting works in MATLAB,
% we need to transpose z, or the axes will be flipped!
z = z';
% Plot z = 0 by specifying the single contour level [0, 0]
contour(u,v,z,[0,0], 'LineWidth', 2)
xlim([-1.00 1.50])
ylim([-0.8 1.20])
legend({'y=1','y=0','Decision Boundary'})
title('\lambda=0')
xlabel('u')
ylabel('v')
lambda=1
iteration=20
theta = zeros(n, 1);   % reset before re-solving
J = zeros(iteration, 1);
for i=1:iteration
    z = x*theta;
    h = g(z);
    J(i) = -(1/m)*sum(y.*log(h)+(1-y).*log(1-h)) + ...
        (lambda/(2*m))*norm(theta(2:end))^2;
    G = (lambda/m).*theta; G(1) = 0;
    L = (lambda/m).*eye(n); L(1) = 0;
    grad = ((1/m).*x' * (h-y)) + G;
    H = ((1/m).*x'*diag(h)*diag(1-h)*x) + L;
    theta = theta - H\grad;
end
u = linspace(-1, 1.5, 200);
v = linspace(-1, 1.5, 200);
z = zeros(length(u), length(v));
for i = 1:length(u)
    for j = 1:length(v)
        z(j,i) = map_feature(u(i), v(j))*theta;
    end
end
z = z';
figure;
pos = find(y0); neg = find(y0 == 0);
plot(x0(pos,1),x0(pos,2),'+')
hold on
plot(x0(neg,1),x0(neg,2),'o')
contour(u,v,z,[0,0], 'LineWidth', 2)
xlim([-1.00 1.50])
ylim([-0.8 1.20])
legend({'y=1','y=0','Decision Boundary'})
title('\lambda=1')
xlabel('u')
ylabel('v')
lambda=10
iteration=20
theta = zeros(n, 1);   % reset before re-solving
J = zeros(iteration, 1);
for i=1:iteration
    z = x*theta;
    h = g(z);
    J(i) = -(1/m)*sum(y.*log(h)+(1-y).*log(1-h)) + ...
        (lambda/(2*m))*norm(theta(2:end))^2;
    G = (lambda/m).*theta; G(1) = 0;
    L = (lambda/m).*eye(n); L(1) = 0;
    grad = ((1/m).*x' * (h-y)) + G;
    H = ((1/m).*x'*diag(h)*diag(1-h)*x) + L;
    theta = theta - H\grad;
end
u = linspace(-1, 1.5, 200);
v = linspace(-1, 1.5, 200);
z = zeros(length(u), length(v));
for i = 1:length(u)
    for j = 1:length(v)
        z(j,i) = map_feature(u(i), v(j))*theta;
    end
end
z = z';
figure;
pos = find(y0); neg = find(y0 == 0);
plot(x0(pos,1),x0(pos,2),'+')
hold on
plot(x0(neg,1),x0(neg,2),'o')
contour(u,v,z,[0,0], 'LineWidth', 2)
xlim([-1.00 1.50])
ylim([-0.8 1.20])
legend({'y=1','y=0','Decision Boundary'})
title('\lambda=10')
xlabel('u')
ylabel('v')

 
