- Why introduce regularization?
When fitting linear or logistic regression models, we can run into overfitting: the error on the training set is small, but the error on the test set is large. A regularization term is therefore added to the cost function to penalize large parameters, discouraging overfitting so that test-set performance stays close to training-set performance.
For example, for linear regression, we can fit polynomial models of different degrees.
The fitted curves obtained by training show a clear pattern:
a low-degree polynomial tends to underfit, while a high-degree polynomial tends to overfit.
We therefore add a regularization term to the cost function.
Other regularization terms can also be used.
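As general background (not specific to this exercise), the two most common penalties are the L2 (ridge) penalty, which is the one used throughout this note, and the L1 (lasso) penalty:

```latex
% L2 (ridge) penalty: shrinks all coefficients smoothly toward zero
\Omega_{L2}(\theta) = \lambda \sum_{j=1}^{n} \theta_j^{2}
% L1 (lasso) penalty: an alternative that can drive coefficients exactly to zero
\Omega_{L1}(\theta) = \lambda \sum_{j=1}^{n} \lvert \theta_j \rvert
```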
- Regularized linear regression
(1) First, plot the data.
There are only 7 data points, so the model can easily overfit (the larger the training set, the less prone the model is to overfitting).
(2) Fit a fifth-degree polynomial by linear regression.
This is still linear regression because the hypothesis is a linear combination of the powers of x. The original x is a one-dimensional feature, so we expand each x into a six-dimensional feature vector:
m=length(y);
x=[ones(m,1),x,x.^2,x.^3,x.^4,x.^5]; % expand x into [1, x, x^2, ..., x^5]
With this construction, the columns of x are linearly independent features and h(x) is their linear combination, so the problem becomes a multivariate linear regression.
(3) The cost function:
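Written out in standard form, consistent with the normal-equation code below:

```latex
J(\theta) = \frac{1}{2m} \left[ \sum_{i=1}^{m} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right)^{2}
          + \lambda \sum_{j=1}^{n} \theta_j^{2} \right],
\qquad h_\theta(x) = \theta^{\top} x
```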
where λ is the regularization parameter.
(4) Solve with the normal equation.
Note: in the matrix that multiplies λ, the entry corresponding to θ0 is zero, i.e., θ0 is not penalized.
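Matching the MATLAB expression theta=(x'*x+Lambda)\x'*y used below, the regularized normal equation is:

```latex
\theta = \left( X^{\top} X + \lambda
\begin{bmatrix}
0 &   &        &   \\
  & 1 &        &   \\
  &   & \ddots &   \\
  &   &        & 1
\end{bmatrix}
\right)^{-1} X^{\top} y
```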
If λ is too large, all parameters are shrunk toward zero and the model degenerates to (approximately) the constant θ0,
i.e., it underfits.
(5) Implementation:
lambda=1;
Lambda=lambda.*eye(6);
Lambda(1)=0;                         % do not penalize theta_0
theta=(x'*x+Lambda)\x'*y
figure;
x_=(minx:0.01:maxx)';
x_1=[ones(size(x_)),x_,x_.^2,x_.^3,x_.^4,x_.^5];
hold on
plot(x0, y0, 'o', 'MarkerFaceColor', 'r', 'MarkerSize', 8);
plot(x_,x_1*theta,'--b','LineWidth',2);
legend({'data','5th-degree fit'})
title('\lambda=1')
xlabel('x')
ylabel('y')
hold off
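As a cross-check of the computation above, here is a NumPy sketch of the same regularized normal equation. The toy data below is made up for illustration, not the ex5Linx/ex5Liny data, and `ridge_normal_equation` is a hypothetical helper name:

```python
import numpy as np

def ridge_normal_equation(X, y, lam):
    """Solve theta = (X'X + lam*L)^{-1} X'y, where L is the identity
    with its (0,0) entry zeroed so the intercept is not penalized."""
    n = X.shape[1]
    L = np.eye(n)
    L[0, 0] = 0.0                        # do not penalize theta_0
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)

# Toy example: expand a 1-D x into [1, x, x^2, ..., x^5] as in the text
x = np.linspace(-1, 1, 7)
y = x**3 - 0.5*x + 0.1
X = np.vander(x, 6, increasing=True)     # columns: 1, x, ..., x^5
theta0 = ridge_normal_equation(X, y, 0.0)    # unregularized fit
theta10 = ridge_normal_equation(X, y, 10.0)  # heavily regularized fit
```

Increasing λ shrinks every coefficient except the intercept, mirroring the λ = 0/1/10 comparison in the text.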
Here λ takes the values 0, 1, and 10; the results are shown below.
The computed coefficients are:
            λ=0       λ=1       λ=10
theta_0   0.4725    0.3976    0.5205
theta_1   0.6814   -0.4207   -0.1825
theta_2  -1.3801    0.1296    0.0606
theta_3  -5.9777   -0.3975   -0.1482
theta_4   2.4417    0.1753    0.0743
theta_5   4.7371   -0.3394   -0.1280
When λ=0, the curve fits the data points very closely but clearly overfits; when λ=1, the data points are spread fairly evenly on both sides of the curve; and when λ=10, the underfitting is obvious.
- Regularized logistic regression
- Plotting the raw data
'+' marks positive examples and 'o' marks negative examples.
The plotting code:
pos = find(y); neg = find(y == 0);
plot (x(pos,1),x(pos,2),'+')
hold on
plot (x(neg,1),x(neg,2),'o')
- The hypothesis and the feature mapping of x
Note: here x is a two-dimensional feature vector, which we map to a higher-dimensional vector containing all monomial terms of the two features up to degree 6. The feature-mapping function:
function out = map_feature(feat1, feat2)
% Map two input features to all monomials feat1^(i-j)*feat2^j
% for i = 1..degree, j = 0..i, plus a leading constant column.
degree = 6;
out = ones(size(feat1(:,1)));
for i = 1:degree
    for j = 0:i
        out(:, end+1) = (feat1.^(i-j)).*(feat2.^j);
    end
end
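A NumPy equivalent of this mapping (a hypothetical helper for illustration): for degree 6 it produces 1 + 2 + … + 7 = 28 features per sample, which matches the 117x28 dimensions noted in the comments below.

```python
import numpy as np

def map_feature(feat1, feat2, degree=6):
    """Map two features to a constant column followed by all monomials
    feat1^(i-j) * feat2^j for i = 1..degree, j = 0..i."""
    feat1 = np.atleast_1d(feat1).astype(float)
    feat2 = np.atleast_1d(feat2).astype(float)
    cols = [np.ones_like(feat1)]           # constant (intercept) column
    for i in range(1, degree + 1):
        for j in range(i + 1):
            cols.append(feat1**(i - j) * feat2**j)
    return np.column_stack(cols)

out = map_feature(np.array([0.5]), np.array([2.0]))
```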
The regularized cost function:
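Written out to match the J(i) computed in the iteration code below:

```latex
J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta\!\left(x^{(i)}\right)
          + \left(1 - y^{(i)}\right) \log\!\left(1 - h_\theta\!\left(x^{(i)}\right)\right) \right]
          + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^{2},
\qquad h_\theta(x) = \frac{1}{1 + e^{-\theta^{\top} x}}
```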
The update rule for θ uses the Hessian matrix H and the gradient of J:
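One Newton step, matching the grad and H computed in the loop below (θ0 is excluded from the penalty):

```latex
\theta := \theta - H^{-1} \nabla_{\theta} J, \qquad
\nabla_{\theta} J = \frac{1}{m} X^{\top} (h - y)
  + \frac{\lambda}{m} \begin{bmatrix} 0 \\ \theta_1 \\ \vdots \\ \theta_n \end{bmatrix}, \qquad
H = \frac{1}{m} X^{\top} \operatorname{diag}\!\big(h \odot (1 - h)\big)\, X
  + \frac{\lambda}{m} \operatorname{diag}(0, 1, \ldots, 1)
```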
- Iterative solution
[m, n] = size(x);
theta = zeros(n, 1);
g = @(z) 1.0 ./ (1.0 + exp(-z));   % sigmoid
lambda = 0;
iteration = 20;
J = zeros(iteration, 1);
for i = 1:iteration
    z = x*theta;                   % x: 117x28, theta: 28x1
    h = g(z);                      % sigmoid hypothesis
    % Cost (for testing convergence); theta(1) is not penalized,
    % and norm gives the Euclidean norm of the vector
    J(i) = -(1/m)*sum(y.*log(h)+(1-y).*log(1-h)) + ...
           (lambda/(2*m))*norm(theta(2:end))^2;
    % Regularization terms for the gradient and Hessian (skip theta_0)
    G = (lambda/m).*theta;  G(1) = 0;
    L = (lambda/m).*eye(n); L(1) = 0;
    grad = ((1/m).*x'*(h-y)) + G;
    H = ((1/m).*x'*diag(h)*diag(1-h)*x) + L;
    % Newton update
    theta = theta - H\grad;
end
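For reference, here is the same regularized Newton iteration in NumPy. The toy data is made up for illustration, and `newton_logistic` is a hypothetical helper name, not part of the exercise code:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def newton_logistic(X, y, lam=0.0, iterations=20):
    """Regularized logistic regression fitted by Newton's method,
    mirroring the MATLAB loop above (theta_0 is not penalized)."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iterations):
        h = sigmoid(X @ theta)
        reg_grad = (lam / m) * theta
        reg_grad[0] = 0.0                       # skip theta_0
        grad = (X.T @ (h - y)) / m + reg_grad
        reg_hess = (lam / m) * np.eye(n)
        reg_hess[0, 0] = 0.0                    # skip theta_0
        # X.T * w scales column k of X.T by w[k], so this is X' diag(w) X
        H = (X.T * (h * (1 - h))) @ X / m + reg_hess
        theta = theta - np.linalg.solve(H, grad)
    return theta

# Toy 1-D problem: points below 0 are negative, above 0 positive
x1 = np.array([-2.0, -1.5, -1.0, 1.0, 1.5, 2.0])
X = np.column_stack([np.ones_like(x1), x1])     # add intercept column
y = (x1 > 0).astype(float)
theta = newton_logistic(X, y, lam=1.0)
```

A nonzero λ is used here because the toy data is linearly separable; without regularization the unconstrained maximum-likelihood weights would diverge.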
After θ is computed, the decision boundary is drawn to visualize the result.
- Results
λ again takes the values 0, 1, and 10.
Note: the boundary is drawn with MATLAB's contour function as a level set; when evaluating the grid, apply the same feature mapping to u and v:
% Define the ranges of the grid
u = linspace(-1, 1.5, 200);
v = linspace(-1, 1.5, 200);
% Initialize space for the values to be plotted
z = zeros(length(u), length(v));
% Evaluate z = theta*x over the grid
for i = 1:length(u)
    for j = 1:length(v)
        % Notice the order of j, i here!
        z(j,i) = map_feature(u(i), v(j))*theta;
    end
end
The resulting plots confirm the same pattern:
again, the λ=0 model overfits while the λ=10 model underfits.
Appendix: source code
1. Linear regression with regularization
clc,clear
x=load('ex5Linx.dat');
y=load('ex5Liny.dat');
x0=x; y0=y;                          % keep the raw data for plotting
figure;
plot(x, y, 'o', 'MarkerFaceColor', 'r', 'MarkerSize', 8);
title('training data')
xlabel('x')
ylabel('y')
minx=min(x);
maxx=max(x);
m=length(y);
x=[ones(m,1),x,x.^2,x.^3,x.^4,x.^5]; % expand x into [1, x, ..., x^5]
lambda=0;
Lambda=lambda.*eye(6);
Lambda(1)=0;                         % do not penalize theta_0
theta=(x'*x+Lambda)\x'*y
figure;
x_=(minx:0.01:maxx)';
x_1=[ones(size(x_)),x_,x_.^2,x_.^3,x_.^4,x_.^5];
hold on
plot(x0, y0, 'o', 'MarkerFaceColor', 'r', 'MarkerSize', 8);
plot(x_,x_1*theta,'--b','LineWidth',2);
legend({'data','5th-degree fit'})
title('\lambda=0')
xlabel('x')
ylabel('y')
hold off
lambda=1;
Lambda=lambda.*eye(6);
Lambda(1)=0;
theta=(x'*x+Lambda)\x'*y
figure;
x_=(minx:0.01:maxx)';
x_1=[ones(size(x_)),x_,x_.^2,x_.^3,x_.^4,x_.^5];
hold on
plot(x0, y0, 'o', 'MarkerFaceColor', 'r', 'MarkerSize', 8);
plot(x_,x_1*theta,'--b','LineWidth',2);
legend({'data','5th-degree fit'})
title('\lambda=1')
xlabel('x')
ylabel('y')
hold off
lambda=10;
Lambda=lambda.*eye(6);
Lambda(1)=0;
theta=(x'*x+Lambda)\x'*y
figure;
x_=(minx:0.01:maxx)';
x_1=[ones(size(x_)),x_,x_.^2,x_.^3,x_.^4,x_.^5];
hold on
plot(x0, y0, 'o', 'MarkerFaceColor', 'r', 'MarkerSize', 8);
plot(x_,x_1*theta,'--b','LineWidth',2);
legend({'data','5th-degree fit'})
title('\lambda=10')
xlabel('x')
ylabel('y')
hold off
2. Logistic regression with regularization
clc,clear;
x = load('ex5Logx.dat');
y = load('ex5Logy.dat');
x0 = x; y0 = y;                    % keep the raw data for plotting
figure
% Find the indices for the 2 classes
pos = find(y); neg = find(y == 0);
plot(x(pos,1), x(pos,2), '+')
hold on
plot(x(neg,1), x(neg,2), 'o')
u = x(:,1);
v = x(:,2);
x = map_feature(u, v);             % 117x28 feature matrix
[m, n] = size(x);
theta = zeros(n, 1);
g = @(z) 1.0 ./ (1.0 + exp(-z));   % sigmoid
lambda = 0;
iteration = 20;
J = zeros(iteration, 1);
for i = 1:iteration
    z = x*theta;                   % x: 117x28, theta: 28x1
    h = g(z);                      % sigmoid hypothesis
    % Cost (for testing convergence); theta(1) is not penalized,
    % and norm gives the Euclidean norm of the vector
    J(i) = -(1/m)*sum(y.*log(h)+(1-y).*log(1-h)) + ...
           (lambda/(2*m))*norm(theta(2:end))^2;
    % Regularization terms for the gradient and Hessian (skip theta_0)
    G = (lambda/m).*theta;  G(1) = 0;
    L = (lambda/m).*eye(n); L(1) = 0;
    grad = ((1/m).*x'*(h-y)) + G;
    H = ((1/m).*x'*diag(h)*diag(1-h)*x) + L;
    % Newton update
    theta = theta - H\grad;
end
% Define the ranges of the grid
u = linspace(-1, 1.5, 200);
v = linspace(-1, 1.5, 200);
% Initialize space for the values to be plotted
z = zeros(length(u), length(v));
% Evaluate z = theta*x over the grid
for i = 1:length(u)
    for j = 1:length(v)
        % Notice the order of j, i here!
        z(j,i) = map_feature(u(i), v(j))*theta;
    end
end
% Because of the way contour plotting works in MATLAB,
% transpose z, or the axis orientation will be flipped
z = z';
% Plot z = 0 by specifying the single contour level [0, 0]
contour(u, v, z, [0, 0], 'LineWidth', 2)
xlim([-1.00 1.50])
ylim([-0.8 1.20])
legend({'y=1','y=0','Decision Boundary'})
title('\lambda=0')
xlabel('u')
ylabel('v')
lambda = 1;
iteration = 20;
theta = zeros(n, 1);               % reset theta before refitting
J = zeros(iteration, 1);
for i = 1:iteration
    z = x*theta;
    h = g(z);
    % Cost (for testing convergence); theta(1) is not penalized
    J(i) = -(1/m)*sum(y.*log(h)+(1-y).*log(1-h)) + ...
           (lambda/(2*m))*norm(theta(2:end))^2;
    % Regularization terms for the gradient and Hessian (skip theta_0)
    G = (lambda/m).*theta;  G(1) = 0;
    L = (lambda/m).*eye(n); L(1) = 0;
    grad = ((1/m).*x'*(h-y)) + G;
    H = ((1/m).*x'*diag(h)*diag(1-h)*x) + L;
    % Newton update
    theta = theta - H\grad;
end
% Re-evaluate the decision boundary on the grid
u = linspace(-1, 1.5, 200);
v = linspace(-1, 1.5, 200);
z = zeros(length(u), length(v));
for i = 1:length(u)
    for j = 1:length(v)
        z(j,i) = map_feature(u(i), v(j))*theta;
    end
end
z = z';                            % transpose for contour
figure;
pos = find(y0); neg = find(y0 == 0);
plot(x0(pos,1), x0(pos,2), '+')
hold on
plot(x0(neg,1), x0(neg,2), 'o')
contour(u, v, z, [0, 0], 'LineWidth', 2)
xlim([-1.00 1.50])
ylim([-0.8 1.20])
legend({'y=1','y=0','Decision Boundary'})
title('\lambda=1')
xlabel('u')
ylabel('v')
lambda = 10;
iteration = 20;
theta = zeros(n, 1);               % reset theta before refitting
J = zeros(iteration, 1);
for i = 1:iteration
    z = x*theta;
    h = g(z);
    % Cost (for testing convergence); theta(1) is not penalized
    J(i) = -(1/m)*sum(y.*log(h)+(1-y).*log(1-h)) + ...
           (lambda/(2*m))*norm(theta(2:end))^2;
    % Regularization terms for the gradient and Hessian (skip theta_0)
    G = (lambda/m).*theta;  G(1) = 0;
    L = (lambda/m).*eye(n); L(1) = 0;
    grad = ((1/m).*x'*(h-y)) + G;
    H = ((1/m).*x'*diag(h)*diag(1-h)*x) + L;
    % Newton update
    theta = theta - H\grad;
end
% Re-evaluate the decision boundary on the grid
u = linspace(-1, 1.5, 200);
v = linspace(-1, 1.5, 200);
z = zeros(length(u), length(v));
for i = 1:length(u)
    for j = 1:length(v)
        z(j,i) = map_feature(u(i), v(j))*theta;
    end
end
z = z';                            % transpose for contour
figure;
pos = find(y0); neg = find(y0 == 0);
plot(x0(pos,1), x0(pos,2), '+')
hold on
plot(x0(neg,1), x0(neg,2), 'o')
contour(u, v, z, [0, 0], 'LineWidth', 2)
xlim([-1.00 1.50])
ylim([-0.8 1.20])
legend({'y=1','y=0','Decision Boundary'})
title('\lambda=10')
xlabel('u')
ylabel('v')