The main idea of robust least squares is to suppress samples with large errors, reducing their influence on the result. This post walks through the CVX code from the reference at the end, which gives three equivalent formulations of the optimization problem.
Data initialization
The test data are randomly generated:
randn('state', 0);
m = 16; n = 8;
A = randn(m, n);
b = randn(m, 1);
M = 2;
(a) robust least-squares problem
$$\underset{\beta}{\text{minimize}}\ \sum_{i=1}^{m} \text{huber}(\beta^T x_i - y_i)$$

$$\text{huber}(u) = \begin{cases} u^2, & |u| \le M \\ M(2|u| - M), & |u| > M \end{cases}$$
This caps the growth of each sample's penalty, so the regression coefficients are not pulled excessively toward samples with large errors (overfitting to outliers). $M$ is a constant.
When $|\beta^T x_i - y_i| > M$, write $|u| = |\beta^T x_i - y_i| = M + v$ with $v > 0$. Then
$$M(2|u| - M) = M(2(M+v) - M) = M^2 + 2Mv = (M+v)^2 - v^2$$
Here $v$ is the amount by which the residual exceeds $M$; to slow the growth of the penalty, the Huber function effectively drops the quadratic term $v^2$.
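As a concrete sketch (my own NumPy illustration, not part of the referenced CVX example), here is the Huber function together with a numeric check of the identity above for $|u| = M + v$:

```python
import numpy as np

def huber(u, M):
    """Huber penalty: quadratic inside [-M, M], linear outside."""
    u = np.abs(u)
    return np.where(u <= M, u**2, M * (2 * u - M))

M = 2.0
v = 1.5                       # amount by which |u| exceeds M
u = M + v
# M(2|u| - M) = (M + v)^2 - v^2: the v^2 term is dropped from the quadratic
assert np.isclose(huber(u, M), (M + v)**2 - v**2)
```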
disp('Computing the solution of the robust least-squares problem...');
cvx_begin
    variable x1(n)
    minimize( sum( huber( A*x1 - b, M ) ) )
cvx_end
(b) least-squares problem with variable weights
Weight optimization
disp('Computing the solution of the least-squares problem with variable weights...');
cvx_begin
    variable x2(n)
    variable w(m)
    minimize( sum( quad_over_lin( diag(A*x2 - b), w' + 1 ) ) + M^2 * ones(1,m) * w )
    w >= 0;
cvx_end
This formulation feels a bit abrupt; it is less intuitive than the first.
First look at the per-sample cost function
$$f(w) = \frac{u^2}{w+1} + M^2 w, \quad u = \beta^T x_i - y_i, \quad w \ge 0$$
$$f'(w) = M^2 - \frac{u^2}{(w+1)^2}$$
When $|u| < M$, $f'(w) > 0$ for all $w \ge 0$, so $f(w)$ is monotonically increasing and its minimum is attained at $w = 0$. Otherwise, setting $f'(w) = 0$ gives $w^* = \frac{|u|}{M} - 1$.
Substituting back into $f(w)$ yields
$$f(w^*) = \begin{cases} u^2, & |u| \le M \\ M(2|u| - M), & |u| > M \end{cases}$$
which is exactly the Huber function.
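A quick numeric sanity check (my own sketch, not from the referenced example): minimizing $u^2/(w+1) + M^2 w$ over $w \ge 0$ at the closed-form minimizer derived above reproduces the Huber value, and no weight on a dense grid does better:

```python
import numpy as np

def huber(u, M):
    """Huber penalty: quadratic inside [-M, M], linear outside."""
    u = np.abs(u)
    return np.where(u <= M, u**2, M * (2 * u - M))

def weighted_cost(u, w, M):
    """Per-sample cost of formulation (b): u^2/(w+1) + M^2*w."""
    return u**2 / (w + 1) + M**2 * w

M = 2.0
for u in [0.5, 1.9, 2.0, 3.0, 10.0]:
    w_star = max(abs(u) / M - 1, 0.0)   # optimal weight: 0 when |u| <= M
    assert np.isclose(weighted_cost(u, w_star, M), huber(u, M))
    # w_star is indeed a minimizer: no grid point does better
    grid = np.linspace(0.0, 20.0, 20001)
    assert weighted_cost(u, w_star, M) <= weighted_cost(u, grid, M).min() + 1e-9
```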
What needs explaining here is the meaning of
quad_over_lin(diag(A*x2-b), w'+1).
The residual $A x_2 - b \in \mathbb{R}^{m \times 1}$ is a vector, and diag turns it into a diagonal matrix with the residuals on its diagonal. My understanding is that this is purely so that quad_over_lin(x, y) can be evaluated column by column: column $i$ of the diagonal matrix is $r_i e_i$, so its $x^T x$ is $r_i^2$, and the sum becomes $\sum_i (A_i^T x_2 - b_i)^2 / (w_i + 1)$.
$$f_{\text{quad\_over\_lin}}(x,y) = \begin{cases} x^T x / y, & y > 0 \\ +\infty, & y \le 0 \end{cases}$$
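To illustrate the column-wise effect of the diag trick (my own NumPy sketch mimicking how I read CVX's behavior here, not CVX code):

```python
import numpy as np

def quad_over_lin_cols(X, y):
    """Column-wise x^T x / y, mimicking quad_over_lin applied per column."""
    return np.sum(X**2, axis=0) / y

r = np.array([1.0, -2.0, 0.5])        # residuals A*x2 - b
w = np.array([0.0, 1.0, 3.0])
# diag(r) puts r_i in column i, so each column's x^T x is r_i^2
vals = quad_over_lin_cols(np.diag(r), w + 1)
assert np.allclose(vals, r**2 / (w + 1))
```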
(c) quadratic program
disp('Computing the solution of the quadratic program...');
cvx_begin
    variable x3(n)
    variable u(m)
    variable v(m)
    minimize( sum( square(u) + 2*M*v ) )
    A*x3 - b <= u + v;
    A*x3 - b >= -u - v;
    u >= 0;
    u <= M;
    v >= 0;
cvx_end
The objective is monotonically increasing in $v$ (and in $u$ for $u \ge 0$), so both should be as small as the constraints allow.
Let $t_i = |A_i^T x_3 - b_i|$; the two residual constraints give $0 \le t_i \le u_i + v_i$.
At the optimum we must therefore have $u_i + v_i = t_i$.
Then $v_i = t_i - u_i$, and $v_i \ge 0 \Rightarrow u_i \le t_i$.
The per-sample objective becomes $f(u_i) = u_i^2 + 2 M v_i = u_i^2 - 2 M u_i + 2 M t_i$.
Its derivative is $f'(u_i) = 2 u_i - 2 M$, and since $0 \le u_i \le M$, $f$ is non-increasing over the feasible interval, so the minimizer is determined by $t_i$:
$$u_i^* = \begin{cases} t_i, & t_i \le M \\ M, & t_i > M \end{cases}$$
Correspondingly,
$$v_i^* = \begin{cases} 0, & t_i \le M \\ t_i - M, & t_i > M \end{cases}$$
Substituting the optimal solution into the objective gives
$$f(u^*, v^*) = \begin{cases} t_i^2, & t_i \le M \\ M(2 t_i - M), & t_i > M \end{cases}$$
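To check this reduction numerically (a brute-force sketch of my own, not part of the CVX example): for each $t$, minimizing $u^2 + 2Mv$ with $u + v = t$, $0 \le u \le M$, $v \ge 0$ reproduces $\text{huber}(t)$:

```python
import numpy as np

def huber(t, M):
    """Huber penalty: quadratic inside [-M, M], linear outside."""
    t = np.abs(t)
    return np.where(t <= M, t**2, M * (2 * t - M))

M = 2.0
for t in [0.0, 1.0, 2.0, 2.5, 7.0]:
    # enumerate u in [0, min(t, M)]; then v = t - u >= 0 is feasible
    u = np.linspace(0.0, min(t, M), 10001)
    v = t - u
    best = np.min(u**2 + 2 * M * v)
    assert np.isclose(best, huber(t, M), atol=1e-6)
```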
Reference
http://web.cvxr.com/cvx/examples/cvxbook/Ch04_cvx_opt_probs/html/ex_4_5.html#source