This is my second summary of support vector machines, focusing on the algorithm used to implement SVMs — the SMO algorithm — in both a simplified and a complete version. Checking in: to be finished before July 1st~
I. Analysis of the optimization problem in SMO
The dual of the convex quadratic programming problem that SMO solves is:

$$\min_{\alpha}\ \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_i\alpha_j y_i y_j K(x_i,x_j)-\sum_{i=1}^{N}\alpha_i$$

$$s.t.\quad \sum_{i=1}^{N}\alpha_i y_i=0,\qquad 0\leqslant\alpha_i\leqslant C,\ i=1,2,\dots,N$$
Note: while typing the math, my cursor started misbehaving. It used to be a thin line that I could place anywhere to edit; at some point it turned into a blue box, which made editing formulas very painful.
Fix:
Press the Insert key to switch cursor modes. On some laptops you need to press Fn + Insert to cycle among the modes.
SMO is a heuristic algorithm. Its main idea: choose two variables $\alpha_1,\alpha_2$, fix all the others, and build a quadratic programming subproblem in just these two variables:

$$\min_{\alpha_1,\alpha_2}\ W(\alpha_1,\alpha_2)=\frac{1}{2}K_{11}\alpha_1^2+\frac{1}{2}K_{22}\alpha_2^2+y_1y_2K_{12}\alpha_1\alpha_2-(\alpha_1+\alpha_2)+y_1\alpha_1\sum_{i=3}^{N}y_i\alpha_iK_{i1}+y_2\alpha_2\sum_{i=3}^{N}y_i\alpha_iK_{i2}$$

$$s.t.\quad \alpha_1y_1+\alpha_2y_2=-\sum_{i=3}^{N}y_i\alpha_i=\varsigma,\qquad 0\leqslant\alpha_i\leqslant C,\ i=1,2$$
1. Extremum of the unconstrained quadratic subproblem
Let

$$\nu_i=\sum_{j=3}^{N}\alpha_jy_jK(x_i,x_j)=g(x_i)-\sum_{j=1}^{2}\alpha_jy_jK(x_i,x_j)-b,\quad i=1,2$$
The objective function becomes:

$$W(\alpha_1,\alpha_2)=\frac{1}{2}K_{11}\alpha_1^2+\frac{1}{2}K_{22}\alpha_2^2+y_1y_2K_{12}\alpha_1\alpha_2-(\alpha_1+\alpha_2)+y_1\nu_1\alpha_1+y_2\nu_2\alpha_2$$
From $\alpha_1y_1=\varsigma-\alpha_2y_2$ and $y_1^2=1$, $\alpha_1$ can be expressed as

$$\alpha_1=(\varsigma-y_2\alpha_2)y_1$$

Substituting into the expression for $W(\alpha_1,\alpha_2)$:

$$W(\alpha_2)=\frac{1}{2}K_{11}(\varsigma-y_2\alpha_2)^2+\frac{1}{2}K_{22}\alpha_2^2+y_2K_{12}(\varsigma-y_2\alpha_2)\alpha_2-(\varsigma-y_2\alpha_2)y_1-\alpha_2+\nu_1(\varsigma-y_2\alpha_2)+y_2\nu_2\alpha_2$$
Taking the partial derivative with respect to $\alpha_2$:

$$\frac{\partial W}{\partial \alpha_2}=K_{11}\alpha_2+K_{22}\alpha_2-2K_{12}\alpha_2-K_{11}\varsigma y_2+K_{12}\varsigma y_2+y_1y_2-1-\nu_1y_2+y_2\nu_2$$
Setting it to zero gives

$$(K_{11}+K_{22}-2K_{12})\alpha_2=y_2(y_2-y_1+\varsigma K_{11}-\varsigma K_{12}+\nu_1-\nu_2)$$

$$=y_2\Big[y_2-y_1+\varsigma K_{11}-\varsigma K_{12}+\Big(g(x_1)-\sum_{j=1}^{2}\alpha_jy_jK(x_1,x_j)-b\Big)-\Big(g(x_2)-\sum_{j=1}^{2}\alpha_jy_jK(x_2,x_j)-b\Big)\Big]$$
Substituting $\varsigma=\alpha_1^{old}y_1+\alpha_2^{old}y_2$:

$$(K_{11}+K_{22}-2K_{12})\alpha_2^{new,unc}=y_2\big((K_{11}+K_{22}-2K_{12})\alpha_2^{old}y_2+y_2-y_1+g(x_1)-g(x_2)\big)=(K_{11}+K_{22}-2K_{12})\alpha_2^{old}+y_2(E_1-E_2)$$

where $E_i=g(x_i)-y_i$ is the prediction error. Writing $\eta=K_{11}+K_{22}-2K_{12}$, the unclipped solution is

$$\alpha_2^{new,unc}=\alpha_2^{old}+\frac{y_2(E_1-E_2)}{\eta}$$
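As a quick numeric illustration, the unclipped update can be written as a one-line helper (a sketch of mine, not part of the book's code; the kernel values and errors below are made up):

```python
def alpha2_unclipped(alpha2_old, y2, E1, E2, K11, K22, K12):
    """Unclipped update alpha2_old + y2*(E1 - E2)/eta with eta = K11 + K22 - 2*K12."""
    eta = K11 + K22 - 2.0 * K12
    return alpha2_old + y2 * (E1 - E2) / eta

# With eta = 2.0 the step is y2*(E1 - E2)/2:
print(alpha2_unclipped(0.5, 1, 0.5, -0.5, 2.0, 2.0, 1.0))  # 1.0
```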
2. Feasible region under the constraints
Since there are only two variables, the constraints can be represented on the square region $[0,C]\times[0,C]$ shown in the figure.
Suppose the initial feasible solution of this subproblem is $\alpha_1^{old},\alpha_2^{old}$ and the optimal solution is $\alpha_1^{new},\alpha_2^{new}$. Once one variable is determined, the other is determined too, because

$$\alpha_1^{new}y_1+\alpha_2^{new}y_2=\alpha_1^{old}y_1+\alpha_2^{old}y_2$$

Next we work out the feasible range of $\alpha_2$.
From $\alpha_1y_1+\alpha_2y_2=-\sum_{i=3}^{N}y_i\alpha_i=\varsigma$:
1. If $y_1\neq y_2$, the constraint becomes $\alpha_1-\alpha_2=k$.
As shown in the left figure:
1) When $k<0$, the line lies toward the upper left.
However the line moves, the maximum of $\alpha_2$ is $C$; the minimum is where the line meets the $\alpha_2$ axis, i.e. where $\alpha_1=0$, so $0-\alpha_2^{new}=\alpha_1^{old}-\alpha_2^{old}$ gives $\alpha_2^{new}=\alpha_2^{old}-\alpha_1^{old}$.
In this case $\alpha_2\in[\alpha_2^{old}-\alpha_1^{old},\,C]$.
2) When $k>0$, the line lies toward the lower right.
However the line moves, the minimum of $\alpha_2$ is $0$; the maximum is where the line meets the boundary $\alpha_1=C$, so $C-\alpha_2^{new}=\alpha_1^{old}-\alpha_2^{old}$ gives $\alpha_2^{new}=C+\alpha_2^{old}-\alpha_1^{old}$.
In this case $\alpha_2\in[0,\,C+\alpha_2^{old}-\alpha_1^{old}]$.
Combining with $0\leqslant\alpha_i\leqslant C$, let $L$ and $H$ be the lower and upper bounds of $\alpha_2$:

$$L=\max(0,\,\alpha_2^{old}-\alpha_1^{old}),\qquad H=\min(C,\,C+\alpha_2^{old}-\alpha_1^{old})$$
2. If $y_1=y_2$, the constraint becomes $\alpha_1+\alpha_2=k$.
As shown in the right figure:
1) When $0<k<C$, the line lies toward the lower left.
However the line moves, the minimum of $\alpha_2$ is $0$; the maximum is where the line meets the $\alpha_2$ axis, i.e. where $\alpha_1=0$, so $0+\alpha_2^{new}=\alpha_1^{old}+\alpha_2^{old}$ gives $\alpha_2^{new}=\alpha_2^{old}+\alpha_1^{old}$.
In this case $\alpha_2\in[0,\,\alpha_2^{old}+\alpha_1^{old}]$.
2) When $C<k<2C$, the line lies toward the upper right.
However the line moves, the maximum of $\alpha_2$ is $C$; the minimum is where the line meets the boundary $\alpha_1=C$, so $C+\alpha_2^{new}=\alpha_1^{old}+\alpha_2^{old}$ gives $\alpha_2^{new}=\alpha_2^{old}+\alpha_1^{old}-C$.
In this case $\alpha_2\in[\alpha_2^{old}+\alpha_1^{old}-C,\,C]$.
Combining with $0\leqslant\alpha_i\leqslant C$, the lower and upper bounds of $\alpha_2$ are:

$$L=\max(0,\,\alpha_2^{old}+\alpha_1^{old}-C),\qquad H=\min(C,\,\alpha_2^{old}+\alpha_1^{old})$$
From the above analysis, the final (clipped) solution $\alpha_2^{new}$ satisfies:

$$\alpha_2^{new}=\begin{cases}H & \alpha_2^{new,unc}>H\\ \alpha_2^{new,unc} & L\le\alpha_2^{new,unc}\le H\\ L & \alpha_2^{new,unc}<L\end{cases}$$
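The bound computation and the clipping step together look like this in code (a small sketch of mine; the book's smoSimple below does the same thing inline):

```python
def alpha2_bounds(alpha1_old, alpha2_old, y1, y2, C):
    """Feasible interval [L, H] for alpha_2, following the two cases above."""
    if y1 != y2:   # constraint alpha1 - alpha2 = k
        L = max(0.0, alpha2_old - alpha1_old)
        H = min(C, C + alpha2_old - alpha1_old)
    else:          # constraint alpha1 + alpha2 = k
        L = max(0.0, alpha2_old + alpha1_old - C)
        H = min(C, alpha2_old + alpha1_old)
    return L, H

def clip(alpha2_unc, L, H):
    """Clip the unclipped solution into [L, H]."""
    return max(L, min(H, alpha2_unc))
```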
II. Choosing the two variables
1. Choosing the first variable
SMO calls the search for the first variable the outer loop. The outer loop picks the training sample that violates the KKT conditions most severely as the first variable. How do we check whether a sample satisfies the KKT conditions?
The primal problem is a convex quadratic program, and its KKT conditions are:

$$\begin{aligned}
&\nabla_w L(w^*,b^*,\xi^*,\alpha^*,\mu^*)=w^*-\sum_{i=1}^{N}\alpha_i^*y_ix_i=0\\
&\nabla_b L(w^*,b^*,\xi^*,\alpha^*,\mu^*)=-\sum_{i=1}^{N}\alpha_i^*y_i=0\\
&\nabla_\xi L(w^*,b^*,\xi^*,\alpha^*,\mu^*)=C-\alpha^*-\mu^*=0\\
&\alpha_i^*\big(y_i(w^*\cdot x_i+b^*)-1+\xi_i^*\big)=0\\
&\mu_i^*\xi_i^*=0\\
&y_i(w^*\cdot x_i+b^*)-1+\xi_i^*\geq 0\\
&\xi_i^*\geq 0,\quad \alpha_i^*\geq 0,\quad \mu_i^*\geq 0
\end{aligned}$$
Further:
1. If $\alpha_i=0$, then $\mu_i^*=C\Rightarrow \xi_i^*=0 \Rightarrow y_i(w^*\cdot x_i+b^*)-1\geq 0$;
2. If $\alpha_i=C$, then $\mu_i^*=0\Rightarrow y_i(w^*\cdot x_i+b^*)-1+\xi_i^*=0 \Rightarrow y_i(w^*\cdot x_i+b^*)=1-\xi_i^*\leq 1$;
3. If $0<\alpha_i<C$, then $\xi_i^*=0$ and $y_i(w^*\cdot x_i+b^*)-1+\xi_i^*=0 \Rightarrow y_i(w^*\cdot x_i+b^*)=1$.
In full:

$$\alpha_i=0\iff y_i(w^*\cdot x_i+b^*)\geq 1\\
\alpha_i=C\iff y_i(w^*\cdot x_i+b^*)\leq 1\\
0<\alpha_i<C\iff y_i(w^*\cdot x_i+b^*)=1$$
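These three equivalences translate directly into a KKT check. Below is a minimal sketch (my own helper, not from the book): `yg` stands for $y_i\,g(x_i)$ and `tol` is a numerical tolerance.

```python
def violates_kkt(alpha, yg, C, tol=1e-3):
    """True if the pair (alpha_i, y_i*g(x_i)) violates the KKT conditions above."""
    if alpha < tol:              # alpha_i = 0   requires y_i*g(x_i) >= 1
        return yg < 1.0 - tol
    if alpha > C - tol:          # alpha_i = C   requires y_i*g(x_i) <= 1
        return yg > 1.0 + tol
    return abs(yg - 1.0) > tol   # 0 < alpha_i < C requires y_i*g(x_i) = 1
```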
The procedure for choosing the first variable:
1. Traverse the whole training set and pick an $\alpha_i$ that violates the KKT conditions as the first variable; then choose the second variable by the rule described later and optimize the pair.
2. After a full pass over the training set, traverse only the samples on the margin boundary, again choosing the second variable by the same rule and optimizing the pair.
3. Return to step 1 and traverse the whole training set again. The algorithm thus alternates between the full dataset and the non-bound samples, and exits the loop when a full sweep of the dataset finds no $\alpha_i$ left to optimize.
For selecting the first variable, Machine Learning in Action gives a concrete implementation:
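The selection test from that implementation (it reappears verbatim in the smoSimple code below) can be isolated as a small predicate; wrapping it in a standalone function is my own framing:

```python
def first_alpha_violates(Ei, yi, alpha_i, C, toler):
    """The book's KKT-violation test; note that yi*Ei = yi*g(xi) - 1."""
    return (yi * Ei < -toler and alpha_i < C) or (yi * Ei > toler and alpha_i > 0)
```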
A question arises here: why is the code written this way?
We stated above the expressions for satisfying the KKT conditions; a violation can therefore be expressed as:

$$\alpha_i=0\iff y_i(w^*\cdot x_i+b^*)<1\\
\alpha_i=C\iff y_i(w^*\cdot x_i+b^*)>1\\
0<\alpha_i<C\iff y_i(w^*\cdot x_i+b^*)>1\ \text{or}\ y_i(w^*\cdot x_i+b^*)<1$$

Further, this can be reorganized as:
when $y_i(w^*\cdot x_i+b^*)>1$, a violation requires $0<\alpha_i\leq C$;
when $y_i(w^*\cdot x_i+b^*)<1$, a violation requires $0\leq\alpha_i<C$.
Now look at the expression used in the code, with $E_i=g(x_i)-y_i$.
For any $\delta>0$:
1. When $\alpha_i<C$, i.e. $0\leq\alpha_i<C$, the test $y_iE_i<-\delta$ means

$$y_i[g(x_i)-y_i]=y_ig(x_i)-y_iy_i=y_ig(x_i)-1<-\delta\;\Rightarrow\;y_ig(x_i)<1-\delta$$

2. When $\alpha_i>0$, i.e. $0<\alpha_i\leq C$, the test $y_iE_i>\delta$ means

$$y_i[g(x_i)-y_i]=y_ig(x_i)-y_iy_i=y_ig(x_i)-1>\delta\;\Rightarrow\;y_ig(x_i)>1+\delta$$

Here $\delta$ is an error tolerance (the `toler` argument in the code).
2. Choosing the second variable
SMO calls the selection of the second variable the inner loop. The criterion is that $\alpha_2$ should change as much as possible. Since the update of $\alpha_2$ depends on $|E_1-E_2|$, we choose the $\alpha_2$ that maximizes $|E_1-E_2|$: if $E_1$ is positive, choose the sample with the smallest $E_i$ as $E_2$; if $E_1$ is negative, choose the largest $E_i$. In practice every sample's $E_i$ is cached in a list, and the pair with the largest $|E_1-E_2|$ is used to approximately maximize the step size.
If the second variable chosen by this heuristic does not produce a sufficient decrease in the objective, then:
traverse the support vectors on the margin boundary looking for a point that does decrease the objective; if there is none, traverse the whole dataset; if there is still no suitable $\alpha_2$, give up the current $\alpha_1$ and choose a new one through the outer loop.
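A sketch of this inner-loop heuristic with a cached error list (`error_cache` is a hypothetical list holding each sample's $E_i$; the simplified version below just picks $j$ at random instead):

```python
def select_second(i, error_cache):
    """Pick the index j != i that maximizes |E_i - E_j|."""
    Ei = error_cache[i]
    best_j, best_gap = -1, -1.0
    for j, Ej in enumerate(error_cache):
        if j == i:
            continue
        gap = abs(Ei - Ej)
        if gap > best_gap:
            best_gap, best_j = gap, j
    return best_j
```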
III. The simplified SMO algorithm
from numpy import *
import matplotlib.pyplot as plt

# Helper functions for the SMO algorithm

# Read the data file: the features go into DataMat, the labels into LabelMat
def loadDataSet(fileName):
    DataMat = []; LabelMat = []
    fr = open(fileName)
    for line in fr.readlines():
        LineArr = line.strip().split('\t')
        DataMat.append([float(LineArr[0]), float(LineArr[1])])
        LabelMat.append(float(LineArr[2]))
    return DataMat, LabelMat

# Randomly pick an index j for alpha_j, different from i
def SelectJrand(i, m):
    j = i
    while (j == i):
        j = int(random.uniform(0, m))
    return j

# Clip an alpha value that has crossed its bounds
def clipAlpha(aj, H, L):
    if aj > H:
        aj = H
    elif aj < L:
        aj = L
    return aj
The main body of the simplified SMO algorithm:
# Pseudocode for this function:
# Create an alpha vector initialized to all zeros
# While the iteration count is below the maximum (outer loop):
#     For every data vector in the dataset (inner loop):
#         If this data vector can be optimized:
#             Randomly select another data vector
#             Optimize the two vectors together
#             If neither vector can be optimized, exit the inner loop
#     If no vector was optimized, increment the iteration count and continue
def smoSimple(dataMatIn, classLabels, C, toler, MaxIter):
    # Convert the lists dataMatIn and classLabels into matrices so that
    # matrix multiplication replaces explicit loops
    dataMatrix = mat(dataMatIn)
    LabelMat = mat(classLabels).transpose()  # .transpose() is matrix transposition
    # Initialize b in y = w*x + b to 0
    b = 0
    # Number of rows and columns of dataMatrix
    m, n = shape(dataMatrix)
    # Initialize alphas as an m-by-1 zero matrix
    alphas = mat(zeros((m, 1)))
    iter = 0
    # Start iterating
    while (iter < MaxIter):
        # Record whether any alpha pair was optimized in this pass
        alphaPairsChanged = 0
        # Traverse every sample in the dataset
        for i in range(m):
            # Prediction g(x_i) for the first variable alphas[i]
            gxi = float(multiply(alphas, LabelMat).T * (dataMatrix * dataMatrix[i, :].T)) + b
            # Error between the prediction and the true label
            Ei = gxi - float(LabelMat[i])
            # Only optimize samples that violate the KKT conditions; the main
            # text explains why the test takes this form
            if (LabelMat[i]*Ei < -toler and alphas[i] < C) or (LabelMat[i]*Ei > toler and alphas[i] > 0):
                # Having chosen i, pick the second variable j
                j = SelectJrand(i, m)
                # Prediction for the second variable
                gxj = float(multiply(alphas, LabelMat).T * (dataMatrix * dataMatrix[j, :].T)) + b
                # Error between the second variable's prediction and its true label
                Ej = gxj - float(LabelMat[j])
                # Save the initial alpha values
                alphaIold = alphas[i].copy()
                alphaJold = alphas[j].copy()
                # Compute the upper and lower bounds H and L for alphas[j]
                if LabelMat[j] != LabelMat[i]:
                    H = min(C, C + alphas[j] - alphas[i])
                    L = max(0, alphas[j] - alphas[i])
                else:
                    H = min(C, alphas[j] + alphas[i])
                    L = max(0, alphas[j] + alphas[i] - C)
                # If L == H, skip this iteration
                if L == H:
                    print("L == H")
                    continue
                # Denominator eta of the alphas[j] update; see the formula in the main text
                eta = dataMatrix[i, :]*dataMatrix[i, :].T + dataMatrix[j, :]*dataMatrix[j, :].T \
                    - 2.0*dataMatrix[i, :]*dataMatrix[j, :].T
                # The denominator must not be 0; otherwise skip this iteration
                if eta == 0:
                    print("eta == 0")
                    continue
                # Unclipped update of the second variable alphas[j]
                alphas[j] += LabelMat[j]*(Ei - Ej)/eta
                # Clip alphas[j] if it exceeds its bounds
                alphas[j] = clipAlpha(alphas[j], H, L)
                # If alphas[j] barely changed, skip this iteration
                if (abs(alphas[j] - alphaJold)) < 0.00001:
                    print("j not moving enough !")
                    continue
                # Update alphas[i] from the new alphas[j]
                alphas[i] += LabelMat[i]*LabelMat[j]*(alphaJold - alphas[j])
                # Update b
                b1 = -Ei - LabelMat[i]*(alphas[i] - alphaIold)*(dataMatrix[i, :]*dataMatrix[i, :].T) \
                    - LabelMat[j]*(alphas[j] - alphaJold)*(dataMatrix[j, :]*dataMatrix[i, :].T) + b
                b2 = -Ej - LabelMat[i]*(alphas[i] - alphaIold)*(dataMatrix[i, :]*dataMatrix[j, :].T) \
                    - LabelMat[j]*(alphas[j] - alphaJold)*(dataMatrix[j, :]*dataMatrix[j, :].T) + b
                if alphas[i] > 0 and alphas[i] < C:
                    b = b1
                elif alphas[j] > 0 and alphas[j] < C:
                    b = b2
                else:
                    b = (b1 + b2)/2.0
                alphaPairsChanged += 1
                print("iter:{0}; i:{1}; alpha pairs changed:{2}".format(iter, i, alphaPairsChanged))
        # If no pair was optimized in this pass, count the pass toward
        # convergence; otherwise reset the counter
        if (alphaPairsChanged == 0):
            iter += 1
        else:
            iter = 0
        print("iteration number: %d" % iter)
    return b, alphas
One detail worth understanding: alphaIold = alphas[i].copy() and alphaJold = alphas[j].copy()
Why use .copy()?
This comes down to shallow copying in Python; for a detailed introduction see https://blog.csdn.net/weixin_40238600/article/details/93603747
With plain assignment, alphaIold and alphaJold would change as soon as alphas[i] or alphas[j] is updated.
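A quick demonstration of the difference (the values are made up):

```python
from numpy import mat, zeros

alphas = mat(zeros((3, 1)))
ref = alphas[0]           # plain indexing: a view into the same data
snap = alphas[0].copy()   # .copy(): an independent snapshot
alphas[0] = 0.5
print(ref[0, 0], snap[0, 0])   # the view follows the update, the copy does not
```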
# Compute the weight vector w
def clacW(alphas, dataArr, LabelArr):
    dataMat = mat(dataArr); labelMat = mat(LabelArr).transpose()
    m, n = shape(dataMat)
    w = zeros((n, 1))
    for i in range(m):
        w += multiply(alphas[i]*labelMat[i], dataMat[i, :].T)
    return w

Plot the fitted separating line:
def plotBestFit(alphas, dataArr, labelArr, b):
    dataMat, labelMat = loadDataSet('testSet.txt')
    dataArr = array(dataMat)
    # Number of rows of dataArr
    n = shape(dataArr)[0]
    xcord1 = []; ycord1 = []
    xcord2 = []; ycord2 = []
    for i in range(n):
        if (labelArr[i] == 1):
            xcord1.append(dataArr[i, 0]); ycord1.append(dataArr[i, 1])
        else:
            xcord2.append(dataArr[i, 0]); ycord2.append(dataArr[i, 1])
    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.scatter(xcord1, ycord1, s=30, c='red', marker='s')
    ax.scatter(xcord2, ycord2, s=30, c='green')
    x = arange(2.0, 6.0, 0.1)
    w = clacW(alphas, dataArr, labelArr)
    print(type(b))
    print(type(w))
    print(type(x))
    y = (-b - w[0]*x) / w[1]  # from w1*x1 + w2*x2 + b = 0 we get x2 (i.e. y) = (-b - w1*x1)/w2
    ax.plot(x, y)
    plt.xlabel('x1'); plt.ylabel('x2')
    plt.show()
Running this raises an error.
The cause is the type of b: smoSimple returns it as a 1×1 matrix rather than a scalar.
The line should convert it to an array: y = (-array(b)[0] - w[0]*x) / w[1]
Run it from the main function:
if __name__ == '__main__':
    dataArr, labelArr = loadDataSet('testSet.txt')
    b, alphas = smoSimple(dataArr, labelArr, 0.6, 0.001, 40)
    plotBestFit(alphas, dataArr, labelArr, b)
IV. The full Platt SMO algorithm
References:
1. Statistical Learning Methods (統計學習方法), Hang Li
2. Machine Learning in Action, Peter Harrington
3. [Machine Learning] The principles of SVM (4): the SMO algorithm: https://blog.csdn.net/made_in_china_too/article/details/79547296
4. Cursor modes and operations in the terminal: https://blog.csdn.net/Jeffxu_lib/article/details/84759961
5. [latex] Breaking long formulas across lines: https://blog.csdn.net/solidsanke54/article/details/45102397
6. Usage of LaTeX braces: https://blog.csdn.net/l740450789/article/details/49487847
7. Machine Learning in Action: interpreting the selection condition for the first alpha in simplified SMO: https://blog.csdn.net/sky_kkk/article/details/79535060