A detailed walkthrough of the SMO (Sequential Minimal Optimization) algorithm

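SMO was developed mainly to make solving the SVM (support vector machine) optimization problem efficient. Most SVM derivations arrive at the following dual problem: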

$$\max_{\alpha}\ \sum_{i=1}^{m}\alpha_{i}-\frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m}\alpha_{i}\alpha_{j}y_{i}y_{j}x_{i}^{T}x_{j}$$

$$\text{s.t.}\quad \sum_{i=1}^{m}\alpha_{i}y_{i}=0$$

$$0\le\alpha_{i}\le C,\quad i=1,2,3,...,m$$

Here $C$ is the SVM penalty parameter (also called the regularization constant). Define the objective as:

$$\varphi(\alpha)=\sum_{i=1}^{m}\alpha_{i}-\frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m}\alpha_{i}\alpha_{j}y_{i}y_{j}x_{i}^{T}x_{j}$$
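As a quick illustration of the objective and its constraints, here is a minimal NumPy sketch. The function names `dual_objective` and `is_feasible` and the two-point toy data set are assumptions made for this example, not part of the original derivation.

```python
import numpy as np

def dual_objective(alpha, X, y):
    """phi(alpha) = sum_i alpha_i - 1/2 * sum_ij alpha_i alpha_j y_i y_j x_i^T x_j."""
    K = X @ X.T          # Gram matrix of inner products x_i^T x_j
    v = alpha * y        # element-wise alpha_i * y_i
    return alpha.sum() - 0.5 * (v @ K @ v)

def is_feasible(alpha, y, C, tol=1e-9):
    """Check sum_i alpha_i y_i = 0 and 0 <= alpha_i <= C (up to a tolerance)."""
    on_hyperplane = abs(alpha @ y) <= tol
    in_box = np.all(alpha >= -tol) and np.all(alpha <= C + tol)
    return bool(on_hyperplane and in_box)

# Toy data, made up for illustration: one point per class on the x-axis.
X = np.array([[1.0, 0.0], [-1.0, 0.0]])
y = np.array([1.0, -1.0])
alpha = np.array([0.5, 0.5])

print(dual_objective(alpha, X, y))
print(is_feasible(alpha, y, C=1.0))
```

Any candidate $\alpha$ produced by the steps below should pass `is_feasible`, and each SMO update should never decrease `dual_objective`.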

The steps of SMO:

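Step 1: To keep the constraint $\sum_{i=1}^{m}\alpha_{i}y_{i}=0$ satisfied, at least two variables have to change together, so SMO selects a pair $\alpha_{i}$ and $\alpha_{j}$ to optimize and holds all the others fixed. Taking $\alpha_{1}$ and $\alpha_{2}$ as the example, the remaining $\alpha_{i}\ (i=3,4,...,m)$ are treated as known constants, and the constraint becomes:

$$\alpha_{1}y_{1}+\alpha_{2}y_{2}=c=-\sum_{i=3}^{m}\alpha_{i}y_{i},\quad 0\le\alpha_{1}\le C,\ 0\le\alpha_{2}\le C$$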

Multiply both sides by $y_{1}$ (using $y_{1}^{2}=1$) and write $y_{1}y_{2}=h_{0}$:

$$\alpha_{1}+h_{0}\alpha_{2}=-y_{1}\sum_{i=3}^{m}\alpha_{i}y_{i}=\alpha_{1_{new}}+h_{0}\alpha_{2_{new}}$$

Let $H=-y_{1}\sum_{i=3}^{m}\alpha_{i}y_{i}$. Then:

$$\alpha_{1_{new}}=H-h_{0}\alpha_{2_{new}}\qquad (1)$$
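To make equation (1) concrete, the sketch below computes $h_{0}$ and $H$ for a small feasible $\alpha$ and shows that choosing any new $\alpha_{2}$ and setting $\alpha_{1}$ from (1) keeps $\sum_{i}\alpha_{i}y_{i}=0$. The helper name `alpha1_from_alpha2` and the numbers are assumptions for illustration only; indices 0 and 1 in the code play the roles of $\alpha_{1}$ and $\alpha_{2}$.

```python
import numpy as np

def alpha1_from_alpha2(alpha, y, alpha2_new):
    """Apply equation (1): alpha1_new = H - h0 * alpha2_new."""
    h0 = y[0] * y[1]
    H = -y[0] * np.sum(alpha[2:] * y[2:])   # depends only on the fixed alphas
    return H - h0 * alpha2_new

# Made-up labels and a feasible alpha (sum of alpha_i * y_i is zero).
y = np.array([1.0, -1.0, 1.0, -1.0])
alpha = np.array([0.3, 0.5, 0.4, 0.2])

alpha2_new = 0.7
alpha1_new = alpha1_from_alpha2(alpha, y, alpha2_new)

updated = alpha.copy()
updated[0], updated[1] = alpha1_new, alpha2_new
print(alpha1_new, updated @ y)   # the second value should be ~0
```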

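Step 2: Since $\alpha_{1_{new}}$ can be expressed in terms of $\alpha_{2_{new}}$, and the $\alpha_{i}\ (i=3,4,...,m)$ are all known, $\varphi(\alpha)$ now has a single unknown, $\alpha_{2_{new}}$. We can therefore differentiate directly and solve for $\alpha_{2_{new}}$. The procedure is as follows: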

1. Expanding $\varphi(\alpha)$ gives:

$$\varphi(\alpha)=\alpha_{1_{new}}+\alpha_{2_{new}}-\frac{1}{2}\alpha_{1_{new}}^{2}k_{11}-\frac{1}{2}\alpha_{2_{new}}^{2}k_{22}-\alpha_{1_{new}}\alpha_{2_{new}}y_{1}y_{2}k_{12}-\alpha_{1_{new}}y_{1}\sum_{i=3}^{m}\alpha_{i}y_{i}k_{i1}-\alpha_{2_{new}}y_{2}\sum_{i=3}^{m}\alpha_{i}y_{i}k_{i2}+\varphi_{constant}\qquad (2)$$

where $k_{ij}=k(x_{i},x_{j})$ denotes the kernel function (for the linear case, $k_{ij}=x_{i}^{T}x_{j}$), and

$$\varphi_{constant}=\sum_{i=3}^{m}\alpha_{i}-\frac{1}{2}\sum_{i=3}^{m}\sum_{j=3}^{m}\alpha_{i}\alpha_{j}y_{i}y_{j}k_{ij}$$
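The quantities $k_{ij}$ are usually precomputed as a Gram matrix. The sketch below shows one way to do that for the linear kernel and, purely as an illustrative assumption, for a Gaussian (RBF) kernel with a hypothetical width parameter `gamma`; the original text does not commit to a particular kernel.

```python
import numpy as np

def linear_gram(X):
    """K[i, j] = x_i^T x_j."""
    return X @ X.T

def rbf_gram(X, gamma=1.0):
    """K[i, j] = exp(-gamma * ||x_i - x_j||^2); gamma is an illustrative choice."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))   # clamp tiny negatives from rounding

X = np.array([[0.0, 0.0], [1.0, 0.0]])
print(linear_gram(X))
print(rbf_gram(X, gamma=1.0))
```

Both matrices are symmetric, so $k_{12}=k_{21}$, which the expansion in (2) relies on.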

2. The SVM decision function is $f(x_{j})=w^{T}x_{j}+b=\sum_{i=1}^{m}\alpha_{i}y_{i}k_{ij}+b$. Separating out the terms for $i=1,2$ (with $\alpha_{1}$, $\alpha_{2}$ at their current values), define:

$$V_{j}=\sum_{i=3}^{m}\alpha_{i}y_{i}k_{ij}=f(x_{j})-b-\alpha_{1}y_{1}k_{1j}-\alpha_{2}y_{2}k_{2j}\qquad (3)$$

3. Substituting (1) and (3) into (2) gives:

$$\varphi(\alpha)=H-h_{0}\alpha_{2_{new}}+\alpha_{2_{new}}-\frac{1}{2}(H-h_{0}\alpha_{2_{new}})^{2}k_{11}-\frac{1}{2}\alpha_{2_{new}}^{2}k_{22}-(H-h_{0}\alpha_{2_{new}})\alpha_{2_{new}}y_{1}y_{2}k_{12}-(H-h_{0}\alpha_{2_{new}})y_{1}V_{1}-\alpha_{2_{new}}y_{2}V_{2}+\varphi_{constant}$$

Differentiating with respect to $\alpha_{2_{new}}$ and setting the derivative to zero (using $h_{0}^{2}=1$ and $h_{0}y_{1}=y_{2}$):

$$\frac{d\varphi(\alpha)}{d\alpha_{2_{new}}}=-(k_{11}+k_{22}-2k_{12})\alpha_{2_{new}}+h_{0}H(k_{11}-k_{12})+y_{2}(V_{1}-V_{2})-h_{0}+1=0$$

Rearranging:

$$(k_{11}+k_{22}-2k_{12})\alpha_{2_{new}}=h_{0}H(k_{11}-k_{12})+y_{2}(V_{1}-V_{2})-h_{0}+1\qquad (4)$$

Now substitute $H=\alpha_{1}+h_{0}\alpha_{2}$ and $V_{j}$ from (3) into (4), noting that $-h_{0}+1=y_{2}(y_{2}-y_{1})$:

$$(k_{11}+k_{22}-2k_{12})\alpha_{2_{new}}=(k_{11}+k_{22}-2k_{12})\alpha_{2}+y_{2}\left(f(x_{1})-y_{1}-f(x_{2})+y_{2}\right)\qquad (5)$$

Let $\eta=k_{11}+k_{22}-2k_{12}$ and $E_{i}=f(x_{i})-y_{i}$. Substituting into (5) gives:

$$\alpha_{2_{new}}=\alpha_{2}+\frac{y_{2}(E_{1}-E_{2})}{\eta}$$
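The closed-form update can be sanity-checked numerically. The sketch below, under the assumption of a linear kernel and made-up random data, compares the formula with the vertex of the parabola obtained by evaluating $\varphi$ along the constraint line $\alpha_{1}=H-h_{0}\alpha_{2}$. The function names are hypothetical, and indices 0 and 1 stand for $\alpha_{1}$ and $\alpha_{2}$. Note that $b$ cancels in $E_{1}-E_{2}$, so it is not needed here.

```python
import numpy as np

def phi(alpha, y, K):
    """Dual objective written with a precomputed Gram matrix K."""
    v = alpha * y
    return alpha.sum() - 0.5 * (v @ K @ v)

def unclipped_alpha2(alpha, y, K):
    """alpha2_new = alpha2 + y2 * (E1 - E2) / eta, before any clipping."""
    g = K @ (alpha * y)          # f(x_i) - b; the offset b cancels in E1 - E2
    E = g - y
    eta = K[0, 0] + K[1, 1] - 2.0 * K[0, 1]
    return alpha[1] + y[1] * (E[0] - E[1]) / eta

def phi_along_constraint(a2, alpha, y, K):
    """Evaluate phi with alpha2 = a2 and alpha1 set by equation (1)."""
    h0 = y[0] * y[1]
    H = alpha[0] + h0 * alpha[1]
    trial = alpha.copy()
    trial[0], trial[1] = H - h0 * a2, a2
    return phi(trial, y, K)

rng = np.random.default_rng(0)               # made-up data for the check
X = rng.normal(size=(4, 2))
y = np.array([1.0, -1.0, 1.0, -1.0])
alpha = np.array([0.3, 0.5, 0.4, 0.2])       # feasible: sum(alpha * y) = 0
K = X @ X.T

a2_formula = unclipped_alpha2(alpha, y, K)
# phi is exactly quadratic in a2, so a degree-2 fit recovers its vertex.
grid = np.linspace(-2.0, 2.0, 5)
coeffs = np.polyfit(grid, [phi_along_constraint(t, alpha, y, K) for t in grid], 2)
a2_vertex = -coeffs[1] / (2.0 * coeffs[0])
print(a2_formula, a2_vertex)                 # the two values should agree
```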

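4. Because $0\le\alpha_{1}\le C$, $0\le\alpha_{2}\le C$ and $\alpha_{1}y_{1}+\alpha_{2}y_{2}=c$, the pair $(\alpha_{1_{new}},\alpha_{2_{new}})$ must lie on the segment of that line inside the box $[0,C]\times[0,C]$:

[Figure: feasible range of $\alpha_{2_{new}}$]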

From the figure we can read off the admissible range $[L,H]$ of $\alpha_{2_{new}}$, where $\alpha_{1}$, $\alpha_{2}$ are the current values (here $H$ denotes the upper bound, not the constant $H$ from Step 1):

$$\left\{\begin{matrix}L=\max\{0,\alpha_{1}+\alpha_{2}-C\},\ H=\min\{C,\alpha_{1}+\alpha_{2}\}, & \text{if } y_{1}=y_{2} \\ L=\max\{0,\alpha_{2}-\alpha_{1}\},\ H=\min\{C,C+\alpha_{2}-\alpha_{1}\}, & \text{if } y_{1}\neq y_{2}\end{matrix}\right.$$

The final value of $\alpha_{2_{new}}$ is the unclipped solution clipped to $[L,H]$ (on the right-hand side, $\alpha_{2_{new}}$ is the unclipped value):

$$\alpha_{2_{new}}=\left\{\begin{matrix}H, & \text{if } \alpha_{2_{new}}\ge H \\ \alpha_{2_{new}}, & \text{if } L<\alpha_{2_{new}}<H \\ L, & \text{if } \alpha_{2_{new}}\le L\end{matrix}\right.$$

$\alpha_{1_{new}}$ then follows from equation (1).
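The bounds and the clipping rule translate almost line for line into code. This is a minimal sketch; `clip_bounds` and `clip_alpha2` are names chosen for the example, and the upper bound is called `high` to avoid confusion with the constant $H$ from Step 1.

```python
def clip_bounds(alpha1, alpha2, y1, y2, C):
    """Return (L, high), the admissible interval for the new alpha2."""
    if y1 == y2:
        return max(0.0, alpha1 + alpha2 - C), min(C, alpha1 + alpha2)
    return max(0.0, alpha2 - alpha1), min(C, C + alpha2 - alpha1)

def clip_alpha2(alpha2_unclipped, low, high):
    """Project the unclipped solution onto [low, high]."""
    return min(max(alpha2_unclipped, low), high)

# Example values, made up for illustration.
low, high = clip_bounds(alpha1=0.3, alpha2=0.5, y1=1, y2=-1, C=1.0)
print(low, high)
print(clip_alpha2(1.4, low, high), clip_alpha2(0.1, low, high), clip_alpha2(0.6, low, high))
```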

Step 3: Repeat Steps 1 and 2, each time with a new pair of variables, until the $\alpha_{i_{new}}$ converge. Then:
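Putting Steps 1 to 3 together, a deliberately naive sketch of the loop might look like the following. It assumes a linear kernel and simply sweeps over every pair $(i,j)$ instead of using the pair-selection heuristics of production SMO implementations; `smo_naive`, `tol` and `max_sweeps` are hypothetical names chosen for this example. Index `i` plays the role of $\alpha_{1}$ and `j` the role of $\alpha_{2}$.

```python
import numpy as np

def smo_naive(X, y, C=1.0, tol=1e-6, max_sweeps=200):
    """Pairwise coordinate ascent on the SVM dual (illustrative, not optimized)."""
    m = len(y)
    K = X @ X.T                                   # linear kernel (assumption)
    alpha = np.zeros(m)                           # feasible starting point
    for _ in range(max_sweeps):
        max_change = 0.0
        for i in range(m):
            for j in range(i + 1, m):
                eta = K[i, i] + K[j, j] - 2.0 * K[i, j]
                if eta <= 1e-12:                  # identical points: skip the pair
                    continue
                v = alpha * y
                # E_i - E_j; the offset b cancels, so it is not needed here.
                diff = (K[i] @ v - y[i]) - (K[j] @ v - y[j])
                aj = alpha[j] + y[j] * diff / eta
                if y[i] == y[j]:
                    low, high = max(0.0, alpha[i] + alpha[j] - C), min(C, alpha[i] + alpha[j])
                else:
                    low, high = max(0.0, alpha[j] - alpha[i]), min(C, C + alpha[j] - alpha[i])
                aj = min(max(aj, low), high)      # clip to [L, H]
                # Equation (1) rewritten as alpha_i_new = alpha_i + h0 * (alpha_j - alpha_j_new).
                ai = alpha[i] + y[i] * y[j] * (alpha[j] - aj)
                ai = min(max(ai, 0.0), C)         # guard against rounding error
                max_change = max(max_change, abs(aj - alpha[j]))
                alpha[i], alpha[j] = ai, aj
        if max_change < tol:                      # no pair moved: treat as converged
            break
    return alpha

# Toy data, made up for illustration.
X = np.array([[1.0, 0.0], [-1.0, 0.0]])
y = np.array([1.0, -1.0])
alpha = smo_naive(X, y, C=10.0)
print(alpha, alpha @ y)
```

Sweeping all pairs costs far more per pass than a heuristic pair choice, but it keeps the sketch short and makes the connection to the update formulas easy to follow.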

1. From the converged $\alpha_{i_{new}}$, compute $w$ using $w=\sum_{i=1}^{m}\alpha_{i}y_{i}x_{i}$.

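2. Only support vectors satisfy $1-y_{i}(w^{T}x_{i}+b)=0$, so every sample with $\alpha_{i_{new}}>0$ must be a support vector. Otherwise we would have $\alpha_{i_{new}}>0$ together with $1-y_{i}(w^{T}x_{i}+b)<0$, giving $\alpha_{i_{new}}(1-y_{i}(w^{T}x_{i}+b))<0$, which contradicts the KKT (complementary slackness) condition $\alpha_{i_{new}}(1-y_{i}(w^{T}x_{i}+b))=0$. (With the box constraint $0\le\alpha_{i}\le C$, samples with $0<\alpha_{i}<C$ lie exactly on the margin.)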

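3. In practice, $b$ is computed in a more robust way, by averaging over all support vectors rather than relying on a single one:

$$b=\frac{1}{|S|}\sum_{s\in S}\left(\frac{1}{y_{s}}-w^{T}x_{s}\right)$$

where $S$ is the set of support vectors. Since $y_{s}=\pm 1$, $1/y_{s}=y_{s}$.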

4. The final separating hyperplane is:

$$w^{T}x+b=0$$

and the classification decision function is $f(x)=\mathrm{sign}(w^{T}x+b)$, where:

$$\mathrm{sign}(x)=\left\{\begin{matrix}-1, & \text{if } x<0 \\ 0, & \text{if } x=0 \\ 1, & \text{if } x>0\end{matrix}\right.$$
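The four post-processing items above can be sketched in a few lines. This example assumes a linear kernel and takes $S$ to be all samples with $\alpha_{i}$ above a small threshold `tol` (an assumption of this sketch; with a soft margin one would typically restrict $S$ to $0<\alpha_{i}<C$). The names `recover_w_b` and `predict` are hypothetical.

```python
import numpy as np

def recover_w_b(alpha, X, y, tol=1e-8):
    """w = sum_i alpha_i y_i x_i; b = mean over support vectors of (y_s - w^T x_s)."""
    w = (alpha * y) @ X
    S = alpha > tol                       # boolean mask of support vectors (assumption)
    b = np.mean(y[S] - X[S] @ w)          # 1 / y_s equals y_s because y_s is +1 or -1
    return w, b

def predict(X_new, w, b):
    """f(x) = sign(w^T x + b); np.sign returns -1, 0 or 1 like the definition above."""
    return np.sign(X_new @ w + b)

# Toy problem, made up for illustration, with its known dual solution.
X = np.array([[1.0, 0.0], [-1.0, 0.0]])
y = np.array([1.0, -1.0])
alpha = np.array([0.5, 0.5])

w, b = recover_w_b(alpha, X, y)
print(w, b)
print(predict(np.array([[2.0, 3.0], [-0.5, 1.0], [0.0, 5.0]]), w, b))
```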
