SVM Optimization -- SMO

This is the final post in the SVM series; it explains how the SVM is actually solved, namely the SMO (Sequential Minimal Optimization) algorithm.

The problem to optimize

To recap, the SVM optimization problem can ultimately be transformed into the following form:

\min_{a}L = \min_{a}\frac{1}{2}\left \| \sum_{i}a_i y_i X_i \right \|^2 - \sum_{i}a_i \\ s.t. \;\; 0 \leq a_i \leq C, \;\; \sum_{i}a_i y_i = 0

Clearly, this problem is still hard to solve directly. The SMO algorithm proposed by Platt solves it efficiently: instead of treating it as a single quadratic program over all N parameters, it decomposes the problem into a sequence of small quadratic programs and iterates, each time selecting one pair of variables \left ( a_p, a_q \right ) and holding all the others fixed. Because of the equality constraint, whenever a_p changes, a_q must change along with it to keep the constraint satisfied.
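To make this structure concrete, here is a minimal sketch of the outer loop in Python; `select_pair` and `optimize_pair` are hypothetical placeholders for the steps derived in the rest of this post, not names from any library:

```python
# Sketch of the SMO outer loop: repeatedly pick a pair of multipliers and
# solve the two-variable subproblem analytically while all others stay fixed.
def smo_outer_loop(select_pair, optimize_pair, max_iter=1000):
    for _ in range(max_iter):
        pair = select_pair()       # choose (p, q), e.g. by KKT violation
        if pair is None:           # no violator left: KKT holds, we are done
            break
        p, q = pair
        optimize_pair(p, q)        # analytic two-variable update, derived below
```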

SMO

As stated above, we pick one pair of variables \left ( a_p, a_q \right ) and fix all the others; since the updated \left ( a_p, a_q \right ) must still satisfy the equality constraint, we have

a_p y_p + a_q y_q = -\sum_{i \notin \{p,q\}}a_i y_i = \beta = a_p^{old} y_p + a_q^{old} y_q \;\; \Rightarrow \;\; a_p = y_p \beta - a_q y_p y_q \;\;\;\; (1)

Expanding L:

L = \frac{1}{2}\left \| \sum_{i}a_i y_i X_i \right \|^2 - \sum_{i}a_i \\ = \frac{1}{2}\left \| a_p y_p X_p + a_q y_q X_q + \sum_{i \notin \{p,q\}}a_i y_i X_i \right \|^2 - \left ( a_p + a_q + \sum_{i \notin \{p,q\}}a_i \right )

Let \sum_{i \notin \{p,q\}}a_i y_i X_i = V and K_{i,j}={X_i}^T{X_j}; continuing the simplification:

L= \frac{1}{2}{a_p}^2 {y_p}^2 {X_p}^T{X_p} + a_p a_q y_p y_q {X_p}^T {X_q} + (a_p y_p {X_p}^T + a_q y_q {X_q}^T)V + \frac{1}{2}{a_q}^2{y_q}^2{X_q}^T{X_q} + \frac{1}{2}V^TV - \left ( a_p + a_q + \sum_{i \notin \{p,q\}}a_i \right ) \\ = \frac{1}{2}{a_p}^2 K_{p,p} + a_p a_q y_p y_q K_{p,q} + (a_p y_p {X_p}^T + a_q y_q {X_q}^T)V + \frac{1}{2}{a_q}^2K_{q,q} - (a_p + a_q) + C_1

where C_1 collects all terms that do not depend on a_p or a_q (it is unrelated to the penalty parameter C).

Substituting equation (1) and simplifying further:

L= \frac{1}{2}{(a_q y_q - \beta)}^2 K_{p,p} + (\beta - a_q y_q) a_q y_q K_{p,q} + (\beta {X_p}^T - a_q y_q {X_p}^T + a_q y_q {X_q}^T)V + \frac{1}{2}{a_q}^2K_{q,q} - (\beta y_p - a_q y_q y_p + a_q) + C_1 \\ = \frac{K_{p,p} + K_{q,q} - 2K_{p,q}}{2}a_q^2 - a_q y_q \left ( \beta (K_{p,p} - K_{p,q}) + (V^T{X_p} - V^T{X_q}) + y_q - y_p \right ) + C_1

Since f(X_m) = W^T X_m + b = \sum_{i}{a_i y_i X_{i}^T}X_m + b = a_p^{old} y_p K_{p,m} + a_q^{old} y_q K_{q,m} + V^T X_m + b (evaluated at the old multipliers), we get V^T X_p - V^T X_q = f(X_p) - f(X_q) - a_p^{old} y_p (K_{p,p} - K_{p,q}) - a_q^{old} y_q (K_{p,q} - K_{q,q}); substituting into L gives

L= \frac{K_{p,p} + K_{q,q} - 2K_{p,q}}{2}a_q^2 - a_q y_q (a_q^{old} y_q (K_{p,p}+K_{q,q} - 2K_{p,q}) + f(X_p) - f(X_q)+ y_q - y_p) + C_1

Let \eta = K_{p,p} + K_{q,q} - 2K_{p,q} \;\;,\;\; E(m) = f(X_m) - y_m; then

****************************************************************************************************************

L= \frac{\eta }{2}a_q^2 - a_q y_q (\eta a_q^{old} y_q+ E(p) - E(q)) + C_1

Since the coefficient of the {a_q}^2 term is \eta/2 = \frac{K_{p,p} + K_{q,q} - 2K_{p,q}}{2} = \frac{1}{2}\left \| X_p - X_q \right \|^2 \geq 0, L is a quadratic function of a_q that opens upward.

Meanwhile, a_q must still satisfy the constraints \left\{ \begin{aligned} 0 \leq a_{p} \leq C, \;\;\; 0 \leq a_{q} \leq C \\ a_p y_p + a_q y_q = \beta = a_p^{old} y_p + a_q^{old} y_q \end{aligned} \right. , i.e.

\left\{ \begin{aligned} L=max(a_p^{old}+a_q^{old}-C,0) \leq a_q \leq min(a_p^{old}+a_q^{old},C)=H &,& y_p = y_q \\ L= max(a_q^{old}-a_p^{old},0) \leq a_q \leq min(a_q^{old}-a_p^{old}+C, C)=H &,& y_p \neq y_q \end{aligned} \right.

Therefore the minimum is attained either where the gradient vanishes or on the boundary of this interval.

****************************************************************************************************************
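Before solving for the minimum, note that the case analysis for [L, H] above maps directly to code. A minimal sketch (function and argument names are my own, not from the original post):

```python
def compute_bounds(a_p_old, a_q_old, y_p, y_q, C):
    """Feasible interval [L, H] for a_q, from the box and equality constraints."""
    if y_p == y_q:                  # a_p + a_q is conserved
        L = max(a_p_old + a_q_old - C, 0.0)
        H = min(a_p_old + a_q_old, C)
    else:                           # a_p - a_q is conserved
        L = max(a_q_old - a_p_old, 0.0)
        H = min(a_q_old - a_p_old + C, C)
    return L, H
```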

Setting the gradient to zero gives:

\frac{\partial L}{\partial a_q} = \eta a_q - y_q(\eta y_q a_q^{old} + E(p) - E(q)) = 0

So the final, clipped solution is: a_q = min(max(L, \; a_q^{old} + y_q \frac{E(p)-E(q)}{\eta }), \; H)

Having found a_q, we recover a_p from equation (1); this completes one update of the pair (a_p, a_q).
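Putting the pieces together, here is a hedged sketch of the full pair update, reusing compute_bounds from above; alpha, y, E are assumed to be 1-D NumPy arrays and K the precomputed kernel matrix (all names are mine):

```python
def update_pair(alpha, y, K, E, p, q, C, eps=1e-12):
    """One analytic SMO step on (a_p, a_q); returns True if alpha changed."""
    eta = K[p, p] + K[q, q] - 2.0 * K[p, q]
    if eta <= eps:                 # degenerate pair: L not strictly convex in a_q
        return False
    L, H = compute_bounds(alpha[p], alpha[q], y[p], y[q], C)
    if L >= H:                     # feasible interval is empty or a single point
        return False
    a_q_new = alpha[q] + y[q] * (E[p] - E[q]) / eta   # gradient-zero point
    a_q_new = min(max(a_q_new, L), H)                 # clip to [L, H]
    # Recover a_p from the equality constraint, equation (1):
    a_p_new = alpha[p] + y[p] * y[q] * (alpha[q] - a_q_new)
    alpha[p], alpha[q] = a_p_new, a_q_new
    return True
```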

Updating W and b

Each update of a changes W and b as well, so they too must be updated:

W = \sum_{i}{a_i y_i X_i} = W^{old} + (a_p - a_p^{old})y_p X_p + (a_q - a_q^{old})y_q X_q
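For a linear kernel this is just a two-term correction; a small sketch assuming NumPy arrays (names are mine):

```python
def update_w(w, X, y, p, q, a_p_old, a_q_old, a_p_new, a_q_new):
    """Incremental update: W = W_old + (delta a_p) y_p X_p + (delta a_q) y_q X_q."""
    return (w + (a_p_new - a_p_old) * y[p] * X[p]
              + (a_q_new - a_q_old) * y[q] * X[q])
```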

Updating b is slightly more involved. By the KKT conditions, y_i (W^T X_i + b) = 1 whenever 0 < a_i < C, so for any such index i, b = y_i - V^T X_i - a_p y_p K_{p,i} - a_q y_q K_{q,i}.

Moreover, - V^T X_i = -f(X_i) + b^{old} + a_p^{old} y_p K_{p,i} + a_q^{old} y_q K_{q,i},

so b = b^{old} - E(i) - (a_p - a_p^{old}) y_p K_{p,i} - (a_q - a_q^{old}) y_q K_{q,i}.

In particular, when 0 < a_q < C, b_q = b^{old} - E(q) - (a_p - a_p^{old}) y_p K_{p,q} - (a_q - a_q^{old}) y_q K_{q,q} (and b_p is defined analogously when 0 < a_p < C).

When a_q = a_q^{old} + y_q \frac{E(p)-E(q)}{\eta } (the unclipped case), we have 0 < a_q < C and b_p - b_q = 0.

When a_q = H < C: \left\{ \begin{aligned} a_q = a_p^{old} + a_q^{old} = a_p + a_q \Rightarrow a_p=0 &,& y_p = y_q \\ a_q = a_q^{old} - a_p^{old} + C = a_q - a_p + C \Rightarrow a_p=C &,& y_p \neq y_q \end{aligned}\right.

When a_q = L > 0: \left\{ \begin{aligned} a_q = a_p^{old} + a_q^{old} - C = a_p + a_q -C \Rightarrow a_p=C &,& y_p = y_q \\ a_q = a_q^{old} - a_p^{old} = a_q - a_p \Rightarrow a_p=0 &,& y_p \neq y_q \end{aligned}\right.

In summary, \left\{ \begin{aligned} b = b_p = b_q = (b_p+b_q)/2 &,& a_q = a_q^{old} + y_q \frac{E(p)-E(q)}{\eta } \\ b = (b_p+b_q)/2 &,& otherwise \end{aligned}\right.

That is, in every case we can take b = (b_p+b_q)/2.
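In code, the rule the derivation settles on is simply to average the two candidate offsets; a sketch, where d_ap and d_aq denote a_p - a_p^{old} and a_q - a_q^{old} (names are mine):

```python
def update_b(b_old, E, K, y, p, q, d_ap, d_aq):
    """Update b by averaging the two candidates b_p and b_q."""
    b_p = b_old - E[p] - d_ap * y[p] * K[p, p] - d_aq * y[q] * K[q, p]
    b_q = b_old - E[q] - d_ap * y[p] * K[p, q] - d_aq * y[q] * K[q, q]
    # In the unclipped case b_p == b_q, so the average is exact;
    # otherwise it is the compromise the derivation settles on.
    return 0.5 * (b_p + b_q)
```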

Choosing (a_p, a_q)

The basic idea behind choosing the optimization variables: each iteration determines a provisional classifier, and clearly we do not care about the tuples lying outside its margin hyperplanes, only about the points on or between them (i.e., within the margin). So in the next iteration we should pick two of these tuples of interest and optimize their corresponding Lagrange multipliers; the optimization itself was described in detail above.

Although this roughly narrows down the candidates, exactly which two a's to pick is still a question. Based on the relationship between a Lagrange multiplier a_i, its training tuple X_i, and the classifier:

\left\{ \begin{aligned} y_i (W^T X_i + b) \geq 1 &,& a_i = 0 \\ y_i (W^T X_i + b) = 1 &,& 0 < a_i < C \\ y_i (W^T X_i + b) \leq 1 &,& a_i = C \end{aligned}\right.

Selection rules:

  1. First, find a tuple X_p that violates the conditions above under the current classifier. Specifically:
    1. First scan the tuples with 0 < a_i < C and pick the X that violates the conditions most severely; this determines the first variable a_p.
    2. If no violator is found there, scan the tuples with a_i = C;
    3. and if there is still none, scan those with a_i = 0. If every tuple satisfies the conditions, the SMO iteration terminates.
  2. Then pick the tuple X_q that maximizes |E(p) - E(q)|: the update of a_q is proportional to |E(p) - E(q)|, so the larger this gap, the bigger the step (see the sketch below).
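A hedged sketch of both rules, assuming a float NumPy error cache E and the margins y_i f(X_i) computed by the caller (all names are mine):

```python
import numpy as np

def violates_kkt(a_i, margin_i, C, tol=1e-3):
    """Check the three KKT cases above for one multiplier; margin_i = y_i * f(X_i)."""
    if a_i < tol:                       # a_i = 0     -> margin should be >= 1
        return margin_i < 1.0 - tol
    if a_i > C - tol:                   # a_i = C     -> margin should be <= 1
        return margin_i > 1.0 + tol
    return abs(margin_i - 1.0) > tol    # 0 < a_i < C -> margin should be == 1

def select_second(E, p):
    """Pick q maximizing |E(p) - E(q)| (rule 2 above)."""
    gaps = np.abs(E[p] - E)             # |E(p) - E(q)| for every candidate q
    gaps[p] = -np.inf                   # never pair p with itself
    return int(np.argmax(gaps))
```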

 
