Machine Learning Models in LaTeX: Support Vector Machines

The training data set is
\begin{align*} \\& T = \left\{ \left( x_{1}, y_{1} \right), \left( x_{2}, y_{2} \right), \cdots, \left( x_{N}, y_{N} \right) \right\} \end{align*}
where x_{i} \in \mathcal{X} = R^{n}, y_{i} \in \mathcal{Y} = \left\{ +1, -1 \right\}, i = 1, 2, \cdots, N. Here x_{i} is the feature vector (instance) of the i-th sample and y_{i} is the class label of x_{i}: when y_{i} = +1, x_{i} is called a positive example; when y_{i} = -1, a negative example. The pair \left( x_{i}, y_{i} \right) is called a sample point.
Linearly separable support vector machine (hard-margin SVM): given a linearly separable training data set, maximizing the margin, or equivalently solving the corresponding convex quadratic programming problem, yields the separating hyperplane
\begin{align*} \\& w^{*} \cdot x + b^{*} = 0 \end{align*}
together with the corresponding classification decision function
\begin{align*} \\& f \left( x \right) = sign \left( w^{*} \cdot x + b^{*} \right) \end{align*}
The resulting classifier is called the linearly separable support vector machine.
The functional margin of the hyperplane \left( w, b \right) with respect to a sample point \left( x_{i}, y_{i} \right) is
\begin{align*} \\& \hat \gamma_{i} = y_{i} \left( w \cdot x_{i} + b \right) \end{align*}
The functional margin of the hyperplane \left( w, b \right) with respect to the training set T is
\begin{align*} \\& \hat \gamma = \min_{i = 1, 2, \cdots, N} \hat \gamma_{i} \end{align*}
that is, the minimum of the functional margins of the hyperplane \left( w, b \right) over all sample points \left( x_{i}, y_{i} \right) in T.
The geometric margin of the hyperplane \left( w, b \right) with respect to a sample point \left( x_{i}, y_{i} \right) is
\begin{align*} \\& \gamma_{i} = y_{i} \left( \dfrac{w}{\| w \|} \cdot x_{i} + \dfrac{b}{\| w \|} \right) \end{align*}
The geometric margin of the hyperplane \left( w, b \right) with respect to the training set T is
\begin{align*} \\& \gamma = \min_{i = 1, 2, \cdots, N} \gamma_{i} \end{align*}
The functional and geometric margins are related by
\begin{align*} \\& \gamma_{i} = \dfrac{\hat \gamma_{i}}{\| w \|} \\& \gamma = \dfrac{\hat \gamma}{\| w \|} \end{align*}
Finding the maximum-margin separating hyperplane is equivalent to solving
\begin{align*} \\& \max_{w,b} \quad \gamma \\ & s.t. \quad y_{i} \left( \dfrac{w}{\| w \|} \cdot x_{i} + \dfrac{b}{\| w \|} \right) \geq \gamma, \quad i=1,2, \cdots, N \end{align*}
or, equivalently,
\begin{align*} \\ & \max_{w,b} \quad \dfrac{\hat \gamma}{\| w \|} \\ & s.t. \quad y_{i} \left( w \cdot x_{i} + b \right) \geq \hat \gamma, \quad i=1,2, \cdots, N \end{align*}
or, equivalently, since the scale of the functional margin can be fixed to \hat \gamma = 1 without changing the solution, and maximizing 1 / \| w \| is the same as minimizing \dfrac{1}{2} \| w \|^{2}:
\begin{align*} \\ & \min_{w,b} \quad \dfrac{1}{2} \| w \|^{2} \\ & s.t. \quad y_{i} \left( w \cdot x_{i} + b \right) -1 \geq 0, \quad i=1,2, \cdots, N \end{align*}
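To make this primal form concrete, here is a minimal sketch that solves the hard-margin QP directly with cvxpy; the library choice and the three toy points are illustrative assumptions, not part of the original derivation.

import cvxpy as cp
import numpy as np

# toy linearly separable data (assumed for illustration only)
X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]])
y = np.array([1.0, 1.0, -1.0])

w = cp.Variable(2)
b = cp.Variable()
# min (1/2)||w||^2  s.t.  y_i (w . x_i + b) - 1 >= 0
problem = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(w)),
                     [cp.multiply(y, X @ w + b) >= 1])
problem.solve()
print(w.value, b.value)  # the maximum-margin hyperplane w* . x + b* = 0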
Learning algorithm for the linearly separable SVM (maximum margin method):
Input: linearly separable training data set T = \left\{ \left( x_{1}, y_{1} \right), \left( x_{2}, y_{2} \right), \cdots, \left( x_{N}, y_{N} \right) \right\}, where x_{i} \in \mathcal{X} = R^{n}, y_{i} \in \mathcal{Y} = \left\{ +1, -1 \right\}, i = 1, 2, \cdots, N
Output: maximum-margin separating hyperplane and classification decision function
1. Construct and solve the constrained optimization problem
\begin{align*} \\ & \min_{w,b} \quad \dfrac{1}{2} \| w \|^{2} \\ & s.t. \quad y_{i} \left( w \cdot x_{i} + b \right) -1 \geq 0, \quad i=1,2, \cdots, N \end{align*}
to obtain the optimal solution w^{*}, b^{*}
2. Obtain the separating hyperplane
\begin{align*} \\ & w^{*} \cdot x + b^{*} = 0 \end{align*}
and the classification decision function
\begin{align*} \\& f \left( x \right) = sign \left( w^{*} \cdot x + b^{*} \right) \end{align*}
(Hard-margin) support vectors: among the sample points of the training set, the instances closest to the separating hyperplane, i.e., the sample points for which the constraint holds with equality:
\begin{align*} \\ & y_{i} \left( w \cdot x_{i} + b \right) -1 = 0 \end{align*}
For the positive examples with y_{i} = +1, the support vectors lie on the hyperplane
\begin{align*} \\ & H_{1}:w \cdot x + b = 1 \end{align*}
For the negative examples with y_{i} = -1, the support vectors lie on the hyperplane
\begin{align*} \\ & H_{2}:w \cdot x + b = -1 \end{align*}
H_{1} and H_{2} are called the margin boundaries.
The distance between H_{1} and H_{2} is called the margin, and |H_{1}H_{2}| = \dfrac{1}{\| w \|} + \dfrac{1}{\| w \|} = \dfrac{2}{\| w \|}.
Solving the optimization problem:
1. Introduce Lagrange multipliers \alpha_{i} \geq 0, i = 1, 2, \cdots, N and construct the Lagrangian
\begin{align*} \\ & L \left( w, b, \alpha \right) = \dfrac{1}{2} \| w \|^{2} + \sum_{i=1}^{N} \alpha_{i} \left[- y_{i} \left( w \cdot x_{i} + b \right) + 1 \right] \\ & = \dfrac{1}{2} \| w \|^{2} - \sum_{i=1}^{N} \alpha_{i} y_{i} \left( w \cdot x_{i} + b \right) + \sum_{i=1}^{N} \alpha_{i} \end{align*}
where \alpha = \left( \alpha_{1}, \alpha_{2}, \cdots, \alpha_{N} \right)^{T} is the vector of Lagrange multipliers.
2. Compute \min_{w,b} L \left( w, b, \alpha \right): setting the gradients with respect to w and b to zero,
\begin{align*} \\ & \nabla _{w} L \left( w, b, \alpha \right) = w - \sum_{i=1}^{N} \alpha_{i} y_{i} x_{i} = 0 \\ & \nabla _{b} L \left( w, b, \alpha \right) = -\sum_{i=1}^{N} \alpha_{i} y_{i} = 0 \end{align*}
which gives
\begin{align*} \\ & w = \sum_{i=1}^{N} \alpha_{i} y_{i} x_{i} \\ & \sum_{i=1}^{N} \alpha_{i} y_{i} = 0 \end{align*}
Substituting these into the Lagrangian gives
\begin{align*} \\ & L \left( w, b, \alpha \right) = \dfrac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{i} \alpha_{j} y_{i} y_{j} \left( x_{i} \cdot x_{j} \right) - \sum_{i=1}^{N} \alpha_{i} y_{i} \left[ \left( \sum_{j=1}^{N} \alpha_{j} y_{j} x_{j} \right) \cdot x_{i} + b \right] + \sum_{i=1}^{N} \alpha_{i} \\ & = - \dfrac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{i} \alpha_{j} y_{i} y_{j} \left( x_{i} \cdot x_{j} \right) - \sum_{i=1}^{N} \alpha_{i} y_{i} b + \sum_{i=1}^{N} \alpha_{i} \\ & = - \dfrac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{i} \alpha_{j} y_{i} y_{j} \left( x_{i} \cdot x_{j} \right) + \sum_{i=1}^{N} \alpha_{i} \end{align*}
and hence
\begin{align*} \\ & \min_{w,b}L \left( w, b, \alpha \right) = - \dfrac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{i} \alpha_{j} y_{i} y_{j} \left( x_{i} \cdot x_{j} \right) + \sum_{i=1}^{N} \alpha_{i} \end{align*}
3. Compute \max_{\alpha} \min_{w,b} L \left( w, b, \alpha \right):
\begin{align*} \\ & \max_{\alpha} - \dfrac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{i} \alpha_{j} y_{i} y_{j} \left( x_{i} \cdot x_{j} \right) + \sum_{i=1}^{N} \alpha_{i} \\ & s.t. \sum_{i=1}^{N} \alpha_{i} y_{i} = 0 \\ & \alpha_{i} \geq 0, \quad i=1,2, \cdots, N \end{align*}
or, equivalently, the dual problem
\begin{align*} \\ & \min_{\alpha} \dfrac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{i} \alpha_{j} y_{i} y_{j} \left( x_{i} \cdot x_{j} \right) - \sum_{i=1}^{N} \alpha_{i} \\ & s.t. \sum_{i=1}^{N} \alpha_{i} y_{i} = 0 \\ & \alpha_{i} \geq 0, \quad i=1,2, \cdots, N \end{align*}
Learning algorithm for the linearly separable SVM (hard-margin SVM):
Input: linearly separable training data set T = \left\{ \left( x_{1}, y_{1} \right), \left( x_{2}, y_{2} \right), \cdots, \left( x_{N}, y_{N} \right) \right\}, where x_{i} \in \mathcal{X} = R^{n}, y_{i} \in \mathcal{Y} = \left\{ +1, -1 \right\}, i = 1, 2, \cdots, N
Output: maximum-margin separating hyperplane and classification decision function
1. Construct and solve the constrained optimization problem
\begin{align*} \\ & \min_{\alpha} \dfrac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{i} \alpha_{j} y_{i} y_{j} \left( x_{i} \cdot x_{j} \right) - \sum_{i=1}^{N} \alpha_{i} \\ & s.t. \sum_{i=1}^{N} \alpha_{i} y_{i} = 0 \\ & \alpha_{i} \geq 0, \quad i=1,2, \cdots, N \end{align*}
to obtain the optimal solution \alpha^{*} = \left( \alpha_{1}^{*}, \alpha_{2}^{*}, \cdots, \alpha_{N}^{*} \right)^{T}
2. Compute
\begin{align*} \\ & w^{*} = \sum_{i=1}^{N} \alpha_{i}^{*} y_{i} x_{i} \end{align*}
and choose a positive component \alpha_{j}^{*} > 0 of \alpha^{*}, then compute
\begin{align*} \\ & b^{*} = y_{j} - \sum_{i=1}^{N} \alpha_{i}^{*} y_{i} \left( x_{i} \cdot x_{j} \right) \end{align*}
3. Obtain the separating hyperplane
\begin{align*} \\ & w^{*} \cdot x + b^{*} = 0 \end{align*}
and the classification decision function
\begin{align*} \\& f \left( x \right) = sign \left( w^{*} \cdot x + b^{*} \right) \end{align*}
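The dual route can be checked numerically as well. A minimal sketch, assuming cvxpy and reusing the toy data above, that solves the dual and recovers w^{*}, b^{*} exactly as in steps 2-3:

import cvxpy as cp
import numpy as np

X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]])
y = np.array([1.0, 1.0, -1.0])
N = len(y)

Z = y[:, None] * X               # rows are y_i x_i
alpha = cp.Variable(N)
# (1/2) sum_ij a_i a_j y_i y_j (x_i . x_j) = (1/2) || sum_i a_i y_i x_i ||^2
objective = cp.Minimize(0.5 * cp.sum_squares(Z.T @ alpha) - cp.sum(alpha))
cp.Problem(objective, [alpha >= 0, y @ alpha == 0]).solve()

a = alpha.value
w = Z.T @ a                      # w* = sum_i alpha_i* y_i x_i
j = int(np.argmax(a))            # index of a positive component alpha_j* > 0
b = y[j] - (a * y) @ (X @ X[j])  # b* = y_j - sum_i alpha_i* y_i (x_i . x_j)
print(w, b)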
Linear support vector machine (soft-margin SVM): given a training data set that is not linearly separable, solving the convex quadratic programming problem
\begin{align*} \\ & \min_{w,b,\xi} \quad \dfrac{1}{2} \| w \|^{2} + C \sum_{i=1}^{N} \xi_{i} \\ & s.t. \quad y_{i} \left( w \cdot x_{i} + b \right) \geq 1 - \xi_{i} \\ & \xi_{i} \geq 0, \quad i=1,2, \cdots, N \end{align*}
yields the separating hyperplane
\begin{align*} \\& w^{*} \cdot x + b^{*} = 0 \end{align*}
together with the corresponding classification decision function
\begin{align*} \\& f \left( x \right) = sign \left( w^{*} \cdot x + b^{*} \right) \end{align*}
The resulting classifier is called the linear support vector machine.
Solving the optimization problem:
1. Introduce Lagrange multipliers \alpha_{i} \geq 0, \mu_{i} \geq 0, i = 1, 2, \cdots, N and construct the Lagrangian
\begin{align*} \\ & L \left( w, b, \xi, \alpha, \mu \right) = \dfrac{1}{2} \| w \|^{2} + C \sum_{i=1}^{N} \xi_{i} + \sum_{i=1}^{N} \alpha_{i} \left[- y_{i} \left( w \cdot x_{i} + b \right) + 1 - \xi_{i} \right] + \sum_{i=1}^{N} \mu_{i} \left( -\xi_{i} \right) \\ & = \dfrac{1}{2} \| w \|^{2} + C \sum_{i=1}^{N} \xi_{i} - \sum_{i=1}^{N} \alpha_{i} \left[ y_{i} \left( w \cdot x_{i} + b \right) -1 + \xi_{i} \right] - \sum_{i=1}^{N} \mu_{i} \xi_{i} \end{align*}
where \alpha = \left( \alpha_{1}, \alpha_{2}, \cdots, \alpha_{N} \right)^{T} and \mu = \left( \mu_{1}, \mu_{2}, \cdots, \mu_{N} \right)^{T} are the vectors of Lagrange multipliers.
2. Compute \min_{w,b,\xi} L \left( w, b, \xi, \alpha, \mu \right): setting the gradients with respect to w, b, and \xi_{i} to zero,
\begin{align*} \\ & \nabla_{w} L \left( w, b, \xi, \alpha, \mu \right) = w - \sum_{i=1}^{N} \alpha_{i} y_{i} x_{i} = 0 \\ & \nabla_{b} L \left( w, b, \xi, \alpha, \mu \right) = -\sum_{i=1}^{N} \alpha_{i} y_{i} = 0 \\ & \nabla_{\xi_{i}} L \left( w, b, \xi, \alpha, \mu \right) = C - \alpha_{i} - \mu_{i} = 0 \end{align*}
which gives
\begin{align*} \\ & w = \sum_{i=1}^{N} \alpha_{i} y_{i} x_{i} \\ & \sum_{i=1}^{N} \alpha_{i} y_{i} = 0 \\ & C - \alpha_{i} - \mu_{i} = 0\end{align*}
Substituting these into the Lagrangian gives
\begin{align*} \\ & L \left( w, b, \xi, \alpha, \mu \right) = \dfrac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{i} \alpha_{j} y_{i} y_{j} \left( x_{i} \cdot x_{j} \right) + C \sum_{i=1}^{N} \xi_{i} - \sum_{i=1}^{N} \alpha_{i} y_{i} \left[ \left( \sum_{j=1}^{N} \alpha_{j} y_{j} x_{j} \right) \cdot x_{i} + b \right] \\ & \quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad + \sum_{i=1}^{N} \alpha_{i} - \sum_{i=1}^{N} \alpha_{i} \xi_{i} - \sum_{i}^{N} \mu_{i} \xi_{i} \\ & = - \dfrac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{i} \alpha_{j} y_{i} y_{j} \left( x_{i} \cdot x_{j} \right) - \sum_{i=1}^{N} \alpha_{i} y_{i} b + \sum_{i=1}^{N} \alpha_{i} + \sum_{i=1}^{N} \xi_{i} \left( C - \alpha_{i} - \mu_{i} \right) \\ & = - \dfrac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{i} \alpha_{j} y_{i} y_{j} \left( x_{i} \cdot x_{j} \right) + \sum_{i=1}^{N} \alpha_{i} \end{align*}
and hence
\begin{align*} \\ & \min_{w,b,\xi}L \left( w, b, \xi, \alpha, \mu \right) = - \dfrac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{i} \alpha_{j} y_{i} y_{j} \left( x_{i} \cdot x_{j} \right) + \sum_{i=1}^{N} \alpha_{i} \end{align*}
3. Compute \max_{\alpha} \min_{w,b,\xi} L \left( w, b, \xi, \alpha, \mu \right):
\begin{align*} \\ & \max_{\alpha} - \dfrac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{i} \alpha_{j} y_{i} y_{j} \left( x_{i} \cdot x_{j} \right) + \sum_{i=1}^{N} \alpha_{i} \\ & s.t. \sum_{i=1}^{N} \alpha_{i} y_{i} = 0 \\ & C - \alpha_{i} - \mu_{i} = 0 \\ & \alpha_{i} \geq 0 \\ & \mu_{i} \geq 0, \quad i=1,2, \cdots, N \end{align*}
Eliminating \mu_{i} via C - \alpha_{i} - \mu_{i} = 0 together with \alpha_{i} \geq 0, \mu_{i} \geq 0, this is equivalent to
\begin{align*} \\ & \min_{\alpha} \dfrac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{i} \alpha_{j} y_{i} y_{j} \left( x_{i} \cdot x_{j} \right) - \sum_{i=1}^{N} \alpha_{i} \\ & s.t. \sum_{i=1}^{N} \alpha_{i} y_{i} = 0 \\ & 0 \leq \alpha_{i} \leq C , \quad i=1,2, \cdots, N \end{align*}
Learning algorithm for the linear SVM (soft-margin SVM):
Input: training data set T = \left\{ \left( x_{1}, y_{1} \right), \left( x_{2}, y_{2} \right), \cdots, \left( x_{N}, y_{N} \right) \right\}, where x_{i} \in \mathcal{X} = R^{n}, y_{i} \in \mathcal{Y} = \left\{ +1, -1 \right\}, i = 1, 2, \cdots, N
Output: separating hyperplane and classification decision function
1. Choose a penalty parameter C > 0, then construct and solve the constrained optimization problem
\begin{align*} \\ & \min_{\alpha} \dfrac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{i} \alpha_{j} y_{i} y_{j} \left( x_{i} \cdot x_{j} \right) - \sum_{i=1}^{N} \alpha_{i} \\ & s.t. \sum_{i=1}^{N} \alpha_{i} y_{i} = 0 \\ & 0 \leq \alpha_{i} \leq C , \quad i=1,2, \cdots, N \end{align*}
to obtain the optimal solution \alpha^{*} = \left( \alpha_{1}^{*}, \alpha_{2}^{*}, \cdots, \alpha_{N}^{*} \right)^{T}
2. Compute
\begin{align*} \\ & w^{*} = \sum_{i=1}^{N} \alpha_{i}^{*} y_{i} x_{i} \end{align*}
and choose a component \alpha_{j}^{*} of \alpha^{*} with 0 < \alpha_{j}^{*} < C, then compute
\begin{align*} \\ & b^{*} = y_{j} - \sum_{i=1}^{N} \alpha_{i}^{*} y_{i} \left( x_{i} \cdot x_{j} \right) \end{align*}
3. Obtain the separating hyperplane
\begin{align*} \\ & w^{*} \cdot x + b^{*} = 0 \end{align*}
and the classification decision function
\begin{align*} \\& f \left( x \right) = sign \left( w^{*} \cdot x + b^{*} \right) \end{align*}
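Relative to the hard-margin dual sketch above, only the feasible region of \alpha changes. A minimal sketch (the helper name, C, and the tolerance eps are assumptions):

import cvxpy as cp
import numpy as np

def soft_margin_dual(X, y, C=1.0, eps=1e-6):
    # same dual objective as before; the constraint becomes 0 <= alpha_i <= C
    N = len(y)
    Z = y[:, None] * X
    alpha = cp.Variable(N)
    objective = cp.Minimize(0.5 * cp.sum_squares(Z.T @ alpha) - cp.sum(alpha))
    cp.Problem(objective, [alpha >= 0, alpha <= C, y @ alpha == 0]).solve()
    a = alpha.value
    w = Z.T @ a
    # first component with 0 < alpha_j* < C (assumes one exists)
    j = int(np.argmax((a > eps) & (a < C - eps)))
    b = y[j] - (a * y) @ (X @ X[j])
    return w, b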
(Soft-margin) support vectors: in the linearly non-separable case, the instances x_{i} of the sample points \left( x_{i}, y_{i} \right) corresponding to \alpha_{i}^{*} > 0 in the solution \alpha^{*} = \left( \alpha_{1}^{*}, \alpha_{2}^{*}, \cdots, \alpha_{N}^{*} \right)^{T} of the optimization problem.
For such an instance x_{i} the constraint is active, y_{i} \left( w \cdot x_{i} + b \right) = 1 - \xi_{i}, so its geometric margin is
\begin{align*} \\& \gamma_{i} = \dfrac{y_{i} \left( w \cdot x_{i} + b \right)}{ \| w \|} = \dfrac{1 - \xi_{i}}{\| w \|} \end{align*}
Since half of the margin is \dfrac{1}{2} | H_{1}H_{2} | = \dfrac{1}{\| w \|}, the distance from the instance x_{i} to the margin boundary is
\begin{align*} \\& \left| \gamma_{i} - \dfrac{1}{\| w \|} \right| = \left| \dfrac{1 - \xi_{i}}{\| w \|} - \dfrac{1}{\| w \|} \right| = \dfrac{\xi_{i}}{\| w \|}\end{align*}
\begin{align*} \xi_{i} \geq 0 \Leftrightarrow \left\{ \begin{aligned} \ & \xi_{i}=0, \ x_{i} \text{ lies on the margin boundary;} \\ & 0 < \xi_{i} < 1, \ x_{i} \text{ lies between the margin boundary and the separating hyperplane;} \\ & \xi_{i}=1, \ x_{i} \text{ lies on the separating hyperplane;} \\ & \xi_{i}>1, \ x_{i} \text{ lies on the misclassified side of the separating hyperplane} \end{aligned} \right.\end{align*}
Hinge loss function of the linear SVM (soft margin):
\begin{align*} \\& L \left( y \left( w \cdot x + b \right) \right) = \left[ 1 - y \left(w \cdot x + b \right) \right]_{+} \end{align*}
where the subscript "+" denotes the positive-part function
\begin{align*} \left[ z \right]_{+} = \left\{ \begin{aligned} \ & z, z > 0 \\ & 0, z \leq 0 \end{aligned} \right.\end{align*}
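A direct numpy transcription of the positive-part function and the hinge loss summed over a data set (the function names are illustrative assumptions):

import numpy as np

def positive_part(z):
    # [z]_+ = z if z > 0, else 0
    return np.maximum(z, 0.0)

def hinge_loss(w, b, X, y):
    # sum over samples of [1 - y_i (w . x_i + b)]_+
    return positive_part(1.0 - y * (X @ w + b)).sum()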
Kernel functions
Let \mathcal{X} be the input space (a subset of the Euclidean space R^{n} or a discrete set) and \mathcal{H} the feature space (a Hilbert space). If there exists a mapping from \mathcal{X} to \mathcal{H}
\begin{align*} \\& \phi \left( x \right) : \mathcal{X} \to \mathcal{H} \end{align*}
such that for all x, z \in \mathcal{X} the function K \left(x, z \right) satisfies
\begin{align*} \\ & K \left(x, z \right) = \phi \left( x \right) \cdot \phi \left( z \right) \end{align*}
then K \left(x, z \right) is called a kernel function and \phi \left( x \right) a mapping function, where \phi \left( x \right) \cdot \phi \left( z \right) denotes the inner product of \phi \left( x \right) and \phi \left( z \right).
Commonly used kernel functions:
1. Polynomial kernel
\begin{align*} \\& K \left( x, z \right) = \left( x \cdot z + 1 \right)^{p} \end{align*}
2. Gaussian kernel
\begin{align*} \\& K \left( x, z \right) = \exp \left( - \dfrac{\| x - z \|^{2}}{2 \sigma^{2}} \right) \end{align*}
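Both kernels are one-liners in numpy; a minimal sketch (the parameter defaults p and sigma are assumptions):

import numpy as np

def polynomial_kernel(x, z, p=3):
    # K(x, z) = (x . z + 1)^p
    return (np.dot(x, z) + 1.0) ** p

def gaussian_kernel(x, z, sigma=1.0):
    # K(x, z) = exp(-||x - z||^2 / (2 sigma^2))
    d = x - z
    return np.exp(-np.dot(d, d) / (2.0 * sigma ** 2))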
Nonlinear support vector machine: from a nonlinearly separable training set, learning via a kernel function and soft-margin maximization yields the classification decision function
\begin{align*} \\& f \left( x \right) = sign \left( \sum_{i=1}^{N} \alpha_{i}^{*} y_{i} K \left(x, x_{i} \right) + b^{*} \right) \end{align*}
which is called the nonlinear support vector machine; K \left( x, z \right) is a positive definite kernel.
Learning algorithm for the nonlinear SVM:
Input: training data set T = \left\{ \left( x_{1}, y_{1} \right), \left( x_{2}, y_{2} \right), \cdots, \left( x_{N}, y_{N} \right) \right\}, where x_{i} \in \mathcal{X} = R^{n}, y_{i} \in \mathcal{Y} = \left\{ +1, -1 \right\}, i = 1, 2, \cdots, N
Output: classification decision function
1. Choose an appropriate kernel function K \left( x, z \right) and a penalty parameter C > 0, then construct and solve the constrained optimization problem
\begin{align*} \\ & \min_{\alpha} \dfrac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{i} \alpha_{j} y_{i} y_{j} K \left( x_{i}, x_{j} \right) - \sum_{i=1}^{N} \alpha_{i} \\ & s.t. \sum_{i=1}^{N} \alpha_{i} y_{i} = 0 \\ & 0 \leq \alpha_{i} \leq C , \quad i=1,2, \cdots, N \end{align*}
to obtain the optimal solution \alpha^{*} = \left( \alpha_{1}^{*}, \alpha_{2}^{*}, \cdots, \alpha_{N}^{*} \right)^{T}
2. In the feature space, w^{*} = \sum_{i=1}^{N} \alpha_{i}^{*} y_{i} \phi \left( x_{i} \right) exists only implicitly and need not be computed explicitly. Choose a component \alpha_{j}^{*} of \alpha^{*} with 0 < \alpha_{j}^{*} < C and compute
\begin{align*} \\ & b^{*} = y_{j} - \sum_{i=1}^{N} \alpha_{i}^{*} y_{i} K \left( x_{i}, x_{j} \right) \end{align*}
3. Obtain the classification decision function
\begin{align*} \\& f \left( x \right) = sign \left( \sum_{i=1}^{N} \alpha_{i}^{*} y_{i} K \left( x, x_{i} \right) + b^{*} \right) \end{align*}
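A kernelized decision function needs only \alpha^{*}, b^{*}, and the training points; a minimal sketch (treating sign(0) as +1 is an assumed convention):

def decision_function(alpha, y, X, b, K, x):
    # f(x) = sign( sum_i alpha_i* y_i K(x, x_i) + b* )
    s = b + sum(a_i * y_i * K(x, x_i) for a_i, y_i, x_i in zip(alpha, y, X))
    return 1 if s >= 0 else -1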
The sequential minimal optimization (SMO) algorithm solves the following dual convex quadratic programming problem:
\begin{align*} \\ & \min_{\alpha} \dfrac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{i} \alpha_{j} y_{i} y_{j} K \left( x_{i}, x_{j} \right) - \sum_{i=1}^{N} \alpha_{i} \\ & s.t. \sum_{i=1}^{N} \alpha_{i} y_{i} = 0 \\ & 0 \leq \alpha_{i} \leq C , \quad i=1,2, \cdots, N \end{align*}
Choose two variables \alpha_{1}, \alpha_{2} and hold the remaining variables \alpha_{i} \left( i = 3, 4, \cdots, N \right) fixed; the SMO subproblem is then
\begin{align*} \\ & \min_{\alpha_{1}, \alpha_{2}} W \left( \alpha_{1}, \alpha_{2} \right) = \dfrac{1}{2} K_{11} \alpha_{1}^{2} + \dfrac{1}{2} K_{22} \alpha_{2}^{2} + y_{1} y_{2} K_{12} \alpha_{1} \alpha_{2} \\ & \quad\quad\quad\quad\quad\quad - \left( \alpha_{1} + \alpha_{2} \right) + y_{1} \alpha_{1} \sum_{i=3}^{N} y_{i} \alpha_{i} K_{i1} + y_{2} \alpha_{2} \sum_{i=3}^{N} y_{i} \alpha_i K_{i2} \\ & s.t. \quad \alpha_{1} y_{1} + \alpha_{2} y_{2} = -\sum_{i=3}^{N} \alpha_{i} y_{i} = \varsigma \\ & 0 \leq \alpha_{i} \leq C , \quad i=1,2 \end{align*}
where K_{ij} = K \left( x_{i}, x_{j} \right), i,j = 1,2, \cdots, N, \varsigma is a constant, and constant terms not involving \alpha_{1}, \alpha_{2} have been omitted.
Let \alpha_{1}^{old}, \alpha_{2}^{old} be the initial feasible solution of the subproblem, \alpha_{1}^{new}, \alpha_{2}^{new} its optimal solution, and \alpha_{2}^{new,unc} the optimal value of \alpha_{2} along the constraint direction before clipping.
Since \alpha_{2}^{new} must satisfy 0 \leq \alpha_{2}^{new} \leq C, its value is confined to
\begin{align*} \\ & L \leq \alpha_{2}^{new} \leq H \end{align*}
where L and H are the bounds given by the endpoints of the diagonal segment on which \alpha_{2}^{new} lies.
If y_{1} \neq y_{2}, then
\begin{align*} \\ & L = \max \left( 0, \alpha_{2}^{old} - \alpha_{1}^{old} \right), H = \min \left( C, C + \alpha_{2}^{old} - \alpha_{1}^{old} \right) \end{align*}
If y_{1} = y_{2}, then
\begin{align*} \\ & L = \max \left( 0, \alpha_{2}^{old} + \alpha_{1}^{old} - C \right), H = \min \left( C, \alpha_{2}^{old} + \alpha_{1}^{old} \right) \end{align*}
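The two cases translate directly into code; a small helper sketch (the function name is illustrative):

def clip_bounds(a1_old, a2_old, y1, y2, C):
    # bounds L, H for alpha_2^new on its constraint segment
    if y1 != y2:
        return max(0.0, a2_old - a1_old), min(C, C + a2_old - a1_old)
    return max(0.0, a2_old + a1_old - C), min(C, a2_old + a1_old)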
Define
\begin{align*} \\ & g \left( x \right) = \sum_{i=1}^{N} \alpha_{i} y_{i} K \left( x_{i}, x \right) + b \end{align*}
and let
\begin{align*} \\ & E_{i} = g \left( x_{i} \right) - y_{i} = \left( \sum_{j=1}^{N} \alpha_{j} y_{j} K \left( x_{j}, x_{i} \right) + b \right) - y_{i}, \quad i=1,2 \\ & v_{i} = \sum_{j=3}^{N} \alpha_{j} y_{j} K \left( x_{i}, x_{j} \right) = g \left( x_{i} \right) - \sum_{j=1}^{2}\alpha_{j} y_{j} K \left( x_{i}, x_{j} \right) - b, \quad i=1,2\end{align*}
The objective of the subproblem can then be written as
\begin{align*} \\ & W \left( \alpha_{1}, \alpha_{2} \right) = \dfrac{1}{2} K_{11} \alpha_{1}^{2} + \dfrac{1}{2} K_{22} \alpha_{2}^{2} + y_{1} y_{2} K_{12} \alpha_{1} \alpha_{2} \\ & \quad\quad\quad\quad\quad\quad - \left( \alpha_{1} + \alpha_{2} \right) + y_{1} v_{1} \alpha_{1}+ y_{2} v_{2} \alpha_{2} \end{align*}
Since \alpha_{1} y_{1} + \alpha_{2} y_{2} = \varsigma and y_{i}^{2} = 1, \alpha_{1} can be expressed as
\begin{align*} \\ & \alpha_{1} = \left( \varsigma - y_{2} \alpha_{2} \right) y_{1}\end{align*}
Substituting into W gives
\begin{align*} \\ & W \left( \alpha_{2} \right) = \dfrac{1}{2} K_{11} \left[ \left( \varsigma - y_{2} \alpha_{2} \right) y_{1} \right]^{2} + \dfrac{1}{2} K_{22} \alpha_{2}^{2} + y_{1} y_{2} K_{12} \left( \varsigma - y_{2} \alpha_{2} \right) y_{1} \alpha_{2} \\ & \quad\quad\quad\quad\quad\quad - \left[ \left( \varsigma - y_{2} \alpha_{2} \right) y_{1} + \alpha_{2} \right] + y_{1} v_{1} \left( \varsigma - y_{2} \alpha_{2} \right) y_{1} + y_{2} v_{2} \alpha_{2} \\ & = \dfrac{1}{2} K_{11} \left( \varsigma - y_{2} \alpha_{2} \right)^{2} + \dfrac{1}{2} K_{22} \alpha_{2}^{2} + y_{2} K_{12} \left( \varsigma - y_{2} \alpha_{2} \right) \alpha_{2} \\ & \quad\quad\quad\quad\quad\quad - \left( \varsigma - y_{2} \alpha_{2} \right) y_{1} - \alpha_{2} + v_{1} \left( \varsigma - y_{2} \alpha_{2} \right) + y_{2} v_{2} \alpha_{2} \end{align*}
\alpha_{2} 求導
\begin{align*} \\ & \dfrac {\partial W}{\partial \alpha_{2}} = K_{11} \alpha_{2} + K_{22} \alpha_{2} -2 K_{12} \alpha_{2} \\ & \quad\quad\quad - K_{11} \varsigma y_{2} + K_{12} \varsigma y_{2} + y_{1} y_{2} -1 - v_{1} y_{2} + y_{2} v_{2} \end{align*}
Setting the derivative to zero gives
\begin{align*} \\ & \left( K_{11} + K_{22} - 2 K_{12} \right) \alpha_{2} = y_{2} \left( y_{2} - y_{1} + \varsigma K_{11} - \varsigma K_{12} + v_{1} - v_{2} \right) \\ & = y_{2} \left[ y_{2} - y_{1} + \varsigma K_{11} - \varsigma K_{12} + \left( g \left( x_{1} \right) - \sum_{j=1}^{2} \alpha_{j} y_{j} K_{1j} - b \right) - \left( g \left( x_{2} \right) - \sum_{j=1}^{2} \alpha_{j} y_{j} K_{2j} - b \right) \right] \end{align*}
\varsigma = \alpha_{1}^{old} y_{1} + \alpha_{2}^{old} y_{2} 代入,得
\begin{align*} \\ & \left( K_{11} + K_{22} - 2 K_{12} \right) \alpha_{2}^{new,unc} = y_{2} \left( \left( K_{11} + K_{22} - 2 K_{12} \right) \alpha_{2}^{old} y_{2} + y_{2} - y_{1} + g \left( x_{1} \right) - g \left( x_{2} \right) \right) \\ & \quad\quad\quad\quad\quad\quad\quad\quad\quad\quad = \left( K_{11} + K_{22} - 2 K_{12} \right) \alpha_{2}^{old} + y_{2} \left( E_{1} - E_{2} \right) \end{align*}
\eta = K_{11} + K_{22} - 2 K_{12} 代入,得
\begin{align*} \\ & \alpha_{2}^{new,unc} = \alpha_{2}^{old} + \dfrac{y_{2} \left( E_{1} - E_{2} \right)}{\eta}\end{align*}
After clipping:
\begin{align*} \alpha_{2}^{new} = \left\{ \begin{aligned} \ & H, \alpha_{2}^{new,unc} > H \\ & \alpha_{2}^{new,unc}, L \leq \alpha_{2}^{new,unc} \leq H \\ & L, \alpha_{2}^{new,unc} < L \end{aligned} \right.\end{align*}
Since \varsigma = \alpha_{1}^{old} y_{1} + \alpha_{2}^{old} y_{2} and \varsigma = \alpha_{1}^{new} y_{1} + \alpha_{2}^{new} y_{2},

\begin{align*} \\ & \alpha_{1}^{old} y_{1} + \alpha_{2}^{old} y_{2} = \alpha_{1}^{new} y_{1} + \alpha_{2}^{new} y_{2} \\ & \quad\quad\quad\quad \alpha_{1}^{new} = \alpha_{1}^{old} + y_{1} y_{2} \left( \alpha_{2}^{old} - \alpha_{2}^{new} \right) \end{align*}
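Together, the unclipped step, the clip to [L, H], and the induced \alpha_{1} update form the core of one SMO iteration; a minimal sketch (the function name is illustrative):

def update_alphas(a1_old, a2_old, y1, y2, E1, E2, eta, L, H):
    # unclipped optimum along the constraint line
    a2_unc = a2_old + y2 * (E1 - E2) / eta
    # clip to [L, H]
    a2_new = min(H, max(L, a2_unc))
    # alpha_1 moves so that alpha_1 y_1 + alpha_2 y_2 stays constant
    a1_new = a1_old + y1 * y2 * (a2_old - a2_new)
    return a1_new, a2_new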
If 0 < \alpha_{1}^{new} < C, then the KKT conditions give \sum_{i=1}^{N} \alpha_{i} y_{i} K_{i1} + b = y_{1}, so
\begin{align*} \\ & b_1^{new} = y_{1} - \sum_{i=3}^{N} \alpha_{i} y_{i} K_{i1} - \alpha_{1}^{new} y_{1} K_{11} - \alpha_{2}^{new} y_{2} K_{21} \end{align*}
Since, by the definition of E_{1},
\begin{align*} \\ & E_{1} = g \left( x_{1} \right) - y_{1} = \left( \sum_{j=1}^{N} \alpha_{j} y_{j} K_{j1} + b \right) - y_{1} \\ & = \sum_{i=3}^{N} \alpha_{i} y_{i} K_{i1} + \alpha_{1}^{old} y_{1} K_{11} + \alpha_{2}^{old} y_{2} K_{21} + b^{old} - y_{1} \end{align*}
we have
\begin{align*} \\ & y_{1} - \sum_{i=3}^{N} \alpha_{i} y_{i} K_{i1} = -E_{1} + \alpha_{1}^{old} y_{1} K_{11} + \alpha_{2}^{old} y_{2} K_{21} + b^{old} \end{align*}
Substituting this into the expression for b_{1}^{new} gives
\begin{align*} \\ & b_1^{new} = -E_{1} - y_{1} K_{11} \left( \alpha_{1}^{new} - \alpha_{1}^{old} \right) - y_{2} K_{21} \left( \alpha_{2}^{new} - \alpha_{2}^{old} \right) + b^{old} \end{align*}
Similarly,
\begin{align*} \\ & b_2^{new} = -E_{2} - y_{1} K_{12} \left( \alpha_{1}^{new} - \alpha_{1}^{old} \right) - y_{2} K_{22} \left( \alpha_{2}^{new} - \alpha_{2}^{old} \right) + b^{old} \end{align*}
If \alpha_{1}^{new} and \alpha_{2}^{new} both satisfy 0 < \alpha_{i}^{new} < C, i = 1, 2, then
\begin{align*} \\ & b^{new} = b_{1}^{new} = b_{2}^{new}\end{align*}
Otherwise, take the midpoint:
\begin{align*} \\ & b^{new} = \dfrac{b_{1}^{new} + b_{2}^{new}}{2} \end{align*}
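The threshold update is then a small case analysis; a minimal sketch following the rule above:

def update_b(b1_new, b2_new, a1_new, a2_new, C):
    # when both multipliers lie strictly inside (0, C), b1_new == b2_new
    if 0.0 < a1_new < C and 0.0 < a2_new < C:
        return b1_new
    # otherwise take the midpoint
    return (b1_new + b2_new) / 2.0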
Update E_{i}:
\begin{align*} \\ & E_{i}^{new} = \sum_{j \in S} y_{j} \alpha_{j} K \left( x_{i}, x_{j} \right) + b^{new} - y_{i} \end{align*}
where S is the set of indices of all support vectors x_{j}.
The SMO algorithm:
Input: training data set T = \left\{ \left( x_{1}, y_{1} \right), \left( x_{2}, y_{2} \right), \cdots, \left( x_{N}, y_{N} \right) \right\}, where x_{i} \in \mathcal{X} = R^{n}, y_{i} \in \mathcal{Y} = \left\{ +1, -1 \right\}, i = 1, 2, \cdots, N, and a precision \varepsilon
Output: approximate solution \hat \alpha
1. Take the initial value \alpha^{\left( 0 \right)} = 0 and set k = 0
2. Select the optimization variables \alpha_{1}^{\left( k \right)}, \alpha_{2}^{\left( k \right)} and solve
\begin{align*} \\ & \min_{\alpha_{1}, \alpha_{2}} W \left( \alpha_{1}, \alpha_{2} \right) = \dfrac{1}{2} K_{11} \alpha_{1}^{2} + \dfrac{1}{2} K_{22} \alpha_{2}^{2} + y_{1} y_{2} K_{12} \alpha_{1} \alpha_{2} \\ & \quad\quad\quad\quad\quad\quad - \left( \alpha_{1} + \alpha_{2} \right) + y_{1} \alpha_{1} \sum_{i=3}^{N} y_{i} \alpha_{i} K_{i1} + y_{2} \alpha_{2} \sum_{i=3}^{N} y_{i} \alpha_i K_{i2} \\ & s.t. \quad \alpha_{1} + \alpha_{2} = -\sum_{i=3}^{N} \alpha_{i} y_{i} = \varsigma \\ & 0 \leq \alpha_{i} \leq C , \quad i=1,2 \end{align*}
to obtain the optimal solutions \alpha_{1}^{\left( k+1 \right)}, \alpha_{2}^{\left( k+1 \right)}, and update \alpha to \alpha^{\left( k+1 \right)}
3. If the stopping conditions hold within precision \varepsilon:
\begin{align*} \\ & \sum_{i=1}^{N} \alpha_{i} y_{i} = 0 \\ & 0 \leq \alpha_{i} \leq C, i = 1, 2, \cdots, N \end{align*}
\begin{align*} y_{i} \cdot g \left( x_{i} \right) = \left\{ \begin{aligned} \ & \geq 1, \left\{ x_{i} | \alpha_{i} = 0 \right\} \\ & = 1, \left\{ x_{i} | 0 < \alpha_{i} < C \right\} \\ & \leq 1, \left\{ x_{i} | \alpha_{i} = C \right\} \end{aligned} \right.\end{align*}
then go to step 4; otherwise set k = k + 1 and go to step 2
4. Take \hat \alpha = \alpha^{\left( k + 1 \right)}
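Putting the pieces together, here is a compact, simplified SMO sketch in the spirit of the algorithm above; it replaces the working-set heuristics of step 2 with a random second index, so it is an illustrative assumption rather than the full method:

import numpy as np

def smo(X, y, C=1.0, kernel=np.dot, tol=1e-3, max_passes=20, seed=0):
    # simplified SMO: random choice of the second variable (illustrative only)
    N = len(y)
    K = np.array([[kernel(X[i], X[j]) for j in range(N)] for i in range(N)])
    alpha, b = np.zeros(N), 0.0
    rng = np.random.default_rng(seed)
    passes = 0
    while passes < max_passes:
        changed = 0
        for i in range(N):
            E_i = (alpha * y) @ K[:, i] + b - y[i]   # E_i = g(x_i) - y_i
            # check the KKT conditions within tolerance tol
            if (y[i] * E_i < -tol and alpha[i] < C) or (y[i] * E_i > tol and alpha[i] > 0):
                j = int(rng.integers(N - 1))
                j += j >= i                           # random j != i
                E_j = (alpha * y) @ K[:, j] + b - y[j]
                ai, aj = alpha[i], alpha[j]
                if y[i] != y[j]:
                    L, H = max(0.0, aj - ai), min(C, C + aj - ai)
                else:
                    L, H = max(0.0, ai + aj - C), min(C, ai + aj)
                eta = K[i, i] + K[j, j] - 2.0 * K[i, j]
                if L == H or eta <= 0:                # skip degenerate directions
                    continue
                alpha[j] = np.clip(aj + y[j] * (E_i - E_j) / eta, L, H)
                if abs(alpha[j] - aj) < 1e-12:
                    continue
                alpha[i] = ai + y[i] * y[j] * (aj - alpha[j])
                # threshold updates, as derived above
                b1 = b - E_i - y[i] * K[i, i] * (alpha[i] - ai) - y[j] * K[i, j] * (alpha[j] - aj)
                b2 = b - E_j - y[i] * K[i, j] * (alpha[i] - ai) - y[j] * K[j, j] * (alpha[j] - aj)
                if 0.0 < alpha[i] < C:
                    b = b1
                elif 0.0 < alpha[j] < C:
                    b = b2
                else:
                    b = (b1 + b2) / 2.0
                changed += 1
        passes = 0 if changed else passes + 1
    return alpha, b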
