
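Support Vector Machines (SVM)

SVMs can perform linear or nonlinear classification, regression, and even outlier detection.
Best suited for: classification of complex but small to medium-sized datasets.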

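I. Linear SVM Classification

1. Hard Margin Classification

In the left figure, both the red line and the purple line separate the classes, but each passes very close to the samples, so a new sample has a fairly high chance of being misclassified.

The right figure shows the result of a linear SVM. **The idea: the decision boundary should classify the samples correctly while staying as far as possible from the nearest ones.** These nearest samples (the points on the dashed lines in the figure) are the support vectors. As long as no sample falls inside the region bounded by them, the decision boundary is determined by the support vectors alone.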
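To make the role of the support vectors concrete, here is a minimal sketch. It assumes two linearly separable Iris classes (setosa and versicolor, petal features only) and uses a very large C as a stand-in for a hard margin; both choices are illustrative and not taken from the figures above.

```python
import numpy as np
from sklearn import datasets
from sklearn.svm import SVC

iris = datasets.load_iris()
mask = iris["target"] < 2            # keep setosa (0) and versicolor (1) only
X = iris["data"][mask][:, (2, 3)]    # petal length, petal width
y = iris["target"][mask]

# Assumption: a very large C approximates a hard margin (no violations tolerated).
hard_margin_clf = SVC(kernel="linear", C=1e6)
hard_margin_clf.fit(X, y)

# Functional margin y_i * f(x_i): support vectors sit at about 1,
# every other sample lies further from the boundary.
signed_y = np.where(y == 1, 1, -1)
margins = signed_y * hard_margin_clf.decision_function(X)

print("support vectors:\n", hard_margin_clf.support_vectors_)
print("smallest functional margin:", margins.min())
```

Only a handful of samples end up in `support_vectors_`; removing any of the other samples and refitting should leave the boundary unchanged.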

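Note: SVMs are sensitive to feature scales, so scale the features first (e.g. with StandardScaler).
Drawbacks:

Only works on linearly separable datasets;
Sensitive to outliers, e.g. as shown in the figure below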

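2. Soft Margin Classification

In sklearn's SVM classes, the hyperparameter C (the penalty coefficient applied to the slack variables) controls how soft the margin is: a smaller C gives a wider margin but more margin violations.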
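The effect of C can be checked numerically. The sketch below is only an illustration: it assumes the Iris petal features, uses SVC(kernel="linear") rather than LinearSVC, and counts a sample as a margin violation when its functional margin is below 1 (with a small tolerance chosen here).

```python
import numpy as np
from sklearn import datasets
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

iris = datasets.load_iris()
X = iris["data"][:, (2, 3)]                 # petal length, petal width
y = (iris["target"] == 2).astype(int)       # 1 if Iris-Virginica, else 0
signed_y = np.where(y == 1, 1, -1)

violations = {}
for C in (0.01, 1, 100):
    clf = Pipeline([
        ("scaler", StandardScaler()),
        ("svc", SVC(kernel="linear", C=C)),
    ])
    clf.fit(X, y)
    margins = signed_y * clf.decision_function(X)
    # Tolerance so that samples lying exactly on the margin are not counted.
    violations[C] = int(np.sum(margins < 1 - 1e-2))
    print(f"C={C}: {violations[C]} margin violations")
```

The smallest C should report the most violations, which matches the wider margin described above.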

Implementation: load the Iris dataset, scale the features, and train a linear SVM model (the LinearSVC class with C=1 and the hinge loss) to detect Iris-Virginica flowers.

import numpy as np
from sklearn import datasets
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

iris = datasets.load_iris()
x = iris["data"][:, (2, 3)]  # petal length, petal width
y = (iris["target"] == 2).astype(np.float64)  # 1.0 if Iris-Virginica, else 0.0

# Pipeline expects a list of (name, estimator) steps
svm_clf = Pipeline([
    ("scaler", StandardScaler()),
    ("linear_svc", LinearSVC(C=1, loss="hinge"))
])
svm_clf.fit(x, y)
# Classify a flower with petal length 5.5 cm and petal width 1.7 cm
predict = svm_clf.predict([[5.5, 1.7]])
print(predict)

Output

[1.]

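Notes:

Unlike a Logistic Regression classifier, LinearSVC has no predict_proba() method for class probabilities, because an SVM only builds a decision boundary from the support vectors; use decision_function() to get a signed score instead.
Remember to set loss="hinge" explicitly, because the LinearSVC default is "squared_hinge".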
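A short sketch of what is available in place of probabilities. It rebuilds the same Iris pipeline; the `max_iter` and `random_state` values are assumptions added here only to keep the run quiet and repeatable.

```python
from sklearn import datasets
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

iris = datasets.load_iris()
X = iris["data"][:, (2, 3)]               # petal length, petal width
y = (iris["target"] == 2).astype(int)     # 1 if Iris-Virginica, else 0

svm_clf = Pipeline([
    ("scaler", StandardScaler()),
    ("linear_svc", LinearSVC(C=1, loss="hinge", max_iter=10000, random_state=42)),
])
svm_clf.fit(X, y)

sample = [[5.5, 1.7]]
score = svm_clf.decision_function(sample)   # signed score relative to the boundary
pred = svm_clf.predict(sample)              # class 1 when the score is positive
has_proba = hasattr(svm_clf.named_steps["linear_svc"], "predict_proba")

print("decision score:", score[0])
print("prediction:", pred[0])
print("LinearSVC has predict_proba:", has_proba)
```

The sign of the score determines the predicted class, and its magnitude grows with the distance from the decision boundary.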

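Alternative implementations:

The SVC class, SVC(kernel="linear", C=1): slow on larger training sets, not recommended;
The SGDClassifier class, SGDClassifier(loss="hinge", alpha=1/(m*C)), where m is the number of training samples: it trains the SVM with stochastic gradient descent. It does not converge as fast as LinearSVC, but it can handle very large training sets and supports online learning.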
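The three options can be compared side by side. This is a rough sketch on the Iris petal features, with `max_iter` and `random_state` values chosen here for repeatability; it compares training accuracy only and says nothing about speed.

```python
from sklearn import datasets
from sklearn.linear_model import SGDClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC, LinearSVC

iris = datasets.load_iris()
X = iris["data"][:, (2, 3)]               # petal length, petal width
y = (iris["target"] == 2).astype(int)     # 1 if Iris-Virginica, else 0
X_scaled = StandardScaler().fit_transform(X)

m = len(X_scaled)   # number of training samples
C = 1

models = {
    "LinearSVC": LinearSVC(C=C, loss="hinge", max_iter=10000, random_state=42),
    "SVC": SVC(kernel="linear", C=C),
    # alpha = 1 / (m * C) makes the SGD regularization comparable to C
    "SGDClassifier": SGDClassifier(loss="hinge", alpha=1 / (m * C), random_state=42),
}

accuracies = {}
for name, model in models.items():
    model.fit(X_scaled, y)
    accuracies[name] = model.score(X_scaled, y)
    print(f"{name}: training accuracy = {accuracies[name]:.3f}")
```

On a dataset this small the three models should land on very similar boundaries; the differences described above only matter as the training set grows.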

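II. Nonlinear SVM Classification

Ways to handle nonlinear datasets:

Add more features (e.g. polynomial features such as $x^2$ and $x^3$), then use a linear SVM.
Use a polynomial kernel directly to perform nonlinear SVM classification.

Notes:

The larger the C parameter of SVC, the smaller the training error, but the more easily the model overfits; the smaller C is, the more training samples may be misclassified, which corresponds to a softer margin.
The coef0 parameter of SVC controls how much the model is influenced by high-degree terms relative to low-degree terms. If the model overfits, try reducing coef0; if it underfits, try increasing coef0.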

1. Polynomial Kernel

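from sklearn.datasets import make_moons
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.svm import LinearSVC

# Toy nonlinear dataset (same settings as the make_moons call used later in this post)
X, y = make_moons(n_samples=100, noise=0.15, random_state=2019)

polynomial_svm_clf = Pipeline([
    # Polynomial features
    ("poly_features", PolynomialFeatures(degree=3)),
    ("scaler", StandardScaler()),
    ("svm_clf", LinearSVC(C=10, loss="hinge"))
])
polynomial_svm_clf.fit(X, y)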

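Adding polynomial features makes nonlinear classification possible. However, a low degree cannot handle very complex data, while a high degree produces a huge number of features and makes training very slow.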
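How quickly the feature count grows can be checked directly. The sketch below assumes a hypothetical dataset with 10 input features and compares the count reported by PolynomialFeatures with the closed form $\binom{n+d}{d}$ (bias column included).

```python
from math import comb

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

n_features = 10                        # hypothetical number of input features
X_demo = np.zeros((1, n_features))     # the values do not matter, only the shape

counts = {}
for degree in (2, 3, 5, 10):
    poly = PolynomialFeatures(degree=degree).fit(X_demo)
    counts[degree] = poly.n_output_features_
    expected = comb(n_features + degree, degree)
    print(f"degree={degree}: {counts[degree]} features (formula: {expected})")
```

With 10 inputs, degree 3 already yields 286 features and degree 10 yields 184756, which is why the explicit expansion becomes impractical.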

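Solution: the "kernel trick"
Train an SVM classifier with a 3rd-degree (degree=3) polynomial kernel; the coef0 hyperparameter controls how much the model is influenced by high-degree versus low-degree terms.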

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_moons
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def plot_dataset(X, y, axes):
    # Blue squares for class 0, green triangles for class 1
    plt.plot( X[:,0][y==0], X[:,1][y==0], "bs" )
    plt.plot( X[:,0][y==1], X[:,1][y==1], "g^" )
    plt.axis( axes )
    plt.grid( True, which="both" )
    plt.xlabel(r"$x_1$")
    plt.ylabel(r"$x_2$")

# contour draws outlines: given the X and Y grids and the matching Z values,
# it traces the level lines of Z (similar to edge detection and visualization)
def plot_predict(clf, axes):
    x0s = np.linspace(axes[0], axes[1], 100)
    x1s = np.linspace(axes[2], axes[3], 100)
    x0, x1 = np.meshgrid( x0s, x1s )
    X = np.c_[x0.ravel(), x1.ravel()]
    y_pred = clf.predict( X ).reshape( x0.shape )
    y_decision = clf.decision_function( X ).reshape( x0.shape )
    plt.contour( x0, x1, y_pred, cmap=plt.cm.winter, alpha=0.5 )
    plt.contour( x0, x1, y_decision, cmap=plt.cm.winter, alpha=0.2 )

x, y = make_moons(n_samples=100, noise=0.15, random_state=2019)
# Pipeline expects a list of (name, estimator) steps
poly_kernel_svm_clf = Pipeline([
    ("scaler", StandardScaler()),
    ("svm_clf", SVC(kernel="poly", degree=3, coef0=1, C=5))
])
poly_kernel_svm_clf.fit(x, y)
plot_dataset( x, y, [-1.5, 2.5, -1, 1.5])
plot_predict(poly_kernel_svm_clf, [-1.5, 2.5, -1, 1.5])
plt.show()

Output

⇒ Hyperparameter tuning: grid search
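A grid search over C and coef0 might look like the sketch below. The parameter grid and the 5-fold cross-validation are assumptions chosen for illustration, not values from the original post.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X_moons, y_moons = make_moons(n_samples=100, noise=0.15, random_state=2019)

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("svm_clf", SVC(kernel="poly", degree=3)),
])

# Hypothetical grid; "<step name>__<parameter>" addresses a parameter inside the pipeline.
param_grid = {
    "svm_clf__C": [0.5, 5, 10],
    "svm_clf__coef0": [0, 1, 10],
}

grid = GridSearchCV(pipe, param_grid, cv=5)
grid.fit(X_moons, y_moons)

print("best parameters:", grid.best_params_)
print("best cross-validated accuracy:", grid.best_score_)
```

Because the scaler sits inside the pipeline, it is refit on each training fold, so the validation folds do not leak into the scaling statistics.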

Effect of different parameters

from sklearn.svm import SVC

poly_kernel_svm_clf = Pipeline([
    ("scaler", StandardScaler()),
    ("svm_clf", SVC(kernel="poly", degree=3, coef0=1, C=0.5))
])
poly_kernel_svm_clf.fit(x, y)
plt.figure(figsize=(9, 3))
plt.subplot(131)
plot_dataset(x, y, [-1.5, 2.5, -1, 1.5])
plot_predict(poly_kernel_svm_clf, [-1.5, 2.5, -1, 1.5])

poly_kernel_svm_clf = Pipeline([
    ("scaler", StandardScaler()),
    ("svm_clf", SVC(kernel="poly", degree=3, coef0=1, C=10))
])
poly_kernel_svm_clf.fit(x, y)
plt.subplot(132)
plot_dataset(x, y, [-1.5, 2.5, -1, 1.5])
plot_predict(poly_kernel_svm_clf, [-1.5, 2.5, -1, 1.5])

poly_kernel_svm_clf = Pipeline([
    ("scaler", StandardScaler()),
    ("svm_clf", SVC(kernel="poly", degree=3, coef0=100, C=0.5))
])
poly_kernel_svm_clf.fit(x, y)
plt.subplot(133)
plot_dataset(x, y, [-1.5, 2.5, -1, 1.5])
plot_predict(poly_kernel_svm_clf, [-1.5, 2.5, -1, 1.5])

plt.show()

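2. Adding Similarity Features

Idea: use a similarity function to measure how closely each sample resembles a particular landmark.
Here the similarity function is the Gaussian Radial Basis Function (RBF) with $\gamma=0.3$.

Gaussian radial basis function:
$\Phi_\gamma(x, l) = \exp\left(-\gamma \lVert x - l \rVert^2\right)$
where $l$ is the landmark. The simplest way to choose landmarks is to create one at the location of every sample in the dataset; after the transformation, the original features are dropped. Drawback: a training set with m samples is turned into one with m features, so a very large training set yields an equally large number of features.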
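The transformation can be written out by hand. In this sketch the 1-D dataset and the two landmarks at -2 and 1 are hypothetical values picked for illustration; the result is checked against sklearn's `rbf_kernel`, which computes the same quantity.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def gaussian_rbf(X, landmark, gamma):
    """Similarity of every row of X to one landmark: exp(-gamma * ||x - l||^2)."""
    return np.exp(-gamma * np.linalg.norm(X - landmark, axis=1) ** 2)

gamma = 0.3
X1D = np.linspace(-4, 4, 9).reshape(-1, 1)   # hypothetical samples -4, -3, ..., 4
landmarks = np.array([[-2.0], [1.0]])        # hypothetical landmarks

# One new feature per landmark; the original feature is dropped.
features = np.column_stack([gaussian_rbf(X1D, l, gamma) for l in landmarks])

print("features for x = -1:", features[3])   # approximately [0.74, 0.30]
print("matches rbf_kernel:", np.allclose(features, rbf_kernel(X1D, landmarks, gamma=gamma)))
```

The sample at x = -1 is at distance 1 from the first landmark and distance 2 from the second, which gives exp(-0.3) ≈ 0.74 and exp(-1.2) ≈ 0.30.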

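⇒ Solution: the Gaussian RBF kernel
When the data cannot be separated in a low-dimensional space, it can be mapped to a higher-dimensional space; a kernel function achieves the effect of this mapping without computing the new features explicitly.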

rbf_kernel_svm_clf = Pipeline([
    ("scaler", StandardScaler()),
    ("svm_clf", SVC(kernel="rbf", gamma=5, C=0.001))
])
rbf_kernel_svm_clf.fit(x, y)
plt.figure(figsize=(6, 3))
# Left panel: gamma=5
plt.subplot(121)
plot_dataset(x, y, [-1.5, 2.5, -1, 1.5])
plot_predict(rbf_kernel_svm_clf, [-1.5, 2.5, -1, 1.5])

rbf_kernel_svm_clf = Pipeline([
    ("scaler", StandardScaler()),
    ("svm_clf", SVC(kernel="rbf", gamma=0.1, C=0.001))
])
rbf_kernel_svm_clf.fit(x, y)
# Right panel: gamma=0.1
plt.subplot(122)
plot_dataset(x, y, [-1.5, 2.5, -1, 1.5])
plot_predict(rbf_kernel_svm_clf, [-1.5, 2.5, -1, 1.5])
plt.show()

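Increasing γ makes the bell-shaped curve narrower, so each sample's range of influence shrinks: the decision boundary becomes more irregular and wraps around individual samples. Conversely, a smaller γ makes the bell-shaped curve wider, each sample has a larger range of influence, and the decision boundary ends up smoother.
⇒ γ acts as a tunable regularization hyperparameter: if the model overfits, reduce γ; if it underfits, increase γ (similar to the hyperparameter C).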
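The over- and underfitting behaviour can be observed by comparing training and test accuracy. This is a sketch only: the γ values, C=1000, and the 70/30 split are assumptions made here, not settings from the plots above.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X_moons, y_moons = make_moons(n_samples=100, noise=0.15, random_state=2019)
X_train, X_test, y_train, y_test = train_test_split(
    X_moons, y_moons, test_size=0.3, random_state=42
)

scores = {}
for gamma in (0.01, 1, 100):
    clf = Pipeline([
        ("scaler", StandardScaler()),
        ("svm_clf", SVC(kernel="rbf", gamma=gamma, C=1000)),
    ])
    clf.fit(X_train, y_train)
    # Store (training accuracy, test accuracy) for this gamma
    scores[gamma] = (clf.score(X_train, y_train), clf.score(X_test, y_test))
    print(f"gamma={gamma}: train={scores[gamma][0]:.2f}, test={scores[gamma][1]:.2f}")
```

A very large γ tends to fit the training set almost perfectly; a gap between its training and test accuracy is the sign of overfitting that calls for a smaller γ.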

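Other kernels: string kernels, such as the string subsequence kernel (SSK) and kernels based on edit distance.

Choosing a kernel: as a rule, try the linear kernel first (LinearSVC is much faster than SVC(kernel="linear")), especially when the training set is very large or has many features. If the training set is not too large, also try the Gaussian RBF kernel, which works well in most cases.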

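3. Computational Complexity

The LinearSVC class is based on the liblinear library, an optimized algorithm for linear SVMs. It does not support the kernel trick, and its training time complexity is roughly $O(m \times n)$, where $m$ is the number of samples and $n$ is the number of features.

The SVC class is based on the libsvm library and supports the kernel trick. Its training time complexity usually lies between $O(m^2 \times n)$ and $O(m^3 \times n)$, so it suits complex but small or medium-sized datasets. It scales well with the number of features, especially with sparse features.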

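III. SVM Regression

SVMs can also be used for regression, both linear and nonlinear.

Instead of keeping samples out of the margin, SVM regression tries to fit as many samples as possible inside the margin while limiting margin violations (samples that fall outside it). The width of the margin is controlled by the hyperparameter $\epsilon$. Adding more training samples inside the margin does not affect the model's predictions, so the model is said to be $\epsilon$-insensitive.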
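The ε-insensitive idea corresponds to a loss that is zero inside the margin and grows linearly outside it. A minimal sketch, where the function name and the sample values are made up for illustration:

```python
import numpy as np

def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """max(0, |y - y_hat| - epsilon): errors smaller than epsilon cost nothing."""
    residual = np.abs(np.asarray(y_true) - np.asarray(y_pred))
    return np.maximum(0.0, residual - epsilon)

y_true = np.array([1.0, 1.0, 1.0])
y_pred = np.array([1.2, 1.5, 2.5])     # errors of 0.2, 0.5 and 1.5
losses = epsilon_insensitive_loss(y_true, y_pred, epsilon=0.5)

print(losses)   # → [0. 0. 1.]
```

The first two predictions fall inside or exactly on the margin and incur no loss; only the third, which is 1.5 away, is penalized, by 1.5 - 0.5 = 1.0.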

1. Linear SVM Regression

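import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import LinearSVR

np.random.seed(42)
m = 50
X = 2 * np.random.rand(m, 1)
y = (4 + 3 * X + np.random.randn(m, 1)).ravel()

# Find the indices of all support vectors in the training set
# (samples lying on or outside the epsilon margin)
def find_support_vectors(svm_reg, X, y):
    y_pred = svm_reg.predict(X)
    off_margin = np.abs(y - y_pred) >= svm_reg.epsilon
    # Return the indices where off_margin is True
    return np.argwhere(off_margin)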

def plot_svm_regression(svm_reg, X, y, axes):
    x1s = np.linspace(axes[0], axes[1], 100).reshape(-1, 1)
    y_pred = svm_reg.predict(x1s)
    plt.plot(x1s, y_pred, "r-", linewidth=2, label=r"$\hat{y}$")
    plt.plot(x1s, y_pred - svm_reg.epsilon, "k--")
    plt.plot(x1s, y_pred + svm_reg.epsilon, "k--")
    plt.plot(X, y, "bo")
    plt.scatter(X[svm_reg.support_], y[svm_reg.support_], s=180, facecolors="#FFAAAA")
    plt.xlabel(r"$x_1$", fontsize=18)
    plt.legend(loc="upper left", fontsize=18)
    plt.axis(axes)

svm_reg_1 = LinearSVR(epsilon=1.5, random_state=2019)
svm_reg_2 = LinearSVR(epsilon=0.5, random_state=2019)
svm_reg_1.fit(X, y)
svm_reg_2.fit(X, y)

svm_reg_1.support_ = find_support_vectors(svm_reg_1, X, y)
svm_reg_2.support_ = find_support_vectors(svm_reg_2, X, y)

eps_x1 = 1
eps_y_pred = svm_reg_1.predict([[eps_x1]])[0]  # scalar prediction at x1 = 1, used to place the epsilon arrow
plt.figure(figsize=(8, 3))
plt.subplot(121)
plot_svm_regression(svm_reg_1, X, y, [0, 2, 3, 11])
plt.title(r"$\epsilon={}$".format(svm_reg_1.epsilon), fontsize=18)
plt.ylabel(r"$y$", fontsize=18, rotation=0)
plt.annotate(
    '', xy=(eps_x1, eps_y_pred), xycoords='data',
    xytext=(eps_x1, eps_y_pred - svm_reg_1.epsilon),
    textcoords='data', arrowprops={'arrowstyle':'<->','linewidth':1.5}
)
plt.text(0.9, 5.6, r"$\epsilon$",fontsize=20)

plt.subplot(122)
plot_svm_regression(svm_reg_2, X, y, [0, 2, 3, 11])
plt.title(r"$\epsilon={}$".format(svm_reg_2.epsilon), fontsize=18)
plt.show()

2. Nonlinear SVM Regression

For polynomial regression, simply set the SVR kernel to "poly".

from sklearn.svm import SVR

np.random.seed(42)
m = 100
X = 2 * np.random.rand(m, 1) - 1
y = (0.2 + 0.1 * X + 0.5 * X ** 2 + np.random.randn(m, 1)/10).ravel()
# Two degree-2 polynomial-kernel SVR models that differ only in C
svm_poly_reg1 = SVR(kernel="poly", degree=2, C=100, epsilon=0.1)
svm_poly_reg2 = SVR(kernel="poly", degree=2, C=0.01, epsilon=0.1)
svm_poly_reg1.fit(X, y)
svm_poly_reg2.fit(X, y)

plt.figure(figsize=(8, 3))
plt.subplot(121)
plot_svm_regression(svm_poly_reg1, X, y, [-1, 1, 0, 1])
plt.title(r"$degree={}, C={}, \epsilon={}$".format(svm_poly_reg1.degree, svm_poly_reg1.C, svm_poly_reg1.epsilon), fontsize=18)
plt.ylabel(r"$y$", fontsize=18, rotation=0)
plt.subplot(122)
plot_svm_regression(svm_poly_reg2, X, y, [-1, 1, 0, 1])
plt.title(r"$degree={}, C={}, \epsilon={}$".format(svm_poly_reg2.degree, svm_poly_reg2.C, svm_poly_reg2.epsilon), fontsize=18)
plt.show()

IV. SVM Theory

(To be added)

Machine Learning with Scikit-learn and TensorFlow (Part 5)
