Machine Learning: Support Vector Machine (SVM)

Section I: A Brief Introduction to SVM

Another powerful and widely used learning algorithm is the Support Vector Machine (SVM), which can be considered an extension of the perceptron. With the perceptron algorithm, the optimization target is to minimize misclassification errors. For the SVM, however, the optimization objective is to maximize the margin. The margin is defined as the distance between the separating hyperplane (the decision boundary) and the training samples closest to this hyperplane, the so-called support vectors.
From
Sebastian Raschka, Vahid Mirjalili. Python Machine Learning (2nd Edition). Nanjing: Southeast University Press, 2018.
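To make "maximize the margin" concrete, the standard hard-margin formulation (a textbook statement added here for reference, not a quote from the book above) is:

\min_{w,b}\ \frac{1}{2}\lVert w\rVert^{2} \quad \text{s.t.} \quad y^{(i)}\left(w^{\top}x^{(i)}+b\right)\ \ge\ 1,\quad i=1,\dots,N

Minimizing \lVert w\rVert maximizes the margin 2/\lVert w\rVert between the two supporting hyperplanes w^{\top}x+b=\pm 1.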

Section II: Apply SVM to the Linearly Separable Scenario
import matplotlib.pyplot as plt
from sklearn import datasets
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from SupportVectorMachine.visualize_test_idx import plot_decision_regions

plt.rcParams['figure.dpi']=200
plt.rcParams['savefig.dpi']=200
font = {'family': 'Times New Roman',
        'weight': 'light'}
plt.rc("font", **font)

#Section 1: Load the Iris data and split it into train/test sets
iris=datasets.load_iris()
X=iris.data[:,[2,3]]
y=iris.target
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.3,random_state=1,stratify=y)

#Section 2: Standardize the features (fit the scaler on the training set only)
sc=StandardScaler()
sc.fit(X_train)
X_train_std=sc.transform(X_train)
X_test_std=sc.transform(X_test)
X_combined_std=np.vstack((X_train_std,X_test_std))
y_combined=np.hstack((y_train,y_test))

#Section 3: Train SVC classifier via Sklearn
svm=SVC(kernel='linear',C=1.0,random_state=1)
svm.fit(X_train_std,y_train)

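#Highlight the test samples: rows 105-149 of the combined arrays (45 test points)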
plot_decision_regions(X=X_combined_std,
                      y=y_combined,
                      classifier=svm,
                      test_idx=range(105,150))
plt.xlabel('petal length [standardized]')
plt.ylabel('petal width [standardized]')
plt.legend(loc='upper left')
plt.savefig('./fig1.png')
plt.show()

[Fig. 1: decision regions of the linear SVM on the standardized Iris features, with the test samples highlighted]
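Because the support vectors are the samples that define the margin, it is worth inspecting them after fitting. The snippet below uses only standard scikit-learn SVC attributes (n_support_, support_vectors_); the print labels are illustrative:

#Inspect the support vectors of the fitted linear SVM
print('Support vectors per class:', svm.n_support_)
print('Total support vectors:', svm.support_vectors_.shape[0])
print('First support vector (standardized features):', svm.support_vectors_[0])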
Note
Unless otherwise stated, plot_decision_regions refers to the plot_decision_regions function from the post 機器學習-感知機(Perceptron)-Scikit-Learn.
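For readers without that post at hand, here is a minimal sketch of such a helper, reconstructed to match how it is called in this post (the grid resolution, colors, and markers are assumptions, not the original code):

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap

def plot_decision_regions(X, y, classifier, test_idx=None, resolution=0.02):
    #Markers and colors for up to five classes
    markers = ('s', 'x', 'o', '^', 'v')
    colors = ('red', 'blue', 'lightgreen', 'gray', 'cyan')
    cmap = ListedColormap(colors[:len(np.unique(y))])

    #Evaluate the classifier over a grid spanning the feature ranges
    x1_min, x1_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    x2_min, x2_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx1, xx2 = np.meshgrid(np.arange(x1_min, x1_max, resolution),
                           np.arange(x2_min, x2_max, resolution))
    Z = classifier.predict(np.array([xx1.ravel(), xx2.ravel()]).T)
    Z = Z.reshape(xx1.shape)
    plt.contourf(xx1, xx2, Z, alpha=0.3, cmap=cmap)
    plt.xlim(xx1.min(), xx1.max())
    plt.ylim(xx2.min(), xx2.max())

    #Plot the samples of each class
    for idx, cl in enumerate(np.unique(y)):
        plt.scatter(X[y == cl, 0], X[y == cl, 1],
                    alpha=0.8, c=colors[idx],
                    marker=markers[idx], label=cl,
                    edgecolor='black')

    #Circle the test samples if their indices are given
    if test_idx is not None:
        X_test = X[test_idx, :]
        plt.scatter(X_test[:, 0], X_test[:, 1],
                    facecolors='none', edgecolor='black',
                    linewidth=1, marker='o', s=100, label='test set')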

Section III: Apply SVM to the Nonlinearly Separable Case

Part 1: Construct the nonlinear XOR dataset

#Section 1: Construct the nonlinear "logic_xor" (XOR) dataset
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(1)
X_xor=np.random.randn(200,2)
y_xor=np.logical_xor(X_xor[:,0]>0,
                     X_xor[:,1]>0)
y_xor=np.where(y_xor,1,-1)
plt.scatter(X_xor[y_xor==1,0],
            X_xor[y_xor==1,1],
            c='b',
            marker='x',
            label='1')
plt.scatter(X_xor[y_xor==-1,0],
            X_xor[y_xor==-1,1],
            c='r',
            marker='s',
            label='-1')
plt.xlim([-3,3])
plt.ylim([-3,3])
plt.legend(loc='best')
plt.savefig('./fig2.png')
plt.show()

[Fig. 2: scatter plot of the XOR dataset (class 1: blue crosses; class -1: red squares)]
Part 2: Effect of the cut-off parameter (gamma) on the decision boundary

#Section 2: Train SVC Classifier on Non-linear Model
#Section 2.1: Smaller gamma: "cut-off parameter for Gaussian sphere"
#             Used to soften or tighten the decision boundary
from SupportVectorMachine.visualize import plot_decision_regions

svm=SVC(kernel='rbf',random_state=1,gamma=0.10,C=10.0)
svm.fit(X_xor,y_xor)

plot_decision_regions(X_xor,y_xor,classifier=svm)
plt.legend(loc='upper left')
plt.savefig('./fig3.png')
plt.show()

#Section 2.2: Larger gamma
svm=SVC(kernel='rbf',random_state=1,gamma=100,C=10.0)
svm.fit(X_xor,y_xor)

plot_decision_regions(X_xor,y_xor,classifier=svm)
plt.legend(loc='upper left')
plt.savefig('./fig4.png')
plt.show()
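To see the effect of gamma numerically rather than only visually, one can sweep it and compare training accuracy with the number of support vectors (a small illustrative experiment, not part of the original post); a very large gamma memorizes the training set and generalizes poorly:

from sklearn.svm import SVC

#Sweep gamma: tighter Gaussian spheres -> higher training accuracy, more overfitting
for gamma in [0.1, 1.0, 10.0, 100.0]:
    svm = SVC(kernel='rbf', random_state=1, gamma=gamma, C=10.0)
    svm.fit(X_xor, y_xor)
    print('gamma=%6.1f  train accuracy=%.3f  support vectors=%d'
          % (gamma, svm.score(X_xor, y_xor), svm.support_vectors_.shape[0]))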

The gamma parameter:
The gamma=0.1 case:
[Fig. 3: decision regions of the RBF-kernel SVM with gamma=0.1]

The gamma=100 case:
[Fig. 4: decision regions of the RBF-kernel SVM with gamma=100]
Note that gamma is the free parameter of the Gaussian (RBF) kernel, and C is the regularization parameter. Introducing a kernel function transforms a problem that is inseparable in the original low-dimensional space into a linearly separable one in a higher-dimensional space. However, explicitly mapping into that high-dimensional space would make the feature transformation and distance computations prohibitively expensive. The point of the kernel function is therefore to evaluate the high-dimensional similarity directly from the low-dimensional features (the kernel trick).
Roughly speaking, the term kernel can be interpreted as a similarity function between a pair of samples. The minus sign inverts the distance measure into a similarity score and, due to the exponential term, the resulting similarity score falls into a range between 1 (for exactly similar samples) and 0 (for very dissimilar samples).

From
Sebastian Raschka, Vahid Mirjalili. Python Machine Learning (2nd Edition). Nanjing: Southeast University Press, 2018.
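To make the similarity interpretation concrete, the RBF kernel value k(x, x') = exp(-gamma * ||x - x'||^2) can be computed by hand and checked against scikit-learn's rbf_kernel (the two sample points below are arbitrary):

import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

x1 = np.array([[0.0, 0.0]])
x2 = np.array([[1.0, 1.0]])
gamma = 0.1

#Manual RBF similarity: exactly 1.0 for identical points, toward 0 for distant ones
manual = np.exp(-gamma * np.sum((x1 - x2) ** 2))
print('manual   :', manual)                                 #exp(-0.2) = 0.8187...
print('sklearn  :', rbf_kernel(x1, x2, gamma=gamma)[0, 0])  #same value
print('identical:', rbf_kernel(x1, x1, gamma=gamma)[0, 0])  #1.0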

References
Sebastian Raschka, Vahid Mirjalili. Python Machine Learning (2nd Edition). Nanjing: Southeast University Press, 2018.
