Support Vector Machine: Handwritten Digit Recognition

Original

2020-02-23 15:37

Support vector machine classifier:

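The position of the separating line is not determined by all of the training data. It is determined by the data points of the two different classes that lie closest to the boundary, that is, the points with the smallest margin. These points, which actually decide the optimal linear classification model, are called "support vectors". A logistic regression (LR) model, by contrast, takes the influence of every training sample on its parameters into account during training, so it does not necessarily arrive at the best classifier.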
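To make the idea of support vectors concrete, here is a minimal sketch on a made-up 2-D toy dataset (the points and the large `C` value are assumptions chosen for illustration, not part of the original experiment). It uses `SVC(kernel="linear")` because that estimator exposes the fitted support vectors through `support_` and `support_vectors_`.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy data: two well-separated clusters in 2-D.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [3.0, 3.0], [3.0, 4.0], [4.0, 3.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# A large C approximates a hard-margin linear SVM on separable data.
clf = SVC(kernel="linear", C=1e3)
clf.fit(X, y)

# Only the points closest to the boundary are kept as support vectors.
print("support vector indices:", clf.support_)
print("support vectors:")
print(clf.support_vectors_)
print("support vectors per class:", clf.n_support_)
```

Only a subset of the six points ends up in `support_vectors_`; the remaining points could be moved further from the boundary without changing the fitted line.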

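This post uses a support vector machine classifier on the handwritten digit image dataset bundled with sklearn. (The digit images bundled with sklearn are only the test set of https://archive.ics.uci.edu/ml/datasets/Optical+Recognition+of+Handwritten+Digits.)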
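As a quick look at what the bundled dataset contains, the sketch below loads it and checks that each 8x8 image corresponds to one 64-element feature row. The variable name `flattened` is introduced here only for illustration.

```python
import numpy as np
from sklearn.datasets import load_digits

digits = load_digits()

# Each sample is an 8x8 grayscale image; `data` holds the same pixels as a flat row.
print("images shape:", digits.images.shape)
print("data shape:", digits.data.shape)

# Flatten the images ourselves and compare against the provided feature matrix.
flattened = digits.images.reshape(len(digits.images), -1)
print("flattened images equal data:", np.array_equal(flattened, digits.data))

# The labels are the digits 0 through 9.
print("classes:", np.unique(digits.target))
```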

Python source code

from sklearn.datasets import load_digits
# train_test_split is provided by sklearn.model_selection
from sklearn.model_selection import train_test_split
# StandardScaler standardizes each feature to zero mean and unit variance
from sklearn.preprocessing import StandardScaler
# LinearSVC: support vector classifier based on a linear hypothesis
from sklearn.svm import LinearSVC
from sklearn.metrics import classification_report

# ------------- load the handwritten digit data into `digits`
digits = load_digits()
print('Total dataset shape', digits.data.shape)

# ------------- data preparation
# 75% training set, 25% testing set
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=33)
print('training data shape', y_train.shape)
print('testing data shape', y_test.shape)

# ------------- training
# fit the scaler on the training set only, then reuse it on the test set
ss = StandardScaler()
X_train = ss.fit_transform(X_train)
X_test = ss.transform(X_test)

# initialize LinearSVC with default settings
lsvc = LinearSVC()
# train the model
lsvc.fit(X_train, y_train)
# predict the test set with the trained model and store the result in y_predict
y_predict = lsvc.predict(X_test)

# ------------- performance measurement
print('The Accuracy of Linear SVC is', lsvc.score(X_test, y_test))

print(classification_report(y_test, y_predict,
                            target_names=digits.target_names.astype(str)))

Result:

Total dataset shape (1797, 64)
training data shape (1347,)
testing data shape (450,)
The Accuracy of Linear SVC is 0.953333333333
             precision    recall  f1-score   support

          0       0.92      1.00      0.96        35
          1       0.96      0.98      0.97        54
          2       0.98      1.00      0.99        44
          3       0.93      0.93      0.93        46
          4       0.97      1.00      0.99        35
          5       0.94      0.94      0.94        48
          6       0.96      0.98      0.97        51
          7       0.92      1.00      0.96        35
          8       0.98      0.84      0.91        58
          9       0.95      0.91      0.93        44

avg / total       0.95      0.95      0.95       450

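Recall (R), precision (P), and F1 were originally defined for binary classification. Digit recognition has 10 classes (0 to 9), so the three metrics cannot be computed directly. The usual approach is to evaluate the classes one at a time: treat one class as the positive class and all other classes as negative samples, which yields ten binary classification tasks.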
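The one-class-versus-the-rest computation described above can be sketched as follows. The labels are a made-up 3-class example and `one_vs_rest_metrics` is a hypothetical helper written for this illustration; the result is compared against sklearn's per-class scores (`average=None`).

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score, f1_score

# Hypothetical labels for a 3-class problem (illustration only).
y_true = np.array([0, 0, 1, 1, 2, 2, 2, 1])
y_pred = np.array([0, 1, 1, 1, 2, 0, 2, 2])


def one_vs_rest_metrics(y_true, y_pred, positive_class):
    """Treat `positive_class` as positive and every other class as negative."""
    true_pos = np.sum((y_true == positive_class) & (y_pred == positive_class))
    false_pos = np.sum((y_true != positive_class) & (y_pred == positive_class))
    false_neg = np.sum((y_true == positive_class) & (y_pred != positive_class))
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1


classes = np.unique(y_true)
manual = np.array([one_vs_rest_metrics(y_true, y_pred, c) for c in classes])
manual_precision, manual_recall, manual_f1 = manual.T

for c, (p, r, f) in zip(classes, manual):
    print(f"class {c}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")

# sklearn computes the same per-class values when average=None.
print(np.allclose(manual_precision, precision_score(y_true, y_pred, average=None)))
print(np.allclose(manual_recall, recall_score(y_true, y_pred, average=None)))
print(np.allclose(manual_f1, f1_score(y_true, y_pred, average=None)))
```

Each row of the classification report is one such binary task; the summary row then averages the per-class values.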

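SVM models flourished in the ML field for a long time. Thanks to their elegant model assumption, they can pick out, from massive or even very high-dimensional data, the small number of samples that matter most for the prediction task. This reduces the memory needed to hold the data the model learns from and also improves predictive performance. That advantage, however, comes at a higher computational cost. When using the model in practice, you need to weigh these trade-offs against the goal of the task.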
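As a rough check of the claim that only a few samples carry the model, the sketch below counts how many training samples end up as support vectors on the same digits split. It swaps in `SVC(kernel="linear")` as an assumption, because `LinearSVC` does not expose its support vectors; the two estimators use different solvers and multiclass strategies, so this is not the exact model trained above.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=33)

# Standardize using statistics from the training set only.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)

# SVC with a linear kernel stands in for LinearSVC here (assumption, see above).
clf = SVC(kernel="linear")
clf.fit(X_train, y_train)

n_support = len(clf.support_)   # training samples kept as support vectors
n_train = len(X_train)
print("training samples:", n_train)
print("support vectors:", n_support)
print("fraction kept:", n_support / n_train)
```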
