Implementing Black-Box Methods in R: Support Vector Machines

Original

2020-02-23 14:19

Support Vector Machines -------------------

Step 1: Exploring and preparing the data ----

read in data and examine structure
Read the data into R and confirm that it contains the 16 features that describe each example of a letter.

letters <- read.csv("F:\\rwork\\Machine Learning with R (2nd Ed.)\\Chapter 07\\letterdata.csv",
                    stringsAsFactors = TRUE)
str(letters)

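The SVM learner requires all features to be numeric, and each feature should be scaled to a fairly small interval. ksvm() rescales the data by default (scaled = TRUE), so no manual rescaling is needed as long as str() shows that every feature is numeric.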
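
For intuition, here is a minimal base-R sketch of what rescaling features means. The toy data frame, its column names, and the helper `normalize()` are made up for illustration and are not part of the letter data or of kernlab.

```r
# Toy data: two numeric features on very different scales (hypothetical values).
toy <- data.frame(width = c(2, 4, 6, 8), pixels = c(100, 300, 500, 700))

# Standardize each column to mean 0 and standard deviation 1.
toy_scaled <- as.data.frame(scale(toy))

# Min-max normalization to [0, 1] is a common alternative.
normalize <- function(x) (x - min(x)) / (max(x) - min(x))
toy_minmax <- as.data.frame(lapply(toy, normalize))

summary(toy_scaled)
summary(toy_minmax)
```

After either transformation both columns live on a comparable scale, so neither dominates the distance calculations inside the kernel.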

divide into training and test data
The first 16,000 rows (80%) are used as training data and the remaining 4,000 rows (20%) as test data.

letters_train <- letters[1:16000, ]
letters_test  <- letters[16001:20000, ]
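
Slicing by row position only gives a fair split if the rows are already in random order, which is an assumption here since the post does not say how the file was prepared. If that is in doubt, a random split could be sketched as follows, using a made-up stand-in data frame:

```r
set.seed(123)  # arbitrary seed, only for reproducibility

# Stand-in data with 20 rows (hypothetical; replace with the real data frame).
toy <- data.frame(letter = factor(rep(c("A", "B"), each = 10)),
                  x = rnorm(20))

# Draw 80% of the row indices at random for training.
train_idx <- sample(nrow(toy), size = round(0.8 * nrow(toy)))

toy_train <- toy[train_idx, ]
toy_test  <- toy[-train_idx, ]

nrow(toy_train)  # 16
nrow(toy_test)   # 4
```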

Step 2: Training a model on the data ----

begin by training a simple linear SVM
# install.packages("kernlab")

To establish a baseline for measuring SVM performance, we start by training a simple linear SVM classifier.

library(kernlab)
letter_classifier <- ksvm(letter ~ ., data = letters_train,
                          kernel = "vanilladot")

By default, ksvm() uses the Gaussian RBF kernel.
kernel = "vanilladot" selects the linear kernel instead.
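
To make the difference between the two kernels concrete, here is a small base-R sketch of what each one computes for a pair of feature vectors. The function names `linear_kernel()` and `rbf_kernel()` are hypothetical helpers written for this illustration, not kernlab functions; the RBF form exp(-sigma * ||x - y||^2) follows the parameterization documented for kernlab's rbfdot.

```r
# Linear ("vanilla") kernel: the plain dot product of two feature vectors.
linear_kernel <- function(x, y) sum(x * y)

# Gaussian RBF kernel: exp(-sigma * ||x - y||^2); sigma controls the width.
rbf_kernel <- function(x, y, sigma = 1) exp(-sigma * sum((x - y)^2))

a <- c(1, 2)
b <- c(3, 4)

linear_kernel(a, b)  # 1*3 + 2*4 = 11
rbf_kernel(a, b)     # exp(-8)
rbf_kernel(a, a)     # 1: identical points are maximally similar
```

The linear kernel grows with the size of the inputs, while the RBF kernel depends only on the distance between the two points and always lies between 0 and 1.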

look at basic information about the model

letter_classifier

This output tells us nothing about how well the model will perform in the real world, so next we use the test data to examine its performance.

Step 3: Evaluating model performance ----

predictions on testing dataset

letter_predictions <- predict(letter_classifier, letters_test)
head(letter_predictions)

Here we use the table() function to compare the predicted values with the true values.

table(letter_predictions, letters_test$letter)

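The diagonal values 144, 121, 120, 156, and 127 are the numbers of records for which the predicted letter matches the true letter. The off-diagonal cells count the errors. For example, the value 5 in row B and column D means that in 5 cases the letter D was misclassified as B.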
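
To see how such a table is laid out, here is a tiny sketch with made-up predictions and labels (the vectors `predicted` and `actual` are invented for illustration, not taken from the letter data):

```r
# Made-up predictions and true labels for five test cases.
predicted <- factor(c("B", "B", "D", "D", "B"), levels = c("B", "D"))
actual    <- factor(c("B", "D", "D", "D", "B"), levels = c("B", "D"))

# Rows are predictions, columns are true values (first argument = rows).
confusion <- table(predicted, actual)
confusion

confusion["B", "D"]    # 1: one true D was predicted as B
sum(diag(confusion))   # 4: correct predictions lie on the diagonal
```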

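Looking at each type of error individually may reveal interesting patterns about which letters the model has trouble recognizing, but it is also time-consuming. We can simplify the evaluation by computing the overall accuracy instead, that is, by considering only whether each predicted letter is correct or incorrect and ignoring the type of error.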

look only at agreement vs. non-agreement
construct a vector of TRUE/FALSE indicating correct/incorrect predictions

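The following command returns a vector of TRUE and FALSE values indicating whether the model's predicted letter agrees with (matches) the true letter for each record in the test dataset.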

agreement <- letter_predictions == letters_test$letter

Using table(), we see that the classifier correctly identified 3,357 of the 4,000 test records:

table(agreement)

Expressed as a percentage, the accuracy is about 84%.

prop.table(table(agreement))
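
The same accuracy can be obtained in one step, because R treats TRUE as 1 and FALSE as 0 when averaging. A minimal sketch with a made-up agreement vector (`agreement_toy` is invented for illustration), followed by the arithmetic for the counts reported above:

```r
# Made-up agreement vector: TRUE where the prediction matched the true label.
agreement_toy <- c(TRUE, TRUE, FALSE, TRUE)

# TRUE counts as 1 and FALSE as 0, so the mean is the accuracy.
mean(agreement_toy)   # 0.75

# The same arithmetic for 3,357 correct predictions out of 4,000.
3357 / 4000           # 0.83925
```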

Step 4: Improving model performance ----

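The previous SVM model used a simple linear kernel. By using a more complex kernel, we can map the data into a higher-dimensional space and potentially obtain a better model fit.

Choosing among the many available kernels is challenging, however. A popular convention is to start with the Gaussian RBF kernel, because it has been shown to work well for many types of data. We can train an RBF-based SVM with ksvm() as follows: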

set.seed(12345)
letter_classifier_rbf <- ksvm(letter ~ ., data = letters_train, kernel = "rbfdot")
letter_predictions_rbf <- predict(letter_classifier_rbf, letters_test)

Finally, we compare the accuracy with that of our linear SVM:

agreement_rbf <- letter_predictions_rbf == letters_test$letter
table(agreement_rbf)

prop.table(table(agreement_rbf))

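Simply by changing the kernel, we raised the accuracy of the character recognition model from 84% to 93%. If this level of performance is still not satisfactory for an optical character recognition program, you can try other kernels, or change the cost parameter C to adjust the width of the decision boundary.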
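
As a rough sketch of how different values of C could be compared, the loop below trains one RBF SVM per cost value and records the test accuracy. It uses R's built-in iris data as a stand-in so that it runs without letterdata.csv; the split size and the list of cost values are arbitrary choices for illustration, not recommendations from this post.

```r
library(kernlab)

set.seed(12345)

# Built-in iris data as a stand-in for the letter data (assumption).
train_idx  <- sample(nrow(iris), 100)
iris_train <- iris[train_idx, ]
iris_test  <- iris[-train_idx, ]

# Candidate cost values (arbitrary choices for illustration).
cost_values <- c(0.1, 1, 10, 100)

# Train one RBF SVM per cost value and record the test accuracy.
accuracy <- sapply(cost_values, function(cost) {
  model <- ksvm(Species ~ ., data = iris_train, kernel = "rbfdot", C = cost)
  pred  <- predict(model, iris_test)
  mean(pred == iris_test$Species)
})

data.frame(C = cost_values, accuracy = accuracy)
```

For the letter data, the same pattern would apply with letter ~ . and letters_train in place of the iris objects. Picking C on the test set tends to give an optimistic accuracy estimate, so a separate validation set or cross-validation is the safer choice.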

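Corrections are welcome. (The theory is easy to find with a Baidu search, so I have not included it here.)
If you need the data, please send me a private message.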
