Classifying the IRIS Dataset with the sklearn Naive Bayes Classifier

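The basic idea of Bayesian classification fits in one sentence: "assign each sample to the class with the largest posterior probability."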
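Written as a formula, the rule looks like this. The notation is introduced here for illustration and is not from the original post: c is a class, x is a feature vector with d components x_j. The last step drops P(x), which does not depend on c, and applies the "naive" assumption that the features are conditionally independent given the class.

```latex
\hat{y} = \arg\max_{c} P(c \mid x)
        = \arg\max_{c} \frac{P(c)\, P(x \mid c)}{P(x)}
        = \arg\max_{c} P(c) \prod_{j=1}^{d} P(x_j \mid c)
```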

For the underlying theory, see: http://www.cnblogs.com/leoo2sk/archive/2010/09/17/naive-bayesian-classifier.html

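sklearn implements the naive Bayes classifier in several variants, depending on the distribution the features are assumed to follow:

Naive Bayes - Gaussian model
Naive Bayes - multinomial model
Naive Bayes - Bernoulli model

See the official documentation: http://sklearn.lzjqsdd.com/modules/naive_bayes.html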
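As a rough illustration of when each variant applies, the sketch below fits all three to a tiny toy dataset. The data are invented for this example: continuous values for the Gaussian model, non-negative counts for the multinomial model, and 0/1 indicators for the Bernoulli model.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB

# Toy data invented for illustration: 6 samples, 2 features, 2 classes.
y = np.array([0, 0, 0, 1, 1, 1])

# Continuous measurements -> Gaussian model
x_cont = np.array([[1.0, 2.1], [1.2, 1.9], [0.9, 2.0],
                   [3.1, 4.0], [2.9, 4.2], [3.0, 3.9]])
# Non-negative counts (e.g. word counts) -> multinomial model
x_count = np.array([[5, 0], [4, 1], [6, 0],
                    [0, 5], [1, 4], [0, 6]])
# Binary indicators (feature present or absent) -> Bernoulli model
x_bin = (x_count > 2).astype(int)

models = {
    "GaussianNB": (GaussianNB(), x_cont),
    "MultinomialNB": (MultinomialNB(), x_count),
    "BernoulliNB": (BernoulliNB(), x_bin),
}
predictions = {}
for name, (model, x) in models.items():
    # Fit on the toy data and predict the same samples back
    predictions[name] = model.fit(x, y).predict(x)
    print(name, predictions[name])
```

The three classes share the same fit/predict interface, so switching variants mostly means choosing the one whose distributional assumption matches the features.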

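Of these, the Gaussian model is the usual choice for continuous-valued features such as the IRIS measurements, so this article uses sklearn's Gaussian naive Bayes classifier (GaussianNB) to classify IRIS.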
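To make the Gaussian model concrete, here is a minimal sketch that computes the class posteriors by hand with NumPy and compares the result with GaussianNB. It estimates a prior, a mean and a variance per class and attribute, then sums the Gaussian log-likelihoods. This is an illustration rather than sklearn's actual implementation: GaussianNB also applies a small variance-smoothing term (its var_smoothing parameter), which is omitted here, so the two could in principle disagree on borderline samples.

```python
import numpy as np
from sklearn import datasets
from sklearn.naive_bayes import GaussianNB

iris = datasets.load_iris()
x, y = iris.data, iris.target
classes = np.unique(y)

# Per-class statistics estimated from the data
priors = np.array([np.mean(y == c) for c in classes])            # P(c), shape (3,)
means = np.array([x[y == c].mean(axis=0) for c in classes])      # shape (3, 4)
variances = np.array([x[y == c].var(axis=0) for c in classes])   # shape (3, 4)

def log_posterior(samples):
    """Unnormalized log P(c | x) for each sample and class, shape (n, 3)."""
    diff = samples[:, None, :] - means[None, :, :]                # (n, 3, 4)
    log_lik = -0.5 * (np.log(2 * np.pi * variances)[None]
                      + diff ** 2 / variances[None])              # (n, 3, 4)
    # Naive assumption: sum the per-attribute log-likelihoods
    return np.log(priors)[None, :] + log_lik.sum(axis=2)

manual_pred = classes[np.argmax(log_posterior(x), axis=1)]
sklearn_pred = GaussianNB().fit(x, y).predict(x)

agreement = np.mean(manual_pred == sklearn_pred)
print("agreement with GaussianNB:", agreement)
```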

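Strictly speaking, one should first run a hypothesis test to check whether the samples follow a Gaussian distribution. That step is skipped here; instead, histograms give a visual impression of how the samples are distributed.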

from sklearn import datasets
from sklearn.naive_bayes import GaussianNB
import numpy as np
from matplotlib import pyplot as plt

if __name__ == '__main__':

    iris = datasets.load_iris()
    print(type(iris), dir(iris))

    x = iris.data      # feature matrix, shape (150, 4)
    y = iris.target    # class labels 0, 1, 2

    # Plot a histogram of each attribute (columns) for each class (rows)
    c = np.unique(y)
    ind = [y == label for label in c]  # one boolean mask per class
    bin_num = 40
    fig, axes = plt.subplots(len(c), 4)
    for i, ax in enumerate(axes.flat):
        ind_ = ind[i // 4]  # row index selects the class
        j = i % 4           # column index selects the attribute
        ax.hist(x[ind_, j], bins=bin_num)

    # Label the rows by class and the columns by attribute
    for k in range(len(c)):
        axes[k, 0].set_ylabel("y = %d" % k)
    for j in range(4):
        axes[0, j].set_title("attribute %d" % j)
    plt.show()

The histograms suggest that, within each class, the attributes are roughly unimodal and approximately Gaussian.
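The hypothesis test that was skipped could be sketched as follows. This example assumes SciPy is available and uses the Shapiro-Wilk test (scipy.stats.shapiro), one of several possible normality tests, on every class and attribute. The 0.05 significance level is a conventional choice assumed here, not something taken from the original post.

```python
import numpy as np
from scipy import stats
from sklearn import datasets

iris = datasets.load_iris()
x, y = iris.data, iris.target
classes = np.unique(y)

# p_values[k, j]: Shapiro-Wilk p-value for attribute j within class k
p_values = np.zeros((len(classes), x.shape[1]))
for k, c in enumerate(classes):
    for j in range(x.shape[1]):
        _, p_values[k, j] = stats.shapiro(x[y == c, j])

alpha = 0.05  # significance level, a conventional choice assumed here
print(np.round(p_values, 3))
print("normality rejected at alpha:")
print(p_values < alpha)
```

A small p-value indicates a departure from normality. With only 50 samples per class and coarsely recorded measurements, some attributes may fail the test even when the histogram looks roughly bell-shaped, so the result is best read alongside the plots rather than in place of them.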

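Next, the dataset is split into training and test sets, the classifier is trained, and it is evaluated on the test set.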

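    # Randomly split the samples into a training set and a test set
    num = x.shape[0]  # total number of samples
    ratio = 7/3  # split ratio, training samples : test samples
    num_test = int(num/(1+ratio))  # number of test samples
    num_train = num - num_test  # number of training samples
    index = np.arange(num)  # sample indices
    np.random.shuffle(index)  # shuffle the indices
    x_test = x[index[:num_test],:]  # the first num_test shuffled samples form the test set
    y_test = y[index[:num_test]]
    x_train = x[index[num_test:],:]  # the remaining samples form the training set
    y_train = y[index[num_test:]]

    gnb = GaussianNB()
    gnb.fit(x_train, y_train)
    y_test_pre = gnb.predict(x_test)

    # Compute the classification accuracy
    acc = sum(y_test_pre==y_test)/num_test
    print('The accuracy is', acc)  # print the test accuracy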

The classification result (it varies from run to run because the split is random):

The accuracy is 0.9111111111111111
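Since a single random split gives a somewhat noisy estimate, the same experiment can be sketched with sklearn's own helpers. This example uses train_test_split with a fixed seed for a reproducible 70/30 split and cross_val_score to average over several splits. The seed value, the stratified split and the choice of 5 folds are assumptions made for illustration.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.naive_bayes import GaussianNB

iris = datasets.load_iris()
x, y = iris.data, iris.target

# 70/30 split; random_state=0 is an arbitrary seed chosen for reproducibility,
# and stratify=y keeps the class proportions equal in both subsets
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.3, random_state=0, stratify=y)

gnb = GaussianNB().fit(x_train, y_train)
acc = gnb.score(x_test, y_test)  # fraction of correct test predictions
print("hold-out accuracy:", acc)

# 5-fold cross-validation averages the accuracy over five different splits
cv_scores = cross_val_score(GaussianNB(), x, y, cv=5)
print("cross-validated accuracy: %.3f +/- %.3f"
      % (cv_scores.mean(), cv_scores.std()))
```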
