Naive Bayes Classifier
For the background theory, see: http://blog.csdn.net/batuwuhanpei/article/details/51910349
- Import the modules
import numpy as np
from sklearn.naive_bayes import GaussianNB
from func_plot_classifier import plot_classifier
- Load the data
input_file= 'data_multivar.txt'
x = []
y = []
with open(input_file,'r') as f:
    for line in f:
        # every field is numeric; the last one is the class label
        data=[float(i) for i in line.split(',')]
        x.append(data[:-1])
        y.append(data[-1])
x=np.array(x)
y=np.array(y)
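As a possible shortcut, the same comma-separated layout can be read in one call with np.loadtxt. This is only a sketch: it assumes every field is numeric and that the label sits in the last column, and the inline sample text is a made-up stand-in for data_multivar.txt.

```python
import io
import numpy as np

# Hypothetical stand-in for data_multivar.txt: two features, then the class label.
sample = io.StringIO("1.0,2.0,0\n3.5,4.1,1\n5.2,0.3,2\n")

# loadtxt parses every comma-separated field as a float in a single call.
data = np.loadtxt(sample, delimiter=',')

x = data[:, :-1]   # all columns except the last are features
y = data[:, -1]    # the last column is the class label

print(x.shape, y.shape)  # (3, 2) (3,)
```

With the real file, passing the file name instead of the StringIO object should work the same way.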
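- Split the data into a training set and a test set
from sklearn import model_selection as cross_validation  # sklearn.cross_validation was removed in scikit-learn 0.20
x_train,x_test,y_train,y_test= cross_validation.train_test_split(x,y,test_size=0.25,random_state=5)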
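To see what the split produces, here is a small sketch on synthetic data (the arrays x_demo and y_demo are made up for illustration). It also passes stratify, which the post itself does not use; that option is an addition here and keeps the class proportions roughly equal in both parts.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption): 100 samples, 2 features, 4 classes.
rng = np.random.RandomState(0)
x_demo = rng.rand(100, 2)
y_demo = np.repeat([0, 1, 2, 3], 25)

# test_size=0.25 holds out a quarter of the samples;
# stratify=y_demo (not in the original post) balances the classes across the split.
x_tr, x_te, y_tr, y_te = train_test_split(
    x_demo, y_demo, test_size=0.25, random_state=5, stratify=y_demo)

print(x_tr.shape, x_te.shape)  # (75, 2) (25, 2)
print(np.bincount(y_te))       # test samples per class
```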
- Build a Naive Bayes classifier
gaussiannb_classifier = GaussianNB()
gaussiannb_classifier.fit(x_train,y_train)
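For intuition about what fit and predict do, the following is a simplified sketch of Gaussian Naive Bayes in plain NumPy. It is not scikit-learn's actual implementation (for instance, it leaves out the small variance smoothing term that GaussianNB adds), and the helper names fit_gaussian_nb and predict_gaussian_nb as well as the synthetic data are made up for this example.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in data (an assumption): two well-separated 2-D blobs.
rng = np.random.RandomState(0)
x_demo = np.vstack([rng.normal(0.0, 1.0, (50, 2)),
                    rng.normal(4.0, 1.0, (50, 2))])
y_demo = np.repeat([0, 1], 50)

def fit_gaussian_nb(x, y):
    """Estimate the per-class prior, feature means and feature variances."""
    classes = np.unique(y)
    priors = np.array([np.mean(y == c) for c in classes])
    means = np.array([x[y == c].mean(axis=0) for c in classes])
    variances = np.array([x[y == c].var(axis=0) for c in classes])
    return classes, priors, means, variances

def predict_gaussian_nb(model, x):
    """Pick the class with the largest log prior plus summed log likelihood."""
    classes, priors, means, variances = model
    # Broadcast to shape (n_samples, n_classes, n_features).
    diff = x[:, None, :] - means[None, :, :]
    log_lik = -0.5 * (np.log(2 * np.pi * variances)[None] + diff ** 2 / variances[None])
    # "Naive" assumption: features are independent, so their log likelihoods add up.
    scores = np.log(priors)[None, :] + log_lik.sum(axis=2)
    return classes[np.argmax(scores, axis=1)]

manual = predict_gaussian_nb(fit_gaussian_nb(x_demo, y_demo), x_demo)
library = GaussianNB().fit(x_demo, y_demo).predict(x_demo)
print("agreement with GaussianNB:", (manual == library).mean())
```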
- Compute the classifier's accuracy
y_predict = gaussiannb_classifier.predict(x_test)
accuracy= 100*(y_predict==y_test).sum()/x_test.shape[0]
print("Accuracy of the GaussianNB Classifier: ",round(accuracy,2),'%')
Output:
Accuracy of the GaussianNB Classifier: 98.0 %
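The hand-written accuracy formula is equivalent to sklearn.metrics.accuracy_score. A minimal check, using made-up label arrays rather than the post's data:

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical labels, for illustration only.
y_true_demo = np.array([0, 1, 2, 2, 1, 0, 3, 3])
y_pred_demo = np.array([0, 1, 2, 1, 1, 0, 3, 2])

# Manual formula: share of predictions that match the true label.
manual = 100 * (y_pred_demo == y_true_demo).sum() / y_true_demo.shape[0]
# Library helper: returns the same fraction in the range 0..1.
library = 100 * accuracy_score(y_true_demo, y_pred_demo)

print(manual, library)  # 75.0 75.0
```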
- Plot the result
plot_classifier(gaussiannb_classifier,x_test,y_test)
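The plot_classifier helper comes from a local module, func_plot_classifier, whose source is not shown here. The sketch below is a guess at what such a helper might do, not the author's code: it assumes the data has exactly two features, predicts a class for every point of a mesh grid, and colours the regions. The names decision_grid and plot_classifier_sketch are hypothetical.

```python
import numpy as np

def decision_grid(classifier, x, step=0.05, margin=1.0):
    """Predict a class for every point of a mesh covering the data range."""
    x_min, x_max = x[:, 0].min() - margin, x[:, 0].max() + margin
    y_min, y_max = x[:, 1].min() - margin, x[:, 1].max() + margin
    xx, yy = np.meshgrid(np.arange(x_min, x_max, step),
                         np.arange(y_min, y_max, step))
    zz = classifier.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
    return xx, yy, zz

def plot_classifier_sketch(classifier, x, y):
    """Hypothetical stand-in for plot_classifier: coloured regions plus data points."""
    import matplotlib.pyplot as plt  # imported here so decision_grid works without matplotlib
    xx, yy, zz = decision_grid(classifier, x)
    plt.figure()
    plt.pcolormesh(xx, yy, zz, cmap=plt.cm.Pastel1, shading='auto')
    plt.scatter(x[:, 0], x[:, 1], c=y, edgecolors='black', cmap=plt.cm.Paired)
    plt.xlim(xx.min(), xx.max())
    plt.ylim(yy.min(), yy.max())
    plt.show()
```

Under those assumptions it would be called the same way as the original helper, for example plot_classifier_sketch(gaussiannb_classifier, x_test, y_test).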
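Cross-Validation
- Background
Suppose there are 100 samples, 83 of which actually pass. The classifier labels 73 of them as passing, but only 65 of those 73 really pass.
Precision = 65/73
Recall = 65/83
F1 score = 2 * precision * recall / (precision + recall)
Precision and recall pull against each other: tuning a classifier to raise one usually lowers the other, so the F1 score (their harmonic mean) is used to summarize both in a single number.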
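The arithmetic for this example can be worked through in a few lines. The variable names are chosen here for illustration; the numbers are the ones given in the example.

```python
true_positives = 65       # labelled as passing and actually passing
predicted_positives = 73  # everything the classifier labelled as passing
actual_positives = 83     # everything that really passes

precision = true_positives / predicted_positives
recall = true_positives / actual_positives
f1 = 2 * precision * recall / (precision + recall)

print(round(precision, 3), round(recall, 3), round(f1, 3))  # 0.89 0.783 0.833
```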
- Compute the cross-validated metrics
from sklearn.model_selection import cross_val_score  # replaces the removed sklearn.cross_validation module
num_validation=5
accuracy = cross_val_score(gaussiannb_classifier,x,y,scoring='accuracy',cv=num_validation)
print("Accuracy = ",round(100*accuracy.mean(),5),'%')
precision = cross_val_score(gaussiannb_classifier,x,y,scoring='precision_weighted',cv=num_validation)
print("Precision = ",round(100*precision.mean(),5),'%')
recall = cross_val_score(gaussiannb_classifier,x,y,scoring='recall_weighted',cv=num_validation)
print("Recall = ",round(100*recall.mean(),5),'%')
F1 = cross_val_score(gaussiannb_classifier,x,y,scoring='f1_weighted',cv=num_validation)
print("F1 = ",round(100*F1.mean(),5),'%')
Output:
Accuracy = 99.5 %
Precision = 99.52381 %
Recall = 99.5 %
F1 = 99.49969 %
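The four separate calls refit the model four times. As an alternative sketch (not what the post does), cross_validate from sklearn.model_selection accepts a list of scorers and evaluates them all in one pass. The data below is synthetic and made up for the example.

```python
import numpy as np
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in data (an assumption): four 2-D blobs, 50 samples each.
rng = np.random.RandomState(0)
centers = np.array([[0, 0], [5, 0], [0, 5], [5, 5]])
x_demo = np.vstack([rng.normal(c, 1.0, (50, 2)) for c in centers])
y_demo = np.repeat([0, 1, 2, 3], 50)

scoring = ['accuracy', 'precision_weighted', 'recall_weighted', 'f1_weighted']
# One 5-fold run; each scorer's per-fold results land under the key 'test_<name>'.
results = cross_validate(GaussianNB(), x_demo, y_demo, scoring=scoring, cv=5)

for name in scoring:
    print(name, '=', round(100 * results['test_' + name].mean(), 5), '%')
```

Support-weighted recall is mathematically the same quantity as accuracy, which is why those two figures coincide in the output above.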
- Get a performance report directly
from sklearn.metrics import classification_report
names=['Class 0','Class 1','Class 2','Class 3']
print(classification_report(y_test,y_predict,target_names=names))
Output:
             precision    recall  f1-score   support

    Class 0       0.96      1.00      0.98        23
    Class 1       1.00      0.95      0.98        21
    Class 2       0.97      1.00      0.99        34
    Class 3       1.00      0.95      0.98        22

avg / total       0.98      0.98      0.98       100
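The "avg / total" row is a support-weighted average of the per-class rows. A quick check using the rounded values from the report (so the result is only accurate to two decimals):

```python
import numpy as np

# Per-class values copied from the report.
precision = np.array([0.96, 1.00, 0.97, 1.00])
recall = np.array([1.00, 0.95, 1.00, 0.95])
f1 = np.array([0.98, 0.98, 0.99, 0.98])
support = np.array([23, 21, 34, 22])

for name, values in [('precision', precision), ('recall', recall), ('f1-score', f1)]:
    # Each class contributes in proportion to its number of test samples.
    weighted = float(np.average(values, weights=support))
    print(name, round(weighted, 2))  # 0.98 for each of the three metrics
```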