Solving Multi-Class Classification Problems with XGBoost

A Few Words Up Front

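The official XGBoost example for binary classification distinguishes poisonous mushrooms from edible ones; the dataset and code are both in the demo folder of xgboost. I installed XGBoost through Anaconda, and getting the example to run was fairly easy. The only snag was that running the given command, ../../xgboost mushroom.conf, in the terminal raised an error caused by the path setup. I simply copied xgboost.exe from the xgboost folder into the folder that contains the mushroom.conf configuration file, so that after changing into that folder the example runs with: xgboost mushroom.conf. The data preprocessing (data wrangling) code of the binary example is a useful reference and worth reading.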
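As a rough illustration of the kind of data wrangling involved, the sketch below one-hot encodes categorical records and writes them as LibSVM-style text lines, the sparse format that the command-line tool reads. The function name `to_libsvm`, the three sample rows, and the feature numbering are assumptions made for this illustration; this is not the demo's actual script.

```python
def to_libsvm(rows):
    """Convert rows of categorical strings into LibSVM-format lines.

    rows: list of lists; element 0 is the class ('p' = poisonous, 'e' = edible),
    the remaining elements are categorical attribute values.
    Returns (lines, feature_map), where feature_map maps (column, value) to a
    feature index assigned in order of first appearance.
    """
    feature_map = {}
    lines = []
    for row in rows:
        label = 1 if row[0] == 'p' else 0
        indices = []
        for col, value in enumerate(row[1:]):
            key = (col, value)
            if key not in feature_map:
                feature_map[key] = len(feature_map)
            indices.append(feature_map[key])
        # Each active one-hot feature is written as "index:1".
        parts = [str(label)] + ['%d:1' % i for i in sorted(indices)]
        lines.append(' '.join(parts))
    return lines, feature_map


# Made-up sample records (label followed by three categorical attributes).
sample = [
    ['p', 'x', 's', 'n'],
    ['e', 'x', 'y', 'y'],
    ['e', 'b', 's', 'w'],
]
lines, fmap = to_libsvm(sample)
for line in lines:
    print(line)
```

Each distinct (column, value) pair gets its own binary feature, so a row only lists the features that are switched on.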

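The multi-class example identifies 6 skin diseases from 34 features. Running runexp.sh in the terminal did nothing and reported no error, so I downloaded the dataset into the corresponding demo folder myself. The main code is below; I added comments to some of the harder-to-follow statements in the original so that it reads more smoothly.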


#!/usr/bin/python
import numpy as np
import xgboost as xgb

# Labels must be integers from 0 to num_class - 1.
# Column 33 contains '?' for missing values: map it to 1 if missing, else 0.
# Column 34 is the class label (1-6 in the file): subtract 1 to get 0-5.
# The membership test accepts both str and bytes, because some NumPy
# versions pass bytes to converters.
data = np.loadtxt('./dermatology.data', delimiter=',',
                  converters={33: lambda x: int(x in ('?', b'?')),
                              34: lambda x: int(x) - 1})
sz = data.shape

train = data[:int(sz[0] * 0.7), :]  # rows 1-256 form the training set
test = data[int(sz[0] * 0.7):, :]   # rows 257-366 form the test set

train_X = train[:, 0:33]  # columns 0-32 are used as features
train_Y = train[:, 34]    # column 34 is the label
test_X = test[:, 0:33]
test_Y = test[:, 34]

xg_train = xgb.DMatrix(train_X, label=train_Y)
xg_test = xgb.DMatrix(test_X, label=test_Y)

# Set up the XGBoost parameters.
param = {}
param['objective'] = 'multi:softmax'  # softmax multi-class classification
param['eta'] = 0.1        # learning rate (step-size shrinkage)
param['max_depth'] = 6    # maximum depth of each tree
param['silent'] = 1       # suppress messages; newer releases use 'verbosity'
param['nthread'] = 4      # number of threads
param['num_class'] = 6    # number of classes

watchlist = [(xg_train, 'train'), (xg_test, 'test')]
num_round = 5
bst = xgb.train(param, xg_train, num_round, evals=watchlist)

# Predict class labels directly.
pred = bst.predict(xg_test)
print('predicting, classification error=%f' % np.mean(pred != test_Y))

# Do the same thing again, but output probabilities.
param['objective'] = 'multi:softprob'
bst = xgb.train(param, xg_train, num_round, evals=watchlist)

# Note: the output convention has changed since xgboost-unity.
# Where predict returns a flat 1-D array, reshape it to (ndata, nclass).
yprob = bst.predict(xg_test).reshape(test_Y.shape[0], 6)
ylabel = np.argmax(yprob, axis=1)  # index of the highest probability per row
print('predicting, classification error=%f' % np.mean(ylabel != test_Y))
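The loading step in the listing can be tried without the real data file. The sketch below applies the same two conversions (a missing-value flag for the '?' column and a label shifted down by 1) and the same 70/30 ordered split to a few made-up rows. The helper names `parse_row` and `split_rows`, the four-column layout, and the sample values are all assumptions for illustration; the real file has 35 columns.

```python
def parse_row(line, missing_col, label_col):
    """Parse one comma-separated record into a list of floats."""
    row = []
    for i, field in enumerate(line.strip().split(',')):
        if i == missing_col:
            # 1.0 if the value is missing ('?'), otherwise 0.0
            row.append(1.0 if field == '?' else 0.0)
        elif i == label_col:
            # labels in the file start at 1; shift them to start at 0
            row.append(float(int(field) - 1))
        else:
            row.append(float(field))
    return row


def split_rows(rows, train_fraction=0.7):
    """Ordered split: the first part is for training, the rest for testing."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


# Made-up records: two features, a column that may be '?', and a label in 1-6.
raw = ["2,1,55,2", "0,3,?,6", "1,1,30,1", "3,0,?,4", "2,2,41,3", "1,0,27,5"]
rows = [parse_row(line, missing_col=2, label_col=3) for line in raw]
train, test = split_rows(rows)
print(len(train), len(test))
print(rows[1])
```

Because the split is ordered rather than shuffled, the test set is simply the tail of the file, which is worth keeping in mind when reading the error rates.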

Running the full script gives the following output:


[0] train-merror:0.011719   test-merror:0.127273
[1] train-merror:0.015625   test-merror:0.127273
[2] train-merror:0.011719   test-merror:0.109091
[3] train-merror:0.007812   test-merror:0.081818
[4] train-merror:0.007812   test-merror:0.090909
predicting, classification error=0.090909

[0] train-merror:0.011719   test-merror:0.127273
[1] train-merror:0.015625   test-merror:0.127273
[2] train-merror:0.011719   test-merror:0.109091
[3] train-merror:0.007812   test-merror:0.081818
[4] train-merror:0.007812   test-merror:0.090909
predicting, classification error=0.090909
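The merror values in the log are multi-class error rates, that is, the fraction of rows whose predicted class differs from the label. Assuming the 256/110 train/test split noted in the listing's comments, the short check below recovers the number of misclassified rows behind each reported figure. The helper name `misclassified` is made up for this illustration.

```python
def misclassified(error_rate, n_rows):
    """Recover the integer count of wrong predictions from an error rate."""
    return int(round(error_rate * n_rows))


n_train, n_test = 256, 110
train_errors = [0.011719, 0.015625, 0.011719, 0.007812, 0.007812]
test_errors = [0.127273, 0.127273, 0.109091, 0.081818, 0.090909]

train_counts = [misclassified(e, n_train) for e in train_errors]
test_counts = [misclassified(e, n_test) for e in test_errors]
print(train_counts)  # [3, 4, 3, 2, 2]
print(test_counts)   # [14, 14, 12, 9, 10]
```

Read this way, the final model gets 10 of the 110 test rows wrong, which matches the reported classification error of 0.090909.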

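Whether the model returns the diagnosis class directly or returns the probability of each class and then takes the index of the most probable one, the result is the same.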
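One way to see why the two routes agree: softmax is order-preserving, so the class with the highest raw score is also the class with the highest probability. The sketch below checks this on a few invented score vectors for 6 classes. The scores and the helper names `softmax` and `argmax` are assumptions for illustration and are not taken from the trained model.

```python
import math


def softmax(scores):
    """Turn raw per-class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])


# Invented raw scores: one row per sample, one column per class.
raw_scores = [
    [0.2, 1.5, -0.3, 0.0, 0.9, -1.1],
    [2.1, 0.1, 0.4, -0.5, 0.3, 0.0],
    [-0.7, -0.2, 0.1, 0.0, -0.4, 1.8],
]
hard_labels = [argmax(row) for row in raw_scores]
prob_labels = [argmax(softmax(row)) for row in raw_scores]
print(hard_labels, prob_labels)
```

Since both training runs in the listing use the same data and parameters and differ only in the output form of the objective, identical error rates are what one would expect.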

Michael_Shentu
