XGBoost Hands-On Introduction - Basic Usage of xgboost

Original article

2020-06-16 10:36

"""
In this section
* Train a poisonous-mushroom classifier using XGBoost directly
"""
import xgboost as xgb
from sklearn.metrics import accuracy_score  # computes classification accuracy
import time
from matplotlib import pyplot
import graphviz

"""
Overall workflow
1. Build the learner
2. Train the model
3. Predict
"""

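# Load the data
"""
* XGBoost can load text data in libsvm format.
* The libsvm file format (sparse features) looks like this:
* 1 3:1 10:1 11:1 21:1 30:1 34:1
* 0 3:1 10:1 20:1 21:1 23:1 34:1
* The leading 0/1 is the label (negative/positive sample). A value in [0,1] is also accepted as a label, interpreted as the probability that the sample is positive.
* In each key:value pair, the number before the colon (e.g. 3, 10) is the feature index and the number after it (here always 1) is the feature value.
* XGBoost stores loaded data in a DMatrix object. DMatrix is XGBoost's own data-matrix class, optimized for storage and computation speed.
"""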
work_path = '../data/'
dtrain = xgb.DMatrix(work_path + 'agaricus.txt.train')
dtest = xgb.DMatrix(work_path + 'agaricus.txt.test')

# Inspect the data
print(dtrain.num_col())  # 127
print(dtrain.num_row())  # 6513
print(dtest.num_row())  # 1611

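# Set the training parameters
param = {'max_depth': 2,
         'eta': 1,
         'objective': 'binary:logistic'}
"""
* max_depth: maximum depth of a tree, default 6, range [0, ∞], where 0 means no limit
* eta: step-size shrinkage used in each update to prevent overfitting. After each boosting step the weights of the new features are obtained directly, and eta shrinks them to make boosting more conservative. Default 0.3, range [0, 1]
* objective: the learning task and its objective. 'binary:logistic' is logistic regression for binary classification; the output is a probability
"""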

# Train the model
num_boost_round = 2   # number of boosting iterations
start_time = time.process_time()
bst = xgb.train(param, dtrain, num_boost_round)
end_time = time.process_time()
print(end_time - start_time)    # CPU time spent training, in seconds (0.0215 in the original run)

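# Check the model's accuracy on the training set
# Get the predictions
train_preds = bst.predict(dtrain)
# XGBoost outputs probabilities. Mushroom classification is a binary problem, so each output is the
# probability that the sample belongs to the positive class (label 1). Convert it to 0 or 1.
train_predictions = [round(value) for value in train_preds]
# Get the labels (this is supervised learning, so every sample has one)
y_train = dtrain.get_label()
# Compute the accuracy of the predictions against the labels
train_accuracy = accuracy_score(y_train, train_predictions)
print("Train Accuracy: %.2f%%" % (train_accuracy * 100.0))       # 97.77%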

# Evaluate on the test data
test_preds = bst.predict(dtest)
test_predictions = [round(value) for value in test_preds]
y_test = dtest.get_label()
test_accuracy = accuracy_score(y_test, test_predictions)
print("Test Accuracy: %.2f%%" % (test_accuracy * 100.0))     # 97.83%

# Visualize the model
"""
* Call plot_tree from the XGBoost package to display a tree. Visualizing the model requires the graphviz package to be installed.
"""
xgb.plot_tree(bst, num_trees=0, rankdir='LR')
pyplot.show()

"""
* max_depth = 2, num_boost_round = 2
* train accuracy: 97.77%, test accuracy: 97.83%

* max_depth = 3, num_boost_round = 2
* train accuracy: 99.88%, test accuracy: 100%
"""
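
To make the libsvm text format used by the data files more concrete, here is a minimal sketch of how one line could be split into a label and a sparse feature mapping. The helper name parse_libsvm_line is hypothetical and exists only for illustration; it is not part of XGBoost, and DMatrix does its own parsing internally. The sketch assumes well-formed lines with no comments or qid fields.

```python
def parse_libsvm_line(line):
    """Parse one libsvm-format line into (label, {feature_index: value}).

    Hypothetical helper for illustration only; not an XGBoost API.
    """
    parts = line.split()
    label = float(parts[0])          # leading field is the label (0/1 or a probability)
    features = {}
    for item in parts[1:]:
        key, value = item.split(":", 1)
        features[int(key)] = float(value)   # only non-zero features are listed
    return label, features


# The first sample line from the format description above
label, features = parse_libsvm_line("1 3:1 10:1 11:1 21:1 30:1 34:1")
print(label, sorted(features))
```

Features that do not appear on a line are simply absent from the mapping, which is what makes the format compact for sparse data such as the one-hot encoded mushroom attributes.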
