XGBoost Quick Start

Original

lingerlanlan

2020-02-26 07:02


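XGBoost is an implementation of the GBDT (gradient boosted decision trees) algorithm. It handles regression, classification, and ranking, can be called from several languages, and runs either on a single machine or distributed. It is well suited to large datasets.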

Project home page

https://github.com/dmlc/xgboost

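Installation: https://github.com/dmlc/xgboost/blob/master/doc/python/python_intro.md

I chose to call XGBoost from Python.

1 Download the source code from the project home page and unpack it.

2 In the unpacked directory, run make to build.

3 In the python-package subdirectory, run python setup.py install.

Your machine may be missing some dependencies that you will need to install first. For example, step 2 requires g++, and step 3 requires some Python numerical libraries.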
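To check which Python dependencies are missing before or after step 3, a small sketch like the following may help. The module names numpy and scipy are my assumption for the "numerical libraries"; the post does not name them. The helper missing_modules is hypothetical and only uses the standard library.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that this interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed dependency list: numpy and scipy stand in for the numerical
    # libraries; xgboost is included to confirm that step 3 succeeded.
    missing = missing_modules(["numpy", "scipy", "xgboost"])
    print("missing:", missing if missing else "none")
```

If xgboost still shows up as missing after step 3, the install most likely went to a different Python interpreter than the one running the check.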

Classification in practice

https://github.com/dmlc/xgboost/tree/master/demo/guide-python

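This page has many demos that are worth studying.

Below is a concrete walkthrough of a binary classification problem.

First, the input data can be in libsvm format, which is also a format I rather like.

Each line has the form

label index1:value1 index2:value2 ...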
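To make the line format concrete, here is a minimal sketch of a parser for one such line. The function name parse_libsvm_line is hypothetical; XGBoost reads these files itself, so this is only an illustration of the layout.

```python
def parse_libsvm_line(line):
    """Parse 'label index:value ...' into (label, {index: value})."""
    parts = line.split()
    label = float(parts[0])
    features = {}
    for token in parts[1:]:
        index, value = token.split(":", 1)
        features[int(index)] = float(value)
    return label, features


label, features = parse_libsvm_line("1 3:0.5 7:2 10:-1.25")
print(label, features)  # 1.0 {3: 0.5, 7: 2.0, 10: -1.25}
```

Only the features that appear on the line are stored, which is why the format works well for sparse data.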

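XGBoost does have one requirement for labels, though: they must start from 0.

For binary classification, the labels can only be 0 and 1.

For 3-class classification, the labels can only be 0, 1, and 2.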
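If your data uses other label values (for example -1/1, or classes numbered from 1), one way to satisfy this requirement is to remap them before writing the libsvm file. This is a sketch with a hypothetical helper remap_labels, not something XGBoost provides.

```python
def remap_labels(labels):
    """Map arbitrary class labels to 0..k-1, ordered by the sorted original labels."""
    mapping = {old: new for new, old in enumerate(sorted(set(labels)))}
    return [mapping[y] for y in labels], mapping


new_labels, mapping = remap_labels([-1, 1, 1, -1])
print(new_labels)  # [0, 1, 1, 0]
print(mapping)     # {-1: 0, 1: 1}
```

Keep the returned mapping around so that predictions can be translated back to the original label values.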

#!/usr/bin/env python
import numpy as np
import xgboost as xgb

# Load training and test data from libsvm-format text files.
# Newer XGBoost releases may require an explicit format suffix,
# e.g. 'train.txt?format=libsvm'.
dtrain = xgb.DMatrix('train.txt')
dtest = xgb.DMatrix('test.txt')

# Specify parameters via a dict; definitions are the same as in the C++ version.
# 'silent' comes from older releases; newer ones use 'verbosity' instead.
param = {'max_depth': 22, 'eta': 0.1, 'silent': 0, 'objective': 'binary:logistic',
         'min_child_weight': 3, 'gamma': 14}

# Specify the validation sets to watch performance during training.
watchlist = [(dtest, 'eval'), (dtrain, 'train')]
num_round = 33
bst = xgb.train(param, dtrain, num_round, evals=watchlist)

# Predict positive-class probabilities, then threshold at 0.5.
preds = bst.predict(dtest)
labels = dtest.get_label()
pred_labels = (preds > 0.5).astype(int)
print('error=%f' % np.mean(pred_labels != labels))
print('correct=%f' % np.mean(pred_labels == labels))
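The script above expects train.txt and test.txt in libsvm format. If you have no data at hand, a sketch like the following can generate toy files to try it out. Everything here is an assumption for illustration: the helper write_toy_libsvm, the five random features, the 1-based feature indices, and the rule that sets the label from the first feature.

```python
import os
import random
import tempfile


def write_toy_libsvm(path, n_rows, n_features=5, seed=0):
    """Write n_rows lines of 'label index:value ...' with labels in {0, 1}."""
    rng = random.Random(seed)
    with open(path, "w") as f:
        for _ in range(n_rows):
            values = [round(rng.random(), 4) for _ in range(n_features)]
            # Hypothetical labeling rule: 1 when the first feature exceeds 0.5.
            label = 1 if values[0] > 0.5 else 0
            # Feature indices are written starting at 1 (an assumption).
            feats = " ".join("%d:%.4f" % (i + 1, v) for i, v in enumerate(values))
            f.write("%d %s\n" % (label, feats))


out_dir = tempfile.mkdtemp()
write_toy_libsvm(os.path.join(out_dir, "train.txt"), 200, seed=0)
write_toy_libsvm(os.path.join(out_dir, "test.txt"), 50, seed=1)
print("wrote toy files to", out_dir)
```

Copy the two files next to the script (or adjust the paths passed to xgb.DMatrix) before running it.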

Author: linger

Link to this article: http://blog.csdn.net/lingerlanlan/article/details/49804551
