Hands-on xgboost project

Original

2020-02-21 17:42

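import time

import xgboost as xgb
from sklearn.model_selection import train_test_split

start_time = time.time()
offline = 0  # flags from the original project; not used in this snippet
online = 0

# Parameter set passed to xgb.train below: pairwise ranking objective, evaluated by AUC
params = {'booster': 'gbtree',
          'objective': 'rank:pairwise',
          'eval_metric': 'auc',
          'gamma': 0.1,
          'min_child_weight': 1.1,
          'max_depth': 7,
          'lambda': 10,
          'subsample': 0.7,
          'colsample_bytree': 0.7,
          'colsample_bylevel': 0.7,
          'eta': 0.01,
          'tree_method': 'exact',
          'seed': 1000,
          'nthread': 12
          }

# Alternative parameter set (binary classification); defined here but not used below
params1 = {
    'booster': 'gbtree',
    'objective': 'binary:logistic',
    'scale_pos_weight': 1 / 7.5,
    # 7183 positive samples
    # 55596 samples in total
    # roughly 1:7.7 (positives : all samples)
    # Note: the XGBoost docs suggest sum(negative) / sum(positive), here 48413 / 7183 ≈ 6.74;
    # a value below 1 such as 1 / 7.5 down-weights the positive class instead.
    'gamma': 0.2,  # minimum loss reduction required to split a node (controls pruning); larger is more conservative, typically 0.1 or 0.2
    'max_depth': 8,  # maximum tree depth; deeper trees overfit more easily
    'lambda': 3,  # L2 regularization on leaf weights; larger values make overfitting less likely
    'subsample': 0.7,  # fraction of training rows randomly sampled for each tree
    # 'colsample_bytree': 0.7,  # fraction of columns sampled for each tree
    'min_child_weight': 3,
    # Default is 1: the minimum sum of hessians h required in a leaf. For an imbalanced 0-1
    # classification task with h around 0.01, min_child_weight = 1 means a leaf needs at least
    # 100 samples. This parameter strongly affects results: smaller values overfit more easily.
    'verbosity': 1,  # replaces the deprecated 'silent'; 0 = silent, 1 = warnings, 2 = info, 3 = debug
    'eta': 0.03,  # learning rate (shrinkage per round)
    'seed': 1000,
    'nthread': 12,  # number of CPU threads
    'eval_metric': 'auc'
}

train = tabel  # `tabel`: feature DataFrame built earlier in the project (not shown in this post)
num_rounds = 7000  # maximum number of boosting rounds

y = train['标签']  # label column
X = train.drop(['标签', '用户标识'], axis=1)  # drop the label and user-ID columns
# X = train[feature_list]

# Hold out a validation set (assumed 80/20 split): early stopping on the training set itself
# says nothing about overfitting, because the training AUC keeps rising.
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, stratify=y, random_state=1000)
xgb_train = xgb.DMatrix(X_tr, label=y_tr)
xgb_val = xgb.DMatrix(X_val, label=y_val)
watchlist = [(xgb_train, 'train'), (xgb_val, 'val')]  # the last entry ('val') drives early stopping
print("Reached xgb.train")
# Train the model.
# early_stopping_rounds: with a large round limit, stop once the validation metric
# has not improved for this many consecutive rounds.
model = xgb.train(params, xgb_train, num_boost_round=num_rounds, evals=watchlist, early_stopping_rounds=500)
print("Reached save_model")
print("best iteration:", model.best_iteration)
model.save_model('20170201_B.model')  # save the trained model (also contains the rounds after best_iteration)
print("elapsed: %.1f s" % (time.time() - start_time))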
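The numbers in the `params1` comments can be checked with a few lines of plain Python. This is a minimal sketch, not part of the original project: `imbalance_ratios`, `logistic_hessian` and `min_samples_per_leaf` are hypothetical helper names, and the hessian formula assumes the `binary:logistic` objective, where each sample contributes h = p * (1 - p) at predicted probability p.

```python
def imbalance_ratios(n_pos: int, n_total: int) -> dict:
    """Class-balance figures from the post: 7183 positives out of 55596 samples."""
    n_neg = n_total - n_pos
    return {
        "n_neg": n_neg,
        "total_per_pos": n_total / n_pos,  # the "roughly 1:7.7" in the comment
        "neg_per_pos": n_neg / n_pos,      # the scale_pos_weight value the XGBoost docs suggest
    }


def logistic_hessian(p: float) -> float:
    """Second derivative of the log loss w.r.t. the raw score, at predicted probability p."""
    return p * (1.0 - p)


def min_samples_per_leaf(min_child_weight: float, h: float) -> float:
    """Samples a leaf needs if every sample has hessian h, since sum(h) >= min_child_weight."""
    return min_child_weight / h


stats = imbalance_ratios(7183, 55596)
print(stats["n_neg"])                        # 48413
print(round(stats["total_per_pos"], 2))      # 7.74
print(round(stats["neg_per_pos"], 2))        # 6.74
print(round(logistic_hessian(0.0101), 4))    # 0.01
print(round(min_samples_per_leaf(1, 0.01)))  # 100
print(round(min_samples_per_leaf(3, 0.01)))  # 300
```

With `min_child_weight: 3` as in `params1`, the same h of about 0.01 pushes the minimum leaf size to roughly 300 samples, which is why this parameter has such a strong effect on imbalanced data.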

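The early-stopping rule behind `early_stopping_rounds=500` can be illustrated without XGBoost. The sketch below is a simplified re-implementation written for illustration, not library code: `early_stop` is a hypothetical helper that scans a per-round list of validation AUC values (higher is better) and stops once `patience` consecutive rounds fail to beat the best score so far. In XGBoost itself, the last entry of `evals` is the one monitored, and the returned booster records the best round in `model.best_iteration`.

```python
from typing import List, Tuple


def early_stop(scores: List[float], patience: int) -> Tuple[int, int]:
    """Return (best_iteration, rounds_trained) for a higher-is-better metric such as AUC.

    scores[i] is the validation metric after boosting round i (0-based).
    """
    best_score = float("-inf")
    best_iter = -1
    for i, score in enumerate(scores):
        if score > best_score:
            # strict improvement: remember this round and restart the patience window
            best_score, best_iter = score, i
        elif i - best_iter >= patience:
            # `patience` rounds since the last improvement: stop after this round
            return best_iter, i + 1
    return best_iter, len(scores)  # round limit reached before patience ran out


# AUC stalls after round 2; with patience 3, training stops after round 5 (6 rounds in total)
print(early_stop([0.70, 0.72, 0.75, 0.74, 0.75, 0.73, 0.76], patience=3))  # (2, 6)
```

The later 0.76 is never reached in this toy run: a larger patience, such as the 500 rounds used above with a small `eta`, makes it less likely to stop before a late improvement, at the cost of training extra rounds.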
Alphapeople
