Model Training and Prediction with sklearn

sklearn is a powerful Python machine learning library with broad support for machine learning algorithms and data preprocessing, and it is widely used in both academia and industry. Below is the basic sklearn coding workflow, followed by usage examples for several algorithms (with classification as the running example).

Three-step workflow for a classification task

  1. Create the model object
  2. Train
  3. Predict and evaluate performance
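
A minimal sketch of these three steps, assuming the built-in iris dataset and a LogisticRegression model (both are illustrative choices, not part of the examples below):

'''
 * Minimal three-step example (iris + LogisticRegression are illustrative assumptions)
'''

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn import metrics

data, labels = load_iris(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(data, labels, test_size=0.3)

# 1. Create the model object
clf = LogisticRegression(max_iter=1000)

# 2. Train
clf.fit(x_train, y_train)

# 3. Predict and evaluate performance
predicted = clf.predict(x_test)
print(metrics.classification_report(y_test, predicted))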

XGBoost classification

'''
 * XGBoost classification
'''

import numpy as np
import time
from sklearn.model_selection import train_test_split
from sklearn import metrics


def main():
    time_begin = time.time()
    # Raw data loading omitted; d is assumed to expose .data and .labels
    data = d.data
    labels = d.labels
    # Standardize the features
    from sklearn.preprocessing import StandardScaler
    data = StandardScaler().fit_transform(data)
    x_train, x_test, y_train, y_test = train_test_split(data, labels, test_size=0.3)

    # 1. Create the model object
    from xgboost import XGBClassifier
    clf = XGBClassifier(learning_rate=0.1,
                        n_estimators=1000,  # number of boosting rounds (trees)
                        max_depth=6,  # maximum tree depth
                        min_child_weight=1,  # minimum instance weight (hessian sum) required in a leaf
                        gamma=0.,  # coefficient on the number of leaves in the regularization term
                        subsample=0.8,  # randomly sample 80% of the training instances per tree
                        colsample_bytree=0.8,  # randomly sample 80% of the features per tree
                        objective='multi:softmax',  # objective (loss) function for multi-class classification
                        scale_pos_weight=1,  # weight scaling for handling class imbalance
                        random_state=27  # random seed
                        )

    # 2. Train (with early stopping on the held-out evaluation set)
    clf = clf.fit(x_train, y_train, eval_set=[(x_test, y_test)], eval_metric="mlogloss", early_stopping_rounds=10,
                  verbose=True)

    # 3. Predict and evaluate performance
    np.set_printoptions(threshold=np.inf)
    predicted = clf.predict(x_test)
    predicted = np.array(predicted)
    print(metrics.classification_report(y_test, predicted))
    print(metrics.confusion_matrix(y_test, predicted))
    time_end = time.time()
    print("total time is ", time_end-time_begin)


# Program entry point
if __name__ == "__main__":
    main()
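
Note that newer xgboost releases (2.0 and later) removed eval_metric and early_stopping_rounds from fit() and take them in the constructor instead. A minimal sketch of the equivalent setup under that API, to the best of my understanding of the library (verify against your installed version):

'''
 * Sketch: the same early-stopping setup on xgboost >= 2.0 (hedged)
'''

from xgboost import XGBClassifier

clf = XGBClassifier(learning_rate=0.1,
                    n_estimators=1000,
                    max_depth=6,
                    objective='multi:softmax',
                    eval_metric='mlogloss',  # moved from fit() to the constructor
                    early_stopping_rounds=10,  # moved from fit() to the constructor
                    random_state=27)

# eval_set is still passed to fit(); training stops when mlogloss on it
# fails to improve for 10 consecutive rounds
clf.fit(x_train, y_train, eval_set=[(x_test, y_test)], verbose=True)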

Random forest classification

n_estimators, the number of trees, is one of the most important tuning parameters for a random forest; a small tuning sketch follows the full example below.

'''
 * Random forest classification
'''

import numpy as np
import time
from sklearn.model_selection import train_test_split
from sklearn import metrics


def main():
    time_begin = time.time()
    # Raw data loading omitted; d is assumed to expose .data and .labels
    data = d.data
    labels = d.labels
    # Standardize the features
    from sklearn.preprocessing import StandardScaler
    data = StandardScaler().fit_transform(data)
    x_train, x_test, y_train, y_test = train_test_split(data, labels, test_size=0.3)

    # 1. Create the model object
    from sklearn.ensemble import RandomForestClassifier
    clf = RandomForestClassifier(n_estimators=100)

    # 2. Train (RandomForestClassifier.fit does not take eval_set or early stopping arguments)
    clf = clf.fit(x_train, y_train)

    # 3. Predict and evaluate performance
    np.set_printoptions(threshold=np.inf)
    predicted = clf.predict(x_test)
    predicted = np.array(predicted)
    print(metrics.classification_report(y_test, predicted))
    print(metrics.confusion_matrix(y_test, predicted))
    time_end = time.time()
    print("total time is ", time_end-time_begin)


# Program entry point
if __name__ == "__main__":
    main()
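
A minimal sketch of tuning n_estimators with a cross-validated grid search, as mentioned above (GridSearchCV and the candidate values are illustrative choices, not part of the original example):

'''
 * Sketch: tuning n_estimators with GridSearchCV (the candidate grid is an assumption)
'''

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

param_grid = {'n_estimators': [50, 100, 200, 500]}  # candidate numbers of trees
search = GridSearchCV(RandomForestClassifier(random_state=27),
                      param_grid, cv=5, scoring='accuracy')
search.fit(x_train, y_train)  # x_train, y_train as prepared in the example above

print("best n_estimators:", search.best_params_['n_estimators'])
print("cross-validated accuracy:", search.best_score_)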
