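Download the dataset

The data comes from the Tianchi practice competition "Industrial Steam Volume Prediction". Download it here: https://tianchi.aliyun.com/competition/entrance/231693/introduction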

Install H2O

H2O requirements:

pip install requests
pip install tabulate
pip install "colorama>=0.3.8"
pip install future

Install H2O:

pip install -f http://h2o-release.s3.amazonaws.com/h2o/latest_stable_Py.html h2o
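
H2O runs on the JVM, so `h2o.init()` needs a Java runtime on the machine in addition to the Python packages above. Here is a minimal sketch of a pre-flight check, assuming `java` is expected on the `PATH`. The helper name `java_available` is made up for illustration and is not part of H2O:

```python
import shutil


def java_available(executable="java"):
    """Return True if the given executable can be found on the PATH."""
    return shutil.which(executable) is not None


if __name__ == "__main__":
    if java_available():
        print("Java found; h2o.init() should be able to start a local cluster.")
    else:
        print("Java not found on PATH; install a JRE/JDK before calling h2o.init().")
```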

Train the model and predict

import h2o

from h2o.estimators.random_forest import H2ORandomForestEstimator
from h2o.estimators.gbm import H2OGradientBoostingEstimator
from h2o.estimators.stackedensemble import H2OStackedEnsembleEstimator

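# Start (or connect to) a local H2O cluster
h2o.init()

# Load the data; force all 39 columns of the training file
# (features plus target) to be parsed as numeric
col_types = ["numeric"] * 39
data = h2o.import_file('zhengqi_train.txt', sep='\t', col_types=col_types)
out = h2o.import_file('zhengqi_test.txt', sep='\t')

# Hold out roughly 30% of the labelled data to validate the model
train, test = data.split_frame(ratios=[.7], seed=1)

# Predictors are every column except the response
y = "target"
x = [c for c in train.columns if c != y]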

# Train the base models. For stacking they must share the same folds
# and keep their cross-validation predictions.
nfolds = 7
gbm = H2OGradientBoostingEstimator(nfolds=nfolds,
                                   fold_assignment="Modulo",
                                   keep_cross_validation_predictions=True)
gbm.train(x=x, y=y, training_frame=train)
rf = H2ORandomForestEstimator(nfolds=nfolds,
                              fold_assignment="Modulo",
                              keep_cross_validation_predictions=True)
rf.train(x=x, y=y, training_frame=train)

# Stack the two base models; the frames are passed once, in train()
stack = H2OStackedEnsembleEstimator(model_id="ensemble",
                                    base_models=[gbm.model_id, rf.model_id])
stack.train(x=x, y=y, training_frame=train, validation_frame=test)

# Report metrics on the hold-out split rather than on the training data
print(stack.model_performance(test_data=test))
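
The base models above use `fold_assignment="Modulo"` so that both are cross-validated on identical folds, which the stacked ensemble needs in order to combine their held-out predictions. As a rough illustration of the idea (not H2O's internal code), a modulo scheme sends row i to fold i % nfolds. The function name `modulo_folds` is hypothetical:

```python
def modulo_folds(n_rows, nfolds):
    """Assign each row index to a fold by taking the index modulo nfolds."""
    return [i % nfolds for i in range(n_rows)]


folds = modulo_folds(n_rows=10, nfolds=7)
print(folds)  # [0, 1, 2, 3, 4, 5, 6, 0, 1, 2]

# The assignment is deterministic, so two models trained separately
# end up with the same folds without having to share a seed.
assert modulo_folds(10, 7) == folds
```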


# Predict on the competition test set and save one prediction per line for submission
result = stack.predict(out)
result = result.as_data_frame()['predict'].to_list()

with open('result_h2o.txt', 'w') as f:
    for i in result:
        f.write("{}\n".format(i))

# h2o.export_file(result,'result_h2o.txt',sep = "\n",parts = 1)

# h2o.shutdown() is deprecated; shut the cluster down through its handle
h2o.cluster().shutdown()
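
Before uploading, it can be worth confirming that the file holds exactly one numeric prediction per test row. The sketch below is only an assumption about what such a sanity check might look like. `check_submission` is a made-up helper, and the expected row count should come from your own run (for example `len(result)` from the script above):

```python
import math


def check_submission(path, expected_rows):
    """Return the parsed predictions, raising ValueError on a malformed file."""
    with open(path) as f:
        lines = [line.strip() for line in f if line.strip()]
    if len(lines) != expected_rows:
        raise ValueError(
            "expected {} rows, found {}".format(expected_rows, len(lines)))
    # float() raises ValueError if a line is not a number
    values = [float(line) for line in lines]
    if any(math.isnan(v) or math.isinf(v) for v in values):
        raise ValueError("file contains NaN or infinite predictions")
    return values
```

For the file written above, the call would be `check_submission('result_h2o.txt', len(result))`.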

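Submit the results

With no feature engineering at all, this submission ranked above 86% of the teams in this practice competition!

H2O seems to hold up well. Next, I'll try combining Spark with H2O on big data.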


