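Using H2O machine learning to enter the Tianchi practice competition "Industrial Steam Volume Prediction" in "ten minutes", beating 86% of the teams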

A quick try of H2O's automated machine learning

 

Download the dataset

Get the dataset from the Tianchi practice competition "Industrial Steam Volume Prediction": https://tianchi.aliyun.com/competition/entrance/231693/introduction
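The training script later in this post assumes 39 numeric, tab-separated columns, one of which is named target. A minimal sketch for confirming that against the downloaded file before handing it to H2O; the helper name count_columns is made up for illustration, and it assumes the first row of the file is a header:

```python
import csv


def count_columns(path, sep="\t"):
    """Return (column_names, n_columns) read from the header row of a delimited text file."""
    with open(path, newline="") as f:
        header = next(csv.reader(f, delimiter=sep))
    return header, len(header)
```

For example, names, n = count_columns("zhengqi_train.txt") should give n == 39 with "target" in names if the file matches what the script expects.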

 

Install H2O

H2O requirements:

pip install requests
pip install tabulate
pip install "colorama>=0.3.8"
pip install future

install H2O:

pip install -f http://h2o-release.s3.amazonaws.com/h2o/latest_stable_Py.html h2o
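After the pip commands finish, a small check along these lines can confirm that everything is importable. This is only a sketch: the REQUIRED list simply mirrors the packages named above, and missing_packages is an illustrative helper, not part of H2O.

```python
import importlib.util

# Import names of the packages installed in the steps above.
REQUIRED = ["requests", "tabulate", "colorama", "future", "h2o"]


def missing_packages(names):
    """Return the names that cannot be found by the current Python interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# An empty list means every package is visible to this interpreter.
print("missing:", missing_packages(REQUIRED))
```

If the list is not empty, the usual cause is that pip installed into a different Python environment than the one running the script.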

 

Train the model and predict

import h2o

from h2o.estimators.random_forest import H2ORandomForestEstimator
from h2o.estimators.gbm import H2OGradientBoostingEstimator
from h2o.estimators.stackedensemble import H2OStackedEnsembleEstimator

# Initialize H2O
h2o.init()

# Read the datasets
col_types = ["numeric"] * 39  # one entry per column in the training file
data = h2o.import_file('zhengqi_train.txt',sep='\t', col_types=col_types)
out = h2o.import_file('zhengqi_test.txt',sep='\t')

# Split the data: 70% for training the models, 30% held out for validation
train, test = data.split_frame(ratios=[.7], seed=1) 

# Predictor column names (x) and response column (y)
x = train.columns
y = "target"
x.remove(y)

# Train the base models, then stack them
nfolds = 7
gbm = H2OGradientBoostingEstimator(nfolds=nfolds,
                                   fold_assignment="Modulo",
                                   keep_cross_validation_predictions=True)
gbm.train(x=x, y=y, training_frame=train)
rf = H2ORandomForestEstimator(nfolds=nfolds,
                              fold_assignment="Modulo",
                              keep_cross_validation_predictions=True)
rf.train(x=x, y=y, training_frame=train)
stack = H2OStackedEnsembleEstimator(model_id="ensemble",
                                    base_models=[gbm.model_id, rf.model_id])
stack.train(x=x, y=y, training_frame=train, validation_frame=test)
# Score on the held-out split; print it so the metrics show up when run as a script
print(stack.model_performance(test))


# Predict on the competition test set and save the results for submission
result = stack.predict(out)
result = result.as_data_frame()['predict'].to_list()

with open('result_h2o.txt', 'w') as f:
    for i in result:
        f.write("{}\n".format(i))

# h2o.export_file(result,'result_h2o.txt',sep = "\n",parts = 1)

# Shut down the local H2O cluster (h2o.shutdown() is deprecated in recent releases)
h2o.cluster().shutdown()
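Both base models in the script use the same nfolds, fold_assignment="Modulo" and keep_cross_validation_predictions=True, because the stacked ensemble is trained on their cross-validated predictions and needs the folds to line up. The idea behind a modulo assignment can be sketched in plain Python as below. This only illustrates the round-robin rule; it is not H2O's internal code, and both function names are made up for the example.

```python
def modulo_folds(n_rows, nfolds):
    """Assign row i to fold i % nfolds, a deterministic round-robin split."""
    return [i % nfolds for i in range(n_rows)]


def fold_sizes(assignment, nfolds):
    """Count how many rows land in each fold."""
    sizes = [0] * nfolds
    for fold in assignment:
        sizes[fold] += 1
    return sizes


folds = modulo_folds(10, 7)
print(folds)                 # [0, 1, 2, 3, 4, 5, 6, 0, 1, 2]
print(fold_sizes(folds, 7))  # [2, 2, 2, 1, 1, 1, 1]
```

Because the rule depends only on the row index, two models trained on the same frame get identical folds without sharing a random seed.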

Submit the results

Without any feature engineering at all, this beat 86% of the teams in this practice competition!
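Before uploading result_h2o.txt, it may be worth a quick sanity check of the file. The sketch below assumes the format the script writes, one numeric prediction per line, and that the number of lines should equal the number of rows in the test file. The function name check_submission and its expected_rows parameter are illustrative, not part of any competition tooling.

```python
def check_submission(path, expected_rows):
    """Return a list of problems found in a one-prediction-per-line file."""
    problems = []
    with open(path) as f:
        lines = f.read().splitlines()
    if len(lines) != expected_rows:
        problems.append(
            "expected {} rows, found {}".format(expected_rows, len(lines))
        )
    for number, line in enumerate(lines, start=1):
        try:
            float(line)
        except ValueError:
            problems.append("line {} is not a number: {!r}".format(number, line))
    return problems
```

An empty list means the file has the expected length and every line parses as a float; expected_rows can be taken from the row count of the test frame before the cluster is shut down.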

 

So H2O holds up pretty well. Next I will try running it on big data with Spark combined with H2O.
