Kaggle PetFinder Competition Diary, Day 2: Random Forest, Hyperparameter Tuning

I switched to a random forest and tuned its hyperparameters (roughly as in the code below), sweeping each one to find the peak. The score improved by a percentage point or two; tests within the Train set now all score above 4.0.

Since xgboost runs slowly on my machine, I am no longer using it.

I also added per-state population and per-capita GDP data found online:

# Per-capita GDP and population by State ID, collected from the web.
GDPAVG = {41336: 99, 41325: 99, 41367: 80, 41401: 662, 41415: 7327, 41324: 582, 41332: 404, 41335: 225, 41330: 119, 41380: 934, 41327: 309, 41345: 73, 41342: 93, 41326: 187, 41361: 255}
Population = {41336: 346, 41325: 204, 41367: 168, 41401: 168, 41415: 9, 41324: 79, 41332: 103, 41335: 157, 41330: 244, 41380: 25, 41327: 161, 41345: 327, 41342: 528, 41326: 256, 41361: 115}

# Map the lookup tables onto the State column as two new features.
df['GDPAVG'] = df['State'].map(GDPAVG)
df['Population'] = df['State'].map(Population)
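One thing to watch with `Series.map`: any State ID absent from the lookup dicts becomes NaN. A minimal sketch of one possible way to handle that, filling gaps with the column median (the tiny DataFrame and the unknown ID 99999 here are hypothetical, not from the original data):

```python
import pandas as pd

# Hypothetical mini-DataFrame standing in for the competition data;
# 99999 is a State ID deliberately absent from the lookup dicts.
df = pd.DataFrame({'State': [41336, 41415, 99999]})

GDPAVG = {41336: 99, 41415: 7327}
Population = {41336: 346, 41415: 9}

df['GDPAVG'] = df['State'].map(GDPAVG)
df['Population'] = df['State'].map(Population)

# Unmapped states become NaN; fill with the column median so the
# random forest never sees missing values.
df['GDPAVG'] = df['GDPAVG'].fillna(df['GDPAVG'].median())
df['Population'] = df['Population'].fillna(df['Population'].median())

print(df)
```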

import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestClassifier

# Sweep min_samples_leaf while holding the other tuned parameters fixed,
# recording the test-set accuracy for each value.
# x_train, y_train, x_test, y_test are the feature/label splits prepared earlier.
test = []
ranges = range(2, 20)
for i in ranges:
    rfc = RandomForestClassifier(n_estimators=230,
                                 max_depth=11,
                                 max_features=4,
                                 min_samples_split=10,
                                 random_state=10,
                                 min_samples_leaf=i)
    rfc.fit(x_train, y_train)
    score = rfc.score(x_test, y_test)
    test.append(score)

# Plot accuracy against min_samples_leaf to find the peak.
plt.plot(ranges, test, color="red", label="min_samples_leaf")
plt.legend()
plt.show()
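The loop above sweeps one parameter at a time; scikit-learn's `GridSearchCV` can search several at once with cross-validation. A self-contained sketch on synthetic data (the dataset and the parameter grid here are illustrative, not the actual values used in the diary):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the PetFinder features and labels.
X, y = make_classification(n_samples=500, n_features=10,
                           n_informative=5, n_classes=3,
                           random_state=10)

# Illustrative grid around the values tuned above.
param_grid = {
    'max_depth': [9, 11, 13],
    'min_samples_leaf': [2, 5, 10],
}

# 3-fold cross-validated search over all grid combinations.
search = GridSearchCV(
    RandomForestClassifier(n_estimators=100, random_state=10),
    param_grid, cv=3)
search.fit(X, y)

print(search.best_params_)
print(round(search.best_score_, 3))
```

Cross-validated search is slower than a single train/test sweep but less likely to overfit the tuning to one particular split.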

 
